Re: [PyCUDA] Help sought wrapping MAGMA

Bryan Catanzaro Fri, 05 Feb 2010 06:58:42 -0800

Are you using CUDA 3.0 beta?  MAGMA probably uses the runtime API, while PyCUDA 
uses the driver API.  Up until CUDA 3.0, pointers are incompatible between the 
two APIs and so, for example, dereferencing a PyCUDA allocated pointer (driver) 
in a runtime API program (which I'm guessing MAGMA is), will cause a segfault.


Upgrading to CUDA 3.0 beta will solve that problem.

- bryan

On Feb 5, 2010, at 6:54 AM, Anand Patil wrote:

> Hi all,
> 
> I'm attempting to use PyCUDA to write a GPU backend for a Gaussian process 
> package, http://pymc.googlecode.com/files/GPUserGuide.pdf. I'm pretty excited 
> about it; if all goes well it should be possible to hide the GPU completely 
> from the user. PyCUDA has made development much easier so far.
> 
> Now I've got to wrap some functions in the MAGMA library, 
> http://icl.cs.utk.edu/magma/. I'm getting segfaults in function 
> magma_spotrf_gpu, which does Cholesky factorizations. This is my first time 
> using both ctypes and MAGMA, and I'm pretty new to GPU programming in 
> general, so I'm having a hard time diagnosing the problem.
> 
> However, I've wrapped the 'CPU-only' version of magma_spotrf_gpu without 
> segfaults, which leads me to believe that the problem is the argument A, 
> which is the on-gpu matrix to be factorized. magma_spotrf_gpu expects A to be 
> a float*, and I'm giving it
> 
>     ctypes.POINTER(ctypes.c_float).from_address(int(A))
> 
> where A is the output of pycuda's to_device.
> 
> The documentation of magma_spotrf_gpu follows, and a short program 
> demonstrating the problem follows that. Unfortunately it's not possible to 
> run the program; you would have to build magma v0.2 as a shared library, and 
> so far the package is only available in binary form, and the 64-bit version 
> was not compiled with -fPIC. If anyone is keen to run it, please email me 
> off-list and I'll tell you how to go about it.
> 
> Any help would be much appreciated.
> 
> Thanks in advance,
> Anand
> 
> 
> ---------------- magma_sportf_gpu documentation
> 
> int magma_spotrf_gpu(char *uplo, int *n, float *a, int *lda,
>                    float *work, int *info)
>                    
>    SPOTRF computes the Cholesky factorization of a real symmetric positive 
> definite matrix A.
>    
>    The factorization has the form 
>    
>    
>    A =U**T*U, if UPLO=’U’, 
>        
>        or 
>        
>        A=L *L**T, if UPLO=’L’,
>    where U is an upper triangular matrix and L is lower triangular.
>    
>    This is the block version of the algorithm, calling Level 3 BLAS.
>    
>    UPLO           (input) CHARACTER*1 
>                   = ’U’: Upper triangle of A is stored; 
>                   = ’L’: Lower triangle of A is stored.
> 
>    N              (input) INTEGER The order of the matrix A. N >= 0.
> 
>    A              (input/output) REAL array on the GPU, dimension (LDA,N) 
>                   On entry, the symmetric matrix A. If UPLO = ’U’, the 
> leading 
>                   N-by-N upper triangular part of A contains the upper 
>                   triangular part of the matrix A, and the strictly lower 
>                   triangular part of A is not referenced. If UPLO = ’L’, the 
>                   leading N-by-N lower triangular part of A contains the 
> lower 
>                   triangular part of the matrix A, and the strictly upper 
>                   triangular part of A is not referenced.
>                   
>                   On exit, if INFO = 0, the factor U or L from the Cholesky 
>                   factorization A = U**T*U or A = L*L**T.
>                   
>    LDA            (input) INTEGER The leading dimension of the array A. LDA 
> >= max(1,N).
>                   
>    WORK           (workspace) REAL array, dimension at least (nb, nb) where 
> nb can be 
>                   obtained through magma_get_spotrf_nb(*n) 
>                   Work array allocated with cudaMallocHost.
>                   
>    INFO           (output) INTEGER
>                   = 0: successful exit 
>                   < 0: if INFO = -i, the i-th argument had an illegal value 
>                   > 0: if INFO = i, the leading minor of order i is not 
> positive definite, and the factorization could not be completed.
> 
> 
> ---------------- pymagma.py
> 
> import pycuda.driver as cuda
> import pycuda.autoinit
> import numpy as np
> import ctypes
> 
> magma_path = 
> '/home/malaria-atlas-project/sources/magma_0.2s/lib/objectfiles/libmagma.so'
> libmagma = ctypes.cdll.LoadLibrary(magma_path)
> 
> int_pointer = lambda x: ctypes.pointer(ctypes.c_int(x))
> char_pointer = lambda x: ctypes.pointer(ctypes.c_char(x))
> 
> _magma_spotrf_gpu = libmagma.magma_spotrf_gpu     
> _magma_spotrf_gpu.restype = ctypes.c_uint
> _magma_spotrf_gpu.argtypes = [ctypes.c_char_p,
>                                ctypes.POINTER(ctypes.c_int),
>                                ctypes.POINTER(ctypes.c_float),
>                                ctypes.POINTER(ctypes.c_int),
>                                np.ctypeslib.ndpointer(dtype='float32'),
>                                ctypes.POINTER(ctypes.c_int)]
> 
> magma_get_spotrf_nb = libmagma.magma_get_spotrf_nb
> magma_get_spotrf_nb.restype = ctypes.c_uint
> magma_get_spotrf_nb.argtypes = [ctypes.POINTER(ctypes.c_int)]
> 
> def magma_spotrf_gpu(uplo, n, A, lda, work):
>    info = 1
>    from IPython.Debugger import Pdb
>    Pdb(color_scheme='LightBG').set_trace() 
>    _magma_spotrf_gpu(char_pointer(uplo),
>                        int_pointer(n),
>                        # 
> ====================================================================
>                        # = I think this argument is responsible for the 
> segmentation fault. =
>                        # 
> ====================================================================
>                        ctypes.POINTER(ctypes.c_float).from_address(int(A)),
>                        int_pointer(lda),
>                        work.ctypes.data_as(ctypes.POINTER(ctypes.c_float)),
>                        int_pointer(info))
>    return info    
> 
> if __name__ == '__main__':
>    n = 10
>    
>    # Create matrix to be factored
>    A_orig = (np.eye(n) + np.ones((n,n))*.3).astype('float32')
>    A_gpu = cuda.to_device(A_orig)
> 
>    # Allocate pagelocked work array
>    nwork = magma_get_spotrf_nb(int_pointer(n))
>    print nwork
>    work_gpu = cuda.pagelocked_empty((nwork,nwork), dtype='float32')
>    
>    # # Do Cholesky factorization
>    info = magma_spotrf_gpu('L', n, A_gpu, n, work_gpu)
> _______________________________________________
> PyCUDA mailing list
> [email protected]
> http://host304.hostmonster.com/mailman/listinfo/pycuda_tiker.net

smime.p7s
Description: S/MIME cryptographic signature

_______________________________________________
PyCUDA mailing list
[email protected]
http://host304.hostmonster.com/mailman/listinfo/pycuda_tiker.net

Re: [PyCUDA] Help sought wrapping MAGMA

Reply via email to