Re: [PyCUDA] Help sought wrapping MAGMA

Anand Patil Fri, 05 Feb 2010 07:00:11 -0800

Thanks for the incredibly quick response, Bryan! I'll look into that and get
back to you.


Anand

On Fri, Feb 5, 2010 at 2:58 PM, Bryan Catanzaro
<[email protected]>wrote:

> Are you using CUDA 3.0 beta?  MAGMA probably uses the runtime API, while
> PyCUDA uses the driver API.  Up until CUDA 3.0, pointers are incompatible
> between the two APIs and so, for example, dereferencing a PyCUDA allocated
> pointer (driver) in a runtime API program (which I'm guessing MAGMA is),
> will cause a segfault.
>
> Upgrading to CUDA 3.0 beta will solve that problem.
>
> - bryan
>
> On Feb 5, 2010, at 6:54 AM, Anand Patil wrote:
>
> > Hi all,
> >
> > I'm attempting to use PyCUDA to write a GPU backend for a Gaussian
> process package, http://pymc.googlecode.com/files/GPUserGuide.pdf. I'm
> pretty excited about it; if all goes well it should be possible to hide the
> GPU completely from the user. PyCUDA has made development much easier so
> far.
> >
> > Now I've got to wrap some functions in the MAGMA library,
> http://icl.cs.utk.edu/magma/. I'm getting segfaults in function
> magma_spotrf_gpu, which does Cholesky factorizations. This is my first time
> using both ctypes and MAGMA, and I'm pretty new to GPU programming in
> general, so I'm having a hard time diagnosing the problem.
> >
> > However, I've wrapped the 'CPU-only' version of magma_spotrf_gpu without
> segfaults, which leads me to believe that the problem is the argument A,
> which is the on-gpu matrix to be factorized. magma_spotrf_gpu expects A to
> be a float*, and I'm giving it
> >
> >     ctypes.POINTER(ctypes.c_float).from_address(int(A))
> >
> > where A is the output of pycuda's to_device.
> >
> > The documentation of magma_spotrf_gpu follows, and a short program
> demonstrating the problem follows that. Unfortunately it's not possible to
> run the program; you would have to build magma v0.2 as a shared library, and
> so far the package is only available in binary form, and the 64-bit version
> was not compiled with -fPIC. If anyone is keen to run it, please email me
> off-list and I'll tell you how to go about it.
> >
> > Any help would be much appreciated.
> >
> > Thanks in advance,
> > Anand
> >
> >
> > ---------------- magma_sportf_gpu documentation
> >
> > int magma_spotrf_gpu(char *uplo, int *n, float *a, int *lda,
> >                    float *work, int *info)
> >
> >    SPOTRF computes the Cholesky factorization of a real symmetric
> positive definite matrix A.
> >
> >    The factorization has the form
> >
> >
> >    A =U**T*U, if UPLO=’U’,
> >
> >        or
> >
> >        A=L *L**T, if UPLO=’L’,
> >    where U is an upper triangular matrix and L is lower triangular.
> >
> >    This is the block version of the algorithm, calling Level 3 BLAS.
> >
> >    UPLO           (input) CHARACTER*1
> >                   = ’U’: Upper triangle of A is stored;
> >                   = ’L’: Lower triangle of A is stored.
> >
> >    N              (input) INTEGER The order of the matrix A. N >= 0.
> >
> >    A              (input/output) REAL array on the GPU, dimension (LDA,N)
> >                   On entry, the symmetric matrix A. If UPLO = ’U’, the
> leading
> >                   N-by-N upper triangular part of A contains the upper
> >                   triangular part of the matrix A, and the strictly lower
> >                   triangular part of A is not referenced. If UPLO = ’L’,
> the
> >                   leading N-by-N lower triangular part of A contains the
> lower
> >                   triangular part of the matrix A, and the strictly upper
> >                   triangular part of A is not referenced.
> >
> >                   On exit, if INFO = 0, the factor U or L from the
> Cholesky
> >                   factorization A = U**T*U or A = L*L**T.
> >
> >    LDA            (input) INTEGER The leading dimension of the array A.
> LDA >= max(1,N).
> >
> >    WORK           (workspace) REAL array, dimension at least (nb, nb)
> where nb can be
> >                   obtained through magma_get_spotrf_nb(*n)
> >                   Work array allocated with cudaMallocHost.
> >
> >    INFO           (output) INTEGER
> >                   = 0: successful exit
> >                   < 0: if INFO = -i, the i-th argument had an illegal
> value
> >                   > 0: if INFO = i, the leading minor of order i is not
> positive definite, and the factorization could not be completed.
> >
> >
> > ---------------- pymagma.py
> >
> > import pycuda.driver as cuda
> > import pycuda.autoinit
> > import numpy as np
> > import ctypes
> >
> > magma_path =
> '/home/malaria-atlas-project/sources/magma_0.2s/lib/objectfiles/libmagma.so'
> > libmagma = ctypes.cdll.LoadLibrary(magma_path)
> >
> > int_pointer = lambda x: ctypes.pointer(ctypes.c_int(x))
> > char_pointer = lambda x: ctypes.pointer(ctypes.c_char(x))
> >
> > _magma_spotrf_gpu = libmagma.magma_spotrf_gpu
> > _magma_spotrf_gpu.restype = ctypes.c_uint
> > _magma_spotrf_gpu.argtypes = [ctypes.c_char_p,
> >                                ctypes.POINTER(ctypes.c_int),
> >                                ctypes.POINTER(ctypes.c_float),
> >                                ctypes.POINTER(ctypes.c_int),
> >                                np.ctypeslib.ndpointer(dtype='float32'),
> >                                ctypes.POINTER(ctypes.c_int)]
> >
> > magma_get_spotrf_nb = libmagma.magma_get_spotrf_nb
> > magma_get_spotrf_nb.restype = ctypes.c_uint
> > magma_get_spotrf_nb.argtypes = [ctypes.POINTER(ctypes.c_int)]
> >
> > def magma_spotrf_gpu(uplo, n, A, lda, work):
> >    info = 1
> >    from IPython.Debugger import Pdb
> >    Pdb(color_scheme='LightBG').set_trace()
> >    _magma_spotrf_gpu(char_pointer(uplo),
> >                        int_pointer(n),
> >                        #
> ====================================================================
> >                        # = I think this argument is responsible for the
> segmentation fault. =
> >                        #
> ====================================================================
> >
>  ctypes.POINTER(ctypes.c_float).from_address(int(A)),
> >                        int_pointer(lda),
> >
>  work.ctypes.data_as(ctypes.POINTER(ctypes.c_float)),
> >                        int_pointer(info))
> >    return info
> >
> > if __name__ == '__main__':
> >    n = 10
> >
> >    # Create matrix to be factored
> >    A_orig = (np.eye(n) + np.ones((n,n))*.3).astype('float32')
> >    A_gpu = cuda.to_device(A_orig)
> >
> >    # Allocate pagelocked work array
> >    nwork = magma_get_spotrf_nb(int_pointer(n))
> >    print nwork
> >    work_gpu = cuda.pagelocked_empty((nwork,nwork), dtype='float32')
> >
> >    # # Do Cholesky factorization
> >    info = magma_spotrf_gpu('L', n, A_gpu, n, work_gpu)
> > _______________________________________________
> > PyCUDA mailing list
> > [email protected]
> > http://host304.hostmonster.com/mailman/listinfo/pycuda_tiker.net
>
>

_______________________________________________
PyCUDA mailing list
[email protected]
http://host304.hostmonster.com/mailman/listinfo/pycuda_tiker.net

Re: [PyCUDA] Help sought wrapping MAGMA

Reply via email to