Hi all,

I'm attempting to use PyCUDA to write a GPU backend for a Gaussian process
package, http://pymc.googlecode.com/files/GPUserGuide.pdf. I'm pretty
excited about it; if all goes well it should be possible to hide the GPU
completely from the user. PyCUDA has made development much easier so far.

Now I've got to wrap some functions in the MAGMA library,
http://icl.cs.utk.edu/magma/. I'm getting segfaults in function
magma_spotrf_gpu, which does Cholesky factorizations. This is my first time
using both ctypes and MAGMA, and I'm pretty new to GPU programming in
general, so I'm having a hard time diagnosing the problem.

However, I've wrapped the 'CPU-only' version of magma_spotrf_gpu without
segfaults, which leads me to believe that the problem is the argument A,
which is the on-gpu matrix to be factorized. magma_spotrf_gpu expects A to
be a float*, and I'm giving it

    ctypes.POINTER(ctypes.c_float).from_address(int(A))

where A is the output of pycuda's to_device.

The documentation of magma_spotrf_gpu follows, and a short program
demonstrating the problem follows that. Unfortunately it's not possible to
run the program; you would have to build magma v0.2 as a shared library, and
so far the package is only available in binary form, and the 64-bit version
was not compiled with -fPIC. If anyone is keen to run it, please email me
off-list and I'll tell you how to go about it.

Any help would be much appreciated.

Thanks in advance,
Anand


---------------- magma_sportf_gpu documentation

int magma_spotrf_gpu(char *uplo, int *n, float *a, int *lda,
                   float *work, int *info)

   SPOTRF computes the Cholesky factorization of a real symmetric positive
definite matrix A.

   The factorization has the form


   A =U**T*U, if UPLO=’U’,

       or

       A=L *L**T, if UPLO=’L’,
   where U is an upper triangular matrix and L is lower triangular.

   This is the block version of the algorithm, calling Level 3 BLAS.

   UPLO           (input) CHARACTER*1
                  = ’U’: Upper triangle of A is stored;
                  = ’L’: Lower triangle of A is stored.

   N              (input) INTEGER The order of the matrix A. N >= 0.

   A              (input/output) REAL array on the GPU, dimension (LDA,N)
                  On entry, the symmetric matrix A. If UPLO = ’U’, the
leading
                  N-by-N upper triangular part of A contains the upper
                  triangular part of the matrix A, and the strictly lower
                  triangular part of A is not referenced. If UPLO = ’L’, the

                  leading N-by-N lower triangular part of A contains the
lower
                  triangular part of the matrix A, and the strictly upper
                  triangular part of A is not referenced.

                  On exit, if INFO = 0, the factor U or L from the Cholesky
                  factorization A = U**T*U or A = L*L**T.

   LDA            (input) INTEGER The leading dimension of the array A. LDA
>= max(1,N).

   WORK           (workspace) REAL array, dimension at least (nb, nb) where
nb can be
                  obtained through magma_get_spotrf_nb(*n)
                  Work array allocated with cudaMallocHost.

   INFO           (output) INTEGER
                  = 0: successful exit
                  < 0: if INFO = -i, the i-th argument had an illegal value
                  > 0: if INFO = i, the leading minor of order i is not
positive definite, and the factorization could not be completed.


---------------- pymagma.py

import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
import ctypes

magma_path =
'/home/malaria-atlas-project/sources/magma_0.2s/lib/objectfiles/libmagma.so'
libmagma = ctypes.cdll.LoadLibrary(magma_path)

int_pointer = lambda x: ctypes.pointer(ctypes.c_int(x))
char_pointer = lambda x: ctypes.pointer(ctypes.c_char(x))

_magma_spotrf_gpu = libmagma.magma_spotrf_gpu
_magma_spotrf_gpu.restype = ctypes.c_uint
_magma_spotrf_gpu.argtypes = [ctypes.c_char_p,
                               ctypes.POINTER(ctypes.c_int),
                               ctypes.POINTER(ctypes.c_float),
                               ctypes.POINTER(ctypes.c_int),
                               np.ctypeslib.ndpointer(dtype='float32'),
                               ctypes.POINTER(ctypes.c_int)]

magma_get_spotrf_nb = libmagma.magma_get_spotrf_nb
magma_get_spotrf_nb.restype = ctypes.c_uint
magma_get_spotrf_nb.argtypes = [ctypes.POINTER(ctypes.c_int)]

def magma_spotrf_gpu(uplo, n, A, lda, work):
   info = 1
   from IPython.Debugger import Pdb
   Pdb(color_scheme='LightBG').set_trace()
   _magma_spotrf_gpu(char_pointer(uplo),
                       int_pointer(n),
                       #
====================================================================
                       # = I think this argument is responsible for the
segmentation fault. =
                       #
====================================================================
                       ctypes.POINTER(ctypes.c_float).from_address(int(A)),
                       int_pointer(lda),
                       work.ctypes.data_as(ctypes.POINTER(ctypes.c_float)),
                       int_pointer(info))
   return info

if __name__ == '__main__':
   n = 10

   # Create matrix to be factored
   A_orig = (np.eye(n) + np.ones((n,n))*.3).astype('float32')
   A_gpu = cuda.to_device(A_orig)

   # Allocate pagelocked work array
   nwork = magma_get_spotrf_nb(int_pointer(n))
   print nwork
   work_gpu = cuda.pagelocked_empty((nwork,nwork), dtype='float32')

   # # Do Cholesky factorization
   info = magma_spotrf_gpu('L', n, A_gpu, n, work_gpu)
_______________________________________________
PyCUDA mailing list
[email protected]
http://host304.hostmonster.com/mailman/listinfo/pycuda_tiker.net

Reply via email to