Hi all, I'm attempting to use PyCUDA to write a GPU backend for a Gaussian process package, http://pymc.googlecode.com/files/GPUserGuide.pdf. I'm pretty excited about it; if all goes well it should be possible to hide the GPU completely from the user. PyCUDA has made development much easier so far.
Now I've got to wrap some functions in the MAGMA library, http://icl.cs.utk.edu/magma/. I'm getting segfaults in function magma_spotrf_gpu, which does Cholesky factorizations. This is my first time using both ctypes and MAGMA, and I'm pretty new to GPU programming in general, so I'm having a hard time diagnosing the problem. However, I've wrapped the 'CPU-only' version of magma_spotrf_gpu without segfaults, which leads me to believe that the problem is the argument A, which is the on-gpu matrix to be factorized. magma_spotrf_gpu expects A to be a float*, and I'm giving it ctypes.POINTER(ctypes.c_float).from_address(int(A)) where A is the output of pycuda's to_device. The documentation of magma_spotrf_gpu follows, and a short program demonstrating the problem follows that. Unfortunately it's not possible to run the program; you would have to build magma v0.2 as a shared library, and so far the package is only available in binary form, and the 64-bit version was not compiled with -fPIC. If anyone is keen to run it, please email me off-list and I'll tell you how to go about it. Any help would be much appreciated. Thanks in advance, Anand ---------------- magma_sportf_gpu documentation int magma_spotrf_gpu(char *uplo, int *n, float *a, int *lda, float *work, int *info) SPOTRF computes the Cholesky factorization of a real symmetric positive definite matrix A. The factorization has the form A =U**T*U, if UPLO=’U’, or A=L *L**T, if UPLO=’L’, where U is an upper triangular matrix and L is lower triangular. This is the block version of the algorithm, calling Level 3 BLAS. UPLO (input) CHARACTER*1 = ’U’: Upper triangle of A is stored; = ’L’: Lower triangle of A is stored. N (input) INTEGER The order of the matrix A. N >= 0. A (input/output) REAL array on the GPU, dimension (LDA,N) On entry, the symmetric matrix A. If UPLO = ’U’, the leading N-by-N upper triangular part of A contains the upper triangular part of the matrix A, and the strictly lower triangular part of A is not referenced. If UPLO = ’L’, the leading N-by-N lower triangular part of A contains the lower triangular part of the matrix A, and the strictly upper triangular part of A is not referenced. On exit, if INFO = 0, the factor U or L from the Cholesky factorization A = U**T*U or A = L*L**T. LDA (input) INTEGER The leading dimension of the array A. LDA >= max(1,N). WORK (workspace) REAL array, dimension at least (nb, nb) where nb can be obtained through magma_get_spotrf_nb(*n) Work array allocated with cudaMallocHost. INFO (output) INTEGER = 0: successful exit < 0: if INFO = -i, the i-th argument had an illegal value > 0: if INFO = i, the leading minor of order i is not positive definite, and the factorization could not be completed. ---------------- pymagma.py import pycuda.driver as cuda import pycuda.autoinit import numpy as np import ctypes magma_path = '/home/malaria-atlas-project/sources/magma_0.2s/lib/objectfiles/libmagma.so' libmagma = ctypes.cdll.LoadLibrary(magma_path) int_pointer = lambda x: ctypes.pointer(ctypes.c_int(x)) char_pointer = lambda x: ctypes.pointer(ctypes.c_char(x)) _magma_spotrf_gpu = libmagma.magma_spotrf_gpu _magma_spotrf_gpu.restype = ctypes.c_uint _magma_spotrf_gpu.argtypes = [ctypes.c_char_p, ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_float), ctypes.POINTER(ctypes.c_int), np.ctypeslib.ndpointer(dtype='float32'), ctypes.POINTER(ctypes.c_int)] magma_get_spotrf_nb = libmagma.magma_get_spotrf_nb magma_get_spotrf_nb.restype = ctypes.c_uint magma_get_spotrf_nb.argtypes = [ctypes.POINTER(ctypes.c_int)] def magma_spotrf_gpu(uplo, n, A, lda, work): info = 1 from IPython.Debugger import Pdb Pdb(color_scheme='LightBG').set_trace() _magma_spotrf_gpu(char_pointer(uplo), int_pointer(n), # ==================================================================== # = I think this argument is responsible for the segmentation fault. = # ==================================================================== ctypes.POINTER(ctypes.c_float).from_address(int(A)), int_pointer(lda), work.ctypes.data_as(ctypes.POINTER(ctypes.c_float)), int_pointer(info)) return info if __name__ == '__main__': n = 10 # Create matrix to be factored A_orig = (np.eye(n) + np.ones((n,n))*.3).astype('float32') A_gpu = cuda.to_device(A_orig) # Allocate pagelocked work array nwork = magma_get_spotrf_nb(int_pointer(n)) print nwork work_gpu = cuda.pagelocked_empty((nwork,nwork), dtype='float32') # # Do Cholesky factorization info = magma_spotrf_gpu('L', n, A_gpu, n, work_gpu)
_______________________________________________ PyCUDA mailing list [email protected] http://host304.hostmonster.com/mailman/listinfo/pycuda_tiker.net
