I've run into the same error when mixing PyCUDA with CUDA Runtime API-based functions. My workaround is to avoid pycuda.autoinit and do this instead:
import pycuda.driver as cuda
cuda.init()
from pycuda.tools import make_default_context
context = make_default_context()
device = context.get_device()
import atexit
atexit.register(context.detach)

The only difference here is that at exit we're detaching from the CUDA context instead of popping it, as pycuda.autoinit does. That gets rid of the error, although it's probably not the "correct" solution to the problem.

- bryan

On Mon, Apr 12, 2010 at 12:35 AM, Paul Northug <pnort...@gmail.com> wrote:
> (I am using cuda 3.0 on OS X 10.6.3, pycuda-0.94rc. The device is
> GeForce 9600M GT.)
>
> I would like to use cuBLAS on gpuarrays in pycuda. At the bottom is a
> test matrix-matrix multiply program. In addition to not knowing what
> I'm doing, I'm having the following problems:
>
> 1. The program runs correctly but terminates on exit with:
>
> terminate called after throwing an instance of 'cuda::error'
> what(): cuCtxPushCurrent failed: invalid value
>
> In the interoperability section 3.4 of the programming guide, context
> stack manipulation is listed as not interoperable. What does the error
> mean and how can I avoid it?
>
> 2. When I do sgemm(a, b, c) where a and b are gpuarrays, I am getting
> c = np.dot(b, a) instead of c = np.dot(a, b). Does gpuarray convert
> row-major format to something else (column?) in its internal
> representation? Or am I calling sgemm incorrectly?
>
> 3. Now that it's possible to interoperate to some extent, are there
> plans to add runtime features to pycuda?
>
> cuBLAS was about 5 times slower for small matrices (<100) and 4500
> times faster for larger matrices (>500) than numpy. Does that sound
> about right? If so, that's impressive. What are comparable ratios for
> newer cards and dgemm?
>
> Here is my code (one of my first). It depends on pystream.cublas, a
> ctypes wrapper (http://code.google.com/p/pystream/):
>
> import numpy as np
> from ctypes import *
> from pystream import cublas
> import pycuda.driver as cuda
> import pycuda.autoinit
> import pycuda.gpuarray as gpuarray
> from time import time as now
>
> # dims
> m, k, n = 10, 10, 10
>
> # host arrays
> a = np.random.randn(m, k).astype(np.float32)
> b = np.random.randn(k, n).astype(np.float32)
> c = np.empty((m, n), dtype=np.float32)
>
> # device arrays
> a_g = gpuarray.to_gpu(a)
> b_g = gpuarray.to_gpu(b)
> c_g = gpuarray.empty(c.shape, dtype=np.float32)
>
> # cast device pointers to ctypes pointers to float
> ap = cast(int(a_g.gpudata), POINTER(c_float))
> bp = cast(int(b_g.gpudata), POINTER(c_float))
> cp = cast(int(c_g.gpudata), POINTER(c_float))
>
> # iterations for timing
> t = 1000
>
> # call cublas
> cublas.cublasInit()
> tic = now()
> for i in range(t):
>     cublas.cublasSgemm('n', 'n', m, n, k, 1.,
>                        ap, k,
>                        bp, n, 0.,
>                        cp, n)
> toc = now() - tic
> cublas.cublasShutdown()
> c_g.get(c)
>
> print 'cublas'
> print '%d iter, %g s/iter' % (t, toc / t)
> print c
>
> # compare to numpy
> cn = np.empty_like(c)
> tic = now()
> for i in range(t):
>     # cn = np.dot(a, b) # this doesn't work
>     cn = np.dot(b, a)
> toc = now() - tic
>
> print 'numpy'
> print '%d iter, %g s/iter' % (t, toc / t)
> print cn
>
> print 'error ', ((c-cn)**2).sum()
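On question 2: cuBLAS follows the Fortran convention and expects column-major storage, while numpy and gpuarray are row-major by default. cuBLAS therefore reads each of your arrays as its transpose, which is why Sgemm('n', 'n', ...) effectively gives you np.dot(b, a). The usual fix is to swap the operands rather than transpose anything, since row-major C = A.B has the same memory layout as column-major C^T = B^T.A^T. A sketch against the same pystream wrapper (untested on my end):

# compute c = np.dot(a, b) with column-major cuBLAS by swapping operands
cublas.cublasSgemm('n', 'n', n, m, k, 1.,
                   bp, n,   # b read column-major is B^T: n x k, ld = n
                   ap, k, 0.,  # a read column-major is A^T: k x m, ld = k
                   cp, n)   # result C^T: n x m column-major = m x n row-major

With that call, c_g.get(c) should match np.dot(a, b) directly. Note that with m = k = n = 10 the shapes all coincide, so it's worth re-testing with distinct dimensions to convince yourself the leading dimensions are right.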