Hi,

After finally managing to subscribe to the mailing list, I have run into
another issue. I'm still trying to implement a conjugate gradient method.
It already works, but the speedup vs. SciPy with CUBLAS optimisation is
only a factor of 4.

Basically I need to store a scalar value (one double) on the GPU (as
opposed to main RAM, where it lives now) and pass this value as an
argument to the mul_add function.
I tried using one-entry GPUArrays, but the only way I got this to work is
via gpuarray.get(), which just transfers the value to the CPU only to put
it back on the GPU, so it gives no speedup at all. Is there any way to get
this working?
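
For context, here is a minimal plain-Python sketch of a textbook conjugate
gradient iteration. It is not the actual GPU code, and the helpers dot,
matvec and mul_add are hypothetical stand-ins made up for illustration,
not PyCUDA calls. It only shows where the scalars come from: alpha and
beta each fall out of a dot product and are fed straight into a
mul_add-style update, which is the value that would ideally never leave
the device.

```python
def dot(u, v):
    """Inner product of two equal-length lists (hypothetical helper)."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Dense matrix-vector product, A given as a list of rows."""
    return [dot(row, v) for row in A]


def mul_add(a, u, b, v):
    """Return a*u + b*v elementwise (CPU stand-in for a GPU mul_add)."""
    return [a * x + b * y for x, y in zip(u, v)]


def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for a symmetric positive definite A."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)       # scalar from a reduction
        x = mul_add(1.0, x, alpha, p)     # x <- x + alpha * p
        r = mul_add(1.0, r, -alpha, Ap)   # r <- r - alpha * A p
        rs_new = dot(r, r)
        beta = rs_new / rs_old            # second scalar from a reduction
        p = mul_add(1.0, r, beta, p)      # p <- r + beta * p
        rs_old = rs_new
    return x


if __name__ == "__main__":
    print(conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

In a GPU version every use of alpha and beta above would need the result
of a reduction, so reading it back with get() forces a device-to-host
round trip several times per iteration.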

Cheers,
Andi
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
