Hi, after finally managing to subscribe to the mailing list, I have run into another issue. I'm still trying to implement a conjugate gradient method. It already works, but the speedup over SciPy with CUBLAS optimisation is only a factor of 4.
Basically, I need to store a scalar value (one double) on the GPU (rather than in main RAM, where it lives now) and pass this value as an argument to the mul_add function. I tried using one-entry GPUArrays, but the only way I got this to work is via gpuarray.get(), which transfers the value to the CPU only to put it back on the GPU, so it gives no speedup at all. Is there any way to get this working?

Cheers,
Andi
_______________________________________________
PyCUDA mailing list
http://lists.tiker.net/listinfo/pycuda
