Andreas Baumbach <[email protected]> writes: > Hi, > > after I finally managed to subscribe to the mailing list I just ran in > another issue. I'm still trying to implement a conjungate gradient method. > That already works but the speed up vs Scipy with CUBLAS optimisation is > only a factor of 4. > > Basically I need to store an scalar value (one double) on the GPU (as > opposed to the main RAM of now) and pass this value as an argument to the > mul_add-function. > I tried using one-entry GPUarrays, but the only way I got this to work is > via gpuarray.get() which only transfers it to CPU to put it back on the > GPU. Which gives no speedup at all. Is there anyway to get this working?
https://github.com/inducer/pycuda/blob/master/pycuda/sparse/cg.py :)

Note how it gets around the problem you encountered by making its own custom
"lc2" (= linear combination of two vectors) kernel. It also binds the scalar
coefficients to texture references, which on GT200 and earlier was a
reasonable way of doing scalar broadcast. (Fermi and newer have "ldu"
("load uniform") instructions that should make this redundant.)

HTH,
Andreas
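For illustration, here is a minimal sketch of the same idea in plain PyCUDA,
not the cg.py code itself: the kernel and variable names ("lc2", a, b, etc.)
are made up for this example. The scalar coefficients live in one-entry
GPUArrays and the kernel dereferences them on the device, so no
gpuarray.get() round-trip to the host is needed.

    import numpy as np
    import pycuda.autoinit
    import pycuda.gpuarray as gpuarray
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void lc2(double *z, const double *a, const double *x,
                        const double *b, const double *y, int n)
    {
        /* a and b point at one-entry device arrays: the scalars
           never leave the GPU. */
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            z[i] = a[0] * x[i] + b[0] * y[i];
    }
    """)
    lc2 = mod.get_function("lc2")

    n = 1 << 20
    x = gpuarray.to_gpu(np.random.rand(n))
    y = gpuarray.to_gpu(np.random.rand(n))
    z = gpuarray.empty_like(x)
    a = gpuarray.to_gpu(np.array([2.0]))   # device-resident scalar
    b = gpuarray.to_gpu(np.array([0.5]))   # device-resident scalar

    block = (256, 1, 1)
    grid = ((n + block[0] - 1) // block[0], 1)
    lc2(z.gpudata, a.gpudata, x.gpudata, b.gpudata, y.gpudata,
        np.int32(n), block=block, grid=grid)

In a CG iteration you would update the one-entry coefficient arrays with a
small reduction kernel (or gpuarray.dot, keeping its result on the device)
instead of uploading them from the host each step.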
