Andreas Baumbach <[email protected]> writes:

> Hi,
>
> after I finally managed to subscribe to the mailing list, I just ran
> into another issue. I'm still trying to implement a conjugate gradient
> method. That already works, but the speed-up vs. SciPy with CUBLAS
> optimisation is only a factor of 4.
>
> Basically, I need to store a scalar value (one double) on the GPU (as
> opposed to in main RAM, as it is now) and pass this value as an
> argument to the mul_add function.
> I tried using one-entry GPUArrays, but the only way I got this to work
> is via gpuarray.get(), which just transfers the value to the CPU only
> to put it back on the GPU, so there is no speedup at all. Is there any
> way to get this working?

https://github.com/inducer/pycuda/blob/master/pycuda/sparse/cg.py

:)

Note how it gets around the problem you encountered by making its own
custom "lc2" (= linear combination of two vectors) kernel. It also binds
the scalar coefficients to texture references, which on GT200 and
earlier was a reasonable way of doing scalar broadcast. (Fermi and newer
have "ldu" ("load uniform") instructions that should make this
redundant.)
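
In case it helps, here is a minimal sketch of the same idea in plain
PyCUDA (not the cg.py code itself, and the names lc2/a/b/x/y/z are just
illustrative): the scalar coefficients live in one-element GPUArrays and
the kernel reads them straight from device memory, so they never make
the round trip through the host.

    import numpy as np
    import pycuda.autoinit
    import pycuda.gpuarray as gpuarray
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void lc2(double *z, const double *a, const double *x,
                        const double *b, const double *y, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            // a[0] and b[0] are read from device memory; the scalars
            // never pass through host RAM.
            z[i] = a[0] * x[i] + b[0] * y[i];
    }
    """)
    lc2 = mod.get_function("lc2")

    n = 1 << 20
    x = gpuarray.to_gpu(np.random.rand(n))
    y = gpuarray.to_gpu(np.random.rand(n))
    z = gpuarray.empty_like(x)

    # One-element GPUArrays holding the scalar coefficients on the device.
    a = gpuarray.to_gpu(np.array([2.0]))
    b = gpuarray.to_gpu(np.array([3.0]))

    lc2(z.gpudata, a.gpudata, x.gpudata, b.gpudata, y.gpudata,
        np.int32(n), block=(256, 1, 1), grid=((n + 255) // 256, 1))

In a CG loop you would overwrite a and b with small device-side kernels
(or the results of GPU reductions) instead of to_gpu(), so the whole
iteration stays on the GPU.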

HTH,
Andreas
