Conjugate gradient---great application!

How are you currently providing this scalar to your kernel invocation?

Why doesn't just passing it in as a scalar value work (casting it to
numpy.float32() or numpy.float64() if needed)?
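
Here's a minimal sketch of what I mean; the mul_add signature is my guess
at yours (I'm assuming doubles and an axpy-style update):

    import numpy as np
    import pycuda.autoinit
    import pycuda.gpuarray as gpuarray
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void mul_add(double *y, const double *x, double alpha, int n)
    {
        // y <- y + alpha * x
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] += alpha * x[i];
    }
    """)
    mul_add = mod.get_function("mul_add")

    n = 1024
    x = gpuarray.to_gpu(np.random.rand(n))
    y = gpuarray.to_gpu(np.random.rand(n))

    # The scalar stays on the host; the cast makes it match the
    # kernel's "double" parameter.
    mul_add(y, x, np.float64(0.5), np.int32(n),
            block=(256, 1, 1), grid=(n // 256, 1))
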
Or do you explicitly need the scalar to live in GPU memory where all
threads can see it and update it? If that's the case (i.e., your scalar
needs to be writeable by any thread), you could always mem_alloc() four or
eight bytes and give your kernels a pointer to that allocation. Is there a
problem with this approach too?
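
A sketch of that variant (same guessed kernel, except that alpha is now
read through a device pointer, so any kernel could also update it in place
with no round trip through the host):

    import numpy as np
    import pycuda.autoinit
    import pycuda.driver as drv
    import pycuda.gpuarray as gpuarray
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void mul_add(double *y, const double *x,
                            const double *alpha, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] += *alpha * x[i];  // dereference the device-resident scalar
    }
    """)
    mul_add = mod.get_function("mul_add")

    # Eight bytes of device memory for one double, written once from the host.
    alpha_gpu = drv.mem_alloc(8)
    drv.memcpy_htod(alpha_gpu, np.array(0.5, dtype=np.float64))

    n = 1024
    x = gpuarray.to_gpu(np.random.rand(n))
    y = gpuarray.to_gpu(np.random.rand(n))
    mul_add(y, x, alpha_gpu, np.int32(n),
            block=(256, 1, 1), grid=(n // 256, 1))

Note that a one-entry GPUArray works the same way: pass arr.gpudata to the
kernel instead of calling arr.get(), which is exactly the copy back to the
host that you're trying to avoid.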

Best,
Ahmed

On Tue, Jul 16, 2013 at 3:18 PM, Andreas Baumbach
<[email protected]>wrote:

> Hi,
>
> after I finally managed to subscribe to the mailing list, I ran into
> another issue. I'm still trying to implement a conjugate gradient method.
> It already works, but the speedup vs. SciPy with CUBLAS optimisation is
> only a factor of 4.
>
> Basically I need to store a scalar value (one double) on the GPU (as
> opposed to in main RAM, as now) and pass this value as an argument to the
> mul_add function.
> I tried using one-entry GPUArrays, but the only way I got this to work is
> via gpuarray.get(), which just transfers it to the CPU to put it back on
> the GPU, and so gives no speedup at all. Is there any way to get this
> working?
>
> Cheers,
> Andi
>
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
