On Wed, Apr 4, 2012 at 9:15 PM, Michiel Bruinink
<[email protected]> wrote:
> First of all, I made a typo in my sample program. The value of 100000 should
> be 169. That makes those array declarations less problematic, I think.

Much less. This now amounts to ~6 KB per thread, which may cause
problems depending on your block size and GPU — but at least it's
realistic.
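To see why block size matters, here is a back-of-the-envelope sizing check. The element type and the number of arrays are assumptions (float32 and nine arrays, chosen only so the total lands near the ~6 KB figure above); per-thread local arrays typically spill to local memory, so the footprint scales with the number of resident threads:

```python
# Hypothetical sizing sketch -- ARRAY_LEN comes from the thread,
# BYTES_PER_ELEM and N_ARRAYS are assumptions for illustration.
ARRAY_LEN = 169
BYTES_PER_ELEM = 4        # assuming float32 elements
N_ARRAYS = 9              # assuming several such arrays per thread

per_thread = ARRAY_LEN * BYTES_PER_ELEM * N_ARRAYS
threads_per_block = 256   # example block size

per_block = per_thread * threads_per_block
print(per_thread, "bytes per thread")
print(per_block, "bytes of local memory in flight per block")
```

With larger blocks, or several blocks resident per multiprocessor, the aggregate local-memory traffic grows accordingly, which is where per-thread arrays of this size can start to hurt.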

> Unfortunately, the code section that I mentioned is quite large and I am not
> allowed to make it public.
> I can say though, that it is composed of calculations with the above
> mentioned arrays.
> I have not been able to make a simple program that reproduces this effect
> yet, but I will have another look.
> But still, pyCuda uses the same compiler as nvcc, right?

Yes, this is correct (unless you have several of them installed —
which I'm not sure is even possible — and somehow pointed PyCuda to
the one not in $PATH). While you are trying to construct a minimal
example, you can check that:
- you are calling your kernel with the same block/grid/other
parameters from Python and C;
- you are using the same device, and the same compilation parameters (if any);
- you are timing only the call to this kernel, and the kernel has
actually finished by the time you take the end measurement.
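That last point is the classic pitfall: kernel launches are asynchronous, so timing only the launch measures almost nothing unless you synchronize first (in PyCuda, e.g. `pycuda.driver.Context.synchronize()` or CUDA events). A minimal pure-Python analogy, no GPU needed — the worker thread stands in for the kernel and `join()` for the synchronize call:

```python
import threading
import time

def fake_kernel():
    # Stands in for GPU work that runs asynchronously after "launch".
    time.sleep(0.2)

# Wrong: measuring only the (asynchronous) launch.
t0 = time.perf_counter()
worker = threading.Thread(target=fake_kernel)
worker.start()
launch_only = time.perf_counter() - t0
worker.join()

# Right: waiting for completion before stopping the clock.
t0 = time.perf_counter()
worker = threading.Thread(target=fake_kernel)
worker.start()
worker.join()          # analogous to Context.synchronize()
full_time = time.perf_counter() - t0

print(f"launch only: {launch_only:.4f} s, with sync: {full_time:.4f} s")
```

If the two numbers differ by orders of magnitude in the real measurement, the C and Python timings are probably not measuring the same thing.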

Now I'm officially out of ideas. Let's wait and see if more
experienced people have other suggestions.

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda