On Wed, Apr 4, 2012 at 9:15 PM, Michiel Bruinink <[email protected]> wrote:

> First of all, I made a typo in my sample program. The value of 100000 should
> be 169. That makes those array declarations less problematic, I think.
Much less. This now amounts to ~6 KB per thread, which may still cause problems depending on your block size and GPU, but at least it's realistic.

> Unfortunately, the code section that I mentioned is quite large and I am not
> allowed to make it public.
> I can say, though, that it consists of calculations with the above-mentioned
> arrays.
> I have not been able to make a simple program that reproduces this effect
> yet, but I will have another look.
> But still, PyCUDA uses the same compiler as nvcc, right?

Yes, that is correct (unless you have several of them installed, which I'm not sure is even possible, and somehow pointed PyCUDA to the one that is not in $PATH).

While you are trying to construct a minimal example, you can check that:
- you are calling your kernel with the same block/grid/other parameters from Python and from C;
- you are using the same device and the same compilation parameters (if any);
- you are timing only the call to this kernel, and the kernel has actually finished by the time you stop the timer.

Now I'm officially out of guesses. Let's wait and see if more experienced people have other ideas.

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
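For the third point in the checklist above (making sure the kernel has actually finished before you stop the timer), a minimal sketch of event-based timing with PyCUDA might look like the following. It assumes a CUDA-capable GPU with PyCUDA installed; the `scale` kernel and the array size 169 are placeholders standing in for the original poster's code, not anything from the thread:

```python
# Minimal sketch of event-based kernel timing with PyCUDA.
# Requires a CUDA-capable GPU; the kernel below is a stand-in.
import numpy as np
import pycuda.autoinit  # creates a context on the default device
import pycuda.driver as drv
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void scale(float *a)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    a[i] *= 2.0f;
}
""")
scale = mod.get_function("scale")

a = np.ones(169, dtype=np.float32)
a_gpu = drv.mem_alloc(a.nbytes)
drv.memcpy_htod(a_gpu, a)

start, stop = drv.Event(), drv.Event()
start.record()
# Kernel launches are asynchronous: without recording events (or
# synchronizing the context) a host-side timer would stop too early
# and report only the launch overhead, not the kernel runtime.
scale(a_gpu, block=(169, 1, 1), grid=(1, 1))
stop.record()
stop.synchronize()  # block until the kernel has actually finished
print("kernel time: %.3f ms" % stop.time_since(start))
```

The same caveat applies on the C side: a `cudaEventSynchronize` (or `cudaDeviceSynchronize`) is needed there before reading the timer, so that both measurements cover the same work.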
