Hi Andreas,

> Have you reported this to Nvidia? (If not, you should.)
Well, as I mentioned, I did not manage to reproduce the bug in pure CUDA, and reporting a bug without a way to reproduce it is just rude :) Anyway, this is no longer a blocker for me. I have rewritten my code (pycudafft, which I already advertised on this mailing list) so that 'dir' is now a template parameter. This miraculously fixed the bug, and I got a slight speed increase as a bonus.

Best regards,
Bogdan

On Mon, Mar 1, 2010 at 12:48 PM, Andreas Klöckner wrote:
> On Tuesday 09 February 2010, Bogdan Opanchuk wrote:
>> Hello,
>>
>> Yet another stupid question. Most probably I missed something
>> obvious, but anyway: can someone explain why I get some NaNs in the
>> output of the program (listed below)? Surprisingly, the bug disappears if
>> I pass '1' instead of '-1' as the third parameter to the function (or remove
>> the 'int' parameters completely and leave only the two pointers). The same
>> kernel in pure CUDA works fine. It looks like memory corruption, but I can't
>> figure out where it happens...
>
> This looks like a compiler bug to me. I've attached the PTX that the 3.0
> compiler generates. Apparently all your loops get unrolled, and then
> something gets confused, though I wasn't able to track down what
> exactly.
>
> A couple more data points:
> - Even in the first case (the one you report as working), I get floating
>   point garbage in the first 32 entries of b_gpu.
> - Adding an index bounds check to the second for loop also appears to
>   fix things.
>
> Have you reported this to Nvidia? (If not, you should.)
>
> Andreas
>
> PS: Sorry for the long absence, everybody. I was at a workshop, and then
> had lots to do on my return. Plus I have a thesis coming up, so please
> bear with me. :)
>
> _______________________________________________
> PyCUDA mailing list
> http://host304.hostmonster.com/mailman/listinfo/pycuda_tiker.net
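For readers unfamiliar with the fix Bogdan describes (turning the transform direction 'dir' from a runtime `int` argument into a compile-time template parameter), here is a minimal sketch of that pattern. It is written as plain host C++ so it compiles without a GPU; CUDA also accepts templated `__global__` kernels, so the same `template <int Dir>` structure carries over. The names `twiddle`, `dft` and `dft_dispatch` are hypothetical and not taken from pycudafft, and a naive O(n²) DFT stands in for the real FFT kernel. The sketch only illustrates the pattern; it does not reproduce or explain the compiler bug discussed in this thread.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Hypothetical helper: twiddle factor with the sign fixed at compile time.
// Dir = -1 selects the forward transform, Dir = +1 the (unnormalized) inverse.
template <int Dir>
std::complex<float> twiddle(int k, int n) {
    static_assert(Dir == -1 || Dir == 1, "Dir must be -1 or +1");
    const float angle = Dir * 2.0f * 3.14159265358979f * k / n;
    return {std::cos(angle), std::sin(angle)};
}

// Naive O(n^2) DFT standing in for a real FFT kernel. Because Dir is a
// template parameter, each instantiation sees the sign as a constant.
template <int Dir>
void dft(const std::complex<float>* in, std::complex<float>* out, int n) {
    assert(n > 0);
    for (int k = 0; k < n; ++k) {
        std::complex<float> acc(0.0f, 0.0f);
        for (int j = 0; j < n; ++j) {
            acc += in[j] * twiddle<Dir>((j * k) % n, n);
        }
        out[k] = acc;
    }
}

// Hypothetical host-side dispatcher: the runtime value only chooses which
// specialization runs; the computation itself never branches on it.
void dft_dispatch(const std::complex<float>* in, std::complex<float>* out,
                  int n, int dir) {
    if (dir < 0) {
        dft<-1>(in, out, n);
    } else {
        dft<1>(in, out, n);
    }
}
```

A forward call followed by an inverse call returns the input scaled by `n`, e.g. `dft_dispatch(x, X, 4, -1); dft_dispatch(X, y, 4, +1);` gives `y[i] == 4 * x[i]` up to rounding. The design choice is that the runtime direction selects between two fully specialized instantiations instead of being threaded through the inner loops.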
