Hi Andreas,

> Have you reported this to Nvidia? (If not, you should.)

Well, as I mentioned, I did not manage to reproduce the bug in pure
CUDA, and reporting a bug without a way to reproduce it is just rude :)

Anyway, this is no longer a blocker to me, since I have rewritten my
code (pycudafft, which I have already advertised on this mailing list) so that
'dir' is a template parameter now. This miraculously fixed the bug,
and I got slightly increased speed as a bonus.
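For readers unfamiliar with the change, here is a minimal host-side C++ sketch of turning a runtime direction argument into a template parameter. It is not the actual pycudafft code: `dft`, `dft_dispatch` and `cfloat` are hypothetical names, and a naive DFT stands in for the real FFT kernel. Each instantiation sees the sign of the exponent as a compile-time constant, and a small dispatcher chooses the instantiation at run time. The same `template <int Dir>` form can be applied to CUDA `__global__` kernels. This only shows the shape of the change; it does not explain why the NaNs went away.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Naive DFT with the direction fixed at compile time:
// Dir = -1 is the forward transform, Dir = +1 the (unnormalized) inverse.
template <int Dir>
std::vector<cfloat> dft(const std::vector<cfloat>& in) {
    static_assert(Dir == -1 || Dir == 1, "Dir must be -1 or +1");
    const std::size_t n = in.size();
    const float pi = 3.14159265358979f;
    std::vector<cfloat> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cfloat acc(0.0f, 0.0f);
        for (std::size_t j = 0; j < n; ++j) {
            // Reduce k*j modulo n to keep the angle small; Dir is a constant here.
            float angle = Dir * 2.0f * pi * float((k * j) % n) / float(n);
            acc += in[j] * cfloat(std::cos(angle), std::sin(angle));
        }
        out[k] = acc;
    }
    return out;
}

// Runtime entry point: callers still pass an int, but each branch
// calls a separately compiled instantiation.
std::vector<cfloat> dft_dispatch(const std::vector<cfloat>& in, int dir) {
    return dir < 0 ? dft<-1>(in) : dft<1>(in);
}
```

The dispatcher keeps a runtime-style interface for callers, at the cost of compiling one copy of the transform per direction.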

Best regards,
Bogdan

On Mon, Mar 1, 2010 at 12:48 PM, Andreas Klöckner
<[email protected]> wrote:
> On Tuesday 09 February 2010, Bogdan Opanchuk wrote:
>> Hello,
>>
>> Yet another stupid question. Most probably I missed something
>> obvious, but anyway: can someone explain why I get some NaNs in the
>> output of the program (listed below)? Surprisingly, the bug disappears
>> if I pass '1' instead of '-1' as the third parameter to the function
>> (or remove the 'int' parameters completely and leave only the two
>> pointers). The same kernel in pure CUDA works fine. It looks like
>> memory corruption, but I can't figure out where it happens...
>
> This looks like a compiler bug to me. I've attached the PTX that the 3.0
> compiler generates; apparently all your loops get unrolled, and then
> something gets confused, though I wasn't able to track down what
> exactly.
>
> A couple more data points:
> - Even in the first case (the one you report as working), I get
>  floating-point garbage in the first 32 entries of b_gpu.
> - Adding an index bounds check to the second for loop also appears to
>  fix things.
>
> Have you reported this to Nvidia? (If not, you should.)
>
> Andreas
>
> PS: Sorry for the long absence, everybody. I was at a workshop, and then
> had lots to do on my return. Plus I have a thesis coming up, so please
> bear with me. :)
>
>

_______________________________________________
PyCUDA mailing list
[email protected]
http://host304.hostmonster.com/mailman/listinfo/pycuda_tiker.net
