Hello,

As far as I understand, this is somehow connected with pycuda::complex
having a default constructor. When you write (in the reduction kernel):

__shared__ out_type sdata[BLOCK_SIZE];

and "out_type" has a default constructor, it is called by _every_
thread for _every_ element of the array. The reduction kernel starts
like:

__shared__ out_type sdata[BLOCK_SIZE];
... collecting values ...
sdata[tid] = <some value>

which means there is a race between the default constructor and the
actual initialization. I am not sure why nvcc does not complain about
this (a bug?), but the following things do help:
1. Using "extern __shared__ out_type sdata[]" and setting the size of
shared memory when preparing the kernel;
or
2. Putting "__syncthreads()" between the default initialization and the
actual initialization (not ideal, since it leaves all those default
constructor calls in place, but it still removes the symptom).

I googled a bit, but could not find any documented rules about default
constructors and shared memory.

Best regards,
Bogdan

On Fri, Jan 27, 2012 at 4:19 AM, Andreas Kloeckner
<[email protected]> wrote:
> Hi Jesse,
>
> On Wed, 25 Jan 2012 14:51:05 -0800, Jesse Lu <[email protected]> wrote:
>> The attached script shows highly inconsistent results (> 10% error at
>> times) between the numpy and gpuarray dot products. The inconsistent
>> results seem to only appear for large gpuarrays of data type complex64 or
>> complex128. Any ideas on what's going on? Thanks!
>
> I can reproduce the issue, and I'll try to see what's behind
> it. Unfortunately, I have a lot of stuff to do at the moment, so I can't
> give you an ETA. Of course, I'd also appreciate any help in getting this
> debugged--from anyone on the list! :)
>
> Andreas
>
> _______________________________________________
> PyCUDA mailing list
> [email protected]
> http://lists.tiker.net/listinfo/pycuda
>
