Hi Andreas,

On Sat, Jan 28, 2012 at 3:23 AM, Andreas Kloeckner <[email protected]> wrote:
> Indeed, inserting __syncthreads() after the
> shared array declaration brings the error down to more reasonable values
> for me. Jesse, my recommendation would be to use that as a workaround
> while we figure out a more permanent fix.

Can't we do this:

>> 1. Using "extern __shared__ out_type sdata[]" and setting the size of
>> shared memory when preparing the kernel.

We can pass dtype instead of ctype to get_reduction_kernel_and_types(),
and convert it to ctype + data size inside.

Best regards,
Bogdan

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
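
To make the dtype -> ctype + data size idea concrete, here is a rough sketch of what the conversion could look like. It is not the actual PyCUDA code: the lookup table and the helper names (ctype_and_itemsize, shared_mem_bytes, render_decl) are hypothetical, and it uses plain dtype names with ctypes instead of numpy dtypes so that it runs on its own. It also assumes one sdata element per thread in the block.

```python
import ctypes

# Hypothetical lookup table: dtype name -> (C type name, ctypes type).
# Real code would work with numpy dtypes; plain strings keep this stdlib-only.
_DTYPE_TABLE = {
    "float32": ("float", ctypes.c_float),
    "float64": ("double", ctypes.c_double),
    "int32": ("int", ctypes.c_int32),
}


def ctype_and_itemsize(dtype_name):
    """Return the C type name and the size in bytes for a dtype name."""
    c_name, c_type = _DTYPE_TABLE[dtype_name]
    return c_name, ctypes.sizeof(c_type)


def shared_mem_bytes(dtype_name, block_size):
    """Bytes of dynamic shared memory, assuming one element per thread."""
    _, itemsize = ctype_and_itemsize(dtype_name)
    return block_size * itemsize


# The array is declared unsized in the kernel source; its byte count is
# supplied when the kernel is prepared or launched, not baked into the source.
KERNEL_DECL = "extern __shared__ %(out_type)s sdata[];"


def render_decl(dtype_name):
    """Fill the C type name into the shared array declaration."""
    c_name, _ = ctype_and_itemsize(dtype_name)
    return KERNEL_DECL % {"out_type": c_name}
```

The value from shared_mem_bytes() is what would be handed over as the shared memory size when preparing or launching the kernel, for example through the shared argument of a PyCUDA kernel call.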
