Hi, the curandom Generator class initializes generators_per_block generators per block. This is the relevant code:

    @property
    @memoize_method
    def generators_per_block(self):
        return min(kernel.max_threads_per_block
                   for kernel in self._kernels())

On my machine the kernels have the following max_threads_per_block (for XORWOW):

    In [30]: [i.max_threads_per_block for i in g._kernels()]
    Out[30]: [512, 512, 512, 512, 384, 384, 384, 384, 384]

The first four are for the normal and uniform generators. The last ones are for skip_aheads.

Isn't this suboptimal? If I was only using the generators without skip-ahead it seems I could safely run 512 threads per block if those were initialized.

Thomas
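
P.S. To make the idea concrete, here is a rough sketch of what I have in mind. It is not the PyCUDA code: FakeKernel, the kernel names, and the use_skip_ahead flag are made up for illustration, and only the 512/384 limits come from the session above. It just takes the minimum over the kernels that would actually be launched.

```python
from collections import namedtuple

# Hypothetical stand-in for a compiled kernel; only the one attribute
# that matters for this discussion is modeled.
FakeKernel = namedtuple("FakeKernel", ["name", "max_threads_per_block"])


def generators_per_block(kernels, use_skip_ahead=True):
    """Smallest block-size limit among the kernels that will be launched.

    use_skip_ahead is an assumed flag, not an existing PyCUDA option.
    """
    needed = [
        k for k in kernels
        if use_skip_ahead or not k.name.startswith("skip_ahead")
    ]
    return min(k.max_threads_per_block for k in needed)


# Limits as reported for XORWOW: four generator kernels at 512,
# five skip-ahead kernels at 384. The names are invented.
kernels = (
    [FakeKernel("gen_%d" % i, 512) for i in range(4)]
    + [FakeKernel("skip_ahead_%d" % i, 384) for i in range(5)]
)

with_skip = generators_per_block(kernels)
without_skip = generators_per_block(kernels, use_skip_ahead=False)
```

With the numbers from my machine, with_skip is limited by the skip-ahead kernels and without_skip by the generator kernels only.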