Hi,

the curandom Generator class initializes generators_per_block generators.
This is the relevant code:

    @property
    @memoize_method
    def generators_per_block(self):
        return min(kernel.max_threads_per_block
                for kernel in self._kernels())


On my machine, the kernels report the following max_threads_per_block
values (for XORWOW):

In [30]: [i.max_threads_per_block for i in g._kernels()]
Out[30]: [512, 512, 512, 512, 384, 384, 384, 384, 384]

The first four are for the normal and uniform generators; the last five are
for the skip-ahead kernels.
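To make the gap concrete, here is the arithmetic on the values from Out[30], assuming the split described above (first four entries are generation kernels, the remaining five are skip-ahead kernels). The variable names are illustrative only.

```python
# max_threads_per_block values reported for the XORWOW kernels (Out[30]).
max_threads = [512, 512, 512, 512, 384, 384, 384, 384, 384]
generation_limits = max_threads[:4]  # normal/uniform generation kernels
skip_ahead_limits = max_threads[4:]  # skip-ahead kernels

# What generators_per_block currently computes: the minimum over ALL kernels.
current = min(max_threads)

# What would suffice if only the generation kernels were ever launched.
generation_only = min(generation_limits)

print(current, generation_only)   # 384 512
print(generation_only - current)  # 128 threads per block left unused
```

So the skip-ahead kernels alone pull the per-block generator count down from 512 to 384.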

Isn't this suboptimal? If I were only using the generators without
skip-ahead, it seems I could safely run 512 threads per block, provided
that many generators were initialized.
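One possible way to do this, sketched under assumptions: size the generator states by the generation kernels only, and give the skip-ahead kernels their own, smaller launch configuration that still covers every state. GeneratorSketch, FakeKernel, skip_ahead_block_size and skip_ahead_grid are hypothetical names, not part of PyCUDA's curandom API; the sketch also assumes the skip-ahead kernels could be rewritten to index generator states independently of the generation block shape (with a bounds check on the global thread index). The original property's @memoize_method caching is omitted for brevity.

```python
from collections import namedtuple

# Stand-in for a compiled kernel; only max_threads_per_block is modeled.
# (Real PyCUDA kernels are pycuda.driver.Function objects.)
FakeKernel = namedtuple("FakeKernel", ["name", "max_threads_per_block"])


class GeneratorSketch:
    """Hypothetical sketch, not PyCUDA's curandom Generator."""

    def __init__(self, generation_kernels, skip_ahead_kernels):
        self._gen = list(generation_kernels)
        self._skip = list(skip_ahead_kernels)

    @property
    def generators_per_block(self):
        # Size the generator states by the generation kernels only.
        return min(k.max_threads_per_block for k in self._gen)

    def skip_ahead_block_size(self):
        # Skip-ahead kernels may allow fewer threads per block.
        return min(k.max_threads_per_block for k in self._skip)

    def skip_ahead_grid(self, block_count):
        # Number of blocks (rounded up) needed so the smaller skip-ahead
        # blocks still cover every generator state; the kernel itself
        # would have to ignore threads past the last state.
        total = block_count * self.generators_per_block
        bs = self.skip_ahead_block_size()
        return (total + bs - 1) // bs, bs


limits = [512, 512, 512, 512, 384, 384, 384, 384, 384]
g = GeneratorSketch(
    [FakeKernel(f"gen{i}", n) for i, n in enumerate(limits[:4])],
    [FakeKernel(f"skip{i}", n) for i, n in enumerate(limits[4:])],
)
print(g.generators_per_block)  # 512
print(g.skip_ahead_grid(30))   # (40, 384), for a hypothetical 30 blocks
```

The trade-off is that skip-ahead would no longer share the generation launch shape, so it needs the separate grid computation shown in skip_ahead_grid.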

Thomas
_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
