Hi John,

[email protected] writes:
> I ran into something strange while experimenting with block configurations.
>
> The device I use has 16384 registers and 16384 bytes of shared memory; the
> function I run uses 62 registers per thread and 224 bytes of shared memory.
> When I run it with a block size of 32*1*1, it reports 4 active blocks per SM,
> limited by registers. However, this makes no sense to me, since 16384/62/32
> = 8.

Try NVIDIA's 'occupancy calculator' Excel sheet. That should help
clarify which resource limits the number of active blocks per SM.
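
As a rough sketch of the arithmetic the spreadsheet does for registers
(not its exact formulas): registers are handed out per block in rounded-up
chunks, so the plain division overestimates. The two granularity values
below are assumptions on my part, typical of what the sheet lists for
devices with 16384 registers per SM (compute capability 1.2/1.3). Check
the sheet's GPU data tab for your device. The function name is made up
for illustration.

```python
def ceil_to(value, unit):
    """Round value up to the next multiple of unit (integers only)."""
    return -(-value // unit) * unit


def blocks_per_sm_by_registers(
    regs_per_thread,
    threads_per_block,
    regs_per_sm=16384,         # from the question
    threads_per_warp=32,
    warp_alloc_granularity=2,  # ASSUMPTION: warps are allocated in pairs
    reg_alloc_unit=512,        # ASSUMPTION: registers allocated in units of 512
):
    """Estimate how many blocks fit on one SM when only registers matter."""
    # Number of warps the block needs, rounded up to whole warps.
    warps_per_block = -(-threads_per_block // threads_per_warp)
    # Warps are rounded up to the allocation granularity.
    alloc_warps = ceil_to(warps_per_block, warp_alloc_granularity)
    # The block's register footprint is rounded up to the allocation unit.
    regs_per_block = ceil_to(
        alloc_warps * regs_per_thread * threads_per_warp, reg_alloc_unit
    )
    return regs_per_sm // regs_per_block


# Plain division, as in the question: no rounding at all.
naive = 16384 // (62 * 32)

# With the assumed allocation granularities.
rounded = blocks_per_sm_by_registers(62, 32)

print(naive, rounded)
```

Under these assumptions a 32-thread block is charged for 2 warps, i.e.
2*62*32 = 3968 registers, rounded up to 4096, and 16384/4096 = 4, which
would match the 4 active blocks you are seeing.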

Andreas

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
