Hi, I ran into something strange while experimenting with block configurations.
The device I use has 16384 registers and 16384 bytes of shared memory; the function I run uses 62 registers per thread and 224 bytes of shared memory. When I run it with a block size of 32*1*1, it reports 4 active blocks per SM and says the limit is registers. However, this does not make sense to me, since 16384/62/32 is about 8. Any ideas?

Cheers,
John
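
To make the arithmetic concrete, here is a minimal sketch in plain Python (no PyCUDA calls). The first function is the straightforward estimate from the message. The second is only one possible explanation: it assumes the hardware allocates registers per block, rounds the warp count up to a multiple of 2, and rounds the register total up to a multiple of 512, as I believe the CUDA Occupancy Calculator does for compute capability 1.2/1.3 devices. The function names and those granularity values are assumptions, not something reported by the tool.

```python
import math


def ceil_to(value, unit):
    """Round value up to the next multiple of unit."""
    return math.ceil(value / unit) * unit


def naive_blocks_per_sm(regs_per_sm, regs_per_thread, threads_per_block):
    """Estimate with no allocation granularity: registers / (regs * threads)."""
    return regs_per_sm // (regs_per_thread * threads_per_block)


def granular_blocks_per_sm(regs_per_sm, regs_per_thread, threads_per_block,
                           warp_size=32,
                           warp_granularity=2,    # assumed, not from the post
                           reg_alloc_unit=512):   # assumed, not from the post
    """Estimate assuming per-block register allocation with rounding."""
    warps = math.ceil(threads_per_block / warp_size)
    alloc_warps = ceil_to(warps, warp_granularity)
    regs_per_block = ceil_to(alloc_warps * warp_size * regs_per_thread,
                             reg_alloc_unit)
    return regs_per_sm // regs_per_block


naive = naive_blocks_per_sm(16384, 62, 32)
granular = granular_blocks_per_sm(16384, 62, 32)
print("naive estimate:", naive)
print("estimate with assumed granularity:", granular)
```

Under these assumed values, a 32-thread block is charged for 2 warps (2 * 32 * 62 = 3968 registers, rounded up to 4096), which would give 4 blocks per SM instead of 8. Whether this device actually uses that granularity would need to be checked against the occupancy calculator.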
_______________________________________________
PyCUDA mailing list
http://lists.tiker.net/listinfo/pycuda
