Hi John,

[email protected] writes:
> I ran into something strange while experimenting with block configurations.
>
> The device I use has 16384 registers and 16384 bytes of shared memory; the
> function I run uses 62 registers per thread and 224 bytes of shared memory.
> When I run it with a block size of 32*1*1, it reports 4 active blocks per SM
> and says the limit is registers. However, this makes no sense to me, since
> 16384/62/32 = 8.

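One possible explanation for seeing 4 rather than 8, sketched under assumptions: the estimate 16384/62/32 treats registers as handed out thread by thread. On compute capability 1.2/1.3 devices (which the 16384-register figure suggests, though the original post does not say), the occupancy calculator appears to allocate registers per block instead, rounding the warp count up to a multiple of 2 and the block's register total up to a multiple of 512. The function name and the default granularity values below are illustrative assumptions, not part of PyCUDA or any NVIDIA API.

```python
def ceil_to(value, unit):
    """Round value up to the next multiple of unit (integers only)."""
    return ((value + unit - 1) // unit) * unit


def blocks_limited_by_registers(threads_per_block, regs_per_thread,
                                regs_per_sm=16384, warp_size=32,
                                warp_granularity=2, reg_alloc_unit=512):
    """Estimate how many blocks fit on one SM when registers are the limit.

    The defaults are ASSUMED values for a compute capability 1.2/1.3
    device; check them against the occupancy calculator for your GPU.
    """
    # Warps per block, rounded up to the assumed warp allocation granularity.
    warps = ceil_to(threads_per_block, warp_size) // warp_size
    warps = ceil_to(warps, warp_granularity)
    # Registers are reserved per block, rounded up to the allocation unit.
    regs_per_block = ceil_to(warps * warp_size * regs_per_thread,
                             reg_alloc_unit)
    return regs_per_sm // regs_per_block


# Naive per-thread estimate from the question.
naive = 16384 // (62 * 32)
# Estimate with the assumed per-block allocation granularity.
granular = blocks_limited_by_registers(32, 62)
print(naive, granular)
```

With these assumed granularities, a 32-thread block is charged for 2 warps (2 * 32 * 62 = 3968 registers, rounded up to 4096), so only 16384 // 4096 blocks fit, which would be consistent with the reported figure.
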
Try using NVIDIA's 'occupancy calculator' Excel sheet. That should help
clarify things.

Andreas

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
