Hi, 

I ran into something strange while experimenting with block configurations.

The device I use has 16384 registers and 16384 bytes of shared memory per
multiprocessor, and the function I run uses 62 registers per thread and 224
bytes of shared memory. When I run it with a block size of 32x1x1, it shows
4 active blocks per SM and reports that the limit comes from registers. That
makes no sense to me, since 16384 / (62 * 32) = 8.26, which should allow 8
blocks.
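A minimal Python sketch of the arithmetic, to show one way the two numbers can
differ. It assumes the allocation rules that NVIDIA's CUDA Occupancy
Calculator lists for compute capability 1.2/1.3 devices (16384 registers per
SM): the warp count of a block is rounded up to a multiple of 2 before
registers are allocated, registers are allocated per block in units of 512,
and at most 8 blocks can be resident per SM. These constants, and every name
in the code, are assumptions for illustration, not taken from the post or from
any particular tool.

```python
# Numbers from the post.
REGS_PER_SM = 16384
SMEM_PER_SM = 16384          # bytes
REGS_PER_THREAD = 62
SMEM_PER_BLOCK = 224         # bytes
THREADS_PER_BLOCK = 32

# ASSUMPTION: compute capability 1.2/1.3 allocation rules, as listed in the
# CUDA Occupancy Calculator. Other architectures use different values.
WARP_SIZE = 32
MAX_BLOCKS_PER_SM = 8
MAX_WARPS_PER_SM = 32
WARP_ALLOC_GRANULARITY = 2   # warps per block rounded up to a multiple of 2
REG_ALLOC_UNIT = 512         # registers per block rounded up to a multiple of 512
SMEM_ALLOC_UNIT = 512        # shared memory per block rounded up (bytes)


def round_up(x, unit):
    """Round x up to the next multiple of unit."""
    return -(-x // unit) * unit


def naive_reg_limit():
    """The estimate from the post: total registers / registers per block."""
    return REGS_PER_SM // (REGS_PER_THREAD * THREADS_PER_BLOCK)


def blocks_per_sm():
    """Resident blocks per SM once allocation granularity is applied."""
    warps = -(-THREADS_PER_BLOCK // WARP_SIZE)           # 1 warp
    reg_warps = round_up(warps, WARP_ALLOC_GRANULARITY)  # counted as 2 warps
    regs_per_block = round_up(reg_warps * WARP_SIZE * REGS_PER_THREAD,
                              REG_ALLOC_UNIT)            # 3968 -> 4096
    smem_per_block = round_up(SMEM_PER_BLOCK, SMEM_ALLOC_UNIT)
    limits = {
        "device": MAX_BLOCKS_PER_SM,
        "warps": MAX_WARPS_PER_SM // warps,
        "regs": REGS_PER_SM // regs_per_block,
        "smem": SMEM_PER_SM // smem_per_block,
    }
    limited_by = min(limits, key=limits.get)
    return limits[limited_by], limited_by


if __name__ == "__main__":
    print("naive register limit:", naive_reg_limit())  # 8
    n, by = blocks_per_sm()
    print("blocks per SM:", n, "limited by", by)       # 4, regs
```

Under these assumptions a 32-thread block is charged for 64 threads' worth of
registers (64 * 62 = 3968, rounded up to 4096), so 16384 / 4096 = 4 blocks fit,
which matches the reported limit.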

Any idea?

Cheers,
John


_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
