Most CUDA programmers would recommend reading the CUDA C Programming
Guide [1] from cover to cover. Appendix F.1 shows that the maximum
number of threads per block is either 512 (compute capability 1.x) or
1024 (2.x and later). 22 by 22 (484 threads) is the largest square
thread block that fits under the 512 limit of the earlier cards, which
presumably is what you have; 23 by 23 would need 529. Note that the
best speed is often not obtained with the biggest block but with
something smaller, like 16x16.

[1] http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Programming_Guide.pdf
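
The arithmetic behind the 22-by-22 figure can be sketched in a few
lines of Python. This is only an illustration: the helper names
largest_square_block and query_max_threads_per_block are made up for
this example, and the second one assumes PyCUDA is installed and a
CUDA device is present (it is defined but not called here).

```python
import math


def largest_square_block(max_threads_per_block):
    """Return the largest n such that an n-by-n block fits within the limit."""
    # An n-by-n block uses n*n threads, so n is the integer square root.
    return math.isqrt(max_threads_per_block)


def query_max_threads_per_block(device_index=0):
    """Ask the driver for the per-block thread limit of one device.

    Hypothetical helper: needs PyCUDA and a CUDA-capable card.
    """
    import pycuda.driver as cuda

    cuda.init()
    dev = cuda.Device(device_index)
    return dev.get_attribute(cuda.device_attribute.MAX_THREADS_PER_BLOCK)


# The two limits mentioned above: 512 (earlier cards) and 1024.
for limit in (512, 1024):
    n = largest_square_block(limit)
    print(f"limit {limit}: largest square block is {n}x{n} ({n * n} threads)")
```

With a limit of 512 this gives 22, matching what you found by trial
and error; on a card with a limit of 1024 it gives 32.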

On Mon, Nov 5, 2012 at 5:36 PM, Rui Lopes <[email protected]> wrote:
> Hello everybody,
>
> As suggested I worked on matrixMul in order to have my custom gpudot. I've
> attached the test file.
> By trial and error I figured that I could not use block sizes bigger than
> 22. Is this imposed by graphic card type/version or am I missing something
> important in the example's code? (as you may have noticed I did not need to
> change much to obtain the same result as in numpy)
>
> Best regards,
> Rui
>
> _______________________________________________
> PyCUDA mailing list
> [email protected]
> http://lists.tiker.net/listinfo/pycuda
>



-- 
--
Ahmed Fasih
[email protected]
[email protected]
614 547 3323 (Google Voice)
