Hi Jozef,

On Saturday 07 February 2009, Jozef Vesely wrote:
> I figured out how to use all 16 bytes.
> No need to bother with block/grid sizes since
> for huge arrays each thread computes multiple values
> anyway. So I just 4x"unrolled" the loop.
>
> Here is the speedup:
>
> Size     |NewTime|OldTime|Ratio
> ---------+-------+-------+-----
> [snip]

Sweet. I've committed this version of your RNG. Thanks very much for your 
contribution.

Andreas

_______________________________________________
PyCuda mailing list
[email protected]
http://tiker.net/mailman/listinfo/pycuda_tiker.net
