Hi Jozef,

On Saturday 07 February 2009, Jozef Vesely wrote:
> I figured out how to use all 16 bytes.
> No need to bother with block/grid sizes since
> for huge arrays each thread computes multiple values
> anyway. So I just 4x"unrolled" the loop.
>
> Here is the speedup:
>
> Size |NewTime|OldTime|Ratio
> ---------+-------+-------+-----
> [snip]

Sweet. I've committed this version of your RNG. Thanks very much for your
contribution.

Andreas
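
For readers following the thread, the 4x unrolling described in the quoted message can be sketched roughly as follows. This is a CPU emulation in Python, not the committed kernel. It assumes the 16 bytes are a 128-bit hash digest (MD5 is used here purely as a stand-in), and the names `fill_uniform`, `total_threads`, and `seed` are hypothetical.

```python
import hashlib
import struct


def fill_uniform(n, total_threads=8, seed=0):
    """Emulate a strided kernel in which every 16-byte digest yields four values."""
    out = [0.0] * n
    for tid in range(total_threads):  # each pass plays the role of one GPU thread
        # Stride by 4 * total_threads: one digest covers four consecutive outputs,
        # so the launch configuration does not have to match the array size.
        for base in range(4 * tid, n, 4 * total_threads):
            # Assumption: hash (seed, base) with MD5 to obtain 16 pseudo-random bytes.
            digest = hashlib.md5(struct.pack("<II", seed, base)).digest()
            words = struct.unpack("<4I", digest)  # 16 bytes -> four 32-bit words
            for k, w in enumerate(words):
                if base + k < n:  # guard the tail when n is not a multiple of 4
                    out[base + k] = w / 2.0**32  # scale to [0, 1)
    return out
```

Because every digest depends only on the seed and the output index, the result does not change with the number of emulated threads, which is one way to read the remark that block/grid sizes need no special handling.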
_______________________________________________
PyCuda mailing list
[email protected]
http://tiker.net/mailman/listinfo/pycuda_tiker.net
