Pierre Castellani <[email protected]> writes:
> I have bought a Kepler GPU in order to do some numerical calculations on it.
>
> I would like to use PyCUDA (it looks to me like the best solution).
>
> Unfortunately, when I run a test like
> MeasureGpuarraySpeedRandom
> <http://wiki.tiker.net/PyCuda/Examples/MeasureGpuarraySpeedRandom>
>
> I get the following results:
> Size     |Time GPU       |Size/Time GPU|Time CPU         |Size/Time CPU|GPU vs CPU speedup
> ---------+---------------+-------------+-----------------+-------------+------------------
> 1024     |0.0719905126953|14224.0965047|3.09289598465e-05|33108129.2446|0.000429625497701
> 2048     |0.0727789160156|28140.0179079|5.74035215378e-05|35677253.6795|0.000788738341822
> 4096     |0.07278515625  |56275.2106478|0.00010898976326 |37581511.1208|0.00149741745261
> 8192     |0.0722379931641|113402.928863|0.000164551048279|49783942.9508|0.00227790171171
> 16384    |0.0720771630859|227311.94318 |0.000254381122589|64407294.9802|0.00352928877467
> 32768    |0.0722085107422|453796.923149|0.00044281665802 |73999022.8609|0.0061324718301
> 65536    |0.0720480078125|909615.713047|0.000749320983887|87460516.133 |0.0104003012247
> 131072   |0.0723209472656|1812365.64171|0.00153271682739 |85516122.5202|0.0211932626071
> 262144   |0.0727287304688|3604407.75345|0.00305026916504 |85941268.0706|0.041940360369
> 524288   |0.0723101269531|7250547.35888|0.00601688781738 |87136076.9741|0.0832094766101
> 1048576  |0.0627352734375|16714297.1178|0.0123564978027  |84860291.0582|0.196962524042
> 2097152  |0.0743136047363|28220297.0431|0.026837512207   |78142563.4322|0.361138613882
> 4194304  |0.074144744873 |56569133.8905|0.0583531860352  |71877891.9367|0.787017153206
> 8388608  |0.0736544189453|113891442.226|0.121150952148   |69240958.0877|1.64485653248
> 16777216 |0.0743454406738|225665701.191|0.242345166016   |69228597.6891|3.2597179305
> 33554432 |0.0765948486328|438076875.912|0.484589794922   |69242960.4412|6.32666300112
> 67108864 |0.0805058410645|833589999.343|0.970654882812   |69137718.45  |12.0569497813
> 134217728|0.0846059753418|1586385919.64|1.94103554688    |69147485.8439|22.9420621774
> 268435456|0.094531427002 |2839642482.01|3.88270039062    |69136278.6189|41.0731173089
> 536870912|0.111502416992 |4814881385.37|7.7108625        |69625273.6967|69.1542184286
>
>
> I was not expecting fantastic results, but not this bad either.

I've added a note to the documentation of the function you're using to 
benchmark:

http://documen.tician.de/pycuda/array.html#pycuda.curandom.rand

That should answer your concerns.

I'd like to have a word with whoever came up with the idea that this was
a valid benchmark. Random number generation is a poor problem to
benchmark with: parallel RNGs are more complicated than sequential ones,
so claiming that both do the same amount of work is... mistaken. But even
neglecting this basic fact, the notion that all RNGs are somehow
comparable, or do comparable amounts of work, is also completely
off. There are subtle trade-offs between how much work an RNG does and
how 'good' (uncorrelated, ...) the resulting sequence and its
subsequences are:

https://www.xkcd.com/221/
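To make the "parallel RNGs do extra work" point concrete, here is a small
sketch (using NumPy's SeedSequence/Philox machinery, not PyCUDA's curandom,
so this is an illustration of the principle rather than what the benchmark
actually runs): each parallel worker needs its own statistically
independent, non-overlapping stream, which takes deliberate bookkeeping
that a single sequential generator never pays for.

```python
import numpy as np

# Spawn independent child seeds from one root seed; each child drives
# its own counter-based (Philox) generator, so the four streams are
# guaranteed not to overlap -- extra work a sequential RNG doesn't do.
root = np.random.SeedSequence(12345)
streams = [np.random.Generator(np.random.Philox(s)) for s in root.spawn(4)]

# Each "worker" draws from its own stream.
samples = [g.random(5) for g in streams]
```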

If you'd like to assess how viable GPUs and PyCUDA are, I'd suggest you
use a more well-defined workload, such as "compute 10^8 sines and
cosines", or, even better, the thing that you'd actually like to do.
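A minimal sketch of what such a better-defined benchmark could look like,
here timing only the CPU side with NumPy (the element count, repeat count,
and helper name are my own choices, not from the original thread); the GPU
side would time the same operation on a pycuda.gpuarray via
pycuda.cumath.sin and pycuda.cumath.cos, remembering to synchronize before
stopping the clock:

```python
import time
import numpy as np

def bench_sincos(n, repeats=3):
    """Time computing n sines and cosines; return best rate in pairs/sec."""
    x = np.random.rand(n).astype(np.float32)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        s = np.sin(x)
        c = np.cos(x)
        best = min(best, time.perf_counter() - t0)
    return n / best

if __name__ == "__main__":
    # Sweep sizes the same way the wiki example does, up to the
    # 10^8-element regime where the GPU should start to pull ahead.
    for n in (2**16, 2**20, 2**24):
        print(f"{n:>10} elements: {bench_sincos(n):.3e} sin/cos pairs/sec")
```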

Andreas

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
