Pierre Castellani <[email protected]> writes:

> I have bought a Kepler GPU in order to do some numerical calculation on it.
>
> I would like to use PyCUDA (it looks to me like the best solution).
>
> Unfortunately, when I am running a test like MeasureGpuarraySpeedRandom
> <http://wiki.tiker.net/PyCuda/Examples/MeasureGpuarraySpeedRandom?action=fullsearch&value=linkto%3A%22PyCuda%2FExamples%2FMeasureGpuarraySpeedRandom%22&context=180>
>
> I get the following results:
>
> Size     |Time GPU       |Size/Time GPU|Time CPU         |Size/Time CPU|GPU vs CPU speedup
> ---------+---------------+-------------+-----------------+-------------+------------------
> 1024     |0.0719905126953|14224.0965047|3.09289598465e-05|33108129.2446|0.000429625497701
> 2048     |0.0727789160156|28140.0179079|5.74035215378e-05|35677253.6795|0.000788738341822
> 4096     |0.07278515625  |56275.2106478|0.00010898976326 |37581511.1208|0.00149741745261
> 8192     |0.0722379931641|113402.928863|0.000164551048279|49783942.9508|0.00227790171171
> 16384    |0.0720771630859|227311.94318 |0.000254381122589|64407294.9802|0.00352928877467
> 32768    |0.0722085107422|453796.923149|0.00044281665802 |73999022.8609|0.0061324718301
> 65536    |0.0720480078125|909615.713047|0.000749320983887|87460516.133 |0.0104003012247
> 131072   |0.0723209472656|1812365.64171|0.00153271682739 |85516122.5202|0.0211932626071
> 262144   |0.0727287304688|3604407.75345|0.00305026916504 |85941268.0706|0.041940360369
> 524288   |0.0723101269531|7250547.35888|0.00601688781738 |87136076.9741|0.0832094766101
> 1048576  |0.0627352734375|16714297.1178|0.0123564978027  |84860291.0582|0.196962524042
> 2097152  |0.0743136047363|28220297.0431|0.026837512207   |78142563.4322|0.361138613882
> 4194304  |0.074144744873 |56569133.8905|0.0583531860352  |71877891.9367|0.787017153206
> 8388608  |0.0736544189453|113891442.226|0.121150952148   |69240958.0877|1.64485653248
> 16777216 |0.0743454406738|225665701.191|0.242345166016   |69228597.6891|3.2597179305
> 33554432 |0.0765948486328|438076875.912|0.484589794922   |69242960.4412|6.32666300112
> 67108864 |0.0805058410645|833589999.343|0.970654882812   |69137718.45  |12.0569497813
> 134217728|0.0846059753418|1586385919.64|1.94103554688    |69147485.8439|22.9420621774
> 268435456|0.094531427002 |2839642482.01|3.88270039062    |69136278.6189|41.0731173089
> 536870912|0.111502416992 |4814881385.37|7.7108625        |69625273.6967|69.1542184286
>
> I was not expecting fantastic results, but not that bad.
I've added a note to the documentation of the function you're using to benchmark:

http://documen.tician.de/pycuda/array.html#pycuda.curandom.rand

That should answer your concerns.

I'd like to have a word with whoever came up with the idea that this was a valid benchmark. Random number generation is a bad problem to use. Parallel RNGs are more complicated than sequential ones, so claiming that both do the same amount of work is... mistaken.

But even neglecting this basic fact, the notion that all RNGs are somehow comparable, or do comparable amounts of work, is also completely off. There are subtle tradeoffs in how much work is done and how 'good' (uncorrelated, ...) the RN sequence and its subsequences are:

https://www.xkcd.com/221/

If you'd like to assess how viable GPUs and PyCUDA are, I'd suggest you use a more well-defined workload, such as "compute 10^8 sines and cosines", or, even better, the thing that you'd actually like to do.

Andreas

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
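[Editorial note: a minimal CPU-side sketch of the well-defined workload Andreas suggests, written with NumPy so it runs without a GPU. The function name `time_sincos` and the default sizes are illustrative, not from the thread; the GPU counterpart would apply `pycuda.cumath.sin`/`cos` to a `pycuda.gpuarray.GPUArray` instead, remembering to synchronize the context before stopping the clock so kernel launches are not timed as if they had already finished.]

```python
import time
import numpy as np

def time_sincos(n, repeats=3):
    """Time computing n sines and cosines on the CPU with NumPy.

    Returns the best wall-clock time over `repeats` runs (taking the
    best mitigates one-off noise such as cache warm-up).
    """
    x = np.random.uniform(0.0, 2.0 * np.pi, n)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        s = np.sin(x)
        c = np.cos(x)
        best = min(best, time.perf_counter() - t0)
    # sanity check on the result: sin^2 + cos^2 == 1
    assert np.allclose(s * s + c * c, 1.0)
    return best

if __name__ == "__main__":
    n = 10**6  # scale toward 10**8 for a serious comparison
    t = time_sincos(n)
    print(f"{n} sin/cos pairs in {t:.4f} s ({n / t:.3e} elements/s)")
```

Unlike the RNG benchmark above, both the CPU and GPU versions of this workload perform exactly the same arithmetic, so the throughput numbers are directly comparable.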
