Below I'm posting the results of running "python measure_gpuarray_speed_random.py" on a Windows XP desktop with a 9800GT card and on a MacBook with a 9400M card.
Whilst any real speed-ups are likely to depend on the type of problem you're solving, I figure that sharing these numbers might help others decide what kind of card they might need to experiment with. I think I'm right in saying that this particular test just fills increasingly large arrays with random numbers via the GPU and the CPU, so mostly we're looking at memory operations rather than raw processing power?

I'm using:
Python 2.6
pyCUDA 0.94beta (as of the first week of January)
numpy 1.4
Boost 1.38 (Windows) and 1.41 (Mac)
CUDA 2.3

"python measure_gpuarray_speed_random.py" on Windows XP using 9800GT with an Intel Core 2 Duo CPU at 2.66GHz (though only 1 CPU seems to be used), 1GB RAM
====
1024
kernel.cu
tmpxft_00000d5c_00000000-3_kernel.cudafe1.gpu
tmpxft_00000d5c_00000000-8_kernel.cudafe2.gpu
kernel.cu
tmpxft_00000b98_00000000-3_kernel.cudafe1.gpu
tmpxft_00000b98_00000000-8_kernel.cudafe2.gpu
2048
<snip>
16777216
Size    |Time GPU        |Size/Time GPU|Time CPU         |Size/Time CPU|GPU vs CPU speedup
--------+----------------+-------------+-----------------+-------------+------------------
1024    |0.003523625     |290609.812338|4.28882865906e-05|23875982.9642|0.0121716376148
2048    |0.00218711889648|936391.708422|5.3062335968e-05 |38596114.6007|0.0242612946435
4096    |0.0021967746582 |1864551.73484|0.000110264831543|37146930.1924|0.0501939655627
8192    |0.00221661279297|3695728.91846|0.000205230010986|39916189.4531|0.0925872175949
16384   |0.00223460498047|7331944.63594|0.000410583557129|39904179.5891|0.183738763995
32768   |0.00231461450195|14157001.0783|0.00081089276123 |40409782.3617|0.350335989231
65536   |0.00240575585938|27241334.4624|0.00189025       |34670546.224 |0.785719794731
131072  |0.00242032714844|54154662.5565|0.00420794775391 |31148675.7121|1.73858635459
262144  |0.00271729394531|96472448.4269|0.00867364941406 |30223033.868 |3.19201734837
524288  |0.00303069091797|172992896.402|0.0177443378906  |29546777.3005|5.85488206185
1048576 |0.00344940429687|303987561.258|0.035467828125   |29564144.6187|10.282305312
2097152 |0.00441471038818|475037276.65 |0.0707175488281  |29655326.5031|16.0186156305
4194304 |0.0064045324707 |654896203.46 |0.150012763672   |27959647.5482|23.4229062555
8388608 |0.010213684082  |821310697.749|0.278191074219   |30154123.4691|27.2370940774
16777216|0.0179603649902 |934124446.197|0.556249179688   |30161331.6705|30.9709284856
====

"python measure_gpuarray_speed_random.py" on Mac OS X (Leopard) using 9400M with an Intel Core 2 Duo at 2GHz (again only 1 CPU seems to be used), 2GB RAM
====
1024
<snip>
16777216
Size    |Time GPU        |Size/Time GPU|Time CPU         |Size/Time CPU|GPU vs CPU speedup
--------+----------------+-------------+-----------------+-------------+------------------
1024    |0.00362544628906|282447.985256|4.3817150116e-05 |23369844.8505|0.0120860017284
2048    |0.00307543896484|665921.198051|0.000121678268433|16831271.7331|0.0395645206501
4096    |0.00287928857422|1422573.6304 |0.000244288223267|16767079.2526|0.0848432579679
8192    |0.00307496777344|2664092.96083|0.000451083526611|18160716.4011|0.146695367187
16384   |0.00307557250977|5327138.26384|0.000869509094238|18842816.146 |0.282714548747
32768   |0.0033080065918 |9905663.45341|0.00175915466309 |18627128.5224|0.531786927949
65536   |0.00350910644531|18675979.4897|0.00342910791016 |19111676.1902|0.977202590915
131072  |0.00442680712891|29608699.0427|0.00722674365234 |18137076.1584|1.63249571122
262144  |0.00605847070312|43269005.141 |0.0149418457031  |17544285.0374|2.46627349298
524288  |0.00753005224609|69626077.3319|0.0304311484375  |17228662.9628|4.04129313356
1048576 |0.012277375     |85407181.9098|0.0603436679688  |17376736.2061|4.91503012401
2097152 |0.0224226953125 |93528096.0104|0.133947558594   |15656515.2961|5.97374921823
4194304 |0.0410210205078 |102247675.657|0.242842695312   |17271691.0204|5.91995743417
8388608 |0.0780941503906 |107416598.529|0.482362148437   |17390684.6281|6.17667451435
16777216|0.153209472656  |109505082.872|0.949726328125   |17665316.316 |6.19887472791
====

Conclusion?
Is it OK to say that the 9800GT GPU is roughly 8.5x faster than the 9400M on this test at the 16777216 problem size (the GPU time taken is 0.018 vs 0.153)? Is the timing dependent on the bus speed at all, or is the GPU time purely down to the GPU's speed?

Cheers,
Ian.

--
Ian Ozsvald (Professional Screencaster)
http://ProCasts.co.uk/examples.html
http://TheScreencastingHandbook.com
http://IanOzsvald.com + http://ShowMeDo.com
http://twitter.com/ianozsvald
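
P.S. As a rough sanity check on the comparison asked about above, the ratios can be recomputed from the table values. This is only a minimal sketch in plain Python (no pyCUDA needed). The dictionary layout and the names `results`, `speedup`, `throughput` and `card_ratio` are my own and are not part of measure_gpuarray_speed_random.py; only the rows for the largest problem size are copied in, and I'm assuming the "GPU vs CPU speedup" column is simply Time CPU divided by Time GPU, which is what the numbers suggest.

```python
# Values copied from the last row (Size = 16777216) of the two tables.
SIZE = 16777216

results = {
    "9800GT (Windows XP)": {"gpu": 0.0179603649902, "cpu": 0.556249179688},
    "9400M (Mac OS X)": {"gpu": 0.153209472656, "cpu": 0.949726328125},
}


def speedup(row):
    """Assumed definition of the last table column: CPU time / GPU time."""
    return row["cpu"] / row["gpu"]


def throughput(size, seconds):
    """Elements processed per unit time (the Size/Time columns)."""
    return size / seconds


for name, row in sorted(results.items()):
    print("%s: GPU vs CPU speedup %.2f, Size/Time GPU %.0f"
          % (name, speedup(row), throughput(SIZE, row["gpu"])))

# Card-to-card comparison: how many times longer the 9400M takes on the GPU.
card_ratio = (results["9400M (Mac OS X)"]["gpu"]
              / results["9800GT (Windows XP)"]["gpu"])
print("9800GT vs 9400M GPU time ratio: %.2f" % card_ratio)
```

With the table values the card-to-card ratio comes out at roughly 8.5 for this problem size. It only compares GPU times on two different machines, so it says nothing by itself about how much of the difference is down to the bus rather than the card.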
