On Sun, Oct 25, 2009 at 1:22 PM, Antoine Pitrou <solip...@pitrou.net> wrote:
> Having other people test it would be fine. Even better if you have an
> actual multi-threaded py3k application. But ccbench results for other
> OSes would be nice too :-)
My results for a 2.4 GHz Intel Core 2 Duo MacBook Pro (OS X 10.5.8):

Control (py3k @ r75723)

--- Throughput ---

Pi calculation (Python)

threads=1: 633 iterations/s.
threads=2: 468 ( 74 %)
threads=3: 443 ( 70 %)
threads=4: 442 ( 69 %)

regular expression (C)

threads=1: 281 iterations/s.
threads=2: 282 ( 100 %)
threads=3: 282 ( 100 %)
threads=4: 282 ( 100 %)

bz2 compression (C)

threads=1: 379 iterations/s.
threads=2: 735 ( 193 %)
threads=3: 733 ( 193 %)
threads=4: 724 ( 190 %)

--- Latency ---

Background CPU task: Pi calculation (Python)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 1 ms. (std dev: 1 ms.)
CPU threads=2: 1 ms. (std dev: 2 ms.)
CPU threads=3: 3 ms. (std dev: 6 ms.)
CPU threads=4: 2 ms. (std dev: 3 ms.)

Background CPU task: regular expression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 975 ms. (std dev: 577 ms.)
CPU threads=2: 1035 ms. (std dev: 571 ms.)
CPU threads=3: 1098 ms. (std dev: 556 ms.)
CPU threads=4: 1195 ms. (std dev: 557 ms.)

Background CPU task: bz2 compression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 2 ms.)
CPU threads=2: 4 ms. (std dev: 5 ms.)
CPU threads=3: 0 ms. (std dev: 0 ms.)
CPU threads=4: 1 ms. (std dev: 4 ms.)

Experiment (newgil branch @ r75723)

--- Throughput ---

Pi calculation (Python)

threads=1: 651 iterations/s.
threads=2: 643 ( 98 %)
threads=3: 637 ( 97 %)
threads=4: 625 ( 95 %)

regular expression (C)

threads=1: 298 iterations/s.
threads=2: 296 ( 99 %)
threads=3: 288 ( 96 %)
threads=4: 287 ( 96 %)

bz2 compression (C)

threads=1: 378 iterations/s.
threads=2: 720 ( 190 %)
threads=3: 724 ( 191 %)
threads=4: 718 ( 189 %)

--- Latency ---

Background CPU task: Pi calculation (Python)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 1 ms.)
CPU threads=2: 0 ms. (std dev: 1 ms.)
CPU threads=3: 0 ms. (std dev: 0 ms.)
CPU threads=4: 1 ms. (std dev: 5 ms.)

Background CPU task: regular expression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 1 ms. (std dev: 0 ms.)
CPU threads=2: 2 ms. (std dev: 1 ms.)
CPU threads=3: 2 ms. (std dev: 2 ms.)
CPU threads=4: 2 ms. (std dev: 1 ms.)

Background CPU task: bz2 compression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 0 ms.)
CPU threads=2: 2 ms. (std dev: 3 ms.)
CPU threads=3: 0 ms. (std dev: 1 ms.)
CPU threads=4: 0 ms. (std dev: 0 ms.)

I also ran this through Unladen Swallow's threading microbenchmark, which is
a straight copy of what David Beazley was experimenting with (simply
iterating over 1000000 ints in pure Python) [1]. "iterative_count" runs the
loops one after the other; "threaded_count" runs the loops in parallel using
threads. The results below benchmark py3k as the control and newgil as the
experiment; where it says "x% faster", that is a measure of newgil's
performance relative to py3k's.

With two threads:

iterative_count:
Min: 0.336573 -> 0.387782: 13.21% slower
# I've run this configuration multiple times and gotten the same slowdown.
Avg: 0.338473 -> 0.418559: 19.13% slower
Significant (t=-38.434785, a=0.95)

threaded_count:
Min: 0.529859 -> 0.397134: 33.42% faster
Avg: 0.581786 -> 0.429933: 35.32% faster
Significant (t=70.100445, a=0.95)

With four threads:

iterative_count:
Min: 0.766617 -> 0.734354: 4.39% faster
Avg: 0.771954 -> 0.751374: 2.74% faster
Significant (t=22.164103, a=0.95)
Stddev: 0.00262 -> 0.00891: 70.53% larger

threaded_count:
Min: 1.175750 -> 0.829181: 41.80% faster
Avg: 1.224157 -> 0.867506: 41.11% faster
Significant (t=161.715477, a=0.95)
Stddev: 0.01900 -> 0.01120: 69.65% smaller

With eight threads:

iterative_count:
Min: 1.527794 -> 1.447421: 5.55% faster
Avg: 1.536911 -> 1.479940: 3.85% faster
Significant (t=35.559595, a=0.95)
Stddev: 0.00394 -> 0.01553: 74.61% larger

threaded_count:
Min: 2.424553 -> 1.677180: 44.56% faster
Avg: 2.484922 -> 1.723093: 44.21% faster
Significant (t=184.766131, a=0.95)
Stddev: 0.02874 -> 0.02956: 2.78% larger

I'd be interested in multithreaded benchmarks with
less-homogeneous workloads.

Collin Winter

[1] - http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_threading.py
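For anyone who wants the shape of the microbenchmark without digging into the
Unladen Swallow repository, here is a minimal sketch of the Beazley-style
counting test described above. The names iterative_count and threaded_count
match the benchmark; the exact loop body and timing harness here are my
assumptions, not a copy of bm_threading.py.

```python
import threading
import time

N = 1_000_000  # loop size from the benchmark description


def count(n):
    # Pure-Python, CPU-bound busy loop (assumed loop body); the
    # interpreter must hold the GIL for every iteration.
    while n > 0:
        n -= 1


def iterative_count(num_loops):
    # Run the loops back to back on a single thread.
    start = time.perf_counter()
    for _ in range(num_loops):
        count(N)
    return time.perf_counter() - start


def threaded_count(num_loops):
    # Run the same loops concurrently. Because the work is pure Python,
    # the GIL serializes it; with the old GIL on multicore machines this
    # is often *slower* than the serial version, which is the effect the
    # numbers above measure.
    threads = [threading.Thread(target=count, args=(N,))
               for _ in range(num_loops)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"iterative: {iterative_count(2):.3f}s")
    print(f"threaded:  {threaded_count(2):.3f}s")
```

Comparing the two timings on a multicore box gives a quick read on how much
GIL contention costs for homogeneous CPU-bound threads.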