On 11/02/2012 07:36 PM, Alejandro Pulver wrote:
> On 11/02/2012 06:40 PM, Taavi Burns wrote:
>> I see that you're reading from the compressed zip file directly. That makes
>> me suspect that your map/reduce is waiting for data from the
>> single-CPU-bound job of zip decompression.
>>
>> Try decompressing the archive first, and make sure all the files fit into
>> your OS' disk cache (or flush the cache between tests).
> Sorry for the confusion. I've also been testing that version (which
> actually runs a minute faster for CPython), but the times in my previous
> mail are from "test1" which reads from disk files.
>
> Also note that the zip is a 300MB file I created from the extracted
> files, not the 30MB 7z which would probably take too long to extract on
> the fly.
>
Well, now that I mention it, there is something strange in these results as well (using "test2", the version which reads from a ZIP archive):
$ time python test_mapreduce.py
170686
python test_mapreduce.py  1869.19s user 11.44s system 357% cpu 8:46.44 total

$ time ~/Downloads/pypy-1.9/bin/pypy test_mapreduce.py
170685
~/Downloads/pypy-1.9/bin/pypy test_mapreduce.py  889.64s user 15.32s system 182% cpu 8:17.20 total

So CPython seems to run faster than before without consuming more CPU (which is strange, since it's also decompressing), and PyPy is taking about twice as long as before.

In an earlier version, I used a global variable for the opened zip file and accessed it from "func_map"; CPython behaved the same, but PyPy consumed all my RAM and ran faster (instead of slower, as the result above shows).

BTW, the result differs between CPython and PyPy (PyPy counts one word fewer: 170685 vs. 170686). This might point to a bug.

Regards,
Alejandro
_______________________________________________
pypy-dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/pypy-dev
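For readers following the thread: the message contrasts map workers that read their input through a ZIP archive, and mentions an earlier variant that kept the opened archive in a global variable used from "func_map". The actual test_mapreduce.py was not posted here, so the following is only a minimal sketch of those two access patterns, written for Python 3 and assuming a simple whitespace word count over every member of the archive. All names (`init_worker`, `map_with_global`, `map_reopen`, `word_count`) are hypothetical and are not taken from the original script.

```python
import sys
import zipfile
from collections import Counter
from multiprocessing import Pool

# Hypothetical per-process global: each worker opens the archive once
# (via the Pool initializer) and reuses the handle for every map call.
_archive = None


def init_worker(path):
    """Runs once in each worker process and opens the ZIP archive."""
    global _archive
    _archive = zipfile.ZipFile(path)


def map_with_global(member):
    """Map step reusing the per-process archive handle (global variant)."""
    with _archive.open(member) as f:
        return Counter(f.read().decode("utf-8", errors="replace").split())


def map_reopen(args):
    """Map step that opens the archive anew for every member."""
    path, member = args
    with zipfile.ZipFile(path) as zf, zf.open(member) as f:
        return Counter(f.read().decode("utf-8", errors="replace").split())


def word_count(path, processes=4, use_global=True):
    """Map over all archive members in a Pool, then reduce by summing Counters."""
    with zipfile.ZipFile(path) as zf:
        members = [n for n in zf.namelist() if not n.endswith("/")]
    total = Counter()
    if use_global:
        with Pool(processes, initializer=init_worker, initargs=(path,)) as pool:
            for partial in pool.imap_unordered(map_with_global, members):
                total.update(partial)  # reduce step
    else:
        with Pool(processes) as pool:
            tasks = [(path, m) for m in members]
            for partial in pool.imap_unordered(map_reopen, tasks):
                total.update(partial)  # reduce step
    return total


if __name__ == "__main__" and len(sys.argv) > 1:
    counts = word_count(sys.argv[1])
    print(len(counts))  # number of distinct words
```

In both variants decompression happens inside the worker processes; the difference is only whether each worker keeps one archive handle open for the lifetime of the pool or reopens the archive for every member.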
