On 11/02/2012 07:36 PM, Alejandro Pulver wrote:
> On 11/02/2012 06:40 PM, Taavi Burns wrote:
>> I see that you're reading from the compressed zip file directly. That makes 
>> me suspect that your map/reduce is waiting for data from the 
>> single-CPU-bound job of zip decompression.
>>
>> Try decompressing the archive first, and make sure all the files fit into 
>> your OS' disk cache (or flush the cache between tests).
> Sorry for the confusion. I've also been testing that version (which
> actually runs a minute faster for CPython), but the times in my previous
> mail are from "test1" which reads from disk files.
>
> Also note that the zip is a 300MB file I created from the extracted
> files, not the 30MB 7z which would probably take too long to extract on
> the fly.
>
Well, now that I mention it, there is something strange in these results
as well (using "test2", the version which reads from a ZIP archive):

$ time python test_mapreduce.py
170686
python test_mapreduce.py  1869.19s user 11.44s system 357% cpu 8:46.44 total

$ time ~/Downloads/pypy-1.9/bin/pypy test_mapreduce.py
170685
~/Downloads/pypy-1.9/bin/pypy test_mapreduce.py  889.64s user 15.32s system 182% cpu 8:17.20 total

So CPython seems to run faster without consuming more CPU (which is
strange since it's decompressing). And PyPy is taking about twice as
long as before.
In an earlier version, I used a global variable for opening the zip, and
used it from "func_map"; CPython worked the same, but PyPy consumed all
my RAM and ran faster (instead of slower, as the result above shows).
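
To make the two variants concrete, here is a minimal sketch of the
difference, not the actual test code. The archive contents, ZIP_PATH,
the two func_map_* names, _ARCHIVE and count_words are all hypothetical
stand-ins; only the idea (open the zip per call vs. keep it in a global)
comes from the description above.

```python
import os
import tempfile
import zipfile
from collections import Counter

# Build a tiny archive so the sketch is self-contained
# (a hypothetical stand-in for the real 300MB zip).
tmpdir = tempfile.mkdtemp()
ZIP_PATH = os.path.join(tmpdir, "sample.zip")
with zipfile.ZipFile(ZIP_PATH, "w") as zf:
    zf.writestr("a.txt", "spam eggs spam")
    zf.writestr("b.txt", "eggs ham")


def func_map_per_call(name):
    # Variant A: open the archive inside every call; the handle is
    # closed again as soon as the member has been read.
    with zipfile.ZipFile(ZIP_PATH) as archive:
        return Counter(archive.read(name).decode("utf-8").split())


_ARCHIVE = None  # Variant B: module-level handle, opened on first use


def func_map_global(name):
    # Reuse one ZipFile object for all calls made in this process.
    global _ARCHIVE
    if _ARCHIVE is None:
        _ARCHIVE = zipfile.ZipFile(ZIP_PATH)
    return Counter(_ARCHIVE.read(name).decode("utf-8").split())


def count_words(func_map, names):
    # Sequential stand-in for the map/reduce driver: merge per-file counts.
    total = Counter()
    for name in names:
        total.update(func_map(name))
    return total


names = ["a.txt", "b.txt"]
per_call = count_words(func_map_per_call, names)
shared = count_words(func_map_global, names)
_ARCHIVE.close()
```

Both variants should produce identical counts; what differs is how long
the ZipFile object (and whatever it references) stays alive, which may
or may not be related to the memory growth I saw under PyPy.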

BTW, the results differ between CPython and PyPy (PyPy counts one word
fewer: 170685 vs. 170686). This might point to a bug.
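
One way to track this down would be to compare the two word sets
directly. A minimal sketch, assuming each interpreter's run can hand
over (or dump) its words as a plain iterable; diff_words and the sample
lists are hypothetical, not part of test_mapreduce.py:

```python
def diff_words(cpython_words, pypy_words):
    # Return the words seen by only one of the two interpreters.
    a, b = set(cpython_words), set(pypy_words)
    return sorted(a - b), sorted(b - a)


# Toy data standing in for the real results of each run.
only_cpython, only_pypy = diff_words(["ham", "spam", "eggs"],
                                     ["spam", "eggs"])
print(only_cpython, only_pypy)
```

Whatever ends up in only_cpython would be the word to look at first.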

Regards,
Alejandro
_______________________________________________
pypy-dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/pypy-dev
