I see that you're reading from the compressed zip file directly. That makes me 
suspect that your map/reduce workers are waiting for data from zip 
decompression, which is a CPU-bound job running on a single core.

Try decompressing the archive first, and make sure all the files fit into your 
OS's disk cache (or flush the cache between tests).
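
A minimal sketch of that comparison, assuming the data sits in a plain zip 
archive. The helper names (extract_archive, time_plain_read, time_zip_read) 
are hypothetical and not taken from test_mapreduce.py. It extracts the 
archive once, then times reading the extracted files against reading the 
members straight out of the archive.

```python
import os
import time
import zipfile


def extract_archive(zip_path, dest_dir):
    """Decompress every member of zip_path into dest_dir once, up front."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)


def list_files(root):
    """Return the paths of all regular files under root, sorted."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)


def time_plain_read(paths):
    """Read each extracted file; return (total_bytes, elapsed_seconds)."""
    start = time.time()
    total = 0
    for path in paths:
        with open(path, "rb") as handle:
            total += len(handle.read())
    return total, time.time() - start


def time_zip_read(zip_path):
    """Read each member from the archive, decompressing on the fly.

    Returns (total_bytes, elapsed_seconds).
    """
    start = time.time()
    total = 0
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            total += len(archive.read(info))
    return total, time.time() - start
```

If time_zip_read takes much longer than time_plain_read on the same data, 
decompression is a likely bottleneck, and the worker processes should be 
handed paths to the extracted files instead of the archive.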

-- 
taa
/*eof*/

On 2012-11-02, at 16:00, Alejandro Pulver <[email protected]> wrote:

> Hello,
> 
> As part of an introductory course of computational neuroscience, we
> learned the basics of NLTK to analyze wikileaks.
> 
> On my own, I tried PyPy 1.9 (under Ubuntu 12.04 64-bits) and a simple
> MapReduce scheme as an attempt to improve performance. There are 14266
> files under "cable", adding up to 1.2GB. It can be downloaded as a 30MB
> compressed 7z here:
> http://www.dc.uba.ar/materias/incc/practicas/p2/nltk/wikis.7z
> 
> The results are:
> 
> $ time python test_mapreduce.py
> 170686
> python test_mapreduce.py  1897.59s user 13.10s system 338% cpu 9:24.29 total
> 
> $ time ~/Downloads/pypy-1.9/bin/pypy test_mapreduce.py
> 170685
> ~/Downloads/pypy-1.9/bin/pypy test_mapreduce.py  573.78s user 15.64s system 170% cpu 5:46.41 total
> 
> I find it strange that PyPy is using (about) 4 times less CPU than
> CPython, while only taking (about) half the time. Watching the CPU usage
> of my 4 cores confirms it: approximately half of the available cycles
> aren't used (sometimes it seems only 2 cores are used). As I'm not
> running another process that consumes them, I suspect PyPy is blocking
> for some reason (i.e. removed from the scheduling queue by waiting, or
> some other system call). It didn't improve by using 8 processes instead
> of 4.
> 
> Do you think there is a problem with my code (actually, I'm new to Python)?
> 
> Thanks in advance,
> Alejandro
> 
> P.S.: please CC me because I'm not subscribed.
> <test_mapreduce.py>
> _______________________________________________
> pypy-dev mailing list
> [email protected]
> http://mail.python.org/mailman/listinfo/pypy-dev