I see that you're reading directly from the compressed zip file. That makes me suspect that your map/reduce workers are sitting idle, waiting for data from zip decompression, which is a single-threaded, CPU-bound job and so can keep only one core busy.
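test_mapreduce.py isn't shown here, so what follows is only a hypothetical sketch of the alternative arrangement. The idea is to decompress the whole archive once, up front, and then have each worker in a multiprocessing.Pool open and tokenize its own plain-text file, so no worker waits on a single decompressor. It assumes a .zip archive that the standard zipfile module can read (a .7z file would need an external extractor). It also assumes the job counts distinct whitespace-separated words. The helper names extract_once, list_files, map_words, reduce_counts and count_distinct_words are made up for illustration. It uses io.open and an explicit close()/join() so it should run under both Python 2.7 (the language version PyPy 1.9 implements) and Python 3.

```python
import io
import os
import zipfile
from collections import Counter
from multiprocessing import Pool


def extract_once(archive_path, dest_dir):
    """Decompress the whole archive up front (single-threaded, but done once)."""
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest_dir)
    return dest_dir


def list_files(root):
    """Collect the paths of all regular files under root, in a stable order."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)


def map_words(path):
    """Map step: each worker reads one plain file itself and counts its words."""
    with io.open(path, encoding="utf-8", errors="replace") as f:
        return Counter(f.read().lower().split())


def reduce_counts(partials):
    """Reduce step: merge the per-file counters into one."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def count_distinct_words(root, processes=4):
    """Run the map step in parallel over the extracted files, then reduce."""
    paths = list_files(root)
    pool = Pool(processes)
    try:
        # Workers open files on their own, so no single process has to feed them.
        partials = pool.imap_unordered(map_words, paths, chunksize=64)
        total = reduce_counts(partials)
    finally:
        pool.close()
        pool.join()
    return len(total)
```

A typical (hypothetical) driver would call extract_once("cables.zip", "cables") once, outside the timed section, and then count_distinct_words("cables") under an `if __name__ == "__main__":` guard. Decompression still uses a single core, but it happens once and before the map/reduce phase, so it should no longer throttle the workers.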
Try decompressing the archive first, and make sure all the files fit into your OS's disk cache (or flush the cache between tests) so that every run starts from the same I/O conditions.

--
taa
/*eof*/

On 2012-11-02, at 16:00, Alejandro Pulver wrote:

> Hello,
>
> As part of an introductory course of computational neuroscience, we
> learned the basics of NLTK to analyze wikileaks.
>
> On my own, I tried PyPy 1.9 (under Ubuntu 12.04 64-bits) and a simple
> MapReduce scheme as an attempt to improve performance. There are 14266
> files under "cable", adding up to 1.2GB. It can be downloaded as a 30MB
> compressed 7z here:
> http://www.dc.uba.ar/materias/incc/practicas/p2/nltk/wikis.7z
>
> The results are:
>
> $ time python test_mapreduce.py
> 170686
> python test_mapreduce.py 1897.59s user 13.10s system 338% cpu 9:24.29 total
>
> $ time ~/Downloads/pypy-1.9/bin/pypy test_mapreduce.py
> 170685
> ~/Downloads/pypy-1.9/bin/pypy test_mapreduce.py 573.78s user 15.64s
> system 170% cpu 5:46.41 total
>
> I find it strange that PyPy is using (about) 4 times less CPU than
> CPython, while only taking (about) half the time. Watching the CPU usage
> of my 4 cores confirms it: approximately half of the available cycles
> aren't used (sometimes it seems only 2 cores are used). As I'm not
> running another process that consumes them, I suspect PyPy is blocking
> for some reason (i.e. removed from the scheduling queue by waiting, or
> some other system call). It didn't improve by using 8 processes instead
> of 4.
>
> Do you think there is a problem with my code (actually, I'm new to Python)?
>
> Thanks in advance,
> Alejandro
>
> P.S.: please CC me because I'm not subscribed.
> <test_mapreduce.py>
> _______________________________________________
> pypy-dev mailing list
> http://mail.python.org/mailman/listinfo/pypy-dev
