I got frustrated with my (actually dying now) local box and signed up for AWS. I'm using an m1.medium instance to build pypy (~100 minutes), then upgrading it to a c1.xlarge (which claims to offer 8 virtual cores of 2.5 ECU each).
With the same sample program, I see the expected kinds of speedups! :D So using VMWare is right out. Hopefully that info is useful to someone else in the future. :)

On Sun, Feb 17, 2013 at 6:38 PM, Taavi Burns <taavi.bu...@gmail.com> wrote:
> That's great, thanks! I did get it to work when you wrote earlier, but
> it's definitely faster now.
>
> I tried a ridiculously simple, conflict-free parallel program and
> came up with this, which gave me some questionable performance numbers
> from a build of 65ec96e15463:
>
> taavi@pypy:~/pypy/pypy/goal$ ./pypy-c -m timeit -s 'import
> transaction; transaction.set_num_threads(1)' '
> def foo():
>     x = 0
>     for y in range(100000):
>         x += y
> transaction.add(foo)
> transaction.add(foo)
> transaction.run()'
> 10 loops, best of 3: 198 msec per loop
>
> taavi@pypy:~/pypy/pypy/goal$ ./pypy-c -m timeit -s 'import
> transaction; transaction.set_num_threads(2)' '
> def foo():
>     x = 0
>     for y in range(100000):
>         x += y
> transaction.add(foo)
> transaction.add(foo)
> transaction.run()'
> 10 loops, best of 3: 415 msec per loop
>
> It's entirely possible that this is an effect of running inside a
> VMWare guest (set to use 2 cores) on my Core2Duo laptop. If
> that is the case, I'll refrain from trying to do anything remotely
> like benchmarking in this environment in the future. :)
>
> Would it be more helpful (if I want to contribute to STM) to use
> something like a high-CPU EC2 instance, or should I look at obtaining
> something like an 8-real-core AMD X8?
>
> (My venerable X2 has started to disagree with its RAM, so it's prime
> for retirement.)
>
> Thanks!
>
> On Sun, Feb 17, 2013 at 3:58 AM, Armin Rigo <ar...@tunes.org> wrote:
>> Hi Taavi,
>>
>> I finally fixed pypy-stm with signals. Now I'm again getting results
>> that scale with the number of processors.
>>
>> Note that it stops scaling up at some point, around 4 or 6 threads, on
>> the machines I tried it on.
>> I suspect it's related to the fact that
>> physical processors have 4 or 6 cores internally, but the results are
>> still a bit inconsistent. Using the "taskset" command to force the
>> threads to run on particular physical sockets seems to help a little
>> bit with some numbers. FWIW, I got the maximum throughput on a
>> 24-core machine by really running 24 threads, but that seems
>> wasteful, as it is only 25% better than running 6 threads on one
>> physical socket.
>>
>> The next step will be trying to reduce the overhead, which is currently
>> considerable (about 10x slower than CPython, too much to ever have any
>> net benefit). Also high on the list is fixing the constant memory
>> leak (i.e. implementing major garbage collection steps).
>>
>>
>> A bientôt,
>>
>> Armin.
>
> --
> taa
> /*eof*/

--
taa
/*eof*/
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
http://mail.python.org/mailman/listinfo/pypy-dev