Hi again. I'm doing some research on ways to record memory usage in a
cross-platform way, and keeping my notes here:

http://renesd.blogspot.com/2010/03/memory-usage-of-processes-from-python.html

So far people have come up with these two useful projects:

http://code.google.com/p/psutil/
http://code.google.com/p/pympler/

I think psutil will have most of the info needed to construct a decent
memory recording module for benchmarks. However, it includes C code, so
we will probably have to rip some of the memory parts out, and maybe
reimplement them with ctypes.
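For the simple case, grabbing the resident set size through psutil itself
would look roughly like this. Note this is only a sketch going from the
current psutil docs; method names have varied across psutil versions
(older releases used get_memory_info() instead of memory_info()):

    import os
    import psutil

    def resident_bytes(pid=None):
        """Resident set size (RSS) of a process, in bytes."""
        # psutil.Process wraps a pid; default to the current process.
        proc = psutil.Process(os.getpid() if pid is None else pid)
        return proc.memory_info().rss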
cu,


On Fri, Mar 12, 2010 at 10:49 AM, René Dudfield <[email protected]> wrote:
> btw, for python memory usage on linux:
> /proc/PID/status
>
> Here is some code for linux...
>
> wget http://rene.f0o.com/~rene/stuff/memory_usage.py
>
> >>> import memory_usage
> >>> bytes_of_resident_memory = memory_usage.resident()
>
> Should be easy enough to add that to benchmarks at the start and at the
> end? Calling it in the middle would be a little harder... but not too
> hard.
>
> TODO: it would need to be updated for other platforms, and to gain
> support for measuring child processes, plus tests and code cleanup :)
>
> cu,
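A pure-Python version of that resident() idea is only a few lines on
Linux. As a sketch (this parses VmRSS out of /proc/PID/status; it is a
guess at the approach, not the actual contents of memory_usage.py):

    def resident(pid='self'):
        """Resident set size in bytes for a process (Linux only)."""
        with open('/proc/%s/status' % pid) as f:
            for line in f:
                # The line looks like: "VmRSS:     12345 kB"
                if line.startswith('VmRSS:'):
                    return int(line.split()[1]) * 1024
        raise RuntimeError('VmRSS not found for pid %s' % pid)

Passing a child's pid instead of 'self' would cover the child-process
case from the TODO above.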
> On Thu, Mar 11, 2010 at 12:32 AM, Maciej Fijalkowski <[email protected]> wrote:
>> Hey.
>>
>> I'll answer the questions that are relevant to the benchmarks
>> themselves, not to running them.
>>
>> On Wed, Mar 10, 2010 at 4:45 PM, Bengt Richter <[email protected]> wrote:
>> > On 03/10/2010 12:14 PM Miquel Torres wrote:
>> >> Hi!
>> >>
>> >> I wanted to explain a couple of things about the speed website:
>> >>
>> >> - New feature: the Timeline view now defaults to a plot grid, showing
>> >>   all benchmarks at the same time. It was a feature request made more
>> >>   than once, so depending on personal taste you can bookmark either
>> >>   /overview/ or /timeline/. Thanks go to nsf for helping with the
>> >>   implementation.
>> >> - The code has now moved to github as Codespeed, a benchmark
>> >>   visualization framework (http://github.com/tobami/codespeed).
>> >> - I have updated speed.pypy.org to version 0.3. Much of the work has
>> >>   been under the hood, to make it feasible for other projects to use
>> >>   Codespeed as a framework.
>> >>
>> >> For those interested in further development, you can go to the
>> >> releases wiki (still a work in progress):
>> >> http://wiki.github.com/tobami/codespeed/releases
>> >>
>> >> Next in line are some DB changes, to be able to save standard
>> >> deviation data and the like. Long-term goals, besides world
>> >> domination, are integration with buildbot and similarly unrealistic
>> >> things. Feedback is always welcome.
>> >
>> > Nice-looking stuff. But a couple of comments:
>> >
>> > 1. IMO standard deviation is too often worse than useless, since it
>> > hides the true nature of the distribution. I think the assumption of
>> > normality is highly suspect for benchmark timings, and pruning may
>> > hide interesting clusters.
>> >
>> > I prefer to look at scattergrams, where things like clustering and
>> > correlations are easily apparent to the eye, as well as the amount of
>> > data (assuming a good mapping of density to visuals).
>>
>> That's true. In general a benchmark run is a period of warmup, while
>> the JIT compiles assembler, followed by a steady state that can be
>> described by an average and a std deviation. Personally I would like
>> to have those three measures separated, but I haven't implemented that
>> yet (it also has some interesting statistical questions involved). Std
>> deviation is useful for telling whether the difference after a certain
>> checkin was meaningful or just noise.
>>
>> > 2. IMO benchmark timings are like travel times, comparing different
>> > vehicles (pypy with jit being a vehicle capable of dynamic
>> > self-modification ;-). E.g., which part of the travel from Stockholm
>> > to Paris would you concentrate on to improve the overall result? How
>> > about travel from Brussels to Paris? Or Paris to Sydney? ;-P
>> > Different things come into play in different benchmarks/trips. A
>> > Porsche Turbo and a 2CV will both have to wait for a ferry, if that's
>> > part of the trip.
>> >
>> > IOW, it would be nice to see total time broken down somehow, to see
>> > what's really happening.
>>
>> I can't agree more with that. We already split the time when we run
>> benchmarks by hand, but that is not yet integrated into the nightly
>> run. Total time is what users see, though, which is why our public
>> site focuses on it. I want more information to be available, but we
>> have only a limited amount of manpower, and Miquel has already done a
>> quite amazing job in my opinion :-) We'll probably go into more
>> detail.
>>
>> The part we want to focus on post-release is speeding up certain parts
>> of tracing, as well as limiting its GC pressure. As you can see, the
>> split would be very useful for our own development.
>>
>> > Don't get me wrong, the total times are certainly useful indicators
>> > of progress (which has been amazing).
>> >
>> > 3. Speed is ds/dt, and you are showing the integral of dt/ds over the
>> > trip distance to get time. A 25% improvement in total time is not a
>> > 25% improvement in speed. I.e. (if you define improvement as a
>> > percentage change in a desired direction), for e.g. 25%:
>> > distance/(0.75*time) != 1.25*(distance/time).
>> >
>> > IMO 'speed' (the implication to me in the name speed.pypy.org) would
>> > more appropriately be benchmarks/time than time/benchmark.
>> >
>> > Both measures are useful, but time percentages are easy to
>> > mis{use,construe} ;-)
>>
>> That's correct.
>>
>> Benchmarks are in general very easy to lie about, and they are by
>> definition flawed. That's why I always include the raw data when I
>> publish stuff on the blog, so people can work on it themselves.
>>
>> > 4. Is there any memory footprint data?
>>
>> No. Memory measurement is hard, and it's even less useful without a
>> breakdown. These particular benchmarks are not a very good basis for
>> memory measurement: in the case of pypy you would mostly observe the
>> default allocated memory (which is roughly 10M for the interpreter +
>> 16M for the semispace GC + the cache for the nursery).
>>
>> Also, our GC is of a kind that can run faster if you give it more
>> memory (not that we use this feature, but it's possible).
>>
>> Cheers,
>> fijal
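Two quick sketches to go with the answers above. For separating warmup
from steady state as fijal describes, once per-iteration timings are
recorded the summary could be computed along these lines (the fixed
warmup cutoff here is an arbitrary placeholder; a real tool would detect
the knee in the timing curve instead):

    import math

    def summarize(timings, warmup=5):
        """Split per-iteration timings into warmup and steady state,
        then describe the steady state by mean and std deviation."""
        warmup_part, steady = timings[:warmup], timings[warmup:]
        mean = sum(steady) / len(steady)
        # Sample standard deviation of the steady-state timings.
        stdev = math.sqrt(sum((t - mean) ** 2 for t in steady)
                          / (len(steady) - 1))
        return sum(warmup_part), mean, stdev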
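And Bengt's point 3 in concrete numbers: cutting total time by 25% is a
speed-up of a third, not a quarter:

    distance, time = 100.0, 10.0
    old_speed = distance / time           # 10.0
    new_speed = distance / (0.75 * time)  # ~13.33
    print(new_speed / old_speed)          # 1.333..., not 1.25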
