Btw, for Python memory usage on Linux there is /proc/PID/status. Here is some code for Linux:

    wget http://rene.f0o.com/~rene/stuff/memory_usage.py

    >>> import memory_usage
    >>> bytes_of_resident_memory = memory_usage.resident()
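In case the link disappears, here is a minimal sketch of what such a module can look like (this is an assumption about the idea, not the actual contents of the file above): it just parses the VmRSS line out of /proc/PID/status.

    # memory_usage.py -- sketch only; the real file at the URL above may differ.
    # Reads the resident set size out of /proc/PID/status. Linux-only.

    _UNITS = {'kB': 1024, 'KB': 1024, 'mB': 1024 ** 2, 'MB': 1024 ** 2}

    def resident(pid='self'):
        """Return the resident memory of a process in bytes (default: ourselves)."""
        with open('/proc/%s/status' % pid) as f:
            for line in f:
                if line.startswith('VmRSS:'):
                    _, value, unit = line.split()
                    return int(value) * _UNITS[unit]
        return 0

Measuring child processes (part of the TODO below) would mean walking /proc for entries whose PPid matches and summing their VmRSS.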
Should be easy enough to add that to the benchmarks at the start and end? Calling it in the middle would be a little harder... but not too hard.

TODO: it would need to be updated for other platforms, and it still needs support for measuring child processes, plus tests and code cleanup :)

cu,

On Thu, Mar 11, 2010 at 12:32 AM, Maciej Fijalkowski <[email protected]> wrote:
> Hey.
>
> I'll answer the questions that are relevant to the benchmarks themselves,
> not to running them.
>
> On Wed, Mar 10, 2010 at 4:45 PM, Bengt Richter <[email protected]> wrote:
> > On 03/10/2010 12:14 PM Miquel Torres wrote:
> >> Hi!
> >>
> >> I wanted to explain a couple of things about the speed website:
> >>
> >> - New feature: the Timeline view now defaults to a plot grid, showing
> >> all benchmarks at the same time. It was a feature requested more than
> >> once, so depending on personal taste, you can bookmark either
> >> /overview/ or /timeline/. Thanks go to nsf for helping with the
> >> implementation.
> >> - The code has now moved to GitHub as Codespeed, a benchmark
> >> visualization framework (http://github.com/tobami/codespeed)
> >> - I have updated speed.pypy.org to version 0.3. Much of the work has
> >> been under the hood, to make it feasible for other projects to use
> >> Codespeed as a framework.
> >>
> >> For those interested in further development, you can go to the releases
> >> wiki (still a work in progress):
> >> http://wiki.github.com/tobami/codespeed/releases
> >>
> >> Next in line are some DB changes to be able to save standard
> >> deviation data and the like. Long-term goals besides world domination
> >> are integration with buildbot and similarly unrealistic things.
> >> Feedback is always welcome.
> >
> > Nice-looking stuff. But a couple of comments:
> >
> > 1. IMO standard deviation is too often worse than useless, since it hides
> > the true nature of the distribution. I think the assumption of normality
> > is highly suspect for benchmark timings, and pruning may hide
> > interesting clusters.
> >
> > I prefer to look at scattergrams, where things like clustering and
> > correlations are easily apparent to the eye, as well as the amount of
> > data (assuming a good mapping of density to visuals).
>
> That's true. In general a benchmark run over time is a period of warmup,
> while the JIT compiles assembler, followed by a phase that can be
> described by an average and a std deviation. Personally I would like to
> have those 3 measures separated, but we didn't implement that yet (there
> are also some interesting statistical questions involved). Std deviation
> is useful for telling whether a difference at a certain checkin was
> meaningful or just noise.
>
> >
> > 2. IMO benchmark timings are like travel times, comparing different
> > vehicles (pypy with jit being a vehicle capable of dynamic
> > self-modification ;-)
> > E.g., which part of the travel from Stockholm to Paris would you
> > concentrate on improving to improve the overall result? How about
> > travel from Brussels to Paris? Or Paris to Sydney? ;-P Different things
> > come into play in different benchmarks/trips.
> > A Porsche Turbo and a 2CV will both have to wait for a ferry, if
> > that's part of the trip.
> >
> > IOW, it would be nice to see total time broken down somehow, to see
> > what's really happening.
>
> I can't agree more with that.
> We already split the time when we run benchmarks
> by hand, but that's not yet integrated into the nightly run. Total time
> is what users see, though, which is why our public site focuses on it. I
> want more information to be available, but we have only a limited amount
> of manpower, and Miquel has already done quite an amazing job in my
> opinion :-) We'll probably go into more detail over time.
>
> The part we want to focus on post-release is speeding up certain parts
> of tracing as well as limiting its GC pressure. As you can see, the
> split would be very useful for our own development.
>
> >
> > Don't get me wrong, the total times are certainly useful indicators of
> > progress (which has been amazing).
> >
> > 3. Speed is ds/dt, and you are showing the integral of dt/ds over the
> > trip distance to get time. A 25% improvement in total time is not a 25%
> > improvement in speed. I.e. (if you define improvement as a percentage
> > change in a desired direction), for e.g. 25%:
> > distance/(0.75*time) != 1.25*(distance/time).
> >
> > IMO 'speed' (the implication to me in the name speed.pypy.org) would
> > more appropriately be benchmarks/time than time/benchmark.
> >
> > Both measures are useful, but time percentages are easy to
> > mis{use,construe} ;-)
>
> That's correct.
>
> Benchmarks are in general very easy to lie about, and they're by
> definition flawed. That's why I always include the raw data when I
> publish stuff on the blog, so people can work with it themselves.
>
> >
> > 4. Is there any memory footprint data?
>
> No. Memory measurement is hard, and it's even less useful without a
> breakdown. These particular benchmarks are not a very good basis for
> memory measurement - in the case of pypy you would mostly observe the
> default allocated memory (which is roughly 10M for the interpreter +
> 16M for the semispace GC + the cache for the nursery).
>
> Also, our GC is of a kind that can run faster if you give it more
> memory (not that we use this feature, but it's possible).
>
> Cheers,
> fijal
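PS. For the warmup/steady-state split fijal describes above, here is a rough sketch of how the 3 measures could be separated. The cutoff heuristic (first iteration within 10% of the median of the last half of the run) is invented for illustration; as fijal says, the real statistical questions are still open.

    # Sketch only: naively split per-iteration timings into JIT warmup
    # and steady state, then report warmup cost, steady-state average,
    # and steady-state std deviation.
    import math

    def split_warmup(timings, tolerance=0.10):
        tail = sorted(timings[len(timings) // 2:])
        median = tail[len(tail) // 2]
        cutoff = next((i for i, t in enumerate(timings)
                       if abs(t - median) <= tolerance * median), 0)
        return timings[:cutoff], timings[cutoff:]

    def summarize(timings):
        warmup, steady = split_warmup(timings)
        mean = sum(steady) / len(steady)
        var = sum((t - mean) ** 2 for t in steady) / len(steady)
        return sum(warmup), mean, math.sqrt(var)  # the 3 measures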
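PPS. Bengt's point 3 in concrete numbers: cutting total time by 25% raises speed by about 33%, not 25%.

    # worked example of distance/(0.75*time) != 1.25*(distance/time)
    distance, time = 1.0, 10.0
    old_speed = distance / time           # 0.100 benchmarks/second
    new_speed = distance / (0.75 * time)  # 0.133... = old_speed * 4/3, i.e. +33%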
_______________________________________________
[email protected]
http://codespeak.net/mailman/listinfo/pypy-dev
