Ok, I just committed the changes. They address two general cases:

- You want to know how fast PyPy is *now* compared to CPython in different benchmark scenarios, or tasks.
- You want to know how PyPy has been *improving* overall over the last releases.
That is now answered on the front page, and the reports are now much less prominent (I didn't change the logic because it is something I want to do properly, not just as a hack for speed.pypy).

- I have not yet addressed the "smaller is better" point.

I am aware that the wording of the "faster on average" text needs to be improved (I am discussing it with Holger even now ;). Please chime in so that we can have a good paragraph that is informative and short enough while at the same time not being misleading.

Miquel

2011/3/8 Miquel Torres <tob...@googlemail.com>:
> you mean this timeline, right?:
> http://speed.pypy.org/timeline/?ben=spectral-norm
>
> Because the December 22 result is so high, the y-axis maximum goes up
> to 2.5, thus leaving less space for the more interesting < 1 range,
> right?
>
> Regarding mozilla, do you mean this site?: http://arewefastyet.com/
> I can see their timelines have some holes, probably failed runs...
>
> I see a problem with the approach you suggest. Entering an arbitrary
> maximum y-axis number is not a good thing. I think the onus is on
> the benchmark infrastructure to not send results that aren't
> statistically significant. See JavaStats
> (http://www.elis.ugent.be/en/JavaStats), or ReBench
> (https://github.com/smarr/ReBench).
>
> Something that can be done on the Codespeed side is to treat points
> whose stddev is too high differently. In the aforementioned
> spectral-norm timeline, the stddev "floor" is around 0.0050, while the
> spike has a 0.30 stddev, much higher. A "strict" mode could be
> implemented that invalidates or hides statistically unsound data.
>
> Btw., I had written to the arewefastyet guys about the possibility of
> configuring a Codespeed instance for them.
> We may yet see collaboration there ;-)
>
> Miquel
>
>
> 2011/3/8 Maciej Fijalkowski <fij...@gmail.com>:
>> On Tue, Mar 8, 2011 at 8:14 AM, Laura Creighton <l...@openend.se> wrote:
>>> In a message of Tue, 08 Mar 2011 09:10:32 +0100, Miquel Torres writes:
>>>>Hi,
>>>>
>>>>I finished the changes to the speed.pypy.org home page last night, but
>>>>alas!, I didn't have time to deploy. I will do it later today and will
>>>>then ping you back.
>>>>
>>>>The extra info provided is really nice as an overview, you will see ;-)
>>>>
>>>>
>>>
>>> Ah good. Thank you very much. We spent yesterday afternoon with
>>> the Mozilla engineers, and I got to talk to the person who maintains
>>> the benchmarks for tracemonkey. He had timelines very much like ours.
>>> There is one feature he has that I would like to have. Take a look
>>> at the timeline for spectral-norm. There are two spikes there.
>>> Mozilla has lines like that too, though mostly it is because their
>>> jit decides that the whole benchmark is bogus and optimises out all the
>>> code. So it takes 0 time. oops.
>>>
>>> At any rate, aside from knowing that something went horribly wrong with
>>> that rev, you don't really need to know how wrong. And making the
>>> graph display up to that point means that the dots where things really
>>> do matter get crammed closer together than would otherwise be the case.
>>> So he had a mode where things were displayed with an arbitrary value
>>> at the bottom (in our case it would be the top) which he could specify.
>>> Then the graph would be replotted, with the outliers off the graph, but
>>> making it easier to read the dots for the more normal cases.
>>>
>>> Any chance we could do that too?
>>
>> Link maybe?
>>
>>>
>>> Laura
>>> _______________________________________________
>>> pypy-dev@codespeak.net
>>> http://codespeak.net/mailman/listinfo/pypy-dev
>>>
>>
>
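
For illustration only, here is a rough sketch of how the "strict" mode discussed in this thread might work: points whose stddev is far above the usual "floor" are set aside, and the y-axis maximum is computed from the remaining points only. This is not Codespeed code. The function names, the `factor` and `margin` parameters, the use of the median as the "floor", and the sample values are all assumptions made up for the example; only the 0.0050 floor and the 0.30 spike stddev come from the thread.

```python
from statistics import median

def split_unsound(points, factor=10.0):
    """Split (revision, value, stddev) tuples into sound and unsound points.

    Assumption: the stddev "floor" is estimated as the median stddev, and a
    point counts as unsound when its stddev exceeds floor * factor.
    """
    floor = median(stddev for _, _, stddev in points)
    limit = floor * factor
    sound = [p for p in points if p[2] <= limit]
    unsound = [p for p in points if p[2] > limit]
    return sound, unsound

def yaxis_max(sound, margin=0.1):
    """Return a y-axis maximum based on sound points only, or None if empty."""
    if not sound:
        return None
    top = max(value for _, value, _ in sound)
    return top * (1 + margin)

# Hypothetical timeline: four normal runs and one spike with a 0.30 stddev.
points = [
    ("r1", 0.80, 0.0050),
    ("r2", 0.78, 0.0052),
    ("r3", 2.40, 0.30),
    ("r4", 0.75, 0.0049),
    ("r5", 0.74, 0.0051),
]

sound, unsound = split_unsound(points)
print("hidden:", [rev for rev, _, _ in unsound])
print("y-axis max:", yaxis_max(sound))
```

With this data the spike at r3 is the only point set aside, so the axis is scaled to the < 1 range instead of being stretched by the outlier. The threshold is derived from the data rather than typed in by hand, which sidesteps the objection to an arbitrary y-axis maximum, though the choice of `factor` is itself a judgment call.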