On Wednesday 30 December 2015, at 20:00:17, Ihar Filipau wrote:
> On 12/30/15, Albert Astals Cid <[email protected]> wrote:
> > On Wednesday 30 December 2015, at 17:04:42, Adam Reichold wrote:
> >> Hello again,
> >>
> >> as discussed in the code modernization thread, if we are going to
> >> make performance-oriented changes, we need a simple way to track
> >> functional and performance regressions.
> >>
> >> The attached patch tries to extend the existing Python-based regtest
> >> framework to measure run time and memory usage to spot significant
> >> performance changes in the sense of relative deviations w.r.t. these
> >> two parameters. It also collects the sums of both, which might be
> >> used as "ball park" numbers to compare the performance effect of
> >> changes over document collections.
> >
> > Have you tried it? How stable are the numbers? For example, here I
> > get, for rendering the same file (discarding the first run, which is
> > the one loading the file into memory), numbers that range from 620ms
> > to 676ms, i.e. ~10% variation with no change at all.
>
> To make the timing numbers stable, the benchmark framework should
> repeat the test a few times. IME at least three times. (I often do as
> many as five runs.)
>
> The final result is a pair: the average of the timings over all runs,
> and (for example) the standard deviation (or simply the distance to
> the min/max value) computed over all the timing numbers.
>
> {I occasionally test performance on an embedded system running off
> flash (no spinning disks, no network, nothing to screw up the timing),
> yet I still get variations as high as 5%. Performance testing on a PC
> is an even trickier business: some go as far as rebooting the system
> into single-user mode and shutting down all unnecessary services.
> Pretty much everything running in the background - and foreground,
> e.g. the GUI - can contribute to the unreliability of the numbers.}
>
> For a benchmark on a normal Linux/etc. system, I would advise
> performing the test once to "warm up" the caches, and only then
> starting the measured test runs.
>
> Summary:
> 1. A performance test framework should do a "warm-up" phase whose
> timing is discarded.
> 2. A performance test framework should repeat the test 3/5/etc. times,
> collecting the timing information.
> 3. The collected timings are averaged and the deviation (or distance
> to min/max) is computed. The average is the official benchmark result;
> the deviation/etc. is an indication of the reliability of the
> benchmark.
>
> fyi.
>
> P.S. Note that 600ms is an OK-ish duration for a benchmark: not too
> short, not too long.
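To make the summary above concrete, the timing side could look roughly
like this in the Python regtest scripts (only a sketch; run_test stands
for whatever callable actually renders the document, it is not an
existing regtest function, and the run counts are arbitrary):

import statistics
import time

def benchmark(run_test, warmup_runs=1, measured_runs=5):
    # 1. Warm-up phase to populate the caches; this timing is discarded.
    for _ in range(warmup_runs):
        run_test()

    # 2. Repeat the test, collecting wall-clock timings.
    timings = []
    for _ in range(measured_runs):
        start = time.perf_counter()
        run_test()
        timings.append(time.perf_counter() - start)

    # 3. The average is the benchmark result; the spread (max - min here,
    #    statistics.stdev would also do) indicates how reliable it is.
    return statistics.mean(timings), max(timings) - min(timings)

Keep in mind that five measured runs plus a warm-up multiply the total
regtest run time accordingly, which adds up quickly over a suite of 1600
documents.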
600 ms is the rendering time of one page of one of the 1600 files in
one of the 3 or 4 backends. ;)

> But generally, the shorter the duration of the benchmark, the less
> reliable the timing numbers are (the higher the deviation), and the
> longer the duration, the more reliable the numbers are (the lower the
> deviation).
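On the comparison side, which is what the patch is ultimately about,
spotting a regression as a relative deviation of run time or memory
against a reference run could boil down to something like the check
below. Again only a sketch: the 10% threshold, the data layout and the
numbers are made-up examples, not what the patch actually does, and the
threshold has to sit above the run-to-run noise discussed above to
avoid false alarms.

def is_regression(reference, current, threshold=0.10):
    # Flag a measurement that grew by more than the relative threshold.
    if reference == 0:
        return current != 0
    return (current - reference) / reference > threshold

# Hypothetical per-document (seconds, bytes) pairs:
reference = {"doc.pdf": (0.620, 35000000)}
current = {"doc.pdf": (0.676, 35500000)}

for name, (ref_time, ref_mem) in reference.items():
    cur_time, cur_mem = current[name]
    if is_regression(ref_time, cur_time) or is_regression(ref_mem, cur_mem):
        print("possible performance regression:", name)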
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler