Adam Reichold <[email protected]> writes: > Hello, > > Am 31.12.2015 um 12:55 schrieb Carlos Garcia Campos: >> Adam Reichold <[email protected]> writes: >> >>> Hello, >>> >>> Am 30.12.2015 um 19:17 schrieb Carlos Garcia Campos: >>>> Albert Astals Cid <[email protected]> writes: >>>> >>>>> El Wednesday 30 December 2015, a les 17:04:42, Adam Reichold va escriure: >>>>>> Hello again, >>>>>> >>>>>> as discussed in the code modernization thread, if we are going to make >>>>>> performance-orient changes, we need a simple way to track functional and >>>>>> performance regressions. >>>>>> >>>>>> The attached patch tries to extend the existing Python-based regtest >>>>>> framework to measure run time and memory usage to spot significant >>>>>> performance changes in the sense of relative deviations w.r.t. to these >>>>>> two parameters. It also collects the sums of both which might be used as >>>>>> "ball park" numbers to compare the performance effect of changes over >>>>>> document collections. >>>>> >>>>> Have you tried it? How stable are the numbers? For example here i get for >>>>> rendering the same file (discarding the first time that is loading the >>>>> file >>>>> into memory) numbers that range from 620ms to 676ms, i.e. ~10% variation >>>>> without no change at all. >>>> >>>> I haven't looked at the patches in detail yet, but I don't think the >>>> regtest framework should be the one measuring. I would add a tool for >>>> that, pdfperf or something like that, that could measure the internals >>>> (parsing, output devs, etc.) in a more detailed way. If we >>>> need to get more information from the poppler core we could just add a >>>> compile option to provide that info. And the regtest framework should >>>> just run the perf command and collect the results. A report command >>>> could compare results with refs or previous executions. I also think >>>> performance tests should be run with a different command, since we don't >>>> want to measure the performance of every single document we have in our >>>> test suite (it would take too long). Instead, we could select a set of >>>> documents with different kind of pdf features to run perf tests on those >>>> only. >>>> >>>> So, in summary, I would use a dedicated tool (depending on a build >>>> option if we need to get more info from the core), and maybe even a >>>> dedicated perf framework on top of that tool, since I consider perf tests >>>> different from rendering regression tests, the same way unit tests are >>>> handled by a different framework too. >>> >>> I agree that a dedicated tool might provide more detailed information. >>> But with the limited resources we have, even some information might be >>> useful. Of course we should make it reliable, e.g. by improving upon the >>> measurement procedure. >> >> Yes, you are right. We can probably start with something like this. I >> think there's an intermediate solution, though. I think we could at >> least measure times per page, since there are documents with lots of >> pages, and sometimes it's one page containing a complex pattern or image >> the one causing the regression on the whole document. Measuring every >> page makes it easier to track those regressions but also provides more >> accurate measurements than whole document times. For that we would >> probably need to change the tools (pdftoppm/cairo/text) to provide >> rendering times per page (using a command line switch, for example). 
>
> This does sound like a useful addition, but I think it should also be
> possible without adding benchmark options to the utilities: we can just
> time several invocations that each render a single page, after querying
> the number of pages in the benchmark driver. The only downside to this
> seems to be that it will make the individual benchmark durations
> shorter, and hence we will need more iterations to battle random
> fluctuations.
>
>> I'm fine with extending the regtest framework, but what we don't want
>> for sure is mixing performance and rendering tests. Our current test
>> suite takes a long time, and I also don't think we should test
>> performance in the same way. So I would add a different subcommand,
>> run-perf-test for example, to be able to run a different subset of
>> tests and store the results differently (using a JSON file or whatever,
>> without affecting the rendering checksums). I'm not sure having
>> references as for the regtests is the best approach; in this case I
>> would just keep the results of every revision. And a different
>> subcommand could be implemented to compare results, producing the
>> report with the improvements/regressions.
>>
>> What do you think?
>
> The main motivation for the current approach - that is, adding additional
> backends perftext and perfsplash to do measurements - was to reuse as
> much of the existing regtest code base as possible. And I think that the
> patch set shows how little code is necessary to implement it in this
> particular way. Of course, it is somewhat of a hack, as the abuse of the
> _match_checksums backend method to compute the relative deviations shows.
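The driver-side single-page timing described above could look roughly like
this; a minimal sketch assuming pdfinfo and pdftoppm are on the PATH, with
the script itself, its output format and the scratch prefix being purely
illustrative and not part of the regtest framework:

#!/usr/bin/env python3
# Hypothetical sketch of driver-side per-page timing: query the page count
# with pdfinfo, then time one pdftoppm invocation per page.
import re
import subprocess
import sys
import time

def page_count(document):
    # pdfinfo prints a "Pages:" line for well-formed documents.
    output = subprocess.run(["pdfinfo", document], capture_output=True,
                            text=True, check=True).stdout
    return int(re.search(r"^Pages:\s+(\d+)", output, re.MULTILINE).group(1))

def time_page(document, page, out_prefix="/tmp/perf-page"):
    # Render a single page to a scratch prefix; only the duration matters.
    start = time.monotonic()
    subprocess.run(["pdftoppm", "-f", str(page), "-l", str(page),
                    document, out_prefix],
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
                   check=True)
    return time.monotonic() - start

if __name__ == "__main__":
    document = sys.argv[1]
    for page in range(1, page_count(document) + 1):
        print("page %d: %.3f s" % (page, time_page(document, page)))

Note that each invocation re-opens and re-parses the document, so every
per-page number includes loading time; repeating the runs and averaging,
as discussed above, should smooth that out.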
Ah, they are new backends. This is indeed a hack, because they are not
actually backends. I'm sorry that I haven't had time to look at the patches
in detail yet. I don't think adding new subcommands would be much harder;
I can help with that if needed.

> I also think that it does fulfil two of the above requirements: since
> these are separate backends, they will not affect the rendering
> regression tests in any way. And since we can invoke the regtest
> framework with different backends and test directories, we can already
> use separate document selections for checking for functional and
> performance regressions.

I see. It's useful for playing with it, but not a solution I would
upstream.

> I also agree that having a reference point is not really suitable for
> performance measurements and that we should just create outputs for each
> revision and generate reports about the changes over time. (But then
> again, the regtest framework could generally be made to work that way.
> After all, the reference revision is special only to the user, as the
> current implementation already shows.)
>
> In any case, if I drop this patch and significantly diverge from the
> regtest framework, I would probably just start over to have something
> simple and self-contained, quite possibly implementing the separate
> pdfperf application to be able to hook directly into the library as
> well, and hopefully reusing the IMHO rather nice benchmarking
> capabilities of the QTest framework. In any case, it will probably take
> several days instead of a single day like this patch.

I don't mind either way, but if something can be reused from the regtest
framework, we could probably simply add a couple of subcommands as I
suggested. I don't think we should depend on Qt for this, though.

> As to which way is taken, I will abide by the preferences of the
> maintainers here: if you want the proper benchmark driver, I will gladly
> try to implement that. If you can live with this patch, I will be happy
> to maintain the perf backends of the regtest framework in the future.
> (Just a significant extension of the already quite complicated regtest
> framework is something I would like to avoid.)

I don't think it would be that much code to extend the current framework,
but if you are going to do it, it's up to you. I can help to extend the
framework if needed, though.

>>> Also users probably won't care about which part of the library
>>> produced a performance regression, so the overall numbers are indeed
>>> interesting IMHO.
>>
>> This is not for users, this is for us :-) and we need to know which
>> part of the code introduced the regression. I agree we can start with a
>> simpler approach, but at least knowing whether the problem is in a
>> particular page or whether all pages regressed would help a lot to
>> identify the problem.
>
> What I meant was: if the user doesn't care, we shouldn't care either, in
> the sense that it does not matter whether a regression was introduced by
> the parsing or the rendering code. Fixing it will usually involve
> additional profiling using proper instrumentation in any case.
>
> Best regards, Adam.
>
>>> Especially since a developer can always do proper profiling when
>>> looking into a specific regression. Microbenchmarks, e.g. using QTest,
>>> also provide a certain balance w.r.t. these issues, as they can be
>>> used to continuously observe the performance of specific portions of
>>> the code base with more or less the same overhead as a unit test.
>>>
>>> Best regards, Adam.
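Coming back to the per-revision results and the comparison report discussed
above, a minimal sketch could look like the following; the directory
layout, metric names and the 5% threshold are assumptions made for
illustration, not regtest framework behaviour:

#!/usr/bin/env python3
# Illustrative sketch only: store one JSON file of measurements per
# revision and report relative deviations between two revisions.
import json
import os

def save_results(results_dir, revision, results):
    # results maps test name -> {"time": seconds, "memory": kilobytes}
    os.makedirs(results_dir, exist_ok=True)
    with open(os.path.join(results_dir, revision + ".json"), "w") as f:
        json.dump(results, f, indent=2, sort_keys=True)

def load_results(results_dir, revision):
    with open(os.path.join(results_dir, revision + ".json")) as f:
        return json.load(f)

def compare(results_dir, old_rev, new_rev, threshold=0.05):
    old = load_results(results_dir, old_rev)
    new = load_results(results_dir, new_rev)
    for test in sorted(set(old) & set(new)):
        for metric in ("time", "memory"):
            before, after = old[test][metric], new[test][metric]
            if not before:
                continue
            deviation = (after - before) / before
            if abs(deviation) >= threshold:
                kind = "regression" if deviation > 0 else "improvement"
                print("%s %s: %s of %.1f%% (%s -> %s)"
                      % (test, metric, kind, 100 * abs(deviation),
                         before, after))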
>>>
>>>>> Cheers,
>>>>>   Albert
>>>>>
>>>>>> The patch runs the measured commands repeatedly, including warm-up
>>>>>> iterations, and collects statistics from these runs. The measurement
>>>>>> results are stored as JSON documents, with the actual program output
>>>>>> of e.g. pdftotext or pdftoppm being discarded.
>>>>>>
>>>>>> To implement the check for relative deviations, it abuses the
>>>>>> checksum comparison method, and hence checksums are still computed
>>>>>> for the JSON documents even though they are actually unnecessary. It
>>>>>> is also limited to Unix-like operating systems (due to the use of
>>>>>> the wait3 syscall to determine resource usage, similar to the time
>>>>>> command).
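For reference, the wait3-style measurement described in the quoted patch
summary can be sketched in a few lines of Unix-only Python via os.wait4,
which exposes the same rusage data as wait3; the warm-up and iteration
counts and the statistics below are illustrative, not those of the patch:

#!/usr/bin/env python3
# Unix-only sketch: fork/exec the command, reap it with os.wait4 to obtain
# wall time, CPU time and peak memory (ru_maxrss), run warm-up iterations
# first, and summarise the remaining runs.
import os
import statistics
import sys
import time

def run_once(argv):
    start = time.monotonic()
    pid = os.fork()
    if pid == 0:
        # Child: silence the tool's output, we only care about resource use.
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, 1)
        os.dup2(devnull, 2)
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # reached only if the exec itself fails
    _, _, usage = os.wait4(pid, 0)
    wall = time.monotonic() - start
    # ru_maxrss is reported in kilobytes on Linux.
    return wall, usage.ru_utime + usage.ru_stime, usage.ru_maxrss

def benchmark(argv, warmup=2, iterations=5):
    for _ in range(warmup):
        run_once(argv)
    walls, cpus, memories = zip(*(run_once(argv) for _ in range(iterations)))
    return {"wall": statistics.median(walls),
            "cpu": statistics.median(cpus),
            "memory_kb": max(memories)}

if __name__ == "__main__":
    print(benchmark(sys.argv[1:]))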
-- 
Carlos Garcia Campos
PGP key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x523E6462

_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
