El diumenge, 4 de març de 2018, a les 22:59:20 CEST, Albert Astals Cid va escriure: > El dissabte, 3 de març de 2018, a les 20:51:55 CET, Adam Reichold va escriure: > > Hello Albert, > > > > please note that in the previous patch, I made a mistake with the move > > constructor (the ref count should not be moved/swapped since it is tied > > to the instance, not its value). Attached is a fixed patch. > > > > I found the issue during further performance tests which however only > > continued to strengthen to zero sum result. In any case, I extend the > > Python-based perftest script with a C++-based driver to be able to > > reliably measure single-page runtimes for this, which I will also attach > > if anyone is interested in this. > > Ok, so it seems there's a mild agreement (or at least no strong > disagreement) that this may be the way forward. I will first wait to get > poppler 0.63 out of the door (that is already late several months, between > freedesktop shutting down ssh access, them forgetting to tell us it was > open again, my disk dying and then me procastinating as usual) and once > it's out I'll come back and reevaluate this.
I was trying to apply the patch for running a quick regtest but running git am over it gave me some warning/error I wasn't sure how to get out of correctly. Could you try rebasing the patch on top of current master? Cheers, Albert > > Cheers, > Albert > > > Best regards, Adam. > > > > Am 03.03.2018 um 20:29 schrieb Albert Astals Cid: > > > Ha! didn't realize you had already done all that, will read/answer the > > > other email. > > > > > > Cheers, > > > > > > Albert > > > > > > El dissabte, 3 de març de 2018, a les 20:25:32 CET, Albert Astals Cid va > > > > > > escriure: > > >> El dimecres, 21 de febrer de 2018, a les 7:13:57 CET, Adam Reichold va > > >> > > >> escriure: > > >>> Hello again, > > >>> > > >>> Am 21.02.2018 um 00:31 schrieb Albert Astals Cid: > > >>>> El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold va > > >>>> > > >>>> escriure: > > >>>>> Hello again, > > >>>>> > > >>>>> Am 18.02.2018 um 23:23 schrieb Adam Reichold: > > >>>>>> Am 18.02.2018 um 23:08 schrieb Albert Astals Cid: > > >>>>>>> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam > > >>>>>>> Reichold > > >>>>>>> va > > >>>>>>> > > >>>>>>> escriure: > > >>>>>>>> Hello, > > >>>>>>>> > > >>>>>>>> the attached patch replaced Poppler's internal hash table > > >>>>>>>> implementation > > >>>>>>>> GooHash by std::unordered_map found in the C++ standard library > > >>>>>>>> since > > >>>>>>>> C++11. This continues Poppler's slow drift towards standard > > >>>>>>>> library > > >>>>>>>> containers and removes one of the home-grown data structures with > > >>>>>>>> the > > >>>>>>>> main goals of reducing code size and improving the long term > > >>>>>>>> maintainability of the code base. > > >>>>>>> > > >>>>>>> Do you have any benchmarks on "rendering" speed and code size? > > >>>>>> > > >>>>>> Sorry, not at hand. I will try to prepare them during the week. > > >>>>> > > >>>>> I did run Splash rendering benchmarks of this branch against master > > >>>>> with > > >>>>> the result of rendering the circa 2400 documents of the TeXLive > > >>>> > > >>>>> documentation present on my machine being: > > >>>> I'm wondering if those 2400 documents are diverse enough, which they > > >>>> may > > >>>> not be given they are all coming from "the same place". > > >>> > > >>> They seem pretty diverse w.r.t. content, some being text heavy and > > >>> some > > >>> graphics rich. But I guess they are definitely not diverse technically > > >>> as all are prepared using TeX itself. > > >>> > > >>> The main problem on my side is that I have failed to find my DVD copy > > >>> of > > >>> the Poppler regtest document collection until now. :-\ In any case, I > > >>> am > > >>> reasonably confident on the zero sum result since GooHash does not > > >>> seem > > >>> to live in any performance critical code path. > > >>> > > >>>>> Cumulative run time: > > >>>>> Result: 90.95 min ∓ 1.1 % > > >>>>> Reference: 91.57 min ∓ 1.2 % > > >>>>> Deviation: -0.0 % > > >>>>> > > >>>>> Cumulative memory usage: > > >>>>> Result: 37.2 MB ∓ 0.7 % > > >>>>> Reference: 37.0 MB ∓ 0.7 % > > >>>>> Deviation: +0.0 % > > >>>>> > > >>>>> (Where result is this patch and the reference is master.) (The > > >>>>> measurement was taken using the perftest script which I proposed > > >>>>> here > > >>>>> some time ago for which I'll attach the patch again for > > >>>>> reproduceability.) > > >>>> > > >>>> Did you really attach this before? i really don't remember it :D > > >>> > > >>> This was a very long-winded thread ending on 2016-01-01 centered > > >>> around > > >>> the regtest framework. It went from custom driver using QTest's > > >>> benchmark functionality to custom backend in the regtest framework to > > >>> separate Python script. The script still has problems to reliably > > >>> measure really short things like running "pdftotext" for which a > > >>> custom > > >>> C++ driver might be needed, but for long-running stuff like > > >>> "pdftoppm", > > >>> the results seem reasonable. > > >>> > > >>>> Anyhow it seems the change is mostly a zero-sum runtime wise. > > >>> > > >>> Indeed. (Although thinking really really long term, I think a Poppler > > >>> that is using std::vector, std::string and friends everywhere, could > > >>> loose a lot of distributed fat in the form of a lot of memory > > >>> indirections and benefit from the highly optimized standard library. > > >>> But > > >>> it just is not a single patch thing to reap any of these benefits.) > > >>> > > >>>> Which bring us to the code side. First i know it sucks but your > > >>>> spacing > > >>>> is > > >>>> broken, please check the whole patch > > >>>> > > >>>> - nameToGID->removeInt(macGlyphNames[j]); > > >>>> - nameToGID->add(new GooString(macGlyphNames[j]), i); > > >>>> + nameToGID[macGlyphNames[j]] = i; > > >>>> > > >>>> i could fix it, but it's better (for me) if you do :D > > >>> > > >>> Definitely for me to fix. The main problem is that the spacing in > > >>> those > > >>> files was already broken and I am unsure whether I should fix the > > >>> surrounding spacing even if I am not touching it otherwise. Due to > > >>> that, > > >>> there sometimes is no correct way in the current patch. If you do not > > >>> say otherwise, I will try to make at least the touched blocks of code > > >>> consistent. > > >> > > >> Are you sure the spacing is broken? I'd say it's just xpdf weird > > >> spacing > > >> rules of using 2 soft spaces and then hard tabs at 8. > > >> > > >>>> Now onto the code, > > >>>> > > >>>> const auto emplaceRangeMap = [&](const char* encodingName, GBool > > >>>> unicodeOut,> > > >>>> > > >>>> UnicodeMapRange* ranges, int len) { > > >>>> > > >>>> residentUnicodeMaps.emplace( > > >>>> > > >>>> std::piecewise_construct, > > >>>> std::forward_as_tuple(encodingName), > > >>>> std::forward_as_tuple(encodingName, unicodeOut, ranges, len) > > >>>> > > >>>> ); > > >>>> > > >>>> }; > > >>>> > > >>>> It's the first time in my life i see std::piecewise_construct and > > >>>> std::forward_as_tuple, yes that probably makes me a bad C++ > > >>>> developer, > > >>>> but > > >>>> given there's lots of us around, it doesn't make me happy now i don't > > >>>> understand what the code does. > > >>> > > >>> The problem is that most internal Poppler types lack proper > > >>> construction > > >>> and assignment operators, hence I need to work harder to construct > > >>> those > > >>> UnicodeMap instances in-place inside the map (GooHash just stored a > > >>> pointer to it, so there was no problem.) > > >>> > > >>> The whole piecewise_construct and forward_as_tuple dance just ensures, > > >>> that those parameters to emplace are properly grouped before being > > >>> unpacked to become the parameters of std::string and UnicodeMap > > >>> construction again. If UnicodeMap was movable, this would probably > > >>> look > > >>> like > > >>> > > >>> residentUnicodeMaps.emplace( > > >>> > > >>> encodingName, > > >>> UnicodeMap{encodingName, unicodeOut, ranges, len} > > >>> > > >>> ); > > >>> > > >>> If you like, I can try to make Unicode a move-only type and simplify > > >>> the > > >>> mentioned helper functions? > > >> > > >> you can give it a try, not sure how easy that's going to be. > > >> > > >>>> Then there's the benefit/risk ratio. The code using GooHash is mostly > > >>>> rocksolid and we haven't probably touched any line in it for a long > > >>>> time > > >>>> and we have probably neither written new code that uses GooHash. > > >>>> > > >>>> In that regard we risk breaking working code. > > >>>> > > >>>> The benefit is not more speed nor less memory usage as your > > >>>> measurements > > >>>> show. > > >>>> > > >>>> Mostly the benefit is "removing code from maintainership", which i > > >>>> agree > > >>>> is a good thing, but as mentioned before it's some code "we mostly > > >>>> ignore", so it's not that much of a good thing. > > >>> > > >>> I very much agree with the risk assessment. > > >>> > > >>> But I also think the code will ossify (or maybe already is?) due to > > >>> those custom data structures and the less than idiomatic C++ usage. > > >>> Hence I think, Poppler would not just loose code, but the remaining > > >>> code > > >>> should become easier to maintain. (Of course, the piecewise_construct > > >>> fiasco shows that this has intermediate costs. But again, I think this > > >>> is just an incentive to provide types with the usual C++ semantics > > >>> which > > >>> should make all code using those types better.) > > >> > > >> Yeah, i guess you're right. > > >> > > >> Cheers, > > >> > > >> Albert > > >>> > > >>> Best regards, Adam. > > >>> > > >>>> So i'm hesitant of what to do. It really sounds like a nice thing to > > >>>> not > > >>>> have custom structures, but on the other hand it's not like they get > > >>>> much > > >>>> in the way of coding. > > >>>> > > >>>> I'd really appreciate other people's comments on this. > > >>>> > > >>>> Cheers, > > >>>> > > >>>> Albert > > >>>>> > > >>>>> I'll also attach the detailed comparison, but the gist seems to be > > >>>>> that > > >>>>> if there are significant changes, the run time is reduced but the > > >>>>> memory > > >>>>> usage is increased in the majority of cases. But this does not seem > > >>>>> to > > >>>>> show up in the cumulative results. > > >>>>> > > >>>>> Best regards, Adam. > > >>>>> > > >>>>> P.S.: One could try to improve the memory usage by tuning the load > > >>>>> factor or calling shrink_to_fit where appropriate. Would you like me > > >>>>> to > > >>>>> try to do this? > > >>>>> > > >>>>> P.P.S.: One obvious area for improvement would be better > > >>>>> interoperability between GooString and std::string, i.e. not > > >>>>> converting > > >>>>> them as C strings so that the length information does not need to be > > >>>>> recomputed. I will try to prepare this as a separate patch on top of > > >>>>> this one or should I include this here? > > >>>>> > > >>>>> Best regards, Adam. > > >>>>> > > >>>>>> Concerning code size, a release build of libpoppler.so goes from > > >>>>>> > > >>>>>> text data bss dec hex filename > > >>>>>> > > >>>>>> 2464034 288852 360 2753246 2a02de libpoppler.so.73.0.0 > > >>>>>> > > >>>>>> for the current master to > > >>>>>> > > >>>>>> text data bss dec hex filename > > >>>>>> > > >>>>>> 2482129 288756 360 2771245 2a492d libpoppler.so.73.0.0 > > >>>>>> > > >>>>>> with the patch applied, i.e. a 0.65% increase in binary size. > > >>>>>> > > >>>>>> > > >>>>>> Please note that in my original message, I was referring only to > > >>>>>> source > > >>>>>> code size, i.e. > > >>>>>> > > >>>>>> git diff --stat master...remove-goo-hash > > >>>>>> > > >>>>>> 18 files changed, 168 insertions(+), 803 deletions(-) > > >>>>>> > > >>>>>>> Cheers, > > >>>>>>> > > >>>>>>> Albert > > >>>>>> > > >>>>>> Best regards, Adam. > > >>>>>> > > >>>>>>>> Best regards, Adam. > > >>>>>>> > > >>>>>>> _______________________________________________ > > >>>>>>> poppler mailing list > > >>>>>>> poppler@lists.freedesktop.org > > >>>>>>> https://lists.freedesktop.org/mailman/listinfo/poppler > > >>>>>> > > >>>>>> _______________________________________________ > > >>>>>> poppler mailing list > > >>>>>> poppler@lists.freedesktop.org > > >>>>>> https://lists.freedesktop.org/mailman/listinfo/poppler > > >>>> > > >>>> _______________________________________________ > > >>>> poppler mailing list > > >>>> poppler@lists.freedesktop.org > > >>>> https://lists.freedesktop.org/mailman/listinfo/poppler > > >> > > >> _______________________________________________ > > >> poppler mailing list > > >> poppler@lists.freedesktop.org > > >> https://lists.freedesktop.org/mailman/listinfo/poppler > > > > > > _______________________________________________ > > > poppler mailing list > > > poppler@lists.freedesktop.org > > > https://lists.freedesktop.org/mailman/listinfo/poppler > > _______________________________________________ > poppler mailing list > poppler@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/poppler _______________________________________________ poppler mailing list poppler@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/poppler