Re: [poppler] [RFC] Replace GooHash by std::unordered_map

2018-05-02 Thread Albert Astals Cid
El dimecres, 2 de maig de 2018, a les 19:11:19 CEST, Adam Reichold va 
escriure:
> Hello,
> 
> Am 01.05.2018 um 23:57 schrieb Albert Astals Cid:
> > El diumenge, 4 de març de 2018, a les 22:59:20 CEST, Albert Astals Cid va
> > 
> > escriure:
> >> El dissabte, 3 de març de 2018, a les 20:51:55 CET, Adam Reichold va
> > 
> > escriure:
> >>> Hello Albert,
> >>> 
> >>> please note that in the previous patch, I made a mistake with the move
> >>> constructor (the ref count should not be moved/swapped since it is tied
> >>> to the instance, not its value). Attached is a fixed patch.
> >>> 
> >>> I found the issue during further performance tests which however only
> >>> continued to strengthen to zero sum result. In any case, I extend the
> >>> Python-based perftest script with a C++-based driver to be able to
> >>> reliably measure single-page runtimes for this, which I will also attach
> >>> if anyone is interested in this.
> >> 
> >> Ok, so it seems there's a mild agreement (or at least no strong
> >> disagreement) that this may be the way forward. I will first wait to get
> >> poppler 0.63 out of the door (that is already late several months,
> >> between
> >> freedesktop shutting down ssh access, them forgetting to tell us it was
> >> open again, my disk dying and then me procastinating as usual) and once
> >> it's out I'll come back and reevaluate this.
> > 
> > I was trying to apply the patch for running a quick regtest but running
> > git am over it gave me some warning/error I wasn't sure how to get out of
> > correctly.

regtest results are good, i guess i'll have a second quick look at the patches 
and unless someone disagrees i'll push them.

Cheers,
  Albert

> > 
> > Could you try rebasing the patch on top of current master?
> > 
> > Cheers,
> > 
> >   Albert
> 
> Attached is a version rebased against the current master.
> 
> Best regards,
> Adam
> 
> >> Cheers,
> >> 
> >>   Albert
> >>> 
> >>> Best regards, Adam.
> >>> 
> >>> Am 03.03.2018 um 20:29 schrieb Albert Astals Cid:
>  Ha! didn't realize you had already done all that, will read/answer the
>  other email.
>  
>  Cheers,
>  
>    Albert
>  
>  El dissabte, 3 de març de 2018, a les 20:25:32 CET, Albert Astals Cid
>  va
>  
>  escriure:
> > El dimecres, 21 de febrer de 2018, a les 7:13:57 CET, Adam Reichold va
> > 
> > escriure:
> >> Hello again,
> >> 
> >> Am 21.02.2018 um 00:31 schrieb Albert Astals Cid:
> >>> El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold
> >>> va
> >>> 
> >>> escriure:
>  Hello again,
>  
>  Am 18.02.2018 um 23:23 schrieb Adam Reichold:
> > Am 18.02.2018 um 23:08 schrieb Albert Astals Cid:
> >> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam
> >> Reichold
> >> va
> >> 
> >> escriure:
> >>> Hello,
> >>> 
> >>> the attached patch replaced Poppler's internal hash table
> >>> implementation
> >>> GooHash by std::unordered_map found in the C++ standard library
> >>> since
> >>> C++11. This continues Poppler's slow drift towards standard
> >>> library
> >>> containers and removes one of the home-grown data structures
> >>> with
> >>> the
> >>> main goals of reducing code size and improving the long term
> >>> maintainability of the code base.
> >> 
> >> Do you have any benchmarks on "rendering" speed and code size?
> > 
> > Sorry, not at hand. I will try to prepare them during the week.
>  
>  I did run Splash rendering benchmarks of this branch against master
>  with
>  the result of rendering the circa 2400 documents of the TeXLive
> >>> 
>  documentation present on my machine being:
> >>> I'm wondering if those 2400 documents are diverse enough, which they
> >>> may
> >>> not be given they are all coming from "the same place".
> >> 
> >> They seem pretty diverse w.r.t. content, some being text heavy and
> >> some
> >> graphics rich. But I guess they are definitely not diverse
> >> technically
> >> as all are prepared using TeX itself.
> >> 
> >> The main problem on my side is that I have failed to find my DVD copy
> >> of
> >> the Poppler regtest document collection until now. :-\ In any case, I
> >> am
> >> reasonably confident on the zero sum result since GooHash does not
> >> seem
> >> to live in any performance critical code path.
> >> 
>  Cumulative run time:
>  Result: 90.95 min ∓ 1.1 %
>  Reference: 91.57 min ∓ 1.2 %
>  Deviation: -0.0 %
>  
>  Cumulative memory usage:
>  Result: 37.2 MB ∓ 0.7 %
>  Reference: 37.0 MB ∓ 0.7 %
>  Deviation: +0.0 %
>  
>  

Re: [poppler] [RFC] Replace GooHash by std::unordered_map

2018-05-02 Thread Adam Reichold
Hello,

Am 01.05.2018 um 23:57 schrieb Albert Astals Cid:
> El diumenge, 4 de març de 2018, a les 22:59:20 CEST, Albert Astals Cid va 
> escriure:
>> El dissabte, 3 de març de 2018, a les 20:51:55 CET, Adam Reichold va 
> escriure:
>>> Hello Albert,
>>>
>>> please note that in the previous patch, I made a mistake with the move
>>> constructor (the ref count should not be moved/swapped since it is tied
>>> to the instance, not its value). Attached is a fixed patch.
>>>
>>> I found the issue during further performance tests which however only
>>> continued to strengthen to zero sum result. In any case, I extend the
>>> Python-based perftest script with a C++-based driver to be able to
>>> reliably measure single-page runtimes for this, which I will also attach
>>> if anyone is interested in this.
>>
>> Ok, so it seems there's a mild agreement (or at least no strong
>> disagreement) that this may be the way forward. I will first wait to get
>> poppler 0.63 out of the door (that is already late several months, between
>> freedesktop shutting down ssh access, them forgetting to tell us it was
>> open again, my disk dying and then me procastinating as usual) and once
>> it's out I'll come back and reevaluate this.
> 
> I was trying to apply the patch for running a quick regtest but running git 
> am 
> over it gave me some warning/error I wasn't sure how to get out of correctly.
> 
> Could you try rebasing the patch on top of current master?
> 
> Cheers,
>   Albert

Attached is a version rebased against the current master.

Best regards,
Adam

>>
>> Cheers,
>>   Albert
>>
>>> Best regards, Adam.
>>>
>>> Am 03.03.2018 um 20:29 schrieb Albert Astals Cid:
 Ha! didn't realize you had already done all that, will read/answer the
 other email.

 Cheers,

   Albert

 El dissabte, 3 de març de 2018, a les 20:25:32 CET, Albert Astals Cid va

 escriure:
> El dimecres, 21 de febrer de 2018, a les 7:13:57 CET, Adam Reichold va
>
> escriure:
>> Hello again,
>>
>> Am 21.02.2018 um 00:31 schrieb Albert Astals Cid:
>>> El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold va
>>>
>>> escriure:
 Hello again,

 Am 18.02.2018 um 23:23 schrieb Adam Reichold:
> Am 18.02.2018 um 23:08 schrieb Albert Astals Cid:
>> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam
>> Reichold
>> va
>>
>> escriure:
>>> Hello,
>>>
>>> the attached patch replaced Poppler's internal hash table
>>> implementation
>>> GooHash by std::unordered_map found in the C++ standard library
>>> since
>>> C++11. This continues Poppler's slow drift towards standard
>>> library
>>> containers and removes one of the home-grown data structures with
>>> the
>>> main goals of reducing code size and improving the long term
>>> maintainability of the code base.
>>
>> Do you have any benchmarks on "rendering" speed and code size?
>
> Sorry, not at hand. I will try to prepare them during the week.

 I did run Splash rendering benchmarks of this branch against master
 with
 the result of rendering the circa 2400 documents of the TeXLive
>>>
 documentation present on my machine being:
>>> I'm wondering if those 2400 documents are diverse enough, which they
>>> may
>>> not be given they are all coming from "the same place".
>>
>> They seem pretty diverse w.r.t. content, some being text heavy and
>> some
>> graphics rich. But I guess they are definitely not diverse technically
>> as all are prepared using TeX itself.
>>
>> The main problem on my side is that I have failed to find my DVD copy
>> of
>> the Poppler regtest document collection until now. :-\ In any case, I
>> am
>> reasonably confident on the zero sum result since GooHash does not
>> seem
>> to live in any performance critical code path.
>>
 Cumulative run time:
 Result: 90.95 min ∓ 1.1 %
 Reference: 91.57 min ∓ 1.2 %
 Deviation: -0.0 %

 Cumulative memory usage:
 Result: 37.2 MB ∓ 0.7 %
 Reference: 37.0 MB ∓ 0.7 %
 Deviation: +0.0 %

 (Where result is this patch and the reference is master.) (The
 measurement was taken using the perftest script which I proposed
 here
 some time ago for which I'll attach the patch again for
 reproduceability.)
>>>
>>> Did you really attach this before? i really don't remember it :D
>>
>> This was a very long-winded thread ending on 2016-01-01 centered
>> around
>> the regtest framework. It went from custom driver using QTest's
>> benchmark functionality to custom backend in the regtest