Re: [poppler] [RFC] Replace GooHash by std::unordered_map

2018-05-06 Thread Albert Astals Cid
El dijous, 3 de maig de 2018, a les 0:57:21 CEST, Albert Astals Cid va 
escriure:
> El dimecres, 2 de maig de 2018, a les 19:11:19 CEST, Adam Reichold va
> 
> escriure:
> > Hello,
> > 
> > Am 01.05.2018 um 23:57 schrieb Albert Astals Cid:
> > > El diumenge, 4 de març de 2018, a les 22:59:20 CEST, Albert Astals Cid
> > > va
> > > 
> > > escriure:
> > >> El dissabte, 3 de març de 2018, a les 20:51:55 CET, Adam Reichold va
> > > 
> > > escriure:
> > >>> Hello Albert,
> > >>> 
> > >>> please note that in the previous patch, I made a mistake with the move
> > >>> constructor (the ref count should not be moved/swapped since it is
> > >>> tied
> > >>> to the instance, not its value). Attached is a fixed patch.
> > >>> 
> > >>> I found the issue during further performance tests which however only
> > >>> continued to strengthen to zero sum result. In any case, I extend the
> > >>> Python-based perftest script with a C++-based driver to be able to
> > >>> reliably measure single-page runtimes for this, which I will also
> > >>> attach
> > >>> if anyone is interested in this.
> > >> 
> > >> Ok, so it seems there's a mild agreement (or at least no strong
> > >> disagreement) that this may be the way forward. I will first wait to
> > >> get
> > >> poppler 0.63 out of the door (that is already late several months,
> > >> between
> > >> freedesktop shutting down ssh access, them forgetting to tell us it was
> > >> open again, my disk dying and then me procastinating as usual) and once
> > >> it's out I'll come back and reevaluate this.
> > > 
> > > I was trying to apply the patch for running a quick regtest but running
> > > git am over it gave me some warning/error I wasn't sure how to get out
> > > of
> > > correctly.
> 
> regtest results are good, i guess i'll have a second quick look at the
> patches and unless someone disagrees i'll push them.

It's in :)

Cheers,
  Albert


___
poppler mailing list
poppler@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/poppler


Re: [poppler] [RFC] Replace GooHash by std::unordered_map

2018-05-02 Thread Albert Astals Cid
El dimecres, 2 de maig de 2018, a les 19:11:19 CEST, Adam Reichold va 
escriure:
> Hello,
> 
> Am 01.05.2018 um 23:57 schrieb Albert Astals Cid:
> > El diumenge, 4 de març de 2018, a les 22:59:20 CEST, Albert Astals Cid va
> > 
> > escriure:
> >> El dissabte, 3 de març de 2018, a les 20:51:55 CET, Adam Reichold va
> > 
> > escriure:
> >>> Hello Albert,
> >>> 
> >>> please note that in the previous patch, I made a mistake with the move
> >>> constructor (the ref count should not be moved/swapped since it is tied
> >>> to the instance, not its value). Attached is a fixed patch.
> >>> 
> >>> I found the issue during further performance tests which however only
> >>> continued to strengthen to zero sum result. In any case, I extend the
> >>> Python-based perftest script with a C++-based driver to be able to
> >>> reliably measure single-page runtimes for this, which I will also attach
> >>> if anyone is interested in this.
> >> 
> >> Ok, so it seems there's a mild agreement (or at least no strong
> >> disagreement) that this may be the way forward. I will first wait to get
> >> poppler 0.63 out of the door (that is already late several months,
> >> between
> >> freedesktop shutting down ssh access, them forgetting to tell us it was
> >> open again, my disk dying and then me procastinating as usual) and once
> >> it's out I'll come back and reevaluate this.
> > 
> > I was trying to apply the patch for running a quick regtest but running
> > git am over it gave me some warning/error I wasn't sure how to get out of
> > correctly.

regtest results are good, i guess i'll have a second quick look at the patches 
and unless someone disagrees i'll push them.

Cheers,
  Albert

> > 
> > Could you try rebasing the patch on top of current master?
> > 
> > Cheers,
> > 
> >   Albert
> 
> Attached is a version rebased against the current master.
> 
> Best regards,
> Adam
> 
> >> Cheers,
> >> 
> >>   Albert
> >>> 
> >>> Best regards, Adam.
> >>> 
> >>> Am 03.03.2018 um 20:29 schrieb Albert Astals Cid:
>  Ha! didn't realize you had already done all that, will read/answer the
>  other email.
>  
>  Cheers,
>  
>    Albert
>  
>  El dissabte, 3 de març de 2018, a les 20:25:32 CET, Albert Astals Cid
>  va
>  
>  escriure:
> > El dimecres, 21 de febrer de 2018, a les 7:13:57 CET, Adam Reichold va
> > 
> > escriure:
> >> Hello again,
> >> 
> >> Am 21.02.2018 um 00:31 schrieb Albert Astals Cid:
> >>> El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold
> >>> va
> >>> 
> >>> escriure:
>  Hello again,
>  
>  Am 18.02.2018 um 23:23 schrieb Adam Reichold:
> > Am 18.02.2018 um 23:08 schrieb Albert Astals Cid:
> >> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam
> >> Reichold
> >> va
> >> 
> >> escriure:
> >>> Hello,
> >>> 
> >>> the attached patch replaced Poppler's internal hash table
> >>> implementation
> >>> GooHash by std::unordered_map found in the C++ standard library
> >>> since
> >>> C++11. This continues Poppler's slow drift towards standard
> >>> library
> >>> containers and removes one of the home-grown data structures
> >>> with
> >>> the
> >>> main goals of reducing code size and improving the long term
> >>> maintainability of the code base.
> >> 
> >> Do you have any benchmarks on "rendering" speed and code size?
> > 
> > Sorry, not at hand. I will try to prepare them during the week.
>  
>  I did run Splash rendering benchmarks of this branch against master
>  with
>  the result of rendering the circa 2400 documents of the TeXLive
> >>> 
>  documentation present on my machine being:
> >>> I'm wondering if those 2400 documents are diverse enough, which they
> >>> may
> >>> not be given they are all coming from "the same place".
> >> 
> >> They seem pretty diverse w.r.t. content, some being text heavy and
> >> some
> >> graphics rich. But I guess they are definitely not diverse
> >> technically
> >> as all are prepared using TeX itself.
> >> 
> >> The main problem on my side is that I have failed to find my DVD copy
> >> of
> >> the Poppler regtest document collection until now. :-\ In any case, I
> >> am
> >> reasonably confident on the zero sum result since GooHash does not
> >> seem
> >> to live in any performance critical code path.
> >> 
>  Cumulative run time:
>  Result: 90.95 min ∓ 1.1 %
>  Reference: 91.57 min ∓ 1.2 %
>  Deviation: -0.0 %
>  
>  Cumulative memory usage:
>  Result: 37.2 MB ∓ 0.7 %
>  Reference: 37.0 MB ∓ 0.7 %
>  Deviation: +0.0 %
>  
>  

Re: [poppler] [RFC] Replace GooHash by std::unordered_map

2018-05-02 Thread Adam Reichold
Hello,

Am 01.05.2018 um 23:57 schrieb Albert Astals Cid:
> El diumenge, 4 de març de 2018, a les 22:59:20 CEST, Albert Astals Cid va 
> escriure:
>> El dissabte, 3 de març de 2018, a les 20:51:55 CET, Adam Reichold va 
> escriure:
>>> Hello Albert,
>>>
>>> please note that in the previous patch, I made a mistake with the move
>>> constructor (the ref count should not be moved/swapped since it is tied
>>> to the instance, not its value). Attached is a fixed patch.
>>>
>>> I found the issue during further performance tests which however only
>>> continued to strengthen to zero sum result. In any case, I extend the
>>> Python-based perftest script with a C++-based driver to be able to
>>> reliably measure single-page runtimes for this, which I will also attach
>>> if anyone is interested in this.
>>
>> Ok, so it seems there's a mild agreement (or at least no strong
>> disagreement) that this may be the way forward. I will first wait to get
>> poppler 0.63 out of the door (that is already late several months, between
>> freedesktop shutting down ssh access, them forgetting to tell us it was
>> open again, my disk dying and then me procastinating as usual) and once
>> it's out I'll come back and reevaluate this.
> 
> I was trying to apply the patch for running a quick regtest but running git 
> am 
> over it gave me some warning/error I wasn't sure how to get out of correctly.
> 
> Could you try rebasing the patch on top of current master?
> 
> Cheers,
>   Albert

Attached is a version rebased against the current master.

Best regards,
Adam

>>
>> Cheers,
>>   Albert
>>
>>> Best regards, Adam.
>>>
>>> Am 03.03.2018 um 20:29 schrieb Albert Astals Cid:
 Ha! didn't realize you had already done all that, will read/answer the
 other email.

 Cheers,

   Albert

 El dissabte, 3 de març de 2018, a les 20:25:32 CET, Albert Astals Cid va

 escriure:
> El dimecres, 21 de febrer de 2018, a les 7:13:57 CET, Adam Reichold va
>
> escriure:
>> Hello again,
>>
>> Am 21.02.2018 um 00:31 schrieb Albert Astals Cid:
>>> El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold va
>>>
>>> escriure:
 Hello again,

 Am 18.02.2018 um 23:23 schrieb Adam Reichold:
> Am 18.02.2018 um 23:08 schrieb Albert Astals Cid:
>> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam
>> Reichold
>> va
>>
>> escriure:
>>> Hello,
>>>
>>> the attached patch replaced Poppler's internal hash table
>>> implementation
>>> GooHash by std::unordered_map found in the C++ standard library
>>> since
>>> C++11. This continues Poppler's slow drift towards standard
>>> library
>>> containers and removes one of the home-grown data structures with
>>> the
>>> main goals of reducing code size and improving the long term
>>> maintainability of the code base.
>>
>> Do you have any benchmarks on "rendering" speed and code size?
>
> Sorry, not at hand. I will try to prepare them during the week.

 I did run Splash rendering benchmarks of this branch against master
 with
 the result of rendering the circa 2400 documents of the TeXLive
>>>
 documentation present on my machine being:
>>> I'm wondering if those 2400 documents are diverse enough, which they
>>> may
>>> not be given they are all coming from "the same place".
>>
>> They seem pretty diverse w.r.t. content, some being text heavy and
>> some
>> graphics rich. But I guess they are definitely not diverse technically
>> as all are prepared using TeX itself.
>>
>> The main problem on my side is that I have failed to find my DVD copy
>> of
>> the Poppler regtest document collection until now. :-\ In any case, I
>> am
>> reasonably confident on the zero sum result since GooHash does not
>> seem
>> to live in any performance critical code path.
>>
 Cumulative run time:
 Result: 90.95 min ∓ 1.1 %
 Reference: 91.57 min ∓ 1.2 %
 Deviation: -0.0 %

 Cumulative memory usage:
 Result: 37.2 MB ∓ 0.7 %
 Reference: 37.0 MB ∓ 0.7 %
 Deviation: +0.0 %

 (Where result is this patch and the reference is master.) (The
 measurement was taken using the perftest script which I proposed
 here
 some time ago for which I'll attach the patch again for
 reproduceability.)
>>>
>>> Did you really attach this before? i really don't remember it :D
>>
>> This was a very long-winded thread ending on 2016-01-01 centered
>> around
>> the regtest framework. It went from custom driver using QTest's
>> benchmark functionality to custom backend in the regtest 

Re: [poppler] [RFC] Replace GooHash by std::unordered_map

2018-05-01 Thread Albert Astals Cid
El diumenge, 4 de març de 2018, a les 22:59:20 CEST, Albert Astals Cid va 
escriure:
> El dissabte, 3 de març de 2018, a les 20:51:55 CET, Adam Reichold va 
escriure:
> > Hello Albert,
> > 
> > please note that in the previous patch, I made a mistake with the move
> > constructor (the ref count should not be moved/swapped since it is tied
> > to the instance, not its value). Attached is a fixed patch.
> > 
> > I found the issue during further performance tests which however only
> > continued to strengthen to zero sum result. In any case, I extend the
> > Python-based perftest script with a C++-based driver to be able to
> > reliably measure single-page runtimes for this, which I will also attach
> > if anyone is interested in this.
> 
> Ok, so it seems there's a mild agreement (or at least no strong
> disagreement) that this may be the way forward. I will first wait to get
> poppler 0.63 out of the door (that is already late several months, between
> freedesktop shutting down ssh access, them forgetting to tell us it was
> open again, my disk dying and then me procastinating as usual) and once
> it's out I'll come back and reevaluate this.

I was trying to apply the patch for running a quick regtest but running git am 
over it gave me some warning/error I wasn't sure how to get out of correctly.

Could you try rebasing the patch on top of current master?

Cheers,
  Albert

> 
> Cheers,
>   Albert
> 
> > Best regards, Adam.
> > 
> > Am 03.03.2018 um 20:29 schrieb Albert Astals Cid:
> > > Ha! didn't realize you had already done all that, will read/answer the
> > > other email.
> > > 
> > > Cheers,
> > > 
> > >   Albert
> > > 
> > > El dissabte, 3 de març de 2018, a les 20:25:32 CET, Albert Astals Cid va
> > > 
> > > escriure:
> > >> El dimecres, 21 de febrer de 2018, a les 7:13:57 CET, Adam Reichold va
> > >> 
> > >> escriure:
> > >>> Hello again,
> > >>> 
> > >>> Am 21.02.2018 um 00:31 schrieb Albert Astals Cid:
> >  El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold va
> >  
> >  escriure:
> > > Hello again,
> > > 
> > > Am 18.02.2018 um 23:23 schrieb Adam Reichold:
> > >> Am 18.02.2018 um 23:08 schrieb Albert Astals Cid:
> > >>> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam
> > >>> Reichold
> > >>> va
> > >>> 
> > >>> escriure:
> >  Hello,
> >  
> >  the attached patch replaced Poppler's internal hash table
> >  implementation
> >  GooHash by std::unordered_map found in the C++ standard library
> >  since
> >  C++11. This continues Poppler's slow drift towards standard
> >  library
> >  containers and removes one of the home-grown data structures with
> >  the
> >  main goals of reducing code size and improving the long term
> >  maintainability of the code base.
> > >>> 
> > >>> Do you have any benchmarks on "rendering" speed and code size?
> > >> 
> > >> Sorry, not at hand. I will try to prepare them during the week.
> > > 
> > > I did run Splash rendering benchmarks of this branch against master
> > > with
> > > the result of rendering the circa 2400 documents of the TeXLive
> >  
> > > documentation present on my machine being:
> >  I'm wondering if those 2400 documents are diverse enough, which they
> >  may
> >  not be given they are all coming from "the same place".
> > >>> 
> > >>> They seem pretty diverse w.r.t. content, some being text heavy and
> > >>> some
> > >>> graphics rich. But I guess they are definitely not diverse technically
> > >>> as all are prepared using TeX itself.
> > >>> 
> > >>> The main problem on my side is that I have failed to find my DVD copy
> > >>> of
> > >>> the Poppler regtest document collection until now. :-\ In any case, I
> > >>> am
> > >>> reasonably confident on the zero sum result since GooHash does not
> > >>> seem
> > >>> to live in any performance critical code path.
> > >>> 
> > > Cumulative run time:
> > > Result: 90.95 min ∓ 1.1 %
> > > Reference: 91.57 min ∓ 1.2 %
> > > Deviation: -0.0 %
> > > 
> > > Cumulative memory usage:
> > > Result: 37.2 MB ∓ 0.7 %
> > > Reference: 37.0 MB ∓ 0.7 %
> > > Deviation: +0.0 %
> > > 
> > > (Where result is this patch and the reference is master.) (The
> > > measurement was taken using the perftest script which I proposed
> > > here
> > > some time ago for which I'll attach the patch again for
> > > reproduceability.)
> >  
> >  Did you really attach this before? i really don't remember it :D
> > >>> 
> > >>> This was a very long-winded thread ending on 2016-01-01 centered
> > >>> around
> > >>> the regtest framework. It went from custom driver using QTest's
> > >>> benchmark functionality to custom backend in the regtest framework to
> > >>> separate Python script. The script still 

Re: [poppler] [RFC] Replace GooHash by std::unordered_map

2018-03-04 Thread Albert Astals Cid
El dissabte, 3 de març de 2018, a les 20:51:55 CET, Adam Reichold va escriure:
> Hello Albert,
> 
> please note that in the previous patch, I made a mistake with the move
> constructor (the ref count should not be moved/swapped since it is tied
> to the instance, not its value). Attached is a fixed patch.
> 
> I found the issue during further performance tests which however only
> continued to strengthen to zero sum result. In any case, I extend the
> Python-based perftest script with a C++-based driver to be able to
> reliably measure single-page runtimes for this, which I will also attach
> if anyone is interested in this.

Ok, so it seems there's a mild agreement (or at least no strong disagreement) 
that this may be the way forward. I will first wait to get poppler 0.63 out of 
the door (that is already late several months, between freedesktop shutting 
down ssh access, them forgetting to tell us it was open again, my disk dying 
and then me procastinating as usual) and once it's out I'll come back and 
reevaluate this.

Cheers,
  Albert

> 
> Best regards, Adam.
> 
> Am 03.03.2018 um 20:29 schrieb Albert Astals Cid:
> > Ha! didn't realize you had already done all that, will read/answer the
> > other email.
> > 
> > Cheers,
> > 
> >   Albert
> > 
> > El dissabte, 3 de març de 2018, a les 20:25:32 CET, Albert Astals Cid va
> > 
> > escriure:
> >> El dimecres, 21 de febrer de 2018, a les 7:13:57 CET, Adam Reichold va
> >> 
> >> escriure:
> >>> Hello again,
> >>> 
> >>> Am 21.02.2018 um 00:31 schrieb Albert Astals Cid:
>  El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold va
>  
>  escriure:
> > Hello again,
> > 
> > Am 18.02.2018 um 23:23 schrieb Adam Reichold:
> >> Am 18.02.2018 um 23:08 schrieb Albert Astals Cid:
> >>> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold
> >>> va
> >>> 
> >>> escriure:
>  Hello,
>  
>  the attached patch replaced Poppler's internal hash table
>  implementation
>  GooHash by std::unordered_map found in the C++ standard library
>  since
>  C++11. This continues Poppler's slow drift towards standard library
>  containers and removes one of the home-grown data structures with
>  the
>  main goals of reducing code size and improving the long term
>  maintainability of the code base.
> >>> 
> >>> Do you have any benchmarks on "rendering" speed and code size?
> >> 
> >> Sorry, not at hand. I will try to prepare them during the week.
> > 
> > I did run Splash rendering benchmarks of this branch against master
> > with
> > the result of rendering the circa 2400 documents of the TeXLive
>  
> > documentation present on my machine being:
>  I'm wondering if those 2400 documents are diverse enough, which they
>  may
>  not be given they are all coming from "the same place".
> >>> 
> >>> They seem pretty diverse w.r.t. content, some being text heavy and some
> >>> graphics rich. But I guess they are definitely not diverse technically
> >>> as all are prepared using TeX itself.
> >>> 
> >>> The main problem on my side is that I have failed to find my DVD copy of
> >>> the Poppler regtest document collection until now. :-\ In any case, I am
> >>> reasonably confident on the zero sum result since GooHash does not seem
> >>> to live in any performance critical code path.
> >>> 
> > Cumulative run time:
> > Result: 90.95 min ∓ 1.1 %
> > Reference: 91.57 min ∓ 1.2 %
> > Deviation: -0.0 %
> > 
> > Cumulative memory usage:
> > Result: 37.2 MB ∓ 0.7 %
> > Reference: 37.0 MB ∓ 0.7 %
> > Deviation: +0.0 %
> > 
> > (Where result is this patch and the reference is master.) (The
> > measurement was taken using the perftest script which I proposed here
> > some time ago for which I'll attach the patch again for
> > reproduceability.)
>  
>  Did you really attach this before? i really don't remember it :D
> >>> 
> >>> This was a very long-winded thread ending on 2016-01-01 centered around
> >>> the regtest framework. It went from custom driver using QTest's
> >>> benchmark functionality to custom backend in the regtest framework to
> >>> separate Python script. The script still has problems to reliably
> >>> measure really short things like running "pdftotext" for which a custom
> >>> C++ driver might be needed, but for long-running stuff like "pdftoppm",
> >>> the results seem reasonable.
> >>> 
>  Anyhow it seems the change is mostly a zero-sum runtime wise.
> >>> 
> >>> Indeed. (Although thinking really really long term, I think a Poppler
> >>> that is using std::vector, std::string and friends everywhere, could
> >>> loose a lot of distributed fat in the form of a lot of memory
> >>> indirections and benefit from the highly optimized standard library. But
> >>> it 

Re: [poppler] [RFC] Replace GooHash by std::unordered_map

2018-03-03 Thread Adam Reichold
Hello Albert,

please note that in the previous patch, I made a mistake with the move
constructor (the ref count should not be moved/swapped since it is tied
to the instance, not its value). Attached is a fixed patch.

I found the issue during further performance tests which however only
continued to strengthen to zero sum result. In any case, I extend the
Python-based perftest script with a C++-based driver to be able to
reliably measure single-page runtimes for this, which I will also attach
if anyone is interested in this.

Best regards, Adam.

Am 03.03.2018 um 20:29 schrieb Albert Astals Cid:
> Ha! didn't realize you had already done all that, will read/answer the other 
> email.
> 
> Cheers,
>   Albert
> 
> El dissabte, 3 de març de 2018, a les 20:25:32 CET, Albert Astals Cid va 
> escriure:
>> El dimecres, 21 de febrer de 2018, a les 7:13:57 CET, Adam Reichold va
>>
>> escriure:
>>> Hello again,
>>>
>>> Am 21.02.2018 um 00:31 schrieb Albert Astals Cid:
 El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold va

 escriure:
> Hello again,
>
> Am 18.02.2018 um 23:23 schrieb Adam Reichold:
>> Am 18.02.2018 um 23:08 schrieb Albert Astals Cid:
>>> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold
>>> va
>>>
>>> escriure:
 Hello,

 the attached patch replaced Poppler's internal hash table
 implementation
 GooHash by std::unordered_map found in the C++ standard library
 since
 C++11. This continues Poppler's slow drift towards standard library
 containers and removes one of the home-grown data structures with
 the
 main goals of reducing code size and improving the long term
 maintainability of the code base.
>>>
>>> Do you have any benchmarks on "rendering" speed and code size?
>>
>> Sorry, not at hand. I will try to prepare them during the week.
>
> I did run Splash rendering benchmarks of this branch against master
> with
> the result of rendering the circa 2400 documents of the TeXLive

> documentation present on my machine being:
 I'm wondering if those 2400 documents are diverse enough, which they may
 not be given they are all coming from "the same place".
>>>
>>> They seem pretty diverse w.r.t. content, some being text heavy and some
>>> graphics rich. But I guess they are definitely not diverse technically
>>> as all are prepared using TeX itself.
>>>
>>> The main problem on my side is that I have failed to find my DVD copy of
>>> the Poppler regtest document collection until now. :-\ In any case, I am
>>> reasonably confident on the zero sum result since GooHash does not seem
>>> to live in any performance critical code path.
>>>
> Cumulative run time:
> Result: 90.95 min ∓ 1.1 %
> Reference: 91.57 min ∓ 1.2 %
> Deviation: -0.0 %
>
> Cumulative memory usage:
> Result: 37.2 MB ∓ 0.7 %
> Reference: 37.0 MB ∓ 0.7 %
> Deviation: +0.0 %
>
> (Where result is this patch and the reference is master.) (The
> measurement was taken using the perftest script which I proposed here
> some time ago for which I'll attach the patch again for
> reproduceability.)

 Did you really attach this before? i really don't remember it :D
>>>
>>> This was a very long-winded thread ending on 2016-01-01 centered around
>>> the regtest framework. It went from custom driver using QTest's
>>> benchmark functionality to custom backend in the regtest framework to
>>> separate Python script. The script still has problems to reliably
>>> measure really short things like running "pdftotext" for which a custom
>>> C++ driver might be needed, but for long-running stuff like "pdftoppm",
>>> the results seem reasonable.
>>>
 Anyhow it seems the change is mostly a zero-sum runtime wise.
>>>
>>> Indeed. (Although thinking really really long term, I think a Poppler
>>> that is using std::vector, std::string and friends everywhere, could
>>> loose a lot of distributed fat in the form of a lot of memory
>>> indirections and benefit from the highly optimized standard library. But
>>> it just is not a single patch thing to reap any of these benefits.)
>>>
 Which bring us to the code side. First i know it sucks but your spacing
 is
 broken, please check the whole patch

 -  nameToGID->removeInt(macGlyphNames[j]);
 -  nameToGID->add(new GooString(macGlyphNames[j]), i);
 +  nameToGID[macGlyphNames[j]] = i;

 i could fix it, but it's better (for me) if you do :D
>>>
>>> Definitely for me to fix. The main problem is that the spacing in those
>>> files was already broken and I am unsure whether I should fix the
>>> surrounding spacing even if I am not touching it otherwise. Due to that,
>>> there sometimes is no correct way in the current patch. If you do not
>>> say otherwise, I will 

Re: [poppler] [RFC] Replace GooHash by std::unordered_map

2018-03-03 Thread Albert Astals Cid
Ha! didn't realize you had already done all that, will read/answer the other 
email.

Cheers,
  Albert

El dissabte, 3 de març de 2018, a les 20:25:32 CET, Albert Astals Cid va 
escriure:
> El dimecres, 21 de febrer de 2018, a les 7:13:57 CET, Adam Reichold va
> 
> escriure:
> > Hello again,
> > 
> > Am 21.02.2018 um 00:31 schrieb Albert Astals Cid:
> > > El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold va
> > > 
> > > escriure:
> > >> Hello again,
> > >> 
> > >> Am 18.02.2018 um 23:23 schrieb Adam Reichold:
> > >>> Am 18.02.2018 um 23:08 schrieb Albert Astals Cid:
> >  El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold
> >  va
> >  
> >  escriure:
> > > Hello,
> > > 
> > > the attached patch replaced Poppler's internal hash table
> > > implementation
> > > GooHash by std::unordered_map found in the C++ standard library
> > > since
> > > C++11. This continues Poppler's slow drift towards standard library
> > > containers and removes one of the home-grown data structures with
> > > the
> > > main goals of reducing code size and improving the long term
> > > maintainability of the code base.
> >  
> >  Do you have any benchmarks on "rendering" speed and code size?
> > >>> 
> > >>> Sorry, not at hand. I will try to prepare them during the week.
> > >> 
> > >> I did run Splash rendering benchmarks of this branch against master
> > >> with
> > >> the result of rendering the circa 2400 documents of the TeXLive
> > > 
> > >> documentation present on my machine being:
> > > I'm wondering if those 2400 documents are diverse enough, which they may
> > > not be given they are all coming from "the same place".
> > 
> > They seem pretty diverse w.r.t. content, some being text heavy and some
> > graphics rich. But I guess they are definitely not diverse technically
> > as all are prepared using TeX itself.
> > 
> > The main problem on my side is that I have failed to find my DVD copy of
> > the Poppler regtest document collection until now. :-\ In any case, I am
> > reasonably confident on the zero sum result since GooHash does not seem
> > to live in any performance critical code path.
> > 
> > >> Cumulative run time:
> > >> Result: 90.95 min ∓ 1.1 %
> > >> Reference: 91.57 min ∓ 1.2 %
> > >> Deviation: -0.0 %
> > >> 
> > >> Cumulative memory usage:
> > >> Result: 37.2 MB ∓ 0.7 %
> > >> Reference: 37.0 MB ∓ 0.7 %
> > >> Deviation: +0.0 %
> > >> 
> > >> (Where result is this patch and the reference is master.) (The
> > >> measurement was taken using the perftest script which I proposed here
> > >> some time ago for which I'll attach the patch again for
> > >> reproduceability.)
> > > 
> > > Did you really attach this before? i really don't remember it :D
> > 
> > This was a very long-winded thread ending on 2016-01-01 centered around
> > the regtest framework. It went from custom driver using QTest's
> > benchmark functionality to custom backend in the regtest framework to
> > separate Python script. The script still has problems to reliably
> > measure really short things like running "pdftotext" for which a custom
> > C++ driver might be needed, but for long-running stuff like "pdftoppm",
> > the results seem reasonable.
> > 
> > > Anyhow it seems the change is mostly a zero-sum runtime wise.
> > 
> > Indeed. (Although thinking really really long term, I think a Poppler
> > that is using std::vector, std::string and friends everywhere, could
> > loose a lot of distributed fat in the form of a lot of memory
> > indirections and benefit from the highly optimized standard library. But
> > it just is not a single patch thing to reap any of these benefits.)
> > 
> > > Which bring us to the code side. First i know it sucks but your spacing
> > > is
> > > broken, please check the whole patch
> > > 
> > > - nameToGID->removeInt(macGlyphNames[j]);
> > > - nameToGID->add(new GooString(macGlyphNames[j]), i);
> > > +  nameToGID[macGlyphNames[j]] = i;
> > > 
> > > i could fix it, but it's better (for me) if you do :D
> > 
> > Definitely for me to fix. The main problem is that the spacing in those
> > files was already broken and I am unsure whether I should fix the
> > surrounding spacing even if I am not touching it otherwise. Due to that,
> > there sometimes is no correct way in the current patch. If you do not
> > say otherwise, I will try to make at least the touched blocks of code
> > consistent.
> 
> Are you sure the spacing is broken? I'd say it's just xpdf weird spacing
> rules of using 2 soft spaces and then hard tabs at 8.
> 
> > > Now onto the code,
> > > 
> > >   const auto emplaceRangeMap = [&](const char* encodingName, GBool
> > >   unicodeOut,>
> > > 
> > > UnicodeMapRange* ranges, int len) {
> > > 
> > > residentUnicodeMaps.emplace(
> > > 
> > >   std::piecewise_construct,
> > >   std::forward_as_tuple(encodingName),
> > >   

Re: [poppler] [RFC] Replace GooHash by std::unordered_map

2018-03-03 Thread Albert Astals Cid
El dimecres, 21 de febrer de 2018, a les 7:13:57 CET, Adam Reichold va 
escriure:
> Hello again,
> 
> Am 21.02.2018 um 00:31 schrieb Albert Astals Cid:
> > El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold va
> > 
> > escriure:
> >> Hello again,
> >> 
> >> Am 18.02.2018 um 23:23 schrieb Adam Reichold:
> >>> Am 18.02.2018 um 23:08 schrieb Albert Astals Cid:
>  El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold va
>  
>  escriure:
> > Hello,
> > 
> > the attached patch replaced Poppler's internal hash table
> > implementation
> > GooHash by std::unordered_map found in the C++ standard library since
> > C++11. This continues Poppler's slow drift towards standard library
> > containers and removes one of the home-grown data structures with the
> > main goals of reducing code size and improving the long term
> > maintainability of the code base.
>  
>  Do you have any benchmarks on "rendering" speed and code size?
> >>> 
> >>> Sorry, not at hand. I will try to prepare them during the week.
> >> 
> >> I did run Splash rendering benchmarks of this branch against master with
> >> the result of rendering the circa 2400 documents of the TeXLive
> > 
> >> documentation present on my machine being:
> > I'm wondering if those 2400 documents are diverse enough, which they may
> > not be given they are all coming from "the same place".
> 
> They seem pretty diverse w.r.t. content, some being text heavy and some
> graphics rich. But I guess they are definitely not diverse technically
> as all are prepared using TeX itself.
> 
> The main problem on my side is that I have failed to find my DVD copy of
> the Poppler regtest document collection until now. :-\ In any case, I am
> reasonably confident on the zero sum result since GooHash does not seem
> to live in any performance critical code path.
> 
> >> Cumulative run time:
> >> Result: 90.95 min ∓ 1.1 %
> >> Reference: 91.57 min ∓ 1.2 %
> >> Deviation: -0.0 %
> >> 
> >> Cumulative memory usage:
> >> Result: 37.2 MB ∓ 0.7 %
> >> Reference: 37.0 MB ∓ 0.7 %
> >> Deviation: +0.0 %
> >> 
> >> (Where result is this patch and the reference is master.) (The
> >> measurement was taken using the perftest script which I proposed here
> >> some time ago for which I'll attach the patch again for
> >> reproduceability.)
> > 
> > Did you really attach this before? i really don't remember it :D
> 
> This was a very long-winded thread ending on 2016-01-01 centered around
> the regtest framework. It went from custom driver using QTest's
> benchmark functionality to custom backend in the regtest framework to
> separate Python script. The script still has problems to reliably
> measure really short things like running "pdftotext" for which a custom
> C++ driver might be needed, but for long-running stuff like "pdftoppm",
> the results seem reasonable.
> 
> > Anyhow it seems the change is mostly a zero-sum runtime wise.
> 
> Indeed. (Although thinking really really long term, I think a Poppler
> that is using std::vector, std::string and friends everywhere, could
> loose a lot of distributed fat in the form of a lot of memory
> indirections and benefit from the highly optimized standard library. But
> it just is not a single patch thing to reap any of these benefits.)
> 
> > Which bring us to the code side. First i know it sucks but your spacing is
> > broken, please check the whole patch
> > 
> > -   nameToGID->removeInt(macGlyphNames[j]);
> > -   nameToGID->add(new GooString(macGlyphNames[j]), i);
> > +  nameToGID[macGlyphNames[j]] = i;
> > 
> > i could fix it, but it's better (for me) if you do :D
> 
> Definitely for me to fix. The main problem is that the spacing in those
> files was already broken and I am unsure whether I should fix the
> surrounding spacing even if I am not touching it otherwise. Due to that,
> there sometimes is no correct way in the current patch. If you do not
> say otherwise, I will try to make at least the touched blocks of code
> consistent.

Are you sure the spacing is broken? I'd say it's just xpdf weird spacing rules 
of using 2 soft spaces and then hard tabs at 8.

> 
> > Now onto the code,
> > 
> >   const auto emplaceRangeMap = [&](const char* encodingName, GBool
> >   unicodeOut,> 
> > UnicodeMapRange* ranges, int len) {
> > 
> > residentUnicodeMaps.emplace(
> > 
> >   std::piecewise_construct,
> >   std::forward_as_tuple(encodingName),
> >   std::forward_as_tuple(encodingName, unicodeOut, ranges, len)
> > 
> > );
> >   
> >   };
> > 
> > It's the first time in my life i see std::piecewise_construct and
> > std::forward_as_tuple, yes that probably makes me a bad C++ developer, but
> > given there's lots of us around, it doesn't make me happy now i don't
> > understand what the code does.
> 
> The problem is that most internal Poppler types lack proper construction
> and assignment 

Re: [poppler] [RFC] Replace GooHash by std::unordered_map

2018-03-03 Thread Albert Astals Cid
El dimecres, 28 de febrer de 2018, a les 11:02:11 CET, Oliver Sander va 
escriure:
> On 21.02.2018 00:31, Albert Astals Cid wrote:
> > I'd really appreciate other people's comments on this.
> 
> I am generally in favour of changes like this, provided there is no
> significant increase in run-time or binary code size.  I think such changes
> do decrease long-term maintenance costs, and they make it easier for new
> developers to understand the code.  The current GooHash implementation may
> be a reasonable implementation of a hash table (I didn't check), but it is
> completely undocumented, and if I ever need to know what it does, I will
> have to read the code. On the other hand, while that emplaceRangeMap lambda
> looks strange to me on first sight too, all this modern C++ stuff is well
> described in books, online articles and blog posts.
> BTW: I always thought that changes like this couldn't be made because it
> makes backporting patches from xpdf more difficult/impossible.  Is this not
> an issue anymore?

we never got around to merge xpdf 3.04 changes and given the relatively big 
change i did regarding Object i guess we can say that porting xpdf code isn't 
really a big motivation anymore for not making improving changes.

Cheers,
  Albert

> 
> Best,
> Oliver




___
poppler mailing list
poppler@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/poppler


Re: [poppler] [RFC] Replace GooHash by std::unordered_map

2018-02-28 Thread Oliver Sander


On 21.02.2018 00:31, Albert Astals Cid wrote:

>
> I'd really appreciate other people's comments on this.

I am generally in favour of changes like this, provided there is no significant
increase in run-time or binary code size.  I think such changes do decrease
long-term maintenance costs, and they make it easier for new developers to
understand the code.  The current GooHash implementation may be a reasonable
implementation of a hash table (I didn't check), but it is completely 
undocumented,
and if I ever need to know what it does, I will have to read the code.
On the other hand, while that emplaceRangeMap lambda looks strange to me
on first sight too, all this modern C++ stuff is well described in books,
online articles and blog posts.
BTW: I always thought that changes like this couldn't be made because it makes
backporting patches from xpdf more difficult/impossible.  Is this not an issue
anymore?

Best,
Oliver



smime.p7s
Description: S/MIME Cryptographic Signature
___
poppler mailing list
poppler@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/poppler


Re: [poppler] [RFC] Replace GooHash by std::unordered_map

2018-02-21 Thread Adam Reichold
Hello again,

attached is an updated version of the patch that:

* Improves the conversions between GooString and std::string.

* Fixes the spacing (And yes, the spacing wasn't broken. I was broken. I
just can't get my head around 2 spaces indentation with an 8 spaces tab.
Sorry for that.)

* Extends UnicodeMap to become a move-only type so that the
initialization can stay basically the same as when using GooHash storing
pointers to UnicodeMap.

Best regards, ADam.

Am 21.02.2018 um 07:13 schrieb Adam Reichold:
> Hello again,
> 
> Am 21.02.2018 um 00:31 schrieb Albert Astals Cid:
>> El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold va 
>> escriure:
>>> Hello again,
>>>
>>> Am 18.02.2018 um 23:23 schrieb Adam Reichold:
 Am 18.02.2018 um 23:08 schrieb Albert Astals Cid:
> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold va
>
> escriure:
>> Hello,
>>
>> the attached patch replaced Poppler's internal hash table implementation
>> GooHash by std::unordered_map found in the C++ standard library since
>> C++11. This continues Poppler's slow drift towards standard library
>> containers and removes one of the home-grown data structures with the
>> main goals of reducing code size and improving the long term
>> maintainability of the code base.
>
> Do you have any benchmarks on "rendering" speed and code size?

 Sorry, not at hand. I will try to prepare them during the week.
>>>
>>> I did run Splash rendering benchmarks of this branch against master with
>>> the result of rendering the circa 2400 documents of the TeXLive
>>> documentation present on my machine being:
>>
>> I'm wondering if those 2400 documents are diverse enough, which they may not 
>> be given they are all coming from "the same place".
> 
> They seem pretty diverse w.r.t. content, some being text heavy and some
> graphics rich. But I guess they are definitely not diverse technically
> as all are prepared using TeX itself.
> 
> The main problem on my side is that I have failed to find my DVD copy of
> the Poppler regtest document collection until now. :-\ In any case, I am
> reasonably confident on the zero sum result since GooHash does not seem
> to live in any performance critical code path.
> 
>>>
>>> Cumulative run time:
>>> Result: 90.95 min ∓ 1.1 %
>>> Reference: 91.57 min ∓ 1.2 %
>>> Deviation: -0.0 %
>>> Cumulative memory usage:
>>> Result: 37.2 MB ∓ 0.7 %
>>> Reference: 37.0 MB ∓ 0.7 %
>>> Deviation: +0.0 %
>>>
>>> (Where result is this patch and the reference is master.) (The
>>> measurement was taken using the perftest script which I proposed here
>>> some time ago for which I'll attach the patch again for reproduceability.)
>>
>> Did you really attach this before? i really don't remember it :D
> 
> This was a very long-winded thread ending on 2016-01-01 centered around
> the regtest framework. It went from custom driver using QTest's
> benchmark functionality to custom backend in the regtest framework to
> separate Python script. The script still has problems to reliably
> measure really short things like running "pdftotext" for which a custom
> C++ driver might be needed, but for long-running stuff like "pdftoppm",
> the results seem reasonable.
> 
>> Anyhow it seems the change is mostly a zero-sum runtime wise. 
> 
> Indeed. (Although thinking really really long term, I think a Poppler
> that is using std::vector, std::string and friends everywhere, could
> loose a lot of distributed fat in the form of a lot of memory
> indirections and benefit from the highly optimized standard library. But
> it just is not a single patch thing to reap any of these benefits.)
> 
>> Which bring us to the code side. First i know it sucks but your spacing is 
>> broken, please check the whole patch
>>
>> -nameToGID->removeInt(macGlyphNames[j]);
>> -nameToGID->add(new GooString(macGlyphNames[j]), i);
>> +  nameToGID[macGlyphNames[j]] = i;
>>
>> i could fix it, but it's better (for me) if you do :D
> 
> Definitely for me to fix. The main problem is that the spacing in those
> files was already broken and I am unsure whether I should fix the
> surrounding spacing even if I am not touching it otherwise. Due to that,
> there sometimes is no correct way in the current patch. If you do not
> say otherwise, I will try to make at least the touched blocks of code
> consistent.
> 
>> Now onto the code, 
>>
>>   const auto emplaceRangeMap = [&](const char* encodingName, GBool 
>> unicodeOut, 
>> UnicodeMapRange* ranges, int len) {
>> residentUnicodeMaps.emplace(
>>   std::piecewise_construct,
>>   std::forward_as_tuple(encodingName),
>>   std::forward_as_tuple(encodingName, unicodeOut, ranges, len)
>> );
>>   };
>>
>> It's the first time in my life i see std::piecewise_construct and 
>> std::forward_as_tuple, yes that probably makes me a bad C++ developer, but 
>> given there's lots 

Re: [poppler] [RFC] Replace GooHash by std::unordered_map

2018-02-20 Thread Adam Reichold
Hello again,

Am 21.02.2018 um 00:31 schrieb Albert Astals Cid:
> El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold va 
> escriure:
>> Hello again,
>>
>> Am 18.02.2018 um 23:23 schrieb Adam Reichold:
>>> Am 18.02.2018 um 23:08 schrieb Albert Astals Cid:
 El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold va

 escriure:
> Hello,
>
> the attached patch replaced Poppler's internal hash table implementation
> GooHash by std::unordered_map found in the C++ standard library since
> C++11. This continues Poppler's slow drift towards standard library
> containers and removes one of the home-grown data structures with the
> main goals of reducing code size and improving the long term
> maintainability of the code base.

 Do you have any benchmarks on "rendering" speed and code size?
>>>
>>> Sorry, not at hand. I will try to prepare them during the week.
>>
>> I did run Splash rendering benchmarks of this branch against master with
>> the result of rendering the circa 2400 documents of the TeXLive
>> documentation present on my machine being:
> 
> I'm wondering if those 2400 documents are diverse enough, which they may not 
> be given they are all coming from "the same place".

They seem pretty diverse w.r.t. content, some being text heavy and some
graphics rich. But I guess they are definitely not diverse technically
as all are prepared using TeX itself.

The main problem on my side is that I have failed to find my DVD copy of
the Poppler regtest document collection until now. :-\ In any case, I am
reasonably confident on the zero sum result since GooHash does not seem
to live in any performance critical code path.

>>
>> Cumulative run time:
>> Result: 90.95 min ∓ 1.1 %
>> Reference: 91.57 min ∓ 1.2 %
>> Deviation: -0.0 %
>> Cumulative memory usage:
>> Result: 37.2 MB ∓ 0.7 %
>> Reference: 37.0 MB ∓ 0.7 %
>> Deviation: +0.0 %
>>
>> (Where result is this patch and the reference is master.) (The
>> measurement was taken using the perftest script which I proposed here
>> some time ago for which I'll attach the patch again for reproduceability.)
> 
> Did you really attach this before? i really don't remember it :D

This was a very long-winded thread ending on 2016-01-01 centered around
the regtest framework. It went from custom driver using QTest's
benchmark functionality to custom backend in the regtest framework to
separate Python script. The script still has problems to reliably
measure really short things like running "pdftotext" for which a custom
C++ driver might be needed, but for long-running stuff like "pdftoppm",
the results seem reasonable.

> Anyhow it seems the change is mostly a zero-sum runtime wise. 

Indeed. (Although thinking really really long term, I think a Poppler
that is using std::vector, std::string and friends everywhere, could
loose a lot of distributed fat in the form of a lot of memory
indirections and benefit from the highly optimized standard library. But
it just is not a single patch thing to reap any of these benefits.)

> Which bring us to the code side. First i know it sucks but your spacing is 
> broken, please check the whole patch
> 
> - nameToGID->removeInt(macGlyphNames[j]);
> - nameToGID->add(new GooString(macGlyphNames[j]), i);
> +  nameToGID[macGlyphNames[j]] = i;
> 
> i could fix it, but it's better (for me) if you do :D

Definitely for me to fix. The main problem is that the spacing in those
files was already broken and I am unsure whether I should fix the
surrounding spacing even if I am not touching it otherwise. Due to that,
there sometimes is no correct way in the current patch. If you do not
say otherwise, I will try to make at least the touched blocks of code
consistent.

> Now onto the code, 
> 
>   const auto emplaceRangeMap = [&](const char* encodingName, GBool 
> unicodeOut, 
> UnicodeMapRange* ranges, int len) {
> residentUnicodeMaps.emplace(
>   std::piecewise_construct,
>   std::forward_as_tuple(encodingName),
>   std::forward_as_tuple(encodingName, unicodeOut, ranges, len)
> );
>   };
> 
> It's the first time in my life i see std::piecewise_construct and 
> std::forward_as_tuple, yes that probably makes me a bad C++ developer, but 
> given there's lots of us around, it doesn't make me happy now i don't 
> understand what the code does.

The problem is that most internal Poppler types lack proper construction
and assignment operators, hence I need to work harder to construct those
UnicodeMap instances in-place inside the map (GooHash just stored a
pointer to it, so there was no problem.)

The whole piecewise_construct and forward_as_tuple dance just ensures,
that those parameters to emplace are properly grouped before being
unpacked to become the parameters of std::string and UnicodeMap
construction again. If UnicodeMap was movable, this would probably look like

residentUnicodeMaps.emplace(
 

Re: [poppler] [RFC] Replace GooHash by std::unordered_map

2018-02-20 Thread Albert Astals Cid
El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold va 
escriure:
> Hello again,
> 
> Am 18.02.2018 um 23:23 schrieb Adam Reichold:
> > Am 18.02.2018 um 23:08 schrieb Albert Astals Cid:
> >> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold va
> >> 
> >> escriure:
> >>> Hello,
> >>> 
> >>> the attached patch replaced Poppler's internal hash table implementation
> >>> GooHash by std::unordered_map found in the C++ standard library since
> >>> C++11. This continues Poppler's slow drift towards standard library
> >>> containers and removes one of the home-grown data structures with the
> >>> main goals of reducing code size and improving the long term
> >>> maintainability of the code base.
> >> 
> >> Do you have any benchmarks on "rendering" speed and code size?
> > 
> > Sorry, not at hand. I will try to prepare them during the week.
> 
> I did run Splash rendering benchmarks of this branch against master with
> the result of rendering the circa 2400 documents of the TeXLive
> documentation present on my machine being:

I'm wondering if those 2400 documents are diverse enough, which they may not 
be given they are all coming from "the same place".

> 
> Cumulative run time:
> Result: 90.95 min ∓ 1.1 %
> Reference: 91.57 min ∓ 1.2 %
> Deviation: -0.0 %
> Cumulative memory usage:
> Result: 37.2 MB ∓ 0.7 %
> Reference: 37.0 MB ∓ 0.7 %
> Deviation: +0.0 %
> 
> (Where result is this patch and the reference is master.) (The
> measurement was taken using the perftest script which I proposed here
> some time ago for which I'll attach the patch again for reproduceability.)

Did you really attach this before? i really don't remember it :D

Anyhow it seems the change is mostly a zero-sum runtime wise. 

Which bring us to the code side. First i know it sucks but your spacing is 
broken, please check the whole patch

-   nameToGID->removeInt(macGlyphNames[j]);
-   nameToGID->add(new GooString(macGlyphNames[j]), i);
+  nameToGID[macGlyphNames[j]] = i;

i could fix it, but it's better (for me) if you do :D

Now onto the code, 

  const auto emplaceRangeMap = [&](const char* encodingName, GBool unicodeOut, 
UnicodeMapRange* ranges, int len) {
residentUnicodeMaps.emplace(
  std::piecewise_construct,
  std::forward_as_tuple(encodingName),
  std::forward_as_tuple(encodingName, unicodeOut, ranges, len)
);
  };

It's the first time in my life i see std::piecewise_construct and 
std::forward_as_tuple, yes that probably makes me a bad C++ developer, but 
given there's lots of us around, it doesn't make me happy now i don't 
understand what the code does.

Then there's the benefit/risk ratio. The code using GooHash is mostly 
rocksolid and we haven't probably touched any line in it for a long time and 
we have probably neither written new code that uses GooHash.

In that regard we risk breaking working code.

The benefit is not more speed nor less memory usage as your measurements show.

Mostly the benefit is "removing code from maintainership", which i agree is a 
good thing, but as mentioned before it's some code "we mostly ignore", so it's 
not that much of a good thing.

So i'm hesitant of what to do. It really sounds like a nice thing to not have 
custom structures, but on the other hand it's not like they get much in the 
way of coding.

I'd really appreciate other people's comments on this.

Cheers,
  Albert

> 
> I'll also attach the detailed comparison, but the gist seems to be that
> if there are significant changes, the run time is reduced but the memory
> usage is increased in the majority of cases. But this does not seem to
> show up in the cumulative results.
> 
> Best regards, Adam.
> 
> P.S.: One could try to improve the memory usage by tuning the load
> factor or calling shrink_to_fit where appropriate. Would you like me to
> try to do this?
> 
> P.P.S.: One obvious area for improvement would be better
> interoperability between GooString and std::string, i.e. not converting
> them as C strings so that the length information does not need to be
> recomputed. I will try to prepare this as a separate patch on top of
> this one or should I include this here?
> 
> Best regards, Adam.
> 
> > Concerning code size, a release build of libpoppler.so goes from
> > 
> >textdata bss dec hex filename
> > 
> > 2464034  288852 360 2753246  2a02de libpoppler.so.73.0.0
> > 
> > for the current master to
> > 
> >textdata bss dec hex filename
> > 
> > 2482129  288756 360 2771245  2a492d libpoppler.so.73.0.0
> > 
> > with the patch applied, i.e. a 0.65% increase in binary size.
> > 
> > 
> > Please note that in my original message, I was referring only to source
> > code size, i.e.
> > 
> > git diff --stat master...remove-goo-hash
> > 
> >  18 files changed, 168 insertions(+), 803 deletions(-)
> >  
> >> Cheers,
> >> 
> >>   Albert
> > 
> > Best regards, Adam.
> > 
> >>> 

Re: [poppler] [RFC] Replace GooHash by std::unordered_map

2018-02-20 Thread Adam Reichold
Hello again again,

I obviously forgot the attachments to the previous mail. Doh...

Regards, Adam.

Am 20.02.2018 um 08:58 schrieb Adam Reichold:
> Hello again,
> 
> Am 18.02.2018 um 23:23 schrieb Adam Reichold:
>>
>>
>> Am 18.02.2018 um 23:08 schrieb Albert Astals Cid:
>>> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold va 
>>> escriure:
 Hello,

 the attached patch replaced Poppler's internal hash table implementation
 GooHash by std::unordered_map found in the C++ standard library since
 C++11. This continues Poppler's slow drift towards standard library
 containers and removes one of the home-grown data structures with the
 main goals of reducing code size and improving the long term
 maintainability of the code base.
>>>
>>> Do you have any benchmarks on "rendering" speed and code size?
>>
>> Sorry, not at hand. I will try to prepare them during the week.
> 
> I did run Splash rendering benchmarks of this branch against master with
> the result of rendering the circa 2400 documents of the TeXLive
> documentation present on my machine being:
> 
> Cumulative run time:
> Result: 90.95 min ∓ 1.1 %
> Reference: 91.57 min ∓ 1.2 %
> Deviation: -0.0 %
> Cumulative memory usage:
> Result: 37.2 MB ∓ 0.7 %
> Reference: 37.0 MB ∓ 0.7 %
> Deviation: +0.0 %
> 
> (Where result is this patch and the reference is master.) (The
> measurement was taken using the perftest script which I proposed here
> some time ago for which I'll attach the patch again for reproduceability.)
> 
> I'll also attach the detailed comparison, but the gist seems to be that
> if there are significant changes, the run time is reduced but the memory
> usage is increased in the majority of cases. But this does not seem to
> show up in the cumulative results.
> 
> Best regards, Adam.
> 
> P.S.: One could try to improve the memory usage by tuning the load
> factor or calling shrink_to_fit where appropriate. Would you like me to
> try to do this?
> 
> P.P.S.: One obvious area for improvement would be better
> interoperability between GooString and std::string, i.e. not converting
> them as C strings so that the length information does not need to be
> recomputed. I will try to prepare this as a separate patch on top of
> this one or should I include this here?
> 
> Best regards, Adam.
> 
>> Concerning code size, a release build of libpoppler.so goes from
>>
>>textdata bss dec hex filename
>> 2464034  288852 360 2753246  2a02de libpoppler.so.73.0.0
>>
>> for the current master to
>>
>>textdata bss dec hex filename
>> 2482129  288756 360 2771245  2a492d libpoppler.so.73.0.0
>>
>> with the patch applied, i.e. a 0.65% increase in binary size.
>>
>>
>> Please note that in my original message, I was referring only to source
>> code size, i.e.
>>
>> git diff --stat master...remove-goo-hash
>>  18 files changed, 168 insertions(+), 803 deletions(-)
>>
>>> Cheers,
>>>   Albert
>>
>> Best regards, Adam.
>>

 Best regards, Adam.
>>>
>>>
>>>
>>>
>>> ___
>>> poppler mailing list
>>> poppler@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/poppler
>>>
>>
>>
>>
>> ___
>> poppler mailing list
>> poppler@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/poppler
>>
> 
> 
> 
> ___
> poppler mailing list
> poppler@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/poppler
> 
From f080d40f1dba236a2d317fc7c9d9c77b5db8654c Mon Sep 17 00:00:00 2001
From: Adam Reichold 
Date: Tue, 20 Feb 2018 08:41:04 +0100
Subject: [PATCH] Add simple Python script to measure and comprare rendering
 performance of Poppler builds.

---
 perftest/compare.py   |  94 +++
 perftest/measure.py   | 139 ++
 perftest/poppler-perftest |  40 +
 perftest/util.py  |  29 ++
 4 files changed, 302 insertions(+)
 create mode 100644 perftest/compare.py
 create mode 100644 perftest/measure.py
 create mode 100755 perftest/poppler-perftest
 create mode 100644 perftest/util.py

diff --git a/perftest/compare.py b/perftest/compare.py
new file mode 100644
index ..c19f1cab
--- /dev/null
+++ b/perftest/compare.py
@@ -0,0 +1,94 @@
+import pickle
+import zlib
+
+from util import reference, reldev, maxabs
+
+def collect_stats3(stats, entry):
+if stats is None:
+stats = (0, 0)
+
+sum, acc = stats
+mean, stdev = entry
+stats = (sum + mean, acc + abs(stdev / mean))
+
+return stats
+
+def collect_stats2(stats, entry):
+if stats is None:
+stats = { 'run_time': None, 'memory_usage': None }
+
+stats['run_time'] = collect_stats3(stats['run_time'], entry['run_time'])
+stats['memory_usage'] = 

Re: [poppler] [RFC] Replace GooHash by std::unordered_map

2018-02-19 Thread Adam Reichold
Hello again,

Am 18.02.2018 um 23:23 schrieb Adam Reichold:
> 
> 
> Am 18.02.2018 um 23:08 schrieb Albert Astals Cid:
>> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold va 
>> escriure:
>>> Hello,
>>>
>>> the attached patch replaced Poppler's internal hash table implementation
>>> GooHash by std::unordered_map found in the C++ standard library since
>>> C++11. This continues Poppler's slow drift towards standard library
>>> containers and removes one of the home-grown data structures with the
>>> main goals of reducing code size and improving the long term
>>> maintainability of the code base.
>>
>> Do you have any benchmarks on "rendering" speed and code size?
> 
> Sorry, not at hand. I will try to prepare them during the week.

I did run Splash rendering benchmarks of this branch against master with
the result of rendering the circa 2400 documents of the TeXLive
documentation present on my machine being:

Cumulative run time:
Result: 90.95 min ∓ 1.1 %
Reference: 91.57 min ∓ 1.2 %
Deviation: -0.0 %
Cumulative memory usage:
Result: 37.2 MB ∓ 0.7 %
Reference: 37.0 MB ∓ 0.7 %
Deviation: +0.0 %

(Where result is this patch and the reference is master.) (The
measurement was taken using the perftest script which I proposed here
some time ago for which I'll attach the patch again for reproduceability.)

I'll also attach the detailed comparison, but the gist seems to be that
if there are significant changes, the run time is reduced but the memory
usage is increased in the majority of cases. But this does not seem to
show up in the cumulative results.

Best regards, Adam.

P.S.: One could try to improve the memory usage by tuning the load
factor or calling shrink_to_fit where appropriate. Would you like me to
try to do this?

P.P.S.: One obvious area for improvement would be better
interoperability between GooString and std::string, i.e. not converting
them as C strings so that the length information does not need to be
recomputed. I will try to prepare this as a separate patch on top of
this one or should I include this here?

Best regards, Adam.

> Concerning code size, a release build of libpoppler.so goes from
> 
>textdata bss dec hex filename
> 2464034  288852 360 2753246  2a02de libpoppler.so.73.0.0
> 
> for the current master to
> 
>textdata bss dec hex filename
> 2482129  288756 360 2771245  2a492d libpoppler.so.73.0.0
> 
> with the patch applied, i.e. a 0.65% increase in binary size.
> 
> 
> Please note that in my original message, I was referring only to source
> code size, i.e.
> 
> git diff --stat master...remove-goo-hash
>  18 files changed, 168 insertions(+), 803 deletions(-)
> 
>> Cheers,
>>   Albert
> 
> Best regards, Adam.
> 
>>>
>>> Best regards, Adam.
>>
>>
>>
>>
>> ___
>> poppler mailing list
>> poppler@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/poppler
>>
> 
> 
> 
> ___
> poppler mailing list
> poppler@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/poppler
> 



signature.asc
Description: OpenPGP digital signature
___
poppler mailing list
poppler@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/poppler


Re: [poppler] [RFC] Replace GooHash by std::unordered_map

2018-02-18 Thread Adam Reichold


Am 18.02.2018 um 23:08 schrieb Albert Astals Cid:
> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold va 
> escriure:
>> Hello,
>>
>> the attached patch replaced Poppler's internal hash table implementation
>> GooHash by std::unordered_map found in the C++ standard library since
>> C++11. This continues Poppler's slow drift towards standard library
>> containers and removes one of the home-grown data structures with the
>> main goals of reducing code size and improving the long term
>> maintainability of the code base.
> 
> Do you have any benchmarks on "rendering" speed and code size?

Sorry, not at hand. I will try to prepare them during the week.


Concerning code size, a release build of libpoppler.so goes from

   textdata bss dec hex filename
2464034  288852 360 2753246  2a02de libpoppler.so.73.0.0

for the current master to

   textdata bss dec hex filename
2482129  288756 360 2771245  2a492d libpoppler.so.73.0.0

with the patch applied, i.e. a 0.65% increase in binary size.


Please note that in my original message, I was referring only to source
code size, i.e.

git diff --stat master...remove-goo-hash
 18 files changed, 168 insertions(+), 803 deletions(-)

> Cheers,
>   Albert

Best regards, Adam.

>>
>> Best regards, Adam.
> 
> 
> 
> 
> ___
> poppler mailing list
> poppler@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/poppler
> 



signature.asc
Description: OpenPGP digital signature
___
poppler mailing list
poppler@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/poppler


Re: [poppler] [RFC] Replace GooHash by std::unordered_map

2018-02-18 Thread Albert Astals Cid
El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold va 
escriure:
> Hello,
> 
> the attached patch replaced Poppler's internal hash table implementation
> GooHash by std::unordered_map found in the C++ standard library since
> C++11. This continues Poppler's slow drift towards standard library
> containers and removes one of the home-grown data structures with the
> main goals of reducing code size and improving the long term
> maintainability of the code base.

Do you have any benchmarks on "rendering" speed and code size?

Cheers,
  Albert

> 
> Best regards, Adam.




___
poppler mailing list
poppler@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/poppler