Re: [VOTE] Release Apache PDFBox 3.0.2

2024-03-13 Thread Timo Boehme

Hi,

+1

Thanks,
Timo


On 11.03.24 at 20:24, Andreas Lehmkühler wrote:

Hi,

a candidate for the PDFBox 3.0.2 release is available at:

    https://dist.apache.org/repos/dist/dev/pdfbox/3.0.2/

The release candidate is a zip archive of the sources in:

    https://svn.apache.org/repos/asf/pdfbox/tags/3.0.2/

The SHA-512 checksum of the archive is 
d2eaaa4e7a139b00d79d7518ca66ee2c33300dbeed11c05554413e478b2a76814a7404a9467cb2dc3502840259188965a3483342c7d44e3280b68649aec670f8.
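
A voter can compare the published checksum against a downloaded archive. The following is only a minimal sketch using the JDK's `MessageDigest`; the class name `Sha512Check` and the command-line arguments are assumptions made for illustration and are not part of the release tooling.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of PDFBox: computes a SHA-512 hex digest.
final class Sha512Check {

    /** Returns the lowercase hex SHA-512 digest of the given bytes. */
    static String sha512Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-512");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-512 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: Sha512Check <archive> <expected-hex>");
            return;
        }
        // Reads the whole archive into memory, which is acceptable for a source zip.
        String actual = sha512Hex(Files.readAllBytes(Paths.get(args[0])));
        boolean ok = actual.equalsIgnoreCase(args[1].trim());
        System.out.println(ok ? "checksum OK" : "checksum MISMATCH: " + actual);
    }
}
```

The expected value is passed as the second argument, so the same sketch works for any release candidate.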


Please vote on releasing this package as Apache PDFBox 3.0.2.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

    [ ] +1 Release this package as Apache PDFBox 3.0.2
    [ ] -1 Do not release this package because...

Here is my +1

Andreas

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



--
OntoChem GmbH
Blücherstraße 24
06120 Halle (Saale)
Germany

email: t.boe...@digital-science.com | web: www.ontochem.com
HRB 215461 Amtsgericht Stendal  | USt-IdNr.: DE246232735
managing directors: Dr. Felix Berthelmann - Mario Diwersy





Re: Apache PDFBox Board Report January 2024 due

2024-01-08 Thread Timo Boehme

+1

Thanks,
Timo


On 08.01.24 at 08:14, Andreas Lehmkühler wrote:

Hi,

find attached a quick draft of the board report we're expected to 
submit this month. It's based upon the report wizard template which 
can be found at [1]


Any comments or additions are appreciated ...


## Description:
The mission of PDFBox is the creation and maintenance of software related to a
Java library for working with PDF documents

## Project Status:
Current project status: ongoing with moderate activity
Issues for the board: none

## Membership Data:
Apache PDFBox was founded 2009-10-21 (14 years ago)
There are currently 21 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Recent releases:

    3.0.1 was released on 2023-11-30.
    2.0.30 was released on 2023-11-04.
    3.0.0 was released on 2023-08-17.

## Community Health:
- there is a steady stream of contributions, bug reports and questions 
on the mailing lists
- we released the first minor release of our new 3.0.x line to fix 
some regression issues. A couple of improvements and further fixes 
were included as well.
- the development of the current trunk version 4.0.0 is an ongoing 
effort, e.g. we switched to Log4j2 and did some major refactorings






Re: [VOTE] Release Apache PDFBox 3.0.1

2023-11-28 Thread Timo Boehme

+1,

Thanks,
Timo


On 27.11.23 at 17:46, Andreas Lehmkühler wrote:

Hi,

a candidate for the PDFBox 3.0.1 release is available at:

    https://dist.apache.org/repos/dist/dev/pdfbox/3.0.1/

The release candidate is a zip archive of the sources in:

    https://svn.apache.org/repos/asf/pdfbox/tags/3.0.1/

The SHA-512 checksum of the archive is 
8ca8f3297ec04efaa23ab6d9ca421c1b39d8fb2de392e0f7b5aa6e7053eac75066e8b2872dc6b6847a0194b557aa8570de7f1d1a122fcf3888bf9ed21eae0257.


Please vote on releasing this package as Apache PDFBox 3.0.1.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

    [ ] +1 Release this package as Apache PDFBox 3.0.1
    [ ] -1 Do not release this package because...

Here is my +1

Andreas




Re: [VOTE] Release Apache PDFBox 3.0.0

2023-08-17 Thread Timo Boehme

Hi,

+1

Thanks,
Timo


On 14.08.23 at 20:29, Andreas Lehmkühler wrote:

Hi,

a candidate for the PDFBox 3.0.0 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/3.0.0/

The release candidate is a zip archive of the sources in:

     https://svn.apache.org/repos/asf/pdfbox/tags/3.0.0/

The SHA-512 checksum of the archive is 
279f283f8f97e3adb5e58546f6242b495eef26dacfc256129f790064a73934f16ceb0a7a9164293d506fc0fff462783d296b844611ed18e12b9de0f1724294b5. 



Please vote on releasing this package as Apache PDFBox 3.0.0.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox 3.0.0
     [ ] -1 Do not release this package because...


Here is my +1

Andreas





--
OntoChem GmbH
Blücherstraße 24
06120 Halle (Saale)
Germany

email: timo.boe...@ontochem.com | web: www.ontochem.com
HRB 215461 Amtsgericht Stendal  | USt-IdNr.: DE246232735
managing directors: Dr. Lutz Weber (CEO), Dr. Felix Berthelmann (COO)





[jira] [Commented] (PDFBOX-5636) Implement PDF 2.0 dash phase clarification

2023-07-20 Thread Timo Boehme (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745259#comment-17745259
 ] 

Timo Boehme commented on PDFBOX-5636:
-

LGTM. With respect to optimization, and under the assumption that -phase will
typically be < sum2, one could save some CPU cycles:
{code:java}
phase += (-phase < sum2) ? sum2 : (Math.floor(-phase / sum2) + 1) * sum2;
{code}
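
To make the suggestion easier to check, here is a small self-contained sketch that compares the one-line formula with a plain loop. It assumes that `sum2` is twice the sum of the dash array entries and that the rule is to add `sum2` until the phase is no longer negative; the class and method names are made up for illustration and are not PDFBox code.

```java
// Hypothetical comparison of two ways to normalize a negative dash phase.
final class DashPhaseSketch {

    private static double sum(double[] dashArray) {
        double total = 0;
        for (double d : dashArray) {
            total += d;
        }
        return total;
    }

    /** Reference version: add twice the array sum until the phase is no longer negative. */
    static double byLoop(double phase, double[] dashArray) {
        double sum2 = 2 * sum(dashArray);
        if (sum2 <= 0) {
            return phase; // guard: with a zero sum the loop below would never end
        }
        while (phase < 0) {
            phase += sum2;
        }
        return phase;
    }

    /** Closed form from the comment above: a single addition instead of a loop. */
    static double byFormula(double phase, double[] dashArray) {
        double sum2 = 2 * sum(dashArray);
        if (sum2 <= 0 || phase >= 0) {
            return phase;
        }
        phase += (-phase < sum2) ? sum2 : (Math.floor(-phase / sum2) + 1) * sum2;
        return phase;
    }

    public static void main(String[] args) {
        double[] dash = {2, 1};
        System.out.println(byLoop(-1, dash) + " " + byFormula(-1, dash));
        System.out.println(byLoop(-7, dash) + " " + byFormula(-7, dash));
    }
}
```

For [2 1] and a phase of -1 or -7 both variants return 5. For a phase that is an exact negative multiple of `sum2`, the closed form lands one full period higher than the loop (6 instead of 0 for -6), which should describe the same dash pattern.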

> Implement PDF 2.0 dash phase clarification
> --
>
> Key: PDFBOX-5636
> URL: https://issues.apache.org/jira/browse/PDFBOX-5636
> Project: PDFBox
> Issue Type: Improvement
> Components: Rendering
> Affects Versions: 2.0.29
> Reporter: Tilman Hausherr
> Assignee: Tilman Hausherr
> Priority: Minor
> Fix For: 2.0.30, 3.0.0 PDFBox
>
>
> Implement clarification of PDF 2.0 when dash phase is negative



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Comment Edited] (PDFBOX-5636) Implement PDF 2.0 dash phase clarification

2023-07-17 Thread Timo Boehme (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17743675#comment-17743675
 ] 

Timo Boehme edited comment on PDFBOX-5636 at 7/17/23 8:35 AM:
--

Reading the spec, I cannot see how the implemented optimized version will
produce correct results.

E.g. [2 1]; -1 : according to the spec the phase will be 5:   -1 + 2*(2+1)

With the current optimized implementation it will be 4 (actually, it produces
multiples of 4).

Regarding the initial implementation: it should be checked that the sum of the
array components is > 0, otherwise the loop will run forever.


was (Author: tboehme):
Reading the spec I cannot follow how the implemented optimized version will 
produce correct results?

E.g. [2 1]; -1 : according to spec the phase will be 5:   -1 + 2*(2+1)

with the current optimized implementation it will be 4 (actually it produces 
multiples of 4);

regarding the initial implementation: it should be checked, the the sum of the 
array components is >0 otherwise the loop will run forever

> Implement PDF 2.0 dash phase clarification
> --
>
> Key: PDFBOX-5636
> URL: https://issues.apache.org/jira/browse/PDFBOX-5636
> Project: PDFBox
> Issue Type: Improvement
> Components: Rendering
> Affects Versions: 2.0.29
> Reporter: Tilman Hausherr
> Assignee: Tilman Hausherr
> Priority: Minor
> Fix For: 2.0.30, 3.0.0 PDFBox
>
>
> Implement clarification of PDF 2.0 when dash phase is negative






[jira] [Commented] (PDFBOX-5636) Implement PDF 2.0 dash phase clarification

2023-07-17 Thread Timo Boehme (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17743675#comment-17743675
 ] 

Timo Boehme commented on PDFBOX-5636:
-

Reading the spec, I cannot see how the implemented optimized version will
produce correct results.

E.g. [2 1]; -1 : according to the spec the phase will be 5:   -1 + 2*(2+1)

With the current optimized implementation it will be 4 (actually, it produces
multiples of 4).

Regarding the initial implementation: it should be checked that the sum of the
array components is > 0, otherwise the loop will run forever.

> Implement PDF 2.0 dash phase clarification
> --
>
> Key: PDFBOX-5636
> URL: https://issues.apache.org/jira/browse/PDFBOX-5636
> Project: PDFBox
> Issue Type: Improvement
> Components: Rendering
> Affects Versions: 2.0.29
> Reporter: Tilman Hausherr
> Assignee: Tilman Hausherr
> Priority: Minor
> Fix For: 2.0.30, 3.0.0 PDFBox
>
>
> Implement clarification of PDF 2.0 when dash phase is negative






[jira] [Resolved] (PDFBOX-5624) Infinite loop when parsing Type1 font

2023-06-19 Thread Timo Boehme (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Boehme resolved PDFBOX-5624.
-
Resolution: Fixed

> Infinite loop when parsing Type1 font
>
>
> Key: PDFBOX-5624
> URL: https://issues.apache.org/jira/browse/PDFBOX-5624
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 2.0.28, 3.0.0 PDFBox
> Reporter: Timo Boehme
> Assignee: Timo Boehme
> Priority: Major
> Fix For: 2.0.29, 3.0.0 PDFBox
>
>
> At two places the Type1Parser has a loop with a negated condition on the next
> token to be read. The loops simply advance to the next token without
> checking whether the token is null, which may happen if the font is
> corrupted or truncated. If this occurs, the parser is stuck in the loop. This
> happened to me for one of the loops with a file which, however, cannot be
> shared. The second loop was found by scanning the code for similar problems.






[jira] [Updated] (PDFBOX-5624) Infinite loop when parsing Type1 font

2023-06-19 Thread Timo Boehme (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Boehme updated PDFBOX-5624:

Fix Version/s: 2.0.29
   3.0.0 PDFBox

> Infinite loop when parsing Type1 font
>
>
> Key: PDFBOX-5624
> URL: https://issues.apache.org/jira/browse/PDFBOX-5624
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 2.0.28, 3.0.0 PDFBox
> Reporter: Timo Boehme
> Assignee: Timo Boehme
> Priority: Major
> Fix For: 2.0.29, 3.0.0 PDFBox
>
>
> At two places the Type1Parser has a loop with a negated condition on the next
> token to be read. The loops simply advance to the next token without
> checking whether the token is null, which may happen if the font is
> corrupted or truncated. If this occurs, the parser is stuck in the loop. This
> happened to me for one of the loops with a file which, however, cannot be
> shared. The second loop was found by scanning the code for similar problems.






[jira] [Updated] (PDFBOX-5624) Infinite loop when parsing Type1 font

2023-06-19 Thread Timo Boehme (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Boehme updated PDFBOX-5624:

Affects Version/s: 3.0.0 PDFBox

> Infinite loop when parsing Type1 font
>
>
> Key: PDFBOX-5624
> URL: https://issues.apache.org/jira/browse/PDFBOX-5624
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 2.0.28, 3.0.0 PDFBox
> Reporter: Timo Boehme
> Assignee: Timo Boehme
> Priority: Major
>
> At two places the Type1Parser has a loop with a negated condition on the next
> token to be read. The loops simply advance to the next token without
> checking whether the token is null, which may happen if the font is
> corrupted or truncated. If this occurs, the parser is stuck in the loop. This
> happened to me for one of the loops with a file which, however, cannot be
> shared. The second loop was found by scanning the code for similar problems.






[jira] [Created] (PDFBOX-5624) Infinite loop when parsing Type1 font

2023-06-19 Thread Timo Boehme (Jira)
Timo Boehme created PDFBOX-5624:
---

Summary: Infinite loop when parsing Type1 font
Key: PDFBOX-5624
URL: https://issues.apache.org/jira/browse/PDFBOX-5624
Project: PDFBox
Issue Type: Bug
Components: FontBox
Affects Versions: 2.0.28
Reporter: Timo Boehme
Assignee: Timo Boehme


At two places the Type1Parser has a loop with a negated condition on the next
token to be read. The loops simply advance to the next token without checking
whether the token is null, which may happen if the font is corrupted or
truncated. If this occurs, the parser is stuck in the loop. This happened to me
for one of the loops with a file which, however, cannot be shared. The second
loop was found by scanning the code for similar problems.
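
The failure mode described above can be illustrated with a small sketch. It is not the actual Type1Parser code: the token source is a plain `Iterator<String>`, and `nextToken` and `skipUntil` are hypothetical names. The sketch assumes that the lexer returns null at the end of the input, as described in the report.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical stand-in for a token loop with a negated stop condition.
final class TokenLoopSketch {

    /** Returns the next token, or null once the (possibly truncated) input is exhausted. */
    static String nextToken(Iterator<String> tokens) {
        return tokens.hasNext() ? tokens.next() : null;
    }

    /**
     * Skips tokens until {@code stop} is read and returns how many were skipped.
     * Without the null check, "while (!stop.equals(token))" would never
     * terminate on truncated input, because null never equals the stop token.
     */
    static int skipUntil(Iterator<String> tokens, String stop) {
        int skipped = 0;
        String token = nextToken(tokens);
        while (token != null && !stop.equals(token)) {
            skipped++;
            token = nextToken(tokens);
        }
        if (token == null) {
            throw new IllegalStateException(
                    "unexpected end of font data while looking for '" + stop + "'");
        }
        return skipped;
    }

    public static void main(String[] args) {
        Iterator<String> complete = Arrays.asList("dup", "0", "def").iterator();
        System.out.println(skipUntil(complete, "def"));
    }
}
```

Whether the real fix throws an exception or simply leaves the loop is not stated in the report; throwing is only one possible choice.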






Re: Apache PDFBox Board Report October 2022 due

2022-10-10 Thread Timo Boehme

+1

Thanks,
Timo


On 09.10.22 at 14:06, Andreas Lehmkuehler wrote:

Hi,

find attached a quick draft of the board report we're expected to submit this
month. It's based upon the report wizard template which can be found at [1]

Any comments or additions are appreciated ...


## Description:
The mission of PDFBox is the creation and maintenance of software related to a
Java library for working with PDF documents

## Issues:
There are no issues requiring board attention at this time.

## Membership Data:
Apache PDFBox was founded 2009-10-21 (13 years ago)
There are currently 21 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Recent releases:

     2.0.27 was released on 2022-09-29.
     1.8.17 was released on 2022-09-15.
     2.0.26 was released on 2022-04-21.

## Community Health:
- there is a steady stream of contributions, bug reports and questions on the
  mailing lists
- there are a lot of refactorings, improvements and bugfixes
- we are still planning to cut the first beta release of our next major
  version 3.0.0
- to do so, we have started to identify the last tickets with breaking changes
  to be included in 3.0.0
- due to the releases last month, the preparations for the beta release were
  slowed down a little
- there was an article about "maintaining interoperability in open source
  software". For this, the authors studied the activities within Apache PDFBox
  for two years without the knowledge of the community. We don't see any
  surprises, see https://s.apache.org/aljtz for further details



Andreas

[1] https://reporter.apache.org/wizard/?pdfbox





Re: [VOTE] Release Apache PDFBox 2.0.27

2022-09-29 Thread Timo Boehme
Adding the twelvemonkeys JPEG library fixed the issue: JPEG CMYK images in
PDFs are now rendered correctly under Java 17. So is this a known problem
with the original Java JPEG implementation (as used by the plain
pdfbox-app)? I remember that PDFBox might have an alternative code path for
(JPEG) CMYK rendering up to Java 8, which might explain why it works up to
Java 8.
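
When comparing runs with and without the extra library, it can help to see which JPEG readers `ImageIO` has actually registered. The following sketch uses only the standard `javax.imageio` API; the class name `JpegReaderCheck` is made up for illustration, and the expectation that a twelvemonkeys reader shows up under a `com.twelvemonkeys` package name is an assumption that should be verified against the library version on the classpath.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

// Hypothetical diagnostic: lists the classes of all registered JPEG readers.
final class JpegReaderCheck {

    /** Returns the class names of the JPEG readers in the order ImageIO offers them. */
    static List<String> jpegReaderClasses() {
        List<String> names = new ArrayList<>();
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("jpeg");
        while (readers.hasNext()) {
            names.add(readers.next().getClass().getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : jpegReaderClasses()) {
            System.out.println(name);
        }
    }
}
```

Running it once with the plain application classpath and once with the additional plugin jar shows whether a different reader is picked up.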


Best regards,
Timo


On 28.09.22 at 12:07, Tilman Hausherr wrote:

Please try with the twelvemonkeys library too
Tilman



--- Original Message ---
From: Timo Boehme
Subject: Re: [VOTE] Release Apache PDFBox 2.0.27
Date: 28 September 2022, 10:02
To: dev@pdfbox.apache.org




+1

No regression found compared to previous versions, but I detected a problem
with Java 11 and Java 17 when rendering some CMYK JPEG images (inverted
colors; Java 8 is fine). I have to investigate further and open a Jira issue.

Thanks,
Timo


On 26.09.22 at 17:28, Andreas Lehmkuehler wrote:

a candidate for the PDFBox 2.0.27 release is available at:

    https://dist.apache.org/repos/dist/dev/pdfbox/2.0.27/

The release candidate is a zip archive of the sources in:

    https://svn.apache.org/repos/asf/pdfbox/tags/2.0.27/

The SHA-512 checksum of the archive is
59a5675f5d1d34f092adc019679f7d10e7e93c0f554a002ac29d48cbffcaa600d930309fa94a92191c01ead8da905cbb37ce5e233dcc9b8732a881d4abf75def.



Please vote on releasing this package as Apache PDFBox 2.0.27.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox 2.0.27
     [ ] -1 Do not release this package because...

Here is my +1

Andreas




Re: [VOTE] Release Apache PDFBox 2.0.27

2022-09-28 Thread Timo Boehme

+1

No regression found compared to previous versions, but I detected a problem
with Java 11 and Java 17 when rendering some CMYK JPEG images (inverted
colors; Java 8 is fine). I have to investigate further and open a Jira issue.


Thanks,
Timo


On 26.09.22 at 17:28, Andreas Lehmkuehler wrote:

a candidate for the PDFBox 2.0.27 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/2.0.27/

The release candidate is a zip archive of the sources in:

     https://svn.apache.org/repos/asf/pdfbox/tags/2.0.27/

The SHA-512 checksum of the archive is 
59a5675f5d1d34f092adc019679f7d10e7e93c0f554a002ac29d48cbffcaa600d930309fa94a92191c01ead8da905cbb37ce5e233dcc9b8732a881d4abf75def. 



Please vote on releasing this package as Apache PDFBox 2.0.27.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox 2.0.27
     [ ] -1 Do not release this package because...

Here is my +1

Andreas




Re: [VOTE] Release Apache PDFBox 1.8.17

2022-09-13 Thread Timo Boehme

+1

It is really interesting how much faster (often 10-20 times) and more correct
2.x is at parsing and rendering compared to 1.x.


Timo


On 12.09.22 at 18:50, Andreas Lehmkuehler wrote:

a candidate for the PDFBox 1.8.17 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/1.8.17/

The release candidate is a zip archive of the sources in:

     https://svn.apache.org/repos/asf/pdfbox/tags/1.8.17/

The SHA-512 checksum of the archive is 
e808b3b159b61b5928b0ad983b3bdadfc694ee80ca8a209669d591f90335165a45de684ea04b23d0a149bfc7ce5d890a287cb4e79300f3a08bb954884024c909. 



Please vote on releasing this package as Apache PDFBox 1.8.17.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox 1.8.17
     [ ] -1 Do not release this package because...


Here is my +1

Andreas




[jira] [Commented] (PDFBOX-5455) java.lang.ExceptionInInitializerError in org.apache.pdfbox.util.PDFTextStripper class

2022-06-10 Thread Timo Boehme (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552602#comment-17552602
 ] 

Timo Boehme commented on PDFBOX-5455:
-

You are using a very old version of PDFBox. Even the 1.8 branch is at 1.8.16,
but it is highly recommended to use the newest release, 2.0.26. Please test
your file with a current version.

> java.lang.ExceptionInInitializerError in  
> org.apache.pdfbox.util.PDFTextStripper class
> --
>
> Key: PDFBOX-5455
> URL: https://issues.apache.org/jira/browse/PDFBOX-5455
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.8.9
> Reporter: Kalpesh Patel
> Priority: Minor
>
> Unable to read pdf file . Getting below exception - 
> Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds 
> for length 1
>     at 
> org.apache.pdfbox.util.PDFTextStripper.(PDFTextStripper.java:123)
>  
> Let me know if more details needed
>  
> [~Bettenburg] 
>  
> [~will86] 
>  
>  
>  
>  






[jira] [Updated] (PDFBOX-5455) java.lang.ExceptionInInitializerError in org.apache.pdfbox.util.PDFTextStripper class

2022-06-10 Thread Timo Boehme (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Boehme updated PDFBOX-5455:

Priority: Minor  (was: Blocker)

> java.lang.ExceptionInInitializerError in  
> org.apache.pdfbox.util.PDFTextStripper class
> --
>
> Key: PDFBOX-5455
> URL: https://issues.apache.org/jira/browse/PDFBOX-5455
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.8.9
> Reporter: Kalpesh Patel
> Priority: Minor
>
> Unable to read pdf file . Getting below exception - 
> Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds 
> for length 1
>     at 
> org.apache.pdfbox.util.PDFTextStripper.(PDFTextStripper.java:123)
>  
> Let me know if more details needed
>  
> [~Bettenburg] 
>  
> [~will86] 
>  
>  
>  
>  






Re: [VOTE] Release Apache PDFBox 2.0.25

2021-12-16 Thread Timo Boehme

+1

Thanks,
Timo


On 13.12.21 at 20:02, Andreas Lehmkuehler wrote:

Hi,

a candidate for the PDFBox 2.0.25 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/2.0.25/

The release candidate is a zip archive of the sources in:

     http://svn.apache.org/repos/asf/pdfbox/tags/2.0.25/

The SHA-512 checksum of the archive is 
e143b2a9aaa4b1f1be72e16a1c9968dacfcb3e89b4f21fdbd0580d8c9f1c9b54ee38d05fe3e52ff93493c858c51090fdd8256d22153cffba1e9b523fdbd1f2f4. 



Please vote on releasing this package as Apache PDFBox 2.0.25.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox 2.0.25
     [ ] -1 Do not release this package because...

Here is my +1

Andreas





--
OntoChem GmbH
Blücherstraße 24
06120 Halle (Saale)
Germany

email: timo.boe...@ontochem.com | web: www.ontochem.com
| fax: +49 345 478 047 1
HRB 215461 Amtsgericht Stendal  | USt-IdNr.: DE246232735
managing directors: Dr. Lutz Weber (CEO), Dr. Felix Berthelmann (COO)





Re: [VOTE] Release Apache PDFBox 2.0.24

2021-06-10 Thread Timo Boehme

+1

Thank you
Timo

On 07.06.21 at 18:51, Andreas Lehmkuehler wrote:

Hi,

a candidate for the PDFBox 2.0.24 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/2.0.24/

The release candidate is a zip archive of the sources in:

     http://svn.apache.org/repos/asf/pdfbox/tags/2.0.24/

The SHA-512 checksum of the archive is 
5d55b3cadbbae266d90c47f5b10c9b09b6dc16f53b77a0cf15c78e62fc69afc7b6eab5a4329608ecdf25de9194b38db1f7d23e7d71af473cc1bf7b09b0028642. 



Please vote on releasing this package as Apache PDFBox 2.0.24.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox 2.0.24
     [ ] -1 Do not release this package because...


Here is my +1

Andreas




Re: [VOTE] Retire Subproject Preflight

2021-05-27 Thread Timo Boehme

Hi,

+1

There has not really been any development here in recent years, and the
efforts to make the PDFBox parser more lenient with broken PDF documents
contradict the specification checks of Preflight.


Timo


On 27.05.21 at 08:33, Andreas Lehmkuehler wrote:

Hi,

a discussion came up on dev@pdfbox [1] to retire Preflight and I had the 
impression that we already reached consensus to do so.


I'd like to run a formal vote so that this topic won't get lost in some 
mailing list thread.


Please vote on retiring the subproject Preflight with Apache PDFBox 4.0.0.
The vote is open for the next 7 days and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Remove Preflight with Apache PDFBox 4.0.0
     [ ] -1 Do not remove Preflight because...

Here is my +1

Andreas

P.S.: I've extended the voting period to 7 days to ensure that everybody 
has a chance to think about it and speak up if necessary.



[1] 
https://lists.apache.org/thread.html/r8abffe02ff4a94be93b7799b589532dc2a3384d6c5cd727bc388250a%40%3Cdev.pdfbox.apache.org%3E 






[jira] [Commented] (PDFBOX-5176) java.io.IOException: Page tree root must be a dictionary

2021-05-12 Thread Timo Boehme (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343163#comment-17343163
 ] 

Timo Boehme commented on PDFBOX-5176:
-

I think that, as we are confronted with fuzzer-generated or hand-crafted bad
PDFs, we should strive for PDFBox remaining stable (no memory overflow,
infinite loop or stack overflow) independent of the PDF content, at least in
the long run. That said, processing or not processing a dictionary/value should
not influence the stability. Thus the question of how to process the
problematic dictionary should be answered merely by how we best preserve the
document content, maybe by parsing as much as possible and skipping only
clearly corrupted parts(?)
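
As an illustration of the stability goal, the following sketch walks a tree that may contain cycles without recursing and without looping forever. It is not PDFBox code: the tree is modelled as a plain map from node id to child ids, and the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a robust tree walk over untrusted input.
final class SafeTreeWalk {

    /**
     * Collects the leaves reachable from {@code root} in document order.
     * Nodes seen before are skipped, so a cyclic or repeated reference cannot
     * cause an endless loop; the explicit stack avoids deep recursion.
     */
    static List<String> collectLeaves(Map<String, List<String>> kids, String root) {
        List<String> leaves = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String node = pending.pop();
            if (!visited.add(node)) {
                continue; // cycle or duplicate reference: skip only this part
            }
            List<String> children = kids.get(node);
            if (children == null) {
                leaves.add(node); // no child list: treat the node as a leaf
                continue;
            }
            // Push in reverse so that children are visited in their listed order.
            for (int i = children.size() - 1; i >= 0; i--) {
                pending.push(children.get(i));
            }
        }
        return leaves;
    }
}
```

The sketch keeps everything that can be reached safely and drops only the reference that would close the cycle, which matches the idea of parsing as much as possible.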

> java.io.IOException: Page tree root must be a dictionary
> 
>
> Key: PDFBOX-5176
> URL: https://issues.apache.org/jira/browse/PDFBOX-5176
> Project: PDFBox
>  Issue Type: Bug
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 3.0.0 PDFBox
> Reporter: Tilman Hausherr
> Priority: Major
> Attachments: GHOSTSCRIPT-695040-0.zip-71.pdf, 
> GHOSTSCRIPT-695040-0.zip-87.pdf
>
>
> Happens only on 3.0, not on 2.0.23






Re: [VOTE] Release Apache PDFBox 3.0.0-RC1

2021-04-01 Thread Timo Boehme

Hi,

+1

I found only one small bug: when calling
    java -jar pdfbox-app-3.0.0-RC1.jar help debug
the help says
    Usage: pdfbox pdfdebugger
but it should say
    Usage: pdfbox debug

However, this is a minor problem which should be OK for RC1.


Best regards,
Timo


Am 29.03.21 um 19:08 schrieb Andreas Lehmkuehler:

Hi,

a candidate for the PDFBox 3.0.0-RC1 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/3.0.0-RC1/

The release candidate is a zip archive of the sources in:

     http://svn.apache.org/repos/asf/pdfbox/tags/3.0.0-RC1/

The SHA-512 checksum of the archive is 
b4ed9fec1d5e86422452bda3d9ec66206aa665277d4aebe1e7053a0ef38de211d8440375bcaf05a4a5c0070d2bdfa9d30df94df2c128f6c15c8fb5b008550987. 
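
A minimal sketch of how a voter might check a downloaded archive against the announced SHA-512 value, using only the JDK. The class name `Sha512Check` and its two command-line arguments (path to the archive, expected checksum) are illustrative assumptions and not part of the PDFBox release tooling:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha512Check {

    // Streams the input through SHA-512 and returns the digest as lowercase hex.
    static String sha512Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: Sha512Check <archive> <expected-sha512>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            String actual = sha512Hex(in);
            // Compare case-insensitively; drop a trailing '.' copied from the mail.
            String expected = args[1].replaceAll("\\.$", "");
            System.out.println(actual.equalsIgnoreCase(expected)
                    ? "checksum OK"
                    : "checksum MISMATCH: " + actual);
        }
    }
}
```

A full SHA-512 value is 128 hex characters long, so a shorter string in an announcement would fail this comparison even if the archive itself is intact.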



Please vote on releasing this package as Apache PDFBox 3.0.0-RC1.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox 3.0.0-RC1
     [ ] -1 Do not release this package because...

Here is my +1

Andreas




Re: [VOTE] Release Apache PDFBox 2.0.23

2021-03-16 Thread Timo Boehme

Hi,

+1

Thanks,
Timo


Am 15.03.21 um 19:44 schrieb Andreas Lehmkuehler:

Hi,

a candidate for the PDFBox 2.0.23 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/2.0.23/

The release candidate is a zip archive of the sources in:

     http://svn.apache.org/repos/asf/pdfbox/tags/2.0.23/

The SHA-512 checksum of the archive is 
9333cc6557b36d0355e84aa046b5f97b3f5d6e55337b316808e9cb04cec774e0db74f8a12079ca30104fe2853c7c1b4f090483238c47d0a2ccf7d5071b606378. 



Please vote on releasing this package as Apache PDFBox 2.0.23.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox 2.0.23
     [ ] -1 Do not release this package because...


Here is my +1

Andreas




Re: Apache PDFBox Board Report October 2020 due

2020-10-13 Thread Timo Boehme

+1

Thanks,
Timo


Am 11.10.20 um 19:09 schrieb Andreas Lehmkuehler:

Hi,

find attached a quick draft of the board report we're expected to submit this
month. It's based upon the report wizard template which can be found at [1]

Any comments or additions are appreciated ...



## Description:
The mission of PDFBox is the creation and maintenance of software related to
Java library for working with PDF documents

## Issues:
There are no issues requiring board attention at this time.

## Membership Data:
Apache PDFBox was founded 2009-10-21 (11 years ago)
There are currently 21 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Recent releases:

     2.0.21 was released on 2020-08-20.
     2.0.20 was released on 2020-05-07.
     2.0.19 was released on 2020-02-23.

## Community Health:
- there is a steady stream of contributions, bug reports and questions on the
   mailing lists
- the improvement of the on demand parser in the trunk is an ongoing effort
- there are a lot of refactorings, improvements and bugfixes (selected choice
   follows)
-- support for compressed object streams contributed by Christian Appl
    (ongoing)
-- performance enhancements contributed by Alfred Faltiska
-- support for incremental updates
-- improve test coverage and code quality



Andreas

[1] https://reporter.apache.org/wizard/?pdfbox





[jira] [Commented] (PDFBOX-4950) No lcms in java.library.path?

2020-09-02 Thread Timo Boehme (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189316#comment-17189316
 ] 

Timo Boehme commented on PDFBOX-4950:
-

Have you checked with ldd whether all dependencies of the lcms library are 
satisfied? I had trouble with a native library on Alpine Linux and its musl 
libc implementation.
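
Alongside `ldd`, it can help to see in which directories the JVM would find the library file at all. The following is a rough sketch only: `LibraryPathCheck` and `findLibrary` are hypothetical names, and the check covers just the directories listed in `java.library.path`, not whether the library's own native dependencies resolve (that is what `ldd` is for):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class LibraryPathCheck {

    // Returns every file in the given path list that matches the
    // platform-specific file name of the library (e.g. liblcms.so on Linux).
    static List<File> findLibrary(String name, String libraryPath) {
        String fileName = System.mapLibraryName(name);
        List<File> hits = new ArrayList<>();
        for (String dir : libraryPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String path = System.getProperty("java.library.path", "");
        System.out.println("java.library.path = " + path);
        List<File> hits = findLibrary("lcms", path);
        if (hits.isEmpty()) {
            System.out.println(System.mapLibraryName("lcms") + " not found in any listed directory");
        } else {
            hits.forEach(f -> System.out.println("found: " + f));
        }
    }
}
```

Running this inside the same container and with the same JVM as the application would show whether the file is simply missing or present but failing to load.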

> No lcms in java.library.path?
> -
>
> Key: PDFBOX-4950
> URL: https://issues.apache.org/jira/browse/PDFBOX-4950
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.16
> Environment: Alpine 3.10.0
> Docker container
> OpenJDK 11
>Reporter: Me
>Priority: Major
>
> Hi. I'm working on a Camunda (BPM engine) integration that leverages 
> pdfbox to generate a PDF programmatically. I have the pdfbox jar deployed 
> with my WAR:
> {quote}ls -l /camunda/webapps/test-workflows/WEB-INF/lib/
> -rw-r- 1 camunda camunda 62983 Aug 3 16:35 activation-1.1.jar
>  -rw-r- 1 camunda camunda 46874 Aug 3 16:35 
> camunda-bpm-mail-core-1.2.0.jar
>  -rw-r- 1 camunda camunda 4332 Aug 3 16:35 
> camunda-commons-logging-1.6.1.jar
>  -rw-r- 1 camunda camunda 12050 Aug 3 16:35 
> camunda-commons-utils-1.6.1.jar
>  -rw-r- 1 camunda camunda 19240 Aug 3 16:35 camunda-connect-core-1.1.0.jar
>  -rw-r- 1 camunda camunda 23464 Aug 3 16:35 
> camunda-identity-ldap-7.10.0.jar
>  -rw-r- 1 camunda camunda 246174 Aug 3 16:35 commons-beanutils-1.9.3.jar
>  -rw-r- 1 camunda camunda 335042 Aug 3 16:35 commons-codec-1.11.jar
>  -rw-r- 1 camunda camunda 575389 Aug 3 16:35 commons-collections-3.2.1.jar
>  -rw-r- 1 camunda camunda 752798 Aug 3 16:35 commons-collections4-4.2.jar
>  -rw-r- 1 camunda camunda 434678 Aug 3 16:35 commons-lang3-3.4.jar
>  -rw-r- 1 camunda camunda 61829 Aug 3 16:35 commons-logging-1.2.jar
>  -rw-r- 1 camunda camunda 182954 Aug 3 16:35 commons-text-1.3.jar
>  -rw-r- 1 camunda camunda 1558165 Aug 3 16:35 fontbox-2.0.16.jar
>  -rw-r- 1 camunda camunda 767916 Aug 3 16:35 httpclient-4.5.7.jar
>  -rw-r- 1 camunda camunda 326874 Aug 3 16:35 httpcore-4.4.11.jar
>  -rw-r- 1 camunda camunda 41779 Aug 3 16:35 httpmime-4.5.7.jar
>  -rw-r- 1 camunda camunda 66519 Aug 3 16:35 jackson-annotations-2.9.0.jar
>  -rw-r- 1 camunda camunda 324036 Aug 3 16:35 jackson-core-2.9.7.jar
>  -rw-r- 1 camunda camunda 1350857 Aug 3 16:35 jackson-databind-2.9.7.jar
>  -rw-r- 1 camunda camunda 603571 Aug 3 16:35 javax.mail-1.5.5.jar
>  -rw-r- 1 camunda camunda 170348 Aug 3 16:35 opencsv-4.6.jar
>  -rw-r- 1 camunda camunda 2684592 Aug 3 16:35 pdfbox-2.0.16.jar
>  -rw-r- 1 camunda camunda 29257 Aug 3 16:35 slf4j-api-1.7.7.jar
>  -rw-r- 1 camunda camunda 62599 Aug 3 16:35 smtp-1.6.0.jar
> {quote}
> When I try to use PDDocument.load():
> {quote}PDDocument pdfDocument = 
> PDDocument.load(this.getClass().getClassLoader().getResourceAsStream(template));
> {quote}
> I observe the following exception that lcms isn't in the java library path:
> {quote}01-Sep-2020 14:02:06.445 SEVERE [http-nio-8080-exec-1] 
> org.camunda.commons.logging.BaseLogger.logError ENGINE-16004 Exception while 
> closing command context: no lcms in java.library.path: 
> [/usr/lib/jvm/java-11-openjdk/lib/server, /usr/lib/jvm/java-11-openjdk/lib, 
> /usr/lib/jvm/java-11-openjdk/../lib, /usr/java/packages/lib, /usr/lib64, 
> /lib64, /lib, /usr/lib]
> bpm    | java.lang.UnsatisfiedLinkError: no lcms in java.library.path: 
> [/usr/lib/jvm/java-11-openjdk/lib/server, /usr/lib/jvm/java-11-openjdk/lib, 
> /usr/lib/jvm/java-11-openjdk/../lib, /usr/java/packages/lib, /usr/lib64, 
> /lib64, /lib, /usr/lib]
> bpm    | at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2660)
> bpm    | at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:829)
> bpm    | at java.base/java.lang.System.loadLibrary(System.java:1867)
> bpm    | at java.desktop/sun.java2d.cmm.lcms.LCMS$1.run(LCMS.java:209)
> bpm    | at java.base/java.security.AccessController.doPrivileged(Native Method)
> bpm    | at java.desktop/sun.java2d.cmm.lcms.LCMS.getModule(LCMS.java:202)
> bpm    | at java.desktop/sun.java2d.cmm.lcms.LcmsServiceProvider.getModule(LcmsServiceProvider.java:34)
> bpm    | at java.desktop/sun.java2d.cmm.CMMServiceProvider.getColorManagementModule(CMMServiceProvider.java:31)
> bpm    | at java.desktop/sun.java2d.cmm.CMSManager.getModule(CMSManager.java:68)
> bpm    | at java.desktop/java.awt.color.ICC_ColorSpace.toRGB(ICC_ColorSpace.java:177)
> bpm    | at org.apache.pdfbox.pdmodel.graphics.color.PDDeviceRGB.init(PDDeviceRGB.java:

[jira] [Commented] (PDFBOX-4950) No lcms in java.library.path?

2020-09-02 Thread Timo Boehme (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189278#comment-17189278
 ] 

Timo Boehme commented on PDFBOX-4950:
-

According to the java.library.path shown in your error log, /usr/lib is not 
included - only /usr/lib64 or /lib (unless I overlooked it)

 


Re: [VOTE] Release Apache PDFBox 2.0.21

2020-08-18 Thread Timo Boehme

Hi,

+1

Thanks,
Timo


Am 17.08.20 um 17:56 schrieb Andreas Lehmkuehler:

Hi,

a candidate for the PDFBox 2.0.21 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/2.0.21/

The release candidate is a zip archive of the sources in:

     http://svn.apache.org/repos/asf/pdfbox/tags/2.0.21/

The SHA-512 checksum of the archive is 
18966eb4201de80b0d3220ab68d8d6062d23346c0ea6263df793c8c9f020ac2b3f173d7393c1bded46a474d44d5b8b839b0ca7f0bcba7b3d7d50196f98942691. 



Please vote on releasing this package as Apache PDFBox 2.0.21.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox 2.0.21
     [ ] -1 Do not release this package because...

Here is my +1

Andreas




Re: Apache PDFBox Board Report July 2020 due

2020-07-06 Thread Timo Boehme

Hi,

+1,

Thanks,
Timo


Am 03.07.20 um 16:30 schrieb Andreas Lehmkuehler:

Hi,

find attached a quick draft of the board report we're expected to submit this
month. It's based upon the report wizard template which can be found at [1]

Any comments or additions are appreciated ...



## Description:
The mission of PDFBox is the creation and maintenance of software related to
Java library for working with PDF documents

## Issues:
There are no issues requiring board attention at this time.

## Membership Data:
Apache PDFBox was founded 2009-10-21 (11 years ago)
There are currently 21 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Recent releases:

     2.0.20 was released on 2020-05-07.
     2.0.19 was released on 2020-02-23.
     2.0.18 was released on 2019-12-23.

## Community Health:
- there is a steady stream of contributions, bug reports and questions on the
   mailing lists
- the improvement of the on demand parser in the trunk is an ongoing effort
- there are a lot of refactorings, improvements and bugfixes
- our website build has been converted to a fully automated Maven build without
   the need to install any additional software
- Maruan, one of our PMC members, donated a virtual server which is now the
   home for Tika's bunch of test docs to be used for regression tests in
   PDFBox, POI and Tika



Andreas

[1] https://reporter.apache.org/wizard/?pdfbox





Re: Apache PDFBox Board Report April 2020 due

2020-04-06 Thread Timo Boehme

+1

Thanks,
Timo

Am 03.04.20 um 16:51 schrieb Andreas Lehmkuehler:

Hi,

find attached a quick draft of the board report we're expected to submit this
month. It's based upon the report wizard template which can be found at [1]

Any comments or additions are appreciated ...




## Description:
The mission of PDFBox is the creation and maintenance of software related to
Java library for working with PDF documents

## Issues:
There are no issues requiring board attention at this time.

## Membership Data:
Apache PDFBox was founded 2009-10-21 (10 years ago)
There are currently 21 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Recent releases:
     2.0.19 was released on 2020-02-23.
     2.0.18 was released on 2019-12-23.
     3.0.3 JBIG2 was released on 2019-12-18.

## Community Health:
- there is a steady stream of contributions, bug reports and questions on the
   mailing lists
- the improvement of the on demand parser in the trunk is an ongoing effort
   and a base version is available now. First results are promising with regard
   to performance and memory footprint. There are some TODOs on our 3.0 list
- there are as well a lot of refactorings, improvements and bugfixes



Andreas

[1] https://reporter.apache.org/wizard/?pdfbox




Re: [VOTE] Release Apache PDFBox 2.0.19

2020-02-21 Thread Timo Boehme

+1

Thanks.

Best regards,
Timo


Am 20.02.20 um 18:46 schrieb Andreas Lehmkuehler:

Hi,

a candidate for the PDFBox 2.0.19 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/2.0.19/

The release candidate is a zip archive of the sources in:

     http://svn.apache.org/repos/asf/pdfbox/tags/2.0.19/

The SHA-512 checksum of the archive is 
b9dcb725ca5123ebe9a8018532733acd443a345fe0a0448dec9ce5776c0b8b2fac420302e550064150403b987960b98a6bb85bff5a86bfbe8d291ba19ac950f8. 



Please vote on releasing this package as Apache PDFBox 2.0.19.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox 2.0.19
     [ ] -1 Do not release this package because...

Here is my +1

Andreas




Re: [VOTE] Release Apache PDFBox JBIG2 ImageIO 3.0.3

2019-12-17 Thread Timo Boehme

+1

Thanks,
Timo


Am 14.12.19 um 15:53 schrieb Andreas Lehmkuehler:

Hi,

a candidate for the PDFBox JBIG2 ImageIO 3.0.3 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/jbig2-imageio/3.0.3/

The release candidate is a zip archive of the sources in:

     https://github.com/apache/pdfbox-jbig2/tree/3.0.3/

The SHA-512 checksum of the archive is 
5350b4ce89af72eea5069f6ea5fc830238e4df711712506405aaf0e14546a1b07155b8c5225b47f0d40ce2821032426a2987adbe0df63c536cae4fb319b5c700. 



Please vote on releasing this package as Apache PDFBox JBIG2 ImageIO 3.0.3.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox JBIG2 ImageIO 3.0.3
     [ ] -1 Do not release this package because...

Here is my +1

Andreas




Re: Apache PDFBox Board Report October 2019 due

2019-10-04 Thread Timo Boehme

+1

Thanks,
Timo


Am 03.10.19 um 13:45 schrieb Andreas Lehmkuehler:

Hi,

find attached a quick draft of the board report we're expected to submit this
month. It's based upon the report wizard template which can be found at [1]



## Description:
The mission of PDFBox is the creation and maintenance of software related to
Java library for working with PDF documents

## Issues:
There are no issues requiring board attention at this time.

## Membership Data:
This month is the 10th anniversary of Apache PDFBox. We graduated as TLP on
2009-10-21.

There are currently 21 committers and 21 PMC members in this project. The
Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Software development activity:
- the work on 2.0.18 already started with a handful of fixes
- the minimum requirement for the trunk is now Java 8
- the improvement of the on demand parser of the trunk is an ongoing effort,
   as well as some other refactorings and improvements
- we are waiting for our sonar project to be moved to the new location

Recent releases:
- 2.0.17 was released on 2019-09-20
- 2.0.16 was released on 2019-06-27
- 2.0.15 was released on 2019-04-11

## Community Health:
There is a steady stream of contributions, bug reports and questions on the
mailing lists.



Andreas

[1] https://reporter.apache.org/wizard/?pdfbox




Re: Switch trunk to java 8

2019-09-18 Thread Timo Boehme

+1

Timo


Am 15.09.19 um 11:00 schrieb Andreas Lehmkuehler:

Hi,

I'd like to switch the trunk to Java 8, as I'd like to use some Java 8 
features such as streams in the near future.


Are there any objections?

Andreas




Re: [VOTE] Release Apache PDFBox 2.0.17

2019-09-18 Thread Timo Boehme

+1

Thanks,
Timo


Am 17.09.19 um 20:22 schrieb Andreas Lehmkuehler:

Hi,

a candidate for the PDFBox 2.0.17 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/2.0.17/

The release candidate is a zip archive of the sources in:

     http://svn.apache.org/repos/asf/pdfbox/tags/2.0.17/

The SHA-512 checksum of the archive is 
2c87384ec0ce768b01a653951c570dbb075f3e1ec63a7bf58d652bcab8e7c73375ae8ce2d133ba852d1ec21f999f3a12eeeaa8b982f5b007e92f5f1683032798. 



Please vote on releasing this package as Apache PDFBox 2.0.17.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox 2.0.17
     [ ] -1 Do not release this package because...

Here is my +1

Andreas




[jira] [Commented] (PDFBOX-4601) in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected scratch file size of 196608 but found 192512

2019-07-23 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890976#comment-16890976
 ] 

Timo Boehme commented on PDFBOX-4601:
-

So while it's good to hear the fix is working, I'm not convinced we should really 
apply the patch. When RandomAccessFile - which is quite a basic class in Java - 
is not working correctly, then the whole system should be regarded as 
unreliable (at least for running Java; the mentioned reason suggests that 
other functions may be broken as well). Fortunately it seems this bug will be 
fixed shortly.

Thus maybe one should wait for a fixed AWS system instead of adding the 
workaround to our code - so far this kind of problem has not been reported by 
anyone else - WDYT?

> in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected 
> scratch file size of 196608 but found 192512
> -
>
> Key: PDFBOX-4601
> URL: https://issues.apache.org/jira/browse/PDFBOX-4601
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.12, 2.0.16
> Environment: AWS Lambda
>Reporter: biswajit
>Priority: Major
> Fix For: 2.0.17
>
>
> in AWS lambda pdf merge giving error as
> {{Error in pdf consolidation: Expected scratch file size of 196608 but found 
> 192512.}}
> *Code:*
> {code}
> PDFMergerUtility pdfMerger = new PDFMergerUtility();
> pdfMerger.addSources(sources);
> pdfMerger.setDestinationStream(mergedPDFOutputStream);
> pdfMerger.mergeDocuments(MemoryUsageSetting.setupTempFileOnly());
> {code}
> Both the InputStream and the OutputStream are in-memory streams 
> (ByteArrayInputStream and ByteArrayOutputStream). The AWS Lambda environment 
> has only 512MB of space available, in the /tmp partition. I am not sure 
> whether this is the issue. AWS Lambda does not permit creating files in any 
> directory other than /tmp.
> While reading the code I found the piece of code below, whose condition I 
> think is always true, because adding a constant amount to an integer always 
> gives a value that is greater than the original by that amount
> in ScratchFile.java => enlarge() method:
> {code}
> if (pageCount + ENLARGE_PAGE_COUNT > pageCount)
> {
>   fileLen += ENLARGE_PAGE_COUNT * PAGE_SIZE;
>   raf.setLength(fileLen);
>   freePages.set(pageCount, pageCount + ENLARGE_PAGE_COUNT);
> }
> {code}
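
On the suspicion in the report that the condition is always true: for Java `int` values this does not hold in general, because the addition wraps around on overflow. The sketch below isolates the comparison. The value 16 for `ENLARGE_PAGE_COUNT` is an assumption made for illustration, and `canEnlarge` is a hypothetical helper, not a PDFBox method:

```java
public class OverflowGuardDemo {

    // Assumed value, for illustration only.
    static final int ENLARGE_PAGE_COUNT = 16;

    // Same comparison as in the quoted snippet, isolated in a hypothetical helper.
    // It is false exactly when pageCount + ENLARGE_PAGE_COUNT overflows int.
    static boolean canEnlarge(int pageCount) {
        return pageCount + ENLARGE_PAGE_COUNT > pageCount;
    }

    public static void main(String[] args) {
        System.out.println(canEnlarge(48));                // true
        System.out.println(canEnlarge(Integer.MAX_VALUE)); // false: the sum wraps to a negative value
    }
}
```

Read this way, the comparison looks like an overflow guard that only fails close to `Integer.MAX_VALUE`, so it is unlikely to be related to the size mismatch reported here. `Math.addExact` would be an alternative that throws instead of wrapping.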






[jira] [Comment Edited] (PDFBOX-4601) in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected scratch file size of 196608 but found 192512

2019-07-22 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890155#comment-16890155
 ] 

Timo Boehme edited comment on PDFBOX-4601 at 7/22/19 1:19 PM:
--

As I don't have this buggy environment it is a bit hard to recommend a fix. 
Maybe writing the last byte of the enlarged RAF triggers the correct length, 
something like
{code:java}
long origFilePointer = raf.getFilePointer();
raf.seek(fileLen - 1);
raf.write(0);
raf.seek(origFilePointer);
{code}
after
{code:java}
raf.setLength(fileLen);
{code}
Probably already the seek operation is enough?

Please report back if this helps - maybe finding the least costly IO-operation 
for getting the real size. We may add this as an optional workaround which 
could be enabled using a system property.
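
For anyone who wants to try the idea outside PDFBox, here is a self-contained sketch of the same steps against a temporary file. The class `ForceLengthDemo` and the method `enlargeAndTouch` are hypothetical names, and the target size of 48 pages of 4096 bytes is chosen only because it equals the 196608 bytes from the error message:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class ForceLengthDemo {

    // Grows the file to newLen, writes one zero byte at the last position and
    // restores the previous file pointer. Only safe when newLen is larger than
    // the old length, otherwise the write would overwrite existing data.
    static long enlargeAndTouch(RandomAccessFile raf, long newLen) throws IOException {
        raf.setLength(newLen);
        long origFilePointer = raf.getFilePointer();
        raf.seek(newLen - 1);
        raf.write(0);
        raf.seek(origFilePointer);
        return raf.length();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("scratch", ".tmp");
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
            long reported = enlargeAndTouch(raf, 48L * 4096);
            System.out.println("length reported after enlarge: " + reported);
        } finally {
            tmp.delete();
        }
    }
}
```

Comparing the reported length with and without the `seek`/`write` pair in the affected environment would show whether the extra write makes a difference there.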


was (Author: tboehme):
As I don't have this buggy environment it is a bit hard to recommend a fix. 
Maybe writing the last byte of the enlarged RAF triggers the correct length, 
something like
{code:java}
long origFilePointer = raf.getFilePointer();
raf.seek(fileLen - 1);
raf.write(0);
raf.seek(origFilePointer);
{code}
maybe already the seek operation is enough?

Please report back if this helps - maybe finding the least costly IO-operation 
for getting the real size. We may add this as an optional workaround which 
could be enabled using a system property.




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-4601) in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected scratch file size of 196608 but found 192512

2019-07-22 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890155#comment-16890155
 ] 

Timo Boehme commented on PDFBOX-4601:
-

As I don't have this buggy environment it is a bit hard to recommend something. Maybe 
writing the last byte of the enlarged RAF may trigger the correct length, 
something like
{code:java}
long origFilePointer = raf.getFilePointer();
raf.seek(fileLen - 1);
raf.write(0);
raf.seek(origFilePointer);
{code}
Maybe the seek operation alone is already enough?

Please report back whether this helps; the aim is to find the least costly I/O 
operation that makes the file report its real size. We could add this as an 
optional workaround that can be enabled via a system property.




[jira] [Comment Edited] (PDFBOX-4601) in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected scratch file size of 196608 but found 192512

2019-07-22 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890155#comment-16890155
 ] 

Timo Boehme edited comment on PDFBOX-4601 at 7/22/19 1:15 PM:
--

As I don't have this buggy environment it is a bit hard to recommend something. Maybe 
writing the last byte of the enlarged RAF may trigger the correct length, 
something like
{code:java}
long origFilePointer = raf.getFilePointer();
raf.seek(fileLen - 1);
raf.write(0);
raf.seek(origFilePointer);
{code}
Maybe the seek operation alone is already enough?

Please report back whether this helps; the aim is to find the least costly I/O 
operation that makes the file report its real size. We could add this as an 
optional workaround that can be enabled via a system property.


was (Author: tboehme):
As I don't have this buggy environment it is a bit hard to recommend something. Maybe 
writing the last byte of the enlarged RAF may trigger the correct length, 
something like
{code:java}
long origFilePointer = raf.getFilePointer();
raf.seek(fileLen - 1);
raf.write(0);
raf.seek(origFilePointer);
{code}
Maybe the seek operation alone is already enough?

Please report back whether this helps; the aim is to find the least costly I/O 
operation that makes the file report its real size. We could add this as an 
optional workaround that can be enabled via a system property.




[jira] [Commented] (PDFBOX-4601) in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected scratch file size of 196608 but found 192512

2019-07-22 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890082#comment-16890082
 ] 

Timo Boehme commented on PDFBOX-4601:
-

I don't know whether [https://bugs.openjdk.java.net/browse/JDK-8202261] has 
something to do with this.




[jira] [Commented] (PDFBOX-4601) in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected scratch file size of 196608 but found 192512

2019-07-22 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890076#comment-16890076
 ] 

Timo Boehme commented on PDFBOX-4601:
-

To me this is quite strange behavior (e.g. fileLen after: 131072, raf 
length: 65536): after the RAF size is set, a subsequent check does not report 
the new size. It seems the file system does not report the correct new size 
(when testing on my end, the newly set length is immediately reported by 
raf.length()). Is this special behavior of the AWS file system or of the JDK? 
There seems to be some caching or lazy propagation of I/O operations ...




[jira] [Comment Edited] (PDFBOX-4601) in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected scratch file size of 196608 but found 192512

2019-07-22 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890011#comment-16890011
 ] 

Timo Boehme edited comment on PDFBOX-4601 at 7/22/19 9:02 AM:
--

Regarding the exception: the scratch file is enlarged in steps of 16 pages of 
4 kB each. Thus it should have the expected size of 196608 bytes (48 pages of 
4 kB, i.e. it was enlarged 3 times) and cannot have a size of 192512 bytes 
(47 pages of 4 kB). Currently, when setting the new file length, we rely on 
RandomAccessFile.setLength(X) to throw an exception if setting this size is not 
possible. Somehow setting the new size did not work in your case, and no 
exception was thrown.

Does this happen regularly/each time? You may add a check after
{code:java}
raf.setLength(fileLen);
{code}
to detect whether the file could not be set to the new length 
( {{if (raf.length() != fileLen) ...}} ) and report here whether that is the 
case (I just realized that this is what [~tilman] did with the debug logging).
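
A small sketch of such a check, under stated assumptions: the class and method names are hypothetical and this is not the actual ScratchFile code; the constants only mirror the 16 pages of 4 kB mentioned above.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class EnlargeCheck
{
    static final int PAGE_SIZE = 4096;          // 4 kB pages
    static final int ENLARGE_PAGE_COUNT = 16;   // pages added per enlargement

    /** Grows the file by 16 pages and fails loudly if the new size is not reported. */
    static long enlarge(RandomAccessFile raf, long fileLen) throws IOException
    {
        long newLen = fileLen + (long) ENLARGE_PAGE_COUNT * PAGE_SIZE;
        raf.setLength(newLen);
        long actual = raf.length();
        if (actual != newLen)
        {
            throw new IOException("Expected scratch file size of " + newLen
                    + " but found " + actual);
        }
        return newLen;
    }

    public static void main(String[] args) throws IOException
    {
        File tmp = File.createTempFile("scratch", ".tmp");
        tmp.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw"))
        {
            long len = 0;
            for (int i = 0; i < 3; i++)
            {
                len = enlarge(raf, len);
            }
            System.out.println(len + " bytes = " + len / PAGE_SIZE + " pages");
        }
    }
}
```

On a file system that behaves as expected, three enlargements should end at 3 * 16 * 4096 = 196608 bytes (48 pages); the reported 192512 bytes would be exactly one page short.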


was (Author: tboehme):
Regarding the exception: the scratch file is enlarged in steps of 16 pages of 
4 kB each. Thus it should have the expected size of 196608 bytes (48 pages of 
4 kB, i.e. it was enlarged 3 times) and cannot have a size of 192512 bytes 
(47 pages of 4 kB). Currently, when setting the new file length, we rely on 
RandomAccessFile.setLength(X) to throw an exception if setting this size is not 
possible. Somehow setting the new size did not work in your case, and no 
exception was thrown.

Does this happen regularly/each time? You may add a check after
{code:java}
raf.setLength(fileLen);
{code}
to detect whether the file could not be set to the new length 
( {{if (raf.length() != fileLen) ...}} ) and report here whether that is the case.




[jira] [Comment Edited] (PDFBOX-4601) in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected scratch file size of 196608 but found 192512

2019-07-22 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890011#comment-16890011
 ] 

Timo Boehme edited comment on PDFBOX-4601 at 7/22/19 8:57 AM:
--

Regarding the exception: the scratch file is enlarged in steps of 16 pages of 
4 kB each. Thus it should have the expected size of 196608 bytes (48 pages of 
4 kB, i.e. it was enlarged 3 times) and cannot have a size of 192512 bytes 
(47 pages of 4 kB). Currently, when setting the new file length, we rely on 
RandomAccessFile.setLength(X) to throw an exception if setting this size is not 
possible. Somehow setting the new size did not work in your case, and no 
exception was thrown.

Does this happen regularly/each time? You may add a check after
{code:java}
raf.setLength(fileLen);
{code}
to detect whether the file could not be set to the new length 
( {{if (raf.length() != fileLen) ...}} ) and report here whether that is the case.


was (Author: tboehme):
Regarding the exception: the scratch file is enlarged in steps of 16 pages of 
4 kB each. Thus it should have the expected size of 196608 bytes (48 pages of 
4 kB, i.e. it was enlarged 3 times) and cannot have a size of 192512 bytes 
(47 pages of 4 kB). Currently, when setting the new file length, we rely on 
RandomAccessFile.setLength(X) to throw an exception if setting this size is not 
possible. Somehow setting the new size did not work in your case, and no 
exception was thrown.

Does this happen regularly/each time? You may add a check after
{code:java}
raf.setLength(fileLen);
{code}
to detect whether the file could not be set to the new length 
( {{if (raf.length() != fileLen) ...}} ) and report here whether that is the case.




[jira] [Commented] (PDFBOX-4601) in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected scratch file size of 196608 but found 192512

2019-07-22 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890011#comment-16890011
 ] 

Timo Boehme commented on PDFBOX-4601:
-

Regarding the exception: the scratch file is enlarged in steps of 16 pages of 
4 kB each. Thus it should have the expected size of 196608 bytes (48 pages of 
4 kB, i.e. it was enlarged 3 times) and cannot have a size of 192512 bytes 
(47 pages of 4 kB). Currently, when setting the new file length, we rely on 
RandomAccessFile.setLength(X) to throw an exception if setting this size is not 
possible. Somehow setting the new size did not work in your case, and no 
exception was thrown.

Does this happen regularly/each time? You may add a check after
{code:java}
raf.setLength(fileLen);
{code}
to detect whether the file could not be set to the new length 
( {{if (raf.length() != fileLen) ...}} ) and report here whether that is the case.




[jira] [Commented] (PDFBOX-4601) in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected scratch file size of 196608 but found 192512

2019-07-22 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889974#comment-16889974
 ] 

Timo Boehme commented on PDFBOX-4601:
-

Hi, regarding
{code:java}
if (pageCount + ENLARGE_PAGE_COUNT > pageCount)
{code}
at first the condition seems pointless, but you should also read the comment 
above it:
{code:java}
// enlarge if we do not overflow
{code}
so this tests for the rare case of int overflow. The maximum page count is 
tested at the start of the method; when the page count is increased later, it 
is assumed that adding ENLARGE_PAGE_COUNT is not problematic even if 
maxPageCount - pageCount is less than this value (a few 4 kB pages).
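
To illustrate why the condition is not always true, here is a tiny stand-alone sketch. The class and method names are hypothetical; only the condition itself is taken from the snippet above. Java int arithmetic wraps around silently on overflow, so the sum becomes negative and the comparison fails:

```java
class OverflowGuardDemo
{
    static final int ENLARGE_PAGE_COUNT = 16;

    /** True if adding ENLARGE_PAGE_COUNT to pageCount does not overflow int. */
    static boolean canEnlarge(int pageCount)
    {
        return pageCount + ENLARGE_PAGE_COUNT > pageCount;
    }

    public static void main(String[] args)
    {
        // ordinary case: 48 + 16 = 64 > 48
        System.out.println(canEnlarge(48));
        // near the int limit the sum wraps around to a negative value
        System.out.println(canEnlarge(Integer.MAX_VALUE - 5));
    }
}
```

The first call yields true and the second false. An alternative way to express the same guard would be {{Math.addExact}}, which throws an ArithmeticException instead of wrapping.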




Re: [ANNOUNCE] Apache PDFBox 2.0.16 released

2019-06-28 Thread Timo Boehme




--
Timo Boehme
OntoChem IT Solutions GmbH
Blücherstraße 24
06120 Halle (Saale)
Germany

phone: +49 345 478 047 4| fax: +49 345 478 047 1
email: timo.boe...@ontochem.com | web: www.ontochem.com
HRB 21962 Amtsgericht Stendal   | USt-IdNr.: DE815563824
managing director : Lutz Weber





Re: [VOTE] Release Apache PDFBox 2.0.16

2019-06-26 Thread Timo Boehme

Hi,

+1

Thanks,
Timo


Am 24.06.19 um 19:35 schrieb Andreas Lehmkuehler:

Hi,

a candidate for the PDFBox 2.0.16 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/2.0.16/

The release candidate is a zip archive of the sources in:

     http://svn.apache.org/repos/asf/pdfbox/tags/2.0.16/

The SHA-512 checksum of the archive is 
cd82d40f19500bb7b510d0eb25664779ae63a12152e5ccc92a643db12e438d8700d6f74093a1f2e739780b5fecacc7636aabfe5a4b9b85dd32eb1bc1394f3f71. 



Please vote on releasing this package as Apache PDFBox 2.0.16.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox 2.0.16
     [ ] -1 Do not release this package because...


Here is my +1

Andreas





--
Timo Boehme
OntoChem IT Solutions GmbH
Blücherstraße 24
06120 Halle (Saale)
Germany

phone: +49 345 478 047 4| fax: +49 345 478 047 1
email: timo.boe...@ontochem.com | web: www.ontochem.com
HRB 21962 Amtsgericht Stendal   | USt-IdNr.: DE815563824
managing director : Lutz Weber





[jira] [Comment Edited] (PDFBOX-4559) Parse error reading document from several threads

2019-06-17 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865403#comment-16865403
 ] 

Timo Boehme edited comment on PDFBOX-4559 at 6/17/19 8:17 AM:
--

I think we have to explore the different levels of creating/using streams with 
regard to thread safety. The base implementation for our memory paging, 
ScratchFile, is thread safe (as the Javadoc states; at least it was meant to 
be :)). However, the RandomAccess instances (ScratchFileBuffer) created from it 
are not, as reads and writes can be mixed (and so far parallel access to an 
instance was not supported by the API). RandomAccessInputStream is only a thin 
layer on top of RandomAccessRead, which here is a ScratchFileBuffer. The first 
step would be to switch the ScratchFileBuffer into a read-only mode (or to have 
a small wrapper that implements RandomAccessRead and only allows thread-safe 
read access).

However, even this might not help in this case, as using a single 
RandomAccessInputStream from multiple threads will lead to errors (even if the 
methods were synchronized): one thread would not see a sequential stream of 
input bytes because the other threads read some bytes in between.

For thread-safe access the RandomAccessInputStream has to be created on request 
of the specific thread and method that wants to read the data. Thus the 
COSInputStream would have to store the thread-safe RandomAccessRead 
implementation (as it does indirectly now for the ScratchFileBuffer 
underlying the RandomAccessInputStream) and would need a method for creating a 
RandomAccessInputStream each time one is needed (being only a small access 
wrapper for the data).
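
The idea of one stream per reader can be sketched roughly as follows. This is only an illustration with plain JDK classes: {{SharedData}} and {{createInputStream()}} are hypothetical stand-ins for a thread-safe, read-only RandomAccessRead and the proposed factory method, not PDFBox API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerThreadStreamDemo
{
    /** Hypothetical stand-in for a thread-safe, read-only data holder. */
    static final class SharedData
    {
        private final byte[] bytes;

        SharedData(byte[] bytes)
        {
            this.bytes = bytes.clone();   // never modified afterwards
        }

        /** Each call returns an independent stream with its own read position. */
        InputStream createInputStream()
        {
            return new ByteArrayInputStream(bytes);
        }
    }

    /** Reads the stream sequentially and sums up all bytes. */
    static int checksum(InputStream in) throws IOException
    {
        int sum = 0;
        int b;
        while ((b = in.read()) != -1)
        {
            sum += b;
        }
        return sum;
    }

    /** Every task creates its own stream, so no read position is shared. */
    static List<Integer> readInParallel(SharedData data, int threads) throws Exception
    {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try
        {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++)
            {
                futures.add(pool.submit(() -> {
                    try (InputStream in = data.createInputStream())
                    {
                        return checksum(in);
                    }
                }));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures)
            {
                results.add(future.get());
            }
            return results;
        }
        finally
        {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception
    {
        SharedData data = new SharedData(new byte[] { 1, 2, 3, 4 });
        System.out.println(readInParallel(data, 4));
    }
}
```

Because every reader gets its own position over the same immutable bytes, each task sees the complete sequential byte stream. If all tasks shared one stream instead, the bytes would be distributed among them even with synchronized methods.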

 


was (Author: tboehme):
I think we have to explore the different levels of creating/using streams with 
regard to thread safety. The base implementation for our memory paging, 
ScratchFile, is thread safe (as the Javadoc states; at least it was meant to 
be :)). However, the RandomAccess instances (ScratchFileBuffer) created from it 
are not, as reads and writes can be mixed (and so far parallel access to an 
instance was not supported by the API). RandomAccessInputStream is only a thin 
layer on top of RandomAccessRead, which here is a ScratchFileBuffer. The first 
step would be to switch the ScratchFileBuffer into a read-only mode (or to have 
a small wrapper that implements RandomAccessRead and only allows thread-safe 
read access).

However, even this might not help in this case, as using a single 
RandomAccessInputStream from multiple threads will go wrong (even if the 
methods were synchronized): one thread would not see a sequential stream of 
input bytes because the other threads read some bytes in between.

For thread-safe access the RandomAccessInputStream has to be created on request 
of the specific thread and method that wants to read the data. Thus the 
COSInputStream would have to store the thread-safe RandomAccessRead 
implementation (as it does indirectly now for the ScratchFileBuffer 
underlying the RandomAccessInputStream) and would need a method for creating a 
RandomAccessInputStream each time one is needed (being only a small access 
wrapper for the data).

 

> Parse error reading document from several threads
> -
>
> Key: PDFBOX-4559
> URL: https://issues.apache.org/jira/browse/PDFBOX-4559
> Project: PDFBox
>  Issue Type: Bug
>  Components: Documentation, Rendering
>Affects Versions: 2.0.15
> Environment: Oracle Java 8 update125 on both Mac OS X and centos
>Reporter: Jack
>Priority: Major
>  Labels: concurrency, multithreading, type1, type1font
> Attachments: test.pdf
>
>
> I got the following error while running simple parallel rendering code. 
> However, the error doesn't happen when I change parallelStream() to the 
> sequential stream(). Interestingly, both variants render exactly the same 
> images. I saw a possibly related ticket, PDFBOX-3654, but that issue seems to 
> have been fixed. I'd like to learn whether there are more related bugs.
> *Sample code*:
> {code:java}
> PDDocument document = PDDocument.load(new File(pdfFilename));
> List<PDDocument> pdfPages = new Splitter().split(document);
> pdfPages.parallelStream().forEach(pdfPage -> {
>     try {
>         PDFRenderer renderer = new PDFRenderer(pdfPage);
>         // change the DPI to your number
>         renderer.renderImageWithDPI(0, 180, ImageType.RGB);
>     } catch (IOException e) {
>         System.out.println(e);
>     }
>     try {
>         pdfPage.close();
>     } catch (IOException ignored) {
>     }
> });
> try {
>     document.close();
> } catch (IOException ignored) {
> }
> {code}
>  
> *Error log*:
> {noformat}
> ERROR [PDType1

[jira] [Commented] (PDFBOX-4559) Parse error reading document from several threads

2019-06-17 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865403#comment-16865403
 ] 

Timo Boehme commented on PDFBOX-4559:
-

I think we have to look at the different levels at which streams are created 
and used with regard to thread safety. The base implementation for our memory 
paging, ScratchFile, is thread safe (as the Javadoc states; at least it was 
meant to be :) ). However, the RandomAccess instances (ScratchFileBuffer) 
created from it are not, because reads and writes can be mixed (and so far 
parallel access to a single instance was not supported by the API). 
RandomAccessInputStream is only a thin layer on top of RandomAccessRead, here 
a ScratchFileBuffer. The first step would be to switch the ScratchFileBuffer 
into a read-only mode (or to have a small wrapper implementing 
RandomAccessRead that only allows thread-safe read access).

However, even this might not help in this case, because using a single 
RandomAccessInputStream from multiple threads will go wrong (even if the 
methods were synchronized): a thread would not see a sequential stream of 
input bytes, since other threads would read some of the bytes in between.

For thread-safe access the RandomAccessInputStream has to be created on request 
of the specific thread and method that wants to read the data. Thus the 
COSInputStream would have to store the thread-safe RandomAccessRead 
implementation (as it does indirectly now for the ScratchFileBuffer 
underlying the RandomAccessInputStream) and would need a method for creating a 
RandomAccessInputStream each time one is needed (it being only a small access 
wrapper for the data).
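
To illustrate the last point, here is a minimal sketch in plain Java. It does 
not use the PDFBox classes: SharedStreamSource and its methods are hypothetical 
names, and a byte array stands in for the thread-safe, read-only 
RandomAccessRead. The only idea shown is that every reader asks for its own 
stream, so each thread has its own read position.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Hypothetical holder of read-only data that hands out one stream per caller. */
final class SharedStreamSource {
    private final byte[] data; // never modified after construction

    SharedStreamSource(byte[] data) {
        this.data = data.clone();
    }

    /** Each call returns an independent stream with its own read position. */
    InputStream createInputStream() {
        return new ByteArrayInputStream(data);
    }

    /** Reads the whole content through a freshly created stream. */
    byte[] readAll() {
        try (InputStream in = createInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[2];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SharedStreamSource source = new SharedStreamSource(new byte[] {1, 2, 3, 4, 5});
        // Both threads read the same data, but through separate streams.
        Runnable reader = () -> System.out.println(Arrays.toString(source.readAll()));
        Thread t1 = new Thread(reader);
        Thread t2 = new Thread(reader);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

If both threads shared a single InputStream instead, each would only see the 
bytes the other one had not consumed yet, which is the failure described above.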

 

> Parse error reading document from several threads
> -
>
> Key: PDFBOX-4559
> URL: https://issues.apache.org/jira/browse/PDFBOX-4559
> Project: PDFBox
>  Issue Type: Bug
>  Components: Documentation, Rendering
>Affects Versions: 2.0.15
> Environment: Oracle Java 8 update125 on both Mac OS X and centos
>Reporter: Jack
>Priority: Major
>  Labels: concurrency, multithreading, type1, type1font
> Attachments: test.pdf
>
>
> I got the following error while running simple parallel rendering code. 
> However, the error doesn't happen when I change parallelStream() to a 
> sequential stream(). Interestingly, both variants render exactly the same 
> images. I saw a possibly related ticket, PDFBOX-3654, but it seems that issue 
> was fixed. Are there more bugs related to this?  
> *Sample code*:
> {code:java}
> PDDocument document = PDDocument.load(new File(pdfFilename));
> List<PDDocument> pdfPages = new Splitter().split(document);
> pdfPages.parallelStream().forEach(page -> {
>     try {
>         PDFRenderer renderer = new PDFRenderer(page);
>         renderer.renderImageWithDPI(0, 180, ImageType.RGB); // change dpi to your number
>     } catch (IOException e) {
>         System.out.println(e);
>     }
>     try {
>         page.close();
>     } catch (IOException ignored) {
>     }
> });
> try {
>     document.close();
> } catch (IOException ignored) {
> }
> {code}
>  
> *Error log*:
> {noformat}
> ERROR [PDType1Font] Can't read the embedded Type1 font POAEND+Gotham-Book
> java.io.IOException: unexpected closing parenthesis
>  at org.apache.fontbox.type1.Type1Lexer.readToken(Type1Lexer.java:123) 
> ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.fontbox.type1.Type1Lexer.nextToken(Type1Lexer.java:75) 
> ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.fontbox.type1.Type1Parser.readValue(Type1Parser.java:398) 
> ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.fontbox.type1.Type1Parser.readOtherSubrs(Type1Parser.java:707) 
> ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.fontbox.type1.Type1Parser.parseBinary(Type1Parser.java:550) 
> ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.fontbox.type1.Type1Parser.parse(Type1Parser.java:64) 
> ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.fontbox.type1.Type1Font.createWithSegments(Type1Font.java:85) 
> ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:262) 
> ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at 
> org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:62)
>  ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:146) 
> ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT]
>  at 
> org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAn

[jira] [Comment Edited] (PDFBOX-4539) Cache CharsetDecoder

2019-05-09 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836339#comment-16836339
 ] 

Timo Boehme edited comment on PDFBOX-4539 at 5/9/19 12:41 PM:
--

Removed my suggestion as the decode method already does the full decoding cycle 
including reset.
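
A minimal sketch of a cached decoder that relies on this behaviour. The class 
name Utf8Check is hypothetical, and the ThreadLocal is an assumption added 
here (it is not part of the proposal in the issue) because CharsetDecoder 
instances are not safe for use by several threads at once.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: validates UTF-8 with one cached decoder per thread. */
final class Utf8Check {
    // Assumption: one decoder per thread, since a decoder must not be shared.
    private static final ThreadLocal<CharsetDecoder> DECODER =
            ThreadLocal.withInitial(() -> StandardCharsets.UTF_8.newDecoder());

    /** Returns true if the byte sequence is valid UTF-8. */
    static boolean isValidUtf8(byte[] input) {
        try {
            // decode(ByteBuffer) resets the decoder, decodes and flushes,
            // so no explicit reset() is needed after a failed call.
            DECODER.get().decode(ByteBuffer.wrap(input));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] invalid = {(byte) 0xC3, (byte) 0x28};
        byte[] valid = "Blücherstraße".getBytes(StandardCharsets.UTF_8);
        System.out.println(isValidUtf8(invalid));
        System.out.println(isValidUtf8(valid));
    }
}
```

The second call in main reuses the decoder that has just failed on the invalid 
input, which works because of the implicit reset.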


was (Author: tboehme):
Removed my suggestion as the decode method already does the full decoding cycle 
inclusing reset.

> Cache CharsetDecoder
> 
>
> Key: PDFBOX-4539
> URL: https://issues.apache.org/jira/browse/PDFBOX-4539
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Parsing
>Affects Versions: 2.0.14
>Reporter: Jonathan
>Priority: Major
>  Labels: performance
> Fix For: 2.0.16
>
>
> We were using PDFBox to parse and process a large number of PDFs, which could 
> potentially contain thousands of pages in total, so performance mattered to 
> us.
> Thus, we'd like to suggest caching the CharsetDecoder, which is currently 
> instantiated on each call of `isValidUTF8(byte[])`.
> Our suggestion in BaseParser.java:
> {code:java}
> private static final CharsetDecoder csUTF_8 = Charsets.UTF_8.newDecoder();
> /**
>  * Returns true if a byte sequence is valid UTF-8.
>  */
> private boolean isValidUTF8(byte[] input)
> {
>     try
>     {
>         csUTF_8.decode(ByteBuffer.wrap(input));
>         return true;
>     }
>     catch (CharacterCodingException e)
>     {
>         return false;
>     }
> }
> {code}
>  






[jira] [Comment Edited] (PDFBOX-4539) Cache CharsetDecoder

2019-05-09 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836339#comment-16836339
 ] 

Timo Boehme edited comment on PDFBOX-4539 at 5/9/19 12:40 PM:
--

Removed my suggestion as the decode method already does the full decoding cycle 
including reset.


was (Author: tboehme):
How about
{code:java}
private final CharsetDecoder csUTF_8 = Charsets.UTF_8.newDecoder();

/**
 * Returns true if a byte sequence is valid UTF-8.
 */
private boolean isValidUTF8(byte[] input)
{
    try
    {
        csUTF_8.decode(ByteBuffer.wrap(input));
        return true;
    }
    catch (CharacterCodingException e)
    {
        csUTF_8.reset();
        return false;
    }
}
{code}

> Cache CharsetDecoder
> 
>
> Key: PDFBOX-4539
> URL: https://issues.apache.org/jira/browse/PDFBOX-4539
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Parsing
>Affects Versions: 2.0.14
>Reporter: Jonathan
>Priority: Major
>  Labels: performance
> Fix For: 2.0.16
>
>
> We were using PDFBox to parse and process a large number of PDFs, which could 
> potentially contain thousands of pages in total, so performance mattered to 
> us.
> Thus, we'd like to suggest caching the CharsetDecoder, which is currently 
> instantiated on each call of `isValidUTF8(byte[])`.
> Our suggestion in BaseParser.java:
> {code:java}
> private static final CharsetDecoder csUTF_8 = Charsets.UTF_8.newDecoder();
> /**
>  * Returns true if a byte sequence is valid UTF-8.
>  */
> private boolean isValidUTF8(byte[] input)
> {
>     try
>     {
>         csUTF_8.decode(ByteBuffer.wrap(input));
>         return true;
>     }
>     catch (CharacterCodingException e)
>     {
>         return false;
>     }
> }
> {code}
>  






[jira] [Commented] (PDFBOX-4539) Cache CharsetDecoder

2019-05-09 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836339#comment-16836339
 ] 

Timo Boehme commented on PDFBOX-4539:
-

How about
{code:java}
private final CharsetDecoder csUTF_8 = Charsets.UTF_8.newDecoder();

/**
 * Returns true if a byte sequence is valid UTF-8.
 */
private boolean isValidUTF8(byte[] input)
{
    try
    {
        csUTF_8.decode(ByteBuffer.wrap(input));
        return true;
    }
    catch (CharacterCodingException e)
    {
        csUTF_8.reset();
        return false;
    }
}
{code}

> Cache CharsetDecoder
> 
>
> Key: PDFBOX-4539
> URL: https://issues.apache.org/jira/browse/PDFBOX-4539
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Parsing
>Affects Versions: 2.0.14
>Reporter: Jonathan
>Priority: Major
>  Labels: performance
> Fix For: 2.0.16
>
>
> We were using PDFBox to parse and process a large number of PDFs, which could 
> potentially contain thousands of pages in total, so performance mattered to 
> us.
> Thus, we'd like to suggest caching the CharsetDecoder, which is currently 
> instantiated on each call of `isValidUTF8(byte[])`.
> Our suggestion in BaseParser.java:
> {code:java}
> private static final CharsetDecoder csUTF_8 = Charsets.UTF_8.newDecoder();
> /**
>  * Returns true if a byte sequence is valid UTF-8.
>  */
> private boolean isValidUTF8(byte[] input)
> {
>     try
>     {
>         csUTF_8.decode(ByteBuffer.wrap(input));
>         return true;
>     }
>     catch (CharacterCodingException e)
>     {
>         return false;
>     }
> }
> {code}
>  






Re: [VOTE] Release Apache PDFBox 2.0.15

2019-04-09 Thread Timo Boehme

+1

Thanks,
Timo

Am 08.04.19 um 17:15 schrieb Andreas Lehmkuehler:

Hi,

a candidate for the PDFBox 2.0.15 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/2.0.15/

The release candidate is a zip archive of the sources in:

     http://svn.apache.org/repos/asf/pdfbox/tags/2.0.15/

The SHA-512 checksum of the archive is 
4f2afe35ae9feb0b2edfd4d7bec1061db5651138bffc124b8bf522f18e5446bbdab2bd1949bed6c12c20dc93d5f82031ab958062553b659d8ae88bb0fef43270. 



Please vote on releasing this package as Apache PDFBox 2.0.15.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox 2.0.15
     [ ] -1 Do not release this package because...


Here is my +1

Andreas





--
Timo Boehme
OntoChem IT Solutions GmbH
Blücherstraße 24
06120 Halle (Saale)
Germany

phone: +49 345 478 047 4| fax: +49 345 478 047 1
email: timo.boe...@ontochem.com | web: www.ontochem.com
HRB 21962 Amtsgericht Stendal   | USt-IdNr.: DE815563824
managing director : Lutz Weber





Re: [VOTE] Release Apache PDFBox 2.0.14

2019-02-27 Thread Timo Boehme

Hi,

+1

Thanks,
Timo

Am 25.02.19 um 18:22 schrieb Andreas Lehmkuehler:

Hi,

a candidate for the PDFBox 2.0.14 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/2.0.14/

The release candidate is a zip archive of the sources in:

     http://svn.apache.org/repos/asf/pdfbox/tags/2.0.14/

The SHA-512 checksum of the archive is 
8fe88d2ee4e243e47e651df914cc51b72f5ba0cb737125e8a622137327330e6f542f2f0df13e43bec5148554b262a4cbf4b2b0fbcec985e5db487fe6420a06b3. 



Please vote on releasing this package as Apache PDFBox 2.0.14.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox 2.0.14
     [ ] -1 Do not release this package because...


Here is my +1

Andreas





--
Timo Boehme
OntoChem IT Solutions GmbH
Blücherstraße 24
06120 Halle (Saale)
Germany

phone: +49 345 478 047 4| fax: +49 345 478 047 1
email: timo.boe...@ontochem.com | web: www.ontochem.com
HRB 21962 Amtsgericht Stendal   | USt-IdNr.: DE815563824
managing director : Lutz Weber





Re: Apache PDFBox Board Report January 2019 due

2019-01-07 Thread Timo Boehme

Hi,

+1

BR Timo


Am 06.01.19 um 17:44 schrieb Andreas Lehmkuehler:

Hi,

find attached a quick draft of the board report we're expected to submit this
month. It's based upon the report template which can be found at [1]


Any further comments, objections or additions?


## Description:
  - the Apache PDFBox library is an open source Java tool for working with PDF
    documents.

## Issues:
  - there are no issues requiring board attention at this time.

## Activity:
  - more than 20 JIRA tickets have been fixed since releasing 2.0.13, so
    2.0.14 will most likely be released soon
  - according to Sally's post "Apache in 2018 - By The Digits", Tilman is among
    the top 5 committers of 2018 although we are a small community compared
    to other ASF projects, see https://s.apache.org/Apache2018Digits
  - Maruan managed to move all of our repos from git-wip-us to gitbox to
    support infra with the decommissioning of git-wip-us


## Health report:
  - there is a steady stream of contributions, bug reports and questions on the
    mailing lists

## PMC changes:

  - Currently 21 PMC members.
  - No new PMC members added in the last 3 months
  - Last PMC addition was Matthäus Mayer on Mon Oct 16 2017

## Committer base changes:

  - Currently 21 committers.
  - No new committers added in the last 3 months
  - Last committer addition was Joerg O. Henne at Mon Oct 09 2017

## Releases:

  - 2.0.13 was released on Sun Dec 02 2018



Andreas

[1] https://reporter.apache.org/?pdfbox





--
Timo Boehme
OntoChem IT Solutions GmbH
Blücherstraße 24
06120 Halle (Saale)
Germany

phone: +49 345 478 047 4| fax: +49 345 478 047 1
email: timo.boe...@ontochem.com | web: www.ontochem.com
HRB 21962 Amtsgericht Stendal   | USt-IdNr.: DE815563824
managing director : Lutz Weber





Re: [VOTE][LAZY] move PDFBox git-wip repos to gitbox

2018-12-10 Thread Timo Boehme

Hi,

+1

BR Timo

Am 09.12.18 um 13:09 schrieb Andreas Lehmkuehler:

Hi,

Infra stated that we need documented consensus on this. So, let’s have 
at it.


Maruan proposed to move the following repos over to gitbox:

pdfbox-docs - our documentation
pdfbox-jbig2 - jbig2 subproject
pdfbox-testfiles - build test files for jbig2

We are going to start with pdfbox-docs.

The empty repository pdfbox-examples will be retired due to inactivity.

This is a lazy vote and will close in 72 hours. [1], [2]

Cheers,
Andreas


[1] https://www.apache.org/foundation/voting.html
[2] https://www.apache.org/foundation/glossary.html#LazyConsensus








--
Timo Boehme
OntoChem IT Solutions GmbH
Blücherstraße 24
06120 Halle (Saale)
Germany

phone: +49 345 478 047 4| fax: +49 345 478 047 1
email: timo.boe...@ontochem.com | web: www.ontochem.com
HRB 21962 Amtsgericht Stendal   | USt-IdNr.: DE815563824
managing director : Lutz Weber





Re: Apache PDFBox Board Report October 2018 due

2018-10-09 Thread Timo Boehme

+1

Thanks
Timo

Am 08.10.18 um 17:35 schrieb Andreas Lehmkuehler:

Hi,

find attached a quick draft of the board report we're expected to submit this
month. It's based upon the report template which can be found at [1]


Any further comments, objections or additions?



## Description:
  - the Apache PDFBox library is an open source Java tool for working with PDF
    documents.

## Issues:
  - there are no issues requiring board attention at this time.

## Activity:
  - we released 2 new PDFBox versions and one new JBIG2 version
  - 1.8.16 and 2.0.12 were released to fix CVE-2018-11797. It was reported
    through security@
  - Tilman is working on making PDFBox compatible with Java 11
  - we are collaborating with Daniel Persson to explain a JDK-related
    performance issue with some rendering cases


## Health report:
  - there is a steady stream of contributions, bug reports and questions on the
    mailing lists

## PMC changes:

  - Currently 21 PMC members.
  - No new PMC members added in the last 3 months
  - Last PMC addition was Matthäus Mayer on Mon Oct 16 2017

## Committer base changes:

  - Currently 21 committers.
  - No new committers added in the last 3 months
  - Last committer addition was Joerg O. Henne at Mon Oct 09 2017

## Releases:

  - 1.8.16 was released on Fri Oct 05 2018
  - 2.0.12 was released on Fri Oct 05 2018
  - 3.0.2 JBIG2 was released on Tue Sep 25 2018



Andreas

[1] https://reporter.apache.org/?pdfbox





--
Timo Boehme
OntoChem IT Solutions GmbH
Blücherstraße 24
06120 Halle (Saale)
Germany

phone: +49 345 478 047 4| fax: +49 345 478 047 1
email: timo.boe...@ontochem.com | web: www.ontochem.com
HRB 21962 Amtsgericht Stendal   | USt-IdNr.: DE815563824
managing director : Lutz Weber





Re: [VOTE] Release Apache PDFBox 1.8.16

2018-10-02 Thread Timo Boehme

Hi,

+1

Timo


Am 01.10.18 um 21:04 schrieb Andreas Lehmkuehler:

Hi,

a candidate for the PDFBox 1.8.16 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/1.8.16/

The release candidate is a zip archive of the sources in:

     http://svn.apache.org/repos/asf/pdfbox/tags/1.8.16/

The SHA-512 checksum of the archive is 
85fbb9ef611876566f4bca626328af1e6c2ee9e9fddf18f589c110042727c15fa301d693b5f397bdbfb41e245502f40b9b2edb7dc691ccbe3e9f57a5aee8061e. 



Please vote on releasing this package as Apache PDFBox 1.8.16.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox 1.8.16
     [ ] -1 Do not release this package because...

Here is my +1

Andreas





--
Timo Boehme
OntoChem IT Solutions GmbH
Blücherstraße 24
06120 Halle (Saale)
Germany

phone: +49 345 478 047 4| fax: +49 345 478 047 1
email: timo.boe...@ontochem.com | web: www.ontochem.com
HRB 21962 Amtsgericht Stendal   | USt-IdNr.: DE815563824
managing director : Lutz Weber





Re: [VOTE] Release Apache PDFBox 2.0.12

2018-10-02 Thread Timo Boehme

Hi,

+1

Timo


Am 01.10.18 um 20:40 schrieb Andreas Lehmkuehler:

Hi,

a candidate for the PDFBox 2.0.12 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/2.0.12/

The release candidate is a zip archive of the sources in:

     http://svn.apache.org/repos/asf/pdfbox/tags/2.0.12/

The SHA-512 checksum of the archive is 
164a05954ed30e7c334d3c09a13acb6ad4b242ee24de5f96a27ab80329f85933c9d9561fdd542687864596db3d1f16f55c6fd18f31cea65d98a0cc22f5238f6b. 



Please vote on releasing this package as Apache PDFBox 2.0.12.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox 2.0.12
     [ ] -1 Do not release this package because...


Here is my +1

Andreas





--
Timo Boehme
OntoChem IT Solutions GmbH
Blücherstraße 24
06120 Halle (Saale)
Germany

phone: +49 345 478 047 4| fax: +49 345 478 047 1
email: timo.boe...@ontochem.com | web: www.ontochem.com
HRB 21962 Amtsgericht Stendal   | USt-IdNr.: DE815563824
managing director : Lutz Weber





Re: [VOTE] Release Apache PDFBox JBIG2 ImageIO 3.0.2

2018-09-25 Thread Timo Boehme

Hi,

+1

Timo


Am 22.09.2018 um 17:54 schrieb Andreas Lehmkuehler:

Hi,

a candidate for the PDFBox JBIG2 ImageIO 3.0.2 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/jbig2-imageio-3.0.2/

The release candidate is a zip archive of the sources in:

     https://github.com/apache/pdfbox-jbig2/tree/jbig2-imageio-3.0.2/

The SHA-512 checksum of the archive is 
9a89ebefc13d23ec1b5787f836764b4d9f8793b08f4f5ff3c3fbb310b6b033dd880dac6f3830ab95e086c9efa07434a43fa0d30587b7cb4c1edb4a1ef017f5fe. 



Please vote on releasing this package as Apache PDFBox JBIG2 ImageIO 3.0.2.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox JBIG2 ImageIO 3.0.2
     [ ] -1 Do not release this package because...

Here is my +1

Andreas





--
Timo Boehme
OntoChem IT Solutions GmbH
Blücherstraße 24
06120 Halle (Saale)
Germany

phone: +49 345 478 047 4| fax: +49 345 478 047 1
email: timo.boe...@ontochem.com | web: www.ontochem.com
HRB 21962 Amtsgericht Stendal   | USt-IdNr.: DE815563824
managing director : Lutz Weber





[jira] [Comment Edited] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-21 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623723#comment-16623723
 ] 

Timo Boehme edited comment on PDFBOX-4309 at 9/21/18 2:43 PM:
--

In principle, the check only tests whether the sun class exists by trying to 
load it (no instance creation etc.). I don't see a major problem here: it is 
only done for Java <= 8, and there I don't know of any problem (there is no 
problem if the class is missing).

You cannot rely on sun.java2d.cmm: even if it is set, it is simply ignored when 
KCMS is not available (as in OpenJDK 7/8 for Linux). Thus, if a user applies 
this setting (as proposed on the PDFBox website) and we rely on it, we would 
end up using the wrong method (performance-wise).
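
A minimal sketch of such an existence check. ClassPresence and 
isClassAvailable are hypothetical names, not the code from the attached 
patches. Class.forName is called with initialize=false, so the class is only 
located and loaded; no static initializer runs and no instance is created.

```java
/** Hypothetical helper: tests whether a class can be loaded, without initializing it. */
final class ClassPresence {
    static boolean isClassAvailable(String className) {
        try {
            // initialize=false: load the class only, run no static initializers
            Class.forName(className, false, ClassPresence.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // a missing or unloadable class is not an error here, just "not available"
            return false;
        }
    }

    public static void main(String[] args) {
        // The KCMS provider name is the one mentioned in the issue description.
        System.out.println(isClassAvailable("sun.java2d.cmm.kcms.KcmsServiceProvider"));
        System.out.println(isClassAvailable("java.lang.String"));
    }
}
```

Unlike reading the sun.java2d.cmm system property, this reflects what the 
running JVM actually ships, regardless of what the user has configured.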


was (Author: tboehme):
In principle it is only checked if the sun class exists while trying to load it 
(no instance creation etc.). I don't see a major problem here - it only is done 
for Java <= 8 and here I don't know of any problem (there is no problem is the 
class is missing).

You cannot rely on sun.java2d.cmm as even if it is set and KCMS is not 
available (as in OpenJDK 7/8 for Linux) it is simply ignored - thus if a user 
uses this setting (as proposed on the PDFBox website) and we rely on this 
setting it will lead to using wrong (performance) method.

> Performance regression in PDColorSpace#toRGBImageAWT Part 2
> ---
>
> Key: PDFBOX-4309
> URL: https://issues.apache.org/jira/browse/PDFBOX-4309
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.11, 3.0.0 PDFBox
>Reporter: Timo Boehme
>Assignee: Timo Boehme
>Priority: Minor
>  Labels: optimization
> Attachments: ICCImplCheck.java, PDColorSpace.java.patch, 
> PDICCBased.java.patch
>
>
> This is a continuation of PDFBOX-3569. In a (private) PDF document there are 
> graphics produced by CorelDraw which are composed of more than 2500(!) 
> images, each with its own indexed color space based on an ICC color space 
> (the shadows of graphic objects are created by a large number of gray lines 
> ...). In our environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux 64Bit) 
> rendering a single page with one graphic takes 780 seconds. Most of the time 
> is spent in creating the indexed color space via ICC color space mapping:
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method)
>     at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156)
>     at 
> sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155)
>     - locked <0x000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform)
>     at 
> sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268)
>     at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355)
>     at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424)
>     at 
> org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046){noformat}
> Calling LittleCMS (LCMS) several thousand times is the problem here, taking 
> way too much time. Unfortunately using kcms via 
> {{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.Kcm

[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-21 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623723#comment-16623723
 ] 

Timo Boehme commented on PDFBOX-4309:
-

In principle, the check only tests whether the sun class exists by trying to 
load it (no instance creation etc.). I don't see a major problem here: it is 
only done for Java <= 8, and there I don't know of any problem (there is no 
problem if the class is missing).

You cannot rely on sun.java2d.cmm: even if it is set, it is simply ignored when 
KCMS is not available (as in OpenJDK 7/8 for Linux). Thus, if a user applies 
this setting (as proposed on the PDFBox website) and we rely on it, we would 
end up using the wrong method (performance-wise).

> Performance regression in PDColorSpace#toRGBImageAWT Part 2
> ---
>
> Key: PDFBOX-4309
> URL: https://issues.apache.org/jira/browse/PDFBOX-4309
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.11, 3.0.0 PDFBox
>Reporter: Timo Boehme
>Assignee: Timo Boehme
>Priority: Minor
>  Labels: optimization
> Attachments: ICCImplCheck.java, PDColorSpace.java.patch, 
> PDICCBased.java.patch
>
>
> This is a continuation of PDFBOX-3569. In a (private) PDF document there are 
> graphics produced by CorelDraw which are composed of more than 2500(!) 
> images, each with its own indexed color space based on an ICC color space 
> (the shadows of graphic objects are created by a large number of gray lines 
> ...). In our environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux 64Bit) 
> rendering a single page with one graphic takes 780 seconds. Most of the time 
> is spent in creating the indexed color space via ICC color space mapping:
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method)
>     at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156)
>     at 
> sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155)
>     - locked <0x000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform)
>     at 
> sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268)
>     at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355)
>     at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424)
>     at 
> org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046){noformat}
> Calling LittleCMS (LCMS) several thousand times is the problem here, taking 
> way too much time. Unfortunately using kcms via 
> {{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider}} is also not an 
> option, as the Suse IcedTea OpenJDK does not seem to include it (anymore?) 
> - in both Java 7 and Java 8.
> However the ICC color space (PDICCBased) returns in this case CMYK as 
> alternate color space and for CMYK we have the alternative rendering via 
> system property org.apache.pdfbox.rendering.UsePureJavaCMYKConversion from 
> PDFBOX-3569.
> The idea is now to have an option to force using the alternative color space 
> instead of the ICC one to circumvent using LCMS in toRGBImage(). For CMYK as 
> alternative color space it has to be combined with the syst

[jira] [Comment Edited] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-13 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613319#comment-16613319
 ] 

Timo Boehme edited comment on PDFBOX-4309 at 9/13/18 10:56 AM:
---

For toRGB one could possibly test for: Boolean.TRUE.equals(usesKCMS()) || 
Boolean.FALSE.equals(usesLCMS()) || JAVAVERSION>8 || JAVAVERSION<7
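
Spelled out as code, the proposed test could look like the sketch below. 
CmmCheck and its method names are hypothetical; the tri-state Boolean 
parameters (TRUE, FALSE, or null for "unknown") stand in for usesKCMS() and 
usesLCMS(), whose implementation is not shown here, and the comment above does 
not say which code path is taken when the test is true.

```java
/** Hypothetical helper that evaluates the proposed toRGB test. */
final class CmmCheck {
    /**
     * @param usesKcms TRUE/FALSE if known, null if it could not be determined
     * @param usesLcms TRUE/FALSE if known, null if it could not be determined
     * @param javaVersion major Java version, e.g. 7, 8 or 11
     */
    static boolean matchesProposedTest(Boolean usesKcms, Boolean usesLcms, int javaVersion) {
        // Boolean.TRUE.equals(x) is null-safe: an unknown value never matches.
        return Boolean.TRUE.equals(usesKcms)
                || Boolean.FALSE.equals(usesLcms)
                || javaVersion > 8
                || javaVersion < 7;
    }

    /** Parses values of the "java.specification.version" property such as "1.8" or "11". */
    static int majorJavaVersion(String specVersion) {
        String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int dot = v.indexOf('.');
        return Integer.parseInt(dot < 0 ? v : v.substring(0, dot));
    }

    public static void main(String[] args) {
        int version = majorJavaVersion(System.getProperty("java.specification.version"));
        // Both CMM states are passed as unknown here, so only the version decides.
        System.out.println(matchesProposedTest(null, null, version));
    }
}
```

With both states unknown the test is true only outside Java 7 and 8, which 
matches the intent of the added JAVAVERSION<7 clause.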


was (Author: tboehme):
For toRGB one could possibly test for: Boolean.TRUE.equals(usesKCMS()) || 
Boolean.FALSE.equals(usesLCMS()) || JAVAVERSION>8

> Performance regression in PDColorSpace#toRGBImageAWT Part 2
> ---
>
> Key: PDFBOX-4309
> URL: https://issues.apache.org/jira/browse/PDFBOX-4309
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.11, 3.0.0 PDFBox
>Reporter: Timo Boehme
>Assignee: Timo Boehme
>Priority: Minor
>  Labels: optimization
> Attachments: ICCImplCheck.java, PDColorSpace.java.patch, 
> PDICCBased.java.patch
>
>
> This is a continuation of PDFBOX-3569. In a (private) PDF document there are 
> graphics produced by CorelDraw which are composed of more than 2500(!) 
> images, each with its own indexed color space based on an ICC color space 
> (the shadows of graphic objects are created by a large number of gray lines 
> ...). In our environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux 64Bit) 
> rendering a single page with one graphic takes 780 seconds. Most of the time 
> is spent in creating the indexed color space via ICC color space mapping:
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method)
>     at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156)
>     at sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155)
>     - locked <0x000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform)
>     at sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268)
>     at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355)
>     at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672)
>     at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196)
>     at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443)
>     at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424)
>     at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046)
> {noformat}
> The problem is that LittleCMS (LCMS) is called several thousand times, which 
> takes far too much time. Unfortunately, switching to KCMS via 
> {{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider}} is not an 
> option either, as the Suse IcedTea OpenJDK seems not to include it (anymore?) 
> in either Java 7 or Java 8.
> However, in this case the ICC color space (PDICCBased) returns CMYK as its 
> alternate color space, and for CMYK we have the alternative rendering via the 
> system property org.apache.pdfbox.rendering.UsePureJavaCMYKConversion from 
> PDFBOX-3569.
> The idea now is to add an option that forces use of the alternate color space 
> instead of the ICC one, circumventing LCMS in toRGBImage(). When the alternate 
> color space is CMYK, the option has to be combined with the system property 
> 'UsePureJavaCMYKConversion'.
> With this approach, the rendering time of the page with the problematic 
> graphic drops from 780 seconds to 1 second!
> It is clear that using

[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-13 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613319#comment-16613319
 ] 

Timo Boehme commented on PDFBOX-4309:
-

For toRGB one could possibly test for: Boolean.TRUE.equals(usesKCMS()) || 
Boolean.FALSE.equals(usesLCMS()) || JAVAVERSION>8

> Performance regression in PDColorSpace#toRGBImageAWT Part 2
> ---
>
>                 Key: PDFBOX-4309
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4309
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Rendering
>    Affects Versions: 2.0.11, 3.0.0 PDFBox
>            Reporter: Timo Boehme
>            Assignee: Timo Boehme
>            Priority: Minor
>              Labels: optimization
>         Attachments: ICCImplCheck.java, PDColorSpace.java.patch, PDICCBased.java.patch
>
>
> This is a continuation of PDFBOX-3569. A (private) PDF document contains 
> graphics produced by CorelDraw that are composed of more than 2500(!) 
> images, each with its own indexed color space based on an ICC color space 
> (the shadows of graphic objects are created by a large number of gray lines 
> ...). In our environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux 64-bit), 
> rendering a single page with one graphic takes 780 seconds. Most of the time is 
> spent creating the indexed color space via ICC color space mapping:
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method)
>     at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156)
>     at sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155)
>     - locked <0x000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform)
>     at sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268)
>     at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355)
>     at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672)
>     at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196)
>     at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443)
>     at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424)
>     at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046)
> {noformat}
> The problem is that LittleCMS (LCMS) is called several thousand times, which 
> takes far too much time. Unfortunately, switching to KCMS via 
> {{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider}} is not an 
> option either, as the Suse IcedTea OpenJDK seems not to include it (anymore?) 
> in either Java 7 or Java 8.
> However, in this case the ICC color space (PDICCBased) returns CMYK as its 
> alternate color space, and for CMYK we have the alternative rendering via the 
> system property org.apache.pdfbox.rendering.UsePureJavaCMYKConversion from 
> PDFBOX-3569.
> The idea now is to add an option that forces use of the alternate color space 
> instead of the ICC one, circumventing LCMS in toRGBImage(). When the alternate 
> color space is CMYK, the option has to be combined with the system property 
> 'UsePureJavaCMYKConversion'.
> With this approach, the rendering time of the page with the problematic 
> graphic drops from 780 seconds to 1 second!
> Clearly, using the alternate color space may yield wrong or inexact colors, 
> so this mode should only be an opt-in option. However, 
> for processing large collections of PDF documents (e.g. focusing on text) or 

[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-13 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613316#comment-16613316
 ] 

Timo Boehme commented on PDFBOX-4309:
-

It seems that the XcmsServiceProvider classes were only introduced once it 
became possible to choose the CMS (first in Java 8), so the hasXCMS() methods 
should probably try to load the classes sun.java2d.cmm.lcms.LCMS and 
sun.java2d.cmm.kcms.CMM instead, which also works in Java 7. Additionally, the 
getCMSClassname() test could also be applied in usesKCMS() (with 
contains(".kcms.")) to get it right. I've updated the code accordingly.

At least for JDK 6 and below this will fail, but KCMS seems to be the only 
option in those versions anyway.
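
A minimal sketch of such a class-loading probe, assuming one generic helper instead of one method per CMS. The names ClassProbeSketch and isLoadable are hypothetical. Class.forName is called with initialize=false so that the probed class's static initializers do not run, and LinkageError is caught as well as ClassNotFoundException:

```java
/** Hypothetical helper for illustration; not part of PDFBox. */
final class ClassProbeSketch {

    /** Returns true if the named class can be located, without initializing it. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClassProbeSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // class names as mentioned in the comment above
        System.out.println("LCMS: " + isLoadable("sun.java2d.cmm.lcms.LCMS"));
        System.out.println("KCMS: " + isLoadable("sun.java2d.cmm.kcms.CMM"));
    }
}
```

Whether the two CMS classes are reported as present depends on the JDK in use, so no particular output is implied here.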


[jira] [Comment Edited] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-13 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613231#comment-16613231
 ] 

Timo Boehme edited comment on PDFBOX-4309 at 9/13/18 10:49 AM:
---

Ok, I've updated the detection as follows
{code:java}
import java.lang.reflect.Method;

public class CMSImplTest {

  /** Returns true if the LCMS implementation class can be loaded. */
  public static boolean hasLCMS() {
    try {
      CMSImplTest.class.getClassLoader().loadClass( "sun.java2d.cmm.lcms.LCMS" );
      return true;
    } catch ( Exception e ) {
      return false;
    }
  }

  /** Returns true if the KCMS implementation class can be loaded. */
  public static boolean hasKCMS() {
    try {
      CMSImplTest.class.getClassLoader().loadClass( "sun.java2d.cmm.kcms.CMM" );
      return true;
    } catch ( Exception e ) {
      return false;
    }
  }

  /** Returns the class name of the active CMS module, or null if it cannot be determined. */
  public static String getCMSClassname() {

    // reflection on CMSManager is only attempted for Java <= 8 (version strings "1.x")
    String javaVersStr = System.getProperty("java.specification.version");
    if ( (javaVersStr == null) || ( ! javaVersStr.startsWith("1.") ) ) {
      return null;
    }

    Class<?> cmsMgrClass = null;
    try {
      cmsMgrClass = CMSImplTest.class.getClassLoader().loadClass( "sun.java2d.cmm.CMSManager" );
    } catch ( Exception e ) {
      System.err.println( "Unable to load CMSManager." );
      return null;
    }

    Method cmsMgrMethod = null;
    try {
      cmsMgrMethod = cmsMgrClass.getMethod( "getModule" );
    } catch ( Exception e ) {
      System.err.println( "Unable to get 'getModule' method from CMSManager." );
      return null;
    }

    Object cmsModuleClass = null;
    try {
      // static method, hence no target instance
      cmsModuleClass = cmsMgrMethod.invoke( null );
    } catch ( Exception e ) {
      System.err.println( "Unable to run 'getModule' method from CMSManager." );
      return null;
    }

    return cmsModuleClass != null ? cmsModuleClass.getClass().getName() : null;
  }

  /** Returns TRUE/FALSE if known, null if unknown. */
  public static Boolean usesLCMS() {
    // first try to get CMS class (works for Java 7,8)
    String cmsModuleClass = getCMSClassname();
    if ( cmsModuleClass != null ) {
      return cmsModuleClass.contains(".lcms.");
    }

    return Boolean.TRUE.equals( usesKCMS() ) ? Boolean.FALSE :
           Boolean.TRUE.equals( hasLCMS() ) ? Boolean.TRUE :
           null;
  }

  /** Returns TRUE/FALSE if known, null if unknown. */
  public static Boolean usesKCMS() {
    // first try to get CMS class (works for Java 7,8)
    String cmsModuleClass = getCMSClassname();
    if ( cmsModuleClass != null ) {
      return cmsModuleClass.contains(".kcms.");
    }

    if ( hasKCMS() &&
         "sun.java2d.cmm.kcms.KcmsServiceProvider".equals(System.getProperty("sun.java2d.cmm")) ) {
      return true;
    }
    return null;
  }

  public static void main(String[] args) {

    System.out.println( "Has KCMS: " + hasKCMS() );
    System.out.println( "Has LCMS: " + hasLCMS() );

    System.out.println( "Used CMS class:" + getCMSClassname() );

    System.out.println( "Uses KCMS:" + usesKCMS() );
    System.out.println( "Uses LCMS:" + usesLCMS() );
  }
}
{code}
Now it only uses class loading to check for the existence of the CMS variants, 
and reflection is restricted to Java <= 8. For usesLCMS() and usesKCMS() I used 
three-valued logic: null means unknown. Maybe the check is only needed for Java 
<= 8 anyway, as in later versions even LCMS is reasonably fast, which means that 
if usesLCMS() does not return TRUE we may assume KCMS.
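
For callers, the main pitfall of a nullable Boolean is accidental unboxing of null. One possible way to make the three states explicit is sketched below. The enum and helper names are hypothetical and only illustrate the idea; they are not in the attached code.

```java
/** Hypothetical illustration of three-valued results; not part of PDFBox. */
final class TriStateSketch {

    enum Answer { YES, NO, UNKNOWN }

    /** Converts a nullable Boolean (null = unknown) into an explicit enum value. */
    static Answer of(Boolean value) {
        if (value == null) {
            return Answer.UNKNOWN;
        }
        return value ? Answer.YES : Answer.NO;
    }

    /** Classifies a CMS module class name; a null name yields null (unknown). */
    static Boolean usesLcms(String cmsClassName) {
        // both branches are reference-typed, so no unboxing of null can happen here
        return cmsClassName == null ? null : Boolean.valueOf(cmsClassName.contains(".lcms."));
    }

    public static void main(String[] args) {
        System.out.println(of(usesLcms("sun.java2d.cmm.lcms.LCMS")));
        System.out.println(of(usesLcms(null)));
    }
}
```

Code that keeps the plain Boolean should compare with Boolean.TRUE.equals(...) or Boolean.FALSE.equals(...) rather than writing if (usesLCMS()), which would throw a NullPointerException for the unknown case.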



[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-13 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613231#comment-16613231
 ] 

Timo Boehme commented on PDFBOX-4309:
-

Ok, I've updated the detection as follows
{code:java}
import java.lang.reflect.Method;

public class CMSImplTest {

  /** Returns true if the LCMS service provider class can be loaded. */
  public static boolean hasLCMS() {
    try {
      CMSImplTest.class.getClassLoader().loadClass( "sun.java2d.cmm.lcms.LcmsServiceProvider" );
      return true;
    } catch ( Exception e ) {
      return false;
    }
  }

  /** Returns true if the KCMS service provider class can be loaded. */
  public static boolean hasKCMS() {
    try {
      CMSImplTest.class.getClassLoader().loadClass( "sun.java2d.cmm.kcms.KcmsServiceProvider" );
      return true;
    } catch ( Exception e ) {
      return false;
    }
  }

  /** Returns the class name of the active CMS module, or null if it cannot be determined. */
  public static String getCMSClassname() {

    // reflection on CMSManager is only attempted for Java <= 8 (version strings "1.x")
    String javaVersStr = System.getProperty("java.specification.version");
    if ( (javaVersStr == null) || ( ! javaVersStr.startsWith("1.") ) ) {
      return null;
    }

    Class<?> cmsMgrClass = null;
    try {
      cmsMgrClass = CMSImplTest.class.getClassLoader().loadClass( "sun.java2d.cmm.CMSManager" );
    } catch ( Exception e ) {
      System.err.println( "Unable to load CMSManager." );
      return null;
    }

    Method cmsMgrMethod = null;
    try {
      cmsMgrMethod = cmsMgrClass.getMethod( "getModule" );
    } catch ( Exception e ) {
      System.err.println( "Unable to get 'getModule' method from CMSManager." );
      return null;
    }

    Object cmsModuleClass = null;
    try {
      // static method, hence no target instance
      cmsModuleClass = cmsMgrMethod.invoke( null );
    } catch ( Exception e ) {
      System.err.println( "Unable to run 'getModule' method from CMSManager." );
      return null;
    }

    return cmsModuleClass != null ? cmsModuleClass.getClass().getName() : null;
  }

  /** Returns TRUE/FALSE if known, null if unknown. */
  public static Boolean usesLCMS() {
    // first try to get CMS class (works for Java 7,8)
    String cmsModuleClass = getCMSClassname();
    if ( cmsModuleClass != null ) {
      return cmsModuleClass.contains(".lcms.");
    }

    // Boolean constants instead of primitive true/false: mixing primitives with
    // null in a conditional expression unboxes the result and can throw a
    // NullPointerException
    return Boolean.TRUE.equals( usesKCMS() ) ? Boolean.FALSE :
           hasLCMS() ? Boolean.TRUE :
           null;
  }

  /** Returns TRUE if KCMS is known to be selected, null if unknown. */
  public static Boolean usesKCMS() {
    if ( hasKCMS() &&
         "sun.java2d.cmm.kcms.KcmsServiceProvider".equals(System.getProperty("sun.java2d.cmm")) ) {
      return true;
    }
    return null;
  }

  public static void main(String[] args) {

    System.out.println( "Has KCMS: " + hasKCMS() );
    System.out.println( "Has LCMS: " + hasLCMS() );

    System.out.println( "Used CMS class:" + getCMSClassname() );

    System.out.println( "Uses KCMS:" + usesKCMS() );
    System.out.println( "Uses LCMS:" + usesLCMS() );
  }
}
{code}
Now it only uses class loading to check for the existence of the CMS variants, 
and reflection is restricted to Java <= 8. For usesLCMS() and usesKCMS() I used 
three-valued logic: null means unknown. Maybe the check is only needed for Java 
<= 8 anyway, as in later versions even LCMS is reasonably fast, which means that 
if usesLCMS() does not return TRUE we may assume KCMS.


[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-12 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611909#comment-16611909
 ] 

Timo Boehme commented on PDFBOX-4309:
-

If getCMSClassname() in the class above returns null and the Java version is 
before 1.7, one may assume that a non-LCMS manager is used (e.g. Java 1.6 on 
Linux has sun.awt.color.CMM and no CMSManager). For Java 1.7 and 1.8, if LCMS is 
available the CMSManager should exist as well and we would get a non-null result. 
For versions above 1.8, if we get null it should be safe to assume that KCMS is 
not used.
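
One way to read these rules as code is sketched below, under stated assumptions: the helper name assumeLcms is hypothetical, the Java version is passed as a major number (6 for 1.6, 8 for 1.8, 9 and up for later releases), and "no KCMS" above 1.8 is interpreted as "LCMS". For 1.7 and 1.8 a null result is taken to mean that LCMS is not available, which is an inference from the comment rather than something it states outright.

```java
/** Hypothetical decision helper for illustration; not part of PDFBox. */
final class CmsFallbackSketch {

    /**
     * @param cmsClassName result of a getCMSClassname()-style lookup, may be null
     * @param javaMajor    6 for Java 1.6, 7 for 1.7, 8 for 1.8, 9+ for later versions
     * @return true if LCMS is assumed to be the active CMS
     */
    static boolean assumeLcms(String cmsClassName, int javaMajor) {
        if (cmsClassName != null) {
            // the module class name is known, so decide by its package
            return cmsClassName.contains(".lcms.");
        }
        if (javaMajor > 8) {
            // above 1.8: assume KCMS is not used, i.e. LCMS (interpretation)
            return true;
        }
        // before 1.7: a non-LCMS manager is assumed;
        // 1.7 and 1.8: a null result suggests LCMS is not available (inference)
        return false;
    }

    public static void main(String[] args) {
        System.out.println(assumeLcms(null, 11));
        System.out.println(assumeLcms("sun.java2d.cmm.kcms.CMM", 7));
    }
}
```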


[jira] [Comment Edited] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-12 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611698#comment-16611698
 ] 

Timo Boehme edited comment on PDFBOX-4309 at 9/12/18 7:27 AM:
--

How about this to get the CMS class:
{code:java}
import java.lang.reflect.Method;

public class CMSImplTest {

  public static String getCMSClassname() {

    Class<?> cmsMgrClass = null;
    try {
      cmsMgrClass = CMSImplTest.class.getClassLoader().loadClass( "sun.java2d.cmm.CMSManager" );
    } catch ( Exception e ) {
      System.err.println( "Unable to load CMSManager." );
      return null;
    }

    Method cmsMgrMethod = null;
    try {
      cmsMgrMethod = cmsMgrClass.getMethod( "getModule" );
    } catch ( Exception e ) {
      System.err.println( "Unable to get 'getModule' method from CMSManager." );
      return null;
    }

    Object cmsModuleClass = null;
    try {
      cmsModuleClass = cmsMgrMethod.invoke( null );
    } catch ( Exception e ) {
      System.err.println( "Unable to run 'getModule' method from CMSManager." );
      return null;
    }

    return cmsModuleClass != null ? cmsModuleClass.getClass().getName() : null;
  }

  public static void main(String[] args) {
    System.out.println( getCMSClassname() );
  }
}
{code}




[jira] [Comment Edited] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-12 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611698#comment-16611698
 ] 

Timo Boehme edited comment on PDFBOX-4309 at 9/12/18 7:25 AM:
--

How about this to get the CMS class:
{code:java}
import java.lang.reflect.Method;

public class CMSImplTest {

  public static String getCMSClassname() {

    Class<?> cmsMgrClass = null;
    try {
      cmsMgrClass = CMSImplTest.class.getClassLoader().loadClass( "sun.java2d.cmm.CMSManager" );
    } catch ( Exception e ) {
      System.err.println( "Unable to load CMSManager." );
      return null;
    }

    Method cmsMgrMethod = null;
    try {
      cmsMgrMethod = cmsMgrClass.getMethod( "getModule" );
    } catch ( Exception e ) {
      System.err.println( "Unable to get 'getModule' method from CMSManager." );
      return null;
    }

    Object cmsModuleClass = null;
    try {
      cmsModuleClass = cmsMgrMethod.invoke( null );
    } catch ( Exception e ) {
      System.err.println( "Unable to run 'getModule' method from CMSManager." );
      return null;
    }

    return cmsModuleClass.getClass().getName();
  }

  public static void main(String[] args) {
    System.out.println( getCMSClassname() );
  }
}
{code}


was (Author: tboehme):
How about this to get the CMS class:
{code:java}
public class CMSImplTest {

  public static String getCMSClassname() {

    Class cmsMgrClass = null;
    try {
      cmsMgrClass = CMSImplTest.class.getClassLoader().loadClass( "sun.java2d.cmm.CMSManager" );
    } catch ( Exception e ) {
      System.err.println( "Unable to load CMSManager." );
      return null;
    }

    Method cmsMgrMethod = null;
    try {
      cmsMgrMethod = cmsMgrClass.getMethod( "getModule" );
    } catch ( Exception e ) {
      System.err.println( "Unable to get 'getModule' method from CMSManager." );
      return null;
    }

    Object cmsModuleClass = null;
    try {
      cmsModuleClass = cmsMgrMethod.invoke( null );
    } catch ( Exception e ) {
      System.err.println( "Unable to run 'getModule' method from CMSManager." );
      return null;
    }

    return cmsModuleClass.getClass().getName();
  }

  public static void main(String[] args) {
    System.out.println( getCMSClassname() );
  }
}
{code}

> Performance regression in PDColorSpace#toRGBImageAWT Part 2
> ---
>
> Key: PDFBOX-4309
> URL: https://issues.apache.org/jira/browse/PDFBOX-4309
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.11, 3.0.0 PDFBox
>Reporter: Timo Boehme
>Assignee: Timo Boehme
>Priority: Minor
>  Labels: optimization
> Attachments: ICCImplCheck.java, PDColorSpace.java.patch, 
> PDICCBased.java.patch
>
>
> This is a continuation of PDFBOX-3569. In a (private) PDF document there are 
> graphics produced by CorelDraw which are combined by more than 2500(!) 
> images, each with its own indexed color space based on an ICC color space 
> (the shadows of graphic objects are created by large number of gray lines 
> ...). In our environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux 64Bit) 
> rendering a single page with one graphic takes 780 seconds. The most time is 
> spent in creating the indexed color space via ICC color space mapping:
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method)
>     at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156)
>     at 
> sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155)
>     - locked <0x000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform)
>     at 
> sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268)
>     at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355)
>     at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDCol

[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-12 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611698#comment-16611698
 ] 

Timo Boehme commented on PDFBOX-4309:
-

How about this to get the CMS class:
{code:java}
import java.lang.reflect.Method;

public class CMSImplTest {

  /** Returns the class name of the active color management module, or null on failure. */
  public static String getCMSClassname() {

    // sun.java2d.cmm.CMSManager is JDK-internal, so it is loaded reflectively.
    Class<?> cmsMgrClass;
    try {
      cmsMgrClass = CMSImplTest.class.getClassLoader().loadClass( "sun.java2d.cmm.CMSManager" );
    } catch ( Exception e ) {
      System.err.println( "Unable to load CMSManager." );
      return null;
    }

    Method cmsMgrMethod;
    try {
      cmsMgrMethod = cmsMgrClass.getMethod( "getModule" );
    } catch ( Exception e ) {
      System.err.println( "Unable to get 'getModule' method from CMSManager." );
      return null;
    }

    // getModule() is static, hence the null receiver.
    Object cmsModule;
    try {
      cmsModule = cmsMgrMethod.invoke( null );
    } catch ( Exception e ) {
      System.err.println( "Unable to run 'getModule' method from CMSManager." );
      return null;
    }

    return cmsModule.getClass().getName();
  }

  public static void main(String[] args) {
    System.out.println( getCMSClassname() );
  }
}
{code}

> Performance regression in PDColorSpace#toRGBImageAWT Part 2
> ---
>
> Key: PDFBOX-4309
> URL: https://issues.apache.org/jira/browse/PDFBOX-4309
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.11, 3.0.0 PDFBox
>Reporter: Timo Boehme
>Assignee: Timo Boehme
>Priority: Minor
>  Labels: optimization
> Attachments: ICCImplCheck.java, PDColorSpace.java.patch, 
> PDICCBased.java.patch
>
>
> This is a continuation of PDFBOX-3569. In a (private) PDF document there are 
> graphics produced by CorelDraw which are combined by more than 2500(!) 
> images, each with its own indexed color space based on an ICC color space 
> (the shadows of graphic objects are created by large number of gray lines 
> ...). In our environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux 64Bit) 
> rendering a single page with one graphic takes 780 seconds. The most time is 
> spent in creating the indexed color space via ICC color space mapping:
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method)
>     at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156)
>     at 
> sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155)
>     - locked <0x000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform)
>     at 
> sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268)
>     at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355)
>     at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424)
>     at 
> org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046){noformat}
> Calling LittleCMS (LCMS) several thousand times is the problem here, taking 
> way too much time. Unfortunately, using KCMS via 
> {{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider}} is also not an 
> option, as the Suse IcedTea OpenJDK seems not to include it (anymore?) 

[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-11 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611328#comment-16611328
 ] 

Timo Boehme commented on PDFBOX-4309:
-

Maybe the result of sun.java2d.cmm.CMSManager.getModule() should be checked to 
be sun.java2d.cmm.kcms.CMM (not quite sure if this is the correct class name); 
simply checking for '.kcms.' within the qualified class name should be enough.
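
A minimal sketch of the substring check suggested above. The class and method names (CmsNameCheck, looksLikeKcms) are hypothetical and only for illustration; the input would be the qualified class name of whatever CMSManager.getModule() returns.

```java
public class CmsNameCheck {

    /**
     * Hypothetical helper: returns true if the qualified class name
     * contains the package segment ".kcms.".
     */
    public static boolean looksLikeKcms(String qualifiedClassName) {
        return qualifiedClassName != null && qualifiedClassName.contains(".kcms.");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeKcms("sun.java2d.cmm.kcms.CMM"));  // true
        System.out.println(looksLikeKcms("sun.java2d.cmm.lcms.LCMS")); // false
        System.out.println(looksLikeKcms(null));                       // false
    }
}
```

Matching on the package segment avoids depending on the exact implementation class, which the comment above is unsure about.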

 

> Performance regression in PDColorSpace#toRGBImageAWT Part 2
> ---
>
> Key: PDFBOX-4309
> URL: https://issues.apache.org/jira/browse/PDFBOX-4309
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.11, 3.0.0 PDFBox
>    Reporter: Timo Boehme
>Assignee: Timo Boehme
>Priority: Minor
>  Labels: optimization
> Attachments: ICCImplCheck.java, PDColorSpace.java.patch, 
> PDICCBased.java.patch
>
>
> This is a continuation of PDFBOX-3569. In a (private) PDF document there are 
> graphics produced by CorelDraw which are combined by more than 2500(!) 
> images, each with its own indexed color space based on an ICC color space 
> (the shadows of graphic objects are created by large number of gray lines 
> ...). In our environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux 64Bit) 
> rendering a single page with one graphic takes 780 seconds. The most time is 
> spent in creating the indexed color space via ICC color space mapping:
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method)
>     at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156)
>     at 
> sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155)
>     - locked <0x000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform)
>     at 
> sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268)
>     at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355)
>     at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424)
>     at 
> org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046){noformat}
> Calling LittleCMS (LCMS) several thousand times is the problem here, taking 
> way too much time. Unfortunately, using KCMS via 
> {{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider}} is also not an 
> option, as the Suse IcedTea OpenJDK seems not to include it (anymore?) 
> - in both Java 7 and Java 8.
> However the ICC color space (PDICCBased) returns in this case CMYK as 
> alternate color space and for CMYK we have the alternative rendering via 
> system property org.apache.pdfbox.rendering.UsePureJavaCMYKConversion from 
> PDFBOX-3569.
> The idea is now to have an option to force using the alternative color space 
> instead of the ICC one to circumvent using LCMS in toRGBImage(). For CMYK as 
> alternative color space it has to be combined with the system property 
> 'UsePureJavaCMYKConversion'.
> Using this approach the rendering time of the page with the problematic 
> graphic drops from 780 seconds to 1 second!
> It is clear that using the alternate color space might return wrong/not exact 
> colors. Therefore it should be only an option to enable thi

[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-11 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611300#comment-16611300
 ] 

Timo Boehme commented on PDFBOX-4309:
-

I doubt that this test for KCMS will work. If sun.java2d.cmm is not set, it 
returns null. If it is set and KCMS is not available, the setting is ignored. 
Even with OpenJDK 1.7 on Linux (newer versions) no KCMS is available, nor in 
OpenJDK 1.8 - I tried to load sun.java2d.cmm.kcms.KcmsServiceProvider, but the 
class was not found (also checking the rt.jar) - while in Java for Windows 
it is still included (1.8.0_192ea).


Thus the only reliable check will be loading the class 
sun.java2d.cmm.kcms.KcmsServiceProvider (e.g. via Class.forName()). Only if this 
does not fail is the system property check sensible.
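
A minimal sketch of the check order described above: first try to load the provider class, and only then look at the system property. The class name KcmsAvailability and its methods are hypothetical; comparing the property value with equals() is an assumption about how the property check would be done.

```java
public class KcmsAvailability {

    // Provider class name and property name as given in the comment above.
    static final String KCMS_PROVIDER = "sun.java2d.cmm.kcms.KcmsServiceProvider";
    static final String CMM_PROPERTY = "sun.java2d.cmm";

    /** Returns true if the named class can be loaded (without initializing it). */
    public static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, KcmsAvailability.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /**
     * KCMS is treated as active only if the provider class loads
     * AND the system property selects it (assumed exact match).
     */
    public static boolean isKcmsRequestedAndAvailable() {
        if (!isClassAvailable(KCMS_PROVIDER)) {
            return false; // property would be ignored by the JRE anyway
        }
        return KCMS_PROVIDER.equals(System.getProperty(CMM_PROPERTY));
    }

    public static void main(String[] args) {
        System.out.println("KCMS provider loadable: " + isClassAvailable(KCMS_PROVIDER));
        System.out.println("KCMS requested and available: " + isKcmsRequestedAndAvailable());
    }
}
```

Loading with initialize=false keeps the probe free of side effects; the result depends on the JRE in use.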

> Performance regression in PDColorSpace#toRGBImageAWT Part 2
> ---
>
> Key: PDFBOX-4309
> URL: https://issues.apache.org/jira/browse/PDFBOX-4309
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.11, 3.0.0 PDFBox
>    Reporter: Timo Boehme
>Assignee: Timo Boehme
>Priority: Minor
>  Labels: optimization
> Attachments: ICCImplCheck.java, PDColorSpace.java.patch, 
> PDICCBased.java.patch
>
>
> This is a continuation of PDFBOX-3569. In a (private) PDF document there are 
> graphics produced by CorelDraw which are combined by more than 2500(!) 
> images, each with its own indexed color space based on an ICC color space 
> (the shadows of graphic objects are created by large number of gray lines 
> ...). In our environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux 64Bit) 
> rendering a single page with one graphic takes 780 seconds. The most time is 
> spent in creating the indexed color space via ICC color space mapping:
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method)
>     at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156)
>     at 
> sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155)
>     - locked <0x000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform)
>     at 
> sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268)
>     at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355)
>     at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424)
>     at 
> org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046){noformat}
> Calling LittleCMS (LCMS) several thousand times is the problem here, taking 
> way too much time. Unfortunately, using KCMS via 
> {{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider}} is also not an 
> option, as the Suse IcedTea OpenJDK seems not to include it (anymore?) 
> - in both Java 7 and Java 8.
> However the ICC color space (PDICCBased) returns in this case CMYK as 
> alternate color space and for CMYK we have the alternative rendering via 
> system property org.apache.pdfbox.rendering.UsePureJavaCMYKConversion from 
> PDFBOX-3569.
> The idea is now to have an option to force using the alternative color space 
> instead of the ICC one to circumvent us

[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-11 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610975#comment-16610975
 ] 

Timo Boehme commented on PDFBOX-4309:
-

For completeness I've also checked Linux versions: Suse with OpenJDK 1.8.0_171 
is slow (400 ms) while Ubuntu with 1.8.0_181 is fast (8 ms). Thus the startup 
optimization seems to be in one of the latest releases.

Interesting to see that even with the newest Java 8 release there is a 
considerable performance increase when using the performance fix. At least 
using LCMS is now possible again - the previous behavior led to unacceptable 
processing times.

With 'change you proposed this morning' do you mean the patch against 
PDColorSpace? I thought that this could maybe be skipped with the now committed 
patch against PDICCBased (also see [#comment-16610539]) - or did you mean this 
last one?

> Performance regression in PDColorSpace#toRGBImageAWT Part 2
> ---
>
> Key: PDFBOX-4309
> URL: https://issues.apache.org/jira/browse/PDFBOX-4309
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.11, 3.0.0 PDFBox
>    Reporter: Timo Boehme
>Assignee: Timo Boehme
>Priority: Minor
>  Labels: optimization
> Attachments: ICCImplCheck.java, PDColorSpace.java.patch, 
> PDICCBased.java.patch
>
>
> This is a continuation of PDFBOX-3569. In a (private) PDF document there are 
> graphics produced by CorelDraw which are combined by more than 2500(!) 
> images, each with its own indexed color space based on an ICC color space 
> (the shadows of graphic objects are created by large number of gray lines 
> ...). In our environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux 64Bit) 
> rendering a single page with one graphic takes 780 seconds. The most time is 
> spent in creating the indexed color space via ICC color space mapping:
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method)
>     at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156)
>     at 
> sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155)
>     - locked <0x000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform)
>     at 
> sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268)
>     at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355)
>     at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424)
>     at 
> org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046){noformat}
> Calling LittleCMS (LCMS) several thousand times is the problem here, taking 
> way too much time. Unfortunately, using KCMS via 
> {{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider}} is also not an 
> option, as the Suse IcedTea OpenJDK seems not to include it (anymore?) 
> - in both Java 7 and Java 8.
> However the ICC color space (PDICCBased) returns in this case CMYK as 
> alternate color space and for CMYK we have the alternative rendering via 
> system property org.apache.pdfbox.rendering.UsePureJavaCMYKConversion from 
> PDFBOX-3569.
> The idea is now to have an option to force using the alternative color sp

[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-11 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610884#comment-16610884
 ] 

Timo Boehme commented on PDFBOX-4309:
-

So the good news is that with the most current versions the fix may not be 
needed anymore. However for production use with restrictions on Java updates we 
should have the property available.

> Performance regression in PDColorSpace#toRGBImageAWT Part 2
> ---
>
> Key: PDFBOX-4309
> URL: https://issues.apache.org/jira/browse/PDFBOX-4309
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.11, 3.0.0 PDFBox
>    Reporter: Timo Boehme
>Assignee: Timo Boehme
>Priority: Minor
>  Labels: optimization
> Attachments: ICCImplCheck.java, PDColorSpace.java.patch, 
> PDICCBased.java.patch
>
>
> This is a continuation of PDFBOX-3569. In a (private) PDF document there are 
> graphics produced by CorelDraw which are combined by more than 2500(!) 
> images, each with its own indexed color space based on an ICC color space 
> (the shadows of graphic objects are created by large number of gray lines 
> ...). In our environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux 64Bit) 
> rendering a single page with one graphic takes 780 seconds. The most time is 
> spent in creating the indexed color space via ICC color space mapping:
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method)
>     at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156)
>     at 
> sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155)
>     - locked <0x000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform)
>     at 
> sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268)
>     at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355)
>     at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424)
>     at 
> org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046){noformat}
> Calling LittleCMS (LCMS) several thousand times is the problem here, taking 
> way too much time. Unfortunately, using KCMS via 
> {{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider}} is also not an 
> option, as the Suse IcedTea OpenJDK seems not to include it (anymore?) 
> - in both Java 7 and Java 8.
> However the ICC color space (PDICCBased) returns in this case CMYK as 
> alternate color space and for CMYK we have the alternative rendering via 
> system property org.apache.pdfbox.rendering.UsePureJavaCMYKConversion from 
> PDFBOX-3569.
> The idea is now to have an option to force using the alternative color space 
> instead of the ICC one to circumvent using LCMS in toRGBImage(). For CMYK as 
> alternative color space it has to be combined with the system property 
> 'UsePureJavaCMYKConversion'.
> Using this approach the rendering time of the page with the problematic 
> graphic drops from 780 seconds to 1 second!
> It is clear that using the alternate color space might return wrong/not exact 
> colors. Therefore it should be only an option to enable this mode. However 
> for processing la

[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-11 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610879#comment-16610879
 ] 

Timo Boehme commented on PDFBOX-4309:
-

I've checked the current JRE on Windows (1.8.0_181) and yes, it has 
considerably improved. My test program now only takes approx. 10 ms for the 
first ICC access.

> Performance regression in PDColorSpace#toRGBImageAWT Part 2
> ---
>
> Key: PDFBOX-4309
> URL: https://issues.apache.org/jira/browse/PDFBOX-4309
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.11, 3.0.0 PDFBox
>    Reporter: Timo Boehme
>Assignee: Timo Boehme
>Priority: Minor
>  Labels: optimization
> Attachments: ICCImplCheck.java, PDColorSpace.java.patch, 
> PDICCBased.java.patch
>
>
> This is a continuation of PDFBOX-3569. In a (private) PDF document there are 
> graphics produced by CorelDraw which are combined by more than 2500(!) 
> images, each with its own indexed color space based on an ICC color space 
> (the shadows of graphic objects are created by large number of gray lines 
> ...). In our environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux 64Bit) 
> rendering a single page with one graphic takes 780 seconds. The most time is 
> spent in creating the indexed color space via ICC color space mapping:
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method)
>     at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156)
>     at 
> sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155)
>     - locked <0x000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform)
>     at 
> sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268)
>     at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355)
>     at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92)
>     at 
> org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443)
>     at 
> org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424)
>     at 
> org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046){noformat}
> Calling LittleCMS (LCMS) several thousand times is the problem here, taking 
> way too much time. Unfortunately, using KCMS via 
> {{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider}} is also not an 
> option, as the Suse IcedTea OpenJDK seems not to include it (anymore?) 
> - in both Java 7 and Java 8.
> However the ICC color space (PDICCBased) returns in this case CMYK as 
> alternate color space and for CMYK we have the alternative rendering via 
> system property org.apache.pdfbox.rendering.UsePureJavaCMYKConversion from 
> PDFBOX-3569.
> The idea is now to have an option to force using the alternative color space 
> instead of the ICC one to circumvent using LCMS in toRGBImage(). For CMYK as 
> alternative color space it has to be combined with the system property 
> 'UsePureJavaCMYKConversion'.
> Using this approach the rendering time of the page with the problematic 
> graphic drops from 780 seconds to 1 second!
> It is clear that using the alternate color space might return wrong/not exact 
> colors. Therefore it should be only an option to enable this mode. However 
> for processing large collections of PDF do

[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-11 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610843#comment-16610843
 ] 

Timo Boehme commented on PDFBOX-4309:
-

[^ICCImplCheck.java] can be used to measure the time required when using an ICC 
color space created from a loaded ICC profile. For instance, use the profile 
contained in PDFBox under 
pdfbox/src/main/resources/org/apache/pdfbox/resources/icc/ISOcoated_v2_300_bas.icc

I've checked my Linux environment as well as Windows with Java 1.8.0_66. For 
both the output looks like
{noformat}
ICC usage (0) time (ms): 600
ICC usage (1) time (ms): 0
ICC usage (2) time (ms): 0
ICC usage (0) time (ms): 570
ICC usage (1) time (ms): 0
ICC usage (2) time (ms): 0
ICC usage (0) time (ms): 584
ICC usage (1) time (ms): 0
ICC usage (2) time (ms): 0{noformat}
(under Linux it was even faster with approx. 400 ms).

Thus the first usage of a new ICC color space takes approx. 0.5 seconds; all 
following usages are fast. This means that for documents containing a lot of 
such color spaces the first-access times can add up to a very large number.
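
The kind of measurement described above can be sketched as follows. This is not the attached ICCImplCheck.java (its contents are not shown here) but a hypothetical stand-in: the class name IccFirstUseTiming and method timeUsages are assumptions, and the built-in sRGB profile is used as a fallback when no profile file is given. It only uses the standard java.awt.color API.

```java
import java.awt.color.ColorSpace;
import java.awt.color.ICC_ColorSpace;
import java.awt.color.ICC_Profile;
import java.nio.file.Files;
import java.nio.file.Paths;

public class IccFirstUseTiming {

    /** Times 'uses' consecutive toRGB() calls on a freshly created ICC color space. */
    public static long[] timeUsages(byte[] profileData, int uses) {
        // A new profile and color space per call, mimicking one color space per image.
        ICC_Profile profile = ICC_Profile.getInstance(profileData);
        ICC_ColorSpace colorSpace = new ICC_ColorSpace(profile);
        float[] sample = new float[colorSpace.getNumComponents()];
        long[] millis = new long[uses];
        for (int i = 0; i < uses; i++) {
            long start = System.nanoTime();
            colorSpace.toRGB(sample); // first call sets up the color transform
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }

    public static void main(String[] args) throws Exception {
        // Optional argument: path to an .icc file; otherwise fall back to sRGB.
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : ICC_Profile.getInstance(ColorSpace.CS_sRGB).getData();
        for (int run = 0; run < 3; run++) {
            long[] t = timeUsages(data, 3);
            for (int i = 0; i < t.length; i++) {
                System.out.println("ICC usage (" + i + ") time (ms): " + t[i]);
            }
        }
    }
}
```

The measured times depend heavily on the JRE version and on the profile, so no particular numbers should be expected from this sketch.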


[jira] [Updated] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-11 Thread Timo Boehme (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Boehme updated PDFBOX-4309:

Attachment: ICCImplCheck.java

> Performance regression in PDColorSpace#toRGBImageAWT Part 2
> ---
>
>                 Key: PDFBOX-4309
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4309
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Rendering
>    Affects Versions: 2.0.11, 3.0.0 PDFBox
>            Reporter: Timo Boehme
>            Assignee: Timo Boehme
>            Priority: Minor
>              Labels: optimization
>         Attachments: ICCImplCheck.java, PDColorSpace.java.patch, PDICCBased.java.patch
>
>
> This is a continuation of PDFBOX-3569. In a (private) PDF document there are 
> graphics produced by CorelDraw which are composed of more than 2500(!) 
> images, each with its own indexed color space based on an ICC color space 
> (the shadows of graphic objects are created by a large number of gray lines 
> ...). In our environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux 64Bit) 
> rendering a single page with one graphic takes 780 seconds. Most of the time 
> is spent in creating the indexed color space via ICC color space mapping:
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method)
>     at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156)
>     at sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155)
>     - locked <0x000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform)
>     at sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268)
>     at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355)
>     at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672)
>     at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196)
>     at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443)
>     at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424)
>     at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046)
> {noformat}
> Calling LittleCMS (LCMS) several thousand times is the problem here, taking 
> way too much time. Unfortunately, using KCMS via 
> {{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider}} is not an 
> option either, as the Suse IcedTea OpenJDK seems not to include it (anymore?) 
> in either Java 7 or Java 8.
> However, in this case the ICC color space (PDICCBased) returns CMYK as its 
> alternate color space, and for CMYK we have the alternative rendering via 
> system property org.apache.pdfbox.rendering.UsePureJavaCMYKConversion from 
> PDFBOX-3569.
> The idea is now to have an option that forces use of the alternate color 
> space instead of the ICC one, to avoid using LCMS in toRGBImage(). For CMYK 
> as the alternate color space, it has to be combined with the system property 
> 'UsePureJavaCMYKConversion'.
> Using this approach, the rendering time of the page with the problematic 
> graphic drops from 780 seconds to 1 second!
> Clearly, using the alternate color space might return wrong or inexact 
> colors. Therefore this mode should only be an option that has to be enabled. 
> However, for processing large collections of PDF documents (e.g. focusing on 
> text) or for displaying a PDF in a timely manner, the performance improvement 
> should outweigh the drop in image quality.
> Whil

[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-11 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610539#comment-16610539
 ] 

Timo Boehme commented on PDFBOX-4309:
-

I tracked the problem more deeply and found that my previous finding was only 
the tip of the iceberg (if I skip the 'toRGB' trigger, it will hang on the 
'new Color' trigger). The underlying problem is that when the (ICC) color 
profile is 'used' (toRGB() or a convert operation) for the first time, it takes 
about 0.4 seconds (in my environment). The time is spent in the native call 
LCMS.createNativeTransform().

This call is triggered not only by PDColorSpace.toRGBImageAWT() but also, e.g., 
by ShadingContext.convertToRGB or PageDrawer.getPaint when drawing a glyph.

Thus the only way to prevent these half-second delays per ICC profile is my 
first proposal: use the alternate color spaces in each case, activated by a 
system property. I will therefore apply my first patch.

The other solution, drawing selected images 'directly' instead of via 
ColorConvertOp, only captures part of the problem. It would only be sensible to 
have it as an alternative to my first solution if the resulting colors are 
closer to the original ones than with the alternate color spaces and rendering 
small images is the main problem, which depends on the document content.
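
A minimal sketch of such a system-property switch is shown below. It is not the attached patch: the property name and the class are invented for this example, and the real decision in PDFBox involves more state than a single boolean.

```java
// Hypothetical helper, not part of PDFBox.
final class IccAlternateSwitch
{
    // Hypothetical property name, chosen for this sketch only.
    static final String PROPERTY = "example.pdf.useAlternateInsteadOfIcc";

    enum Choice { ICC, ALTERNATE }

    /** Picks the alternate color space only if it exists and the switch is set. */
    static Choice choose(boolean hasAlternate)
    {
        // Boolean.getBoolean is true only if the property exists and equals "true" (ignoring case).
        boolean forceAlternate = Boolean.getBoolean(PROPERTY);
        return (forceAlternate && hasAlternate) ? Choice.ALTERNATE : Choice.ICC;
    }

    public static void main(String[] args)
    {
        // Run with -Dexample.pdf.useAlternateInsteadOfIcc=true to see the other branch.
        System.out.println(choose(true));
    }
}
```

Because the switch is off unless the property is set, the default behaviour stays on the exact ICC path.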


[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-11 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610415#comment-16610415
 ] 

Timo Boehme commented on PDFBOX-4309:
-

I've found the reason why the 'direct-draw' solution is slower than mine and is 
also much slower on other pages of the problematic document (e.g. 9 seconds 
vs. 0.2 seconds): in PDICCBased.loadICCProfile() some operations are performed 
to trigger exceptions in order to fall back to the alternate color space. The 
trigger awtColorSpace.toRGB() results (in my environment) in a 0.4 second 
delay; it seems that internally it also uses the slow color-convert operation.

I wanted to check whether an alternative operation without this side effect 
could be used; however, I found no document that triggers the exception (in my 
environment). The code contains the following references to problematic 
documents:
 * PDFBOX-1295: triggers an exception, but with the 'ComponentColorModel' 
trigger, not the 'toRGB' one
 * PDFBOX-1740: same as PDFBOX-1295
 * PDFBOX-3610: no exception

Thus it's not clear to me whether the 'toRGB' trigger is still needed. At least 
I would like to have a switch to disable this trigger, with the trigger 'on' by 
default for compatibility. For PDFBox version 3.x we could maybe remove 
it - in case we don't find any documents the trigger is good for.
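
The "probe to provoke an exception, then fall back" idea with a switchable 'toRGB' trigger could look roughly like this. It is only a sketch and not the code of PDICCBased.loadICCProfile(); the class, the method names and the property name are hypothetical.

```java
import java.awt.Transparency;
import java.awt.color.ColorSpace;
import java.awt.color.ICC_ColorSpace;
import java.awt.color.ICC_Profile;
import java.awt.image.ComponentColorModel;
import java.awt.image.DataBuffer;

// Hypothetical helper, not part of PDFBox.
final class IccProbe
{
    // Hypothetical property name; the 'toRGB' probe stays on unless it is set to "false".
    static final String PROPERTY = "example.pdf.probeIccToRgb";

    static boolean toRgbProbeEnabled()
    {
        return Boolean.parseBoolean(System.getProperty(PROPERTY, "true"));
    }

    /** Returns false if a probe fails, so the caller can fall back to the alternate color space. */
    static boolean isUsable(ICC_ColorSpace cs, boolean probeToRgb)
    {
        try
        {
            // Cheap probe: building a color model can already reject a bad profile.
            new ComponentColorModel(cs, false, false, Transparency.OPAQUE, DataBuffer.TYPE_BYTE);
            if (probeToRgb)
            {
                // Expensive probe: forces the color transform to be set up.
                cs.toRGB(new float[cs.getNumComponents()]);
            }
            return true;
        }
        catch (RuntimeException e)
        {
            return false;
        }
    }

    public static void main(String[] args)
    {
        ICC_ColorSpace cs = new ICC_ColorSpace(ICC_Profile.getInstance(ColorSpace.CS_sRGB));
        System.out.println(isUsable(cs, toRgbProbeEnabled()));
    }
}
```

Keeping the expensive probe behind a default-on switch preserves the current behaviour while letting affected users skip the delay.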


[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-11 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610340#comment-16610340
 ] 

Timo Boehme commented on PDFBOX-4309:
-

I've added a patch with the changes to PDColorSpace as discussed.


[jira] [Updated] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-11 Thread Timo Boehme (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Boehme updated PDFBOX-4309:

Attachment: PDColorSpace.java.patch


[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-11 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610269#comment-16610269
 ] 

Timo Boehme commented on PDFBOX-4309:
-

Thanks for the hint. I have tested the solution proposed in PDFBOX-4149. When I 
set the condition to w*h < 200, all color-convert operations are done by the 
alternative Graphics.drawImage. The time needed to render the problematic page 
is 2.1 seconds, while with my solution it takes 1.2 seconds. With my test 
document I cannot comment on the color difference of the rendered images - 
[~tilman]: are you able to say which solution is closer to the correct colors?

In principle the solution from PDFBOX-4149 is more general. In my environment 
(see above) only 2 ColorConvertOp operations per second are performed (I assume 
the cost is the call into LCMS, not the real rendering). Thus these operations 
need to be avoided as much as possible. I would suggest having a parameter 
specifying the maximum image size (width*height) up to which the alternative 
drawing will be done:
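{code:java}
// Images larger than the threshold keep using the (slow) ColorConvertOp;
// images up to the threshold are drawn directly.
if (raster.getWidth() * raster.getHeight() > MAX_DIRECTDRAW_IMAGESIZE)
{
    ColorConvertOp op = new ColorConvertOp(null);
    op.filter(src, dest);
}
else
{
    Graphics g = dest.getGraphics();
    g.drawImage(src, 0, 0, null);
    g.dispose();
}
{code}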
Default would be -1.
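
As a self-contained illustration of the proposal, the threshold check can be wrapped in a small method. This is a sketch under assumptions: the class name, the method signature and the choice of a TYPE_INT_RGB destination are made up here and do not come from the patch.

```java
import java.awt.Graphics;
import java.awt.image.BufferedImage;
import java.awt.image.ColorConvertOp;

// Hypothetical helper, not part of PDFBox.
final class DirectDrawSketch
{
    /**
     * Converts src to an RGB image. Images with more than maxDirectDrawImageSize
     * pixels go through ColorConvertOp, smaller ones are drawn directly.
     * A value of -1 therefore sends every image through ColorConvertOp.
     */
    static BufferedImage toRgb(BufferedImage src, int maxDirectDrawImageSize)
    {
        BufferedImage dest = new BufferedImage(src.getWidth(), src.getHeight(),
                BufferedImage.TYPE_INT_RGB);
        if (src.getWidth() * src.getHeight() > maxDirectDrawImageSize)
        {
            new ColorConvertOp(null).filter(src, dest);
        }
        else
        {
            Graphics g = dest.getGraphics();
            g.drawImage(src, 0, 0, null);
            g.dispose();
        }
        return dest;
    }

    public static void main(String[] args)
    {
        BufferedImage src = new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB);
        src.setRGB(0, 0, 0xFF0000);
        BufferedImage direct = toRgb(src, 200);   // 100 pixels <= 200: direct draw
        BufferedImage converted = toRgb(src, -1); // default: always ColorConvertOp
        System.out.println(Integer.toHexString(direct.getRGB(0, 0)));
        System.out.println(Integer.toHexString(converted.getRGB(0, 0)));
    }
}
```

With the default of -1 the condition is always true, so existing behaviour is unchanged unless a positive threshold is configured.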

WDYT?


[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-05 Thread Timo Boehme (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604374#comment-16604374
 ] 

Timo Boehme commented on PDFBOX-4309:
-

The problem in the document I have is that each of the 2500 images has its own 
indexed color space (only a few colors), and when the color space is 
initialized (PDIndexed.initRgbColorTable() -> baseColorSpace.toRGBImage()) the 
indexed colors are converted via the ICC profile using ColorConvertOp, which 
itself calls LCMS. Thus caching of color spaces won't help here as far as I can 
see - caching was also my first idea.

> Performance regression in PDColorSpace#toRGBImageAWT Part 2
> ---
>
>                 Key: PDFBOX-4309
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4309
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Rendering
>    Affects Versions: 2.0.11, 3.0.0 PDFBox
>            Reporter: Timo Boehme
>            Assignee: Timo Boehme
>            Priority: Minor
>              Labels: optimization
>         Attachments: PDICCBased.java.patch
>
>
> This is a continuation of PDFBOX-3569. In a (private) PDF document there are 
> graphics produced by CorelDraw which are composed of more than 2500(!) 
> images, each with its own indexed color space based on an ICC color space 
> (the shadows of graphic objects are created by a large number of gray lines 
> ...). In our environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux 64Bit) 
> rendering a single page with one graphic takes 780 seconds. Most of the time 
> is spent in creating the indexed color space via ICC color space mapping:
> {noformat}
>    java.lang.Thread.State: RUNNABLE
>     at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method)
>     at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156)
>     at sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155)
>     - locked <0x000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform)
>     at sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268)
>     at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355)
>     at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672)
>     at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196)
>     at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443)
>     at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424)
>     at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046)
> {noformat}
> Calling LittleCMS (LCMS) several thousand times is the problem here, taking 
> way too much time. Unfortunately using kcms via 
> {{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider}} is also no 
> option as the Suse IcedTea OpenJDK seems to not have included it (anymore?) 
> - in both Java 7 and Java 8.
> However the ICC color space (PDICCBased) returns in this case CMYK as 
> alternate color space and for CMYK we have the alternative rendering via 
> system property org.apache.pdfbox.rendering.UsePureJavaCMYKConversion from 
> PDFBOX-3569.
> The idea is now to have an option to force using the alternative color space 
> instead of the ICC one to circumvent using LCMS in toRGBImage(). For CMYK as 
> alternative color space it has to be combined with the system property 
> 'UsePureJavaCMYKConversion'.
> Using this approach the rendering time of the page with the problematic 
> graphic drops from 780 seconds to 1 s

[DISCUSS] Document specific processing properties

2018-09-05 Thread Timo Boehme

Hi,

my last (proposed) addition of a system property controlling rendering 
(PDFBOX-4309) adds to other already existing properties (e.g. 
org.apache.pdfbox.rendering.UsePureJavaCMYKConversion, possibly more?).


These settings are important for specific use cases/environments. Moreover, 
they are often needed only for specific PDF documents - e.g. the 
mentioned properties cure a problem with excessive calls to the 
Java color management implementation; without them some documents are 
practically not processable. In other cases the settings could also have 
negative effects like slower processing or wrong colors.


Thus it would be good to have the possibility to adjust settings on a 
per-document basis (either directly by the user or based on 
checking/collecting document features like number of images etc.).
The problem is how these document specific settings can be provided to 
the relevant classes, e.g. PDICCBased.


Providing a settings object through the call-chain is probably not an 
option as a lot of constructors/methods would have to be changed for 
only a few places where the settings are really needed.


One viable solution that came to mind is using a ThreadLocal 
ProcessingProperties map (String,String). To avoid unwanted 
side effects, use of these properties should be initiated by the user,
and it should be clearly documented that this is done in a try-finally block 
so that the settings are removed after processing (which also avoids 
memory leaks etc.), like:


try {
  LocalProcessingProperties.activate();  // creates the map object
  ... // PDF processing
} finally {
  LocalProcessingProperties.clear();  // removes the map object
}

A call to LocalProcessingProperties.getProperty( KEY ) would return the 
value from the ThreadLocal map if the map exists and contains this key, 
and otherwise fall back to System.getProperty( KEY ).


As PDFBox (currently) doesn't use multiple threads this should work fine; 
for multi-threaded usage an initialization/clear would be needed for 
each thread, which could get the reference to the map object of the main 
processing thread.
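
A minimal sketch of how such a class could look, assuming the method names 
from the snippet above (activate, getProperty, clear). The setProperty 
method and the IllegalStateException are assumptions added for illustration; 
the proposal does not say how values get into the map.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of the proposed thread-local settings holder; not part of PDFBox. */
final class LocalProcessingProperties {
    // One map per thread; null until activate() is called on that thread.
    private static final ThreadLocal<Map<String, String>> PROPS = new ThreadLocal<>();

    private LocalProcessingProperties() {
    }

    /** Creates an empty property map for the current thread. */
    static void activate() {
        PROPS.set(new HashMap<>());
    }

    /** Hypothetical setter: stores a document-specific value for the current thread. */
    static void setProperty(String key, String value) {
        Map<String, String> map = PROPS.get();
        if (map == null) {
            throw new IllegalStateException("activate() was not called on this thread");
        }
        map.put(key, value);
    }

    /** Returns the thread-local value if present, else the JVM-wide system property. */
    static String getProperty(String key) {
        Map<String, String> map = PROPS.get();
        if (map != null && map.containsKey(key)) {
            return map.get(key);
        }
        return System.getProperty(key);
    }

    /** Removes the map of the current thread so nothing leaks into later work. */
    static void clear() {
        PROPS.remove();
    }
}
```

Using ThreadLocal.remove() in clear() rather than setting the value to null 
drops the thread's entry entirely, which matters when threads are pooled and 
reused for other documents.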


WDYT?


Best regards,
Timo


--
Timo Boehme
OntoChem IT Solutions GmbH
Blücherstraße 24
06120 Halle (Saale)
Germany

phone: +49 345 478 047 4| fax: +49 345 478 047 1
email: timo.boe...@ontochem.com | web: www.ontochem.com
HRB 21962 Amtsgericht Stendal   | USt-IdNr.: DE815563824
managing director : Lutz Weber


-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Created] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2

2018-09-04 Thread Timo Boehme (JIRA)
Timo Boehme created PDFBOX-4309:
---

         Summary: Performance regression in PDColorSpace#toRGBImageAWT Part 2
             Key: PDFBOX-4309
             URL: https://issues.apache.org/jira/browse/PDFBOX-4309
         Project: PDFBox
      Issue Type: Improvement
      Components: Rendering
Affects Versions: 2.0.11, 3.0.0 PDFBox
        Reporter: Timo Boehme
        Assignee: Timo Boehme
     Attachments: PDICCBased.java.patch

This is a continuation of PDFBOX-3569. In a (private) PDF document there are 
graphics produced by CorelDraw which are composed of more than 2500(!) images, 
each with its own indexed color space based on an ICC color space (the shadows 
of graphic objects are created by a large number of gray lines ...). In our 
environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux 64Bit) rendering a 
single page with one graphic takes 780 seconds. Most of the time is spent in 
creating the indexed color space via ICC color space mapping:
{noformat}
   java.lang.Thread.State: RUNNABLE
    at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method)
    at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156)
    at sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155)
    - locked <0x000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform)
    at sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268)
    at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355)
    at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282)
    at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314)
    at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276)
    at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141)
    at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91)
    at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184)
    at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
    at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240)
    at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92)
    at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672)
    at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424)
    at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046)
{noformat}
Calling LittleCMS (LCMS) several thousand times is the problem here, taking 
way too much time. Unfortunately using kcms via 
{{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider}} is also no option 
as the Suse IcedTea OpenJDK seems to not have included it (anymore?) - in both 
Java 7 and Java 8.

However the ICC color space (PDICCBased) returns in this case CMYK as alternate 
color space and for CMYK we have the alternative rendering via system property 
org.apache.pdfbox.rendering.UsePureJavaCMYKConversion from PDFBOX-3569.

The idea is now to have an option to force using the alternative color space 
instead of the ICC one to circumvent using LCMS in toRGBImage(). For CMYK as 
alternative color space it has to be combined with the system property 
'UsePureJavaCMYKConversion'.
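
As a rough sketch of how the two switches combine: the first property name 
below is the one from PDFBOX-3569, while the name of the new option is not 
given in this description, so FORCE_ALTERNATE is a purely hypothetical 
placeholder. The helper only illustrates the "both must be enabled" rule for a 
CMYK alternate; it is not taken from the attached patch.

```java
/** Illustration of combining the two rendering switches; not PDFBox code. */
final class ColorConversionFlags {
    // Property name as quoted in the issue text (introduced with PDFBOX-3569).
    static final String PURE_JAVA_CMYK =
            "org.apache.pdfbox.rendering.UsePureJavaCMYKConversion";

    // Hypothetical name: the issue text does not say what the new option is called.
    static final String FORCE_ALTERNATE = "example.forceAlternateColorSpace";

    private ColorConversionFlags() {
    }

    /** True only if the system property is set to "true" (case-insensitive). */
    static boolean isEnabled(String key) {
        return Boolean.parseBoolean(System.getProperty(key));
    }

    /** For a CMYK alternate, LCMS is avoided only when both switches are on. */
    static boolean bypassesLcmsForCmykAlternate() {
        return isEnabled(FORCE_ALTERNATE) && isEnabled(PURE_JAVA_CMYK);
    }
}
```

On the command line the same switches would be passed as -D options, in the 
same way as the kcms option shown earlier.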

Using this approach the rendering time of the page with the problematic graphic 
drops from 780 seconds to 1 second!

It is clear that using the alternate color space might yield wrong or inexact 
colors. Therefore this mode should only be available as an opt-in option. 
However, for processing large collections of PDF documents (e.g. focusing on 
text) or to display a PDF in a timely manner the performance improvement should 
outweigh the drop in image quality.

While the provided patch always uses the alternate color space once activated, 
it could be possible at a later stage to add more intelligent logic which 
decides based on runtime analysis when to use this mode (number of calls to 
LCMS, time needed etc.).

If there are no objections to this patch I will apply it in the next few days.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Resolved] (PDFBOX-4307) ClassCastException in PDDocumentCatalog.getDocumentOutline if 'outlines' is not a dictionary

2018-08-31 Thread Timo Boehme (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Boehme resolved PDFBOX-4307.
-
   Resolution: Fixed
Fix Version/s: 3.0.0 PDFBox
   2.0.12

> ClassCastException in PDDocumentCatalog.getDocumentOutline if 'outlines' is 
> not a dictionary
> 
>
> Key: PDFBOX-4307
> URL: https://issues.apache.org/jira/browse/PDFBOX-4307
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 2.0.11, 3.0.0 PDFBox
>Reporter: Timo Boehme
>Assignee: Timo Boehme
>Priority: Minor
> Fix For: 2.0.12, 3.0.0 PDFBox
>
>
> In PDDocumentCatalog.getDocumentOutline() the 'outline' is read as dictionary 
> object and directly cast to COSDictionary. Normally this is ok as it should 
> be a dictionary. However, in a malformed PDF in my collection 
> (unfortunately I'm not allowed to disclose it) the object is an array 
> (COSArray) which leads to the ClassCastException.
> Since the outline is optional information the best we can do here is to 
> ignore the 'outline' data if it's not a COSDictionary and return 'null'.






[jira] [Assigned] (PDFBOX-4307) ClassCastException in PDDocumentCatalog.getDocumentOutline if 'outlines' is not a dictionary

2018-08-31 Thread Timo Boehme (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Boehme reassigned PDFBOX-4307:
---

Assignee: Timo Boehme

> ClassCastException in PDDocumentCatalog.getDocumentOutline if 'outlines' is 
> not a dictionary
> 
>
> Key: PDFBOX-4307
> URL: https://issues.apache.org/jira/browse/PDFBOX-4307
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 2.0.11, 3.0.0 PDFBox
>Reporter: Timo Boehme
>Assignee: Timo Boehme
>Priority: Minor
>
> In PDDocumentCatalog.getDocumentOutline() the 'outline' is read as dictionary 
> object and directly cast to COSDictionary. Normally this is ok as it should 
> be a dictionary. However, in a malformed PDF in my collection 
> (unfortunately I'm not allowed to disclose it) the object is an array 
> (COSArray) which leads to the ClassCastException.
> Since the outline is optional information the best we can do here is to 
> ignore the 'outline' data if it's not a COSDictionary and return 'null'.






[jira] [Created] (PDFBOX-4307) ClassCastException in PDDocumentCatalog.getDocumentOutline if 'outlines' is not a dictionary

2018-08-31 Thread Timo Boehme (JIRA)
Timo Boehme created PDFBOX-4307:
---

         Summary: ClassCastException in PDDocumentCatalog.getDocumentOutline if 'outlines' is not a dictionary
             Key: PDFBOX-4307
             URL: https://issues.apache.org/jira/browse/PDFBOX-4307
         Project: PDFBox
      Issue Type: Bug
      Components: Parsing
Affects Versions: 2.0.11, 3.0.0 PDFBox
        Reporter: Timo Boehme


In PDDocumentCatalog.getDocumentOutline() the 'outline' is read as dictionary 
object and directly cast to COSDictionary. Normally this is ok as it should be 
a dictionary. However, in a malformed PDF in my collection (unfortunately 
I'm not allowed to disclose it) the object is an array (COSArray) which leads 
to the ClassCastException.

Since the outline is optional information the best we can do here is to 
ignore the 'outline' data if it's not a COSDictionary and return 'null'.
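
The intended check can be sketched with stand-in types. Dict below is a 
hypothetical placeholder for a PDF dictionary object and is not the PDFBox 
COSDictionary class; the point is only the instanceof test before the cast.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

/** Illustration of "check the type, else return null"; stand-in types only. */
final class OutlineLookupSketch {

    /** Hypothetical stand-in for a PDF dictionary object. */
    static final class Dict {
        final Map<String, Object> items = new HashMap<>();
    }

    private OutlineLookupSketch() {
    }

    /**
     * Returns the outline dictionary of the catalog, or null if the entry is
     * missing or has an unexpected type (for example an array in a malformed file).
     */
    static Dict getOutlines(Dict catalog) {
        Object value = catalog.items.get("Outlines");
        // An unchecked cast here would throw ClassCastException for an array.
        return value instanceof Dict ? (Dict) value : null;
    }
}
```

Because instanceof is false for null, the same expression covers both a 
missing entry and an entry of the wrong type.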






Re: New releases?

2018-08-30 Thread Timo Boehme

+1

Thanks,
Timo


On 29.08.2018 at 18:38, Andreas Lehmkuehler wrote:

Hi,

I'm planning to cut the following releases in about 3-4 weeks from now:

- JBIG2 3.0.2 (fix for a memory leak)
- PDFBox 2.0.12 (there are about 30 fixes/improvements)

WDYT?

Andreas





[jira] [Resolved] (PDFBOX-4301) ClassCastException in PDExtendedGraphicsState

2018-08-27 Thread Timo Boehme (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Boehme resolved PDFBOX-4301.
-
   Resolution: Fixed
Fix Version/s: 3.0.0 PDFBox
   2.0.12

> ClassCastException in PDExtendedGraphicsState
> -
>
> Key: PDFBOX-4301
> URL: https://issues.apache.org/jira/browse/PDFBOX-4301
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 2.0.11
>    Reporter: Timo Boehme
>Assignee: Timo Boehme
>Priority: Minor
> Fix For: 2.0.12, 3.0.0 PDFBox
>
>
> The method PDExtendedGraphicsState.getFloatItem contains an unchecked cast 
> to COSNumber for a dictionary object. In a specific journal PDF document I 
> get the following exception:
> {noformat}
>     at org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState.getFloatItem(PDExtendedGraphicsState.java:591)
>     at org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState.getStrokingAlphaConstant(PDExtendedGraphicsState.java:482)
>     at org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState.copyIntoGraphicsState(PDExtendedGraphicsState.java:130)
>     at org.apache.pdfbox.contentstream.operator.state.SetGraphicsStateParameters.process(SetGraphicsStateParameters.java:61)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:848)
> {noformat}
> because the PDF contains
> {noformat}
> /A4 <<
> /CA (1.0)
> /Type /ExtGState
> /ca (1.0)
> >>
> {noformat}
> where "(1.0)" is clearly wrong and should be "1.0".
> As this seems to be a rare error I would suggest checking the dictionary 
> object type before casting and returning "null" for a wrong type (as is done 
> e.g. in PDExtendedGraphicsState.getFontSetting).






[jira] [Created] (PDFBOX-4301) ClassCastException in PDExtendedGraphicsState

2018-08-27 Thread Timo Boehme (JIRA)
Timo Boehme created PDFBOX-4301:
---

         Summary: ClassCastException in PDExtendedGraphicsState
             Key: PDFBOX-4301
             URL: https://issues.apache.org/jira/browse/PDFBOX-4301
         Project: PDFBox
      Issue Type: Bug
      Components: Parsing
Affects Versions: 2.0.11
        Reporter: Timo Boehme
        Assignee: Timo Boehme


The method PDExtendedGraphicsState.getFloatItem contains an unchecked cast to 
COSNumber for a dictionary object. In a specific journal PDF document I get the 
following exception:
{noformat}
    at org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState.getFloatItem(PDExtendedGraphicsState.java:591)
    at org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState.getStrokingAlphaConstant(PDExtendedGraphicsState.java:482)
    at org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState.copyIntoGraphicsState(PDExtendedGraphicsState.java:130)
    at org.apache.pdfbox.contentstream.operator.state.SetGraphicsStateParameters.process(SetGraphicsStateParameters.java:61)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:848)
{noformat}
because the PDF contains
{noformat}
/A4 <<
/CA (1.0)
/Type /ExtGState
/ca (1.0)
>>
{noformat}
where "(1.0)" is clearly wrong and should be "1.0".

As this seems to be a rare error I would suggest checking the dictionary 
object type before casting and returning "null" for a wrong type (as is done 
e.g. in PDExtendedGraphicsState.getFontSetting).
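
A small sketch of the suggested behaviour, using a plain Map as a stand-in 
for the graphics state dictionary (an assumption for illustration; the real 
method works on PDFBox COS objects). A numeric entry yields a Float, while a 
string such as "(1.0)" or a missing entry yields null.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustration of a type check before a numeric cast; not PDFBox code. */
final class FloatItemSketch {

    private FloatItemSketch() {
    }

    /**
     * Returns the entry as a Float if it is numeric, otherwise null.
     * The Map stands in for the dictionary, Number for a numeric PDF object.
     */
    static Float getFloatItem(Map<String, Object> dictionary, String key) {
        Object value = dictionary.get(key);
        if (value instanceof Number) {
            return Float.valueOf(((Number) value).floatValue());
        }
        // Wrong type (e.g. a string) or missing entry: report "not set".
        return null;
    }
}
```

Callers then treat null as "value not set" and keep whatever value is already 
in effect, instead of failing on the whole page.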






Re: [VOTE] Release Apache PDFBox 2.0.11

2018-06-26 Thread Timo Boehme

Hi,

+1

Timo


On 25.06.2018 at 20:51, Andreas Lehmkuehler wrote:

Hi,

a candidate for the PDFBox 2.0.11 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/2.0.11/

The release candidate is a zip archive of the sources in:

     http://svn.apache.org/repos/asf/pdfbox/tags/2.0.11/

The SHA-512 checksum of the archive is 
a53f6c64e41b4843b103b6d9b964b77c226da1ec21ab0c1c7b14772fa233a53b3f179d16a73deb84803476aced6a74e67f1eb43ba34f3517651c6c52f669aaf7. 



Please vote on releasing this package as Apache PDFBox 2.0.11.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox 2.0.11
     [ ] -1 Do not release this package because...


Here is my +1

Andreas




Re: [VOTE] Release Apache PDFBox 1.8.15

2018-06-26 Thread Timo Boehme

Hi,

+1

Best regards,
Timo


On 25.06.2018 at 20:13, Andreas Lehmkuehler wrote:

Hi,

a candidate for the PDFBox 1.8.15 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/1.8.15/

The release candidate is a zip archive of the sources in:

     http://svn.apache.org/repos/asf/pdfbox/tags/1.8.15/

The SHA-512 checksum of the archive is 
ac3f4b131f5cd2153ec2a744c486db921bc2165d596b243ad673cfc94be1bc4ae27bdf2981b63419fead18db569a2008264d6fdc7c89cf47f69f81c4a7d3a2a6. 



Please vote on releasing this package as Apache PDFBox 1.8.15.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox 1.8.15
     [ ] -1 Do not release this package because...

Here is my +1

Andreas




Re: [VOTE] Release Apache PDFBox JBIG2 ImageIO 3.0.1

2018-05-17 Thread Timo Boehme

Hi,

I've checked some sample images.

+1


Best,
Timo



On 14.05.2018 at 19:20, Andreas Lehmkuehler wrote:

Hi,

a candidate for the PDFBox JBIG2 ImageIO 3.0.1 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/jbig2-imageio-3.0.1/

The release candidate is a zip archive of the sources in:

     https://github.com/apache/pdfbox-jbig2/tree/jbig2-imageio-3.0.1/

The SHA-512 checksum of the archive is 
3688ad3a79caccfa0d43c68011bafb076d71cce4c94e6ed7061c2a127639ccf0e683bd8ce68b0f14d14d6647aaba9e107a6c0ee785daa299a0fd103e0a554626. 



Please vote on releasing this package as Apache PDFBox JBIG2 ImageIO 3.0.1.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox JBIG2 ImageIO 3.0.1
     [ ] -1 Do not release this package because...


Here is my +1

Andreas




[jira] [Commented] (PDFBOX-4058) High memory consumption when extracting image from PDF file

2018-01-11 Thread Timo Boehme (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321862#comment-16321862
 ] 

Timo Boehme commented on PDFBOX-4058:
-

I would say that putting the values in weak-references is the correct solution 
if you look at how WeakHashMap works: only the keys are weak and may be garbage 
collected at any time but the values are not. The table entries are cleared 
only if you run a method (get/put/size/...) on the table. Without accessing the 
table all entries remain (especially the values). Thus in order to allow also 
the values to be garbage collected if needed it is required to put them into a 
WeakReference as you have done.
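
The pattern can be sketched as follows. WeakValueCache is a hypothetical 
name chosen for illustration and is not the class changed in this issue; it 
only shows a WeakHashMap whose values are wrapped in WeakReference.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

/** Illustration of a cache with weak keys and weakly referenced values. */
final class WeakValueCache<K, V> {
    // Keys are held weakly by WeakHashMap; values are wrapped so that the map
    // itself never keeps a strong reference to them.
    private final Map<K, WeakReference<V>> map = new WeakHashMap<>();

    void put(K key, V value) {
        map.put(key, new WeakReference<>(value));
    }

    /** Returns the cached value, or null if absent or already collected. */
    V get(K key) {
        WeakReference<V> ref = map.get(key);
        return ref == null ? null : ref.get();
    }
}
```

The wrapping also avoids the case described in the WeakHashMap documentation 
where a value strongly refers to its own key and thereby prevents the entry 
from ever being discarded. Callers have to handle a null result by recreating 
the value.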

> High memory consumption when extracting image from PDF file
> ---
>
> Key: PDFBOX-4058
> URL: https://issues.apache.org/jira/browse/PDFBOX-4058
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.5, 2.0.6, 2.0.7, 2.0.8
> Environment: windows 10 / Linux
>Reporter: Bjorn Misseghers
>Assignee: Tilman Hausherr
>  Labels: regression
> Fix For: 2.0.9, 3.0.0 PDFBox
>
> Attachments: HighMemoryFootprint.pdf
>
>
> When rendering an image at 300 dpi from the included PDF, my java process 
> uses a huge amount of memory.
> The document is only 45 Kb in size and contains 2 pages, my JVM is unable to 
> extract even 1 page with 3G of memory. Setting Xmx to 4G works but is not the 
> solution I want.
> The error occurs when calling PDFRenderer.renderImageWithDPI()
> I already tried tweaking the memory usage in my application to use a scratch 
> file while loading the document as well as avoiding caching of XObjects as 
> described here: https://pdfbox.apache.org/2.0/faq.html#outofmemoryerror
> These didn't work.
> The issue can be reproduced using the pdfbox-app utility:
> java -Xmx3G -jar pdfbox-app-2.0.8.jar PDFToImage 
> HighMemoryFootprint.pdf -dpi 300 -color RGB -page 1
> What can not be changed?
> * 300 dpi will not be decreased.
> * Max Java memory will not be increased: 3GB is ridiculous for a 45kb PDF 
> file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




Re: Apache PDFBox January 2018 report due

2018-01-10 Thread Timo Boehme

+1

Timo

On 09.01.2018 at 22:14, Andreas Lehmkuehler wrote:

Hi,

find attached a quick draft of the board report we're expected to submit 
this month. It's based upon the report template which can be found at [1]


Any further comments, objections or additions?



## Description:
  - the Apache PDFBox library is an open source Java tool for working 
with PDF documents.


## Issues:
  - there are no issues requiring board attention at this time.

## Activity:
  - the integration of the JBig2 ImageIO plugin is complete
  - we are planning to release the first Apache based version of the 
JBig2 ImageIO plugin this month

  - we are working on fixing bugs in 2.0.x
  - we have resolved quite a number of 2.0.x related tickets so that 
most likely the next bugfix version 2.0.9 will be released this month as 
well


## Board feedback (comment from the last October board meeting)

   mt: Reading the "2.0.7 release" thread on private@ it appears that
   the project is dependent on a single committer for at least a
   sub-set of regression tests. Could you explain this in more
   detail please. If there are tests the community depends on, I'd
   expect to see those tests in an ASF repository where any
   committer can run them.


These tests are not classic regression tests but tests on a large amount 
(> 50) of files. The results are compared to the results of a 
previous version and then committers investigate files with some extreme 
negative differences or with new exceptions. The same is done (on an 
even larger scale) for Tika, see [1] and [2].
The Tika tests need 4TB, and the files can't be hosted on a public ASF 
repo or released under the Apache License because the files largely 
derive from Common Crawl or the internet generally, and 
copyright/licensing would pose a problem. There is a special VM to host 
the described tests and it is possible to grant access to all interested 
Tika/PDFBox committers. Tilman already got his access bits in December, 
so that at least one other committer is able to run those tests if 
needed. Maybe others will follow.


[1] 
http://events.linuxfoundation.org/sites/events/files/slides/ApacheConMiami2017_tallison_v2.pdf 

[2] 
http://openpreservation.org/blog/2016/10/04/apache-tikas-regression-corpus-tika-1302/ 




## Health report:
  - there is a steady stream of contributions, bug reports and questions 
on the mailing lists


## PMC changes:

  - Currently 21 PMC members.
  - New PMC members:
     - Joerg O. Henne was added to the PMC on Mon Oct 09 2017
     - Sebastian Holder was added to the PMC on Wed Oct 11 2017
     - Carolin Köhler was added to the PMC on Wed Oct 11 2017
     - Matthäus Mayer was added to the PMC on Mon Oct 16 2017

## Committer base changes:

  - Currently 21 committers.
     - Joerg O. Henne was added as a committer on Mon Oct 09 2017
     - Sebastian Holder was added as a committer on Wed Oct 11 2017
     - Carolin Köhler was added as a committer on Wed Oct 11 2017
     - Matthäus Mayer was added as a committer on Mon Oct 16 2017

## Releases:

  - 2.0.8 was released on Thu Nov 02 2017

## JIRA activity:

  - 101 JIRA tickets created in the last 3 months
  - 75 JIRA tickets closed/resolved in the last 3 months




Andreas

[1] https://reporter.apache.org/?pdfbox




Re: [VOTE] Release Apache PDFBox 2.0.8

2017-11-01 Thread Timo Boehme

Hi,

+1

Thanks,
Timo


On 30.10.2017 at 19:47, Andreas Lehmkuehler wrote:

Hi,

a candidate for the PDFBox 2.0.8 release is available at:

     https://dist.apache.org/repos/dist/dev/pdfbox/2.0.8/

The release candidate is a zip archive of the sources in:

     http://svn.apache.org/repos/asf/pdfbox/tags/2.0.8/

The SHA1 checksum of the archive is 
5c0607144dde1b7af3dd428cafbd2c9c29617ab3.


Please vote on releasing this package as Apache PDFBox 2.0.8.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

     [ ] +1 Release this package as Apache PDFBox 2.0.8
     [ ] -1 Do not release this package because...


Here is my +1

Andreas




Re: [VOTE] Accept the JBig2 ImageIO Plugin contribution (PDFBOX-3906)

2017-09-04 Thread Timo Boehme

Hi,

+1

and thanks for the contribution.

Best,
Timo


On 02.09.2017 at 11:42, Andreas Lehmkuehler wrote:

Hi,

The contributed JBig2 ImageIO Plugin codebase is now available for review in
PDFBOX-3906 [1] and the relevant IP clearance process has been started [2]. As
discussed, I propose that we accept this codebase and invite the JBig2 ImageIO
Plugin developers listed below as new committers and PMC members of the PDFBox
project.

Jörg Henne
Matthäus Mayer
Sebastian Holder
Carolin Köhler

So, please vote on accepting the JBig2 ImageIO Plugin contribution and granting
committer and PMC member status to the people listed above, on the condition
that the IP clearance passes without problems. This vote is open for the next
72 hours.

[ ] +1 Accept the JBig2 ImageIO Plugin contribution and grant committer and PMC
member status to the above people, assuming the IP clearance passes

[ ] -1 Don't accept the codebase and/or grant committership, because...

Here is my +1.


BR
Andreas Lehmkühler

[1] https://issues.apache.org/jira/browse/PDFBOX-3906
[2] https://incubator.apache.org/ip-clearance/pdfbox-jbig2.html

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org




--
Timo Boehme
OntoChem IT Solutions GmbH
Blücherstraße 24
06120 Halle (Saale)
Germany

phone: +49 345 478 047 4| fax: +49 345 478 047 1
email: timo.boe...@ontochem.com | web: www.ontochem.com
HRB 21962 Amtsgericht Stendal   | USt-IdNr.: DE815563824
managing director : Lutz Weber





Re: [VOTE] Release Apache PDFBox 2.0.6

2017-05-15 Thread Timo Boehme

Hi,

+1

I don't know which change caused the difference, but my special journal
test document renders completely for the first time (before, some images
either had wrong colors or were missing). Very nice.



Best,
Timo


On 12.05.2017 at 18:13, Andreas Lehmkuehler wrote:

Hi,

a candidate for the PDFBox 2.0.6 release is available at:

https://dist.apache.org/repos/dist/dev/pdfbox/2.0.6/

The release candidate is a zip archive of the sources in:

http://svn.apache.org/repos/asf/pdfbox/tags/2.0.6/

The SHA1 checksum of the archive is
cb04fa19058efca6913a45490ac66cf44ecf273a.

Please vote on releasing this package as Apache PDFBox 2.0.6.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 2.0.6
[ ] -1 Do not release this package because...


Here is my +1

BR
Andreas Lehmkühler





--
Timo Boehme
OntoChem IT Solutions GmbH
Blücherstraße 24
06120 Halle (Saale)
Germany

phone: +49 345 478 047 4| fax: +49 345 478 047 1
email: timo.boe...@ontochem.com | web: www.ontochem.com
HRB 21962 Amtsgericht Stendal   | USt-IdNr.: DE815563824
managing director : Lutz Weber





Re: Apache PDFBox April 2017 board report due

2017-04-10 Thread Timo Boehme

Hi,

+1

Best regards,
Timo


On 09.04.2017 at 13:30, Andreas Lehmkuehler wrote:

Hi,

find attached a quick draft of the board report we're expected to submit this
month. It's based upon the report template which can be found at [1]


Any further comments, objections or additions?



## Description:
 - the Apache PDFBox library is an open source Java tool for working
with PDF documents.

## Issues:
 - there are no issues requiring board attention at this time.

## Activity:
 - we are working on fixing bugs in 2.0.x
 - there are some small improvements as well
 - we decided to switch the current trunk from 2.1.0 to 3.0.0 as we are
going to introduce some API changes which require a major release
 - Maruan started an effort to update our website
 - we support the new donation campaign and added the logo including a link

## Health report:
 - there is a steady stream of contributions, bug reports and questions
on the mailing lists

## PMC changes:
 - Currently 17 PMC members.
 - No new PMC members added in the last 3 months
 - Last PMC addition was Tim Allison on Mon Sep 19 2016

## Committer base changes:
 - Currently 17 committers.
 - No new committers added in the last 3 months
 - Last committer addition was Tim Allison on Mon Sep 19 2016

## Releases:
 - 2.0.5 was released on Fri Mar 17 2017

## JIRA activity:
 - 105 JIRA tickets created in the last 3 months
 - 106 JIRA tickets closed/resolved in the last 3 months






--
Timo Boehme
OntoChem IT Solutions GmbH
Blücherstraße 24
06120 Halle (Saale)
Germany

phone: +49 345 478 047 4| fax: +49 345 478 047 1
email: timo.boe...@ontochem.com | web: www.ontochem.com
HRB 21962 Amtsgericht Stendal   | USt-IdNr.: DE815563824
managing director : Lutz Weber




