Re: [VOTE] Release Apache PDFBox 3.0.2
Hi,

+1

Thanks,
Timo

On 11.03.24 at 20:24, Andreas Lehmkühler wrote:

Hi,

a candidate for the PDFBox 3.0.2 release is available at:
https://dist.apache.org/repos/dist/dev/pdfbox/3.0.2/

The release candidate is a zip archive of the sources in:
https://svn.apache.org/repos/asf/pdfbox/tags/3.0.2/

The SHA-512 checksum of the archive is
d2eaaa4e7a139b00d79d7518ca66ee2c33300dbeed11c05554413e478b2a76814a7404a9467cb2dc3502840259188965a3483342c7d44e3280b68649aec670f8.

Please vote on releasing this package as Apache PDFBox 3.0.2. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 3.0.2
[ ] -1 Do not release this package because...

Here is my +1

Andreas

--
OntoChem GmbH
Blücherstraße 24
06120 Halle (Saale)
Germany
email: t.boe...@digital-science.com | web: www.ontochem.com
HRB 215461 Amtsgericht Stendal | USt-IdNr.: DE246232735
managing directors: Dr. Felix Berthelmann - Mario Diwersy

To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
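For anyone checking a release candidate before voting: the vote mail publishes a SHA-512 checksum, and comparing it against the downloaded archive can be scripted. The following is only a minimal sketch using the JDK's `MessageDigest`; the class name `Sha512Check` and its two helper methods are made up for illustration and are not part of PDFBox or of any ASF release tooling.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not part of PDFBox.
final class Sha512Check
{
    /** Streams the file through SHA-512 and returns the lowercase hex digest. */
    static String sha512Hex(Path file) throws IOException, NoSuchAlgorithmException
    {
        MessageDigest digest = MessageDigest.getInstance("SHA-512");
        try (InputStream in = Files.newInputStream(file))
        {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1)
            {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest())
        {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Compares the computed digest with the published one, ignoring case and surrounding whitespace. */
    static boolean matches(Path file, String expectedHex) throws IOException, NoSuchAlgorithmException
    {
        return sha512Hex(file).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) throws Exception
    {
        // args[0]: path to the downloaded archive, args[1]: checksum copied from the vote mail
        if (args.length != 2)
        {
            System.err.println("usage: Sha512Check <archive> <expected-sha512-hex>");
            return;
        }
        System.out.println(matches(Paths.get(args[0]), args[1]) ? "checksum OK" : "checksum MISMATCH");
    }
}
```

A checksum match only shows that the download is intact; checking the release signature is a separate step that this sketch does not cover.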
Re: Apache PDFBox Board Report January 2024 due
+1

Thanks,
Timo

On 08.01.24 at 08:14, Andreas Lehmkühler wrote:

Hi,

find attached a quick draft of the board report we're expected to submit this month. It's based upon the report wizard template which can be found at [1]

Any comments or additions are appreciated ...

## Description:
The mission of PDFBox is the creation and maintenance of software related to a Java library for working with PDF documents

## Project Status:
Current project status: ongoing with moderate activity
Issues for the board: none

## Membership Data:
Apache PDFBox was founded 2009-10-21 (14 years ago)
There are currently 21 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Recent releases:
3.0.1 was released on 2023-11-30.
2.0.30 was released on 2023-11-04.
3.0.0 was released on 2023-08-17.

## Community Health:
- there is a steady stream of contributions, bug reports and questions on the mailing lists
- we released the first minor release of our new 3.0.x line to fix some regression issues. A couple of improvements and further fixes were included as well.
- the development of the current trunk version 4.0.0 is an ongoing effort, e.g. we switched to Log4j2 and did some major refactorings
Re: [VOTE] Release Apache PDFBox 3.0.1
+1,

Thanks,
Timo

On 27.11.23 at 17:46, Andreas Lehmkühler wrote:

Hi,

a candidate for the PDFBox 3.0.1 release is available at:
https://dist.apache.org/repos/dist/dev/pdfbox/3.0.1/

The release candidate is a zip archive of the sources in:
https://svn.apache.org/repos/asf/pdfbox/tags/3.0.1/

The SHA-512 checksum of the archive is
8ca8f3297ec04efaa23ab6d9ca421c1b39d8fb2de392e0f7b5aa6e7053eac75066e8b2872dc6b6847a0194b557aa8570de7f1d1a122fcf3888bf9ed21eae0257.

Please vote on releasing this package as Apache PDFBox 3.0.1. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 3.0.1
[ ] -1 Do not release this package because...

Here is my +1

Andreas
Re: [VOTE] Release Apache PDFBox 3.0.0
Hi,

+1

Thanks,
Timo

On 14.08.23 at 20:29, Andreas Lehmkühler wrote:

Hi,

a candidate for the PDFBox 3.0.0 release is available at:
https://dist.apache.org/repos/dist/dev/pdfbox/3.0.0/

The release candidate is a zip archive of the sources in:
https://svn.apache.org/repos/asf/pdfbox/tags/3.0.0/

The SHA-512 checksum of the archive is
279f283f8f97e3adb5e58546f6242b495eef26dacfc256129f790064a73934f16ceb0a7a9164293d506fc0fff462783d296b844611ed18e12b9de0f1724294b5.

Please vote on releasing this package as Apache PDFBox 3.0.0. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 3.0.0
[ ] -1 Do not release this package because...

Here is my +1

Andreas
[jira] [Commented] (PDFBOX-5636) Implement PDF 2.0 dash phase clarification
[ https://issues.apache.org/jira/browse/PDFBOX-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17745259#comment-17745259 ]

Timo Boehme commented on PDFBOX-5636:

LGTM - with respect to optimization, and under the assumption that the magnitude of the (negative) phase will typically be < sum2, one could spare some CPU cycles:
{code:java}
phase += (-phase < sum2) ? sum2 : (Math.floor(-phase / sum2) + 1) * sum2;
{code}

> Implement PDF 2.0 dash phase clarification
>
> Key: PDFBOX-5636
> URL: https://issues.apache.org/jira/browse/PDFBOX-5636
> Project: PDFBox
> Issue Type: Improvement
> Components: Rendering
> Affects Versions: 2.0.29
> Reporter: Tilman Hausherr
> Assignee: Tilman Hausherr
> Priority: Minor
> Fix For: 2.0.30, 3.0.0 PDFBox
>
> Implement clarification of PDF 2.0 when dash phase is negative

--
This message was sent by Atlassian Jira (v8.20.10#820010)
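To make the discussion on PDFBOX-5636 easier to follow, here is a small standalone sketch of the two ways of normalizing a negative dash phase that come up in the comments: the literal loop (add twice the sum of the dash array until the phase is positive, which is how the rule is read in this thread) and the closed form from the comment above. The class and method names are invented for illustration and this is not the code committed to PDFBox. Two assumptions are made: "positive" is read as strictly greater than zero, which is what the one-line formula yields, and a dash array whose sum is not greater than zero leaves the phase untouched so that the loop cannot run forever.

```java
// Illustrative sketch only; names are hypothetical and this is not PDFBox code.
final class DashPhaseSketch
{
    /** Twice the sum of all dash array entries ("sum2" in the Jira comment). */
    static double twiceSum(double[] dashArray)
    {
        double sum = 0;
        for (double d : dashArray)
        {
            sum += d;
        }
        return 2 * sum;
    }

    /** Literal reading: keep adding sum2 until the phase is positive. */
    static double normalizeByLoop(double[] dashArray, double phase)
    {
        double sum2 = twiceSum(dashArray);
        // Guard: without a positive sum the loop below would never terminate.
        if (phase >= 0 || !(sum2 > 0))
        {
            return phase;
        }
        while (phase <= 0)
        {
            phase += sum2;
        }
        return phase;
    }

    /** Closed form from the comment: a single addition instead of a loop. */
    static double normalizeClosedForm(double[] dashArray, double phase)
    {
        double sum2 = twiceSum(dashArray);
        if (phase >= 0 || !(sum2 > 0))
        {
            return phase;
        }
        phase += (-phase < sum2) ? sum2 : (Math.floor(-phase / sum2) + 1) * sum2;
        return phase;
    }

    public static void main(String[] args)
    {
        double[] dash = { 2, 1 };
        // Example from the thread: [2 1] with phase -1 gives -1 + 2*(2+1) = 5
        System.out.println(normalizeByLoop(dash, -1));
        System.out.println(normalizeClosedForm(dash, -1));
    }
}
```

The loop needs one iteration per multiple of sum2, so a phase that is very large and negative relative to the dash array makes it slow; the closed form does the same work in constant time.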
[jira] [Comment Edited] (PDFBOX-5636) Implement PDF 2.0 dash phase clarification
[ https://issues.apache.org/jira/browse/PDFBOX-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17743675#comment-17743675 ]

Timo Boehme edited comment on PDFBOX-5636 at 7/17/23 8:35 AM:

Reading the spec, I cannot follow how the implemented optimized version will produce correct results. E.g. [2 1]; -1: according to the spec the phase will be 5: -1 + 2*(2+1); with the current optimized implementation it will be 4 (actually it produces multiples of 4).

Regarding the initial implementation: it should be checked that the sum of the array components is > 0, otherwise the loop will run forever.

was (Author: tboehme):
Reading the spec, I cannot follow how the implemented optimized version will produce correct results. E.g. [2 1]; -1: according to the spec the phase will be 5: -1 + 2*(2+1); with the current optimized implementation it will be 4 (actually it produces multiples of 4).

Regarding the initial implementation: it should be checked, that the sum of the array components is > 0, otherwise the loop will run forever.

> Implement PDF 2.0 dash phase clarification
>
> Key: PDFBOX-5636
> URL: https://issues.apache.org/jira/browse/PDFBOX-5636
> Project: PDFBox
> Issue Type: Improvement
> Components: Rendering
> Affects Versions: 2.0.29
> Reporter: Tilman Hausherr
> Assignee: Tilman Hausherr
> Priority: Minor
> Fix For: 2.0.30, 3.0.0 PDFBox
>
> Implement clarification of PDF 2.0 when dash phase is negative
[jira] [Commented] (PDFBOX-5636) Implement PDF 2.0 dash phase clarification
[ https://issues.apache.org/jira/browse/PDFBOX-5636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17743675#comment-17743675 ]

Timo Boehme commented on PDFBOX-5636:

Reading the spec, I cannot follow how the implemented optimized version will produce correct results. E.g. [2 1]; -1: according to the spec the phase will be 5: -1 + 2*(2+1); with the current optimized implementation it will be 4 (actually it produces multiples of 4).

Regarding the initial implementation: it should be checked that the sum of the array components is > 0, otherwise the loop will run forever.

> Implement PDF 2.0 dash phase clarification
>
> Key: PDFBOX-5636
> URL: https://issues.apache.org/jira/browse/PDFBOX-5636
> Project: PDFBox
> Issue Type: Improvement
> Components: Rendering
> Affects Versions: 2.0.29
> Reporter: Tilman Hausherr
> Assignee: Tilman Hausherr
> Priority: Minor
> Fix For: 2.0.30, 3.0.0 PDFBox
>
> Implement clarification of PDF 2.0 when dash phase is negative
[jira] [Resolved] (PDFBOX-5624) Infinite loop when parsing Type1 font
[ https://issues.apache.org/jira/browse/PDFBOX-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Boehme resolved PDFBOX-5624.
Resolution: Fixed

> Infinite loop when parsing Type1 font
>
> Key: PDFBOX-5624
> URL: https://issues.apache.org/jira/browse/PDFBOX-5624
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 2.0.28, 3.0.0 PDFBox
> Reporter: Timo Boehme
> Assignee: Timo Boehme
> Priority: Major
> Fix For: 2.0.29, 3.0.0 PDFBox
>
> In two places the Type1Parser has a loop with a negated condition on the next
> token to be read. The loops simply advance to the next token without
> checking whether the token is null, which may happen if the font is
> corrupted or truncated. If this occurs, the parser is stuck in the loop. This
> happened to me for one of the loops with a file which, however, cannot be
> shared. The second loop was found by scanning the code for similar problems.
[jira] [Updated] (PDFBOX-5624) Infinite loop when parsing Type1 font
[ https://issues.apache.org/jira/browse/PDFBOX-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Boehme updated PDFBOX-5624:
Fix Version/s: 2.0.29
               3.0.0 PDFBox

> Infinite loop when parsing Type1 font
>
> Key: PDFBOX-5624
> URL: https://issues.apache.org/jira/browse/PDFBOX-5624
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 2.0.28, 3.0.0 PDFBox
> Reporter: Timo Boehme
> Assignee: Timo Boehme
> Priority: Major
> Fix For: 2.0.29, 3.0.0 PDFBox
>
> In two places the Type1Parser has a loop with a negated condition on the next
> token to be read. The loops simply advance to the next token without
> checking whether the token is null, which may happen if the font is
> corrupted or truncated. If this occurs, the parser is stuck in the loop. This
> happened to me for one of the loops with a file which, however, cannot be
> shared. The second loop was found by scanning the code for similar problems.
[jira] [Updated] (PDFBOX-5624) Infinite loop when parsing Type1 font
[ https://issues.apache.org/jira/browse/PDFBOX-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Boehme updated PDFBOX-5624:
Affects Version/s: 3.0.0 PDFBox

> Infinite loop when parsing Type1 font
>
> Key: PDFBOX-5624
> URL: https://issues.apache.org/jira/browse/PDFBOX-5624
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 2.0.28, 3.0.0 PDFBox
> Reporter: Timo Boehme
> Assignee: Timo Boehme
> Priority: Major
>
> In two places the Type1Parser has a loop with a negated condition on the next
> token to be read. The loops simply advance to the next token without
> checking whether the token is null, which may happen if the font is
> corrupted or truncated. If this occurs, the parser is stuck in the loop. This
> happened to me for one of the loops with a file which, however, cannot be
> shared. The second loop was found by scanning the code for similar problems.
[jira] [Created] (PDFBOX-5624) Infinite loop when parsing Type1 font
Timo Boehme created PDFBOX-5624:

Summary: Infinite loop when parsing Type1 font
Key: PDFBOX-5624
URL: https://issues.apache.org/jira/browse/PDFBOX-5624
Project: PDFBox
Issue Type: Bug
Components: FontBox
Affects Versions: 2.0.28
Reporter: Timo Boehme
Assignee: Timo Boehme

In two places the Type1Parser has a loop with a negated condition on the next token to be read. The loops simply advance to the next token without checking whether the token is null, which may happen if the font is corrupted or truncated. If this occurs, the parser is stuck in the loop. This happened to me for one of the loops with a file which, however, cannot be shared. The second loop was found by scanning the code for similar problems.
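The failure mode described in PDFBOX-5624 can be reproduced with a few lines that have nothing to do with fonts. The sketch below uses a made-up `Lexer` whose `peekToken()` returns null once the input is exhausted; it is not the FontBox `Type1Parser`, and the method names are assumptions chosen for illustration. A loop of the form `while (!"end".equals(lexer.peekToken())) lexer.nextToken();` never terminates on truncated input, because null never equals the terminator. Adding a null check turns the hang into an exception:

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; Lexer and skipUntil are hypothetical, not FontBox API.
final class TokenLoopSketch
{
    /** Minimal token source that signals "no more input" by returning null. */
    static final class Lexer
    {
        private final List<String> tokens;
        private int pos;

        Lexer(List<String> tokens)
        {
            this.tokens = tokens;
        }

        String peekToken()
        {
            return pos < tokens.size() ? tokens.get(pos) : null;
        }

        String nextToken()
        {
            return pos < tokens.size() ? tokens.get(pos++) : null;
        }
    }

    /**
     * Skips tokens until the terminator is the next token and returns how many
     * were skipped. Fails instead of spinning when the input ends early.
     */
    static int skipUntil(Lexer lexer, String terminator) throws IOException
    {
        int skipped = 0;
        while (!terminator.equals(lexer.peekToken()))
        {
            if (lexer.peekToken() == null)
            {
                throw new IOException("Missing '" + terminator + "', input seems to be truncated");
            }
            lexer.nextToken();
            skipped++;
        }
        return skipped;
    }

    public static void main(String[] args) throws IOException
    {
        System.out.println(skipUntil(new Lexer(Arrays.asList("a", "b", "end")), "end"));
        try
        {
            skipUntil(new Lexer(Arrays.asList("a", "b")), "end");
        }
        catch (IOException e)
        {
            System.out.println("truncated input detected: " + e.getMessage());
        }
    }
}
```

Whether a truncated font should raise an exception or be handled more leniently is a design decision for the real parser; the sketch only shows that the loop must have an exit for the null case.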
Re: Apache PDFBox Board Report October 2022 due
+1

Thanks,
Timo

On 09.10.22 at 14:06, Andreas Lehmkuehler wrote:

Hi,

find attached a quick draft of the board report we're expected to submit this month. It's based upon the report wizard template which can be found at [1]

Any comments or additions are appreciated ...

## Description:
The mission of PDFBox is the creation and maintenance of software related to a Java library for working with PDF documents

## Issues:
There are no issues requiring board attention at this time.

## Membership Data:
Apache PDFBox was founded 2009-10-21 (13 years ago)
There are currently 21 committers and 21 PMC members in this project.
The Committer-to-PMC ratio is 1:1.

Community changes, past quarter:
- No new PMC members. Last addition was Matthäus Mayer on 2017-10-16.
- No new committers. Last addition was Joerg O. Henne on 2017-10-09.

## Project Activity:
Recent releases:
2.0.27 was released on 2022-09-29.
1.8.17 was released on 2022-09-15.
2.0.26 was released on 2022-04-21.

## Community Health:
- there is a steady stream of contributions, bug reports and questions on the mailing lists
- there are a lot of refactorings, improvements and bugfixes
- we are still planning to cut the first beta release of our next major version 3.0.0
- to do so we start to identify the last tickets with breaking changes to be included in 3.0.0.
- due to the releases last month the preparations for the beta release were slowed down a little
- there was an article about "maintaining interoperability in open source software". For it, the authors studied the activities within Apache PDFBox for two years without the knowledge of the community. We don't see any surprises, see https://s.apache.org/aljtz for further details

Andreas

[1] https://reporter.apache.org/wizard/?pdfbox
Re: [VOTE] Release Apache PDFBox 2.0.27
Adding the twelvemonkeys JPEG library fixed the issue (JPEG CMYK images in PDFs are rendered correctly under Java 17). So is this a known problem with the original Java JPEG implementation (as used by the pure pdfbox-app)? I remember that there might be an alternative code path for (JPEG) CMYK rendering up to Java 8 in PDFBox, which might explain why it works up to Java 8?

Best regards,
Timo

On 28.09.22 at 12:07, Tilman Hausherr wrote:

Please try with the twelvemonkeys library too

Tilman

--- Original message ---
From: Timo Boehme
Subject: Re: [VOTE] Release Apache PDFBox 2.0.27
Date: 28 September 2022, 10:02
To: dev@pdfbox.apache.org

+1

No regression found compared to previous versions, but I detected a problem with Java 11 and Java 17 when rendering some CMYK JPEG images (inverted colors; Java 8 is fine). I have to investigate further and open a Jira issue.

Thanks,
Timo

On 26.09.22 at 17:28, Andreas Lehmkuehler wrote:

a candidate for the PDFBox 2.0.27 release is available at:
https://dist.apache.org/repos/dist/dev/pdfbox/2.0.27/

The release candidate is a zip archive of the sources in:
https://svn.apache.org/repos/asf/pdfbox/tags/2.0.27/

The SHA-512 checksum of the archive is
59a5675f5d1d34f092adc019679f7d10e7e93c0f554a002ac29d48cbffcaa600d930309fa94a92191c01ead8da905cbb37ce5e233dcc9b8732a881d4abf75def.

Please vote on releasing this package as Apache PDFBox 2.0.27. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 2.0.27
[ ] -1 Do not release this package because...

Here is my +1

Andreas
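When comparing results with and without the twelvemonkeys library, it can help to confirm which JPEG readers the JVM actually sees. The sketch below only uses the standard `javax.imageio.ImageIO` lookup; the class name `JpegReaderCheck` is made up, and the idea that PDFBox's JPEG decoding picks its reader through this lookup is an assumption on my side, not something stated in this thread. The `com.twelvemonkeys` package prefix is what the TwelveMonkeys plugin classes use.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

// Illustrative diagnostic only; not part of PDFBox.
final class JpegReaderCheck
{
    /** Class names of all ImageIO readers registered for the "jpeg" format, in lookup order. */
    static List<String> jpegReaderClassNames()
    {
        List<String> names = new ArrayList<>();
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("jpeg");
        while (readers.hasNext())
        {
            names.add(readers.next().getClass().getName());
        }
        return names;
    }

    public static void main(String[] args)
    {
        for (String name : jpegReaderClassNames())
        {
            // Mark readers that come from the TwelveMonkeys plugin
            String marker = name.startsWith("com.twelvemonkeys") ? " (twelvemonkeys)" : "";
            System.out.println(name + marker);
        }
    }
}
```

Running it once with only pdfbox-app on the class path and once with the twelvemonkeys JPEG plugin added shows whether the plugin was picked up at all, which rules out a packaging problem before digging into the color conversion itself.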
Re: [VOTE] Release Apache PDFBox 2.0.27
+1

No regression found compared to previous versions, but I detected a problem with Java 11 and Java 17 when rendering some CMYK JPEG images (inverted colors; Java 8 is fine). I have to investigate further and open a Jira issue.

Thanks,
Timo

On 26.09.22 at 17:28, Andreas Lehmkuehler wrote:

a candidate for the PDFBox 2.0.27 release is available at:
https://dist.apache.org/repos/dist/dev/pdfbox/2.0.27/

The release candidate is a zip archive of the sources in:
https://svn.apache.org/repos/asf/pdfbox/tags/2.0.27/

The SHA-512 checksum of the archive is
59a5675f5d1d34f092adc019679f7d10e7e93c0f554a002ac29d48cbffcaa600d930309fa94a92191c01ead8da905cbb37ce5e233dcc9b8732a881d4abf75def.

Please vote on releasing this package as Apache PDFBox 2.0.27. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 2.0.27
[ ] -1 Do not release this package because...

Here is my +1

Andreas
Re: [VOTE] Release Apache PDFBox 1.8.17
+1

It is really interesting how much faster (often 10-20 times) and more correct 2.x is for parsing and rendering compared to the 1.x version.

Timo

On 12.09.22 at 18:50, Andreas Lehmkuehler wrote:

a candidate for the PDFBox 1.8.17 release is available at:
https://dist.apache.org/repos/dist/dev/pdfbox/1.8.17/

The release candidate is a zip archive of the sources in:
https://svn.apache.org/repos/asf/pdfbox/tags/1.8.17/

The SHA-512 checksum of the archive is
e808b3b159b61b5928b0ad983b3bdadfc694ee80ca8a209669d591f90335165a45de684ea04b23d0a149bfc7ce5d890a287cb4e79300f3a08bb954884024c909.

Please vote on releasing this package as Apache PDFBox 1.8.17. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 1.8.17
[ ] -1 Do not release this package because...

Here is my +1

Andreas
[jira] [Commented] (PDFBOX-5455) java.lang.ExceptionInInitializerError in org.apache.pdfbox.util.PDFTextStripper class
[ https://issues.apache.org/jira/browse/PDFBOX-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552602#comment-17552602 ]

Timo Boehme commented on PDFBOX-5455:

You are using a very old version of PDFBox. Even the 1.8 branch is at 1.8.16 by now, and it is highly recommended to use the newest version, 2.0.26. Please test your file with a current version.

> java.lang.ExceptionInInitializerError in
> org.apache.pdfbox.util.PDFTextStripper class
>
> Key: PDFBOX-5455
> URL: https://issues.apache.org/jira/browse/PDFBOX-5455
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.8.9
> Reporter: Kalpesh Patel
> Priority: Minor
>
> Unable to read pdf file. Getting below exception -
> Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds
> for length 1
> at
> org.apache.pdfbox.util.PDFTextStripper.(PDFTextStripper.java:123)
>
> Let me know if more details needed
>
> [~Bettenburg]
>
> [~will86]
[jira] [Updated] (PDFBOX-5455) java.lang.ExceptionInInitializerError in org.apache.pdfbox.util.PDFTextStripper class
[ https://issues.apache.org/jira/browse/PDFBOX-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Boehme updated PDFBOX-5455:
Priority: Minor (was: Blocker)

> java.lang.ExceptionInInitializerError in
> org.apache.pdfbox.util.PDFTextStripper class
>
> Key: PDFBOX-5455
> URL: https://issues.apache.org/jira/browse/PDFBOX-5455
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.8.9
> Reporter: Kalpesh Patel
> Priority: Minor
>
> Unable to read pdf file. Getting below exception -
> Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds
> for length 1
> at
> org.apache.pdfbox.util.PDFTextStripper.(PDFTextStripper.java:123)
>
> Let me know if more details needed
>
> [~Bettenburg]
>
> [~will86]
Re: [VOTE] Release Apache PDFBox 2.0.25
+1

Thanks,
Timo

On 13.12.21 at 20:02, Andreas Lehmkuehler wrote:

Hi,

a candidate for the PDFBox 2.0.25 release is available at:
https://dist.apache.org/repos/dist/dev/pdfbox/2.0.25/

The release candidate is a zip archive of the sources in:
http://svn.apache.org/repos/asf/pdfbox/tags/2.0.25/

The SHA-512 checksum of the archive is
e143b2a9aaa4b1f1be72e16a1c9968dacfcb3e89b4f21fdbd0580d8c9f1c9b54ee38d05fe3e52ff93493c858c51090fdd8256d22153cffba1e9b523fdbd1f2f4.

Please vote on releasing this package as Apache PDFBox 2.0.25. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 2.0.25
[ ] -1 Do not release this package because...

Here is my +1

Andreas
Re: [VOTE] Release Apache PDFBox 2.0.24
+1

Thank you
Timo

On 07.06.21 at 18:51, Andreas Lehmkuehler wrote:

Hi,

a candidate for the PDFBox 2.0.24 release is available at:
https://dist.apache.org/repos/dist/dev/pdfbox/2.0.24/

The release candidate is a zip archive of the sources in:
http://svn.apache.org/repos/asf/pdfbox/tags/2.0.24/

The SHA-512 checksum of the archive is
5d55b3cadbbae266d90c47f5b10c9b09b6dc16f53b77a0cf15c78e62fc69afc7b6eab5a4329608ecdf25de9194b38db1f7d23e7d71af473cc1bf7b09b0028642.

Please vote on releasing this package as Apache PDFBox 2.0.24. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 2.0.24
[ ] -1 Do not release this package because...

Here is my +1

Andreas
Re: [VOTE] Retire Subproject Preflight
Hi,

+1 since there hasn't really been any development here in recent years, and the efforts to make the PDFBox parser more lenient with broken PDF documents contradict the specification checks of Preflight.

Timo

On 27.05.21 at 08:33, Andreas Lehmkuehler wrote:

Hi,

a discussion came up on dev@pdfbox [1] to retire Preflight and I had the impression that we already reached consensus to do so. I'd like to run a formal vote so that this topic won't get lost in some mailing list thread.

Please vote on retiring the subproject Preflight with Apache PDFBox 4.0.0. The vote is open for the next 7 days and passes if a majority of at least three +1 PDFBox PMC votes are cast.

[ ] +1 Remove Preflight with Apache PDFBox 4.0.0
[ ] -1 Do not remove Preflight because...

Here is my +1

Andreas

P.S.: I've extended the voting period to 7 days to ensure that everybody has a chance to think about it and speak up if necessary.

[1] https://lists.apache.org/thread.html/r8abffe02ff4a94be93b7799b589532dc2a3384d6c5cd727bc388250a%40%3Cdev.pdfbox.apache.org%3E
[jira] [Commented] (PDFBOX-5176) java.io.IOException: Page tree root must be a dictionary
[ https://issues.apache.org/jira/browse/PDFBOX-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343163#comment-17343163 ]

Timo Boehme commented on PDFBOX-5176:

I think that, as we are confronted with fuzzer-generated PDFs and hand-crafted bad PDFs, we should strive for PDFBox remaining stable (no memory overflow, infinite loop or stack overflow) independent of the PDF content, at least in the long run. That said, processing or not processing a dictionary/value should not influence the stability, and thus the question of how to process the problematic dictionary should be answered purely by how we best preserve the document content - maybe by parsing as much as possible and only skipping clearly corrupted parts(?)

> java.io.IOException: Page tree root must be a dictionary
>
> Key: PDFBOX-5176
> URL: https://issues.apache.org/jira/browse/PDFBOX-5176
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 3.0.0 PDFBox
> Reporter: Tilman Hausherr
> Priority: Major
> Attachments: GHOSTSCRIPT-695040-0.zip-71.pdf,
> GHOSTSCRIPT-695040-0.zip-87.pdf
>
> Happens only on 3.0, not on 2.0.23
Re: [VOTE] Release Apache PDFBox 3.0.0-RC1
Hi,

+1

I found only one small bug: when calling

java -jar pdfbox-app-3.0.0-RC1.jar help debug

the help says

Usage: pdfbox pdfdebugger

but it should say

Usage: pdfbox debug

However, this is a minor problem which should be OK for RC1.

Best regards,
Timo

On 29.03.21 at 19:08, Andreas Lehmkuehler wrote:

Hi,

a candidate for the PDFBox 3.0.0-RC1 release is available at:
https://dist.apache.org/repos/dist/dev/pdfbox/3.0.0-RC1/

The release candidate is a zip archive of the sources in:
http://svn.apache.org/repos/asf/pdfbox/tags/3.0.0-RC1/

The SHA-512 checksum of the archive is
b4ed9fec1d5e86422452bda3d9ec66206aa665277d4aebe1e7053a0ef38de211d8440375bcaf05a4a5c0070d2bdfa9d30df94df2c128f6c15c8fb5b008550987.

Please vote on releasing this package as Apache PDFBox 3.0.0-RC1. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 3.0.0-RC1
[ ] -1 Do not release this package because...

Here is my +1

Andreas
Re: [VOTE] Release Apache PDFBox 2.0.23
Hi,

+1

Thanks,
Timo

On 15.03.21 at 19:44, Andreas Lehmkuehler wrote:

Hi,

a candidate for the PDFBox 2.0.23 release is available at:
https://dist.apache.org/repos/dist/dev/pdfbox/2.0.23/

The release candidate is a zip archive of the sources in:
http://svn.apache.org/repos/asf/pdfbox/tags/2.0.23/

The SHA-512 checksum of the archive is
9333cc6557b36d0355e84aa046b5f97b3f5d6e55337b316808e9cb04cec774e0db74f8a12079ca30104fe2853c7c1b4f090483238c47d0a2ccf7d5071b606378.

Please vote on releasing this package as Apache PDFBox 2.0.23. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 2.0.23
[ ] -1 Do not release this package because...

Here is my +1

Andreas
Re: Apache PDFBox Board Report October 2020 due
+1 Thanks, Timo On 11.10.20 at 19:09, Andreas Lehmkuehler wrote: Hi, find attached a quick draft of the board report we're expected to submit this month. It's based upon the report wizard template which can be found at [1] Any comments or additions are appreciated ... ## Description: The mission of PDFBox is the creation and maintenance of software related to Java library for working with PDF documents ## Issues: There are no issues requiring board attention at this time. ## Membership Data: Apache PDFBox was founded 2009-10-21 (11 years ago) There are currently 21 committers and 21 PMC members in this project. The Committer-to-PMC ratio is 1:1. Community changes, past quarter: - No new PMC members. Last addition was Matthäus Mayer on 2017-10-16. - No new committers. Last addition was Joerg O. Henne on 2017-10-09. ## Project Activity: Recent releases: 2.0.21 was released on 2020-08-20. 2.0.20 was released on 2020-05-07. 2.0.19 was released on 2020-02-23. ## Community Health: - there is a steady stream of contributions, bug reports and questions on the mailing lists - the improvement of the on demand parser in the trunk is an ongoing effort - there are a lot of refactorings, improvements and bugfixes (a selection follows) -- support for compressed object streams contributed by Christian Appl (ongoing) -- performance enhancements contributed by Alfred Faltiska -- support for incremental updates -- improve test coverage and code quality Andreas [1] https://reporter.apache.org/wizard/?pdfbox -- OntoChem GmbH Blücherstraße 24 06120 Halle (Saale) Germany phone: +49 345 478 047 4 | fax: +49 345 478 047 1 email: timo.boe...@ontochem.com | web: www.ontochem.com HRB 215461 Amtsgericht Stendal | USt-IdNr.: DE246232735 managing director: Lutz Weber
[jira] [Commented] (PDFBOX-4950) No lcms in java.library.path?
[ https://issues.apache.org/jira/browse/PDFBOX-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189316#comment-17189316 ] Timo Boehme commented on PDFBOX-4950: - Have you checked using ldd if all dependencies are ok for the lcms library? I had trouble with a native library using the Alpine Linux and its musl libc implementation. > No lcms in java.library.path? > - > > Key: PDFBOX-4950 > URL: https://issues.apache.org/jira/browse/PDFBOX-4950 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.16 > Environment: Alpine 3.10.0 > Docker container > OpenJDK 11 >Reporter: Me >Priority: Major > > Hi. I'm working with on a Camunda (BPM engine) integration that leverages > pdfbox to generate a pdf programmatically. I have the pdfbox jar deployed > with my WAR: > {quote}ls -l /camunda/webapps/test-workflows/WEB-INF/lib/ > -rw-r- 1 camunda camunda 62983 Aug 3 16:35 activation-1.1.jar > -rw-r- 1 camunda camunda 46874 Aug 3 16:35 > camunda-bpm-mail-core-1.2.0.jar > -rw-r- 1 camunda camunda 4332 Aug 3 16:35 > camunda-commons-logging-1.6.1.jar > -rw-r- 1 camunda camunda 12050 Aug 3 16:35 > camunda-commons-utils-1.6.1.jar > -rw-r- 1 camunda camunda 19240 Aug 3 16:35 camunda-connect-core-1.1.0.jar > -rw-r- 1 camunda camunda 23464 Aug 3 16:35 > camunda-identity-ldap-7.10.0.jar > -rw-r- 1 camunda camunda 246174 Aug 3 16:35 commons-beanutils-1.9.3.jar > -rw-r- 1 camunda camunda 335042 Aug 3 16:35 commons-codec-1.11.jar > -rw-r- 1 camunda camunda 575389 Aug 3 16:35 commons-collections-3.2.1.jar > -rw-r- 1 camunda camunda 752798 Aug 3 16:35 commons-collections4-4.2.jar > -rw-r- 1 camunda camunda 434678 Aug 3 16:35 commons-lang3-3.4.jar > -rw-r- 1 camunda camunda 61829 Aug 3 16:35 commons-logging-1.2.jar > -rw-r- 1 camunda camunda 182954 Aug 3 16:35 commons-text-1.3.jar > -rw-r- 1 camunda camunda 1558165 Aug 3 16:35 fontbox-2.0.16.jar > -rw-r- 1 camunda camunda 767916 Aug 3 16:35 httpclient-4.5.7.jar > -rw-r- 1 camunda camunda 326874 Aug 3 16:35 
httpcore-4.4.11.jar > -rw-r- 1 camunda camunda 41779 Aug 3 16:35 httpmime-4.5.7.jar > -rw-r- 1 camunda camunda 66519 Aug 3 16:35 jackson-annotations-2.9.0.jar > -rw-r- 1 camunda camunda 324036 Aug 3 16:35 jackson-core-2.9.7.jar > -rw-r- 1 camunda camunda 1350857 Aug 3 16:35 jackson-databind-2.9.7.jar > -rw-r- 1 camunda camunda 603571 Aug 3 16:35 javax.mail-1.5.5.jar > -rw-r- 1 camunda camunda 170348 Aug 3 16:35 opencsv-4.6.jar > -rw-r- 1 camunda camunda 2684592 Aug 3 16:35 pdfbox-2.0.16.jar > -rw-r- 1 camunda camunda 29257 Aug 3 16:35 slf4j-api-1.7.7.jar > -rw-r- 1 camunda camunda 62599 Aug 3 16:35 smtp-1.6.0.jar > {quote} > When I try to use PDDocument.load(): > {quote}PDDocument pdfDocument = > PDDocument.load(this.getClass().getClassLoader().getResourceAsStream(template)); > {quote} > I observe the following exception that lcms isn't in the java library path: > {quote}01-Sep-2020 14:02:06.445 SEVERE [http-nio-8080-exec-1] > org.camunda.commons.logging.BaseLogger.logError ENGINE-16004 Exception while > closing command context: no lcms in java.library.path: > [/usr/lib/jvm/java-11-openjdk/lib/server, /usr/lib/jvm/java-11-openjdk/lib, > /usr/lib/jvm/java-11-openjdk/../lib, /usr/java/packages/lib, /usr/lib64, > /lib64, /lib, /usr/lib]bpm | java.lang.UnsatisfiedLinkError: no lcms in > java.library.path: [/usr/lib/jvm/java-11-openjdk/lib/server, > /usr/lib/jvm/java-11-openjdk/lib, /usr/lib/jvm/java-11-openjdk/../lib, > /usr/java/packages/lib, /usr/lib64, /lib64, /lib, /usr/lib]bpm | at > java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2660)bpm | at > java.base/java.lang.Runtime.loadLibrary0(Runtime.java:829)bpm | at > java.base/java.lang.System.loadLibrary(System.java:1867)bpm | at > java.desktop/sun.java2d.cmm.lcms.LCMS$1.run(LCMS.java:209)bpm | at > java.base/java.security.AccessController.doPrivileged(Native Method)bpm | > at java.desktop/sun.java2d.cmm.lcms.LCMS.getModule(LCMS.java:202)bpm | at > 
java.desktop/sun.java2d.cmm.lcms.LcmsServiceProvider.getModule(LcmsServiceProvider.java:34)bpm > | at > java.desktop/sun.java2d.cmm.CMMServiceProvider.getColorManagementModule(CMMServiceProvider.java:31)bpm > | at > java.desktop/sun.java2d.cmm.CMSManager.getModule(CMSManager.java:68)bpm | > at > java.desktop/java.awt.color.ICC_ColorSpace.toRGB(ICC_ColorSpace.java:177)bpm > | at > org.apache.pdfbox.pdmodel.graphics.color.PDDeviceRGB.init(PDDeviceRGB.java:
[jira] [Commented] (PDFBOX-4950) No lcms in java.library.path?
[ https://issues.apache.org/jira/browse/PDFBOX-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189278#comment-17189278 ] Timo Boehme commented on PDFBOX-4950: - According to the java.library.path shown in your error log, /usr/lib is not included, only /usr/lib64 or /lib (unless I overlooked it).
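Rather than reading the path list out of the error log by eye, the directories can be checked programmatically. The following is a minimal sketch, assuming it is run with the same JVM and inside the same container as the failing application; the class name LibraryPathCheck is made up for illustration. It uses only System.getProperty("java.library.path") and System.mapLibraryName, and reports where, if anywhere, a file for the "lcms" library exists. Note that the list quoted in the issue description ends with "/lib, /usr/lib", so a missing or unloadable library file (for example because of musl libc dependencies) may be the more likely cause than a missing directory.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of PDFBox or the JDK.
final class LibraryPathCheck {

    // Returns the full paths of all files for the given library found in the given search path.
    static List<String> findLibrary(String libraryPath, String libName) {
        // Maps e.g. "lcms" to "liblcms.so" on Linux; the result is platform specific.
        String fileName = System.mapLibraryName(libName);
        List<String> hits = new ArrayList<>();
        for (String dir : libraryPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty entries such as those produced by "::"
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                hits.add(candidate.getPath());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String libraryPath = System.getProperty("java.library.path", "");
        System.out.println("java.library.path = " + libraryPath);
        List<String> hits = findLibrary(libraryPath, "lcms");
        if (hits.isEmpty()) {
            System.out.println("no " + System.mapLibraryName("lcms") + " found in any listed directory");
        } else {
            hits.forEach(hit -> System.out.println("found: " + hit));
        }
    }
}
```

If a file is found but loading still fails, running ldd on that file, as suggested in the other comment on this issue, would be the next step.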
Re: [VOTE] Release Apache PDFBox 2.0.21
Hi, +1 Thanks, Timo On 17.08.20 at 17:56, Andreas Lehmkuehler wrote: Hi, a candidate for the PDFBox 2.0.21 release is available at: https://dist.apache.org/repos/dist/dev/pdfbox/2.0.21/ The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/pdfbox/tags/2.0.21/ The SHA-512 checksum of the archive is 18966eb4201de80b0d3220ab68d8d6062d23346c0ea6263df793c8c9f020ac2b3f173d7393c1bded46a474d44d5b8b839b0ca7f0bcba7b3d7d50196f98942691. Please vote on releasing this package as Apache PDFBox 2.0.21. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast. [ ] +1 Release this package as Apache PDFBox 2.0.21 [ ] -1 Do not release this package because... Here is my +1 Andreas
Re: Apache PDFBox Board Report July 2020 due
Hi, +1, Thanks, Timo On 03.07.20 at 16:30, Andreas Lehmkuehler wrote: Hi, find attached a quick draft of the board report we're expected to submit this month. It's based upon the report wizard template which can be found at [1] Any comments or additions are appreciated ... ## Description: The mission of PDFBox is the creation and maintenance of software related to Java library for working with PDF documents ## Issues: There are no issues requiring board attention at this time. ## Membership Data: Apache PDFBox was founded 2009-10-21 (11 years ago) There are currently 21 committers and 21 PMC members in this project. The Committer-to-PMC ratio is 1:1. Community changes, past quarter: - No new PMC members. Last addition was Matthäus Mayer on 2017-10-16. - No new committers. Last addition was Joerg O. Henne on 2017-10-09. ## Project Activity: Recent releases: 2.0.20 was released on 2020-05-07. 2.0.19 was released on 2020-02-23. 2.0.18 was released on 2019-12-23. ## Community Health: - there is a steady stream of contributions, bug reports and questions on the mailing lists - the improvement of the on demand parser in the trunk is an ongoing effort - there are a lot of refactorings, improvements and bugfixes - our website build has been converted to a fully automated Maven build without the need to install any additional software - Maruan, one of our PMC members, donated a virtual server which is now the home for Tika's bunch of test docs to be used for regression tests in PDFBox, POI and Tika Andreas [1] https://reporter.apache.org/wizard/?pdfbox
Re: Apache PDFBox Board Report April 2020 due
+1 Thanks, Timo On 03.04.20 at 16:51, Andreas Lehmkuehler wrote: Hi, find attached a quick draft of the board report we're expected to submit this month. It's based upon the report wizard template which can be found at [1] Any comments or additions are appreciated ... ## Description: The mission of PDFBox is the creation and maintenance of software related to Java library for working with PDF documents ## Issues: There are no issues requiring board attention at this time. ## Membership Data: Apache PDFBox was founded 2009-10-21 (10 years ago) There are currently 21 committers and 21 PMC members in this project. The Committer-to-PMC ratio is 1:1. Community changes, past quarter: - No new PMC members. Last addition was Matthäus Mayer on 2017-10-16. - No new committers. Last addition was Joerg O. Henne on 2017-10-09. ## Project Activity: Recent releases: 2.0.19 was released on 2020-02-23. 2.0.18 was released on 2019-12-23. 3.0.3 JBIG2 was released on 2019-12-18. ## Community Health: - there is a steady stream of contributions, bug reports and questions on the mailing lists - the improvement of the on demand parser in the trunk is an ongoing effort and a base version is available now. First results are promising with regard to performance and memory footprint. There are some TODOs on our 3.0 list - there are also a lot of refactorings, improvements and bugfixes Andreas [1] https://reporter.apache.org/wizard/?pdfbox
Re: [VOTE] Release Apache PDFBox 2.0.19
+1 Thanks. Best regards, Timo On 20.02.20 at 18:46, Andreas Lehmkuehler wrote: Hi, a candidate for the PDFBox 2.0.19 release is available at: https://dist.apache.org/repos/dist/dev/pdfbox/2.0.19/ The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/pdfbox/tags/2.0.19/ The SHA-512 checksum of the archive is b9dcb725ca5123ebe9a8018532733acd443a345fe0a0448dec9ce5776c0b8b2fac420302e550064150403b987960b98a6bb85bff5a86bfbe8d291ba19ac950f8. Please vote on releasing this package as Apache PDFBox 2.0.19. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast. [ ] +1 Release this package as Apache PDFBox 2.0.19 [ ] -1 Do not release this package because... Here is my +1 Andreas
Re: [VOTE] Release Apache PDFBox JBIG2 ImageIO 3.0.3
+1 Thanks, Timo On 14.12.19 at 15:53, Andreas Lehmkuehler wrote: Hi, a candidate for the PDFBox JBIG2 ImageIO 3.0.3 release is available at: https://dist.apache.org/repos/dist/dev/pdfbox/jbig2-imageio/3.0.3/ The release candidate is a zip archive of the sources in: https://github.com/apache/pdfbox-jbig2/tree/3.0.3/ The SHA-512 checksum of the archive is 5350b4ce89af72eea5069f6ea5fc830238e4df711712506405aaf0e14546a1b07155b8c5225b47f0d40ce2821032426a2987adbe0df63c536cae4fb319b5c700. Please vote on releasing this package as Apache PDFBox JBIG2 ImageIO 3.0.3. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast. [ ] +1 Release this package as Apache PDFBox JBIG2 ImageIO 3.0.3 [ ] -1 Do not release this package because... Here is my +1 Andreas
Re: Apache PDFBox Board Report October 2019 due
+1 Thanks, Timo On 03.10.19 at 13:45, Andreas Lehmkuehler wrote: Hi, find attached a quick draft of the board report we're expected to submit this month. It's based upon the report wizard template which can be found at [1] ## Description: The mission of PDFBox is the creation and maintenance of software related to Java library for working with PDF documents ## Issues: There are no issues requiring board attention at this time. ## Membership Data: This month is the 10th anniversary of Apache PDFBox. We graduated as TLP on 2009-10-21. There are currently 21 committers and 21 PMC members in this project. The Committer-to-PMC ratio is 1:1. Community changes, past quarter: - No new PMC members. Last addition was Matthäus Mayer on 2017-10-16. - No new committers. Last addition was Joerg O. Henne on 2017-10-09. ## Project Activity: Software development activity: - the work on 2.0.18 has already started with a handful of fixes - the minimum requirement for the trunk is now Java 8 - the improvement of the on demand parser of the trunk is an ongoing effort, as well as some other refactorings and improvements - we are waiting for our sonar project to be moved to the new location Recent releases: - 2.0.17 was released on 2019-09-20 - 2.0.16 was released on 2019-06-27 - 2.0.15 was released on 2019-04-11 ## Community Health: There is a steady stream of contributions, bug reports and questions on the mailing lists. Andreas [1] https://reporter.apache.org/wizard/?pdfbox
Re: Switch trunk to java 8
+1 Timo On 15.09.19 at 11:00, Andreas Lehmkuehler wrote: Hi, I'd like to switch the trunk to Java 8 as I'd like to use some Java 8 features like streams in the near future. Are there any objections? Andreas
Re: [VOTE] Release Apache PDFBox 2.0.17
+1 Thanks, Timo On 17.09.19 at 20:22, Andreas Lehmkuehler wrote: Hi, a candidate for the PDFBox 2.0.17 release is available at: https://dist.apache.org/repos/dist/dev/pdfbox/2.0.17/ The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/pdfbox/tags/2.0.17/ The SHA-512 checksum of the archive is 2c87384ec0ce768b01a653951c570dbb075f3e1ec63a7bf58d652bcab8e7c73375ae8ce2d133ba852d1ec21f999f3a12eeeaa8b982f5b007e92f5f1683032798. Please vote on releasing this package as Apache PDFBox 2.0.17. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast. [ ] +1 Release this package as Apache PDFBox 2.0.17 [ ] -1 Do not release this package because... Here is my +1 Andreas
[jira] [Commented] (PDFBOX-4601) in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected scratch file size of 196608 but found 192512
[ https://issues.apache.org/jira/browse/PDFBOX-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890976#comment-16890976 ] Timo Boehme commented on PDFBOX-4601: - So while it's good to hear the fix is working, I'm not convinced we should really apply the patch. When RandomAccessFile, which is quite a basic class in Java, is not working correctly, then the whole system should be regarded as unreliable (at least for running Java; the mentioned reason suggests that other functions may be broken as well). Fortunately, it seems this bug will be fixed shortly. Thus maybe one should wait for a fixed AWS system instead of adding the workaround to our code. So far this kind of problem has not been reported by anyone else. WDYT? > in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected > scratch file size of 196608 but found 192512 > - > > Key: PDFBOX-4601 > URL: https://issues.apache.org/jira/browse/PDFBOX-4601 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.12, 2.0.16 > Environment: AWS Lambda >Reporter: biswajit >Priority: Major > Fix For: 2.0.17 > > > in AWS lambda pdf merge giving error as > {{Error in pdf consolidation: Expected scratch file size of 196608 but found > 192512.}} > *Code:* > {code} > PDFMergerUtility pdfMerger = new PDFMergerUtility(); > pdfMerger.addSources(sources); > pdfMerger.setDestinationStream(mergedPDFOutputStream); > pdfMerger.mergeDocuments(MemoryUsageSetting.setupTempFileOnly()); > {code} > both InputStream and OutputStream are ByteArrayInputStream and > ByteArrayOutputStream. AWS Lambda environment has 512MB space available only > for /tmp partition. This could be an issue or not I am not sure. And AWS > lambda do not permit other directory than /tmp partition to create files. > And while reading into the code I found below piece of code which I think > always be true. 
Because if you add some constant amount to an integer that > will always be constant amount greater than its original value > in ScratchFile.java => enlarge() method: > {code} > if (pageCount + ENLARGE_PAGE_COUNT > pageCount) > { > fileLen += ENLARGE_PAGE_COUNT * PAGE_SIZE; > raf.setLength(fileLen); > freePages.set(pageCount, pageCount + ENLARGE_PAGE_COUNT); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
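The reporter's remark that the condition in enlarge() is always true holds for mathematical integers, but not for Java's int type, where addition silently wraps around on overflow. One plausible reading, which is an assumption and not confirmed anywhere in this thread, is that the check acts as an overflow guard for very large page counts. The sketch below isolates just that comparison; the class name OverflowGuardDemo is made up, and the value 16 for ENLARGE_PAGE_COUNT is a placeholder chosen for illustration rather than taken from the quoted code.

```java
// Hypothetical demo class; only the comparison mirrors the quoted enlarge() snippet.
final class OverflowGuardDemo {

    // Placeholder value for illustration; the real constant is defined in ScratchFile.
    static final int ENLARGE_PAGE_COUNT = 16;

    // Same comparison as in the quoted code: false once the int addition overflows.
    static boolean canEnlarge(int pageCount) {
        return pageCount + ENLARGE_PAGE_COUNT > pageCount;
    }

    public static void main(String[] args) {
        // 48 pages of 4096 bytes correspond to the 196608 bytes from the error message.
        System.out.println(canEnlarge(48));                    // prints "true"
        // Close to Integer.MAX_VALUE the sum wraps to a negative number.
        System.out.println(canEnlarge(Integer.MAX_VALUE - 5)); // prints "false"
    }
}
```

So the condition is true for every realistic page count and only becomes false when the sum no longer fits into an int.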
[jira] [Comment Edited] (PDFBOX-4601) in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected scratch file size of 196608 but found 192512
[ https://issues.apache.org/jira/browse/PDFBOX-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890155#comment-16890155 ] Timo Boehme edited comment on PDFBOX-4601 at 7/22/19 1:19 PM: -- As I don't have this buggy environment, it is a bit hard to recommend something. Maybe writing the last byte of the enlarged RAF will trigger the correct length, something like {code:java} long origFilePointer = raf.getFilePointer(); raf.seek(fileLen - 1); raf.write(0); raf.seek(origFilePointer); {code} after {code:java} raf.setLength(fileLen); {code} Probably the seek operation alone is already enough? Please report back if this helps; it may be worth finding the least costly IO operation for getting the real size. We may add this as an optional workaround which could be enabled using a system property. was (Author: tboehme): As I don't have this buggy environment it is a bit hard to recoment. Maybe writing the last byte of the enlarged RAF may trigger the correct length, something like {code:java} long origFilePointer = raf.getFilePointer(); raf.seek(fileLen - 1); raf.write(0); raf.seek(origFilePointer); {code} maybe already the seek operation is enough? Please report back if this helps - maybe finding the least costly IO-operation for getting the real size. We may add this as an optional workaround which could be enabled using a system property.
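To try the suggested workaround outside of PDFBox, the snippet from the comment can be wrapped into a small standalone program. This is a minimal sketch under stated assumptions: the class name EnlargeWorkaroundSketch and the methods enlarge and demo are made up for illustration and do not exist in PDFBox; only the RandomAccessFile calls are taken from the comment. Whether touching the last byte actually changes the reported length on the affected AWS Lambda environment is untested here; on an ordinary file system the length is expected to be correct either way.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

// Hypothetical standalone reproduction of the workaround proposed in the comment.
final class EnlargeWorkaroundSketch {

    // Enlarges the file, touches its last byte, and returns the length reported afterwards.
    static long enlarge(RandomAccessFile raf, long newLength) throws IOException {
        long oldLength = raf.length();
        raf.setLength(newLength);
        // Only touch the last byte if the file really grew, so existing data is never overwritten.
        if (newLength > oldLength) {
            long origFilePointer = raf.getFilePointer();
            raf.seek(newLength - 1);
            raf.write(0);
            raf.seek(origFilePointer);
        }
        return raf.length();
    }

    // Runs the workaround on a temporary file using the expected size from the error message.
    static long demo() {
        try {
            File tmp = File.createTempFile("scratch", ".tmp");
            try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
                return enlarge(raf, 196608L);
            } finally {
                tmp.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("reported length after enlarge: " + demo());
    }
}
```

Running this in the failing environment once with and once without the write would show whether the extra IO operation makes a difference there.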
[jira] [Commented] (PDFBOX-4601) in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected scratch file size of 196608 but found 192512
[ https://issues.apache.org/jira/browse/PDFBOX-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890155#comment-16890155 ] Timo Boehme commented on PDFBOX-4601: - As I don't have this buggy environment, it is a bit hard to recommend something. Maybe writing the last byte of the enlarged RAF will trigger the correct length, something like {code:java} long origFilePointer = raf.getFilePointer(); raf.seek(fileLen - 1); raf.write(0); raf.seek(origFilePointer); {code} Maybe the seek operation alone is already enough? Please report back if this helps; it may be worth finding the least costly IO operation for getting the real size. We may add this as an optional workaround which could be enabled using a system property.
[jira] [Comment Edited] (PDFBOX-4601) in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected scratch file size of 196608 but found 192512
[ https://issues.apache.org/jira/browse/PDFBOX-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890155#comment-16890155 ] Timo Boehme edited comment on PDFBOX-4601 at 7/22/19 1:15 PM: -- As I don't have this buggy environment it is a bit hard to recoment. Maybe writing the last byte of the enlarged RAF may trigger the correct length, something like {code:java} long origFilePointer = raf.getFilePointer(); raf.seek(fileLen - 1); raf.write(0); raf.seek(origFilePointer); {code} maybe already the seek operation is enough? Please report back if this helps - maybe finding the least costly IO-operation for getting the real size. We may add this as an optional workaround which could be enabled using a system property. was (Author: tboehme): As I don't have this buggy environment it is a bit hard to recomment. Maybe writing the last byte of the enlarged RAF may trigger the correct length, something like {code:java} long origFilePointer = raf.getFilePointer(); raf.seek(fileLen - 1); raf.write(0); raf.seek(origFilePointer); {code} maybe already the seek operation is enough? Please report back if this helps - maybe finding the least costly IO-operation for getting the real size. We may add this as an optional workaround which could be enabled using a system property. 
> in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected > scratch file size of 196608 but found 192512 > - > > Key: PDFBOX-4601 > URL: https://issues.apache.org/jira/browse/PDFBOX-4601 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.12, 2.0.16 > Environment: AWS Lambda >Reporter: biswajit >Priority: Major > Fix For: 2.0.17 > > > in AWS lambda pdf merge giving error as > {{Error in pdf consolidation: Expected scratch file size of 196608 but found > 192512.}} > *Code:* > {code} > PDFMergerUtility pdfMerger = new PDFMergerUtility(); > pdfMerger.addSources(sources); > pdfMerger.setDestinationStream(mergedPDFOutputStream); > pdfMerger.mergeDocuments(MemoryUsageSetting.setupTempFileOnly()); > {code} > both InputStream and OutputStream are ByteArrayInputStream and > ByteArrayOutputStream. AWS Lambda environment has 512MB space available only > for /tmp partition. This could be an issue or not I am not sure. And AWS > lambda do not permit other directory than /tmp partition to create files. > And while reading into the code I found below piece of code which I think > always be true. Because if you add some constant amount to an integer that > will always be constant amount greater than its original value > in ScratchFile.java => enlarge() method: > {code} > if (pageCount + ENLARGE_PAGE_COUNT > pageCount) > { > fileLen += ENLARGE_PAGE_COUNT * PAGE_SIZE; > raf.setLength(fileLen); > freePages.set(pageCount, pageCount + ENLARGE_PAGE_COUNT); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
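The workaround from the comment above can be tried in isolation with a small sketch like the following. It is an illustration under assumptions only: the class and method names (`ScratchLengthProbe`, `forceLength`) are made up for this example and are not PDFBox code, and it is not confirmed that touching the last byte changes anything on the affected AWS Lambda file system.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper, not part of PDFBox: enlarges a file and touches its last byte.
class ScratchLengthProbe
{
    // Sets the new length, writes one zero byte at the end of the enlarged
    // region, restores the file pointer and returns the length the file reports.
    // Only meant for enlarging: the last byte must lie in the newly added region.
    static long forceLength(RandomAccessFile raf, long fileLen) throws IOException
    {
        raf.setLength(fileLen);
        long origFilePointer = raf.getFilePointer();
        raf.seek(fileLen - 1);
        raf.write(0);
        raf.seek(origFilePointer);
        return raf.length();
    }

    public static void main(String[] args) throws IOException
    {
        File tmp = File.createTempFile("scratch", ".tmp");
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw"))
        {
            long expected = 48L * 4096; // 196608, the expected size from the error message
            long reported = forceLength(raf, expected);
            System.out.println("expected " + expected + ", reported " + reported);
        }
        finally
        {
            tmp.delete();
        }
    }
}
```

Running this inside the Lambda environment and comparing the two printed numbers would show whether the extra write makes a difference there.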
[jira] [Commented] (PDFBOX-4601) in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected scratch file size of 196608 but found 192512
[ https://issues.apache.org/jira/browse/PDFBOX-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890082#comment-16890082 ] Timo Boehme commented on PDFBOX-4601: - I don't know whether [https://bugs.openjdk.java.net/browse/JDK-8202261] has something to do with this. > in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected > scratch file size of 196608 but found 192512 > - > > Key: PDFBOX-4601 > URL: https://issues.apache.org/jira/browse/PDFBOX-4601 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.12, 2.0.16 > Environment: AWS Lambda >Reporter: biswajit >Priority: Major > Fix For: 2.0.17 > > > in AWS lambda pdf merge giving error as > {{Error in pdf consolidation: Expected scratch file size of 196608 but found > 192512.}} > *Code:* > {code} > PDFMergerUtility pdfMerger = new PDFMergerUtility(); > pdfMerger.addSources(sources); > pdfMerger.setDestinationStream(mergedPDFOutputStream); > pdfMerger.mergeDocuments(MemoryUsageSetting.setupTempFileOnly()); > {code} > both InputStream and OutputStream are ByteArrayInputStream and > ByteArrayOutputStream. AWS Lambda environment has 512MB space available only > for /tmp partition. This could be an issue or not I am not sure. And AWS > lambda do not permit other directory than /tmp partition to create files. > And while reading into the code I found below piece of code which I think > always be true. Because if you add some constant amount to an integer that > will always be constant amount greater than its original value > in ScratchFile.java => enlarge() method: > {code} > if (pageCount + ENLARGE_PAGE_COUNT > pageCount) > { > fileLen += ENLARGE_PAGE_COUNT * PAGE_SIZE; > raf.setLength(fileLen); > freePages.set(pageCount, pageCount + ENLARGE_PAGE_COUNT); > } > {code}
[jira] [Commented] (PDFBOX-4601) in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected scratch file size of 196608 but found 192512
[ https://issues.apache.org/jira/browse/PDFBOX-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890076#comment-16890076 ] Timo Boehme commented on PDFBOX-4601: - For me this is a quite strange behavior (e.g. fileLen after: 131072, raf length: 65536): after setting the RAF size and checking it does not report the new size. Somehow it seems the file system does not report the correct new size (while testing on my end the set length is immediately also reported on raf.length()). Is this a special behavior on the AWS filesystem or JDK? It seems there is some caching or lazy propagation of IO operations ... > in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected > scratch file size of 196608 but found 192512 > - > > Key: PDFBOX-4601 > URL: https://issues.apache.org/jira/browse/PDFBOX-4601 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.12, 2.0.16 > Environment: AWS Lambda >Reporter: biswajit >Priority: Major > Fix For: 2.0.17 > > > in AWS lambda pdf merge giving error as > {{Error in pdf consolidation: Expected scratch file size of 196608 but found > 192512.}} > *Code:* > {code} > PDFMergerUtility pdfMerger = new PDFMergerUtility(); > pdfMerger.addSources(sources); > pdfMerger.setDestinationStream(mergedPDFOutputStream); > pdfMerger.mergeDocuments(MemoryUsageSetting.setupTempFileOnly()); > {code} > both InputStream and OutputStream are ByteArrayInputStream and > ByteArrayOutputStream. AWS Lambda environment has 512MB space available only > for /tmp partition. This could be an issue or not I am not sure. And AWS > lambda do not permit other directory than /tmp partition to create files. > And while reading into the code I found below piece of code which I think > always be true. 
Because if you add some constant amount to an integer that > will always be constant amount greater than its original value > in ScratchFile.java => enlarge() method: > {code} > if (pageCount + ENLARGE_PAGE_COUNT > pageCount) > { > fileLen += ENLARGE_PAGE_COUNT * PAGE_SIZE; > raf.setLength(fileLen); > freePages.set(pageCount, pageCount + ENLARGE_PAGE_COUNT); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Comment Edited] (PDFBOX-4601) in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected scratch file size of 196608 but found 192512
[ https://issues.apache.org/jira/browse/PDFBOX-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890011#comment-16890011 ] Timo Boehme edited comment on PDFBOX-4601 at 7/22/19 9:02 AM: -- Regarding the exception: the scratch file is enlarged by 16*4kB pages. Thus it should have the expected size of 196608 (48 4kB pages; it got 3 times enlarged) and cannot have a size of 192512 (47 4kB pages). Currently when setting the new file length we rely on RandomAccessFile.setLength(X) to throw an exception if setting this size is not possible. Somehow setting the new size did not work in your case and there was no exception. Does this happen regularly/each time? You may add a check after {code:java} raf.setLength(fileLen); {code} if the file could not be set to the new length ( {{if (raf.length() != fileLen) ...}} ) and report if that is the case here (just realized that this is what [~tilman] did with the debug logging). was (Author: tboehme): Regarding the exception: the scratch file is enlarged by 16*4kB pages. Thus it should have the expected size of 196608 (48 4kB pages; it got 3 times enlarged) and cannot have a size of 192512 (47 4kB pages). Currently when setting the new file length we rely on RandomAccessFile.setLength(X) to throw an exception if setting this size is not possible. Somehow setting the new size did not work in your case and there was no exception. Does this happen regularly/each time? You may add a check after {code:java} raf.setLength(fileLen); {code} if the file could not be set to the new length ( {{if (raf.length() != fileLen) ...}} ) and report if that is the case here. 
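A sketch of the check suggested above, assuming a hypothetical helper `setLengthChecked` (the class and its methods are made up for this example and are not PDFBox code; the constant values 4096 and 16 are taken from the "16*4kB pages" statement in the comment). It also spells out the page arithmetic: three enlargements of 16 pages give 48 * 4096 = 196608 bytes, while 192512 bytes is exactly 47 pages.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical illustration, not part of PDFBox.
class SetLengthCheck
{
    static final int PAGE_SIZE = 4096;        // 4kB pages, as stated in the comment
    static final int ENLARGE_PAGE_COUNT = 16; // enlargement step, as stated in the comment

    // Size in bytes after the given number of enlargements.
    static long sizeAfterEnlargements(int enlargements)
    {
        return (long) enlargements * ENLARGE_PAGE_COUNT * PAGE_SIZE;
    }

    // Sets the length and fails loudly if the file does not report it afterwards.
    static void setLengthChecked(RandomAccessFile raf, long fileLen) throws IOException
    {
        raf.setLength(fileLen);
        long actual = raf.length();
        if (actual != fileLen)
        {
            throw new IOException("Expected scratch file size of " + fileLen
                    + " but found " + actual);
        }
    }

    public static void main(String[] args) throws IOException
    {
        File tmp = File.createTempFile("scratch", ".tmp");
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw"))
        {
            for (int i = 1; i <= 3; i++)
            {
                setLengthChecked(raf, sizeAfterEnlargements(i));
            }
            System.out.println("final length: " + raf.length());
        }
        finally
        {
            tmp.delete();
        }
    }
}
```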
[jira] [Comment Edited] (PDFBOX-4601) in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected scratch file size of 196608 but found 192512
[ https://issues.apache.org/jira/browse/PDFBOX-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890011#comment-16890011 ] Timo Boehme edited comment on PDFBOX-4601 at 7/22/19 8:57 AM: -- Regarding the exception: the scratch file is enlarged by 16*4kB pages. Thus it should have the expected size of 196608 (48 4kB pages; it got 3 times enlarged) and cannot have a size of 192512 (47 4kB pages). Currently when setting the new file length we rely on RandomAccessFile.setLength(X) to throw an exception if setting this size is not possible. Somehow setting the new size did not work in your case and there was no exception. Does this happen regularly/each time? You may add a check after {code:java} raf.setLength(fileLen); {code} if the file could not be set to the new length ( {{if (raf.length() != fileLen) ...}} ) and report if that is the case here. was (Author: tboehme): Regarding the exception: the scratch file is enlarged by 16*4kB pages. Thus it should have the expected size of 196608 (48 4kB pages; it got 3 times enlarged) and cannot have a size of 192512 (47 4kB pages). Currently when setting the new file length we rely on RandomAccessFile.setLength(X) to throw an exception if setting this size is not possible. Somehow setting the new size did not work in your case and there was no exception. Does this happen regularly/each time? You may add a check after {code:java} raf.setLength(fileLen); {code} if the file could not be set to the new length ( {{if (raf.length() != fileLen) ...}} ) and report if this here. 
[jira] [Commented] (PDFBOX-4601) in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected scratch file size of 196608 but found 192512
[ https://issues.apache.org/jira/browse/PDFBOX-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890011#comment-16890011 ] Timo Boehme commented on PDFBOX-4601: - Regarding the exception: the scratch file is enlarged by 16*4kB pages. Thus it should have the expected size of 196608 (48 4kB pages; it got 3 times enlarged) and cannot have a size of 192512 (47 4kB pages). Currently when setting the new file length we rely on RandomAccessFile.setLength(X) to throw an exception if setting this size is not possible. Somehow setting the new size did not work in your case and there was no exception. Does this happen regularly/each time? You may add a check after {code:java} raf.setLength(fileLen); {code} to see whether the file could not be set to the new length ( {{if (raf.length() != fileLen) ...}} ) and report here if that is the case. > in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected > scratch file size of 196608 but found 192512 > - > > Key: PDFBOX-4601 > URL: https://issues.apache.org/jira/browse/PDFBOX-4601 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.12, 2.0.16 > Environment: AWS Lambda >Reporter: biswajit >Priority: Major > Fix For: 2.0.17 > > > in AWS lambda pdf merge giving error as > {{Error in pdf consolidation: Expected scratch file size of 196608 but found > 192512.}} > *Code:* > {code} > PDFMergerUtility pdfMerger = new PDFMergerUtility(); > pdfMerger.addSources(sources); > pdfMerger.setDestinationStream(mergedPDFOutputStream); > pdfMerger.mergeDocuments(MemoryUsageSetting.setupTempFileOnly()); > {code} > both InputStream and OutputStream are ByteArrayInputStream and > ByteArrayOutputStream. AWS Lambda environment has 512MB space available only > for /tmp partition. This could be an issue or not I am not sure. And AWS > lambda do not permit other directory than /tmp partition to create files. > And while reading into the code I found below piece of code which I think > always be true. 
Because if you add some constant amount to an integer that > will always be constant amount greater than its original value > in ScratchFile.java => enlarge() method: > {code} > if (pageCount + ENLARGE_PAGE_COUNT > pageCount) > { > fileLen += ENLARGE_PAGE_COUNT * PAGE_SIZE; > raf.setLength(fileLen); > freePages.set(pageCount, pageCount + ENLARGE_PAGE_COUNT); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-4601) in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected scratch file size of 196608 but found 192512
[ https://issues.apache.org/jira/browse/PDFBOX-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889974#comment-16889974 ] Timo Boehme commented on PDFBOX-4601: - Hi, regarding {code:java} if (pageCount + ENLARGE_PAGE_COUNT > pageCount) {code} at first the condition seems to be always true, but you should also read the comment above it: {code:java} // enlarge if we do not overflow {code} So this tests for the rare case of an int overflow. The maximum page count is tested at the start of the method; when the page count is increased later, it is assumed that adding the #ENLARGE_PAGE_COUNT amount is not problematic even if maxPageCount - pageCount is less than this value (a few 4kB pages). > in AWS lambda pdf merge giving error as Error in pdf consolidation: Expected > scratch file size of 196608 but found 192512 > - > > Key: PDFBOX-4601 > URL: https://issues.apache.org/jira/browse/PDFBOX-4601 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.12, 2.0.16 > Environment: AWS Lambda >Reporter: biswajit >Priority: Major > Fix For: 2.0.17 > > > in AWS lambda pdf merge giving error as > {{Error in pdf consolidation: Expected scratch file size of 196608 but found > 192512.}} > *Code:* > {code} > PDFMergerUtility pdfMerger = new PDFMergerUtility(); > pdfMerger.addSources(sources); > pdfMerger.setDestinationStream(mergedPDFOutputStream); > pdfMerger.mergeDocuments(MemoryUsageSetting.setupTempFileOnly()); > {code} > both InputStream and OutputStream are ByteArrayInputStream and > ByteArrayOutputStream. AWS Lambda environment has 512MB space available only > for /tmp partition. This could be an issue or not I am not sure. And AWS > lambda do not permit other directory than /tmp partition to create files. > And while reading into the code I found below piece of code which I think > always be true. 
Because if you add some constant amount to an integer that > will always be constant amount greater than its original value > in ScratchFile.java => enlarge() method: > {code} > if (pageCount + ENLARGE_PAGE_COUNT > pageCount) > { > fileLen += ENLARGE_PAGE_COUNT * PAGE_SIZE; > raf.setLength(fileLen); > freePages.set(pageCount, pageCount + ENLARGE_PAGE_COUNT); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
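The overflow argument from the comment above can be checked with a few lines of plain Java. This is only a sketch: the class `OverflowCheck` and the method `canEnlarge` are hypothetical names for this example, and the value 16 is assumed from the "16*4kB pages" statement elsewhere in this thread. Java int arithmetic wraps around silently, so the sum becomes negative once it passes Integer.MAX_VALUE and the comparison turns false.

```java
// Hypothetical illustration, not part of PDFBox.
class OverflowCheck
{
    static final int ENLARGE_PAGE_COUNT = 16; // assumed value for this example

    // Same condition as in the quoted enlarge() snippet:
    // false only if the addition wraps around (int overflow).
    static boolean canEnlarge(int pageCount)
    {
        return pageCount + ENLARGE_PAGE_COUNT > pageCount;
    }

    public static void main(String[] args)
    {
        System.out.println(canEnlarge(48));                     // ordinary case
        System.out.println(canEnlarge(Integer.MAX_VALUE - 10)); // sum wraps to a negative value
    }
}
```

An alternative that makes the intent more explicit would be `Math.addExact(pageCount, ENLARGE_PAGE_COUNT)`, which throws an ArithmeticException on overflow instead of wrapping.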
Re: [ANNOUNCE] Apache PDFBox 2.0.16 released
-- Timo Boehme OntoChem IT Solutions GmbH Blücherstraße 24 06120 Halle (Saale) Germany phone: +49 345 478 047 4| fax: +49 345 478 047 1 email: timo.boe...@ontochem.com | web: www.ontochem.com HRB 21962 Amtsgericht Stendal | USt-IdNr.: DE815563824 managing director : Lutz Weber
Re: [VOTE] Release Apache PDFBox 2.0.16
Hi, +1 Thanks, Timo On 24.06.19 at 19:35, Andreas Lehmkuehler wrote: Hi, a candidate for the PDFBox 2.0.16 release is available at: https://dist.apache.org/repos/dist/dev/pdfbox/2.0.16/ The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/pdfbox/tags/2.0.16/ The SHA-512 checksum of the archive is cd82d40f19500bb7b510d0eb25664779ae63a12152e5ccc92a643db12e438d8700d6f74093a1f2e739780b5fecacc7636aabfe5a4b9b85dd32eb1bc1394f3f71. Please vote on releasing this package as Apache PDFBox 2.0.16. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast. [ ] +1 Release this package as Apache PDFBox 2.0.16 [ ] -1 Do not release this package because... Here is my +1 Andreas -- Timo Boehme OntoChem IT Solutions GmbH Blücherstraße 24 06120 Halle (Saale) Germany phone: +49 345 478 047 4| fax: +49 345 478 047 1 email: timo.boe...@ontochem.com | web: www.ontochem.com HRB 21962 Amtsgericht Stendal | USt-IdNr.: DE815563824 managing director : Lutz Weber
[jira] [Comment Edited] (PDFBOX-4559) Parse error reading document from several threads
[ https://issues.apache.org/jira/browse/PDFBOX-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865403#comment-16865403 ] Timo Boehme edited comment on PDFBOX-4559 at 6/17/19 8:17 AM: -- I think we have to explore different levels of creating/using streams with regard to thread safety. The base implementation for our memory paging - ScratchFile - is (as the Javadoc states) thread safe (at least it was meant to be :)). However the RandomAccess instances (ScratchFileBuffer) created from it are not, as we have possibilities of mixed reads and writes (and so far parallel access to an instance was not supported by the API). RandomAccessInputStream is only a small layer on top of RandomAccessRead - here as ScratchFileBuffer. The first step would be to switch the ScratchFileBuffer into a read-only mode (or have a small wrapper only allowing thread-safe read access, implementing RandomAccessRead). However even this might not help in this case, as using a single RandomAccessInputStream from multiple threads will lead to errors (even if the methods were synchronized): one thread would not see a sequential stream of input bytes because the other threads read some bytes in between. For thread-safe access the RandomAccessInputStream has to be created on request of a specific thread and method which wants to read the data. Thus the COSInputStream would have to store the thread-safe RandomAccessRead implementation (as it does so indirectly now for the ScratchFileBuffer underlying the RandomAccessInputStream) and would have a method for creating a RandomAccessInputStream each time it is needed (being only a small access wrapper for the data). was (Author: tboehme): I think we have to explore different levels of creating/using streams in regard to be thread safe. The base implementation for out memory paging - ScratchFile - is (as the Javadoc states) thread safe (at least was meant to be it :) ). 
However the RandomAccess instances (ScratchFileBuffer) created from it are not - as we have possibilities of mixed reads and writes (and so far parallel access to an instance was not supported by the API). RandomAccessInputStream is only a small layer on top of RandomAccessRead - here as ScratchFileBuffer. The first step would be to switch the ScratchFileBuffer in a read-only mode (or have a small wrapper only allowing thread-safe read access, implementing RandomAccessRead). However even this might not help in this case as using a single RandomAccessInputStream from multiple threads will be go wrong (even if the methods would be synchronized) as one thread would not see a sequential stream of input bytes but the other threads will read some bytes in between. For thread safe access the RandomAccessInputStream has to be created on request of a specific thread and method which wants to read the data. Thus the COSInputStream would have to store the thread safe RandomAccessRead implementation (as it does so indirectly now for the ScratchFileBuffer underlying the RandomAccessInputStream) and would have a method for creating a RandomAccessInputStream each time it is needed (beeing only a small access wrapper for the data). > Parse error reading document from several threads > - > > Key: PDFBOX-4559 > URL: https://issues.apache.org/jira/browse/PDFBOX-4559 > Project: PDFBox > Issue Type: Bug > Components: Documentation, Rendering >Affects Versions: 2.0.15 > Environment: Oracle Java 8 update125 on both Mac OS X and centos >Reporter: Jack >Priority: Major > Labels: concurrency, multithreading, type1, type1font > Attachments: test.pdf > > > I got following error while running a simple parallel rendering code. > However, the error doesn't happen when I change parallelStream to sequential > (stream()). Interestingly, both methods will render exact same images. I saw > a possible related ticket PDFBOX-3654. But seems that issue was fixed. 
I'd > like to learn if we have some more bugs related? > *Sample code*: > {code:java} > PDDocument document = PDDocument.load(new File(pdfFilename)); > List<PDDocument> pdfPages = new Splitter().split(document); > pdfPages.parallelStream().forEach(page -> { > try { > PDFRenderer renderer = new PDFRenderer(page); > renderer.renderImageWithDPI(0, 180, ImageType.RGB); // change dpi to your > number > } catch (IOException e) { > System.out.println(e); > } > try { > page.close(); > } catch (IOException ignored) { > } > }); > try { > document.close(); > } catch (IOException ignored) { > } > {code} > > *Error log*: > {noformat} > ERROR [PDType1
[jira] [Commented] (PDFBOX-4559) Parse error reading document from several threads
[ https://issues.apache.org/jira/browse/PDFBOX-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865403#comment-16865403 ] Timo Boehme commented on PDFBOX-4559: - I think we have to explore different levels of creating/using streams with regard to thread safety. The base implementation for our memory paging - ScratchFile - is (as the Javadoc states) thread safe (at least it was meant to be :) ). However the RandomAccess instances (ScratchFileBuffer) created from it are not, as we have possibilities of mixed reads and writes (and so far parallel access to an instance was not supported by the API). RandomAccessInputStream is only a small layer on top of RandomAccessRead - here as ScratchFileBuffer. The first step would be to switch the ScratchFileBuffer into a read-only mode (or have a small wrapper only allowing thread-safe read access, implementing RandomAccessRead). However even this might not help in this case, as using a single RandomAccessInputStream from multiple threads will go wrong (even if the methods were synchronized): one thread would not see a sequential stream of input bytes because the other threads read some bytes in between. For thread-safe access the RandomAccessInputStream has to be created on request of a specific thread and method which wants to read the data. Thus the COSInputStream would have to store the thread-safe RandomAccessRead implementation (as it does so indirectly now for the ScratchFileBuffer underlying the RandomAccessInputStream) and would have a method for creating a RandomAccessInputStream each time it is needed (being only a small access wrapper for the data). 
> Parse error reading document from several threads > - > > Key: PDFBOX-4559 > URL: https://issues.apache.org/jira/browse/PDFBOX-4559 > Project: PDFBox > Issue Type: Bug > Components: Documentation, Rendering >Affects Versions: 2.0.15 > Environment: Oracle Java 8 update125 on both Mac OS X and centos >Reporter: Jack >Priority: Major > Labels: concurrency, multithreading, type1, type1font > Attachments: test.pdf > > > I got following error while running a simple parallel rendering code. > However, the error doesn't happen when I change parallelStream to sequential > (stream()). Interestingly, both methods will render exact same images. I saw > a possible related ticket PDFBOX-3654. But seems that issue was fixed. I'd > like to learn if we have some more bugs related? > *Sample code*: > {code:java} > PDDocument document = PDDocument.load(new File(pdfFilename)); > List<PDDocument> pdfPages = new Splitter().split(document); > pdfPages.parallelStream().forEach(page -> { > try { > PDFRenderer renderer = new PDFRenderer(page); > renderer.renderImageWithDPI(0, 180, ImageType.RGB); // change dpi to your > number > } catch (IOException e) { > System.out.println(e); > } > try { > page.close(); > } catch (IOException ignored) { > } > }); > try { > document.close(); > } catch (IOException ignored) { > } > {code} > > *Error log*: > {noformat} > ERROR [PDType1Font] Can't read the embedded Type1 font POAEND+Gotham-Book > java.io.IOException: unexpected closing parenthesis > at org.apache.fontbox.type1.Type1Lexer.readToken(Type1Lexer.java:123) > ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT] > at org.apache.fontbox.type1.Type1Lexer.nextToken(Type1Lexer.java:75) > ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT] > at org.apache.fontbox.type1.Type1Parser.readValue(Type1Parser.java:398) > ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT] > at org.apache.fontbox.type1.Type1Parser.readOtherSubrs(Type1Parser.java:707) > ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT] > at 
org.apache.fontbox.type1.Type1Parser.parseBinary(Type1Parser.java:550) > ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT] > at org.apache.fontbox.type1.Type1Parser.parse(Type1Parser.java:64) > ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT] > at org.apache.fontbox.type1.Type1Font.createWithSegments(Type1Font.java:85) > ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT] > at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:262) > ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT] > at > org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:62) > ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT] > at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:146) > ~[pdfbox-2.0.15-snapshot108.jar:2.0.15-SNAPSHOT] > at > org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAn
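The idea from the comment in this thread (store the data once, hand out a fresh stream to every reader) can be sketched with plain JDK classes. This is not PDFBox code and not how COSInputStream or RandomAccessRead are implemented: `SharedStreamData` and its methods are hypothetical names, and `ByteArrayInputStream` merely stands in for a read-only view on the scratch file data. Each stream keeps its own read position, so every thread sees a sequential sequence of bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical stand-in, not part of PDFBox: immutable data plus a stream factory.
final class SharedStreamData
{
    private final byte[] data;

    SharedStreamData(byte[] data)
    {
        this.data = data.clone(); // private copy that is never modified afterwards
    }

    // Creates a new stream on request; streams are never shared between readers.
    InputStream createInputStream()
    {
        return new ByteArrayInputStream(data);
    }

    static byte[] readFully(InputStream in) throws IOException
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[64];
        int n;
        while ((n = in.read(buf)) != -1)
        {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Lets several parallel readers read the data, each through its own stream,
    // and reports whether every reader saw exactly the original bytes.
    static boolean allReadersSeeSameBytes(byte[] original, int readers)
    {
        SharedStreamData shared = new SharedStreamData(original);
        return IntStream.range(0, readers).parallel().allMatch(i ->
        {
            try (InputStream in = shared.createInputStream())
            {
                return Arrays.equals(original, readFully(in));
            }
            catch (IOException e)
            {
                throw new UncheckedIOException(e);
            }
        });
    }

    public static void main(String[] args)
    {
        byte[] original = new byte[10_000];
        for (int i = 0; i < original.length; i++)
        {
            original[i] = (byte) i;
        }
        System.out.println(allReadersSeeSameBytes(original, 8));
    }
}
```

If all readers shared one stream instead, each would only receive whatever bytes the others had not consumed yet, which matches the kind of parse error reported in the issue.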
[jira] [Comment Edited] (PDFBOX-4539) Cache CharsetDecoder
[ https://issues.apache.org/jira/browse/PDFBOX-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836339#comment-16836339 ] Timo Boehme edited comment on PDFBOX-4539 at 5/9/19 12:41 PM: -- Removed my suggestion as the decode method already does the full decoding cycle including reset. was (Author: tboehme): Removed my suggestion as the decode method already does the full decoding cycle inclusing reset. > Cache CharsetDecoder > > > Key: PDFBOX-4539 > URL: https://issues.apache.org/jira/browse/PDFBOX-4539 > Project: PDFBox > Issue Type: Improvement > Components: Parsing >Affects Versions: 2.0.14 >Reporter: Jonathan >Priority: Major > Labels: performance > Fix For: 2.0.16 > > > We were using PDFBox to parse and process a large number of PDFs, which could > potentially contains thousands of pages in total, so performance mattered to > us. > Thus, we'd like to suggest to cache the CharsetDecoder, which is currently > instantiated on each call of `isValidUTF8(byte[])`. > Our suggestion in BaseParser.java > {code:java} > private static final CharsetDecoder csUTF_8 = Charsets.UTF_8.newDecoder(); > /** > * Returns true if a byte sequence is valid UTF-8. > */ > private boolean isValidUTF8(byte[] input) > { > try > { > csUTF_8.decode(ByteBuffer.wrap(input)); > return true; > } > catch (CharacterCodingException e) > { > return false; > } > } > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Comment Edited] (PDFBOX-4539) Cache CharsetDecoder
[ https://issues.apache.org/jira/browse/PDFBOX-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836339#comment-16836339 ] Timo Boehme edited comment on PDFBOX-4539 at 5/9/19 12:40 PM: -- Removed my suggestion as the decode method already does the full decoding cycle including reset. was (Author: tboehme): How about {code:java} private final CharsetDecoder csUTF_8 = Charsets.UTF_8.newDecoder(); /** * Returns true if a byte sequence is valid UTF-8. */ private boolean isValidUTF8(byte[] input) { try { csUTF_8.decode(ByteBuffer.wrap(input)); return true; } catch (CharacterCodingException e) { csUTF_8.reset(); return false; } } {code} > Cache CharsetDecoder > > > Key: PDFBOX-4539 > URL: https://issues.apache.org/jira/browse/PDFBOX-4539 > Project: PDFBox > Issue Type: Improvement > Components: Parsing >Affects Versions: 2.0.14 >Reporter: Jonathan >Priority: Major > Labels: performance > Fix For: 2.0.16 > > > We were using PDFBox to parse and process a large number of PDFs, which could > potentially contains thousands of pages in total, so performance mattered to > us. > Thus, we'd like to suggest to cache the CharsetDecoder, which is currently > instantiated on each call of `isValidUTF8(byte[])`. > Our suggestion in BaseParser.java > {code:java} > private static final CharsetDecoder csUTF_8 = Charsets.UTF_8.newDecoder(); > /** > * Returns true if a byte sequence is valid UTF-8. > */ > private boolean isValidUTF8(byte[] input) > { > try > { > csUTF_8.decode(ByteBuffer.wrap(input)); > return true; > } > catch (CharacterCodingException e) > { > return false; > } > } > {code} >
[jira] [Commented] (PDFBOX-4539) Cache CharsetDecoder
[ https://issues.apache.org/jira/browse/PDFBOX-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836339#comment-16836339 ] Timo Boehme commented on PDFBOX-4539: - How about {code:java} private final CharsetDecoder csUTF_8 = Charsets.UTF_8.newDecoder(); /** * Returns true if a byte sequence is valid UTF-8. */ private boolean isValidUTF8(byte[] input) { try { csUTF_8.decode(ByteBuffer.wrap(input)); return true; } catch (CharacterCodingException e) { csUTF_8.reset(); return false; } } {code}
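One difference between the two proposals in this thread is `static final` versus a per-instance `final` field. The JDK documents `CharsetDecoder` instances as not safe for use by multiple concurrent threads, so a decoder shared through a static field would need some protection if the check can ever run on more than one thread. Whether that applies to the parser is not stated in the thread. A possible sketch, assuming Java 8 (`ThreadLocal.withInitial`) and a made-up class name:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not the PDFBox implementation.
final class ThreadLocalUtf8Check
{
    // One cached decoder per thread: still no allocation per call,
    // but no decoder is ever shared between threads.
    private static final ThreadLocal<CharsetDecoder> DECODER =
            ThreadLocal.withInitial(StandardCharsets.UTF_8::newDecoder);

    static boolean isValidUTF8(byte[] input)
    {
        try
        {
            DECODER.get().decode(ByteBuffer.wrap(input));
            return true;
        }
        catch (CharacterCodingException e)
        {
            return false;
        }
    }

    public static void main(String[] args)
    {
        byte[] valid = "PDFBox".getBytes(StandardCharsets.UTF_8);
        byte[] invalid = { (byte) 0xFF };
        System.out.println(isValidUTF8(valid));
        System.out.println(isValidUTF8(invalid));
    }
}
```

The per-instance field from the comment above gets the same isolation as long as each parser instance is confined to one thread; the thread-local variant only matters when a static cache is wanted.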
Re: [VOTE] Release Apache PDFBox 2.0.15
+1 Thanks, Timo On 08.04.19 at 17:15, Andreas Lehmkuehler wrote: Hi, a candidate for the PDFBox 2.0.15 release is available at: https://dist.apache.org/repos/dist/dev/pdfbox/2.0.15/ The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/pdfbox/tags/2.0.15/ The SHA-512 checksum of the archive is 4f2afe35ae9feb0b2edfd4d7bec1061db5651138bffc124b8bf522f18e5446bbdab2bd1949bed6c12c20dc93d5f82031ab958062553b659d8ae88bb0fef43270. Please vote on releasing this package as Apache PDFBox 2.0.15. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast. [ ] +1 Release this package as Apache PDFBox 2.0.15 [ ] -1 Do not release this package because... Here is my +1 Andreas -- Timo Boehme OntoChem IT Solutions GmbH Blücherstraße 24 06120 Halle (Saale) Germany phone: +49 345 478 047 4 | fax: +49 345 478 047 1 email: timo.boe...@ontochem.com | web: www.ontochem.com HRB 21962 Amtsgericht Stendal | USt-IdNr.: DE815563824 managing director: Lutz Weber
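For anyone checking a release candidate before voting, the SHA-512 value quoted in the vote mail can be compared against the downloaded archive. The following is a minimal sketch using only the JDK; the class name `Sha512Check` and the command-line arguments (archive path, expected checksum) are assumptions for illustration, not part of the PDFBox release tooling.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for verifying a downloaded release archive.
final class Sha512Check
{
    /** Streams the input through SHA-512 and returns the lowercase hex digest. */
    static String sha512Hex(InputStream in) throws IOException, NoSuchAlgorithmException
    {
        MessageDigest digest = MessageDigest.getInstance("SHA-512");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1)
        {
            digest.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest())
        {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception
    {
        // args[0]: path to the downloaded archive
        // args[1]: checksum copied from the vote mail
        try (InputStream in = Files.newInputStream(Paths.get(args[0])))
        {
            String actual = sha512Hex(in);
            System.out.println(actual.equalsIgnoreCase(args[1])
                    ? "checksum matches"
                    : "checksum MISMATCH, got " + actual);
        }
    }
}
```

A SHA-512 digest is 64 bytes, so the hex string has 128 characters; a shorter value in a mail usually means it was cut off when copied.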
Re: [VOTE] Release Apache PDFBox 2.0.14
Hi, +1 Thanks, Timo On 25.02.19 at 18:22, Andreas Lehmkuehler wrote: Hi, a candidate for the PDFBox 2.0.14 release is available at: https://dist.apache.org/repos/dist/dev/pdfbox/2.0.14/ The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/pdfbox/tags/2.0.14/ The SHA-512 checksum of the archive is 8fe88d2ee4e243e47e651df914cc51b72f5ba0cb737125e8a622137327330e6f542f2f0df13e43bec5148554b262a4cbf4b2b0fbcec985e5db487fe6420a06b3. Please vote on releasing this package as Apache PDFBox 2.0.14. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast. [ ] +1 Release this package as Apache PDFBox 2.0.14 [ ] -1 Do not release this package because... Here is my +1 Andreas
Re: Apache PDFBox Board Report January 2019 due
Hi, +1 BR Timo On 06.01.19 at 17:44, Andreas Lehmkuehler wrote: Hi, find attached a quick draft of the board report we're expected to submit this month. It's based upon the report template which can be found at [1] Any further comments, objections or additions? ## Description: - the Apache PDFBox library is an open source Java tool for working with PDF documents. ## Issues: - there are no issues requiring board attention at this time. ## Activity: - more than 20 JIRA tickets were fixed since releasing 2.0.13, so 2.0.14 will most likely be released soon - according to Sally's post "Apache in 2018 - By The Digits", Tilman is among the top 5 committers of 2018 although we are a small community compared to other ASF projects, see https://s.apache.org/Apache2018Digits - Maruan managed to move all of our repos from git-wip-us to gitbox to support infra with the decommissioning of git-wip-us ## Health report: - there is a steady stream of contributions, bug reports and questions on the mailing lists ## PMC changes: - Currently 21 PMC members. - No new PMC members added in the last 3 months - Last PMC addition was Matthäus Mayer on Mon Oct 16 2017 ## Committer base changes: - Currently 21 committers. - No new committers added in the last 3 months - Last committer addition was Joerg O. Henne on Mon Oct 09 2017 ## Releases: - 2.0.13 was released on Sun Dec 02 2018 Andreas [1] https://reporter.apache.org/?pdfbox
Re: [VOTE][LAZY] move PDFBox git-wip repos to gitbox
Hi, +1 BR Timo On 09.12.18 at 13:09, Andreas Lehmkuehler wrote: Hi, Infra stated that we need documented consensus on this. So, let’s have at it. Maruan proposed moving the following repos over to gitbox: pdfbox-docs - our documentation; pdfbox-jbig2 - jbig2 subproject; pdfbox-testfiles - build test files for jbig2. We are going to start with pdfbox-docs. The empty repository pdfbox-examples will be retired due to inactivity. This is a lazy vote and will close in 72 hours. [1], [2] Cheers, Andreas [1] https://www.apache.org/foundation/voting.html [2] https://www.apache.org/foundation/glossary.html#LazyConsensus
Re: Apache PDFBox Board Report October 2018 due
+1 Thanks Timo On 08.10.18 at 17:35, Andreas Lehmkuehler wrote: Hi, find attached a quick draft of the board report we're expected to submit this month. It's based upon the report template which can be found at [1] Any further comments, objections or additions? ## Description: - the Apache PDFBox library is an open source Java tool for working with PDF documents. ## Issues: - there are no issues requiring board attention at this time. ## Activity: - we released 2 new PDFBox versions and one new JBIG2 version - 1.8.16 and 2.0.12 were released to fix CVE-2018-11797. It was reported through security@ - Tilman is working on making PDFBox compatible with Java 11 - we are collaborating with Daniel Persson to explain a JDK-related performance issue with some rendering cases ## Health report: - there is a steady stream of contributions, bug reports and questions on the mailing lists ## PMC changes: - Currently 21 PMC members. - No new PMC members added in the last 3 months - Last PMC addition was Matthäus Mayer on Mon Oct 16 2017 ## Committer base changes: - Currently 21 committers. - No new committers added in the last 3 months - Last committer addition was Joerg O. Henne on Mon Oct 09 2017 ## Releases: - 1.8.16 was released on Fri Oct 05 2018 - 2.0.12 was released on Fri Oct 05 2018 - 3.0.2 JBIG2 was released on Tue Sep 25 2018 Andreas [1] https://reporter.apache.org/?pdfbox
Re: [VOTE] Release Apache PDFBox 1.8.16
Hi, +1 Timo On 01.10.18 at 21:04, Andreas Lehmkuehler wrote: Hi, a candidate for the PDFBox 1.8.16 release is available at: https://dist.apache.org/repos/dist/dev/pdfbox/1.8.16/ The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/pdfbox/tags/1.8.16/ The SHA-512 checksum of the archive is 85fbb9ef611876566f4bca626328af1e6c2ee9e9fddf18f589c110042727c15fa301d693b5f397bdbfb41e245502f40b9b2edb7dc691ccbe3e9f57a5aee8061e. Please vote on releasing this package as Apache PDFBox 1.8.16. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast. [ ] +1 Release this package as Apache PDFBox 1.8.16 [ ] -1 Do not release this package because... Here is my +1 Andreas
Re: [VOTE] Release Apache PDFBox 2.0.12
Hi, +1 Timo On 01.10.18 at 20:40, Andreas Lehmkuehler wrote: Hi, a candidate for the PDFBox 2.0.12 release is available at: https://dist.apache.org/repos/dist/dev/pdfbox/2.0.12/ The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/pdfbox/tags/2.0.12/ The SHA-512 checksum of the archive is 164a05954ed30e7c334d3c09a13acb6ad4b242ee24de5f96a27ab80329f85933c9d9561fdd542687864596db3d1f16f55c6fd18f31cea65d98a0cc22f5238f6b. Please vote on releasing this package as Apache PDFBox 2.0.12. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast. [ ] +1 Release this package as Apache PDFBox 2.0.12 [ ] -1 Do not release this package because... Here is my +1 Andreas
Re: [VOTE] Release Apache PDFBox JBIG2 ImageIO 3.0.2
Hi, +1 Timo On 22.09.2018 at 17:54, Andreas Lehmkuehler wrote: Hi, a candidate for the PDFBox JBIG2 ImageIO 3.0.2 release is available at: https://dist.apache.org/repos/dist/dev/pdfbox/jbig2-imageio-3.0.2/ The release candidate is a zip archive of the sources in: https://github.com/apache/pdfbox-jbig2/tree/jbig2-imageio-3.0.2/ The SHA-512 checksum of the archive is 9a89ebefc13d23ec1b5787f836764b4d9f8793b08f4f5ff3c3fbb310b6b033dd880dac6f3830ab95e086c9efa07434a43fa0d30587b7cb4c1edb4a1ef017f5fe. Please vote on releasing this package as Apache PDFBox JBIG2 ImageIO 3.0.2. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast. [ ] +1 Release this package as Apache PDFBox JBIG2 ImageIO 3.0.2 [ ] -1 Do not release this package because... Here is my +1 Andreas
[jira] [Comment Edited] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623723#comment-16623723 ] Timo Boehme edited comment on PDFBOX-4309 at 9/21/18 2:43 PM: -- In principle it is only checked whether the sun class exists by trying to load it (no instance creation etc.). I don't see a major problem here - it is only done for Java <= 8, and there I don't know of any problem (there is no problem if the class is missing). You cannot rely on sun.java2d.cmm: even if it is set, it is simply ignored when KCMS is not available (as in OpenJDK 7/8 for Linux) - thus if a user uses this setting (as proposed on the PDFBox website) and we rely on it, this will lead to using the wrong (performance) method. was (Author: tboehme): In principle it is only checked if the sun class exists while trying to load it (no instance creation etc.). I don't see a major problem here - it only is done for Java <= 8 and here I don't know of any problem (there is no problem is the class is missing). You cannot rely on sun.java2d.cmm as even if it is set and KCMS is not available (as in OpenJDK 7/8 for Linux) it is simply ignored - thus if a user uses this setting (as proposed on the PDFBox website) and we rely on this setting it will lead to using wrong (performance) method. > Performance regression in PDColorSpace#toRGBImageAWT Part 2 > --- > Key: PDFBOX-4309 > URL: https://issues.apache.org/jira/browse/PDFBOX-4309 > Project: PDFBox > Issue Type: Improvement > Components: Rendering > Affects Versions: 2.0.11, 3.0.0 PDFBox > Reporter: Timo Boehme > Assignee: Timo Boehme > Priority: Minor > Labels: optimization > Attachments: ICCImplCheck.java, PDColorSpace.java.patch, PDICCBased.java.patch > This is a continuation of PDFBOX-3569. In a (private) PDF document there are graphics produced by CorelDraw which are composed of more than 2500(!) images, each with its own indexed color space based on an ICC color space (the shadows of graphic objects are created by a large number of gray lines ...). In our environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux 64Bit) rendering a single page with one graphic takes 780 seconds. Most of the time is spent in creating the indexed color space via ICC color space mapping:
> {noformat} > java.lang.Thread.State: RUNNABLE > at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method) > at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156) > at sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155) > - locked <0x000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform) > at sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268) > at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355) > at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282) > at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314) > at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276) > at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141) > at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91) > at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184) > at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70) > at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240) > at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92) > at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70) > at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672) > at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196) > at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443) > at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424) > at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046){noformat}
> Calling LittleCMS (LCMS) several thousand times is the problem here, taking way too much time. Unfortunately using kcms via {{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.Kcm
[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623723#comment-16623723 ] Timo Boehme commented on PDFBOX-4309: - In principle it is only checked whether the sun class exists by trying to load it (no instance creation etc.). I don't see a major problem here - it is only done for Java <= 8, and there I don't know of any problem (there is no problem if the class is missing). You cannot rely on sun.java2d.cmm: even if it is set, it is simply ignored when KCMS is not available (as in OpenJDK 7/8 for Linux) - thus if a user uses this setting (as proposed on the PDFBox website) and we rely on it, this will lead to using the wrong (performance) method. > Performance regression in PDColorSpace#toRGBImageAWT Part 2 > --- > Key: PDFBOX-4309 > URL: https://issues.apache.org/jira/browse/PDFBOX-4309 > Project: PDFBox > Issue Type: Improvement > Components: Rendering > Affects Versions: 2.0.11, 3.0.0 PDFBox > Reporter: Timo Boehme > Assignee: Timo Boehme > Priority: Minor > Labels: optimization > Attachments: ICCImplCheck.java, PDColorSpace.java.patch, PDICCBased.java.patch > This is a continuation of PDFBOX-3569. In a (private) PDF document there are graphics produced by CorelDraw which are composed of more than 2500(!) images, each with its own indexed color space based on an ICC color space (the shadows of graphic objects are created by a large number of gray lines ...). In our environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux 64Bit) rendering a single page with one graphic takes 780 seconds. Most of the time is spent in creating the indexed color space via ICC color space mapping:
> {noformat} > java.lang.Thread.State: RUNNABLE > at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method) > at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156) > at sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155) > - locked <0x000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform) > at sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268) > at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355) > at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282) > at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314) > at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276) > at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141) > at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91) > at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184) > at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70) > at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240) > at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92) > at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70) > at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672) > at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196) > at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443) > at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424) > at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046){noformat}
> Calling LittleCMS (LCMS) several thousand times is the problem here, taking way too much time. Unfortunately using kcms via {{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider}} is not an option either, as the Suse IcedTea OpenJDK seems to not have included it (anymore?) - in both Java 7 and Java 8. > However the ICC color space (PDICCBased) returns in this case CMYK as alternate color space, and for CMYK we have the alternative rendering via system property org.apache.pdfbox.rendering.UsePureJavaCMYKConversion from PDFBOX-3569. > The idea is now to have an option to force using the alternative color space instead of the ICC one to circumvent using LCMS in toRGBImage(). For CMYK as alternative color space it has to be combined with the syst
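The argument in the comment above, that a set `sun.java2d.cmm` property says nothing about whether the named class is actually present, can be illustrated with a small probe. This is a sketch only: `CmmAvailability`, `classExists` and `requestedCmmIsLoadable` are hypothetical names, not the code attached to the issue. It loads the class without initializing it, which matches the "no instance creation" remark.

```java
// Hypothetical probe, not the ICCImplCheck.java attached to the issue.
final class CmmAvailability
{
    /** Returns true if the class can be loaded; static initializers are not run. */
    static boolean classExists(String className)
    {
        if (className == null || className.isEmpty())
        {
            return false;
        }
        try
        {
            // initialize = false: only load the class, do not initialize it
            Class.forName(className, false, CmmAvailability.class.getClassLoader());
            return true;
        }
        catch (ClassNotFoundException | LinkageError e)
        {
            // A missing class is an expected outcome, not an error
            return false;
        }
    }

    /** True only if the property is set AND the class it names can be loaded. */
    static boolean requestedCmmIsLoadable()
    {
        return classExists(System.getProperty("sun.java2d.cmm"));
    }

    public static void main(String[] args)
    {
        System.out.println("sun.java2d.cmm = " + System.getProperty("sun.java2d.cmm"));
        System.out.println("requested CMM loadable: " + requestedCmmIsLoadable());
    }
}
```

Checking the property alone would report KCMS as selected on a JVM that silently ignores the setting; combining it with the class probe avoids that false positive.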
[jira] [Comment Edited] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613319#comment-16613319 ] Timo Boehme edited comment on PDFBOX-4309 at 9/13/18 10:56 AM: --- For toRGB one could possibly test for: Boolean.TRUE.equals(usesKCMS()) || Boolean.FALSE.equals(usesLCMS()) || JAVAVERSION>8 || JAVAVERSION<7 was (Author: tboehme): For toRGB one could possibly test for: Boolean.TRUE.equals(usesKCMS()) || Boolean.FALSE.equals(usesLCMS()) || JAVAVERSION>8 > Performance regression in PDColorSpace#toRGBImageAWT Part 2 > --- > Key: PDFBOX-4309 > URL: https://issues.apache.org/jira/browse/PDFBOX-4309 > Project: PDFBox > Issue Type: Improvement > Components: Rendering > Affects Versions: 2.0.11, 3.0.0 PDFBox > Reporter: Timo Boehme > Assignee: Timo Boehme > Priority: Minor > Labels: optimization > Attachments: ICCImplCheck.java, PDColorSpace.java.patch, PDICCBased.java.patch > This is a continuation of PDFBOX-3569. In a (private) PDF document there are graphics produced by CorelDraw which are composed of more than 2500(!) images, each with its own indexed color space based on an ICC color space (the shadows of graphic objects are created by a large number of gray lines ...). In our environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux 64Bit) rendering a single page with one graphic takes 780 seconds. Most of the time is spent in creating the indexed color space via ICC color space mapping:
> {noformat} > java.lang.Thread.State: RUNNABLE > at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method) > at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156) > at sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155) > - locked <0x000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform) > at sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268) > at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355) > at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282) > at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314) > at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276) > at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141) > at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91) > at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184) > at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70) > at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240) > at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92) > at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70) > at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672) > at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196) > at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443) > at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424) > at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046){noformat}
> Calling LittleCMS (LCMS) several thousand times is the problem here, taking way too much time. Unfortunately using kcms via {{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider}} is not an option either, as the Suse IcedTea OpenJDK seems to not have included it (anymore?) - in both Java 7 and Java 8. > However the ICC color space (PDICCBased) returns in this case CMYK as alternate color space, and for CMYK we have the alternative rendering via system property org.apache.pdfbox.rendering.UsePureJavaCMYKConversion from PDFBOX-3569. > The idea is now to have an option to force using the alternative color space instead of the ICC one to circumvent using LCMS in toRGBImage(). For CMYK as alternative color space it has to be combined with the system property 'UsePureJavaCMYKConversion'. > Using this approach the rendering time of the page with the problematic graphic drops from 780 seconds to 1 second! > It is clear that using
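The test proposed in the comment above works on tri-state values: `usesKCMS()` and `usesLCMS()` evidently return a `Boolean` that may be `null` when the answer is unknown, which is why the comparison is written as `Boolean.TRUE.equals(...)` rather than a plain boolean check. A minimal sketch of that expression as a pure function follows; the class name, the method names and the version parsing are assumptions for illustration, and what PDFBox does with the result is not stated in the comment.

```java
// Hypothetical sketch of the condition quoted in the comment.
final class CmmCondition
{
    /** Maps "1.7" -> 7, "1.8" -> 8, "9" -> 9, "11" -> 11. */
    static int majorJavaVersion(String specVersion)
    {
        // Versions up to 8 use the legacy "1.x" scheme
        String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int dot = v.indexOf('.');
        return Integer.parseInt(dot < 0 ? v : v.substring(0, dot));
    }

    /**
     * usesKcms / usesLcms may be null, meaning "could not be determined".
     * A null never satisfies either of the first two terms.
     */
    static boolean matchesProposedTest(Boolean usesKcms, Boolean usesLcms, int javaVersion)
    {
        return Boolean.TRUE.equals(usesKcms)
                || Boolean.FALSE.equals(usesLcms)
                || javaVersion > 8
                || javaVersion < 7;
    }

    public static void main(String[] args)
    {
        int version = majorJavaVersion(System.getProperty("java.specification.version"));
        // Both detection results unknown: only the version terms can match
        System.out.println(matchesProposedTest(null, null, version));
    }
}
```

With both detection results unknown on Java 7 or 8 the expression is false, so the undetermined case falls on the "LCMS may be in use" side.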
[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613319#comment-16613319 ] Timo Boehme commented on PDFBOX-4309: - For toRGB one could possibly test for: Boolean.TRUE.equals(usesKCMS()) || Boolean.FALSE.equals(usesLCMS()) || JAVAVERSION>8 > Performance regression in PDColorSpace#toRGBImageAWT Part 2 > --- > Key: PDFBOX-4309 > URL: https://issues.apache.org/jira/browse/PDFBOX-4309 > Project: PDFBox > Issue Type: Improvement > Components: Rendering > Affects Versions: 2.0.11, 3.0.0 PDFBox > Reporter: Timo Boehme > Assignee: Timo Boehme > Priority: Minor > Labels: optimization > Attachments: ICCImplCheck.java, PDColorSpace.java.patch, PDICCBased.java.patch > This is a continuation of PDFBOX-3569. In a (private) PDF document there are graphics produced by CorelDraw which are composed of more than 2500(!) images, each with its own indexed color space based on an ICC color space (the shadows of graphic objects are created by a large number of gray lines ...). In our environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux 64Bit) rendering a single page with one graphic takes 780 seconds. Most of the time is spent in creating the indexed color space via ICC color space mapping:
> {noformat} > java.lang.Thread.State: RUNNABLE > at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method) > at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156) > at sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155) > - locked <0x000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform) > at sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268) > at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355) > at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282) > at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314) > at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276) > at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141) > at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91) > at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184) > at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70) > at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240) > at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92) > at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70) > at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672) > at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196) > at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443) > at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424) > at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046){noformat}
> Calling LittleCMS (LCMS) several thousand times is the problem here, taking way too much time. Unfortunately using kcms via {{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider}} is not an option either, as the Suse IcedTea OpenJDK seems to not have included it (anymore?) - in both Java 7 and Java 8. > However the ICC color space (PDICCBased) returns in this case CMYK as alternate color space, and for CMYK we have the alternative rendering via system property org.apache.pdfbox.rendering.UsePureJavaCMYKConversion from PDFBOX-3569. > The idea is now to have an option to force using the alternative color space instead of the ICC one to circumvent using LCMS in toRGBImage(). For CMYK as alternative color space it has to be combined with the system property 'UsePureJavaCMYKConversion'. > Using this approach the rendering time of the page with the problematic graphic drops from 780 seconds to 1 second! > It is clear that using the alternate color space might return wrong/not exact colors. Therefore it should be only an option to enable this mode. However for processing large collections of PDF documents (e.g. focusing on text) or
[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613316#comment-16613316 ]

Timo Boehme commented on PDFBOX-4309:
-------------------------------------

It seems the XcmsServiceProviders were introduced when there was a possibility to choose (first in Java 8), thus hasXCMS() should maybe try to load the classes sun.java2d.cmm.lcms.LCMS and sun.java2d.cmm.kcms.CMM, which also works in Java 7. Additionally one could add the getCMSClassname test to usesKCMS() as well (with contains(".kcms.")) to get it right. I've updated the code accordingly. At least for JDK 6 and below this will fail, but KCMS seems to be the only option in these versions anyway.

> Performance regression in PDColorSpace#toRGBImageAWT Part 2
> -----------------------------------------------------------
>
>                 Key: PDFBOX-4309
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4309
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Rendering
>    Affects Versions: 2.0.11, 3.0.0 PDFBox
>            Reporter: Timo Boehme
>            Assignee: Timo Boehme
>            Priority: Minor
>              Labels: optimization
>         Attachments: ICCImplCheck.java, PDColorSpace.java.patch,
> PDICCBased.java.patch
>
>
> This is a continuation of PDFBOX-3569. In a (private) PDF document there are
> graphics produced by CorelDraw which are composed of more than 2500(!)
> images, each with its own indexed color space based on an ICC color space
> (the shadows of graphic objects are created by a large number of gray lines
> ...). In our environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux 64Bit)
> rendering a single page with one graphic takes 780 seconds. Most of the time is
> spent in creating the indexed color space via ICC color space mapping:
> {noformat}
> java.lang.Thread.State: RUNNABLE
> at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method)
> at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156)
> at sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155)
> - locked <0x000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform)
> at sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268)
> at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355)
> at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282)
> at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314)
> at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276)
> at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141)
> at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91)
> at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184)
> at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
> at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240)
> at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92)
> at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
> at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672)
> at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196)
> at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443)
> at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424)
> at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046){noformat}
> Calling LittleCMS (LCMS) several thousand times is the problem here, taking
> way too much time. Unfortunately using kcms via
> {{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider}} is also no
> option as the Suse IcedTea OpenJDK seems to not have included it (anymore?)
> - in both Java 7 and Java 8.
> However the ICC color space (PDICCBased) returns in this case CMYK as
> alternate color space and for CMYK we have the alternative rendering via
> system property org.apache.pdfbox.rendering.UsePureJavaCMYKConversion from
> PDFBOX-3569.
> The idea is now to have an option to force using the alternative color space
> instead of the ICC one to circumvent using LCMS in toRGBImage(). For CMYK as
> alternative color space it has to be combined with the system property
> 'UsePureJavaCMYK
[jira] [Comment Edited] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613231#comment-16613231 ]

Timo Boehme edited comment on PDFBOX-4309 at 9/13/18 10:49 AM:
---------------------------------------------------------------

Ok, I've updated the detection as follows
{code:java}
import java.lang.reflect.Method;

public class CMSImplTest {

    // true if the LCMS implementation class can be loaded
    public static boolean hasLCMS() {
        try {
            CMSImplTest.class.getClassLoader().loadClass( "sun.java2d.cmm.lcms.LCMS" );
            return true;
        } catch ( Exception e ) {
            return false;
        }
    }

    // true if the KCMS implementation class can be loaded
    public static boolean hasKCMS() {
        try {
            CMSImplTest.class.getClassLoader().loadClass( "sun.java2d.cmm.kcms.CMM" );
            return true;
        } catch ( Exception e ) {
            return false;
        }
    }

    // class name of the active CMS module, or null if it cannot be determined
    public static String getCMSClassname() {
        // reflection on CMSManager is only attempted for Java <= 8 ("1.x")
        String javaVersStr = System.getProperty("java.specification.version");
        if ( (javaVersStr == null) || ( ! javaVersStr.startsWith("1.") ) ) {
            return null;
        }
        Class<?> cmsMgrClass = null;
        try {
            cmsMgrClass = CMSImplTest.class.getClassLoader().loadClass( "sun.java2d.cmm.CMSManager" );
        } catch ( Exception e ) {
            System.err.println( "Unable to load CMSManager." );
            return null;
        }
        Method cmsMgrMethod = null;
        try {
            cmsMgrMethod = cmsMgrClass.getMethod( "getModule" );
        } catch ( Exception e ) {
            System.err.println( "Unable to get 'getModule' method from CMSManager." );
            return null;
        }
        Object cmsModuleClass = null;
        try {
            cmsModuleClass = cmsMgrMethod.invoke( null );
        } catch ( Exception e ) {
            System.err.println( "Unable to run 'getModule' method from CMSManager." );
            return null;
        }
        return cmsModuleClass != null ? cmsModuleClass.getClass().getName() : null;
    }

    // TRUE/FALSE if known, null if unknown
    public static Boolean usesLCMS() {
        // first try to get CMS class (works for Java 7,8)
        String cmsModuleClass = getCMSClassname();
        if ( cmsModuleClass != null ) {
            return cmsModuleClass.contains(".lcms.");
        }
        return Boolean.TRUE.equals( usesKCMS() ) ? Boolean.FALSE
             : Boolean.TRUE.equals( hasLCMS() ) ? Boolean.TRUE : null;
    }

    // TRUE/FALSE if known, null if unknown
    public static Boolean usesKCMS() {
        // first try to get CMS class (works for Java 7,8)
        String cmsModuleClass = getCMSClassname();
        if ( cmsModuleClass != null ) {
            return cmsModuleClass.contains(".kcms.");
        }
        if ( hasKCMS() &&
             "sun.java2d.cmm.kcms.KcmsServiceProvider".equals(System.getProperty("sun.java2d.cmm"))) {
            return true;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println( "Has KCMS: " + hasKCMS() );
        System.out.println( "Has LCMS: " + hasLCMS() );
        System.out.println( "Used CMS class:" + getCMSClassname() );
        System.out.println( "Uses KCMS:" + usesKCMS() );
        System.out.println( "Uses LCMS:" + usesLCMS() );
    }
}
{code}
Now it only uses class loading for checking the existence of the CMS variants, and reflection is restricted to Java <= 8. For usesLCMS() and usesKCMS() I used a three-valued logic: null if unknown. Maybe the check is also only needed for Java <= 8, as in later versions even LCMS is reasonably fast, which means if usesLCMS() does not return TRUE we may assume KCMS.
[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613231#comment-16613231 ]

Timo Boehme commented on PDFBOX-4309:
-------------------------------------

Ok, I've updated the detection as follows
{code:java}
import java.lang.reflect.Method;

public class CMSImplTest {

    // true if the LCMS service provider class can be loaded
    public static boolean hasLCMS() {
        try {
            CMSImplTest.class.getClassLoader().loadClass( "sun.java2d.cmm.lcms.LcmsServiceProvider" );
            return true;
        } catch ( Exception e ) {
            return false;
        }
    }

    // true if the KCMS service provider class can be loaded
    public static boolean hasKCMS() {
        try {
            CMSImplTest.class.getClassLoader().loadClass( "sun.java2d.cmm.kcms.KcmsServiceProvider" );
            return true;
        } catch ( Exception e ) {
            return false;
        }
    }

    // class name of the active CMS module, or null if it cannot be determined
    public static String getCMSClassname() {
        // reflection on CMSManager is only attempted for Java <= 8 ("1.x")
        String javaVersStr = System.getProperty("java.specification.version");
        if ( (javaVersStr == null) || ( ! javaVersStr.startsWith("1.") ) ) {
            return null;
        }
        Class<?> cmsMgrClass = null;
        try {
            cmsMgrClass = CMSImplTest.class.getClassLoader().loadClass( "sun.java2d.cmm.CMSManager" );
        } catch ( Exception e ) {
            System.err.println( "Unable to load CMSManager." );
            return null;
        }
        Method cmsMgrMethod = null;
        try {
            cmsMgrMethod = cmsMgrClass.getMethod( "getModule" );
        } catch ( Exception e ) {
            System.err.println( "Unable to get 'getModule' method from CMSManager." );
            return null;
        }
        Object cmsModuleClass = null;
        try {
            cmsModuleClass = cmsMgrMethod.invoke( null );
        } catch ( Exception e ) {
            System.err.println( "Unable to run 'getModule' method from CMSManager." );
            return null;
        }
        return cmsModuleClass != null ? cmsModuleClass.getClass().getName() : null;
    }

    // TRUE/FALSE if known, null if unknown
    public static Boolean usesLCMS() {
        // first try to get CMS class (works for Java 7,8)
        String cmsModuleClass = getCMSClassname();
        if ( cmsModuleClass != null ) {
            return cmsModuleClass.contains(".lcms.");
        }
        // Boolean constants instead of primitive literals: with 'false'/'true' the
        // conditional is typed boolean and a null result is unboxed (NullPointerException)
        return Boolean.TRUE.equals( usesKCMS() ) ? Boolean.FALSE
             : hasLCMS() ? Boolean.TRUE : null;
    }

    // TRUE if KCMS is available and selected via system property, null if unknown
    public static Boolean usesKCMS() {
        if ( hasKCMS() &&
             "sun.java2d.cmm.kcms.KcmsServiceProvider".equals(System.getProperty("sun.java2d.cmm"))) {
            return true;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println( "Has KCMS: " + hasKCMS() );
        System.out.println( "Has LCMS: " + hasLCMS() );
        System.out.println( "Used CMS class:" + getCMSClassname() );
        System.out.println( "Uses KCMS:" + usesKCMS() );
        System.out.println( "Uses LCMS:" + usesLCMS() );
    }
}
{code}
Now it only uses class loading for checking the existence of the CMS variants, and reflection is restricted to Java <= 8. For usesLCMS() and usesKCMS() I used a three-valued logic: null if unknown. Maybe the check is also only needed for Java <= 8, as in later versions even LCMS is reasonably fast, which means if usesLCMS() does not return TRUE we may assume KCMS.
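A three-valued Boolean (TRUE, FALSE, null for unknown) built with nested conditional expressions is easy to get wrong in Java. The following is a minimal sketch of the pitfall; the class and method names are made up for illustration and are not part of PDFBox or of the attached ICCImplCheck.java. When one branch is the primitive literal `false` and the other is a `Boolean`, the whole expression is typed `boolean`, so a null result gets unboxed and throws.

```java
public class TriStateTernary {

    /**
     * Unsafe variant: the outer conditional mixes the primitive 'false' with a
     * Boolean-typed inner conditional, so its type is boolean and a null from
     * the inner branch is unboxed, causing a NullPointerException.
     */
    public static Boolean unsafe(boolean kcms, boolean lcms) {
        return kcms ? false : lcms ? true : null;
    }

    /**
     * Safe variant: both branches are reference-typed (Boolean), so null is
     * passed through as the "unknown" value.
     */
    public static Boolean safe(boolean kcms, boolean lcms) {
        return kcms ? Boolean.FALSE : lcms ? Boolean.TRUE : null;
    }

    public static void main(String[] args) {
        System.out.println("safe(false, false) = " + safe(false, false));
        try {
            unsafe(false, false);
        } catch (NullPointerException e) {
            System.out.println("unsafe(false, false) threw NullPointerException");
        }
    }
}
```

Callers should likewise compare with `Boolean.TRUE.equals(...)` rather than unboxing the result directly.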
[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611909#comment-16611909 ]

Timo Boehme commented on PDFBOX-4309:
-------------------------------------

If getCMSClassname() in the above class returns null and the Java version is before 1.7, one may assume a non-LCMS manager is used (e.g. in Linux Java 1.6 there is sun.awt.color.CMM and no CMSManager). For Java 1.7 and 1.8, if LCMS is available the CMSManager should also exist and we would get a non-null result. For versions above 1.8, if we get null it should be safe to assume no KCMS is used.
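The version-dependent reasoning in this comment can be written down as a small pure function. This is only a sketch under stated assumptions: the class name `CmsHeuristic` and the method `usesLcms` are hypothetical, `cmsClassName` stands for whatever getCMSClassname() returned, and reading "no KCMS" on Java 9+ as "LCMS" is an interpretation, not something the comment states outright.

```java
public class CmsHeuristic {

    /**
     * @param cmsClassName result of getCMSClassname(), may be null
     * @param specVersion  value of java.specification.version, e.g. "1.6", "1.8", "11"
     * @return TRUE if LCMS is assumed, FALSE if a non-LCMS manager is assumed,
     *         null if nothing can be concluded
     */
    public static Boolean usesLcms(String cmsClassName, String specVersion) {
        if (cmsClassName != null) {
            // a concrete module class is known: decide by its package name
            return cmsClassName.contains(".lcms.");
        }
        if (specVersion == null) {
            return null;
        }
        if (!specVersion.startsWith("1.")) {
            // above 1.8: no KCMS expected (assumption: that leaves LCMS)
            return Boolean.TRUE;
        }
        int minor;
        try {
            minor = Integer.parseInt(specVersion.substring(2));
        } catch (NumberFormatException e) {
            return null;
        }
        if (minor < 7) {
            // before 1.7 there is no CMSManager: assume a non-LCMS manager
            return Boolean.FALSE;
        }
        // 1.7 / 1.8 should have yielded a class name; null here is unexpected
        return null;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.specification.version");
        System.out.println("usesLcms(null, " + version + ") = " + usesLcms(null, version));
    }
}
```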
[jira] [Comment Edited] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611698#comment-16611698 ]

Timo Boehme edited comment on PDFBOX-4309 at 9/12/18 7:27 AM:
--------------------------------------------------------------

How about this to get the CMS class:
{code:java}
import java.lang.reflect.Method;

public class CMSImplTest {

    // class name of the active CMS module, or null if it cannot be determined
    public static String getCMSClassname() {
        Class<?> cmsMgrClass = null;
        try {
            cmsMgrClass = CMSImplTest.class.getClassLoader().loadClass( "sun.java2d.cmm.CMSManager" );
        } catch ( Exception e ) {
            System.err.println( "Unable to load CMSManager." );
            return null;
        }
        Method cmsMgrMethod = null;
        try {
            cmsMgrMethod = cmsMgrClass.getMethod( "getModule" );
        } catch ( Exception e ) {
            System.err.println( "Unable to get 'getModule' method from CMSManager." );
            return null;
        }
        Object cmsModuleClass = null;
        try {
            // getModule is static, so no receiver object is needed
            cmsModuleClass = cmsMgrMethod.invoke( null );
        } catch ( Exception e ) {
            System.err.println( "Unable to run 'getModule' method from CMSManager." );
            return null;
        }
        return cmsModuleClass != null ? cmsModuleClass.getClass().getName() : null;
    }

    public static void main(String[] args) {
        System.out.println( getCMSClassname() );
    }
}
{code}
[jira] [Comment Edited] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611698#comment-16611698 ]

Timo Boehme edited comment on PDFBOX-4309 at 9/12/18 7:25 AM:
--------------------------------------------------------------

How about this to get the CMS class:
{code:java}
import java.lang.reflect.Method;

public class CMSImplTest {

    public static String getCMSClassname() {
        Class<?> cmsMgrClass = null;
        try {
            cmsMgrClass = CMSImplTest.class.getClassLoader().loadClass( "sun.java2d.cmm.CMSManager" );
        } catch ( Exception e ) {
            System.err.println( "Unable to load CMSManager." );
            return null;
        }
        Method cmsMgrMethod = null;
        try {
            cmsMgrMethod = cmsMgrClass.getMethod( "getModule" );
        } catch ( Exception e ) {
            System.err.println( "Unable to get 'getModule' method from CMSManager." );
            return null;
        }
        Object cmsModuleClass = null;
        try {
            cmsModuleClass = cmsMgrMethod.invoke( null );
        } catch ( Exception e ) {
            System.err.println( "Unable to run 'getModule' method from CMSManager." );
            return null;
        }
        return cmsModuleClass.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println( getCMSClassname() );
    }
}
{code}
[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611698#comment-16611698 ]

Timo Boehme commented on PDFBOX-4309:
-------------------------------------

How about this to get the CMS class:
{code:java}
import java.lang.reflect.Method;

public class CMSImplTest {

    public static String getCMSClassname() {
        Class<?> cmsMgrClass = null;
        try {
            cmsMgrClass = CMSImplTest.class.getClassLoader().loadClass( "sun.java2d.cmm.CMSManager" );
        } catch ( Exception e ) {
            System.err.println( "Unable to load CMSManager." );
            return null;
        }
        Method cmsMgrMethod = null;
        try {
            cmsMgrMethod = cmsMgrClass.getMethod( "getModule" );
        } catch ( Exception e ) {
            System.err.println( "Unable to get 'getModule' method from CMSManager." );
            return null;
        }
        Object cmsModuleClass = null;
        try {
            cmsModuleClass = cmsMgrMethod.invoke( null );
        } catch ( Exception e ) {
            System.err.println( "Unable to run 'getModule' method from CMSManager." );
            return null;
        }
        return cmsModuleClass.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println( getCMSClassname() );
    }
}
{code}
[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611328#comment-16611328 ]

Timo Boehme commented on PDFBOX-4309:
-------------------------------------

Maybe sun.java2d.cmm.CMSManager.getModule() should be checked to be sun.java2d.cmm.kcms.CMM (not quite sure if this will be the correct class); simply checking for '.kcms.' within the qualified name should be enough.
[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611300#comment-16611300 ]

Timo Boehme commented on PDFBOX-4309:

I doubt that this test for KCMS will work. If sun.java2d.cmm has not been set, the lookup returns null. If it is set and KCMS is not available, the setting is ignored. KCMS is not available in newer OpenJDK 1.7 builds on Linux, nor in OpenJDK 1.8: I tried to load sun.java2d.cmm.kcms.KcmsServiceProvider, but the class was not found (I also checked rt.jar). In Java for Windows it is still included (1.8.0_192ea).

Thus the only reliable check is to load the class sun.java2d.cmm.kcms.KcmsServiceProvider (e.g. via Class.forName()). Only if that succeeds is the system property check sensible.
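The class-loading check described in this comment could look roughly like the sketch below. It is not taken from the attached patches: the class name KcmsCheck and both helper methods are hypothetical, and only the provider class name and the sun.java2d.cmm property come from the discussion.

```java
// Sketch only: KcmsCheck and its methods are hypothetical names for illustration.
final class KcmsCheck {
    // Provider class named in the issue discussion.
    static final String KCMS_PROVIDER = "sun.java2d.cmm.kcms.KcmsServiceProvider";

    // Returns true if the named class can be located, without initializing it.
    static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false, KcmsCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // The system property is only meaningful if the provider class exists at all.
    static boolean isKcmsUsable() {
        return isClassPresent(KCMS_PROVIDER)
                && KCMS_PROVIDER.equals(System.getProperty("sun.java2d.cmm"));
    }

    public static void main(String[] args) {
        System.out.println("KCMS class present: " + isClassPresent(KCMS_PROVIDER));
        System.out.println("KCMS selected and usable: " + isKcmsUsable());
    }
}
```

Testing for the class first avoids trusting a property value that the runtime would silently ignore.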
[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610975#comment-16610975 ]

Timo Boehme commented on PDFBOX-4309:

For completeness I've also checked Linux versions: Suse with OpenJDK 1.8.0_171 is slow (400 ms) while Ubuntu with 1.8.0_181 is fast (8 ms). Thus the startup optimization seems to have arrived in one of the latest releases. It is interesting to see that even with the newest Java 8 release there is a considerable performance increase when using the performance fix. At least using LCMS is now possible again; the previous behavior led to unacceptable processing times.

With 'change you proposed this morning', do you mean the patch against PDColorSpace? I thought that this could maybe be skipped with the now committed patch against PDICCBased (also see [#comment-16610539]), or did you mean this last one?
[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610884#comment-16610884 ]

Timo Boehme commented on PDFBOX-4309:

So the good news is that with the most current Java versions the fix may not be needed anymore. However, for production use with restrictions on Java updates we should have the property available.
[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610879#comment-16610879 ]

Timo Boehme commented on PDFBOX-4309:

I've checked the current JRE on Windows (1.8.0_181) and yes, it has improved considerably. My test program now takes only approx. 10 ms for the first ICC access.
[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610843#comment-16610843 ]

Timo Boehme commented on PDFBOX-4309:

[^ICCImplCheck.java] can be used to check the time required when using an ICC color space created from a loaded ICC profile. For instance, use the profile contained in PDFBox under pdfbox/src/main/resources/org/apache/pdfbox/resources/icc/ISOcoated_v2_300_bas.icc

I've checked my Linux environment as well as Windows with Java 1.8.0_66. For both the output looks like
{noformat}
ICC usage (0) time (ms): 600
ICC usage (1) time (ms): 0
ICC usage (2) time (ms): 0
ICC usage (0) time (ms): 570
ICC usage (1) time (ms): 0
ICC usage (2) time (ms): 0
ICC usage (0) time (ms): 584
ICC usage (1) time (ms): 0
ICC usage (2) time (ms): 0
{noformat}
(under Linux it was even faster with approx. 400 ms).

Thus the first usage of a new ICC color space takes approx. 0.5 seconds; all following usages are fast. This means that for documents containing a lot of such color spaces the first-access times add up to a possibly very large number.
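For readers without access to the attachment, a measurement of this kind might look roughly like the sketch below. It is not the attached ICCImplCheck.java: the class name IccFirstUseTiming and the measure() helper are hypothetical. As an assumption for self-containment, it falls back to the JDK's built-in linear RGB profile when no profile path is passed.

```java
import java.awt.color.ColorSpace;
import java.awt.color.ICC_ColorSpace;
import java.awt.color.ICC_Profile;

// Sketch only: not the attached ICCImplCheck.java; names are hypothetical.
final class IccFirstUseTiming {

    // Times 'runs' consecutive toRGB() calls on the same color space, in milliseconds.
    static long[] measure(ICC_ColorSpace colorSpace, int runs) {
        float[] sample = new float[colorSpace.getNumComponents()];
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            colorSpace.toRGB(sample); // the first call has to set up the color transform
            millis[i] = (System.nanoTime() - start) / 1_000_000L;
        }
        return millis;
    }

    public static void main(String[] args) throws Exception {
        // Use the profile file given as first argument, else a built-in JDK profile.
        ICC_Profile profile = args.length > 0
                ? ICC_Profile.getInstance(args[0])
                : ICC_Profile.getInstance(ColorSpace.CS_LINEAR_RGB);
        long[] millis = measure(new ICC_ColorSpace(profile), 3);
        for (int i = 0; i < millis.length; i++) {
            System.out.println("ICC usage (" + i + ") time (ms): " + millis[i]);
        }
    }
}
```

The absolute numbers depend on the JDK build and the profile, so only the gap between the first and the later calls is meaningful.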
[jira] [Updated] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Boehme updated PDFBOX-4309:
    Attachment: ICCImplCheck.java

> Performance regression in PDColorSpace#toRGBImageAWT Part 2
> ---
>
> Key: PDFBOX-4309
> URL: https://issues.apache.org/jira/browse/PDFBOX-4309
> Project: PDFBox
> Issue Type: Improvement
> Components: Rendering
> Affects Versions: 2.0.11, 3.0.0 PDFBox
> Reporter: Timo Boehme
> Assignee: Timo Boehme
> Priority: Minor
> Labels: optimization
> Attachments: ICCImplCheck.java, PDColorSpace.java.patch,
> PDICCBased.java.patch
>
>
> This is a continuation of PDFBOX-3569. In a (private) PDF document there are
> graphics produced by CorelDraw which are combined by more than 2500(!)
> images, each with its own indexed color space based on an ICC color space
> (the shadows of graphic objects are created by large number of gray lines
> ...). In our environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux 64Bit)
> rendering a single page with one graphic takes 780 seconds. The most time is
> spent in creating the indexed color space via ICC color space mapping:
> {noformat}
> java.lang.Thread.State: RUNNABLE
>     at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method)
>     at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156)
>     at sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155)
>     - locked <0x000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform)
>     at sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268)
>     at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355)
>     at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92)
>     at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
>     at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672)
>     at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196)
>     at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443)
>     at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424)
>     at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046)
> {noformat}
> The call of LittleCMS (LCMS) multi thousand times is the problem here taking
> way too much time. Unfortunately using kcms via
> {{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider}} is also no
> option as the Suse IcedTea OpenJDK seems to not have included it (anymore?)
> - in both Java 7 and Java 8.
> However the ICC color space (PDICCBased) returns in this case CMYK as
> alternate color space and for CMYK we have the alternative rendering via
> system property org.apache.pdfbox.rendering.UsePureJavaCMYKConversion from
> PDFBOX-3569.
> The idea is now to have an option to force using the alternative color space
> instead of the ICC one to circumvent using LCMS in toRGBImage(). For CMYK as
> alternative color space it has to be combined with the system property
> 'UsePureJavaCMYKConversion'.
> Using this approach the rendering time of the page with the problematic
> graphic drops from 780 seconds to 1 second!
> It is clear that using the alternate color space might return wrong/not exact
> colors. Therefore it should be only an option to enable this mode. However
> for processing large collections of PDF documents (e.g. focusing on text) or
> to display a PDF in a timely manner the performance improvement should
> outperform the drop in image quality.
> Whil
[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610539#comment-16610539 ]

Timo Boehme commented on PDFBOX-4309:

I tracked the problem more deeply and found that my previous finding was only the tip of the iceberg (if I skip the 'toRGB' trigger it will hang on the 'new Color' trigger). The underlying problem is that when the (ICC) color profile is 'used' (toRGB() or a convert operation) for the first time, it takes about 0.4 seconds (in my environment). The time is spent in the native call LCMS.createNativeTransform(). This call is triggered not only by PDColorSpace.toRGBImageAWT() but also, for example, by ShadingContext.convertToRGB or PageDrawer.getPaint when drawing a glyph. Thus the only possibility to prevent these half-second delays per ICC profile is my first proposal: use the alternate color spaces in each case, activated by a system property. I will therefore apply my first patch.

The other solution, drawing selected images 'directly' instead of via ColorConvertOp, only captures part of the problem. It would only be sensible to have this as an alternative to my first solution if the resulting colors are nearer to the original ones than with the alternate color spaces and rendering small images is the main problem; that depends on document content.
[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610415#comment-16610415 ]

Timo Boehme commented on PDFBOX-4309:

I've found the reason why the 'direct-draw' solution is slower than mine and is also much slower on other pages of the problematic document (e.g. 9 seconds vs. 0.2 seconds): in PDICCBased.loadICCProfile() some operations are performed to trigger exceptions in order to fall back to the alternate color space. The trigger awtColorSpace.toRGB() results (in my environment) in a 0.4 second delay - it seems that internally it also uses the slow color-convert operation.

I wanted to check whether an alternative operation without this side effect could be used, but I found no document that triggers the exception (in my environment). The code contains the following references to problematic documents:
* PDFBOX-1295: triggers an exception, but with the trigger 'ComponentColorModel', not 'toRGB'
* PDFBOX-1740: same as PDFBOX-1295
* PDFBOX-3610: no exception

Thus it's not clear to me whether the 'toRGB' trigger is still needed. At least I would like to have a switch to disable this trigger, with the trigger 'on' by default for compatibility. For PDFBox version 3.x we could maybe remove it - in case we don't find any documents the trigger is good for.

> Performance regression in PDColorSpace#toRGBImageAWT Part 2
> ---
>
> Key: PDFBOX-4309
> URL: https://issues.apache.org/jira/browse/PDFBOX-4309
> Project: PDFBox
> Issue Type: Improvement
> Components: Rendering
> Affects Versions: 2.0.11, 3.0.0 PDFBox
> Reporter: Timo Boehme
> Assignee: Timo Boehme
> Priority: Minor
> Labels: optimization
> Attachments: PDColorSpace.java.patch, PDICCBased.java.patch
[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610340#comment-16610340 ]

Timo Boehme commented on PDFBOX-4309:

I've added a patch with the changes to PDColorSpace as discussed.

> Performance regression in PDColorSpace#toRGBImageAWT Part 2
> ---
>
> Key: PDFBOX-4309
> URL: https://issues.apache.org/jira/browse/PDFBOX-4309
> Project: PDFBox
> Issue Type: Improvement
> Components: Rendering
> Affects Versions: 2.0.11, 3.0.0 PDFBox
> Reporter: Timo Boehme
> Assignee: Timo Boehme
> Priority: Minor
> Labels: optimization
> Attachments: PDColorSpace.java.patch, PDICCBased.java.patch
[jira] [Updated] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Boehme updated PDFBOX-4309:
Attachment: PDColorSpace.java.patch

> Performance regression in PDColorSpace#toRGBImageAWT Part 2
> ---
>
> Key: PDFBOX-4309
> URL: https://issues.apache.org/jira/browse/PDFBOX-4309
> Project: PDFBox
> Issue Type: Improvement
> Components: Rendering
> Affects Versions: 2.0.11, 3.0.0 PDFBox
> Reporter: Timo Boehme
> Assignee: Timo Boehme
> Priority: Minor
> Labels: optimization
> Attachments: PDColorSpace.java.patch, PDICCBased.java.patch
[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610269#comment-16610269 ]

Timo Boehme commented on PDFBOX-4309:

Thanks for the hint. I have tested the solution proposed in PDFBOX-4149. When I set the condition to w*h < 200, all color-convert operations are done by the alternative Graphics.drawImage. Rendering the problematic page then takes 2.1 seconds, while with my solution it takes 1.2 seconds. With my test document I cannot comment on the color difference of the rendered images - [~tilman]: are you able to say which solution is nearer to the correct colors?

In principle the solution from PDFBOX-4149 is more general. In my environment (see above) only 2 ColorConvertOp operations per second are performed (I assume it is the calling of LCMS, not the real rendering). Thus these operations need to be avoided as much as possible. I would suggest having a parameter specifying the maximum image size (width*height) up to which the alternative drawing will be done:
{code:java}
if(raster.getWidth() * raster.getHeight() > MAX_DIRECTDRAW_IMAGESIZE)
{
    ColorConvertOp op = new ColorConvertOp(null);
    op.filter(src, dest);
}
else
{
    Graphics g = dest.getGraphics();
    g.drawImage(src, 0, 0, null);
    g.dispose();
}
{code}
Default would be -1. WDYT?

> Performance regression in PDColorSpace#toRGBImageAWT Part 2
> ---
>
> Key: PDFBOX-4309
> URL: https://issues.apache.org/jira/browse/PDFBOX-4309
> Project: PDFBox
> Issue Type: Improvement
> Components: Rendering
> Affects Versions: 2.0.11, 3.0.0 PDFBox
> Reporter: Timo Boehme
> Assignee: Timo Boehme
> Priority: Minor
> Labels: optimization
> Attachments: PDICCBased.java.patch
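The size threshold suggested in the comment above can be tried outside PDFBox with plain AWT classes. The following is only a minimal sketch under stated assumptions: the class name DirectDrawSketch, the method names and the maxDirectDrawPixels parameter are made up for illustration and are not part of PDFBox or of the attached patches.

```java
import java.awt.Graphics;
import java.awt.image.BufferedImage;
import java.awt.image.ColorConvertOp;

// Hypothetical helper, not PDFBox code: picks the conversion path by image size.
final class DirectDrawSketch {

    private DirectDrawSketch() {
    }

    // True if the image is small enough to be drawn directly.
    // With a limit of -1 (the default proposed in the comment) this is never true.
    static boolean useDirectDraw(int width, int height, long maxDirectDrawPixels) {
        return (long) width * height <= maxDirectDrawPixels;
    }

    // Converts src to an RGB image, avoiding ColorConvertOp for small images.
    static BufferedImage toRgb(BufferedImage src, long maxDirectDrawPixels) {
        BufferedImage dest = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        if (useDirectDraw(src.getWidth(), src.getHeight(), maxDirectDrawPixels)) {
            // Cheap path: let Java2D draw the image into the RGB target.
            Graphics g = dest.getGraphics();
            try {
                g.drawImage(src, 0, 0, null);
            } finally {
                g.dispose();
            }
        } else {
            // Color-managed path: goes through the CMM (LCMS on OpenJDK).
            new ColorConvertOp(null).filter(src, dest);
        }
        return dest;
    }
}
```

Passing -1 keeps the existing behaviour (always ColorConvertOp), so the direct-draw path stays opt-in, as proposed.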
[jira] [Commented] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
[ https://issues.apache.org/jira/browse/PDFBOX-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604374#comment-16604374 ]

Timo Boehme commented on PDFBOX-4309:

The problem in the document I have is that each of the 2500 images has its own indexed color space (only a few colors), and when the color space is initialized (PDIndexed.initRgbColorTable() -> baseColorSpace.toRGBImage()) the indexed colors are converted via the ICC profile using ColorConvertOp, which itself calls LCMS. Thus caching of color spaces won't help here as far as I can see - caching was also my first idea.

> Performance regression in PDColorSpace#toRGBImageAWT Part 2
> ---
>
> Key: PDFBOX-4309
> URL: https://issues.apache.org/jira/browse/PDFBOX-4309
> Project: PDFBox
> Issue Type: Improvement
> Components: Rendering
> Affects Versions: 2.0.11, 3.0.0 PDFBox
> Reporter: Timo Boehme
> Assignee: Timo Boehme
> Priority: Minor
> Labels: optimization
> Attachments: PDICCBased.java.patch
[DISCUSS] Document specific processing properties
Hi,

my last (proposed) addition of a system property controlling rendering (PDFBOX-4309) adds to other already existing properties (e.g. org.apache.pdfbox.rendering.UsePureJavaCMYKConversion, possibly more?). These settings are important for specific use cases/environments. Moreover, they are often only needed for specific PDF documents - e.g. the mentioned properties cure a problem with excessive calls to the Java color management implementation; without them some documents are practically not processable. In other cases the settings could also have negative effects like slower processing or wrong colors. Thus it would be good to have the possibility to adjust settings on a per-document basis (either directly by the user or based on checking/collecting document features like the number of images etc.).

The problem is how these document-specific settings can be provided to the relevant classes, e.g. PDICCBased. Passing a settings object through the call chain is probably not an option, as a lot of constructors/methods would have to be changed for only a few places where the settings are really needed.

One viable solution which came to my mind is using a ThreadLocal ProcessingProperties map (String,String). In order not to get unwanted side effects, using these properties should be initiated by the user, and it should be clearly documented to do it in a try-finally block in order to remove the settings after processing (and also to not get memory leaks etc.), like:

try {
    LocalProcessingProperties.activate(); // creates a map object
    ... // PDF processing
} finally {
    LocalProcessingProperties.clear(); // removes map object
}

A call to LocalProcessingProperties.getProperty( KEY ) would return the value from the ThreadLocal map if the map exists and contains this key; otherwise it would fall back to System.getProperty( KEY ).

As PDFBox (currently) doesn't use multiple threads this should work fine - for multi-threaded usage an initialization/clear would be needed for each thread, which could get the reference to the map object of the main processing thread.

WDYT?

Best regards,
Timo

--
Timo Boehme
OntoChem IT Solutions GmbH
Blücherstraße 24
06120 Halle (Saale)
Germany
phone: +49 345 478 047 4 | fax: +49 345 478 047 1
email: timo.boe...@ontochem.com | web: www.ontochem.com
HRB 21962 Amtsgericht Stendal | USt-IdNr.: DE815563824
managing director: Lutz Weber

- To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
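To make the proposal above more concrete, here is one way such a class could look. It is only a sketch of the idea, not existing PDFBox code: activate(), clear() and getProperty() follow the names used in the mail, while setProperty() and its behaviour when activate() was not called are assumptions added for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed per-thread settings holder (hypothetical, not in PDFBox).
final class LocalProcessingProperties {

    // One map per thread; null means "not activated on this thread".
    private static final ThreadLocal<Map<String, String>> PROPERTIES = new ThreadLocal<>();

    private LocalProcessingProperties() {
    }

    // Creates an empty map for the current thread.
    static void activate() {
        PROPERTIES.set(new HashMap<>());
    }

    // Assumed helper: stores a document-specific setting for the current thread.
    static void setProperty(String key, String value) {
        Map<String, String> map = PROPERTIES.get();
        if (map == null) {
            throw new IllegalStateException("activate() was not called on this thread");
        }
        map.put(key, value);
    }

    // Thread-local value if present, otherwise the JVM-wide system property.
    static String getProperty(String key) {
        Map<String, String> map = PROPERTIES.get();
        if (map != null && map.containsKey(key)) {
            return map.get(key);
        }
        return System.getProperty(key);
    }

    // Removes the map so that no settings leak into later processing.
    static void clear() {
        PROPERTIES.remove();
    }
}
```

A caller would wrap the processing of one document in try/finally as described in the mail, calling setProperty() right after activate() for the settings that should differ from the system properties.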
[jira] [Created] (PDFBOX-4309) Performance regression in PDColorSpace#toRGBImageAWT Part 2
Timo Boehme created PDFBOX-4309:
---
Summary: Performance regression in PDColorSpace#toRGBImageAWT Part 2
Key: PDFBOX-4309
URL: https://issues.apache.org/jira/browse/PDFBOX-4309
Project: PDFBox
Issue Type: Improvement
Components: Rendering
Affects Versions: 2.0.11, 3.0.0 PDFBox
Reporter: Timo Boehme
Assignee: Timo Boehme
Attachments: PDICCBased.java.patch

This is a continuation of PDFBOX-3569. In a (private) PDF document there are graphics produced by CorelDraw which are composed of more than 2500(!) images, each with its own indexed color space based on an ICC color space (the shadows of graphic objects are created by a large number of gray lines ...). In our environment (OpenJDK 7 and OpenJDK 8, IcedTea, Suse Linux 64Bit) rendering a single page with one graphic takes 780 seconds. Most of the time is spent in creating the indexed color space via ICC color space mapping:
{noformat}
java.lang.Thread.State: RUNNABLE
    at sun.java2d.cmm.lcms.LCMS.createNativeTransform(Native Method)
    at sun.java2d.cmm.lcms.LCMS.createTransform(LCMS.java:156)
    at sun.java2d.cmm.lcms.LCMSTransform.doTransform(LCMSTransform.java:155)
    - locked <0x000723af9e30> (a sun.java2d.cmm.lcms.LCMSTransform)
    at sun.java2d.cmm.lcms.LCMSTransform.colorConvert(LCMSTransform.java:268)
    at java.awt.image.ColorConvertOp.ICCBIFilter(ColorConvertOp.java:355)
    at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:282)
    at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.toRGBImageAWT(PDColorSpace.java:314)
    at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.toRGBImage(PDICCBased.java:276)
    at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.initRgbColorTable(PDIndexed.java:141)
    at org.apache.pdfbox.pdmodel.graphics.color.PDIndexed.<init>(PDIndexed.java:91)
    at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:184)
    at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
    at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.createFromCOSObject(PDColorSpace.java:240)
    at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:92)
    at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:70)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getColorSpace(PDImageXObject.java:672)
    at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:196)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:443)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:424)
    at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1046)
{noformat}
Calling LittleCMS (LCMS) several thousand times is the problem here, taking way too much time. Unfortunately using kcms via {{-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider}} is also not an option, as the Suse IcedTea OpenJDK seems not to include it (anymore?) - in both Java 7 and Java 8.
However, the ICC color space (PDICCBased) returns in this case CMYK as alternate color space, and for CMYK we have the alternative rendering via the system property org.apache.pdfbox.rendering.UsePureJavaCMYKConversion from PDFBOX-3569.
The idea now is to have an option to force using the alternate color space instead of the ICC one, to circumvent using LCMS in toRGBImage(). For CMYK as alternate color space it has to be combined with the system property 'UsePureJavaCMYKConversion'.
Using this approach the rendering time of the page with the problematic graphic drops from 780 seconds to 1 second!
It is clear that using the alternate color space might return wrong/not exact colors. Therefore enabling this mode should only be an option. However, for processing large collections of PDF documents (e.g. focusing on text) or for displaying a PDF in a timely manner, the performance improvement should outweigh the drop in image quality.
While the provided patch, if activated, will use the alternate color space in any case, a later stage could add more intelligent logic which decides based on a runtime analysis when to use this mode (number of calls to LCMS, time needed etc.).
If there are no objections to this patch I will apply it in the next few days.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (PDFBOX-4307) ClassCastException in PDDocumentCatalog.getDocumentOutline if 'outlines' is not a dictionary
[ https://issues.apache.org/jira/browse/PDFBOX-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Boehme resolved PDFBOX-4307.
Resolution: Fixed
Fix Version/s: 3.0.0 PDFBox, 2.0.12

> ClassCastException in PDDocumentCatalog.getDocumentOutline if 'outlines' is not a dictionary
> ---
>
> Key: PDFBOX-4307
> URL: https://issues.apache.org/jira/browse/PDFBOX-4307
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.11, 3.0.0 PDFBox
> Reporter: Timo Boehme
> Assignee: Timo Boehme
> Priority: Minor
> Fix For: 2.0.12, 3.0.0 PDFBox
[jira] [Assigned] (PDFBOX-4307) ClassCastException in PDDocumentCatalog.getDocumentOutline if 'outlines' is not a dictionary
[ https://issues.apache.org/jira/browse/PDFBOX-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Boehme reassigned PDFBOX-4307:
Assignee: Timo Boehme

> ClassCastException in PDDocumentCatalog.getDocumentOutline if 'outlines' is not a dictionary
> ---
>
> Key: PDFBOX-4307
> URL: https://issues.apache.org/jira/browse/PDFBOX-4307
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.11, 3.0.0 PDFBox
> Reporter: Timo Boehme
> Assignee: Timo Boehme
> Priority: Minor
[jira] [Created] (PDFBOX-4307) ClassCastException in PDDocumentCatalog.getDocumentOutline if 'outlines' is not a dictionary
Timo Boehme created PDFBOX-4307:
---
Summary: ClassCastException in PDDocumentCatalog.getDocumentOutline if 'outlines' is not a dictionary
Key: PDFBOX-4307
URL: https://issues.apache.org/jira/browse/PDFBOX-4307
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 2.0.11, 3.0.0 PDFBox
Reporter: Timo Boehme

In PDDocumentCatalog.getDocumentOutline() the 'outline' is read as a dictionary object and directly cast to COSDictionary. Normally this is ok, as it should be a dictionary. However, in a bad PDF that I have in my collection (unfortunately I'm not allowed to disclose it) the object is an array (COSArray), which leads to the ClassCastException.
Since the outline is optional information, the best we can do here is to ignore the 'outline' data if it's not a COSDictionary and return 'null'.
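The fix described above amounts to a type check before the cast. The sketch below shows the pattern with plain Java stand-ins so that it runs without PDFBox on the classpath: a Map plays the role of COSDictionary and an Object[] the role of COSArray, and the class and method names are hypothetical.

```java
import java.util.Map;

// Stand-in illustration of "check the type, else return null" (not PDFBox code).
final class OutlineLookupSketch {

    private OutlineLookupSketch() {
    }

    // Returns the outline dictionary, or null if the entry is missing
    // or has an unexpected type (e.g. an array in a malformed PDF).
    static Map<?, ?> getOutlineDictionary(Map<String, Object> catalog) {
        Object outlines = catalog.get("Outlines");
        if (outlines instanceof Map) {
            return (Map<?, ?>) outlines;
        }
        // Outlines are optional, so a wrong type is treated like "no outline".
        return null;
    }
}
```

Callers then only have to handle null, which they already need to do for documents without an outline.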
Re: New releases?
+1

Thanks,
Timo

On 29.08.2018 at 18:38, Andreas Lehmkuehler wrote:
> Hi,
>
> I'm planning to cut the following releases in about 3-4 weeks from now:
> - JBIG2 3.0.2 (fix for a memory leak)
> - PDFBox 2.0.12 (there are about 30 fixes/improvements)
>
> WDYT?
>
> Andreas
[jira] [Resolved] (PDFBOX-4301) ClassCastException in PDExtendedGraphicsState
[ https://issues.apache.org/jira/browse/PDFBOX-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Boehme resolved PDFBOX-4301.
Resolution: Fixed
Fix Version/s: 3.0.0 PDFBox, 2.0.12

> ClassCastException in PDExtendedGraphicsState
> ---
>
> Key: PDFBOX-4301
> URL: https://issues.apache.org/jira/browse/PDFBOX-4301
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.11
> Reporter: Timo Boehme
> Assignee: Timo Boehme
> Priority: Minor
> Fix For: 2.0.12, 3.0.0 PDFBox
[jira] [Created] (PDFBOX-4301) ClassCastException in PDExtendedGraphicsState
Timo Boehme created PDFBOX-4301:

Summary: ClassCastException in PDExtendedGraphicsState
Key: PDFBOX-4301
URL: https://issues.apache.org/jira/browse/PDFBOX-4301
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 2.0.11
Reporter: Timo Boehme
Assignee: Timo Boehme

The method PDExtendedGraphicsState.getFloatItem contains an unchecked cast to COSNumber for a dictionary object. In a specific journal PDF document I get the following exception:
{noformat}
at org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState.getFloatItem(PDExtendedGraphicsState.java:591)
at org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState.getStrokingAlphaConstant(PDExtendedGraphicsState.java:482)
at org.apache.pdfbox.pdmodel.graphics.state.PDExtendedGraphicsState.copyIntoGraphicsState(PDExtendedGraphicsState.java:130)
at org.apache.pdfbox.contentstream.operator.state.SetGraphicsStateParameters.process(SetGraphicsStateParameters.java:61)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:848){noformat}
because the PDF contains
{noformat}
/A4 <<
/CA (1.0)
/Type /ExtGState
/ca (1.0)
>>
{noformat}
where "(1.0)" is clearly wrong and should be "1.0".
As this seems to be a rare error, I would suggest checking the dictionary object type before casting and returning "null" for a wrong type (as is done e.g. in PDExtendedGraphicsState.getFontSetting).
Re: [VOTE] Release Apache PDFBox 2.0.11
Hi, +1

Timo

On 25.06.2018 at 20:51, Andreas Lehmkuehler wrote:

Hi, a candidate for the PDFBox 2.0.11 release is available at: https://dist.apache.org/repos/dist/dev/pdfbox/2.0.11/

The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/pdfbox/tags/2.0.11/

The SHA-512 checksum of the archive is a53f6c64e41b4843b103b6d9b964b77c226da1ec21ab0c1c7b14772fa233a53b3f179d16a73deb84803476aced6a74e67f1eb43ba34f3517651c6c52f669aaf7.

Please vote on releasing this package as Apache PDFBox 2.0.11. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 2.0.11
[ ] -1 Do not release this package because...

Here is my +1

Andreas
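One way a voter might check a downloaded release candidate against the checksum quoted in a vote mail is sketched below. The class name `Sha512CheckSketch` and the command-line arguments are assumptions made for illustration; only the standard `java.security.MessageDigest` API is used, and this is not part of any PDFBox release tooling.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha512CheckSketch {
    // Computes the SHA-512 digest of the given bytes as a lowercase hex string.
    public static String sha512Hex(byte[] data) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            // SHA-512 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest(data)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the downloaded archive (hypothetical usage)
        // args[1]: checksum copied from the vote mail
        String actual = sha512Hex(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(actual.equalsIgnoreCase(args[1])
                ? "checksum matches" : "checksum MISMATCH");
    }
}
```

Reading the whole archive into memory keeps the sketch short; for very large files a streaming update of the digest would be the more careful choice.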
Re: [VOTE] Release Apache PDFBox 1.8.15
Hi, +1

Best regards,
Timo

On 25.06.2018 at 20:13, Andreas Lehmkuehler wrote:

Hi, a candidate for the PDFBox 1.8.15 release is available at: https://dist.apache.org/repos/dist/dev/pdfbox/1.8.15/

The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/pdfbox/tags/1.8.15/

The SHA-512 checksum of the archive is ac3f4b131f5cd2153ec2a744c486db921bc2165d596b243ad673cfc94be1bc4ae27bdf2981b63419fead18db569a2008264d6fdc7c89cf47f69f81c4a7d3a2a6.

Please vote on releasing this package as Apache PDFBox 1.8.15. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 1.8.15
[ ] -1 Do not release this package because...

Here is my +1

Andreas
Re: [VOTE] Release Apache PDFBox JBIG2 ImageIO 3.0.1
Hi, I've checked some sample images. +1

Best,
Timo

On 14.05.2018 at 19:20, Andreas Lehmkuehler wrote:

Hi, a candidate for the PDFBox JBIG2 ImageIO 3.0.1 release is available at: https://dist.apache.org/repos/dist/dev/pdfbox/jbig2-imageio-3.0.1/

The release candidate is a zip archive of the sources in: https://github.com/apache/pdfbox-jbig2/tree/jbig2-imageio-3.0.1/

The SHA-512 checksum of the archive is 3688ad3a79caccfa0d43c68011bafb076d71cce4c94e6ed7061c2a127639ccf0e683bd8ce68b0f14d14d6647aaba9e107a6c0ee785daa299a0fd103e0a554626.

Please vote on releasing this package as Apache PDFBox JBIG2 ImageIO 3.0.1. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox JBIG2 ImageIO 3.0.1
[ ] -1 Do not release this package because...

Here is my +1

Andreas
[jira] [Commented] (PDFBOX-4058) High memory consumption when extracting image from PDF file
[ https://issues.apache.org/jira/browse/PDFBOX-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321862#comment-16321862 ]

Timo Boehme commented on PDFBOX-4058:

I would say that putting the values in weak references is the correct solution if you look at how WeakHashMap works: only the keys are weak and may be garbage collected at any time, but the values are not. The table entries are cleared only if you run a method (get/put/size/...) on the table. Without accessing the table, all entries remain (especially the values). Thus, to allow the values to be garbage collected as well when needed, they have to be wrapped in a WeakReference, as you have done.

> High memory consumption when extracting image from PDF file
>
> Key: PDFBOX-4058
> URL: https://issues.apache.org/jira/browse/PDFBOX-4058
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 2.0.5, 2.0.6, 2.0.7, 2.0.8
> Environment: windows 10 / Linux
> Reporter: Bjorn Misseghers
> Assignee: Tilman Hausherr
> Labels: regression
> Fix For: 2.0.9, 3.0.0 PDFBox
> Attachments: HighMemoryFootprint.pdf
>
> When rendering an image at 300 dpi from the included PDF, my java process uses a huge amount of memory.
> The document is only 45 Kb in size and contains 2 pages, my JVM is unable to extract even 1 page with 3G of memory. Setting Xmx to 4G works but is not the solution I want.
> The error occurs when calling PDFRenderer.renderImageWithDPI()
> I already tried tweaking the memory usage in my application to use a scratch file while loading the document as well as avoiding caching of XObjects as described here: https://pdfbox.apache.org/2.0/faq.html#outofmemoryerror
> These didn't work.
> The issue can be reproduced using the pdfbox-app utility:
> java -Xmx3G -jar pdfbox-app-2.0.8.jar PDFToImage HighMemoryFootprint.pdf -dpi 300 -color RGB -page 1
> What can not be changed?
> * 300 dpi will not be decreased.
> * Max Java memory will not be increased: 3GB is ridiculous for a 45kb PDF file.
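The WeakHashMap behaviour described in the comment can be sketched as a small cache whose values are wrapped in WeakReference. The class `WeakValueCacheSketch` is hypothetical and only illustrates the pattern under that assumption; it is not the code changed for PDFBOX-4058.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

public class WeakValueCacheSketch<K, V> {
    // WeakHashMap holds only its keys weakly; values stay strongly reachable
    // through the map, so each value is wrapped in its own WeakReference.
    private final Map<K, WeakReference<V>> table = new WeakHashMap<>();

    public void put(K key, V value) {
        table.put(key, new WeakReference<>(value));
    }

    // Returns the cached value, or null if there is no entry or the value
    // has already been garbage collected.
    public V get(K key) {
        WeakReference<V> ref = table.get(key);
        return ref == null ? null : ref.get();
    }

    public static void main(String[] args) {
        WeakValueCacheSketch<Object, int[]> cache = new WeakValueCacheSketch<>();
        Object key = new Object();
        int[] pixels = new int[1024];
        cache.put(key, pixels);
        // The value is still returned while 'pixels' is strongly referenced here.
        System.out.println(cache.get(key) == pixels);
    }
}
```

Callers of such a cache have to treat a null result as a cache miss and recompute the value, since the collector may clear a weakly referenced value at any time once no strong reference remains.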
Re: Apache PDFBox January 2018 report due
+1

Timo

On 09.01.2018 at 22:14, Andreas Lehmkuehler wrote:

Hi, find attached a quick draft of the board report we're expected to submit this month. It's based upon the report template which can be found at [1]

Any further comments, objections or additions?

## Description:
- the Apache PDFBox library is an open source Java tool for working with PDF documents.

## Issues:
- there are no issues requiring board attention at this time.

## Activity:
- the integration of the JBig2 ImageIO plugin is complete
- we are planning to release the first Apache based version of the JBig2 ImageIO plugin this month
- we are working on fixing bugs in 2.0.x
- we have resolved quite a number of 2.0.x related tickets so that most likely the next bugfix version 2.0.9 will be released this month as well

## Board feedback (comment from the last October board meeting)

mt: Reading the "2.0.7 release" thread on private@ it appears that the project is dependent on a single committer for at least a sub-set of regression tests. Could you explain this in more detail please. If there are tests the community depends on, I'd expect to see those tests in an ASF repository where any committer can run them.

These tests are not classic regression tests but tests on a large number (> 50) of files. The results are compared to the results of a previous version and then committers investigate files with some extreme negative differences or with new exceptions. The same is done (on an even larger scale) for Tika, see [1] and [2]. The Tika tests need 4TB, and the files can't be hosted on a public ASF repo or released under the Apache License because the files largely derive from Common Crawl or the internet generally, and copyright/licensing would pose a problem.

There is a special vm to host the described test and it is possible to grant access to all interested Tika/PDFBox committers. Tilman already got his access bits in December, so that at least one other committer is able to run those tests if needed. Maybe others will follow.

[1] http://events.linuxfoundation.org/sites/events/files/slides/ApacheConMiami2017_tallison_v2.pdf
[2] http://openpreservation.org/blog/2016/10/04/apache-tikas-regression-corpus-tika-1302/

## Health report:
- there is a steady stream of contributions, bug reports and questions on the mailing lists

## PMC changes:
- Currently 21 PMC members.
- New PMC members:
  - Joerg O. Henne was added to the PMC on Mon Oct 09 2017
  - Sebastian Holder was added to the PMC on Wed Oct 11 2017
  - Carolin Köhler was added to the PMC on Wed Oct 11 2017
  - Matthäus Mayer was added to the PMC on Mon Oct 16 2017

## Committer base changes:
- Currently 21 committers.
- Joerg O. Henne was added as a committer on Mon Oct 09 2017
- Sebastian Holder was added as a committer on Wed Oct 11 2017
- Carolin Köhler was added as a committer on Wed Oct 11 2017
- Matthäus Mayer was added as a committer on Mon Oct 16 2017

## Releases:
- 2.0.8 was released on Thu Nov 02 2017

## JIRA activity:
- 101 JIRA tickets created in the last 3 months
- 75 JIRA tickets closed/resolved in the last 3 months

Andreas

[1] https://reporter.apache.org/?pdfbox
Re: [VOTE] Release Apache PDFBox 2.0.8
Hi, +1

Thanks,
Timo

On 30.10.2017 at 19:47, Andreas Lehmkuehler wrote:

Hi, a candidate for the PDFBox 2.0.8 release is available at: https://dist.apache.org/repos/dist/dev/pdfbox/2.0.8/

The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/pdfbox/tags/2.0.8/

The SHA1 checksum of the archive is 5c0607144dde1b7af3dd428cafbd2c9c29617ab3.

Please vote on releasing this package as Apache PDFBox 2.0.8. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 2.0.8
[ ] -1 Do not release this package because...

Here is my +1

Andreas
Re: [VOTE] Accept the JBig2 ImageIO Plugin contribution (PDFBOX-3906)
Hi, +1 and thanks for the contribution.

Best,
Timo

On 02.09.2017 at 11:42, Andreas Lehmkuehler wrote:

Hi, The contributed JBig2 ImageIO Plugin codebase is now available for review in PDFBOX-3906 [1] and the relevant IP clearance process has been started [2].

As discussed, I propose that we accept this codebase and invite the JBig2 ImageIO Plugin developers listed below as new committers and PMC members of the PDFBox project.

Jörg Henne
Matthäus Mayer
Sebastian Holder
Carolin Köhler

So, please vote on accepting the JBig2 ImageIO Plugin contribution and granting committer and PMC member status to the people listed above, on the condition that the IP clearance passes without problems. This vote is open for the next 72 hours.

[ ] +1 Accept the JBig2 ImageIO Plugin contribution and grant committer and PMC member status to the above people, assuming the IP clearance passes
[ ] -1 Don't accept the codebase and/or grant committership, because...

Here is my +1.

BR
Andreas Lehmkühler

[1] https://issues.apache.org/jira/browse/PDFBOX-3906
[2] https://incubator.apache.org/ip-clearance/pdfbox-jbig2.html
Re: [VOTE] Release Apache PDFBox 2.0.6
Hi, +1

I don't know which change caused the difference, but my special journal test document renders completely for the first time (before, some images either had wrong colors or were missing). Very nice.

Best,
Timo

On 12.05.2017 at 18:13, Andreas Lehmkuehler wrote:

Hi, a candidate for the PDFBox 2.0.6 release is available at: https://dist.apache.org/repos/dist/dev/pdfbox/2.0.6/

The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/pdfbox/tags/2.0.6/

The SHA1 checksum of the archive is cb04fa19058efca6913a45490ac66cf44ecf273a.

Please vote on releasing this package as Apache PDFBox 2.0.6. The vote is open for the next 72 hours and passes if a majority of at least three +1 PDFBox PMC votes are cast.

[ ] +1 Release this package as Apache PDFBox 2.0.6
[ ] -1 Do not release this package because...

Here is my +1

BR
Andreas Lehmkühler
Re: Apache PDFBox April 2017 board report due
Hi, +1

Best regards,
Timo

On 09.04.2017 at 13:30, Andreas Lehmkuehler wrote:

Hi, find attached a quick draft of the board report we're expected to submit this month. It's based upon the report template which can be found at [1]

Any further comments, objections or additions?

## Description:
- the Apache PDFBox library is an open source Java tool for working with PDF documents.

## Issues:
- there are no issues requiring board attention at this time.

## Activity:
- we are working on fixing bugs in 2.0.x
- there are some small improvements as well
- we decided to switch the current trunk from 2.1.0 to 3.0.0 as we are going to introduce some API changes which require a major release
- Maruan started an effort to update our website
- we support the new donation campaign and added the logo including a link

## Health report:
- there is a steady stream of contributions, bug reports and questions on the mailing lists

## PMC changes:
- Currently 17 PMC members.
- No new PMC members added in the last 3 months
- Last PMC addition was Tim Allison on Mon Sep 19 2016

## Committer base changes:
- Currently 17 committers.
- No new committers added in the last 3 months
- Last committer addition was Tim Allison on Mon Sep 19 2016

## Releases:
- 2.0.5 was released on Fri Mar 17 2017

## JIRA activity:
- 105 JIRA tickets created in the last 3 months
- 106 JIRA tickets closed/resolved in the last 3 months