[jira] [Commented] (PDFBOX-5209) Using Chinese character make the file size increases
[ https://issues.apache.org/jira/browse/PDFBOX-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359746#comment-17359746 ]

LI MING commented on PDFBOX-5209:

OK, I saw the updated remark, thanks. I hope there will be a way to solve this in the future.

> Using Chinese character make the file size increases
>
> Key: PDFBOX-5209
> URL: https://issues.apache.org/jira/browse/PDFBOX-5209
> Project: PDFBox
> Issue Type: Improvement
> Components: AcroForm
> Affects Versions: 2.0.15
> Environment: java jdk 1.8
> Reporter: LI MING
> Priority: Blocker
> Labels: FileSize
>
> As the title says, we use Chinese characters to generate a PDF form file. It succeeds, but the file size is larger than 10 MB. Apart from changing the font file, is there any other way we can solve this problem?

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Updated] (PDFBOX-5209) Using Chinese character make the file size increases
[ https://issues.apache.org/jira/browse/PDFBOX-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LI MING updated PDFBOX-5209:

Issue Type: Improvement (was: Bug)
[jira] [Commented] (PDFBOX-5209) Using Chinese character make the file size increases
[ https://issues.apache.org/jira/browse/PDFBOX-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359737#comment-17359737 ]

Tilman Hausherr commented on PDFBOX-5209:

(This is related to the last 4 remarks of PDFBOX-4629. Using a subsetted font for AcroForm doesn't work, because the appearance content stream doesn't know its own PDDocument, so the font isn't put into the subset list and ends up not being embedded at all.)
[jira] [Commented] (PDFBOX-5209) Using Chinese character make the file size increases
[ https://issues.apache.org/jira/browse/PDFBOX-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359735#comment-17359735 ]

LI MING commented on PDFBOX-5209:

It is a little similar to this question in the comments.
[jira] [Created] (PDFBOX-5209) Using Chinese character make the file size increases
LI MING created PDFBOX-5209:

Summary: Using Chinese character make the file size increases
Key: PDFBOX-5209
URL: https://issues.apache.org/jira/browse/PDFBOX-5209
Project: PDFBox
Issue Type: Bug
Components: AcroForm
Affects Versions: 2.0.15
Environment: java jdk 1.8
Reporter: LI MING

As the title says, we use Chinese characters to generate a PDF form file. It succeeds, but the file size is larger than 10 MB. Apart from changing the font file, is there any other way we can solve this problem?
[jira] [Commented] (PDFBOX-4629) WARNING: Using fallback font 'LiberationSans' for 'LiberationSans'
[ https://issues.apache.org/jira/browse/PDFBOX-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359731#comment-17359731 ] Tilman Hausherr commented on PDFBOX-4629: - Use a smaller font :-( > WARNING: Using fallback font 'LiberationSans' for 'LiberationSans' > -- > > Key: PDFBOX-4629 > URL: https://issues.apache.org/jira/browse/PDFBOX-4629 > Project: PDFBox > Issue Type: Bug >Affects Versions: 2.0.15, 2.0.16 >Reporter: Nina Sutter >Priority: Minor > Attachments: image-2019-08-12-08-39-13-652.png, > image-2019-08-12-08-43-29-117.png, image-2019-08-12-08-44-58-434.png, > image-2019-08-12-09-25-11-355.png, image-2019-08-12-09-27-27-416.png > > > Hey everyone, > I use pdfbox version 2.0.15 deployed on AWS Lambda. > My template is done in LibreOffice and the fields are set-up in font > Liberation Sans. > When I fill the fields in the pdf I get the following log message on > CloudWatch: > WARNING: Using fallback font 'LiberationSans' for 'LiberationSans' > > For me this message doesn't tell me what the actual problem is. The pdf still > gets filled with the required data, however this message puzzles me. Maybe > you can help me understanding the issue. > > > Best regards -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-4629) WARNING: Using fallback font 'LiberationSans' for 'LiberationSans'
[ https://issues.apache.org/jira/browse/PDFBOX-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359729#comment-17359729 ]

LI MING commented on PDFBOX-4629:

Excuse me, regarding the problem that the rendered PDF file is larger than 10 MB: is there any way to solve it?
[jira] [Commented] (PDFBOX-4892) Improve code quality (4)
[ https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359502#comment-17359502 ] ASF subversion and git services commented on PDFBOX-4892: - Commit 1890618 from Tilman Hausherr in branch 'pdfbox/trunk' [ https://svn.apache.org/r1890618 ] PDFBOX-4892: optimize, as suggested by valerybokov > Improve code quality (4) > > > Key: PDFBOX-4892 > URL: https://issues.apache.org/jira/browse/PDFBOX-4892 > Project: PDFBox > Issue Type: Improvement >Affects Versions: 2.0.20 >Reporter: Tilman Hausherr >Priority: Minor > > This is a longterm issue for the task to improve code quality, by using the > [SonarQube report|https://sonarcloud.io/project/issues?id=pdfbox-reactor], > hints in different IDEs, the FindBugs tool and other code quality tools. > This is a follow-up of PDFBOX-4071, which was getting too long. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-4892) Improve code quality (4)
[ https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359501#comment-17359501 ] ASF subversion and git services commented on PDFBOX-4892: - Commit 1890617 from Tilman Hausherr in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1890617 ] PDFBOX-4892: optimize, as suggested by valerybokov > Improve code quality (4) > > > Key: PDFBOX-4892 > URL: https://issues.apache.org/jira/browse/PDFBOX-4892 > Project: PDFBox > Issue Type: Improvement >Affects Versions: 2.0.20 >Reporter: Tilman Hausherr >Priority: Minor > > This is a longterm issue for the task to improve code quality, by using the > [SonarQube report|https://sonarcloud.io/project/issues?id=pdfbox-reactor], > hints in different IDEs, the FindBugs tool and other code quality tools. > This is a follow-up of PDFBOX-4071, which was getting too long. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-4892) Improve code quality (4)
[ https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359477#comment-17359477 ] ASF subversion and git services commented on PDFBOX-4892: - Commit 1890615 from Tilman Hausherr in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1890615 ] PDFBOX-4892: optimize, as suggested by valerybokov > Improve code quality (4) > > > Key: PDFBOX-4892 > URL: https://issues.apache.org/jira/browse/PDFBOX-4892 > Project: PDFBox > Issue Type: Improvement >Affects Versions: 2.0.20 >Reporter: Tilman Hausherr >Priority: Minor > > This is a longterm issue for the task to improve code quality, by using the > [SonarQube report|https://sonarcloud.io/project/issues?id=pdfbox-reactor], > hints in different IDEs, the FindBugs tool and other code quality tools. > This is a follow-up of PDFBOX-4071, which was getting too long. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-4892) Improve code quality (4)
[ https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359478#comment-17359478 ] ASF subversion and git services commented on PDFBOX-4892: - Commit 1890616 from Tilman Hausherr in branch 'pdfbox/trunk' [ https://svn.apache.org/r1890616 ] PDFBOX-4892: optimize, as suggested by valerybokov > Improve code quality (4) > > > Key: PDFBOX-4892 > URL: https://issues.apache.org/jira/browse/PDFBOX-4892 > Project: PDFBox > Issue Type: Improvement >Affects Versions: 2.0.20 >Reporter: Tilman Hausherr >Priority: Minor > > This is a longterm issue for the task to improve code quality, by using the > [SonarQube report|https://sonarcloud.io/project/issues?id=pdfbox-reactor], > hints in different IDEs, the FindBugs tool and other code quality tools. > This is a follow-up of PDFBOX-4071, which was getting too long. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
Re: [VOTE] Release Apache PDFBox 2.0.24
Thanks!

+1

Tilman

On 07.06.2021 at 18:51, Andreas Lehmkuehler wrote:
> Hi,
>
> a candidate for the PDFBox 2.0.24 release is available at:
>
> https://dist.apache.org/repos/dist/dev/pdfbox/2.0.24/
>
> The release candidate is a zip archive of the sources in:
>
> http://svn.apache.org/repos/asf/pdfbox/tags/2.0.24/
>
> The SHA-512 checksum of the archive is
> 5d55b3cadbbae266d90c47f5b10c9b09b6dc16f53b77a0cf15c78e62fc69afc7b6eab5a4329608ecdf25de9194b38db1f7d23e7d71af473cc1bf7b09b0028642.
>
> Please vote on releasing this package as Apache PDFBox 2.0.24.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 PDFBox PMC votes are cast.
>
> [ ] +1 Release this package as Apache PDFBox 2.0.24
> [ ] -1 Do not release this package because...
>
> Here is my +1
>
> Andreas
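For anyone who wants to check the release candidate before voting, the SHA-512 value quoted in the vote mail can be compared against a locally computed digest of the downloaded archive. The following is only a minimal sketch using the JDK's `MessageDigest`; the class `Sha512Check` and its method names are hypothetical and are not part of PDFBox or of the release tooling.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of PDFBox.
final class Sha512Check {

    // Every Java platform is required to provide SHA-512, so a failure here is unexpected.
    private static MessageDigest newDigest() {
        try {
            return MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 not available", e);
        }
    }

    // Encodes the digest bytes as lowercase hex, two characters per byte.
    private static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Digest of an in-memory byte array.
    static String sha512Hex(byte[] data) {
        return toHex(newDigest().digest(data));
    }

    // Digest of a file, read in chunks so large archives are not loaded into memory.
    static String sha512Hex(Path file) {
        MessageDigest md = newDigest();
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return toHex(md.digest());
    }

    // Compares two hex strings, ignoring case and surrounding whitespace in the expected value.
    static boolean matches(String actualHex, String expectedHex) {
        return actualHex.equalsIgnoreCase(expectedHex.trim());
    }
}
```

A caller would pass the path of the downloaded zip to `sha512Hex(Path)` and the checksum string from the vote mail to `matches`. The local file name depends on where the archive was saved, so it is left to the caller.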
Re: [VOTE] Release Apache PDFBox 2.0.24
+1

Thanks for preparing the release.

Maruan
[jira] [Commented] (PDFBOX-5198) When merging multiple pdf ua documents, Tags become nested
[ https://issues.apache.org/jira/browse/PDFBOX-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359359#comment-17359359 ]

Matthew Jung commented on PDFBOX-5198:

Hi Hausherr,

It looks like the issue happens when the PDF file is not tagged correctly as PDF/UA. If the PDF is tagged correctly, it works fine.

Matt

On Monday, June 7, 2021, 10:31:01 PM EDT, Tilman Hausherr (Jira) wrote:

> [ https://issues.apache.org/jira/browse/PDFBOX-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358997#comment-17358997 ]
>
> Tilman Hausherr commented on PDFBOX-5198:
>
> The release build is planned for today, so if you get examples later, please create a new issue.

> When merging multiple pdf ua documents, Tags become nested
>
> Key: PDFBOX-5198
> URL: https://issues.apache.org/jira/browse/PDFBOX-5198
> Project: PDFBox
> Issue Type: Wish
> Components: Utilities
> Affects Versions: 2.0.21, 2.0.23
> Reporter: Matthew Jung
> Assignee: Tilman Hausherr
> Priority: Major
> Fix For: 2.0.24, 3.0.0 PDFBox
>
> Attachments: 1622000586495blob.jpg, 1622120149457blob.jpg, 1622120149457blob.jpg, 1622123253165blob.jpg, 1622123790854blob.jpg, 1623105725988blob.jpg, 1623105725988blob.jpg, 1623115281967blob.jpg, Binder1.pdf, PDFA3A-merged-new.pdf, PDFUA-in-a-Nutshell-PDFUA_1.pdf, nested_tags_4documents_merged_using_pdfbox.tif, non_nested_tags_4documents_combined_using+adobe_pro.tif, screenshot-1.png
>
> When merging PDF/UA documents, the tags seen in Adobe Reader are nested. If 200 documents are merged, the tags are nested 200 levels deep. This does not appear to prevent the JAWS reader from reading the document, but it may slow down performance when loading to a content repository.
>
> When Adobe DC is used to merge multiple documents, the tags are flattened.
Re: [VOTE] Release Apache PDFBox 2.0.24
+1

Thank you!
[jira] [Updated] (PDFBOX-5207) Page not rendered / extracted, Unknown type in array for TJ operation
[ https://issues.apache.org/jira/browse/PDFBOX-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler updated PDFBOX-5207: --- Fix Version/s: 2.0.25 > Page not rendered / extracted, Unknown type in array for TJ operation > - > > Key: PDFBOX-5207 > URL: https://issues.apache.org/jira/browse/PDFBOX-5207 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Affects Versions: 2.0.23 >Reporter: Tilman Hausherr >Assignee: Andreas Lehmkühler >Priority: Major > Labels: regression > Fix For: 2.0.25 > > Attachments: ContentStream.txt, evince-395-0.zip-0.pdf > > > Worked in 2.0.23, no longer now. The weird thing is that the content stream > (attached) is the same. It contains a "[" in an array at offset 4211. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Assigned] (PDFBOX-5207) Page not rendered / extracted, Unknown type in array for TJ operation
[ https://issues.apache.org/jira/browse/PDFBOX-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler reassigned PDFBOX-5207: -- Assignee: Andreas Lehmkühler > Page not rendered / extracted, Unknown type in array for TJ operation > - > > Key: PDFBOX-5207 > URL: https://issues.apache.org/jira/browse/PDFBOX-5207 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Affects Versions: 2.0.23 >Reporter: Tilman Hausherr >Assignee: Andreas Lehmkühler >Priority: Major > Labels: regression > Attachments: ContentStream.txt, evince-395-0.zip-0.pdf > > > Worked in 2.0.23, no longer now. The weird thing is that the content stream > (attached) is the same. It contains a "[" in an array at offset 4211. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-5207) Page not rendered / extracted, Unknown type in array for TJ operation
[ https://issues.apache.org/jira/browse/PDFBOX-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359068#comment-17359068 ] ASF subversion and git services commented on PDFBOX-5207: - Commit 1890583 from le...@apache.org in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1890583 ] PDFBOX-5207: skip nested arrays instead of throwing an IOException > Page not rendered / extracted, Unknown type in array for TJ operation > - > > Key: PDFBOX-5207 > URL: https://issues.apache.org/jira/browse/PDFBOX-5207 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Affects Versions: 2.0.23 >Reporter: Tilman Hausherr >Priority: Major > Labels: regression > Attachments: ContentStream.txt, evince-395-0.zip-0.pdf > > > Worked in 2.0.23, no longer now. The weird thing is that the content stream > (attached) is the same. It contains a "[" in an array at offset 4211. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-5207) Page not rendered / extracted, Unknown type in array for TJ operation
[ https://issues.apache.org/jira/browse/PDFBOX-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359070#comment-17359070 ] ASF subversion and git services commented on PDFBOX-5207: - Commit 1890584 from le...@apache.org in branch 'pdfbox/trunk' [ https://svn.apache.org/r1890584 ] PDFBOX-5207: skip nested arrays instead of throwing an IOException > Page not rendered / extracted, Unknown type in array for TJ operation > - > > Key: PDFBOX-5207 > URL: https://issues.apache.org/jira/browse/PDFBOX-5207 > Project: PDFBox > Issue Type: Bug > Components: Parsing >Affects Versions: 2.0.23 >Reporter: Tilman Hausherr >Priority: Major > Labels: regression > Attachments: ContentStream.txt, evince-395-0.zip-0.pdf > > > Worked in 2.0.23, no longer now. The weird thing is that the content stream > (attached) is the same. It contains a "[" in an array at offset 4211. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-5207) Page not rendered / extracted, Unknown type in array for TJ operation
[ https://issues.apache.org/jira/browse/PDFBOX-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359065#comment-17359065 ]

Andreas Lehmkühler commented on PDFBOX-5207:

AFAIK nested arrays are not allowed as operands for a TJ operator. The PDF in question has at least one malformed array (nested array, unbalanced number of square brackets). Before PDFBOX-5190 those arrays were skipped; now the parser reads as much as possible. Those nested arrays lead to an IOException in {{org.apache.pdfbox.contentstream.PDFStreamEngine.showTextStrings(COSArray)}}.

I'm thinking about skipping such nested arrays and continuing with the remaining part. In the current case the rendering is improved!

BTW: we should think about refactoring {{org.apache.pdfbox.pdfparser.PDFStreamParser}}. It uses COS objects when parsing a content stream. Although such content is very similar to COS objects, it isn't the same thing. A refactoring should simplify the parsing and reduce the resources used. But that is another story ...
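The idea described in the comment above (skip a nested array inside a TJ operand array and continue with the remaining operands, instead of failing) can be sketched roughly as follows. This is not the PDFBox code: plain Java lists stand in for `COSArray`, the class and method names are hypothetical, and the decision to still reject any other unknown operand type is an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only, not PDFBox types:
//   String = text to show, Number = glyph position adjustment,
//   List   = a (malformed) nested array inside the TJ operand array.
final class TjOperandSketch {

    // Returns the operands a TJ handler would act on, dropping nested arrays.
    static List<Object> usableOperands(List<?> tjArray) {
        List<Object> usable = new ArrayList<>();
        for (Object item : tjArray) {
            if (item instanceof String || item instanceof Number) {
                usable.add(item);
            } else if (item instanceof List) {
                // Nested array: skip it and carry on with the remaining operands.
                continue;
            } else {
                // Assumption for this sketch: any other type is still treated as an error.
                throw new IllegalArgumentException(
                        "Unknown type in array for TJ operation: " + item);
            }
        }
        return usable;
    }
}
```

With an input such as `["Hel", -120, ["bad"], "lo"]`, the nested list is dropped and the two strings and the adjustment are kept in their original order, so the text around the malformed part can still be processed.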