Hi,
I fixed the first bug. I'll see if I can reproduce the second one with
the many files I have. It might take some time before the new version
appears in the repository, as there are currently build problems.
JIRA is the best place to discuss the problems.
I'd like to get both files, even though the bug was easy to fix.
I'm wondering what happened (maybe no parent tree at all?).
Tilman
On 19.05.2025 17:59, Alastair Porter wrote:
Hi,
With some of our PDFs I get two different errors:
1:
java.lang.NullPointerException
at org.apache.pdfbox.multipdf.Splitter.cloneStructureTree(Splitter.java:238)
at org.apache.pdfbox.multipdf.Splitter.split(Splitter.java:145)
at org.apache.pdfbox.tools.PDFSplit.call(PDFSplit.java:133)
at org.apache.pdfbox.tools.PDFSplit.call(PDFSplit.java:41)
at picocli.CommandLine.executeUserObject(CommandLine.java:2031)
This appears to be related to your change in rev 1925636 (PDFBOX-6009: get
ParentTreeNextKey from tree). I note that with the commit before this
change, the splitter runs and generates files, but I've not yet verified
the accuracy of the structure tree.
This appears to happen on files whose /K entries have no /Pg element.
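For illustration only, here is a minimal sketch of how a "next key" lookup can guard against a missing parent tree. It is not the PDFBox Splitter code: the class ParentTreeKeys, the method nextKey, and the use of a plain Map as a stand-in for the number tree are all hypothetical, and starting at 0 when no tree exists is an assumption of the sketch.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a parent tree lookup; not a PDFBox class.
final class ParentTreeKeys {
    /**
     * Returns one past the largest key in the tree, or 0 when the tree
     * is absent (null) or empty. The null check is the point of the sketch:
     * without it, a document with no parent tree would trigger an NPE.
     */
    static int nextKey(Map<Integer, ?> parentTree) {
        if (parentTree == null || parentTree.isEmpty()) {
            return 0; // assumption: numbering starts at 0 when no tree exists
        }
        return Collections.max(parentTree.keySet()) + 1;
    }

    public static void main(String[] args) {
        Map<Integer, String> tree = new TreeMap<>();
        tree.put(0, "page 1");
        tree.put(3, "page 2");
        System.out.println(nextKey(tree)); // largest key is 3
        System.out.println(nextKey(null)); // no parent tree at all
    }
}
```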
2:
Exception in thread "main" java.lang.StackOverflowError
at java.base/java.lang.StringCoding.encodeUTF8(StringCoding.java:909)
at java.base/java.lang.StringCoding.encode(StringCoding.java:449)
at java.base/java.lang.String.getBytes(String.java:964)
at org.apache.pdfbox.cos.COSName.writePDF(COSName.java:778)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSName(COSWriterObjectStream.java:308)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:232)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:352)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:240)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:354)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:240)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSArray(COSWriterObjectStream.java:329)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:236)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:354)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:240)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:354)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:240)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSArray(COSWriterObjectStream.java:329)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:236)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:354)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:240)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSArray(COSWriterObjectStream.java:329)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:236)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:354)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:240)
at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:354)
... continues
I also get this stack overflow on one of the sample files that I
successfully tested on Friday, so it's possible that a change since then
has caused this.
This appears to happen on files whose /K entries do have a /Pg (on files
with no /Pg I get the NPE first).
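As background, the repeating dictionary/array frames in the trace are what a recursive writer produces when it keeps descending into nested objects. One way this can become unbounded is a cycle, for example a child dictionary whose /P entry points back at its parent. Whether that is what happens in these files is not established in this thread; very deep but acyclic nesting would overflow the stack the same way. The sketch below is hypothetical (CycleSafeWriter is not a PDFBox class, and Map and List stand in for dictionaries and arrays). It shows only the general guard technique: track the objects currently being written by identity and stop when one is revisited.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical recursive writer with a cycle guard; not PDFBox code.
final class CycleSafeWriter {
    // Containers currently on the recursion path, compared by identity
    // so that no hashCode()/equals() call walks the (possibly cyclic) graph.
    private final Set<Object> inProgress =
            Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());

    String write(Object obj) {
        boolean container = obj instanceof Map || obj instanceof List;
        if (!container) {
            return String.valueOf(obj); // leaf value: no recursion needed
        }
        if (!inProgress.add(obj)) {
            return "<cycle>"; // already being written further up the stack
        }
        try {
            StringBuilder sb = new StringBuilder();
            if (obj instanceof Map) {
                sb.append("<<");
                for (Map.Entry<?, ?> e : ((Map<?, ?>) obj).entrySet()) {
                    sb.append('/').append(e.getKey()).append(' ')
                      .append(write(e.getValue())).append(' ');
                }
                sb.append(">>");
            } else {
                sb.append('[');
                for (Object item : (List<?>) obj) {
                    sb.append(write(item)).append(' ');
                }
                sb.append(']');
            }
            return sb.toString();
        } finally {
            inProgress.remove(obj); // leaving this container again
        }
    }

    public static void main(String[] args) {
        Map<String, Object> parent = new LinkedHashMap<>();
        List<Object> kids = new ArrayList<>();
        Map<String, Object> child = new LinkedHashMap<>();
        parent.put("K", kids);
        kids.add(child);
        child.put("P", parent); // back-reference closes the cycle
        System.out.println(new CycleSafeWriter().write(parent));
    }
}
```

Without the inProgress check, the same input recurses until the JVM throws a StackOverflowError.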
I'm currently checking whether we can share these documents with you privately.
Please let me know if it would be useful for debugging.
I have an account on Apache JIRA; please let me know if you'd prefer to
continue there, or if it's OK to use the mailing list.
Thanks,
Alastair
On Sat, 17 May 2025 at 13:23, Tilman Hausherr <thaush...@t-online.de> wrote:
Hi,
Make sure to download the software again; I found and fixed another bug.
Tilman
On 16.05.2025 21:36, Alastair Porter wrote:
Hi Tilman,
It looks like this works! Thanks for the prompt response and fix. I've
checked a few test files which I have and their splits now include the
expected tags. I'll send this to the rest of our team next week for them to
review in more detail, but it looks like things are working here for us.
Thanks again.
Alastair
On Fri, 16 May 2025 at 16:17, Tilman Hausherr <thaush...@t-online.de> wrote:
Please try with a snapshot:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/4.0.0-SNAPSHOT/
Now elements without /Pg entry are removed only if they have MCIDs. Note
that the "new" second page doesn't pass the PAC test but this is because
it starts with H2.
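The rule stated above for elements without a /Pg entry can be sketched as follows. This is a simplified illustration, not the Splitter implementation: StructElemFilter, Elem, and the boolean fields are hypothetical names, and reading "only if they have MCIDs" as the complete condition is an assumption. Elements that do have a /Pg entry are subject to other checks that the sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the removal rule; not PDFBox code.
final class StructElemFilter {
    /** Simplified view of a structure element (assumed fields). */
    static final class Elem {
        final String type;      // e.g. "H2", "P", "Sect"
        final boolean hasPg;    // element carries a /Pg entry
        final boolean hasMcids; // element has marked-content IDs as kids

        Elem(String type, boolean hasPg, boolean hasMcids) {
            this.type = type;
            this.hasPg = hasPg;
            this.hasMcids = hasMcids;
        }
    }

    /** True when an element is dropped because it lacks /Pg but has MCIDs. */
    static boolean removedForMissingPg(Elem e) {
        return !e.hasPg && e.hasMcids;
    }

    /** Returns the types of the elements that survive the rule, in order. */
    static List<String> keptTypes(List<Elem> elems) {
        List<String> kept = new ArrayList<>();
        for (Elem e : elems) {
            if (!removedForMissingPg(e)) {
                kept.add(e.type);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Elem> elems = new ArrayList<>();
        elems.add(new Elem("H2", true, true));      // has /Pg: not covered by this rule
        elems.add(new Elem("P", false, true));      // no /Pg, has MCIDs: removed
        elems.add(new Elem("Sect", false, false));  // no /Pg, no MCIDs: kept
        System.out.println(keptTypes(elems));
    }
}
```

Under this reading, grouping elements such as a section without its own /Pg and without MCIDs stay in the tree.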
Please try the new version on other PDFs that had this problem.
Tilman
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org