hi tilman,

thank you for your reply.

re 1: "Please upload your file to a sharehoster"
I am currently asking for permission to do so, as the file belongs to a
customer

re 2: "Does it happen when the file is opened with one of the command line
tools, or from code?"
both: I can reproduce the error from the command line with
java org.apache.pdfbox.tools.PDFBox export:xfdf -i ${file} -o whatever.xfdf
Jan 11, 2026 1:40:00 PM org.apache.pdfbox.pdfparser.COSParser
validateStreamLength
WARNING: The end of the stream doesn't point to the correct offset, using
workaround to read the stream, stream start position: 827917, length: 2118,
expected end position: 830035
Jan 11, 2026 1:40:00 PM org.apache.pdfbox.pdfparser.COSParser
validateStreamLength
WARNING: The end of the stream doesn't point to the correct offset, using
workaround to read the stream, stream start position: 934902, length: 1097,
expected end position: 935999
*Error exporting XFDF data [IOException]: Page tree root must be a
dictionary*

re 3: "If code, what is the smallest code that does it?"
this is not limited to code; the easiest way to reproduce the issue from
code is simply to open the file as a PDF:

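import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;

public class PDFSample {
    public static void main (String[] args) throws IOException {
        if (1 > args.length) {
            System.out.println ("usage: PDFSample ${file}");
            return;
        }
        // only the first argument is used
        Path path = Paths.get (args [0]).toAbsolutePath ();
        if (!Files.isReadable (path)) {
            System.out.println ("cannot read " + path);
            return;
        }
        // loading alone is enough to trigger the IOException
        try (PDDocument ignore = Loader.loadPDF (path.toFile ())) {
            System.out.println ("doc " + path + " loaded");
        }
    }
}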

... supplying the file name as 1st argument

re 4: "Is the file local or downloaded from a server?"
this can be reproduced with a local file

re 5: "If yes, is the downloaded file the same that you get locally?"
this is not applicable, as the issue can be reproduced with a local file
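
to make Q1 from my first message (quoted below) more concrete, here is a
minimal, self-contained sketch of the idea. It is not PDFBox code: the class
XrefEntrySketch, its Entry holder, the decode and sampleData methods and the
skipZeroOffset flag are all hypothetical names, the sample bytes are made up,
and I assume W = [1 3 1] with object numbers starting at 0. Unlike the snippet
in Q1, the sketch only skips type 1 entries with offset 0 and leaves type 2
entries alone; that restriction is my own assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-alone illustration only; none of this is PDFBox API. */
public class XrefEntrySketch {

    /** Hypothetical holder for one decoded xref stream entry. */
    public static final class Entry {
        public final int objectNumber;
        public final int type;
        public final long second; // offset (type 1) or object stream number (type 2)
        public final int third;   // generation (type 1) or index in object stream (type 2)

        Entry(int objectNumber, int type, long second, int third) {
            this.objectNumber = objectNumber;
            this.type = type;
            this.second = second;
            this.third = third;
        }

        @Override
        public String toString() {
            return "obj " + objectNumber + ": type=" + type
                    + " second=" + second + " third=" + third;
        }
    }

    /** Reads 'length' bytes starting at 'start' as a big-endian unsigned value. */
    public static long parseValue(byte[] data, int start, int length) {
        long value = 0;
        for (int i = 0; i < length; i++) {
            value = (value << 8) | (data[start + i] & 0xFF);
        }
        return value;
    }

    /** Decodes fixed-width entries; optionally drops type 1 entries with offset 0. */
    public static List<Entry> decode(byte[] data, int[] w, boolean skipZeroOffset) {
        int entrySize = w[0] + w[1] + w[2];
        if (entrySize <= 0) {
            throw new IllegalArgumentException("field widths must sum to a positive size");
        }
        List<Entry> entries = new ArrayList<>();
        int objectNumber = 0; // assumption: numbering starts at 0 and is contiguous
        for (int pos = 0; pos + entrySize <= data.length; pos += entrySize, objectNumber++) {
            // a zero-width first field means the type defaults to 1
            int type = (w[0] == 0) ? 1 : (int) parseValue(data, pos, w[0]);
            if (type == 0) {
                continue; // free entry, nothing to record
            }
            long second = parseValue(data, pos + w[0], w[1]);
            if (skipZeroOffset && type == 1 && second == 0) {
                continue; // suspicious entry: no object can start at offset 0
            }
            int third = (int) parseValue(data, pos + w[0] + w[1], w[2]);
            entries.add(new Entry(objectNumber, type, second, third));
        }
        return entries;
    }

    /** Made-up entries for W = [1 3 1]. */
    public static byte[] sampleData() {
        return new byte[] {
            0, 0, 0, 0, 0,  // object 0: free entry
            1, 0, 0, 15, 0, // object 1: type 1, offset 15, generation 0
            1, 0, 0, 0, 0,  // object 2: type 1, offset 0 (the suspicious case)
            2, 0, 0, 5, 1   // object 3: type 2, object stream 5, index 1
        };
    }

    public static void main(String[] args) {
        int[] w = {1, 3, 1};
        System.out.println("without the check:");
        decode(sampleData(), w, false).forEach(System.out::println);
        System.out.println("with the check:");
        decode(sampleData(), w, true).forEach(System.out::println);
    }
}
```

note that the object number keeps advancing when an entry is skipped, so the
entries that follow the dropped one keep their numbering. Whether a check like
this is acceptable inside PDFXrefStreamParser is of course for you to judge.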


I will keep you updated, tilman, on whether I can share the file,
as soon as I get an answer from the customer.

thank you again for your reply.
have a nice weekend,


On Sun, Jan 11, 2026 at 11:30 AM Tilman Hausherr <[email protected]>
wrote:

> Hi,
> Please upload your file to a sharehoster.
> Does it happen when the file is opened with one of the command line
> tools, or from code? If code, what is the smallest code that does it? Is
> the file local or downloaded from a server? If yes, is the downloaded
> file the same that you get locally?
> Tilman
>
> On 10.01.2026 at 23:54, mountain the blue wrote:
> > hi,
> >
> > I have encountered an issue attempting to open a pdf file generated by
> > word365 with an old version of pdfbox 2.x.
> >
> > I was able to reproduce the same error on the latest version of pdfbox
> > 3.0.6+ with sample code that just tries to open such a pdf file.
> >
> > the error reports:
> > Jan 10, 2026 2:13:23 PM org.apache.pdfbox.pdfparser.COSParser
> > validateStreamLength
> > WARNING: The end of the stream doesn't point to the correct offset, using
> > workaround to read the stream, stream start position: 827917, length: 2118,
> > expected end position: 830035
> > Jan 10, 2026 2:13:23 PM org.apache.pdfbox.pdfparser.COSParser
> > validateStreamLength
> > WARNING: The end of the stream doesn't point to the correct offset, using
> > workaround to read the stream, stream start position: 934902, length: 1097,
> > expected end position: 935999
> > Exception in thread "main" java.io.IOException: Page tree root must be a
> > dictionary
> > at org.apache.pdfbox.pdfparser.COSParser.checkPages(COSParser.java:1416)
> > at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:120)
> > at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:171)
> > at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:136)
> > at org.apache.pdfbox.Loader.loadPDF(Loader.java:483)
> > at org.apache.pdfbox.Loader.loadPDF(Loader.java:359)
> >
> > looking further at the code, it decides to do 'brute force' parsing ... and
> > does not find the expected Pages entry.
> >
> > What happens is that the method
> > org.apache.pdfbox.pdfparser.PDFXrefStreamParser#parse () is called and, at
> > some stage, records an xref entry with offset 0 ...
> > that entry is later verified by
> > org.apache.pdfbox.pdfparser.COSParser#validateXrefOffsets (), which cannot
> > resolve an object for that offset
> > (see org.apache.pdfbox.pdfparser.COSParser#findObjectKey) ... and therefore
> > resets the parsing ... triggering the 'brute force' approach.
> >
> > The org.apache.pdfbox.pdfparser.PDFXrefStreamParser#parse () method
> > currently does ...
> >
> > ...
> >
> > // second field holds the offset (type 1) or the object stream number (type 2)
> > long offset = parseValue(currLine, w[0], w[1]);
> > // third field may hold the generation number (type 1) or the index within an object stream (type 2)
> > int thirdValue = (int) parseValue(currLine, w[0] + w[1], w[2]);
> >
> > ...
> >
> >
> > *Q1*: can we add a test in the code to exclude the recording of an xref
> > entry if the offset is either less than 6
> > (org.apache.pdfbox.pdfparser.COSParser#MINIMUM_SEARCH_OFFSET) ... or if it
> > is 0 ... so that pdfbox can accept such incorrect file(s)?
> >
> > ie:
> >
> > // second field holds the offset (type 1) or the object stream number (type 2)
> > long offset = parseValue(currLine, w[0], w[1]);
> > if (0 == offset) {    // found some incorrect PDF files that were showing such an xref entry
> >     continue;
> > }
> > // third field may hold the generation number (type 1) or the index within an object stream (type 2)
> > int thirdValue = (int) parseValue(currLine, w[0] + w[1], w[2]);
> >
> >
> > If pdfbox cannot be changed to accommodate such files ...
> >
> > *Q2*: would you have any recommendation to share?
> >
> > thank you,
> >