Please try the current version, 1.8.12; maybe it is fixed there. (I see no
"streamOffset != prev" anywhere; maybe you mean something else?) If
not, check whether it is fixed in the version on svn,
https://svn.apache.org/viewvc/pdfbox/branches/1.8/pdfbox/src/main/java/org/apache/pdfbox/pdfparser/NonSequentialPDFParser.java?view=markup
and if not, please open an issue in JIRA, preferably with a diff.
Tilman
On 21.11.2016 at 23:44, Brzrk One wrote:
Oops... I left out that this is PDFBox 1.8.9...
On Mon, Nov 21, 2016 at 5:12 PM, Brzrk One <[email protected]> wrote:
I have a PDF file (which I cannot share) with the trailer:
trailer
<<
/Size 16922
/Root 1 0 R
/Info 9 0 R
/ID [<495BB8DD62106B9AB4E6E1C8B591C982> <91EB7F87537B4838AF45C0D28A988280>]
/XRefStm 5347791
>>
startxref
5135270
But there is only a single classic xref table in this PDF file; there is no
object with /Type /XRef.
In this situation, NonSequentialPDFParser.parseXref() enters the
XREF_STM block. Since there is no object with /Type /XRef at
offset 5347791 (a position that lands smack dab in the middle of the xref
table), it does a brute-force search for an xref and gets back offset
5135270, which is the location of the one and only xref table in the file,
and then tries to parse that classic table with parseXrefObjStream().
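To make the failure mode concrete, here is a rough Java sketch that classifies what sits at a byte offset such as the /XRefStm value: a classic "xref" keyword, an indirect object whose dictionary contains /Type /XRef, some other object, or nothing usable (e.g. the middle of an xref table, as with 5347791 here). XrefOffsetProbe, probe() and Kind are made-up names for illustration, not PDFBox API, and the /Type /XRef test is a crude text heuristic, not real dictionary parsing.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of PDFBox): classify what sits at a byte
 * offset such as the one stored in /XRefStm.
 */
public class XrefOffsetProbe {

    enum Kind { CLASSIC_TABLE, XREF_STREAM, OTHER_OBJECT, NOTHING_USEFUL }

    // "N G obj" header of an indirect object at the start of the probed text
    private static final Pattern OBJ_HEADER = Pattern.compile("\\d+\\s+\\d+\\s+obj\\b");
    // Heuristic (assumption): an xref stream's dictionary contains /Type /XRef
    private static final Pattern TYPE_XREF = Pattern.compile("/Type\\s*/XRef\\b");

    static boolean isPdfWhitespace(byte b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    static Kind probe(byte[] pdf, int offset) {
        if (offset < 0 || offset >= pdf.length) {
            return Kind.NOTHING_USEFUL;
        }
        int start = offset;
        while (start < pdf.length && isPdfWhitespace(pdf[start])) {
            start++; // skip leading PDF whitespace before looking for a token
        }
        // Decode a bounded window; ISO-8859-1 maps every byte to exactly one char
        int end = Math.min(pdf.length, start + 1024);
        String s = new String(pdf, start, end - start, StandardCharsets.ISO_8859_1);

        if (s.startsWith("xref")) {
            return Kind.CLASSIC_TABLE;
        }
        Matcher m = OBJ_HEADER.matcher(s);
        if (!m.lookingAt()) {
            return Kind.NOTHING_USEFUL; // e.g. "0000000015 00000 n" inside a table
        }
        // Only inspect the object's dictionary: stop at "stream" or "endobj"
        int stop = s.length();
        int streamKw = s.indexOf("stream", m.end());
        if (streamKw >= 0) stop = streamKw;
        int endObj = s.indexOf("endobj", m.end());
        if (endObj >= 0 && endObj < stop) stop = endObj;
        String dict = s.substring(m.end(), stop);
        return TYPE_XREF.matcher(dict).find() ? Kind.XREF_STREAM : Kind.OTHER_OBJECT;
    }

    public static void main(String[] args) {
        // Tiny hand-made byte layout (not a valid PDF), just to exercise probe()
        String text = "1 0 obj\n<< /Type /Catalog >>\nendobj\n"
                + "2 0 obj\n<< /Type /XRef /Size 3 >>\nstream\n\nendstream\nendobj\n"
                + "xref\n0 3\n0000000000 65535 f \n0000000000 00000 n \n";
        byte[] pdf = text.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(probe(pdf, text.indexOf("1 0 obj")));          // OTHER_OBJECT
        System.out.println(probe(pdf, text.indexOf("2 0 obj")));          // XREF_STREAM
        System.out.println(probe(pdf, text.indexOf("xref")));             // CLASSIC_TABLE
        System.out.println(probe(pdf, text.indexOf("0000000000 00000"))); // NOTHING_USEFUL
    }
}
```

An offset that lands inside the table's entries comes back as NOTHING_USEFUL, which is the situation that pushes the parser into its brute-force fallback.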
I added this check to the XREF_STM block, which seems to get around
the problem:
if (streamOffset != prev) {
    // if the positions are the same, this is a hybrid xref table / xrefstm
    // but with no actual /XRef stream...
    parseXrefObjStream(prev, false);
}
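A toy model of the whole decision, under loudly stated assumptions: the brute-force fallback is approximated as "pick the known xref start nearest to /XRefStm", which is a guess and not PDFBox's actual search, and XrefStmGuard, nearestXrefStart() and xrefStreamOffsetToParse() are hypothetical names. As in the patch, prev is the offset of the xref table currently being parsed.

```java
import java.util.List;

/**
 * Hypothetical model of the /XRefStm handling discussed above. The
 * "nearest candidate" fallback is an assumption, not PDFBox's real code.
 */
public class XrefStmGuard {

    /** Stand-in for the brute-force search: the known xref start closest to offset, or -1. */
    static long nearestXrefStart(long offset, List<Long> xrefStarts) {
        long best = -1;
        for (long candidate : xrefStarts) {
            if (best < 0 || Math.abs(candidate - offset) < Math.abs(best - offset)) {
                best = candidate;
            }
        }
        return best;
    }

    /**
     * Returns the offset to hand to the xref-stream parser, or -1 to skip it.
     * objectAtXrefStm says whether a usable object was found at /XRefStm.
     */
    static long xrefStreamOffsetToParse(long prev, long xrefStm, boolean objectAtXrefStm,
                                        List<Long> xrefStarts) {
        long streamOffset = xrefStm;
        if (!objectAtXrefStm) {
            // Nothing valid at /XRefStm: use the brute-force stand-in instead
            streamOffset = nearestXrefStart(xrefStm, xrefStarts);
        }
        // The proposed guard: an offset equal to prev is the classic table
        // itself, not an xref stream, so don't parse it as one
        if (streamOffset < 0 || streamOffset == prev) {
            return -1;
        }
        return streamOffset;
    }

    public static void main(String[] args) {
        // Offsets from the trailer in the report: startxref 5135270, /XRefStm 5347791
        long result = xrefStreamOffsetToParse(5135270L, 5347791L, false, List.of(5135270L));
        System.out.println(result); // -1
    }
}
```

With the offsets from the trailer above, the stand-in fallback lands on 5135270, which equals startxref, so the guard skips the xref-stream parse instead of reading the classic table as a stream.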
I see similar code in 2.0.3 COSParser.parseXref().
HtH, Pat