I encountered a PDF where the xref offset (the value after `startxref`) does not point to the 'xref' token but into the first entry of that xref table, specifically to the second digit of the generation number of the first object.
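To illustrate the kind of check that would catch this, here is a minimal C++ sketch. It is not PoDoFo code: `LocateXrefKeyword`, `IsPdfWhitespace` and the 1024-byte search window are hypothetical, and it assumes the whole file is held in a `std::string`. It verifies that the `startxref` offset lands on the `xref` keyword and, if it does not, searches backwards for it, skipping matches that are only the tail of `startxref`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// PDF whitespace characters: NUL, HT, LF, FF, CR, SP.
bool IsPdfWhitespace(char c) {
    return c == '\0' || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
}

// Hypothetical helper: returns the position of the "xref" keyword that the
// startxref `offset` should have pointed to. If `offset` is already correct
// it is returned unchanged; otherwise we search backwards up to `window`
// bytes. Returns nullopt if no standalone "xref" keyword is found.
std::optional<std::size_t> LocateXrefKeyword(const std::string& data,
                                             std::size_t offset,
                                             std::size_t window = 1024) {
    static const std::string kKeyword = "xref";
    const std::size_t lowest = offset > window ? offset - window : 0;
    std::size_t pos = std::min(offset, data.size());
    while (true) {
        // Last occurrence of "xref" starting at or before `pos`.
        const std::size_t hit = data.rfind(kKeyword, pos);
        if (hit == std::string::npos || hit < lowest) return std::nullopt;
        // Accept only a standalone token, not the end of "startxref".
        if (hit == 0 || IsPdfWhitespace(data[hit - 1])) return hit;
        pos = hit - 1;  // keep searching further back
    }
}
```

A parser could run such a check before reading any entries and either correct the offset or give up on the declared table.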

PoDoFo finds the offset and the xref token, assumes the xref table is valid, and tries to parse it.

PoDoFo then tries to read the object number and generation number from that position and crashes when it encounters the "f" (or "n") token, because it is expecting a number (and f/n are not numbers).

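Furthermore, the xref table has wrong line endings. In a hex editor you can see that only 0x0A (LF) is used as the line ending: there is no second end-of-line byte (0x0D) and no space. The PDF specification requires every xref entry to be exactly 20 bytes long, including a two-byte end-of-line marker, so these entries are only 19 bytes.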
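A lenient entry parser could tolerate both problems described above. The following is a minimal C++ sketch, assuming the file content is in a `std::string`; `XrefEntry`, `ParseDigits`, `IsXrefEolByte` and `ParseXrefEntry` are hypothetical names, not PoDoFo API. Instead of throwing when it meets an `f`/`n` where a digit should be, it returns `std::nullopt`, and it accepts any non-empty run of spaces/CR/LF after the type letter instead of exactly two bytes.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical result type for one xref entry (not PoDoFo's).
struct XrefEntry {
    long long offset;  // byte offset for 'n' entries, next free object for 'f'
    int generation;
    bool inUse;        // true for 'n', false for 'f'
};

// Reads exactly `count` decimal digits at data[pos]; nullopt on any non-digit.
static std::optional<long long> ParseDigits(const std::string& data,
                                            std::size_t pos, std::size_t count) {
    if (pos + count > data.size()) return std::nullopt;
    long long value = 0;
    for (std::size_t i = pos; i < pos + count; ++i) {
        const unsigned char c = static_cast<unsigned char>(data[i]);
        if (!std::isdigit(c)) return std::nullopt;
        value = value * 10 + (c - '0');
    }
    return value;
}

static bool IsXrefEolByte(char c) { return c == ' ' || c == '\r' || c == '\n'; }

// Leniently parses one entry of the form "oooooooooo ggggg t" at data[pos].
// The spec requires a 2-byte end-of-line (20 bytes per entry); this accepts
// any non-empty run of spaces/CR/LF, so a lone LF (19-byte entry) also works.
// On success `pos` is advanced to the next entry; on failure it is unchanged.
std::optional<XrefEntry> ParseXrefEntry(const std::string& data, std::size_t& pos) {
    const auto offset = ParseDigits(data, pos, 10);
    if (!offset || pos + 17 >= data.size() || data[pos + 10] != ' ')
        return std::nullopt;
    const auto generation = ParseDigits(data, pos + 11, 5);
    if (!generation || data[pos + 16] != ' ') return std::nullopt;
    const char type = data[pos + 17];
    if (type != 'n' && type != 'f') return std::nullopt;  // report, don't throw
    std::size_t end = pos + 18;
    while (end < data.size() && IsXrefEolByte(data[end])) ++end;
    if (end == pos + 18) return std::nullopt;  // no end-of-line at all
    pos = end;
    return XrefEntry{*offset, static_cast<int>(*generation), type == 'n'};
}
```

A `std::nullopt` result gives the caller a clean signal to abandon the declared table rather than crash.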

I know this PDF is faulty and does not conform to the spec, but it opens in Adobe, and we could scan the file for objects and build an xref table ourselves.
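Scanning the file for objects could look roughly like this minimal C++ sketch. `RebuildXref` and `ObjLocation` are hypothetical names, and several assumptions are baked in: object headers are found with a regular expression for `N G obj`, a later definition of the same object number wins (incremental updates append newer versions at the end of the file), and stream contents are not skipped, so a match inside binary stream data would be a false positive. `std::regex` is also slow on large files; a real implementation would more likely reuse the tokenizer.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <regex>
#include <string>

// Hypothetical reconstructed entry: where "N G obj" starts, and G.
struct ObjLocation {
    std::size_t offset;
    int generation;
};

// Builds object number -> location by scanning for "N G obj" headers.
std::map<long, ObjLocation> RebuildXref(const std::string& data) {
    // Group 2 = object number, group 3 = generation number.
    static const std::regex header(R"((^|\s)(\d+)\s+(\d+)\s+obj\b)");
    std::map<long, ObjLocation> table;
    for (auto it = std::sregex_iterator(data.begin(), data.end(), header);
         it != std::sregex_iterator(); ++it) {
        const std::smatch& m = *it;
        const long number = std::stol(m[2].str());
        // position() is measured from the start of `data`.
        table[number] = ObjLocation{static_cast<std::size_t>(m.position(2)),
                                    std::stoi(m[3].str())};
        // Assumption: a later definition overrides an earlier one.
    }
    return table;
}
```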

Why don't we handle this error better? We could ignore the xref table if an error occurs while parsing it, and then treat the PDF as if it had no xref table at all.
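The proposed fallback amounts to a try/catch around the normal xref parsing. This is a minimal C++ sketch of that control flow only: `LoadXref` and `XrefTable` are hypothetical, and the two callables stand in for real routines that parse the declared table and rebuild one by scanning. It uses `catch (...)` because a parser's error type may not derive from `std::exception`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

using XrefTable = std::map<long, std::size_t>;  // object number -> offset

// Hypothetical loader: try the xref table the file declares; if parsing it
// fails for any reason, discard it and rebuild the table by scanning.
XrefTable LoadXref(const std::function<XrefTable()>& parseDeclared,
                   const std::function<XrefTable()>& rebuildByScanning,
                   bool* usedFallback = nullptr) {
    try {
        XrefTable table = parseDeclared();
        if (usedFallback) *usedFallback = false;
        return table;
    } catch (...) {
        // Treat the file as if it had no xref table at all.
        if (usedFallback) *usedFallback = true;
        return rebuildByScanning();
    }
}
```

Reporting whether the fallback was used would let an application warn that the file is damaged while still opening it.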

Edit: While writing this mail, I found https://sourceforge.net/p/podofo/mailman/message/18520832/, which describes the same problem, reported back in 2008.



Error stack (from PoDoFoBrowser trying to open the file):

#0 error source: PdfParser.cpp:214

#1 error source: PdfParser.cpp:320

#2 error source: PdfParserObject.cpp:97

#3 error source: PdfTokenizer.cpp:353



Dennis Voss

dots Software GmbH
Schlesische Str. 27, 10997 Berlin, Germany

Tel: +49 (0)30 695 799-47
Fax: +49 (0)30 695 799-55

dennis.v...@dots.de <mailto:dennis.v...@dots.de>
http://www.dots.de <http://www.dots.de/>

Amtsgericht Berlin Charlottenburg HRB 65201
Managing Director: Olaf Lorenz


Podofo-users mailing list