On Sun, 2014-06-22 at 23:26 -0500, Dennis Jenkins wrote:
> Hello All,
> 
>     I recently noticed that PoDoFo (svn rev 1642) was unable to 
> parse several older PDFs (all obtained from the USA IRS for tax 
> years 2011 and before).  These PDFs were made with professional Adobe 
> products, so I expect them to be conformant.
> 
>     I narrowed down the version of PoDoFo that causes the failure, 
> but I have not analyzed the source code diff yet.  These PDFs parsed 
> without error under PoDoFo svn rev 1586, but failed on rev 1587 
> (2014-04-01, change to PdfParser.cpp).  Attempting to open the 
> document with PoDoFo::PdfMemDocument() throws "ePdfError_NoNumber".
> 
>     I have a total of 6 IRS tax forms for various years that all 
> fail to open in PoDoFo (they all throw the same exception [2]), but 
> for now, I'll just focus on one.  This [1] PDF was created with 
> "Adobe LiveCycle Designer ES 8.2" on 2010-11-22. (October 2010 
> revision of the 941 tax form).
> 
>     I suspect that the PDFs are conformant (unproven hunch) and that 
> PoDoFo 1587+ is buggy.
> 
>     Thoughts?  Analysis?
> 
> 
> [1]   http://www.irs.gov/pub/irs-prior/f941--2010.pdf
> 
> [2]  The following stack trace is from PoDoFo rev 1587:
> PoDoFo encounter an error. Error: 14 ePdfError_NoNumber
>         Error Description: A number was expected but not found.
>         Callstack:
>         #0 Error Source: /tmp/podofo/src/src/base/PdfParser.cpp:226
>                 Information: Unable to load objects from file.
>         #1 Error Source: /tmp/podofo/src/src/base/PdfParser.cpp:289
>                 Information: Unable to skip xref dictionary.
>         #2 Error Source: /tmp/podofo/src/src/base/PdfParser.cpp:738
>         #3 Error Source: /tmp/podofo/src/src/base/PdfParser.cpp:551
>                 Information: Unable to load /XRefStm xref stream.
>         #4 Error Source: /tmp/podofo/src/src/base/PdfParserObject.cpp:109
>                 Information: Object and generation number cannot be read.
>         #5 Error Source: /tmp/podofo/src/src/base/PdfTokenizer.cpp:365
>                 Information: xref
> 
> 
> 

        Hi Mark,
I tried to investigate the above issue, which appeared after your fix 
for reading XRefStm streams in r1587 ( 
http://sourceforge.net/p/podofo/code/1587 ). The file Dennis linked to 
at [1] above seems fine with respect to its references to /XRefStm, but 
it seems that one of the streams contains a reference to an object 
which is out of position: instead of pointing to some "1234 0 obj", the 
offset points to an 'xref' tag. Here is the backtrace from gdb:

#0  PoDoFo::PdfTokenizer::GetNextNumber (this=0x7fffffffd1d0)
    at src/base/PdfTokenizer.cpp:366
#1  0x00000000004af132 in PoDoFo::PdfParserObject::ReadObjectNumber
    (this=0x7fffffffd180) at src/base/PdfParserObject.cpp:105
#2  0x00000000004af459 in PoDoFo::PdfParserObject::ParseFile
    (this=0x7fffffffd180, pEncrypt=0x0, bIsTrailer=false)
    at src/base/PdfParserObject.cpp:134
#3  0x00000000004d1da1 in PoDoFo::PdfXRefStreamParserObject::Parse
    (this=0x7fffffffd180) at src/base/PdfXRefStreamParserObject.cpp:60
#4  0x00000000004a9597 in PoDoFo::PdfParser::ReadXRefStreamContents
    (this=0x7b19d0, lOffset=203913, bReadOnlyTrailer=false)
    at src/base/PdfParser.cpp:824
#5  0x00000000004a9690 in PoDoFo::PdfParser::ReadXRefStreamContents
    (this=0x7b19d0, lOffset=204202, bReadOnlyTrailer=false)
    at src/base/PdfParser.cpp:840
#6  0x00000000004a84ae in PoDoFo::PdfParser::ReadNextTrailer
    (this=0x7b19d0) at src/base/PdfParser.cpp:549
#7  0x00000000004a8f9a in PoDoFo::PdfParser::ReadXRefContents
    (this=0x7b19d0, lOffset=204376, bPositionAtEnd=true)
    at src/base/PdfParser.cpp:734
#8  0x00000000004a6ba0 in PoDoFo::PdfParser::ReadDocumentStructure
    (this=0x7b19d0) at src/base/PdfParser.cpp:287
#9  0x00000000004a6853 in PoDoFo::PdfParser::ParseFile
    (this=0x7b19d0, rDevice=..., bLoadOnDemand=true)
    at src/base/PdfParser.cpp:213
#10 0x00000000004a6604 in PoDoFo::PdfParser::ParseFile
    (this=0x7b19d0, pszFilename=0x531b73 "f941--2010.pdf",
    bLoadOnDemand=true) at src/base/PdfParser.cpp:157
#11 0x00000000004878e6 in PoDoFo::PdfMemDocument::Load
    (this=0x7aa5b0, pszFilename=0x531b73 "f941--2010.pdf")
    at src/doc/PdfMemDocument.cpp:186
#12 0x000000000047b435 in main () at test.cpp:69
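
A quick way to double-check this outside of PoDoFo is to look at the raw 
bytes at each offset. Below is a minimal Python sketch, not PoDoFo code; 
the names classify_offset, xref_offsets and report are made up for 
illustration. It assumes the startxref, /Prev and /XRefStm entries are 
readable as plain text, so anything stored only inside a compressed 
stream will not be seen.

```python
import re

# An indirect object header such as b"1234 0 obj".
OBJ_HEADER = re.compile(rb"\s*\d+\s+\d+\s+obj\b")
# Keys whose integer value is the byte offset of a cross-reference section.
OFFSET_KEYS = re.compile(rb"(startxref|/Prev|/XRefStm)\s+(\d+)")


def classify_offset(data: bytes, offset: int) -> str:
    """Report which kind of token starts at `offset` in the raw file bytes."""
    if offset < 0 or offset >= len(data):
        return "out of range"
    window = data[offset:offset + 32]
    if OBJ_HEADER.match(window):
        return "object"          # what an xref stream offset should point to
    if window.lstrip().startswith(b"xref"):
        return "xref table"      # a classic cross-reference table
    return "other"


def xref_offsets(data: bytes):
    """List (key, offset) pairs found as plain text, in file order."""
    return [(m.group(1).decode("ascii"), int(m.group(2)))
            for m in OFFSET_KEYS.finditer(data)]


def report(path: str) -> None:
    """Print what each cross-reference offset in the file points to."""
    with open(path, "rb") as f:
        data = f.read()
    for key, offset in xref_offsets(data):
        print(f"{key} {offset}: {classify_offset(data, offset)}")
```

Calling report("f941--2010.pdf") prints one line per offset found. The 
offsets of interest from the backtrace above are 204376, 204202 and 
203913; classify_offset can also be called on those directly.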


I am thinking of reverting the patch, to support those "probably 
broken" files, but I'd like to hear from you too on whether the file is 
truly broken.

        Thanks and bye,
        zyx


-- 
http://www.litePDF.cz                                 i...@litepdf.cz


_______________________________________________
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users
