I've discovered another related issue. PdfTokenizer cannot reach into the next content stream to fetch a token, so any object that is split across Contents streams raises an UnexpectedEOF. I see two possible fixes: either concatenate all the Contents streams before doing any tokenization, or make PdfTokenizer::GetNextToken virtual and move the stream-switching logic into PdfContentsTokenizer::GetNextToken, so that it tries the parent's version, moves to the next stream (if one exists) on failure, and then retries. Attached is a very basic example of an array split between two streams.
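
To make the second suggestion concrete, here is a minimal sketch of the "try the parent's version, switch streams, retry" idea. It does not use the PoDoFo classes: ToyTokenizer, ToyContentsTokenizer, ReadArray and SetBuffer are hypothetical stand-ins invented for illustration, tokens are simply whitespace-separated words, and end of input is reported with an empty std::optional instead of an exception. It assumes C++17.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for PdfTokenizer: splits one buffer on whitespace.
class ToyTokenizer {
public:
    explicit ToyTokenizer(std::string buffer = {}) : m_buffer(std::move(buffer)) {}
    virtual ~ToyTokenizer() = default;

    // Returns the next token, or std::nullopt once the buffer is exhausted.
    virtual std::optional<std::string> GetNextToken() {
        while (m_pos < m_buffer.size() && IsSpace(m_buffer[m_pos])) ++m_pos;
        if (m_pos >= m_buffer.size()) return std::nullopt;
        const std::size_t start = m_pos;
        while (m_pos < m_buffer.size() && !IsSpace(m_buffer[m_pos])) ++m_pos;
        return m_buffer.substr(start, m_pos - start);
    }

    // Collects tokens up to the closing "]". Because it goes through the
    // virtual GetNextToken, a subclass can keep feeding it tokens from a
    // later stream. Returns false if input ends first (the EOF case).
    bool ReadArray(std::vector<std::string>& out) {
        while (auto tok = GetNextToken()) {
            if (*tok == "]") return true;
            out.push_back(*tok);
        }
        return false;
    }

protected:
    // Replaces the current buffer and restarts reading at its beginning.
    void SetBuffer(std::string buffer) {
        m_buffer = std::move(buffer);
        m_pos = 0;
    }

private:
    static bool IsSpace(char c) {
        return c == ' ' || c == '\n' || c == '\r' || c == '\t';
    }

    std::string m_buffer;
    std::size_t m_pos = 0;
};

// Hypothetical stand-in for PdfContentsTokenizer: walks an array of streams.
class ToyContentsTokenizer : public ToyTokenizer {
public:
    explicit ToyContentsTokenizer(std::vector<std::string> streams)
        : m_streams(std::move(streams)) {}

    std::optional<std::string> GetNextToken() override {
        for (;;) {
            // 1. Try the parent's version on the current stream.
            if (auto tok = ToyTokenizer::GetNextToken()) return tok;
            // 2. No streams left: this is a genuine end of input.
            if (m_next >= m_streams.size()) return std::nullopt;
            // 3. Move forward to the next stream (never back), then retry.
            SetBuffer(m_streams[m_next++]);
        }
    }

private:
    std::vector<std::string> m_streams;
    std::size_t m_next = 0;  // index of the next stream that has not been opened yet
};
```

With the two streams "[ 1 2" and "3 ] q", reading "[" and then calling ReadArray collects 1, 2 and 3 even though the array straddles the stream boundary. The stream index only ever increases, so no stream is tokenized twice.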

- Mike Slegeir

Mike Slegeir wrote:
I've resolved this issue in an admittedly hacky way. This may be sufficient for this problem though. Attached is a patch which fixes the issue. I've only done limited testing, but it does at least correct the issue.

- Mike Slegeir

When using PdfContentsTokenizer with a PDF with an array for Contents
rather than a single stream, the tokenizer will reset its position to
the beginning of the first stream upon exhausting a stream. A Contents
array with contents X Y Z will appear as X X Y X Y Z to a user of the
PdfContentsTokenizer. Attached is a PDF which has a Contents array. I
can provide example code and output if necessary.
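
For reference, the expected behaviour (each stream seen exactly once, in order) can be illustrated by simply joining the streams before tokenizing. This is only a sketch: ConcatenateStreams and SplitTokens are hypothetical helpers, not PoDoFo functions, and tokens are treated as whitespace-separated words. As far as I understand the PDF specification, the streams of a Contents array are to be treated as one concatenated stream and may only be divided at token boundaries, so inserting a newline between them should be harmless.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: joins the streams in order, adding a newline after
// each one so the last token of one stream cannot fuse with the first
// token of the next.
std::string ConcatenateStreams(const std::vector<std::string>& streams) {
    std::string joined;
    for (const std::string& s : streams) {
        joined += s;
        joined += '\n';
    }
    return joined;
}

// Hypothetical helper: splits a buffer into whitespace-separated tokens.
std::vector<std::string> SplitTokens(const std::string& data) {
    std::istringstream in(data);
    std::vector<std::string> tokens;
    for (std::string tok; in >> tok;) tokens.push_back(tok);
    return tokens;
}
```

For a Contents array holding X, Y and Z, SplitTokens(ConcatenateStreams(...)) yields X Y Z, not the X X Y X Y Z sequence described above.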

Attachment: split-array.pdf
Description: Adobe PDF document
