Hi,

I have quickly put together what you suggested earlier. The patch is attached to this message; feel free to give it a go. I'm really new to the PoDoFo code base, so mistakes are quite inevitable. Please send me some feedback if that's OK. As for using my sample PDF as a test case, I'm totally fine with it. I know it takes a bit of effort to find or generate a PDF that contains an inline image these days :-)

Cheers,
Thach

Attachment: inline-img.patch
Description: Binary data


On 20 Jul 2009, at 03:31, Craig Ringer wrote:

On Wed, 2009-07-15 at 23:52 +0100, Thach Tran wrote:
Hi all,

I'm doing some parsing of PDF content streams with PoDoFo and came
across a situation where the page content stream contains an inline
image. The binary image data lying between the ID and EI keywords causes
the tokenizer to choke (ePdfError_InvalidDataType is thrown). I know PDF
files nowadays rarely contain inline images, but I would still like to
know whether there is any way to get around this problem (e.g. skip the
binary data and keep on parsing). You can have a look at the sample
PDF enclosed.

PdfContentsTokenizer will need to be enhanced to recognise the ID/EI
keywords and maintain internal state indicating whether or not it's
currently reading binary image data.

This shouldn't be too tricky to add - patches accepted ;-)

Inline images should be reasonably small (small enough, at least, not to
be a memory burden) so you should just be able to read the whole inline
image and return the image data in a PdfVariant from the
PdfContentsTokenizer::ReadNext call like usual. I think using the
internal variant type PdfData would be appropriate for the returned
data. I'd suggest adding a new value to the EPdfContentsType enum,
something like ePdfContentsType_ImageData, just to help the caller know
what they're getting, though the returned "ID" keyword should've been a
hint.
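
To illustrate the idea, here is a minimal standalone sketch of a tokenizer that switches into "image data" mode after ID and hands the raw bytes back as their own token type. It does not use PoDoFo's API: TokenType, Token, FindEndOfImage and Tokenize are hypothetical names made up for this example, and every non-image token is lumped together as a plain word. It also assumes that the data is separated from ID by a single whitespace byte and that EI is delimited by whitespace, which is a heuristic (binary data could in principle contain such a sequence).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token kinds. ImageData plays the role of the suggested
// ePdfContentsType_ImageData value; Word covers everything else.
enum class TokenType { Word, ImageData };

struct Token {
    TokenType type;
    std::string bytes;  // token text, or the raw inline image bytes
};

static bool IsPdfWhitespace(unsigned char c) {
    return c == 0 || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
}

// Returns the offset of the "EI" that terminates image data beginning at
// `start`, or std::string::npos if there is none. Assumption: a real EI is
// preceded by whitespace and followed by whitespace or the end of the stream.
static std::size_t FindEndOfImage(const std::string& s, std::size_t start) {
    for (std::size_t i = start; i + 1 < s.size(); ++i) {
        if (s[i] != 'E' || s[i + 1] != 'I') continue;
        const bool before = (i == start) || IsPdfWhitespace(s[i - 1]);
        const bool after = (i + 2 == s.size()) || IsPdfWhitespace(s[i + 2]);
        if (before && after) return i;
    }
    return std::string::npos;
}

// Splits a content stream into whitespace-delimited words, except that the
// bytes between ID and EI are returned as one ImageData token.
std::vector<Token> Tokenize(const std::string& s) {
    std::vector<Token> out;
    std::size_t pos = 0;
    while (pos < s.size()) {
        if (IsPdfWhitespace(s[pos])) { ++pos; continue; }
        std::size_t end = pos;
        while (end < s.size() && !IsPdfWhitespace(s[end])) ++end;
        const std::string word = s.substr(pos, end - pos);
        out.push_back({TokenType::Word, word});
        pos = end;
        if (word == "ID") {
            // Skip the single whitespace byte that follows ID.
            const std::size_t dataStart = pos < s.size() ? pos + 1 : pos;
            const std::size_t ei = FindEndOfImage(s, dataStart);
            if (ei == std::string::npos)
                throw std::runtime_error("inline image data without EI");
            // Drop the whitespace byte that separates the data from EI.
            const std::size_t dataEnd = ei > dataStart ? ei - 1 : dataStart;
            out.push_back({TokenType::ImageData,
                           s.substr(dataStart, dataEnd - dataStart)});
            pos = ei;  // EI itself is emitted as a Word on the next pass
        }
    }
    return out;
}
```

The caller sees the ID word, then one ImageData token, then EI, so the token after ID can be handled without inspecting its contents. Reading the whole image into one string matches the assumption above that inline images are small.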

If you do implement this, PLEASE do so against svn trunk not against a
release branch, and send us a patch. If you don't know how to produce a
patch, send the modified file(s) containing ONLY the changes required to
implement inline images in PdfContentsTokenizer.

It'd be nice to use your PDF as a test case for the class, too. OK by
you?

--
Craig Ringer


_______________________________________________
Podofo-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/podofo-users