Hi Amin,

Thanks a lot for your patch. I tested it and committed it to SVN today.

Cheers,
        Dominik

On Tuesday, 31 August 2010, A. Massad wrote:
> Hi,
> 
> I have encountered two problems in
>  PdfContentsTokenizer::ReadInlineImgData():
> 
> 1) Parsing expects a whitespace *before* the EI operator (end of image
>  data) whereas it should expect a whitespace *after* the EI.
> 2) Buffer for image data has a fixed size of 4096 bytes.
> 
> The patch (against svn rev. 1298) included in this E-Mail provides a
>  solution for both issues.
> 
> Some further details:
> 
> To 1) Unfortunately, the PDF spec does not clearly define how the EI
>  operator should be detected in the data following the ID operator. The
>  size of the data is not specified, and there seems to be no "escaping"
>  mechanism if the sequence EI should occur in the image data. However,
>  other PDF parsers use a heuristic approach and expect a
>  whitespace *after* the EI operator. See here for such a discussion:
>  http://www.planetpdf.com/forumarchive/134376.asp
> 
> > Topic: Re: parsing inline images (Via Email)
> > Conf: (P-PDF) Developers, Msg: 134376
> > From: LeonardR
> > Date: 6/13/2005 10:58 PM
> >
> > At 06:38 PM 6/13/2005, p-pdf-developers Listmanager wrote:
> > >The image data contains "EI " where the
> > >white space is a space (0x20).
> >
> > The actual image data, or the encoded version of the data? Are
> > you decoding and then looking or grabbing the inline image data till you
> > find the "EI" and then decoding?
> >
> > >our parser detects either a space or cr lf.
> >
> > I've looked at the sources to a few content stream parsers (my
> > own, Xpdf, Multivalent, etc.) and they all also support "EI" followed by
> > at least one whitespace character (specifically space, CR or LF).
> 
> Prior to the patch, PoDoFo expects to find a whitespace *before* the EI
>  operator and fails to detect the end of image data for some PDFs created
>  by a common PDF workflow software.
> 
> To 2) The PDF spec states that inlined images *should* not be larger than
>  4K. However, it does not forbid images to be larger. Again, some common
>  PDF outputs contained inlined images larger than 4K. In that case, PoDoFo
>  should not fail but rather resize the buffer.
> 
> Hopefully, this patch will be helpful for other users, too. Many thanks to
>  all developers for this great project!
> 
> Best regards,
> Amin
> 
> > Index: podofo-src-r1298/src/PdfContentsTokenizer.cpp
> > ===================================================================
> > --- podofo-src-r1298/src/PdfContentsTokenizer.cpp   (revision 1298)
> > +++ podofo-src-r1298/src/PdfContentsTokenizer.cpp   (working copy)
> > @@ -202,40 +202,43 @@
> >          PODOFO_RAISE_ERROR( ePdfError_InvalidHandle );
> >      }
> >
> > -    // cosume the only whitespace between ID and data
> > +    // consume the only whitespace between ID and data
> >      c = m_device.Device()->Look();
> >      if( PdfTokenizer::IsWhitespace( c ) )
> >      {
> >          c = m_device.Device()->GetChar();
> >      }
> >
> > -    while( (c = m_device.Device()->Look()) != EOF
> > -           && counter < static_cast<long long>(m_buffer.GetSize()) )
> > -    {
> > -        if (PdfTokenizer::IsWhitespace(c))
> > -        {
> > -            // test if end-of-image-data is reached (hit EI keyword)
> > -            c = m_device.Device()->GetChar(); // skip the white space
> > -            char e = m_device.Device()->GetChar();
> > -            char i = m_device.Device()->GetChar();
> > -            m_device.Device()->Seek(-2, std::ios::cur);
> > -            if (e == 'E' && i == 'I')
> > -            {
> > -                m_buffer.GetBuffer()[counter] = '\0';
> > -                rVariant = PdfData(m_buffer.GetBuffer(), static_cast<size_t>(counter));
> > -                reType = ePdfContentsType_ImageData;
> > -                m_readingInlineImgData = false;
> > -                return true;
> > -            }
> > -            m_buffer.GetBuffer()[counter] = c;
> > -            ++counter;
> > -        }
> > -        else
> > -        {
> > -            c = m_device.Device()->GetChar();
> > -            m_buffer.GetBuffer()[counter] = c;
> > -            ++counter;
> > -        }
> > +    while((c = m_device.Device()->Look()) != EOF) {
> > +      c = m_device.Device()->GetChar();
> > +      if (c=='E' &&  m_device.Device()->Look()=='I') {
> > +   char i = m_device.Device()->GetChar();
> > +   char w = m_device.Device()->Look();
> > +        if (w==EOF || PdfTokenizer::IsWhitespace(w)) {
> > +     // EI is followed by whitespace => stop
> > +     m_device.Device()->Seek(-2, std::ios::cur); // put back "EI"
> > +     m_buffer.GetBuffer()[counter] = '\0';
> > +     rVariant = PdfData(m_buffer.GetBuffer(), static_cast<size_t>(counter));
> > +     reType = ePdfContentsType_ImageData;
> > +     m_readingInlineImgData = false;
> > +     return true;
> > +   }
> > +   else {
> > +     // no whitespace after EI => do not stop
> > +     m_device.Device()->Seek(-1, std::ios::cur); // put back "I"
> > +     m_buffer.GetBuffer()[counter] = c;
> > +     ++counter;
> > +   }
> > +      }
> > +      else {
> > +   m_buffer.GetBuffer()[counter] = c;
> > +   ++counter;
> > +      }
> > +
> > +      if (counter ==  static_cast<long long>(m_buffer.GetSize())) {
> > +         // image is larger than buffer => resize buffer
> > +         m_buffer.Resize(m_buffer.GetSize()*2);
> > +      }
> >      }
> >      return false;
> >  }
> 
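
The heuristic described in the quoted message (treat "EI" as the end of the
image data only when it is followed by whitespace or end of input) can be
illustrated with a minimal standalone sketch. It does not use the PoDoFo API:
IsPdfWhitespace and FindInlineImageEnd are hypothetical names, and the input is
assumed to be already held in a std::string that starts right after the ID
operator and its single whitespace. Because std::string grows as needed, the
sketch has no fixed 4096-byte limit.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of PoDoFo: the PDF whitespace characters
// (NUL, TAB, LF, FF, CR, SPACE).
static bool IsPdfWhitespace(unsigned char ch)
{
    return ch == 0x00 || ch == 0x09 || ch == 0x0A ||
           ch == 0x0C || ch == 0x0D || ch == 0x20;
}

// Hypothetical helper: returns the length of the inline image data at the
// start of `stream`, i.e. the offset of the first "EI" that is followed by
// whitespace or by the end of the input. Returns std::string::npos if no
// such "EI" exists.
std::size_t FindInlineImageEnd(const std::string& stream)
{
    for (std::size_t pos = 0; pos + 1 < stream.size(); ++pos)
    {
        if (stream[pos] == 'E' && stream[pos + 1] == 'I')
        {
            const std::size_t after = pos + 2;
            if (after == stream.size() ||
                IsPdfWhitespace(static_cast<unsigned char>(stream[after])))
            {
                return pos; // "EI" followed by whitespace/EOF => stop here
            }
            // No whitespace after "EI": treat it as image data, keep scanning.
        }
    }
    return std::string::npos;
}
```

This remains a heuristic: binary image data that happens to contain "EI"
followed by a whitespace byte would still be cut short.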


-- 
**********************************************************************
Dominik Seichter - [email protected]
KRename  - http://www.krename.net  - Powerful batch renamer for KDE
KBarcode - http://www.kbarcode.net - Barcode and label printing
PoDoFo - http://podofo.sf.net - PDF generation and parsing library
SchafKopf - http://schafkopf.berlios.de - Schafkopf, a card game,  for KDE
Alan - http://alan.sf.net - A Turing Machine in Java
**********************************************************************
