Function PdfContentsTokenizer::ReadInlineImgData in PdfContentsTokenizer.cpp:
```
c = m_device.Device()->GetChar();
if (c=='E' && m_device.Device()->Look()=='I')
{
    // Consume character
    m_device.Device()->GetChar();
    int w = m_device.Device()->Look();
    if (w==EOF || PdfTokenizer::IsWhitespace(w))
    {
        // EI is followed by whitespace => stop
        ...
        m_readingInlineImgData = false;
```

Reading stops as soon as the byte sequence "EI" followed by whitespace is found, but with inline image data it is not that simple, because the same bytes can occur inside the image data itself. The size of the decoded image data has to be calculated from parameters like width, height, bits per component and color space. The data then has to be decoded, and parsing should stop either at the EOD marker or after exactly the right number of decoded bytes, since not every filter has an EOD marker.

As the PDF reference says:

"Entries other than those listed are ignored; in particular, the Type, Subtype, and Length entries normally found in a stream or image dictionary are unnecessary"

That is unfortunate, because with "Length" the parsing would be simpler. Without it, the EOD marker could be used, but since not all filters have one, the decompressed size of the image data has to be calculated.

Sample code to demonstrate this bug:

```
#include <cstdio>
#include <podofo/podofo.h>

using namespace PoDoFo;

int main()
{
    // Write a one-page PDF whose inline image data contains "EI " in the middle
    PdfMemDocument doc;
    PdfStream *stm = doc.CreatePage({0, 0, 100, 100})->GetContentsForAppending()->GetStream();
    stm->BeginAppend(TVecFilters());
    stm->Append("100 0 0 100 0 0 cm\n");
    stm->Append("BI /W 4 /H 4 /CS /RGB /BPC 8\n");
    stm->Append("ID\n");
    // 4 x 4 pixels, 3 components, 8 bits each = 48 bytes of image data
    stm->Append("00000z0z00zzz00z0zzz0zzzEI aazazaazzzaazazzzazzz\n");
    stm->Append("EI\n");
    stm->EndAppend();
    doc.Write("bi.pdf");

    // Read it back and print every token of the content stream
    PdfMemDocument pdf("bi.pdf");
    PdfContentsTokenizer tok(pdf.GetPage(0));
    EPdfContentsType type;
    const char *key;
    PdfVariant var;
    while(tok.ReadNext(type, key, var))
    {
        switch(type)
        {
        case ePdfContentsType_Keyword:
            printf("keyword: %s\n", key);
            break;
        case ePdfContentsType_Variant:
            printf("variant: %s\n", var.GetDataTypeString());
            break;
        case ePdfContentsType_ImageData:
            printf("image: %s\n", var.GetDataTypeString());
            break;
        }
    }
    return 0;
}
```

Partial output:

```
...
keyword: ID
image: RawData
keyword: EI
keyword: aazazaazzzaazazzzazzz
keyword: EI
```

Which should instead be:

```
...
keyword: ID
image: RawData
keyword: EI
```

Resulting pdf file (also attached):

```
%PDF-1.3
%âãÏÓ
1 0 obj<</Type/Catalog/Pages 3 0 R>>
endobj
2 0 obj<</CreationDate(D:20190906183146+02'00')/Producer(PoDoFo - http://podofo.sf.net)>>
endobj
3 0 obj<</Type/Pages/Count 1/Kids[ 4 0 R]>>
endobj
4 0 obj<</Type/Page/Contents 5 0 R/MediaBox[ 0 0 100 100]/Parent 3 0 R/Resources<</ProcSet[/PDF/Text/ImageB/ImageC/ImageI]>>>>
endobj
5 0 obj<</Length 103>>
stream
100 0 0 100 0 0 cm
BI /W 4 /H 4 /CS /RGB /BPC 8
ID
00000z0z00zzz00z0zzz0zzzEI aazazaazzzaazazzzazzz
EI
endstream
endobj
xref
0 6
0000000000 65535 f
0000000015 00000 n
0000000059 00000 n
0000000156 00000 n
0000000207 00000 n
0000000341 00000 n
trailer
<</ID[<D047079C2B662F2617BF6BC31251DAB1><D047079C2B662F2617BF6BC31251DAB1>]/Info 2 0 R/Root 1 0 R/Size 6>>
startxref
492
%%EOF
```
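The size calculation mentioned above could look roughly like the following. This is only a sketch, not PoDoFo code: the helper names `ComponentsForColorSpace` and `ExpectedDecodedSize` are made up for illustration, only the device color spaces and Indexed are handled, and image masks (/IM) and color spaces referenced by resource name are left out. It assumes that each image row is padded to a whole number of bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of PoDoFo): number of color components for
// the color space names that may appear directly in an inline image
// dictionary, in abbreviated or full form.
// Returns 0 for anything this sketch does not handle.
int ComponentsForColorSpace(const std::string& cs)
{
    if (cs == "G" || cs == "DeviceGray") return 1;
    if (cs == "RGB" || cs == "DeviceRGB") return 3;
    if (cs == "CMYK" || cs == "DeviceCMYK") return 4;
    if (cs == "I" || cs == "Indexed") return 1; // one palette index per pixel
    return 0;
}

// Hypothetical helper (not part of PoDoFo): number of bytes the image data
// should have after all filters are decoded.
// Returns 0 if the size cannot be determined by this sketch.
std::size_t ExpectedDecodedSize(std::size_t width, std::size_t height,
                                int bitsPerComponent, const std::string& cs)
{
    const int components = ComponentsForColorSpace(cs);
    if (components == 0 || bitsPerComponent <= 0)
        return 0;

    // Assumption: every row starts on a byte boundary, so round up per row.
    const std::size_t bitsPerRow = width * components * bitsPerComponent;
    const std::size_t bytesPerRow = (bitsPerRow + 7) / 8;
    return bytesPerRow * height;
}
```

For the sample above, `ExpectedDecodedSize(4, 4, 8, "RGB")` gives 48, which is exactly the number of bytes between "ID\n" and the final "\nEI", so a tokenizer that counts decoded bytes would not stop at the "EI " in the middle of the data.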