Re: [Podofo-users] Patch to handle indirect ID in trailer
I fully agree with that. Cordialement/Best Regards, Christophe Blanchard -Message d'origine- De : zyx Envoyé : jeudi 1 novembre 2018 11:09 À : podofo-users@lists.sourceforge.net Objet : Re: [Podofo-users] Patch to handle indirect ID in trailer On Wed, 2018-10-31 at 23:14 +0100, Matthew Brincke wrote: > IHMO yes, because if it were encrypted, an indirect ID would be > against the PDF standard, right? I think that should be caught ... Hi, even it would be against the standard, PoDoFo is not a validator, it might not panic if some software creates documents which are not following the standard strictly. Personally, I'd relax more checks in PoDoFo, allowing to read broken files as much as possible, rather than reject to read the file completely. That's only my opinion though. Bye, zyx ___ Podofo-users mailing list Podofo-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/podofo-users This email and any attachments are intended solely for the use of the individual or entity to whom it is addressed and may be confidential and/or privileged. If you are not one of the named recipients or have received this email in error, (i) you should not read, disclose, or copy it, (ii) please notify sender of your receipt by reply email and delete this email and all attachments, (iii) Dassault Systèmes does not accept or assume any liability or responsibility for any use of or reliance on this email. Please be informed that your personal data are processed according to our data privacy policy as described on our website. Should you have any questions related to personal data protection, please contact 3DS Data Protection Officer at 3ds.compliance-priv...@3ds.com<mailto:3ds.compliance-priv...@3ds.com> For other languages, go to https://www.3ds.com/terms/email-disclaimer ___ Podofo-users mailing list Podofo-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/podofo-users
Re: [Podofo-users] Patch to handle indirect ID in trailer
> Le 1 nov. 2018 à 11:09, zyx a écrit : > > On Wed, 2018-10-31 at 23:14 +0100, Matthew Brincke wrote: >> IHMO yes, because if it were encrypted, an indirect ID would be >> against the PDF standard, right? I think that should be caught ... > > Hi, > even it would be against the standard, PoDoFo is not a validator, it > might not panic if some software creates documents which are not > following the standard strictly. Personally, I'd relax more checks in > PoDoFo, allowing to read broken files as much as possible, rather than > reject to read the file completely. That's only my opinion though. > Bye, > zyx +1. -- Best Regards, Meilleures salutations, Met vriendelijke groeten, Olivier Mascia ___ Podofo-users mailing list Podofo-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/podofo-users
Re: [Podofo-users] Patch to handle indirect ID in trailer
On Wed, 2018-10-31 at 23:14 +0100, Matthew Brincke wrote: > IHMO yes, because if it were encrypted, an indirect ID would be > against the PDF standard, right? I think that should be caught ... Hi, even it would be against the standard, PoDoFo is not a validator, it might not panic if some software creates documents which are not following the standard strictly. Personally, I'd relax more checks in PoDoFo, allowing to read broken files as much as possible, rather than reject to read the file completely. That's only my opinion though. Bye, zyx ___ Podofo-users mailing list Podofo-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/podofo-users
Re: [Podofo-users] Patch to handle indirect ID in trailer
Hello Clayton, hello zyx, hello all, > On 30 October 2018 at 18:34 Clayton Wheeler wrote: > ... snip ... > the typo in that comment, but I rewrote it after a closer reading of section > 7.5.5 anyway; it implies the ID can be indirect when a PDF is unencrypted. > what the patch doesn't do is to insert a check for the document actually being not encrypted. Should that that be added (as an extra commit), what do you think? IHMO yes, because if it were encrypted, an indirect ID would be against the PDF standard, right? I think that should be caught ... > Thanks, > -- > > > Clayton Wheeler > cwhee...@genomenon.com Best regards, mabri ___ Podofo-users mailing list Podofo-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/podofo-users
Re: [Podofo-users] Patch to handle indirect ID in trailer
On Tue, 2018-10-30 at 12:34 -0500, Clayton Wheeler wrote: > Here's a fixed (and properly tested) version. Hi, thanks for the patch update. It looks fine, thus I committed it as revision 1947: http://sourceforge.net/p/podofo/code/1947 Bye, zyx ___ Podofo-users mailing list Podofo-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/podofo-users
Re: [Podofo-users] Patch to handle indirect ID in trailer
Hi zyx, [re-sending this to the list, sent it as a direct reply by mistake] You're right, sorry, I only checked my last round of changes with a production build by mistake. Here's a fixed (and properly tested) version. I couldn't spot the typo in that comment, but I rewrote it after a closer reading of section 7.5.5 anyway; it implies the ID can be indirect when a PDF is unencrypted. Thanks, On Sat, Oct 27, 2018 at 8:10 AM, zyx wrote: > On Thu, 2018-10-18 at 17:43 -0500, Clayton Wheeler wrote: > > Does this seem sensible? > > Hi, > the idea sounds good, but the patch doesn't compile. I guess you made > some changes just before creating the patch, but you didn't compile the > code, because otherwise you'd notice it. > > There's also a little typo in your comment, specifically here: > +// Per the PDF spec, section 7.5.5, the ID shall be an indirect > object. > > I'd also remove debugging stuff from the test code, those comments like > this one: > +//std::cerr << inBuf; > also because once someone enables it the code will not compile (again). > > Would you mind to correct the patch, please? > Thanks and bye, > zyx > > > > > ___ > Podofo-users mailing list > Podofo-users@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/podofo-users > -- Clayton Wheeler cwhee...@genomenon.com From 927c629c21e7eb63c8b15468c42263b8626b77de Mon Sep 17 00:00:00 2001 From: Clayton Wheeler Date: Thu, 11 Oct 2018 17:49:49 -0500 Subject: [PATCH] Handle trailer ID being an indirect object Some PDF writers (QuarkXPress and/or Quartz ca. Mac OS X 10.3.9, evidently) can write a file identifier in the trailer as an indirect object. This is implicitly allowed for unencrypted PDFs; this fix allows PoDoFo to parse such files. --- src/base/PdfWriter.cpp | 6 test/unit/ParserTest.cpp | 64 test/unit/ParserTest.h | 3 ++ 3 files changed, 73 insertions(+) diff --git a/src/base/PdfWriter.cpp b/src/base/PdfWriter.cpp index 237e0ff..a314c7a 100644 --- a/src/base/PdfWriter.cpp +++ b/src/base/PdfWriter.cpp @@ -686,6 +686,12 @@ void PdfWriter::CreateFileIdentifier( PdfString & identifier, const PdfObject* p if( pOriginalIdentifier && pTrailer->GetDictionary().HasKey( "ID" )) { const PdfObject* idObj = pTrailer->GetDictionary().GetKey("ID"); +// The PDF spec, section 7.5.5, implies that the ID may be +// indirect as long as the PDF is not encrypted. Handle that +// case. +if ( idObj->IsReference() ) { +idObj = m_vecObjects->GetObject( idObj->GetReference() ); +} TCIVariantList it = idObj->GetArray().begin(); if( it != idObj->GetArray().end() && diff --git a/test/unit/ParserTest.cpp b/test/unit/ParserTest.cpp index d0014cd..2c7936d 100644 --- a/test/unit/ParserTest.cpp +++ b/test/unit/ParserTest.cpp @@ -1981,6 +1981,70 @@ void ParserTest::testIsPdfFile() } } +void ParserTest::testRoundTripIndirectTrailerID() +{ +std::ostringstream oss; +oss << "%PDF-1.1\n"; +int nCurObj = 0; +int objPos[20]; + +// Pages + +int nPagesObj = nCurObj; +objPos[nCurObj] = oss.tellp(); +oss << nCurObj++ << " 0 obj\n"; +oss << "<>\n"; +oss << "endobj"; + +// Root catalog + +int rootObj = nCurObj; +objPos[nCurObj] = oss.tellp(); +oss << nCurObj++ << " 0 obj\n"; +oss << "<>\n"; +oss << "endobj\n"; + +// ID +int nIdObj = nCurObj; +objPos[nCurObj] = oss.tellp(); +oss << nCurObj++ << " 0 obj\n"; +oss << "[ ]\n"; +oss << "endobj\n"; + +int nXrefPos = oss.tellp(); +oss << "xref\n"; +oss << "0 " << nCurObj << "\n"; +char objRec[21]; +for ( int i = 0; i < nCurObj; i++ ) { +snprintf( objRec, 21, "%010d 0 n \n", objPos[i] ); +oss << objRec; +} +oss << "trailer <<\n" +<< " /Size " << nCurObj << "\n" +<< " /Root " << rootObj << " 0 R\n" +<< " /ID " << nIdObj << " 0 R\n" // indirect ID +<< ">>\n" +<< "startxref\n" +<< nXrefPos << "\n" +<< "%%EOF\n"; + +std::string sInBuf = oss.str(); +try { +PoDoFo::PdfMemDocument doc; +// load for update +doc.LoadFromBuffer( sInBuf.c_str(), sInBuf.size(), true ); + +PoDoFo::PdfRefCountedBuffer outBuf; +PoDoFo::PdfOutputDevice outDev( ); + +doc.WriteUpdate( ); +// should not throw +CPPUNIT_ASSERT( true ); +} catch ( PoDoFo::PdfError& error ) { +CPPUNIT_FAIL( "Unexpected PdfError" ); +} +} + std::string ParserTest::generateXRefEntries( size_t count ) { std::string strXRefEntries; diff --git a/test/unit/ParserTest.h b/test/unit/ParserTest.h index b8f7ea9..cffcaaa 100644 --- a/test/unit/ParserTest.h +++ b/test/unit/ParserTest.h @@ -41,6 +41,7 @@ class ParserTest : public CppUnit::TestFixture CPPUNIT_TEST(
Re: [Podofo-users] Patch to handle indirect ID in trailer
On Thu, 2018-10-18 at 17:43 -0500, Clayton Wheeler wrote: > Does this seem sensible? Hi, the idea sounds good, but the patch doesn't compile. I guess you made some changes just before creating the patch, but you didn't compile the code, because otherwise you'd notice it. There's also a little typo in your comment, specifically here: +// Per the PDF spec, section 7.5.5, the ID shall be an indirect object. I'd also remove debugging stuff from the test code, those comments like this one: +//std::cerr << inBuf; also because once someone enables it the code will not compile (again). Would you mind to correct the patch, please? Thanks and bye, zyx ___ Podofo-users mailing list Podofo-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/podofo-users
[Podofo-users] Patch to handle indirect ID in trailer
Hello, While using PoDoFo, I've encountered some slightly non-compliant PDFs with an ID entry in the trailer stored as an indirect object, when the spec mandates that it be a direct object. (At least one of the offending files was from QuarkXPress on Mac OS X 10.3; this may not be a widespread problem.) This causes PoDoFo to error out when parsing the file. I've added a very simple fix and a test case to accompany it. When PoDoFo writes out a modified version of one of these PDFs, it writes the ID into the trailer as a direct object regardless, so this only needs to be dealt with when parsing the PDFs. Does this seem sensible? Thanks, -- Clayton Wheeler cwhee...@genomenon.com From e3daead203d7ebb6a35bc84398531f03bb174b6b Mon Sep 17 00:00:00 2001 From: Clayton Wheeler Date: Thu, 11 Oct 2018 17:49:49 -0500 Subject: [PATCH] Handle trailer ID (incorrectly) being an indirect object Some non-conforming PDF writers (QuarkXPress and/or Quartz ca. Mac OS X 10.3.9, evidently) can write a file identifier in the trailer as an indirect object. This is contrary to the specification, but worth handling since it otherwise breaks PoDoFo. --- src/base/PdfWriter.cpp | 6 test/unit/ParserTest.cpp | 66 test/unit/ParserTest.h | 3 ++ 3 files changed, 75 insertions(+) diff --git a/src/base/PdfWriter.cpp b/src/base/PdfWriter.cpp index 237e0ff..b2a558f 100644 --- a/src/base/PdfWriter.cpp +++ b/src/base/PdfWriter.cpp @@ -686,6 +686,12 @@ void PdfWriter::CreateFileIdentifier( PdfString & identifier, const PdfObject* p if( pOriginalIdentifier && pTrailer->GetDictionary().HasKey( "ID" )) { const PdfObject* idObj = pTrailer->GetDictionary().GetKey("ID"); +// Per the PDF spec, section 7.5.5, the ID shall be an indirect object. +// If a non-conforming writer (e.g. Quark and/or Quartz) writes it as +// an indirect object, we should handle that case. +if ( idObj->IsReference() ) { +idObj = m_vecObjects->GetObject( idObj->GetReference() ); +} TCIVariantList it = idObj->GetArray().begin(); if( it != idObj->GetArray().end() && diff --git a/test/unit/ParserTest.cpp b/test/unit/ParserTest.cpp index d0014cd..a34c039 100644 --- a/test/unit/ParserTest.cpp +++ b/test/unit/ParserTest.cpp @@ -1981,6 +1981,72 @@ void ParserTest::testIsPdfFile() } } +void ParserTest::testRoundTripIndirectTrailerID() +{ +std::ostringstream oss; +oss << "%PDF-1.1\n"; +int nCurObj = 0; +int objPos[20]; + +// Pages + +int nPagesObj = nCurObj; +objPos[objN] = oss.tellp(); +oss << nCurObj++ << " 0 obj\n"; +oss << "<>\n"; +oss << "endobj"; + +// Root catalog + +int rootObj = nCurObj; +objPos[objN] = oss.tellp(); +oss << nCurObj++ << " 0 obj\n"; +oss << "<>\n"; +oss << "endobj\n"; + +// ID +int nIdObj = nCurObj; +objPos[objN] = oss.tellp(); +oss << nCurObj++ << " 0 obj\n"; +oss << "[ ]\n"; +oss << "endobj\n"; + +int nXrefPos = oss.tellp(); +oss << "xref\n"; +oss << "0 " << nCurObj << "\n"; +char objRec[21]; +for ( int i = 0; i < nCurObj; i++ ) { +snprintf( objRec, 21, "%010d 0 n \n", objPos[i] ); +oss << objRec; +} +oss << "trailer <<\n" +<< " /Size " << nCurObj << "\n" +<< " /Root " << rootObj << " 0 R\n" +<< " /ID " << nIdObj << " 0 R\n" // illegal indirect ID +<< ">>\n" +<< "startxref\n" +<< nXrefPos << "\n" +<< "%%EOF\n"; + +std::string sInBuf = oss.str(); +//std::cerr << inBuf; +try { +PoDoFo::PdfMemDocument doc; +// load for update +doc.LoadFromBuffer( inBuf.c_str(), inBuf.size(), true ); + +PoDoFo::PdfRefCountedBuffer outBuf; +PoDoFo::PdfOutputDevice outDev( ); + +doc.WriteUpdate( ); +// should not throw +CPPUNIT_ASSERT( true ); +} catch ( PoDoFo::PdfError& error ) { +//error.PrintErrorMsg(); +CPPUNIT_FAIL( "Unexpected PdfError" ); +} +} + std::string ParserTest::generateXRefEntries( size_t count ) { std::string strXRefEntries; diff --git a/test/unit/ParserTest.h b/test/unit/ParserTest.h index b8f7ea9..cffcaaa 100644 --- a/test/unit/ParserTest.h +++ b/test/unit/ParserTest.h @@ -41,6 +41,7 @@ class ParserTest : public CppUnit::TestFixture CPPUNIT_TEST( testReadXRefStreamContents ); CPPUNIT_TEST( testReadObjects ); CPPUNIT_TEST( testIsPdfFile ); +CPPUNIT_TEST( testRoundTripIndirectTrailerID ); CPPUNIT_TEST_SUITE_END(); public: @@ -77,6 +78,8 @@ public: //void testReadNextTrailer(); //void testCheckEOFMarker(); +void testRoundTripIndirectTrailerID(); + private: std::string generateXRefEntries( size_t count ); bool canOutOfMemoryKillUnitTests(); -- 2.19.1 ___ Podofo-users mailing list