Re: [Podofo-users] Patch to handle indirect ID in trailer

2018-11-01 Thread BLANCHARD Christophe
I fully agree with that.

Cordialement/Best Regards,
Christophe Blanchard

-Message d'origine-
De : zyx 
Envoyé : jeudi 1 novembre 2018 11:09
À : podofo-users@lists.sourceforge.net
Objet : Re: [Podofo-users] Patch to handle indirect ID in trailer

On Wed, 2018-10-31 at 23:14 +0100, Matthew Brincke wrote:
> IHMO yes, because if it were encrypted, an indirect ID would be
> against the PDF standard, right? I think that should be caught ...

Hi,
even it would be against the standard, PoDoFo is not a validator, it might not 
panic if some software creates documents which are not following the standard 
strictly. Personally, I'd relax more checks in PoDoFo, allowing to read broken 
files as much as possible, rather than reject to read the file completely. 
That's only my opinion though.
Bye,
zyx



___
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users
This email and any attachments are intended solely for the use of the 
individual or entity to whom it is addressed and may be confidential and/or 
privileged.

If you are not one of the named recipients or have received this email in error,

(i) you should not read, disclose, or copy it,

(ii) please notify sender of your receipt by reply email and delete this email 
and all attachments,

(iii) Dassault Systèmes does not accept or assume any liability or 
responsibility for any use of or reliance on this email.


Please be informed that your personal data are processed according to our data 
privacy policy as described on our website. Should you have any questions 
related to personal data protection, please contact 3DS Data Protection Officer 
at 3ds.compliance-priv...@3ds.com<mailto:3ds.compliance-priv...@3ds.com>


For other languages, go to https://www.3ds.com/terms/email-disclaimer


___
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users


Re: [Podofo-users] Patch to handle indirect ID in trailer

2018-11-01 Thread Olivier Mascia
> Le 1 nov. 2018 à 11:09, zyx  a écrit :
> 
> On Wed, 2018-10-31 at 23:14 +0100, Matthew Brincke wrote:
>> IHMO yes, because if it were encrypted, an indirect ID would be
>> against the PDF standard, right? I think that should be caught ...  
> 
>   Hi,
> even it would be against the standard, PoDoFo is not a validator, it
> might not panic if some software creates documents which are not
> following the standard strictly. Personally, I'd relax more checks in
> PoDoFo, allowing to read broken files as much as possible, rather than
> reject to read the file completely. That's only my opinion though.
>   Bye,
>   zyx

+1.

-- 
Best Regards, Meilleures salutations, Met vriendelijke groeten,
Olivier Mascia




___
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users


Re: [Podofo-users] Patch to handle indirect ID in trailer

2018-11-01 Thread zyx
On Wed, 2018-10-31 at 23:14 +0100, Matthew Brincke wrote:
> IHMO yes, because if it were encrypted, an indirect ID would be
> against the PDF standard, right? I think that should be caught ...  

Hi,
even it would be against the standard, PoDoFo is not a validator, it
might not panic if some software creates documents which are not
following the standard strictly. Personally, I'd relax more checks in
PoDoFo, allowing to read broken files as much as possible, rather than
reject to read the file completely. That's only my opinion though.
Bye,
zyx



___
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users


Re: [Podofo-users] Patch to handle indirect ID in trailer

2018-10-31 Thread Matthew Brincke
Hello Clayton, hello zyx, hello all,

> On 30 October 2018 at 18:34 Clayton Wheeler  wrote: 
>  
... snip ...
> the typo in that comment, but I rewrote it after a closer reading of section
> 7.5.5 anyway; it implies the ID can be indirect when a PDF is unencrypted.
>

what the patch doesn't do is to insert a check for the document actually being
not encrypted. Should that that be added (as an extra commit), what do you 
think?
IHMO yes, because if it were encrypted, an indirect ID would be against the PDF
standard, right? I think that should be caught ...  

> Thanks,
> -- 
> 
> 
> Clayton Wheeler
> cwhee...@genomenon.com

Best regards, mabri


___
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users


Re: [Podofo-users] Patch to handle indirect ID in trailer

2018-10-31 Thread zyx
On Tue, 2018-10-30 at 12:34 -0500, Clayton Wheeler wrote:
> Here's a fixed (and properly tested) version.

Hi,
thanks for the patch update. It looks fine, thus I committed it as
revision 1947:
http://sourceforge.net/p/podofo/code/1947

Bye,
zyx



___
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users


Re: [Podofo-users] Patch to handle indirect ID in trailer

2018-10-30 Thread Clayton Wheeler
Hi zyx,

[re-sending this to the list, sent it as a direct reply by mistake]

You're right, sorry, I only checked my last round of changes with a
production build by mistake. Here's a fixed (and properly tested) version.
I couldn't spot the typo in that comment, but I rewrote it after a closer
reading of section 7.5.5 anyway; it implies the ID can be indirect when a
PDF is unencrypted.

Thanks,


On Sat, Oct 27, 2018 at 8:10 AM, zyx  wrote:

> On Thu, 2018-10-18 at 17:43 -0500, Clayton Wheeler wrote:
> > Does this seem sensible?
>
> Hi,
> the idea sounds good, but the patch doesn't compile. I guess you made
> some changes just before creating the patch, but you didn't compile the
> code, because otherwise you'd notice it.
>
> There's also a little typo in your comment, specifically here:
> +// Per the PDF spec, section 7.5.5, the ID shall be an indirect
> object.
>
> I'd also remove debugging stuff from the test code, those comments like
> this one:
> +//std::cerr << inBuf;
> also because once someone enables it the code will not compile (again).
>
> Would you mind to correct the patch, please?
> Thanks and bye,
> zyx
>
>
>
>
> ___
> Podofo-users mailing list
> Podofo-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/podofo-users
>



-- 
Clayton Wheeler
cwhee...@genomenon.com
From 927c629c21e7eb63c8b15468c42263b8626b77de Mon Sep 17 00:00:00 2001
From: Clayton Wheeler 
Date: Thu, 11 Oct 2018 17:49:49 -0500
Subject: [PATCH] Handle trailer ID being an indirect object

Some PDF writers (QuarkXPress and/or Quartz ca. Mac OS X 10.3.9,
evidently) can write a file identifier in the trailer as an indirect
object. This is implicitly allowed for unencrypted PDFs; this fix
allows PoDoFo to parse such files.
---
 src/base/PdfWriter.cpp   |  6 
 test/unit/ParserTest.cpp | 64 
 test/unit/ParserTest.h   |  3 ++
 3 files changed, 73 insertions(+)

diff --git a/src/base/PdfWriter.cpp b/src/base/PdfWriter.cpp
index 237e0ff..a314c7a 100644
--- a/src/base/PdfWriter.cpp
+++ b/src/base/PdfWriter.cpp
@@ -686,6 +686,12 @@ void PdfWriter::CreateFileIdentifier( PdfString & identifier, const PdfObject* p
 if( pOriginalIdentifier && pTrailer->GetDictionary().HasKey( "ID" ))
 {
 const PdfObject* idObj = pTrailer->GetDictionary().GetKey("ID");
+// The PDF spec, section 7.5.5, implies that the ID may be
+// indirect as long as the PDF is not encrypted. Handle that
+// case.
+if ( idObj->IsReference() ) {
+idObj = m_vecObjects->GetObject( idObj->GetReference() );
+}
 
 TCIVariantList it = idObj->GetArray().begin();
 if( it != idObj->GetArray().end() &&
diff --git a/test/unit/ParserTest.cpp b/test/unit/ParserTest.cpp
index d0014cd..2c7936d 100644
--- a/test/unit/ParserTest.cpp
+++ b/test/unit/ParserTest.cpp
@@ -1981,6 +1981,70 @@ void ParserTest::testIsPdfFile()
 } 
 }
 
+void ParserTest::testRoundTripIndirectTrailerID()
+{
+std::ostringstream oss;
+oss << "%PDF-1.1\n";
+int nCurObj = 0;
+int objPos[20];
+
+// Pages
+
+int nPagesObj = nCurObj;
+objPos[nCurObj] = oss.tellp();
+oss << nCurObj++ << " 0 obj\n";
+oss << "<>\n";
+oss << "endobj";
+
+// Root catalog
+
+int rootObj = nCurObj;
+objPos[nCurObj] = oss.tellp();
+oss << nCurObj++ << " 0 obj\n";
+oss << "<>\n";
+oss << "endobj\n";
+
+// ID
+int nIdObj = nCurObj;
+objPos[nCurObj] = oss.tellp();
+oss << nCurObj++ << " 0 obj\n";
+oss << "[ ]\n";
+oss << "endobj\n";
+
+int nXrefPos = oss.tellp();
+oss << "xref\n";
+oss << "0 " << nCurObj << "\n";
+char objRec[21];
+for ( int i = 0; i < nCurObj; i++ ) {
+snprintf( objRec, 21, "%010d 0 n \n", objPos[i] );
+oss << objRec;
+}
+oss << "trailer <<\n"
+<< "  /Size " << nCurObj << "\n"
+<< "  /Root " << rootObj << " 0 R\n"
+<< "  /ID " << nIdObj << " 0 R\n" // indirect ID
+<< ">>\n"
+<< "startxref\n"
+<< nXrefPos << "\n"
+<< "%%EOF\n";
+
+std::string sInBuf = oss.str();
+try {
+PoDoFo::PdfMemDocument doc;
+// load for update
+doc.LoadFromBuffer( sInBuf.c_str(), sInBuf.size(), true );
+
+PoDoFo::PdfRefCountedBuffer outBuf;
+PoDoFo::PdfOutputDevice outDev(  );
+
+doc.WriteUpdate(  );
+// should not throw
+CPPUNIT_ASSERT( true );
+} catch ( PoDoFo::PdfError& error ) {
+CPPUNIT_FAIL( "Unexpected PdfError" );
+}
+}
+
 std::string ParserTest::generateXRefEntries( size_t count )
 {
 std::string strXRefEntries;
diff --git a/test/unit/ParserTest.h b/test/unit/ParserTest.h
index b8f7ea9..cffcaaa 100644
--- a/test/unit/ParserTest.h
+++ b/test/unit/ParserTest.h
@@ -41,6 +41,7 @@ class ParserTest : public CppUnit::TestFixture
 CPPUNIT_TEST( 

Re: [Podofo-users] Patch to handle indirect ID in trailer

2018-10-27 Thread zyx
On Thu, 2018-10-18 at 17:43 -0500, Clayton Wheeler wrote:
> Does this seem sensible?

Hi,
the idea sounds good, but the patch doesn't compile. I guess you made
some changes just before creating the patch, but you didn't compile the
code, because otherwise you'd notice it.

There's also a little typo in your comment, specifically here:
+// Per the PDF spec, section 7.5.5, the ID shall be an indirect object.

I'd also remove debugging stuff from the test code, those comments like
this one:
+//std::cerr << inBuf;
also because once someone enables it the code will not compile (again).

Would you mind to correct the patch, please?
Thanks and bye,
zyx




___
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users


[Podofo-users] Patch to handle indirect ID in trailer

2018-10-18 Thread Clayton Wheeler
Hello,

While using PoDoFo, I've encountered some slightly non-compliant PDFs with
an ID entry in the trailer stored as an indirect object, when the spec
mandates that it be a direct object. (At least one of the offending files
was from QuarkXPress on Mac OS X 10.3; this may not be a widespread
problem.) This causes PoDoFo to error out when parsing the file.

I've added a very simple fix and a test case to accompany it. When PoDoFo
writes out a modified version of one of these PDFs, it writes the ID into
the trailer as a direct object regardless, so this only needs to be dealt
with when parsing the PDFs.

Does this seem sensible?

Thanks,

-- 
Clayton Wheeler
cwhee...@genomenon.com
From e3daead203d7ebb6a35bc84398531f03bb174b6b Mon Sep 17 00:00:00 2001
From: Clayton Wheeler 
Date: Thu, 11 Oct 2018 17:49:49 -0500
Subject: [PATCH] Handle trailer ID (incorrectly) being an indirect object

Some non-conforming PDF writers (QuarkXPress and/or Quartz ca. Mac OS
X 10.3.9, evidently) can write a file identifier in the trailer as an
indirect object. This is contrary to the specification, but worth
handling since it otherwise breaks PoDoFo.
---
 src/base/PdfWriter.cpp   |  6 
 test/unit/ParserTest.cpp | 66 
 test/unit/ParserTest.h   |  3 ++
 3 files changed, 75 insertions(+)

diff --git a/src/base/PdfWriter.cpp b/src/base/PdfWriter.cpp
index 237e0ff..b2a558f 100644
--- a/src/base/PdfWriter.cpp
+++ b/src/base/PdfWriter.cpp
@@ -686,6 +686,12 @@ void PdfWriter::CreateFileIdentifier( PdfString & identifier, const PdfObject* p
 if( pOriginalIdentifier && pTrailer->GetDictionary().HasKey( "ID" ))
 {
 const PdfObject* idObj = pTrailer->GetDictionary().GetKey("ID");
+// Per the PDF spec, section 7.5.5, the ID shall be an indirect object.
+// If a non-conforming writer (e.g. Quark and/or Quartz) writes it as
+// an indirect object, we should handle that case.
+if ( idObj->IsReference() ) {
+idObj = m_vecObjects->GetObject( idObj->GetReference() );
+}
 
 TCIVariantList it = idObj->GetArray().begin();
 if( it != idObj->GetArray().end() &&
diff --git a/test/unit/ParserTest.cpp b/test/unit/ParserTest.cpp
index d0014cd..a34c039 100644
--- a/test/unit/ParserTest.cpp
+++ b/test/unit/ParserTest.cpp
@@ -1981,6 +1981,72 @@ void ParserTest::testIsPdfFile()
 } 
 }
 
+void ParserTest::testRoundTripIndirectTrailerID()
+{
+std::ostringstream oss;
+oss << "%PDF-1.1\n";
+int nCurObj = 0;
+int objPos[20];
+
+// Pages
+
+int nPagesObj = nCurObj;
+objPos[objN] = oss.tellp();
+oss << nCurObj++ << " 0 obj\n";
+oss << "<>\n";
+oss << "endobj";
+
+// Root catalog
+
+int rootObj = nCurObj;
+objPos[objN] = oss.tellp();
+oss << nCurObj++ << " 0 obj\n";
+oss << "<>\n";
+oss << "endobj\n";
+
+// ID
+int nIdObj = nCurObj;
+objPos[objN] = oss.tellp();
+oss << nCurObj++ << " 0 obj\n";
+oss << "[ ]\n";
+oss << "endobj\n";
+
+int nXrefPos = oss.tellp();
+oss << "xref\n";
+oss << "0 " << nCurObj << "\n";
+char objRec[21];
+for ( int i = 0; i < nCurObj; i++ ) {
+snprintf( objRec, 21, "%010d 0 n \n", objPos[i] );
+oss << objRec;
+}
+oss << "trailer <<\n"
+<< "  /Size " << nCurObj << "\n"
+<< "  /Root " << rootObj << " 0 R\n"
+<< "  /ID " << nIdObj << " 0 R\n" // illegal indirect ID
+<< ">>\n"
+<< "startxref\n"
+<< nXrefPos << "\n"
+<< "%%EOF\n";
+
+std::string sInBuf = oss.str();
+//std::cerr << inBuf;
+try {
+PoDoFo::PdfMemDocument doc;
+// load for update
+doc.LoadFromBuffer( inBuf.c_str(), inBuf.size(), true );
+
+PoDoFo::PdfRefCountedBuffer outBuf;
+PoDoFo::PdfOutputDevice outDev(  );
+
+doc.WriteUpdate(  );
+// should not throw
+CPPUNIT_ASSERT( true );
+} catch ( PoDoFo::PdfError& error ) {
+//error.PrintErrorMsg();
+CPPUNIT_FAIL( "Unexpected PdfError" );
+}
+}
+
 std::string ParserTest::generateXRefEntries( size_t count )
 {
 std::string strXRefEntries;
diff --git a/test/unit/ParserTest.h b/test/unit/ParserTest.h
index b8f7ea9..cffcaaa 100644
--- a/test/unit/ParserTest.h
+++ b/test/unit/ParserTest.h
@@ -41,6 +41,7 @@ class ParserTest : public CppUnit::TestFixture
 CPPUNIT_TEST( testReadXRefStreamContents );
 CPPUNIT_TEST( testReadObjects );
 CPPUNIT_TEST( testIsPdfFile );
+CPPUNIT_TEST( testRoundTripIndirectTrailerID );
 CPPUNIT_TEST_SUITE_END();
 
 public:
@@ -77,6 +78,8 @@ public:
 //void testReadNextTrailer();
 //void testCheckEOFMarker();
 
+void testRoundTripIndirectTrailerID();
+
 private:
 std::string generateXRefEntries( size_t count );
 bool canOutOfMemoryKillUnitTests();
-- 
2.19.1

___
Podofo-users mailing list