Hello again,

today I had some time for testing whether reducing the try-catch clauses
solves the stack overflow, and it does:

I removed try-catch clauses in PdfParser::ReadNextTrailer &
PdfParser::ReadXRefContents.
Also I removed "PODOFO_RAISE_ERROR( ePdfError_NoTrailer )" in
PdfParser::ReadNextTrailer() when there's no trailer,
since the error got suppressed in subsequent try-catch clauses anyway.

With this, I could safely try to load the test document with 160 trailers,
and podofo failed to load it safely without running into the stack overflow
(tested witth Debug and Release).
I added a patch file for the changes to the mail, although its for v0.9.8.
We don't have upgraded to podofo 0.10 yet, So I can't test any changes to
this version beforehand.
But recreating the patch file for 0.10 should be fairly easy, given that
the parser code did not really change between 0.9.x and 0.10.x

Minor side-note:
Why does the PdfTokenizer::RecursionGuard throw a "ePdfError_InvalidXRef"
error when the rmax recursion depth is reached?
It would make more sense to throw a special "MaxRecursionReached" error
here.

Regards,
F.E.

Am Di., 6. Feb. 2024 um 12:40 Uhr schrieb Francesco Pretto <cez...@gmail.com
>:

> Hello, thank you for the fixture, I can reproduce in Windows. With a test
> case fixing the bug becomes less "guess work" and more fun. Unfortunately
> I'm still in a period where I have little time to spend on PoDoFo in tasks
> that are not part of my daily work. If nobody takes it in the mean time, I
> think I may have time to look at it in a couple of weeks.
>
> Regards.
> Francesco
>
> On Thu, 1 Feb 2024 at 19:07, F. E. <exler7...@gmail.com> wrote:
>
>> Hello again,
>>
>> using a ps script, I successfully created an empty pdf with 160 trailers.
>> The trailers are all empty and do not change any objects, but reference
>> each other via /Prev, which is sufficient for the stack overflow to happen.
>>
>> Feel free to test the parsing with it :)
>>
>> Regards,
>> F.E.
>>
>> Am Do., 1. Feb. 2024 um 07:58 Uhr schrieb F. E. <exler7...@gmail.com>:
>>
>>> "think the matter is complex enough that this may undermine the chances
>>> a complete fix materializes soon. Also if you or someone else come up with
>>> a fix it would be recommendable for such SO related issue to add a unit
>>> test on an anonymized file that shows the same or very similar behavior."
>>>
>>> All thats needed is an example pdf with a massive amount of trailers /
>>> updates, it does not have to be my specific file. I know one can open a pdf
>>> in update-only mode with Podofo, I just don't know what the easiest way to
>>> apply an update would be. Maybe such a pdf could even be crafted manually.
>>> I'll look into this in the coming days, and will share a file if I'm
>>> successful.
>>>
>>> Am Mi., 31. Jan. 2024 um 14:28 Uhr schrieb Francesco Pretto <
>>> cez...@gmail.com>:
>>>
>>>> On Tue, 30 Jan 2024 at 12:39, F. E. <exler7...@gmail.com> wrote:
>>>>
>>>>>  I sadly cannot share the file I'm testing against, since it contains
>>>>> confidential data.
>>>>>
>>>>
>>>> I think the matter is complex enough that this may undermine the
>>>> chances a complete fix materializes soon. Also if you or someone else come
>>>> up with a fix it would be recommendable for such SO related issue to add a
>>>> unit test on an anonymized file that shows the same or very similar
>>>> behavior.
>>>>
>>>> Francesco
>>>>
>>> _______________________________________________
>> Podofo-users mailing list
>> Podofo-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/podofo-users
>>
>
--- src/podofo/base/PdfParser.cpp       2022-04-11 07:07:00.000000000 +0100
+++ src/podofo/base/PdfParser.cpp       2024-02-13 14:35:00.000000000 +0100
@@ -298,8 +298,7 @@ void PdfParser::ReadDocumentStructure()
         try {
             ReadNextTrailer();
         } catch( PdfError & e ) {
-            if( e != ePdfError_NoTrailer )
-                throw e;
+            throw e;
         }
     }
 
@@ -518,13 +517,8 @@ void PdfParser::ReadNextTrailer()
     //if( strcmp( m_buffer.GetBuffer(), "railer" ) == 0 )
     {
         PdfParserObject trailer( m_vecObjects, m_device, m_buffer );
-        try {
-            // Ignore the encryption in the trailer as the trailer may not be 
encrypted
-            trailer.ParseFile( NULL, true );
-        } catch( PdfError & e ) {
-            e.AddToCallstack( __FILE__, __LINE__, "The linearized trailer was 
found in the file, but contains errors." );
-            throw e;
-        }
+        // Ignore the encryption in the trailer as the trailer may not be 
encrypted
+        trailer.ParseFile( NULL, true );
 
         // now merge the information of this trailer with the main documents 
trailer
         MergeTrailer( &trailer );
@@ -536,42 +530,28 @@ void PdfParser::ReadNextTrailer()
             if( !trailer.GetDictionary().HasKey( "Prev" ) )
                 m_nIncrementalUpdates++;
 
-            try {
-                ReadXRefStreamContents( 
static_cast<pdf_long>(trailer.GetDictionary().GetKeyAsLong( "XRefStm", 0 )), 
false );
-            } catch( PdfError & e ) {
-                e.AddToCallstack( __FILE__, __LINE__, "Unable to load /XRefStm 
xref stream." );
-                throw e;
-            }
+            ReadXRefStreamContents( 
static_cast<pdf_long>(trailer.GetDictionary().GetKeyAsLong( "XRefStm", 0 )), 
false );
         }
 
         if( trailer.GetDictionary().HasKey( "Prev" ) )
         {
-            try {
-                pdf_long lOffset = 
static_cast<pdf_long>(trailer.GetDictionary().GetKeyAsLong( "Prev", 0 ));
-                if( 0 == lOffset )
-                {
-                    PdfError::LogMessage( eLogSeverity_Warning, "XRef contents 
at offset %" PDF_FORMAT_INT64 " is invalid, skipping the read\n", 
static_cast<pdf_int64>( lOffset ));
-                    return;
-                }
-                
-                // Whenever we read a Prev key (!= 0), 
-                // we know that the file was updated.
-                m_nIncrementalUpdates++;                
-
-                if( m_visitedXRefOffsets.find( lOffset ) == 
m_visitedXRefOffsets.end() )
-                    ReadXRefContents( lOffset );
-                else
-                    PdfError::LogMessage( eLogSeverity_Warning, "XRef contents 
at offset %" PDF_FORMAT_INT64 " requested twice, skipping the second read\n", 
static_cast<pdf_int64>( lOffset ));
-            } catch( PdfError & e ) {
-                e.AddToCallstack( __FILE__, __LINE__, "Unable to load /Prev 
xref entries." );
-                throw e;
+            pdf_long lOffset = 
static_cast<pdf_long>(trailer.GetDictionary().GetKeyAsLong( "Prev", 0 ));
+            if( 0 == lOffset )
+            {
+                PdfError::LogMessage( eLogSeverity_Warning, "XRef contents at 
offset %" PDF_FORMAT_INT64 " is invalid, skipping the read\n", 
static_cast<pdf_int64>( lOffset ));
+                return;
             }
+            
+            // Whenever we read a Prev key (!= 0), 
+            // we know that the file was updated.
+            m_nIncrementalUpdates++;                
+
+            if( m_visitedXRefOffsets.find( lOffset ) == 
m_visitedXRefOffsets.end() )
+                ReadXRefContents( lOffset );
+            else
+                PdfError::LogMessage( eLogSeverity_Warning, "XRef contents at 
offset %" PDF_FORMAT_INT64 " requested twice, skipping the second read\n", 
static_cast<pdf_int64>( lOffset ));
         }
     }
-    else // OC 13.08.2010 BugFix: else belongs to IsNextToken( "trailer" ) and 
not to HasKey( "Prev" )
-    {
-        PODOFO_RAISE_ERROR( ePdfError_NoTrailer );
-    }
 }
 
 void PdfParser::ReadTrailer()
@@ -601,13 +581,8 @@ void PdfParser::ReadTrailer()
     else 
     {
         m_pTrailer = new PdfParserObject( m_vecObjects, m_device, m_buffer );
-        try {
-            // Ignore the encryption in the trailer as the trailer may not be 
encrypted
-            static_cast<PdfParserObject*>(m_pTrailer)->ParseFile( NULL, true );
-        } catch( PdfError & e ) {
-            e.AddToCallstack( __FILE__, __LINE__, "The trailer was found in 
the file, but contains errors." );
-            throw e;
-        }
+        // Ignore the encryption in the trailer as the trailer may not be 
encrypted
+        static_cast<PdfParserObject*>(m_pTrailer)->ParseFile( NULL, true );
 #ifdef PODOFO_VERBOSE_DEBUG
         PdfError::DebugMessage("Size=%" PDF_FORMAT_INT64 "\n", 
m_pTrailer->GetDictionary().GetKeyAsLong( PdfName::KeySize, 0 ) );
 #endif // PODOFO_VERBOSE_DEBUG
@@ -738,22 +713,14 @@ void PdfParser::ReadXRefContents( pdf_long lOffset, bool 
bPositionAtEnd )
             if( e == ePdfError_NoNumber || e == ePdfError_InvalidXRef || e == 
ePdfError_UnexpectedEOF ) 
                 break;
             else 
-           {
+            {
                 e.AddToCallstack( __FILE__, __LINE__ );
                 throw e;
             }
         }
     }
 
-    try {
-        ReadNextTrailer();
-    } catch( PdfError & e ) {
-        if( e != ePdfError_NoTrailer ) 
-        {
-            e.AddToCallstack( __FILE__, __LINE__ );
-            throw e;
-        }
-    }
+    ReadNextTrailer();
 }
 
 static bool CheckEOL( char e1, char e2 )
_______________________________________________
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users

Reply via email to