Hi

I’ve been looking at the recursive stack consumption issues in PoDoFo:

https://sourceforge.net/p/podofo/tickets/15/ (CVE-2018-8002)
https://sourceforge.net/p/podofo/tickets/130/ (CVE-2021-30470)
https://sourceforge.net/p/podofo/tickets/131/ (CVE-2021-30471)

tl;dr
PoDoFo needs a general mechanism to prevent recursive stack consumption because 
this can happen in many different places in PDFs. Even if the issues above are 
fixed there will still be other stack overflow issues in PoDoFo.
See Possible Solutions at end.

Stack consumption CVEs on other PDF tools
Stack consumption issues are very common in PDF processing code – here’s a few 
I found searching for “pdf stack consumption cve” – this is just a subset:

https://nvd.nist.gov/vuln/detail/CVE-2021-38569 (Foxit Reader stack consumption 
via recursive function calls during the handling of XFA forms or link objects)
https://nvd.nist.gov/vuln/detail/CVE-2021-38566 (Foxit Reader recursive stack 
consumption reading embedded xml nodes)
https://nvd.nist.gov/vuln/detail/CVE-2020-13815 (Foxit Reader recursive stack 
consumption reading indirect object references)
https://nvd.nist.gov/vuln/detail/CVE-2019-9903 (Poppler PDF mishandles dict 
marking, leading to stack consumption)
https://nvd.nist.gov/vuln/detail/CVE-2018-16369 (Xpdf recursive stack 
consumption processing forms)
https://nvd.nist.gov/vuln/detail/CVE-2018-6544  (Artifex MuPDF recursive stack 
consumption reading xref streams)
https://nvd.nist.gov/vuln/detail/CVE-2017-12595 (QPDF recursive stack 
consumption processing  arrays and dictionaries)
https://nvd.nist.gov/vuln/detail/CVE-2009-3431  (Acrobat Reader recursive stack 
consumption processing large number of [[[[[ square open brackets )

Recursive overflows are common because PDFs contain lots of recursive 
structures (e.g. dictionaries can contain arrays, which can contain 
dictionaries, which can contain…). These recursive structures can be very 
deeply nested. It’s also possible to create specially-crafted PDFs with loops 
(.e.g. /Next points to /Prev)

Stack consumption prevention in other document formats
Other document languages like HTML and XML have the same problem – this is 
usually addressed by defining a hard limit for nesting (sometimes configurable):


  *   HTML – Chrome and Firefox have a fixed 512 element depth limit 
https://github.com/validator/htmlparser/blob/493e71d89b69f6036bd5bd36262d34ea81613f39/src/nu/validator/htmlparser/impl/TreeBuilder.java#L6093
  *   XML – the MSXML processor has a default maximum element depth of 256, 
which is configurable 
https://docs.microsoft.com/en-us/previous-versions/windows/desktop/ms762248(v=vs.85)

Operating system stack size and guard pages
Stack sizes vary a lot between operating systems:


  *   Windows: 1MB on x32, x64, ARM 
https://docs.microsoft.com/en-us/cpp/build/reference/stack-stack-allocations?view=msvc-160
  *   macOS: 8MB on main thread, 512KB on worker threads 
https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/Multithreading/CreatingThreads/CreatingThreads.html
  *   iOS: 1MB on main thread, 512KB on worker threads
  *   Linux distros – usually 8MB

On modern operating systems there’s usually a stack guard page, so that stack 
consumption usually causes an access violation, but this is relatively recent 
in Linux (only added in 2010) – and that was only a 4KB guard page:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=320b2b8de12698082609ebbc1a17165727f4c893

Even with stack guard pages it’s possible to hop over the guard page if a 
function (or a library call to FreeType, zlib, libtiff) allocates more stack 
than the size of the guard page, and then overwrites whatever is below the 
guard page in memory.
https://lwn.net/Articles/725832/

Embedded systems (e.g. printers that produce scanned PDFs) often don’t have 
stack guard pages, so recursive stack consumption overwrites whatever is below 
the stack in memory.

Possible solutions:


  1.  Add a depth parameter to every method that can make recursive calls
This is the usual solution to the problem, but I think that would modify a very 
large number of PoDoFo methods since dictionaries are used in lots of places, 
and this would break code and ABI compatibility.

  2.  Extend the current PdfRecursionGuard in PdfParser to other classes like 
the tokenizer
This has a recursion count per-object so doesn’t protect against overflow using 
mutual recursion (e.g. when a dictionary contains an array, which contains 
another dictionary, which contains another array…)

  3.  Modify PdfRecursionGuard to store a static recursion count in 
thread-local storage.

This means every thread has its own recursion count - errno works the same way 
in the C standard library – there’s an errno for each thread. This works 
because there’s a 1:1 mapping between threads and stacks, so the thread local 
variable counts calls on the current thread’s stack.

The compiler needs to support the C11 thread_local keyword or one of the older 
equivalents,  but it can be done something like this:

#if defined( thread_local )

    // thread_local is a macro which can be used as a feature test

    // see https://en.cppreference.com/w/c/thread/thread_local

    #define PODOFO_THREAD_LOCAL thread_local

#elif defined(_MSC_VER) && defined( WINVER ) && ( WINVER >= _WIN32_WINNT_VISTA )

    // supported on Windows Vista and above (limited support on Windows XP)

    #define PODOFO_THREAD_LOCAL __declspec( thread )
#elif defined(__GNUC__) && ( __GNUC__ >= 3 )

    // supported on GCC 
https://gcc.gnu.org/onlinedocs/gcc-3.3.1/gcc/Thread-Local.html#Thread-Local

    #define PODOFO_THREAD_LOCAL __thread

#else

    // fallback to process global recursion count – overcounts depth if PDFs 
processed in parallel on multiple threads, same result as thread_local if 
process single threaded

    #define PODOFO_THREAD_LOCAL

#endif



    PdfRecursionGuard()

    {

        ++t_ nRecursionDepth;



        if ( t_ nRecursionDepth > s_maxRecursionDepth )

        {

            PODOFO_RAISE_ERROR( ePdfError_OutOfMemory ); // 
ePdfError_NestedTooDeep might be better

        }

    }



    ~PdfRecursionGuard()

    {

        --t_ nRecursionDepth;

    }



    static void SetMaxRecursionDepth( int32_t maxRecursionDepth )

    {

        s_maxRecursionDepth = maxRecursionDepth;

    }



static PODOFO_THREAD_LOCAL int t_nRecursionDepth = 0;

static int s_maxRecursionDepth = 500; // user-configurable with reasonable 
default


Adding recursion counting to a method just involves adding the following local 
variable to any method you need to guard:
PdfRecursionGuard guard;


Is option 3) worth investigating?
What does everyone think?

Best Regards
Mark

--
Mark Rogers - mark.rog...@powermapper.com<mailto:mark.rog...@powermapper.com>
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL

_______________________________________________
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users

Reply via email to