Hi I’ve been looking at the recursive stack consumption issues in PoDoFo:
https://sourceforge.net/p/podofo/tickets/15/ (CVE-2018-8002) https://sourceforge.net/p/podofo/tickets/130/ (CVE-2021-30470) https://sourceforge.net/p/podofo/tickets/131/ (CVE-2021-30471) tl;dr PoDoFo needs a general mechanism to prevent recursive stack consumption because this can happen in many different places in PDFs. Even if the issues above are fixed there will still be other stack overflow issues in PoDoFo. See Possible Solutions at end. Stack consumption CVEs on other PDF tools Stack consumption issues are very common in PDF processing code – here’s a few I found searching for “pdf stack consumption cve” – this is just a subset: https://nvd.nist.gov/vuln/detail/CVE-2021-38569 (Foxit Reader stack consumption via recursive function calls during the handling of XFA forms or link objects) https://nvd.nist.gov/vuln/detail/CVE-2021-38566 (Foxit Reader recursive stack consumption reading embedded xml nodes) https://nvd.nist.gov/vuln/detail/CVE-2020-13815 (Foxit Reader recursive stack consumption reading indirect object references) https://nvd.nist.gov/vuln/detail/CVE-2019-9903 (Poppler PDF mishandles dict marking, leading to stack consumption) https://nvd.nist.gov/vuln/detail/CVE-2018-16369 (Xpdf recursive stack consumption processing forms) https://nvd.nist.gov/vuln/detail/CVE-2018-6544 (Artifex MuPDF recursive stack consumption reading xref streams) https://nvd.nist.gov/vuln/detail/CVE-2017-12595 (QPDF recursive stack consumption processing arrays and dictionaries) https://nvd.nist.gov/vuln/detail/CVE-2009-3431 (Acrobat Reader recursive stack consumption processing large number of [[[[[ square open brackets ) Recursive overflows are common because PDFs contain lots of recursive structures (e.g. dictionaries can contain arrays, which can contain dictionaries, which can contain…). These recursive structures can be very deeply nested. It’s also possible to create specially-crafted PDFs with loops (.e.g. /Next points to /Prev) Stack consumption prevention in other document formats Other document languages like HTML and XML have the same problem – this is usually addressed by defining a hard limit for nesting (sometimes configurable): * HTML – Chrome and Firefox have a fixed 512 element depth limit https://github.com/validator/htmlparser/blob/493e71d89b69f6036bd5bd36262d34ea81613f39/src/nu/validator/htmlparser/impl/TreeBuilder.java#L6093 * XML – the MSXML processor has a default maximum element depth of 256, which is configurable https://docs.microsoft.com/en-us/previous-versions/windows/desktop/ms762248(v=vs.85) Operating system stack size and guard pages Stack sizes vary a lot between operating systems: * Windows: 1MB on x32, x64, ARM https://docs.microsoft.com/en-us/cpp/build/reference/stack-stack-allocations?view=msvc-160 * macOS: 8MB on main thread, 512KB on worker threads https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/Multithreading/CreatingThreads/CreatingThreads.html * iOS: 1MB on main thread, 512KB on worker threads * Linux distros – usually 8MB On modern operating systems there’s usually a stack guard page, so that stack consumption usually causes an access violation, but this is relatively recent in Linux (only added in 2010) – and that was only a 4KB guard page: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=320b2b8de12698082609ebbc1a17165727f4c893 Even with stack guard pages it’s possible to hop over the guard page if a function (or a library call to FreeType, zlib, libtiff) allocates more stack than the size of the guard page, and then overwrites whatever is below the guard page in memory. https://lwn.net/Articles/725832/ Embedded systems (e.g. printers that produce scanned PDFs) often don’t have stack guard pages, so recursive stack consumption overwrites whatever is below the stack in memory. Possible solutions: 1. Add a depth parameter to every method that can make recursive calls This is the usual solution to the problem, but I think that would modify a very large number of PoDoFo methods since dictionaries are used in lots of places, and this would break code and ABI compatibility. 2. Extend the current PdfRecursionGuard in PdfParser to other classes like the tokenizer This has a recursion count per-object so doesn’t protect against overflow using mutual recursion (e.g. when a dictionary contains an array, which contains another dictionary, which contains another array…) 3. Modify PdfRecursionGuard to store a static recursion count in thread-local storage. This means every thread has its own recursion count - errno works the same way in the C standard library – there’s an errno for each thread. This works because there’s a 1:1 mapping between threads and stacks, so the thread local variable counts calls on the current thread’s stack. The compiler needs to support the C11 thread_local keyword or one of the older equivalents, but it can be done something like this: #if defined( thread_local ) // thread_local is a macro which can be used as a feature test // see https://en.cppreference.com/w/c/thread/thread_local #define PODOFO_THREAD_LOCAL thread_local #elif defined(_MSC_VER) && defined( WINVER ) && ( WINVER >= _WIN32_WINNT_VISTA ) // supported on Windows Vista and above (limited support on Windows XP) #define PODOFO_THREAD_LOCAL __declspec( thread ) #elif defined(__GNUC__) && ( __GNUC__ >= 3 ) // supported on GCC https://gcc.gnu.org/onlinedocs/gcc-3.3.1/gcc/Thread-Local.html#Thread-Local #define PODOFO_THREAD_LOCAL __thread #else // fallback to process global recursion count – overcounts depth if PDFs processed in parallel on multiple threads, same result as thread_local if process single threaded #define PODOFO_THREAD_LOCAL #endif PdfRecursionGuard() { ++t_ nRecursionDepth; if ( t_ nRecursionDepth > s_maxRecursionDepth ) { PODOFO_RAISE_ERROR( ePdfError_OutOfMemory ); // ePdfError_NestedTooDeep might be better } } ~PdfRecursionGuard() { --t_ nRecursionDepth; } static void SetMaxRecursionDepth( int32_t maxRecursionDepth ) { s_maxRecursionDepth = maxRecursionDepth; } static PODOFO_THREAD_LOCAL int t_nRecursionDepth = 0; static int s_maxRecursionDepth = 500; // user-configurable with reasonable default Adding recursion counting to a method just involves adding the following local variable to any method you need to guard: PdfRecursionGuard guard; Is option 3) worth investigating? What does everyone think? Best Regards Mark -- Mark Rogers - mark.rog...@powermapper.com<mailto:mark.rog...@powermapper.com> PowerMapper Software Ltd - www.powermapper.com Registered in Scotland No 362274 Quartermile 2 Edinburgh EH3 9GL
_______________________________________________ Podofo-users mailing list Podofo-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/podofo-users