It goes into PdfParser::ReadDocumentStructure(), just after the else branch that handles a missing /Size key:

else
{
    PdfError::LogMessage( eLogSeverity_Warning, "PDF Standard Violation: No /Size key was specified in the trailer directory. Will attempt to recover." );
    // Treat the xref size as unknown, and expand the xref dynamically as we read it.
    m_nNumObjects = 0;
}

// newcode start
// allow caller to specify a max object count to avoid very slow load times on large documents
if (s_nMaxObjects != LONG_MAX && m_nNumObjects > s_nMaxObjects)
    PODOFO_RAISE_ERROR_INFO( ePdfError_ValueOutOfRange, "m_nNumObjects is greater than m_nMaxObjects." );
// newcode end

if (m_nNumObjects > 0)
    m_offsets.resize(m_nNumObjects);

The intention behind this placement was to do the check before m_offsets is resized 
(which may allocate a large chunk of memory if m_nNumObjects is a big number).
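
To illustrate the ordering outside of PoDoFo, here is a minimal standalone sketch. MiniParser, ReadStructure() and OffsetCount() are hypothetical stand-ins made up for this example (they are not PoDoFo classes or methods), and a plain std::out_of_range replaces PODOFO_RAISE_ERROR_INFO. Only the static limit, its getter/setter and the check-before-resize order are taken from the patch.

```cpp
#include <climits>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for PdfParser, reduced to the object-count limit.
class MiniParser {
public:
    static long GetMaxObjectCount() { return s_nMaxObjects; }
    static void SetMaxObjectCount(long nMaxObjects) { s_nMaxObjects = nMaxObjects; }

    // nNumObjects plays the role of the /Size value read from the trailer.
    void ReadStructure(long nNumObjects) {
        m_nNumObjects = nNumObjects;

        // Reject oversized documents before any memory is reserved.
        if (s_nMaxObjects != LONG_MAX && m_nNumObjects > s_nMaxObjects)
            throw std::out_of_range("object count exceeds the configured maximum");

        // Only reached when the count is within the limit (or no limit is set).
        if (m_nNumObjects > 0)
            m_offsets.resize(static_cast<std::size_t>(m_nNumObjects));
    }

    std::size_t OffsetCount() const { return m_offsets.size(); }

private:
    static long s_nMaxObjects;    // LONG_MAX means "no limit"
    long m_nNumObjects = 0;
    std::vector<long> m_offsets;  // simplified; the real xref entries hold more
};

long MiniParser::s_nMaxObjects = LONG_MAX;
```

A caller would set the limit once, e.g. MiniParser::SetMaxObjectCount(100000), and catch the exception around the load. If the limit is never set, the check is skipped and the behaviour stays as before.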

PS Line numbers in my patches all refer to PoDoFo 0.9.1

Best Regards
Mark

-----Original Message-----
From: Dominik Seichter [mailto:domseich...@googlemail.com] 
Sent: 15 July 2012 08:40
To: Mark Rogers
Cc: podofo-users@lists.sourceforge.net
Subject: Re: [Podofo-users] Patch for performance issue

Hi Mark,

I need some context again:

> .293 added
>     // allow caller to specify a max object count to avoid very slow load times on large documents
>     if (s_nMaxObjects != LONG_MAX && m_nNumObjects > s_nMaxObjects)
>         PODOFO_RAISE_ERROR_INFO( ePdfError_ValueOutOfRange, "m_nNumObjects is greater than m_nMaxObjects." );

Which method should this go to?

Cheers,
 Dominik

On Thu, Jun 21, 2012 at 1:43 PM, Mark Rogers <mark.rog...@powermapper.com> wrote:
> Hi
>
> A while back I posted about a problem loading a large PDF document into 
> PoDoFo. The document in question was fairly unusual (it's a 700 page list of 
> pharmacies in North America) but took 15 minutes to load and allocated 800MB 
> of working set before throwing an out of memory error.
>
> Problem is due to:
>
> a) large number of objects (about 450,000) in document
> b) short byte sequences in the source document turning into 40-100 
> byte PdfObjects in memory (which turns a 20MB document on disk into 
> 800MB in memory)
>
> There's no easy fix without major refactoring, and the document in question 
> is pretty unusual, so a workaround seems in order. The workaround provides a 
> way for the caller to specify max number of objects to load (an exception is 
> thrown if object limit is exceeded when reading header). If the caller 
> doesn't specify an object limit the behaviour is unchanged from previous 
> versions.
>
> PdfParser.h
>
> .370 added
>
>     /**
>      * \return maximum object count to read (default is LONG_MAX
>      * which means no limit)
>      */
>     inline static long GetMaxObjectCount();
>
>     /**
>      * Specify the maximum number of objects the parser should
>      * read. An exception is thrown if document contains more
>      * objects than this. Use to avoid problems with very large
>      * documents with millions of objects, which use 500MB of
>      * working set and spend 15 mins in Load() before throwing
>      * an out of memory exception.
>      *
>      * \param nMaxObjects set max number of objects
>      */
>     inline static void SetMaxObjectCount( long nMaxObjects );
>
> .538 added
>     static long   s_nMaxObjects;
>
> .641 added
> // -----------------------------------------------------
> //
> // -----------------------------------------------------
> long PdfParser::GetMaxObjectCount()
> {
>     return s_nMaxObjects;
> }
>
> // -----------------------------------------------------
> //
> // -----------------------------------------------------
> void PdfParser::SetMaxObjectCount( long nMaxObjects )
> {
>     s_nMaxObjects = nMaxObjects;
> }
>
> PdfParser.cpp
>
> .51 added
> long PdfParser::s_nMaxObjects = LONG_MAX;
>
> .293 added
>     // allow caller to specify a max object count to avoid very slow load times on large documents
>     if (s_nMaxObjects != LONG_MAX && m_nNumObjects > s_nMaxObjects)
>         PODOFO_RAISE_ERROR_INFO( ePdfError_ValueOutOfRange, "m_nNumObjects is greater than m_nMaxObjects." );
>
> Best Regards
> Mark
>
> Mark Rogers - mark.rog...@powermapper.com PowerMapper Software Ltd - 
> www.powermapper.com Registered in Scotland No 362274 Quartermile 2 
> Edinburgh EH3 9GL
>
>
>
> ----------------------------------------------------------------------
> --------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and 
> threat landscape has changed and how IT managers can respond. 
> Discussions will include endpoint security, mobile security and the 
> latest in malware threats. 
> http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Podofo-users mailing list
> Podofo-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/podofo-users
