Hi,

A while back I posted about a problem loading a large PDF document into PoDoFo. The document in question is fairly unusual (a 700-page list of pharmacies in North America), but it took 15 minutes to load and allocated 800MB of working set before throwing an out-of-memory error.
The problem is due to:

a) a large number of objects (about 450,000) in the document
b) short byte sequences in the source document turning into 40-100 byte PdfObjects in memory (which turns a 20MB document on disk into 800MB in memory)

There's no easy fix without major refactoring, and the document in question is pretty unusual, so a workaround seems in order. The workaround lets the caller specify the maximum number of objects to load; an exception is thrown if the object limit is exceeded when reading the header. If the caller doesn't specify an object limit, the behaviour is unchanged from previous versions.

PdfParser.h

line 370, added:

    /**
     * \return maximum object count to read (default is LONG_MAX
     * which means no limit)
     */
    inline static long GetMaxObjectCount();

    /**
     * Specify the maximum number of objects the parser should
     * read. An exception is thrown if the document contains more
     * objects than this. Use to avoid problems with very large
     * documents with millions of objects, which use 500MB of
     * working set and spend 15 mins in Load() before throwing
     * an out of memory exception.
     *
     * \param nMaxObjects set max number of objects
     */
    inline static void SetMaxObjectCount( long nMaxObjects );

line 538, added:

    static long s_nMaxObjects;

line 641, added:

// -----------------------------------------------------
//
// -----------------------------------------------------
long PdfParser::GetMaxObjectCount()
{
    return s_nMaxObjects;
}

// -----------------------------------------------------
//
// -----------------------------------------------------
void PdfParser::SetMaxObjectCount( long nMaxObjects )
{
    s_nMaxObjects = nMaxObjects;
}

PdfParser.cpp

line 51, added:

long PdfParser::s_nMaxObjects = LONG_MAX;

line 293, added:

    // allow caller to specify a max object count to avoid very slow load times on large documents
    if (s_nMaxObjects != LONG_MAX && m_nNumObjects > s_nMaxObjects)
        PODOFO_RAISE_ERROR_INFO( ePdfError_ValueOutOfRange,
            "m_nNumObjects is greater than s_nMaxObjects."
        );

Best Regards
Mark

Mark Rogers - mark.rog...@powermapper.com
PowerMapper Software Ltd - www.powermapper.com
Registered in Scotland No 362274
Quartermile 2
Edinburgh EH3 9GL

_______________________________________________
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users
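The object-count guard from the patch (the static limit defaulting to LONG_MAX, plus the check at line 293) can be exercised outside PoDoFo with a small stand-alone model. This is only a sketch: ToyParser, ReadObjectCount and WouldLoad are hypothetical names, not PoDoFo API, and std::out_of_range stands in for the PdfError that PODOFO_RAISE_ERROR_INFO actually raises.

```cpp
#include <climits>
#include <stdexcept>

// Hypothetical stand-in for PdfParser: only the object-count guard is modelled.
class ToyParser {
public:
    // LONG_MAX (the default) means "no limit", as in the patch.
    static long GetMaxObjectCount() { return s_nMaxObjects; }
    static void SetMaxObjectCount(long nMaxObjects) { s_nMaxObjects = nMaxObjects; }

    // Hypothetical hook standing in for the point in Load() where the
    // object count has just been read and no objects are parsed yet.
    void ReadObjectCount(long nNumObjects) {
        m_nNumObjects = nNumObjects;
        // Same condition as the patch; std::out_of_range replaces PdfError here.
        if (s_nMaxObjects != LONG_MAX && m_nNumObjects > s_nMaxObjects)
            throw std::out_of_range("m_nNumObjects is greater than s_nMaxObjects.");
        // ... the real parser would go on to read the objects here ...
    }

private:
    static long s_nMaxObjects;  // shared by every parser in the process
    long m_nNumObjects = 0;
};

long ToyParser::s_nMaxObjects = LONG_MAX;

// Hypothetical helper: true if a document with nNumObjects objects passes the guard.
bool WouldLoad(long nNumObjects) {
    ToyParser parser;
    try {
        parser.ReadObjectCount(nNumObjects);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

With no limit set, WouldLoad(450000) returns true; after ToyParser::SetMaxObjectCount(100000) it returns false, while a 5,000-object document still passes. Because the limit is a static member, it applies to every parser in the process, so a caller that lowers it for one document may want to restore LONG_MAX afterwards.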