Dominik Seichter wrote:
> You are right. This is one point that needs optimizations while parsing. You
> might want to look at:
>
> void PdfParser::ReadObjectFromStream( int nObjNo, int )
>
> Current issues are:
> - We read all objects from a stream immediately into memory (I think this is
> ok, because we have to decompress the stream only once)

That does end up being wasteful if nothing in that stream is ever needed. It'd be interesting to load the stream only when the first object in it is required.

For that matter, I wonder about reading objects from the stream only up to the one that is required, while retaining the open file stream, filter stack, etc., so we can read more from it when needed. That way, if only an object a little way in is needed, we'd avoid reading the whole lot. If a PDF's object streams have been written to reflect its structure, and the PDF is being processed roughly in order, that could be a big help.

Since object streams can be broken up into collections, where we can read just a particular part of the collection to obtain the object we need, it seems almost certainly possible to avoid the big start-up cost (in time and memory) of reading the whole lot.

> - At the beginning of ReadObjectFromStream we search in a NON-SORTED list if
> we read this stream already. Using binary search on a sorted list could be an
> improvement here
> - Maybe we can generally find a way to reduce calls to ReadObjectFromStream by
> optimizing PdfParser::ReadObjectsInternal().

Those points definitely make sense.

> - Another point is that the object containing the stream currently stays in
> memory after we read. I think it could even be deleted safely. It should not
> be referenced from anywhere else.

I assumed the actual stream data would have practically no cost, since it'd be a PdfObject with a file-backed stream, and we'd be reading from that stream using stream-oriented filters. If that's not the case, then I definitely need to look into options there.

--
Craig Ringer

_______________________________________________
Podofo-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/podofo-users
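A rough C++ sketch of the lazy reading Craig describes above: decode an object stream only as far as the requested object, and keep the decoder open so later requests continue where the last one stopped. It assumes the decoded data follows the PDF object stream layout (/N pairs of object number and offset, with objects starting at byte /First) and that the filter stack can hand out decoded bytes incrementally. `DecodedSource`, `MemorySource` and `LazyObjectStream` are hypothetical names for illustration only, not PoDoFo classes; `MemorySource` merely stands in for the real file stream plus filters.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pull interface standing in for "open file stream + filter
// stack": each call returns the next chunk of decoded bytes, or an empty
// string once the stream is exhausted.
class DecodedSource {
public:
    virtual ~DecodedSource() = default;
    virtual std::string NextChunk() = 0;
};

// Illustrative in-memory source that records how many bytes were pulled,
// so it is visible how much of the stream was actually decoded.
class MemorySource : public DecodedSource {
public:
    MemorySource(std::string data, std::size_t chunk)
        : m_data(std::move(data)), m_chunk(chunk) {
        assert(chunk > 0 && "chunk size must be positive");
    }
    std::string NextChunk() override {
        std::string out = m_data.substr(m_pos, m_chunk);
        m_pos += out.size();
        return out;
    }
    std::size_t BytesPulled() const { return m_pos; }

private:
    std::string m_data;
    std::size_t m_chunk;
    std::size_t m_pos = 0;
};

// Object stream whose decoded data is read only as far as needed. The header
// (n pairs of "objNo offset") precedes byte `first`; each offset is relative
// to `first`. The source stays open between calls.
class LazyObjectStream {
public:
    LazyObjectStream(std::unique_ptr<DecodedSource> src, int n, std::size_t first)
        : m_src(std::move(src)), m_n(n), m_first(first) {}

    // Returns the raw text of object objNo; stops pulling chunks as soon as
    // the buffer covers that object (the last object needs the whole stream).
    std::string GetObject(int objNo) {
        if (!m_haveHeader) ParseHeader();
        for (std::size_t i = 0; i < m_index.size(); ++i) {
            if (m_index[i].first != objNo) continue;
            std::size_t begin = m_first + m_index[i].second;
            if (i + 1 == m_index.size()) {  // last object runs to end of stream
                FillTo(std::string::npos);
                return m_buf.substr(begin);
            }
            std::size_t end = m_first + m_index[i + 1].second;
            FillTo(end);
            return m_buf.substr(begin, end - begin);
        }
        throw std::out_of_range("object is not in this object stream");
    }

private:
    // Pulls decoded chunks until the buffer holds at least len bytes or the
    // source is exhausted.
    void FillTo(std::size_t len) {
        while (m_buf.size() < len && !m_eof) {
            std::string chunk = m_src->NextChunk();
            if (chunk.empty()) m_eof = true;
            else m_buf += chunk;
        }
    }

    // Decodes (at least) the header and records the (objNo, offset) pairs.
    void ParseHeader() {
        FillTo(m_first);
        std::istringstream in(m_buf.substr(0, m_first));
        for (int i = 0; i < m_n; ++i) {
            int num = 0;
            std::size_t off = 0;
            if (!(in >> num >> off)) throw std::runtime_error("malformed object stream header");
            m_index.emplace_back(num, off);
        }
        m_haveHeader = true;
    }

    std::unique_ptr<DecodedSource> m_src;
    int m_n;
    std::size_t m_first;
    std::string m_buf;  // decoded bytes read so far
    std::vector<std::pair<int, std::size_t>> m_index;
    bool m_haveHeader = false;
    bool m_eof = false;
};
```

Asking for an early object pulls only the header plus that object's bytes (rounded up to the source's chunk size); later objects are decoded only when someone asks for them. For the lookup Dominik mentions, keeping already-opened streams in a `std::map` keyed by the stream's object number would give logarithmic lookup, the same cost class as binary search on a sorted list, instead of a linear scan.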
