Dominik Seichter wrote:

> You are right. This is one point that needs optimizations while parsing. You 
> might want to look at:
> 
> void PdfParser::ReadObjectFromStream( int nObjNo, int )
> 
> Current issues are:
> - We read all objects from a stream immediately into memory (I think this is 
> ok, because we have to decompress the stream only once)

That does end up being wasteful if nothing in that stream is ever needed.

It'd be interesting to load the stream only when the first object in the 
stream is required.
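A minimal sketch of that lazy approach, in C++ to match PoDoFo. `LazyObjectStreamCache` and the `DecodeFn` callback are hypothetical names, not PoDoFo API. The callback stands in for "decompress object stream N and split it into its objects". The point is only that nothing is decoded until the first object from a given stream is asked for, and each stream is decoded at most once:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for "decompress object stream N and split it into
// its contained objects (object number -> raw object text)". Not PoDoFo API.
using DecodeFn = std::function<std::map<int, std::string>(int streamObjNo)>;

class LazyObjectStreamCache {
public:
    explicit LazyObjectStreamCache(DecodeFn decode) : m_decode(std::move(decode)) {}

    // Returns object objNo from object stream streamObjNo. The stream is
    // decoded only the first time any object in it is requested.
    std::optional<std::string> Get(int streamObjNo, int objNo) {
        auto it = m_streams.find(streamObjNo);
        if (it == m_streams.end()) {
            ++m_decodeCount;
            it = m_streams.emplace(streamObjNo, m_decode(streamObjNo)).first;
        }
        auto obj = it->second.find(objNo);
        if (obj == it->second.end()) return std::nullopt;
        return obj->second;
    }

    int DecodeCount() const { return m_decodeCount; }

private:
    DecodeFn m_decode;
    std::map<int, std::map<int, std::string>> m_streams;  // decoded streams, by object number
    int m_decodeCount = 0;
};
```

A PDF that never touches an object in a given stream then never pays for decompressing it.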

For that matter, I even wonder about reading objects from the stream only
up to the one that is required, while retaining the open file stream,
filter stack, etc. so we can read more from it when needed. That way, if
an object only a little way in is needed, we'd avoid reading the whole
lot. If a PDF's object streams have been written to reflect its
structure, and the PDF is being processed roughly in order, that could
be a big help.
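Roughly what "keep the filter stack open and read only as far as needed" could look like. This is a sketch under assumptions: `IncrementalObjectStream` and `HeaderEntry` are hypothetical names, and a plain `std::istream` positioned at /First stands in for the real forward-only decode chain (e.g. inflate). Objects already passed over are cached, because a decompressor can't seek backwards:

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <iterator>
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One header entry of an object stream: object number and byte offset
// relative to /First.
struct HeaderEntry {
    int objNo;
    std::size_t offset;
};

// Hypothetical reader (not PoDoFo API). 'body' stands in for the open,
// forward-only filter chain positioned at /First; it stays open between
// calls so later requests continue where the previous one stopped.
class IncrementalObjectStream {
public:
    IncrementalObjectStream(std::istream& body, std::vector<HeaderEntry> header)
        : m_body(body), m_header(std::move(header)) {}

    // Returns the raw text of objNo (including trailing separator whitespace),
    // consuming the body only as far as that object. Assumes the header
    // offsets are in increasing order.
    std::optional<std::string> Get(int objNo) {
        auto cached = m_read.find(objNo);
        if (cached != m_read.end()) return cached->second;
        while (m_next < m_header.size()) {
            const HeaderEntry& e = m_header[m_next];
            if (e.offset > m_pos) {  // skip any gap before this object
                m_body.ignore(static_cast<std::streamsize>(e.offset - m_pos));
                m_pos = e.offset;
            }
            std::string text;
            if (m_next + 1 < m_header.size()) {  // length known from the next offset
                text.resize(m_header[m_next + 1].offset - e.offset);
                m_body.read(&text[0], static_cast<std::streamsize>(text.size()));
                text.resize(static_cast<std::size_t>(m_body.gcount()));
            } else {  // last object: read to the end of the stream
                text.assign(std::istreambuf_iterator<char>(m_body),
                            std::istreambuf_iterator<char>());
            }
            m_pos += text.size();
            ++m_next;
            m_read[e.objNo] = text;
            if (e.objNo == objNo) return text;
        }
        return std::nullopt;
    }

    std::size_t BytesConsumed() const { return m_pos; }

private:
    std::istream& m_body;
    std::vector<HeaderEntry> m_header;
    std::map<int, std::string> m_read;  // objects already read
    std::size_t m_next = 0;             // index of the next header entry to read
    std::size_t m_pos = 0;              // bytes consumed since /First
};
```

Asking for an early object consumes only the bytes up to its end; the rest of the stream is decoded only if something later in it is requested.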

Since object streams can be broken up into collections, where we can 
read just a particular part of the collection to obtain the object we 
need, it should be possible to avoid the big start-up cost (in time 
and memory) of reading the whole lot.
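Within a single object stream, the PDF spec puts a header of /N integer pairs (object number, byte offset relative to /First) at the start of the decoded data. Parsing just that header tells us the byte range of the object we want, so decoding could stop at its end. A sketch; `LocateInObjectStream` and `ObjectLocation` are hypothetical names, and following /Extends chains between streams is not handled here:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>

// Absolute byte range of one object within the decoded stream data.
// end == std::string::npos means "runs to the end of the stream".
struct ObjectLocation {
    std::size_t begin;
    std::size_t end;
};

// header: the first part of the decoded stream ("objnum offset" pairs).
// n and first: the /N and /First values from the stream dictionary.
std::optional<ObjectLocation> LocateInObjectStream(const std::string& header,
                                                   int n, std::size_t first,
                                                   int objNo) {
    std::istringstream in(header);
    for (int i = 0; i < n; ++i) {
        int num = 0;
        std::size_t offset = 0;
        if (!(in >> num >> offset)) return std::nullopt;  // truncated or malformed header
        if (num != objNo) continue;
        ObjectLocation loc{first + offset, std::string::npos};
        int nextNum = 0;
        std::size_t nextOffset = 0;
        // The next entry's offset (if any) marks where this object ends.
        if (i + 1 < n && (in >> nextNum >> nextOffset)) loc.end = first + nextOffset;
        return loc;
    }
    return std::nullopt;  // object is not in this stream
}
```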

> - At the beginning of ReadObjectFromStream we search a NON-SORTED list to see 
> whether we have read this stream already. Using binary search on a sorted list 
> could be an improvement here
> - Maybe we can generally find a way to reduce calls to ReadObjectFromStream by 
> optimizing PdfParser::ReadObjectsInternal().

Those points definitely make sense.
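A rough sketch of both ideas, with hypothetical names throughout (`ReadStreamSet`, `CompressedEntry`, `GroupByStream` are not PoDoFo API). The "already read?" check becomes a binary search over a sorted vector, and the compressed xref entries are grouped by containing stream so each stream only needs to be opened once:

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Set of object-stream numbers already read, kept sorted so lookups are
// O(log n) binary searches instead of a linear scan.
class ReadStreamSet {
public:
    bool Contains(int streamObjNo) const {
        return std::binary_search(m_sorted.begin(), m_sorted.end(), streamObjNo);
    }
    // Inserts in sorted position; returns false if already present.
    bool Insert(int streamObjNo) {
        auto it = std::lower_bound(m_sorted.begin(), m_sorted.end(), streamObjNo);
        if (it != m_sorted.end() && *it == streamObjNo) return false;
        m_sorted.insert(it, streamObjNo);
        return true;
    }

private:
    std::vector<int> m_sorted;
};

// A compressed xref entry: object objNo lives in object stream streamObjNo.
struct CompressedEntry {
    int objNo;
    int streamObjNo;
};

// Groups entries by containing stream, so the caller can read each stream
// once and pull out all of its wanted objects in one pass.
std::map<int, std::vector<int>> GroupByStream(const std::vector<CompressedEntry>& entries) {
    std::map<int, std::vector<int>> groups;
    for (const auto& e : entries) groups[e.streamObjNo].push_back(e.objNo);
    return groups;
}
```

Insertion into the sorted vector is linear, but it happens only once per stream, while lookups happen once per object; a `std::set<int>` would work equally well.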

> - Another point is that the object containing the stream currently stays in 
> memory after we read it. I think it could even be deleted safely. It should not 
> be referenced from anywhere else.

I assumed the actual stream data would have practically no cost, since 
it'd be a PdfObject with a file-backed stream, and we'd be reading from 
that stream using stream-oriented filters. If that's not the case, then 
I definitely need to look into options there.
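To make the assumption concrete: a "file-backed" stream would remember only where its raw bytes live in the input and read them on demand, so keeping the containing object around costs a few members rather than a copy of the data. This is a hypothetical sketch (`FileBackedStream` is not PoDoFo's class), with an `std::istream` standing in for the input file:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical: records the position and length of the raw stream data in
// the input instead of holding the bytes in memory.
class FileBackedStream {
public:
    FileBackedStream(std::istream& file, std::streamoff offset, std::size_t length)
        : m_file(file), m_offset(offset), m_length(length) {}

    // Reads up to 'count' raw bytes starting 'pos' bytes into the stream data.
    // Nothing is buffered between calls.
    std::string Read(std::size_t pos, std::size_t count) const {
        if (pos >= m_length) return {};
        count = std::min(count, m_length - pos);
        std::string buf(count, '\0');
        m_file.clear();  // reset EOF/fail bits left by earlier reads
        m_file.seekg(m_offset + static_cast<std::streamoff>(pos), std::ios::beg);
        m_file.read(&buf[0], static_cast<std::streamsize>(count));
        buf.resize(static_cast<std::size_t>(m_file.gcount()));
        return buf;
    }

    std::size_t Length() const { return m_length; }

private:
    std::istream& m_file;
    std::streamoff m_offset;
    std::size_t m_length;
};
```

If the parser instead copies the stream data into memory, then dropping the container object after its objects have been extracted, as Dominik suggests, would actually free something.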

--
Craig Ringer

_______________________________________________
Podofo-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/podofo-users
