Re: [poppler] Speed improvements - chapter eleven

Leonard Rosenthol Wed, 06 Sep 2006 04:02:38 -0700

At 03:48 AM 9/6/2006, Krzysztof Kowalczyk wrote:

I attempt to fix this by adding a way to get direct access to Stream's
underlying buffer. That way a client (e.g. a Lexer) can request a
buffer and getChar() logic becomes very fast "if buffer not empty, get
char from buffer, otherwise ask for another buffer".
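
The getChar() fast path described above can be sketched as follows. This is
only an illustration under stated assumptions: ChunkSource and BufferedReader
are hypothetical names, not poppler classes, and the getBuf() signature shown
here is invented for the example rather than taken from the patch.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a Stream that exposes its underlying buffer.
// It hands out its data in chunks of at most chunkSize bytes.
class ChunkSource {
public:
    ChunkSource(std::string data, std::size_t chunkSize)
        : data_(std::move(data)), chunkSize_(chunkSize) {}

    // Points *buf at the next chunk and returns its length; 0 means end of data.
    std::size_t getBuf(const char **buf) {
        std::size_t n = std::min(chunkSize_, data_.size() - pos_);
        *buf = data_.data() + pos_;
        pos_ += n;
        ++calls_;
        return n;
    }

    // Number of getBuf() calls made so far (useful for measuring chunking cost).
    std::size_t calls() const { return calls_; }

private:
    std::string data_;
    std::size_t chunkSize_;
    std::size_t pos_ = 0;
    std::size_t calls_ = 0;
};

// Hypothetical client (e.g. a lexer front end) that reads through a buffer.
class BufferedReader {
public:
    explicit BufferedReader(ChunkSource &src) : src_(src) {}

    // Returns the next byte as 0..255, or -1 at end of data.
    int getChar() {
        if (cur_ == end_) {                 // buffer empty: ask for another one
            const char *buf = nullptr;
            std::size_t n = src_.getBuf(&buf);
            if (n == 0) return -1;
            cur_ = buf;
            end_ = buf + n;
        }
        return static_cast<unsigned char>(*cur_++);  // fast path: pointer bump
    }

private:
    ChunkSource &src_;
    const char *cur_ = nullptr;
    const char *end_ = nullptr;
};
```

In this sketch the per-character cost is a pointer comparison and an
increment; the source is only consulted once per chunk, so larger chunks
mean fewer getBuf() calls for the same data.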


Frankly, I was disappointed that it's only ~5%. I was expecting much
more. It turns out that the culprit is current implementation of flate
stream, which is frequently used to compress streams inside PDFs. It
decompresses data in very small chunks (e.g. 8 bytes on average per
getBuf() call in my test) so we don't save nearly as much as if we
were getting, say, 256 bytes at a time. I'm working on improving that
as well, but this change lays the necessary foundation.

Given these two things, why not consider reading an ENTIRE PDF Stream
into memory and decompressing it - thus turning what is now a
FlateStream->FileStream path with getChar() logic into a single
MemStream with getBuf() logic?? Yes, it will mean having the entire
stream in memory - but assuming a "PC" and not an embedded device,
it's pretty safe to assume memory is present. You could make it a
document load option and you could dispose the memory when the stream
is closed.
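
A rough sketch of that idea, assuming nothing about poppler's real classes:
ByteStream, StringStream and InMemoryStream below are hypothetical names, and
StringStream merely stands in for the FlateStream->FileStream chain (no actual
decompression happens here). The point is only the shape: drain the slow
stream once, serve everything from memory afterwards, and free the memory on
close.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface for a stream that only offers byte-at-a-time reads.
class ByteStream {
public:
    virtual ~ByteStream() = default;
    virtual int getChar() = 0;  // next byte as 0..255, or -1 at end
};

// Toy source standing in for a decoding stream chain (assumption: in a real
// reader this would be the decompressing stream on top of the file stream).
class StringStream : public ByteStream {
public:
    explicit StringStream(std::string s) : s_(std::move(s)) {}
    int getChar() override {
        return pos_ < s_.size() ? static_cast<unsigned char>(s_[pos_++]) : -1;
    }

private:
    std::string s_;
    std::size_t pos_ = 0;
};

// Drains the source once at construction, then serves bytes from memory.
class InMemoryStream {
public:
    explicit InMemoryStream(ByteStream &src) {
        for (int c; (c = src.getChar()) != -1;)
            data_.push_back(static_cast<unsigned char>(c));
    }

    // Hands out all remaining bytes in a single call; returns 0 at end.
    std::size_t getBuf(const unsigned char **buf) {
        std::size_t n = data_.size() - pos_;
        *buf = data_.data() + pos_;
        pos_ = data_.size();
        return n;
    }

    // Releases the decoded data, as suggested for when the stream is closed.
    void close() {
        std::vector<unsigned char>().swap(data_);
        pos_ = 0;
    }

    std::size_t size() const { return data_.size(); }

private:
    std::vector<unsigned char> data_;
    std::size_t pos_ = 0;
};
```

The trade-off is the one named above: the whole decoded stream is resident
until close() is called, which is why making it a load-time option seems
reasonable.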



Leonard

---------------------------------------------------------------------------
Leonard Rosenthol                            <mailto:[EMAIL PROTECTED]>
Chief Technical Officer                      <http://www.pdfsages.com>
PDF Sages, Inc.                              215-938-7080 (voice)
                                             215-938-0880 (fax)

_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler