At 03:48 AM 9/6/2006, Krzysztof Kowalczyk wrote:
I attempt to fix this by adding a way to get direct access to the Stream's
underlying buffer. That way a client (e.g. a Lexer) can request a
buffer, and the getChar() logic becomes very fast: "if buffer not empty,
get char from buffer, otherwise ask for another buffer".
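
The getChar() logic quoted above could be sketched roughly as follows. This is only an illustration: ChunkSource, BufferedReader and the getBuf() signature shown here are hypothetical names chosen for the example, not poppler's actual Stream API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>   // EOF
#include <string>
#include <utility>

// Hypothetical stand-in for a Stream that exposes its internal buffer.
class ChunkSource {
public:
    ChunkSource(std::string data, std::size_t chunkSize)
        : data_(std::move(data)), chunkSize_(chunkSize) {
        assert(chunkSize_ > 0);
    }

    // Returns a pointer to the next run of bytes and stores its length in *len.
    // Returns nullptr (with *len == 0) once the data is exhausted.
    const unsigned char *getBuf(std::size_t *len) {
        if (pos_ >= data_.size()) {
            *len = 0;
            return nullptr;
        }
        const std::size_t n = std::min(chunkSize_, data_.size() - pos_);
        const unsigned char *p =
            reinterpret_cast<const unsigned char *>(data_.data()) + pos_;
        pos_ += n;
        *len = n;
        return p;
    }

private:
    std::string data_;
    std::size_t chunkSize_;
    std::size_t pos_ = 0;
};

// Hypothetical client (think: a Lexer) reading through the borrowed buffer.
class BufferedReader {
public:
    explicit BufferedReader(ChunkSource &src) : src_(src) {}

    // Fast path: one pointer comparison and one increment per character.
    // Slow path: ask the source for another buffer.
    int getChar() {
        if (cur_ == end_ && !refill()) {
            return EOF;
        }
        return *cur_++;
    }

private:
    bool refill() {
        std::size_t len = 0;
        const unsigned char *p = src_.getBuf(&len);
        if (p == nullptr || len == 0) {
            return false;
        }
        cur_ = p;
        end_ = p + len;
        return true;
    }

    ChunkSource &src_;
    const unsigned char *cur_ = nullptr;
    const unsigned char *end_ = nullptr;
};
```

In this sketch the reader only pays for a call into the source when its current buffer runs dry, so the per-character cost depends on how many bytes each getBuf() call hands back.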
Frankly, I was disappointed that it's only ~5%. I was expecting much
more. It turns out that the culprit is the current implementation of the
flate stream, which is frequently used to compress streams inside PDFs. It
decompresses data in very small chunks (e.g. 8 bytes on average per
getBuf() call in my test), so we don't save nearly as much as if we
were getting, say, 256 bytes at a time. I'm working on improving that
as well, but this change lays the necessary foundation.
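
To make the chunk-size point concrete, here is a small back-of-the-envelope helper. It assumes every getBuf() call returns the same number of bytes (the 8 bytes quoted above was an average), and the 64 KiB stream size used below is a made-up figure for illustration, not a measurement.

```cpp
#include <cassert>
#include <cstddef>

// Number of getBuf()-style refills needed to consume totalBytes when every
// refill yields chunkSize bytes (the last one may be shorter).
// Assumption: uniform chunk size; a real decoder's chunks will vary.
inline std::size_t refillCalls(std::size_t totalBytes, std::size_t chunkSize) {
    assert(chunkSize > 0);
    return (totalBytes + chunkSize - 1) / chunkSize;  // ceiling division
}
```

Under these assumptions, a hypothetical 65536-byte decoded stream needs refillCalls(65536, 8) = 8192 refills at 8 bytes per call, but only refillCalls(65536, 256) = 256 refills at 256 bytes per call, a 32x reduction in slow-path calls.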
Given these two things, why not consider reading an ENTIRE
PDF Stream into memory and decompressing it, thus turning what is
now a FlateStream->FileStream path with getChar() logic into a single
MemStream with getBuf() logic? Yes, it will mean having the entire
stream in memory, but assuming a "PC" and not an embedded device,
it's pretty safe to assume enough memory is available. You could make
it a document load option, and you could free the memory when the
stream is closed.
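
A rough sketch of that suggestion, under stated assumptions: DecodedMemStream, Decoder and decodeRunLength are hypothetical names invented for this example and are not poppler classes. The decoder is passed in as a callable so the sketch stays self-contained; the run-length decoder below is only a stand-in for a real flate decoder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>      // EOF
#include <functional>
#include <vector>

using Bytes = std::vector<unsigned char>;
using Decoder = std::function<Bytes(const Bytes &)>;

// Hypothetical in-memory stream: decodes the whole raw stream once, up front,
// and then serves everything from a single contiguous buffer.
class DecodedMemStream {
public:
    DecodedMemStream(const Bytes &raw, const Decoder &decode)
        : data_(decode(raw)) {}

    // Hands out all not-yet-consumed bytes in one call.
    // Returns nullptr (with *len == 0) when nothing is left.
    const unsigned char *getBuf(std::size_t *len) {
        *len = data_.size() - pos_;
        if (*len == 0) {
            return nullptr;
        }
        const unsigned char *p = data_.data() + pos_;
        pos_ = data_.size();
        return p;
    }

    int getChar() { return pos_ < data_.size() ? data_[pos_++] : EOF; }

    // Releases the decoded data, e.g. when the stream is closed.
    void close() {
        Bytes().swap(data_);
        pos_ = 0;
    }

    std::size_t size() const { return data_.size(); }

private:
    Bytes data_;
    std::size_t pos_ = 0;
};

// Stand-in decoder for the example only: input is (count, byte) pairs.
// This is NOT flate; a real implementation would plug in a flate decoder.
Bytes decodeRunLength(const Bytes &in) {
    assert(in.size() % 2 == 0);
    Bytes out;
    for (std::size_t i = 0; i + 1 < in.size(); i += 2) {
        out.insert(out.end(), static_cast<std::size_t>(in[i]), in[i + 1]);
    }
    return out;
}
```

The trade-off in this sketch is the one described above: the full decoded stream sits in memory until close() is called, in exchange for a single large getBuf() result instead of many small ones. Whether to enable it could be left to a load-time option.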
Leonard
---------------------------------------------------------------------------
Leonard Rosenthol <mailto:[EMAIL PROTECTED]>
Chief Technical Officer <http://www.pdfsages.com>
PDF Sages, Inc. 215-938-7080 (voice)
215-938-0880 (fax)
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler