Daniel Noll wrote:
I've just noticed that, even with the "eventfilesystem" framework, POI
seems to read the entire file anyway, even if I only ask for one
section.
I got a hint of this because we only read the properties out of the
files, yet the time taken is O(n) in the size of the file.
I tracked this down to POIFSReader.read(InputStream), which creates a
RawDataBlockList over the input stream, which then reads every chunk of
the file into a 512-byte RawDataBlock. After all this, it sets up the
Block Allocation Table to link the chunks together.
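To make the cost concrete, here is a rough sketch of what that eager behaviour amounts to. It is not POI's actual code: EagerBlockReader and readAllBlocks are made-up names, and the only thing taken from the description above is the 512-byte block size. Every block is pulled into memory before anything else can happen, so the work is proportional to the file size no matter how little is needed afterwards.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not a class from POI.
class EagerBlockReader {
    static final int BLOCK_SIZE = 512;

    // Reads the whole stream into fixed-size blocks before returning.
    // A short final block is left zero-padded.
    static List<byte[]> readAllBlocks(InputStream in) throws IOException {
        List<byte[]> blocks = new ArrayList<>();
        while (true) {
            byte[] block = new byte[BLOCK_SIZE];
            int filled = 0;
            while (filled < BLOCK_SIZE) {
                int n = in.read(block, filled, BLOCK_SIZE - filled);
                if (n < 0) {
                    break; // end of stream
                }
                filled += n;
            }
            if (filled == 0) {
                break; // nothing left to read
            }
            blocks.add(block);
            if (filled < BLOCK_SIZE) {
                break; // short last block means end of stream
            }
        }
        return blocks;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(readAllBlocks(in).size() + " blocks read");
        }
    }
}
```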
Question 1: what advantages do I still get by using the event filesystem
if it's going to read the entire file regardless?
Question 2: do other people buffer the input stream which is sent to
POI? If it's reading in 512-byte chunks (far lower than the block size
of the vast majority of storage devices) then I guess it would make
sense to buffer it in larger chunks on the way in. Before reading this
code I was working under the assumption that POI was reading in
reasonable chunk sizes.
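For what it's worth, the kind of buffering I have in mind is just the standard java.io.BufferedInputStream wrapped around the raw stream. The sketch below is only an illustration: BufferedFeed, CountingInputStream and drainIn512ByteChunks are hypothetical names, and the 512-byte consumer is a stand-in for whatever reads the stream, not POI itself. It counts how many read calls reach the underlying stream with and without a buffer in between.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration only; none of these names come from POI.
class BufferedFeed {
    // Counts how many read calls reach the wrapped stream.
    static class CountingInputStream extends FilterInputStream {
        int reads = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            reads++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            reads++;
            return super.read(b, off, len);
        }
    }

    // Stand-in consumer that asks for 512 bytes at a time.
    static long drainIn512ByteChunks(InputStream in) throws IOException {
        byte[] chunk = new byte[512];
        long total = 0;
        int n;
        while ((n = in.read(chunk, 0, chunk.length)) != -1) {
            total += n;
        }
        return total;
    }

    // bufferSize <= 0 means "no buffering".
    static int underlyingReads(byte[] data, int bufferSize) throws IOException {
        CountingInputStream raw = new CountingInputStream(new ByteArrayInputStream(data));
        InputStream in = bufferSize > 0 ? new BufferedInputStream(raw, bufferSize) : raw;
        drainIn512ByteChunks(in);
        return raw.reads;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[64 * 1024];
        System.out.println("unbuffered reads: " + underlyingReads(data, 0));
        System.out.println("buffered reads:   " + underlyingReads(data, 64 * 1024));
    }
}
```

With a real file the same wrapping would be new BufferedInputStream(new FileInputStream(file), 64 * 1024); the 64 KB figure is an arbitrary choice on my part, not a measured optimum.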
It uses far less memory... benchmark it yourself if you don't believe it.
Are there any plans to replace this implementation with something more
efficient? Right now I'm thinking that it would make sense to create a
MappedByteBuffer over the whole file and then create windows over that
buffer and then glue them back together in the right order... only
reading the data when it's finally asked for.
This is under way at present. Note that it really wasn't possible to do
this in JDK 1.2.2 (the original POI target JVM when we started) with our
use case (stream in -> stream out). Now it is.
-Andy
Daniel