I've just noticed that, even with the "eventfilesystem" framework, POI
seems to read the entire file anyway, even if I only ask for one
section.
I got a hint of this because we only read the properties out of the
files, and yet the time taken grows linearly (O(n)) with the size of
the file.
I tracked this down to POIFSReader.read(InputStream), which creates a
RawDataBlockList over the input stream. That in turn reads every chunk
of the file into a 512-byte RawDataBlock, and only after all this does
it set up the Block Allocation Table to link the chunks together.
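To make the cost concrete, here is a simplified, hypothetical model of
that eager approach in plain Java (EagerBlockReader, readAllBlocks and
followChain are illustrative names, not POI's actual classes). It copies
every 512-byte block before any chain is followed, so the work is
proportional to the file size even when the caller wants one short
stream. For simplicity it ignores the 512-byte header and assumes the
allocation table is already available as an int array, with -2 marking
the end of a chain as in OLE2.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of an eager block reader (not POI's actual code). */
class EagerBlockReader {
    static final int BLOCK_SIZE = 512;
    static final int END_OF_CHAIN = -2; // OLE2 stores 0xFFFFFFFE, i.e. -2 as a signed int

    /** Copies the entire stream into 512-byte blocks, whether or not they are needed later. */
    static List<byte[]> readAllBlocks(InputStream in) throws IOException {
        List<byte[]> blocks = new ArrayList<>();
        while (true) {
            byte[] block = new byte[BLOCK_SIZE];
            int filled = 0;
            // read() may return fewer bytes than requested, so keep going until the block is full.
            while (filled < BLOCK_SIZE) {
                int n = in.read(block, filled, BLOCK_SIZE - filled);
                if (n == -1) {
                    break;
                }
                filled += n;
            }
            if (filled == 0) {
                return blocks; // clean end of stream
            }
            if (filled < BLOCK_SIZE) {
                throw new IOException("truncated block: only " + filled + " bytes");
            }
            blocks.add(block);
        }
    }

    /** Only once everything is in memory is a stream's chain followed through the table. */
    static byte[] followChain(List<byte[]> blocks, int[] allocationTable, int startBlock) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = startBlock; i != END_OF_CHAIN; i = allocationTable[i]) {
            out.write(blocks.get(i), 0, BLOCK_SIZE);
        }
        return out.toByteArray();
    }
}
```

A lazy design would instead read the header and allocation table first
and then fetch only the blocks on the requested chain.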
Question 1: what advantages do I still get by using the event filesystem
if it's going to read the entire file regardless?
Question 2: do other people buffer the input stream that is passed to
POI? If it's reading in 512-byte chunks (far smaller than the block
size of the vast majority of storage devices), then I guess it would
make sense to buffer it in larger chunks on the way in. Before reading
this code I had assumed that POI was reading in reasonably sized
chunks.
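A minimal sketch of the buffering idea, using only java.io: it counts
how many read calls reach the underlying stream when 512-byte requests
are issued with and without a BufferedInputStream in between.
CountingInputStream and underlyingReads are hypothetical helpers, and a
ByteArrayInputStream stands in for the file.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class BufferingDemo {
    /** Hypothetical wrapper that counts read calls reaching the wrapped stream. */
    static class CountingInputStream extends FilterInputStream {
        int calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Consumes the data in 512-byte requests; bufferSize <= 0 means no buffering. */
    static int underlyingReads(byte[] data, int bufferSize) throws IOException {
        CountingInputStream counting = new CountingInputStream(new ByteArrayInputStream(data));
        InputStream in = bufferSize > 0 ? new BufferedInputStream(counting, bufferSize) : counting;
        byte[] block = new byte[512];
        while (in.read(block, 0, block.length) != -1) {
            // discard the bytes: only the number of underlying calls matters here
        }
        return counting.calls;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1 << 20]; // 1 MiB of dummy data
        System.out.println("unbuffered: " + underlyingReads(data, 0));
        System.out.println("buffered (64 KiB): " + underlyingReads(data, 64 * 1024));
    }
}
```

With a real file the equivalent would be something like
new BufferedInputStream(new FileInputStream(file), 64 * 1024) passed to
POIFSReader.read(InputStream). Whether that helps in practice depends on
how POI issues its reads and on OS-level caching, so it would be worth
measuring.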
Are there any plans to replace this implementation with something more
efficient? Right now I'm thinking it would make sense to create a
MappedByteBuffer over the whole file, create windows over that buffer,
and then glue them back together in the right order... only reading the
data when it's finally asked for.
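A rough sketch of the MappedByteBuffer idea, under stated assumptions:
512-byte sectors preceded by a 512-byte header, and the sector indices
for a stream already known (for example from the Block Allocation
Table). MappedSectorView is a hypothetical class, not part of POI. Each
sector becomes a zero-copy slice of the mapping, and bytes are only
touched when readChain copies them out, so the OS should page in just
the sectors that are actually used.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: lazy sector windows over a memory-mapped OLE2 file. */
class MappedSectorView implements AutoCloseable {
    static final int SECTOR_SIZE = 512; // assumption: 512-byte sectors
    static final int HEADER_SIZE = 512; // assumption: a 512-byte header precedes sector 0

    private final FileChannel channel;
    private final MappedByteBuffer mapped;

    MappedSectorView(Path file) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.READ);
        // Mapping sets up address space; pages are loaded by the OS on first access.
        mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
    }

    /** Returns a zero-copy window onto one sector. */
    ByteBuffer sector(int index) {
        int start = HEADER_SIZE + index * SECTOR_SIZE;
        ByteBuffer window = mapped.duplicate();
        window.position(start);
        window.limit(start + SECTOR_SIZE);
        return window.slice();
    }

    /** Glues windows together in chain order without copying anything yet. */
    List<ByteBuffer> chain(int[] sectorIndices) {
        List<ByteBuffer> windows = new ArrayList<>();
        for (int index : sectorIndices) {
            windows.add(sector(index));
        }
        return windows;
    }

    /** Copies out only the sectors on the chain: the point where data is actually read. */
    byte[] readChain(int[] sectorIndices) {
        byte[] out = new byte[sectorIndices.length * SECTOR_SIZE];
        int offset = 0;
        for (ByteBuffer window : chain(sectorIndices)) {
            window.get(out, offset, SECTOR_SIZE);
            offset += SECTOR_SIZE;
        }
        return out;
    }

    @Override
    public void close() throws IOException {
        // The mapping stays valid after the channel is closed.
        channel.close();
    }
}
```

One caveat with this design: a single FileChannel.map call is limited to
Integer.MAX_VALUE bytes, so very large files would need several
mappings.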
Daniel
--
Daniel Noll
NUIX Pty Ltd
Level 8, 143 York Street, Sydney 2000
Phone: (02) 9283 9010
Fax: (02) 9283 9020