I've just noticed that, even with the "eventfilesystem" framework, POI seems to read the entire file anyway, even if I only ask for one section.

I got a hint of this because we only read the properties out of the files, yet the time taken grows linearly with the size of the file.

I tracked this down to POIFSReader.read(InputStream), which creates a RawDataBlockList over the input stream, which then reads every chunk of the file into a 512-byte RawDataBlock. After all this, it sets up the Block Allocation Table to link the chunks together.
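To make the cost concrete, here is a simplified sketch of the eager pattern I'm describing. It is not POI's actual code; the class and method names are made up for illustration. The point is only that every 512-byte block is pulled out of the stream before anything else can happen, so the work is proportional to the file size no matter how little of the file is wanted.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not POI's RawDataBlockList.
class EagerBlockReader {
    static final int BLOCK_SIZE = 512;

    // Reads the whole stream into 512-byte blocks before returning.
    static List<byte[]> readAllBlocks(InputStream in) throws IOException {
        List<byte[]> blocks = new ArrayList<>();
        while (true) {
            byte[] block = new byte[BLOCK_SIZE];
            int filled = 0;
            // A single read() may return fewer bytes than requested, so loop.
            while (filled < BLOCK_SIZE) {
                int n = in.read(block, filled, BLOCK_SIZE - filled);
                if (n < 0) {
                    break;
                }
                filled += n;
            }
            if (filled == 0) {
                break; // clean end of stream
            }
            blocks.add(block); // a short final block stays zero-padded
            if (filled < BLOCK_SIZE) {
                break;
            }
        }
        return blocks;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a file: four empty blocks.
        byte[] fake = new byte[4 * BLOCK_SIZE];
        List<byte[]> blocks = readAllBlocks(new ByteArrayInputStream(fake));
        System.out.println(blocks.size() + " blocks read");
    }
}
```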

Question 1: what advantages do I still get by using the event filesystem if it's going to read the entire file regardless?

Question 2: do other people buffer the input stream which is sent to POI? If it's reading in 512-byte chunks (far smaller than the block size of the vast majority of storage devices) then I guess it would make sense to buffer it in larger chunks on the way in. Before reading this code I was working under the assumption that POI was reading in reasonable chunk sizes.
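For what it's worth, here is a rough, standard-library-only sketch of the buffering effect I have in mind for Question 2. CountingInputStream and the other names are hypothetical helpers, and the 64 KB buffer size is an arbitrary choice. It drains a stream in 512-byte requests and counts how many read calls reach the underlying stream with and without a BufferedInputStream in between. Since POIFSReader.read takes a plain InputStream, the same wrapping should be a drop-in change, though whether it helps in practice will depend on the underlying stream and on any read-ahead the OS already does.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class BufferedReadDemo {
    // Hypothetical helper: counts how many read calls reach the wrapped stream.
    static class CountingInputStream extends FilterInputStream {
        int calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Drains the stream in 512-byte requests, as a block-oriented consumer would.
    static void drainIn512ByteReads(InputStream in) throws IOException {
        byte[] block = new byte[512];
        while (in.read(block, 0, block.length) >= 0) {
            // discard the data; only the call pattern matters here
        }
    }

    // Returns the number of read calls that hit the underlying stream.
    // A bufferSize of 0 means "no buffering".
    static int underlyingCalls(byte[] data, int bufferSize) throws IOException {
        CountingInputStream source = new CountingInputStream(new ByteArrayInputStream(data));
        InputStream in = bufferSize > 0 ? new BufferedInputStream(source, bufferSize) : source;
        drainIn512ByteReads(in);
        return source.calls;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1024 * 1024]; // stand-in for a 1 MB file
        System.out.println("unbuffered calls: " + underlyingCalls(data, 0));
        System.out.println("buffered calls:   " + underlyingCalls(data, 64 * 1024));
    }
}
```

With a real file the wrapped stream would be a FileInputStream, where each call that gets through is a system call, so that is where I'd expect any saving to show up.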

Are there any plans to replace this implementation with something more efficient? Right now I'm thinking that it would make sense to create a MappedByteBuffer over the whole file, create windows over that buffer, and glue them back together in the right order, only reading the data when it's finally asked for.
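A minimal sketch of the windowing idea, using only java.nio. The class name, the flat block numbering and the int[] chain are all assumptions for illustration; a real implementation would take the chain from the Block Allocation Table and would have to allow for the container's header block, which this sketch ignores.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical sketch: lazily addressed 512-byte windows over a mapped file.
class MappedBlockWindows {
    static final int BLOCK_SIZE = 512;

    private final ByteBuffer mapped;

    MappedBlockWindows(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    // Returns a view of one block. No bytes are copied here; pages are
    // only touched when the view is actually read.
    ByteBuffer window(int blockIndex) {
        int start = blockIndex * BLOCK_SIZE;
        int end = Math.min(start + BLOCK_SIZE, mapped.capacity());
        ByteBuffer view = mapped.duplicate();
        view.limit(end);
        view.position(start);
        return view.slice();
    }

    // Glues the blocks of a chain together, in chain order, into one array.
    byte[] readChain(int[] chain) {
        byte[] out = new byte[chain.length * BLOCK_SIZE];
        int off = 0;
        for (int index : chain) {
            ByteBuffer w = window(index);
            int n = w.remaining();
            w.get(out, off, n);
            off += n;
        }
        return Arrays.copyOf(out, off);
    }

    public static void main(String[] args) throws IOException {
        // Build a three-block test file where every byte holds its block index.
        Path file = Files.createTempFile("blocks", ".bin");
        byte[] data = new byte[3 * BLOCK_SIZE];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i / BLOCK_SIZE);
        }
        Files.write(file, data);

        MappedBlockWindows windows = new MappedBlockWindows(file);
        byte[] glued = windows.readChain(new int[] {2, 0});
        System.out.println("first byte of each glued block: " + glued[0] + ", " + glued[BLOCK_SIZE]);
    }
}
```

Reading one stream then costs only the blocks in its chain, rather than the whole file up front.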

Daniel

--
Daniel Noll

NUIX Pty Ltd
Level 8, 143 York Street, Sydney 2000
Phone: (02) 9283 9010
Fax:   (02) 9283 9020
