Daniel Noll wrote:
I've just noticed that, even with the "eventfilesystem" framework, POI seems to read the entire file anyway, even if I only ask for one section.

I got a hint of this because we only read the properties out of the files, yet the time taken is O(n) in the size of the file.

I tracked this down to POIFSReader.read(InputStream), which creates a RawDataBlockList over the input stream, which then reads every chunk of the file into a 512-byte RawDataBlock. After all this, it sets up the Block Allocation Table to link the chunks together.
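To make the pattern described above concrete, here is a minimal sketch of "read every 512-byte block first, then link the blocks through an allocation table". It is not POI's actual code: the class `EagerBlockList`, its methods, and the `nextBlock` array standing in for the Block Allocation Table are hypothetical names for illustration, and the end-of-chain marker of -2 is assumed to follow the OLE2 convention.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not a POI class.
class EagerBlockList {
    static final int BLOCK_SIZE = 512;
    static final int END_OF_CHAIN = -2; // assumed end-of-chain marker

    /** Reads the whole stream into 512-byte blocks before anything else happens. */
    static List<byte[]> readAllBlocks(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        List<byte[]> blocks = new ArrayList<>();
        while (true) {
            byte[] block = new byte[BLOCK_SIZE];
            try {
                data.readFully(block);
            } catch (EOFException e) {
                return blocks; // end of input; a trailing partial block is dropped
            }
            blocks.add(block);
        }
    }

    /** Walks the allocation table from a start block, collecting blocks in chain order. */
    static List<byte[]> followChain(List<byte[]> blocks, int[] nextBlock, int start) {
        List<byte[]> chain = new ArrayList<>();
        for (int i = start; i != END_OF_CHAIN; i = nextBlock[i]) {
            chain.add(blocks.get(i));
        }
        return chain;
    }
}
```

Because `readAllBlocks` has to finish before `followChain` can run, the cost is proportional to the file size no matter how small the requested section is.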

Question 1: what advantages do I still get by using the event filesystem if it's going to read the entire file regardless? Question 2: do other people buffer the input stream which is sent to POI? If it's reading in 512-byte chunks (far lower than the block size of the vast majority of storage devices) then I guess it would make sense to buffer it in larger chunks on the way in. Before reading this code I was working under the assumption that POI was reading in reasonable chunk sizes.
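As a rough way to check the buffering idea in Question 2, the sketch below counts how many read calls reach the underlying stream when a consumer pulls 512-byte blocks, with and without a `BufferedInputStream` in between. It uses only the JDK; `BufferedBlockRead`, `CountingStream` and `underlyingReads` are hypothetical names, and the in-memory stream stands in for a real file, so it shows call counts only, not actual disk behaviour.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration, not a POI class.
class BufferedBlockRead {
    static final int BLOCK_SIZE = 512;

    /** Counts how many read calls reach the wrapped stream. */
    static class CountingStream extends FilterInputStream {
        int reads = 0;
        CountingStream(InputStream in) { super(in); }
        @Override public int read() throws IOException {
            reads++;
            return super.read();
        }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            reads++;
            return super.read(b, off, len);
        }
    }

    /** Consumes the stream in 512-byte blocks; returns the number of full blocks. */
    static int readBlocks(InputStream in) throws IOException {
        byte[] block = new byte[BLOCK_SIZE];
        int blocks = 0;
        while (true) {
            int filled = 0;
            while (filled < BLOCK_SIZE) {
                int n = in.read(block, filled, BLOCK_SIZE - filled);
                if (n < 0) break; // end of stream
                filled += n;
            }
            if (filled < BLOCK_SIZE) return blocks; // partial or empty final block
            blocks++;
        }
    }

    /** Read calls seen by the raw stream; bufferSize <= 0 means no buffering. */
    static int underlyingReads(int fileSize, int bufferSize) throws IOException {
        CountingStream raw = new CountingStream(new ByteArrayInputStream(new byte[fileSize]));
        InputStream in = bufferSize > 0 ? new BufferedInputStream(raw, bufferSize) : raw;
        readBlocks(in);
        return raw.reads;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered reads: " + underlyingReads(64 * 1024, 0));
        System.out.println("buffered reads:   " + underlyingReads(64 * 1024, 64 * 1024));
    }
}
```

In practice this would amount to wrapping the stream in a `BufferedInputStream` before handing it to `POIFSReader.read(InputStream)`; whether that helps measurably depends on the source of the stream and is worth benchmarking.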


It uses far less memory... benchmark it yourself if you don't believe it.

Are there any plans to replace this implementation with something more efficient? Right now I'm thinking that it would make sense to create a MappedByteBuffer over the whole file and then create windows over that buffer and then glue them back together in the right order... only reading the data when it's finally asked for.
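The mapped-buffer idea could look roughly like the sketch below: map the file once, carve out 512-byte windows with `duplicate()` and `slice()`, and copy bytes only when a chain of blocks is actually requested. This is an assumption-laden illustration, not a proposed POI design: `MappedBlockWindows` and its methods are hypothetical, the block chain is passed in by the caller rather than read from a Block Allocation Table, and the header block that a real OLE2 file carries before its data blocks is ignored.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration, not a POI class.
class MappedBlockWindows {
    static final int BLOCK_SIZE = 512;

    /** Maps the whole file read-only; pages are loaded by the OS on first access. */
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping remains valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    /** Returns a view over one block without copying any bytes. */
    static ByteBuffer window(ByteBuffer whole, int blockIndex) {
        ByteBuffer view = whole.duplicate(); // independent position and limit
        int start = blockIndex * BLOCK_SIZE;
        view.position(start);
        view.limit(start + BLOCK_SIZE);
        return view.slice();
    }

    /** Glues the requested blocks together in chain order, touching only those blocks. */
    static byte[] readChain(ByteBuffer whole, int[] chain) {
        byte[] out = new byte[chain.length * BLOCK_SIZE];
        for (int i = 0; i < chain.length; i++) {
            window(whole, chain[i]).get(out, i * BLOCK_SIZE, BLOCK_SIZE);
        }
        return out;
    }
}
```

With this layout, asking for a two-block section touches only those two blocks of the mapping, rather than the whole file. Note that `MappedByteBuffer` requires a file on disk, so it does not cover the case where the input is an arbitrary stream.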


This is under way presently. Note that it really wasn't possible to do this in JDK 1.2.2 (original POI target JVM when we started) with our use case (streamin->streamout). Now it is.

-Andy

Daniel


