Charles G Harvey wrote:
Does anyone have any tips or pointers for reading in large Excel files? I 
expect to be working with files over 100 MB... As far as I can tell, even 
after registering a listener, POI reads the entire file before disposing of it, 
which means heap space errors for me. Do methods exist for parsing each row 
of an Excel document as POI reads it in from the stream, getting access to those 
cell objects, and then allowing the objects to be garbage collected as I move 
on to the next row of cells?

Yes. Most of the examples show POI in the mode where it slurps the whole file into memory. POI is a memory hog even on 1 MB files, so you can forget about reading 100 MB Excel workbooks into POI all at once.

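You will probably have to use the POI event-driven API, which reads the file a bit at a time. (If you are familiar with XML processing, the difference is like DOM versus SAX parsing: the usual API builds the whole document in memory, while the event API hands you records as they are read. But that's a digression.)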
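
To make the SAX analogy concrete, here is a minimal sketch of the row-at-a-time pattern using only the JDK's SAX parser. It is not POI's API. The class name RowStreamSketch, the RowHandler helper, and the streamRows method are made up for illustration. The element names row, c and v are the ones used in XLSX worksheet XML, but the sketch assumes every cell carries a literal value in its v element and ignores shared strings, cell types and styles, which real workbooks need.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class RowStreamSketch {

    /** Holds the cells of the current row only, then hands the row off. */
    static class RowHandler extends DefaultHandler {
        private final Consumer<List<String>> rowSink;
        private List<String> currentRow;   // non-null only while inside a <row>
        private StringBuilder text;        // non-null only while inside a <v>

        RowHandler(Consumer<List<String>> rowSink) {
            this.rowSink = rowSink;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("row".equals(qName)) {
                currentRow = new ArrayList<>();
            } else if ("v".equals(qName)) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver one value in several chunks, so append.
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("v".equals(qName)) {
                if (currentRow != null) {
                    currentRow.add(text.toString());
                }
                text = null;
            } else if ("row".equals(qName)) {
                // Hand the finished row to the caller and drop our reference,
                // so the row can be garbage collected once the caller is done.
                rowSink.accept(currentRow);
                currentRow = null;
            }
        }
    }

    /** Parses the stream once, calling rowSink for each completed row. */
    public static void streamRows(InputStream in, Consumer<List<String>> rowSink) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(in, new RowHandler(rowSink));
    }

    public static void main(String[] args) throws Exception {
        // Tiny stand-in for a worksheet; a real one would come from a file stream.
        String xml = "<worksheet><sheetData>"
                + "<row><c><v>1</v></c><c><v>2</v></c></row>"
                + "<row><c><v>3</v></c></row>"
                + "</sheetData></worksheet>";
        streamRows(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                row -> System.out.println(String.join(",", row)));
    }
}
```

The point of the design is that the handler never keeps more than one row alive, so memory use depends on the widest row rather than on the size of the file.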

POI provides example code "Xls2CsvMra", which does a stream-based conversion of old-fashioned "XLS" files to CSV. Search for it; it's not hard to find.

I provide example code "Xlsx2Csv", which does a stream-based conversion of new-fangled "XLSX" documents to CSV. You might look here: http://chris-lott.org/software/ Maybe someday the POI project will accept that code into its examples area.

HTH

chris...
