On Wed, 2 Jul 2014, David North wrote:
I have discovered that reading my entire workbook into memory using the obvious method of loading it as an XSSFWorkbook consumes many gigabytes of memory. I want to reduce this.
Going from compressed XML, through decompressed XML and XML parsing, to many Java objects does sadly mean that XSSF needs many multiples of the file's size in memory, sorry.
If you have a very large workbook with many sheets, and only care about a couple of sheets, then it's possible that a lazy-loading approach for sheets (not yet supported, but not much work) might help. For someone who needs most of the file's contents, your only option for using the XSSF UserModel is to smile sweetly at your sysadmin and ask them to buy you some more memory...
If I want to do streaming *read*, I need to follow the example of XSSFSheetXMLHandler, building something on top of SAX - i.e. there is no API as such for streaming read access to Row and Cell objects.
Correct, SAX reading is your only option for streaming read. POI provides a number of helper classes to make your life easier. There are also a few examples which show how to handle formatting etc. when doing SAX reading: try looking at the XLSX to CSV example in POI, and the XLSX to XHTML code in Apache Tika.
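To give a feel for what SAX-based reading involves, here is a rough, JDK-only sketch of the underlying technique. It is not POI code: SheetRowStreamer and RowCallback are hypothetical names made up for illustration, and the sketch assumes you already have the XML of a single sheet as a stream (in POI, XSSFReader can hand you that, and XSSFSheetXMLHandler does the real work). It only keeps one row in memory at a time. It does not resolve shared strings (cells with t="s" come out as the raw index), apply number formats, or fill in gaps for missing cells.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper (not part of POI): streams rows out of one sheet's XML. */
final class SheetRowStreamer {

    /** Hypothetical per-row callback; called once for each completed row. */
    interface RowCallback {
        void onRow(int rowNumber, List<String> cellValues);
    }

    static void streamRows(InputStream sheetXml, RowCallback callback) {
        DefaultHandler handler = new DefaultHandler() {
            private final List<String> cells = new ArrayList<>();
            private final StringBuilder text = new StringBuilder();
            private int rowNumber;
            private boolean inValue;

            @Override
            public void startElement(String uri, String local, String qName, Attributes attrs) {
                if (qName.equals("row")) {
                    cells.clear();
                    // The "r" attribute is optional, so fall back to counting rows.
                    String r = attrs.getValue("r");
                    rowNumber = (r != null) ? Integer.parseInt(r) : rowNumber + 1;
                } else if (qName.equals("c")) {
                    text.setLength(0);
                } else if (qName.equals("v") || qName.equals("t")) {
                    // <v> holds a plain value, <t> holds inline string text.
                    inValue = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inValue) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (qName.equals("v") || qName.equals("t")) {
                    inValue = false;
                } else if (qName.equals("c")) {
                    cells.add(text.toString());
                } else if (qName.equals("row")) {
                    // Hand over a copy, then let the row be forgotten.
                    callback.onRow(rowNumber, new ArrayList<>(cells));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(sheetXml, handler);
        } catch (Exception e) {
            throw new IllegalStateException("Failed to parse sheet XML", e);
        }
    }

    public static void main(String[] args) {
        // Tiny made-up sheet fragment, standing in for a real sheet stream.
        String xml = "<worksheet><sheetData>"
                + "<row r=\"1\"><c r=\"A1\" t=\"inlineStr\"><is><t>name</t></is></c>"
                + "<c r=\"B1\"><v>42</v></c></row>"
                + "</sheetData></worksheet>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        streamRows(in, (n, cells) -> System.out.println(n + ": " + cells));
    }
}
```

The point of the callback shape is that nothing outlives a row unless your own code keeps it, which is where the memory saving over the UserModel comes from.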
Alternatively, if you can think of a good model for providing a Workbook-style paging XSSF reader on top of SAX, to provide a streaming read equivalent to SXSSF, please open a Bugzilla enhancement and make a start!
Nick
