A similar question: I want to simply extract the text from a .xlsx file. I'm reading from a rather sparse xlsx file. It is a 20 meg file, using a JVM with 990 megs, with POIXMLTextExtractor. I get Out of Memory. What do you recommend? Thanks, Mark Barton
John Borys wrote: > > Thank you. This is a tremendous help. Your time is greatly appreciated. > > John K. Borys > Projects & Financial Controls > Desk: 312.930.3134 > [email protected] > > CME Group > A CME/Chicago Board of Trade Company > 20 S. Wacker > Chicago, Illinois 60606 > http://www.cmegroup.com/ > > -----Original Message----- > From: Yegor Kozlov [mailto:[email protected]] > Sent: Sunday, December 07, 2008 8:24 AM > To: POI Users List > Subject: Re: Large Reports > > I created an example demonstrating how to generate large workbooks and > avoid OutOfMemory: > http://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/xssf/usermodel/examples/BigGridDemo.java > > It works as I suggested: > 1. creates a template workbook > 2. generates a sample XML with random data. It can be a really large XML > with millions of rows and thousands of columns. > 3. substitutes the sheet in the template with the generated xml > > It's not a ready-to-use API, rather a proof of the concept. To use more > advanced features (merged cells, custom height > or width of rows and cells, conditional formats, etc.) you will need to > study the SpreadsheetML documentation and > enhance the demo. > > Yegor > >> Yegor, >> >> Thank you for your excellent response. I figured I would have to resort >> to something like this. >> >> You mentioned I would need to "create a template file using poi-ooxml". >> >> How do I do this? Are there any documentation or tutorials on the >> subject? >> >> John K. Borys >> Projects & Financial Controls >> Desk: 312.930.3134 >> [email protected] >> >> CME Group >> A CME/Chicago Board of Trade Company >> 20 S. Wacker >> Chicago, Illinois 60606 >> http://www.cmegroup.com/ >> >> >> -----Original Message----- >> From: Yegor Kozlov [mailto:[email protected]] >> Sent: Tuesday, December 02, 2008 10:54 AM >> To: POI Users List >> Subject: Re: Large Reports >> >> Unfortunately poi-ooxml has quite a good appetite for memory, under same >> conditions you will get OutOfMemory on less >> number of rows, I would say less in 2x. Increasing JVML heap will help >> till a certain limit, if you allocate 2 GB (the >> limit for 32-bit JVM), you will be able to generate 100K but not 1 >> million of rows. >> >> The memory requirement depends of the row-cell grid density. Sparse rows >> require less memory then rows with every cell set. >> >> If you need to generate such large worksheets, I would recommend direct >> streaming in XML. >> >> The approach would be to create a template file using poi-ooxml, Setup >> sheets, number formats, cell styles, etc. >> Then write a custom application that streams data in a text file. You >> don't need a deep knowledge of SpreadsheetML >> format for that, just follow the pattern in the template. The final step >> would be to inject this file in the template. >> >> It's not very trivial but should be possible. >> >> Regards, >> Yegor >> >>> I have been tasked with generating Enterprise Reports and writing them >>> to Excel Spreadsheets. When using POI, the program crashes after about >>> 30,000 records are processed. Our system requires millions of records >>> to be processed. POI's limit is 65K rows and some change. Excel 2007 >>> is now capable of processing over a million rows of data. Is there a >>> tool available either open source or for purchase that can handle >>> writing large quantities of data (over 1 million rows) to an Excel >>> spreadsheet? This data will in turn be used to populate pivot tables or >>> be merged with an Excel Template. >>> >>> John K. Borys >>> Projects & Financial Controls >>> Desk: 312.930.3134 >>> [email protected]<mailto:[email protected]> >>> >>> CME Group >>> A CME/Chicago Board of Trade Company >>> 20 S. Wacker >>> Chicago, Illinois 60606 >>> http://www.cmegroup.com/ >>> >>> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: [email protected] >> For additional commands, e-mail: [email protected] >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: [email protected] >> For additional commands, e-mail: [email protected] >> >> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > > > -- View this message in context: http://www.nabble.com/Large-Reports-tp20795592p23197679.html Sent from the POI - User mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
