Oscar, there is also a fork of the Excel Streaming Reader that is worth mentioning: https://github.com/pjfanning/excel-streaming-reader
Personally I prefer that fork, although I have no strong argument for it. In the past it picked up new POI versions faster.
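As a rough sketch of the "loop through your rows and write them to CSV" step with that fork: this assumes the fork keeps the same StreamingReader.builder() API under the package com.github.pjfanning.xlsx, that the input .xlsx and output .csv paths are passed as command-line arguments, and that only the first sheet matters. The class name StreamXlsxToCsv and the helpers csvField and csvLine are made up for illustration.

```java
import com.github.pjfanning.xlsx.StreamingReader; // assumed fork package; the original uses com.monitorjbl.xlsx
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class: streams the first sheet of an .xlsx file into a CSV file.
public class StreamXlsxToCsv {

    // Quote a field only when it contains a comma, double quote, CR or LF (RFC 4180 style).
    static String csvField(String value) {
        if (value.contains(",") || value.contains("\"") || value.contains("\n") || value.contains("\r")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    // Join already-formatted cell values into one CSV line.
    static String csvLine(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(csvField(fields.get(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File input = new File(args[0]);          // path to the .xlsx file
        DataFormatter formatter = new DataFormatter();

        try (Workbook workbook = StreamingReader.builder()
                     .rowCacheSize(100)          // number of rows to keep in memory
                     .bufferSize(4096)           // buffer size used when reading the file
                     .open(input);
             BufferedWriter out = Files.newBufferedWriter(Paths.get(args[1]), StandardCharsets.UTF_8)) {

            Sheet sheet = workbook.getSheetAt(0); // first sheet only
            for (Row row : sheet) {               // rows are streamed, so iterate only once
                List<String> fields = new ArrayList<>();
                int lastCol = row.getLastCellNum(); // -1 when the row has no cells
                for (int c = 0; c < lastCol; c++) {
                    Cell cell = row.getCell(c);     // null for blank/missing cells
                    // Note: without a FormulaEvaluator, DataFormatter may return the
                    // formula text for formula cells rather than the cached result.
                    fields.add(cell == null ? "" : formatter.formatCellValue(cell));
                }
                out.write(csvLine(fields));
                out.newLine();
            }
        }
    }
}
```

It would be run as e.g. `java StreamXlsxToCsv workbook.xlsx workbook.csv` with the fork and POI on the classpath. Cells are read by column index rather than by iterating the row's cells, so that blank cells still produce an empty CSV field and the columns stay aligned.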
Cheers
Andreas

On Mon, 2021-05-03 at 06:05 -0500, Oscar Bastidas wrote:
> Awesome. Thanks, I'll give this a try.
>
> Oscar
>
> Oscar Bastidas
> Research Associate
> University of Minnesota
>
> On Mon, May 3, 2021, 6:04 AM Andreas Reichel wrote:
>
> > Greetings.
> >
> > Please use the Excel Streaming Reader when reading large
> > files: https://github.com/monitorjbl/excel-streaming-reader
> >
> > import com.monitorjbl.xlsx.StreamingReader;
> > import org.apache.poi.ss.usermodel.Workbook;
> >
> > import java.io.File;
> > import java.io.FileInputStream;
> > import java.io.InputStream;
> >
> > InputStream is = new FileInputStream(new File("/path/to/workbook.xlsx"));
> > Workbook workbook = StreamingReader.builder()
> >         .rowCacheSize(100)   // number of rows to keep in memory (defaults to 10)
> >         .bufferSize(4096)    // buffer size to use when reading InputStream to file (defaults to 1024)
> >         .open(is);           // InputStream or File for XLSX file (required)
> >
> > With the code above you can loop through your rows and write them
> > to CSV.
> >
> > Best regards
> > Andreas
> >
> > On Mon, 2021-05-03 at 05:31 -0500, Oscar Bastidas wrote:
> > > Hello,
> > >
> > > I am trying to read a large Excel spreadsheet (60,000 rows), but I
> > > get what appears to be a memory leak error from the JVM when I use
> > > the *XSSFWorkbook* API. I learned recently that there are size
> > > limitations on Excel files being read in this way, and the Apache
> > > POI website specifically recommends reading the file in a streaming
> > > fashion instead of holding the whole file in memory. To do this, POI
> > > recommends using something called *XLSX2CSV*, but the link that
> > > should explain how to use it returns a "page not found" error.
> > >
> > > Would someone please point me in the direction of how to handle
> > > reading my big Excel file?
> > >
> > > The Apache POI URL that contains the link to *XLSX2CSV* is:
> > >
> > > http://poi.apache.org/components/spreadsheet/limitations.html
> > >
> > > Thanks for any help anyone can provide.
> > >
> > > Oscar
> > >
> > > Oscar Bastidas
> > > Research Associate
> > > University of Minnesota
