Mahmood, that was discussed many times already: XSSFWorkbook is a memory hog as it holds the whole information in a DOM tree with a lot of strings. (Does Java 8 String Deduplication help on that, when you have enough cores?) For cases like yours reading from a SXSSFWorkbook/stream will be more suitable. You also might have a look at: https://g ithub.com/monitorjbl/excel-streaming-reader eventually.
Best regards
Andreas
On Wed, 2017-05-03 at 15:40 +0000, Mahmood Naderan wrote:
> Hi
>
> I used the Apache POI to read an excel file. The file has 16000 rows and 50
> columns.
>
>
> Each cell is read as a string and they are put into a 2D array
>
>
> (e.g. a row is a string array with size 50 and 16000 rows are put in another
> array).
>
>
> the code looks like
>
>
>
> try (FileInputStream fIP = new FileInputStream(selectedFile)) {
> XSSFWorkbook wb = new XSSFWorkbook(fIP);
> XSSFSheet ws = wb.getSheetAt(0);
> maxRows = ws.getLastRowNum() + 1;
> maxColumns = ws.getRow(0).getLastCellNum();
> theRows = new OneRow[maxRows];
> Iterator< Row > rowIt = ws.iterator();
> XSSFRow row;
> int i = 0;
> long chs = 0;
> while ( rowIt.hasNext() ) {
> row = (XSSFRow) rowIt.next();
> Iterator< Cell > cellIt = row.cellIterator();
> String [] str = new String[maxColumns];
> int j = 0;
> long ch = 0;
> while ( cellIt.hasNext() ) {
> Cell cell = cellIt.next();
> str[ j ] = cell.getStringCellValue();
> ch += str[ j ].length(); // sum one row's length
> System.out.print( str[ j ].length() + " " );
> ++j;
> }
> System.out.print( "\n ->" + ch + "\n" );
> chs += ch; // sum total characters length
> theRows[ i ] = new OneRow( maxColumns );
> theRows[ i ].add( str );
> ++i;
> }
> System.out.print( "\n ==>" + chs + "\n" );
> }
>
>
>
> the program also counts the length of each cell, ch and the size of all
>
>
> rows are updated with chs += ch. Then the total number of characters are chs.
>
>
> The value is 3,230,798 (3M chars) but the used heap is about 1GB. That means
> a
>
>
> character is consuming 330 bytes!! (more or less).
>
>
>
> Using the profilers, I see that needs a lot of heap and I have to increase
>
>
> the heap size to 2048. Also the cpu time is devoted to GC which is
> undesirable.
>
>
>
> I have to say that XSSFSheet ws = wb.getSheetAt(0); is the most time and
>
>
> memory consuming part. In order to investigate that, I manually count the
> data
>
>
> structure sizes with ch and chs.
>
>
>
>
> Please see the profiler at https://i.stack.imgur.com/ZfLHj.jpg
>
> In the memory chart and from left, the first peak is where it finishes
>
>
> loading the excel file.
>
>
>
> Regards,
> Mahmood
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
signature.asc
Description: This is a digitally signed message part
