Hi Nick

The DataFormatter works well. However, the XSSFExcelExtractor runs out of 
memory when I try to process a reasonably large XLSX file. Is there a more 
memory-efficient way of processing such files?

Thanks,

- Chris

On 4 Nov 2012, at 12:44, Chris Bamford wrote:

Hi Nick,

Thanks for the steer for XLSX files.
I have tried the ReadMsOfficeFiles program
<http://codezrule.wordpress.com/2012/01/05/extract-text-from-ms-office-2007-files-docx-pptx-xlsx/>
and I think I may have found the cause of my particular issue, i.e. text 
extraction of doubles gives long, unrounded numbers 
("9.2999999999999999E-2" instead of "0.093").

In XSSFExcelExtractor.getText():

...

// Rows and cells
for (Object rawR : sheet) {
    Row row = (Row)rawR;
    for(Iterator<Cell> ri = row.cellIterator(); ri.hasNext();) {
        Cell cell = ri.next();

        // Is it a formula one?
        if(cell.getCellType() == Cell.CELL_TYPE_FORMULA && formulasNotResults) {
            text.append(cell.getCellFormula());
        } else if(cell.getCellType() == Cell.CELL_TYPE_STRING) {
            text.append(cell.getRichStringCellValue().getString());
        } else {
            XSSFCell xc = (XSSFCell)cell;
            // shouldn't this just be text.append(cell.toString()); ?
            text.append(xc.getRawValue());
        }

        // Output the comment, if requested and exists
        Comment comment = cell.getCellComment();
        if(includeCellComments && comment != null) {
            // Replace any newlines with spaces, otherwise it
            //  breaks the output
            String commentText = comment.getString().getString().replace('\n', ' ');
            text.append(" Comment by ").append(comment.getAuthor()).append(": ").append(commentText);
        }

        if(ri.hasNext())
            text.append("\t");
    }
    text.append("\n");
}

...

The xc.getRawValue() line (the one I have commented) emits the raw double 
exactly as it is stored in the file rather than its text equivalent.
As this class is designed to produce text, it seems reasonable to me that 
toString() would be sufficient. What do you think?
I have a spreadsheet which exhibits the problem. Would you like me to send it, 
and if so, how?
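
To illustrate the difference outside POI, here is a minimal JDK-only sketch. 
It assumes the raw value is the 17-significant-digit string stored in the sheet 
XML; the class and method names (RawDoubleSketch, displayText, formatted) are 
made up for this example and are not part of POI. Parsing the raw string and 
printing the resulting double gives the short form, and a DecimalFormat pattern 
gives explicit control over rounding.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper, not POI code: shows why the raw string looks "scary"
// and how parsing it as a double recovers the short text form.
class RawDoubleSketch {

    // Parse the raw stored string and let Java pick the shortest decimal
    // representation that round-trips to the same double.
    static String displayText(String raw) {
        return Double.toString(Double.parseDouble(raw));
    }

    // Same value, but rounded with an explicit pattern. Locale.ROOT keeps the
    // decimal separator a '.' regardless of the machine's default locale.
    static String formatted(String raw, String pattern) {
        DecimalFormat df =
            new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.ROOT));
        return df.format(Double.parseDouble(raw));
    }

    public static void main(String[] args) {
        String raw = "9.2999999999999999E-2";
        System.out.println(displayText(raw));          // prints 0.093
        System.out.println(formatted(raw, "0.00"));    // prints 0.09
    }
}
```

Note that Double.toString() prints whole numbers as "42.0" and switches to 
exponent notation for very large or very small values, so a pattern-based 
formatter (which is roughly the job DataFormatter does inside POI) may be the 
better fit when the output should match what the spreadsheet displays.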

Thanks,

- Chris

On 2 Nov 2012, at 15:04, Nick Burch wrote:

On Fri, 2 Nov 2012, Chris Bamford wrote:
The XLS extraction is going great.  For XLSX can I use the same mechanism?

Similar. The low-level file formats are very different, but there's an 
analogous extractor that uses SAX XML events rather than record events.

Nick
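
As a rough sketch of what "SAX XML events" means for memory use: a SAX handler 
sees one element at a time and keeps only what it chooses to, so the whole 
sheet never has to be held in memory. The example below uses only the JDK and 
runs against an inline fragment of sheet XML; the class name SheetSaxSketch and 
the method rawValues are hypothetical. It is not POI's extractor, and a real 
XLSX additionally needs the sheet part read out of the zip package and 
string cells (t="s") resolved through the shared strings table.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration of event-based reading; not part of POI.
class SheetSaxSketch {

    // Streams through the sheet XML and returns "cellRef=rawValue" entries.
    static List<String> rawValues(String sheetXml) throws Exception {
        final List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private String cellRef;        // "r" attribute of the current <c>
            private StringBuilder value;   // non-null only while inside <v>

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (qName.equals("c")) {
                    cellRef = atts.getValue("r");
                } else if (qName.equals("v")) {
                    value = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (value != null) {
                    value.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("v")) {
                    out.add(cellRef + "=" + value);
                    value = null;
                }
            }
        };
        // The default factory is not namespace-aware, so qName is the plain tag name.
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(sheetXml.getBytes(StandardCharsets.UTF_8)), handler);
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<worksheet><sheetData><row r=\"1\">"
            + "<c r=\"A1\"><v>9.2999999999999999E-2</v></c>"
            + "<c r=\"B1\"><v>42</v></c>"
            + "</row></sheetData></worksheet>";
        System.out.println(rawValues(xml));
    }
}
```

Here the results are collected in a list for convenience; for a very large 
sheet the handler would write each value straight to the output instead.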





