Thanks Nick!
Those two references helped a ton. I've solved the problem, and posting
some of my steps here just for Google Fodder for future people.
https://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ExcelExtractor.javahad
the first secret I was missing (
formula.hasCachedResultString()):
This worked to print the value of a good formula:
if(frec.hasCachedResultString()) {
// Formula result is a string
// This is stored in the next record
outputNextStringRecord = true
nextRow = frec.getRow()
nextColumn = frec.getColumn()
} else {
thisStr = formatListener.formatNumberDateCell(frec)
}
Where this didn't (maybe that original example I referenced could get
updated?):
if(Double.isNaN( frec.getValue() )) {
// Formula result is a string
// This is stored in the next record
outputNextStringRecord = true
nextRow = frec.getRow()
nextColumn = frec.getColumn()
} else {
thisStr = formatListener.formatNumberDateCell(frec)
}
The only missing piece now is displaying the "#DIV/0" type of error
messages.
For my particular Error Formula, frec.hasCachedResultString() returns false
frec.getCachedErrorValue() returns: 7 (
http://poi.apache.org/apidocs/org/apache/poi/hssf/record/FormulaRecord.html#getCachedErrorValue()is
a very bad javadoc) - not sure if its related to
http://poi.apache.org/apidocs/org/apache/poi/ss/usermodel/FormulaError.html
frec.getParsedExpression() is an array of three:
expression: org.apache.poi.ss.formula.ptg.RefPtg [G2]
expression: org.apache.poi.ss.formula.ptg.IntPtg [0]
expression: class org.apache.poi.ss.formula.ptg.DividePtg
I tried printing frec.getCachedErrorValue() for each Formula entry to see
what a "normal' value might look like, only to find out I triggered an
exception on my first "good formula":
09:29:44.705 [main] ERROR fm.util.Resource - Caught exception using resource
java.lang.IllegalStateException: Not an error cached value - <string>
at
org.apache.poi.hssf.record.FormulaRecord$SpecialCachedValue.getErrorValue(FormulaRecord.java:163)
~[poi-3.9.jar:3.9]
at
org.apache.poi.hssf.record.FormulaRecord.getCachedErrorValue(FormulaRecord.java:252)
~[poi-3.9.jar:3.9]
at
fm.util.XLSStreamProcessor.processRecord(XLSStreamReader.scala:153)
~[classes/:na]
at
org.apache.poi.hssf.eventusermodel.MissingRecordAwareHSSFListener.processRecord(MissingRecordAwareHSSFListener.java:189)
~[poi-3.9.jar:3.9]
at
org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.processRecord(FormatTrackingHSSFListener.java:85)
~[poi-3.9.jar:3.9]
at
org.apache.poi.hssf.eventusermodel.HSSFRequest.processRecord(HSSFRequest.java:112)
~[poi-3.9.jar:3.9]
Based upon the methods available to:
http://poi.apache.org/apidocs/org/apache/poi/hssf/record/FormulaRecord.htmlI
wasn't quite sure howto check for what the formula type was, but based
upon:
http://grepcode.com/file_/repo1.maven.org/maven2/org.apache.poi/poi/3.9/org/apache/poi/hssf/record/FormulaRecord.java/?v=source
I was able to see how FormulaRecord was dealing with the various
fields
under the covers. The result is I've so far pieced this together, which
seems to do the trick:
if(frec.hasCachedResultString()) {
// Formula result is a string
// This is stored in the next record
outputNextStringRecord = true
nextRow = frec.getRow()
nextColumn = frec.getColumn()
} else {
thisStr = frec.getCachedResultType match {
case Cell.CELL_TYPE_STRING =>
formatListener.formatNumberDateCell(frec)
case Cell.CELL_TYPE_BOOLEAN => frec.getCachedBooleanValue()
match {
case false => "FALSE"
case true => "TRUE"
}
case Cell.CELL_TYPE_ERROR =>
ErrorEval.getText(frec.getCachedErrorValue())
case _ => logger.warn("Unknown Formula Result Type");
"<unknown>"
}
}
Now my test file looks exactly like my XSSF:
Vector(
Vector(text_value1, formatted_value1, TRUE, 8/7/12, 5:01:01 AM, 8/7/12
5:01, 1, 1.10, 11.1whee, #DIV/0!, sharedstrings_value),
Vector(, formatted_value2, FALSE, 8/8/12, 5:02:02 AM, 8/8/12 5:02, 2,
2.20, 22.2whee, #DIV/0!, sharedstrings_value),
Vector(text_value3, formatted_value3, TRUE, 8/9/12, 5:03:03 AM, 8/9/12
5:03, 3, 3.30, 33.3whee, #DIV/0!, ),
Vector(text_value4, formatted_value4, FALSE, 8/10/12, 5:04:04 AM, 8/10/12
5:04, 4, 4.40, 44.4whee, #DIV/0!, sharedstrings_value),
Vector(text_value5, formatted_value5, TRUE, , 5:05:05 AM, 8/11/12 5:05,
5, 5.50, 55.5whee, #DIV/0!, sharedstrings_value),
Vector(text_value6, formatted_value6, FALSE, 8/12/12, 5:06:06 AM, 8/12/12
5:06, 6, 6.60, 66.6whee, #DIV/0!, sharedstrings_value)
)
Thanks again Nick!
Eric
On Thu, Jul 18, 2013 at 3:08 AM, Nick Burch <[email protected]> wrote:
On Wed, 17 Jul 2013, Eric Peters wrote:
Based upon
http://svn.apache.org/repos/**asf/poi/trunk/src/examples/**
src/org/apache/poi/hssf/**eventusermodel/examples/**XLS2CSVmra.java<http://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/hssf/eventusermodel/examples/XLS2CSVmra.java>
Two other good event based examples to view are:
https://svn.apache.org/repos/**asf/poi/trunk/src/java/org/**
apache/poi/hssf/extractor/**EventBasedExcelExtractor.java<https://svn.apache.org/repos/asf/poi/trunk/src/java/org/apache/poi/hssf/extractor/EventBasedExcelExtractor.java>
https://svn.apache.org/repos/**asf/tika/trunk/tika-parsers/**
src/main/java/org/apache/tika/**parser/microsoft/**ExcelExtractor.java<https://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ExcelExtractor.java>
I was expecting to be able to get the result of a formula (One example I
have a CONCATENATE() statement, but I end up getting a value like "0.00" or
in another column I expect an error message like: #DIV/0! but instead get
"0"Thoughts on how I can get this to match up with the XSSF?
Can you try reading that cell via the usermodel, to check if a cached
value was actually saved to the file?
Nick
------------------------------**------------------------------**---------
To unsubscribe, e-mail:
[email protected].**org<[email protected]>
For additional commands, e-mail: [email protected]