Thanks Nick! Those two references helped a ton. I've solved the problem, and posting some of my steps here just for Google Fodder for future people.
https://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ExcelExtractor.javahad the first secret I was missing ( formula.hasCachedResultString()): This worked to print the value of a good formula: if(frec.hasCachedResultString()) { // Formula result is a string // This is stored in the next record outputNextStringRecord = true nextRow = frec.getRow() nextColumn = frec.getColumn() } else { thisStr = formatListener.formatNumberDateCell(frec) } Where this didn't (maybe that original example I referenced could get updated?): if(Double.isNaN( frec.getValue() )) { // Formula result is a string // This is stored in the next record outputNextStringRecord = true nextRow = frec.getRow() nextColumn = frec.getColumn() } else { thisStr = formatListener.formatNumberDateCell(frec) } The only missing piece now is displaying the "#DIV/0" type of error messages. For my particular Error Formula, frec.hasCachedResultString() returns false frec.getCachedErrorValue() returns: 7 ( http://poi.apache.org/apidocs/org/apache/poi/hssf/record/FormulaRecord.html#getCachedErrorValue()is a very bad javadoc) - not sure if its related to http://poi.apache.org/apidocs/org/apache/poi/ss/usermodel/FormulaError.html frec.getParsedExpression() is an array of three: expression: org.apache.poi.ss.formula.ptg.RefPtg [G2] expression: org.apache.poi.ss.formula.ptg.IntPtg [0] expression: class org.apache.poi.ss.formula.ptg.DividePtg I tried printing frec.getCachedErrorValue() for each Formula entry to see what a "normal' value might look like, only to find out I triggered an exception on my first "good formula": 09:29:44.705 [main] ERROR fm.util.Resource - Caught exception using resource java.lang.IllegalStateException: Not an error cached value - <string> at org.apache.poi.hssf.record.FormulaRecord$SpecialCachedValue.getErrorValue(FormulaRecord.java:163) ~[poi-3.9.jar:3.9] at org.apache.poi.hssf.record.FormulaRecord.getCachedErrorValue(FormulaRecord.java:252) ~[poi-3.9.jar:3.9] at fm.util.XLSStreamProcessor.processRecord(XLSStreamReader.scala:153) ~[classes/:na] at org.apache.poi.hssf.eventusermodel.MissingRecordAwareHSSFListener.processRecord(MissingRecordAwareHSSFListener.java:189) ~[poi-3.9.jar:3.9] at org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.processRecord(FormatTrackingHSSFListener.java:85) ~[poi-3.9.jar:3.9] at org.apache.poi.hssf.eventusermodel.HSSFRequest.processRecord(HSSFRequest.java:112) ~[poi-3.9.jar:3.9] Based upon the methods available to: http://poi.apache.org/apidocs/org/apache/poi/hssf/record/FormulaRecord.htmlI wasn't quite sure howto check for what the formula type was, but based upon: http://grepcode.com/file_/repo1.maven.org/maven2/org.apache.poi/poi/3.9/org/apache/poi/hssf/record/FormulaRecord.java/?v=source I was able to see how FormulaRecord was dealing with the various fields under the covers. The result is I've so far pieced this together, which seems to do the trick: if(frec.hasCachedResultString()) { // Formula result is a string // This is stored in the next record outputNextStringRecord = true nextRow = frec.getRow() nextColumn = frec.getColumn() } else { thisStr = frec.getCachedResultType match { case Cell.CELL_TYPE_STRING => formatListener.formatNumberDateCell(frec) case Cell.CELL_TYPE_BOOLEAN => frec.getCachedBooleanValue() match { case false => "FALSE" case true => "TRUE" } case Cell.CELL_TYPE_ERROR => ErrorEval.getText(frec.getCachedErrorValue()) case _ => logger.warn("Unknown Formula Result Type"); "<unknown>" } } Now my test file looks exactly like my XSSF: Vector( Vector(text_value1, formatted_value1, TRUE, 8/7/12, 5:01:01 AM, 8/7/12 5:01, 1, 1.10, 11.1whee, #DIV/0!, sharedstrings_value), Vector(, formatted_value2, FALSE, 8/8/12, 5:02:02 AM, 8/8/12 5:02, 2, 2.20, 22.2whee, #DIV/0!, sharedstrings_value), Vector(text_value3, formatted_value3, TRUE, 8/9/12, 5:03:03 AM, 8/9/12 5:03, 3, 3.30, 33.3whee, #DIV/0!, ), Vector(text_value4, formatted_value4, FALSE, 8/10/12, 5:04:04 AM, 8/10/12 5:04, 4, 4.40, 44.4whee, #DIV/0!, sharedstrings_value), Vector(text_value5, formatted_value5, TRUE, , 5:05:05 AM, 8/11/12 5:05, 5, 5.50, 55.5whee, #DIV/0!, sharedstrings_value), Vector(text_value6, formatted_value6, FALSE, 8/12/12, 5:06:06 AM, 8/12/12 5:06, 6, 6.60, 66.6whee, #DIV/0!, sharedstrings_value) ) Thanks again Nick! Eric On Thu, Jul 18, 2013 at 3:08 AM, Nick Burch <[email protected]> wrote: > On Wed, 17 Jul 2013, Eric Peters wrote: > >> Based upon >> http://svn.apache.org/repos/**asf/poi/trunk/src/examples/** >> src/org/apache/poi/hssf/**eventusermodel/examples/**XLS2CSVmra.java<http://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/hssf/eventusermodel/examples/XLS2CSVmra.java> >> > > Two other good event based examples to view are: > https://svn.apache.org/repos/**asf/poi/trunk/src/java/org/** > apache/poi/hssf/extractor/**EventBasedExcelExtractor.java<https://svn.apache.org/repos/asf/poi/trunk/src/java/org/apache/poi/hssf/extractor/EventBasedExcelExtractor.java> > https://svn.apache.org/repos/**asf/tika/trunk/tika-parsers/** > src/main/java/org/apache/tika/**parser/microsoft/**ExcelExtractor.java<https://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ExcelExtractor.java> > > > I was expecting to be able to get the result of a formula (One example I >> have a CONCATENATE() statement, but I end up getting a value like "0.00" or >> in another column I expect an error message like: #DIV/0! but instead get >> "0"Thoughts on how I can get this to match up with the XSSF? >> > > Can you try reading that cell via the usermodel, to check if a cached > value was actually saved to the file? > > Nick > > ------------------------------**------------------------------**--------- > To unsubscribe, e-mail: > [email protected].**org<[email protected]> > For additional commands, e-mail: [email protected] > >
