avik 2005/05/13 07:52:42
Modified: src/documentation/content/xdocs/hssf book.xml
Added: src/documentation/content/xdocs/hssf eval-devguide.xml
eval.xml
Log:
documentation for formula Evaluation. Thanks Amol!
Revision Changes Path
1.7 +2 -0 jakarta-poi/src/documentation/content/xdocs/hssf/book.xml
Index: book.xml
===================================================================
RCS file:
/home/cvs/jakarta-poi/src/documentation/content/xdocs/hssf/book.xml,v
retrieving revision 1.6
retrieving revision 1.7
diff -u -r1.6 -r1.7
--- book.xml 9 Apr 2004 13:05:09 -0000 1.6
+++ book.xml 13 May 2005 14:52:42 -0000 1.7
@@ -15,6 +15,8 @@
<menu-item label="Quick Guide" href="quick-guide.html"/>
<menu-item label="HOWTO" href="how-to.html"/>
<menu-item label="Formula Support" href="formula.html" />
+ <menu-item label="Formula Evaluation" href="eval.html" />
+ <menu-item label="Eval Dev Guide" href="eval-devguide.html" />
<menu-item label="Use Case" href="use-case.html"/>
<menu-item label="Pictorial Docs" href="diagrams.html"/>
<menu-item label="Alternatives" href="alternatives.html"/>
1.1
jakarta-poi/src/documentation/content/xdocs/hssf/eval-devguide.xml
Index: eval-devguide.xml
===================================================================
<?xml version="1.0" encoding="UTF-8"?>
<!-- Copyright (C) 2005 The Apache Software Foundation. All rights reserved.
-->
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN"
"../dtd/document-v11.dtd">
<document>
<header>
<title>Developing Formula Evaluation</title>
<authors>
<person email="[EMAIL PROTECTED]" name="Amol Deshmukh"
id="AD"/>
</authors>
</header>
<body>
<section><title>Introduction</title>
<p>This document is for developers wishing to contribute to the
FormulaEvaluator API functionality.</p>
<p>Currently, contribution is desired for implementing the
standard MS
excel functions. Place holder classes for these have
been created,
contributors only need to insert implementation for the
individual "evaluate()" methods that do the actual
evaluation.</p>
</section>
<section><title>Overview of FormulaEvaluator </title>
<p>Briefly, a formula string (along with the sheet and workbook
that
form the context in which the formula is evaluated) is
first parsed
into RPN tokens using the FormulaParser class in
POI-HSSF main.
(If you dont know what RPN tokens are, now is a good
time to
read <link
href="http://www-stone.ch.cam.ac.uk/documentation/rrf/rpn.html">
this</link>.)
</p>
<section><title> The big picture</title>
<p>RPN tokens are mapped to Eval classes. (Class
hierarchy for the Evals
is best understood if you view the class
diagram in a class diagram
viewer.) Depending on the type of RPN token
(also called as Ptgs
henceforth since that is what the FormulaParser
calls the classes) a
specific type of Eval wrapper is constructed to
wrap the RPN token and
is pushed on the stack.... UNLESS the Ptg is an
OperationPtg. If it is an
OperationPtg, an OperationEval instance is
created for the specific
type of OperationPtg. And depending on how many
operands it takes,
that many Evals are popped of the stack and
passed in an array to
the OperationEval instance's evaluate method
which returns an Eval
of subtype ValueEval.Thus an operation in the
formula is evaluated. </p>
<note> An Eval is of subinterface ValueEval or
OperationEval.
Operands are always ValueEvals, Operations are
always OperationEvals.</note>
<p><code>OperationEval.evaluate(Eval[])</code>
returns an Eval which is supposed
to be of type ValueEval (actually since
ValueEval is an interface,
the return value is instance of one of the
implementations of
ValueEval). The valueEval resulting from
evaluate() is pushed on the
stack and the next RPN token is evaluated....
this continues till
eventually there are no more RPN tokens at
which point, if the formula
string was correctly parsed, there should be
just one Eval on the
stack - which contains the result of evaluating
the formula.</p>
<p>Ofcourse I glossed over the details of how AreaPtg
and ReferencePtg
are handled a little differently, but the code
should be self
explanatory for that. Very briefly, the cells
included in AreaPtg and
RefPtg are examined and their values are
populated in individual
ValueEval objects which are set into the
AreaEval and RefEval (ok,
since AreaEval and RefEval are interfaces, the
implementations of
AreaEval and RefEval - but you'll figure all
that out from the code)</p>
<p>OperationEvals for the standard operators have been
implemented and
basic testing has been done </p>
</section>
<section><title> FunctionEval and FuncVarEval</title>
<p>FunctionEval is an abstract super class of
FuncVarEval. The reason for this is that in the FormulaParser Ptg classes,
there are two Ptgs, FuncPtg and FuncVarPtg. In my tests, I did not see FuncPtg
being used so there is no corresponding FuncEval right now. But in case the
need arises for a FuncVal class, FuncEval and FuncVarEval need to be isolated
with a common interface/abstract class, hence FunctionEval.</p>
<p>FunctionEval also contains the mapping of which
function class maps to which function index. This mapping has been done for all
the functions, so all you really have to do is implement the evaluate method in
the function class that has not already been implemented. The Function indexes
are defined in AbstractFunctionPtg class in POI main.</p>
</section>
</section>
<section><title>Walkthrough of an "evaluate()" implementation.</title>
<p>So here is the fun part - lets walk through the
implementation of the excel
function... <strong>AVERAGE()</strong> </p>
<section><title>The Code</title>
<source>
public Eval evaluate(Eval[] operands) {
double d = 0;
int count = 0;
ValueEval retval = null;
for (int i = 0, iSize = operands.length; i < iSize; i++) {
if (operands[i] == null) continue;
if (operands[i] instanceof AreaEval) {
AreaEval ap = (AreaEval) operands[i];
Object[] values = ap.getValues();
for (int j = 0, jSize = values.length; j < jSize; j++) {
if (values[j] == null) continue;
if (values[j] instanceof NumberEval) {
//inside areas, ignore bools
d += ((NumberEval) values[j]).getNumberValue();
count++;
}
else if (values[j] instanceof RefEval) {
RefEval re = (RefEval) values[j];
ValueEval ve = re.getInnerValueEval();
if (ve != null && ve instanceof NumberEval) {
d += ((NumberEval) ve).getNumberValue();
count++;
}
}
}
}
else if (operands[i] instanceof NumericValueEval) {
// for direct operands evaluate bools
NumericValueEval np = (NumericValueEval) operands[i];
d += np.getNumberValue();
count++;
}
else if (operands[i] instanceof RefEval) {
RefEval re = (RefEval) operands[i];
ValueEval ve = re.getInnerValueEval();
if (ve instanceof NumberEval) {
//if it is a reference, ignore bools
NumberEval ne = (NumberEval) ve;
d += ne.getNumberValue();
count++;
}
}
}
if (retval == null) {
retval = (Double.isNaN(d)) ?
(ValueEval) ErrorEval.ERROR_503 : new NumberEval(d/count);
}
return retval;
}
</source>
</section>
<section><title>Implementation Details</title>
<ul>
<li>The implementation of the AVERAGE function lives in
package
o.a.p.hssf.record.formula.functions named
Average.java.
(Every excel function has a corresponding java
source file
in the above package) </li>
<li>If you open the file for a function thats not yet
implemented, you will see one un-implemented method:
<code>public Eval evaluate(Eval[] operands)
{}</code> </li>
<li>Since the excel Average() function can take 1 or
more operands, we iterate over all operands that are passed in the evaluate
method:
<code>for (int i=0, iSize=operands.length;
i<iSize; i++) {...}</code></li>
<li>inside the loop, you will do the following
<ol>
<li>Do a null check: <code>if
(operands[i] == null) continue;</code></li>
<li>Figure out the actual subtype of
ValueEval that the operands
implements. The possible types
that you will encounter in an
evaluate() are: NumberEval,
BoolEval, StringEval, ErrorEval,
AreaEval, RefEval,
BlankEval.</li>
<li>Implement the function. See the
next section for some
caveats on implementing the
Excel semantics. </li>
</ol>
</li>
<li>Finally before returning the NumberEval wrapping
the double value that
you computed, do one final check to see if the
double is a NaN,
if it is return ErrorEval.ERROR_503 (see the
javadoc in ErrorEval.java
for description of error codes - it is html so
you might as well
generate the javadocs)</li>
</ul>
</section>
<section><title>Modelling Excel Semantics</title>
<p>Strings are ignored. Booleans are ignored!!! (damn
Oo.o! I was almost misled here - nevermind). Actually here's the info on
Bools:
if you have formula: "=TRUE+1", it evaluates to
2.
So also, when you use TRUE like this:
"=SUM(1,TRUE)", you see the result is: 2.
So TRUE means 1 when doing numeric
calculations, right?
Wrong!
Because when you use TRUE in referenced cells
with arithmetic functions, it evaluates to blank - meaning it is not evaluated
- as if it was string or a blank cell.
eg. "=SUM(1,A1)" when A1 is TRUE evaluates to 1.
So you have to do this kind of check for every
possible data type as a function argument for any function before you
understand the behaviour of the function. The operands can be entered in excel
as comma separated or as a region specified like: A2:D4. Regions are treated as
a single token by the parser hence we have AreaEval which stores the ValueEval
at each cell in a region in a 1D array. So in our function if the operand is of
type AreaEval we need to get the array of ValueEvals in the region of the
AreaEval and iterate over each of them as if each of them were individual
operands to the AVERAGE function.
</p>
<p>Thus, since sometimes, Excel treats
Booleans as the numbers 0 and 1
(for F and T respectively).
Hence BoolEval and NumberEval
both implement a common interface:
NumericValueEval (since numbers
and bools are also valid string
values, they also implement
StringValueEval interface which is
also implemented by
StringEval).</p>
<p>
The ValueEval inside an AreaEval can be one of:
NumberEval, BoolEval, StringEval, ErrorEval,
BlankEval.
So you must handle each of these cases.
Similarly, RefEvals have a property: innerValueEval
that returns the ValueEval at the referenced cell. The ValueEval inside a
RefEval can be one of: NumberEval, BoolEval, StringEval, ErrorEval, BlankEval.
So you must handle each of these cases - see how excel treats each one of them.
</p>
</section>
</section>
<section><title>Testing Framework</title>
<fixme author="AD">TODO! FormulaEval comes with a testing framework,
where you add
formula's and their expected values to an Excel sheet, and the
test code
automatically validates them. Since this is still in flux, the
docs
will be put online once the system is stable </fixme>
</section>
</body>
</document>
1.1 jakarta-poi/src/documentation/content/xdocs/hssf/eval.xml
Index: eval.xml
===================================================================
<?xml version="1.0" encoding="UTF-8"?>
<!-- Copyright (C) 2005 The Apache Software Foundation. All rights reserved.
-->
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN"
"../dtd/document-v11.dtd">
<document>
<header>
<title>Formula Evaluation</title>
<authors>
<person email="[EMAIL PROTECTED]" name="Amol Deshmukh"
id="AD"/>
</authors>
</header>
<body>
<section><title>Introduction</title>
<p>The POI formula evaluation code enables you to
calculate the result of
formulas in Excels sheets read-in, or created
in POI. This document explains
how to use the API to evaluate your formulas.
</p>
<warning> This code currently lives in Bugzilla as
<link
href="http://issues.apache.org/bugzilla/show_bug.cgi?id=34828">
bug 34828 </link>. It is expected to land in
POI CVS in the scratchpad
area soon.
</warning>
</section>
<section><title>Status</title>
<p> The code currently provides implementations for
all the arithmatic operators.
It also provides implementations for about 30
built in
functions in Excel. The framework however makes
is easy to add
implementation of new functions. See the <link
href="eval-devguide.html"> Formula
evaluation development guide</link> for
details. </p>
<p> Note that user-defined functions are not supported,
and is not likely to done
any time soon... at least, not till there is a
VB implementation in Java!
</p>
</section>
<section><title>User API How-TO</title>
<p>The following code demonstrates how to use the
HSSFFormulaEvaluator
in the context of other POI excel reading code.
</p>
<p>There are two ways in which you can use the
HSSFFormulaEvalutator API.</p>
<section><title>Using
HSSFFormulaEvaluator.<strong>evaluate</strong>(HSSFCell cell)</title>
<source>
FileInputStream fis = new FileInputStream("c:/temp/test.xls");
HSSFWorkbook wb = new HSSFWorkbook(fis);
HSSFSheet sheet = wb.getSheetAt(0);
HSSFFormulaEvaluator evaluator = new HSSFFormulaEvaluator(sheet, wb);
// suppose your formula is in B3
CellReference cellReference = new CellReference("B3");
HSSFRow row = sheet.getRow(cellReference.getRow());
HSSFCell cell = row.getCell(cellReference.getCol());
String formulaString = c.getCellFormula();
HSSFFormulaEvaluator.CellValue cellValue =
evaluator.evaluate(formulaString);
switch (cellValue.getCellType()) {
case HSSFCell.CELL_TYPE_BOOLEAN:
System.out.println(cellValue.getBooleanCellValue());
break;
case HSSFCell.CELL_TYPE_NUMERIC:
System.out.println(cellValue.getNumberCellValue());
break;
case HSSFCell.CELL_TYPE_STRING:
System.out.println(cellValue.getStringCellValue());
break;
case HSSFCell.CELL_TYPE_BLANK:
break;
case HSSFCell.CELL_TYPE_ERROR:
break;
// CELL_TYPE_FORMULA will never happen
case HSSFCell.CELL_TYPE_FORMULA:
break;
}
</source>
<p>Thus using the retrieved value (of type
HSSFFormulaEvaluator.CellValue - a
nested class) returned
by HSSFFormulaEvaluator is similar to
using a HSSFCell object
containing the value of the formula
evaluation. CellValue is
a simple value object and does not
maintain reference
to the original cell.
</p>
</section>
<section><title>Using
HSSFFormulaEvaluator.<strong>evaluateInCell</strong>(HSSFCell cell)
</title>
<source>
FileInputStream fis = new FileInputStream("c:/temp/test.xls");
HSSFWorkbook wb = new HSSFWorkbook(fis);
HSSFSheet sheet = wb.getSheetAt(0);
HSSFFormulaEvaluator evaluator = new HSSFFormulaEvaluator(sheet, wb);
// suppose your formula is in B3
CellReference cellReference = new CellReference("B3");
HSSFRow row = sheet.getRow(cellReference.getRow());
HSSFCell cell = row.getCell(cellReference.getCol());
String formulaString = c.getCellFormula();
if (cell!=null) {
switch (<strong>evaluator.evaluateInCell</strong>(cell).getCellType()) {
case HSSFCell.CELL_TYPE_BOOLEAN:
System.out.println(cell.getBooleanCellValue());
break;
case HSSFCell.CELL_TYPE_NUMERIC:
System.out.println(cell.getNumberCellValue());
break;
case HSSFCell.CELL_TYPE_STRING:
System.out.println(cell.getStringCellValue());
break;
case HSSFCell.CELL_TYPE_BLANK:
break;
case HSSFCell.CELL_TYPE_ERROR:
System.out.println(cell.getErrorCellValue());
break;
// CELL_TYPE_FORMULA will never occur
case HSSFCell.CELL_TYPE_FORMULA:
break;
}
}
</source>
</section>
</section>
<section><title></title>
</section>
<section><title>Performance Notes</title>
<ul>
<li>Generally you should have to create only
one HSSFFormulaEvaluator
instance per sheet, but there really is
no overhead in creating
multiple HSSFFormulaEvaluators per
sheet other than that of the
HSSFFormulaEvaluator object creation.
</li>
<li>Also note that HSSFFormulaEvaluator
maintains a reference to
the sheet and workbook, so ensure that
the evaluator instance
is available for garbage collection
when you are done with it
(in other words don't maintain long
lived reference to
HSSFFormulaEvaluator if you don't
really need to - unless
all references to the sheet and
workbook are removed, these
don't get garbage collected and
continue to occupy potentially
large amounts of memory).
</li>
<li>CellValue instances however do not maintain
reference to the
HSSFCell or the sheet or workbook, so
these can be long-lived
objects without any adverse effect on
performance.
</li>
</ul>
</section>
</body>
</document>
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
Mailing List: http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta POI Project: http://jakarta.apache.org/poi/