Hello Dario,

> So this is my question: is this a good idea? Is POI supposed to do such
> things?

POI has supported formula evaluation since version 3.  Specifically relating
to the functionality you are describing, there have been many improvements
as of late including:
  - New implementations for missing built-in functions
  - Fixes for bugs in existing built-in functions
  - Lazy evaluation of area and reference Evals (speed improvement)
  - Compact representation of formula Ptg arrays and non-expansion of shared
formulas (memory improvement)
  - Dependency-aware formula result caching (speed improvement)

Whether what you are doing is a good idea is a very subjective question.  A
java-only based evaluation solution may be better than one that uses (java
together with) formulas from an XLS file.  Ease of maintenance is a major
consideration. Java is arguably a better language than Excel formulas.  Java
source is easier to browse and refactor.  Having one less language in use
also helps for better focus of skills, fewer moving parts and less plumbing
code.  Another consideration is memory footprint and execution speed. POI
has improved much lately, but it is still relatively easy to implement
formulas directly in java that outperform formulas interpreted via POI.

Having said that, there may be compelling reasons why a POI/XLS based
evaluation solution is chosen:
  - The people who maintain the business logic currently in the Excel
spreadsheet(s) may not comfortable in maintaining that logic anywhere else.
  - The logic in the spreadsheet may be so large/complex that it is too
expensive to port (the spreadsheet becomes a 'black box').

For what it's worth, the company I work for has used POI for exactly this
kind of solution, so we've gained a lot of insight into some of the
considerations (beyond core POI functionality).  I'm always happy to chat if
you want to contact me directly.

> my tests have shown that the recalculation isn't
> fully working: if I change the value of A1, and A2 and A3 have formulas
that
> depend on its value, only A2 is updated but A3 not.

This sounds like a bug to me.  This is new functionality, so a bug wouldn't
be surprising.

Formula dependency tracking was a complex addition to POI (necessary for
performance).  It works by caching evaluation results for every cell
evaluated (directly and indirectly).  For each evaluated cell, POI remembers
which cells were used ('sensitive input cells') to calculate the result.  As
each cell is 'used' during a cell evaluation, POI also remembers which
formula cell(s) are 'consuming' the value.  A client of the formula
evaluator can notify the evaluator that a cell has changed.  When this
happens, the formula evaluator recurses up the tree of 'consuming cells' and
clears the cached value and dependency information.  Another subtle use-case
is when a formula cell changes definition.  In this case, the the formula
evaluator needs to also clear the 'consuming cell' link from all 'sensitive
input cells' of the changing formula cell.


As you can imagine, testing this functionality for correctness is tedious.
There are existing junit tests (TestEvaluationCache) that seem to show very
similar use-cases working but it is always useful to receive test cases that
show new bugs.  Can you make a simple example which demonstrates the
behaviour you described?


> Of course this is a
> trivial example: in reality, some cells have more than 20 levels of
> dependency, sometimes the formulas refer to cells that are "before" them,
> other times refer to cells that are after.

I am not sure what you mean by "before" and "after" here.  If you are just
talking about row/column placement in the spreadsheet, this is not a problem
- POI dependency tracking work by unique cell location, and does not
consider direction.

>                        evaluator.notifySetFormula(celda);
>                        evaluator.clearAllCachedResultValues();

BTW - you never need to call both of these methods together.  There are two
choices:
 (a) track every changing input cell and call notifySetFormula() before
evaluating. If you do this, clearAllCachedResultValues() is not needed.
 (b) don't bother traking cell changes individually, just call
clearAllCachedResultValues() after making changes but before evaluating.
Approach (a) works if you are changing formulas and values of cells one at a
time.  It is also preferred because the performance hit with
clearAllCachedResultValues() may be large.  On the other hand, if you make
big changes like adding or removing sheets, or shifting rows, you need to
use approach (b).
(Perhaps the javadocs need improving.)

Hope this helps,
Josh

Reply via email to