[Bug 61841] Unnecessary long computation when evaluating VLOOKUP on all column reference

2019-03-26 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=61841

--- Comment #13 from Dominik Stadler  ---
Thanks for the note, but commenting on a resolved issue has a high chance of
being lost, better report a new issue with a mention of the previous issue and
the information about how to rerproduce your problem.

-- 
You are receiving this mail because:
You are the assignee for the bug.
-
To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
For additional commands, e-mail: dev-h...@poi.apache.org



[Bug 61841] Unnecessary long computation when evaluating VLOOKUP on all column reference

2019-03-25 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=61841

Travis Burtrum  changed:

   What|Removed |Added

 CC||admin.apache@moparisthebest
   ||.com

--- Comment #12 from Travis Burtrum  ---
Created attachment 36497
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36497=edit
PoiFormulaChecker class demonstrating problem

Here is a minimal testcase showing the bug which seems to only affect XSSF.

when ran with HSSFWorkbook this is printed:

evaluatedCell: org.apache.poi.ss.usermodel.CellValue [39.57]
impl: HSSFWorkbook, maxRow: 4, elapsed: 209ms, 0s, 0min

when ran with XSSFWorkbook:

evaluatedCell: org.apache.poi.ss.usermodel.CellValue [#N/A]
impl: XSSFWorkbook, maxRow: 4, elapsed: 248ms, 0s, 0min

or with my patch to throw the exception instead of return null:

Exception in thread "main" java.lang.RuntimeException: XSSFBUG: would
shortcircuit wrongly: rowIndex: 2, _lastDefinedRow: 1

at
org.apache.poi.xssf.usermodel.XSSFEvaluationSheet.getCell(XSSFEvaluationSheet.java:71)
at
org.apache.poi.ss.formula.OperationEvaluatorFactory.evaluate(OperationEvaluatorFactory.java:139)
at
org.apache.poi.ss.formula.WorkbookEvaluator.evaluateFormula(WorkbookEvaluator.java:534)
at
org.apache.poi.ss.formula.WorkbookEvaluator.evaluateAny(WorkbookEvaluator.java:275)
at
org.apache.poi.ss.formula.WorkbookEvaluator.evaluate(WorkbookEvaluator.java:216)
at
org.apache.poi.xssf.usermodel.BaseXSSFFormulaEvaluator.evaluateFormulaCellValue(BaseXSSFFormulaEvaluator.java:56)
at
org.apache.poi.ss.formula.BaseFormulaEvaluator.evaluate(BaseFormulaEvaluator.java:110)
at PoiFormulaChecker.main(PoiFormulaChecker.java:47)

-- 
You are receiving this mail because:
You are the assignee for the bug.
-
To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
For additional commands, e-mail: dev-h...@poi.apache.org



[Bug 61841] Unnecessary long computation when evaluating VLOOKUP on all column reference

2019-03-25 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=61841

--- Comment #11 from Travis Burtrum  ---
Hello all,

We are upgrading from poi 3.17 to 4.0.1 and formula evaluation was broken, I
used git bisect on poi and tracked it down to this line specifically:

https://github.com/apache/poi/commit/6390202491c3c77a77df3a1b4f2dc205ebc5b307#diff-e3bae629e723fe5bac4ba5d2e85451cbR69

It goes back to working if I comment that line out, or change _lastDefinedRow
to _xs.getLastRowNum(), I changed it to instead in that if block do this to
attempt to track the problem down:

throw new RuntimeException(String.format("XSSFBUG: would shortcircuit wrongly:
rowIndex: %d, _lastDefinedRow: %d%n", rowIndex, _lastDefinedRow));

And the stack trace was:

Caused by: java.lang.RuntimeException: XSSFBUG: would shortcircuit wrongly:
rowIndex: 40, _lastDefinedRow: 39

at
org.apache.poi.xssf.usermodel.XSSFEvaluationSheet.getCell(XSSFEvaluationSheet.java:72)
at
org.apache.poi.ss.formula.WorkbookEvaluator.evaluateReference(WorkbookEvaluator.java:782)
at
org.apache.poi.ss.formula.SheetRefEvaluator.getEvalForCell(SheetRefEvaluator.java:48)
at
org.apache.poi.ss.formula.SheetRangeEvaluator.getEvalForCell(SheetRangeEvaluator.java:74)
at
org.apache.poi.ss.formula.LazyAreaEval.getRelativeValue(LazyAreaEval.java:51)
at
org.apache.poi.ss.formula.LazyAreaEval.getRelativeValue(LazyAreaEval.java:45)
at
org.apache.poi.ss.formula.eval.AreaEvalBase.getValue(AreaEvalBase.java:128)
at
org.apache.poi.ss.formula.functions.LookupUtils$ColumnVector.getItem(LookupUtils.java:100)
at
org.apache.poi.ss.formula.functions.LookupUtils.lookupIndexOfExactValue(LookupUtils.java:504)
at
org.apache.poi.ss.formula.functions.LookupUtils.lookupIndexOfValue(LookupUtils.java:483)
at
org.apache.poi.ss.formula.functions.Vlookup.evaluate(Vlookup.java:61)
at
org.apache.poi.ss.formula.functions.Var3or4ArgFunction.evaluate(Var3or4ArgFunction.java:36)
at
org.apache.poi.ss.formula.OperationEvaluatorFactory.evaluate(OperationEvaluatorFactory.java:144)
at
org.apache.poi.ss.formula.WorkbookEvaluator.evaluateFormula(WorkbookEvaluator.java:534)
at
org.apache.poi.ss.formula.WorkbookEvaluator.evaluateAny(WorkbookEvaluator.java:275)
at
org.apache.poi.ss.formula.WorkbookEvaluator.evaluate(WorkbookEvaluator.java:216)
at
org.apache.poi.xssf.usermodel.BaseXSSFFormulaEvaluator.evaluateFormulaCellValue(BaseXSSFFormulaEvaluator.java:56)
at
org.apache.poi.ss.formula.BaseFormulaEvaluator.evaluate(BaseFormulaEvaluator.java:110)

Clearly the number of rows in the spreadsheet is increasing without
_lastDefinedRow being updated (as it's only set once in the constructor, and
only updated on a call to clearAllCachedResultValues() which never happens).  I
have tried and failed to reproduce this with a small standalone example though,
I was hoping maybe someone else can spot the problem.

Thanks much,
Travis

-- 
You are receiving this mail because:
You are the assignee for the bug.
-
To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
For additional commands, e-mail: dev-h...@poi.apache.org



[Bug 61841] Unnecessary long computation when evaluating VLOOKUP on all column reference

2017-12-06 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=61841

--- Comment #10 from Javen O'Neal  ---
Throw in some @Override's for good measure

-- 
You are receiving this mail because:
You are the assignee for the bug.
-
To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
For additional commands, e-mail: dev-h...@poi.apache.org



[Bug 61841] Unnecessary long computation when evaluating VLOOKUP on all column reference

2017-12-06 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=61841

Greg Woolsey  changed:

   What|Removed |Added

 Status|NEEDINFO|RESOLVED
 Resolution|--- |FIXED

-- 
You are receiving this mail because:
You are the assignee for the bug.
-
To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
For additional commands, e-mail: dev-h...@poi.apache.org



[Bug 61841] Unnecessary long computation when evaluating VLOOKUP on all column reference

2017-12-06 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=61841

--- Comment #9 from Greg Woolsey  ---
(In reply to PJ Fanning from comment #7)
> Can we call the new method getLastRowNum instead of getlastRowNum?
> Also, a lot of public methods have had their signatures changed. Do we need
> to keep the existing signatures too and deprecate the old versions of the
> methods?

Argh, hate it when I miss things like that.  Yes, fixed it in r1817325.

The "public" methods here are either in classes marked in docs as "POI
internal" or called only from methods similarly documented, so I think changing
their signatures is fine.  I'm not aware of anyone trying to hook into formula
evaluation down at the internal WorkbookEvaluator level.

I'm not opposed to duplicating and deprecating, but in this case I don't think
it's needed.

-- 
You are receiving this mail because:
You are the assignee for the bug.
-
To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
For additional commands, e-mail: dev-h...@poi.apache.org



[Bug 61841] Unnecessary long computation when evaluating VLOOKUP on all column reference

2017-12-06 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=61841

--- Comment #8 from Luca Martini  ---
(In reply to Greg Woolsey from comment #5)
> Changes in r1817252
> 

Thank you very much Greg.


> Of this remaining time, about 2/3 is taken up in the formula evaluation
> caching and tracking mechanism.  Bypassing it for null cells causes test
> failures, which shows it is necessary, but relatively expensive.  It appears
> to try to optimize and minimize the "empty cell" rectangular regions it
> holds. but assumes processing by row then column.  That may be a memory/time
> optimization we want to consider allowing additional strategies for.
> 
> Note that this shortcut logic doesn't change the result of any methods, only
> avoids busywork that didn't apply to the "nonexistent cell" cases.
> 
> This doesn't optimize VLOOKUP directly, but is about 70% improvement
> sufficient?

I think so. That's more or less the fix I had in mind.

> 
> Changing the VLOOKUP code itself is actually significantly more complex,
> because POI handles sheets by row internally, and columns are second-class
> constructs.  There is no easy way to determine the last row with data in a
> column other than iterating over all defined rows.  With these
> optimizations, the extra iterations should fail fast.

I know, and I think that with current state of the data structure, iterating
over every defined row could be worse than your current solution.
Here we are still on POI 3.x, but it should not be difficult to integrate your
changes in our forked version.

For me the bug is considered as resolved. I still do not change its status
because others have still pending comments.

Best regards,
Luca

-- 
You are receiving this mail because:
You are the assignee for the bug.
-
To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
For additional commands, e-mail: dev-h...@poi.apache.org



[Bug 61841] Unnecessary long computation when evaluating VLOOKUP on all column reference

2017-12-06 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=61841

--- Comment #7 from PJ Fanning  ---
Can we call the new method getLastRowNum instead of getlastRowNum?
Also, a lot of public methods have had their signatures changed. Do we need to
keep the existing signatures too and deprecate the old versions of the methods?

-- 
You are receiving this mail because:
You are the assignee for the bug.
-
To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
For additional commands, e-mail: dev-h...@poi.apache.org



[Bug 61841] Unnecessary long computation when evaluating VLOOKUP on all column reference

2017-12-06 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=61841

--- Comment #6 from Javen O'Neal  ---
(In reply to Greg Woolsey from comment #5)
> 14.4s
> when XSSFEvaluationSheet caches the value of getLastRowNum(), since it comes
> from a TreeMap.lastKey() which has to navigate the tree each time to find
> the last key.

Side-note: should XSSFSheet cache the last row rather than iterating over the
TreeMap for every call? Hopefully we could drop in a better Sorted map
implementation, one that keeps a pointer to the first and last keys.

-- 
You are receiving this mail because:
You are the assignee for the bug.
-
To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
For additional commands, e-mail: dev-h...@poi.apache.org



[Bug 61841] Unnecessary long computation when evaluating VLOOKUP on all column reference

2017-12-05 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=61841

--- Comment #5 from Greg Woolsey  ---
Changes in r1817252

Interesting.  In a local test with the attached sample file, I found these
results:

45s (second run) 
with current codebase issuing FormulaEvaluator.evaluateAll() on the workbook.

19s 
By just changing XSSFEvaluationSheet.getCell(row, col) to immediately return
null if the row index > sheet.getLastRowNum() 

14.4s
when XSSFEvaluationSheet caches the value of getLastRowNum(), since it comes
from a TreeMap.lastKey() which has to navigate the tree each time to find the
last key.

12.1s
after optimizing the blank cell tracking a bit to know about the last row with
data.

That's all without changing anything int he VLOOKUP evaluation and still
iterating over the max # of rows per column.

Of this remaining time, about 2/3 is taken up in the formula evaluation caching
and tracking mechanism.  Bypassing it for null cells causes test failures,
which shows it is necessary, but relatively expensive.  It appears to try to
optimize and minimize the "empty cell" rectangular regions it holds. but
assumes processing by row then column.  That may be a memory/time optimization
we want to consider allowing additional strategies for.

Note that this shortcut logic doesn't change the result of any methods, only
avoids busywork that didn't apply to the "nonexistent cell" cases.

This doesn't optimize VLOOKUP directly, but is about 70% improvement
sufficient?

Changing the VLOOKUP code itself is actually significantly more complex,
because POI handles sheets by row internally, and columns are second-class
constructs.  There is no easy way to determine the last row with data in a
column other than iterating over all defined rows.  With these optimizations,
the extra iterations should fail fast.

-- 
You are receiving this mail because:
You are the assignee for the bug.
-
To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
For additional commands, e-mail: dev-h...@poi.apache.org



[Bug 61841] Unnecessary long computation when evaluating VLOOKUP on all column reference

2017-12-05 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=61841

--- Comment #4 from Greg Woolsey  ---
It looks to me like the sheet is available:

Vlookup.evalutate() receives a ValueEval that is converted to a TwoDEval via
LookupUtils.resolveTableArrayArg().

TwoDEval interface has only one base class implementing it, AreaEvalBase.

AreaEvalBase abstract class has 2 non-test implementations, cached and lazy. 
In both cases actual cell ValueEval references are available, which means
sheet/row/column eventually.  Seems to me we could augment the TwoDEval
interface to indicate the last physical values for rows and columns along with
the existing getWidth() and getHeight() methods.  These could be used when
creating ValueVector objects for the lookup iterations to reduce the vector
size to the maximum defined values, since no matches would exist outside those
anyway.

-- 
You are receiving this mail because:
You are the assignee for the bug.
-
To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
For additional commands, e-mail: dev-h...@poi.apache.org



[Bug 61841] Unnecessary long computation when evaluating VLOOKUP on all column reference

2017-12-01 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=61841

--- Comment #3 from Javen O'Neal  ---
Looking at the POI implementation of Vlookup.java, the class currently doesn't
have a reference to the sheet that contains the lookup table. This will make a
fix non-trivial.

The function that creates the AreaEval that is passed to the evaluate function
might have a reference to the lookup sheet.
Caveats: lookup table may be contained in a different workbook or sheet than
the cell that contains the formula that is being evaluated.

-- 
You are receiving this mail because:
You are the assignee for the bug.
-
To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
For additional commands, e-mail: dev-h...@poi.apache.org



[Bug 61841] Unnecessary long computation when evaluating VLOOKUP on all column reference

2017-12-01 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=61841

--- Comment #2 from Luca Martini  ---
Nick,
I have already contributed some patches in the past. 
However I am not sure to have the time to work to this fix at the moment.
I opened the bug for future reference and to check if someone with more
knowledge about evaluators could help.
In fact, at a first glance, the current workbook is not easy recoverable from
the WorkbookEvaluator without changing its interface and I am not sure that is
safe and sound to do so.

-- 
You are receiving this mail because:
You are the assignee for the bug.
-
To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
For additional commands, e-mail: dev-h...@poi.apache.org



[Bug 61841] Unnecessary long computation when evaluating VLOOKUP on all column reference

2017-12-01 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=61841

Nick Burch  changed:

   What|Removed |Added

 Status|NEW |NEEDINFO

--- Comment #1 from Nick Burch  ---
Would you be able to put together a patch with the improved logic in?

See http://poi.apache.org/guidelines.html and
http://poi.apache.org/howtobuild.html and http://poi.apache.org/subversion.html
for more on building and contributing!

-- 
You are receiving this mail because:
You are the assignee for the bug.
-
To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
For additional commands, e-mail: dev-h...@poi.apache.org



[Bug 61841] Unnecessary long computation when evaluating VLOOKUP on all column reference

2017-12-01 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=61841

Luca Martini  changed:

   What|Removed |Added

 CC||chiaramarches...@tagetik.co
   ||m, guar...@hotmail.com,
   ||samsontes...@tagetik.com
 OS||All

-- 
You are receiving this mail because:
You are the assignee for the bug.
-
To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
For additional commands, e-mail: dev-h...@poi.apache.org