Hi Ryan,

Back in May we had some discussions about performance in the POI-Dev list. 
Bryce Alcock analyzed the situation and had a conversation with the XMLBeans 
about the issue. It has to do with some inefficient code within the XMLBeans. 
http://www.mail-archive.com/[email protected]/msg02207.html

Basically there are places where XMLBeans causes POI to be O(N**2). XMLBeans 
group suggests XMLCursor but I am not sure that is an option.

Here is Yegor's comment within Bryce's POI-Dev thread. 
http://www.mail-archive.com/[email protected]/msg05922.html - there are many 
emails in the thread.

This really is something I am very interested in improving, I just haven't 
found the time. If people have some resources to apply to the problem we will 
find time to help.

Regards,
Dave

On Sep 30, 2010, at 11:55 AM, Ryan Skow wrote:

> 
> Converting to XLS results in the workbook being opened in less than a second 
> using POI 3.7b3 so that is probably what I'll do as a workaround.  Thank you 
> very much for that idea.
> 
> To hopefully help get this problem solved for others and for future 
> reference, I have included answers to the rest of the questions that were 
> asked.
> 
> General information related to the conversion and file size:
> POI Version  File Format  File Size(bytes)  Open Time(seconds)
> 3.6             xlsx        1,501,533               28
> 3.6             xls         4,951,040                1
> 3.7b3           xlsx        1,501,533              600
> 3.7b3           xls         4,951,040                1
> 
> The workbook contains 8 sheets and the (columns,rows) are as follows:
> Sheet 0: (19,123) (mostly raw numeric/text values - inputs)
> Sheet 1: (13,124) (almost all formulas)
> Sheet 2: (33,1459) (almost all formulas)
> Sheet 3: (20,1459) (almost all formulas)
> Sheet 4: (19,131) (almost all formulas)
> Sheet 5: (28, 775) (almost all raw numeric values)
> Sheet 6: (11, 109) (almost all raw numeric values)
> Sheet 7: (33, 89) (mostly tables used in VLOOKUP)
> 
> Attributes of the Workbook:
>  * There are LOTS of VLOOKUP function calls.
>  * There are LOTS regular multiply/divide/add/subtract which use back 
> referenced cells in their calculations.
>  * There are LOTS of nested cell references.
>  * There are quite a few 'named' cells which are referred to throughout the 
> various calculations.
>  * There are NO references to the 'SUM' function or other array related 
> calculations from what I can tell.
> 
> You are correct that the performance issue has nothing to do with INDIRECT - 
> I removed those references and the timings were still the same.
> 
> As a side note, I did figure out what was wrong with the formula values.  It 
> turned out to be a problem in MathX.java and its implementation of the MOD 
> function.  It incorrectly reported mod(13,12) to be 0.9999999999999991 
> instead of simply 1.  This threw off some of the VLOOKUP calls since the MODs 
> were being used as the index of the VLOOKUP table.  I have submitted a patch 
> including test case through Bugzilla:
> https://issues.apache.org/bugzilla/show_bug.cgi?id=50033
> 
> If additional information would be helpful, please let me know and I'll track 
> it down.
> 
> Thanks!
> 
> On 9/30/2010 6:33 AM, Yegor Kozlov wrote:
>> How big is the problem workbook? Can you tell us its size in MB, the
>> number of sheets and the size of data matrix on each sheet. Every little
>> bit of information helps.
>> 
>> Can you obfuscate sensitive data? Without a sample file we can't do much
>> to help you.
>> 
>> I suspect the slowness may be related to the support for array formulas
>> which was introduced right after 3.6-FINAL. Does your workbook contain
>> array formulas? They look something like SUM(C11:C12*D11:D12).
>> Evaluation of array formulas is not yet fully implemented and it
>> explains why you got incorrect formula results.
>> 
>> Indirect evaluation of defined names is unlikely to affect performance.
>> It was an isolated fix in the implementation of INDIRECT.
>> 
>> Try to save the file as .xls and run the same code against HSSF. How
>> much is it faster?
>> 
>> Regards,
>> Yegor
>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to