Hi Ryan, Back in May we had some discussions about performance in the POI-Dev list. Bryce Alcock analyzed the situation and had a conversation with the XMLBeans about the issue. It has to do with some inefficient code within the XMLBeans. http://www.mail-archive.com/[email protected]/msg02207.html
Basically there are places where XMLBeans causes POI to be O(N**2). XMLBeans group suggests XMLCursor but I am not sure that is an option. Here is Yegor's comment within Bryce's POI-Dev thread. http://www.mail-archive.com/[email protected]/msg05922.html - there are many emails in the thread. This really is something I am very interested in improving, I just haven't found the time. If people have some resources to apply to the problem we will find time to help. Regards, Dave On Sep 30, 2010, at 11:55 AM, Ryan Skow wrote: > > Converting to XLS results in the workbook being opened in less than a second > using POI 3.7b3 so that is probably what I'll do as a workaround. Thank you > very much for that idea. > > To hopefully help get this problem solved for others and for future > reference, I have included answers to the rest of the questions that were > asked. > > General information related to the conversion and file size: > POI Version File Format File Size(bytes) Open Time(seconds) > 3.6 xlsx 1,501,533 28 > 3.6 xls 4,951,040 1 > 3.7b3 xlsx 1,501,533 600 > 3.7b3 xls 4,951,040 1 > > The workbook contains 8 sheets and the (columns,rows) are as follows: > Sheet 0: (19,123) (mostly raw numeric/text values - inputs) > Sheet 1: (13,124) (almost all formulas) > Sheet 2: (33,1459) (almost all formulas) > Sheet 3: (20,1459) (almost all formulas) > Sheet 4: (19,131) (almost all formulas) > Sheet 5: (28, 775) (almost all raw numeric values) > Sheet 6: (11, 109) (almost all raw numeric values) > Sheet 7: (33, 89) (mostly tables used in VLOOKUP) > > Attributes of the Workbook: > * There are LOTS of VLOOKUP function calls. > * There are LOTS regular multiply/divide/add/subtract which use back > referenced cells in their calculations. > * There are LOTS of nested cell references. > * There are quite a few 'named' cells which are referred to throughout the > various calculations. > * There are NO references to the 'SUM' function or other array related > calculations from what I can tell. > > You are correct that the performance issue has nothing to do with INDIRECT - > I removed those references and the timings were still the same. > > As a side note, I did figure out what was wrong with the formula values. It > turned out to be a problem in MathX.java and its implementation of the MOD > function. It incorrectly reported mod(13,12) to be 0.9999999999999991 > instead of simply 1. This threw off some of the VLOOKUP calls since the MODs > were being used as the index of the VLOOKUP table. I have submitted a patch > including test case through Bugzilla: > https://issues.apache.org/bugzilla/show_bug.cgi?id=50033 > > If additional information would be helpful, please let me know and I'll track > it down. > > Thanks! > > On 9/30/2010 6:33 AM, Yegor Kozlov wrote: >> How big is the problem workbook? Can you tell us its size in MB, the >> number of sheets and the size of data matrix on each sheet. Every little >> bit of information helps. >> >> Can you obfuscate sensitive data? Without a sample file we can't do much >> to help you. >> >> I suspect the slowness may be related to the support for array formulas >> which was introduced right after 3.6-FINAL. Does your workbook contain >> array formulas? They look something like SUM(C11:C12*D11:D12). >> Evaluation of array formulas is not yet fully implemented and it >> explains why you got incorrect formula results. >> >> Indirect evaluation of defined names is unlikely to affect performance. >> It was an isolated fix in the implementation of INDIRECT. >> >> Try to save the file as .xls and run the same code against HSSF. How >> much is it faster? >> >> Regards, >> Yegor >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
