On 10.02.2014 17:42, Rob Weir wrote:
> On Fri, Jan 31, 2014 at 6:25 AM, Nicholas Evans wrote:
>> Dear ODF users,
>>
>> For a project I am working on, I am using the ODF toolkit to create
>> spreadsheets that can become rather large (>10 000 rows). I have noticed
>> that as the spreadsheet gets larger, writing the rows becomes very slow. I
>> have put together a class containing 4 different ways of writing 10 000 rows
>> of 10 columns to a spreadsheet. The fastest method (using getRowByIndex and
>> then getCellByIndex) takes 70 seconds. The methods that use getRowList and
>> getNextRow are much slower, taking about 170 seconds each. The method using
>> the Iterator<Row> seems to freeze for large inputs, and doesn't behave as
>> expected for small inputs.
>>
>> I would really like to improve this performance. I think this could be done
>> by manipulating the DOM directly. However, it would be great if there was a
>> way of using the Simple API that I have overlooked that could help me.
>>
>> Does anyone have experience with improving the performance of the ODF
>> toolkit in the context of writing rows to an ods spreadsheet?
>>
> We've had discussions on this topic before. It comes down to use
> cases. The DOM model, with everything in memory at once, facilitates
> random access to the content of the document and a style of
> programming that is similar to what one might do in a spreadsheet macro.
> It is a very natural way to think about a document, but it does
> require a lot of RAM.
>
> There are specialized use cases where it should be possible to write
> code that will perform much faster, e.g.:
>
> 1) Use cases that can be met with a read-only, single-pass streaming
> process. In such cases you don't need a DOM at all. It could be done
> via SAX.
>
> 2) A write-only scenario where you specify the contents of a document,
> but don't need to query things like "the contents of cell B27". It
> is also possible to have a read/write scenario, but at increased
> complexity. Finding B27 is easy in a 2D array, but harder in a sparse
> matrix representation.
>
> Note: if we want to, we can always start up a branch to experiment
> with a different approach. If it pans out we integrate it with the
> trunk. If it doesn't, then we learn from the experience. I wouldn't
> mind starting a new package to do the read-only streaming approach.
>

From my understanding, this was not about exotic edge cases, but about
ordinary usage of the given Simple API, which led to the performance loss.
I would still love to see which changes made the difference. Although I am
also working on the underlying layer without the Simple API, I could learn
from it.
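
To make the write-only scenario from point 2) a bit more concrete, here is a minimal sketch of what streaming rows without a DOM might look like. It uses only the JDK's StAX writer, not the ODF Toolkit. The class StreamingTableWriter and its methods are hypothetical names made up for illustration, and the output is just a table:table fragment, not a complete .ods package (no zip, manifest, styles or office:document-content wrapper). Memory use stays roughly constant because each row is written out and then forgotten.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical helper (not part of the ODF Toolkit): streams table rows without a DOM. */
class StreamingTableWriter {
    // Namespace URIs as defined by the OpenDocument specification.
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";
    static final String OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";

    /** Writes a table:table fragment with rows x cols string cells to the given writer. */
    static void writeTable(Writer out, String name, int rows, int cols)
            throws XMLStreamException {
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartElement("table", "table", TABLE_NS);
        xml.writeNamespace("table", TABLE_NS);
        xml.writeNamespace("text", TEXT_NS);
        xml.writeNamespace("office", OFFICE_NS);
        xml.writeAttribute("table", TABLE_NS, "name", name);
        for (int r = 0; r < rows; r++) {
            // Each row is emitted immediately; nothing is kept in memory afterwards.
            xml.writeStartElement("table", "table-row", TABLE_NS);
            for (int c = 0; c < cols; c++) {
                xml.writeStartElement("table", "table-cell", TABLE_NS);
                xml.writeAttribute("office", OFFICE_NS, "value-type", "string");
                xml.writeStartElement("text", "p", TEXT_NS);
                xml.writeCharacters("r" + r + "c" + c); // placeholder cell content
                xml.writeEndElement(); // text:p
                xml.writeEndElement(); // table:table-cell
            }
            xml.writeEndElement(); // table:table-row
        }
        xml.writeEndElement(); // table:table
        xml.flush();
        xml.close();
    }

    /** Convenience wrapper for small demos: renders the fragment into a String. */
    static String render(String name, int rows, int cols) {
        StringWriter buffer = new StringWriter();
        try {
            writeTable(buffer, name, rows, cols);
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write table fragment", e);
        }
        return buffer.toString();
    }
}
```

For 10 000 rows one would pass a buffered file or zip-entry writer to writeTable rather than use render. The trade-off is the one Rob describes: there is no way to go back and ask for "the contents of cell B27" once the row has been written.
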

Thanks,
Svante
