On Fri, Jan 31, 2014 at 6:25 AM, Nicholas Evans <[email protected]> wrote:
> Dear ODF users,
>
> For a project I am working on, I am using the ODF toolkit to create
> spreadsheets that can become rather large (>10 000 rows). I have noticed
> that as the spreadsheet gets larger, writing the rows becomes very slow. I
> have put together a class containing 4 different ways of writing 10 000 rows
> of 10 columns to a spreadsheet. The fastest method (using getRowByIndex and
> then getCellByIndex) takes 70 seconds. The methods that use getRowList and
> getNextRow are much slower, taking about 170 seconds each. The method using
> the Iterator<Row> seems to freeze for large inputs, and doesn't behave as
> expected for small inputs.
>
> I would really like to improve this performance. I think this could be done
> by manipulating the DOM directly. However, it would be great if there was a
> way of using the Simple API that I have overlooked that could help me.
>
> Does anyone have experience with improving the performance of the ODF
> toolkit in the context of writing rows to an ods spreadsheet?
>
We've had discussions on this topic before. It comes down to use cases. The
DOM model, with everything in memory at once, facilitates random access to
the content of the document and a style of programming similar to what one
might do in a spreadsheet macro. It is a very natural way to think about a
document, but it does require a lot of RAM.

There are specialized use cases where it should be possible to write code
that will perform much faster, e.g.:

1) Use cases that can be met with a read-only, single-pass streaming
process. In such cases you don't need a DOM at all. It could be done via
SAX.

2) A write-only scenario where you specify the contents of a document, but
don't need to query things like "the contents of cell B27".

It is also possible to have a read/write scenario, but at increased
complexity. Finding B27 is easy in a 2D array, but harder in a sparse matrix
representation.

Note: if we want to, we can always start up a branch to experiment with a
different approach. If it pans out, we integrate it with the trunk. If it
doesn't, then we learn from the experience. I wouldn't mind starting a new
package to do the read-only streaming approach.

-Rob

> Regards,
> Nick
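
P.S. To make the read-only, single-pass case (1) concrete, here is a rough sketch using only the JDK's SAX parser and zip support, not the ODF Toolkit API. The class and method names (OdsRowCounter, countRows, countRowsInOds) are made up for illustration. It assumes the spreadsheet follows the usual ODF package layout, with the table markup in content.xml, and it only counts rows; a real reader would also collect cell values in the handler.

```java
import java.io.InputStream;
import java.util.zip.ZipFile;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: counts spreadsheet rows in one streaming pass, no DOM.
class OdsRowCounter {
    // Namespace of the ODF table elements (table:table-row and friends).
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";

    /** Counts rows in a content.xml stream, honouring table:number-rows-repeated. */
    static long countRows(InputStream contentXml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed to match on namespace URI + local name
        final long[] rows = {0};
        factory.newSAXParser().parse(contentXml, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (TABLE_NS.equals(uri) && "table-row".equals(localName)) {
                    // One element may stand for many identical rows.
                    String repeated = atts.getValue(TABLE_NS, "number-rows-repeated");
                    rows[0] += (repeated == null) ? 1 : Long.parseLong(repeated);
                }
            }
        });
        return rows[0];
    }

    /** Opens an .ods package and streams its content.xml entry through countRows. */
    static long countRowsInOds(String odsPath) throws Exception {
        try (ZipFile zip = new ZipFile(odsPath);
             InputStream in = zip.getInputStream(zip.getEntry("content.xml"))) {
            return countRows(in);
        }
    }
}
```

Memory use stays roughly constant regardless of the number of rows, since nothing is retained beyond the counter. One caveat: some producers pad a sheet with a single heavily repeated empty row, so the total can be much larger than the number of rows that actually hold data.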
