Re: Spreadsheet performance (earlier -- Re: SAXNotRecognizedException when opening a spreadsheet created with LibreOffice)

Svante Schubert Tue, 15 Apr 2014 05:28:07 -0700

Hi Bruno,

my comment is (hopefully) related to the looping problem mentioned
earlier in your test. For further information, please take a look at the
issue, where I have just added more details:
https://issues.apache.org/jira/browse/ODFTOOLKIT-388?focusedCommentId=13969480&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13969480


PS: FOR EDITING ODF
To easily edit the content.xml of a file, I suggest using jEdit (see
www.jedit.org) with the Archive plugin,
http://plugins.jedit.org/plugins/?Archive (after pressing "open file",
select an ODF file and, before pressing the OPEN button, open the plugin
dialog and select a file within the ODF zip).
After that I use the XML plugin, http://plugins.jedit.org/plugins/?XML,
to indent the XML (this can be mapped to a shortcut, which makes the work
faster).
The big advantage of this approach is that you can edit and save the
embedded XML without unzipping and zipping all the time!
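
For scripted edits, the same "edit in place inside the zip" idea can be sketched with the JDK's built-in zip file system. This is only a minimal sketch: the class and method names (OdfContentEditor, readContent, editContent) are made up for illustration and are not part of the ODF Toolkit, and the sketch makes no attempt to guarantee the ODF packaging rule that the mimetype entry comes first and is stored uncompressed, so it is best tried on a copy of the file.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.function.UnaryOperator;

// Hypothetical helper, not an ODF Toolkit class.
final class OdfContentEditor {

    /** Returns the text of content.xml inside the given ODF package. */
    static String readContent(Path odfFile) throws IOException {
        try (FileSystem zip = open(odfFile)) {
            byte[] bytes = Files.readAllBytes(zip.getPath("content.xml"));
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    /** Applies an edit to content.xml and stores the result back in the package. */
    static void editContent(Path odfFile, UnaryOperator<String> edit) throws IOException {
        try (FileSystem zip = open(odfFile)) {
            Path entry = zip.getPath("content.xml");
            String xml = new String(Files.readAllBytes(entry), StandardCharsets.UTF_8);
            Files.write(entry, edit.apply(xml).getBytes(StandardCharsets.UTF_8));
        } // closing the zip file system writes the updated archive to disk
    }

    /** Mounts the ODF package as a file system via the JDK zip provider. */
    private static FileSystem open(Path odfFile) throws IOException {
        URI uri = URI.create("jar:" + odfFile.toUri());
        return FileSystems.newFileSystem(uri, Collections.<String, Object>emptyMap());
    }
}
```

A call such as OdfContentEditor.editContent(path, xml -> xml.replace("old", "new")) would then rewrite content.xml without a manual unzip/zip step.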

Regards,
Svante

On 15.04.2014 14:08, Bruno Girin wrote:
> Hi Svante,
>
> I'm happy to help with this. However, I'm not sure how that relates to my
> problem: I am just trying to read a spreadsheet's content by iterating over
> tables, rows and cells in that spreadsheet not knowing how many of those
> items I have when I open the file.
>
> From Nick's comment, it looks like my code is not iterating over the
> spreadsheet properly so I would welcome any suggestion as to how I should
> do that.
>
> In short, what I'm trying to do is this:
>
> open spreadsheet
> for table : spreadsheet {
>   for row : table {
>     for cell : row {
>       store the value in a completely separate structure
>     }
>   }
> }
>
>
> Cheers,
>
> Bruno
>
>
>
> On 15 April 2014 12:45, Svante Schubert wrote:
>
>> I am still working on a patch to fix the performance problem for
>> spreadsheets. But again, this will come a little later this year as part
>> of a major contribution, and major refactoring is required on my side
>> before it can be submitted.
>> Basically, the idea of the fix is to split any functionality for altering
>> cells into two basic functions to avoid code redundancy:
>>
>> The first function selects the range of cells to be altered (which might
>> be a single cell, row, column, full table or sub-rectangle). The
>> alteration might be any arbitrary change applied to the range's
>> columns/rows/cells (e.g. "draw borders around the selected range" or, for
>> instance, "alter the styles of those cells containing content/format").
>> This change will be covered by the second function (or a class with a
>> certain method/interface, as only JDK 8 supports lambda functions, i.e.
>> calling a function with a function as parameter).
>> With this split, I can reuse the selection part, which is quite
>> difficult with all the repetition/coverage of columns/rows and cells.
>> Note: The second function will be called for every column/row/cell
>> within the given range. It seems to perform quite well in my
>> current tests.
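
The split described above (one part that selects a range, one part that is called for every cell in it) could look roughly like the following. This is only an illustrative sketch and not the actual patch: CellRange, CellAction and forEachCell are hypothetical names, a plain String[][] stands in for the table, and the repeated rows/columns that make the real selection difficult are not modelled. The callback is an interface so that it can be an anonymous class before JDK 8 or a lambda from JDK 8 on.

```java
import java.util.Arrays;

/** Hypothetical "second function": called once for every cell in the range. */
interface CellAction {
    void apply(String[][] grid, int row, int col);
}

/** Hypothetical "first function": a rectangular selection, end indices exclusive. */
final class CellRange {
    private final int firstRow, endRow, firstCol, endCol;

    CellRange(int firstRow, int endRow, int firstCol, int endCol) {
        this.firstRow = firstRow;
        this.endRow = endRow;
        this.firstCol = firstCol;
        this.endCol = endCol;
    }

    /** Walks the selection once and hands each cell position to the action. */
    void forEachCell(String[][] grid, CellAction action) {
        int lastRow = Math.min(endRow, grid.length);
        for (int r = Math.max(firstRow, 0); r < lastRow; r++) {
            int lastCol = Math.min(endCol, grid[r].length);
            for (int c = Math.max(firstCol, 0); c < lastCol; c++) {
                action.apply(grid, r, c);
            }
        }
    }
}

final class RangeDemo {
    public static void main(String[] args) {
        String[][] grid = { {"a", "b", "c"}, {"d", "e", "f"}, {"g", "h", "i"} };
        // Upper-case the 2x2 block in the top-left corner (JDK 8 lambda).
        new CellRange(0, 2, 0, 2).forEachCell(grid, (g, r, c) -> g[r][c] = g[r][c].toUpperCase());
        System.out.println(Arrays.deepToString(grid));
    }
}
```

The selection logic lives in one place, so every new kind of change only needs a new CellAction.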
>>
>> Best regards,
>> Svante
>>
>> On 15.04.2014 13:22, Bruno Girin wrote:
>>> Hi Nick,
>>>
>>>
>>> On 15 April 2014 11:49, Nicholas Evans wrote:
>>>
>>>> Dear Bruno,
>>>>
>>>> I have tried out your test code and cannot reproduce the exception that
>>>> you get.
>>>>
>>> I don't seem to be able to reproduce it either when running it in the ODF
>>> toolkit copy taken from SVN this morning but it hangs instead. This is the
>>> behaviour I was seeing when I wrote the first implementation using
>>> v0.5-incubating from the Maven repositories; moving to v0.6-incubating
>>> using the .jar directly seemed to fix the hanging problem but triggered
>>> the exception. Obviously what might have happened is that by doing that I
>>> introduced an interfering dependency.
>>>
>>> Is there any plan to make ODF Toolkit available in the Maven repositories
>>> again so that client projects can just reference the Maven repo?
>>>
>>>
>>>
>>>> I can load the spreadsheet in without a problem, and can also query the
>>>> spreadsheet as expected.
>>>>
>>>> I couldn't get your test to pass because it seems to take a long time to
>>>> run.  Methods like getRowCount() can return much higher values than you
>>>> expect (on your test code it returns 1048576 for me), and
>>>> getRowByIndex() is a very slow method for large numbers of rows.
>>>>
>>> Right, so what is the best way to iterate over all rows in a table and
>>> all cells in a row? This is a very simple spreadsheet with 1 table, 1 row
>>> and 3 cells in the first row.
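
One possible reason for a row count of 1048576 on a one-row sheet is that ODF can describe many identical rows with a single table:table-row element carrying a table:number-rows-repeated attribute, so the logical row count may be far larger than the number of row elements stored in content.xml. The sketch below illustrates that difference using only the JDK's DOM parser on the content.xml text. It does not use the ODF Toolkit API, the class and method names (OdsRowScan, logicalRowCount, readFilledRows) are hypothetical, and it simplifies by treating all tables as one and dropping empty cells (so column positions are not preserved). Whether this is what happens in this particular file is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper working on raw content.xml, not an ODF Toolkit class.
final class OdsRowScan {
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";

    /** Parses content.xml text into a namespace-aware DOM document. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Logical row count: every row element counts as often as it is repeated. */
    static long logicalRowCount(Document content) {
        NodeList rows = content.getElementsByTagNameNS(TABLE_NS, "table-row");
        long total = 0;
        for (int i = 0; i < rows.getLength(); i++) {
            String repeated = ((Element) rows.item(i))
                    .getAttributeNS(TABLE_NS, "number-rows-repeated");
            total += repeated.isEmpty() ? 1 : Long.parseLong(repeated);
        }
        return total;
    }

    /** Visits each stored row element once and keeps only rows that contain text. */
    static List<List<String>> readFilledRows(Document content) {
        List<List<String>> result = new ArrayList<>();
        NodeList rows = content.getElementsByTagNameNS(TABLE_NS, "table-row");
        for (int i = 0; i < rows.getLength(); i++) {
            NodeList cells = ((Element) rows.item(i))
                    .getElementsByTagNameNS(TABLE_NS, "table-cell");
            List<String> values = new ArrayList<>();
            for (int j = 0; j < cells.getLength(); j++) {
                String text = cells.item(j).getTextContent();
                if (!text.isEmpty()) {
                    values.add(text);
                }
            }
            if (!values.isEmpty()) {
                result.add(values);
            }
        }
        return result;
    }
}
```

Iterating over the stored elements keeps the work proportional to the file's actual content rather than to the logical sheet size.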
>>>
>>> If getRowCount() and getRowByIndex() are unreliable and slow, should they
>>> be deprecated or at least identified as not safe for general use?
>>>
>>>
>>>
>>>> Are you running this code in a clean project without other dependencies
>>>> which might be interfering?  If not perhaps you could try this?
>>>>
>>> This is very possible as my project also uses Apache POI so there may be
>>> some dependencies that interfere. As explained above, I'm currently using
>>> the 0.6 jar files directly but would prefer to just reference the project
>>> in a Maven repo to let Maven sort out dependencies.
>>>
>>> Cheers,
>>>
>>> Bruno
>>>
>>
