Re: Spreadsheet performance (earlier -- Re: SAXNotRecognizedException when opening a spreadsheet created with LibreOffice)

Svante Schubert Tue, 15 Apr 2014 05:47:30 -0700

Hi Bruno,

I agree. Let's move the discussion to the issue:
https://issues.apache.org/jira/browse/ODFTOOLKIT-388


For readers of the users list only: you may also want to subscribe to the
dev list, since that is where the issue comments are sent.
http://incubator.apache.org/odftoolkit/mailing-lists.html#development-mailing-list

Cheers,
Svante

Am 15.04.2014 14:41, schrieb Bruno Girin:
> Hi Svante,
>
> I read your comment in the bug report but I'm slightly confused by it.
> Considering this file is straight out of LibreOffice with no modification
> and validates as conformant to ODF1.2, shouldn't the library be able to
> handle it without having to tweak the XML by hand?
>
> Bruno
>
>
>
> On 15 April 2014 13:26, Svante Schubert <[email protected]> wrote:
>
>> Hi Bruno,
>>
>> My comment is (hopefully) related to the looping problem mentioned
>> earlier in your test. For further information, please take a look at the
>> issue, where I have just added more details:
>>
>> https://issues.apache.org/jira/browse/ODFTOOLKIT-388?focusedCommentId=13969480&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13969480
>>
>> PS: FOR EDITING ODF
>> To easily edit the content.xml of a file, I suggest using JEdit (see
>> www.jedit.org) with the Archive plugin
>> http://plugins.jedit.org/plugins/?Archive (after pressing open file,
>> select an ODF file and, before pressing the OPEN button, open the plugin
>> dialog and select a file within the ODF zip).
>> After that I use the XML plugin, http://plugins.jedit.org/plugins/?XML,
>> to indent the XML (this can be mapped to a shortcut, which makes the work
>> faster).
>> The big advantage of this approach is that you can edit and save the
>> embedded XML without unzipping and zipping all the time!
>>
>> Regards,
>> Svante
>>
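For readers who would rather inspect content.xml from code than in an editor, the PS above relies on the fact that an ODF file is a zip package containing a content.xml entry. The following is only a minimal sketch using the JDK's java.util.zip classes, not ODF Toolkit API. The class and method names (OdfEntrySketch, readEntry, samplePackage) are hypothetical, and it assumes the entry is UTF-8 encoded.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class; plain JDK only, no ODF Toolkit calls.
final class OdfEntrySketch {

    /** Returns the named zip entry as text (assumed UTF-8), or null if absent. */
    static String readEntry(InputStream packageStream, String entryName) {
        try (ZipInputStream zip = new ZipInputStream(packageStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buffer = new byte[8192];
                    int n;
                    while ((n = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                    return new String(out.toByteArray(), StandardCharsets.UTF_8);
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a tiny in-memory zip so the sketch runs without a real ODF file. */
    static byte[] samplePackage() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
                zip.putNextEntry(new ZipEntry("content.xml"));
                // Placeholder payload, not a valid ODF document.
                zip.write("<office:document-content/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(samplePackage());
        System.out.println(readEntry(in, "content.xml"));
    }
}
```

To try it on a real spreadsheet, pass a FileInputStream for the .ods file instead of the in-memory sample.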
>> Am 15.04.2014 14:08, schrieb Bruno Girin:
>>> Hi Svante,
>>>
>>> I'm happy to help with this. However, I'm not sure how that relates to my
>>> problem: I am just trying to read a spreadsheet's content by iterating over
>>> tables, rows and cells in that spreadsheet, not knowing how many of those
>>> items I have when I open the file.
>>>
>>> From Nick's comment, it looks like my code is not iterating over the
>>> spreadsheet properly so I would welcome any suggestion as to how I should
>>> do that.
>>>
>>> In short, what I'm trying to do is this:
>>>
>>> open spreadsheet
>>> for table : spreadsheet {
>>>   for row : table {
>>>     for cell : row {
>>>       store the value in a completely separate structure
>>>     }
>>>   }
>>> }
>>>
>>>
>>> Cheers,
>>>
>>> Bruno
>>>
>>>
>>>
>>> On 15 April 2014 12:45, Svante Schubert <[email protected]> wrote:
>>>> I am still working on a patch to fix the performance problem for
>>>> spreadsheets. But again, this will come a little later this year with a
>>>> major contribution, and major refactoring is required on my side before
>>>> it can be submitted.
>>>> Basically, the idea for the fix is to split any functionality for
>>>> altering cells into two basic functions to avoid code redundancy:
>>>>
>>>> One function will select the range of cells to be altered (which might
>>>> be a single cell, row, column, full table or sub-rectangle). The
>>>> alteration might be any arbitrary change to be applied to the range's
>>>> columns/rows/cells (e.g. "draw borders around the selected range" or,
>>>> for instance, "alter the styles of those cells containing
>>>> content/format").
>>>> This change will be covered by a second function (or a class with a
>>>> certain method/interface, as only JDK 8 supports lambda functions, i.e.
>>>> calling a function with a function as parameter).
>>>> With this split, I can reuse the selection part, which is quite
>>>> difficult with all the repetition/coverage of columns/rows and cells.
>>>> Note: The second function will be called for every column/row/cell
>>>> within the given range. It seems to perform quite well in my
>>>> current tests.
>>>>
>>>> Best regards,
>>>> Svante
>>>>
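As a rough illustration of the two-function split described above, here is a minimal sketch in plain Java. It is not the patch itself, which is not shown in this thread. All names (RangeSketch, CellRange, CellOperation, forEachCell) are hypothetical and none of them are ODF Toolkit API. The sketch also ignores repeated and covered cells, which is where the real difficulty of the selection part lies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not ODF Toolkit API.
final class RangeSketch {

    /** The "second function": one change applied to a single cell position. */
    @FunctionalInterface
    interface CellOperation {
        void apply(int row, int column);
    }

    /** Input of the "first function": an inclusive rectangle of cells. */
    static final class CellRange {
        final int firstRow;
        final int lastRow;
        final int firstColumn;
        final int lastColumn;

        CellRange(int firstRow, int lastRow, int firstColumn, int lastColumn) {
            if (firstRow < 0 || firstColumn < 0
                    || lastRow < firstRow || lastColumn < firstColumn) {
                throw new IllegalArgumentException("invalid range");
            }
            this.firstRow = firstRow;
            this.lastRow = lastRow;
            this.firstColumn = firstColumn;
            this.lastColumn = lastColumn;
        }

        /** A range covering exactly one cell. */
        static CellRange singleCell(int row, int column) {
            return new CellRange(row, row, column, column);
        }
    }

    /** Walks the range once, row by row, and hands every position to the operation. */
    static void forEachCell(CellRange range, CellOperation operation) {
        for (int r = range.firstRow; r <= range.lastRow; r++) {
            for (int c = range.firstColumn; c <= range.lastColumn; c++) {
                operation.apply(r, c);
            }
        }
    }

    public static void main(String[] args) {
        List<String> visited = new ArrayList<>();
        // Before JDK 8, an anonymous CellOperation class would replace the lambda.
        forEachCell(new CellRange(0, 1, 0, 2), (r, c) -> visited.add(r + "," + c));
        System.out.println(visited);
    }
}
```

The point of the split is that the traversal in forEachCell is written once, while each change ("draw borders", "alter styles") is a separate CellOperation passed to it.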
>>>> Am 15.04.2014 13:22, schrieb Bruno Girin:
>>>>> Hi Nick,
>>>>>
>>>>>
>>>>> On 15 April 2014 11:49, Nicholas Evans <[email protected]> wrote:
>>>>>
>>>>>> Dear Bruno,
>>>>>>
>>>>>> I have tried out your test code and cannot reproduce the exception that
>>>>>> you get.
>>>>>>
>>>>> I don't seem to be able to reproduce it either when running it in the ODF
>>>>> toolkit copy taken from SVN this morning, but it hangs instead. This is the
>>>>> behaviour I was seeing when I wrote the first implementation using
>>>>> v0.5-incubating from the Maven repositories; moving to v0.6-incubating
>>>>> using the .jar directly seemed to fix the hanging problem but triggered the
>>>>> exception. Obviously what might have happened is that by doing that I
>>>>> introduced an interfering dependency.
>>>>>
>>>>> Is there any plan to make ODF Toolkit available in the Maven repositories
>>>>> again so that client projects can just reference the Maven repo?
>>>>>
>>>>>
>>>>>
>>>>>> I can load the spreadsheet in without a problem, and can also query the
>>>>>> spreadsheet as expected.
>>>>>>
>>>>>> I couldn't get your test to pass because it seems to take a long time to
>>>>>> run.  Methods like getRowCount() can return much higher values than you
>>>>>> expect (on your test code it returns 1048576 for me), and getRowByIndex()
>>>>>> is a very slow method for large numbers of rows.
>>>>>>
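One plausible reason for a row count of 1048576 on a one-row sheet, which is an assumption not confirmed in this thread, is that the file pads the table with a single table:table-row element carrying a large table:number-rows-repeated attribute. The sketch below uses only the JDK's DOM parser on a made-up fragment to show how the number of row elements and the number of logical rows can differ. The class and method names (RepeatedRowsSketch, countRows) are hypothetical; it does not call ODF Toolkit.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; plain JDK XML APIs only.
final class RepeatedRowsSketch {

    /** The ODF table namespace. */
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";

    /** Returns {row elements in the XML, logical rows after expanding repeats}. */
    static long[] countRows(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList rows = doc.getElementsByTagNameNS(TABLE_NS, "table-row");
            long logical = 0;
            for (int i = 0; i < rows.getLength(); i++) {
                Element row = (Element) rows.item(i);
                // getAttributeNS returns "" when the attribute is absent.
                String repeated = row.getAttributeNS(TABLE_NS, "number-rows-repeated");
                logical += repeated.isEmpty() ? 1 : Long.parseLong(repeated);
            }
            return new long[] {rows.getLength(), logical};
        } catch (Exception e) {
            throw new IllegalStateException("could not parse table XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up fragment: one real row plus one padding row repeated many times.
        String xml = "<table:table xmlns:table=\"" + TABLE_NS + "\">"
                + "<table:table-row><table:table-cell/></table:table-row>"
                + "<table:table-row table:number-rows-repeated=\"1048575\">"
                + "<table:table-cell/></table:table-row>"
                + "</table:table>";
        long[] counts = countRows(xml);
        System.out.println(counts[0] + " row elements, " + counts[1] + " logical rows");
    }
}
```

If that assumption holds for the test file, a loop that calls getRowByIndex() for every index up to getRowCount() would visit over a million logical rows even though the XML contains only a couple of row elements.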
>>>>> Right, so what is the best way to iterate over all rows in a table and all
>>>>> cells in a row? This is a very simple spreadsheet with 1 table, 1 row and 3
>>>>> cells in the first row.
>>>>>
>>>>> If getRowCount() and getRowByIndex() are unreliable and slow, should they
>>>>> be deprecated or at least identified as not safe for general use?
>>>>>
>>>>>
>>>>>
>>>>>> Are you running this code in a clean project without other dependencies
>>>>>> which might be interfering?  If not, perhaps you could try this?
>>>>>>
>>>>> This is very possible as my project also uses Apache POI so there may be
>>>>> some dependencies that interfere. As explained above, I'm currently using
>>>>> the 0.6 jar files directly but would prefer to just reference the project in
>>>>> a Maven repo to let Maven sort out dependencies.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Bruno
>>>>>
>>
