[jira] [Comment Edited] (ODFTOOLKIT-388) Test hangs when iterating over a spreadsheet created with LibreOffice 4.0.0

JIRA Mon, 16 Nov 2015 09:20:56 -0800

    [ 
https://issues.apache.org/jira/browse/ODFTOOLKIT-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003916#comment-15003916
 ]


Raimund Hölle edited comment on ODFTOOLKIT-388 at 11/16/15 5:19 PM:
--------------------------------------------------------------------

I found two other ways to work around this problem without changing the API:

* Ignore the "dummy" rows and columns
* Remove the dummy rows from the spreadsheet document after loading

The algorithm to identify a dummy row or column is the same in both solutions.
LibreOffice appends one column with a repeated-column count that brings the
total to 1024 columns. Rows are similar: there are one or more additional
rows with a repeated-row count that brings the total to 1048576 rows.
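
A minimal sketch of that check, assuming the repeat counts have already been
read from the table:number-rows-repeated (or table:number-columns-repeated)
attributes in document order. The class and method names are hypothetical and
are not taken from either patch; the rule "last element is repeated and the
total hits the limit" is only my reading of the description above.

```java
import java.util.List;

// Hypothetical helper, not part of the ODF Toolkit API.
final class DummyRowDetector {
    // Limits mentioned in the comment above.
    static final int MAX_COLUMNS = 1024;
    static final int MAX_ROWS = 1_048_576;

    /**
     * repeats: repeat count of each row (or column) element in document
     * order, where an element without a repeat attribute counts as 1.
     * Returns true if the last element is repeated and the counts add up
     * to exactly the given limit, which suggests trailing filler.
     */
    static boolean hasTrailingFiller(List<Integer> repeats, int limit) {
        if (repeats.isEmpty()) {
            return false;
        }
        long total = 0;
        for (int r : repeats) {
            total += r;
        }
        int last = repeats.get(repeats.size() - 1);
        return last > 1 && total == limit;
    }
}
```

A real check would probably also verify that the trailing element carries no
content before treating it as a dummy.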

I tried the first solution (just ignoring such columns and rows). It was easy
to correct the row/column count methods and the iterators, and reading such
spreadsheets works fine. The problem is the modifying methods (e.g.
"appendColumn()"): since those methods work directly on the DOM, adapting them
is a lot of effort and, to be honest, I didn't understand all of the code; the
handling of covered cells in particular seems very complicated. So I abandoned
this attempt.

However, the attached patch
"incomplete-proposal_IgnoreLibreOfficeDummyCells.patch" contains my work, which
allows loading such spreadsheets and inspecting their content, but it does not
pass all tests.

Then I implemented the second solution. It passes all tests and works
without problems in all tested use cases. See patch
"ODFTOOLKIT-388-CleanupLibreOfficeDummyCells.patch".
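
For illustration only, the cleanup idea could look roughly like the sketch
below. It works on a plain org.w3c.dom tree rather than on the toolkit's own
classes, it only handles rows that are direct children of the table element,
and every class and method name is hypothetical. It is not the code from the
attached patch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the ODF Toolkit API.
final class DummyRowCleanup {
    static final String TABLE_NS =
            "urn:oasis:names:tc:opendocument:xmlns:table:1.0";
    static final int MAX_ROWS = 1_048_576;

    // Repeat count of a row; a missing attribute means the row occurs once.
    static int repeat(Element row) {
        String v = row.getAttributeNS(TABLE_NS, "number-rows-repeated");
        return v.isEmpty() ? 1 : Integer.parseInt(v);
    }

    // Removes trailing repeated rows without text content, but only if all
    // rows together add up to MAX_ROWS. Returns the number of removed elements.
    static int removeTrailingFillerRows(Element table) {
        List<Element> rows = new ArrayList<>();
        long total = 0;
        for (Node n = table.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element
                    && TABLE_NS.equals(n.getNamespaceURI())
                    && "table-row".equals(n.getLocalName())) {
                Element row = (Element) n;
                rows.add(row);
                total += repeat(row);
            }
        }
        if (total != MAX_ROWS) {
            return 0;
        }
        int removed = 0;
        for (int i = rows.size() - 1; i >= 0; i--) {
            Element row = rows.get(i);
            if (repeat(row) <= 1 || !row.getTextContent().trim().isEmpty()) {
                break; // first row from the end that looks like real data
            }
            table.removeChild(row);
            removed++;
        }
        return removed;
    }

    // Parses an XML string whose root element is a table:table element.
    static Element parseTable(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse table XML", e);
        }
    }
}
```

Doing this once after loading leaves the modifying methods untouched, which
seems to be why this route was easier than teaching every method to skip the
filler.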

In my scenario (automated analysis of spreadsheets), this bug is really
critical. I fetched about 600 spreadsheet documents from different departments
out of our DMS and found more than 200 with this problem. Since the migration
from OpenOffice to LibreOffice is ongoing, I expect an increasing number of
affected documents (and I'm not sure that the problem is limited to
LibreOffice).


was (Author: profhccaesar):
I found two other ways to work around this problem without changing the API:

* Ignore the "dummy" rows and columns
* Remove the dummy rows from the spreadsheet document after loading

The algorithm to identify a dummy row or column is the same in both solutions.
LibreOffice appends one column with a repeated-column count that brings the
total to 1024 columns. Rows are similar: there are one or more additional
rows with a repeated-row count that brings the total to 1048576 rows.

I tried the first solution (just ignoring such columns and rows). It was easy
to correct the row/column count methods and the iterators, and reading such
spreadsheets works fine. The problem is the modifying methods (e.g.
"appendColumn()"): since those methods work directly on the DOM, adapting them
is a lot of effort and, to be honest, I didn't understand all of the code; the
handling of covered cells in particular seems very complicated. So I abandoned
this attempt.

However, the attached patch "IgnoreLibreOfficeDummyCells.patch" contains my
work, which allows loading such spreadsheets and inspecting their content, but
it does not pass all tests.

Then I implemented the second solution. It passes all tests and works
without problems in all tested use cases. See patch
"CleanupLibreOfficeDummyCells.patch".

In my scenario (automated analysis of spreadsheets), this bug is really
critical. I fetched about 600 spreadsheet documents from different departments
out of our DMS and found more than 200 with this problem. Since the migration
from OpenOffice to LibreOffice is ongoing, I expect an increasing number of
affected documents (and I'm not sure that the problem is limited to
LibreOffice).

> Test hangs when iterating over a spreadsheet created with LibreOffice 4.0.0
> ---------------------------------------------------------------------------
>
>                 Key: ODFTOOLKIT-388
>                 URL: https://issues.apache.org/jira/browse/ODFTOOLKIT-388
>             Project: ODF Toolkit
>          Issue Type: Bug
>          Components: simple api
>    Affects Versions: 0.6-incubating, 0.6.1-incubating
>            Reporter: Bruno Girin
>         Attachments: ODFTOOLKIT-388-CleanupLibreOfficeDummyCells.patch, 
> SpreadsheetDocumentTest.java, 
> incomplete-proposal_IgnoreLibreOfficeDummyCells.patch, 
> msexcel-converted-contains-additional-problemrow.ods, saxProblem.ods, 
> simple.ods, toolkit.patch
>
>
> When iterating over a simple spreadsheet created with LibreOffice 4, the code 
> hangs on Row.getCellCount().
> Running the same document through the validator at 
> http://odf-validator.rhcloud.com/ confirms that it is conformant to ODF1.2:
> {quote}
> The document is conformant ODF1.2!
> Details:
> simple.ods: Info: ODF version of root document: 1.2
> internal:/schema/odf1.2/OpenDocument-v1.2-cos01-manifest-schema.rng: Info: 
> parsed.
> simple.ods/META-INF/manifest.xml: Info: no errors, no warnings
> simple.ods/mimetype: Info: no errors, no warnings
> simple.ods: Info: Media Type: application/vnd.oasis.opendocument.spreadsheet
> internal:/schema/odf1.2/OpenDocument-v1.2-cos01-schema.rng: Info: parsed.
> simple.ods/meta.xml: Info: Generator: LibreOffice/4.0.2.2$Linux_X86_64 
> LibreOffice_project/400m0$Build-2
> simple.ods/meta.xml: Info: no errors, no warnings
> simple.ods/settings.xml: Info: no errors, no warnings
> simple.ods/styles.xml: Info: no errors, no warnings
> simple.ods/content.xml: Info: no errors, no warnings
> internal:/schema/odf1.2/OpenDocument-v1.2-cos01-dsig-schema.rng: Info: parsed.
> simple.ods: Info: no errors, no warnings
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
