[
https://issues.apache.org/jira/browse/ODFTOOLKIT-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003916#comment-15003916
]
Raimund Hölle edited comment on ODFTOOLKIT-388 at 11/16/15 5:19 PM:
--------------------------------------------------------------------
I found two other ways to work around this problem without changing the API:
* Ignore the "dummy" rows and columns
* Remove the rows from the spreadsheet document after loading
The algorithm to identify a dummy row or column is the same in both solutions.
LibreOffice appends one column whose repeat count
(table:number-columns-repeated) brings the total to exactly 1024 columns. Rows
are similar: there are one or more additional rows whose repeat count
(table:number-rows-repeated) brings the total to 1048576 rows.
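A minimal sketch of that identification rule, for illustration only. It is not
taken from the attached patches, and the class, the Entry type and the method
names are hypothetical. It assumes each row or column element has already been
reduced to its repeat count plus a flag saying whether it holds content, and it
treats every trailing content-free entry as filler once the repeat counts add
up to exactly the sheet limit:

```java
import java.util.List;

/** Hypothetical helper; not part of the ODF Toolkit or of the attached patches. */
final class DummyFillerDetector {
    // Sheet limits quoted in the comment above.
    static final int MAX_COLUMNS = 1024;
    static final int MAX_ROWS = 1_048_576;

    /** One row or column element: its repeat count and whether it holds content. */
    static final class Entry {
        final int repeated;
        final boolean hasContent;

        Entry(int repeated, boolean hasContent) {
            this.repeated = repeated;
            this.hasContent = hasContent;
        }
    }

    /**
     * Returns the index of the first trailing filler entry, or entries.size()
     * when the repeat counts do not add up to exactly the given limit.
     */
    static int firstFillerIndex(List<Entry> entries, int limit) {
        long total = 0;
        for (Entry e : entries) {
            total += e.repeated;
        }
        if (total != limit) {
            return entries.size(); // sheet is not padded to the limit
        }
        int i = entries.size();
        // Assumption: all trailing content-free entries count as filler.
        while (i > 0 && !entries.get(i - 1).hasContent) {
            i--;
        }
        return i;
    }

    /** Number of rows or columns left once the trailing filler is ignored. */
    static int effectiveCount(List<Entry> entries, int limit) {
        int end = firstFillerIndex(entries, limit);
        int count = 0;
        for (int k = 0; k < end; k++) {
            count += entries.get(k).repeated;
        }
        return count;
    }
}
```

With three used columns followed by one empty column repeated 1021 times,
effectiveCount(columns, MAX_COLUMNS) yields 3 rather than 1024.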
I tried the first solution (just ignoring such columns and rows). It was easy
to correct the row / column number methods and iterators; reading such
spreadsheets works fine. The problem is the modifying methods (e.g.
"appendColumn()"): since those methods work directly on the DOM, adapting them
is a lot of effort and, to be honest, I didn't understand all of the code; the
handling of covered cells in particular seems very complicated. So I abandoned
this attempt.
However, the attached patch
"incomplete-proposal_IgnoreLibreOfficeDummyCells.patch" contains my work, which
allows loading spreadsheets and inspecting their content, but it does not pass
all tests.
Then I implemented the second solution. It passes all tests and works without
problems in all use cases I tested. See patch
"ODFTOOLKIT-388-CleanupLibreOfficeDummyCells.patch".
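The cleanup idea can be sketched against the plain JDK DOM API as follows. This
is an illustration under stated assumptions, not the content of the patch: the
class and method names are hypothetical, only table:table-row elements that are
direct children of the table are considered, a row counts as content-free when
it contains no text at all, and the first row is always kept so the table does
not end up empty.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper; not part of the ODF Toolkit or of the attached patches. */
final class DummyRowCleanup {
    // ODF table namespace and the row limit quoted in the comment above.
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";
    static final int MAX_ROWS = 1_048_576;

    /** Parses an XML string into a namespace-aware DOM document. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    /** Collects the table:table-row elements that are direct children of the table. */
    static List<Element> directRows(Element table) {
        List<Element> rows = new ArrayList<>();
        for (Node n = table.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && TABLE_NS.equals(n.getNamespaceURI())
                    && "table-row".equals(n.getLocalName())) {
                rows.add((Element) n);
            }
        }
        return rows;
    }

    /** Reads table:number-rows-repeated; a missing attribute means 1. */
    static int repeatCount(Element row) {
        String value = row.getAttributeNS(TABLE_NS, "number-rows-repeated");
        return value.isEmpty() ? 1 : Integer.parseInt(value);
    }

    /** Removes trailing text-free rows if the table is padded to MAX_ROWS. */
    static int removeTrailingFillerRows(Element table) {
        List<Element> rows = directRows(table);
        long total = 0;
        for (Element row : rows) {
            total += repeatCount(row);
        }
        if (total != MAX_ROWS) {
            return 0; // not padded to the limit, leave the table untouched
        }
        int removed = 0;
        // Walk backwards; index 0 is never removed.
        for (int i = rows.size() - 1; i > 0; i--) {
            Element row = rows.get(i);
            if (!row.getTextContent().trim().isEmpty()) {
                break; // first row with text ends the filler run
            }
            table.removeChild(row);
            removed++;
        }
        return removed;
    }
}
```

A real cleanup would also have to handle rows nested in row groups or header
rows, content-free rows that carry formulas or drawings, and the corresponding
filler column, none of which this sketch covers.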
In my scenario (automated analysis of spreadsheets), this bug is really
critical. I fetched about 600 spreadsheet documents from different departments
out of our DMS and found more than 200 with this problem. Since the migration
from OpenOffice to LibreOffice is ongoing, I expect an increasing number of
affected documents (and I'm not sure that the problem is limited to
LibreOffice).
was (Author: profhccaesar):
[Same text as above, except that the two patches were named
"IgnoreLibreOfficeDummyCells.patch" and "CleanupLibreOfficeDummyCells.patch".]
> Test hangs when iterating over a spreadsheet created with LibreOffice 4.0.0
> ---------------------------------------------------------------------------
>
> Key: ODFTOOLKIT-388
> URL: https://issues.apache.org/jira/browse/ODFTOOLKIT-388
> Project: ODF Toolkit
> Issue Type: Bug
> Components: simple api
> Affects Versions: 0.6-incubating, 0.6.1-incubating
> Reporter: Bruno Girin
> Attachments: ODFTOOLKIT-388-CleanupLibreOfficeDummyCells.patch,
> SpreadsheetDocumentTest.java,
> incomplete-proposal_IgnoreLibreOfficeDummyCells.patch,
> msexcel-converted-contains-additional-problemrow.ods, saxProblem.ods,
> simple.ods, toolkit.patch
>
>
> When iterating over a simple spreadsheet created with LibreOffice 4, the code
> hangs on Row.getCellCount().
> Running the same document through the validator at
> http://odf-validator.rhcloud.com/ confirms that it is conformant to ODF1.2:
> {quote}
> The document is conformant ODF1.2!
> Details:
> simple.ods: Info: ODF version of root document: 1.2
> internal:/schema/odf1.2/OpenDocument-v1.2-cos01-manifest-schema.rng: Info:
> parsed.
> simple.ods/META-INF/manifest.xml: Info: no errors, no warnings
> simple.ods/mimetype: Info: no errors, no warnings
> simple.ods: Info: Media Type: application/vnd.oasis.opendocument.spreadsheet
> internal:/schema/odf1.2/OpenDocument-v1.2-cos01-schema.rng: Info: parsed.
> simple.ods/meta.xml: Info: Generator: LibreOffice/4.0.2.2$Linux_X86_64
> LibreOffice_project/400m0$Build-2
> simple.ods/meta.xml: Info: no errors, no warnings
> simple.ods/settings.xml: Info: no errors, no warnings
> simple.ods/styles.xml: Info: no errors, no warnings
> simple.ods/content.xml: Info: no errors, no warnings
> internal:/schema/odf1.2/OpenDocument-v1.2-cos01-dsig-schema.rng: Info: parsed.
> simple.ods: Info: no errors, no warnings
> {quote}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)