[ 
https://issues.apache.org/jira/browse/ODFTOOLKIT-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971379#comment-13971379
 ] 

Bruno Girin commented on ODFTOOLKIT-388:
----------------------------------------

After some analysis, here is a summary of what happens. It all comes down to a 
couple of things that LibreOffice does:
- it uses repeated empty cells and rows to specify the maximum size of the 
sheet area irrespective of whether the whole sheet is filled in or not;
- the number of columns declared is a maximum and in most cases the sheet does 
not contain a single row with that many columns.

h2. How LibreOffice specifies the document

First, it declares a number of columns that may or may not be filled in but 
that, more importantly for LibreOffice, define which styles to use:
{code}
<table:table-column table:style-name="co1" table:number-columns-repeated="257" 
table:default-cell-style-name="ce1"/>
{code}

Then at the end of each row, it includes a repeated empty cell (note the 
self-closing tag with no content and no {{office:value-type}} attribute):
{code}
<table:table-cell table:number-columns-repeated="254"/>
{code}

At the end of the sheet, it defines a number of repeated empty rows, i.e. rows 
with one or several empty cells:
{code}
<table:table-row table:style-name="ro1" table:number-rows-repeated="1048574">
  <table:table-cell table:number-columns-repeated="257"/>
</table:table-row>
<table:table-row table:style-name="ro1">
  <table:table-cell table:number-columns-repeated="257"/>
</table:table-row>
{code}

Taken at face value, this declares a spreadsheet that has 257 columns and 
1048576 rows, even though there are only 3 cells in a single row that are not 
empty.

h2. What ODF Toolkit does

The {{Table.getRowCount()}} method is fairly straightforward: it counts rows 
by adding all {{number-rows-repeated}} values together. However, as it doesn't 
recognise empty rows, it returns a value of 1048576 even though there is only 
one non-empty row.
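
The gap between the declared and the used row count can be illustrated with a 
minimal sketch. It relies only on the JDK's DOM parser, not on the ODF Toolkit 
API. The names {{RowCountSketch}}, {{declaredRowCount}} and {{usedRowCount}} 
are hypothetical, the sample markup is a trimmed-down version of the fragments 
above, and treating a cell as empty when it has no child nodes and no 
{{office:value-type}} attribute is an assumption taken from the description of 
the LibreOffice output, not a rule from the specification.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class RowCountSketch {
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";
    static final String OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";

    // One row with 3 filled cells, followed by the padding rows described above.
    static final String SAMPLE =
        "<table:table xmlns:table='" + TABLE_NS + "' xmlns:office='" + OFFICE_NS + "'"
        + " xmlns:text='urn:oasis:names:tc:opendocument:xmlns:text:1.0'>"
        + "<table:table-row>"
        + "<table:table-cell office:value-type='string'><text:p>a</text:p></table:table-cell>"
        + "<table:table-cell office:value-type='string'><text:p>b</text:p></table:table-cell>"
        + "<table:table-cell office:value-type='string'><text:p>c</text:p></table:table-cell>"
        + "<table:table-cell table:number-columns-repeated='254'/>"
        + "</table:table-row>"
        + "<table:table-row table:number-rows-repeated='1048574'>"
        + "<table:table-cell table:number-columns-repeated='257'/>"
        + "</table:table-row>"
        + "<table:table-row>"
        + "<table:table-cell table:number-columns-repeated='257'/>"
        + "</table:table-row>"
        + "</table:table>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the *NS lookups below
            return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // A missing repeat attribute means the element occurs once.
    static int repeated(Element element, String attribute) {
        String value = element.getAttributeNS(TABLE_NS, attribute);
        return value.isEmpty() ? 1 : Integer.parseInt(value);
    }

    // Assumption: a row is empty when none of its cells has content or a value type.
    static boolean isEmptyRow(Element row) {
        NodeList cells = row.getElementsByTagNameNS(TABLE_NS, "table-cell");
        for (int i = 0; i < cells.getLength(); i++) {
            Element cell = (Element) cells.item(i);
            if (cell.hasChildNodes() || cell.hasAttributeNS(OFFICE_NS, "value-type")) {
                return false;
            }
        }
        return true;
    }

    // Sums every number-rows-repeated value, padding rows included.
    static long declaredRowCount(Document doc) {
        NodeList rows = doc.getElementsByTagNameNS(TABLE_NS, "table-row");
        long total = 0;
        for (int i = 0; i < rows.getLength(); i++) {
            total += repeated((Element) rows.item(i), "number-rows-repeated");
        }
        return total;
    }

    // Counts rows up to and including the last non-empty one, so row indices
    // stay valid while trailing padding is ignored.
    static long usedRowCount(Document doc) {
        NodeList rows = doc.getElementsByTagNameNS(TABLE_NS, "table-row");
        long total = 0;
        long used = 0;
        for (int i = 0; i < rows.getLength(); i++) {
            Element row = (Element) rows.item(i);
            total += repeated(row, "number-rows-repeated");
            if (!isEmptyRow(row)) {
                used = total;
            }
        }
        return used;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println("declared rows: " + declaredRowCount(doc));
        System.out.println("used rows:     " + usedRowCount(doc));
    }
}
```

The sketch ignores {{table:covered-table-cell}} elements and nested tables, 
which a real implementation would have to handle.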

The {{Row.getCellCount()}} method is more complex as it tries to take into 
account the cover list by calling {{Table.getCellCoverInfos}} and that's where 
it hangs for the following reasons:
- {{Row.getCellCount()}} calls {{Table.getCellCoverInfos}} with the number of 
columns (taken from the columns declared in the document, irrespective of 
whether the current row really has that many cells) and the total number of 
rows, so in this case {{Table.getCellCoverInfos}} iterates over what it 
believes is a sheet of 257 by 1048576 cells;
- {{Table.getCellCoverInfos}} calls {{Table.getCellByPosition}} which itself 
calls {{Table.getRowByIndex}} and {{Row.getCellByIndex}} all of which have the 
side effects of creating missing instances of Cell or Row on the fly;
- The result of {{Table.getCellCoverInfos}} is re-calculated every time 
{{Row.getCellCount()}} is called.
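
To give a rough sense of scale, the arithmetic behind the points above can be 
written out. This is only an estimate: it assumes one cell lookup per position 
in the declared grid for each {{Row.getCellCount()}} call, and, for the second 
figure, a caller that invokes {{Row.getCellCount()}} once for every declared 
row. The class and method names are hypothetical.

```java
final class CoverScanCost {
    // Lookups for a single scan of the declared grid.
    static long lookupsPerCall(long columns, long rows) {
        return Math.multiplyExact(columns, rows);
    }

    // Lookups if the scan is repeated once per declared row (assumed usage).
    static long lookupsForAllRows(long columns, long rows) {
        return Math.multiplyExact(lookupsPerCall(columns, rows), rows);
    }

    public static void main(String[] args) {
        long columns = 257;
        long rows = 1048576;
        System.out.println("per call:     " + lookupsPerCall(columns, rows));
        System.out.println("for all rows: " + lookupsForAllRows(columns, rows));
    }
}
```

For the sheet described here that is 269484032 lookups per call and 
282574488338432 if repeated for every declared row, each lookup potentially 
creating a Cell or Row instance as a side effect.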

So this means that it is possible to craft a very small ODS document that is 
ODF 1.2 compliant and still causes ODF Toolkit to hang, as the ones LibreOffice 
produces do.

h2. How to fix this

In order to fix this, I would suggest the following (none of which sounds easy):
# Recognise empty rows and cells and change {{Table.getRowCount()}} and 
{{Row.getCellCount()}} so that they don't include those in results,
# Take the number of columns as an indicative maximum rather than the actual 
number of columns,
# Split {{Table.getCellByPosition}}, {{Table.getRowByIndex}} and 
{{Row.getCellByIndex}} into versions with and versions without side effects,
# Abstract cover list handling into the Table class so that it caches it and 
doesn't re-create it each time {{Row.getCellCount()}} is called (there are 
probably additional optimisations that can be done on the cover list: for 
example, you can assume that any cell that covers another one has row and 
column indices that are lower than the covered cell).
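
As a starting point for discussion, points 3 and 4 could look roughly like the 
sketch below. It is not ODF Toolkit code: {{SparseSheetSketch}}, {{peekCell}}, 
{{getOrCreateCell}}, {{merge}}, {{coverList}} and {{isCovered}} are all 
hypothetical names, and the model (a map of explicitly stored cells) is a 
deliberate simplification. It shows a lookup without side effects next to one 
that creates cells, and a cover list that is cached and rebuilt only after a 
span changes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SparseSheetSketch {
    /** Hypothetical cell model: a value plus the spans of a merged cell. */
    static final class Cell {
        String value = "";
        int rowSpan = 1;
        int colSpan = 1;
    }

    private final Map<Long, Cell> cells = new HashMap<>();
    private List<int[]> coverCache; // null means stale, rebuilt on demand
    int coverBuilds = 0;            // instrumentation for this example only

    private static long key(int row, int col) {
        return ((long) row << 32) | (col & 0xffffffffL);
    }

    /** Lookup without side effects: returns null if the cell is not stored. */
    Cell peekCell(int row, int col) {
        return cells.get(key(row, col));
    }

    /** Lookup with side effects: creates the cell if it is missing. */
    Cell getOrCreateCell(int row, int col) {
        return cells.computeIfAbsent(key(row, col), k -> new Cell());
    }

    /** Gives the cell at (row, col) a span and invalidates the cached cover list. */
    void merge(int row, int col, int rowSpan, int colSpan) {
        Cell cell = getOrCreateCell(row, col);
        cell.rowSpan = rowSpan;
        cell.colSpan = colSpan;
        coverCache = null;
    }

    /** Entries are {row, col, rowSpan, colSpan}; built once per span change. */
    List<int[]> coverList() {
        if (coverCache == null) {
            coverBuilds++;
            List<int[]> list = new ArrayList<>();
            for (Map.Entry<Long, Cell> entry : cells.entrySet()) {
                Cell cell = entry.getValue();
                if (cell.rowSpan > 1 || cell.colSpan > 1) {
                    long k = entry.getKey();
                    list.add(new int[] {(int) (k >> 32), (int) k, cell.rowSpan, cell.colSpan});
                }
            }
            coverCache = list;
        }
        return coverCache;
    }

    /** True if (row, col) lies inside another cell's span; never creates cells. */
    boolean isCovered(int row, int col) {
        for (int[] c : coverList()) {
            boolean inside = row >= c[0] && row < c[0] + c[2]
                          && col >= c[1] && col < c[1] + c[3];
            boolean isOrigin = row == c[0] && col == c[1];
            if (inside && !isOrigin) {
                return true;
            }
        }
        return false;
    }

    int storedCellCount() {
        return cells.size();
    }
}
```

With this shape, counting methods would call only {{peekCell}} and 
{{isCovered}}, so querying a large, mostly empty sheet would neither allocate 
cells nor rescan the grid.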

This can probably be done without breaking the existing behaviour of the API 
but is not a small endeavour. So I'd appreciate feedback and suggestions before 
I start fiddling with code.

> Test hangs when iterating over a spreadsheet created with LibreOffice 4.0.0
> ---------------------------------------------------------------------------
>
>                 Key: ODFTOOLKIT-388
>                 URL: https://issues.apache.org/jira/browse/ODFTOOLKIT-388
>             Project: ODF Toolkit
>          Issue Type: Bug
>          Components: simple api
>    Affects Versions: 0.6-incubating, 0.6.1-incubating
>            Reporter: Bruno Girin
>         Attachments: SpreadsheetDocumentTest.java, saxProblem.ods, 
> simple.ods, toolkit.patch
>
>
> When iterating over a simple spreadsheet created with LibreOffice 4, the code 
> hangs on Row.getCellCount().
> Running the same document through the validator at 
> http://odf-validator.rhcloud.com/ confirms that it is conformant to ODF1.2:
> {quote}
> The document is conformant ODF1.2!
> Details:
> simple.ods: Info: ODF version of root document: 1.2
> internal:/schema/odf1.2/OpenDocument-v1.2-cos01-manifest-schema.rng: Info: 
> parsed.
> simple.ods/META-INF/manifest.xml: Info: no errors, no warnings
> simple.ods/mimetype: Info: no errors, no warnings
> simple.ods: Info: Media Type: application/vnd.oasis.opendocument.spreadsheet
> internal:/schema/odf1.2/OpenDocument-v1.2-cos01-schema.rng: Info: parsed.
> simple.ods/meta.xml: Info: Generator: LibreOffice/4.0.2.2$Linux_X86_64 
> LibreOffice_project/400m0$Build-2
> simple.ods/meta.xml: Info: no errors, no warnings
> simple.ods/settings.xml: Info: no errors, no warnings
> simple.ods/styles.xml: Info: no errors, no warnings
> simple.ods/content.xml: Info: no errors, no warnings
> internal:/schema/odf1.2/OpenDocument-v1.2-cos01-dsig-schema.rng: Info: parsed.
> simple.ods: Info: no errors, no warnings
> {quote}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
