[ 
https://issues.apache.org/jira/browse/ODFTOOLKIT-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436921#comment-13436921
 ] 

Rob Weir commented on ODFTOOLKIT-333:
-------------------------------------

Very large spreadsheets are a weakness in ODF.  The per-cell overhead, plus 
markup, bloats the uncompressed XML to an enormous size.  ZIP compression 
brings the file down to a small fraction of that, but the time and memory 
required to uncompress it are large.  This is an issue for OOXML (.xlsx) as 
well.  That was one reason Microsoft also created a binary encoding of 
spreadsheet files, with the .xlsb extension:  
http://blogs.msdn.com/b/dmahugh/archive/2006/08/22/712835.aspx
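To see how large the gap between compressed and uncompressed size is for a
given file, something like the following rough sketch could be used.  It relies
only on java.util.zip from the JDK; the class name OdsEntrySizes and the method
expansionRatio() are made up for illustration and are not part of the Toolkit.

```java
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class OdsEntrySizes {

    /**
     * Returns uncompressed size divided by compressed size for one entry,
     * or -1 if the entry is missing or its sizes are unknown or zero.
     * (Hypothetical helper, named here only for this sketch.)
     */
    public static double expansionRatio(ZipFile zip, String entryName) {
        ZipEntry entry = zip.getEntry(entryName);
        if (entry == null || entry.getSize() <= 0 || entry.getCompressedSize() <= 0) {
            return -1;
        }
        return (double) entry.getSize() / entry.getCompressedSize();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to an .ods file, which is an ordinary ZIP archive.
        try (ZipFile zip = new ZipFile(args[0])) {
            ZipEntry content = zip.getEntry("content.xml");
            if (content == null) {
                System.out.println("no content.xml in " + args[0]);
                return;
            }
            System.out.println("content.xml compressed bytes:   " + content.getCompressedSize());
            System.out.println("content.xml uncompressed bytes: " + content.getSize());
            System.out.println("expansion ratio: " + expansionRatio(zip, "content.xml"));
        }
    }
}
```

This reads only the ZIP directory, so it is cheap to run even on a file that
is too large to load.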

On top of that, the ODF Toolkit is based on a DOM representation of the 
document.  This makes it wonderful for random access to various parts of the 
document.  The developer can access any cell at any time, can add content then 
styles, or styles then content, and work however they want.  But this comes 
with memory overhead, since the whole document is held in memory at once.

There is no easy fix here that I can see. But there is one very useful approach 
we could consider.  That would be to add another module to the Toolkit, maybe 
call it ODFSAX or ODFStreamer or something like that.  As the name suggests, we 
could do a SAX parse, and instead of instantiating the entire document we could 
define event handlers like onHeader(), onFooter(), onParagraph(), etc. that 
would be called in document order.  This is a more constrained solution -- 
read-only, single pass, no random access -- and it would require the developer 
to plan their logic in a way that fits the single-pass way of looking at the 
document, but it would perform far, far better for such uses.
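A minimal sketch of what such a streaming reader might look like for
spreadsheets, using only the JDK's SAX and ZIP classes.  Nothing here exists in
the Toolkit today: the class OdsStreamSketch, the CellListener interface and
its onRowStart()/onCell() callbacks are names invented for illustration.  The
sketch also ignores repeated cells (table:number-columns-repeated), covered
cells and nested tables, which a real module would have to handle.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class OdsStreamSketch {

    // ODF table namespace, as used in content.xml.
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";

    /** Hypothetical callback interface; invoked in document order. */
    public interface CellListener {
        void onRowStart();
        void onCell(String text);
    }

    /** Single-pass, read-only parse of a content.xml stream. */
    public static void stream(InputStream contentXml, final CellListener listener)
            throws IOException, SAXException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(contentXml, new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a cell

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (!TABLE_NS.equals(uri)) {
                    return;
                }
                if ("table-row".equals(local)) {
                    listener.onRowStart();
                } else if ("table-cell".equals(local)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (TABLE_NS.equals(uri) && "table-cell".equals(local) && text != null) {
                    listener.onCell(text.toString());
                    text = null; // only one cell's text is ever held in memory
                }
            }
        });
    }

    /** Opens an .ods file as a ZIP archive and streams its content.xml. */
    public static void streamOds(String path, CellListener listener)
            throws IOException, SAXException, ParserConfigurationException {
        try (ZipFile zip = new ZipFile(path)) {
            ZipEntry entry = zip.getEntry("content.xml");
            if (entry == null) {
                throw new IOException("no content.xml in " + path);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                stream(in, listener);
            }
        }
    }
}
```

Memory use in this sketch is bounded by the largest single cell rather than by
the size of the document, which is the trade-off described above: the caller
sees each row and cell exactly once and cannot go back.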
                
> Exception in thread "main" java.lang.OutOfMemoryError
> -----------------------------------------------------
>
>                 Key: ODFTOOLKIT-333
>                 URL: https://issues.apache.org/jira/browse/ODFTOOLKIT-333
>             Project: ODF Toolkit
>          Issue Type: Bug
>          Components: odfdom, performance, simple api
>    Affects Versions: 0.8.7, 0.8.8
>            Reporter: Vicente Villegas Larios
>         Attachments: bigFile.ods
>
>
> I have been facing an "Out of Memory" issue: I'm trying to read an ODS file 
> of 1.4 MB and the ODF Toolkit code throws the following exception.
> Exception in thread "main" java.lang.OutOfMemoryError
>       at java.util.Arrays.copyOfRange(Unknown Source)
>       at java.util.Arrays.copyOf(Unknown Source)
>       at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:105)
>       at org.odftoolkit.odfdom.pkg.StreamHelper.stream(StreamHelper.java:74)
>       at org.odftoolkit.odfdom.pkg.StreamHelper.transformStream(StreamHelper.java:48)
>       at org.odftoolkit.odfdom.pkg.OdfPackage.getBytes(OdfPackage.java:1584)
>       at org.odftoolkit.odfdom.pkg.OdfPackage.getInputStream(OdfPackage.java:1650)
>       at org.odftoolkit.odfdom.pkg.OdfFileDom.initialize(OdfFileDom.java:137)
>       at org.odftoolkit.odfdom.dom.OdfContentDom.initialize(OdfContentDom.java:60)
>       at org.odftoolkit.odfdom.pkg.OdfFileDom.<init>(OdfFileDom.java:87)
>       at org.odftoolkit.odfdom.dom.OdfContentDom.<init>(OdfContentDom.java:50)
>       at org.odftoolkit.odfdom.pkg.OdfFileDom.newFileDom(OdfFileDom.java:110)
>       at org.odftoolkit.odfdom.pkg.OdfPackageDocument.getFileDom(OdfPackageDocument.java:280)
>       at org.odftoolkit.odfdom.dom.OdfSchemaDocument.getFileDom(OdfSchemaDocument.java:393)
>       at org.odftoolkit.odfdom.dom.OdfSchemaDocument.getContentDom(OdfSchemaDocument.java:197)
>       at org.odftoolkit.simple.Document.getContentRoot(Document.java:762)
>       at org.odftoolkit.simple.SpreadsheetDocument.getContentRoot(SpreadsheetDocument.java:217)
> I know that ODF files are like zip files, so I changed the extension to "zip" 
> and extracted it to a folder; the extracted folder is about 180 MB.
> Besides that, I exported the content to an "xls" file and used POI to perform 
> the same operation, and it seemed to work fine with POI. It seems that the 
> ODF Toolkit doesn't support reading big files.
> I noticed that the content is stored in an XML file; thinking about it, it 
> seems that ODF is using a DOM parser instead of a SAX parser. 
> Is there anyone who can help me fix this problem? 
> I'm attaching the "ods" file with the data that triggers the out-of-memory error.
> Many thanks

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
