memory issue in ExcelExtractor
------------------------------

                 Key: TIKA-211
                 URL: https://issues.apache.org/jira/browse/TIKA-211
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.3
            Reporter: Daan de Wit


The Excel extractor consumes a very large amount of memory when given an Excel 
file containing many numeric cells. I tested with a simple sheet containing 254 
columns and 5511 rows, which results in an 8MB file; parsing it failed with an 
OutOfMemoryError (OOME) when given 512MB of memory.
The memory issue is caused by the Java NumberFormat that is instantiated for 
every numeric cell. A solution would be to cache the NumberFormat instance in 
the TikaHSSFListener class. Since NumberFormat is not thread-safe, it might be 
necessary to pool it.
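
As a rough illustration of the caching idea, here is a minimal sketch that keeps 
one NumberFormat per thread instead of creating one per cell. It is not the 
actual TikaHSSFListener code: the class name NumberFormatCache, its methods, and 
the choice of Locale.US are all assumptions made for the example, and a 
ThreadLocal is used as one possible stand-in for a pool.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical helper, not part of Tika: caches one NumberFormat per thread.
public final class NumberFormatCache {

    // NumberFormat is not thread-safe, so each thread lazily creates and then
    // reuses its own instance. Locale.US is an assumption for this sketch.
    private static final ThreadLocal<NumberFormat> FORMAT =
            ThreadLocal.withInitial(() -> NumberFormat.getInstance(Locale.US));

    private NumberFormatCache() {
        // static helper, no instances
    }

    // Returns the calling thread's cached NumberFormat.
    public static NumberFormat get() {
        return FORMAT.get();
    }

    // Formats a numeric cell value without allocating a new NumberFormat.
    public static String format(double value) {
        return FORMAT.get().format(value);
    }
}
```

A listener could then call NumberFormatCache.format(value) for each numeric 
cell. If the listener is only ever used from a single thread, a plain instance 
field holding the NumberFormat would probably be enough.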

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
