[ https://issues.apache.org/jira/browse/TIKA-211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-211.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.4
         Assignee: Jukka Zitting

Thanks! Fixed in revision 757719.

PS. We don't need to worry about thread-safety as long as the NumberFormat 
instances are local to the parse() method, which is how I implemented this for 
now.
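
As a rough illustration of that approach (not the code committed in the
revision above), the sketch below creates one NumberFormat per call and reuses
it for every numeric cell. The class and method names are hypothetical, and
Locale.US is an assumption made only to keep the output predictable.

```java
import java.text.NumberFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; these names are not taken from the Tika source.
class CellFormatSketch {

    // Stands in for a single parse() call: the formatter is a local
    // variable, so concurrent calls never share an instance.
    static List<String> formatCells(double[] numericCells) {
        // Created once per call instead of once per numeric cell.
        NumberFormat format = NumberFormat.getInstance(Locale.US);
        List<String> text = new ArrayList<>();
        for (double value : numericCells) {
            text.add(format.format(value));
        }
        return text;
    }

    public static void main(String[] args) {
        for (String cell : formatCells(new double[] {1234.5, 2.0})) {
            System.out.println(cell);
        }
    }
}
```

Because the formatter never escapes the method, no pooling or synchronization
is needed, while the per-cell allocation described in the report is avoided.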

> memory issue in ExcelExtractor
> ------------------------------
>
>                 Key: TIKA-211
>                 URL: https://issues.apache.org/jira/browse/TIKA-211
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.3
>            Reporter: Daan de Wit
>            Assignee: Jukka Zitting
>             Fix For: 0.4
>
>
> The Excel extractor consumes a very large amount of memory when given an 
> Excel file containing many numeric cells. I tested with a simple sheet of 
> 254 columns and 5511 rows, which results in an 8MB file; parsing it failed 
> with an OOME when given 512MB.
> The memory issue is caused by the Java NumberFormat that is instantiated for 
> every numeric cell. A solution would be to cache the NumberFormat instance in 
> the TikaHSSFListener class. Since NumberFormat is not thread-safe, it might 
> be necessary to pool it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
