[ https://issues.apache.org/jira/browse/TIKA-211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting resolved TIKA-211.
--------------------------------

    Resolution: Fixed
    Fix Version/s: 0.4
    Assignee: Jukka Zitting

Thanks! Fixed in revision 757719.

PS. We don't need to worry about thread-safety as long as the NumberFormat instances are local to the parse() method, which is how I implemented this for now.

> memory issue in ExcelExtractor
> ------------------------------
>
>                 Key: TIKA-211
>                 URL: https://issues.apache.org/jira/browse/TIKA-211
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.3
>            Reporter: Daan de Wit
>            Assignee: Jukka Zitting
>             Fix For: 0.4
>
>
> The Excel extractor consumes a very large amount of memory when given an Excel
> file containing many numeric cells. I tested using a simple sheet with
> 254 columns and 5511 rows, resulting in an 8 MB file; extraction failed with an
> OutOfMemoryError (OOME) even with 512 MB available.
> The memory issue is caused by the Java NumberFormat that is instantiated for
> every numeric cell. A solution would be to cache the NumberFormat instance in
> the TikaHSSFListener class. Since NumberFormat is not thread-safe, it might
> be necessary to pool it.
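The fix described above (keep the NumberFormat local to parse() instead of creating one per numeric cell) can be sketched roughly as follows. This is a minimal illustration, not the actual Tika code from revision 757719: the class name `LocalNumberFormatSketch`, the `parse(double[])` signature, and the use of `Locale.US` are all hypothetical stand-ins, and a plain `double[]` takes the place of the numeric cell records that TikaHSSFListener would receive.

```java
import java.text.NumberFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch: one NumberFormat per parse() call, reused for every
 * numeric cell, instead of one new NumberFormat per cell.
 */
public class LocalNumberFormatSketch {

    // Stand-in for a parser's parse() method; "cells" stands in for the
    // numeric cell values an HSSF listener would see while reading a sheet.
    public static List<String> parse(double[] cells) {
        // Created once per call and never shared with other threads, so the
        // lack of thread-safety in NumberFormat does not matter here.
        NumberFormat format = NumberFormat.getInstance(Locale.US);
        List<String> out = new ArrayList<>(cells.length);
        for (double value : cells) {
            // Reuse the same instance for every cell rather than calling
            // NumberFormat.getInstance() inside the loop.
            out.add(format.format(value));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> text = parse(new double[] {1.0, 2.5, 1234.5});
        System.out.println(text);
    }
}
```

Because each call gets its own instance, concurrent parse() calls never share a NumberFormat, which is why no pooling or synchronization is needed in this variant. A cache shared across calls, as suggested in the original report, would instead need a pool or a per-thread instance.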