memory issue in ExcelExtractor
------------------------------

                 Key: TIKA-211
                 URL: https://issues.apache.org/jira/browse/TIKA-211
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.3
            Reporter: Daan de Wit

The Excel extractor consumes a very large amount of memory when given an Excel file that contains many numeric cells. I tested with a simple sheet of 254 columns and 5511 rows, which results in an 8 MB file; parsing it failed with an OutOfMemoryError when given 512 MB of memory.

The memory issue is caused by the Java NumberFormat that is instantiated for every numeric cell. A solution would be to cache the NumberFormat instance in the TikaHSSFListener class. Since NumberFormat is not thread-safe, it might be necessary to pool it.
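
To illustrate the caching idea, here is a minimal sketch that keeps one NumberFormat per thread instead of creating one per cell. It swaps the pool mentioned above for a ThreadLocal, which is a simpler way to avoid sharing a non-thread-safe instance. The class name NumberFormatCache, its methods, and the choice of Locale.US are hypothetical and not taken from the Tika source; how TikaHSSFListener would call it is left open. It assumes Java 8 or later for ThreadLocal.withInitial.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical helper, not part of Tika: caches one NumberFormat per thread.
final class NumberFormatCache {

    // NumberFormat is not thread-safe, so each thread gets its own instance.
    // The instance is created lazily on first use and then reused.
    private static final ThreadLocal<NumberFormat> FORMAT =
            ThreadLocal.withInitial(() -> NumberFormat.getInstance(Locale.US));

    private NumberFormatCache() {
        // static helper, no instances
    }

    // Returns the calling thread's cached NumberFormat.
    static NumberFormat current() {
        return FORMAT.get();
    }

    // Formats a numeric cell value without allocating a new NumberFormat.
    static String format(double value) {
        return FORMAT.get().format(value);
    }
}
```

With something like this, a listener could call NumberFormatCache.format(value) for each numeric cell, so the number of NumberFormat instances would be bounded by the number of threads rather than the number of cells.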