Hi everyone, I'm integrating Tika with my application and need your help to figure out if the OOM I'm getting is due to the way I'm using Tika or if it is an issue with parsing XML files.
The following example code causes an OOM on the 7th iteration with -Xmx2g. The test passes with -Xmx4g. The XML file I'm trying to parse is 51 MB in size. I do not see this issue with the other file types I have tested so far: memory usage keeps growing with XML files, but stays constant with other file types.

    import java.io.File;
    import org.apache.tika.io.TikaInputStream;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.sax.BodyContentHandler;

    public class Extractor {
        // -1 disables the handler's write limit
        private BodyContentHandler contentHandler = new BodyContentHandler(-1);
        private AutoDetectParser parser = new AutoDetectParser();
        private Metadata metadata = new Metadata();

        public String extract(File file) throws Exception {
            TikaInputStream stream = TikaInputStream.get(file);
            try {
                parser.parse(stream, contentHandler, metadata);
                return contentHandler.toString();
            } finally {
                stream.close();
            }
        }
    }

    public static void main(...) {
        Extractor extractor = new Extractor();
        File file = new File("C:\\temp\\test.xml");
        // Parse the same file 20 times with the same Extractor instance
        for (int i = 0; i < 20; i++) {
            extractor.extract(file);
        }
    }

Any idea whether this is an issue with XML files, or whether the issue is in my code?

Thanks
Steve