Thanks for the links. I've put a posting on the Tika ML. I've just checked, and we're using tika-0.2.jar. Does anyone know which version I can use with Solr 1.3?
Is there any information on upgrading from a version this old to the latest one? Is it even possible, or would I need to re-index everything?

On Tue, Jan 17, 2012 at 5:39 AM, P Williams <williams.tricia.l...@gmail.com> wrote:
> Hi,
>
> I'm not sure which version of Solr/Tika you're using, but I had a similar
> experience, which turned out to be the result of a design change to PDFBox.
>
> https://issues.apache.org/jira/browse/SOLR-2886
>
> Tricia
>
> On Sat, Jan 14, 2012 at 12:53 AM, Wayne W <waynemailingli...@gmail.com> wrote:
>
>> Hi,
>>
>> We're running Solr on Tomcat with 1GB of memory in production, and lately
>> we've been having a huge number of OutOfMemory errors. From what I can
>> tell, this is coming from Tika's content extraction. I've processed the
>> Java heap dump with a memory analyzer, and it's pretty clear, at least
>> about the class involved. It looks like a leak to me, since we don't parse
>> any files larger than 20MB, yet these objects are taking up ~700MB.
>>
>> I've attached 2 screenshots from the tool (not sure if you receive
>> attachments).
>>
>> But to summarize:
>>
>> Class                                           Objects   Used heap (bytes)   Retained heap (bytes)
>> org.apache.xmlbeans.impl.store.Xob$ElementXObj  838,993   80,533,728          604,606,040
>> org.apache.poi.openxml4j.opc.ZipPackage         2         112                 87,009,848
>> char[]                                          587       32,216,960          38,216,950
>>
>> We're really desperate to find a solution to this, so any ideas or help
>> would be greatly appreciated.
>> Wayne