Thanks for the links - I've put a posting on the Tika ML.
I've just checked and we're using tika-0.2.jar - does anyone know which
version I can use with Solr 1.3?

Is there any info on upgrading from this far back to the latest
version? Is it even possible, or would I need to re-index everything?

On Tue, Jan 17, 2012 at 5:39 AM, P Williams
<williams.tricia.l...@gmail.com> wrote:
> Hi,
>
> I'm not sure which version of Solr/Tika you're using but I had a similar
> experience which turned out to be the result of a design change to PDFBox.
>
> https://issues.apache.org/jira/browse/SOLR-2886
>
> Tricia
>
> On Sat, Jan 14, 2012 at 12:53 AM, Wayne W <waynemailingli...@gmail.com> wrote:
>
>> Hi,
>>
>> we're using Solr running on tomcat with 1GB in production, and of late
>> we've been having a huge number of OutOfMemory issues. It seems from
>> what I can tell this is coming from the tika extraction of the
>> content. I've processed the java dump file using a memory analyzer and
>> it's pretty clean, at least the class involved. It seems like a leak to
>> me, as we don't parse any files larger than 20M, and these objects are
>> taking up ~700M
>>
>> I've attached 2 screen shots from the tool (not sure if you receive
>> attachments).
>>
>> But to summarize (class, number of objects, Used heap size, Retained Heap
>> Size):
>>
>>
>> org.apache.xmlbeans.impl.store.Xob$ElementXObj   838,993   80,533,728   604,606,040
>> org.apache.poi.openxml4j.opc.ZipPackage                2          112    87,009,848
>> char[]                                               587   32,216,960    38,216,950
>>
>>
>> We're really desperate to find a solution to this - any ideas or help
>> is greatly appreciated.
>> Wayne
>>
