From: Steven White [mailto:swhite4...@gmail.com]
Sent: Tuesday, February 09, 2016 10:35 AM
To: user@tika.apache.org
Subject: Re: Preventing OutOfMemory exception
Thanks Tim!! You helped me find the defect in my code.
Yes, I'm using one BodyContentHandler. When I changed my code to create a new
BodyContentHandler for each file, the problem went away.
> [...] or you know your document set
> well enough, you should be good to go with keeping Tika and your
> postprocessing steps in the same jvm.
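When the document set is *not* trusted, the usual alternative to same-JVM parsing is to run the parser in a child process so a crash or OOM can't take down the main application (Tika ships a ForkParser for this; the sketch below is not that API). A minimal, stdlib-only sketch of the forking pattern — "echo" stands in for a hypothetical real command such as running tika-app on one file:

```java
import java.util.concurrent.TimeUnit;

public class ForkedParse {
    public static void main(String[] args) throws Exception {
        // "echo" is a placeholder; a real setup might launch something like
        // `java -jar tika-app.jar -t <file>` here.
        ProcessBuilder pb = new ProcessBuilder("echo", "parsed-ok");
        pb.redirectErrorStream(true);
        Process p = pb.start();

        // Kill the child if it hangs or blows past the time budget, so one
        // bad document can't stall the whole pipeline.
        if (!p.waitFor(30, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            System.out.println("parse timed out");
        } else {
            String out = new String(p.getInputStream().readAllBytes()).trim();
            System.out.println("child said: " + out);
        }
    }
}
```

The trade-off is process-spawn overhead per document versus isolation from parser crashes and runaway memory.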
> *From:* Steven White [mailto:swhite4...@gmail.com]
> *Sent:* Tuesday, February 09, 2016 10:35 AM
>
> *To:* user@tika.apache.org
> http://mail-archives.apache.org/mod_mbox/lucene-dev/201507.mbox/%3cjira.12843538.1436367863000.133708.1436382786...@atlassian.jira%3E
From: Steven White [mailto:swhite4...@gmail.com]
Sent: Tuesday, February 09, 2016 5:37 PM
To: user@tika.apache.org
Subject: Re: Preventing OutOfMemory exception
Thanks
From: Steven White [mailto:swhite4...@gmail.com]
Sent: Monday, February 08, 2016 4:56 PM
To: user@tika.apache.org
Subject: Re: Preventing OutOfMemory exception
Hi Tim,
The code I showed is a minimal example to demonstrate the issue I'm running
into: memory keeps growing. In production, the loop you see will read files off
a file system and parse them using logic close to what I showed. I use
contentHandler.toString() to get back the extracted text.
I’m not sure why you’d want to append document contents across documents into
one handler. Typically, you’d use a new ContentHandler and a new Metadata object
for each parse. Calling “toString()” does not clear the content handler, so you
should have 20 copies of the extracted content on your heap.
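The point above can be sketched without Tika at all: any handler that accumulates text into a buffer keeps growing when it is reused, because toString() only copies the buffer — it never resets it. A stdlib-only sketch (HandlerReuse and its parse() helper are hypothetical stand-ins, not Tika classes):

```java
import java.io.StringWriter;

public class HandlerReuse {
    // Stand-in for parsing one document's text into a handler.
    static void parse(String doc, StringWriter handler) {
        handler.write(doc);
    }

    public static void main(String[] args) {
        String[] docs = {"doc-1 text. ", "doc-2 text. ", "doc-3 text. "};

        // Buggy pattern: one shared handler reused across files — content
        // accumulates, and memory grows with every parse.
        StringWriter shared = new StringWriter();
        for (String d : docs) {
            parse(d, shared);
        }
        System.out.println("shared length = " + shared.toString().length());

        // Correct pattern: a fresh handler (and, with Tika, a fresh Metadata)
        // per document, so each toString() holds exactly one document.
        int lastLen = 0;
        for (String d : docs) {
            StringWriter perDoc = new StringWriter();
            parse(d, perDoc);
            lastLen = perDoc.toString().length();
        }
        System.out.println("per-doc length = " + lastLen);
    }
}
```

The shared handler ends up holding all three documents (36 characters here), while the per-document handler holds only the current one (12 characters) — which is exactly why reusing one BodyContentHandler across a loop makes memory grow.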