RE: Preventing OutOfMemory exception

2016-02-09 Thread Allison, Timothy B.
: Steven White [mailto:swhite4...@gmail.com] Sent: Tuesday, February 09, 2016 10:35 AM To: user@tika.apache.org Subject: Re: Preventing OutOfMemory exception Thanks Tim!! You helped me find the defect in my code. Yes, I'm using one BodyContentHandler. When I changed my code to create a new

Re: Preventing OutOfMemory exception

2016-02-09 Thread Steven White
emo code or you know your document set > well enough, you should be good to go with keeping Tika and your > postprocessing steps in the same jvm. > > > > *From:* Steven White [mailto:swhite4...@gmail.com] > *Sent:* Tuesday, February 09, 2016 10:35 AM > > *To:* user@tika.apache.

RE: Preventing OutOfMemory exception

2016-02-09 Thread Allison, Timothy B.
://mail-archives.apache.org/mod_mbox/lucene-dev/201507.mbox/%3cjira.12843538.1436367863000.133708.1436382786...@atlassian.jira%3E From: Steven White [mailto:swhite4...@gmail.com] Sent: Tuesday, February 09, 2016 5:37 PM To: user@tika.apache.org Subject: Re: Preventing OutOfMemory exception Thanks

RE: Preventing OutOfMemory exception

2016-02-08 Thread Allison, Timothy B.
...@gmail.com] Sent: Monday, February 08, 2016 4:56 PM To: user@tika.apache.org Subject: Re: Preventing OutOfMemory exception Hi Tim, The code I showed is a minimal example of the issue I'm running into, which is: memory keeps on growing. In production, the loop that you see will read files

RE: Preventing OutOfMemory exception

2016-02-08 Thread Allison, Timothy B.
I’m not sure why you’d want to append document contents across documents into one handler. Typically, you’d use a new ContentHandler and new Metadata object for each parse. Calling “toString()” does not clear the content handler, and you should have 20 copies of the extracted content on your
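The handler-reuse problem described in this message can be sketched without Tika at all. The example below is a minimal illustration, not the poster's code: `TextCollector` is a hypothetical stand-in for a text-accumulating handler such as BodyContentHandler, built only on the JDK's SAX classes, and the tiny XML strings stand in for real documents. It assumes the handler simply appends every character event to an internal buffer, which is why reading the text back with toString() leaves the buffer untouched. With Tika itself, the same advice would also cover creating a new Metadata object for each parse.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class HandlerReuseDemo {

    /** Hypothetical stand-in for a text-collecting handler (not a Tika class). */
    static class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public String toString() {
            // Reading the text does not clear the buffer.
            return text.toString();
        }
    }

    /** Parses one small XML document, sending its events to the given handler. */
    static void parse(String xml, DefaultHandler handler) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    /** One handler for all documents: each result also contains all earlier text. */
    static List<String> withSharedHandler(List<String> docs) {
        TextCollector shared = new TextCollector();
        List<String> results = new ArrayList<>();
        for (String doc : docs) {
            parse(doc, shared);
            results.add(shared.toString());
        }
        return results;
    }

    /** A new handler per document: each result contains only that document's text. */
    static List<String> withFreshHandlers(List<String> docs) {
        List<String> results = new ArrayList<>();
        for (String doc : docs) {
            TextCollector fresh = new TextCollector();
            parse(doc, fresh);
            results.add(fresh.toString());
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("<d>alpha</d>", "<d>beta</d>", "<d>gamma</d>");
        System.out.println(withSharedHandler(docs)); // [alpha, alphabeta, alphabetagamma]
        System.out.println(withFreshHandlers(docs)); // [alpha, beta, gamma]
    }
}
```

In the shared-handler loop the buffer grows with every document, and each toString() call copies the whole accumulated text again, which matches the steadily growing memory reported in the thread. Scoping the handler to a single parse lets the previous buffer be garbage collected.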

Re: Preventing OutOfMemory exception

2016-02-08 Thread Steven White
Hi Tim, The code I showed is a minimal example of the issue I'm running into, which is: memory keeps on growing. In production, the loop that you see will read files off a file system and parse them using logic close to what I showed. I use contentHandler.toString() to get back the