Re: robust Tika and Hadoop

2015-07-22 Thread Mattmann, Chris A (3980)
++ -Original Message- From: Mark Kerzner Reply-To: "user@tika.apache.org" Date: Monday, July 20, 2015 at 4:22 PM To: Tika User Subject: Re: robust Tika and Hadoop >Hi, Tim, > > >here is my Tika with Hadoop project, tested on Enron, >http://frd.org/

RE: robust Tika and Hadoop

2015-07-21 Thread Allison, Timothy B.
Thank you, Ken! From: Ken Krugler [mailto:kkrugler_li...@transpac.com] Sent: Tuesday, July 21, 2015 10:23 AM To: user@tika.apache.org Subject: RE: robust Tika and Hadoop Hi Tim, Responses inline below. -- Ken From: Allison, Timothy B. Sent: July 21, 2015 5

RE: robust Tika and Hadoop

2015-07-21 Thread Ken Krugler
Hi Tim, Responses inline below. -- Ken > From: Allison, Timothy B. > Sent: July 21, 2015 5:29:37am PDT > To: user@tika.apache.org > Subject: RE: robust Tika and Hadoop > > Ken, > To confirm your strategy: one new Thread for each call to Tika, add timeout > except

RE: robust Tika and Hadoop

2015-07-21 Thread Allison, Timothy B.
July 20, 2015 7:21 PM To: user@tika.apache.org Subject: RE: robust Tika and Hadoop Hi Tim, When we use Tika with Bixo (https://github.com/bixo/bixo/) we wrap it with a TikaCallable (https://github.com/bixo/bixo/blob/master/src/main/java/bixo/parser/TikaCallable.java) This lets us orphan the pa
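
Editor's sketch of the pattern discussed in this thread (one new thread per parse call, with a timeout), in case it helps readers of the archive. This is not the Bixo TikaCallable code; the class name `TimedParse` and the method `callWithTimeout` are hypothetical, and the `Callable` passed in stands in for whatever Tika parse call you would actually make. It uses only `java.util.concurrent` from the JDK.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs one task on its own thread and gives up after a timeout.
final class TimedParse {

    private TimedParse() {}

    static <T> T callWithTimeout(Callable<T> task, long timeoutMs)
            throws InterruptedException, ExecutionException, TimeoutException {
        FutureTask<T> future = new FutureTask<>(task);

        // One new thread per call. Marking it as a daemon means that a thread
        // left behind by a hung parse cannot keep the JVM alive on its own.
        Thread worker = new Thread(future, "parse-worker");
        worker.setDaemon(true);
        worker.start();

        try {
            // Wait up to timeoutMs for the result; a failure inside the task
            // surfaces here as an ExecutionException.
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Request interruption. A parser stuck in a tight loop may ignore
            // the interrupt, in which case the thread is simply abandoned.
            future.cancel(true);
            throw e;
        }
    }
}
```

A caller would wrap the parse in a lambda, for example `TimedParse.callWithTimeout(() -> parseDocument(input), 30_000)`, where `parseDocument` is a placeholder for your own code, and treat `TimeoutException` as "skip this document". Note that abandoning a thread does not reclaim the memory it holds, so this limits hangs but not out-of-memory failures.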

RE: robust Tika and Hadoop

2015-07-20 Thread Allison, Timothy B.
Thank you, Ken and Mark. Will update wiki over the next few days! From: Ken Krugler [mailto:kkrugler_li...@transpac.com] Sent: Monday, July 20, 2015 7:21 PM To: user@tika.apache.org Subject: RE: robust Tika and Hadoop Hi Tim, When we use Tika with Bixo (https://github.com/bixo/bixo/) we wrap

Re: robust Tika and Hadoop

2015-07-20 Thread Mark Kerzner
son, Timothy B. > > *Sent:* July 15, 2015 4:38:56am PDT > > *To:* user@tika.apache.org > > *Subject:* robust Tika and Hadoop > > All, > > I’d like to fill out our Wiki a bit more on using Tika robustly within > Hadoop. I’m aware of Behemoth [0], Nanite [1] and M

RE: robust Tika and Hadoop

2015-07-20 Thread Ken Krugler
rom: Allison, Timothy B. > Sent: July 15, 2015 4:38:56am PDT > To: user@tika.apache.org > Subject: robust Tika and Hadoop > > All, > > I’d like to fill out our Wiki a bit more on using Tika robustly within > Hadoop. I’m aware of Behemoth [0], Nanite [1] and Morphline

Re: robust Tika and Hadoop

2015-07-15 Thread Chris Mattmann
I would add Nutch to the list too, Tim :-) +1 from me. — Chris Mattmann chris.mattm...@gmail.com -Original Message- From: "Allison, Timothy B." Reply-To: Date: Wednesday, July 15, 2015 at 4:38 AM To: "user@tika.apache.org" Subject: robust Tika and Hadoop >

robust Tika and Hadoop

2015-07-15 Thread Allison, Timothy B.
All, I'd like to fill out our Wiki a bit more on using Tika robustly within Hadoop. I'm aware of Behemoth [0], Nanite [1] and Morphlines [2]. I haven't looked carefully into these packages yet. Does anyone have any recommendations for specific configurations/design patterns that will def