-----Original Message-----
From: Mark Kerzner
Reply-To: "user@tika.apache.org"
Date: Monday, July 20, 2015 at 4:22 PM
To: Tika User
Subject: Re: robust Tika and Hadoop
>Hi, Tim,
>
>
>here is my Tika with Hadoop project, tested on Enron,
>http://frd.org/
Thank you, Ken!
From: Ken Krugler [mailto:kkrugler_li...@transpac.com]
Sent: Tuesday, July 21, 2015 10:23 AM
To: user@tika.apache.org
Subject: RE: robust Tika and Hadoop
Hi Tim,
Responses inline below.
-- Ken
From: Allison, Timothy B.
Sent: July 21, 2015 5
Hi Tim,
Responses inline below.
-- Ken
> From: Allison, Timothy B.
> Sent: July 21, 2015 5:29:37am PDT
> To: user@tika.apache.org
> Subject: RE: robust Tika and Hadoop
>
> Ken,
> To confirm your strategy: one new Thread for each call to Tika, add timeout
> except
July 20, 2015 7:21 PM
To: user@tika.apache.org
Subject: RE: robust Tika and Hadoop
Hi Tim,
When we use Tika with Bixo (https://github.com/bixo/bixo/) we wrap it with a
TikaCallable
(https://github.com/bixo/bixo/blob/master/src/main/java/bixo/parser/TikaCallable.java)
This lets us orphan the pa
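
The pattern discussed above (one worker thread per parse call, with a timeout) can be sketched roughly as follows. This is not Bixo's TikaCallable and it makes no Tika calls: the class name TimeLimitedParse, the Result type and the run() method are hypothetical names chosen for illustration, and the Callable<String> stands in for whatever code actually invokes the parser. Java cannot forcibly kill a thread, so on timeout the sketch only requests an interrupt and then abandons the worker.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not Bixo's TikaCallable): runs each parse on a worker thread with a time limit. */
final class TimeLimitedParse {

    /** Outcome of one attempt: extracted text on success, or a failure reason. */
    static final class Result {
        final String text;     // null when the parse failed or timed out
        final String failure;  // null when the parse succeeded

        Result(String text, String failure) {
            this.text = text;
            this.failure = failure;
        }

        boolean ok() {
            return failure == null;
        }
    }

    private final ExecutorService pool;

    TimeLimitedParse() {
        // Daemon threads, so an abandoned (hung) parse cannot keep the JVM alive.
        this.pool = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "parse-worker");
            t.setDaemon(true);
            return t;
        });
    }

    /** Runs the given parse task, waiting at most timeoutMillis for it to finish. */
    Result run(Callable<String> parse, long timeoutMillis) {
        Future<String> future = pool.submit(parse);
        try {
            return new Result(future.get(timeoutMillis, TimeUnit.MILLISECONDS), null);
        } catch (TimeoutException e) {
            // Request an interrupt; a parser that ignores it keeps running, orphaned.
            future.cancel(true);
            return new Result(null, "timeout");
        } catch (ExecutionException e) {
            // Anything thrown inside the worker arrives here as the cause.
            return new Result(null, "error: " + e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return new Result(null, "interrupted");
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

In a Hadoop task, the caller would presumably record the failure reason (for example in a counter) and move on to the next document instead of letting one bad file stall or kill the task. An orphaned thread can still hold memory or CPU, so this approach limits the damage from hung parsers but does not fully isolate them.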
Thank you, Ken and Mark. Will update wiki over the next few days!
From: Ken Krugler [mailto:kkrugler_li...@transpac.com]
Sent: Monday, July 20, 2015 7:21 PM
To: user@tika.apache.org
Subject: RE: robust Tika and Hadoop
Hi Tim,
When we use Tika with Bixo (https://github.com/bixo/bixo/) we wrap
> *From:* Allison, Timothy B.
>
> *Sent:* July 15, 2015 4:38:56am PDT
>
> *To:* user@tika.apache.org
>
> *Subject:* robust Tika and Hadoop
>
> All,
>
> I’d like to fill out our Wiki a bit more on using Tika robustly within
> Hadoop. I’m aware of Behemoth [0], Nanite [1] and M
> From: Allison, Timothy B.
> Sent: July 15, 2015 4:38:56am PDT
> To: user@tika.apache.org
> Subject: robust Tika and Hadoop
>
> All,
>
> I’d like to fill out our Wiki a bit more on using Tika robustly within
> Hadoop. I’m aware of Behemoth [0], Nanite [1] and Morphline
I would add Nutch to the list too, Tim :-)
+1 from me.
—
Chris Mattmann
chris.mattm...@gmail.com
-----Original Message-----
From: "Allison, Timothy B."
Reply-To:
Date: Wednesday, July 15, 2015 at 4:38 AM
To: "user@tika.apache.org"
Subject: robust Tika and Hadoop
>
All,
I'd like to fill out our Wiki a bit more on using Tika robustly within
Hadoop. I'm aware of Behemoth [0], Nanite [1] and Morphlines [2]. I haven't
looked carefully into these packages yet.
Does anyone have any recommendations for specific configurations/design
patterns that will def