RE: robust Tika and Hadoop

2015-07-20 Thread Allison, Timothy B.
Thank you, Ken and Mark. Will update wiki over the next few days!
From: Ken Krugler [mailto:kkrugler_li...@transpac.com]
Sent: Monday, July 20, 2015 7:21 PM
To: user@tika.apache.org
Subject: RE: robust Tika and Hadoop
Hi Tim,
When we use Tika with Bixo (https://github.com/bixo/bixo/) we wrap it

Re: robust Tika and Hadoop

2015-07-20 Thread Mark Kerzner
Hi, Tim, here is my Tika with Hadoop project, tested on Enron, http://frd.org/, and it works quite well.
Mark
On Mon, Jul 20, 2015 at 6:20 PM, Ken Krugler wrote:
> Hi Tim,
>
> When we use Tika with Bixo (https://github.com/bixo/bixo/) we wrap it
> with a TikaCallable (
> https://github.com

RE: robust Tika and Hadoop

2015-07-20 Thread Ken Krugler
Hi Tim,
When we use Tika with Bixo (https://github.com/bixo/bixo/) we wrap it with a TikaCallable (https://github.com/bixo/bixo/blob/master/src/main/java/bixo/parser/TikaCallable.java).
This lets us orphan the parsing thread if it times out (https://github.com/bixo/bixo/blob/master/src/main/jav
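
For readers skimming the thread, here is a minimal sketch of the general pattern Ken describes: run the parse as a Callable on a separate thread and stop waiting once a timeout passes. This is not Bixo's TikaCallable. The class name TimedParse, the method parseWithTimeout, and the String result type are hypothetical, and the Callable stands in for whatever Tika parse call you would actually make. It uses only java.util.concurrent.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Tika or Bixo.
final class TimedParse {
    // Daemon threads, so an orphaned (stuck) parse cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timed-parse");
        t.setDaemon(true);
        return t;
    });

    /** Runs parseTask and returns its result, or null if it exceeds timeoutMs. */
    static String parseWithTimeout(Callable<String> parseTask, long timeoutMs)
            throws InterruptedException, ExecutionException {
        Future<String> future = POOL.submit(parseTask);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupting is best effort: a parser that ignores the interrupt
            // keeps running, and we simply stop waiting for it (it is orphaned).
            future.cancel(true);
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        // A task that finishes in time.
        System.out.println(parseWithTimeout(() -> "parsed text", 1_000));
        // A task that hangs; the caller gives up after 100 ms.
        System.out.println(parseWithTimeout(() -> {
            Thread.sleep(60_000);
            return "late";
        }, 100));
    }
}
```

Note that a timeout only frees the caller. A parser stuck in a tight loop or exhausting memory still consumes resources in the same JVM, so this pattern bounds latency rather than isolating failures.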