I would add Nutch to the list too, Tim :-)

+1 from me.

—
Chris Mattmann
[email protected]

-----Original Message-----
From: "Allison, Timothy B." <[email protected]>
Reply-To: <[email protected]>
Date: Wednesday, July 15, 2015 at 4:38 AM
To: "[email protected]" <[email protected]>
Subject: robust Tika and Hadoop

>All,
> 
>  I’d like to fill out our Wiki a bit more on using Tika robustly within
>Hadoop.  I’m aware of Behemoth [0], Nanite [1] and Morphlines [2].  I
>haven’t looked carefully into these packages yet.
> 
>  Does anyone have any recommendations for specific configurations/design
>patterns that will defend against OOM errors and permanent hangs within Hadoop?
>  
>  Thank you!
> 
>        Best,
> 
>                  Tim
> 
> 
>[0] https://github.com/DigitalPebble/behemoth
>[1] http://openpreservation.org/blog/2014/03/21/tika-ride-characterising-web-content-nanite/
>[2] http://blog.cloudera.com/blog/2013/07/morphlines-the-easy-way-to-build-and-integrate-etl-apps-for-apache-hadoop/
> 
>

