Re: integrating tika into hadoop and tika with tesseract.

chethan Sun, 27 Nov 2011 21:20:55 -0800

Can anyone give me an example of how to bind an external parser to Tika? I
couldn't find any blog post or article that illustrates binding an external
parser (Tesseract) to Tika.


On Fri, Nov 25, 2011 at 10:50 PM, Julien Nioche <
[email protected]> wrote:

> Behemoth [https://github.com/jnioche/behemoth] has a module for Tika
> which allows you to use it over Hadoop. Re Tesseract: if you can call it
> on the command line, then the external parser could help. I'm not sure how
> this would work on Hadoop; possibly by installing Tesseract on all the
> slaves at the same location.
>
> HTH
>
> Julien
>
>
> On 25 November 2011 07:29, chethan <[email protected]> wrote:
>
>> Hi,
>>
>> As I am new to Tika, I want to know the following things.
>>
>> 1. How to integrate Tika with Hadoop, so that Tika uses MapReduce to
>>    carry out the parsing.
>> 2. We want Tika to parse OCR files too. As Tika does not support OCR
>>    parsing and recommends using Tesseract, I want to know how to call
>>    Tesseract (a command-line operation) through Tika (which in turn
>>    uses MapReduce to parse the OCR files).
>>
>> thanks and regards
>> chethan
>>
>
>
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
>
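
For reference, the command-line route Julien describes can be sketched in
plain Java. This is only an illustration of shelling out to Tesseract, not
Tika's own external parser API. The class name TesseractRunner, the helper
names, and the path /usr/bin/tesseract are assumptions made for the example;
the only thing taken from Tesseract itself is the invocation form
"tesseract <image> <outputbase>", which writes the text to <outputbase>.txt.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of Tika: runs the tesseract CLI on one image.
class TesseractRunner {

    // Builds the argument list for: tesseract <image> <outputBase>
    // Tesseract itself appends ".txt" to outputBase.
    static List<String> buildCommand(String tesseractPath, Path image, Path outputBase) {
        return List.of(tesseractPath, image.toString(), outputBase.toString());
    }

    // Runs Tesseract on the image and returns the recognised text.
    static String ocr(String tesseractPath, Path image)
            throws IOException, InterruptedException {
        Path outputBase = Files.createTempFile("ocr", "");
        Path textFile = Path.of(outputBase + ".txt");
        try {
            Process process = new ProcessBuilder(buildCommand(tesseractPath, image, outputBase))
                    .redirectErrorStream(true)                        // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)  // drop console chatter
                    .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("tesseract exited with code " + exitCode);
            }
            return Files.readString(textFile);
        } finally {
            Files.deleteIfExists(textFile);
            Files.deleteIfExists(outputBase);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: TesseractRunner <image-file>");
            return;
        }
        // Assumed install location; it would need to be the same on every node.
        System.out.println(ocr("/usr/bin/tesseract", Path.of(args[0])));
    }
}
```

On Hadoop, a map task could call ocr(...) once per input image, which is why
the binary would need to sit at the same path on every slave.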
