Re: regarding Extracting text from Images

Erick Erickson Wed, 23 Oct 2019 16:22:09 -0700

Here’s a blog about why and how to use Tika outside Solr (and an RDBMS too, but 
you can pull that part out pretty easily):
https://lucidworks.com/post/indexing-with-solrj/
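
A minimal sketch of the outside-Solr flow, in Python rather than the SolrJ used in the blog: send the image to a standalone Tika server, then post the returned text to Solr as JSON. It assumes a Tika server listening on localhost:9998 (with Tesseract installed on that machine so Tika can OCR images), a Solr core named "images", and a `content_txt` field in its schema. Those names, and the helper functions `tika_request`, `solr_request` and `index_image`, are illustrative assumptions and do not come from the blog.

```python
import json
import urllib.request

# Assumption: a standalone Tika server on its default port.
TIKA_URL = "http://localhost:9998/tika"
# Assumption: a Solr core named "images"; adjust to your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/images/update?commit=true"


def tika_request(image_bytes, tika_url=TIKA_URL):
    """Build a PUT request asking the Tika server for plain text."""
    return urllib.request.Request(
        tika_url,
        data=image_bytes,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def solr_request(doc_id, text, solr_url=SOLR_UPDATE_URL):
    """Build a JSON update request carrying one document."""
    # Assumption: the schema has (or dynamically matches) "content_txt".
    doc = {"id": doc_id, "content_txt": text.strip()}
    body = json.dumps([doc]).encode("utf-8")
    return urllib.request.Request(
        solr_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def index_image(path):
    """Extract text from one image via Tika, then index it in Solr."""
    with open(path, "rb") as f:
        image_bytes = f.read()
    # Extraction happens in the Tika process, not inside Solr.
    with urllib.request.urlopen(tika_request(image_bytes)) as resp:
        text = resp.read().decode("utf-8")
    with urllib.request.urlopen(solr_request(path, text)) as resp:
        return resp.status
```

Calling `index_image("scan-1.png")` needs both servers running. The two request builders can be checked without either, which is handy for confirming the payload before pointing it at a real core.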




> On Oct 23, 2019, at 7:16 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> 
> Again, I think you are best off doing it outside of Solr.
> 
> But even if you want to get it to work in Solr, I think you should start by
> getting it to work directly in Tika. Then, get the missing libraries and
> configuration into Solr.
> 
> Regards,
>    Alex
> 
> On Wed, Oct 23, 2019, 7:08 PM suresh pendap, <sureshpen...@gmail.com> wrote:
> 
>> Hi Alex,
>> Thanks for your reply. How do we integrate Tesseract with Solr? Do we have
>> to implement a custom update processor or extend the
>> ExtractingRequestHandler?
>> 
>> Regards
>> Suresh
>> 
>> On Wed, Oct 23, 2019 at 11:21 AM Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>> 
>>> I believe Tika, which powers this, can do so with extra libraries
>>> (tesseract?), but Solr does not bundle those extras.
>>> 
>>> In any case, you may want to run Tika externally so that the
>>> conversion/extraction process does not become a burden on Solr itself.
>>> 
>>> Regards,
>>>     Alex
>>> 
>>> On Wed, Oct 23, 2019, 1:58 PM suresh pendap, <sureshpen...@gmail.com>
>>> wrote:
>>> 
>>>> Hello,
>>>> I am reading the Solr documentation about integration with Tika and the
>>>> Solr Cell framework here:
>>>> 
>>>> https://lucene.apache.org/solr/guide/6_6/uploading-data-with-solr-cell-using-apache-tika.html
>>>> 
>>>> I would like to know if the Solr Cell framework can also be used to
>>>> extract text from image files?
>>>> 
>>>> Regards
>>>> Suresh
>>>> 
>>> 
>>
