Just to stir the pot on this topic, here is an article about why and how to use 
Tika inside of Solr:

https://opensourceconnections.com/blog/2019/10/24/it-s-okay-to-run-tika-inside-of-solr-if-and-only-if/
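To make "inside Solr" concrete, here is a minimal Python sketch (standard library only) of sending a file to Solr Cell's /update/extract handler. In this setup, the Tika bundled with Solr does the parsing. The Solr URL, the core name "mycore", and "id" as the unique key are all assumptions for illustration. literal.id, commit and extractOnly are ExtractingRequestHandler parameters described in the Solr Cell guide linked in the thread.

```python
# Sketch: send a file to Solr Cell (ExtractingRequestHandler) so that the Tika
# bundled with Solr does the parsing inside Solr's JVM.
# Assumptions (not from the thread): Solr at localhost:8983, a hypothetical
# core "mycore" with /update/extract enabled, and "id" as the unique key.
import mimetypes
import urllib.parse
import urllib.request

SOLR_BASE = "http://localhost:8983/solr"  # assumed Solr location
CORE = "mycore"                           # hypothetical core name


def build_extract_url(doc_id, extract_only=False):
    """Build the /update/extract URL with Solr Cell request parameters."""
    params = {"literal.id": doc_id}  # literal.<field> sets a field on the indexed doc
    if extract_only:
        params["extractOnly"] = "true"  # return Tika's output without indexing it
    else:
        params["commit"] = "true"       # index and commit in one request
    return f"{SOLR_BASE}/{CORE}/update/extract?{urllib.parse.urlencode(params)}"


def build_extract_request(file_bytes, filename, doc_id, extract_only=False):
    """Prepare (but do not send) a POST of the raw file bytes to Solr Cell."""
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    return urllib.request.Request(
        build_extract_url(doc_id, extract_only),
        data=file_bytes,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

For example, urllib.request.urlopen(build_extract_request(open("scan.png", "rb").read(), "scan.png", "doc-1", extract_only=True)) returns whatever Tika extracted, without indexing anything. Whether an image yields any text at all depends on Tesseract being installed on the machine where Solr runs. As Alex points out below, Solr does not bundle it.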

> On Oct 23, 2019, at 7:21 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> 
> Here’s a blog post about why and how to use Tika outside Solr (and an RDBMS too,
> but you can pull that part out pretty easily):
> https://lucidworks.com/post/indexing-with-solrj/
> 
> 
> 
>> On Oct 23, 2019, at 7:16 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>> 
>> Again, I think you are best to do it out of Solr.
>> 
>> But even if you want to get it to work in Solr, I think you should start by
>> getting it to work directly in Tika. Then, add the missing libraries and
>> configuration to Solr.
>> 
>> Regards,
>>   Alex
>> 
>> On Wed, Oct 23, 2019, 7:08 PM suresh pendap, <sureshpen...@gmail.com> wrote:
>> 
>>> Hi Alex,
>>> Thanks for your reply. How do we integrate Tesseract with Solr? Do we have
>>> to implement a custom update processor or extend the
>>> ExtractingRequestHandler?
>>> 
>>> Regards
>>> Suresh
>>> 
>>> On Wed, Oct 23, 2019 at 11:21 AM Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>>> 
>>>> I believe the Tika library that powers this can do so with extra libraries
>>>> (Tesseract?), but Solr does not bundle those extras.
>>>> 
>>>> In any case, you may want to run Tika externally so that the
>>>> conversion/extraction process does not become a burden on Solr itself.
>>>> 
>>>> Regards,
>>>>    Alex
>>>> 
>>>> On Wed, Oct 23, 2019, 1:58 PM suresh pendap, <sureshpen...@gmail.com> wrote:
>>>> 
>>>>> Hello,
>>>>> I am reading the Solr documentation about integrating Tika and the Solr
>>>>> Cell framework here:
>>>>>
>>>>> https://lucene.apache.org/solr/guide/6_6/uploading-data-with-solr-cell-using-apache-tika.html
>>>>>
>>>>> I would like to know whether the Solr Cell framework can also be used to
>>>>> extract text from image files.
>>>>> 
>>>>> Regards
>>>>> Suresh
>>>>> 
>>>> 
>>> 
> 
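
Alex and Erick recommend running Tika outside Solr, and that approach can be sketched in Python (standard library only). The sketch extracts text with a standalone Tika Server, then posts the result to Solr's JSON update handler. Several details are assumptions for illustration: Tika Server listening on its default port 9998, a Solr core named "mycore", and a text field named "content_txt". Erick's post above covers the same idea in Java with SolrJ.

```python
# Sketch: extract text with a standalone Tika Server, then index it into Solr.
# Assumptions (not from the thread): tika-server on localhost:9998, Solr at
# localhost:8983 with a hypothetical core "mycore" and text field "content_txt".
import json
import urllib.request

TIKA_URL = "http://localhost:9998/tika"                              # assumed Tika Server endpoint
SOLR_UPDATE = "http://localhost:8983/solr/mycore/update?commit=true"  # hypothetical core


def tika_request(file_bytes):
    """Prepare a PUT of the raw bytes to Tika Server, asking for plain text back."""
    return urllib.request.Request(
        TIKA_URL, data=file_bytes, headers={"Accept": "text/plain"}, method="PUT"
    )


def solr_request(doc_id, text):
    """Wrap the extracted text in a JSON update for Solr's /update handler."""
    docs = [{"id": doc_id, "content_txt": text}]
    return urllib.request.Request(
        SOLR_UPDATE,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_and_index(path, doc_id, timeout=60):
    """Extraction runs in the Tika Server process; Solr only receives plain text."""
    with open(path, "rb") as fh:
        body = fh.read()
    with urllib.request.urlopen(tika_request(body), timeout=timeout) as resp:
        text = resp.read().decode("utf-8")
    with urllib.request.urlopen(solr_request(doc_id, text), timeout=timeout) as resp:
        return resp.status
```

As far as I know, when Tesseract is installed on the Tika Server host, Tika picks up its OCR parser for image formats. Calling tika_request on an image is therefore a convenient way to check OCR output before involving Solr at all, which follows Alex's advice to get it working in Tika first.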

_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com | 
My Free/Busy <http://tinyurl.com/eric-cal>  
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 
<https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>