Here’s a blog post about why and how to use Tika outside Solr (it also covers indexing from an RDBMS, but you can pull that part out pretty easily): https://lucidworks.com/post/indexing-with-solrj/
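
If it helps, here is a rough sketch of the same "extract outside Solr, then send only the text" idea. It is not the SolrJ code from the blog post; it is Python talking to a standalone Tika Server over HTTP. Everything specific in it is an assumption on my part: a Tika Server already running on localhost:9998, a collection called "mycollection", and a schema with "id" and "text" fields. As far as I know, Tika only OCRs images when the tesseract binary is installed on the machine where Tika runs, so check that first.

```python
import json
import urllib.request

# Assumptions (not from the thread): a Tika Server on its default port and a
# Solr collection named "mycollection" whose schema has "id" and "text" fields.
TIKA_URL = "http://localhost:9998/tika"
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update?commitWithin=10000"


def build_tika_request(data: bytes) -> urllib.request.Request:
    """PUT the raw file bytes to Tika Server and ask for plain text back."""
    return urllib.request.Request(
        TIKA_URL, data=data, headers={"Accept": "text/plain"}, method="PUT"
    )


def build_solr_request(doc_id: str, text: str) -> urllib.request.Request:
    """POST a single document to Solr's JSON update handler."""
    payload = json.dumps([{"id": doc_id, "text": text}]).encode("utf-8")
    return urllib.request.Request(
        SOLR_UPDATE_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_and_index(path: str) -> str:
    """Extract text with Tika outside Solr, then send only the text to Solr."""
    with open(path, "rb") as f:
        data = f.read()
    with urllib.request.urlopen(build_tika_request(data)) as resp:
        text = resp.read().decode("utf-8")
    with urllib.request.urlopen(build_solr_request(path, text)) as resp:
        resp.read()  # urlopen raises HTTPError on a non-2xx response
    return text
```

Calling extract_and_index("scan.png") (a hypothetical file name) would use the file path as the document id. For real use you would want batching, retries and a better id, which is what the blog post covers in more detail.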
> On Oct 23, 2019, at 7:16 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>
> Again, I think you are best to do it out of Solr.
>
> But even if you want to get it to work in Solr, I think you start by
> getting it to work directly in Tika. Then, get the missing libraries and
> configuration into Solr.
>
> Regards,
> Alex
>
> On Wed, Oct 23, 2019, 7:08 PM suresh pendap, <sureshpen...@gmail.com> wrote:
>
>> Hi Alex,
>> Thanks for your reply. How do we integrate tesseract with Solr? Do we have
>> to implement a custom update processor or extend the
>> ExtractingRequestProcessor?
>>
>> Regards
>> Suresh
>>
>> On Wed, Oct 23, 2019 at 11:21 AM Alexandre Rafalovitch <arafa...@gmail.com>
>> wrote:
>>
>>> I believe Tika, which powers this, can do so with extra libraries
>>> (tesseract?). But Solr does not bundle those extras.
>>>
>>> In any case, you may want to run Tika externally to keep the
>>> conversion/extraction process from being a burden on Solr itself.
>>>
>>> Regards,
>>> Alex
>>>
>>> On Wed, Oct 23, 2019, 1:58 PM suresh pendap, <sureshpen...@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>> I am reading the Solr documentation about integration with Tika and the
>>>> Solr Cell framework over here
>>>>
>>>> https://lucene.apache.org/solr/guide/6_6/uploading-data-with-solr-cell-using-apache-tika.html
>>>>
>>>> I would like to know if the Solr Cell framework can also be used to
>>>> extract text from image files?
>>>>
>>>> Regards
>>>> Suresh