In my experience, enabling Tika at the server level can exhaust heap space under a high volume of extraction and bring down Solr entirely. This is likely because the garbage collector cannot keep up with the load; even tuning the garbage collector didn't resolve the problem completely. I don't recommend it.

Steve

On Wed, Oct 23, 2019 at 7:08 PM, suresh pendap <sureshpen...@gmail.com> wrote:

Hi Alex,
Thanks for your reply. How do we integrate tesseract with Solr? Do we have to implement a custom update processor or extend the ExtractingRequestProcessor?
Regards
Suresh

On Wed, Oct 23, 2019 at 11:21 AM Alexandre Rafalovitch <arafa...@gmail.com> wrote:

> I believe Tika, which powers this, can do so with extra libraries (tesseract?).
> But Solr does not bundle those extras.
>
> In any case, you may want to run Tika externally so that the
> conversion/extraction process is not a burden on Solr itself.
>
> Regards,
> Alex
>
> On Wed, Oct 23, 2019, 1:58 PM suresh pendap <sureshpen...@gmail.com> wrote:
>
> > Hello,
> > I am reading the Solr documentation about integration with Tika and the
> > Solr Cell framework here:
> >
> > https://lucene.apache.org/solr/guide/6_6/uploading-data-with-solr-cell-using-apache-tika.html
> >
> > I would like to know whether the Solr Cell framework can also be used to
> > extract text from image files.
> >
> > Regards
> > Suresh
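
For anyone who wants to try the "run Tika externally" suggestion from this thread, here is a minimal sketch in Python, not a definitive implementation. It assumes a standalone Tika Server listening on localhost:9998 and a Solr core named "mycore" on localhost:8983; neither URL comes from the thread. The field name "content_txt" and the helper names are hypothetical. OCR of images is also an assumption: it only happens if Tesseract is installed where the Tika Server process can find it.

```python
import json
import urllib.request

# Assumed endpoints (not from the thread): adjust host, port, and core name.
TIKA_URL = "http://localhost:9998/tika"
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update?commit=true"


def extract_text(path, tika_url=TIKA_URL):
    """PUT the raw file to Tika Server and return the extracted plain text."""
    with open(path, "rb") as f:
        data = f.read()
    req = urllib.request.Request(
        tika_url, data=data, method="PUT", headers={"Accept": "text/plain"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def build_solr_doc(doc_id, text):
    """Build the JSON document to index; 'content_txt' is a hypothetical field."""
    return {"id": doc_id, "content_txt": text.strip()}


def index_doc(doc, solr_url=SOLR_UPDATE_URL):
    """POST one document to Solr's JSON update handler; return the HTTP status."""
    body = json.dumps([doc]).encode("utf-8")
    req = urllib.request.Request(
        solr_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def extract_and_index(path):
    """Extract outside Solr, then send only the resulting text to Solr."""
    return index_doc(build_solr_doc(path, extract_text(path)))
```

Calling extract_and_index("scan.png") would send the file to Tika first and then index only the extracted text, so the parsing and OCR load stays in the Tika process rather than in Solr's heap.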