Re: regarding Extracting text from Images

Steve Ge Wed, 22 Jan 2020 05:31:57 -0800

In my experience, enabling Tika at the Solr server level can use up all the heap
space under a high volume of extraction and bring Solr down entirely. This is
likely because the garbage collector cannot keep up with the load; even tuning
the garbage collector did not resolve the problem completely. I do not recommend it.
Steve  
 
On Wed, Oct 23, 2019 at 7:08 PM, suresh pendap <sureshpen...@gmail.com> wrote:
Hi Alex,
Thanks for your reply. How do we integrate Tesseract with Solr? Do we have
to implement a custom update processor or extend the
ExtractingRequestHandler?


Regards
Suresh

On Wed, Oct 23, 2019 at 11:21 AM Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> I believe Tika, which powers this, can do so with extra libraries (Tesseract?),
> but Solr does not bundle those extras.
>
> In any case, you may want to run Tika externally so that the
> conversion/extraction process does not become a burden on Solr itself.
>
> Regards,
>      Alex
>
> On Wed, Oct 23, 2019, 1:58 PM suresh pendap, <sureshpen...@gmail.com>
> wrote:
>
> > Hello,
> > I am reading the Solr documentation about Solr's integration with Tika
> > through the Solr Cell framework here:
> >
> > https://lucene.apache.org/solr/guide/6_6/uploading-data-with-solr-cell-using-apache-tika.html
> >
> > I would like to know whether the Solr Cell framework can also be used to
> > extract text from image files.
> >
> > Regards
> > Suresh
> >
>
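For anyone following Alex's suggestion to run Tika outside Solr, here is a minimal Python sketch of that pipeline (standard library only). It assumes a standalone Tika Server on its default port 9998 with Tesseract installed on the same host, in which case Tika applies OCR to images, and a hypothetical Solr collection named `images` whose schema accepts an `id` field and a `content_txt` dynamic field. The URLs, collection name, and field names are assumptions to adapt. Only the extracted text is sent to Solr, so the extraction work runs in the Tika process rather than in Solr's heap, which is the point of keeping it external.

```python
import json
import urllib.request

# Assumed endpoints: a standalone Tika Server on its default port and a
# hypothetical Solr collection named "images". Adjust both for your setup.
TIKA_URL = "http://localhost:9998/tika"
SOLR_UPDATE_URL = "http://localhost:8983/solr/images/update?commit=true"


def build_tika_request(image_bytes: bytes, ocr_language: str = "eng") -> urllib.request.Request:
    """Build a PUT request asking Tika Server for the plain text of a file.

    If Tesseract is installed on the Tika host, Tika runs OCR on images;
    the X-Tika-OCRLanguage header selects the Tesseract language.
    """
    return urllib.request.Request(
        TIKA_URL,
        data=image_bytes,
        method="PUT",
        headers={"Accept": "text/plain", "X-Tika-OCRLanguage": ocr_language},
    )


def build_solr_request(doc_id: str, text: str) -> urllib.request.Request:
    """Build a JSON update request that indexes the extracted text.

    The field names "id" and "content_txt" are assumptions about the schema.
    """
    payload = json.dumps([{"id": doc_id, "content_txt": text}]).encode("utf-8")
    return urllib.request.Request(
        SOLR_UPDATE_URL,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def index_image(doc_id: str, path: str) -> None:
    """Extract text outside Solr, then send only that text to Solr."""
    with open(path, "rb") as f:
        image_bytes = f.read()
    # Step 1: OCR/extraction happens in the Tika Server process.
    with urllib.request.urlopen(build_tika_request(image_bytes)) as resp:
        text = resp.read().decode("utf-8")
    # Step 2: Solr only receives the resulting plain text.
    with urllib.request.urlopen(build_solr_request(doc_id, text)) as resp:
        resp.read()  # Solr replies with a small JSON status document
```

Usage would be something like `index_image("scan-001", "scan-001.png")`. Building the requests separately from sending them makes it easy to add retries, batching, or a different HTTP client without touching the extraction logic.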
