Re: regarding Extracting text from Images

Edward Ribeiro Sat, 26 Oct 2019 09:56:36 -0700

No. You should install tesseract-ocr on the same box as your Solr instance
and configure Solr so that the embedded Tika can use Tesseract to OCR
images.
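
A minimal sketch of how one might sanity-check this setup from a script,
assuming the Tesseract executable is named "tesseract" and that Solr exposes
the Solr Cell handler at /update/extract. The base URL, the collection name
"mycollection", and the document id "scan-001" are made up for illustration,
and the helper names are hypothetical:

```python
import shutil
from urllib.parse import urlencode


def tesseract_available(binary="tesseract"):
    """Return True if the given executable is found on this machine's PATH."""
    return shutil.which(binary) is not None


def extract_url(base_url, collection, doc_id, commit=True):
    """Build the Solr Cell (/update/extract) URL for posting a single file.

    base_url and collection are placeholders; adjust them to your install.
    """
    params = {"literal.id": doc_id, "commit": "true" if commit else "false"}
    return f"{base_url.rstrip('/')}/{collection}/update/extract?{urlencode(params)}"


if __name__ == "__main__":
    # Tika can only hand images to Tesseract if the binary is installed
    # on the same machine that runs Solr.
    print("tesseract on PATH:", tesseract_available())
    print(extract_url("http://localhost:8983/solr", "mycollection", "scan-001"))
```

The image file would then be POSTed to the printed URL. Whether OCR actually
runs depends on your Tika and Tesseract versions, so treat this as a starting
point only.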


Best,
Edward

On Wed, Oct 23, 2019 at 20:08, suresh pendap <sureshpen...@gmail.com>
wrote:

> Hi Alex,
> Thanks for your reply. How do we integrate Tesseract with Solr? Do we have
> to implement a custom update processor or extend the
> ExtractingRequestProcessor?
>
> Regards
> Suresh
>
> On Wed, Oct 23, 2019 at 11:21 AM Alexandre Rafalovitch <arafa...@gmail.com>
> wrote:
>
> > I believe Tika, which powers this, can do so with extra libraries
> > (tesseract?), but Solr does not bundle those extras.
> >
> > In any case, you may want to run Tika externally so that the
> > conversion/extraction process does not become a burden on Solr itself.
> >
> > Regards,
> >      Alex
> >
> > On Wed, Oct 23, 2019, 1:58 PM suresh pendap, <sureshpen...@gmail.com>
> > wrote:
> >
> > > Hello,
> > > I am reading the Solr documentation about integration with Tika and the
> > > Solr Cell framework here:
> > >
> > > https://lucene.apache.org/solr/guide/6_6/uploading-data-with-solr-cell-using-apache-tika.html
> > >
> > > I would like to know if the Solr Cell framework can also be used to
> > > extract text from image files.
> > >
> > > Regards
> > > Suresh
> > >
> >
>
