Re: OCR Tika to read PDF, txt and doc docx

Karl Wright Fri, 05 Jan 2018 09:39:49 -0800

Hi,

It's pretty straightforward.  EITHER you configure your Solr output
connection to use the extracting update handler and Solr Cell (the
default), so that Tika runs on the Solr side, OR you configure it to use
the standard update handler and insert the Tika Extractor as a document
transformer in your job's pipeline, so that Tika runs on the ManifoldCF
side and Solr receives already-extracted text.
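
To make the first option concrete, here is a minimal sketch of the kind of
request Solr's extracting update handler (Solr Cell, normally mounted at
/update/extract) accepts: the raw file bytes are posted and Tika extracts
the text on the Solr side. This is not what ManifoldCF sends internally; the
base URL, the core name "docs", and the helper name build_extract_request
are assumptions made purely for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(solr_base, core, doc_id, payload,
                          content_type="application/octet-stream",
                          commit=False):
    """Build (but do not send) a POST to Solr's extracting update handler.

    solr_base and core are hypothetical values for illustration; adjust
    them to your own Solr installation.
    """
    # literal.<field> sets a field value directly on the indexed document.
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    url = f"{solr_base.rstrip('/')}/{core}/update/extract?{urlencode(params)}"
    # The request body is the raw binary file; Solr hands it to Tika.
    return Request(url, data=payload,
                   headers={"Content-Type": content_type}, method="POST")
```

Passing the returned object to urllib.request.urlopen would send it to a
running Solr instance. With the second option no such binary upload takes
place, because the text has already been extracted before it reaches Solr.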


Karl

On Fri, Jan 5, 2018 at 12:19 PM, msaunier <[email protected]> wrote:

> Sorry, I made a mistake. I need the text *content* of PDF, txt, doc and
> docx files to index in Solr.
>
> Thanks for your help.
>
> *From:* msaunier [mailto:[email protected]]
> *Sent:* Friday, January 5, 2018 18:05
> *To:* [email protected]
> *Subject:* OCR Tika to read PDF, txt and doc docx
>
>
>
> Hello,
>
>
>
> How can I install and use OCR to extract the content_html from files with
> ManifoldCF?
>
> I need the HTML content.
>
>
>
> Thanks for your help,
