RE: Solr OCR Support

2018-11-04 Thread Terry Steichen
arser);
>
>-----Original Message-----
>From: Furkan KAMACI
>Sent: Saturday, 3 November 2018 03:30
>To: solr-user@lucene.apache.org
>Subject: Solr OCR Support
>
>Hi All,
>
>I want to index images and pdf documents which have images into Solr. I
>test it with my S

RE: Solr OCR Support

2018-11-04 Thread Phil Scadden
(PDFParserConfig.OCR_STRATEGY.NO_OCR);
context.set(PDFParserConfig.class, pdfConfig);
context.set(Parser.class, parser);
-----Original Message-----
From: Furkan KAMACI
Sent: Saturday, 3 November 2018 03:30
To: solr-user@lucene.apache.org
Subject: Solr OCR Support
Hi All,
I want

Re: Solr OCR Support

2018-11-02 Thread Tim Allison
g Nuance (or tesseract), I just wish to point out that
> what to OCR is important, because OCR works well when it has good input.
>
> > -----Original Message-----
> > From: Tim Allison
> > Sent: Friday, November 2, 2018 11:03 AM
> > To: solr-user@lucene.apache.org

RE: Solr OCR Support

2018-11-02 Thread Davis, Daniel (NIH/NLM) [C]
11:03 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr OCR Support
>
> OCR'ing of PDFs is fiddly at the moment because of Tika, not Solr! We
> have an open ticket to make it "just work", but we aren't there yet
> (TIKA-2749).
>
> You have to tell Tika how you want

Re: Solr OCR Support

2018-11-02 Thread Tim Allison
OCR'ing of PDFs is fiddly at the moment because of Tika, not Solr! We have an open ticket to make it "just work", but we aren't there yet (TIKA-2749). You have to tell Tika how you want to process images from PDFs via the tika-config.xml file. You've seen this link in the links you mentioned:

Solr OCR Support

2018-11-02 Thread Furkan KAMACI
Hi All, I want to index images, and PDF documents that contain images, into Solr. I tested this with Solr 6.3.0. I've installed Tesseract on my computer (Mac) and verified that it extracts text from an image correctly. I indexed an image into Solr, but it has no content. However, as far as I know,