OCR'ing PDFs is fiddly at the moment, and that's because of Tika, not Solr! We have an open ticket (TIKA-2749) to make it "just work", but we aren't there yet.

You have to tell Tika how you want to process images from PDFs via the tika-config.xml file.

You've seen this link in the links you mentioned:
https://wiki.apache.org/tika/TikaOCR

This one is key for PDFs:
https://wiki.apache.org/tika/PDFParser%20%28Apache%20PDFBox%29#OCR

On Fri, Nov 2, 2018 at 10:30 AM Furkan KAMACI <furkankam...@gmail.com> wrote:
>
> Hi All,
>
> I want to index images and pdf documents which have images into Solr. I
> test it with my Solr 6.3.0.
>
> I've installed tesseract at my computer (Mac). I verify that Tesseract
> works fine to extract text from an image.
>
> I index image into Solr but it has no content. However, as far as I know, I
> don't need to do anything else to integrate Tesseract with Solr.
>
> I've checked these but they were not useful for me:
>
> http://lucene.472066.n3.nabble.com/TIKA-OCR-not-working-td4201834.html
> http://lucene.472066.n3.nabble.com/Fwd-configuring-Solr-with-Tesseract-td4361908.html
>
> My question is, how can I support OCR with Solr?
>
> Kind Regards,
> Furkan KAMACI
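
For reference, a tika-config.xml of the kind the PDFParser wiki page describes might look roughly like the sketch below. It is an illustration rather than a tested setup: it assumes a Tika release recent enough to accept <params> on the PDF parser, and whether the Tika bundled with Solr 6.3.0 qualifies would need checking. The parameter values shown are only one possible choice.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Keep the default parsers, but exclude the PDF parser here
         so that it can be configured separately below. -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.pdf.PDFParser"/>
    </parser>
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        <!-- Extract images embedded in the PDF so they are passed on
             to the image/OCR parser (off by default). -->
        <param name="extractInlineImages" type="bool">true</param>
        <!-- Alternative approach: render each page and OCR the rendering.
             Accepted values include no_ocr, ocr_only and
             ocr_and_text_extraction. Left at no_ocr in this sketch. -->
        <param name="ocrStrategy" type="string">no_ocr</param>
      </params>
    </parser>
  </parsers>
</properties>
```

On the Solr side, the extracting request handler can be pointed at such a file through its tika.config setting in solrconfig.xml. Tesseract still has to be installed and findable by the JVM running Solr.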