I recently installed tesseract and am having some trouble with PDFs. The error is some form of:

    Error in fopenReadStream: file not found %���� in pixRead: image file not found: %PDF-1.3 %���� cannot be read!
    Error during processing.

where the 1.3 may be 1.4 or 1.6. Things are fine with a jpg or tiff version of the same PDF (created by exporting from Preview.app).

System: Mac OS X 10.9.5.

"tesseract -v" reports:

    tesseract 3.04.01
    leptonica-1.72
    libjpeg 8d : libpng 1.6.23 : libtiff 4.0.6 : zlib 1.2.5

I installed tesseract and leptonica with homebrew, and "brew info tesseract" reports:

    tesseract: stable 3.04.01 (bottled), HEAD
    OCR (Optical Character Recognition) engine
    https://github.com/tesseract-ocr/
    /usr/local/Cellar/tesseract/3.04.01_1 (93 files, 39.5M) *
      Poured from bottle on 2016-05-27 at 15:41:15
    From: https://github.com/Homebrew/homebrew-core/blob/master/Formula/tesseract.rb
    ==> Dependencies
    Required: leptonica ✔
    Recommended: libtiff ✔
    ==> Options
    --with-all-languages
        Install recognition data for all languages
    --with-opencl
        Enable OpenCL support
    --with-training-tools
        Install OCR training tools
    --without-libtiff
        Build without libtiff support
    --HEAD
        Install HEAD version

I suspect a missing package or something similar, but I don't know what exactly. TIA.
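For reference, the "%PDF-1.3" in the error looks like the first bytes of the PDF itself, which would suggest the file is being opened as if it were an image. As far as I know, tesseract reads its input through leptonica's image readers and does not take PDF as an input format, which would fit with the jpg and tiff exports working. Below is a minimal Python sketch for checking a file's header before handing it to tesseract. The function name pdf_version is hypothetical and is not part of tesseract or leptonica.

```python
import os
import re
import tempfile

# A PDF file begins with "%PDF-<major>.<minor>", the same text echoed in the error.
PDF_HEADER = re.compile(rb"%PDF-(\d\.\d)")


def pdf_version(path):
    """Return the PDF version (e.g. '1.3') if the file starts with a PDF header, else None.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as fh:
        head = fh.read(16)
    match = PDF_HEADER.match(head)
    return match.group(1).decode("ascii") if match else None


if __name__ == "__main__":
    # Write a tiny stand-in file that carries only a PDF header, then inspect it.
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(b"%PDF-1.3\n%\xe2\xe3\xcf\xd3\n")
    try:
        version = pdf_version(tmp.name)
        if version is not None:
            print("PDF header found (version %s); export to tiff or jpg first" % version)
        else:
            print("No PDF header; file can be tried as an image")
    finally:
        os.remove(tmp.name)
```

If the check reports a PDF header, the workaround that already works here applies: export the pages to tiff or jpg and run tesseract on those.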

