Tesseract cannot read PDF files directly: PDF is a document format, not an image format. Convert each page to an image (JPEG or TIFF, as you already found, or PNG) first, then run Tesseract on that image.
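
If you want to script the conversion rather than export from Preview.app by hand, here is a rough sketch in Python. It assumes `pdftoppm` from Poppler is installed (for example via `brew install poppler`); that tool is my suggestion and is not part of Tesseract. The function names (`pdftoppm_command`, `tesseract_command`, `ocr_pdf`) and the 300 dpi default are made up for illustration, so adjust them to your setup.

```python
import subprocess
import tempfile
from pathlib import Path


def pdftoppm_command(pdf_path, out_prefix, dpi=300):
    """Build the pdftoppm call that renders every page of a PDF to PNG."""
    # Assumption: pdftoppm (Poppler) is on PATH; 300 dpi is only a starting point.
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(out_prefix)]


def tesseract_command(image_path, out_base, lang="eng"):
    """Build the tesseract call; tesseract appends .txt to out_base itself."""
    return ["tesseract", str(image_path), str(out_base), "-l", lang]


def ocr_pdf(pdf_path, lang="eng", dpi=300):
    """Render each page to an image, OCR it, and return the combined text."""
    pages = []
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        subprocess.run(pdftoppm_command(pdf_path, prefix, dpi), check=True)
        # pdftoppm writes page-1.png, page-2.png, ... (zero-padded for longer
        # documents), so sorting by file name keeps the pages in order.
        for image in sorted(Path(tmp).glob("page-*.png")):
            out_base = image.with_suffix("")
            subprocess.run(tesseract_command(image, out_base, lang), check=True)
            pages.append(out_base.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(pages)
```

Calling `ocr_pdf("scan.pdf")` (with a hypothetical `scan.pdf`) should return the recognized text of all pages as one string. Any other PDF-to-image converter would work just as well; the only requirement is that Tesseract is handed an image rather than the PDF itself.
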

On Thursday, June 23, 2016 at 7:12:13 PM UTC-5, John Muccigrosso wrote:
>
> Recently installed tesseract and am having some trouble with PDFs. The
> error is some form of:
>
> Error in fopenReadStream: file not found
> %���� in pixRead: image file not found: %PDF-1.3
> %���� cannot be read!
> Error during processing.
>
> where the 1.3 may be 1.4 or 1.6. Things are fine with a jpg or tiff
> version of the same PDF (created by exporting from Preview.app).
>
> System: Mac OS X 10.9.5.
> "tesseract -v" reports:
>
> tesseract 3.04.01
> leptonica-1.72
> libjpeg 8d : libpng 1.6.23 : libtiff 4.0.6 : zlib 1.2.5
>
>
> I installed tesseract and leptonica with homebrew and "brew info
> tesseract" reports:
>
> tesseract: stable 3.04.01 (bottled), HEAD
> OCR (Optical Character Recognition) engine
> https://github.com/tesseract-ocr/
> /usr/local/Cellar/tesseract/3.04.01_1 (93 files, 39.5M) *
> Poured from bottle on 2016-05-27 at 15:41:15
> From: https://github.com/Homebrew/homebrew-core/blob/master/Formula/tesseract.rb
> ==> Dependencies
> Required: leptonica ✔
> Recommended: libtiff ✔
> ==> Options
> --with-all-languages
> Install recognition data for all languages
> --with-opencl
> Enable OpenCL support
> --with-training-tools
> Install OCR training tools
> --without-libtiff
> Build without libtiff support
> --HEAD
> Install HEAD version
>
>
> I suspect some missing package or something similar, but don't know what
> exactly.
>
> TIA.
>

