On Thursday, June 23, 2016 at 8:09:53 PM UTC-5, Quan Nguyen wrote:
>
> Tesseract cannot read PDF (which is a document format) directly. You'll
> need to convert it to an image format first.
>
> On Thursday, June 23, 2016 at 7:12:13 PM UTC-5, John Muccigrosso wrote:
>>
>> Recently installed tesseract and am having some trouble with PDFs. The
>> error is some form of:
>>
>> Error in fopenReadStream: file not found
>> %���� in pixRead: image file not found: %PDF-1.3
>> %���� cannot be read!
>> Error during processing.
>>
>> where the 1.3 may be 1.4 or 1.6. Things are fine with a jpg or tiff
>> version of the same PDF (created by exporting from Preview.app).
>>
>> System: Mac OS X 10.9.5.
>> "tesseract -v" reports:
>>
>> tesseract 3.04.01
>> leptonica-1.72
>> libjpeg 8d : libpng 1.6.23 : libtiff 4.0.6 : zlib 1.2.5
>>
>> I installed tesseract and leptonica with homebrew and "brew info
>> tesseract" reports:
>>
>> tesseract: stable 3.04.01 (bottled), HEAD
>> OCR (Optical Character Recognition) engine
>> https://github.com/tesseract-ocr/
>> /usr/local/Cellar/tesseract/3.04.01_1 (93 files, 39.5M) *
>> Poured from bottle on 2016-05-27 at 15:41:15
>> From: https://github.com/Homebrew/homebrew-core/blob/master/Formula/tesseract.rb
>> ==> Dependencies
>> Required: leptonica ✔
>> Recommended: libtiff ✔
>> ==> Options
>> --with-all-languages
>>     Install recognition data for all languages
>> --with-opencl
>>     Enable OpenCL support
>> --with-training-tools
>>     Install OCR training tools
>> --without-libtiff
>>     Build without libtiff support
>> --HEAD
>>     Install HEAD version
>>
>> I suspect some missing package or something similar, but don't know what
>> exactly.
>>
>> TIA.
>
Ugh, of course. Thanks!
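Quan's advice can be turned into a small pre-processing step. The following is a minimal Python sketch, not something from the thread. It assumes Ghostscript (`gs`) and `tesseract` are installed and on PATH, and the helper names (`has_pdf_header`, `is_pdf`, `ghostscript_cmd`, `tesseract_cmd`, `ocr`) are hypothetical. It checks for the `%PDF-` header that shows up in the error output above (Tesseract 3.04 appears to have treated the file's bytes as a list of image file names, hence "image file not found: %PDF-1.3") and, for PDFs, renders the pages into a 300 dpi grayscale multi-page TIFF before running OCR.

```python
"""Sketch: rasterize a PDF before handing it to Tesseract.

Assumes `gs` (Ghostscript) and `tesseract` are on PATH. All helper
names here are hypothetical, for illustration only.
"""
import subprocess
import sys
from pathlib import Path

PDF_MAGIC = b"%PDF-"  # a well-formed PDF begins with this header


def has_pdf_header(data):
    """Return True if the leading bytes match the PDF header."""
    return data.startswith(PDF_MAGIC)


def is_pdf(path):
    """Read just enough bytes from the file to check the PDF header."""
    with open(path, "rb") as f:
        return has_pdf_header(f.read(len(PDF_MAGIC)))


def ghostscript_cmd(pdf_path, tiff_path, dpi=300):
    """Build a Ghostscript command that renders every page of a PDF
    into a single grayscale, multi-page TIFF at the given resolution."""
    return [
        "gs", "-q", "-dNOPAUSE", "-dBATCH", "-dSAFER",
        "-sDEVICE=tiffgray",
        f"-r{dpi}",
        f"-sOutputFile={tiff_path}",
        str(pdf_path),
    ]


def tesseract_cmd(image_path, output_base):
    """Build the basic `tesseract imagename outputbase` command."""
    return ["tesseract", str(image_path), str(output_base)]


def ocr(path):
    """Rasterize PDFs first; pass image files straight to Tesseract."""
    path = Path(path)
    image = path
    if is_pdf(path):
        # Hypothetical naming: scan.pdf -> scan.tif (overwrites an existing file)
        image = path.with_suffix(".tif")
        subprocess.run(ghostscript_cmd(path, image), check=True)
    # Tesseract writes its text to <output_base>.txt, e.g. scan.txt
    subprocess.run(tesseract_cmd(image, path.with_suffix("")), check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    ocr(sys.argv[1])
```

Run as, say, `python ocr_pdf.py scan.pdf` (hypothetical script name), this would produce `scan.tif` and then `scan.txt`. Poppler's `pdftoppm` would serve equally well in place of Ghostscript; the point is only that Tesseract receives an image rather than a PDF.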
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
Visit this group at https://groups.google.com/group/tesseract-ocr.

