On Thursday, June 23, 2016 at 8:09:53 PM UTC-5, Quan Nguyen wrote:
>
> Tesseract cannot read PDF (which is a document format) directly. You'll 
> need to convert it to an image format first.
>
> On Thursday, June 23, 2016 at 7:12:13 PM UTC-5, John Muccigrosso wrote:
>>
>> I recently installed tesseract and am having some trouble with PDFs. The 
>> error is some form of:
>>
>> Error in fopenReadStream: file not found
>> %���� in pixRead: image file not found: %PDF-1.3
>> %���� cannot be read!
>> Error during processing.
>>
>> where the 1.3 may be 1.4 or 1.6. Things are fine with a jpg or tiff 
>> version of the same PDF (created by exporting from Preview.app).
>>
>> System: Mac OS X 10.9.5.
>> "tesseract -v" reports:
>>
>> tesseract 3.04.01
>>  leptonica-1.72
>>   libjpeg 8d : libpng 1.6.23 : libtiff 4.0.6 : zlib 1.2.5
>>
>>
>> I installed tesseract and leptonica with homebrew and "brew info 
>> tesseract" reports:
>>
>> tesseract: stable 3.04.01 (bottled), HEAD
>> OCR (Optical Character Recognition) engine
>> https://github.com/tesseract-ocr/
>> /usr/local/Cellar/tesseract/3.04.01_1 (93 files, 39.5M) *
>>   Poured from bottle on 2016-05-27 at 15:41:15
>> From: https://github.com/Homebrew/homebrew-core/blob/master/Formula/tesseract.rb
>> ==> Dependencies
>> Required: leptonica ✔
>> Recommended: libtiff ✔
>> ==> Options
>> --with-all-languages
>>  Install recognition data for all languages
>> --with-opencl
>>  Enable OpenCL support
>> --with-training-tools
>>  Install OCR training tools
>> --without-libtiff
>>  Build without libtiff support
>> --HEAD
>>  Install HEAD version
>>
>>
>> I suspect some missing package or something similar, but don't know what 
>> exactly.
>>
>> TIA.
>>
>
Ugh, of course. Thanks! 
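
Following Quan's advice, here is a minimal sketch of the convert-then-OCR step in Python. It assumes Ghostscript's `gs` is installed (on macOS, e.g. `brew install ghostscript`) and that `tesseract` is on the PATH. The helper names `is_pdf`, `gs_render_cmd` and `ocr_file`, the grayscale TIFF output device and the 300 dpi default are illustrative choices of this sketch, not anything Tesseract itself requires. The `%PDF-` check mirrors the `%PDF-1.3` header that shows up in the error above.

```python
import shutil
import subprocess
from pathlib import Path

# Every PDF file starts with this header, e.g. "%PDF-1.3" as in the error above.
PDF_MAGIC = b"%PDF-"


def is_pdf(path):
    """Return True if the file begins with the PDF header rather than image data."""
    with open(path, "rb") as f:
        return f.read(len(PDF_MAGIC)) == PDF_MAGIC


def gs_render_cmd(pdf_path, out_pattern, dpi=300):
    """Build (but do not run) a Ghostscript command that renders each page
    to a grayscale TIFF. out_pattern needs a printf-style page counter
    such as %03d. The device and resolution are assumptions, not requirements."""
    return [
        "gs", "-dNOPAUSE", "-dBATCH", "-dSAFER",
        "-sDEVICE=tiffgray",
        f"-r{dpi}",
        f"-sOutputFile={out_pattern}",
        str(pdf_path),
    ]


def ocr_file(path, out_base, dpi=300):
    """OCR an image as-is, or rasterize a PDF first and OCR each page.

    Returns the list of .txt files that tesseract should have written.
    """
    path = Path(path)
    out_base = Path(out_base)
    if not is_pdf(path):
        # TIFF, JPEG, PNG, ... can go straight to tesseract.
        subprocess.run(["tesseract", str(path), str(out_base)], check=True)
        return [Path(str(out_base) + ".txt")]

    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) not found; on macOS try: brew install ghostscript")

    # Render the PDF to out_base-001.tif, out_base-002.tif, ...
    pattern = f"{out_base}-%03d.tif"
    subprocess.run(gs_render_cmd(path, pattern, dpi), check=True)

    outputs = []
    for page in sorted(out_base.parent.glob(out_base.name + "-*.tif")):
        page_base = page.with_suffix("")  # drop only the .tif extension
        subprocess.run(["tesseract", str(page), str(page_base)], check=True)
        outputs.append(Path(str(page_base) + ".txt"))
    return outputs
```

Calling `ocr_file("scan.pdf", "scan")` would write `scan-001.tif`, `scan-001.txt` and so on in the current directory. Because pages are found by globbing, leftover TIFFs from an earlier run with the same base name would be OCR'd too, so a fresh base name or directory is safer.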

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/dca78ff6-3dac-4127-ae03-e8879a651973%40googlegroups.com.