Is there a solution to this, or am I going to have to dig into the
sources? Thanks!

in.tif: https://drive.google.com/open?id=0B4f6QpD8ItHyYmdYLWF1WGRFSTQ
[The actual TIFF is nothing you'd ever want to OCR, but the error below
blocks batch conversion of the document.]

$ file in.tif
in.tif: TIFF image data, little-endian, direntries=16, height=2558, bps=1, compression=none, PhotometricIntepretation=BlackIsZero, orientation=upper-left, width=1667

$ tesseract in.tif out -l eng pdf
Tesseract Open Source OCR Engine v3.04.01 with Leptonica
Page 1
Too few characters. Skipping this page
OSD: Weak margin (0.00) for 4 blob text block, but using orientation anyway: 0
Error in fopenWriteStream: stream not opened
Error in pixWrite: stream not opened
Error in fopenReadStream: file not found
Error in extractG4DataFromFile: stream not opened to file
Error in l_generateG4Data: datacomp not extracted
Error in pixGenerateCIData: g4 data not made
Error in l_generateCIDataForPdf: file in.tif format is 4; unreadable
Error during processing.
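In the meantime, this is roughly how I'd keep a batch run from stopping at a page like this one. It's a minimal POSIX sh sketch, not a fix for the error itself. It assumes tesseract exits with a non-zero status when it prints "Error during processing." and that each input gets its own PDF. `ocr_batch` and `OCR_CMD` are hypothetical names of my own; `OCR_CMD` only lets you substitute another command.

```shell
#!/bin/sh
# Hypothetical helper: OCR each TIFF to a PDF, log failures, keep going.
# Assumes tesseract returns a non-zero exit status on "Error during processing."
ocr_batch() {
  ocr_cmd=${OCR_CMD:-tesseract}   # overridable command (hypothetical variable)
  failed=0
  for f in "$@"; do
    base=${f%.*}                  # in.tif -> in, so tesseract writes in.pdf
    if ! $ocr_cmd "$f" "$base" -l eng pdf; then
      echo "OCR failed for $f; continuing" >&2
      failed=$((failed + 1))
    fi
  done
  echo "$failed file(s) failed" >&2
  [ "$failed" -eq 0 ]             # non-zero status if any file failed
}
```

Usage would be something like `ocr_batch *.tif`; the failing files are listed on stderr so they can be dealt with separately.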

$ tesseract -v
tesseract 3.04.01
leptonica-1.73
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.0) : libpng 1.6.25 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.5.0 : libopenjp2 2.1.0

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/cf108f05-0d54-4b22-808b-112c28e1852f%40googlegroups.com.