Is there a solution to this, or am I going to have to dig into the sources? Thanks!
in.tif: https://drive.google.com/open?id=0B4f6QpD8ItHyYmdYLWF1WGRFSTQ
[the actual TIF is nothing you'd ever want to OCR but the error below impedes batch conversion of the document]

$ file in.tif
in.tif: TIFF image data, little-endian, direntries=16, height=2558, bps=1, compression=none, PhotometricIntepretation=BlackIsZero, orientation=upper-left, width=1667

$ tesseract in.tif out -l eng pdf
Tesseract Open Source OCR Engine v3.04.01 with Leptonica
Page 1
Too few characters. Skipping this page
OSD: Weak margin (0.00) for 4 blob text block, but using orientation anyway: 0
Error in fopenWriteStream: stream not opened
Error in pixWrite: stream not opened
Error in fopenReadStream: file not found
Error in extractG4DataFromFile: stream not opened to file
Error in l_generateG4Data: datacomp not extracted
Error in pixGenerateCIData: g4 data not made
Error in l_generateCIDataForPdf: file in.tif format is 4; unreadable
Error during processing.

$ tesseract -v
tesseract 3.04.01
 leptonica-1.73
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.0) : libpng 1.6.25 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.5.0 : libopenjp2 2.1.0