[tesseract-ocr] issues with pdf

Simon Eigeldinger Mon, 06 Oct 2014 11:04:16 -0700

hi all,

just tried the following with the provided eurotext.tif from the testing directory of the source package.

used current git from this afternoon european time:


i get this:

$ tesseract eurotext.tif eurotext -l eng pdf

Tesseract Open Source OCR Engine v3.04.00 with Leptonica
Page 1
Error in fopenWriteStream: stream not opened
Error in pixWrite: stream not opened
Error in fopenReadStream: file not found
Error in extractG4DataFromFile: stream not opened to file
Error in l_generateG4Data: datacomp not extracted
Error in pixGenerateCIData: g4 data not made
Error in l_generateCIDataForPdf: file eurotext.tif format is 4; unreadable
Error during processing.

the text file is fine, but the pdf is only 4 kb and adobe reader doesn't like the file either.
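
in case it helps anyone reproduce this: a quick way to tell whether the pdf was cut off, rather than being a reader problem, is to look at its first and last bytes. this is only a minimal sketch in python, not anything from tesseract itself. the function names are made up for illustration, and it assumes the output file is called eurotext.pdf as in the command above. it only checks that the file starts with the `%PDF-` header and has an `%%EOF` marker near the end, so a file can pass this and still be broken in other ways.

```python
from pathlib import Path


def looks_like_complete_pdf(data: bytes) -> bool:
    """Rough structural check, not a full PDF validation.

    A well-formed PDF starts with a "%PDF-" header and has an "%%EOF"
    marker close to the end of the file. A file that was cut off while
    being written usually lacks the end marker.
    """
    has_header = data.startswith(b"%PDF-")
    has_eof_marker = b"%%EOF" in data[-1024:]
    return has_header and has_eof_marker


def check_file(path: str) -> bool:
    """Read the file at `path` (hypothetical helper) and report the result."""
    data = Path(path).read_bytes()
    ok = looks_like_complete_pdf(data)
    print(f"{path}: {len(data)} bytes, header and end marker present: {ok}")
    return ok
```

calling `check_file("eurotext.pdf")` in the output directory prints the size and whether both markers were found.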



here are the files:
https://dl.dropboxusercontent.com/u/1598766/tesseract-error.7z


language data is from the tesseract git repository as well.

greetings,
simon
