Uwe,

Here are some further manipulations; perhaps they are useful. I'm also on Debian (sid):

% xloadimage -identify fax000000095.tif
fax000000095.tif is a 3456x4677 single-plane black-on-white G4FAX TIFF image
  Titled "0389263661"
% xloadimage -identify fax95_cut.tif
fax95_cut.tif is a 1800x300 32-bit single-plane RGB standard TIFF image
  Titled "/tmp/fax95_cut.tif"
% convert fax95_cut.tif fax95_cut.pbm
% convert fax95_cut.pbm fax95_cut_bw.tif
% tesseract fax95_cut_bw.tif output && cat output.txt
Tesseract Open Source OCR Engine
AZLAN AT UNITEN DOT EDU DOT MY
% tesseract fax000000095.tif output -l eng && cat output.txt
Tesseract Open Source OCR Engine
AZLAN AT UNITEN DOT EDU DOT MY A
% tesseract fax000000095.tif output -l fra && cat output.txt
Tesseract Open Source OCR Engine
AZLAN AT UNITEN DOT EDU DOT MY *

Best regards,
Laird.

On Mar 14, 3:10 am, udippel <[email protected]> wrote:
> Following up on my earlier observations, I have now uploaded two
> files:
> fax000000095.tif
> which results in Debian (Lenny) as well as Ubuntu (8.10) with an extra
> 'A' at the end of the line using tesseract, One that definitively is
> not there at all ("... MY A", is what I get in Debian and Ubuntu; in
> case your tesseract works okay)
> fax95_cut.tif
> is the same image, with the area cut out in Gimp. Both on Debian and
> Ubuntu these result in an empty text file.
>
> Could someone please confirm, that these problems are limited to
> Debian/Ubuntu?
>
> Especially the first one seems to point to an imaging problem, rather
> than an OCR problem.
>
> Uwe
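
For repeated use, the two convert steps and the tesseract call from the transcript could be wrapped in a small shell function. This is only a sketch under assumptions: ocr_via_pbm and the _bw/_ocr file suffixes are hypothetical names chosen here, ImageMagick's convert and tesseract are assumed to be on PATH, and the input is assumed to have a file extension. It has only the one cropped fax from this thread to go on, so it may not help with other images.

```shell
#!/bin/sh
# Sketch only: force an image to 1-bit by round-tripping through PBM,
# then run tesseract on the result. ocr_via_pbm is a hypothetical
# helper name, not part of ImageMagick or tesseract.
# Usage: ocr_via_pbm IMAGE [LANG]    (LANG defaults to eng)
ocr_via_pbm() {
    in=$1
    lang=${2:-eng}
    base=${in%.*}    # strip the extension; assumes the file has one

    # PBM holds only black/white pixels, so this step discards the
    # 32-bit RGB data that the cropped TIFF carried.
    convert "$in" "$base.pbm" || return 1

    # Convert back to TIFF for tesseract.
    convert "$base.pbm" "${base}_bw.tif" || return 1

    # tesseract appends .txt to the output base name itself.
    tesseract "${base}_bw.tif" "${base}_ocr" -l "$lang" >/dev/null 2>&1 || return 1

    cat "${base}_ocr.txt"
}
```

With the file from this thread, the call would be: ocr_via_pbm fax95_cut.tif eng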

