Hi, Yes, I would also like to find the "don't suck button". Just kidding.
It sounds like a typical OCR problem. As a human I can tell 0 from O from o from Q from @, but that is hard for a computer, especially at low DPI and with font modifiers (e.g. bold and italics). So I would just accept it as reality and add a spell checker of sorts to scan the output.

Unless you are saying that it works under Windows but not under Debian....

- Albert

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of udippel
Sent: Thursday, February 26, 2009 21:08
To: tesseract-ocr
Subject: All 'e' come out as 'c'

(I permit myself to pick this topic up again, after a break of a few months during which I had other obligations.)

My install is Debian, by now 5.0. I run tesseract out of the box. It works pretty well, except that - under 4.0 and now under 5.0 - all lowercase 'e' are recognised as lowercase 'c', irrespective of resolution or font size. Any optical inspection reveals the clear predominance of the horizontal stroke in the 'e'-s.

Like before, I can't make out how to attach an image file that fails for us.

I wonder if anybody out there could please help me identify the setting in one of those configuration files so that it starts to recognize the lowercase 'e'-s properly.

Maybe I should add that we don't feed it any specific language/dictionary. The characters here are just supposed to be recognised as such. We only need tesseract to recognize the standard ASCII-128 characters.

Thanks in advance,

Uwe
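
On the "spell checker of sorts" idea from the reply: a rough sketch of what such a pass over the OCR output might look like, in Python. Everything here is an assumption for illustration. The confusion table, the function names and the tiny vocabulary are made up, and nothing in it is part of tesseract. It also only helps when the text consists of words from a known list, which may not match the setup Uwe describes (no language or dictionary).

```python
from itertools import product

# Hypothetical confusion table: character seen in the OCR output ->
# characters it might really stand for. Extend it to match your own errors.
CONFUSIONS = {"c": "ce", "0": "0Oo", "O": "O0", "l": "l1I", "1": "1lI"}


def candidates(word):
    """Yield every spelling reachable by swapping confusable characters."""
    options = [CONFUSIONS.get(ch, ch) for ch in word]
    for combo in product(*options):
        yield "".join(combo)


def correct(word, vocabulary):
    """Return a known word unchanged, else a unique in-vocabulary candidate.

    If no candidate or more than one candidate is in the vocabulary,
    the word is left as it came out of the OCR.
    """
    if word.lower() in vocabulary:
        return word
    matches = {c for c in candidates(word) if c.lower() in vocabulary}
    return matches.pop() if len(matches) == 1 else word


def correct_text(text, vocabulary):
    """Correct whitespace-separated tokens; punctuation is not handled."""
    return " ".join(correct(w, vocabulary) for w in text.split())
```

With the made-up vocabulary `{"the", "letter", "once"}`, `correct_text("thc lcttcr", vocabulary)` returns `"the letter"`. The number of candidates grows exponentially with the number of confusable characters in a word, so a real pass would want a length limit or a smarter search.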

