RE: All 'e' come out as 'c'

Albert Law Fri, 27 Feb 2009 07:09:47 -0800

Hi,

Yes, I would also like to find the "don't suck button".  Just kidding.


It just sounds like typical OCR problems. Being human, I can tell 0 from O
from o from Q from @, but for a computer that is hard, especially at low
DPI and with font modifiers (e.g. bold and italics). So I would just
accept it as a fact of life and add a spell checker of sorts to scan the
output.

Unless you are saying that it works under Windows but not under Debian...



-
Albert 

-----Original Message-----
From: [email protected] [mailto:[email protected]] On 
Behalf Of udippel
Sent: Thursday, February 26, 2009 21:08
To: tesseract-ocr
Subject: All 'e' come out as 'c'


(I permit myself to pick this topic up, again, after a break of a few
months during which I had other obligations.)

My install is Debian, by now 5.0. I run tesseract out of the box. It
works pretty well, except that, under 4.0 and now under 5.0, every
lowercase 'e' is recognised as a lowercase 'c', irrespective of
resolution or font size. Any visual inspection shows the clearly
predominant horizontal stroke in the 'e's. As before, I can't work out
how to attach an image file that fails for us.

I wonder if anybody out there could please help me identify the
setting in one of those configuration files that makes it
recognise the lowercase 'e's properly.
Maybe I should add that we don't feed it any specific language or
dictionary. The characters to be recognised here are just supposed to
be recognised as such. We only need tesseract to recognise the
standard 128 ASCII characters.

Thanks in advance,

Uwe


