Re: All 'e' come out as 'c'

udippel Sat, 28 Feb 2009 04:31:49 -0800

On Feb 27, 11:10 pm, "Albert Law" <[email protected]> wrote:

> Yes, I would also like to find the "don't suck button".  Just kidding.
>
> It just sounds like typical OCR problems.  Being a human I can figure out 0
> from O from o from Q from @.  But for a computer to do so is hard
> especially with small DPIs and with font modifiers (e.g. bold and
> italics).  So I would just accept it as reality and add a spell checker
> of sorts to scan the output.

Not all that funny, I am sorry. Please check
http://metalab.uniten.edu.my/~udippel/tess_no_e_and_dotcom.tif
for reference. Tesseract 2.03 reads the 'e' as a 'c', and it does so in
plenty of trials at different resolutions. I think any visual
inspection will clearly show that the 'e' is an 'e', and very
unambiguously so. [ocrad, by the way, does not have this problem at
all. It has others.]
That it concatenates 'dot com' to 'dotcom' is a typical OCR
problem, I agree.
And no, we are not trying to harvest e-mail addresses. We are trying to let
hylafax route faxes to the intended receivers.
Therefore, to answer you as well as Jimmy: we have no way to spell check.

Thank you,

Uwe

