Improving a tricky character recognition error

Nick White Thu, 12 Jul 2012 09:13:18 -0700

Hi folks,

My Polytonic Greek training is going swimmingly, but there is one
obstacle which is still beating me. In many fonts alpha (α) looks
quite similar to omicron followed by iota (οι), and Tesseract gets
this wrong quite often. I added a rule to unicharambigs to suggest
that one may be substituted for the other, but it hasn't made much
difference, and both are quite common forms, so I can't use the
'always replace' mode.
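
For anyone unfamiliar with the format, here is a rough sketch of what
an optional (not 'always replace') pair of unicharambigs rules for
this confusion could look like. It assumes the tab-separated layout
described in the Tesseract 3 training documentation: source count,
source unichars, target count, target unichars, and a type flag
(0 = optional, 1 = always replace). The helper name ambig_line is
hypothetical and not part of Tesseract, and the exact rules in my own
file may differ.

```python
def ambig_line(source, target, mandatory=False):
    """Format one unicharambigs entry.

    Hypothetical helper, not part of Tesseract. `source` and `target`
    are lists of unichars, one list item per unichar.
    Assumed field layout (tab-separated):
      source count, source unichars, target count, target unichars, type
    where type 0 is an optional substitution and 1 is 'always replace'.
    """
    fields = [
        str(len(source)),
        " ".join(source),
        str(len(target)),
        " ".join(target),
        "1" if mandatory else "0",
    ]
    return "\t".join(fields)


# Optional rules in both directions, since both forms are common.
rules = [
    ambig_line(["ο", "ι"], ["α"]),
    ambig_line(["α"], ["ο", "ι"]),
]

print("\n".join(rules))
```

Passing the unichars as a list rather than a plain string keeps the
count correct even when a single unichar is made of several code
points, as can happen with polytonic accents.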


Does anybody have any suggestions on how to improve this? Are there
any configuration variables that could be tweaked? The training set
is already a good size, so I don't think more tif/box sets would help.

Any thoughts would be much appreciated.

Nick

P.S. This is with Tesseract 3.02 (latest SVN), but v2 behaves
similarly.

