Have you tried isolating just the α and checking whether it is
identified correctly in single-character mode?
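
As a rough sketch of that test: crop one α into its own image and run
Tesseract on it in single-character mode. The helper below only builds
the command line. It assumes the 3.0x command-line syntax, where
`-psm 10` means "treat the image as a single character". The file name
`alpha.tif`, the output base `alpha` and the language name `grc` are
placeholders for whatever your setup uses.

```python
import shlex

# Page segmentation mode 10: treat the image as a single character
# (assumes the Tesseract 3.0x "-psm" command-line option).
PSM_SINGLE_CHAR = 10


def single_char_command(image_path, output_base, lang="grc"):
    """Build the argv for a single-character Tesseract run.

    image_path, output_base and lang are placeholders; "grc" is an
    assumed name for the Polytonic Greek traineddata.
    """
    return [
        "tesseract", image_path, output_base,
        "-l", lang,
        "-psm", str(PSM_SINGLE_CHAR),
    ]


def as_shell(argv):
    """Render an argv list as a shell-safe command string."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # Print the command rather than running it, so it can be inspected first.
    print(as_shell(single_char_command("alpha.tif", "alpha")))
```

If the isolated α comes out right here but wrong in running text, the
problem is more likely in segmentation or word-level correction than in
the character classifier itself.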

On Thu, Jul 12, 2012 at 7:50 AM, Nick White <[email protected]> wrote:

> Hi folks,
>
> My Polytonic Greek training is going swimmingly, but there is one
> obstacle which is still beating me. In many fonts alpha (α) looks
> quite similar to omicron followed by iota (οι), and Tesseract gets
> this wrong quite often. I added a rule to unicharambigs to suggest
> that one may be substituted for the other, but it hasn't made much
> difference (and both are quite common forms, so I can't use the
> 'always replace' mode).
>
> Does anybody have any suggestions on how to improve this? Any
> configuration variables that could be tweaked? The training set is
> already a good size, so I don't think more tif/box sets would help.
>
> Any thoughts would be much appreciated.
>
> Nick
>
> P.S. This is with Tesseract 3.02 (latest SVN), but v2 behaves
> similarly.
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>

