Hi Nick,
Have you done anything with dictionary entries? From previous reports
it seems to me that having whole words would improve the chance of a
correct choice.
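
To make the idea concrete, here is a minimal sketch of dictionary-based disambiguation for the α / οι confusion. It only illustrates the principle and is not how Tesseract consults its word lists internally. The names `variants` and `pick` and the tiny unaccented lexicon are hypothetical, and real polytonic text would also need the accented forms of alpha handled.

```python
from itertools import product

# The confusable pair from Nick's mail: alpha vs. omicron followed by iota.
AMBIGUOUS = ("α", "οι")


def variants(word):
    """Return every spelling of `word` with each α / οι occurrence swapped or kept."""
    parts = []
    i = 0
    while i < len(word):
        if word.startswith("οι", i):
            parts.append(AMBIGUOUS)      # could really be α
            i += 2
        elif word[i] == "α":
            parts.append(AMBIGUOUS)      # could really be οι
            i += 1
        else:
            parts.append((word[i],))     # unambiguous character
            i += 1
    return {"".join(choice) for choice in product(*parts)}


def pick(word, lexicon):
    """Return the variants of `word` that are known whole words, sorted."""
    return sorted(variants(word) & set(lexicon))


if __name__ == "__main__":
    lexicon = {"και", "λογοι"}           # hypothetical word list (unaccented)
    print(pick("κοιι", lexicon))         # OCR read οι where α was printed
    print(pick("λογα", lexicon))         # OCR read α where οι was printed
```

If only one variant is a real word, the whole-word lookup settles the choice that the glyph shapes alone cannot.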
-_Sven

On Thu, Jul 12, 2012 at 9:50 AM, Nick White <[email protected]> wrote:
> Hi folks,
>
> My Polytonic Greek training is going swimmingly, but there is one
> obstacle which is still beating me. In many fonts alpha (α) looks
> quite similar to omicron followed by iota (οι), and Tesseract gets
> this wrong quite often. I added a rule to unicharambigs to suggest
> that one may be substituted for the other, but it hasn't made much
> difference (and both are quite common forms, so I can't use the
> 'always replace' mode).
>
> Does anybody have any suggestions on how to improve this? Are there
> any configuration variables that could be tweaked? The training set
> is already a good size, so I don't think more tif/box sets would help.
>
> Any thoughts would be much appreciated.
>
> Nick
>
> P.S. This is with Tesseract 3.02 (latest SVN), but v2 behaves
> similarly.
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en



-- 
``All that is gold does not glitter,
  not all those who wander are lost;
the old that is strong does not wither,
  deep roots are not reached by the frost.
From the ashes a fire shall be woken,
  a light from the shadows shall spring;
renewed shall be blade that was broken,
  the crownless again shall be king."
