Hi folks,

My Polytonic Greek training is going swimmingly, but there is one obstacle which is still beating me. In many fonts alpha (α) looks quite similar to omicron followed by iota (οι), and Tesseract gets this wrong quite often. I added a rule to unicharambigs to suggest that one may be substituted for the other, but it hasn't made much difference, and both are quite common forms, so I can't use the 'always replace' mode.
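
For reference, a minimal sketch of what a pair of non-mandatory rules of this kind could look like. It assumes the tab-separated "v1" unicharambigs layout as I understand it from the Tesseract 3 training notes: source length, space-separated source unichars, target length, space-separated target unichars, and a type flag where 0 means optional and 1 means always replace. `ambig_line` is a hypothetical helper for illustration, not part of Tesseract, and the rules are examples rather than the exact contents of any real training file.

```python
def ambig_line(wrong: str, right: str, mandatory: bool = False) -> str:
    """Format one unicharambigs rule in the assumed v1 tab-separated layout.

    Assumption: every code point in `wrong` and `right` is one unichar,
    which holds for precomposed Greek letters but not for sequences that
    rely on combining marks.
    """
    fields = [
        str(len(wrong)),           # number of unichars in the misread form
        " ".join(wrong),           # the misread unichars, space separated
        str(len(right)),           # number of unichars in the suggested form
        " ".join(right),           # the suggested unichars, space separated
        "1" if mandatory else "0"  # 1 = always replace, 0 = only suggest
    ]
    return "\t".join(fields)


# Both directions are optional, since both forms are common in real text.
rules = [
    ambig_line("οι", "α"),
    ambig_line("α", "οι"),
]

print("v1")  # version line expected at the top of the file (assumed)
print("\n".join(rules))
```

With the type flag left at 0, a rule only offers the alternative as a candidate, so the outcome still depends on how the dictionary and classifier score each reading.
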
Does anybody have any suggestions on how to improve this? Are there any configuration variables that could be tweaked? The training set is already a good size, so I don't think more tif/box sets would help.

Any thoughts would be much appreciated.

Nick

P.S. This is with Tesseract 3.02 (latest SVN), but v2 behaves similarly.

