Hi Nick,

Have you done anything with dictionary entries? From previous reports, it seems to me that having whole words to match against would improve the chance of a correct choice.

-Sven
On Thu, Jul 12, 2012 at 9:50 AM, Nick White <[email protected]> wrote:
> Hi folks,
>
> My Polytonic Greek training is going swimmingly, but there is one
> obstacle which is still beating me. In many fonts alpha (α) looks
> quite similar to omicron followed by iota (οι), and Tesseract gets
> this wrong quite often. I added a rule to unicharambigs to suggest
> that one may be substituted for the other, but it hasn't made much
> difference (and both are quite common forms, so I can't use the
> 'always replace' mode).
>
> Does anybody have any suggestions on how to improve this? Any
> configuration variables that could be tweaked? The training set is
> already a good size, so I don't think more tif/box sets would help.
>
> Any thoughts would be much appreciated.
>
> Nick
>
> P.S. This is with Tesseract 3.02 (latest SVN), but v2 behaves
> similarly.
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en

--
"All that is gold does not glitter,
not all those who wander are lost;
the old that is strong does not wither,
deep roots are not reached by the frost.
From the ashes a fire shall be woken,
a light from the shadows shall spring;
renewed shall be blade that was broken,
the crownless again shall be king."
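
To make the whole-word idea concrete, here is a minimal Python sketch of how a word list could settle the α / οι confusion after recognition. It is not how Tesseract's own dictionary or unicharambigs handling works. It is a standalone post-processing illustration, and the names `ambiguity_variants` and `dictionary_matches` are hypothetical. It also assumes unaccented lowercase text, so real polytonic input (accents, breathings) would need normalisation first.

```python
from itertools import product

# The two forms that look alike in many fonts (written as escapes so the
# Greek code points are unambiguous).
ALPHA = "\u03b1"                # α
OMICRON_IOTA = "\u03bf\u03b9"   # οι


def ambiguity_variants(word):
    """Return every spelling obtained by swapping α and οι at any subset
    of the places where one of them occurs in `word`."""
    choices = []
    i = 0
    while i < len(word):
        if word.startswith(OMICRON_IOTA, i):
            # This οι may really have been an α.
            choices.append((OMICRON_IOTA, ALPHA))
            i += len(OMICRON_IOTA)
        elif word[i] == ALPHA:
            # This α may really have been an οι.
            choices.append((ALPHA, OMICRON_IOTA))
            i += 1
        else:
            choices.append((word[i],))
            i += 1
    # Note: the number of variants doubles with each ambiguous site.
    return {"".join(parts) for parts in product(*choices)}


def dictionary_matches(word, lexicon):
    """Return the variants of `word` that appear in `lexicon`, sorted."""
    return sorted(v for v in ambiguity_variants(word) if v in lexicon)
```

For example, with a lexicon containing only και, the misreading κοιι has exactly one match, και. A word with several matches stays ambiguous, which is the case where both forms are common and a dictionary alone cannot decide.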

