Hi there,
 
We are currently using Tesseract 3.01 as our OCR engine and have trained a 
number of fonts with it. Things work quite well, but we would like to move 
to version 3.02 for two reasons:

   - It is possible to combine fonts 
   - The character recognition is supposed to be significantly improved

In our tests we found that the character recognition has changed, but the 
results are mixed. Quite a few characters that previously had a few 
confusions now have none (which is good), but other characters are 
recognized much worse, so the overall result is worse. For example, in one 
dataset the number of confusions from H to M has increased from 7 to 52, 
and the number of confusions from O to D has increased from 15 to 37.
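
To be clear about what we mean by a confusion from one character to 
another, here is a minimal Python sketch of such a tally. It is 
illustrative only and not our actual evaluation code: the function name 
count_confusions and the sample strings are hypothetical, and it assumes 
the ground truth and the OCR output are already aligned character by 
character.

```python
from collections import Counter


def count_confusions(truth, recognized):
    """Tally (expected, recognized) pairs where the characters differ.

    Assumption: both strings are already aligned character by character,
    so position i in `truth` corresponds to position i in `recognized`.
    """
    if len(truth) != len(recognized):
        raise ValueError("strings must be aligned to equal length")
    return Counter((t, r) for t, r in zip(truth, recognized) if t != r)


# Hypothetical sample data, not taken from our dataset.
confusions = count_confusions("HOHO", "MDHO")
print(confusions[("H", "M")], confusions[("O", "D")])
```

Running the same tally on the output of both versions gives per-pair 
counts that can be compared directly.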
 
Is there a difference in the font files between Tesseract 3.01 and 3.02? 
Does it matter to Tesseract 3.02 whether a font was trained with the 3.01 
training tools? Would it help to retrain the fonts with the 3.02 training 
tools, or should this not matter?
 
In what way was character recognition improved in Tesseract 3.02?
 
Thanks in advance for any help you can provide!
 
Best regards,
Marcus
 
