Hi there,

We are currently using tesseract 3.01 as our OCR engine and have trained a number of fonts with it. Things work quite well, but we would like to move to version 3.02 for two reasons:
- It is possible to combine fonts.
- The character recognition is supposed to be significantly improved.

In our tests we found that the character recognition has changed, but the results are mixed. In particular, quite a few characters that previously had a few confusions now have none (which is good), but other characters are much worse, making the overall result worse. For example, in one dataset the number of confusions from H to M increased from 7 to 52, and the number of confusions from O to D increased from 15 to 37.

Is there a difference in the font files between tesseract 3.01 and 3.02? Does it matter to tesseract 3.02 whether a font was trained with the 3.01 training tools? Would it help to retrain the fonts with the tesseract 3.02 training tools, or should this not matter? In what way was character recognition improved in tesseract 3.02?

Thanks in advance for any help you can provide!

Best regards,
Marcus

