On Wednesday, March 2, 2016 at 2:23:44 AM UTC-5, Roger wrote:
>
> I am training tesseract to recognize CMC7 font, following this
> <http://michaeljaylissner.com/posts/2012/02/11/adding-new-fonts-to-tesseract-3-ocr-engine/>
> and this
> <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract>
> tutorial.
>
I see two immediate issues:

- Tesseract assumes non-noisy character images are connected shapes (except for diacritics, etc.), while the CMC7 characters are made up of disconnected vertical bars.
- According to this Wikipedia page, https://fr.wikipedia.org/wiki/CMC7, the significant part of the CMC7 encoding is the interbar spacing, *not* the overall shape.

Are you sure you're using the right tool for the job?

Tom
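
To illustrate the second point, here is a minimal sketch of reading a character from its interbar spacing rather than its shape. It assumes the bar x-positions have already been extracted from the image by some other means, and that each character has seven bars (six gaps), as the Wikipedia page describes. The function name `gap_pattern` and the midpoint threshold are illustrative choices, not part of Tesseract or of any CMC7 library, and the lookup from gap pattern to character is deliberately left out.

```python
from typing import List


def gap_pattern(bar_positions: List[float]) -> str:
    """Return a string of '0' (narrow gap) and '1' (wide gap) for one character.

    bar_positions: x-coordinates of the bars of a single character
    (hypothetical input; bar detection is not shown here).
    """
    # Assumption: one CMC7 character is made of exactly seven bars.
    if len(bar_positions) != 7:
        raise ValueError("expected 7 bar positions for one CMC7 character")

    xs = sorted(bar_positions)
    # Six gaps between seven consecutive bars.
    gaps = [right - left for left, right in zip(xs, xs[1:])]

    # Assumption: a gap is "wide" if it exceeds the midpoint between the
    # smallest and largest gap seen in this character.
    threshold = (min(gaps) + max(gaps)) / 2
    return "".join("1" if gap > threshold else "0" for gap in gaps)


if __name__ == "__main__":
    # Same spacing pattern at two different scales and offsets.
    print(gap_pattern([0, 1, 2, 4, 5, 7, 8]))
    print(gap_pattern([10, 12, 14, 18, 20, 24, 26]))
```

Both calls yield the same pattern even though the bars sit at different positions and scales, which is the property a shape-based classifier does not exploit.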

