On Wednesday, March 2, 2016 at 2:23:44 AM UTC-5, Roger wrote:
>
> I am training tesseract to recognize CMC7 font, following this 
> <http://michaeljaylissner.com/posts/2012/02/11/adding-new-fonts-to-tesseract-3-ocr-engine/>
>  and this 
> <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract>
>  tutorial.
>

I see two immediate issues:

- Tesseract assumes that clean (non-noisy) character images are connected
shapes (except for diacritics, etc.), while CMC7 characters are made up of
disconnected vertical bars.
- According to the French Wikipedia page (https://fr.wikipedia.org/wiki/CMC7),
the significant part of the CMC7 encoding is the spacing between the bars,
*not* the overall shape.
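To illustrate the second point, here is a rough sketch. It assumes the commonly described CMC7 layout: each character is seven vertical bars separated by six gaps, two long and four short. Under that assumption, a character can be read from gap widths alone, with no shape matching at all. The function names `find_bars` and `gap_pattern`, the 0/1 scanline input, and the "two widest gaps are the long ones" rule are my own assumptions for illustration. They are not part of Tesseract or of any CMC7 library. Turning the resulting pattern into a digit would still require the code table from the CMC7 specification, which I haven't reproduced here.

```python
from typing import List, Sequence, Tuple


def find_bars(scanline: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (start, end) ranges of ink runs in a 0/1 scanline (end exclusive)."""
    bars = []
    start = None
    for x, pixel in enumerate(scanline):
        if pixel and start is None:
            start = x                      # entering a bar
        elif not pixel and start is not None:
            bars.append((start, x))        # leaving a bar
            start = None
    if start is not None:                  # last bar touches the right edge
        bars.append((start, len(scanline)))
    return bars


def gap_pattern(scanline: Sequence[int]) -> str:
    """Encode the six inter-bar gaps of one CMC7 glyph as 'S' (short) / 'L' (long).

    Assumption: the scanline crosses exactly one character (7 bars) and
    exactly two of the six gaps are long.
    """
    bars = find_bars(scanline)
    if len(bars) != 7:
        raise ValueError(f"expected 7 bars, found {len(bars)}")
    gaps = [bars[i + 1][0] - bars[i][1] for i in range(6)]
    ranked = sorted(gaps, reverse=True)
    if ranked[1] <= ranked[2]:
        raise ValueError("cannot separate long gaps from short ones")
    # The two widest gaps are taken as the long ones.
    long_idx = sorted(range(6), key=lambda i: gaps[i], reverse=True)[:2]
    return "".join("L" if i in long_idx else "S" for i in range(6))


def synth(pattern: str, bar: int = 2, short: int = 2, long: int = 5) -> List[int]:
    """Build a synthetic scanline for a gap pattern (hypothetical test helper)."""
    line = [1] * bar
    for g in pattern:
        line += [0] * (long if g == "L" else short) + [1] * bar
    return line


print(gap_pattern(synth("SLSSLS")))  # SLSSLS
```

Each bar shows up as its own ink run here. That is the same disconnectedness that makes a shape-based, connected-component approach awkward for this font.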

Are you sure you're using the right tool for the job?

Tom

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/b17d0aec-9f5f-446c-b84c-128519dac0d1%40googlegroups.com.