I trained Tesseract (based on the eng language) to work with a particular 
customer-derived font.
When I finished training, I ran it on other image files to recognize 
characters, and I get errors in reading a 0 (zero).  It sometimes comes 
out as an 8, Q, D or an O (the letter).
 
The images I am using do not mix font sizes and contain no lower-case letters.
 
I have tried using the unicharambigs file, but I'm not sure that it is 
being applied correctly.  I also renamed it to eng.unicharambigs, and 
nothing changed.  I even tried setting the Type Indicator value to 1 
to mandate the substitution, and still nothing changed.
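
For reference, here is a minimal sketch of a format check for the file. It assumes the tab-separated "v1" layout as I understand it from the Tesseract training documentation: source character count, space-separated source characters, target character count, space-separated target characters, and the type indicator (0 or 1). The function name check_ambigs_line is hypothetical and not part of Tesseract. It only validates the text layout and cannot tell whether Tesseract actually loaded the file.

```python
def check_ambigs_line(line):
    """Return a list of layout problems found in one unicharambigs data line.

    Assumed layout (tab-separated):
        <n_source> <source chars> <n_target> <target chars> <type>
    where the character lists are space-separated and type is 0 or 1.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        return [f"expected 5 tab-separated fields, got {len(fields)}"]

    src_count, src_chars, dst_count, dst_chars, kind = fields
    problems = []

    # Each declared count must match the number of space-separated characters.
    for label, count, chars in (("source", src_count, src_chars),
                                ("target", dst_count, dst_chars)):
        if not count.isdigit():
            problems.append(f"{label} count {count!r} is not a number")
        elif int(count) != len(chars.split(" ")):
            problems.append(f"{label} count {count} does not match {chars!r}")

    # Type indicator: 1 = mandatory substitution, 0 = optional.
    if kind not in ("0", "1"):
        problems.append(f"type indicator {kind!r} is not 0 or 1")

    return problems


# Example: a mandatory substitution of the letter O with the digit 0.
print(check_ambigs_line("1\tO\t1\t0\t1"))
# The same rule typed with spaces instead of tabs is reported as malformed.
print(check_ambigs_line("1 O 1 0 1"))
```

The first line of the file (the "v1" version marker) is not a data line and should be skipped before calling the function.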
 
Anyway, I feel I'm missing something simple, but I have gotten so wound up 
in this problem that I can no longer see the forest for the trees.
 
If someone can point me in a proper direction or give a few pointers I 
would appreciate the help.
 
The files created or used:
Trained Data: 
https://drive.google.com/file/d/0B3S3PcVl6aznbHh5TW9HX3FCdlU/view?usp=sharing
Image: 
https://drive.google.com/file/d/0B3S3PcVl6azndXRvcGhCRlJRUWc/view?usp=sharing
Box: 
https://drive.google.com/file/d/0B3S3PcVl6aznYWNZaUJ2RVFDU2s/view?usp=sharing
unicharambigs: 
https://drive.google.com/file/d/0B3S3PcVl6aznUHdKX2RucExBaXM/view?usp=sharing
font_properties: 
https://drive.google.com/file/d/0B3S3PcVl6aznTVl6OUNvRnh0LXM/view?usp=sharing
 
Thanks,
  Jim
