I believe this file needs to be supplied before the final combined trained data is compiled, so you may want to check whether jTessBoxEditor supports creating it.
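In case jTessBoxEditor does not create it for you, here is a minimal sketch of generating the file by hand. It assumes the tab-separated "v1" layout described in the unicharambigs man page linked below (number of source characters, the source characters, number of target characters, the target characters, and a type flag where 0 is optional and 1 is mandatory). The function names and the `eng.unicharambigs` output name are my own choices for illustration, and the pairs are the problem characters Mark listed. Since number plates are not dictionary words, I cannot promise that optional ambiguities will change the results, so treat this as something to test.

```python
# Sketch only: builds the text of a v1-format unicharambigs file.
# The pair list and all function names are illustrative assumptions.

CONFUSABLE_PAIRS = [("O", "0"), ("5", "S"), ("2", "Z"), ("B", "8")]


def ambig_line(source, target, mandatory=False):
    """Return one tab-separated v1 line for a source -> target ambiguity."""
    src = list(source)  # one entry per character
    tgt = list(target)
    fields = [
        str(len(src)),
        " ".join(src),
        str(len(tgt)),
        " ".join(tgt),
        "1" if mandatory else "0",  # 1 = mandatory, 0 = optional
    ]
    return "\t".join(fields)


def build_unicharambigs(pairs):
    """Return the file contents, listing each pair in both directions."""
    lines = ["v1"]
    for a, b in pairs:
        lines.append(ambig_line(a, b))
        lines.append(ambig_line(b, a))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with open("eng.unicharambigs", "w", encoding="utf-8") as f:
        f.write(build_unicharambigs(CONFUSABLE_PAIRS))
```

As for how to tell whether an existing traineddata already has one merged in: as far as I know, `combine_tessdata -u eng.traineddata eng.` unpacks the components into separate files, and an `eng.unicharambigs` file among them would mean it was included.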
https://tesseract-ocr.googlecode.com/svn/trunk/doc/unicharambigs.5.html

On 7 January 2015 at 22:17, newbie <[email protected]> wrote:
> Thanks Allistair for your response. I have the final crunched
> eng/trained_data, but I am not sure whether unicharambigs has been merged
> into it. How would I know?
>
> On Wednesday, January 7, 2015 4:47:10 PM UTC-5, Allistair C wrote:
>>
>> You've tried unicharambigs, right? (See the bottom of this page:
>> https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3)
>>
>> On Thursday, 20 November 2014 12:53:43 UTC, Mark Beylis wrote:
>>>
>>> Hello
>>>
>>> I am using Tesseract OCR to perform number plate recognition on
>>> vehicles.
>>>
>>> I am using jTessBoxEditor v1.1 to check my box and tif files.
>>>
>>> At the moment each iteration of my training uses about 250 - 300
>>> number plates.
>>>
>>> I have read in many places that one should train fonts separately. This
>>> is difficult in my case because my source images contain number plates
>>> in varying fonts, so I would have to look manually through each one of
>>> the 100 initial images I use per training iteration to separate them
>>> into different groups. Would this really be necessary?
>>>
>>> I have been training for over a month now, probably on over 1000 images
>>> and 3000 number plates, and I cannot seem to get accuracy above 86%.
>>>
>>> I was wondering if you have some suggestions, as ideally I would like to
>>> see in excess of 90% accuracy.
>>>
>>> What I have picked up is that the OCR struggles with certain problem
>>> characters: O vs 0, 5 vs S, 2 vs Z, B vs 8
>>>
>>> Is there a specific way of training that I should use to improve correct
>>> reads of these characters? During my editing of the tif/box files in
>>> jTessBoxEditor I am torn between two approaches: discarding the
>>> bad-quality characters and keeping only the good-quality ones, or
>>> correcting each and every character to what it should be regardless of
>>> its quality in the tif file. Which is the better approach, and why?
>>>
>>> Any other suggestions on how to improve my training using jTessBoxEditor
>>> would be greatly appreciated.
>>>
>>> Thanks
>>>
>> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/71596b7f-3630-4241-b665-f5c03f2d66a1%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/71596b7f-3630-4241-b665-f5c03f2d66a1%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.

