Thanks, Allistair, for your response. I have the final crunched eng.traineddata, but I'm not sure whether unicharambigs has been merged into it. How would I know?
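One way to check should be to unpack the file with Tesseract 3's `combine_tessdata -u eng.traineddata eng.` and see whether an `eng.unicharambigs` file comes out. For a quicker check that does not unpack anything, here is a minimal Python sketch that reads only the file header. It rests on assumptions from my reading of the Tesseract 3.x tessdata layout, unverified for other versions: the file starts with an int32 component count, followed by one int64 offset per component, where -1 marks a missing component, and the ambiguities component is entry 2. The names `AMBIGS_INDEX`, `has_unicharambigs` and the other helpers are hypothetical.

```python
import struct
import sys

# ASSUMPTION: in the Tesseract 3.x tessdata layout, the ambiguities
# component (unicharambigs) is entry 2 of the offset table.
AMBIGS_INDEX = 2
# ASSUMPTION: a count above this is treated as a byte-order mismatch,
# mirroring the heuristic in Tesseract's own loader.
MAX_ENTRIES = 1000


def entry_count(head: bytes) -> tuple[str, int]:
    """Return (struct byte-order prefix, component count) from the first 4 bytes."""
    (count,) = struct.unpack_from("<i", head)
    if 0 <= count <= MAX_ENTRIES:
        return "<", count
    # Implausible count: assume the file was written with the other byte order.
    (count,) = struct.unpack_from(">i", head)
    return ">", count


def parse_offsets(header: bytes) -> list[int]:
    """Decode the int64 offset table that follows the int32 count."""
    order, count = entry_count(header)
    return list(struct.unpack_from(f"{order}{count}q", header, 4))


def ambigs_present(header: bytes) -> bool:
    """True if the offset table lists a unicharambigs component (offset -1 = absent)."""
    offsets = parse_offsets(header)
    return len(offsets) > AMBIGS_INDEX and offsets[AMBIGS_INDEX] >= 0


def has_unicharambigs(path: str) -> bool:
    """Read only the header of a .traineddata file, not the whole file."""
    with open(path, "rb") as f:
        head = f.read(4)
        _, count = entry_count(head)
        return ambigs_present(head + f.read(8 * count))


if __name__ == "__main__" and len(sys.argv) > 1:
    print(has_unicharambigs(sys.argv[1]))
```

Saved as, say, `check_ambigs.py` (hypothetical name), `python check_ambigs.py eng.traineddata` prints True or False. A True only says that some unicharambigs component was combined in; to confirm it contains your O/0, 5/S, 2/Z and B/8 entries, you would still need to unpack it and read it.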
On Wednesday, January 7, 2015 4:47:10 PM UTC-5, Allistair C wrote:
>
> You've tried unicharambigs, right? (See the bottom of this page: https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3)
>
> On Thursday, 20 November 2014 12:53:43 UTC, Mark Beylis wrote:
>>
>> Hello,
>>
>> I am using Tesseract OCR to perform number plate recognition on vehicles.
>>
>> I am using jTessBoxEditor v1.1 to check my box and tif files.
>>
>> At the moment, each iteration of my training uses about 250-300 number plates.
>>
>> I have read in many places that one should train fonts separately. This is difficult in my case, because my source images contain number plates in varying fonts; separating them into groups would mean manually looking through each of the 100 initial images I use per training iteration. Would this really be necessary?
>>
>> I have been training for over a month now, on probably over 1000 images and 3000 number plates, and I can't seem to get accuracy above 86%.
>>
>> I was wondering if you have any suggestions, as ideally I would like to see accuracy in excess of 90%.
>>
>> What I have picked up is that the OCR struggles with certain problem characters: O vs 0, 5 vs S, 2 vs Z, B vs 8.
>>
>> Is there a specific way of training that I should use to improve correct reads of these characters? While editing the tif/box files in jTessBoxEditor, I am torn between discarding the poor-quality characters and keeping only the good-quality ones, versus correcting every character to what it should be regardless of its quality in the tif file. Which is the better approach, and why?
>>
>> Any other suggestions on how to improve my training using jTessBoxEditor would be greatly appreciated.
>>
>> Thanks
>>
>
-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.

