Thanks, Allistair, for your response. I have the final crunched 
eng/trained_data, but I am not sure whether unicharambigs has been merged 
into it. How would I know?
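
One possible way to check, sketched in Python: Tesseract's combine_tessdata tool has a -u option that unpacks a traineddata file into its components, and a merged ambiguity table should then show up as a file ending in ".unicharambigs". The helper names and the "eng.traineddata" path below are assumptions for illustration, not something from this thread, and combine_tessdata is assumed to be on the PATH.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List


def unpack_command(traineddata: str, out_prefix: str) -> List[str]:
    """Build the combine_tessdata call that unpacks all components.

    Each component is written to <out_prefix><component>, e.g. a prefix of
    "out/eng." gives "out/eng.unicharambigs" if that component is present.
    """
    return ["combine_tessdata", "-u", traineddata, out_prefix]


def has_unicharambigs(out_dir: str, lang: str = "eng") -> bool:
    """Return True if the unpacked directory contains <lang>.unicharambigs."""
    return (Path(out_dir) / f"{lang}.unicharambigs").is_file()


def check_traineddata(traineddata: str, lang: str = "eng") -> bool:
    """Unpack into a temporary directory and look for the ambigs component.

    Hypothetical helper: requires the combine_tessdata binary to be installed.
    """
    with tempfile.TemporaryDirectory() as tmp:
        prefix = str(Path(tmp) / f"{lang}.")
        subprocess.run(unpack_command(traineddata, prefix), check=True)
        return has_unicharambigs(tmp, lang)
```

Calling check_traineddata("eng.traineddata") would then return True only if an unpacked eng.unicharambigs file was produced. Whether a present but empty file counts as "merged" for your purposes is a separate question this sketch does not answer.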

On Wednesday, January 7, 2015 4:47:10 PM UTC-5, Allistair C wrote:
>
> You've tried unicharambigs, right? (See the bottom of this page: 
> https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3)
>
> On Thursday, 20 November 2014 12:53:43 UTC, Mark Beylis wrote:
>>
>> Hello
>>
>> I am making use of Tesseract OCR to perform number plate recognition on 
>> vehicles
>>
>> I am making use of jTessBoxEditor v1.1 to check my box and tif files
>>
>> At the moment each iteration of my training consists of using about 250 - 
>> 300 number plates
>>
>> I have read in many places that one should train fonts separately. This 
>> is difficult in my case, as my source images contain number plates in 
>> varying fonts, so I would have to look manually through each of the 100 
>> initial images I use per training iteration to separate them into 
>> different groups. Would this really be necessary?
>>
>> I have been training for over a month now, probably on over 1000 images 
>> and 3000 number plates, and I cannot seem to get accuracy above 86%.
>>
>> I was wondering if you have some suggestions, as ideally I would like to 
>> see in excess of 90% accuracy.
>>
>> What I have picked up is that the OCR struggles with certain problem 
>> characters: O vs 0, 5 vs S, 2 vs Z, B vs 8.
>>
>> Is there a specific way of training that I should use to improve correct 
>> reads of these characters? While editing the tif/box files in 
>> jTessBoxEditor, I am torn between two approaches: discarding the 
>> poor-quality characters and keeping only the good-quality ones, or 
>> correcting each and every character to what it should be regardless of 
>> its quality in the tif file. Which is the better approach, and why?
>>
>> Any other suggestions on how to improve my training using jTessBoxEditor 
>> would be greatly appreciated.
>>
>> Thanks
>>
>
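
For the confusable pairs listed in the quoted message (O vs 0, 5 vs S, 2 vs Z, B vs 8), a unicharambigs file along the lines Allistair suggests could be generated with a small script. This is only a sketch: it assumes the "v1" tab-separated layout shown on the training wiki page (source length, space-separated source characters, target length, target characters, type flag with 0 = optional and 1 = mandatory). The function names and the choice to emit each pair in both directions are my own assumptions, and the thread does not establish that this actually improves number plate accuracy.

```python
from typing import Iterable, List, Tuple

# Pairs taken from the quoted message; extend as needed.
CONFUSABLE_PAIRS: List[Tuple[str, str]] = [
    ("O", "0"), ("5", "S"), ("2", "Z"), ("B", "8"),
]


def ambig_line(source: str, target: str, mandatory: bool = False) -> str:
    """Format one entry: count, chars, count, chars, type (tab-separated)."""
    fields = [
        str(len(source)),
        " ".join(source),   # multi-character entries are space-separated
        str(len(target)),
        " ".join(target),
        "1" if mandatory else "0",
    ]
    return "\t".join(fields)


def build_unicharambigs(pairs: Iterable[Tuple[str, str]],
                        both_directions: bool = True) -> str:
    """Return the full file text, starting with the 'v1' version line."""
    lines = ["v1"]
    for a, b in pairs:
        lines.append(ambig_line(a, b))
        if both_directions:  # assumption: either character may be misread
            lines.append(ambig_line(b, a))
    return "\n".join(lines) + "\n"


def write_unicharambigs(path: str) -> None:
    """Write the generated text to a file such as 'eng.unicharambigs'."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(build_unicharambigs(CONFUSABLE_PAIRS))
```

The entries are marked optional (type 0) here because, on a number plate, either character of a pair can be the correct one; a mandatory substitution would always rewrite one into the other.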
