I believe this file needs to be supplied before the final combined trained
data is compiled, so you may want to check whether jTessBoxEditor
supports creating it.

https://tesseract-ocr.googlecode.com/svn/trunk/doc/unicharambigs.5.html
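
In case it helps, here is a rough sketch of how such a file could be generated for the character pairs mentioned in the original post. It assumes the tab-separated "v1" layout described on the page linked above (character count, n-gram, character count, n-gram, type indicator). The file name eng.unicharambigs, the helper names, the choice of type 0 (optional substitution), and listing each pair in both directions are my assumptions, not something confirmed in this thread, so treat it as a starting point only.

```python
# Sketch only: builds a v1-format unicharambigs file for the confusions
# reported in this thread. Names and the type indicator are assumptions.

# Pairs reported as problematic in the original post.
CONFUSED_PAIRS = [("O", "0"), ("5", "S"), ("2", "Z"), ("B", "8")]


def ambig_line(source, target, mandatory=False):
    """Return one tab-separated line: count, n-gram, count, n-gram, type."""
    src_chars = " ".join(source)   # characters of an n-gram are space-separated
    dst_chars = " ".join(target)
    type_flag = "1" if mandatory else "0"   # assumption: 0 = optional substitution
    return "\t".join(
        [str(len(source)), src_chars, str(len(target)), dst_chars, type_flag]
    )


def build_unicharambigs(pairs):
    """Return the file contents, listing each pair in both directions."""
    lines = ["v1"]   # version header line
    for first, second in pairs:
        lines.append(ambig_line(first, second))
        lines.append(ambig_line(second, first))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical output name following the <lang>.unicharambigs convention.
    with open("eng.unicharambigs", "w", encoding="utf-8") as handle:
        handle.write(build_unicharambigs(CONFUSED_PAIRS))
```

The generated file would then be placed alongside the other training files before the combine step. To check whether an existing traineddata file already contains one, unpacking it with combine_tessdata -u and looking for a unicharambigs component should work, as far as I know.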

On 7 January 2015 at 22:17, newbie <[email protected]> wrote:

> Thanks Allistair for your response. I have the final crunched eng/
> trained_data, but I'm not sure whether unicharambigs has been merged into
> it. How would I know?
>
> On Wednesday, January 7, 2015 4:47:10 PM UTC-5, Allistair C wrote:
>>
>> Have you tried unicharambigs? See the bottom of this page:
>> https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
>>
>> On Thursday, 20 November 2014 12:53:43 UTC, Mark Beylis wrote:
>>>
>>> Hello
>>>
>>> I am using Tesseract OCR to perform number plate recognition on
>>> vehicles.
>>>
>>> I am using jTessBoxEditor v1.1 to check my box and tif files.
>>>
>>> At the moment each iteration of my training uses about 250-300
>>> number plates.
>>>
>>> I have read in many places that one should train fonts separately. This
>>> is difficult in my case, as my source images contain number plates in
>>> varying fonts, so I would have to manually look through each of the
>>> 100 initial images I use per training iteration to separate them
>>> into different groups. Would this really be necessary?
>>>
>>> I have been training for over a month now, on probably
>>> over 1000 images and 3000 number plates, and I cannot seem to get
>>> accuracy above 86%.
>>>
>>> I was wondering if you have some suggestions, as ideally I would like to
>>> see in excess of 90% accuracy.
>>>
>>> What I have picked up is that the OCR struggles with certain problem
>>> characters: O vs 0, 5 vs S, 2 vs Z, B vs 8.
>>>
>>> Is there a specific way of training that I should use to improve correct
>>> reads of these characters? During my editing of the tif/box in jTessBoxEditor
>>> I am torn between two approaches: discarding the poorly read characters and
>>> keeping only the well-read ones, or correcting each and every
>>> character to be what it should be regardless of the quality of the
>>> character in the tif file. Which is the better approach and why?
>>>
>>> Any other suggestions on how to improve my training using jTessBoxEditor
>>> would be greatly appreciated.
>>>
>>> Thanks
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAORW5vii7e7vy4G5Z%3DobLwOPpKgYQj1rWogOZ-RZu91TFD0Ceg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.