Re: Newbie: Training tesseract

clyde Mon, 23 Sep 2013 07:37:36 -0700

Hello Nick White!

Thank you so much! It unpacked eng.traineddata. Thank you.


On Monday, September 23, 2013 19:19:02 UTC+8, Nick White wrote:
>
> Hi Clyde, 
>
> Nearly there. Try giving the full path to the eng.traineddata file. 
> So this command: 
>
> combine_tessdata -u "C:\Program Files\Tesseract-OCR\eng.traineddata" eng. 
>
> (that path may be slightly wrong as I can't remember exactly where 
> the eng.traineddata is in the Windows install - find it and correct 
> it as necessary) 
>
> Nick 
>
> On Sun, Sep 22, 2013 at 11:38:44PM -0700, clyde wrote: 
> > Hello Nick White! 
> > 
> > When I try to unpack eng.traineddata with this command:
> >
> >   combine_tessdata -u eng.traineddata eng.
> >
> > I get an error message:
> > 
> > Error opening data file eng.traineddata 
> > Please make sure the TESSDATA_PREFIX environment variable is set to the
> > parent directory of your "tessdata" directory.
> > Extracting tessdata components from eng.traineddata 
> > 
> > 
> > I checked the TESSDATA_PREFIX environment variable; see the attached
> > image.
> > 
> > 
> > [env] 
> > 
> > Thank you. I hope for your response.
> > 
> > 
> > 
> > On Friday, July 20, 2012 17:03:41 UTC+8, Nick White wrote:
> > 
> >     Hi Nikola, 
> > 
> >     I suggest you don't try training it. Training is mostly for adding 
> >     new languages, or at least significantly different fonts. As your 
> >     input is English, and a common font, I doubt it would help much over 
> >     the standard english training file. 
> > 
> >     The results I got from running Tesseract 3 on your sample were 
> >     pretty good, though. I'll attach them here. Using -psm 6 made a big 
> >     improvement as it meant the table cells were on the correct row. So 
> >     I ran: 
> > 
> >       tesseract ocr1.png outtest2 -psm 6 
> > 
> >     The remaining problems in the output are 7 being consistently
> >     recognised as ?, and m being regularly misrecognised as r'n or r‘n.
> >     I have suggestions for this.
> > 
> >     If your input data will never have ? in, create an ambig rule which 
> >     always changes a ? to a 7 (and similar for the r'n issues). The best 
> >     way to do this would be: 
> > 
> >     1) unpack the english training data: 
> > 
> >       combine_tessdata -u eng.traineddata eng. 
> > 
> >     2) add the following lines to the end of eng.unicharambigs: 
> > 
> >     1        ?        1        7        1 
> >     3        r ' n        1        m        1 
> >     3        r ‘ n        1        m        1 
> > 
> >     3) recombine the training data: 
> > 
> >       combine_tessdata eng. 
> > 
> >     And the eng.traineddata file will contain the extra ambig rules. 
> > 
> >     Hope this helps, and let us know how you get on. 
> > 
> >     Nick 
> > 
>
>
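
The three steps quoted above (unpack, append ambig rules, recombine) can be scripted. What follows is only a rough sketch, not something from the thread: the helper names (format_rule, append_rules, rebuild) are made up for illustration, it assumes combine_tessdata is on the PATH, and it assumes eng.unicharambigs uses tab-separated fields in the order shown in the quoted message (source length, source characters, target length, target characters, mandatory flag). Check an existing line of your own unpacked file before relying on that.

```python
import subprocess
from pathlib import Path

# Rules from the quoted message, as (misrecognised characters, replacement).
AMBIG_RULES = [
    (["?"], ["7"]),
    (["r", "'", "n"], ["m"]),
    (["r", "\u2018", "n"], ["m"]),  # \u2018 is the left single quote
]


def format_rule(wrong, right, mandatory=True):
    """Build one ambig line; tab-separated fields are an assumption."""
    fields = [
        str(len(wrong)),
        " ".join(wrong),
        str(len(right)),
        " ".join(right),
        "1" if mandatory else "0",
    ]
    return "\t".join(fields)


def append_rules(ambigs_path, rules):
    """Append rules to the end of an unpacked unicharambigs file."""
    path = Path(ambigs_path)
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    with path.open("a", encoding="utf-8", newline="\n") as f:
        # Start on a fresh line if the file does not end with a newline.
        if existing and not existing.endswith("\n"):
            f.write("\n")
        for wrong, right in rules:
            f.write(format_rule(wrong, right) + "\n")


def rebuild(traineddata, workdir, lang="eng"):
    """Unpack, add the rules, and recombine (hypothetical wrapper)."""
    prefix = str(Path(workdir) / (lang + "."))
    # Step 1: unpack, giving the full path to the traineddata file.
    subprocess.run(["combine_tessdata", "-u", str(traineddata), prefix], check=True)
    # Step 2: add the extra rules to the end of <lang>.unicharambigs.
    append_rules(prefix + "unicharambigs", AMBIG_RULES)
    # Step 3: recombine the components found under the prefix.
    subprocess.run(["combine_tessdata", prefix], check=True)
```

A call might look like rebuild(r"C:\Program Files\Tesseract-OCR\eng.traineddata", "work"), with the same caveat as in the quoted reply: that path may not match your install. Keep a backup of the original eng.traineddata before replacing it with the recombined file.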

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en

