Re: Newbie: Training tesseract

Nick White Mon, 23 Sep 2013 04:19:38 -0700

Hi Clyde,

Nearly there. Try giving the full path to the eng.traineddata file.
So this command:


combine_tessdata -u "C:\Program Files\Tesseract-OCR\eng.traineddata" eng.

(that path may be slightly wrong as I can't remember exactly where
the eng.traineddata is in the Windows install - find it and correct
it as necessary)
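
If hunting for the file by hand is awkward, the search can be scripted. This is only a rough sketch, not part of the Tesseract tools: the helper names find_traineddata and unpack_command are made up for illustration, and the install directory in the example is an assumption that should be replaced with the real one. It looks for eng.traineddata below a folder and prints the unpack command with the full path filled in.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def find_traineddata(roots: Iterable[Path], lang: str = "eng") -> Optional[Path]:
    """Return the first <lang>.traineddata found under any of the given roots."""
    name = f"{lang}.traineddata"
    for root in roots:
        if not root.is_dir():
            continue  # skip locations that do not exist on this machine
        for candidate in sorted(root.rglob(name)):
            if candidate.is_file():
                return candidate.resolve()
    return None


def unpack_command(traineddata: Path, lang: str = "eng") -> List[str]:
    """Build the arguments for: combine_tessdata -u <full path> <lang>."""
    return ["combine_tessdata", "-u", str(traineddata), f"{lang}."]


if __name__ == "__main__":
    # Assumed install location; change it to wherever Tesseract is installed.
    roots = [Path(r"C:\Program Files\Tesseract-OCR")]
    found = find_traineddata(roots)
    if found is None:
        print("eng.traineddata not found under", [str(r) for r in roots])
    else:
        print(" ".join(unpack_command(found)))
```

The printed command still needs quotes around the path if it contains spaces, as in the command above.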

Nick

On Sun, Sep 22, 2013 at 11:38:44PM -0700, clyde wrote:
> Hello Nick White!
> 
> When I try to unpack eng.traineddata with this command:
> 
>   combine_tessdata -u eng.traineddata eng.
> 
> I get an error message:
> 
> Error opening data file eng.traineddata
> Please make sure the TESSDATA_PREFIX environment variable is set to the parent
> directory of your "tessdata" directory.
> Extracting tessdata components from eng.traineddata
> 
> 
> I did check the TESSDATA_PREFIX environment variable;
> see the attached image.
> 
> 
> [env]
> 
> Thank you. I hope for your response.
> 
> 
> 
> On Friday, July 20, 2012 17:03:41 UTC+8, Nick White wrote:
> 
>     Hi Nikola,
> 
>     I suggest you don't try training it. Training is mostly for adding
>     new languages, or at least significantly different fonts. As your
>     input is English, and a common font, I doubt it would help much over
>     the standard English training file.
> 
>     The results I got from running Tesseract 3 on your sample were
>     pretty good, though. I'll attach them here. Using -psm 6 made a big
>     improvement as it meant the table cells were on the correct row. So
>     I ran:
> 
>       tesseract ocr1.png outtest2 -psm 6
> 
>     The remaining problems in the output are 7 being consistently recognised
>     as ?, and m regularly being misrecognised as r'n or r‘n. I have a
>     suggestion for this.
> 
>     If your input data will never contain ?, create an ambig rule which
>     always changes a ? to a 7 (and similarly for the r'n issues). The best
>     way to do this would be:
> 
>     1) unpack the english training data:
> 
>       combine_tessdata -u eng.traineddata eng.
> 
>     2) add the following lines to the end of eng.unicharambigs:
> 
>     1        ?        1        7        1
>     3        r ' n        1        m        1
>     3        r ‘ n        1        m        1
> 
>     3) recombine the training data:
> 
>       combine_tessdata eng.
> 
>     And the eng.traineddata file will contain the extra ambig rules.
> 
>     Hope this helps, and let us know how you get on.
> 
>     Nick
> 
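
Step 2 in the quoted message (adding the rules to eng.unicharambigs) is easy to get wrong by hand, so here is a small sketch of it. It rests on one assumption: as far as I know, the fields in a unicharambigs line are separated by tabs, with single spaces between the characters inside a field, so the rules may need retyping with tabs if they were copied from an email. The function names format_ambig_rule and append_ambig_rules are invented for this example and are not part of Tesseract.

```python
from pathlib import Path
from typing import List


def format_ambig_rule(wrong: List[str], right: List[str], mandatory: bool = True) -> str:
    """Format one rule: count, wrong chars, count, right chars, type flag."""
    fields = [
        str(len(wrong)),
        " ".join(wrong),   # characters inside a field are space-separated
        str(len(right)),
        " ".join(right),
        "1" if mandatory else "0",
    ]
    return "\t".join(fields)  # assumption: fields are tab-separated


def append_ambig_rules(path: Path, rules: List[str]) -> None:
    """Append rules to an existing unicharambigs file, one rule per line."""
    existing = path.read_bytes()
    with path.open("ab") as fh:
        # Make sure the first new rule starts on its own line.
        if existing and not existing.endswith(b"\n"):
            fh.write(b"\n")
        for rule in rules:
            fh.write(rule.encode("utf-8") + b"\n")


if __name__ == "__main__":
    rules = [
        format_ambig_rule(["?"], ["7"]),
        format_ambig_rule(["r", "'", "n"], ["m"]),
        format_ambig_rule(["r", "\u2018", "n"], ["m"]),
    ]
    target = Path("eng.unicharambigs")  # produced by the unpack step
    if target.exists():
        append_ambig_rules(target, rules)
    else:
        print("eng.unicharambigs not found; run the unpack step first")
```

Run it in the directory where the unpacked eng.* files are, then recombine with combine_tessdata eng. as in step 3.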

-- 
-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
