Hello Nick White!

When I try to unpack eng.traineddata with the command:

  combine_tessdata -u eng.traineddata eng.

I get this error message:

Error opening data file eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the
parent directory of your "tessdata" directory.
Extracting tessdata components from eng.traineddata


I checked the TESSDATA_PREFIX environment variable; see the attached image:

<https://lh5.googleusercontent.com/-wJEfnO2NKWY/Uj_h6IdFBII/AAAAAAAAAAY/1T9gtYZVLFo/s1600/env.PNG>

Thank you. I hope for your response.



On Friday, July 20, 2012 17:03:41 UTC+8, Nick White wrote:
>
> Hi Nikola, 
>
> I suggest you don't try training it. Training is mostly for adding 
> new languages, or at least significantly different fonts. As your 
> input is English, and a common font, I doubt it would help much over 
> the standard English training file. 
>
> The results I got from running Tesseract 3 on your sample were 
> pretty good, though. I'll attach them here. Using -psm 6 made a big 
> improvement as it meant the table cells were on the correct row. So 
> I ran: 
>
>   tesseract ocr1.png outtest2 -psm 6 
>
> The problems remaining in the output are 7 being consistently recognised 
> as ?, and m being regularly misrecognised as r'n or r‘n. I have 
> suggestions for this. 
>
> If your input data will never contain a ?, create an ambig rule which 
> always changes a ? to a 7 (and similarly for the r'n issues). The best 
> way to do this would be: 
>
> 1) unpack the english training data: 
>
>   combine_tessdata -u eng.traineddata eng. 
>
> 2) add the following lines to the end of eng.unicharambigs: 
>
> 1        ?        1        7        1 
> 3        r ' n        1        m        1 
> 3        r ‘ n        1        m        1 
>
> 3) recombine the training data: 
>
>   combine_tessdata eng. 
>
> And the eng.traineddata file will contain the extra ambig rules. 
>
> Hope this helps, and let us know how you get on. 
>
> Nick 
>
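
The three steps quoted above can be sketched as a small shell script. This is only an illustration, not a tested recipe: the function names are hypothetical, the field separator in eng.unicharambigs is assumed to be a tab, and the check for eng.traineddata in the current directory is an assumption about one possible cause of an "Error opening data file" message, not a confirmed diagnosis. Only the two combine_tessdata invocations and the three rules come from the message itself.

```shell
#!/bin/sh
# Hypothetical helper functions wrapping the three steps from the quoted message.

# Step 1: unpack eng.traineddata into eng.* component files.
# Assumption: the command is run from the directory that holds the file.
unpack_eng() {
    if [ ! -f eng.traineddata ]; then
        echo "eng.traineddata not found in $(pwd)" >&2
        return 1
    fi
    combine_tessdata -u eng.traineddata eng.
}

# Step 2: append the three ambig rules to the given unicharambigs file.
# Assumption: fields are tab-separated, characters within a field space-separated.
append_ambig_rules() {
    ambigs_file="$1"
    # Start on a fresh line if the file does not already end with a newline.
    if [ -s "$ambigs_file" ] && [ -n "$(tail -c 1 "$ambigs_file")" ]; then
        printf '\n' >> "$ambigs_file"
    fi
    printf '%s\t%s\t%s\t%s\t%s\n' 1 '?' 1 7 1 >> "$ambigs_file"
    printf '%s\t%s\t%s\t%s\t%s\n' 3 "r ' n" 1 m 1 >> "$ambigs_file"
    printf '%s\t%s\t%s\t%s\t%s\n' 3 'r ‘ n' 1 m 1 >> "$ambigs_file"
}

# Step 3: recombine the eng.* component files into eng.traineddata.
recombine_eng() {
    combine_tessdata eng.
}
```

With these defined, the sequence would be `unpack_eng && append_ambig_rules eng.unicharambigs && recombine_eng`, run from the directory containing eng.traineddata. Working on a copy of the original file first seems prudent, since the recombine step writes eng.traineddata again.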

-- 
-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
