Hi Clyde,

Nearly there. Try giving the full path to the eng.traineddata file, so the command becomes:

combine_tessdata -u "C:\Program Files\Tesseract-OCR\eng.traineddata" eng.

(That path may be slightly wrong, as I can't remember exactly where eng.traineddata is in the Windows install; find it and correct the path as necessary.)

Nick

On Sun, Sep 22, 2013 at 11:38:44PM -0700, clyde wrote:
> Hello Nick White!
>
> When I try to unpack eng.traineddata with the command
>
> combine_tessdata -u eng.traineddata eng.
>
> I get this error message:
>
> Error opening data file eng.traineddata
> Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
> Extracting tessdata components from eng.traineddata
>
> I did check the TESSDATA_PREFIX environment variable; see the attached image.
>
> [env]
>
> Thank you, I hope for your response.
>
> On Friday, July 20 2012 17:03:41 UTC+8, Nick White wrote:
> > Hi Nikola,
> >
> > I suggest you don't try training it. Training is mostly for adding
> > new languages, or at least significantly different fonts. As your
> > input is English, in a common font, I doubt it would help much over
> > the standard English training file.
> >
> > The results I got from running Tesseract 3 on your sample were
> > pretty good, though. I'll attach them here. Using -psm 6 made a big
> > improvement, as it meant the table cells were on the correct row. So
> > I ran:
> >
> >    tesseract ocr1.png outtest2 -psm 6
> >
> > The problems remaining in the output are that 7 is consistently
> > recognised as ?, and m is regularly misrecognised as r'n or r‘n. I
> > have a suggestion for this.
> >
> > If your input data will never contain ?, create an ambig rule which
> > always changes a ? to a 7 (and similarly for the r'n issues). The
> > best way to do this would be:
> >
> > 1) unpack the English training data:
> >
> >    combine_tessdata -u eng.traineddata eng.
> >
> > 2) add the following lines to the end of eng.unicharambigs:
> >
> >    1 ? 1 7 1
> >    3 r ' n 1 m 1
> >    3 r ‘ n 1 m 1
> >
> > 3) recombine the training data:
> >
> >    combine_tessdata eng.
> >
> > The resulting eng.traineddata file will contain the extra ambig rules.
> >
> > Hope this helps, and let us know how you get on.
> >
> > Nick

-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
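The unpack / edit / recombine steps from Nick's 2012 message can be scripted so they are easy to repeat. What follows is a minimal sketch, not something from the thread. It assumes a POSIX shell (on Windows, something like Git Bash), the Tesseract 3.x combine_tessdata tool on PATH, and that unicharambigs fields are tab-separated with the unichars inside one field separated by spaces (the email shows plain spaces, so check your own file). The function names append_ambig_rules and patch_traineddata are hypothetical.

```shell
#!/bin/sh
# Sketch only: automates the unpack / edit / recombine steps from the thread.
# Assumptions (not from the thread): Tesseract 3.x combine_tessdata is on PATH,
# and unicharambigs fields are tab-separated, with the unichars inside one
# field separated by spaces.

# Hypothetical helper: append the three substitution rules to an
# unicharambigs file. Rules already present are skipped, so re-running is harmless.
append_ambig_rules() {
    ambigs_file=$1
    if [ ! -f "$ambigs_file" ]; then
        echo "append_ambig_rules: $ambigs_file not found" >&2
        return 1
    fi
    # Start on a fresh line if the file lacks a final newline.
    if [ -s "$ambigs_file" ] && [ -n "$(tail -c 1 "$ambigs_file")" ]; then
        printf '\n' >> "$ambigs_file"
    fi
    tab=$(printf '\t')
    # Fields: source count, source unichars, target count, target unichars,
    # type (1 = always substitute, as in Nick's rules).
    for rule in \
        "1${tab}?${tab}1${tab}7${tab}1" \
        "3${tab}r ' n${tab}1${tab}m${tab}1" \
        "3${tab}r ‘ n${tab}1${tab}m${tab}1"
    do
        if ! grep -qxF "$rule" "$ambigs_file"; then
            printf '%s\n' "$rule" >> "$ambigs_file"
        fi
    done
}

# Hypothetical wrapper for the full workflow. $1 is the full path to
# eng.traineddata, $2 the output prefix (e.g. "eng."). Components are
# unpacked into, and the new traineddata is written to, the current directory.
patch_traineddata() {
    combine_tessdata -u "$1" "$2" || return 1
    append_ambig_rules "${2}unicharambigs" || return 1
    combine_tessdata "$2"
}
```

For example, `patch_traineddata "/path/to/tessdata/eng.traineddata" eng.` (the path is a placeholder) would leave a new eng.traineddata in the current directory; you would then copy it over the one in your tessdata directory, ideally after keeping a backup of the original.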

