Hello Nick White! Thank you so much! It unpacked eng.traineddata! Thank you...
On Monday, September 23, 2013 19:19:02 UTC+8, Nick White wrote:
>
> Hi Clyde,
>
> Nearly there. Try giving the full path to the eng.traineddata file.
> So this command:
>
> combine_tessdata -u "C:\Program Files\Tesseract-OCR\eng.traineddata" eng.
>
> (that path may be slightly wrong as I can't remember exactly where
> the eng.traineddata is in the Windows install - find it and correct
> it as necessary)
>
> Nick
>
> On Sun, Sep 22, 2013 at 11:38:44PM -0700, clyde wrote:
> > Hello Nick White!
> >
> > When I try to unpack the eng.traineddata
> > command: combine_tessdata -u eng.traineddata eng.
> >
> > I got an error message:
> >
> > Error opening data file eng.traineddata
> > Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
> > Extracting tessdata components from eng.traineddata
> >
> > I did check the TESSDATA_PREFIX environment variable.
> > See the attached image.
> >
> > [env]
> >
> > Thank you... I hope for your response.
> >
> > On Friday, July 20, 2012 17:03:41 UTC+8, Nick White wrote:
> >
> > Hi Nikola,
> >
> > I suggest you don't try training it. Training is mostly for adding
> > new languages, or at least significantly different fonts. As your
> > input is English, and a common font, I doubt it would help much over
> > the standard English training file.
> >
> > The results I got from running Tesseract 3 on your sample were
> > pretty good, though. I'll attach them here. Using -psm 6 made a big
> > improvement as it meant the table cells were on the correct row. So
> > I ran:
> >
> > tesseract ocr1.png outtest2 -psm 6
> >
> > The problems remaining in the output are 7 being consistently recognised
> > as ?, and m being regularly misrecognised as r'n or r‘n. I have suggestions
> > for this.
> >
> > If your input data will never have ? in, create an ambig rule which
> > always changes a ? to a 7 (and similar for the r'n issues).
> > The best way to do this would be:
> >
> > 1) unpack the english training data:
> >
> > combine_tessdata -u eng.traineddata eng.
> >
> > 2) add the following lines to the end of eng.unicharambigs:
> >
> > 1 ? 1 7 1
> > 3 r ' n 1 m 1
> > 3 r ‘ n 1 m 1
> >
> > 3) recombine the training data:
> >
> > combine_tessdata eng.
> >
> > And the eng.traineddata file will contain the extra ambig rules.
> >
> > Hope this helps, and let us know how you get on.
> >
> > Nick

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
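
For anyone repeating the three steps from Nick's 2012 message (unpack, append the ambig rules, recombine), here is a rough Python sketch of how they might be scripted. It is not something posted in the thread. The helper names `append_ambig_rules` and `rebuild_with_rules` are made up for illustration, and the script assumes `combine_tessdata` is on your PATH and that you pass the full path to eng.traineddata, as Nick suggested. The rules are copied as they appear in the email, with spaces between fields; that spacing may be an artifact of the mail formatting, so compare against the existing lines in your own eng.unicharambigs and match their separator before relying on it.

```python
import subprocess
from pathlib import Path

# Rules copied from the thread. The space separators are an assumption:
# check the existing lines in your eng.unicharambigs and match them.
AMBIG_RULES = [
    "1 ? 1 7 1",
    "3 r ' n 1 m 1",
    "3 r \u2018 n 1 m 1",  # same rule with a curly opening quote
]


def append_ambig_rules(ambigs_path, rules):
    """Append rules that are not already in the file; return those added."""
    path = Path(ambigs_path)
    text = path.read_text(encoding="utf-8")
    existing = set(text.splitlines())
    added = [rule for rule in rules if rule not in existing]
    if added:
        # Make sure the new rules start on their own line.
        lead = "" if text == "" or text.endswith("\n") else "\n"
        with path.open("a", encoding="utf-8", newline="\n") as fh:
            fh.write(lead + "\n".join(added) + "\n")
    return added


def rebuild_with_rules(traineddata, workdir):
    """Hypothetical wrapper around the three commands from the thread."""
    prefix = str(Path(workdir) / "eng.")
    # 1) unpack: combine_tessdata -u <full path to eng.traineddata> eng.
    subprocess.run(["combine_tessdata", "-u", str(traineddata), prefix], check=True)
    # 2) add the extra ambig rules to eng.unicharambigs
    append_ambig_rules(prefix + "unicharambigs", AMBIG_RULES)
    # 3) recombine: combine_tessdata eng.
    subprocess.run(["combine_tessdata", prefix], check=True)
```

Calling `rebuild_with_rules` with the full path to your eng.traineddata and an empty working directory should leave the unpacked components and a recombined eng.traineddata in that directory. Keep a backup of the original file before replacing it. Running the helper twice does not duplicate rules, because lines already present are skipped.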

