Hello Nick White! When I try to unpack eng.traineddata with the command:

combine_tessdata -u eng.traineddata eng.

I get this error message:

Error opening data file eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Extracting tessdata components from eng.traineddata

I did check the TESSDATA_PREFIX environment variable; see the attached image: <https://lh5.googleusercontent.com/-wJEfnO2NKWY/Uj_h6IdFBII/AAAAAAAAAAY/1T9gtYZVLFo/s1600/env.PNG>

Thank you. I hope to hear from you.

On Friday, July 20, 2012 17:03:41 UTC+8, Nick White wrote:
>
> Hi Nikola,
>
> I suggest you don't try training it. Training is mostly for adding
> new languages, or at least significantly different fonts. As your
> input is English, and a common font, I doubt it would help much over
> the standard English training file.
>
> The results I got from running Tesseract 3 on your sample were
> pretty good, though. I'll attach them here. Using -psm 6 made a big
> improvement as it meant the table cells were on the correct row. So
> I ran:
>
> tesseract ocr1.png outtest2 -psm 6
>
> The problems remaining in the output are 7 being consistently recognised
> as ?, and m regularly being misrecognised as r'n or r‘n. I have suggestions
> for this.
>
> If your input data will never have ? in it, create an ambig rule which
> always changes a ? to a 7 (and similarly for the r'n issues). The best
> way to do this would be:
>
> 1) unpack the English training data:
>
> combine_tessdata -u eng.traineddata eng.
>
> 2) add the following lines to the end of eng.unicharambigs:
>
> 1 ? 1 7 1
> 3 r ' n 1 m 1
> 3 r ‘ n 1 m 1
>
> 3) recombine the training data:
>
> combine_tessdata eng.
>
> And the eng.traineddata file will contain the extra ambig rules.
>
> Hope this helps, and let us know how you get on.
>
> Nick

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
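Step 2 of Nick's procedure (appending the ambig rules to eng.unicharambigs) can be scripted roughly as below. This is a minimal POSIX sh sketch, not part of Tesseract: the helper name add_ambig_rules is hypothetical, and it assumes step 1 has already written eng.unicharambigs to the directory you pass it. The rules are written exactly as Nick gives them (space-separated: number of source characters, the source characters, number of replacement characters, the replacement, and a trailing type field).

```shell
#!/bin/sh
# Hypothetical helper for step 2 of Nick's message. Assumes step 1
# (combine_tessdata -u eng.traineddata eng.) has already written the
# unicharambigs file, and that step 3 (combine_tessdata eng.) runs afterwards.
add_ambig_rules() {
  ambigs_file="$1"

  # Refuse to create the file from scratch: a missing file means step 1
  # did not run, or ran in a different directory.
  if [ ! -f "$ambigs_file" ]; then
    echo "add_ambig_rules: $ambigs_file not found; unpack eng.traineddata first" >&2
    return 1
  fi

  # If the file's last line has no trailing newline, add one so the first
  # new rule is not glued onto the existing last line.
  if [ -n "$(tail -c 1 "$ambigs_file")" ]; then
    printf '\n' >> "$ambigs_file"
  fi

  # The three rules from the message, one per line:
  #   ?   -> 7
  #   r'n -> m  (ASCII apostrophe)
  #   r‘n -> m  (U+2018 left single quotation mark)
  printf '%s\n' "1 ? 1 7 1" "3 r ' n 1 m 1" "3 r ‘ n 1 m 1" >> "$ambigs_file"
}
```

It is meant to run once, between the two combine_tessdata calls: combine_tessdata -u eng.traineddata eng., then add_ambig_rules eng.unicharambigs, then combine_tessdata eng. Running it a second time appends the rules again, so it is a one-off helper rather than an idempotent one.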

