On Thu, Aug 09, 2012 at 08:32:17AM -0700, Chathuri Gunawardhana wrote:
> Do I need to train tesseract for local words written in English
> like Matara, Galle? If so How can I do that?
Which version of tesseract are you using? If v2.x, follow the advice
here:
http://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_provide_my_own_dictionary?

Otherwise, I think you have to unpack the .traineddata file, copy in
your word list, then repack. Something like this should work (from
your tessdata directory):

  combine_tessdata -u eng.traineddata eng.
  cp /path/to/new/eng.user-words .
  combine_tessdata eng.

The new eng.traineddata will now include your words.

Hope this helps, and is clear enough.

Nick
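In case it helps, here is a rough sketch of the unpack, copy, repack steps as a small shell script. It is only an illustration under a few assumptions: the word list is plain text with one word per line, `my-words.txt` is a made-up file name, and the script is run from the tessdata directory. The backup copy is an added precaution, not part of the steps above. If `combine_tessdata` or `eng.traineddata` is not found, it only writes the word list.

```shell
#!/bin/sh
set -e

lang="eng"                  # language prefix used in the message above
words_src="my-words.txt"    # hypothetical file name for the new words

# One word per line (assumed format for a user-words list).
printf '%s\n' Matara Galle | sort -u > "$words_src"

# Repack only if the tool and the language file are both available here.
if command -v combine_tessdata >/dev/null 2>&1 && [ -f "$lang.traineddata" ]; then
    cp "$lang.traineddata" "$lang.traineddata.bak"    # backup (added precaution)
    combine_tessdata -u "$lang.traineddata" "$lang."  # unpack into eng.* files
    # Note: this replaces any unpacked eng.user-words rather than merging with it.
    cp "$words_src" "$lang.user-words"
    combine_tessdata "$lang."                         # rebuild eng.traineddata
else
    echo "combine_tessdata or $lang.traineddata not found; wrote $words_src only"
fi
```

Whether the extra words actually change recognition is worth checking on a sample image before and after, since that part is not confirmed here.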

