Re: Tesseract does not identify local words written in English

Nick White Fri, 10 Aug 2012 09:16:15 -0700

On Thu, Aug 09, 2012 at 08:32:17AM -0700, Chathuri Gunawardhana
wrote:
> Do I need to train tesseract for local words written in English
> like Matara, Galle? If so How can I do that?


Which version of Tesseract are you using? If v2.x, follow the advice
here:
http://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_provide_my_own_dictionary?

Otherwise, I think you have to unpack the .traineddata file, copy in
your word list, then repack. Something like this should work (run from
your tessdata directory):

combine_tessdata -u eng.traineddata eng.
cp /path/to/new/eng.user-words .
combine_tessdata eng.

The new eng.traineddata will now include your words.
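
If it helps, here is a rough sketch of the whole procedure as a script.
It assumes the word list is plain text with one word per line, and the
names make_user_words and my-words.txt are made up for illustration.
It only attempts the repack when combine_tessdata and eng.traineddata
are both found in the current directory, and it keeps a backup first.
Treat it as a starting point rather than a tested recipe.

```shell
#!/bin/sh
# Sketch only: build a word list and repack it into eng.traineddata.
# Assumption: the user-words file is plain text, one word per line.

# Hypothetical helper: clean up a word list read from stdin
# (strip carriage returns, trim whitespace, drop blanks and duplicates).
make_user_words() {
    tr -d '\r' \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | grep -v '^$' \
        | sort -u
}

# Example input: the place names from the question, with a blank line
# and a duplicate to show the cleanup.
printf 'Matara\nGalle\n\nMatara\n' | make_user_words > my-words.txt

# Repack only if the tool and the language data are actually here.
if command -v combine_tessdata >/dev/null 2>&1 && [ -f eng.traineddata ]; then
    cp eng.traineddata eng.traineddata.bak     # keep a backup
    combine_tessdata -u eng.traineddata eng.   # unpack components as eng.*
    cp my-words.txt eng.user-words             # drop in the new word list
    combine_tessdata eng.                      # repack into eng.traineddata
fi
```

The word list is written to a separate file first because unpacking
may write its own eng.user-words if the existing data already has one.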

Hope this helps, and is clear enough.

Nick

