I ran into this problem recently. Here is the solution (I'm using
Tesseract 3.01):

To use a user-words list, find user_words_suffix in dict.h and dict.cpp,
change its default value from "" to "user-words", and rebuild Tesseract:

//dict.h
STRING_VAR_H(user_words_suffix, "user-words",
  "A list of user-provided words.");

//dict.cpp
STRING_INIT_MEMBER(user_words_suffix, "user-words",
  "A list of user-provided words.",
  getImage()->getCCUtil()->params()),

This assumes that your tessdata folder contains a file named
"eng.user-words" with your user-made word list.
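
If it helps, here is a minimal sketch of generating that file from code
rather than by hand. It assumes the format described in the question
below (plain text, one word per line) and the "<lang>.<suffix>" naming
used above. UserWordsPath and WriteUserWords are hypothetical helper
names of my own, not part of Tesseract, and the tessdata location is
whatever your install uses.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not a Tesseract API): builds
// "<tessdata_dir>/<lang>.<suffix>", e.g. "tessdata/eng.user-words".
std::string UserWordsPath(const std::string& tessdata_dir,
                          const std::string& lang,
                          const std::string& suffix) {
  std::string path = tessdata_dir;
  // Add a separator only if the directory does not already end with one.
  if (!path.empty() && path[path.size() - 1] != '/') path += '/';
  return path + lang + "." + suffix;
}

// Hypothetical helper: writes the words one per line.
// Returns false if the file could not be opened or written.
bool WriteUserWords(const std::string& path,
                    const std::vector<std::string>& words) {
  std::ofstream out(path.c_str());
  if (!out) return false;
  for (const std::string& word : words) out << word << '\n';
  out.flush();
  return out.good();
}
```

For example, calling WriteUserWords(UserWordsPath("tessdata", "eng",
"user-words"), names) with a vector of names would produce the file
that the patched build looks for.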

.bj.

On Sep 27, 8:03 am, Slavko Kocjancic <[email protected]> wrote:
> Hello...
>
> I have a question about user-words.
> I use eng.traineddata and OCR works well. But the problem is that the text
> has a lot of foreign names, and those are not recognized correctly. So I
> tried making a file eng.user-words in the same directory as eng.traineddata
> and put the names in the file, one name per line. Then I ran OCR again,
> but there was no difference. So the question is:
> is it enough to just make the file eng.user-words, or does something else
> need to be done?
>
> Thanks.
