A quick glance at the documentation will tell you that "the dictionary"
lives in several DAWG files, as well as in that user-words file.
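
For anyone who wants to check their own install, here is a rough Python sketch, not anything taken from the Tesseract sources. It assumes the user-words file is plain text with one word per line, named <lang>.user-words and sitting in the tessdata folder, as described in the quoted message, and that the DAWG files have "dawg" somewhere in their names. The helper names write_user_words and find_dictionary_files are made up for illustration. If your build packs everything into a single data file, the listing will simply come back without any DAWG entries.

```python
from pathlib import Path


def write_user_words(tessdata_dir, words, lang="eng"):
    """Write one word per line to <lang>.user-words and return its path.

    Assumption: the user-words file is plain text, one word per line,
    as described in the quoted message. Blank entries are skipped.
    """
    path = Path(tessdata_dir) / f"{lang}.user-words"
    cleaned = [w.strip() for w in words if w.strip()]
    path.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return path


def find_dictionary_files(tessdata_dir, lang="eng"):
    """List files for `lang` whose names suggest dictionary data.

    Assumption: DAWG files contain "dawg" in their names; this is a
    naming heuristic only and does not inspect file contents.
    """
    found = []
    for entry in sorted(Path(tessdata_dir).iterdir()):
        name = entry.name
        if not name.startswith(lang + "."):
            continue
        if "dawg" in name or name.endswith("user-words"):
            found.append(name)
    return found
```

Note that this only shows which dictionary-looking files are present. It does not tell you whether Tesseract actually loads or uses them.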
patrickq wrote, On 2010-07-27 14:59:
I get HAX 6 5-5,- with Tesseract 3.0
What I find remarkable is that half the folks on this forum would love
to disable the word recognition (i.e. dictionary), the other half
would like to enable it - and absolutely no one knows how to enable/
disable the dictionary nor can say for sure if it's actually enabled
or not by default. I am included in the group of the clueless - we
have scanned thousands of business cards and still have no idea
whatsoever what the hell is going on with that elusive dictionary.
I gather from Jimmy's recent answer that the dictionary is contained
in a single file of type text, one word per line, in a file called
eng.user-words (any support for regular expressions there? for example
to say that [\\d]*th is a common word) placed in the Tessdata folder
but we await final confirmation. Is it enough that the file exists?
Does removing the file disable the dictionary?
Clearly many have used the dictionary but sadly it appears that these
knowledgeable people deserted this forum once they got the answers
they needed - if you see one of these gentlemen (or ladies, yes) roaming
the streets, please admonish them for not staying subscribed to forum
messages to give back in helping others!
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.