I followed the instructions in the Tesseract 3 FAQ entry "How do I provide my own dictionary?" <https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_provide_my_own_dictionary?> to create a custom dictionary.
In my custom dictionary, I only have the following words:

local
variables
variable
name
names

When I ran tesseract against this test image <http://bit.ly/ocrimage>, the output was:

> You can ereate local variables for the pipelines within the template by
> prefixing the variable name with a “$" Sign. Variable names have to be
> eomposed of alphanumeric characters and the underseore. In the example
> below I have used a few variations that work for variable names.

I was expecting it to contain _only_ words from the custom dictionary (e.g., "local", "variable", etc.).

Am I misunderstanding how custom dictionaries are supposed to work? Are the words in a custom dictionary merely a "hint" rather than a constraint on which words can be emitted in the OCR output?

Here are the steps I used to regenerate a new eng.traineddata file:

$ combine_tessdata -u tessdata/eng.traineddata /tmp/eng.
$ wordlist2dawg eng.wordlist eng.word-dawg eng.unicharset

(where eng.wordlist contains the word list mentioned above, with "local", "variables", etc.)

$ combine_tessdata /tmp/eng.
$ mv eng.traineddata ~/tmp/tessdata/eng.traineddata

And here is how I called tesseract:

$ wget http://bit.ly/ocrimage
$ tesseract --tessdata-dir /tmp ocrimage ocrimage

I'm using the latest Subversion trunk version, built via this Dockerfile <https://github.com/tleyden/docker/blob/master/tesseract-training/Dockerfile>.
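For anyone trying to reproduce this, here is a minimal sketch of creating eng.wordlist from the five dictionary words, assuming (as the FAQ describes) that wordlist2dawg takes a plain-text file with one word per line. The blank-line and whitespace checks are an illustrative addition of mine, not part of the FAQ steps; the file name eng.wordlist is the one used in the commands in this post.

```shell
#!/bin/sh
# Minimal sketch: write eng.wordlist with one word per line, the plain-text
# input that wordlist2dawg compiles into eng.word-dawg.
set -eu

wordlist="eng.wordlist"

# The five words from the custom dictionary described in this post.
printf '%s\n' local variables variable name names > "$wordlist"

# Illustrative sanity checks before compiling: reject blank lines and any
# embedded whitespace, so every line holds exactly one word.
if grep -q '^$' "$wordlist" || grep -q '[[:space:]]' "$wordlist"; then
    echo "malformed word list: $wordlist" >&2
    exit 1
fi

# Report how many words were written.
echo "wrote $(wc -l < "$wordlist" | tr -d ' ') words to $wordlist"
```

The resulting file is what gets passed as the first argument to `wordlist2dawg eng.wordlist eng.word-dawg eng.unicharset`.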

