I followed the instructions in the Tesseract 3 FAQ entry "How do I provide my own dictionary"
<https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_provide_my_own_dictionary?>
to create a custom dictionary.

In my custom dictionary, I only have the following words:

local
variables
variable
name
names

When I ran tesseract against this test image <http://bit.ly/ocrimage>, the 
output was:

> You can ereate local variables for the pipelines within the template by
> prefixing the variable name with a “$" Sign. Variable names have to be
> eomposed of alphanumeric characters and the underseore. In the example
> below I have used a few variations that work for variable names.


I was expecting the output to contain _only_ words from the custom dictionary
(e.g. "local", "variable", etc.).
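
To make that expectation concrete, here is a rough sketch of the kind of filtering I assumed the dictionary would apply. It is only a plain-text illustration, not anything Tesseract does internally. The file name dictionary_only.txt is made up for this example, and ocrimage.txt is a hand-copied version of the output above with the quote characters normalized.

```shell
# Illustration only: a text filter that mimics "dictionary as a hard constraint".
# Recreate the word list and a copy of the OCR output (file names are assumptions).
printf '%s\n' local variables variable name names > eng.wordlist
cat > ocrimage.txt <<'EOF'
You can ereate local variables for the pipelines within the template by
prefixing the variable name with a "$" Sign. Variable names have to be
eomposed of alphanumeric characters and the underseore. In the example
below I have used a few variations that work for variable names.
EOF

# Split the text into one lowercase word per line,
# then keep only whole-line matches against the word list.
tr -cs '[:alpha:]' '\n' < ocrimage.txt \
  | tr '[:upper:]' '[:lower:]' \
  | grep -Fx -f eng.wordlist > dictionary_only.txt

cat dictionary_only.txt
```

With a filter like this, only occurrences of "local", "variables", "variable", "name" and "names" survive, which is what I thought the custom dictionary would give me.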

Am I misunderstanding how custom dictionaries are supposed to work? Are
the words in a custom dictionary merely a "hint" rather than a constraint
on what words can be emitted in the OCR output?

Here are the steps I used to regenerate a new eng.traineddata file:

$ combine_tessdata -u tessdata/eng.traineddata /tmp/eng.
$ wordlist2dawg eng.wordlist eng.word-dawg eng.unicharset
  (where eng.wordlist contains the word list mentioned above: "local", "variables", etc.)
$ combine_tessdata /tmp/eng.
$ mv eng.traineddata ~/tmp/tessdata/eng.traineddata

And here is how I called tesseract:

$ wget http://bit.ly/ocrimage
$ tesseract --tessdata-dir /tmp ocrimage ocrimage 

I'm using the latest subversion trunk version, built via this Dockerfile:
<https://github.com/tleyden/docker/blob/master/tesseract-training/Dockerfile>
