Regarding "user_patterns_suffix", have a look at the tesseract manual page [1].
I am not sure whether it is possible to force tesseract to choose its OCR output from the dictionary (I have never tried it ;-) ). But you can increase the dictionary's strength with the variables language_model_penalty_non_freq_dict_word and language_model_penalty_non_dict_word. See the FAQ [2].
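
If tesseract itself cannot be forced to pick from the dictionary, one workaround is to correct its output afterwards. The following is only a minimal sketch of that idea, not a tesseract feature: it uses Python's standard difflib module to snap each recognized token to the closest entry in a small word list. The word list, the function name snap_to_dictionary and the cutoff value are hypothetical and would need tuning for real data.

```python
import difflib

# Hypothetical small dictionary; in practice, load your ~200 words from a file.
WORDS = ["invoice", "total", "amount", "serial"]


def snap_to_dictionary(token, words=WORDS, cutoff=0.6):
    """Return the dictionary word closest to `token`.

    If no dictionary word is at least `cutoff` similar (difflib ratio),
    the token is returned unchanged.
    """
    matches = difflib.get_close_matches(token.lower(), words, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def snap_line(line, words=WORDS, cutoff=0.6):
    """Apply snap_to_dictionary to every whitespace-separated token of an OCR line."""
    return " ".join(snap_to_dictionary(tok, words, cutoff) for tok in line.split())
```

A lower cutoff makes the correction more aggressive (more tokens get replaced), while a higher one leaves doubtful tokens as tesseract produced them.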


[1] http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html#_config_files_and_augmenting_with_user_data
[2] http://code.google.com/p/tesseract-ocr/wiki/FAQ#How_to_increase_the_trust_in/strength_of_the_dictionary?

--
Zdenko

On 03.09.2012 16:02, ms wrote:
Aidano

Did you manage to solve this problem? We have the exact same question.
We would really be interested in any solutions.

thanks

On Thursday, March 22, 2012 8:37:44 AM UTC+8, aidano wrote:
I'd like to configure tesseract with a small dictionary (~200 words) and
tell it to always choose the best match in the dictionary. Is that possible?

Also, when inspecting the source code I saw a variable in dict.h called
"user_patterns_suffix". Is there any documentation around this? I'd like to
see if I can use it to tell Tesseract that my images will always contain
one serial number that always has 19 characters with no spaces.



