Re: Can I configure Tesseract to always match a dictionary word?

Zdenko Podobný Thu, 15 Nov 2012 11:15:24 -0800

Regarding "user_patterns_suffix", have a look at the tesseract manual page [1].

I am not sure whether it is possible to force tesseract to choose OCR output from the dictionary (I never tried it ;-) ). But you can increase the dictionary strength with the variables language_model_penalty_non_freq_dict_word and language_model_penalty_non_dict_word. See the FAQ [2].
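As a rough illustration of the two variables mentioned above, the sketch below renders a Tesseract config file (one "name value" pair per line) and builds a command line that passes it as the trailing config argument. The helper names, the file name "strict_dict" and the penalty values are assumptions chosen for illustration, not recommended settings; the right values would have to be found by experiment.

```python
from pathlib import Path


def render_config(non_dict_penalty, non_freq_penalty):
    """Return config-file text that sets the two dictionary penalty variables.

    Tesseract config files hold one "variable value" pair per line.
    The numbers passed in are placeholders to be tuned by experiment.
    """
    lines = [
        f"language_model_penalty_non_freq_dict_word {non_freq_penalty}",
        f"language_model_penalty_non_dict_word {non_dict_penalty}",
    ]
    return "\n".join(lines) + "\n"


def write_config(path, non_dict_penalty, non_freq_penalty):
    """Write the rendered config to `path` (a hypothetical file name)."""
    Path(path).write_text(render_config(non_dict_penalty, non_freq_penalty))
    return path


def build_command(image, output_base, config_path, lang="eng"):
    """Build the argument list: tesseract image outputbase -l lang configfile."""
    return ["tesseract", image, output_base, "-l", lang, str(config_path)]


# Example (not executed here, since it needs tesseract installed):
#   write_config("strict_dict", non_dict_penalty=1.0, non_freq_penalty=0.5)
#   subprocess.run(build_command("scan.png", "out", "strict_dict"), check=True)
```

Higher penalties should make non-dictionary words less likely to win, but as said above this only strengthens the dictionary; it is not known to guarantee a dictionary word.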

[1] http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html#_config_files_and_augmenting_with_user_data
[2] http://code.google.com/p/tesseract-ocr/wiki/FAQ#How_to_increase_the_trust_in/strength_of_the_dictionary?


--
Zdenko

On 03.09.2012 16:02, ms wrote:

Aidano

Did you manage to solve this problem? We have the exact same question.
We would really be interested in any solutions.

thanks

On Thursday, March 22, 2012 8:37:44 AM UTC+8, aidano wrote:

I'd like to configure tesseract with a small dictionary (~200 words) and
tell it to always choose the best match in the dictionary. Is that possible?

Also, when inspecting the source code I saw a variable in dict.h called
"user_patterns_suffix". Is there any documentation around this? I'd like to
see if I can use it to tell Tesseract that my images will always contain
one serial number that always has 19 characters with no spaces.
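For illustration only: independent of any Tesseract setting, the two constraints described here can be approximated by post-processing the OCR text. The sketch below snaps a recognised word to its closest entry in a small word list using Python's difflib, and checks that a candidate serial number is exactly 19 non-space characters. This is a swapped-in technique outside Tesseract, not a Tesseract option, and the function names and sample words are hypothetical.

```python
import difflib
import re

# Exactly 19 characters, none of them whitespace.
SERIAL_RE = re.compile(r"\S{19}")


def snap_to_dictionary(word, dictionary):
    """Return the dictionary entry most similar to `word`.

    cutoff=0.0 means the best-scoring entry is always returned,
    however poor the match; returns None only for an empty dictionary.
    """
    matches = difflib.get_close_matches(word, dictionary, n=1, cutoff=0.0)
    return matches[0] if matches else None


def looks_like_serial(text):
    """True if the stripped text is 19 characters long with no spaces."""
    return SERIAL_RE.fullmatch(text.strip()) is not None


if __name__ == "__main__":
    words = ["invoice", "receipt", "total"]  # stand-in for the ~200-word list
    print(snap_to_dictionary("invoce", words))
    print(looks_like_serial("AB12CD34EF56GH78IJ9"))
```

Always returning the nearest entry also hides cases where the OCR result is unrelated to every dictionary word, so a similarity threshold may be worth adding in practice.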


