p.s.

On Saturday, June 28, 2014 12:39:21 AM UTC-4, [email protected] wrote:
>
>
> 3) Attempted to increase the strength of dictionary matches as discussed 
> on the FAQ (
> https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_to_increase_the_trust_in/strength_of_the_dictionary?),
>  
> both via API calls to setVariable and via a configuration file (tess-two 
> uses tesseract 3.0.3):
>
> language_model_penalty_non_freq_dict_word 1
> language_model_penalty_non_dict_word 1
>
> However, I still occasionally get words that are three characters long and 
> not in the dictionary, e.g. "C9" will be recognized as "129".  When this 
> happens it wreaks havoc with the base 16 decoding, as there is an odd 
> number of hex digits.  Since I can include additional error correction 
> data, I'd be fine with dictionary words being hallucinated, but having 
> three characters returned causes a problem.
>
> This makes me wonder if I am properly following the instructions to 
> increase the strength of dictionary matches.  In this case, I'd be happy to 
> constrain results to strictly only dictionary words.
>

Since these are doubles, you might want to try 0.9 (or even 0.5) to make 
sure that you're not running into some type of boundary condition.  I 
haven't played with them myself, so I'm not sure how they're handled 
internally.
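
Whatever the penalties end up doing, it may also be worth checking the 
recognized text before decoding, so that a stray three-character result 
like "129" is flagged instead of shifting the hex stream.  Below is a 
minimal sketch in plain Java.  It assumes every dictionary word is exactly 
two hex digits separated by whitespace, which is my guess from the "C9" 
example and not something stated in the thread.  HexTokenCheck and its 
methods are hypothetical names and are not part of tesseract or tess-two.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of tesseract or tess-two. */
final class HexTokenCheck {

    /** True if the token is exactly two hexadecimal digits, e.g. "C9". */
    static boolean isHexPair(String token) {
        // String.matches() requires the whole token to match the pattern.
        return token.matches("[0-9A-Fa-f]{2}");
    }

    /**
     * Returns the positions of whitespace-separated tokens that are not
     * two-digit hex words (assumed dictionary format).
     */
    static List<Integer> badTokenIndexes(String ocrText) {
        List<Integer> bad = new ArrayList<>();
        String trimmed = ocrText.trim();
        if (trimmed.isEmpty()) {
            return bad;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (!isHexPair(tokens[i])) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // "129" is three characters long, so its position is reported.
        System.out.println(badTokenIndexes("0A 129 FF"));
    }
}
```

A flagged position could then be handed to the error correction step as a 
known-bad byte, so the decoder never sees an odd number of hex digits.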

Tom 
