Hi,

The wordlists are only one of the things Tesseract takes into account 
when judging what the best output for characters and words should be. 
So just adding a few more likely words to freq-words, while helpful, 
is unlikely to make an enormous difference to the quality of your 
output.

If you're confident that all of the words you're going to be 
recognising are in your wordlists, you could try increasing the 
weighting Tesseract gives to words in the wordlists. Be aware that 
this can make "false positives" more common: words that aren't in the 
lists are more likely to be misread as similar words that are.

See this entry from the FAQ for more details on that:
https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_to_increase_the_trust_in/strength_of_the_dictionary?
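
As a rough sketch of what that FAQ entry describes: the two variables 
it names, language_model_penalty_non_freq_dict_word and 
language_model_penalty_non_dict_word, can be raised in a config file 
that you pass to tesseract after the other arguments. As far as I 
recall the defaults are 0.1 and 0.15, but check that against your 
version. The values, the file name dict_bias.cfg, and the helper 
function names below are my own illustrative assumptions, not tested 
recommendations.

```python
# Variable names are the ones given in the Tesseract FAQ entry on dictionary
# strength. The values are illustrative assumptions; tune them on your data.
DICT_PENALTIES = {
    "language_model_penalty_non_freq_dict_word": 0.3,
    "language_model_penalty_non_dict_word": 0.5,
}


def render_config(params):
    """Render parameters as 'name value' lines, one per line."""
    return "".join(f"{name} {value}\n" for name, value in params.items())


def build_command(image, output_base, lang, config_path):
    """Build the argument list: tesseract <image> <outputbase> -l <lang> <config>."""
    return ["tesseract", str(image), str(output_base), "-l", lang, str(config_path)]


if __name__ == "__main__":
    # Save this text as dict_bias.cfg (hypothetical file name).
    print(render_config(DICT_PENALTIES), end="")
    # "page.png", "out" and "lang" are placeholders for your own files.
    print(" ".join(build_command("page.png", "out", "lang", "dict_bias.cfg")))
```

Higher penalties make Tesseract favour dictionary words more strongly, 
so it is worth comparing the output on a few pages that contain words 
outside your lists before settling on values.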

Nick

On Tue, Mar 25, 2014 at 07:09:34AM -0700, temp name wrote:
> Hello,
> 
> I trained Tesseract for a new language. In my testing I didn't get
> accurate results, so I added a dictionary to my training data file. I
> created lang.word-dawg and lang.freq-word-dawg and combined them into
> the training file. Testing with the new trained data file, I got
> similar results. I can see no change in the recognition of words that
> are present in the dictionary.
> 
> Does anyone know how Tesseract uses the dictionary?
> How does Tesseract select a word from the dictionary?
> And how can I improve accuracy using the dictionary?
> 
> Thanks!
> 
> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
> 
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email
> to [email protected].
> For more options, visit https://groups.google.com/d/optout.
