Hi Sven,

Not only did I read these posts, but I was the one to whom Jimmy
kindly responded. Here is one quote:

"At any point, if you ask Tesseract what the 'word' it sees is, it will
simply give you a string composed of the highest-confidence
characters: the word structure also keeps an array of possible
characters along with the confidence from the recogniser. The weight
from a dictionary can add extra weight to a set of characters, but
only if the set of characters that word is composed from is among the
set of choices (some other steps can add or remove characters...
etc)."

Although I did not debug to inspect the alternative choices for the
mistaken 'f' and 'i', it's a reasonable expectation that 't' and 'l'
would be next in line in these two cases respectively, because these
ARE the letters clearly appearing in this image and these are known
frequent mistakes. I'd say 'i' instead of 'l' is the most common
mistake. So I think it's reasonable that I would be disappointed.
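
To make the quoted description concrete, here is a toy sketch in Python of
how I understand it. This is NOT Tesseract's code or API: the data layout,
the function names, the confidence values and the DICT_BONUS multiplier are
all made up for illustration. It only shows the idea that a dictionary can
promote a word when its characters are among the per-position choices, and
cannot when they are not.

```python
# Toy model only: not Tesseract's implementation or API.
# Each position holds candidate characters with a made-up confidence (0..1).
from itertools import product

DICT_BONUS = 1.5  # hypothetical multiplier applied to dictionary words


def top_choice(choices):
    """Return the string built from the highest-confidence character per position."""
    return "".join(max(pos, key=lambda c: c[1])[0] for pos in choices)


def best_word(choices, dictionary, bonus=DICT_BONUS):
    """Score every combination of candidates; dictionary words get a bonus."""
    best, best_score = None, -1.0
    for combo in product(*choices):
        word = "".join(ch for ch, _ in combo)
        score = 1.0
        for _, conf in combo:
            score *= conf
        if word in dictionary:
            score *= bonus
        if score > best_score:
            best, best_score = word, score
    return best


user_words = {"Chest", "Chestnut", "Floor", "Vice"}

# Assumed case: 't' is the runner-up for the last character.
with_t = [[("C", 0.9)], [("h", 0.9)], [("e", 0.9)], [("s", 0.9)],
          [("f", 0.6), ("t", 0.5)]]
# Assumed case: 't' never made it into the choice list.
without_t = [[("C", 0.9)], [("h", 0.9)], [("e", 0.9)], [("s", 0.9)],
             [("f", 0.6)]]

print(top_choice(with_t))                # Chesf
print(best_word(with_t, user_words))     # Chest
print(best_word(without_t, user_words))  # Chesf
```

If the real engine behaves like the first case, I would expect "Chest" to
win; the result I am seeing looks like the second case, or like the
dictionary not being consulted at all.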

If I missed something else that would indicate how I can make it work,
please clarify!

Thanks,
Patrick

On Jul 30, 1:55 pm, Sven Pedersen <[email protected]> wrote:
> Patrick,
> This is a known issue which has been discussed in the last three days.
> Please look in the archives or check the emails you've received from
> the list for the last few days.
> --Sven
>
>
>
> On Fri, Jul 30, 2010 at 8:04 AM, patrickq <[email protected]> wrote:
> > This is what I did:
>
> > 1. Created a text file called eng.user-words, containing:
> > Chest
> > Chestnut
> > Floor
> > Vice
>
> > 2. Placed the file in the tessdata folder (next to eng.traineddata)
>
> > 3. Ran recognition on an image returning "Chesf" instead of "Chest"
> > and "Fioor" instead of "Floor". Both mistaken "f" and "i" look quite
> > right visually so I can only assume their confidence level would be
> > low (but I didn't check).
>
> > No effect whatsoever - zip. I can only assume that a variable must be
> > set or a function needs to be called to turn this on (even though
> > there is no mention of needing to set anything in the documentation)
> > or (most likely) I just don't understand how this works and the
> > dictionary kicks in only on the day of the summer solstice and when
> > there is a full moon or something.
>
> > Patrick
>
> > --
> > You received this message because you are subscribed to the Google Groups 
> > "tesseract-ocr" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to 
> > [email protected].
> > For more options, visit this group 
> > at http://groups.google.com/group/tesseract-ocr?hl=en.
>
> --
> ``All that is gold does not glitter,
>   not all those who wander are lost;
> the old that is strong does not wither,
>   deep roots are not reached by the frost.
> From the ashes a fire shall be woken,
>   a light from the shadows shall spring;
> renewed shall be blade that was broken,
>   the crownless again shall be king.”
