word review

Thilanka Kaushalya Thu, 11 Mar 2010 08:00:03 -0800

Hi Joe and Moffette,

Thanks for the tips you provided. They are very helpful to me.
I'm testing your instructions these days. Thanks again.


Regards,
Thilanka

>
>
>
>   Topic: word review
>   <http://groups.google.com/group/tesseract-ocr/t/4e723fa1766b7167>
>
>    Joe K <[email protected]> Mar 08 11:02AM -0800 
>
>    Hey Thilanka,
>
>    I ran into a similar problem when I only needed it to read
>    hexadecimal values. What I ended up doing was creating a separate
>    "language" that contained only the required characters. So you could
>    create one language of numbers and another of letters, and use
>    tesseract to read each part of your image with the appropriate
>    language.
>
>    The web address below shows you how to train tesseract for a specific
>    language. Hope this helps.
>
>    http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract
>
>    Moffette <[email protected]> Mar 08 12:26PM -0800 
>
>    Hi,
>
>    An easier way to handle digits only, or letters only, is to use this
>    from the FAQ (http://code.google.com/p/tesseract-ocr/wiki/FAQ):
>
>    
> ----------------------------------------------------------------------------------------------------------------------------
>    How do I recognize only digits?
>
>    In 2.03 and above:
>
>    Use
>
>    TessBaseAPI::SetVariable("tessedit_char_whitelist", "0123456789");
>
>    BEFORE calling an Init function or put this in a text file called
>    tessdata/configs/digits:
>
>    tessedit_char_whitelist 0123456789
>
>    and then your command line becomes:
>
>    tesseract image.tif outputbase nobatch digits
>
>    Warning: Until the old and new config variables get merged, you must
>    have the nobatch parameter too.
>
>    
> ----------------------------------------------------------------------------------------------------------------------------
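The config-file route quoted above can be sketched in Python. This is only a rough sketch under assumptions: the helper names `write_whitelist_config` and `build_command` are mine, not part of Tesseract, and the demo writes into a throwaway directory instead of a real tessdata folder. It only prepares the file and the argument list; it does not run tesseract.

```python
import tempfile
from pathlib import Path


def write_whitelist_config(tessdata_dir, name, chars):
    """Write a one-line config file under <tessdata_dir>/configs/<name>.

    Hypothetical helper: the file format (variable name, a space, the
    allowed characters) is the one shown in the quoted FAQ entry.
    """
    configs = Path(tessdata_dir) / "configs"
    configs.mkdir(parents=True, exist_ok=True)
    path = configs / name
    path.write_text(f"tessedit_char_whitelist {chars}\n", encoding="ascii")
    return path


def build_command(image, outputbase, config_name):
    """Return the argument list for the command line quoted from the FAQ.

    'nobatch' is kept because the FAQ warns it is still required.
    """
    return ["tesseract", image, outputbase, "nobatch", config_name]


# Demo in a temporary directory (assumption: a real run would point at
# the tessdata directory of the local Tesseract install instead).
demo_dir = tempfile.mkdtemp()
digits_path = write_whitelist_config(demo_dir, "digits", "0123456789")
print(digits_path.read_text(), end="")
print(" ".join(build_command("image.tif", "outputbase", "digits")))
```

The resulting list could be handed to `subprocess.run` once tesseract is installed. For Joe's hexadecimal case, the same helper could presumably be called with "0123456789ABCDEF", though I have not checked how well a whitelist with letters behaves in 2.0X.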
>
>    For the second part: "I'm willing to review the recognised letters
>    with the possible words so we can improve the accuracy"
>
>    If you are using a 2.0X version, you could use eng.user-words (a
>    user dictionary), as suggested in the FAQ
>    (http://code.google.com/p/tesseract-ocr/wiki/FAQ):
>
>
>    
> ----------------------------------------------------------------------------------------------------------------------------
>    How do I provide my own dictionary?
>
>    Easy: Replace tessdata/eng.user-words with your own word list, in the
>    same format - UTF8 text, one word per line.
>
>    More difficult, but better for a large dictionary: Replace
>    tessdata/eng.word-dawg with one created from your own word list, using
>    wordlist2dawg. See the TrainingTesseract wiki page for details.
>
>    
> ----------------------------------------------------------------------------------------------------------------------------
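The "easy" option above (a UTF-8 file with one word per line) could be produced with a few lines of Python. Again only a sketch under assumptions: `write_user_words` is a hypothetical helper name, the sample words are made up, and the demo writes to a temporary directory rather than overwriting a real tessdata/eng.user-words.

```python
import tempfile
from pathlib import Path


def write_user_words(tessdata_dir, words, lang="eng"):
    """Write <lang>.user-words as UTF-8 text, one word per line.

    Hypothetical helper: blank entries are skipped and duplicates are
    dropped, keeping the first occurrence of each word.
    """
    seen = set()
    cleaned = []
    for word in words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            cleaned.append(word)
    path = Path(tessdata_dir) / f"{lang}.user-words"
    path.write_text("".join(w + "\n" for w in cleaned), encoding="utf-8")
    return path


# Demo with made-up words in a temporary directory (assumption: on a
# real install you would back up the existing file before replacing it).
demo_dir = tempfile.mkdtemp()
words_path = write_user_words(demo_dir, ["invoice", " total ", "", "invoice", "Colombo"])
print(words_path.read_text(encoding="utf-8"), end="")
```

This covers only the plain word-list route; the word-dawg route still needs wordlist2dawg as described on the TrainingTesseract page.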
>
> --
http://coders-view.blogspot.com/
http://thilankagekawuluwa.blogspot.com/
http://twitter.com/thilanka_k

