Hi Joe and Moffette,

Thanks for the tips you provided; they are very helpful to me. I'm
testing your instructions these days. Thanks again.

Regards,
Thilanka
>
>
>
> Topic: word review <http://groups.google.com/group/tesseract-ocr/t/4e723fa1766b7167>
>
> Joe K <[email protected]> Mar 08 11:02AM -0800
>
> Hey Thilanka,
>
> I ran into a similar problem when I only needed it to look at
> hexadecimal values. What I ended up doing was creating a separate
> "language" that only contained the specified characters. So you could
> create a language of numbers and a language of letters and use
> tesseract to read each part of your image using the appropriate
> language.
>
> The web address below shows you how to train tesseract for a specific
> language. Hope this helps.
>
> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract
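The per-region idea above could be driven by a small helper that builds one tesseract command line per cropped image. This is only a rough C++ sketch under stated assumptions: the `Region` struct, the `build_commands` function, and the language names `num` and `let` are hypothetical, the regions are assumed to be cropped into separate image files already, and the custom languages are assumed to be trained and installed in tessdata. Only the `-l lang` option comes from the tesseract command line itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One pre-cropped part of the page and the trained "language" to read it with.
// All three fields are illustrative; none of these names come from Tesseract.
struct Region {
    std::string image;   // cropped image file for this part of the page
    std::string output;  // output base name passed to tesseract
    std::string lang;    // custom language name, e.g. a digits-only one
};

// Builds one "tesseract <image> <outputbase> -l <lang>" command per region.
// The commands are only returned as strings; nothing is executed here.
std::vector<std::string> build_commands(const std::vector<Region>& regions) {
    std::vector<std::string> commands;
    for (const Region& region : regions) {
        // A region without a language would fall back to the default one,
        // which defeats the purpose of splitting the image.
        assert(!region.lang.empty());
        commands.push_back("tesseract " + region.image + " " + region.output +
                           " -l " + region.lang);
    }
    return commands;
}
```

For a serial-number field read with `num` and a name field read with `let`, this would yield `tesseract serial.tif serial -l num` and `tesseract name.tif name -l let`, which could then be run from a script.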
>
> Moffette <[email protected]> Mar 08 12:26PM -0800
>
> Hi,
>
> An easier way to handle digits only or letters only is to use this tip
> from the FAQ (http://code.google.com/p/tesseract-ocr/wiki/FAQ):
>
>
> ----------------------------------------------------------------------------------------------------------------------------
> How do I recognize only digits?
>
> In 2.03 and above:
>
> Use
>
> TessBaseAPI::SetVariable("tessedit_char_whitelist", "0123456789");
>
> BEFORE calling an Init function or put this in a text file called
> tessdata/configs/digits:
>
> tessedit_char_whitelist 0123456789
>
> and then your command line becomes:
>
> tesseract image.tif outputbase nobatch digits
>
> Warning: Until the old and new config variables get merged, you must
> have the nobatch parameter too.
>
>
> ----------------------------------------------------------------------------------------------------------------------------
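The same config-file approach might also cover the hexadecimal case mentioned earlier in the thread, by whitelisting the sixteen hex digits instead of ten. Below is a minimal C++ sketch that produces the text of such a config file. The helper names `write_whitelist_config` and `whitelist_config_text` are hypothetical and not part of Tesseract; only the `tessedit_char_whitelist` variable and the "variable value" line format are taken from the FAQ excerpt above.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper (not a Tesseract function): writes a one-line config
// in the "variable value" form shown in the FAQ excerpt.
void write_whitelist_config(std::ostream& out, const std::string& chars) {
    // An empty whitelist would leave no characters to recognize.
    assert(!chars.empty());
    out << "tessedit_char_whitelist " << chars << '\n';
}

// Convenience wrapper that returns the config text as a string.
std::string whitelist_config_text(const std::string& chars) {
    std::ostringstream buffer;
    write_whitelist_config(buffer, chars);
    return buffer.str();
}
```

Saving the result of `whitelist_config_text("0123456789ABCDEF")` as, say, `tessdata/configs/hexdigits` (a made-up name) should then allow `tesseract image.tif outputbase nobatch hexdigits`, by analogy with the digits example.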
>
> For the second part: "I'm willing to review the recognised letters
> with the possible words so we can improve the accuracy"
>
> If you are using a 2.0X version, you could use eng.user-words (a
> user dictionary), as suggested in the FAQ
> (http://code.google.com/p/tesseract-ocr/wiki/FAQ):
>
>
>
> ----------------------------------------------------------------------------------------------------------------------------
> How do I provide my own dictionary?
>
> Easy: Replace tessdata/eng.user-words with your own word list, in the
> same format - UTF8 text, one word per line.
>
> More difficult, but better for a large dictionary: Replace
> tessdata/eng.word-dawg with one created from your own word list, using
> wordlist2dawg. See the TrainingTesseract wiki page for details.
>
>
> ----------------------------------------------------------------------------------------------------------------------------
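For the "easy" route, the word list only needs to be UTF-8 text with one word per line. A small C++ sketch for tidying a raw list into that shape follows. The function `normalize_word_list` is hypothetical, and the trimming, de-duplication, and sorting are my own assumptions about what makes a clean list; the FAQ excerpt only states the one-word-per-line format, and I do not know whether Tesseract cares about order or duplicates.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: takes raw text with one candidate word per line,
// trims surrounding whitespace, drops blank lines and duplicates, and
// returns the words one per line. Bytes are passed through untouched, so
// UTF-8 input stays UTF-8.
std::string normalize_word_list(const std::string& raw) {
    static const char* const kWhitespace = " \t\r\n";
    std::set<std::string> words;  // keeps entries unique and byte-sorted
    std::istringstream in(raw);
    std::string line;
    while (std::getline(in, line)) {
        const std::string::size_type first = line.find_first_not_of(kWhitespace);
        if (first == std::string::npos) continue;  // blank or whitespace-only
        const std::string::size_type last = line.find_last_not_of(kWhitespace);
        assert(last >= first);  // holds whenever a non-blank character exists
        words.insert(line.substr(first, last - first + 1));
    }
    std::string out;
    for (const std::string& word : words) {
        out += word;
        out += '\n';
    }
    return out;
}
```

The returned text could then be written over `tessdata/eng.user-words`, after keeping a backup of the original file.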
>
> --
http://coders-view.blogspot.com/
http://thilankagekawuluwa.blogspot.com/
http://twitter.com/thilanka_k