Hi,
An easier way to restrict recognition to digits only or letters only is to use this tip from the
FAQ (http://code.google.com/p/tesseract-ocr/wiki/FAQ):
----------------------------------------------------------------------------------------------------------------------------
How do I recognize only digits?
In 2.03 and above:
Use

    TessBaseAPI::SetVariable("tessedit_char_whitelist", "0123456789");

BEFORE calling an Init function, or put this in a text file called
tessdata/configs/digits:

    tessedit_char_whitelist 0123456789

and then your command line becomes:

    tesseract image.tif outputbase nobatch digits
Warning: Until the old and new config variables get merged, you must
have the nobatch parameter too.
----------------------------------------------------------------------------------------------------------------------------
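The whitelist works inside Tesseract while it recognises each character. If you also want a check after recognition (for example on output produced without the whitelist), here is a minimal C++ post-processing sketch. It is not part of Tesseract's API. It rewrites OCR text according to the field type: in a digits field, letters that look like digits (O, l, S, ...) become those digits, and in a letters field the reverse happens. The names FieldType, to_digit, to_letter and restrict_to_field and the confusion table itself are hypothetical, so adjust the table to the mistakes you actually see. It handles ASCII only and drops everything else, including date separators such as '/'.

```cpp
#include <cassert>
#include <string>

// Hypothetical field classes for the form: date-of-birth fields hold digits
// only, name fields hold letters (and spaces) only.
enum class FieldType { Digits, Letters };

// Hypothetical confusion table: map a character that is often misread for a
// digit onto that digit. Returns '\0' when there is no plausible digit.
char to_digit(char c) {
    if (c >= '0' && c <= '9') return c;
    switch (c) {
        case 'O': case 'o': return '0';
        case 'I': case 'l': return '1';
        case 'Z': case 'z': return '2';
        case 'S': case 's': return '5';
        case 'B': return '8';
        default: return '\0';
    }
}

// Reverse table: map a digit that is often misread for a letter onto that
// letter. Returns '\0' when there is no plausible letter.
char to_letter(char c) {
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) return c;
    switch (c) {
        case '0': return 'O';
        case '1': return 'I';
        case '2': return 'Z';
        case '5': return 'S';
        case '8': return 'B';
        default: return '\0';
    }
}

// Rewrite OCR output so it only contains characters valid for the field.
// Characters without a plausible substitute (punctuation, non-ASCII bytes)
// are dropped; spaces are kept in letter fields so full names survive.
std::string restrict_to_field(const std::string& text, FieldType type) {
    std::string out;
    for (char c : text) {
        char mapped = '\0';
        if (type == FieldType::Digits) {
            mapped = to_digit(c);
        } else {
            mapped = (c == ' ') ? ' ' : to_letter(c);
        }
        if (mapped != '\0') out.push_back(mapped);
    }
    return out;
}
```

For example, restrict_to_field("2O1O", FieldType::Digits) gives "2010", and restrict_to_field("J0HN", FieldType::Letters) gives "JOHN". Because the mapping ignores context, it can also turn a genuine misreading into another wrong character, so treat it as a complement to the whitelist rather than a replacement.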
For the second part ("I'm willing to review the recognised letters with the
possible words so we can improve the accuracy"):
if you are using a 2.0x version, you could use eng.user-words (a user
dictionary), as suggested in the FAQ
(http://code.google.com/p/tesseract-ocr/wiki/FAQ):
----------------------------------------------------------------------------------------------------------------------------
How do I provide my own dictionary?
Easy: Replace tessdata/eng.user-words with your own word list, in the
same format - UTF8 text, one word per line.
More difficult, but better for a large dictionary: Replace
tessdata/eng.word-dawg with one created from your own word list, using
wordlist2dawg. See the TrainingTesseract wiki page for details.
----------------------------------------------------------------------------------------------------------------------------
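Tesseract consults eng.user-words itself during recognition. If you also want to review the recognised names afterwards, one option is to snap each recognised word to the closest entry of the same word list using a standard Levenshtein edit-distance lookup. This technique is swapped in here and is not something the FAQ describes. Below is a minimal C++ sketch, assuming ASCII words and a hypothetical threshold of max_distance = 1 edit. The names load_word_list, edit_distance and correct_word are hypothetical helpers, not Tesseract functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: load a word list in the eng.user-words format
// (UTF-8 text, one word per line), skipping blank lines.
std::vector<std::string> load_word_list(const std::string& path) {
    std::vector<std::string> words;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF files
        if (!line.empty()) words.push_back(line);
    }
    return words;
}

// Levenshtein distance: insert, delete and substitute each cost 1.
// Compares bytes, so a multi-byte UTF-8 character counts as several edits.
std::size_t edit_distance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t sub = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, sub});
        }
        std::swap(prev, cur);  // the row just filled becomes the previous row
    }
    return prev[b.size()];
}

// Hypothetical helper: return the closest dictionary word if it is within
// max_distance edits, otherwise return the recognised word unchanged.
// On ties the earliest entry in the list wins.
std::string correct_word(const std::string& recognized,
                         const std::vector<std::string>& dictionary,
                         std::size_t max_distance = 1) {
    std::string best = recognized;
    std::size_t best_dist = max_distance + 1;
    for (const auto& word : dictionary) {
        std::size_t d = edit_distance(recognized, word);
        if (d < best_dist) {
            best_dist = d;
            best = word;
        }
    }
    return best;
}
```

With the dictionary {"THILANKA", "JOSEPH", "MARY"}, correct_word("J0SEPH", dict) returns "JOSEPH" (one substitution), while a word with no entry within one edit is returned unchanged. A small threshold keeps the lookup from "correcting" names that are simply missing from the list.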
On Mar 8, 2:02 pm, Joe K <[email protected]> wrote:
> Hey Thilanka,
>
> I ran into a similar problem when I only needed it to look at
> hexadecimal values. What I ended up doing was creating a separate
> "language" that only contained the specified characters. So you could
> create a language of numbers and a language of letters and use
> tesseract to read each part of your image using the appropriate
> language.
>
> The web address below shows you how to train tesseract for a specific
> language. Hope this helps.
>
> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract
>
> On Mar 6, 11:46 pm, Thilanka Kaushalya <[email protected]> wrote:
>
> > Hi,
>
> > I'm using Tesseract for my letter recognition project, and currently
> > the recognition is quite good. The letters are handwritten. But there
> > are some problems when I use it to recognise the letter "O" and the
> > number "0". These characters appear in data fields such as the fields
> > where names are entered, so names cannot contain any numbers. And data
> > fields such as date of birth contain only numbers. So I'd like to
> > restrict the recognition system so that each data field accepts only
> > numbers or only letters.
> > I'd also like to check the recognised letters against the possible
> > words so we can improve the accuracy of the data. But I don't have any
> > idea how to do that.
>
> > Can someone help me? Thank you.
>
> > Regards,
> > Thilanka.
> > --
> > http://coders-view.blogspot.com/
> > http://thilankagekawuluwa.blogspot.co...
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en.