English Word Filtering

Jason Funk Tue, 15 Nov 2011 00:49:31 -0800

Does Tesseract make any attempts to filter out things that aren't
words? For example, I processed an image and it returned this:


"This is a slide about a mufﬁn's magical
powers. !%i
Mufﬁn Power
HI K
Q55
iii‘

E!!!
iU_
‘gm
!"

All of the real words it found are correct, but everything else is garbage. I
don't know where it's coming from; maybe the background or something. I
thought that Tesseract had a dictionary that it used to know that
"iU_" isn't a valid word. Or maybe I don't have it turned on
correctly, or configured right? Any pointers would be great.
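One workaround, independent of any Tesseract setting, is to post-filter the OCR text against a word list and drop tokens that aren't in it. The sketch below is only an illustration under stated assumptions: `WORDS` is a hypothetical hand-built word list (in practice you might load a larger dictionary file), and `normalize`, `filter_line` and `filter_ocr_text` are hypothetical helper names, not part of Tesseract.

```python
# Hypothetical word list for illustration; a real filter would load a much
# larger dictionary (this is an assumption, not something Tesseract provides).
WORDS = {"this", "is", "a", "slide", "about", "muffin's", "magical",
         "powers", "muffin", "power"}

# Punctuation stripped from the ends of each token before lookup.
EDGE_PUNCT = ".,!?\"'\u2018\u2019;:"


def normalize(token):
    """Expand the 'fi' ligature (U+FB01), lowercase, and trim edge punctuation."""
    return token.replace("\ufb01", "fi").lower().strip(EDGE_PUNCT)


def filter_line(line, words):
    """Keep only the original tokens whose normalized form is in `words`."""
    kept = [tok for tok in line.split() if normalize(tok) in words]
    return " ".join(kept)


def filter_ocr_text(text, words):
    """Filter every line of OCR output and drop lines left empty."""
    filtered = (filter_line(line, words) for line in text.splitlines())
    return [line for line in filtered if line]
```

Run on the output above, this keeps "This is a slide about a mufﬁn's magical", "powers." and "Mufﬁn Power" and drops the rest. The trade-off is that any real word missing from the list (names, numbers, jargon) gets dropped too, so the word list matters more than the code.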

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.