Does Tesseract make any attempt to filter out things that aren't words? For example, I processed an image and it returned this:
"This is a slide about a muffin's magical powers. !%i Muffin Power HI K Q55 iii‘ E!!! iU_ ‘gm !"
All of the words it found are correct, but everything else is noise. I don't know where the noise is coming from; maybe the background. I thought Tesseract had a dictionary that would tell it that "iU_" isn't a valid word. Is it possible that I don't have the dictionary turned on, or that I have it configured incorrectly? Any pointers would be great.

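To make the question concrete: the kind of filtering described above can be approximated outside Tesseract as a post-processing step on the returned text. The sketch below is only an illustration under stated assumptions, not a description of what Tesseract does internally. `KNOWN_WORDS` is a tiny hypothetical word list (a real one would come from a proper dictionary file), and `filter_tokens` is a made-up helper name.

```python
import re

# Hypothetical, deliberately tiny word list for illustration only.
KNOWN_WORDS = {
    "this", "is", "a", "slide", "about", "muffin's",
    "magical", "powers", "muffin", "power",
}

def filter_tokens(text, known_words=KNOWN_WORDS):
    """Keep whitespace-separated tokens whose letter core is a known word."""
    kept = []
    for token in text.split():
        # Strip non-letters from both ends, keeping inner apostrophes.
        core = re.sub(r"^[^A-Za-z]+|[^A-Za-z]+$", "", token).lower()
        if core and core in known_words:
            kept.append(token)
    return " ".join(kept)

raw = ("This is a slide about a muffin's magical powers. "
       "!%i Muffin Power HI K Q55 iii‘ E!!! iU_ ‘gm !")
print(filter_tokens(raw))
```

A filter like this is crude: with a full dictionary, short noise tokens such as "HI" or a stray "i" would pass as real words, so it is probably no substitute for fixing the recognition itself (for example, by cropping or cleaning the image background).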
