Is it not possible in your project to include a post-processing stage in which you search the tesseract output for the words you need? Something like a script that calls tesseract and then looks for those words in the output.
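A minimal sketch of such a post-processing script follows, in Python. It is only an illustration, not a tested solution for this project. It assumes a tesseract build that accepts `stdout` as the output base, and it assumes the six labels are Top, Bottom, Left, Right, Front and Back (the original post names only the first three). The function names and the similarity cutoff are made up for the example.

```python
import difflib
import re
import subprocess

# Assumed label set: the original post names only Top, Bottom and Left;
# Right, Front and Back are guesses to complete the six.
LABELS = ["Top", "Bottom", "Left", "Right", "Front", "Back"]


def run_tesseract(image_path):
    """Run the tesseract CLI on an image and return the recognized text.

    Assumes a tesseract version that writes to standard output when the
    output base is given as "stdout".
    """
    result = subprocess.run(
        ["tesseract", image_path, "stdout"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def find_labels(ocr_text, labels=LABELS, cutoff=0.7, min_len=3):
    """Return the known labels that approximately match a token in ocr_text.

    Tokens shorter than min_len are skipped as noise. The cutoff is a
    difflib similarity ratio and is an arbitrary starting value.
    """
    by_lower = {label.lower(): label for label in labels}
    found = []
    for token in re.findall(r"[A-Za-z0-9]+", ocr_text):
        if len(token) < min_len:
            continue
        match = difflib.get_close_matches(
            token.lower(), list(by_lower), n=1, cutoff=cutoff
        )
        if match and by_lower[match[0]] not in found:
            found.append(by_lower[match[0]])
    return found
```

Usage would be something like `find_labels(run_tesseract("box.png"))`, where `box.png` is a placeholder file name. The fuzzy match is there so that a slightly misread word still counts; the cutoff and minimum length would need tuning against real output.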
Maybe (just maybe, I have never tried anything like that) you can train tesseract to recognize those specific words, but in my experience tesseract doesn't work well with non-continuous characters (so with whole words I think it would be worse...).

2012/10/4 Brian Hayward <[email protected]>

> Hi, I'm working on a project that is using OCR to detect
> different labeled faces on a box (e.g. Top, Bottom, Left, etc.). I've
> gotten the OCR working but the results leave something to be desired. I
> can control how big the font size will be on the box, and there are only
> six words I'm looking for. So my question is, is there a way to set
> tesseract to ignore things below a certain size so I can help filter out
> the noise? So I can use a decently large font, and it will know that
> anything smaller than that should just be ignored? Also, I tried setting
> up an eng.user-words file to create a small dictionary for myself, but it
> didn't appear to work. Is there a guide for how to set that up so I can
> have tesseract just look for the 6 words I'm using and ignore everything
> else?
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en

--
Francisco Loché Costa,
Ingeniero Técnico de Telecomunicación, esp. Telemática.

