On Sat, Mar 5, 2016 at 12:12 PM, Tom Morris <[email protected]> wrote:
> On Saturday, March 5, 2016 at 5:11:55 AM UTC-5, Gunasekaran Velu wrote:
>>
>> >tesseract.exe Underline.png Underline -l eng -psm 1
>>
>> Result: This is underline word @
>>
>> Does it possible to do OCR recognition for underlined text/word on the
>> image? or some image processing need to apply on the image?
>>
>> Attached sample image.
>>
>
> Tesseract knows how to recognize underlined text, as you can see from that
> fact that it got "underline" correct in your example. For some reason it's
> getting confused by the underlined word "test", perhaps because it's at the
> end of the line?
>
> It could potentially represent a bug, but I'd try to recreate it with a
> less artificial example. Of course, pre-processing would improve the
> situation and removing underlines should be that hard to do.
>

There's a critical word missing from what I wrote and perhaps my English is
a little ambiguous too, so let me try again:

It could *potentially* represent a bug, but, *if I were you,* I'd try to
recreate it with a less artificial example *and if you confirm that it's a
real bug, file a bug report with all the details of your findings so that
one of the developers can look at it*. Of course, pre-processing would
improve the situation and removing underlines should *not* be that hard to
do.

The most direct route to success, in my opinion, is going to be
pre-processing to remove the underlines. When you're working on this and
testing the results, you should make sure that you work on representative
images, not little tiny fragments of a few words. When Tesseract has normal
page boundaries, multiple lines of text, etc., it has much more information
available to it to figure out font size, line spacing, etc.

If you need help in figuring out how to do the line removal, there are
tutorials available on the web, but any recipe is going to need tuning and
experimentation to work best with your particular application.

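For illustration only, here is a minimal, dependency-free Python sketch of
the idea behind such recipes: treat any horizontal run of ink that is wider
than a letter stroke as an underline and erase it. The function name
`remove_horizontal_lines` and the `min_len` threshold are placeholders of
mine, not part of Tesseract, OpenCV, or Leptonica, and the sketch assumes
the image has already been binarized and deskewed. In an image library, a
morphological opening with a wide, one-pixel-tall structuring element
isolates the same pixels, which you would then subtract from the image.

```python
def remove_horizontal_lines(image, min_len):
    """Return a copy of a binary image with long horizontal runs erased.

    image   -- list of rows, each a list of 0 (background) or 1 (ink)
    min_len -- assumed tuning parameter: runs of ink at least this many
               pixels wide are treated as underlines and removed
    """
    cleaned = [row[:] for row in image]  # leave the caller's image untouched
    for row in cleaned:
        width = len(row)
        x = 0
        while x < width:
            if row[x] == 0:
                x += 1
                continue
            # Measure the run of ink that starts at x.
            start = x
            while x < width and row[x] == 1:
                x += 1
            if x - start >= min_len:
                # Wider than any letter stroke: assume it is an underline.
                for i in range(start, x):
                    row[i] = 0
    return cleaned


# Tiny synthetic example: short "letter" strokes above one long underline.
page = [
    [0, 1, 1, 0, 0, 1, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
]
result = remove_horizontal_lines(page, min_len=6)
```

Expect to tune `min_len` against real scans. Set too low, it will eat the
crossbars of letters like T, and wherever a descender crosses the underline
a small gap is left in the glyph.
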
http://docs.opencv.org/3.1.0/d1/dee/tutorial_moprh_lines_detection.html
http://www.leptonica.com/line-removal.html

If you've got additional questions, feel free to address them to the list
rather than me personally. I wasn't offering to help you debug this for free
or to write the application for you.

Tom

