I will try your suggestion. Thanks!
On Saturday, April 16, 2016, 13:56:37 (UTC-4), Tom Morris wrote:
>
> On Sat, Mar 5, 2016 at 12:12 PM, Tom Morris <[email protected]> wrote:
>
>> On Saturday, March 5, 2016 at 5:11:55 AM UTC-5, Gunasekaran Velu wrote:
>>>
>>> >tesseract.exe Underline.png Underline -l eng -psm 1
>>>
>>> Result: This is underline word @
>>>
>>> Is it possible to do OCR recognition for underlined text/words in the
>>> image? Or does some image processing need to be applied to the image?
>>>
>>> Attached sample image.
>>>
>>
>> Tesseract knows how to recognize underlined text, as you can see from
>> the fact that it got "underline" correct in your example. For some reason
>> it's getting confused by the underlined word "test", perhaps because it's
>> at the end of the line?
>>
>> It could potentially represent a bug, but I'd try to recreate it with a
>> less artificial example. Of course, pre-processing would improve the
>> situation and removing underlines should be that hard to do.
>>
>
> There's a critical word missing from what I wrote and perhaps my English
> is a little ambiguous too, so let me try again:
>
> It could *potentially* represent a bug, but, *if I were you,* I'd try to
> recreate it with a less artificial example *and if you confirm that it's
> a real bug, file a bug report with all the details of your findings so that
> one of the developers can look at it*. Of course, pre-processing would
> improve the situation and removing underlines should *not* be that hard
> to do.
>
> The most direct route to success, in my opinion, is going to be
> pre-processing to remove the underlines. When you're working on this and
> testing the results, you should make sure that you work on representative
> images, not little tiny fragments of a few words. When Tesseract has normal
> page boundaries, multiple lines of text, etc., it has much more information
> available to it to figure out font size, line spacing, etc.
>
> If you need help in figuring out how to do the line removal, there are
> tutorials available on the web, but any recipe is going to need tuning and
> experimentation to work best with your particular application.
>
> http://docs.opencv.org/3.1.0/d1/dee/tutorial_moprh_lines_detection.html
> http://www.leptonica.com/line-removal.html
>
> If you've got additional questions, feel free to address them to the list
> rather than me personally. I wasn't offering to help you debug this for
> free or to write the application for you.
>
> Tom
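
For anyone who finds this thread later, here is a rough, dependency-free sketch of the underline-removal idea suggested above. It is not code from the linked tutorials. It assumes the page is already binarized into a list of rows (1 = ink, 0 = background) and treats any horizontal ink run of at least min_run pixels as a line. That is what a morphological opening with a 1 x min_run horizontal kernel would keep. The function names and the min_run threshold are my own placeholders, and the threshold will need tuning on real scans.

```python
def find_horizontal_lines(img, min_run):
    """Return a mask marking pixels in horizontal ink runs of length >= min_run."""
    mask = [[0] * len(row) for row in img]
    for y, row in enumerate(img):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                # Walk to the end of the current run of ink pixels.
                while x < len(row) and row[x]:
                    x += 1
                if x - start >= min_run:
                    for i in range(start, x):
                        mask[y][i] = 1
            else:
                x += 1
    return mask


def remove_horizontal_lines(img, min_run):
    """Return a copy of img with the detected long horizontal runs erased."""
    mask = find_horizontal_lines(img, min_run)
    return [
        [0 if m else p for p, m in zip(row, mask_row)]
        for row, mask_row in zip(img, mask)
    ]


# Tiny hypothetical example: a few glyph pixels above a full-width underline.
page = [
    [0, 1, 0, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],  # underline
]
cleaned = remove_horizontal_lines(page, min_run=5)
```

On a real page the underline may touch descenders or be slightly skewed, so short runs can survive and long horizontal strokes in large glyphs may get erased. In practice I would expect to do this with OpenCV or Leptonica as in the tutorials and only then pass the cleaned, full-page image to Tesseract.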

