I will try your suggestion.

Thanks!

On Saturday, April 16, 2016 at 1:56:37 PM (UTC-4), Tom Morris wrote:
>
> On Sat, Mar 5, 2016 at 12:12 PM, Tom Morris <[email protected]> wrote:
>
>> On Saturday, March 5, 2016 at 5:11:55 AM UTC-5, Gunasekaran Velu wrote:
>>>
>>>
>>> >tesseract.exe Underline.png Underline -l eng -psm 1
>>>
>>> Result: This is underline word @
>>>
>>> Is it possible to run OCR on underlined text/words in an image, 
>>> or does some image processing need to be applied to the image first?
>>>
>>> Attached sample image.
>>>
>>
>> Tesseract knows how to recognize underlined text, as you can see from 
>> the fact that it got "underline" correct in your example. For some reason 
>> it's getting confused by the underlined word "test", perhaps because it's 
>> at the end of the line?
>>
>> It could potentially represent a bug, but I'd try to recreate it with a 
>> less artificial example. Of course, pre-processing would improve the 
>> situation and removing underlines should be that hard to do.
>>
>
> There's a critical word missing from what I wrote and perhaps my English 
> is a little ambiguous too, so let me try again:
>
> It could *potentially* represent a bug, but, *if I were you*, I'd try to 
> recreate it with a less artificial example *and if you confirm that it's 
> a real bug, file a bug report with all the details of your findings so that 
> one of the developers can look at it*. Of course, pre-processing would 
> improve the situation and removing underlines should *not* be that hard 
> to do.
>
> The most direct route to success, in my opinion, is going to be 
> pre-processing to remove the underlines. When you're working on this and 
> testing the results, you should make sure that you work on representative 
> images, not little tiny fragments of a few words. When Tesseract has normal 
> page boundaries, multiple lines of text, etc, it has much more information 
> available to it to figure out font size, line spacing, etc.
>
> If you need help in figuring out how to do the line removal, there are 
> tutorials available on the web, but any recipe is going to need tuning and 
> experimentation to work best with your particular application.
>
> http://docs.opencv.org/3.1.0/d1/dee/tutorial_moprh_lines_detection.html
> http://www.leptonica.com/line-removal.html
>
> If you've got additional questions, feel free to address them to the list 
> rather than me personally. I wasn't offering to help you debug this for 
> free or to write the application for you.
>
> Tom
>
>
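
For anyone following the pre-processing advice quoted above, here is a minimal pure-Python sketch of the underlying idea: treat any sufficiently long horizontal run of ink as part of an underline and erase it. This is the same effect as a morphological opening with a 1 x N horizontal element followed by subtracting the result, which is roughly the approach the linked tutorials take with real image libraries. The function name, the list-of-lists image format (1 = ink, 0 = background) and the min_run parameter are assumptions made for illustration; they are not part of Tesseract, OpenCV or Leptonica.

```python
def remove_horizontal_lines(image, min_run):
    """Return a copy of a binary image with long horizontal ink runs erased.

    image   -- list of rows, each a list of 0 (background) or 1 (ink)
    min_run -- hypothetical tuning parameter: runs of ink at least this
               many pixels long are treated as underline and cleared
    """
    cleaned = [row[:] for row in image]  # leave the caller's image untouched
    for row in cleaned:
        width = len(row)
        x = 0
        while x < width:
            if row[x] != 1:
                x += 1
                continue
            # Measure the horizontal run of ink starting at x.
            start = x
            while x < width and row[x] == 1:
                x += 1
            # Long runs are assumed to be underline, short ones glyph strokes.
            if x - start >= min_run:
                for i in range(start, x):
                    row[i] = 0
    return cleaned


# Tiny example: two glyph-like shapes sitting on a full-width underline.
sample = [
    [0, 1, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
result = remove_horizontal_lines(sample, min_run=5)
```

On a real scan the image would first need to be binarized, and min_run would need tuning so that it is longer than the widest horizontal stroke in the font. Descenders that cross the underline lose the pixels where they overlap it, so some experimentation on representative pages is still needed.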

