I am not too familiar with the Tesseract-OCR code, but there is a solution for 
this problem. If the pre-processor removes the underline from the text (usually 
in Courier and other fixed-pitch fonts), then the same code should repair the 
broken characters so they look close to what they should be. I tried this on 
fixed-pitch fonts in my own pre-processor, and it restored the characters just 
fine. Note that this applies to underlines, not strike-throughs.
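
A minimal sketch of the remove-then-repair idea, not the pre-processor 
described above. It assumes a binary bitmap stored as a list of rows (1 = ink) 
and treats any row that is almost entirely ink as part of the underline. After 
blanking those rows, it re-inks a column only where the pixels immediately 
above and below the removed band are both ink, i.e. where a vertical stroke 
such as a descender crossed the line. The function names and the `min_fill` 
threshold are hypothetical.

```python
from typing import List

Bitmap = List[List[int]]  # rows of 0/1 pixels, 1 = ink


def find_underline_rows(img: Bitmap, min_fill: float = 0.9) -> List[int]:
    """Return indices of rows that are almost entirely ink (assumed underline)."""
    if not img:
        return []
    width = len(img[0])
    return [y for y, row in enumerate(img) if sum(row) >= min_fill * width]


def remove_underline_and_repair(img: Bitmap, rows: List[int]) -> Bitmap:
    """Blank the underline rows, then re-ink columns where a stroke crosses them.

    Assumes `rows` is a sorted, contiguous band of row indices.
    Returns a new bitmap; the input is not modified.
    """
    out = [row[:] for row in img]
    if not rows:
        return out
    above, below = rows[0] - 1, rows[-1] + 1
    for x in range(len(img[0])):
        # Assume a stroke crosses the underline if there is ink directly
        # above and directly below the removed band in this column.
        crosses = (
            above >= 0 and below < len(img)
            and img[above][x] == 1 and img[below][x] == 1
        )
        for y in rows:
            out[y][x] = 1 if crosses else 0
    return out
```

For example, a 4x5 bitmap with a vertical stroke in column 1 and a full-width 
underline in row 2 comes back with row 2 reduced to just the stroke pixel. 
Only strokes that cross the band vertically are restored, so slanted or curved 
strokes may need a wider neighbourhood check.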

Hussein

Date: Mon, 9 Mar 2009 21:32:10 -0800
Subject: Re: Recovering Text Underlines
From: [email protected]
To: [email protected]

No code was ever written to do this properly. Characters are chopped from 
underlines, and the remaining underlines are put back in a list. If you iterate 
over the underlines list in the TO_BLOCK, you could potentially mark up blobs 
that have an underline under them, or it might be feasible to add code to do 
this to the restore_underlines function. In either case, you have to dig into 
the guts and write code to do it.
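
A rough sketch of the "mark up blobs" idea, written in Python rather than 
Tesseract's C++ and deliberately not using the real TO_BLOCK or blob classes. 
Given bounding boxes for the recovered underline segments and for the character 
blobs, it flags each blob that an underline covers horizontally and that sits 
just above it. The `Box` type, the image-style coordinates (y grows downward; 
adjust if your boxes use a bottom-left origin), and the `max_gap` and 
`min_overlap` tolerances are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Hypothetical axis-aligned box; y grows downward (image coordinates)."""
    left: int
    top: int
    right: int
    bottom: int


def mark_underlined_blobs(blobs: List[Box], underlines: List[Box],
                          max_gap: int = 3,
                          min_overlap: float = 0.5) -> List[bool]:
    """Return one flag per blob: True if an underline lies just beneath it."""
    flags = []
    for blob in blobs:
        width = blob.right - blob.left
        marked = False
        for ul in underlines:
            # Horizontal overlap between the blob and the underline segment.
            overlap = min(blob.right, ul.right) - max(blob.left, ul.left)
            covers = width > 0 and overlap >= min_overlap * width
            # The underline's top edge must lie below the blob's top and at
            # most max_gap pixels below its bottom (it may cut descenders).
            near = blob.top < ul.top <= blob.bottom + max_gap
            if covers and near:
                marked = True
                break
        flags.append(marked)
    return flags
```

Comparing the underline's top edge with the whole blob box, rather than 
requiring a clear gap, lets the check also catch descenders that the underline 
cuts through.
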
Ray.

On Wed, Feb 4, 2009 at 10:00 AM, Lincolin <[email protected]> wrote:

Hello everybody,

Does anyone know how I can recover the removed underlines from the
recognized blocks?

Thanks in advance for any help.

Lincolin

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/tesseract-ocr?hl=en
-~----------~----~----~----~------~----~------~--~---