I am not very familiar with the Tesseract-OCR code, but there is a solution to this problem. If the pre-processor removes the underline from the text (usually in Courier and other fixed-pitch fonts), then the same code should also repair the broken characters so that they look close to what they should be. I tried this on fixed-pitch fonts in my own pre-processor and it restored the characters just fine. Note that this applies to underlines, not to strike-throughs.
Hussein

Date: Mon, 9 Mar 2009 21:32:10 -0800
Subject: Re: Recovering Text Underlines
From: [email protected]
To: [email protected]

No code was ever written to do this properly. Characters are chopped from underlines, and the remaining underlines are put back in a list. If you iterate the underlines list in the TO_BLOCK, you could potentially mark up blobs that have an underline under them, or it might be feasible to add code to do this to the restore_underlines function. In either case, you have to dig into the guts and write code to do it.

Ray.

On Wed, Feb 4, 2009 at 10:00 AM, Lincolin <[email protected]> wrote:

Hello everybody,

Does anyone know how I can recover the removed underlines from the recognized blocks?

Thanks in advance for any help.

Lincolin
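
For anyone who wants to experiment with the idea Ray describes above (walk the list of underlines and mark the blobs that have one beneath them), here is a rough, standalone sketch of the geometry only. It does not use Tesseract's data structures: Box, horizontal_overlap, mark_underlined, and the max_gap and min_cover thresholds are all hypothetical names and values chosen for illustration, and the boxes are assumed to be in image coordinates with y growing downward. Real code would have to read the blob and underline boxes out of the TO_BLOCK.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box (hypothetical type, y grows downward)."""
    left: int
    top: int
    right: int
    bottom: int


def horizontal_overlap(a: Box, b: Box) -> int:
    """Width in pixels of the horizontal span shared by a and b (0 if none)."""
    return max(0, min(a.right, b.right) - max(a.left, b.left))


def mark_underlined(blobs, underlines, max_gap=6, min_cover=0.5):
    """Return one bool per blob: True if an underline sits at its baseline.

    A blob counts as underlined when the top of some underline is within
    max_gap pixels of the blob's bottom edge and the underline spans at
    least min_cover of the blob's width. Both thresholds are guesses and
    would need tuning for a given scan resolution.
    """
    flags = []
    for blob in blobs:
        width = max(1, blob.right - blob.left)
        underlined = False
        for line in underlines:
            gap = line.top - blob.bottom
            covered = horizontal_overlap(blob, line) / width
            if -max_gap <= gap <= max_gap and covered >= min_cover:
                underlined = True
                break
        flags.append(underlined)
    return flags


# Three character boxes on one text line; the underline spans the first two.
blobs = [Box(10, 10, 20, 30), Box(25, 10, 35, 30), Box(60, 10, 70, 30)]
underlines = [Box(8, 33, 37, 35)]
print(mark_underlined(blobs, underlines))
```

Because the vertical test only accepts lines close to the bottom edge of a blob, a rule running through the middle of the characters (a strike-through) would not be flagged by this check, as long as max_gap is small compared with the character height.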

