>From: [EMAIL PROTECTED]

>Sub/super no. You would have to evaluate that yourself from the bounding box
>Underlined no, but it does detect underlines, so there is a chance the 
>information can be recovered.
>Strikethrough no, and not much hope either.
>Ray.
The post-processor that detects lower and upper case should also detect
sub/superscripts (this is done by the caller, not by Tesseract).
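
As a rough illustration of that bounding-box check, here is a minimal Python
sketch. The function name, the (left, top, right, bottom) box format and the
0.25 tolerance are assumptions made for the example; nothing here is part of
Tesseract itself.

```python
from statistics import median

def classify_scripts(boxes, tol=0.25):
    """Label each character box on one text line as 'normal', 'sup' or 'sub'.

    boxes: list of (left, top, right, bottom) tuples, with y growing downward.
    tol:   fraction of the typical character height used as the threshold
           (an assumed value, to be tuned on real data).
    """
    if not boxes:
        return []
    baseline = median(b[3] for b in boxes)  # typical bottom edge on the line
    topline = median(b[1] for b in boxes)   # typical top edge on the line
    height = baseline - topline
    if height <= 0:
        return ["normal"] * len(boxes)

    labels = []
    for left, top, right, bottom in boxes:
        if bottom < baseline - tol * height:
            # Box ends well above the baseline: likely a superscript.
            # Quotes and apostrophes also land here and need filtering.
            labels.append("sup")
        elif bottom > baseline + tol * height and top > topline + tol * height:
            # Box starts low and hangs below the baseline: likely a subscript.
            # Requiring a low top edge keeps descenders (g, p, y) as normal.
            labels.append("sub")
        else:
            labels.append("normal")
    return labels
```

The medians only give a usable baseline when most characters on the line are
ordinary text, so a line made up mostly of sub/superscripts would need a
different reference.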
 
Underlining usually occurs in TYPED documents (typewritten, such as legal
and government documents), which are almost always set in Courier. That makes
the problem a little less complex.
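
One way the fixed pitch helps: once an underline segment has been found, it
can be mapped onto character cells by simple arithmetic. The sketch below is
only an illustration in Python; the function name, its parameters and the 50%
coverage rule are assumptions, and it presumes the line's left edge and the
character pitch have already been measured.

```python
def underlined_cells(line_left, pitch, n_chars, underline, min_cover=0.5):
    """Return indices of fixed-pitch character cells covered by an underline.

    line_left: x coordinate where the first character cell starts
    pitch:     width of one character cell (constant for Courier-like type)
    n_chars:   number of character cells on the line
    underline: (x_start, x_end) of the detected underline segment
    min_cover: fraction of a cell that must be covered (assumed threshold)
    """
    u_start, u_end = underline
    hits = []
    for i in range(n_chars):
        cell_left = line_left + i * pitch
        cell_right = cell_left + pitch
        # Horizontal overlap between this cell and the underline segment.
        overlap = min(cell_right, u_end) - max(cell_left, u_start)
        if overlap >= min_cover * pitch:
            hits.append(i)
    return hits
```

With proportional fonts the same mapping would need the individual character
boxes instead of a single pitch value.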
 
Strikethrough should be handled in pre-processing, before segmentation and
before finding the bounding boxes, and sometimes even before text/non-text
detection if the strikethrough is a long one.
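
For the long-strikethrough case, a pre-processing pass could look roughly like
the Python sketch below. It works on a binarized image held as a list of rows
of 0/1 values (1 = ink). The function name and both thresholds are assumptions
for illustration; this is a simple run-length heuristic, not a method taken
from Tesseract.

```python
def remove_long_strikes(img, min_len, max_thick=3):
    """Erase long, thin horizontal lines from a binary image.

    img:       list of rows, each a list of 0/1 ints (1 = ink)
    min_len:   minimum horizontal run length treated as a strike line
    max_thick: maximum vertical ink thickness erased at any column
    Returns a new image; the input is left unchanged.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    out = [row[:] for row in img]
    for y in range(h):
        x = 0
        while x < w:
            if img[y][x] == 0:
                x += 1
                continue
            start = x
            while x < w and img[y][x] == 1:
                x += 1
            if x - start < min_len:
                continue  # short run: ordinary character stroke
            for cx in range(start, x):
                # Measure how tall the ink is at this column.
                top = y
                while top > 0 and img[top - 1][cx] == 1:
                    top -= 1
                bot = y
                while bot < h - 1 and img[bot + 1][cx] == 1:
                    bot += 1
                # Thin ink is the line itself; tall ink is a character
                # stroke crossing the line, so it is kept.
                if bot - top + 1 <= max_thick:
                    for cy in range(top, bot + 1):
                        out[cy][cx] = 0
    return out
```

Keeping the tall columns preserves the vertical strokes that cross the line,
but horizontal strokes lying on the line (the bar of an "e", for example) are
lost, so some repair of the characters afterwards would still be needed.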
 
Hussein Al-Hussein
