>From: [EMAIL PROTECTED]
>Sub/super no. You would have to evaluate that yourself from the bounding box
>Underlined no, but it does detect underlines, so there is a chance the
>information can be recovered.
>Strikethrough no, and not much hope either.
>Ray.
The post-processor that detects lower and upper case should also detect
sub/superscripts (this is done by the caller, not by Tesseract).
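
A rough sketch of how a caller might do this from the bounding boxes alone. Everything here is an assumption for illustration: the function name, the (left, top, right, bottom) box format with y growing downward, and the 0.8 / 0.2 / 0.3 thresholds are not part of Tesseract. Small punctuation such as apostrophes and commas would need extra handling, since they also look like small shifted glyphs.

```python
from statistics import median

def classify_script(boxes):
    """Label each character box on one text line as 'normal', 'sub' or 'super'.

    boxes: list of (left, top, right, bottom), y growing downward (assumed format).
    """
    if not boxes:
        return []
    # Use medians as the reference so a few shifted glyphs do not skew it.
    baseline = median(b[3] for b in boxes)
    height = median(b[3] - b[1] for b in boxes)
    labels = []
    for left, top, right, bottom in boxes:
        small = (bottom - top) < 0.8 * height  # hypothetical size threshold
        if small and bottom > baseline + 0.2 * height:
            labels.append("sub")      # small glyph hanging below the baseline
        elif small and bottom < baseline - 0.3 * height:
            labels.append("super")    # small glyph sitting well above the baseline
        else:
            labels.append("normal")
    return labels
```

For example, a line with full-height glyphs at top=10, bottom=30 and one small glyph whose bottom is at 36 would have that glyph labelled "sub".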
Underlining usually occurs in typed documents (produced on a typewriter), such
as legal and government documents, which are almost always in Courier; that
makes the problem a little less complex.
Strikethrough should be handled in pre-processing, before segmentation and
before finding the bounding boxes, and sometimes even before text/non-text
detection if the strikethrough is a long one.
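
As a minimal sketch of that kind of pre-processing step, assuming a binarized page held as a list of rows of 0/1 values (1 = ink): erase every horizontal ink run that is at least min_run pixels long. The function name and the min_run parameter are hypothetical, and this naive version also removes the glyph pixels the line passes through, so a real pipeline would have to repair those strokes afterwards.

```python
def remove_long_horizontal_runs(image, min_run):
    """Return a copy of a 0/1 image with long horizontal ink runs erased.

    image: list of rows, each a list of 0/1 values (1 = ink), an assumed format.
    min_run: minimum run length, in pixels, to treat as a drawn line.
    """
    cleaned = [row[:] for row in image]
    for y, row in enumerate(image):
        x = 0
        width = len(row)
        while x < width:
            if row[x] == 1:
                start = x
                # Walk to the end of this run of ink pixels.
                while x < width and row[x] == 1:
                    x += 1
                if x - start >= min_run:
                    # Long enough to be a line rather than a glyph stroke.
                    for i in range(start, x):
                        cleaned[y][i] = 0
            else:
                x += 1
    return cleaned
```

Choosing min_run well above the widest expected character keeps ordinary horizontal strokes (as in "T" or "E") untouched.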
Hussein Al-Hussein