Hi all, I was wondering how well OCRopus handles (or could be made to handle) text lines with very little vertical separation. Specifically, I have a set of documents that contain blocks of text where the vertical spacing between lines is negative, so characters on adjacent lines frequently overlap. It seems to me that OCRopus's strategy of over-segmentation followed by alignment with a language model could work well for recognizing these characters, especially if segments can be assigned to more than one line. However, that requires getting past the line segmentation stage first. These lines are so closely spaced that they often appear to be a single block, and OCRopus's default page pre-processing mistakes them for oversized non-text.
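To illustrate the segmentation problem, here is a minimal sketch. It assumes NumPy and a synthetic binary image, and it is not OCRopus code: the helper names `count_row_bands` and `synthetic_page` are made up for this example, and the row-projection test is only a crude stand-in for a real line finder. Each "text line" is drawn as a solid bar, so it ignores ascenders, descenders, and skew.

```python
import numpy as np

def count_row_bands(binary):
    """Count maximal runs of consecutive rows that contain any ink."""
    has_ink = binary.sum(axis=1) > 0
    # A band starts wherever an inked row follows a blank row (or the top edge).
    previous = np.concatenate(([False], has_ink[:-1]))
    starts = has_ink & ~previous
    return int(starts.sum())

def synthetic_page(line_height, spacing, n_lines=3, width=40):
    """Stack n_lines solid bars; spacing < 0 makes neighbouring bars overlap.

    Hypothetical helper for illustration only. Assumes line_height + spacing > 0.
    """
    pitch = line_height + spacing
    height = pitch * (n_lines - 1) + line_height + 2  # one blank row top and bottom
    page = np.zeros((height, width), dtype=np.uint8)
    for i in range(n_lines):
        top = 1 + i * pitch
        page[top:top + line_height, 5:width - 5] = 1
    return page

separated = synthetic_page(line_height=10, spacing=4)
overlapping = synthetic_page(line_height=10, spacing=-2)

print(count_row_bands(separated))    # three distinct bands
print(count_row_bands(overlapping))  # a single merged band
```

With positive spacing the blank rows between bars separate the lines, but with negative spacing no blank row survives and the whole block collapses into one tall component, which is roughly what I see being rejected as non-text.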
Thanks!
Derek
