Short columns are generally a problem, and there is no simple, general-purpose solution that works automatically for arbitrary documents. In some cases, the only way to tell which lines belong together is to check which combination of lines makes the most sense at the textual level.
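To make the "textual level" idea concrete, here is a minimal sketch that scores candidate reading orders by how plausible the text is across each line break. It is not part of OCRopus: the function names (`bigram_counts`, `score_reading_order`, `best_reading_order`) and the toy reference text are assumptions made up for illustration, and a real system would likely use a proper language model rather than raw bigram counts.

```python
from collections import Counter


def bigram_counts(text):
    """Count adjacent word pairs in a reference text (hypothetical helper)."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))


def score_reading_order(lines, counts):
    """Sum reference-bigram counts for the word pairs that span each line break."""
    score = 0
    for prev, nxt in zip(lines, lines[1:]):
        prev_words = prev.lower().split()
        next_words = nxt.lower().split()
        if prev_words and next_words:
            # Last word of one line followed by first word of the next line.
            score += counts[(prev_words[-1], next_words[0])]
    return score


def best_reading_order(candidates, counts):
    """Return the candidate line ordering with the highest score."""
    return max(candidates, key=lambda lines: score_reading_order(lines, counts))


# Toy data: two short columns of two lines each (assumed example, not real OCR output).
reference = "the quick brown fox jumps over the lazy dog sleeps"
counts = bigram_counts(reference)

# Reading each column top to bottom, left column first.
column_wise = ["the quick brown", "fox jumps", "over the lazy", "dog sleeps"]
# Reading straight across the page, ignoring the column gap.
row_wise = ["the quick brown", "over the lazy", "fox jumps", "dog sleeps"]

best = best_reading_order([row_wise, column_wise], counts)
```

Here the column-wise ordering wins because every line break in it joins two words that also appear next to each other in the reference text. With real pages, the reference counts would have to come from text in the same language and domain, which is an assumption this sketch does not address.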
If you have a collection of pages, you can train layout analysis models on it. We've published a couple of papers on trainable layout analysis, but that code hasn't been integrated into OCRopus yet.
