We're reimplementing and improving a text/graphics segmentation algorithm from Leptonica; that's a pretty standard morphology-based algorithm.
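For illustration only, here is a minimal sketch of the general idea behind a morphology-based text/graphics split. It is not Leptonica's or OCRopus's actual code. It assumes a binary image stored as nested lists (1 = ink), and the function names and the structuring-element size `k` are hypothetical choices. A morphological opening with a k x k square keeps only solid regions at least k pixels thick, which are treated as graphics; the thin strokes that remain are treated as text.

```python
def erode(img, k):
    """Binary erosion with a k x k square (k odd); outside the image counts as 0."""
    h, w = len(img), len(img[0])
    rng = range(-(k // 2), k // 2 + 1)
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy in rng for dx in rng))
             for x in range(w)] for y in range(h)]


def dilate(img, k):
    """Binary dilation with a k x k square (k odd)."""
    h, w = len(img), len(img[0])
    rng = range(-(k // 2), k // 2 + 1)
    return [[int(any(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy in rng for dx in rng))
             for x in range(w)] for y in range(h)]


def split_text_graphics(img, k=3):
    """Return (text, graphics) masks using a morphological opening.

    Opening (erosion followed by dilation) removes strokes thinner than k,
    so what survives is labeled graphics and the rest of the ink is text.
    """
    graphics = dilate(erode(img, k), k)
    text = [[int(p and not g) for p, g in zip(img_row, g_row)]
            for img_row, g_row in zip(img, graphics)]
    return text, graphics


# Toy page: a solid 4x4 block (graphics) and two 1-pixel-wide strokes (text).
row = [0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
page = [[0] * 12] + [row[:] for _ in range(4)] + [[0] * 12]
text, graphics = split_text_graphics(page, k=3)
```

A real implementation would work on scanned pages at much larger scales and typically adds steps such as seed filling and component filtering; the sketch only shows why opening separates thick regions from thin strokes.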
We have also implemented text/graphics segmentation based on machine learning. Both will make it into the codebase at some point, but I can't say exactly when. Both work for arbitrary layouts.

To actually analyze the resulting layouts, you need to use the Voronoi page segmenter. It performs worse on Manhattan layouts, but it works on many non-Manhattan documents.

Tom

On May 27, 7:07 am, avd <[email protected]> wrote:
> Which algorithms does Ocropus use for separating text and graphics
> from a document image which can have arbitrary non-Manhattan layout?
