There are several ways of interfacing with OCRopus. By default, the command line tools are set up for recognizing books and technical reports. Those recognizers will not work well on diagrams because the layout analysis fails.
But you don't have to use the layout analysis; ocropus-linerec takes individual images of text lines and gives you the corresponding text.

There is no ready-made program for extracting text lines from diagrams right now. Eventually there will be, but for now you still need to program that yourself. There are some potentially useful tools in ocropy/ocrolib, but it's a non-trivial task.

Tom

On Jan 3, 1:42 pm, Manon wrote:
> Hi,
>
> I am a member of a bachelor project at Hasso Plattner Institute Potsdam
> (Germany). We are about to build an online process platform and for
> this we need an OCR program which is able to extract the texts from
> pictures of process models (like BPMN, EPK etc).
>
> OCRopus is the best one we found but it can't find enough of the texts
> and often nothing at all.
>
> Will this "find text in between graphs and images"-algorithm be
> implemented in the near future (let's say by the beginning of
> February)?
> Or how much work would it be to implement this? Because if it wouldn't
> go beyond the scope of our project we could implement it ourselves.
>
> Thanks in advance,
> Manon
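
For a sense of what the do-it-yourself line extraction mentioned in the reply might involve, here is a minimal sketch. It is not part of OCRopus or ocrolib: the function names `find_text_line_boxes` and `crop_lines` are hypothetical, the thresholds are untuned guesses, and the approach (smear ink horizontally, label connected blobs, keep the wide and shallow ones) is only one simple heuristic. It assumes the diagram is already binarized with ink pixels set to True, and it will miss text that touches the outline of a diagram shape, which is part of why the task is non-trivial.

```python
import numpy as np
from scipy import ndimage


def find_text_line_boxes(binary, smear=15, min_height=6, max_height=40,
                         min_aspect=1.5):
    """Return (row_slice, col_slice) boxes that look like short text lines.

    `binary` is a 2-D boolean array with ink pixels set to True.
    All thresholds are illustrative assumptions, not tuned values.
    """
    # Smear ink horizontally so the characters of one line merge into a blob.
    structure = np.ones((1, smear), dtype=bool)
    smeared = ndimage.binary_dilation(binary, structure=structure)

    # Label the connected blobs and look at each bounding box.
    labels, _count = ndimage.label(smeared)
    boxes = []
    for rows, cols in ndimage.find_objects(labels):
        height = rows.stop - rows.start
        width = cols.stop - cols.start
        # Keep wide, shallow blobs; tall or very large blobs are more
        # likely to be diagram shapes, connectors, or frames.
        if min_height <= height <= max_height and width >= min_aspect * height:
            boxes.append((rows, cols))
    return boxes


def crop_lines(binary, boxes):
    """Cut each candidate line out of the original (unsmeared) image.

    The boxes come from the smeared image, so each crop keeps a small
    horizontal margin around the ink.
    """
    return [binary[rows, cols] for rows, cols in boxes]
```

Each crop could then be saved as its own image file and passed to the line recognizer. In practice you would probably also need to remove the diagram shapes first and merge or split candidates that the heuristic gets wrong.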
