The answer to that question depends on several factors. Tesseract is fairly mature and works on arbitrary binary documents. Until 3.0, Tesseract didn't work well on isolated lines, and it also didn't have much in the way of layout analysis. Tesseract 3.0 offers layout analysis, a neural network recognizer, and improved language modeling.
OCRopus has not had a stable release yet. Its layout analysis is probably better than Tesseract's. Its text recognition isn't as good as Tesseract's yet, but it's rapidly improving. OCRopus also contains a whole range of new technologies for page segmentation, preprocessing, and language modeling. Our long-term plan is to make Tesseract available through OCRopus as well, once the 3.0 release and APIs are stable. OCRopus has largely moved to Python now, which has sped up development and makes it easier to create custom solutions.

The upshot is: both solutions are going to be a lot of work, and they both have their limitations. If Tesseract gets your job done, just use it for the time being.

Tom

On May 13, 6:06 pm, Christoph <[email protected]> wrote:
> Hi,
>
> i am new to the ocropus-project, so i've got a basic question. What
> are the major benefits of using ocropus rather than just tesseract, if
> i only want to train the ocr-engine and using this data to recognize
> text inside image-files which were already preprocessed (binarization,
> segmentation, ...), discounting postprocessing like semantic analysis
> and so on?
