> As I understand it, one of the strengths of ocropus is the addition of
> a powerful "Document Layout Analysis" to a powerful OCR (tesseract), am
> I right?

OCRopus is really a separate project. You can use Tesseract as a line
recognizer, but you don't have to (actually, we need to re-enable that
capability; it is currently broken).

> I believe that a positive point for the ocropus would be the
> development of a frontend (perhaps based on some frontend that already
> exists, like gscan2pdf) that would enable ordinary people to use the
> ocropus with ease.

Yes, once the API settles down.

> I also think that, if possible, adding the feature "text under the
> image" (such as ABBYY FineReader, here is a picture of the
> FineReader's feature in
> Portuguese: http://www.imagebam.com/image/c1c78f93276763
> ) would be very welcome. This feature enables the scanning of old
> texts without concern for the correction of all errors, because those
> who are reading the text also have access to the original image (here
> is an example of "text under image" in

That's part of the DECAPOD project. It actually does a lot more,
including token-based compression. It also provides a web-based
frontend.

> Can Ocropus read Portuguese? If not, are the tesseract-ocr language
> files for Brazilian Portuguese text compatible with ocropus?

No, the model files are completely different. Also, we're still
debugging the Unicode support.

Tom
