You can read about the current status of OCRopus here: http://code.google.com/p/ocropus/wiki/OcropusWaves
Here's a summary:

(1) We've analyzed the OCR errors in the current version of OCRopus (and written a number of new tools to do so, including ocropus-showlrecs). The conclusion is that the data sets we have been training on do not have wide enough coverage of fonts. We're getting more training data. Once that's done, we'll be training new models.

(2) We're integrating the C++ narray data structure with Python numpy arrays so that you can use narray and numpy interchangeably (that is, iulib and ocropus will look like NumPy libraries). This is basically working but needs testing in the full version of OCRopus.

(3) There's a lot of cleanup to be done surrounding Python packages, dependencies on pylab/matplotlib, etc.

(4) We've been working with some people to help with training on non-Latin scripts.

Tasks not started yet are better language models, full Unicode support (it's partially there), and re-integration of book-adaptive support.

Also, separately, we've been spending a lot of time resurrecting the handwriting recognition features of OCRopus (this is handwriting recognition for high-volume production applications, which is different from end-user handwriting recognition).

Tom
