current activities and status

Tom Tue, 22 Jun 2010 03:38:46 -0700

You can read about the current status of OCRopus here:

http://code.google.com/p/ocropus/wiki/OcropusWaves


Here's a summary:

(1) We've analyzed the OCR errors in the current version of OCRopus
(and written a number of new tools to do so, including
ocropus-showlrecs).  The conclusion is that the data sets we have been
training on do not have wide enough coverage of fonts.  We're getting
more training data.  Once that's done, we'll be training new models.

(2) We're integrating the C++ narray data structure with Python numpy
arrays so that you can use narray and numpy interchangeably (that is,
iulib and ocropus will look like NumPy libraries).  This is basically
working but needs testing in the full version of OCRopus.

(3) There's a lot of cleanup to be done surrounding Python packages,
dependencies on pylab/matplotlib, etc.

(4) We've been working with some people to help with training on
non-Latin scripts.
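
As a rough illustration of what item (2) means by using two array
types interchangeably, the sketch below shares one block of memory
between a toy container and a NumPy array, so a write through one is
visible through the other.  ToyArray and as_numpy are hypothetical
names made up for this example; they are not the iulib narray API, and
the actual integration may work quite differently.

```python
import numpy as np


class ToyArray:
    """Hypothetical stand-in for a C++-side array (NOT the iulib narray API)."""

    def __init__(self, rows, cols):
        self.shape = (rows, cols)
        # Raw, zero-filled storage for rows*cols float32 values.
        self.buf = bytearray(rows * cols * 4)

    def as_numpy(self):
        # Build a NumPy view over the same memory; no data is copied.
        return np.frombuffer(self.buf, dtype=np.float32).reshape(self.shape)


image = ToyArray(2, 3)
view = image.as_numpy()

# Write through the NumPy view; the change lands in ToyArray's buffer.
view[0, 1] = 1.5

# A second view over the same container sees the same memory.
shared = np.shares_memory(view, image.as_numpy())
```

The point of the zero-copy view is that existing NumPy code can
operate on the container's data directly, which is one plausible
reading of "will look like NumPy libraries".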

Tasks not started yet are better language models, full Unicode support
(it's partially there), and re-integration of book-adaptive support.

Also, separately, we've been spending a lot of time resurrecting the
handwriting recognition features of OCRopus (this is handwriting
recognition for high volume production applications, which is
different from end-user handwriting recognition).

Tom
