Just to let people know about current work/progress: Unicode support
is working well enough now that we have been able to train and
recognize documents in non-Latin languages.

Other big changes are parallelization of a lot of the code (with
Python multiprocessing), successful use with handwriting recognition,
more and better documented Python toplevels, a fast hierarchical
nearest neighbor implementation for classification, and code for the
generation of artificial train and test data using Cairo.
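The post does not say how the hierarchical nearest neighbor classifier works. As a rough illustration of the general idea only (this is not the OCRopus implementation), a two-level search first compares a query against a small set of coarse centers, then does an exact search only inside the few closest buckets. The class name `TwoLevelNN`, the random choice of centers, and the `probes` parameter are all assumptions made for this sketch.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class TwoLevelNN:
    """Hypothetical two-level (coarse-to-fine) nearest-neighbor classifier.

    Illustrative sketch only; not the OCRopus code.
    """

    def __init__(self, data: List[Vector], labels: List[str],
                 n_centers: int = 8, seed: int = 0):
        rng = random.Random(seed)
        # Assumption: use a random subset of training samples as coarse centers.
        self.centers = rng.sample(data, min(n_centers, len(data)))
        self.buckets: List[List[Tuple[Vector, str]]] = [[] for _ in self.centers]
        for x, y in zip(data, labels):
            # Put each training sample into the bucket of its closest center.
            self.buckets[self._rank_centers(x)[0]].append((x, y))

    def _rank_centers(self, v: Vector) -> List[int]:
        """Indices of the coarse centers, sorted from closest to farthest."""
        return sorted(range(len(self.centers)),
                      key=lambda i: dist2(v, self.centers[i]))

    def classify(self, q: Vector, probes: int = 2) -> str:
        # Level 1: pick the `probes` coarse centers closest to the query.
        nearest = self._rank_centers(q)[:probes]
        # Level 2: exact search restricted to those buckets.
        candidates = [item for i in nearest for item in self.buckets[i]]
        if not candidates:
            # Fallback (e.g. duplicate centers left a bucket empty): search everything.
            candidates = [item for bucket in self.buckets for item in bucket]
        _, label = min(candidates, key=lambda item: dist2(q, item[0]))
        return label
```

Probing more buckets trades speed for accuracy; with `probes` equal to the number of centers, the search degenerates to exact brute-force nearest neighbor. With fewer probes it is approximate and can occasionally miss the true nearest sample.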

If you want to see how all that works, look at ocropy/ocropus-*, in
particular ocropus-pages.  The Python APIs are getting close to the
point where we expect them to stay fairly stable.

There are a bunch of things that still need to be done before we want
to actually call it a "release":

(1) We have a patch that unifies the OCRopus C++ arrays with NumPy,
meaning that OCRopus native code becomes a NumPy library.  That
simplifies the Python code significantly.  If it works well in testing,
we may make it the default.

(2) We need to go through the current contents of the Python ocropy
library package, be more careful about what we actually import and
don't import, and document it.

(3) We want to port the C++ line recognizer and the RAST layout analysis
to Python so that they are easier to modify.
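For item (2), one common Python mechanism for making a package's public surface explicit is an `__all__` list in its `__init__.py`, which controls what `from package import *` pulls in. Whether ocropy will use `__all__` is an assumption; the sketch below just builds a throwaway module in memory to show the effect, and the names `demo_ocr_pkg`, `recognize_line`, and `segment_page` are hypothetical, not part of the real ocropy API.

```python
import sys
import types

# Hypothetical package source; names are illustrative, not ocropy's real API.
PACKAGE_SOURCE = '''
import os  # used internally; should not leak into "from pkg import *"

__all__ = ["recognize_line", "segment_page"]  # the documented public surface

def recognize_line(image):
    return _normalize(image)

def segment_page(image):
    return [image]

def _normalize(image):  # private helper, excluded from star imports
    return image
'''


def load_demo_package(name: str = "demo_ocr_pkg") -> types.ModuleType:
    """Create the demo module from source and register it so it can be imported."""
    module = types.ModuleType(name)
    exec(PACKAGE_SOURCE, module.__dict__)
    sys.modules[name] = module
    return module


def star_import_names(name: str) -> set:
    """Return the names that `from <name> import *` binds in a fresh namespace."""
    namespace: dict = {}
    exec(f"from {name} import *", namespace)
    return {k for k in namespace if k != "__builtins__"}
```

Note that `__all__` only governs star imports: `demo_ocr_pkg.os` is still reachable as an attribute, so documenting the intended API remains a separate step.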

There's a lot of research code that will be migrated into OCRopus
afterwards, such as book-level adaptation, new layout analysis
methods, and new line recognizers.

Tom

-- 
You received this message because you are subscribed to the Google Groups 
"ocropus" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/ocropus?hl=en.
