I would like to announce the OCRopus 0.4 (alpha 4) release. We hope
this will be the last alpha release.
There is still a lot of cleanup, performance improvements, and bug
fixing to be done for the 0.5 (beta 1) release, but we think we have
the architecture and other aspects of the system pretty much in place
now. We hope to continue our six-month release cycle and get out the
beta in October.
This being an alpha release, you should still expect rough spots and
problems during both installing and running the system.
Please report issues and bugs to the OCRopus issue tracker.
I'd like to thank everybody who contributed, including Ilya Mezhirov,
Faisal Shafait, Christian Kofler, Yves Rangoni, Joost van Beusekom,
Mathias Reif, and many others.
Here is a partial list of changes:
* OCRopus source code is now kept in Mercurial
* OCRopus has been turned into a library
* there is a new set of command line programs for book-level recognition
* there is a new line recognizer
* there is a new component model
* OCRopus supports book-level retraining and adaptation
* there are new preprocessing functions
* there is a new language modeling system
* there are many improvements to layout analysis
* OpenFST support is now optional (it's provided through ocrolangmod)
* Tesseract support is now optional (it will be supported through
  a separate ocrotess project)
* there is TIFF support
* Lua support has been factored into a separate repository (ocroscript)
* there is a separate, new Python binding
Some limitations:
* autoconf/automake has not been tested much
* the character shape models use a simple model and have been
  trained on only about 1.5M characters
    o for 0.5 (beta) we expect to have much larger models
      trained and in place
* the language model is a simple combination of a dictionary and some
  case rules
    o for 0.5 (beta) we expect to have much better language models
* the recognizer still has a lot of rough edges
Note that you will probably encounter some exceptions or error
messages while running OCRopus. That's usually harmless: those
exceptions indicate real error conditions, but they are usually
handled in some sensible way. For example, the layout analysis may
sometimes pass a (non-text) image to the line recognizer, and the line
recognizer will probably raise an exception.
Tom