On May 15, 5:43 am, Thomas Breuel <[email protected]> wrote:
> OK, I checked in a fixed version.
Great, thank you. I'm glad I could help a tiny bit. :)
> In terms of performance, the Tesseract character recognizer provides
> reasonable results on a wide range of documents, while OCRopus pre-0.4 has
> less consistent performance but performs much better in our benchmarks than
> Tesseract for document classes that it has been trained on.
>
> Mostly, what OCRopus needs now for more consistent performance is a lot more
> training on different document types. We're aiming for a distributed
> training model, where many different people can train OCRopus on their
> documents and on their machines and submit the trained models. We can then
> build a "supermodel" out of the components that works well for a lot of
> models. Again, the infrastructure for that is in place, and we hope that
> that will be part of 0.5.
I guess I didn't realize you were doing all the work of replacing
Tesseract, but that's great; there hasn't been much work on it, it seems.
This distributed training idea is also great. Hopefully it can be made as
easy as possible so that a large number of people can submit good training
data. I for one have tons of material I can test OCR on, but I can't
really work with an overly complicated system. Any chance 0.5 will be out,
packaged up, and in the Ubuntu repositories by the October release?
Taxman