Re: Individual character recognition

Thomas Breuel Mon, 19 Jan 2009 22:12:33 -0800

On Tue, Jan 20, 2009 at 02:37, Yaroslav Bulatov <[email protected]> wrote:


>
> I'm curious how the pre-built neural network has been trained in
> ocropus 0.3. The reason is that I get misidentification on noiseless
> input, and wondering whether it's due to insufficient training, or to
> overall design of NN and features.


First of all, the neural network code is being replaced, and the model you
are using has not been trained very carefully.

But for any classifier, you will get misidentifications even on "noise free"
input.  By default, we train OCRopus on noisy scanned book data with some
transcription errors.  As a consequence, some instances of "6" may either
have lost their "tail" or may have been mislabeled in the training data.
Furthermore, your digit "0" does not look like it is set in a typical book font.
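
As a toy illustration of the point about training data (this is not OCRopus
code; it swaps in a 1-nearest-neighbour classifier for the neural network, and
the 4x3 bitmaps, labels, and function names are all made up), a single
mislabeled training sample is enough to cause an error on perfectly clean
input:

```python
# Toy sketch only: hypothetical 4x3 glyph bitmaps, flattened row by row.
# A 1-nearest-neighbour classifier stands in for the real recognizer.

ZERO = (1, 1, 1,
        1, 0, 1,
        1, 0, 1,
        1, 1, 1)

SIX = (1, 0, 0,
       1, 1, 1,
       1, 0, 1,
       1, 1, 1)

# A "0" in a slightly different font (one pixel differs from ZERO).
ZERO_VARIANT = (1, 1, 1,
                1, 0, 1,
                1, 0, 1,
                0, 1, 1)


def hamming(a, b):
    """Number of pixels at which two bitmaps differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(sample, training):
    """Return the label of the training bitmap closest to `sample`."""
    bitmap, label = min(training, key=lambda item: hamming(sample, item[0]))
    return label


# Clean training set: every bitmap carries its correct label.
clean_training = [(ZERO, "0"), (SIX, "6")]

# Noisy training set: the variant "0" was transcribed as "6" by mistake.
noisy_training = clean_training + [(ZERO_VARIANT, "6")]

print(classify(ZERO_VARIANT, clean_training))  # nearest sample is ZERO
print(classify(ZERO_VARIANT, noisy_training))  # nearest sample is the mislabeled one
```

The input bitmap is identical in both calls and contains no noise; only the
labels in the training set change the answer.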

Eventually, we need more and better training data for the system, and with
that, we will be able to perform recognition better.  Our system can take
advantage of Gutenberg and other sources for training, so that's what we're
going to use eventually.

Tom

