On Tue, Jan 20, 2009 at 02:37, Yaroslav Bulatov <[email protected]> wrote:
> I'm curious how the pre-built neural network has been trained in
> ocropus 0.3. The reason is that I get misidentification on noiseless
> input, and wondering whether it's due to insufficient training, or to
> overall design of NN and features.

First of all, the neural network code is being replaced, and the model you are using has not been trained very carefully.

But with any classifier, you will get misidentifications even on "noise-free" input. By default, we train OCRopus on noisy scanned book data that contains some transcription errors. As a consequence, some training instances of "6" may have lost their "tail" or may have been mislabeled. Furthermore, your digit "0" does not look like it comes from a typical book font.

Ultimately, the system needs more and better training data, and with that, recognition will improve. Our system can take advantage of Gutenberg and other sources for training, so that's what we're going to use eventually.

Tom

You received this message because you are subscribed to the Google Groups "ocropus" group.
For more options, visit this group at http://groups.google.com/group/ocropus?hl=en
