The training code in 0.3.* is not all that useful (someone attempted to convert an MLP handwriting recognizer to OCR, but didn't really succeed).
If you want to do training, please use the current development branch of OCRopus; it has a completely rewritten set of classifiers. You can pull it from mercurial.iupr.org. I'll try to write up more documentation in the coming weeks.

> Shouldn't it always decrease?

For stochastic gradient descent training, error rates can go up if the learning rates are not chosen well. The new training code automatically adapts the learning rates to avoid that.

> Unrelated question -- has ocropus ever been trained on dataset the
> size of NIST? I have 4 GB and ocropus crashes with out of memory error
> on a dataset with 16k examples (during feature extraction phase)

The new code has been trained on datasets with many millions of training samples (on a 32G machine). On a 4 Gbyte machine, you can currently train about 2-3 million samples with the MLP if you want to keep all the training examples in memory at once (it depends on which combination of features you keep).

Tom
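P.S. The learning-rate adaptation mentioned above can be sketched with a simple "bold driver" heuristic: after each epoch, cut the learning rate when the error goes up and nudge it up when the error goes down. This is only an illustrative sketch of the idea; the actual adaptation scheme in the OCRopus code may differ.

```python
import random

def train_adaptive(w, grad, loss, data, lr=0.5, up=1.05, down=0.5, epochs=20):
    """SGD with a bold-driver learning-rate schedule (illustrative sketch).

    grad(w, x, y) returns the per-sample gradient; loss(w, data) the
    mean error over the dataset.  When an epoch increases the error,
    the learning rate is halved; otherwise it grows by 5%.
    """
    prev = loss(w, data)
    for _ in range(epochs):
        random.shuffle(data)
        for x, y in data:
            w = w - lr * grad(w, x, y)   # plain stochastic gradient step
        cur = loss(w, data)
        if cur > prev:
            lr *= down   # error went up: step size was too large
        else:
            lr *= up     # error went down: cautiously speed up
        prev = cur
    return w, lr

# Toy 1-D least-squares problem: fit w so that y ≈ w * x (true w = 3).
random.seed(0)
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]
loss = lambda w, d: sum((w * x - y) ** 2 for x, y in d) / len(d)
grad = lambda w, x, y: 2 * (w * x - y) * x

w, final_lr = train_adaptive(0.0, grad, loss, data)
```

With a fixed, too-large learning rate the same loop can diverge; the per-epoch check is what keeps the error from ratcheting upward.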
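P.P.S. A quick back-of-envelope check of the memory figures quoted above: 4 GB holding about 2-3 million in-memory samples works out to roughly 1.4-2.1 KB per training sample (the exact size depends on which features you keep, as noted).

```python
# Back-of-envelope: bytes available per sample when keeping the whole
# training set in memory on a 4 GiB machine.
GB = 1024 ** 3
budget = 4 * GB

per_sample = {n: budget // n for n in (2_000_000, 3_000_000)}
for n, b in per_sample.items():
    print(f"{n:>9} samples -> ~{b} bytes each")
```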
