On Sun, Aug 30, 2009 at 11:57, Caius <[email protected]> wrote:
>> If you want to continue training an existing model, you need to use
>> linerec_cpreload; this is used for book adaptation, for example. It
>> works quite well (see the publications).
>>
>
> After some steps of trial and error, I did this:
>
> float8buffer_datafile=mydefault.f8b linerec_cpreload=default.model
> ocropus trainseg mydefault.model <my receipt as book dir>
>
> cmodel=mydefault.model ocropus lines2fsts <my receipt as book dir>

I don't know which version of OCRopus you're using; if this is the
current tip, the training code is in flux.

> But the result is identical to the result I get using the original
> default.model. I don't know if it should, but ocropus never accesses
> the "mydefault.f8b" file the trainseg operation produced when tracking
> with strace utility.

The float8buffer_datafile argument is for saving pre-extracted
datasets. The linerec_cpreload argument should preload the classifier
for further training. However, whether preloading makes a difference
depends on the training parameters, the amount of training data, and
the classifier.

> Prior to trainseg, I did correct the transcriptions in .gt.txt files
> in the bookdir and trainseg mostly did accept the input (like 24 out
> of 30 lines except for those that contained Finnish umlaut a's).

I'm not sure what happens if you just train on 30 lines; it probably
won't change the classifier much. The default classifier is written
assuming about 100k-10M training samples. The nearest neighbor
classifier is intended for small amounts of training data and
bootstrapping new languages, but it hasn't been tested or optimized
much yet.

Tom
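
To put the sample counts above in perspective, here is a rough
back-of-the-envelope sketch in Python. It is not part of OCRopus. The
figure of about 40 characters per text line is purely an assumption
made for illustration, and the function names are hypothetical; only
the 24 accepted lines and the 100k-10M range come from the thread.

```python
# Rough estimate of how little 24 accepted lines add, relative to the
# training set sizes the default classifier is said to assume.
# ASSUMPTION: ~40 characters per line (hypothetical, not from the thread).
CHARS_PER_LINE = 40
ACCEPTED_LINES = 24                    # lines trainseg accepted, per the quoted message
DESIGN_RANGE = (100_000, 10_000_000)   # "about 100k-10M training samples"


def new_samples(lines: int, chars_per_line: int = CHARS_PER_LINE) -> int:
    """Approximate number of character samples contributed by `lines` text lines."""
    return lines * chars_per_line


def fraction_of(design_size: int, samples: int) -> float:
    """Share of a training set of `design_size` samples that `samples` represents."""
    return samples / design_size


if __name__ == "__main__":
    samples = new_samples(ACCEPTED_LINES)
    for size in DESIGN_RANGE:
        print(f"{samples} samples = {fraction_of(size, samples):.4%} of {size:,}")
```

Under that assumption the new data amounts to roughly one percent of
the smallest training set the default classifier is designed for, and
far less at the upper end, which is consistent with the classifier
barely changing.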
