On Sun, Aug 30, 2009 at 11:57, Caius<[email protected]> wrote:
>> If you want to continue training an existing model, you need to use
>> linerec_cpreload; this is used for book adaptation, for example.  It
>> works quite well (see the publications).
>>
>
> After some steps of trial and error, I did this:
>
> float8buffer_datafile=mydefault.f8b linerec_cpreload=default.model
> ocropus trainseg mydefault.model <my receipt as book dir>
>
> cmodel=mydefault.model ocropus lines2fsts <my receipt as book dir>

I don't know which version of OCRopus you're using; if this is the
current tip, the training code is in flux.

> But the result is identical to the result I get using the original
> default.model. I don't know if it should, but ocropus never accesses
> the "mydefault.f8b" file the trainseg operation produced when tracking
> with strace utility.

The float8buffer_datafile argument is for saving pre-extracted datasets.

The linerec_cpreload option should preload the classifier for further
training.  However, whether preloading makes a difference or not
depends on the training parameters, the amount of training data, and
the classifier.

> Prior to trainseg, I did correct the transcriptions in .gt.txt files
> in the bookdir and trainseg mostly did accept the input (like 24 out
> of 30 lines except for those that contained Finnish umlaut a's).

I'm not sure what happens if you just train on 30 lines; probably it
won't change the classifier much.  The default classifier is written
assuming about 100k-10M training samples.
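
To give a rough feel for the scale involved, here is a toy sketch.  It
is not OCRopus code, and it assumes, purely for illustration, that a
learned parameter behaves like a running mean over the training
samples; the real classifier updates its parameters differently.  The
function name and the numbers are made up.

```python
# Toy illustration only; this is not how OCRopus updates its classifier.
def updated_mean(old_mean, old_count, new_samples):
    """Fold new samples into a mean that already summarizes old_count samples."""
    total = old_mean * old_count + sum(new_samples)
    return total / (old_count + len(new_samples))

old_mean = 0.0            # value learned from the original training data
old_count = 100_000       # low end of the assumed training-set size
new_samples = [1.0] * 30  # 30 new samples that all disagree with the old value

new_mean = updated_mean(old_mean, old_count, new_samples)
shift = new_mean - old_mean
print(f"shift after 30 new samples: {shift:.6f}")
```

Under that assumption the parameter moves by 30/100030 of the gap
between old and new data, i.e. about 0.03%, which would be too small
to show up in the recognition output.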

The nearest neighbor classifier is intended for small amounts of
training data and bootstrapping new languages, but it hasn't been
tested and optimized much yet.
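
The reason nearest neighbor suits small datasets can be sketched as
follows.  This is a generic 1-nearest-neighbor toy, not the OCRopus
implementation; the feature vectors and labels are hypothetical.

```python
# Generic 1-nearest-neighbor sketch; not the OCRopus classifier.
import math

def nearest_neighbor(train, query):
    """Return the label of the training vector closest to query (Euclidean)."""
    best_label, _ = min(
        ((label, math.dist(vec, query)) for vec, label in train),
        key=lambda pair: pair[1],
    )
    return best_label

# Hypothetical two-dimensional feature vectors for two character classes.
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"),
         ((1.0, 1.0), "b"), ((0.9, 1.1), "b")]

# "Training" on a new class is just storing one more example.
train.append(((2.0, 0.0), "c"))
print(nearest_neighbor(train, (1.9, 0.1)))
```

Every stored sample takes effect immediately, so even a single example
of a new character can change the result for nearby inputs.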

Tom
