Chinese

Tom H Wed, 03 Feb 2010 20:53:03 -0800

Hello.  I'm trying to train ocropus on Chinese using the code in the
repository as of last week.  I'm using the training code in
extras/train-unicode (very cool, btw).  After producing the training
files, I ran:


  ocropus trainseg my.model out

which took all day but finally produced a model.  From the log:

[info] updateModel 236200 samples, 6600 features, 127 classes
[info] updateModel memory status 1755 Mbytes, 1558 Mvalues
[info] training content classifier
[info] [mapped 123 to 53 classes]
[info] mlp training n 47020 nc 53
[info] mlp round 0 err 0.0198 nhidden 80
...
[info] mlp round 7 err 0.0112 nhidden 159
[info] training junk classifier
[info] mlp training n 231200 nc 2
[info] mlp round 0 err 0.0042 nhidden 50
...
[info] mlp round 7 err 0.001 nhidden 23
[info] trained 53140 characters, 2430 lines
[warn] 35120 old csegs
[info] saving my.model

The log also contained a large number of messages of the form
"transcript doesn't agree with cseg (transcript 4, cseg 25)".
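
To get a sense of how widespread these mismatches are, the messages can be
tallied from a saved copy of the log.  What follows is only a minimal Python
sketch and not part of ocropus: it assumes each message appears on a single
line in exactly the form quoted above, and the function name
summarize_mismatches is made up for illustration.

```python
import re
from collections import Counter

# Matches the message quoted above. The exact wording and the single-line
# layout of the real log output are assumptions based on that quote.
MISMATCH = re.compile(
    r"transcript doesn't agree with cseg \(transcript (\d+), cseg (\d+)\)"
)


def summarize_mismatches(lines):
    """Count mismatch messages and tally (cseg - transcript) differences.

    `lines` is any iterable of log lines (an open file works).
    Returns (number_of_messages, Counter mapping difference -> occurrences).
    """
    count = 0
    diffs = Counter()
    for line in lines:
        m = MISMATCH.search(line)
        if m is None:
            continue
        transcript_len = int(m.group(1))
        cseg_len = int(m.group(2))
        count += 1
        diffs[cseg_len - transcript_len] += 1
    return count, diffs
```

Passing an open log file to summarize_mismatches gives the number of affected
lines and shows whether the segment count is consistently larger than the
transcript length, as in the quoted example (4 versus 25).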

But since I had a model, I thought things were ok.  Then I ran:

  debug=info,transcript cmodel=my.model ocropus lines2fsts out

but every single line in the log read like:

[warn] skipping out/train/0001/0001 (CHECK ocr-line/glclass.cc:1743
Training incomplete for all classes)

I checked out that source location and it's in the LatinClassifier
class!

Three questions:

1. What do those error messages from trainseg mean?  How can I get
training to complete?

2. Is lines2fsts correct in using LatinClassifier?  I expected
MlpClassifier.

3. Am I doing this right?

Thank you.
