Hello. I'm trying to train ocropus on Chinese using the code in the repository as of last week. I'm using the training code in extras/train-unicode (very cool, btw). After producing the training files, I ran:

    ocropus trainseg my.model out

which took all day but finally produced a model. From the log:

    [info] updateModel 236200 samples, 6600 features, 127 classes
    [info] updateModel memory status 1755 Mbytes, 1558 Mvalues
    [info] training content classifier
    [info] [mapped 123 to 53 classes]
    [info] mlp training n 47020 nc 53
    [info] mlp round 0 err 0.0198 nhidden 80
    ...
    [info] mlp round 7 err 0.0112 nhidden 159
    [info] training junk classifier
    [info] mlp training n 231200 nc 2
    [info] mlp round 0 err 0.0042 nhidden 50
    ...
    [info] mlp round 7 err 0.001 nhidden 23
    [info] trained 53140 characters, 2430 lines
    [warn] 35120 old csegs
    [info] saving my.model

Also in the log were a ton of "transcript doesn't agree with cseg (transcript 4, cseg 25)" type messages. But since I had a model, I thought things were ok. Then I ran:

    debug=info,transcript cmodel=my.model ocropus lines2fsts out

but every single line in the log read like:

    [warn] skipping out/train/0001/0001 (CHECK ocr-line/glclass.cc:1743 Training incomplete for all classes)

I checked out that source location and it's in the LatinClassifier class! Three questions:

1. What do those error messages from trainseg mean? How can I get training to complete?
2. Is lines2fsts correct in using LatinClassifier? I expected MlpClassifier.
3. Am I doing this right?

Thank you.
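
For anyone trying to gauge how widespread the "transcript doesn't agree with cseg" messages are, the following is a minimal Python sketch that tallies them from a saved log. It is not part of ocropus. The regular expression is copied from the single message quoted above, so the exact wording in a real log is an assumption, and the function names are hypothetical. The two numbers are reported as they appear, without assuming what they measure.

```python
import re
from collections import Counter

# Pattern taken from the one message quoted in this post; a real log may
# word it differently, so treat this regex as an assumption.
MISMATCH = re.compile(
    r"transcript doesn't agree with cseg \(transcript (\d+), cseg (\d+)\)"
)


def summarize_mismatches(log_lines):
    """Tally (transcript, cseg) number pairs over all mismatch messages."""
    pairs = Counter()
    for line in log_lines:
        match = MISMATCH.search(line)
        if match:
            pairs[(int(match.group(1)), int(match.group(2)))] += 1
    return pairs


def summarize_log_file(path):
    """Hypothetical helper: read a saved trainseg log and tally mismatches."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return summarize_mismatches(handle)
```

Comparing `sum(pairs.values())` against the 2430 lines reported as trained would show roughly what share of the lines triggered the message, and `pairs.most_common(10)` would show whether the mismatches follow a pattern.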
