For specific questions about what you can and cannot do with OpenFst, you might also want to try the OpenFst mailing list; people there have much more experience with creating complex language models.
Tom

On Aug 19, 11:37 pm, Marcin <[email protected]> wrote:
> Thanks for your reply Ilya, but I'm afraid I'm still none the wiser
> here. I know I can create a deterministic and minimal model from raw
> text files, but how do I add it to the default model that comes with
> Ocropus? I don't want to have to create a new comprehensive one from
> scratch because I don't have enough training data. Are there any other
> tools you know of?
>
> On Aug 19, 9:45 am, Ilya Mezhirov <[email protected]> wrote:
>
> > Yes, default.fst can't be determinized. There are some conditions
> > (which I don't remember) on an FST to determinize it, but an acyclic
> > FST should always work. So you can make a word model first,
> > determinize/minimize it, and then create cycles to get a line model.
> >
> > On Aug 19, 9:06 am, Marcin <[email protected]> wrote:
> >
> > > I'm trying to build my own language model by extending the default one
> > > at /usr/local/share/ocropus/models/default.fst. Following the example
> > > of ocropus-linefst and fstutils, I'm doing the following:
> > >
> > > fst = openfst.StdVectorFst.Read("/usr/local/share/ocropus/models/default.fst")
> > > filenames = glob.glob("training/*.gt.txt")
> > > for filename in filenames:
> > >     file = open(filename)
> > >     for line in file.readlines():
> > >         l = line.strip()
> > >         if not l:
> > >             continue
> > >         fstutils.add_line(fst, l)
> > >
> > > det = Fst()
> > > openfst.Determinize(fst, det)
> > > (...)
> > >
> > > The rest is truncated because I never get there. The Determinize
> > > function aborts the program with the message:
> > >
> > > FATAL: StringWeight::Plus: unequal arguments (non-functional FST?)
> > >
> > > Is this even supposed to work? The same crash happens when I run
> > > Determinize on the original model, i.e. without running the for loop
> > > above.
> > > I suppose I should load the default model into an Ocropus
> > > container created with ocropy.make_OcroFST(), but then I can't use the
> > > functions in fstutils, which expect StdVectorFst. Does anyone have any
> > > advice here?
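
Ilya's suggestion in the quoted message (make an acyclic word model first, then create cycles to get a line model) can be sketched in plain Python roughly as follows. This is only an illustration of the idea and does not use OpenFst or the OCRopus bindings: build_word_trie, add_line_cycles and accepts are hypothetical helper names, and the automaton is an unweighted acceptor rather than a weighted transducer like default.fst. A trie is deterministic by construction but not minimal, so a real pipeline would presumably still run the library's determinize/minimize steps on the acyclic model before adding the cycles.

```python
# Pure-Python sketch of "acyclic word model first, cycles afterwards".
# All names here are hypothetical; nothing below calls OpenFst or OCRopus.

def build_word_trie(words):
    """Build an acyclic, deterministic acceptor (a trie) from a word list.

    Returns (transitions, finals), where transitions maps
    state -> {symbol: next_state} and state 0 is the start state.
    """
    transitions = {0: {}}
    finals = set()
    for word in words:
        if not word:
            raise ValueError("empty words are not allowed")
        state = 0
        for ch in word:
            nxt = transitions[state].get(ch)
            if nxt is None:
                # Allocate a fresh state; each (state, symbol) pair gets
                # at most one target, so the automaton stays deterministic.
                nxt = len(transitions)
                transitions[nxt] = {}
                transitions[state][ch] = nxt
            state = nxt
        finals.add(state)
    return transitions, finals


def add_line_cycles(transitions, finals, separator=" "):
    """Turn the word acceptor into a line acceptor, in place.

    Every final state gets a separator arc back to the start state, so the
    result accepts: word (separator word)*.
    """
    for state in finals:
        if separator in transitions[state]:
            # A second arc on the same symbol would break determinism.
            raise ValueError("a word continues with the separator symbol")
        transitions[state][separator] = 0


def accepts(transitions, finals, text):
    """Return True if the automaton accepts the whole string."""
    state = 0
    for ch in text:
        state = transitions[state].get(ch)
        if state is None:
            return False
    return state in finals


words = ["cat", "car", "dog"]
trans, finals = build_word_trie(words)   # acyclic word model
add_line_cycles(trans, finals)           # cycles added only afterwards
print(accepts(trans, finals, "cat dog car"))
```

The point of the ordering is that the expensive or fragile operations are applied while the model is still acyclic, and the loops that make it a line model are added last.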
