Thanks for your reply, Ilya, but I'm afraid I'm still none the wiser
here. I know I can create a determinized and minimized model from raw
text files, but how do I add it to the default model that comes with
Ocropus? I don't want to create a new comprehensive one from scratch
because I don't have enough training data. Are there any other tools
you know of?

On Aug 19, 9:45 am, Ilya Mezhirov <[email protected]> wrote:
> Yes, default.fst can't be determinized. An FST has to satisfy some
> conditions (which I don't remember) to be determinizable, but an
> acyclic FST should always work. So you can make a word model first,
> determinize/minimize it, and then create cycles to get a line model.
>
> On Aug 19, 9:06 am, Marcin <[email protected]> wrote:
>
> > I'm trying to build my own language model by extending the default one
> > at /usr/local/share/ocropus/models/default.fst. Following the example
> > of ocropus-linefst and fstutils, I'm doing the following:
>
> > fst = openfst.StdVectorFst.Read("/usr/local/share/ocropus/models/default.fst")
> > filenames = glob.glob("training/*.gt.txt")
> > for filename in filenames:
> >     # "with" closes each file; iterating the file yields its lines
> >     with open(filename) as f:
> >         for line in f:
> >             l = line.strip()
> >             if not l:
> >                 continue
> >             fstutils.add_line(fst, l)
>
> > det = Fst()
> > openfst.Determinize(fst, det)
> > (...)
>
> > The rest is truncated because I never get there. The Determinize
> > function aborts the program with the message:
>
> > FATAL: StringWeight::Plus: unequal arguments (non-functional FST?)
>
> > Is this even supposed to work? The same crash happens when I run
> > Determinize on the original model, i.e. without running the for loop
> > above. I suppose I should load the default model into an Ocropus
> > container created with ocropy.make_OcroFST(), but then I can't use the
> > functions in fstutils, which expect StdVectorFst. Does anyone have any
> > advice here?
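
The approach Ilya describes (build an acyclic word model, determinize and
minimize it, then add cycles to get a line model) can be sketched without
OpenFST. This is only an illustration under stated assumptions: it uses an
unweighted acceptor stored in plain Python dicts rather than a weighted FST,
and the names build_trie, minimize, add_word_loop and accepts are
hypothetical helpers, not part of OpenFST, fstutils or Ocropus. Building the
word list as a trie gives a deterministic acyclic automaton directly, which
is why the later steps cannot hit the "non-functional FST" problem.

```python
def build_trie(words):
    """Deterministic acyclic acceptor: state -> {char: next_state}."""
    trans = {0: {}}          # state 0 is the start state
    finals = set()
    for word in words:
        state = 0
        for ch in word:
            if ch not in trans[state]:
                new = len(trans)
                trans[new] = {}
                trans[state][ch] = new
            state = trans[state][ch]
        finals.add(state)
    return trans, finals


def minimize(trans, finals, start=0):
    """Merge states that accept the same suffixes (valid for acyclic input)."""
    registry = {}            # state signature -> new state id
    mapping = {}             # old state id -> new state id
    new_trans = {}

    def visit(state):
        if state in mapping:
            return mapping[state]
        arcs = tuple(sorted((ch, visit(nxt)) for ch, nxt in trans[state].items()))
        sig = (state in finals, arcs)
        if sig not in registry:
            registry[sig] = len(registry)
            new_trans[registry[sig]] = dict(arcs)
        mapping[state] = registry[sig]
        return mapping[state]

    new_start = visit(start)
    new_finals = {mapping[s] for s in finals}
    return new_trans, new_finals, new_start


def add_word_loop(trans, finals, start, sep=" "):
    """Create cycles: after a complete word, a separator returns to start.

    Assumes no word contains the separator, so the result stays deterministic.
    """
    for state in finals:
        trans[state][sep] = start
    return trans


def accepts(trans, finals, start, text):
    """Run the acceptor over text and report whether it ends in a final state."""
    state = start
    for ch in text:
        state = trans[state].get(ch)
        if state is None:
            return False
    return state in finals


words = ["cat", "car", "bat", "bar"]
trie, trie_finals = build_trie(words)
dawg, dawg_finals, dawg_start = minimize(trie, trie_finals)
line_model = add_word_loop(dawg, dawg_finals, dawg_start)
```

With a real weighted model the same order of operations would apply, but the
merging step would also have to take arc weights into account, which this
sketch leaves out.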
