Well, it's trained, it's just not discriminatively trained. What's shipping is just a model using word frequencies; we have tried n-gram with back-off, but that's not ready for prime time yet.
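
To make "a model using word frequencies" concrete, here is a minimal sketch in Python. It is not the OCRopus code: the function name `build_unigram_costs` and the toy corpus are made up for illustration, and a plain dict stands in for the real structure, which this thread describes as a graph with probabilities on arcs. The sketch only shows the idea of turning raw word counts into per-word costs, and why adding a word means rebuilding the whole thing.

```python
import math
from collections import Counter

def build_unigram_costs(text):
    """Map each word to a cost, -log(relative frequency in text).

    Hypothetical helper for illustration; not part of OCRopus.
    """
    words = text.lower().split()
    counts = Counter(words)
    total = sum(counts.values())
    # Frequent words get a low cost, rare words a high cost.
    return {word: -math.log(count / total) for word, count in counts.items()}

# Toy corpus (an assumption, standing in for real training text).
corpus = "the cat sat on the mat the end"
costs = build_unigram_costs(corpus)

# Adding a single new word changes the total count, so every existing
# cost shifts and the model has to be rebuilt from the full text.
rebuilt = build_unigram_costs(corpus + " zygote")

for word in sorted(costs, key=costs.get):
    print(f"{word}: {costs[word]:.3f} -> {rebuilt[word]:.3f}")
```

Nothing here is learned discriminatively; the costs fall straight out of counting, and the probabilities have to sum to one over the whole vocabulary.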
Tom

On Aug 20, 9:53 pm, Marcin wrote:
> That's what I feared. It's not the end of the world, though. I can
> live with small models created from scratch for now. Thanks again for
> your time Ilya.
>
> On Aug 20, 2:33 am, Ilya Mezhirov wrote:
>
> > The language model isn't exactly trained, at least AFAIK, more like
> > constructed.
> > It's similar to a regexp like ((a | aaron | abacus | ... | zygote)
> > ( |,|.|!|?))* except more complicated and with probabilities on arcs.
> > One can't just add stuff to it, it has to be recreated from scratch. I
> > don't know how this is done currently.
>
> > On Aug 20, 8:37 am, Marcin wrote:
> > > Thanks for your reply Ilya, but I'm afraid I'm still none the wiser
> > > here. I know I can create a deterministic and minimal model from raw
> > > text files, but how do I add it to the default model that comes with
> > > Ocropus? I don't want to have to create a new comprehensive one from
> > > scratch because I don't have enough training data. Are there any other
> > > tools you know of?
