Thanks a lot Jörn, it works now. I don't know why I typed SKIP instead of SPLIT; I was focused on the error message.
Sorry for taking your time.

Best wishes,
Jean-Claude

On Fri, May 13, 2011 at 11:47 AM, Jörn Kottmann <[email protected]> wrote:
> On 5/13/11 11:33 AM, Jean-Claude Dauphin wrote:
>
>> Hi,
>>
>> I tried to produce training models for French from a set of French human
>> resource position data, which is split into sentences, and to use it as a
>> sample training data stream.
>> It works fine for the sentence detector model using
>> SentenceDetectorME.train.
>>
>> However, if I use the same sample as tokenizer training content with
>> opennlp.tools.tokenize.TokenizerME.train, I get the following error:
>>
>> The maxent model is not compatible!
>
> The error message sounds a bit strange; what it means is that you only
> train with NO_SPLIT events (I guess). The produced model would not be able
> to split any tokens.
>
> We should fix the model validation code, or put out a more meaningful
> error message.
>
> Anyway, to solve your problem, rename your <SKIP> tags to <SPLIT>.
>
> Have a look at our documentation here:
>
> http://incubator.apache.org/opennlp/documentation/manual/opennlp.html#tools.tokenizer.cmdline.training
>
> Hope that helps,
> Jörn

--
Jean-Claude Dauphin
[email protected]
[email protected]
http://kenai.com/projects/j-isis/
http://www.unesco.org/isis/
http://www.unesco.org/idams/
http://www.greenstone.org
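[Editor's note: since the whole thread hinges on the tokenizer training-data annotation, here is a minimal sketch of what a correctly tagged sample might look like. The French text lines are invented examples; the `<SPLIT>` tag itself is the convention described in the OpenNLP tokenizer training documentation linked above.]

```
Responsable des ressources humaines<SPLIT>.
Chargé de recrutement<SPLIT>, Paris<SPLIT>.
```

Each training sentence goes on one line, and `<SPLIT>` marks a token boundary that is not signalled by whitespace, e.g. between a word and an adjacent punctuation mark. If the data contains no `<SPLIT>` tags at all (for instance because they were written as `<SKIP>`), every training event is NO_SPLIT, the resulting model can never split a token, and training fails with the "maxent model is not compatible" error discussed in this thread.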
