Hello

Has anyone already used the UIMA TokenizerTrainer component? I
am a bit confused, since it does not create any model file.

On stdout I got this:
Indexing events using cutoff of 5
        Computing event counts...

done. 69669 events
        Indexing...  done.
Sorting and merging events... done. Reduced 69669 events to 16467.
Done indexing.
Incorporating indexed data for training...
done.
        Number of Event Tokens: 16467
            Number of Outcomes: 1
          Number of Predicates: 5624
...done.
Computing model parameters...
Performing 100 iterations.
  1:  .. loglikelihood=0.0      1.0
  2:  .. loglikelihood=0.0      1.0

This looks like a problem I had when I trained the model on the command
line without using the '<SPLIT>' tag. The difference is that on the
command line I also got the following exception:
Exception in thread "main" java.lang.IllegalArgumentException: The
maxent model is not compatible!
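
A quick way to rule out missing tags is to count how many training lines contain '<SPLIT>' at all. The following is only a minimal sketch, assuming the command-line training format (one sentence per line, with '<SPLIT>' marking token boundaries that have no whitespace); the function name count_split_lines and the sample lines are hypothetical. A result of zero tagged lines would be consistent with the "Number of Outcomes: 1" line in the log above.

```python
SPLIT_TAG = "<SPLIT>"


def count_split_lines(lines):
    """Return (lines_with_tag, non_empty_lines) for an iterable of training lines."""
    total = 0
    tagged = 0
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        total += 1
        if SPLIT_TAG in line:
            tagged += 1
    return tagged, total


# Hypothetical sample data; in practice, pass an open training file instead.
sample = [
    "Hello<SPLIT>, world<SPLIT>.",
    "No tag here .",
    "",
]
tagged, total = count_split_lines(sample)
print(f"{tagged} of {total} non-empty lines contain {SPLIT_TAG}")
```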

I solved that problem by adding the tag, as mentioned in the post
"maxent model is not compatible with Tokenizer training" (Fri, 13 May,
09:33):
http://mail-archives.apache.org/mod_mbox/incubator-opennlp-users/201105.mbox/browser
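
To illustrate what "adding the tag" means for the training data, here is a rough sketch that inserts '<SPLIT>' before punctuation attached directly to the preceding character. It assumes one sentence per line; the function name add_split_tags and the punctuation set are hypothetical, and the rule is deliberately naive (it would also split "3.14", for example), so real data needs a more careful procedure.

```python
import re

# Zero-width position between a non-space character and a punctuation mark,
# skipping positions that are already preceded by a <SPLIT> tag.
ATTACHED_PUNCT = re.compile(r"(?<=\S)(?<!<SPLIT>)(?=[.,;:!?])")


def add_split_tags(sentence):
    """Insert <SPLIT> before punctuation that is not separated by whitespace."""
    return ATTACHED_PUNCT.sub("<SPLIT>", sentence)


print(add_split_tags("Hello, world."))
```

Punctuation that is already separated by whitespace is left untouched, and running the function twice gives the same result as running it once.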

Does anyone know whether this is the same problem? If so, how can the
'<SPLIT>' tag be specified in the UIMA version? As far as I understand
its role, it is important to give the user the possibility of setting
it.

More generally, I am interested in any feedback from people who have
successfully built models with the UIMA OpenNLP *Trainer
components. So far I have also had some trouble with the SentenceTrainer,
and I have not tested the others.

/Nicolas


-- 
nicolas.hernan...@univ-nantes.fr
#
http://enicolashernandez.blogspot.com
http://www.univ-nantes.fr/hernandez-n
#
Laboratoire LINA-TALN CNRS UMR 6241
tel. +33 (0)2 51 12 58 55
#
Université de Nantes - Institut Universitaire de Technologie -
Département Informatique
tel. +33 (0)2 40 30 60 67
