Hello Nicolas,

I have successfully used the OpenNLP UIMA TokenizerTrainer, as well as the other trainers. As a simple proof of concept, I created an aggregate analysis engine descriptor with the UIMA WhitespaceTokenizer and the OpenNLP TokenizerTrainer in a fixed flow, then used a FileSystemCollectionReader to feed the pipeline. In the TokenizerTrainer I set:

    <nameValuePair>
      <name>opennlp.uima.TokenType</name>
      <value>
        <string>org.apache.uima.TokenAnnotation</string>
      </value>
    </nameValuePair>
    <nameValuePair>
      <name>opennlp.uima.language</name>
      <value>
        <string>en-US</string>
      </value>
    </nameValuePair>
    <nameValuePair>
      <name>opennlp.uima.ModelName</name>
      <value>
        <string>target/Tokens.bin</string>
      </value>
    </nameValuePair>

This created the Tokens.bin model, which I was able to test from the command line and via the APIs. Are you using it in a different way?

Regards,
Tommaso

2011/6/15 Nicolas Hernandez <nicolas.hernan...@gmail.com>

> Hello
>
> Has anyone already used the UIMA TokenizerTrainer component? I am a
> bit confused, since it does not create any model file.
>
> In my stdout I got this:
> Indexing events using cutoff of 5
> Computing event counts...
>
> done. 69669 events
> Indexing... done.
> Sorting and merging events... done. Reduced 69669 events to 16467.
> Done indexing.
> Incorporating indexed data for training...
> done.
> Number of Event Tokens: 16467
> Number of Outcomes: 1
> Number of Predicates: 5624
> ...done.
> Computing model parameters...
> Performing 100 iterations.
> 1: .. loglikelihood=0.0 1.0
> 2: .. loglikelihood=0.0 1.0
>
> This looks like a problem I had when I trained the model on the
> command line without using the '<SPLIT>' tag. The command-line case
> differs in that I also got the following exception:
> Exception in thread "main" java.lang.IllegalArgumentException: The maxent model is not compatible!
>
> I solved this problem by adding the tag, as mentioned in the post
> "maxent model is not compatible with Tokenizer training" (Fri, 13 May,
> 09:33):
>
> http://mail-archives.apache.org/mod_mbox/incubator-opennlp-users/201105.mbox/browser
>
> Does anyone know whether it is the same problem? If so, how do I
> specify the '<SPLIT>' tag in the UIMA version? As far as I understand
> its role, it is important to give the user the possibility of setting
> it.
>
> More generally, I am interested in any feedback from people who have
> successfully built models with the UIMA OpenNLP * Trainer components.
> So far I have also had some trouble with the SentenceTrainer, and I
> have not tested the others.
>
> /Nicolas
>
>
> --
> nicolas.hernan...@univ-nantes.fr
> #
> http://enicolashernandez.blogspot.com
> http://www.univ-nantes.fr/hernandez-n
> #
> Laboratoire LINA-TALN CNRS UMR 6241
> tel. +33 (0)2 51 12 58 55
> #
> Université de Nantes - Institut Universitaire de Technologie -
> Département Informatique
> tel. +33 (0)2 40 30 60 67
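
The aggregate described in the reply (WhitespaceTokenizer followed by TokenizerTrainer in a fixed flow) could be sketched roughly as below. This is only an illustration under stated assumptions: the file names WhitespaceTokenizer.xml and TokenizerTrainer.xml, the delegate keys, and the aggregate name are hypothetical and do not come from the original message. The opennlp.uima.* parameters quoted in the reply would be set in the TokenizerTrainer delegate descriptor, not in this aggregate.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
  <!-- primitive=false marks this descriptor as an aggregate -->
  <primitive>false</primitive>
  <delegateAnalysisEngineSpecifiers>
    <!-- Assumed location of the UIMA WhitespaceTokenizer descriptor -->
    <delegateAnalysisEngine key="WhitespaceTokenizer">
      <import location="WhitespaceTokenizer.xml"/>
    </delegateAnalysisEngine>
    <!-- Assumed location of the OpenNLP TokenizerTrainer descriptor -->
    <delegateAnalysisEngine key="TokenizerTrainer">
      <import location="TokenizerTrainer.xml"/>
    </delegateAnalysisEngine>
  </delegateAnalysisEngineSpecifiers>
  <analysisEngineMetaData>
    <!-- Hypothetical name, chosen for illustration -->
    <name>TokenizerTrainingAggregate</name>
    <flowConstraints>
      <!-- Tokens must exist in the CAS before the trainer sees it -->
      <fixedFlow>
        <node>WhitespaceTokenizer</node>
        <node>TokenizerTrainer</node>
      </fixedFlow>
    </flowConstraints>
  </analysisEngineMetaData>
</analysisEngineDescription>
```

The node names in the fixed flow must match the delegate keys, and the order matters: the trainer reads the token annotations that the tokenizer adds to each CAS.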