Hello,

Has anyone already used the UIMA TokenizerTrainer component? I am a bit confused, since it does not create any model file.

In my stdout I got this:

Indexing events using cutoff of 5
Computing event counts... done. 69669 events
Indexing... done.
Sorting and merging events... done. Reduced 69669 events to 16467.
Done indexing.
Incorporating indexed data for training... done.
Number of Event Tokens: 16467
Number of Outcomes: 1
Number of Predicates: 5624
...done.
Computing model parameters...
Performing 100 iterations.
1: .. loglikelihood=0.0 1.0
2: .. loglikelihood=0.0 1.0

This looks like a problem I had when I trained the model from the command line without using the '<SPLIT>' tag. The command-line case differs in that I also got the following exception:

Exception in thread "main" java.lang.IllegalArgumentException: The maxent model is not compatible!

I solved that problem by adding the tag, as mentioned in the post "maxent model is not compatible with Tokenizer training" (Fri, 13 May, 09:33):
http://mail-archives.apache.org/mod_mbox/incubator-opennlp-users/201105.mbox/browser

Does anyone know if it is the same problem? In that case, how do I specify the '<SPLIT>' tag in the UIMA version? As far as I understand its role, it is important to give the user the possibility of setting it.

More generally, I am interested in any feedback from people who have successfully built models with the UIMA OpenNLP *Trainer components. So far, I have also had some trouble with the SentenceTrainer, and I have not tested the others.

/Nicolas

-- 
nicolas.hernan...@univ-nantes.fr # http://enicolashernandez.blogspot.com
http://www.univ-nantes.fr/hernandez-n # Laboratoire LINA-TALN CNRS UMR 6241
tel. +33 (0)2 51 12 58 55 # Université de Nantes - Institut Universitaire de Technologie - Département Informatique
tel. +33 (0)2 40 30 60 67
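
P.S. To rule out the training data itself, here is a minimal sketch in plain Java (no OpenNLP or UIMA calls) that counts the '<SPLIT>' markers in a training file. The class name SplitTagCheck and its methods are hypothetical names of my own, not part of any library. My assumption, which I have not confirmed, is that a file with zero markers gives the trainer only one kind of outcome, which would match the "Number of Outcomes: 1" line in the log.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of OpenNLP or UIMA.
final class SplitTagCheck {
    static final String SPLIT_TAG = "<SPLIT>";

    // Counts the occurrences of the literal "<SPLIT>" marker in one line.
    static int countSplitTags(String line) {
        int count = 0;
        int from = 0;
        while ((from = line.indexOf(SPLIT_TAG, from)) != -1) {
            count++;
            from += SPLIT_TAG.length();
        }
        return count;
    }

    // Sums the marker counts over all lines of a training file.
    static int countSplitTags(List<String> lines) {
        int total = 0;
        for (String line : lines) {
            total += countSplitTags(line);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, check that file; otherwise use a made-up sample.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8)
                : Arrays.asList("Hello<SPLIT>, how are you<SPLIT>?", "Fine .");
        int total = countSplitTags(lines);
        System.out.println("<SPLIT> markers found: " + total);
        if (total == 0) {
            // Assumption: without markers the trainer never sees a split example.
            System.out.println("No markers: training may end up with a single outcome.");
        }
    }
}
```

Running it with the training file path as the only argument prints the number of markers found. If it reports zero on the data I feed to the UIMA TokenizerTrainer, that would at least suggest the same cause as in the command-line case.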