Hello Nicolas,
I have successfully used the OpenNLP UIMA TokenizerTrainer, as well as the
other trainers. As a simple proof of concept, I created an aggregate analysis
engine descriptor that runs the UIMA WhitespaceTokenizer and the OpenNLP
TokenizerTrainer in a fixed flow, and then used a FileSystemCollectionReader
to feed the pipeline.
In the TokenizerTrainer descriptor I set:

    <nameValuePair>
      <name>opennlp.uima.TokenType</name>
      <value>
        <string>org.apache.uima.TokenAnnotation</string>
      </value>
    </nameValuePair>
    <nameValuePair>
      <name>opennlp.uima.language</name>
      <value>
        <string>en-US</string>
      </value>
    </nameValuePair>
    <nameValuePair>
      <name>opennlp.uima.ModelName</name>
      <value>
        <string>target/Tokens.bin</string>
      </value>
    </nameValuePair>

This produced the Tokens.bin model, which I was able to test both from the
command line and via the API.
Are you using it in a different way?
Regards,
Tommaso


2011/6/15 Nicolas Hernandez <nicolas.hernan...@gmail.com>

> Hello
>
> Has anyone already used the UIMA TokenizerTrainer component? I am a
> bit confused, since it does not create any model file.
>
> On stdout I got this:
> Indexing events using cutoff of 5
>        Computing event counts...
>
> done. 69669 events
>        Indexing...  done.
> Sorting and merging events... done. Reduced 69669 events to 16467.
> Done indexing.
> Incorporating indexed data for training...
> done.
>        Number of Event Tokens: 16467
>            Number of Outcomes: 1
>          Number of Predicates: 5624
> ...done.
> Computing model parameters...
> Performing 100 iterations.
>  1:  .. loglikelihood=0.0      1.0
>  2:  .. loglikelihood=0.0      1.0
>
> This looks like a problem I had when I trained the model from the
> command line without using the '<SPLIT>' tag. The command-line case was
> slightly different, though, because there I also got the following
> exception:
> Exception in thread "main" java.lang.IllegalArgumentException: The
> maxent model is not compatible!
>
> I solved that problem by adding the tag, as described in the post
> "maxent model is not compatible with Tokenizer training" (Fri, 13 May,
> 09:33):
>
> http://mail-archives.apache.org/mod_mbox/incubator-opennlp-users/201105.mbox/browser
>
> Does anyone know whether it is the same problem? If so, how can the
> '<SPLIT>' tag be specified in the UIMA version? As far as I understand
> its role, it is important to give the user the possibility of setting
> it.
>
> More generally, I am interested in feedback from anyone who has
> successfully built models with the UIMA OpenNLP *Trainer components.
> So far I have also had some trouble with the SentenceTrainer, and I
> have not tested the others yet.
>
> /Nicolas
>
>
> --
> nicolas.hernan...@univ-nantes.fr
> #
> http://enicolashernandez.blogspot.com
> http://www.univ-nantes.fr/hernandez-n
> #
> Laboratoire LINA-TALN CNRS UMR 6241
> tel. +33 (0)2 51 12 58 55
> #
> Université de Nantes - Institut Universitaire de Technologie -
> Département Informatique
> tel. +33 (0)2 40 30 60 67
>
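A possible reading of the log above, offered as an assumption rather than a confirmed diagnosis: as I understand it, the OpenNLP maxent tokenizer first splits on whitespace and then learns, for positions inside each whitespace-delimited token, whether to split there. If the token annotations come from a whitespace tokenizer, every token boundary coincides with whitespace, so no split ever occurs inside a token. That would leave a single outcome (consistent with "Number of Outcomes: 1") that the model predicts with probability 1 (consistent with loglikelihood=0.0). The minimal sketch below renders a sentence and its token offsets in the '<SPLIT>' training format discussed in this thread. The names SplitFormat, Span, toTrainingLine and whitespaceTokens are hypothetical illustrations, not part of the OpenNLP or UIMA APIs, and the record syntax needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not part of OpenNLP or UIMA: renders a sentence and its
 * token offsets in the '<SPLIT>' tokenizer training format.
 */
public class SplitFormat {

    /** Character offsets of one token, start inclusive and end exclusive. */
    record Span(int start, int end) {}

    static final String SPLIT = "<SPLIT>";

    /**
     * Joins the tokens of one sentence into a training line. Tokens with any
     * characters between them are joined with a single space; tokens that touch
     * (no characters between them) are joined with the <SPLIT> marker.
     */
    static String toTrainingLine(String text, List<Span> tokens) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            Span token = tokens.get(i);
            if (i > 0) {
                Span prev = tokens.get(i - 1);
                sb.append(prev.end() == token.start() ? SPLIT : " ");
            }
            sb.append(text, token.start(), token.end());
        }
        return sb.toString();
    }

    /** Mimics a whitespace tokenizer: each maximal run of non-whitespace is one token. */
    static List<Span> whitespaceTokens(String text) {
        List<Span> spans = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                spans.add(new Span(start, i));
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Mr. Smith, 42, left.";

        // Whitespace tokens never touch each other, so no <SPLIT> can appear.
        System.out.println(toTrainingLine(text, whitespaceTokens(text)));

        // Punctuation split off by hand: touching tokens now produce <SPLIT>.
        List<Span> fine = List.of(
                new Span(0, 3), new Span(4, 9), new Span(9, 10),
                new Span(11, 13), new Span(13, 14), new Span(15, 19), new Span(19, 20));
        System.out.println(toTrainingLine(text, fine));
    }
}
```

Running main prints the whitespace-tokenized sentence unchanged, "Mr. Smith, 42, left.", and the hand-tokenized one as "Mr. Smith<SPLIT>, 42<SPLIT>, left<SPLIT>.". Only the second kind of data contains positive split examples for the trainer to learn from.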
