Re: UIMA TokenizerTrainer component : the model file is not created

Jörn Kottmann Wed, 22 Jun 2011 03:16:23 -0700

On 6/15/11 4:46 PM, Nicolas Hernandez wrote:

Hello


Has anyone already used the UIMA TokenizerTrainer component? I
am a bit confused, since it does not create any model file.

In my stdout I got this:
Indexing events using cutoff of 5
        Computing event counts...

done. 69669 events
        Indexing...  done.
Sorting and merging events... done. Reduced 69669 events to 16467.
Done indexing.
Incorporating indexed data for training...
done.
        Number of Event Tokens: 16467
            Number of Outcomes: 1
          Number of Predicates: 5624
...done.
Computing model parameters...
Performing 100 iterations.
   1:  .. loglikelihood=0.0     1.0
   2:  .. loglikelihood=0.0     1.0

This looks like a problem I had when I trained the model on the command
line without using the '<SPLIT>' tag. The command line case differs
slightly, since there I also got the following exception:
Exception in thread "main" java.lang.IllegalArgumentException: The
maxent model is not compatible!

I solved this problem by adding the tag, as mentioned in the post
"maxent model is not compatible with Tokenizer training" (Fri, 13 May,
09:33):

http://mail-archives.apache.org/mod_mbox/incubator-opennlp-users/201105.mbox/browser

Does anyone know if it is the same problem? If so, how can the
'<SPLIT>' tag be specified in the UIMA version? As far as I understand
its role, it is important to give the user the possibility of setting
it.

The <SPLIT> tag is not supported by the UIMA trainer version; there you
simply annotate your tokens with a UIMA annotation. The training code does
not work when you annotate white space tokenized text, since then it
cannot figure out which tokens have been written together and which have
not.
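To illustrate the point, here is a minimal Java sketch (assuming Java 16+ for records) of how token annotations over the original text relate to the command-line training format: adjacent tokens with no characters between them are joined by <SPLIT>, all others by a single space. SplitTagWriter, its Span record, and toTrainingLine are hypothetical names used only for illustration; they are not part of OpenNLP or its UIMA integration. If the input were already white space tokenized, no two tokens would ever touch, so no <SPLIT> marker could be produced.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of OpenNLP): renders token spans over the
 * original text as one line in the tokenizer training format, where <SPLIT>
 * marks a token boundary that has no whitespace in the original text.
 */
public class SplitTagWriter {

    /** Token span over the original text; end offset is exclusive. */
    record Span(int start, int end) {}

    static String toTrainingLine(String text, List<Span> tokens) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            Span tok = tokens.get(i);
            if (i > 0) {
                Span prev = tokens.get(i - 1);
                // Tokens "written together" have no characters between them;
                // otherwise a single space stands in for the original whitespace.
                line.append(prev.end() == tok.start() ? "<SPLIT>" : " ");
            }
            line.append(text, tok.start(), tok.end());
        }
        return line.toString();
    }

    public static void main(String[] args) {
        String text = "Hello, world.";
        // Token annotations as a UIMA annotator might produce them.
        List<Span> tokens = List.of(
            new Span(0, 5), new Span(5, 6), new Span(7, 12), new Span(12, 13));
        System.out.println(toTrainingLine(text, tokens)); // prints Hello<SPLIT>, world<SPLIT>.
    }
}
```

The sketch keeps offsets rather than token strings because, as described above, the information the trainer needs is exactly whether two tokens were adjacent in the original text.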

In UIMA you almost always want to work with the original text, which is
usually not white space tokenized. To track the tokens, token annotations
can be added to the CAS.

I guess in your test the serialization code failed because the model only
had one outcome; that can be considered a bug and should be fixed in some
way.
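A simplified Java sketch of why the log in the question reports "Number of Outcomes: 1". It assumes the trainer asks, for every character position inside a white space delimited chunk, whether a token boundary falls there ("split") or not ("no-split"). This is not OpenNLP's actual event generation code, which also computes features and may skip some chunks; OutcomeCounter and countOutcomes are hypothetical names. With white space tokenized input, every token boundary coincides with whitespace, so only "no-split" events are ever produced.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Simplified sketch (not OpenNLP's implementation): counts split/no-split
 * outcomes for interior character positions of white space delimited chunks.
 */
public class OutcomeCounter {

    /** tokens: {start, end} offsets into text, end exclusive. */
    static Map<String, Integer> countOutcomes(String text, List<int[]> tokens) {
        // Every token start and end offset is a boundary position.
        Set<Integer> boundaries = new HashSet<>();
        for (int[] t : tokens) {
            boundaries.add(t[0]);
            boundaries.add(t[1]);
        }
        Map<String, Integer> counts = new TreeMap<>();
        int i = 0;
        while (i < text.length()) {
            // Skip whitespace up to the start of the next chunk.
            if (Character.isWhitespace(text.charAt(i))) { i++; continue; }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            // Interior positions of chunk [start, i) are the candidate split points.
            for (int pos = start + 1; pos < i; pos++) {
                String outcome = boundaries.contains(pos) ? "split" : "no-split";
                counts.merge(outcome, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Original text: the comma and period are written together with the words.
        System.out.println(countOutcomes("Hello, world.",
            List.of(new int[]{0, 5}, new int[]{5, 6}, new int[]{7, 12}, new int[]{12, 13})));
        // prints {no-split=8, split=2}

        // White space tokenized text: no token boundary inside any chunk.
        System.out.println(countOutcomes("Hello , world .",
            List.of(new int[]{0, 5}, new int[]{6, 7}, new int[]{8, 13}, new int[]{14, 15})));
        // prints {no-split=8}
    }
}
```

A training set with a single outcome gives the classifier nothing to discriminate, which is consistent with the loglikelihood=0.0 lines in the question's log.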

Jörn
