Confusing error message in tokenizer training
---------------------------------------------

                 Key: OPENNLP-371
                 URL: https://issues.apache.org/jira/browse/OPENNLP-371
             Project: OpenNLP
          Issue Type: Improvement
          Components: Tokenizer
    Affects Versions: tools-1.5.3-incubating
            Reporter: Aliaksandr Autayeu
            Priority: Minor


The following error message

java.lang.IllegalArgumentException: The maxent model is not compatible with the tokenizer!
        at opennlp.tools.util.model.BaseModel.checkArtifactMap(BaseModel.java:275)
        at opennlp.tools.tokenize.TokenizerModel.<init>(TokenizerModel.java:73)
        at opennlp.tools.tokenize.TokenizerME.train(TokenizerME.java:267)
        at opennlp.tools.tokenize.TokenizerME.train(TokenizerME.java:231)
        at opennlp.tools.tokenize.TokenizerME.train(TokenizerME.java:293)
        at opennlp.tools.tokenize.TokenizerTestUtil.createMaxentTokenModel(TokenizerTestUtil.java:67)
        at opennlp.tools.tokenize.TokenizerMETest.testTokenizer(TokenizerMETest.java:54)
... cut

might be confusing. 

Due to an error in my conversion tool, I tried to train a tokenizer model on 
data without <SPLIT> tags, which resulted in a model with only one outcome. 
This model did not pass validation in ModelUtil.validateOutcomes(), which is 
correct. However, the error message is a bit confusing, and it took me some 
time to understand what was going on. 

I would agree that a model with different outcomes than expected is 
incompatible with the tool, but what about one with fewer outcomes? Is a model 
with fewer outcomes than expected really incompatible? For example, for the POS 
tagger I have corpora and models which use a subset of the PTB tagset. 

However, in the case of the tokenizer this incompatibility makes sense (a model 
with one outcome does not work), and here the message could be improved to 
indicate the cause better. Something like: "The maxent model is not compatible 
with the tokenizer: outcome XXX is not found". 
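
To illustrate the kind of message I have in mind, here is a minimal standalone 
sketch. It is not the OpenNLP code: the class OutcomeCheck, its methods, and the 
outcome labels "split" and "no-split" are hypothetical names used only for this 
example. It compares the outcomes found in a model with the outcomes the tool 
expects and names the missing ones in the exception.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OutcomeCheck {

    // Hypothetical helper, not part of OpenNLP: returns the expected
    // outcomes that are absent from the model's outcomes.
    static List<String> missingOutcomes(List<String> modelOutcomes, String... expected) {
        List<String> missing = new ArrayList<>();
        for (String outcome : expected) {
            if (!modelOutcomes.contains(outcome)) {
                missing.add(outcome);
            }
        }
        return missing;
    }

    // Throws an exception whose message names the missing outcomes,
    // instead of only stating that the model is incompatible.
    static void validate(List<String> modelOutcomes, String... expected) {
        List<String> missing = missingOutcomes(modelOutcomes, expected);
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException(
                "The maxent model is not compatible with the tokenizer: outcome(s) "
                + missing + " not found");
        }
    }

    public static void main(String[] args) {
        // Simulates a model trained on data without <SPLIT> tags:
        // only one of the two expected outcomes is present.
        // The labels "split" and "no-split" are placeholders.
        try {
            validate(Arrays.asList("no-split"), "split", "no-split");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a message of this shape, the missing <SPLIT> tags in my training data 
would have been much easier to track down.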

Please advise. Thank you!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
