[ 
https://issues.apache.org/jira/browse/OPENNLP-371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149095#comment-13149095
 ] 

Aliaksandr Autayeu commented on OPENNLP-371:
--------------------------------------------

To reproduce the error message, remove all <SPLIT> tags from token.train and
run TokenizerMETest.
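
A minimal sketch of the first reproduction step, assuming the training data is plain text with inline <SPLIT> markers. The class name, method name and sample lines are made up for illustration and are not taken from token.train or from OpenNLP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StripSplitTags {

    // Removes every <SPLIT> marker from each training line (hypothetical helper).
    public static List<String> stripSplits(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            result.add(line.replace("<SPLIT>", ""));
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented sample lines, not the real contents of token.train.
        List<String> sample = Arrays.asList(
                "Hello<SPLIT>, world<SPLIT>.",
                "No tags here .");
        for (String line : stripSplits(sample)) {
            System.out.println(line);
        }
    }
}
```

Training on data cleaned this way leaves the model with a single outcome, which is what triggers the exception quoted below.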
                
> Confusing error message in tokenizer training
> ---------------------------------------------
>
>                 Key: OPENNLP-371
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-371
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Tokenizer
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>            Priority: Minor
>              Labels: model, tokenizer, training
>
> The following error message
> java.lang.IllegalArgumentException: The maxent model is not compatible with the tokenizer!
>       at opennlp.tools.util.model.BaseModel.checkArtifactMap(BaseModel.java:275)
>       at opennlp.tools.tokenize.TokenizerModel.<init>(TokenizerModel.java:73)
>       at opennlp.tools.tokenize.TokenizerME.train(TokenizerME.java:267)
>       at opennlp.tools.tokenize.TokenizerME.train(TokenizerME.java:231)
>       at opennlp.tools.tokenize.TokenizerME.train(TokenizerME.java:293)
>       at opennlp.tools.tokenize.TokenizerTestUtil.createMaxentTokenModel(TokenizerTestUtil.java:67)
>       at opennlp.tools.tokenize.TokenizerMETest.testTokenizer(TokenizerMETest.java:54)
> ... cut
> might be confusing. 
> Due to an error in my conversion tool, I tried to train a tokenizer model on 
> data without <SPLIT>s, which resulted in a model with only one outcome. This 
> model did not pass validation in ModelUtil.validateOutcomes(), which is 
> correct. However, the error message is a bit confusing, and it took some time 
> to understand what was going on. 
> I would agree that a model with different outcomes than expected is 
> incompatible with the tool, but with fewer outcomes? Is a model with fewer 
> outcomes than expected really incompatible? For example, with the POS tagger I 
> have corpora and models which use a subset of the PTB tagset. 
> However, in the case of the tokenizer this incompatibility makes sense (a 
> model with 1 outcome does not work), and in this case the message might be 
> improved to indicate the cause better. Something like: "The maxent model is 
> not compatible with the tokenizer: outcome XXX is not found". 
> Please advise. Thank you!
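
The kind of check that could produce the suggested message might look roughly like the sketch below. It is standalone Java and does not call OpenNLP. The class and method names are hypothetical, and the outcome labels "T" and "F" are assumed for illustration only. It is not the actual code of ModelUtil.validateOutcomes().

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class OutcomeCheck {

    // Assumed outcome labels for split / no-split; the real labels may differ.
    public static final List<String> EXPECTED = Arrays.asList("T", "F");

    // Throws if any expected outcome is absent from the model's outcomes,
    // naming the first missing outcome in the message.
    public static void requireOutcomes(List<String> expected, List<String> actual) {
        Set<String> actualSet = new LinkedHashSet<>(actual);
        for (String outcome : expected) {
            if (!actualSet.contains(outcome)) {
                throw new IllegalArgumentException(
                        "The maxent model is not compatible with the tokenizer: outcome "
                                + outcome + " is not found");
            }
        }
    }

    public static void main(String[] args) {
        // A model trained on data without <SPLIT>s would only know one outcome.
        List<String> modelOutcomes = Arrays.asList("F");
        try {
            requireOutcomes(EXPECTED, modelOutcomes);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This sketch only reports missing outcomes. Whether unexpected extra outcomes should also be rejected is a separate decision and is left out here.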


        
