[ 
https://issues.apache.org/jira/browse/OPENNLP-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072986#comment-13072986
 ] 

Jörn Kottmann commented on OPENNLP-238:
---------------------------------------

The model has an outcome list, which contains all outcomes observed in your 
training data. On each prediction it calculates the probability of each 
outcome. The tag dictionary then restricts the allowed outcomes to a smaller 
set, which makes tagging more accurate and speeds the whole process up.
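
To make the effect of the tag dictionary concrete, here is a minimal sketch 
in Java. It is not the OpenNLP code: the class TagDictFilterSketch, the method 
bestAllowed, and the example tags and probabilities are all made up for 
illustration. It also shows the case where the dictionary lists tags for a 
token but none of them is a model outcome, so nothing can be selected.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the OpenNLP implementation.
public class TagDictFilterSketch {

    /**
     * Returns the most probable outcome that the dictionary allows for the
     * token. Tokens missing from the dictionary are unrestricted. Returns
     * null when the dictionary has an entry for the token but none of its
     * tags is among the model outcomes.
     */
    public static String bestAllowed(String[] outcomes, double[] probs,
                                     Map<String, Set<String>> tagDict,
                                     String token) {
        Set<String> allowed = tagDict.get(token); // null: token not listed
        String best = null;
        double bestProb = -1.0;
        for (int i = 0; i < outcomes.length; i++) {
            if (allowed != null && !allowed.contains(outcomes[i])) {
                continue; // outcome ruled out by the dictionary
            }
            if (probs[i] > bestProb) {
                bestProb = probs[i];
                best = outcomes[i];
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumed outcome list and probabilities for one prediction.
        String[] outcomes = {"n", "v", "adj"};
        double[] probs = {0.6, 0.3, 0.1};

        Map<String, Set<String>> dict = new HashMap<>();
        dict.put("acompanhas", new HashSet<>(Arrays.asList("v")));
        dict.put("rare", new HashSet<>(Arrays.asList("x")));

        System.out.println(bestAllowed(outcomes, probs, dict, "acompanhas")); // v
        System.out.println(bestAllowed(outcomes, probs, dict, "casa"));       // n
        System.out.println(bestAllowed(outcomes, probs, dict, "rare"));       // null
    }
}
```

The last call is the interesting one: the dictionary entry and the outcome 
list do not overlap, so the filter leaves nothing to choose from.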

When the POS tagger comes to "acompanhas" it should advance the existing 
sequences with one of the best predicted outcomes, or, if that fails, just 
advance all valid sequences. For some reason the latter fails and it does not 
advance anything, right? That is strange and indicates that we have a bug 
somewhere.
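
The two-stage advance described above can be sketched roughly as follows. 
This is a simplified illustration and not the actual BeamSearch code: the 
class BeamStepSketch, the advance method, the minProb cutoff, and the 
allowInvalid flag are assumptions, and the same probabilities are used for 
every sequence to keep it short. The optional third stage corresponds to the 
suggestion in the quoted issue description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch of a single beam search step, not the OpenNLP code.
public class BeamStepSketch {

    /** Returns a copy of seq with outcome appended. */
    private static List<String> extend(List<String> seq, String outcome) {
        List<String> copy = new ArrayList<>(seq);
        copy.add(outcome);
        return copy;
    }

    /** Extends every sequence in prev by one outcome. */
    public static List<List<String>> advance(
            List<List<String>> prev, String[] outcomes, double[] probs,
            double minProb, BiPredicate<List<String>, String> validator,
            boolean allowInvalid) {
        List<List<String>> next = new ArrayList<>();
        // Stage 1: outcomes that are probable enough and valid.
        for (List<String> seq : prev) {
            for (int i = 0; i < outcomes.length; i++) {
                if (probs[i] >= minProb && validator.test(seq, outcomes[i])) {
                    next.add(extend(seq, outcomes[i]));
                }
            }
        }
        // Stage 2: nothing survived, so accept every valid outcome.
        if (next.isEmpty()) {
            for (List<String> seq : prev) {
                for (String outcome : outcomes) {
                    if (validator.test(seq, outcome)) {
                        next.add(extend(seq, outcome));
                    }
                }
            }
        }
        // Stage 3 (proposed in the issue): fall back to invalid outcomes
        // rather than leaving 'next' empty.
        if (next.isEmpty() && allowInvalid) {
            for (List<String> seq : prev) {
                for (String outcome : outcomes) {
                    next.add(extend(seq, outcome));
                }
            }
        }
        return next; // still empty if no outcome is valid and no fallback
    }

    public static void main(String[] args) {
        List<List<String>> prev = new ArrayList<>();
        prev.add(new ArrayList<>());
        String[] outcomes = {"n", "v"};
        double[] probs = {0.7, 0.3};

        // Only "v" is valid, but it is below the cutoff: stage 2 picks it up.
        System.out.println(advance(prev, outcomes, probs, 0.5,
                (seq, tag) -> tag.equals("v"), false)); // [[v]]
        // No outcome is valid: the result is empty.
        System.out.println(advance(prev, outcomes, probs, 0.5,
                (seq, tag) -> false, false)); // []
        // Same input with the proposed fallback enabled.
        System.out.println(advance(prev, outcomes, probs, 0.5,
                (seq, tag) -> false, true)); // [[n], [v]]
    }
}
```

If the validator rejects every outcome, stages 1 and 2 both leave the list 
empty, and any caller that then takes the best sequence without checking has 
nothing to return.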

I believe it should not be a problem with the tagdict itself, because it is 
validated when the POS Model is loaded. I am not sure what exactly is going 
wrong here.

> BestSequence method in BeamSearch can cause NullPointerException if it can 
> not find a valid sequence
> ----------------------------------------------------------------------------------------------------
>
>                 Key: OPENNLP-238
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-238
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: POS Tagger
>    Affects Versions: tools-1.5.2-incubating
>            Reporter: William Colen
>            Assignee: William Colen
>             Fix For: tools-1.5.2-incubating
>
>
> I am using the standard sequence validator of the POS Tagger with a 
> TagDictionary. Sometimes no outcome matches the tags in the dictionary. 
> That causes a NullPointerException in the bestSequence method.
> I think we should add an extra validation: if the heap 'next' is still empty 
> after advancing all valid sequences (line 159), we should let it advance 
> invalid sequences. That would make the POS Tagger more robust.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

