[ 
https://issues.apache.org/jira/browse/OPENNLP-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025024#comment-13025024
 ] 

Jason Baldridge commented on OPENNLP-155:
-----------------------------------------

2011/4/22 Jörn Kottmann (JIRA) <[email protected]>

Yes. However, we can certainly fix this so that it is both fast and correct.
I just coded it to get the right answer, but it is essentially doing double
work now.


Hmm... so there is actually an odd aspect of how the perceptron is
implemented that isn't the textbook way. The parameters for the correct
label are updated if it is assigned a score <= zero, and the parameters for
any incorrect label are updated if it gets a score > zero. Normally, an
update happens whenever an incorrect label gets a higher score than the
correct label, regardless of whether the scores are positive or negative.
Anyway, I changed it so that the same code is used for both cases, based on
the updateValue variable. What that means is that incorrect labels now also
get updated when their score is exactly zero. Otherwise the code should
behave the same. That is the likely source of the difference: initially the
scores for many examples will be zero, so the updates made in the first pass
differ from those in the previous version. You could test that by
changing the line in the previous
version<http://svn.apache.org/viewvc/incubator/opennlp/trunk/opennlp-maxent/src/main/java/opennlp/perceptron/PerceptronTrainer.java?revision=1049002&view=markup>
:

else {
   if (modelDistribution[oi] > 0) {

to be >= 0 instead.
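
To make the distinction concrete, here is a minimal sketch of the two update
triggers as described above. It is not the PerceptronTrainer code: the class
and method names are made up for illustration, and the per-label loop is an
assumption based on the snippet quoted above. The inclusiveZero flag switches
the incorrect-label test between > 0 and >= 0.

```java
// Hypothetical sketch; names are illustrative, not from OpenNLP.
final class PerceptronUpdateSketch {

    /**
     * Textbook trigger as stated in this comment: update when some
     * incorrect label scores strictly higher than the correct label.
     */
    static boolean textbookShouldUpdate(double[] scores, int correct) {
        for (int oi = 0; oi < scores.length; oi++) {
            if (oi != correct && scores[oi] > scores[correct]) {
                return true;
            }
        }
        return false;
    }

    /**
     * Sign-based trigger, decided per label: the correct label is updated
     * when its score is <= 0; an incorrect label is updated when its score
     * is > 0 (inclusiveZero == false) or >= 0 (inclusiveZero == true).
     */
    static boolean[] signBasedUpdates(double[] scores, int correct,
                                      boolean inclusiveZero) {
        boolean[] update = new boolean[scores.length];
        for (int oi = 0; oi < scores.length; oi++) {
            if (oi == correct) {
                update[oi] = scores[oi] <= 0;
            } else {
                update[oi] = inclusiveZero ? scores[oi] >= 0 : scores[oi] > 0;
            }
        }
        return update;
    }

    public static void main(String[] args) {
        // All-zero scores, as on the first pass before any update.
        double[] zeros = {0.0, 0.0, 0.0};
        System.out.println(java.util.Arrays.toString(
                signBasedUpdates(zeros, 0, false)));
        System.out.println(java.util.Arrays.toString(
                signBasedUpdates(zeros, 0, true)));
        System.out.println(textbookShouldUpdate(zeros, 0));
    }
}
```

With all-zero scores the two variants already disagree: the strict test
touches only the correct label, while the inclusive test touches every label.
That is the first-pass difference described above.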

Jason





-- 
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge


> unreliable training set accuracy in perceptron
> ----------------------------------------------
>
>                 Key: OPENNLP-155
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-155
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Maxent
>    Affects Versions: maxent-3.0.1-incubating
>            Reporter: Jason Baldridge
>            Assignee: Jason Baldridge
>            Priority: Minor
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> The training accuracies reported during perceptron training were much higher 
> than final training accuracy, which turned out to be an artifact of the way 
> training examples were ordered.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
