[
https://issues.apache.org/jira/browse/OPENNLP-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046446#comment-13046446
]
Jörn Kottmann commented on OPENNLP-199:
---------------------------------------
The bug is not caused by the strict compare; the delta between the two values is quite large.
Here is the stack trace:
Error Message
expected:<0.7833622183708839> but was:<0.7813815300817034>
Stacktrace
junit.framework.AssertionFailedError: expected:<0.7833622183708839> but was:<0.7813815300817034>
    at junit.framework.Assert.fail(Assert.java:47)
    at junit.framework.Assert.failNotEquals(Assert.java:283)
    at junit.framework.Assert.assertEquals(Assert.java:64)
    at junit.framework.Assert.assertEquals(Assert.java:71)
    at opennlp.perceptron.PerceptronPrepAttachTest.testPerceptronOnPrepAttachData(PerceptronPrepAttachTest.java:60)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:592)
    at junit.framework.TestCase.runTest(TestCase.java:168)
    at junit.framework.TestCase.runBare(TestCase.java:134)
    at junit.framework.TestResult$1.protect(TestResult.java:110)
    at junit.framework.TestResult.runProtected(TestResult.java:128)
    at junit.framework.TestResult.run(TestResult.java:113)
    at junit.framework.TestCase.run(TestCase.java:124)
    at junit.framework.TestSuite.runTest(TestSuite.java:232)
    at junit.framework.TestSuite.run(TestSuite.java:227)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
    at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:35)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:115)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:97)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:592)
    at org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:103)
    at $Proxy0.invoke(Unknown Source)
    at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:150)
    at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:91)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69)
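
For reference, a minimal sketch (not the actual PerceptronPrepAttachTest code) of what a tolerance-based compare would look like in JUnit 3 style; the class name and delta value are placeholders. The point of the comment above is that even a loose tolerance would not cover the observed difference of roughly 0.002:

    import junit.framework.TestCase;

    // Sketch only, not the actual test. Shows the delta-based assertEquals;
    // the observed difference (~0.002) still exceeds the tolerance used here.
    public class AccuracyToleranceSketch extends TestCase {

      public void testAccuracyWithinTolerance() {
        double expected = 0.7833622183708839; // value the test expects
        double actual = 0.7813815300817034;   // value from the failing run
        // assertEquals(double, double, double) tolerates differences up to delta
        assertEquals(expected, actual, 0.0001);
      }
    }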
> Refactor the PerceptronTrainer class to address a couple of problems
> --------------------------------------------------------------------
>
> Key: OPENNLP-199
> URL: https://issues.apache.org/jira/browse/OPENNLP-199
> Project: OpenNLP
> Issue Type: Improvement
> Components: Maxent
> Affects Versions: maxent-3.0.1-incubating
> Reporter: Jörn Kottmann
> Assignee: Jason Baldridge
> Fix For: tools-1.5.2-incubating, maxent-3.0.2-incubating
>
>
> - Changed the update to be the actual perceptron update: when a label
> that is not the gold label is chosen for an event, the parameters
> associated with that label are decremented and the parameters
> associated with the gold label are incremented. I checked this
> empirically on several datasets, and it works better than the
> previous update (and it involves fewer updates). See the first sketch
> after this list.
> - stepsize is decayed to stepsize/1.05 on every iteration, ensuring
> better stability toward the end of training. This addresses what was
> actually the main reason that the training set accuracy obtained
> during parameter updates kept differing from the accuracy computed
> with the parameters held fixed. Now the parameters don't jump around
> as much in later iterations, so things settle down and those two
> accuracies converge if enough iterations are allowed. See the decay
> sketch after this list.
> - Training set accuracy is computed once per iteration.
> - Training stops if the current training set accuracy differs by less
> than a given tolerance from each of the accuracies obtained in the
> previous three iterations (see the convergence sketch after this
> list).
> - Averaging is done differently than before. Rather than doing an
> immediate update, parameters are simply accumulated after iterations
> (this makes the code much easier to understand/maintain). Also, not
> every iteration is used, as that tends to give too much weight to the
> final iterations, which don't actually differ much from one another.
> I tried a few things and found a simple method that works well: sum
> the parameters from the first 20 iterations and then sum parameters
> from any further iterations whose number is a perfect square (25, 36,
> 49, etc.). This gets a good (diverse) sample of parameters for
> averaging, since the spacing between successive sampled parameter
> sets grows as the number of iterations gets bigger. See the averaging
> sketch after this list.
> - Added the prepositional phrase attachment dataset to
> src/test/resources/data/ppa. This is done with permission from
> Adwait Ratnaparkhi -- see the README for details.
> - Created unit test to check perceptron training consistency, using
> the prepositional phrase attachment data. It would be good to do the
> same for maxent.
> - Added ListEventStream to make a stream out of a List<Event>.
> - Added some helper methods, e.g. maxIndex, to simplify the code in
> the main algorithm.
> - The training stats are no longer shown for every iteration. Now they
> are shown only for the first 10 iterations and then for every 10th
> iteration after that.
> - modelDistribution, params, evalParams and others are no longer class
> variables; they have been pushed into the findParameters method.
> Other variables could/should be made non-global too, but they are
> left as is for now.
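
The sketches below illustrate the items above. They are minimal illustrations only, not the actual OpenNLP code; all class, method, and variable names are assumptions made for the examples.

First, the perceptron update from the first item: when the chosen label differs from the gold label, the gold label's parameters are incremented and the chosen label's parameters are decremented for the active features.

    // Sketch only, not the OpenNLP implementation. Assumes a dense
    // params[label][feature] weight table and integer feature indices.
    static void perceptronUpdate(double[][] params, int[] activeFeatures,
                                 int goldLabel, int predictedLabel,
                                 double stepsize) {
      if (predictedLabel == goldLabel) {
        return; // correct prediction: no update needed
      }
      for (int f : activeFeatures) {
        params[goldLabel][f] += stepsize;      // promote the gold label
        params[predictedLabel][f] -= stepsize; // demote the wrongly chosen label
      }
    }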
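Second, the stepsize schedule, read here as stepsize being replaced by stepsize/1.05 after each iteration; the initial value and iteration count are placeholders.

    // Sketch only: prints the decaying stepsize for the first few iterations.
    public class StepsizeDecaySketch {
      public static void main(String[] args) {
        double stepsize = 1.0;
        for (int iteration = 1; iteration <= 10; iteration++) {
          System.out.printf("iteration %d: stepsize = %.4f%n", iteration, stepsize);
          stepsize /= 1.05; // smaller updates later on, so the parameters settle
        }
      }
    }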
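Third, the convergence check: training stops once the current training set accuracy is within the tolerance of each of the previous three accuracies.

    // Sketch only: stopping rule against the last three iterations' accuracies.
    static boolean converged(double[] previousThreeAccuracies,
                             double currentAccuracy, double tolerance) {
      for (double previous : previousThreeAccuracies) {
        if (Math.abs(currentAccuracy - previous) >= tolerance) {
          return false; // still moving by more than the tolerance allows
        }
      }
      return true;
    }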
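Finally, the averaging schedule: each of the first 20 iterations contributes to the parameter sum, and after that only iterations whose number is a perfect square (25, 36, 49, ...) do.

    // Sketch only: decides whether an iteration's parameters are added to
    // the running sum used for averaging.
    static boolean usedForAveraging(int iteration) {
      if (iteration <= 20) {
        return true;
      }
      int root = (int) Math.round(Math.sqrt(iteration));
      return root * root == iteration; // beyond 20, perfect squares only
    }

This spacing grows as training goes on, which is what gives the diverse sample of parameter sets mentioned above.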
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira