[jira] [Assigned] (OPENNLP-1381) OpenJDK 18+: CLITest fails with java.lang.UnsupportedOperationException: The Security Manager is deprecated and will be removed in a future release
[ https://issues.apache.org/jira/browse/OPENNLP-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi reassigned OPENNLP-1381:
    Assignee:     (was: Suneel Marthi)

> OpenJDK 18+: CLITest fails with java.lang.UnsupportedOperationException: The Security Manager is deprecated and will be removed in a future release
> ---
>
> Key: OPENNLP-1381
> URL: https://issues.apache.org/jira/browse/OPENNLP-1381
> Project: OpenNLP
> Issue Type: Bug
> Components: Command Line Interface
> Affects Versions: 2.1.0
> Environment: MacOS Monterey 12.5 Intel (same issue in M1 chip)
>              brew-installed OpenJDK 18.0.2
> Reporter: Bertrand Rigaldies
> Priority: Major
> Fix For: 2.1.1
> Attachments: Screen Shot 2022-07-31 at 9.10.48 PM.png, Screen Shot 2022-07-31 at 9.11.09 PM.png
>
> As of OpenJDK 18, the Security Manager has been deprecated (see JEP 411, https://openjdk.org/jeps/411), which fails all tests in CLITest.java:
>
> java.lang.UnsupportedOperationException: The Security Manager is deprecated and will be removed in a future release
>     at java.base/java.lang.System.setSecurityManager(System.java:416)
>     at opennlp.tools.cmdline.CLITest.installNoExitSecurityManager(CLITest.java:66)
>     at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:577)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>     at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>     at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
>     at com.intellij.rt.junit.IdeaTestRunner$Repeater$1.execute(IdeaTestRunner.java:38)
>     at com.intellij.rt.execution.junit.TestsRepeater.repeat(TestsRepeater.java:11)
>     at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:35)
>     at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:235)
>     at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:54)

--
This message was sent by Atlassian Jira (v8.20.10#820010)
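The trace in this report shows the test failing while it installs a "no exit" Security Manager. One possible way to test command-line exit behaviour without a Security Manager is to have the tool return an exit code and leave the System.exit call to the outermost entry point. The sketch below is only an illustration under that assumption: the class CliExitSketch, its run method and the "help" command are hypothetical, and nothing here claims to be how OPENNLP-1381 was actually fixed.

```java
/** Hypothetical command-line tool; not OpenNLP's actual CLI class. */
final class CliExitSketch {

    /**
     * Runs the tool and returns an exit code instead of calling System.exit,
     * so a test can assert on the code without intercepting JVM shutdown.
     */
    static int run(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: tool <command>");
            return 0;
        }
        if (!"help".equals(args[0])) {
            System.err.println("unknown command: " + args[0]);
            return 1;
        }
        System.out.println("help text would be printed here");
        return 0;
    }

    public static void main(String[] args) {
        // A real entry point would be the only place that ends the JVM,
        // for example: System.exit(run(args));
        // Here the exit codes are only printed, to keep the demo harmless.
        System.out.println("exit code for 'help': " + run(new String[] {"help"}));
        System.out.println("exit code for 'bogus': " + run(new String[] {"bogus"}));
    }
}
```

A test can then call run directly and compare the returned code, which works the same on JDKs where System.setSecurityManager throws.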
[jira] [Updated] (OPENNLP-1381) OpenJDK 18+: CLITest fails with java.lang.UnsupportedOperationException: The Security Manager is deprecated and will be removed in a future release
[ https://issues.apache.org/jira/browse/OPENNLP-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated OPENNLP-1381:
    Affects Version/s: 2.1.0
[jira] [Updated] (OPENNLP-1381) OpenJDK 18+: CLITest fails with java.lang.UnsupportedOperationException: The Security Manager is deprecated and will be removed in a future release
[ https://issues.apache.org/jira/browse/OPENNLP-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated OPENNLP-1381:
    Fix Version/s: 2.1.1
[jira] [Assigned] (OPENNLP-1381) OpenJDK 18+: CLITest fails with java.lang.UnsupportedOperationException: The Security Manager is deprecated and will be removed in a future release
[ https://issues.apache.org/jira/browse/OPENNLP-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi reassigned OPENNLP-1381:
    Assignee: Suneel Marthi
[jira] [Resolved] (OPENNLP-1397) Build should fail fast if an unsupported JDK is used
[ https://issues.apache.org/jira/browse/OPENNLP-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi resolved OPENNLP-1397.
    Fix Version/s: 2.1.0
    Resolution: Fixed

Made the Maven Enforcer plugin run its JDK version validation during the 'validate' phase.

> Build should fail fast if an unsupported JDK is used
> ---
>
> Key: OPENNLP-1397
> URL: https://issues.apache.org/jira/browse/OPENNLP-1397
> Project: OpenNLP
> Issue Type: Task
> Components: Build, Packaging and Test
> Affects Versions: 2.0.0, 2.1.0
> Reporter: Jeff Zemerick
> Assignee: Suneel Marthi
> Priority: Minor
> Fix For: 2.1.0
>
> The build should fail fast if an unsupported JDK is used. Check to see if the Maven Enforcer plugin is configured correctly or if something else is needed.
> This issue came about from [~smarthi] testing the OpenNLP 2.1.0 RC1 using Amazon Corretto 8. The build did not fail until a failed unit test.
[jira] [Updated] (OPENNLP-1271) Illegal Argument Exception
[ https://issues.apache.org/jira/browse/OPENNLP-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated OPENNLP-1271:
    Fix Version/s: 1.9.3

> Illegal Argument Exception
> ---
>
> Key: OPENNLP-1271
> URL: https://issues.apache.org/jira/browse/OPENNLP-1271
> Project: OpenNLP
> Issue Type: Bug
> Components: Name Finder
> Reporter: Raza Abbas
> Priority: Major
> Fix For: 1.9.3
>
> I am using this library in production code. I get the following exception once in a while, so I cannot reproduce it reliably.
>
> java.lang.IllegalArgumentException: The span [18..23) is outside the given text which has length 12!
>     at opennlp.tools.util.Span.getCoveredText(Span.java:231)
>     at opennlp.tools.util.Span.spansToStrings(Span.java:351)
>     at opennlp.tools.tokenize.AbstractTokenizer.tokenize(AbstractTokenizer.java:25)
>     at opennlp.tools.tokenize.TokenizerME.tokenize(TokenizerME.java:76)
>
> This seems like an internal OpenNLP issue rather than a problem with how I'm using the library. Any help would be appreciated. Thanks.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (OPENNLP-1219) change private instance variable featureGenerators to protected in DefaultNameContextGenerator
[ https://issues.apache.org/jira/browse/OPENNLP-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi closed OPENNLP-1219.

> change private instance variable featureGenerators to protected in DefaultNameContextGenerator
> ---
>
> Key: OPENNLP-1219
> URL: https://issues.apache.org/jira/browse/OPENNLP-1219
> Project: OpenNLP
> Issue Type: Improvement
> Affects Versions: 1.9.0
> Reporter: Koji Sekiguchi
> Assignee: Koji Sekiguchi
> Priority: Minor
> Fix For: 1.9.1
>
> TokenNameFinderTrainer allows users to customize TokenNameFinderFactory via the -factory option. Because I want to override DefaultNameContextGenerator.getContext(), I made a subclass of TokenNameFinderFactory and, in its constructor, created an instance of a subclass of DefaultNameContextGenerator. However, I couldn't implement the getContext() method of my DefaultNameContextGenerator subclass because I couldn't access the private member featureGenerators.
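The access problem described in this issue can be illustrated with a small stand-alone sketch. All types below (BaseContextGenerator, CustomContextGenerator, and the use of Function as a stand-in for a feature generator) are hypothetical and are not OpenNLP's classes; the only point is that a protected field is readable from an overriding method in a subclass, whereas a private one is not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class ProtectedFieldSketch {

    /** Stand-in for a context generator that owns a list of feature generators. */
    static class BaseContextGenerator {
        // protected rather than private, so that subclasses can read the list
        protected final List<Function<String, String>> featureGenerators;

        BaseContextGenerator(List<Function<String, String>> featureGenerators) {
            this.featureGenerators = featureGenerators;
        }

        List<String> getContext(String token) {
            List<String> features = new ArrayList<>();
            for (Function<String, String> generator : featureGenerators) {
                features.add(generator.apply(token));
            }
            return features;
        }
    }

    /** Subclass that adds its own feature and then reuses the inherited generators. */
    static class CustomContextGenerator extends BaseContextGenerator {
        CustomContextGenerator(List<Function<String, String>> featureGenerators) {
            super(featureGenerators);
        }

        @Override
        List<String> getContext(String token) {
            List<String> features = new ArrayList<>();
            features.add("length=" + token.length());
            // This loop compiles only because the field is not private.
            for (Function<String, String> generator : featureGenerators) {
                features.add(generator.apply(token));
            }
            return features;
        }
    }
}
```

With a private field, the loop in CustomContextGenerator would not compile, which matches the limitation the reporter ran into.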
[jira] [Commented] (OPENNLP-1270) Add new languages to the language detector
[ https://issues.apache.org/jira/browse/OPENNLP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023639#comment-17023639 ]

Suneel Marthi commented on OPENNLP-1270:

Could we also look at the Europarl corpus? https://www.statmt.org/europarl/

> Add new languages to the language detector
> ---
>
> Key: OPENNLP-1270
> URL: https://issues.apache.org/jira/browse/OPENNLP-1270
> Project: OpenNLP
> Issue Type: Task
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Major
> Fix For: 1.9.3
> Attachments: report.txt, report.txt
>
> Leipzig has several other languages that might be useful to add to the language detector. I've selected some with > 10k sentences. Once I build the model and evaluate performance, I'll share the reports, the model and a tgz of the *-sentences.txt files.
[jira] [Resolved] (OPENNLP-1264) Trivial fixes to enable building on, gasp, Windows
[ https://issues.apache.org/jira/browse/OPENNLP-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi resolved OPENNLP-1264.
    Fix Version/s: 1.9.3
    Assignee: Tim Allison
    Resolution: Fixed

> Trivial fixes to enable building on, gasp, Windows
> ---
>
> Key: OPENNLP-1264
> URL: https://issues.apache.org/jira/browse/OPENNLP-1264
> Project: OpenNLP
> Issue Type: Task
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Trivial
> Fix For: 1.9.3
>
> I had to change 3 things to get a clean build on Windows. I'm not sure the solutions are the most elegant, and these may be user error.
>
> 1) I had to turn off (fail on error) in style checking because of a problem with newlines. On nearly every file, I got this failure.
> {noformat}
> [ERROR] src\test\java\opennlp\tools\util\VersionTest.java:[0] (misc) NewlineAtEndOfFile: File does not end with a newline.
> [WARNING] checkstyle:check violations detected but failOnViolation set to false
> {noformat}
> 2) {{LanguageDetectorEvaluatorTest#processSample}} fails because '\n' is expected, but Windows, of course, writes '\r\n' with {{println}}
> 3) I intentionally have a space in the directory structure to my IdeaProjects directory, which can cause problems on Windows when finding paths. There are two areas where this happens.
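Point 2 of this report, println writing the platform line separator while a test expects '\n', can be sketched in isolation as follows. The class and method names are made up for illustration, and normalizing the captured output is only one possible approach, not necessarily the change that was committed for OPENNLP-1264. The PrintStream and ByteArrayOutputStream overloads that take a Charset assume Java 10 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

final class NewlineSketch {

    /** Captures exactly what println writes on the current platform. */
    static String printed(String line) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(buffer, true, StandardCharsets.UTF_8)) {
            out.println(line); // appends System.lineSeparator(), "\r\n" on Windows
        }
        return buffer.toString(StandardCharsets.UTF_8);
    }

    /** Converts Windows line endings to '\n' so expectations are platform independent. */
    static String normalize(String text) {
        return text.replace("\r\n", "\n");
    }

    public static void main(String[] args) {
        String raw = printed("eng");
        System.out.println("raw ends with platform separator: "
            + raw.equals("eng" + System.lineSeparator()));
        System.out.println("normalized equals \"eng\\n\": "
            + normalize(raw).equals("eng\n"));
    }
}
```

An alternative with the same effect is to build the expected string with System.lineSeparator() instead of a literal '\n'.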
[jira] [Closed] (OPENNLP-1264) Trivial fixes to enable building on, gasp, Windows
[ https://issues.apache.org/jira/browse/OPENNLP-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi closed OPENNLP-1264.
[jira] [Resolved] (OPENNLP-1269) Add alternate to NGramModel that uses straight Strings rather than StringList
[ https://issues.apache.org/jira/browse/OPENNLP-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi resolved OPENNLP-1269.
    Fix Version/s: 1.9.3
    Assignee: Jeffrey T. Zemerick
    Resolution: Fixed

> Add alternate to NGramModel that uses straight Strings rather than StringList
> ---
>
> Key: OPENNLP-1269
> URL: https://issues.apache.org/jira/browse/OPENNLP-1269
> Project: OpenNLP
> Issue Type: Task
> Reporter: Tim Allison
> Assignee: Jeffrey T. Zemerick
> Priority: Trivial
> Fix For: 1.9.3
>
> On OPENNLP-1265, I found that we could halve the lang detect time on longer documents if we didn't create a StringList for every ngram, but rather used a plain String.
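The idea in this issue, keying n-gram counts by a plain String rather than allocating a wrapper object per n-gram, might look roughly like the sketch below. It is an illustration only: StringNGramSketch and countNGrams are hypothetical names, and this is not the class that was added to OpenNLP for OPENNLP-1269.

```java
import java.util.HashMap;
import java.util.Map;

final class StringNGramSketch {

    /**
     * Counts all character n-grams with lengths from minLength to maxLength
     * (inclusive), using the n-gram text itself as the map key.
     */
    static Map<String, Integer> countNGrams(CharSequence text, int minLength, int maxLength) {
        Map<String, Integer> counts = new HashMap<>();
        for (int n = minLength; n <= maxLength; n++) {
            for (int start = 0; start + n <= text.length(); start++) {
                String gram = text.subSequence(start, start + n).toString();
                counts.merge(gram, 1, Integer::sum); // one String per n-gram, no wrapper
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countNGrams("abab", 2, 2);
        System.out.println("ab -> " + counts.get("ab"));
        System.out.println("ba -> " + counts.get("ba"));
    }
}
```

Whether this halves detection time in practice depends on the workload; the figure in the issue comes from the reporter's own measurements.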
[jira] [Closed] (OPENNLP-1269) Add alternate to NGramModel that uses straight Strings rather than StringList
[ https://issues.apache.org/jira/browse/OPENNLP-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi closed OPENNLP-1269.
[jira] [Resolved] (OPENNLP-1272) Add support for Catalan and Indonesian stemmers
[ https://issues.apache.org/jira/browse/OPENNLP-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi resolved OPENNLP-1272.
    Assignee: Jeffrey T. Zemerick
    Resolution: Fixed

> Add support for Catalan and Indonesian stemmers
> ---
>
> Key: OPENNLP-1272
> URL: https://issues.apache.org/jira/browse/OPENNLP-1272
> Project: OpenNLP
> Issue Type: Improvement
> Components: Stemmer
> Reporter: Vlad Ciotlausi
> Assignee: Jeffrey T. Zemerick
> Priority: Minor
> Labels: Stemmer
> Fix For: 1.9.3
>
> Added Indonesian and Catalan stemmers plus some minor fixes.
>
> This PR includes:
> * Creating the Java code based on the .sbl files and adding it to the stemmer folder
> * Updating relevant classes to support the new stemmers
> * Adding tests for the new stemmers
> * Changing the Romanian stemmer unit test because it didn't test a Romanian word
[jira] [Closed] (OPENNLP-1272) Add support for Catalan and Indonesian stemmers
[ https://issues.apache.org/jira/browse/OPENNLP-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi closed OPENNLP-1272.
[jira] [Resolved] (OPENNLP-1258) implement Serializable in langdetect and normalize
[ https://issues.apache.org/jira/browse/OPENNLP-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi resolved OPENNLP-1258.
    Resolution: Fixed

> implement Serializable in langdetect and normalize
> ---
>
> Key: OPENNLP-1258
> URL: https://issues.apache.org/jira/browse/OPENNLP-1258
> Project: OpenNLP
> Issue Type: Improvement
> Components: Language Detector
> Reporter: Lucas Avanço
> Priority: Major
> Fix For: 1.9.3
>
> Some classes in langdetect and normalizer need to implement Serializable so that language detection models can be saved.
[jira] [Closed] (OPENNLP-1258) implement Serializable in langdetect and normalize
[ https://issues.apache.org/jira/browse/OPENNLP-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi closed OPENNLP-1258.
[jira] [Resolved] (OPENNLP-1261) Language Detector fails to predict language on long input texts
[ https://issues.apache.org/jira/browse/OPENNLP-1261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi resolved OPENNLP-1261.
    Resolution: Won't Fix

> Language Detector fails to predict language on long input texts
> ---
>
> Key: OPENNLP-1261
> URL: https://issues.apache.org/jira/browse/OPENNLP-1261
> Project: OpenNLP
> Issue Type: Improvement
> Components: Language Detector
> Reporter: Jörn Kottmann
> Assignee: Jörn Kottmann
> Priority: Major
> Fix For: 1.9.3
> Attachments: langid_plus_minus_rollups.zip, leipzig_1000-sents.zip, opennlp_as_is_vs_1261.zip
>
> If the input text is very long, e.g. 100k chars, then the lang detect component fails to detect the language correctly, even though the text is only written in one language.
> This issue was tracked down to the context generator, where the counts of the ngrams are ignored.
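To make the last sentence of this report concrete, the toy sketch below contrasts presence-only n-gram features with counted ones. It is not OpenNLP's context generator; NGramCountSketch and its methods are hypothetical, and it uses character bigrams purely to keep the example small.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class NGramCountSketch {

    /** Counts character bigrams in the text. */
    static Map<String, Integer> bigramCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 2 <= text.length(); i++) {
            counts.merge(text.substring(i, i + 2), 1, Integer::sum);
        }
        return counts;
    }

    /** Presence-only view: which bigrams occur at all, frequencies discarded. */
    static Set<String> bigramPresence(String text) {
        return bigramCounts(text).keySet();
    }

    public static void main(String[] args) {
        // Both texts contain exactly the bigrams "aa" and "ab" ...
        System.out.println("same presence features: "
            + bigramPresence("aab").equals(bigramPresence("aaaaab")));
        // ... but "aa" is far more frequent in the longer one.
        System.out.println("'aa' in short text: " + bigramCounts("aab").get("aa"));
        System.out.println("'aa' in long text:  " + bigramCounts("aaaaab").get("aa"));
    }
}
```

With presence-only features the two inputs are indistinguishable, which is one way to picture how a very long text can lose the frequency signal that separates languages.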
[jira] [Closed] (OPENNLP-1261) Language Detector fails to predict language on long input texts
[ https://issues.apache.org/jira/browse/OPENNLP-1261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi closed OPENNLP-1261.
[jira] [Updated] (OPENNLP-1261) Language Detector fails to predict language on long input texts
[ https://issues.apache.org/jira/browse/OPENNLP-1261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated OPENNLP-1261:
    Fix Version/s: 1.9.3
[jira] [Updated] (OPENNLP-1234) Dictionary.asStringSet() is returning single tokens
[ https://issues.apache.org/jira/browse/OPENNLP-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated OPENNLP-1234:
    Fix Version/s: 1.9.3

> Dictionary.asStringSet() is returning single tokens
> ---
>
> Key: OPENNLP-1234
> URL: https://issues.apache.org/jira/browse/OPENNLP-1234
> Project: OpenNLP
> Issue Type: Bug
> Components: Name Finder
> Reporter: Evandro Fonseca
> Priority: Major
> Labels: easyfix
> Fix For: 1.9.3
> Original Estimate: 10m
> Remaining Estimate: 10m
>
> When we use the method Dictionary.asStringSet(), it returns a list of single tokens.
> For example: European Union -> European. Basically, it returns just the first token of each instance.
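The behaviour reported here can be pictured with a toy model in which each dictionary entry is a list of tokens. The sketch below is hypothetical (DictionaryEntriesSketch and both methods are invented for illustration and are not OpenNLP's Dictionary implementation); it contrasts taking only the first token with joining all tokens of an entry, assuming a single space as the separator.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class DictionaryEntriesSketch {

    /** Mirrors the reported behaviour: only the first token of each entry survives. */
    static Set<String> firstTokensOnly(List<List<String>> entries) {
        Set<String> result = new LinkedHashSet<>();
        for (List<String> tokens : entries) {
            if (!tokens.isEmpty()) {
                result.add(tokens.get(0));
            }
        }
        return result;
    }

    /** What the reporter appears to expect: every token of an entry, joined by spaces. */
    static Set<String> joinedEntries(List<List<String>> entries) {
        Set<String> result = new LinkedHashSet<>();
        for (List<String> tokens : entries) {
            result.add(String.join(" ", tokens));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> entries = Arrays.asList(
            Arrays.asList("European", "Union"),
            Arrays.asList("Berlin"));
        System.out.println(firstTokensOnly(entries));
        System.out.println(joinedEntries(entries));
    }
}
```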
[jira] [Commented] (OPENNLP-1286) Updating from 1.7.0 to 1.9.1 breaks
[ https://issues.apache.org/jira/browse/OPENNLP-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023356#comment-17023356 ]

Suneel Marthi commented on OPENNLP-1286:

Could you confirm that this is still the case? A reproducible test case would be helpful.

> Updating from 1.7.0 to 1.9.1 breaks
> ---
>
> Key: OPENNLP-1286
> URL: https://issues.apache.org/jira/browse/OPENNLP-1286
> Project: OpenNLP
> Issue Type: Bug
> Affects Versions: 1.8.0, 1.8.1, 1.8.2, 1.8.3, 1.8.4
> Reporter: xia0c
> Priority: Major
>
> When I try to upgrade opennlp-tools from 1.7.0 to 1.8.0 or later, the following code breaks.
> {code:java}
> public void demo(String summary) {
>     try {
>         // inputStream and bin are declared elsewhere in the reporter's class
>         inputStream = new FileInputStream(Paths.get(bin).toFile());
>         DoccatModel doccatModel = new DoccatModel(inputStream);
>         DocumentCategorizerME myCategorizer = new DocumentCategorizerME(doccatModel);
>         // the line below is where the reported error occurs: a String is passed
>         double[] outcomes = myCategorizer.categorize(summary);
>         String category = myCategorizer.getBestCategory(outcomes);
>
>         LOGGER.info(category);
>     } catch (IOException e) {
>         LOGGER.error(ExceptionUtils.getStackTrace(e));
>     }
> }
> {code}
> The code should compile, but it fails with this error:
> {code:java}
> incompatible types: java.lang.String cannot be converted to java.lang.String[]
> {code}
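The compiler message in this report says that a String[] is expected where a String is passed. One plausible adaptation, sketched below under that assumption, is to split the text into tokens first and pass the array. The TokenCategorizer interface is a hypothetical stand-in that only mirrors the String[] parameter from the error message, and the whitespace split is a deliberately simple substitute for a real tokenizer; neither is taken from the OpenNLP API.

```java
import java.util.Arrays;

final class CategorizeArgumentSketch {

    /** Hypothetical stand-in for a categorizer whose method takes tokens, not raw text. */
    interface TokenCategorizer {
        double[] categorize(String[] tokens);
    }

    /** Splits on runs of whitespace; returns an empty array for blank input. */
    static String[] whitespaceTokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // Toy categorizer: a single "score" equal to the number of tokens.
        TokenCategorizer categorizer = tokens -> new double[] {tokens.length};

        String summary = "  stock prices  rose ";
        String[] tokens = whitespaceTokens(summary);
        double[] outcomes = categorizer.categorize(tokens); // array argument, as the error demands

        System.out.println(Arrays.toString(tokens));
        System.out.println(Arrays.toString(outcomes));
    }
}
```

In real code the tokenizer should match whatever was used when the model was trained; the split above is only a placeholder.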
[jira] [Closed] (OPENNLP-32) Write more documentation for the parser
[ https://issues.apache.org/jira/browse/OPENNLP-32?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi closed OPENNLP-32. > Write more documentation for the parser > --- > > Key: OPENNLP-32 > URL: https://issues.apache.org/jira/browse/OPENNLP-32 > Project: OpenNLP > Issue Type: Improvement > Components: Documentation >Reporter: Jörn Kottmann >Assignee: Suneel Marthi >Priority: Major > Labels: help-wanted > Fix For: 1.9.3 > > > Write more documentation for the parser. It should cover the same topic as the > documentation for the other components. > The following sections are still missing: > - No general introduction, it should be explained what parsing is, ideally > with a few images > of parse trees > - Explain how to navigate in the parse tree with the Parse class, that should > be > explained based on a sample parse tree > - Add a section about how the training api can be used > - Remove all todos, and open jira issues for them if they are not solved with > this issue -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (OPENNLP-32) Write more documentation for the parser
[ https://issues.apache.org/jira/browse/OPENNLP-32?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-32: - Fix Version/s: 1.9.3 > Write more documentation for the parser > --- > > Key: OPENNLP-32 > URL: https://issues.apache.org/jira/browse/OPENNLP-32 > Project: OpenNLP > Issue Type: Improvement > Components: Documentation >Reporter: Jörn Kottmann >Assignee: Suneel Marthi >Priority: Major > Labels: help-wanted > Fix For: 1.9.3 > > > Write more documentation for the parser. It should cover the same topic as the > documentation for the other components. > The following sections are still missing: > - No general introduction, it should be explained what parsing is, ideally > with a few images > of parse trees > - Explain how to navigate in the parse tree with the Parse class, that should > be > explained based on a sample parse tree > - Add a section about how the training api can be used > - Remove all todos, and open jira issues for them if they are not solved with > this issue -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (OPENNLP-32) Write more documentation for the parser
[ https://issues.apache.org/jira/browse/OPENNLP-32?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi reassigned OPENNLP-32: Assignee: Suneel Marthi > Write more documentation for the parser > --- > > Key: OPENNLP-32 > URL: https://issues.apache.org/jira/browse/OPENNLP-32 > Project: OpenNLP > Issue Type: Improvement > Components: Documentation >Reporter: Jörn Kottmann >Assignee: Suneel Marthi >Priority: Major > Labels: help-wanted > > Write more documentation for the parser. It should cover the same topic as the > documentation for the other components. > The following sections are still missing: > - No general introduction, it should be explained what parsing is, ideally > with a few images > of parse trees > - Explain how to navigate in the parse tree with the Parse class, that should > be > explained based on a sample parse tree > - Add a section about how the training api can be used > - Remove all todos, and open jira issues for them if they are not solved with > this issue -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (OPENNLP-1268) StringUtil.toLowerCase() should lowercase codepoints, not chars
[ https://issues.apache.org/jira/browse/OPENNLP-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi closed OPENNLP-1268. -- > StringUtil.toLowerCase() should lowercase codepoints, not chars > --- > > Key: OPENNLP-1268 > URL: https://issues.apache.org/jira/browse/OPENNLP-1268 > Project: OpenNLP > Issue Type: Task >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Trivial > Fix For: 1.9.3 > > > {{StringUtil#toLowerCase()}} should run {{Character.toLowerCase()}} on code > points. It currently fails to lowercase characters beyond the Basic Multilingual Plane (BMP). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (OPENNLP-1268) StringUtil.toLowerCase() should lowercase codepoints, not chars
[ https://issues.apache.org/jira/browse/OPENNLP-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi resolved OPENNLP-1268. Fix Version/s: 1.9.3 Resolution: Fixed > StringUtil.toLowerCase() should lowercase codepoints, not chars > --- > > Key: OPENNLP-1268 > URL: https://issues.apache.org/jira/browse/OPENNLP-1268 > Project: OpenNLP > Issue Type: Task >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Trivial > Fix For: 1.9.3 > > > {{StringUtil#toLowerCase()}} should run {{Character.toLowerCase()}} on code > points. It currently fails to lowercase characters beyond the Basic Multilingual Plane (BMP). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (OPENNLP-1268) StringUtil.toLowerCase() should lowercase codepoints, not chars
[ https://issues.apache.org/jira/browse/OPENNLP-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi reassigned OPENNLP-1268: -- Assignee: Tim Allison > StringUtil.toLowerCase() should lowercase codepoints, not chars > --- > > Key: OPENNLP-1268 > URL: https://issues.apache.org/jira/browse/OPENNLP-1268 > Project: OpenNLP > Issue Type: Task >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Trivial > > {{StringUtil#toLowerCase()}} should run {{Character.toLowerCase()}} on code > points. It currently fails to lowercase characters beyond the Basic Multilingual Plane (BMP). -- This message was sent by Atlassian Jira (v8.3.4#803005)
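To illustrate the difference OPENNLP-1268 describes, here is a small sketch contrasting per-char and per-code-point lowercasing. It is not the OpenNLP implementation; the class and method names are made up for this example, and only JDK calls are used.

```java
public class CodePointLowerCaseSketch {

    // Lowercases one code point at a time, so supplementary characters are handled.
    static String toLowerCaseByCodePoint(CharSequence s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> sb.appendCodePoint(Character.toLowerCase(cp)));
        return sb.toString();
    }

    // Lowercases one UTF-16 char at a time; surrogate halves pass through unchanged.
    static String toLowerCaseByChar(CharSequence s) {
        char[] out = new char[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = Character.toLowerCase(s.charAt(i));
        }
        return new String(out);
    }

    public static void main(String[] args) {
        // U+10400 (DESERET CAPITAL LETTER LONG I) lies beyond the BMP.
        String upper = new String(Character.toChars(0x10400));
        String lower = new String(Character.toChars(0x10428));
        System.out.println(toLowerCaseByCodePoint(upper).equals(lower)); // true
        System.out.println(toLowerCaseByChar(upper).equals(upper));      // true: left unchanged
    }
}
```

The per-char loop sees the two surrogate halves separately, and neither half has a lowercase mapping on its own, which is why characters outside the BMP survive untouched.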
[jira] [Updated] (OPENNLP-1258) implement Serializable in langdetect and normalize
[ https://issues.apache.org/jira/browse/OPENNLP-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-1258: --- Fix Version/s: 1.9.3 > implement Serializable in langdetect and normalize > -- > > Key: OPENNLP-1258 > URL: https://issues.apache.org/jira/browse/OPENNLP-1258 > Project: OpenNLP > Issue Type: Improvement > Components: Language Detector >Reporter: Lucas Avanço >Priority: Major > Fix For: 1.9.3 > > > Some classes in langdetect and normalizer need to > implement Serializable so that language detection models can be saved. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (OPENNLP-1265) Improve speed of lang detect
[ https://issues.apache.org/jira/browse/OPENNLP-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi reassigned OPENNLP-1265: -- Assignee: Tim Allison > Improve speed of lang detect > > > Key: OPENNLP-1265 > URL: https://issues.apache.org/jira/browse/OPENNLP-1265 > Project: OpenNLP > Issue Type: Task >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Major > > Over on TIKA-2790, we found that opennlp's language detector is far, far > slower than Optimaize and yalder. > Let's use this ticket to see what we can do to improve lang detect's speed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (OPENNLP-1265) Improve speed of lang detect
[ https://issues.apache.org/jira/browse/OPENNLP-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-1265: --- Fix Version/s: 1.9.3 > Improve speed of lang detect > > > Key: OPENNLP-1265 > URL: https://issues.apache.org/jira/browse/OPENNLP-1265 > Project: OpenNLP > Issue Type: Task >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Major > Fix For: 1.9.3 > > > Over on TIKA-2790, we found that opennlp's language detector is far, far > slower than Optimaize and yalder. > Let's use this ticket to see what we can do to improve lang detect's speed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (OPENNLP-1270) Add new languages to the language detector
[ https://issues.apache.org/jira/browse/OPENNLP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi reassigned OPENNLP-1270: -- Assignee: Tim Allison > Add new languages to the language detector > -- > > Key: OPENNLP-1270 > URL: https://issues.apache.org/jira/browse/OPENNLP-1270 > Project: OpenNLP > Issue Type: Task >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Major > Attachments: report.txt, report.txt > > > Leipzig has several other languages that might be useful to add to the > language detector. I've selected some with > 10k sentences. Once I build > the model and evaluate performance, I'll share the reports, the model and a > tgz of the *-sentences.txt files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (OPENNLP-1266) Limit normalization regexes in UrlCharSequenceNormalizer
[ https://issues.apache.org/jira/browse/OPENNLP-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi reassigned OPENNLP-1266: -- Assignee: Tim Allison > Limit normalization regexes in UrlCharSequenceNormalizer > > > Key: OPENNLP-1266 > URL: https://issues.apache.org/jira/browse/OPENNLP-1266 > Project: OpenNLP > Issue Type: Task >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Major > Fix For: 1.9.3 > > > The {{MAIL_REGEX}} in UrlCharSequenceNormalizer is unbounded and requires > backtracking. In rare cases, this can cause eye-opening performance costs. > > I tested the other regexes in the other normalizers. I could be wrong, but > they don't appear to require backtracking, and there are no surprising > performance costs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (OPENNLP-1266) Limit normalization regexes in UrlCharSequenceNormalizer
[ https://issues.apache.org/jira/browse/OPENNLP-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-1266: --- Fix Version/s: 1.9.3 > Limit normalization regexes in UrlCharSequenceNormalizer > > > Key: OPENNLP-1266 > URL: https://issues.apache.org/jira/browse/OPENNLP-1266 > Project: OpenNLP > Issue Type: Task >Reporter: Tim Allison >Priority: Major > Fix For: 1.9.3 > > > The {{MAIL_REGEX}} in UrlCharSequenceNormalizer is unbounded and requires > backtracking. In rare cases, this can cause eye-opening performance costs. > > I tested the other regexes in the other normalizers. I could be wrong, but > they don't appear to require backtracking, and there are no surprising > performance costs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
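One common way to limit the cost described in OPENNLP-1266 is to replace unbounded quantifiers with bounded ones, which caps how far any single match attempt can scan. The pattern below is a hypothetical illustration only; it is not the `MAIL_REGEX` from `UrlCharSequenceNormalizer`, and the length limits are assumptions chosen for the sketch.

```java
import java.util.regex.Pattern;

public class BoundedMailRegexSketch {

    // Hypothetical e-mail pattern: every repetition has an explicit upper bound.
    static final Pattern MAIL = Pattern.compile(
            "[-_.0-9A-Za-z]{1,64}@[-_0-9A-Za-z]{1,255}(?:\\.[-_0-9A-Za-z]{1,63})+");

    // Replaces each e-mail-like span with a single space.
    static String stripMail(CharSequence text) {
        return MAIL.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(stripMail("write to john.doe@example.com now"));
    }
}
```

With a bound such as `{1,64}`, a long run of address-like characters that never reaches an `@` costs at most a fixed amount of work per starting position, rather than work proportional to the length of the run.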
[jira] [Updated] (OPENNLP-1270) Add new languages to the language detector
[ https://issues.apache.org/jira/browse/OPENNLP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-1270: --- Fix Version/s: 1.9.3 > Add new languages to the language detector > -- > > Key: OPENNLP-1270 > URL: https://issues.apache.org/jira/browse/OPENNLP-1270 > Project: OpenNLP > Issue Type: Task >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Major > Fix For: 1.9.3 > > Attachments: report.txt, report.txt > > > Leipzig has several other languages that might be useful to add to the > language detector. I've selected some with > 10k sentences. Once I build > the model and evaluate performance, I'll share the reports, the model and a > tgz of the *-sentences.txt files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (OPENNLP-1267) Allow the LanguageDetector to stop before processing the full string
[ https://issues.apache.org/jira/browse/OPENNLP-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi closed OPENNLP-1267. -- > Allow the LanguageDetector to stop before processing the full string > > > Key: OPENNLP-1267 > URL: https://issues.apache.org/jira/browse/OPENNLP-1267 > Project: OpenNLP > Issue Type: Improvement >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Major > Fix For: 1.9.3 > > > On TIKA-2790, I found that Yalder is stopping after computing character > ngrams on roughly the first 60 characters. That _likely_ explains its > impressive speed. Let's make this "stopping short" feature available in > OpenNLP. > > Ideally, the language detector wouldn't copy the full String, it wouldn't > normalize the full String, and it wouldn't compute ngrams on the full String. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (OPENNLP-1267) Allow the LanguageDetector to stop before processing the full string
[ https://issues.apache.org/jira/browse/OPENNLP-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi resolved OPENNLP-1267. Fix Version/s: 1.9.3 Resolution: Fixed > Allow the LanguageDetector to stop before processing the full string > > > Key: OPENNLP-1267 > URL: https://issues.apache.org/jira/browse/OPENNLP-1267 > Project: OpenNLP > Issue Type: Improvement >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Major > Fix For: 1.9.3 > > > On TIKA-2790, I found that Yalder is stopping after computing character > ngrams on roughly the first 60 characters. That _likely_ explains its > impressive speed. Let's make this "stopping short" feature available in > OpenNLP. > > Ideally, the language detector wouldn't copy the full String, it wouldn't > normalize the full String, and it wouldn't compute ngrams on the full String. -- This message was sent by Atlassian Jira (v8.3.4#803005)
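The "stopping short" idea in OPENNLP-1267 can be sketched as computing character n-grams over a bounded prefix only, without copying the full input. This is an illustrative sketch under that assumption, not the OpenNLP code; `ngramCounts` and its `maxChars` parameter are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

public class NgramPrefixSketch {

    // Counts character n-grams over at most the first maxChars chars of text.
    static Map<String, Integer> ngramCounts(CharSequence text, int n, int maxChars) {
        int limit = Math.min(text.length(), maxChars);
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= limit; i++) {
            // Only the n-gram window is copied, never the whole input.
            counts.merge(text.subSequence(i, i + n).toString(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Only "abab" is inspected, giving the bigrams ab, ba, ab.
        System.out.println(ngramCounts("abababab", 2, 4));
    }
}
```

A cut at a fixed char index can split a surrogate pair; a fuller version would also apply normalization to the prefix only, as the ticket suggests.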
[jira] [Updated] (OPENNLP-1214) use hash to avoid linear search in DefaultEndOfSentenceScanner
[ https://issues.apache.org/jira/browse/OPENNLP-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-1214: --- Fix Version/s: (was: 1.9.2) 1.9.3 > use hash to avoid linear search in DefaultEndOfSentenceScanner > -- > > Key: OPENNLP-1214 > URL: https://issues.apache.org/jira/browse/OPENNLP-1214 > Project: OpenNLP > Issue Type: Improvement >Affects Versions: 1.9.0 >Reporter: Koji Sekiguchi >Assignee: Koji Sekiguchi >Priority: Minor > Fix For: 1.9.3 > > > When DefaultEndOfSentenceScanner scans a sentence, it uses linear search to > check if each character in the sentence is one of the eos characters. I think > we'd better use a HashSet to keep eosCharacters instead of char[]. > In accordance with this replacement, I'd like to deprecate > getEndOfSentenceCharacters() because it returns char[] and nobody > in OpenNLP calls it at present, and I'd like to add an equivalent method > which returns a Set<Character> of eos chars. A Set cannot keep the order of > eos chars, but I don't think that will be a problem. -- This message was sent by Atlassian Jira (v8.3.4#803005)
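The change proposed in OPENNLP-1214 can be sketched as follows: keep the end-of-sentence characters in a `HashSet` so each character of the input is checked with a hash lookup instead of a scan over a `char[]`. The class below is a hypothetical stand-in, not `DefaultEndOfSentenceScanner` itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class EosLookupSketch {

    private final Set<Character> eosChars;

    EosLookupSketch(char... eos) {
        Set<Character> set = new HashSet<>();
        for (char c : eos) {
            set.add(c);
        }
        this.eosChars = Collections.unmodifiableSet(set);
    }

    // Returns the eos characters as a set; iteration order is not guaranteed.
    Set<Character> getEosCharacters() {
        return eosChars;
    }

    // Returns the indices of all eos characters in text, in ascending order.
    List<Integer> getPositions(CharSequence text) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (eosChars.contains(text.charAt(i))) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(new EosLookupSketch('.', '!', '?').getPositions("Hi! Ok."));
    }
}
```

For the handful of eos characters a scanner typically uses, the practical gain may be small, and each lookup boxes a `char`; measuring before and after would settle it.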
[jira] [Updated] (OPENNLP-1272) Add support for Catalan and Indonesian stemmers
[ https://issues.apache.org/jira/browse/OPENNLP-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-1272: --- Fix Version/s: (was: 1.9.2) 1.9.3 > Add support for Catalan and Indonesian stemmers > --- > > Key: OPENNLP-1272 > URL: https://issues.apache.org/jira/browse/OPENNLP-1272 > Project: OpenNLP > Issue Type: Improvement > Components: Stemmer >Reporter: Vlad Ciotlausi >Priority: Minor > Labels: Stemmer > Fix For: 1.9.3 > > > Added Indonesian and Catalan stemmers plus some minor fixes. > > This PR includes: > * Creating the Java code based on the .sbl files and adding them to the > stemmer folder > * Updating relevant classes to support the new stemmers > * Adding tests for the new stemmers > * Changing the romanian unit tests for stemmers because it didn't test a > romanian word -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (OPENNLP-1267) Allow the LanguageDetector to stop before processing the full string
[ https://issues.apache.org/jira/browse/OPENNLP-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi reassigned OPENNLP-1267: -- Assignee: Tim Allison > Allow the LanguageDetector to stop before processing the full string > > > Key: OPENNLP-1267 > URL: https://issues.apache.org/jira/browse/OPENNLP-1267 > Project: OpenNLP > Issue Type: Improvement >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Major > > On TIKA-2790, I found that Yalder is stopping after computing character > ngrams on roughly the first 60 characters. That _likely_ explains its > impressive speed. Let's make this "stopping short" feature available in > OpenNLP. > > Ideally, the language detector wouldn't copy the full String, it wouldn't > normalize the full String, and it wouldn't compute ngrams on the full String. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (OPENNLP-1214) use hash to avoid linear search in DefaultEndOfSentenceScanner
[ https://issues.apache.org/jira/browse/OPENNLP-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-1214: --- Fix Version/s: (was: 1.9.1) 1.9.2 > use hash to avoid linear search in DefaultEndOfSentenceScanner > -- > > Key: OPENNLP-1214 > URL: https://issues.apache.org/jira/browse/OPENNLP-1214 > Project: OpenNLP > Issue Type: Improvement >Affects Versions: 1.9.0 >Reporter: Koji Sekiguchi >Assignee: Koji Sekiguchi >Priority: Minor > Fix For: 1.9.2 > > > When DefaultEndOfSentenceScanner scans a sentence, it uses linear search to > check if each character in the sentence is one of the eos characters. I think > we'd better use a HashSet to keep eosCharacters instead of char[]. > In accordance with this replacement, I'd like to deprecate > getEndOfSentenceCharacters() because it returns char[] and nobody > in OpenNLP calls it at present, and I'd like to add an equivalent method > which returns a Set<Character> of eos chars. A Set cannot keep the order of > eos chars, but I don't think that will be a problem. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (OPENNLP-1209) Is there documentation for feature generation?
[ https://issues.apache.org/jira/browse/OPENNLP-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi closed OPENNLP-1209. -- > Is there documentation for feature generation? > - > > Key: OPENNLP-1209 > URL: https://issues.apache.org/jira/browse/OPENNLP-1209 > Project: OpenNLP > Issue Type: Question > Components: Documentation >Reporter: Joseph >Priority: Major > Fix For: 1.9.1 > > > I could not find any documentation about how to use the feature generation > while training a model for the name finder. Nor could I find any information > about how to train a Maxent or Perceptron model, or how to configure these > algorithms. > I am aware of the basic documentation here > [http://opennlp.apache.org/docs/1.9.0/manual/opennlp.html] but it does not > help beyond getting started. > It seems other people cannot find any either: > [https://stackoverflow.com/questions/11989633/custom-feature-generation-in-opennlp-namefinder-api] > So basically you have put all this work into creating this software but not > included sufficient documentation or examples of how to configure and use it, > which basically renders it useless to us if we cannot figure it out. > Is there another location where I can find further documentation or examples? > If not, are there any plans to address this? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (OPENNLP-1209) Is there documentation for feature generation?
[ https://issues.apache.org/jira/browse/OPENNLP-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi resolved OPENNLP-1209. Resolution: Not A Problem Fix Version/s: 1.9.1 > Is there documentation for feature generation? > - > > Key: OPENNLP-1209 > URL: https://issues.apache.org/jira/browse/OPENNLP-1209 > Project: OpenNLP > Issue Type: Question > Components: Documentation >Reporter: Joseph >Priority: Major > Fix For: 1.9.1 > > > I could not find any documentation about how to use the feature generation > while training a model for the name finder. Nor could I find any information > about how to train a Maxent or Perceptron model, or how to configure these > algorithms. > I am aware of the basic documentation here > [http://opennlp.apache.org/docs/1.9.0/manual/opennlp.html] but it does not > help beyond getting started. > It seems other people cannot find any either: > [https://stackoverflow.com/questions/11989633/custom-feature-generation-in-opennlp-namefinder-api] > So basically you have put all this work into creating this software but not > included sufficient documentation or examples of how to configure and use it, > which basically renders it useless to us if we cannot figure it out. > Is there another location where I can find further documentation or examples? > If not, are there any plans to address this? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (OPENNLP-1230) Replace MD5 and SHA1 with SHA256/512
[ https://issues.apache.org/jira/browse/OPENNLP-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi reassigned OPENNLP-1230: -- Assignee: Suneel Marthi > Replace MD5 and SHA1 with SHA256/512 > > > Key: OPENNLP-1230 > URL: https://issues.apache.org/jira/browse/OPENNLP-1230 > Project: OpenNLP > Issue Type: Task >Reporter: Jeff Zemerick >Assignee: Suneel Marthi >Priority: Major > Fix For: 1.9.1 > > > Per the Apache Distribution Procedure, replace MD5 and SHA1 with SHA256/512. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (OPENNLP-1188) Update Penn Treebank URL
[ https://issues.apache.org/jira/browse/OPENNLP-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi closed OPENNLP-1188. -- > Update Penn Treebank URL > > > Key: OPENNLP-1188 > URL: https://issues.apache.org/jira/browse/OPENNLP-1188 > Project: OpenNLP > Issue Type: Task > Components: Documentation >Reporter: Jeff Zemerick >Assignee: Jeff Zemerick >Priority: Minor > Fix For: 1.8.5 > > > As reported on the users mailing list, the URL for the Penn Treebank needs > to be updated to > [https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html] > (from [http://www.cis.upenn.edu/~treebank/]) on the page > http://opennlp.apache.org/docs/1.8.4/manual/opennlp.html#tools.postagger.tagging.cmdline. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (OPENNLP-1189) Token model creation fails without at least one tag
[ https://issues.apache.org/jira/browse/OPENNLP-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi closed OPENNLP-1189. -- > Token model creation fails without at least one tag > --- > > Key: OPENNLP-1189 > URL: https://issues.apache.org/jira/browse/OPENNLP-1189 > Project: OpenNLP > Issue Type: Bug > Components: Tokenizer >Affects Versions: 1.8.4 >Reporter: Jeff Zemerick >Assignee: Jeff Zemerick >Priority: Minor > Fix For: 1.8.5 > > > The tokenizer training documentation for 1.8.4 states that "Tokens are either > separated by a whitespace or by a special <SPLIT> tag." However, it appears > that training fails if the training data does not contain at least one > <SPLIT> tag. To reproduce: > Training on the sample data works fine: > {quote}Pierre Vinken<SPLIT>, 61 years old<SPLIT>, will join the board as a > nonexecutive director Nov. 29<SPLIT>. > Mr. Vinken is chairman of Elsevier N.V.<SPLIT>, the Dutch publishing > group<SPLIT>. > Rudolph Agnew<SPLIT>, 55 years old and former chairman of Consolidated Gold > Fields PLC<SPLIT>, > was named a nonexecutive director of this British industrial > conglomerate<SPLIT>. > {quote} > Replacing the <SPLIT> tags with whitespace causes the training to fail with > InsufficientTrainingDataException: > {quote}Pierre Vinken , 61 years old , will join the board as a nonexecutive > director Nov. 29 . > Mr. Vinken is chairman of Elsevier N.V. , the Dutch publishing group . > Rudolph Agnew , 55 years old and former chairman of Consolidated Gold Fields > PLC , > was named a nonexecutive director of this British industrial conglomerate . > {quote} > Modifying the training data to contain a single <SPLIT> tag allows model > training to complete successfully: > {quote}Pierre Vinken<SPLIT>, 61 years old , will join the board as a > nonexecutive director Nov. 29 . > Mr. Vinken is chairman of Elsevier N.V. , the Dutch publishing group . > Rudolph Agnew , 55 years old and former chairman of Consolidated Gold Fields > PLC , > was named a nonexecutive director of this British industrial conglomerate . > {quote} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
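The training format discussed in OPENNLP-1189 separates tokens either by whitespace or by a special tag. The sketch below shows how such a line could be split into tokens and how one might check up front whether any tag is present; it assumes the tag is spelled `<SPLIT>` as in the OpenNLP manual, and the class and method names are hypothetical rather than OpenNLP API.

```java
import java.util.Arrays;

public class SplitTagSketch {

    static final String SPLIT_TAG = "<SPLIT>"; // assumed spelling of the tag

    // Splits a training line on whitespace and on the split tag.
    static String[] tokens(String line) {
        return line.trim().split("(?:<SPLIT>|\\s)+");
    }

    // True if the line marks at least one token boundary with the tag.
    static boolean hasSplitTag(String line) {
        return line.contains(SPLIT_TAG);
    }

    public static void main(String[] args) {
        String line = "Pierre Vinken<SPLIT>, 61 years old<SPLIT>, will join the board";
        System.out.println(Arrays.toString(tokens(line)));
        System.out.println(hasSplitTag("Pierre Vinken , 61 years old"));
    }
}
```

A check like `hasSplitTag` over the whole corpus would at least let a caller report the missing-tag condition clearly before training is attempted.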
[jira] [Closed] (OPENNLP-1180) Use String[] instead of StringList in LanguageModel API
[ https://issues.apache.org/jira/browse/OPENNLP-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi closed OPENNLP-1180. -- > Use String[] instead of StringList in LanguageModel API > --- > > Key: OPENNLP-1180 > URL: https://issues.apache.org/jira/browse/OPENNLP-1180 > Project: OpenNLP > Issue Type: Task > Components: language model >Reporter: Tommaso Teofili >Assignee: Tommaso Teofili >Priority: Major > Fix For: 1.8.5 > > > Current {{LanguageModel}} API uses {{StringList}}, however that's less > convenient for easy consumption as one needs to look into StringList and > adapt its code to convert arrays or collections of Strings into StringList. > Additionally this requires more objects to be created that will be soon > discarded by garbage collection e.g. the input StringList for > LM#calculateProbability and LM#predictNextTokens. > I propose to deprecate those methods and add new ones with exactly the same > signature but using String[] (or String...) instead. > Internally StringLists can be kept or not, but that would be an > implementation detail and allows to move away from using them more easily. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (OPENNLP-1182) LanguageDetectorConverterTool is a no-op, despite the docs saying otherwise
[ https://issues.apache.org/jira/browse/OPENNLP-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-1182: --- Affects Version/s: 1.8.4 Fix Version/s: 1.8.5 > LanguageDetectorConverterTool is a no-op, despite the docs saying otherwise > --- > > Key: OPENNLP-1182 > URL: https://issues.apache.org/jira/browse/OPENNLP-1182 > Project: OpenNLP > Issue Type: Bug >Affects Versions: 1.8.4 >Reporter: Steve Rowe >Priority: Major > Fix For: 1.8.5 > > > Contrary to the docs (see below), LanguageDetectorConverterTool doesn't > actually do anything at all; the class is empty. > {quote} > The following sequence of commands shows how to convert the Leipzig Corpora > collection at folder leipzig-train/ to the default Language Detector format, > by creating groups of 5 sentences as documents and limiting to 1 > documents per language. Then, it shuffles the result and selects the first > 10 lines as train corpus and the last 2 as evaluation corpus: > {noformat} > $ bin/opennlp LanguageDetectorConverter leipzig -sentencesDir leipzig-train/ > -sentencesPerSample 5 -samplesPerLanguage 1 > leipzig.txt > $ perl -MList::Util=shuffle -e 'print shuffle(<STDIN>);' < leipzig.txt > > leipzig_shuf.txt > $ head -10 < leipzig_shuf.txt > leipzig.train > $ tail -2 < leipzig_shuf.txt > leipzig.eval > {noformat} > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (OPENNLP-1187) Issue in finding accuracy of model
[ https://issues.apache.org/jira/browse/OPENNLP-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-1187: --- Fix Version/s: 1.8.5 > Issue in finding accuracy of model > -- > > Key: OPENNLP-1187 > URL: https://issues.apache.org/jira/browse/OPENNLP-1187 > Project: OpenNLP > Issue Type: Bug > Components: Doccat, Machine Learning >Affects Versions: 1.8.4 >Reporter: Aman Garg >Priority: Major > Fix For: 1.8.5 > > Original Estimate: 48h > Remaining Estimate: 48h > > The trainingStats function in the NaiveBayesTrainer class does not work properly > and displays wrong results. > In findParameters(), line 154, i.e. > EvalParameters evalParams = new EvalParameters(params, numOutcomes); > should be replaced by the following block: > > double[] outcomeTotals = new double[outcomeLabels.length]; > for (int i = 0; i < params.length; ++i) { > Context context = params[i]; > for (int j = 0; j < context.getOutcomes().length; ++j) { > int outcome = context.getOutcomes()[j]; > double count = context.getParameters()[j]; > outcomeTotals[outcome] += count; > } > } > evalParams = new NaiveBayesEvalParameters(params, > outcomeLabels.length, outcomeTotals, predLabels.length); -- This message was sent by Atlassian JIRA (v7.6.3#76005)
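The accumulation loop proposed in OPENNLP-1187 sums, for every outcome, the parameter values of all contexts that mention it. A self-contained sketch of that arithmetic follows, with plain arrays standing in for OpenNLP's `Context` objects; the class name and the array-based signature are assumptions made for illustration.

```java
import java.util.Arrays;

public class OutcomeTotalsSketch {

    // outcomes[i][j] is the outcome index of the j-th entry of context i;
    // parameters[i][j] is the matching count or weight.
    static double[] outcomeTotals(int[][] outcomes, double[][] parameters, int numOutcomes) {
        double[] totals = new double[numOutcomes];
        for (int i = 0; i < outcomes.length; i++) {
            for (int j = 0; j < outcomes[i].length; j++) {
                totals[outcomes[i][j]] += parameters[i][j];
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        int[][] outcomes = {{0, 1}, {1}};
        double[][] parameters = {{2.0, 3.0}, {4.0}};
        // Outcome 0 receives 2.0; outcome 1 receives 3.0 + 4.0.
        System.out.println(Arrays.toString(outcomeTotals(outcomes, parameters, 2)));
    }
}
```

In the ticket's version these totals are passed on to `NaiveBayesEvalParameters`; the sketch stops at the totals themselves.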
[jira] [Commented] (OPENNLP-1190) CONLL02 format
[ https://issues.apache.org/jira/browse/OPENNLP-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424564#comment-16424564 ] Suneel Marthi commented on OPENNLP-1190: Would you like to submit a PR with a fix? > CONLL02 format > -- > > Key: OPENNLP-1190 > URL: https://issues.apache.org/jira/browse/OPENNLP-1190 > Project: OpenNLP > Issue Type: Bug > Components: Formats >Affects Versions: tools-1.5.3 >Reporter: Luca >Priority: Major > Original Estimate: 1h > Remaining Estimate: 1h > > According to the documentation, the following should work: > bin/opennlp TokenNameFinderConverter conll02 -data esp.train -lang es -types per > es_corpus_train_persons.txt > However, it currently fails with an error message because it expects 3 columns > instead of the 2 that are in the dataset. > This is a bug, introduced at line 130 of > opennlp.tools.formats.Conll02NameSampleStream.java where a length of 3 is > imposed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (OPENNLP-1192) Remove MD5 hashes from Release process
[ https://issues.apache.org/jira/browse/OPENNLP-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi closed OPENNLP-1192. -- > Remove MD5 hashes from Release process > -- > > Key: OPENNLP-1192 > URL: https://issues.apache.org/jira/browse/OPENNLP-1192 > Project: OpenNLP > Issue Type: New Feature > Components: Build, Packaging and Test >Affects Versions: 1.8.4 >Reporter: Suneel Marthi >Assignee: Suneel Marthi >Priority: Major > Fix For: 1.8.5 > > > Per [http://www.apache.org/dev/release-publishing.html] MD5 should not be > supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (OPENNLP-1192) Remove MD5 hashes from Release process
[ https://issues.apache.org/jira/browse/OPENNLP-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-1192: --- Description: Per [http://www.apache.org/dev/release-publishing.html] MD5 should not be supported. (was: Per [http://www.apache.org/dev/release-publishing.html] MD5 should be stopped.) > Remove MD5 hashes from Release process > -- > > Key: OPENNLP-1192 > URL: https://issues.apache.org/jira/browse/OPENNLP-1192 > Project: OpenNLP > Issue Type: New Feature > Components: Build, Packaging and Test >Affects Versions: 1.8.4 >Reporter: Suneel Marthi >Assignee: Suneel Marthi >Priority: Major > Fix For: 1.8.5 > > > Per [http://www.apache.org/dev/release-publishing.html] MD5 should not be > supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (OPENNLP-1192) Remove MD5 hashes from Release process
Suneel Marthi created OPENNLP-1192: -- Summary: Remove MD5 hashes from Release process Key: OPENNLP-1192 URL: https://issues.apache.org/jira/browse/OPENNLP-1192 Project: OpenNLP Issue Type: New Feature Components: Build, Packaging and Test Affects Versions: 1.8.4 Reporter: Suneel Marthi Assignee: Suneel Marthi Fix For: 1.8.5 Per [http://www.apache.org/dev/release-publishing.html] MD5 should be stopped. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (OPENNLP-956) Javadoc issues with Java 8
[ https://issues.apache.org/jira/browse/OPENNLP-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi resolved OPENNLP-956. --- Resolution: Won't Fix Fix Version/s: 1.8.4 May not be an issue now with Java 9 support. Resolving this as 'Won't Fix' > Javadoc issues with Java 8 > -- > > Key: OPENNLP-956 > URL: https://issues.apache.org/jira/browse/OPENNLP-956 > Project: OpenNLP > Issue Type: Bug > Components: Build, Packaging and Test >Affects Versions: 1.7.1 >Reporter: William Colen >Assignee: Suneel Marthi > Fix For: 1.8.4 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (OPENNLP-956) Javadoc issues with Java 8
[ https://issues.apache.org/jira/browse/OPENNLP-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi closed OPENNLP-956. - > Javadoc issues with Java 8 > -- > > Key: OPENNLP-956 > URL: https://issues.apache.org/jira/browse/OPENNLP-956 > Project: OpenNLP > Issue Type: Bug > Components: Build, Packaging and Test >Affects Versions: 1.7.1 >Reporter: William Colen >Assignee: Suneel Marthi > Fix For: 1.8.4 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (OPENNLP-1132) Fail with exception if not enough lines in leipzig parser
[ https://issues.apache.org/jira/browse/OPENNLP-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi resolved OPENNLP-1132. Resolution: Fixed > Fail with exception if not enough lines in leipzig parser > - > > Key: OPENNLP-1132 > URL: https://issues.apache.org/jira/browse/OPENNLP-1132 > Project: OpenNLP > Issue Type: Bug > Components: Language Detector >Affects Versions: 1.8.2 >Reporter: Peter Thygesen >Assignee: Peter Thygesen > Fix For: 1.8.4 > > > Exception in thread "main" java.lang.IndexOutOfBoundsException: toIndex = > 10 > at java.util.ArrayList.subListRangeCheck(ArrayList.java:1004) > at java.util.ArrayList.subList(ArrayList.java:996) > at > opennlp.tools.formats.leipzig.LeipzigLanguageSampleStream$LeipzigSentencesStream.(LeipzigLanguageSampleStream.java:65) > at > opennlp.tools.formats.leipzig.LeipzigLanguageSampleStream.read(LeipzigLanguageSampleStream.java:157) > at > opennlp.tools.formats.leipzig.LeipzigLanguageSampleStream.read(LeipzigLanguageSampleStream.java:42) > at > opennlp.tools.formats.leipzig.SampleShuffleStream.(SampleShuffleStream.java:38) > at > opennlp.tools.formats.leipzig.LeipzigLanguageSampleStreamFactory.create(LeipzigLanguageSampleStreamFactory.java:76) > at > opennlp.tools.cmdline.AbstractConverterTool.run(AbstractConverterTool.java:106) > at opennlp.tools.cmdline.CLI.main(CLI.java:256) > line 65: > Set selectedLines = new HashSet<>( > indexes.subList(0, sentencesPerSample * numberOfSamples)); > Fails if sentencesPerSample x numberOfSamples is larger than size of indexes > (source file). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
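The failure reported above comes from `List.subList` rejecting a `toIndex` that is larger than the list. The following is a minimal sketch of a guard that fails early with a descriptive message. The class and method names (`SubListGuard`, `selectLines`) are hypothetical and not part of OpenNLP, and this is not necessarily the fix that was committed for the issue.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SubListGuard {

  // Hypothetical helper (not OpenNLP code): checks the requested count
  // against the number of available lines before calling subList.
  static Set<Integer> selectLines(List<Integer> indexes,
                                  int sentencesPerSample,
                                  int numberOfSamples) {
    // Multiply as long so a large product cannot overflow int.
    long requested = (long) sentencesPerSample * numberOfSamples;
    if (requested > indexes.size()) {
      throw new IllegalArgumentException("Requested " + requested
          + " lines but the source only has " + indexes.size());
    }
    return new HashSet<>(indexes.subList(0, (int) requested));
  }

  public static void main(String[] args) {
    List<Integer> indexes = new ArrayList<>();
    for (int i = 0; i < 8; i++) {
      indexes.add(i);
    }
    // 2 x 3 = 6 lines fit into 8 available lines.
    System.out.println(selectLines(indexes, 2, 3).size());
    try {
      // 5 x 2 = 10 lines do not fit; the guard reports why.
      selectLines(indexes, 5, 2);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Whether to throw, as here, or to clamp the count to the list size is a design choice; the issue title suggests failing with an exception.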
[jira] [Closed] (OPENNLP-1156) Downloaded files have invalid hash sums
[ https://issues.apache.org/jira/browse/OPENNLP-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi closed OPENNLP-1156. -- > Downloaded files have invalid hash sums > --- > > Key: OPENNLP-1156 > URL: https://issues.apache.org/jira/browse/OPENNLP-1156 > Project: OpenNLP > Issue Type: Bug > Components: Website >Affects Versions: 1.8.3 >Reporter: Oleg Popov > Fix For: 1.8.4 > > > [user@knime out]$ md5sum * > 336ec3cb06862f685a9b670753915ba9 apache-opennlp-1.8.3-bin.tar.gz > 2b1c1ec960646697a621bb52fd389083 apache-opennlp-1.8.3-bin.zip > 070645990b19210408229c045e9ddad9 apache-opennlp-1.8.3-src.tar.gz > cc1757ad7bb988e6a41cd2c75d128309 apache-opennlp-1.8.3-src.zip > [user@knime out]$ sha1sum * > 17e1089b41c6cad1a080cf76f5593d2e39cd apache-opennlp-1.8.3-bin.tar.gz > d59af5017ffdb0b81898e39048a8c8b460f13025 apache-opennlp-1.8.3-bin.zip > 5af2aa28c4ce36b61a35d45767dcd47d25c368a4 apache-opennlp-1.8.3-src.tar.gz > 1189d6c5c464f5d32d2f47c08d4b05d65766a0d9 apache-opennlp-1.8.3-src.zip -- This message was sent by Atlassian JIRA (v6.4.14#64029)
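For readers who want to reproduce checksums like the ones listed above without `md5sum` or `sha1sum`, here is a minimal sketch using the JDK's `MessageDigest`. The class name `DigestCheck` and its `hex` helper are illustrative assumptions; the sketch hashes a byte array, and reading a downloaded file into that array is left to the caller.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DigestCheck {

  // Returns the lowercase hex digest of data for the given algorithm name,
  // for example "MD5" or "SHA-1".
  static String hex(String algorithm, byte[] data) {
    MessageDigest md;
    try {
      md = MessageDigest.getInstance(algorithm);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("Unsupported algorithm: " + algorithm, e);
    }
    byte[] digest = md.digest(data);
    StringBuilder sb = new StringBuilder(digest.length * 2);
    for (byte b : digest) {
      sb.append(String.format("%02x", b)); // two hex characters per byte
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    byte[] empty = new byte[0];
    System.out.println(hex("MD5", empty));
    System.out.println(hex("SHA-1", empty));
  }
}
```

An MD5 digest is 16 bytes (32 hex characters) and a SHA-1 digest is 20 bytes (40 hex characters), which gives a quick sanity check when comparing against published sums.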
[jira] [Closed] (OPENNLP-1132) Fail with exception if not enough lines in leipzig parser
[ https://issues.apache.org/jira/browse/OPENNLP-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi closed OPENNLP-1132. -- > Fail with exception if not enough lines in leipzig parser > - > > Key: OPENNLP-1132 > URL: https://issues.apache.org/jira/browse/OPENNLP-1132 > Project: OpenNLP > Issue Type: Bug > Components: Language Detector >Affects Versions: 1.8.2 >Reporter: Peter Thygesen >Assignee: Peter Thygesen > Fix For: 1.8.4 > > > Exception in thread "main" java.lang.IndexOutOfBoundsException: toIndex = > 10 > at java.util.ArrayList.subListRangeCheck(ArrayList.java:1004) > at java.util.ArrayList.subList(ArrayList.java:996) > at > opennlp.tools.formats.leipzig.LeipzigLanguageSampleStream$LeipzigSentencesStream.(LeipzigLanguageSampleStream.java:65) > at > opennlp.tools.formats.leipzig.LeipzigLanguageSampleStream.read(LeipzigLanguageSampleStream.java:157) > at > opennlp.tools.formats.leipzig.LeipzigLanguageSampleStream.read(LeipzigLanguageSampleStream.java:42) > at > opennlp.tools.formats.leipzig.SampleShuffleStream.(SampleShuffleStream.java:38) > at > opennlp.tools.formats.leipzig.LeipzigLanguageSampleStreamFactory.create(LeipzigLanguageSampleStreamFactory.java:76) > at > opennlp.tools.cmdline.AbstractConverterTool.run(AbstractConverterTool.java:106) > at opennlp.tools.cmdline.CLI.main(CLI.java:256) > line 65: > Set selectedLines = new HashSet<>( > indexes.subList(0, sentencesPerSample * numberOfSamples)); > Fails if sentencesPerSample x numberOfSamples is larger than size of indexes > (source file). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (OPENNLP-1132) Fail with exception if not enough lines in leipzig parser
[ https://issues.apache.org/jira/browse/OPENNLP-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-1132: --- Fix Version/s: 1.8.4 > Fail with exception if not enough lines in leipzig parser > - > > Key: OPENNLP-1132 > URL: https://issues.apache.org/jira/browse/OPENNLP-1132 > Project: OpenNLP > Issue Type: Bug > Components: Language Detector >Affects Versions: 1.8.2 >Reporter: Peter Thygesen >Assignee: Peter Thygesen > Fix For: 1.8.4 > > > Exception in thread "main" java.lang.IndexOutOfBoundsException: toIndex = > 10 > at java.util.ArrayList.subListRangeCheck(ArrayList.java:1004) > at java.util.ArrayList.subList(ArrayList.java:996) > at > opennlp.tools.formats.leipzig.LeipzigLanguageSampleStream$LeipzigSentencesStream.(LeipzigLanguageSampleStream.java:65) > at > opennlp.tools.formats.leipzig.LeipzigLanguageSampleStream.read(LeipzigLanguageSampleStream.java:157) > at > opennlp.tools.formats.leipzig.LeipzigLanguageSampleStream.read(LeipzigLanguageSampleStream.java:42) > at > opennlp.tools.formats.leipzig.SampleShuffleStream.(SampleShuffleStream.java:38) > at > opennlp.tools.formats.leipzig.LeipzigLanguageSampleStreamFactory.create(LeipzigLanguageSampleStreamFactory.java:76) > at > opennlp.tools.cmdline.AbstractConverterTool.run(AbstractConverterTool.java:106) > at opennlp.tools.cmdline.CLI.main(CLI.java:256) > line 65: > Set selectedLines = new HashSet<>( > indexes.subList(0, sentencesPerSample * numberOfSamples)); > Fails if sentencesPerSample x numberOfSamples is larger than size of indexes > (source file). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (OPENNLP-1156) Downloaded files have invalid hash sums
[ https://issues.apache.org/jira/browse/OPENNLP-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi resolved OPENNLP-1156. Resolution: Not A Problem Resolving this as 'Not an Issue' > Downloaded files have invalid hash sums > --- > > Key: OPENNLP-1156 > URL: https://issues.apache.org/jira/browse/OPENNLP-1156 > Project: OpenNLP > Issue Type: Bug > Components: Website >Affects Versions: 1.8.3 >Reporter: Oleg Popov > Fix For: 1.8.4 > > > [user@knime out]$ md5sum * > 336ec3cb06862f685a9b670753915ba9 apache-opennlp-1.8.3-bin.tar.gz > 2b1c1ec960646697a621bb52fd389083 apache-opennlp-1.8.3-bin.zip > 070645990b19210408229c045e9ddad9 apache-opennlp-1.8.3-src.tar.gz > cc1757ad7bb988e6a41cd2c75d128309 apache-opennlp-1.8.3-src.zip > [user@knime out]$ sha1sum * > 17e1089b41c6cad1a080cf76f5593d2e39cd apache-opennlp-1.8.3-bin.tar.gz > d59af5017ffdb0b81898e39048a8c8b460f13025 apache-opennlp-1.8.3-bin.zip > 5af2aa28c4ce36b61a35d45767dcd47d25c368a4 apache-opennlp-1.8.3-src.tar.gz > 1189d6c5c464f5d32d2f47c08d4b05d65766a0d9 apache-opennlp-1.8.3-src.zip -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (OPENNLP-1156) Downloaded files have invalid hash sums
[ https://issues.apache.org/jira/browse/OPENNLP-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-1156: --- Fix Version/s: 1.8.4 > Downloaded files have invalid hash sums > --- > > Key: OPENNLP-1156 > URL: https://issues.apache.org/jira/browse/OPENNLP-1156 > Project: OpenNLP > Issue Type: Bug > Components: Website >Affects Versions: 1.8.3 >Reporter: Oleg Popov > Fix For: 1.8.4 > > > [user@knime out]$ md5sum * > 336ec3cb06862f685a9b670753915ba9 apache-opennlp-1.8.3-bin.tar.gz > 2b1c1ec960646697a621bb52fd389083 apache-opennlp-1.8.3-bin.zip > 070645990b19210408229c045e9ddad9 apache-opennlp-1.8.3-src.tar.gz > cc1757ad7bb988e6a41cd2c75d128309 apache-opennlp-1.8.3-src.zip > [user@knime out]$ sha1sum * > 17e1089b41c6cad1a080cf76f5593d2e39cd apache-opennlp-1.8.3-bin.tar.gz > d59af5017ffdb0b81898e39048a8c8b460f13025 apache-opennlp-1.8.3-bin.zip > 5af2aa28c4ce36b61a35d45767dcd47d25c368a4 apache-opennlp-1.8.3-src.tar.gz > 1189d6c5c464f5d32d2f47c08d4b05d65766a0d9 apache-opennlp-1.8.3-src.zip -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (OPENNLP-1166) TwoPassDataIndexer fails if features contain \n
[ https://issues.apache.org/jira/browse/OPENNLP-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-1166: --- Fix Version/s: 1.8.4 > TwoPassDataIndexer fails if features contain \n > --- > > Key: OPENNLP-1166 > URL: https://issues.apache.org/jira/browse/OPENNLP-1166 > Project: OpenNLP > Issue Type: Improvement > Components: Machine Learning >Affects Versions: 1.8.3 >Reporter: Peter Thygesen >Assignee: Peter Thygesen > Fix For: 1.8.4 > > > Training a model with Newline tokens causes TwoPassDataIndexer to throw > exception > Exception in thread "main" java.util.NoSuchElementException > at java.util.StringTokenizer.nextToken(StringTokenizer.java:349) > at opennlp.tools.ml.model.FileEventStream.read(FileEventStream.java:71) > at opennlp.tools.ml.model.FileEventStream.read(FileEventStream.java:35) > at > opennlp.tools.ml.model.AbstractDataIndexer.index(AbstractDataIndexer.java:168) > at > opennlp.tools.ml.model.TwoPassDataIndexer.index(TwoPassDataIndexer.java:72) > at > opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:68) > at > opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:90) > at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:244) > at > opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:169) > at opennlp.tools.cmdline.CLI.main(CLI.java:256) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
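The stack trace above ends in `StringTokenizer.nextToken` while events are read back line by line. The sketch below illustrates the general failure mode under the assumption that each event is stored as one whitespace-separated text line; `LineEventDemo` and its methods are hypothetical and are not the OpenNLP implementation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class LineEventDemo {

  // Writes an event as "outcome f1 f2 ..." on what is meant to be one line.
  static String toLine(String outcome, String... features) {
    StringBuilder sb = new StringBuilder(outcome);
    for (String feature : features) {
      sb.append(' ').append(feature);
    }
    return sb.append('\n').toString();
  }

  // Reads the events back and returns the first token (the outcome) of each line.
  static List<String> readOutcomes(String data) {
    List<String> outcomes = new ArrayList<>();
    try (BufferedReader reader = new BufferedReader(new StringReader(data))) {
      String line;
      while ((line = reader.readLine()) != null) {
        StringTokenizer st = new StringTokenizer(line);
        // Throws NoSuchElementException when the line has no tokens.
        outcomes.add(st.nextToken());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return outcomes;
  }

  public static void main(String[] args) {
    // A normal feature round-trips as a single line.
    System.out.println(readOutcomes(toLine("person", "w=John")));
    // A feature ending in a newline splits the event into two lines;
    // the second line is empty, so nextToken() fails.
    readOutcomes(toLine("person", "w=\n"));
  }
}
```

In a line-oriented format like this one, a newline inside a feature has to be escaped or replaced before the event is written.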
[jira] [Closed] (OPENNLP-1166) TwoPassDataIndexer fails if features contain \n
[ https://issues.apache.org/jira/browse/OPENNLP-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi closed OPENNLP-1166. -- > TwoPassDataIndexer fails if features contain \n > --- > > Key: OPENNLP-1166 > URL: https://issues.apache.org/jira/browse/OPENNLP-1166 > Project: OpenNLP > Issue Type: Improvement > Components: Machine Learning >Affects Versions: 1.8.3 >Reporter: Peter Thygesen >Assignee: Peter Thygesen > Fix For: 1.8.4 > > > Training a model with Newline tokens causes TwoPassDataIndexer to throw > exception > Exception in thread "main" java.util.NoSuchElementException > at java.util.StringTokenizer.nextToken(StringTokenizer.java:349) > at opennlp.tools.ml.model.FileEventStream.read(FileEventStream.java:71) > at opennlp.tools.ml.model.FileEventStream.read(FileEventStream.java:35) > at > opennlp.tools.ml.model.AbstractDataIndexer.index(AbstractDataIndexer.java:168) > at > opennlp.tools.ml.model.TwoPassDataIndexer.index(TwoPassDataIndexer.java:72) > at > opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:68) > at > opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:90) > at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:244) > at > opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:169) > at opennlp.tools.cmdline.CLI.main(CLI.java:256) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (OPENNLP-1166) TwoPassDataIndexer fails if features contain \n
[ https://issues.apache.org/jira/browse/OPENNLP-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi resolved OPENNLP-1166. Resolution: Fixed > TwoPassDataIndexer fails if features contain \n > --- > > Key: OPENNLP-1166 > URL: https://issues.apache.org/jira/browse/OPENNLP-1166 > Project: OpenNLP > Issue Type: Improvement > Components: Machine Learning >Affects Versions: 1.8.3 >Reporter: Peter Thygesen >Assignee: Peter Thygesen > Fix For: 1.8.4 > > > Training a model with Newline tokens causes TwoPassDataIndexer to throw > exception > Exception in thread "main" java.util.NoSuchElementException > at java.util.StringTokenizer.nextToken(StringTokenizer.java:349) > at opennlp.tools.ml.model.FileEventStream.read(FileEventStream.java:71) > at opennlp.tools.ml.model.FileEventStream.read(FileEventStream.java:35) > at > opennlp.tools.ml.model.AbstractDataIndexer.index(AbstractDataIndexer.java:168) > at > opennlp.tools.ml.model.TwoPassDataIndexer.index(TwoPassDataIndexer.java:72) > at > opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:68) > at > opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:90) > at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:244) > at > opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:169) > at opennlp.tools.cmdline.CLI.main(CLI.java:256) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (OPENNLP-1140) Add 20 newsgroups format support
[ https://issues.apache.org/jira/browse/OPENNLP-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi resolved OPENNLP-1140. Resolution: Fixed > Add 20 newsgroups format support > > > Key: OPENNLP-1140 > URL: https://issues.apache.org/jira/browse/OPENNLP-1140 > Project: OpenNLP > Issue Type: Improvement > Components: Formats >Reporter: Tommaso Teofili >Assignee: Joern Kottmann > Fix For: 1.8.4 > > > It'd be nice to have support for [20 > newsgroups|http://qwone.com/~jason/20Newsgroups/] format, especially for > evaluating {{DocCat}} models. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (OPENNLP-1140) Add 20 newsgroups format support
[ https://issues.apache.org/jira/browse/OPENNLP-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi closed OPENNLP-1140. -- > Add 20 newsgroups format support > > > Key: OPENNLP-1140 > URL: https://issues.apache.org/jira/browse/OPENNLP-1140 > Project: OpenNLP > Issue Type: Improvement > Components: Formats >Reporter: Tommaso Teofili >Assignee: Joern Kottmann > Fix For: 1.8.4 > > > It'd be nice to have support for [20 > newsgroups|http://qwone.com/~jason/20Newsgroups/] format, especially for > evaluating {{DocCat}} models. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (OPENNLP-1140) Add 20 newsgroups format support
[ https://issues.apache.org/jira/browse/OPENNLP-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi reassigned OPENNLP-1140: -- Assignee: Joern Kottmann Fix Version/s: (was: 1.8.5) 1.8.4 > Add 20 newsgroups format support > > > Key: OPENNLP-1140 > URL: https://issues.apache.org/jira/browse/OPENNLP-1140 > Project: OpenNLP > Issue Type: Improvement > Components: Formats >Reporter: Tommaso Teofili >Assignee: Joern Kottmann > Fix For: 1.8.4 > > > It'd be nice to have support for [20 > newsgroups|http://qwone.com/~jason/20Newsgroups/] format, especially for > evaluating {{DocCat}} models. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (OPENNLP-1172) Add Annotator notes to BratAnnotation
[ https://issues.apache.org/jira/browse/OPENNLP-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi reassigned OPENNLP-1172: -- Assignee: Daniel Russ > Add Annotator notes to BratAnnotation > - > > Key: OPENNLP-1172 > URL: https://issues.apache.org/jira/browse/OPENNLP-1172 > Project: OpenNLP > Issue Type: Improvement > Components: Formats >Affects Versions: 1.8.3 >Reporter: Daniel Russ >Assignee: Daniel Russ >Priority: Minor > Fix For: 1.8.4 > > > The Brat Annotator allows annotators to add notes to entities/relations. The > BratAnnotation class should reflect this. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (OPENNLP-1153) Add model download page to website
[ https://issues.apache.org/jira/browse/OPENNLP-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi resolved OPENNLP-1153. Resolution: Fixed Fix Version/s: 1.8.3 > Add model download page to website > -- > > Key: OPENNLP-1153 > URL: https://issues.apache.org/jira/browse/OPENNLP-1153 > Project: OpenNLP > Issue Type: Improvement > Components: Website >Reporter: William Colen >Assignee: William Colen >Priority: Trivial > Fix For: 1.8.3 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (OPENNLP-1153) Add model download page to website
[ https://issues.apache.org/jira/browse/OPENNLP-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi closed OPENNLP-1153. -- > Add model download page to website > -- > > Key: OPENNLP-1153 > URL: https://issues.apache.org/jira/browse/OPENNLP-1153 > Project: OpenNLP > Issue Type: Improvement > Components: Website >Reporter: William Colen >Assignee: William Colen >Priority: Trivial > Fix For: 1.8.3 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (OPENNLP-1147) Missing URLs in doc
[ https://issues.apache.org/jira/browse/OPENNLP-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi closed OPENNLP-1147. -- > Missing URLs in doc > --- > > Key: OPENNLP-1147 > URL: https://issues.apache.org/jira/browse/OPENNLP-1147 > Project: OpenNLP > Issue Type: Bug > Components: Documentation >Affects Versions: 1.8.2 >Reporter: Koji Sekiguchi >Assignee: Koji Sekiguchi >Priority: Trivial > Fix For: 1.8.3 > > > While reading the name finder part of the documentation, I found some missing URLs. I'd > like to correct those for which I could find the latest or an alternative URL. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (OPENNLP-1149) remove unused member in PlainTextByLineStream
[ https://issues.apache.org/jira/browse/OPENNLP-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi closed OPENNLP-1149. -- > remove unused member in PlainTextByLineStream > - > > Key: OPENNLP-1149 > URL: https://issues.apache.org/jira/browse/OPENNLP-1149 > Project: OpenNLP > Issue Type: Improvement >Affects Versions: 1.8.2 >Reporter: Koji Sekiguchi >Assignee: Koji Sekiguchi >Priority: Trivial > Fix For: 1.8.3 > > > PlainTextByLineStream has a private member variable "channel" but it is never > set and hence, it is always null. It can be removed to simplify code. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (OPENNLP-1145) Javadoc of NaiveBayesTrainer class looks incorrect
[ https://issues.apache.org/jira/browse/OPENNLP-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi closed OPENNLP-1145. -- > Javadoc of NaiveBayesTrainer class looks incorrect > -- > > Key: OPENNLP-1145 > URL: https://issues.apache.org/jira/browse/OPENNLP-1145 > Project: OpenNLP > Issue Type: Bug > Components: Machine Learning >Affects Versions: 1.8.2 >Reporter: Koji Sekiguchi >Assignee: Koji Sekiguchi >Priority: Trivial > Fix For: 1.8.3 > > > It seems that Javadoc of NaiveBayesTrainer class was copied from > PerceptronTrainer and hence, it says "Trains models using the perceptron > algorithm." :) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (OPENNLP-1146) remove unnecessary serialVersionUID
[ https://issues.apache.org/jira/browse/OPENNLP-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi closed OPENNLP-1146. -- > remove unnecessary serialVersionUID > --- > > Key: OPENNLP-1146 > URL: https://issues.apache.org/jira/browse/OPENNLP-1146 > Project: OpenNLP > Issue Type: Improvement > Components: Build, Packaging and Test >Affects Versions: 1.8.2 >Reporter: Koji Sekiguchi >Assignee: Koji Sekiguchi >Priority: Trivial > Fix For: 1.8.3 > > > We saw several classes that have unnecessary serialVersionUID constant > declaration. Most of them are Stemmer classes that are created by the > Snowball to Java compiler. I think we can just remove serialVersionUID from > Stemmer classes. Other than Stemmer classes, Exception classes which extend > RuntimeException or IOException have serialVersionUID. I'll remove > serialVersionUID from these Exception classes as well but add > @SuppressWarnings("serial") just in case. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
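The pattern described above for exception classes can be sketched as follows. `ExampleFormatException` is a hypothetical class used only for illustration and does not exist in OpenNLP.

```java
// No explicit serialVersionUID is declared; the annotation silences the
// compiler's "serial" lint warning for this Serializable subclass.
@SuppressWarnings("serial")
final class ExampleFormatException extends RuntimeException {

  ExampleFormatException(String message) {
    super(message);
  }
}

final class SuppressSerialDemo {

  // Throws and catches the exception to show that it behaves as usual.
  static String messageOf(String text) {
    try {
      throw new ExampleFormatException(text);
    } catch (ExampleFormatException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(messageOf("unexpected input"));
  }
}
```

Without an explicit constant, the JVM computes a default `serialVersionUID` from the class structure, which is sufficient as long as serialized instances are not exchanged between different versions of the class.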
[jira] [Closed] (OPENNLP-1148) use StandardCharsets.UTF_8 in doc
[ https://issues.apache.org/jira/browse/OPENNLP-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi closed OPENNLP-1148. -- > use StandardCharsets.UTF_8 in doc > - > > Key: OPENNLP-1148 > URL: https://issues.apache.org/jira/browse/OPENNLP-1148 > Project: OpenNLP > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.8.2 >Reporter: Koji Sekiguchi >Assignee: Koji Sekiguchi >Priority: Trivial > Fix For: 1.8.3 > > > In the doc, PlainTextByLineStream() is not used consistently. Besides > specifying StandardCharsets.UTF_8 as its second parameter, the > following variations appear: > - String "UTF-8" > - StandardCharsets.UTF8 (not UTF_8) > - Charset.forName("UTF-8") > Let's standardize on StandardCharsets.UTF_8. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
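To illustrate the variations listed above using only JDK classes: `Charset.forName("UTF-8")` and `StandardCharsets.UTF_8` denote the same charset, while `StandardCharsets.UTF8` is not a field of that class and would not compile. The sketch below uses a hypothetical `Utf8Demo` class and a plain `InputStreamReader` rather than `PlainTextByLineStream`, so that it stays self-contained.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Utf8Demo {

  // The lookup by name and the constant refer to the same charset.
  static boolean sameCharset() {
    return Charset.forName("UTF-8").equals(StandardCharsets.UTF_8);
  }

  // Decodes the bytes as UTF-8 and returns the first line.
  static String firstLine(byte[] bytes) {
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))) {
      return reader.readLine();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(sameCharset());
    byte[] bytes = "caf\u00e9\nsecond line".getBytes(StandardCharsets.UTF_8);
    System.out.println(firstLine(bytes));
  }
}
```

The constant avoids a runtime lookup by name and cannot be misspelled without a compile error, which is one reason to prefer it in documentation examples.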
[jira] [Closed] (OPENNLP-1151) All Sample objects should implement Serializable for easy integration into other tools
[ https://issues.apache.org/jira/browse/OPENNLP-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi closed OPENNLP-1151. -- > All Sample objects should implement Serializable for easy integration into > other tools > - > > Key: OPENNLP-1151 > URL: https://issues.apache.org/jira/browse/OPENNLP-1151 > Project: OpenNLP > Issue Type: Bug >Reporter: Joern Kottmann >Assignee: Suneel Marthi >Priority: Minor > Fix For: 1.8.3 > > > State-of-the-art frameworks like Apache Flink require objects to be > serializable in order to use them in a pipeline. To use such a framework to prepare training data > for OpenNLP, the Sample objects should all implement Serializable. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
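What the request above amounts to can be sketched with plain Java serialization. `TextSample` is a hypothetical stand-in for a sample type and is not an OpenNLP class; the round trip through a byte array mimics what a distributed framework does when it ships objects between workers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializableSampleDemo {

  // Hypothetical sample type (not an OpenNLP class).
  static final class TextSample implements Serializable {
    private static final long serialVersionUID = 1L;
    final String[] tokens;

    TextSample(String... tokens) {
      this.tokens = tokens.clone();
    }
  }

  // Serializes the sample to bytes and reads it back.
  static TextSample roundTrip(TextSample sample) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(sample);
      }
      try (ObjectInputStream in =
               new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (TextSample) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("Serialization round trip failed", e);
    }
  }

  public static void main(String[] args) {
    TextSample copy = roundTrip(new TextSample("Hello", "world"));
    System.out.println(String.join(" ", copy.tokens));
  }
}
```

If the class did not implement `Serializable`, `writeObject` would throw `NotSerializableException`, which is the kind of failure the issue aims to prevent.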
[jira] [Updated] (OPENNLP-1150) TokenNameFinderTrainerTool should use ModelUtil.createDefaultTrainingParameters() when mlParams is null
[ https://issues.apache.org/jira/browse/OPENNLP-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-1150: --- Fix Version/s: (was: 1.8.3) 1.8.4 > TokenNameFinderTrainerTool should use > ModelUtil.createDefaultTrainingParameters() when mlParams is null > --- > > Key: OPENNLP-1150 > URL: https://issues.apache.org/jira/browse/OPENNLP-1150 > Project: OpenNLP > Issue Type: Improvement > Components: Name Finder >Affects Versions: 1.8.2 >Reporter: Koji Sekiguchi >Priority: Trivial > Fix For: 1.8.4 > > > Unlike other TrainerTools, TokenNameFinderTrainerTool creates an empty > TrainingParameters by calling the constructor when mlParams is null. > TokenNameFinderTrainerTool should use > ModelUtil.createDefaultTrainingParameters(), as the other TrainerTools do, to > initialize mlParams. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (OPENNLP-936) Add thread safe versions of some tools (ME sentence detection, tokenization, pos tagging)
[ https://issues.apache.org/jira/browse/OPENNLP-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-936: -- Fix Version/s: (was: 1.8.3) 1.8.4 > Add thread safe versions of some tools (ME sentence detection, tokenization, > pos tagging) > - > > Key: OPENNLP-936 > URL: https://issues.apache.org/jira/browse/OPENNLP-936 > Project: OpenNLP > Issue Type: Improvement > Components: POS Tagger >Affects Versions: 1.7.1 >Reporter: Thilo Goetz >Priority: Minor > Fix For: 1.8.4 > > > As discussed on the mailing list, add thread safe versions of maximum entropy > sentence detection, tokenization and pos tagging. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (OPENNLP-1144) Add support for word vector resources
[ https://issues.apache.org/jira/browse/OPENNLP-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-1144: --- Fix Version/s: (was: 1.8.3) 1.8.4 > Add support for word vector resources > - > > Key: OPENNLP-1144 > URL: https://issues.apache.org/jira/browse/OPENNLP-1144 > Project: OpenNLP > Issue Type: Improvement >Reporter: Joern Kottmann > Fix For: 1.8.4 > > > It would be nice to have support for word vector resources and parsing > support for the most common formats. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (OPENNLP-1140) Add 20 newsgroups format support
[ https://issues.apache.org/jira/browse/OPENNLP-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-1140: --- Fix Version/s: (was: 1.8.3) 1.8.4 > Add 20 newsgroups format support > > > Key: OPENNLP-1140 > URL: https://issues.apache.org/jira/browse/OPENNLP-1140 > Project: OpenNLP > Issue Type: Improvement > Components: Formats >Reporter: Tommaso Teofili > Fix For: 1.8.4 > > > It'd be nice to have support for [20 > newsgroups|http://qwone.com/~jason/20Newsgroups/] format, especially for > evaluating {{DocCat}} models. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (OPENNLP-1082) SentenceSampleStream should add EOS to samples if missing
[ https://issues.apache.org/jira/browse/OPENNLP-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-1082: --- Fix Version/s: (was: 1.8.3) 1.8.4 > SentenceSampleStream should add EOS to samples if missing > - > > Key: OPENNLP-1082 > URL: https://issues.apache.org/jira/browse/OPENNLP-1082 > Project: OpenNLP > Issue Type: Improvement > Components: Sentence Detector >Reporter: William Colen >Assignee: William Colen > Fix For: 1.8.4 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (OPENNLP-47) Rewrite the CONLL06 documentation based on the tutorial
[ https://issues.apache.org/jira/browse/OPENNLP-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-47: - Fix Version/s: (was: 1.8.3) 1.8.4 > Rewrite the CONLL06 documentation based on the tutorial > --- > > Key: OPENNLP-47 > URL: https://issues.apache.org/jira/browse/OPENNLP-47 > Project: OpenNLP > Issue Type: Improvement > Components: Documentation >Affects Versions: tools-1.5.1-incubating >Reporter: Joern Kottmann > Labels: help-wanted > Fix For: 1.8.4 > > > The CONLL06 documentation should be rewritten to reflect the new converters > that have been added to OpenNLP since it was first written. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (OPENNLP-1113) Identify why some eval tests fail on AMD processors
[ https://issues.apache.org/jira/browse/OPENNLP-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-1113: --- Fix Version/s: (was: 1.8.3) 1.8.4 > Identify why some eval tests fail on AMD processors > --- > > Key: OPENNLP-1113 > URL: https://issues.apache.org/jira/browse/OPENNLP-1113 > Project: OpenNLP > Issue Type: Test >Affects Versions: 1.8.1 >Reporter: Jeff Zemerick >Assignee: Jeff Zemerick >Priority: Minor > Fix For: 1.8.4 > > Attachments: failure.txt, success.txt > > > When running the eval-tests for the 1.8.1 tag some of the tests consistently > fail on an EC2 instance. On another virtual machine the tests consistently > pass. When the tests fail the failures are consistent with the following: > {quote}Failed tests: > > ArvoresDeitadasEval.evalPortugueseChunkerQnMultipleThreads:208->chunkerCrossEval:128 > expected:<0.9649180953528779> but was:<0.9650518197155942> > > ArvoresDeitadasEval.evalPortugueseSentenceDetectorMaxentQn:143->sentenceCrossEval:90 > expected:<0.99261110833375> but was:<0.9927505074644777> > Conll02NameFinderEval.evalSpanishOrganizationMaxentQn:390->eval:90 > expected:<0.682961897915169> but was:<0.6798418972332015> > ConllXPosTaggerEval.evalSwedishMaxentQn:152->eval:76 > expected:<0.9347595473833098> but was:<0.9322842998585573>{quote} > Both systems are Ubuntu 16.04.2 running OpenJDK 1.8.0_131 but there must be > some other differences affecting the tests. Those differences need to be > identified. 
> *VM1 (Tests Consistently _Pass_)* > Apache Maven 3.3.9 > Maven home: /usr/share/maven > Java version: 1.8.0_131, vendor: Oracle Corporation > Java home: /usr/lib/jvm/java-8-openjdk-amd64/jre > Default locale: en_US, platform encoding: UTF-8 > OS name: "linux", version: "4.4.0-1022-aws", arch: "amd64", family: "unix" > LANG=en_US.UTF-8 > *VM2 (Tests Consistently _Fail_)* > Apache Maven 3.3.9 > Maven home: /usr/share/maven > Java version: 1.8.0_131, vendor: Oracle Corporation > Java home: /usr/lib/jvm/java-8-openjdk-amd64/jre > Default locale: en_US, platform encoding: UTF-8 > OS name: "linux", version: "4.4.0-83-generic", arch: "amd64", family: "unix" > LANG=en_US.UTF-8 > This VM also consistently fails when using Oracle JDK: > Java version: 1.8.0_131, vendor: Oracle Corporation > Java home: /usr/lib/jvm/java-8-oracle/jre > *VM3 (Tests Consistently _Pass_)* > Apache Maven 3.3.9 > Maven home: /usr/share/maven > Java version: 1.8.0_131, vendor: Oracle Corporation > Java home: /usr/lib/jvm/java-8-openjdk-amd64/jre > Default locale: en_US, platform encoding: UTF-8 > OS name: "linux", version: "4.4.0-83-generic", arch: "amd64", family: "unix" > *VM4 (Tests Consistently _Fail_)* > Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; > 2015-11-10T11:41:47-05:00) > Maven home: C:\Program Files (x86)\maven\bin\.. > Java version: 1.8.0_92, vendor: Oracle Corporation > Java home: C:\Program Files\Java\jdk1.8.0_92\jre > Default locale: en_US, platform encoding: Cp1252 > OS name: "windows 10", version: "10.0", arch: "amd64", family: "dos" -- This message was sent by Atlassian JIRA (v6.4.14#64029)
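To put the reported mismatches in perspective, the sketch below computes the absolute gap between an expected and an observed score and checks it against a tolerance. `EvalDelta` is a hypothetical helper, and the tolerances are arbitrary illustrations; this is not a suggestion that the eval tests should be loosened.

```java
final class EvalDelta {

  // Absolute difference between the expected and the observed score.
  static double absDiff(double expected, double actual) {
    return Math.abs(expected - actual);
  }

  // True if the observed score lies within the given tolerance.
  static boolean within(double expected, double actual, double tolerance) {
    return absDiff(expected, actual) <= tolerance;
  }

  public static void main(String[] args) {
    // Pairs taken from the failure report above.
    double[][] pairs = {
        {0.9649180953528779, 0.9650518197155942}, // Portuguese chunker
        {0.99261110833375, 0.9927505074644777},   // Portuguese sentence detector
        {0.682961897915169, 0.6798418972332015},  // Spanish organization name finder
        {0.9347595473833098, 0.9322842998585573}  // Swedish POS tagger
    };
    for (double[] pair : pairs) {
      System.out.printf("diff=%.6f within 1e-3: %b%n",
          absDiff(pair[0], pair[1]), within(pair[0], pair[1], 1e-3));
    }
  }
}
```

The two Portuguese results differ by roughly 1e-4, while the Spanish and Swedish results differ by a few thousandths, so the deviations are small but not uniform across tests.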
[jira] [Resolved] (OPENNLP-1151) All Sample objects should implement Serializable for easy integration into other tools
[ https://issues.apache.org/jira/browse/OPENNLP-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi resolved OPENNLP-1151. Resolution: Fixed > All Sample objects should implement Serializable for easy integration into > other tools > - > > Key: OPENNLP-1151 > URL: https://issues.apache.org/jira/browse/OPENNLP-1151 > Project: OpenNLP > Issue Type: Bug >Reporter: Joern Kottmann >Assignee: Suneel Marthi >Priority: Minor > Fix For: 1.8.3 > > > State-of-the-art frameworks like Apache Flink require objects to be > serializable in order to use them in a pipeline. To use such a framework to prepare training data > for OpenNLP, the Sample objects should all implement Serializable. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (OPENNLP-976) Add formats support for germeval2014
[ https://issues.apache.org/jira/browse/OPENNLP-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-976: -- Fix Version/s: (was: 1.8.3) > Add formats support for germeval2014 > > > Key: OPENNLP-976 > URL: https://issues.apache.org/jira/browse/OPENNLP-976 > Project: OpenNLP > Issue Type: Improvement > Components: Formats >Reporter: Joern Kottmann >Assignee: Suneel Marthi > > Details about the format can be found here: > https://sites.google.com/site/germeval2014ner/data -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (OPENNLP-1148) use StandardCharsets.UTF_8 in doc
[ https://issues.apache.org/jira/browse/OPENNLP-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi resolved OPENNLP-1148. Resolution: Fixed > use StandardCharsets.UTF_8 in doc > - > > Key: OPENNLP-1148 > URL: https://issues.apache.org/jira/browse/OPENNLP-1148 > Project: OpenNLP > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.8.2 >Reporter: Koji Sekiguchi >Assignee: Koji Sekiguchi >Priority: Trivial > Fix For: 1.8.3 > > > In the doc, PlainTextByLineStream() is not used consistently. Besides > specifying StandardCharsets.UTF_8 for its second parameter, there are the > following variations: > - String "UTF-8" > - StandardCharsets.UTF8 (not UTF_8) > - Charset.forName("UTF-8") > Let's standardize on StandardCharsets.UTF_8. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
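To make the difference between these variations concrete, here is a small JDK-only sketch. It does not use `PlainTextByLineStream` itself (that would need the OpenNLP jar); the `firstLine` helper and the `Utf8CharsetSketch` class are hypothetical names introduced only for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Utf8CharsetSketch {

  // Reads the first line of the given bytes, decoding them with the given charset.
  static String firstLine(byte[] data, Charset charset) {
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(new ByteArrayInputStream(data), charset))) {
      return reader.readLine();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] data = "caf\u00e9\nsecond line".getBytes(StandardCharsets.UTF_8);

    // Lookup by name: works, but a typo in the name only surfaces at runtime.
    Charset byName = Charset.forName("UTF-8");

    // Constant: checked by the compiler, no lookup and no string to mistype.
    Charset constant = StandardCharsets.UTF_8;

    // Both refer to the same charset, so the constant is a drop-in replacement.
    System.out.println(byName.equals(constant));
    System.out.println(firstLine(data, constant));
  }
}
```

Note that `StandardCharsets.UTF8` (without the underscore) is not a constant in the JDK, so that variation would not compile as written.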
[jira] [Resolved] (OPENNLP-1147) Missing URLs in doc
[ https://issues.apache.org/jira/browse/OPENNLP-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi resolved OPENNLP-1147. Resolution: Fixed > Missing URLs in doc > --- > > Key: OPENNLP-1147 > URL: https://issues.apache.org/jira/browse/OPENNLP-1147 > Project: OpenNLP > Issue Type: Bug > Components: Documentation >Affects Versions: 1.8.2 >Reporter: Koji Sekiguchi >Assignee: Koji Sekiguchi >Priority: Trivial > Fix For: 1.8.3 > > > When I read the name finder part of the documentation, I found some missing URLs. I'd > like to correct those for which I could find current or alternative ones. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (OPENNLP-1151) All Sample objects should implement Serializable for easy integration into other tools
[ https://issues.apache.org/jira/browse/OPENNLP-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi reassigned OPENNLP-1151: -- Assignee: Suneel Marthi > All Sample objects should implement Serializable for easy integration into > other tools > - > > Key: OPENNLP-1151 > URL: https://issues.apache.org/jira/browse/OPENNLP-1151 > Project: OpenNLP > Issue Type: Bug >Reporter: Joern Kottmann >Assignee: Suneel Marthi >Priority: Minor > Fix For: 1.8.3 > > > State-of-the-art frameworks like Apache Flink require that objects are > serializable to use them in the pipeline. To use such a framework to prepare training data > for OpenNLP, the Sample objects should all implement Serializable. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (OPENNLP-1116) Add Concatenate Stream method for Collections of streams
[ https://issues.apache.org/jira/browse/OPENNLP-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi closed OPENNLP-1116. -- > Add Concatenate Stream method for Collections of streams > > > Key: OPENNLP-1116 > URL: https://issues.apache.org/jira/browse/OPENNLP-1116 > Project: OpenNLP > Issue Type: Improvement > Components: Machine Learning >Affects Versions: 1.8.1 >Reporter: Daniel Russ >Assignee: Daniel Russ >Priority: Trivial > Fix For: 1.8.2 > > > Minor change to opennlp.tools.util.ObjectStreamUtils. First, change the > signature of the createObjectStream(final ObjectStream... streams) to > concatenateObjectStream(final ObjectStream... streams), and add a method > concatenateObjectStream(final Collection streams) > The reason behind this is that I often pull data from multiple files; while > it is possible to create an array of ObjectStreams, it is easier to work with > Lists. Also, the name of the method is clearer. It concatenates a > list/array of ObjectStreams, as opposed to the createObjectStream(final > Collection collection), which makes an ObjectStream of the items in the > collection. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
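The distinction between "creating a stream from a collection of items" and "concatenating a collection of streams" can be sketched as follows. This is not the OpenNLP implementation: `SimpleStream` is a hypothetical, simplified stand-in for `ObjectStream` (it assumes `read()` returns null at the end of the stream and omits exceptions, `reset()` and `close()`), and the method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

final class ConcatenateStreamsSketch {

  // Hypothetical, simplified stand-in for OpenNLP's ObjectStream:
  // read() returns the next object, or null once the stream is exhausted.
  interface SimpleStream<T> {
    T read();
  }

  // "Create" flavour: wraps the items of a collection in a single stream.
  static <T> SimpleStream<T> createStream(Collection<T> items) {
    Iterator<T> it = items.iterator();
    return () -> it.hasNext() ? it.next() : null;
  }

  // "Concatenate" flavour: chains a collection of streams into one stream.
  static <T> SimpleStream<T> concatenateStreams(Collection<SimpleStream<T>> streams) {
    Iterator<SimpleStream<T>> it = streams.iterator();
    return new SimpleStream<T>() {
      private SimpleStream<T> current = it.hasNext() ? it.next() : null;

      @Override
      public T read() {
        while (current != null) {
          T next = current.read();
          if (next != null) {
            return next;
          }
          // The current stream is exhausted; move on to the next one, if any.
          current = it.hasNext() ? it.next() : null;
        }
        return null;
      }
    };
  }

  // Varargs overload that delegates to the collection overload.
  @SafeVarargs
  static <T> SimpleStream<T> concatenateStreams(SimpleStream<T>... streams) {
    return concatenateStreams(Arrays.asList(streams));
  }

  // Drains a stream into a list, for demonstration purposes.
  static <T> List<T> drain(SimpleStream<T> stream) {
    List<T> out = new ArrayList<>();
    for (T item = stream.read(); item != null; item = stream.read()) {
      out.add(item);
    }
    return out;
  }

  public static void main(String[] args) {
    // One stream per (imaginary) input file, collected in a List.
    List<SimpleStream<String>> parts = new ArrayList<>();
    parts.add(createStream(Arrays.asList("a", "b")));
    parts.add(createStream(Arrays.asList("c")));
    System.out.println(drain(concatenateStreams(parts)));
  }
}
```

With the collection overload, callers who build up their streams in a `List` no longer need to convert it to an array first.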
[jira] [Assigned] (OPENNLP-1116) Add Concatenate Stream method for Collections of streams
[ https://issues.apache.org/jira/browse/OPENNLP-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi reassigned OPENNLP-1116: -- Assignee: Daniel Russ > Add Concatenate Stream method for Collections of streams > > > Key: OPENNLP-1116 > URL: https://issues.apache.org/jira/browse/OPENNLP-1116 > Project: OpenNLP > Issue Type: Improvement > Components: Machine Learning >Affects Versions: 1.8.1 >Reporter: Daniel Russ >Assignee: Daniel Russ >Priority: Trivial > Fix For: 1.8.2 > > > Minor change to opennlp.tools.util.ObjectStreamUtils. First, change the > signature of the createObjectStream(final ObjectStream... streams) to > concatenateObjectStream(final ObjectStream... streams), and add a method > concatenateObjectStream(final Collection streams) > The reason behind this is that I often pull data from multiple files; while > it is possible to create an array of ObjectStreams, it is easier to work with > Lists. Also, the name of the method is clearer. It concatenates a > list/array of ObjectStreams, as opposed to the createObjectStream(final > Collection collection), which makes an ObjectStream of the items in the > collection. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (OPENNLP-1117) Fix cmd line training time
[ https://issues.apache.org/jira/browse/OPENNLP-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi resolved OPENNLP-1117. Resolution: Fixed > Fix cmd line training time > -- > > Key: OPENNLP-1117 > URL: https://issues.apache.org/jira/browse/OPENNLP-1117 > Project: OpenNLP > Issue Type: Bug > Components: Command Line Interface >Affects Versions: 1.7.1 >Reporter: Peter Thygesen >Assignee: Peter Thygesen >Priority: Trivial > Fix For: 1.8.2 > > Original Estimate: 1h > Remaining Estimate: 1h > > The final execution time for training a model should be printed using > System.err not System.out. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
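A minimal sketch of the behaviour being asked for, assuming the motivation is to keep standard output free of diagnostics (the issue itself does not state the reason). `TrainingTimeSketch`, `reportExecutionTime`, the message wording and the `model.bin` output are all hypothetical and do not reflect the actual OpenNLP command line code.

```java
import java.io.PrintStream;
import java.util.Locale;

final class TrainingTimeSketch {

  // Reports the elapsed time on the given diagnostic stream (System.err in real use).
  static void reportExecutionTime(long elapsedMillis, PrintStream diagnostics) {
    diagnostics.printf(Locale.ROOT, "Execution time: %.3f seconds%n", elapsedMillis / 1000.0);
  }

  public static void main(String[] args) {
    long start = System.nanoTime();
    // ... training would happen here ...
    long elapsedMillis = (System.nanoTime() - start) / 1_000_000;

    // Regular output (here a made-up model file name) goes to stdout.
    System.out.println("model.bin");

    // Timing is diagnostic information, so it goes to stderr.
    reportExecutionTime(elapsedMillis, System.err);
  }
}
```

Passing the stream as a parameter keeps the reporting code testable while still defaulting to `System.err` at the call site.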
[jira] [Updated] (OPENNLP-1117) Fix cmd line training time
[ https://issues.apache.org/jira/browse/OPENNLP-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-1117: --- Fix Version/s: 1.8.2 > Fix cmd line training time > -- > > Key: OPENNLP-1117 > URL: https://issues.apache.org/jira/browse/OPENNLP-1117 > Project: OpenNLP > Issue Type: Bug > Components: Command Line Interface >Affects Versions: 1.7.1 >Reporter: Peter Thygesen >Assignee: Peter Thygesen >Priority: Trivial > Fix For: 1.8.2 > > Original Estimate: 1h > Remaining Estimate: 1h > > The final execution time for training a model should be printed using > System.err not System.out. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (OPENNLP-1117) Fix cmd line training time
[ https://issues.apache.org/jira/browse/OPENNLP-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi closed OPENNLP-1117. -- > Fix cmd line training time > -- > > Key: OPENNLP-1117 > URL: https://issues.apache.org/jira/browse/OPENNLP-1117 > Project: OpenNLP > Issue Type: Bug > Components: Command Line Interface >Affects Versions: 1.7.1 >Reporter: Peter Thygesen >Assignee: Peter Thygesen >Priority: Trivial > Fix For: 1.8.2 > > Original Estimate: 1h > Remaining Estimate: 1h > > The final execution time for training a model should be printed using > System.err not System.out. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (OPENNLP-1112) Jenkins should publish daily snapshot builds
[ https://issues.apache.org/jira/browse/OPENNLP-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-1112: --- Summary: Jenkins should publish daily snapshot builds (was: Travis should publish daily snapshot builds) > Jenkins should publish daily snapshot builds > > > Key: OPENNLP-1112 > URL: https://issues.apache.org/jira/browse/OPENNLP-1112 > Project: OpenNLP > Issue Type: Improvement > Components: Build, Packaging and Test >Reporter: Joern Kottmann >Assignee: Suneel Marthi > Fix For: 1.8.2 > > > Travis should publish a snapshot build every time the master branch is > updated. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (OPENNLP-1114) Update OpenNLP Release Notes
[ https://issues.apache.org/jira/browse/OPENNLP-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi resolved OPENNLP-1114. Resolution: Fixed > Update OpenNLP Release Notes > > > Key: OPENNLP-1114 > URL: https://issues.apache.org/jira/browse/OPENNLP-1114 > Project: OpenNLP > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.8.1 >Reporter: Suneel Marthi >Assignee: Suneel Marthi > Fix For: 1.8.2 > > > The Release Notes need to be updated to account for the changes to the web > site code that need to happen prior to a Release announcement. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (OPENNLP-1114) Update OpenNLP Release Notes
Suneel Marthi created OPENNLP-1114: -- Summary: Update OpenNLP Release Notes Key: OPENNLP-1114 URL: https://issues.apache.org/jira/browse/OPENNLP-1114 Project: OpenNLP Issue Type: Improvement Components: Documentation Affects Versions: 1.8.1 Reporter: Suneel Marthi Assignee: Suneel Marthi Fix For: 1.8.2 The Release Notes need to be updated to account for the changes to the web site code that need to happen prior to a Release announcement. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (OPENNLP-47) Rewrite the CONLL06 documentation based on the tutorial
[ https://issues.apache.org/jira/browse/OPENNLP-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-47: - Fix Version/s: 1.8.2 > Rewrite the CONLL06 documentation based on the tutorial > --- > > Key: OPENNLP-47 > URL: https://issues.apache.org/jira/browse/OPENNLP-47 > Project: OpenNLP > Issue Type: Improvement > Components: Documentation >Affects Versions: tools-1.5.1-incubating >Reporter: Joern Kottmann > Labels: help-wanted > Fix For: 1.8.2 > > > The CONLL06 documentation should be rewritten to reflect the new converters > which have been added to OpenNLP since it was first written. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (OPENNLP-976) Add formats support for germeval2014
[ https://issues.apache.org/jira/browse/OPENNLP-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-976: -- Fix Version/s: 1.8.2 > Add formats support for germeval2014 > > > Key: OPENNLP-976 > URL: https://issues.apache.org/jira/browse/OPENNLP-976 > Project: OpenNLP > Issue Type: Improvement > Components: Formats >Reporter: Joern Kottmann >Assignee: Suneel Marthi > Fix For: 1.8.2 > > > Details about the format can be found here: > https://sites.google.com/site/germeval2014ner/data -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (OPENNLP-1106) Update the coref code to compile against 1.6.0
[ https://issues.apache.org/jira/browse/OPENNLP-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-1106: --- Fix Version/s: 1.8.2 > Update the coref code to compile against 1.6.0 > -- > > Key: OPENNLP-1106 > URL: https://issues.apache.org/jira/browse/OPENNLP-1106 > Project: OpenNLP > Issue Type: Improvement > Components: Coref >Reporter: Joern Kottmann >Assignee: Joern Kottmann > Fix For: 1.8.2 > > > It would be nice if the coref code compiled against an older release > version and were updated a bit so that it mostly complies with the checkstyle > rules. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (OPENNLP-1082) SentenceSampleStream should add EOS to samples if missing
[ https://issues.apache.org/jira/browse/OPENNLP-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-1082: --- Fix Version/s: 1.8.2 > SentenceSampleStream should add EOS to samples if missing > - > > Key: OPENNLP-1082 > URL: https://issues.apache.org/jira/browse/OPENNLP-1082 > Project: OpenNLP > Issue Type: Improvement > Components: Sentence Detector >Reporter: William Colen >Assignee: William Colen > Fix For: 1.8.2 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (OPENNLP-1113) evalPortugueseChunkerQnMultipleThreads and other tests can fail
[ https://issues.apache.org/jira/browse/OPENNLP-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suneel Marthi updated OPENNLP-1113: --- Fix Version/s: 1.8.2 > evalPortugueseChunkerQnMultipleThreads and other tests can fail > --- > > Key: OPENNLP-1113 > URL: https://issues.apache.org/jira/browse/OPENNLP-1113 > Project: OpenNLP > Issue Type: Test >Affects Versions: 1.8.1 >Reporter: Jeff Zemerick >Assignee: Jeff Zemerick >Priority: Minor > Fix For: 1.8.2 > > Attachments: failure.txt, success.txt > > > When running the eval-tests for the 1.8.1 tag some of the tests consistently > fail on an EC2 instance. On another virtual machine the tests consistently > pass. When the tests fail the failures are consistent with the following: > {quote}Failed tests: > > ArvoresDeitadasEval.evalPortugueseChunkerQnMultipleThreads:208->chunkerCrossEval:128 > expected:<0.9649180953528779> but was:<0.9650518197155942> > > ArvoresDeitadasEval.evalPortugueseSentenceDetectorMaxentQn:143->sentenceCrossEval:90 > expected:<0.99261110833375> but was:<0.9927505074644777> > Conll02NameFinderEval.evalSpanishOrganizationMaxentQn:390->eval:90 > expected:<0.682961897915169> but was:<0.6798418972332015> > ConllXPosTaggerEval.evalSwedishMaxentQn:152->eval:76 > expected:<0.9347595473833098> but was:<0.9322842998585573>{quote} > Both systems are Ubuntu 16.04.2 running OpenJDK 1.8.0_131 but there must be > some other differences affecting the tests. Those differences need to be > identified. 
> *VM1 (Tests Consistently _Pass_)* > Apache Maven 3.3.9 > Maven home: /usr/share/maven > Java version: 1.8.0_131, vendor: Oracle Corporation > Java home: /usr/lib/jvm/java-8-openjdk-amd64/jre > Default locale: en_US, platform encoding: UTF-8 > OS name: "linux", version: "4.4.0-1022-aws", arch: "amd64", family: "unix" > LANG=en_US.UTF-8 > *VM2 (Tests Consistently _Fail_)* > Apache Maven 3.3.9 > Maven home: /usr/share/maven > Java version: 1.8.0_131, vendor: Oracle Corporation > Java home: /usr/lib/jvm/java-8-openjdk-amd64/jre > Default locale: en_US, platform encoding: UTF-8 > OS name: "linux", version: "4.4.0-83-generic", arch: "amd64", family: "unix" > LANG=en_US.UTF-8 > This VM also consistently fails when using Oracle JDK: > Java version: 1.8.0_131, vendor: Oracle Corporation > Java home: /usr/lib/jvm/java-8-oracle/jre > *VM3 (Tests Consistently _Pass_)* > Apache Maven 3.3.9 > Maven home: /usr/share/maven > Java version: 1.8.0_131, vendor: Oracle Corporation > Java home: /usr/lib/jvm/java-8-openjdk-amd64/jre > Default locale: en_US, platform encoding: UTF-8 > OS name: "linux", version: "4.4.0-83-generic", arch: "amd64", family: "unix" -- This message was sent by Atlassian JIRA (v6.4.14#64029)