[jira] [Assigned] (OPENNLP-1381) OpenJDK 18+: CLITest fails with java.lang.UnsupportedOperationException: The Security Manager is deprecated and will be removed in a future release

2022-11-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned OPENNLP-1381:
--

Assignee: (was: Suneel Marthi)

> OpenJDK 18+: CLITest fails with java.lang.UnsupportedOperationException: The 
> Security Manager is deprecated and will be removed in a future release
> ---
>
> Key: OPENNLP-1381
> URL: https://issues.apache.org/jira/browse/OPENNLP-1381
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Command Line Interface
>Affects Versions: 2.1.0
> Environment: MacOS  Monterey 12.5 Intel (same issue in M1 chip)
> brew-installed OpenJDK 18.0.2
>Reporter: Bertrand Rigaldies
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: Screen Shot 2022-07-31 at 9.10.48 PM.png, Screen Shot 
> 2022-07-31 at 9.11.09 PM.png
>
>
> As of OpenJDK 18, the Security Manager has been deprecated (see JEP 411, 
> https://openjdk.org/jeps/411), which causes all tests in CLITest.java to fail:
> java.lang.UnsupportedOperationException: The Security Manager is deprecated 
> and will be removed in a future release
>     at java.base/java.lang.System.setSecurityManager(System.java:416)
>     at 
> opennlp.tools.cmdline.CLITest.installNoExitSecurityManager(CLITest.java:66)
>     at 
> java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:577)
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at 
> org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
>     at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>     at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>     at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>     at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
>     at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater$1.execute(IdeaTestRunner.java:38)
>     at 
> com.intellij.rt.execution.junit.TestsRepeater.repeat(TestsRepeater.java:11)
>     at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:35)
>     at 
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:235)
>     at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:54)
>  
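One way a test can check a command-line exit status without installing a Security Manager is to have the entry point return the status and leave the call to System.exit to a thin wrapper. The sketch below only illustrates that idea. ExitCodeSketch, run and the status values are hypothetical names, not OpenNLP's CLI classes, and this is not necessarily how the issue was fixed.

```java
// Hypothetical sketch: the names are illustrative and not part of OpenNLP.
final class ExitCodeSketch {

  // Returns the exit status instead of calling System.exit, so a test can
  // assert on it without System.setSecurityManager.
  static int run(String[] args) {
    if (args.length == 0) {
      System.err.println("usage: tool <name>");
      return 1; // assumed "bad usage" status
    }
    System.out.println("running " + args[0]);
    return 0;
  }

  public static void main(String[] args) {
    // A real entry point would end with System.exit(run(args)).
    System.out.println("status without args: " + run(new String[0]));
    System.out.println("status with args: " + run(new String[] {"Doccat"}));
  }
}
```

A test can then call run directly and compare the returned status, which works the same way on JDK 17 and on JDK 18 and later.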



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (OPENNLP-1381) OpenJDK 18+: CLITest fails with java.lang.UnsupportedOperationException: The Security Manager is deprecated and will be removed in a future release

2022-11-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1381:
---
Affects Version/s: 2.1.0

> OpenJDK 18+: CLITest fails with java.lang.UnsupportedOperationException: The 
> Security Manager is deprecated and will be removed in a future release
> ---
>
> Key: OPENNLP-1381
> URL: https://issues.apache.org/jira/browse/OPENNLP-1381
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Command Line Interface
>Affects Versions: 2.1.0
> Environment: MacOS  Monterey 12.5 Intel (same issue in M1 chip)
> brew-installed OpenJDK 18.0.2
>Reporter: Bertrand Rigaldies
>Assignee: Suneel Marthi
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: Screen Shot 2022-07-31 at 9.10.48 PM.png, Screen Shot 
> 2022-07-31 at 9.11.09 PM.png
>
>
> As of OpenJDK 18, the Security Manager has been deprecated (see JEP 411, 
> https://openjdk.org/jeps/411), which causes all tests in CLITest.java to fail:
> java.lang.UnsupportedOperationException: The Security Manager is deprecated 
> and will be removed in a future release
>     at java.base/java.lang.System.setSecurityManager(System.java:416)
>     at 
> opennlp.tools.cmdline.CLITest.installNoExitSecurityManager(CLITest.java:66)
>     at 
> java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:577)
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at 
> org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
>     at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>     at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>     at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>     at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
>     at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater$1.execute(IdeaTestRunner.java:38)
>     at 
> com.intellij.rt.execution.junit.TestsRepeater.repeat(TestsRepeater.java:11)
>     at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:35)
>     at 
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:235)
>     at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:54)
>  





[jira] [Updated] (OPENNLP-1381) OpenJDK 18+: CLITest fails with java.lang.UnsupportedOperationException: The Security Manager is deprecated and will be removed in a future release

2022-11-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1381:
---
Fix Version/s: 2.1.1

> OpenJDK 18+: CLITest fails with java.lang.UnsupportedOperationException: The 
> Security Manager is deprecated and will be removed in a future release
> ---
>
> Key: OPENNLP-1381
> URL: https://issues.apache.org/jira/browse/OPENNLP-1381
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Command Line Interface
> Environment: MacOS  Monterey 12.5 Intel (same issue in M1 chip)
> brew-installed OpenJDK 18.0.2
>Reporter: Bertrand Rigaldies
>Assignee: Suneel Marthi
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: Screen Shot 2022-07-31 at 9.10.48 PM.png, Screen Shot 
> 2022-07-31 at 9.11.09 PM.png
>
>
> As of OpenJDK 18, the Security Manager has been deprecated (see JEP 411, 
> https://openjdk.org/jeps/411), which causes all tests in CLITest.java to fail:
> java.lang.UnsupportedOperationException: The Security Manager is deprecated 
> and will be removed in a future release
>     at java.base/java.lang.System.setSecurityManager(System.java:416)
>     at 
> opennlp.tools.cmdline.CLITest.installNoExitSecurityManager(CLITest.java:66)
>     at 
> java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:577)
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at 
> org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
>     at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>     at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>     at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>     at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
>     at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater$1.execute(IdeaTestRunner.java:38)
>     at 
> com.intellij.rt.execution.junit.TestsRepeater.repeat(TestsRepeater.java:11)
>     at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:35)
>     at 
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:235)
>     at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:54)
>  





[jira] [Assigned] (OPENNLP-1381) OpenJDK 18+: CLITest fails with java.lang.UnsupportedOperationException: The Security Manager is deprecated and will be removed in a future release

2022-11-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned OPENNLP-1381:
--

Assignee: Suneel Marthi

> OpenJDK 18+: CLITest fails with java.lang.UnsupportedOperationException: The 
> Security Manager is deprecated and will be removed in a future release
> ---
>
> Key: OPENNLP-1381
> URL: https://issues.apache.org/jira/browse/OPENNLP-1381
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Command Line Interface
> Environment: MacOS  Monterey 12.5 Intel (same issue in M1 chip)
> brew-installed OpenJDK 18.0.2
>Reporter: Bertrand Rigaldies
>Assignee: Suneel Marthi
>Priority: Major
> Attachments: Screen Shot 2022-07-31 at 9.10.48 PM.png, Screen Shot 
> 2022-07-31 at 9.11.09 PM.png
>
>
> As of OpenJDK 18, the Security Manager has been deprecated (see JEP 411, 
> https://openjdk.org/jeps/411), which causes all tests in CLITest.java to fail:
> java.lang.UnsupportedOperationException: The Security Manager is deprecated 
> and will be removed in a future release
>     at java.base/java.lang.System.setSecurityManager(System.java:416)
>     at 
> opennlp.tools.cmdline.CLITest.installNoExitSecurityManager(CLITest.java:66)
>     at 
> java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:577)
>     at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at 
> org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
>     at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>     at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>     at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>     at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
>     at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater$1.execute(IdeaTestRunner.java:38)
>     at 
> com.intellij.rt.execution.junit.TestsRepeater.repeat(TestsRepeater.java:11)
>     at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:35)
>     at 
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:235)
>     at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:54)
>  





[jira] [Resolved] (OPENNLP-1397) Build should fail fast if an unsupported JDK is used

2022-11-22 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved OPENNLP-1397.

Fix Version/s: 2.1.0
   Resolution: Fixed

Configured the maven-enforcer-plugin to run the JDK version check during the 
'validate' phase.

> Build should fail fast if an unsupported JDK is used
> 
>
> Key: OPENNLP-1397
> URL: https://issues.apache.org/jira/browse/OPENNLP-1397
> Project: OpenNLP
>  Issue Type: Task
>  Components: Build, Packaging and Test
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Jeff Zemerick
>Assignee: Suneel Marthi
>Priority: Minor
> Fix For: 2.1.0
>
>
> The build should fail fast if an unsupported JDK is used. Check to see if the 
> Maven Enforcer plugin is configured correctly or if something else is needed.
> This issue came about from [~smarthi] testing the OpenNLP 2.1.0 RC1 using 
> Amazon Corretto 8. The build did not fail until a failed unit test.





[jira] [Updated] (OPENNLP-1271) Illegal Argument Exception

2020-01-29 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1271:
---
Fix Version/s: 1.9.3

> Illegal Argument Exception
> --
>
> Key: OPENNLP-1271
> URL: https://issues.apache.org/jira/browse/OPENNLP-1271
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Name Finder
>Reporter: Raza Abbas
>Priority: Major
> Fix For: 1.9.3
>
>
> I am using this library in some production code. I get the following 
> exception intermittently, so I am not able to reproduce it reliably.
>  
> java.lang.IllegalArgumentException: The span [18..23) is outside the given 
> text which has length 12!
> at opennlp.tools.util.Span.getCoveredText(Span.java:231)
> at opennlp.tools.util.Span.spansToStrings(Span.java:351)
> at 
> opennlp.tools.tokenize.AbstractTokenizer.tokenize(AbstractTokenizer.java:25)
> at opennlp.tools.tokenize.TokenizerME.tokenize(TokenizerME.java:76)
>  
>  
> This seems like an internal OpenNLP issue, and not how I'm using the library. 
> Any help would be appreciated. Thanks. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (OPENNLP-1219) change private instance variable featureGenerators to protected in DefaultNameContextGenerator

2020-01-25 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1219.
--

> change private instance variable featureGenerators to protected in 
> DefaultNameContextGenerator
> --
>
> Key: OPENNLP-1219
> URL: https://issues.apache.org/jira/browse/OPENNLP-1219
> Project: OpenNLP
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 1.9.1
>
>
> TokenNameFinderTrainer allows users to customize TokenNameFinderFactory via 
> -factory option. As I want to override 
> DefaultNameContextGenerator.getContext(), I made the sub-class of 
> TokenNameFinderFactory and created an instance of the sub-class of 
> DefaultNameContextGenerator in the constructor of my TokenNameFinderFactory. 
> However, I couldn't implement getContext() method of my 
> DefaultNameContextGenerator because I couldn't access private member 
> featureGenerators.
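The access problem described above can be shown with two simplified, hypothetical classes. They are stand-ins and not the OpenNLP DefaultNameContextGenerator. The feature generators are plain strings here only to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins; not the OpenNLP classes.
class BaseContextGenerator {
  // protected rather than private, so that subclasses can read the field
  protected final List<String> featureGenerators;

  BaseContextGenerator(List<String> featureGenerators) {
    this.featureGenerators = featureGenerators;
  }

  List<String> getContext(String token) {
    List<String> features = new ArrayList<>();
    for (String generator : featureGenerators) {
      features.add(generator + "=" + token);
    }
    return features;
  }
}

class CustomContextGenerator extends BaseContextGenerator {
  CustomContextGenerator(List<String> featureGenerators) {
    super(featureGenerators);
  }

  // With a private field in the base class, this override could not
  // iterate over featureGenerators.
  @Override
  List<String> getContext(String token) {
    List<String> features = new ArrayList<>();
    for (String generator : featureGenerators) {
      features.add(generator + "=" + token.toLowerCase());
    }
    features.add("custom");
    return features;
  }

  public static void main(String[] args) {
    System.out.println(
        new CustomContextGenerator(Arrays.asList("w", "sfx")).getContext("Paris"));
  }
}
```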





[jira] [Commented] (OPENNLP-1270) Add new languages to the language detector

2020-01-25 Thread Suneel Marthi (Jira)


[ 
https://issues.apache.org/jira/browse/OPENNLP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023639#comment-17023639
 ] 

Suneel Marthi commented on OPENNLP-1270:


Could we also look at the Europarl corpus, maybe?  
[https://www.statmt.org/europarl/]

> Add new languages to the language detector
> --
>
> Key: OPENNLP-1270
> URL: https://issues.apache.org/jira/browse/OPENNLP-1270
> Project: OpenNLP
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
> Fix For: 1.9.3
>
> Attachments: report.txt, report.txt
>
>
> Leipzig has several other languages that might be useful to add to the 
> language detector.  I've selected some with > 10k sentences.  Once I build 
> the model and evaluate performance, I'll share the reports, the model and a 
> tgz of the *-sentences.txt files.





[jira] [Resolved] (OPENNLP-1264) Trivial fixes to enable building on, gasp, Windows

2020-01-25 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved OPENNLP-1264.

Fix Version/s: 1.9.3
 Assignee: Tim Allison
   Resolution: Fixed

> Trivial fixes to enable building on, gasp, Windows
> --
>
> Key: OPENNLP-1264
> URL: https://issues.apache.org/jira/browse/OPENNLP-1264
> Project: OpenNLP
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Trivial
> Fix For: 1.9.3
>
>
> I had to change 3 things to get a clean build on Windows...I'm not sure the 
> solutions are the most elegant, and these may be user error
> 1) I had to turn off (fail on error) in style checking because of a problem 
> with newlines. On nearly every file, I got this failure.
> {noformat}
> [ERROR] src\test\java\opennlp\tools\util\VersionTest.java:[0] (misc) 
> NewlineAtEndOfFile: File does not end with a newline.
> [WARNING] checkstyle:check violations detected but failOnViolation set to 
> false
> {noformat}
> 2) {{LanguageDetectorEvaluatorTest#processSample}} fails because '\n' is 
> expected, but Windows, of course, writes '\r\n' with {{println}}
> 3) I intentionally have a space in the directory structure to my IdeaProjects 
> directory, which can cause problems on Windows when finding paths.  There are 
> two areas where this happens.





[jira] [Closed] (OPENNLP-1264) Trivial fixes to enable building on, gasp, Windows

2020-01-25 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1264.
--

> Trivial fixes to enable building on, gasp, Windows
> --
>
> Key: OPENNLP-1264
> URL: https://issues.apache.org/jira/browse/OPENNLP-1264
> Project: OpenNLP
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Trivial
> Fix For: 1.9.3
>
>
> I had to change 3 things to get a clean build on Windows...I'm not sure the 
> solutions are the most elegant, and these may be user error
> 1) I had to turn off (fail on error) in style checking because of a problem 
> with newlines. On nearly every file, I got this failure.
> {noformat}
> [ERROR] src\test\java\opennlp\tools\util\VersionTest.java:[0] (misc) 
> NewlineAtEndOfFile: File does not end with a newline.
> [WARNING] checkstyle:check violations detected but failOnViolation set to 
> false
> {noformat}
> 2) {{LanguageDetectorEvaluatorTest#processSample}} fails because '\n' is 
> expected, but Windows, of course, writes '\r\n' with {{println}}
> 3) I intentionally have a space in the directory structure to my IdeaProjects 
> directory, which can cause problems on Windows when finding paths.  There are 
> two areas where this happens.





[jira] [Resolved] (OPENNLP-1269) Add alternate to NGramModel that uses straight Strings rather than StringList

2020-01-25 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved OPENNLP-1269.

Fix Version/s: 1.9.3
 Assignee: Jeffrey T. Zemerick
   Resolution: Fixed

> Add alternate to NGramModel that uses straight Strings rather than StringList
> -
>
> Key: OPENNLP-1269
> URL: https://issues.apache.org/jira/browse/OPENNLP-1269
> Project: OpenNLP
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Jeffrey T. Zemerick
>Priority: Trivial
> Fix For: 1.9.3
>
>
> On OPENNLP-1265, I found that we could halve the language detection time on 
> longer documents if we didn't create a StringList for every ngram, but rather 
> used a plain String.
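As a rough illustration of the idea, the sketch below counts character ngrams with plain String keys, so each ngram costs one substring and no wrapper list object. StringNGramSketch is a hypothetical name and not the class added for this issue. The sketch makes no claim about the actual speed-up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the OpenNLP NGramModel or its replacement.
final class StringNGramSketch {

  // Counts character n-grams of length n using plain String keys.
  static Map<String, Integer> count(CharSequence text, int n) {
    Map<String, Integer> counts = new HashMap<>();
    for (int i = 0; i + n <= text.length(); i++) {
      String gram = text.subSequence(i, i + n).toString();
      counts.merge(gram, 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    System.out.println(count("banana", 2));
  }
}
```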





[jira] [Closed] (OPENNLP-1269) Add alternate to NGramModel that uses straight Strings rather than StringList

2020-01-25 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1269.
--

> Add alternate to NGramModel that uses straight Strings rather than StringList
> -
>
> Key: OPENNLP-1269
> URL: https://issues.apache.org/jira/browse/OPENNLP-1269
> Project: OpenNLP
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Jeffrey T. Zemerick
>Priority: Trivial
> Fix For: 1.9.3
>
>
> On OPENNLP-1265, I found that we could halve the language detection time on 
> longer documents if we didn't create a StringList for every ngram, but rather 
> used a plain String.





[jira] [Resolved] (OPENNLP-1272) Add support for Catalan and Indonesian stemmers

2020-01-25 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved OPENNLP-1272.

  Assignee: Jeffrey T. Zemerick
Resolution: Fixed

> Add support for Catalan and Indonesian stemmers
> ---
>
> Key: OPENNLP-1272
> URL: https://issues.apache.org/jira/browse/OPENNLP-1272
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Stemmer
>Reporter: Vlad Ciotlausi
>Assignee: Jeffrey T. Zemerick
>Priority: Minor
>  Labels: Stemmer
> Fix For: 1.9.3
>
>
> Added Indonesian and Catalan stemmers plus some minor fixes.
>  
> This PR includes:
>  * Creating the Java code based on the .sbl files and adding them to the 
> stemmer folder
>  * Updating relevant classes to support the new stemmers
>  * Adding tests for the new stemmers
>  * Changing the Romanian unit tests for stemmers because they didn't test a 
> Romanian word





[jira] [Closed] (OPENNLP-1272) Add support for Catalan and Indonesian stemmers

2020-01-25 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1272.
--

> Add support for Catalan and Indonesian stemmers
> ---
>
> Key: OPENNLP-1272
> URL: https://issues.apache.org/jira/browse/OPENNLP-1272
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Stemmer
>Reporter: Vlad Ciotlausi
>Assignee: Jeffrey T. Zemerick
>Priority: Minor
>  Labels: Stemmer
> Fix For: 1.9.3
>
>
> Added Indonesian and Catalan stemmers plus some minor fixes.
>  
> This PR includes:
>  * Creating the Java code based on the .sbl files and adding them to the 
> stemmer folder
>  * Updating relevant classes to support the new stemmers
>  * Adding tests for the new stemmers
>  * Changing the Romanian unit tests for stemmers because they didn't test a 
> Romanian word





[jira] [Resolved] (OPENNLP-1258) implement Serializable in langdetect and normalize

2020-01-25 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved OPENNLP-1258.

Resolution: Fixed

> implement Serializable in langdetect and normalize
> --
>
> Key: OPENNLP-1258
> URL: https://issues.apache.org/jira/browse/OPENNLP-1258
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Language Detector
>Reporter: Lucas Avanço
>Priority: Major
> Fix For: 1.9.3
>
>
> Some classes in langdetect and normalizer need to implement Serializable so 
> that language detection models can be saved.





[jira] [Closed] (OPENNLP-1258) implement Serializable in langdetect and normalize

2020-01-25 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1258.
--

> implement Serializable in langdetect and normalize
> --
>
> Key: OPENNLP-1258
> URL: https://issues.apache.org/jira/browse/OPENNLP-1258
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Language Detector
>Reporter: Lucas Avanço
>Priority: Major
> Fix For: 1.9.3
>
>
> Some classes in langdetect and normalizer need to implement Serializable so 
> that language detection models can be saved.





[jira] [Resolved] (OPENNLP-1261) Language Detector fails to predict language on long input texts

2020-01-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved OPENNLP-1261.

Resolution: Won't Fix

> Language Detector fails to predict language on long input texts
> ---
>
> Key: OPENNLP-1261
> URL: https://issues.apache.org/jira/browse/OPENNLP-1261
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Language Detector
>Reporter: Jörn Kottmann
>Assignee: Jörn Kottmann
>Priority: Major
> Fix For: 1.9.3
>
> Attachments: langid_plus_minus_rollups.zip, leipzig_1000-sents.zip, 
> opennlp_as_is_vs_1261.zip
>
>
> If the input text is very long, e.g. 100k chars, then the lang detect 
> component fails to detect the language correctly, even though the text is 
> only written in one language.
> This issue was tracked down to the context generator, where the counts of the 
> ngrams are ignored.





[jira] [Closed] (OPENNLP-1261) Language Detector fails to predict language on long input texts

2020-01-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1261.
--

> Language Detector fails to predict language on long input texts
> ---
>
> Key: OPENNLP-1261
> URL: https://issues.apache.org/jira/browse/OPENNLP-1261
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Language Detector
>Reporter: Jörn Kottmann
>Assignee: Jörn Kottmann
>Priority: Major
> Fix For: 1.9.3
>
> Attachments: langid_plus_minus_rollups.zip, leipzig_1000-sents.zip, 
> opennlp_as_is_vs_1261.zip
>
>
> If the input text is very long, e.g. 100k chars, then the lang detect 
> component fails to detect the language correctly, even though the text is 
> only written in one language.
> This issue was tracked down to the context generator, where the counts of the 
> ngrams are ignored.





[jira] [Updated] (OPENNLP-1261) Language Detector fails to predict language on long input texts

2020-01-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1261:
---
Fix Version/s: 1.9.3

> Language Detector fails to predict language on long input texts
> ---
>
> Key: OPENNLP-1261
> URL: https://issues.apache.org/jira/browse/OPENNLP-1261
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Language Detector
>Reporter: Jörn Kottmann
>Assignee: Jörn Kottmann
>Priority: Major
> Fix For: 1.9.3
>
> Attachments: langid_plus_minus_rollups.zip, leipzig_1000-sents.zip, 
> opennlp_as_is_vs_1261.zip
>
>
> If the input text is very long, e.g. 100k chars, then the lang detect 
> component fails to detect the language correctly, even though the text is 
> only written in one language.
> This issue was tracked down to the context generator, where the counts of the 
> ngrams are ignored.





[jira] [Updated] (OPENNLP-1234) Dictionary.asStringSet() is returning single tokens

2020-01-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1234:
---
Fix Version/s: 1.9.3

> Dictionary.asStringSet() is returning single tokens 
> 
>
> Key: OPENNLP-1234
> URL: https://issues.apache.org/jira/browse/OPENNLP-1234
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Name Finder
>Reporter: Evandro Fonseca
>Priority: Major
>  Labels: easyfix
> Fix For: 1.9.3
>
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> When we use the method Dictionary.asStringSet(), it returns a set of single 
> tokens.
> For example: European Union -> European. Basically, it returns just the first 
> token of each entry.
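
The behaviour the reporter seems to expect can be sketched as follows. This is an assumption about the intended result, not the actual fix; `EntryStrings` is a hypothetical stand-in that models dictionary entries as plain token arrays.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a dictionary whose entries are token arrays.
final class EntryStrings {

  // Joins all tokens of each entry with a space instead of keeping only the first token.
  static Set<String> asStringSet(List<String[]> entries) {
    Set<String> result = new LinkedHashSet<>();
    for (String[] tokens : entries) {
      result.add(String.join(" ", tokens));
    }
    return result;
  }

  public static void main(String[] args) {
    List<String[]> entries =
        List.of(new String[] {"European", "Union"}, new String[] {"Paris"});
    System.out.println(asStringSet(entries)); // prints [European Union, Paris]
  }
}
```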





[jira] [Commented] (OPENNLP-1286) Updating from 1.7.0 to 1.9.1 breaks

2020-01-24 Thread Suneel Marthi (Jira)


[ 
https://issues.apache.org/jira/browse/OPENNLP-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023356#comment-17023356
 ] 

Suneel Marthi commented on OPENNLP-1286:


Could you confirm that this is still the case? A reproducible test case would 
be helpful.

> Updating from 1.7.0 to 1.9.1 breaks
> ---
>
> Key: OPENNLP-1286
> URL: https://issues.apache.org/jira/browse/OPENNLP-1286
> Project: OpenNLP
>  Issue Type: Bug
>Affects Versions: 1.8.0, 1.8.1, 1.8.2, 1.8.3, 1.8.4
>Reporter: xia0c
>Priority: Major
>
> When I try to upgrade opennlp-tools from 1.7.0 to a later version (1.8.0 or 
> newer), the following code breaks:
> {code:java}
> public void demo(String summary) {
>   
>   try {   
>   inputStream = new FileInputStream(Paths.get(bin).toFile());
>   DoccatModel doccatModel = new DoccatModel(inputStream);
>   DocumentCategorizerME myCategorizer = new DocumentCategorizerME(doccatModel);
>   double[] outcomes = myCategorizer.categorize(summary);
>   String category = myCategorizer.getBestCategory(outcomes);
>   
>   LOGGER.info(category);
>   } catch (IOException e) {
>   LOGGER.error(ExceptionUtils.getStackTrace(e));  
>   } 
> }
> {code}
> The code should pass, but it throws an error:
> {code:java}
> incompatible types: java.lang.String cannot be converted to java.lang.String[]
> {code}
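
The compiler message indicates that `categorize(...)` expects a `String[]` of tokens rather than a single `String`. A minimal sketch of preparing such an array with the JDK alone is shown below; `SummaryTokens` is a hypothetical helper, and splitting on whitespace is an assumption that may be too crude for real input.

```java
import java.util.Arrays;

// Hypothetical helper, not part of OpenNLP: turns raw text into a String[] of tokens.
final class SummaryTokens {

  // Splits on runs of whitespace; a proper tokenizer may be the better choice.
  static String[] tokenize(String summary) {
    String trimmed = summary.trim();
    return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
  }

  public static void main(String[] args) {
    String[] tokens = tokenize("  OpenNLP  breaks after upgrade ");
    System.out.println(Arrays.toString(tokens)); // prints [OpenNLP, breaks, after, upgrade]
    // With OpenNLP on the classpath, the call from the report would then presumably read:
    // double[] outcomes = myCategorizer.categorize(tokens);
  }
}
```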





[jira] [Closed] (OPENNLP-32) Write more documentation for the parser

2020-01-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-32?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-32.


> Write more documentation for the parser
> ---
>
> Key: OPENNLP-32
> URL: https://issues.apache.org/jira/browse/OPENNLP-32
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Jörn Kottmann
>Assignee: Suneel Marthi
>Priority: Major
>  Labels: help-wanted
> Fix For: 1.9.3
>
>
> Write more documentation for the parser. It should cover the same topics as the
> documentation for the other components.
> The following sections are still missing:
> - A general introduction that explains what parsing is, ideally with a few
>   images of parse trees
> - An explanation of how to navigate the parse tree with the Parse class, based
>   on a sample parse tree
> - A section about how the training API can be used
> - Remove all TODOs, and open Jira issues for any that are not resolved by this
>   issue





[jira] [Updated] (OPENNLP-32) Write more documentation for the parser

2020-01-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-32?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-32:
-
Fix Version/s: 1.9.3

> Write more documentation for the parser
> ---
>
> Key: OPENNLP-32
> URL: https://issues.apache.org/jira/browse/OPENNLP-32
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Jörn Kottmann
>Assignee: Suneel Marthi
>Priority: Major
>  Labels: help-wanted
> Fix For: 1.9.3
>
>
> Write more documentation for the parser. It should cover the same topics as the
> documentation for the other components.
> The following sections are still missing:
> - A general introduction that explains what parsing is, ideally with a few
>   images of parse trees
> - An explanation of how to navigate the parse tree with the Parse class, based
>   on a sample parse tree
> - A section about how the training API can be used
> - Remove all TODOs, and open Jira issues for any that are not resolved by this
>   issue





[jira] [Assigned] (OPENNLP-32) Write more documentation for the parser

2020-01-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-32?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned OPENNLP-32:


Assignee: Suneel Marthi

> Write more documentation for the parser
> ---
>
> Key: OPENNLP-32
> URL: https://issues.apache.org/jira/browse/OPENNLP-32
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Jörn Kottmann
>Assignee: Suneel Marthi
>Priority: Major
>  Labels: help-wanted
>
> Write more documentation for the parser. It should cover the same topics as the
> documentation for the other components.
> The following sections are still missing:
> - A general introduction that explains what parsing is, ideally with a few
>   images of parse trees
> - An explanation of how to navigate the parse tree with the Parse class, based
>   on a sample parse tree
> - A section about how the training API can be used
> - Remove all TODOs, and open Jira issues for any that are not resolved by this
>   issue





[jira] [Closed] (OPENNLP-1268) StringUtil.toLowerCase() should lowercase codepoints, not chars

2020-01-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1268.
--

> StringUtil.toLowerCase() should lowercase codepoints, not chars
> ---
>
> Key: OPENNLP-1268
> URL: https://issues.apache.org/jira/browse/OPENNLP-1268
> Project: OpenNLP
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Trivial
> Fix For: 1.9.3
>
>
> {{StringUtil#toLowerCase()}} should run {{Character.toLowerCase()}} on code 
> points.  It currently fails to lowercase characters beyond the BMP.
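
A minimal sketch of code-point-based lowercasing with the JDK is shown below. It is not the OpenNLP implementation; `CodePointLowercase` is a hypothetical class. Lowercasing one `char` at a time leaves surrogate halves unchanged, which is why supplementary characters are missed.

```java
// Hypothetical helper, not OpenNLP's StringUtil: lowercases per code point.
final class CodePointLowercase {

  // Applies Character.toLowerCase(int) to every code point, so surrogate pairs are handled as one unit.
  static String toLowerCase(CharSequence text) {
    StringBuilder sb = new StringBuilder(text.length());
    text.codePoints().forEach(cp -> sb.appendCodePoint(Character.toLowerCase(cp)));
    return sb.toString();
  }

  public static void main(String[] args) {
    // U+10400 (DESERET CAPITAL LETTER LONG I) lies outside the BMP; its lowercase form is U+10428.
    String upper = new String(Character.toChars(0x10400));
    String lower = toLowerCase(upper);
    System.out.println(Integer.toHexString(lower.codePointAt(0))); // prints 10428
  }
}
```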





[jira] [Resolved] (OPENNLP-1268) StringUtil.toLowerCase() should lowercase codepoints, not chars

2020-01-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved OPENNLP-1268.

Fix Version/s: 1.9.3
   Resolution: Fixed

> StringUtil.toLowerCase() should lowercase codepoints, not chars
> ---
>
> Key: OPENNLP-1268
> URL: https://issues.apache.org/jira/browse/OPENNLP-1268
> Project: OpenNLP
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Trivial
> Fix For: 1.9.3
>
>
> {{StringUtil#toLowerCase()}} should run {{Character.toLowerCase()}} on code 
> points.  It currently fails to lowercase characters beyond the BMP.





[jira] [Assigned] (OPENNLP-1268) StringUtil.toLowerCase() should lowercase codepoints, not chars

2020-01-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned OPENNLP-1268:
--

Assignee: Tim Allison

> StringUtil.toLowerCase() should lowercase codepoints, not chars
> ---
>
> Key: OPENNLP-1268
> URL: https://issues.apache.org/jira/browse/OPENNLP-1268
> Project: OpenNLP
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Trivial
>
> {{StringUtil#toLowerCase()}} should run {{Character.toLowerCase()}} on code 
> points.  It currently fails to lowercase characters beyond the BMP.





[jira] [Updated] (OPENNLP-1258) implement Serializable in langdetect and normalize

2020-01-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1258:
---
Fix Version/s: 1.9.3

> implement Serializable in langdetect and normalize
> --
>
> Key: OPENNLP-1258
> URL: https://issues.apache.org/jira/browse/OPENNLP-1258
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Language Detector
>Reporter: Lucas Avanço
>Priority: Major
> Fix For: 1.9.3
>
>
> Some classes in langdetect and normalizer need to implement Serializable 
> so that language detection models can be saved.





[jira] [Assigned] (OPENNLP-1265) Improve speed of lang detect

2020-01-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned OPENNLP-1265:
--

Assignee: Tim Allison

> Improve speed of lang detect
> 
>
> Key: OPENNLP-1265
> URL: https://issues.apache.org/jira/browse/OPENNLP-1265
> Project: OpenNLP
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
>
> Over on TIKA-2790, we found that opennlp's language detector is far, far 
> slower than Optimaize and yalder.
> Let's use this ticket to see what we can do to improve lang detect's speed.





[jira] [Updated] (OPENNLP-1265) Improve speed of lang detect

2020-01-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1265:
---
Fix Version/s: 1.9.3

> Improve speed of lang detect
> 
>
> Key: OPENNLP-1265
> URL: https://issues.apache.org/jira/browse/OPENNLP-1265
> Project: OpenNLP
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
> Fix For: 1.9.3
>
>
> Over on TIKA-2790, we found that opennlp's language detector is far, far 
> slower than Optimaize and yalder.
> Let's use this ticket to see what we can do to improve lang detect's speed.





[jira] [Assigned] (OPENNLP-1270) Add new languages to the language detector

2020-01-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned OPENNLP-1270:
--

Assignee: Tim Allison

> Add new languages to the language detector
> --
>
> Key: OPENNLP-1270
> URL: https://issues.apache.org/jira/browse/OPENNLP-1270
> Project: OpenNLP
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
> Attachments: report.txt, report.txt
>
>
> Leipzig has several other languages that might be useful to add to the 
> language detector.  I've selected some with > 10k sentences.  Once I build 
> the model and evaluate performance, I'll share the reports, the model and a 
> tgz of the *-sentences.txt files.





[jira] [Assigned] (OPENNLP-1266) Limit normalization regexes in UrlCharSequenceNormalizer

2020-01-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned OPENNLP-1266:
--

Assignee: Tim Allison

> Limit normalization regexes in UrlCharSequenceNormalizer
> 
>
> Key: OPENNLP-1266
> URL: https://issues.apache.org/jira/browse/OPENNLP-1266
> Project: OpenNLP
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
> Fix For: 1.9.3
>
>
> The {{MAIL_REGEX}} in UrlCharSequenceNormalizer is unbounded and requires 
> backtracking. In rare cases, this can cause eye-opening performance costs.
>  
> I tested the other regexes in the other normalizers.  I could be wrong, but 
> they don't appear to require backtracking, and there are no surprising 
> performance costs.
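
To illustrate what "bounding" a regex can look like, here is a minimal sketch in which every quantifier has an explicit upper limit. The pattern is hypothetical and is not the `MAIL_REGEX` from UrlCharSequenceNormalizer; the limits (64, 63, 8) are assumptions chosen for the example.

```java
import java.util.regex.Pattern;

// Hypothetical pattern for illustration; it is not the MAIL_REGEX used by OpenNLP.
final class BoundedMailRegex {

  // Every quantifier has an explicit upper bound, which caps how far the engine can backtrack.
  private static final Pattern MAIL = Pattern.compile(
      "[A-Za-z0-9._-]{1,64}@[A-Za-z0-9-]{1,63}(?:\\.[A-Za-z0-9-]{1,63}){1,8}");

  // Replaces every match with a fixed placeholder.
  static String normalize(CharSequence text) {
    return MAIL.matcher(text).replaceAll("<MAIL>");
  }

  public static void main(String[] args) {
    System.out.println(normalize("mail john.doe@example.com now")); // prints mail <MAIL> now
  }
}
```

With bounded quantifiers, the work done at any one starting position is limited by the bounds rather than by the length of the input, which is the property the ticket is after.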





[jira] [Updated] (OPENNLP-1266) Limit normalization regexes in UrlCharSequenceNormalizer

2020-01-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1266:
---
Fix Version/s: 1.9.3

> Limit normalization regexes in UrlCharSequenceNormalizer
> 
>
> Key: OPENNLP-1266
> URL: https://issues.apache.org/jira/browse/OPENNLP-1266
> Project: OpenNLP
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
> Fix For: 1.9.3
>
>
> The {{MAIL_REGEX}} in UrlCharSequenceNormalizer is unbounded and requires 
> backtracking. In rare cases, this can cause eye-opening performance costs.
>  
> I tested the other regexes in the other normalizers.  I could be wrong, but 
> they don't appear to require backtracking, and there are no surprising 
> performance costs.





[jira] [Updated] (OPENNLP-1270) Add new languages to the language detector

2020-01-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1270:
---
Fix Version/s: 1.9.3

> Add new languages to the language detector
> --
>
> Key: OPENNLP-1270
> URL: https://issues.apache.org/jira/browse/OPENNLP-1270
> Project: OpenNLP
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
> Fix For: 1.9.3
>
> Attachments: report.txt, report.txt
>
>
> Leipzig has several other languages that might be useful to add to the 
> language detector.  I've selected some with > 10k sentences.  Once I build 
> the model and evaluate performance, I'll share the reports, the model and a 
> tgz of the *-sentences.txt files.





[jira] [Closed] (OPENNLP-1267) Allow the LanguageDetector to stop before processing the full string

2020-01-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1267.
--

> Allow the LanguageDetector to stop before processing the full string
> 
>
> Key: OPENNLP-1267
> URL: https://issues.apache.org/jira/browse/OPENNLP-1267
> Project: OpenNLP
>  Issue Type: Improvement
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
> Fix For: 1.9.3
>
>
> On TIKA-2790, I found that Yalder is stopping after computing character 
> ngrams on roughly the first 60 characters.  That _likely_ explains its 
> impressive speed.  Let's make this "stopping short" feature available in 
> OpenNLP.
>  
> Ideally, the language detector wouldn't copy the full String, it wouldn't 
> normalize the full String, and it wouldn't compute ngrams on the full String.
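
The "stopping short" idea can be sketched as follows: only a prefix of the input is inspected, and the full input is never copied. This is not OpenNLP's LanguageDetector; `PrefixNgrams` and the `maxChars` parameter are hypothetical names for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not OpenNLP's LanguageDetector: looks only at a prefix of the input.
final class PrefixNgrams {

  // Counts ngrams of length n inside the first maxChars characters without copying the full input.
  static Map<String, Integer> count(CharSequence text, int n, int maxChars) {
    int end = Math.min(text.length(), maxChars);
    Map<String, Integer> counts = new HashMap<>();
    for (int i = 0; i + n <= end; i++) {
      counts.merge(text.subSequence(i, i + n).toString(), 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    // Only "abcd" is inspected, giving the bigrams ab, bc, cd.
    System.out.println(count("abcdefgh", 2, 4).size()); // prints 3
  }
}
```

A production version would also have to avoid cutting a surrogate pair in half at the prefix boundary; the sketch ignores that case.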





[jira] [Resolved] (OPENNLP-1267) Allow the LanguageDetector to stop before processing the full string

2020-01-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved OPENNLP-1267.

Fix Version/s: 1.9.3
   Resolution: Fixed

> Allow the LanguageDetector to stop before processing the full string
> 
>
> Key: OPENNLP-1267
> URL: https://issues.apache.org/jira/browse/OPENNLP-1267
> Project: OpenNLP
>  Issue Type: Improvement
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
> Fix For: 1.9.3
>
>
> On TIKA-2790, I found that Yalder is stopping after computing character 
> ngrams on roughly the first 60 characters.  That _likely_ explains its 
> impressive speed.  Let's make this "stopping short" feature available in 
> OpenNLP.
>  
> Ideally, the language detector wouldn't copy the full String, it wouldn't 
> normalize the full String, and it wouldn't compute ngrams on the full String.





[jira] [Updated] (OPENNLP-1214) use hash to avoid linear search in DefaultEndOfSentenceScanner

2020-01-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1214:
---
Fix Version/s: (was: 1.9.2)
   1.9.3

> use hash to avoid linear search in DefaultEndOfSentenceScanner
> --
>
> Key: OPENNLP-1214
> URL: https://issues.apache.org/jira/browse/OPENNLP-1214
> Project: OpenNLP
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 1.9.3
>
>
> When DefaultEndOfSentenceScanner scans a sentence, it uses linear search to 
> check whether each character in the sentence is one of the eos characters. I 
> think we should use a HashSet to keep eosCharacters instead of a char[].
> In accordance with this replacement, I'd like to deprecate 
> getEndOfSentenceCharacters() because it returns char[] and nobody in OpenNLP 
> calls it at present, and I'd like to add an equivalent method which returns 
> a Set of eos chars. The Set cannot keep the order of the eos chars, but I 
> don't think that will be a problem.
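
The proposed change can be sketched as below: the end-of-sentence characters live in a `Set<Character>`, so each lookup is a hash lookup rather than a scan over a `char[]`. `EosScanner` is a hypothetical class for illustration, not OpenNLP's DefaultEndOfSentenceScanner.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not OpenNLP's DefaultEndOfSentenceScanner.
final class EosScanner {

  private final Set<Character> eosCharacters = new HashSet<>();

  EosScanner(char... eosChars) {
    for (char c : eosChars) {
      eosCharacters.add(c);
    }
  }

  // Returns the positions of all end-of-sentence characters, using one hash lookup per character.
  List<Integer> getPositions(CharSequence text) {
    List<Integer> positions = new ArrayList<>();
    for (int i = 0; i < text.length(); i++) {
      if (eosCharacters.contains(text.charAt(i))) {
        positions.add(i);
      }
    }
    return positions;
  }

  public static void main(String[] args) {
    EosScanner scanner = new EosScanner('.', '!', '?');
    System.out.println(scanner.getPositions("Hi! How are you? Fine.")); // prints [2, 15, 21]
  }
}
```

For a handful of eos characters the gain may be small, since boxing each `char` has its own cost; measuring before and after would settle it.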





[jira] [Updated] (OPENNLP-1272) Add support for Catalan and Indonesian stemmers

2020-01-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1272:
---
Fix Version/s: (was: 1.9.2)
   1.9.3

> Add support for Catalan and Indonesian stemmers
> ---
>
> Key: OPENNLP-1272
> URL: https://issues.apache.org/jira/browse/OPENNLP-1272
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Stemmer
>Reporter: Vlad Ciotlausi
>Priority: Minor
>  Labels: Stemmer
> Fix For: 1.9.3
>
>
> Added Indonesian and Catalan stemmers plus some minor fixes.
>  
> This PR includes:
>  * Creating the Java code based on the .sbl files and adding them to the 
> stemmer folder
>  * Updating relevant classes to support the new stemmers
>  * Adding tests for the new stemmers
>  * Changing the Romanian unit test for the stemmers because it didn't test a 
> Romanian word





[jira] [Assigned] (OPENNLP-1267) Allow the LanguageDetector to stop before processing the full string

2020-01-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned OPENNLP-1267:
--

Assignee: Tim Allison

> Allow the LanguageDetector to stop before processing the full string
> 
>
> Key: OPENNLP-1267
> URL: https://issues.apache.org/jira/browse/OPENNLP-1267
> Project: OpenNLP
>  Issue Type: Improvement
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
>
> On TIKA-2790, I found that Yalder is stopping after computing character 
> ngrams on roughly the first 60 characters.  That _likely_ explains its 
> impressive speed.  Let's make this "stopping short" feature available in 
> OpenNLP.
>  
> Ideally, the language detector wouldn't copy the full String, it wouldn't 
> normalize the full String, and it wouldn't compute ngrams on the full String.





[jira] [Updated] (OPENNLP-1214) use hash to avoid linear search in DefaultEndOfSentenceScanner

2020-01-24 Thread Suneel Marthi (Jira)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1214:
---
Fix Version/s: (was: 1.9.1)
   1.9.2

> use hash to avoid linear search in DefaultEndOfSentenceScanner
> --
>
> Key: OPENNLP-1214
> URL: https://issues.apache.org/jira/browse/OPENNLP-1214
> Project: OpenNLP
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 1.9.2
>
>
> When DefaultEndOfSentenceScanner scans a sentence, it uses linear search to 
> check whether each character in the sentence is one of the eos characters. I 
> think we should use a HashSet to keep eosCharacters instead of a char[].
> In accordance with this replacement, I'd like to deprecate 
> getEndOfSentenceCharacters() because it returns char[] and nobody in OpenNLP 
> calls it at present, and I'd like to add an equivalent method which returns 
> a Set of eos chars. The Set cannot keep the order of the eos chars, but I 
> don't think that will be a problem.





[jira] [Closed] (OPENNLP-1209) Is there documentation for feature generation?

2018-12-31 Thread Suneel Marthi (JIRA)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1209.
--

> Is there documentation for feature generation?
> -
>
> Key: OPENNLP-1209
> URL: https://issues.apache.org/jira/browse/OPENNLP-1209
> Project: OpenNLP
>  Issue Type: Question
>  Components: Documentation
>Reporter: Joseph
>Priority: Major
> Fix For: 1.9.1
>
>
> I could not find any documentation about how to use the feature generation 
> while training a model for the name finder.  Nor could I find any information 
> about how to train a Maxent or Perceptron model, or how to configure these 
> algorithms. 
> I am aware of the basic documentation here 
> [http://opennlp.apache.org/docs/1.9.0/manual/opennlp.html]  but it does not 
> help beyond getting started.
> It seems other people cannot find any either:
> [https://stackoverflow.com/questions/11989633/custom-feature-generation-in-opennlp-namefinder-api]
> So basically you have put all this work into creating this software but have 
> not included sufficient documentation or examples of how to configure and use 
> it, which basically renders it useless to us if we cannot figure it out.
> Is there another location where I can find further documentation or examples? 
> If not are there any plans to address this?  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (OPENNLP-1209) Is there documentation for feature generation?

2018-12-31 Thread Suneel Marthi (JIRA)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved OPENNLP-1209.

   Resolution: Not A Problem
Fix Version/s: 1.9.1

> Is there documentation for feature generation?
> -
>
> Key: OPENNLP-1209
> URL: https://issues.apache.org/jira/browse/OPENNLP-1209
> Project: OpenNLP
>  Issue Type: Question
>  Components: Documentation
>Reporter: Joseph
>Priority: Major
> Fix For: 1.9.1
>
>
> I could not find any documentation about how to use the feature generation 
> while training a model for the name finder.  Nor could I find any information 
> about how to train a Maxent or Perceptron model, or how to configure these 
> algorithms. 
> I am aware of the basic documentation here 
> [http://opennlp.apache.org/docs/1.9.0/manual/opennlp.html]  but it does not 
> help beyond getting started.
> It seems other people cannot find any either:
> [https://stackoverflow.com/questions/11989633/custom-feature-generation-in-opennlp-namefinder-api]
> So basically you have put all this work into creating this software but have 
> not included sufficient documentation or examples of how to configure and use 
> it, which basically renders it useless to us if we cannot figure it out.
> Is there another location where I can find further documentation or examples? 
> If not are there any plans to address this?  





[jira] [Assigned] (OPENNLP-1230) Replace MD5 and SHA1 with SHA256/512

2018-12-27 Thread Suneel Marthi (JIRA)


 [ 
https://issues.apache.org/jira/browse/OPENNLP-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned OPENNLP-1230:
--

Assignee: Suneel Marthi

> Replace MD5 and SHA1 with SHA256/512
> 
>
> Key: OPENNLP-1230
> URL: https://issues.apache.org/jira/browse/OPENNLP-1230
> Project: OpenNLP
>  Issue Type: Task
>Reporter: Jeff Zemerick
>Assignee: Suneel Marthi
>Priority: Major
> Fix For: 1.9.1
>
>
> Per the Apache Distribution Procedure, replace MD5 and SHA1 with SHA256/512.
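
For reference, SHA-256 and SHA-512 digests can be computed with the JDK's `MessageDigest`, as in the minimal sketch below. `Checksums` is a hypothetical helper; the issue does not show how the project's release tooling actually produces its checksum files.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; the actual release tooling is not shown in the issue.
final class Checksums {

  // Returns the lowercase hex digest of data for the given algorithm, e.g. "SHA-256" or "SHA-512".
  static String hexDigest(String algorithm, byte[] data) {
    try {
      byte[] digest = MessageDigest.getInstance(algorithm).digest(data);
      StringBuilder hex = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // Wrapped so that callers do not have to handle a checked exception.
      throw new IllegalStateException("Digest algorithm not available: " + algorithm, e);
    }
  }

  public static void main(String[] args) {
    byte[] data = "abc".getBytes(StandardCharsets.UTF_8);
    System.out.println(hexDigest("SHA-256", data).length()); // prints 64
    System.out.println(hexDigest("SHA-512", data).length()); // prints 128
  }
}
```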





[jira] [Closed] (OPENNLP-1188) Update Penn Treebank URL

2018-05-10 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1188.
--

> Update Penn Treebank URL
> 
>
> Key: OPENNLP-1188
> URL: https://issues.apache.org/jira/browse/OPENNLP-1188
> Project: OpenNLP
>  Issue Type: Task
>  Components: Documentation
>Reporter: Jeff Zemerick
>Assignee: Jeff Zemerick
>Priority: Minor
> Fix For: 1.8.5
>
>
> As reported on the users mailing list, the URL for the Penn Treebank needs to 
> be updated to 
> [https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html] 
> (from [http://www.cis.upenn.edu/~treebank/]) on the page 
> http://opennlp.apache.org/docs/1.8.4/manual/opennlp.html#tools.postagger.tagging.cmdline.





[jira] [Closed] (OPENNLP-1189) Token model creation fails without at least one <SPLIT> tag

2018-05-10 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1189.
--

> Token model creation fails without at least one <SPLIT> tag
> ---
>
> Key: OPENNLP-1189
> URL: https://issues.apache.org/jira/browse/OPENNLP-1189
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Tokenizer
>Affects Versions: 1.8.4
>Reporter: Jeff Zemerick
>Assignee: Jeff Zemerick
>Priority: Minor
> Fix For: 1.8.5
>
>
> The tokenizer training documentation for 1.8.4 states that "Tokens are either 
> separated by a whitespace or by a special <SPLIT> tag." However, it appears 
> that training fails if the training data does not contain at least one 
> <SPLIT> tag. To reproduce:
> Training on the sample data works fine:
> {quote}Pierre Vinken<SPLIT>, 61 years old<SPLIT>, will join the board as a 
> nonexecutive director Nov. 29<SPLIT>.
>  Mr. Vinken is chairman of Elsevier N.V.<SPLIT>, the Dutch publishing 
> group<SPLIT>.
>  Rudolph Agnew<SPLIT>, 55 years old and former chairman of Consolidated Gold 
> Fields PLC<SPLIT>,
>  was named a nonexecutive director of this British industrial 
> conglomerate<SPLIT>.
> {quote}
> Replacing the <SPLIT> tags with whitespace causes the training to fail with 
> InsufficientTrainingDataException:
> {quote}Pierre Vinken , 61 years old , will join the board as a nonexecutive 
> director Nov. 29 .
>  Mr. Vinken is chairman of Elsevier N.V. , the Dutch publishing group .
>  Rudolph Agnew , 55 years old and former chairman of Consolidated Gold Fields 
> PLC ,
>  was named a nonexecutive director of this British industrial conglomerate .
> {quote}
> Modifying the training data to contain a single <SPLIT> tag allows model 
> training to complete successfully:
> {quote}Pierre Vinken<SPLIT>, 61 years old , will join the board as a 
> nonexecutive director Nov. 29 .
>  Mr. Vinken is chairman of Elsevier N.V. , the Dutch publishing group .
>  Rudolph Agnew , 55 years old and former chairman of Consolidated Gold Fields 
> PLC ,
>  was named a nonexecutive director of this British industrial conglomerate .
> {quote}
>  





[jira] [Closed] (OPENNLP-1180) Use String[] instead of StringList in LanguageModel API

2018-05-10 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1180.
--

> Use String[] instead of StringList in LanguageModel API
> ---
>
> Key: OPENNLP-1180
> URL: https://issues.apache.org/jira/browse/OPENNLP-1180
> Project: OpenNLP
>  Issue Type: Task
>  Components: language model
>Reporter: Tommaso Teofili
>Assignee: Tommaso Teofili
>Priority: Major
> Fix For: 1.8.5
>
>
> The current {{LanguageModel}} API uses {{StringList}}. That is less 
> convenient to consume, because callers need to look into StringList and 
> adapt their code to convert arrays or collections of Strings into StringList. 
> Additionally, this requires more objects to be created that will soon be 
> discarded by garbage collection, e.g. the input StringList for 
> LM#calculateProbability and LM#predictNextTokens.
> I propose to deprecate those methods and add new ones with exactly the same 
> signature but using String[] (or String...) instead.
> Internally, StringLists can be kept or not, but that would be an 
> implementation detail, and it makes it easier to move away from them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (OPENNLP-1182) LanguageDetectorConverterTool is a no-op, despite the docs saying otherwise

2018-05-10 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1182:
---
Affects Version/s: 1.8.4
Fix Version/s: 1.8.5

> LanguageDetectorConverterTool is a no-op, despite the docs saying otherwise
> ---
>
> Key: OPENNLP-1182
> URL: https://issues.apache.org/jira/browse/OPENNLP-1182
> Project: OpenNLP
>  Issue Type: Bug
>Affects Versions: 1.8.4
>Reporter: Steve Rowe
>Priority: Major
> Fix For: 1.8.5
>
>
> Contrary to the docs (see below), LanguageDetectorConverterTool doesn't 
> actually do anything at all; the class is empty.
> {quote}
> The following sequence of commands shows how to convert the Leipzig Corpora 
> collection at folder leipzig-train/ to the default Language Detector format, 
> by creating groups of 5 sentences as documents and limiting to 1 
> documents per language. Then, it shuffles the result and selects the first 
> 10 lines as train corpus and the last 2 as evaluation corpus:
> {noformat}
> $ bin/opennlp LanguageDetectorConverter leipzig -sentencesDir leipzig-train/ 
> -sentencesPerSample 5 -samplesPerLanguage 1 > leipzig.txt
> $ perl -MList::Util=shuffle -e 'print shuffle(<STDIN>);' < leipzig.txt > 
> leipzig_shuf.txt
> $ head -10 < leipzig_shuf.txt > leipzig.train
> $ tail -2 < leipzig_shuf.txt > leipzig.eval
> {noformat}
> {quote}





[jira] [Updated] (OPENNLP-1187) Issue in finding accuracy of model

2018-05-10 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1187:
---
Fix Version/s: 1.8.5

> Issue in finding accuracy of model
> --
>
> Key: OPENNLP-1187
> URL: https://issues.apache.org/jira/browse/OPENNLP-1187
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Doccat, Machine Learning
>Affects Versions: 1.8.4
>Reporter: Aman Garg
>Priority: Major
> Fix For: 1.8.5
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The trainingStats function in the NaiveBayesTrainer class is not working 
> properly and displays a wrong result.
> In findParameters(), line 154, i.e. 
> EvalParameters evalParams = new EvalParameters(params, numOutcomes);
> should be replaced by the following block:
>  
> double[] outcomeTotals = new double[outcomeLabels.length];
> for (int i = 0; i < params.length; ++i) {
>   Context context = params[i];
>   for (int j = 0; j < context.getOutcomes().length; ++j) {
>     int outcome = context.getOutcomes()[j];
>     double count = context.getParameters()[j];
>     outcomeTotals[outcome] += count;
>   }
> }
> evalParams = new NaiveBayesEvalParameters(params,
>     outcomeLabels.length, outcomeTotals, predLabels.length);
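
The accumulation proposed in the report can be tried in isolation. The sketch below is self-contained and uses plain arrays in place of the Context objects; the class name OutcomeTotals and the array layout are assumptions made for illustration only.

```java
// Self-contained illustration; plain arrays stand in for the Context objects
// mentioned in the report. Not the OpenNLP implementation.
final class OutcomeTotals {

  // outcomes[i][j] is the outcome index of the j-th parameter of context i,
  // counts[i][j] is the count stored for that parameter.
  static double[] sum(int[][] outcomes, double[][] counts, int numOutcomes) {
    double[] totals = new double[numOutcomes];
    for (int i = 0; i < outcomes.length; i++) {
      for (int j = 0; j < outcomes[i].length; j++) {
        totals[outcomes[i][j]] += counts[i][j];
      }
    }
    return totals;
  }
}
```

For example, two contexts with outcomes {0, 1} and {1} and counts {2.0, 3.0} and {4.0} give totals of 2.0 for outcome 0 and 7.0 for outcome 1.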





[jira] [Commented] (OPENNLP-1190) CONLL02 format

2018-04-03 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424564#comment-16424564
 ] 

Suneel Marthi commented on OPENNLP-1190:


Would you like to submit a PR with a fix?

> CONLL02 format
> --
>
> Key: OPENNLP-1190
> URL: https://issues.apache.org/jira/browse/OPENNLP-1190
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Formats
>Affects Versions: tools-1.5.3
>Reporter: Luca
>Priority: Major
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> According to the documentation, the following should work
>  bin/opennlp TokenNameFinderConverter conll02 -data esp.train -lang es -types 
> per > es_corpus_train_persons.txt
> However, it currently fails with an error message because it expects 3 columns 
> instead of the 2 that are in the dataset.
> This is a bug introduced at line 130 of 
> opennlp.tools.formats.Conll02NameSampleStream.java, where a length of 3 is 
> imposed.
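
A more tolerant column check could look like the sketch below. It is not the OpenNLP code: the class ConllLine is hypothetical, and it assumes that the token is the first column and the entity tag is always the last one, whether the line has 2 or 3 columns.

```java
// Hypothetical parser, not the OpenNLP implementation: accepts lines with
// either 2 columns (token, tag) or 3 columns (token, POS tag, tag).
final class ConllLine {
  final String token;
  final String tag;

  private ConllLine(String token, String tag) {
    this.token = token;
    this.tag = tag;
  }

  static ConllLine parse(String line) {
    String[] fields = line.trim().split("\\s+");
    if (fields.length != 2 && fields.length != 3) {
      throw new IllegalArgumentException(
          "Expected 2 or 3 columns but got " + fields.length + ": '" + line + "'");
    }
    // Assumption: the entity tag is the last column in both layouts.
    return new ConllLine(fields[0], fields[fields.length - 1]);
  }
}
```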





[jira] [Closed] (OPENNLP-1192) Remove MD5 hashes from Release process

2018-04-03 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1192.
--

> Remove MD5 hashes from Release process
> --
>
> Key: OPENNLP-1192
> URL: https://issues.apache.org/jira/browse/OPENNLP-1192
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Build, Packaging and Test
>Affects Versions: 1.8.4
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Major
> Fix For: 1.8.5
>
>
> Per [http://www.apache.org/dev/release-publishing.html] MD5 should not be 
> supported.
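
For illustration, a checksum without MD5 can be computed with the JDK alone. The sketch assumes SHA-256 as the replacement digest (the issue itself does not name one), and the class name Sha256 is hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Minimal sketch: hex-encoded SHA-256 digest of a byte array, JDK only.
final class Sha256 {

  static String hex(byte[] data) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      StringBuilder sb = new StringBuilder();
      for (byte b : digest.digest(data)) {
        sb.append(String.format("%02x", b)); // two lowercase hex digits per byte
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is one of the digests every Java platform must provide.
      throw new IllegalStateException(e);
    }
  }
}
```

For a release artifact, the bytes would come from the file, for example via java.nio.file.Files.readAllBytes.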





[jira] [Updated] (OPENNLP-1192) Remove MD5 hashes from Release process

2018-04-02 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1192:
---
Description: Per [http://www.apache.org/dev/release-publishing.html] MD5 
should not be supported.  (was: Per 
[http://www.apache.org/dev/release-publishing.html] MD5 should be stopped.)

> Remove MD5 hashes from Release process
> --
>
> Key: OPENNLP-1192
> URL: https://issues.apache.org/jira/browse/OPENNLP-1192
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Build, Packaging and Test
>Affects Versions: 1.8.4
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Major
> Fix For: 1.8.5
>
>
> Per [http://www.apache.org/dev/release-publishing.html] MD5 should not be 
> supported.





[jira] [Created] (OPENNLP-1192) Remove MD5 hashes from Release process

2018-04-02 Thread Suneel Marthi (JIRA)
Suneel Marthi created OPENNLP-1192:
--

 Summary: Remove MD5 hashes from Release process
 Key: OPENNLP-1192
 URL: https://issues.apache.org/jira/browse/OPENNLP-1192
 Project: OpenNLP
  Issue Type: New Feature
  Components: Build, Packaging and Test
Affects Versions: 1.8.4
Reporter: Suneel Marthi
Assignee: Suneel Marthi
 Fix For: 1.8.5


Per [http://www.apache.org/dev/release-publishing.html] MD5 should be stopped.





[jira] [Resolved] (OPENNLP-956) Javadoc issues with Java 8

2017-12-26 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved OPENNLP-956.
---
   Resolution: Won't Fix
Fix Version/s: 1.8.4

This may no longer be an issue now that Java 9 is supported. Resolving this as 'Won't Fix'.

> Javadoc issues with Java 8
> --
>
> Key: OPENNLP-956
> URL: https://issues.apache.org/jira/browse/OPENNLP-956
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Build, Packaging and Test
>Affects Versions: 1.7.1
>Reporter: William Colen
>Assignee: Suneel Marthi
> Fix For: 1.8.4
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (OPENNLP-956) Javadoc issues with Java 8

2017-12-26 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-956.
-

> Javadoc issues with Java 8
> --
>
> Key: OPENNLP-956
> URL: https://issues.apache.org/jira/browse/OPENNLP-956
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Build, Packaging and Test
>Affects Versions: 1.7.1
>Reporter: William Colen
>Assignee: Suneel Marthi
> Fix For: 1.8.4
>
>






[jira] [Resolved] (OPENNLP-1132) Fail with exception if not enough lines in leipzig parser

2017-12-26 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved OPENNLP-1132.

Resolution: Fixed

> Fail with exception if not enough lines in leipzig parser
> -
>
> Key: OPENNLP-1132
> URL: https://issues.apache.org/jira/browse/OPENNLP-1132
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Language Detector
>Affects Versions: 1.8.2
>Reporter: Peter Thygesen
>Assignee: Peter Thygesen
> Fix For: 1.8.4
>
>
> Exception in thread "main" java.lang.IndexOutOfBoundsException: toIndex = 
> 10
>   at java.util.ArrayList.subListRangeCheck(ArrayList.java:1004)
>   at java.util.ArrayList.subList(ArrayList.java:996)
>   at 
> opennlp.tools.formats.leipzig.LeipzigLanguageSampleStream$LeipzigSentencesStream.<init>(LeipzigLanguageSampleStream.java:65)
>   at 
> opennlp.tools.formats.leipzig.LeipzigLanguageSampleStream.read(LeipzigLanguageSampleStream.java:157)
>   at 
> opennlp.tools.formats.leipzig.LeipzigLanguageSampleStream.read(LeipzigLanguageSampleStream.java:42)
>   at 
> opennlp.tools.formats.leipzig.SampleShuffleStream.<init>(SampleShuffleStream.java:38)
>   at 
> opennlp.tools.formats.leipzig.LeipzigLanguageSampleStreamFactory.create(LeipzigLanguageSampleStreamFactory.java:76)
>   at 
> opennlp.tools.cmdline.AbstractConverterTool.run(AbstractConverterTool.java:106)
>   at opennlp.tools.cmdline.CLI.main(CLI.java:256)
> line 65:
> Set selectedLines = new HashSet<>(
>   indexes.subList(0, sentencesPerSample * numberOfSamples));
> This fails if sentencesPerSample x numberOfSamples is larger than the size of 
> indexes (source file).
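
One way to fail with a descriptive exception instead of the IndexOutOfBoundsException is to check the bound before calling subList. This is a minimal sketch, not the committed fix; the class LineSelection and the choice of IllegalArgumentException are assumptions.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical guard around the subList call quoted in the report.
final class LineSelection {

  static Set<Integer> select(List<Integer> indexes, int sentencesPerSample, int numberOfSamples) {
    // multiplyExact throws ArithmeticException on int overflow instead of wrapping around.
    int needed = Math.multiplyExact(sentencesPerSample, numberOfSamples);
    if (needed > indexes.size()) {
      throw new IllegalArgumentException("Requested " + needed
          + " lines but the source only has " + indexes.size());
    }
    return new HashSet<>(indexes.subList(0, needed));
  }
}
```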





[jira] [Closed] (OPENNLP-1156) Downloaded files have invalid hash sums

2017-12-26 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1156.
--

> Downloaded files have invalid hash sums
> ---
>
> Key: OPENNLP-1156
> URL: https://issues.apache.org/jira/browse/OPENNLP-1156
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Website
>Affects Versions: 1.8.3
>Reporter: Oleg Popov
> Fix For: 1.8.4
>
>
> [user@knime out]$ md5sum *
> 336ec3cb06862f685a9b670753915ba9  apache-opennlp-1.8.3-bin.tar.gz
> 2b1c1ec960646697a621bb52fd389083  apache-opennlp-1.8.3-bin.zip
> 070645990b19210408229c045e9ddad9  apache-opennlp-1.8.3-src.tar.gz
> cc1757ad7bb988e6a41cd2c75d128309  apache-opennlp-1.8.3-src.zip
> [user@knime out]$ sha1sum *
> 17e1089b41c6cad1a080cf76f5593d2e39cd  apache-opennlp-1.8.3-bin.tar.gz
> d59af5017ffdb0b81898e39048a8c8b460f13025  apache-opennlp-1.8.3-bin.zip
> 5af2aa28c4ce36b61a35d45767dcd47d25c368a4  apache-opennlp-1.8.3-src.tar.gz
> 1189d6c5c464f5d32d2f47c08d4b05d65766a0d9  apache-opennlp-1.8.3-src.zip





[jira] [Closed] (OPENNLP-1132) Fail with exception if not enough lines in leipzig parser

2017-12-26 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1132.
--

> Fail with exception if not enough lines in leipzig parser
> -
>
> Key: OPENNLP-1132
> URL: https://issues.apache.org/jira/browse/OPENNLP-1132
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Language Detector
>Affects Versions: 1.8.2
>Reporter: Peter Thygesen
>Assignee: Peter Thygesen
> Fix For: 1.8.4
>
>
> Exception in thread "main" java.lang.IndexOutOfBoundsException: toIndex = 
> 10
>   at java.util.ArrayList.subListRangeCheck(ArrayList.java:1004)
>   at java.util.ArrayList.subList(ArrayList.java:996)
>   at 
> opennlp.tools.formats.leipzig.LeipzigLanguageSampleStream$LeipzigSentencesStream.<init>(LeipzigLanguageSampleStream.java:65)
>   at 
> opennlp.tools.formats.leipzig.LeipzigLanguageSampleStream.read(LeipzigLanguageSampleStream.java:157)
>   at 
> opennlp.tools.formats.leipzig.LeipzigLanguageSampleStream.read(LeipzigLanguageSampleStream.java:42)
>   at 
> opennlp.tools.formats.leipzig.SampleShuffleStream.<init>(SampleShuffleStream.java:38)
>   at 
> opennlp.tools.formats.leipzig.LeipzigLanguageSampleStreamFactory.create(LeipzigLanguageSampleStreamFactory.java:76)
>   at 
> opennlp.tools.cmdline.AbstractConverterTool.run(AbstractConverterTool.java:106)
>   at opennlp.tools.cmdline.CLI.main(CLI.java:256)
> line 65:
> Set selectedLines = new HashSet<>(
>   indexes.subList(0, sentencesPerSample * numberOfSamples));
> This fails if sentencesPerSample x numberOfSamples is larger than the size of 
> indexes (source file).





[jira] [Updated] (OPENNLP-1132) Fail with exception if not enough lines in leipzig parser

2017-12-26 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1132:
---
Fix Version/s: 1.8.4

> Fail with exception if not enough lines in leipzig parser
> -
>
> Key: OPENNLP-1132
> URL: https://issues.apache.org/jira/browse/OPENNLP-1132
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Language Detector
>Affects Versions: 1.8.2
>Reporter: Peter Thygesen
>Assignee: Peter Thygesen
> Fix For: 1.8.4
>
>
> Exception in thread "main" java.lang.IndexOutOfBoundsException: toIndex = 
> 10
>   at java.util.ArrayList.subListRangeCheck(ArrayList.java:1004)
>   at java.util.ArrayList.subList(ArrayList.java:996)
>   at 
> opennlp.tools.formats.leipzig.LeipzigLanguageSampleStream$LeipzigSentencesStream.<init>(LeipzigLanguageSampleStream.java:65)
>   at 
> opennlp.tools.formats.leipzig.LeipzigLanguageSampleStream.read(LeipzigLanguageSampleStream.java:157)
>   at 
> opennlp.tools.formats.leipzig.LeipzigLanguageSampleStream.read(LeipzigLanguageSampleStream.java:42)
>   at 
> opennlp.tools.formats.leipzig.SampleShuffleStream.<init>(SampleShuffleStream.java:38)
>   at 
> opennlp.tools.formats.leipzig.LeipzigLanguageSampleStreamFactory.create(LeipzigLanguageSampleStreamFactory.java:76)
>   at 
> opennlp.tools.cmdline.AbstractConverterTool.run(AbstractConverterTool.java:106)
>   at opennlp.tools.cmdline.CLI.main(CLI.java:256)
> line 65:
> Set selectedLines = new HashSet<>(
>   indexes.subList(0, sentencesPerSample * numberOfSamples));
> This fails if sentencesPerSample x numberOfSamples is larger than the size of 
> indexes (source file).





[jira] [Resolved] (OPENNLP-1156) Downloaded files have invalid hash sums

2017-12-26 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved OPENNLP-1156.

Resolution: Not A Problem

Resolving this as 'Not an Issue'

> Downloaded files have invalid hash sums
> ---
>
> Key: OPENNLP-1156
> URL: https://issues.apache.org/jira/browse/OPENNLP-1156
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Website
>Affects Versions: 1.8.3
>Reporter: Oleg Popov
> Fix For: 1.8.4
>
>
> [user@knime out]$ md5sum *
> 336ec3cb06862f685a9b670753915ba9  apache-opennlp-1.8.3-bin.tar.gz
> 2b1c1ec960646697a621bb52fd389083  apache-opennlp-1.8.3-bin.zip
> 070645990b19210408229c045e9ddad9  apache-opennlp-1.8.3-src.tar.gz
> cc1757ad7bb988e6a41cd2c75d128309  apache-opennlp-1.8.3-src.zip
> [user@knime out]$ sha1sum *
> 17e1089b41c6cad1a080cf76f5593d2e39cd  apache-opennlp-1.8.3-bin.tar.gz
> d59af5017ffdb0b81898e39048a8c8b460f13025  apache-opennlp-1.8.3-bin.zip
> 5af2aa28c4ce36b61a35d45767dcd47d25c368a4  apache-opennlp-1.8.3-src.tar.gz
> 1189d6c5c464f5d32d2f47c08d4b05d65766a0d9  apache-opennlp-1.8.3-src.zip





[jira] [Updated] (OPENNLP-1156) Downloaded files have invalid hash sums

2017-12-26 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1156:
---
Fix Version/s: 1.8.4

> Downloaded files have invalid hash sums
> ---
>
> Key: OPENNLP-1156
> URL: https://issues.apache.org/jira/browse/OPENNLP-1156
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Website
>Affects Versions: 1.8.3
>Reporter: Oleg Popov
> Fix For: 1.8.4
>
>
> [user@knime out]$ md5sum *
> 336ec3cb06862f685a9b670753915ba9  apache-opennlp-1.8.3-bin.tar.gz
> 2b1c1ec960646697a621bb52fd389083  apache-opennlp-1.8.3-bin.zip
> 070645990b19210408229c045e9ddad9  apache-opennlp-1.8.3-src.tar.gz
> cc1757ad7bb988e6a41cd2c75d128309  apache-opennlp-1.8.3-src.zip
> [user@knime out]$ sha1sum *
> 17e1089b41c6cad1a080cf76f5593d2e39cd  apache-opennlp-1.8.3-bin.tar.gz
> d59af5017ffdb0b81898e39048a8c8b460f13025  apache-opennlp-1.8.3-bin.zip
> 5af2aa28c4ce36b61a35d45767dcd47d25c368a4  apache-opennlp-1.8.3-src.tar.gz
> 1189d6c5c464f5d32d2f47c08d4b05d65766a0d9  apache-opennlp-1.8.3-src.zip





[jira] [Updated] (OPENNLP-1166) TwoPassDataIndexer fails if features contain \n

2017-12-26 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1166:
---
Fix Version/s: 1.8.4

> TwoPassDataIndexer fails if features contain \n
> ---
>
> Key: OPENNLP-1166
> URL: https://issues.apache.org/jira/browse/OPENNLP-1166
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Machine Learning
>Affects Versions: 1.8.3
>Reporter: Peter Thygesen
>Assignee: Peter Thygesen
> Fix For: 1.8.4
>
>
> Training a model with newline tokens causes TwoPassDataIndexer to throw an 
> exception:
> Exception in thread "main" java.util.NoSuchElementException
> at java.util.StringTokenizer.nextToken(StringTokenizer.java:349)
> at opennlp.tools.ml.model.FileEventStream.read(FileEventStream.java:71)
> at opennlp.tools.ml.model.FileEventStream.read(FileEventStream.java:35)
> at 
> opennlp.tools.ml.model.AbstractDataIndexer.index(AbstractDataIndexer.java:168)
> at 
> opennlp.tools.ml.model.TwoPassDataIndexer.index(TwoPassDataIndexer.java:72)
> at 
> opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:68)
> at 
> opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:90)
> at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:244)
> at 
> opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:169)
> at opennlp.tools.cmdline.CLI.main(CLI.java:256)
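
One plausible reading of this trace, stated here as an assumption, is that events are stored one per line in a text file, so a feature containing a newline splits the record and the reader then runs out of tokens. The sketch below shows a generic escaping scheme that keeps such a feature on a single line; the class FeatureEscaper is hypothetical and is not how OpenNLP resolved the issue.

```java
// Hypothetical escaping scheme for line-based storage; not the OpenNLP fix.
final class FeatureEscaper {

  // Backslashes are doubled first so the encoding stays reversible.
  static String escape(String feature) {
    return feature.replace("\\", "\\\\").replace("\n", "\\n");
  }

  static String unescape(String escaped) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < escaped.length(); i++) {
      char c = escaped.charAt(i);
      if (c == '\\' && i + 1 < escaped.length()) {
        char next = escaped.charAt(++i);
        sb.append(next == 'n' ? '\n' : next); // "\n" becomes a newline, "\\" a backslash
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }
}
```

This sketch only covers newlines; other whitespace inside a feature would need the same treatment if the reader splits on whitespace.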





[jira] [Closed] (OPENNLP-1166) TwoPassDataIndexer fails if features contain \n

2017-12-26 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1166.
--

> TwoPassDataIndexer fails if features contain \n
> ---
>
> Key: OPENNLP-1166
> URL: https://issues.apache.org/jira/browse/OPENNLP-1166
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Machine Learning
>Affects Versions: 1.8.3
>Reporter: Peter Thygesen
>Assignee: Peter Thygesen
> Fix For: 1.8.4
>
>
> Training a model with newline tokens causes TwoPassDataIndexer to throw an 
> exception:
> Exception in thread "main" java.util.NoSuchElementException
> at java.util.StringTokenizer.nextToken(StringTokenizer.java:349)
> at opennlp.tools.ml.model.FileEventStream.read(FileEventStream.java:71)
> at opennlp.tools.ml.model.FileEventStream.read(FileEventStream.java:35)
> at 
> opennlp.tools.ml.model.AbstractDataIndexer.index(AbstractDataIndexer.java:168)
> at 
> opennlp.tools.ml.model.TwoPassDataIndexer.index(TwoPassDataIndexer.java:72)
> at 
> opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:68)
> at 
> opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:90)
> at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:244)
> at 
> opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:169)
> at opennlp.tools.cmdline.CLI.main(CLI.java:256)





[jira] [Resolved] (OPENNLP-1166) TwoPassDataIndexer fails if features contain \n

2017-12-26 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved OPENNLP-1166.

Resolution: Fixed

> TwoPassDataIndexer fails if features contain \n
> ---
>
> Key: OPENNLP-1166
> URL: https://issues.apache.org/jira/browse/OPENNLP-1166
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Machine Learning
>Affects Versions: 1.8.3
>Reporter: Peter Thygesen
>Assignee: Peter Thygesen
> Fix For: 1.8.4
>
>
> Training a model with newline tokens causes TwoPassDataIndexer to throw an 
> exception:
> Exception in thread "main" java.util.NoSuchElementException
> at java.util.StringTokenizer.nextToken(StringTokenizer.java:349)
> at opennlp.tools.ml.model.FileEventStream.read(FileEventStream.java:71)
> at opennlp.tools.ml.model.FileEventStream.read(FileEventStream.java:35)
> at 
> opennlp.tools.ml.model.AbstractDataIndexer.index(AbstractDataIndexer.java:168)
> at 
> opennlp.tools.ml.model.TwoPassDataIndexer.index(TwoPassDataIndexer.java:72)
> at 
> opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:68)
> at 
> opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:90)
> at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:244)
> at 
> opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:169)
> at opennlp.tools.cmdline.CLI.main(CLI.java:256)





[jira] [Resolved] (OPENNLP-1140) Add 20 newsgroups format support

2017-12-26 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved OPENNLP-1140.

Resolution: Fixed

> Add 20 newsgroups format support
> 
>
> Key: OPENNLP-1140
> URL: https://issues.apache.org/jira/browse/OPENNLP-1140
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Formats
>Reporter: Tommaso Teofili
>Assignee: Joern Kottmann
> Fix For: 1.8.4
>
>
> It'd be nice to have support for [20 
> newsgroups|http://qwone.com/~jason/20Newsgroups/] format, especially for 
> evaluating {{DocCat}} models.





[jira] [Closed] (OPENNLP-1140) Add 20 newsgroups format support

2017-12-26 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1140.
--

> Add 20 newsgroups format support
> 
>
> Key: OPENNLP-1140
> URL: https://issues.apache.org/jira/browse/OPENNLP-1140
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Formats
>Reporter: Tommaso Teofili
>Assignee: Joern Kottmann
> Fix For: 1.8.4
>
>
> It'd be nice to have support for [20 
> newsgroups|http://qwone.com/~jason/20Newsgroups/] format, especially for 
> evaluating {{DocCat}} models.





[jira] [Assigned] (OPENNLP-1140) Add 20 newsgroups format support

2017-12-26 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned OPENNLP-1140:
--

 Assignee: Joern Kottmann
Fix Version/s: (was: 1.8.5)
   1.8.4

> Add 20 newsgroups format support
> 
>
> Key: OPENNLP-1140
> URL: https://issues.apache.org/jira/browse/OPENNLP-1140
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Formats
>Reporter: Tommaso Teofili
>Assignee: Joern Kottmann
> Fix For: 1.8.4
>
>
> It'd be nice to have support for [20 
> newsgroups|http://qwone.com/~jason/20Newsgroups/] format, especially for 
> evaluating {{DocCat}} models.





[jira] [Assigned] (OPENNLP-1172) Add Annotator notes to BratAnnotation

2017-12-19 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned OPENNLP-1172:
--

Assignee: Daniel Russ

> Add Annotator notes to BratAnnotation
> -
>
> Key: OPENNLP-1172
> URL: https://issues.apache.org/jira/browse/OPENNLP-1172
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Formats
>Affects Versions: 1.8.3
>Reporter: Daniel Russ
>Assignee: Daniel Russ
>Priority: Minor
> Fix For: 1.8.4
>
>
> The Brat Annotator allows annotators to add notes to entities/relations. The 
> BratAnnotation class should reflect this.





[jira] [Resolved] (OPENNLP-1153) Add model download page to website

2017-11-06 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved OPENNLP-1153.

   Resolution: Fixed
Fix Version/s: 1.8.3

> Add model download page to website
> --
>
> Key: OPENNLP-1153
> URL: https://issues.apache.org/jira/browse/OPENNLP-1153
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Website
>Reporter: William Colen
>Assignee: William Colen
>Priority: Trivial
> Fix For: 1.8.3
>
>






[jira] [Closed] (OPENNLP-1153) Add model download page to website

2017-11-06 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1153.
--

> Add model download page to website
> --
>
> Key: OPENNLP-1153
> URL: https://issues.apache.org/jira/browse/OPENNLP-1153
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Website
>Reporter: William Colen
>Assignee: William Colen
>Priority: Trivial
> Fix For: 1.8.3
>
>






[jira] [Closed] (OPENNLP-1147) Missing URLs in doc

2017-10-24 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1147.
--

> Missing URLs in doc
> ---
>
> Key: OPENNLP-1147
> URL: https://issues.apache.org/jira/browse/OPENNLP-1147
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.8.2
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Trivial
> Fix For: 1.8.3
>
>
> When I read the name finder part of the documentation, I found that some URLs 
> were missing. I'd like to correct those for which I could find the latest or 
> alternative ones.





[jira] [Closed] (OPENNLP-1149) remove unused member in PlainTextByLineStream

2017-10-24 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1149.
--

> remove unused member in PlainTextByLineStream
> -
>
> Key: OPENNLP-1149
> URL: https://issues.apache.org/jira/browse/OPENNLP-1149
> Project: OpenNLP
>  Issue Type: Improvement
>Affects Versions: 1.8.2
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Trivial
> Fix For: 1.8.3
>
>
> PlainTextByLineStream has a private member variable "channel" but it is never 
> set and hence, it is always null. It can be removed to simplify code.





[jira] [Closed] (OPENNLP-1145) Javadoc of NaiveBayesTrainer class looks incorrect

2017-10-24 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1145.
--

> Javadoc of NaiveBayesTrainer class looks incorrect
> --
>
> Key: OPENNLP-1145
> URL: https://issues.apache.org/jira/browse/OPENNLP-1145
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Machine Learning
>Affects Versions: 1.8.2
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Trivial
> Fix For: 1.8.3
>
>
> It seems that the Javadoc of the NaiveBayesTrainer class was copied from 
> PerceptronTrainer and hence it says "Trains models using the perceptron 
> algorithm." :)





[jira] [Closed] (OPENNLP-1146) remove unnecessary serialVersionUID

2017-10-24 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1146.
--

> remove unnecessary serialVersionUID
> ---
>
> Key: OPENNLP-1146
> URL: https://issues.apache.org/jira/browse/OPENNLP-1146
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Build, Packaging and Test
>Affects Versions: 1.8.2
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Trivial
> Fix For: 1.8.3
>
>
> We saw several classes that have an unnecessary serialVersionUID constant 
> declaration. Most of them are Stemmer classes that are created by the 
> Snowball to Java compiler. I think we can just remove serialVersionUID from 
> Stemmer classes. Other than Stemmer classes, Exception classes which extend 
> RuntimeException or IOException have serialVersionUID. I'll remove 
> serialVersionUID from these Exception classes as well but add 
> @SuppressWarnings("serial") just in case.
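
The exception-class part of this proposal would look roughly like the sketch below. The class name SampleFormatException is hypothetical; the point is only that the annotation silences the compiler's "serial" lint warning once the explicit serialVersionUID field is gone.

```java
import java.io.IOException;

// Hypothetical exception class: no serialVersionUID is declared, and the
// annotation suppresses the javac "serial" warning that would otherwise result.
@SuppressWarnings("serial")
class SampleFormatException extends IOException {

  SampleFormatException(String message) {
    super(message);
  }
}
```

The class is still Serializable through Throwable; without the field, the JVM derives the serial version from the class structure.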





[jira] [Closed] (OPENNLP-1148) use StandardCharsets.UTF_8 in doc

2017-10-24 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1148.
--

> use StandardCharsets.UTF_8 in doc
> -
>
> Key: OPENNLP-1148
> URL: https://issues.apache.org/jira/browse/OPENNLP-1148
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.8.2
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Trivial
> Fix For: 1.8.3
>
>
> In the doc, the use of PlainTextByLineStream() is not consistent. Besides 
> specifying StandardCharsets.UTF_8 for its second parameter, there are the 
> following variations:
> - String "UTF-8"
> - StandardCharsets.UTF8 (not UTF_8)
> - Charset.forName("UTF-8")
> Let's standardize on StandardCharsets.UTF_8.
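
The difference between the variations can be shown with the JDK alone, without PlainTextByLineStream. In this sketch the class name CharsetVariants is hypothetical. Note that StandardCharsets has no UTF8 constant, so that variant cannot compile.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: compares the constant with the lookup-by-name variant.
final class CharsetVariants {

  // The constant needs no lookup and cannot fail at runtime.
  static final Charset PREFERRED = StandardCharsets.UTF_8;

  // Charset.forName resolves the same charset, but a typo in the name would
  // only surface at runtime as UnsupportedCharsetException.
  static boolean lookupMatchesConstant() {
    return Charset.forName("UTF-8").equals(PREFERRED);
  }

  static String decode(byte[] bytes) {
    return new String(bytes, PREFERRED);
  }
}
```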





[jira] [Closed] (OPENNLP-1151) All Sample objects should implement Serializable for easy integration into other tools

2017-10-24 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1151.
--

> All Sample objects should implement Serializable for easy integration into 
> other tools
> -
>
> Key: OPENNLP-1151
> URL: https://issues.apache.org/jira/browse/OPENNLP-1151
> Project: OpenNLP
>  Issue Type: Bug
>Reporter: Joern Kottmann
>Assignee: Suneel Marthi
>Priority: Minor
> Fix For: 1.8.3
>
>
> State-of-the-art frameworks like Apache Flink require that objects are 
> serializable to use them in a pipeline. To use such a framework to prepare 
> training data for OpenNLP, the Sample objects should all implement Serializable.
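
What the requirement means in practice can be sketched with a stand-in class. TextSample and SerializationRoundTrip below are hypothetical and are not OpenNLP types; the round trip through Java serialization is roughly what a distributed framework does when it ships objects between workers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a sample object; not an OpenNLP class.
final class TextSample implements Serializable {
  private static final long serialVersionUID = 1L;
  final String[] tokens;

  TextSample(String... tokens) {
    this.tokens = tokens.clone();
  }
}

final class SerializationRoundTrip {

  // Serializes the sample to bytes and reads it back as a new object.
  static TextSample copy(TextSample sample) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(sample);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (TextSample) in.readObject();
    }
  }
}
```

Without the Serializable marker, writeObject would throw NotSerializableException.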





[jira] [Updated] (OPENNLP-1150) TokenNameFinderTrainerTool should use ModelUtil.createDefaultTrainingParameters() when mlParams is null

2017-10-24 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1150:
---
Fix Version/s: (was: 1.8.3)
   1.8.4

> TokenNameFinderTrainerTool should use 
> ModelUtil.createDefaultTrainingParameters() when mlParams is null
> ---
>
> Key: OPENNLP-1150
> URL: https://issues.apache.org/jira/browse/OPENNLP-1150
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Name Finder
>Affects Versions: 1.8.2
>Reporter: Koji Sekiguchi
>Priority: Trivial
> Fix For: 1.8.4
>
>
> Unlike other TrainerTools, TokenNameFinderTrainerTool creates an empty 
> TrainingParameters when mlParams is null by calling the constructor. 
> TokenNameFinderTrainerTool should use 
> ModelUtil.createDefaultTrainingParameters(), as the other TrainerTools do, to 
> initialize mlParams.
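
The requested fallback pattern can be sketched without OpenNLP on the classpath. Everything here is hypothetical: a Map stands in for TrainingParameters, createDefaults() stands in for ModelUtil.createDefaultTrainingParameters(), and the keys and values are illustrative placeholders rather than the library's documented defaults.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the null check described in the issue.
final class TrainingDefaults {

  // Placeholder defaults for illustration only.
  static Map<String, String> createDefaults() {
    Map<String, String> params = new HashMap<>();
    params.put("Iterations", "100");
    params.put("Cutoff", "5");
    return params;
  }

  // Falls back to populated defaults instead of an empty parameter set.
  static Map<String, String> resolve(Map<String, String> mlParams) {
    return mlParams != null ? mlParams : createDefaults();
  }
}
```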





[jira] [Updated] (OPENNLP-936) Add thread safe versions of some tools (ME sentence detection, tokenization, pos tagging)

2017-10-24 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-936:
--
Fix Version/s: (was: 1.8.3)
   1.8.4

> Add thread safe versions of some tools (ME sentence detection, tokenization, 
> pos tagging)
> -
>
> Key: OPENNLP-936
> URL: https://issues.apache.org/jira/browse/OPENNLP-936
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: POS Tagger
>Affects Versions: 1.7.1
>Reporter: Thilo Goetz
>Priority: Minor
> Fix For: 1.8.4
>
>
> As discussed on the mailing list, add thread safe versions of maximum entropy 
> sentence detection, tokenization and pos tagging.
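
One common way to make a non-thread-safe tool usable from several threads is to give each thread its own instance. The sketch below shows that pattern with ThreadLocal; the class PerThreadTool is hypothetical, and whether the thread-safe variants discussed on the mailing list work this way is an assumption.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical wrapper: each thread lazily gets its own tool instance, so
// the tool itself does not have to be thread safe.
final class PerThreadTool<T> {
  private final ThreadLocal<T> local;

  PerThreadTool(Supplier<T> factory) {
    this.local = ThreadLocal.withInitial(factory);
  }

  // Runs the call against the calling thread's private instance.
  <R> R apply(Function<T, R> call) {
    return call.apply(local.get());
  }
}
```

The trade-off is memory: every thread that uses the wrapper holds its own instance, so the shared, immutable part (such as a loaded model) should live outside the per-thread object.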





[jira] [Updated] (OPENNLP-1144) Add support for word vector resources

2017-10-24 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1144:
---
Fix Version/s: (was: 1.8.3)
   1.8.4

> Add support for word vector resources
> -
>
> Key: OPENNLP-1144
> URL: https://issues.apache.org/jira/browse/OPENNLP-1144
> Project: OpenNLP
>  Issue Type: Improvement
>Reporter: Joern Kottmann
> Fix For: 1.8.4
>
>
> It would be nice to have support for word vector resources and parsing 
> support for the most common formats.





[jira] [Updated] (OPENNLP-1140) Add 20 newsgroups format support

2017-10-24 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1140:
---
Fix Version/s: (was: 1.8.3)
   1.8.4

> Add 20 newsgroups format support
> 
>
> Key: OPENNLP-1140
> URL: https://issues.apache.org/jira/browse/OPENNLP-1140
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Formats
>Reporter: Tommaso Teofili
> Fix For: 1.8.4
>
>
> It'd be nice to have support for the [20 
> newsgroups|http://qwone.com/~jason/20Newsgroups/] format, especially for 
> evaluating {{DocCat}} models.





[jira] [Updated] (OPENNLP-1082) SentenceSampleStream should add EOS to samples if missing

2017-10-24 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1082:
---
Fix Version/s: (was: 1.8.3)
   1.8.4

> SentenceSampleStream should add EOS to samples if missing
> -
>
> Key: OPENNLP-1082
> URL: https://issues.apache.org/jira/browse/OPENNLP-1082
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Sentence Detector
>Reporter: William Colen
>Assignee: William Colen
> Fix For: 1.8.4
>
>






[jira] [Updated] (OPENNLP-47) Rewrite the CONLL06 documentation based on the tutorial

2017-10-24 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-47:
-
Fix Version/s: (was: 1.8.3)
   1.8.4

> Rewrite the CONLL06 documentation based on the tutorial
> ---
>
> Key: OPENNLP-47
> URL: https://issues.apache.org/jira/browse/OPENNLP-47
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: tools-1.5.1-incubating
>Reporter: Joern Kottmann
>  Labels: help-wanted
> Fix For: 1.8.4
>
>
> The CONLL06 documentation should be rewritten to reflect the new converters
> which have been added to OpenNLP since it was first written.





[jira] [Updated] (OPENNLP-1113) Identify why some eval tests fail on AMD processors

2017-10-24 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1113:
---
Fix Version/s: (was: 1.8.3)
   1.8.4

> Identify why some eval tests fail on AMD processors
> ---
>
> Key: OPENNLP-1113
> URL: https://issues.apache.org/jira/browse/OPENNLP-1113
> Project: OpenNLP
>  Issue Type: Test
>Affects Versions: 1.8.1
>Reporter: Jeff Zemerick
>Assignee: Jeff Zemerick
>Priority: Minor
> Fix For: 1.8.4
>
> Attachments: failure.txt, success.txt
>
>
> When running the eval-tests for the 1.8.1 tag some of the tests consistently 
> fail on an EC2 instance. On another virtual machine the tests consistently 
> pass. When the tests fail the failures are consistent with the following:
> {quote}Failed tests:
>   ArvoresDeitadasEval.evalPortugueseChunkerQnMultipleThreads:208->chunkerCrossEval:128
>     expected:<0.9649180953528779> but was:<0.9650518197155942>
>   ArvoresDeitadasEval.evalPortugueseSentenceDetectorMaxentQn:143->sentenceCrossEval:90
>     expected:<0.99261110833375> but was:<0.9927505074644777>
>   Conll02NameFinderEval.evalSpanishOrganizationMaxentQn:390->eval:90
>     expected:<0.682961897915169> but was:<0.6798418972332015>
>   ConllXPosTaggerEval.evalSwedishMaxentQn:152->eval:76
>     expected:<0.9347595473833098> but was:<0.9322842998585573>{quote}
> Both systems are Ubuntu 16.04.2 running OpenJDK 1.8.0_131 but there must be 
> some other differences affecting the tests. Those differences need to be 
> identified.
> *VM1 (Tests Consistently _Pass_)*
> Apache Maven 3.3.9
> Maven home: /usr/share/maven
> Java version: 1.8.0_131, vendor: Oracle Corporation
> Java home: /usr/lib/jvm/java-8-openjdk-amd64/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "linux", version: "4.4.0-1022-aws", arch: "amd64", family: "unix"
> LANG=en_US.UTF-8
> *VM2 (Tests Consistently _Fail_)*
> Apache Maven 3.3.9
> Maven home: /usr/share/maven
> Java version: 1.8.0_131, vendor: Oracle Corporation
> Java home: /usr/lib/jvm/java-8-openjdk-amd64/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "linux", version: "4.4.0-83-generic", arch: "amd64", family: "unix"
> LANG=en_US.UTF-8
> This VM also consistently fails when using Oracle JDK:
> Java version: 1.8.0_131, vendor: Oracle Corporation
> Java home: /usr/lib/jvm/java-8-oracle/jre
> *VM3 (Tests Consistently _Pass_)*
> Apache Maven 3.3.9
> Maven home: /usr/share/maven
> Java version: 1.8.0_131, vendor: Oracle Corporation
> Java home: /usr/lib/jvm/java-8-openjdk-amd64/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "linux", version: "4.4.0-83-generic", arch: "amd64", family: "unix"
> *VM4 (Tests Consistently _Fail_)*
> Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 
> 2015-11-10T11:41:47-05:00)
> Maven home: C:\Program Files (x86)\maven\bin\..
> Java version: 1.8.0_92, vendor: Oracle Corporation
> Java home: C:\Program Files\Java\jdk1.8.0_92\jre
> Default locale: en_US, platform encoding: Cp1252
> OS name: "windows 10", version: "10.0", arch: "amd64", family: "dos"





[jira] [Resolved] (OPENNLP-1151) All Sample objects should implement Serializable for easy integration into other tools

2017-10-24 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved OPENNLP-1151.

Resolution: Fixed

> All Sample objects should implement Serializable for easy integration into 
> other tools
> -
>
> Key: OPENNLP-1151
> URL: https://issues.apache.org/jira/browse/OPENNLP-1151
> Project: OpenNLP
>  Issue Type: Bug
>Reporter: Joern Kottmann
>Assignee: Suneel Marthi
>Priority: Minor
> Fix For: 1.8.3
>
>
> State-of-the-art frameworks like Apache Flink require that objects be 
> serializable before they can be used in a pipeline. To use such a framework 
> to prepare training data for OpenNLP, all Sample objects should implement 
> Serializable.
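
As a rough illustration of the requirement, the sketch below round-trips a sample-like object through standard Java serialization, which is roughly what a framework does when it ships objects between workers. SampleSketch is a hypothetical class invented for this example and is not one of OpenNLP's Sample classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sample type; implementing Serializable is the point of the issue.
final class SampleSketch implements Serializable {
  private static final long serialVersionUID = 1L;
  private final String text;

  SampleSketch(String text) {
    this.text = text;
  }

  String getText() {
    return text;
  }
}

class SerializationRoundTrip {
  // Writes the object to a byte array and reads it back as a new instance.
  static <T extends Serializable> T roundTrip(T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        @SuppressWarnings("unchecked")
        T copy = (T) in.readObject();
        return copy;
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("Round trip failed", e);
    }
  }

  public static void main(String[] args) {
    SampleSketch copy = roundTrip(new SampleSketch("Hello world ."));
    System.out.println(copy.getText());
  }
}
```

Without the Serializable marker on the class, writeObject would throw a NotSerializableException, which is the failure mode the issue aims to avoid.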





[jira] [Updated] (OPENNLP-976) Add formats support for germeval2014

2017-10-23 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-976:
--
Fix Version/s: (was: 1.8.3)

> Add formats support for germeval2014
> 
>
> Key: OPENNLP-976
> URL: https://issues.apache.org/jira/browse/OPENNLP-976
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Formats
>Reporter: Joern Kottmann
>Assignee: Suneel Marthi
>
> Details about the format can be found here:
> https://sites.google.com/site/germeval2014ner/data





[jira] [Resolved] (OPENNLP-1148) use StandardCharsets.UTF_8 in doc

2017-10-23 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved OPENNLP-1148.

Resolution: Fixed

> use StandardCharsets.UTF_8 in doc
> -
>
> Key: OPENNLP-1148
> URL: https://issues.apache.org/jira/browse/OPENNLP-1148
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.8.2
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Trivial
> Fix For: 1.8.3
>
>
> In the doc, the use of PlainTextByLineStream() is inconsistent. Besides 
> passing StandardCharsets.UTF_8 as its second parameter, the following 
> variations appear:
> - String "UTF-8"
> - StandardCharsets.UTF8 (not UTF_8)
> - Charset.forName("UTF-8")
> Let's standardize on StandardCharsets.UTF_8.
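
The following JDK-only sketch compares two of the variants listed above. It does not call PlainTextByLineStream; the CharsetVariants class and its method names are made up for illustration. The name-based lookup and the constant resolve to the same charset, but only the constant is checked at compile time.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetVariants {
  // Name-based lookup: resolved at run time, so a misspelled name
  // only fails when this line executes.
  static Charset byName() {
    return Charset.forName("UTF-8");
  }

  // Constant: verified by the compiler, no lookup and no exception path.
  static Charset byConstant() {
    return StandardCharsets.UTF_8;
  }

  public static void main(String[] args) {
    System.out.println(byName().equals(byConstant()));
    // Both encode text identically.
    byte[] encoded = "naïve".getBytes(byConstant());
    System.out.println(new String(encoded, byName()));
  }
}
```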





[jira] [Resolved] (OPENNLP-1147) Missing URLs in doc

2017-10-23 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved OPENNLP-1147.

Resolution: Fixed

> Missing URLs in doc
> ---
>
> Key: OPENNLP-1147
> URL: https://issues.apache.org/jira/browse/OPENNLP-1147
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.8.2
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Trivial
> Fix For: 1.8.3
>
>
> While reading the name finder part of the documentation, I found some missing 
> URLs. I'd like to correct those for which I could find a current or 
> alternative URL.





[jira] [Assigned] (OPENNLP-1151) All Sample objects should implement Serializable for easy integration into other tools

2017-10-23 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned OPENNLP-1151:
--

Assignee: Suneel Marthi

> All Sample objects should implement Serializable for easy integration into 
> other tools
> -
>
> Key: OPENNLP-1151
> URL: https://issues.apache.org/jira/browse/OPENNLP-1151
> Project: OpenNLP
>  Issue Type: Bug
>Reporter: Joern Kottmann
>Assignee: Suneel Marthi
>Priority: Minor
> Fix For: 1.8.3
>
>
> State-of-the-art frameworks like Apache Flink require that objects be 
> serializable before they can be used in a pipeline. To use such a framework 
> to prepare training data for OpenNLP, all Sample objects should implement 
> Serializable.





[jira] [Closed] (OPENNLP-1116) Add Concatenate Stream method for Collections of streams

2017-07-17 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1116.
--

> Add Concatenate Stream method for Collections of streams
> 
>
> Key: OPENNLP-1116
> URL: https://issues.apache.org/jira/browse/OPENNLP-1116
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Machine Learning
>Affects Versions: 1.8.1
>Reporter: Daniel Russ
>Assignee: Daniel Russ
>Priority: Trivial
> Fix For: 1.8.2
>
>
> Minor change to opennlp.tools.util.ObjectStreamUtils. First, rename 
> createObjectStream(final ObjectStream... streams) to 
> concatenateObjectStream(final ObjectStream... streams), and add a method 
> concatenateObjectStream(final Collection streams).
> The reason is that I often pull data from multiple files. While it is 
> possible to create an array of ObjectStreams, it is easier to work with 
> Lists. The new method name is also clearer: it concatenates a list/array of 
> ObjectStreams, as opposed to createObjectStream(final Collection collection), 
> which makes an ObjectStream of the items in the collection.
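
The difference between the two operations can be sketched as follows. SimpleStream is a simplified, hypothetical stand-in for an object stream (a single read() that returns null at the end), and StreamSketch is not OpenNLP's utility class; the real signatures may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

// Simplified, hypothetical stream: read() returns null once exhausted.
interface SimpleStream<T> {
  T read();
}

final class StreamSketch {
  // Makes a stream of the ITEMS in a collection.
  static <T> SimpleStream<T> fromCollection(Collection<T> items) {
    Iterator<T> it = items.iterator();
    return () -> it.hasNext() ? it.next() : null;
  }

  // Concatenates a collection of STREAMS, reading each to its end in order.
  static <T> SimpleStream<T> concatenate(Collection<SimpleStream<T>> streams) {
    Iterator<SimpleStream<T>> remaining = streams.iterator();
    return new SimpleStream<T>() {
      private SimpleStream<T> current = remaining.hasNext() ? remaining.next() : null;

      @Override
      public T read() {
        while (current != null) {
          T item = current.read();
          if (item != null) {
            return item;
          }
          // Current stream is exhausted; move on to the next one, if any.
          current = remaining.hasNext() ? remaining.next() : null;
        }
        return null;
      }
    };
  }

  // Varargs convenience overload that delegates to the collection version.
  @SafeVarargs
  static <T> SimpleStream<T> concatenate(SimpleStream<T>... streams) {
    return concatenate(Arrays.asList(streams));
  }

  // Reads a stream to the end and collects its items.
  static <T> List<T> drain(SimpleStream<T> stream) {
    List<T> items = new ArrayList<>();
    for (T item = stream.read(); item != null; item = stream.read()) {
      items.add(item);
    }
    return items;
  }

  public static void main(String[] args) {
    List<SimpleStream<String>> perFile = new ArrayList<>();
    perFile.add(fromCollection(Arrays.asList("a1", "a2")));
    perFile.add(fromCollection(Arrays.asList("b1")));
    System.out.println(drain(concatenate(perFile)));
  }
}
```

With a List of per-file streams in hand, the collection overload avoids the detour through an array, which is the convenience the issue asks for.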





[jira] [Assigned] (OPENNLP-1116) Add Concatenate Stream method for Collections of streams

2017-07-17 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned OPENNLP-1116:
--

Assignee: Daniel Russ

> Add Concatenate Stream method for Collections of streams
> 
>
> Key: OPENNLP-1116
> URL: https://issues.apache.org/jira/browse/OPENNLP-1116
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Machine Learning
>Affects Versions: 1.8.1
>Reporter: Daniel Russ
>Assignee: Daniel Russ
>Priority: Trivial
> Fix For: 1.8.2
>
>
> Minor change to opennlp.tools.util.ObjectStreamUtils. First, rename 
> createObjectStream(final ObjectStream... streams) to 
> concatenateObjectStream(final ObjectStream... streams), and add a method 
> concatenateObjectStream(final Collection streams).
> The reason is that I often pull data from multiple files. While it is 
> possible to create an array of ObjectStreams, it is easier to work with 
> Lists. The new method name is also clearer: it concatenates a list/array of 
> ObjectStreams, as opposed to createObjectStream(final Collection collection), 
> which makes an ObjectStream of the items in the collection.





[jira] [Resolved] (OPENNLP-1117) Fix cmd line training time

2017-07-16 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved OPENNLP-1117.

Resolution: Fixed

> Fix cmd line training time
> --
>
> Key: OPENNLP-1117
> URL: https://issues.apache.org/jira/browse/OPENNLP-1117
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Command Line Interface
>Affects Versions: 1.7.1
>Reporter: Peter Thygesen
>Assignee: Peter Thygesen
>Priority: Trivial
> Fix For: 1.8.2
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The final execution time for training a model should be printed using 
> System.err not System.out.
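
A minimal sketch of the intended behavior, assuming the goal is to keep standard output free for the tool's actual results so that it can be piped or redirected. TrainingTimeReport and its message format are hypothetical and do not reproduce the OpenNLP command line code.

```java
import java.io.PrintStream;
import java.util.Locale;

class TrainingTimeReport {
  // Writes the elapsed time to the given diagnostics stream.
  static void report(long elapsedMillis, PrintStream diagnostics) {
    diagnostics.println(
        String.format(Locale.ROOT, "Execution time: %.3f seconds", elapsedMillis / 1000.0));
  }

  public static void main(String[] args) {
    long start = System.nanoTime();
    // Stand-in for real work whose result belongs on standard output.
    System.out.println("result-that-may-be-piped");
    long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
    // Timing is a diagnostic, so it goes to standard error.
    report(elapsedMillis, System.err);
  }
}
```

Running the program with standard output redirected to a file leaves only the result line in the file, while the timing still appears on the console.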





[jira] [Updated] (OPENNLP-1117) Fix cmd line training time

2017-07-16 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1117:
---
Fix Version/s: 1.8.2

> Fix cmd line training time
> --
>
> Key: OPENNLP-1117
> URL: https://issues.apache.org/jira/browse/OPENNLP-1117
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Command Line Interface
>Affects Versions: 1.7.1
>Reporter: Peter Thygesen
>Assignee: Peter Thygesen
>Priority: Trivial
> Fix For: 1.8.2
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The final execution time for training a model should be printed using 
> System.err not System.out.





[jira] [Closed] (OPENNLP-1117) Fix cmd line training time

2017-07-16 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed OPENNLP-1117.
--

> Fix cmd line training time
> --
>
> Key: OPENNLP-1117
> URL: https://issues.apache.org/jira/browse/OPENNLP-1117
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Command Line Interface
>Affects Versions: 1.7.1
>Reporter: Peter Thygesen
>Assignee: Peter Thygesen
>Priority: Trivial
> Fix For: 1.8.2
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The final execution time for training a model should be printed using 
> System.err not System.out.





[jira] [Updated] (OPENNLP-1112) Jenkins should publish daily snapshot builds

2017-07-12 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1112:
---
Summary: Jenkins should publish daily snapshot builds  (was: Travis should 
publish daily snapshot builds)

> Jenkins should publish daily snapshot builds
> 
>
> Key: OPENNLP-1112
> URL: https://issues.apache.org/jira/browse/OPENNLP-1112
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Build, Packaging and Test
>Reporter: Joern Kottmann
>Assignee: Suneel Marthi
> Fix For: 1.8.2
>
>
> Travis should publish a snapshot build every time the master branch is 
> updated.





[jira] [Resolved] (OPENNLP-1114) Update OpenNLP Release Notes

2017-07-11 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved OPENNLP-1114.

Resolution: Fixed

> Update OpenNLP Release Notes
> 
>
> Key: OPENNLP-1114
> URL: https://issues.apache.org/jira/browse/OPENNLP-1114
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.8.1
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
> Fix For: 1.8.2
>
>
> The Release Notes need to be updated to account for the changes to the web 
> site code that need to happen prior to a Release announcement.





[jira] [Created] (OPENNLP-1114) Update OpenNLP Release Notes

2017-07-08 Thread Suneel Marthi (JIRA)
Suneel Marthi created OPENNLP-1114:
--

 Summary: Update OpenNLP Release Notes
 Key: OPENNLP-1114
 URL: https://issues.apache.org/jira/browse/OPENNLP-1114
 Project: OpenNLP
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.8.1
Reporter: Suneel Marthi
Assignee: Suneel Marthi
 Fix For: 1.8.2


The Release Notes need to be updated to account for the changes to the web site 
code that need to happen prior to a Release announcement.





[jira] [Updated] (OPENNLP-47) Rewrite the CONLL06 documentation based on the tutorial

2017-07-08 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-47:
-
Fix Version/s: 1.8.2

> Rewrite the CONLL06 documentation based on the tutorial
> ---
>
> Key: OPENNLP-47
> URL: https://issues.apache.org/jira/browse/OPENNLP-47
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: tools-1.5.1-incubating
>Reporter: Joern Kottmann
>  Labels: help-wanted
> Fix For: 1.8.2
>
>
> The CONLL06 documentation should be rewritten to reflect the new converters
> which have been added to OpenNLP since it was first written.





[jira] [Updated] (OPENNLP-976) Add formats support for germeval2014

2017-07-08 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-976:
--
Fix Version/s: 1.8.2

> Add formats support for germeval2014
> 
>
> Key: OPENNLP-976
> URL: https://issues.apache.org/jira/browse/OPENNLP-976
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Formats
>Reporter: Joern Kottmann
>Assignee: Suneel Marthi
> Fix For: 1.8.2
>
>
> Details about the format can be found here:
> https://sites.google.com/site/germeval2014ner/data





[jira] [Updated] (OPENNLP-1106) Update the coref code to compile against 1.6.0

2017-07-08 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1106:
---
Fix Version/s: 1.8.2

> Update the coref code to compile against 1.6.0
> --
>
> Key: OPENNLP-1106
> URL: https://issues.apache.org/jira/browse/OPENNLP-1106
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Coref
>Reporter: Joern Kottmann
>Assignee: Joern Kottmann
> Fix For: 1.8.2
>
>
> It would be nice if the coref code compiled against an older release 
> version and were updated a bit so that it mostly complies with the 
> checkstyle rules.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (OPENNLP-1082) SentenceSampleStream should add EOS to samples if missing

2017-07-08 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1082:
---
Fix Version/s: 1.8.2

> SentenceSampleStream should add EOS to samples if missing
> -
>
> Key: OPENNLP-1082
> URL: https://issues.apache.org/jira/browse/OPENNLP-1082
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: Sentence Detector
>Reporter: William Colen
>Assignee: William Colen
> Fix For: 1.8.2
>
>






[jira] [Updated] (OPENNLP-1113) evalPortugueseChunkerQnMultipleThreads and other tests can fail

2017-07-08 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated OPENNLP-1113:
---
Fix Version/s: 1.8.2

> evalPortugueseChunkerQnMultipleThreads and other tests can fail
> ---
>
> Key: OPENNLP-1113
> URL: https://issues.apache.org/jira/browse/OPENNLP-1113
> Project: OpenNLP
>  Issue Type: Test
>Affects Versions: 1.8.1
>Reporter: Jeff Zemerick
>Assignee: Jeff Zemerick
>Priority: Minor
> Fix For: 1.8.2
>
> Attachments: failure.txt, success.txt
>
>
> When running the eval-tests for the 1.8.1 tag some of the tests consistently 
> fail on an EC2 instance. On another virtual machine the tests consistently 
> pass. When the tests fail the failures are consistent with the following:
> {quote}Failed tests:
>   ArvoresDeitadasEval.evalPortugueseChunkerQnMultipleThreads:208->chunkerCrossEval:128
>     expected:<0.9649180953528779> but was:<0.9650518197155942>
>   ArvoresDeitadasEval.evalPortugueseSentenceDetectorMaxentQn:143->sentenceCrossEval:90
>     expected:<0.99261110833375> but was:<0.9927505074644777>
>   Conll02NameFinderEval.evalSpanishOrganizationMaxentQn:390->eval:90
>     expected:<0.682961897915169> but was:<0.6798418972332015>
>   ConllXPosTaggerEval.evalSwedishMaxentQn:152->eval:76
>     expected:<0.9347595473833098> but was:<0.9322842998585573>{quote}
> Both systems are Ubuntu 16.04.2 running OpenJDK 1.8.0_131 but there must be 
> some other differences affecting the tests. Those differences need to be 
> identified.
> *VM1 (Tests Consistently _Pass_)*
> Apache Maven 3.3.9
> Maven home: /usr/share/maven
> Java version: 1.8.0_131, vendor: Oracle Corporation
> Java home: /usr/lib/jvm/java-8-openjdk-amd64/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "linux", version: "4.4.0-1022-aws", arch: "amd64", family: "unix"
> LANG=en_US.UTF-8
> *VM2 (Tests Consistently _Fail_)*
> Apache Maven 3.3.9
> Maven home: /usr/share/maven
> Java version: 1.8.0_131, vendor: Oracle Corporation
> Java home: /usr/lib/jvm/java-8-openjdk-amd64/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "linux", version: "4.4.0-83-generic", arch: "amd64", family: "unix"
> LANG=en_US.UTF-8
> This VM also consistently fails when using Oracle JDK:
> Java version: 1.8.0_131, vendor: Oracle Corporation
> Java home: /usr/lib/jvm/java-8-oracle/jre
> *VM3 (Tests Consistently _Pass_)*
> Apache Maven 3.3.9
> Maven home: /usr/share/maven
> Java version: 1.8.0_131, vendor: Oracle Corporation
> Java home: /usr/lib/jvm/java-8-openjdk-amd64/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "linux", version: "4.4.0-83-generic", arch: "amd64", family: "unix"




