[jira] [Assigned] (OPENNLP-951) Add a central RAT-exclude file
[ https://issues.apache.org/jira/browse/OPENNLP-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi reassigned OPENNLP-951:
-------------------------------------

    Assignee: Suneel Marthi

> Add a central RAT-exclude file
> ------------------------------
>
> Key: OPENNLP-951
> URL: https://issues.apache.org/jira/browse/OPENNLP-951
> Project: OpenNLP
> Issue Type: Improvement
> Components: Build, Packaging and Test
> Affects Versions: 1.7.1
> Reporter: William Colen
> Assignee: Suneel Marthi
> Fix For: 1.7.2
>
> The project needs a central rat-exclude file, configured in the parent pom.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (OPENNLP-952) Add checkstyle rule to verify presence of AL header
Joern Kottmann created OPENNLP-952:
-----------------------------------

Summary: Add checkstyle rule to verify presence of AL header
Key: OPENNLP-952
URL: https://issues.apache.org/jira/browse/OPENNLP-952
Project: OpenNLP
Issue Type: Improvement
Components: Build, Packaging and Test
Reporter: Joern Kottmann
Assignee: Joern Kottmann
Priority: Trivial
Fix For: 1.7.2
[jira] [Created] (OPENNLP-951) Add a central RAT-exclude file
William Colen created OPENNLP-951:
----------------------------------

Summary: Add a central RAT-exclude file
Key: OPENNLP-951
URL: https://issues.apache.org/jira/browse/OPENNLP-951
Project: OpenNLP
Issue Type: Improvement
Components: Build, Packaging and Test
Affects Versions: 1.7.1
Reporter: William Colen
Fix For: 1.7.2

The project needs a central rat-exclude file, configured in the parent pom.
[jira] [Commented] (OPENNLP-950) Deprecate DocumentCategorizer.categorize(String) variants
[ https://issues.apache.org/jira/browse/OPENNLP-950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15831917#comment-15831917 ]

ASF GitHub Bot commented on OPENNLP-950:
----------------------------------------

GitHub user smarthi opened a pull request:

    https://github.com/apache/opennlp/pull/79

    OPENNLP-950: Deprecate DocumentCategorizer.categorize(String) variants

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/smarthi/opennlp OPENNLP-950

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/opennlp/pull/79.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #79

commit 145c46a234f961d90bb80d989d2d15654f6d2d49
Author: smarthi
Date: 2017-01-20T15:13:36Z

    OPENNLP-950: Deprecate DocumentCategorizer.categorize(String) variants

> Deprecate DocumentCategorizer.categorize(String) variants
> ---------------------------------------------------------
>
> Key: OPENNLP-950
> URL: https://issues.apache.org/jira/browse/OPENNLP-950
> Project: OpenNLP
> Issue Type: Improvement
> Components: Doccat
> Reporter: Joern Kottmann
> Assignee: Suneel Marthi
> Priority: Minor
> Fix For: 1.7.1
>
> The user is supposed to pass tokenized text to the Document Categorizer,
> therefore the methods and mechanisms which deal with untokenized text should
> be deprecated so they can be removed in the future.
> DocumentCategorizer.categorize(String)
> DocumentCategorizer.categorize(String, Map)
> And also DoccatFactory tokenizer methods.
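For readers following OPENNLP-950, here is a minimal sketch of the deprecation pattern the issue describes: the String-based overload is marked deprecated and callers are steered toward passing tokens. The `Categorizer` interface and `KeywordCategorizer` class are hypothetical stand-ins invented for illustration. They are not the OpenNLP API, and the whitespace split is only an assumed default.

```java
import java.util.Arrays;

public class DeprecationSketch {

    /** Hypothetical categorizer interface for illustration; NOT the OpenNLP API. */
    interface Categorizer {

        /** Preferred entry point: the caller supplies already-tokenized text. */
        double[] categorize(String[] tokens);

        /**
         * Convenience overload for untokenized text.
         *
         * @deprecated tokenize the text yourself and call {@link #categorize(String[])}.
         */
        @Deprecated
        default double[] categorize(String text) {
            // Assumed fallback: plain whitespace splitting.
            return categorize(text.trim().split("\\s+"));
        }
    }

    /** Toy implementation: scores the share of tokens equal to "good". */
    static final class KeywordCategorizer implements Categorizer {
        @Override
        public double[] categorize(String[] tokens) {
            long hits = Arrays.stream(tokens).filter("good"::equalsIgnoreCase).count();
            double positive = tokens.length == 0 ? 0.0 : (double) hits / tokens.length;
            // Index 0 = "positive" score, index 1 = "other" score.
            return new double[] {positive, 1.0 - positive};
        }
    }

    public static void main(String[] args) {
        Categorizer categorizer = new KeywordCategorizer();
        // The caller tokenizes first, then categorizes the tokens.
        String[] tokens = "a good movie overall".split("\\s+");
        System.out.println(Arrays.toString(categorizer.categorize(tokens)));
    }
}
```

Keeping the deprecated overload as a thin delegate means existing callers continue to compile (with a warning) until the method is removed in a later release.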
[jira] [Commented] (OPENNLP-936) Add thread safe versions of some tools (ME sentence detection, tokenization, pos tagging)
[ https://issues.apache.org/jira/browse/OPENNLP-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15831857#comment-15831857 ]

Joern Kottmann commented on OPENNLP-936:
----------------------------------------

Let's move this one release ahead so we can invest more time in it. The SentenceDetectorME and TokenizerME implementations could just be made thread safe. There is also a lack of documentation explaining what is thread safe and what is not.

We should invest a bit more into this, and sorry to Thilo for making you wait a few more weeks until 1.7.2 is out.

> Add thread safe versions of some tools (ME sentence detection, tokenization, pos tagging)
> -----------------------------------------------------------------------------------------
>
> Key: OPENNLP-936
> URL: https://issues.apache.org/jira/browse/OPENNLP-936
> Project: OpenNLP
> Issue Type: Improvement
> Components: POS Tagger
> Affects Versions: 1.7.1
> Reporter: Thilo Goetz
> Priority: Minor
> Fix For: 1.7.2
>
> As discussed on the mailing list, add thread safe versions of maximum entropy
> sentence detection, tokenization and pos tagging.
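As a rough illustration of what a "thread safe version" of a stateful tool can look like, the sketch below wraps a non-thread-safe tokenizer in a `ThreadLocal` so that each thread works on its own instance. This is one common approach and not necessarily the one the project chose. `SimpleTokenizer`, `StatefulTokenizer`, and `ThreadLocalTokenizer` are hypothetical names invented for this example, not OpenNLP classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class ThreadSafeToolSketch {

    /** Hypothetical stand-in for a tokenizer; NOT the OpenNLP interface. */
    interface SimpleTokenizer {
        String[] tokenize(String text);
    }

    /** Deliberately stateful implementation: the shared buffer makes it unsafe to share across threads. */
    static final class StatefulTokenizer implements SimpleTokenizer {
        private final List<String> buffer = new ArrayList<>();

        @Override
        public String[] tokenize(String text) {
            buffer.clear();
            for (String token : text.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    buffer.add(token);
                }
            }
            return buffer.toArray(new String[0]);
        }
    }

    /** Wrapper that lazily creates one delegate per calling thread. */
    static final class ThreadLocalTokenizer implements SimpleTokenizer {
        private final ThreadLocal<SimpleTokenizer> delegate;

        ThreadLocalTokenizer(Supplier<SimpleTokenizer> factory) {
            this.delegate = ThreadLocal.withInitial(factory);
        }

        @Override
        public String[] tokenize(String text) {
            // Each thread only ever touches its own delegate instance.
            return delegate.get().tokenize(text);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleTokenizer shared = new ThreadLocalTokenizer(StatefulTokenizer::new);
        Runnable task = () -> System.out.println(
                Thread.currentThread().getName() + " " + Arrays.toString(shared.tokenize("a b c")));
        Thread first = new Thread(task, "worker-1");
        Thread second = new Thread(task, "worker-2");
        first.start();
        second.start();
        first.join();
        second.join();
    }
}
```

The trade-off is memory: every thread that calls the wrapper holds its own delegate, which matters if the underlying tool carries a large model or cache.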
[jira] [Updated] (OPENNLP-936) Add thread safe versions of some tools (ME sentence detection, tokenization, pos tagging)
[ https://issues.apache.org/jira/browse/OPENNLP-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joern Kottmann updated OPENNLP-936:
----------------------------------

    Fix Version/s: (was: 1.7.1)
                   1.7.2

> Add thread safe versions of some tools (ME sentence detection, tokenization, pos tagging)
> -----------------------------------------------------------------------------------------
>
> Key: OPENNLP-936
> URL: https://issues.apache.org/jira/browse/OPENNLP-936
> Project: OpenNLP
> Issue Type: Improvement
> Components: POS Tagger
> Affects Versions: 1.7.1
> Reporter: Thilo Goetz
> Priority: Minor
> Fix For: 1.7.2
>
> As discussed on the mailing list, add thread safe versions of maximum entropy
> sentence detection, tokenization and pos tagging.
[jira] [Closed] (OPENNLP-949) Extend eval tests to run more ml algorithms
[ https://issues.apache.org/jira/browse/OPENNLP-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joern Kottmann closed OPENNLP-949.
---------------------------------

    Resolution: Fixed

> Extend eval tests to run more ml algorithms
> -------------------------------------------
>
> Key: OPENNLP-949
> URL: https://issues.apache.org/jira/browse/OPENNLP-949
> Project: OpenNLP
> Issue Type: Improvement
> Components: Build, Packaging and Test
> Reporter: Joern Kottmann
> Assignee: Joern Kottmann
> Priority: Minor
> Fix For: 1.7.1
[jira] [Comment Edited] (OPENNLP-697) Tokenizer class is hardcoded in the DocumentSampleStream class.
[ https://issues.apache.org/jira/browse/OPENNLP-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15831812#comment-15831812 ]

Joern Kottmann edited comment on OPENNLP-697 at 1/20/17 2:25 PM:
-----------------------------------------------------------------

That is how it should be. The tokenizer should be removed from the factory; we will address this in OPENNLP-950.

was (Author: joern):
That is how it should be. The tokenizer should be removed from the factory, we will

> Tokenizer class is hardcoded in the DocumentSampleStream class.
> ---------------------------------------------------------------
>
> Key: OPENNLP-697
> URL: https://issues.apache.org/jira/browse/OPENNLP-697
> Project: OpenNLP
> Issue Type: Bug
> Components: Doccat, Tokenizer
> Affects Versions: 1.6.0
> Reporter: Praveena B
> Fix For: 1.7.1
>
> While training the DocumentCategorizerME it is possible to set the type of
> Tokenizer that the categorizer should use,
> i.e. doccatFactory.setTokenizer(SemicolonTokenizer.INSTANCE);
> But the Tokenizer class is hardcoded to WhitespaceTokenizer in the
> DocumentSampleStream class.
> So it is not possible to modify the default tokenizing behaviour even after
> setting it in the doccatFactory.
[jira] [Resolved] (OPENNLP-697) Tokenizer class is hardcoded in the DocumentSampleStream class.
[ https://issues.apache.org/jira/browse/OPENNLP-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi resolved OPENNLP-697.
----------------------------------

    Resolution: Won't Fix
    Fix Version/s: 1.7.1

> Tokenizer class is hardcoded in the DocumentSampleStream class.
> ---------------------------------------------------------------
>
> Key: OPENNLP-697
> URL: https://issues.apache.org/jira/browse/OPENNLP-697
> Project: OpenNLP
> Issue Type: Bug
> Components: Doccat, Tokenizer
> Affects Versions: 1.6.0
> Reporter: Praveena B
> Fix For: 1.7.1
>
> While training the DocumentCategorizerME it is possible to set the type of
> Tokenizer that the categorizer should use,
> i.e. doccatFactory.setTokenizer(SemicolonTokenizer.INSTANCE);
> But the Tokenizer class is hardcoded to WhitespaceTokenizer in the
> DocumentSampleStream class.
> So it is not possible to modify the default tokenizing behaviour even after
> setting it in the doccatFactory.
[jira] [Commented] (OPENNLP-949) Extend eval tests to run more ml algorithms
[ https://issues.apache.org/jira/browse/OPENNLP-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15831629#comment-15831629 ]

ASF GitHub Bot commented on OPENNLP-949:
----------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/opennlp/pull/77

> Extend eval tests to run more ml algorithms
> -------------------------------------------
>
> Key: OPENNLP-949
> URL: https://issues.apache.org/jira/browse/OPENNLP-949
> Project: OpenNLP
> Issue Type: Improvement
> Components: Build, Packaging and Test
> Reporter: Joern Kottmann
> Assignee: Joern Kottmann
> Priority: Minor
> Fix For: 1.7.1