[jira] [Assigned] (OPENNLP-951) Add a central RAT-exclude file

2017-01-20 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi reassigned OPENNLP-951:
-

Assignee: Suneel Marthi

> Add a central RAT-exclude file
> --
>
> Key: OPENNLP-951
> URL: https://issues.apache.org/jira/browse/OPENNLP-951
> Project: OpenNLP
> Issue Type: Improvement
> Components: Build, Packaging and Test
> Affects Versions: 1.7.1
> Reporter: William Colen
> Assignee: Suneel Marthi
> Fix For: 1.7.2
>
>
> The project needs a central RAT exclude file, and the parent POM should be configured to use it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (OPENNLP-952) Add checkstyle rule to verify presence of AL header

2017-01-20 Thread Joern Kottmann (JIRA)
Joern Kottmann created OPENNLP-952:
--

 Summary: Add checkstyle rule to verify presence of AL header
 Key: OPENNLP-952
 URL: https://issues.apache.org/jira/browse/OPENNLP-952
 Project: OpenNLP
  Issue Type: Improvement
  Components: Build, Packaging and Test
Reporter: Joern Kottmann
Assignee: Joern Kottmann
Priority: Trivial
 Fix For: 1.7.2








[jira] [Created] (OPENNLP-951) Add a central RAT-exclude file

2017-01-20 Thread William Colen (JIRA)
William Colen created OPENNLP-951:
-

 Summary: Add a central RAT-exclude file
 Key: OPENNLP-951
 URL: https://issues.apache.org/jira/browse/OPENNLP-951
 Project: OpenNLP
  Issue Type: Improvement
  Components: Build, Packaging and Test
Affects Versions: 1.7.1
Reporter: William Colen
 Fix For: 1.7.2


The project needs a central RAT exclude file, and the parent POM should be configured to use it.





[jira] [Commented] (OPENNLP-950) Deprecate DocumentCategorizer.categorize(String) variants

2017-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831917#comment-15831917
 ] 

ASF GitHub Bot commented on OPENNLP-950:


GitHub user smarthi opened a pull request:

https://github.com/apache/opennlp/pull/79

OPENNLP-950: Deprecate DocumentCategorizer.categorize(String) variants



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/smarthi/opennlp OPENNLP-950

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/79.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #79


commit 145c46a234f961d90bb80d989d2d15654f6d2d49
Author: smarthi 
Date:   2017-01-20T15:13:36Z

OPENNLP-950: Deprecate DocumentCategorizer.categorize(String) variants




> Deprecate DocumentCategorizer.categorize(String) variants
> -
>
> Key: OPENNLP-950
> URL: https://issues.apache.org/jira/browse/OPENNLP-950
> Project: OpenNLP
> Issue Type: Improvement
> Components: Doccat
> Reporter: Joern Kottmann
> Assignee: Suneel Marthi
> Priority: Minor
> Fix For: 1.7.1
>
>
> The user is supposed to pass tokenized text to the Document Categorizer. 
> The methods and mechanisms that deal with untokenized text should therefore 
> be deprecated so that they can be removed in the future:
> DocumentCategorizer.categorize(String)
> DocumentCategorizer.categorize(String, Map)
> The DoccatFactory tokenizer methods should be deprecated as well.
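The calling pattern described above, where the caller tokenizes first and hands only tokens to the categorizer, might look roughly like the sketch below. Everything in it is an assumption made for illustration: `TokenCategorizer`, `PreTokenizedExample` and the toy rule are hypothetical stand-ins, not the OpenNLP `DocumentCategorizer` API, and plain whitespace splitting stands in for whatever tokenizer a project actually uses.

```java
import java.util.Arrays;

// Hypothetical stand-in for a categorizer that accepts only tokens.
// This is NOT the OpenNLP DocumentCategorizer interface.
interface TokenCategorizer {
    String bestCategory(String[] tokens);
}

class PreTokenizedExample {

    // Tokenization is the caller's job: split on runs of whitespace.
    static String[] tokenize(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Toy rule for illustration only: "sports" if any token is "goal".
    static final TokenCategorizer TOY = tokens -> {
        for (String token : tokens) {
            if (token.equalsIgnoreCase("goal")) {
                return "sports";
            }
        }
        return "other";
    };

    public static void main(String[] args) {
        // The categorizer never sees the raw string, only the tokens.
        String[] tokens = tokenize("  Late goal wins the match ");
        System.out.println(Arrays.toString(tokens));
        System.out.println(TOY.bestCategory(tokens));
    }
}
```

Keeping tokenization outside the categorizer means the same tokens can be used for training and for classification, which is the consistency the deprecation is aiming for.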





[jira] [Commented] (OPENNLP-936) Add thread safe versions of some tools (ME sentence detection, tokenization, pos tagging)

2017-01-20 Thread Joern Kottmann (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831857#comment-15831857
 ] 

Joern Kottmann commented on OPENNLP-936:


Let's move this to the next release so we can invest more time in it. The 
SentenceDetectorME and TokenizerME implementations could simply be made thread 
safe. There is also a lack of documentation explaining what is thread safe 
and what is not.

We should invest a bit more in this. Sorry, Thilo, for making you wait a few 
more weeks until 1.7.2 is out.
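One common way to offer a thread-safe variant of a stateful tool is to give each thread its own instance behind a shared facade. The sketch below shows that general technique only; it is not taken from the OpenNLP code base, and `Splitter`, `CountingSplitter` and `ThreadLocalSplitter` are hypothetical names invented for the illustration.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a stateful tool; NOT an OpenNLP interface.
interface Splitter {
    String[] split(String text);
}

// Toy implementation with unsynchronized mutable state, so a single
// instance must not be shared between threads.
final class CountingSplitter implements Splitter {
    private int calls;

    @Override
    public String[] split(String text) {
        calls++;
        return text.trim().split("\\s+");
    }

    int calls() {
        return calls;
    }
}

// Thread-safe facade: each thread lazily gets its own delegate instance.
final class ThreadLocalSplitter implements Splitter {
    private final ThreadLocal<Splitter> perThread;

    ThreadLocalSplitter(Supplier<? extends Splitter> factory) {
        this.perThread = ThreadLocal.withInitial(factory);
    }

    @Override
    public String[] split(String text) {
        return perThread.get().split(text);
    }
}

class ThreadLocalDemo {
    public static void main(String[] args) throws InterruptedException {
        // One facade shared by both threads; two delegates are created.
        Splitter shared = new ThreadLocalSplitter(CountingSplitter::new);
        Runnable work = () -> System.out.println(shared.split("a b c").length);
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

The trade-off is memory: every thread that touches the facade holds its own delegate, so this fits long-lived worker threads better than short-lived ones.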

> Add thread safe versions of some tools (ME sentence detection, tokenization, 
> pos tagging)
> -
>
> Key: OPENNLP-936
> URL: https://issues.apache.org/jira/browse/OPENNLP-936
> Project: OpenNLP
> Issue Type: Improvement
> Components: POS Tagger
> Affects Versions: 1.7.1
> Reporter: Thilo Goetz
> Priority: Minor
> Fix For: 1.7.2
>
>
> As discussed on the mailing list, add thread safe versions of maximum entropy 
> sentence detection, tokenization and pos tagging.





[jira] [Updated] (OPENNLP-936) Add thread safe versions of some tools (ME sentence detection, tokenization, pos tagging)

2017-01-20 Thread Joern Kottmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joern Kottmann updated OPENNLP-936:
---
Fix Version/s: 1.7.2 (was: 1.7.1)

> Add thread safe versions of some tools (ME sentence detection, tokenization, 
> pos tagging)
> -
>
> Key: OPENNLP-936
> URL: https://issues.apache.org/jira/browse/OPENNLP-936
> Project: OpenNLP
> Issue Type: Improvement
> Components: POS Tagger
> Affects Versions: 1.7.1
> Reporter: Thilo Goetz
> Priority: Minor
> Fix For: 1.7.2
>
>
> As discussed on the mailing list, add thread safe versions of maximum entropy 
> sentence detection, tokenization and pos tagging.





[jira] [Closed] (OPENNLP-949) Extend eval tests to run more ml algorithms

2017-01-20 Thread Joern Kottmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joern Kottmann closed OPENNLP-949.
--
Resolution: Fixed

> Extend eval tests to run more ml algorithms
> ---
>
> Key: OPENNLP-949
> URL: https://issues.apache.org/jira/browse/OPENNLP-949
> Project: OpenNLP
> Issue Type: Improvement
> Components: Build, Packaging and Test
> Reporter: Joern Kottmann
> Assignee: Joern Kottmann
> Priority: Minor
> Fix For: 1.7.1
>
>






[jira] [Comment Edited] (OPENNLP-697) Tokenizer class is hardcoded in the DocumentSampleStream class.

2017-01-20 Thread Joern Kottmann (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831812#comment-15831812
 ] 

Joern Kottmann edited comment on OPENNLP-697 at 1/20/17 2:25 PM:
-

That is how it should be. The tokenizer should be removed from the factory; we 
will address this in OPENNLP-950.


was (Author: joern):
That is how it should be. The tokenizer should be removed from the factory, we 
will 

> Tokenizer class is hardcoded in the DocumentSampleStream class. 
> 
>
> Key: OPENNLP-697
> URL: https://issues.apache.org/jira/browse/OPENNLP-697
> Project: OpenNLP
> Issue Type: Bug
> Components: Doccat, Tokenizer
> Affects Versions: 1.6.0
> Reporter: Praveena B
> Fix For: 1.7.1
>
>
> While training the DocumentCategorizerME, it is possible to set the type of 
> Tokenizer that the categorizer should use, 
> i.e. doccatFactory.setTokenizer(SemicolonTokenizer.INSTANCE); 
> However, the Tokenizer class is hardcoded to WhitespaceTokenizer in the 
> DocumentSampleStream class, 
> so it is not possible to change the default tokenizing behaviour even after 
> setting it in the doccatFactory.





[jira] [Resolved] (OPENNLP-697) Tokenizer class is hardcoded in the DocumentSampleStream class.

2017-01-20 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved OPENNLP-697.
---
   Resolution: Won't Fix
Fix Version/s: 1.7.1

> Tokenizer class is hardcoded in the DocumentSampleStream class. 
> 
>
> Key: OPENNLP-697
> URL: https://issues.apache.org/jira/browse/OPENNLP-697
> Project: OpenNLP
> Issue Type: Bug
> Components: Doccat, Tokenizer
> Affects Versions: 1.6.0
> Reporter: Praveena B
> Fix For: 1.7.1
>
>
> While training the DocumentCategorizerME, it is possible to set the type of 
> Tokenizer that the categorizer should use, 
> i.e. doccatFactory.setTokenizer(SemicolonTokenizer.INSTANCE); 
> However, the Tokenizer class is hardcoded to WhitespaceTokenizer in the 
> DocumentSampleStream class, 
> so it is not possible to change the default tokenizing behaviour even after 
> setting it in the doccatFactory.





[jira] [Commented] (OPENNLP-949) Extend eval tests to run more ml algorithms

2017-01-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831629#comment-15831629
 ] 

ASF GitHub Bot commented on OPENNLP-949:


Github user asfgit closed the pull request at:

https://github.com/apache/opennlp/pull/77


> Extend eval tests to run more ml algorithms
> ---
>
> Key: OPENNLP-949
> URL: https://issues.apache.org/jira/browse/OPENNLP-949
> Project: OpenNLP
> Issue Type: Improvement
> Components: Build, Packaging and Test
> Reporter: Joern Kottmann
> Assignee: Joern Kottmann
> Priority: Minor
> Fix For: 1.7.1
>
>



