[jira] [Commented] (OPENNLP-840) Sentiment Analysis

2017-06-26 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064092#comment-16064092
 ] 

Chris A. Mattmann commented on OPENNLP-840:
---

OK, all tests pass and the build works:

{noformat}

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-install-plugin:2.5.2:install (default-install) @ opennlp-distr ---
[INFO] Installing /Users/mattmann/git/opennlp/opennlp-distr/pom.xml to /Users/mattmann/.m2/repository/org/apache/opennlp/opennlp-distr/1.8.1-SNAPSHOT/opennlp-distr-1.8.1-SNAPSHOT.pom
[INFO] Installing /Users/mattmann/git/opennlp/opennlp-distr/target/apache-opennlp-1.8.1-SNAPSHOT-bin.tar.gz to /Users/mattmann/.m2/repository/org/apache/opennlp/opennlp-distr/1.8.1-SNAPSHOT/opennlp-distr-1.8.1-SNAPSHOT-bin.tar.gz
[INFO] Installing /Users/mattmann/git/opennlp/opennlp-distr/target/apache-opennlp-1.8.1-SNAPSHOT-bin.zip to /Users/mattmann/.m2/repository/org/apache/opennlp/opennlp-distr/1.8.1-SNAPSHOT/opennlp-distr-1.8.1-SNAPSHOT-bin.zip
[INFO] Installing /Users/mattmann/git/opennlp/opennlp-distr/target/apache-opennlp-1.8.1-SNAPSHOT-src.tar.gz to /Users/mattmann/.m2/repository/org/apache/opennlp/opennlp-distr/1.8.1-SNAPSHOT/opennlp-distr-1.8.1-SNAPSHOT-src.tar.gz
[INFO] Installing /Users/mattmann/git/opennlp/opennlp-distr/target/apache-opennlp-1.8.1-SNAPSHOT-src.zip to /Users/mattmann/.m2/repository/org/apache/opennlp/opennlp-distr/1.8.1-SNAPSHOT/opennlp-distr-1.8.1-SNAPSHOT-src.zip
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache OpenNLP Reactor . SUCCESS [  4.749 s]
[INFO] Apache OpenNLP Tools ... SUCCESS [01:07 min]
[INFO] Apache OpenNLP UIMA Annotators . SUCCESS [ 10.972 s]
[INFO] Apache OpenNLP Brat Annotator .. SUCCESS [ 11.016 s]
[INFO] Apache OpenNLP Morfologik Addon  SUCCESS [  5.650 s]
[INFO] Apache OpenNLP Documentation ... SUCCESS [  9.800 s]
[INFO] Apache OpenNLP Distribution  SUCCESS [  4.986 s]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 01:55 min
[INFO] Finished at: 2017-06-25T07:14:38-07:00
[INFO] Final Memory: 79M/1658M
[INFO] 
{noformat}

Going to commit now!

> Sentiment Analysis
> --
>
> Key: OPENNLP-840
> URL: https://issues.apache.org/jira/browse/OPENNLP-840
> Project: OpenNLP
>  Issue Type: New Feature
>Reporter: Mondher Bouazizi
>Assignee: Chris A. Mattmann
>  Labels: gsoc, gsoc2016, nlp
>
> The objective of the "Sentiment Analysis" component is to determine the 
> sentiment of the author towards the object of his text.
> Different techniques are proposed in the academic literature, and some state 
> of the art approaches present very high accuracy.
> Sentiment analysis can have different granularity levels:
> - Binary classification: in this case, the text is to be classified into two 
> classes which are "positive" and "negative".
> - Ternary classification: in addition to the two classes present in the 
> binary classification, a third class is added which is "neutral".
> - Multi-class sentiment analysis: the two classes "positive" and "negative" 
> are further divided into sub-classes (e.g., "love", "happiness", etc. for the 
> positive class; and "hate", "anger", etc. for the negative class). Therefore 
> the classification objective is to determine the sentiment sub-class instead 
> of the main polarity.
> In this component, we will implement some of the state of the art approaches, 
> in particular the one presented here[1]. These approaches use machine-learning 
> techniques to learn a classifier from labeled training sets.
> ---
> [1] http://www.ieice.org/ken/paper/20160129DbfF/eng/



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OPENNLP-1098) Create a web page for 'Books-Tutorials-Talks'

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063897#comment-16063897
 ] 

ASF GitHub Bot commented on OPENNLP-1098:
-

GitHub user smarthi opened a pull request:

https://github.com/apache/opennlp-site/pull/22

OPENNLP-1098: Create a web page for 'Books-Tutorials-Talks'



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/smarthi/opennlp-site master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp-site/pull/22.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22


commit 419f89c8511cf421c3c7fe54611529115fbd0462
Author: smarthi 
Date:   2017-06-26T22:16:36Z

OPENNLP-1098: Create a web page for 'Books-Tutorials-Talks'




> Create a web page for 'Books-Tutorials-Talks' 
> --
>
> Key: OPENNLP-1098
> URL: https://issues.apache.org/jira/browse/OPENNLP-1098
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Website
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
> Fix For: 1.8.1
>
>






[jira] [Commented] (OPENNLP-1098) Create a web page for 'Books-Tutorials-Talks'

2017-06-26 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063892#comment-16063892
 ] 

Suneel Marthi commented on OPENNLP-1098:


PR on the way

> Create a web page for 'Books-Tutorials-Talks' 
> --
>
> Key: OPENNLP-1098
> URL: https://issues.apache.org/jira/browse/OPENNLP-1098
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Website
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
> Fix For: 1.8.1
>
>






[jira] [Assigned] (OPENNLP-1092) PosTagger serialization in namefinder model

2017-06-26 Thread Joern Kottmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joern Kottmann reassigned OPENNLP-1092:
---

Assignee: Joern Kottmann

> PosTagger serialization in namefinder model
> ---
>
> Key: OPENNLP-1092
> URL: https://issues.apache.org/jira/browse/OPENNLP-1092
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Name Finder
>Affects Versions: 1.8.0, 1.8.1
> Environment: Ubuntu 16.04 - Intel Core i7 6700k - Openjdk version 
> 1.8.0_131
>Reporter: Damiano Porta
>Assignee: Joern Kottmann
> Fix For: 1.8.1
>
>
> I am getting an error during the serialization of the POS tagger inside a 
> name finder model.
> The error is: *java.lang.IllegalStateException: Missing serializer for 
> postagger.bin*
> I am having this problem via API and via cmd NameFinderTrainer tool.
> The command is:
> *opennlp TokenNameFinderTrainer -data /home/damiano/corpus.train -lang it 
> -model /home/damiano/model.bin -featuregen /home/damiano/test.xml 
> -sequenceCodec BIO -resources 
> /home/damiano/lavoro/java/Parser/src/main/resources/*
> The output is:
> {code}
> Writing name finder model ... Compressed 885605 parameters to 94030
> 3451 outcome patterns
> Exception in thread "main" java.lang.IllegalStateException: Missing 
> serializer for postagger.bin
>   at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:592)
>   at opennlp.tools.cmdline.CmdLineUtil.writeModel(CmdLineUtil.java:182)
>   at 
> opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:188)
>   at opennlp.tools.cmdline.CLI.main(CLI.java:244)
> {code}
> My generators.xml is:
> {code:xml}
> (XML content not preserved in the archive)
> {code}





[jira] [Commented] (OPENNLP-1092) PosTagger serialization in namefinder model

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063161#comment-16063161
 ] 

ASF GitHub Bot commented on OPENNLP-1092:
-

GitHub user kottmann opened a pull request:

https://github.com/apache/opennlp/pull/237

OPENNLP-1092: Fix pos model serialization bug

Thank you for contributing to Apache OpenNLP.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
 in the commit message?

- [ ] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.

- [ ] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [ ] Is your initial contribution a single, squashed commit?

### For code changes:
- [ ] Have you ensured that the full suite of tests is executed via mvn 
clean install at the root opennlp folder?
- [ ] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file in opennlp folder?
- [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found in opennlp folder?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kottmann/opennlp opennlp-1092

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/opennlp/pull/237.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #237


commit 97281ba4a752536c261bc739e165309e70363399
Author: Jörn Kottmann 
Date:   2017-06-26T14:20:09Z

OPENNLP-1092: Fix pos model serialization bug




> PosTagger serialization in namefinder model
> ---
>
> Key: OPENNLP-1092
> URL: https://issues.apache.org/jira/browse/OPENNLP-1092
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Name Finder
>Affects Versions: 1.8.0, 1.8.1
> Environment: Ubuntu 16.04 - Intel Core i7 6700k - Openjdk version 
> 1.8.0_131
>Reporter: Damiano Porta
> Fix For: 1.8.1
>
>
> I am getting an error during the serialization of the POS tagger inside a 
> name finder model.
> The error is: *java.lang.IllegalStateException: Missing serializer for 
> postagger.bin*
> I am having this problem via API and via cmd NameFinderTrainer tool.
> The command is:
> *opennlp TokenNameFinderTrainer -data /home/damiano/corpus.train -lang it 
> -model /home/damiano/model.bin -featuregen /home/damiano/test.xml 
> -sequenceCodec BIO -resources 
> /home/damiano/lavoro/java/Parser/src/main/resources/*
> The output is:
> {code}
> Writing name finder model ... Compressed 885605 parameters to 94030
> 3451 outcome patterns
> Exception in thread "main" java.lang.IllegalStateException: Missing 
> serializer for postagger.bin
>   at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:592)
>   at opennlp.tools.cmdline.CmdLineUtil.writeModel(CmdLineUtil.java:182)
>   at 
> opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:188)
>   at opennlp.tools.cmdline.CLI.main(CLI.java:244)
> {code}
> My generators.xml is:
> {code:xml}
> (XML content not preserved in the archive)
> {code}





[jira] [Closed] (OPENNLP-1096) Optimize n-gram creation loop for CPU cache usage

2017-06-26 Thread Joern Kottmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joern Kottmann closed OPENNLP-1096.
---
Resolution: Fixed

> Optimize n-gram creation loop for CPU cache usage
> -
>
> Key: OPENNLP-1096
> URL: https://issues.apache.org/jira/browse/OPENNLP-1096
> Project: OpenNLP
>  Issue Type: Improvement
>Reporter: Joern Kottmann
>Assignee: Joern Kottmann
>Priority: Trivial
> Fix For: 1.8.1
>
>
> There are two for loops to read the string and calculate n-grams; the loops 
> should be swapped to be more cache friendly.
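
The loop swap described above can be sketched roughly as follows. This is only an illustration, not the code from the pull request: the class and method names are hypothetical, and it assumes that one loop runs over n-gram lengths and the other over start positions in the text. Both orders produce the same n-grams, but the position-major order touches neighbouring characters together instead of sweeping the whole string once per length.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names are illustrative and not taken from OpenNLP.
class NGramLoopSketch {

  // Length-major order: for every n-gram length, sweep the whole text again.
  static List<String> lengthMajor(CharSequence text, int minLength, int maxLength) {
    List<String> grams = new ArrayList<>();
    for (int length = minLength; length <= maxLength; length++) {
      for (int start = 0; start + length <= text.length(); start++) {
        grams.add(text.subSequence(start, start + length).toString());
      }
    }
    return grams;
  }

  // Position-major order: for every start position, emit all lengths that fit.
  // The text is read front to back once, which tends to be friendlier to the cache.
  static List<String> positionMajor(CharSequence text, int minLength, int maxLength) {
    List<String> grams = new ArrayList<>();
    for (int start = 0; start < text.length(); start++) {
      for (int length = minLength;
           length <= maxLength && start + length <= text.length();
           length++) {
        grams.add(text.subSequence(start, start + length).toString());
      }
    }
    return grams;
  }

  public static void main(String[] args) {
    System.out.println(lengthMajor("abc", 1, 2));
    System.out.println(positionMajor("abc", 1, 2));
  }
}
```

The two methods return the same set of n-grams in a different order, so the swap only matters for code that does not depend on emission order (for example, when the n-grams are counted in a map).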





[jira] [Commented] (OPENNLP-1096) Optimize n-gram creation loop for CPU cache usage

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063016#comment-16063016
 ] 

ASF GitHub Bot commented on OPENNLP-1096:
-

Github user asfgit closed the pull request at:

https://github.com/apache/opennlp/pull/235


> Optimize n-gram creation loop for CPU cache usage
> -
>
> Key: OPENNLP-1096
> URL: https://issues.apache.org/jira/browse/OPENNLP-1096
> Project: OpenNLP
>  Issue Type: Improvement
>Reporter: Joern Kottmann
>Assignee: Joern Kottmann
>Priority: Trivial
> Fix For: 1.8.1
>
>
> There are two for loops to read the string and calculate n-grams; the loops 
> should be swapped to be more cache friendly.





[jira] [Updated] (OPENNLP-936) Add thread safe versions of some tools (ME sentence detection, tokenization, pos tagging)

2017-06-26 Thread Joern Kottmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joern Kottmann updated OPENNLP-936:
---
Fix Version/s: (was: 1.8.1)
   1.8.2

> Add thread safe versions of some tools (ME sentence detection, tokenization, 
> pos tagging)
> -
>
> Key: OPENNLP-936
> URL: https://issues.apache.org/jira/browse/OPENNLP-936
> Project: OpenNLP
>  Issue Type: Improvement
>  Components: POS Tagger
>Affects Versions: 1.7.1
>Reporter: Thilo Goetz
>Priority: Minor
> Fix For: 1.8.2
>
>
> As discussed on the mailing list, add thread safe versions of maximum entropy 
> sentence detection, tokenization and pos tagging.
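
The issue does not say which approach is intended. One common way to make a stateful, non-thread-safe tool usable from several threads is to keep one instance per thread behind a `ThreadLocal`. The sketch below assumes that approach; the `Tagger` interface and `ThreadSafeTagger` class are hypothetical stand-ins, not OpenNLP types.

```java
import java.util.function.Supplier;

// Hypothetical sketch; Tagger and ThreadSafeTagger are not OpenNLP classes.
class ThreadLocalToolSketch {

  // Stand-in for a tool whose instances must not be shared between threads.
  interface Tagger {
    String[] tag(String[] tokens);
  }

  // Wrapper that lazily creates one delegate per calling thread.
  static final class ThreadSafeTagger implements Tagger {
    private final ThreadLocal<Tagger> perThread;

    ThreadSafeTagger(Supplier<Tagger> factory) {
      // The factory runs at most once per thread, on that thread's first call.
      this.perThread = ThreadLocal.withInitial(factory);
    }

    @Override
    public String[] tag(String[] tokens) {
      return perThread.get().tag(tokens);
    }
  }

  public static void main(String[] args) {
    // Dummy delegate that labels every token "X"; a real factory would build
    // a new tool instance from a shared, immutable model.
    Tagger safe = new ThreadSafeTagger(() -> tokens -> {
      String[] tags = new String[tokens.length];
      java.util.Arrays.fill(tags, "X");
      return tags;
    });
    System.out.println(String.join(" ", safe.tag(new String[] {"a", "b"})));
  }
}
```

The trade-off is memory: every thread that calls the wrapper holds its own delegate, so this fits best when the expensive part (the model) is shared and only the lightweight per-call state is duplicated.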





[jira] [Updated] (OPENNLP-1091) Fixing issues found via FindBugs and warnings found via IDE

2017-06-26 Thread Joern Kottmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/OPENNLP-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joern Kottmann updated OPENNLP-1091:

Fix Version/s: 1.8.1

> Fixing issues found via FindBugs and warnings found via IDE
> ---
>
> Key: OPENNLP-1091
> URL: https://issues.apache.org/jira/browse/OPENNLP-1091
> Project: OpenNLP
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Bruno P. Kinoshita
>Assignee: Bruno P. Kinoshita
>Priority: Minor
>  Labels: findbugs, static-analysis, warnings
> Fix For: 1.8.1
>
>
> There are several issues that can be found using *FindBugs*.
> {noformat}
> mvn clean install findbugs:findbugs findbugs:gui
> {noformat}
> The _opennlp-tools_ is the only project with issues. Some are merely cosmetic, 
> or not so important. The pull request mentioned in this issue does not fix 
> all issues found, only the ones that I thought would be more important, and 
> that would not have a huge impact on the code (i.e. would not have to change 
> much of the current behaviour/code base).
> Some changes are quite useful, such as optimizations that replace string 
> concatenation and use _Map#entrySet_ instead of _Map#keySet_ + another call 
> to _Map#get_. With all the optimization changes put together, I expect we 
> should see at least a few milliseconds improvement.
> Other changes are quite important, such as comparisons with 
> _Object.equals(anArray, anotherArray)_, which will compare two objects with 
> _==_, meaning that even when the arrays are equal, it would still return 
> false.
> In the pull request, I intentionally did not squash it now, as the second 
> commit includes warnings found via the IDE (Eclipse in this case, but I 
> believe it's independent of the IDE), such as _suppressWarnings_ that are not 
> necessary, and - most important - resource leaks.
> This latter issue was fixed with Java 8 try-with-resources, mainly in tests, 
> but also in some tools.
> Cheers
> Bruno
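
The three patterns mentioned in the description can be illustrated with a small, self-contained sketch. It is not taken from the pull request: the class and method names are hypothetical, and `Objects.equals` is used here as one plausible reading of the array comparison the description refers to.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of the patterns discussed above; not OpenNLP code.
class FindBugsPatternsSketch {

  // entrySet() yields key and value together, avoiding the extra
  // Map#get lookup that iterating over keySet() would need.
  static int sumValues(Map<String, Integer> counts) {
    int total = 0;
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
      total += entry.getValue();
    }
    return total;
  }

  // Arrays.equals compares element by element.
  static boolean sameContents(int[] left, int[] right) {
    return Arrays.equals(left, right);
  }

  // Objects.equals falls back to Object#equals, which for arrays is an
  // identity check, so two distinct arrays with equal contents are "unequal".
  static boolean sameReference(int[] left, int[] right) {
    return Objects.equals(left, right);
  }

  // try-with-resources closes the reader on every exit path.
  static String firstLine(String text) {
    try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
      return reader.readLine();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    counts.put("a", 1);
    counts.put("b", 2);
    System.out.println(sumValues(counts));
    System.out.println(sameContents(new int[] {1, 2}, new int[] {1, 2}));
    System.out.println(sameReference(new int[] {1, 2}, new int[] {1, 2}));
    System.out.println(firstLine("first\nsecond"));
  }
}
```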





[jira] [Commented] (OPENNLP-1098) Create a web page for 'Books-Tutorials-Talks'

2017-06-26 Thread Bruno P. Kinoshita (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062655#comment-16062655
 ] 

Bruno P. Kinoshita commented on OPENNLP-1098:
-

Would you like to throw some initial links in here, and perhaps some categories 
too? Books, Twitter accounts, projects, talks, etc.?

> Create a web page for 'Books-Tutorials-Talks' 
> --
>
> Key: OPENNLP-1098
> URL: https://issues.apache.org/jira/browse/OPENNLP-1098
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Website
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
> Fix For: 1.8.1
>
>






[jira] [Commented] (OPENNLP-1098) Create a web page for 'Books-Tutorials-Talks'

2017-06-26 Thread Bruno P. Kinoshita (JIRA)

[ 
https://issues.apache.org/jira/browse/OPENNLP-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062653#comment-16062653
 ] 

Bruno P. Kinoshita commented on OPENNLP-1098:
-

Went to JIRA to file this issue :-) thanks for doing that.

Page linked in the chat for reference later:

http://mahout.apache.org/general/books-tutorials-and-talks.html

> Create a web page for 'Books-Tutorials-Talks' 
> --
>
> Key: OPENNLP-1098
> URL: https://issues.apache.org/jira/browse/OPENNLP-1098
> Project: OpenNLP
>  Issue Type: New Feature
>  Components: Website
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
> Fix For: 1.8.1
>
>



