Re: [VOTE] Apache OpenNLP 2.0.0 Release Candidate

2022-06-01 Thread Joern Kottmann
+1 binding Thanks for all the work on this Jeff! Cheers, Jörn On Wed, Jun 1, 2022 at 9:57 PM Suneel Marthi wrote: > +1 binding > > On Wed, Jun 1, 2022 at 3:12 PM Jeff Zemerick wrote: > > > Just pinging folks on the thread about the active vote. The project has a > > board report due in a

Re: [VOTE] Apache OpenNLP 1.9.3 Release Candidate

2020-07-29 Thread Joern Kottmann
+1 Release the packages as Apache OpenNLP 1.9.3 Jörn On Wed, Jul 29, 2020 at 1:08 PM Tommaso Teofili wrote: > > +1 from me, build, sigs, tag look good. > > Regards, > Tommaso > > On Tue, 28 Jul 2020 at 10:48, Bruno P. Kinoshita wrote: > > > It worked after I imported keys from > >

Re: license for opennlp 1.5 pre-trained models

2019-12-30 Thread Joern Kottmann
Hello, The Apache OpenNLP project only distributes models that are licensed under the AL 2.0 license, or models that comply with the strict licensing requirements at Apache. So far we only release a language detection model at the Apache OpenNLP project. The OpenNLP project was hosted in the

Re: OpenNLP 1.9.2 and Java 8/11

2019-12-15 Thread Joern Kottmann
+1 lgtm, it would be nice to track down the exact cause of the changes in accuracy caused by the JDK update. We had similar issues in the past, e.g. through things like the undefined iteration order of Sets. I am happy to help with this. Jörn On Sat, Dec 14, 2019 at 3:48 PM Tommaso Teofili

Re: KStem support?

2019-02-19 Thread Joern Kottmann
Hello, we don't have it, but it would be nice to get a contribution for it. Jörn On Thu, Feb 7, 2019 at 3:03 PM Benedict Holland wrote: > > Hello all, > > I just happened to read a Solr message about using KStem. Is there any > support for this particular stemmer or would you like there to be?

Re: [VOTE] Apache OpenNLP 1.9.0 Release Candidate 2

2018-06-29 Thread Joern Kottmann
+1 Jörn On Fri, Jun 29, 2018 at 1:45 PM, Jeff Zemerick wrote: > Hi folks, > > I have posted a 2nd release candidate for the Apache OpenNLP 1.9.0 release > and it is ready for testing. > > The distributables can be downloaded from: >

Re: Custom models (for Ukrainian and Russian languages)

2018-06-28 Thread Joern Kottmann
Hello, we would be happy to hear about your experience. Did the language detector perform well enough on Russian/Ukrainian texts? To reproduce the models we train you should download the data via svn: svn co https://svn.apache.org/repos/bigdata/opennlp/trunk opennlp-corpus Note the corpus is

Re: OPENNLP-912 : Add a rule based sentence detector

2018-04-06 Thread Joern Kottmann
Hello, could you elaborate a bit on the approach? Jörn On Tue, Apr 3, 2018 at 5:24 PM, Isuranga Perera wrote: > Hi All, > > I would like to contribute $subject feature. Appreciate if anyone can guide > me through the process. > > Best Regards > Isuranga Perera

Re: [VOTE] Apache OpenNLP 1.8.4 Release Candidate

2017-12-23 Thread Joern Kottmann
+1 Jörn On Dec 21, 2017 15:44, "Jeff Zemerick" wrote: > Hi Folks, > > I have posted a first release candidate for the Apache OpenNLP 1.8.4 > release and it is ready for testing. > > The RC1 distributables can be downloaded from here: >

Re: [VOTE] Language Detector model for Apache OpenNLP 1.8.3 Release Candidate 3

2017-10-30 Thread Joern Kottmann
+1 Jörn On Mon, Oct 30, 2017 at 2:30 PM, William Colen wrote: > The Apache OpenNLP PMC would like to call for a Vote on the Language > Detector model for Apache OpenNLP 1.8.3 Release Candidate 3. > > The Release artifacts can be downloaded from: > >

Re: [VOTE] Apache OpenNLP 1.8.3 Release Candidate

2017-10-26 Thread Joern Kottmann
+1 Jörn On Thu, Oct 26, 2017 at 10:18 AM, Rodrigo Agerri wrote: > +1 (binding) > > -eval and unit tests OK > > On Wed, Oct 25, 2017 at 7:01 PM, William Colen > wrote: >> +1 binding >> >> - eval tests ok >> - unit test ok >> - build from tag ok

[ANNOUNCE] CVE-2017-12620: Apache OpenNLP XXE vulnerability

2017-10-02 Thread Joern Kottmann
Severity: Medium Vendor: The Apache Software Foundation Versions Affected: OpenNLP 1.5.0 to 1.5.3 OpenNLP 1.6.0 OpenNLP 1.7.0 to 1.7.2 OpenNLP 1.8.0 to 1.8.1 Description: When loading models or dictionaries that contain XML it is possible to perform an XXE attack, since OpenNLP is a library,

Re: [VOTE] Apache OpenNLP 1.8.2 Release Candidate 2

2017-09-15 Thread Joern Kottmann
com> wrote: > Vote: +1 binding > > >> On 11 Sep 2017, at 09.12, Joern Kottmann <kottm...@gmail.com> wrote: >> >> Hi Folks, >> >> >> I have posted a second release candidate for the Apache OpenNLP 1.8.2 >> release and it is ready for testin

[VOTE] Apache OpenNLP 1.8.2 Release Candidate 2

2017-09-11 Thread Joern Kottmann
Hi Folks, I have posted a second release candidate for the Apache OpenNLP 1.8.2 release and it is ready for testing. The RC 2 distributables can be downloaded from here: https://repository.apache.org/content/repositories/orgapacheopennlp-1018/org/apache/opennlp/opennlp-distr/1.8.2/ The

Re: [VOTE] Apache OpenNLP 1.8.2 Release Candidate

2017-09-07 Thread Joern Kottmann
>> >> > +1 binding >> > >> > /PEter Thygesen >> > >> > > On 4 Sep 2017, at 23.41, Joern Kottmann <jo...@apache.org> wrote: >> > > >> > > Hi Folks, >> > > >> > > >> > > I

[VOTE] Apache OpenNLP 1.8.2 Release Candidate

2017-09-04 Thread Joern Kottmann
Hi Folks, I have posted a first release candidate for the Apache OpenNLP 1.8.2 release and it is ready for testing. The RC 1 distributables can be downloaded from here: https://repository.apache.org/content/repositories/orgapacheopennlp-1017/org/apache/opennlp/opennlp-distr/1.8.2/ The

Re: Early stopping NameFinderME

2017-08-29 Thread Joern Kottmann
get to it later tonight. > Daniel > >> On Aug 29, 2017, at 10:32 AM, Joern Kottmann <kottm...@gmail.com> wrote: >> >> Hi Daniel, >> >> do you see any issue if we expose LLThreshold and allow the user to >> change it via training parameters? >>

Re: Early stopping NameFinderME

2017-08-29 Thread Joern Kottmann
? > > Daniel > >> On Aug 24, 2017, at 4:48 AM, Joern Kottmann <kottm...@gmail.com> wrote: >> >> You are the first one who ever asked this question. I think we have this as >> an option already on the gis trainer but it is not exposed all the way >> through.

Re: Early stopping NameFinderME

2017-08-24 Thread Joern Kottmann
You are the first one who ever asked this question. I think we have this as an option already on the gis trainer but it is not exposed all the way through. Please open a jira and I can look at it next week. Jörn On Aug 21, 2017 5:11 PM, "Saurabh Jain" wrote: > Hi

Re: Problem of POSTaggerCrossValidator

2017-07-20 Thread Joern Kottmann
Hello, attachments are not allowed on this list. Could you please copy the error you got and the command you used into a mail? Jörn On Thu, Jul 20, 2017 at 6:31 AM, Santipong Thaiprayoon wrote: > To whom it may concern. > > > I used OpenNLP version 1.8.1 for

Re: Releasing a Language Detection Model

2017-07-11 Thread Joern Kottmann
> the metadata would have the algorithm information ? > > 2. Do we publish multiple models for the same task, each trained on > different algorithms ? > > > > On Tue, Jul 11, 2017 at 9:30 AM, Joern Kottmann <kottm...@gmail.com> wrote: > >> Hello, >> >

Re: Releasing a Language Detection Model

2017-07-11 Thread Joern Kottmann
ols – lots in my day – that can load from the CLI to >> override an >> internal classpath dependency. This is for people in environments who want >> a sensible >> / delivered internal classpath default and the ability for run-time, non >> zipped up/messing >> wi

Re: Releasing a Language Detection Model

2017-07-11 Thread Joern Kottmann
g model or always download from the > original provider? We can't guarantee that the corpus will be there > forever, not only because it changed license, but simple because the > provider is not keeping the server up anymore. > > William > > > > 2017-07-10 14:52 GMT-03:00

Re: Releasing a Language Detection Model

2017-07-11 Thread Joern Kottmann
>>> If the user does not provide a model to use on the CLI, can the CLI tools >>> look on the classpath for a model whose name fits the needed model (like >>> en-ner-person.bin) and if found use it automatically? >>> >>> Jeff >>> >>> >>

Re: Releasing a Language Detection Model

2017-07-10 Thread Joern Kottmann
> provider is not keeping the server up anymore. > > William > > > > 2017-07-10 14:52 GMT-03:00 Joern Kottmann <kottm...@gmail.com>: > >> Hello all, >> >> since Apache OpenNLP 1.8.1 we have a new language detection component >> which like all our compone

Releasing a Language Detection Model

2017-07-10 Thread Joern Kottmann
Hello all, since Apache OpenNLP 1.8.1 we have a new language detection component which, like all our components, has to be trained. I think we should release a pre-built model for it trained on the Leipzig corpus. This will allow the majority of our users to get started very quickly with language

Re: [VOTE] Apache OpenNLP 1.8.1 Release Candidate 3

2017-07-07 Thread Joern Kottmann
+1 I ran the eval tests and they passed Jörn On Fri, Jul 7, 2017 at 1:06 PM, Bruno P. Kinoshita wrote: > Build passing OK with the following environment: > Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; > 2015-11-11T05:41:47+13:00) >

Re: Document Categorizer based on Glove + LSTM (powered by DL4J)

2017-07-05 Thread Joern Kottmann
d, Jul 5, 2017 at 8:26 AM, Chris Mattmann <mattm...@apache.org> > wrote: > > > Thamme, great job! > > > > (proud academic dad) > > > > Cheers, > > Chris > > > > > > > > >

Re: Title: [VOTE] Apache OpenNLP 1.8.1 Release Candidate 2

2017-07-05 Thread Joern Kottmann
hs and sigs > 2. clean build from {src} * {tar, zip} and all tests pass > > > On Tue, Jul 4, 2017 at 9:16 AM, Joern Kottmann <kottm...@gmail.com> wrote: > >> Hi Folks, >> >> >> I have posted a 2nd release candidate for the Apache OpenNLP 1.8.1 >> rel

Re: [VOTE] Apache OpenNLP 1.8.1 Release Candidate

2017-07-04 Thread Joern Kottmann
Thank you very much for that info. We reverted the change we did to the sentence detector and will do this in a release after 1.8.1. RC 2 is now available. Jörn On Sun, Jul 2, 2017 at 9:25 PM, Richard Eckart de Castilho <r...@apache.org> wrote: > On 02.07.2017, at 19:13, Joern Kottma

Re: [VOTE] Apache OpenNLP 1.8.1 Release Candidate

2017-07-02 Thread Joern Kottmann
Hello, one question, did you retrain or use existing models? Jörn On Sat, Jul 1, 2017 at 10:20 PM, Richard Eckart de Castilho wrote: > Hi all, > > I ran a DKPro Core build against the RC. Looks mostly fine. No code changes > are required after switching from 1.8.0 to 1.8.1.

1.8.1 release

2017-07-01 Thread Joern Kottmann
Dear all, We will be making a 1.8.1 release of OpenNLP in the next days. All issues in jira are closed now. Jörn

Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

2017-06-29 Thread Joern Kottmann
One more thing, in case we check in models for unit tests we need to be able to train them again, we might not support those models forever and then it would be bad if we can't use the tests anymore or need to repair them by hand. Jörn On Thu, Jun 29, 2017 at 7:18 PM, Joern Kottmann <ko

Re: Missing serializer for postagger.bin

2017-06-29 Thread Joern Kottmann
This is fixed now in the master branch, would you mind to try it again? Jörn On Wed, Jun 14, 2017 at 4:31 PM, Joern Kottmann <kottm...@gmail.com> wrote: > We have to fix this, William wrote a unit test to reproduce it. > > Jörn > > On Fri, Jun 9, 2017 at 4:31 PM, Dam

Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

2017-06-29 Thread Joern Kottmann
ennlp/tools/sentiment/sample_train_categ2 > (for categorical/multi-class) > > We can also do similar files where instead of multi-class, we just use > pos/neg as the label. > > Cheers, > Chris > > > > > > On 6/29/17, 2:35 AM, "Joern Kottmann" <kottm...

Re: [VOTE] Migrate our main repositories to GitHub

2017-06-29 Thread Joern Kottmann
Is there some rush here? > > Cheers, > Chris > > > > > On 6/28/17, 3:57 AM, "Joern Kottmann" <kottm...@gmail.com> wrote: > > The vote passes, only +1 votes have been received: > +1 Mark G > +1 Rodrigo Agerri > +1 Jeff Zemerick >

Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

2017-06-29 Thread Joern Kottmann
Hello Chris, could you please point me to files I can use to train the sentiment component? I am currently looking again through the code and would like to train it myself. Jörn On Tue, Jun 27, 2017 at 4:59 PM, Dan Russ wrote: > Hi All, >First, let me take a share of

Re: [VOTE] Migrate our main repositories to GitHub

2017-06-28 Thread Joern Kottmann
Hub >> >> >> >> On Tue, Jun 27, 2017 at 10:48 PM, Chris Mattmann <mattm...@apache.org> >> wrote: >> >> > If you are talking about using Apache Gitbox, then yes I am +1 for this. >> > >> > Thanks, >> > Chris >> >

[VOTE] Migrate our main repositories to GitHub

2017-06-27 Thread Joern Kottmann
Hello all, let's decide here if we want to move our main repository, currently hosted at Apache, to GitHub instead. This will make our process a bit easier because we can eliminate one remote from our workflow. [ ] +1 Migrate all repositories to GitHub [ ] -1 Do not migrate, because...

Re: [VOTE] Migrate our main repositories to GitHub

2017-06-27 Thread Joern Kottmann
+1 Jörn On Tue, Jun 27, 2017 at 12:30 PM, Joern Kottmann <kottm...@gmail.com> wrote: > Hello all, > > let's decide here if we want to move our main repository, currently > hosted at Apache, to GitHub instead. This will make our process a bit > easier because we can eliminat

Re: Missing serializer for postagger.bin

2017-06-14 Thread Joern Kottmann
tackTrace(); > >> } > >> > >> } > >> } > >> } > >> > >> public static POSModel loadPosTagger (String modelName) { > >> > >> try (InputStream modelIn = new

Re: Missing serializer for postagger.bin

2017-06-07 Thread Joern Kottmann
opennlp-tools > 1.8.0 > > > Do I need other dependencies too? > > > > 2017-06-07 14:53 GMT+02:00 Joern Kottmann <kottm...@gmail.com>: > > > This should be working. Did you test with 1.8.0? > > > > Jörn > > >

Re: Missing serializer for postagger.bin

2017-06-07 Thread Joern Kottmann
This should be working. Did you test with 1.8.0? Jörn On Mon, Jun 5, 2017 at 3:43 PM, Damiano Porta wrote: > Hello, > i am using the POSTaggerFeatureGenerator via generators.xml > > > > during the training i add this model in the resources doing: > >

Re: opennlp.tools.coref.mention.JWNLDictionary;

2017-05-23 Thread Joern Kottmann
The coref component was removed from OpenNLP quite some time ago because we didn't have a maintainer anymore for it. The JWNLDictionary class was part of that removal, you can still find the code in the OpenNLP Sandbox:

[ANNOUNCE] Apache OpenNLP 1.8.0 Release

2017-05-19 Thread Joern Kottmann
The Apache OpenNLP team is pleased to announce the release of version 1.8.0 of Apache OpenNLP. The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation,

Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 3

2017-05-18 Thread Joern Kottmann
The vote passes, only +1 votes were received: +1 Bruno +1 Tommaso +1 William +1 Jörn +1 Jeff +1 Daniel +1 Richard +1 Joey +1 Suneel +1 Rodrigo Thanks for voting! Jörn On Wed, 2017-05-17 at 23:48 +0200, Joern Kottmann wrote: > The Apache OpenNLP PMC would like to call for a Vote on Apa

Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 3

2017-05-18 Thread Joern Kottmann
nlp/blob/73c8e5b9d8e055fefb53f7f3c2487d > > 05c9788c6a/opennlp-tools/src/main/java/opennlp/tools/util/featuregen/ > > POSTaggerNameFeatureGenerator.java#L59 > > > Plus other NullPointerException's that can be prevented, and other > minor > > > issues. Not blockers for the release though, IMO. >

Re: CoReference

2017-05-18 Thread Joern Kottmann
"Damiano Porta" <damianopo...@gmail.com> ha scritto: > > > Oh my wrong. Pardon. > > Do we have accuracy statistics? > > > > Il 18 mag 2017 14:59, "Joern Kottmann" <kottm...@gmail.com> ha scritto: > > > >> This is for linking

Re: CoReference

2017-05-18 Thread Joern Kottmann
This is for linking entities in one document, e.g. first name mention to a full name mention, or to he, she, it. Jörn On Thu, May 18, 2017 at 1:27 PM, Damiano Porta wrote: > Hi, thanks but I need to link entities to each other. I do not need to > link entities to

[VOTE] Apache OpenNLP 1.8.0 Release Candidate 3

2017-05-17 Thread Joern Kottmann
The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP 1.8.0 Release Candidate 3. The RC 3 distributables can be downloaded from here: https://repository.apache.org/content/repositories/orgapacheopennlp-1013/org/apache/opennlp/opennlp-distr/1.8.0/ The release was made from the

Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-15 Thread Joern Kottmann
15, 2017 at 6:21 PM, Richard Eckart de Castilho <r...@apache.org> wrote: > > On 15.05.2017, at 16:35, Joern Kottmann <kottm...@gmail.com> wrote: > > > > Richard, I believe I found the problem with the parser, would you mind to > > take a look? > > > >

Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-15 Thread Joern Kottmann
Richard, I believe I found the problem with the parser, would you mind to take a look? This PR should fix it: https://github.com/apache/opennlp/pull/199 Jörn On Mon, May 15, 2017 at 4:14 PM, Richard Eckart de Castilho wrote: > Hi Rodrigo, > > On 15.05.2017, at 15:36, Rodrigo

Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-15 Thread Joern Kottmann
://github.com/apache/opennlp/blob/3df659b9bfb02084e782f1e8b6ec716f56e0611c/opennlp-tools/src/test/java/opennlp/tools/eval/OntoNotes4ParserEval.java#L70 On Sat, May 13, 2017 at 10:35 PM, Richard Eckart de Castilho <r...@apache.org > wrote: > Hi all, > > > On 11.05.2017, at 18:

Re: Error when processing doap file http://opennlp.apache.org/doap_opennlp.rdf:

2017-05-12 Thread Joern Kottmann
Thanks for forwarding this to the dev list. The file is now available again. Jörn On Fri, May 12, 2017 at 10:46 AM, sebb wrote: > -- Forwarded message -- > From: Projects > Date: 12 May 2017 at 03:00 > Subject: Error when processing doap

[VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-11 Thread Joern Kottmann
The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP 1.8.0 Release Candidate 2. The RC 2 distributables can be downloaded from here: https://repository.apache.org/content/repositories/orgapacheopennlp-1012/org/apache/opennlp/opennlp-distr/1.8.0/ The release was made from the

[ANNOUNCE] New website for Apache OpenNLP

2017-05-11 Thread Joern Kottmann
Hello all, we launched a redesigned new web site for Apache OpenNLP with a new logo - check it out at https://opennlp.apache.org Regards, The Apache OpenNLP Team

Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate

2017-05-11 Thread Joern Kottmann
I am canceling the vote due to the above-mentioned bug. Let's prepare another RC which has this issue fixed. Jörn On Thu, May 11, 2017 at 9:51 AM, Joern Kottmann <kottm...@gmail.com> wrote: > I am changing my vote to -1 due to a bug in the DictionaryLemmatizer, in > case the word and

Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate

2017-05-11 Thread Joern Kottmann
<jzemer...@apache.org> wrote: > +1 non-binding > > Built and tested on Ubuntu 16.04 and Amazon Linux 2017.03.0 with OpenJDK8. > NOTICE and LICENSE files look good. > Created and tested a token name finder model. > > Jeff > > > On Tue, May 9, 2017 at 2:41 PM, Joern

[VOTE] Apache OpenNLP 1.8.0 Release Candidate

2017-05-09 Thread Joern Kottmann
The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP 1.8.0 Release Candidate 1. The RC 1 distributables can be downloaded from here: https://repository.apache.org/content/repositories/orgapacheopennlp-1011/org/apache/opennlp/opennlp-distr/1.8.0/ The release was made from the

Re: InsufficientTrainingDataException while cross validating with TokenNameFinderCrossValidator

2017-04-19 Thread Joern Kottmann
Send us a patch to improve the documentation. Jörn On Mon, Apr 17, 2017 at 9:44 AM, Saurabh Jain wrote: > Thanks Jeff it worked. I think it is not mentioned in docs. > > On Mon, Apr 17, 2017 at 1:20 AM, Jeff Zemerick > wrote: > > > Saurabh, >

Welcome our new committers

2017-04-14 Thread Joern Kottmann
Hi all, The Apache OpenNLP PPMC is very pleased to announce that Daniel Russ, Peter Thygesen and Koji Sekiguchi accepted our invitation to become Apache OpenNLP committers. Congratulations, and welcome to the team! Jörn

Re: Codec classes (BioCodec and BilouCodec)

2017-03-15 Thread Joern Kottmann
od should be move out. Perhaps to a base class called Codec (or > CodecBase... or how you normally named base classes) The method is also > called by BilouCodec, which then is calling directly to BioCodec... (which > smells) > > Perhaps another refactoring task? > > /Peter >

Re: Training perceptron model

2017-03-06 Thread Joern Kottmann
> Oh I see. Thanks! > > > > Basically i have 30k sentences i apply the labels with a script and then > i > > pass 0-15k to train the model (to build the .bin) and 15k-30k to evaluate > > it. > > > > I am trying to build the model with 300 iterations again. >

Re: CUDA

2017-03-06 Thread Joern Kottmann
y it is only allowed > for MAXENT classifier, right ? > > 2017-03-06 10:17 GMT+01:00 Joern Kottmann <kottm...@gmail.com>: > > > Hello, > > > > no, we don't support CUDA. At some point we probably add support for one > of > > the deep learning packages and t

Re: Training perceptron model

2017-03-06 Thread Joern Kottmann
no > > 2017-03-06 10:19 GMT+01:00 Joern Kottmann <kottm...@gmail.com>: > > > Hello, > > > > this looks like output from the cross validator. > > > > Jörn > > > > On Sun, Mar 5, 2017 at 11:34 AM, Damiano Porta <damianopo...@gmail.com> >

Re: Training perceptron model

2017-03-06 Thread Joern Kottmann
Hello, this looks like output from the cross validator. Jörn On Sun, Mar 5, 2017 at 11:34 AM, Damiano Porta wrote: > Hello, > > I am training a NER model with perceptron classifier (using OpenNLP 1.7.0) > > the output of the training is: > > Indexing events using

Re: Me getting involved

2017-02-27 Thread Joern Kottmann
Hello, yes, we are always very eager to get new contributors. From my past experience I think the best way to get started is to write a few unit tests. That will help you learn about our code base and will teach you how to get contributions into OpenNLP. Otherwise if you have something specific

Re: New lines at end of source files

2017-02-14 Thread Joern Kottmann
+1 to merge this Jörn On Mon, Feb 13, 2017 at 9:26 PM, Jeffrey Zemerick wrote: > On a recent pull request there was a comment that some new source files did > not have new lines at the ends of the files. When I added a rule to > checkstyle for new lines at the ends of

Re: Multiple models and String.intern

2017-02-08 Thread Joern Kottmann
heir > existing models exceed the default JVM limit, and an option would also be > useful for cases when the models were made from different data sources. > (I'm assuming in that case using string pooling would be detrimental to > performance.) > > Jeff > > > On Wed, Feb

Re: Name Finder trainer default settings

2017-02-07 Thread Joern Kottmann
oblem. I think > that our > > long-term goal should be to add a CRF, and make it the default for > the > > NameFinder. > > > > > > > > Daniel > > > > > > > > > > > > On 2/6/

Fwd: Training models for OpenNLP on the OntoNotes corpus

2017-02-04 Thread Joern Kottmann
-- Forwarded message -- From: "Joern Kottmann" <jo...@apache.org> Date: Feb 3, 2017 11:51 AM Subject: Training models for OpenNLP on the OntoNotes corpus To: <legal-disc...@apache.org> Cc: Hello all, the Apache OpenNLP library is a machine

Re: 1.7.2 release

2017-02-01 Thread Joern Kottmann
for high-risk changes, and we now should see problems before we merge new work into master. Jörn On Wed, Feb 1, 2017 at 2:59 PM, Richard Eckart de Castilho <r...@apache.org> wrote: > On 01.02.2017, at 14:35, Joern Kottmann <jo...@apache.org> wrote: > > > > The proje

Re: 1.7.2 release

2017-02-01 Thread Joern Kottmann
<r...@apache.org> wrote: > Hi Jörn, > > I am curious - is there a specific reason that OpenNLP suddenly > has this flurry of activity? > > Best, > > -- Richard > > > On 31.01.2017, at 13:45, Joern Kottmann <jo...@apache.org> wrote: > > > > Dear

Re: [VOTE] Apache OpenNLP 1.7.2 Release Candidate

2017-02-01 Thread Joern Kottmann
The GIS training is not printing any messages due to a bug. Let's cancel this vote and try to release again with that bug fixed. Also, the Data Indexers' printing can't be controlled with the PrintMessages parameter; we should fix that as well. Jörn On Tue, Jan 31, 2017 at 2:33 PM, Suneel Marthi

Re: OpenNLP model for model 1.7.3+

2017-01-30 Thread Joern Kottmann
Hello, I agree with Richard, we can't do such a step in a minor version increase because we also promise that models work with older minor versions e.g. model trained with 1.7.4 is supposed to work with 1.7.0. Users probably have a much higher overhead to retrain their models than to update to

Re: [VOTE] Apache OpenNLP 1.7.1 Release Candidate 1

2017-01-23 Thread Joern Kottmann
+1 binding Jörn On Jan 21, 2017 12:18 AM, "Suneel Marthi" wrote: The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP 1.7.1 Release Candidate. The Release artifacts can be downloaded from: https://repository.apache.org/content/repositories/

Re: [VOTE] Apache OpenNLP 1.7.1 Release Candidate 1

2017-01-22 Thread Joern Kottmann
On Sat, 2017-01-21 at 21:09 -0500, Jeffrey Zemerick wrote: > I went to the opennlp-distr/README for a summary of changes in 1.7.1 > but I > think it is the same as it was for 1.7.0. Is that file typically > updated > for revision releases? The link at the bottom of the RELEASE_NOTES to > the >

Re: Check OpenNLP build version of trained model

2017-01-13 Thread Joern Kottmann
We should consider printing out basic information about the model when it is loaded with the CLI tools. Jörn On Fri, Jan 13, 2017 at 11:42 AM, William Colen wrote: > Yes. The model is a zip file. Extract it and you can find a metadata file > with this information. > >

Re: Thread-safe versions of some of the tools

2017-01-12 Thread Joern Kottmann
be confused if your build fails for style violation. Jörn On Thu, Jan 12, 2017 at 1:07 PM, Thilo Goetz <twgo...@gmx.de> wrote: > On 12/01/2017 10:20, Joern Kottmann wrote: > >> The POSTagger interface just grew over time and I am not sure it is >> actually that great. Today

Re: Thread-safe versions of some of the tools

2017-01-12 Thread Joern Kottmann
.de> wrote: > On 11/01/2017 22:51, Joern Kottmann wrote: > >> On Wed, 2017-01-11 at 11:05 +0100, Thilo Goetz wrote: >> >>> in a recent project, I was using SentenceDetectorME, TokenizerME and >>> POSTaggerME. It turns out that none of those is thread sa

Re: Thread-safe versions of some of the tools

2017-01-11 Thread Joern Kottmann
On Wed, 2017-01-11 at 17:14 +, Russ, Daniel (NIH/CIT) [E] wrote: > Hi, > > I am a little confused. Why do you want to share an instance of a > SentenceDetectorME across threads? Are your documents very long single > sentences? I don’t think there is enough work for the > SentenceDetectorME to

Re: Thread-safe versions of some of the tools

2017-01-11 Thread Joern Kottmann
ol, or thread locals, or anything > > > > like that. > > > > Especially since there is really no good reason IMHO. You could > > > > very easily > > > > just return the probabilities together with the spans, and > > > > whoever d

Re: Thread-safe versions of some of the tools

2017-01-11 Thread Joern Kottmann
On Wed, 2017-01-11 at 11:05 +0100, Thilo Goetz wrote: > in a recent project, I was using SentenceDetectorME, TokenizerME and > POSTaggerME. It turns out that none of those is thread safe. This is > because the classification probabilities for the last tag() call > (for > example) are stored in

Re: Thread-safe versions of some of the tools

2017-01-11 Thread Joern Kottmann
the spans, and whoever doesn't > need them can ignore them. Or have two methods, one with probabilities, one > without. Maybe it's just where I'm coming from, but I fail to see the > advantages of the current approach. > > --Thilo > > > > On 11/01/2017 13:58, Joern Kottma

Re: Thread-safe versions of some of the tools

2017-01-11 Thread Joern Kottmann
Hello Thilo, I am interested in your opinion about how this is done currently. We say: "Share the model between threads and create one instance of the component per thread". Wouldn't that work well in your use case? Jörn On Wed, Jan 11, 2017 at 11:05 AM, Thilo Goetz wrote:

Re: Commit message style

2017-01-10 Thread Joern Kottmann
+1 for the OPENNLP-xxx: commit message. > > > > > > > > > > > > On Tue, Jan 10, 2017 at 12:51 AM, William Colen <william.colen@gm > > > ail.com > > > > > > wrote: > > > > > > > +1 for the OPENNLP-xxx: commit me

Re: merge TrainingParameters and PluggableParameters

2017-01-10 Thread Joern Kottmann
1. Yes, there are historic reasons, opennlp-tools used to depend on maxent, but maxent didn't depend on opennlp-tools. Therefore maxent couldn't use any opennlp-tools classes. It would be good to open a jira for this and get this re-factored. 2. The cutoff code from GISTrainer should be removed.

Re: Trunk vs. Master

2017-01-09 Thread Joern Kottmann
Sorry for the confusion, this should have been done differently when we moved from svn to git. The other repositories opennlp-addons.git and opennlp-sandbox.git only have the master branch. Jörn On Mon, 2017-01-09 at 13:21 -0500, Suneel Marthi wrote: > It's the 'master' going forward, we'll be

Re: Commit message style

2017-01-09 Thread Joern Kottmann
t; > On Mon, Jan 9, 2017 at 8:26 AM, Joern Kottmann <kottm...@gmail.com> > wrote: > > > Hello all, > > > > we are using different styles for commit messages. It would be good > > to have > > a short discussion on how we think they should be and agree a

Commit message style

2017-01-09 Thread Joern Kottmann
Hello all, we are using different styles for commit messages. It would be good to have a short discussion on how we think they should be and agree all on how to write the subject line. Here are few points from me: - Good commit messages are important to understand what happened in the project

Checkstyle

2017-01-07 Thread Joern Kottmann
Hello all, we added the checkstyle maven plugin to the build. Please let us know what you think about the enforced rules and what you think should be added. For some rules it is good to fail the build, this will make it easier for us to perform code reviews, because travis will then by itself

[ANNOUNCE] Welcome our new committer Suneel Marthi

2017-01-03 Thread Joern Kottmann
Hi all, The Apache OpenNLP PPMC is very pleased to announce that Suneel Marthi accepted our invitation to become an Apache OpenNLP committer. Suneel helped us with many PRs to get the 1.7.0 release out and had lots of advice on how to increase the development activity again. Congratulations,

Next release

2017-01-01 Thread Joern Kottmann
Hello all, now all the tests we do to release OpenNLP are automated and that allows us to also do more frequent releases. I would like to do a couple of releases this year and not just one, so the next one will probably be 1.7.1 and we should do it rather soon. There is a PR we can merge for

Re: OpenNLP 1.7.0 RC 2 is ready for testing

2017-01-01 Thread Joern Kottmann
Sorry, this was closed too early and should have been open longer, you are right. We will do better for the releases to come! Jörn On Sun, 2017-01-01 at 03:02 +0100, Richard Eckart de Castilho wrote: > On 01.01.2017, at 02:41, Suneel Marthi wrote: > > > > The release has

Re: OpenNLP 1.7.0 RC 2 is ready for testing

2016-12-31 Thread Joern Kottmann
+1, looks good Jörn On Dec 31, 2016 8:54 PM, "William Colen" wrote: > Hi all, > > Apache OpenNLP 1.7.0 RC 2 is ready for testing. The RC 1 failed due to > missing files and it failed to run 1.6.0 models. There are no new features > since RC 1. > > The RC 2 can be downloaded

Re: OpenNLP 1.7.0 RC 1 is ready for testing

2016-12-31 Thread Joern Kottmann
We are missing the LICENSE and NOTICE files in the binary distribution and should make an RC2 for the release. All the manual tests are now automatic, so we don't need this long test plan anymore. Jörn On Sat, 2016-12-31 at 00:24 -0200, William Colen wrote: > Hi all, > > Apache OpenNLP 1.7.0 RC 1

Re: Update to Java 8

2016-12-20 Thread Joern Kottmann
I merged it, now Java 8 and Maven 3.3.9 are required to build OpenNLP. Jörn On Tue, Dec 20, 2016 at 10:18 AM, Joern Kottmann <kottm...@gmail.com> wrote: > Looks like there is nobody against it here. > > Suneel already sent us a PR for the change I will merge it, thanks! > >

Re: Update to Java 8

2016-12-20 Thread Joern Kottmann
gmail.com> > wrote: > > > +1 > > > > 2016-12-19 21:22 GMT-02:00 Joern Kottmann <kottm...@gmail.com>: > > > > > +1 from me as well > > > > > > Jörn > > > > > > On Tue, Dec 20, 2016 at 12:02 AM, Tommaso Teofili < > >

Re: Update to Java 8

2016-12-19 Thread Joern Kottmann
.8 > > > > On Tue, Dec 20, 2016 at 2:51 AM, Suneel Marthi < > > suneel_mar...@yahoo.com.invalid> wrote: > > > > > +1 to move to Java 8 > > > > > > > > > From: Joern Kottmann <kottm...@gmail.com> > > > To: "dev

Re: Get Original text

2016-12-19 Thread Joern Kottmann
sentence inside my custom feature generator, > but i would like to get the original text sentence too without > tokenization. > > I am finding a way to pass the original text too and analyze it inside the > createFeatures() callback. > > Is it possible somehow? > > 201
