Re: [VOTE] Apache OpenNLP 2.0.0 Release Candidate

2022-06-01 Thread Joern Kottmann
+1 binding Thanks for all the work on this Jeff! Cheers, Jörn On Wed, Jun 1, 2022 at 9:57 PM Suneel Marthi wrote: > +1 binding > > On Wed, Jun 1, 2022 at 3:12 PM Jeff Zemerick wrote: > > > Just pinging folks on the thread about the active vote. The project has a > > board report due in a wee

Re: [VOTE] Apache OpenNLP 1.9.3 Release Candidate

2020-07-29 Thread Joern Kottmann
+1 Release the packages as Apache OpenNLP 1.9.3 Jörn On Wed, Jul 29, 2020 at 1:08 PM Tommaso Teofili wrote: > > +1 from me, build, sigs, tag look good. > > Regards, > Tommaso > > On Tue, 28 Jul 2020 at 10:48, Bruno P. Kinoshita wrote: > > > It worked after I imported keys from > > https://dist.

Re: license for opennlp 1.5 pre-trained models

2019-12-30 Thread Joern Kottmann
Hello, The Apache OpenNLP project only distributes models that are licensed under the Apache License 2.0, or models that comply with the strict licensing requirements at Apache. So far we only release a language detection model at the Apache OpenNLP project. The OpenNLP project was hosted in the past

Re: OpenNLP 1.9.2 and Java 8/11

2019-12-15 Thread Joern Kottmann
+1 lgtm, it would be nice to track down the exact cause of the accuracy changes caused by the JDK update. We had similar issues in the past, e.g. through things like the undefined iteration order of Sets. I am happy to help with this. Jörn On Sat, Dec 14, 2019 at 3:48 PM Tommaso Teofili wrote:
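
The remark about the undefined iteration order of Sets can be illustrated with a small sketch. It is plain JDK code, not taken from OpenNLP; the class name `SetOrderDemo` and the feature strings are made up for illustration. `HashSet` makes no promise about iteration order, so anything order-sensitive that walks one may change between JDK versions, while `LinkedHashSet` and `TreeSet` give a defined order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

// Hypothetical demo class; not part of OpenNLP.
class SetOrderDemo {

    // LinkedHashSet removes duplicates but keeps first-insertion order.
    static List<String> insertionOrder(List<String> features) {
        return new ArrayList<>(new LinkedHashSet<>(features));
    }

    // TreeSet removes duplicates and iterates in natural (sorted) order.
    static List<String> sortedOrder(List<String> features) {
        return new ArrayList<>(new TreeSet<>(features));
    }

    public static void main(String[] args) {
        List<String> features = Arrays.asList("w=house", "suffix=se", "prefix=ho", "w=house");
        // The order printed here is unspecified and may differ between JDKs.
        System.out.println(new HashSet<>(features));
        System.out.println(insertionOrder(features));
        System.out.println(sortedOrder(features));
    }
}
```

If a training pipeline depends on the order in which such a set is traversed, switching to one of the ordered variants is one way to make runs reproducible across JDK versions.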

Re: KStem support?

2019-02-19 Thread Joern Kottmann
Hello, we don't have it, but it would be nice to get a contribution for it. Jörn On Thu, Feb 7, 2019 at 3:03 PM Benedict Holland wrote: > > Hello all, > > I just happened to read a Solr message about using KStem. Is there any > support for this particular stemmer or would you like there to be?

Re: [VOTE] Apache OpenNLP 1.9.0 Release Candidate 2

2018-06-29 Thread Joern Kottmann
+1 Jörn On Fri, Jun 29, 2018 at 1:45 PM, Jeff Zemerick wrote: > Hi folks, > > I have posted a 2nd release candidate for the Apache OpenNLP 1.9.0 release > and it is ready for testing. > > The distributables can be downloaded from: > https://repository.apache.org/content/repositories/orgapacheope

Re: Custom models (for Ukrainian and Russian languages)

2018-06-28 Thread Joern Kottmann
Hello, we would be happy to hear about your experience. Did the language detector perform well enough on Russian/Ukrainian texts? To reproduce the models we train you should download the data via svn: svn co https://svn.apache.org/repos/bigdata/opennlp/trunk opennlp-corpus Note the corpus is qui

Re: OPENNLP-912 : Add a rule based sentence detector

2018-04-06 Thread Joern Kottmann
Hello, could you elaborate a bit on the approach? Jörn On Tue, Apr 3, 2018 at 5:24 PM, Isuranga Perera wrote: > Hi All, > > I would like to contribute $subject feature. Appreciate if anyone can guide > me through the process. > > Best Regards > Isuranga Perera

Re: [VOTE] Apache OpenNLP 1.8.4 Release Candidate

2017-12-23 Thread Joern Kottmann
+1 Jörn On Dec 21, 2017 15:44, "Jeff Zemerick" wrote: > Hi Folks, > > I have posted a first release candidate for the Apache OpenNLP 1.8.4 > release and it is ready for testing. > > The RC1 distributables can be downloaded from here: > https://repository.apache.org/content/repositories/ > orgap

Re: [VOTE] Language Detector model for Apache OpenNLP 1.8.3 Release Candidate 3

2017-10-30 Thread Joern Kottmann
+1 Jörn On Mon, Oct 30, 2017 at 2:30 PM, William Colen wrote: > The Apache OpenNLP PMC would like to call for a Vote on the Language > Detector model for Apache OpenNLP 1.8.3 Release Candidate 3. > > The Release artifacts can be downloaded from: > > http://people.apache.org/~colen/models/langdet

Re: [VOTE] Apache OpenNLP 1.8.3 Release Candidate

2017-10-26 Thread Joern Kottmann
+1 Jörn On Thu, Oct 26, 2017 at 10:18 AM, Rodrigo Agerri wrote: > +1 (binding) > > -eval and unit tests OK > > On Wed, Oct 25, 2017 at 7:01 PM, William Colen > wrote: >> +1 binding >> >> - eval tests ok >> - unit test ok >> - build from tag ok >> - distribution execution ok >> - distribution o

Preparing for the next release

2017-10-24 Thread Joern Kottmann
Hello all, please don't commit any new changes (code freeze) because we are now preparing for the 1.8.3 release. BR, Jörn

[ANNOUNCE] CVE-2017-12620: Apache OpenNLP XXE vulnerability

2017-10-02 Thread Joern Kottmann
Severity: Medium Vendor: The Apache Software Foundation Versions Affected: OpenNLP 1.5.0 to 1.5.3 OpenNLP 1.6.0 OpenNLP 1.7.0 to 1.7.2 OpenNLP 1.8.0 to 1.8.1 Description: When loading models or dictionaries that contain XML it is possible to perform an XXE attack, since OpenNLP is a library,
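
For readers unfamiliar with XXE, the following is a minimal sketch of how an XML parser can be hardened with the JDK's own `DocumentBuilderFactory`. It is a generic illustration and not the fix that was applied in OpenNLP; the class name `XxeSafeParsing` and the sample documents are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical helper; shows general JDK hardening, not OpenNLP code.
class XxeSafeParsing {

    static DocumentBuilder newHardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Rejecting DOCTYPE declarations outright rules out external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    // Returns true if the document parses, false if the parser rejects it.
    static boolean parses(String xml) {
        try {
            newHardenedBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }
}
```

With this configuration an ordinary document still loads, while any input carrying a DOCTYPE (the usual vehicle for an external entity) is refused before an entity can be resolved.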

Re: [VOTE] Apache OpenNLP 1.8.2 Release Candidate 2

2017-09-15 Thread Joern Kottmann
ing > > >> On 11 Sep 2017, at 09.12, Joern Kottmann wrote: >> >> Hi Folks, >> >> >> I have posted a second release candidate for the Apache OpenNLP 1.8.2 >> release and it is ready for testing. >> >> >> The RC 2 distributables

[VOTE] Apache OpenNLP 1.8.2 Release Candidate 2

2017-09-11 Thread Joern Kottmann
Hi Folks, I have posted a second release candidate for the Apache OpenNLP 1.8.2 release and it is ready for testing. The RC 2 distributables can be downloaded from here: https://repository.apache.org/content/repositories/orgapacheopennlp-1018/org/apache/opennlp/opennlp-distr/1.8.2/ The releas

Re: [VOTE] Apache OpenNLP 1.8.2 Release Candidate

2017-09-07 Thread Joern Kottmann
ed, Sep 6, 2017 at 2:17 PM, Madhawa Kasun Gunasekara < > madhaw...@gmail.com> wrote: > >> +1 (non-binding) >> >> Madhawa >> >> On Wed, Sep 6, 2017 at 1:15 PM, Peter Thygesen >> wrote: >> >> > +1 binding >> > >> >

Re: Cache

2017-09-05 Thread Joern Kottmann
The feature generators are not thread safe, so it is ok to use an instance variable for caching. We have some feature generators which do that. Jörn On Tue, Sep 5, 2017 at 9:19 PM, Daniel Russ wrote: > Again, you should send this to users not dev mail list. > > Have you tried adding an instance
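
A minimal sketch of the caching idea mentioned above, assuming each thread works with its own generator instance. The class `CachingFeatureSketch` and its features are hypothetical and do not implement any OpenNLP interface; the point is only that an unsynchronized instance-level map is acceptable when the object is never shared between threads.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class; it does not implement an OpenNLP feature generator interface.
class CachingFeatureSketch {

    // Plain HashMap: fine only because one instance is used by one thread.
    private final Map<String, List<String>> cache = new HashMap<>();
    private int computations = 0;

    List<String> featuresFor(String token) {
        List<String> cached = cache.get(token);
        if (cached != null) {
            return cached;
        }
        computations++;
        List<String> features = new ArrayList<>();
        features.add("w=" + token.toLowerCase());
        features.add("len=" + token.length());
        cache.put(token, features);
        return features;
    }

    // Number of times features were actually computed (cache misses).
    int computations() {
        return computations;
    }
}
```

Sharing one such instance across threads would require synchronization or a concurrent map, which is exactly what the one-instance-per-thread convention avoids.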

[VOTE] Apache OpenNLP 1.8.2 Release Candidate

2017-09-04 Thread Joern Kottmann
Hi Folks, I have posted a first release candidate for the Apache OpenNLP 1.8.2 release and it is ready for testing. The RC 1 distributables can be downloaded from here: https://repository.apache.org/content/repositories/orgapacheopennlp-1017/org/apache/opennlp/opennlp-distr/1.8.2/ The release

Re: Early stopping NameFinderME

2017-08-29 Thread Joern Kottmann
> Daniel > >> On Aug 29, 2017, at 10:32 AM, Joern Kottmann wrote: >> >> Hi Daniel, >> >> do you see any issue if we expose LLThreshold and allow the user to >> change it via training parameters? >> >> Jörn >> >> On Sat, Aug 26, 2017 at 1

Re: Early stopping NameFinderME

2017-08-29 Thread Joern Kottmann
, early > stopping is used to prevent overtraining and improve generalization to unseen > data. I am not sure early stopping serves the same purpose with GIS > training. Does anyone know if early stopping improves generalization for a > maxent problem? > > Daniel > >&

Re: Early stopping NameFinderME

2017-08-24 Thread Joern Kottmann
You are the first one who ever asked this question. I think we have this as an option already on the gis trainer but it is not exposed all the way through. Please open a jira and I can look at it next week. Jörn On Aug 21, 2017 5:11 PM, "Saurabh Jain" wrote: > Hi All > > How can we use early s

Re: OpenNLP Support for Urdu

2017-08-02 Thread Joern Kottmann
Hello, we would like to release models based on Universal Dependencies [1] in the future; they also have training data for Urdu. It would be nice if you could train, evaluate and optimize those models. Jörn [1] http://universaldependencies.org/ On Wed, Aug 2, 2017 at 11:12 AM, Abdul Rauf wrote:

Re: Problem of POSTaggerCrossValidator

2017-07-20 Thread Joern Kottmann
Hello, attachments are not allowed on this list. Could you please copy the error you got and the command you used into a mail? Jörn On Thu, Jul 20, 2017 at 6:31 AM, Santipong Thaiprayoon wrote: > To whom it may concern. > > > I used OpenNLP version 1.8.1 for training a part-of-speech in Thai l

Re: Releasing a Language Detection Model

2017-07-11 Thread Joern Kottmann
o > the metadata would have the algorithm information ? > > 2. Do we publish multiple models for the same task, each trained on > different algorithms ? > > > > On Tue, Jul 11, 2017 at 9:30 AM, Joern Kottmann wrote: > >> Hello, >> >> right, very good p

Re: Releasing a Language Detection Model

2017-07-11 Thread Joern Kottmann
n load from the CLI to >> override an >> internal classpath dependency. This is for people in environments who want >> a sensible >> / delivered internal classpath default and the ability for run-time, non >> zipped up/messing >> with JAR file override. Think abo

Re: Releasing a Language Detection Model

2017-07-11 Thread Joern Kottmann
download from the > original provider? We can't guarantee that the corpus will be there > forever, not only because it changed license, but simple because the > provider is not keeping the server up anymore. > > William > > > > 2017-07-10 14:52 GMT-03:00 Joern Ko

Re: Releasing a Language Detection Model

2017-07-11 Thread Joern Kottmann
odel to use on the CLI, can the CLI tools >>> look on the classpath for a model whose name fits the needed model (like >>> en-ner-person.bin) and if found use it automatically? >>> >>> Jeff >>> >>> >>> >>> On Mon, Jul 10, 2017

Re: Releasing a Language Detection Model

2017-07-10 Thread Joern Kottmann
er is not keeping the server up anymore. > > William > > > > 2017-07-10 14:52 GMT-03:00 Joern Kottmann : > >> Hello all, >> >> since Apache OpenNLP 1.8.1 we have a new language detection component >> which like all our components has to be trained. I think we

Releasing a Language Detection Model

2017-07-10 Thread Joern Kottmann
Hello all, since Apache OpenNLP 1.8.1 we have a new language detection component which, like all our components, has to be trained. I think we should release a pre-built model for it, trained on the Leipzig corpus. This will allow the majority of our users to get started very quickly with language de

Re: [VOTE] Apache OpenNLP 1.8.1 Release Candidate 3

2017-07-07 Thread Joern Kottmann
+1 I ran the eval tests and they passed. Jörn On Fri, Jul 7, 2017 at 1:06 PM, Bruno P. Kinoshita wrote: > Build passing OK with the following environment: > Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; > 2015-11-11T05:41:47+13:00) > Maven home: /opt/maven > Java version:

Re: Joining the group

2017-07-05 Thread Joern Kottmann
still empty, but it will be located here: https://cwiki.apache.org/confluence/display/OPENNLP/NIP-3%3A+Revive+the+coreference+component Jörn On Thu, Jun 29, 2017 at 7:14 PM, Joern Kottmann wrote: > Hello, > > there are a few problems we have with it. It would be very good if you > ca

Re: Document Categorizer based on Glove + LSTM (powered by DL4J)

2017-07-05 Thread Joern Kottmann
ucture for it. > > @Chris > Hi Prof. Thanks for the kind words! Just getting started with my new job > here - more NLP and Machine Translation stuff to come. > > -Thamme > > On Wed, Jul 5, 2017 at 8:26 AM, Chris Mattmann > wrote: > > > Thamme,

Re: Title: [VOTE] Apache OpenNLP 1.8.1 Release Candidate 2

2017-07-05 Thread Joern Kottmann
n build from {src} * {tar, zip} and all tests pass > > > On Tue, Jul 4, 2017 at 9:16 AM, Joern Kottmann wrote: > >> Hi Folks, >> >> >> I have posted a 2nd release candidate for the Apache OpenNLP 1.8.1 >> release and it is ready for testing. >> &g

Re: Document Categorizer based on Glove + LSTM (powered by DL4J)

2017-07-05 Thread Joern Kottmann
+1 to merge this when it implements the Document Categorizer, then we can also use those tools to train and evaluate it Jörn On Wed, Jul 5, 2017 at 9:28 AM, Rodrigo Agerri wrote: > Hello again, > > @Thamme, out of curiosity, do you have evaluation numbers on the > Stanford Large Movie Review dat

Re: [VOTE] Apache OpenNLP 1.8.1 Release Candidate

2017-07-04 Thread Joern Kottmann
Thank you very much for that info. We reverted the change we did to the sentence detector and will do this in a release after 1.8.1. RC 2 is now available. Jörn On Sun, Jul 2, 2017 at 9:25 PM, Richard Eckart de Castilho wrote: > On 02.07.2017, at 19:13, Joern Kottmann wrote: >>

Title: [VOTE] Apache OpenNLP 1.8.1 Release Candidate 2

2017-07-04 Thread Joern Kottmann
Hi Folks, I have posted a 2nd release candidate for the Apache OpenNLP 1.8.1 release and it is ready for testing. The RC 2 distributables can be downloaded from here: https://repository.apache.org/content/repositories/orgapacheopennlp-1015/org/apache/opennlp/opennlp-distr/1.8.1/ The release w

Re: [VOTE] Apache OpenNLP 1.8.1 Release Candidate

2017-07-02 Thread Joern Kottmann
Hello, one question, did you retrain or use existing models? Jörn On Sat, Jul 1, 2017 at 10:20 PM, Richard Eckart de Castilho wrote: > Hi all, > > I ran a DKPro Core build against the RC. Looks mostly fine. No code changes > are required after switching from 1.8.0 to 1.8.1. All unit tests excep

1.8.1 release

2017-07-01 Thread Joern Kottmann
Dear all, We will be making a 1.8.1 release of OpenNLP in the next days. All issues in jira are closed now. Jörn

Re: Where are JIRA issue emails sent?

2017-06-29 Thread Joern Kottmann
There is a separate list for that, our website explains the details: https://opennlp.apache.org/mailing-lists.html Jörn On Thu, Jun 29, 2017 at 7:51 PM, Chris Mattmann wrote: > …because I’m not getting them. > > Thanks team for the pointers. > > Chris > > > > >

Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

2017-06-29 Thread Joern Kottmann
One more thing: if we check in models for unit tests, we need to be able to train them again. We might not support those models forever, and then it would be bad if we couldn't use the tests anymore or had to repair them by hand. Jörn On Thu, Jun 29, 2017 at 7:18 PM, Joern Kottmann

Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

2017-06-29 Thread Joern Kottmann
ds.usc.edu/SentimentAnalysisParser/datasets.html > > > > There are other evaluations here: > > > > http://irds.usc.edu/SentimentAnalysisParser/models.html > > > > The HT provider review I cannot contribute at this time and I question >

Re: Joining the group

2017-06-29 Thread Joern Kottmann
Hello, there are a few problems we have with it. It would be very good if you can help us to solve those. Basically we would need to get it into the following state: - Have a data set it can be trained on - Implement evaluation for it - Write some documentation As far as I remember we somehow go

Re: Missing serializer for postagger.bin

2017-06-29 Thread Joern Kottmann
This is now fixed in the master branch; would you mind trying it again? Jörn On Wed, Jun 14, 2017 at 4:31 PM, Joern Kottmann wrote: > We have to fix this, William wrote a unit test to reproduce it. > > Jörn > > On Fri, Jun 9, 2017 at 4:31 PM, Damiano Porta > wrote: >

Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

2017-06-29 Thread Joern Kottmann
_categ2 > (for categorical/multi-class) > > We can also do similar files where instead of multi-class, we just use > pos/neg as the label. > > Cheers, > Chris > > > > > > On 6/29/17, 2:35 AM, "Joern Kottmann" wrote: > > Hello Chris, >

Re: [VOTE] Migrate our main repositories to GitHub

2017-06-29 Thread Joern Kottmann
> > Cheers, > Chris > > > > > On 6/28/17, 3:57 AM, "Joern Kottmann" wrote: > > The vote passes, only +1 votes have been received: > +1 Mark G > +1 Rodrigo Agerri > +1 Jeff Zemerick > +1 Suneel Marthi > +1 Jörn Kottmann >

Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

2017-06-29 Thread Joern Kottmann
Hello Chris, could you please point me to files I can use to train the sentiment component? I am currently looking again through the code and would like to train it myself. Jörn On Tue, Jun 27, 2017 at 4:59 PM, Dan Russ wrote: > Hi All, >First, let me take a share of blame for the comment C

Re: [VOTE] Migrate our main repositories to GitHub

2017-06-28 Thread Joern Kottmann
, 2017 at 10:48 PM, Chris Mattmann >> wrote: >> >> > If you are talking about using Apache Gitbox, then yes I am +1 for this. >> > >> > Thanks, >> > Chris >> > >> > >> > >> > >> > On 6/27/17, 3:30 AM, "

[VOTE] Migrate our main repositories to GitHub

2017-06-27 Thread Joern Kottmann
Hello all, let's decide here whether we want to move our main repository, currently hosted at Apache, to GitHub instead. This will make our process a bit easier because we can eliminate one remote from our workflow. [ ] +1 Migrate all repositories to GitHub [ ] -1 Do not migrate, because...

Re: [VOTE] Migrate our main repositories to GitHub

2017-06-27 Thread Joern Kottmann
+1 Jörn On Tue, Jun 27, 2017 at 12:30 PM, Joern Kottmann wrote: > Hello all, > > lets decide here if we want to move our main repository, currently > hosted at Apache to GitHub instead. This will make our process a bit > easier because we can eliminate one remote fr

Re: Missing serializer for postagger.bin

2017-06-14 Thread Joern Kottmann
> >> > >> public static POSModel loadPosTagger (String modelName) { > >> > >> try (InputStream modelIn = new FileInputStream(modelName)) { > >> POSModel model = new POSModel(modelIn); > >> return model; > >

Re: Multiple Token Name Finder Models

2017-06-08 Thread Joern Kottmann
1) Yes, you can use the API to load multiple models, and afterwards you can merge the spans. In that case you have to deal with overlapping spans somehow. Today we also support training models on multiple types, which is probably better suited to your use case. 2) There is some code which does that
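
One possible way to "deal with overlapping spans somehow" after merging the output of several models is sketched below. The policy (earliest start wins, longer span wins on a tie) is an assumption chosen for illustration, and `NamedSpan` is a hypothetical stand-in rather than the OpenNLP span type.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; NamedSpan is not the OpenNLP Span class.
class SpanMergeSketch {

    static final class NamedSpan {
        final int start;   // inclusive token index
        final int end;     // exclusive token index
        final String type;

        NamedSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public String toString() {
            return type + "[" + start + ".." + end + ")";
        }
    }

    // Merges the spans of all models and greedily drops any span that
    // overlaps one already kept. Assumed policy: the span that starts
    // first wins; on equal start the longer span wins.
    static List<NamedSpan> mergeWithoutOverlaps(List<List<NamedSpan>> perModel) {
        List<NamedSpan> all = new ArrayList<>();
        for (List<NamedSpan> spans : perModel) {
            all.addAll(spans);
        }
        all.sort(Comparator.comparingInt((NamedSpan s) -> s.start)
            .thenComparing(Comparator.comparingInt((NamedSpan s) -> s.end).reversed()));

        List<NamedSpan> kept = new ArrayList<>();
        int lastEnd = Integer.MIN_VALUE;
        for (NamedSpan span : all) {
            if (span.start >= lastEnd) {
                kept.add(span);
                lastEnd = span.end;
            }
        }
        return kept;
    }
}
```

Other policies are equally defensible, for example preferring the span with the higher model probability; a single model trained on multiple types avoids the question altogether.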

Re: Missing serializer for postagger.bin

2017-06-07 Thread Joern Kottmann
1.8.0 > > > Do i need others dependencies too? > > > > 2017-06-07 14:53 GMT+02:00 Joern Kottmann : > > > This should be working. Did you test with 1.8.0? > > > > Jörn > > > > On Mon, Jun 5, 2017 at 3:43 PM, Damiano Porta > &

Re: Missing serializer for postagger.bin

2017-06-07 Thread Joern Kottmann
This should be working. Did you test with 1.8.0? Jörn On Mon, Jun 5, 2017 at 3:43 PM, Damiano Porta wrote: > Hello, > i am using the POSTaggerFeatureGenerator via generators.xml > > > > during the training i add this model in the resources doing: > > HashMap map = new HashMap<>(); >

Re: Stemmer Feature Generator

2017-05-28 Thread Joern Kottmann
the main english models? > > 2017-05-27 13:14 GMT+02:00 Joern Kottmann : > > > I don't know, for that corpus you have to order the Reuters data, but we > > have formats support for it, should be easy to measure when you have the > > data. > > > > Jörn &g

Re: Stemmer Feature Generator

2017-05-27 Thread Joern Kottmann
6 17:43 GMT+02:00 Joern Kottmann : > > > Hello, > > > > can you post performance numbers? Only if it helps with some data set it > > would make sense to add it. > > > > Jörn > > > > On Thu, May 25, 2017 at 3:10 PM, Damiano Porta > > wrot

Re: Stemmer Feature Generator

2017-05-26 Thread Joern Kottmann
Hello, can you post performance numbers? Only if it helps with some data set it would make sense to add it. Jörn On Thu, May 25, 2017 at 3:10 PM, Damiano Porta wrote: > Hello, > do you think a StemmerFeatureGenerator can be useful for NER models? > I can create a PR for it. > > Damiano >

Re: opennlp.tools.coref.mention.JWNLDictionary;

2017-05-23 Thread Joern Kottmann
The coref component was removed from OpenNLP quite some time ago because we didn't have a maintainer anymore for it. The JWNLDictionary class was part of that removal, you can still find the code in the OpenNLP Sandbox: https://github.com/apache/opennlp-sandbox/blob/master/opennlp-coref/src/main/ja

[ANNOUNCE] Apache OpenNLP 1.8.0 Release

2017-05-19 Thread Joern Kottmann
The Apache OpenNLP team is pleased to announce the release of version 1.8.0 of Apache OpenNLP. The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-sp

Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 3

2017-05-18 Thread Joern Kottmann
The vote passes; only +1 votes were received: +1 Bruno +1 Tommaso +1 William +1 Jörn +1 Jeff +1 Daniel +1 Richard +1 Joey +1 Suneel +1 Rodrigo Thanks for voting! Jörn On Wed, 2017-05-17 at 23:48 +0200, Joern Kottmann wrote: > The Apache OpenNLP PMC would like to call for a Vote on Apa

Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 3

2017-05-18 Thread Joern Kottmann
pennlp-tools/src/main/java/opennlp/tools/util/ > > TokenTag.java#L85 > > > > > > And > > > > > > > > > > > > https://github.com/apache/opennlp/blob/73c8e5b9d8e055fefb53f7f3c2487d > > 05c9788c6a/opennlp-tools/src/main/java/opennlp/tools/util/featuregen/ > > POSTag

Re: CoReference

2017-05-18 Thread Joern Kottmann
critto: > > > Oh my wrong. Pardon. > > Do we have accuracy statistics? > > > > Il 18 mag 2017 14:59, "Joern Kottmann" ha scritto: > > > >> This is for linking entities in one document, e.g. first name mention > to a > >> full name m

Re: CoReference

2017-05-18 Thread Joern Kottmann
This is for linking entities in one document, e.g. first name mention to a full name mention, or to he, she, it. Jörn On Thu, May 18, 2017 at 1:27 PM, Damiano Porta wrote: > Hi, thanks but I need to link entities to each others . I do not need to > link entities to external resources. > > Damia

[VOTE] Apache OpenNLP 1.8.0 Release Candidate 3

2017-05-17 Thread Joern Kottmann
The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP 1.8.0 Release Candidate 3. The RC 3 distributables can be downloaded from here: https://repository.apache.org/content/repositories/orgapacheopennlp-1013/org/apache/opennlp/opennlp-distr/1.8.0/ The release was made from the A

Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-15 Thread Joern Kottmann
15, 2017 at 6:21 PM, Richard Eckart de Castilho wrote: > > On 15.05.2017, at 16:35, Joern Kottmann wrote: > > > > Richard, I believe I found the problem with the parser, would you mind to > > take a look? > > > > This PR should fix it: > > https://github.

Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-15 Thread Joern Kottmann
Richard, I believe I found the problem with the parser, would you mind to take a look? This PR should fix it: https://github.com/apache/opennlp/pull/199 Jörn On Mon, May 15, 2017 at 4:14 PM, Richard Eckart de Castilho wrote: > Hi Rodrigo, > > On 15.05.2017, at 15:36, Rodrigo Agerri wrote: > >

Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-15 Thread Joern Kottmann
should be sufficient. > > Could there maybe be a problem with duplicates being dropped silently > by the move from the ListHeap to the TreeSet? If duplicate removal > is not important, then maybe sorting the heap after it has been filled > would be a better option than using a permanen
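
The question raised in the quoted text, whether a `TreeSet` silently drops duplicates, can be checked with plain JDK code. The sketch below is hypothetical and does not reproduce the OpenNLP parser; it only shows that a `TreeSet` treats two elements as equal whenever its comparator returns 0, whereas a sorted list keeps both.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical demo; Candidate is not an OpenNLP class.
class TreeSetDuplicateDemo {

    static final class Candidate {
        final String label;
        final double score;

        Candidate(String label, double score) {
            this.label = label;
            this.score = score;
        }
    }

    // Orders candidates by score only, so equal scores compare as 0.
    static final Comparator<Candidate> BY_SCORE =
        Comparator.comparingDouble((Candidate c) -> c.score);

    // A TreeSet keeps only one element per comparator-equal group.
    static int sizeInTreeSet(List<Candidate> candidates) {
        TreeSet<Candidate> set = new TreeSet<>(BY_SCORE);
        set.addAll(candidates);
        return set.size();
    }

    // Sorting a list after filling it keeps every element.
    static int sizeInSortedList(List<Candidate> candidates) {
        List<Candidate> copy = new ArrayList<>(candidates);
        copy.sort(BY_SCORE);
        return copy.size();
    }
}
```

Whether this effect was behind the parser differences discussed in the thread cannot be told from the snippet; the sketch only demonstrates the JDK behaviour being asked about.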

Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-15 Thread Joern Kottmann
://github.com/apache/opennlp/blob/3df659b9bfb02084e782f1e8b6ec716f56e0611c/opennlp-tools/src/test/java/opennlp/tools/eval/OntoNotes4ParserEval.java#L70 On Sat, May 13, 2017 at 10:35 PM, Richard Eckart de Castilho wrote: > Hi all, > > > On 11.05.2017, at 18:37, Joern Kottmann wrote:

Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-12 Thread Joern Kottmann
0RC2 before release. > Daniel > > On May 11, 2017 12:38 PM, "Joern Kottmann" wrote: > > > The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP > > 1.8.0 Release Candidate 2. > > > > The RC 2 distributables can be downloaded from her

Re: Error when processing doap file http://opennlp.apache.org/doap_opennlp.rdf:

2017-05-12 Thread Joern Kottmann
Thanks for forwarding this to the dev list. The file is now available again. Jörn On Fri, May 12, 2017 at 10:46 AM, sebb wrote: > -- Forwarded message -- > From: Projects > Date: 12 May 2017 at 03:00 > Subject: Error when processing doap file > http://opennlp.apache.org/doap_op

[VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-11 Thread Joern Kottmann
The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP 1.8.0 Release Candidate 2. The RC 2 distributables can be downloaded from here: https://repository.apache.org/content/repositories/orgapacheopennlp-1012/org/apache/opennlp/opennlp-distr/1.8.0/ The release was made from the A

[ANNOUNCE] New website for Apache OpenNLP

2017-05-11 Thread Joern Kottmann
Hello all, we launched a redesigned new web site for Apache OpenNLP with a new logo - check it out at https://opennlp.apache.org Regards, The Apache OpenNLP Team

Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate

2017-05-11 Thread Joern Kottmann
I am canceling the vote due to the above-mentioned bug. Let's prepare another RC which has this issue fixed. Jörn On Thu, May 11, 2017 at 9:51 AM, Joern Kottmann wrote: > I am changing my vote to -1 due to a bug in the DictionaryLemmatizer, in > case the word and postag pair is not

Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate

2017-05-11 Thread Joern Kottmann
wrote: > +1 non-binding > > Built and tested on Ubuntu 16.04 and Amazon Linux 2017.03.0 with OpenJDK8. > NOTICE and LICENSE files look good. > Created and tested a token name finder model. > > Jeff > > > On Tue, May 9, 2017 at 2:41 PM, Joern Kottmann wrote: > > &g

[VOTE] Apache OpenNLP 1.8.0 Release Candidate

2017-05-09 Thread Joern Kottmann
The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP 1.8.0 Release Candidate 1. The RC 1 distributables can be downloaded from here: https://repository.apache.org/content/repositories/orgapacheopennlp-1011/org/apache/opennlp/opennlp-distr/1.8.0/ The release was made from the A

Re: InsufficientTrainingDataException while cross validating with TokenNameFinderCrossValidator

2017-04-19 Thread Joern Kottmann
Send us a patch to improve the documentation. Jörn On Mon, Apr 17, 2017 at 9:44 AM, Saurabh Jain wrote: > Thanks Jeff it worked. I think it is not mentioned in docs. > > On Mon, Apr 17, 2017 at 1:20 AM, Jeff Zemerick > wrote: > > > Saurabh, > > > > Are there document boundaries (new lines) in

Welcome our new committers

2017-04-14 Thread Joern Kottmann
Hi all, The Apache OpenNLP PPMC is very pleased to announce that Daniel Russ, Peter Thygesen and Koji Sekiguchi accepted our invitation to become Apache OpenNLP committers. Congratulations, and welcome to the team! Jörn

Re: Codec classes (BioCodec and BilouCodec)

2017-03-15 Thread Joern Kottmann
to a base class called Codec (or > CodecBase... or how you normally named base classes) The method is also > called by BilouCodec, which then is calling directly to BioCodec... (which > smells) > > Perhaps another refactoring task? > > /Peter > > 2017-03-14 12:48 GMT+01:00 J

Re: Codec classes (BioCodec and BilouCodec)

2017-03-14 Thread Joern Kottmann
On Sun, Mar 12, 2017 at 11:19 PM, Peter Thygesen wrote: > Hi OpenNLP Developers > > > > After I some weeks ago added a PR for NameFinderSequenceValidator test, I > spent some time looking into the BilouNameFinderSequenceValidator and the > Codec classes, trying to understand how they work and I w

Re: Training perceptron model

2017-03-06 Thread Joern Kottmann
es i apply the labels with a script and then > i > > pass 0-15k to train the model (to build the .bin) and 15k-30k to evaluate > > it. > > > > I am trying to build the model with 300 iterations again. > > > > 2017-03-06 13:31 GMT+01:00 Joern Kottmann : > &g

Re: Training perceptron model

2017-03-06 Thread Joern Kottmann
2 days and > it is still running... i will block it > > i still do not understand what number should i set as *folds*. Ok i will > set a number > 1 but, should i have to pay more attention to this > parameter? if i set 8 or 10 does it matter anything? > > > >

Re: Training perceptron model

2017-03-06 Thread Joern Kottmann
FMeasure result = test.getFMeasure(); > > System.out.println(result.toString()); > } > > What should i put on the second parameter of test.evaluate() ? Each sample > (in samples variable) represents a document. There are no relations with > other samples. > > 2017-

Re: CUDA

2017-03-06 Thread Joern Kottmann
or MAXENT classifier, right ? > > 2017-03-06 10:17 GMT+01:00 Joern Kottmann : > > > Hello, > > > > no, we don't support CUDA. At some point we probably add support for one > of > > the deep learning packages and those usually use CUDA. > > > > Jörn &g

Re: Training perceptron model

2017-03-06 Thread Joern Kottmann
ber to > 100 i can finally get the model in half an hour. > > The problem with 300 iterations is that i can see the model (.bin) in half > an hour too but the computations are still running. So i do not really > understand what it is doing. > > Damiano > > 2017-03-

Re: CUDA

2017-03-06 Thread Joern Kottmann
Hello, no, we don't support CUDA. At some point we will probably add support for one of the deep learning packages, and those usually use CUDA. Jörn On Sat, Mar 4, 2017 at 5:17 PM, Damiano Porta wrote: > Hello everybody, > > does OpenNLP support CUDA parallel computing? > > Damiano >

Re: Training perceptron model

2017-03-06 Thread Joern Kottmann
Hello, this looks like output from the cross validator. Jörn On Sun, Mar 5, 2017 at 11:34 AM, Damiano Porta wrote: > Hello, > > I am training a NER model with perceptron classifier (using OpenNLP 1.7.0) > > the output of the training is: > > Indexing events using cutoff of 0 > > Computing even

Re: Me getting involved

2017-02-27 Thread Joern Kottmann
Hello, yes, we are always very eager to get new contributors. From my past experience I think the best way to get started is to write a few unit tests. That will help you learn about our code base and will teach you how to get contributions into OpenNLP. Otherwise if you have something specific yo

Re: New lines at end of source files

2017-02-14 Thread Joern Kottmann
+1 to merge this Jörn On Mon, Feb 13, 2017 at 9:26 PM, Jeffrey Zemerick wrote: > On a recent pull request there was a comment that some new source files did > not have new lines at the ends of the files. When I added a rule to > checkstyle for new lines at the ends of files there were a good nu

Re: Multiple models and String.intern

2017-02-08 Thread Joern Kottmann
els exceed the default JVM limit, and an option would also be > useful for cases when the models were made from different data sources. > (I'm assuming in that case using string pooling would be detrimental to > performance.) > > Jeff > > > On Wed, Feb 8, 2017 at 5:50 AM

Multiple models and String.intern

2017-02-08 Thread Joern Kottmann
Hello all, I often run multiple models in production, often trained on the same data but with different types (typical name finder scenario). There could be one model to detect person names, and another to detect locations. The predicate Strings inside those models are always the same but the m
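
As background for the thread title, here is a minimal sketch of what `String.intern` does, using only the JDK. The class name `InternDemo` and the predicate string are made up; how OpenNLP actually loads predicates is not shown in the snippet and is not assumed here.

```java
// Hypothetical demo class; not OpenNLP code.
class InternDemo {

    // Simulates reading the same predicate from two different model files:
    // each call creates a new String object with identical content.
    static String readPredicate(char[] raw) {
        return new String(raw);
    }

    public static void main(String[] args) {
        String fromPersonModel = readPredicate("w=house".toCharArray());
        String fromLocationModel = readPredicate("w=house".toCharArray());

        // Equal content, but two separate objects on the heap.
        System.out.println(fromPersonModel.equals(fromLocationModel));
        System.out.println(fromPersonModel == fromLocationModel);

        // intern() returns one canonical instance per distinct content,
        // so both models could share a single copy of the predicate.
        System.out.println(fromPersonModel.intern() == fromLocationModel.intern());
    }
}
```

Interning trades memory for a lookup in the JVM's string pool on every call, so whether it pays off depends on how many predicates are actually shared between the loaded models.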

Re: Name Finder trainer default settings

2017-02-07 Thread Joern Kottmann
problem. I think > that our > > long-term goal should be to add a CRF, and make it the default for > the > > NameFinder. > > > > > > > > Daniel > > > > > > > > > > > > On 2/6/17, 12:40 PM, "

Name Finder trainer default settings

2017-02-06 Thread Joern Kottmann
Hello all, I would like to propose to switch the default training algorithm from maxent gis to perceptron for the Name Finder. In all the data sets I tried perceptron performs better than maxent gis and I believe that would be a much more sensible default. A user can always override the default b

Fwd: Training models for OpenNLP on the OntoNotes corpus

2017-02-04 Thread Joern Kottmann
-- Forwarded message -- From: "Joern Kottmann" Date: Feb 3, 2017 11:51 AM Subject: Training models for OpenNLP on the OntoNotes corpus To: Cc: Hello all, the Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It su

Re: [VOTE] Apache OpenNLP 1.7.2 Release Candidate

2017-02-03 Thread Joern Kottmann
+1 I ran the eval tests and they all passed except one, which needed more memory. That test case has to be adapted to run fast and with much less memory; we should do that for the 1.7.3 release. Jörn On Wed, Feb 1, 2017 at 5:52 PM, Suneel Marthi wrote: > The Apache OpenNLP PMC wou

Re: 1.7.2 release

2017-02-01 Thread Joern Kottmann
for high-risk changes, and we now should see problems before we merge new work into master. Jörn On Wed, Feb 1, 2017 at 2:59 PM, Richard Eckart de Castilho wrote: > On 01.02.2017, at 14:35, Joern Kottmann wrote: > > > > The project is now more agile and we can cut a release w

Re: 1.7.2 release

2017-02-01 Thread Joern Kottmann
wrote: > Hi Jörn, > > I am curious - is there a specific reason that OpenNLP suddenly > has this flurry of activity? > > Best, > > -- Richard > > > On 31.01.2017, at 13:45, Joern Kottmann wrote: > > > > Dear all, > > > > We will be making a

Re: [VOTE] Apache OpenNLP 1.7.2 Release Candidate

2017-02-01 Thread Joern Kottmann
The GIS training is not printing any messages due to a bug. Let's cancel this vote and try to release again with that bug fixed. Also, the Data Indexers' printing can't be controlled with the PrintMessages parameter; we should fix that as well. Jörn On Tue, Jan 31, 2017 at 2:33 PM, Suneel Marthi w

1.7.2 release

2017-01-31 Thread Joern Kottmann
Dear all, We will be making a 1.7.2 release of OpenNLP today. All issues in jira are closed now. Jörn

Re: OpenNLP model for model 1.7.3+

2017-01-30 Thread Joern Kottmann
Hello, I agree with Richard, we can't do such a step in a minor version increase because we also promise that models work with older minor versions, e.g. a model trained with 1.7.4 is supposed to work with 1.7.0. Users probably have a much higher overhead to retrain their models than to update to th

Re: [VOTE] Apache OpenNLP 1.7.1 Release Candidate 1

2017-01-23 Thread Joern Kottmann
+1 binding Jörn On Jan 21, 2017 12:18 AM, "Suneel Marthi" wrote: The Apache OpenNLP PMC would like to call for a Vote on Apache OpenNLP 1.7.1 Release Candidate. The Release artifacts can be downloaded from: https://repository.apache.org/content/repositories/ orgapacheopennlp-1008/org/apache/op

Re: [VOTE] Apache OpenNLP 1.7.1 Release Candidate 1

2017-01-22 Thread Joern Kottmann
On Sat, 2017-01-21 at 21:09 -0500, Jeffrey Zemerick wrote: > I went to the opennlp-distr/README for a summary of changes in 1.7.1 > but I > think it is the same as it was for 1.7.0. Is that file typically > updated > for revision releases? The link at the bottom of the RELEASE_NOTES to > the > fixe
