Update to Java 8

2016-12-19 Thread Joern Kottmann
Hello all, Java 7 is already EOL. Should we update OpenNLP to Java 8 for the 1.7.0 release, any opinions? Jörn

Re: TODO in GeneratorFactory.java

2016-12-13 Thread Joern Kottmann
Yes, that is a nice change. Can you open a JIRA issue for it and send me the PR? I would like to include that. Jörn On Tue, Dec 13, 2016 at 1:41 PM, Jeffrey Zemerick wrote: > Hi everyone, > > I came across a TODO in GeneratorFactory.java to make > the

Re: Next release

2016-11-09 Thread Joern Kottmann
> > On Tue, Nov 8, 2016 at 12:07 PM, Joern Kottmann <kottm...@gmail.com> > wrote: > > Hello Rodrigo, > > > > would you mind adding this to our README file? > > > > It is in opennlp-distr and should contain the notable changes for the > > release

Next release

2016-11-07 Thread Joern Kottmann
Hello all, since our last release it has been a while and we received quite a few changes which would be nice to get released. There are still some open Jira issues, but mostly smaller things that can be wrapped up rather quickly. Is there anything important missing which should go into the

Re: Why can i not serialize a Dictionary ?

2016-10-29 Thread Joern Kottmann
eaturegen.GeneratorFactory$CachedFeature > > > GeneratorFactory.create(GeneratorFactory.java:171) > > > at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:661) > > > at opennlp.tools.util.featuregen.GeneratorFactory$Ag

Re: Why can i not serialize a Dictionary ?

2016-10-28 Thread Joern Kottmann
ializer too? > > > > > 2016-10-27 22:04 GMT+02:00 Joern Kottmann <kottm...@gmail.com>: > > > On Thu, 2016-10-27 at 21:18 +0200, Joern Kottmann wrote: > > > On Tue, 2016-10-25 at 18:49 +0200, Damiano Porta wrote: > > > > > > > > i am getti

Re: new tool training

2016-10-27 Thread Joern Kottmann
On Thu, 2016-10-27 at 16:04 +, Russ, Daniel (NIH/CIT) [E] wrote: > Is it important to calculate the hash of all events? I missed that question. No, this is included for debugging purposes only. With the hash it is possible to see if two models have been trained from exactly the same source with
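A minimal sketch of the debugging idea, assuming each training event can be rendered as a string: feed every event, in order, into a single digest so that identical training data yields an identical fingerprint. The `EventHash` class, the `outcome|context` rendering, and the choice of SHA-256 are illustrative assumptions, not OpenNLP's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

public class EventHash {

    // Hypothetical event rendering: the outcome followed by its context features.
    static String render(String outcome, String[] context) {
        return outcome + "|" + String.join(" ", context);
    }

    // Feeds every event, in order, into one SHA-256 digest and returns it as hex.
    static String hashEvents(List<String> renderedEvents) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
        for (String event : renderedEvents) {
            digest.update(event.getBytes(StandardCharsets.UTF_8));
            digest.update((byte) '\n'); // separator so ["ab", "c"] and ["a", "bc"] differ
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        List<String> runA = Arrays.asList(render("person-start", new String[] {"w=John", "cap"}));
        List<String> runB = Arrays.asList(render("person-start", new String[] {"w=John", "cap"}));
        System.out.println(hashEvents(runA).equals(hashEvents(runB))); // true
    }
}
```

Because the digest is order-sensitive, the same events in a different order produce a different hash, which fits the goal of detecting models trained from exactly the same source.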

Re: Why can i not serialize a Dictionary ?

2016-10-27 Thread Joern Kottmann
On Thu, 2016-10-27 at 21:18 +0200, Joern Kottmann wrote: > On Tue, 2016-10-25 at 18:49 +0200, Damiano Porta wrote: > > > > i am getting a strange error during the compiling of a NER model. > > Basically, the end of the build output is: > > > >  98:  ...

Re: Why can i not serialize a Dictionary ?

2016-10-27 Thread Joern Kottmann
On Tue, 2016-10-25 at 18:49 +0200, Damiano Porta wrote: > i am getting a strange error during the compiling of a NER model. > Basically, the end of the build output is: > >  98:  ... loglikelihood=-13340.018762351776 0.999005934601099 >  99:  ... loglikelihood=-13258.358751926637

Re: new tool training

2016-10-27 Thread Joern Kottmann
On Thu, 2016-10-27 at 16:04 +, Russ, Daniel (NIH/CIT) [E] wrote: > Hello, > > Okay, I found why my toy worked. I call > AbstractEventTrainer.doTrain(DataIndexer) as opposed to > AbstractEventTrainer.train(ObjectStream). The train method > calls isValid(). That sets the value of threads
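The behaviour described above is a template-method pitfall. The following is a minimal sketch with hypothetical class and field names, not OpenNLP's real `AbstractEventTrainer`: the public `train` entry point runs validation, which also initializes the thread count from the parameters, while calling `doTrain` directly skips that step and leaves the field at its default.

```java
import java.util.HashMap;
import java.util.Map;

public class TrainerSketch {

    // Hypothetical stand-in for a trainer base class; not OpenNLP's API.
    static class SketchTrainer {
        private final Map<String, String> params;
        int threads; // stays 0 until isValid() runs

        SketchTrainer(Map<String, String> params) {
            this.params = params;
        }

        // Validation doubles as initialization, which is the source of the pitfall.
        boolean isValid() {
            threads = Integer.parseInt(params.getOrDefault("Threads", "1"));
            return threads > 0;
        }

        // Public entry point: validates (and thereby initializes) before training.
        int train() {
            if (!isValid()) {
                throw new IllegalArgumentException("invalid training parameters");
            }
            return doTrain();
        }

        // Does the actual work; here it just returns the thread count it would use.
        int doTrain() {
            return threads;
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("Threads", "4");
        System.out.println(new SketchTrainer(params).train());   // 4
        System.out.println(new SketchTrainer(params).doTrain()); // 0, isValid() never ran
    }
}
```

In this sketch, moving the initialization into the constructor (or into `doTrain` itself) would remove the dependency on call order.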

Re: new tool training

2016-10-27 Thread Joern Kottmann
On Thu, 2016-10-27 at 15:49 +, Russ, Daniel (NIH/CIT) [E] wrote: > > Comment 2: > Do you have a preference where the variable should go?  I think > AbstractTrainer is the appropriate place for PSF variable dealing > with ALL trainers, so Threads_(P/D) should be there.  I would remove > and

Re: new tool training

2016-10-27 Thread Joern Kottmann
On Thu, Oct 27, 2016 at 4:41 PM, Russ, Daniel (NIH/CIT) [E] < dr...@mail.nih.gov> wrote: > Hello, > > Background: >I am developing a tool that uses OpenNLP. I have a model that extends > BaseModel, and several AbstractModels. I allow the user (myself) to > specify the TrainerType (GIS/QN)

Re: Custom Features Generator example

2016-10-25 Thread Joern Kottmann
We should probably create an example and add it to our documentation. Jörn On Tue, Oct 25, 2016 at 1:39 PM, Joern Kottmann <kottm...@gmail.com> wrote: > You need to use a constructor which is public and has no arguments. > > The parameters can be passed in onl
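The constructor requirement follows from reflective instantiation: a descriptor-driven factory only knows the class name, so the only constructor it can reliably call is a public one without arguments. A minimal sketch with a hypothetical `FeatureGenerator` interface and `SpanFeatureGeneratorSketch` class (not OpenNLP's types); passing the parameters through a separate hypothetical `init(Map)` step is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class ReflectiveLoading {

    // Hypothetical generator contract; not OpenNLP's AdaptiveFeatureGenerator.
    public interface FeatureGenerator {
        void init(Map<String, String> attributes);
        void createFeatures(List<String> features, String[] tokens, int index);
    }

    // Must be public, static (if nested), and have a public no-argument constructor.
    public static class SpanFeatureGeneratorSketch implements FeatureGenerator {
        private String prefix = "span";

        public SpanFeatureGeneratorSketch() {
        }

        @Override
        public void init(Map<String, String> attributes) {
            prefix = attributes.getOrDefault("prefix", prefix);
        }

        @Override
        public void createFeatures(List<String> features, String[] tokens, int index) {
            features.add(prefix + "=" + tokens[index].toLowerCase());
        }
    }

    // Loads a generator from its class name, as a descriptor-driven factory would.
    static FeatureGenerator load(String className, Map<String, String> attributes)
            throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        FeatureGenerator generator =
                (FeatureGenerator) clazz.getDeclaredConstructor().newInstance();
        generator.init(attributes);
        return generator;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        FeatureGenerator gen = load(SpanFeatureGeneratorSketch.class.getName(),
                Collections.singletonMap("prefix", "sp"));
        List<String> features = new ArrayList<>();
        gen.createFeatures(features, new String[] {"Rome"}, 0);
        System.out.println(features); // [sp=rome]
    }
}
```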

Re: Custom Features Generator example

2016-10-25 Thread Joern Kottmann
> > > System.out.println(prefix); > > System.out.println((String)finder); > > System.out.println(prevWindowSize); > > System.out.println(nextWindowSize); > > System.exit(1); > > > > } > > > > It is obviously a test to un

Re: Custom Features Generator example

2016-10-25 Thread Joern Kottmann
What is the constructor of the com.damiano.parser.generator.SpanFeatureGenerator class? Jörn On Tue, Oct 25, 2016 at 11:51 AM, Damiano Porta wrote: > Hello, > I have created a custom generator implementing the AdaptiveFeatureGenerator > interface. > > I am getting this

Re: ContextGenerator

2016-10-24 Thread Joern Kottmann
Hello, the ContextGenerator is not used much anymore and was replaced with context generators which are specific to a component. I think we can safely make it generic, and the change wouldn't break backward compatibility anyway. Jörn On Fri, Oct 21, 2016 at 3:40 PM, Russ, Daniel (NIH/CIT)
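Making the interface generic lets each component declare the type of object it builds context for, instead of casting from `Object`. A sketch with hypothetical names (`ContextGenerator<T>`, `TokenPosition`, `TokenContextGenerator`); the real interface's method names may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class GenericContext {

    // Hypothetical generic form of a context generator.
    interface ContextGenerator<T> {
        String[] getContext(T input);
    }

    // Hypothetical input type: a token position within a sentence.
    static final class TokenPosition {
        final String[] tokens;
        final int index;

        TokenPosition(String[] tokens, int index) {
            this.tokens = tokens;
            this.index = index;
        }
    }

    // A component-specific generator; no cast from Object is needed.
    static final class TokenContextGenerator implements ContextGenerator<TokenPosition> {
        @Override
        public String[] getContext(TokenPosition pos) {
            List<String> context = new ArrayList<>();
            context.add("w=" + pos.tokens[pos.index]);
            context.add("prev=" + (pos.index > 0 ? pos.tokens[pos.index - 1] : "BOS"));
            return context.toArray(new String[0]);
        }
    }

    public static void main(String[] args) {
        ContextGenerator<TokenPosition> gen = new TokenContextGenerator();
        String[] ctx = gen.getContext(new TokenPosition(new String[] {"New", "York"}, 1));
        System.out.println(String.join(" ", ctx)); // w=York prev=New
    }
}
```

Existing callers that use the raw type `ContextGenerator` would still compile (with unchecked warnings), which is consistent with the point that the change need not break backward compatibility.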

Re: Access to Git

2016-10-21 Thread Joern Kottmann
>> > >> or in the svn repo > >> > >> http://svn.apache.org/viewvc/opennlp/trunk/ > >> > >> it does however appear in the original git repo > >> > >> https://git-wip-us.apache.org/repos/asf?p=opennlp.git;a=summary > >>

Re: Moving brat annotator to opennlp.git

2016-10-19 Thread Joern Kottmann
Madhawa Kasun Gunasekara <madhaw...@gmail.com>: > > > +1 > > > > Madhawa > > > > On Wed, Oct 19, 2016 at 2:20 PM, "Shuo Xu" <pzc...@gmail.com> wrote: > > > > > +1 > > > > > > > > > On Wed, Oct 19, 2016 at 12

Moving brat annotator to opennlp.git

2016-10-18 Thread Joern Kottmann
Hello all, what do you think about including the brat NER annotator in the 1.6.1 release? I believe it is important that we include it so that our users can more easily run custom annotation projects. As part of the move we need to extend the documentation so everyone can easily get it up and running

Re: Morfologik Addon

2016-10-13 Thread Joern Kottmann
We could distribute it with our main release, similar to how we do with opennlp-uima. I think that would make sense. If people would like to use it they can add it as an extra dependency. There are probably also other things we can distribute in a similar fashion with the next release. Jörn On

Re: Access to Git

2016-09-19 Thread Joern Kottmann
The opennlp-addons repo is now also available, and opennlp-sandbox will be available soon. Jörn On Thu, 2016-09-15 at 01:12 +0200, Joern Kottmann wrote: > Sorry, it took me a little to figure this out. > > This link explains how it works: > https://reference.apache.org/c

Re: Access to Git

2016-09-14 Thread Joern Kottmann
Sorry, it took me a little to figure this out. This link explains how it works: https://reference.apache.org/committer/git The reponame is opennlp, we will soon also have the other repos opennlp-addons and opennlp-sandbox. Jörn On Fri, Sep 9, 2016 at 10:58 PM, Joern Kottmann <ko

Re: Access to Git

2016-09-09 Thread Joern Kottmann
Hello, yes you can use it. The add-ons and other things are not set up yet as far as I know; I have to ping the infra team about it. Please have a look at the issue I posted to see how to access it. I will work on this on Monday. HTH Jörn On Sep 9, 2016 19:10, "William Colen"

Re: Is sentence detection process really needed?

2016-08-26 Thread Joern Kottmann
The name finder has the concept of "adaptive data" in the feature generation. The feature generators can remember things from previous sentences and use it to generate features based on it. Usually that can help with the recognition rate if you have names that are repeated. You can tweak this to
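A minimal sketch of the "adaptive data" idea. The interface is hypothetical and only loosely modeled on the description above (OpenNLP's own `AdaptiveFeatureGenerator` defines its own method signatures), and the outcome labels such as `person-start` and `other` are illustrative assumptions. The generator records tokens that were tagged as part of a name in earlier sentences and emits an extra feature when the same token appears again; the cache is cleared at document boundaries.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AdaptiveSketch {

    // Hypothetical adaptive generator that remembers names across sentences of one document.
    static final class PreviousNameFeature {
        private final Set<String> seenNames = new HashSet<>();

        void createFeatures(List<String> features, String[] tokens, int index) {
            if (seenNames.contains(tokens[index])) {
                features.add("pd=seen-as-name");
            }
        }

        // Called after a sentence is labeled; "other" marks tokens outside any name.
        void updateAdaptiveData(String[] tokens, String[] outcomes) {
            for (int i = 0; i < tokens.length; i++) {
                if (!"other".equals(outcomes[i])) {
                    seenNames.add(tokens[i]);
                }
            }
        }

        // Called at document boundaries so names do not leak into the next document.
        void clearAdaptiveData() {
            seenNames.clear();
        }
    }

    public static void main(String[] args) {
        PreviousNameFeature gen = new PreviousNameFeature();
        gen.updateAdaptiveData(new String[] {"Kottmann", "said"},
                new String[] {"person-start", "other"});

        List<String> features = new ArrayList<>();
        gen.createFeatures(features, new String[] {"Kottmann", "agreed"}, 0);
        System.out.println(features); // [pd=seen-as-name]

        gen.clearAdaptiveData();
        features.clear();
        gen.createFeatures(features, new String[] {"Kottmann"}, 0);
        System.out.println(features); // []
    }
}
```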

Re: Migrate to Git?

2016-08-19 Thread Joern Kottmann
ic, Joern! I have some SentimentAnalysis stuff to hopefully commit > and > get refactored. Hopefully after that’s done we can ship a release soon and > publish to Central. > > > > On 8/18/16, 5:50 AM, "Joern Kottmann" <kottm...@gmail.com> wrote: > >

Re: Migrate to Git?

2016-08-18 Thread Joern Kottmann
ieval and Data Science Group (IRDS) > Adjunct Associate Professor, Computer Science Department > University of Southern California, Los Angeles, CA 90089 USA > WWW: http://irds.usc.edu/ > ++++++ > > > >

Re: Migrate to Git?

2016-07-04 Thread Joern Kottmann
hern California, Los Angeles, CA 90089 USA > WWW: http://irds.usc.edu/ > ++++++ > > > > > > > > > > > On 7/4/16, 7:36 AM, "Joern Kottmann" <kottm...@gmail.com> wrote: > > > Hello all, > > > > do we sti

Re: Migrate to Git?

2016-07-04 Thread Joern Kottmann
Hello all, do we still want to do this? It has been a while since we discussed it. I am happy to get it done if we reach consensus on it again. My +1 again. Jörn On Thu, Dec 20, 2012 at 4:40 PM, Tommaso Teofili wrote: > in my opinion that would be good, +1 > Tommaso >

Re: Model to detect the gender

2016-07-04 Thread Joern Kottmann
Hello, there are also other interesting properties e.g. person title (e.g. professor, doctor), job title/position, company legal form. And much more for other entity types. Maybe it would be worth it to build a dedicated component to extract properties from entities. Jörn On Fri, Jul 1, 2016

Re: Performances of OpenNLP tools

2016-07-04 Thread Joern Kottmann
therwise, if anyone would like to suggest proper data-sets for testing > each component that would be really helpful > > Anthony > > On Thu, Jun 23, 2016 at 12:18 AM, Joern Kottmann <kottm...@gmail.com> > wrote: > > > It would be nice to get MASC support into the OpenNLP for

Re: DeepLearning4J as a ML for OpenNLP

2016-07-01 Thread Joern Kottmann
Hello, the people from deeplearning4j are rather nice and I discussed with them for a while how it can be used for OpenNLP. The state back then was that they don't properly support the sparse feature vectors we use in OpenNLP today. Instead we would need to use word embeddings. In the end I never

Re: SentimentAnalysisParser updates

2016-07-01 Thread Joern Kottmann
Hello, would be nice to get a pull request for the work you did. Thanks, Jörn On Wed, Jun 29, 2016 at 8:08 PM, Anastasija Mensikova < mensikova.anastas...@gmail.com> wrote: > Hi everyone, > > Some updates on our SentimentAnalysisParser. > > For the past week I worked on making a pull request

Re: Performances of OpenNLP tools

2016-06-22 Thread Joern Kottmann
> -Jason > > On Tue, 21 Jun 2016 at 10:46 Joern Kottmann <kottm...@gmail.com> wrote: > > > There are some research papers which study and compare the performance of > > NLP toolkits, but be careful: often they don't train the NLP tools on the > > same data and the tr

Re: Performances of OpenNLP tools

2016-06-21 Thread Joern Kottmann
There are some research papers which study and compare the performance of NLP toolkits, but be careful: often they don't train the NLP tools on the same data, and the training data makes a big difference to the performance. Jörn On Tue, Jun 21, 2016 at 5:44 PM, Joern Kottmann <kottm...@gmail.

Re: Performances of OpenNLP tools

2016-06-21 Thread Joern Kottmann
Just don't use the very old existing models. To get good results you have to train on your own data, especially if the domain of the training data doesn't match the domain of the data to be processed. The old models are trained on 90s news, and those don't work well on today's news and

Re: svn commit: r1731145 - in /opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools: lemmatizer/ util/

2016-04-26 Thread Joern Kottmann
Hello Rodrigo, you are adding a couple of Java files in this commit, and I think more in other commits for the lemmatizer. All new Java files must have the AL header. Could you please add the header to the files where it is missing? Thanks, Jörn  On Thu, 2016-02-18 at 21:02 +,

Re: GSoC 2016: OpenNLP Sentiment Analysis

2016-04-26 Thread Joern Kottmann
The Large Movie Review Dataset might be interesting for this as well: http://ai.stanford.edu/~amaas/data/sentiment/ Jörn On Tue, Apr 26, 2016 at 4:26 PM, Anthony Beylerian < anthony.beyler...@gmail.com> wrote: > sentiment analysis discussion doc : > > >

Re: GSoC 2016: OpenNLP Sentiment Analysis

2016-04-26 Thread Joern Kottmann
I will be able to join as well. Jörn On Tue, Apr 26, 2016 at 5:28 AM, Mattmann, Chris A (3980) < chris.a.mattm...@jpl.nasa.gov> wrote: > Hey Anastasija, > > To be honest 9am EST is a little aggressive, I will likely be able > to do 6:40 am PT (am traveling back from DC as I type this) which >

Re: Question about deprecated NameFinderME constructors

2016-03-08 Thread Joern Kottmann
There is a custom xml element where it can load a user defined class for feature generation. So you would add an element like this: https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training.featuregen I think we should remove the deprecated training methods so

Re: Language Model contribution

2016-02-17 Thread Joern Kottmann
Oops, I confused the language model you were working on with language detection. I think the interface is good as it is. Jörn On Wed, Feb 17, 2016 at 10:00 AM, Joern Kottmann <kottm...@gmail.com> wrote: > Hello, > > I saw the language model commit. Thanks for contributing

Language Model contribution

2016-02-17 Thread Joern Kottmann
Hello, I saw the language model commit. Thanks for contributing that! Would it be possible to get a short introduction to it? The interface is supposed to take a StringList. Wouldn't it be better if a user can just pass in a String instead? Otherwise he has to worry about tokenizing a string in
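One way to address this concern, sketched with a hypothetical bigram model class rather than the committed interface: keep the token-based method as the core (analogous to the `StringList` variant) and add a `String` convenience overload that tokenizes internally. Whitespace splitting stands in for a real tokenizer, and the add-one smoothing is an assumption chosen only to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

public class LanguageModelSketch {

    private final Map<String, Integer> unigrams = new HashMap<>();
    private final Map<String, Integer> bigrams = new HashMap<>();

    // Trains on pre-tokenized input, analogous to passing a StringList.
    void add(String[] tokens) {
        for (int i = 0; i < tokens.length; i++) {
            unigrams.merge(tokens[i], 1, Integer::sum);
            if (i > 0) {
                bigrams.merge(tokens[i - 1] + " " + tokens[i], 1, Integer::sum);
            }
        }
    }

    // Core method: product of add-one smoothed bigram probabilities over the tokens.
    double calculateProbability(String[] tokens) {
        int vocabulary = Math.max(1, unigrams.size());
        double p = 1.0;
        for (int i = 1; i < tokens.length; i++) {
            int pairCount = bigrams.getOrDefault(tokens[i - 1] + " " + tokens[i], 0);
            int prevCount = unigrams.getOrDefault(tokens[i - 1], 0);
            p *= (pairCount + 1.0) / (prevCount + vocabulary);
        }
        return p;
    }

    // Convenience overload: callers pass raw text and never tokenize it themselves.
    // Whitespace splitting is a placeholder for a real tokenizer.
    double calculateProbability(String text) {
        return calculateProbability(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        LanguageModelSketch lm = new LanguageModelSketch();
        lm.add(new String[] {"the", "cat", "sat"});
        lm.add(new String[] {"the", "dog", "sat"});
        double fromTokens = lm.calculateProbability(new String[] {"the", "cat"});
        double fromText = lm.calculateProbability("the cat");
        System.out.println(fromTokens == fromText); // true
    }
}
```

Keeping the token-based method as the core still lets users with their own tokenizer bypass the built-in splitting.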

Re: Question about OpenNLP and comparison to e.g., NTLK, Stanford NER, etc.

2015-11-12 Thread Joern Kottmann
On Thu, 2015-11-12 at 15:43 +, Russ, Daniel (NIH/CIT) [E] wrote: > 1) I use the old sourceforge models. I find that the sources of error > in my analysis are usually not due to mistakes in sentence detection or > POS tagging. I don’t have the annotated data or the time/money to > build custom

Re: Question about OpenNLP and comparison to e.g., NTLK, Stanford NER, etc.

2015-11-12 Thread Joern Kottmann
On Thu, 2015-11-12 at 19:50 +, Jason Baldridge wrote: > Having said that, there is a lot of activity in the deep learning > space, > where old techniques (neural nets) are now viable in ways they weren't > previously, and they are outperforming linear classifiers in task > after > task. I'm

Re: mallet addon

2015-10-20 Thread Joern Kottmann
pipe-blog.com/2006/11/22/why-do-you-hate-crfs/ > > but if results are also worse in Maxent, that is intriguing. I will > look at the Mallet implementation to see if I find out something. > > R > > > > On Mon, Oct 12, 2015 at 4:07 PM, Joern Kottmann <kottm...@gmail.co

Re: Out of Bounds Exception in BioCodec.class

2015-10-07 Thread Joern Kottmann
Hello, I can't see the exception. Can you post it just as text please. Thanks, Jörn On Wed, 2015-10-07 at 10:56 -0400, Blizzard, Zach wrote: > Hey Dev team, > > > > I have a quick question about the BioCodec class: I’m trying to create > my own model to train the OpenNLP program, but I’m

Re: mallet addon

2015-09-29 Thread Joern Kottmann
Hello, this doesn't work with the 1.6.0 release. I built it for testing one of the first drafts of the machine learning rewrite we did for 1.6.0, and there have been a few changes since then. Anyway, if you need it, I am happy to fix it up. We can also move it to the sandbox,

Re: svn commit: r1681259 - in /opennlp/trunk: opennlp-distr/pom.xml opennlp-docs/pom.xml opennlp-tools/pom.xml opennlp-uima/pom.xml pom.xml

2015-09-03 Thread Joern Kottmann
Hello, yes the github apache/opennlp repository is always synchronized with our subversion repository here at Apache. If you have a look you will see recent changes in there. Jörn On Tue, May 26, 2015 at 6:07 AM, Ethan Wang wrote: > Hey folks, > > is

Re: Word Sense Disambiguator

2015-07-24 Thread Joern Kottmann
It would be nice if you could share instructions on how to run it. I also would like to give it a try. Jörn On Fri, Jul 24, 2015 at 4:54 AM, Anthony Beylerian anthonybeyler...@hotmail.com wrote: Hello, Yes for the moment we are only using WordNet for sense definitions.The plan is to

Re: GSoC 2015 - WSD Module

2015-06-30 Thread Joern Kottmann
Can you please open some jira issues so we can better keep track of what has to be done. Jörn On Jun 28, 2015 10:23 PM, Joern Kottmann kottm...@gmail.com wrote: Yes, the performance testing has to be there, otherwise it is hard to tell if it works or not. Jörn On Mon, 2015-06-29 at 02:02

Re: GSoC 2015 - WSD Module

2015-06-28 Thread Joern Kottmann
Yes, the performance testing has to be there, otherwise it is hard to tell if it works or not. Jörn On Mon, 2015-06-29 at 02:02 +0900, Anthony Beylerian wrote: Dear Jörn, As a first milestone, for now we have the main interface with two implementations (one unsupervised, one supervised),

Re: GSoC 2015 - WSD Module

2015-06-25 Thread Joern Kottmann
On Mon, 2015-06-22 at 00:55 +0900, Anthony Beylerian wrote: Dear Jörn, Thank you for that. After further surveying, I was thinking of beginning the implementation of an approach based on context clustering as a next step. Maybe similar to the one in [1] which relies on a public (CC-A

Re: GSoC 2015 - WSD Module

2015-06-25 Thread Joern Kottmann
On Wed, 2015-06-10 at 22:13 +0900, Anthony Beylerian wrote: Hi, I attached an initial patch to OPENNLP-758. However, we are currently modifying things a bit since many approaches need to be supported, but would like your recommendations. Here are some notes : 1 - We used extJWNL 2-

Re: GSoC 2015 - WSD Module

2015-06-19 Thread Joern Kottmann
Hello, I will dedicate time tonight to get this pulled in the sandbox and will then also provide some feedback. We can then create new patches against the sandbox to fix further issues. Jörn On Fri, Jun 19, 2015 at 11:02 AM, Anthony Beylerian anthonybeyler...@hotmail.com wrote: Thank you for

Re: GSoC 2015 - WSD Module

2015-06-10 Thread Joern Kottmann
You can attach the patch to one of the existing issues, or you can create a new issue. In the end it doesn't matter much; what is important is that we make progress here and get the initial code into our repository. Subsequent changes can then be done in a patch series. Please try to submit the patch as quickly

Re: GSoC 2015 - WSD Module

2015-06-05 Thread Joern Kottmann
Hello, yes, WordNet is fine, we already depend on it. I just think that remote resources are particularly problematic. For local resources it boils down to their license. Here is the WordNet one: http://wordnet.princeton.edu/wordnet/license/ We might even be able to redistribute this here at

Re: GSoC 2015 - WSD Module

2015-06-03 Thread Joern Kottmann
We should not use remote resources. A remote service adds severe limits to the WSD component. A remote resource will be slow to query (compared to disk or memory), queries might be expensive (pay per request), the license might not allow usage in a way the ASL promises to our users. Another issue

Re: GSoC 2015 - WSD Module

2015-06-01 Thread Joern Kottmann
Hello, I had a look at your APIs. Let's start with the WSDisambiguator. Should that be an interface? // returns the senses ordered by their score (best one first or only 1 in supervised case) String[] disambiguate(String inputText,int inputWordposition); Shouldn't we have a tokenized input? Or
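A sketch of the tokenized-input variant raised in the question, with hypothetical names (`WSDisambiguator` as declared here, and a toy `FirstSenseBaseline` standing in for a real implementation; the sense identifiers are made up). Taking the token array plus the index of the target word avoids each implementation re-tokenizing raw text, much as the name finder and POS tagger already take a `String[]` of tokens.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WsdInterfaceSketch {

    // Hypothetical interface: senses for tokens[index], best-scoring first.
    interface WSDisambiguator {
        String[] disambiguate(String[] tokens, int index);
    }

    // Toy baseline: returns the stored sense inventory in order (first listed sense wins).
    static final class FirstSenseBaseline implements WSDisambiguator {
        private final Map<String, List<String>> inventory = new HashMap<>();

        void addSenses(String lemma, String... senses) {
            inventory.put(lemma.toLowerCase(), Arrays.asList(senses));
        }

        @Override
        public String[] disambiguate(String[] tokens, int index) {
            List<String> senses = inventory.get(tokens[index].toLowerCase());
            return senses == null ? new String[0] : senses.toArray(new String[0]);
        }
    }

    public static void main(String[] args) {
        FirstSenseBaseline wsd = new FirstSenseBaseline();
        wsd.addSenses("bank", "bank%financial", "bank%river");
        String[] senses = wsd.disambiguate(new String[] {"the", "bank", "closed"}, 1);
        System.out.println(senses[0]); // bank%financial
    }
}
```

Supervised and unsupervised implementations could share this one signature, which speaks to the question of whether a single interface can cover all approaches.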

Re: OpenNLP 1.6.0 RC 4 ready for testing

2015-05-28 Thread Joern Kottmann
The chunker and parser tests are fine now. Do you know what's the deal with the sentence detector? The compatibility test is marked as failed. Can we leave it like that or do we have to fix some bugs? Jörn On May 23, 2015 5:35 AM, William Colen co...@apache.org wrote: Our fourth release

Re: GSoC 2015 - WSD Module

2015-05-22 Thread Joern Kottmann
Hello, one of the tasks we should start with is defining the interface for the WSD component. Please have a look at the other components in OpenNLP and try to propose an interface in a similar style. Can we use one interface for all the different implementations? Jörn On Mon, May 18, 2015 at

W2VClassesDictionary class

2015-05-22 Thread Joern Kottmann
Hello, looks like this class was renamed into WordClusterDictionary. Can the class W2VClassesDictionary be removed? We shouldn't include it in RC4 when it is not necessary. Thanks, Jörn

OpenNLP RC4

2015-05-22 Thread Joern Kottmann
Hello, we should now be in a good state to do RC4. We finally solved the performance problems with the parser, and a couple of very minor things were fixed as well (e.g. the NOTICE file update). A major addition since RC3 is a set of automated evaluation tests to speed up our release process. I hope this

Re: How to start contributing to OpenNLP

2015-05-12 Thread Joern Kottmann
Hello, the best way to start is to find something you feel comfortable doing. That could be fixing a bug or implementing a certain feature. Yes, have a look at JIRA there are many issues. Is there some component you would prefer working on? HTH, Jörn On Tue, May 12, 2015 at 5:34 PM, Haider

Re: Automated testing with public data

2015-04-29 Thread Joern Kottmann
richard.eck...@gmail.com: On 15.04.2015, at 10:23, Joern Kottmann kottm...@gmail.com wrote: With publicly accessible data I mean a corpus you can somehow acquire, opposed to the data you create on your own for a project. All the corpora we support in the formats package

Automated testing with public data

2015-04-14 Thread Joern Kottmann
Hi all, this time the progress with the testing for 1.6.0 is rather slow. Most tests are done now and I believe we are in good shape to build RC3. Anyway, it would have been better to be at that stage months ago. To improve the situation in the future I would like to propose to automate all tests

Re: svn commit: r1670574 - /opennlp/trunk/opennlp-uima/src/main/java/opennlp/uima/namefind/NameFinder.java

2015-04-01 Thread Joern Kottmann
The adaptive data is cleared in the documentDone method. The statement in the issue that it is not cleared is not true afaik. Jörn On Wed, Apr 1, 2015 at 9:47 AM, tomm...@apache.org wrote: Author: tommaso Date: Wed Apr 1 07:47:41 2015 New Revision: 1670574 URL:

Re: Regarding performance of opennlp entity extraction modals

2015-03-16 Thread Joern Kottmann
Hello, I don't have any numbers for you. The performance depends highly on the model you are using, the configured feature generation and the number of features in your training data. To get a good number you probably have to run a test on your machines. All modern CPUs have multiple cores these

Re: Student looking to contribute toward OpenNLP

2015-03-16 Thread Joern Kottmann
Hello, thanks for your interest in OpenNLP. We already have a lot of candidates for those GSOC issues. You are welcome to suggest something you would like to work on here on the dev list, create an issue for it and contribute some code to solve it. The best way to get started is probably to

Re: Parser performance bug

2015-03-09 Thread Joern Kottmann
On Fri, 2015-03-06 at 21:07 +0100, Joern Kottmann wrote: The parser still uses the old style of setting the beam size via the constructor. Due to the changes to move that to the training time it doesn't work anymore. The parser has to be changed to set the beam size during training time
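A minimal sketch of the direction described, with hypothetical names (the `BeamSize` key, its default of 3, and the manifest handling are assumptions, not the parser's actual model entries): the beam size is taken from the training parameters, stored with the model, and read back when the model is loaded, so it no longer has to be passed to the constructor.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

public class BeamSizeSketch {

    static final String BEAM_SIZE_KEY = "BeamSize"; // hypothetical manifest key
    static final int DEFAULT_BEAM_SIZE = 3;         // hypothetical default

    // Training time: copy the beam size from the training parameters into the manifest.
    static String writeManifest(Properties trainingParams) throws IOException {
        Properties manifest = new Properties();
        manifest.setProperty(BEAM_SIZE_KEY,
                trainingParams.getProperty(BEAM_SIZE_KEY, String.valueOf(DEFAULT_BEAM_SIZE)));
        StringWriter out = new StringWriter();
        manifest.store(out, null);
        return out.toString();
    }

    // Load time: read the beam size from the model instead of a constructor argument.
    static int readBeamSize(String serializedManifest) throws IOException {
        Properties manifest = new Properties();
        manifest.load(new StringReader(serializedManifest));
        return Integer.parseInt(
                manifest.getProperty(BEAM_SIZE_KEY, String.valueOf(DEFAULT_BEAM_SIZE)));
    }

    public static void main(String[] args) throws IOException {
        Properties params = new Properties();
        params.setProperty(BEAM_SIZE_KEY, "20");
        System.out.println(readBeamSize(writeManifest(params))); // 20
    }
}
```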

Re: Parser performance bug

2015-03-06 Thread Joern Kottmann
. We can't discard the possibility that there was a bug that was fixed with the changes. Regards, William 2015-02-16 12:17 GMT-02:00 Joern Kottmann kottm...@gmail.com: Hi all, the performance of the parser changed a bit. The output of the current version in 1.6.0 RC2 is different

Re: [GSoC2015] OPENNLP-758

2015-03-05 Thread Joern Kottmann
Hello, we already have two students for those two GSoC WSD tasks. They contacted us a while ago (see the WSD thread on this list) and set up the tasks so they can apply for them. I am not sure if it makes much sense to break the WSD tasks down further. Do you have something else in mind you could

Re: Word Sense Disambiguation

2015-02-16 Thread Joern Kottmann
On Mon, 2015-02-16 at 16:29 +0100, Aliaksandr Autayeu wrote: Jörn, to avoid ambiguity in case you addressed me to propose a WSD interface. I'd prefer Anthony to come up with a proposal, because he is closer to the multiple WSD algorithms that would be nice to include in the analysis. Sorry,

Parser performance bug

2015-02-16 Thread Joern Kottmann
Hi all, the performance of the parser changed a bit. The output of the current version in 1.6.0 RC2 is different from the output of the 1.5.3 release, even though there shouldn't have been any difference as far as I can see. The question of what caused that difference came up and I started to bisect

Re: svn commit: r1655546 - in /opennlp/trunk/opennlp-tools: pom.xml src/test/java/opennlp/tools/ngram/ src/test/java/opennlp/tools/ngram/NGramModelTest.java src/test/resources/opennlp/tools/ngram/ src

2015-01-29 Thread Joern Kottmann
Or if that is a problem for the test, you could also tell RAT to ignore it. On my machine the test fails. The two strings don't match. Jörn On Thu, 2015-01-29 at 09:59 +0100, Tommaso Teofili wrote: right, thanks I'll fix both. Tommaso 2015-01-29 9:54 GMT+01:00 Joern Kottmann kottm

Re: svn commit: r1655546 - in /opennlp/trunk/opennlp-tools: pom.xml src/test/java/opennlp/tools/ngram/ src/test/java/opennlp/tools/ngram/NGramModelTest.java src/test/resources/opennlp/tools/ngram/ src

2015-01-29 Thread Joern Kottmann
On Thu, 2015-01-29 at 08:02 +, tomm...@apache.org wrote: +String modelString = IOUtils.toString(nGramModelStream); +String outputString = out.toString(Charset.defaultCharset().name()); The XML serialization writes it in UTF-8. Shouldn't you use UTF-8 for this test too instead of
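A minimal sketch of the suggested fix, using only the JDK (the XML string stands in for the n-gram model serialization and is an assumption): decode the serialized bytes with the same charset the writer used, rather than `Charset.defaultCharset()`, so the comparison no longer depends on the platform encoding of the machine running the test.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {

    // Stand-in for the model serializer: always writes UTF-8, like the XML serialization.
    static ByteArrayOutputStream serialize(String xml) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(xml);
        }
        return out;
    }

    // Decode with the charset the writer used instead of Charset.defaultCharset().
    static String deserialize(ByteArrayOutputStream out) throws UnsupportedEncodingException {
        return out.toString(StandardCharsets.UTF_8.name());
    }

    public static void main(String[] args) throws IOException {
        String expected = "<entry><token>J\u00f6rn</token></entry>";
        System.out.println(expected.equals(deserialize(serialize(expected)))); // true
    }
}
```

With the platform default charset, the same round trip can fail on machines whose default is not UTF-8 as soon as the data contains non-ASCII characters.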

Re: svn commit: r1655546 - in /opennlp/trunk/opennlp-tools: pom.xml src/test/java/opennlp/tools/ngram/ src/test/java/opennlp/tools/ngram/NGramModelTest.java src/test/resources/opennlp/tools/ngram/ src

2015-01-29 Thread Joern Kottmann
It still fails in the assert. I didn't check but I guess the build server has the same problem. Jörn On Thu, 2015-01-29 at 10:25 +0100, Tommaso Teofili wrote: even after my latest commit? If so I'll rearrange the test a bit. Tommaso 2015-01-29 10:21 GMT+01:00 Joern Kottmann kottm

Re: svn commit: r1655546 - in /opennlp/trunk/opennlp-tools: pom.xml src/test/java/opennlp/tools/ngram/ src/test/java/opennlp/tools/ngram/NGramModelTest.java src/test/resources/opennlp/tools/ngram/ src

2015-01-29 Thread Joern Kottmann
wrote: I've just disabled that test, I'll fix it and re-enable it when done. Regards, Tommaso 2015-01-29 10:51 GMT+01:00 Joern Kottmann kottm...@gmail.com: It still fails in the assert. I didn't check but I guess the build server has the same problem. Jörn On Thu, 2015-01-29

Re: svn commit: r1655238 - /opennlp/trunk/

2015-01-28 Thread Joern Kottmann
You didn't remove any entries in your recent commit to them. We moved the main pom.xml from the opennlp folder to the root of the project. Now using eclipse with m2e creates the project files there and I thought it would be nice to have them in svn ignore. Maybe it is possible to consolidate the

Re: Word Sense Disambiguation

2015-01-19 Thread Joern Kottmann
Hello, +1 from me to just go ahead and implement the proposed approach. One goal of this implementation will be to figure out the interface we want to have in OpenNLP for WSD. We can later extend OpenNLP with more implementations which are taking different approaches. Jörn On Thu, 2015-01-15

Build changed opennlp/pom.xml moved to root directory

2014-11-20 Thread Joern Kottmann
Hello everybody, we changed the structure of the project slightly. The main pom.xml used to be located in opennlp/pom.xml. This was done because an Eclipse workspace can't have files at the root level. The Maven convention is to have the file at the root level. I think it is time to move this

Re: 1.6.0 maven repo

2014-11-19 Thread Joern Kottmann
Hello, yes, that should be the current state. Can you please elaborate on the issue you have. Do you get an old version? We should try to make a release of 1.6.0, I think most issues are already solved and remaining bugs we will uncover during the manual testing phase. Jörn On Wed,

Re: Need to speed up the model creation process of OpenNLP

2014-11-19 Thread Joern Kottmann
The runtime decreases almost in proportion to the number of cores your CPU has. If you have a 4-core CPU you might come down from 3 hours to 1 hour. To enable it you need to train with the -params argument and provide a config file for the learner. There are samples shipped with OpenNLP. Jörn On Wed,
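A learner config of the kind described might contain entries like the ones embedded below. The keys `Algorithm`, `Iterations`, `Cutoff` and `Threads` are assumed to match the sample files shipped with OpenNLP, so check those samples before relying on them. The Java sketch only parses such a file with `java.util.Properties` and, as its own design choice rather than OpenNLP behaviour, caps the thread count at the number of available cores, since for CPU-bound training more threads than cores would not be expected to help.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class TrainingParamsSketch {

    // Assumed contents of a learner config passed via -params.
    static final String PARAMS_FILE =
            "Algorithm=MAXENT\n"
            + "Iterations=100\n"
            + "Cutoff=5\n"
            + "Threads=4\n";

    // Reads the requested thread count, defaulting to 1 and capping it at the core count.
    static int effectiveThreads(String paramsText, int cores) throws IOException {
        Properties params = new Properties();
        params.load(new StringReader(paramsText));
        int requested = Integer.parseInt(params.getProperty("Threads", "1"));
        return Math.max(1, Math.min(requested, cores));
    }

    public static void main(String[] args) throws IOException {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Training would use "
                + effectiveThreads(PARAMS_FILE, cores) + " thread(s)");
    }
}
```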

Re: Jenkins build is back to normal : OpenNLP_java8 #2

2014-10-29 Thread Joern Kottmann
Hello, I added an OpenNLP Java 8 build to the build server. This will hopefully inform us about problems with Java 8 in the future. Jörn On Wed, 2014-10-29 at 20:25 +, Apache Jenkins Server wrote: See https://builds.apache.org/job/OpenNLP_java8/2/

What should we do with the SF models?

2014-10-28 Thread Joern Kottmann
Hi all, OpenNLP always came with a couple of trained models which were ready to use for a few languages. The performance a user encounters with those models heavily depends on their input text. Especially the English name finder models which were trained on MUC 6/7 data perform very poorly these

Re: Build failed in Jenkins: OpenNLP #476

2014-10-27 Thread Joern Kottmann
On Mon, 2014-10-27 at 19:15 +, Rodrigo Agerri wrote: Hi, This is not caused by my latest commit, is it not? Your last commit just triggered the build. The build itself was successful. It failed afterwards when it tried to deploy the artifacts to the snapshot repo with: 503 Service

Re: TokenNameFinder and Span probs

2014-05-07 Thread Joern Kottmann
Hello Mark, +1 for your second solution. I believe that is much more intuitive than calling a method afterwards to retrieve the prob for a Span. It is easier to use because the prob is delivered as part of the result and no user action is required to obtain it. We could use this solution
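The second solution, sketched with a hypothetical `ScoredSpan` class (not OpenNLP's `Span`): the probability travels with each result, so callers can read or filter on it directly instead of making a follow-up call.

```java
import java.util.ArrayList;
import java.util.List;

public class SpanProbSketch {

    // Hypothetical span result that carries its own probability.
    static final class ScoredSpan {
        final int start;
        final int end;
        final String type;
        final double prob;

        ScoredSpan(int start, int end, String type, double prob) {
            this.start = start;
            this.end = end;
            this.type = type;
            this.prob = prob;
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + ") " + type + " p=" + prob;
        }
    }

    // Caller-side filtering needs no extra API call because the prob is part of the result.
    static List<ScoredSpan> confident(List<ScoredSpan> spans, double threshold) {
        List<ScoredSpan> kept = new ArrayList<>();
        for (ScoredSpan span : spans) {
            if (span.prob >= threshold) {
                kept.add(span);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<ScoredSpan> found = new ArrayList<>();
        found.add(new ScoredSpan(0, 2, "person", 0.93));
        found.add(new ScoredSpan(5, 6, "location", 0.41));
        System.out.println(confident(found, 0.5)); // [[0..2) person p=0.93]
    }
}
```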

Re: Please review the 1.5.3 release announcement

2013-04-17 Thread Joern Kottmann
Yes, we are ready, everything is done. Let's send the announcement. Jörn On Wed, Apr 17, 2013 at 2:44 PM, William Colen william.co...@gmail.com wrote: Jörn, thank you for updating the web site. I already added a news item. Now are we ready to send the announce? On Mon, Apr 15, 2013 at 6:52
