OpenNLP Similarity release (Was: Re: Build failed in Jenkins: OpenNLP #476)

2014-10-28 Thread Jörn Kottmann

Yes it would be great to get it released.

I suggest we move it from the sandbox to the addons and then we make an
addons release.

Any opinions?

Jörn

On 10/27/2014 11:54 PM, Boris Galitsky wrote:

Hi guys

   since you are talking about the build - when this project is moved to GitHub,
would I have a chance to try to deploy
OpenNLP.Similarity?
   
   I struggled for some time to deploy it a couple of years back.


Regards
Boris



Subject: Re: Build failed in Jenkins: OpenNLP #476
From: kottm...@gmail.com
To: dev@opennlp.apache.org
Date: Mon, 27 Oct 2014 22:50:17 +0100

On Mon, 2014-10-27 at 19:15 +, Rodrigo Agerri wrote:

Hi,

This is not caused by my latest commit, is it?

Your last commit just triggered the build.
The build itself was successful. It failed afterwards when it tried to
deploy the artifacts to the snapshot repo with: 503 Service Temporarily
Unavailable

It will probably work if we trigger it again.

Jörn







RE: OpenNLP Similarity release

2014-10-28 Thread Boris Galitsky
I will then clean up the code and make sure all tests pass.

Regards
Boris


 Date: Tue, 28 Oct 2014 08:46:33 +0100
 From: kottm...@gmail.com
 To: dev@opennlp.apache.org
 Subject: OpenNLP Similarity release (Was: Re: Build failed in Jenkins: 
 OpenNLP #476)

 Yes it would be great to get it released.

 I suggest we move it from the sandbox to the addons and then we make an
 addons release.

 Any opinions?

 Jörn

 On 10/27/2014 11:54 PM, Boris Galitsky wrote:
 Hi guys

 since you are talking about the build - when this project is moved to GitHub,
 would I have a chance to try to deploy
 OpenNLP.Similarity?

 I struggled for some time to deploy it a couple of years back.

 Regards
 Boris

 
 Subject: Re: Build failed in Jenkins: OpenNLP #476
 From: kottm...@gmail.com
 To: dev@opennlp.apache.org
 Date: Mon, 27 Oct 2014 22:50:17 +0100

 On Mon, 2014-10-27 at 19:15 +, Rodrigo Agerri wrote:
 Hi,

 This is not caused by my latest commit, is it?
 Your last commit just triggered the build.
 The build itself was successful. It failed afterwards when it tried to
 deploy the artifacts to the snapshot repo with: 503 Service Temporarily
 Unavailable

 It will probably work if we trigger it again.

 Jörn



  

What should we do with the SF models?

2014-10-28 Thread Joern Kottmann
Hi all,

OpenNLP always came with a couple of trained models which were ready to
use for a few languages. The performance a user encounters with those
models heavily depends on their input text.

Especially the English name finder models which were trained on MUC 6/7
data perform very poorly these days if run on current news articles and
even worse on data which is not in the news domain.

Anyway, we often get judged on how well OpenNLP works just based on the
performance of those models (or maybe people who compare their NLP
systems against OpenNLP just love to have OpenNLP perform badly).

I think we are now at a point with those models where it is questionable
if having them is still an advantage for OpenNLP. The SourceForge page
is often blocked due to traffic limitations. We definitely have to act
somehow.

The old models definitely have some historic value and are used for
testing the release.

What should we do?

We could take them offline and advise our users to train their own
models on one of the various corpora we support. We could also do both
and place a prominent link to our corpora documentation on the download
page and, in a less visible place, a link to the historic SF models.

Jörn



Re: What should we do with the SF models?

2014-10-28 Thread Gustavo Knuppe
I believe that models are important for users, since not every user has
access to appropriate data files to train basic models.

My suggestion is to use an alternative service to host these models,
like GitHub, a torrent or another file-sharing service...

GitHub is a good option since they don't have any quota or bandwidth
limitation.

Gustavo K.

2014-10-28 15:19 GMT-02:00 Joern Kottmann kottm...@gmail.com:

 Hi all,

 OpenNLP always came with a couple of trained models which were ready to
 use for a few languages. The performance a user encounters with those
 models heavily depends on their input text.

 Especially the English name finder models which were trained on MUC 6/7
 data perform very poorly these days if run on current news articles and
 even worse on data which is not in the news domain.

 Anyway, we often get judged on how well OpenNLP works just based on the
 performance of those models (or maybe people who compare their NLP
 systems against OpenNLP just love to have OpenNLP perform badly).

 I think we are now at a point with those models where it is questionable
 if having them is still an advantage for OpenNLP. The SourceForge page
 is often blocked due to traffic limitations. We definitely have to act
 somehow.

 The old models definitely have some historic value and are used for
 testing the release.

 What should we do?

 We could take them offline and advise our users to train their own
 models on one of the various corpora we support. We could also do both
 and place a prominent link to our corpora documentation on the download
 page and, in a less visible place, a link to the historic SF models.

 Jörn