Re: OpenNLP 1.5.3 RC 2 ready for testing

2013-04-03 Thread Jörn Kottmann

On 04/03/2013 02:10 AM, William Colen wrote:

Thank you, Jörn.

I also had to update the maven-changes-plugin version. Version 2.3 was failing
to download the issue list. Changing to the latest version solved the problem.



The date in the NOTICE file still says 2011; that needs to be changed to
2013.


Jörn


Re: OpenNLP 1.5.3 RC 2 ready for testing

2013-04-03 Thread William Colen
Thank you, I fixed it. I will start the build of RC3 right now.






Re: OpenNLP 1.5.3 RC 2 ready for testing

2013-04-03 Thread Jörn Kottmann
Before you build we should either commit OPENNLP-564 or remove it from 
the issue list.

Should I quickly commit the rules file?

Jörn






Re: OpenNLP 1.5.3 RC 2 ready for testing

2013-04-03 Thread William Colen
In fact you already fixed the year. Thank you.

Yes, I can start the build after you commit the rules file.







Re: OpenNLP 1.5.3 RC 2 ready for testing

2013-04-02 Thread Jörn Kottmann

The test plan shows that the issue list was not generated.
The problem is that the version ID no longer matches;
the new version ID for 1.5.3 in opennlp-distr/pom.xml should
be 12319040.

See this link:
https://issues.apache.org/jira/browse/OPENNLP/fixforversion/12319040

I already updated the pom and committed the change.

Jörn

On 03/08/2013 03:11 PM, William Colen wrote:

Hi all,

Our second release candidate is ready for testing. RC1 failed to pass the
initial quality check.

The RC 2 can be downloaded from here:
http://people.apache.org/~colen/releases/opennlp-1.5.3/rc2/

To use it in a Maven build, set the version for opennlp-tools or
opennlp-uima to 1.5.3, and for opennlp-maxent to 3.0.3, and add this URL to
your settings.xml file:
https://repository.apache.org/content/repositories/orgapacheopennlp-005/

The current test plan can be found here:
https://cwiki.apache.org/OPENNLP/testplan153.html

Please sign up for tasks in the test plan.

The release plan can be found here:
https://cwiki.apache.org/OPENNLP/releaseplanandtasks153.html

The RC contains quite a few changes; please refer to the included issue
list for details.

William





Re: Liblinear (was: OpenNLP 1.5.3 RC 2 ready for testing)

2013-03-22 Thread Jason Baldridge
I used the Java port. I actually pulled it into nak as nak.liblinear
because the model read/write code used text files and I needed access
to the Model member fields in order to do the serialization the way I wanted.
Otherwise it remains as is. With a little bit of adaptation, you could
provide a Java wrapper in OpenNLP that follows the same pattern as my Scala
stuff. You'd just need to make it implement AbstractModel, which shouldn't
be too hard. (I have it implement LinearModel, which is just a slight
modification of MaxentModel, and I changed all uses of AbstractModel to
LinearModel in Chalk [the opennlp.tools portion]). -j

On Fri, Mar 22, 2013 at 9:32 AM, Jörn Kottmann kottm...@gmail.com wrote:

 Sounds interesting, I hope we will find the time to do that in OpenNLP
 after the 1.5.3 release too. We already discussed this and, I think, had
 consensus on making the machine learning pluggable and then offering a few
 add-ons for existing libraries.

 Good to know that liblinear works well. As far as I know it's written in
 C/C++; did you use the Java port of it, or did you write a JNI interface?

 Jörn

 On 03/22/2013 03:08 PM, Jason Baldridge wrote:

 BTW, I've just recently finished integrating Liblinear into Nak (which is
 an adaptation of the maxent portion of OpenNLP). I'm still rounding some
 things out, but so far it is producing more accurate models that are
 trained in less time and without using cutoffs. Here's the code:
 https://github.com/scalanlp/nak

 It is still mostly Java, but the liblinear adaptors are in Scala. I've kept
 things such that liblinear retrofits to the interfaces that were in
 opennlp.maxent, though given how well it is working, I'll be stripping
 those out and going with liblinear for everything in upcoming versions.

 Happy to answer any questions or help out with any of the above if it might
 be useful!





-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge


Re: OpenNLP 1.5.3 RC 2 ready for testing

2013-03-20 Thread James Kosin

I've finished the testing with the Name Finders.

The results improved slightly. In the English models, the tagger had
mistakenly given adjacent names the same type when in fact the model
had categorized the names correctly. Three sentences that had an
incorrect adjacent tag in the 1.5.2 release are now fixed, which
improved the score slightly for the 1.5.3 release. I'm not sure how
many sentences were affected in the other models.
This only affected the name finder that was trained to categorize all
the names together in one model. Models trained to find only one type
of name were not affected by this change, because any adjacently tagged
item would be the same type anyway.


James


Re: OpenNLP 1.5.3 RC 2 ready for testing

2013-03-18 Thread James Kosin

Jörn,

Could you run the German data to get the combined values?

I have the values for the combined model trained with 1000 iterations.
=
Testing All Name Finder [de.testa]

Precision: 0.6825576995838063
Recall: 0.37326712187047384
F-Measure: 0.4826110219368647
-
Testing All Name Finder [de.testb]

Precision: 0.6774332472006891
Recall: 0.4282602777021508
F-Measure: 0.5247706422018349
-
=

But the runs you did with 1.5.2 were with 100 iterations for the training.
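
For anyone comparing runs by hand: the F-Measure values above are consistent with the usual harmonic mean of precision and recall. Below is a minimal Java sketch, assuming that formula. The class and method names are made up for illustration and are not the OpenNLP evaluator API.

```java
// Illustrative sketch only: FMeasureSketch and fMeasure are hypothetical
// names, not part of OpenNLP.
final class FMeasureSketch {

    /**
     * Harmonic mean of precision and recall: F = 2PR / (P + R).
     * Returns 0 when both inputs are 0 to avoid dividing by zero.
     */
    static double fMeasure(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }

    public static void main(String[] args) {
        // Precision and recall as reported for de.testa and de.testb.
        System.out.println("de.testa F: "
                + fMeasure(0.6825576995838063, 0.37326712187047384));
        System.out.println("de.testb F: "
                + fMeasure(0.6774332472006891, 0.4282602777021508));
    }
}
```

Both printed values should agree with the reported F-Measure lines up to floating-point rounding.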

I'll try to get the CoNLL 2002 data by Wednesday for the name finder
testing results.


Thanks,
James


Re: OpenNLP 1.5.3 RC 2 ready for testing

2013-03-14 Thread James Kosin

Hi William,

No, I think it will be fine. The problem only arises in data where
back-to-back names are tagged in a sentence. The unfixed prior
versions would tag them with the wrong type, i.e. both could get the
same type, such as person, instead of different types, say one
person and the other maybe miscellaneous.
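
To make the back-to-back case concrete, here is a small Java sketch of decoding per-token outcomes into typed spans so that each name keeps the type of its own start outcome. It is only an illustration under stated assumptions: the outcome strings ("person-start", "person-cont", "other") and the class and method names are chosen for the example, and this is not the code changed in OPENNLP-417.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: names and outcome format are assumptions,
// not the OpenNLP implementation.
final class AdjacentNameSketch {

    /** A typed token span; end is exclusive. */
    static final class NameSpan {
        final int start;
        final int end;
        final String type;

        NameSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + ") " + type;
        }
    }

    /**
     * Turns one outcome per token into spans. A "-start" outcome always
     * closes any open span and opens a new one with its own type, so two
     * adjacent names never share a type by accident. A "-cont" outcome
     * extends the open span (its type prefix is not checked here).
     */
    static List<NameSpan> decode(String[] outcomes) {
        List<NameSpan> spans = new ArrayList<>();
        int start = -1;
        String type = null;
        for (int i = 0; i < outcomes.length; i++) {
            String outcome = outcomes[i];
            boolean continues = start >= 0 && outcome.endsWith("-cont");
            if (start >= 0 && !continues) {
                spans.add(new NameSpan(start, i, type));
                start = -1;
            }
            if (outcome.endsWith("-start")) {
                start = i;
                type = outcome.substring(0, outcome.length() - "-start".length());
            }
        }
        if (start >= 0) {
            spans.add(new NameSpan(start, outcomes.length, type));
        }
        return spans;
    }

    public static void main(String[] args) {
        // A two-token person name directly followed by a misc name.
        String[] outcomes = {"person-start", "person-cont", "misc-start", "other"};
        System.out.println(decode(outcomes));
    }
}
```

With a single-type model every span would carry the same type anyway, which is why only the combined models can show a difference.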


The combined Name Finder models, which contain all the tags, were
affected most, since the likelihood of back-to-back tags is higher there.
In the English models, 3 sentences that had improper tags before
now have the correct tags with the fixes. This improved the
scores a bit.


It should produce identical models since the problem was with the output 
tagging and not with the training of the models.


James

On 3/14/2013 11:00 PM, William Colen wrote:

Hi, James,

Thank you for the warning. It didn't affect the test with the Leipzig
corpus: the outputs from 1.5.2 and 1.5.3 are identical. Do you think we
should check the output manually?

Thank you,
William








Re: OpenNLP 1.5.3 RC 2 ready for testing

2013-03-13 Thread James Kosin

Hi all,

Note that we will have some discrepancies in the model performance for
some of the tests in the NameFinder models due to OPENNLP-417, which fixes
the back-to-back name tags.

It should really be limited to the combined name tags, but it could also
affect others.


James

