Re: Rule based sentence detector

2021-01-21 Thread William Colen
Hi Alan, Do you have a PR for the implementation? Thank you, William On Tue, Jan 19, 2021 at 23:52, Alan Wang wrote: > Hi all, > > I created a rule based sentence detector for OpenNLP > . > There are two kinds of rules: > > 1. break

Re: OpenNLP UD models

2021-01-18 Thread William Colen
Hello Jeff! Nice work!! Did you store the evaluation results somewhere? Does UD have Named Entity annotation? Do you have any reference to share? Why did you select only these languages? Any restrictions? Thank you William On Sun, Jan 17, 2021 at 21:15, Jeff Zemerick wrote: >

Re: [VOTE] Apache OpenNLP 1.9.2 Release Candidate

2019-12-23 Thread William Colen
+1 Binding Tried it with DKPro and other projects. William Colen On Mon, Dec 23, 2019 at 12:04, Suneel Marthi wrote: > +1 binding > > > On Mon, Dec 23, 2019 at 9:28 AM Tommaso Teofili > > wrote: > > > +1 (binding) > > > > tag build succeeds

Re: [VOTE] Apache OpenNLP 1.9.0 Release Candidate 2

2018-07-02 Thread William Colen
+1 William Colen On Mon, Jul 2, 2018 at 08:07, Jeff Zemerick wrote: > +1 > > Jeff > > > On Mon, Jul 2, 2018 at 5:22 AM Tommaso Teofili > wrote: > > > +1 > > On Mon, Jul 2, 2018 at 10:34 Rodrigo Agerri > > > > > wrote:

Re: [VOTE] Apache OpenNLP 1.8.4 Release Candidate

2017-12-22 Thread William Colen
+1 binding Executed eval-tests suite. -- William 2017-12-22 6:05 GMT-02:00 Rodrigo Agerri : > +1 binding > > R > > On Fri, Dec 22, 2017 at 2:02 AM, Koji Sekiguchi > wrote: > > +1 > > > > I checked files included in the -src package, built

Re: [VOTE] Language Detector model for Apache OpenNLP 1.8.3 Release Candidate 3

2017-11-02 Thread William Colen
ekiguchi >> > > <koji.sekigu...@rondhuit.com> wrote: >> > >> +1 >> > >> >> > >> - checked text files in the zipped model file >> > >> - verified signatures >> > >> - executed LanguageDetector using the

Re: [VOTE] Language Detector model for Apache OpenNLP 1.8.3 Release Candidate 3

2017-11-01 Thread William Colen
> <koji.sekigu...@rondhuit.com> wrote: > > >> +1 > > >> > > >> - checked text files in the zipped model file > > >> - verified signatures > > >> - executed LanguageDetector using the model file > > >> > >

[VOTE] Language Detector model for Apache OpenNLP 1.8.3 Release Candidate 3

2017-10-30 Thread William Colen
The Apache OpenNLP PMC would like to call for a Vote on the Language Detector model for Apache OpenNLP 1.8.3 Release Candidate 3. The Release artifacts can be downloaded from: http://people.apache.org/~colen/models/langdetect-183/rc3/ The model was built with Apache OpenNLP 1.8.3 release,

Re: [VOTE] Language Detector model for Apache OpenNLP 1.8.3 Release Candidate 2

2017-10-30 Thread William Colen
.2, here's my +1 (verifying > signatures, running LanguageDetector under OpenNLP 1.8.3, etc.) > > Thanks! > > Koji > > > > On 2017/10/29 11:48, William Colen wrote: > >> The Apache OpenNLP PMC would like to call for a Vote on the Language >> Detector model for A

[VOTE] Language Detector model for Apache OpenNLP 1.8.3 Release Candidate 2

2017-10-28 Thread William Colen
The Apache OpenNLP PMC would like to call for a Vote on the Language Detector model for Apache OpenNLP 1.8.3 Release Candidate 2. The Release artifacts can be downloaded from: http://people.apache.org/~colen/models/langdetect-183/rc2/ The model was built with Apache OpenNLP 1.8.3 release,

Re: [VOTE] Language Detector model for Apache OpenNLP 1.8.3 Release Candidate

2017-10-28 Thread William Colen
The RC1 vote is cancelled due to an issue in the model: it included an unnecessary 3 MB report. The report will be removed from the model and made available as an optional download. 2017-10-27 22:13 GMT-02:00 William Colen <co...@apache.org>: > The Apache OpenNLP PMC would like to call f

[VOTE] Language Detector model for Apache OpenNLP 1.8.3 Release Candidate

2017-10-27 Thread William Colen
The Apache OpenNLP PMC would like to call for a Vote on the Language Detector model for Apache OpenNLP 1.8.3 Release Candidate. The Release artifacts can be downloaded from: http://people.apache.org/~colen/models/langdetect-183/rc1/ The model was built with Apache OpenNLP 1.8.3 release, trained

Re: [VOTE] Apache OpenNLP 1.8.3 Release Candidate

2017-10-25 Thread William Colen
+1 binding - eval tests ok - unit test ok - build from tag ok - distribution execution ok - distribution ok 2017-10-25 14:46 GMT-02:00 Tommaso Teofili : > +1 (binding) > > - source build from tag ok > - sigs and checks ok > > On Wed, Oct 25, 2017 at 18:09

Re: [VOTE] Apache OpenNLP 1.8.2 Release Candidate 2

2017-09-13 Thread William Colen
Evaluation tests OK LD with Leipzig OK +1 (binding) 2017-09-12 17:52 GMT-03:00 Richard Eckart de Castilho : > On 11.09.2017, at 09:12, Joern Kottmann wrote: > > > > I have posted a second release candidate for the Apache OpenNLP 1.8.2 > > release and it is

Re: Sentence Detector

2017-08-25 Thread William Colen
The writer made a mistake by not adding a space after the dot. The sentence detector model will not know how to deal with it, because sentence-splitting dots are rarely written without a following space. This is very common on social networks. I apply some regex to check if it is not a URL, email or
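The message above is cut off before it shows the actual regex, so the following is only a minimal sketch of the idea it describes: re-insert the missing space after a sentence-final dot, but leave tokens that look like URLs or email addresses alone. The class name, both patterns, and the lowercase-dot-uppercase heuristic are assumptions made for illustration; they are not taken from the thread or from OpenNLP.

```java
import java.util.regex.Pattern;

public class MissingSpaceSketch {

    // Assumed heuristic: a token is left untouched if it looks like a URL or an email.
    private static final Pattern URL_OR_EMAIL = Pattern.compile(
            "https?://\\S+|www\\.\\S+|\\S+@\\S+\\.\\S+", Pattern.CASE_INSENSITIVE);

    // Assumed heuristic: a dot between a lowercase and an uppercase letter
    // is treated as a sentence boundary with a missing space.
    private static final Pattern MISSING_SPACE = Pattern.compile("(?<=\\p{Ll})\\.(?=\\p{Lu})");

    // Repairs single-space-separated text before it is passed to a sentence detector.
    public static String repair(String text) {
        StringBuilder out = new StringBuilder();
        for (String token : text.split(" ")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            if (URL_OR_EMAIL.matcher(token).matches()) {
                out.append(token); // keep URLs and emails as they are
            } else {
                out.append(MISSING_SPACE.matcher(token).replaceAll(". "));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(repair("It works.Thanks a lot"));
    }
}
```

Running `main` prints `It works. Thanks a lot`. The repaired text can then be handed to the sentence detector; the heuristic will still miss abbreviations and all-lowercase writing, so it is a pre-processing aid rather than a replacement for the model.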

Re: Releasing a Language Detection Model

2017-07-11 Thread William Colen
e ability to quickly test OpenNLP before they > >> integrate it into their software and to train and evaluate models > >> > >> Users who for some reason have a jar file with a model inside can > just > >> write "unzip model.jar". > >

Re: Releasing a Language Detection Model

2017-07-10 Thread William Colen
We need to address things such as sharing the evaluation results and how to reproduce the training. There are several possibilities for that, but there are points to consider: Will we store the model itself in a SCM repository or only the code that can build it? Will we deploy the models to a

Re: [VOTE] Apache OpenNLP 1.8.1 Release Candidate 3

2017-07-07 Thread William Colen
+1 - Tested with multiple other projects. Tested language detector. 2017-07-07 10:52 GMT-03:00 Joern Kottmann : > +1 i did run the eval the tests and they passed > > Jörn > > On Fri, Jul 7, 2017 at 1:06 PM, Bruno P. Kinoshita > wrote: > >

Re: [GitHub] opennlp pull request #143: OPENNLP-788: Add initial langdetect implementatio...

2017-05-18 Thread William Colen
+1 to think how to do it. Polyglot is doing it already. 2017-05-18 22:28 GMT-03:00 : > Can the language detector find when the language changes? I have data in > French and English, I would love to be able to pull separate the two. > Daniel > > > > On May 17, 2017, at 12:38

Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 3

2017-05-18 Thread William Colen
+1 (binding) Successfully executed complete evaluation tests in source deliverable. Tried it with DKPro and after updating the Lemmatizer and Chunker usage there were two test failures that we could trace back to issues fixed in OPENNLP-125 and OPENNLP-989 that would affect evaluation results.

Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-17 Thread William Colen
It would be a pleasure. Let's prepare the next OpenNLP RC and I will create a PR with the update. 2017-05-16 14:36 GMT-03:00 Richard Eckart de Castilho <r...@apache.org>: > Hi William, > > > On 16.05.2017, at 14:35, William Colen <william.co...@gmail.com> wrote: > > >

Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-16 Thread William Colen
Hi Richard, I cloned the DKPro code and tried Rodrigo's proposed changes. Your test passes with it. Thank you William 2017-05-15 18:51 GMT-03:00 Rodrigo Agerri : > Hello Richard, > > I have tried with various corpora, including GUM, but I cannot reproduce > that error. > >

Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-13 Thread William Colen
Given the issues reported by Richard, we should cancel the vote and roll back the release. I change my vote to -1 (binding) 2017-05-13 19:08 GMT-03:00 Richard Eckart de Castilho : > > > On 13.05.2017, at 22:35, Richard Eckart de Castilho > wrote: > > > > Should

Re: [VOTE] Apache OpenNLP 1.8.0 Release Candidate 2

2017-05-12 Thread William Colen
+1 binding Executed the complete evaluation suite, both in source distribution and the git tag. Integrated and tested with other tools. 2017-05-12 9:48 GMT-03:00 Joern Kottmann : > The vote is still open and we won't close it before the entire active PMC > voted or the time

Re: closeQuietly() for stream try/catch

2017-04-16 Thread William Colen
We try to avoid external dependencies, including Apache Commons. Please check whether it is possible to use a try-with-resources statement. Thank you, William 2017-04-16 8:46 GMT-03:00 Jeff Zemerick : > In cases of code like this when closing a stream: > > finally { >
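The quoted code in this message is truncated, so the sketch below does not reproduce it. It only illustrates the suggestion in the reply: a try-with-resources statement closes the stream automatically, which removes the need for a `finally` block or a `closeQuietly()` helper from an external library. The class and method names are illustrative, not OpenNLP code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class TryWithResourcesSketch {

    // Reads the first line of the source. The reader declared in the try header
    // is closed automatically when the block exits, normally or via an exception.
    public static String firstLine(Reader source) throws IOException {
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstLine(new StringReader("hello\nworld")));
    }
}
```

This relies only on the JDK (Java 7 or later), so it is consistent with the stated goal of avoiding external dependencies.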

Re: Update web site layout

2017-03-03 Thread William Colen
Hi, Bruno, What do you think of using Jekyll + GitHub instead of the Maven site? That way we don't need to deploy the site and the documentation separately. Thank you William 2017-03-03 10:03 GMT-03:00 Bruno P. Kinoshita < brunodepau...@yahoo.com.br.invalid>: > Hi all, > > Didn't find

Re: Hardcoded length in prefix and suffix feature generators

2017-02-09 Thread William Colen
Looks good! Thanks for the unit tests. Please open a Jira, squash your commits and open the PR. 2017-02-09 19:55 GMT-02:00 Jeffrey Zemerick : > Hi, > > I noticed that the length is hardcoded to 4 in the PrefixFeatureGenerator > and the SuffixFeatureGenerator. I made this

Re: [VOTE] Apache OpenNLP 1.7.2 Release Candidate

2017-02-03 Thread William Colen
+1 binding I ran the eval tests and they all passed, including the one that needs more memory. William 2017-02-03 13:35 GMT-02:00 Suneel Marthi : > +1 binding > > Verified {src, bin} * {zip, tar} and all tests pass. > > On Fri, Feb 3, 2017 at 10:08 AM, Russ, Daniel

Re: OpenNLP - Model version 1.6.0 not supported by this (1.5.3) version of OpenNLP

2017-01-13 Thread William Colen
Are you using Maven? 2017-01-13 5:32 GMT-02:00 David Samuel Lim : > Oops, I meant *opennlp-tools-1.5.3.jar*. My bad. > > On Fri, Jan 13, 2017 at 3:29 PM, David Samuel Lim > wrote: > > > Hi Richard, > > > > Thanks for the reply. I've checked the

Re: Commit message style

2017-01-09 Thread William Colen
+1 for the OPENNLP-xxx: commit message. Fast to find a commit. 2017-01-09 21:24 GMT-02:00 Joern Kottmann : > On Mon, 2017-01-09 at 17:02 -0500, Jeffrey Zemerick wrote: > > I'm personally a fan of the issue number being the first thing on the > > subject line, like

Re: [DISCUSS] 1.7.0 release process issues (was Re: OpenNLP 1.7.0 RC 2 is ready for testing)

2017-01-04 Thread William Colen
https://issues.apache.org/jira/browse/OPENNLP-916 Created a Jira task for it. OPENNLP-916: Create a Release Process page Thank you 2017-01-04 15:23 GMT-02:00 Chris Mattmann : > I just wanted to put this on the thread. > > The process to do a release: > > 1. [VOTE] thread

Re: OpenNLP 1.7.0 RC 2 is ready for testing

2017-01-01 Thread William Colen
Thank you to the voters. We have 3 binding votes and 1 non-binding. The vote is now closed. 2017-01-01 11:20 GMT-02:00 Tommaso Teofili : > +1 > > Source build ok > Sigs ok > License & co ok > On Sun, Jan 1, 2017 at 03:02 Richard Eckart de Castilho < >

Re: OpenNLP 1.7.0 RC 2 is ready for testing

2016-12-31 Thread William Colen
> > wrote: > > > +1, looks good > > > > Jörn > > > > On Dec 31, 2016 8:54 PM, "William Colen" <co...@apache.org> wrote: > > > > > Hi all, > > > > > > Apache OpenNLP 1.7.0 RC 2 is ready for testing. The

Re: OpenNLP 1.7.0 RC 1 is ready for testing

2016-12-31 Thread William Colen
Richard, We fixed it for 1.7.0 RC 2. Thank you William 2016-12-31 17:17 GMT-02:00 Richard Eckart de Castilho <r...@apache.org>: > On 31.12.2016, at 19:12, William Colen <co...@apache.org> wrote: > > > > Can you please open a Jira? I am already looking how to fix

OpenNLP 1.7.0 RC 2 is ready for testing

2016-12-31 Thread William Colen
Hi all, Apache OpenNLP 1.7.0 RC 2 is ready for testing. RC 1 failed due to missing files and because it could not run 1.6.0 models. There are no new features since RC 1. RC 2 can be downloaded from here: http://people.apache.org/~colen/releases/opennlp-1.7.0/rc2/ To use it in a Maven build set

Re: OpenNLP 1.7.0 RC 1 is ready for testing

2016-12-31 Thread William Colen
ehu.es/ixa-pipes/models/chunk-models-1.1.0.tar.gz > file: en-perceptron-conll00.bin > > I believe that 1.6.0 models should still work for 1.7.0, right? > > Cheers, > > - Richard > > > On 31.12.2016, at 03:33, William Colen <co...@apache.org> wrote: > > >

Re: OpenNLP 1.7.0 RC 1 is ready for testing

2016-12-31 Thread William Colen
the release. > > All the manual tests are now automatic, so we don't need this long > test plan anymore. > > Jörn > > On Sat, 2016-12-31 at 00:24 -0200, William Colen wrote: > > Hi all, > > > > Apache OpenNLP 1.7.0 RC 1 is ready for testing. > > > > The

Re: OpenNLP 1.7.0 RC 1 is ready for testing

2016-12-30 Thread William Colen
Important note: the release artifacts were signed by KEY - 524A9649 2016-12-31 0:24 GMT-02:00 William Colen <co...@apache.org>: > Hi all, > > Apache OpenNLP 1.7.0 RC 1 is ready for testing. > > The RC 1 can be downloaded from here: > http://people.apache.org/~colen/re

OpenNLP 1.7.0 RC 1 is ready for testing

2016-12-30 Thread William Colen
Hi all, Apache OpenNLP 1.7.0 RC 1 is ready for testing. The RC 1 can be downloaded from here: http://people.apache.org/~colen/releases/opennlp-1.7.0/rc1/ To use it in a maven build set the version for opennlp-tools or opennlp-uima to 1.7.0 and add the following URL to your settings.xml file:

Re: Update to Java 8

2016-12-19 Thread William Colen
+1 2016-12-19 21:22 GMT-02:00 Joern Kottmann : > +1 from me as well > > Jörn > > On Tue, Dec 20, 2016 at 12:02 AM, Tommaso Teofili < > tommaso.teof...@gmail.com > > wrote: > > > +1 > > > > Tommaso > > > > On Mon, Dec 19, 2016 at 22:27 ARUN Thundyill Saseendran < >

Re: Chunker - proposal to change API (break compatibility)

2016-11-10 Thread William Colen
ct which has the token and tag, e.g. > TokenWithPos. > On such a sequence we should be able to use most of the existing interfaces > without too much change, right? > > Jörn > > On Thu, Nov 10, 2016 at 10:33 AM, William Colen <william.co...@gmail.com> > wrote: >

Re: Next release

2016-11-10 Thread William Colen
Joern Kottmann <kottm...@gmail.com>: > Ok, I created a couple of issues and will go through them rather quickly. > > Jörn > > On Thu, Nov 10, 2016 at 3:36 AM, William Colen <william.co...@gmail.com> > wrote: > > > Jörn, I can help removing deprecated code. I

Re: Next release

2016-11-09 Thread William Colen
Jörn, I can help removing deprecated code. I started with PlainTextByLineStream. It is used everywhere so there is a lot to change. 2016-11-08 9:08 GMT-02:00 Joern Kottmann : > I suggest we remove more deprecated code, there is still a lot which could > be removed and is

Re: Moving brat annotator to opennlp.git

2016-10-19 Thread William Colen
+1 Do you think we can later expand the annotator server to other tools? 2016-10-19 7:05 GMT-02:00 Madhawa Kasun Gunasekara : > +1 > > Madhawa > > On Wed, Oct 19, 2016 at 2:20 PM, "Shuo Xu" wrote: > > > +1 > > > > > > On Wed, Oct 19, 2016 at 12:46 AM,

Access to Git

2016-09-09 Thread William Colen
Hello, Is the Git repository ready for use? Do we need to wait for it to develop new stuff? Thank you, William

Re: Generators

2016-08-17 Thread William Colen
Features do not guarantee that the token will be marked as a NE. It is like telling the model that, according to the dictionary, the token can be a NE, but of course it will be evaluated together with the other features. Remember it is machine learning. You can skip the machine learning by using a DictionaryNameFinder.

Re: Why are you using complete sentences to train a model?

2016-08-12 Thread William Colen
o i need to write all the other cases when those must be ignored. > > 2016-08-12 16:26 GMT+02:00 William Colen <william.co...@gmail.com>: > > > You also need examples of what is not entities. > > > > > > 2016-08-12 11:21 GMT-03:00 Damiano Porta <damia

Re: Why are you using complete sentences to train a model?

2016-08-12 Thread William Colen
You also need examples of what is not an entity. 2016-08-12 11:21 GMT-03:00 Damiano Porta : > Hello everyone, > pardon for the stupid question but i really do not get the point about > training a maxent model with complete sentences. > > For example: > > Pierre Vinken ,

Re: Morfologik Addon

2016-07-15 Thread William Colen
any light on this? > > Thanks, > > R > > On Fri, Jul 15, 2016 at 12:02 AM, William Colen <william.co...@gmail.com> > wrote: > > Hello, > > > > A while back we started working on a Morfologik Addon. > > > > http://svn.apache.org/viewvc/ope

Morfologik Addon

2016-07-14 Thread William Colen
Hello, A while back we started working on a Morfologik Addon. http://svn.apache.org/viewvc/opennlp/addons/ I checked it out last week and noticed it was outdated, especially because it was not using the latest Morfologik version. Also it was missing documentation. You can find more about

Re: Reg. MaxENT and GIS.

2016-07-06 Thread William Colen
> 2016-07-05 13:19 GMT+02:00 William Colen <william.co...@gmail.com>: > > > It is not that easy. You could start from "Papers implemented by > OpenNLP": > > > > https://cwiki.apache.org/confluence/display/OPENNLP/NLP+Papers > > > >

Re: Migrate to Git?

2016-07-04 Thread William Colen
+1 2016-07-04 11:59 GMT-03:00 Tommaso Teofili : > +1 > > On Mon, Jul 4, 2016 at 16:41 Madhawa Kasun Gunasekara < > madhaw...@gmail.com> wrote: > > > +1 > > > > Madhawa > > > > On Mon, Jul 4, 2016 at 8:09 PM, Anthony Beylerian < > >

Re: Model to detect the gender

2016-06-29 Thread William Colen
erent things, or, name > finder is one thing and those feature generators other? > > Thank you in advance for the clarification. > > 2016-06-29 1:23 GMT+02:00 William Colen <william.co...@gmail.com>: > > > Not exactly. You would create a new NER model to replace yours. >

Re: Model to detect the gender

2016-06-28 Thread William Colen
to create the model, we could only use it to detect names > without using NER. No? > > > > 2016-06-29 0:10 GMT+02:00 William Colen <william.co...@gmail.com>: > > > Do you plan to use the surrounding context? If yes, maybe you could try > to > > split NER in

Re: Model to detect the gender

2016-06-28 Thread William Colen
Do you plan to use the surrounding context? If yes, maybe you could try to split NER in two categories: PersonM and PersonF. Just an idea, never read or tried anything like it. You would need a training corpus with these classes. You could add both the plain dictionary and the regex as NER

Re: DeepLearning4J as a ML for OpenNLP

2016-06-28 Thread William Colen
Thank you for pointing that out, Prof. Chris. Can you please point me to the exact project at http://scispark.jpl.nasa.gov/ I should look at? It is huge. Thank you again. William William Colen 2016-06-28 18:26 GMT-03:00 Mattmann, Chris A (3980) < chris.a.mattm...@jpl.nasa.gov>: > Yep I think so

Re: DeepLearning4J as a ML for OpenNLP

2016-06-28 Thread William Colen
GMT-03:00 Suneel Marthi <suneel_mar...@yahoo.com.invalid>: > Are u looking at using ND4J (from Deeplearning4j project) as the Math > backend for ML work? If so, yes. > > > From: William Colen <william.co...@gmail.com> > To: "dev@opennlp.apache.org" <

DeepLearning4J as a ML for OpenNLP

2016-06-28 Thread William Colen
Hi, Do you think it would be possible to implement a ML based on DL4J? http://deeplearning4j.org/ Thank you William

Re: Sentiment Analysis Parser updates

2016-06-28 Thread William Colen
Hi, I tried your code. Very good work so far! Congratulations. Is the examples/result file corrupted? It has only one line. Do you plan to implement a simple CLI to use it interactively from command line, similar to bin/opennlp Doccat bin/opennlp TokenNameFinder ? Also, do you plan to add

Re: Usages of Adaptive features.

2016-06-28 Thread William Colen
to be used, usually a file name. -encoding charsetName encoding for reading and writing text, if absent the system default is used. William Colen 2016-06-28 11:04 GMT-03:00 William Colen <william.co...@gmail.com>: > > https://opennlp.apache.org/documentation/1.6.0/manual/

Re: [VOTE] Release OpenNLP 1.6.0 RC 6

2015-07-09 Thread William Colen
...@apache.org: +1 for the release. It all good looks to me and I think it will be nice to have the 1.6.0 out. Rodrigo On Wed, Jul 1, 2015 at 2:37 PM, William Colen co...@apache.org wrote: +1 for the release I repeated a few tests and used the distributables in another project. 2015-06-30 9

Re: [VOTE] Release OpenNLP 1.6.0 RC 6

2015-07-01 Thread William Colen
+1 for the release I repeated a few tests and used the distributables in another project. 2015-06-30 9:20 GMT-03:00 Joern Kottmann kottm...@gmail.com: +1 in addition to the other tests I verified all the hashes and signatures. They are all good. Jörn On Jun 16, 2015 4:51 PM, William Colen

Re: OpenNLP: Named Entity Recognition ( Token Name Finder )

2015-06-17 Thread William Colen
I can't remember if the iterations parameter is used in PERCEPTRON. From my experience with other tools, you should use Cutoff 0. Perceptron takes advantage of every feature you add. Did you try the evaluation tools to compute F1? 2015-06-17 13:25 GMT-03:00 nikhil jain

[VOTE] Release OpenNLP 1.6.0 RC 6

2015-06-16 Thread William Colen
Hello, Let's vote to release RC 6 as OpenNLP 1.6.0. The testing of it is documented here: https://cwiki.apache.org/confluence/display/OPENNLP/TestPlan1.6.0 The RC can be downloaded here: http://people.apache.org/~colen/releases/opennlp-1.6.0/rc6 The release notes can be found here:

Re: OpenNLP: Named Entity Recognition ( Token Name Finder )

2015-05-29 Thread William Colen
The answer about the differences would be quite long. You can learn about the theory researching online. Try some papers from here: https://cwiki.apache.org/confluence/display/OPENNLP/NLP+Papers Which algorithm is better for you depends on your task and your data. You can start developing using

Re: OpenNLP 1.6.0 RC 4 ready for testing

2015-05-28 Thread William Colen
that or do we have to fix some bugs? Jörn On May 23, 2015 5:35 AM, William Colen co...@apache.org wrote: Our fourth release candidate is ready for testing. RC 3 failed in the compatibility, regression and performance tests, which are fixed in RC 4. The RC 4 can be downloaded from here: http

OpenNLP 1.6.0 RC 3 ready for testing

2015-04-30 Thread William Colen
Our third release candidate is ready for testing. RC 2 failed in the compatibility and regression tests, which are fixed in RC 3. The RC 3 can be downloaded from here: http://people.apache.org/~colen/releases/opennlp-1.6.0/rc3/ To use it in a maven build set the version for opennlp-tools or

Re: Automated testing with public data

2015-04-29 Thread William Colen
+1 The script would also be great for documentation. 2015-04-29 11:15 GMT-03:00 Joern Kottmann kottm...@gmail.com: Or we just make a download script which bootstraps the users corpus folder. Could be a couple of wget lines or so ... Jörn On Wed, Apr 29, 2015 at 6:17 AM, William Colen

Re: Automated testing with public data

2015-04-28 Thread William Colen
Automating the download would be fine as long as we cache it, as Richard suggested. Maybe it could be done by a script to prepare the environment, and not be part of the unit test itself. Anyway, it would be a good idea to save the data somewhere because we never know if some of the websites will

Re: Parser performance bug

2015-02-20 Thread William Colen
I might be totally wrong, but I have a feeling that the change is in ChunkerModel.java, because I also notice a change in the Chunker tool results. It could be somehow related to the changes in the parameters in that file. We can't discard the possibility that there was a bug that was fixed with

OpenNLP 1.6.0 RC 2 ready for testing

2015-01-22 Thread William Colen
Hi all, Our second release candidate is ready for testing. RC 1 failed to pass the initial tests. The RC 2 can be downloaded from here: http://people.apache.org/~colen/releases/opennlp-1.6.0/rc2/ To use it in a maven build set the version for opennlp-tools or opennlp-uima to 1.6.0 and add the

Re: OpenNLP 1.6.0 RC 2 ready for testing

2015-01-22 Thread William Colen
of the library. -Original Message- From: William Colen [mailto:william.co...@gmail.com] Sent: Thursday, January 22, 2015 1:55 PM To: dev@opennlp.apache.org Subject: OpenNLP 1.6.0 RC 2 ready for testing Hi all, Our second release candidate is ready for testing. RC 1 failed to pass

OpenNLP 1.6.0 RC 1 ready for testing

2014-12-10 Thread William Colen
Hi all, Our first release candidate is ready for testing. The RC 1 can be downloaded from here: http://people.apache.org/~colen/releases/opennlp-1.6.0/rc1/ To use it in a maven build set the version for opennlp-tools or opennlp-uima to 1.6.0 and add the following URL to your settings.xml file:

Re: Next release (was: Re: 1.6.0 maven repo)

2014-11-21 Thread William Colen
+1 to start the release process. I volunteer as release manager for 1.6.0. 2014-11-20 18:32 GMT-02:00 Vinh Khuc knv...@gmail.com: +1 for the release of 1.6.0 RC Vinh On Thu, Nov 20, 2014 at 3:24 PM, Joern Kottmann kottm...@gmail.com wrote: Yes, all the important issues,

Re: How to sanitize and parse noisy text

2014-07-15 Thread William Colen
A while back I had a similar problem while extracting text from HTML using Tika. What I did was to hack the Tika HTML parser to extract the text as I needed. I can't remember exactly how it was, but as far as I remember Tika raises events when it finds a markup (at least a HTML markup), that is

Re: TokenNameFinder and Span probs

2014-05-11 Thread William Colen
? -- William Colen

Re: End of line whitespaces in Eclipse

2014-04-24 Thread William Colen
. Any opinions? Jörn On Thu, 2014-04-10 at 19:58 -0300, William Colen wrote: When I save a .java file in Eclipse, it is removing the end of line whitespaces. I am using the http://opennlp.apache.org/code-formatter/OpenNLP-Eclipse-Formatter.xml This is causing lots of changes in files

Re: DocumentSample in Doccat

2014-04-24 Thread William Colen
of the samples I had a Postgres and Accumulo impl for sample storage. just a thought, I know this can get very specific and complicated, thought we may be able to find a middle ground by providing a framework and some generic impls. MG On Thu, Apr 17, 2014 at 8:28 AM, William Colen william.co

Re: DocumentSample in Doccat

2014-04-24 Thread William Colen
should consider adding this method to abstract some details just a thought On Thu, Apr 24, 2014 at 3:56 PM, William Colen william.co...@gmail.com wrote: What do you think of adding the following field to the DocumentSample? Map<String, Object> extraInformation Also, we could add

Re: DocumentSample in Doccat

2014-04-17 Thread William Colen
in some cases and not constraining Sent from my iPhone On Apr 17, 2014, at 6:35 AM, Jörn Kottmann kottm...@gmail.com wrote: On 04/15/2014 07:45 PM, William Colen wrote: Hello, I've been working with the Doccat module and I am wondering if we could improve its data structure

Re: svn commit: r1587944 [1/2] - in /opennlp/trunk/opennlp-tools/src: main/java/opennlp/tools/cmdline/doccat/ main/java/opennlp/tools/doccat/ main/java/opennlp/tools/sentdetect/ main/java/opennlp/tool

2014-04-16 Thread William Colen
Jörn, Can you please review my change to the ExtensionLoader? I modified it to accept singletons (private constructor and the field INSTANCE). Thank you, William 2014-04-16 12:26 GMT-03:00 co...@apache.org: Author: colen Date: Wed Apr 16 15:26:24 2014 New Revision: 1587944 URL:
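The commit itself is not shown in this snippet, so the following is a hypothetical sketch of the behaviour the message describes, not the actual `ExtensionLoader` code: try the public no-argument constructor first, and if there is none, fall back to a public static field named `INSTANCE`. The class names `SingletonLoaderSketch` and `UpperCaser` are invented for the example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class SingletonLoaderSketch {

    // Illustrative singleton: private constructor plus a public static INSTANCE field.
    public static final class UpperCaser {
        public static final UpperCaser INSTANCE = new UpperCaser();

        private UpperCaser() {
        }

        public String apply(String s) {
            return s.toUpperCase();
        }
    }

    // Returns a new object when a public no-arg constructor exists,
    // otherwise the value of the public static INSTANCE field.
    public static <T> T instantiate(Class<T> type) {
        try {
            return type.getConstructor().newInstance();
        } catch (NoSuchMethodException noPublicConstructor) {
            // No public no-arg constructor: fall through to the INSTANCE lookup.
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not instantiate " + type.getName(), e);
        }
        try {
            Field field = type.getField("INSTANCE");
            if (!Modifier.isStatic(field.getModifiers())) {
                throw new IllegalStateException("INSTANCE is not static in " + type.getName());
            }
            return type.cast(field.get(null));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No usable INSTANCE field in " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        UpperCaser loaded = instantiate(UpperCaser.class);
        System.out.println(loaded.apply("singleton loaded"));
    }
}
```

One design point worth reviewing in any such loader is the lookup order: preferring the constructor keeps the behaviour unchanged for existing extensions, while the `INSTANCE` fallback only applies to classes that could not be loaded before.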

Re: Doccat evaluator

2014-04-11 Thread William Colen
the stderr to print the misclassified documents. If reportOutputFile is set, the evaluator will print to it some detailed reports, for example the f-measure for the different outcomes and the confusion matrix. 2014-04-10 19:48 GMT-03:00 William Colen william.co...@gmail.com: Yes, I just finished

Re: Doccat evaluator

2014-04-10 Thread William Colen
, William Colen wrote: Actually, since we always add a tag to each document, accuracy makes sense. We could implement F-1 for the individual categories. 2014-04-09 17:23 GMT-03:00 William Colen william.co...@gmail.com: Hello, I was checking if there is any open issue related to Doccat, and I

End of line whitespaces in Eclipse

2014-04-10 Thread William Colen
When I save a .java file in Eclipse, it removes the end-of-line whitespace. I am using the http://opennlp.apache.org/code-formatter/OpenNLP-Eclipse-Formatter.xml This is causing lots of changes in files where I actually needed to change only one line. Does anybody know how I can avoid it? Thank you,

Doccat evaluator

2014-04-09 Thread William Colen
Hello, I was checking if there is any open issue related to Doccat, and I found this one - OPENNLP-81: Add a cli tool for the doccat evaluation support I noticed that there is already a class named DocumentCategorizerEvaluator, which is not used anywhere internally. This is evaluating

Re: CoNLL02 format issue

2014-03-12 Thread William Colen
If it helps, there is another Spanish corpus on the CoNLL02 page which has 3 fields: Xavier Carreras provides the Spanish data sets with part of speech tags http://www.lsi.upc.es/~nlp/tools/nerc/nerc.html (20030803) William 2014-03-12 9:43 GMT-03:00 Roque Vera roqu...@gmail.com: I found an

Re: svn commit: r1534864 - /opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/entitylinker/GeoHashBinScorer.java

2013-10-23 Thread William Colen
to avoid one or two versions. Any opinions? Do we still have a Java 5 user here? Jörn -- William Colen

Re: Size of training data

2013-04-26 Thread William Colen
From the command line you can specify memory using MAVEN_OPTS=-Xmx4048m You can also set it as a JVM argument if you are using the API: java -Xmx4048m ... On Fri, Apr 26, 2013 at 4:30 AM, Svetoslav Marinov svetoslav.mari...@findwise.com wrote: I use the API. Can one specify the memory
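When training through the API, it can be useful to confirm that the `-Xmx` setting actually reached the JVM before loading a large corpus. The snippet below is a small sketch using only the standard `Runtime` API; the class name is illustrative and nothing here is OpenNLP-specific.

```java
public class HeapCheckSketch {

    // Returns the maximum heap the JVM will attempt to use, in mebibytes.
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run with, for example: java -Xmx4048m HeapCheckSketch
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

The reported value is usually slightly below the `-Xmx` figure, because the JVM reserves part of the heap for its own bookkeeping.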

Re: Please review the 1.5.3 release announcement

2013-04-17 Thread William Colen
are available. I already promoted the maven repo and pushed the release to the dist area, but it might take a bit until everything is available, the mirrors might need 24 hours to mirror the distributables. Jörn On 04/15/2013 09:46 PM, William Colen wrote: Hello, Please review the release

Re: OpenNLP 1.5.3 RC 3 ready for testing

2013-04-09 Thread William Colen
+1 What about the similarity component? Should we build it only after the 1.5.3 release? William Colen On Tue, Apr 9, 2013 at 5:19 AM, Jörn Kottmann kottm...@gmail.com wrote: Are we ready to start with the release vote? The last important test item that is missing is checking

[VOTE] Release OpenNLP 1.5.3 RC 3

2013-04-09 Thread William Colen
Hello, Let's vote to release RC 3 as OpenNLP 1.5.3. The testing of it is documented here: https://cwiki.apache.org/confluence/display/OPENNLP/TestPlan1.5.3 The RC can be downloaded here: http://people.apache.org/~colen/releases/opennlp-1.5.3/rc3 Please vote to approve this release: [ ] +1

Re: [VOTE] Release OpenNLP 1.5.3 RC 3

2013-04-09 Thread William Colen
+1 Approve the release On Tue, Apr 9, 2013 at 9:51 AM, William Colen william.co...@gmail.com wrote: Hello, Let's vote to release RC 3 as OpenNLP 1.5.3. The testing of it is documented here: https://cwiki.apache.org/confluence/display/OPENNLP/TestPlan1.5.3 The RC can be downloaded here

Re: OpenNLP 1.5.3 RC 2 ready for testing

2013-04-03 Thread William Colen
Thank you, I fixed it. I will start the build of RC3 right now. On Wed, Apr 3, 2013 at 5:01 AM, Jörn Kottmann kottm...@gmail.com wrote: On 04/03/2013 02:10 AM, William Colen wrote: Thank you, Jörn. I also had to update the maven-changes-plugin version. The 2.3 was failing to download

Re: OpenNLP 1.5.3 RC 2 ready for testing

2013-04-03 Thread William Colen
On 04/03/2013 01:23 PM, William Colen wrote: Thank you, I fixed it. I will start the build of RC3 right now. On Wed, Apr 3, 2013 at 5:01 AM, Jörn Kottmann kottm...@gmail.com wrote: On 04/03/2013 02:10 AM, William Colen wrote: Thank you, Jörn. I also had to update the maven-changes

OpenNLP 1.5.3 RC 3 ready for testing

2013-04-03 Thread William Colen
Hi all, Our third release candidate is ready for testing. RC 2 failed to pass in only a few tests, including the creation of the issues list and the NOTICE file. Also, some new bug fixes and improvements were recently included. The RC 3 can be downloaded from here:

Re: Re: English 300k sentences Leipzig Corpus for test

2013-03-14 Thread William Colen
Subject:Re: English 300k sentences Leipzig Corpus for test Date: Thu, 14 Mar 2013 09:48:21 -0300 From: William Colen william.co...@gmail.com To: Jörn Kottmann kottm...@gmail.com Yes, you can forward. It is not clear to me how to convert it. I could only find

Re: Next release

2013-02-19 Thread William Colen
Should we try to upload it to Central Repo using jwnl as groupid? What do you think? On Mon, Feb 18, 2013 at 3:03 PM, Benson Margulies bimargul...@gmail.com wrote: upload to central via ossrh. On Feb 18, 2013, at 12:46 PM, William Colen william.co...@gmail.com wrote: We are using jwnl

Uploading JWNL to Maven Central Repo

2013-02-19 Thread William Colen
1.3 rc3 they distribute? Thank you, William On Tue, Feb 19, 2013 at 11:09 AM, Benson Margulies bimargul...@gmail.com wrote: yes, ossrh will do that On Feb 19, 2013, at 8:38 AM, William Colen william.co...@gmail.com wrote: Should we try to upload it to Central Repo using jwnl as groupid

Re: Next release

2013-02-18 Thread William Colen
With jwnl 1.4_rc3 the code at least compiles. Also, it would be nice if someone familiar with the Coreference module could add some tests to the test plan: https://cwiki.apache.org/OPENNLP/testplan153.html On Sun, Feb 17, 2013 at 10:07 PM, Lance Norskog goks...@gmail.com wrote: OPENNLP-510

Re: Next release

2013-02-18 Thread William Colen
I suppose we can't use opennlp.apache.org to host it, can we? On Mon, Feb 18, 2013 at 10:57 AM, Jörn Kottmann kottm...@gmail.com wrote: On 02/18/2013 02:07 AM, Lance Norskog wrote: OPENNLP-510 Maven dependency on jwnl is broken The version of JWNL used in coreference does not have an
