Re: GSoC 2015 - WSD Module

2015-06-28 Thread Joern Kottmann
Yes, the performance testing has to be there, otherwise it is hard to
tell if it works or not.

Jörn

On Mon, 2015-06-29 at 02:02 +0900, Anthony Beylerian wrote:
> Dear Jörn,
> 
> As a first milestone, we now have the main interface with two 
> implementations (one unsupervised, one supervised). Perhaps we can add an 
> evaluator for performance tests and comparison against the test data we 
> currently have (the SemEval and SensEval test sets).  
> 
> Best,
> 
> Anthony
> 
> > Subject: Re: GSoC 2015 - WSD Module
> > From: kottm...@gmail.com
> > To: dev@opennlp.apache.org
> > Date: Thu, 25 Jun 2015 21:47:22 +0200
> > 
> > On Wed, 2015-06-10 at 22:13 +0900, Anthony Beylerian wrote:
> > > Hi,
> > > 
> > > I attached an initial patch to OPENNLP-758.
> > > However, we are currently reworking things a bit, since many approaches 
> > > need to be supported, and we would like your recommendations.
> > > Here are some notes: 
> > > 
> > > 1- We used extJWNL.
> > > 2- [WSDisambiguator] is the main interface.
> > > 3- [Loader] loads the required resources.
> > > 4- Please check [FeaturesExtractor] for the methods mentioned by Rodrigo.
> > > 5- [Lesk] has many variants; we have already implemented some, but we are 
> > > wondering about the preferred way to switch from one to another. As of now 
> > > we use one of them as the default, but we thought of either making a 
> > > parameter list to fill or making separate classes for each, or otherwise 
> > > following your preference.
> > > 6- The other classes are for convenience.
> > > 
> > > We will try to patch frequently on the separate issues, following the 
> > > feedback.
> > 
> > 
> > Sounds good, I reviewed it and think what we have is quite ok.
> > 
> > Most important now is to fix the smaller issues (see the jira issue) and
> > explain to us how it can be run.
> > 
> > The midterm evaluation is coming up next week as well.
> > 
> > How are we standing with the milestone we set?
> > 
> > Jörn
> > 
> 



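On point 5 above (switching between Lesk variants), one option matching the "parameter list" idea is an enum-valued parameter with a single dispatching class, rather than one class per variant. A minimal sketch; all names here (`Lesk`, `LeskVariant`, `disambiguate`) are hypothetical and the variant bodies are stubs, not the real implementations:

```java
// Sketch: selecting a Lesk variant via a parameter instead of separate classes.
// All names are hypothetical; the variant methods are stand-in stubs.
class Lesk {

    // Each implemented variant gets an enum constant.
    enum LeskVariant { ORIGINAL, SIMPLIFIED, EXTENDED }

    private final LeskVariant variant;

    Lesk() { this(LeskVariant.SIMPLIFIED); }   // one variant acts as the default

    Lesk(LeskVariant variant) { this.variant = variant; }

    // Dispatch to the variant-specific implementation.
    String disambiguate(String word, String[] context) {
        switch (variant) {
            case ORIGINAL:   return leskOriginal(word, context);
            case EXTENDED:   return leskExtended(word, context);
            case SIMPLIFIED:
            default:         return leskSimplified(word, context);
        }
    }

    // Stubs standing in for the real variant implementations.
    private String leskOriginal(String w, String[] c)   { return w + "#original"; }
    private String leskSimplified(String w, String[] c) { return w + "#simplified"; }
    private String leskExtended(String w, String[] c)   { return w + "#extended"; }

    public static void main(String[] args) {
        Lesk lesk = new Lesk(LeskVariant.EXTENDED);
        // prints bank#extended
        System.out.println(lesk.disambiguate("bank", new String[] {"river", "water"}));
    }
}
```

Separate classes per variant would also work; the enum keeps a single entry point, which is convenient if the variants share most of their feature extraction.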


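The evaluator Anthony proposes for the SemEval/SensEval test sets boils down to comparing predicted sense keys against gold annotations. A minimal accuracy sketch; the class and method names are hypothetical, not part of the patch:

```java
import java.util.List;

// Hypothetical sketch of a WSD evaluator's core: accuracy of predicted
// sense keys against gold-annotated test data.
class WSDEvaluator {

    // Compare predicted sense keys against gold keys, position by position.
    static double accuracy(List<String> predicted, List<String> gold) {
        if (predicted.size() != gold.size()) {
            throw new IllegalArgumentException("prediction/gold size mismatch");
        }
        if (gold.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (int i = 0; i < gold.size(); i++) {
            if (gold.get(i).equals(predicted.get(i))) {
                correct++;
            }
        }
        return (double) correct / gold.size();
    }

    public static void main(String[] args) {
        List<String> gold = List.of("bank%1:14:00::", "bank%1:17:01::");
        List<String> pred = List.of("bank%1:14:00::", "bank%1:14:00::");
        System.out.println("accuracy = " + accuracy(pred, gold)); // prints accuracy = 0.5
    }
}
```

A fuller version could follow the pattern of OpenNLP's existing per-component evaluators, feeding samples from a SensEval/SemEval reader into the disambiguator.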

Re: WSD - Supervised techniques

2015-06-28 Thread Mondher Bouazizi
Hi everyone,

I finished the first iteration of the IMS approach for lexical sample
disambiguation. Please find the patch uploaded on the jira issue [1]. I
also created a tester (IMSTester) to run it.

As I mentioned before, the approach is as follows: each time the module is
called to disambiguate a word, it first checks whether the model file for
that word exists.

1- If the "model" file exists, it is used to disambiguate the word

2- Otherwise, the module checks whether a training data file for that word
exists. If it does, the XML data is used to train the model and create the
model file.

3- If no training data exists, the most frequent sense (MFS) in WordNet is
returned.
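The three-step fallback above can be sketched as a simple dispatch on which files exist. The class name, return values, and file layout below are hypothetical illustrations, not the actual patch on OPENNLP-757:

```java
import java.io.File;

// Hypothetical sketch of the per-word fallback Mondher describes:
// model file -> training data -> WordNet most frequent sense.
class IMSDispatcher {

    static String resolve(File modelFile, File trainingFile) {
        if (modelFile.exists()) {
            return "classify-with-model";   // 1- use the existing model file
        }
        if (trainingFile.exists()) {
            return "train-then-classify";   // 2- train from the XML data, then classify
        }
        return "most-frequent-sense";       // 3- fall back to WordNet's MFS
    }

    public static void main(String[] args) {
        // Hypothetical per-word file layout.
        File model = new File("models/bank.model");
        File train = new File("training/bank.xml");
        System.out.println(resolve(model, train));
    }
}
```

Training lazily on first use like this keeps disk usage down, at the cost of a slow first call for each previously unseen word.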

For now I am using the training data I collected from the Senseval and Semeval
websites. However, I am also currently looking into SemCor to use it as the
main reference.

Yours sincerely,

Mondher

[1] https://issues.apache.org/jira/browse/OPENNLP-757



On Thu, Jun 25, 2015 at 5:27 AM, Joern Kottmann  wrote:

> On Fri, 2015-06-19 at 21:42 +0900, Mondher Bouazizi wrote:
> > Hi,
> >
> > Actually I have finished implementing most parts of the IMS approach. I
> > also made a parser for the Senseval-3 data.
> >
> > However I am currently working on two main points:
> >
> > - I am trying to figure out how to use the MaxEnt classifier. Unfortunately
> > there is not enough documentation, so I am trying to see how it is used by
> > the other components of OpenNLP. Any recommendations?
>
> Yes, have a look at the doccat component. It should be easy to see from it
> how the classifier works. It has to be trained with events (an outcome plus
> its features) and can then classify a set of features into the categories it
> has seen before as outcomes.
>
> Jörn
>
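The event structure Jörn describes (train on outcome-plus-features pairs, then classify a new feature set into previously seen outcomes) can be illustrated without the real MaxEnt machinery. The toy below only counts feature/outcome co-occurrences; OpenNLP's actual trainer fits maximum-entropy weights, but the train/classify shape is the same. All names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Toy event-based classifier: NOT real maximum entropy, just the same
// train-on-(outcome, features) / classify-feature-set shape as OpenNLP's
// classifiers. Counts feature/outcome co-occurrences and picks the best match.
class ToyEventClassifier {

    // counts.get(outcome).get(feature) = co-occurrence count
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // An "event" is an outcome paired with the features observed with it.
    void addEvent(String outcome, String[] features) {
        Map<String, Integer> featCounts =
            counts.computeIfAbsent(outcome, k -> new HashMap<>());
        for (String f : features) {
            featCounts.merge(f, 1, Integer::sum);
        }
    }

    // Pick the previously seen outcome whose features overlap the input most.
    String classify(String[] features) {
        String best = null;
        int bestScore = -1;
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            int score = 0;
            for (String f : features) {
                score += e.getValue().getOrDefault(f, 0);
            }
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        ToyEventClassifier c = new ToyEventClassifier();
        c.addEvent("bank/finance", new String[] {"money", "loan", "account"});
        c.addEvent("bank/river",   new String[] {"water", "shore", "fish"});
        // prints bank/finance
        System.out.println(c.classify(new String[] {"loan", "money"}));
    }
}
```

In the doccat component the same roles are played by the training events (category as outcome, token features) and the trained model's categorize call.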