OK so per this https://issues.apache.org/jira/browse/OPENNLP-54
you're saying that results may improve with the CoNLL training set,
yes?  That definitely seems worth trying to me.  Now, what, if any,
policies are there about dependencies between OpenNLP modules?  I ask
because the coref task might benefit from the NE output -- perhaps
they are already linked!

jds

On Tue, Jul 17, 2012 at 8:04 AM, Jörn Kottmann <[email protected]> wrote:
> On 07/17/2012 01:55 PM, John Stewart wrote:
>>
>> Well, my sense is that before much more work on packaging steps is
>> done, the quality of the output needs to improve.  I'm not sure it's
>> just a matter of training -- but at this point I'm not at all sure of
>> what I'm saying.  My *impression* is that the module needs to
>> incorporate a bit more knowledge of language in order to increase
>> recall without over-generating.  Does that make sense?  Also, is there
>> any documentation on how it works currently?  I would be interested in
>> helping, time permitting as always.
>
>
> We do not have documentation. There are some posts on our
> mailing list discussing it, and there is a thesis from Thomas Morton
> which has a chapter about the coref component.
>
> I would like to at least provide very basic documentation for
> the next release.
>
> Do you want to propose some changes, or do you have ideas about
> what we can do to improve the quality of the output?
>
> The coref component was implemented by Tom and we have only
> maintained it a bit since, without gaining deep knowledge of it. That is
> something that should change, and I actually did read and work on
> the code while looking into how to add training support to it.
>
> Do you think OntoNotes is a good data set to continue the development?
>
> Jörn
>
