On 04/25/2013 03:15 PM, Svetoslav Marinov wrote:
What corpus has the English Coref module been trained on?
I contributed code to train on MUC data, but there are still a few
problems with detecting possible mentions in the training data.
If you want to give that a try, I can help you get started.
As far as I know, the coref models have been trained on MUC data plus
some private data, but I am not sure that is correct.
Can someone provide some guidance on which language-specific resources
(modulo sentence splitters, tokenizers, POS taggers, parsers and NER) are
needed in order to get coreference working for a new language? A
WordNet? What else?
Input needs to be:
- Sentence split
- Tokenized
- Either a full or shallow parse, depending on how you trained the coref model
(see the sketch below)
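Roughly, consuming that input looks like the following. This is only a
minimal sketch assuming the 1.5.x-era coref classes (TreebankLinker,
DefaultParse, etc.); the model directory is a hypothetical path and the
exact class names may differ in your version:

  import java.util.ArrayList;
  import java.util.List;

  import opennlp.tools.coref.DiscourseEntity;
  import opennlp.tools.coref.Linker;
  import opennlp.tools.coref.LinkerMode;
  import opennlp.tools.coref.TreebankLinker;
  import opennlp.tools.coref.mention.DefaultParse;
  import opennlp.tools.coref.mention.Mention;
  import opennlp.tools.parser.Parse;

  public class CorefSketch {

    // parses: one treebank-style parse per sentence, i.e. text that has
    // already been sentence split, tokenized and parsed
    static DiscourseEntity[] link(Parse[] parses, String corefModelDir)
        throws Exception {
      // corefModelDir is a hypothetical path to the coref model directory
      Linker linker = new TreebankLinker(corefModelDir, LinkerMode.TEST);

      List<Mention> mentions = new ArrayList<Mention>();
      for (int i = 0; i < parses.length; i++) {
        // wrap each sentence parse so the mention finder can walk it
        DefaultParse wrapped = new DefaultParse(parses[i], i);
        Mention[] extents = linker.getMentionFinder().getMentions(wrapped);
        for (Mention extent : extents) {
          mentions.add(extent);
        }
      }

      // resolve the collected mentions into discourse entities
      return linker.getEntities(mentions.toArray(new Mention[mentions.size()]));
    }
  }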
If you don't have a WordNet dictionary for your language, you can probably
disable the part of feature generation which uses it. I don't know how
that will affect the performance.
We will move the coref component to the sandbox for the next release and
hopefully get some help
to refactor it so it can be moved back to the tools package.
Having a second coref component, e.g. a rule-based one, would also be nice.
Jörn