Ah good, I was going to ask about parses too -- so this is done. I'll start reading the code tonight.
OntoNotes is smallish, yes? Is the English bit larger than the CoNLL data set? In terms of cost, isn't it free?

Thanks,

jds

On Tue, Jul 17, 2012 at 11:09 AM, Jörn Kottmann <[email protected]> wrote:
> On 07/17/2012 05:03 PM, John Stewart wrote:
>>
>> OK, so per this https://issues.apache.org/jira/browse/OPENNLP-54
>>
>> you're saying that results may improve with the CoNLL training set,
>> yes? That definitely seems worth trying to me. Now, what, if any,
>> policies are there about dependencies between OpenNLP modules? I ask
>> because the coref task might benefit from the NE output -- perhaps
>> they are already linked!
>
>
> The input for coref is this:
> - Full or shallow parse (depends on how the model was trained)
> - NER output
>
> All this information is encoded into Parse objects, and therefore no
> direct link between the components is necessary.
> You can see this nicely when you run the command line demo.
>
> Yes, we need a corpus to train it on. Maybe OntoNotes would be a good
> candidate; it's affordable to everyone.
>
> What do you think?
>
> Jörn
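[For anyone following the thread: a minimal sketch of how those two inputs (a full parse plus NER spans) can be produced with the OpenNLP 1.5 API. The model file names are placeholders, and the final step -- attaching the name spans to the parse tree and handing the result to the coref linker -- is only indicated in comments, since that part lives in the sandbox coref component and its exact API may differ.]

import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.cmdline.parser.ParserTool;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.parser.Parse;
import opennlp.tools.parser.Parser;
import opennlp.tools.parser.ParserFactory;
import opennlp.tools.parser.ParserModel;
import opennlp.tools.util.Span;

public class CorefInputSketch {

    public static void main(String[] args) throws Exception {
        // Load a full treebank parser model (file name is a placeholder).
        InputStream parserIn = new FileInputStream("en-parser-chunking.bin");
        Parser parser = ParserFactory.create(new ParserModel(parserIn));
        parserIn.close();

        // ParserTool expects whitespace-tokenized input.
        String sentence = "Mary saw John and she waved at him .";
        Parse parse = ParserTool.parseLine(sentence, parser, 1)[0];

        // Run NER over the same tokens the parser saw.
        InputStream nerIn = new FileInputStream("en-ner-person.bin");
        NameFinderME nameFinder =
                new NameFinderME(new TokenNameFinderModel(nerIn));
        nerIn.close();

        Parse[] tagNodes = parse.getTagNodes();
        String[] tokens = new String[tagNodes.length];
        for (int i = 0; i < tagNodes.length; i++) {
            tokens[i] = tagNodes[i].getCoveredText();
        }
        Span[] names = nameFinder.find(tokens);

        // In the full pipeline these spans would be inserted into the
        // parse tree as NE constituents, so the coref linker sees both
        // the syntax and the NER output in a single Parse object.
        for (Span name : names) {
            System.out.println("NE span: " + name);
        }

        parse.show();  // print the bracketed parse for inspection
    }
}

[Presumably the command line demo does similar wiring internally before invoking the linker, which would be why no direct dependency between the modules is needed.]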
