We shouldn't replace JWNL with a newer version,
because we currently don't have the ability to train
or evaluate the coref component.
This is a big issue for us because that also blocks
other changes and updates to the code itself,
e.g. the cleanups Aliaksandr contributed.
What we need here is a plan for getting the coref component
into a state where it can be developed by the community.
If we don't find a way to resolve this, I think we should move the coref
stuff to the sandbox and leave it there until we have some training data.
Not being able to train coref also blocks changes we might want
to make to our maxent library.
Maybe it is possible to buy a license for the MUC 6 and 7 data, so we can
share this data privately within the team. Does anyone know whether that
would be possible under the LDC license?
The CoNLL-2011 data (OntoNotes, costs $50) might also be suitable for
training it:
http://conll.bbn.com/index.php/data.html
Another option would be to label enough Wikinews data so that we are
able to train it.
Jörn
On 11/17/11 5:50 AM, James Kosin wrote:
All,
I just saw this, which may be interesting to update to...
http://sourceforge.net/projects/extjwnl/
James