On 7/4/11 7:20 PM, Olivier Grisel wrote:
Keeping the correct link position from the original markup while cleaning it can be tricky though. Be careful when tweaking the parser. Maybe the Span helper classes from OpenNLP could help make this code more robust.
I wonder how important the links are here, because we do not want to throw away sentences whose entities are not covered by links. But I believe the links could be very useful for entity identification: if, let's say, a person name is labeled and also covered by a link, the link can be used to identify the person mention. And once we have a few manually labeled articles, we can use the links to generate special features that are passed to the name finder. So in the end, do we just generate an annotation for every link?

Jörn
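
To make the "one annotation per link" idea concrete, here is a rough sketch of how link positions could be carried over into the cleaned text. It is only an illustration, not the actual parser code: it assumes MediaWiki-style `[[Target|anchor text]]` links, and `LinkSpan`, `WIKI_LINK` and `strip_links` are made-up names. `LinkSpan` merely stands in for a span class such as the OpenNLP one mentioned above. Offsets are tracked while the output is built, so they stay correct after the markup is removed.

```python
import re
from typing import List, NamedTuple, Tuple


class LinkSpan(NamedTuple):
    """Hypothetical annotation: one per link, in cleaned-text offsets."""
    start: int   # inclusive offset in the cleaned text
    end: int     # exclusive offset in the cleaned text
    target: str  # link target, e.g. the title of the linked article


# Assumption: links look like [[Target]] or [[Target|anchor text]].
# Nested links, templates and other markup are not handled here.
WIKI_LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def strip_links(markup: str) -> Tuple[str, List[LinkSpan]]:
    """Remove link markup and return the cleaned text plus one span per link."""
    parts: List[str] = []
    spans: List[LinkSpan] = []
    pos = 0      # read position in the original markup
    out_len = 0  # length of the cleaned text emitted so far

    for match in WIKI_LINK.finditer(markup):
        # Copy the plain text between the previous link and this one.
        before = markup[pos:match.start()]
        parts.append(before)
        out_len += len(before)

        # The visible text is the anchor if present, else the target itself.
        target = match.group(1)
        anchor = match.group(2) or target
        parts.append(anchor)
        spans.append(LinkSpan(out_len, out_len + len(anchor), target))
        out_len += len(anchor)
        pos = match.end()

    # Copy whatever follows the last link.
    parts.append(markup[pos:])
    return "".join(parts), spans


if __name__ == "__main__":
    text, links = strip_links(
        "[[Barack Obama|Obama]] met [[Angela Merkel]] in Berlin."
    )
    print(text)
    for link in links:
        print(link, repr(text[link.start:link.end]))
```

Sentences without any link simply yield an empty span list, so nothing has to be thrown away; the spans could later be compared against the manually labeled names or turned into features.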
