On 6/22/11 10:45 AM, Olivier Grisel wrote:
> I will (soon?) include a couple of new scripts in pignlproc that extract the occurrence contexts of any kind of entity appearing as a wikilink in Wikipedia dumps and load them into a Solr index. I will let you know when that happens.
We definitely need some code to parse the Wikipedia articles. How do you transform the wiki text to plain text in pignlproc? Could we take a similar approach for the annotation project, or maybe even share the code that does it?

Jörn
