2011/7/4 Jörn Kottmann <[email protected]>:
> On 7/4/11 2:05 PM, Olivier Grisel wrote:
>>
>> Done. See my comment on
>> https://issues.apache.org/jira/browse/OPENNLP-211  for additional info
>> on the integration / usage.
>
> Thanks, it doesn't seem that difficult to parse. Hopefully we will quickly
> reach a state where it is possible to import the wikinews data into the corpus
> server; the parsing might need a little fine-tuning to give good results.

Keeping the correct link positions from the original markup while
cleaning it can be tricky though: every piece of markup you strip
shifts the offsets of all the text that follows it. Be careful when
tweaking the parser. Maybe the Span helper classes from OpenNLP could
help make this code more robust.
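
To illustrate the offset problem, here is a minimal Python sketch (not the
parser attached to OPENNLP-211). It assumes only the two simplest wiki link
forms, [[target]] and [[target|label]], and uses a hypothetical Span class of
my own as a stand-in for the OpenNLP one. The idea is to compute each span
against the cleaned text while it is being built, rather than trying to fix
up offsets afterwards.

```python
import re
from dataclasses import dataclass


# Hypothetical stand-in for a (start, end, type) span; NOT the OpenNLP class.
@dataclass(frozen=True)
class Span:
    start: int   # inclusive offset in the cleaned text
    end: int     # exclusive offset in the cleaned text
    target: str  # link target taken from the original markup


# Assumption: only [[target]] and [[target|label]] are handled.
# Real wiki markup (nested links, templates, files) needs much more care.
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def strip_links(markup):
    """Return (clean_text, spans); span offsets refer to clean_text."""
    parts, spans = [], []
    clean_len = 0  # length of the cleaned text emitted so far
    last = 0       # position in the markup just after the previous link
    for m in LINK.finditer(markup):
        before = markup[last:m.start()]
        parts.append(before)
        clean_len += len(before)
        # The visible text is the label if present, else the target itself.
        label = m.group(2) or m.group(1)
        spans.append(Span(clean_len, clean_len + len(label), m.group(1)))
        parts.append(label)
        clean_len += len(label)
        last = m.end()
    parts.append(markup[last:])
    return "".join(parts), spans
```

For example, calling strip_links on
"The [[Apache Software Foundation|ASF]] hosts [[OpenNLP]]." should give back
the cleaned sentence plus two spans whose offsets slice out "ASF" and
"OpenNLP" from that cleaned sentence.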

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
