2013/4/9 Jörn Kottmann

> On 04/07/2013 09:51 PM, Karin Ahlen wrote:
>
>> I have a question about the parser. I have made an implementation of the
>> parser (trained on Swedish Talbanken data) and it's working pretty well.
>>
>
> Would it be possible to contribute the Talbanken corpus parsing code and
> other modifications
> back to OpenNLP?
>
> In OpenNLP we have a formats package for parsing/conversion code which
> integrates with our
> command line training tools. If we add Talbanken support to this package
> it would be possible to
> train the OpenNLP parser directly on the Talbanken data without doing any
> manual conversion.
>
> Jörn
>

Thanks for the reply. The training was done with the command line tool, and
the corpus was converted to OpenNLP format before training. I didn't do the
conversion myself, so I don't know the details. For your information, though,
the corpus is originally in TigerXML format. It would have been nice to have
Talbanken support in the formats package. If you want to know more about
Talbanken and its format, this is the page to visit:
http://stp.lingfil.uu.se/~nivre/swedish_treebank/


Regards
Karin
