2013/4/9 Jörn Kottmann <[email protected]>

> On 04/07/2013 09:51 PM, Karin Ahlen wrote:
>
>> I have a question about the parser. I have made an implementation of the
>> parser (trained on Swedish Talbanken data) and it's working pretty well.
>
> Would it be possible to contribute the Talbanken corpus parsing code and
> other modifications back to OpenNLP?
>
> In OpenNLP we have a formats package for parsing/conversion code which
> integrates with our command line training tools. If we add Talbanken
> support to this package it would be possible to train the OpenNLP parser
> directly on the Talbanken data without doing any manual conversion.
>
> Jörn

Thanks for the reply.

The training was done with the command line tool, and the corpus was converted to OpenNLP format before training. I didn't do the conversion myself, so I have no knowledge of how it was done. For your information, though, the corpus is originally in TigerXML format.

It would be great to have Talbanken support in the formats package. If you want to know more about Talbanken and its format, this is the page to go to:
http://stp.lingfil.uu.se/~nivre/swedish_treebank/

Regards
Karin
