Hello Jörn,

Yes, I have a lot of code around TCF; I will see how it can be integrated. At least, I'll need importers/exporters for OpenNLP/TCF anyway :-)
Best,

Tom

On 15.10.2013 10:06, Jörn Kottmann wrote:
> OpenNLP is designed to support many formats for training, but we had to
> decide on one default format, and that is the one which was always
> supported.
>
> We can support the proposed TCF format. Are you interested in
> contributing parsing code for it?
>
> Jörn
>
> On 10/14/2013 09:59 PM, Thomas Zastrow wrote:
>> Hello,
>>
>> In any case, I think it's a little bit old-school to identify tokens and
>> additional annotations just with spaces between them ... what about a
>> nice XML format (no, not that ISO crap ... what about TCF [1])? Or maybe
>> NEGRA?
>>
>> Best,
>>
>> Tom
>>
>> [1]
>> http://weblicht.sfs.uni-tuebingen.de/weblichtwiki/index.php/The_TCF_Format
>>
>> On 14.10.2013 21:53, Charles Martin wrote:
>>> What happens if all the entity tokens are at the beginning of every
>>> line? I find that OpenNLP then thinks that any string near the
>>> beginning of a line is an entity, regardless of the content or word
>>> context.
>>>
>>> On Mon, Oct 14, 2013 at 12:48 PM, Thomas Zastrow
>>> <[email protected]> wrote:
>>>
>>>> Thanks. That explains a lot ... :-)
>>>>
>>>> Does it play a role if it is one or two blanks?
>>>>
>>>> On 14.10.2013 21:44, William Colen wrote:
>>>>> Yes, it does. Include a blank between any element, including
>>>>> punctuation and annotations. The corpus must be tokenized.
>>>>>
>>>>> 2013/10/14 Thomas Zastrow <[email protected]>
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I have a question: when creating training material, does it make a
>>>>>> difference if there are " " (blanks) around the NE? In other
>>>>>> words, is it the same to have:
>>>>>>
>>>>>> <START:loc>Hamburg<END>
>>>>>>
>>>>>> or:
>>>>>>
>>>>>> <START:loc> Hamburg <END>
>>>>>>
>>>>>> The example in the documentation shows up with the " " ... ?
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>> P.S.: ca. 1300 sentences for a free German NE model are done :-)
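
For reference, William's advice above (a blank between every element, including the annotations) can be applied to existing training lines with a small script. This is only a rough sketch: the helper name `normalize_line` and the regular expression are my own and not part of OpenNLP, and collapsing repeated blanks into one is an assumption, since the thread does not settle whether one or two blanks make a difference. It does not tokenize punctuation; that still has to be done separately.

```python
import re

# Matches the name-finder annotations discussed above:
# <START:type>, a bare <START>, or <END>.
TAG = re.compile(r"<START(?::[^>\s]+)?>|<END>")


def normalize_line(line: str) -> str:
    """Put a blank on both sides of every tag, then collapse repeated blanks.

    Hypothetical helper for illustration only. Collapsing to a single
    blank is an assumption, not something confirmed in the thread.
    """
    spaced = TAG.sub(lambda m: f" {m.group(0)} ", line)
    return " ".join(spaced.split())


if __name__ == "__main__":
    print(normalize_line("Ich wohne in <START:loc>Hamburg<END> ."))
```

Running it on a line such as `Ich wohne in <START:loc>Hamburg<END> .` gives the spaced form `<START:loc> Hamburg <END>` that the documentation example uses.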
