Re: Training NEs?

Thomas Zastrow Mon, 14 Oct 2013 13:01:30 -0700

Hello,

In any case, I think it's a bit old-school to identify tokens and
additional annotations just by the spaces between them ... what about a
nice XML format (no, not that ISO crap ... what about TCF [1])? Or maybe
NEGRA?
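
As long as the training format is whitespace-delimited, a small
normalization step keeps the tags separated from the tokens. Below is a
minimal Python sketch, not part of OpenNLP: the function name
normalize_tags and the regular expression are my own assumptions. It puts
exactly one blank on each side of every <START:type> or <END> tag and
collapses repeated blanks, so the one-or-two-blanks question does not
arise. It does not tokenize punctuation; that still has to be done
separately.

```python
import re

# Matches name tags such as <START:loc>, <START> or <END>.
# The pattern is an assumption based on the examples in this thread.
TAG = re.compile(r"<START(?::[^>\s]+)?>|<END>")


def normalize_tags(line: str) -> str:
    """Surround every tag with single blanks and collapse whitespace runs."""
    # Pad each tag with a blank on both sides.
    spaced = TAG.sub(lambda m: f" {m.group(0)} ", line)
    # split() with no argument drops all runs of whitespace,
    # so joining with one blank leaves exactly one separator.
    return " ".join(spaced.split())


if __name__ == "__main__":
    print(normalize_tags("Ich wohne in <START:loc>Hamburg<END> ."))
```

Running it on the line above should print the tags with blanks around
"Hamburg", i.e. the second variant from my original question.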


Best,

Tom

[1]
http://weblicht.sfs.uni-tuebingen.de/weblichtwiki/index.php/The_TCF_Format


On 14.10.2013 21:53, Charles Martin wrote:
> What happens if all the entity tokens are at the beginning of every line?
> I find that OpenNLP then thinks that any string near the beginning of a line
> is an entity,
> regardless of the content or word context
> 
> 
> 
> On Mon, Oct 14, 2013 at 12:48 PM, Thomas Zastrow 
> <[email protected]> wrote:
> 
>> Thanks. That explains a lot ... :-)
>>
>> Does it matter whether it is one or two blanks?
>>
>>
>>
>> On 14.10.2013 21:44, William Colen wrote:
>>> Yes, it does. Include a blank between all elements, including punctuation
>>> and annotations. The corpus must be tokenized.
>>>
>>>
>>> 2013/10/14 Thomas Zastrow <[email protected]>
>>>
>>>> Hello,
>>>>
>>>> I have a question: when creating training material, does it make a
>>>> difference if there are " " (blanks) around the NE? In other words, is
>>>> it the same to have:
>>>>
>>>> <START:loc>Hamburg<END>
>>>>
>>>> or:
>>>>
>>>> <START:loc> Hamburg <END>
>>>>
>>>> The example in the documentation shows it with the " " ... ?
>>>>
>>>> Best,
>>>>
>>>> Tom
>>>>
>>>> P.S.: ca. 1300 sentences for a free German NE model are done :-)
>>>>
>>>
>>
>>
> 
>
