Hi,
On 13.01.2017, at 08:19, Richard Eckart de Castilho wrote:
> ...
>
> In theory there is also a trainer for the tokenizer, but I haven't been able
> yet to set up a working unit test for it. I think that was due to a lack of
> readily available training data. So it remains on the to-do list.
On 13.01.2017, at 11:15, Peter Klügl wrote:
On 13.01.2017, at 21:12, Richard Eckart de Castilho wrote:
...
I think the problem was that the data I had readily available was in a CoNLL
format: you cannot train a tokenizer from most CoNLL formats because they carry
no information about whether two tokens are directly adjacent or not.
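To illustrate the adjacency problem: CoNLL-U is one CoNLL variant that does record it, via `SpaceAfter=No` in the MISC (10th) column. The sketch below assumes a tokenizer trainer that accepts OpenNLP-style training text (one sentence per line, tokens separated by a space, or by `<SPLIT>` where no whitespace occurs). The helper `conllu_to_split_markup` is hypothetical and not part of any existing library. It simply shows that the trainer's input can be rebuilt from CoNLL-U but not from a format that lacks this column.

```python
from typing import List, Tuple


def conllu_to_split_markup(conllu: str) -> List[str]:
    """Hypothetical helper (an assumption, not an existing API): turn CoNLL-U
    text into one line per sentence, joining tokens with a space, or with
    <SPLIT> when SpaceAfter=No says the next token follows directly."""
    sentences: List[str] = []
    tokens: List[Tuple[str, bool]] = []  # (surface form, no space after?)
    skip_until = 0  # last word ID covered by a multiword token range

    def flush() -> None:
        nonlocal tokens, skip_until
        if tokens:
            out = []
            for i, (form, no_space) in enumerate(tokens):
                out.append(form)
                if i < len(tokens) - 1:  # no separator after the last token
                    out.append("<SPLIT>" if no_space else " ")
            sentences.append("".join(out))
        tokens = []
        skip_until = 0

    for line in conllu.splitlines():
        if not line.strip():
            flush()  # a blank line ends a sentence
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        tok_id, form, misc = cols[0], cols[1], cols[9]
        if "." in tok_id:
            continue  # empty nodes have no surface form
        if "-" in tok_id:
            # Multiword token: the range line carries the surface form and SpaceAfter.
            skip_until = int(tok_id.split("-")[1])
        elif int(tok_id) <= skip_until:
            continue  # syntactic word already covered by the range line
        no_space = "SpaceAfter=No" in misc.split("|")
        tokens.append((form, no_space))
    flush()  # the input may not end with a blank line
    return sentences
```

For a sentence "Hello, world!" annotated with `SpaceAfter=No` on "Hello" and "world", this yields `Hello<SPLIT>, world<SPLIT>!`. With a CoNLL-2000/2003 style file there is no column from which `no_space` could be derived, which is exactly why such data is not usable for tokenizer training.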
Do you have