Re: Tokenizer in NameFinderME

Joern Kottmann Thu, 28 Sep 2017 05:20:28 -0700

Use the same tokenizer that was used to tokenize the training data. The
default training format assumes the input text is already whitespace
tokenized, so it uses the whitespace tokenizer to detect the tokens.
When you apply the model, you must tokenize the input the same way the
training data was tokenized.
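
To illustrate why this matters for the split("\\s+") call quoted below,
here is a minimal sketch in plain Java. It does not use OpenNLP itself:
whitespaceTokenize is a hypothetical stand-in that only approximates a
whitespace tokenizer, and the class name and sample sentence are made
up for illustration. With OpenNLP on the classpath, the call to prefer
would be WhitespaceTokenizer.INSTANCE.tokenize(sentence).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TokenizerConsistency {

    // Hypothetical stand-in for a whitespace tokenizer (not OpenNLP code):
    // collects maximal runs of non-whitespace characters, so it never
    // emits an empty token.
    static String[] whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // index where the current token began, -1 if none
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                if (start >= 0) {
                    tokens.add(text.substring(start, i));
                    start = -1;
                }
            } else if (start < 0) {
                start = i;
            }
        }
        if (start >= 0) {
            tokens.add(text.substring(start));
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Made-up sample sentence with a leading space.
        String sentence = " OpenNLP finds names in text .";

        // String.split keeps a leading empty string when the input starts
        // with a delimiter, so this array has one extra, empty, first token.
        String[] viaSplit = sentence.split("\\s+");

        // The run-based tokenizer skips leading whitespace entirely.
        String[] viaTokenizer = whitespaceTokenize(sentence);

        System.out.println(viaSplit.length + " " + Arrays.toString(viaSplit));
        System.out.println(viaTokenizer.length + " " + Arrays.toString(viaTokenizer));
    }
}
```

The two arrays differ only by the empty first token, but that is enough
to shift every token index, which is why applying the model with exactly
the tokenizer used for training is the safer choice.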


Jörn

On Thu, Sep 28, 2017 at 8:45 AM, Markus Kreuzthaler
<markus.kreuztha...@gmail.com> wrote:
> Hi Jeff!
>
> Thank you for this hint!
> Yes, looks like the WhitespaceTokenizer is used in this case...
>
> All the best!
>
> Markus
>
>
> 2017-09-27 13:03 GMT+02:00 Jeff Zemerick <jzemer...@apache.org>:
>
>> Markus,
>>
>> I believe the WhitespaceTokenizer is used [1].
>>
>> Jeff
>>
>> [1]
>> https://github.com/apache/opennlp/blob/4362e02ed0404d12ca75ee3476d4a32f9f671811/opennlp-tools/src/main/java/opennlp/tools/namefind/NameSample.java#L220
>>
>> On Wed, Sep 27, 2017 at 4:13 AM, Markus Kreuzthaler <
>> markus.kreuztha...@gmail.com> wrote:
>>
>> > Hello!
>> >
>> > Does anyone know what tokenizer is used when applying NameFinderME for
>> > training a custom named entity recognition model? I searched but
>> > could not find this information.
>> >
>> > I have to use the same tokenizer when applying the trained model, but I
>> > don't know which one was used.
>> >
>> > Therefore at the moment I just tokenize via:
>> > String[] tokens = sentence.getCoveredText().split("\\s+");
>> >
>> > Thank you for feedback!
>> >
>> > lg Markus
>> >
>>
