Use the same tokenizer at runtime that you used to tokenize the
training data. The default training format assumes the input text is
already whitespace tokenized, so the WhitespaceTokenizer is used to
detect the tokens. When applying the model, however, you need to use
whichever tokenizer was used to produce the training data.
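
If your training data really was plain whitespace-separated text, a rough
sketch of matching that at runtime could look like the following. It uses
only the JDK; the class and the helper whitespaceTokens are hypothetical
names for illustration, not part of OpenNLP. It shows one way a bare
split("\\s+") can differ from whitespace tokenization: leading whitespace
produces an empty first token.

```java
import java.util.Arrays;

public class WhitespaceSplitDemo {

    // Hypothetical helper (not an OpenNLP API): trims first so that leading
    // whitespace does not yield an empty first token, then splits on runs
    // of whitespace. Returns an empty array for blank input.
    public static String[] whitespaceTokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String sentence = "  Pierre Vinken is 61 years old .";

        // Bare split keeps an empty string in front of "Pierre".
        String[] naive = sentence.split("\\s+");
        System.out.println(naive.length + " " + Arrays.toString(naive));

        // Trimming first gives only the seven real tokens.
        String[] tokens = whitespaceTokens(sentence);
        System.out.println(tokens.length + " " + Arrays.toString(tokens));
    }
}
```

With OpenNLP on the classpath, calling WhitespaceTokenizer.INSTANCE.tokenize(text)
directly is probably the simpler choice, since that is the tokenizer referenced
in this thread; as far as I know it also skips leading whitespace.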

Jörn

On Thu, Sep 28, 2017 at 8:45 AM, Markus Kreuzthaler
<markus.kreuztha...@gmail.com> wrote:
> Hi Jeff!
>
> Thank you for this hint!
> Yes, looks like the WhitespaceTokenizer is used in this case...
>
> All the best!
>
> Markus
>
>
> 2017-09-27 13:03 GMT+02:00 Jeff Zemerick <jzemer...@apache.org>:
>
>> Markus,
>>
>> I believe the WhitespaceTokenizer is used [1].
>>
>> Jeff
>>
>> [1]
>> https://github.com/apache/opennlp/blob/4362e02ed0404d12ca75ee3476d4a32f9f671811/opennlp-tools/src/main/java/opennlp/tools/namefind/NameSample.java#L220
>>
>> On Wed, Sep 27, 2017 at 4:13 AM, Markus Kreuzthaler <
>> markus.kreuztha...@gmail.com> wrote:
>>
>> > Hello!
>> >
>> > Does anyone know which tokenizer is used when training a custom named
>> > entity recognition model with NameFinderME? I searched but could not
>> > find this information.
>> >
>> > I have to use the same tokenizer when applying the trained model, but I
>> > don't know which one was used.
>> >
>> > So at the moment I just tokenize via:
>> > String[] tokens = sentence.getCoveredText().split("\\s+");
>> >
>> > Thank you for your feedback!
>> >
>> > Best regards, Markus
>> >
>>
