Hi Jeff!

Thank you for this hint!
Yes, it looks like the WhitespaceTokenizer is used in this case.
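
For anyone finding this thread later, here is a minimal sketch of how my plain split("\\s+") workaround can be made safer so it behaves more like whitespace tokenization. The class name WhitespaceSplit and the helper tokenize are made up for illustration and are not part of OpenNLP. One thing I am sure of: String.split returns an empty first token when the text starts with whitespace, so the sketch trims first. It may still differ from OpenNLP's tokenizer on unusual Unicode whitespace.

```java
import java.util.Arrays;

public class WhitespaceSplit {

    // Hypothetical helper (not an OpenNLP API): split on runs of whitespace,
    // trimming first so leading whitespace does not yield an empty token.
    public static String[] tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            // Blank input: return no tokens instead of a single empty string.
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // Prints: [Markus, lives, in, Graz, .]
        System.out.println(Arrays.toString(tokenize("  Markus lives in Graz . ")));
    }
}
```

With opennlp-tools on the classpath, calling WhitespaceTokenizer.INSTANCE.tokenize(text) from opennlp.tools.tokenize is probably the more direct way to match what was used during training.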

All the best!

Markus


2017-09-27 13:03 GMT+02:00 Jeff Zemerick <jzemer...@apache.org>:

> Markus,
>
> I believe the WhitespaceTokenizer is used [1].
>
> Jeff
>
> [1]
> https://github.com/apache/opennlp/blob/4362e02ed0404d12ca75ee3476d4a32f9f671811/opennlp-tools/src/main/java/opennlp/tools/namefind/NameSample.java#L220
>
> On Wed, Sep 27, 2017 at 4:13 AM, Markus Kreuzthaler <
> markus.kreuztha...@gmail.com> wrote:
>
> > Hello!
> >
> > Does anyone know which tokenizer is used when training a custom named
> > entity recognition model with NameFinderME? I searched but could not
> > find this information.
> >
> > I have to apply the same tokenizer when using the trained model, but I
> > don't know which one was used.
> >
> > Therefore at the moment I just tokenize via:
> > String[] tokens = sentence.getCoveredText().split("\\s+");
> >
> > Thank you for any feedback!
> >
> > Best regards, Markus
> >
>
