Re: Tokenizer in NameFinderME

Markus Kreuzthaler Thu, 28 Sep 2017 06:09:08 -0700

Hi Jörn!

Thank you!
This issue is solved for me...


Best regards,
Markus


2017-09-28 14:19 GMT+02:00 Joern Kottmann <kottm...@gmail.com>:

> Use the same tokenizer as you used to tokenize the training data. The
> default format assumes the input text is whitespace tokenized and then
> uses the whitespace tokenizer to detect the tokens. But for applying
> the model you need to use the tokenizer which was used for the
> training data.
>
> Jörn
>
> On Thu, Sep 28, 2017 at 8:45 AM, Markus Kreuzthaler
> <markus.kreuztha...@gmail.com> wrote:
> > Hi Jeff!
> >
> > Thank you for this hint!
> > Yes, looks like the WhitespaceTokenizer is used in this case...
> >
> > All the best!
> >
> > Markus
> >
> >
> > 2017-09-27 13:03 GMT+02:00 Jeff Zemerick <jzemer...@apache.org>:
> >
> >> Markus,
> >>
> >> I believe the WhitespaceTokenizer is used [1].
> >>
> >> Jeff
> >>
> >> [1]
> >> https://github.com/apache/opennlp/blob/4362e02ed0404d12ca75ee3476d4a32f9f671811/opennlp-tools/src/main/java/opennlp/tools/namefind/NameSample.java#L220
> >>
> >> On Wed, Sep 27, 2017 at 4:13 AM, Markus Kreuzthaler <
> >> markus.kreuztha...@gmail.com> wrote:
> >>
> >> > Hello!
> >> >
> >> > Does anyone know which tokenizer is used when training a custom named
> >> > entity recognition model with NameFinderME? I searched but could not
> >> > find this information.
> >> >
> >> > I have to apply the same tokenizer when using the trained model, but I
> >> > don't know which one was used.
> >> >
> >> > Therefore at the moment I just tokenize via:
> >> > String[] tokens = sentence.getCoveredText().split("\\s+");
> >> >
> >> > Thank you for feedback!
> >> >
> >> > Best regards,
> >> > Markus
> >> >
> >>
>
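
A small sketch of one pitfall with the String.split("\\s+") workaround quoted
above, for readers who land on this thread. It uses only the Java standard
library. The class TokenizeSketch and the helper whitespaceTokenize are
hypothetical names made up for this illustration and are not OpenNLP code.
The point is only that a plain split can yield an empty first token when the
text starts with whitespace, so the tokens seen at prediction time may differ
from the ones seen in training. In OpenNLP itself, the safer route is probably
to call the same tokenizer class that produced the training data (for
whitespace-tokenized data, WhitespaceTokenizer.INSTANCE) rather than a
hand-written split.

```java
import java.util.Arrays;

class TokenizeSketch {

    // Hypothetical helper (not part of OpenNLP): split on runs of whitespace,
    // ignoring leading and trailing whitespace so no empty tokens are produced.
    static String[] whitespaceTokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // Example sentence with leading and trailing whitespace (made up).
        String sentence = "  Markus lives in Graz ";

        // Plain split keeps an empty first element when the text starts
        // with whitespace; trailing empty elements are dropped by split.
        String[] viaSplit = sentence.split("\\s+");

        // The helper trims first, so only real tokens remain.
        String[] viaHelper = whitespaceTokenize(sentence);

        System.out.println(viaSplit.length + " " + Arrays.toString(viaSplit));
        System.out.println(viaHelper.length + " " + Arrays.toString(viaHelper));
    }
}
```

With the sample sentence, the plain split returns five elements, the first of
them empty, while the helper returns the four actual tokens. Whether this
matters in practice depends on whether the covered text can begin with
whitespace.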
