Use the same tokenizer you used to tokenize the training data. The default training format assumes the input text is already whitespace-tokenized, so the whitespace tokenizer is used to detect the tokens. When you apply the model, you need to use whichever tokenizer was used for the training data.
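As a rough illustration of why the tokenizers need to match, here is a minimal plain-JDK sketch. The class `TokenizerMatch` and the helper `whitespaceTokenize` are hypothetical names made up for this example, and the helper is only a stand-in, not OpenNLP's implementation. It shows that a bare `split("\\s+")` can produce an empty first token when the text starts with whitespace, which a whitespace tokenizer would not emit. In an OpenNLP project you would more likely call `WhitespaceTokenizer.INSTANCE.tokenize(text)` directly, if the training data was whitespace-tokenized.

```java
import java.util.Arrays;

class TokenizerMatch {

    // Hypothetical helper: approximates whitespace tokenization with plain JDK calls.
    // It is NOT OpenNLP's WhitespaceTokenizer, only a stand-in for illustration.
    static String[] whitespaceTokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            // "".split(...) would return one empty token, so handle this case explicitly.
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String sentence = "  Pierre Vinken joined the board .";

        // A bare split keeps an empty first token when the text starts with whitespace.
        String[] naive = sentence.split("\\s+");

        // Trimming first gives the token sequence a whitespace tokenizer would see.
        String[] tokens = whitespaceTokenize(sentence);

        System.out.println(Arrays.toString(naive));
        System.out.println(Arrays.toString(tokens));
    }
}
```

An extra empty token shifts every token index, so any spans the name finder returns would no longer line up with the tokens you expect. That is one small way a mismatch between training-time and run-time tokenization can show up.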
Jörn

On Thu, Sep 28, 2017 at 8:45 AM, Markus Kreuzthaler <markus.kreuztha...@gmail.com> wrote:

> Hi Jeff!
>
> Thank you for this hint!
> Yes, looks like the WhitespaceTokenizer is used in this case...
>
> All the best!
>
> Markus
>
>
> 2017-09-27 13:03 GMT+02:00 Jeff Zemerick <jzemer...@apache.org>:
>
>> Markus,
>>
>> I believe the WhitespaceTokenizer is used [1].
>>
>> Jeff
>>
>> [1]
>> https://github.com/apache/opennlp/blob/4362e02ed0404d12ca75ee3476d4a32f9f671811/opennlp-tools/src/main/java/opennlp/tools/namefind/NameSample.java#L220
>>
>> On Wed, Sep 27, 2017 at 4:13 AM, Markus Kreuzthaler <markus.kreuztha...@gmail.com> wrote:
>>
>> > Hello!
>> >
>> > Does anyone know, what tokenizer is used when applying NameFinderME for
>> > training a custom named entity recognition model? I was searching but I
>> > could not find this information.
>> >
>> > I have to attach the same tokenizer when using the trained model, but I
>> > don't know which one was used.
>> >
>> > Therefore at the moment I just tokenize via:
>> > String[] tokens = sentence.getCoveredText().split("\\s+");
>> >
>> > Thank you for feedback!
>> >
>> > lg Markus
>> >
>>