Hi Jörn! Thank you! This issue is solved for me...
lg Markus

2017-09-28 14:19 GMT+02:00 Joern Kottmann <kottm...@gmail.com>:

> Use the same tokenizer as you used to tokenize the training data. The
> default format assumes the input text is whitespace tokenized and then
> uses the whitespace tokenizer to detect the tokens. But for applying
> the model you need to use the tokenizer which was used for the
> training data.
>
> Jörn
>
> On Thu, Sep 28, 2017 at 8:45 AM, Markus Kreuzthaler
> <markus.kreuztha...@gmail.com> wrote:
> > Hi Jeff!
> >
> > Thank you for this hint!
> > Yes, looks like the WhitespaceTokenizer is used in this case...
> >
> > All the best!
> >
> > Markus
> >
> >
> > 2017-09-27 13:03 GMT+02:00 Jeff Zemerick <jzemer...@apache.org>:
> >
> >> Markus,
> >>
> >> I believe the WhitespaceTokenizer is used [1].
> >>
> >> Jeff
> >>
> >> [1]
> >> https://github.com/apache/opennlp/blob/4362e02ed0404d12ca75ee3476d4a32f9f671811/opennlp-tools/src/main/java/opennlp/tools/namefind/NameSample.java#L220
> >>
> >> On Wed, Sep 27, 2017 at 4:13 AM, Markus Kreuzthaler <
> >> markus.kreuztha...@gmail.com> wrote:
> >>
> >> > Hello!
> >> >
> >> > Does anyone know, what tokenizer is used when applying NameFinderME for
> >> > training a custom named entity recognition model? I was searching but I
> >> > could not find this information.
> >> >
> >> > I have to attach the same tokenizer when using the trained model, but I
> >> > don't know which one was used.
> >> >
> >> > Therefore at the moment I just tokenize via:
> >> > String[] tokens = sentence.getCoveredText().split("\\s+");
> >> >
> >> > Thank you for feedback!
> >> >
> >> > lg Markus
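
For anyone finding this thread later: the split("\\s+") call from the original question is close to whitespace tokenization, but it is not quite the same thing when the text starts with whitespace, because Java then returns an empty first token. Below is a minimal JDK-only sketch of that difference. The class name WhitespaceTokenizeSketch and its tokenize method are hypothetical helpers made up for illustration; they are not part of OpenNLP and may not match its WhitespaceTokenizer on every input (for example on non-breaking spaces).

```java
import java.util.Arrays;

// Hypothetical helper for illustration only, not part of OpenNLP.
class WhitespaceTokenizeSketch {

    // Splits on runs of whitespace. Trimming first avoids the empty
    // leading token that a bare split("\\s+") yields for padded input.
    static String[] tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String sentence = "  Markus lives in Graz ";

        // Bare split keeps an empty first element: [, Markus, lives, in, Graz]
        System.out.println(Arrays.toString(sentence.split("\\s+")));

        // Trim-then-split gives only real tokens: [Markus, lives, in, Graz]
        System.out.println(Arrays.toString(tokenize(sentence)));
    }
}
```

With OpenNLP on the classpath, it is probably simpler to call WhitespaceTokenizer.INSTANCE.tokenize(text) from opennlp.tools.tokenize and pass the resulting String[] to NameFinderME.find, so that training and application share the same tokenizer, as Jörn recommends above.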