Hi Jeff! Thank you for this hint! Yes, looks like the WhitespaceTokenizer is used in this case...
All the best!
Markus

2017-09-27 13:03 GMT+02:00 Jeff Zemerick <jzemer...@apache.org>:

> Markus,
>
> I believe the WhitespaceTokenizer is used [1].
>
> Jeff
>
> [1] https://github.com/apache/opennlp/blob/4362e02ed0404d12ca75ee3476d4a32f9f671811/opennlp-tools/src/main/java/opennlp/tools/namefind/NameSample.java#L220
>
> On Wed, Sep 27, 2017 at 4:13 AM, Markus Kreuzthaler <markus.kreuztha...@gmail.com> wrote:
>
> > Hello!
> >
> > Does anyone know, what tokenizer is used when applying NameFinderME for
> > training a custom named entity recognition model? I was searching but I
> > could not find this information.
> >
> > I have to attach the same tokenizer when using the trained model, but I
> > don't know which one was used.
> >
> > Therefore at the moment I just tokenize via:
> > String[] tokens = sentence.getCoveredText().split("\\s+");
> >
> > Thank you for feedback!
> >
> > lg Markus
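
A note on the split("\\s+") workaround quoted above: it usually matches whitespace tokenization, but not when the text starts with whitespace, because String.split then returns an empty first token. The sketch below uses only the JDK to show the difference. The class WhitespaceSplitSketch and its helper whitespaceTokens are hypothetical names for illustration, not OpenNLP code; the helper only approximates what a whitespace tokenizer is assumed to do (collect runs of non-whitespace characters).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WhitespaceSplitSketch {

    // The approach from the original mail: a plain regex split.
    static String[] regexSplit(String text) {
        return text.split("\\s+");
    }

    // Hypothetical helper (not OpenNLP code): collects maximal runs of
    // non-whitespace characters, so it never emits an empty token.
    static String[] whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // index where the current token began, -1 if none
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                if (start >= 0) {
                    tokens.add(text.substring(start, i));
                    start = -1;
                }
            } else if (start < 0) {
                start = i;
            }
        }
        if (start >= 0) {
            tokens.add(text.substring(start)); // token running to end of text
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String sentence = "  Markus lives in Graz .";
        // The regex split keeps an empty string in front of "Markus".
        System.out.println(Arrays.toString(regexSplit(sentence)));
        // The run-based helper starts directly with "Markus".
        System.out.println(Arrays.toString(whitespaceTokens(sentence)));
    }
}
```

With OpenNLP on the classpath, calling WhitespaceTokenizer.INSTANCE.tokenize(text) at prediction time is probably the simpler way to stay consistent with training, since it avoids re-implementing the tokenizer by hand.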