Yes, maybe we should make the module dependencies easier to understand, perhaps with some diagrams. The Tokenizer Tools documentation includes a passage explaining how to work with raw text. It might help to add something similar to the Name Finder Tools documentation, since that is the most popular module.
http://incubator.apache.org/opennlp/documentation/1.5.2-incubating/manual/opennlp.html#tools.tokenizer.cmdline

On Wed, Feb 8, 2012 at 9:52 PM, James Kosin <james.ko...@gmail.com> wrote:
> On 2/8/2012 6:20 PM, Aliaksandr Autayeu wrote:
> > On Wed, Feb 8, 2012 at 3:02 PM, Jim - FooBar(); <jimpil1...@gmail.com> wrote:
> >
> >> Any chance you remember whether you tokenized the sentences *and
> >> pos-tagged the tokens* before feeding them to the maxent NER model? I'm
> >> asking because the docs say you *ONLY* need to tokenize sentences before
> >> sending
> > AFAIK, there was no POS tagging.
> >
> > Aliaksandr
> >
> Actually, the documentation should say that it needs to be run through
> the sentence detector and tokenizer before sending to the name finder.
>
> James
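
To make the ordering discussed above concrete, here is a rough sketch of the three stages: raw text is split into sentences, each sentence is tokenized, and only then are the tokens handed to the name finder, with no POS tagging in between. Everything in it is an assumption for illustration: the class NameFinderPipelineSketch and its methods are hypothetical stand-ins (a regex splitter, a punctuation-aware tokenizer, and a toy lookup), not the OpenNLP API. In OpenNLP itself these roles would, as far as I know, be played by SentenceDetectorME, TokenizerME and NameFinderME with their trained models.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in pipeline; none of these methods are OpenNLP calls.
class NameFinderPipelineSketch {

    // Stage 1 (sentence detector stand-in): split after '.', '!' or '?' followed by whitespace.
    static List<String> detectSentences(String rawText) {
        List<String> sentences = new ArrayList<>();
        for (String s : rawText.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    // Stage 2 (tokenizer stand-in): detach punctuation from words, then split on whitespace.
    static String[] tokenize(String sentence) {
        return sentence.replaceAll("([.,!?])", " $1 ").trim().split("\\s+");
    }

    // Stage 3 (name finder stand-in): a toy lookup against a set of known names.
    // A real name finder would use a trained model instead of a lookup.
    static List<String> findNames(String[] tokens, Set<String> knownNames) {
        List<String> names = new ArrayList<>();
        for (String token : tokens) {
            if (knownNames.contains(token)) {
                names.add(token);
            }
        }
        return names;
    }

    // Full pipeline: sentences first, then tokens, then names. No POS tagging step.
    static List<String> run(String rawText, Set<String> knownNames) {
        List<String> names = new ArrayList<>();
        for (String sentence : detectSentences(rawText)) {
            String[] tokens = tokenize(sentence);
            names.addAll(findNames(tokens, knownNames));
        }
        return names;
    }

    public static void main(String[] args) {
        String text = "Pierre joined the board. James replied on Wednesday.";
        System.out.println(run(text, Set.of("Pierre", "James")));
    }
}
```

The point of the sketch is only the order of the calls in run(): the name finder stage never sees raw text, only the token array for one sentence at a time.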