After a while I figured out that the output of the pretrained tokenizer causes this problem. If "Mr. Vinken" is split into three tokens ("Mr", ".", "Vinken") instead of two ("Mr.", "Vinken"), the Name Finder works perfectly. It seems that the SimpleTokenizer works better than the pretrained tokenizer in these cases.
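To illustrate the difference in tokenization, here is a minimal plain-Java sketch of a character-class tokenizer. It only approximates what SimpleTokenizer does and is not OpenNLP's code: the class name CharClassTokenizer and its tokenize method are made up for this example. It starts a new token whenever the character class (letter, digit, other) changes, so "Mr." becomes "Mr" and ".".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: splits text wherever the
// character class changes, roughly the way a simple tokenizer would.
class CharClassTokenizer {

    private enum CharClass { LETTER, DIGIT, WHITESPACE, OTHER }

    private static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        return CharClass.OTHER;
    }

    static String[] tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1;                       // start index of the open token, -1 if none
        CharClass previous = CharClass.WHITESPACE;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            CharClass current = classify(c);
            if (current == CharClass.WHITESPACE) {
                // Whitespace closes the open token and is never emitted.
                if (start >= 0) {
                    tokens.add(text.substring(start, i));
                    start = -1;
                }
            } else if (start < 0) {
                start = i;                    // first character of a new token
            } else if (current != previous
                    || (current == CharClass.OTHER && c != text.charAt(i - 1))) {
                // Class changed (e.g. letter -> punctuation), or two different
                // punctuation characters are adjacent: close and reopen.
                tokens.add(text.substring(start, i));
                start = i;
            }
            previous = current;
        }
        if (start >= 0) tokens.add(text.substring(start));
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Print one token per line for the problematic sentence start.
        for (String token : tokenize("Mr. Vinken is chairman")) {
            System.out.println(token);
        }
    }
}
```

The resulting array can be passed to nameFinder.find(...) in place of the hand-written token arrays in the quoted code below. Note that this sketch also splits abbreviations such as "N.V." into separate letter and period tokens.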
May I ask how to use the optional parameters of opennlp.uima.namefind.NameFinder: opennlp.uima.ProbabilityFeature, opennlp.uima.BeamSize, and opennlp.uima.DocumentConfidenceType? I'm sorry for asking these kinds of questions. I only started using OpenNLP recently, and there is almost no documentation for OpenNLP UIMA.

On Tue, Jul 17, 2012 at 2:38 PM, Chi Dat Nguyen <[email protected]> wrote:
> Hi,
>
> I has a question on how OpenNLP develops the given tool for the NER task?
>
> I follow the example in the manual, but the result is not as good as
> the given tool. Below is my code:
>
> InputStream modelIn = new
> FileInputStream("resources/opennlpModels/en-ner-person.bin");
> TokenNameFinderModel model = new TokenNameFinderModel(modelIn);
> NameFinderME nameFinder = new NameFinderME(model);
>
> String s1[] = new String[] {"Pierre", "Vinken", ",", "61", "years",
> "old", ",", "will", "join", "the", "board", "as", "a", "nonexecutive",
> "director", "Nov.", "29", "."};
> Span nameSpans1[] = nameFinder.find(s1);
>
> String s2[] = new String[] {"Mr.", "Vinken", "is", "chairman", "of",
> "Elsevier", "N.V.", ",", "the", "Dutch", "publishing", "group", "."};
> Span nameSpans2[] = nameFinder.find(s2);
>
> This code can only detect "Pierre Vinken" in the first sentence, but
> cannot detect "Vinken" in the second sentence. However, the given tool
> can detect both of the entities with the same input model.
> Is there any parameter tuning or settings configuration that I am not aware
> of?
>
> Thank you.
