After a while I figured out that the output of the pretrained tokenizer
causes this problem.
If "Mr. Vinken" is tokenized into the 3 tokens "Mr", ".", "Vinken"
instead of the 2 tokens "Mr.", "Vinken", the Name Finder works perfectly.
It seems that the SimpleTokenizer works better than the pretrained
tokenizer in cases like this.
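To illustrate why the two tokenizers disagree on "Mr.", here is a minimal,
stdlib-only sketch of a character-class tokenizer. It starts a new token
whenever the character class (whitespace, letter, digit, other) changes,
and it also splits adjacent punctuation characters that differ. This is an
approximation written for illustration; the class name CharClassTokenizer
is hypothetical and this is NOT OpenNLP's source. In real code you would
call opennlp.tools.tokenize.SimpleTokenizer.INSTANCE.tokenize(text), which
returns a String[].

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not OpenNLP code: split text wherever the
// character class changes, the general idea behind a rule-based
// tokenizer such as SimpleTokenizer.
public class CharClassTokenizer {

    private enum CharClass { WHITESPACE, ALPHABETIC, NUMERIC, OTHER }

    private static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.ALPHABETIC;
        if (Character.isDigit(c)) return CharClass.NUMERIC;
        return CharClass.OTHER;
    }

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1;                       // start of current token, -1 if none
        CharClass prevClass = CharClass.WHITESPACE;
        char prevChar = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            CharClass cls = classify(c);
            // A token ends when the class changes, or when a punctuation
            // character differs from the punctuation character before it
            // (so "..." stays one token but "?!" becomes two).
            boolean boundary = cls != prevClass
                    || (cls == CharClass.OTHER && c != prevChar);
            if (boundary && start >= 0) {
                tokens.add(text.substring(start, i));
                start = -1;
            }
            // Whitespace never starts a token; anything else does.
            if (cls != CharClass.WHITESPACE && start < 0) {
                start = i;
            }
            prevClass = cls;
            prevChar = c;
        }
        if (start >= 0) {
            tokens.add(text.substring(start));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Mr, ., Vinken, is, chairman, of, Elsevier, N, ., V, .]
        System.out.println(tokenize("Mr. Vinken is chairman of Elsevier N.V."));
    }
}
```

Note that the same rule that separates "Mr" from "." also breaks "N.V."
into four tokens, so a character-class tokenizer is not better everywhere.
Presumably the name finder does best when its input is tokenized the same
way as the data the model was trained on.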

May I ask how to use the optional parameters of
opennlp.uima.namefind.NameFinder: opennlp.uima.ProbabilityFeature,
opennlp.uima.BeamSize, and opennlp.uima.DocumentConfidenceType?
I'm sorry for asking these kinds of questions. I only recently started
using OpenNLP, and there is almost no documentation for OpenNLP UIMA.

On Tue, Jul 17, 2012 at 2:38 PM, Chi Dat Nguyen
<[email protected]> wrote:
> Hi,
>
> I have a question: how does OpenNLP build the provided tool for the NER task?
>
> I followed the example in the manual, but my results are not as good as
> those of the provided tool. Below is my code:
>
>         import java.io.FileInputStream;
>         import java.io.InputStream;
>         import opennlp.tools.namefind.NameFinderME;
>         import opennlp.tools.namefind.TokenNameFinderModel;
>         import opennlp.tools.util.Span;
>
>         InputStream modelIn =
>             new FileInputStream("resources/opennlpModels/en-ner-person.bin");
>         TokenNameFinderModel model = new TokenNameFinderModel(modelIn);
>         modelIn.close();  // the model is fully loaded; release the file
>         NameFinderME nameFinder = new NameFinderME(model);
>
>         String[] s1 = new String[] {"Pierre", "Vinken", ",", "61", "years",
>             "old", ",", "will", "join", "the", "board", "as", "a",
>             "nonexecutive", "director", "Nov.", "29", "."};
>         Span[] nameSpans1 = nameFinder.find(s1);
>
>         String[] s2 = new String[] {"Mr.", "Vinken", "is", "chairman", "of",
>             "Elsevier", "N.V.", ",", "the", "Dutch", "publishing", "group", "."};
>         Span[] nameSpans2 = nameFinder.find(s2);
>
> This code detects only "Pierre Vinken" in the first sentence and misses
> "Vinken" in the second sentence. However, the provided tool detects both
> entities with the same model.
> Is there any parameter tuning or configuration setting that I am not
> aware of?
>
> Thank you.
