Re: ConceptMApper

Michael Tanenblatt Wed, 20 Mar 2013 04:26:36 -0700

I have never seen this issue. Under no circumstances should anything less than 
the full dictionary entry be matched. The only causes I can think of are errors 
in the dictionary (though that's unlikely), issues with the tokenizer, or a 
bug. My guess is that the dictionary entry "FC Barcelona" is being tokenized 
such that only "FC" is annotated, so that is the only part that needs to 
match. You can check whether it is a tokenization issue by temporarily using 
the sample whitespace tokenizer that comes with ConceptMapper and seeing what 
results you get.
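
To make the suspected failure mode concrete, here is a toy sketch in plain Java. 
It does not use UIMA or ConceptMapper at all: the class TokenizationCheck and 
the methods whitespaceTokens, firstTokenOnly and containsContiguous are 
hypothetical names invented for this illustration, and firstTokenOnly only 
imitates the assumed fault (a dictionary tokenizer that yields just "FC"). It 
shows why a truncated dictionary entry would match "The FC scored today" while 
the full entry would not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration only; this is NOT how ConceptMapper is implemented.
final class TokenizationCheck {

    // Split on runs of whitespace, like a simple whitespace tokenizer.
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Hypothetical faulty tokenizer: keeps only the first token of the entry.
    static List<String> firstTokenOnly(String text) {
        List<String> all = whitespaceTokens(text);
        return all.isEmpty() ? all : all.subList(0, 1);
    }

    // True if entryTokens occur as a contiguous, case-insensitive run
    // somewhere inside textTokens.
    static boolean containsContiguous(List<String> textTokens,
                                      List<String> entryTokens) {
        if (entryTokens.isEmpty()) {
            return false;
        }
        for (int i = 0; i + entryTokens.size() <= textTokens.size(); i++) {
            boolean allEqual = true;
            for (int j = 0; j < entryTokens.size(); j++) {
                if (!textTokens.get(i + j).equalsIgnoreCase(entryTokens.get(j))) {
                    allEqual = false;
                    break;
                }
            }
            if (allEqual) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> text = whitespaceTokens("The FC scored today");
        // Entry tokenized correctly as [FC, Barcelona]
        System.out.println(containsContiguous(text, whitespaceTokens("FC Barcelona")));
        // Entry truncated to [FC] by the faulty tokenizer
        System.out.println(containsContiguous(text, firstTokenOnly("FC Barcelona")));
    }
}
```

If the real dictionary tokenizer behaves like firstTokenOnly here, the entry 
effectively becomes the single token "FC", which would explain the annotation 
you are seeing.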



On Mar 20, 2013, at 7:09 AM, Andreas Niekler 
<[email protected]> wrote:

> Hello,
> 
> I am trying to use ConceptMapper to annotate multi-word units in German. The
> problem is that individual tokens of a dictionary entry are matched on their
> own, for example:
> 
> In the dict -> FC Barcelona
> 
> In the text "The FC scored today", "FC" is annotated as a DictEntry.
> 
> Why does ConceptMapper annotate this? Here are my parameters:
> 
> AnalysisEngineDescription mapper =
>     AnalysisEngineFactory.createPrimitiveDescription(
>         ConceptMapper.class,
>         ts,
>         ConceptMapper.PARAM_ANNOTATION_NAME, "org.apache.uima.conceptMapper.DictTerm",
>         ConceptMapper.PARAM_ENCLOSINGSPAN, "enclosingSpan",
>         ConceptMapper.PARAM_TOKENANNOTATION, "opennlp.uima.Token",
>         ConceptMapper.PARAM_ATTRIBUTE_LIST, new String[] {"canonical"},
>         ConceptMapper.PARAM_FEATURE_LIST, new String[] {"DictCanon"},
>         ConceptMapper.PARAM_MATCHEDFEATURE, "matchedText",
>         ConceptMapper.PARAM_TOKENIZERDESCRIPTOR, "TokenizerDE.xml",
>         // ConceptMapper.PARAM_DATA_BLOCK_FS, "uima.tcas.DocumentAnnotation",
>         ConceptMapper.PARAM_DATA_BLOCK_FS, "opennlp.uima.Sentence",
>         ConceptMapper.PARAM_SEARCHSTRATEGY, "ContiguousMatch",
>         ConceptMapper.PARAM_MATCHEDTOKENSFEATURENAME, "matchedTokens",
>         TokenNormalizer.PARAM_CASE_MATCH, "ignoreall");
> 
> Thank you
> 
> Andreas
