Hello,

I am trying to use the ConceptMapper to annotate multi-word units in German.
My problem is that each individual token of a dictionary entry is matched on
its own, even when the rest of the entry is not present. For example:

Dictionary entry: FC Barcelona

In the text "The FC scored today", the single token "FC" is annotated as a
DictEntry.

Why does the ConceptMapper annotate "FC" here? These are my parameters:

AnalysisEngineDescription mapper =
    AnalysisEngineFactory.createPrimitiveDescription(
        ConceptMapper.class,
        ts,
        ConceptMapper.PARAM_ANNOTATION_NAME, "org.apache.uima.conceptMapper.DictTerm",
        ConceptMapper.PARAM_ENCLOSINGSPAN, "enclosingSpan",
        ConceptMapper.PARAM_TOKENANNOTATION, "opennlp.uima.Token",
        ConceptMapper.PARAM_ATTRIBUTE_LIST, new String[] {"canonical"},
        ConceptMapper.PARAM_FEATURE_LIST, new String[] {"DictCanon"},
        ConceptMapper.PARAM_MATCHEDFEATURE, "matchedText",
        ConceptMapper.PARAM_TOKENIZERDESCRIPTOR, "TokenizerDE.xml",
        //ConceptMapper.PARAM_DATA_BLOCK_FS, "uima.tcas.DocumentAnnotation",
        ConceptMapper.PARAM_DATA_BLOCK_FS, "opennlp.uima.Sentence",
        ConceptMapper.PARAM_SEARCHSTRATEGY, "ContiguousMatch",
        ConceptMapper.PARAM_MATCHEDTOKENSFEATURENAME, "matchedTokens",
        TokenNormalizer.PARAM_CASE_MATCH, "ignoreall");
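
To make clear what I expected from "ContiguousMatch", here is a minimal
plain-Java sketch. It is not ConceptMapper code and says nothing about how
the ConceptMapper works internally; the class and method names are made up
for illustration. It only shows the behaviour I assumed: an entry matches
when all of its tokens appear as one unbroken run, ignoring case.

```java
import java.util.Arrays;
import java.util.List;

public class ContiguousMatchSketch {

    // Hypothetical helper: returns true only if every token of the
    // dictionary entry occurs in the text as one unbroken run,
    // compared case-insensitively.
    static boolean containsEntry(List<String> textTokens, List<String> entryTokens) {
        if (entryTokens.isEmpty()) {
            return false;
        }
        for (int start = 0; start + entryTokens.size() <= textTokens.size(); start++) {
            boolean allMatch = true;
            for (int i = 0; i < entryTokens.size(); i++) {
                if (!textTokens.get(start + i).equalsIgnoreCase(entryTokens.get(i))) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> entry = Arrays.asList("FC", "Barcelona");

        // Only "FC" is present, so I would expect no match here.
        System.out.println(containsEntry(
                Arrays.asList("The", "FC", "scored", "today"), entry));

        // The whole entry is present, so I would expect a match here.
        System.out.println(containsEntry(
                Arrays.asList("FC", "Barcelona", "scored", "today"), entry));
    }
}
```

With this sketch the first call yields false and the second yields true,
which is the result I was hoping to get from the ConceptMapper as well.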

Thank you

Andreas
