This is what my dictionary looks like:

<?xml version="1.0" encoding="UTF-8" ?>
<synonym>
<token canonical="mwu" SemClass="mwu">
<variant base="1. FC Straubing"/>
<variant base="1. FC Styrum"/>
<variant base="1. FC Tatran Presov"/>
<variant base="1. FC Tatran Prešov"/>
<variant base="1. FC Trogen"/>
<variant base="1. FC Union"/>
<variant base="1. FC Union Berlin"/>
<variant base="1. FC Union Solingen"/>
<variant base="1. FC Viersen"/>
<variant base="1. FC Viersen 05"/>
<variant base="1. FC Vöcklabruck"/>
<variant base="1. FC Weißenfels"/>
<variant base="1. FC Wernigerode"/>
<variant base="1. FC Wilmersdorf"/>
<variant base="1. FC Windeck"/>
<variant base="1. FC Wolfsburg"/>
<variant base="1. FC Wunstorf"/>
<variant base="1. FC Zeitz"/>
<variant base="1. FFC"/>
<variant base="1. FFC 08 Niederkirchen"/>
<variant base="1. FFC Fortuna Dresden-Rähnitz"/>
<variant base="1. FFC Frankfurt"/>
<variant base="1. FFC Montabaur"/>
</token>
</synonym>
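Before switching tokenizers, it may help to see what a plain whitespace split does to these multi-word entries. The following is a minimal sketch in plain Java that assumes simple whitespace splitting. The class `DictTokenSketch` and its `tokenize` method are hypothetical names. The sketch does not call ConceptMapper or the OpenNLP tokenizer configured through `TokenizerDE.xml`, and that tokenizer may split the entries differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of ConceptMapper: splits a dictionary
// "variant base" string on whitespace, the way a simple whitespace
// tokenizer would.
public class DictTokenSketch {

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            // split() on an empty/blank string yields one empty element; skip it
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A few entries copied from the dictionary above.
        List<String> variants = Arrays.asList(
            "1. FC Union Berlin",
            "1. FC Viersen 05",
            "1. FFC Montabaur");
        for (String v : variants) {
            System.out.println(v + " -> " + tokenize(v));
        }
    }
}
```

Each entry should come out as several tokens, for example `1. FC Union Berlin -> [1., FC, Union, Berlin]`. Following Michael's explanation in the reply below, if the configured tokenizer yields fewer tokens for an entry, then only those tokens are left to match.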

On 20.03.2013 12:26, Michael Tanenblatt wrote:
> I have never seen this issue; under no circumstances should anything less
> than the full dictionary entry be matched. The only causes I can think of are
> errors in the dictionary (though that's unlikely), issues with the
> tokenizer, or a bug. My guess is that the dictionary entry "FC Barcelona" is
> being tokenized such that only "FC" is annotated, so that is the only
> part that needs to match. You can check whether it is a tokenization issue by
> temporarily using the sample whitespace tokenizer that comes with ConceptMapper
> and seeing what results you get.
> 
> 
> On Mar 20, 2013, at 7:09 AM, Andreas Niekler wrote:
> 
>> Hello,
>>
>> I am trying to use ConceptMapper to annotate multi-word units in German. The
>> problem is that individual tokens of dictionary entries are matched on their
>> own, like this:
>>
>> In the dictionary: FC Barcelona
>>
>> In the text "The FC scored today", "FC" is annotated as a DictEntry.
>>
>> Why does ConceptMapper annotate this? Here are my parameters:
>>
>> AnalysisEngineDescription mapper =
>>     AnalysisEngineFactory.createPrimitiveDescription(
>>         ConceptMapper.class,
>>         ts,
>>         ConceptMapper.PARAM_ANNOTATION_NAME, "org.apache.uima.conceptMapper.DictTerm",
>>         ConceptMapper.PARAM_ENCLOSINGSPAN, "enclosingSpan",
>>         ConceptMapper.PARAM_TOKENANNOTATION, "opennlp.uima.Token",
>>         ConceptMapper.PARAM_ATTRIBUTE_LIST, new String[] {"canonical"},
>>         ConceptMapper.PARAM_FEATURE_LIST, new String[] {"DictCanon"},
>>         ConceptMapper.PARAM_MATCHEDFEATURE, "matchedText",
>>         ConceptMapper.PARAM_TOKENIZERDESCRIPTOR, "TokenizerDE.xml",
>>         //ConceptMapper.PARAM_DATA_BLOCK_FS, "uima.tcas.DocumentAnnotation",
>>         ConceptMapper.PARAM_DATA_BLOCK_FS, "opennlp.uima.Sentence",
>>         ConceptMapper.PARAM_SEARCHSTRATEGY, "ContiguousMatch",
>>         ConceptMapper.PARAM_MATCHEDTOKENSFEATURENAME, "matchedTokens",
>>         TokenNormalizer.PARAM_CASE_MATCH, "ignoreall");
>>
>> Thank you
>>
>> Andreas
> 
> 
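Michael's point that nothing less than the full dictionary entry should match can be illustrated with a simplified contiguous matcher. This is a hypothetical sketch and not ConceptMapper's implementation of the `ContiguousMatch` search strategy. The class `ContiguousMatchSketch` and the method `findContiguous` are illustrative names. The case-insensitive comparison only loosely mirrors `TokenNormalizer.PARAM_CASE_MATCH` set to `"ignoreall"`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not ConceptMapper's code: an entry matches only
// if ALL of its tokens occur contiguously, in order, in the text tokens.
public class ContiguousMatchSketch {

    // Returns the start index of the first contiguous, case-insensitive
    // match of entryTokens within textTokens, or -1 if there is none.
    public static int findContiguous(List<String> textTokens, List<String> entryTokens) {
        if (entryTokens.isEmpty()) {
            return -1; // an empty entry never matches
        }
        int last = textTokens.size() - entryTokens.size();
        for (int start = 0; start <= last; start++) {
            boolean allMatch = true;
            for (int i = 0; i < entryTokens.size(); i++) {
                if (!textTokens.get(start + i).equalsIgnoreCase(entryTokens.get(i))) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                return start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> text = Arrays.asList("The", "FC", "scored", "today");
        // Entry tokenized into two tokens: "Barcelona" is absent, so no match.
        System.out.println(findContiguous(text, Arrays.asList("FC", "Barcelona")));
        // Entry whose tokenization kept only "FC": matches at token index 1.
        System.out.println(findContiguous(text, Arrays.asList("FC")));
    }
}
```

When the entry is tokenized as `[FC, Barcelona]`, "The FC scored today" yields no match (`-1`). Only an entry whose tokenization was reduced to `[FC]` matches, at token index 1, which is the symptom described in the thread.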

-- 
Andreas Niekler, Dipl. Ing. (FH)
NLP Group | Department of Computer Science
University of Leipzig
Johannisgasse 26 | 04103 Leipzig