SemClass feature not working in ConceptMapper add-on

Kothuvatiparambil, Viju Sun, 20 Apr 2014 13:11:25 -0700

Hi All, 

I am trying to use the ConceptMapper add on to assign a SemClass feature to 
tokens. I am getting the following error:


SEVERE: ConceptMapper SEVERE: FeatureList[1] 'SemClass' specified, but does not 
exist for type: org.apache.uima.conceptMapper.DictTerm

I configured FeatureList and AttributeList in ConceptMapperOffsetTokenizer.xml 
as given below:

                        <nameValuePair>
                                <name>AttributeList</name>
                                <value>
                                        <array>
                                                <string>canonical</string>
                                                <string>SemClass</string>
                                        </array>
                                </value>
                        </nameValuePair>
                        <nameValuePair>
                                <name>FeatureList</name>
                                <value>
                                        <array>
                                                <string>DictCanon</string>
                                                <string>SemClass</string>
                                        </array>
                                </value>
                        </nameValuePair>
                        <nameValuePair>
                                <name>ResultingAnnotationName</name>
                                <value>
                                        <string>
                                                
org.apache.uima.conceptMapper.DictTerm
                                        </string>
                                </value>
                        </nameValuePair>

Here is my simplified dict.xml file

<synonym>
  <token canonical="grocery" SemClass="category">
     <variant base="grocery"/>
  </token>
</synonym>

I debugged the problem and found that it is looking for the SemClass feature in 
resultAnnotationType which DictTerm. But actually, the SemClass is not a 
feature in DictTerm type.

      resultEnclosingSpan = 
resultAnnotationType.getFeatureByBaseName(resultEnclosingSpanName);
      if (resultEnclosingSpan == null) {
        logger.logError(PARAM_ENCLOSINGSPAN + " '" + resultEnclosingSpanName
                + "' specified, but does not exist for type: " + 
resultAnnotationType.getName());
        throw new AnnotatorInitializationException();
      }

I just started using UIMA, so I don't understand the complete architecture yet. 
Could any of you point me to the right direction ?  Thanks a lot in advance.

Viju Kothuvatiparambil

Here is the complete ConceptMapperOffsetTokenizer.xml file contents:

<taeDescription xmlns="http://uima.apache.org/resourceSpecifier";>
        <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
        <primitive>true</primitive>
        
<annotatorImplementationName>org.apache.uima.conceptMapper.ConceptMapper</annotatorImplementationName>
        <analysisEngineMetaData>
                <name>ConceptMapper</name>
                <description></description>
                <version>1</version>
                <vendor></vendor>
                <configurationParameters>
                        <configurationParameter>
                                <name>caseMatch</name>
                                <description>
                                        this parameter specifies the case 
folding mode:
                                        ignoreall - fold everything to 
lowercase for
                                        matching insensitive - fold only tokens 
with initial
                                        caps to lowercase digitfold - fold all 
(and only)
                                        tokens with a digit sensitive - perform 
no case
                                        folding
                                </description>
                                <type>String</type>
                                <multiValued>false</multiValued>
                                <mandatory>true</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                <name>Stemmer</name>
                                <description>
                                        Name of stemmer class to use before 
matching. MUST
                                        have a zero-parameter constructor! If 
not specified,
                                        no stemming will be performed.
                                </description>
                                <type>String</type>
                                <multiValued>false</multiValued>
                                <mandatory>false</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                <name>ResultingAnnotationName</name>
                                <description>
                                        Name of the annotation type created by 
this TAE,
                                        must match the typeSystemDescription 
entry
                                </description>
                                <type>String</type>
                                <multiValued>false</multiValued>
                                <mandatory>true</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                <name>ResultingEnclosingSpanName</name>
                                <description>
                                        Name of the feature in the 
resultingAnnotation to
                                        contain the span that encloses it (i.e. 
its
                                        sentence)
                                </description>
                                <type>String</type>
                                <multiValued>false</multiValued>
                                <mandatory>false</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                <name>AttributeList</name>
                                <description>
                                        List of attribute names for XML 
dictionary entry
                                        record - must correspond to FeatureList
                                </description>
                                <type>String</type>
                                <multiValued>true</multiValued>
                                <mandatory>true</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                <name>FeatureList</name>
                                <description>
                                        List of feature names for CAS 
annotation - must
                                        correspond to AttributeList
                                </description>
                                <type>String</type>
                                <multiValued>true</multiValued>
                                <mandatory>true</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                <name>TokenAnnotation</name>
                                <description></description>
                                <type>String</type>
                                <multiValued>false</multiValued>
                                <mandatory>true</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                <name>TokenClassFeatureName</name>
                                <description>
                                        Name of feature used when doing lookups 
against
                                        IncludedTokenClasses and 
ExcludedTokenClasses
                                </description>
                                <type>String</type>
                                <multiValued>false</multiValued>
                                <mandatory>false</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                <name>TokenTextFeatureName</name>
                                <description></description>
                                <type>String</type>
                                <multiValued>false</multiValued>
                                <mandatory>false</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                <name>SpanFeatureStructure</name>
                                <description>
                                        Type of annotation which corresponds to 
spans of
                                        data for processing (e.g. a Sentence)
                                </description>
                                <type>String</type>
                                <multiValued>false</multiValued>
                                <mandatory>true</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                <name>OrderIndependentLookup</name>
                                <description>
                                        True if should ignore element order 
during lookup
                                        (i.e., "top box" would equal "box 
top"). Default is
                                        False.
                                </description>
                                <type>Boolean</type>
                                <multiValued>false</multiValued>
                                <mandatory>false</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                <name>TokenTypeFeatureName</name>
                                <description>
                                        Name of feature used when doing lookups 
against
                                        IncludedTokenTypes and 
ExcludedTokenTypes
                                </description>
                                <type>String</type>
                                <multiValued>false</multiValued>
                                <mandatory>false</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                <name>IncludedTokenTypes</name>
                                <description>
                                        Type of tokens to include in lookups 
(if not
                                        supplied, then all types are included 
except those
                                        specifically mentioned in 
ExcludedTokenTypes)
                                </description>
                                <type>Integer</type>
                                <multiValued>true</multiValued>
                                <mandatory>false</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                <name>ExcludedTokenTypes</name>
                                <description></description>
                                <type>Integer</type>
                                <multiValued>true</multiValued>
                                <mandatory>false</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                <name>ExcludedTokenClasses</name>
                                <description>
                                        Class of tokens to exclude from lookups 
(if not
                                        supplied, then all classes are excluded 
except those
                                        specifically mentioned in 
IncludedTokenClasses,
                                        unless IncludedTokenClasses is not 
supplied, in
                                        which case none are excluded)
                                </description>
                                <type>String</type>
                                <multiValued>true</multiValued>
                                <mandatory>false</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                <name>IncludedTokenClasses</name>
                                <description>
                                        Class of tokens to include in lookups 
(if not
                                        supplied, then all classes are included 
except those
                                        specifically mentioned in 
ExcludedTokenClasses)
                                </description>
                                <type>String</type>
                                <multiValued>true</multiValued>
                                <mandatory>false</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                <name>TokenClassWriteBackFeatureNames</name>
                                <description>
                                        names of features that should be 
written back to a
                                        token, such as a POS tag
                                </description>
                                <type>String</type>
                                <multiValued>true</multiValued>
                                <mandatory>false</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                
<name>ResultingAnnotationMatchedTextFeature</name>
                                <type>String</type>
                                <multiValued>false</multiValued>
                                <mandatory>false</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                <name>PrintDictionary</name>
                                <type>Boolean</type>
                                <multiValued>false</multiValued>
                                <mandatory>false</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                <name>SearchStrategy</name>
                                <description>
                                        Can be either "SkipAnyMatch",
                                        "SkipAnyMatchAllowOverlap" or
                                        
"ContiguousMatch"&#13;&#13;ContiguousMatch: longest
                                        match of contiguous tokens within 
enclosing
                                        span(taking into account 
included/excluded items).
                                        DEFAULT strategy &#13;SkipAnyMatch: 
longest match of
                                        not-necessarily contiguous tokens 
within enclosing
                                        span (taking into account 
included/excluded items).
                                        Subsequent lookups begin in span after 
complete
                                        match. IMPLIES order-independent lookup
                                        &#13;SkipAnyMatchAllowOverlap: longest 
match of
                                        not-necessarily contiguous tokens 
within enclosing
                                        span (taking into account 
included/excluded items).
                                        Subsequent lookups begin in span after 
next token.
                                        IMPLIES order-independent lookup
                                </description>
                                <type>String</type>
                                <multiValued>false</multiValued>
                                <mandatory>false</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                <name>StopWords</name>
                                <type>String</type>
                                <multiValued>true</multiValued>
                                <mandatory>false</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                <name>FindAllMatches</name>
                                <type>Boolean</type>
                                <multiValued>false</multiValued>
                                <mandatory>false</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                <name>MatchedTokensFeatureName</name>
                                <type>String</type>
                                <multiValued>false</multiValued>
                                <mandatory>false</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                <name>ReplaceCommaWithAND</name>
                                <type>Boolean</type>
                                <multiValued>false</multiValued>
                                <mandatory>false</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                <name>TokenizerDescriptorPath</name>
                                <type>String</type>
                                <multiValued>false</multiValued>
                                <mandatory>true</mandatory>
                        </configurationParameter>
                        <configurationParameter>
                                <name>LanguageID</name>
                                <type>String</type>
                                <multiValued>false</multiValued>
                                <mandatory>false</mandatory>
                        </configurationParameter>
                </configurationParameters>
                <configurationParameterSettings>
                        <nameValuePair>
                                <name>caseMatch</name>
                                <value>
                                        <string>ignoreall</string>
                                </value>
                        </nameValuePair>
                        <nameValuePair>
                                <name>AttributeList</name>
                                <value>
                                        <array>
                                                <string>canonical</string>
                                                <string>SemClass</string>
                                        </array>
                                </value>
                        </nameValuePair>
                        <nameValuePair>
                                <name>FeatureList</name>
                                <value>
                                        <array>
                                                <string>DictCanon</string>
                                                <string>SemClass</string>
                                        </array>
                                </value>
                        </nameValuePair>
                        <nameValuePair>
                                <name>TokenAnnotation</name>
                                <value>
                                        <string>uima.tt.TokenAnnotation</string>
                                </value>
                        </nameValuePair>
                        <nameValuePair>
                                <name>ResultingAnnotationName</name>
                                <value>
                                        <string>
                                                
org.apache.uima.conceptMapper.DictTerm
                                        </string>
                                </value>
                        </nameValuePair>
                        <nameValuePair>
                                <name>SpanFeatureStructure</name>
                                <value>
                                        
<string>uima.tcas.DocumentAnnotation</string>
                                </value>
                        </nameValuePair>
                        <nameValuePair>
                                <name>OrderIndependentLookup</name>
                                <value>
                                        <boolean>false</boolean>
                                </value>
                        </nameValuePair>
                        <nameValuePair>
                                <name>TokenClassWriteBackFeatureNames</name>
                                <value>
                                        <array />
                                </value>
                        </nameValuePair>
                        <nameValuePair>
                                <name>IncludedTokenClasses</name>
                                <value>
                                        <array />
                                </value>
                        </nameValuePair>
                        <nameValuePair>
                                <name>PrintDictionary</name>
                                <value>
                                        <boolean>false</boolean>
                                </value>
                        </nameValuePair>
                        <nameValuePair>
                                <name>FindAllMatches</name>
                                <value>
                                        <boolean>false</boolean>
                                </value>
                        </nameValuePair>
                        <nameValuePair>
                                <name>StopWords</name>
                                <value>
                                        <array />
                                </value>
                        </nameValuePair>
                        <nameValuePair>
                                <name>ReplaceCommaWithAND</name>
                                <value>
                                        <boolean>false</boolean>
                                </value>
                        </nameValuePair>
                        <nameValuePair>
                                <name>TokenizerDescriptorPath</name>
                                <value>
                                        <string>
                                                
/search/uima/conf/descriptors/OffsetTokenizer.xml
                                        </string>
                                </value>
                        </nameValuePair>
                        <nameValuePair>
                                <name>ResultingEnclosingSpanName</name>
                                <value>
                                        <string>enclosingSpan</string>
                                </value>
                        </nameValuePair>
                        <nameValuePair>
                                <name>MatchedTokensFeatureName</name>
                                <value>
                                        <string>matchedTokens</string>
                                </value>
                        </nameValuePair>
                        <nameValuePair>
                                
<name>ResultingAnnotationMatchedTextFeature</name>
                                <value>
                                        <string>matchedText</string>
                                </value>
                        </nameValuePair>
                        <nameValuePair>
                                <name>SearchStrategy</name>
                                <value>
                                        <string>ContiguousMatch</string>
                                </value>
                        </nameValuePair>
                        <nameValuePair>
                                <name>LanguageID</name>
                                <value>
                                        <string>en</string>
                                </value>
                        </nameValuePair>
                </configurationParameterSettings>
                <typeSystemDescription>
                        <imports>
                                <import 
name="org.apache.uima.conceptMapper.DictTerm" />
                                <import
                                        
name="org.apache.uima.conceptMapper.support.tokenizer.TokenAnnotation" />
                        </imports>
                        <types>
                                <typeDescription>
                                        <name>uima.tt.TokenAnnotation</name>
                                        <description></description>
                                        
<supertypeName>uima.tcas.Annotation</supertypeName>
                                        <features>
                                                <featureDescription>
                                                        <name>SemClass</name>
                                                        <description>
                                                                semantic class 
of token
                                                        </description>
                                                        <rangeTypeName>
                                                                uima.cas.String
                                                        </rangeTypeName>
                                                </featureDescription>
                                                <featureDescription>
                                                        <name>POS</name>
                                                        <description>
                                                                Part of SPeech 
of term to which this
                                                                token is a part
                                                        </description>
                                                        <rangeTypeName>
                                                                uima.cas.String
                                                        </rangeTypeName>
                                                </featureDescription>
                                                <featureDescription>
                                                        
<name>frost_TokenType</name>
                                                        
<description></description>
                                                        <rangeTypeName>
                                                                uima.cas.Integer
                                                        </rangeTypeName>
                                                </featureDescription>
                                        </features>
                                </typeDescription>
                        </types>
                </typeSystemDescription>
                <typePriorities>
                        <priorityList>
                                <!-- <type>uima.tt.SentenceAnnotation</type> -->
                                <type>uima.tt.TokenAnnotation</type>
                        </priorityList>
                </typePriorities>
                <fsIndexCollection />
                <capabilities>
                        <capability>
                                <inputs>
                                        <type allAnnotatorFeatures="true">
                                                uima.tt.TokenAnnotation
                                        </type>
                                        <!-- <type 
allAnnotatorFeatures="true">uima.tt.SentenceAnnotation</type>
                                                <type 
allAnnotatorFeatures="true">uima.tt.ParagraphAnnotation</type> -->
                                </inputs>
                                <outputs>
                                        <type allAnnotatorFeatures="true">
                                                
org.apache.uima.conceptMapper.DictTerm
                                        </type>
                                        <type allAnnotatorFeatures="true">
                                                uima.tt.TokenAnnotation
                                        </type>
                                        <type allAnnotatorFeatures="true">
                                                
org.apache.uima.conceptMapper.support.tokenizer.TokenAnnotation
                                        </type>
                                        <type allAnnotatorFeatures="true">
                                                uima.tcas.DocumentAnnotation
                                        </type>
                                </outputs>
                                <languagesSupported />
                        </capability>
                </capabilities>
                <operationalProperties>
                        <modifiesCas>true</modifiesCas>
                        
<multipleDeploymentAllowed>true</multipleDeploymentAllowed>
                        <outputsNewCASes>false</outputsNewCASes>
                </operationalProperties>
        </analysisEngineMetaData>
        <externalResourceDependencies>
                <externalResourceDependency>
                        <key>DictionaryFile</key>
                        <description>dictionary file loader.</description>
                        <interfaceName>
                                
org.apache.uima.conceptMapper.support.dictionaryResource.DictionaryResource
                        </interfaceName>
                        <optional>false</optional>
                </externalResourceDependency>
        </externalResourceDependencies>
        <resourceManagerConfiguration>
                <externalResources>
                        <externalResource>
                                <name>DictionaryFileName</name>
                                <description>
                                        A file containing the dictionary. 
Modify this URL to
                                        use a different dictionary.
                                </description>
                                <fileResourceSpecifier>
                                        
<fileUrl>file:/search/uima/conf/testDict.xml</fileUrl>
                                </fileResourceSpecifier>
                                <implementationName>
                                        
org.apache.uima.conceptMapper.support.dictionaryResource.DictionaryResource_impl
                                </implementationName>
                        </externalResource>
                </externalResources>
                <externalResourceBindings>
                        <externalResourceBinding>
                                <key>DictionaryFile</key>
                                <resourceName>DictionaryFileName</resourceName>
                        </externalResourceBinding>
                </externalResourceBindings>
        </resourceManagerConfiguration>
</taeDescription>
[Kothuvatiparambil, Viju] 

----------------------------------------------------------------------
This message, and any attachments, is for the intended recipient(s) only, may 
contain information that is privileged, confidential and/or proprietary and 
subject to important terms and conditions available at 
http://www.bankofamerica.com/emaildisclaimer.   If you are not the intended 
recipient, please delete this message.

SemClass feature not working in ConceptMapper add-on

Reply via email to