Re: SemClass feature not working in ConceptMapper add-on

Michael Tanenblatt Mon, 21 Apr 2014 06:25:30 -0700

You are exactly correct in your analysis: by specifying those values for 
AttributeList and FeatureList, ConceptMapper is trying to write the value of 
the SemClass attribute in your dictionary entries to your resulting annotation, 
which appears to be DictTerm, and DictTerm does not appear to have the SemClass 
feature as it is currently defined. The solution is to extend the definition of 
the DictTerm type to include the feature SemClass (which should be a String).
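
As a rough sketch of that change (an assumption about how your type system is 
laid out, not a copy of the shipped descriptor), the feature would be added to 
org.apache.uima.conceptMapper.DictTerm in whichever type system descriptor 
defines that type. The supertype and the description text below are 
illustrative assumptions, and the existing features are elided and should stay 
as they are:

```xml
<typeDescription>
  <name>org.apache.uima.conceptMapper.DictTerm</name>
  <!-- assumption: keep whatever supertype your current definition uses -->
  <supertypeName>uima.tcas.Annotation</supertypeName>
  <features>
    <!-- keep the existing features (e.g. DictCanon, enclosingSpan,
         matchedText, matchedTokens) unchanged -->
    <featureDescription>
      <name>SemClass</name>
      <description>semantic class copied from the dictionary entry</description>
      <rangeTypeName>uima.cas.String</rangeTypeName>
    </featureDescription>
  </features>
</typeDescription>
```

In the descriptor you posted, SemClass is declared only on 
uima.tt.TokenAnnotation, which is why the lookup on DictTerm fails. If you use 
generated JCas classes for DictTerm, they would probably need to be regenerated 
after the change.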



On Apr 20, 2014, at 4:10 PM, Kothuvatiparambil, Viju 
<[email protected]> wrote:

> Hi All, 
> 
> I am trying to use the ConceptMapper add-on to assign a SemClass feature to 
> tokens. I am getting the following error:
> 
> SEVERE: ConceptMapper SEVERE: FeatureList[1] 'SemClass' specified, but does 
> not exist for type: org.apache.uima.conceptMapper.DictTerm
> 
> I configured FeatureList and AttributeList in 
> ConceptMapperOffsetTokenizer.xml as given below:
> 
>                       <nameValuePair>
>                               <name>AttributeList</name>
>                               <value>
>                                       <array>
>                                               <string>canonical</string>
>                                               <string>SemClass</string>
>                                       </array>
>                               </value>
>                       </nameValuePair>
>                       <nameValuePair>
>                               <name>FeatureList</name>
>                               <value>
>                                       <array>
>                                               <string>DictCanon</string>
>                                               <string>SemClass</string>
>                                       </array>
>                               </value>
>                       </nameValuePair>
>                       <nameValuePair>
>                               <name>ResultingAnnotationName</name>
>                               <value>
>                                       <string>
>                                               org.apache.uima.conceptMapper.DictTerm
>                                       </string>
>                               </value>
>                       </nameValuePair>
> 
> Here is my simplified dict.xml file:
> 
> <synonym>
>  <token canonical="grocery" SemClass="category">
>     <variant base="grocery"/>
>  </token>
> </synonym>
> 
> I debugged the problem and found that it is looking for the SemClass feature 
> in resultAnnotationType, which is DictTerm. But SemClass is not actually a 
> feature of the DictTerm type.
> 
>      resultEnclosingSpan = resultAnnotationType.getFeatureByBaseName(resultEnclosingSpanName);
>      if (resultEnclosingSpan == null) {
>        logger.logError(PARAM_ENCLOSINGSPAN + " '" + resultEnclosingSpanName
>                + "' specified, but does not exist for type: " + resultAnnotationType.getName());
>        throw new AnnotatorInitializationException();
>      }
> 
> I just started using UIMA, so I don't understand the complete architecture 
> yet. Could any of you point me in the right direction? Thanks a lot in 
> advance.
> 
> Viju Kothuvatiparambil
> 
> Here is the complete ConceptMapperOffsetTokenizer.xml file contents:
> 
> <taeDescription xmlns="http://uima.apache.org/resourceSpecifier">
>       <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
>       <primitive>true</primitive>
>       <annotatorImplementationName>org.apache.uima.conceptMapper.ConceptMapper</annotatorImplementationName>
>       <analysisEngineMetaData>
>               <name>ConceptMapper</name>
>               <description></description>
>               <version>1</version>
>               <vendor></vendor>
>               <configurationParameters>
>                       <configurationParameter>
>                               <name>caseMatch</name>
>                               <description>
>                                       this parameter specifies the case folding mode:
>                                       ignoreall - fold everything to lowercase for matching
>                                       insensitive - fold only tokens with initial caps to lowercase
>                                       digitfold - fold all (and only) tokens with a digit
>                                       sensitive - perform no case folding
>                               </description>
>                               <type>String</type>
>                               <multiValued>false</multiValued>
>                               <mandatory>true</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>Stemmer</name>
>                               <description>
>                                       Name of stemmer class to use before matching. MUST
>                                       have a zero-parameter constructor! If not specified,
>                                       no stemming will be performed.
>                               </description>
>                               <type>String</type>
>                               <multiValued>false</multiValued>
>                               <mandatory>false</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>ResultingAnnotationName</name>
>                               <description>
>                                       Name of the annotation type created by this TAE,
>                                       must match the typeSystemDescription entry
>                               </description>
>                               <type>String</type>
>                               <multiValued>false</multiValued>
>                               <mandatory>true</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>ResultingEnclosingSpanName</name>
>                               <description>
>                                       Name of the feature in the resultingAnnotation to
>                                       contain the span that encloses it (i.e. its
>                                       sentence)
>                               </description>
>                               <type>String</type>
>                               <multiValued>false</multiValued>
>                               <mandatory>false</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>AttributeList</name>
>                               <description>
>                                       List of attribute names for XML dictionary entry
>                                       record - must correspond to FeatureList
>                               </description>
>                               <type>String</type>
>                               <multiValued>true</multiValued>
>                               <mandatory>true</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>FeatureList</name>
>                               <description>
>                                       List of feature names for CAS annotation - must
>                                       correspond to AttributeList
>                               </description>
>                               <type>String</type>
>                               <multiValued>true</multiValued>
>                               <mandatory>true</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>TokenAnnotation</name>
>                               <description></description>
>                               <type>String</type>
>                               <multiValued>false</multiValued>
>                               <mandatory>true</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>TokenClassFeatureName</name>
>                               <description>
>                                       Name of feature used when doing lookups against
>                                       IncludedTokenClasses and ExcludedTokenClasses
>                               </description>
>                               <type>String</type>
>                               <multiValued>false</multiValued>
>                               <mandatory>false</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>TokenTextFeatureName</name>
>                               <description></description>
>                               <type>String</type>
>                               <multiValued>false</multiValued>
>                               <mandatory>false</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>SpanFeatureStructure</name>
>                               <description>
>                                       Type of annotation which corresponds to spans of
>                                       data for processing (e.g. a Sentence)
>                               </description>
>                               <type>String</type>
>                               <multiValued>false</multiValued>
>                               <mandatory>true</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>OrderIndependentLookup</name>
>                               <description>
>                                       True if should ignore element order during lookup
>                                       (i.e., "top box" would equal "box top"). Default is
>                                       False.
>                               </description>
>                               <type>Boolean</type>
>                               <multiValued>false</multiValued>
>                               <mandatory>false</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>TokenTypeFeatureName</name>
>                               <description>
>                                       Name of feature used when doing lookups against
>                                       IncludedTokenTypes and ExcludedTokenTypes
>                               </description>
>                               <type>String</type>
>                               <multiValued>false</multiValued>
>                               <mandatory>false</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>IncludedTokenTypes</name>
>                               <description>
>                                       Type of tokens to include in lookups (if not
>                                       supplied, then all types are included except those
>                                       specifically mentioned in ExcludedTokenTypes)
>                               </description>
>                               <type>Integer</type>
>                               <multiValued>true</multiValued>
>                               <mandatory>false</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>ExcludedTokenTypes</name>
>                               <description></description>
>                               <type>Integer</type>
>                               <multiValued>true</multiValued>
>                               <mandatory>false</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>ExcludedTokenClasses</name>
>                               <description>
>                                       Class of tokens to exclude from lookups (if not
>                                       supplied, then all classes are excluded except those
>                                       specifically mentioned in IncludedTokenClasses,
>                                       unless IncludedTokenClasses is not supplied, in
>                                       which case none are excluded)
>                               </description>
>                               <type>String</type>
>                               <multiValued>true</multiValued>
>                               <mandatory>false</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>IncludedTokenClasses</name>
>                               <description>
>                                       Class of tokens to include in lookups (if not
>                                       supplied, then all classes are included except those
>                                       specifically mentioned in ExcludedTokenClasses)
>                               </description>
>                               <type>String</type>
>                               <multiValued>true</multiValued>
>                               <mandatory>false</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>TokenClassWriteBackFeatureNames</name>
>                               <description>
>                                       names of features that should be written back to a
>                                       token, such as a POS tag
>                               </description>
>                               <type>String</type>
>                               <multiValued>true</multiValued>
>                               <mandatory>false</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>ResultingAnnotationMatchedTextFeature</name>
>                               <type>String</type>
>                               <multiValued>false</multiValued>
>                               <mandatory>false</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>PrintDictionary</name>
>                               <type>Boolean</type>
>                               <multiValued>false</multiValued>
>                               <mandatory>false</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>SearchStrategy</name>
>                               <description>
>                                       Can be either "SkipAnyMatch",
>                                       "SkipAnyMatchAllowOverlap" or
>                                       "ContiguousMatch"&#13;&#13;ContiguousMatch: longest
>                                       match of contiguous tokens within enclosing
>                                       span(taking into account included/excluded items).
>                                       DEFAULT strategy &#13;SkipAnyMatch: longest match of
>                                       not-necessarily contiguous tokens within enclosing
>                                       span (taking into account included/excluded items).
>                                       Subsequent lookups begin in span after complete
>                                       match. IMPLIES order-independent lookup
>                                       &#13;SkipAnyMatchAllowOverlap: longest match of
>                                       not-necessarily contiguous tokens within enclosing
>                                       span (taking into account included/excluded items).
>                                       Subsequent lookups begin in span after next token.
>                                       IMPLIES order-independent lookup
>                               </description>
>                               <type>String</type>
>                               <multiValued>false</multiValued>
>                               <mandatory>false</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>StopWords</name>
>                               <type>String</type>
>                               <multiValued>true</multiValued>
>                               <mandatory>false</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>FindAllMatches</name>
>                               <type>Boolean</type>
>                               <multiValued>false</multiValued>
>                               <mandatory>false</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>MatchedTokensFeatureName</name>
>                               <type>String</type>
>                               <multiValued>false</multiValued>
>                               <mandatory>false</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>ReplaceCommaWithAND</name>
>                               <type>Boolean</type>
>                               <multiValued>false</multiValued>
>                               <mandatory>false</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>TokenizerDescriptorPath</name>
>                               <type>String</type>
>                               <multiValued>false</multiValued>
>                               <mandatory>true</mandatory>
>                       </configurationParameter>
>                       <configurationParameter>
>                               <name>LanguageID</name>
>                               <type>String</type>
>                               <multiValued>false</multiValued>
>                               <mandatory>false</mandatory>
>                       </configurationParameter>
>               </configurationParameters>
>               <configurationParameterSettings>
>                       <nameValuePair>
>                               <name>caseMatch</name>
>                               <value>
>                                       <string>ignoreall</string>
>                               </value>
>                       </nameValuePair>
>                       <nameValuePair>
>                               <name>AttributeList</name>
>                               <value>
>                                       <array>
>                                               <string>canonical</string>
>                                               <string>SemClass</string>
>                                       </array>
>                               </value>
>                       </nameValuePair>
>                       <nameValuePair>
>                               <name>FeatureList</name>
>                               <value>
>                                       <array>
>                                               <string>DictCanon</string>
>                                               <string>SemClass</string>
>                                       </array>
>                               </value>
>                       </nameValuePair>
>                       <nameValuePair>
>                               <name>TokenAnnotation</name>
>                               <value>
>                                       <string>uima.tt.TokenAnnotation</string>
>                               </value>
>                       </nameValuePair>
>                       <nameValuePair>
>                               <name>ResultingAnnotationName</name>
>                               <value>
>                                       <string>
>                                               org.apache.uima.conceptMapper.DictTerm
>                                       </string>
>                               </value>
>                       </nameValuePair>
>                       <nameValuePair>
>                               <name>SpanFeatureStructure</name>
>                               <value>
>                                       <string>uima.tcas.DocumentAnnotation</string>
>                               </value>
>                       </nameValuePair>
>                       <nameValuePair>
>                               <name>OrderIndependentLookup</name>
>                               <value>
>                                       <boolean>false</boolean>
>                               </value>
>                       </nameValuePair>
>                       <nameValuePair>
>                               <name>TokenClassWriteBackFeatureNames</name>
>                               <value>
>                                       <array />
>                               </value>
>                       </nameValuePair>
>                       <nameValuePair>
>                               <name>IncludedTokenClasses</name>
>                               <value>
>                                       <array />
>                               </value>
>                       </nameValuePair>
>                       <nameValuePair>
>                               <name>PrintDictionary</name>
>                               <value>
>                                       <boolean>false</boolean>
>                               </value>
>                       </nameValuePair>
>                       <nameValuePair>
>                               <name>FindAllMatches</name>
>                               <value>
>                                       <boolean>false</boolean>
>                               </value>
>                       </nameValuePair>
>                       <nameValuePair>
>                               <name>StopWords</name>
>                               <value>
>                                       <array />
>                               </value>
>                       </nameValuePair>
>                       <nameValuePair>
>                               <name>ReplaceCommaWithAND</name>
>                               <value>
>                                       <boolean>false</boolean>
>                               </value>
>                       </nameValuePair>
>                       <nameValuePair>
>                               <name>TokenizerDescriptorPath</name>
>                               <value>
>                                       <string>
>                                               /search/uima/conf/descriptors/OffsetTokenizer.xml
>                                       </string>
>                               </value>
>                       </nameValuePair>
>                       <nameValuePair>
>                               <name>ResultingEnclosingSpanName</name>
>                               <value>
>                                       <string>enclosingSpan</string>
>                               </value>
>                       </nameValuePair>
>                       <nameValuePair>
>                               <name>MatchedTokensFeatureName</name>
>                               <value>
>                                       <string>matchedTokens</string>
>                               </value>
>                       </nameValuePair>
>                       <nameValuePair>
>                               <name>ResultingAnnotationMatchedTextFeature</name>
>                               <value>
>                                       <string>matchedText</string>
>                               </value>
>                       </nameValuePair>
>                       <nameValuePair>
>                               <name>SearchStrategy</name>
>                               <value>
>                                       <string>ContiguousMatch</string>
>                               </value>
>                       </nameValuePair>
>                       <nameValuePair>
>                               <name>LanguageID</name>
>                               <value>
>                                       <string>en</string>
>                               </value>
>                       </nameValuePair>
>               </configurationParameterSettings>
>               <typeSystemDescription>
>                       <imports>
>                               <import name="org.apache.uima.conceptMapper.DictTerm" />
>                               <import
>                                       name="org.apache.uima.conceptMapper.support.tokenizer.TokenAnnotation" />
>                       </imports>
>                       <types>
>                               <typeDescription>
>                                       <name>uima.tt.TokenAnnotation</name>
>                                       <description></description>
>                                       <supertypeName>uima.tcas.Annotation</supertypeName>
>                                       <features>
>                                               <featureDescription>
>                                                       <name>SemClass</name>
>                                                       <description>
>                                                               semantic class of token
>                                                       </description>
>                                                       <rangeTypeName>
>                                                               uima.cas.String
>                                                       </rangeTypeName>
>                                               </featureDescription>
>                                               <featureDescription>
>                                                       <name>POS</name>
>                                                       <description>
>                                                               Part of Speech of term to which this
>                                                               token is a part
>                                                       </description>
>                                                       <rangeTypeName>
>                                                               uima.cas.String
>                                                       </rangeTypeName>
>                                               </featureDescription>
>                                               <featureDescription>
>                                                       <name>frost_TokenType</name>
>                                                       <description></description>
>                                                       <rangeTypeName>
>                                                               uima.cas.Integer
>                                                       </rangeTypeName>
>                                               </featureDescription>
>                                       </features>
>                               </typeDescription>
>                       </types>
>               </typeSystemDescription>
>               <typePriorities>
>                       <priorityList>
>                               <!-- <type>uima.tt.SentenceAnnotation</type> -->
>                               <type>uima.tt.TokenAnnotation</type>
>                       </priorityList>
>               </typePriorities>
>               <fsIndexCollection />
>               <capabilities>
>                       <capability>
>                               <inputs>
>                                       <type allAnnotatorFeatures="true">
>                                               uima.tt.TokenAnnotation
>                                       </type>
>                                       <!-- <type allAnnotatorFeatures="true">uima.tt.SentenceAnnotation</type>
>                                               <type allAnnotatorFeatures="true">uima.tt.ParagraphAnnotation</type> -->
>                               </inputs>
>                               <outputs>
>                                       <type allAnnotatorFeatures="true">
>                                               org.apache.uima.conceptMapper.DictTerm
>                                       </type>
>                                       <type allAnnotatorFeatures="true">
>                                               uima.tt.TokenAnnotation
>                                       </type>
>                                       <type allAnnotatorFeatures="true">
>                                               org.apache.uima.conceptMapper.support.tokenizer.TokenAnnotation
>                                       </type>
>                                       <type allAnnotatorFeatures="true">
>                                               uima.tcas.DocumentAnnotation
>                                       </type>
>                               </outputs>
>                               <languagesSupported />
>                       </capability>
>               </capabilities>
>               <operationalProperties>
>                       <modifiesCas>true</modifiesCas>
>                       <multipleDeploymentAllowed>true</multipleDeploymentAllowed>
>                       <outputsNewCASes>false</outputsNewCASes>
>               </operationalProperties>
>       </analysisEngineMetaData>
>       <externalResourceDependencies>
>               <externalResourceDependency>
>                       <key>DictionaryFile</key>
>                       <description>dictionary file loader.</description>
>                       <interfaceName>
>                               org.apache.uima.conceptMapper.support.dictionaryResource.DictionaryResource
>                       </interfaceName>
>                       <optional>false</optional>
>               </externalResourceDependency>
>       </externalResourceDependencies>
>       <resourceManagerConfiguration>
>               <externalResources>
>                       <externalResource>
>                               <name>DictionaryFileName</name>
>                               <description>
>                                       A file containing the dictionary. Modify this URL to
>                                       use a different dictionary.
>                               </description>
>                               <fileResourceSpecifier>
>                                       <fileUrl>file:/search/uima/conf/testDict.xml</fileUrl>
>                               </fileResourceSpecifier>
>                               <implementationName>
>                                       org.apache.uima.conceptMapper.support.dictionaryResource.DictionaryResource_impl
>                               </implementationName>
>                       </externalResource>
>               </externalResources>
>               <externalResourceBindings>
>                       <externalResourceBinding>
>                               <key>DictionaryFile</key>
>                               <resourceName>DictionaryFileName</resourceName>
>                       </externalResourceBinding>
>               </externalResourceBindings>
>       </resourceManagerConfiguration>
> </taeDescription>
