Hi everyone, I am unable to match text from the dictionary when the enclosing span contains only a single token.
For example, I am trying to match the word "education" from my dictionary, using a sentence as the enclosing span. If the sentence contains only a single token, I am not able to match it from the dictionary.

Here is what I have tried: when the sentence is "Education <**something else**>", ConceptMapper matches "education". When the sentence is just "Education", ConceptMapper does not pick it up from the dictionary.

So my question is: *does ConceptMapper require more than one TokenAnnotation within the specified SpanFeatureStructure?*

P.S.: This is the descriptor I am using:

<?xml version="1.0" encoding="UTF-8"?>
<taeDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
  <primitive>true</primitive>
  <annotatorImplementationName>org.apache.uima.conceptMapper.ConceptMapper</annotatorImplementationName>
  <analysisEngineMetaData>
    <name>Segment Heading Annotator</name>
    <description/>
    <version>1</version>
    <vendor/>
    <configurationParameters>
      <configurationParameter>
        <name>caseMatch</name>
        <description>this parameter specifies the case folding mode:
          ignoreall - fold everything to lowercase for matching
          insensitive - fold only tokens with initial caps to lowercase
          digitfold - fold all (and only) tokens with a digit
          sensitive - perform no case folding</description>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>true</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>Stemmer</name>
        <description>Name of stemmer class to use before matching. MUST
          have a zero-parameter constructor! If not specified,
          no stemming will be performed.</description>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>ResultingAnnotationName</name>
        <description>Name of the annotation type created by this TAE,
          must match the typeSystemDescription entry</description>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>true</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>ResultingEnclosingSpanName</name>
        <description>Name of the feature in the resultingAnnotation to
          contain the span that encloses it (i.e. its sentence)</description>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>AttributeList</name>
        <description>List of attribute names for XML dictionary entry
          record - must correspond to FeatureList</description>
        <type>String</type>
        <multiValued>true</multiValued>
        <mandatory>true</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>FeatureList</name>
        <description>List of feature names for CAS annotation - must
          correspond to AttributeList</description>
        <type>String</type>
        <multiValued>true</multiValued>
        <mandatory>true</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>TokenAnnotation</name>
        <description/>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>true</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>TokenClassFeatureName</name>
        <description>Name of feature used when doing lookups against
          IncludedTokenClasses and ExcludedTokenClasses</description>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>TokenTextFeatureName</name>
        <description/>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>SpanFeatureStructure</name>
        <description>Type of annotation which corresponds to spans of
          data for processing (e.g. a Sentence)</description>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>true</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>OrderIndependentLookup</name>
        <description>True if should ignore element order during lookup
          (i.e., "top box" would equal "box top"). Default is False.</description>
        <type>Boolean</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>TokenTypeFeatureName</name>
        <description>Name of feature used when doing lookups against
          IncludedTokenTypes and ExcludedTokenTypes</description>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>IncludedTokenTypes</name>
        <description>Type of tokens to include in lookups (if not
          supplied, then all types are included except those
          specifically mentioned in ExcludedTokenTypes)</description>
        <type>Integer</type>
        <multiValued>true</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>ExcludedTokenTypes</name>
        <description/>
        <type>Integer</type>
        <multiValued>true</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>ExcludedTokenClasses</name>
        <description>Class of tokens to exclude from lookups (if not
          supplied, then all classes are excluded except those
          specifically mentioned in IncludedTokenClasses,
          unless IncludedTokenClasses is not supplied, in
          which case none are excluded)</description>
        <type>String</type>
        <multiValued>true</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>IncludedTokenClasses</name>
        <description>Class of tokens to include in lookups (if not
          supplied, then all classes are included except those
          specifically mentioned in ExcludedTokenClasses)</description>
        <type>String</type>
        <multiValued>true</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>TokenClassWriteBackFeatureNames</name>
        <description>names of features that should be written back to a
          token, such as a POS tag</description>
        <type>String</type>
        <multiValued>true</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>ResultingAnnotationMatchedTextFeature</name>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>PrintDictionary</name>
        <type>Boolean</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>SearchStrategy</name>
        <description>Can be either "SkipAnyMatch",
          "SkipAnyMatchAllowOverlap" or "ContiguousMatch"
          ContiguousMatch: longest match of contiguous tokens within
          enclosing span (taking into account included/excluded items).
          DEFAULT strategy
          SkipAnyMatch: longest match of not-necessarily contiguous
          tokens within enclosing span (taking into account
          included/excluded items). Subsequent lookups begin in span
          after complete match. IMPLIES order-independent lookup
          SkipAnyMatchAllowOverlap: longest match of not-necessarily
          contiguous tokens within enclosing span (taking into account
          included/excluded items). Subsequent lookups begin in span
          after next token. IMPLIES order-independent lookup</description>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>StopWords</name>
        <type>String</type>
        <multiValued>true</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>FindAllMatches</name>
        <type>Boolean</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>MatchedTokensFeatureName</name>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>ReplaceCommaWithAND</name>
        <type>Boolean</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>TokenizerDescriptorPath</name>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>true</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>LanguageID</name>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
    </configurationParameters>
    <configurationParameterSettings>
      <nameValuePair>
        <name>caseMatch</name>
        <value>
          <string>ignoreall</string>
        </value>
      </nameValuePair>
      <nameValuePair>
        <name>AttributeList</name>
        <value>
          <array>
            <string>canonical</string>
            <string>group</string>
            <string>class</string>
          </array>
        </value>
      </nameValuePair>
      <nameValuePair>
        <name>FeatureList</name>
        <value>
          <array>
            <string>DictCanon</string>
            <string>group</string>
            <string>segmentClass</string>
          </array>
        </value>
      </nameValuePair>
      <nameValuePair>
        <name>TokenAnnotation</name>
        <value>
          <string>com.naukri.parse.type.TokenAnnotation</string>
        </value>
      </nameValuePair>
      <nameValuePair>
        <name>ResultingAnnotationName</name>
        <value>
          <string>com.naukri.parse.type.resume.SegmentHeading</string>
        </value>
      </nameValuePair>
      <nameValuePair>
        <name>SpanFeatureStructure</name>
        <value>
          <string>com.naukri.parse.type.Sentence</string>
        </value>
      </nameValuePair>
      <nameValuePair>
        <name>OrderIndependentLookup</name>
        <value>
          <boolean>false</boolean>
        </value>
      </nameValuePair>
      <nameValuePair>
        <name>TokenClassWriteBackFeatureNames</name>
        <value>
          <array/>
        </value>
      </nameValuePair>
      <nameValuePair>
        <name>IncludedTokenClasses</name>
        <value>
          <array/>
        </value>
      </nameValuePair>
      <nameValuePair>
        <name>PrintDictionary</name>
        <value>
          <boolean>false</boolean>
        </value>
      </nameValuePair>
      <nameValuePair>
        <name>FindAllMatches</name>
        <value>
          <boolean>true</boolean>
        </value>
      </nameValuePair>
      <nameValuePair>
        <name>StopWords</name>
        <value>
          <array/>
        </value>
      </nameValuePair>
      <nameValuePair>
        <name>ReplaceCommaWithAND</name>
        <value>
          <boolean>false</boolean>
        </value>
      </nameValuePair>
      <nameValuePair>
        <name>TokenizerDescriptorPath</name>
        <value>
          <string>desc/tokenizerAE.xml</string>
        </value>
      </nameValuePair>
      <nameValuePair>
        <name>ResultingEnclosingSpanName</name>
        <value>
          <string>enclosingSpan</string>
        </value>
      </nameValuePair>
      <nameValuePair>
        <name>MatchedTokensFeatureName</name>
        <value>
          <string>matchedTokens</string>
        </value>
      </nameValuePair>
      <nameValuePair>
        <name>ResultingAnnotationMatchedTextFeature</name>
        <value>
          <string>matchedText</string>
        </value>
      </nameValuePair>
      <nameValuePair>
        <name>SearchStrategy</name>
        <value>
          <string>ContiguousMatch</string>
        </value>
      </nameValuePair>
      <nameValuePair>
        <name>LanguageID</name>
        <value>
          <string>en</string>
        </value>
      </nameValuePair>
    </configurationParameterSettings>
    <typeSystemDescription>
      <imports>
        <import location="typesystem.xml"/>
      </imports>
    </typeSystemDescription>
    <typePriorities>
      <priorityList>
        <type>com.naukri.parse.type.TokenAnnotation</type>
      </priorityList>
    </typePriorities>
    <fsIndexCollection/>
    <capabilities>
      <capability>
        <inputs>
          <type allAnnotatorFeatures="true">ucom.naukri.parse.type.TokenAnnotation</type>
        </inputs>
        <outputs>
          <type allAnnotatorFeatures="true">com.naukri.parse.type.DictTerm</type>
          <type allAnnotatorFeatures="true">com.naukri.parse.type.TokenAnnotation</type>
          <type allAnnotatorFeatures="true">uima.tcas.DocumentAnnotation</type>
        </outputs>
        <languagesSupported/>
      </capability>
    </capabilities>
    <operationalProperties>
      <modifiesCas>true</modifiesCas>
      <multipleDeploymentAllowed>true</multipleDeploymentAllowed>
      <outputsNewCASes>false</outputsNewCASes>
    </operationalProperties>
  </analysisEngineMetaData>
  <externalResourceDependencies>
    <externalResourceDependency>
      <key>DictionaryFile</key>
      <description>dictionary file loader.</description>
      <interfaceName>org.apache.uima.conceptMapper.support.dictionaryResource.DictionaryResource</interfaceName>
      <optional>false</optional>
    </externalResourceDependency>
  </externalResourceDependencies>
  <resourceManagerConfiguration>
    <externalResources>
      <externalResource>
        <name>segment_heading_dict</name>
        <description>A file containing the dictionary. Modify this URL to
          use a different dictionary.</description>
        <fileResourceSpecifier>
          <fileUrl>dict/segment.heading.xml</fileUrl>
        </fileResourceSpecifier>
        <implementationName>org.apache.uima.conceptMapper.support.dictionaryResource.DictionaryResource_impl</implementationName>
      </externalResource>
    </externalResources>
    <externalResourceBindings>
      <externalResourceBinding>
        <key>DictionaryFile</key>
        <resourceName>segment_heading_dict</resourceName>
      </externalResourceBinding>
    </externalResourceBindings>
  </resourceManagerConfiguration>
</taeDescription>
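
For anyone trying to reproduce this, here is a minimal sketch of the kind of dictionary entry the question is about. It assumes the usual ConceptMapper dictionary layout (a synonym root with token entries and variant children). The attribute names follow the AttributeList in the descriptor above (canonical, group, class), but the attribute values are hypothetical placeholders and are not copied from the actual dict/segment.heading.xml.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical reproduction dictionary; values are illustrative only. -->
<synonym>
  <!-- Attributes map to FeatureList: canonical -> DictCanon,
       group -> group, class -> segmentClass. -->
  <token canonical="Education" group="heading" class="education">
    <!-- Single-token variant that should match a sentence
         consisting only of the word "Education". -->
    <variant base="education"/>
  </token>
</synonym>
```

With caseMatch set to ignoreall, an entry like this would be expected to match both "Education <**something else**>" and the one-word sentence "Education", so any difference between the two cases would point at span or token handling rather than at the dictionary entry itself.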
