Hey, sorry for the delayed reply.
I believe the changes you are suggesting are to be made in
LookupDesc_csv_sample.xml. I made those changes but still didn't get the
required results.
I am attaching the files here for reference.
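For clarity, a rough sketch of how the suggested initializer might be applied to the DICT_CSV_SAMPLE binding in LookupDesc_csv_sample.xml. The property values are copied from the suggestion quoted below. Keeping NamedEntityLookupConsumerImpl as the consumer, and a dictionary1.csv row whose first field holds the first word of the phrase, are assumptions, so treat this as a sketch rather than a verified configuration.

```xml
<!-- Sketch only: the DICT_CSV_SAMPLE binding with the suggested initializer swapped in.
     Assumes dictionary1.csv rows of the form  knee|knee pain  (field 0 = first word). -->
<lookupBinding>
  <dictionaryRef idRef="DICT_CSV_SAMPLE"/>
  <lookupInitializer className="org.apache.ctakes.dictionary.lookup.ae.FirstTokenPermLookupInitializerImpl">
    <properties>
      <!-- Property values copied from the suggestion quoted below -->
      <property key="textMetaFields" value="0|1"/>
      <property key="maxPermutationLevel" value="7"/>
      <property key="windowAnnotations" value="org.apache.ctakes.typesystem.type.textspan.LookupWindowAnnotation"/>
    </properties>
  </lookupInitializer>
  <!-- Assumption: consumer left unchanged from the attached file -->
  <lookupConsumer className="org.apache.ctakes.dictionary.lookup.ae.NamedEntityLookupConsumerImpl">
    <properties>
    </properties>
  </lookupConsumer>
</lookupBinding>
```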
I also wanted to know whether this is the only way to use a non-UMLS
vocabulary as a dictionary in cTAKES.
Regards,
Ravi Garg
On Tue, Apr 23, 2013 at 1:40 AM, Chen, Pei
<[email protected]> wrote:
> Ravi,
>
> Could you please attach the DictionaryLookupAnnotatorCSV.xml?
>
> In particular, please consider using the
> FirstTokenPermLookupInitializerImpl vs DirectLookup…
>
> <lookupInitializer className="org.apache.ctakes.dictionary.lookup.ae.FirstTokenPermLookupInitializerImpl">
>   <properties>
>     <property key="textMetaFields" value="0|1"/>
>     <property key="maxPermutationLevel" value="7"/>
>     <property key="windowAnnotations" value="org.apache.ctakes.typesystem.type.textspan.LookupWindowAnnotation"/>
>   </properties>
> </lookupInitializer>
>
> I hope that helps.
>
> *From:* ravi garg [mailto:[email protected]]
> *Sent:* Monday, April 22, 2013 4:09 PM
> *To:* [email protected]
> *Subject:* Re: Regarding Entity Recognition
>
> Sorry, but this doesn't solve the problem either.
>
> On Tue, Apr 23, 2013 at 1:28 AM, Savova, Guergana <
> [email protected]> wrote:
>
> Try adding in the dictionary:
>
> Knee|knee pain|….
>
> The first field is reserved for the first word of the phrase.
>
> Regards,
>
> --Guergana
>
> *From:* ravi garg [mailto:[email protected]]
> *Sent:* Monday, April 22, 2013 3:37 PM
> *To:* [email protected]
> *Subject:* Re: Regarding Entity Recognition
>
> Hey,
>
> Thanks for the reply.
>
> First, let me brief you on the configuration I am using. I am using
> AggregatePlaintextProcessor.xml, with DictionaryLookupAnnotatorCSV.xml as
> the DictionaryLookupAnnotator. It reads the dictionary from two sources:
> the flat file dictionary1.csv and a Lucene index. I have
> added knee pain as a single term in dictionary1.csv (like knee pain| knee
> pain), but I am still not able to get it as a single entity. Am I missing
> something here?
>
> Regards,
>
> Ravi Garg
>
> On Tue, Apr 23, 2013 at 12:49 AM, Chen, Pei <
> [email protected]> wrote:
>
> Hi Ravi,
>
> Yes, in your example “knee pain”, the default behavior of the dictionary
> lookup will create 3 IdentifiedAnnotations:
>
> “knee”, “pain”, as well as “knee pain”.
>
> [Assuming the terms exist in the UMLS dictionary]
>
> --Pei
>
> *From:* ravi garg [mailto:[email protected]]
> *Sent:* Monday, April 22, 2013 3:06 PM
> *To:* [email protected]
> *Subject:* Regarding Entity Recognition
>
> Hey,
>
> First of all, congrats on building such wonderful software. I am very
> new to cTAKES, so I have a very basic question to ask.
>
> My query is: is it possible to identify multiple words as a single entity?
> For example, right now knee pain gets identified as 'knee' and 'pain', but is it
> possible to get 'knee pain' as a single entity? If so, what changes do I
> have to make to get going?
>
> --
> Ravi Garg
> 3rd Year
> MSc (hons) Biological Sciences
> B.E (hons) Computer Science and Engineering
> BITS Pilani KK Birla Goa Campus
--
Ravi Garg
3rd Year
MSc (hons) Biological Sciences
B.E (hons) Computer Science and Engineering
BITS Pilani KK Birla Goa Campus
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<lookupSpecification>
  <!-- Defines what dictionaries will be used in terms of implementation specifics and metaField configuration. -->
  <dictionaries>
    <dictionary id="DICT_CSV_SAMPLE" externalResourceKey="DictionaryFile" caseSensitive="false">
      <implementation>
        <csvImpl delimiter="|" indexedFieldNames="0,1"/>
      </implementation>
      <lookupField fieldName="0"/>
      <metaFields>
        <metaField fieldName="1"/>
      </metaFields>
    </dictionary>
    <dictionary id="DICT_RXNORM" externalResourceKey="RxnormIndexReader" caseSensitive="false">
      <implementation>
        <luceneImpl/>
      </implementation>
      <lookupField fieldName="first_word"/>
      <metaFields>
        <metaField fieldName="code"/>
        <metaField fieldName="preferred_designation"/>
        <metaField fieldName="other_designation"/>
      </metaFields>
    </dictionary>
  </dictionaries>
  <!-- Binds together the components necessary to perform the complete lookup logic start to end. -->
  <lookupBindings>
    <lookupBinding>
      <dictionaryRef idRef="DICT_CSV_SAMPLE"/>
      <!-- NOTE: Only use if windowAnnotations have small # of tokens, sentences are not a good idea! -->
      <lookupInitializer className="org.apache.ctakes.dictionary.lookup.ae.DirectLookupInitializerImpl">
        <properties>
        </properties>
      </lookupInitializer>
      <lookupConsumer className="org.apache.ctakes.dictionary.lookup.ae.NamedEntityLookupConsumerImpl">
        <properties>
        </properties>
      </lookupConsumer>
    </lookupBinding>
    <lookupBinding>
      <dictionaryRef idRef="DICT_RXNORM"/>
      <lookupInitializer className="org.apache.ctakes.dictionary.lookup.ae.FirstTokenPermLookupInitializerImpl">
        <properties>
          <property key="textMetaFields" value="preferred_designation|other_designation"/>
          <property key="maxPermutationLevel" value="7"/>
          <property key="windowAnnotations" value="org.apache.ctakes.typesystem.type.textspan.LookupWindowAnnotation"/>
          <property key="exclusionTags" value="VB,VBD,VBG,VBN,VBP,VBZ,CC,CD,DT,EX,LS,MD,PDT,POS,PP,PP$,RP,TO,WDT,WP,WPS,WRB"/>
        </properties>
      </lookupInitializer>
      <lookupConsumer className="org.apache.ctakes.dictionary.lookup.ae.OrangeBookFilterConsumerImpl">
        <properties>
          <property key="codingScheme" value="RXNORM"/>
          <property key="codeMetaField" value="code"/>
          <property key="luceneFilterExtResrcKey" value="OrangeBookIndexReader"/>
        </properties>
      </lookupConsumer>
    </lookupBinding>
  </lookupBindings>
</lookupSpecification>
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<taeDescription xmlns="http://uima.apache.org/resourceSpecifier">
<frameworkImplementation>org.apache.uima.java</frameworkImplementation>
<primitive>true</primitive>
<annotatorImplementationName>org.apache.ctakes.dictionary.lookup.ae.DictionaryLookupAnnotator</annotatorImplementationName>
<analysisEngineMetaData>
<name>DictionaryLookupAnnotatorCSV</name>
<description>Dictionaries - some in lucene indexes and some in CSV files</description>
<version/>
<vendor/>
<configurationParameters>
<configurationParameter>
<name>maxListSize</name>
<description>Specifies the maximum number of items to be returned from a Lucene query.</description>
<type>Integer</type>
<multiValued>false</multiValued>
<mandatory>false</mandatory>
</configurationParameter>
</configurationParameters>
<configurationParameterSettings>
<nameValuePair>
<name>maxListSize</name>
<value>
<integer>2147483647</integer>
</value>
</nameValuePair>
</configurationParameterSettings>
<typeSystemDescription>
<imports>
</imports>
</typeSystemDescription>
<typePriorities/>
<fsIndexCollection/>
<capabilities>
<capability>
<inputs>
<type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.syntax.BaseToken</type>
<type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.textspan.LookupWindowAnnotation</type>
</inputs>
<outputs>
<type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation</type>
</outputs>
<languagesSupported/>
</capability>
</capabilities>
<operationalProperties>
<modifiesCas>true</modifiesCas>
<multipleDeploymentAllowed>true</multipleDeploymentAllowed>
<outputsNewCASes>false</outputsNewCASes>
</operationalProperties>
</analysisEngineMetaData>
<externalResourceDependencies>
<externalResourceDependency>
<key>LookupDescriptor</key>
<description/>
<interfaceName>org.apache.ctakes.core.resource.FileResource</interfaceName>
<optional>false</optional>
</externalResourceDependency>
<externalResourceDependency>
<key>DictionaryFile</key>
<description/>
<interfaceName>org.apache.ctakes.core.resource.FileResource</interfaceName>
<optional>false</optional>
</externalResourceDependency>
<externalResourceDependency>
<key>RxnormIndexReader</key>
<description/>
<interfaceName>org.apache.ctakes.core.resource.LuceneIndexReaderResource</interfaceName>
<optional>false</optional>
</externalResourceDependency>
<externalResourceDependency>
<key>OrangeBookIndexReader</key>
<description/>
<interfaceName>org.apache.ctakes.core.resource.LuceneIndexReaderResource</interfaceName>
<optional>false</optional>
</externalResourceDependency>
</externalResourceDependencies>
<resourceManagerConfiguration>
<externalResources>
<externalResource>
<name>LookupDescriptorFile</name>
<description/>
<fileResourceSpecifier>
<fileUrl>file:org/apache/ctakes/dictionary/lookup/LookupDesc_csv_sample.xml</fileUrl>
</fileResourceSpecifier>
<implementationName>org.apache.ctakes.core.resource.FileResourceImpl</implementationName>
</externalResource>
<externalResource>
<name>DictionaryFileResource</name>
<description/>
<fileResourceSpecifier>
<fileUrl>file:org/apache/ctakes/dictionary/lookup/dictionary1.csv</fileUrl>
</fileResourceSpecifier>
<implementationName>org.apache.ctakes.core.resource.FileResourceImpl</implementationName>
</externalResource>
<externalResource>
<name>RxnormIndex</name>
<description/>
<configurableDataResourceSpecifier>
<url/>
<resourceMetaData>
<name/>
<configurationParameters>
<configurationParameter>
<name>UseMemoryIndex</name>
<type>Boolean</type>
<multiValued>false</multiValued>
<mandatory>true</mandatory>
</configurationParameter>
<configurationParameter>
<name>IndexDirectory</name>
<type>String</type>
<multiValued>false</multiValued>
<mandatory>true</mandatory>
</configurationParameter>
</configurationParameters>
<configurationParameterSettings>
<nameValuePair>
<name>UseMemoryIndex</name>
<value>
<boolean>true</boolean>
</value>
</nameValuePair>
<nameValuePair>
<name>IndexDirectory</name>
<value>
<string>org/apache/ctakes/dictionary/lookup/drug_index</string>
</value>
</nameValuePair>
</configurationParameterSettings>
</resourceMetaData>
</configurableDataResourceSpecifier>
<implementationName>org.apache.ctakes.core.resource.LuceneIndexReaderResourceImpl</implementationName>
</externalResource>
<externalResource>
<name>OrangeBookIndex</name>
<description/>
<configurableDataResourceSpecifier>
<url/>
<resourceMetaData>
<name/>
<configurationParameters>
<configurationParameter>
<name>UseMemoryIndex</name>
<type>Boolean</type>
<multiValued>false</multiValued>
<mandatory>true</mandatory>
</configurationParameter>
<configurationParameter>
<name>IndexDirectory</name>
<type>String</type>
<multiValued>false</multiValued>
<mandatory>true</mandatory>
</configurationParameter>
</configurationParameters>
<configurationParameterSettings>
<nameValuePair>
<name>UseMemoryIndex</name>
<value>
<boolean>true</boolean>
</value>
</nameValuePair>
<nameValuePair>
<name>IndexDirectory</name>
<value>
<string>org/apache/ctakes/dictionary/lookup/OrangeBook</string>
</value>
</nameValuePair>
</configurationParameterSettings>
</resourceMetaData>
</configurableDataResourceSpecifier>
<implementationName>org.apache.ctakes.core.resource.LuceneIndexReaderResourceImpl</implementationName>
</externalResource>
</externalResources>
<externalResourceBindings>
<externalResourceBinding>
<key>LookupDescriptor</key>
<resourceName>LookupDescriptorFile</resourceName>
</externalResourceBinding>
<externalResourceBinding>
<key>DictionaryFile</key>
<resourceName>DictionaryFileResource</resourceName>
</externalResourceBinding>
<externalResourceBinding>
<key>RxnormIndexReader</key>
<resourceName>RxnormIndex</resourceName>
</externalResourceBinding>
<externalResourceBinding>
<key>OrangeBookIndexReader</key>
<resourceName>OrangeBookIndex</resourceName>
</externalResourceBinding>
</externalResourceBindings>
</resourceManagerConfiguration>
</taeDescription>