Hi Lee,

As you have discovered, dictionary1.csv is not used by 
AggregatePlainTextProcessor.xml.

AggregatePlainTextProcessor.xml uses a (tiny) Lucene index for the few words 
like knee and pain that are annotated without using the larger UMLS resource.

I think that to use the csv instead of the other methods, you would modify 
AggregatePlainTextProcessor.xml to refer to 
DictionaryLookupAnnotatorCSV.xml
instead of 
DictionaryLookupAnnotator.xml
where there currently is the line
<import 
location="../../../ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotator.xml"/>

I can't remember offhand if there are parameters you have to update too, but I 
don't think so.
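
To make the change concrete, the edited import in 
AggregatePlainTextProcessor.xml would presumably read as follows. This is only 
a sketch: it assumes DictionaryLookupAnnotatorCSV.xml is reachable through the 
same relative path as the existing line, and it has not been run against a 
real install.

```xml
<!-- Before: the import currently in AggregatePlainTextProcessor.xml -->
<!-- <import location="../../../ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotator.xml"/> -->

<!-- After: only the descriptor file name changes (assumed same directory) -->
<import location="../../../ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotatorCSV.xml"/>
```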

Hopefully that will give you an idea then of what to add to 
AggregatePlainTextUMLSProcessor.xml to get both the UMLS and your csv 
dictionary in effect.

You would add the following to the delegateAnalysisEngine list of 
AggregatePlainTextUMLSProcessor.xml:

<delegateAnalysisEngine key="DictionaryLookupAnnotator">
<import 
location="../../../ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotatorCSV.xml"/>
</delegateAnalysisEngine>

And then add 
<node>DictionaryLookupAnnotator</node>
just before or just after
<node>DictionaryLookupAnnotatorDB</node>
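
Put together, the two additions to AggregatePlainTextUMLSProcessor.xml might 
look roughly like the fragment below. It is a sketch, not a tested descriptor: 
delegateAnalysisEngineSpecifiers, flowConstraints and fixedFlow are the 
standard UIMA aggregate-descriptor elements, the UMLS aggregate is assumed to 
use a fixed flow, and the delegates and nodes already in the file are elided 
as comments.

```xml
<delegateAnalysisEngineSpecifiers>
  <!-- existing delegates, including DictionaryLookupAnnotatorDB, stay as they are -->
  <delegateAnalysisEngine key="DictionaryLookupAnnotator">
    <import location="../../../ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotatorCSV.xml"/>
  </delegateAnalysisEngine>
</delegateAnalysisEngineSpecifiers>

<flowConstraints>
  <fixedFlow>
    <!-- earlier nodes unchanged -->
    <node>DictionaryLookupAnnotatorDB</node>
    <!-- new: the CSV lookup runs right after the UMLS database lookup -->
    <node>DictionaryLookupAnnotator</node>
    <!-- later nodes unchanged -->
  </fixedFlow>
</flowConstraints>
```

The key used in the node element has to match the key of the delegate, and 
placing the new node next to DictionaryLookupAnnotatorDB keeps both lookups at 
the same point in the pipeline.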

-- James

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of 
Lee, Richard A. [USA]
Sent: Monday, January 06, 2014 3:09 PM
To: [email protected]
Subject: RE: How to augment/modify UMLS resources?

Thanks, James.

I am leaning toward supplementing the UMLS DB as you suggest rather than 
changing it, if I can make that work. I did originally try adding entries to 
dictionary1.csv, while using AggregatePlainTextProcessor.xml, but saw no change 
in the annotations. I guess that dictionary1 is in fact not being used in 
APTP.xml, and "hyperlipidemia", "knee", "pain", et al get annotated due to some 
other term list / dictionary. Time to wade through the contents of that 
ctakes-dictionary-lookup-res\src\... tree.

-----Original Message-----
From: Masanz, James J. [mailto:[email protected]] 
Sent: Fri, 03 Jan, 2014 16:45
To: '[email protected]'
Subject: [External] RE: How to augment/modify UMLS resources?

The separately downloadable UMLS dictionary formatted for cTAKES [1], not 
counting medication names (RxNorm), is in a database [2]. So you could add to 
that database whatever terms you want.

The RxNorm dictionary is in a Lucene index (though there is a related Jira 
ticket open, so it may eventually end up in the same database), so adding to 
the currently used medications list would probably best be done 
programmatically using the Lucene API (someone with more Lucene end-user 
experience, please chime in).

cTAKES provides a way to look up terms in a flatfile dictionary that you would 
provide. See the files that end with .csv within 
ctakes-dictionary-lookup-res\src\main\resources\org\apache\ctakes\dictionary\lookup

The flatfile is not used directly in conjunction with the database file of 
terms from UMLS – to use the two together, you would have one annotator 
configured to use that flatfile for the dictionary, and have a second annotator 
configured to use the database file.
 
Some things to be aware of if you went that route:
 - each note would be processed by both, and if you had terms in your flatfile 
that duplicated what was in the database, you would end up with double 
annotations
 - each note would in effect be processed twice (not by the entire pipeline, 
thankfully), so it would be slower than using just one.

As for something being annotated that you don't want annotated: within the 
LookupDesc*xml file being used, there can be an excludeList, which could be 
used so that "men" is no longer annotated. See LookupDesc_DrugNER.xml for an 
example of using excludeList.

Any improvements or even written steps on any of the above would be a great 
contribution.

-- James

[1] http://sourceforge.net/projects/ctakesresources/files/
[2] the relative path to the hsql db is 
resources\org\apache\ctakes\dictionary\lookup\umls2011ab

From: [email protected] 
[mailto:[email protected]] On Behalf Of 
Lee, Richard A. [USA]
Sent: Thursday, January 02, 2014 5:01 PM
To: [email protected]
Subject: How to augment/modify UMLS resources?

Howdy, all. I’ve got a lot of experience with various commercial extraction 
tools, but I’m new to cTAKES and UIMA, so please bear with me.

I am able to use my UMLS credentials to process documents, and the results are 
good. But there are a few things I wish to change in the medfacts.types.Concept 
and AnatomicalSiteMention areas, for starters. For example, while it annotates 
“orbicularis oculi” as a concept, it does not annotate “musculus orbicularis 
oculi”, “septum orbital”, or “oculi medialis”. It annotates “ulceration”, 
“perforation”, and “corneal perforation” but not “corneal ulceration”. It 
annotates “men” (as in “Chinese men”) as a “problem”. It annotates “ER” (i.e., 
Emergency Room) as an AnatomicalSiteReference.

So, the question becomes, how do I address these? Do I need to somehow 
re-generate (with changes) the UMLS data files, probably using Luke or some 
such? That seems a bit crude. Is there a clean way to supplement those data 
files instead to achieve the desired changes?

Thanks in advance.

------------------------------------------------------------------------------------------------------------
Richard A Lee || Lead Associate / Senior Ontologist || [email protected] || 
571-482-7809
