Thank you for that pointer. Unfortunately, 
org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation does not have 
the missing annotations.

I noticed that a later post to this list asked a similar question about 
adding TUIs to LookupDesc_Db.xml, and the answer was that the cTAKES code in 
UmlsToSnomedConsumerImpl only looks for certain TUI “groups”. That would 
explain why my shot in the dark of using “chemicalanddrugTuis” did not work. I 
changed the key to “medicationTuis”, as suggested by the code, which did indeed 
cause most of the expected additional terms to be annotated.
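
The effect of an unrecognized group key can be modeled with a short sketch. This is only an illustration of the behavior described in this thread, not the actual UmlsToSnomedConsumerImpl logic: the function name effective_tuis is hypothetical, and RECOGNIZED_KEYS holds just the two group names mentioned here (the real code may accept others).

```python
# Group keys named in this thread; an assumption, not the full list in cTAKES.
RECOGNIZED_KEYS = {"procedureTuis", "medicationTuis"}


def effective_tuis(properties):
    """Return (TUIs taken from recognized keys, sorted list of ignored keys)."""
    effective, ignored = set(), []
    for key, value in properties.items():
        if key in RECOGNIZED_KEYS:
            effective.update(t.strip() for t in value.split(",") if t.strip())
        else:
            # A key the consumer does not look for contributes nothing.
            ignored.append(key)
    return effective, sorted(ignored)


# Property values as quoted in the original message.
before = {
    "procedureTuis": "T058,T059,T060,T061",
    "chemicalanddrugTuis": "T109,T110,T116,T121,T123",
}
# Same values, with the second key renamed as described above.
after = {
    "procedureTuis": "T058,T059,T060,T061",
    "medicationTuis": "T109,T110,T116,T121,T123",
}

tuis_before, ignored_before = effective_tuis(before)
tuis_after, ignored_after = effective_tuis(after)
print("T121 active before rename:", "T121" in tuis_before, "ignored:", ignored_before)
print("T121 active after rename:", "T121" in tuis_after, "ignored:", ignored_after)
```

Under these assumptions, T121 only takes effect once the key is renamed, which is consistent with what I observed.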

That partially answers my question. Two things remain mysteries: the phrases 
that are still missed despite being tied to the added TUIs, and the phrases 
that are still not annotated even though I added T058 to the existing element 
with group “procedureTuis”.

----

From: Pei Chen [mailto:[email protected]]
Sent: Fri, 04 Apr, 2014 16:33
To: [email protected]
Subject: [External] Re: Problems with TUI filtering and other annotation 
omissions

Richard,
org.apache.ctakes.assertion.medfacts.types.Concept is an internal type used by 
the assertion module,
could you see what is returned in: 
org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation?


On Fri, Apr 4, 2014 at 3:56 PM, Lee, Richard A. [USA] 
<[email protected]> wrote:
I ran several documents through cTAKES, using AggregatePlaintextUMLSProcessor, 
and examined the list of org.apache.ctakes.assertion.medfacts.types.Concept 
annotations produced for each. From those results, I made up a list of phrases 
I had hoped cTAKES would annotate but did not. I used MetaMap to look up each 
of those phrases, and found that approximately 150 of them resulted in a 
full-phrase match and a corresponding CUI.

I used the MetamorphoSys scripts to load the UMLS RRF data set into a SQL DB, 
and queried the DB to confirm that those ~150 phrases were indeed present with 
the expected CUIs. So, the question becomes, why didn’t cTAKES annotate them?

Looking at the cTAKES logs, it appears the OrangeBookFilter “Filtered out” only 
5 out of the 150.

The other possible cause I could think of was the TUI filtering. There was no 
evidence of it in the logs, but I don’t know whether the results of that 
filtering step get logged by default. I looked up the TUIs for each phrase in 
the DB, compared them to the lists of “allowed” TUIs in LookupDesc_Db.xml, and 
concluded that the TUI filtering might account for 44 of the phrases. The rest 
remain a mystery.
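
The TUI comparison described above can be sketched as follows. This is a minimal illustration and not cTAKES code: the <properties> wrapper element, the phrase names, and the phrase-to-TUI mapping are hypothetical stand-ins for the real file and the DB lookup, and the procedureTuis value is inferred by dropping T058 from the modified list quoted later in this message.

```python
import xml.etree.ElementTree as ET

# Property elements in the style of LookupDesc_Db.xml. The <properties>
# wrapper is an assumption so that the fragment parses on its own.
CONFIG = """
<properties>
  <property key="procedureTuis" value="T059,T060,T061"/>
</properties>
"""

# Hypothetical phrase -> TUI mapping, standing in for the UMLS DB query.
PHRASE_TUIS = {
    "phrase A": {"T058"},
    "phrase B": {"T121"},
    "phrase C": {"T060"},
}


def allowed_tuis(config_xml):
    """Collect every TUI listed in a property whose key ends in 'Tuis'."""
    root = ET.fromstring(config_xml)
    allowed = set()
    for prop in root.iter("property"):
        if prop.get("key", "").endswith("Tuis"):
            values = prop.get("value", "").split(",")
            allowed.update(t.strip() for t in values if t.strip())
    return allowed


def blocked_by_tui_filter(phrase_tuis, allowed):
    """Return the phrases that have no TUI in the allowed set."""
    return sorted(p for p, tuis in phrase_tuis.items() if not tuis & allowed)


if __name__ == "__main__":
    print(blocked_by_tui_filter(PHRASE_TUIS, allowed_tuis(CONFIG)))
```

Phrases returned by blocked_by_tui_filter are candidates for being dropped by the TUI filter. Phrases that pass this check but are still not annotated need another explanation.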

I modified the TUI lists in LookupDesc_Db.xml to add TUIs, hoping that this 
would cause the corresponding phrases to be annotated. Specifically, I added 
T058 to one list, and added a second list with a handful of TUIs:

<property key="procedureTuis" value="T058,T059,T060,T061"/>
<property key="chemicalanddrugTuis" value="T109,T110,T116,T121,T123"/>

T058 corresponded to 3 of the phrases on my list; T121 alone accounted for 24 
of them. But, upon restarting cTAKES with that modified file, and running 
relevant documents, I found that the expected phrases were still not annotated. 
I even tried making the same change in LookupDesc.xml just in case, to no avail.

So, the questions are:

- Are there reasons beyond the OrangeBook and TUI filters why CUI-associated 
phrases in UMLS would not get annotated?

- Do TUI-filter results get logged by default, and if not, is there a way 
(log4j settings?) to log them without making code changes?

- Am I doing the TUI filter changes wrong?

Thanks for any answers and advice.
