I ran several documents through cTAKES, using AggregatePlaintextUMLSProcessor, 
and examined the list of org.apache.ctakes.assertion.medfacts.types.Concept 
annotations produced for each. From those results, I compiled a list of phrases 
that I had hoped cTAKES would annotate but that it did not. I looked up each of 
those phrases in MetaMap, and found that approximately 150 of them produced a 
full-phrase match with a corresponding CUI.

I used the MetamorphoSys scripts to load the UMLS RRF data set into a SQL DB, 
and queried the DB to confirm that those ~150 phrases were indeed present with 
the expected CUIs. So, the question becomes, why didn’t cTAKES annotate them?
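
A minimal sketch of that kind of lookup, assuming the standard UMLS table names 
MRCONSO (CUI, STR) and MRSTY (CUI, TUI). It uses an in-memory SQLite database 
purely so that it is self-contained; the function name, the CUI, and the 
phrases are placeholders, not real UMLS content or the actual queries that 
were run.

```python
import sqlite3


def lookup_phrases(conn, phrases):
    """Map each phrase to the sorted (CUI, TUI) pairs whose MRCONSO.STR
    matches it exactly, ignoring case. Unmatched phrases map to []."""
    sql = (
        "SELECT DISTINCT c.CUI, s.TUI "
        "FROM MRCONSO c JOIN MRSTY s ON s.CUI = c.CUI "
        "WHERE LOWER(c.STR) = LOWER(?)"
    )
    return {p: sorted(conn.execute(sql, (p,)).fetchall()) for p in phrases}


# Tiny stand-in for the real database; all rows are made-up placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE MRCONSO (CUI TEXT, STR TEXT);
    CREATE TABLE MRSTY (CUI TEXT, TUI TEXT);
    INSERT INTO MRCONSO VALUES ('C0000001', 'Placeholder Drug');
    INSERT INTO MRSTY VALUES ('C0000001', 'T121');
    INSERT INTO MRSTY VALUES ('C0000001', 'T109');
    """
)

result = lookup_phrases(conn, ["placeholder drug", "not in umls"])
for phrase, pairs in result.items():
    print(phrase, pairs)
```

A phrase that maps to an empty list has no exact string match, which 
distinguishes "not in the dictionary" from "in the dictionary but filtered".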

Looking at the cTAKES logs, it appears the OrangeBookFilter “Filtered out” only 
5 out of the 150.

The other possible cause I could think of was the TUI filtering. There was no 
evidence of it in the logs, but I don’t know whether the results of that 
filtering step get logged by default. I looked up the TUIs for each of the 
phrases in the DB, compared them to the lists of “allowed” TUIs in 
LookupDesc_Db.xml, and concluded that the TUI filtering might account for 44 of 
the phrases. That leaves roughly 100 phrases unexplained.

I modified the TUI lists in LookupDesc_Db.xml to add TUIs, hoping that this 
would cause the corresponding phrases to be annotated. Specifically, I 
added T058 to one list, and added a second list with a handful of TUIs:

<property key="procedureTuis" value="T058,T059,T060,T061"/>
<property key="chemicalanddrugTuis" value="T109,T110,T116,T121,T123"/>

T058 corresponded to 3 of the phrases on my list; T121 alone accounted for 24 
of them. However, after restarting cTAKES with the modified file and rerunning 
the relevant documents, I found that the expected phrases were still not 
annotated. I even tried making the same change in LookupDesc.xml just in case, 
to no avail.
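
The comparison of phrase TUIs against the edited lists can be sketched as a 
small script. This is only an illustration under assumptions: the two property 
lines are the ones quoted above, but the wrapping <properties> element, the 
function names, and the phrase-to-TUI mapping are placeholders added so that 
the fragment parses and runs on its own.

```python
import xml.etree.ElementTree as ET

# The two property lines from the post, wrapped in a hypothetical root
# element so the fragment is well-formed XML by itself.
FRAGMENT = """
<properties>
  <property key="procedureTuis" value="T058,T059,T060,T061"/>
  <property key="chemicalanddrugTuis" value="T109,T110,T116,T121,T123"/>
</properties>
"""


def allowed_tuis(xml_text):
    """Collect every TUI listed in a property whose key ends in 'Tuis'."""
    tuis = set()
    for prop in ET.fromstring(xml_text).iter("property"):
        if prop.get("key", "").endswith("Tuis"):
            values = prop.get("value", "").split(",")
            tuis.update(v.strip() for v in values if v.strip())
    return tuis


def uncovered(phrase_tuis, allowed):
    """Return the phrases none of whose TUIs appear in the allowed set."""
    return sorted(p for p, t in phrase_tuis.items() if not set(t) & allowed)


allowed = allowed_tuis(FRAGMENT)

# Placeholder mapping; in practice this would come from the MRSTY lookup.
phrase_tuis = {
    "placeholder phrase a": ["T121"],
    "placeholder phrase b": ["T033"],
}
print(uncovered(phrase_tuis, allowed))  # ['placeholder phrase b']
```

A script like this only checks the contents of the file it is given. It 
cannot tell whether cTAKES actually loads that copy of the file at runtime.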

So, the questions are:

- Are there reasons beyond the OrangeBook and TUI filters why CUI-associated 
phrases in UMLS would not get annotated?

- Do TUI-filter results get logged by default, and if not, is there a way 
(log4j settings?) to log them without making code changes?

- Am I making the TUI filter changes incorrectly?

Thanks for any answers and advice.
