Hi James & Pei,

Thank you for your replies, and sorry for my late response; I have been away.

Q1 – The longest span could work and is one of the options we are looking at,
but when there are overlaps it can get complicated.  In the following example,
the longest span would work: we can start with 01, ignore 02 and 03 because
their start positions overlap the end position of 01, and then continue with
04.  But I don’t think it will always be this straightforward, as the begin/end
string positions may not always be a good indicator of exactly what in the
original text was coded.

00 Invasive ductal carcinoma of the left breast with bone metastases.
01 Invasive ductal carcinoma of the left breast                       
408643008|Infiltrating duct carcinoma of breast (disorder)|
02                                       breast with bone             
56873002|Bone structure of sternum (body structure)|
03                                       breast with bone metastases  
94297009|Secondary malignant neoplasm of female breast (disorder)|
04                                                   bone metastases  
94222008|Secondary malignant neoplasm of bone (disorder)|
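
The longest-span rule described above can be sketched as a greedy pass: sort
the candidates by length and keep each one that does not overlap a span already
kept.  This is a minimal sketch, assuming each annotation has been reduced to
character offsets plus a code; the Span tuple and the span_of helper are
hypothetical (not cTAKES types), and the offsets are computed from line 00 of
the example.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical reduced annotation: half-open character offsets plus a code."""
    begin: int
    end: int  # exclusive
    code: str


def overlaps(a: Span, b: Span) -> bool:
    # Half-open intervals overlap when each one starts before the other ends.
    return a.begin < b.end and b.begin < a.end


def longest_spans(spans: list[Span]) -> list[Span]:
    """Greedily keep the longest spans that do not overlap an already kept span."""
    kept: list[Span] = []
    # Longest first; ties broken by the earlier start offset.
    for span in sorted(spans, key=lambda s: (-(s.end - s.begin), s.begin)):
        if not any(overlaps(span, k) for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.begin)


text = "Invasive ductal carcinoma of the left breast with bone metastases."


def span_of(phrase: str, code: str) -> Span:
    # Locate the phrase in the sentence to get its begin/end offsets.
    start = text.index(phrase)
    return Span(start, start + len(phrase), code)


candidates = [
    span_of("Invasive ductal carcinoma of the left breast", "408643008"),  # 01
    span_of("breast with bone", "56873002"),                              # 02
    span_of("breast with bone metastases", "94297009"),                   # 03
    span_of("bone metastases", "94222008"),                               # 04
]

for s in longest_spans(candidates):
    print(s.code, repr(text[s.begin:s.end]))
# prints:
# 408643008 'Invasive ductal carcinoma of the left breast'
# 94222008 'bone metastases'
```

Greedy longest-first is simple but is not guaranteed to maximise total
coverage, and offsets alone still do not say which words inside a kept span
were actually matched.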

Q2 – As we are beginners, we are not yet comfortable modifying cTAKES, or even
sure where to begin, but that could be an option in the future.  Going back to
the example of “cancer of liver” and using the begin/end positions of the string
that was used to identify the concept, the original string would be “cancer of
colon, lung and liver.”  The CUI that was identified was C0345904, which has 209
(137 unique) descriptions across all languages.  Examples of English terms
include:
  a. CA - Liver cancer
  b. Cancer of Liver
  c. cancer of the liver
  d. Cancer, Hepatic
  e. CANCER, HEPATOCELLULAR
  f. Malignant hepatic neoplasm
  g. Malignant liver tumor
  h. Malignant liver tumour
  i. Malignant neoplasm of liver
  j. malignant neoplasm of liver (diagnosis)
  k. Malignant neoplasm of liver unspecified
  l. Malignant neoplasm of liver unspecified (disorder)
  m. Malignant neoplasm of liver, not specified as primary or secondary
  n. Malignant neoplasm of liver, NOS
  o. Malignant neoplasm of liver, unspecified
  p. malignant neosplasm of the liver
  q. Malignant tumor of liver
  r. Malignant tumor of liver (disorder)
  s. Malignant tumour of liver

It would seem suboptimal to go through each of the descriptions to try to
determine which UMLS term was used in the coding.  It is important for us to
know which part of the string was matched, because something like “Invasive
ductal carcinoma of the left breast” will be matched to the SNOMED CT concept
“408643008|Infiltrating duct carcinoma of breast (disorder)|”, but we would
like to know that “left” was not matched so that we can post-coordinate the
expression to indicate the left breast, i.e.:
408643008|Infiltrating duct carcinoma of breast (disorder)|:363698007|Finding site (attribute)|=80248007|Left breast structure (body structure)|
When there are other qualifiers, such as severity, chronicity and episodicity,
that may be ignored when matching, we would like to capture them at the level
of granularity specified in the original text.
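
Assuming the text of the matched dictionary term were available (which, per
this thread, cTAKES does not expose without modification), the words left over
after matching could be found by a token comparison and turned into a
refinement like the one above.  A minimal sketch: the stopword list, the
SITE_FOR_TOKEN table and the matched term “Invasive ductal carcinoma of breast”
are hypothetical; only the concept identifiers come from the message.

```python
import re

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"of", "the", "a", "an", "with"}

# Identifiers quoted in the message; the token-to-site table itself is hypothetical.
FINDING_SITE = "363698007|Finding site (attribute)|"
SITE_FOR_TOKEN = {"left": "80248007|Left breast structure (body structure)|"}


def tokens(text: str) -> list[str]:
    """Lower-case word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def unmatched_tokens(covered_text: str, matched_term: str) -> list[str]:
    """Tokens of the annotated text that the matched dictionary term does not contain."""
    term = set(tokens(matched_term))
    return [t for t in tokens(covered_text) if t not in term]


def post_coordinate(focus: str, leftovers: list[str]) -> str:
    """Refine the focus concept with a finding site for each recognised leftover token."""
    refinements = [f"{FINDING_SITE}={SITE_FOR_TOKEN[t]}"
                   for t in leftovers if t in SITE_FOR_TOKEN]
    # Compositional grammar: focus, then ':' and comma-separated attribute=value pairs.
    return focus + (":" + ",".join(refinements) if refinements else "")


covered = "Invasive ductal carcinoma of the left breast"
matched = "Invasive ductal carcinoma of breast"  # hypothetical matched term
leftovers = unmatched_tokens(covered, matched)
print(leftovers)  # ['left']
print(post_coordinate("408643008|Infiltrating duct carcinoma of breast (disorder)|", leftovers))
```

A real implementation would need a per-finding mapping from leftover words to
body structures or qualifier values (severity, chronicity, episodicity); the
table here only covers the single “left breast” case from the example.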

In terms of the chunking, here is what I see for “cancer of colon, lung and
liver”:

  a. NP: cancer of colon, lung and liver
  b. PP: of
  c. NP: colon, lung and liver

For “cancer of colon, liver and lung” here is what I see:

  a. NP: cancer of colon,
  b. PP: of
  c. NP: colon
  d. O: liver
  e. O: and
  f. NP: lung

Q3 – To answer Pei’s question, we are not looking for the preferred name from
the UMLS, just the term that was used.

Regards,
Dennis

From: Chen, Pei 
Sent: Thursday, August 22, 2013 12:27 PM
To: [email protected] 
Subject: RE: Concept annotation questions

Also,

> 3)… or the exact description that was returned in the UMLS? 

I presume you mean to save the preferred name from UMLS?  If so, this seems to
be a common request; see: https://issues.apache.org/jira/browse/CTAKES-224

--Pei

From: Masanz, James J. [mailto:[email protected]] 
Sent: Thursday, August 22, 2013 3:24 PM
To: '[email protected]'
Subject: RE: Concept annotation questions

Welcome to the cTAKES community.

Q1 – Some people use the longest span.

Q2 & Q3 – Can you just use the text from the dictionary, “Malignant neoplasm of
liver (disorder)”?  Alternatively, you could modify cTAKES to save the text of
the words that it matches when it performs dictionary lookup.  I would guess
there is a term in the UMLS dictionary with the same code as Malignant neoplasm
of liver (disorder) that consists of just the words “cancer of liver”, but
nothing in cTAKES will give you that through a configuration change alone.
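
Short of modifying cTAKES, a rough fallback is to score each description of the
CUI against the annotated text and take the best match.  A minimal sketch,
assuming the descriptions have already been pulled for the CUI (for example
from the UMLS MRCONSO file); rank_descriptions is a hypothetical helper, Jaccard
token overlap is a stand-in heuristic, and this is not how cTAKES itself
performs lookup.

```python
import re


def token_set(s: str) -> set[str]:
    # Lower-case alphanumeric tokens, dropping a few function words.
    return set(re.findall(r"[a-z0-9]+", s.lower())) - {"of", "the", "and"}


def rank_descriptions(covered_text: str, descriptions: list[str]) -> list[tuple[float, str]]:
    """Rank a CUI's descriptions by Jaccard token overlap with the annotated text."""
    covered = token_set(covered_text)
    scored = []
    for d in descriptions:
        terms = token_set(d)
        union = covered | terms
        score = len(covered & terms) / len(union) if union else 0.0
        scored.append((score, d))
    # sorted() is stable, so ties keep the dictionary's original order.
    return sorted(scored, key=lambda pair: -pair[0])


ranking = rank_descriptions(
    "cancer of colon, lung and liver",
    ["CA - Liver cancer", "Cancer of Liver", "cancer of the liver",
     "Malignant neoplasm of liver"],
)
print(ranking[0])  # (0.5, 'Cancer of Liver')
```

Because the covered text also contains “colon” and “lung”, every score is
diluted; the heuristic only orders candidates and can tie, as it does here
between two spellings of the same phrase.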

For “cancer of colon, liver and lung”, can you look at the chunk tag for liver?
If it’s in a separate noun phrase (NP) from “cancer of colon”, that would
account for why cancer is not getting tied to liver in that case (but it
wouldn’t account for why the chunker is creating a separate noun phrase).

-- James

From: [email protected] 
[mailto:[email protected]] On Behalf Of 
Dennis Lee Hon Kit
Sent: Wednesday, August 21, 2013 1:10 PM
To: [email protected]
Subject: Concept annotation questions

Hi Everyone,

We are new to cTAKES, so please bear with our questions.  We are using cTAKES
to annotate things like encounter diagnoses and referral notes and are
especially interested in the SNOMED CT encodings.  But we are not sure how to
make sense of all the outputs.

Example #1

In the example below, “cancer of colon, lung and liver” has been encoded with
SNOMED CT, and additional concepts that do not apply have been removed (e.g.,
the general “cancer” concept, lung, colon and liver structures, etc.).  They
have been plotted out by their begin/end positions.  If the terms do not align,
it’s probably because the email only accepts plain text and a monospaced font
is not the default.

cancer of colon, lung and liver

cancer of colon, lung and liver   93870000|Malignant neoplasm of liver (disorder)|

cancer of colon, lung             363358000|Malignant tumor of lung (disorder)|

cancer of colon                   363406005|Malignant tumor of colon (disorder)|

Question (1) – We had to do quite a bit of post-processing to remove inactive
concepts, subtype concepts, concepts that are part of the defining attributes,
etc.  Is there a set of guidelines to help sort out the CUIs or SNOMED CT codes
that have been identified?
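
For the “subtype concepts” part of this question, one way to handle it is to
drop any concept that is an ancestor of another concept found over the same
text.  A minimal sketch, assuming the is-a relationships have already been
loaded into a child-to-parents map; the PARENTS fragment below (including the
generic “Malignant neoplastic disease” entry) is a simplified, hypothetical
stand-in, not the real SNOMED CT hierarchy.

```python
# Simplified, hypothetical is-a fragment: child -> set of parents.
PARENTS: dict[str, set[str]] = {
    "Malignant tumor of colon": {"Malignant neoplastic disease"},
    "Malignant tumor of lung": {"Malignant neoplastic disease"},
    "Malignant neoplasm of liver": {"Malignant neoplastic disease"},
}


def ancestors(concept: str, parents: dict[str, set[str]]) -> set[str]:
    """All transitive is-a ancestors of a concept (iterative depth-first walk)."""
    seen: set[str] = set()
    stack = list(parents.get(concept, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, ()))
    return seen


def most_specific(found: list[str], parents: dict[str, set[str]]) -> list[str]:
    """Drop any found concept that is an ancestor of another found concept."""
    redundant: set[str] = set()
    for concept in found:
        redundant |= ancestors(concept, parents)
    return [c for c in found if c not in redundant]


found = [
    "Malignant neoplastic disease",
    "Malignant tumor of colon",
    "Malignant tumor of lung",
    "Malignant neoplasm of liver",
]
print(most_specific(found, PARENTS))
# ['Malignant tumor of colon', 'Malignant tumor of lung', 'Malignant neoplasm of liver']
```

Subsumption is best checked per group of overlapping annotations rather than
across the whole note, since a general concept found elsewhere may be a genuine
mention.  Inactive concepts need a separate filter, e.g. on the active flag in
the release's concept file.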

Question (2) – How can we determine that “93870000|Malignant neoplasm of liver
(disorder)|” refers to “cancer of liver”, given that its begin/end positions
point to “cancer of colon, lung and liver”?  Certainly we can try to do
additional parsing, but there are a lot of different scenarios to take into
account.
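
One workaround short of modifying cTAKES is to distribute the head noun over a
coordinated list, so that “cancer of colon, lung and liver” yields “cancer of
liver” together with the offsets of “liver”, which could then be looked up
again.  A minimal sketch that only handles the pattern “<head> of A, B and C”;
expand_coordination is a hypothetical helper, not part of cTAKES.

```python
import re

# List separators: ", ", " and ", or ", and " (Oxford comma).
SEP = re.compile(r",?\s+and\s+|,\s*")


def expand_coordination(phrase: str) -> list[tuple[str, int, int]]:
    """Return (expanded phrase, conjunct begin, conjunct end) for each list item."""
    m = re.match(r"(?P<head>.+?) of (?P<items>.+)$", phrase)
    if not m:
        return [(phrase, 0, len(phrase))]
    head = m.group("head")
    pos = m.start("items")
    results = []
    for sep in SEP.finditer(phrase, pos):
        results.append((f"{head} of {phrase[pos:sep.start()]}", pos, sep.start()))
        pos = sep.end()
    results.append((f"{head} of {phrase[pos:]}", pos, len(phrase)))
    return results


for expanded, begin, end in expand_coordination("cancer of colon, lung and liver"):
    print(expanded, begin, end)
# prints:
# cancer of colon 10 15
# cancer of lung 17 21
# cancer of liver 26 31
```

The head stops at the first “ of ”, and anything outside this pattern (nested
coordination, or “and” inside a body-site name such as “head and neck”) is
returned unchanged or split incorrectly.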

Question (3) – This relates to Question (2): are we able to identify the
original terms that were used for the concept matching, or the exact
description that was returned from the UMLS?  While the CUI is helpful, a
single CUI can refer to tens or even hundreds of descriptions.

--------------------------------------------------------------------------------

Example #2

Switching the order of colon, lung and liver can result in different
encodings.  Once again, after removing the concepts we do not need (i.e.,
“cancer” and “colon structure”), we get the following.  What happened to liver
and lung cancer?

cancer of colon, liver and lung

cancer of colon                   363406005|Malignant tumor of colon (disorder)|

                           lung   39607008|Lung structure (body structure)|

We have more questions but will start with these.  Thank you in advance.

Regards,
Dennis
