Re: Concept annotation questions and keep JCas results in a file

samir chabou Fri, 06 Sep 2013 20:35:35 -0700

Hi Richard,
I had a look at these methods; they should let me implement my requirement. Do
you know whether there is a preference for readXCas/writeXCas over
readXmi/writeXmi, or is it just a matter of offering different file formats
to read from and write to?
Thanks
Samir

________________________________
 From: Richard Eckart de Castilho <[email protected]>
To: [email protected]; samir chabou <[email protected]> 
Sent: Friday, September 6, 2013 3:29:19 AM
Subject: Re: Concept annotation questions and keep JCas results in a file
 

Hi,

you might want to take a look at convenience methods in the recently
released Apache uimaFIT 2.0.0:

CasIOUtil
  readXCas(JCas, File)
  readXmi(JCas, File)
  writeXCas(JCas, File)
  writeXmi(JCas, File)

Cheers,

-- Richard

On 06.09.2013, at 06:28, samir chabou <[email protected]> wrote:

> Hi Tim, Pei and James
> 1) I tried List l = JCasUtil.selectCovered(jcas, BaseToken.class, i) and it
> meets my requirement perfectly, thanks Tim.
> 2) Now I need to run a medical question through the clinical pipeline and
> keep the resulting JCas in a file, or persist it some other way, because I
> need to use it later in my processing. Is it possible to do this, and to
> load the JCas again later?
> 
> Thanks 
> Samir
> From: samir chabou <[email protected]>
> To: "[email protected]" <[email protected]> 
> Sent: Thursday, August 29, 2013 2:51:12 PM
> Subject: Re: Concept annotation questions
> 
> Thanks Tim,
> it looks like a better and cleaner way. It means that List l =
> JCasUtil.selectCovered(jcas, BaseToken.class, i) will give me the
> intersection between the BaseTokens and the IdentifiedAnnotations: if my
> base token is in the list, it is also part of an IdentifiedAnnotation. I'll
> give it a try some time next week and let you know.
> Thanks 
> Samir
> 
> 
> From: Tim Miller <[email protected]>
> To: [email protected] 
> Sent: Thursday, August 29, 2013 1:07:58 PM
> Subject: Re: Concept annotation questions
> 
> Samir,
> You may be able to use the JCasUtil class from uimaFIT to do something like
> the following:
> 
> for each IdentifiedAnnotation i:
>     List l = JCasUtil.selectCovered(jcas, BaseToken.class, i)
> 
> 
> (this is java-ish pseudocode obviously). Then the tokens in the list you get
> back all fall within IdentifiedAnnotation i, so they share its type. Would
> that solve your problem?
> Tim
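
The containment test that Tim's suggestion relies on can be sketched in plain
Java without any UIMA dependency. Everything below (the Span class, the
hand-counted offsets, the local selectCovered helper) is a hypothetical
stand-in for illustration and not the uimaFIT implementation; in a real
pipeline you would call JCasUtil.selectCovered(jcas, BaseToken.class, i) as
quoted above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, UIMA-free model of "select the tokens covered by an annotation". */
class CoveredTokens {

    /** Stand-in for an annotation: character offsets, end exclusive. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        /** Returns the slice of the document text that this span covers. */
        String coveredText(String document) {
            return document.substring(begin, end);
        }
    }

    /** Returns the candidates that lie entirely inside the covering span. */
    static List<Span> selectCovered(List<Span> candidates, Span covering) {
        List<Span> covered = new ArrayList<>();
        for (Span c : candidates) {
            if (c.begin >= covering.begin && c.end <= covering.end) {
                covered.add(c);
            }
        }
        return covered;
    }

    public static void main(String[] args) {
        String text = "cancer of colon, lung and liver";
        // Token offsets counted by hand (an assumption, not tokenizer output).
        List<Span> tokens = List.of(
                new Span(0, 6), new Span(7, 9), new Span(10, 15), new Span(15, 16),
                new Span(17, 21), new Span(22, 25), new Span(26, 31));
        // Span of the concept "cancer of colon" (SNOMED CT 363406005 in the thread).
        Span colonCancer = new Span(0, 15);
        for (Span t : selectCovered(tokens, colonCancer)) {
            System.out.println(t.coveredText(text) + " -> 363406005");
        }
    }
}
```

Each token found this way can be tagged with the type of the covering
annotation, which avoids comparing covered-text strings token by token.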
> 
> On 08/29/2013 12:29 PM, samir chabou wrote:
>> Hi James and Pei,
>> I also need to know the medical type (symptom, drug, procedure, relation)
>> of a given word token. In the type system hierarchy, WordToken is not in
>> the same inheritance tree as IdentifiedAnnotation, so I’m currently
>> iterating over all WordTokens and comparing each WordToken’s covered text
>> to the covered text of the IdentifiedAnnotations. I find this a slow
>> process. James, do you think the patch you are planning to create
>> (<<I could create a patch for you that would help with determining which
>> words from the text matched a dictionary entry>>) will also cover this
>> requirement? Or can you suggest something better than what I’m currently
>> doing?
>>  
>> Thanks
>> Samir  
>> 
>> From: "Masanz, James J." <[email protected]>
>> To: "'[email protected]'" <[email protected]> 
>> Sent: Thursday, August 29, 2013 10:18:40 AM
>> Subject: RE: Concept annotation questions
>> 
>> Hi Dennis,
>>  
>> Thanks for explaining why you are interested in finding out which words in 
>> the original text cause a particular concept to be annotated.  We are 
>> currently working on getting Apache cTAKES 3.1 out.  Depending on your 
>> timeline, after that is done, perhaps I could create a patch for you that 
>> would help with determining which words from the text matched a dictionary 
>> entry, rather than just the begin offset of the first word and the end 
>> offset of the last word.
>>  
>> As far as the chunking, the fact that “liver” and “and” are being tagged as
>> O-chunks explains why the dictionary lookup component is not finding liver
>> cancer or lung cancer in “cancer of colon, liver and lung”.
>>  
>> I’ll try that sentence with the latest chunker model (which will be in 
>> cTAKES 3.1) and see if it assigns correct chunk tags for that sentence.
>>  
>> -- James
>>  
>> From: [email protected] 
>> [mailto:[email protected]] On Behalf 
>> Of Dennis Lee Hon Kit
>> Sent: Wednesday, August 28, 2013 2:33 PM
>> To: [email protected]
>> Subject: Re: Concept annotation questions
>>  
>> Hi James & Pei,
>>  
>> Thank you for your replies and sorry for my late reply as I have been away.
>>  
>> Q1 – The longest span could work and is one of the options we are looking at,
>> but when there are overlaps it can get complicated.  In the following
>> example, the longest would work.  We can start with 01, ignore 02 and 03
>> because their start positions fall before the end position of 01, and
>> then continue with 04.  But I don’t think it will always be this
>> straightforward, as the begin/end string positions may not always be a good
>> indicator of what exactly in the original text was coded.
>>  
>> 00 Invasive ductal carcinoma of the left breast with bone metastases.
>> 01 Invasive ductal carcinoma of the left breast                       
>> 408643008|Infiltrating duct carcinoma of breast (disorder)|
>> 02                                       breast with bone             
>> 56873002|Bone structure of sternum (body structure)|
>> 03                                       breast with bone metastases  
>> 94297009|Secondary malignant neoplasm of female breast (disorder)|
>> 04                                                   bone metastases  
>> 94222008|Secondary malignant neoplasm of bone (disorder)|
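
The left-to-right selection Dennis describes for Q1 (keep 01, skip 02 and 03,
continue with 04) could be sketched as below. This is only a minimal
illustration in plain Java: the Span class and the keepLongest method are
hypothetical names, not cTAKES API, and the offsets were counted by hand from
sentence 00.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper, not part of cTAKES: greedy left-to-right longest-span filter. */
class LongestSpanFilter {

    /** Stand-in for a concept annotation: offsets (end exclusive) plus a code. */
    static final class Span {
        final int begin;
        final int end;
        final String code;

        Span(int begin, int end, String code) {
            this.begin = begin;
            this.end = end;
            this.code = code;
        }
    }

    /** Keeps a span only if it starts at or after the end of the last kept span. */
    static List<Span> keepLongest(List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        // Earliest start first; for equal starts, the longer span first.
        sorted.sort(Comparator.comparingInt((Span s) -> s.begin)
                .thenComparing(Comparator.comparingInt((Span s) -> s.end).reversed()));
        List<Span> kept = new ArrayList<>();
        int lastEnd = Integer.MIN_VALUE;
        for (Span s : sorted) {
            if (s.begin >= lastEnd) {
                kept.add(s);
                lastEnd = s.end;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Span> spans = List.of(
                new Span(0, 44, "408643008"),  // 01 Invasive ductal carcinoma of the left breast
                new Span(38, 54, "56873002"),  // 02 breast with bone
                new Span(38, 65, "94297009"),  // 03 breast with bone metastases
                new Span(50, 65, "94222008")); // 04 bone metastases
        for (Span s : keepLongest(spans)) {
            System.out.println(s.begin + "-" + s.end + " " + s.code);
        }
    }
}
```

On the four spans above this keeps 01 and 04. It is a greedy pass, so it does
not address the harder cases Dennis mentions, where begin/end offsets alone do
not show which words were actually coded.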
>>  
>> Q2 – As we are beginners, we are not yet comfortable modifying cTAKES, or
>> even sure where to begin, but that would be an option in the future.  Going
>> back to the example of “cancer of liver” and using the begin/end position of
>> the string that was used to
>> identify the concept, the original string would be “cancer of colon, lung 
>> and liver.”  The CUI that was identified was C0345904, which has 209 (137 
>> unique) descriptions for all languages.  Examples of English terms include:
>>     • CA - Liver cancer
>>     • Cancer of Liver
>>     • cancer of the liver
>>     • Cancer, Hepatic
>>     • CANCER, HEPATOCELLULAR
>>     • Malignant hepatic neoplasm
>>     • Malignant liver tumor
>>     • Malignant liver tumour
>>     • Malignant neoplasm of liver
>>     • malignant neoplasm of liver (diagnosis)
>>     • Malignant neoplasm of liver unspecified
>>     • Malignant neoplasm of liver unspecified (disorder)
>>     • Malignant neoplasm of liver, not specified as primary or secondary
>>     • Malignant neoplasm of liver, NOS
>>     • Malignant neoplasm of liver, unspecified
>>     • malignant neosplasm of the liver
>>     • Malignant tumor of liver
>>     • Malignant tumor of liver (disorder)
>>     • Malignant tumour of liver
>> It would seem suboptimal to go through each of the descriptions to try and 
>> determine which was the UMLS term that was used in the coding.  It is 
>> important for us to know which part of the string is matched because 
>> something like “Invasive ductal carcinoma of the left breast” will be 
>> matched to the SNOMED CT concept “408643008|Infiltrating duct carcinoma of 
>> breast (disorder)|”, but we would like to know that “left” was not matched 
>> and would like to post-coordinate the expression to indicate the left 
>> breast, i.e.: 408643008|Infiltrating duct carcinoma of breast 
>> (disorder)|:363698007|Finding site (attribute)|=80248007|Left breast 
>> structure (body structure)|.  When there are other qualifiers like severity, 
>> chronicity and episodicity that may be ignored when matching, we would like 
>> to capture it at the level of granularity specified in the original text.
>>  
>> In terms of the chunking, here is what I see for “cancer of colon, lung and 
>> liver”:
>>     • NP: cancer of colon, lung and liver
>>     • PP: of
>>     • NP: colon, lung and liver
>> For “cancer of colon, liver and lung” here is what I see:
>>     • NP: cancer of colon,
>>     • PP: of
>>     • NP: colon
>>     • O: liver
>>     • O: and
>>     • NP: lung
>> Q3 – To answer Pei’s question, we are not looking at the preferred name from 
>> the UMLS, just which term was used.
>>  
>> Regards,
>> Dennis
>>  
>> From: Chen, Pei
>> Sent: Thursday, August 22, 2013 12:27 PM
>> To: [email protected]
>> Subject: RE: Concept annotation questions
>>  
>> Also,
>> > 3)… or the exact description that was returned in the UMLS?
>> I presume you mean to save the preferred name from UMLS?  If so, this seems
>> to be a common request; see: https://issues.apache.org/jira/browse/CTAKES-224
>>  
>> --Pei
>>  
>> From: Masanz, James J. [mailto:[email protected]] 
>> Sent: Thursday, August 22, 2013 3:24 PM
>> To: '[email protected]'
>> Subject: RE: Concept annotation questions
>>  
>>  
>> Welcome to the cTAKES community.
>>  
>> Q1 – some people use the longest span.
>> Q2 & Q3 – can you just use the text from the dictionary, “Malignant neoplasm
>> of liver (disorder)”?  Alternatively you could modify cTAKES to save the
>> text of the words that it matches when it is performing dictionary lookup. I 
>> would guess there is a term in the UMLS dictionary with the same code as 
>> Malignant neoplasm of liver (disorder) that just has the words “cancer of 
>> liver”, but there isn’t anything in cTAKES to give that to you just through 
>> a configuration change.
>>  
>> For “cancer of colon, liver and lung”, can you look at the chunk tag for
>> liver?  If it’s in a separate noun phrase (NP) from “cancer of colon”, that
>> would account for why cancer is not getting tied to liver in that case (but
>> wouldn’t account for why the chunker is creating it as a separate noun
>> phrase).
>>  
>> -- James
>>  
>> From: [email protected] 
>> [mailto:[email protected]] On Behalf 
>> Of Dennis Lee Hon Kit
>> Sent: Wednesday, August 21, 2013 1:10 PM
>> To: [email protected]
>> Subject: Concept annotation questions
>>  
>> Hi Everyone,
>>  
>> We are new to cTAKES so please bear with our questions.  We are using cTAKES
>> to annotate things like encounter diagnoses and referral notes and are
>> especially interested in the SNOMED CT encodings.  But we are not sure how
>> to make sense of all the outputs.
>>  
>> Example #1
>>  
>> In the example below, “cancer of colon, lung and liver” has been encoded 
>> with SNOMED CT and additional concepts that do not apply have been removed 
>> (e.g., general “cancer” concept, lung, colon and liver structures, etc.).
>> They have been plotted out by the begin/end positions.  If the terms do
>> not align, it’s probably because the email only accepts plain text and a
>> mono-spaced font is not the default.
>>  
>> cancer of colon, lung and liver
>> cancer of colon, lung and liver   93870000|Malignant neoplasm of liver 
>> (disorder)|
>> cancer of colon, lung             363358000|Malignant tumor of lung 
>> (disorder)|
>> cancer of colon                   363406005|Malignant tumor of colon 
>> (disorder)|
>>  
>> Question (1) – We had to do quite a bit of post-processing to remove 
>> inactive concepts, subtype concepts, concepts that are part of the defining 
>> attributes, etc.  Is there a set of guidelines to help sort out the CUI or
>> SNOMED CT codes that have been identified?
>> Question (2) – How can we determine that “93870000|Malignant neoplasm of 
>> liver (disorder)|” refers to “cancer of liver” as opposed to using the 
>> begin/end string, which points to “cancer of colon, lung and liver”?  
>> Certainly we can try to do additional parsing but there are a lot of 
>> different scenarios to take into account.
>> Question (3) – This relates to question 2: are we able to identify the
>> original terms that were used for the concept matching or the exact 
>> description that was returned in the UMLS?  While the CUI is helpful, the 
>> CUI can refer to tens or even hundreds of descriptions.
>>  
>> Example #2
>>  
>> Switching the position of colon, lung and liver can result in different 
>> encodings.  Once again, after removing additional concepts not needed (i.e., 
>> “cancer” and “colon structure”), we get the following.  What happened to 
>> liver and lung cancer?
>>  
>> cancer of colon, liver and lung
>> cancer of colon                   363406005|Malignant tumor of colon 
>> (disorder)|
>>                            lung   39607008|Lung structure (body structure)|
>>  
>> We have more questions but will start with these.  Thank you in advance.
>>  
>> Regards,
>> Dennis
