Hi Richard,

I had a look at these methods, and they should let me implement my requirement. Do you know whether there is a reason to prefer readXCas/writeXCas over readXmi/writeXmi, or is it just a matter of offering different ways to read from and write to different file formats?

Thanks,
Samir
________________________________
From: Richard Eckart de Castilho <[email protected]>
To: [email protected]; samir chabou <[email protected]>
Sent: Friday, September 6, 2013 3:29:19 AM
Subject: Re: Concept annotation questions and keep JCas results in a file

Hi,

you might want to take a look at the convenience methods in the recently released Apache uimaFIT 2.0.0:

CasIOUtil
  readXCas(JCas, File)
  readXmi(JCas, File)
  writeXCas(JCas, File)
  writeXmi(JCas, File)

Cheers,

-- Richard

On 06.09.2013, at 06:28, samir chabou <[email protected]> wrote:

> Hi Tim, Pei and James,
>
> 1) I tried List l = JCasUtil.selectCovered(jcas, BaseToken.class, i) and it answers my requirement perfectly, thanks Tim.
> 2) Now I need to process a medical question with the clinical pipeline, and I need to keep the JCas result in a file or some other persistent form because I need to use it later in my processing. Is it possible to do this, and is it possible to reload this JCas later in my processing?
>
> Thanks
> Samir
>
> From: samir chabou <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Thursday, August 29, 2013 2:51:12 PM
> Subject: Re: Concept annotation questions
>
> Thanks Tim,
> it looks like a better and cleaner way. It means the List l = JCasUtil.selectCovered(jcas, BaseToken.class, i) will give me the intersection between the BaseTokens and IdentifiedAnnotations: if my base token is in the list, then the base token is also covered by an IdentifiedAnnotation. I'll give it a try some time next week and let you know.
> Thanks
> Samir
>
> From: Tim Miller <[email protected]>
> To: [email protected]
> Sent: Thursday, August 29, 2013 1:07:58 PM
> Subject: Re: Concept annotation questions
>
> Samir,
> You may be able to use the JCasUtil class from uimaFIT to do something like the following:
>
> for each IdentifiedAnnotation i:
>     List l = JCasUtil.selectCovered(jcas, BaseToken.class, i)
>
> (this is Java-ish pseudocode, obviously).
> Then every token in the list you get lies within the IdentifiedAnnotation i, so it can be given the same type as i. Would that solve your problem?
> Tim
>
> On 08/29/2013 12:29 PM, samir chabou wrote:
>> Hi James and Pei,
>> I also need to know the medical type (symptom, drug, procedure, relation) of a given word token. In the type system hierarchy, WordToken is not in the same inheritance tree as IdentifiedAnnotation, so I’m currently iterating over all WordTokens and comparing each wordToken.getCoveredText() to the covered text of each IdentifiedAnnotation. I find this a slow process. James, do you think the patch you are planning to create (“I could create a patch for you that would help with determining which words from the text matched a dictionary entry”) will also cover this requirement? Or can you suggest something better than what I’m currently doing?
>>
>> Thanks
>> Samir
>>
>> From: "Masanz, James J." <[email protected]>
>> To: "'[email protected]'" <[email protected]>
>> Sent: Thursday, August 29, 2013 10:18:40 AM
>> Subject: RE: Concept annotation questions
>>
>> Hi Dennis,
>>
>> Thanks for explaining why you are interested in finding out which words in the original text cause a particular concept to be annotated. We are currently working on getting Apache cTAKES 3.1 out. Depending on your timeline, after that is done, perhaps I could create a patch for you that would help with determining which words from the text matched a dictionary entry, rather than just the begin offset of the first word and the end offset of the last word.
>>
>> As for the chunking, the fact that “liver” and “and” are being tagged as O-chunks explains why the dictionary lookup component is not finding liver cancer or lung cancer in “cancer of colon, liver and lung”.
>>
>> I’ll try that sentence with the latest chunker model (which will be in cTAKES 3.1) and see if it assigns correct chunk tags for that sentence.
>>
>> -- James
>>
>> From: [email protected] [mailto:[email protected]] On Behalf Of Dennis Lee Hon Kit
>> Sent: Wednesday, August 28, 2013 2:33 PM
>> To: [email protected]
>> Subject: Re: Concept annotation questions
>>
>> Hi James & Pei,
>>
>> Thank you for your replies, and sorry for my late reply; I have been away.
>>
>> Q1 – The longest span could work and is one of the options we are looking at, but when there are overlaps it can get complicated. In the following example, the longest span would work: we can start with 01, ignore 02 and 03 because their start positions overlap the end position of 01, and then continue with 04. But I don’t think it will always be this straightforward, as the begin/end string positions may not always be a good indicator of exactly what in the original text was coded.
>>
>> 00 Invasive ductal carcinoma of the left breast with bone metastases.
>> 01 Invasive ductal carcinoma of the left breast
>>    408643008|Infiltrating duct carcinoma of breast (disorder)|
>> 02 breast with bone
>>    56873002|Bone structure of sternum (body structure)|
>> 03 breast with bone metastases
>>    94297009|Secondary malignant neoplasm of female breast (disorder)|
>> 04 bone metastases
>>    94222008|Secondary malignant neoplasm of bone (disorder)|
>>
>> Q2 – As we are beginners, we are not yet comfortable with modifying cTAKES, or even know where to begin, but that would be an option in the future. Going back to the example of “cancer of liver” and using the begin/end position of the string that was used to identify the concept, the original string would be “cancer of colon, lung and liver.” The CUI that was identified was C0345904, which has 209 (137 unique) descriptions across all languages.
>> Examples of English terms include:
>> • CA - Liver cancer
>> • Cancer of Liver
>> • cancer of the liver
>> • Cancer, Hepatic
>> • CANCER, HEPATOCELLULAR
>> • Malignant hepatic neoplasm
>> • Malignant liver tumor
>> • Malignant liver tumour
>> • Malignant neoplasm of liver
>> • malignant neoplasm of liver (diagnosis)
>> • Malignant neoplasm of liver unspecified
>> • Malignant neoplasm of liver unspecified (disorder)
>> • Malignant neoplasm of liver, not specified as primary or secondary
>> • Malignant neoplasm of liver, NOS
>> • Malignant neoplasm of liver, unspecified
>> • malignant neosplasm of the liver
>> • Malignant tumor of liver
>> • Malignant tumor of liver (disorder)
>> • Malignant tumour of liver
>>
>> It would seem suboptimal to go through each of the descriptions to try to determine which UMLS term was used in the coding. It is important for us to know which part of the string is matched, because something like “Invasive ductal carcinoma of the left breast” will be matched to the SNOMED CT concept “408643008|Infiltrating duct carcinoma of breast (disorder)|”, but we would like to know that “left” was not matched so that we can post-coordinate the expression to indicate the left breast, i.e.:
>> 408643008|Infiltrating duct carcinoma of breast (disorder)|:363698007|Finding site (attribute)|=80248007|Left breast structure (body structure)|
>> When there are other qualifiers, such as severity, chronicity and episodicity, that may be ignored during matching, we would like to capture them at the level of granularity specified in the original text.
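The “longest span” rule from Q1 (which James also suggests) can be sketched in plain Java. This is a hedged illustration, not cTAKES code: the Candidate class, the at and select helpers, and the greedy policy (keep the longest candidate, discard everything that overlaps it, repeat) are assumptions, and the offsets are computed from the example sentence with indexOf rather than taken from real cTAKES output.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LongestSpanSelection {

    // Hypothetical candidate concept match: a SNOMED CT code plus [begin, end) offsets.
    static final class Candidate {
        final String code;
        final int begin;
        final int end;

        Candidate(String code, int begin, int end) {
            this.code = code;
            this.begin = begin;
            this.end = end;
        }

        int length() {
            return end - begin;
        }

        // Half-open intervals overlap when each starts before the other ends.
        boolean overlaps(Candidate other) {
            return begin < other.end && other.begin < end;
        }

        @Override
        public String toString() {
            return code;
        }
    }

    // Greedy longest-span-first: keep a candidate only if it overlaps nothing kept so far.
    static List<Candidate> select(List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        // Longest first; ties broken by earlier start offset.
        sorted.sort(Comparator.comparingInt(Candidate::length).reversed()
                .thenComparingInt((Candidate c) -> c.begin));
        List<Candidate> kept = new ArrayList<>();
        for (Candidate c : sorted) {
            boolean clashes = false;
            for (Candidate k : kept) {
                if (c.overlaps(k)) {
                    clashes = true;
                    break;
                }
            }
            if (!clashes) {
                kept.add(c);
            }
        }
        // Return the survivors in text order.
        kept.sort(Comparator.comparingInt((Candidate c) -> c.begin));
        return kept;
    }

    // Builds a candidate from the first occurrence of a phrase in the sentence.
    static Candidate at(String sentence, String phrase, String code) {
        int begin = sentence.indexOf(phrase);
        return new Candidate(code, begin, begin + phrase.length());
    }

    public static void main(String[] args) {
        String s = "Invasive ductal carcinoma of the left breast with bone metastases.";
        List<Candidate> candidates = List.of(
                at(s, "Invasive ductal carcinoma of the left breast", "408643008"),
                at(s, "breast with bone", "56873002"),
                at(s, "breast with bone metastases", "94297009"),
                at(s, "bone metastases", "94222008"));
        System.out.println(select(candidates)); // prints [408643008, 94222008]
    }
}
```

On Dennis’s example it keeps 01 and 04, as described in Q1. Because it looks only at offsets, it inherits the weakness Dennis points out: in Example #1 (“cancer of colon, lung and liver”) all three concepts start at “cancer”, so this rule would keep only 93870000 and drop the lung and colon concepts.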
>>
>> In terms of the chunking, here is what I see for “cancer of colon, lung and liver”:
>> • NP: cancer of colon, lung and liver
>> • PP: of
>> • NP: colon, lung and liver
>> For “cancer of colon, liver and lung”, here is what I see:
>> • NP: cancer of colon,
>> • PP: of
>> • NP: colon
>> • O: liver
>> • O: and
>> • NP: lung
>>
>> Q3 – To answer Pei’s question, we are not looking for the preferred name from the UMLS, just for which term was used.
>>
>> Regards,
>> Dennis
>>
>> From: Chen, Pei
>> Sent: Thursday, August 22, 2013 12:27 PM
>> To: [email protected]
>> Subject: RE: Concept annotation questions
>>
>> Also,
>> > 3)… or the exact description that was returned in the UMLS?
>> I presume you mean saving the preferred name from the UMLS? If so, this seems to be a common request; see: https://issues.apache.org/jira/browse/CTAKES-224
>>
>> --Pei
>>
>> From: Masanz, James J. [mailto:[email protected]]
>> Sent: Thursday, August 22, 2013 3:24 PM
>> To: '[email protected]'
>> Subject: RE: Concept annotation questions
>>
>>
>> Welcome to the cTAKES community.
>>
>> Q1 – Some people use the longest span.
>> Q2 & Q3 – Can you just use the text from the dictionary, “Malignant neoplasm of liver (disorder)”? Alternatively, you could modify cTAKES to save the text of the words that it matches when it is performing dictionary lookup. I would guess there is a term in the UMLS dictionary with the same code as Malignant neoplasm of liver (disorder) that has just the words “cancer of liver”, but there isn’t anything in cTAKES that would give you that through a configuration change alone.
>>
>> For “cancer of colon, liver and lung”, can you look at the chunk tag for liver?
>> If it’s in a separate noun phrase (NP) from “cancer of colon”, that would account for why cancer is not getting tied to liver in that case (but it wouldn’t account for why the chunker is creating a separate noun phrase).
>>
>> -- James
>>
>> From: [email protected] [mailto:[email protected]] On Behalf Of Dennis Lee Hon Kit
>> Sent: Wednesday, August 21, 2013 1:10 PM
>> To: [email protected]
>> Subject: Concept annotation questions
>>
>> Hi Everyone,
>>
>> We are new to cTAKES, so please bear with our questions. We are using cTAKES to annotate things like encounter diagnoses and referral notes and are especially interested in the SNOMED CT encodings, but we are not sure how to make sense of all the outputs.
>>
>> Example #1
>>
>> In the example below, “cancer of colon, lung and liver” has been encoded with SNOMED CT, and additional concepts that do not apply have been removed (e.g., the general “cancer” concept; lung, colon and liver structures; etc.). They have been plotted out by their begin/end positions. If the terms do not align, it’s probably because the email only accepts plain text and a monospaced font is not the default.
>>
>> cancer of colon, lung and liver
>> cancer of colon, lung and liver   93870000|Malignant neoplasm of liver (disorder)|
>> cancer of colon, lung             363358000|Malignant tumor of lung (disorder)|
>> cancer of colon                   363406005|Malignant tumor of colon (disorder)|
>>
>> Question (1) – We had to do quite a bit of post-processing to remove inactive concepts, subtype concepts, concepts that are part of the defining attributes, etc. Is there a set of guidelines to help sort out the CUIs or SNOMED CT codes that have been identified?
>> Question (2) – How can we determine that “93870000|Malignant neoplasm of liver (disorder)|” refers to “cancer of liver”, rather than relying on the begin/end string, which points to “cancer of colon, lung and liver”?
>> Certainly we can try to do additional parsing, but there are a lot of different scenarios to take into account.
>> Question (3) – This relates to question 2: are we able to identify the original terms that were used for the concept matching, or the exact description that was returned from the UMLS? While the CUI is helpful, a CUI can refer to tens or even hundreds of descriptions.
>>
>> Example #2
>>
>> Switching the position of colon, lung and liver can result in different encodings. Once again, after removing the additional concepts not needed (i.e., “cancer” and “colon structure”), we get the following. What happened to liver and lung cancer?
>>
>> cancer of colon, liver and lung
>> cancer of colon                   363406005|Malignant tumor of colon (disorder)|
>>                            lung   39607008|Lung structure (body structure)|
>>
>> We have more questions but will start with these. Thank you in advance.
>>
>> Regards,
>> Dennis
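Tim’s selectCovered suggestion replaces the covered-text string comparison Samir describes with an offset test: a token belongs to a concept when it lies inside the concept’s begin/end range. The sketch below models that test in plain Java, without UIMA, on the Example #2 output. Span, tokenize, selectCovered, coveringConcepts and at are hypothetical stand-ins (not the uimaFIT or cTAKES API), the regex tokenizer is a simplification of cTAKES tokenization, and “covered” is assumed to mean token.begin >= concept.begin and token.end <= concept.end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CoveredTokens {

    // Hypothetical stand-in for a UIMA annotation: a label plus [begin, end) offsets.
    static final class Span {
        final String label;
        final int begin;
        final int end;

        Span(String label, int begin, int end) {
            this.label = label;
            this.begin = begin;
            this.end = end;
        }

        // True if this span lies entirely inside the other span.
        boolean isInside(Span other) {
            return begin >= other.begin && end <= other.end;
        }

        @Override
        public String toString() {
            return label;
        }
    }

    // Naive tokenizer (assumption): words and single punctuation marks, with offsets.
    static List<Span> tokenize(String text) {
        List<Span> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+|\\p{Punct}").matcher(text);
        while (m.find()) {
            tokens.add(new Span(m.group(), m.start(), m.end()));
        }
        return tokens;
    }

    // Hypothetical analogue of JCasUtil.selectCovered: tokens inside the concept span.
    static List<Span> selectCovered(List<Span> tokens, Span concept) {
        List<Span> result = new ArrayList<>();
        for (Span t : tokens) {
            if (t.isInside(concept)) {
                result.add(t);
            }
        }
        return result;
    }

    // Reverse lookup: the concepts whose spans contain the given token.
    static List<Span> coveringConcepts(Span token, List<Span> concepts) {
        List<Span> result = new ArrayList<>();
        for (Span c : concepts) {
            if (token.isInside(c)) {
                result.add(c);
            }
        }
        return result;
    }

    // Builds a concept span from the first occurrence of a phrase in the text.
    static Span at(String text, String phrase, String label) {
        int begin = text.indexOf(phrase);
        return new Span(label, begin, begin + phrase.length());
    }

    public static void main(String[] args) {
        String text = "cancer of colon, liver and lung";
        List<Span> tokens = tokenize(text);
        List<Span> concepts = List.of(
                at(text, "cancer of colon", "363406005"),
                at(text, "lung", "39607008"));

        // Prints "363406005 covers [cancer, of, colon]" and "39607008 covers [lung]".
        for (Span c : concepts) {
            System.out.println(c + " covers " + selectCovered(tokens, c));
        }
        // ",", "liver" and "and" print an empty list: no concept covers them.
        for (Span t : tokens) {
            System.out.println(t + " -> " + coveringConcepts(t, concepts));
        }
    }
}
```

The reverse lookup is the piece Samir asked for: the concepts, and hence the medical types, covering a given token. On this sentence it also shows Example #2 at the token level: “liver” and “and” are covered by no concept, consistent with James’s observation that they were tagged as O-chunks.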
