If you prefer to stay in Java, the annotations are stored in a data structure called a CAS. UIMA and uimaFIT provide utility classes to extract these annotations. There is some code that does this in the ctakes-web-rest project: it takes all the CUIs extracted from the input and sends back a JSON object.
Check out the methods getAnalyzedJSON(...) and process(...) in
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-web-rest/src/main/java/org/apache/ctakes/rest/service/CtakesRestController.java?view=markup
which will lead you to JCasParser and CuiResponse:
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-web-rest/src/main/java/org/apache/ctakes/rest/util/JCasParser.java?view=markup
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-web-rest/src/main/java/org/apache/ctakes/rest/service/CuiResponse.java?view=markup

Tim

-----Original Message-----
From: "Baas,Leah" <[email protected]>
Reply-to: <[email protected]>
To: [email protected]
Subject: Re: [EXTERNAL] Re: Filtering Annotated Files
Date: Tue, 8 Jan 2019 15:47:53 +0000

Thanks Alden! I am learning Python and I think this will be extremely helpful.

Best,
Leah

From: Alden Gordon <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, January 8, 2019 at 9:43 AM
To: "[email protected]" <[email protected]>
Subject: [EXTERNAL] Re: Filtering Annotated Files

Leah,

If you know Python, feel free to use the simple class I wrote to parse cTAKES XMI files (attached). It only pulls out the information I needed for my use case, so you may need to adapt it.

Best,
Alden

On Tue, Jan 8, 2019 at 9:53 AM Smith, Lincoln <[email protected]> wrote:

I don't know of anything other than parsing the XML text to look for your preferred terminology and the CUIs of interest. It's not overly difficult in R if you google some R XML-parsing examples.
Lincoln

Lincoln Smith, MD, MS
Director, Analytic Enablement
Customer Engagement & Insight
412-544-8043
[email protected]

From: Baas,Leah [mailto:[email protected]]
Sent: Tuesday, January 08, 2019 9:44 AM
To: [email protected]
Subject: [EXTERNAL] Filtering Annotated Files

To whom it may concern,

Hello! I am a student researcher who is new to NLP and cTAKES. I am trying to use cTAKES to extract clinical text indicative of BRCA mutations, and I'm feeling a bit lost. I've described my current progress below, and I'm wondering if you can guide me to the next step.

So far, I've been able to create .xml files for each subject in my dataset, run the files through the default clinical pipeline, and view the annotated output files in the CVD. However, my goal is to "filter" the annotations for concepts relevant to BRCA mutations (such as UMLS CUIs and SNOMED CT terms), and this is where I'm getting stuck. Is there a way to isolate these specific concepts within the cTAKES system? Or does this require post-processing using a different platform?

Thanks for entertaining my amateur question!

Leah Baas

--
Alden Gordon
Director of Data Science & Analytics
(860) 402-6572
rubiconmd.com
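As a rough sketch of the post-processing route Alden and Lincoln describe (this is not Alden's attached class, which isn't reproduced here), the Python below reads a cTAKES XMI file with the standard library's xml.etree.ElementTree and keeps only the mentions linked to a chosen set of CUIs. It assumes the XMI layout written by recent cTAKES releases: concepts stored as UmlsConcept elements with cui, codingScheme, code and preferredText attributes, and mentions carrying begin/end offsets plus an ontologyConceptArr attribute that lists the xmi:ids of their concepts. Treat those names as assumptions and check them against your own output files; the helpers local_name and find_concepts are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Attribute key ElementTree uses for xmi:id once the XMI namespace is expanded.
XMI_ID = "{http://www.omg.org/XMI}id"


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def find_concepts(root, cuis_of_interest):
    """Return one dict per (mention, concept) pair whose CUI is in cuis_of_interest.

    Assumption: mentions have begin/end offsets and an ontologyConceptArr
    attribute holding space-separated xmi:ids of UmlsConcept elements.
    """
    # Index every UmlsConcept by its xmi:id so mentions can be resolved.
    concepts = {
        e.get(XMI_ID): e for e in root.iter() if local_name(e.tag) == "UmlsConcept"
    }

    # The original note text is stored in the Sofa element's sofaString attribute.
    sofa = next((e for e in root.iter() if local_name(e.tag) == "Sofa"), None)
    text = sofa.get("sofaString", "") if sofa is not None else ""

    hits = []
    for mention in root.iter():
        refs = mention.get("ontologyConceptArr")
        if not refs:
            continue  # element is not linked to any ontology concept
        begin = int(mention.get("begin", 0))
        end = int(mention.get("end", 0))
        for ref in refs.split():
            concept = concepts.get(ref)
            if concept is None or concept.get("cui") not in cuis_of_interest:
                continue
            hits.append({
                "mention": local_name(mention.tag),
                "cui": concept.get("cui"),
                "codingScheme": concept.get("codingScheme"),
                "code": concept.get("code"),
                "preferredText": concept.get("preferredText"),
                # UIMA offsets count Java chars; they equal Python indices for BMP text.
                "text": text[begin:end],
                "begin": begin,
                "end": end,
            })
    return hits
```

For example, `find_concepts(ET.parse("patient01.xmi").getroot(), brca_cuis)` (file name hypothetical) would return the matching mentions for one subject, where `brca_cuis` is a set of CUIs you have looked up yourself in the UMLS. Because each result also carries `codingScheme` and `code`, the same loop could filter on SNOMED CT codes instead of CUIs.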
