If you prefer to stay in Java, the annotations are stored in a data structure 
called a CAS. UIMA and uimaFIT provide utility classes for extracting these 
annotations. The ctakes-web-rest project has some code that does this: it 
takes all the CUIs extracted from the input and sends back a JSON object.

Check out the methods getAnalyzedJSON(...) and process(...) in CtakesRestController:
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-web-rest/src/main/java/org/apache/ctakes/rest/service/CtakesRestController.java?view=markup
which will lead you to JCasParser and CuiResponse:
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-web-rest/src/main/java/org/apache/ctakes/rest/util/JCasParser.java?view=markup
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-web-rest/src/main/java/org/apache/ctakes/rest/service/CuiResponse.java?view=markup

Tim

-----Original Message-----
From: "Baas,Leah" 
<[email protected]>
Reply-to: <[email protected]>
To: [email protected]
Subject: Re: [EXTERNAL] Re: Filtering Annotated Files
Date: Tue, 8 Jan 2019 15:47:53 +0000

Thanks Alden!

I am learning Python and I think this will be extremely helpful.

Best,
Leah



From: Alden Gordon <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, January 8, 2019 at 9:43 AM
To: "[email protected]" <[email protected]>
Subject: [EXTERNAL] Re: Filtering Annotated Files

Leah,

If you know Python, feel free to use the simple class I wrote to parse cTAKES 
XMI files, attached. It only pulls out the information I needed for my use 
case, so you may need to adapt it.
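
For a rough idea of the approach, here is an illustrative sketch (it is not 
the attached class). It assumes the XMI output records each coded concept as a 
UmlsConcept element with attributes such as cui, tui, preferredText, 
codingScheme and code. The function names, the sample document and the CUI 
values are made up for illustration, so check them against your own output 
files.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def find_concepts(xmi_text, cuis_of_interest):
    """Return one dict per UmlsConcept element whose CUI is in cuis_of_interest.

    Assumption: concepts appear as elements named 'UmlsConcept' carrying
    'cui', 'tui', 'preferredText', 'codingScheme' and 'code' attributes.
    """
    root = ET.fromstring(xmi_text)
    hits = []
    for elem in root.iter():
        # Match on the local tag name so the namespace URI does not matter.
        if local_name(elem.tag) != "UmlsConcept":
            continue
        if elem.get("cui") in cuis_of_interest:
            hits.append({
                "cui": elem.get("cui"),
                "tui": elem.get("tui"),
                "preferred_text": elem.get("preferredText"),
                "coding_scheme": elem.get("codingScheme"),
                "code": elem.get("code"),
            })
    return hits


# Hypothetical, hand-written sample. The namespace URI, codes and CUIs are
# placeholders, not real UMLS identifiers.
SAMPLE_XMI = (
    '<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" '
    'xmlns:refsem="http:///org/apache/ctakes/typesystem/type/refsem.ecore">'
    '<refsem:UmlsConcept xmi:id="11" codingScheme="SNOMEDCT_US" code="0001" '
    'cui="C0000001" tui="T000" preferredText="placeholder concept A"/>'
    '<refsem:UmlsConcept xmi:id="12" codingScheme="SNOMEDCT_US" code="0002" '
    'cui="C0000002" tui="T000" preferredText="placeholder concept B"/>'
    '</xmi:XMI>'
)

if __name__ == "__main__":
    for hit in find_concepts(SAMPLE_XMI, {"C0000001"}):
        print(hit["cui"], hit["preferred_text"])
```

To run it over real output, read each XMI file into a string, pass it to 
find_concepts along with the set of CUIs you care about, and collect the 
results per subject.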

Best,
Alden

On Tue, Jan 8, 2019 at 9:53 AM Smith, Lincoln <[email protected]> wrote:
I don't know of anything other than parsing the XML text to look for your 
preferred terminology and CUIs of interest in the text. It's not overly 
difficult in R if you search for some XML parsing examples. Lincoln

Lincoln Smith, MD, MS
Director, Analytic Enablement
Customer Engagement & Insight
412-544-8043
[email protected]

From: Baas,Leah [mailto:[email protected]]
Sent: Tuesday, January 08, 2019 9:44 AM
To: [email protected]
Subject: [EXTERNAL] Filtering Annotated Files

To whom it may concern,

Hello! I am a student researcher who is new to NLP and cTAKES. I am trying to 
use cTAKES to extract clinical text indicative of BRCA mutations, and I’m 
feeling a bit lost. I’ve described my current progress below. Wondering if you 
can guide me to the next step:

So far, I’ve been able to create .xml files for each subject in my dataset, run 
the files through the default clinical pipeline, and view the annotated output 
files in the CVD. However, my goal is to “filter” the annotations for concepts 
relevant to BRCA mutations (such as UMLS CUIs and SNOMED CT terms), and this is 
where I’m getting stuck. Is there a way to isolate these specific concepts 
within the cTAKES system? Or does this require post-processing using a 
different platform?

Thanks for entertaining my amateur question!

Leah Baas




--

Alden Gordon
Director of Data Science & Analytics
(860) 402-6572

