RE: Analyzing and processing cTakes NLP output - Resolved

digital girl Thu, 10 Oct 2013 17:01:44 -0700

Thanks, Tim. Since you mentioned that several projects are moving towards using 
UIMAFit, I'll get started with that. I just downloaded the tutorial for 
UIMAFit.
 
I really appreciate your response.
 
Regards,
Paula

From: [email protected]
To: [email protected]
Subject: Re: Analyzing and processing cTakes NLP output
Date: Thu, 10 Oct 2013 00:18:28 +0000

Yeah, a CPE is one way to go for reading a set of documents and then outputting 
specific information. If you go that route, given your desired outcomes, you 
would have to write a UIMA Consumer class to extract all the things you 
specified and put them somewhere.

Alternatively, many of our projects are moving towards using UIMAFit, which 
allows you to do many of the same things without having to deal with xml 
configuration files. A good place to start with that approach is the class:

/ctakes-clinical-pipeline/src/main/java/org/apache/ctakes/clinicalpipeline/runtime/BagOfCUIsGenerator.java

and its parent class:

/ctakes-clinical-pipeline/src/main/java/org/apache/ctakes/clinicalpipeline/runtime/BagOfAnnotationsGenerator.java

BagOfCUIsGenerator has a main method, so you can run it like a normal Java 
program. It will run the standard ctakes pipeline on a set of files in a 
hardcoded directory ("data/input") and write out files with the extracted CUIs 
to another hardcoded directory ("data/output"). That isn't exactly what you 
want, but if you do need to do some development, you can copy that class and 
extend it for your own uses. That is probably the route that requires the 
smallest amount of effort.

Tim

On 10/09/2013 06:30 PM, digital girl wrote:

Hi Tim,

Thanks for the prompt response.

For starters, what I'd like to do is process several hundred clinical 
narratives and extract the key items per narrative (CUI, RxForm, symptoms, 
relationships, smoking status, etc.) for structured classification in a 
database. Since I'm looking at a collection of narratives for processing, I see 
that the CPE tool would be ideal.

You stated that "for more systematic access and processing we usually will 
write java code around an annotator that will use the ctakes API and typesystem 
to extract what we need." I'm currently using the user tool, so I'm guessing 
that I will need to graduate to the developer version in order to do what you 
stated.

I appreciate your feedback.

Regards,

Paula

--------------

Hi Paula,

The typical way we visually inspect this information is in the CVD tool. Then 
for more systematic access and processing we usually will write Java code 
around an annotator that will use the ctakes API and typesystem to extract what 
we need. Looking directly at the XML is usually about as useful as you seem to 
have found it (i.e., not very :) ).

What task are you trying to accomplish? If you just want to see what concepts 
are found for one file at a time, that can be done in the CVD. If you are 
having trouble finding what you need there, let us know. If you want an output 
file with all terms that were listed in any given input file, that would 
probably require a little bit of programming.

Tim

From: [email protected]
To: [email protected]
Subject: Analyzing and processing cTakes NLP output
Date: Tue, 8 Oct 2013 21:00:17 -0400

Hi Team (or is it just the Team of Samir ;-)

I processed a 2 1/2 page narrative in the CVD tool and exported it to an XCAS 
file in XML. I would like to extract the key items from the narrative that 
cTakes is known for identifying, such as diseases/disorders, medications, 
signs/symptoms, and so forth. I quickly perused the file in an XML browser and 
did see the associated SNOMED and RXNORM codes. I decided to print out the file 
to mark up the sections and to get an idea of how these codes relate back to 
the concepts identified by cTakes. My printer ran out of paper after about 60 
pages, and when I looked at the top sheet it was 1 out of 2243 pages! A 2 1/2 
page narrative resulted in an XML file of over two thousand pages!!!

I examined the first medication mapping. The numbered lines are my comments, 
and everything else is copied and pasted from the XCAS file.

1.  The RxNorm code is identified as 69749, but it's not associated with a 
concept, so I copied '163573' and pasted it into a search of the XML file. See 
number 2 below for what was retrieved.

<uima.cas.FSArray _id="163573" size="1">

<i>163576</i>

</uima.cas.FSArray>

<org.apache.ctakes.typesystem.type.refsem.OntologyConcept _id="163539" 
codingScheme="RXNORM" code="69749" oid="69749#RXNORM"/>

2.  This retrieved the result below, with some additional information (such as 
generic being false) but no medication name mention. I copied "163581" and 
pasted it into the search. See number 3 below for what was retrieved.

_indexed="1" _id="163581" _ref_sofa="6" begin="9776" end="9784" id="530" 
_ref_ontologyConceptArr="163573" typeID="1" segmentID="SIMPLE_SEGMENT" 
discoveryTechnique="1" confidence="1.0" polarity="1" uncertainty="0" 
conditional="false"
 generic="false" subject="patient" historyOf="0"/>

3.  Retrieved this result.  The RxNorm code associated identifies Coumadin as a 
treatment. 

<org.apache.ctakes.assertion.medfacts.types.Concept _indexed="1" _id="227307" 
_ref_sofa="6" begin="9776" end="9784" conceptType="TREATMENT" 
conceptText="Coumadin" externalId="0" originalEntityExternalId="163581"/>

Here are my questions:

1.  Are there any resources available that explain what the XML output file 
contains and how it is laid out? For example, what do a confidence of 1.0, a 
polarity of 1, and an uncertainty of 0 refer to?

2.  Are there any existing tools that interpret the NLP output from cTakes and 
automatically structure it and associate it with the concepts? For example, a 
tool that automatically associates the RxNorm code with the medication mention, 
as illustrated above. As you can see, it took a few steps to associate the 
RxNorm code with the actual medication mention from the narrative.
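
For illustration, the manual lookups in steps 1 to 3 above can be scripted. 
What follows is a minimal Python sketch, not an existing cTakes tool. It uses 
only the standard library and assumes the XCAS layout seen in the excerpts: a 
mention whose _ref_ontologyConceptArr attribute points at a uima.cas.FSArray, 
whose <i> entries in turn point at OntologyConcept elements. The sample 
fragment, its ids, the ExampleMention tag, and the function name 
link_codes_to_mentions are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document text and simplified XCAS fragment. The ids are invented
# so that every reference resolves; "ExampleMention" is a placeholder tag name,
# not a real cTakes type.
SAMPLE_TEXT = "Patient was started on Coumadin yesterday."
SAMPLE_XCAS = """<CAS>
<uima.cas.FSArray _id="100" size="1">
<i>101</i>
</uima.cas.FSArray>
<org.apache.ctakes.typesystem.type.refsem.OntologyConcept _id="101"
  codingScheme="RXNORM" code="69749" oid="69749#RXNORM"/>
<ExampleMention _indexed="1" _id="102" begin="23" end="31"
  _ref_ontologyConceptArr="100" polarity="1" uncertainty="0"/>
</CAS>"""


def link_codes_to_mentions(xcas_xml, document_text):
    """Return one row per (mention, ontology concept) pair found in the XCAS."""
    root = ET.fromstring(xcas_xml)

    # Index every element that carries an _id so references can be followed.
    by_id = {el.get("_id"): el for el in root.iter() if el.get("_id")}

    rows = []
    for mention in root.iter():
        array_id = mention.get("_ref_ontologyConceptArr")
        if array_id is None:
            continue  # not a mention with attached concepts
        array = by_id.get(array_id)
        if array is None:
            continue  # dangling reference; skip rather than guess

        begin = int(mention.get("begin"))
        end = int(mention.get("end"))

        # Each <i> child of the FSArray holds the _id of one concept.
        for item in array.findall("i"):
            concept = by_id.get((item.text or "").strip())
            if concept is None:
                continue
            rows.append({
                "text": document_text[begin:end],
                "begin": begin,
                "end": end,
                "scheme": concept.get("codingScheme"),
                "code": concept.get("code"),
            })
    return rows


if __name__ == "__main__":
    for row in link_codes_to_mentions(SAMPLE_XCAS, SAMPLE_TEXT):
        print(row)
```

Each returned row pairs the text covered by a mention with one coding scheme 
and code, which is the shape needed for loading into a database table. On a 
real XCAS file the document text would have to be read from the file itself, 
and references that do not resolve are skipped silently here.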

Thanks.

Regards,

Paula
