Analyzing and processing cTakes NLP output

digital girl Tue, 08 Oct 2013 18:50:03 -0700

Hi Team (or is it just the Team of Samir ;-)
 
I processed a 2 1/2 page narrative in the CVD tool and exported the result to 
an XCAS file (XML).   I would like to extract the key items from the narrative 
that cTakes is known for identifying, such as diseases/disorders, medications, 
signs/symptoms, and so forth.    I quickly perused the file in an XML browser 
and did see the associated SNOMED and RXNORM codes.   I decided to print out 
the file to mark up the sections and to get an idea of how these codes relate 
back to the concepts identified by cTakes.   My printer ran out of paper after 
about 60 pages, and when I looked at the top sheet it read page 1 of 2243!   A 
2 1/2 page narrative resulted in an XML file of over two thousand pages!!!   
 
I examined the first medication mapping.  The numbered lines are my comments; 
everything else is copied and pasted from the XCAS file.  
 
1.  The RxNorm code identified is 69749, but it is not associated with a 
concept mention, so I copied '163573' and searched for it in the XML file.   
See number 2 below for what that retrieved.
<uima.cas.FSArray _id="163573" size="1">
<i>163576</i>
</uima.cas.FSArray>

<org.apache.ctakes.typesystem.type.refsem.OntologyConcept _id="163539" 
codingScheme="RXNORM" code="69749" oid="69749#RXNORM"/>
 
2.  That search retrieved this result, which has some additional information 
(for example, generic is false) but not the medication name.  I copied 
"163581" and searched for it.  See number 3 below for what that retrieved.  
_indexed="1" _id="163581" _ref_sofa="6" begin="9776" end="9784" id="530" 
_ref_ontologyConceptArr="163573" typeID="1" segmentID="SIMPLE_SEGMENT" 
discoveryTechnique="1" confidence="1.0" polarity="1" uncertainty="0" 
conditional="false" generic="false" subject="patient" historyOf="0"/>
 
3.  That search retrieved this result.  It identifies the text behind the 
RxNorm code as Coumadin, with concept type TREATMENT.  
<org.apache.ctakes.assertion.medfacts.types.Concept _indexed="1" _id="227307" 
_ref_sofa="6" begin="9776" end="9784" conceptType="TREATMENT" 
conceptText="Coumadin" externalId="0" originalEntityExternalId="163581"/>
 
Here are my questions:
 
1.  Are there any resources available that explain what the XML output file 
contains and how it is laid out?  For example, what do a confidence of 1.0, a 
polarity of 1, and an uncertainty of 0 refer to?  
 
2.  Are there any existing tools that interpret the NLP output from cTakes and 
automatically structure it and associate it with the concepts?  For example, 
a tool that automatically associates the RxNorm code with the medication 
mention, as illustrated above.   As you can see, it took a few steps to 
associate the RxNorm code with the actual medication mention from the 
narrative.    
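
To make the second question concrete, here is a minimal Python sketch of the 
lookup I did by hand.  It is only an illustration under several assumptions, 
not an existing cTakes tool: the root element name CAS, the tag name 
ExampleMention, the tiny narrative, and the function name link_codes are all 
made up for the example, and I have assumed that each <i> entry in an FSArray 
holds the _id of an OntologyConcept element.  The ids and the RxNorm code are 
taken from the excerpts above, adjusted so they line up.

```python
import xml.etree.ElementTree as ET

# Made-up miniature XCAS fragment modeled on the excerpts above.
# "CAS" and "ExampleMention" are placeholder names (assumptions).
SAMPLE_XCAS = """<CAS>
  <uima.cas.FSArray _id="163573" size="1">
    <i>163576</i>
  </uima.cas.FSArray>
  <org.apache.ctakes.typesystem.type.refsem.OntologyConcept _id="163576"
      codingScheme="RXNORM" code="69749" oid="69749#RXNORM"/>
  <ExampleMention _id="163581" begin="8" end="16"
      _ref_ontologyConceptArr="163573" polarity="1"/>
</CAS>"""

# Made-up narrative; begin/end above are character offsets into it.
NARRATIVE = "Started Coumadin today."


def link_codes(xcas_xml, narrative):
    """Pair every annotation that references a concept array with its codes."""
    root = ET.fromstring(xcas_xml)

    # Index every element by its _id so references can be followed directly.
    by_id = {el.get("_id"): el for el in root.iter() if el.get("_id")}

    results = []
    for el in root.iter():
        array_id = el.get("_ref_ontologyConceptArr")
        if array_id is None or array_id not in by_id:
            continue  # not a mention, or the referenced array is missing

        # Assumption: each <i> child holds the _id of one concept element.
        codes = []
        for item in by_id[array_id].findall("i"):
            concept = by_id.get((item.text or "").strip())
            if concept is not None:
                codes.append((concept.get("codingScheme"), concept.get("code")))

        begin, end = int(el.get("begin")), int(el.get("end"))
        results.append({
            "text": narrative[begin:end],  # covered text from the narrative
            "begin": begin,
            "end": end,
            "codes": codes,
        })
    return results


if __name__ == "__main__":
    for row in link_codes(SAMPLE_XCAS, NARRATIVE):
        print(row["text"], row["begin"], row["end"], row["codes"])
```

The sketch matches on the _ref_ontologyConceptArr attribute rather than on a 
tag name, so it does not depend on knowing which annotation type a mention 
has.  Something along these lines, run over the whole XCAS file, is what I 
was hoping already exists.  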
 
Thanks.
 
Regards,
Paula
