RE: Sentence extraction from cTAKES XML output.

Yadav, Harish Tue, 27 Mar 2018 23:07:18 -0700

Hi Reed,

Thanks for responding. Below are the example and the output I am trying to get:


cTAKES processes the raw_text (clinical document) and returns its output as 
XML. The snapshots below show what I am trying to extract from that XML:

Step 1
Find the CUI and the id in the XML (in the snapshot below, the CUI is C0085619 
and the id is 39838, marked with a red rectangle).
[cid:[email protected]]

Step 2
Find the begin and end values for the corresponding CUI (in the snapshot 
below, begin = 5740 and end = 5749, marked with a red rectangle).
[cid:[email protected]]

Step 3
Find the begin and end values for the sentence containing the corresponding 
CUI (in the snapshot below, begin = 5740 and end = 5750, marked with a red 
rectangle).
[cid:[email protected]]
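Steps 2 and 3 can be sketched in Python roughly as follows. This is only a 
minimal illustration, not the actual cTAKES output: the SAMPLE_XML string is a 
hypothetical, heavily simplified stand-in (the first Sentence element and the 
CAS wrapper are made up), and I am assuming each Sentence element carries 
integer begin and end attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the cTAKES XML; real output has many
# more elements and attributes.
SAMPLE_XML = """<CAS>
  <org.apache.ctakes.typesystem.type.textspan.Sentence begin="5700" end="5739"/>
  <org.apache.ctakes.typesystem.type.textspan.Sentence begin="5740" end="5750"/>
</CAS>"""

SENTENCE_TAG = "org.apache.ctakes.typesystem.type.textspan.Sentence"


def sentence_spans(root):
    """Collect (begin, end) pairs for every Sentence element."""
    return [(int(e.get("begin")), int(e.get("end")))
            for e in root.iter(SENTENCE_TAG)]


def covering_sentence(spans, begin, end):
    """Return the first sentence span that fully contains [begin, end)."""
    for s_begin, s_end in spans:
        if s_begin <= begin and end <= s_end:
            return (s_begin, s_end)
    return None


root = ET.fromstring(SAMPLE_XML)
spans = sentence_spans(root)
# Mention offsets from step 2 (begin = 5740, end = 5749).
print(covering_sentence(spans, 5740, 5749))  # (5740, 5750)
```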


Now I am trying to get the complete sentence in which the CUI was tagged from 
the raw_text (the clinical document that was fed as input to cTAKES). Using the 
sentence begin and end values from step 3, I simply take raw_text[5740:5750], 
and the output I get is:

OUTPUT :- o pnd ort

Instead, I was expecting the complete sentence from the raw_text, i.e. 
Orthopnea (since the CUI corresponds to Orthopnea, the tagged sentence should 
contain the tagged concept as well).

Below is the snippet of the raw_text where I have marked, in a red rectangular 
box, the sentence that yields “o pnd. ort” instead of “orthopnea”:

[cid:[email protected]]
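One possible explanation, which nothing in this thread confirms, is that the 
offsets in the XML refer to the text exactly as cTAKES read it, while raw_text 
was loaded in a way that changed its length (for example, different line 
endings). The sketch below only demonstrates that effect on made-up strings; 
as_processed and as_loaded are hypothetical names and the text is invented.

```python
# Text as the annotating tool saw it (hypothetical example, "\n" line endings).
as_processed = "Denies cough.\nNo pnd.\nOrthopnea.\n"

# The same document loaded with "\r\n" line endings: two extra characters
# appear before the target sentence, so every later offset is off by two.
as_loaded = as_processed.replace("\n", "\r\n")

# Offsets computed against the text the tool saw.
begin = as_processed.index("Orthopnea.")
end = begin + len("Orthopnea.")

print(repr(as_processed[begin:end]))  # 'Orthopnea.'
print(repr(as_loaded[begin:end]))     # '\r\nOrthopne'
```

If line endings turn out to be the cause, reading the file in Python with 
open(path, newline="") keeps them untranslated, so the loaded text has the 
same length as the file on disk.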

Please let me know if you have any queries regarding the example or the output 
I am trying to get.

Regards,
Harish.



From: Reed Villanueva [mailto:[email protected]]
Sent: Wednesday, March 28, 2018 12:35 AM
To: [email protected]
Subject: Re: Sentence extraction from cTAKES XML output.

Could you provide an example of the problem you are seeing and a bit more 
about the kind of output you are trying to end up with?



On Tue, Mar 27, 2018 at 3:33 PM, Yadav, Harish <[email protected]> wrote:
Hi All,

I am trying to extract the sentence from the cTAKES XML output by taking the 
“begin=5740” and “end=5749” values (5740 and 5749 are just one example) from 
org.apache.ctakes.typesystem.type.textspan.Sentence and slicing the input text 
from character 5740 to character 5749. However, the extracted section is not 
the complete sentence, and sometimes it also misses the concept (the CUI's 
preferred text).
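One way to rule out a mismatch between the input file and the annotated text 
would be to slice the copy of the document stored in the XML itself. The 
sketch below assumes the XML embeds the analysed text in a sofaString 
attribute, as UIMA CAS serialisations commonly do; I have not verified that 
for this particular output. The SAMPLE_XML content and the helper names are 
hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the cTAKES XML output.
SAMPLE_XML = """<CAS>
  <uima.cas.Sofa sofaString="No pnd. Orthopnea."/>
  <org.apache.ctakes.typesystem.type.textspan.Sentence begin="0" end="7"/>
  <org.apache.ctakes.typesystem.type.textspan.Sentence begin="8" end="18"/>
</CAS>"""

SENTENCE_TAG = "org.apache.ctakes.typesystem.type.textspan.Sentence"


def embedded_text(root):
    """Return the first sofaString attribute found, or None if absent."""
    for elem in root.iter():
        text = elem.get("sofaString")
        if text is not None:
            return text
    return None


def sentence_texts(root):
    """Slice every Sentence span out of the text embedded in the XML."""
    text = embedded_text(root)
    if text is None:
        raise ValueError("no sofaString attribute found in the XML")
    return [text[int(e.get("begin")):int(e.get("end"))]
            for e in root.iter(SENTENCE_TAG)]


root = ET.fromstring(SAMPLE_XML)
print(sentence_texts(root))  # ['No pnd.', 'Orthopnea.']
```

If the slices taken from the embedded text look right while the slices taken 
from the input file do not, the two texts differ somewhere before the offset.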

I am also analyzing the sentences in which the concept is tagged, so I need 
them to be complete. Any pointers would be of great help.

Regards,
Harish.
