Just looking at what you wrote as the desired output, it looks like you just
want the associated ontology concept text (i.e. in this case input= output="Orthopnea"). Is this correct? Note that for the
annotation mention that you showed (i.e. the SignSymptomMention), the
ref_ontologyConceptArr maps to the UmlsConcept _id value. The documentation
seems sparse on exactly how all of the relationships in the ctakes XMI output
work, but this relation between annotation mentions and UMLS
concept tags seems to hold across all the other XMIs that I have seen. You
could use this relation to get the UmlsConcept.preferredText output (which I
think is what you are looking for).
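As a rough sketch of that mapping in Python: this assumes the XMI layout I see in my own output, where a mention carries an ontologyConceptArr attribute with one or more space-separated ids and each UmlsConcept carries xmi:id, cui and preferredText attributes. The sample fragment and the names local_name and preferred_texts are made up for illustration and are not part of ctakes.

```python
import xml.etree.ElementTree as ET

XMI_NS = "http://www.omg.org/XMI"  # namespace of the xmi:id attribute

# Hypothetical, heavily trimmed XMI fragment (illustration only).
SAMPLE_XMI = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
    xmlns:textsem="http:///org/apache/ctakes/typesystem/type/textsem.ecore"
    xmlns:refsem="http:///org/apache/ctakes/typesystem/type/refsem.ecore">
  <textsem:SignSymptomMention xmi:id="352" begin="5740" end="5749"
      ontologyConceptArr="39838"/>
  <refsem:UmlsConcept xmi:id="39838" cui="C0085619" preferredText="Orthopnea"/>
</xmi:XMI>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def preferred_texts(xmi_text):
    """Return (begin, end, cui, preferredText) for every mention/concept pair."""
    root = ET.fromstring(xmi_text)

    # Index every UmlsConcept element by its xmi:id.
    concepts = {}
    for el in root.iter():
        if local_name(el.tag) == "UmlsConcept":
            concepts[el.get("{%s}id" % XMI_NS)] = el

    rows = []
    for el in root.iter():
        refs = el.get("ontologyConceptArr")
        if refs is None:
            continue  # not an annotation that points at concepts
        for ref in refs.split():
            concept = concepts.get(ref)
            if concept is None:
                continue  # the id points at some other concept type
            rows.append((
                int(el.get("begin", "-1")),  # -1 if the offset is absent
                int(el.get("end", "-1")),
                concept.get("cui"),
                concept.get("preferredText"),
            ))
    return rows


if __name__ == "__main__":
    for row in preferred_texts(SAMPLE_XMI):
        print(row)
```

Matching on the local tag name rather than the full namespace is deliberate here, so the sketch does not depend on the exact namespace URIs in a given ctakes version.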
I don't know anything about how ctakes parses the sentence
segments, but I notice that the raw text you provide has a lot of period
characters for abbreviations. Ctakes seems to have problems segmenting
these kinds of sentences, e.g. here are the sentence segments I get when
inputting an abbreviation-heavy string into the ctakesCVD and using the
AggregatePlainTextFastUMLSPipeline.xmi:
"
> [
> pt.
> ]
>
> [
> desc.
> ]
> [
> not having any reason to con't.
> ]
> [
> living;
> ]
> [
> clinical depression.
> ]
> "
This could be the reason for some weirdness in trying to extract sentence
information from the XMI fields.
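One way to check this would be to look up which Sentence span actually covers a mention, and to slice the text stored inside the XMI rather than the original file. The sketch below assumes the XMI contains a Sofa element with a sofaString attribute (the text that the begin/end offsets index into) and Sentence elements with begin/end attributes. The sample fragment and the function name sentence_for_span are hypothetical. If the original file differs from sofaString at all (line endings, encoding), slices taken from the original file can drift.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed XMI fragment imitating abbreviation-heavy text.
SAMPLE_XMI = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
    xmlns:cas="http:///uima/cas.ecore"
    xmlns:textspan="http:///org/apache/ctakes/typesystem/type/textspan.ecore">
  <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text"
      sofaString="pt. denies pnd. orthopnea noted."/>
  <textspan:Sentence xmi:id="10" sofa="1" begin="0" end="3"/>
  <textspan:Sentence xmi:id="11" sofa="1" begin="4" end="15"/>
  <textspan:Sentence xmi:id="12" sofa="1" begin="16" end="32"/>
</xmi:XMI>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def sentence_for_span(xmi_text, begin, end):
    """Return the text of the first Sentence fully covering [begin, end).

    Returns None when no Sentence covers the span, which is what happens
    when the segmenter has split the text in the middle of it.
    """
    root = ET.fromstring(xmi_text)
    sofa_text = None
    spans = []
    for el in root.iter():
        name = local_name(el.tag)
        if name == "Sofa" and sofa_text is None:
            sofa_text = el.get("sofaString")
        elif name == "Sentence":
            spans.append((int(el.get("begin")), int(el.get("end"))))
    if sofa_text is None:
        raise ValueError("no Sofa element with a sofaString attribute found")

    for s_begin, s_end in sorted(spans):
        if s_begin <= begin and end <= s_end:
            return sofa_text[s_begin:s_end]
    return None


if __name__ == "__main__":
    # Offsets 16..25 cover the word "orthopnea" in the sample text.
    print(sentence_for_span(SAMPLE_XMI, 16, 25))
```

If this returns None or a very short fragment for your mention offsets, that would point at the segmenter rather than at the slicing.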
By the way, what are you using to view the XMI? The tags in your images look
different from what I see when running ctakes, e.g. mine look like
id="0" ontologyConceptArr="340" typeID="3" discoveryTechnique="1"
> confidence="0.0" polarity="1" uncertainty="0" conditional="false"
> generic="false" subject="patient" historyOf="0"/>
Hope this helps.
On Tue, Mar 27, 2018 at 8:06 PM, Yadav, Harish wrote:
> Hi Reed,
>
>
>
> Thanks for responding. Below is the example and output which I am trying
> to get:
>
>
>
> cTAKES gives its output in the form of XML after processing the raw_text
> (clinical document). Below are the snapshots depicting what I am
> trying to extract from the XML:
>
>
>
> Step 1
>
> Finding the CUI and the id in the XML (In below snapshot cui is C0085619
> and id is 39838 marked in red rectangle).
>
>
>
>
>
> Step 2
>
> Finding the begin and the end tags for the corresponding CUI ( In below
> snapshot begin = 5740 and end = 5749 marked in red rectangle)
>
>
>
> Step 3
>
> Finding the begin and end tags for the sentence of the corresponding CUI (
> In below snapshot begin = 5740 and end = 5750 marked in red rectangle)
>
>
>
>
>
> Now when I am trying to Get the complete sentence from the raw_text
> (clinical document which was fed as an input to cTAKES) where the CUI was
> tagged, by using the begin and end tags of sentence extracted in the step 3
> by simply performing raw_text[5740:5750] I am getting the output as:
>
>
>
> *OUTPUT** :- *o pnd ort
>
>
>
> *Instead of this I was expecting the complete sentence of the raw_text as*:-
> Orthopnea (since the CUI correspond to Orthopnea hence the tagged sentence
> should comprise the tagged concept as well i.e Orthopnea)
>
>
>
> Below is the snippet from the raw_text where I have marked the
> sentence in a red rectangular box, which yields “o pnd. ort” instead of
> “orthopnea” :-
>
>
>
>
>
> Please let me know if you have any queries regarding the example or the
> output I am trying to get.
>
>
>
> Regards,
>
> Harish.
>
>
>
>
>
>
>
> *From:* Reed Villanueva [mailto:villanuevar...@gmail.com]
> *Sent:* Wednesday, March 28, 2018 12:35 AM
> *To:* user@ctakes.apache.org
> *Subject:* Re: Sentence extraction from cTAKES XML output.
>
>
>
> Could you provide an example of the problem you are seeing and a bit more
> about the kind of output you are trying to end up with?
>
>
>
>
>
>
>
> On Tue, Mar 27, 2018 at 3:33 PM, Yadav, Harish
> wrote:
>
> Hi All,
>
>
>
> I am trying to extract the sentence from the cTAKES XML output by taking the
> “begin=5740” and “end=5749” tags (5740 and 5749 are just one example) in
> org.apache.ctakes.typesystem.type.textspan.Sentence and slicing the input
> text from character 5740 to 5749, but it turns out that the extracted
> section is not the complete sentence and sometimes misses the concept (the
> CUI's preferred text) as well.
>
>
>
> I am analyzing the sentences as well where the concept is tagged, so I
> need them to be complete. Any pointers will be of great help
>
>
>
> Regards,
>
> Harish.
>
>
>