Packaging cTAKES in a Jar - LVG Related Configuration Error

2018-03-28 Thread Michael Trepanier
Hi All,

I am attempting to package cTAKES in a jar while preventing it from copying
the LVG-related files to /tmp/, as it does in
/ctakes/trunk/ctakes-lvg/src/main/java/org/apache/ctakes/lvg/ae/LvgAnnotator.java.

Everything works until cTAKES tries to pass the lvg.properties file
inside the jar down to gov.nih.nlm.nls.lvg.Lib.SetConfiguration, where the
code attempts to create a FileInputStream from a resource contained within
a jar, which throws the exception below.


** Configuration Error:
jar:file:\D:\ctakes\ctakes-local\lib\ctakes-assembly-4.0.jar!\org\apache\ctakes\lvg\data\config\lvg.properties
(The filename, directory name, or volume label syntax is incorrect)
** Error: problem of opening/reading config file:
'jar:file:\D:\ctakes\ctakes-local\lib\ctakes-assembly-4.0.jar!\org\apache\ctakes\lvg\data\config\lvg.properties'.
Use -x option to specify the config file path.
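The exception itself is expected: a jar:file:...!/... string is a URL naming an entry inside a zip archive, not a path on disk, so any code that opens it as an ordinary file (as java.io.FileInputStream does) will fail. As an illustration only (this is not cTAKES or LVG code, and read_jar_entry is a hypothetical name), the Python sketch below shows what reading such an entry actually involves: open the jar as a zip archive and read the member by name.

```python
import zipfile
from urllib.request import url2pathname


def read_jar_entry(jar_url):
    """Return the bytes of the entry named by a 'jar:file:<jar>!/<entry>' URL.

    Hypothetical helper for illustration; it is not part of cTAKES or LVG.
    """
    if not jar_url.startswith("jar:file:"):
        raise ValueError(f"not a jar:file: URL: {jar_url!r}")
    # The archived error message shows backslashes; normalise them to URL slashes.
    rest = jar_url[len("jar:file:"):].replace("\\", "/")
    jar_part, sep, entry = rest.partition("!/")
    if not sep:
        raise ValueError(f"no '!/' separator in {jar_url!r}")
    # The part before '!/' is a file path in URL form; the part after is a
    # member name inside the zip archive, which no plain file-open call can reach.
    with zipfile.ZipFile(url2pathname(jar_part)) as jar:
        return jar.read(entry)
```

This is also why copying the files out to a real directory, as LvgAnnotator does with /tmp/, sidesteps the error: the LVG configuration code opens the path with FileInputStream, which needs a real file.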

While I likely can't avoid the above scenario without changing cTAKES'
dependencies, I was wondering two things:

1) Is it possible to set LVG_DIR to a non-absolute path
instead of AUTO_MODE and have it function properly?

2) Oddly enough, despite logging this error, cTAKES appears to be running
fine locally. Should I not be concerned about these configuration errors as
they don't seem to be impacting anything? Is there a downstream way I can
check that the properties file is being correctly read? Or is cTAKES
chugging through the default pipeline evidence enough that I need not worry?

Best,

Mike



-- 
Mike Trepanier| Big Data Engineer | MetiStream, Inc. |  m...@metistream.com |
845 - 270 - 3129 (m) | www.metistream.com


Re: Sentence extraction from cTAKES XML output.

2018-03-28 Thread Reed Villanueva
Just looking at what you wrote as the desired output, it looks like you just
want the associated ontology concept text (i.e., in this case input= output="Orthopnea"). Is this correct? Note that for the
annotation mention that you showed (i.e., the SignSymptomMention), the
ref_ontologyConceptArr maps to the UmlsConcept _id value. The documentation
seems sparse on exactly how all of the relationships in the cTAKES XMI output
work, but this relation between annotation mentions and UMLS concept
tags seems to hold across all the other XMIs that I have seen. You
could use this relation to get the UmlsConcept.preferredText output that (I
think) you are looking for by mapping in this way.
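A minimal Python sketch of that mapping, using only the standard library's xml.etree.ElementTree. It assumes that ontologyConceptArr holds space-separated xmi:ids that point either directly at refsem:UmlsConcept elements or at a cas:FSArray whose elements attribute does; the namespace URIs follow the usual UIMA XMI naming convention. SAMPLE_XMI is a made-up, heavily trimmed document rather than real cTAKES output, and mention_preferred_texts is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XMI_ID = "{http://www.omg.org/XMI}id"
FSARRAY = "{http:///uima/cas.ecore}FSArray"
UMLS_CONCEPT = "{http:///org/apache/ctakes/typesystem/type/refsem.ecore}UmlsConcept"

# Invented, trimmed example; real cTAKES XMI carries many more elements/attributes.
SAMPLE_XMI = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
 xmlns:cas="http:///uima/cas.ecore"
 xmlns:textspan="http:///org/apache/ctakes/typesystem/type/textspan.ecore"
 xmlns:textsem="http:///org/apache/ctakes/typesystem/type/textsem.ecore"
 xmlns:refsem="http:///org/apache/ctakes/typesystem/type/refsem.ecore"
 xmi:version="2.0">
 <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text"
  sofaString="Pt. denies pnd. Orthopnea noted."/>
 <textspan:Sentence xmi:id="10" sofa="1" begin="0" end="15"/>
 <textspan:Sentence xmi:id="11" sofa="1" begin="16" end="32"/>
 <textsem:SignSymptomMention xmi:id="20" sofa="1" begin="16" end="25"
  ontologyConceptArr="30"/>
 <refsem:UmlsConcept xmi:id="30" cui="C0085619" preferredText="Orthopnea"/>
</xmi:XMI>"""


def mention_preferred_texts(xmi_text):
    """Map each mention's xmi:id to the preferredText of the concepts it references."""
    root = ET.fromstring(xmi_text)
    # Feature structures are direct children of <xmi:XMI>; index them by xmi:id.
    by_id = {el.get(XMI_ID): el for el in root}

    def concepts(refs):
        for ref in refs.split():
            el = by_id.get(ref)
            if el is None:
                continue
            if el.tag == FSARRAY:
                # Assumed indirection: follow the array's own element ids.
                yield from concepts(el.get("elements", ""))
            elif el.tag == UMLS_CONCEPT:
                yield el

    result = {}
    for el in root:
        refs = el.get("ontologyConceptArr")
        if refs:
            result[el.get(XMI_ID)] = [c.get("preferredText") for c in concepts(refs)]
    return result
```

On the sample, mention_preferred_texts(SAMPLE_XMI) maps mention "20" to ["Orthopnea"].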

I don't know anything about how cTAKES parses sentence
segments, but I notice that the raw text you provide has a lot of period
characters for abbreviations. cTAKES seems to have problems segmenting
these kinds of sentences; e.g., here are the sentence segments I get when
inputting an abbreviation-heavy string into the cTAKES CVD and using
AggregatePlainTextFastUMLSPipeline.xmi:

"
[pt.]
[desc.]
[not having any reason to con't.]
[living;]
[clinical depression.]
"


This could be the reason for some of the weirdness in trying to extract sentence
information from the XMI fields.
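If the sentence annotations are over-split like this, one pragmatic workaround (a suggestion, not something cTAKES does itself) is to widen the window and take the sentence(s) covering a mention together with their neighboring sentences. The Python sketch below assumes you have already pulled the document text and the (begin, end) sentence offsets, sorted by begin, out of the XMI; context_around and its window parameter are hypothetical names.

```python
from bisect import bisect_right


def context_around(text, sentences, begin, end, window=1):
    """Return the sentence(s) covering [begin, end) plus up to `window`
    neighboring sentences on each side.

    `sentences` is a list of (begin, end) offsets sorted by begin.
    """
    starts = [b for b, _ in sentences]
    first = bisect_right(starts, begin) - 1    # last sentence starting at or before the mention
    last = bisect_right(starts, end - 1) - 1   # last sentence starting inside the mention
    if first < 0:
        raise ValueError("mention starts before the first sentence")
    lo = max(0, first - window)
    hi = min(len(sentences) - 1, last + window)
    return text[sentences[lo][0]:sentences[hi][1]]
```

With window=0 this returns just the covering sentence(s); a mention that straddles a bad sentence break still gets both fragments, because the range runs from the first to the last sentence it touches.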

By the way, what are you using to view the XMI? The tags in your images look
different from what I see when running cTAKES; e.g., mine look like:

 id="0" ontologyConceptArr="340" typeID="3" discoveryTechnique="1"
  confidence="0.0" polarity="1" uncertainty="0" conditional="false"
  generic="false" subject="patient" historyOf="0"/>


Hope this helps.


RE: Sentence extraction from cTAKES XML output.

2018-03-28 Thread Yadav, Harish
Hi Reed,

Thanks for responding. Below is an example and the output I am trying to get:

cTAKES gives its output in the form of XML after processing the raw_text
(clinical document). Below are snapshots depicting what I am trying to
extract from the XML:

Step 1
Finding the CUI and the id in the XML (in the snapshot below, the CUI is C0085619
and the id is 39838, marked with a red rectangle).
[image: snapshot for Step 1]

Step 2
Finding the begin and end attributes for the corresponding CUI (in the snapshot
below, begin = 5740 and end = 5749, marked with a red rectangle).
[image: snapshot for Step 2]

Step 3
Finding the begin and end attributes of the sentence containing the corresponding
CUI (in the snapshot below, begin = 5740 and end = 5750, marked with a red rectangle).
[image: snapshot for Step 3]
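Steps 1 to 3 can be sketched in Python with xml.etree.ElementTree. The sketch assumes that ontologyConceptArr lists UmlsConcept xmi:ids directly and that the CAS has a single view; it slices both the mention and its covering sentence out of the sofaString stored in the XMI, because the begin/end offsets are defined against that text. SAMPLE_XMI is an invented, trimmed example and sentences_for_cui is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XMI_ID = "{http://www.omg.org/XMI}id"
SOFA = "{http:///uima/cas.ecore}Sofa"
SENTENCE = "{http:///org/apache/ctakes/typesystem/type/textspan.ecore}Sentence"
UMLS_CONCEPT = "{http:///org/apache/ctakes/typesystem/type/refsem.ecore}UmlsConcept"

# Invented, trimmed example; real cTAKES XMI carries many more elements/attributes.
SAMPLE_XMI = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
 xmlns:cas="http:///uima/cas.ecore"
 xmlns:textspan="http:///org/apache/ctakes/typesystem/type/textspan.ecore"
 xmlns:textsem="http:///org/apache/ctakes/typesystem/type/textsem.ecore"
 xmlns:refsem="http:///org/apache/ctakes/typesystem/type/refsem.ecore"
 xmi:version="2.0">
 <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text"
  sofaString="Pt. denies pnd. Orthopnea noted."/>
 <textspan:Sentence xmi:id="10" sofa="1" begin="0" end="15"/>
 <textspan:Sentence xmi:id="11" sofa="1" begin="16" end="32"/>
 <textsem:SignSymptomMention xmi:id="20" sofa="1" begin="16" end="25"
  ontologyConceptArr="30"/>
 <refsem:UmlsConcept xmi:id="30" cui="C0085619" preferredText="Orthopnea"/>
</xmi:XMI>"""


def sentences_for_cui(xmi_text, cui):
    """Return (mention text, covering sentence text) pairs for one CUI."""
    root = ET.fromstring(xmi_text)
    # Offsets index the sofaString, so slice that rather than a separately read file.
    text = root.find(SOFA).get("sofaString")
    # Step 1: xmi:ids of the UmlsConcept elements carrying the requested CUI.
    concept_ids = {el.get(XMI_ID) for el in root.iter(UMLS_CONCEPT) if el.get("cui") == cui}
    # Step 2: begin/end of every mention whose ontologyConceptArr lists one of those ids.
    mentions = [
        (int(el.get("begin")), int(el.get("end")))
        for el in root
        if concept_ids & set((el.get("ontologyConceptArr") or "").split())
    ]
    # Step 3: Sentence annotations that fully cover each mention.
    sentences = [(int(s.get("begin")), int(s.get("end"))) for s in root.iter(SENTENCE)]
    results = []
    for m_begin, m_end in mentions:
        for s_begin, s_end in sentences:
            if s_begin <= m_begin and m_end <= s_end:
                results.append((text[m_begin:m_end], text[s_begin:s_end]))
    return results
```

On the sample, sentences_for_cui(SAMPLE_XMI, "C0085619") gives [("Orthopnea", "Orthopnea noted.")]. A mention that crosses a sentence boundary gets no covering sentence here, which is itself a signal of over-segmentation.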


Now, when I try to get the complete sentence from the raw_text (the clinical
document that was fed as input to cTAKES) where the CUI was tagged, using the
begin and end of the sentence extracted in Step 3 and simply performing
raw_text[5740:5750], I get the output:

OUTPUT: o pnd ort

Instead, I was expecting the complete sentence from the raw_text:
Orthopnea (since the CUI corresponds to Orthopnea, the tagged sentence
should contain the tagged concept as well, i.e., Orthopnea).

Below is a snippet of the raw_text in which I have marked the sentence with a
red rectangle; it yields “o pnd. ort” instead of “orthopnea”:

[image: raw_text snippet with the sentence marked]
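The begin/end offsets cTAKES writes index the document text held in the CAS (the sofaString attribute of the cas:Sofa element in the XMI), so raw_text[begin:end] only works if raw_text is character-for-character identical to that text. Here the slice lands a few characters before "orthopnea", which suggests raw_text contains extra characters earlier in the document. One common cause, offered only as a guess, is a difference in line endings (\r\n versus \n) or in how the file was decoded; in Python, open(path, newline="") keeps \r\n intact, while the default text mode turns it into \n. UIMA offsets also count Java UTF-16 code units, which match Python string indices only when the text has no characters outside the Basic Multilingual Plane. The hypothetical helpers below find the first position where the two texts diverge.

```python
def first_mismatch(a, b):
    """Index of the first position where strings a and b differ, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) == len(b):
        return None
    return min(len(a), len(b))  # one string is a prefix of the other


def check_alignment(raw_text, sofa_text, begin, end):
    """Slice the same span from both texts and report where they first diverge."""
    return raw_text[begin:end], sofa_text[begin:end], first_mismatch(raw_text, sofa_text)


# Invented example: the same note with a Windows line break versus a Unix one.
sofa_text = "line one\nOrthopnea noted."
raw_text = "line one\r\nOrthopnea noted."
print(check_alignment(raw_text, sofa_text, 9, 18))
```

Here the single extra "\r" at index 8 shifts every later offset by one, so the raw_text slice comes out as "\nOrthopne" while the sofaString slice is "Orthopnea". The simplest fix is usually to slice the sofaString from the XMI itself rather than a separately loaded copy of the document.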

Please let me know if you have any queries regarding the example or the output 
I am trying to get.

Regards,
Harish.



From: Reed Villanueva [mailto:villanuevar...@gmail.com]
Sent: Wednesday, March 28, 2018 12:35 AM
To: user@ctakes.apache.org
Subject: Re: Sentence extraction from cTAKES XML output.

Could you provide an example of the problem you are seeing and a bit more
about the kind of output you are trying to end up with?



On Tue, Mar 27, 2018 at 3:33 PM, Yadav, Harish wrote:
Hi All,

I am trying to extract the sentence from the cTAKES XML output by taking the
“begin=5740” and “end=5749” attributes (5740 and 5749 are just one example) of
org.apache.ctakes.typesystem.type.textspan.Sentence and slicing the input text
from character 5740 to 5749, but it turns out that the extracted section is
not the complete sentence and sometimes misses the concept (the CUI's
preferred text) as well.

I am also analyzing the sentences in which the concept is tagged, so I need
them to be complete. Any pointers would be of great help.

Regards,
Harish.