Re: Post co-ordinated SNOMED-CT with AggregatePlaintextFastUMLSProcessor

2016-10-23 Thread Lacey A . S .
Hi Pete and Azad (good to hear from you Azad - I hope you enjoyed the 
conference in Swansea).

Some excellent advice, thank you, and plenty to explore. I was going to get to 
work on some complicated XPath queries, but I'll try what you have both 
recommended first.

All the best,

Arron Lacey
Research Analyst
SAIL Databank
Swansea Neuroscience Research Group
01792 602023
a.s.la...@swansea.ac.uk

From: Azad Dehghan 
Sent: 23 Oct 2016 08:25
To: dev@ctakes.apache.org
Subject: Re: Post co-ordinated SNOMED-CT with 
AggregatePlaintextFastUMLSProcessor

Arron,

I would likewise follow Peter's approach, but the more flexible and
recommended approach would be to use RUTA rule-based language:
https://uima.apache.org/ruta.html

Best wishes,
Azad

On 21 Oct 2016 21:18, "Abramowitsch, Peter" 
wrote:
>
> While it is doable, it will need some non-trivial post processing. The
> approach I suggest below is just an example; there are many ways to
> achieve this, but there is no silver bullet.
>
> To do something like that I suggest incorporating a TokensRegex analysis
> engine in your pipeline.  I have had a lot of success with
> https://github.com/JuleStar/uima-tokens-regex
>
> These allow you to combine standard string based Regex with expressions on
> properties of Annotations - a MetaRegex.  They allow you to choose the
> AnnotationType you prefer to operate with.  (Stanford's TokensRegex for
> CoreNLP is even more powerful)
>
> Write TokensRegex rules that look for ConllDep nodes whose text is like
> clinic/visit/specialist/referral (whatever you are searching for) and
> assign a unique tag to that token.  Let's say you name the tag CLINIC.
> It's a custom NER, basically.
>
> Output your CAS object and start processing here:
>
> Scan the ConllDep tokens of your document looking for one with the new tag
> CLINIC
>
> If you find one, find the sentence boundary around this token, using
> the Sentence annotations.
>
> Then use the POS attribute of all the ConllDep tokens within that sentence
> boundary to look for a modifier token (POS=JJ) to the token (POS=NN) that
> you tagged.
>
> Now look through the DiseaseDisorderMentions and ProcedureMentions for a
> token whose offsets match the offsets of your JJ ConllDep token.  If you
> have a hit, then you can use it to find the core SNOMED code for Headache
> Clinic, Epilepsy Clinic, Dialysis Clinic etc.   Once you have this you
> will need to manually add the post coordinations to the SNOMED ref pointed
> to by the "(Disease|Procedure)Mention" token.  You can elaborate on this
> theme to capture more complex cases where the modifier is expressed
> differently or is not adjacent to the "CLINIC" token.
>
> I created a framework in Ruby to post process a CAS in this way, although
> I never went as far as generating SNOMED modifiers as they weren't needed
> in my case.  If not Ruby, use some other language that allows efficient
> manipulation of complex data structures in a very few lines of code.
> Otherwise it will get ugly fast.
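
The post-processing steps described above (find the CLINIC-tagged token, find
its sentence, pick modifier tokens by POS, then match their offsets against
the disease and procedure mentions) can be sketched roughly as follows. This
is only an illustration under stated assumptions: Token and Mention are
hypothetical stand-ins for annotations already exported from the CAS, not
cTAKES or UIMA classes, and "CODE-FOR-EPILEPSY" is a placeholder rather than
a real concept code. The message suggests POS=JJ for the modifier; a tagger
may label "Epilepsy" as a noun instead, so the set of modifier tags is left as
a parameter.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for a ConllDep token exported from the CAS."""
    begin: int
    end: int
    text: str
    pos: str                   # part-of-speech tag, e.g. "JJ" or "NN"
    tag: Optional[str] = None  # custom tag assigned by the regex rules


@dataclass(frozen=True)
class Mention:
    """Hypothetical stand-in for a DiseaseDisorder/Procedure mention."""
    begin: int
    end: int
    kind: str
    code: str                  # placeholder concept code


def inside(begin: int, end: int, span: Tuple[int, int]) -> bool:
    """True if [begin, end) lies within the (begin, end) span."""
    return span[0] <= begin and end <= span[1]


def link_modifiers(
    tokens: Sequence[Token],
    sentences: Sequence[Tuple[int, int]],
    mentions: Sequence[Mention],
    modifier_pos: Sequence[str] = ("JJ",),
) -> List[Tuple[Token, Token, Mention]]:
    """Return (CLINIC token, modifier token, matching mention) triples."""
    links = []
    for head in tokens:
        if head.tag != "CLINIC":
            continue
        # Find the sentence that contains the tagged token.
        sentence = next(
            (s for s in sentences if inside(head.begin, head.end, s)), None
        )
        if sentence is None:
            continue
        for tok in tokens:
            if tok is head or tok.pos not in modifier_pos:
                continue
            if not inside(tok.begin, tok.end, sentence):
                continue
            # Keep mentions whose offsets equal the modifier's offsets.
            for m in mentions:
                if (m.begin, m.end) == (tok.begin, tok.end):
                    links.append((head, tok, m))
    return links


text = "I went to the Epilepsy Clinic."
tokens = [
    Token(14, 22, "Epilepsy", "NNP"),
    Token(23, 29, "Clinic", "NN", tag="CLINIC"),
]
sentences = [(0, len(text))]
mentions = [Mention(14, 22, "DiseaseDisorderMention", "CODE-FOR-EPILEPSY")]

links = link_modifiers(
    tokens, sentences, mentions, modifier_pos=("JJ", "NN", "NNP")
)
for head, modifier, mention in links:
    print(modifier.text, head.text, mention.kind, mention.code)
```

Each triple identifies a mention that is acting as a modifier of the tagged
token; what to do with it (attach a post-coordination, or drop the disorder
reading) is still a manual decision, as the message says.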
>
>
> On 10/21/16, 3:03 AM, "Finan, Sean" 
> wrote:
>
> >Hi Arron,
> >
> >
> >
> >Ctakes discovers text words and phrases by lookup using a subset of the
> >UMLS (https://uts.nlm.nih.gov/home.html). ctakes then assigns a code to
> >everything that it finds.
> >
> >
> >
> >While you can employ various workarounds to remove "epilepsy" when it
> >occurs within "epilepsy clinic", these are not part of the standard ctakes
> >distribution or workflow.
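
One such workaround, sketched here purely as an illustration and not as
anything shipped with ctakes, is to drop a mention when the word immediately
following it is a clinic-like noun. The mentions are assumed to be plain
(begin, end, label) tuples already extracted from the pipeline output, and the
word list (clinic, visit, specialist, referral) is an assumption that would
need tuning for real data.

```python
import re
from typing import List, Tuple

# Assumed list of head nouns that turn a preceding disorder into a modifier.
FOLLOWED_BY_CLINIC = re.compile(
    r"\s+(clinic|visit|specialist|referral)\b", re.IGNORECASE
)

MentionSpan = Tuple[int, int, str]  # (begin, end, label)


def drop_modifier_mentions(
    text: str, mentions: List[MentionSpan]
) -> List[MentionSpan]:
    """Keep only mentions that are not directly followed by a clinic-like word."""
    return [
        m for m in mentions
        if not FOLLOWED_BY_CLINIC.match(text, m[1])  # match starting at m's end
    ]


text = "I went to the Epilepsy Clinic after my epilepsy got worse."
# Stand-in for dictionary lookup output: every occurrence of "epilepsy".
mentions = [
    (m.start(), m.end(), "epilepsy")
    for m in re.finditer(r"epilepsy", text, re.IGNORECASE)
]

kept = drop_modifier_mentions(text, mentions)
for begin, end, label in kept:
    print(begin, end, text[begin:end])
```

A purely lexical filter like this misses cases where the modifier is not
adjacent to the head noun, so it is only a starting point.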
> >
> >
> >
> >Sean
> >
> >
> >
> >-----Original Message-----
> >
> >From: Lacey A.S. [mailto:a.s.la...@swansea.ac.uk]
> >
> >Sent: Thursday, October 20, 2016 6:56 PM
> >
> >To: dev@ctakes.apache.org
> >
> >Subject: Post co-ordinated SNOMED-CT with
> >AggregatePlaintextFastUMLSProcessor
> >
> >
> >
> >Hi,
> >
> >
> >
> >Just wondering if someone could point me in the direction of how ctakes
> >produces post coordinated SNOMED-CT? Using the
> >AggregatePlaintextFastUMLSProcessor the individual concepts come out
> >quite nicely. However, if you take the following phrase, "I went to the
> >Epilepsy Clinic", I can't see how the final post coordinated SNOMED
> >concepts are formed. It appears I have a list of sub concepts
> >(pre-coordinated) that includes the disorder epilepsy (which merely going
> >to the clinic would not confirm).
> >
> >
> >
> >Any help would be great thanks - enjoying working with ctakes and hoping
> >to include it in an NLP paper on some UK healthcare data.
> >
> >
> >
> >Arron Lacey
> >
> >Research Analyst
> >
> >SAIL Databank
> >
> >Swansea Neuroscience Research Group
> >
> >01792 602023
> >
> >a.s.la...@swansea.ac.uk
> >
> >
> >
>


Re: Post co-ordinated SNOMED-CT with AggregatePlaintextFastUMLSProcessor

2016-10-23 Thread Azad Dehghan
Arron,

I would likewise follow Peter's approach, but the more flexible and
recommended approach would be to use RUTA rule-based language:
https://uima.apache.org/ruta.html

Best wishes,
Azad
