Re: Post co-ordinated SNOMED-CT with AggregatePlaintextFastUMLSProcessor

2016-10-23 Thread Lacey A . S .
Hi Pete and Azad (good to hear from you Azad - I hope you enjoyed the 
conference in Swansea).

Some excellent advice, thank you, and plenty to explore. I was going to get to 
work on some complicated XPath queries, but I'll try what you have both 
recommended first.

All the best,

Arron Lacey
Research Analyst
SAIL Databank
Swansea Neuroscience Research Group
01792 602023
a.s.la...@swansea.ac.uk

From: Azad Dehghan 
Sent: 23 Oct 2016 08:25
To: dev@ctakes.apache.org
Subject: Re: Post co-ordinated SNOMED-CT with 
AggregatePlaintextFastUMLSProcessor

Arron,

I would likewise follow Peter's approach, but a more flexible and
recommended approach would be to use the RUTA rule-based language:
https://uima.apache.org/ruta.html

Best wishes,
Azad

On 21 Oct 2016 21:18, "Abramowitsch, Peter" 
wrote:
>
> While it is doable, it will need some non-trivial post-processing. The
> approach I suggest below is just an example; there are many ways to
> achieve this, but there is no silver bullet.
>
> To do something like that I suggest incorporating a TokensRegex analysis
> engine in your pipeline.  I have had a lot of success with
> https://github.com/JuleStar/uima-tokens-regex
>
> These allow you to combine standard string-based regex with expressions on
> properties of Annotations - a MetaRegex. They allow you to choose the
> AnnotationType you prefer to operate with. (Stanford's TokensRegex for
> CoreNLP is even more powerful.)
>
> Write TokensRegex rules that look for ConllDep nodes whose text is like
> clinic/visit/specialist/referral (whatever you are searching for), and
> assign a unique tag to that token. Let's say you name the tag CLINIC.
> It's basically a custom NER.
>
> Output your CAS object and start processing here:
>
> Scan the ConllDep tokens of your document looking for one with the new tag
> CLINIC
>
> If you find one, now find the sentence boundary around this token, using
> the Sentence annotations.
>
> Then use the POS attribute of all the ConllDep tokens within that sentence
> boundary to look for a modifier token (POS=JJ) of the token (POS=NN) that
> you tagged.
>
> Now look through the DiseaseDisorderMentions and ProcedureMentions for a
> mention whose offsets match the offsets of your JJ ConllDep token. If you
> have a hit, then you can use it to find the core SNOMED code for Headache
> Clinic, Epilepsy Clinic, Dialysis Clinic, etc. Once you have this, you
> will need to manually add the post-coordinations to the SNOMED reference
> pointed to by the "(Disease|Procedure)Mention" annotation. You can elaborate
> on this theme to capture more complex cases where the modifier is expressed
> differently or is not adjacent to the "CLINIC" token.
>
> I created a framework in Ruby to post-process a CAS in this way, although
> I never went as far as generating SNOMED modifiers as they weren't needed
> in my case. If not Ruby, use some other language that allows efficient
> manipulation of complex data structures in very few lines of code.
> Otherwise it will get ugly fast.
>
>
> On 10/21/16, 3:03 AM, "Finan, Sean" 
> wrote:
>
> >Hi Arron,
> >
> >cTAKES discovers text words and phrases by lookup using a subset of the
> >UMLS (https://uts.nlm.nih.gov/home.html). cTAKES then assigns a code to
> >everything that it finds.
> >
> >While you can employ various workarounds to remove "epilepsy" when it
> >appears within "epilepsy clinic", these are not part of the standard
> >cTAKES distribution or workflow.
> >
> >Sean
> >
> >-Original Message-
> >From: Lacey A.S. [mailto:a.s.la...@swansea.ac.uk]
> >Sent: Thursday, October 20, 2016 6:56 PM
> >To: dev@ctakes.apache.org
> >Subject: Post co-ordinated SNOMED-CT with
> >AggregatePlaintextFastUMLSProcessor
> >
> >Hi,
> >
> >Just wondering if someone could point me in the direction of how cTAKES
> >produces post-coordinated SNOMED-CT? Using the
> >AggregatePlaintextFastUMLSProcessor the individual concepts come out
> >nicely; however, if you take the phrase "I went to the
> >Epilepsy Clinic", I can't see how the final post-coordinated SNOMED
> >concepts are formed. It appears I have a list of sub-concepts
> >(pre-coordinated) that includes the disorder epilepsy (which merely going
> >to the clinic would not confirm).
> >
> >Any help would be great, thanks - enjoying working with cTAKES and hoping
> >to include it in an NLP paper on some UK healthcare data.
> >
> >Arron Lacey
> >Research Analyst
> >SAIL Databank
> >Swansea Neuroscience Research Group
> >01792 602023
> >a.s.la...@swansea.ac.uk
> >
>
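Peter's recipe can be sketched in Java roughly as follows. This is a minimal, hypothetical illustration, not cTAKES or uima-tokens-regex code: the Tok, Span and Mention records are stand-ins for ConllDep tokens, Sentence annotations and Disease/Procedure mentions; a plain java.util.regex pattern stands in for a TokensRegex rule; only the simple case of a modifier directly before the CLINIC token is handled; and building the post-coordinated SNOMED expression itself is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical, simplified stand-ins for cTAKES/UIMA annotations (not the real type system).
record Tok(String text, int begin, int end, String pos, String tag) {}
record Span(int begin, int end) {}                              // a Sentence annotation
record Mention(String type, String code, int begin, int end) {} // e.g. a DiseaseDisorderMention
record ClinicModifier(Tok clinic, Mention modifier) {}

final class ClinicPostProcessor {
    // Step 1: plain-regex stand-in for a TokensRegex rule that tags cue words as CLINIC.
    private static final Pattern CLINIC_CUE =
            Pattern.compile("(?i)(clinic|visit|specialist|referral)s?");
    // Peter suggests POS=JJ; a noun compound such as "Epilepsy Clinic" may be tagged NN/NNP instead.
    private static final Set<String> MODIFIER_POS = Set.of("JJ", "NN", "NNP");

    // Returns a copy of the tokens in which every cue word carries the tag "CLINIC".
    static List<Tok> tagClinics(List<Tok> tokens) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : tokens) {
            boolean cue = CLINIC_CUE.matcher(t.text()).matches();
            out.add(cue ? new Tok(t.text(), t.begin(), t.end(), t.pos(), "CLINIC") : t);
        }
        return out;
    }

    // Steps 2-4: for each CLINIC token, take the token just before it (the adjacent case only),
    // require it to share a sentence with the CLINIC token and to have a modifier POS, then pair
    // it with every mention whose offsets match exactly. Tokens are assumed sorted by offset.
    static List<ClinicModifier> findModifiers(List<Tok> tokens, List<Span> sentences,
                                              List<Mention> mentions) {
        List<ClinicModifier> hits = new ArrayList<>();
        for (int i = 1; i < tokens.size(); i++) {
            Tok clinic = tokens.get(i);
            Tok candidate = tokens.get(i - 1);
            if (!"CLINIC".equals(clinic.tag())
                    || !MODIFIER_POS.contains(candidate.pos())
                    || !sameSentence(candidate, clinic, sentences)) {
                continue;
            }
            for (Mention m : mentions) {
                if (m.begin() == candidate.begin() && m.end() == candidate.end()) {
                    hits.add(new ClinicModifier(clinic, m));
                }
            }
        }
        return hits;
    }

    private static boolean covers(Span s, Tok t) {
        return s.begin() <= t.begin() && t.end() <= s.end();
    }

    // True when the sentence covering token a also covers token b.
    private static boolean sameSentence(Tok a, Tok b, List<Span> sentences) {
        for (Span s : sentences) {
            if (covers(s, a)) {
                return covers(s, b);
            }
        }
        return false;
    }
}
```

Each ClinicModifier pair is the point at which, as Peter says, the SNOMED post-coordination would have to be added by hand; handling non-adjacent modifiers would mean walking the dependency arcs instead of looking only at the previous token.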

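One of the "various workarounds" Sean alludes to could be a simple post-filter over the lookup results. The sketch below is an assumption of how that might look, not something shipped with cTAKES: the DisorderMention record and ClinicContextFilter class are hypothetical, and it only drops a mention when the word "clinic" directly follows it in the document text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified mention: semantic group plus character offsets into the document text.
record DisorderMention(String group, int begin, int end) {}

final class ClinicContextFilter {
    // Whitespace followed by the whole word "clinic", in any case.
    private static final Pattern FOLLOWED_BY_CLINIC =
            Pattern.compile("\\s+clinic\\b", Pattern.CASE_INSENSITIVE);

    // Drops mentions whose covered text is immediately followed by the word "clinic",
    // e.g. "Epilepsy" in "Epilepsy Clinic"; every other mention is kept unchanged.
    static List<DisorderMention> filter(String docText, List<DisorderMention> mentions) {
        List<DisorderMention> kept = new ArrayList<>();
        for (DisorderMention m : mentions) {
            Matcher matcher = FOLLOWED_BY_CLINIC.matcher(docText);
            matcher.region(m.end(), docText.length()); // start matching right after the mention
            if (!matcher.lookingAt()) {
                kept.add(m);
            }
        }
        return kept;
    }
}
```

This discards information rather than post-coordinating it, so it suits cases where "went to the epilepsy clinic" must simply not count as an epilepsy diagnosis.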

Post co-ordinated SNOMED-CT with AggregatePlaintextFastUMLSProcessor

2016-10-20 Thread Lacey A . S .
Hi,

Just wondering if someone could point me in the direction of how cTAKES 
produces post-coordinated SNOMED-CT? Using the 
AggregatePlaintextFastUMLSProcessor the individual concepts come out 
nicely; however, if you take the phrase "I went to the Epilepsy 
Clinic", I can't see how the final post-coordinated SNOMED concepts are formed. 
It appears I have a list of sub-concepts (pre-coordinated) that includes the 
disorder epilepsy (which merely going to the clinic would not confirm).

Any help would be great thanks - enjoying working with ctakes and hoping to 
include it in an NLP paper on some UK healthcare data.

Arron Lacey
Research Analyst
SAIL Databank
Swansea Neuroscience Research Group
01792 602023
a.s.la...@swansea.ac.uk



Filter CVD output?

2017-07-17 Thread Lacey A . S .
Hi - I spend a lot of time showing doctors the output of cTAKES via what I have 
parsed during post-processing. The problem is that, visually anyway, there is no 
context showing where in the letter each term was pulled from.

It would be great if I could sit down and run a letter through the CVD program 
and filter the output to just medical mentions?

Sent from Nine



Re: Filter CVD output? [EXTERNAL]

2017-07-17 Thread Lacey A . S .
It's a directory! Problem solved. Thanks Sean. And I will try out the 
FileTreeReader in its place...

Sent from Nine

From: "Finan, Sean" 
Sent: 17 Jul 2017 21:08
To: dev@ctakes.apache.org
Subject: RE: Filter CVD output? [EXTERNAL]

Hi Arron,

The TextReader is a fairly old class - it was written before I joined and I've 
never used it myself.  I don't know why it would claim that it doesn't have 
access.  For files I always use the FileTreeReader.  If I only want to read one 
file I just throw a copy into a directory by itself.
On that note, is "200 letters" a file or a directory?  If it is a directory, 
then that is your answer.  TextReader wants a list of individual file names.  
If it gets a directory name, it doesn't handle the matter gracefully; it 
just throws an exception and fails.
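Sean's fix is simply to use FileTreeReader. To make the file-versus-directory distinction concrete, here is a small, hypothetical Java helper (it is not part of cTAKES) that expands a directory into the individual file paths a list-based reader such as TextReader expects. On Windows, trying to open a directory as if it were a file typically fails with the same "(Access is denied)" message quoted in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class InputFiles {
    // Turns a path into the list of individual files a per-file reader expects:
    // a non-directory path is returned as-is; a directory is expanded to the regular
    // files directly inside it (no recursion), sorted by path.
    static List<Path> expand(Path path) {
        if (!Files.isDirectory(path)) {
            return List.of(path);
        }
        try (Stream<Path> entries = Files.list(path)) {
            return entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Skipping subdirectories is a deliberate simplification; FileTreeReader, as its name suggests, is the tool to reach for when letters are spread across a directory tree.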

Sean


-Original Message-
From: Lacey A.S. [mailto:a.s.la...@swansea.ac.uk]
Sent: Monday, July 17, 2017 3:41 PM
To: dev@ctakes.apache.org
Subject: RE: Filter CVD output? [EXTERNAL]

Thanks Sean - I am finally getting somewhere now. I am able to run the 
following .piper using runPiperFile.bat

// Commands and parameters to create a default plaintext document processing pipeline with UMLS lookup

//  Text Files Reader
//  Reads document texts from text files specified in a provided list.
#   files  The text files to be loaded
reader org.apache.ctakes.core.cr.TextReader files="C:\Users\arron\Downloads\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\projects\200 letters"

// Load a simple token processing pipeline from another pipeline file
load DefaultTokenizerPipeline.piper

// Add non-core annotators
add ContextDependentTokenizerAnnotator
addDescription POSTagger

// Add Chunkers
load ChunkerSubPipe.piper

// Default fast dictionary lookup
add DefaultJCasTermAnnotator

// Add Cleartk Entity Attribute annotators
load AttributeCleartkSubPipe.piper

//  Pretty Text Writer
//  Writes text files with document text and simple markups (POS, Semantic Group, CUI, Negation).
#   OutputDirectory  Directory for all output files.
#   SubDirectory  SubDirectory for files.
add org.apache.ctakes.core.cc.pretty.plaintext.PrettyTextWriterFit OutputDirectory="C:\Users\arron\Downloads\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\user_pipelines\test_output"

However, I run into a permissions issue on my own filestore (?!):

Loading configuration.
Loading feature templates.
Loading lexica.
Loading model:
Loading model:
.
17 Jul 2017 20:36:01 ERROR PiperFileRunner - C:\Users\arron\Downloads\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\projects\200 letters (Access is denied)


I've also tried running the batch file as administrator, but I get the same 
error. Do you have any ideas?



Thanks,



Arron.





-Original Message-

From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]

Sent: 17 July 2017 17:32

To: dev@ctakes.apache.org

Subject: RE: Filter CVD output? [EXTERNAL]



Hi Arron,



In your version of the clinical pipeline GUI you just need to set the value of 
OutputDirectory:

add org.apache.ctakes.core.cc.pretty.html.HtmlTextWriter OutputDirectory=/my/directory

In the pipeline creator GUI you should be able to click the button with a 
folder icon to the right of "OutputDirectory" in the central table and use a 
file browser.  Or you can edit the piper manually (far right panel).

I am not sure why the piper validates.  If OutputDirectory is not set, then 
validation should report that the piper is not valid, so this is probably a 
bug in validation.
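As a rough illustration of the kind of check Sean expects validation to make, the hypothetical sketch below (it is not the Piper validator in cTAKES) scans piper lines and reports writer lines that lack OutputDirectory. Treating any `add` class whose name ends in Writer or WriterFit as a writer, and letting a global `set OutputDirectory=...` line satisfy every writer, are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class PiperOutputCheck {
    // Returns the 1-based line numbers of writer lines that have no OutputDirectory value.
    // Heuristics (assumptions): a "writer" is any "add" line whose class name ends in
    // "Writer" or "WriterFit"; a global "set OutputDirectory=..." line satisfies all writers.
    static List<Integer> writersMissingOutputDirectory(List<String> piperLines) {
        boolean globalSet = piperLines.stream()
                .map(String::trim)
                .anyMatch(l -> l.startsWith("set ") && l.contains("OutputDirectory="));
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < piperLines.size(); i++) {
            String line = piperLines.get(i);
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 2 || !parts[0].equals("add")) {
                continue; // not an "add <class> ..." command
            }
            String cls = parts[1];
            boolean isWriter = cls.endsWith("Writer") || cls.endsWith("WriterFit");
            boolean hasParam = line.contains("OutputDirectory=");
            if (isWriter && !hasParam && !globalSet) {
                missing.add(i + 1);
            }
        }
        return missing;
    }
}
```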



If you think that the piper is valid then you can save it and then try to run 
via command line with the bin/runPiperFile script in ctakes-distribution or via 
the PiperFileRunner class in core.  See the near-bottom of 

RE: Filter CVD output? [EXTERNAL]

2017-07-19 Thread Lacey A . S .
Thanks again Sean - I now have some nice html files with annotations popping up 
when hovering over them.

"I would like to, in the future, mark up times, lists, and relations.  For now, 
as long as the purpose is displaying mentions to a non-nlper and possibly even 
passing system output to people that don't have specialized readers (e.g. cvd), 
the html writer should be useful for a lot of people."

This would be very interesting, and even the ability to mark up user-defined 
annotations / dictionary items would be great. For example, drug names get 
picked up, but marking up the units, dose, etc. would also be really good.

All the best,

Arron.

-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: 17 July 2017 15:01
To: dev@ctakes.apache.org
Subject: RE: Filter CVD output? [EXTERNAL]

Hi A.S.,

If you are interested in showing medical terms discovered in text to 
non-nlpers, you could try adding the html writer to your pipeline.

ctakes-core org/apache/ctakes/core/cc/pretty/html/HtmlTextWriter.java

It creates an html file that displays the document text marked with green, red, 
yellow and orange underlines for affirmed, negated, uncertain, 
uncertain-negated medical terms.  These would be the typical anatomical site, 
sign/symptom, disease/disorder, medication, procedure mentions.  Tooltips 
appear over the text indicating the semantic type.  You can click on the 
mention and marked-up details will be displayed on the right with polarity, 
semantic type, cui, document text and preferred text.  Overlapping terms are 
also handled by the tooltips and details panel.
The document title (usually filename) is a header at the top of the document, 
and section headers are displayed larger and normalized.  They are also 
clickable.  This of course requires a sectionizer in the pipeline.  The html 
file is named after the document name.  html files are saved in a location 
indicated by the parameter "OutputDirectory".
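The underline colors described above amount to a small decision table over two attributes. A minimal sketch, assuming cTAKES' usual convention that polarity is -1 for a negated mention and uncertainty is 1 for an uncertain one; the class and method names are hypothetical and this is not HtmlTextWriter's actual code:

```java
final class UnderlineColor {
    // Maps a mention's polarity and uncertainty to the underline color scheme described above:
    // affirmed = green, negated = red, uncertain = yellow, uncertain and negated = orange.
    static String forMention(int polarity, int uncertainty) {
        boolean negated = polarity < 0;      // assumed: -1 means negated
        boolean uncertain = uncertainty > 0; // assumed: 1 means uncertain
        if (negated && uncertain) {
            return "orange";
        }
        if (negated) {
            return "red";
        }
        if (uncertain) {
            return "yellow";
        }
        return "green";
    }
}
```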
I would like to, in the future, mark up times, lists, and relations.  For now, 
as long as the purpose is displaying mentions to a non-nlper and possibly even 
passing system output to people that don't have specialized readers (e.g. cvd), 
the html writer should be useful for a lot of people.

Sean

-Original Message-
From: Kean Kaufmann [mailto:k...@recordsone.com]
Sent: Monday, July 17, 2017 9:30 AM
To: dev@ctakes.apache.org
Subject: Re: Filter CVD output? [EXTERNAL]

Hi A.S.,

Does the "Show Selected Annotations" menu item serve your purposes?
https://uima.apache.org/d/uimaj-current/tools.html#cvd.toolsMenu



On Mon, Jul 17, 2017 at 4:31 AM, Lacey A.S.  wrote:

> Hi - I spend a lot of time showing doctors the output of cTAKES via 
> what I have parsed during post-processing. The problem is that, visually 
> anyway, there is no context showing where in the letter each term was 
> pulled from.
>
> It would be great if I could sit down and run a letter through the CVD 
> program and filter the output to just medical mentions?
>
> Sent from Nine
>
>


RE: Filtering disease term precisely from EventMention and loading MedDRA library for cTAKES

2017-09-11 Thread Lacey A . S .
Hi Gandhi -

I think you may just have to gather all Snomed codes of interest, and then 
select only those terms that match.

SNOMED CT has a constraint language, the Expression Constraint Language (ECL) 
(https://confluence.ihtsdotools.org/display/DOCECL/A.1+Simple+Expression+Constraints+-+Valid+Expressions), 
that you can use to collect SNOMED codes that fall under a concept.

An implementation of this is here:

https://github.com/slaverman/SnoLyze
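The "gather all codes under a concept, then keep matching terms" idea can be sketched without any ECL engine. The example below is a hypothetical illustration, not SnoLyze: it computes descendant-or-self (what ECL writes as `<< concept`) over a child-to-parents is-a map, then filters mention codes against that set. The concept identifiers in the usage test are placeholders; real data would come from the is-a relationships in a SNOMED CT release.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SnomedSubsumption {
    // Collects a concept plus all of its descendants from a child -> parents is-a map.
    static Set<String> descendantsOrSelf(String root, Map<String, Set<String>> parentsOf) {
        // Invert the map to parent -> children so we can walk downwards.
        Map<String, Set<String>> children = new HashMap<>();
        parentsOf.forEach((child, parents) -> {
            for (String p : parents) {
                children.computeIfAbsent(p, k -> new HashSet<>()).add(child);
            }
        });
        Set<String> result = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(List.of(root));
        while (!todo.isEmpty()) {
            String concept = todo.pop();
            if (result.add(concept)) { // visit each concept once, even with multiple parents
                todo.addAll(children.getOrDefault(concept, Set.of()));
            }
        }
        return result;
    }

    // Keeps only the mention codes that fall inside the code set of interest, preserving order.
    static List<String> keepMatching(List<String> mentionCodes, Set<String> codesOfInterest) {
        List<String> kept = new ArrayList<>();
        for (String code : mentionCodes) {
            if (codesOfInterest.contains(code)) {
                kept.add(code);
            }
        }
        return kept;
    }
}
```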

hope that helps.

Arron



-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com] 
Sent: 11 September 2017 9:17 AM
To: dev@ctakes.apache.org
Subject: RE: Filtering disease term precisely from EventMention and loading 
MedDRA library for cTAKES

Hi All, Could someone please throw some light on this and help me out?

Regards,
Gandhi


-Original Message-
From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
Sent: Saturday, September 09, 2017 11:52 PM
To: dev@ctakes.apache.org
Subject: Filtering disease term precisely from EventMention and loading MedDRA 
library for cTAKES

Hi All, we have deployed the temporal demo application available at 
https://github.com/healthnlp/examples/tree/master/ctakes-temporal-demo locally, 
referring to dictionaries (RxNorm, SNOMED) loaded in a MySQL DB. We are trying 
to extract disease terms using this application.



When we tried out the text mentioned in the user installation guide, "Dr. 
Nutritious Medical Nutrition Therapy for Hyperlipidemia Referral from: Julie 
Tester, RD, LD, CNSD Phone contact: (555) 555-1212 Height: 144 cm Current 
Weight: 45 kg Date of current weight: 02-29-2001 Admit Weight: 53 kg BMI: 18 
kg/m2 Diet: General Daily Calorie needs (kcals): 1500 calories, assessed as HB 
+ 20% for activity. Daily Protein needs: 40 grams, assessed as 1.0 g/kg. Pt has 
been on a 3-day calorie count and has had an average intake of 1100 calories. 
She was instructed to drink 2-3 cans of liquid supplement to help promote 
weight gain. She agrees with the plan and has my number for further assessment. 
May want a Resting Metabolic Rate as well. She takes an aspirin a day for knee 
pain", it extracted 'Hyperlipidemia' and 'plan' as DiseaseDisorderMentions. 
We expected only disease terms (Hyperlipidemia) to be extracted; "plan" is not 
what we expected.



How do we avoid this, or filter out only disease terms like fever, red eye, 
nausea, etc. from the given text? Any help on this is greatly appreciated. Also, 
please let us know whether there is a provision to load MedDRA dictionaries and 
look them up in cTAKES. If so, how can we achieve it?



Thanks in advance

Regards,
Gandhi

This email and any files transmitted with it are confidential and intended 
solely for the use of the individual or entity to whom they are addressed. If 
you are not the named addressee you should not disseminate, distribute or copy 
this e-mail. Please notify the sender or system manager by email immediately if 
you have received this e-mail by mistake and delete this e-mail from your 
system. If you are not the intended recipient you are notified that disclosing, 
copying, distributing or taking any action in reliance on the contents of this 
information is strictly prohibited and against the law.