RE: Filter CVD output? [EXTERNAL]

2017-07-19 Thread Lacey A . S .
Thanks again Sean - I now have some nice html files with annotations popping up 
when hovering over them.

"I would like to, in the future, mark up times, lists, and relations.  For now, 
as long as the purpose is displaying mentions to a non-nlper and possibly even 
passing system output to people that don't have specialized readers (e.g. cvd), 
the html writer should be useful for a lot of people."

This would be very interesting, and even the ability to mark up user-defined 
annotations / dictionary items would be great. For example, drug names get 
picked up, but marking up the units, dose, etc. would also be really good.

All the best,

Arron.


Re: Filter CVD output? [EXTERNAL]

2017-07-17 Thread Lacey A . S .
It's a directory! Problem solved. Thanks Sean. And I will try out the 
FileTreeReader in its place...

From: "Finan, Sean" <sean.fi...@childrens.harvard.edu>
Sent: 17 Jul 2017 21:08
To: dev@ctakes.apache.org
Subject: RE: Filter CVD output? [EXTERNAL]

Hi Arron,

The TextReader is a fairly old class - it was written before I joined and I've 
never used it myself.  I don't know why it would claim that it doesn't have 
access.  For files I always use the FileTreeReader.  If I only want to read one 
file I just throw a copy into a directory by itself.

On that note, is "200 letters" a file or a directory?  If it is a directory 
then that is your answer.  TextReader wants a list of individual file names.  
If it gets a directory name then it doesn't handle the matter gracefully; it 
just throws an exception and fails.
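
The mix-up described here (a directory handed to a reader that expects individual files) can be caught before a pipeline is launched. What follows is a minimal Python sketch, not part of cTAKES; the function name `classify_input` and the returned hint strings are hypothetical and only illustrate the file-versus-directory check.

```python
import tempfile
from pathlib import Path


def classify_input(path_str):
    """Return a short hint about which kind of reader a path suits.

    Illustrative helper only; it is not cTAKES code.
    """
    path = Path(path_str)
    if path.is_dir():
        # A directory suits a directory-walking reader such as FileTreeReader.
        return "directory: use a directory reader"
    if path.is_file():
        # A single file is what a file-list reader such as TextReader expects.
        return "file: a file-list reader can open it"
    return "missing: path does not exist"


if __name__ == "__main__":
    # Build a throwaway directory and file so the check can be demonstrated.
    with tempfile.TemporaryDirectory() as tmp:
        letters = Path(tmp) / "200 letters"
        letters.mkdir()
        note = letters / "note.txt"
        note.write_text("example", encoding="utf-8")
        print(classify_input(str(letters)))
        print(classify_input(str(note)))
```

Running such a check on the value given to `files=` would show straight away whether the path is a directory rather than a file.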

Sean


-Original Message-
From: Lacey A.S. [mailto:a.s.la...@swansea.ac.uk]
Sent: Monday, July 17, 2017 3:41 PM
To: dev@ctakes.apache.org
Subject: RE: Filter CVD output? [EXTERNAL]

Thanks Sean - I am finally getting somewhere now. I am able to run the 
following .piper using runPiperFile.bat

// Commands and parameters to create a default plaintext document processing pipeline with UMLS lookup

//  Text Files Reader
//  Reads document texts from text files specified in a provided list.
#   files  The text files to be loaded
reader org.apache.ctakes.core.cr.TextReader files="C:\Users\arron\Downloads\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\projects\200 letters"

// Load a simple token processing pipeline from another pipeline file
load DefaultTokenizerPipeline.piper

// Add non-core annotators
add ContextDependentTokenizerAnnotator
addDescription POSTagger

// Add Chunkers
load ChunkerSubPipe.piper

// Default fast dictionary lookup
add DefaultJCasTermAnnotator

// Add Cleartk Entity Attribute annotators
load AttributeCleartkSubPipe.piper

//  Pretty Text Writer
//  Writes text files with document text and simple markups (POS, Semantic Group, CUI, Negation).
#   OutputDirectory  Directory for all output files.
#   SubDirectory  SubDirectory for files.
add org.apache.ctakes.core.cc.pretty.plaintext.PrettyTextWriterFit OutputDirectory="C:\Users\arron\Downloads\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\user_pipelines\test_output"

However I run into a permissions issue on my own filestore (?!)

Loading configuration.
Loading feature templates.
Loading lexica.
Loading model:
Loading model:
.
17 Jul 2017 20:36:01 ERROR PiperFileRunner - C:\Users\arron\Downloads\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\projects\200 letters (Access is denied)

I've also tried running the batch file as administrator but still the same. Do 
you have any ideas?

Thanks,

Arron.






RE: Filter CVD output? [EXTERNAL]

2017-07-17 Thread Finan, Sean
Hi Arron,

In your version of the clinical pipeline gui you just need to set the value of 
OutputDirectory:

add org.apache.ctakes.core.cc.pretty.html.HtmlTextWriter OutputDirectory=/my/directory

In the pipeline creator gui you should be able to click the button with a 
folder  icon to the right of "OutputDirectory" in the central table and use a 
file browser.  Or you can edit the piper manually (far right panel).

I am not sure why the piper validates.  If OutputDirectory is not set then the 
validator should claim that the piper is not valid, so this is probably a bug 
in validation.

If you think that the piper is valid then you can save it and then try to run 
it via command line with the bin/runPiperFile script in ctakes-distribution or 
via the PiperFileRunner class in core.  See near the bottom of 
https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
More on that later.

Are you using the 4.0 release or trunk?  I ask for two reasons:
- The latest HtmlTextWriter in trunk is much better than that in the 4.0 release
- Trunk contains the PiperRunnerGui in org.apache.ctakes.gui.pipeline

I advise that you use ctakes trunk.

The PiperRunnerGui does two things for you:
- It makes setting command-line parameters easy
- It allows you to save command-line parameters so that you don't need to hard 
code things like OutputDirectory into your piper file.
Check 
https://cwiki.apache.org/confluence/display/CTAKES/Piper+File+Submitter+GUI
More on that later.

The default clinical pipeline piper actually is a complete end-to-end pipeline. 
 I am to blame for absent documentation.  I should probably have more detailed 
information on the page on the default pipeline itself 
https://cwiki.apache.org/confluence/display/CTAKES/Default+Clinical+Pipeline
And maybe piper defaults for all pipers on 
https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files

If no reader is specified but InputDirectory is set, then the FileTreeReader is 
used by default.  
If a value is specified for the "--xmiOut" command-line parameter then the 
FileTreeXmiWriter is used.  InputDirectory can be set using -i on the command 
line. 
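
To make the command-line options above concrete, here is a minimal Python sketch that assembles an argument list for the runPiperFile script. The `-i` and `--xmiOut` options are the ones named in this message; the `-p` option for the piper file path, the helper name `build_piper_args` and the example paths are assumptions made for illustration and should be checked against the Piper Files wiki page.

```python
def build_piper_args(piper_file, input_dir=None, xmi_out=None):
    """Assemble command-line arguments for a piper run.

    Illustrative helper only; "-p" is an assumed option name for the piper file.
    """
    args = ["-p", piper_file]
    if input_dir is not None:
        # With InputDirectory set and no reader in the piper,
        # the message says FileTreeReader is used by default.
        args += ["-i", input_dir]
    if xmi_out is not None:
        # With --xmiOut set, the message says FileTreeXmiWriter is used.
        args += ["--xmiOut", xmi_out]
    return args


if __name__ == "__main__":
    # Hypothetical file and directory names.
    print(" ".join(build_piper_args("my.piper", "notes_in", "xmi_out")))
```

The resulting list could then be appended to a call to bin/runPiperFile, for example through `subprocess.run`, instead of hard-coding directories in the piper file.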

The piper file submitter gui will read your piper file and provide text boxes 
for all available "cli" options, including those that are custom for the piper 
file.  This always includes the options for the default clinical pipeline even 
if they aren't necessary.  That is to say that --xmiOut will be available to 
set but you don't need to do so.  Ditto for OutputDirectory, Umls user/pass, 
etc.  They are always there for convenience as those are standard options.  So, 
you don't need to set OutputDirectory in your version of the clinical pipeline. 
 Just use the gui and set it on the gui.  You can save and reload your option 
values if you plan to keep using the same values.  It is basically a pretty 
equivalent to what could otherwise be done with the runPiperFile script or 
PiperFileRunner class.

As for example complete pipeline piper files, you can find some in 
ctakes-examples-res org.apache.ctakes.examples.pipeline:
HelloWorld.piper
HelloWorldAssertProps.piper
HelloWorldCui.piper
HelloWorldProps.piper
HelloWorldTkProps.piper
ProcessDir.piper

The HelloWorld pipers have launch classes in ctakes-examples 
org.apache.ctakes.examples.pipeline that simply provide a string of text for 
processing.
The ProcessDir piper is more independent and uses the readFiles command to 
process a directory tree of example notes.

I hope that covers all of your questions, but let me know if anything is 
terribly unclear.  This is a good indication that I need to improve the 
documentation.

Sean


-Original Message-
From: Lacey A.S. [mailto:a.s.la...@swansea.ac.uk] 
Sent: Monday, July 17, 2017 11:21 AM
To: dev@ctakes.apache.org
Subject: RE: Filter CVD output? [EXTERNAL]

Hi Sean - thanks for such a quick reply.

This sounds interesting and something that would help me convey what has been 
found to non-nlpers. I do all of my processing just through CVD / CPE using the 
fastUMLSProcessor. So using the nice pipeline creator GUI I have got this far 
(by importing the existing /clinical/pipeline/DefaultFastPipeline.piper):

// Commands and parameters to create a default plaintext document processing pipeline with UMLS lookup

// Load a simple token processing pipeline from another pipeline file

#   files  The text files to be loaded
reader org.apache.ctakes.core.cr.TextReader files="C:\Users\arron\Documents\ 200 letters\Epi_Let192.docx"

load DefaultTokenizerPipeline.piper

// Add non-core annotators
add ContextDependentTokenizerAnnotator
addDescription POSTagger

// Add Chunkers
load ChunkerSubPipe.piper

// Default fast dictionary lookup
add DefaultJCasTermAnnotator

// Add Cleartk Entity Attribute annotators
load AttributeCleartkSubPipe.piper

//  HTML Writer
//  Writes html files with document text and simple markups (Semantic Group, CUI, Negation).
#   OutputD

RE: Filter CVD output? [EXTERNAL]

2017-07-17 Thread Finan, Sean
Hi A.S.,

If you are interested in showing medical terms discovered in text to 
non-nlpers, you could try adding the html writer to your pipeline.

ctakes-core org/apache/ctakes/core/cc/pretty/html/HtmlTextWriter.java

It creates an html file that displays the document text marked with green, red, 
yellow and orange underlines for affirmed, negated, uncertain and 
uncertain-negated medical terms, respectively.  These would be the typical 
anatomical site, sign/symptom, disease/disorder, medication and procedure 
mentions.  Tooltips appear over the text indicating the semantic type.  You can 
click on the mention and marked-up details will be displayed on the right with 
polarity, semantic type, cui, document text and preferred text.  Overlapping 
terms are also handled by the tooltips and details panel.

The document title (usually filename) is a header at the top of the document, 
and section headers are displayed larger and normalized.  They are also 
clickable.  This of course requires a sectionizer in the pipeline.  The html 
file is named after the document name.  html files are saved in a location 
indicated by the parameter "OutputDirectory".

I would like to, in the future, mark up times, lists, and relations.  For now, 
as long as the purpose is displaying mentions to a non-nlper and possibly even 
passing system output to people that don't have specialized readers (e.g. cvd), 
the html writer should be useful for a lot of people.
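
The colour scheme described above can be pictured with a short Python sketch. This is not the HtmlTextWriter implementation and does not reproduce its actual markup; the names `UNDERLINE_COLOURS` and `mark_up`, the inline style and the use of a `title` attribute for the tooltip are all assumptions chosen for illustration.

```python
from html import escape

# (negated, uncertain) -> underline colour, following the description above.
UNDERLINE_COLOURS = {
    (False, False): "green",   # affirmed
    (True, False): "red",      # negated
    (False, True): "yellow",   # uncertain
    (True, True): "orange",    # uncertain and negated
}


def mark_up(text, semantic_type, negated=False, uncertain=False):
    """Wrap a mention in a span with a coloured underline and a tooltip.

    Illustrative only; the real writer's html will differ.
    """
    colour = UNDERLINE_COLOURS[(negated, uncertain)]
    return (
        f'<span style="border-bottom: 2px solid {colour}" '
        f'title="{escape(semantic_type, quote=True)}">{escape(text)}</span>'
    )


if __name__ == "__main__":
    print(mark_up("aspirin", "Medication"))
    print(mark_up("fever", "Sign/Symptom", negated=True))
```

A browser shows the `title` text when hovering over the span, which is one simple way to get a tooltip with the semantic type.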

Sean

-Original Message-
From: Kean Kaufmann [mailto:k...@recordsone.com] 
Sent: Monday, July 17, 2017 9:30 AM
To: dev@ctakes.apache.org
Subject: Re: Filter CVD output? [EXTERNAL]

Hi A.S.,

Does the "Show Selected Annotations" menu item serve your purposes?
https://uima.apache.org/d/uimaj-current/tools.html#cvd.toolsMenu
 



On Mon, Jul 17, 2017 at 4:31 AM, Lacey A.S. <a.s.la...@swansea.ac.uk> wrote:

> Hi - I spend a lot of time showing doctors the output of cTAKES via 
> what I have parsed during post processing. The problem is that there is no 
> visual context showing where in the letter each term has been pulled from.
>
> It would be great if I could sit down, run a letter through the CVD 
> program and filter the output to just medical mentions.