from:"John Green"

Re: Apache cTAKES is now on GitHub ! [EXTERNAL] [SUSPICIOUS]

2023-01-03 Thread John Green

Wonderful!

On Tue, Jan 3, 2023 at 17:15 Savova, Guergana
 wrote:

> Fantastic development, thank you very much for making this happen, Sean!
>
> Happy New Year to all.
> --
> Guergana Savova, PhD, FACMI
> Patricia F. Brennan Professor
> Computational Health Informatics Program (CHIP)
> Boston Children's Hospital and Harvard Medical School
>
>
> -Original Message-
> From: Finan, Sean 
> Sent: Friday, December 30, 2022 1:49 PM
> To: dev@ctakes.apache.org; u...@ctakes.apache.org
> Subject: Apache cTAKES is now on GitHub ! [EXTERNAL] [SUSPICIOUS]
>
>
> Hi all,
>
> I am pleased to announce that the cTAKES source code is now on GitHub at
> https://github.com/apache/ctakes
>
> All current and future code development should be performed on the source
> in GitHub.
>
>
> Changes (vs. Subversion Repository)
> ===================================
>
>   *   VERSION:   The project in GitHub has been versioned 5.0.0-SNAPSHOT.
>   *   STRUCTURE:   The project has been slightly restructured at a high
> level.  The typical user should not notice the difference.
>   *   CODE API:   All package, class, method and constant names remain the
> same, so your code should not need to be refactored.
>   *   DEPENDENCIES:   If you include cTAKES modules as dependencies in
> your maven project, you can simply change the version to obtain new
> 5.0.0-SNAPSHOT builds. *
>   *   BINARY PACKAGE:   The binary package has some minor differences, but
> the typical user should not notice them.
>
> * If you use maven dependency exclusions for resource ('-res') modules
> because of unwanted ML models, you need to change the excluded name
> extension from '-res' to '-model'.
>
>
> Moving forward from the Subversion Repository
> =============================================
>
>   *   VERSION:   The project in the SVN repository was versioned
> 4.0.1-SNAPSHOT.
>   *   DEPRECATION:   The code and resources in the 4.0.1-SNAPSHOT
> Subversion (SVN) repository will remain available for checkout, but should
> be considered read-only.  4.0.1-SNAPSHOT built modules will remain
> available for maven dependencies.  All current and future code development
> should be performed on the source in GitHub.
>   *   RELEASE:   There is no cTAKES 4.0.1 release.
>
> Next Anticipated Release
> ========================
>
>   *   VERSION:   As you might guess from the snapshot version change, we
> are gearing up for a version 5.0.0 release.
>   *   WHY 5.0.0:   There are so many new features over cTAKES 4.0.0,
> including completely new modules, that the version number was bumped up.
>   *   DOCUMENTATION:   All of the new toys will be documented in the
> confluence wiki at the time of the 5.0.0 release.
>   *   DATE:   There is no release date yet, but hopefully it will be very
> very soon ...
>
> Happy New Year,
>
> Sean
>
>
>

Re: cTAKES Rest Service Development - Dictionary GUI MySQL Progress + 1 Concern [EXTERNAL]

2017-12-31 Thread John Green

Strong work!  

--- Sent from VMware Boxer

Just wanted to note that I've made a good bit of progress on the GUI
dictionary piece. I'll post some screenshots when it is further along, but
I am definitely seeing the tables in my MySQL database (tested with CPT and
ICD10).

I'll aim to set up CouchDB for v2 of the ctakes-rest-service. Next step is
to point cTAKES (within the context of the rest service) at MySQL :).

Thanks,

Matthew Vita
www.matthewvita.com



On Mon, Dec 18, 2017 at 1:27 PM, Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Matthew,
>
> I've heard of CouchDB but know nothing about it.  At a glance it looks
> like it is pretty versatile.
>
> Sean
>
> -Original Message-
> From: Matthew Vita [mailto:matthewvit...@gmail.com]
> Sent: Monday, December 18, 2017 3:52 PM
> To: dev@ctakes.apache.org
> Cc: Sandeep Byatha Gururaja rao; Shane Chesnutt
> Subject: Re: cTAKES Rest Service Development - Dictionary GUI MySQL
> Progress + 1 Concern [EXTERNAL]
>
> Okay, thanks for that Sean.
>
> I have a CRAZY idea... how about I try it with CouchDB instead? It a) is by
> Apache, b) can be run in Docker, c) has a JDBC connector on GitHub and d) is
> 1 of the 2 databases used in OpenEMR so our cTAKES module users wouldn't
> get too confused.
>
> Again, that last item is nice to have, don't read into it too much :).
>
> Thoughts?
>
> Thanks,
>
> Matthew Vita
> www.matthewvita.com
>
> On Mon, Dec 18, 2017 at 7:23 AM, Finan, Sean <
> sean.fi...@childrens.harvard.edu> wrote:
>

> > Fantastic!
> >
> > I am glad that you found the issue - that wouldn't have been a
> > straightforward cause to track down.
> >
> > Unfortunately we cannot package and ship any binaries that aren't
> > fully Apache license compliant etc.  However, we can do two different
> > things:
> > - We can still grab mysql from maven central for developers to use in
> > a developer environment, just like we do with the default umls
> > dictionary.
> > - We can provide an easy means for fetching the library
> > post-installation.  A Dockerfile for ctakes, a downloader that
> > launches when mysql is selected, or a good old fashioned installation
> > script.
> > Luckily the mysql library is easily available and we wouldn't need to
> > put together a runtime package like APR.
> >
> > https://books.google.com/books?id=HTo_AmTpQPMC&pg=PA14
> >
> > https://apr.apache.org/download.cgi
> >
> > I think that we could make a class that searches for mysql in the
> > environment if the mysql option is selected.   ctakes-gui has a
> > dependency that makes this easy.  Then a little downloader that throws a
> > driver into the lib/ directory.  If needed we could create a factory that
> > returns a wrapper for the required mysql driver classes, and the
> > factory could contain a class loader that guarantees the jar is
> > discovered post-installation.  That way a restart of the gui wouldn't be
> > necessary ... though that may not be a big deal.
> >
> > I am just throwing out some ideas.  There is probably a very nice
> > solution that I haven't considered.
> >
> > Sean
> >
> > -Original Message-
> > From: Matthew Vita [mailto:matthewvit...@gmail.com]
> > Sent: Monday, December 18, 2017 1:23 AM
> > To: dev@ctakes.apache.org; Sandeep Byatha Gururaja rao; Shane Chesnutt
> > Subject: cTAKES Rest Service Development - Dictionary GUI MySQL
> > Progress + 1 Concern [EXTERNAL]
> >
> > Hi Gandhi, Sean, Tim, Alex, James,
> >
> > Good news, I was able to get MySQL running in the `ctakes-gui` (recall
> > that I am building in a toggle so that folks can create dictionaries
> > using MySQL rather than HSQLDB script files).
> >
> > I found out the source of the issue with bringing in the MySQL
> > dependency.
> > This one definitely took me a while and was super subtle! If you visit
> > /ctakes/ctakes-distribution/src/main/assembly/bin.xml,
> > an exclude for mysql:* is present because it's a non-ASF
> > compliant dependency used by ytex.
> >
> > Removing the exclude and adding in
> > mysql:mysql-connector-java gets the correct result:
> >
> > /ctakes/ctakes-distribution/target/apache-ctakes-4.0.1-SNAPSHOT/lib
> > matthew
> > % ls -lash | grep mysql
> >

Re: Does Apache cTakes have Python API ?

2017-10-30 Thread John Green

Only if you wrap it in a RESTful interface.

--- Sent from VMware Boxer

Hello,



Can anybody please tell me whether Apache cTAKES has a Python API, and whether
there is any documentation available for it?



--  
Thanks,

Bhagwat Posane

Re: Identification of prostatectomy using http://52.27.22.206:8080/

2015-10-11 Thread John Green

You would have to look into the CUIs. My guess is RRP isn't going to be
recognized but I could be wrong, simple acronym.




On Fri, Oct 9, 2015 at 4:43 AM, Sangram Patil  wrote:

> Hi All,
> At http://52.27.22.206:8080/, When I input the following note -
> "69 y.o. Caucasian gentleman with a history of prostate cancer s/p open RRP
> in 9/2005 for a pT2c, GS3+4 prostate cancer excised with a negative
> margin."
> I get the output -
> SENTENCE:  69 y.o. Caucasian gentleman with a history of prostate cancer
> s/p open RRP in 9/2005 for a pT2c, GS3+4 prostate cancer excised with a
> negative margin.
> (Column alignment of the annotation rows was lost; items are listed in
> their original order.)
> POS tags: NN JJ NN IN DT NN IN NN NN NN IN JJ NN IN IN DT NN NN NN NN VBN
> IN DT JJ NN
> Annotations, first row: Finding, Anatomy, Disorder, Event, Procedure,
> Timex, Anatomy, Disorder, Event, Anatomy
> CUIs: C0262926, C0033572, C0006826, C0194825, C0033572, C0006826, C0229985,
> C1278980, C1306459, C1278980, C1306459
> Annotations, second row: Finding, Disorder, Disorder
> CUIs: C0262926, C0376358, C0376358, C0600139, C0600139
> Annotations, third row: Disorder
> CUIs: C0007114
> TLINKS:history CONTAINS 9/2005 , 9/2005 CONTAINS RRP , history CONTAINS
> cancer , history CONTAINS s , history CONTAINS RRP , history CONTAINS cancer
> Can anyone please tell me on what basis I can make out whether the patient
> has undergone a prostatectomy?
> -- 
> Sincerely,
> Sangram Patil

Re: Identification of prostatectomy

2015-10-11 Thread John Green

My apologies; this was already answered. Going through old emails.




On Fri, Oct 9, 2015 at 6:53 AM, Finan, Sean
 wrote:

> Hi Sangram,
> Just fyi, users of some email apps (e.g. Outlook) may not be able to reply to 
> a message with a web address in the title.
> Now to your question:
>>69 y.o. Caucasian gentleman with a history of prostate cancer s/p open RRP in 
>>9/2005 for a pT2c, GS3+4 prostate cancer excised with a negative margin.
>> on what basis I can make out if the patient has underwent prostatectomy
> The output has underlined "RRP" and marked it as a procedure with UMLS CUI 
> C0194825 .
> As the author of the question, you probably already know that an rrp is a 
> (radical retropubic) prostatectomy.  However, if you did not then you can see 
> that it is marked in the sentence as a procedure, making it a candidate.  If 
> you visit the UMLS Metathesaurus https://uts.nlm.nih.gov/metathesaurus.html 
> and enter the CUI C0194825, the Metathesaurus will display (among other 
> things):
> Radical retropubic prostatectomy
> Therapeutic or Preventative Procedure
> Surgery to remove all of the prostate and nearby lymph nodes through an 
> incision in the wall of the abdomen.
> If you aren't able to use the metathesaurus then you can apply for a free 
> user license from the nlm: 
> https://uts.nlm.nih.gov//license.html
> If you would like to have more information (such as term preferred text 
> listing), please add a jira item as a "nice to have" and maybe somebody will 
> implement it.  It isn't in there now because I didn't want the display to get 
> cluttered.
> Sean
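
The check described in the reply above (look for a Procedure annotation whose CUI is a prostatectomy concept) could be sketched roughly as follows. This is only an illustration: the tuple layout `(covered text, semantic group, CUI)` and the function name are assumptions made for the example, not the format the demo server actually returns, and the CUI set contains only the one concept named in the reply.

```python
# CUIs that should count as a prostatectomy. Only C0194825 (radical
# retropubic prostatectomy) is taken from the thread; extend as needed.
PROSTATECTOMY_CUIS = {"C0194825"}


def mentions_prostatectomy(annotations):
    """Return True if any Procedure annotation carries a prostatectomy CUI.

    `annotations` is a hypothetical list of (covered_text, group, cui)
    tuples built from whatever the pipeline output looks like.
    """
    return any(
        group == "Procedure" and cui in PROSTATECTOMY_CUIS
        for _text, group, cui in annotations
    )


# Hand-built example based on the annotation mentioned in the reply.
example = [("RRP", "Procedure", "C0194825")]
print(mentions_prostatectomy(example))
```

Whether the procedure is negated, historical, or about someone other than the patient would still need to be checked separately.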

RE: Allergy Annotator

2015-07-11 Thread John Green

Great discussion

On Fri, Jul 10, 2015 at 7:16 PM, Finan, Sean
sean.fi...@childrens.harvard.edu wrote:

 Hi Tom,
 Just for fun I checked ctakes-allergy into sandbox.  Great title.  While 
 too simple to be really useful, it might serve as a testing point or example 
 for future endeavors.
 Sean
 -Original Message-
 From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
 Sent: Friday, July 10, 2015 4:21 PM
 To: dev@ctakes.apache.org
 Subject: RE: Allergy Annotator
 Hi Tom,
 It is exactly because the sentence detector splits KEY: from VALUE that I 
 didn't suggest using sentences.  Instead, I would just iterate over the whole 
 cas collection of medication events and attempt to match allergy phrases 
 ("allergic to medication") against the text of the note spanning from 
 event.begin-15 to event.end+15, or whatever window size you prefer.
 Sean
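
A rough sketch of the window idea above, in plain Python rather than as a UIMA annotator. Everything here is an assumption made for illustration: medication events are reduced to `(begin, end)` character offsets, the allergy pattern is a toy regex, and the helper names are invented; a real implementation would iterate over the medication annotations in the CAS.

```python
import re

# Toy pattern for allergy cue words; a real rule set would be richer.
ALLERGY_PATTERN = re.compile(r"allerg(?:y|ies|ic)\b", re.IGNORECASE)


def allergy_mentions(note_text, medication_events, window=15):
    """Return the covered text of medication events with an allergy cue
    within `window` characters on either side."""
    hits = []
    for begin, end in medication_events:
        start = max(0, begin - window)
        stop = min(len(note_text), end + window)
        if ALLERGY_PATTERN.search(note_text[start:stop]):
            hits.append(note_text[begin:end])
    return hits


def span(text, word):
    """Helper for the example: offsets of the first occurrence of `word`."""
    i = text.index(word)
    return (i, i + len(word))


note = "ALLERGIES:  PENICILLIN, WHEAT. Started aspirin daily."
events = [span(note, "PENICILLIN"), span(note, "aspirin")]
print(allergy_mentions(note, events))
```

Because the window is measured in characters and ignores sentence boundaries, the KEY: VALUE split mentioned above does not matter here; the price is that a fixed window can miss later items in a long allergy list.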
 -Original Message-
 From: Tom Devel [mailto:deve...@gmail.com]
 Sent: Friday, July 10, 2015 4:12 PM
 To: dev@ctakes.apache.org
 Subject: Re: Allergy Annotator
 Sean and Dima, these are great suggestions, thanks so far.
 Sean, when looping over medication events as you say, I can see how it is 
 possible to take the textspan.Sentence of this MedicationMention, and then do 
 a regex check for the phrase structure as Dima said.
 But instead of textspan.Sentence, you mention checking whether any "is 
 included in a phrase". What cTAKES/UIMA class is related to this?
 Because if I would use textspan.Sentence, it would work for The patient is 
 allergic to penicillin., but cTAKES splits ALLERGIES:  PENICILLIN, WHEAT
 into two sentences, so that the MedicationMentions here would not be in the 
 same sentence as the word ALLERGIES.
 Thanks again,
 Tom
 On Fri, Jul 10, 2015 at 2:12 PM, Finan, Sean  
 sean.fi...@childrens.harvard.edu wrote:
 Hi Dima, Tom,

 I was thinking the same as Dima's first solution.  Iterate through the 
 medication events and see if any is included in a phrase as mentioned in 
 Tom's original email.  Each phrase structure would have to be 
 specified beforehand.  However, assigning appropriate CUIs would 
 require having a lookup table for each medication allergy.  I think 
 that would be the simplest solution.

 Sean

 -Original Message-
 From: Dligach, Dmitriy [mailto:dmitriy.dlig...@childrens.harvard.edu]
 Sent: Friday, July 10, 2015 2:50 PM
 To: cTAKES Developer list
 Subject: Re: Allergy Annotator

 Hi Tom,

 If the patterns are pretty simple, you could just add a few rules on 
 top of the cTAKES dictionary lookup output. Something of the kind 
 "allergic to medication" or "allergies: medication1, 
 medication2, substance1, ..."

 If these patterns are hard to express as rules, you should consider a 
 machine learning based sequence labeling route (e.g. something similar 
 to the cTAKES chunker).


 Dima

 --
 Dmitriy (Dima) Dligach, Ph.D.
 Boston Children's Hospital and Harvard Medical School
 (617) 651-0397



 On Jul 10, 2015, at 13:40, Tom Devel deve...@gmail.com wrote:

 Sean,

 It would be a wider net, such that if an allergy is mentioned in the 
 clinical note, this is captured in the corresponding 
 IdentifiedAnnotation (or alternatively, if the IdentifiedAnnotation 
 class should not be changed with a new attribute, in a separate allergy 
 annotation).

 This annotator would then have to of course run after the clinical 
 pipeline has run and discovered all IdentifiedAnnotations.

 I am familiar with writing UIMA/cTAKES annotators, but not sure how a 
 new ML method could be integrated here for detecting allergies. Do you 
 have any thoughts about how to approach this in general?

 Thanks,
 Tom

 On Fri, Jul 10, 2015 at 11:54 AM, Finan, Sean
 sean.fi...@childrens.harvard.edu wrote:

 Hi Tom,

 Are you interested in catching all allergies or just a few specific 
 allergies for a study?  If you are only concerned with a few then 
 there is a (possibly) simple solution.  If you are interested in 
 throwing a wider net then I think that a new module would need to be 
 created; does anybody reading this have an ML or regex style module?

 Sean

 -Original Message-
 From: Tom Devel [mailto:deve...@gmail.com]
 Sent: Friday, July 10, 2015 12:42 PM
 To: dev@ctakes.apache.org
 Subject: Allergy Annotator

 Hi,

 I would like to use/extend cTAKES to detect allergies.

 In the cTAKES publication (2010)


 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2995668/
 there is the mention
 that: Allergies to a given medication are handled by setting the 
 negation attribute of that medication to 'is negated'.

 However, in a post here in 2014 (RE: Allergy Indication) it is said 
 that cTAKES does not have a module for

Re: Python Libraries

2015-05-04 Thread John Green

Your best bet would be using it as a RESTful service. There is a service in the 
sandbox that is underdeveloped but works fine for testing out an integration. 
You have to change the serialization to JSON unless you can easily parse XMI from 
Python, which I don't think you can... Even lxml doesn't, from what I know off the 
top of my head.


JG
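
For what it's worth, XMI is an XML format, so the Python standard library can at least read the flat list of annotation elements. The sketch below does only that: it collects elements that carry `begin`/`end` attributes and does not resolve references between feature structures, which is likely the harder part. The sample document, its namespace URI and the type names are made up for the example and are not real cTAKES output.

```python
import xml.etree.ElementTree as ET


def spans_from_xmi(xmi_text):
    """Return (type_name, begin, end) for every top-level element that has
    begin/end offsets. References between elements are not resolved."""
    root = ET.fromstring(xmi_text)
    spans = []
    for elem in root:
        begin, end = elem.get("begin"), elem.get("end")
        if begin is not None and end is not None:
            # Tags look like "{namespace-uri}TypeName"; keep the local name.
            type_name = elem.tag.rsplit("}", 1)[-1]
            spans.append((type_name, int(begin), int(end)))
    return spans


# Hypothetical, hand-written XMI fragment (not produced by cTAKES).
sample = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
    xmlns:ex="http:///org/example/type.ecore" xmi:version="2.0">
  <ex:MedicationMention xmi:id="1" begin="12" end="22"/>
  <ex:Sentence xmi:id="2" begin="0" end="29"/>
</xmi:XMI>"""

for type_name, begin, end in spans_from_xmi(sample):
    print(type_name, begin, end)
```

If the service can emit JSON instead, that is probably still the simpler route from Python.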

On Mon, May 4, 2015 at 11:55 AM, William Dailey wdai...@gvmh.org wrote:

 I can't find a way to search the archives...so I apologize for this if it 
 has been covered.
 I am looking for a way to interact with ctakes via python.  If there has 
 been work in this area I would appreciate any links that would help me get 
 up to speed.
 Bill Dailey, MD, MS, MSMI
 Chief of Medical Information
 Golden Valley Memorial Healthcare

Re: Small medical query parser

2015-04-18 Thread John Green

Pei - I'd love to. My contributions to this wonderful project (cTAKES) have been 
minimal at best. I'll see what I can do before intern year starts and if not I'll 
chip at it when the daily grind of medicine needs a pause! 


Best,

JG

On Fri, Apr 17, 2015 at 11:51 PM, Pei Chen chen...@apache.org wrote:

 Hi John,
 It looks pretty straightforward.  Were you thinking of contributing
 something like this as an alternative/simplified pipeline- for those who
 may not want to always extract deep knowledge but just map terms to codes?
 Perhaps it can be even simpler if it can read the same existing bundled
 hsqldb cTAKES dictionaries (that way folks won't have to worry about
 UMLS).  Something for the sandbox if others think it will be useful for
 their use cases and especially if you're willing to help maintain it...
 --Pei
 On Fri, Apr 17, 2015 at 10:22 AM, John Green john.travis.gr...@gmail.com
 wrote:
 Hi all, been silent awhile but not for lack of work. Recently pushed a
 project to GitHub that some on this list might find interesting. It's a
 small NER program suitable for one-line inputs. A use case is highly
 variable one-line inputs that need CUIs mapped for searching against, say, a
 larger corpus annotated with cTAKES. I've found that for small phrases it's
 very good. I'd love to hear feedback.

 https://github.com/jtgreen/SMPP

 Best JG

Small medical query parser

2015-04-17 Thread John Green

Hi all, been silent awhile but not for lack of work. Recently pushed a
project to GitHub that some on this list might find interesting. It's a
small NER program suitable for one-line inputs. A use case is highly
variable one-line inputs that need CUIs mapped for searching against, say, a
larger corpus annotated with cTAKES. I've found that for small phrases it's
very good. I'd love to hear feedback.

https://github.com/jtgreen/SMPP

Best JG

Re: Running cTAKES via command line

2015-04-02 Thread John Green

Hi!
It depends on what you mean by run on the command line... Can you clarify the 
use case?
JG

On Thu, Apr 2, 2015 at 5:42 PM, Pedro Teixeira teixeir...@gmail.com
wrote:

 Hello, I've got an installation of cTAKES running but am unsure of how to
 run it via commandline only. I'd like to write a script to automate
 processing and skip the GUI. A quick search hasn't turned anything up. Any
 advice on how to do that? Will I have to dig into the code to do this?
 Thanks!

Re: types for hybrid relations

2015-02-10 Thread John Green

I'm interested in hearing more about this.


John
—
Sent from Mailbox

On Mon, Feb 9, 2015 at 12:22 PM, Miller, Timothy
timothy.mil...@childrens.harvard.edu wrote:

 The typesystem has a few different basic relations:
 Relation: The base type, it has no information about how many arguments
 or what type of arguments it uses.
 BinaryTextRelation: Between 2 RelationArgument objects, which are
 wrappers for UIMA Annotation type (spanned arguments).
 CollectionTextRelation: Between a set of RelationArgument objects
 (Annotation)
 ElementRelation: Between 2 Element objects, which are non-spanned types,
 with pointers to mentions.
 AttributeRelation: Between an Element and an Attribute, which is a type
 of Element.
 However, as far as I can tell there is no relation type which would
 allow for a link between an Annotation and an Element. This use case
 comes up in certain models of coreference resolution, where you attempt
 to link new mentions back to clusters instead of to individual mentions.
 I am interested in trying out models of this type and was going to
 extend RelationExtractorAnnotator but I think the typesystem doesn't
 have what we need for this case. Someone please correct me if I'm wrong,
 but I would propose to modify the typesystem to make such relations
 possible.
 Thanks
 Tim

Re: Negex

2015-01-05 Thread John Green

Thanks Ma'am for the input!


So to clarify: ctakes added additional trigger words to the list published 
originally? (This is an unrelated question to the negex vs ml thread last 
month).




Best,

John
—
Sent from Mailbox

On Mon, Jan 5, 2015 at 12:58 PM, Green, John john.gr...@usuhs.edu wrote:

 Hi all - Does anyone know off the top of their head if the negex trigger
 rules included in the original 2009 python script were added to when it was
 implemented in ctakes?
 Thanks,
 John

Re: cTakes polarity problem

2014-12-31 Thread John Green

As I was reading this thread I had the same thought as Tim: perhaps a
combination. It seems that with the perfect training corpus this wouldn't be
necessary, but perhaps as a stopgap the ensemble approach could help those using
your training data but working in a different corpus (not that I really have the
time to write anything here, just spitballing because it's an interesting
thread). I'm still bootstrapping myself in ML so I may not have followed
David's reasoning perfectly, but couldn't a simple approach be that
anything that isn't negated by the new algorithm gets passed to NegEx as a
fallback? I think that was what you were saying, Tim.
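
To make the fallback idea concrete, here is a toy sketch. Nothing in it is cTAKES code: both classifiers are stand-ins invented for the example, and the trigger list is a handful of illustrative words, not the published NegEx list.

```python
import re

# Illustrative trigger words only; NOT the actual NegEx trigger list.
NEGEX_TRIGGERS = re.compile(
    r"\b(?:deny|denies|denied|no|negative for)\b", re.IGNORECASE
)


def toy_ml_is_negated(context):
    """Stand-in for the ML polarity module: pretends it only learned
    phrases that start with 'no '."""
    return context.lower().startswith("no ")


def toy_negex_is_negated(context):
    """Stand-in for NegEx: a trigger-word match anywhere in the context."""
    return NEGEX_TRIGGERS.search(context) is not None


def hybrid_is_negated(context, ml=toy_ml_is_negated, rules=toy_negex_is_negated):
    """Accept the ML decision when it says 'negated'; otherwise fall back
    to the rule-based check."""
    return ml(context) or rules(context)


for phrase in ("No fever", "Deny hepatitis", "Reports cough"):
    print(phrase, "->", hybrid_is_negated(phrase))
```

This is a plain union of the two systems, so it can only raise recall on negated mentions; any false positives from the rules come along with it.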

One area that I can comment on in a more meaningful way would be chiming in
on Tim's remarks regarding the legitimacy of the phrase "Deny hepatitis": I
agree, my clinical intuition says it's an unlikely phrase. More probable
would be that it was a typo; "Negative for hepatitis" would be more reasonable
after, say, serology for HepB markers, though strictly speaking this would
be less likely to be in a phrase reporting results of just that specific
test (this would more likely be something along the lines of "hep panel
negative" or simply "the labs were unremarkable"). However, I could see
this phrase in something like "the STD screen was negative for hep but
positive for HIV".

The latter is definitely just one clinical opinion, people talk all kinds
of ways on the wards, good and bad, and it ends up in their notes too.

Best,
JG

On Wed, Dec 31, 2014 at 12:32 PM, David Kincaid kincaid.d...@gmail.com
wrote:

 Tim, I like your idea of a hybrid approach. I've thought about trying a
 hybrid approach in the past myself, but haven't had a chance to try it or
 seen any papers on it. It seems you could do it by either treating the
 NegEx output simply as a feature in the ML model or combining the output of
 NegEx and the ML model as an ensemble of sorts. The former would probably
 have the problem of the NegEx feature overwhelming any other features
 since it would be right most of the time. If I were doing it I think I'd
 start with the latter approach.

 In any event, it seems like right now people will need to see how the two
 systems (NegEx and ML) work on their particular data and go with whichever
 is best.

 - Dave

 On Wed, Dec 31, 2014 at 10:40 AM, Miller, Timothy 
 timothy.mil...@childrens.harvard.edu wrote:

  Hi Michael,
  I'm somewhat sympathetic to that opinion. But we did a bunch of
  experiments and it seemed to us that negex was too hand-tailored for a
  specific dataset and that our new module did better across datasets and
  overall. The tradeoff is that it is harder to improve and it sometimes
  gives unexpected results on the kind of inputs people input by hand for
  preliminary testing. That is a tradeoff people will have to consider and
  like Guergana said, the rule-based module is still part of cTAKES.
  (FWIW, I believe it is possible to engineer examples that make Negex
  fail in unintuitive ways as well.) If you are interested in these
  experiments please check out our paper in Plos One where we look at the
  difficulty of the polarity problem, specifically porting systems to new
  domains:
 
 http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0112774
 
  I've been wondering if some hybrid approach might be useful. For
  example, maybe a system that runs the ML module and Negex and adds in
  all the recalled negated terms that Negex finds over and above the ML.
  This would probably fix some of the issues with test sentences but does
  not solve the problem of being hard to debug. Another possibility is
  using a more transparent ML method like decision trees or something.
 
  Tim
 
 
 
 
 
  On 12/31/2014 11:22 AM, Michael J Gurley wrote:
   I think this demonstrates that machine learning is not the right approach
   to the negation/polarity problem.
  
  
   Michael Gurley
   m-gur...@northwestern.edu
   312 925 3268
   Northwestern University Clinical and Translational Sciences Institute
   (NUCATS)
   http://www.nucats.northwestern.edu
   Rubloff Building
   750 N Lake Shore Drive, 11th Floor
   Chicago, IL 60611
  
  
  
  
  
  
  
   On 12/31/14 9:13 AM, Miller, Timothy
   timothy.mil...@childrens.harvard.edu wrote:
  
   Hi Yu,
  
   The new polarity module is machine-learning based so it is not always
   easy to diagnose accuracy issues. But generally it might mean there was
   no example like that in the training data. It was trained on multiple
   corpora, but sometimes certain phrases slip through the cracks, and
   "Deny hepatitis", while possible in the truncated language of clinical
   notes, seems like an unlikely phrase and so it may not be in our data.
   Is that a real example you saw or just a minimum (not) working example?
   If not do you have a real example (i.e. a whole sentence) where "deny"
   should cause a negation but does not? If so I will look into it. We have
   had a few reports like this so it may be worth keeping track of missed

Re: [DISCUSS] new cTAKES web site

2014-12-31 Thread John Green

Acknowledging how variable taste is in aesthetics like web design... I'll
say I really like 1! Though 4 would be a strong contender.

The graphic in 3 is very informative.

JG

On Wed, Dec 31, 2014 at 4:09 PM, Lin, Chen chen@childrens.harvard.edu
wrote:

 Thanks for creating all those mockups! Agree with James and Tim, I prefer
 option 4 for its clear layout and the emphasis of ctakes' pros.
 Happy New Year!

 Best,
 Chen

 -Original Message-
 From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
 Sent: Wednesday, December 31, 2014 3:25 PM
 To: dev@ctakes.apache.org
 Subject: RE: [DISCUSS] new cTAKES web site


 I prefer option 4 overall for compactness, for the prominence of the
 download button, and the green color of the button.  But of all the bars at
 the top, I prefer the look of the top bar from option 2.

 I agree with Tim about the figure in option 3.

 -- James
 
 From: Miller, Timothy [timothy.mil...@childrens.harvard.edu]
 Sent: Wednesday, December 31, 2014 2:14 PM
 To: dev@ctakes.apache.org
 Subject: Re: [DISCUSS] new cTAKES web site

 For front page I prefer 1 or 4, for similar reasons as Britt. I love the
 figure in option 3 and we should use it but as the first thing you see it
 is a little dense. I like the color of option 1 better than option 4.
 Tim


 On 12/31/2014 03:05 PM, britt fitch wrote:
 I prefer option 1. Largely because of the prominence of 'downloads' and
 'examples'.



 Britt Fitch
 Wired Informatics
 265 Franklin St Ste 1702
 Boston, MA 02110
 http://wiredinformatics.com
 britt.fi...@wiredinformatics.com

 On Dec 31, 2014, at 2:53 PM, Chen, Pei pei.c...@childrens.harvard.edu
 wrote:

 Hi folks,
 Michelle, Sean, Guergana, and Co. have created a few mockups for the new
 cTAKES website.  Which option would folks prefer?
 This is purely on the design intent, and layout, etc.  (not actual
 content).
 Option 1: http://mwchen.scripts.mit.edu/cTakes/mock0/index.html
 Option 2: http://mwchen.scripts.mit.edu/cTakes/mock1/index.html
 Option 3: http://svn.apache.org/repos/asf/ctakes/site/new/index.html
 Option 4: http://svn.apache.org/repos/asf/ctakes/site/new/index2.html

RE: intro video and ctakes youtube : Youtube Apache cTakes Channel Direct Link

2014-12-19 Thread John Green

Great article. I'm not a fan of the email solution, simply because of size 
problems. Given how small the rate of new video uploads is likely to be, it 
seems a common drop box solution may be the best solution for our case. Maybe 
someone very central to the project could volunteer as this point of contact 
and play relay between Dropbox and YouTube? 


JG
—
Sent from Mailbox

On Wed, Dec 17, 2014 at 3:11 PM, Finan, Sean
sean.fi...@childrens.harvard.edu wrote:

 Hmmm, well this is a ticker:
 http://www.ampercent.com/upload-videos-youtube-channel-without-knowing-username-password/9374/
 -Original Message-
 From: John Green [mailto:john.travis.gr...@gmail.com] 
 Sent: Wednesday, December 17, 2014 2:08 PM
 To: dev@ctakes.apache.org
 Subject: Re: intro video and ctakes youtube : Youtube Apache cTakes Channel 
 Direct Link
 Isn't this to upload for my account? What about to the channel?
 On Tue, Dec 16, 2014 at 12:16 PM, Finan, Sean  
 sean.fi...@childrens.harvard.edu wrote:

 Hi John,

 Look for an Upload button in the upper-left corner next to a blue 
 Sign in button.

 Sean

 -Original Message-
 From: John Green [mailto:john.travis.gr...@gmail.com]
 Sent: Tuesday, December 16, 2014 11:12 AM
 To: dev@ctakes.apache.org
 Subject: Re: intro video and ctakes youtube : Youtube Apache cTakes 
 Channel Direct Link

 That is, how do we upload videos *to the channel. *

 On Tue, Dec 16, 2014 at 11:09 AM, John Green 
 john.travis.gr...@gmail.com
 wrote:
 
  How do we upload videos we wish to contribute? I don't have any 
  experience with YouTube other than as a watcher.
 
  JG
 
  On Mon, Dec 15, 2014 at 11:43 AM, Finan, Sean  
  sean.fi...@childrens.harvard.edu wrote:
 
  Hmmm, I can't find it in a search.  However, here is a direct link:
 
  https://www.youtube.com/channel/UC8hQoOKz3v4PNEf6cqSkjbQ
 
  Maybe it needs a few videos to register in the search engine ?
 
  Sean
 
  -Original Message-
  From: Pei Chen [mailto:chen...@apache.org]
  Sent: Monday, December 15, 2014 11:32 AM
  To: dev@ctakes.apache.org
  Subject: Re: intro video and ctakes youtube
 
  John,
  I presume you mean this thread:
 
  http://mail-archives.apache.org/mod_mbox/ctakes-dev/201408.mbox/%3C393252f14c42f946952f1ed75d316cad39158...@chexmbx4a.chboston.org%3E
 
  Strange, I couldn't find it anymore either... The place holder 
  could have been auto deleted because it was empty?  I think it's 
  worth it if you're willing to create and add to it again...
 
  ---Pei
 
  On Fri, Dec 12, 2014 at 11:46 PM, John Green 
  john.travis.gr...@gmail.com
  
  wrote:
  
   I was going to post some basic how to videos that help with the 
   learning curve I've walked over the last year and a half. I went 
   looking for ctakes youtube channel mentioned awhile back and I 
   did not
  find it...
  
   Anyone know where it went?
  
   Best,
   JG

Re: revamping the Apache cTAKES website

2014-12-16 Thread John Green

Looks great.

Bootstrap provides most of the css/layout natively. There are augmentations
freely available like http://bootswatch.com/ that can add flair to the
packaged css. Design is easy with templating incorporated into systems like
django ;-) (not another plug!) Bootstrap also has some designs as
examples on its website that would probably be safe to use as a base.

JG

On Mon, Dec 15, 2014 at 11:17 PM, Pei Chen chen...@apache.org wrote:

 the template was borrowed from spark... we should put in our own
 design/css/layout/skin to suit our needs.  Perhaps Michelle or others
 familiar with bootstrap could help us out here?

 On Mon, Dec 15, 2014 at 7:32 PM, jay vyas jayunit100.apa...@gmail.com
 wrote:
 
  this is gorgeous ! Thanks pei !  i let the bigtop folks know as well
  !
 
  On Mon, Dec 15, 2014 at 6:21 PM, Murali mmin...@gmail.com wrote:
  
   Looks great. +1
  
  
  
On Dec 15, 2014, at 4:29 PM, Chen, Pei 
 pei.c...@childrens.harvard.edu
  
   wrote:
   
Check out a mockup of a new website proposal:
http://svn.apache.org/repos/asf/ctakes/site/new/index.html
Based off bootstrap (Idea borrowed from the Spark folks..).
   
Couple of key pieces of info:
- 10% of visitors are on mobile/tablets
- The most currently visited pages are: downloads.cgi,
   gettingstarted.html.  I suggest we focus our attention on those 2
 items.
   (Putting a Downloads link right on the front page, etc.)
   
svn co http://svn.apache.org/repos/asf/ctakes/site/new if you want
 to
   checkout the code of the site.
   
--Pei
   
-Original Message-
From: John Green [mailto:john.travis.gr...@gmail.com]
Sent: Friday, December 05, 2014 6:34 PM
To: dev@ctakes.apache.org
Cc: dev@ctakes.apache.org
Subject: RE: revamping the Apache cTAKES website
   
I would like to second the bootstrap recommendation, with the
  additional
   recommendation of django for the backend. It is an amazing platform for
   rapid development and easy updating.
   
   
JG
—
Sent from Mailbox
   
On Fri, Dec 5, 2014 at 12:15 PM, Savova, Guergana 
   guergana.sav...@childrens.harvard.edu wrote:
   
There are now 4 volunteers:
Michelle Chen
Pei Chen
Sean Finan
Guergana Savova
--Guergana
-Original Message-
From: Savova, Guergana [mailto:
 guergana.sav...@childrens.harvard.edu]
Sent: Friday, December 05, 2014 11:56 AM
To: dev@ctakes.apache.org
Subject: RE: revamping the Apache cTAKES website Wonderful, thank
 you,
Michelle! There will be a flurry of emails the week of Dec 15
 followed
   by actual work, so book your calendar if possible...
--Guergana
-Original Message-
From: Michelle Chen [mailto:michelle1919c...@gmail.com]
Sent: Friday, December 05, 2014 11:48 AM
To: dev@ctakes.apache.org
Subject: Re: revamping the Apache cTAKES website Hello Guergana, I
don't know that much about cTakes, but would be interested in
   contributing to the effort.
I'm not sure if there is an interest in matching the website design
 of
   other Apache projects, but it seems that the two main designs that are
   being used from my arbitrary search on
   http://projects.apache.org/indexes/alpha.html is 1. the current design
   that cTakes is using and 2. a Bootstrap approach.
I've done a little bit of work on Bootstrap and would be interested
 in
   helping with that. Let me know how I can be helpful.
Sincerely,
Michelle Chen :)
Be strong and of good courage; do not be afraid, nor be dismayed,
 for
the Lord your God is with you wherever you go. ~Joshua 1:9 On Fri,
  Dec
   5, 2014 at 11:21 AM, Savova, Guergana 
   guergana.sav...@childrens.harvard.edu wrote:
cTAKES-ers,
   
we would like to start working on updating the Apache cTAKES
 website
- some of the information there is already stale and needs
  refreshing.
Do you have ideas on website design, content, etc.? Would you like
 to
contribute to the effort? We are planning to start working on the
website the week of Dec 15.
   
Cheers,
--Guergana
   
   
  
 
 
  --
  jay vyas

Re: intro video and ctakes youtube : Youtube Apache cTakes Channel Direct Link

2014-12-16 Thread John Green

How do we upload videos we wish to contribute? I don't have any experience
with YouTube other than as a viewer.

JG

On Mon, Dec 15, 2014 at 11:43 AM, Finan, Sean 
sean.fi...@childrens.harvard.edu wrote:

 Hmmm, I can't find it in a search.  However, here is a direct link:

 https://www.youtube.com/channel/UC8hQoOKz3v4PNEf6cqSkjbQ

 Maybe it needs a few videos to register in the search engine ?

 Sean

 -Original Message-
 From: Pei Chen [mailto:chen...@apache.org]
 Sent: Monday, December 15, 2014 11:32 AM
 To: dev@ctakes.apache.org
 Subject: Re: intro video and ctakes youtube

 John,
 I presume you mean this thread:

 http://mail-archives.apache.org/mod_mbox/ctakes-dev/201408.mbox/%3c393252f14c42f946952f1ed75d316cad39158...@chexmbx4a.chboston.org%3E

 Strange, I couldn't find it anymore either... The place holder could have
 been auto deleted because it was empty?  I think it's worth it if you're
 willing to create and add to it again...

 ---Pei

 On Fri, Dec 12, 2014 at 11:46 PM, John Green john.travis.gr...@gmail.com
 wrote:
 
  I was going to post some basic how to videos that help with the
  learning curve I've walked over the last year and a half. I went
  looking for ctakes youtube channel mentioned awhile back and I did not
 find it...
 
  Anyone know where it went?
 
  Best,
  JG

intro video and ctakes youtube

2014-12-12 Thread John Green

I was going to post some basic how-to videos that help with the learning
curve I've walked over the last year and a half. I went looking for the cTAKES
YouTube channel mentioned a while back and I did not find it...

Anyone know where it went?

Best,
JG

Re: Announcement: UMLS MedGen-MySQL dataset now available as open access download

2014-11-13 Thread John Green

So the old licensed setup would be kept as a packaged option, much as it is 
now, with the unlicensed dataset going out in place of the current free 
dictionary? Am I understanding that right? 


JG
—
Sent from Mailbox

On Thu, Nov 13, 2014 at 1:40 PM, andy mcmurry mcmurry.a...@gmail.com
wrote:

 I'll crunch the numbers -- in the meantime I can tell you that phenotypes
 vary by semantic type. clinical attributes  from SNOMED are abundant, many
 concepts in mesh that are mapped to diseases. Tons of pharmacological
 substances
 On Nov 12, 2014 6:19 AM, Dligach, Dmitriy 
 dmitriy.dlig...@childrens.harvard.edu wrote:
 Andy, thank you for this resource!

 Do you have an estimate of what percentage of UMLS concepts were left out?

 Dima




 On Nov 11, 2014, at 16:02, andy mcmurry mcmurry.a...@gmail.com wrote:

  Hello!
 
  https://bitbucket.org/invitae/medgen-mysql (Apache Licensed ASL2)
 
  We just released a new library containing a huge chunk of UMLS concepts
  which are available without registering accounts/username/passwords.
  LEGALLY. Yes, really!
 
  The subset is from NCBI and it contains *thousands of concepts from
 SNOMED
  and other vocabularies*.
 
  The code is essentially
  1. a list of WGET targets to various NCBI FTP site mirrors
  2. Makefile for building the databases of interest
 
  Our legal team has approved distribution for Open Access work, ASL2
  LICENSE.
 
  I recommend we use this opportunity to make this the default distribution
  for CTAKES UMLS connections, because it obviates the need for so much
  painful credentialing and back and forth agreements with the US National
  Library of Medicine.
 
  Cheers!
  --Andy
 
 
  On Wed, Sep 10, 2014 at 12:13 PM, Masanz, James J. 
 masanz.ja...@mayo.edu
  wrote:
 
 
  I would love to see the install be as simple as apt-get install to end
 up
  with some working dictionary that have more than a handful of entries to
  get them started.
 
  Regards,
  James Masanz
 
  -Original Message-
  From: andy mcmurry [mailto:mcmurry.a...@gmail.com]
  Sent: Tuesday, September 09, 2014 4:32 PM
  To: ctakes-...@incubator.apache.org
  Subject: Recommendation for ctakes default (UMLS) dictionaries
 
  Greetings ctakes-dev:
 
  *UMLS license restrictions have been getting more lax over the years --
  *much of the UMLS can be downloaded directly from the NCBI official FTP
  site.
 
  In fact, the NIH (and implicitly the NLM) *have already made the
 standard
  terms public for some medical specialities*.
 
  For example: Here is the UMLS subset specific to Medical Genetics
 (MedGen)
  and Genetic Testing (GTR) complete with SNOMED-CT concept CUI(s) and
 names,
  etc :
 
  [  ftp://ftp.ncbi.nlm.nih.gov/pub/medgen/README.html  ]
 
  My team has developed a JVM based wrapper for MetaMap 2013AB which I
  intend to open source soon (Clojure).  It includes REST support for
  invoking MetaMap with any or all of the command line arguments.
  We do not integrate with UIMA, we are basically a wrapper around the
  binary installation of MetaMap. The emphasis is on publication text not
  clinical text, still, some services are common (such as LVG).
 
  Strangely, the NLM still requires UMLS licenses to download MetaMap
  execution binaries. The MetaMap binary install is better but customizing
  dictionaries (DataFileBuilder) is not as easy to use as CTAKES with YTEX
 
  [ https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation
 ]
 
  *** Hence, there is a real opportunity here to enable Apache cTAKES to
  have a stronger default dictionary. ** *
 
  Imagine if we could
  *$ apt-get install apache-ctakes *
 
  and instantly have a working package for SOME problem domain.
  In my case (Medical Genetics) the UMLS definitions are already available
  and the UMLS license problem becomes a non issue, at least for many
 first
  time users
 
  Your thoughts?
  AndyMC

Re: YTEX Semantic Sim RESTful

2014-10-14 Thread John Green

Thanks Vijay! You are the best! It now works with
http://localhost:8080/services/rest/similarity?conceptGraph=sct-msh-csp-aod&concept1=C0018787&concept2=C0024109&metrics=LCH,INTRINSIC_LCH
which is the name of the graph it was loading.

Copying beans-kernel-simweb.xml
https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-ytex-res/src/main/resources/org/apache/ctakes/ytex/web/beans-kernel-simweb.xml
worked
perfectly, thank you very much for the time you spent on this.
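
For anyone scripting against this endpoint, here is a minimal sketch of
building the query string so the parameters are joined with "&" and the
comma-separated metrics list stays readable. The host, port, path and
parameter names are the ones used in this message; the helper name
similarity_url is hypothetical and not part of YTEX.

```python
from urllib.parse import urlencode

# Endpoint as used in this thread; adjust host/port for your own install.
BASE = "http://localhost:8080/services/rest/similarity"


def similarity_url(concept_graph, cui1, cui2, metrics):
    """Build the similarity query URL (hypothetical helper, not a YTEX API)."""
    params = {
        "conceptGraph": concept_graph,
        "concept1": cui1,
        "concept2": cui2,
        "metrics": ",".join(metrics),
    }
    # safe="," leaves the comma between metric names unescaped.
    return BASE + "?" + urlencode(params, safe=",")


print(similarity_url("sct-msh-csp-aod", "C0018787", "C0024109",
                     ["LCH", "INTRINSIC_LCH"]))
```

The concept graph name passed in has to match whatever graph the server
actually loaded, as discussed in this thread.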

Best,
JG

On Tue, Oct 14, 2014 at 8:58 AM, vijay garla vnga...@gmail.com wrote:

 Hi John,

 Looking at the code, that error is due to the concept graph 'umls' not
 being loaded.  by default, ytex is configured to use the sct-rxnorm concept
 graph.

 Can you see if this works:

 http://localhost:8080/services/rest/similarity?conceptGraph=sct-rxnorm&concept1=C0018787&concept2=C0024109&metrics=LCH,INTRINSIC_LCH

 To set the concept graph name set the ytex.conceptGraphName in
 resources/org/apache/ctakes/ytex/ytex.properties

 If not, there may be an issue in a config file; if you get another NPE,
 please
 * copy

 https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-ytex-res/src/main/resources/org/apache/ctakes/ytex/web/beans-kernel-simweb.xml
 to CTAKES_HOME/resources/org/apache/ctakes/ytex/web/
 * copy

 https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-distribution/src/main/bin/ytexweb.bat
 to CTAKES_HOME/bin

 HTH!

 Vijay

 On Tue, Oct 14, 2014 at 1:51 PM, John Green john.travis.gr...@gmail.com
 wrote:

  Good idea Kim! Unfortunately, that wasn't it. Ill admit, though, I hadnt
  looked at that variable yet.
 
  Thanks for your help,
  JG
 
  On Mon, Oct 13, 2014 at 6:28 PM, Kim Ebert 
  kim.eb...@perfectsearchcorp.com
  wrote:
 
   Perhaps your JVM is running out of heap? I've noticed that when I run
   out of heap, cTakes tends to behave erratically.
  
   Kim Ebert
   1.801.669.7342
   Perfect Search Corp
   http://www.perfectsearchcorp.com/
  
   On 10/13/2014 09:29 AM, John Green wrote:
I've been putting off debugging this as it was a piece of this app Im
working on, but one that fit in down the road in development.
  Development
has progressed, and here I am. I have posted this one before, was
  hoping
   to
find fresh help.
   
When running ytex.sh in a distro installed at something like
./ctakes3.2.0/apache-ctakes-3.1.2-SNAPSHOT/bin$ under Ubuntu 14 and
   trying
to access the restful interface per the docs on a query like such as
   
  
 
 http://localhost:8080/similarity?conceptGraph=umls&concept1=C0018787&concept2=C0024109&metrics=LCH,INTRINSIC_LCH
the query fails with a 500 (see below).
   
Of note, the http://localhost:8080/semanticSim.jsf works just fine.
   
Am I missing something simple?
   
Thanks for any and all help,
Best,
JG
   
500 error:
HTTP ERROR 500
   
Problem accessing /services/rest/similarity. Reason:
   
Server Error
   
Caused by:
   
java.lang.RuntimeException: org.apache.cxf.interceptor.Fault
  at org.apache.cxf.interceptor.AbstractFaultChainInitiatorObserver.onMessage(AbstractFaultChainInitiatorObserver.java:116)
  at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:333)
  at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
  at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:239)
  at org.apache.cxf.transport.servlet.ServletController.invokeDestination(ServletController.java:248)
  at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:222)
  at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:153)
  at org.apache.cxf.transport.servlet.CXFNonSpringServlet.invoke(CXFNonSpringServlet.java:167)
  at org.apache.cxf.transport.servlet.AbstractHTTPServlet.handleRequest(AbstractHTTPServlet.java:286)
  at org.apache.cxf.transport.servlet.AbstractHTTPServlet.doGet(AbstractHTTPServlet.java:211)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
  at org.apache.cxf.transport.servlet.AbstractHTTPServlet.service(AbstractHTTPServlet.java:262)
  at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:698)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:526)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:568)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1105)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:453

YTEX Semantic Sim RESTful

2014-10-13 Thread John Green

I've been putting off debugging this, as it is a piece of the app I'm
working on that only fits in further down the road. Development has
progressed, and here I am. I have posted this one before and was hoping to
find fresh help.

When I run ytex.sh in a distro installed at something like
./ctakes3.2.0/apache-ctakes-3.1.2-SNAPSHOT/bin$ under Ubuntu 14 and try
to access the RESTful interface per the docs with a query such as
http://localhost:8080/similarity?conceptGraph=umls&concept1=C0018787&concept2=C0024109&metrics=LCH,INTRINSIC_LCH
the query fails with a 500 (see below).

Of note, the http://localhost:8080/semanticSim.jsf works just fine.

Am I missing something simple?

Thanks for any and all help,
Best,
JG

500 error:
HTTP ERROR 500

Problem accessing /services/rest/similarity. Reason:

Server Error

Caused by:

java.lang.RuntimeException: org.apache.cxf.interceptor.Fault
  at org.apache.cxf.interceptor.AbstractFaultChainInitiatorObserver.onMessage(AbstractFaultChainInitiatorObserver.java:116)
  at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:333)
  at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
  at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:239)
  at org.apache.cxf.transport.servlet.ServletController.invokeDestination(ServletController.java:248)
  at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:222)
  at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:153)
  at org.apache.cxf.transport.servlet.CXFNonSpringServlet.invoke(CXFNonSpringServlet.java:167)
  at org.apache.cxf.transport.servlet.AbstractHTTPServlet.handleRequest(AbstractHTTPServlet.java:286)
  at org.apache.cxf.transport.servlet.AbstractHTTPServlet.doGet(AbstractHTTPServlet.java:211)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
  at org.apache.cxf.transport.servlet.AbstractHTTPServlet.service(AbstractHTTPServlet.java:262)
  at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:698)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:526)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:568)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1105)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:453)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:183)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1039)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:201)
  at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
  at org.eclipse.jetty.server.Server.handle(Server.java:445)
  at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:277)
  at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:216)
  at org.eclipse.jetty.io.AbstractConnection$1.run(AbstractConnection.java:505)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:601)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:532)
  at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.cxf.interceptor.Fault
  at org.apache.cxf.service.invoker.AbstractInvoker.createFault(AbstractInvoker.java:162)
  at org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:128)
  at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:205)
  at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:102)
  at org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:58)
  at org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:94)
  at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:272)
  ... 30 more
Caused by: java.lang.NullPointerException
  at org.apache.ctakes.ytex.ws.ConceptSimilarityWebServiceImpl.getConceptSimilarityService(ConceptSimilarityWebServiceImpl.java:53)
  at org.apache.ctakes.ytex.ws.ConceptSimilarityWebServiceImpl.similarity(ConceptSimilarityWebServiceImpl.java:36)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Re: De-identified lab tests dataset

2014-09-30 Thread John Green

How large? And across how many EMRs? 


JG
—
Sent from Mailbox

On Mon, Sep 29, 2014 at 6:58 PM, Ajay Jain ajayj...@mobileinsights.net
wrote:

 Sorry, I wasn't clear. I am working on a related project and trying to figure 
 out if the code can be repurposed for a lab mention annotator for cTAKES. 
 From what I have seen, test names from different institutions are not 
 standardized which makes it hard to standardize the resulting annotation. 
 Getting access to a larger lab tests dataset (structured) will help me fine 
 tune the model. 
  
 Hope this helps. 
 Ajay
 Sent from my iPhone
 On Sep 29, 2014, at 2:12 PM, Savova, Guergana 
 guergana.sav...@childrens.harvard.edu wrote:
 
 Ajay,
 cTAKES currently does not implement a method to discover labs from the text. 
 The motivation is that you can get that easily from the structured part of 
 the EMR (what Pete explained below). Hope this makes sense!
 --Guergana
 
 -Original Message-
 From: Peter Szolovits [mailto:p...@mit.edu] 
 Sent: Monday, September 29, 2014 2:32 PM
 To: dev@ctakes.apache.org
 Subject: Re: De-identified lab tests dataset
 
 Ajay, I'm confused by your query.  cTakes is good at interpreting text, but 
 most lab test results are reported in tabular form that is most 
 appropriately searched by SQL queries.  Sometimes lab results are also 
 reported in narrative notes, but parsing those is often more a matter of 
 deciphering the text structure of tables than of parsing real English text.  
 What am I misunderstanding?
 
 --Pete Sz.
 
 On Sep 29, 2014, at 2:25 PM, Ajay Jain ajayj...@mobileinsights.net wrote:
 
 Hello All,
 
 I am working on a use case for lab tests data using cTAKES and my 
 online search to find a test dataset has been futile.  I'll greatly 
 appreciate if someone can share such a dataset or can point me in the 
 right direction to go looking for one.
 
 Best,
 Ajay
 
 --
 Founder & CEO
 Mobile Insights, Inc.
 (630) 408-8623

Semantic similarity standard

2014-09-27 Thread John Green

Does anyone know of a larger set of human-generated measures for semantic 
similarity than the 500+ one cited by Vijay in his paper on semantic 
similarity? The paper cited was PMID 21347043.

Jg
—
Sent from Mailbox

Re: Acronym annotator

2014-08-22 Thread John Green

Wow, what a goldmine, thanks! 


JG
—
Sent from Mailbox for iPhone

On Fri, Aug 22, 2014 at 8:31 AM, Koola, Jejo David
jejo.d.ko...@vanderbilt.edu wrote:

 You might be interested in: 
 https://sbmi.uth.edu/ccb/resources/abbreviation.htm
 On Aug 21, 2014, at 2:08 PM, John Green 
 john.travis.gr...@gmail.com wrote:
 Are there any acronym annotators and disambiguators? What are people doing
 in production elsewhere? Im learning the heart of cTakes and UIMA by the
 numbers right now and I think writing an annotator of my own will be the
 best way to solidify the information. If no one has it done already, I
 thought Id write a simple acronym annotator and disambiguator. The
 disambiguation would just be a co-occurrence over a lookup window across a
 private corpus I have access to, e.g., word1 word2 word3 acronym1 word4
 word5 word6. I would provide specificity by excluding words that tend to
 occur frequently across instances of the acronyms with the same
 abbreviation.
 But, if someone has already done it and is planning on releasing it, I hate
 to reproduce wheels...
 JG

Web server

2014-08-21 Thread John Green

I'm trying to deploy the cTakes web-server code someone already wrote (who
wrote it, btw?). I'm running into deployment issues in Eclipse with Tomcat 7
on Mac... I can get into details, but for now: is it in a working state? I'm
learning as I go, and it looks in order and the code is solid...

Also, Pei: did they check in an LVG version that is thread safe now?

I'm really set on getting cTakes into a fluid RESTful interface.

JG

Acronym annotator

2014-08-21 Thread John Green

Are there any acronym annotators and disambiguators? What are people doing
in production elsewhere? I'm learning the heart of cTakes and UIMA by the
numbers right now and I think writing an annotator of my own will be the
best way to solidify the information. If no one has done it already, I
thought I'd write a simple acronym annotator and disambiguator. The
disambiguation would just be a co-occurrence over a lookup window across a
private corpus I have access to, e.g., word1 word2 word3 acronym1 word4
word5 word6. I would provide specificity by excluding words that tend to
occur frequently across instances of the acronyms with the same
abbreviation.
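
To make the idea concrete, here is a rough sketch of window-based
co-occurrence disambiguation. It is only an illustration under stated
assumptions: the function names, the three-token window, the toy training
sentences and the rule "drop context words seen with every sense of an
acronym" are all hypothetical choices, and nothing here is cTakes or UIMA
code.

```python
from collections import Counter, defaultdict


def context_window(tokens, index, size=3):
    """Return up to `size` tokens on each side of tokens[index]."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left + right


def train(labelled_examples, size=3):
    """Count context words for each (acronym, expansion) pair.

    labelled_examples: iterable of (tokens, index, expansion), where
    tokens[index] is the acronym and expansion is its known meaning.
    """
    counts = defaultdict(Counter)
    for tokens, index, expansion in labelled_examples:
        key = (tokens[index].lower(), expansion)
        window = context_window(tokens, index, size)
        counts[key].update(w.lower() for w in window)
    return counts


def drop_shared_words(counts):
    """Remove context words that occur with every sense of an acronym."""
    by_acronym = defaultdict(list)
    for acronym, expansion in counts:
        by_acronym[acronym].append(expansion)
    pruned = {}
    for acronym, expansions in by_acronym.items():
        if len(expansions) > 1:
            vocabularies = [set(counts[(acronym, e)]) for e in expansions]
            shared = set.intersection(*vocabularies)
        else:
            shared = set()  # a single sense has nothing to compare against
        for e in expansions:
            kept = {w: c for w, c in counts[(acronym, e)].items()
                    if w not in shared}
            pruned[(acronym, e)] = Counter(kept)
    return pruned


def disambiguate(tokens, index, counts, size=3):
    """Pick the expansion whose context counts best match the window."""
    acronym = tokens[index].lower()
    window = [w.lower() for w in context_window(tokens, index, size)]
    scores = {
        expansion: sum(ctx[w] for w in window)
        for (a, expansion), ctx in counts.items()
        if a == acronym
    }
    return max(scores, key=scores.get) if scores else None


# Toy labelled data (made up for illustration only).
examples = [
    ("patient admitted with acute MI and chest pain".split(), 4,
     "myocardial infarction"),
    ("echo shows severe MI with regurgitant jet".split(), 3,
     "mitral insufficiency"),
]
model = drop_shared_words(train(examples))
query = "new chest pain concerning for MI with troponin rise".split()
print(disambiguate(query, 5, model))
```

A real version would need far more labelled contexts per sense and probably
some weighting of the counts, but the window lookup and the pruning of
shared words are the two pieces described above.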

But, if someone has already done it and is planning on releasing it, I hate
to reproduce wheels...

JG

RE: Web server

2014-08-21 Thread John Green

I have. I read the docs; they mention more information, but the tutorial was 
very short.

It seems there are simple GET requests with the XML AE for output built into 
the existing sandbox code, so I just wanted to hash that out first before 
starting on a new thread.

Do you have experience with UIMA Simple Server? 

JG
—
Sent from Mailbox for iPhone

On Thu, Aug 21, 2014 at 12:10 PM, Finan, Sean
sean.fi...@childrens.harvard.edu wrote:

 Hi John,
 Have you (or another) thought about modifying the Uima Simple Server to run a 
 cTakes pipeline?
 http://uima.apache.org/sandbox.html#simple-server
 -Original Message-
 From: John Green [mailto:john.travis.gr...@gmail.com]
 Sent: Thursday, August 21, 2014 3:06 PM
 To: dev@ctakes.apache.org
 Subject: Web server
 
 Im trying to deploy the cTakes web-server code someone already wrote (who
 wrote it btw?). Im running into deployment issues in eclipse with tomcat 7
 on mac... I can get into details but for now: is it in a working state? Im
 learning as I go and it looks in order and the code is solid...
 
 Also, Pei: did they check in an LVG version that is thread safe now?
 
 Im really set on getting cTakes into a fluid RESTful interface.
 
 JG

RE: Web server

2014-08-21 Thread John Green

And you did so with cTAKES? That's great, I'll try it. 


JG
—
Sent from Mailbox for iPhone

On Thu, Aug 21, 2014 at 12:46 PM, Finan, Sean
sean.fi...@childrens.harvard.edu wrote:

 Do you have experience with uima simple server?
 A few months ago I set it up and ran it just for kicks.  It is simple, but I 
 pondered that as such it could serve as a nice foundation.  Well, maybe a 
 cornerstone.
 -Original Message-
 From: John Green [mailto:john.travis.gr...@gmail.com]
 Sent: Thursday, August 21, 2014 3:43 PM
 To: dev@ctakes.apache.org
 Cc: dev@ctakes.apache.org
 Subject: RE: Web server
 
 I have. I read the docs, it mentions more information but the tutorial was 
 very
 short.
 
 
 It seems there are simple get requests with the xml ae for output built into 
 the
 existing sandbox code, so I just wanted to hash that first before starting 
 on a
 new thread.
 
 
 
 
 Do you have experience with uima simple server?
 
 
 
 
 JG
 —
 Sent from Mailbox for iPhone
 
 On Thu, Aug 21, 2014 at 12:10 PM, Finan, Sean
 sean.fi...@childrens.harvard.edu wrote:
 
  Hi John,
  Have you (or another) thought about modifying the Uima Simple Server to run
 a cTakes pipeline?
  http://uima.apache.org/sandbox.html#simple-server
  -Original Message-
  From: John Green [mailto:john.travis.gr...@gmail.com]
  Sent: Thursday, August 21, 2014 3:06 PM
  To: dev@ctakes.apache.org
  Subject: Web server
 
  Im trying to deploy the cTakes web-server code someone already wrote
  (who wrote it btw?). Im running into deployment issues in eclipse
  with tomcat 7 on mac... I can get into details but for now: is it in
  a working state? Im learning as I go and it looks in order and the code is
 solid...
 
  Also, Pei: did they check in an LVG version that is thread safe now?
 
  Im really set on getting cTakes into a fluid RESTful interface.
 
  JG

Re: Web server

2014-08-21 Thread John Green

Disregard the "couldn't deploy" question, got it. I'll try to update the docs 
(e.g. create a doc for the web server as it matures), but for folks like me, 
linking WEB-INF/lib to the Maven dependencies was not immediately clear. 
Figured it out. Other questions still stand :-)


JG
—
Sent from Mailbox for iPhone

On Thu, Aug 21, 2014 at 12:05 PM, John Green john.travis.gr...@gmail.com
wrote:

 Im trying to deploy the cTakes web-server code someone already wrote (who
 wrote it btw?). Im running into deployment issues in eclipse with tomcat 7
 on mac... I can get into details but for now: is it in a working state? Im
 learning as I go and it looks in order and the code is solid...
 Also, Pei: did they check in an LVG version that is thread safe now?
 Im really set on getting cTakes into a fluid RESTful interface.
 JG

Re: Change from SNOMEDCT to SNOMEDCT_US affecting v_snomed_fword_lookup

2014-08-21 Thread John Green

Clayton - did this indeed fix the @db.schema@ error for you? I'm going to try 
to reproduce it (haven't had time yet), then I'll close the Jira ticket out.


JG
—
Sent from Mailbox for iPhone

On Thu, Aug 21, 2014 at 1:24 PM, Clayton Turner caturn...@g.cofc.edu
wrote:

 Ah, I just switched to the ytex branch and all is good now. The SNOMED_US
 issue has been plaguing me for weeks now so thanks a million for that.
 On Thu, Aug 21, 2014 at 2:13 PM, Clayton Turner caturn...@g.cofc.edu
 wrote:
 Awesome. This is just what I needed for the longest time.

 I'm having a slight issue. When running either the ytex pipeline or ytex
 version of the AggregatePlaintextUMLSProcessor I get an error during
 initialization.

 My DictionaryLookupAnnotator.xml is raising a
 org.apache.uima.resource.ResourceInitializationException caused by:
 java.lang.ClassNotFoundException:
 edu.mayo.bmi.uima.lookup.ae.FirstTokenPermLookupInitializerImpl

 I feel like I may have drifted away from what I need, though, because
 before this the CPE was complaining about a lack of LookupDesc_SNOMED.xml
 file. I found a ytex version of this on a google code site somewhere and
 pasted it where the CPE was looking for it. Now this error is coming up.

 Could my problem be solved with just a re-run of the ant script (was just
 trying to avoid since it takes ages) or is it a different issue?


 On Tue, Aug 19, 2014 at 12:58 PM, Tim O'Connell tim.oconn...@gmail.com
 wrote:

 Hi John,

 I'm not sure what was going on with the @db.schema@ error, although I was
 getting it as well before with my prior build of 3.1.2 - I assume that
 you've fixed something (thank you!) to make this go away.  I rebuilt
 everything from scratch and it's working now.

 I think one other thing I had to change was that after I had finished the
 install/build, the cTakes version of LookupDesc_Db.xml doesn't work (in
 resources\org\apache\ctakes\dictionary\lookup) - I'm pretty sure I had to
 copy in an older version of the file from 3.1.1 to get the default cTakes
 AggregatePlaintextUMLSProcessor pipeline working, although please
 double-check that as my memory is a little foggy.

 But yes, here's what I have working since re-building:
 1. ytex-pipeline.xml
 2. ytex version of AggregatePlaintextUMLSProcessor.xml
 3. cTakes version of AggregatePlaintextUMLSProcessor.xml (with swapping
 the
 LookupDesc_Db.xml file as above)

 I've even made modifications to the ytex version of LookupDesc_SNOMED.xml
 to get it tagging Disease Disorders, along with database modifications to
 have it store these entries as well, which is working great.   Literally,
 everything is working perfectly now.

 Still so much for me to learn!  Let me know if you need any more details.

 All the best,
 Tim



 On Tue, Aug 19, 2014 at 4:31 AM, John Green john.travis.gr...@gmail.com
 wrote:

  I have not had time to implement this. To clarify, out of curiosity: does
  this clear up the @db.schema@ error, Tim? And did you successfully run
  ytex with the ctakes dictionary-lookup?
 
 
  JG
  —
  Sent from Mailbox for iPhone
 
  On Sat, Aug 16, 2014 at 2:53 AM, Tim O'Connell tim.oconn...@gmail.com
  wrote:
 
   Hi folks,
   I was having an issue with the current build (from svn) of ctakes/ytex not
   identifying any annotations, as were some other folks on this board. I traced
   it to the fact that the UMLS database has, at some time in the relatively
   recent past, changed the SAB tag in the MRCONSO table for SNOMED terms from
   SNOMEDCT to SNOMEDCT_US. I just had a newer version of UMLS that uses
   SNOMEDCT_US. Thus when the install script tried to create the
   v_snomed_fword_lookup table, it wasn't finding any of the SNOMEDCT terms,
   so nothing was getting annotated.
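A quick way to confirm this kind of mismatch before re-running anything is to list which SNOMED-style SAB values are actually present in MRCONSO. The following is only a minimal sketch: it uses an in-memory SQLite table with made-up rows as a stand-in for the real MySQL umls.MRCONSO table, and the helper name sab_counts is hypothetical, not part of ytex or cTAKES.

```python
import sqlite3


def sab_counts(conn, prefix="SNOMEDCT"):
    """Count MRCONSO rows per SAB value that starts with the given prefix."""
    rows = conn.execute(
        "SELECT SAB, COUNT(*) FROM MRCONSO "
        "WHERE SAB LIKE ? GROUP BY SAB ORDER BY SAB",
        (prefix + "%",),
    ).fetchall()
    return dict(rows)


# Toy stand-in for umls.MRCONSO; the rows below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MRCONSO (CUI TEXT, AUI TEXT, SAB TEXT, STR TEXT)")
conn.executemany(
    "INSERT INTO MRCONSO VALUES (?, ?, ?, ?)",
    [
        ("C0000001", "A0000001", "SNOMEDCT_US", "example term one"),
        ("C0000002", "A0000002", "SNOMEDCT_US", "example term two"),
        ("C0000003", "A0000003", "RXNORM", "example drug"),
    ],
)

counts = sab_counts(conn)
print(counts)
```

If the result has no plain SNOMEDCT key, a lookup-table script that filters on SAB = 'SNOMEDCT' would select nothing, which matches the symptom described above.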
   The ytex install script was just looking for things in MRCONSO with the
   SNOMEDCT SAB tag when it created the ytex lookup table. After changing
   this to SNOMEDCT_US in the file

   CTAKES_HOME/bin/ctakes-ytex/scripts/data/mysql/umls/insert_view_template.sql

   it now works (for mysql users) and finds the annotations. You can just re-run
   the ytex setup script, but that takes hours. Instead, I just deleted all
   the data from the v_snomed_fword_lookup table and ran the sql
   command to repopulate the table, and it worked fine. Here's the code. N.b.
   my schema name for my umls database is 'umls'; change the code below if
   yours is different.
   delete from v_snomed_fword_lookup;
   insert into v_snomed_fword_lookup (cui, tui, fword, fstem, tok_str,
   stem_str)
   select mrc.cui, t.tui, c.fword, c.fstem, c.tok_str, c.stem_str
   from umls_aui_fword c
   inner join umls.MRCONSO mrc on c.aui = mrc.aui and mrc.SAB in (
   'SNOMEDCT_US', 'RXNORM')
   inner join
   (
   select cui, min(tui) tui
   from umls.MRSTY sty
   where sty.tui in
   (
  'T019', 'T020', 'T037', 'T046', 'T047', 'T048', 'T049', 'T050',
'T190', 'T191', 'T033',
  'T184',
  'T017', 'T029', 'T023', 'T030', 'T031', 'T022', 'T025', 'T026

Re: Youtube Channel Apache cTakes

2014-08-12 Thread John Green

Nice!!! —
Sent from Mailbox for iPhone

On Tue, Aug 12, 2014 at 5:09 PM, Finan, Sean
sean.fi...@childrens.harvard.edu wrote:

 cTakes now has a youtube channel named Apache cTakes.  It is empty, but if 
 you have ever made a training video, presentation on a component 
 (descriptors, type system, etc.), or demo of integration with another system 
 (UimaFit, Uima-AS, etc.) then please feel free to post on that channel.  When 
 there is content the Apache pages can have a link to the channel.
 Sean

Re: LabMentions

2014-08-07 Thread John Green

outstanding!—

On Thu, Aug 7, 2014 at 9:16 AM, britt fitch britt.fi...@gmail.com wrote:

 We have been actively working on a lab annotator at Wired Informatics that we 
 are planning to contribute back to the community once it's fully tested. 
 We will post to the dev list when we have a date for when it becomes available. 
 Cheers, 
 Britt
 On Aug 4, 2014, at 5:19 PM, Harpreet Khanduja hsk5...@rit.edu wrote:
 Thank you so much for letting me know.
 I will try my best to come up with it.
 
 Regards,
 Harpreet
 
 
 On Mon, Aug 4, 2014 at 4:42 PM, Masanz, James J. masanz.ja...@mayo.edu
 wrote:
 
 
 As far as I know, there isn't an annotator yet for creating LabMention
 annotations.  We would welcome a contribution.
 
 - James Masanz
 
 -Original Message-
 From: Harpreet Khanduja [mailto:hsk5...@rit.edu]
 Sent: Friday, August 01, 2014 11:27 AM
 To: dev@ctakes.apache.org
 Subject: LabMentions
 
 Hello,
 
 Is there a way to include the annotation LabMentions in the pipeline?
 
 Thank you for your help.
 
 Regards,
 Harpreet

Null Pointer

2014-07-28 Thread John Green

Any ideas why, as of cTakes 3.2.0, I get a NullPointerException when I try to
use FilesInDirectoryCollectionReader.xml?

java.lang.NullPointerException
at org.apache.uima.tools.cpm.CpmPanel.fileSelected(CpmPanel.java:1509)
at org.apache.uima.tools.util.gui.FileSelector$1.actionPerformed(FileSelector.java:141)
at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2018)
at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2341)
at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:252)
at java.awt.Component.processMouseEvent(Component.java:6505)
at javax.swing.JComponent.processMouseEvent(JComponent.java:3311)
at java.awt.Component.processEvent(Component.java:6270)
at java.awt.Container.processEvent(Container.java:2229)
at java.awt.Component.dispatchEventImpl(Component.java:4861)
at java.awt.Container.dispatchEventImpl(Container.java:2287)
at java.awt.Component.dispatchEvent(Component.java:4687)
at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4832)
at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4492)
at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4422)
at java.awt.Container.dispatchEventImpl(Container.java:2273)
at java.awt.Window.dispatchEventImpl(Window.java:2719)
at java.awt.Component.dispatchEvent(Component.java:4687)
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:735)
at java.awt.EventQueue.access$200(EventQueue.java:103)
at java.awt.EventQueue$3.run(EventQueue.java:694)
at java.awt.EventQueue$3.run(EventQueue.java:692)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:87)
at java.awt.EventQueue$4.run(EventQueue.java:708)
at java.awt.EventQueue$4.run(EventQueue.java:706)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:705)
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:242)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:161)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:150)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:146)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:138)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:91)

RE: Wiki

2014-07-24 Thread John Green

Yah, the little regex replace was nice too.


JG
—
Sent from Mailbox for iPhone

On Thu, Jul 24, 2014 at 1:45 PM, Chen, Pei pei.c...@childrens.harvard.edu
wrote:

 Ah Yes,
 I noticed there was a 'Copy Page Tree' feature that copied the entire pages 
 so it was fairly straightforward...
 -Original Message-
 From: John Green [mailto:john.travis.gr...@gmail.com]
 Sent: Wednesday, July 23, 2014 9:11 PM
 To: dev@ctakes.apache.org
 Subject: Wiki
 
 Well Pei, I guess we were editing at the same time 5 hours ago. I kept getting
 this error, and I couldn't figure it out because nothing was in the index yet:
 
 Cause
 
 com.atlassian.confluence.pages.DuplicateDataRuntimeException: A page
 already exists with the title cTAKES 3.2 Component Use Guide in the space
 with key CTAKES
 at com.atlassian.confluence.pages.DefaultPageManager.throwIfDuplicateAbstractPageTitle(DefaultPageManager.java:909)
 
 
 You beat me to the punch.
 
 
 JG

RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)

2014-07-22 Thread John Green

I'll play with it tonight or tomorrow night.


JG
—
Sent from Mailbox for iPhone

On Tue, Jul 22, 2014 at 4:27 PM, Chen, Pei pei.c...@childrens.harvard.edu
wrote:

 There are currently no guides on the confluence wiki for cTAKES 3.2.0...
 I was thinking of just cloning 3.1.1
 https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1.1
 And just add the YTEX and/or any new changes to it...
 Would be grateful for any help here...
 -Original Message-
 From: John Green [mailto:john.travis.gr...@gmail.com]
 Sent: Tuesday, July 22, 2014 2:37 PM
 To: dev@ctakes.apache.org
 Subject: Re: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
 
 What exactly needs to be updated? I have not had the time (unfortunately) to
 help with this project very much because of the steep learning curve on the
 technology. I'm currently on some protected research time working with
 cTakes as of this week and would be happy to help with some grunt work.
 
 JG
 
 
 On Tue, Jul 22, 2014 at 11:39 AM, Bleeker, Troy C. bleeker.t...@mayo.edu
 wrote:
 
  One page at a time. At least there's that.
 
  Thanks
  Troy
  -Original Message-
  From: Masanz, James J.
  Sent: Tuesday, July 22, 2014 10:38 AM
  To: 'dev@ctakes.apache.org'
  Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
 
  When I asked Troy that question for 3.1.1, he didn't know of a way,
  and I don't either, which is why I had the 3.1.1 page mostly just
  reference the
  3.2 documentation.
 
  -Original Message-
  From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu]
  Sent: Tuesday, July 22, 2014 10:00 AM
  To: dev@ctakes.apache.org
  Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
 
  Thanks James.
  I was planning on closing the vote today.
  In the meantime, does anyone know a quick way to clone/rename the wiki
  documentation for 3.2?
  --Pei
 
   -Original Message-
   From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
   Sent: Monday, July 21, 2014 4:25 PM
   To: 'dev@ctakes.apache.org'
   Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
  
   Here's the additional testing I've done:
  
   I ran mvn test with 0 Failures and 0 Errors.
   Ran the AggregateTemplateFiller.xml and received same output (except
   for internal UIMA identifiers) with rc2 as I did with 3.1.1.
  
   +1 to release
  
   -Original Message-
   From: Masanz, James J.
   Sent: Wednesday, July 16, 2014 3:59 PM
   To: 'dev@ctakes.apache.org'
   Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
  
   FYI, so far I have done the following steps:
  
   downloaded the source archive
   compiled it using: mvn compile
   downloaded the separately available resources
   set up classpath to include e.g. jars (from the bin distribution)
   set ctakes.umlsuser and ctakes.umlspw env vars
   ran runctakesCVD.bat
   loaded AggregatePlaintextUMLSProcessor.xml
   ran against some simple text.
   verified it did not throw an exception.
   verified some EventMention and EntityMention annotations were produced.
  
   I will do more testing tomorrow. Just giving a status update.
  
   --James
  
   -Original Message-
   From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
   Sent: Saturday, July 12, 2014 6:24 AM
   To: dev@ctakes.apache.org
   Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
  
   Agreed on that.
  
   I downloaded the new resources binary and was able to run my tests
   on the - bin version of the RC.
  
   +1 for making this the release.
  
   Tim
  
  
   
   From: Masanz, James J. [masanz.ja...@mayo.edu]
   Sent: Friday, July 11, 2014 7:27 PM
   To: 'dev@ctakes.apache.org'
   Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
  
   I agree about keeping the thread open.
  
   -- James
  
   -Original Message-
   From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu]
   Sent: Friday, July 11, 2014 4:28 PM
   To: dev@ctakes.apache.org
   Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
  
   Updated the lvg.properties file within ctakes-resources on
   sourceforge [1].
   Since the Apache cTAKES artifacts didn't change, I would like to
   keep this VOTE thread open.
  
   Also renamed it to 3.2.0 (even though they technically do not have
   to follow each other, but probably nice to keep it consistent for
   users as James suggested.)

   [1] http://sourceforge.net/projects/ctakesresources/files/ctakes-resources-3.2.0.zip/download
  
-Original Message-
From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
Sent: Thursday, July 10, 2014 5:53 PM
To: 'dev@ctakes.apache.org'
Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
   
Can you also give ctakesresources the number 3.2 or 3.2.0 instead of 3.1.3?
   
-Original Message-
From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu]
Sent: Thursday, July 10, 2014 2:12 PM
To: dev@ctakes.apache.org
Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
   
I

Re: Procedure

2014-07-17 Thread John Green

General, so that I don't keep generating work for others :-)

Specifically: Temperature wasn't annotated, and neither was Heart rate, for
example.

Different but related: it picked up abdominal mass (C734) but not
pulsatile abdominal mass (C0266835) when given pulsatile abdominal
mass. I understand that this may be expected given the word order. If it
wasn't, then the concern, of course, is that by clinical intuition abdominal
mass isn't very specific and one wouldn't jump to thinking AAA, whereas with
pulsatile abdominal mass you would immediately think AAA. This delta
is fairly well reflected in ytex's semantic similarity measure
(particularly LCH), with the distance being 0.84 and 0.64 for abdominal mass
to pulsatile abdominal mass and Abdominal Aortic Aneurysm (C0162871)
respectively.
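
For reference, the textbook Leacock-Chodorow (LCH) similarity is -log(p / (2 * D)), where p is the number of nodes on the shortest is-a path between two concepts and D is the depth of the taxonomy. The sketch below implements only that textbook formula; the path lengths and depth are made-up inputs, and ytex may report a rescaled variant, so it is not expected to reproduce the 0.84 and 0.64 figures quoted above.

```python
import math


def lch_similarity(path_length, max_depth):
    """Textbook Leacock-Chodorow similarity: -log(path_length / (2 * max_depth)).

    path_length counts the nodes on the shortest is-a path between the two
    concepts, so a concept compared with itself has path_length 1.
    max_depth is the maximum depth of the taxonomy.
    """
    if path_length < 1 or max_depth < 1:
        raise ValueError("path_length and max_depth must both be >= 1")
    return -math.log(path_length / (2.0 * max_depth))


# Hypothetical values: a taxonomy of depth 10, with one concept pair
# 2 nodes apart and another pair 5 nodes apart.
close_pair = lch_similarity(2, 10)
far_pair = lch_similarity(5, 10)
print(round(close_pair, 3), round(far_pair, 3))
```

The shorter the path between two concepts, the higher the score, which is the sense in which a more specific term can sit closer to a diagnosis than a general one.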

Pulsatile abdominal mass was in the lookup window.

JG


On Wed, Jul 16, 2014 at 3:07 PM, Masanz, James J. masanz.ja...@mayo.edu
wrote:


 It depends on the type of annotation.

 Some are rule-based. Some are machine-learning based (models).  Some are
 dictionary dependent.  And some are based on annotations earlier in the
 pipeline, and so looking at the part of speech tags within the tokens, for
 example, can explain which chunk something appears in, which can explain
 why something might not have been annotated as a DiseaseDisorderMention,
 for example.

 Are you asking a general question or is there a specific type of
 annotation you are most interested in.

 -Original Message-
 From: John Green [mailto:john.travis.gr...@gmail.com]
 Sent: Wednesday, July 16, 2014 2:01 PM
 To: dev@ctakes.apache.org
 Subject: Procedure

 Is there a generally accepted procedure for identifying why an annotation
 wasnt made?

 JG

RE: Procedure

2014-07-17 Thread John Green

I didn't see how it appeared in the dictionary; I just looked at the CUI in UMLS, 
which has it as abdominal pulsatile mass, which isn't the same order as the text 
I annotated in ctakes (pulsatile abdominal mass). But if I'm wrong, great; it 
does raise the question even more of why, if it was in the lookup window and in 
the dictionary, it was only annotated as abdominal mass.


Apropos temperature and heart rate, the results of these are measurements, 
right? But it seems that they should also be procedures, in the sense that you 
perform a physical manipulation on a pt. If I were checking notes for whether 
or not someone checked vitals, as opposed to obtaining the measurements, this 
seems within the current use case, but I'm so often wrong here, being so new... 


JG

—
Sent from Mailbox for iPhone

On Thu, Jul 17, 2014 at 1:44 PM, Masanz, James J. masanz.ja...@mayo.edu
wrote:

 In general cTAKES doesn't pick up things with values, such as weight, height, 
 lab values, temperature, with the exception that the drug ner pipeline can 
 pick up medication related values such as dose, strength, etc.
 cTAKES does pick up a few things as MeasurementAnnotation just by pattern, 
 but doesn't associate those with a named entity that has a cui.
 The example of pulsatile abdominal mass listed the same 3 words in the same 
 order for the dictionary entry and the text that was processed, so I'm not 
 clear what you meant about word order.

Re: Procedure

2014-07-17 Thread John Green

Hey, thanks! 


JG
—
Sent from Mailbox for iPhone

On Thu, Jul 17, 2014 at 2:37 PM, Bruce Tietjen
bruce.tiet...@perfectsearchcorp.com wrote:

 I believe it wouldn't be picked up because the first word dictionary lookup
 algorithm only does lookups by first word as it progresses through the
 words in the lookup window.
 From the lookup window, it would do:
 pulsatile abdominal mass -- lookup word 'pulsatile'
 abdominal mass -- lookup word 'abdominal'
 mass -- lookup word 'mass'
 Since the dictionary entry is 'Abdominal pulsatile mass', the lookup key is
 'Abdominal'. The lookup for 'pulsatile' would not find it.
 When it does the 'abdominal mass' lookup, the word 'pulsatile' has already
 been dropped and so it will not match the full word matching for this entry.
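
The walk through the lookup window described above can be sketched roughly as follows. This is an illustration only, not the cTAKES implementation: the function names are hypothetical, tokens are split on whitespace, and the dictionary holds just the two entries discussed in this thread.

```python
from collections import defaultdict


def build_first_word_index(entries):
    """Index dictionary entries by their lowercased first word."""
    index = defaultdict(list)
    for entry in entries:
        words = entry.lower().split()
        index[words[0]].append(words)
    return index


def trace_lookups(window_text, index):
    """For each position in the window, return the lookup key and full matches.

    Only the tokens from the current position onward are considered, which is
    why a word that precedes the key can no longer contribute to a match.
    """
    tokens = window_text.lower().split()
    trace = []
    for i, key in enumerate(tokens):
        remaining = tokens[i:]
        hits = [
            " ".join(words)
            for words in index.get(key, [])
            if all(w in remaining for w in words)
        ]
        trace.append((key, hits))
    return trace


index = build_first_word_index(["Abdominal pulsatile mass", "Abdominal mass"])
for key, hits in trace_lookups("pulsatile abdominal mass", index):
    print(key, "->", hits)
```

With this toy dictionary, only the 'abdominal' lookup finds anything, and it finds 'abdominal mass' alone, because 'pulsatile' lies before the key and is no longer available.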
 Bruce Tietjen
 Senior Software Engineer
 IMAT Solutions http://imatsolutions.com
 Mobile: 801.634.1547
 bruce.tiet...@imatsolutions.com

RE: Procedure

2014-07-17 Thread John Green

Wonderful explanation James, thank you!


JG
—
Sent from Mailbox for iPhone

On Thu, Jul 17, 2014 at 2:41 PM, Masanz, James J. masanz.ja...@mayo.edu
wrote:

 The order you mentioned in your previous email had been  pulsatile abdominal 
 mass  for both what is in UMLS and what was in the text being annotated, 
 which is why I was asking about the ordering.
 Given that I now know the text you were annotating did have a different word 
 order than what is in umls, and seeing exactly what those orderings were, 
 that explains why it was not being picked up.
 A quirk/feature of cTAKES (current) dictionary lookup (as opposed to the 
 newer one called lookup-2) is that the first word must be first, but in a 
 multi-word entry (more than 2 words), the order of the other words doesn't matter.
 So for example, with abdominal pulsatile mass in the dictionary, both of 
 these should get annotated with the same cui
 abdominal pulsatile mass
 abdominal mass pulsatile
 but this will not get an annotation for that CUI
 pulsatile abdominal mass
 unless that ordering is also in the dictionary.
 As far as heart rate and temperature, whether they are annotated as 
 procedures all depends on if they show up in the UMLS with the semantic types 
 used by cTAKES.
 To check those, I would do this
 - Open the UMLS terminology services Metathesaurus Browser app 
   https://uts.nlm.nih.gov/home.html
   Applications -> UTS Metathesaurus Browser
 - input the text of interest into the box in the left pane, and click Go
 - select the CUI that looks hopeful
 - the pane on the right will fill in with details about that Concept, 
 including the Semantic Types and all the Atoms.
 - look at the Semantic Types in the pane on the right
 - if not a semantic type that cTAKES annotates, select a different CUI
 - Once found a CUI with a semantic type cTAKES annotates, if the text of the 
 UMLS Concept itself is not exactly what I was looking for, look at all the 
 Atoms, and see if the text I was looking for appears with SNOMED_CT, NCI, 
 MSH, or ICD9CM.
 Note that cTAKES also uses normalized forms of the words in the text being 
 processed, so if the input text were lymph nodes it would match a 
 hypothetical dictionary entry of lymph node.
 Also note that intervening words can be OK, up to a limit, but all words 
 within the term must appear within a single LookupWindow.  
 Hope that is helpful
 -- James
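
The matching rules described in the message above (first word must come first, the remaining words may appear in any order, words are compared in normalized form, and a limited number of intervening words is tolerated within one lookup window) can be sketched as below. Everything here is an assumption made for illustration: the plural-stripping normalize() is a crude stand-in for real normalization, the max_intervening limit of 2 is arbitrary, and the function names are hypothetical rather than cTAKES APIs. It also assumes the words within an entry are distinct.

```python
def normalize(word):
    """Crude stand-in for normalization: lowercase and drop a plural 's'."""
    word = word.lower()
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def matches(entry, window_text, max_intervening=2):
    """Return True if the dictionary entry matches somewhere in the window."""
    entry_words = [normalize(w) for w in entry.split()]
    tokens = [normalize(t) for t in window_text.split()]
    first, rest = entry_words[0], entry_words[1:]
    for start, token in enumerate(tokens):
        if token != first:
            continue  # the entry's first word must be the lookup key
        following = tokens[start + 1:]
        if not all(w in following for w in rest):
            continue  # every other entry word must appear after the key
        # Index (within `following`) of the farthest entry word needed.
        end = max(following.index(w) for w in rest) if rest else -1
        intervening = (end + 1) - len(rest)
        if intervening <= max_intervening:
            return True
    return False


entry = "abdominal pulsatile mass"
for text in ("abdominal pulsatile mass",
             "abdominal mass pulsatile",
             "pulsatile abdominal mass"):
    print(text, "->", matches(entry, text))
print("lymph nodes ->", matches("lymph node", "lymph nodes"))
```

The first two orderings match and the third does not, mirroring the abdominal pulsatile mass example; the lymph nodes case shows the effect of comparing normalized forms.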

Re: DBConsumer

2014-07-15 Thread John Green

—
Sent from Mailbox for iPhone

On Tue, Jul 15, 2014 at 8:26 AM, vijay garla vnga...@gmail.com wrote:

 You can add the DBConsumer to any pipeline, or add it to any CPE config.
  See
 https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1.2+-+YTEX+DBConsumer
 You will have to set up ctakes and your database as documented here:
 https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation
 -vj
 On Tue, Jul 15, 2014 at 2:16 AM, John Green john.travis.gr...@gmail.com
 wrote:
 The Ytex DBConsumer - If someone has a free moment, could they give me a
 hint at how I can plug into a mysql DB with the ytex DB consumer? For
 example, taking the default ytex pipeline and sending it to a db.

 If I get pointed in the right direction such that I figure it out, I'll
 update the confluence with the how-to for the future.

 Thanks!
 JG

Re: Building

2014-07-14 Thread John Green

Hi Vijay - All the queries are returning odd errors except the commands that
list what graphs are available. I will try to sleuth out a better report
than that.

Thank you for your continued help,
JG

On Sat, Jul 5, 2014 at 12:53 AM, vijay garla vnga...@gmail.com wrote:

When you run the webapp, the RESTful services run as well

On Friday, July 4, 2014, John Green john.travis.gr...@gmail.com wrote:

Vijay - Ha! Ok. Works perfectly with cuis.

Is there a way to run the web application as a RESTful API? You mention
this as a service on your yale box, but I don't see a way to deploy it
this way locally.

Thanks again,
JG

On Wed, Jul 2, 2014 at 10:58 PM, vijay garla vnga...@gmail.com wrote:

The ytexWeb application tries to look up concepts from terms using the ytex
dictionary lookup table, which is a small subset of the UMLS. Can you try
specifying cuis? That skips the lookup - if the concepts are in the
concept graph, this will work.

Best,

On Sun, Jun 29, 2014 at 6:10 PM, John Green john.travis.gr...@gmail.com
wrote:

Hi Vijay, thank you for your time.

Your documentation was quite good. I had no problem setting up ytex with
UMLS running on my local mysql server. Where I ran into problems was
understanding how to launch the web service (also, is there any way to run
this in a RESTful mode? Btw, the informatics.yale links return 502). After
I did get it launched, and the confusion was probably all my fault, the
concepts available to the similarity fields seemed very sparse; I just
started typing randomly, hematochezia, choledocholithiasis, etc, and
nothing would come up. The best I got was gallbladder function test, which,
if I'm understanding it right, would be an alkphos, but alkaline phosphatase
didn't come up, which led me to believe they were smaller sets of the
snomed, mesh, etc compilations (as I checked the UMLS db and these concepts
are there).

I think I got that execution command from the code.google, which is
probably why it was stale. I did not see the ytex semantic similarity guide
under the ctakes components part (sorry, thanks for pointing me there, I'll
get to work on reading it).

So bottom line: are the ones that shipped watered-down versions? And if
not, why are my concepts coming up short? If you give me a hint at where to
check, I'll investigate.

Thanks!
JG

On Sun, Jun 29, 2014 at 8:56 PM, vijay garla vnga...@gmail.com wrote:

Hi John,

YTEX ships with 3 concept graphs (see

https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1.2+-+Semantic+Similarity
):

- sct-rxnorm: concepts from SNOMED-CT and RXNORM. This is the default.
- sct-msh-csp-aod: concepts from the SNOMED-CT, MeSH, CRISP, and Alcohol
and Drug thesaurus
- umls: concepts from all restriction free (level 0) UMLS source
vocabularies and SNOMED-CT

These concept graphs are included in ytex resources zip (see

https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation
):
3) Unzip YTEX Resources (Optional - UTS login required)

Download and unzip ctakes-ytex-resources-3.1.2-SNAPSHOT.zip
(http://www.ytex-nlp.org/umls.download/secure/3.1/ctakes-ytex-resources-3.1.2-SNAPSHOT.zip)
'over' your installation. This contains:

- Concept Graphs derived from the UMLS2013AA used to compute semantic
similarity measures

All YTEX packages moved from the ytex namespace into
org.apache.ctakes.ytex
- can you tell me which document you were looking at that mentioned
ytex.kernel.dao.ConceptDaoImpl? I thought I had fixed this in the
documentation.

HTH,

-vj

On Sun, Jun 29, 2014 at 2:25 PM, John Green john.travis.gr...@gmail.com
wrote:

I got the semantic similarity web app running in ytex. I'm still learning
umls terminology, but I believe it says that out of the box its concept
graphs are limited to the free set from umls? Does this mean without
permissions? Similar to ctakes with umls rights? The concepts available
seem limited so this would make sense.

So, to take full advantage I would need to rebuild the concept graph,
correct? I'm in the process of doing this but getting classpath errors. I
used java a bit ten years ago, so you can probably guess these will take me
a minute to resolve. Notably, it is complaining about
ytex.kernel.dao.ConceptDaoImpl.

Thanks all,

—
Sent from Mailbox for iPhone

Re: Building

2014-07-04 Thread John Green

Vijay - Ha! Ok. Works perfectly with CUIs.

Is there a way to run the web application as a RESTful API? You mention
this as a service on your Yale box, but I don't see a way to deploy it this
way locally.

Thanks again,
JG

On Wed, Jul 2, 2014 at 10:58 PM, vijay garla vnga...@gmail.com wrote:

The ytexWeb application tries to look up concepts from terms using the ytex
dictionary lookup table, which is a small subset of the UMLS. Can you try
specifying cuis? That skips the lookup - if the concepts are in the
concept graph, this will work.
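
The behaviour described above (a term search fails while the same concept
entered as a CUI succeeds) can be pictured with a small sketch. This is not
the ytexWeb code: the dictionary, the graph and the CUI strings below are
made-up placeholders, and resolve() is a hypothetical helper. It only
illustrates a small term table sitting in front of a larger concept graph.

```python
from typing import Optional

# Hypothetical data. The CUIs are placeholders, not real UMLS identifiers.
TERM_TO_CUI = {"gallbladder function test": "C0000001"}   # small lookup table
CONCEPT_GRAPH = {"C0000001", "C0000002", "C0000003"}      # larger concept graph


def resolve(query: str) -> Optional[str]:
    """Return a CUI that exists in the concept graph, or None."""
    cleaned = query.strip()
    # A CUI typed directly skips the term lookup entirely.
    if cleaned.upper() in CONCEPT_GRAPH:
        return cleaned.upper()
    # Otherwise the term must first be found in the (small) dictionary.
    cui = TERM_TO_CUI.get(cleaned.lower())
    return cui if cui in CONCEPT_GRAPH else None
```

In this toy setup resolve("hematochezia") returns None because the term is
missing from the small table, even though a concept for it could still be
present in the graph and reachable by its CUI.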

Best,

On Sun, Jun 29, 2014 at 6:10 PM, John Green john.travis.gr...@gmail.com
wrote:

Hi Vijay, thank you for your time.

Your documentation was quite good. I had no problem setting up ytex with
UMLS running on my local MySQL server. Where I ran into problems was
understanding how to launch the web service (also, is there any way to run
this in a RESTful mode? Btw, the informatics.yale link returns 502). After
I did get it launched, and the confusion was probably all my fault, the
concepts available to the similarity fields seemed very sparse; I just
started typing randomly, hematochezia, choledocholithiasis, etc, and
nothing would come up. The best I got was gallbladder function test, which,
if I'm understanding it right, would be an alkphos, but alkaline phosphatase
didn't come up, which led me to believe they were smaller sets of the
SNOMED, MeSH, etc. compilations (as I checked the UMLS db and these concepts
are there).

So bottom line: are the ones that shipped watered down versions? And if
not, why are my concepts coming up short? If you give me a hint at where to
check I'll investigate.

Thanks!
JG

On Sun, Jun 29, 2014 at 8:56 PM, vijay garla vnga...@gmail.com wrote:

Hi John,

YTEX ships with 3 concept graphs (see
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1.2+-+Semantic+Similarity):

These concept graphs are included in ytex resources zip (see
https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation):
3) Unzip YTEX Resources (Optional - UTS login required)

Download and unzip ctakes-ytex-resources-3.1.2-SNAPSHOT.zip
http://www.ytex-nlp.org/umls.download/secure/3.1/ctakes-ytex-resources-3.1.2-SNAPSHOT.zip
'over' your installation. This contains:

- Concept Graphs derived from the UMLS2013AA used to compute semantic
  similarity measures

HTH,

-vj

On Sun, Jun 29, 2014 at 2:25 PM, John Green john.travis.gr...@gmail.com
wrote:

I got the semantic similarity web app running in ytex. I'm still learning
UMLS terminology, but I believe it says that out of the box its concept
graphs are limited to the free set from UMLS? Does this mean without
permissions? Similar to cTAKES with UMLS rights? The concepts available
seem limited so this would make sense.

So, to take full advantage I would need to rebuild the concept graph,
correct? I'm in the process of doing this but getting classpath errors. I
used Java a bit ten years ago, so you can probably guess these will take me
a minute to resolve. Notably, it is complaining about
ytex.kernel.dao.ConceptDaoImpl.

Thanks all,

—
Sent from Mailbox for iPhone

Re: Confluence

2014-07-01 Thread John Green

Thanks Pei!
—
Sent from Mailbox for iPhone

On Sun, Jun 29, 2014 at 10:37 PM, Pei Chen chen...@apache.org wrote:

 jtgreen has been added to the confluence wiki.
 --Pei
 On Sun, Jun 29, 2014 at 2:03 PM, John Green john.gr...@usuhs.edu wrote:
 Pei - I'm finally getting around to working with ytex. There are some
 things I'd like to clarify in the install for beginners like me. How do I
 edit the confluence wiki?

 JG

 Sent from my iPhone

ytex DBconsumer and groovy parser

2014-07-01 Thread John Green

If someone has a free minute, which, judging from my own life, is probably
not the case - where in the groovy scripts in sandbox do you define the
consumer to use? There is one comment that says don't put the .xml here,
then there is a path to the dictionary AE. I'm working by ssh from the
hospital a lot in my free time in the ICU and running GUI CPEs isn't
gonna cut it.

Apropos the ytex dbconsumer - I should be able to just tack this on to the
end of the ytex aggregate pipeline?

I'm probably still asking very naive questions but to date I still haven't
had the time to dive into UIMA's base very well, so I apologize.

My goal is to run the full ytex pipeline from the command line with the
ytex dbconsumer ...

Thanks for everyone's patience,
John

Web demo

2014-06-18 Thread John Green

Where do we stand on the web demo? I've been off on other projects and I'm
looking to dig into cTAKES again, picking up where I left off, for an
application for my med school. I think my work there could overlap with a demo
server.

In regards to thread safety: it sounds like from the chatter recently we would 
just have to ditch lvg to make it thread safe. 


Also, I'm not too familiar with how cTAKES loads the modules into memory, but
any web demo that ran would a) want a RESTful API and b) those responses would
want to be generated against a cTAKES process already loaded, without having
to reload the models, right? Any ideas, either in correcting a misconception I
may have or on how to proceed there?


If I pulled off a demo I'd use Django and Python and one of its RESTful APIs.


JG
—
Sent from Mailbox for iPhone

RE: Ctakes-data-vis

2014-04-29 Thread John Green

Pei - I meant as a web app, can we keep the credentials loaded and the
resources (more importantly) loaded in memory across runs? E.g. treat it like
a queue with the machinery already loaded and fed? I'm sure this can be done, I
just run cTAKES from the CPE right now and haven't toyed with this so wasn't sure.


Where is this at? Is anyone developing the front end? I might be able to invest
some time into that easily.




Jg
—
Sent from Mailbox for iPhone

On Wed, Apr 16, 2014 at 12:23 PM, Chen, Pei
pei.c...@childrens.harvard.edu wrote:

 John,
 How we use the VM is up to us to decide.  For an online demo,
 we can certainly load up cTAKES and its resources.
 If it's a web app, we can prompt the user to enter UMLS credentials if they
 choose the UMLS resources?
 --Pei
 -Original Message-
 From: John Green [mailto:john.travis.gr...@gmail.com]
 Sent: Sunday, April 13, 2014 9:16 PM
 To: dev@ctakes.apache.org
 Subject: Re: Ctakes-data-vis
 
 Great! I'll try and fix that soon. I'm back on the wards so time is slim.
 
 
 What are the next steps for the vm? For the demo site?
 
 
 
 
 Out of curiosity, would this allow resources to stay loaded and a kind of queue
 to be set up? Is there a solution that allows this now? That is, the
 resources stay loaded in memory, the UMLS auth stays current, and I could just
 pass content as it becomes available?
 
 
 
 
 Jg
 —
 Sent from Mailbox for iPhone
 
 On Sat, Apr 12, 2014 at 2:57 PM, andy mcmurry mcmurry.a...@gmail.com
 wrote:
 
  It looks great! The transitions are smooth and the hierarchical
  browsing is straightforward. The only edit I recommend is about
  spacing -- the information often exceeds the space of a single page.
  On Sat, Apr 5, 2014 at 12:13 PM, John Green
 john.travis.gr...@gmail.com wrote:
  Had to refresh my svn skills as it's been years. As a result not much
  cleaning up got done, Andy/Pei. The code is solid though and I sent
  four different ways to view the json up too; collapsible dendrogram
  is the most useful.
 
 
  The script could easily be rewritten to iterate through a directory
  as it's in the form of a simple class. Also, it should take command line
  args.
  I'm out of time this weekend, even for the ten minutes that would
  take, but I can do both next weekend.
 
 
  Let me know if it's useful at all Andy or if you need tweaks on
  anything to make it useful for whatever demo you have in mind, I'd be
  happy to as time permits.
 
 
  Hope to make more significant contributions to this wonderful project
  sometime in the next year, Jg
  --
  Sent from Mailbox for iPhone

Re: Apache cTAKES Example Application?

2014-04-17 Thread John Green

+1!


I'm mainly using cTAKES as middleware, which is totally in line with this.




What is NCBO? 




JG
—
Sent from Mailbox for iPhone

On Wed, Apr 16, 2014 at 6:53 PM, andy mcmurry mcmurry.a...@gmail.com
wrote:

 Lowering the barrier to entry = worth the effort
 Notice the NCBO users mailing list has solid routine comments and
 discussion about the VM appliance they distribute.
 On Wed, Apr 16, 2014 at 9:37 AM, Pei Chen chen...@apache.org wrote:
 We spent some time in the past to make it easier for users to launch the
 CVD/CPE.
 But based on the questions/discussions, I think we are past this stage
 and a very common use case would be for developers to use cTAKES as a lib,
 extend a class or two and then embed it into their existing app.

 I am proposing a ctakes-web-demo for the sandbox,
 A simple webapp- war/maven pom.xml that uses the ctakes as a dependency.
  Have a simple servlet that wires up a pipeline (uimaFIT style), and then
 dump the CAS as html table.  We could even host it on:
 https://demo-ctakes.apache.org/

 It will probably only be a few lines of code, but it may be a good starting
 point for developers who are more interested in using it as a lib and not
 necessarily modifying ctakes code.


 What do folks think?

 --Pei

Re: Ctakes-data-vis

2014-04-13 Thread John Green

Great! I'll try and fix that soon. I'm back on the wards so time is slim.


What are the next steps for the vm? For the demo site? 




Out of curiosity, would this allow resources to stay loaded and a kind of queue
to be set up? Is there a solution that allows this now? That is, the
resources stay loaded in memory, the UMLS auth stays current, and I could just
pass content as it becomes available?




Jg
—
Sent from Mailbox for iPhone

On Sat, Apr 12, 2014 at 2:57 PM, andy mcmurry mcmurry.a...@gmail.com
wrote:

 It looks great! The transitions are smooth and the hierarchical browsing is
 straightforward. The only edit I recommend is about spacing -- the
 information often exceeds the space of a single page.
 On Sat, Apr 5, 2014 at 12:13 PM, John Green
 john.travis.gr...@gmail.com wrote:
 Had to refresh my svn skills as it's been years. As a result not much
 cleaning up got done, Andy/Pei. The code is solid though and I sent four
 different ways to view the json up too; collapsible dendrogram is the most
 useful.


 The script could easily be rewritten to iterate through a directory as
 it's in the form of a simple class. Also, it should take command line args.
 I'm out of time this weekend, even for the ten minutes that would take, but
 I can do both next weekend.


 Let me know if it's useful at all Andy or if you need tweaks on anything to
 make it useful for whatever demo you have in mind, I'd be happy to as time
 permits.


 Hope to make more significant contributions to this wonderful project
 sometime in the next year,
 Jg
 --
 Sent from Mailbox for iPhone

Further

2014-04-05 Thread John Green

Well, dang. I think I can fix my mistake but it seems one of my files needs
to be unlocked. Rather than compound things here, Pei, do you mind extricating
me from this by moving those files to a subdir called ctakes-data-vis?


Sorry about that.


JG
—
Sent from Mailbox for iPhone

Disregard

2014-04-05 Thread John Green

Please disregard, Pei. The lock error I was getting threw me. All is corrected.


Jg
—
Sent from Mailbox for iPhone

RE: ctakes-vm.apache.org

2014-04-03 Thread John Green

Would love to! So far I've only submitted those example notes I did to a Jira
ticket. How do I push to the sandbox dir? Any special permissions I need?




JG

—
Sent from Mailbox for iPhone

On Wed, Apr 2, 2014 at 10:51 PM, Chen, Pei pei.c...@childrens.harvard.edu
wrote:

 John,
 If there are no other objections, you can also put it directly in sandbox
 https://svn.apache.org/repos/asf/ctakes/sandbox/
 It may make it easier in the future if folks decided to integrate into 
 cTAKES... and possibly save any potential IP/License questions...
 --Pei
 
 From: John Green [john.travis.gr...@gmail.com]
 Sent: Wednesday, April 02, 2014 6:24 PM
 To: dev@ctakes.apache.org
 Subject: Re: ctakes-vm.apache.org
 Great!
 Let me clean it up this weekend and I'll throw it out onto my github. Will
 post link soon; no later than close of business this weekend.
 JG
 —
 Sent from Mailbox for iPhone
 On Wed, Apr 2, 2014 at 1:53 PM, andy mcmurry mcmurry.a...@gmail.com
 wrote:
 Yes! Impeccable timing. Where can we find the python source?
 On Apr 2, 2014 8:33 AM, John Green john.travis.gr...@gmail.com wrote:
 Andy: this is very interesting and exciting.




 I hacked out a script that makes a visually appealing representation of
 the aggregate pipeline in d3js that, at least for a clinician, is a nice
 overall summary of the metadata generated from the pipeline. It's really no
 more than a parser of the xml through the type system, spat out into
 json, but when I was talking to my informatics department, who didn't know
 much at all about cTAKES, it was a great visual summary. It's in Python. I
 don't know if you'd want it but it might be worth having the demo site spit
 out a visually appealing graphic like this automatically. If not in Python
 it might be worth adapting it to whatever you're using for a platform to spit
 out the json for the d3js graphic I'm using.
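
For readers who want to picture the kind of script described here, the
following is a minimal sketch and not the script from the thread. It assumes
a simplified, made-up XML layout (a <doc> element carrying the note text,
with one child element per annotation and begin/end offsets), which is much
simpler than real pipeline output. It groups annotations by element name and
emits the nested name/children structure that d3's hierarchical layouts
(such as a collapsible dendrogram) commonly consume.

```python
import json
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, simplified input; element names are illustrative only.
SAMPLE = """<doc text="Patient denies chest pain. Started aspirin.">
  <SignSymptomMention begin="15" end="25"/>
  <MedicationMention begin="35" end="42"/>
  <Sentence begin="0" end="26"/>
</doc>"""


def to_hierarchy(xml_text):
    """Group annotations by element name into a name/children tree."""
    root = ET.fromstring(xml_text)
    text = root.get("text", "")
    groups = defaultdict(list)
    for element in root:
        begin, end = int(element.get("begin")), int(element.get("end"))
        groups[element.tag].append(
            {"name": text[begin:end], "begin": begin, "end": end}
        )
    return {
        "name": "document",
        "children": [
            {"name": tag, "children": spans}
            for tag, spans in sorted(groups.items())
        ],
    }


if __name__ == "__main__":
    print(json.dumps(to_hierarchy(SAMPLE), indent=2))
```

Grouping by annotation type first keeps the top level of the tree small,
which is what makes a collapsible view readable on long notes.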




 John

 --
 Sent from Mailbox for iPhone

 On Thu, Mar 20, 2014 at 5:31 AM, andy mcmurry mcmurry.a...@gmail.com
 wrote:

  Yes! I have been working full time on the apt-get install task specific
  to medical genetics: http://www.ncbi.nlm.nih.gov/medgen
  Right now, millions of $$$ are invested in getting phenotype concepts --
  indications, diseases, problem lists -- linked to patient test results
  including DNA / RNA / etc. In industry, most of the curation work is done
  manually because platforms like cTAKES are not yet immediately accessible.
  I have written code to
  A) start automating the installer tasks for cTAKES on Ubuntu 13
  B) install UMLS NLP tools metamap, semrep, semmed
  C) mirror NLM content that extends UMLS annotation
  *SO THAT : *
  Mentions of diseases relationships -- SNOMED-CT, HPO, OMIM, GTR, UMLS --
  reference the same semantic relationships in UMLS Clinical Terms and
  Genetic Test Reference. This is powerful and all credit to the NLM for
  creating MedGen and GTR, new crucial additions to the UMLS. To my
  knowledge, these new sources have not been fully utilized by the medical
  NLP community.
  *I'm strongly advocating for a cTAKES VM that indexes UMLS concepts in the
  same way that NCBI indexes UMLS linked Medical Genetics terms.*
  Towards this goal, if other committers are interested,  I'm 100% time
  committed to this problem.
  *TL;DR*: at minimum, having a demo site makes cTAKES more accessible. We
  should demonstrate rather than explain every feature of cTAKES. I'm working
  100% on the Clinical Text +BioNLP problem. If that interests you, let me
  know. I'm convinced this area has huge, understudied potential.
  --AndyMC
  On Tue, Mar 18, 2014 at 8:15 AM, Pei Chen chen...@apache.org wrote:
  FYI:
  ASF Infra is setting up our VM for demo purposes.
  INFRA-7451
 
  If you need access, feel free to let us know.
  Initial maintainers: james-masanz, andymc,chenpei
  --Pei

Nice info

2014-03-03 Thread John Green

I'm sure most of you know what he presents esp because he's from Mayo, but
great talk:

http://webcast.jhu.edu/Mediasite/Play/0fd833143fcc4bd5954bcab801bba6da1d

If that URL doesn't work:
http://www.icm.jhu.edu/seminars/index.php?pageid=5

and look for Christopher Chute
Professor of Biomedical Informatics, Mayo Clinic College of Medicine

JG

Re: Sectionizer

2014-02-16 Thread John Green

To answer my own question, and for anyone searching the mail archives in
the future, I haven't fully explored it yet, but it seems that Vanderbilt
already did this with SecTag.

JG


On Sun, Feb 9, 2014 at 9:29 PM, digital paula cybersat...@hotmail.com wrote:

 John,

 I've been out of the loop for a few weeks now w/the cTAKES developer list
 and will be until end of Feb... too many pressing deadlines.  I have to
 step through the code again to verify if sectionizer uses reg ex but I'm
 pretty sure that it does.   Hmmm, that does sound interesting to train the
 sectionizer from clinical notes.   Not familiar with Mastif so hopefully
 someone else can chime in on your question there.   Talk to you and
 everyone very soon.  :-)

 Regards,
 Paula

  Date: Sun, 9 Feb 2014 17:05:40 -0500
  Subject: Sectionizer
  From: john.travis.gr...@gmail.com
  To: dev@ctakes.apache.org
 
  I know there has been some chatter about sectionizers. From what I
  understand Paula is doing, and from what I understand YTEX does, they are
  all regular expression based, correct?
 
  Has anyone added to rule based matching a statistical/Markov type
  sectionizer? E.g. one trained from a bunch of notes?
 
  Does Mastif work this way?
 
  Thanks all,
  JG

Sectionizer

2014-02-09 Thread John Green

I know there has been some chatter about sectionizers. From what I
understand Paula is doing, and from what I understand YTEX does, they are
all regular expression based, correct?

Has anyone added to rule based matching a statistical/Markov type
sectionizer? E.g. one trained from a bunch of notes?

Does Mastif work this way?

Thanks all,
JG
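
To make the "regular expression based" approach concrete, here is a minimal
sketch of a rule-based sectionizer. It is not the cTAKES or YTEX
implementation. It assumes, purely for illustration, that section headers
are upper-case phrases at the start of a line followed by a colon; real
notes are far messier, which is part of the motivation for the
statistical alternative asked about above.

```python
import re

# Assumption: a header is an upper-case phrase at line start ending in ":".
HEADER = re.compile(r"^([A-Z][A-Z /]+):[ \t]*", re.MULTILINE)


def sectionize(note):
    """Return a list of (header, body) pairs in document order."""
    matches = list(HEADER.finditer(note))
    sections = []
    for i, match in enumerate(matches):
        # A section body runs until the next header or the end of the note.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note)
        sections.append((match.group(1), note[match.end():end].strip()))
    return sections


NOTE = (
    "HPI: 54M with chest pain.\n"
    "MEDICATIONS:\n"
    "aspirin 81 mg daily\n"
    "PLAN: Will consider biopsy.\n"
)
```

With this pattern, sectionize(NOTE) yields three sections (HPI, MEDICATIONS,
PLAN); a header written as "Plan -" or with no colon would be missed, which
shows how brittle fixed rules can be.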

Re: Brat

2014-02-08 Thread John Green

You're too kind, Pei, with the phrase "your analysis" ... What I was
thinking was Brat for both annotations (forward), but also data
visualization (backward in Brat). I know we can visualize a little with
UIMA's tools, but I like the layout of Brat for a 1000-yard look at the
data that I will be formulating a hypothesis on (or have in this case); for
instance, right now, I'm looking at the I2B2 data a lot, as well as some
note sets my informatics department has as I write my IRB proposal, and it
just seems having this larger look would help me narrow the
scope of the study based on the limitations of the data.

I've been thinking of doing something similar to brat in
Django/Python/Twitter Bootstrap, but I hate reinventing wheels.

But, maybe I am missing something and another visually rich representation
of the CAS is available?

JG


On Fri, Feb 7, 2014 at 1:43 PM, Chen, Pei pei.c...@childrens.harvard.edu wrote:

 I think some of the OpenNLP folks did some work with the Brat annotation
 tool, but I don't think anyone has worked on it with cTAKES - I would be
 curious on your analysis though...


  -Original Message-
  From: John Green [mailto:john.travis.gr...@gmail.com]
  Sent: Friday, February 07, 2014 6:01 AM
  To: dev@ctakes.apache.org
  Subject: Brat
 
  I've done a cursory search and come up short: has anyone written anything
  to convert the annotations from the pipeline to the Brat *.ann format?
 
  Thanks,
  JG

Re: Brat

2014-02-08 Thread John Green

Hi Will, that would be much appreciated. Do you mind sending it my way? Any
changes/additions I can commit somewhere or just email back to you if/when
they occur.

Thank you for your offer,
John


On Fri, Feb 7, 2014 at 1:49 PM, William Karl Thompson
w...@northwestern.edu wrote:

 Hi,

 OpenNLP has some code that takes brat annotation files and creates
 BratAnnotation object instances. I've taken the code and modified it
 (simplified in some ways) to generate cTAKES annotations, using a
 BratAnnotator analysis engine that reads in brat annotation files. I
 would be happy to share that code with anyone who wants to look at it and
 make it better!

 -Will
 
 From: Chen, Pei [pei.c...@childrens.harvard.edu]
 Sent: Friday, February 07, 2014 12:43 PM
 To: dev@ctakes.apache.org
 Subject: RE: Brat

 I think some of the OpenNLP folks did some work with the Brat annotation
 tool, but I don't think anyone has worked on it with cTAKES - I would be
 curious on your analysis though...


  -Original Message-
  From: John Green [mailto:john.travis.gr...@gmail.com]
  Sent: Friday, February 07, 2014 6:01 AM
  To: dev@ctakes.apache.org
  Subject: Brat
 
  I've done a cursory search and come up short: has anyone written anything
  to convert the annotations from the pipeline to the Brat *.ann format?
 
  Thanks,
  JG

Brat

2014-02-07 Thread John Green

I've done a cursory search and come up short: has anyone written anything
to convert the annotations from the pipeline to the Brat *.ann format?

Thanks,
JG
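
As a rough idea of what such a converter involves, the sketch below writes
text-bound annotations in brat's standoff format (an ID, a tab, the type with
start and end offsets, a tab, and the covered text). It is not taken from
cTAKES or OpenNLP; the (label, begin, end) tuples and the to_brat() helper
are hypothetical, and only simple contiguous spans are handled.

```python
def to_brat(text, annotations):
    """Render (label, begin, end) tuples as brat text-bound .ann lines."""
    lines = []
    ordered = sorted(annotations, key=lambda a: (a[1], a[2]))
    for index, (label, begin, end) in enumerate(ordered, start=1):
        # The covered text goes on one line, so newlines become spaces here.
        covered = text[begin:end].replace("\n", " ")
        lines.append(f"T{index}\t{label} {begin} {end}\t{covered}")
    return "\n".join(lines) + "\n" if lines else ""


TEXT = "Patient denies chest pain. Started aspirin."
SPANS = [("Medication", 35, 42), ("SignSymptom", 15, 25)]
```

Writing to_brat(TEXT, SPANS) to a file named like the note but with an .ann
extension, next to the .txt file, is the usual layout brat expects. Events,
relations and attributes would need additional line types not shown here.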

Re: YTEX cTAKES 3.1.1 ready

2014-02-07 Thread John Green

Completely non-contributory, but it is odd/humorous to see the headaches
that the notes we write quickly in the 5 minutes post-encounter cause in
free-text analysis.

JG


On Thu, Feb 6, 2014 at 1:27 PM, Finan, Sean 
sean.fi...@childrens.harvard.edu wrote:

 Right, got it.  I just wanted to let you know that some EMR notes -do-
 require sentence splitting at newline characters.

 -Original Message-
 From: vijay garla [mailto:vnga...@gmail.com]
 Sent: Thursday, February 06, 2014 1:06 PM
 To: dev@ctakes.apache.org
 Cc: ytex-us...@googlegroups.com; ctakes-...@incubator.apache.org;
 vlad.valtchi...@gmail.com
 Subject: Re: YTEX cTAKES 3.1.1 ready

 The cTAKES sentence detector is not changed in the YTEX branch.  The YTEX
 branch has an *additional* sentence detector that does not automatically
 split sentences on newlines - users can use this if they like.

 -vj
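
The difference between the two behaviours discussed in this thread can be
shown with a toy example. Neither function below is the cTAKES or YTEX
sentence detector; they are deliberately naive splitters, and the sample
sentences are invented. One treats a newline as ordinary whitespace, the
other also ends a sentence at every newline.

```python
import re

_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_on_punctuation(text):
    """Newlines count as spaces; only . ! ? end a sentence."""
    flat = re.sub(r"\s*\n\s*", " ", text.strip())
    return [s for s in _SENTENCE_END.split(flat) if s]


def split_on_punctuation_and_newlines(text):
    """Every newline also ends a sentence."""
    sentences = []
    for line in text.splitlines():
        sentences.extend(split_on_punctuation(line))
    return sentences


WRAPPED = "The patient was\nseen today. Stable."
UNPUNCTUATED = "Will consider biopsy\nGiven history, follow up."
```

On WRAPPED, the first splitter keeps "The patient was seen today." intact
while the second breaks it in two. On UNPUNCTUATED the situation reverses:
only the newline-aware splitter separates the two statements. That is why
having both detectors available, as described above, is useful.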


 On Thu, Feb 6, 2014 at 1:01 PM, Finan, Sean 
 sean.fi...@childrens.harvard.edu wrote:

  Hi Vijay,
 
  I have yet to run across clinical text from a real EMR where newlines
  represent the end of a sentence
 
  Since James pointed out this possibility a couple weeks ago, I have
  kept my eyes open.  The problem is pretty ubiquitous in a corpus that
  I'm working with right now.  I just opened the first note and gave it
  a count ... 95 lines total, 9 are sentence/phrase (lacking punctuation)
  endings. This is not including lists, which comprise about half of the
  note. One possible conjoinment was Will consider [...] biopsy\nGiven [...].
  Depending upon how cTakes deals with it, the meaning could change
  drastically.
   I believe cTAKES absolutely has to support sentences with newlines
  within them
 
  Yes, cTakes should do so, but I hope that you aren't suggesting that
  it only support such a structure.
 
  Where is that easy button?
 
  -Original Message-
  From: vijay garla [mailto:vnga...@gmail.com]
  Sent: Thursday, February 06, 2014 10:31 AM
  To: dev@ctakes.apache.org
  Cc: ytex-us...@googlegroups.com; ctakes-...@incubator.apache.org;
  vlad.valtchi...@gmail.com
  Subject: Re: YTEX cTAKES 3.1.1 ready
 
  I believe it is worth migrating to trunk.
 
  Note that the sentence detector is also complementary - the existing
  ctakes sentence detector is unchanged - users can choose which
  sentence detector to use.  There are changes to assertion  dependency
  parsing to support sentences without newlines, and that works with
  both sentence detectors.
 
  I believe cTAKES absolutely has to support sentences with newlines
  within them - I have yet to run across clinical text from a real EMR
  where newlines represent the end of a sentence - the changes to
  assertion  dependency parsing will have to be done at some point.
 
  -vj
 
 
  On Thu, Feb 6, 2014 at 10:19 AM, Chen, Pei
  pei.c...@childrens.harvard.eduwrote:
 
   VJ,
   Aside from the changes to the existing cTAKES code (sentence detector,
   etc.) [which we could leave out if it's still being debated], do you
   think it's worth migrating the ytex code to trunk at this point?
   As you mentioned earlier, it's largely complementary.
   [I was just thinking of saving effort to maintain the separate
   branch and for simplicity for dev...]
  
   --Pei
  
-Original Message-
From: vijay garla [mailto:vnga...@gmail.com]
Sent: Wednesday, February 05, 2014 9:30 PM
To: ytex-us...@googlegroups.com; ctakes-...@incubator.apache.org;
vlad.valtchi...@gmail.com
Subject: Re: YTEX cTAKES 3.1.1 ready
   
Hi Vlad,
   
I Updated the umls install guide; see
https://code.google.com/p/ytex/wiki/UMLS_SQL_SERVER_3_1
   
I would prefer to add the docs in the ctakes confluence, but as far as I can
tell, I don't have write access there - can somebody give me write privileges
on the ctakes confluence site?
   
There was a bug in the umls install; copy
https://svn.apache.org/repos/asf/ctakes/branches/ytex/ctakes-ytex/scripts/data/build.xml
over the corresponding file in your ctakes-3.1.2 install
(CTAKES_HOME\bin\ctakes-ytex\scripts\data) and you should be set.
The import is currently running on the UMLS 2013AA (I assume this will
complete without issues as long as the umls schema hasn't changed from 2012).
   
what trial and error did you have to go through to build the distro?
   
-vj
   
   
On Wed, Feb 5, 2014 at 5:33 PM, vijay garla vnga...@gmail.com
 wrote:
   
 Hi Vlad,

 sorry that the instructions aren't clear.

 re 1) What I am trying to say is install
 apache-ctakes-3.2.0-snapshot as usual (this is unchanged from
 3.1.1).  After that you still have to apply the lib and
 resources (these are things that cannot be distributed via apache).

 re 2) Yes, I need to update those docs.  Hopefully will get to
 that at some point.  However, I assume you already have a UMLS
 DB (also assume SQL Server).  If you can't/don't

Re: cTakes-247

2014-02-07 Thread John Green

Pei - Yes, yes it seems it does. I saw that when exploring the ticket. Is
the TypeSystem.xml complete? I think it is ... Also, is this something that
anyone thinks is worthwhile? I suppose I may be in a better position to
comment, learning my way through cTakes now and this being geared toward
new folks such as myself. But opinions before diving in would be appreciated,
time always being limited.

JG


On Thu, Feb 6, 2014 at 9:51 AM, Chen, Pei pei.c...@childrens.harvard.edu wrote:

 John,
 As a starting point, you may want to check out:

 http://svn.apache.org/repos/asf/ctakes/tags/ctakes-3.1.1/ctakes-type-system/src/main/resources/org/apache/ctakes/typesystem/types/
 The content (Descriptions) probably needs to be filled in more...
 --Pei

  -Original Message-
  From: John Green [mailto:john.travis.gr...@gmail.com]
  Sent: Thursday, February 06, 2014 8:26 AM
  To: dev@ctakes.apache.org
  Subject: cTakes-247
 
  Anyone working on Jira item cTakes-247? If not, I was gonna tackle it. And
  if no one is, is everyone OK with a python script that auto-runs the XSLT
  transformations with some pretty css/javascript?
 
  JG

cTakes-247

2014-02-06 Thread John Green

Anyone working on Jira item cTakes-247? If not, I was gonna tackle it. And
if no one is, is everyone OK with a python script that auto-runs the XSLT
transformations with some pretty css/javascript?

JG
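
A sketch of what such a script might look like, under two assumptions: that
the ticket is about turning the type system descriptors into browsable
documentation (as the replies in this thread suggest), and that plain
ElementTree parsing is acceptable in place of XSLT, since the Python
standard library has no XSLT processor. The sample descriptor below is
invented for illustration; it only mirrors the general layout of a UIMA
type system descriptor.

```python
import html
import xml.etree.ElementTree as ET

NS = {"u": "http://uima.apache.org/resourceSpecifier"}

# Invented sample in the general shape of a UIMA type system descriptor.
SAMPLE = """<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <types>
    <typeDescription>
      <name>example.type.Mention</name>
      <description>A span of text that refers to a concept.</description>
      <supertypeName>uima.tcas.Annotation</supertypeName>
      <features>
        <featureDescription>
          <name>polarity</name>
          <description>Negated or not.</description>
          <rangeTypeName>uima.cas.Integer</rangeTypeName>
        </featureDescription>
      </features>
    </typeDescription>
  </types>
</typeSystemDescription>"""


def render(xml_text):
    """Render each type, its description and feature names as an HTML table."""
    root = ET.fromstring(xml_text)
    rows = []
    for type_el in root.iterfind("u:types/u:typeDescription", NS):
        name = type_el.findtext("u:name", "", NS)
        description = type_el.findtext("u:description", "", NS)
        features = ", ".join(
            f.findtext("u:name", "", NS)
            for f in type_el.iterfind("u:features/u:featureDescription", NS)
        )
        rows.append(
            "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
                html.escape(name), html.escape(description), html.escape(features)
            )
        )
    header = "<tr><th>Type</th><th>Description</th><th>Features</th></tr>"
    return "<table>\n" + header + "\n" + "\n".join(rows) + "\n</table>"
```

Types whose description element is empty would show up as blank cells, which
doubles as a quick way to spot where descriptions still need filling in.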

RE: Output Schema Documentation

2013-12-27 Thread John Green

I found the above that Pei mentioned helpful, and the brief UIMA tutorial
on their page and the original 2010 cTAKES JAMIA article were also helpful
for a big-picture look.




JG

—
Sent from Mailbox for iPhone

On Tue, Dec 24, 2013 at 11:28 AM, Chen, Pei
pei.c...@childrens.harvard.edu wrote:

 I think the type system doc [1] and javadoc [2] is probably the closest thing 
 I could think of:
 It's not an xml schema of the UIMA XMI per se though...
 [1] 
 http://svn.apache.org/repos/asf/ctakes/tags/ctakes-3.1.1/ctakes-type-system/src/main/resources/org/apache/ctakes/typesystem/types/TypeSystem.xml
 [2] http://ctakes.apache.org/apidocs/3.1.1/
 Hope that helps, but if you have suggestions, feel free to post it on this 
 list...
 --Pei
 -Original Message-
 From: nartz...@gmail.com [mailto:nartz...@gmail.com] On Behalf Of
 Dewful
 Sent: Monday, December 23, 2013 7:11 AM
 To: dev@ctakes.apache.org
 Subject: Output Schema Documentation
 
 Hi All -
 
 I'm wondering if there is a piece of documentation that describes in detail
 the actual output of running the AggregatePlaintextUMLSProcessor?
 
 I can export the annotations as XML, but am not quite sure what I'm looking
 at. I would like to know the output of each component without having to
 jump into each javadoc / src code. Does this exist somewhere?
 
 Thanks!
 
 Nick

Re: Documentation

2013-12-27 Thread John Green

James, thank you for such a comprehensive reply! I sincerely appreciate the
effort.

I was being slightly tongue-in-cheek in regards to diving into old posts,
but thank you for the affirmation that I'm probably right on track. I'm
finding the UIMA documentation profoundly helpful along with cherry-picked
old messages on the cTakes boards.

Thank you for pointing me at 3 useful issues to tackle. I'm going to take a
hard look at those 3 issues around Feb 3rd, when my latest series of exams
are over.

Regarding groovy, I'm not familiar with it, but I'll take a brief look, also
in February. This week I had off was spent almost entirely reading UIMA
documentation and a lot of the papers around cTakes.

The documentation is also something I am very interested in helping with,
especially since I'm going through it all right now as a new
user/developer. I will make this a high priority Feb 3rd as well.

Again, thank you for pointing me in a couple of directions where I can be
most helpful. I look forward to this set of exams being done with so I can
get to it. Reading all of the cTakes literature and its associated article
references was very exciting.

Sincerely,
JG


On Mon, Dec 23, 2013 at 11:41 AM, Masanz, James J. masanz.ja...@mayo.edu wrote:

 Hi John,

 As far as documentation, I doubt you are missing anything

 It all really starts from
 http://ctakes.apache.org/
 but with most of it on the confluence wiki
 https://cwiki.apache.org/confluence/display/CTAKES/

 And then yes there are bits and pieces spread across old posts.

 As far as code, you might try taking a look at any of these
 https://issues.apache.org/jira/i#browse/CTAKES-217
 https://issues.apache.org/jira/i#browse/CTAKES-155
 https://issues.apache.org/jira/i#browse/CTAKES-66

 Or any other JIRA issue that doesn't have an assignee.
 Or if you see an open issue with an assignee that looks interesting, you
 could also ask if someone is in the middle of a fix or not.

 I also plan to post or check in a groovy script for running the
 AggregatePlaintextUMLSProcessor today or tomorrow. The version I am working
 on doesn't do the dynamic downloading of the required components - I am
 taking the tactic of assuming the user downloaded a cTAKES binary and the
 separately downloadable resources.

 I think it would be great if you could try it out, and perhaps extend so
 there is a way to run an example of each component to give people an idea
 of what each component can do, as has been discussed on the dev@ mailing
 list.

 Another task might be to point out where you think the documentation is
 most scattered and maybe we can address that.

 -- James

 -Original Message-
 From: dev-return-2334-Masanz.James=mayo@ctakes.apache.org [mailto:
 dev-return-2334-Masanz.James=mayo@ctakes.apache.org] On Behalf Of
 John Green
 Sent: Friday, December 20, 2013 3:15 PM
 To: dev@ctakes.apache.org
 Subject: Documentation

 Hi all, Happy Holidays!

 I have a week off, then 6 weeks of insanity, then I'll finally be regularly
 free to try and help out, not that anyone is holding their breath or
 anything. Before February rolls around and I really start applying myself to
 some development corner of cTakes, is there anything I can do in the
 meantime that is pressing slop-work? Anything that I can leverage my
 clinical experience with to help cTAKES? Other than committing some more
 notes. Or maybe menial coding that is pressing? Like I've said before, I'm
 no computer scientist (only aspiring), but I can definitely knock out some
 grunt-work coding.

 In the meantime, this week, I'm still trying to get a real working
 understanding of all the moving parts in cTakes, both from a user side
 (building annotators, pipelines, dictionaries, etc) and a development side.
 I don't want to trouble anyone with individual questions before I've tackled
 all the literature/code documentation; however, what --is-- all the
 literature? The documentation, beyond installing the software, seems to be
 very spread out. Am I missing something obvious? Or is it just, dig in and
 read a billion different posts from 2008 to the present (including all
 the UIMA documentation/Lucene documentation etc)?

 As is the patent phrase in moments like these: forgive me if this has been
 asked before.

 John Green

Documentation

2013-12-20 Thread John Green

Hi all, Happy Holidays!

I have a week off, then 6 weeks of insanity, then I'll finally be regularly
free to try and help out, not that anyone is holding their breath or
anything. When February rolls around and I really start applying myself to
some development corner of cTakes, is there anything I can do in the
meantime that is pressing slop-work? Anything that I can leverage my
clinical experience with to help cTakes? Other than committing some more
notes. Or maybe menial coding that is pressing? Like I've said before, I'm
no computer scientist (only aspiring), but I can definitely knock out some
grunt-work coding.

In the meantime, this week, I'm still trying to get a real working
understanding of all the moving parts in cTakes, both from a user side
(building annotators, pipelines, dictionaries, etc) and a development side.
I don't want to trouble anyone with individual questions before I've tackled
all the literature/code documentation; however, what --is-- all the
literature? The documentation, beyond installing the software, seems to be
very spread out. Am I missing something obvious? Or is it just, dig in and
read a billion different posts from 2008 till the present (including all
the UIMA documentation/Lucene documentation etc)?

As is the patent phrase in moments like these: forgive me if this has been
asked before.

John Green

Re: cTAKES Groovy…Examples? Relatively low effort, potentially game changing

2013-12-20 Thread John Green

I was in passionate agreement with this post by Andy, by the by.




JG

—
Sent from Mailbox for iPhone

On Wed, Dec 4, 2013 at 6:29 PM, Andrew McMurry mcmurry.a...@gmail.com
wrote:

 Most Open Source frameworks come with a project-examples.zip folder.
 I can't help but think that the Groovy parser code and ctakes-gui make 
 excellent EXAMPLES for potential users. 
 https://svn.apache.org/repos/asf/ctakes/sandbox/
 Imagine if each ctakes-component had an example Groovy script that shows how 
 to use each component complete with the pubmed citation for each! 
 http://ctakes.apache.org/components.html
 Now imagine you could just download a VM and run the examples out of the 
 box. 
 I'll follow up in a separate thread about the VM progress. 
 I am passionate about improving the first time user experience. 
 Why? John Resig (creator of jQuery) gave a convincing (if not damning) 
 synopsis of how open source projects lose users. 
 I think our user base could be easily 10X if we follow his advice:  
 http://lanyrd.com/2009/harvard-open-source-retreat/scdrkh/
 Thoughts??  
 PS: My research interest in NLP/ machine learning methods is taking second 
 priority to helping the first time user experience. 
 It is imperative we get this stuff right. 
 On Dec 4, 2013, at 7:09 AM, Tim Miller timothy.mil...@childrens.harvard.edu 
 wrote:
 Very cool. I was noticing that it was downloading the umls resources which 
 the parser itself doesn't need -- so I made a change to not grab 
 clinical-pipeline and grab directly the things it was getting through that 
 reference and now it runs even faster with only a 35M initial download.
 
 I'd like to check in my change -- should we keep working out of sandbox or 
 can we maybe put groovy scripts somewhere alongside the projects they belong 
 to? Maybe in the scripts/ directory or scripts/groovy, scripts/perl, etc.? 
 Any opinions on this?
 
 Tim
 
 
 On 11/27/2013 12:19 PM, Chen, Pei wrote:
 The sample constituency parser printer should be working now...
 Just copy and paste the text to parser.groovy and make it executable.
 All you should need is groovy installed on your machine.
 http://svn.apache.org/repos/asf/ctakes/sandbox/groovy/parser.groovy
 $ parser.groovy input
 Reading from directory: input
  (TOP (S (NP-SBJ (NN patient)) (VP (VBD took) (NP (NP (NNS 50mg)) (PP (IN 
 of) (NP (NP (NN aspirin)) (PP (IN for) (NP (NP (NN pain)) (PP-LOC (IN in) 
 (NP (NN knee)(. .)))
 
 Maybe we could create one that will output UMLS CUI/Codes... and then 
 others could easily modify to their needs.
 
 --Pei
 -Original Message-
 From: William Karl Thompson [mailto:w...@northwestern.edu]
 Sent: Tuesday, November 26, 2013 10:46 PM
 To: dev@ctakes.apache.org
 Subject: RE: cTAKES Groovy...
 
 That is very cool!
 
 Since we're talking Groovy, I'd just like to make a plug for Gradle, a
 fantastic build/deployment/dependency management tool that is in many ways much
 nicer to work with than Maven, though it plays nicely with Maven (for
 example, it can use Maven repositories). Gradle is also proven technology:
 it's the build tool for the Android operating system.
 
 From: Chen, Pei [pei.c...@childrens.harvard.edu]
 Sent: Tuesday, November 26, 2013 4:13 PM
 To: dev@ctakes.apache.org
 Subject: cTAKES Groovy...
 
 Tim had a good end user use case:
 I just want to use the ctakes constituency parser and output the tree text
 to console.
 So I was inspired by Richard's example of Groovy...
 Check out:
 http://svn.apache.org/repos/asf/ctakes/sandbox/groovy/parser.groovy
 
 The Groovy script will automagically download the required
 classes, jars, and resources and then run automatically.
 No longer requires the user to have any knowledge of UIMA, cTAKES, etc.
 Sample:
 $ parser.groovy input
 Reading from directory: input
 patient took 50mg of aspirin for pain in knee.
 begin:0 end:48
 
 Pretty cool, 'eh...
 --Pei

RE: Sundry; Problem Lists

2013-11-05 Thread John Green

I spoke too loosely! Maybe not short term. What I was trying to express was a
step in an overall direction, e.g., to truly understand a clinical encounter,
this seemed to me a natural pre-req. These are my goals with NLP and may not be
cTakes' goals. I joined cTakes as a step in the understanding requisite to head
toward that goal.

Anyway, we'll see what happens! I'm going to see if I can generate a gold
standard list I can donate to cTakes, annotated by residents, by which the
performance of any approach someone does manage to come up with, myself or
otherwise, can be compared.

Does such a list exist? I've heard it mentioned in some of the PubMed articles,
but at first blush I've seen no publicly available set of annotated notes. Maybe
each of your institutions has its own proprietary lists. At Walter Reed, where I
work primarily, I haven't heard of anything like that.




Jg

—
Sent from Mailbox for iPhone

On Mon, Nov 4, 2013 at 8:29 PM, Finan, Sean
sean.fi...@childrens.harvard.edu wrote:

 Hi John,
 as the simplest answer to your comment/request:
 I'd be interested in hearing more of what you meant by:  - if not 
 completely necessary for any real clinical use of nlp.
 I'll answer with your own words:
 a good problem list, whether the physician admits to it or not, is 
 interpretation problem-number-one.
 There was nothing deep in my writing.
 I am a little confused by your statement:
 In the short-term, any NLP wanting to suggest further workup on this man 
 would need to a) recognize those features of the HPI and b) prioritize the 
 TB workup!
 I don't know if anybody has a short-term goal for an nlp tool that makes a 
 diagnosis (a) or suggests a procedure (b).  That seems to be a very long-term 
 goal for software that goes beyond the processing of natural language in the 
 note.  I may be misreading what you wrote.
 I understand your example, and like the ideas of parsing a problem list (if 
 explicit) or extracting a problem list (if not explicit).  These are what I 
 would say could be immediate goals for nlp.  At this time I do not know of 
 any special problem list section parsing - in fact, cTakes does not handle 
 formatted lists / tables.  Summarization of patient information (extraction 
 of a problem list and other) from unstructured text is already a big goal.
 Reordering a problem list, as well as clumping, would require quite a bit 
 more than nlp - a database and intelligent decision-making.  I'm not saying 
 that an nlp group would not love to tackle such a matter, just that it spills 
 outside the domain.
 I hope that I am starting to get on the same page, and I am enjoying this 
 chat - it is different from my normal engagements, which is always nice.
 Cheers,
 Sean
 From: John Green [mailto:john.travis.gr...@gmail.com]
 Sent: Monday, November 04, 2013 5:30 PM
 To: Finan, Sean
 Cc: dev@ctakes.apache.org
 Subject: Re: Sundry; Problem Lists
 Thank you Sean for taking the time to respond to me, it was much appreciated. 
 I'm learning a lot about a lot.
I briefly discussed the first idea (acute vs. historical) with another 
physician (after you brought it up) and there was concurrence that such a
feature would be extremely useful - if not completely necessary for any real 
clinical use of nlp.  I think that if temporal parsing ever becomes finite 
enough with respect to the time of an event relative to the time of the note 
(DocTimeRel) or with proper narrative containers, then this becomes a 
possible use case.  I mention this in a weak attempt to pull the nlpers into 
the discussion ...
 I'd be interested in hearing more of what you meant by:  - if not completely 
 necessary for any real clinical use of nlp. I may be showing my lack of 
 knowledge here again, or I may have miscommunicated in the first instance: a 
 good problem list, whether the physician admits to it or not, is 
 interpretation problem-number-one. Take this example of a History of Present 
 Illness in physician lingo: I come in with a cough, I have a sick child at
 home with a cough, I'm also 60 years old and a bad diabetic and a recent lab 
 value showed an A1C of 9. Further, I'm also a traveler and I just came back 
 from visiting my cousin in (some country endemic with tuberculosis). Of 
 course, all of the above may be in a narrative that includes complex story 
 features, that the physician may or may not have included in the free-text 
 note. Mr X is a 60 yo man with a known history of CAD and DMII. Patient 
 states he came home and had a cough. He further states that his daughter has 
 a cough. He recently returned from a country in which he had regular contact 
 with people with TB. He expresses concern and anxiety over this. Well, our 
 problem list is above (Cough, Sick contact at home (viral), Sick contact 
 abroad (TB), A1C of 9). In the short-term, any NLP wanting to suggest further 
 workup on this man would need to a) recognize those features of the HPI and 
 b) prioritize the TB workup

Re: Sundry; Problem Lists

2013-11-04 Thread John Green

Thank you Sean for taking the time to respond to me, it was much
appreciated. I'm learning a lot about a lot.

I briefly discussed the first idea (acute vs. historical) with another
physician (after you brought it up) and there was concurrence that such a
feature would be extremely useful - if not completely necessary for any
real clinical use of nlp.  I think that if temporal parsing ever becomes
finite enough with respect to the time of an event relative to the time of
the note (DocTimeRel) or with proper narrative containers, then this
becomes a possible use case.  I mention this in a weak attempt to pull the
nlpers into the discussion ...

I'd be interested in hearing more of what you meant by:  - if not
completely necessary for any real clinical use of nlp. I may be showing my
lack of knowledge here again, or I may have miscommunicated in the first
instance: a good problem list, whether the physician admits to it or not,
is interpretation problem-number-one. Take this example of a History of
Present Illness in physician lingo: I come in with a cough, I have a sick
child at home with a cough, I'm also 60 years old and a bad diabetic and a
recent lab value showed an A1C of 9. Further, I'm also a traveler and I
just came back from visiting my cousin in (some country endemic with
tuberculosis). Of course, all of the above may be in a narrative that
includes complex story features, that the physician may or may not have
included in the free-text note. Mr X is a 60 yo man with a known history
of CAD and DMII. Patient states he came home and had a cough. He further
states that his daughter has a cough. He recently returned from a country
in which he had regular contact with people with TB. He expresses concern
and anxiety over this. Well, our problem list is above (Cough, Sick
contact at home (viral), Sick contact abroad (TB), A1C of 9). In the
short-term, any NLP wanting to suggest further workup on this man would
need to a) recognize those features of the HPI and b) prioritize the TB
workup! So the modified by priority problem list would be 1) Cough 2) TB
exposure ... etc. Clumping could ensue. Also, for a longitudinal problem
list, one that tracked across clinical encounters, only the TB exposure and
maybe a history of poorly controlled diabetes would need to continue on
in the patient's history. Certainly a sick child at home would not (what I
meant by acute vs longitudinal problem lists).
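
To make the acute vs. longitudinal distinction concrete, here is a minimal sketch in Python. It is not cTakes code; the `Problem` class and both helper functions are hypothetical names made up for illustration. Only the first two priorities (Cough, then TB exposure) come from the example above; the remaining ordering is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Problem:
    """Hypothetical problem-list entry; not a cTakes type."""
    name: str
    priority: int        # 1 = address first; only 1 and 2 come from the message
    longitudinal: bool   # True if it should follow the patient across encounters


def acute_list(problems: List[Problem]) -> List[Problem]:
    """Everything relevant to this encounter, most important first."""
    return sorted(problems, key=lambda p: p.priority)


def longitudinal_list(problems: List[Problem]) -> List[Problem]:
    """Only the problems worth carrying into the patient's history."""
    return [p for p in acute_list(problems) if p.longitudinal]


encounter = [
    Problem("Sick contact at home (viral)", 4, False),
    Problem("TB exposure", 2, True),
    Problem("Cough", 1, False),
    Problem("Poorly controlled diabetes (A1C of 9)", 3, True),
]

print([p.name for p in acute_list(encounter)])
print([p.name for p in longitudinal_list(encounter)])
```

The hard part, of course, is filling in `priority` and `longitudinal` from free text; the sketch only shows what the two lists would look like once that is done.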

Thanks for the conversation Sean,
Sincerely,
John


On Mon, Nov 4, 2013 at 12:15 PM, Finan, Sean 
sean.fi...@childrens.harvard.edu wrote:

  Excellent!  By the by, I know next to nothing about nlp - I'm just a
 software developer that (for some reason) jumped down this (nlp) particular
 rabbit hole.  When it comes to nlp background, research, state and
 direction I'm hoping that somebody much more knowledgeable than I will jump
 in.

  after a thorough pubmed search, no one seems to have tried to build
 problem lists for ACUTE encounters, only as extensions to a past medical
 history
 I'm really glad that we have a truly novel road on which to travel.

   I seem to be interested in a current encounter (the now) [as opposed
 to]  the longitudinal problem list (the ever).
 I think that is great as both a challenge and possible tool, as well as
 your thought on
  prioritization, eg enumeration from most important to least, as well as
 clumping

  I briefly discussed the first idea (acute vs. historical) with another
 physician (after you brought it up) and there was concurrence that such a
 feature would be extremely useful - if not completely necessary for any
 real clinical use of nlp.  I think that if temporal parsing ever becomes
 finite enough with respect to the time of an event relative to the time of
 the note (DocTimeRel) or with proper narrative containers, then this
 becomes a possible use case.  I mention this in a weak attempt to pull the
 nlpers into the discussion ...

   This is probably well known stuff
 Bad assumption ... insert emoticon here ...

  working back from the known natural history of diseases would possibly
 be a route to a solution.
 Now that is a challenge!

  Cheers for the inspiration and enthusiasm,
 Sean


  --
 *From:* John Green [john.travis.gr...@gmail.com]
 *Sent:* Monday, November 04, 2013 10:45 AM
 *To:* Finan, Sean

 *Subject:* RE: Sundry; Problem Lists

   Oh goodness no, I didn't think that at all! I'm so new to the field of
 NLP, anything and everything helps and is appreciated. Heck, I'm just now
 learning to understand Markov chains.

  An additional thought: after a thorough pubmed search, no one seems to
 have tried to build problem lists for ACUTE encounters, only as extensions
 to a past medical history. I think this would be a very fruitful avenue. It
 could easily be scored against a gold standard medical resident list for a
 few hundred patients across depth and acuity.

  Just thinkin out loud, bouncing ideas off those who know more than I

Re: Sundry

2013-10-31 Thread John Green

Pei and Tim - Good questions.

The bottom line is that OPQRST is the algorithm that every clinician uses
to characterize the history of a sign, symptom or constellation of
symptoms. Each letter has multiple meanings, but generally they're grouped.
O for onset, was it quick or slow in onset, P for palliative or provoking
phenomenon, that is, does tylenol make it better? Does it feel better when
you lean forward? Is it worse with standing? Q is the quality, generally,
though I could give more examples of each I'll keep it brief from here, R is
generally region or radiation of the pain and/or sign, S is the severity,
and T is the time course, is it intermittent? When it happens, how long
does it last for? I could send documents used to teach new clinicians to
better comprehend for anyone interested.
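
As a rough illustration of what a structured OPQRST record might look like to a program, here is a minimal Python sketch. It is not part of cTakes; the `OPQRST` class, its field names, and the `missing` helper are all hypothetical, chosen only to mirror the letters described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OPQRST:
    """Hypothetical container for one symptom's history; not a cTakes type."""
    symptom: str
    onset: Optional[str] = None                            # O: quick or slow
    palliating: List[str] = field(default_factory=list)    # P: what makes it better
    provoking: List[str] = field(default_factory=list)     # P: what makes it worse
    quality: Optional[str] = None                          # Q
    region_radiation: Optional[str] = None                 # R
    severity: Optional[str] = None                         # S
    time_course: Optional[str] = None                      # T

    def missing(self) -> List[str]:
        """Return the letters of the mnemonic that are still unanswered."""
        checks = [
            ("O", self.onset),
            ("P", self.palliating or self.provoking),
            ("Q", self.quality),
            ("R", self.region_radiation),
            ("S", self.severity),
            ("T", self.time_course),
        ]
        return [letter for letter, value in checks if not value]


# Filled in from the kinds of answers mentioned above.
pain = OPQRST(
    symptom="pain",
    palliating=["tylenol", "leaning forward"],
    provoking=["standing"],
    time_course="intermittent",
)
print(pain.missing())
```

A `missing` check like this reflects the "keep it algorithmic" advice: it flags which parts of the history a note has not yet covered.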

OPQRST, while most residents would assume it is only for teaching new
clinicians, as Tim said, is a useful tool at all levels. Great clinicians,
and I work with some great senior folks, use this everyday. The idea that
it is only for teaching is founded on two things: one, that it doubles as a
structured mnemonic for characterizing signs and symptoms and two, that
everyone so far ingrains this into their clinical skill set, unless they
are geared toward teaching, they, after the basic level, never think about
it again! Caveat: many good clinicians will tell you to keep it algorithmic
so that you're systematic and do not overlook details.

What is its application to ML? Obviously the furthest desired end-state
for NLP like cTakes would be understanding a clinical encounter to such a
nuanced level that detailed diagnoses could be considered along with
treatment plans. While I only know what I've read in Artificial
Intelligence: A Modern Approach and picked up from friends over the years
who were knowledgeable in this field, I feel that OPQRST would be a
huge benefit toward beginning to outline the problem of more rigorous ML
characterization of the clinical narrative.

The utility of OPQRST may not still be entirely clear to those who have
never been presented with a clinical encounter. Let me try one more stab:
Take the classic example of chest pain. A man comes to the ER with chest
pain. Is the onset quick? Yes doc, it was all of a sudden. This might
support a diagnosis of, say, MI, aortic dissection, pulmonary embolism, but
less likely someone would call GERD sudden. Palliative or provoking
features? Well, when I take 8 antacids it gets better (GERD), or, When I
took my wife's nitroglycerine it got better for a little while (angina), or,
when I took my wife's nitroglycerine it did nothing (pericarditis?).
Quality: Is it stabbing? Ya doc, it's stabbing (less likely MI). Is it
crushing? Like an elephant on your chest? Ya doc, that's it. (more likely
MI), and so on.
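
To show how answers like these could be tallied mechanically, here is a toy Python sketch. It only restates the associations from the chest pain example above as a lookup table; the `EVIDENCE` table, the +1/-1 weights, and the `tally` function are assumptions made for illustration, not clinical guidance and not anything cTakes provides.

```python
from collections import Counter

# Each finding maps to (diagnosis, weight): +1 supports, -1 argues against.
# The pairs only restate the chest pain example; the weights are arbitrary.
EVIDENCE = {
    "sudden onset": [
        ("MI", 1),
        ("aortic dissection", 1),
        ("pulmonary embolism", 1),
        ("GERD", -1),
    ],
    "better with antacids": [("GERD", 1)],
    "better with nitroglycerine": [("angina", 1)],
    "no change with nitroglycerine": [("pericarditis", 1)],
    "stabbing quality": [("MI", -1)],
    "crushing quality": [("MI", 1)],
}


def tally(findings):
    """Sum the weights for every diagnosis touched by the given findings."""
    scores = Counter()
    for finding in findings:
        for diagnosis, weight in EVIDENCE.get(finding, []):
            scores[diagnosis] += weight
    return dict(scores)


scores = tally(["sudden onset", "crushing quality"])
# Highest score first, ties broken alphabetically.
print(sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])))
```

A real system would need far more than a lookup table, but even this shape makes clear that each OPQRST answer is a feature that shifts the differential.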

Now of course, cTakes could be used for a real life encounter like this
(middleware) at some point, but likely it would be taking a history and
proposing a diagnosis (middleware again Tim, yes). But the point is, the
first steps toward knowing what we're dealing with at the historical level
are centered around OPQRST, and it just occurred to me to ask what we
thought about the feasibility of something like that.

In retrospect, it may be too tough, but at some point it would need done,
just as much as a clinician must learn it.

One final point: problem lists. These are absolutely essential to any
clinician in making a diagnosis. Again, often times, they don't think about
it, but they use them. When writing the above it occurred to me: much of a
problem list definition may already be contained to varying degrees in
existing cTakes databases. It would be an interesting and worthwhile paper,
I think, to see how well cTakes-compiled problem lists matched medical
students', residents', and attending physicians' problem lists. If anyone is
interested in this line of thought, I would be interested in collaborating.
It would be very easy, and the data may actually already exist to compare.
Forgive me if it's already been done, but, if it hasn't, then it would go a
long way toward proving cTakes' efficacy in regards to high-order processes.
And if it hasn't been done and someone does it at a later date, please, send
me a link to the paper!
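
For what such a comparison might look like in practice, here is a minimal Python sketch that scores a system-generated problem list against a physician's list using precision, recall, and F1. The `score` function and both example lists are made up for illustration; they are not real cTakes output or real results, and a real study would first need to normalize terms (for example to UMLS concepts) before comparing.

```python
def score(system, gold):
    """Set-based precision, recall, and F1 of a system list against a gold list."""
    system, gold = set(system), set(gold)
    true_positives = len(system & gold)
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example data: a physician's list vs. a hypothetical system's list.
gold = {"cough", "tb exposure", "diabetes mellitus type 2", "coronary artery disease"}
system = {"cough", "diabetes mellitus type 2", "anxiety"}

precision, recall, f1 = score(system, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The same function could be run once per annotator level (student, resident, attending) to see which group the system agrees with most.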

JG


On Wed, Oct 30, 2013 at 10:08 AM, Tim Miller 
timothy.mil...@childrens.harvard.edu wrote:

 Thanks for bumping this Pei, it reminds me I meant to respond to it.

 The OPQRST does sound like a great ML project. At a glance I might think a
 sequence model over sentences (like a CRF) would be a good model.
 But I'm wondering what the end use case is? Is it for teaching OPQRST to
 new clinicians? Or maybe as a sort of middleware for other projects where
 it might be a useful feature? Without a physician's intuition I sometimes
 suffer from a failure of imagination on these things.

 Tim



 On 10/30/2013 09:59 AM, Chen, Pei wrote:

 Hi John,
 I was away for a little bit and finally got a chance to catch up on
 emails...

  2)

Re: Sundry

2013-10-31 Thread John Green

A follow up point to the previous email:

Yes Tim: middleware. This is moving beyond just documentation toward
diagnosis and action, which I suppose is a hidden assumption here - maybe
the cTakes community doesn't care at this point, based on history or
physical exam/labs. However, a common quote in medicine is that a good
clinician can diagnose 90% of the time based on history alone. This may be
an overstatement, however, it underscores its importance. I think OPQRST
shouldn't be overlooked as a simple paradigm for a starting point.

Now, some of this may be integrated into cTakes already. I must admit, I am
still in my infancy with understanding cTakes, UIMA, etc.

JG



On Thu, Oct 31, 2013 at 12:00 PM, John Green john.travis.gr...@gmail.com wrote:

 Pei and Tim - Good questions.

 The bottom line is that OPQRST is the algorithm that every clinician uses
 to characterize the history of a sign, symptom or constellation of
 symptoms. Each letter has multiple meanings, but generally they're grouped.
 O for onset, was it quick or slow in onset, P for palliative or provoking
 phenomenon, that is, does tylenol make it better? Does it feel better when
 you lean forward? Is it worse with standing? Q is the quality, generally,
 though I could give more examples of each I'll keep it brief from here, R is
 generally region or radiation of the pain and/or sign, S is the severity,
 and T is the time course, is it intermittent? When it happens, how long
 does it last for? I could send documents used to teach new clinicians to
 better comprehend for anyone interested.

 OPQRST, while most residents would assume it is only for teaching new
 clinicians, as Tim said, is a useful tool at all levels. Great clinicians,
 and I work with some great senior folks, use this everyday. The idea that
 it is only for teaching is founded on two things: one, that it doubles as a
 structured mnemonic for characterizing signs and symptoms and two, that
 everyone so far ingrains this into their clinical skill set, unless they
 are geared toward teaching, they, after the basic level, never think about
 it again! Caveat: many good clinicians will tell you to keep it algorithmic
 so that you're systematic and do not overlook details.

 What is its application to ML? Obviously the furthest desired end-state
 for NLP like cTakes would be understanding a clinical encounter to such a
 nuanced level that detailed diagnoses could be considered along with
 treatment plans. While I only know what I've read in Artificial
 Intelligence: A Modern Approach and picked up from friends over the years
 who were knowledgeable in this field, I feel that OPQRST would be a
 huge benefit toward beginning to outline the problem of more rigorous ML
 characterization of the clinical narrative.

 The utility of OPQRST may not still be entirely clear to those who have
 never been presented with a clinical encounter. Let me try one more stab:
 Take the classic example of chest pain. A man comes to the ER with chest
 pain. Is the onset quick? Yes doc, it was all of a sudden. This might
 support a diagnosis of, say, MI, aortic dissection, pulmonary embolism, but
 less likely someone would call GERD sudden. Palliative or provoking
 features? Well, when I take 8 antacids it gets better (GERD), or, When I
 took my wife's nitroglycerine it got better for a little while (angina), or,
 when I took my wife's nitroglycerine it did nothing (pericarditis?).
 Quality: Is it stabbing? Ya doc, it's stabbing (less likely MI). Is it
 crushing? Like an elephant on your chest? Ya doc, that's it. (more likely
 MI), and so on.

 Now of course, cTakes could be used for a real life encounter like this
 (middleware) at some point, but likely it would be taking a history and
 proposing a diagnosis (middleware again Tim, yes). But the point is, the
 first steps toward knowing what we're dealing with at the historical level
 are centered around OPQRST, and it just occurred to me to ask what we
 thought about the feasibility of something like that.

 In retrospect, it may be too tough, but at some point it would need done,
 just as much as a clinician must learn it.

 One final point: problem lists. These are absolutely essential to any
 clinician in making a diagnosis. Again, often times, they don't think about
 it, but they use them. When writing the above it occurred to me: much of a
 problem list definition may already be contained to varying degrees in
 existing cTakes databases. It would be an interesting and worthwhile paper,
 I think, to see how well cTakes-compiled problem lists matched medical
 students', residents', and attending physicians' problem lists. If anyone is
 interested in this line of thought, I would be interested in collaborating.
 It would be very easy, and the data may actually already exist to compare.
 Forgive me if it's already been done, but, if it hasn't, then it would go a
 long way toward proving cTakes' efficacy in regards to high-order processes.
 And if it hasn't

Re: cTAKES user interface

2013-10-31 Thread John Green

This is a very good point. I can attest as a new user, these hurdles
mentioned by Andy seem paramount to broader adoption. Just by-the-by I
loved the Linux analogy Andy. At least, to a greater or lesser degree, it
survived!

JG


On Tue, Oct 29, 2013 at 7:33 PM, andy mcmurry mcmurry.a...@gmail.com wrote:

 Richard:

 Groovy idea.
 Virtual Machine could benefit from that as well.

 *## IMHO: These are parallel tasks that I think are higher priority than
 new features.
 *It isn't as glamorous as playing Jeopardy, but deploy issues are keeping
 users out -- if I were to guess A LOT of the potential user base looks at
 ctakes and says wow that's amazing!

 then they look at how complex it will be to do their simple Hello Ctakes
 example and the 15 minutes of attention -- we are one walk to the soda
 machine away from being either embraced or forgotten. It really is that
 basic -- if it takes too long to get started, chances are new users -- won't.

 The documentation is nicely done and this is absolutely no criticism of
 that.
 In fact, I apologize for being so out of touch (was thesis writing ).

 It just feels like 1992 all over again and you're compiling linux just to
 find out you didn't tweak your VGA card settings right, and now you're stuck.
 But you still think linux rocks.

 2013 is here and linux now runs out of the box.
 The OS itself has changed less in features and more in ease of use.

 I know it's not Jeopardy but we gotta do it.
 *## I'm going to make a VM (Ubuntu 13.10) for myself and let everyone kick
 it around. **I strongly encourage the Groovy deploy targets as Richard
 suggested. *

 This really isn't an either | OR.
 We need to be able to have easier turn around on cTakes so we can get back
 to Jeopardy.

 *## Who agrees, and are there any counter proposals? *


 AndyMC






 On Tue, Oct 29, 2013 at 3:49 PM, Richard Eckart de Castilho
 r...@apache.org wrote:

  Maven allows to do marvelous things on the CLI, provided you throw in an
  additional component: Groovy.
 
  We did some amazing self-contained Groovy scripts with uimaFIT and DKPro
  Core which you might find interesting
 
http://code.google.com/p/dkpro-core-asl/wiki/DKProGroovyCookbook
 
  -- Richard
 
  On 29.10.2013, at 23:09, Miller, Timothy 
  timothy.mil...@childrens.harvard.edu wrote:
 
   I think this is also an area where Maven integration was a small step
  backwards (I greatly appreciate the steps forward it allowed). I used to
  run stuff from the command line and in scripts more often but it's slightly
  less straightforward setting up the classpath with maven -- before you
  could put a simple java -cp lib/*.jar class name in a script, now I'm not
  sure how to go about it using maven. I'm sure there's a way, but I am
  afraid of falling down the maven rabbit hole.
   Tim
  
  
   On Oct 29, 2013, at 5:53 PM, Chen, Pei wrote:
  
   +1
   Pan, the short answer is yes- it can be done in CLI.
   The problem is that most of us who are already familiar with the nitty
  gritty are probably doing this with some sort of custom scripts or solution.
   Cc' the dev group to get a fresh perspective; not sure what the easiest
  would be-- run the CPE via command line with default input/output
  directories or running a Driver Main Class as part of examples.
  
   --Pei

Re: Sundry; Problem Lists

2013-10-31 Thread John Green

Thanks! I will look at both. JG


On Thu, Oct 31, 2013 at 1:53 PM, Finan, Sean 
sean.fi...@childrens.harvard.edu wrote:

 I don't know if what I write below truly applies to the discussion, but
 here it is.

 much of a problem list definition may already be contained to varying
 degrees in existing cTakes databases.
 The UMLS does provide a problem list, but I haven't looked at it.
 http://www.nlm.nih.gov/research/umls/Snomed/core_subset.html

 This might be a paper of interest to you:
 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2655994/
 It discusses the use of nlp to create something like a problem list.

 Sean



 
 From: John Green [john.travis.gr...@gmail.com]
 Sent: Thursday, October 31, 2013 12:02 PM
 To: dev@ctakes.apache.org
 Subject: Re: Sundry

 Pei and Tim - Good questions.

 The bottom line is that OPQRST is the algorithm that every clinician uses
 to characterize the history of a sign, symptom or constellation of
 symptoms. Each letter has multiple meanings, but generally they're grouped.
 O for onset, was it quick or slow in onset, P for palliative or provoking
 phenomenon, that is, does tylenol make it better? Does it feel better when
 you lean forward? Is it worse with standing? Q is the quality, generally,
 though I could give more examples of each I'll keep it brief from here, R is
 generally region or radiation of the pain and/or sign, S is the severity,
 and T is the time course, is it intermittent? When it happens, how long
 does it last for? I could send documents used to teach new clinicians to
 better comprehend for anyone interested.

 OPQRST, while most residents would assume it is only for teaching new
 clinicians, as Tim said, is a useful tool at all levels. Great clinicians,
 and I work with some great senior folks, use this everyday. The idea that
 it is only for teaching is founded on two things: one, that it doubles as a
 structured mnemonic for characterizing signs and symptoms and two, that
 everyone so far ingrains this into their clinical skill set, unless they
 are geared toward teaching, they, after the basic level, never think about
 it again! Caveat: many good clinicians will tell you to keep it algorithmic
 so that you're systematic and do not overlook details.

 What is its application to ML? Obviously the furthest desired end-state
 for NLP like cTakes would be understanding a clinical encounter to such a
 nuanced level that detailed diagnoses could be considered along with
 treatment plans. While I only know what I've read in Artificial
 Intelligence: A Modern Approach and picked up from friends over the years
 who were knowledgeable in this field, I feel that OPQRST would be a
 huge benefit toward beginning to outline the problem of more rigorous ML
 characterization of the clinical narrative.

 The utility of OPQRST may still not be entirely clear to those who have
 never been presented with a clinical encounter. Let me take one more stab:
 Take the classic example of chest pain. A man comes to the ER with chest
 pain. Is the onset quick? "Yes doc, it was all of a sudden." This might
 support a diagnosis of, say, MI, aortic dissection, or pulmonary embolism,
 but it is less likely that someone would call GERD sudden. Palliative or
 provoking features? "Well, when I take 8 antacids it gets better" (GERD), or,
 "When I take my wife's nitroglycerin it gets better for a little while"
 (angina), or, "When I took my wife's nitroglycerin it did nothing"
 (pericarditis?). Quality: Is it stabbing? "Ya doc, it's stabbing" (less
 likely MI). Is it crushing? Like an elephant on your chest? "Ya doc, that's
 it" (more likely MI), and so on.

 Now of course, cTAKES could be used for a real-life encounter like this
 (middleware) at some point, but likely it would be taking a history and
 proposing a diagnosis (middleware again, Tim, yes). But the point is, the
 first steps toward knowing what we're dealing with at the historical level
 are centered around OPQRST, and it just occurred to me to ask what we
 thought about the feasibility of something like that.

 In retrospect, it may be too tough, but at some point it would need to be
 done, just as much as a clinician must learn it.

 One final point: problem lists. These are absolutely essential to any
 clinician in making a diagnosis. Again, oftentimes they don't think about
 it, but they use them. When writing the above it occurred to me: much of a
 problem list definition may already be contained to varying degrees in
 existing cTAKES databases. It would be an interesting and worthwhile paper,
 I think, to see how well cTAKES-compiled problem lists match the problem
 lists of medical students, residents, and attending physicians. If anyone is
 interested in this line of thought, I would be interested in collaborating.
 It would be very easy, and the data to compare may actually already exist.
 Forgive me if it's already been done, but if it hasn't, then it would go a
 long way toward proving cTAKES's efficacy.
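One way such a comparison might be scored is simple set overlap between a system-generated list and a clinician's list. This is a rough sketch only: the function name, the exact-string matching after lowercasing, and the sample problems are all assumptions made for illustration. A real study would more likely match on normalized concepts than on raw strings.

```python
from typing import Dict, Iterable


def compare_problem_lists(system: Iterable[str], reference: Iterable[str]) -> Dict[str, float]:
    """Score a system problem list against a clinician's list by set overlap."""
    # Normalize case and whitespace so "Chest pain" and "chest pain" match.
    sys_set = {p.strip().lower() for p in system}
    ref_set = {p.strip().lower() for p in reference}
    shared = sys_set & ref_set

    # Precision: share of system problems the clinician also listed.
    precision = len(shared) / len(sys_set) if sys_set else 0.0
    # Recall: share of clinician problems the system found.
    recall = len(shared) / len(ref_set) if ref_set else 0.0
    # F1: harmonic mean of the two, 0 when nothing overlaps.
    f1 = 2 * precision * recall / (precision + recall) if shared else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up lists, purely for illustration.
system_list = ["Hypertension", "Chest pain", "GERD"]
attending_list = ["chest pain", "hypertension", "type 2 diabetes", "hyperlipidemia"]
scores = compare_problem_lists(system_list, attending_list)
print(scores)
```

Running the same function against student, resident, and attending lists would give three comparable sets of scores. Note that it says nothing about ordering, only about membership.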

RE: Sundry; Problem Lists

2013-10-31 Thread John Green

Sean - quick note: after looking at the above two resources, a couple of 
points. The first resource confirms what I expected: that the vocabulary 
exists in cTAKES. The second confirms what I suspected: that novel approaches 
to ordering and identifying the top members of a problem list are needed. 
Namely, the vocabulary may be there, but that's only a tenth of the battle. 
The second great resource you sent me acknowledges this: prioritization 
(e.g., enumeration from most important to least), as well as clumping, is the 
true battle.




A point of clarification on my end: it would be interesting to see what could 
be added on top of existing cTAKES in order to facilitate a solution to the 
second problem - clumping and prioritizing. (For instance, from the second 
article, an acute process may have nothing to do with the past medical history, 
and if an algorithm treated all members as equals, it would miss 
the issue at hand.) 
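To make "clumping and prioritizing" concrete, here is a small sketch under stated assumptions. The `Problem` fields, the use of organ system as the clumping key, and the rule that a clump containing an acute problem goes first are all hypothetical choices for illustration, not anything cTAKES provides.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Tuple


@dataclass(frozen=True)
class Problem:
    name: str
    system: str   # hypothetical clumping key, e.g. organ system
    acute: bool   # True if it belongs to the current encounter


def clump_and_prioritize(problems: List[Problem]) -> List[Tuple[str, List[Problem]]]:
    """Group problems by system, then put clumps with an acute member first."""
    # Sort so groupby sees each system contiguously, acute members leading.
    ordered = sorted(problems, key=lambda p: (p.system, not p.acute, p.name))
    clumps = [(system, list(group))
              for system, group in groupby(ordered, key=lambda p: p.system)]
    # Clumps that contain an acute problem outrank purely historical ones.
    clumps.sort(key=lambda c: (not any(p.acute for p in c[1]), c[0]))
    return clumps


# Made-up patient: one acute problem unrelated to most of the past history.
problems = [
    Problem("hypertension", "cardiovascular", acute=False),
    Problem("appendicitis", "gastrointestinal", acute=True),
    Problem("GERD", "gastrointestinal", acute=False),
    Problem("osteoarthritis", "musculoskeletal", acute=False),
]
result = clump_and_prioritize(problems)
for system, members in result:
    print(system, [p.name for p in members])
```

Treating every entry as an equal would leave the acute problem buried alphabetically. Here it leads its clump, and its clump leads the list.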




Just as a thought: working back from the known natural history of diseases 
would possibly be a route to a solution.




This is probably well-known stuff, so please forgive my ignorance if it's all 
been done/thought of before.




Again, the two links were very helpful, thank you.




Jg


On Thu, Oct 31, 2013 at 2:04 PM, Finan, Sean
sean.fi...@childrens.harvard.edu wrote:

 I don't know if what I write below truly applies to the discussion, but here 
 it is.

 > much of a problem list definition may already be contained to varying degrees
 > in existing cTAKES databases.

 The UMLS does provide a problem list, but I haven't looked at it.
 http://www.nlm.nih.gov/research/umls/Snomed/core_subset.html

 This might be a paper of interest to you:
 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2655994/
 It discusses the use of NLP to create something like a problem list.

 Sean
 

RE: Sundry; Problem Lists

2013-10-31 Thread John Green

Last point: I seem to be interested in the current encounter (the now) and 
its diagnosis, while the article seems to be interested in an arguably just 
as useful tool, the longitudinal problem list (the ever), though I would 
think the approach is very different. 




Thoughts? 

Jg









Re: ctakes-examples project?

2013-08-22 Thread John Green

@Guergana: Glad to be getting involved in the project ma'am!

JG


On Wed, Aug 21, 2013 at 12:59 PM, Savova, Guergana 
guergana.sav...@childrens.harvard.edu wrote:

 We will look at the metadata info. Some of it is critical for the
 annotations (e.g. docTime).
 Thank you, John.
 --Guergana

 -----Original Message-----
 From: John Green [mailto:john.travis.gr...@gmail.com]
 Sent: Wednesday, August 21, 2013 12:56 PM
 To: dev@ctakes.apache.org
 Subject: Re: ctakes-examples project?

 So far I have about 15 notes done. I'm submitting them slowly because, after
 I said they were done, I decided one last review of each for gross errors and
 completeness would be in order. I'm slowly working through the proofread of
 each now. Just FYI if anyone was wondering.

 I don't know what the coders can do with the metadata I included at the
 top of each note. I thought it would be useful, however, to attempt to
 describe the data with these additional metrics. Maybe they are already
 included as vectors in the gold-standard annotation.

 JG


 On Wed, Aug 21, 2013 at 12:00 PM, Pei Chen chen...@apache.org wrote:

  John is creating example clinical notes [1]...  and I believe Guergana's
  group and co. will create gold standard annotations for them (as
  training examples)?
 
  Where would be a good home for something like this?
 
  What do folks think about creating a separate project called
  ctakes-examples?  An alternative might be to put it in one of the test
  projects?
  [1] https://issues.apache.org/jira/browse/CTAKES-223
 
  --Pei

Examples

2013-08-17 Thread John Green

Just got some free time. I have a number of example free-text notes, per
previous discussions, to upload. They're good quality but not annotated. Do I
need someone to commit for me?

Thanks,
J Green

Re: Next cTAKES release (3.1)?

2013-07-02 Thread John Green

Hi all,

I've been following this mailing list for a couple of months. I'm a third-year 
medical student rounding the bend toward my MD. I used to be a computer 
programmer, however, and continue my own projects. I'm very interested in 
eventually contributing to cTAKES development. In the meantime, given the 
current talk of examples, if any domain-specific examples need to be generated, 
I am knowledgeable enough in the domain that I could pound out a few free-text 
notes made to order.

Let me know. You all may already have docs on hand willing to do this, but if 
not...

John Green


On Jun 28, 2013, at 8:59, Chen, Pei pei.c...@childrens.harvard.edu wrote:

 I completely agree with making cTAKES easier to use.  I think it is exciting 
 to hear the different use cases here and to understand where some of the 
 areas that need improvement are (which we haven't thought about earlier).
 I think Tim's suggestions and the 3 concrete actionable items make a lot of 
 sense.  Hopefully it should attract new users, adopters, and perhaps more 
 committers.
 
 i) Make the typesystem forefront in documentation -- generate javadocs and
 have as a link on the ctakes frontpage/sidebar
 ii) Similar to the way that we are aiming to have tests in every module, also
 have clearly labeled examples in every module that set up a pipeline, run on
 sample notes (could be the same sample notes from the tests), and do
 something with the results.
 iii) Follow Giri's recommendation to have example training data for people
 who want to take the next step and train their own models
 
 I think Java developers are accustomed to including a library as a 
 dependency/jar, having an API to pass input, and getting the results via 
 POJOs, so the examples could initially shield the complexity of wiring a 
 pipeline together etc.  
 If we can improve the APIs and how it gets integrated with other apps, we 
 can add any GUI/CLI tools on top of this afterwards.
 
 --Pei
 
 -----Original Message-----
 From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
 Sent: Friday, June 28, 2013 8:00 AM
 To: dev@ctakes.apache.org
 Subject: Re: Next cTAKES release (3.1)?
 
 Very interesting discussion. I think Giri is right about giving example
 training data in the format that our training code can read. While our
 ultimate goal would be to build and release models that are completely
 domain-independent, in the real world it is almost always better to use some
 domain-specific data, and we should think more about how to facilitate that.
 
 As for making it easier to get started, it is not totally clear to me what
 this means or how to do it, so it might be useful to get specific. I think
 our biggest hurdle is
 
 1) Prerequisite of understanding UIMA/UIMAFit
 
 Since UIMAFit is officially becoming part of UIMA, that will be easier, and
 hopefully people will just learn the easier (in my opinion) UIMAFit way
 rather than the standard UIMA way of doing things. Is there something we can
 be doing to make understanding UIMA easier? Or do we just need to say upfront
 that this is a prerequisite and hope that people don't give up due to this
 thing that is out of our control?
 
 Another hurdle is:
 
 2) cTAKES is a multi-purpose developer-aimed tool
 
 So it's not just a matter of hiding complexity -- at some point people have
 to understand their problem, understand cTAKES' capabilities, and start
 coding.
 Pei's GUI will help for some common use cases but will not remove the
 requirement that someone at the organization knows cTAKES.
 I think one part of this problem is the fact that the typesystem is not well
 documented. A developer needs to know what the output is (objects from
 the typesystem), how to get them (which modules/pipelines), and what
 information is in them. So maybe on this end my recommendation would be:
 i) Make the typesystem forefront in documentation -- generate javadocs and
 have as a link on the ctakes frontpage/sidebar
 ii) Similar to the way that we are aiming to have tests in every module, also
 have clearly labeled examples in every module that set up a pipeline, run on
 sample notes (could be the same sample notes from the tests), and do
 something with the results.
 iii) Follow Giri's recommendation to have example training data for people
 who want to take the next step and train their own models
 
 This is quite a bit of developer overhead, so it's worth asking whether you
 agree with my diagnosis and treatment or whether you think there are
 different problems/solutions that should be higher priority.
 
 Tim
 
 On 06/27/2013 10:59 PM, Girivaraprasad Nambari wrote:
 Hi Vijay and Andy,
 
 Thanks for sharing those examples.
 
 Trouble is, privacy requires that these examples be made up by hand
 
 Agree with this statement, and this is a very valid concern.
 
 In getting-started examples, I think we should just have a couple of
 entries (5-10 small entries), not more than that (with explicit
 statement like
