RE: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL]

2024-05-15 Thread Savova, Guergana
+1 from me.
--Guergana

-Original Message-
From: Finan, Sean  
Sent: Wednesday, May 15, 2024 1:32 PM
To: dev@ctakes.apache.org
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL]

* External Email - Caution *


Thanks Tim!


From: Miller, Timothy 
Sent: Wednesday, May 15, 2024 11:38 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL]


Thanks Sean,
I was able to get it working – definitely a user/documentation issue and not an 
issue with the code. Looks like a great release. I’m happy to vote for release 
+1.
Tim


From: Finan, Sean 
Date: Tuesday, May 14, 2024 at 10:35 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL]


Ah - are you just running the class within IntelliJ?  If so, you need to set 
the classpath in the run configuration to be ctakes-examples.  Otherwise the 
classpath doesn't contain anything from modules outside ctakes-gui and 
ctakes-core.

Alternatively, run the Maven compile step with the "runPiperGui" profile 
selected.  That will also run the piper file submitter GUI with the correct 
classpath.

Using a binary build, after running bin/getUmlsDictionary, running 
bin/runPiperSubmitter also works.

I don't want to do it for 5.1.0, but I should make the names of the class, 
profile and script match.

I will check the wiki instructions and make sure that -exact- details are in 
there.

Sean


From: Miller, Timothy 
Sent: Tuesday, May 14, 2024 12:55 PM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL]


I can check out and build successfully with mvn from the command line. I can 
successfully open it in IntelliJ and run the piper file submitter. I get an 
error trying to run the default fast pipeline piper file:

Loading Piper File DefaultTokenizerPipeline ...


Error: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle 
java.util.PropertyResourceBundle, key No Analysis Component found for 
ContextDependentTokenizerAnnotator



It doesn’t seem to be able to find the ContextDependentTokenizerAnnotator.

Tim



From: Miller, Timothy 
Date: Tuesday, May 14, 2024 at 9:25 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL]


What would you recommend for testing? Download the release tag to a clean 
system and try to do mvn compile and run some tests?
Tim


From: Finan, Sean 
Date: Thursday, May 2, 2024 at 6:57 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL]


Hi Gandhi,


  *   So post release I would be able to run mvn clean install on web rest 
module rather than relying on resources in .m2 folder

The opposite.  Pre-release there are no jars on Maven Central; post-release 
there are.
Running 'mvn package' directly on the ctakes-web-rest project (in its 
directory), or running 'mvn package' on the main ctakes project (in the main 
ctakes root directory) with the web-rest-build profile enabled 
('-Pweb-rest-build'), will build the ctakes-web-rest.war web package.
That profile is defined in the main ctakes pom.

RE: It is Official! Steps toward a cTAKES 5.0 release. [EXTERNAL]

2023-03-08 Thread Savova, Guergana
+1 on waiting.
--Guergana

From: Bethard, Steven - (bethard) 
Sent: Wednesday, March 8, 2023 1:19 PM
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: Re: It is Official! Steps toward a cTAKES 5.0 release. [EXTERNAL]


+1 on waiting for the new distribution platform (and continuing to search for a 
primary Release Manager).

On 3/8/23, 06:29, "Finan, Sean" wrote:


Hi all Apache cTAKES developers and users,

I have news on the release front ...

The Apache Infrastructure team is working on a new Artifact Distribution 
Platform.  It will be used to upload and promote release artifacts, sign keys, 
and host distributions in a fashion that is informative and attractive to a 
user.

Some of the old/current items that are part of an Apache project release are 
going to be "legacy" and there are some new metadata items that go with a 
release artifact.

I see two paths moving forward:


  1.   We push on with a release of cTAKES 5.0 and release in the current style.
  2.   We wait a couple of months until the Apache Infrastructure team has the 
new Artifact Distribution Platform ready and use it to release.

For #1 please keep in mind that we still haven't had a volunteer for the 
primary Release Manager.  Gandhi Rajan has volunteered to be co-RM but it will 
be a two-person job.

Either way, we can create Release Candidate source branches on GitHub to be 
tested and have issues posted on the cTAKES GitHub issues list.

This manner of Release Candidate testing would be a deviation from the method 
of creating Release Candidate artifacts including binary installations and 
putting them in a Subversion (svn) repository online.
We can probably place "binary installation" artifacts on GitHub, but somebody 
will need to check on space limits and other rules before we can make any 
promises there.  If there is some barrier there, then testers would need to 
test binary installations by building and packaging locally on their system - 
which is a good thing to have tested anyway.

So, please post any thoughts or questions in reply to this email and we can try 
to figure out where to go from here.

Many thanks,

Sean


From: Finan, Sean <sean.fi...@childrens.harvard.edu.INVALID>
Sent: Monday, February 20, 2023 5:12 PM
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: It is Official! Steps toward a cTAKES 5.0 release. [EXTERNAL]


Hi all,

The cTAKES Project Management Committee has voted that it is time to officially 
begin the release process for cTAKES 5.0.

It has been almost 6 years since version 4.0.0 was released, and with a 
worldwide user count estimated in the thousands, a new release will be 
extremely valuable.

Releasing cTAKES 5.0 will involve some work, and the project needs volunteers 
to assist in the process.

The most important thing right now is the appointment of a Release Manager (RM).
While the position is not to be taken lightly and does involve work, it can be 
a great experience (and a resume builder).

We need a cTAKES committer to be the RM, but I am going to split the general 
responsibilities below.
I am doing this because I believe that any user familiar with cTAKES can be a 
co-RM.

Requiring a committer:
1.  Creating Release Candidates of the code.
2.  Deploying and Signing the actual Official Release.

Not requiring a committer:
1.  Coordinating people performing documentation, testing and bug fixing.
2.  Communicating progress with the developer list.

I am sure that I am forgetting something, but those are the 4 tasks that I can 
think of right now.

If you would like to be the Release Manager (or a co-RM), please volunteer on 
the dev@ctakes.apache.org mailing list.

Other tasks that must be performed for a release include:
1.  Testing the release candidates.
2.  Contributing documentation.
3.  Writing fixes for bugs that can be fixed for the release.
4.  Updating the release information on ctakes.apache.org.

Anybody can test release candidates.  There are countless pipelines that can be 
built and tested, but I think that we should try to cover the 'most commonly 
used' pipelines.  If you run any pipeline, please report success - even if you 
don't run it specifically for release testing.
Documentation can be contributed by any user.  A cTAKES committer is required 
to actually push the documentation to the wiki, readme, release notes, etc. 
Sending out markdown, images, plain text or just recommendations is open to all 
users.
While only committers can actually push changes to cTAKES code, any user can 
contribute fixes by creating code patches or even just copy-pasting code in an 
email.
Updating the ctakes.apache.org website will require a committer, but 
non-committer assistance is possible 

RE: Apache cTAKES is now on GitHub ! [EXTERNAL]

2023-01-03 Thread Savova, Guergana
Fantastic development, thank you very much for making this happen, Sean! 

Happy New Year to all.
--
Guergana Savova, PhD, FACMI
Patricia F. Brennan Professor
Computational Health Informatics Program (CHIP)
Boston Children's Hospital and Harvard Medical School


-Original Message-
From: Finan, Sean  
Sent: Friday, December 30, 2022 1:49 PM
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: Apache cTAKES is now on GitHub ! [EXTERNAL]


Hi all,

I am pleased to announce that the cTAKES source code is now on GitHub at 
https://github.com/apache/ctakes


All current and future code development should be performed on the source in 
GitHub.


   Changes ( vs. Subversion Repository )
   =====================================

  *   VERSION:   The project in GitHub has been versioned 5.0.0-SNAPSHOT.
  *   STRUCTURE:   The project has been slightly restructured at a high level.  
The typical user should not notice the difference.
  *   CODE API:   All package, class, method and constant names remain the 
same, so your code should not need to be refactored.
  *   DEPENDENCIES:   If you include cTAKES modules as dependencies in your 
maven project, you can simply change the version to obtain new 5.0.0-SNAPSHOT 
builds. *
  *   BINARY PACKAGE:   The binary package has some minor differences, but the 
typical user should not notice them.

* If you use maven dependency exclusions for resource ('-res') modules because 
of unwanted ML models, you need to change the excluded name extension from 
'-res' to '-model'.
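
For anyone with many exclusions to update, the '-res' to '-model' rename described above could be scripted. The following is a minimal Python sketch, not part of cTAKES: the function name rename_res_exclusions and the sample artifact names are made up for illustration, and it assumes the exclusions are written as plain <exclusion> elements in the pom text.

```python
import re

# An <exclusion> ... </exclusion> block (never matches the plural <exclusions> wrapper).
_EXCLUSION = re.compile(r"<exclusion>.*?</exclusion>", re.DOTALL)
# An <artifactId> whose value ends in "-res".
_RES_ID = re.compile(r"(<artifactId>\s*[\w.-]+?)-res(\s*</artifactId>)")


def rename_res_exclusions(pom_text: str) -> str:
    """Rename '-res' artifactIds to '-model', but only inside <exclusion> blocks."""
    def fix(match):
        return _RES_ID.sub(r"\1-model\2", match.group(0))
    return _EXCLUSION.sub(fix, pom_text)


# Hypothetical dependency block; the artifact names are illustrative only.
SAMPLE = """\
<dependency>
  <groupId>org.apache.ctakes</groupId>
  <artifactId>ctakes-temporal</artifactId>
  <version>5.0.0-SNAPSHOT</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.ctakes</groupId>
      <artifactId>ctakes-temporal-res</artifactId>
    </exclusion>
  </exclusions>
</dependency>
"""

if __name__ == "__main__":
    print(rename_res_exclusions(SAMPLE))
```

Restricting the rewrite to <exclusion> blocks is a deliberate choice in this sketch, so that any artifactId outside an exclusion is left exactly as written. Check the actual module names against the 5.0.0-SNAPSHOT builds before relying on the result.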


   Moving forward from the Subversion Repository
   =============================================

  *   VERSION:   The project in the SVN repository was versioned 4.0.1-SNAPSHOT.
  *   DEPRECATION:   The code and resources in the 4.0.1-SNAPSHOT Subversion 
(SVN) repository will remain available for checkout, but should be considered 
read-only.  4.0.1-SNAPSHOT built modules will remain available for maven 
dependencies.  All current and future code development should be performed on 
the source in GitHub.
  *   RELEASE:   There is no cTAKES 4.0.1 release.

   Next Anticipated Release
   ========================

  *   VERSION:   As you might guess from the snapshot version change, we are 
gearing up for a version 5.0.0 release.
  *   WHY 5.0.0:   There are so many new features over cTAKES 4.0.0, including 
completely new modules, that the version number was bumped up.
  *   DOCUMENTATION:   All of the new toys will be documented in the confluence 
wiki at the time of the 5.0.0 release.
  *   DATE:   There is no release date yet, but hopefully it will be very very 
soon ...

Happy New Year,

Sean




RE: Apache cTAKES 4.0.0.1 : UMLS Authentication Patch [EXTERNAL]

2021-01-21 Thread Savova, Guergana
+1
Amazing effort by the community led by Sean and Peter, thank you!
--Guergana

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] 
Sent: Thursday, January 21, 2021 7:16 AM
To: dev@ctakes.apache.org
Subject: Re: Apache cTAKES 4.0.0.1 : UMLS Authentication Patch [EXTERNAL]


Seconded, thanks a lot Sean and Peter for getting this working and turned 
around so quickly! 
Tim

On Wed, 2021-01-20 at 23:13 +0100, Peter Abramowitsch wrote:
> 
> 
> Thanks Sean!
> 
> Peter
> 
> On Wed, Jan 20, 2021 at 4:25 PM Finan, Sean < 
> sean.fi...@childrens.harvard.edu> wrote:
> 
> > As some have experienced, the U.S.A. National Library of Medicine 
> > (NLM) has changed the authentication method for using the Unified 
> > Medical Language System (UMLS).
> > 
> > https://www.nlm.nih.gov/research/umls/index.html
> > 
> > 
> > Though a bit late in its arrival, Apache cTAKES now has a patch 
> > release that supports the new UMLS authentication method.
> > 
> > 
> > The release number is 4.0.0.1, an update of the previous release 
> > version 4.0.0 with a single change to enable the new UMLS authentication.
> > 
> > No other code or functionality has been modified and there are no 
> > enhancements to the previous release 4.0.0.
> > 
> > 
> > There are instructions for use on the Apache cTAKES wiki.
> > 
> > https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0.0.1
> > 
> > 
> > The source code is available in the 4.0.0.1 tag of the Subversion (svn) 
> > repository.
> > 
> > https://svn.apache.org/repos/asf/ctakes/tags/ctakes-4.0.0.1/
> > 
> > 
> > The jar and pom files are available from Maven Central, and any 
> > applications utilizing Apache cTAKES as an Apache Maven dependency 
> > should update their pom files.
> > 
> > https://search.maven.org/search?q=ctakes
> > 
> > 
> > At this time the Apache infra script that points mirror download 
> > servers to the pre-built zip/archive files has not run.  I hope that 
> > the mirror servers are updated in a day or two.
> > 
> > When the mirror servers are updated, the buttons on the "Downloads" 
> > page of ctakes.apache.org should trigger a download of the patch version.  
> > Until then you will get a "page not found" error.
> > 
> > Until the pre-built archive downloads are available through the 
> > website, you can find them in the release repository.
> > 
> > 
> > https://repository.apache.org/content/repositories/releases/org/apache/ctakes/ctakes-core/4.0.0.1/
> > 
> > 
> > For more information please visit the wiki page on the Apache cTAKES
> > 4.0.0.1 patch release.
> > 
> > https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0.0.1
> > 
> > 
> > 
> > A very special thanks goes to Peter Abramowitsch for conception and 
> > original implementation of the authentication code and workflow.
> > 
> > 
> > Many thanks to those who boldly tested, documented and otherwise 
> > made this patch and its trunk equivalent possible, including
> > 
> > Kean Kaufmann
> > 
> > Gandhi Rajan
> > 
> > Eugenia Monogyiou
> > 
> > Timothy Miller
> > 
> > and anybody else that I have forgotten (apologies).
> > 
> > 
> > And for those of you who gave me a bit of prodding to get this 
> > wrapped up and published ... in the end I am grateful and you have 
> 

RE: Current thinking on new UMLS authentication [EXTERNAL]

2020-09-18 Thread Savova, Guergana
I have not received that email either. Could you share it with us?
--Guergana

Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Computational Health Informatics Program (CHIP)
Boston Children's Hospital and Harvard Medical School
401 Park, 5th floor East, 5523.3
Boston, MA 02215


-Original Message-
From: Greg Silverman [mailto:g...@umn.edu.INVALID] 
Sent: Friday, September 18, 2020 1:46 PM
To: dev@ctakes.apache.org
Subject: Re: Current thinking on new UMLS authentication [EXTERNAL]



I never received the email you mentioned.

I assume this will affect the API call to NLM for UMLS validation? If it does, 
why not take the NLM's model for UMLS and only require UMLS credentials at the 
time of download?

Greg--



On Fri, Sep 18, 2020 at 12:33 PM Peter Abramowitsch 
wrote:

> Hi All
>
> Probably all of you have received an email from Patrick McLaughlin at 
> the NLM regarding upcoming changes to the UMLS authentication methods they 
> are going to support and to retire.  This will have implications for all 
> cTAKES users in different ways depending on how cTAKES is implemented in 
> your community.  To me, there were some ambiguities in his email regarding 
> usage situations as a registered content provider that needed to be 
> spelled out.
>
> I was wondering if any of you have had further conversations with him 
> which might clarify whether, for instance,  users within a registered 
> content provider installation would still need to be individually 
> authenticated.
> Or on any other authentication scenario.
>
> I'm trying to contact him or his team at the moment to ask about our 
> particular architecture.
>
> Regards,  Peter
>


--
Greg M. Silverman
Senior Systems Developer
NLP/IE 
 Department of Surgery University of Minnesota g...@umn.edu


RE: ApacheCon 2020 [EXTERNAL]

2020-07-07 Thread Savova, Guergana
A fantastic set of presentations, will be of broad interest to the Apache 
community!
Amazing work, cTAKES community!
Stay safe and healthy all,
--
Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Computational Health Informatics Program (CHIP)
Boston Children's Hospital and Harvard Medical School
401 Park, 5th floor East, 5523.3
Boston, MA 02215
Tel: (617) 919-2972
Fax: (617) 730-0817


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Monday, July 6, 2020 9:21 AM
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: Fw: ApacheCon 2020 [EXTERNAL]


I can't believe that I forgot to mention ...


There will also be a presentation (maybe two?) by a group that has adapted 
ctakes to work with two other languages.  They have also integrated ctakes with 
other tools such as FreeLing and HeidelTime.  So cool ...


Cheers,

Sean



From: Finan, Sean
Sent: Monday, July 6, 2020 9:08 AM
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: ApacheCon 2020


Hi all,


The ctakes representation at ApacheCon 2020 is looking good!


ApacheCon 2020 runs September 29 through October 1.

Submission runs through Sunday, July 12.  Technically it is 8:00 a.m. Eastern 
time Monday, but please don't procrastinate.

Registration is free.


I am excited to announce that we have three groups interested in giving 
presentations on their configuration and use of ctakes at a large scale!

We also have a presentation on the installation of the ctakes Rest service 
using the ctakes-rest module!


Knowledge on these topics is always extremely valuable to our users, and I for 
one really want to see how sites use ctakes when given different resources, 
requirements and restrictions.  Because of that, I am trying to put together 
(technology allowing) a roundtable discussion with those presenters.  That 
should be of value to every user no matter what your situation.


We still need more presentations!  To encourage you, here is a little 
information:


1.  What you do is interesting!  If you think that nobody out there cares about 
what you've done and how, then you probably aren't fully aware of how large and 
diverse our user base really is.  People want to know about things like 
integration, customization, clinical specialty application, augmentation and 
favorite capability fascination.

2.  Submission is very simple.  This is not like a scientific conference that 
requires a complete paper describing your work.  You only need to submit a 
blurb that loosely covers your topic and major talking point(s).  Half a dozen 
sentences will suffice.  In fact, what I sent last week (far below) could pass 
muster for a submission.  Go for something that will be on a brochure / 
schedule.

3.  The audience is made up of people just like you.  Developers, 
Bioinformaticians, IT Specialists, Students, Medical Researchers, AI Explorers 
and far more Hackers than Rock Stars.

4.  Slick presentation skills are not necessary.  Don't worry if you have never 
spoken to a room full of listeners.  Don't worry if English isn't your first 
language.  Don't worry if your slides are "sloppy".  Your presentation will not 
be graded.

5.  You don't need to prepare your whole talk before submitting.  Idea now, 
details later.

6.  Registration is FREE.


Right now the speaking time is anything up to 50 minutes.  If you don't want to 
present a full 50 minutes then that is ok ... The rest can be filled with extra 
question/answer or somebody else may fill the remaining time with a 
presentation on a similar topic.


I am going to put together a lightning round.  If you think that you can cover 
some material in five to fifteen minutes then this is for you!  Lightning 
rounds can be fun as you can make an impact with two or three slides and barely 
enough speaking to run out of breath.  This is really a free-for-all.  You can 
pack the time with data, give a short demonstration, compare using ctakes to 
breaking a mustang, or even do some on-topic (ctakes, nlp, AI, bioinformatics) 
stand up.  Anything goes.  This was an interesting (full) talk last year: 
https://aceu19.apachecon.com/session/confessions-middle-aged-coder-turned-gravel-grinder
If you want to be in the lightning round, just write me a couple of 
sentences on your strike and I will put together the full submission for 
ApacheCon.  Does it get any easier?


I will present one or two things, but to maximize impact I would like to know 
what most interests / would help all of you.  So, please write me a topic or 
two that would best apply to your work.


Some links ...



RE: ApacheCon 2020 and cTAKES

2020-06-29 Thread Savova, Guergana
Hi Sean,

Thank you for bringing ApacheCon to the attention of cTAKES-ers!

In my opinion, your list of ideas for presentations/videos catches topics of 
high interest in our community that we have seen many discussions on in the 
cTAKES lists. Thank you for volunteering to be the point of contact!

It is a short two-week timeline, but we as a community can pull it off.

Looking forward to engaging discussions on the list. I am including the user 
list as well as there are many there who might be interested.

--
Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Computational Health Informatics Program (CHIP)
Boston Children's Hospital and Harvard Medical School
401 Park, 5th floor East, 5523.3
Boston, MA 02215
Tel: (617) 919-2972
Fax: (617) 730-0817
guergana.sav...@childrens.harvard.edu


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Monday, June 29, 2020 11:02 AM
To: dev@ctakes.apache.org
Subject: ApacheCon 2020 [EXTERNAL]


Hi all,


General admission to ApacheCon 2020 is free:  
https://hopin.to/events/apachecon-home


I think that price of admission and travel costs have held back ctakes users 
from attending past conferences, and lack of a sizable audience has diminished 
the comparative value of ctakes presentations in the eyes of ApacheCon 
planners.  Because of the "at home" nature of this year's conference, an app 
with smaller presence and less hip buzz has a better chance of grabbing some 
time on the schedule.


The predetermined tracks are still an ill fit when it comes to the nature of 
ctakes.  
https://apachecon.com/acah2020/cfp.html

However, I think that we can still use this opportunity to deliver some 
powerful introduction and training videos, as well as user stories and clinical 
project application.  Perhaps we can argue for a NLP track and do some 
coordination with projects like OpenNLP and UIMA.


There are a scant two weeks to come up with presentations, and less time to 
propose a track/topic.  The call for presentations ends July 13th.  That is a 
deadline that requires immediate attention by anybody who wants to show off 
their project or expertise.


Apache wants to have a single point of contact for each project, and I am 
volunteering to be that person for ctakes.   I am volunteering, not laying 
claim, so if you think that you are a better fit for the position please let me 
know.


I have written some ideas for presentations below.  If you want to take one 
(modify as you like) then please write me and post to the devlist.  If you have 
ideas for another presentation topic, please let me and the devlist know - even 
if you aren't volunteering to do the presentation yourself perhaps somebody 
else will.  Again ... two weeks.


Thank you,

Sean



*  The following talk ideas are by and large directed toward training.  That 
does not mean that topics should stay within that scope.


=


Customizing cTAKES: First Principles

Built using Apache UIMA, cTAKES is modular and extensible.  Why is it 
frequently treated as a black box?  Is it lack of need, sparsity of resources, 
or simply fear of the unknown?

This is a quick start tutorial on adding custom elements to cTAKES.  We 
illustrate creating simple classes to input, process and output data.  This 
involves a concise overview of Apache uimaFIT and the cTAKES type system, as 
well as building a UIMA pipeline using piper files.


=


Loading a shippable with cTAKES DockHand

Customizing a simple pipeline need not be left to cTAKES experts.  Making a 
cTAKES installation need not be confined to source code checkouts or lengthy 
multi-stage binary downloads.

We introduce cTAKES DockHand, a compact single-file installation tool that 
allows one to construct custom pipelines as well as local installations, Rest 
Services and Dockerfiles.


==


Secret Engines of cTAKES

The cTAKES default natural language processing pipeline is a standard in the 
clinical research community.  What is past that standard?  While the default 
clinical pipeline uses almost 20 engines, there are dozens more in various 
cTAKES modules.

We present and discuss the top 10 annotation engines you never knew you had.



RE: how to activate inactive features in cTAKES? [EXTERNAL]

2020-04-30 Thread Savova, Guergana
To add to Tim's clarification: this enables you (or anyone for that matter) 
to implement your/their own method for these types.

--Guergana

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] 
Sent: Thursday, April 30, 2020 7:53 AM
To: dev@ctakes.apache.org
Subject: Re: how to activate inactive features in cTAKES? [EXTERNAL]


Akram, the typesystem in ctakes was created by a project with the aim of 
specifying things that are useful, without specifying implementations for them 
all. There are many items in the data model that there are no ctakes modules to 
fill. The idea was that when people bring things online there are placeholders 
for that information, so that new functionality is not added in a completely ad 
hoc way. So of the examples you describe:

- discoveryTechnique is always the same because you are running the same 
pipeline
- confidence is not filled in by the dictionary lookup -- the current method 
used does not generate a confidence score
- disambiguated is not filled but is technically correct because there is no 
disambiguation algorithm running
- polarity, uncertainty, conditional, generic and historyOf can be filled in by 
certain pipelines. You will have to add them after the DictionarySubPipe to see 
them filled in.

Tim
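
One way to see which of these attributes a given pipeline actually fills is to scan the XMI output and list the attributes that never vary across mentions. The sketch below is an illustration only, not a cTAKES utility: the function name constant_attributes, the inline XMI fragment and its values are assumptions, and real files produced by writeXmis will contain many more elements and attributes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Attribute names taken from the question in this thread.
ATTRIBUTES = ("discoveryTechnique", "confidence", "disambiguated", "polarity",
              "uncertainty", "conditional", "generic", "historyOf")


def constant_attributes(xmi_text):
    """Return {attribute: value} for attributes holding one single value across all mentions."""
    seen = defaultdict(set)
    for element in ET.fromstring(xmi_text).iter():
        local_name = element.tag.rsplit("}", 1)[-1]  # strip the "{namespace}" prefix
        if not local_name.endswith("Mention"):
            continue
        for name in ATTRIBUTES:
            if name in element.attrib:
                seen[name].add(element.attrib[name])
    return {name: next(iter(values)) for name, values in seen.items() if len(values) == 1}


# Hypothetical, heavily trimmed XMI fragment used only to exercise the function.
SAMPLE = """\
<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
         xmlns:textsem="http:///org/apache/ctakes/typesystem/type/textsem.ecore">
  <textsem:AnatomicalSiteMention xmi:id="1" begin="38" end="50"
      polarity="0" uncertainty="0" confidence="0.0" discoveryTechnique="1"/>
  <textsem:DiseaseDisorderMention xmi:id="2" begin="9" end="24"
      polarity="-1" uncertainty="0" confidence="0.0" discoveryTechnique="1"/>
</xmi:XMI>
"""

if __name__ == "__main__":
    for name, value in constant_attributes(SAMPLE).items():
        print(f"{name} is always {value}")
```

An attribute reported here is not necessarily unfilled; it may simply have the same value in every mention of the document. Running the check before and after adding an annotator to the piper file shows whether that annotator changed anything.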


From: Akram 
Sent: Thursday, April 30, 2020 4:37 AM
To: dev@ctakes.apache.org
Subject: how to activate inactive features in cTAKES? [EXTERNAL]



Hi
I can extract many tags when I use the default .piper in cTAKES.  Tags such as 
LabMention, AnatomicalSiteMention, ProcedureMention, etc. are all extracted 
by applying this piper:

load DefaultTokenizerPipeline

load DictionarySubPipe

writeHtml
writeXmis

The problem is that there are some features that do not change no matter how 
the text changes.
Most importantly, confidence is always 0.  How can I get the confidence of 
each term?
Other features, such as:
discoveryTechnique is always 1

polarity always 0

uncertainty always 0

conditional always false

generic always false

historyOf always 0

score always 0

disambiguated always false

How can I get these features working, and where can I find more info about 
these features and what they mean?
Thanks



RE: Missing body side and laterality attribute in AnatomicalSiteMention [EXTERNAL]

2020-02-17 Thread Savova, Guergana
Hi Abad,

Methods for populating these two attributes have not been implemented in 
cTAKES.  In cTAKES, there is a method for linking anatomical sites to 
diseases/disorders, sign/symptoms or procedures:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3994852/

best,
--
Guergana Savova, PhD, FACMI
Associate Professor
Boston Children's Hospital and Harvard Medical School



From: abad.ay...@cognizant.com [mailto:abad.ay...@cognizant.com]
Sent: Monday, February 17, 2020 3:47 AM
To: u...@ctakes.apache.org; dev@ctakes.apache.org
Subject: Missing body side and laterality attribute in AnatomicalSiteMention 
[EXTERNAL]


Hello Team,

We recently introduced cTAKES as the NLP engine for parsing clinical data in our 
profile. Though we are able to parse the clinical data at a high level, we are 
not able to get values for attributes like bodySide and bodyLaterality. For 
example, for the text below:

"He had a slight fracture in the proximal right fibula"

cTAKES should ideally have populated 'bodySide' and 'bodyLaterality' in 
the 'AnatomicalSiteMention' as "right" and "proximal" respectively. These 
attributes are critical information in our profile. We tried different 
approaches and it is still not working. We are new to cTAKES, so we would like 
to know what the probable fix is. Do we need to make any specific 
changes in our piper file to add the AnalysisEngine needed for this? I tried 
to unit test using the 'RelationExtractorAnnotatorsTest' in the 
'ctakes-relation-extractor' module but couldn't find an annotator.xml for 
it. Please advise on how to proceed.


Thanks & Regards
Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



This e-mail and any files transmitted with it are for the sole use of the 
intended recipient(s) and may contain confidential and privileged information. 
If you are not the intended recipient(s), please reply to the sender and 
destroy all copies of the original message. Any unauthorized review, use, 
disclosure, dissemination, forwarding, printing or copying of this email, 
and/or any action taken in reliance on the contents of this e-mail is strictly 
prohibited and may be unlawful. Where permitted by applicable law, this e-mail 
and other e-mail communications sent to and from Cognizant e-mail addresses may 
be monitored.


RE: Looking for literature [EXTERNAL]

2020-01-29 Thread Savova, Guergana
Hi Greg,

A link to our JAMIA publication describing the MiPACQ corpus and its usage:
https://academic.oup.com/jamia/article/20/5/922/2909262

I believe the "Development and evaluation of NLP components" section provides 
the details you are looking for.

Best,
--
Guergana Savova, PhD, FACMI
Associate Professor
Boston Children's Hospital and Harvard Medical School
guergana.sav...@childrens.harvard.edu
Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova


-Original Message-
From: Greg Silverman [mailto:g...@umn.edu] 
Sent: Wednesday, January 29, 2020 2:54 PM
To: dev@ctakes.apache.org
Subject: Looking for literature [EXTERNAL]


I'm digging around for literature on the relationship between cTAKES and 
MiPACQ, and of course found this paper, "The MiPACQ Clinical Question Answering 
System," which describes how cTAKES was used in the question-answering 
component of MiPACQ.

However, I'm more interested in how cTAKES was used with the deidentified 
MiPACQ corpus of pathology and colorectal cancer notes. All I'm able to find is 
reference to the MiPACQ Treebank and with that, very little about use of 
cTAKES. Any information about this, especially in the literature would be most 
welcome.

Thanks in advance!

Greg--

--
Greg M. Silverman
Senior Systems Developer
NLP/IE 
 Department of Surgery University of Minnesota g...@umn.edu

 ›  evaluate-it.org  ‹


RE: Deep learning [EXTERNAL]

2017-08-18 Thread Savova, Guergana
Not at this point...
Thanks for your question,
--Guergana

Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Boston Children's Hospital and Harvard Medical School
300 Longwood Avenue
Mailstop: BCH3092
Enders 144.1
Boston, MA 02115
Tel: (617) 919-2972
Fax: (617) 730-0817
guergana.sav...@childrens.harvard.edu
Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova/biocv
http://ctakes.apache.org  
http://thyme.healthnlp.org 
http://cancer.healthnlp.org 
http://share.healthnlp.org
http://center.healthnlp.org  





-Original Message-
From: abilash.mat...@cognizant.com [mailto:abilash.mat...@cognizant.com] 
Sent: Friday, August 18, 2017 1:46 AM
To: dev@ctakes.apache.org
Subject: Deep learning [EXTERNAL]

Just a basic question, are we using any deep learning techniques in CTAKES? If 
yes, then which module.

Thanks,
Abilash Mathew


RE: Proposed improvements [EXTERNAL] [SUSPICIOUS]

2017-07-10 Thread Savova, Guergana
Good dependency parsers are hard to find; moreover, good dependency parsers 
trained on clinical data are impossible to find. I don't think there is another 
dep parser trained on clinical data other than cTAKES's. In general, 
state-of-the-art dependency parsing is resource-intensive, and 
the models are also fairly large.
--
Guergana Savova, PhD, FACMI

-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Tuesday, June 27, 2017 4:07 PM
To: dev@ctakes.apache.org
Subject: RE: Proposed improvements [EXTERNAL] [SUSPICIOUS]

Hi all,

> I would like to have (and work on it) much leaner distribution
One bigfoot is the clearparser_models.jar in ctakes-dependency-parser-res.  As 
far as I know this is not used by default or in any checked-in non-default 
configuration.  As it is 1/4 GB, I would like to move it to its own module to 
keep it out of projects that use ctakes "as a library".  I hunted the net to 
see if a duplicate is available elsewhere for alternative inclusion methods but 
couldn't find one.

Thoughts?

Thanks,
Sean

-Original Message-
From: Andrey Kurdumov [mailto:kant2...@googlemail.com]
Sent: Sunday, June 25, 2017 1:52 AM
To: cTakes developers list
Subject: Re: Proposed improvements [EXTERNAL]

Just want to note that ASF PMC want to make GitHub primary repository and 
Apache servers secondary soon.

Regarding improvements:
I personally want better support for embedding. Right now the cTakes distribution 
comes with LVG and the UMLS dictionary, so the size of cTakes becomes very large.
I would like to have (and work on) a much leaner distribution, let's name it 
cTakes Core, which would just provide the cTakes executable without the need for data.
Right now I constantly have to strip that data out after the cTakes build, which 
slows down my build significantly.

Personally I support Hadrian's initiative to have better logging, since the cTakes 
setup has some quirks that could be resolved faster with better logging.


2017-06-23 17:38 GMT+06:00 Miller, Timothy <
timothy.mil...@childrens.harvard.edu>:

> Thanks Hadrian, I hadn't heard of OSEHRA but it looks interesting and 
> like something where we should be making people aware of cTAKES!
>
> svn vs. git -- I'm with you on preferring git, but not by so much that 
> it's worth spending time on an argument if it turns into an argument 
> :). As far as I know we've never really had a discussion about it.
> It's probably getting to the point where new developers have _only_ 
> used git and would find it a complete roadblock to use svn but for me 
> it's just a mild annoyance.
>
> All others you mentioned -- if you are willing to contribute a patch 
> we are happy to accept one-off contributions, and we are also 
> interested in growing the developer community with people who are 
> interested in contributing regularly over time.
>
> Tim
>
> 
> From: Hadrian Zbarcea 
> Sent: Thursday, June 22, 2017 9:14 PM
> To: dev@ctakes.apache.org
> Subject: Proposed improvements [EXTERNAL]
>
> Last week I presented at the OSEHRA Summit about ActiveMQ (and a few 
> other projects) and the ASF in general.
>
> I was surprised that most didn't know much about the ASF and more 
> importantly that nobody knew about cTakes, the only (directly) 
> healthcare related project at the ASF. There was no cTakes talk at 
> ApacheCon in Miami, but at OSEHRA, which is all about healthcare we 
> should have had a presence. I will probably submit a talk for next 
> year, but until then, because I think I created a bit of interest in 
> cTakes I went to build cTakes myself and try a few things.
>
> Some of my findings are:
> * test failures with openjdk; granted, the docs mention oracle jdk as a 
> prerequisite, but I think it's easy to support openjdk
> * use of svn vs git; this is a debatable topic, but by now everybody 
> and their uncle is on git, so moving to git (which I'd recommend) 
> would probably foster adoption (yes, I know about the github mirror)
> * no support for OSGi, many large players use it
> * improvements in logging could go a long way, starting with moving to 
> slf4j
>
> Suggesting improvements implies that I volunteer to do a good chunk of 
> the work, but before that I'm more interested in how much the 
> community would welcome such improvements. I am curious which are 
> considered the lower-hanging fruit; for the more controversial topics 
> we could take them to [discuss] threads. Because every community 

RE: Annotating Lab data [EXTERNAL]

2017-07-10 Thread Savova, Guergana
Correct, cTAKES does not annotate lab data. The basic components are there -- the 
lab and the value -- but the link between the two is not. One could do the 
linking through rules or a classifier.
I hope this helps.
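
As a rough illustration of the rule-based option, the sketch below pairs each lab name with the number that immediately follows it. It is not a cTAKES component: it works on plain text with a single regular expression, the function name is made up for this example, and the pattern assumes the simple "name value" layout of the sample in the question. A real implementation would start from the annotations cTAKES already produces.

```python
import re

# A run of letters and spaces (the candidate lab name), optional "is",
# then an integer or decimal value.
LAB_VALUE = re.compile(r"([A-Za-z][A-Za-z ]*?)\s+(?:is\s+)?(\d+(?:\.\d+)?)")


def link_labs(text):
    """Return (lab name, numeric value) pairs found in the text, in order."""
    return [(name.strip(), float(value)) for name, value in LAB_VALUE.findall(text)]


sample = (
    "Hemoglobin 10.6, hematocrit 31.7, white cell count 5.8, platelet 377.\n"
    "chloride 103. INR is 1.5."
)
for lab, value in link_labs(sample):
    print(lab, value)
```

A rule this naive will also pair any other word with a trailing number (for example "page 3"), so in practice the candidate names would need to be restricted to spans that the dictionary lookup has already identified.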
--Guergana



-Original Message-
From: Das, Tanmay [mailto:tanmay@optum.com] 
Sent: Tuesday, June 27, 2017 3:39 AM
To: dev@ctakes.apache.org
Subject: Annotating Lab data [EXTERNAL]

Hi,

When using the CVD bundled with cTAKES along with 
AggregatePlaintextFastUMLSProcessor, I found that no laboratory data was 
annotated, even when the input contained it.
For an input like:
LABORATORY DATA:
Hemoglobin 10.6, hematocrit 31.7, white cell count 5.8, platelet 377.
Magnesium 2.6, glucose 98, BUN 13, creatinine 0.5, sodium 138, potassium 3.9, 
chloride 103. INR is 1.5.
The IdentifiedAnnotations classified them as Medication, Procedures, etc., but 
not as LabMention.
Does this AE contain an annotator for lab data? If not, can someone 
suggest a different annotator that could identify lab values?


This e-mail, including attachments, may include confidential and/or proprietary 
information, and may be used only by the person or entity to which it is 
addressed. If the reader of this e-mail is not the intended recipient or his or 
her authorized agent, the reader is hereby notified that any dissemination, 
distribution or copying of this e-mail is prohibited. If you have received this 
e-mail in error, please notify the sender by replying to this message and 
delete this e-mail immediately.


RE: Visit segregation and extraction [EXTERNAL]

2017-06-26 Thread Savova, Guergana
You probably have to add some logic on top of the cTAKES extracted information 
to distinguish inpatient vs. outpatient text. 
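
A minimal sketch of what such logic could look like, assuming the BP readings and a visit type per document have already been extracted into simple records. The record layout, the visit-type labels and the function name are hypothetical, not cTAKES output. The rule follows the measure quoted in the question: ignore readings from inpatient stays, ED visits, diagnostic tests and surgical procedures, then take the most recent reading after the hypertension diagnosis.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Visit types the quality measure says to ignore (labels are hypothetical).
EXCLUDED_VISITS = {"inpatient", "ed", "diagnostic_test", "surgical_procedure"}


@dataclass
class BpReading:
    taken_on: date
    systolic: int
    diastolic: int
    visit_type: str


def most_recent_outpatient_bp(readings: List[BpReading],
                              diagnosed_on: date) -> Optional[BpReading]:
    """Latest reading taken strictly after the diagnosis date, excluding ignored visit types."""
    eligible = [
        r for r in readings
        if r.visit_type not in EXCLUDED_VISITS and r.taken_on > diagnosed_on
    ]
    return max(eligible, key=lambda r: r.taken_on, default=None)


readings = [
    BpReading(date(2016, 12, 1), 150, 95, "outpatient"),  # before the diagnosis
    BpReading(date(2017, 2, 1), 140, 90, "outpatient"),
    BpReading(date(2017, 2, 20), 132, 84, "outpatient"),
    BpReading(date(2017, 3, 1), 160, 100, "inpatient"),   # excluded visit type
]
print(most_recent_outpatient_bp(readings, diagnosed_on=date(2017, 1, 10)))
```

The hard part remains assigning a visit type to each document or section in the first place, which this sketch takes as given.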
--Guergana



-Original Message-
From: Hari, Sekhar [mailto:sekhar.h...@cgi.com] 
Sent: Monday, June 26, 2017 12:44 AM
To: u...@ctakes.apache.org; dev@ctakes.apache.org
Subject: RE: Visit segregation and extraction [EXTERNAL]

These are already readable PDFs and not images. The clinical documents came 
through to me as scanned images. We then converted those images into readable 
PDFs using OCR. cTAKES is able to read the text. But I want to understand whether 
it can distinguish a BP reading taken during an outpatient visit from one taken 
during a non-outpatient visit (such as an inpatient stay, ED visit, diagnostic 
test, or surgical procedure). The text is cluttered with different types of 
clinical documents (progress notes, radiology notes, H&P notes, etc.).

Thanks
Sekhar Hari | Program Lead
Health Sciences Business Innovation
ASDC CGI Health Solutions
Electronic City, Bangalore
Karnataka, India 560100

814 7027 779 (C)
080 6642 2536 (D)

-Original Message-
From: Chris Mattmann [mailto:mattm...@apache.org] 
Sent: 26 June 2017 10:03
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: Re: Visit segregation and extraction

Maybe start out with Apache Tika for text extraction from the PDFs, then run 
Apache cTAKES on the resultant text?



On 6/25/17, 5:30 PM, "Hari, Sekhar"  wrote:

Hello there -

I have a task in hand to process 7,000,000 patient records (PDF files) 
containing different clinical documents. Each PDF has 20 pages and one PDF = 
one patient.

The information to retrieve from these documents is like this for a patient 
quality measure namely 'Controlling High Blood Pressure' -

"Extract most recently documented blood pressure occurring after the 
diagnosis of hypertension (Do not use BP readings from inpatient stay, ED 
visit, diagnostic test, or surgical procedure). Blood pressure should be 
routinely assessed as part of a physical exam at each outpatient visit."

Can cTAKES identify non-outpatient visits and outpatient visits separately? 
Are there specific pipelines that we should use to solve this problem?

Many thanks,
Sekhar H.





RE: cTAKES 4.0.0 Release [SUSPICIOUS]

2017-04-24 Thread Savova, Guergana
Excellent work, cTAKES team! We are already looking forward to v4.1... Release 
soon, release early.

The formal announcement will go out tomorrow -- we are very appreciative of 
Sally's expertise and efforts (and the ASF PR office) in promoting this major 
release!

--Guergana


-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] 
Sent: Monday, April 24, 2017 9:33 AM
To: u...@ctakes.apache.org; dev@ctakes.apache.org; annou...@apache.org
Subject: Re: cTAKES 4.0.0 Release [SUSPICIOUS]

Congrats cTAKES team! This is an important milestone!

Tim





On Mon, 2017-04-24 at 09:02 -0400, Murali Minnah wrote:

> The Apache cTAKES team is pleased to announce the availability of the
> 4.0.0 release.
> 
> For the complete release notes, please visit
> https://s.apache.org/ctakes-4.0.0-release-notes
> 
> Apache clinical Text Analysis and Knowledge Extraction System
> (cTAKES) is an open-source natural language processing system for
> information extraction from electronic medical record clinical free-text.
> 
> The release can be downloaded from
> http://ctakes.apache.org/downloads.cgi
> 
> For further information, please visit the project website at
> http://ctakes.apache.org/
> 
> -- The Apache cTAKES Team


RE: Release Apache cTAKES 4.0.0 (rc2) [SUSPICIOUS]

2017-04-17 Thread Savova, Guergana
Never rule out unplanned family events and emergencies... Glad to hear that 
does not appear to be the case. 

More clarity on when the rc3 will be ready would be appreciated.

--Guergana Savova, PhD, FACMI

-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Monday, April 17, 2017 10:40 AM
To: dev@ctakes.apache.org
Subject: RE: Release Apache cTAKES 4.0.0 (rc2) [SUSPICIOUS]

Hi Pei,

I don't think that Guergana was imposing a deadline.  I think that she is 
indicating that James and/or I will make rc3 if you are offline.  I think that 
she was actually trying to relieve any pressure that may be upon you by 
volunteering BCH time.
Guergana is very eager to get a successful 4.0 out as soon as possible.

Sean

-Original Message-
From: Pei Chen [mailto:chen...@apache.org]
Sent: Monday, April 17, 2017 10:30 AM
To: dev@ctakes.apache.org
Subject: Re: Release Apache cTAKES 4.0.0 (rc2)

Guergana,
Sean and James sent us a private message to request an rc3 to include the most 
recent changes in trunk after rc2 was created.
We are more than happy to create another release candidate.  That was the 
reason that rc2 was vetoed and an rc3 was requested.  The only differences 
between rc3 and rc2 are whatever minor changes went into trunk since Fri over 
the Easter and Patriots' Day holiday weekend.  You're more than welcome to create 
the rc yourself -- but I don't think it will make it any more efficient.  I 
rarely see anyone threaten other ASF volunteers with dates/deadlines.  What 
gives?

On Mon, Apr 17, 2017 at 9:53 AM, Savova, Guergana 
<guergana.sav...@childrens.harvard.edu> wrote:
> Pei/Murali,
> Let us know if you could cut release candidate 3 by Monday, April 17, 5 pm 
> ET. We would understand if you are very busy and unavailable to do so -- life 
> happens. Sean Finan and James Masanz volunteered to prepare rc3 if we do not 
> hear from you.
>
> Dear cTAKES community,
> Thank you for your testing of rc2, your contributions are so valuable! RC3 
> will be made available on Tuesday, April 18 or Wednesday, April 19 for 
> another round of testing and voting.
> We all are looking forward to the v4 release!
>
> Cheers,
>  --Guergana
>
> -Original Message-
> From: Savova, Guergana
> Sent: Saturday, April 15, 2017 10:02 AM
> To: 'dev@ctakes.apache.org' <dev@ctakes.apache.org>
> Subject: RE: Release Apache cTAKES 4.0.0 (rc2)
>
> Not sure what is meant by "this week". Today, Sat, April 15 by 5 pm?
> --Guergana

RE: Release Apache cTAKES 4.0.0 (rc2)

2017-04-17 Thread Savova, Guergana
Pei/Murali,
Let us know if you could cut release candidate 3 by Monday, April 17, 5 pm ET. 
We would understand if you are very busy and unavailable to do so -- life 
happens. Sean Finan and James Masanz volunteered to prepare rc3 if we do not 
hear from you. 

Dear cTAKES community,
Thank you for your testing of rc2, your contributions are so valuable! RC3 will 
be made available on Tuesday, April 18 or Wednesday, April 19 for another round 
of testing and voting.
We all are looking forward to the v4 release!

Cheers,
 --Guergana

-Original Message-
From: Savova, Guergana 
Sent: Saturday, April 15, 2017 10:02 AM
To: 'dev@ctakes.apache.org' <dev@ctakes.apache.org>
Subject: RE: Release Apache cTAKES 4.0.0 (rc2)

Not sure what is meant by "this week". Today, Sat, April 15 by 5 pm?
--Guergana


-Original Message-
From: Pei Chen [mailto:pei.c...@wiredinformatics.com]
Sent: Saturday, April 15, 2017 9:13 AM
To: dev@ctakes.apache.org
Subject: Re: Release Apache cTAKES 4.0.0 (rc2)

Let us recut 4.0.0 from trunk this week.  I just saw a note from Sean that he 
would like to integrate changes from trunk as well.

   Pei Chen
Wired Informatics <http://bit.ly/1pHmTcL>
265 Franklin St Ste 1702
Boston, MA 02110
tel: (617) 433-7544
pei.c...@wiredinformatics.com

On Fri, Apr 14, 2017 at 11:38 PM, Finan, Sean < 
sean.fi...@childrens.harvard.edu> wrote:

> > I'd rather not get into the definition of "basic", just like I'd 
> > rather
> not discuss the definition of obvious with another mathematician.
> --> Lol.  My wife can't stand it when I say "obviously".
>
> Fwiw, I think that cutting a new rc sooner rather than later is 
> comparatively little work compared to the benefit for testers.  It 
> needs to be done anyway as what is in rc2 is not releasable.  I don't 
> want to vote
> -1 on the rc, but will if it is necessary to get an rc3 cut.
>
> Sean
>
> -Original Message-
> From: James Masanz [mailto:masanz.ja...@gmail.com]
> Sent: Friday, April 14, 2017 9:36 PM
> To: dev@ctakes.apache.org
> Subject: Re: Release Apache cTAKES 4.0.0 (rc2)
>
> these are all the fixes I plan to make. last I talked to Sean, he had 
> all his changes in. I assume there will be more testing up until final 
> vote, I certainly will be doing more testing and working more on 
> documentation. But why not have people test on the latest now that we 
> have fixed some issues that seem like showstoppers?  I'd rather not 
> get into the definition of "basic", just like I'd rather not discuss 
> the definition of obvious with another mathematician.
>
> On Fri, Apr 14, 2017 at 8:23 PM, Pei Chen <chen...@apache.org> wrote:
>
> > James,
> > Happy to create another rc3, but can I suggest we bundle all of the 
> > fixes before creating another candidate?  Are there other remaining 
> > items to test? This just seems like basic functionality?
> >
> > On Fri, Apr 14, 2017 at 8:04 PM, James Masanz 
> > <masanz.ja...@gmail.com>
> > wrote:
> > > -1 from me for rc2 because of various issues found
> > > old dictionary lookup didn't work in an IDE unless you 
> > > manually download the latest zip - pom files needed updating 
> > > (checked into trunk
> > > today) (more of the ctakesresources from sourceforge need to be 
> > > put onto maven central for ctakes to work as a maven dependency)
> > > Sean fixed some issues today (I saw commit notices today) 
> > > which I'd like to see included in 4.0 before it's released
> > >
> > > -- James
> > >
> > >
> > > On Wed, Apr 12, 2017 at 5:31 PM, Pei Chen <chen...@apache.org> wrote:
> > >
> > >> This is a call for a vote on releasing the following candidate
> > >> (rc2) as Apache cTAKES 4.0.0.
> > >>
> > >> For more detailed information on the changes/release notes, 
> > >> please
> > visit:
> > >> https://urldefense.proofpoint.com/v2/

RE: Release Apache cTAKES 4.0.0 (rc2) [SUSPICIOUS]

2017-04-15 Thread Savova, Guergana
Agreed that we need rc3 asap.
I am planning to test rc3 this weekend. 
--Guergana


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Friday, April 14, 2017 11:38 PM
To: dev@ctakes.apache.org
Subject: RE: Release Apache cTAKES 4.0.0 (rc2) [SUSPICIOUS]

> I'd rather not get into the definition of "basic", just like I'd rather not 
> discuss the definition of obvious with another mathematician.
--> Lol.  My wife can't stand it when I say "obviously".

Fwiw, I think that cutting a new rc sooner rather than later is comparatively 
little work compared to the benefit for testers.  It needs to be done anyway as 
what is in rc2 is not releasable.  I don't want to vote -1 on the rc, but will 
if it is necessary to get an rc3 cut.

Sean

-Original Message-
From: James Masanz [mailto:masanz.ja...@gmail.com]
Sent: Friday, April 14, 2017 9:36 PM
To: dev@ctakes.apache.org
Subject: Re: Release Apache cTAKES 4.0.0 (rc2)

these are all the fixes I plan to make. last I talked to Sean, he had all his 
changes in. I assume there will be more testing up until final vote, I 
certainly will be doing more testing and working more on documentation. But why 
not have people test on the latest now that we have fixed some issues that seem 
like showstoppers?  I'd rather not get into the definition of "basic", just 
like I'd rather not discuss the definition of obvious with another 
mathematician.

On Fri, Apr 14, 2017 at 8:23 PM, Pei Chen  wrote:

> James,
> Happy to create another rc3, but can I suggest we bundle all of the 
> fixes before creating another candidate?  Are there other remaining 
> items to test? This just seems like basic functionality?
>
> On Fri, Apr 14, 2017 at 8:04 PM, James Masanz 
> wrote:
> > -1 from me for rc2 because of various issues found
> > old dictionary lookup didn't work in an IDE unless you manually 
> > download the latest zip - pom files needed updating (checked into 
> > trunk
> > today) (more of the ctakesresources from sourceforge need to be put 
> > onto maven central for ctakes to work as a maven dependency)
> > Sean fixed some issues today (I saw commit notices today) which 
> > I'd like to see included in 4.0 before it's released
> >
> > -- James
> >
> >
> > On Wed, Apr 12, 2017 at 5:31 PM, Pei Chen  wrote:
> >
> >> This is a call for a vote on releasing the following candidate
> >> (rc2) as Apache cTAKES 4.0.0.
> >>
> >> For more detailed information on the changes/release notes, please
> visit:
> >> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313621&version=12340211
> >>
> >> The release was made using the cTAKES release process documented here:
> >> https://ctakes.apache.org/ctakes-release-guide.html
> >>
> >> The candidate is available at:
> >> https://dist.apache.org/repos/dist/dev/ctakes/ctakes-4.0.0-rc2/apache-ctakes-4.0.0-src.tar.gz
> >> /.zip
> >>
> >> The tag to be voted on:
> >> http://svn.apache.org/repos/asf/ctakes/tags/ctakes-4.0.0-rc2
> >>
> >> The MD5 checksum of the tarball can be found at:
> >> https://dist.apache.org/repos/dist/dev/ctakes/ctakes-4.0.0-rc2/apache-ctakes-4.0.0-src.tar.gz.md5
> >> /.zip.md5
> >>
> >> The signature of the tarball can be found at:
> >> https://dist.apache.org/repos/dist/dev/ctakes/ctakes-4.0.0-rc2/apache-ctakes-4.0.0-src.tar.gz.asc
> >> /.zip.asc
> >>
> >> Apache cTAKES' KEYS 
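
For anyone testing the candidate, checking the tarball against its published .md5 file can be sketched in Python as below. This is a minimal sketch, not part of the cTAKES release process: the function names are made up for illustration, and it assumes the .md5 file contains the digest as one unbroken 32-character hex string (some tools write the digest in space-separated groups, which this would not match). The .asc signature is a separate check, normally done with gpg against the KEYS file.

```python
import hashlib
import re
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(md5_path):
    """Pull the first 32-character hex string out of a .md5 file.

    Assumption: the digest is written as one contiguous hex token.
    """
    text = Path(md5_path).read_text(encoding="utf-8", errors="replace")
    match = re.search(r"\b[0-9a-fA-F]{32}\b", text)
    if match is None:
        raise ValueError(f"no MD5 digest found in {md5_path}")
    return match.group(0).lower()


def verify(artifact_path):
    """Return True if the artifact matches the digest in '<artifact>.md5'."""
    return md5_of_file(artifact_path) == expected_md5(str(artifact_path) + ".md5")
```

Calling verify("apache-ctakes-4.0.0-src.tar.gz") with the tarball and its .md5 file in the same directory returns True when the digests agree.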

RE: cTAKES confluence wiki

2017-04-13 Thread Savova, Guergana
Actually, James did send an email with the Confluence details -- my bad for not 
seeing it.
--Guergana

-Original Message-
From: Savova, Guergana 
Sent: Thursday, April 13, 2017 2:05 PM
To: dev@ctakes.apache.org
Subject: RE: cTAKES confluence wiki

I am sorry but I am not seeing documentation for v4 on the confluence Wiki: 
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES
Could you please send the relevant link? Also, I think it would be very helpful 
to include the link in the README distributed with the release.

Thanks!
--Guergana

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Thursday, April 13, 2017 12:49 PM
To: dev@ctakes.apache.org
Subject: Re: testing release candidates Re: Release Apache cTAKES 4.0.0 (rc2) 
[SUSPICIOUS] [SUSPICIOUS]

OK. By logging into confluence I found the draft of version 4.0 documentation, 
but maybe it's worth sending an email to dev with a few pages that need help 
and people can improve as they test?

I will do the same.

Thanks
Tim

On Thu, 2017-04-13 at 12:24 -0400, James Masanz wrote:
> I agree.
> There are (or were) some places that have TBD, and the part about 
> unzipping resources needs to be expanded to include what to do if you 
> just download the fast dictionary and not the entire set of 
> dictionaries.  If no one beats me to it I will improve those sections 
> by the time we announce. But the documentation is ready for comments 
> and can continually be improved, even past an announced release if 
> needed.
> 
> On Thu, Apr 13, 2017 at 12:14 PM, Finan, Sean < 
> sean.fi...@childrens.harvard.edu> wrote:
> 
> > 
> > Hi Tim,
> > 
> > Excellent question/point.
> > 
> > I think that you are welcome to follow any online instructions.  We 
> > are aware that the wiki is far from complete, and one thing that I 
> > welcome everybody to do is become active on documentation.
> > 
> > So, if you find instructions for installation, workflow, etc., 
> > please "test" the instructions.  If there are none then comment on 
> > the absence.
> > However, I think that a paucity of documentation should not hold up 
> > the code/bin release.  I could be in the minority opinion.
> > 
> > Sean
> > 
> > -Original Message-
> > From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
> > Sent: Thursday, April 13, 2017 11:55 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: testing release candidates Re: Release Apache cTAKES 
> > 4.0.0 (rc2) [SUSPICIOUS]
> > 
> > Thanks all for your hard work. I added some minor instructions to 
> > the spreadsheet that are hopefully helpful.
> > 
> > I want to test the CVD for standard dictionary lookup with the 
> > separate resources. Am I meant to be testing documentation as well?
> > As in, something I can follow along and make sure it's correct? Or 
> > should I just do it the way I know how to do it?
> > Tim
> > 
> > On Wed, 2017-04-12 at 20:21 -0400, James Masanz wrote:
> > > 
> > > Hi Everyone,
> > > 
> > > We could use a google spreadsheet to end up with a sense of 
> > > testing coverage and maybe reduce duplicate testing effort too.
> > > https://docs.google.com/spreadsheets/d/1FK-kEhwewLJaVCBgWsSAMhL2KNCD6L8AMfFM33oKR2Y/edit?usp=sharing
> > > And we can compare future releases to this one.
> > > 
> > > I put a few example lines of the first things I plan to test. I'll 
> > > start testing tomorrow and add more lines for myself then.
> > > If you don't want to update the spreadsheet twice, it would still 
> > > be helpful to list what you've done after you do testing, without 
> > > listing what you plan to do ahead of time.
> > > 
> > > Thanks,
> > > -- James
> > > 
> > > 
> > > On Wed, Apr 12, 2017 at 5:31 PM, Pei Chen <chen...@apache.org>
> > > wrote:
> > > 
> > > > 
> > > > 
> > > > This is a call for a vote on releasing the following candidate
> > > > (rc2) as
> > > > Apache cTAKES 4.0.0.
> > > > 
> > > > For more detailed information on the changes/releas

RE: cTAKES confluence wiki

2017-04-13 Thread Savova, Guergana
I am sorry but I am not seeing documentation for v4 on the confluence Wiki: 
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES 
Could you please send the relevant link? Also, I think it would be very helpful 
to include the link in the README distributed with the release.

Thanks!
--Guergana


Apache cTAKES 4.0 and soliciting testimonials

2017-04-03 Thread Savova, Guergana
Dear Apache cTAKES community,

As you know, the Apache cTAKES 4.0 release candidate will be ready for your 
testing sometime this week. A big round of applause goes to James Masanz and 
Sean Finan for leading this milestone release -- thank you, James and Sean!

Sally Khudairi from Apache generously offered to help us craft the 
announcement for the cTAKES 4.0 release. She suggested we solicit 
quotes/testimonials from the Apache cTAKES community to demonstrate the 
project's robustness and breadth of deployment. We are now asking you to send 
us your testimonials to include in the announcement. We very much look forward 
to your input!

Kindest regards,
--Guergana

Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Boston Children's Hospital and Harvard Medical School
300 Longwood Avenue
Mailstop: BCH3092
Enders 144.1
Boston, MA 02115
Tel: (617) 919-2972
Fax: (617) 730-0817
guergana.sav...@childrens.harvard.edu
Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova/biocv
http://ctakes.apache.org
http://thyme.healthnlp.org
http://cancer.healthnlp.org
http://share.healthnlp.org
http://center.healthnlp.org





FW: ASF Board Report for cTAKES - Initial Reminder for March 2017

2017-03-02 Thread Savova, Guergana
Some items to include in the report:
1. Actively working on cTAKES 4.0, a major release scheduled for the end of 
March.
2. Actively working on updating the cTAKES Confluence website.
3. Human-tagged gold annotations of 18 mock clinical notes are done. The notes 
were generated by cTAKES committer-physician John Green. The format is Anafora 
(https://github.com/weitechen/anafora); annotations cover signs/symptoms, 
diseases/disorders, procedures, anatomical sites and medications, with 
relevant attributes and mappings to ontology concept codes. The human-tagged 
annotations were done by Dave Harris at Boston Children's Hospital.

--
Guergana Savova, PhD, FACMI


-Original Message-
From: Brett Porter [mailto:br...@apache.org] 
Sent: Wednesday, March 1, 2017 6:27 AM
To: Pei J Chen 
Cc: priv...@ctakes.apache.org
Subject: ASF Board Report for cTAKES - Initial Reminder for March 2017

This email was sent on behalf of the ASF Board.  It is an initial reminder to 
give you plenty of time to prepare the report.

According to board records, you are listed as the chair of a committee that is 
due to submit a report this month. [1] [2]

The meeting is scheduled for Wed, 15 Mar 2017 at 10:30 PDT and the deadline for 
submitting your report is 1 full week prior to that (Wed Mar 8th)!

Meeting times in other time zones:

  
http://timeanddate.com/s/3773
 

Please submit your report with sufficient time to allow the board members to 
review and digest. Again, the very latest you should submit your report is 1 
full week (7 days) prior to the board meeting (Wed Mar 8th).

If you feel that an error has been made, please consult [1] and if there is 
still an issue then contact the board directly.

As always, PMC chairs are welcome to attend the board meeting.

Thanks,
The ASF Board

[1] - https://svn.apache.org/repos/private/committers/board/committee-info.txt
[2] - https://svn.apache.org/repos/private/committers/board/calendar.txt
[3] - https://svn.apache.org/repos/private/committers/board/templates
[4] - https://reporter.apache.org/
 


Submitting your Report
----------------------

Full details about the process and schedule are in [1].

The report should be committed to the meeting agenda in the board directory in 
the foundation repository, trying to keep a similar format to the others.
This can be found at:

  
https://svn.apache.org/repos/private/foundation/board
 

Reports can also be posted using the online agenda tool:

  
https://whimsy.apache.org/board/agenda/2017-03-15/cTAKES
 

Your report should also be sent in plain-text format to bo...@apache.org with a 
Subject line that follows the below format:

  Subject: [REPORT] cTAKES - March 2017

Cutting and pasting directly from a Wiki is not acceptable due to formatting 
issues. Line lengths should be limited to 77 characters.


RE: wiki wishlist [SUSPICIOUS]

2017-03-01 Thread Savova, Guergana
Thank you, James!

One suggestion (more to come): post the pamphlet that Sean Finan created for 
the cTAKES hackathon in Chicago in Nov 2016. 

--Guergana


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Wednesday, March 1, 2017 1:31 PM
To: dev@ctakes.apache.org
Subject: RE: wiki wishlist [SUSPICIOUS]

Virge!  Thanks James!

Sean

-Original Message-
From: James Masanz [mailto:masanz.ja...@gmail.com] 
Sent: Wednesday, March 01, 2017 1:22 PM
To: dev@ctakes.apache.org
Subject: wiki wishlist

In an earlier post I mentioned I was interested in moving away from Confluence 
for the cTAKES wiki, but the only new wikis Infra will create are Confluence 
ones.

I suggest we use this thread + a JIRA item to compile a list of wiki changes 
people would like - formatting, content, anything to do with updating the 
cTAKES wiki, which is 
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES
 

First, if you have a quick update, please just go ahead and make it!

I'll start the list with these items:
   - make the sidebar show the most recent cTAKES release at the top (reverse 
     chronological order) (Done - Just did it!)
   - incorporate any comments made within the wiki, such as this one: 
     https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+User+Install+Guide?focusedCommentId=34013875#comment-34013875
 

To add to the wish list, either comment on CTAKES-420 or reply to this thread.

Thanks!
James


RE: Phenotype-specific entities

2017-02-15 Thread Savova, Guergana
I don't believe there is a tool for walking the UMLS ontology, Dima. But Sean 
should confirm that his dictionary building tool does not have that 
functionality.

I think you can use the UMLS tables to get that information. It has been quite 
a while since I used these tables, but I remember I was able to get that 
information from them...

Sean,
Does your dictionary building tool implement ontology walking?

--Guergana

-Original Message-
From: Dligach, Dmitriy [mailto:ddlig...@luc.edu] 
Sent: Wednesday, February 15, 2017 1:50 PM
To: dev@ctakes.apache.org
Subject: Re: Phenotype-specific entities

Guergana, thank you. 

Is there anything in cTAKES now for walking the UMLS ontology (e.g. for finding 
hypernyms, synonyms, etc.)?

Dima



> On Feb 15, 2017, at 12:45, Savova, Guergana 
> <guergana.sav...@childrens.harvard.edu> wrote:
> 
> Hi Erin,
> Yes, creating your customized dictionary is the way to go. You can prune by 
> semantic types of interest and then remove branches that are not relevant to 
> your specific phenotype. I am not aware of cTAKES implementing such a tool 
> for a very customized dictionary.
> 
> You can also start with a few terms that you know are relevant to your 
> phenotype and then find their synonyms in the UMLS. Then, you can further 
> walk a specific ontology and take siblings and parents if you think they are 
> relevant.
> 
> Then, there is the whole field of using word embeddings to find 
> synonyms/related terms from unlabeled data, if you want to become really 
> fancy :-) At this point, cTAKES does not implement any deep learning 
> algorithms; in the future we are planning to release a bridge to Keras. 
> 
> I hope this makes sense.
> 
> --
> Guergana Savova, PhD, FACMI
> 
> 
> -Original Message-
> From: Erin Nicole Gustafson [mailto:erin.gustaf...@northwestern.edu] 
> Sent: Wednesday, February 15, 2017 1:38 PM
> To: dev@ctakes.apache.org
> Subject: Phenotype-specific entities
> 
> Hi all,
> 
> I would like to be able to only identify entities that are relevant for some 
> specific phenotype. One step towards achieving this would be to build a 
> custom dictionary with a limited set of semantic types. However, this is not 
> quite specific enough to only identify mentions related to one disease while 
> ignoring those related to some other disease, for example.
> 
> Does cTAKES currently have a way to do this sort of filtering? Or, has anyone 
> developed their own tools that they'd be willing to share?
> 
> Thanks,
> Erin



RE: Phenotype-specific entities

2017-02-15 Thread Savova, Guergana
Hi Erin,
Yes, creating your customized dictionary is the way to go. You can prune by 
semantic types of interest and then remove branches that are not relevant to 
your specific phenotype. I am not aware of cTAKES implementing such a tool for 
a very customized dictionary.
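
To make the pruning step concrete, here is a minimal Python sketch. It assumes the UMLS Metathesaurus is available in Rich Release Format, where MRSTY.RRF holds one pipe-delimited row per concept and semantic type (CUI|TUI|STN|STY|ATUI|CVF). The function names are made up for illustration and none of this is part of cTAKES or its dictionary tooling; the list of branches to drop is assumed to be curated by hand.

```python
def cuis_with_semantic_types(mrsty_lines, wanted_tuis):
    """Return the set of CUIs that carry at least one wanted semantic type.

    mrsty_lines: iterable of MRSTY.RRF rows (CUI|TUI|STN|STY|ATUI|CVF|).
    wanted_tuis: semantic type identifiers to keep, e.g. {"T047"}.
    """
    wanted = set(wanted_tuis)
    kept = set()
    for line in mrsty_lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        cui, tui = fields[0], fields[1]
        if tui in wanted:
            kept.add(cui)
    return kept


def drop_branches(cuis, irrelevant_cuis):
    """Remove concepts judged irrelevant to the phenotype.

    irrelevant_cuis is a hypothetical, manually curated collection.
    """
    return set(cuis) - set(irrelevant_cuis)
```

With the real file, something like cuis_with_semantic_types(open("MRSTY.RRF", encoding="utf-8"), {"T047", "T184"}) would keep concepts typed as Disease or Syndrome or as Sign or Symptom; the surviving CUIs can then be used to filter the rows that go into a custom dictionary.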

You can also start with a few terms that you know are relevant to your 
phenotype and then find their synonyms in the UMLS. Then, you can further walk 
a specific ontology and take siblings and parents if you think they are 
relevant.
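
The seed-term approach can be sketched in Python as follows. This is only a sketch under stated assumptions: it reads rows in the UMLS Rich Release Format layouts, taking CUI, LAT and STR from columns 0, 1 and 14 of MRCONSO.RRF and CUI1, REL, CUI2 and SAB from columns 0, 3, 4 and 10 of MRREL.RRF, and it relies on the UMLS convention that REL states the relationship of the second concept to the first (so REL=PAR means CUI2 is a parent of CUI1). The function names are illustrative and are not cTAKES APIs.

```python
from collections import defaultdict


def synonyms_by_cui(mrconso_lines, language="ENG"):
    """Map each CUI to the set of its lower-cased strings in one language."""
    names = defaultdict(set)
    for line in mrconso_lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 15:
            continue  # skip blank or malformed rows
        cui, lat, string = fields[0], fields[1], fields[14]
        if lat == language:
            names[cui].add(string.lower())
    return names


def cuis_for_terms(names, seed_terms):
    """Find the CUIs whose strings include any of the seed terms."""
    seeds = {term.lower() for term in seed_terms}
    return {cui for cui, strings in names.items() if strings & seeds}


def related_cuis(mrrel_lines, seed_cuis, relations=("PAR", "SIB"), source=None):
    """Collect concepts related to the seeds by the given REL values.

    source optionally restricts the walk to one vocabulary (the SAB column),
    which corresponds to walking a specific ontology.
    """
    seeds = set(seed_cuis)
    related = set()
    for line in mrrel_lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 11:
            continue  # skip blank or malformed rows
        cui1, rel, cui2, sab = fields[0], fields[3], fields[4], fields[10]
        if cui1 in seeds and rel in relations and (source is None or sab == source):
            related.add(cui2)
    return related - seeds
```

A typical flow would be: build names from MRCONSO.RRF, look up the seed CUIs with cuis_for_terms, read their synonyms from names, then call related_cuis on MRREL.RRF and review the returned parents and siblings by hand before adding them to the dictionary.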

Then, there is the whole field of using word embeddings to find 
synonyms/related terms from unlabeled data, if you want to become really fancy 
:-) At this point, cTAKES does not implement any deep learning algorithms; in 
the future we are planning to release a bridge to Keras. 

I hope this makes sense.

--
Guergana Savova, PhD, FACMI


-Original Message-
From: Erin Nicole Gustafson [mailto:erin.gustaf...@northwestern.edu] 
Sent: Wednesday, February 15, 2017 1:38 PM
To: dev@ctakes.apache.org
Subject: Phenotype-specific entities

Hi all,

I would like to be able to only identify entities that are relevant for some 
specific phenotype. One step towards achieving this would be to build a custom 
dictionary with a limited set of semantic types. However, this is not quite 
specific enough to only identify mentions related to one disease while ignoring 
those related to some other disease, for example.

Does cTAKES currently have a way to do this sort of filtering? Or, has anyone 
developed their own tools that they'd be willing to share?

Thanks,
Erin


RE: gold standard annotations for cTAKES [SUSPICIOUS]

2017-01-31 Thread Savova, Guergana
Thank you, Sean!

Yes, absolutely -- we welcome volunteers for the gold annotations!
Regards,
--Guergana

-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Tuesday, January 31, 2017 4:08 PM
To: dev@ctakes.apache.org
Subject: RE: gold standard annotations for cTAKES [SUSPICIOUS]

Hi all,

I just have a couple of notes to expand upon what Guergana wrote.

Anafora requires a schema for annotation and it requires text files to be in a 
certain structure.  I just checked in the text files for annotation and the 
schema that we plan to use in ctakes-examples-res, under 
src/main/resources/org/apache/ctakes/examples/annotation/.

Everybody is obviously welcome to use the schema and notes, or to create 
annotations using another tool for all to share.

As a disclaimer ... Anafora is not associated with cTAKES.   My opinion is that 
the cTAKES dev list should not be over-used for Anafora Q&A.  

Thanks,
Sean

-Original Message-
From: Savova, Guergana [mailto:guergana.sav...@childrens.harvard.edu] 
Sent: Tuesday, January 31, 2017 3:42 PM
To: dev@ctakes.apache.org
Subject: gold standard annotations for cTAKES [SUSPICIOUS]

A while ago our physician colleague John Green created 16 realistic-looking 
(but fake) clinical notes. Many thanks again, John!

These notes are in ctakes-examples/data/notes. We now volunteer to annotate 
them with gold annotations. The main elements with their attributes are:
Medications, Attributes ::= span associatedCode change_status_model 
conditional dosage_model duration_model end_date form_model frequency_model 
generic negation_indicator route_model start_date strength_model subject 
uncertainty_indicator

Signs/Symptoms, Attributes ::= associated_code body_location conditional course 
duration end_time generic historyOf negation_indicator 
relative_temporal_context severity start_time subject uncertainty_indicator

Anatomical Sites, Attributes ::= associatedCode conditional generic 
negation_indicator subject uncertainty_indicator

Disease/Disorders, Attributes ::= associated_code body_location conditional 
course duration end_time generic historyOf negation_indicator 
relative_temporal_context severity start_time subject uncertainty_indicator

Procedures, Attributes ::= associated_code body_location conditional duration 
end_time generic historyOf method negation_indicator relative_temporal_context 
start_time subject uncertainty_indicator



We expect to have the gold annotations by end of March. We are using the 
Anafora annotation tool (https://github.com/weitechen/anafora) and will 
release the annotations in XML format.



Regards,

--Guergana






gold standard annotations for cTAKES

2017-01-31 Thread Savova, Guergana
A while ago our physician colleague John Green created 16 realistic-looking 
(but fake) clinical notes. Many thanks again, John!

These notes are in ctakes-examples/data/notes. We now volunteer to annotate 
them with gold annotations. The main elements with their attributes are:
Medications, Attributes ::= span associatedCode change_status_model 
conditional dosage_model duration_model end_date form_model frequency_model 
generic negation_indicator route_model start_date strength_model subject 
uncertainty_indicator

Signs/Symptoms, Attributes ::= associated_code body_location conditional course 
duration end_time generic historyOf negation_indicator 
relative_temporal_context severity start_time subject uncertainty_indicator

Anatomical Sites, Attributes ::= associatedCode conditional generic 
negation_indicator subject uncertainty_indicator

Disease/Disorders, Attributes ::= associated_code body_location conditional 
course duration end_time generic historyOf negation_indicator 
relative_temporal_context severity start_time subject uncertainty_indicator

Procedures, Attributes ::= associated_code body_location conditional duration 
end_time generic historyOf method negation_indicator relative_temporal_context 
start_time subject uncertainty_indicator



We expect to have the gold annotations by end of March. We are using the 
Anafora annotation tool (https://github.com/weitechen/anafora) and will 
release the annotations in XML format.
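
For those planning to consume the annotations, reading Anafora-style XML could look roughly like the Python sketch below. It assumes the layout Anafora commonly writes: a data/annotations tree of entity elements, each with id, span ("start,end", with discontinuous pieces separated by ";"), type and a properties block. The sample document, the type name Disease_Disorder and the property values are made up for illustration; the actual names will come from the schema checked in with the notes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real files will use the names from the released schema.
SAMPLE = """<data>
  <annotations>
    <entity>
      <id>1@e@note_01@gold</id>
      <span>10,22</span>
      <type>Disease_Disorder</type>
      <parentsType>Entities</parentsType>
      <properties>
        <negation_indicator>not_negated</negation_indicator>
        <associated_code>C9000001</associated_code>
      </properties>
    </entity>
  </annotations>
</data>"""


def read_entities(xml_text):
    """Return one dict per entity with its id, type, spans and properties."""
    root = ET.fromstring(xml_text)
    entities = []
    for entity in root.iter("entity"):
        spans = []
        for piece in (entity.findtext("span") or "").split(";"):
            if piece.strip():
                start, end = piece.split(",")
                spans.append((int(start), int(end)))
        properties = {}
        props_node = entity.find("properties")
        if props_node is not None:
            for child in props_node:
                properties[child.tag] = (child.text or "").strip()
        entities.append({
            "id": entity.findtext("id"),
            "type": entity.findtext("type"),
            "spans": spans,
            "properties": properties,
        })
    return entities
```

The spans are character offsets into the note text, so note_text[start:end] recovers the annotated mention for comparison against cTAKES output.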



Regards,

--Guergana






RE: New to CTAKES [SUSPICIOUS]

2017-01-17 Thread Savova, Guergana
As Sean mentioned, we, the NLP lab at Boston Children's Hospital/Harvard 
Medical School, will be dedicating significant effort (=FTE) over the next 
several months to make a solid release happen ASAP. We expect the release 
within 3 months. As Sean mentioned, help from the broader cTAKES community is 
welcome.
Thank you!
--Guergana






-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Tuesday, January 17, 2017 12:54 PM
To: dev@ctakes.apache.org
Subject: RE: New to CTAKES [SUSPICIOUS]

> 3.2.3 is still considered a snapshot - do you have any feeling when/if it 
> will be released?

Good question.

One person has volunteered to be a release manager and we at the Boston 
Children's Hospital nlp group are trying to get some additional hands on the 
task.  There are still outstanding bugs.  Ctakes-core is undergoing some 
changes and should be tested before release.  I think that the state is good, 
but in my opinion the whole app needs to have some end-to-end testing before a 
major release.  At a recent hackathon, 50% of those present could not by 
themselves get ctakes installed and running even with written instructions.  In 
my mind a release that is not usable is not a release at all, so I think that 
we need to devote a little effort to usability.  Again, that is just my 
opinion.  I have put in time over the past few months to work on making ctakes 
easier for newcomers and non-developers.  As you noticed, a fair amount of 
online documentation is stale, and it would be great if people volunteered to 
update it before a release.  After that there are just the matters of updating 
the main website links, publicizing the release, release notes, and a parade 
with balloons.

I think that everybody out there would be happy if there was a new official, 
stable and usable release.  I also think that we can get one of good quality 
together within the next 3 months - more quickly if there are volunteers from 
the community.

Sean


-Original Message-
From: Dunlop, Joyce (HP) [mailto:joyce.dun...@va.gov] 
Sent: Tuesday, January 17, 2017 12:32 PM
To: dev@ctakes.apache.org
Subject: RE: New to CTAKES 

Thanks Sean,

3.2.3 is still considered a snapshot - do you have any feeling when/if it will 
be released?

Thanks,
Joyce

-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Tuesday, January 17, 2017 10:57 AM
To: dev@ctakes.apache.org
Subject: [EXTERNAL] RE: New to CTAKES 

Hi Joyce,

If you are building from source then you should not need to manually download 
the resources.  Maven should be doing it for you.  Well, that is the behavior 
of 3.2.3 ... I honestly cannot remember what 3.2.2 did ...

Otherwise, I think that if the latest was the 3.2.1.1 then that is probably the 
most appropriate for the 3.2.2 release if you want all of the resources.

As for building and deploying ytex, I don't have any advice.  Perhaps some ytex 
power-user out there can help.

Sean

-Original Message-
From: Dunlop, Joyce (HP) [mailto:joyce.dun...@va.gov] 
Sent: Tuesday, January 17, 2017 11:25 AM
To: dev@ctakes.apache.org
Cc: Dorner, Andrew J. (PSI); Rustrian, Armando (Liberty ITS)
Subject: New to CTAKES 

Good Morning,

I am trying to set up a development environment using the source release of 
3.2.2.

Reading through the documentation on

https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+Developer+Install+Guide .

Merge the version-matching resources ZIP file from 
http://sourceforge.net/projects/ctakesresources/files/
into your ctakes-dictionary-lookup-res project.

ctakes-resources-3.2.1.1-bin.zip is available for download.  Is there a 3.2.2 version of the resources?

After reading some of the posts from 

RE: Best combination of analysis engines to consider negation, family history, uncertainty, etc.

2016-10-19 Thread Savova, Guergana
Hi Yiming,
Re your question about gold standard datasets. In parallel with releasing best 
performing methods in cTAKES, we have generated several gold standard datasets. 
Our plan is to start distributing them through a unified effort -- a health NLP 
Center. See attached exec summary. We hope to have the Center running in the 
very near future.

Cheers,
--Guergana

-Original Message-
From: Zuo Yiming [mailto:yiming...@gmail.com] 
Sent: Wednesday, October 19, 2016 12:22 PM
To: dev@ctakes.apache.org
Subject: Re: Best combination of analysis engines to consider negation, family 
history, uncertainty, etc.

Hi Sean and Timothy,

Thanks for your clarification about ClearTK tools. I'm amazed by the power of 
cTAKES and the resources and community you guys have worked to build. I will 
certainly be happy to provide more feedback as my project moves on.

For Timothy,

By rule-based system, do you refer to the assertion annotator? How about the 
old negation annotator and the status annotator, are they also rule-based 
systems? I got a feeling that the assertion annotator and ClearTK system are 
more favored than the negation annotator and the status annotator for some 
reason in cTAKES right now.

Regarding the ClearTK system on my test files, the negation, history and 
uncertainty modules work just as well as the assertion annotator. My test files 
are only a few, so it's really hard to tell which one is better. The main 
difference comes when detecting subject and generic property. On my limited 
test files, the ClearTK system doesn't work at all. It will assign patient as 
the subject for all detected phrases when it's the patient's family member who 
has diabetes. The same problem applies to the generic property: the ClearTK 
system assigns false as the generic property for all detected phrases. The 
paper mentioned by you and Sean seems interesting, I will take a look later.

As for further questions, can you guys give me some suggestions on where to 
find public gold standard datasets so I can actually conduct some independent 
evaluation of cTAKES by metrics like precision/recall and F1 score?
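
For readers less familiar with these metrics, here is a minimal sketch of how 
precision, recall and F1 are computed from counts of true positives, false 
positives and false negatives. It is not part of cTAKES, and the counts are 
made up for illustration only. They are chosen to show how two systems with 
swapped precision and recall can end up with the same F1 score.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) computed from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, not taken from any real evaluation.
system_a = precision_recall_f1(tp=80, fp=20, fn=10)  # higher recall
system_b = precision_recall_f1(tp=80, fp=10, fn=20)  # higher precision
```

Which of the two systems is preferable then depends on whether missed mentions 
or spurious mentions are more costly for the task at hand.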

Finally, a minor suggestion from the user perspective would be to add the 
preferred words property to the AggregatePlaintextUMLSProcessor. As I pointed 
out briefly in my first email, using AggregatePlaintextFastUMLSProcessor we can 
get the preferred words for detected phrases, but not with 
AggregatePlaintextUMLSProcessor. This is very helpful when the detected phrases 
are acronyms such as pt for patient. From my experience, 
AggregatePlaintextUMLSProcessor tends to detect more clinically relevant 
phrases compared with AggregatePlaintextFastUMLSProcessor. It would be really 
nice if we could have the same preferred words property in 
AggregatePlaintextUMLSProcessor in a future cTAKES release.

Best,
Yiming

On Wed, Oct 19, 2016 at 11:11 AM, Miller, Timothy < 
timothy.mil...@childrens.harvard.edu> wrote:

> I can second Sean's thank you, it is good to have this feedback. The 
> ClearTK machine learning models were made the default after we ran 
> some experiments that found it performed better across a range of 
> standard datasets than rule-based algorithms or the existing cTAKES 
> module (
> http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0112774 ).
> Since making them the default, though, we have heard from people and 
> had our own experience conflict with those experiments. And certainly 
> the errors in the rule-based system are easier to understand.
>
> Just curious, are you able to characterize the errors you see from the 
> ClearTK system? I did some experiments recently on a new dataset 
> comparing negex with the cleartk negation module and found that there 
> was a precision/recall tradeoff but almost identical F1 scores. But 
> for that dataset the tradeoff negex provided was preferred by our 
> collaborators. (I think negex had better recall of negated terms but worse 
> precision).
>
> Tim
>
>
>
> 
> From: Finan, Sean 
> Sent: Wednesday, October 19, 2016 10:53 AM
> To: dev@ctakes.apache.org
> Subject: RE: Best combination of analysis engines to consider 
> negation, family history, uncertainty, etc.
>
> Hi Yiming,
>
>
>
> Thank you very much for letting the community know what has and has 
> not worked for you.  I have also had better results with the Assertion 
> annotators than the ClearTk alternatives, but that could be because of 
> the note types/formats that I am using.
>
>
>
> Regarding the "Clear" in names, it is because ClearTk (Clear ToolKit) 
> is used to train machine learning models for detection of the 
> indicated property.  You can find information on ClearTk starting 

RE: cTAKES false positives, case-insensitivity

2016-06-01 Thread Savova, Guergana
This is the very interesting topic of Word Sense Disambiguation. Currently 
there are no generalizable large scale solutions for it... One can in a way 
"hack" it if the domain is constrained, e.g. if your extraction focuses on use 
of hearing aids, you can have a rule that says if hearing in proximity of 
aid/aids, then tag it with the code for a hearing aid and remove all other 
ontology mappings.
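
A rough sketch of what such a constrained-domain rule could look like as a 
post-processing step. This is plain Python over a made-up mention layout (dicts 
with begin, end and cui), not the cTAKES type system, and the hearing aid code 
is passed in by the caller because the right code depends on the terminology 
being used.

```python
import re


def apply_hearing_aid_rule(text, mentions, hearing_aid_code):
    """Retag 'aid'/'aids' directly preceded by 'hearing' as a hearing aid.

    mentions: list of dicts with 'begin', 'end', 'cui' (hypothetical layout).
    All mappings on such a span collapse into one hearing aid mention.
    """
    fixed = []
    for m in mentions:
        covered = text[m["begin"]:m["end"]].lower()
        before = re.search(r"\bhearing\s+$", text[:m["begin"]], re.IGNORECASE)
        if covered in ("aid", "aids") and before:
            # One mapping for the whole phrase "hearing aid(s)".
            new = {"begin": before.start(), "end": m["end"],
                   "cui": hearing_aid_code}
            if new not in fixed:
                fixed.append(new)
        else:
            fixed.append(m)
    return fixed


text = "Pt uses hearing aids."
mentions = [{"begin": 16, "end": 20, "cui": "C0001175"}]
result = apply_hearing_aid_rule(text, mentions, "HEARING_AID_CODE")
```

The rule only looks at the word immediately before the mention. A wider window 
would catch more phrasings but also risks more accidental matches.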

In general, the topic makes an excellent candidate for a PhD thesis work.

Hope this helps.
--Guergana

Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Boston Children's Hospital and Harvard Medical School
300 Longwood Avenue
Mailstop: BCH3092
Enders 144.1
Boston, MA 02115
Tel: (617) 919-2972
Fax: (617) 730-0817
Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova/biocv


-Original Message-
From: Tomasz Oliwa [mailto:ol...@uchicago.edu] 
Sent: Wednesday, June 1, 2016 11:28 AM
To: dev@ctakes.apache.org
Subject: cTAKES false positives, case-insensitivity

Hi,

I have encountered false positives annotated with cTAKES that seem to come from 
case-insensitivity of the annotation lookup, such as:

Pt uses hearing aids. -> "aids" is found as DiseaseDisorderMention 
cui=C0001175, Acquired Immunodeficiency Syndrome

Pt values are all stable. -> "all" is found as DiseaseDisorderMention 
cui=C1961102, Precursor Cell Lymphoblastic Leukemia Lymphoma

Are there ways in cTAKES to approach or to resolve such issues?

How do you deal with such false positives, so that they are not matched?

Regards,
Tomasz


RE: cTAKES scale-out with DUCC and Shangridocs

2016-03-14 Thread Savova, Guergana
WOW, this is fantastic, Chris! Thank you so very much!! We will start using the 
DUCC implementation.
Cheers,
--Guergana

-Original Message-
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] 
Sent: Monday, March 14, 2016 2:58 PM
To: dev@ctakes.apache.org
Subject: cTAKES scale-out with DUCC and Shangridocs

Hi Team,

Just wanted to let you know that my team has completed a deployment
of cTAKES Scale-out with DUCC. Thanks to a number of contributors
in particular Yi-Wen my Directed Research student at USC, and all
the help she has received on this list and on the UIMA list. Thanks
much.

We also have an app, Shangridocs, that we are building on top of
this scale-out. You can find it here:

http://github.com/chrismattmann/shangridocs.git

We are actively working on making it more robust and scalable. Feedback
is always welcomed. Apache cTAKES is at the core and is awesome.

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++

RE: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives

2016-03-10 Thread Savova, Guergana
You can re-build the models that feed into MIST. I personally would not use the 
default model that MIST comes with as it is not trained on clinical data. In 
our previous work we found that hand-annotating about 200 docs for PHI 
(representative of the sample you are going to run the models on) results in 
building a pretty good model - in the 90's for p, r and f1. However, even with 
that high performance, the institution that owns the data might be still 
reluctant to share as it might pose a violation of HIPAA through some potential 
PHI leaks. In cTAKES our approach has been to de-couple the de-identification 
from the NLP/information extraction. If a user has the need for de-identified 
data, they could choose their method -- manual or otherwise -- and then process 
through cTAKES. Our focus is the NLP/IE space, while de-identification is a 
blend of that plus policy.

--Guergana

-Original Message-
From: Azad Dehghan [mailto:azad.dehg...@gmail.com] 
Sent: Thursday, March 10, 2016 4:19 PM
To: dev@ctakes.apache.org
Subject: RE: Combining Knowledge- and Data-driven Methods for De-identification 
of Clinical Narratives

Thanks Guergana.

> Yes, the current release of cTAKES has a module for the temporal
> expressions which includes dates. The normalizer for the temporal
> expressions is Steven Bethard's timenorm code.

Great.

> However, if you do de-identification of dates/temporal expressions, you
> run the risk of creating incorrect timelines as many of the relative
> temporal expressions (e.g. spring of this year, x-mas time, etc.) are
> unlikely to be correctly shifted by any de-identification tool.

Indeed, a reason I have not included the dates component.

> One de-identification tool is MIST -- http://mist-deid.sourceforge.net/ .

I don't remember them doing well in the community-held evaluation in 2014.
Hence, cDeid :)
>
> Guergana Savova, PhD, FACMI
> Associate Professor
> PI Natural Language Processing Lab
> Boston Children's Hospital and Harvard Medical School
> 300 Longwood Avenue
> Mailstop: BCH3092
> Enders 144.1
> Boston, MA 02115
> Tel: (617) 919-2972
> Fax: (617) 730-0817
> Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova/biocv
>
> -Original Message-
> From: Azad Dehghan [mailto:azad.dehg...@gmail.com]
> Sent: Thursday, March 10, 2016 3:42 PM
> To: dev@ctakes.apache.org
> Subject: Re: Combining Knowledge- and Data-driven Methods for
De-identification of Clinical Narratives
>
> > This means both training data folders? I have access to the data but 
> > not
> to the challenge description.
>
> Yes. Is there any specific information that you are missing?
> >
> >
> >> It would be good to incorporate/refactor (basically, GATE API needs 
> >> to be replaced with UIMA API to generate annotation) the two-pass 
> >> recognition method for cTAKES - which has a wider application on
longitudinal data.
> >> This method is used on-top of a number NERs.
> >
> >
> > I'll take a look.
> >
> > I do not know how much time I can invest this month. Let's see how 
> > many
> phases I can translate.
> >
> > I added the rules for age. Are there jape rules for creating date
> annotations?
> >
>
> No. I believe cTAKES has existing component(s) to capture dates?
>
> > After all rules are translated, they need some major refactoring. 
> > Jape
> and Ruta are quite different in some aspects.
> >
> Ok.
>
> >
> >
> >
> >
> >
> >> Please let me know where I can help. I will be available again in
April.
> >>
> >> Cheers,
> >> Azad
> >>
> >> On 10 March 2016 at 13:13, Peter Klügl 
wrote:
> >>
> >>> Hi,
> >>>
> >>> sorry, I was quite busy last month.
> >>>
> >>> I added a new patch, which needs to be applied.
> >>>
> >>> No new rules, but it's possible now to evaluate everything against 
> >>> the labelled data of the challenge.
> >>>
> >>> @Azad:
> >>> Which documents exactly did you use to develop the rules?
> >>> training-PHI-Gold-Set1, training-PHI-Gold-Set2 or
> testing-PHI-Gold-fixed?
> >>>
> >>> Best,
> >>>
> >>> Peter
> >>>
> >>> Am 03.02.2016 um 09:05 schrieb Peter Klügl:
> 
>  Hi,
> 
>  the last patch fixed almost all problems.
> 
>  I added another one that adds the csv file for the unit test and
> extends
>  svn-ignore.
> 
>  Best,
> 
>  Peter
> 
>  Am 02.02.2016 um 09:16 schrieb Peter Klügl:
> >
> > Hi,
> >
> > I added another patch. I missed to manually add one test 

RE: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives

2016-03-10 Thread Savova, Guergana
Yes, the current release of cTAKES has a module for the temporal expressions 
which includes dates. The normalizer for the temporal expressions is Steven 
Bethard's timenorm code.

However, if you do de-identification of dates/temporal expressions, you run the 
risk of creating incorrect timelines as many of the relative temporal 
expressions (e.g. spring of this year, x-mas time, etc.) are unlikely to be 
correctly shifted by any de-identification tool.
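
To make the risk concrete, here is a small sketch of a naive date shifter. It 
is an illustration only and not how MIST or any other de-identification tool 
works. The MM/DD/YYYY pattern, the function name and the sample note are all 
assumptions. It shifts explicit dates by a fixed number of days and leaves 
everything else untouched, so a relative expression in the same note ends up 
contradicting the shifted date.

```python
import re
from datetime import datetime, timedelta

# Only explicit MM/DD/YYYY dates are recognized (an assumption of this sketch).
DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def shift_absolute_dates(text, days):
    """Shift MM/DD/YYYY dates by `days`; all other text is left as-is."""
    def _shift(match):
        month, day, year = (int(g) for g in match.groups())
        try:
            shifted = datetime(year, month, day) + timedelta(days=days)
        except ValueError:
            return match.group(0)  # not a valid calendar date, keep it
        return shifted.strftime("%m/%d/%Y")
    return DATE_PATTERN.sub(_shift, text)


note = "Seen on 03/10/2016. Symptoms began in the spring of this year."
shifted_note = shift_absolute_dates(note, days=120)
```

After the shift the visit date falls in July, while the note still says the 
symptoms began "in the spring of this year", so a timeline built from the 
shifted note may no longer match the original one.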

One de-identification tool is MIST -- http://mist-deid.sourceforge.net/ . 

Hope this helps with the de-identification items
--Guergana

Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Boston Children's Hospital and Harvard Medical School
300 Longwood Avenue
Mailstop: BCH3092
Enders 144.1
Boston, MA 02115
Tel: (617) 919-2972
Fax: (617) 730-0817
Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova/biocv

-Original Message-
From: Azad Dehghan [mailto:azad.dehg...@gmail.com] 
Sent: Thursday, March 10, 2016 3:42 PM
To: dev@ctakes.apache.org
Subject: Re: Combining Knowledge- and Data-driven Methods for De-identification 
of Clinical Narratives

> This means both training data folders? I have access to the data but not
> to the challenge description.

Yes. Is there any specific information that you are missing?
>
>
>> It would be good to incorporate/refactor (basically, GATE API needs 
>> to be replaced with UIMA API to generate annotation) the two-pass 
>> recognition method for cTAKES - which has a wider application on 
>> longitudinal data.
>> This method is used on top of a number of NERs.
>
>
> I'll take a look.
>
> I do not know how much time I can invest this month. Let's see how many
> phases I can translate.
>
> I added the rules for age. Are there jape rules for creating date
> annotations?
>

No. I believe cTAKES has existing component(s) to capture dates?

> After all rules are translated, they need some major refactoring. Jape
> and Ruta are quite different in some aspects.
>
Ok.

>
>
>
>
>
>> Please let me know where I can help. I will be available again in April.
>>
>> Cheers,
>> Azad
>>
>> On 10 March 2016 at 13:13, Peter Klügl  wrote:
>>
>>> Hi,
>>>
>>> sorry, I was quite busy last month.
>>>
>>> I added a new patch, which needs to be applied.
>>>
>>> No new rules, but it's possible now to evaluate everything against 
>>> the labelled data of the challenge.
>>>
>>> @Azad:
>>> Which documents exactly did you use to develop the rules?
>>> training-PHI-Gold-Set1, training-PHI-Gold-Set2 or
testing-PHI-Gold-fixed?
>>>
>>> Best,
>>>
>>> Peter
>>>
>>> Am 03.02.2016 um 09:05 schrieb Peter Klügl:

 Hi,

 the last patch fixed almost all problems.

 I added another one that adds the csv file for the unit test and
extends
 svn-ignore.

 Best,

 Peter

 Am 02.02.2016 um 09:16 schrieb Peter Klügl:
>
> Hi,
>
> I added another patch. I forgot to manually add one test file to version
> control, and there are still duplicate lines.
> I hope this patch fixes the remaining problems.
>
> Best,
>
> Peter
>
>
> Am 29.01.2016 um 10:34 schrieb Peter Klügl:
>>
>> Hi,
>>
>> the problems were caused by the svn client in my Eclipse. Sorry for the
>> trouble, I should have looked more closely at the complete patch.
>>
>> I attached a new patch created with command-line tools which looks correct
>> now.
>>
>> Pei, can you apply the new patch?
>>
>> Best,
>>
>> Peter
>>
>> Am 28.01.2016 um 15:57 schrieb Peter Klügl:
>>>
>>> Thanks Pei.
>>>
>>> I fear there was again a problem with the patch. All new files 
>>> are missing (and also the svn-ignore settings).
>>>
>>> Can you take a look?
>>>
>>> Best,
>>>
>>> Peter
>>>
>>> Am 28.01.2016 um 14:43 schrieb Pei Chen:

 patch applied.
 Thanks,
 Pei

 On Thu, Jan 28, 2016 at 4:14 AM, Peter Klügl <
>>>
>>> peter.klu...@averbis.com> wrote:
>
> Hi Pei,
>
> can you commit the recent patch for us?
>
> CTAKES-384-20160120.patch
>
> Best,
>
> Peter
>
> Am 20.01.2016 um 19:35 schrieb Pei Chen:
>>
>> Hi,
>> Sorry I was swamped recently.
>> But yeah, we can even create an extended type system to store
>>>
>>> these items temporarily and add them into the main/core type system 
>>> afterwards.
>>
>> There was an existing item to upgrade UIMA, but agreed- it 
>> will
>>>
>>> require much more testing.  If it works, we can upgrade it in our
sandbox
>>> area or create a branch if necessary.
>>
>> —Pei
>>
>>> On Jan 18, 2016, at 9:06 AM, Peter Klügl <
>>>
>>> peter.klu...@averbis.com> wrote:
>>>

RE: Contributing to documentation

2016-02-10 Thread Savova, Guergana
Hi Jessica,
Thank you very much for offering to contribute to the documentation! Indeed 
this is our weak link and any help there will be greatly appreciated.
A warm welcome to the community!
--Guergana


-Original Message-
From: Pei Chen [mailto:chen...@apache.org] 
Sent: Wednesday, February 10, 2016 1:41 PM
To: dev@ctakes.apache.org
Subject: Re: Contributing to documentation

We've been generally following the C-T-R model [1].
But feel free to discuss on dev@ whenever in doubt...

[1] http://www.apache.org/foundation/glossary.html#CommitThenReview


On Wed, Feb 10, 2016 at 1:31 PM, Jessica Glover  
wrote:
> Thank you. I'm excited to contribute.
>
> Is there a process by which my contributions should get "voted in" or 
> am I free to just start editing?
>
> - Jessica
>
> On Feb 10, 2016 9:28 AM, "Pei Chen"  wrote:
>
>> User Jessica Glover (jgloves) Added to:
>> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES
>> Enjoy!
>> —Pei
>>
>> On Feb 10, 2016, at 8:43 AM, Jessica Glover 
>> 
>> wrote:
>>
>> Hi Pei,
>> I'm not sure what my confluence ID is. I log in with this email 
>> address, and I can be found under Jessica Glover in a People search.
>>
>> - Jessica
>> This would be great.  What is your confluence id (anyone should be 
>> able to create an account)?
>> --Pei
>>
>> On Tue, Feb 9, 2016 at 7:49 AM, Jessica Glover 
>>  wrote:
>>
>> Hello,
>>
>> I am a cTAKES user, but I am interested in development and especially 
>> interested in contributing to the documentation. I have some ideas 
>> for making the component use guides more user-friendly for first-time 
>> UIMAers, but I'm also eager to hear what the dev community would like 
>> to see. I am happy to write as well as create diagrams.
>>
>> Thanks,
>>
>> Jessica Glover
>>
>>
>>


RE: Clinical Element Model normalization component

2016-02-09 Thread Savova, Guergana
Hi Phuc,
Did you create a dictionary to run your pipeline with?
Did you check under IdentifiedAnnotations whether there are annotations of type 
Drugs?
--Guergana


-Original Message-
From: Phan Hồng Phúc [mailto:phanhongphu...@gmail.com] 
Sent: Tuesday, February 9, 2016 2:19 PM
To: dev@ctakes.apache.org
Subject: Re: Clinical Element Model normalization component

Hi all,

I tried to run the sample to get the drug name.
Let me describe my steps:
- runctakesCVD.bat
- Load AE:
apache-ctakes-3.2.2\desc\ctakes-clinical-pipeline\desc\analysis_engine\AggregatePlaintextFastUMLSProcessor
- Add sample text: "He is taking 500mg of tylenol, twice a day"
- After the run, I see a list of results, but all types
org.apache.ctakes.drugner.type.* are [0]

I think it should contain at least the drug name "tylenol" in the result.

Am I missing something?

Thank you,
- Phúc


RE: 2007 CMC Challenge data set

2016-01-28 Thread Savova, Guergana
Including Dr. John Pestian who led the creation of the dataset.
--Guergana

Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Boston Children's Hospital and Harvard Medical School
300 Longwood Avenue
Mailstop: BCH3092
Enders 144.1
Boston, MA 02115
Tel: (617) 919-2972
Fax: (617) 730-0817
Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova/biocv
ctakes.apache.org
thyme.healthnlp.org
cancer.healthnlp.org
share.healthnlp.org

-Original Message-
From: John Mongan [mailto:john.mon...@ucsf.edu] 
Sent: Wednesday, January 27, 2016 6:25 PM
To: dev@ctakes.apache.org
Subject: 2007 CMC Challenge data set

Does anyone have a copy of the 2007 Computational Medicine Center Challenge 
data?

The website for the Computational Medicine Center seems to have disappeared and 
I can't find a place to download the data anymore.

Thanks,

John



RE: ctakes with icd10

2015-12-08 Thread Savova, Guergana
Hi Alaa,
You need to create a resource off the terminology/ontology you want to use (in 
this case ICD9 or ICD10). Then run that resource with cTAKES for the fast 
dictionary lookup. There is cTAKES code and some documentation on how to create 
that resource. By default, cTAKES runs with a resource created from the English 
version of SNOMED CT and RxNORM.
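
As a rough illustration of the first step only (pulling the ICD terms out of a 
UMLS download), here is a sketch that filters the pipe-delimited MRCONSO.RRF 
file by source vocabulary. It is not the cTAKES dictionary creation code. The 
column positions follow the MRCONSO layout in the UMLS documentation, but the 
source abbreviations should be checked against your UMLS release, and the 
(CUI, code, text) tuples are a made-up intermediate form, not the format the 
cTAKES tooling expects.

```python
# Column positions in MRCONSO.RRF (pipe-delimited).
CUI, LAT, SAB, CODE, STR = 0, 1, 11, 13, 14


def extract_terms(mrconso_lines, sources=("ICD10CM", "ICD9CM"), language="ENG"):
    """Collect (cui, code, lowercased term) for the given source vocabularies."""
    rows = []
    for line in mrconso_lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 15:
            continue  # skip malformed or truncated lines
        if fields[LAT] == language and fields[SAB] in sources:
            rows.append((fields[CUI], fields[CODE], fields[STR].lower()))
    return rows


def extract_terms_from_file(path, **kwargs):
    """Same as extract_terms, reading lines from an MRCONSO.RRF file."""
    with open(path, encoding="utf-8") as handle:
        return extract_terms(handle, **kwargs)
```

The resulting terms would still need to be turned into a dictionary resource 
that the fast dictionary lookup can read, following the cTAKES documentation.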
Hope this helps.
--Guergana

-Original Message-
From: Alaa al Barari [mailto:alaa.albar...@gmail.com] 
Sent: Tuesday, December 8, 2015 10:01 AM
To: dev@ctakes.apache.org
Subject: ctakes with icd10

Hi,

I downloaded the latest UMLS version, and I want to know how to make cTAKES 
work with ICD10 and ICD9.


Thanks


RE: End to end ctakes example app

2015-09-18 Thread Savova, Guergana
Jay,
Do you mean something like this:
http://52.26.219.218:8080/index.jsp

--guergana

-Original Message-
From: Jay Vyas [mailto:jayunit100.apa...@gmail.com] 
Sent: Friday, September 18, 2015 10:43 AM
To: dev@ctakes.apache.org
Subject: End to end ctakes example app

Is there an end-to-end cTAKES app that can be used to demonstrate all the core 
portions of it?

We built the bigpetstore app in bigtop to do this. And we have a data generator.

Maybe it is time to do something similar for cTAKES.


RE: Including gene mappings in UMLS

2015-09-01 Thread Savova, Guergana
Hi Chris,
We have not focused on the gene mappings because gene mentions were not very 
frequent in the clinical narrative until several years ago. However, as 
genes/mutations and proteins have become actionable items in both diagnosing 
and treatment, we do have plans to add a module for gene/mutations/protein 
mentions in the very near future. We are likely to start with cancer since that 
domain offers most actionable information and is consistently recorded in 
pathology notes. Contributions are more than welcome.

Is this helping? 
Cheers,
--Guergana

-Original Message-
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] 
Sent: Tuesday, September 1, 2015 7:16 PM
To: dev@ctakes.apache.org
Subject: Including gene mappings in UMLS

Hey cTAKES community,

Is there any particular reason that you guys didn’t do the mappings
for genes (which exists in UMLS) - but doesn’t seem to exist as
cTAKES concept identifiers (CIDs)?

Are there any plans to include this in a future release of Apache
cTAKES? We were just wondering.

Thanks all!

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++

RE: TimeLanes

2015-06-22 Thread Savova, Guergana
The cTAKES temporal component is in the main release. You can get the system 
output, but as Sean said TimeLanes does not consume it yet.

A demo of the cTAKES temporal component can be found in Getting Started - 
Demos. Pei just put it up there, thank you very much, Pei!
--Guergana


-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Monday, June 22, 2015 11:36 AM
To: dev@ctakes.apache.org
Subject: RE: TimeLanes

Hi Maashu,

TimeLanes is currently a prototype gui under development and there is probably 
no information about it on the web.  It is in sandbox because it isn't part of 
the ctakes release and is missing much needed functionality.  For instance, it 
should display basic information about the patient and note (name, birth date, 
note date), but such things are often in structured data or some custom header 
of the note.  Right now TimeLanes does not fetch them at all (it will require 
custom readers) and just displays "Dan Testing".

If you want to run it, the main class is 
org.chboston.cnlp.timeline.gui.main.TimelineMain .  Upon startup it will 
display "open a note".  You can use the Open button or drag a file into the 
box.  Unfortunately, it does not yet run ctakes (coming soon), so you need to 
give it an annotated (protégé or Anafora) note or .xmi .  Using an .xmi would 
probably be easiest as you can create it with ctakes.  You can watch an 
outdated video here:  

https://www.youtube.com/watch?v=Kp9YE0o3urU&feature=youtu.be

Sean



-Original Message-
From: maa...@gmail.com [mailto:maa...@gmail.com] 
Sent: Friday, June 12, 2015 1:18 PM
To: dev@ctakes.apache.org
Subject: TimeLanes

Hi All,

I've just started working with cTAKES and was curious about TimeLanes.  I found 
it in the sandbox here:

https://svn.apache.org/repos/asf/ctakes/sandbox/timelanes/

But I'm lost on how to actually use it.  I've googled around but there seems to 
be very little information on it.

Can anyone point me in the right direction?

Thanks in advance!

Cheers,
-Maashu

--
If you are immune to boredom, there is literally nothing you cannot 
accomplish.
-David Foster Wallace



RE: Exploiting the power of cTakes, using OpenNLP only

2015-05-22 Thread Savova, Guergana
Yes, you are correct. cTAKES does named entity recognition and normalization, 
that is, mapping each mention to a concept in an ontology (through the UMLS). 
The normalization part is what is different from what is usually done in the 
general domain (where mentions of several semantic types are discovered but not 
necessarily normalized to a concept within an ontology). In the general domain, 
there is a recent trend to normalize to Wikipedia (wikification).
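
To illustrate the difference, here is a toy sketch of dictionary-based 
recognition plus normalization: the text is tokenized, the longest matching 
term is looked up, and each hit is returned with its concept identifier. This 
is not the cTAKES lookup code. The function and the one-entry table are made 
up for illustration, and a real dictionary would be built from the UMLS.

```python
import re

# Toy term table: token tuple -> concept unique identifier (CUI).
TOY_DICTIONARY = {
    ("acquired", "immunodeficiency", "syndrome"): "C0001175",
}


def find_and_normalize(text, dictionary=TOY_DICTIONARY, max_len=4):
    """Return (begin, end, cui) for the longest dictionary matches in text."""
    tokens = [(m.group().lower(), m.start(), m.end())
              for m in re.finditer(r"\w+", text)]
    results = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t[0] for t in tokens[i:i + n])
            if key in dictionary:
                results.append((tokens[i][1], tokens[i + n - 1][2],
                                dictionary[key]))
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return results
```

A general-domain NER model would stop at "this span is a disease". The extra 
step here is attaching the concept identifier to the span.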

In short, to do the NER in cTAKES you do need a license for the UMLS. BTW, that 
license is free for level 0 vocabularies.

Hope this information helps.
--Guergana

-Original Message-
From: Damir Olejar [mailto:olejar.da...@gmail.com] 
Sent: Friday, May 22, 2015 7:51 AM
To: dev@ctakes.apache.org
Subject: Re: Exploiting the power of cTakes, using OpenNLP only

To answer my own question, it all comes down to UMLS licensing, and which files 
are being downloaded from the server.
The files that are downloaded are compressed *.model files that can be 
integrated with cTakes.
However, there is (or might be in the near future) a restriction to which user 
can download which files, and also, there might be a copyright issue if the 
UMLS procedure is not followed.

So, yes, there is no need for UIMA, but then, for any serious work, the 
copyrights need to be respected.


On Thu, May 21, 2015 at 12:10 PM, Damir Olejar olejar.da...@gmail.com
wrote:

 To whom it may concern,

 First, I would like to apologize if my question is vague, since I am 
 new and unaccustomed to the cTakes diction. To keep my question simple 
 and to the point, let us assume that I am working only with Apache 
 OpenNLP. I do not have any UIMA-specific JAR files included, and let 
 us assume that I do not want to include any of them (or keep it to a 
 minimum), thus keeping the project confined to OpenNLP as much as possible.

 As far as I know, UIMA is just a framework that does not provide any 
 specific NLP tools (source:
 http://stackoverflow.com/questions/24186742/is-uima-provides-only-a-wrapper-or-is-it-like-standfordcore-nlp-and-gate
  ).
 This means that there should be a way of integrating the cTakes 
 components with OpenNLP.

 What I would like to do is to simply have Named Entity Recognition
 (NER) applied to a text, so I know which word from an inputted 
 sentence is a medical term.  The perfect option would be if I could 
 have a *.bin file such as "en-ner-person.bin", but I think that cTakes 
 does not give us such an option, since there are no *.bin files.

 How would I accomplish such a task? Would there be any code, examples, 
 tutorials, documentation, pseudo-code, ideas, ... to take a look at?

 Thank you kindly for your time, understanding, and patience.

 Damir



RE: Request for help:: NCBO Ontology Extraction Tool for i2b2

2015-04-27 Thread Savova, Guergana
Hi Sekhar,
You'd want to be on the i2b2 mailing list, not the cTAKES mailing list.
--Guergana

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] 
Sent: Monday, April 27, 2015 7:57 AM
To: dev@ctakes.apache.org
Subject: RE: Request for help:: NCBO Ontology Extraction Tool for i2b2

Sekhar,
You seem to be on the wrong email list.
Tim


From: Hari, Sekhar [sekhar.h...@cgi.com]
Sent: Monday, April 27, 2015 7:50 AM
To: dev@ctakes.apache.org
Subject: RE: Request for help:: NCBO Ontology Extraction Tool for i2b2

Hello there - Any luck on extracting and processing these ontologies; 
particularly OAE, SSE, and OVAE?



Many thanks,

Sekhar H.



-Original Message-

From: Hari, Sekhar

Sent: Friday, April 24, 2015 11:45 AM

To: dev@ctakes.apache.org

Subject: RE: Request for help:: NCBO Ontology Extraction Tool for i2b2



I checked only those 4 Ontologies that I mentioned in my email. In this site - 
http://i2b2.bioontology.org/
  , I see that you have submitted a number of final metadata files for 
different Ontologies. I am not familiar with Extraction and Processing programs 
to modify it; hence I requested the group under the hope that somebody can 
extract and process the final metadata files for these Ontologies.



WHO-ART:

For this one, the problem is that the Processing program dies with the GC 
Overhead Limit reached error exactly after the output file size reaches 11GB 
(if I provide the  pathFormat as 'Medium'; dies at 9.4GB if the pathFormat is 
'Short'). The Extraction program worked very well.

I contacted Lori, and here is what he has to say:

Problem with WHO-ART is that it's circular...  I don't have a solution for this 
problem.  ..

Traverse down one of the AV Block paths / Retinal Oedema / Fungal ../ Thyroid 
... / Aspiration / and Av Block again...  it goes on and on...
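
A minimal sketch of how a circular hierarchy like this could be expanded without looping forever, assuming the ontology is available as a parent-to-children mapping. The `expand_paths` helper and the term names are hypothetical and are not taken from the NCBO extraction or processing programs:

```python
def expand_paths(children, root):
    """Yield every root-to-node path, skipping a child that already
    appears on its own path (which would otherwise loop forever)."""
    stack = [(root, (root,))]
    while stack:
        node, path = stack.pop()
        yield path
        for child in children.get(node, ()):
            if child in path:  # cycle: this term is already an ancestor
                continue
            stack.append((child, path + (child,)))


# Hypothetical circular hierarchy: the last term points back to the first.
children = {
    "AV Block": ["Fungal"],
    "Fungal": ["Thyroid"],
    "Thyroid": ["Aspiration"],
    "Aspiration": ["AV Block"],
}
paths = list(expand_paths(children, "AV Block"))
```

With the check on `child in path`, the output stays finite because each path can contain a term at most once; without it, the path list (and the output file) would keep growing.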



OAE, SSE, OVAE:

For this one, the problem is different. There is no GC Overhead Limit error. 
But when the Extraction program runs, after each page there is Java 
NullPointerException. Lori asked me to modify the program. Below is Lori's 
response:



I see the problem



My code assumes the following format for each concept:



Example from ICD9:

classproperties

<tuiCollection><tui type="http://bioportal.bioontology.org/ontologies/umls/tui">T061</tui></tuiCollection>

<notationCollection><notation type="http://www.w3.org/2004/02/skos/core#notation">83.72</notation></notationCollection>

<cuiCollection><cui type="http://bioportal.bioontology.org/ontologies/umls/cui">C0185466</cui></cuiCollection>

<prefLabelCollection><prefLabel type="http://www.w3.org/2004/02/skos/core#prefLabel">Recession of tendon</prefLabel></prefLabelCollection>



It's expecting to see notationCollection to obtain the basecode of the term.



In your case

There is no notationCollection entry.   (why you are seeing null pointers)



It does have (which I assume is the basecode) <prefixIRICollection><prefixIRI 
type="http://data.bioontology.org/metadata/prefixIRI">OAE:0001620</prefixIRI>
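
A minimal sketch of the kind of fallback described here: read the basecode from the notation element when it exists, and otherwise from prefixIRI. It assumes each concept is an XML element shaped roughly like the examples above; the `base_code` helper is hypothetical and is not the extraction program's actual code:

```python
import xml.etree.ElementTree as ET


def base_code(concept):
    """Return the notation text, or the prefixIRI text when notation is absent.

    Returns None when neither element is present, so the caller can skip or
    log the concept instead of hitting a null reference.
    """
    for tag in ("notation", "prefixIRI"):
        element = concept.find(f".//{tag}")
        if element is not None and element.text:
            return element.text.strip()
    return None


# Simplified, assumed element layout for the two cases discussed above.
icd9 = ET.fromstring(
    "<class><notationCollection><notation>83.72</notation>"
    "</notationCollection></class>"
)
oae = ET.fromstring(
    "<class><prefixIRICollection><prefixIRI>OAE:0001620</prefixIRI>"
    "</prefixIRICollection></class>"
)
icd9_code = base_code(icd9)
oae_code = base_code(oae)
```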



Your problem is going to need a custom solution, that unfortunately I don't 
have the bandwidth for.   I can tell where/how to modify the code to fit your 
needs.  Let me know if you need assistance in modifying the code.



Thanks,

Sekhar H.



-Original Message-

From: Pei Chen [mailto:chen...@apache.org]

Sent: Thursday, April 23, 2015 9:22 PM

To: dev@ctakes.apache.org

Subject: Re: 

RE: Negex

2015-01-05 Thread Savova, Guergana
Yes, they were added in the rule-based implementation. You can still use it if 
you'd like.
--Guergana


-Original Message-
From: Green, John [mailto:john.gr...@usuhs.edu] 
Sent: Monday, January 05, 2015 12:59 PM
To: dev@ctakes.apache.org
Subject: Negex

Hi all - Does anyone know off the top of their head if the negex trigger rules 
included in the original 2009 python script were added to when it was 
implemented in ctakes?

Thanks,
John


RE: cTakes polarity problem

2014-12-31 Thread Savova, Guergana
cTAKES also implements a rule-based approach to the negation/polarity problem. 
It was the default until the latest release. You are free to use the rule-based 
implementation and compare results with the ML approach.
--Guergana

-Original Message-
From: Michael J Gurley [mailto:m-gur...@northwestern.edu] 
Sent: Wednesday, December 31, 2014 11:22 AM
To: dev@ctakes.apache.org
Subject: Re: cTakes polarity problem

I think this demonstrates that machine learning is not the right approach to 
the negation/polarity problem.


Michael Gurley
m-gur...@northwestern.edu
312 925 3268
Northwestern University Clinical and Translational Sciences Institute
(NUCATS)
http://www.nucats.northwestern.edu
Rubloff Building
750 N Lake Shore Drive, 11th Floor
Chicago, IL 60611







On 12/31/14 9:13 AM, Miller, Timothy
timothy.mil...@childrens.harvard.edu wrote:

Hi Yu,

The new polarity module is machine-learning based so it is not always 
easy to diagnose accuracy issues. But generally it might mean there was 
no example like that in the training data. It was trained on multiple 
corpora, but sometimes certain phrases slip through the cracks, and 
"Deny hepatitis", while possible in the truncated language of clinical 
notes, seems like an unlikely phrase and so it may not be in our data.
Is that a real example you saw or just a minimum (not) working example?
If not, do you have a real example (i.e. a whole sentence) where "deny"
should cause a negation but does not? If so I will look into it. We 
have had a few reports like this so it may be worth keeping track of 
missed examples for future iterations of the module. It is important 
that they be real examples from the wild though.

(As an aside, machine learning methods don't understand language the 
way people do so even if it seems obvious to a human that "Deny disease."
should be negated, if it looks different enough from the context of an 
example from the training data the ML will sometimes fall back to the 
majority class of "Not negated".)

Tim


On 12/31/2014 10:03 AM, Yu Liang wrote:
 I have a quick question about CTAKES.
 I am using AE "AggregatePlaintextUMLSProcessor.xml" and want to get 
some negation results by referring to the polarity attribute.
 However, it turns out that, for example, "Negative for hepatitis" is not 
negated. I think it is weird and I tried "No hepatitis", "Denies 
hepatitis" which return "polarity= -1", but "Deny hepatitis." returns 
"polarity=1".

 Could anyone give me a clue as to what is wrong? Thank you!




RE: cTakes Annotation Comparison

2014-12-19 Thread Savova, Guergana
We are doing a similar kind of evaluation and will report the results.

Before we released the Fast lookup, we did a systematic evaluation across three 
gold standard sets. We did not see the trend that Bruce reported below. The P, 
R and F1 results from the old dictionary look up and the fast one were similar.

Thank you everyone!
--Guergana

-Original Message-
From: David Kincaid [mailto:kincaid.d...@gmail.com] 
Sent: Friday, December 19, 2014 9:02 AM
To: dev@ctakes.apache.org
Subject: Re: cTakes Annotation Comparison

Thanks for this, Bruce! Very interesting work. It confirms what I've seen in my 
small tests that I've done in a non-systematic way. Did you happen to capture 
the number of false positives yet (annotations made by cTAKES that are not in 
the human adjudicated standard)? I've seen a lot of dictionary hits that are 
not actually entity mentions, but I haven't had a chance to do a systematic 
analysis (we're working on our annotated gold standard now). One great example 
is the antibiotic "Today". Every time the word "today" appears in any text it is 
annotated as a medication mention when it almost never is being used in that 
sense.

These results by themselves are quite disappointing to me. Both the 
UMLSProcessor and especially the FastUMLSProcessor seem to have pretty poor 
recall. It seems like the trade off for more speed is a ten-fold (or more) 
decrease in entity recognition.

Thanks again for sharing your results with us. I think they are very useful to 
the project.

- Dave

On Thu, Dec 18, 2014 at 5:06 PM, Bruce Tietjen  
bruce.tiet...@perfectsearchcorp.com wrote:

 Actually, we are working on a similar tool to compare it to the human 
 adjudicated standard for the set we tested against.  I didn't mention 
 it before because the tool isn't complete yet, but initial results for 
 the set (excluding those marked as CUI-less) was as follows:

 Human adjudicated annotations: 4591 (excluding CUI-less)

 Annotations found matching the human adjudicated standard
 UMLSProcessor  2245
 FastUMLSProcessor   215






 IMAT Solutions http://imatsolutions.com
 Bruce Tietjen
 Senior Software Engineer
 Mobile: 801.634.1547
 bruce.tiet...@imatsolutions.com

 On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei 
 pei.c...@childrens.harvard.edu
 
 wrote:
 
  Bruce,
  Thanks for this-- very useful.
  Perhaps Sean Finan can comment more,
  but it's also probably worth it to compare to an adjudicated human 
  annotated gold standard.
 
  --Pei
 
  -Original Message-
  From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com]
  Sent: Thursday, December 18, 2014 1:45 PM
  To: dev@ctakes.apache.org
  Subject: cTakes Annotation Comparison
 
  With the recent release of cTakes 3.2.1, we were very interested in 
  checking for any differences in annotations between using the 
  AggregatePlaintextUMLSProcessor pipeline and the 
  AggregatePlaintextFastUMLSProcessor pipeline within this release of cTakes
  with its associated set of UMLS resources.
 
  We chose to use the SHARE 14-a-b Training data that consists of 199 
  documents (Discharge  61, ECG 54, Echo 42 and Radiology 42) as the 
  basis for the comparison.
 
  We decided to share a summary of the results with the development 
  community.
 
  Documents Processed: 199
 
  Processing Time:
    UMLSProcessor        2,439 seconds
    FastUMLSProcessor    1,837 seconds
 
  Total Annotations Reported:
    UMLSProcessor        20,365 annotations
    FastUMLSProcessor     8,284 annotations
 
 
  Annotation Comparisons:
    Annotations common to both sets:                       3,940
    Annotations reported only by the UMLSProcessor:       16,425
    Annotations reported only by the FastUMLSProcessor:    4,344
 
 
  If anyone is interested, the following was our test procedure:
 
  We used the UIMA CPE to process the document set twice, once using 
  the AggregatePlaintextUMLSProcessor pipeline and once using the 
  AggregatePlaintextFastUMLSProcessor pipeline. We used the 
  WriteCAStoFile CAS consumer to write the results to output files.
 
  We used a tool we recently developed to analyze and compare the 
  annotations generated by the two pipelines. The tool compares the 
  two outputs for each file and reports any differences in the 
  annotations (MedicationMention, SignSymptomMention, 
  ProcedureMention, AnatomicalSiteMention, and
  DiseaseDisorderMention) between the two output sets. The tool 
  reports the number of 'matches' and 'misses' between each annotation set. A 
  'match' is defined as the presence of an identified source text interval with 
  its associated CUI appearing in both annotation sets. A 'miss' is 
  defined as the presence of an identified source text interval and 
  its associated CUI in one annotation set, but no matching identified 
  source text interval and CUI in the other. The tool also reports the total number of 
  annotations (source text 

RE: cTakes Annotation Comparison

2014-12-19 Thread Savova, Guergana
Several thoughts:
1. The ShARE corpus annotates only mentions of type Diseases/Disorders and only 
Anatomical Sites associated with a Disease/Disorder. This is by design. cTAKES 
annotates all mentions of types Diseases/Disorders, Signs/Symptoms, Procedures, 
Medications and Anatomical Sites. Therefore you will get MANY more annotations 
with cTAKES. Eventually the ShARe corpus will be expanded to the other types.

2. Keeping (1) in mind, you can approximately estimate the precision/recall/f1 
of cTAKES on the ShARe corpus if you output only mentions of type 
Disease/Disorder. 

3. Could you send us the list of files you use from ShARe to test? We have the 
corpus and would like to run against it as well.
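
As a rough illustration of point 2, precision, recall and F1 can be derived from three counts: matches against the gold standard, total system mentions of the evaluated type, and total gold mentions. The sketch below reuses the 2245 exact matches and 4591 gold annotations reported earlier in this thread; the `system_total` value of 3000 is invented purely for the example, since the number of Disease/Disorder mentions output by cTAKES was not reported:

```python
def precision_recall_f1(true_positives, system_total, gold_total):
    """Compute precision, recall and F1 from raw counts (0.0 on empty input)."""
    precision = true_positives / system_total if system_total else 0.0
    recall = true_positives / gold_total if gold_total else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# 2245 and 4591 come from the thread; 3000 is a made-up placeholder.
precision, recall, f1 = precision_recall_f1(
    true_positives=2245, system_total=3000, gold_total=4591
)
```

With those gold counts, recall comes out at roughly 0.49 regardless of the placeholder; precision and F1 depend on the real system total.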

Hope this makes sense...
--Guergana

-Original Message-
From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com] 
Sent: Friday, December 19, 2014 1:16 PM
To: dev@ctakes.apache.org
Subject: Re: cTakes Annotation Comparison

Our analysis against the human adjudicated gold standard from this SHARE corpus 
is using a simple check to see if the cTakes output included the annotation 
specified by the gold standard. The initial results I reported were for exact 
matches of CUI and text span.  Only exact matches were counted.

It looks like if we also count as matches cTakes annotations with a matching 
CUI and a text span that overlaps the gold standard text span then the matches 
increase to 224 matching annotations for the FastUMLS pipeline and 2319 for the 
old pipeline.
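
The difference between the two counting rules can be sketched as follows. This is only an illustration under stated assumptions, not the comparison tool itself: annotations are assumed to be `(begin, end, cui)` tuples with half-open character offsets, and the offsets and CUIs below are placeholders:

```python
def count_matches(gold, system, allow_overlap=False):
    """Count gold annotations (begin, end, cui) that the system also produced.

    Exact mode requires identical span and CUI. Overlap mode requires the
    same CUI and any overlap between the two spans.
    """
    matched = 0
    for g_begin, g_end, g_cui in gold:
        for s_begin, s_end, s_cui in system:
            if s_cui != g_cui:
                continue
            exact = (s_begin, s_end) == (g_begin, g_end)
            overlap = s_begin < g_end and g_begin < s_end
            if exact or (allow_overlap and overlap):
                matched += 1
                break  # count each gold annotation at most once
    return matched


# Placeholder data: the second system span overlaps its gold span
# but does not match it exactly.
gold = [(0, 9, "C0000001"), (20, 30, "C0000002")]
system = [(0, 9, "C0000001"), (18, 30, "C0000002"), (40, 45, "C0000003")]

exact_matches = count_matches(gold, system)
overlap_matches = count_matches(gold, system, allow_overlap=True)
```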

The question was also asked about annotations in the cTakes output that were 
not in the human adjudicated gold standard. The answer is yes, there were a lot 
of additional annotations made by cTakes that don't appear to be in the gold 
standard. We haven't analyzed that yet, but it looks like the gold standard we 
are using may only have Disease_Disorder annotations.



IMAT Solutions http://imatsolutions.com
Bruce Tietjen
Senior Software Engineer
Mobile: 801.634.1547
bruce.tiet...@imatsolutions.com

On Fri, Dec 19, 2014 at 9:54 AM, Miller, Timothy  
timothy.mil...@childrens.harvard.edu wrote:

 Thanks Kim,
 This sounds interesting though I don't totally understand it. Are you 
 saying that extraction performance for a given note depends on which 
 order the note was in the processing queue? If so that's pretty bad! 
 If you (or anyone else who understands this issue) have a concrete 
 example I think that might help me understand what the problem is/was.

 Even though, as Pei mentioned, we are going to try moving the 
 community to the faster dictionary, I would like to understand better 
 just to help myself avoid issues of this type going forward (and 
 verify the new dictionary doesn't use similar logic).

 Also, when we finish annotating the sample notes, might we use that as 
 a point of comparison for the two dictionaries? That would get around 
 the issue that not everyone has access to the datasets we used for 
 validation and others are likely not able to share theirs either. And 
 maybe we can replicate the notes if we want to simulate the scenario 
 Kim is talking about with thousands or more notes.

 Tim


 On 12/19/2014 10:24 AM, Kim Ebert wrote:
 Guergana,

 I'm curious about the number of records that are in your gold standard 
 sets, or whether your gold standard set was run through a long-running cTAKES 
 process.
 I know at some point we fixed a bug in the old dictionary lookup that 
 caused the permutations to become corrupted over time. Typically this 
 isn't seen in the first few records, but over time as patterns are 
 used the permutations would become corrupted. This caused documents 
 that were fed through cTAKES more than once to have fewer codes 
 returned than the first time.

 For example, if a permutation of 4,2,3,1 was found, the permutation 
 would be corrupted to be 1,2,3,4. It would no longer be possible to 
 detect permutations of 4,2,3,1 until cTAKES was restarted. We got the 
 fix in after the cTAKES 3.2.0 release. 
 https://issues.apache.org/jira/browse/CTAKES-310
 Depending upon the corpus size, I could see the permutation engine 
 eventually only have a single permutation of 1,2,3,4.

 Typically though, this isn't very easily detected in the first 100 or 
 so documents.
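
 One way a symptom like this can arise is when a shared, cached permutation is sorted in place instead of being copied first. The sketch below is a hypothetical illustration of that pattern only; it is not the CTAKES-310 code, and `build_cache` and `find_sorted` are invented names:

```python
from itertools import permutations


def build_cache():
    """Build a cache of all orderings of four token positions."""
    return [list(p) for p in permutations([1, 2, 3, 4])]


def find_sorted(cache, perm, in_place):
    """Return the sorted form of perm if it is in the cache, else None."""
    for cached in cache:
        if cached == perm:
            if in_place:
                cached.sort()      # bug: rewrites the shared cache entry
                return cached
            return sorted(cached)  # fix: sort a copy, cache stays intact
    return None


# Buggy variant: the second lookup fails because 4,2,3,1 was overwritten.
buggy = build_cache()
first = find_sorted(buggy, [4, 2, 3, 1], in_place=True)
second = find_sorted(buggy, [4, 2, 3, 1], in_place=True)

# Copying variant: repeated lookups keep working.
safe = build_cache()
again = [find_sorted(safe, [4, 2, 3, 1], in_place=False) for _ in range(2)]
```

 In the buggy variant the cache ends up with a duplicate 1,2,3,4 entry and no 4,2,3,1 entry, which matches the described behaviour of results degrading the longer the process runs.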

 We discovered this issue when we made cTAKES have consistent output of 
 codes in our system.

 [IMAT Solutions]http://imatsolutions.com
 Kim Ebert
 Software Engineer
 [Office:] 801.669.7342
 kim.eb...@imatsolutions.com
 On 12/19/2014 07:05 AM, Savova, Guergana wrote:

 We are doing a similar kind of evaluation and will report the results.

 Before we released the Fast lookup, we did a systematic evaluation 
 across three gold standard sets. We did not see the trend that Bruce 
 reported below. The P, R and F1 results from the old dictionary look 
 up and the fast one were similar.

 Thank you everyone!
 --Guergana

 -Original Message-
 From: David Kincaid

gold standard annotations for Apache cTAKES sample notes

2014-12-05 Thread Savova, Guergana
Thanks to John Green, we now have sample clinical notes in cTAKES. Many thanks, 
John, for your effort!

We will take these notes and will start generating gold annotations that could 
be used then to compare cTAKES output to. We are planning to include 
annotations for:

1.   Entities with the attributes

2.   LocationOf and DegreeOf relations

3.   Within-document coreference

4.   Events

5.   Temporal expressions

6.   Temporal relations

Effort permitting, we also have on the list annotations for:

1.   Syntactic trees

2.   Dependency links

3.   Semantic roles

It will take us some time to generate the gold annotations; we will keep you 
posted on the progress.
Cheers,
--Guergana


RE: revamping the Apache cTAKES website

2014-12-05 Thread Savova, Guergana
Wonderful, thank you, Michelle! There will be a flurry of emails the week of 
Dec 15 followed by actual work, so book your calendar if possible...
--Guergana

-Original Message-
From: Michelle Chen [mailto:michelle1919c...@gmail.com] 
Sent: Friday, December 05, 2014 11:48 AM
To: dev@ctakes.apache.org
Subject: Re: revamping the Apache cTAKES website

Hello Guergana,

I don't know that much about cTakes, but would be interested in contributing to 
the effort.

I'm not sure if there is an interest in matching the website design of other 
Apache projects, but from my arbitrary search on 
http://projects.apache.org/indexes/alpha.html it seems that the two main designs 
in use are 1. the current design that cTakes is using and 2. a Bootstrap approach.

I've done a little bit of work on Bootstrap and would be interested in helping 
with that. Let me know how I can be helpful.

Sincerely,
Michelle Chen :)

Be strong and of good courage; do not be afraid, nor be dismayed, for the Lord 
your God is with you wherever you go. ~Joshua 1:9


On Fri, Dec 5, 2014 at 11:21 AM, Savova, Guergana  
guergana.sav...@childrens.harvard.edu wrote:

 cTAKES-ers,

 we would like to start working on updating the Apache cTAKES website - 
 some of the information there is already stale and needs refreshing. 
 Do you have ideas on website design, content, etc.? Would you like to 
 contribute to the effort? We are planning to start working on the 
 website the week of Dec 15.

 Cheers,
 --Guergana




RE: Scaling cTakes

2014-12-05 Thread Savova, Guergana
Hi Brandon,
Our estimate of how long it takes to process a document is under a second with 
the fast dictionary lookup I believe. Sean can provide more details. 
--Guergana

-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] 
Sent: Friday, December 05, 2014 1:21 PM
To: dev@ctakes.apache.org
Subject: RE: Scaling cTakes

Hi Brandon,

It sounds like you've got  a decent pipeline set up.  To increase the speed you 
could try swapping out use of ctakes-dictionary-lookup with 
ctakes-dictionary-lookup-fast in the AE.  Check 
ctakes-clinical-pipeline/desc/[ae]/AggregatePlaintextFastUMLSProcessor.xml for 
an example.  As for the CASPool, I don't think that it will make any difference 
for cTakes.  

Sean

From: Geise, Brandon D. [bdge...@geisinger.edu]
Sent: Friday, December 05, 2014 12:40 PM
To: dev@ctakes.apache.org
Subject: Scaling cTakes

Hi,

I'm new to cTakes and the UIMA framework.  I've read most of the UIMA 
documentation and was able to take the BagofCUIGenerator example and modify it to 
read notes from a DB, process using the UMLS AE in the clinical-pipeline using 
a local DB version of UMLS, and output the CUIs to a DB.  However, the problem 
I'm having is it's extremely slow; ~3.5-4 notes a minute.  I was hoping I could 
get some hints or advice on speeding the process up.  I read there's a patch 
for LVG, but wasn't quite sure how to implement it.  Also from testing using the 
CPE GUI, I don't notice any difference in processing time by adjusting the 
CASPool setting.  Some advice on the CASPool would be appreciated also.

Thanks,
Brandon




RE: YTEX depends on trove4j? LGPL issue

2014-10-15 Thread Savova, Guergana
At one point (some long time ago), I remember LGPL was compatible with Apache. 
What version of LGPL is this dependency using?
--Guergana

-Original Message-
From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu] 
Sent: Wednesday, October 15, 2014 11:43 AM
To: dev@ctakes.apache.org
Subject: RE: YTEX depends on trove4j? LGPL issue

Steve,
This is a good catch!  I was pretty sure 3rd party libs were checked but 
somehow this may have been missed.
I noticed it's in the convenience binary distro as well.  We need to remove 
this; I'll create a Jira.
VJ, could you confirm- I actually don't think we use trove4j in ytex? 
ctakes-ytex/pom.xml

--Pei

 -Original Message-
 From: Steven Bethard [mailto:steven.beth...@gmail.com]
 Sent: Wednesday, October 15, 2014 10:40 AM
 To: dev@ctakes.apache.org
 Subject: YTEX depends on trove4j? LGPL issue
 
 It seems that YTEX depends on trove4j which is LGPL [1], but 
 LGPL-licensed works must not be included in Apache products [2].
 Have the YTEX dependencies been reviewed for licensing issues? (I only 
 stumbled upon the trove issue via a version conflict in other code.)
 
 Steve
 
 [1] http://trove4j.sourceforge.net/html/license.html
 [2] http://www.apache.org/legal/resolved.html


RE: sentence detector model

2014-09-29 Thread Savova, Guergana
How about pairing it with THYME and MiPACQ? Perhaps you are using them 
already...
--Guergana

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] 
Sent: Monday, September 29, 2014 1:38 PM
To: dev@ctakes.apache.org
Subject: Re: sentence detector model

Some of them are a bit artificial for this task, with notes being annotated as 
one sentence per line and offset punctuation. I think maybe the 2008 and 2009 
data might have original formatting though, with newlines not always breaking 
sentences. That has certain advantages over raw MIMIC for training since the 
PHI isn't so weirdly formatted, but then again is not a mix of styles (that is, 
the styles of newline always terminates sentence vs. sometimes terminates 
sentence). I think it would still have to be paired with another dataset to be 
a representative sample.
Tim

On 09/29/2014 01:24 PM, vijay garla wrote:
 Why not use the i2b2 corpora?

 On Monday, September 29, 2014, Dligach, Dmitriy  
 dmitriy.dlig...@childrens.harvard.edu wrote:

 Maybe creating a made-up set of sentences would be an option? That 
 way we could agree on the annotation of concrete cases. Although this 
 would be more of a unit test than a corpus.

 Dima




 On Sep 27, 2014, at 12:15, Miller, Timothy  
 timothy.mil...@childrens.harvard.edu wrote:

 I've just been using the opennlp command line cross validator on the
 small dataset i annotated (along with some eyeballing). It would be 
 cool if there was a standard clinical resource available for this 
 task, but I hadn't considered it much because the data I annotated 
 pulls from multiple datasets and the process of  arranging with 
 different institutions to make something like that available would probably 
 be a nightmare.
 Tim

 Sent from my iPad. Sorry about the typos.

 On Sep 27, 2014, at 12:16 PM, Dligach, Dmitriy 
 dmitriy.dlig...@childrens.harvard.edu wrote:
 Tim, thanks for working on this!

 Question: do we have some formal way of evaluating the sentence
 detector? Maybe we should come up with some dev set that would 
 include examples from mimic...
 Dima




 On Sep 27, 2014, at 8:57, Miller, Timothy 
 timothy.mil...@childrens.harvard.edu wrote:
 I have been working on the sentence detector newline issue, training a 
 model to probabilistically split sentences on newlines rather than 
 forcing sentence breaks. I have checked in a model to the repo under 
 ctakes-core-res. I also attached a patch to ctakes-core to the jira issue:
 https://issues.apache.org/jira/browse/CTAKES-41

 for people to test. The status of my testing is that it doesn't seem 
 to break on notes where ctakes worked well before (those where 
 newlines are always sentence breaks), and is a slight improvement on 
 notes where newlines may or may not be sentence breaks. Once the 
 change is checked in we can continue improving the model by adding 
 more data and features, but the first hurdle I'd like to get past is 
 making sure it runs well enough on the type of data that the old 
 model worked well on. Let me know if you have any questions.
 Thanks
 Tim




RE: De-identified lab tests dataset

2014-09-29 Thread Savova, Guergana
Ajay,
cTAKES currently does not implement a method to discover labs from the text. 
The motivation is that you can get that easily from the structured part of the 
EMR (what Pete explained below). Hope this makes sense!
--Guergana

-Original Message-
From: Peter Szolovits [mailto:p...@mit.edu] 
Sent: Monday, September 29, 2014 2:32 PM
To: dev@ctakes.apache.org
Subject: Re: De-identified lab tests dataset

Ajay, I'm confused by your query.  cTakes is good at interpreting text, but 
most lab test results are reported in tabular form that is most appropriately 
searched by SQL queries.  Sometimes lab results are also reported in narrative 
notes, but parsing those is often more a matter of deciphering the text 
structure of tables than of parsing real English text.  What am I 
misunderstanding?

--Pete Sz.

On Sep 29, 2014, at 2:25 PM, Ajay Jain ajayj...@mobileinsights.net wrote:

 Hello All,
 
 I am working on a use case for lab tests data using cTAKES and my 
 online search to find a test dataset has been futile.  I'll greatly 
 appreciate if someone can share such a dataset or can point me in the 
 right direction to go looking for one.
 
 Best,
 Ajay
 
 --
 Founder  CEO
 Mobile Insights, Inc.
 (630) 408-8623



RE: temporal assertion module

2014-03-26 Thread Savova, Guergana
The temporal module is still in development. We are working on a release, but 
it will take a couple of months. We will email the cTAKES community once the 
temporal system is ready to go.
--Guergana

-Original Message-
From: digital paula [mailto:cybersat...@hotmail.com] 
Sent: Tuesday, March 25, 2014 6:42 PM
To: dev@ctakes.apache.org
Subject: temporal assertion module

Hello cTAKES Developer Community,
 
There are about 10 packages under temporal with a total of maybe 80 or 90 files 
but no XML descriptor.  I see that the temporal module is not part of the 
released version of cTAKES, since no documentation is available on the Component 
Use Guide page: 
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1+Component+Use+Guide#cTAKES3.1ComponentUseGuide-Components
 
I'd like to use the temporal module, but I'm lost without the XML descriptors, 
which are how I integrate modules into the clinical pipeline.  If anyone is using 
the temporal module, I'd appreciate any help getting started: links to more info 
on usage, or maybe even an XML descriptor or two for the temporal module.  
 
Thanks.
 
Regards,
Paula
  


RE: ctakes-pad-term-spotter component?

2014-02-18 Thread Savova, Guergana
I vote to deprecate.
--Guergana

From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu]
Sent: Tuesday, February 18, 2014 10:45 AM
To: u...@ctakes.apache.org; dev@ctakes.apache.org
Subject: ctakes-pad-term-spotter component?

Hi,
Is anyone still using the pad-term-spotter component?
Deprecating this module if it's no longer used will simplify the codebase and 
reduce the effort in support...

--Pei



RE: Happy thanksgiving !

2013-11-29 Thread Savova, Guergana
Thank you, Andy! Happy Thanksgiving, cTAKES team!
--Guergana

-Original Message-
From: andy mcmurry [mailto:mcmurry.a...@gmail.com] 
Sent: Thursday, November 28, 2013 7:19 PM
To: dev@ctakes.apache.org
Subject: Happy thanksgiving !

Sharing is caring! Happy Thanksgiving to the cTAKES crew. Thanks for 
everything; you are appreciated.


RE: ctakes-examples project?

2013-08-21 Thread Savova, Guergana
We will look at the metadata info. Some of it is critical for the annotations 
(e.g. docTime).
Thank you, John.
--Guergana

-Original Message-
From: John Green [mailto:john.travis.gr...@gmail.com] 
Sent: Wednesday, August 21, 2013 12:56 PM
To: dev@ctakes.apache.org
Subject: Re: ctakes-examples project?

So far I have about 15 notes done. I'm submitting them slowly because, after I 
said they were done, I decided one last review of each for gross errors and 
completeness would be in order. I'm slowly working through the proofread of 
each now. Just FYI if anyone was wondering.

I don't know what the coders can do with the metadata I included at the top of 
each note. I thought it would be useful, however, to attempt to describe the 
data with these additional metrics. Maybe they are already included vectors in 
the gold-standard annotation.

JG


On Wed, Aug 21, 2013 at 12:00 PM, Pei Chen chen...@apache.org wrote:

 John is creating example clinical notes [1]...  and I believe Guergana's 
 group and co. will create gold standard annotations for them (as 
 training examples)?

 Where would be a good home for something like this?

 What do folks think about creating a separate project called 
 ctakes-examples?  An alternative might be to put it in one of the test 
 projects?
 [1] https://issues.apache.org/jira/browse/CTAKES-223

 --Pei



RE: Examples

2013-08-17 Thread Savova, Guergana
I second Pei. Excellent development, John! Thank you!

I can ask my annotators to create gold annotations for several layers according 
to the annotation guidelines. How many notes do you have, John?
--Guergana

-Original Message-
From: Pei Chen [mailto:chen...@apache.org] 
Sent: Saturday, August 17, 2013 2:49 PM
To: dev@ctakes.apache.org
Subject: Re: Examples

Hi John,
This is great news.
I would suggest creating a Jira item and attaching the docs to it (Additional
Actions > Attach). [One can create a Jira account at
https://issues.apache.org/jira/browse/ctakes.]
Then we can make the commits for you.
This is awesome... does anyone have the gold standard annotation guideline(s) 
handy?  It would be great to have those test notes annotated as well, as examples.

--Pei



On Sat, Aug 17, 2013 at 11:41 AM, John Green john.travis.gr...@gmail.com wrote:

 Just got some free time. I have a number of example free-text notes, per 
 previous discussions, to upload. They're good quality but not annotated. Do 
 I need someone to commit for me?

 Thanks,
 J Green



RE: Next cTAKES release (3.1)?

2013-07-18 Thread Savova, Guergana
We have 5-6 clinical notes that we got from the web (=publicly available to 
anyone). We can include them as samples in the 3.1 release. We have been using 
these notes for demo purposes.
--Guergana

-Original Message-
From: Andy McMurry [mailto:mcmurry.a...@gmail.com] 
Sent: Friday, June 28, 2013 10:15 AM
To: dev@ctakes.apache.org
Subject: Re: Next cTAKES release (3.1)?

iDash and others have medical NLP datasets that could be used for ctakes 
Getting Started examples http://idash.ucsd.edu/nlp-and-data-modeling
http://idash.ucsd.edu/nlp/umls-vm

the GOOD: iDash already includes cTAKES 
the BAD: iDash references old versions of cTAKES and points to caBIG (which is now 
defunct)   

Recommendation: we should talk to iDash, create "hello medical world" training 
examples, and request that iDash point to the cTAKES Apache home page. 

Disclaimer: I'm not involved with iDash 

On Jun 27, 2013, at 10:58 PM, Girivaraprasad Nambari girinamb...@gmail.com 
wrote:

 Hi Vijay and Andy,
 
 Thanks for sharing those examples.
 
 Trouble is, privacy requires that these examples be made up by hand
 
 I agree with this statement; it is a very valid concern.
 
 In the getting started examples, I think we should have just a couple of 
 entries (5-10 small entries), not more than that (with an explicit 
 statement like ONLY EXAMPLE, NOT GOOD FOR REAL USAGE). I understand 
 handcrafting these may not be easy because we are not medical domain 
 experts, but I feel it is worth the time, because it brings in a larger user community.
 
 Thank you,
 Giri
 
 
 
 
 
 On Thu, Jun 27, 2013 at 10:25 PM, Andy McMurry mcmurry.a...@gmail.com wrote:
 
 GREAT !
 
 The i2b2 data, though, isn't publicly distributable; you still need to 
 request access to it since it is semi-private.
 
 
 On Jun 27, 2013, at 9:52 PM, vijay garla vnga...@gmail.com wrote:
 
 We released code on using cTAKES to annotate clinical text and SVMs 
 that use the annotations to classify clinical text from the CMC 2007 
 and i2b2 2008 challenges:
 
 We did the CMC 2007 with cTAKES 2.5:
 
 https://code.google.com/p/ytex/wiki/WordSenseDisambiguation_V08#Reproducing_results_on_CMC_2007_challenge
 https://code.google.com/p/ytex/downloads/list
 
 
 And the i2b2 2008 with the version of cTAKES distributed with the 
 first version of ARC:
 https://code.google.com/p/ytex/wiki/FeatEng_V05#i2b2_2008
 
 These are both publicly available datasets, and represent real-world 
 problems (in general I believe when publishing a paper the code 
 should be reproducible and made publicly available, but that's a different 
 issue).
 
 When we get around to upgrading YTEX to cTAKES 3.1, we would like to 
 upgrade these samples as well.
 
 Best,
 
 VJ
 
 
 
 On Thu, Jun 27, 2013 at 8:32 PM, Andy McMurry 
 mcmurry.a...@gmail.com
 wrote:
 
 +1 suggestion for documenting many examples of getting started + NLP 
 datasets.
 
 I have at least one we can use that was created by our lead Pathologist:
 
 https://open.med.harvard.edu/svn/scrubber/releases/3.0/data/input/cases/train/traincase.xml
 
 We should provide at least one sample for each domain.
 Trouble is, privacy requires that these examples be made up by hand 
 and not copy-pasted from EMR systems.
 
 --Andy
 
 On Jun 27, 2013, at 5:32 PM, Girivaraprasad Nambari 
 girinamb...@gmail.com
 wrote:
 
 +1 for this observation, Andy!
 
 Lowering the time will motivate users to write blogs about features, 
 how-tos, etc., which reduces the core team's workload on documentation.
 
 I have been trying to write a small how-to on writing a standalone 
 client for cTAKES from my experience (I saw at least 4 users post similar 
 questions in the last 2 months), but I am not getting enough time because 
 cTAKES depends on a lot of other frameworks (UimaFit, ClearTK, the UIMA 
 framework, etc.). Most of my spare time is spent juggling between these 
 frameworks, posting on and browsing those forums, and relating 
 observations to the cTAKES code. I think we need to have some high-level 
 documentation about these (with links to the corresponding forums).
 
 The above case is for developers (I think this will be the larger user 
 base as cTAKES progresses); for users, I think the documentation is a lot 
 better, though some improvements need to be made.
 
 As a developer, I found the lack of sample training data tough (I am 
 still struggling in this area even though I browsed all the relevant 
 code), though the training classes are there. I understand that there are 
 licensing issues with REAL data, but at least some handmade example 
 sentences, which may not be real, would help developers understand the 
 type/structure of input the TRAINING classes expect. This way people who 
 browse the code can reverse engineer and develop their own models. Sorry 
 if you guys feel this is a novice issue, but I feel most developers will 
 be novices when they adopt a system, and Machine Learning/NLP is an 
 ocean. Some documentation in this area will save a lot of time for us.
 
 I wish there will be some activity in this area 

RE: Next cTAKES release (3.1)?

2013-07-18 Thread Savova, Guergana
Actually, MTsamples is what iDASH downloaded for their notes repository.
--Guergana

-Original Message-
From: andy mcmurry [mailto:mcmurry.a...@gmail.com] 
Sent: Wednesday, July 03, 2013 7:26 PM
To: dev@ctakes.apache.org
Subject: Re: Next cTAKES release (3.1)?

Mtsamples has lots of free public examples already but we aren't using them 
yet.  This is probably because mtsamples don't have the annotations we need to 
use them as training examples.
On Jul 3, 2013 2:46 PM, Hephaestus Studio hephaestus.stu...@gmail.com
wrote:

 @Andy - Not a doctor yet, but soon! Thanks for the promotion though, 
 one more year!

 - Apropos meds or clinical type questions: any developer on here can 
 feel free to shoot me a quick question via the list anytime; I'd be 
 happy to confirm that a drug or anything else makes sense given a 
 particular clinical/note context.

 - I wonder if there is some way in which you could guide us in making 
 better use of the medical knowledge sources (ontologies) that are 
 available. - I'd be happy to brainstorm about using existing 
 resources to help in decision making. We use these all the time in the clinic.

 @ Tim+Andy+Chen - I haven't had a chance to really start chewing into 
 the code, though I hope to over the next year; so, what kind of 
 examples would be most helpful?
 - Any particular disease processes?
 - Are you all familiar with the ubiquitous SOAP style presentation 
 that doctors use to write free notes? The few examples I clicked 
 through in the repository that Chen pointed me to are very sparse. 
 Would we want gradations? E.g., a scale from well-done notes to very 
 quick I-don't-care-because-I'm-in-a-rush notes?

 @ Chen - Thank you for the kind words. It's nice to be welcomed by a 
 community in which you hope to integrate. And thank you for pointing 
 me to the directory with the current sample notes. This was very 
 helpful in determining where those are in their development. I know 
 that each of your hospitals has a wealth of HIPAA-closed notes, but 
 I'll see what I can do to make some stereotypical open notes for 
 common disease presentations. Again: maybe a scale, not necessarily 
 just on brevity but on some other metric, whose continuum represents 
 various degrees of something, maybe of difficulty in processing? 
 Apropos code, Chen: I will help where I can, but where I want to be is 
 elbow deep in the code :)

 Finally: I haven't had a chance to look into some of the links from 
 earlier in this thread regarding open access repositories of free-text 
 clinical notes. How do you all feel about the quality of these resources?
 Abundant but low quality? Scarce, but those that are there are high quality?

 Bottom line: no problem either answering contextual questions (can 
 afib be associated with a lower gi bleed??) and no problem writing 
 some notes, only question would be, before I put in any time: what 
 disease/specialty domain?
 and would we want some system that put them on a continuum of some 
 variable, say, brevity or readability?

 Just thinking before leaping,

 Thanks,
 JG

 Sent from my iPhone

 On Jul 2, 2013, at 21:23, Chen, Pei pei.c...@childrens.harvard.edu
 wrote:

  Hi John,
  Welcome!  There are actually many ways to contribute, and it's not
  limited to just code.  It's always great to hear new ideas and
  suggestions on how to improve the software.  Therefore, even things
  like user feedback, documentation, new use cases, essentially anything
  that will make things better, would be awesome!
 
  To get started, I would suggest subscribing to the email lists.  If
  you would like to contribute anything, just create a Jira account (anyone
  should be able to do this) and add/review Jira items (add attachments
  if you like), and we can even help integrate it.
 
  We normally use Jira to keep track of issues:
  [1] https://issues.apache.org/jira/browse/ctakes
 
  Current collection of sample test notes that have been collected
  over the years:
 
  https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-regression-test/testdata/input/plaintext/
 
  
  From: Tim Miller [timothy.mil...@childrens.harvard.edu]
  Sent: Tuesday, July 02, 2013 6:31 PM
  To: dev@ctakes.apache.org
  Subject: Re: Next cTAKES release (3.1)?
 
  Agreed that you could definitely help out, and that would be a great 
  way to do so. We don't really have examples right now, more like 
  just short test sentences for showing simple results and verifying 
  that nothing has been broken by changes. I think regular length fake 
  but realistic notes would be very useful.
  Tim
 
  On 07/02/2013 05:19 PM, John Green wrote:
  Hi all,
 
  I've been following this mailing list for a couple of months. I'm a
  third-year medical student rounding the bend toward my MD. I used to be a
  computer programmer, however, and continue my own projects. I'm very
  interested in eventually contributing to cTAKES development. 

RE: Next cTAKES release (3.1)?

2013-07-18 Thread Savova, Guergana
+1 for Dr. Green generating fake but realistic-looking notes.

Dr. Green,
If you can generate a few notes that could go in the 3.1 release, that would be 
wonderful! Thank you!
--Guergana

-Original Message-
From: Tim Miller [mailto:timothy.mil...@childrens.harvard.edu] 
Sent: Tuesday, July 02, 2013 6:31 PM
To: dev@ctakes.apache.org
Subject: Re: Next cTAKES release (3.1)?

Agreed that you could definitely help out, and that would be a great way to do 
so. We don't really have examples right now, more like just short test 
sentences for showing simple results and verifying that nothing has been broken 
by changes. I think regular length fake but realistic notes would be very 
useful.
Tim

On 07/02/2013 05:19 PM, John Green wrote:
 Hi all,

 I've been following this mailing list for a couple of months. I'm a third-year 
 medical student rounding the bend toward my MD. I used to be a computer 
 programmer, however, and continue my own projects. I'm very interested in 
 eventually contributing to cTAKES development. In the meantime, given the 
 current talk of examples, if any domain-specific examples need to be generated, I 
 am knowledgeable enough in the domain that I could pound out a few free-text notes 
 made to order.

 Let me know. You all may already have docs on hand willing to do this, but if 
 not...

 John Green

 Sent from my iPhone

 On Jun 28, 2013, at 8:59, Chen, Pei pei.c...@childrens.harvard.edu wrote:

 I completely agree with making cTAKES easier to use.  I think it is exciting to 
 hear the different use cases here and to understand which areas need 
 improvement (which we hadn't thought about earlier).
 I think Tim's suggestions and the 3 concrete actionable items make a lot of 
 sense.  Hopefully this will attract new users, adopters, and perhaps more 
 committers.

 i) Make the typesystem forefront in documentation -- generate 
 javadocs and have as a link on the ctakes frontpage/sidebar
 ii) Similar to the way that we are aiming to have tests in every 
 module, also have clearly labeled examples in every module that set 
 up a pipeline, run on sample notes (could be the same sample notes 
 from the tests), and do something with the results.
 iii) Follow Giri's recommendation to have example training data for 
 people who want to take the next step and train their own models

 I think Java developers are accustomed to including a library as a 
 dependency/jar, having an API to pass input, and getting the results via POJOs, 
 so the examples could initially shield the complexity of wiring a pipeline 
 together, etc.
 If we can improve the APIs and how it gets integrated with other apps, we 
 can add any GUI/CLI tools on top of this afterwards.

 --Pei

 -Original Message-
 From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
 Sent: Friday, June 28, 2013 8:00 AM
 To: dev@ctakes.apache.org
 Subject: Re: Next cTAKES release (3.1)?

 Very interesting discussion. I think Giri is right about giving 
 example training data in the format that our training code can read. 
 While our ultimate goal would be to build and release models that 
 are completely domain-independent, in the real world it is almost 
 always better to use some domain-specific data and we should think more 
 about how to facilitate that.

 As for making it easier to get started, it is not totally clear to 
 me what this means or how to do it, so it might be useful to get 
 specific. I think our biggest hurdle is:

 1) Prerequisite of understanding UIMA/UIMAFit

 Since UIMAFit is officially becoming part of UIMA that will be 
 easier, and hopefully people will just learn the easier (in my 
 opinion) UIMAFit way than the standard UIMA way of doing things. Is 
 there something we can be doing to make understanding UIMA easier? 
 Or do we just need to say upfront that this is a prerequisite and 
 hope that people don't give up due to this thing that is out of our control?

 Another hurdle is:

 2) cTAKES is a multi-purpose developer-aimed tool

 So it's not just a matter of hiding complexity -- at some point 
 people have to understand their problem, understand cTAKES' capabilities, 
 and start coding.
 Pei's GUI will help for some common use cases but will not remove 
 the requirement that someone at the organization knows cTAKES.
 I think one part of this problem is the fact that the typesystem is 
 not well documented. A developer needs to know what the output is 
 (objects from the typesystem), how to get them (which 
 modules/pipelines), and what information is in them. So maybe on this end 
 my recommendation would be:
 i) Make the typesystem forefront in documentation -- generate 
 javadocs and have as a link on the ctakes frontpage/sidebar
 ii) Similar to the way that we are aiming to have tests in every 
 module, also have clearly labeled examples in every module that set 
 up a pipeline, run on sample notes (could be the same sample notes 
 from the tests), and do 

RE: how do you feel about putting public presentations on ctakes.apache.org ?

2013-07-08 Thread Savova, Guergana
+1

There are already a number of publications in the Publications and Acknowledgements 
section under History. Of course, the content can be reorganized.
--Guergana

-Original Message-
From: Masanz, James J. [mailto:masanz.ja...@mayo.edu] 
Sent: Wednesday, July 03, 2013 10:04 AM
To: 'dev@ctakes.apache.org'
Subject: RE: how do you feel about putting public presentations on 
ctakes.apache.org ?

+1

-Original Message-
From: dev-return-1727-Masanz.James=mayo@ctakes.apache.org 
[mailto:dev-return-1727-Masanz.James=mayo@ctakes.apache.org] On Behalf Of 
Mattmann, Chris A (398J)
Sent: Tuesday, July 02, 2013 9:54 PM
To: dev@ctakes.apache.org
Subject: Re: how do you feel about putting public presentations on 
ctakes.apache.org ?

+1 makes total sense, should be a great way to show off the project.

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department University of Southern 
California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Chen, Pei pei.c...@childrens.harvard.edu
Reply-To: dev@ctakes.apache.org dev@ctakes.apache.org
Date: Tuesday, July 2, 2013 7:45 PM
To: dev@ctakes.apache.org dev@ctakes.apache.org
Subject: RE: how do you feel about putting public presentations on 
ctakes.apache.org ?

+1 for a Related Resources with public links.
There have been a couple of recently accepted/published papers that can 
illustrate what could be done with cTAKES which would fit nicely into 
that page...

From: Girivaraprasad Nambari [girinamb...@gmail.com]
Sent: Tuesday, July 02, 2013 10:15 PM
To: dev@ctakes.apache.org
Subject: Re: how do you feel about putting public presentations on 
ctakes.apache.org ?

I think it would be good to have a page on the wiki titled something 
like "Related resources" (Apache Mahout has a similar page, I guess) and 
add relevant public links there. I know we can't pull everything onto 
this page, but at least whatever the core team thinks is valuable or 
referred to while implementing the code.

I think this page will give high level overview on what frameworks 
developers need to be aware of while using ctakes.

For example, we can add following links:

https://code.google.com/p/uimafit/wiki/GettingStarted
http://knowtator.sourceforge.net/quickstart.shtml

And some links to information on SVM and MaxEnt, 
etc. (which were referred to by the original implementer).

This way, people who want to extend or add new features by referring to the 
existing implementation will be able to go through these to 
understand what is happening inside cTAKES.

Thank you,
Giri





On Tue, Jul 2, 2013 at 7:26 PM, Andy McMurry mcmurry.a...@gmail.com
wrote:

 Argument FOR:

 Videos can also be very educational for new users!
 For example, this cTAKES description by Guergana :
 https://vimeo.com/24829353

 Publishing our slides or video -- for example, the recent NLP 
 presentations from the i2b2 user group -- gives folks a very real 
 sense of the kinds of problems we are currently working on.
 A lot of people can't make it to these events. The slides are on the 
 web anyway, just harder to find if you don't already know about them.
 Argument AGAINST:

 Sharing your ideas before they are published in a journal can be bad 
 for academic credit!
 Make every effort to separate the scientific research from the 
 engineering product.
 There is little value in sharing an idea without an implementation 
 anyway; this is already complicated enough with stable software.

 ~~~
 I can see both perspectives.
 Curious what others think about this.

 --Andy