Re: cTAKES Trunk Broken?

2015-10-15 Thread Chen, Pei
Yes, it would be great if we can fix the unit test.
"mvn test" works fine, but if you need to install or package, we had 
-DskipTests=true because the unit test attempts to load resources after the 
module has been bundled/packaged.
We should either update the unit test or allow the resource to be loaded from a 
stream.

Sent from my iPhone

On Oct 15, 2015, at 1:36 PM, Bruce Tietjen wrote:

Typically if the trunk is generally broken for everyone, it would be addressed 
pretty quickly.

I may be wrong, but I believe broken build issues should probably be addressed 
to 'dev@ctakes.apache.org' for more prompt responses.




On Thu, Oct 15, 2015 at 10:26 AM, Lewis John Mcgibbney wrote:
For example, when I try to build trunk I get the following:

Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.776 sec <<< FAILURE!

Results :

Tests in error:
  TestClearNLPPipeLine(org.apache.ctakes.dependency.parser.ae.util.TestClearNLPAnalysisEngines): URI is not hierarchical

Tests run: 1, Failures: 0, Errors: 1, Skipped: 0

[INFO] 
[INFO] Reactor Summary:
[INFO]
[INFO] Apache cTAKES .. SUCCESS [  1.328 s]
[INFO] Apache cTAKES common type system ... SUCCESS [  5.040 s]
[INFO] Apache cTAKES utils  SUCCESS [  1.620 s]
[INFO] Apache cTAKES Resources core ... SUCCESS [  0.359 s]
[INFO] Apache cTAKES core . SUCCESS [  7.365 s]
[INFO] Apache cTAKES Resources pos-tagger . SUCCESS [  1.055 s]
[INFO] Apache cTAKES part-of-speech tagger  SUCCESS [  2.798 s]
[INFO] Apache cTAKES Resources ctakes-chunker-res . SUCCESS [  0.896 s]
[INFO] Apache cTAKES chunker .. SUCCESS [  0.801 s]
[INFO] Apache cTAKES document preprocessor  SUCCESS [  1.141 s]
[INFO] Apache cTAKES Resources dictionary-lookup .. SUCCESS [  7.870 s]
[INFO] Apache cTAKES dictionary lookup  SUCCESS [  0.904 s]
[INFO] Apache cTAKES context dependent tokenizer .. SUCCESS [  0.697 s]
[INFO] Apache cTAKES Resources lvg  SUCCESS [ 12.562 s]
[INFO] Apache cTAKES LVG lexical tools  SUCCESS [  0.768 s]
[INFO] Apache cTAKES Resources ne-contexts  SUCCESS [  0.236 s]
[INFO] Apache cTAKES named entity contexts  SUCCESS [  2.246 s]
[INFO] Apache cTAKES Resources constituency-parser  SUCCESS [  2.426 s]
[INFO] Apache cTAKES Constituency Parser .. SUCCESS [  0.737 s]
[INFO] Apache cTAKES Resources coreference  SUCCESS [  3.742 s]
[INFO] Apache cTAKES Resources relation-extractor . SUCCESS [  0.409 s]
[INFO] Apache cTAKES Resources dependency-parser .. SUCCESS [ 16.388 s]
[INFO] Apache cTAKES Dependency Parser  FAILURE [  3.108 s]


My environment is

Apache Maven 3.3.3 (7994120775791599e205a5524ec3e0dfe41d4a06; 
2015-04-22T04:57:37-07:00)
Maven home: /usr/local/apache-maven-3.3.3
Java version: 1.7.0_79, vendor: Oracle Corporation
Java home: /Library/Java/JavaVirtualMachines/jdk1.7.0_79.jdk/Contents/Home/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "mac os x", version: "10.9.5", arch: "x86_64", family: "mac"

On Thu, Oct 15, 2015 at 9:11 AM, Lewis John Mcgibbney wrote:
Hi Folks,
I tried for the first time to test cTAKES trunk.
It seems to be broken; is this correct?
Are you guys interested in setting up a Jenkins build for cTAKES?
I can do this no problem, please let me know.
Thanks
Lewis


Re: URI is not hierarchical when attempting to obtain lvg.properties within JAR

2015-10-15 Thread Chen, Pei
It would be great if we could have a patch that allows lvg to load the 
resource from a stream. Thanks for looking into that, Lewis. Note, though, that 
the physical-file requirement may go deeper into the lvg code.

Sent from my iPhone

On Oct 15, 2015, at 2:36 AM, Lewis John Mcgibbney wrote:

Issue in Jira

https://issues.apache.org/jira/browse/CTAKES-385

On Wed, Oct 14, 2015 at 10:55 PM, Lewis John Mcgibbney wrote:
Hi Folks,
I am using cTAKES 3.2.2 Maven dependencies.
I have some clinical pipeline code along with cTAKES dependencies and some 
resources packaged into an uber jar which I am utilizing within my Spark driver 
code. When I submit this to the Spark cluster I get a nasty stack trace [0] 
with the following being important


Caused by: java.lang.IllegalArgumentException: URI is not hierarchical
    at java.io.File.<init>(File.java:418)
    at org.apache.ctakes.lvg.ae.LvgAnnotator.createAnnotatorDescription(LvgAnnotator.java:565)
    at it.cnr.iac.CTAKESClinicalPipelineFactory.getTokenProcessingPipeline(CTAKESClinicalPipelineFactory.java:146)

The problem here is that 
LvgAnnotator.createAnnotatorDescription(LvgAnnotator.java:565) looks as follows:


ExternalResourceFactory.createExternalResourceDescription(
  LvgCmdApiResourceImpl.class,
  new File(LvgCmdApiResourceImpl.class.getResource(
  "/org/apache/ctakes/lvg/data/config/lvg.properties").toURI()))

Here we should be using LvgCmdApiResourceImpl.class.getResourceAsStream; the 
transformation to File should then be done, if required, within 
ExternalResourceFactory.createExternalResourceDescription.
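
A minimal sketch of the stream-based approach described above. It assumes 
nothing about cTAKES or uimaFIT internals: the class ResourceToTempFile and 
both of its methods are hypothetical names used only for illustration. It shows 
that new File(URI) rejects a jar: URI (which is where "URI is not hierarchical" 
comes from) and that copying the result of getResourceAsStream to a temporary 
file still yields a physical File for code that insists on one.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; not part of cTAKES or uimaFIT.
final class ResourceToTempFile {

    // Returns true if java.io.File accepts the URI. A jar: URI is opaque
    // (not hierarchical), so the File(URI) constructor rejects it.
    static boolean fileConstructorAccepts(URI uri) {
        try {
            new File(uri);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Reads a classpath resource as a stream and copies it to a temp file,
    // so it works whether the resource sits on disk or inside a jar.
    static File copyResourceToTempFile(Class<?> anchor, String resourcePath) {
        try (InputStream in = anchor.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IllegalArgumentException("Resource not found: " + resourcePath);
            }
            Path tmp = Files.createTempFile("resource-", ".tmp");
            tmp.toFile().deleteOnExit();
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp.toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Example jar: URI; the jar path is made up for the demonstration.
        URI insideJar = URI.create(
            "jar:file:/tmp/app.jar!/org/apache/ctakes/lvg/data/config/lvg.properties");
        System.out.println("File accepts jar: URI? " + fileConstructorAccepts(insideJar));

        // A JDK class file stands in for lvg.properties so the demo is self-contained.
        File copy = copyResourceToTempFile(String.class, "/java/lang/String.class");
        System.out.println("Copied " + copy.length() + " bytes to " + copy);
    }
}
```

Whether a temporary copy is enough for lvg is an open question, since the 
physical-file requirement may go deeper than this one properties file.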

The above is an issue which has been reported on a few occasions and a fix 
somewhat proposed for a similar issue here [1][2].
I am going to submit a patch for this and submit a test. I'll open an issue in 
Jira.
Thanks
Lewis

[0] https://paste.apache.org/gDJa
[1] https://issues.apache.org/jira/browse/CTAKES-307
[2] https://issues.apache.org/jira/browse/CTAKES-89



RE: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives

2015-07-30 Thread Chen, Pei
Hi Ted/Jay,
Thanks for suggesting and taking this up….
What information will be needed to accomplish what you were thinking?
Just thinking aloud here:

1)  Test data.  I think John Green crafted about 20-30 notes in the data 
folder.  We can use this as a starting point.

2)  Code to run through the various components and pipelines?

3)  Environments to run thru different O/S/hardware, etc.?

4)  Create a Gold Standard format (Knowtator and/or Anafora).  cTAKES 
already has existing readers for those. [For ML based examples?]

I think there is a ctakes-regression project that we can probably just 
overwrite for new regression testing code.

From: Ted Strall [mailto:tstr...@yahoo.com]
Sent: Thursday, July 30, 2015 9:21 AM
To: Chen, Pei; dev@ctakes.apache.org
Subject: Re: Combining Knowledge- and Data-driven Methods for De-identification 
of Clinical Narratives

How / when can we go about getting started on this?


From: Chen, Pei pei.c...@childrens.harvard.edu
To: dev@ctakes.apache.org; Ted Strall tstr...@yahoo.com
Sent: Friday, July 24, 2015 12:52 PM
Subject: RE: Combining Knowledge- and Data-driven Methods for De-identification 
of Clinical Narratives

Ted- Welcome to the community!
I think this would be a great enhancement.
Jay- I think the BigTop folks did a lot with the smoke and integration tests... 
Do you know how they did it? Something we can reuse?
--Pei


-Original Message-
From: Ted Strall [mailto:tstr...@yahoo.com.INVALID]
Sent: Friday, July 24, 2015 12:31 PM
To: dev@ctakes.apache.org
Subject: Re: Combining Knowledge- and Data-driven Methods for De-identification 
of Clinical Narratives

I would be interested in helping to develop / maintain a regression testing 
framework for that.
I'm new to ctakes (and just recently started stalking the dev mailing list) but 
I've been a software engineer for 20 years and have done a lot of framework 
automation stuff that will probably be required. As I write this, I am working 
on an automated integration test that will run on Jenkins that fires up and 
loads an h2 database, a solr instance, an in-house indexing pipeline and an 
in-house search service, indexes 10k documents and executes and evaluates some 
canned queries before shutting itself down.
I'm also working on a MS in Predictive Analytics and I am interested in 
applying machine learning and NLP to medical informatics, so I would welcome 
the chance to get dirty with that side of stuff, also.
From: Jay Vyas jayunit100.apa...@gmail.com
To: dev@ctakes.apache.org
Sent: Friday, July 24, 2015 10:44 AM
Subject: Re: Combining Knowledge- and Data-driven Methods for De-identification 
of Clinical Narratives

Yes this is very interesting work.

-  If we have access to a large corpus of de-identified records we can 
regression test the ctakes platform.

- I can help collaborate on a regression testing framework if someone else 
wants to help maintain it.



 On Jul 24, 2015, at 11:12 AM, Pei Chen chen...@apache.org wrote:

 Hi,
 Re: http://www.sciencedirect.com/science/article/pii/S1532046415001392
 This is very interesting work and I think it would be very valuable for
 the general community.  Is this something that you may be interested in
 contributing/sharing the code with the Apache cTAKES community?
 Thanks,
 Pei



RE: Annotator POSTagger.xml

2015-07-24 Thread Chen, Pei
Maite,
That looks to be a discrepancy.
My suggestion would be to remove: POSTagger.xml from the Chunker project and 
anywhere else as it is confusing.  (I think these 'mini' pipelines were there 
when we supported those PEAR file deployments)
Would you mind double checking to see what the defaults are for those 
parameters?  If memory serves me correctly, I don't think TagDictionary is used 
anymore when we upgraded to the latest version of OpenNLP and it's most likely 
that some old descriptors were not updated.
Feel free to create a Jira to track it.

-Original Message-
From: Maite Meseure Hugues [mailto:meseure.ma...@gmail.com] 
Sent: Friday, July 24, 2015 10:50 AM
To: dev@ctakes.apache.org
Subject: Annotator POSTagger.xml

Hi everyone,

I explored the POS tagger component guide and the readme file which both 
describe the annotator called POSTagger.xml. It looks like it should have 3 
parameters:
PosModelFile, TagDictionary and CaseSensitive.

This description matches POSTagger.xml under ctakes-chunker/desc, but 
POSTagger.xml under ctakes-pos-tagger/desc has only the first parameter (this 
last directory is used in AggregatePlaintextUmlsProcessor.xml and 
AggregatePlaintextFastUmlsProcessor.xml).

Does this make a difference when running the pipeline?

Thank you for your time,

Maite


RE: UmlsConcept subject

2015-07-22 Thread Chen, Pei
Tomasz,
Thanks for bringing those up.  It would be great if you can log the real 
examples into the Jira ticket and it can be incorporated into test cases going 
forward (it may most likely need more training examples).
Also, FYI- If I recall correctly, there was nothing previously in cTAKES that 
explicitly populated the subject attribute.  The closest remotely was the regex 
that was lumped together with history.

I hope that helps...
--Pei
-Original Message-
From: Tomasz Oliwa [mailto:ol...@uchicago.edu] 
Sent: Wednesday, July 22, 2015 11:35 AM
To: dev@ctakes.apache.org
Subject: RE: UmlsConcept subject

Pei,

The SubjectCleartkAnalysisEngine is currently broken in cTAKES, I tried it with 
more examples, it just returns patient as subject.

You mentioned that this is the new Subject Classifier. 

1. What was the old module that was capturing the subject of a UmlsConcept? 

2. How can this old module be enabled in the clinical pipeline until this new 
Subject Classifier is fixed?

Thanks,
Tomasz


RE: UmlsConcept subject

2015-07-15 Thread Chen, Pei
Tomasz,
Yes, please feel free to open a Jira ticket for this. Also, be sure to 
include the version of cTAKES and the pipeline you're using.
It is possible that the new Subject Classifier isn't classifying this...

-Original Message-
From: Tomasz Oliwa [mailto:ol...@uchicago.edu] 
Sent: Wednesday, July 15, 2015 2:50 PM
To: dev@ctakes.apache.org
Subject: UmlsConcept subject

Hi,

I think there is a regression in the way cTAKES discovers the subject status 
(patient, family_member, etc.) of a UmlsConcept. Using cTAKES 3.2.2 and 
the AggregatePlaintextFastUMLSProcessor in the CVD:

1. Patient's brother has a myocardial infarction. 
myocardial infarction and infarction have subject = patient

2. Father had a myocardial infarction.
myocardial infarction and infarction have subject = patient

3. Sister was diagnosed with a myocardial infarction.
myocardial infarction and infarction have subject = patient

4. Family member had a myocardial infarction.
myocardial infarction and infarction have subject = family_member (this 
is correct)

I am looking at the code of the SubjectCleartkAnalysisEngine. Is this the class 
responsible for inferring the subject?
How can this be fixed? Should I open a JIRA ticket?

Thanks,
Tomasz
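
The four sentences in the report above could be kept as a small regression 
table, roughly as sketched here. This is only an illustration: the class 
SubjectRegressionCases is hypothetical, the classifier is passed in as a plain 
function rather than through any cTAKES API, and the expected value 
family_member for sentences 1-3 is an assumption drawn from the report, which 
only marks sentence 4 as correct.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical regression table; it does not depend on any cTAKES class.
final class SubjectRegressionCases {

    // Sentences from the report mapped to the subject one would expect.
    // The expected values for the first three are an assumption.
    static Map<String, String> expectedSubjects() {
        Map<String, String> cases = new LinkedHashMap<>();
        cases.put("Patient's brother has a myocardial infarction.", "family_member");
        cases.put("Father had a myocardial infarction.", "family_member");
        cases.put("Sister was diagnosed with a myocardial infarction.", "family_member");
        cases.put("Family member had a myocardial infarction.", "family_member");
        return cases;
    }

    // Runs every sentence through the classifier and collects the mismatches.
    static List<String> failures(Function<String, String> classifier) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, String> c : expectedSubjects().entrySet()) {
            String actual = classifier.apply(c.getKey());
            if (!c.getValue().equals(actual)) {
                failed.add(c.getKey() + " -> " + actual);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Stand-in that mimics the reported behaviour: only the literal
        // "Family member" sentence is classified as family_member.
        Function<String, String> reported =
            s -> s.startsWith("Family member") ? "family_member" : "patient";
        for (String failure : failures(reported)) {
            System.out.println("FAIL: " + failure);
        }
    }
}
```

In a real test the function argument would wrap whatever pipeline produces the 
subject attribute.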


Re: Training model to detect a pattern

2015-07-04 Thread Chen, Pei
Soumya,
Could you elaborate a bit on what you mean by pattern? Perhaps an example would 
be helpful.
--Pei

Sent from my iPhone

On Jul 3, 2015, at 7:27 AM, Soumya Shree soumya.sh...@citiustech.com wrote:

Hi folks,

I need to train my system to detect a pattern wherever it occurs in the 
input. Does cTAKES offer an API or anything similar for this? I tried the 
Chunker, but it operates on the whole sentence rather than on a specific 
pattern. I would appreciate any help with this.


Thanks & Regards,
Soumya Shree



Re: keep file name when using CPE_GUI

2015-06-14 Thread Chen, Pei
Samir,
Which CAS consumer are you using?

Sent from my iPhone

On Jun 14, 2015, at 11:24 AM, samir chabou samir...@yahoo.com wrote:

Hi,
When I use CPE_GUI it does not keep the input file name but it changes it to 
doc0.
Example input file = test123.txt
the output file for test123.txt from the CPE_GUI = doc0. Is there any way to 
get the output file name = test123 instead of doc0 ?
Thanks for your help


Re: Integration of Tika with cTAKES

2015-06-07 Thread Chen, Pei
This looks awesome. 
Perhaps we can reuse the Tika server on the ctakes demo VM. 

Sent from my iPhone

 On Jun 6, 2015, at 8:40 PM, jay vyas jayunit100.apa...@gmail.com wrote:
 
 This is awesome; thanks!
 
 For some of the new ctakes projects where folks are aiming at using it
 with big data tooling, the Tika abstraction might be super useful.
 On Jun 6, 2015 8:19 PM, Mattmann, Chris A (3980) 
 chris.a.mattm...@jpl.nasa.gov wrote:
 
 Hey cTAKES peeps!
 
 We went ahead and integrated Tika with cTAKES for a project I’m
 working on at JPL. It will be part of the 1.9 release of Tika. You
 can check it out here:
 
 https://wiki.apache.org/tika/cTAKESParser
  
 
 
 Feedback welcomed. cTAKES is rad!
 
 Cheers,
 Chris
 
 ++
 Chris Mattmann, Ph.D.
 Chief Architect
 Instrument Software and Science Data Systems Section (398)
 NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
 Office: 168-519, Mailstop: 168-527
 Email: chris.a.mattm...@nasa.gov
 WWW:  http://sunset.usc.edu/~mattmann/
  
 ++
 Adjunct Associate Professor, Computer Science Department
 University of Southern California, Los Angeles, CA 90089 USA
 ++
 
 
 


RE: Downloads link broken

2015-05-29 Thread Chen, Pei
Hi Tom,
Thanks for pointing that out.  There was a copy and paste error on the website 
links for the resources.
Should be fixed now.

-Original Message-
From: Tom Devel [mailto:deve...@gmail.com] 
Sent: Friday, May 29, 2015 6:23 PM
To: dev@ctakes.apache.org
Subject: Re: Downloads link broken

The download page, the links to the installation and the links to the source 
code are working now, many thanks.
The links to the resources at sourceforge still point to files that do not 
exist: the links point to 3.2.2.1 files, but there are 3.2.1.1 files there.

On Fri, May 29, 2015 at 3:29 PM, Pei Chen chen...@apache.org wrote:

 It looks like downloads.cgi on the web site didn't have the executable 
 svn property set in staging, causing the 500 Internal Server Error.
 That should be fixed now.

 On Fri, May 29, 2015 at 3:05 PM, Tom Devel deve...@gmail.com wrote:

  The links on http://ctakes.apache.org/downloads are broken, too:
 
  The User Installation link points to a URL that does not exist, for example:
 
  http://ctakes.apache.org/%5Bpreferred%5D/ctakes/ctakes-3.2.2/apache-ctakes-3.2.2-bin.tar.gz
 
  Source code points to a URL that does not exist:
 
  http://ctakes.apache.org/%5Bpreferred%5D/ctakes/ctakes-3.2.2/apache-ctakes-3.2.2-src.tar.gz
 
  Umls dictionary points to sourceforge, which displays:
  The /ctakessnorx-3.2.2.1..kessnorx-3.2.2.1.zip file could not be 
  found or is not available. Please select another file.
  The /ctakes-resources-3...rces-3.2.2.1-bin.zip file could not be 
  found or is not available. Please select another file.
 
  On Fri, May 29, 2015 at 1:52 PM, andy mcmurry 
  mcmurry.a...@gmail.com
  wrote:
 
   Homepage points to
   http://ctakes.apache.org/downloads.cgi
  
   should be
   http://ctakes.apache.org/downloads
  
 



[CANCEL] [VOTE] Release Apache cTAKES 3.2.2 (rc1)

2015-05-12 Thread Chen, Pei
Cancelling this rc1 so we can squeeze the UMLS validation fix into this patch 
release.
Will create a rc2 for voting instead.
--Pei

-Original Message-
From: Pei Chen [mailto:chen...@apache.org] 
Sent: Tuesday, May 05, 2015 2:06 PM
To: dev@ctakes.apache.org
Subject: [VOTE] Release Apache cTAKES 3.2.2 (rc1)

This is a call for a vote on releasing the following candidate (rc1) as Apache 
cTAKES 3.2.2.

The major changes include:
- Improved optional Temporal models (Time + Event Relationships models now
available)
- Other bug fixes/enhancements from Jira (see release notes Jira link below).

I manually downloaded the bin as well as resources and tried the CVD with the 
AggregatePlaintextFastUMLSProcessor.xml and CPE testing the 
AggregateCdaProcessor.

Would be great if folks have time to test/verify, especially if you opened any 
of the Jira issues below, to ensure the bugs have been fixed/integrated.

For more detailed information on the changes/release notes, please visit:

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313621&version=12328717

The release was made using the cTAKES release process documented here:

http://svn.apache.org/repos/asf/ctakes/site/backup/content/ctakes-release-guide.mdtext

The candidate is available at:
https://dist.apache.org/repos/dist/dev/ctakes/ctakes-3.2.2-rc1/apache-ctakes-3.2.2-src.tar.gz
/.zip

The tag to be voted on:
http://svn.apache.org/repos/asf/ctakes/tags/ctakes-3.2.2-rc1

The MD5 checksum of the tarball can be found at:
https://dist.apache.org/repos/dist/dev/ctakes/ctakes-3.2.2-rc1/apache-ctakes-3.2.2-src.tar.gz.md5
/.zip.md5

The signature of the tarball can be found at:

https://dist.apache.org/repos/dist/dev/ctakes/ctakes-3.2.2-rc1/apache-ctakes-3.2.2-src.tar.gz.asc
/.zip.asc

Apache cTAKES' KEYS file, containing the PGP keys used to sign the release:
https://dist.apache.org/repos/dist/release/ctakes/KEYS

Please vote on releasing these packages as Apache cTAKES 3.2.2. The vote is 
open for at least the next 72 hours.

The vote passes if at least three binding +1 votes are cast.

[ ] +1 Release the packages as Apache cTAKES 3.2.2

[ ] -1 Do not release the packages because...


Also, the convenience binary can be found at:

https://dist.apache.org/repos/dist/dev/ctakes/ctakes-3.2.2-rc1/apache-ctakes-3.2.2-bin.tar.gz.md5
/.zip


Thanks!


RE: svn commit: r1677903 - in /ctakes/trunk/ctakes-dictionary-lookup-fast/src/main/java/org/apache/ctakes/dictionary/lookup2: concept/BsvConceptFactory.java dictionary/BsvRareWordDictionary.java util/

2015-05-05 Thread Chen, Pei
Can we use InputStreamReader instead of FileReader?
That way the resource can also be read from within a jar (potentially from 
maven central, etc.) and doesn't have to be fixed to a physical file...

i.e., instead of
  new BufferedReader(new FileReader(path))
use
  new BufferedReader(new InputStreamReader(FileLocator.getAsStream(path)))
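
A rough sketch of what stream-based BSV reading could look like, under stated 
assumptions: BsvStreamReader and readBsv are hypothetical names, the sample 
rows are made up, and a ByteArrayInputStream stands in for whatever stream 
FileLocator.getAsStream(path) would return. The point is only that the parsing 
code takes an InputStream, so it no longer cares whether the dictionary lives 
on disk or inside a jar.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for illustration; not the cTAKES implementation.
final class BsvStreamReader {

    // Splits each non-blank line on '|' (CUI|TUI|Text|PreferredTerm) and
    // returns the columns; the source of the stream is irrelevant here.
    static List<String[]> readBsv(InputStream in) {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                rows.add(line.split("\\|"));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample rows; real dictionaries would come from a resource stream.
        String sample = "C0000001|T001|term one|Term One\n"
                      + "\n"
                      + "C0000002|T002|term two|Term Two\n";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        for (String[] row : readBsv(in)) {
            System.out.println(row[0] + " -> " + row[2]);
        }
    }
}
```

Passing the charset explicitly also avoids depending on the platform default 
encoding, which FileReader uses implicitly.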

--Pei

-Original Message-
From: seanfi...@apache.org [mailto:seanfi...@apache.org] 
Sent: Tuesday, May 05, 2015 6:42 PM
To: comm...@ctakes.apache.org
Subject: svn commit: r1677903 - in 
/ctakes/trunk/ctakes-dictionary-lookup-fast/src/main/java/org/apache/ctakes/dictionary/lookup2:
 concept/BsvConceptFactory.java dictionary/BsvRareWordDictionary.java 
util/JdbcConnectionFactory.java

Author: seanfinan
Date: Tue May  5 22:41:26 2015
New Revision: 1677903

URL: http://svn.apache.org/r1677903
Log:
Use FileLocator to find BSV dictionaries

Modified:

ctakes/trunk/ctakes-dictionary-lookup-fast/src/main/java/org/apache/ctakes/dictionary/lookup2/concept/BsvConceptFactory.java

ctakes/trunk/ctakes-dictionary-lookup-fast/src/main/java/org/apache/ctakes/dictionary/lookup2/dictionary/BsvRareWordDictionary.java

ctakes/trunk/ctakes-dictionary-lookup-fast/src/main/java/org/apache/ctakes/dictionary/lookup2/util/JdbcConnectionFactory.java

Modified: ctakes/trunk/ctakes-dictionary-lookup-fast/src/main/java/org/apache/ctakes/dictionary/lookup2/concept/BsvConceptFactory.java
URL: http://svn.apache.org/viewvc/ctakes/trunk/ctakes-dictionary-lookup-fast/src/main/java/org/apache/ctakes/dictionary/lookup2/concept/BsvConceptFactory.java?rev=1677903&r1=1677902&r2=1677903&view=diff
==
--- ctakes/trunk/ctakes-dictionary-lookup-fast/src/main/java/org/apache/ctakes/dictionary/lookup2/concept/BsvConceptFactory.java (original)
+++ ctakes/trunk/ctakes-dictionary-lookup-fast/src/main/java/org/apache/ctakes/dictionary/lookup2/concept/BsvConceptFactory.java Tue May  5 22:41:26 2015
@@ -1,5 +1,6 @@
 package org.apache.ctakes.dictionary.lookup2.concept;
 
+import org.apache.ctakes.core.resource.FileLocator;
 import org.apache.ctakes.dictionary.lookup2.util.CuiCodeUtil;
 import org.apache.ctakes.dictionary.lookup2.util.LookupUtil;
 import org.apache.ctakes.dictionary.lookup2.util.TuiCodeUtil;
@@ -34,11 +35,12 @@ final public class BsvConceptFactory imp
    }
 
    public BsvConceptFactory( final String name, final String bsvFilePath ) {
-      this( name, new File( bsvFilePath ) );
-   }
-
-   public BsvConceptFactory( final String name, final File bsvFile ) {
-      final Collection<CuiTuiTerm> cuiTuiTerms = parseBsvFile( bsvFile );
+//      this( name, new File( bsvFilePath ) );
+//   }
+//
+//   public BsvConceptFactory( final String name, final File bsvFile ) {
+//      final Collection<CuiTuiTerm> cuiTuiTerms = parseBsvFile( bsvFile );
+      final Collection<CuiTuiTerm> cuiTuiTerms = parseBsvFile( bsvFilePath );
       final Map<Long, Concept> conceptMap = new HashMap<>( cuiTuiTerms.size() );
       for ( CuiTuiTerm cuiTuiTerm : cuiTuiTerms ) {
          final CollectionMap<ConceptCode, String, ? extends Collection<String>> codes
@@ -90,11 +92,21 @@ final public class BsvConceptFactory imp
     * CUI|TUI|Text|PreferredTerm
     * </p>
     * If the TUI column is omitted then the entityId for the dictionary is used as the TUI
+    * <p/>
+    * //* @param bsvFile file containing term rows and bsv columns
     *
-    * @param bsvFile file containing term rows and bsv columns
+    * @param bsvFilePath file containing term rows and bsv columns
     * @return collection of all valid terms read from the bsv file
     */
-   static private Collection<CuiTuiTerm> parseBsvFile( final File bsvFile ) {
+//   static private Collection<CuiTuiTerm> parseBsvFile( final File bsvFile ) {
+   static private Collection<CuiTuiTerm> parseBsvFile( final String bsvFilePath ) {
+      File bsvFile = null;
+      try {
+         bsvFile = FileLocator.locateFile( bsvFilePath );
+      } catch ( IOException ioE ) {
+         ioE.getMessage();
+         return Collections.emptyList();
+      }
       final Collection<CuiTuiTerm> cuiTuiTerms = new ArrayList<>();
       try ( final BufferedReader reader = new BufferedReader( new FileReader( bsvFile ) ) ) {
          String line = reader.readLine();

Modified: 
ctakes/trunk/ctakes-dictionary-lookup-fast/src/main/java/org/apache/ctakes/dictionary/lookup2/dictionary/BsvRareWordDictionary.java
URL: 

RE: Prep for upcoming cTAKES 3.2.2 Patch Release

2015-04-30 Thread Chen, Pei
My vote would be to push forward.
The old assertion module also had its share of bugs/issues, and this gives an 
incentive to improve the new models.  
And there's currently always the option for a user to easily revert back to the 
old since it's not removed yet...
--Pei

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] 
Sent: Thursday, April 30, 2015 9:14 AM
To: dev@ctakes.apache.org
Subject: Re: Prep for upcoming cTAKES 3.2.2 Patch Release

A question about the default pipelines. There has been some concern about the 
new assertion modules (the machine learning ones that I worked on), partially 
due to some less intuitive error modes than negex and partially due to its 
reliance on the dependency parser which increases the memory footprint 
substantially. Should we consider reverting to the rule-based negation for the 
default pipeline (thus also removing the dependency parser from the default 
pipeline)? I'm not sure what that would mean for the other assertion modules 
(uncertainty, generic, subject, hypothetical) -- but I think it means they 
would not exist.

I can see arguments both ways. I also think if we revert we would want to have 
some way for people to access all the machine learning assertion modules if 
they want them.

Tim


On 04/29/2015 06:04 PM, Chen, Pei wrote:
 FYI- I will plan to create a 3.2.2 branch from trunk this week in prep for 
 the 3.2.2 release so others can continue their work in trunk.
 Feel free to put any changes in trunk now if you want to have it included in 
 the 3.2.2 patch release.
 The main changes are:

 1)  Improved temporal models

 2)  Minor bug fixes reported in Jira

 From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu]
 Sent: Thursday, March 12, 2015 12:55 PM
 To: dev@ctakes.apache.org
 Subject: Prep for upcoming cTAKES 3.2.2 Patch Release

 I was thinking of creating a 3.2.2 release for Mar (it's long past the 
 original Jan date?)  I can volunteer to be the RM again.
 There are still plenty of unresolved items... If you plan to have anything 
 you would like included in the upcoming release, please mark it in Jira and 
 plan the commits accordingly...

 Jira Items:
 https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%203.2.2%20AND%20project%20%3D%20CTAKES
 1-25 of 25

 CTAKES-349 [Bug, Major] OPEN, Unresolved
   JdbcWriterTemplate does not store rows if there are fewer than 100 per note
   Assignee: Unassigned; Reporter: Sean Finan
   Created: 12/Mar/15; Updated: 12/Mar/15
   https://issues.apache.org/jira/browse/CTAKES-349

 CTAKES-347 [Bug]
   AggregateCdaProcessor fails with URI is not hierarchical
   https://issues.apache.org/jira/browse/CTAKES-347

RE: Prep for upcoming cTAKES 3.2.2 Patch Release

2015-04-29 Thread Chen, Pei
FYI- I will plan to create a 3.2.2 branch from trunk this week in prep for the 
3.2.2 release so others can continue their work in trunk.
Feel free to put any changes in trunk now if you want to have it included in 
the 3.2.2 patch release.
The main changes are:

1)  Improved temporal models

2)  Minor bug fixes reported in Jira

From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu]
Sent: Thursday, March 12, 2015 12:55 PM
To: dev@ctakes.apache.org
Subject: Prep for upcoming cTAKES 3.2.2 Patch Release

I was thinking of creating a 3.2.2 release for Mar (it's long past the 
original Jan date?)  I can volunteer to be the RM again.
There are still plenty of unresolved items... If you plan to have anything you 
would like included in the upcoming release, please mark it in Jira and plan 
the commits accordingly...

Jira Items:
https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%203.2.2%20AND%20project%20%3D%20CTAKES
1-25 of 25

CTAKES-349 [Bug, Major] OPEN, Unresolved
  JdbcWriterTemplate does not store rows if there are fewer than 100 per note
  Assignee: Unassigned; Reporter: Sean Finan
  Created: 12/Mar/15; Updated: 12/Mar/15
  https://issues.apache.org/jira/browse/CTAKES-349

CTAKES-347 [Bug, Major] OPEN, Unresolved
  AggregateCdaProcessor fails with URI is not hierarchical
  Assignee: Unassigned; Reporter: Pei Chen
  Created: 02/Feb/15; Updated: 02/Feb/15
  https://issues.apache.org/jira/browse/CTAKES-347

CTAKES-344 [Improvement]
  Add DrugNER to clinical-pipeline
  Assignee: Pei Chen; Reporter: Pei Chen
  https://issues.apache.org/jira/browse/CTAKES-344

Re: Include the smoking status detection in AggregatePlaintextFastUMLSProcessor.xml

2015-04-20 Thread Chen, Pei
Great. There is a redundant negation step in one of the final smoking status 
sub-descriptor XMLs. 
Leave the Jira open as a placeholder for cleaning up the smoking status 
descriptors.

Sent from my iPhone

 On Apr 20, 2015, at 1:11 PM, Tom Devel deve...@gmail.com wrote:
 
 Pei,
 
 I did what you recommended: I ran a test input with this new pipeline and
 did a diff with the clinical pipeline without the smoking status on the two
 CAS files. It seems to do the trick, the Umls concept tags are still the
 same, and there is now a new tag for the smoking status annotation, great!
 
 Before I create the Jira item, what do you mean by removing the last
 NegEx?
 
 In AggregatePlaintextFastUMLSProcessor, the node of the NegationAnnotator
 is commented out:
 <!-- <node>NegationAnnotator</node> -->
 
 Did you mean this node?
 
 At the top of the file, there is an import for the NegationAnnotator:
 <delegateAnalysisEngine key="NegationAnnotator">, but it is not commented
 out and never run in the fixed flow.
 
 Am I correct that the negation detection in the clinical pipeline is now
 performed by PolarityCleartkAnalysisEngine?
 
 Thanks,
 Tom
 
 On Sat, Apr 18, 2015 at 12:53 AM, Pei Chen chen...@apache.org wrote:
 
 Tom,
 I would put it at the end of the pipeline (at a min, it should be behind
 sectionizer, sentence, tokenizer, lvg).  I would remove
 ExternalBaseAggregateTAE, as this simulates the sectionizer, sentence,
 tokenizer, and lvg steps, which would be redundant.  I would also probably
 remove the last NegEx, which could override the assertion values.
 
 Disclaimer: I did not test this yet.  Feel free to open a Jira item if it
 works for you so it can be tracked.  It seems kind of strange to have a
 descriptor xml define another xml descriptor to be loaded up via code
 again- I think this could be simplified.
 --Pei
 
 On Thu, Apr 16, 2015 at 7:29 PM, Tom Devel deve...@gmail.com wrote:
 
 Hi,
 
 I am using the smoking status AE from SimulatedProdSmokingTAE.xml, it works
 fine, I can see the smoking status annotation in the CVD.
 
 Now I would like to include the smoking status detection in the clinical
 pipeline of AggregatePlaintextFastUMLSProcessor.xml, so that when I run the
 clinical pipeline, the smoking status will also be determined.
 
 How can I do this?
 
 I am thinking to just put the nodes from the fixed flow of
 SimulatedProdSmokingTAE.xml into the fixed flow of
 AggregatePlaintextFastUMLSProcessor.xml, is this the right approach?
 
 If so, at which exact place in the clinical pipeline fixed flow should
 these nodes be added?
 
 Is there a preferred place (such as append after the last node or put
 before the first node) ?
 
 Can a wrong position or ordering of the smoking status nodes damage/corrupt
 the rest of the annotations?
 
 SimulatedProdSmokingTAE.xml contains these lines with the fixed flow:
 
 <fixedFlow>
   <node>ExternalBaseAggregateTAE</node>
   <node>SentenceAdjuster</node>
   <node>ClassifiableEntriesAnnotator</node>
 </fixedFlow>
 
 AggregatePlaintextFastUMLSProcessor.xml (3.2.2 from SVN) contains this
 fixed flow:
 
 <fixedFlow>
   <node>SimpleSegmentAnnotator</node>
   <node>SentenceDetectorAnnotator</node>
   <node>TokenizerAnnotator</node>
   <node>LvgAnnotator</node>
   <node>ContextDependentTokenizerAnnotator</node>
   <node>POSTagger</node>
   <!-- <node>ClearPOSTagger</node> -->
   <node>Chunker</node>
   <node>AdjustNounPhraseToIncludeFollowingNP</node>
   <node>AdjustNounPhraseToIncludeFollowingPPNP</node>
   <!--<node>LookupWindowAnnotator</node>-->
   <node>DictionaryLookupAnnotatorDB</node>
   <node>DrugNER</node>
   <node>DependencyParser</node>
   <node>SemanticRoleLabeler</node>
   <node>ConstituencyParser</node>
   <!-- <node>AssertionAnnotator</node> -->
   <!-- <node>StatusAnnotator</node> -->
   <!-- <node>NegationAnnotator</node> -->
   <node>GenericCleartkAnalysisEngine</node>
   <node>HistoryCleartkAnalysisEngine</node>
   <node>PolarityCleartkAnalysisEngine</node>
   <node>SubjectCleartkAnalysisEngine</node>
   <node>UncertaintyCleartkAnalysisEngine</node>
 
   <node>ExtractionPrepAnnotator</node>
 </fixedFlow>
 
 Thanks for any help or pointers,
 
 Tom
 


Prep for upcoming cTAKES 3.2.2 Patch Release

2015-03-12 Thread Chen, Pei
I was thinking of creating a 3.2.2 release for Mar (it's long past the 
original Jan date?)  I can volunteer to be the RM again.
There are still plenty of unresolved items... If you plan to have anything you 
would like included in the upcoming release, please mark it in Jira and plan 
the commits accordingly...

Jira Items:
https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%203.2.2%20AND%20project%20%3D%20CTAKES
1-25 of 25

CTAKES-349 [Bug, Major] OPEN, Unresolved
  JdbcWriterTemplate does not store rows if there are fewer than 100 per note
  Assignee: Unassigned; Reporter: Sean Finan
  Created: 12/Mar/15; Updated: 12/Mar/15
  https://issues.apache.org/jira/browse/CTAKES-349

CTAKES-347 [Bug, Major] OPEN, Unresolved
  AggregateCdaProcessor fails with URI is not hierarchical
  Assignee: Unassigned; Reporter: Pei Chen
  Created: 02/Feb/15; Updated: 02/Feb/15
  https://issues.apache.org/jira/browse/CTAKES-347

CTAKES-344 [Improvement, Major] RESOLVED, Fixed
  Add DrugNER to clinical-pipeline
  Assignee: Pei Chen; Reporter: Pei Chen
  Created: 18/Dec/14; Updated: 18/Dec/14
  https://issues.apache.org/jira/browse/CTAKES-344

CTAKES-341 [Bug, Major] OPEN, Unresolved
  FractionFSM annotates incorrect span
  Assignee: britt fitch; Reporter: britt fitch
  Created: 12/Dec/14; Updated: 12/Dec/14
  https://issues.apache.org/jira/browse/CTAKES-341

CTAKES-340 [Bug, Minor] RESOLVED, Fixed
  AggregatePlaintextProcessor.xml has invalid xml
  Assignee: Unassigned; Reporter: Pei Chen
  Created: 05/Dec/14; Updated: 05/Dec/14
  https://issues.apache.org/jira/browse/CTAKES-340

CTAKES-338 [Improvement, Minor] OPEN, Unresolved
  Download/Unpack full LVG by default
  Assignee: Unassigned; Reporter: Pei Chen
  Created: 01/Dec/14; Updated: 01/Dec/14
  https://issues.apache.org/jira/browse/CTAKES-338

CTAKES-333 [Bug, Major] RESOLVED, Fixed
  jwnl-1.3.3.jar; error in opening zip file
  Assignee: Pei Chen; Reporter: Pei Chen
  Created: 18/Nov/14; Updated: 18/Nov/14
  https://issues.apache.org/jira/browse/CTAKES-333

CTAKES-332 [Bug, Minor] RESOLVED, Fixed
  Upgrade to OpenNLP 1.5.2 - 1.5.3
  Assignee: Pei Chen; Reporter: Pei Chen
  Created: 18/Nov/14; Updated: 18/Nov/14
  https://issues.apache.org/jira/browse/CTAKES-332

CTAKES-328 [Improvement, Major] OPEN, Unresolved
  Clean up XML Annotator Descriptors
  Assignee: Unassigned; Reporter: Pei Chen
  Created: 04/Nov/14; Updated: 04/Nov/14
  https://issues.apache.org/jira/browse/CTAKES-328

CTAKES-320 [Bug, Major] OPEN, Unresolved
  Methods used by getDefaultPipeline should be able to load reasonable
  defaults without expecting external files.
  Assignee: Unassigned; Reporter: jay vyas
  Created: 11/Oct/14; Updated: 23/Nov/14
  https://issues.apache.org/jira/browse/CTAKES-320

CTAKES-302 [Bug, Major] OPEN, Unresolved
  Element type hibernate-mapping must be followed by either attribute
  specifications, > or />.
  Assignee: Unassigned; Reporter: James Joseph Masanz
  Created: 27/Jun/14; Updated: 07/Nov/14
  https://issues.apache.org/jira/browse/CTAKES-302

CTAKES-295 [Improvement]
  Use UIMAFit-style configuration
  https://issues.apache.org/jira/browse/CTAKES-295

RE: [DISCUSS] new cTAKES web site

2015-01-06 Thread Chen, Pei
Thanks for the responses!  It looks like there is a strong preference for 
Option 1:
Results Tally:
Option 1 – (7) Britt, Tim, John, Oleg, Brian, Sarma, Todd
Option 2 – (1) James: Only the Top Nav/Menu Bar
Option 3 – (0)
Option 4 – (4) Tim, James, Chen, Sarma

I suggest we converge with #1's code and incorporate the comments/feedback.

Michelle, would it be possible to attach the code of #1 into the Jira [1]?  
That way, it's gone thru the proper channels and you will receive credit for 
your contribution(s).
I will then move it to the central repo [2] that we can all work off there 
going forward.
[1] https://issues.apache.org/jira/browse/CTAKES-342
[2] http://svn.apache.org/repos/asf/ctakes/site/

--Pei

-Original Message-
From: Lingren, Todd [mailto:todd.ling...@cchmc.org] 
Sent: Friday, January 02, 2015 2:01 PM
To: dev@ctakes.apache.org; ksa...@gmail.com; u...@ctakes.apache.org
Subject: RE: [DISCUSS] new cTAKES web site

#1 is definitely my favorite. 
However, use #3 front page graphic in the #1 website 'examples' link. 

Todd Lingren
Biomedical Informatics
Cincinnati Children’s Hospital
todd.ling...@cchmc.org
513-803-9032

-Original Message-
From: Karthik Sarma [mailto:ksa...@gmail.com]
Sent: Friday, January 02, 2015 11:42 AM
To: u...@ctakes.apache.org
Cc: dev@ctakes.apache.org
Subject: Re: [DISCUSS] new cTAKES web site

Thanks so much for your work on these! I like 1 and 4 best, pretty much for the 
same reasons as everyone else.

Hope you all had a great holiday season!




--
Karthik Sarma
UCLA Medical Scientist Training Program Class of 20??
Member, UCLA Medical Imaging  Informatics Lab Member, CA Delegation to the 
House of Delegates of the American Medical Association ksa...@ksarma.com
gchat: ksa...@gmail.com
linkedin: www.linkedin.com/in/ksarma

On Wed, Dec 31, 2014 at 1:53 PM, Chen, Pei pei.c...@childrens.harvard.edu
wrote:

  Hi folks,
 Michelle, Sean, Guergana, and Co. have created a few mockups for the 
 new cTAKES website.  Which option would folks prefer?
 This is purely on the design intent, and layout, etc.  (not actual 
 content).

 Option 1: http://mwchen.scripts.mit.edu/cTakes/mock0/index.html
 Option 2: http://mwchen.scripts.mit.edu/cTakes/mock1/index.html
 Option 3: http://svn.apache.org/repos/asf/ctakes/site/new/index.html
 Option 4: http://svn.apache.org/repos/asf/ctakes/site/new/index2.html





[DISCUSS] new cTAKES web site

2014-12-31 Thread Chen, Pei
Hi folks,
Michelle, Sean, Guergana, and Co. have created a few mockups for the new cTAKES 
website.  Which option would folks prefer?
This is purely on the design intent, and layout, etc.  (not actual content).
Option 1: http://mwchen.scripts.mit.edu/cTakes/mock0/index.html
Option 2: http://mwchen.scripts.mit.edu/cTakes/mock1/index.html
Option 3: http://svn.apache.org/repos/asf/ctakes/site/new/index.html
Option 4: http://svn.apache.org/repos/asf/ctakes/site/new/index2.html



RE: cTakes Annotation Comparison

2014-12-19 Thread Chen, Pei
Also check out stats that Sean ran before releasing the new component on:
http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-fast/doc/DictionaryLookupStats.docx
From the evaluation and experience, the new lookup algorithm should be a huge 
improvement in terms of both speed and accuracy.
This is very different from what Bruce mentioned…  I’m sure Sean will chime in 
here.
(The old dictionary lookup is essentially obsolete now- plagued with 
bugs/issues as you mentioned.)
--Pei

From: Kim Ebert [mailto:kim.eb...@perfectsearchcorp.com]
Sent: Friday, December 19, 2014 10:25 AM
To: dev@ctakes.apache.org
Subject: Re: cTakes Annotation Comparison

Guergana,

I'm curious about the number of records in your gold standard sets, and whether 
your gold standard set was run through a long-running cTAKES process. I know at 
some point we fixed a bug in the old dictionary lookup that caused the 
permutations to become corrupted over time. Typically this isn't seen in the 
first few records, but as patterns are used the permutations would 
become corrupted. This caused documents that were fed through cTAKES more than 
once to have fewer codes returned than the first time.

For example, if a permutation of 4,2,3,1 was found, the permutation would be 
corrupted to be 1,2,3,4. It would no longer be possible to detect permutations 
of 4,2,3,1 until cTAKES was restarted. We got the fix in after the cTAKES 3.2.0 
release. https://issues.apache.org/jira/browse/CTAKES-310 Depending upon the 
corpus size, I could see the permutation engine eventually only have a single 
permutation of 1,2,3,4.
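
To make the symptom concrete, here is a minimal sketch of one way such corruption can happen. This is not the cTAKES code and not necessarily the cause fixed in CTAKES-310; it assumes, purely for illustration, a cached permutation list that a caller sorts in place. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these names come from cTAKES.
class PermutationCacheSketch {

    // A permutation that is computed once and then shared by every lookup.
    private final List<Integer> cached = new ArrayList<>(Arrays.asList(4, 2, 3, 1));

    // Buggy variant: sorts the shared list itself, so the cached
    // permutation 4,2,3,1 silently becomes 1,2,3,4 for all later callers.
    List<Integer> sortedInPlace() {
        Collections.sort(cached);
        return cached;
    }

    // Safer variant: sorts a private copy and leaves the cache untouched.
    List<Integer> sortedCopy() {
        List<Integer> copy = new ArrayList<>(cached);
        Collections.sort(copy);
        return copy;
    }

    // Read-only view of the cached permutation, for inspection.
    List<Integer> cachedPermutation() {
        return Collections.unmodifiableList(cached);
    }

    public static void main(String[] args) {
        PermutationCacheSketch cache = new PermutationCacheSketch();
        cache.sortedCopy();
        System.out.println("after sortedCopy:    " + cache.cachedPermutation());
        cache.sortedInPlace();
        System.out.println("after sortedInPlace: " + cache.cachedPermutation());
    }
}
```

With shared state like this, the first documents look fine and the loss of permutations only shows up after the process has been running for a while, which matches the behavior described above.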

Typically though, this isn't very easily detected in the first 100 or so 
documents.

We discovered this issue when we made cTAKES have consistent output of codes in 
our system.

Kim Ebert
Software Engineer
IMAT Solutions, http://imatsolutions.com
Office: 801.669.7342
kim.eb...@imatsolutions.com

On 12/19/2014 07:05 AM, Savova, Guergana wrote:

We are doing a similar kind of evaluation and will report the results.



Before we released the Fast lookup, we did a systematic evaluation across three 
gold standard sets. We did not see the trend that Bruce reported below. The P, 
R and F1 results from the old dictionary look up and the fast one were similar.



Thank you everyone!

--Guergana



-Original Message-
From: David Kincaid [mailto:kincaid.d...@gmail.com]
Sent: Friday, December 19, 2014 9:02 AM
To: dev@ctakes.apache.org
Subject: Re: cTakes Annotation Comparison



Thanks for this, Bruce! Very interesting work. It confirms what I've seen in my 
small tests that I've done in a non-systematic way. Did you happen to capture 
the number of false positives yet (annotations made by cTAKES that are not in 
the human adjudicated standard)? I've seen a lot of dictionary hits that are 
not actually entity mentions, but I haven't had a chance to do a systematic 
analysis (we're working on our annotated gold standard now). One great example 
is the antibiotic Today. Every time the word today appears in any text it is 
annotated as a medication mention when it almost never is being used in that 
sense.



These results by themselves are quite disappointing to me. Both the 
UMLSProcessor and especially the FastUMLSProcessor seem to have pretty poor 
recall. It seems like the trade off for more speed is a ten-fold (or more) 
decrease in entity recognition.



Thanks again for sharing your results with us. I think they are very useful to 
the project.



- Dave



On Thu, Dec 18, 2014 at 5:06 PM, Bruce Tietjen 
bruce.tiet...@perfectsearchcorp.com wrote:



Actually, we are working on a similar tool to compare it to the human
adjudicated standard for the set we tested against.  I didn't mention
it before because the tool isn't complete yet, but initial results for
the set (excluding those marked as CUI-less) were as follows:

Human adjudicated annotations: 4591 (excluding CUI-less)

Annotations found matching the human adjudicated standard
  UMLSProcessor      2245
  FastUMLSProcessor   215
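
For readers who want the figures above as rates, the following is a small sketch of the recall arithmetic. It assumes that "annotations found matching" counts true positives against the 4591 adjudicated annotations; the class and method names are illustrative. Precision cannot be derived from these numbers alone, because false positives against the gold standard were not reported.

```java
// Illustrative sketch; the names are not from cTAKES or the compare tool.
class RecallSketch {

    // Recall = matched gold annotations / total gold annotations.
    static double recall(int matched, int goldTotal) {
        if (goldTotal <= 0) {
            throw new IllegalArgumentException("goldTotal must be positive");
        }
        return (double) matched / goldTotal;
    }

    public static void main(String[] args) {
        int gold = 4591; // human adjudicated annotations, excluding CUI-less
        System.out.printf("UMLSProcessor recall:     %.3f%n", recall(2245, gold));
        System.out.printf("FastUMLSProcessor recall: %.3f%n", recall(215, gold));
    }
}
```

Under that assumption the UMLSProcessor recovers roughly 49% of the adjudicated annotations and the FastUMLSProcessor roughly 5%, which is the approximately ten-fold gap discussed in this thread.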

Bruce Tietjen
Senior Software Engineer
IMAT Solutions, http://imatsolutions.com
Mobile: 801.634.1547
bruce.tiet...@imatsolutions.com



On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei
pei.c...@childrens.harvard.edu wrote:

Bruce,
Thanks for this-- very useful.
Perhaps Sean Finan can comment more,
but it's also probably worth it to compare to an adjudicated human
annotated gold standard.

--Pei

-Original Message-
From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com]
Sent: Thursday, December 18, 2014 1:45 PM
To: dev@ctakes.apache.org
Subject: cTakes Annotation Comparison



With the recent release of cTakes

RE: drug ner in ctakes 3.2.1

2014-12-18 Thread Chen, Pei
Matt,
The below change has been made in trunk:
http://svn.apache.org/r1646497
https://issues.apache.org/jira/browse/CTAKES-344
(if you make the change, be sure to also include ctakes-drug-ner in your 
pom.xml)

--Pei

-Original Message-
From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu] 
Sent: Tuesday, December 16, 2014 11:17 AM
To: dev@ctakes.apache.org
Subject: RE: drug ner in ctakes 3.2.1

Matt,
I think the easiest would be to just add DrugNER Annotator to your existing 
AggregatePlaintextFastUMLSProcessor.xml.

<delegateAnalysisEngine key="DrugNER">
  <import location="../../../ctakes-drug-ner/desc/analysis_engine/DrugMentionAnnotator.xml"/>
</delegateAnalysisEngine>
... // then add it after the DictionaryLookupAnnotatorDB: <node>DrugNER</node>

I actually think we should add the DrugNER in by default and remove all of the 
redundant descriptors.
If it works for you, perhaps create a Jira for the next patch?



-Original Message-
From: Matt Work Coarr [mailto:mattcoarr.w...@gmail.com]
Sent: Tuesday, December 16, 2014 11:07 AM
To: dev@ctakes.apache.org
Subject: drug ner in ctakes 3.2.1

I'm getting an error when I try to load the Drug NER pipeline in the ctakes
3.2.1 CVD.

I received a similar error in ctakes 3.2.0 for the clinical pipeline, but the 
clinical pipeline is working for me now in 3.2.1. :-)

Is Drug NER supported in 3.2.1 and can it use the new fast dictionary lookup?  
Or is there more work required there?

FYI...

I'm loading
desc/ctakes-drug-ner/desc/analysis_engine/DrugAggregatePlaintextUMLSProcessor.xml

Here's the error message:

java.lang.IllegalArgumentException: URI is not hierarchical
More detailed information is in the log file.


Here's the stack trace:

12/15/14 4:57:45 PM - 15: org.apache.uima.tools.cvd.MainFrame.handleException(526): SEVERE: URI is not hierarchical

java.lang.IllegalArgumentException: URI is not hierarchical
    at java.io.File.<init>(File.java:418)
    at org.apache.ctakes.core.resource.FileResourceImpl.load(FileResourceImpl.java:44)
    at org.apache.uima.resource.impl.ResourceManager_impl.registerResource(ResourceManager_impl.java:603)
    at org.apache.uima.resource.impl.ResourceManager_impl.initializeExternalResources(ResourceManager_impl.java:442)
    at org.apache.uima.resource.Resource_ImplBase.initialize(Resource_ImplBase.java:153)
    at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.initialize(AnalysisEngineImplBase.java:157)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:123)
    at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
    at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
    at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
    at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254)
    at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431)
    at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375)
    at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185)
    at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
    at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
    at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
    at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:354)
    at org.apache.uima.tools.cvd.MainFrame.setupAE(MainFrame.java:1484)
    at org.apache.uima.tools.cvd.MainFrame.loadAEDescriptor(MainFrame.java:476)
    at org.apache.uima.tools.cvd.control.AnnotatorOpenEventHandler.actionPerformed(AnnotatorOpenEventHandler.java:52)
    at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
    at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2346)
    at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
    at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
    at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
    at javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:833)
    at com.apple.laf.AquaMenuItemUI.doClick(AquaMenuItemUI.java:157)
    at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:877)
    at java.awt.Component.processMouseEvent(Component.java:6525)
    at javax.swing.JComponent.processMouseEvent(JComponent.java:3321)
    at java.awt.Component.processEvent(Component.java:6290)
    at java.awt.Container.processEvent(Container.java:2234)
    at java.awt.Component.dispatchEventImpl(Component.java:4881)
    at java.awt.Container.dispatchEventImpl(Container.java:2292
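
The top two frames show the exception being thrown from the `java.io.File(URI)` constructor. That constructor rejects opaque URIs, and the `jar:file:...!/...` URIs that point at resources inside a jar are opaque. The sketch below reproduces the message with a made-up jar path and shows the usual stream-based alternative. It is a minimal illustration under those assumptions, not the actual `FileResourceImpl` code or its eventual fix, and the class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;

// Hypothetical illustration; not taken from cTAKES or UIMA.
class ResourceLoadSketch {

    // Returns the message of the exception thrown by new File(uri),
    // or null if the URI can be turned into a File.
    static String fileConstructorMessage(String uriText) {
        try {
            new File(URI.create(uriText));
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    // Reads a classpath resource as a stream and returns its size in bytes.
    // This works whether the resource sits in a directory or inside a jar.
    static int countResourceBytes(Class<?> anchor, String name) {
        try (InputStream in = anchor.getResourceAsStream(name)) {
            if (in == null) {
                throw new IllegalArgumentException("Resource not found: " + name);
            }
            byte[] buffer = new byte[4096];
            int total = 0;
            int read;
            while ((read = in.read(buffer)) != -1) {
                total += read;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up path: the jar does not need to exist to trigger the error.
        System.out.println(fileConstructorMessage("jar:file:/tmp/example.jar!/org/example/data.txt"));
        // A resource that is always on the classpath, used here only as a stand-in.
        System.out.println(countResourceBytes(Object.class, "Object.class") + " bytes read via stream");
    }
}
```

Reading through `getResourceAsStream` never needs a `File`, so it behaves the same before and after the resources are packaged.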

RE: cTakes Annotation Comparison

2014-12-18 Thread Chen, Pei
Bruce,
Thanks for this-- very useful.
Perhaps Sean Finan can comment more, 
but it's also probably worth it to compare to an adjudicated human annotated 
gold standard.

--Pei

-Original Message-
From: Bruce Tietjen [mailto:bruce.tiet...@perfectsearchcorp.com] 
Sent: Thursday, December 18, 2014 1:45 PM
To: dev@ctakes.apache.org
Subject: cTakes Annotation Comparison

With the recent release of cTakes 3.2.1, we were very interested in checking 
for any differences in annotations between using the 
AggregatePlaintextUMLSProcessor pipeline and the 
AggregatePlaintextFastUMLSProcessor pipeline within this release of cTakes with 
its associated set of UMLS resources.

We chose to use the SHARE 14-a-b Training data that consists of 199 documents 
(Discharge  61, ECG 54, Echo 42 and Radiology 42) as the basis for the 
comparison.

We decided to share a summary of the results with the development community.

Documents Processed: 199

Processing Time:
  UMLSProcessor        2,439 seconds
  FastUMLSProcessor    1,837 seconds

Total Annotations Reported:
  UMLSProcessor       20,365 annotations
  FastUMLSProcessor    8,284 annotations

Annotation Comparisons:
  Annotations common to both sets:                       3,940
  Annotations reported only by the UMLSProcessor:       16,425
  Annotations reported only by the FastUMLSProcessor:    4,344


If anyone is interested, the following was our test procedure:

We used the UIMA CPE to process the document set twice, once using the 
AggregatePlaintextUMLSProcessor pipeline and once using the 
AggregatePlaintextFastUMLSProcessor pipeline. We used the WriteCAStoFile CAS 
consumer to write the results to output files.

We used a tool we recently developed to analyze and compare the annotations 
generated by the two pipelines. The tool compares the two outputs for each file 
and reports any differences in the annotations (MedicationMention, 
SignSymptomMention, ProcedureMention, AnatomicalSiteMention, and
DiseaseDisorderMention) between the two output sets. The tool reports the 
number of 'matches' and 'misses' between each annotation set. A 'match' is 
defined as the presence of an identified source text interval with its 
associated CUI appearing in both annotation sets. A 'miss' is defined as the 
presence of an identified source text interval and its associated CUI in one 
annotation set, but no matching identified source text interval and CUI in the 
other. The tool also reports the total number of annotations (source text 
intervals with associated CUIs) reported in each annotation set. The compare 
tool is in our GitHub repository at 
https://github.com/perfectsearch/cTAKES-compare
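
For readers who want to see the match/miss definition in concrete terms, here is 
a minimal sketch. It is not the code of the compare tool linked above; the Span 
class, the compare method, and the example CUIs used with it are hypothetical 
names made up for illustration. It only assumes that an annotation can be 
reduced to a (begin, end, CUI) triple, as the description says.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class AnnotationCompare {

    /** Hypothetical value type: a source text interval plus its CUI (not a cTAKES class). */
    public static final class Span {
        final int begin;
        final int end;
        final String cui;

        public Span(int begin, int end, String cui) {
            this.begin = begin;
            this.end = end;
            this.cui = cui;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Span)) {
                return false;
            }
            Span s = (Span) o;
            return begin == s.begin && end == s.end && cui.equals(s.cui);
        }

        @Override
        public int hashCode() {
            return Objects.hash(begin, end, cui);
        }
    }

    /**
     * Returns { matches, misses only in a, misses only in b }.
     * A match is the same interval and CUI present in both sets.
     */
    public static int[] compare(Set<Span> a, Set<Span> b) {
        Set<Span> common = new HashSet<>(a);
        common.retainAll(b); // set intersection
        return new int[] {
            common.size(),
            a.size() - common.size(),
            b.size() - common.size()
        };
    }
}
```

Under this definition the matches plus the misses on one side always add up to 
that side's total, which agrees with the summary: 3,940 + 16,425 = 20,365 and 
3,940 + 4,344 = 8,284.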


RE: revamping the Apache cTAKES website

2014-12-15 Thread Chen, Pei
Check out a mockup of a new website proposal:
http://svn.apache.org/repos/asf/ctakes/site/new/index.html
Based off bootstrap (Idea borrowed from the Spark folks..).

Couple of key pieces of info:
- 10% of visitors are on mobile/tablets
- The most visited pages are currently downloads.cgi and gettingstarted.html.  I 
suggest we focus our attention on those 2 items.  (Putting a Downloads link 
right on the front page, etc.)

svn co http://svn.apache.org/repos/asf/ctakes/site/new if you want to check out 
the code of the site.

--Pei

-Original Message-
From: John Green [mailto:john.travis.gr...@gmail.com] 
Sent: Friday, December 05, 2014 6:34 PM
To: dev@ctakes.apache.org
Cc: dev@ctakes.apache.org
Subject: RE: revamping the Apache cTAKES website

I would like to second the bootstrap recommendation, with the additional 
recommendation of django for the backend. It is an amazing platform for rapid 
development and easy updating.


JG
—
Sent from Mailbox

On Fri, Dec 5, 2014 at 12:15 PM, Savova, Guergana 
guergana.sav...@childrens.harvard.edu wrote:

 There are now 4 volunteers:
 Michelle Chen
 Pei Chen
 Sean Finan
 Guergana Savova
 --Guergana
 -Original Message-
 From: Savova, Guergana [mailto:guergana.sav...@childrens.harvard.edu]
 Sent: Friday, December 05, 2014 11:56 AM
 To: dev@ctakes.apache.org
 Subject: RE: revamping the Apache cTAKES website
 Wonderful, thank you, Michelle! There will be a flurry of emails the week of 
 Dec 15 followed by actual work, so book your calendar if possible...
 --Guergana
 -Original Message-
 From: Michelle Chen [mailto:michelle1919c...@gmail.com]
 Sent: Friday, December 05, 2014 11:48 AM
 To: dev@ctakes.apache.org
 Subject: Re: revamping the Apache cTAKES website
 Hello Guergana,
 I don't know that much about cTakes, but would be interested in contributing to 
 the effort.
 I'm not sure if there is an interest in matching the website design of other 
 Apache projects, but from my arbitrary search on 
 http://projects.apache.org/indexes/alpha.html it seems that the two main 
 designs being used are 1. the current design that cTakes is using and 2. a 
 Bootstrap approach.
 I've done a little bit of work on Bootstrap and would be interested in 
 helping with that. Let me know how I can be helpful.
 Sincerely,
 Michelle Chen :)
 Be strong and of good courage; do not be afraid, nor be dismayed, for 
 the Lord your God is with you wherever you go. ~Joshua 1:9
 On Fri, Dec 5, 2014 at 11:21 AM, Savova, Guergana guergana.sav...@childrens.harvard.edu 
 wrote:
 cTAKES-ers,

 we would like to start working on updating the Apache cTAKES website 
 - some of the information there is already stale and needs refreshing.
 Do you have ideas on website design, content, etc.? Would you like to 
 contribute to the effort? We are planning to start working on the 
 website the week of Dec 15.

 Cheers,
 --Guergana




RE: UMLS validation url

2014-11-24 Thread Chen, Pei
That’s a typo in the fast dictionary lookup.
It should be: https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser

Jira raised for this: https://issues.apache.org/jira/browse/CTAKES-335


From: Kim Ebert [mailto:kim.eb...@imatsolutions.com]
Sent: Monday, November 24, 2014 1:28 PM
To: dev@ctakes.apache.org
Subject: UMLS validation url

Hi All,

Today I noticed that https://uts-ws.nlm.nih.gov/restful/isValidctakes.umlsuser 
is returning 404 messages. Anyone else running into the same problem?

Thanks,
--
IMAT Solutions (http://imatsolutions.com)
Kim Ebert
Software Engineer
Office: 801.669.7342
kim.eb...@imatsolutions.com


RE: running 3.2.1rc failed. suggestion?

2014-11-18 Thread Chen, Pei
Budi,
It looks like there may have been an issue with the sourceforge mirrors for the 
3.2.1.1 resources (the size should be about 627MB, not 200MB).
I refreshed it… Could you try now?
Also, be sure to merge the resources folder (rather than replacing it).
Hope that helps…
--Pei

From: Budi Wibowo [mailto:heisb...@umich.edu]
Sent: Tuesday, November 18, 2014 11:17 AM
To: dev@ctakes.apache.org
Subject: running 3.2.1rc failed. suggestion?

Hello,
I'm new to CTAKES and I'm trying to use it for a class project. I'm trying to run 
3.2.1rc, but it failed.

This is what I did:
1. I downloaded the 3.2.1.1 resource file from 
http://sourceforge.net/projects/ctakesresources/files/, but it says 
invalid when I try to unzip it
2. I used 3.2.0 resource files instead
3. When running runctakesCPE.bat I picked test1.xml from 
desc\ctakes-clinical-pipeline\desc\collection_processing_engine
4. I used AggregatePlaintextFastUMLSProcessor.xml for the analysis engine. The 
error I'm getting is attached.

Is it because I'm using the wrong resource file? The 3.2.1.1 one seems to be 
corrupt.


[Inline image 1]


RE: Using Ctakes takes a long time to process text

2014-11-17 Thread Chen, Pei
Budi,
You can also try out 
ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextFastUMLSProcessor.xml
 available in the current 3.2.1-rc.
It contains a new dictionary lookup algorithm from Sean that is roughly 1000% 
faster for each pipeline.

--Pei

From: Kim Ebert [mailto:kim.eb...@imatsolutions.com]
Sent: Monday, November 17, 2014 11:55 AM
To: dev@ctakes.apache.org
Subject: Re: Using Ctakes takes a long time to process text

Hi,

cTakes is currently single threaded. To increase throughput, we use a patch or 
two to run several pipelines at once. To increase your performance, I would 
recommend splitting up the work into multiple batches.

To get cTakes to run multiple threads, you have to patch the LVG. I believe the 
patch is available in the bug tracker.

You also would need to run them as separate pipelines inside of UIMA. You can't 
just say increase the number of threads for this operation. While we aren't 
using static variables, state is maintained inside of the object. 
http://uima.apache.org/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.applications.multi_threaded

https://issues.apache.org/jira/browse/CTAKES-151
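
The batching idea above could be sketched roughly as follows. This is only an 
illustration of "one pipeline instance per thread, each working on its own 
batch"; it does not use any UIMA or cTAKES API. BatchRunner, split, runAll and 
the pipelineFactory parameter are hypothetical names, and the Function stands in 
for whatever object actually wraps a pipeline. It also assumes the LVG patch 
mentioned above is in place, since without it the pipelines are reportedly not 
safe to run side by side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.function.Supplier;

public class BatchRunner {

    /** Splits items round-robin into exactly n batches (some may be empty). */
    public static <T> List<List<T>> split(List<T> items, int n) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            batches.add(new ArrayList<>());
        }
        for (int i = 0; i < items.size(); i++) {
            batches.get(i % n).add(items.get(i));
        }
        return batches;
    }

    /**
     * Runs each batch on its own thread. Every thread asks the factory for a
     * fresh pipeline, so no pipeline state is shared between threads.
     * Results come back grouped by batch, not in the original input order.
     */
    public static <T, R> List<R> runAll(List<T> items, int n,
                                        Supplier<Function<T, R>> pipelineFactory) {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            List<Future<List<R>>> futures = new ArrayList<>();
            for (List<T> batch : split(items, n)) {
                Callable<List<R>> task = () -> {
                    Function<T, R> pipeline = pipelineFactory.get(); // hypothetical stand-in
                    List<R> out = new ArrayList<>();
                    for (T item : batch) {
                        out.add(pipeline.apply(item));
                    }
                    return out;
                };
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<List<R>> f : futures) {
                results.addAll(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Creating the pipeline inside the task rather than once up front is the point of 
the sketch: it mirrors the advice to run separate pipelines instead of raising 
a thread count on a single one.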

IMAT Solutions (http://imatsolutions.com)
Kim Ebert
Software Engineer
Office: 801.669.7342
kim.eb...@imatsolutions.com

On 11/16/2014 12:06 AM, Budi Wibowo wrote:

Hello,

I'm using CTAKES for a class project.
I'm using the CPE to process clinical text notes.
I ran the software with the -Xmx10g option.
I have 16 GB in my machine.

My problem is:
running the CPE takes a long, long time.
I'm processing 175 clinical notes with a total of 40MB for all the notes.
The CPE has been running for close to 4 hours now, and it has only been able to 
process 104 out of 175.

I'm using the test1.xml CPE descriptor and the 
AggregatePlaintextUMLSProcessor.xml analysis engine (clinical-text-pipeline).

It seems like Java is only using 2 cores out of the 16 I have. RAM usage hovers 
between 5 and 8 GB.

Is there any way I can make the software run a bit faster?


RE: Announcement: UMLS MedGen-MySQL dataset now available as open access download

2014-11-13 Thread Chen, Pei
John- I believe that was the thinking.
Andy- Just to confirm- Is the raw content of this dataset released under 
ASL2.0?  i.e. can you contribute it as a CSV or similar so that cTAKES may 
re-tokenize it using the same PTB rules, format it for cTAKES' dictionary 
lookup, etc., and then redistribute it under the same License.

 -Original Message-
 From: John Green [mailto:john.travis.gr...@gmail.com]
 Sent: Thursday, November 13, 2014 1:55 PM
 To: dev@ctakes.apache.org
 Cc: dev@ctakes.apache.org
 Subject: Re: Announcement: UMLS MedGen-MySQL dataset now available
 as open access download
 
 Would the old licensed setup be kept as a packaged option, much as it is
 now, with the unlicensed one going out in place of the current free
 dictionary? Am I understanding that right?
 
 
 JG
 —
 Sent from Mailbox
 
 On Thu, Nov 13, 2014 at 1:40 PM, andy mcmurry
 mcmurry.a...@gmail.com
 wrote:
 
  I'll crunch the numbers -- in the meantime I can tell you that
  phenotypes vary by semantic type. clinical attributes  from SNOMED are
  abundant, many concepts in mesh that are mapped to diseases. Tons of
  pharmacological substances
  On Nov 12, 2014 6:19 AM, Dligach, Dmitriy 
  dmitriy.dlig...@childrens.harvard.edu wrote:
  Andy, thank you for this resource!
 
  Do you have an estimate of what percentage of UMLS concepts were left
 out?
 
  Dima
 
 
 
 
  On Nov 11, 2014, at 16:02, andy mcmurry mcmurry.a...@gmail.com
 wrote:
 
   Hello!
  
   https://bitbucket.org/invitae/medgen-mysql (Apache Licensed ASL2)
  
   We just released a new library containing a huge chunk of UMLS
   concepts which are available without registering
 accounts/username/passwords.
   LEGALLY. Yes, really!
  
   The subset is from NCBI and it contains *thousands of concepts from
  SNOMED
   and other vocabularies*.
  
   The code is essentially
   1. a list of WGET targets to various NCBI FTP site mirrors
   2. Makefile for building the databases of interest
  
   Our legal team has approved distribution for Open Access work, ASL2
   LICENSE.
  
   I recommend we use this opportunity to make this the default
   distribution for CTAKES UMLS connections, because it obviates the
   need for so much painful credentialing and back and forth
   agreements with the US National Library of Medicine.
  
   Cheers!
   --Andy
  
  
   On Wed, Sep 10, 2014 at 12:13 PM, Masanz, James J. 
  masanz.ja...@mayo.edu
   wrote:
  
  
   I would love to see the install be as simple as apt-get install to
   end
  up
   with some working dictionary that have more than a handful of
   entries to get them started.
  
   Regards,
   James Masanz
  
   -Original Message-
   From: andy mcmurry [mailto:mcmurry.a...@gmail.com]
   Sent: Tuesday, September 09, 2014 4:32 PM
   To: ctakes-...@incubator.apache.org
   Subject: Recommendation for ctakes default (UMLS) dictionaries
  
   Greetings ctakes-dev:
  
   *UMLS license restrictions have been getting more lax over the
   years -- *much of the UMLS can be downloaded directly from the
   NCBI official FTP site.
  
   In fact, the NIH (and implicitly the NLM) *have already made the
  standard
   terms public for some medical specialities*.
  
   For example: Here is the UMLS subset specific to Medical Genetics
  (MedGen)
   and Genetic Testing (GTR) complete with SNOMED-CT concept CUI(s)
   and
  names,
   etc :
  
   [  ftp://ftp.ncbi.nlm.nih.gov/pub/medgen/README.html  ]
  
   My team has developed a JVM based wrapper for MetaMap 2013AB
 which
   I intend to open source soon (Clojure).  It includes REST support
   for invoking MetaMap with any or all of the command line arguments.
   We do not integrate with UIMA, we are basically a wrapper around
   the binary installation of MetaMap. The emphasis is on publication
   text not clinical text, still, some services are common (such as LVG).
  
   Strangely, the NLM still requires UMLS licenses to download
   MetaMap execution binaries. The MetaMap binary install is better
   but customizing dictionaries (DataFileBuilder) is not as easy to
   use as CTAKES with YTEX
  
   [ https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation ]
  
   *** Hence, there is a real opportunity here to enable Apache
   cTAKES to have a stronger default dictionary. ** *
  
   Imagine if we could
   *$ apt-get install apache-ctakes *
  
   and instantly have a working package for SOME problem domain.
   In my case (Medical Genetics) the UMLS definitions are already
   available and the UMLS license problem becomes a non issue, at
   least for many
  first
   time users
  
   Your thoughts?
   AndyMC
  
 
 


Re: svn commit: r1637884 - in /ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal: ae/ eval/

2014-11-10 Thread Chen, Pei
Chen,
Does this need to go into this upcoming release or can it wait till the next 
one?

Sent from my iPhone

 On Nov 10, 2014, at 10:20 AM, c...@apache.org c...@apache.org wrote:
 
 Author: clin
 Date: Mon Nov 10 15:19:55 2014
 New Revision: 1637884
 
 URL: http://svn.apache.org/r1637884
 Log:
 add annotators and update evaluation code for i2b2 data.
 add more system-generated events for candidate temporal relations.
 
 Added:

 ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/EventEventI2B2RelationAnnotator.java

 ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/EventTimeI2B2RelationAnnotator.java
 Modified:

 ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/EventEventRelationAnnotator.java

 ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/eval/EvaluationOfEventEventThymeRelations.java

 ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/eval/EvaluationOfEventTimeRelations.java

 ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/eval/EvaluationOfI2B2TemporalRelations.java
 
 Added: 
 ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/EventEventI2B2RelationAnnotator.java
 URL: 
 http://svn.apache.org/viewvc/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/EventEventI2B2RelationAnnotator.java?rev=1637884&view=auto
 ==
 --- 
 ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/EventEventI2B2RelationAnnotator.java
  (added)
 +++ 
 ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/EventEventI2B2RelationAnnotator.java
  Mon Nov 10 15:19:55 2014
 @@ -0,0 +1,280 @@
 +/**
 + * Licensed to the Apache Software Foundation (ASF) under one
 + * or more contributor license agreements.  See the NOTICE file
 + * distributed with this work for additional information
 + * regarding copyright ownership.  The ASF licenses this file
 + * to you under the Apache License, Version 2.0 (the
 + * "License"); you may not use this file except in compliance
 + * with the License.  You may obtain a copy of the License at
 + *
 + *   http://www.apache.org/licenses/LICENSE-2.0
 + *
 + * Unless required by applicable law or agreed to in writing,
 + * software distributed under the License is distributed on an
 + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 + * KIND, either express or implied.  See the License for the
 + * specific language governing permissions and limitations
 + * under the License.
 + */
 +package org.apache.ctakes.temporal.ae;
 +
 +import java.io.File;
 +import java.util.ArrayList;
 +import java.util.Arrays;
 +import java.util.Collection;
 +import java.util.List;
 +import java.util.Map;
 +
 +import org.apache.ctakes.relationextractor.ae.RelationExtractorAnnotator;
 +import org.apache.ctakes.relationextractor.ae.features.PartOfSpeechFeaturesExtractor;
 +import org.apache.ctakes.relationextractor.ae.features.RelationFeaturesExtractor;
 +//import org.apache.ctakes.relationextractor.ae.features.TokenFeaturesExtractor;
 +import org.apache.ctakes.temporal.ae.feature.CheckSpecialWordRelationExtractor;
 +import org.apache.ctakes.temporal.ae.feature.ConjunctionRelationFeaturesExtractor;
 +//import org.apache.ctakes.temporal.ae.feature.DependencyParseUtils;
 +import org.apache.ctakes.temporal.ae.feature.DependencyPathFeaturesExtractor;
 +import org.apache.ctakes.temporal.ae.feature.CoordinateFeaturesExtractor;
 +import org.apache.ctakes.temporal.ae.feature.DependingVerbsFeatureExtractor;
 +//import org.apache.ctakes.temporal.ae.feature.EventInBetweenPropertyExtractor;
 +//import org.apache.ctakes.temporal.ae.feature.EventOutsidePropertyExtractor;
 +import org.apache.ctakes.temporal.ae.feature.SpecialAnnotationRelationExtractor;
 +import org.apache.ctakes.temporal.ae.feature.TemporalPETFlatExtractor;
 +import org.apache.ctakes.temporal.ae.feature.TokenPropertyFeaturesExtractor;
 +import org.apache.ctakes.temporal.ae.feature.DeterminerRelationFeaturesExtractor;
 +import org.apache.ctakes.temporal.ae.feature.EventArgumentPropertyExtractor;
 +import org.apache.ctakes.temporal.ae.feature.EventTimeRelationFeatureExtractor;
 +import org.apache.ctakes.temporal.ae.feature.EventPositionRelationFeaturesExtractor;
 +import org.apache.ctakes.temporal.ae.feature.NumberOfEventsInTheSameSentenceExtractor;
 +import org.apache.ctakes.temporal.ae.feature.NearbyVerbTenseRelationExtractor;
 +import org.apache.ctakes.temporal.ae.feature.NumberOfEventTimeBetweenCandidatesExtractor;
 +import org.apache.ctakes.temporal.ae.feature.OverlappedHeadFeaturesExtractor;
 +import org.apache.ctakes.temporal.ae.feature.SRLRelationFeaturesExtractor;
 +import org.apache.ctakes.temporal.ae.feature.TimeXRelationFeaturesExtractor;
 +import org.apache.ctakes.temporal.ae.feature.SectionHeaderRelationExtractor;
 

Apache cTAKES 3.2.1 (rc1)

2014-11-10 Thread Chen, Pei
RC1 ready for testing:
Binary Artifacts: https://dist.apache.org/repos/dist/dev/ctakes/ctakes-3.2.1/
Tag: https://svn.apache.org/repos/asf/ctakes/tags/ctakes-3.2.1-rc1/

It would be great if folks have time to test/verify, especially if you opened 
any of the Jiras below, to ensure the bugs have been fixed/integrated.

Changes:
Sub-task
· CTAKES-312 (https://issues.apache.org/jira/browse/CTAKES-312) - upgrade uimafit
· CTAKES-324 (https://issues.apache.org/jira/browse/CTAKES-324) - Deploy 3.2.1 
resources from SF to Maven Central/OSSonatype
Bug
· CTAKES-162 (https://issues.apache.org/jira/browse/CTAKES-162) - Command line 
scripts leave the user back one directory
· CTAKES-169 (https://issues.apache.org/jira/browse/CTAKES-169) - 
SectionSegmentAnnotator.java is in core, but the sample 
SectionSegmentAnnotator.xml descriptor is in ctakes-clinical-pipeline
· CTAKES-241 (https://issues.apache.org/jira/browse/CTAKES-241) - 
NullPointerException in ctakes-assertion
· CTAKES-280 (https://issues.apache.org/jira/browse/CTAKES-280) - upgrade to 
cleartk-2.*
· CTAKES-285 (https://issues.apache.org/jira/browse/CTAKES-285) - 
cleartk-ml-liblinear needs to be added to the dependencies
· CTAKES-307 (https://issues.apache.org/jira/browse/CTAKES-307) - URI is not 
hierarchical when running mvn install
· CTAKES-309 (https://issues.apache.org/jira/browse/CTAKES-309) - Add 
SNOMEDCT_US to ytext db scripts
· CTAKES-310 (https://issues.apache.org/jira/browse/CTAKES-310) - Dictionary 
lookup permutations sort issue
· CTAKES-311 (https://issues.apache.org/jira/browse/CTAKES-311) - 
v_document_cui_sent View returns no results in cTAKES-YTEX
· CTAKES-319 (https://issues.apache.org/jira/browse/CTAKES-319) - YTEX Web 
Semantic Search not starting in Linux
· CTAKES-321 (https://issues.apache.org/jira/browse/CTAKES-321) - Verify 
ctakes-ytex 3rd party dependencies
· CTAKES-327 (https://issues.apache.org/jira/browse/CTAKES-327) - Make inner 
classes import explicit - if not, class names can be ambiguous and depend 
on compiler order.
Improvement
· CTAKES-94 (https://issues.apache.org/jira/browse/CTAKES-94) - refactoring 
assertion module to use a cleartk-based analysis engine (and include evaluation)
· CTAKES-222 (https://issues.apache.org/jira/browse/CTAKES-222) - 
FirstTokenPermLookupInitializerImpl to suppot arraylist of 
DictionaryLookupWindows
· CTAKES-225 (https://issues.apache.org/jira/browse/CTAKES-225) - Common Type 
System - Add field to save preferredText in Segment
· CTAKES-325 (https://issues.apache.org/jira/browse/CTAKES-325) - Create 
aggregate pipeline to use dictionary-lookup-fast
· CTAKES-326 (https://issues.apache.org/jira/browse/CTAKES-326) - Default 
Assertion to use new clearTK based models
New Feature
· CTAKES-329 (https://issues.apache.org/jira/browse/CTAKES-329) - Add temporal 
event-event and event-time relation discovery models
Task
· CTAKES-315 (https://issues.apache.org/jira/browse/CTAKES-315) - Update 
Default UMLS pipeline to use dictionary-lookup-fast
· CTAKES-323 (https://issues.apache.org/jira/browse/CTAKES-323) - Create a 
3.2.1 Release
Create a 3.2.1 Release



RE: ctakes-dictionary-lookup-fast

2014-11-07 Thread Chen, Pei
Attached screenshots of CVD output to the Jira[1].
As much as I hate maintaining more desc xml's, I think it's prudent to 
create a separate one for a patch release temporarily for 
ctakes-dictionary-lookup-fast so users do not get blindsided by the change in 
output.
So users can still choose the existing behavior: 
AggregatePlaintextUMLSProcessor.xml
Or the new dictionary lookup: AggregatePlaintextFastUMLSProcessor.xml

[1] https://issues.apache.org/jira/browse/CTAKES-325

We can replace the xml's in the next major/minor release...
--Pei

 -Original Message-
 From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
 Sent: Thursday, November 06, 2014 10:17 PM
 To: 'dev@ctakes.apache.org'
 Subject: RE: ctakes-dictionary-lookup-fast
 
 The image  didn't come through for me. Can you post the image somewhere
 and send the url? Thanks.
 
 
 From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu]
 Sent: Thursday, November 06, 2014 2:55 PM
 To: dev@ctakes.apache.org
 Subject: ctakes-dictionary-lookup-fast
 
 Hi,
 The original plan was to update AggregatePlaintextUMLSProcessor.xml to
 use the new ultrafast dictionary lookup in the upcoming 3.2.1 release.
 However, the output is slightly different from the old cTAKES dictionary
 lookup: it no longer has a SNOMED/RXNORM consumer (it returns CUIs only and
 doesn't post-process them to map back to the SNOMED/RXNORM codes).  This can
 certainly be done again, but I am not sure how many people depend
 on AggregatePlaintextUMLSProcessor.xml for us to consider this a patch
 release.
 Some Options/Ideas:
 
 1)  Create an AggregatePlaintextUMLSFastProcessor.xml which defaults to
 dictionary-lookup-fast but doesn't return the codes for now.  We replace
 the default pipeline when SNOMED/RXNORM codes are returned again.
 
 2)  Push forward with defaulting to the new dictionary-lookup-fast in
 AggregatePlaintextUMLSProcessor.xml
 
 Example output of dictionary-lookup-fast:
 
 [cid:image001.png@01CFF9D9.E5D2CA50]


ctakes-dictionary-lookup-fast

2014-11-06 Thread Chen, Pei
Hi,
The original plan was to update AggregatePlaintextUMLSProcessor.xml to use the 
new ultrafast dictionary lookup in the upcoming 3.2.1 release.
However, the output is slightly different from the old cTAKES dictionary 
lookup: it no longer has a SNOMED/RXNORM consumer (it returns CUIs only and 
doesn't post-process them to map back to the SNOMED/RXNORM codes).  This can 
certainly be done again, but I am not sure how many people depend on 
AggregatePlaintextUMLSProcessor.xml for us to consider this a patch release.
Some Options/Ideas:

1)  Create an AggregatePlaintextUMLSFastProcessor.xml which defaults to 
dictionary-lookup-fast but doesn't return the codes for now.  We replace the 
default pipeline when SNOMED/RXNORM codes are returned again.

2)  Push forward with defaulting to the new dictionary-lookup-fast in 
AggregatePlaintextUMLSProcessor.xml

Example output of dictionary-lookup-fast:

[cid:image001.png@01CFF9D9.E5D2CA50]


RE: YTEX depends on trove4j? LGPL issue

2014-11-04 Thread Chen, Pei
VJ,
This required a code change as well.
I updated it to use java pojo's instead.  Would be good if you can help 
verify/confirm:
Please see http://svn.apache.org/r1636663

import gnu.trove.set.TIntSet;
import gnu.trove.set.hash.TIntHashSet;
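
For anyone reviewing the change, the kind of swap involved might look like the 
sketch below. This is not the code from r1636663; IntSetExample and distinct are 
hypothetical names, and the only assumption is that a trove TIntSet was being 
used as a plain set of ints and can be replaced by a java.util set of boxed 
Integers.

```java
import java.util.HashSet;
import java.util.Set;

public class IntSetExample {

    /** Collects the distinct values of an int array using only java.util types. */
    public static Set<Integer> distinct(int[] values) {
        // With trove this would have been: TIntSet seen = new TIntHashSet();
        Set<Integer> seen = new HashSet<>();
        for (int v : values) {
            seen.add(v); // autoboxes int to Integer
        }
        return seen;
    }
}
```

The trade-off is some boxing overhead compared with a primitive collection, in 
exchange for dropping the LGPL dependency.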


 -Original Message-
 From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu]
 Sent: Thursday, October 16, 2014 12:47 PM
 To: dev@ctakes.apache.org
 Subject: RE: YTEX depends on trove4j? LGPL issue
 
 http://issues.apache.org/jira/browse/CTAKES-321 has been opened to track
 this.
 I've manually reviewed all included jars in the bin distro one-by-one... see
 attachment in Jira for comparisons since incubation.
 In addition to trove, there is also jsr305 which is also LGPL (most likely
 inadvertently included via transitive dependencies).
 If anyone would like to take a second look, it'll be much appreciated.
 These need to get resolved before the next release...
 --Pei
 
  -Original Message-
  From: Steven Bethard [mailto:steven.beth...@gmail.com]
  Sent: Wednesday, October 15, 2014 1:42 PM
  To: dev@ctakes.apache.org
  Subject: Re: YTEX depends on trove4j? LGPL issue
 
  In addition to fixing the trove4j dependency, if anyone knows how to
  get a report of all dependency licenses, it would be good to
  double-check the rest of the YTEX dependencies to make sure there aren't
 other issues.
 
  On Wed, Oct 15, 2014 at 10:42 AM, Chen, Pei
  pei.c...@childrens.harvard.edu wrote:
   Steve,
   This is a good catch!  I was pretty sure 3rd party libs were checked
   but
  somehow this may have been missed.
   I noticed it's in the convenience binary distro as well.  We need to
   remove
  this; I'll create a Jira.
   VJ, could you confirm- I actually don't think we use trove4j in ytex?
   ctakes-ytex/pom.xml
  
   --Pei
  
   -Original Message-
   From: Steven Bethard [mailto:steven.beth...@gmail.com]
   Sent: Wednesday, October 15, 2014 10:40 AM
   To: dev@ctakes.apache.org
   Subject: YTEX depends on trove4j? LGPL issue
  
   It seems that YTEX depends on trove4j which is LGPL [1], but
   LGPL-licensed works must not be included in Apache products [2].
   Have the YTEX dependencies been reviewed for licensing issues? (I
   only stumbled upon the trove issue via a version conflict in other
   code.)
  
   Steve
  
   [1] http://trove4j.sourceforge.net/html/license.html
   [2] http://www.apache.org/legal/resolved.html


RE: Error when installing cTAKES 3.2.0-rc2

2014-11-03 Thread Chen, Pei
Lam Vu,
Have you tried running it with -DskipTests as a temp workaround?
Tests in error:
  TestClearNLPPipeLine(org.apache.ctakes.dependency.parser.ae.util.TestClearNLPAnalysisEngines): 
  URI is not hierarchical
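
Some background on the message itself: "URI is not hierarchical" is what the 
java.io.File(URI) constructor throws when it is handed an opaque URI, such as 
the jar:file:...!/... form a classpath resource gets once it lives inside a jar. 
That is probably what happens in this test, though the test code is not shown 
here. A minimal sketch of the stream-based alternative follows; ResourceStreams, 
readAll and readResource are hypothetical helper names, not cTAKES code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ResourceStreams {

    /** Reads a stream to the end and returns its bytes. */
    public static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Loads a classpath resource through a stream instead of a java.io.File,
     * so it behaves the same from a directory and from inside a jar.
     */
    public static byte[] readResource(Class<?> anchor, String name) {
        try (InputStream in = anchor.getResourceAsStream(name)) {
            if (in == null) {
                throw new IllegalArgumentException("Resource not found: " + name);
            }
            return readAll(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Code that truly needs a File (for example a library that only accepts paths) 
would have to copy the stream to a temporary file first; the sketch does not 
cover that case.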

From: Lam Vu Son [mailto:lamvu...@gmail.com]
Sent: Friday, October 31, 2014 12:10 PM
To: dev@ctakes.apache.org
Subject: Error when installing cTAKES 3.2.0-rc2

Hello cTAKES community,

I am new to cTAKES and Maven. I am having a problem installing cTAKES 
3.2.0-rc2 as a developer.
I am sorry if this question is sent to an inappropriate address.

I followed instructions at this guide: 
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.0+Developer+Install+Guide.

I checked out Maven Projects from SCM (tag: 
https://svn.apache.org/repos/asf/ctakes/tags/ctakes-3.2.0-rc2/) (the size of 
ctakes folder is about 2.5GB)
Checking out and importing was successful.
However, in Eclipse, I see many projects with errors. I have cleaned and built 
all projects, but there are 1759 errors.

I tried to run UIMA_CVD--clinical_documents_pipeline.launch, select 
AggregatePlainTextProcessor.xml as an AE, then I received a message box with 
error: org.apache.uima.resource.ResourceInitializationException: An import 
could not be resolved. No .xml file with name 
org.apache.ctakes.assertion.types.TypeSystem was found in the class path or 
data path.

In ctakes-clinical-pipeline project, I noticed that there are missing files 
in Maven Dependencies (such as ctakes-assertion-3.2.0.jar, 
ctakes-ytex-3.2.0.jar...) (please see attachment)

Then, I ran mvn clean install, but this didn't solve the problem (please see 
the attachment for the log file)

I appreciate your help in advance.

-Lam.


RE: CTakes on github.

2014-10-30 Thread Chen, Pei
Jay,
Were you proposing A) the hybrid [1] git/svn approach (where svn is still the 
primary version control, but code and full histories get mirrored to git)?
Or were you proposing B) replacing SVN and using git as the primary version 
control for cTAKES @ a.o [3]?
A) is fairly straightforward and most folks probably won't have any issues with 
it.  B) will be a tougher proposal as it changes the workflow and will require 
all current committers to transfer over to git.

[1] http://git.apache.org/
[2] http://www.apache.org/dev/git.html
[3] https://git-wip-us.apache.org/

 -Original Message-
 From: jay vyas [mailto:jayunit100.apa...@gmail.com]
 Sent: Wednesday, October 29, 2014 1:32 PM
 To: dev@ctakes.apache.org
 Subject: CTakes on github.
 
 Hi CTakes.
 
 I notice we don't have a github presence yet.   Maybe we can file an INFRA
 ticket for this?
 
 --
 jay vyas


RE: YTEX depends on trove4j? LGPL issue

2014-10-16 Thread Chen, Pei
http://issues.apache.org/jira/browse/CTAKES-321 has been opened to track this.
I've manually reviewed all included jars in the bin distro one-by-one... see 
attachment in Jira for comparisons since incubation.
In addition to trove, there is also jsr305 which is also LGPL (most likely 
inadvertently included via transitive dependencies).
If anyone would like to take a second look, it'll be much appreciated.
These need to get resolved before the next release...
--Pei

 -Original Message-
 From: Steven Bethard [mailto:steven.beth...@gmail.com]
 Sent: Wednesday, October 15, 2014 1:42 PM
 To: dev@ctakes.apache.org
 Subject: Re: YTEX depends on trove4j? LGPL issue
 
 In addition to fixing the trove4j dependency, if anyone knows how to get a
 report of all dependency licenses, it would be good to double-check the rest
 of the YTEX dependencies to make sure there aren't other issues.
 
 On Wed, Oct 15, 2014 at 10:42 AM, Chen, Pei
 pei.c...@childrens.harvard.edu wrote:
  Steve,
  This is a good catch!  I was pretty sure 3rd party libs were checked but
 somehow this may have been missed.
  I noticed it's in the convenience binary distro as well.  We need to remove
 this; I'll create a Jira.
  VJ, could you confirm- I actually don't think we use trove4j in ytex?
  ctakes-ytex/pom.xml
 
  --Pei
 
  -Original Message-
  From: Steven Bethard [mailto:steven.beth...@gmail.com]
  Sent: Wednesday, October 15, 2014 10:40 AM
  To: dev@ctakes.apache.org
  Subject: YTEX depends on trove4j? LGPL issue
 
  It seems that YTEX depends on trove4j which is LGPL [1], but
  LGPL-licensed works must not be included in Apache products [2].
  Have the YTEX dependencies been reviewed for licensing issues? (I
  only stumbled upon the trove issue via a version conflict in other
  code.)
 
  Steve
 
  [1] http://trove4j.sourceforge.net/html/license.html
  [2] http://www.apache.org/legal/resolved.html


RE: YTEX depends on trove4j? LGPL issue

2014-10-15 Thread Chen, Pei
Steve,
This is a good catch!  I was pretty sure 3rd party libs were checked but 
somehow this may have been missed.
I noticed it's in the convenience binary distro as well.  We need to remove 
this; I'll create a Jira.
VJ, could you confirm- I actually don't think we use trove4j in ytex? 
ctakes-ytex/pom.xml

--Pei

 -Original Message-
 From: Steven Bethard [mailto:steven.beth...@gmail.com]
 Sent: Wednesday, October 15, 2014 10:40 AM
 To: dev@ctakes.apache.org
 Subject: YTEX depends on trove4j? LGPL issue
 
 It seems that YTEX depends on trove4j which is LGPL [1], but LGPL-licensed
 works must not be included in Apache products [2].
 Have the YTEX dependencies been reviewed for licensing issues? (I only
 stumbled upon the trove issue via a version conflict in other code.)
 
 Steve
 
 [1] http://trove4j.sourceforge.net/html/license.html
 [2] http://www.apache.org/legal/resolved.html


RE: NPE with ytex in ctakes 3.2.0

2014-10-10 Thread Chen, Pei
I think it’s in ctakes-ytex-res.jar (is that in your classpath)?
This is just a guess… vj may have a better idea if it still doesn’t work for 
you.

From: David Kincaid [mailto:kincaid.d...@gmail.com]
Sent: Friday, October 10, 2014 4:51 PM
To: u...@ctakes.apache.org
Subject: Re: NPE with ytex in ctakes 3.2.0

No. I have no file named beanRefContext.xml anywhere on my hard drive.



On Fri, Oct 10, 2014 at 3:45 PM, Chen, Pei 
pei.c...@childrens.harvard.edu wrote:
I’m not too familiar with the ytex component,
but my guess is that the ytexApplicationContext bean is null?
It seems that it would be expected to be in the 
classpath*:org/apache/ctakes/ytex/uima/beanRefContext.xml?  Does that file exist?
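
A quick way to answer that question is to ask the class loader directly. The following is only a minimal sketch: the resource path is the one named above, while the class and method names (ClasspathCheck, locate) are made up for illustration and are not part of cTAKES or YTEX.

```java
import java.net.URL;

// Minimal sketch: report whether a resource is visible on the classpath.
// ClasspathCheck and locate are illustrative names, not cTAKES/YTEX API.
class ClasspathCheck {

    /** Returns the URL of the resource, or null if the class loader cannot see it. */
    static URL locate(String resourcePath) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        return cl.getResource(resourcePath);
    }

    public static void main(String[] args) {
        // Resource path taken from the message above.
        String path = "org/apache/ctakes/ytex/uima/beanRefContext.xml";
        URL url = locate(path);
        System.out.println(url == null
                ? path + " is NOT on the classpath"
                : path + " found at " + url);
    }
}
```

Run it with the same classpath as the failing pipeline. If it reports the file as missing, the jar or folder that contains it (possibly ctakes-ytex-res) is not being picked up.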

From: David Kincaid 
[mailto:kincaid.d...@gmail.com]
Sent: Friday, October 10, 2014 4:23 PM
To: u...@ctakes.apache.org
Subject: NPE with ytex in ctakes 3.2.0

I'm trying to experiment with ytex in 3.2.0. Trying to run 
AggregatePlaintextUMLSProcessor with the FilesInDirectoryCollectionReader and 
FileWriterCASConsumer. When I try to run it against some text files it blows up 
with a null pointer exception during initialization. Here's the relevant part 
of the stack trace. Anyone have any ideas what I might have wrong?:

Caused by: org.apache.uima.resource.ResourceInitializationException: 
Initialization of annotator class 
org.apache.ctakes.ytex.uima.annotators.SegmentRegexAnnotator failed.  
(Descriptor: 
file:/home/davek/apps/apache-ctakes-3.2.0/desc/ctakes-ytex-uima/desc/analysis_engine/SegmentRegexAnnotator.xml)
  at 
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:252)
  at 
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:156)
  at 
org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
  at 
org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
  at 
org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
  at 
org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387)
  at 
org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254)
  at 
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431)
  at 
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375)
  at 
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185)
  at 
org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
  at 
org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
  at 
org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
  at 
org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:314)
  at 
org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:425)
  at 
org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1088)
  ... 9 more
Caused by: java.lang.NullPointerException
  at 
org.apache.ctakes.ytex.uima.ApplicationContextHolder.getApplicationContext(ApplicationContextHolder.java:79)
  at 
org.apache.ctakes.ytex.uima.annotators.SegmentRegexAnnotator.initialize(SegmentRegexAnnotator.java:64)
  at 
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:250)
  ... 24 more




RE: sentence detector model

2014-09-29 Thread Chen, Pei
Assuming we have a representative training set, are there any objections if we 
default cTAKES to this SentenceAnnotator + Model?
For the upcoming release:
- Consolidate the existing sentence detector and the ytex sentence detector 
into this new one? 
- Allow a config parameter to still allow an override of a hard break on 
newline chars.  That way, we won't have to maintain multiple sentence 
annotators and it'll be less confusing for new users...

--Pei 


 -Original Message-
 From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
 Sent: Monday, September 29, 2014 2:47 PM
 To: dev@ctakes.apache.org
 Subject: Re: sentence detector model
 
 That does sound like it would be useful since MIMIC does have both kinds of
 linebreak styles in different notes. If I did some annotations on such a
 dataset would it be re-distributable, say on the physionet website? I believe
 the ShARe project has a download site there (it is a layer of annotations on
 MIMIC). Another option would be you posting your raw data there and I
 could post offset-based annotations on a public repo like github.
 Tim
 
 
 On 09/29/2014 01:54 PM, Peter Szolovits wrote:
  I have a set of about 27K documents from MIMIC (circa 2009) in which I
 have replaced the weird PHI markers by synthesized pseudonymous data.
 These have natural sentence breaks (typically in the middle of lines), normal
 paragraph structure, bulleted lists, etc.  Assuming it goes to people who have
 signed the MIMIC DUA, I could provide these if you are interested.  --Pete
 Sz.
 
  On Sep 29, 2014, at 1:37 PM, Miller, Timothy
 timothy.mil...@childrens.harvard.edu wrote:
 
  Some of them are a bit artificial for this task, with notes being
  annotated as one sentence per line and offset punctuation. I think
  maybe the 2008 and 2009 data might have original formatting though,
  with newlines not always breaking sentences. That has certain
  advantages over raw MIMIC for training since the PHI isn't so weirdly
  formatted, but then again is not a mix of styles (that is, the styles
  of newline always terminates sentence vs. sometimes terminates
  sentence). I think it would still have to be paired with another dataset to
 be a representative sample.
  Tim
 
  On 09/29/2014 01:24 PM, vijay garla wrote:
  Why not use the i2b2 corpora?
 
  On Monday, September 29, 2014, Dligach, Dmitriy 
  dmitriy.dlig...@childrens.harvard.edu wrote:
 
  Maybe creating a made-up set of sentences would be an option? That
  way we could agree on the annotation of concrete cases. Although
  this would be more of a unit test than a corpus.
 
  Dima
 
 
 
 
  On Sep 27, 2014, at 12:15, Miller, Timothy 
  timothy.mil...@childrens.harvard.edu wrote:
 
  I've just been using the opennlp command line cross validator on
  the
  small dataset i annotated (along with some eyeballing). It would be
  cool if there was a standard clinical resource available for this
  task, but I hadn't considered it much because the data I annotated
  pulls from multiple datasets and the process of  arranging with
  different institutions to make something like that available would
 probably be a nightmare.
  Tim
 
  Sent from my iPad. Sorry about the typos.
 
  On Sep 27, 2014, at 12:16 PM, Dligach, Dmitriy 
  dmitriy.dlig...@childrens.harvard.edu wrote:
  Tim, thanks for working on this!
 
  Question: do we have some formal way of evaluating the sentence
  detector? Maybe we should come up with some dev set that would
  include examples from mimic...
  Dima
 
 
 
 
  On Sep 27, 2014, at 8:57, Miller, Timothy 
  timothy.mil...@childrens.harvard.edu wrote:
  I have been working on the sentence detector newline issue,
  training a
  model to probabilistically split sentences on newlines rather than
  forcing sentence breaks. I have checked in a model to the repo
  under ctakes-core-res. I also attached a patch to ctakes-core to the jira
 issue:
  https://issues.apache.org/jira/browse/CTAKES-41
 
  for people to test. The status of my testing is that it doesn't
  seem
  to break on notes where ctakes worked well before (those where
  newlines are always sentence breaks), and is a slight improvement
  on notes where newlines may or may not be sentence breaks. Once
 the
  change is checked in we can continue improving the model by adding
  more data and features, but the first hurdle I'd like to get past
  is making sure it runs well enough on the type of data that the old
  model worked well on. Let me know if you have any questions.
  Thanks
  Tim
 



RE: v_document_cui_sent not being populated

2014-09-10 Thread Chen, Pei
Applied fix in trunk r.1624031
https://issues.apache.org/jira/browse/CTAKES-311
VJ- I'm not sure if there is test coverage for this, but let us know if you 
have any ideas/concerns.
--Pei

 -Original Message-
 From: Tim O'Connell [mailto:tim.oconn...@gmail.com]
 Sent: Monday, September 08, 2014 7:05 PM
 To: dev@ctakes.apache.org
 Subject: Re: v_document_cui_sent not being populated
 
 Hi Pei,
 
 Happy to do so.  Just created the issue in JIRA.  Traveling at present.
  Will do the patch over the next few days.
 
 Best,
 Tim
 
 On Mon, Sep 8, 2014 at 1:55 PM, Pei Chen chen...@apache.org wrote:
 
  Hi Tim,
  Thanks for catching that- yes, would you mind creating a jira for that?
  Even better if you can attach a patch for it (perhaps a good idea to
  search/replace on the entire project) and we can include in the next
  3.2.1 patch...
  --Pei
 
  On Mon, Sep 8, 2014 at 4:50 PM, Tim O'Connell tim.oconn...@gmail.com
  wrote:
   Hi Clayton,
  
   (One of the) problems here is that the source for the creation of the
   v_document_cui_sent view contains an error (I think).
  
   You can see the view source (in MySQL anyway) by using 'show create
   view v_document_cui_sent'.
  
   You'll then see 'where (`ref_uima_type`.`uima_type_name` =
   'edu.mayo.bmi.uima.core.type.textspan.Sentence') join `document`
   `d` on((`da`.`document_id` = `d`.`document_id...' in the view definition.
  
   The 'edu.mayo.bmi.uima...' is the old notation for the uima_type_name
   that this view depends on. It should be
   'org.apache.ctakes.typesystem.type.textspan.Sentence'.
  
   You can drop the view and re-create it with the correction using the
   correct syntax for your SQL DB.  The bug lives in the ytex setup script in
   CTAKES_HOME\bin\ctakes-ytex\scripts\data\SQL_TYPE\uima\create_view.sql
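
The correction itself is a plain text substitution in the view definition. A minimal sketch of it, assuming the script text has already been read into a string: the two type names are the ones quoted above, and ViewTypeNameFix and fix are illustrative names only.

```java
// Minimal sketch: swap the old Mayo type name for the Apache cTAKES one
// in the text of a view definition. ViewTypeNameFix is an illustrative
// helper, not part of cTAKES or YTEX.
class ViewTypeNameFix {
    static final String OLD_TYPE = "edu.mayo.bmi.uima.core.type.textspan.Sentence";
    static final String NEW_TYPE = "org.apache.ctakes.typesystem.type.textspan.Sentence";

    /** Returns the SQL text with every occurrence of the old type name replaced. */
    static String fix(String viewSql) {
        return viewSql.replace(OLD_TYPE, NEW_TYPE);
    }

    public static void main(String[] args) {
        String before = "where (`ref_uima_type`.`uima_type_name` = '" + OLD_TYPE + "')";
        System.out.println(fix(before));
    }
}
```

The same substitution would apply to any other script that still mentions the old package name, which is what a project-wide search/replace would catch.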
  
   Pei - let me know if you want me to create an issue for this in Jira.
  
   Best,
   Tim
  
   On Mon, Sep 8, 2014 at 11:09 AM, Clayton Turner
   caturn...@g.cofc.edu
   wrote:
  
   Hey everyone:
  
   I'm using the ytex branch of ctakes and am trying to pull down polarities
   of concepts and other related information after running the ytex pipeline
   AE on my data.
  
   The v_document_cui_sent table contains 0 rows of data, but v_document and
   v_document_ontoanno both contain data.
  
   I would be able to get by with the latter 2, but I'm hitting some oddities
   in my data. I'm sure there's a simple way to do this, but I'm not able to
   come up with a solution right now.
  
   I run:
   select d.instance_id,v.polarity from document d join v_document_ontoanno v
   on d.document_id=v.document_id where d.analysis_batch=sle1 and
   v.code=C0277942;
   in order to look at which of my noteids expressed the concept matching
   C0277942. A lot of these noteids contain differing polarities, so the
   result of running an update on an external table to grab the polarities
   depends on which update is run first (-1 polarities or 1 polarities).
   Is there a way to dynamically add these or just have some formal
   resolution that isn't dependent on which command runs first?
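
One order-independent option, offered only as a sketch and not as something the thread settled on, is to collect every polarity seen for a note first and classify the note afterwards. The -1 and 1 values are the ones mentioned above. PolaritySummary, Status, and the {instanceId, polarity} row layout are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Minimal sketch: classify each note by the full set of polarities found
// for it, so the outcome does not depend on the order of the rows.
// All names here are illustrative, not part of YTEX.
class PolaritySummary {
    enum Status { AFFIRMED_ONLY, NEGATED_ONLY, MIXED }

    /** rows: one long[]{instanceId, polarity} per query result row. */
    static Map<Long, Status> summarize(List<long[]> rows) {
        Map<Long, Set<Long>> seen = new TreeMap<>();
        for (long[] row : rows) {
            seen.computeIfAbsent(row[0], k -> new HashSet<>()).add(row[1]);
        }
        Map<Long, Status> result = new TreeMap<>();
        for (Map.Entry<Long, Set<Long>> e : seen.entrySet()) {
            boolean affirmed = e.getValue().contains(1L);
            boolean negated = e.getValue().contains(-1L);
            if (affirmed && negated) {
                result.put(e.getKey(), Status.MIXED);
            } else if (affirmed) {
                result.put(e.getKey(), Status.AFFIRMED_ONLY);
            } else if (negated) {
                result.put(e.getKey(), Status.NEGATED_ONLY);
            }
            // Any other polarity value is ignored in this sketch.
        }
        return result;
    }
}
```

How a MIXED note should finally be counted is a separate decision that depends on the study. The sketch only makes the conflict explicit instead of letting the last update win.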
  
   Thanks,
   Clayton
  
 


Re: Ctakes to process 5000K recoreds

2014-09-09 Thread Chen, Pei
(Trying to avoid passing individual jars via email)

Sent from my iPhone

 On Sep 9, 2014, at 5:26 PM, Chen, Pei pei.c...@childrens.harvard.edu 
 wrote:
 
 Sean-
 Aren't the scripts to generate the DB already available in the sandbox area?  
 
 Sent from my iPhone
 
 On Sep 9, 2014, at 5:24 PM, Finan, Sean sean.fi...@childrens.harvard.edu 
 wrote:
 
 There is a tool to generate a dictionary in the new format using the UMLS 
 MR*** files.  
 
 The module can also read directly from a file with bar-separated values:  
 CUI|Text or CUI|TUI|Text which could be useful for small custom dictionaries.
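
For readers unfamiliar with the bar-separated layout, here is a minimal sketch of reading one such line. It is not the dictionary-lookup-fast implementation. BsvDictionaryLine is an illustrative name, and the sample CUI/TUI values in main are only example data.

```java
// Minimal sketch: parse one line in the CUI|Text or CUI|TUI|Text layout
// described above. Illustrative only; not the cTAKES parser.
class BsvDictionaryLine {
    final String cui;
    final String tui;   // null when the line has only CUI|Text
    final String text;

    BsvDictionaryLine(String cui, String tui, String text) {
        this.cui = cui;
        this.tui = tui;
        this.text = text;
    }

    static BsvDictionaryLine parse(String line) {
        // The -1 limit keeps trailing empty fields so malformed lines are detected.
        String[] parts = line.split("\\|", -1);
        if (parts.length == 2) {
            return new BsvDictionaryLine(parts[0].trim(), null, parts[1].trim());
        }
        if (parts.length == 3) {
            return new BsvDictionaryLine(parts[0].trim(), parts[1].trim(), parts[2].trim());
        }
        throw new IllegalArgumentException("Expected CUI|Text or CUI|TUI|Text: " + line);
    }

    public static void main(String[] args) {
        BsvDictionaryLine entry = parse("C0004057|T121|aspirin");  // sample values
        System.out.println(entry.cui + " / " + entry.tui + " / " + entry.text);
    }
}
```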
 
 I can send a copy of the dictionary creator jar and instructions tomorrow.
 
 Sean
 
 From: Bruce Tietjen [bruce.tiet...@perfectsearchcorp.com]
 Sent: Tuesday, September 09, 2014 5:17 PM
 To: dev@ctakes.apache.org
 Subject: Re: Ctakes to process 5000K recoreds
 
 Sean,
 
 If that is a script for generating a dictionary for use with
 dictionary-lookup-fast, I would also be very interested in checking it out.
 
 Thanks,
 
 Bruce
 
 
 [image: IMAT Solutions] http://imatsolutions.com
 Bruce Tietjen
 Senior Software Engineer
 [image: Mobile:] 801.634.1547
 bruce.tiet...@imatsolutions.com
 
 On Tue, Sep 9, 2014 at 2:40 PM, Nick Nikandish 
 snika...@emerginghealthit.com wrote:
 
 Great. I will do that. Thanks again.
 
 Nick
 
 -Original Message-
 From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
 Sent: Tuesday, September 09, 2014 4:39 PM
 To: dev@ctakes.apache.org
 Subject: RE: Ctakes to process 5000K recoreds
 
 Just use it with cTakes.  Instead of removing other modules from the
 pipeline, replace the dictionary-lookup with dictionary-lookup-fast.
 
 For the
 desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextUMLSProcessor.xml
 , you would modify:
 
   <delegateAnalysisEngine key="DictionaryLookupAnnotatorDB">
     <import location="../../../ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml"/>
   </delegateAnalysisEngine>
 
 To be:
 
   <delegateAnalysisEngine key="DictionaryLookupAnnotatorDB">
     <import location="../../../ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml"/>
   </delegateAnalysisEngine>
 
 
 That should be it.  You can then leave the rest of the module
 specifications alone.
 
 Sean
 
 
 From: Nick Nikandish [snika...@emerginghealthit.com]
 Sent: Tuesday, September 09, 2014 4:32 PM
 To: dev@ctakes.apache.org
 Subject: RE: Ctakes to process 5000K recoreds
 
 Hi Sean,
 
 Many thanks, I will try it tomorrow. Do you have any special instructions
 for running that script, or do I have to use it with cTakes?
 
 Thanks,
 Nick
 
 -Original Message-
 From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
 Sent: Tuesday, September 09, 2014 4:24 PM
 To: dev@ctakes.apache.org
 Subject: RE: Ctakes to process 5000K recoreds
 
 Hi Nick,
 
 I think that the bottleneck is probably the lookup module itself.  So, I
 just sent you a secure email/ftp link.  It contains a build of the new
 dictionary-lookup-fast module.  Should you choose to try it, let me know
 how things turn out.
 
 Sean
 
 From: Nick Nikandish [snika...@emerginghealthit.com]
 Sent: Tuesday, September 09, 2014 4:10 PM
 To: dev@ctakes.apache.org
 Subject: RE: Ctakes to process 5000K recoreds
 
 Thanks, let me try it.
 Nick
 
 -Original Message-
 From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
 Sent: Tuesday, September 09, 2014 4:08 PM
 To: 'dev@ctakes.apache.org'
 Subject: RE: Ctakes to process 5000K recoreds
 
 If you just need the medication names, you can remove these:
 <node>ContextDependentTokenizerAnnotator</node>
 <node>DependencyParser</node>
 <node>AssertionAnnotator</node>
 
 You might be able to get rid of the LvgAnnotator and still get decent
 results since variations of word form should not affect medication names. I
 would try with it and without it on a smaller set of files and see if you
 see a difference.
 
 I believe the others are needed by the default configs for medication
 lookup. For example, POS is used to get phrase type. Phrases are used to
 remove verb phrases from the lookup and also therefore to keep the lookup
 windows from getting too big.  I'm more familiar with the other types of
 named entities (diseases, symptoms, etc) than with medications.
 
 -Original Message-
 From: Nick Nikandish [mailto:snika...@emerginghealthit.com]
 Sent: Tuesday, September 09, 2014 3:01 PM
 To: dev@ctakes.apache.org
 Subject: RE: Ctakes to process 5000K recoreds
 
 James,
 
 Do you have any suggestion about running cTakes with minimum annotators
 that can return Medications in DictionaryLookupAnnotator?
 Thanks,
 Nick
 
 -Original Message-
 From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
 Sent: Tuesday, September 09, 2014 3:05 PM
 To: 'dev@ctakes.apache.org'
 Subject: RE: Ctakes to process 5000K recoreds
 
 I suspect that when

RE: managing ctakes resources on classpath

2014-08-25 Thread Chen, Pei
Tim/Kim,
After a quick debug, it looks like the DependencyParser tests look okay; 
however, the test pipeline uses LVG.  During maven 'install', these files 
are inside a jar, while LVG explicitly needs a File or Directory.  I just 
committed a step in the pom.xml to unpack the lvg-res.  This should be fine for 
the junit test, but also keep that in mind when configuring for a production 
environment.
If you have a chance, could you try trunk?  It should solve the parser issue 
(and any other test components that depend on lvg-res), but I'm not sure about 
potentially other test errors.
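
For background on the error itself: java.io.File only accepts hierarchical file: URIs, so a jar: URI (what a class loader returns for a resource packed inside a jar) is rejected with "URI is not hierarchical". The sketch below reproduces that and shows the stream-based alternative for code that does not strictly need a File. It does not help LVG, which needs real files, hence the unpack step. JarResourceDemo and the sample jar path are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;

// Minimal sketch of why resources inside a jar break File-based loading.
// JarResourceDemo is an illustrative class, not cTAKES code.
class JarResourceDemo {

    /** True if new File(uri) throws IllegalArgumentException, as it does for jar: URIs. */
    static boolean fileConstructorRejects(URI uri) {
        try {
            new File(uri);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    /** Reads a classpath resource through a stream; works packed or unpacked. */
    static byte[] readResource(String path) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        try (InputStream in = cl.getResourceAsStream(path)) {
            if (in == null) {
                throw new IllegalArgumentException("Resource not found: " + path);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical jar path, used only to build a jar: URI.
        URI packed = URI.create("jar:file:/tmp/example-res.jar!/data/example.txt");
        System.out.println("File(URI) rejects jar: URI: " + fileConstructorRejects(packed));
    }
}
```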

--Pei

 -Original Message-
 From: Tim O'Connell [mailto:tim.oconn...@gmail.com]
 Sent: Monday, August 25, 2014 12:38 AM
 To: dev@ctakes.apache.org
 Subject: Re: managing ctakes resources on classpath
 
 Thanks Kim & Pei.  If it helps any, I'm getting the same error in Eclipse.
  I just checked out the code this morning from SVN.
 
 Using -DskipTests=true I was able to get it to build from the command line.
 
 Tim
 
 
 
 
 On Wed, Aug 20, 2014 at 12:41 PM, Kim Ebert
 kim.eb...@perfectsearchcorp.com
  wrote:
 
  I've added issue 307.
 
  https://issues.apache.org/jira/browse/CTAKES-307
 
  Kim Ebert
  1.801.669.7342
  Perfect Search Corp
  http://www.perfectsearchcorp.com/
 
  On 08/20/2014 11:52 AM, Chen, Pei wrote:
   Thanks Kim- would you mind opening up a Jira to track this?
   The cTAKES ClearNLP Dependency Parser and/or Test Cases most likely
   need
  to be updated to enable resources to be picked up from the jar.
  
   -Original Message-
   From: Kim Ebert [mailto:kim.eb...@perfectsearchcorp.com]
   Sent: Wednesday, August 20, 2014 1:39 PM
   To: dev@ctakes.apache.org
   Subject: Re: managing ctakes resources on classpath
  
   I'm just using exactly what came out of SVN, so I haven't modified
   the default classpath yet.
  
   Kim Ebert
   1.801.669.7342
   Perfect Search Corp
   http://www.perfectsearchcorp.com/
  
   On 08/20/2014 11:28 AM, Chen, Pei wrote:
   Do you happen to have both jars and unpacked in your cp?
   Temp workaround: -DskipTests=true?
  
   Sent from my iPhone
  
   On Aug 20, 2014, at 1:25 PM, Kim Ebert
   kim.eb...@perfectsearchcorp.com wrote:
   I am encountering this same issue when I try to run mvn install
   from the command line. Is there a way to get mvn install to work?
  
   ---
   T E S T S
   ---
   Running org.apache.ctakes.dependency.parser.ae.util.TestClearNLPAnalysisEngines
   log4j: reset attribute= false.
   log4j: Threshold =null.
   log4j: Level value for root is  [INFO].
   log4j: root level set to INFO
   log4j: Class name: [org.apache.log4j.ConsoleAppender]
   log4j: Parsing layout of class: org.apache.log4j.PatternLayout
   log4j: Setting property [conversionPattern] to [%d{dd MMM 
   HH:mm:ss} %5p %c{1} - %m%n].
   log4j: Adding appender named [consoleAppender] to category
 [root].
   Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
   1.397 sec <<< FAILURE!
  
   Results :
  
   Tests in error:
  
   TestClearNLPPipeLine(org.apache.ctakes.dependency.parser.ae.util.TestClearNLPAnalysisEngines):
   URI is not hierarchical
  
   Kim Ebert
   1.801.669.7342
   Perfect Search Corp
   http://www.perfectsearchcorp.com/
  
   On 09/10/2013 07:33 AM, Pei Chen wrote:
   Hi Steve,
   The "URI is not hierarchical" error is most likely caused by the code
   trying to use the resources/models, but they are inside a jar
   instead of unpacked.
   -Which version of cTAKES are you using?
   -Do you happen to have the resource file name that caused the above?
  
   --Pei
  
  
   On Mon, Sep 9, 2013 at 9:48 PM, Steve Hookway
   shook...@cra.com wrote:
   Hi all,
  
   I'm trying to integrate ctakes into a webapp and am running
   into issues getting the resources to load correctly. In a
   standalone version of the app, if I add the resources folder to
   the buildpath (as described in the install directions)
   everything works as expected. However, if I add the folder to
   the project classpath instead, I get a URI is not hierarchical
   exception from
   FileResourceImpl.load:
   java.lang.IllegalArgumentException: URI is not hierarchical
     at java.io.File.<init>(File.java:392)
     at org.apache.ctakes.core.resource.FileResourceImpl.load(FileResourceImpl.java:44)
     at org.apache.uima.resource.impl.ResourceManager_impl.registerResource(ResourceManager_impl.java:603)
  
   Similarly if I try and run from a webapp - setting up tomcat's
   classpath to include the ctakes resource folder, I get the same
   URI is not hierarchical error. I found this bug report:
   https://issues.apache.org/jira/browse/CTAKES-89 but  it
   suggests adding the resource folder to my classpath, which
   isn't doing the
  trick.
  
   If you can steer me in the right direction, I'd really
   appreciate

RE: org.apache.ctakes.ytex.umls.dao.UMLSDaoTest

2014-08-25 Thread Chen, Pei
It logs it as a warn, but fails the test.
Should it assertNotNull only if UMLS is set up, and otherwise pass the test?  
That way, folks who are doing a default 'mvn clean install' won't 
have to skipTests?
// UMLSDaoTest.testGetAllAuiStr() - Check to see if UMLS is set up before 
checking?
Assert.assertNotNull(auis);
CTAKES-308
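
A minimal sketch of that idea in plain Java, assuming the test can tell somehow whether UMLS is available. The umlsAvailable flag and the UmlsConditionalCheck class are hypothetical and are not the actual UMLSDaoTest code. In a JUnit 4 test the usual way to express the same thing is org.junit.Assume.assumeTrue(...), which marks the test as skipped rather than failed when the assumption does not hold.

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch: only enforce the non-null check when UMLS is set up.
// UmlsConditionalCheck and the umlsAvailable flag are hypothetical.
class UmlsConditionalCheck {

    /**
     * Returns true if the check ran, false if it was skipped because UMLS
     * is not available. Throws AssertionError if UMLS is available but the
     * DAO returned nothing.
     */
    static boolean checkAuis(boolean umlsAvailable, List<String> auis) {
        if (!umlsAvailable) {
            return false;  // nothing to verify against; do not fail the build
        }
        if (auis == null) {
            throw new AssertionError("UMLS is set up but getAllAuiStr returned null");
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("without UMLS, check ran: " + checkAuis(false, null));
        System.out.println("with UMLS, check ran: " + checkAuis(true, Arrays.asList("A0000001")));
    }
}
```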

 -Original Message-
 From: vijay garla [mailto:vnga...@gmail.com]
 Sent: Monday, August 25, 2014 4:30 PM
 To: dev@ctakes.apache.org
 Subject: Re: org.apache.ctakes.ytex.umls.dao.UMLSDaoTest
 
 That is an expected error having to do with the fact that UMLS isn't installed
 in the test database that gets fired up for unit tests.  That is actually a
 warning (and should be interpreted as an error only if you do have UMLS set
 up).
 
 
 On Mon, Aug 25, 2014 at 9:02 PM, Pei Chen chen...@apache.org wrote:
 
  Hi VJ,
  While on the subject of unit tests-
 
  I didn't get a chance to dig deeper and was hoping you would know the
  cause of this unit test failure:  mvn clean install
 
  2014-08-25 13:33:50,830 WARN  net.sf.ehcache.CacheManager  - Creating
  a new instance of CacheManager using the diskStorePath
  /var/folders/qc/d7xd4zzs0_xcybv88skt5_7mgn/T/ which is already
  used by an existing CacheManager.
 
  The source of the configuration was
  net.sf.ehcache.config.generator.ConfigurationSource$InputStreamConfigurationSource@7433a719.
 
  The diskStore path for this CacheManager will be set to
  /var/folders/qc/d7xd4zzs0_xcybv88skt5_7mgn/T//ehcache_auto_created_1408988030830.
 
  To avoid this warning consider using the CacheManager factory methods
  to create a singleton CacheManager or specifying a separate ehcache
  configuration (ehcache.xml) for each CacheManager instance.
 
  2014-08-25 13:33:51,082 WARN
  org.hibernate.engine.jdbc.spi.SqlExceptionHelper  - SQL Error: 62,
  SQLState: S0010
 
  2014-08-25 13:33:51,082 ERROR
  org.hibernate.engine.jdbc.spi.SqlExceptionHelper  - Unknown JDBC
  escape sequence: {{db.schema}.MRCONSO mrconso0_ where
 mrconso0_.aui?
  and length(mrconso0_.aui)0 and length(mrconso0_.str)200 and
  mrconso0_.lat='ENG' order by mrconso0_.aui
 
  2014-08-25 13:33:51,085 WARN
  org.apache.ctakes.ytex.umls.dao.UMLSDaoTest  - sql exception - mrconso
  probably doesn't exist, check error
 
  org.hibernate.exception.SQLGrammarException: could not prepare statement
  at org.hibernate.exception.internal.SQLStateConversionDelegate.convert(SQLStateConversionDelegate.java:123)
  at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:49)
  at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:125)
  at org.hibernate.engine.jdbc.internal.StatementPreparerImpl$StatementPreparationTemplate.prepareStatement(StatementPreparerImpl.java:188)
  at org.hibernate.engine.jdbc.internal.StatementPreparerImpl.prepareQueryStatement(StatementPreparerImpl.java:159)
  at org.hibernate.loader.Loader.prepareQueryStatement(Loader.java:1859)
  at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1836)
  at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1816)
  at org.hibernate.loader.Loader.doQuery(Loader.java:900)
  at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:342)
  at org.hibernate.loader.Loader.doList(Loader.java:2526)
  at org.hibernate.loader.Loader.doList(Loader.java:2512)
  at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2342)
  at org.hibernate.loader.Loader.list(Loader.java:2337)
  at org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:495)
  at org.hibernate.hql.internal.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:357)
  at org.hibernate.engine.query.spi.HQLQueryPlan.performList(HQLQueryPlan.java:195)
  at org.hibernate.internal.SessionImpl.list(SessionImpl.java:1269)
  at org.hibernate.internal.QueryImpl.list(QueryImpl.java:101)
  at org.apache.ctakes.ytex.umls.dao.UMLSDaoImpl.getAllAuiStr(UMLSDaoImpl.java:106)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:319)
  at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
  at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
  at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:110)
 
  at
 
 org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(
 R
  

RE: Change from SNOMEDCT to SNOMEDCT_US affecting v_snomed_fword_lookup

2014-08-25 Thread Chen, Pei
This has been done in trunk for the next patch 3.2.1 release:
https://issues.apache.org/jira/browse/CTAKES-309
Thanks for pointing this out- Would be great if someone could confirm the ytex 
scripts work as expected with their latest version of umls.


 -Original Message-
 From: vijay garla [mailto:vnga...@gmail.com]
 Sent: Thursday, August 21, 2014 11:07 PM
 To: dev@ctakes.apache.org
 Subject: Re: Change from SNOMEDCT to SNOMEDCT_US affecting
 v_snomed_fword_lookup
 
 That would definitely make sense
 
 On Thursday, August 21, 2014, Chen, Pei pei.c...@childrens.harvard.edu
 wrote:
 
  VJ,
  Would it make sense to add in('SNOMEDCT_US') to the default
  ctakes-ytex/scripts/data/**/insert_view.sql?
  That way it'll support umls2011 as well as the newer 2014 naming
  conventions?
  Ex: inner join MRCONSO mrc on c.aui = mrc.aui and mrc.SAB in (
  'SNOMEDCT',
  'SNOMEDCT_US','RXNORM')
 
  --Pei
 
   -Original Message-
   From: clayclay...@gmail.com [mailto:clayclay...@gmail.com] On Behalf Of
   Clayton Turner
   Sent: Thursday, August 21, 2014 4:25 PM
   To: dev@ctakes.apache.org
   Subject: Re: Change from SNOMEDCT to SNOMEDCT_US affecting
   v_snomed_fword_lookup
  
   Ah, I just switched to the ytex branch and all is good now. The
   SNOMED_US issue has been plaguing me for weeks now so thanks a
 million for that.
  
  
   On Thu, Aug 21, 2014 at 2:13 PM, Clayton Turner
   caturn...@g.cofc.edu wrote:
  
Awesome. This is just what I needed for the longest time.
   
I'm having a slight issue. When running either the ytex pipeline
or ytex version of the AggregatePlaintextUMLSProcessor I get an
error during initialization.
   
My DictionaryLookupAnnotator.xml is raising a
org.apache.uima.resource.ResourceInitializationException caused by:
java.lang.ClassNotFoundException:
edu.mayo.bmi.uima.lookup.ae.FirstTokenPermLookupInitializerImpl
   
I feel like I may have drifted away from what I need, though,
because before this the CPE was complaining about a lack of
LookupDesc_SNOMED.xml file. I found a ytex version of this on a
google code site somewhere and pasted it where the CPE was looking
for it. Now
   this error is coming up.
   
Could my problem be solved with just a re-run of the ant script
(was just trying to avoid since it takes ages) or is it a different 
issue?
   
   
On Tue, Aug 19, 2014 at 12:58 PM, Tim O'Connell
tim.oconn...@gmail.com
wrote:
   
Hi John,
   
I'm not sure what was going on with the @db.schema@ error,
although I was getting it as well before with my prior build of
3.1.2 - I assume that you've fixed something (thank you!) to make
this go away.  I rebuilt everything from scratch and it's working now.
   
I think one other thing I had to change was that after I had
finished the install/build, the cTakes version of
LookupDesc_Db.xml doesn't work (in
resources\org\apache\ctakes\dictionary\lookup) - I'm pretty sure
I had to copy in an older version of the file from 3.1.1 to get
the default cTakes AggregatePlaintextUMLSProcessor pipeline
working, although please double-check that as my memory is a little
 foggy.
   
But yes, here's what I have working since re-building:
1. ytex-pipeline.xml
2. ytex version of AggregatePlaintextUMLSProcessor.xml
3. cTakes version of AggregatePlaintextUMLSProcessor.xml (with
swapping the LookupDesc_Db.xml file as above)
   
I've even made modifications to the ytex version of
LookupDesc_SNOMED.xml to get it tagging Disease Disorders, along
with
   database modifications to
have it store these entries as well, which is working great.
   Literally,
everything is working perfectly now.
   
Still so much for me to learn!  Let me know if you need any more
  details.
   
All the best,
Tim
   
   
   
On Tue, Aug 19, 2014 at 4:31 AM, John Green
john.travis.gr...@gmail.com
wrote:
   
 I have not had time to implement this - to clarify out of
 curiosity,
does
 this clear up the @db.schema@ error Tim? And did you
 successfully run ytex with the ctakes dictionary-lookup?


 JG
 —
 Sent from Mailbox for iPhone

 On Sat, Aug 16, 2014 at 2:53 AM, Tim O'Connell
 tim.oconn...@gmail.com
 wrote:

  Hi folks,
  I was having an issue with the current build (from svn) of
  ctakes/ytex
 not
  identifying any annotations as some folks on this board.  I
  traced it
to
  the fact that the UMLS database has at sometime in the
  relatively
recent
  past changed the SAB tag in the MRCONSO table for SNOMED
  terms from SNOMEDCT to SNOMEDCT_US.  I just had a newer
  version of
   UMLS
  that uses SNOMEDCT_US.  Thus when the install script tried to
  create

IdentifiedAnnotation.originalText UMLSConcept.preferredText

2014-08-01 Thread Chen, Pei
Hi Tim,
I think these 2 types may have been clobbered accidentally between these 2 
revisions?  
Let me know and I'll add these back in...
https://svn.apache.org/viewvc/ctakes/trunk/ctakes-type-system/src/main/resources/org/apache/ctakes/typesystem/types/TypeSystem.xml?r1=1534759&r2=1549698&diff_format=h

IdentifiedAnnotation.originalText (FSArray)
<description>The covered text of the span or the disjoint spans that resulted 
in the creation of this IdentifiedAnnotation. If the covered text is from 
disjoint spans, they are separated by a delimiter.</description>

UMLSConcept.preferredText (String)
<description>preferredText is the preferred term. Normally this is the UMLS 
preferred name.</description>


RE: Fwd: UMLS integration with cTAKES 3.1​

2014-07-25 Thread Chen, Pei
Natalia,
That is strange.  It sounds like it isn't configured to use the right resource.
Could you double check the mappings, in particular: in your LookupDesc_Db.xml:
externalResourceKey=DbConnection to ensure it's using the right resource that 
was added in your DictionaryLookupAnnotatorDB.xml?
If it still doesn't work, would you mind attaching the xml config files?
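
One way to double-check the mapping is to list which resource name each
externalResourceKey is bound to, and whether a resource with that name is
actually declared. The following is only a minimal sketch in Python: it
assumes the descriptor uses the usual UIMA layout (externalResourceBinding
elements with key and resourceName children, externalResource elements with
a name child), and the sample XML and the resource name DbConnectionResrc
are made up for illustration, not copied from the shipped descriptors.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only (NOT the shipped descriptor). The resource name
# "DbConnectionResrc" is a hypothetical placeholder.
SAMPLE = """
<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <resourceManagerConfiguration>
    <externalResources>
      <externalResource>
        <name>DbConnectionResrc</name>
      </externalResource>
    </externalResources>
    <externalResourceBindings>
      <externalResourceBinding>
        <key>DbConnection</key>
        <resourceName>DbConnectionResrc</resourceName>
      </externalResourceBinding>
    </externalResourceBindings>
  </resourceManagerConfiguration>
</analysisEngineDescription>
"""


def local(tag):
    """Drop the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def resource_bindings(xml_text):
    """Map each binding key to the resource name it points at."""
    bindings = {}
    for el in ET.fromstring(xml_text).iter():
        if local(el.tag) == "externalResourceBinding":
            fields = {local(c.tag): (c.text or "").strip() for c in el}
            bindings[fields.get("key")] = fields.get("resourceName")
    return bindings


def declared_resources(xml_text):
    """Collect the names of all declared external resources."""
    names = set()
    for el in ET.fromstring(xml_text).iter():
        if local(el.tag) == "externalResource":
            for child in el:
                if local(child.tag) == "name":
                    names.add((child.text or "").strip())
    return names


def unresolved_bindings(xml_text):
    """Return binding keys whose resource name is not declared in this file."""
    declared = declared_resources(xml_text)
    return sorted(k for k, r in resource_bindings(xml_text).items()
                  if r not in declared)


print(resource_bindings(SAMPLE))
print(unresolved_bindings(SAMPLE))
```

A key that shows up as unresolved here may simply be declared in another
descriptor that imports this one, so treat the output as a hint about where
to look rather than a verdict.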

--Pei

 -Original Message-
 From: John Green [mailto:john.travis.gr...@gmail.com]
 Sent: Friday, July 25, 2014 1:06 PM
 To: u...@ctakes.apache.org; dev@ctakes.apache.org
 Subject: Re: Fwd: UMLS integration with cTAKES 3.1​
 
 This sounds just like jira ctakes-306.
 
 
 JG
 —
 
 On Fri, Jul 25, 2014 at 1:04 PM, John Green hephaestus.stu...@gmail.com
 wrote:
 
  —
  -- Forwarded message --
  From: Natalia Connolly natalia.v.conno...@gmail.com
  Date: Fri, Jul 25, 2014 at 12:35 PM
  Subject: UMLS integration with cTAKES 3.1
  To: u...@ctakes.apache.org u...@ctakes.apache.org
  Hello,
  I am trying to supplement the basic cTAKES dictionary with the
  latest UMLS release.  Following the instructions here (
  https://cabig-kc.nci.nih.gov/Vocab/forums/viewtopic.php?f=28&t=80#p258),
  I built a mysql database and imported UMLS data into it as two tables,
  umls_ms_2013 and umls_snomed_map. I then modified
  DictionaryLookupAnnotatorDB.xml as follows:
  <name>URL</name>
  <value>
    <string>jdbc:mysql://localhost:3306/umls</string>
  </value>
  and I also changed LookupDesc_Db.xml to reflect my two table names.
  After I added DictionaryLookupAnnotatorDB.xml to my analysis
  engines in runctakesCPE.sh, I got the following error:
  org.apache.uima.analysis_engine.AnalysisEngineProcessException
  CausedBY:
  org.apache.ctakes.dictionary.lookup.DictionaryException:
  java.sql.SQLException: Table not found in statement [SELECT tui,
  text, cui from UMLS_MS_2013 where fword =?]
  This is strange because the table does exist and it's not empty:
  mysql> SELECT tui, text, cui from UMLS_MS_2013 limit 2;
  +------+--------+------+
  | tui  | text   | cui  |
  +------+--------+------+
  | T121 | MSH    | C005 |
  | T121 | MSHFRE | C005 |
  +------+--------+------+
  2 rows in set (0.00 sec)
  Can someone please help?
  Thank you,
  Natalia Connolly
   Thank you,
   Natalia Connolly


RE: Wiki

2014-07-24 Thread Chen, Pei
Ah Yes,
I noticed there was a 'Copy Page Tree' feature that copied the entire pages so 
it was fairly straightforward...

 -Original Message-
 From: John Green [mailto:john.travis.gr...@gmail.com]
 Sent: Wednesday, July 23, 2014 9:11 PM
 To: dev@ctakes.apache.org
 Subject: Wiki
 
 Well Pei, I guess we were editing at the same time 5 hours ago. I kept getting
 this error, and I couldn't figure it out because nothing was in the index yet:
 
 Cause
 
 com.atlassian.confluence.pages.DuplicateDataRuntimeException: A page
 already exists with the title cTAKES 3.2 Component Use Guide in the space
 with key CTAKES
 at
 com.atlassian.confluence.pages.DefaultPageManager.throwIfDuplicateAbstr
 actPageTitle(DefaultPageManager.java:909)
 
 
 You beat me to the punch.
 
 
 JG


RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)

2014-07-22 Thread Chen, Pei
Thanks James.
I was planning on closing the vote today.
In the meantime, does anyone know a quick way to clone/rename the wiki 
documentation for 3.2?
--Pei

 -Original Message-
 From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
 Sent: Monday, July 21, 2014 4:25 PM
 To: 'dev@ctakes.apache.org'
 Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
 
 Here's the additional testing I've done
 
 I ran mvn test with 0 Failures and 0 Errors.
 Ran the AggregateTemplateFiller.xml and received same output (except for
 internal UIMA identifiers) with rc2 as I did with 3.1.1.
 
 +1 to release
 
 -Original Message-
 From: Masanz, James J.
 Sent: Wednesday, July 16, 2014 3:59 PM
 To: 'dev@ctakes.apache.org'
 Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
 
 FYI, so far I have done the following steps:
 
 downloaded the source archive
 compiled it using: maven compile
 downloaded the separately available resources
 set up classpath to include e.g. jars (from the bin distribution)
 set ctakes.umlsuser and ctakes.umlspw env vars
 run runctakesCVD.bat
 loaded AggregatePlaintextUMLSProcessor.xml
 ran against some simple text.
 verified it did not throw an exception.
 verified some EventMention and EntityMention annotations were produced.
 
 I will do more testing tomorrow. Just giving a status update.
 
 --James
 
 -Original Message-
 From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
 Sent: Saturday, July 12, 2014 6:24 AM
 To: dev@ctakes.apache.org
 Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
 
 Agreed on that.
 
 I downloaded the new resources binary and was able to run my tests on the
 -bin version of the RC.
 
 +1 for making this the release.
 
 Tim
 
 
 
 From: Masanz, James J. [masanz.ja...@mayo.edu]
 Sent: Friday, July 11, 2014 7:27 PM
 To: 'dev@ctakes.apache.org'
 Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
 
 I agree about keeping the thread open.
 
 -- James
 
 -Original Message-
 From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu]
 Sent: Friday, July 11, 2014 4:28 PM
 To: dev@ctakes.apache.org
 Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
 
 Updated the lvg.properties file within ctakes-resources on sourceforge [1].
 Since the Apache cTAKES artifacts didn't change, I would like to keep this
 VOTE thread open.
 
 Also renamed it to 3.2.0 (even though they technically do not have to follow
 each other, but probably nice to keep it consistent for users as James
 suggested.)
 [1]
 http://sourceforge.net/projects/ctakesresources/files/ctakes-resources-3.2.0.zip/download
 
  -Original Message-
  From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
  Sent: Thursday, July 10, 2014 5:53 PM
  To: 'dev@ctakes.apache.org'
  Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
 
  Can you also give ctakesresources the number 3.2 or 3.2.0 instead of
  3.1.3
 
  -Original Message-
  From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu]
  Sent: Thursday, July 10, 2014 2:12 PM
  To: dev@ctakes.apache.org
  Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
 
  I think this is due to the fact that the default lvg.properties also
  exists in the ctakes-resources project, so if you download and replace,
  it will override the ctakes configured one.
  I think it's a bug, but it has probably always been there...
  I'll fix up ctakes-resources on sourceforge nevertheless, but it
  shouldn't require any changes to the release candidates.
 
   -Original Message-
   From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
   Sent: Thursday, July 10, 2014 11:59 AM
   To: 'dev@ctakes.apache.org'
   Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
  
   Hi Tim,
  
   When you say that it didn't seem to affect the run, were you
   comparing output to last release or just checking if data seemed OK
   at a glance?
  
   -Original Message-
   From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
   Sent: Thursday, July 10, 2014 7:29 AM
   To: dev@ctakes.apache.org
   Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
  
   I was able to run the binary without issues this time.
   I also downloaded the resources from sourceforge and integrated into
   the bin release and ran with the ctakes dictionary.
  
   I did get some weird exceptions thrown that didn't seem to affect
   the run -- looks like some hardcoded file paths in LVG? (See below)
  
   Tim
  
  
   Exception: java.io.FileNotFoundException:
   /export/home/lu/Development/LVG/lvg2008/data/misc/stopWords.data
   (No such file or directory)
   ** Error: problem of opening/reading stop words file:
   '/export/home/lu/Development/LVG/lvg2008/data/misc/stopWords.data'.
   Exception: java.io.FileNotFoundException:
   /export/home/lu/Development/LVG/lvg2008/data/misc/nonInfoWords.data
   (No such file or directory)
   ** Error: problem of opening/reading non-Info words file:
   '/export/home/lu/Development/LVG

RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)

2014-07-22 Thread Chen, Pei
There are currently no guides on the confluence wiki for cTAKES 3.2.0...
I was thinking of just cloning 3.1.1
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1.1

And just add the YTEX and/or any new changes to it...
Would be grateful for any help here...

 -Original Message-
 From: John Green [mailto:john.travis.gr...@gmail.com]
 Sent: Tuesday, July 22, 2014 2:37 PM
 To: dev@ctakes.apache.org
 Subject: Re: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
 
 What exactly needs updated? I have not had the time (unfortunately) to
 help with this project very much because of the steep learning curve on the
 technology. I'm currently on some protected research time working with
 cTakes as of this week and would be happy to help with some grunt work.
 
 JG
 
 
 On Tue, Jul 22, 2014 at 11:39 AM, Bleeker, Troy C. bleeker.t...@mayo.edu
 wrote:
 
  One page at a time. At least there's that.
 
  Thanks
  Troy
  -Original Message-
  From: Masanz, James J.
  Sent: Tuesday, July 22, 2014 10:38 AM
  To: 'dev@ctakes.apache.org'
  Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
 
  When I asked Troy that question for 3.1.1, he didn't know of a way,
  and I don't either, which is why I had the 3.1.1 page mostly just
  reference the
  3.2 documentation.
 
  -Original Message-
  From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu]
  Sent: Tuesday, July 22, 2014 10:00 AM
  To: dev@ctakes.apache.org
  Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
 
  Thanks James.
  I was planning on closing the vote today.
  In the meantime, does anyone know a quick way to clone/rename the wiki
  documentation for 3.2?
  --Pei
 

RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)

2014-07-11 Thread Chen, Pei
Updated the lvg.properties file within ctakes-resources on sourceforge [1].  
Since the Apache cTAKES artifacts didn't change, I would like to keep this VOTE 
thread open.

Also renamed it to 3.2.0 (even though they technically do not have to follow 
each other, but probably nice to keep it consistent for users as James 
suggested.)
[1] 
http://sourceforge.net/projects/ctakesresources/files/ctakes-resources-3.2.0.zip/download

 -Original Message-
 From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
 Sent: Thursday, July 10, 2014 5:53 PM
 To: 'dev@ctakes.apache.org'
 Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
 
 Can you also give ctakesresources the number 3.2 or 3.2.0 instead of 3.1.3
 
 -Original Message-
 From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu]
 Sent: Thursday, July 10, 2014 2:12 PM
 To: dev@ctakes.apache.org
 Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
 
 I think this is due to the fact that the default lvg.properties also exists in
 the ctakes-resources project, so if you download and replace, it will override
 the ctakes configured one.
 I think it's a bug, but it has probably always been there...
 I'll fix up ctakes-resources on sourceforge nevertheless, but it shouldn't
 require any changes to the release candidates.
 
  -Original Message-
  From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
  Sent: Thursday, July 10, 2014 11:59 AM
  To: 'dev@ctakes.apache.org'
  Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
 
  Hi Tim,
 
  When you say that it didn't seem to affect the run, were you
  comparing output to last release or just checking if data seemed OK at a
  glance?
 
  -Original Message-
  From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
  Sent: Thursday, July 10, 2014 7:29 AM
  To: dev@ctakes.apache.org
  Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
 
  I was able to run the binary without issues this time.
  I also downloaded the resources from sourceforge and integrated into
  the bin release and ran with the ctakes dictionary.
 
  I did get some weird exceptions thrown that didn't seem to affect the
  run -- looks like some hardcoded file paths in LVG? (See below)
 
  Tim
 
 
  Exception: java.io.FileNotFoundException:
  /export/home/lu/Development/LVG/lvg2008/data/misc/stopWords.data
  (No such file or directory)
  ** Error: problem of opening/reading stop words file:
  '/export/home/lu/Development/LVG/lvg2008/data/misc/stopWords.data'.
  Exception: java.io.FileNotFoundException:
  /export/home/lu/Development/LVG/lvg2008/data/misc/nonInfoWords.data
  (No such file or directory)
  ** Error: problem of opening/reading non-Info words file:
  '/export/home/lu/Development/LVG/lvg2008/data/misc/nonInfoWords.data'.
  Exception: java.io.FileNotFoundException:
  /export/home/lu/Development/LVG/lvg2008/data/misc/conjunctionWord.data
  (No such file or directory)
  ** Error: problem of opening/reading conjunction words file:
  '/export/home/lu/Development/LVG/lvg2008/data/misc/conjunctionWord.data'.
  ** ERR: problem of opening/reading diacritics file:
  '/export/home/lu/Development/LVG/lvg2008/data/Unicode/diacriticMap.data'.
  Exception: java.io.FileNotFoundException:
  /export/home/lu/Development/LVG/lvg2008/data/Unicode/diacriticMap.data
  (No such file or directory)
  Exception: java.io.FileNotFoundException:
  /export/home/lu/Development/LVG/lvg2008/data/Unicode/ligatureMap.data
  (No such file or directory)
  ** Error: problem of opening/reading ligature file:
  '/export/home/lu/Development/LVG/lvg2008/data/Unicode/ligatureMap.data'.
  ** Error: problem of opening/reading symbol synonym file:
  '/export/home/lu/Development/LVG/lvg2008/data/Unicode/synonymMap.data'.
  Exception: java.io.FileNotFoundException:
  /export/home/lu/Development/LVG/lvg2008/data/Unicode/synonymMap.data
  (No such file or directory)
  **Error: problem of opening/reading file
  '/export/home/lu/Development/LVG/lvg2008/data/misc/removeS.data'.
  Exception: java.io.FileNotFoundException:
  /export/home/lu/Development/LVG/lvg2008/data/misc/removeS.data
  (No such file or directory)
  ** Error: problem of opening/reading Unicode symbol file:
  '/export/home/lu/Development/LVG/lvg2008/data/Unicode/symbolMap.data'.
  Exception: java.io.FileNotFoundException:
  /export/home/lu/Development/LVG/lvg2008/data/Unicode/symbolMap.data
  (No such file or directory)
  ** Error: problem of opening/reading Unicode file:
  '/export/home/lu/Development/LVG/lvg2008/data/Unicode/unicodeMap.data'.
  Exception: java.io.FileNotFoundException:
  /export/home/lu/Development/LVG/lvg2008/data/Unicode/unicodeMap.data
  (No such file or directory)
  ** Error: problem of opening/reading nonStripMap file:
  '/export/home/lu/Development/LVG/lvg2008/data/Unicode/nonStripMap.data'.
  Exception: java.io.FileNotFoundException:
  /export/home/lu/Development/LVG/lvg2008/data/Unicode/nonStripMap.data

RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)

2014-07-10 Thread Chen, Pei
I think this is due to the fact that the default lvg.properties also exists in 
the ctakes-resources project, so if you download and replace, it will override 
the ctakes configured one.
I think it's a bug, but it has probably always been there...  
I'll fix up ctakes-resources on sourceforge nevertheless, but it shouldn't 
require any changes to the release candidates.
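
To see whether a given lvg.properties carries hardcoded locations like the
ones in the exceptions quoted below, one can scan it for absolute paths that
do not exist on the local machine. This is only a rough sketch in Python
under stated assumptions: the file is treated as plain key=value lines, and
the keys EXAMPLE_DIR and EXAMPLE_MODE in the sample are hypothetical, not
the real LVG property names.

```python
import os


def missing_absolute_paths(properties_text, exists=os.path.exists):
    """Return (key, path) pairs whose value is a Unix-style absolute path
    that does not exist according to `exists`."""
    missing = []
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if value.startswith("/") and not exists(value):
            missing.append((key, value))
    return missing


# Hypothetical keys, for illustration only.
SAMPLE = """
# example properties
EXAMPLE_DIR=/export/home/lu/Development/LVG/lvg2008/
EXAMPLE_MODE=AUTO
"""

for key, path in missing_absolute_paths(SAMPLE):
    print(key, "points at a missing path:", path)
```

Running this against both copies of lvg.properties (the one bundled with
cTAKES and the one from ctakes-resources) would show which of the two brings
in the stale paths.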

 -Original Message-
 From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
 Sent: Thursday, July 10, 2014 11:59 AM
 To: 'dev@ctakes.apache.org'
 Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
 
 Hi Tim,
 
 When you say that it didn't seem to affect the run, were you comparing
 output to last release or just checking if data seemed OK at a glance?
 
 -Original Message-
 From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
 Sent: Thursday, July 10, 2014 7:29 AM
 To: dev@ctakes.apache.org
 Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
 
 I was able to run the binary without issues this time.
 I also downloaded the resources from sourceforge and integrated into the
 bin release and ran with the ctakes dictionary.
 
 I did get some weird exceptions thrown that didn't seem to affect the run --
 looks like some hardcoded file paths in LVG? (See below)
 
 Tim
 
 
 Exception: java.io.FileNotFoundException:
 /export/home/lu/Development/LVG/lvg2008/data/misc/stopWords.data
 (No such file or directory)
 ** Error: problem of opening/reading stop words file:
 '/export/home/lu/Development/LVG/lvg2008/data/misc/stopWords.data'.
 Exception: java.io.FileNotFoundException:
 /export/home/lu/Development/LVG/lvg2008/data/misc/nonInfoWords.data
 (No such file or directory)
 ** Error: problem of opening/reading non-Info words file:
 '/export/home/lu/Development/LVG/lvg2008/data/misc/nonInfoWords.data'.
 Exception: java.io.FileNotFoundException:
 /export/home/lu/Development/LVG/lvg2008/data/misc/conjunctionWord.data
 (No such file or directory)
 ** Error: problem of opening/reading conjunction words file:
 '/export/home/lu/Development/LVG/lvg2008/data/misc/conjunctionWord.data'.
 ** ERR: problem of opening/reading diacritics file:
 '/export/home/lu/Development/LVG/lvg2008/data/Unicode/diacriticMap.data'.
 Exception: java.io.FileNotFoundException:
 /export/home/lu/Development/LVG/lvg2008/data/Unicode/diacriticMap.data
 (No such file or directory)
 Exception: java.io.FileNotFoundException:
 /export/home/lu/Development/LVG/lvg2008/data/Unicode/ligatureMap.data
 (No such file or directory)
 ** Error: problem of opening/reading ligature file:
 '/export/home/lu/Development/LVG/lvg2008/data/Unicode/ligatureMap.data'.
 ** Error: problem of opening/reading symbol synonym file:
 '/export/home/lu/Development/LVG/lvg2008/data/Unicode/synonymMap.data'.
 Exception: java.io.FileNotFoundException:
 /export/home/lu/Development/LVG/lvg2008/data/Unicode/synonymMap.data
 (No such file or directory)
 **Error: problem of opening/reading file
 '/export/home/lu/Development/LVG/lvg2008/data/misc/removeS.data'.
 Exception: java.io.FileNotFoundException:
 /export/home/lu/Development/LVG/lvg2008/data/misc/removeS.data
 (No such file or directory)
 ** Error: problem of opening/reading Unicode symbol file:
 '/export/home/lu/Development/LVG/lvg2008/data/Unicode/symbolMap.data'.
 Exception: java.io.FileNotFoundException:
 /export/home/lu/Development/LVG/lvg2008/data/Unicode/symbolMap.data
 (No such file or directory)
 ** Error: problem of opening/reading Unicode file:
 '/export/home/lu/Development/LVG/lvg2008/data/Unicode/unicodeMap.data'.
 Exception: java.io.FileNotFoundException:
 /export/home/lu/Development/LVG/lvg2008/data/Unicode/unicodeMap.data
 (No such file or directory)
 ** Error: problem of opening/reading nonStripMap file:
 '/export/home/lu/Development/LVG/lvg2008/data/Unicode/nonStripMap.data'.
 Exception: java.io.FileNotFoundException:
 /export/home/lu/Development/LVG/lvg2008/data/Unicode/nonStripMap.data
 (No such file or directory)
 java.sql.SQLException: File input/output error
 /export/home/lu/Development/LVG/lvg2008/data/HSqlDb/lvg2008.properties
 java.io.FileNotFoundException:
 /export/home/lu/Development/LVG/lvg2008/data/HSqlDb/lvg2008.properties.new
 (No such file or directory)
 
 
 From: Masanz, James J. [masanz.ja...@mayo.edu]
 Sent: Wednesday, July 09, 2014 2:26 PM
 To: 'dev@ctakes.apache.org'
 Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
 
 So far for rc2, I've
 
 Verified MD5 and signatures of .zip archives
 Verified that source at tag matches source in .zip
 Imported source from archive into eclipse as maven project(s) and
 compiled without any Errors.
 
 --  James
 
 
 -Original Message-
 From: Pei Chen [mailto:chen...@apache.org]
 Sent: Tuesday, July 08, 2014 5:11 PM
 To: dev@ctakes.apache.org
 Subject: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
 
 Hi all,
 
 The main difference between rc1 and rc2 is that we removed the lvg-res and
 

RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)

2014-07-09 Thread Chen, Pei
The maven artifacts are also available in the staging area:
https://repository.apache.org/content/repositories/orgapachectakes-1001
VJ: Just curious- how did you envision ytex users downloading the jars/war? 
From the distro bin.zip or from maven central?

--Pei

 -Original Message-
 From: Pei Chen [mailto:chen...@apache.org]
 Sent: Tuesday, July 08, 2014 6:11 PM
 To: dev@ctakes.apache.org
 Subject: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
 
 Hi all,
 
 The main difference between rc1 and rc2 is that we removed the lvg-res and
 assertion-res.jar from the distro.  They still need to be unpacked.
 
 This is a call for a vote on releasing the following candidate (rc2) as Apache
 cTAKES 3.2.0.
 The major changes include:
 - New optional YTEX component(s) (Yale Extensions to cTAKES)
 - New optional improved/faster dictionary lookup (dictionary-lookup-fast)
 - New optional Temporal component (Time + Event extraction.  Relations will
 be included in a future release.)
 - Other bug fixes/enhancements from Jira
 
 [TODO: Online documentation still needs to be updated on wiki]
 
 For more detailed information on the changes/release notes, please visit:
 https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313621&version=12324066
 
 The release was made using the cTAKES release process documented here:
 http://ctakes.apache.org/ctakes-release-guide.html
 
 The candidate is available at:
 http://people.apache.org/~chenpei/RCs/ctakes-3.2.0-rc2/apache-ctakes-3.2.0-src.tar.gz
 /.zip
 
 The tag to be voted on:
 http://svn.apache.org/repos/asf/ctakes/tags/ctakes-3.2.0-rc2
 
 The MD5 checksum of the tarball can be found at:
 http://people.apache.org/~chenpei/RCs/ctakes-3.2.0-rc2/apache-ctakes-3.2.0-src.tar.gz.md5
 /.zip.md5
 
 The signature of the tarball can be found at:
 http://people.apache.org/~chenpei/RCs/ctakes-3.2.0-rc2/apache-ctakes-3.2.0-src.tar.gz.asc
 /.zip.asc
 
 Apache cTAKES' KEYS file, containing the PGP keys used to sign the release:
 https://dist.apache.org/repos/dist/release/ctakes/KEYS
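
 For anyone checking the candidate, the MD5 step above can be scripted. The
 sketch below, in Python, assumes the .md5 file holds the hex digest as its
 first whitespace-separated token (some tools write a different layout, so
 adjust the parsing if needed). It covers only the checksum; the PGP
 signature still has to be verified separately against the KEYS file.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file without loading it all at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_file(archive_path, md5_path):
    """Compare an archive against its .md5 companion file.

    Assumption: the first whitespace-separated token in the .md5 file is the
    expected hex digest.
    """
    with open(md5_path, "r", encoding="utf-8") as fh:
        tokens = fh.read().split()
    expected = tokens[0].lower() if tokens else ""
    return md5_of_file(archive_path) == expected
```

 Usage would look like
 matches_md5_file("apache-ctakes-3.2.0-src.tar.gz",
 "apache-ctakes-3.2.0-src.tar.gz.md5") after downloading both files into the
 current directory.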
 
 Please vote on releasing these packages as Apache cTAKES 3.2.0. The vote is
 open for at least the next 72 hours.
 Only votes from the cTAKES PMC are binding, but folks are welcome to check
 the release candidate and voice their approval or disapproval.
 The vote passes if at least three binding +1 votes are cast.
 
 [ ] +1 Release the packages as Apache cTAKES 3.2.0
 [ ] -1 Do not release the packages because...
 
 Also, the convenience binary can be found at:
 http://people.apache.org/~chenpei/RCs/ctakes-3.2.0-rc2/apache-ctakes-3.2.0-bin.tar.gz
 /.zip
 
 Note: It's temporarily on people.a.o because the artifacts were too large for
 https://dist.apache.org/repos/dist/dev/ctakes (Working with infra on
 increasing the limit).
 
 
 Thanks!


RE: Retrieving CUIs

2014-07-08 Thread Chen, Pei
Nick,
On why 'Enterococcus faecium' isn't returned:
I think a limitation is that cTAKES will force a mapping of the TUI's semantic 
types into a semantic group.  If it doesn't exist in the mapping, the 
dictionary lookup won't save it.
It would probably be a nice feature to create an "Other" semantic group and 
default anything that is not mapped into this group.
You can test out this theory by modifying your LookupDesc_Db.xml to include 
T007 (Bacterium) in one of the existing groups, such as findings, and see if 
'Enterococcus faecium' returns for you.

[Assuming you're using the current default dictionary-lookup and not 
dictionary-lookup-fast]
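
The behaviour described above, and the proposed "Other" fallback, can be
sketched in a few lines. This is a toy illustration in Python, not the cTAKES
implementation: the TUI-to-group table, the group names, and the placeholder
CUI for the bacterium are all hypothetical.

```python
# Hypothetical TUI-to-group table, for illustration only; the real mapping
# lives in the cTAKES lookup configuration.
TUI_TO_GROUP = {
    "T059": "procedure",
    "T047": "disorder",
}


def keep_concepts(concepts, fallback=None):
    """Keep (cui, tui) pairs whose TUI maps to a semantic group.

    With fallback=None, unmapped TUIs are dropped, which mirrors the
    limitation described above. Passing fallback="other" keeps them under a
    catch-all group instead.
    """
    kept = []
    for cui, tui in concepts:
        group = TUI_TO_GROUP.get(tui, fallback)
        if group is not None:
            kept.append((cui, tui, group))
    return kept


concepts = [
    ("C0430404", "T059"),         # concept quoted later in this thread
    ("CUI_PLACEHOLDER", "T007"),  # placeholder CUI for a Bacterium concept
]

print(keep_concepts(concepts))                    # T007 entry is dropped
print(keep_concepts(concepts, fallback="other"))  # T007 entry is kept
```

With no fallback the T007 entry silently disappears, which matches the
symptom Nick reports; with a fallback group it survives.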

 -Original Message-
 From: Nick Nikandish [mailto:snika...@emerginghealthit.com]
 Sent: Tuesday, July 08, 2014 4:12 PM
 To: dev@ctakes.apache.org
 Subject: RE: Retrieving CUIs
 
 One of the CUIs that I need to get is for bacteria like Enterococcus faecium,
 but I am not seeing it. I am writing a new annotator to get those CUIs and save
 them in a map. Do you have any applicationContext where I can save those
 values and retrieve them in my annotators, which are at the end of the
 pipeline, or should I use something like ehcache?
 
 
 -Original Message-
 From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
 Sent: Tuesday, July 08, 2014 3:56 PM
 To: 'dev@ctakes.apache.org'
 Subject: RE: Retrieving CUIs
 
 In your CAS, I see
 
 <textsem:ProcedureMention xmi:id="585" sofa="1" begin="0" end="13"
 id="0" ontologyConceptArr="550 564 543 557 571"
 
 Reading the CAS xmi, to see the first element in the array, we look for ID 
 550,
 which takes us to
 
 <refsem:UmlsConcept xmi:id="550" codingScheme="SNOMED"
 code="168335002" cui="C0430404" tui="T059"/>
 
 So it seems like whatever you are doing to try to read out the CUI is not
 accessing something correctly.
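
 The lookup described above (follow each ID in ontologyConceptArr to the
 element with that xmi:id and read its cui attribute) can be sketched outside
 of UIMA as well. The Python below is only an illustration: the sample XMI is
 trimmed down to the two elements quoted above, and the IDs that are not in
 the excerpt are simply skipped.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xmi:id under its expanded namespace name.
XMI_ID = "{http://www.omg.org/XMI}id"

# Trimmed sample built from the two elements quoted above.
SAMPLE = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
    xmlns:refsem="http:///org/apache/ctakes/typesystem/type/refsem.ecore"
    xmlns:textsem="http:///org/apache/ctakes/typesystem/type/textsem.ecore"
    xmi:version="2.0">
  <textsem:ProcedureMention xmi:id="585" sofa="1" begin="0" end="13" id="0"
      ontologyConceptArr="550 564 543 557 571"/>
  <refsem:UmlsConcept xmi:id="550" codingScheme="SNOMED" code="168335002"
      cui="C0430404" tui="T059"/>
</xmi:XMI>
"""


def cuis_by_mention(xmi_text):
    """Map each annotation's xmi:id to the CUIs of its ontology concepts."""
    root = ET.fromstring(xmi_text)
    by_id = {el.get(XMI_ID): el for el in root}
    result = {}
    for el in root:
        refs = el.get("ontologyConceptArr")
        if refs is None:
            continue
        # The array is serialized as a space-separated list of xmi:id values.
        concepts = [by_id[r] for r in refs.split() if r in by_id]
        result[el.get(XMI_ID)] = [c.get("cui") for c in concepts if c.get("cui")]
    return result


print(cuis_by_mention(SAMPLE))
```

 Inside an annotator the same traversal would go through the JCas API rather
 than the serialized XML, but the indirection (mention, then array of IDs,
 then concept) is the same.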
 
 -- James
 
 -Original Message-
 From: Nick Nikandish [mailto:snika...@emerginghealthit.com]
 Sent: Tuesday, July 08, 2014 2:46 PM
 To: dev@ctakes.apache.org
 Subject: RE: Retrieving CUIs
 
 Sure:
 
 <?xml version="1.0" encoding="UTF-8"?>
 <xmi:XMI
 xmlns:refsem="http:///org/apache/ctakes/typesystem/type/refsem.ecore"
 xmlns:type4="http:///org/apache/ctakes/padtermspotter/type.ecore"
 xmlns:cas="http:///uima/cas.ecore"
 xmlns:types3="http:///org/montefiore/cri/nlp/culturetests/uima/types.ecore"
 xmlns:type6="http:///org/apache/ctakes/smokingstatus/i2b2/type.ecore"
 xmlns:util="http:///org/apache/ctakes/typesystem/type/util.ecore"
 xmlns:types="http:///org/apache/ctakes/assertion/medfacts/types.ecore"
 xmlns:type="http:///org/apache/ctakes/constituency/parser/uima/type.ecore"
 xmlns:tcas="http:///uima/tcas.ecore"
 xmlns:type2="http:///org/apache/ctakes/coreference/type.ecore"
 xmlns:types2="http:///org/apache/ctakes/assertion/zoner/types.ecore"
 xmlns:type7="http:///org/apache/ctakes/smokingstatus/type.ecore"
 xmlns:syntax="http:///org/apache/ctakes/typesystem/type/syntax.ecore"
 xmlns:type5="http:///org/apache/ctakes/sideeffect/type.ecore"
 xmlns:xmi="http://www.omg.org/XMI"
 xmlns:textspan="http:///org/apache/ctakes/typesystem/type/textspan.ecore"
 xmlns:assertion="http:///org/apache/ctakes/typesystem/type/temporary/assertion.ecore"
 xmlns:structured="http:///org/apache/ctakes/typesystem/type/structured.ecore"
 xmlns:relation="http:///org/apache/ctakes/typesystem/type/relation.ecore"
 xmlns:textsem="http:///org/apache/ctakes/typesystem/type/textsem.ecore"
 xmlns:type3="http:///org/apache/ctakes/drugner/type.ecore"
 xmlns:libsvm="http:///org/apache/ctakes/smokingstatus/type/libsvm.ecore"
 xmlns:type8="http:///org/apache/ctakes/typesystem/type.ecore"
 xmi:version="2.0">
 <cas:NULL xmi:id="0"/>
 <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text/plain"
 sofaString="urine culture mos source/body site clean catch culture results &gt;100,001col/ml escherichia coli final id e.coli amikacin s ampicillin r &gt;=32 &#13;&#10;"/>
 <cas:Sofa xmi:id="13" sofaNum="2" sofaID="UriView"
 sofaURI="file:/C:/srv/apps/nlp/testdata/tmp.txt"/>
 <tcas:DocumentAnnotation xmi:id="8" sofa="1" begin="0" end="142"
 language="x-unspecified"/>
 <textspan:Segment xmi:id="20" sofa="1" begin="0" end="142"
 id="SIMPLE_SEGMENT"/>
 <textspan:Sentence xmi:id="26" sofa="1" begin="0" end="139"
 sentenceNumber="0"/>
 <syntax:NewlineToken xmi:id="32" sofa="1" begin="140" end="142"
 tokenNumber="30"/>
 <syntax:WordToken xmi:id="40" sofa="1" begin="0" end="5" tokenNumber="0"
 partOfSpeech="NN" capitalization="0" numPosition="0" canonicalForm="urine"/>
 <syntax:WordToken xmi:id="52" sofa="1" begin="6" end="13" tokenNumber="1"
 partOfSpeech="NN" capitalization="0" numPosition="0" canonicalForm="culture"/>
 <syntax:WordToken xmi:id="64" sofa="1" begin="14" end="17" tokenNumber="2"
 partOfSpeech="NNS" capitalization="0" numPosition="0" canonicalForm="mos"/>
 <syntax:WordToken xmi:id="76" sofa="1" begin="18" end="24" tokenNumber="3"
 partOfSpeech="NN" capitalization="0" numPosition="0" canonicalForm="source"/>
 <syntax:WordToken xmi:id="96" sofa="1" begin="25" end="29" tokenNumber="5"
 partOfSpeech="NN" capitalization="0" numPosition="0" canonicalForm="body"/>
 <syntax:WordToken xmi:id="108" sofa="1" begin="30" end="34" tokenNumber="6"
 partOfSpeech="NN" capitalization="0" 

Re: sectionSegmentAnnotator

2014-07-02 Thread Chen, Pei
One can try the CDASegmentAnnotator. Both are rules/regex based, but the 
CDASegmentAnnotator is much simpler and doesn't require the findstruct 3rd 
party lib.

Sent from my iPhone

 On Jul 1, 2014, at 10:19 PM, Harpreet Khanduja hsk5...@rit.edu wrote:
 
 Hello,
 Thanks for getting back.
 I tried again and this is the error.
 
 Exception in thread "main" java.lang.NoSuchMethodError:
 org.jdom.Element.addContent(Lorg/jdom/Element;)Lorg/jdom/Element;
 at findstruct.StructModel$SM.process(StructModel.java:234)
 at findstruct.StructModel.process(StructModel.java:43)
 at findstruct.StructFinder.execute(StructFinder.java:53)
 at org.apache.ctakes.core.ae.SectionSegmentAnnotator.process(SectionSegmentAnnotator.java:90)
 at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
 at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:375)
 at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:296)
 at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
 at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:409)
 at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
 at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
 at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
 at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:409)
 at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
 at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
 at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
 at org.uimafit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:80)
 at org.apache.ctakes.clinicalpipeline.ClinicalPipelineWithUmls.main(ClinicalPipelineWithUmls.java:91)
 
 Thank you so much,
 
 Harpreet
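
A NoSuchMethodError like the one above generally means that the class found at runtime is not the version the caller was compiled against. As a minimal sketch (the class name WhichJar and its messages are illustrative and not part of cTAKES), the following prints where a class such as org.jdom.Element was actually loaded from, which may help spot a conflicting jar on the Eclipse classpath:

```java
import java.security.CodeSource;

// Hypothetical helper, not part of cTAKES: reports where a class was loaded from.
final class WhichJar {

    // Returns the jar/directory a class was loaded from, or a short note if unknown.
    static String locationOf(String className) {
        try {
            // 'false' = do not run static initializers; we only want the origin.
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "(no code source, probably a JDK class)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace above.
        String name = args.length > 0 ? args[0] : "org.jdom.Element";
        System.out.println(name + " -> " + locationOf(name));
    }
}
```

Run it with the same classpath as the failing launch configuration; if the printed location is not the jdom jar you expect, two versions are probably competing.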
 
 
 On Tue, Jul 1, 2014 at 9:51 PM, Masanz, James J. masanz.ja...@mayo.edu
 wrote:
 
 Was the system you were trying to run cTAKES on connected to the  internet
 at the time?
 
 In among all those messages is this line:
 
 Caused by: java.net.UnknownHostException: uts-ws.nlm.nih.gov
 
 Or perhaps it was just a temporary glitch in your connection?
 
 -- James
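
To tell a temporary glitch apart from a machine that cannot reach the UMLS host at all, a quick name-resolution check can be run from the same machine. This is only a sketch under the assumption that the failure is DNS-related, as the UnknownHostException suggests; the class name HostCheck is illustrative:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of cTAKES: checks whether a host name resolves.
final class HostCheck {

    // True if the JVM can resolve the host name (or parse the literal address).
    static boolean canResolve(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the host named in the exception quoted above.
        String host = args.length > 0 ? args[0] : "uts-ws.nlm.nih.gov";
        System.out.println(host + (canResolve(host) ? " resolves" : " does NOT resolve"));
    }
}
```

A successful resolution does not prove the service itself is reachable (a proxy or firewall may still block it), so treat this as a first check only.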
 
 -Original Message-
 From: Harpreet Khanduja [mailto:hsk5...@rit.edu]
 Sent: Tuesday, July 01, 2014 6:04 PM
 To: dev@ctakes.apache.org
 Subject: sectionSegmentAnnotator
 
 Hello,
 
 I would really appreciate if anyone could help me on this.
 
 I am trying to use SectionSegmentAnnotator in developer version of ctakes
 inside eclipse.
 
 I have included
 <delegateAnalysisEngine key="SectionSegmentAnnotator">
   <import location="SectionSegmentAnnotator.xml"/>
 </delegateAnalysisEngine>
 
 and
 
  <node>SectionSegmentAnnotator</node>
 
 in the file AggregatePlaintextUMLSProcessor.xml.
 
 I am getting the following error.
 
 Exception in thread "main"
 org.apache.uima.resource.ResourceInitializationException: Initialization of
 annotator class
 org.apache.ctakes.dictionary.lookup.ae.UmlsDictionaryLookupAnnotator
 failed.  (Descriptor:
 
 file:/D:/workspaces/workspacectakes/ctakes/ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml)
 at
 
 org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:252)
 at
 
 org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:156)
 at
 
 org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
 at
 
 org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
 at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
 at
 org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387)
 at
 org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254)
 at
 
 org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431)
 at
 
 org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375)
 at
 
 org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185)
 at
 
 org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
 at
 
 org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
 at 

Re: Release Apache cTAKES 3.2.0

2014-07-02 Thread Chen, Pei
Himanshu,
There is an RC1 available for voting (see the recent [VOTE] thread on this list). 
Please feel free to try it out and vote. 
It will be released once there are at least three binding +1 votes. 

Sent from my iPhone

 On Jul 2, 2014, at 2:32 PM, Himanshu Singhal himanshusinghal...@gmail.com 
 wrote:
 
 Hi,
 
 When will cTAKES 3.2.0 be released officially? Can you share a link
 where I can download the cTAKES resources for version 3.2.0?
 
 Please also share links to the developer documentation for the latest
 version, i.e. 3.2.0, so that I can set up cTAKES in Eclipse and build
 it successfully.
 
 
 
 -- 
 *Thanks & Regards*
 *HIMANSHU SINGHAL*
 *himanshusinghal...@gmail.com*
 *+91-060661(M)*
 
 - The information transmitted is intended only for the person or entity to
 which it is addressed and may contain confidential and/or privileged
 material. Any review, retransmission, dissemination or other use of, or
 taking of any action in reliance upon, this information by persons or
 entities other than the intended recipient is prohibited. If you received
 this in error, please contact the sender and delete the material from any
 computer.


RE: [VOTE] Release Apache cTAKES 3.2.0

2014-06-30 Thread Chen, Pei
Thanks James.
I just did a Jira review for 3.2.  There are just 2 remaining items that are 
pending some clarification from the respective devs.  Otherwise, it should be up 
to date now; any items that didn't make it into 3.2 have been moved to 3.2.1 
instead.
--Pei

 -Original Message-
 From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
 Sent: Saturday, June 28, 2014 10:26 PM
 To: 'dev@ctakes.apache.org'
 Subject: RE: [VOTE] Release Apache cTAKES 3.2.0
 
 The release notes include some JIRA issues that are open (and I think some
 that have not had any changes done for them)
 
 Example of one that has not been implemented as far as I know:
 https://issues.apache.org/jira/i#browse/CTAKES-122
 
 Example of one that has status=open and Resolution=Unresolved
 https://issues.apache.org/jira/i#browse/CTAKES-224
 
 There are others
 
 -- James
 
 -Original Message-
 From: Pei Chen [mailto:chen...@apache.org]
 Sent: Friday, June 27, 2014 5:16 PM
 To: dev@ctakes.apache.org
 Subject: [VOTE] Release Apache cTAKES 3.2.0
 
 Hi all,
 
 This is a call for a vote on releasing the following candidate (rc1) as Apache
 cTAKES 3.2.0.
 The major changes include:
 - New optional YTEX component(s) (Yale Extensions to cTAKES)
 - New optional improved/faster dictionary lookup (dictionary-lookup-fast)
 - New optional Temporal component (Time + Event extraction.  Relations will
 be included in a future release.)
 - Other bug fixes/enhancements from Jira
 
 [TODO: Online documentation still needs to be updated on wiki for the abo]
 
 For more detailed information on the changes/release notes, please visit:
 https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313621&version=12324066
 
 The release was made using the cTAKES release process documented here:
 http://ctakes.apache.org/ctakes-release-guide.html
 
 The candidate is available at:
 http://people.apache.org/~chenpei/RCs/ctakes-3.2.0/ctakes-3.2.0/apache-ctakes-3.2.0-src.tar.gz
 /.zip
 
 The tag to be voted on:
 http://svn.apache.org/repos/asf/ctakes/tags/ctakes-3.2.0-rc1/
 
 The MD5 checksum of the tarball can be found at:
 http://people.apache.org/~chenpei/RCs/ctakes-3.2.0/ctakes-3.2.0/apache-ctakes-3.2.0-src.tar.gz.md5
 /.zip.md5
 
 The signature of the tarball can be found at:
 http://people.apache.org/~chenpei/RCs/ctakes-3.2.0/ctakes-3.2.0/apache-ctakes-3.2.0-src.tar.gz.asc
 /.zip.asc
 
 Apache cTAKES' KEYS file, containing the PGP keys used to sign the release:
 https://dist.apache.org/repos/dist/release/ctakes/KEYS
 
 Please vote on releasing these packages as Apache cTAKES 3.2.0. The vote is
 open for at least the next 72 hours.
 Only votes from the cTAKES PMC are binding, but folks are welcome to check
 the release candidate and voice their approval or disapproval.
 The vote passes if at least three binding +1 votes are cast.
 
 [ ] +1 Release the packages as Apache cTAKES 3.2.0 [ ] -1 Do not release the
 packages because...
 
 Also, the convenience binary can be found at:
 http://people.apache.org/~chenpei/RCs/ctakes-3.2.0/ctakes-3.2.0/apache-ctakes-3.2.0-bin.tar.gz
 /.zip
 Note: It's temporarily on people.a.o because the artifacts were too large for
 https://dist.apache.org/repos/dist/dev/ctakes (working with infra on
 increasing the limit).
 
 
 Thanks!
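
For anyone checking the release candidate, the MD5 comparison mentioned above can be done with the JDK alone. The following is a minimal sketch, not part of the documented release process; the class name Md5Check and its command-line arguments are illustrative. It verifies integrity only; checking the .asc signature against the KEYS file still requires a PGP tool.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of cTAKES: computes MD5 digests as lowercase hex.
final class Md5Check {

    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Hex-encode a digest so it can be compared with the text of a .md5 file.
    private static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    static String md5Hex(byte[] data) {
        return toHex(newMd5().digest(data));
    }

    // Streams the file so large release artifacts need not fit in memory.
    static String md5Hex(Path file) throws IOException {
        MessageDigest md = newMd5();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return toHex(md.digest());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java Md5Check <artifact> <expected-md5-hex>");
            return;
        }
        String actual = md5Hex(Paths.get(args[0]));
        System.out.println(actual.equalsIgnoreCase(args[1].trim())
                ? "MD5 OK" : "MD5 MISMATCH: " + actual);
    }
}
```

Pass the downloaded tarball and the hex string from the matching .md5 file as the two arguments.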


RE: YTEX install - one error after building

2014-06-27 Thread Chen, Pei
I presume the parameter marker should really have a property name attached to 
it:
catalog="@filter.umls.catalog@" rather than @filter.umls.catalog@ by itself, in 
order to pass XML validation.
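
The difference can be reproduced outside Eclipse with the JDK's XML parser. This is a rough sketch, assuming the validator's complaint is a plain well-formedness error; the class name and the two sample strings are illustrative and are not the real template file:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of cTAKES or YTEX.
final class TemplateWellFormedness {

    // A bare @token@ in attribute position, as in the template line being discussed.
    static final String BARE =
        "<hibernate-mapping package=\"org.apache.ctakes.ytex.umls.model\" "
        + "schema=\"@umls.schema@\" @filter.umls.catalog@/>";

    // The same token wrapped as an attribute value.
    static final String NAMED =
        "<hibernate-mapping package=\"org.apache.ctakes.ytex.umls.model\" "
        + "schema=\"@umls.schema@\" catalog=\"@filter.umls.catalog@\"/>";

    // True if the string parses as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not well-formedness results.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bare token:  " + isWellFormed(BARE));
        System.out.println("named token: " + isWellFormed(NAMED));
    }
}
```

Whether the build's token replacement still produces the intended mapping after such a change is a separate question that this sketch does not address.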

 -Original Message-
 From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
 Sent: Friday, June 27, 2014 2:59 PM
 To: 'dev@ctakes.apache.org'
 Subject: RE: YTEX install - one error after building
 
 I had a workspace in Eclipse that I had checked out 6/6 but never built.
 Today I did a Team > Update and then File > Import > Existing Maven Projects
 to pick up new modules such as temporal relations
 
 Now during build I am getting the same kind of error Paula was seeing a few
 months ago:
 
 Description: Element type "hibernate-mapping" must be followed by either
 attribute specifications, ">" or "/>".
 Resource: UMLS.hbm.template.xml
 Path: /ctakes-ytex-res/src/main/resources/org/apache/ctakes/ytex/umls/model
 Location: line 27
 Type: XML Problem
 
 This is on Windows 7 with Eclipse 4.2.2
 
 Line 27 is the second line of this XML fragment:
 <hibernate-mapping package="org.apache.ctakes.ytex.umls.model"
   schema="@umls.schema@" @filter.umls.catalog@>
 
 Are there parts of these instructions that still apply?
 https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation
 
 Since mine was not a fresh check out from SVN, will people who check out
 fresh from trunk need to perform any of those steps?
 
 -- James
 
 
 -Original Message-
 From: digital paula [mailto:cybersat...@hotmail.com]
 Sent: Wednesday, March 26, 2014 7:05 PM
 To: dev@ctakes.apache.org
 Subject: RE: YTEX install - one error after building
 
 Thanks VJ.  I did try disabling validation in Eclipse but that didn't work.
 At any rate, I see that there is also a 'YTEX installation' link on the cTAKES
 wiki that I did not follow (I was still referring to your code.google.com site
 where YTEX was previously stored), which also seems to align with the README
 file link you provided.  I might not be able to get to it until the weekend,
 but I will let you know; I'm sure it will work once I follow the instructions.
 
  Date: Wed, 26 Mar 2014 10:03:33 -0400
  Subject: Re: YTEX install - one error after building
  From: vnga...@gmail.com
  To: dev@ctakes.apache.org
 
  Hi Paula,
 
  UMLS.hbm.template.xml is a template used to generate a valid hibernate
  xml config file.  If you have imported YTEX into eclipse, follow these
  guidelines:
 
  https://svn.apache.org/repos/asf/ctakes/branches/ytex/ctakes-ytex/README
 
  I believe the issue might be that you have validation enabled for XML;
  I believe you can disable it for specific files (like
  UMLS.hbm.template.xml).  I am using Kepler, and it doesn't complain about
  UMLS.hbm.template.xml; I'm not sure if I tweaked my validator settings.
 
  -vj
 
 
 
  On Tue, Mar 25, 2014 at 6:16 PM, digital paula
 cybersat...@hotmail.com wrote:
 
   Hi VJ,
  
   As part of testing, I  did a fresh install of cTAKES with YTEX and
   everything installed correctly but after building I got one error
   pertaining to this page, five lines down.
  
  
   https://svn.apache.org/repos/asf/ctakes/branches/ytex/ctakes-ytex-res/src/main/resources/org/apache/ctakes/ytex/umls/model/UMLS.hbm.template.xml
  
  
   The error is this line:
   <hibernate-mapping package="org.apache.ctakes.ytex.umls.model"
   schema="@umls.schema@" @filter.umls.catalog@>
  
   Using Eclipse Juno, the error states:
  
   Element type "hibernate-mapping" must be followed by either
   attribute specifications, ">" or "/>".
  
   I tried using /> instead of > and putting it all on one line instead
   of two but can't seem to fix it.
  
   Also,  I was about to install the sectionizer separately as a module
   but I see that YTEX already has a
   sectionizer(SegmentRegexSectionizer) so I look forward to exploring it
 further.
  
   Regards,
   Paula
  
Date: Thu, 20 Mar 2014 14:08:32 -0400
Subject: Re: YTEX Doc in cwiki
From: vnga...@gmail.com
To: dev@ctakes.apache.org
   
I plan to fix all the links.
   
I have not yet moved the scripts for the semantic similarity
benchmark to cTAKES, so I dropped that from the cTAKES semantic
similarity docs.  When those scripts get moved to cTAKES, I'll update
the docs.
   
   
On Thu, Mar 20, 2014 at 12:33 PM, Masanz, James J.
   masanz.ja...@mayo.edu wrote:
   
 hi vijay,

 I have just skimmed a few sections so far.

 the page has links at the top to google docs pages and then links to our
 web pages (the children pages) at the bottom. Is your intent to remove the
 first 3 links once things are finalized?

 some of the examples on the Semantic+Similarity page use cd CTAKES_HOME
 but later use %CTAKES_HOME%

 so it looks like you meant cd %CTAKES_HOME%

 I didn't see anything about the Similarity Benchmark on the new pages. Is
 that still part of ytex?

 -- 

RE: OrangeBookFilterConsumerImpl

2014-06-25 Thread Chen, Pei
Nick,
If I'm reading it correctly, that code change essentially tells it to bypass 
the OrangeBookFilter completely.
If that is the behavior you're looking for (i.e. return all of the drugs, 
bypassing the OrangeBookFilter), then you can just modify the lookupConsumer 
className in your LookupDesc_Db.xml.  Try something like 
NamedEntityLookupConsumerImpl or similar (i.e. no filters); I don't recall 
the exact name off the top of my head.
--Pei

From: Nick Nikandish [mailto:snika...@emerginghealthit.com]
Sent: Wednesday, June 25, 2014 11:38 AM
To: dev@ctakes.apache.org
Subject: OrangeBookFilterConsumerImpl

Hi There,

I am using cTAKES and have added my own annotators that utilize cTAKES. I need 
to use the medication annotator so I can retrieve the medication names. In the 
OrangeBookFilterConsumerImpl class, the consumeHits() method has this statement:
final boolean isValid = isValid( trade_name, text ) || isValid( ingredient, 
text );

It filters out some medications that I actually need. I changed this to 
boolean isValid = true in the code and it worked, but I was wondering whether 
there is another way, like changing something in the XML files, that would have 
the same effect without changing the code?


Thanks,
Nick Nikandish
Product Development Software Engineer
Clinical Research Informatics

Emerging Health
Montefiore Information Technology
6 Executive Blvd. Suite 290, Yonkers, NY 10701
914-457-6792 Office
snika...@montefiore.org
www.emerginghealthit.com
www.montefiore.org




RE: OrangeBookFilterConsumerImpl

2014-06-25 Thread Chen, Pei
1) Could you debug/confirm that MedicationMentions are created by the annotator 
and the new consumer?  If not, could you attach the db config xml and the version 
of cTAKES you are using?
2) Just an FYI: did you know you can use uimaFIT's JCasUtil to simplify the 
example code you had?
Collection<MedicationMention> mentions = JCasUtil.select(jcas, 
MedicationMention.class);
for (MedicationMention mention : mentions) {
    // some work here
}

 -Original Message-
 From: Nick Nikandish [mailto:snika...@emerginghealthit.com]
 Sent: Wednesday, June 25, 2014 12:04 PM
 To: dev@ctakes.apache.org
 Subject: RE: OrangeBookFilterConsumerImpl
 
 Hi Pei,
 
 Thanks, I used the NamedEntityLookupConsumerImpl that you mentioned, but now
 I am getting an error here:
 
 Map<String, org.apache.ctakes.typesystem.type.refsem.OntologyConcept>
 medicationAnnotator(JCas aJCas) {
     Map<String, org.apache.ctakes.typesystem.type.refsem.OntologyConcept> ocMap =
         new HashMap<String, org.apache.ctakes.typesystem.type.refsem.OntologyConcept>();
     FSIndex medIndex = aJCas.getAnnotationIndex(MedicationMention.type);
     Iterator<MedicationMention> medIter = medIndex.iterator();
 
     while (medIter.hasNext()) {
         MedicationMention medMen = medIter.next();
         ocMap.put(medMen.getCoveredText().toLowerCase(),
             medMen.getOntologyConceptArr(0));
     }
     System.out.println("Medication: " +
         patternLists.createPatternList(ocMap.keySet()).toLowerCase());
     return ocMap;
 
 }
 
 I wrote this code in my own annotator to retrieve  the medication names but
 this is not returning anything now. Which class should I  use now to get
 medication names?
 
 -Original Message-
 From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu]
 Sent: Wednesday, June 25, 2014 11:53 AM
 To: dev@ctakes.apache.org
 Subject: RE: OrangeBookFilterConsumerImpl
 
 Nick,
 If I'm reading it correctly, that code change essentially tells it to bypass
 the OrangeBookFilter completely.
 If that is the behavior you're looking for (i.e. return all of the drugs,
 bypassing the OrangeBookFilter), then you can just modify the lookupConsumer
 className in your LookupDesc_Db.xml.  Try something like
 NamedEntityLookupConsumerImpl or similar (i.e. no filters); I don't
 recall the exact name off the top of my head.
 --Pei
 
 From: Nick Nikandish [mailto:snika...@emerginghealthit.com]
 Sent: Wednesday, June 25, 2014 11:38 AM
 To: dev@ctakes.apache.org
 Subject: OrangeBookFilterConsumerImpl
 
 Hi There,
 
 I am using cTAKES and have added my own annotators that utilize cTAKES. I
 need to use the medication annotator so I can retrieve the medication
 names. In the OrangeBookFilterConsumerImpl class, the consumeHits() method
 has this statement:
 final boolean isValid = isValid( trade_name, text ) || isValid( ingredient,
 text );
 
 It filters out some medications that I actually need. I changed this to
 boolean isValid = true in the code and it worked, but I was wondering whether
 there is another way, like changing something in the XML files, that would
 have the same effect without changing the code?
 
 
 Thanks,
 Nick Nikandish
 Product Development Software Engineer
 Clinical Research Informatics
 
 Emerging Health
 Montefiore Information Technology
 6 Executive Blvd. Suite 290, Yonkers, NY 10701
 914-457-6792 Office
 snika...@montefiore.org
 www.emerginghealthit.com
 www.montefiore.org
 



RE: Web demo

2014-06-18 Thread Chen, Pei
There is a demo server set up to host a web UI:
https://demo-ctakes.apache.org/

There is code for a very simple HTML web UI at:
http://svn.apache.org/repos/asf/ctakes/sandbox/ctakes-web-client/
We should be able to just run mvn package and drop the war file into a Tomcat 
instance.
[I haven't had a chance to install Tomcat on the demo VM properly yet.]  Feel 
free to give it a shot if you have time.


 -Original Message-
 From: John Green [mailto:john.travis.gr...@gmail.com]
 Sent: Wednesday, June 18, 2014 1:22 PM
 To: dev@ctakes.apache.org
 Subject: Web demo
 
 Where do we stand on the web demo? I've been off on other projects and
 I'm looking to dig into cTAKES again, picking up where I left off, for an
 application for my med school. I think my work there could overlap with a
 demo server.
 
 In regards to thread safety: it sounds like from the chatter recently we would
 just have to ditch lvg to make it thread safe.
 
 
 Also, I'm not too familiar with how cTAKES loads the modules into memory,
 but any web demo that ran would a) want a RESTful API and b) want those
 responses to be generated against a cTAKES process that is already loaded,
 without having to reload the models, right? Any ideas, either in correcting a
 misconception I may have or on how to proceed there?
 
 
 If I pulled off a demo I'd use Django and Python and one of its RESTful APIs.
 
 
 JG
 —
 Sent from Mailbox for iPhone


RE: Preparing for an Apache cTAKES 3.2 Release?

2014-06-18 Thread Chen, Pei
Renamed to *-fast.  
Again, this is only temporary... this will eventually just replace the existing 
dictionary lookup (next minor release?).

 -Original Message-
 From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu]
 Sent: Tuesday, June 17, 2014 10:14 AM
 To: dev@ctakes.apache.org
 Subject: RE: Preparing for an Apache cTAKES 3.2 Release?
 
 Yes.  It's only temporary to give folks a chance to try out and transition to
 the new lookup algorithm (hence, the +1 for the -fast suffix rename).
 But open to biting the bullet and defaulting it now if folks are compelled to
 do so.
 
  -Original Message-
  From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
  Sent: Monday, June 16, 2014 11:36 AM
  To: dev@ctakes.apache.org
  Subject: RE: Preparing for an Apache cTAKES 3.2 Release?
 
  I guess that I've got one question at this point:
 
  Is the name being given to the -new- dictionary lookup module
  temporary or permanent?
 
  I was under the assumption that it was temporary and that with the
  switch to it being default (and eventually only) the module would
  simply be named dictionary-lookup.
 
 
 
  -Original Message-
  From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
  Sent: Monday, June 16, 2014 11:24 AM
  To: 'dev@ctakes.apache.org'
  Subject: RE: Preparing for an Apache cTAKES 3.2 Release?
 
  I'd rather something else than dictionary-lookup-fast. If we come up
  with something even faster than this one, having an older one called
  fast could be confusing.
 
  -Original Message-
  From: Dligach, Dmitriy [mailto:dmitriy.dlig...@childrens.harvard.edu]
  Sent: Monday, June 16, 2014 9:55 AM
  To: cTAKES Developer list
  Subject: Re: Preparing for an Apache cTAKES 3.2 Release?
 
  +1
 
  Dima
 
 
 
 
  On Jun 16, 2014, at 9:42, Miller, Timothy
  timothy.mil...@childrens.harvard.edu wrote:
 
   Sorry to weigh in so late on this -- just returned from vacation. If
   we want to have a one release delay before making dictionary2
   default for testing/documentation/configuration purposes, and there
   isn't an obvious function-related name, and the main difference is
   speed, maybe we could call it dictionary-lookup-fast? Besides being
   accurate and more descriptive than 2, it might lure people into
   trying it and give us some feedback.
  
   Tim
  
  
   On 06/16/2014 10:34 AM, Chen, Pei wrote:
   I'm making some significant updates to trunk that may cause some
  instability for this release.
   It should be mostly transparent, but let me know if you encounter
   any
  issues with trunk.
  
   Also, regarding the dictionary-lookup2.  If there are no strong
   objections,
  we can leave default to as-is (old behavior).  Folks who wish to give
  the new one a try are welcome to do so and we can change the default
  behavior in a future release.
  
   [ducks for cover now]
   --Pei
  
   -Original Message-
   From: ksa...@gmail.com [mailto:ksa...@gmail.com] On Behalf Of
   Karthik Sarma
   Sent: Wednesday, June 11, 2014 9:58 AM
   To: dev@ctakes.apache.org
   Subject: Re: Preparing for an Apache cTAKES 3.2 Release?
  
   Agreed
  
   On Wednesday, June 11, 2014, vijay garla vnga...@gmail.com wrote:
  
   regardless of the name, I think it would be incredibly helpful to
   have thorough documentation on the dictionary lookup, how to
   configure it, and how to create new dictionaries.  I would
   venture to say that this is the most important component in
   cTAKES, and probably the one that has generated the most
   questions on the
  newsgroup.
  
  
  
   On Wed, Jun 11, 2014 at 9:21 AM, Finan, Sean 
   sean.fi...@childrens.harvard.edu wrote:
  
   . The newer NER should have in its name the Behavior...
   I agree, but the *2 module is a complete replacement for the
   current lookup.  It does not (really) have any different
   behavior, just a
   different
   implementation and performance.  We plan to swap out the old
   with the new in the next release and get rid of the *2 suffix.
   So, any name provided now is just temporary - unless people
   don't like the name dictionary-lookup at all.
  
   In my original sandbox it was named RareWordLookup, a nod to
   its implementation.  However, this doesn't help any users.
  
   Sean
  
   -Original Message-
   From: andy mcmurry [mailto:mcmurry.a...@gmail.com]
   Sent: Wednesday, June 11, 2014 3:09 AM
   To: dev@ctakes.apache.org
   Subject: Re: Preparing for an Apache cTAKES 3.2 Release?
  
   2 doesn't mean much. The newer NER should have in its name the
   Behavior...
  
   Perhaps something like MetaMap Usage
   http://metamap.nlm.nih.gov/Docs/MM09_Usage.shtml --
   allow_overmatches
   or  --allow_concept_gaps or .other?
  
   Since yTex already provides a pluggable *DictionaryLookup, *that
   seems like the best place to define the differing Behavior /  Usage.
  
   https://cwiki.apache.org/confluence/display/CTAKES/User's+Guide
   https://code.google.com/p/ytex/wiki/DictionaryLookup_V05

RE: Preparing for an Apache cTAKES 3.2 Release?

2014-06-17 Thread Chen, Pei
Yes.  It's only temporary to give folks a chance to try out and transition to 
the new lookup algorithm (hence, the +1 for the -fast suffix rename).
But open to biting the bullet and defaulting it now if folks are compelled to 
do so.

 -Original Message-
 From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
 Sent: Monday, June 16, 2014 11:36 AM
 To: dev@ctakes.apache.org
 Subject: RE: Preparing for an Apache cTAKES 3.2 Release?
 
 I guess that I've got one question at this point:
 
 Is the name being given to the -new- dictionary lookup module temporary or
 permanent?
 
 I was under the assumption that it was temporary and that with the switch to
 it being default (and eventually only) the module would simply be named
 dictionary-lookup.
 
 
 
 -Original Message-
 From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
 Sent: Monday, June 16, 2014 11:24 AM
 To: 'dev@ctakes.apache.org'
 Subject: RE: Preparing for an Apache cTAKES 3.2 Release?
 
 I'd rather something else than dictionary-lookup-fast. If we come up with
 something even faster than this one, having an older one called fast could
 be confusing.
 
 -Original Message-
 From: Dligach, Dmitriy [mailto:dmitriy.dlig...@childrens.harvard.edu]
 Sent: Monday, June 16, 2014 9:55 AM
 To: cTAKES Developer list
 Subject: Re: Preparing for an Apache cTAKES 3.2 Release?
 
 +1
 
 Dima
 
 
 
 
 On Jun 16, 2014, at 9:42, Miller, Timothy
 timothy.mil...@childrens.harvard.edu wrote:
 
  Sorry to weigh in so late on this -- just returned from vacation. If
  we want to have a one release delay before making dictionary2 default
  for testing/documentation/configuration purposes, and there isn't an
  obvious function-related name, and the main difference is speed, maybe
  we could call it dictionary-lookup-fast? Besides being accurate and
  more descriptive than 2, it might lure people into trying it and
  give us some feedback.
 
  Tim
 
 
  On 06/16/2014 10:34 AM, Chen, Pei wrote:
  I'm making some significant updates to trunk that may cause some
 instability for this release.
  It should be mostly transparent, but let me know if you encounter any
 issues with trunk.
 
  Also, regarding the dictionary-lookup2.  If there are no strong objections,
 we can leave default to as-is (old behavior).  Folks who wish to give the new
 one a try are welcome to do so and we can change the default behavior in a
 future release.
 
  [ducks for cover now]
  --Pei
 
  -Original Message-
  From: ksa...@gmail.com [mailto:ksa...@gmail.com] On Behalf Of
  Karthik Sarma
  Sent: Wednesday, June 11, 2014 9:58 AM
  To: dev@ctakes.apache.org
  Subject: Re: Preparing for an Apache cTAKES 3.2 Release?
 
  Agreed
 
  On Wednesday, June 11, 2014, vijay garla vnga...@gmail.com wrote:
 
  regardless of the name, I think it would be incredibly helpful to
  have thorough documentation on the dictionary lookup, how to
  configure it, and how to create new dictionaries.  I would venture
  to say that this is the most important component in cTAKES, and
  probably the one that has generated the most questions on the
 newsgroup.
 
 
 
  On Wed, Jun 11, 2014 at 9:21 AM, Finan, Sean 
  sean.fi...@childrens.harvard.edu wrote:
 
  . The newer NER should have in its name the Behavior...
  I agree, but the *2 module is a complete replacement for the
  current lookup.  It does not (really) have any different behavior,
  just a
  different
  implementation and performance.  We plan to swap out the old with
  the new in the next release and get rid of the *2 suffix.  So, any
  name provided now is just temporary - unless people don't like the
  name dictionary-lookup at all.
 
  In my original sandbox it was named RareWordLookup, a nod to its
  implementation.  However, this doesn't help any users.
 
  Sean
 
  -Original Message-
  From: andy mcmurry [mailto:mcmurry.a...@gmail.com]
  Sent: Wednesday, June 11, 2014 3:09 AM
  To: dev@ctakes.apache.org
  Subject: Re: Preparing for an Apache cTAKES 3.2 Release?
 
  2 doesn't mean much. The newer NER should have in its name the
  Behavior...
 
  Perhaps something like MetaMap Usage
  http://metamap.nlm.nih.gov/Docs/MM09_Usage.shtml --
  allow_overmatches
  or  --allow_concept_gaps or .other?
 
  Since yTex already provides a pluggable *DictionaryLookup, *that
  seems like the best place to define the differing Behavior /  Usage.
 
  https://cwiki.apache.org/confluence/display/CTAKES/User's+Guide
  https://code.google.com/p/ytex/wiki/DictionaryLookup_V05
 
 
  AndyMC
 
  On Tue, Jun 10, 2014 at 9:55 AM, britt fitch
  britt.fi...@gmail.com
  wrote:
 
  I don't have an issue with the *-2 name. I also don't have any
  objections to renaming it.
 
  It might be nice to keep the old dictionary code around for a
  release-worth of time but after that I would vote purging it.
  If someone needs it after that it'll be accessible in the
  archived releases.
 
 
 
  On Jun 10, 2014, at 12:48 PM, Chen, Pei
  pei.c

Re: query

2014-06-17 Thread Chen, Pei
If this is trunk, can you do an 'svn update' to ensure you have the latest?
If this is trunk, you won't need to do a separate download; Maven should 
download and unpack it automatically for you. 
Also try the command line alternative:
'mvn -PrunCVD compile' from the root dir. Let me know.


Sent from my iPhone

 On Jun 17, 2014, at 5:37 PM, Harpreet Khanduja hsk5...@rit.edu wrote:
 
 Pei,
 
  Thank you for the quick reply.
  I am using CVD GUI within eclipse which is under
 ctakes-clinical-pipeline project..
 resources
 launch
 UIMA_CVD---clinical_documents_pipeline.launch.
 
  I am using cTAKES 3.1.1, and the resources are also 3.1
 
 I used this link for svn : https://svn.apache.org/repos/asf/ctakes/trunk
 when I downloaded ctakes using Eclipse.
 
 isn't this svn for ctakes 3.1.1?
 
 Thank you,
 
 Harpreet
 
 
 On Tue, Jun 17, 2014 at 5:29 PM, Pei Chen chen...@apache.org wrote:
 
 Harpreet,
 Are you using the CVD GUI? or within Eclipse IDE?
 Also which version cTAKES are you using? trunk?
 --Pei
 
 
 On Tue, Jun 17, 2014 at 5:13 PM, Harpreet Khanduja hsk5...@rit.edu
 wrote:
 
 Hello Pei,
  I would really appreciate if you could help me again.
  After talking to you and reading other email archives. I have done
 almost
 everything I could, but I am not able to use
 AggregatePlainTextUMLSProcessor.xml using
 UIMA_CVDclinical-pipeline.launch.
 
  Just to be sure.
  The resources folder from
 http://sourceforge.net/projects/ctakesresources/files/ctakes-resources-3.1.0.zip/download
  is used to replace the resources folder which is already inside the
 ctakes/ctakes-dictionary-lookup/ directory or the
 ctakes/ctakes-dictionary-lookup-res/source/main/ directory?
 
 And then which directory is used as a classpath to
 ctakes-clinical-pipeline project.
 
 I still get the same exception that I was getting earlier. I looked into
 the exception deeply and I found that the exception is thrown
 while creating a connection object ( iv_conn ) in
 JdbcConnectionResourceImpl.java (   line 109 or 110 ) which is inside
  ctakes- core/
 src/main/java
 org.apache.ctakes.core.resource
package.
 
 Thank you very much.
 Harpreet
 
 
 On Wed, Jun 11, 2014 at 4:33 PM, Chen, Pei 
 pei.c...@childrens.harvard.edu
 wrote:
 
 Harpreet,
 Just curious- is maven able to connect to the internet (maven central
 repositories)? i.e. did you have to set your ~/.m2/settings.xml with
 proxy
 info if behind a firewall?
 If it was an intermittent issue, you can try clearing out the local
 ~/.m2/repository?
 --Pei
 
 -Original Message-
 From: Harpreet Khanduja [mailto:hsk5...@rit.edu]
 Sent: Wednesday, June 11, 2014 3:54 PM
 To: dev@ctakes.apache.org
 Subject: Re: query
 
 Pei,
   I had provided the classpath = ctakes-dictionay-look-up/resources
 to
 all the
 projects in ctakes.
   as it says in the documentation but there was nothing inside my
 target
 folder in ctakes-clinical-pipeline directory.
   So, then I ran maven compile and I got following error.
 
 
 [ERROR] Failed to execute goal on project ctakes-clinical-pipeline:
 Could not
 resolve dependencies for project
 org.apache.ctakes:ctakes-clinical-pipeline:jar:3.1.2-SNAPSHOT: Failed
 to
 collect dependencies for
 [org.apache.ctakes:ctakes-type-system:jar:3.1.2-
 SNAPSHOT (compile), org.apache.ctakes:ctakes-core:jar:3.1.2-SNAPSHOT
 (compile), org.apache.ctakes:ctakes-utils:jar:3.1.2-SNAPSHOT
 (compile),
 jdom:jdom:jar:1.0 (compile), junit:junit:jar:4.10 (test),
 org.apache.ctakes:ctakes-context-tokenizer:jar:3.1.2-SNAPSHOT
 (compile),
 org.apache.ctakes:ctakes-dictionary-lookup:jar:3.1.2-SNAPSHOT
 (compile),
 org.apache.ctakes:ctakes-preprocessor:jar:3.1.2-SNAPSHOT (compile),
 org.apache.ctakes:ctakes-lvg:jar:3.1.2-SNAPSHOT (compile),
 org.apache.ctakes:ctakes-chunker:jar:3.1.2-SNAPSHOT (compile),
 org.apache.ctakes:ctakes-ne-contexts:jar:3.1.2-SNAPSHOT (compile),
 org.apache.ctakes:ctakes-pos-tagger:jar:3.1.2-SNAPSHOT (compile),
 org.apache.ctakes:ctakes-assertion:jar:3.1.2-SNAPSHOT (compile),
 org.apache.ctakes:ctakes-dependency-parser:jar:3.1.2-SNAPSHOT
 (compile),
 org.apache.ctakes:ctakes-dependency-parser-res:jar:3.1.2-SNAPSHOT
 (compile), org.apache.ctakes:ctakes-ytex:jar:3.1.2-SNAPSHOT
 (compile),
 org.apache.ctakes:ctakes-ytex-res:jar:3.1.2-SNAPSHOT (compile),
 org.apache.ctakes:ctakes-ytex-uima:jar:3.1.2-SNAPSHOT (compile)]:
 Failed
 to read artifact descriptor for
 org.apache.ctakes:ctakes-type-system:jar:3.1.2-SNAPSHOT: Failure to
 find
 org.apache.ctakes:ctakes:pom:3.1.2-SNAPSHOT in
 http://repository.apache.org/snapshots was cached in the local
 repository,
 resolution will not be reattempted until the update interval of
 apache.snapshots has elapsed or updates are forced - [Help 1]
 [ERROR]
 
 
 
 On Wed, Jun 11, 2014 at 2:17 PM, Chen, Pei
 pei.c...@childrens.harvard.edu
 wrote:
 
 Harpreet,
 I had a closer look at your log file and it looks like you were
 actually trying

RE: Preparing for an Apache cTAKES 3.2 Release?

2014-06-16 Thread Chen, Pei
I'm making some significant updates to trunk that may cause some instability 
for this release.
It should be mostly transparent, but let me know if you encounter any issues 
with trunk.

Also, regarding dictionary-lookup2: if there are no strong objections, we 
can leave the default as-is (old behavior).  Folks who wish to give the new one 
a try are welcome to do so, and we can change the default behavior in a future 
release.

[ducks for cover now]
--Pei

 -Original Message-
 From: ksa...@gmail.com [mailto:ksa...@gmail.com] On Behalf Of Karthik
 Sarma
 Sent: Wednesday, June 11, 2014 9:58 AM
 To: dev@ctakes.apache.org
 Subject: Re: Preparing for an Apache cTAKES 3.2 Release?
 
 Agreed
 
 On Wednesday, June 11, 2014, vijay garla vnga...@gmail.com wrote:
 
  regardless of the name, I think it would be incredibly helpful to have
  thorough documentation on the dictionary lookup, how to configure it,
  and how to create new dictionaries.  I would venture to say that this
  is the most important component in cTAKES, and probably the one that
  has generated the most questions on the newsgroup.
 
 
 
  On Wed, Jun 11, 2014 at 9:21 AM, Finan, Sean 
  sean.fi...@childrens.harvard.edu wrote:
 
   . The newer NER should have in its name the Behavior...
  
   I agree, but the *2 module is a complete replacement for the current
   lookup.  It does not (really) have any different behavior, just a
  different
   implementation and performance.  We plan to swap out the old with
   the new in the next release and get rid of the *2 suffix.  So, any
   name provided now is just temporary - unless people don't like the
   name dictionary-lookup at all.
  
   In my original sandbox it was named RareWordLookup, a nod to its
   implementation.  However, this doesn't help any users.
  
   Sean
  
   -Original Message-
   From: andy mcmurry [mailto:mcmurry.a...@gmail.com]
   Sent: Wednesday, June 11, 2014 3:09 AM
   To: dev@ctakes.apache.org
   Subject: Re: Preparing for an Apache cTAKES 3.2 Release?
  
   2 doesn't mean much. The newer NER should have in its name the
   Behavior...
  
   Perhaps something like MetaMap Usage
   http://metamap.nlm.nih.gov/Docs/MM09_Usage.shtml --
 allow_overmatches
   or  --allow_concept_gaps or .other?
  
   Since yTex already provides a pluggable *DictionaryLookup, *that
   seems like the best place to define the differing Behavior /  Usage.
  
   https://cwiki.apache.org/confluence/display/CTAKES/User's+Guide
   https://code.google.com/p/ytex/wiki/DictionaryLookup_V05
  
  
   AndyMC
  
   On Tue, Jun 10, 2014 at 9:55 AM, britt fitch britt.fi...@gmail.com
   wrote:
  
I don’t have an issue with the *-2 name. I also don’t have any
objections to renaming it.
   
It might be nice to keep the old dictionary code around for a
release-worth of time but after that I would vote purging it.
If someone needs it after that it’ll be accessible in the archived
releases.
   
   
   
On Jun 10, 2014, at 12:48 PM, Chen, Pei
pei.c...@childrens.harvard.edu
wrote:
   
 I think James has a fair point here.
 It may be worthwhile biting the bullet here and push forward.

 Since this essentially will be a full replacement of the
ctakes-dictionary-lookup module, a good option maybe to just
replace the entire module now and rename the existing module to *
 _deprecated.
 How do folks feel about that?  In a nutshell,
 ctakes-dictionary-lookup-2
is a faster algorithm with a simpler code base- and comparable
results (Sean has a full comparison in the documentation for those
who are
   curious).

 --Pei

 -Original Message-
 From: britt fitch [mailto:britt.fi...@gmail.com]
 Sent: Monday, June 09, 2014 5:42 PM
 To: dev@ctakes.apache.org
 Subject: Re: Preparing for an Apache cTAKES 3.2 Release?

 There is some documentation in the dictionary2 module under
 /doc/DictionaryLookupHelp.{txt | docx} that gives some some
 details of
the
 different lookup implementation options within that module that
 I found helpful.


 On Jun 9, 2014, at 5:17 PM, Masanz, James J.
 
 
 
 
 --
 
 
 
 
 --
 Karthik Sarma
 UCLA Medical Scientist Training Program Class of 20??
 Member, UCLA Medical Imaging  Informatics Lab Member, CA Delegation
 to the House of Delegates of the American Medical Association
 ksa...@ksarma.com
 gchat: ksa...@gmail.com
 linkedin: www.linkedin.com/in/ksarma


ApacheCon CFP closes June 25

2014-06-11 Thread Chen, Pei
Dear cTAKES enthusiast,



As you may be aware, ApacheCon will be held this year in Budapest, on November 
17-23. (See http://apachecon.eu for more info.)



The Call For Papers for that conference is still open, but will be closing 
soon. We need your talk proposals, to represent cTAKES at ApacheCon. We need all 
kinds of talks - deep technical talks, hands-on tutorials, introductions for 
beginners, or case studies about the awesome stuff you're doing with cTAKES.



Please consider submitting a proposal, at 
http://events.linuxfoundation.org/events/apachecon-europe/program/cfp



Thanks!



RE: query

2014-06-11 Thread Chen, Pei
Harpreet,
I had a closer look at your log file and it looks like you were actually trying 
to run it from the Eclipse IDE?
If so, just ensure that the resources exist on the classpath.
If it's within the Eclipse IDE, the plugin should download and unpack the UMLS 
dictionaries automatically. You can check that the directory below exists:
target/classes/org/apache/ctakes/dictionary/lookup/umls2011ab/
You can also try running 'mvn clean compile' from the command line.



 -Original Message-
 From: Harpreet Khanduja [mailto:hsk5...@rit.edu]
 Sent: Wednesday, June 11, 2014 2:09 PM
 To: dev@ctakes.apache.org
 Subject: Re: query
 
 Hello,
Thanks for the reply, but I have already done that and I made sure that
these resources are there all over again.
 
 Harpreet
 
 
 
 On Wed, Jun 11, 2014 at 1:49 PM, Pei Chen chen...@apache.org wrote:
 
  Harpreet,
  Ensure that you have downloaded the dictionaries (umls) per download
 page:
  http://ctakes.apache.org/downloads.cgi
  Resources
 
  Resources are required to run most of cTAKES. They are available for
  download from SourceForge: ctakes-resources-3.1.0.zip 
  http://sourceforge.net/projects/ctakesresources/files/ctakes-resources
  -3.1.0.zip/download
  
  .
 
  Please download, unzip and add/merge the contents to the existing
  resources directory. Follow the User
  https://cwiki.apache.org/confluence/x/oxAHAg
   or Developer https://cwiki.apache.org/confluence/x/nxAHAg Install
  Guide to direct you through the installation process.
 
 
  On Wed, Jun 11, 2014 at 1:45 PM, Harpreet Khanduja hsk5...@rit.edu
  wrote:
 
   Hello,
 I am trying to use ctakes as a developer. I am not able to use
   UMLS resources when I run
 the AEs which use UMLS Dictionary.
 I have signed up on the UMLS website and I am using the correct
   email
  and
   password.
 I have specified username and password in
  
  Dictionary Lookup: cTAKES_HOME/desc/ctakes-dictionary-
lookup/desc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml*
  
  
 But I keep getting the exception. I would really appreciate any
   help on this.
  
OUTPUT THAT I GET  on running AggregatePlaintextUMLSProcessor.xml :
  
  
 log4j: reset attribute= false.
   log4j: Threshold =null.
   log4j: Level value for root is  [INFO].
   log4j: root level set to INFO
   log4j: Class name: [org.apache.log4j.ConsoleAppender]
   log4j: Parsing layout of class: org.apache.log4j.PatternLayout
   log4j: Setting property [conversionPattern] to [%d{dd MMM 
   HH:mm:ss} %5p %c{1} - %m%n].
   log4j: Adding appender named [consoleAppender] to category [root].
   11 Jun 2014 13:07:01  INFO TokenizerAnnotatorPTB - Initializing
   org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
   11 Jun 2014 13:07:01  INFO POSTagger - POS tagger model file:
   org/apache/ctakes/postagger/models/mayo-pos.zip
   11 Jun 2014 13:07:01  INFO ContextDependentTokenizerAnnotator -
   Finite state machines loaded.
   11 Jun 2014 13:07:01  INFO Chunker - Chunker model file:
   org/apache/ctakes/chunker/models/chunker-model.zip
   11 Jun 2014 13:07:03  INFO SentenceDetector - Sentence detector
   model
  file:
   org/apache/ctakes/core/sentdetect/sd-med-model.zip
   11 Jun 2014 13:07:03  INFO LvgCmdApiResourceImpl - Loading NLM Norm
   and
  Lvg
   with config file =
  
  
  D:\workspaces\workspace_cTakes\ctakes\ctakes-dictionary-
 lookup\resourc
  es\org\apache\ctakes\lvg\data\config\lvg.properties
   11 Jun 2014 13:07:03  INFO LvgCmdApiResourceImpl -   config file absolute
   path =
  
  
  D:\workspaces\workspace_cTakes\ctakes\ctakes-dictionary-
 lookup\resourc
  es\org\apache\ctakes\lvg\data\config\lvg.properties
   11 Jun 2014 13:07:03  INFO LvgCmdApiResourceImpl - cwd =
   D:\workspaces\workspace_cTakes\ctakes\ctakes-clinical-pipeline
   11 Jun 2014 13:07:03  INFO LvgCmdApiResourceImpl - cd
  
  
  D:\workspaces\workspace_cTakes\ctakes\ctakes-dictionary-
 lookup\resourc
  es\org\apache\ctakes\lvg\
   11 Jun 2014 13:07:03  INFO LvgCmdApiResourceImpl - cd
   D:\workspaces\workspace_cTakes\ctakes\ctakes-clinical-pipeline
   11 Jun 2014 13:07:04  INFO JdbcConnectionResourceImpl - Connection
   established to:
   jdbc:hsqldb:res:/org/apache/ctakes/dictionary/lookup/umls2011ab/umls
   Exception in thread main
   org.apache.uima.resource.ResourceInitializationException
   at
  
  
 
 org.apache.ctakes.core.resource.JdbcConnectionResourceImpl.load(JdbcCo
  nnectionResourceImpl.java:130)
   at
  
  
 
 org.apache.uima.resource.impl.ResourceManager_impl.registerResource(Re
  sourceManager_impl.java:603)
   at
  
  
  org.apache.uima.resource.impl.ResourceManager_impl.initializeExternalR
  esources(ResourceManager_impl.java:442)
   at
  
  
  org.apache.uima.resource.Resource_ImplBase.initialize(Resource_ImplBas
  e.java:153)
   at
  
  
  org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.initialize
  (AnalysisEngineImplBase.java:157)
   at
  
  
  

RE: query

2014-06-11 Thread Chen, Pei
Harpreet,
Just curious: is Maven able to connect to the internet (Maven Central 
repositories)? I.e., did you have to set up your ~/.m2/settings.xml with proxy 
info if you are behind a firewall?
If it was an intermittent issue, you can try clearing out the local 
~/.m2/repository.
--Pei
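
A note on the "was cached in the local repository" error quoted below: as far as I know, Maven 3 records a failed download by leaving a marker file ending in .lastUpdated next to the artifact, and running Maven with -U forces it to re-check. Before clearing the whole ~/.m2/repository, a small sketch like the following can list those markers. The class name StaleMavenMarkers and the method findLastUpdatedMarkers are made up for illustration; the sketch only lists files and deletes nothing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper, not part of Maven or cTAKES. */
final class StaleMavenMarkers {

    private StaleMavenMarkers() {}

    /**
     * Returns every regular file under repositoryRoot whose name ends in
     * ".lastUpdated" (assumed here to be Maven's failed-resolution marker).
     * A missing root directory yields an empty list.
     */
    static List<Path> findLastUpdatedMarkers(Path repositoryRoot) {
        if (!Files.isDirectory(repositoryRoot)) {
            return Collections.emptyList();
        }
        try (Stream<Path> paths = Files.walk(repositoryRoot)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".lastUpdated"))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Default local repository location; adjust if settings.xml overrides it.
        Path repo = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        for (Path marker : findLastUpdatedMarkers(repo)) {
            System.out.println(marker);
        }
    }
}
```

If the list contains entries under org/apache/ctakes, removing just that directory (or rerunning with -U) may be enough, rather than wiping the whole repository.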

 -Original Message-
 From: Harpreet Khanduja [mailto:hsk5...@rit.edu]
 Sent: Wednesday, June 11, 2014 3:54 PM
 To: dev@ctakes.apache.org
 Subject: Re: query
 
 Pei,
I had provided the classpath = ctakes-dictionay-look-up/resources to all 
 the
 projects in ctakes.
as it says in the documentation but there was nothing inside my target
 folder in ctakes-clinical-pipeline directory.
So, then I ran maven compile and I got following error.
 
 
  [ERROR] Failed to execute goal on project ctakes-clinical-pipeline: Could not
 resolve dependencies for project
 org.apache.ctakes:ctakes-clinical-pipeline:jar:3.1.2-SNAPSHOT: Failed to
 collect dependencies for [org.apache.ctakes:ctakes-type-system:jar:3.1.2-
 SNAPSHOT (compile), org.apache.ctakes:ctakes-core:jar:3.1.2-SNAPSHOT
 (compile), org.apache.ctakes:ctakes-utils:jar:3.1.2-SNAPSHOT (compile),
 jdom:jdom:jar:1.0 (compile), junit:junit:jar:4.10 (test),
 org.apache.ctakes:ctakes-context-tokenizer:jar:3.1.2-SNAPSHOT (compile),
 org.apache.ctakes:ctakes-dictionary-lookup:jar:3.1.2-SNAPSHOT (compile),
 org.apache.ctakes:ctakes-preprocessor:jar:3.1.2-SNAPSHOT (compile),
 org.apache.ctakes:ctakes-lvg:jar:3.1.2-SNAPSHOT (compile),
 org.apache.ctakes:ctakes-chunker:jar:3.1.2-SNAPSHOT (compile),
 org.apache.ctakes:ctakes-ne-contexts:jar:3.1.2-SNAPSHOT (compile),
 org.apache.ctakes:ctakes-pos-tagger:jar:3.1.2-SNAPSHOT (compile),
 org.apache.ctakes:ctakes-assertion:jar:3.1.2-SNAPSHOT (compile),
 org.apache.ctakes:ctakes-dependency-parser:jar:3.1.2-SNAPSHOT (compile),
 org.apache.ctakes:ctakes-dependency-parser-res:jar:3.1.2-SNAPSHOT
 (compile), org.apache.ctakes:ctakes-ytex:jar:3.1.2-SNAPSHOT (compile),
 org.apache.ctakes:ctakes-ytex-res:jar:3.1.2-SNAPSHOT (compile),
 org.apache.ctakes:ctakes-ytex-uima:jar:3.1.2-SNAPSHOT (compile)]: Failed
 to read artifact descriptor for
 org.apache.ctakes:ctakes-type-system:jar:3.1.2-SNAPSHOT: Failure to find
 org.apache.ctakes:ctakes:pom:3.1.2-SNAPSHOT in
 http://repository.apache.org/snapshots was cached in the local repository,
 resolution will not be reattempted until the update interval of
 apache.snapshots has elapsed or updates are forced - [Help 1] [ERROR]
 
 
 
 On Wed, Jun 11, 2014 at 2:17 PM, Chen, Pei
 pei.c...@childrens.harvard.edu
 wrote:
 
  Harpreet,
  I had a closer look at your log file and it looks like you were
  actually trying to run it from Eclipse IDE?
  If so, just ensure that the resources do exist in the classpath.
  If it's within eclipse ide, the plugin should download and unpack the
  umls dictionaries automatically actually. (you can check the below to
  ensure it exists
  target/classes/org/apache/ctakes/dictionary/lookup/umls2011ab/)
  You can also try running 'mvn clean compile' from the command line as
 well.
 
 
 
   -Original Message-
   From: Harpreet Khanduja [mailto:hsk5...@rit.edu]
   Sent: Wednesday, June 11, 2014 2:09 PM
   To: dev@ctakes.apache.org
   Subject: Re: query
  
   Hello,
  Thanks for the reply, but I have already done that and I made
   sure
  that
  these resources are there all over again.
  
   Harpreet
  
  
  
   On Wed, Jun 11, 2014 at 1:49 PM, Pei Chen chen...@apache.org
 wrote:
  
Harpreet,
Ensure that you have downloaded the dictionaries (umls) per
download
   page:
http://ctakes.apache.org/downloads.cgi
Resources
   
Resources are required to run most of cTAKES. They are available
for download from SourceForge: ctakes-resources-3.1.0.zip 
http://sourceforge.net/projects/ctakesresources/files/ctakes-resou
rces
-3.1.0.zip/download

.
   
Please download, unzip and add/merge the contents to the existing
resources directory. Follow the User
https://cwiki.apache.org/confluence/x/oxAHAg
 or Developer https://cwiki.apache.org/confluence/x/nxAHAg
Install Guide to direct you through the installation process.
   
   
On Wed, Jun 11, 2014 at 1:45 PM, Harpreet Khanduja
hsk5...@rit.edu
wrote:
   
 Hello,
   I am trying to use ctakes as a developer. I am not able to use
 UMLS resources when I run
   the AEs which use UMLS Dictionary.
   I have signed up on the UMLS website and I am using the
 correct email
and
 password.
   I have specified username and password in

Dictionary Lookup: cTAKES_HOME/desc/ctakes-dictionary-

 lookup/desc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml*


   But I keep getting the exception. I would really appreciate
 any help on this.

  OUTPUT THAT I GET  on running
 AggregatePlaintextUMLSProcessor.xml :


   log4j: reset attribute= false.
 log4j: Threshold =null.
 log4j

RE: Preparing for an Apache cTAKES 3.2 Release?

2014-06-10 Thread Chen, Pei
I think James has a fair point here.
It may be worthwhile to bite the bullet here and push forward.

Since this will essentially be a full replacement of the 
ctakes-dictionary-lookup module, a good option may be to just replace the entire 
module now and rename the existing module to *_deprecated.
How do folks feel about that?  In a nutshell, ctakes-dictionary-lookup-2 is a 
faster algorithm with a simpler code base and comparable results (Sean has a 
full comparison in the documentation for those who are curious).

--Pei

 -Original Message-
 From: britt fitch [mailto:britt.fi...@gmail.com]
 Sent: Monday, June 09, 2014 5:42 PM
 To: dev@ctakes.apache.org
 Subject: Re: Preparing for an Apache cTAKES 3.2 Release?
 
 There is some documentation in the dictionary2 module under
 /doc/DictionaryLookupHelp.{txt | docx} that gives some some details of the
 different lookup implementation options within that module that I found
 helpful.
 
 
 On Jun 9, 2014, at 5:17 PM, Masanz, James J. masanz.ja...@mayo.edu
 wrote:
 
 
  Will ctakes-dictionary-lookup2 remain the name for the new dictionary
 lookup or will it have a name that reflects the algorithm?
 
  Is there a description of it that will help users to decide when to use one
 dictionary lookup component vs. the other.
 
  -- James
 
  -Original Message-
  From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu]
  Sent: Friday, June 06, 2014 12:34 PM
  To: dev@ctakes.apache.org
  Subject: Preparing for an Apache cTAKES 3.2 Release?
 
  Hi,
  The 3.2 release was slated to be release end of this month (Jun 21).
  Since I volunteered to be the RM for this release, just like the past
 releases, I was planning to create a branch/tag next week from trunk and
 dev can continue.
  Feel free to take a look at any outstanding Jira issues [1] that you may 
  want
 to be included in this release.
 
  Major changes include:
  CTAKES-197Upgrade cTAKES to Java 7
  CTAKES-292Integrate YTEX with cTAKES
  CTAKES-82  Add ctakes-temporal module (Time and Event Annotator +
 DocTimeRel Property only?)
 
  [1]
  https://issues.apache.org/jira/browse/CTAKES-
 298?jql=fixVersion%20%3D%
  203.2.0%20AND%20project%20%3D%20CTAKES
 
  -Original Message-
  From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
  Sent: Wednesday, March 26, 2014 9:34 PM
  To: 'dev@ctakes.apache.org'
  Subject: RE: Apache cTAKES 3.2 Release?
 
  +1 to naming it 3.2
 
  I'll review my JIRA items this week.
 
  -- James
 
  -Original Message-
  From: Pei Chen [mailto:chen...@apache.org]
  Sent: Wednesday, March 26, 2014 10:14 AM
  To: dev@ctakes.apache.org
  Subject: Apache cTAKES 3.2 Release?
 
  Hi,
 
  I think there are a lot of items slated for the next release, I
  suggest we make it 3.2 instead of another patch release.
 
  I can volunteer to be the RM unless someone would like to take that up...
 
 
 
  Main Changes pending for 3.2:
 
  CTAKES-197Upgrade cTAKES to Java 7
 
  CTAKES-292Integrate YTEX with cTAKES
 
  CTAKES-82  Add ctakes-temporal module (Time and Event Annotator
 +
  DocTimeRel Property only?)
 
  CTAKES-275some of the older junit tests don't have the right
  Project name in the run configurations
 
  CTAKES-268Fix SentenceDetector training with updated OpenNLP API
 
  CTAKES-162Command line scripts leave the user back one directory
 
  CTAKES-241NullPointerException in ctakes-assertion
 
  CTAKES-288Severity not set for DiseaseDisorderMention
 
  CTAKES-239Medication Modifiers do not have the offsets populated
 
  CTAKES-94  refactoring assertion module to use a cleartk-based
  analysis engine (and include evaluation)
 
  CTAKES-232change concept type
 
  CTAKES-76  get third party dependencies into Maven Central
 
  CTAKES-138Remove 3rd party jars from our SVN
 
  CTAKES-74  Tokenizer PennTreeBank breaks with certain apostrophes
  in tokens.
 
  CTAKES-225Common Type System - Add field to save preferredText in
  Segment
 
  CTAKES-222FirstTokenPermLookupInitializerImpl to suppot arraylist
  of DictionaryLookupWindows
 
  CTAKES-213ModifierExtractorAnnotator should produce XxxxModifier
  subtypes
 
 
 
  Full List:
 
  https://issues.apache.org/jira/browse/CTAKES-
 
 288?jql=project%20%3D%20CTAKES%20AND%20fixVersion%20%3D%203.2%
 
 20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC%2C%20create
  d%20ASC



RE: Preparing for an Apache cTAKES 3.2 Release?

2014-06-09 Thread Chen, Pei
I'm not sure if it's worth it to keep both for a prolonged period of time. 
We can just replace the old module after the following release?

What are folks' preferences? 
I think we can just leave both temporarily for a short transition period (1 
release?). 
--Pei

 -Original Message-
 From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
 Sent: Monday, June 09, 2014 5:18 PM
 To: 'dev@ctakes.apache.org'
 Subject: RE: Preparing for an Apache cTAKES 3.2 Release?
 
 
 Will ctakes-dictionary-lookup2 remain the name for the new dictionary
 lookup or will it have a name that reflects the algorithm?
 
 Is there a description of it that will help users to decide when to use one
 dictionary lookup component vs. the other.
 
 -- James
 
 -Original Message-
 From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu]
 Sent: Friday, June 06, 2014 12:34 PM
 To: dev@ctakes.apache.org
 Subject: Preparing for an Apache cTAKES 3.2 Release?
 
 Hi,
 The 3.2 release was slated to be release end of this month (Jun 21).
 Since I volunteered to be the RM for this release, just like the past 
 releases, I
 was planning to create a branch/tag next week from trunk and dev can
 continue.
 Feel free to take a look at any outstanding Jira issues [1] that you may want
 to be included in this release.
 
 Major changes include:
 CTAKES-197Upgrade cTAKES to Java 7
 CTAKES-292Integrate YTEX with cTAKES
 CTAKES-82  Add ctakes-temporal module (Time and Event Annotator +
 DocTimeRel Property only?)
 
 [1] https://issues.apache.org/jira/browse/CTAKES-
 298?jql=fixVersion%20%3D%203.2.0%20AND%20project%20%3D%20CTAKES
 
  -Original Message-
  From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
  Sent: Wednesday, March 26, 2014 9:34 PM
  To: 'dev@ctakes.apache.org'
  Subject: RE: Apache cTAKES 3.2 Release?
 
  +1 to naming it 3.2
 
  I'll review my JIRA items this week.
 
  -- James
 
  -Original Message-
  From: Pei Chen [mailto:chen...@apache.org]
  Sent: Wednesday, March 26, 2014 10:14 AM
  To: dev@ctakes.apache.org
  Subject: Apache cTAKES 3.2 Release?
 
  Hi,
 
  I think there are a lot of items slated for the next release, I
  suggest we make it 3.2 instead of another patch release.
 
  I can volunteer to be the RM unless someone would like to take that up...
 
 
 
  Main Changes pending for 3.2:
 
  CTAKES-197Upgrade cTAKES to Java 7
 
  CTAKES-292Integrate YTEX with cTAKES
 
  CTAKES-82  Add ctakes-temporal module (Time and Event Annotator +
  DocTimeRel Property only?)
 
  CTAKES-275some of the older junit tests don't have the right
  Project name in the run configurations
 
  CTAKES-268Fix SentenceDetector training with updated OpenNLP API
 
  CTAKES-162Command line scripts leave the user back one directory
 
  CTAKES-241NullPointerException in ctakes-assertion
 
  CTAKES-288Severity not set for DiseaseDisorderMention
 
  CTAKES-239Medication Modifiers do not have the offsets populated
 
  CTAKES-94  refactoring assertion module to use a cleartk-based
  analysis engine (and include evaluation)
 
  CTAKES-232change concept type
 
  CTAKES-76  get third party dependencies into Maven Central
 
  CTAKES-138Remove 3rd party jars from our SVN
 
  CTAKES-74  Tokenizer PennTreeBank breaks with certain apostrophes
  in tokens.
 
  CTAKES-225Common Type System - Add field to save preferredText in
  Segment
 
  CTAKES-222FirstTokenPermLookupInitializerImpl to suppot arraylist
  of DictionaryLookupWindows
 
  CTAKES-213ModifierExtractorAnnotator should produce XxxxModifier
  subtypes
 
 
 
  Full List:
 
  https://issues.apache.org/jira/browse/CTAKES-
 
 288?jql=project%20%3D%20CTAKES%20AND%20fixVersion%20%3D%203.2%
 
 20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC%2C%20create
  d%20ASC


Preparing for an Apache cTAKES 3.2 Release?

2014-06-06 Thread Chen, Pei
Hi,
The 3.2 release was slated to be released at the end of this month (Jun 21).
Since I volunteered to be the RM for this release, just like the past releases, 
I was planning to create a branch/tag next week from trunk and dev can continue.
Feel free to take a look at any outstanding Jira issues [1] that you may want 
to be included in this release.

Major changes include:
CTAKES-197  Upgrade cTAKES to Java 7
CTAKES-292  Integrate YTEX with cTAKES
CTAKES-82   Add ctakes-temporal module (Time and Event Annotator + 
            DocTimeRel Property only?)

[1] 
https://issues.apache.org/jira/browse/CTAKES-298?jql=fixVersion%20%3D%203.2.0%20AND%20project%20%3D%20CTAKES

 -Original Message-
 From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
 Sent: Wednesday, March 26, 2014 9:34 PM
 To: 'dev@ctakes.apache.org'
 Subject: RE: Apache cTAKES 3.2 Release?
 
 +1 to naming it 3.2
 
 I'll review my JIRA items this week.
 
 -- James
 
 -Original Message-
 From: Pei Chen [mailto:chen...@apache.org]
 Sent: Wednesday, March 26, 2014 10:14 AM
 To: dev@ctakes.apache.org
 Subject: Apache cTAKES 3.2 Release?
 
 Hi,
 
 I think there are a lot of items slated for the next release, I suggest we 
 make
 it 3.2 instead of another patch release.
 
 I can volunteer to be the RM unless someone would like to take that up...
 
 
 
 Main Changes pending for 3.2:
 
 CTAKES-197  Upgrade cTAKES to Java 7
 CTAKES-292  Integrate YTEX with cTAKES
 CTAKES-82   Add ctakes-temporal module (Time and Event Annotator +
             DocTimeRel Property only?)
 CTAKES-275  some of the older junit tests don't have the right
             Project name in the run configurations
 CTAKES-268  Fix SentenceDetector training with updated OpenNLP API
 CTAKES-162  Command line scripts leave the user back one directory
 CTAKES-241  NullPointerException in ctakes-assertion
 CTAKES-288  Severity not set for DiseaseDisorderMention
 CTAKES-239  Medication Modifiers do not have the offsets populated
 CTAKES-94   refactoring assertion module to use a cleartk-based
             analysis engine (and include evaluation)
 CTAKES-232  change concept type
 CTAKES-76   get third party dependencies into Maven Central
 CTAKES-138  Remove 3rd party jars from our SVN
 CTAKES-74   Tokenizer PennTreeBank breaks with certain apostrophes
             in tokens.
 CTAKES-225  Common Type System - Add field to save preferredText in
             Segment
 CTAKES-222  FirstTokenPermLookupInitializerImpl to suppot arraylist
             of DictionaryLookupWindows
 CTAKES-213  ModifierExtractorAnnotator should produce XxxxModifier
             subtypes
 
 Full List:
 
 https://issues.apache.org/jira/browse/CTAKES-288?jql=project%20%3D%20CTAKES%20AND%20fixVersion%20%3D%203.2%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC%2C%20created%20ASC


RE: Missing artifact org.apache.ctakes:ctakes-ytex-res:jar:3.1.2-SNAPSHOT

2014-05-16 Thread Chen, Pei
Michal,
Quick question:
Do the ctakes-ytex* physical folders exist after your refresh/checkout?
https://svn.apache.org/repos/asf/ctakes/trunk/
Depending on the setup, I don't recall whether 'svn up' automatically adds new 
folders that were added after your original checkout or whether you have to 
specify them explicitly.
--Pei

 -Original Message-
 From: michal.iglew...@uqo.ca [mailto:michal.iglew...@uqo.ca]
 Sent: Wednesday, May 14, 2014 9:54 PM
 To: dev@ctakes.apache.org
 Subject: Missing artifact org.apache.ctakes:ctakes-ytex-res:jar:3.1.2-
 SNAPSHOT
 
 Hi,
 
 I synchronised my copy of project with the repository and since then I cannot
 execute ctakes-clinical-pipeline project. I'm getting the message
 Missing artifact org.apache.ctakes:ctakes-ytex-res:jar:3.1.2-SNAPSHOT
 pom.xml /ctakes-clinical-pipelineline 88   Maven 
 Dependency
 Problem
 
 How to fix this problem?
 
 Michal



RE: markable types

2014-05-16 Thread Chen, Pei
+1 for a consolidated common type system...
I would go a step further: 'Markable' seems like a pretty general concept, so 
if folks can think of other uses, maybe we can subclass it (Markable, with 
CoRefMarkable as a subclass)?

 -Original Message-
 From: Steven Bethard [mailto:steven.beth...@gmail.com]
 Sent: Sunday, May 11, 2014 8:12 AM
 To: dev@ctakes.apache.org
 Subject: Re: markable types
 
 I don't think "not something anyone would want extracted" should be an
 argument against anything. We already have constituent and dependency
 parse trees in the type system, and those would fall under that category.
 
 So +1 on markables in the type system. (In general, +1 on moving module-
 specific types to the standard type system. I'm not sure what the real benefit
 of splitting them out is...)
 
 Steve
 
 On Fri, May 9, 2014 at 11:53 AM, Miller, Timothy
 timothy.mil...@childrens.harvard.edu wrote:
  What do people think about taking the markable types out of the
  coreference project and adding them to the standard type system? This
  is a pretty standard concept in coreference that doesn't really have a
  great natural representation in the current type system -- it
  encompasses IdentifiedAnnotations as well as pronouns (It, him,
  her) and some determiners (this).
 
  The drawback I can see is that it is probably not something anyone
  would want extracted -- ultimately you want the actual coref pairs or
 chains.
  But it is useful for things like representing gold standard input or
  splitting coreference resolution into separate markable recognition
  and relation classification steps.
 
  Tim
 


resources in ctakes jars

2014-05-02 Thread Chen, Pei
There is a filter in the root pom.xml that includes only these types inside the 
jars at package time (see the plugin configuration below).
So essentially, all of the jars from *-res projects will be empty.
I think this was a residual setting from when we were still in the incubator and 
were debating whether resources should be included.
Are there any objections to removing this filter so that models will be 
included in the resource jars?  There will still be other issues that need to 
be resolved, but this will be a first step in allowing the models to be read in 
from a jar.

--Pei
<plugin>
  <artifactId>maven-jar-plugin</artifactId>
  <version>2.4</version>
  <configuration>
    <includes>
      <!-- Resources will be copied by the assembly
           No need for it to be inside the jar -->
      <include>**/*.class</include>
      <include>**/types/*.xml</include>
      <include>**/META-INF/**</include>
    </includes>
  </configuration>
</plugin>
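
To illustrate what "read in from a jar" means in practice: a minimal sketch, assuming a hypothetical helper class named ResourceStreams (not part of cTAKES). It opens a resource from the filesystem if the file exists and otherwise falls back to the classpath, reading it as a stream. Stream-based loading keeps working when the resource is packed inside a jar, whereas converting the resource URL to a java.io.File fails there with "URI is not hierarchical".

```java
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of cTAKES.
class ResourceStreams {

    /** Opens a resource from the filesystem if present, otherwise from the classpath. */
    static InputStream open(String location) throws IOException {
        Path file = Paths.get(location);
        if (Files.isRegularFile(file)) {
            return Files.newInputStream(file);
        }
        // Classpath lookup works the same for class directories and for jars.
        String name = location.startsWith("/") ? location.substring(1) : location;
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        InputStream in = context != null ? context.getResourceAsStream(name) : null;
        if (in == null) {
            in = ResourceStreams.class.getClassLoader().getResourceAsStream(name);
        }
        if (in == null) {
            throw new FileNotFoundException("Not found on disk or classpath: " + location);
        }
        return in;
    }

    /** Reads the whole stream into memory and closes it; fine for small resources. */
    static byte[] readAll(InputStream in) throws IOException {
        try (InputStream closing = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = closing.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

Code that only accepts a File (some third-party loaders do) would still need the resource copied to a temporary file first; the sketch does not cover that case.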


Re: suggestion for default pipelines

2014-04-28 Thread Chen, Pei
Yes. I was thinking of the use case where, for example, the ytex component needs 
SentenceDetectorA but the dictionary lookup component expects SentenceDetectorB. 
It's probably not too common, but something to consider with the cool 
dynamic/plug-and-play pipelines idea. 

Sent from my iPhone

 On Apr 28, 2014, at 5:46 AM, Richard Eckart de Castilho r...@apache.org 
 wrote:
 
 At the time a factory method becomes callable, the Maven/Ivy-magic should 
 already have taken place, no?
 
 -- Richard
 
 On 27.04.2014, at 17:52, Chen, Pei pei.c...@childrens.harvard.edu wrote:
 
 My vote would be for the latter. Have the Factory create pipelines 
 instead. It could just be a naming thing though...
 
 +1 for building dynamic pipelines. I think this idea has been thrown around 
 for some time, but it hasn't really been worked on, so it would be cool to see 
 it in action. I think the tricky part is handling pipeline dependencies, i.e. 
 a similar concept to Maven/Ivy. 
 
 Sent from my iPhone
 
 On Apr 24, 2014, at 5:48 PM, Miller, Timothy 
 timothy.mil...@childrens.harvard.edu wrote:
 
 Any preference for separate factory classes:
 
 class SentenceDetectorAnnotatorFactory:
 
 static AnalysisEngineDescription getSentenceDetectorAnnotator()
 
 VS
 
 static methods added to primitive annotators:
 
 class SentenceDetector (existing)
 
 static AnalysisEngineDescription getSentenceDetectorAnnotator()
 
 ?
 
 The former can clutter up the class space while the latter extends the
 length of classes, especially if there are multiple versions
 (getUMLSDictionaryAnnotator(), getICD9DictionaryAnnotator(),
 getMeshDictionaryAnnotator(), etc.)
 
 Tim
 
 On 04/16/2014 04:48 AM, Richard Eckart de Castilho wrote:
 It would be nice if uimaFIT provided a Maven plugin to automatically
 generate descriptors for aggregates. Maybe if we come up with a 
 convention for factories, e.g. a class with static methods that do
 not take any parameters and that return descriptors, or methods
 that bear a specific Java annotation, e.g. @AutoGenerateDescriptor)
 it should be possible to implement such a Maven plugin.
 
 Cheers,
 
 -- Richard
 
 On 16.04.2014, at 05:21, Steven Bethard steven.beth...@gmail.com wrote:
 
 +1. And note that once you have a descriptor, you can generate the
 XML, so we should arrange to replace the current XML descriptors with
 ones generated automatically from the uimaFIT code. That should reduce
 some synchronization problems when the Java code was changed but the
 XML descriptor was not.
 
 Steve
 
 On Tue, Apr 15, 2014 at 8:52 AM, Miller, Timothy
 timothy.mil...@childrens.harvard.edu wrote:
 The discussion in the other thread with Abraham Tom gave me an idea I
 wanted to float to the list. We have been using some UIMAFit pipeline
 builders in the temporal project that maybe could be moved into
 clinical-pipeline. For example, look to this file:
 
 http://svn.apache.org/viewvc/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/pipelines/TemporalExtractionPipeline_ImplBase.java?view=markup
 
 with the static methods getPreprocessorAggregateBuilder() and
 getLightweightPreprocessorAggregateBuilder()   [no umls].
 
 So my idea would be to create a class in clinical-pipeline
 (CTakesPipelines) with static methods for some standard pipelines (to
 return AnalysisEngineDescriptions instead of AggregateBuilders?):
 
 getStandardUMLSPipeline()  -- builds pipeline currently in
 AggregatePlaintextUMLSProcessor.xml
 getFullPipeline() -- same as above but with SRL, constituency parsing,
 etc., every component in ctakes
 
 We could then potentially merge our entry points -- I think Abraham's
 experience points out that this is currently confusing, as well as
 probably not implemented optimally. For example, either
 ClinicalPipelineWithUmls or BagOfCUIsGenerator would use that static
 method to run a uimafit-style pipeline. Maybe we can slowly deprecate
 our xml descriptors too unless people feel strongly about keeping those
 around.
 
 Another benefit is that the cTAKES API is then trivial -- if you import
 ctakes into your pom file getting a UIMA pipeline is one UimaFit call:
 
 builder.add(CTAKESPipelines.getStandardUMLSPipeline());
 
 
 I think this would actually be pretty easy to implement, but hoping to
 get some feedback on whether this is a good direction.
 
 Tim
 
 -- 
 Tim Miller
 Instructor
 Boston Children's Hospital and Harvard Medical School
 timothy.mil...@childrens.harvard.edu
 617-919-1223
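
On the pipeline-dependency point raised at the top of this thread: a minimal sketch, assuming each annotator declares by name which other annotators it requires. The class PipelineOrder and the annotator names are hypothetical. It orders the annotators so that each one runs after everything it depends on (a plain topological sort), and fails loudly on cycles or unknown names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: orders annotators so each runs after the ones it requires.
class PipelineOrder {

    /**
     * @param requires maps an annotator name to the names it depends on;
     *                 every dependency must itself be a key of the map
     * @return the names in an order that satisfies all dependencies
     */
    static List<String> resolve(Map<String, List<String>> requires) {
        Map<String, Integer> pending = new LinkedHashMap<>();      // unmet dependency counts
        Map<String, List<String>> dependents = new LinkedHashMap<>(); // reverse edges
        for (String name : requires.keySet()) {
            pending.put(name, 0);
            dependents.put(name, new ArrayList<>());
        }
        for (Map.Entry<String, List<String>> entry : requires.entrySet()) {
            for (String dep : entry.getValue()) {
                if (!requires.containsKey(dep)) {
                    throw new IllegalArgumentException("Unknown dependency: " + dep);
                }
                dependents.get(dep).add(entry.getKey());
                pending.merge(entry.getKey(), 1, Integer::sum);
            }
        }
        // Start with annotators that need nothing, then release their dependents.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : pending.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String next = ready.poll();
            order.add(next);
            for (String dependent : dependents.get(next)) {
                if (pending.merge(dependent, -1, Integer::sum) == 0) {
                    ready.add(dependent);
                }
            }
        }
        if (order.size() != requires.size()) {
            throw new IllegalStateException("Cyclic dependency among annotators");
        }
        return order;
    }
}
```

This only handles ordering. The conflict case mentioned above, where two components expect different sentence detectors, would need an extra rule (for example, rejecting pipelines that pull in two providers of the same annotation type).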
 


Re: suggestion for default pipelines

2014-04-27 Thread Chen, Pei
My vote would be for the latter. Have the Factory create pipelines instead. 
It could just be a naming thing though...

+1 for building dynamic pipelines. I think this idea has been thrown around for 
some time, but it hasn't really been worked on, so it would be cool to see it in 
action. I think the tricky part is handling pipeline dependencies, i.e. a similar 
concept to Maven/Ivy. 

Sent from my iPhone

 On Apr 24, 2014, at 5:48 PM, Miller, Timothy 
 timothy.mil...@childrens.harvard.edu wrote:
 
 Any preference for separate factory classes:
 
 class SentenceDetectorAnnotatorFactory:
 
 static AnalysisEngineDescription getSentenceDetectorAnnotator()
 
 VS
 
 static methods added to primitive annotators:
 
 class SentenceDetector (existing)
 
 static AnalysisEngineDescription getSentenceDetectorAnnotator()
 
 ?
 
 The former can clutter up the class space while the latter extends the
 length of classes, especially if there are multiple versions
 (getUMLSDictionaryAnnotator(), getICD9DictionaryAnnotator(),
 getMeshDictionaryAnnotator(), etc.)
 
 Tim
 
 On 04/16/2014 04:48 AM, Richard Eckart de Castilho wrote:
 It would be nice if uimaFIT provided a Maven plugin to automatically
 generate descriptors for aggregates. Maybe if we come up with a 
 convention for factories, e.g. a class with static methods that do
 not take any parameters and that return descriptors, or methods
 that bear a specific Java annotation, e.g. @AutoGenerateDescriptor)
 it should be possible to implement such a Maven plugin.
 
 Cheers,
 
 -- Richard
 
 On 16.04.2014, at 05:21, Steven Bethard steven.beth...@gmail.com wrote:
 
 +1. And note that once you have a descriptor, you can generate the
 XML, so we should arrange to replace the current XML descriptors with
 ones generated automatically from the uimaFIT code. That should reduce
 some synchronization problems when the Java code was changed but the
 XML descriptor was not.
 
 Steve
 
 On Tue, Apr 15, 2014 at 8:52 AM, Miller, Timothy
 timothy.mil...@childrens.harvard.edu wrote:
 The discussion in the other thread with Abraham Tom gave me an idea I
 wanted to float to the list. We have been using some UIMAFit pipeline
 builders in the temporal project that maybe could be moved into
 clinical-pipeline. For example, look to this file:
 
 http://svn.apache.org/viewvc/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/pipelines/TemporalExtractionPipeline_ImplBase.java?view=markup
 
 with the static methods getPreprocessorAggregateBuilder() and
 getLightweightPreprocessorAggregateBuilder()   [no umls].
 
 So my idea would be to create a class in clinical-pipeline
 (CTakesPipelines) with static methods for some standard pipelines (to
 return AnalysisEngineDescriptions instead of AggregateBuilders?):
 
 getStandardUMLSPipeline()  -- builds pipeline currently in
 AggregatePlaintextUMLSProcessor.xml
 getFullPipeline() -- same as above but with SRL, constituency parsing,
 etc., every component in ctakes
 
 We could then potentially merge our entry points -- I think Abraham's
 experience points out that this is currently confusing, as well as
 probably not implemented optimally. For example, either
 ClinicalPipelineWithUmls or BagOfCUIsGenerator would use that static
 method to run a uimafit-style pipeline. Maybe we can slowly deprecate
 our xml descriptors too unless people feel strongly about keeping those
 around.
 
 Another benefit is that the cTAKES API is then trivial -- if you import
 ctakes into your pom file getting a UIMA pipeline is one UimaFit call:
 
 builder.add(CTAKESPipelines.getStandardUMLSPipeline());
 
 
 I think this would actually be pretty easy to implement, but hoping to
 get some feedback on whether this is a good direction.
 
 Tim
 
 -- 
 Tim Miller
 Instructor
 Boston Children's Hospital and Harvard Medical School
 timothy.mil...@childrens.harvard.edu
 617-919-1223
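
The factory convention floated in this thread (static methods that take no parameters, return a descriptor, and optionally carry an annotation) can be checked with plain reflection. A minimal sketch follows; everything in it is hypothetical. The annotation is named after Richard's @AutoGenerateDescriptor suggestion, and FakeDescription stands in for AnalysisEngineDescription so the example compiles without UIMA on the classpath.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical marker annotation, named after the suggestion in the thread.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface AutoGenerateDescriptor {}

// Stand-in for a descriptor type, so the sketch compiles without UIMA.
class FakeDescription {
    final String name;
    FakeDescription(String name) { this.name = name; }
}

// Example factory following the convention: static, no parameters, returns a descriptor.
class ExamplePipelines {
    @AutoGenerateDescriptor
    static FakeDescription getStandardPipeline() { return new FakeDescription("standard"); }

    @AutoGenerateDescriptor
    static FakeDescription getFullPipeline() { return new FakeDescription("full"); }

    // Not picked up: it takes a parameter and carries no annotation.
    static FakeDescription getCustomPipeline(String flavor) { return new FakeDescription(flavor); }
}

class FactoryScanner {
    /** Returns the sorted names of annotated static no-arg methods that return FakeDescription. */
    static List<String> findFactoryMethods(Class<?> factory) {
        List<String> names = new ArrayList<>();
        for (Method m : factory.getDeclaredMethods()) {
            if (Modifier.isStatic(m.getModifiers())
                    && m.getParameterCount() == 0
                    && m.getReturnType() == FakeDescription.class
                    && m.isAnnotationPresent(AutoGenerateDescriptor.class)) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

A build-time tool could invoke each method found this way and serialize the returned descriptor to XML, which is the synchronization benefit Steve describes. How that would hook into Maven is left open here.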
 


RE: ctakes-vm.apache.org

2014-04-16 Thread Chen, Pei
Hi Andy,
Let me know if you're able to ssh -l -v to and...@ctakes-vm.apache.org
I believe all you would need to do is run ssh-keygen and then copy your public 
key to id.apache.org.

James: 
opiekey is only required for sudo access.  I see you're able to log in 
successfully already.
Nevertheless, Jan reset opiekey so you can try again if needed.

https://issues.apache.org/jira/browse/INFRA-7451

--Pei

 -Original Message-
 From: andy mcmurry [mailto:mcmurry.a...@gmail.com]
 Sent: Monday, April 07, 2014 5:43 PM
 To: dev@ctakes.apache.org
 Subject: Re: ctakes-vm.apache.org
 
 On a mac here, also having trouble logging in to the VM.
 Looking more into the keys situation.
 
 On mac, there is this problem: no support for PKCS#11
 Which I'm working to resolve.
 
 
 
 
 On Sat, Apr 5, 2014 at 12:15 PM, John Green
 john.travis.gr...@gmail.com wrote:
 
  Thanks Pei.
 
 
 
 
  Sure, that would be great, add me.
 
 
 
 
  Jg
 
  --
  Sent from Mailbox for iPhone
 
  On Fri, Apr 4, 2014 at 10:15 AM, Chen, Pei
  pei.c...@childrens.harvard.edu
  
  wrote:
 
   John,
   You should have committer rights now... I would suggest opening a Jira
   item just so that it can be tracked.
   But you should be able to create a subdir within
   https://svn.apache.org/repos/asf/ctakes/sandbox and do an svn commit.
   As a side note: ctakes-vm.apache.org has been created now.  John, let
   me know if you would like to be added to the list of maintainers.
   We can use that machine to host any of the demos.
   It requires passwordless ssh, so you'll need to run ssh-keygen and save
   the public key via http://id.apache.org.
   --Pei
   -Original Message-
   From: John Green [mailto:john.travis.gr...@gmail.com]
   Sent: Thursday, April 03, 2014 6:05 PM
   To: dev@ctakes.apache.org
   Cc: dev@ctakes.apache.org
   Subject: RE: ctakes-vm.apache.org
  
   Would love to! Ive only submitted those example notes I did though
   to a
  jira
   ticket. How do I push to the sandbox dir? Any special permissions I
  need?
  
  
  
  
   JG
  
   --
   Sent from Mailbox for iPhone
  
   On Wed, Apr 2, 2014 at 10:51 PM, Chen, Pei
   pei.c...@childrens.harvard.edu
   wrote:
  
John,
If there are no other objections, you can also put it directly in
sandbox https://svn.apache.org/repos/asf/ctakes/sandbox/
It may make it easier in the future if folks decided to integrate
into
   cTAKES... and possibly save any potential IP/License questions...
--Pei

From: John Green [john.travis.gr...@gmail.com]
Sent: Wednesday, April 02, 2014 6:24 PM
To: dev@ctakes.apache.org
Subject: Re: ctakes-vm.apache.org Great!
Let me clean it up this weekend and I'll throw it out onto my GitHub.
Will post a link soon; NLT COB this weekend.
JG
--
Sent from Mailbox for iPhone
On Wed, Apr 2, 2014 at 1:53 PM, andy mcmurry
   mcmurry.a...@gmail.com
wrote:
Yes! Impeccable timing. Where can we find the python source?
On Apr 2, 2014 8:33 AM, John Green
john.travis.gr...@gmail.com
   wrote:
Andy: this is very interesting and exciting.
   
   
   
   
I hacked out a script that makes a visually appealing
representation of the aggregate pipeline in d3js that, at least
for a clinician, is a nice overall summary of the metadata generated
from the pipeline.
It's really no more than a parser of the XML through the type
system, spit out into JSON, but when I was talking to my
informatics department, who didn't know much at all about cTAKES,
it was a great visual summary. It's in Python. I don't know if
you'd want it, but it might be worth having the demo site spit
out a visually appealing graphic like this automatically. If
not in Python, it might be worth adapting it to whatever you're
using for a platform to spit out the JSON for the d3js graphic
I'm using.
   
   
   
   
John
   
--
Sent from Mailbox for iPhone
   
On Thu, Mar 20, 2014 at 5:31 AM, andy mcmurry
mcmurry.a...@gmail.com
wrote:
   
 Yes! I have been working full time on the apt-get install
 task specific to medical genetics:
 http://www.ncbi.nlm.nih.gov/medgen
 Right now, millions of $$$ are invested in getting phenotype
 concepts -- indications, diseases, problem lists -- linked to
 patient test results including DNA / RNA / etc. In industry,
 most of the curation work is done manually because platforms
 like cTAKES are not yet immediately accessible.
 I have written code to
 A) start automating the installer tasks for cTAKES on Ubuntu 13
 B) install UMLS NLP tools metamap, semrep, semmed
 C) mirror NLM content that extends UMLS annotation
 *SO THAT:*
 Mentions of disease relationships -- SNOMED-CT, HPO, OMIM,
 GTR, UMLS -- reference the same semantic relationships in
 UMLS Clinical Terms and Genetic Test Reference. This is
 powerful and all credit to the NLM

RE: errors when run BagOfCUIsGenerator.java

2014-04-16 Thread Chen, Pei
Ying,
Are you behind a proxy or firewall?
If you're trying to use the umls resources, it attempts to make a call to their 
umls service to validate your credentials.
--Pei
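
For anyone hitting the same timeout behind a proxy: a minimal sketch, assuming the JVM needs to be pointed at an HTTP(S) proxy before the pipeline is initialized. The helper class ProxySetup and the host and port values are placeholders; the four property names are the standard JVM networking properties read by HttpURLConnection and HttpsURLConnection.

```java
// Hypothetical helper: the proxy host and port passed in are placeholders.
class ProxySetup {

    /** Sets the standard JVM proxy properties for both http and https connections. */
    static void useProxy(String host, int port) {
        String portText = Integer.toString(port);
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", portText);
        System.setProperty("https.proxyHost", host);
        System.setProperty("https.proxyPort", portText);
    }

    public static void main(String[] args) {
        // Call this before creating the analysis engine that validates UMLS credentials.
        useProxy("proxy.example.org", 8080);
        System.out.println("https proxy: " + System.getProperty("https.proxyHost")
                + ":" + System.getProperty("https.proxyPort"));
    }
}
```

The same effect is available without code changes by passing -Dhttps.proxyHost=... and -Dhttps.proxyPort=... on the java command line.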

 -Original Message-
 From: Liu, Ying [mailto:l...@advisory.com]
 Sent: Wednesday, April 16, 2014 1:13 PM
 To: dev@ctakes.apache.org
 Subject: errors when run BagOfCUIsGenerator.java
 
 It failed when I ran BagOfCUIsGenerator.java. The following is the error
 information. Thanks for your help.
 Ying
 
 
 
 Exception in thread "main"
 org.apache.uima.resource.ResourceInitializationException: Initialization of
 annotator class org.apache.ctakes.dictionary.lookup.ae.UmlsDictionaryLookupAnnotator
 failed.  (Descriptor:
 file:/C:/Users/Ying/workspacectakes/ctakes/ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml)
 at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:252)
 at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:156)
 at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
 at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
 at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
 at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387)
 at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254)
 at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431)
 at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375)
 at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185)
 at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
 at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
 at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
 at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:354)
 at org.uimafit.factory.AnalysisEngineFactory.createAnalysisEngineFromPath(AnalysisEngineFactory.java:147)
 at org.apache.ctakes.clinicalpipeline.runtime.BagOfAnnotationsGenerator.<init>(BagOfAnnotationsGenerator.java:42)
 at org.apache.ctakes.clinicalpipeline.runtime.BagOfAnnotationsGenerator.<init>(BagOfAnnotationsGenerator.java:36)
 at org.apache.ctakes.clinicalpipeline.runtime.BagOfCUIsGenerator.<init>(BagOfCUIsGenerator.java:16)
 at org.apache.ctakes.clinicalpipeline.runtime.BagOfCUIsGenerator.main(BagOfCUIsGenerator.java:49)
 Caused by: org.apache.uima.resource.ResourceInitializationException
 at org.apache.ctakes.dictionary.lookup.ae.UmlsDictionaryLookupAnnotator.initialize(UmlsDictionaryLookupAnnotator.java:79)
 at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:250)
 ... 18 more
 Caused by: java.net.ConnectException: Connection timed out: connect
 at java.net.DualStackPlainSocketImpl.connect0(Native Method)
 at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
 at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
 at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
 at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
 at java.net.PlainSocketImpl.connect(Unknown Source)
 at java.net.SocksSocketImpl.connect(Unknown Source)
 at java.net.Socket.connect(Unknown Source)
 at sun.security.ssl.SSLSocketImpl.connect(Unknown Source)
 at sun.security.ssl.BaseSSLSocketImpl.connect(Unknown Source)
 at sun.net.NetworkClient.doConnect(Unknown Source)
 at sun.net.www.http.HttpClient.openServer(Unknown Source)
 at sun.net.www.http.HttpClient.openServer(Unknown Source)
 at sun.net.www.protocol.https.HttpsClient.<init>(Unknown Source)
 at sun.net.www.protocol.https.HttpsClient.New(Unknown Source)
 at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(Unknown Source)
 at sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source)
 at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(Unknown Source)
 at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(Unknown Source)
 at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(Unknown Source)
 at org.apache.ctakes.dictionary.lookup.ae.UmlsDictionaryLookupAnnotator.isValidUMLSUser(UmlsDictionaryLookupAnnotator.java:93)
 at
 

RE: getSeverity etc. for relation extractor

2014-03-19 Thread Chen, Pei
If I remember correctly, I think those attributes were set in 
IdentifiedAnnotation via:
ctakes-template-filler/desc/analysis_engine/TemplateFillerAnnotator.xml
One can look at the logic in:
org.apache.ctakes.template.filler.ae.TemplateFillerAnnotator [1]

Have you tried adding that to the pipeline?

[1] 
http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-template-filler/src/main/java/org/apache/ctakes/template/filler/ae/TemplateFillerAnnotator.java

--Pei

 -Original Message-
 From: Chase Master [mailto:chasemast...@gmail.com]
 Sent: Wednesday, March 19, 2014 1:56 PM
 To: dev@ctakes.apache.org
 Subject: getSeverity etc. for relation extractor
 
 Hi,
 
 I am trying to output the relations associated with DiseaseDisorderMentions
 and other types.  But I want to start by iterating over
 DiseaseDisorderMention, not BinaryTextRelations since I want to be sure to
 find them all, even if they have no associated relation.
 
 I always get null when using any of the getters like getSeverity().  I am
 using the example text "He had a slight fracture in the proximal right
 fibula."
 When I iterate over BinaryTextRelations, I see the following valid values:
 BinaryTextRelation slightFracture = iterator.next();
 slightFracture.getArg1().getArgument().getCoveredText() is "fracture"
 slightFracture.getArg2().getArgument().getCoveredText() is "slight".
 However, for the "fracture" DiseaseDisorderMention, getSeverity() is null.
 If it wasn't, I would then grab
 disease.getSeverity().getArg1().getArgument().getCoveredText(), or for
 Arg2.
 
 Thanks,
 Chase


RE: getSeverity etc. for relation extractor

2014-03-19 Thread Chen, Pei
Chase,
I am not sure of the reasoning behind this, but it might explain why 
Severity is null for your DiseaseDisorderMention example.
Line 319 in TemplateFillerAnnotator.java:

If I'm reading this logic correctly, it will only populate severity for 
SignSymptomMention.  I can't think of a reason not to populate it if it exists in 
the BinaryTextRelations. 
Have you tried adding ddm.setSeverity(degreeOfTextRelation); instead of 
logging the error?

if (eventMention instanceof DiseaseDisorderMention) {
    DiseaseDisorderMention ddm = (DiseaseDisorderMention) eventMention;
    logger.error("Need to implement attr for " + relation + " for DiseaseDisorderMention");
} else if (eventMention instanceof SignSymptomMention) {
    SignSymptomMention ssm = (SignSymptomMention) eventMention;
    ssm.setSeverity(degreeOfTextRelation);

Would you mind opening a Jira and attaching a patch/test if it works for you?
-Pei
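
A minimal sketch of the change suggested above, using stand-in classes so it compiles without cTAKES. The stand-ins only mimic the names in the excerpt; that the real DiseaseDisorderMention has a setSeverity method is an assumption based on the getSeverity() getter Chase describes, and SeverityFiller is a hypothetical name.

```java
// Stand-ins for the cTAKES types named in the excerpt; not the real classes.
class EventMention {}

class DiseaseDisorderMention extends EventMention {
    private Object severity;
    void setSeverity(Object relation) { severity = relation; }
    Object getSeverity() { return severity; }
}

class SignSymptomMention extends EventMention {
    private Object severity;
    void setSeverity(Object relation) { severity = relation; }
    Object getSeverity() { return severity; }
}

// Hypothetical helper showing the dispatch with the extra branch filled in.
class SeverityFiller {

    /**
     * Copies the degree-of relation onto mention types that carry a severity slot.
     * Returns false for mention types that have no such slot.
     */
    static boolean fill(EventMention eventMention, Object degreeOfTextRelation) {
        if (eventMention instanceof DiseaseDisorderMention) {
            // Previously this branch only logged an error.
            ((DiseaseDisorderMention) eventMention).setSeverity(degreeOfTextRelation);
            return true;
        } else if (eventMention instanceof SignSymptomMention) {
            ((SignSymptomMention) eventMention).setSeverity(degreeOfTextRelation);
            return true;
        }
        return false;
    }
}
```

With a branch like this in place, iterating over DiseaseDisorderMentions and calling getSeverity() would return the relation instead of null, provided the template filler actually runs in the pipeline.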

 -Original Message-
 From: Chase Master [mailto:chasemast...@gmail.com]
 Sent: Wednesday, March 19, 2014 4:09 PM
 To: dev@ctakes.apache.org
 Subject: Re: getSeverity etc. for relation extractor
 
 Thanks,
 I tried using the AggregateTemplateFiller.xml from the template-filler
 module, and I specified the relation extractor pipeline that I was using 
 before
 from the relation-extractor project (there is also a different one in the
 template-filler project called
 RelationExtractorAggregateWithoutOrangeBook).  However, I don't see a
 difference, the severity is still null.
 
 Just wondering - is there some reason that the TemplateFiller is not included
 by default?  It seems confusing that there are getters for properties that
 aren't set in general ...even when one runs the default clinical pipeline
 instead of the RelationExtractorAggregate, these getters are there, but there
 are no relations.
 
 
 Thanks
 Chase
 
 
 On Wed, Mar 19, 2014 at 1:04 PM, Chen, Pei
 pei.c...@childrens.harvard.edu wrote:
 
  If I remember correctly, I think those attributes were set in
  IdentifiedAnnotation via:
  ctakes-template-filler/desc/analysis_engine/TemplateFillerAnnotator.xml
  One can look at the logic in:
  org.apache.ctakes.template.filler.ae.TemplateFillerAnnotator [1]
 
  Have you tried adding that to the pipeline?
 
  [1]
  http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-template-filler/src/main/java/org/apache/ctakes/template/filler/ae/TemplateFillerAnnotator.java
 
  --Pei
 
   -Original Message-
   From: Chase Master [mailto:chasemast...@gmail.com]
   Sent: Wednesday, March 19, 2014 1:56 PM
   To: dev@ctakes.apache.org
   Subject: getSeverity etc. for relation extractor
  
   Hi,
  
   I am trying to output the relations associated with DiseaseDisorderMentions
   and other types.  But I want to start by iterating over
   DiseaseDisorderMention, not BinaryTextRelations, since I want to be sure to
   find them all, even if they have no associated relation.
  
   I always get null when using any of the getters like getSeverity().  I am
   using the example text "He had a slight fracture in the proximal right
   fibula."
   When I iterate over BinaryTextRelations, I see the following valid values:
   BinaryTextRelation slightFracture = iterator.next();
   slightFracture.getArg1().getArgument().getCoveredText() is "fracture"
   slightFracture.getArg2().getArgument().getCoveredText() is "slight".
   However, for the "fracture" DiseaseDisorderMention, getSeverity() is null.
   If it wasn't, I would then grab
   disease.getSeverity().getArg1().getArgument().getCoveredText(), or
   for Arg2.
  
   Thanks,
   Chase
 


[DRAFT] [REPORT] Apache cTAKES Mar 2014

2014-03-10 Thread Chen, Pei
Feel Free to add/edit (due 3/12/14)



Apache cTAKES (clinical Text Analysis and Knowledge Extraction System) is a 
natural language processing (NLP) tool for information extraction from 
electronic medical record clinical free-text.



Issues:

There are no issues requiring board attention at this time.



Releases:

- ctakes-3.0.0-incubating on 2013-02-22

- ctakes-3.1.0 on 2013-08-30

- ctakes-3.1.1 on 2013-12-05



Development:

The committee is actively working on and planning for the next release. Some of 
the planned code changes for the upcoming release include:

- YTEX (Yale Extensions for Apache cTAKES) has been committed to sandbox.

Key features include storing annotations into a relational db, exporting 
annotation to data mining toolkits (WEKA, R, Matlab, etc.).

- New faster dictionary lookup component has been committed to sandbox

- New temporal relations component in progress

- Various bug fixes and code enhancements tracked by Jira



Community:

Last Committers/PMC:

Murali Nagendranath (2013-10-21)

Vijay Garla (2013-11-16)

dev mailing list subscribers count: 103 (+9 since last report)

user mailing list subscribers count: 91 (+10 since last report)



ctakes-pad-term-spotter component?

2014-02-18 Thread Chen, Pei
Hi,
Is anyone still using the pad-term-spotter component?
Deprecating this module if it's no longer used will simplify the codebase and 
reduce the effort in support...

--Pei



RE: YTEX LVG Fix

2014-02-14 Thread Chen, Pei
 Don't know how we'd find out if anyone is still using it.
I think we can start with the dev@ and user@ mailing lists to see if they're 
still using PAD Term spotter.  
And let them know the plans of removing it in the future major release.

I can volunteer some time for that...


 
 +1 for upgrading resources.
 
 -Original Message-
 From: Pei Chen [mailto:chen...@apache.org]
 Sent: Wednesday, February 12, 2014 10:38 AM
 To: dev@ctakes.apache.org
 Subject: Re: YTEX LVG Fix
 
 +1 for upgrading the LVG resources...
 Not sure how many people are still using the pad term spotter module-
 perhaps we could think about deprecating it if no one is using it??
 --Pei
 
 
 On Wed, Feb 12, 2014 at 11:24 AM, John David Osborne (Campus) 
 ozb...@uab.edu wrote:
 
  Thanks Vijay.
 
  I'm going to do the rebuild, but one thing I noticed is that YTEX is
  using the 2008 version of LVG and the old PAD term spotter looks like
  it is using the one from 2004!
 
  I guess they can both use different versions (each one living in its
  own directory), but given how dated that LVG is (and the number of
  improvements since), maybe we should upgrade and/or share LVG? What do
  others think?
 
  I'm likely going to do the rebuild of the ytex branch with a newer
  version of LVG and see how it goes. I'm not sure how the PAD term
  spotter would like that though - it looks like it is using MySQL instead of
  HSQL.
 
  --
  John David Osborne
 
  Research Associate
  University of Alabama at Birmingham
  Biomedical Informatics
  Center for Clinical and Translational Science
  1720 7th Avenue South
  Sparks Building, Suite 175
  Birmingham, AL, 35294
 
 
 
 
 
  On 2/11/14 8:19 PM, vijay garla vnga...@gmail.com wrote:
 
  Hi John,
  
  Thanks for this.  I've updated the YTEXPipeline, fixed the lvg paths
  in SetupAUIFirstWord.
  
  If you want to re-run SetupAUIFirstWord (not necessary unless you are
  using the stemmed words for dictionary lookup), just svn update,
  rebuild ctakes-ytex-uima, and copy the jar to the lib dir of your
  ctakes install.
  
  Best,
  
  VJ
  
  
  On Mon, Feb 10, 2014 at 6:25 PM, John David Osborne (Campus)
  ozb...@uab.edu
   wrote:
  
   These were the changes I made to get the YTEX pipeline working
   with LVG (2008). It looks like there were just a couple of spots
   with some old hard-coded paths in SetupAUIFirstWord.java that were
   appropriate to the old ytex directory structure.
  
   For now I have just swapped them out to fit with the new directory
   structure, but I suppose the correct fix may be to extract them out
   somewhere...  In any case I don't have write privileges, so
   someone else may want to fix this (Vijay?)
  
   I also included the YTEXPipeline.xml descriptor file I fixed as
   well in case anybody needs it.
  
-John
  
  
  
 
 


RE: cTakes-247

2014-02-07 Thread Chen, Pei
+1 Maybe even take a stab at filling in the descriptions...
--Pei

 -Original Message-
 From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
 Sent: Friday, February 07, 2014 10:12 AM
 To: 'dev@ctakes.apache.org'
 Subject: RE: cTakes-247
 
 
 I think spending a fairly small amount of time would be worthwhile, but not
 worrying about getting every last thing.
 
 I completely agree that you are in a good position to comment.
 
 -- James
 
 -Original Message-
 From: John Green [mailto:john.travis.gr...@gmail.com]
 Sent: Friday, February 07, 2014 6:11 AM
 To: dev@ctakes.apache.org
 Subject: Re: cTakes-247
 
 Pei - Yes, yes it seems it does. I saw that when exploring the ticket. Is the
 TypeSystem.xml complete? I think it is ... Also, is this something that anyone
 thinks is worth while? I suppose I may be in a better position to comment,
 learning my way through cTakes now and this being geared toward new folks
 such as myself. But, opinions before diving would be appreciated, time
 always being limited.
 
 JG
 
 
 On Thu, Feb 6, 2014 at 9:51 AM, Chen, Pei
 pei.c...@childrens.harvard.edu wrote:
 
  John,
  As a starting point, you may want to check out:
 
  http://svn.apache.org/repos/asf/ctakes/tags/ctakes-3.1.1/ctakes-type-system/src/main/resources/org/apache/ctakes/typesystem/types/
  The content (Descriptions) probably needs to be filled in more...
  --Pei
 
   -Original Message-
   From: John Green [mailto:john.travis.gr...@gmail.com]
   Sent: Thursday, February 06, 2014 8:26 AM
   To: dev@ctakes.apache.org
   Subject: cTakes-247
  
   Anyone working on Jira item cTakes-247? If not, I was gonna tackle it.
  And if
   no one is, is everyone OK with a python script that auto-runs the
   XSLT transformations with some pretty css/javascript?
  
   JG
 


RE: Brat

2014-02-07 Thread Chen, Pei
I think some of the OpenNLP folks did some work with the Brat annotation tool, 
but I don't think anyone has worked on it with cTAKES. I would be curious to 
hear your analysis though...


 -Original Message-
 From: John Green [mailto:john.travis.gr...@gmail.com]
 Sent: Friday, February 07, 2014 6:01 AM
 To: dev@ctakes.apache.org
 Subject: Brat
 
 I've done a cursory search and come up short: has anyone written anything
 to convert the annotations from the pipeline to the Brat *.ann format?
 
 Thanks,
 JG
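
For reference, the text-bound lines in a Brat .ann file have the form ID, TAB, type, space, start offset, space, end offset, TAB, covered text, with the end offset exclusive. A minimal sketch of writing them, assuming annotations have already been pulled out of the CAS into simple (type, begin, end) records; BratSpan and BratWriter are hypothetical names, not cTAKES or Brat classes.

```java
import java.util.List;

// Hypothetical record of one annotation: type name plus character offsets (end exclusive).
class BratSpan {
    final String type;
    final int begin;
    final int end;

    BratSpan(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }
}

// Hypothetical writer for Brat text-bound annotation lines.
class BratWriter {

    /** Renders one "T" line per span, numbering the IDs T1, T2, ... in list order. */
    static String toAnn(String documentText, List<BratSpan> spans) {
        StringBuilder sb = new StringBuilder();
        int id = 1;
        for (BratSpan span : spans) {
            // Simplification: a span that crosses a newline is flattened onto one line.
            String covered = documentText.substring(span.begin, span.end).replace('\n', ' ');
            sb.append('T').append(id++).append('\t')
              .append(span.type).append(' ')
              .append(span.begin).append(' ')
              .append(span.end).append('\t')
              .append(covered).append('\n');
        }
        return sb.toString();
    }
}
```

Relations and attributes have their own line types in the same file and are not covered by this sketch.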


RE: cTakes-247

2014-02-06 Thread Chen, Pei
John,
As a starting point, you may want to check out:
http://svn.apache.org/repos/asf/ctakes/tags/ctakes-3.1.1/ctakes-type-system/src/main/resources/org/apache/ctakes/typesystem/types/
The content (Descriptions) probably needs to be filled in more...
--Pei

 -Original Message-
 From: John Green [mailto:john.travis.gr...@gmail.com]
 Sent: Thursday, February 06, 2014 8:26 AM
 To: dev@ctakes.apache.org
 Subject: cTakes-247
 
 Anyone working on Jira item cTakes-247? If not, I was gonna tackle it. And if
 no one is, is everyone OK with a python script that auto-runs the XSLT
 transformations with some pretty css/javascript?
 
 JG
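
The XSLT step could also be run from Java rather than a separate script. A minimal sketch, assuming the goal is a plain "name: description" listing from a type system descriptor; the class TypeSystemSummary and the stylesheet are hypothetical, and the stylesheet matches elements by local name so it does not depend on the descriptor's namespace.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: lists type names and descriptions from a type system XML.
class TypeSystemSummary {

    // Prints "name: description" for each typeDescription element.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select=\"//*[local-name()='typeDescription']\">"
        + "<xsl:value-of select=\"*[local-name()='name']\"/>"
        + "<xsl:text>: </xsl:text>"
        + "<xsl:value-of select=\"normalize-space(*[local-name()='description'])\"/>"
        + "<xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given descriptor XML and returns the text output. */
    static String summarize(String typeSystemXml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(typeSystemXml)),
                new StreamResult(out));
        return out.toString();
    }
}
```

Types with an empty description show up with nothing after the colon, which makes it easy to spot the ones that still need filling in.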


RE: sentence detector newline behavior

2014-01-29 Thread Chen, Pei
+1
There's an example of the configs here :)
https://issues.apache.org/jira/browse/CTAKES-98

I think we should be able to use OpenNLP's Sentence Annotator directly if we no 
longer need the custom newline rule(s) 
[Or if we find that a fixed rule is still required, perhaps OpenNLP can support 
it via config as well- there doesn't seem to be anything cTAKES specific about 
it].
Pending the results of Tim's retraining/evaluation of the new models??

--Pei
 -Original Message-
 From: Jörn Kottmann [mailto:kottm...@gmail.com]
 Sent: Wednesday, January 29, 2014 3:55 PM
 To: dev@ctakes.apache.org
 Subject: Re: sentence detector newline behavior
 
 On 01/27/2014 08:44 PM, Tim Miller wrote:
 
  That is a good point, and something I was wondering about. Having now
  looked at both the ctakes and opennlp code for the sentence splitter
  it seems like there is a lot of overlap. I would've thought it was
  just a matter of converting annotations into our type system. So I'm
  curious if there is some justification for why there seems to be
  duplication (or if I'm hallucinating it).
 
 It should be possible (and if not we should make it possible) to directly use
 the opennlp-uima integration. It supports dynamic types which can be
 mapped in the descriptor.
 This would also give you a smooth transition, your existing integration could
 be labeled as deprecated and be removed in one of the future releases.
 
 Jörn


Re: How are cTAKES resources distributed via Maven Central?

2014-01-27 Thread Chen, Pei
The contents of the -res jars/projects have been temporarily commented out by 
the parent POM; it wasn't quite ready for prime time yet. Mainly because some 
of the code still cannot load resources from jars/classpaths (some of it is 
3rd party code that we do not control, so it's not as straightforward), and 
leaving both file and classpath loading in place could lead to conflicts. We 
need to switch that over.

Sent from my iPhone

 On Jan 27, 2014, at 8:23 AM, Richard Eckart de Castilho r...@apache.org 
 wrote:
 
 Hi
 
 I was looking if/how cTAKES distributes resources via Maven Central. I found 
 some, but I am actually quite a bit confused now.
 
 There are the component-res artifacts [1], like ctakes-pos-tagger-res. 
 These have a JAR and a sources-JAR. The JAR is practically empty, but the 
 sources-JAR appears to contain the actual resources. Is there a special 
 reason for this?
 
 Additionally, there is the ctakes-resources-distribution which is distributed 
 as a bin.zip via Maven Central. It appears to contain UMLS data. Has it 
 been replaced by ctakes-resources-umls2011ab in 3.1.1? The 
 ctakes-resources-umls2011ab JAR actually contains data, contrary to the 
 component-res JARs mentioned above. Why is there data in this JAR, but not 
 in the component-res JARs?
 
 /me scratching head…
 
 Please enlighten me :)
 
 Cheers,
 
 -- Richard
 
 [1] http://search.maven.org/#search%7Cga%7C1%7Cctakes%20res
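
On the resource-loading limitation mentioned in the reply above: a jar is a zip
archive, so an entry inside it has no file system path of its own, and code that
insists on a file path cannot open it, while code that reads from a stream can.
The sketch below shows that idea in Python with the standard zipfile module. It
is an analogy only, not cTAKES code; the entry name and the helper names are
made up for the example.

```python
import io
import zipfile


def build_demo_jar() -> bytes:
    """Build a tiny in-memory archive standing in for a -res jar (hypothetical content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as jar:
        jar.writestr("org/example/models/demo.txt", "model-bytes")
    return buffer.getvalue()


def read_resource(jar_bytes: bytes, entry_name: str) -> bytes:
    """Read one entry as a stream; no file on disk is ever involved."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        with jar.open(entry_name) as stream:
            return stream.read()


if __name__ == "__main__":
    print(read_resource(build_demo_jar(), "org/example/models/demo.txt"))
```

A loader written this way works the same whether the resource sits in a
directory or inside an archive, which is the property the file-based code lacks.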


RE: Apache cTAKES confluence wiki spam

2014-01-07 Thread Chen, Pei
Done.  
Anonymous is read-only now.
Thanks for pointing that out- I always thought that was the default.
 -Original Message-
 From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
 Sent: Tuesday, January 07, 2014 10:34 AM
 To: 'dev@ctakes.apache.org'
 Subject: Apache cTAKES confluence wiki spam
 
 We've had a spate of spam in anonymous comments on the Apache cTAKES
 confluence wiki space in the last two days.
 Could a space admin (Pei?) remove the ability for anonymous comments to
 the Apache cTAKES wiki? I don't seem to be a space admin (or I am totally
 overlooking the options for permissions settings)
 
 - James
 P.S. Troy and I deleted the spam comments.


RE: YTEX cTAKES 3.1.1 ready

2014-01-07 Thread Chen, Pei
 * How can I distribute the ctakes binary distribution to ytex users before the
 merge? Can we make the branch build available somewhere?  The binary
 distribution is too large to host on the ytex google code site (max 200 MB)
Is this for testing purposes?  Or official release? If it's just for testing, 
there will be more options...
Are you referring to the convenience binary/zip file?  Or maven artifacts that 
could be deployed to the SNAPSHOTS repo [1]?
If it's for testing, you can always have users build from source via mvn 
package (assuming you added the ytex* to the ctakes-distribution module)?
Again if it's for testing, you can always try the svn or home dir.  But it's 
not the recommended channel for actual distribution to users because that 
normally has to go through the normal release process (Voting, etc.). 

 * Non-ASF libraries - I have segregated these out into their own zip file that
 can be distributed via sourceforge.  As a stopgap, I can upload this to the
 ytex google code site, but would prefer to upload to sourceforge.
Are these optional 3rd party libs available via maven central?

 * UMLS Derivatives - Ditto for these - would like to move to sourceforge.
Are you planning to distribute them via maven central?  I think it would be 
nice to make these available as maven artifacts.
If so, what is your sourceforge id? We can grant you access to the existing 
ctakes resources project [2].
The pom.xml is already setup to upload to OSS Sonatype (request a login for oss 
sonatype to perform a mvn deploy for the actual upload later on)...

 * Documentation - How can I update the confluence docs?  I would migrate
 the documentation from the google code website.
This would be great; You've been added to the cTAKES confluence space [3].

Downloading the code now... To be continued...

[1] https://repository.apache.org/content/groups/snapshots/org/apache/ctakes/
[2] http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/
[3] https://cwiki.apache.org/confluence/display/CTAKES/cTAKES

 -Original Message-
 From: vijay garla [mailto:vnga...@gmail.com]
 Sent: Friday, January 03, 2014 10:23 PM
 To: ytex-us...@googlegroups.com; ctakes-...@incubator.apache.org
 Subject: YTEX cTAKES 3.1.1 ready
 
 Hello All,
 
 I have finished an initial cut at the port of YTEX to cTAKES 3.1.1.  Most of
 the YTEX functionality has been ported and integrated with cTAKES, and I've
 tested with MySQL and MS SQL Server (oracle tests pending).
 
 Most of the changes were made in new projects - very little existing cTAKES
 code has been modified.  The only non-trivial changes are in
 /ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api
 - here I modified CharacterOffsetToLineTokenConverterCtakesImpl and
 SingleDocumentProcessorCtakes to deal with newlines within sentences
 correctly.  Can somebody take a look at the changes in the ytex branch?
 
 I believe that the branch
 https://svn.apache.org/repos/asf/ctakes/branches/ytex is ready to be
 merged into ctakes trunk, but would like other users to test it as well.
  Questions:
 
 * How can I distribute the ctakes binary distribution to ytex users before the
 merge? Can we make the branch build available somewhere?  The binary
 distribution is too large to host on the ytex google code site (max 200 MB)
 * Non-ASF libraries - I have segregated these out into their own zip file that
 can be distributed via sourceforge.  As a stopgap, I can upload this to the
 ytex google code site, but would prefer to upload to sourceforge.
 * UMLS Derivatives - Ditto for these - would like to move to sourceforge.
 * Documentation - How can I update the confluence docs?  I would migrate
 the documentation from the google code website.
 
 Here are the installation instructions (putting the wagon in front of the horse
 ...)
 
 https://code.google.com/p/ytex/wiki/Installation_cTAKES_3_1?ts=1388793998updated=Installation_cTAKES_3_1
 
 Best,
 
 VJ


RE: UMLS Env variables suggestion

2014-01-06 Thread Chen, Pei
Sounds like a good idea; 
we can just update all of the documentation/scripts to use underscore (_), and 
leave the dot (.) in the code to be deprecated for now?
--Pei

 -Original Message-
 From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
 Sent: Saturday, January 04, 2014 10:10 PM
 To: dev@ctakes.apache.org
 Subject: RE: UMLS Env variables suggestion
 
 This went in to 3.1  https://issues.apache.org/jira/browse/CTAKES-164
 
 I agree - the docs need to be updated if there is consensus on the use of this
 method.  Personally I think that there should be one supported method, not
 both dot and underscore.  I would prefer that we remove the dot
 functionality since it is not operational across all environments, but it
 isn't up to me alone to remove functionality.
 
 -Original Message-
 From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu]
 Sent: Saturday, January 04, 2014 4:08 PM
 To: dev@ctakes.apache.org
 Cc: dev@ctakes.apache.org
 Subject: Re: UMLS Env variables suggestion
 
 I believe Sean updated the code to also support underscore (_) as well. But
 the docs just need to be updated...
 
 
  On Jan 4, 2014, at 4:04 PM, Dewful dew...@gmail.com wrote:
 
  In the documentation, in the .sh files to run ctakes:
 
  # If you plan to use the UMLS Resources, set/export env variables
  # export ctakes.umlsuser=[username], ctakes.umlspw=[password]
 
  however, simply trying to
 
  export ctakes.umlsuser=myusername, ctakes.umlspw=mypassword
 
  doesn't work because bash 3 doesn't allow dots in variable names and will
  throw an error
 
  bin/runctakesCVD.sh: line 42: export: `ctakes.umlsuser=username,': not a valid identifier
 
  http://stackoverflow.com/questions/15016403/how-to-export-dot-separated-environment-variables
  explains some solutions
 
  it may be helpful to show how the user can set these easily if they
  want to set the env variables this way, possibly using one of the
  suggestions in SO.
 
  N
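
To make the "support both spellings" idea from this thread concrete, here is a
minimal Python sketch of a lookup that prefers the underscore form and falls
back to the dotted form. It is not the cTAKES implementation (that lives in the
Java code referenced by CTAKES-164); the function name lookup_setting and the
underscore spelling ctakes_umlsuser are assumptions made for the example.

```python
import os
from typing import Mapping, Optional


def lookup_setting(name: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the value for a dotted setting name, trying the underscore form first."""
    env = os.environ if env is None else env
    underscore_name = name.replace(".", "_")
    # The underscore form can be exported from any shell; the dotted form is
    # kept only as a fallback for environments that already set it.
    for candidate in (underscore_name, name):
        value = env.get(candidate)
        if value:
            return value
    return None


if __name__ == "__main__":
    print(lookup_setting("ctakes.umlsuser", {"ctakes_umlsuser": "myusername"}))
```

With a lookup like this the scripts and docs can show only the underscore
names, while setups that already pass the dotted names keep working until the
dotted form is removed.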


Re: UMLS Env variables suggestion

2014-01-04 Thread Chen, Pei
I believe Sean updated the code to also support underscore (_) as well. But the 
docs just need to be updated...


 On Jan 4, 2014, at 4:04 PM, Dewful dew...@gmail.com wrote:
 
 In the documentation, in the .sh files to run ctakes:
 
 # If you plan to use the UMLS Resources, set/export env variables
 # export ctakes.umlsuser=[username], ctakes.umlspw=[password]
 
 however, simply trying to
 
 export ctakes.umlsuser=myusername, ctakes.umlspw=mypassword
 
 doesn't work because bash 3 doesn't allow dots in variable names and will throw
 an error
 
 bin/runctakesCVD.sh: line 42: export: `ctakes.umlsuser=username,': not a valid identifier
 
 http://stackoverflow.com/questions/15016403/how-to-export-dot-separated-environment-variables
 explains some solutions
 
 it may be helpful to show how the user can set these easily if they want to
 set the env variables this way, possibly using one of the suggestions in SO.
 
 N


Re: YTEX cTAKES 3.1.1 ready

2014-01-04 Thread Chen, Pei
This is awesome VJ.  
I'll take a look at it this week unless someone beats me to it

 On Jan 3, 2014, at 10:22 PM, vijay garla vnga...@gmail.com wrote:
 
 Hello All,
 
 I have finished an initial cut at the port of YTEX to cTAKES 3.1.1.  Most
 of the YTEX functionality has been ported and integrated with cTAKES, and
 I've tested with MySQL and MS SQL Server (oracle tests pending).
 
 Most of the changes were made in new projects - very little existing cTAKES
 code has been modified.  The only non-trivial changes are in
 /ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/i2b2/api
 - here I modified CharacterOffsetToLineTokenConverterCtakesImpl and
 SingleDocumentProcessorCtakes to deal with newlines within sentences
 correctly.  Can somebody take a look at the changes in the ytex branch?
 
 I believe that the branch
 https://svn.apache.org/repos/asf/ctakes/branches/ytex is ready to be merged
 into ctakes trunk, but would like other users to test it as well.
 Questions:
 
 * How can I distribute the ctakes binary distribution to ytex users before
 the merge? Can we make the branch build available somewhere?  The binary
 distribution is too large to host on the ytex google code site (max 200 MB)
 * Non-ASF libraries - I have segregated these out into their own zip file
 that can be distributed via sourceforge.  As a stopgap, I can upload this
 to the ytex google code site, but would prefer to upload to sourceforge.
 * UMLS Derivatives - Ditto for these - would like to move to sourceforge.
 * Documentation - How can I update the confluence docs?  I would migrate
 the documentation from the google code website.
 
 Here are the installation instructions (putting the wagon in front of the horse
 ...)
 
 https://code.google.com/p/ytex/wiki/Installation_cTAKES_3_1?ts=1388793998updated=Installation_cTAKES_3_1
 
 Best,
 
 VJ


RE: Output Schema Documentation

2013-12-24 Thread Chen, Pei
I think the type system doc [1] and javadoc [2] are probably the closest things 
I can think of. They are not an XML schema of the UIMA XMI per se, though...
[1] http://svn.apache.org/repos/asf/ctakes/tags/ctakes-3.1.1/ctakes-type-system/src/main/resources/org/apache/ctakes/typesystem/types/TypeSystem.xml
[2] http://ctakes.apache.org/apidocs/3.1.1/

Hope that helps, but if you have suggestions, feel free to post it on this 
list...
--Pei

 -Original Message-
 From: nartz...@gmail.com [mailto:nartz...@gmail.com] On Behalf Of
 Dewful
 Sent: Monday, December 23, 2013 7:11 AM
 To: dev@ctakes.apache.org
 Subject: Output Schema Documentation
 
 Hi All -
 
 I'm wondering if there is a piece of documentation that describes in detail
 the actual output of running the AggregatePlaintextUMLSProcessor?
 
 I can export the annotations as XML, but I'm not quite sure what I'm looking
 at. I would like to know the output of each component without having to
 jump into each javadoc / src code. Does this exist somewhere?
 
 Thanks!
 
 Nick


Re: cTAKES Virtual Machine update

2013-12-23 Thread Chen, Pei
I can ping ASF infra to see if they offer any VMs for demos etc. I will report 
back with what I find out.

Sent from my iPhone

 On Dec 23, 2013, at 10:17 AM, Masanz, James J. masanz.ja...@mayo.edu 
 wrote:
 
 Hi Andy,
 
 It's great to see such enthusiasm!
 
 I like all the ideas. I don't know how we ensure our image doesn't end up out 
 of date, but I think it's worth doing -- and given the size of our community, 
 I hope it will be able to be kept up to date.
 
 I'm not sure where we would host the public demo server, and since keeping 
 the image itself up to date would be needed in order to keep the live 
 public/demo server up to date, my thoughts are to start with 2 and 3 and 
 decide about #1 later.
 
 -- James
 
 
 -Original Message-
 From: dev-return-2339-Masanz.James=mayo@ctakes.apache.org 
 [mailto:dev-return-2339-Masanz.James=mayo@ctakes.apache.org] On Behalf Of 
 Andrew McMurry
 Sent: Friday, December 20, 2013 7:02 PM
 To: dev@ctakes.apache.org
 Subject: cTAKES Virtual Machine update
 
 Eating your own dog food 
 
 I've bundled a NLP appliance containing UMLS and cTAKES and I'm migrating 
 services to the AWS cloud. 
 Throughout this process I have extensively documented the steps and I made a 
 VM target for Ubuntu 13.04. 
 
 I would have used the iDASH appliance but it is no longer maintained and was 
 way out of date. 
 From conversations and experiences with iDASH and NCBO bioportal I strongly 
 support a VM option for cTAKES downloads.  
 
 Suggestions 
 
 #[ 1 ] Public demo server (instance of VM)
 For new users, this is the hook that motivates them to get involved BEFORE 
 licensing and install. 
 http://bioportal.bioontology.org/annotator
 
 #[ 2 ] Downloading the public demo server is power ! 
 http://nlp-ecosystem.ucsd.edu/
 http://bioportal.bioontology.org/virtual_appliance
 
 #[ 3 ] Example code for each cTAKES component is game changing 
 
 cTAKES has an impressive set of components, but no way to try them out. 
 http://ctakes.apache.org/components.html
 
 Whereas you can play with the components of Bioportal, for example 
 http://data.bioontology.org/documentation
 
 I'm sharing these experiences and suggestions because I truly do believe they 
 are crucial and likely more important than any level of new-features that can 
 be added. 
 
 Hoping this message finds you well. 
 I will soon circulate a copy of a VM to play around with and get feedback. 
 
 Thoughts? 
 
 --AndyMC 
 


RE: cTakes: question on updating cue words

2013-12-16 Thread Chen, Pei
[moved to dev@]
Hi Paula,
My suggestion would be to open a Jira item so that it could be tracked:
https://issues.apache.org/jira/browse/CTAKES (Feel free to create a new 
account).
Even cooler if you could attach the affected files with the patch(diffs) and 
any tests.
--Pei


From: digital paula [mailto:cybersat...@hotmail.com]
Sent: Monday, December 16, 2013 1:30 PM
To: u...@ctakes.apache.org
Subject: cTakes: question on updating cue words

Hello again cTAKES Community,

I would like to add additional cue words to polarity (for negation) and 
uncertainty. I would so appreciate it if someone could let me know how to do 
that.

Thanks.

Regards,
Paula


RE: scala and groovy

2013-12-13 Thread Chen, Pei
James,
Would it be possible to also attach your script?  I can try to replicate it 
here...
--Pei

 -Original Message-
 From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
 Sent: Friday, December 13, 2013 11:34 AM
 To: 'dev@ctakes.apache.org'
 Subject: scala and groovy
 
 
 I'm still working on getting the clinical-pipeline
 (AggregatePlaintextUMLSProcessor) to run from groovy, using the
 parser.groovy as a starting point
 
 A side issue to the main point of this post:
 The first  issue is already marked as a TODO in parser.groovy -- about
 downloading models. I am working around that for now by programmatically
 downloading all models needed and the LookupDesc_Db.xml from SVN in
 the same way parser.groovy downloads the sentence detector model,
 because things within the ctakes-*-res jars aren't being found by
 org.apache.ctakes.core.resource.FileLocator#getAsStream
 
 Another issue is some of the jars that are used by the assertion component:
 med-facts-i2b2-1.2-SNAPSHOT.jar
 med-facts-zoner-1.1.jar
 jcarafe-ext_2.9.1-0.9.8.3.RC4.jar
 jcarafe-core_2.9.1-0.9.8.3.RC4.jar
 
 I download those also separately from SVN and add them using statements
 like this:
 this.class.classLoader.rootLoader.addURL( new URL(libLocation + jarName) );
 
 The bigger issue is the following
 I was getting an error about scala, so I added the following to the Grapes
 annotation in my groovy script:
 @Grab(group='org.scala-lang', module='scala-library', version='2.9.0'),
 @Grab(group='org.scala-tools.sbinary', module='sbinary_2.9.0', version='0.4.0'),
 
 Those grapes now appear in my grapes repo. But I am getting the following
 error, and I don't know why ScopeParser cannot see scala.ScalaObject
 which I believe is defined in one of the scala jars that I added to the
 rootLoader as I described above.
 
 scope model: /C:/usr/meTAKES/using-groovy/org/apache/ctakes/assertion/models/scope.model
 Caught: java.lang.NoClassDefFoundError: scala/ScalaObject
 java.lang.NoClassDefFoundError: scala/ScalaObject
 at org.mitre.medfacts.i2b2.annotation.ScopeParser.init(ScopeParser.java:22)
 at org.apache.ctakes.assertion.medfacts.AssertionAnalysisEngine.initialize(AssertionAnalysisEngine.java:121)
 at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:250)
 at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:156)
 at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
 at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
 at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
 at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387)
 at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254)
 at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431)
 at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375)
 at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185)
 at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
 at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
 at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
 at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387)
 at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254)
 at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431)
 at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375)
 at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185)
 at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
 at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
 at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
 at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387)
 at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254)
 at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431)
 at
 

RE: scala and groovy

2013-12-13 Thread Chen, Pei
James,
I wonder why the transitive dependencies didn't resolve that automatically (I 
think scala is in maven central) just like the other jars...
i.e. why do we need to have it manually added to the classpath?

 -Original Message-
 From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
 Sent: Friday, December 13, 2013 2:02 PM
 To: 'dev@ctakes.apache.org'
 Subject: RE: scala and groovy
 
 
 Sort of good news on the scala front -- after I added the complete filename
 for the jars to the rootLoader statements (I had left out the version part of
 the jar names), using scala now works.
 
 jarName = "scala-library-2.9.0.jar";
 this.class.classLoader.rootLoader.addURL( new URL(libLocation + jarName) );
 jarName = "sbinary_2.9.0-0.4.0.jar";
 this.class.classLoader.rootLoader.addURL( new URL(libLocation + jarName) );
 
 (not ready to post a full solution yet - pipeline did not run to completion,
 but at least I'm on to the next issue now)
 
 -- James
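
The fix described above comes down to the Maven naming convention: a jar file is
named from the artifact ID and the version, optionally followed by a classifier.
The short Python sketch below derives the two file names quoted in the message
from the module and version values used in the @Grab annotations. The helper
name jar_file_name is made up for illustration and is not part of Groovy or
cTAKES.

```python
def jar_file_name(module: str, version: str, classifier: str = "") -> str:
    """Build a jar file name following the Maven artifactId-version[-classifier].jar convention."""
    suffix = f"-{classifier}" if classifier else ""
    return f"{module}-{version}{suffix}.jar"


if __name__ == "__main__":
    # Module/version pairs taken from the @Grab lines earlier in the thread.
    for module, version in [("scala-library", "2.9.0"), ("sbinary_2.9.0", "0.4.0")]:
        print(jar_file_name(module, version))
```

Deriving the name from the same module and version values that were grabbed
avoids the mismatch where the version part is left out of a hand-typed name.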
 
 -Original Message-
 From: dev-return-2314-Masanz.James=mayo@ctakes.apache.org
 [mailto:dev-return-2314-Masanz.James=mayo@ctakes.apache.org] On
 Behalf Of Masanz, James J.
 Sent: Friday, December 13, 2013 10:59 AM
 To: 'dev@ctakes.apache.org'
 Subject: RE: scala and groovy
 
 That would be great if you could try to replicate.
 Check out  ctakes/sandbox/groovy/cTAKES-with-resources.groovy
 Download the two zips from ctakes/sandbox/groovy-temp-resources/
 Extract them so that the directory containing  cTAKES-with-resources.groovy
 also now contains a desc subdirectory and an org subdirectory.
 mkdir inputDir
 groovy cTAKES-with-resources.groovy  inputDir
 
 Thanks Pei!
 
 -Original Message-
 From: dev-return-2313-Masanz.James=mayo@ctakes.apache.org
 [mailto:dev-return-2313-Masanz.James=mayo@ctakes.apache.org] On
 Behalf Of Chen, Pei
 Sent: Friday, December 13, 2013 10:48 AM
 To: dev@ctakes.apache.org
 Subject: RE: scala and groovy
 
 James,
 Would it be possible to also attach your script?  I can try to replicate it 
 here...
 --Pei
 
  -Original Message-
  From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
  Sent: Friday, December 13, 2013 11:34 AM
  To: 'dev@ctakes.apache.org'
  Subject: scala and groovy
 
 
  I'm still working on getting the clinical-pipeline
  (AggregatePlaintextUMLSProcessor) to run from groovy, using the
  parser.groovy as a starting point
 
  A side issue to the main point of this post:
  The first  issue is already marked as a TODO in parser.groovy -- about
  downloading models. I am working around that for now by
  programmatically downloading all models needed and the
  LookupDesc_Db.xml from SVN in the same way parser.groovy downloads the
  sentence detector model, because things within the ctakes-*-res jars
  aren't being found by
  org.apache.ctakes.core.resource.FileLocator#getAsStream
 
  Another issue is some of the jars that are used by the assertion component:
  med-facts-i2b2-1.2-SNAPSHOT.jar
  med-facts-zoner-1.1.jar
  jcarafe-ext_2.9.1-0.9.8.3.RC4.jar
  jcarafe-core_2.9.1-0.9.8.3.RC4.jar
 
  I download those also separately from SVN and add them using
  statements like this:
  this.class.classLoader.rootLoader.addURL( new URL(libLocation + jarName) );
 
  The bigger issue is the following
  I was getting an error about scala, so I added the following to the
  Grapes annotation in my groovy script:
  @Grab(group='org.scala-lang', module='scala-library', version='2.9.0'),
  @Grab(group='org.scala-tools.sbinary', module='sbinary_2.9.0', version='0.4.0'),
 
  Those grapes now appear in my grapes repo. But I am getting the
  following error, and I don't know why ScopeParser cannot see
  scala.ScalaObject which I believe is defined in one of the scala jars
  that I added to the rootLoader as I described above.
 
  scope model: /C:/usr/meTAKES/using-groovy/org/apache/ctakes/assertion/models/scope.model
  Caught: java.lang.NoClassDefFoundError: scala/ScalaObject
  java.lang.NoClassDefFoundError: scala/ScalaObject
  at org.mitre.medfacts.i2b2.annotation.ScopeParser.init(ScopeParser.java:22)
  at org.apache.ctakes.assertion.medfacts.AssertionAnalysisEngine.initialize(AssertionAnalysisEngine.java:121)
  at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:250)
  at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:156)
  at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
  at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
  at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
  at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387

RE: scala and groovy

2013-12-13 Thread Chen, Pei
Also, 
My 2 cents on dynamic downloading-
If the use case is for developers to have a simple script to start coding on 
top of cTAKES without having to learn about UIMA etc., then dynamic downloads 
are probably okay. 

But if the goal is to make a stand-alone product w/o any dynamic downloads or 
internet dependencies, then it probably makes sense to use groovy as a 
'super-build' script to generate a single working app?
If this is an enterprise environment, then you probably would not want to have 
a dependency on dynamic internet downloads anyway- Which is why a lot of 
institutions have their own internal repos and such...

--Pei

 -Original Message-
 From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
 Sent: Friday, December 13, 2013 4:47 PM
 To: 'dev@ctakes.apache.org'
 Subject: RE: scala and groovy
 
 
 Thanks Richard for doing all that testing.
 
 But the idea that we cannot easily get to what is causing the issue, together
 with the fact Tim was able to reproduce one of my issues [1], leads me to
 question using dynamic downloading of anything for our users.
 
 I would prefer to see a single download that a user extracts from, which I see
 having the following advantages
  - no mysterious suspected network issues
  - user can be told how much space will be taken up
  - user has easy control where things will be put (rather than having to
 configure where grapes will be stored, if user does not want them under
 their home directory)
 
 That's my 2 cents.
 
 Yes, I am behind a firewall. And in fact I am VPN'd in to work. But I suspect
 some of our users do that too.
 
 [1] http://markmail.org/message/lgo7eyruotl7nnix
 
 -- James
 
 
 -Original Message-
 From: dev-return-2322-Masanz.James=mayo@ctakes.apache.org
 [mailto:dev-return-2322-Masanz.James=mayo@ctakes.apache.org] On
 Behalf Of Richard Eckart de Castilho
 Sent: Friday, December 13, 2013 3:36 PM
 To: dev@ctakes.apache.org
 Subject: Re: scala and groovy
 
 Hi James,
 
 I enabled info on the grape resolving using
 
   export JAVA_OPTS="-Dgroovy.grape.report.downloads=true $JAVA_OPTS"
 
 Then I tried your script three times.
 
 1) First, I just ran without any changes to my system (custom
 grapeConfig.xml which avoids using .m2/repository, no flush of
 ~/.groovy/grapes). It downloaded some missing artifacts and printed the
 message.
 
 2) Then I deleted my ~/.groovy/grapes folder and tried again. It downloaded
 all artifacts and printed the message.
 
 3) Then - just to make sure - I removed my customized grapeConfig.xml.
 Then I deleted my ~/.m2/repository and ~/.groovy/grapes again. It
 downloaded all artifacts and printed the message. It couldn't be a cleaner
 test than this one, I suppose.
 
 So here is the full output of the third run:
 
 $ ./blah
 Resolving dependency: org.cleartk#cleartk-util;0.9.2 {default=[default]}
 Preparing to download artifact org.cleartk#cleartk-util;0.9.2!cleartk-util.jar
 Preparing to download artifact org.apache.uima#uimaj-core;2.4.0!uimaj-core.jar
 Preparing to download artifact org.uimafit#uimafit;1.4.0!uimafit.jar
 Preparing to download artifact args4j#args4j;2.0.16!args4j.jar
 Preparing to download artifact com.google.guava#guava;13.0!guava.jar
 Preparing to download artifact com.carrotsearch#hppc;0.4.1!hppc.jar
 Preparing to download artifact commons-io#commons-io;2.4!commons-io.jar
 Preparing to download artifact commons-lang#commons-lang;2.4!commons-lang.jar
 Preparing to download artifact org.apache.uima#uimaj-tools;2.4.0!uimaj-tools.jar
 Preparing to download artifact org.springframework#spring-core;3.1.0.RELEASE!spring-core.jar
 Preparing to download artifact org.springframework#spring-context;3.1.0.RELEASE!spring-context.jar
 Preparing to download artifact org.apache.uima#uimaj-cpe;2.4.0!uimaj-cpe.jar
 Preparing to download artifact org.apache.uima#uimaj-document-annotation;2.4.0!uimaj-document-annotation.jar
 Preparing to download artifact org.apache.uima#uimaj-adapter-vinci;2.4.0!uimaj-adapter-vinci.jar
 Preparing to download artifact org.apache.uima#jVinci;2.4.0!jVinci.jar
 Preparing to download artifact org.springframework#spring-asm;3.1.0.RELEASE!spring-asm.jar
 Preparing to download artifact commons-logging#commons-logging;1.1.1!commons-logging.jar
 Preparing to download artifact org.springframework#spring-aop;3.1.0.RELEASE!spring-aop.jar
 Preparing to download artifact org.springframework#spring-beans;3.1.0.RELEASE!spring-beans.jar
 Preparing to download artifact org.springframework#spring-expression;3.1.0.RELEASE!spring-expression.jar
 Preparing to download artifact aopalliance#aopalliance;1.0!aopalliance.jar
 Downloaded 8478 Kbytes in 44860ms:
   [SUCCESSFUL ] org.cleartk#cleartk-util;0.9.2!cleartk-util.jar (1385ms)
   [SUCCESSFUL ] org.apache.uima#uimaj-core;2.4.0!uimaj-core.jar (5326ms)
   [SUCCESSFUL ] org.uimafit#uimafit;1.4.0!uimafit.jar (1553ms)
   [SUCCESSFUL ] commons-lang#commons-lang;2.4!commons-lang.jar
 

RE: scala and groovy

2013-12-13 Thread Chen, Pei
James,
If you enable verbose debugging, it may help identify the cause.
Try removing the artifact in question from your .m2 directory and your .groovy
directory, and then run:
$ grape -d install org.springframework spring-asm 3.1.0.RELEASE
That should output the full path it takes when attempting to resolve the 
artifact.
--Pei


 -Original Message-
 From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
 Sent: Friday, December 13, 2013 4:15 PM
 To: 'dev@ctakes.apache.org'
 Subject: RE: scala and groovy
 
 My experience this week with groovy and grapes has been one of
 frustration.
 
 Having an issue with:
 download failed: org.springframework#spring-asm;3.1.0.RELEASE!spring-asm.jar
 
 So I pared things down to a simple script of four lines:
 
 #!/usr/bin/env groovy
 @Grab(group='org.cleartk', module='cleartk-util', version='0.9.2')
 import java.io.File;
 System.out.println("Hello World with @Grab annotations");
 
 And those four lines still result in the following:
 
 Resolving dependency: org.cleartk#cleartk-util;0.9.2 {default=[default]}
 Preparing to download artifact org.cleartk#cleartk-util;0.9.2!cleartk-util.jar
 Preparing to download artifact org.apache.uima#uimaj-core;2.4.0!uimaj-core.jar
 Preparing to download artifact org.uimafit#uimafit;1.4.0!uimafit.jar
 Preparing to download artifact args4j#args4j;2.0.16!args4j.jar
 Preparing to download artifact com.google.guava#guava;13.0!guava.jar
 Preparing to download artifact com.carrotsearch#hppc;0.4.1!hppc.jar
 Preparing to download artifact commons-io#commons-io;2.4!commons-io.jar
 Preparing to download artifact commons-lang#commons-lang;2.4!commons-lang.jar
 Preparing to download artifact org.apache.uima#uimaj-tools;2.4.0!uimaj-tools.jar
 Preparing to download artifact org.springframework#spring-core;3.1.0.RELEASE!spring-core.jar
 Preparing to download artifact org.springframework#spring-context;3.1.0.RELEASE!spring-context.jar
 Preparing to download artifact org.apache.uima#uimaj-cpe;2.4.0!uimaj-cpe.jar
 Preparing to download artifact org.apache.uima#uimaj-document-annotation;2.4.0!uimaj-document-annotation.jar
 Preparing to download artifact org.apache.uima#uimaj-adapter-vinci;2.4.0!uimaj-adapter-vinci.jar
 Preparing to download artifact org.apache.uima#jVinci;2.4.0!jVinci.jar
 Preparing to download artifact org.springframework#spring-asm;3.1.0.RELEASE!spring-asm.jar
 Preparing to download artifact commons-logging#commons-logging;1.1.1!commons-logging.jar
 Preparing to download artifact org.springframework#spring-aop;3.1.0.RELEASE!spring-aop.jar
 Preparing to download artifact org.springframework#spring-beans;3.1.0.RELEASE!spring-beans.jar
 Preparing to download artifact org.springframework#spring-expression;3.1.0.RELEASE!spring-expression.jar
 Preparing to download artifact aopalliance#aopalliance;1.0!aopalliance.jar
 org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
 General error during conversion: Error grabbing Grapes -- [download failed: org.springframework#spring-asm;3.1.0.RELEASE!spring-asm.jar]
 
 java.lang.RuntimeException: Error grabbing Grapes -- [download failed: org.springframework#spring-asm;3.1.0.RELEASE!spring-asm.jar]
 
 
 I tried deleting .groovy/grapes/org.springframework but got the same error.
 I don't see this as being friendly for new users if downloading dependencies
 is not so simple.
 
 -Original Message-
 From: dev-return-2317-Masanz.James=mayo@ctakes.apache.org
 [mailto:dev-return-2317-Masanz.James=mayo@ctakes.apache.org] On
 Behalf Of Richard Eckart de Castilho
 Sent: Friday, December 13, 2013 12:16 PM
 To: dev@ctakes.apache.org
 Subject: Re: scala and groovy
 
 On 13.12.2013, at 15:27, Steven Bethard steven.beth...@gmail.com
 wrote:
 
  P.S. I've stayed out of this whole Groovy thing because we (at
  ClearTK) had some bad experiences with Groovy in the past. Mainly with
  Groovy scripts getting out of sync with the rest of the code base,
  just like XML descriptors, though perhaps the IDEs and Maven are
  better now and that's no longer a problem? But this whole grape
  thing instead of standard Maven isn't changing my mind. Not that I
  planned to switch away from Scala for my scripting anyway, but...
 
 
 I heard and read about your bad experiences with Groovy. I believe that the
 IDEs got somewhat better at handling Groovy. However, I think a difference
 needs to be made depending on the use case.
 
 Some people use the XML files as a format to exchange pipelines with each
 other. However, on their own, these files are not of much use.
 One benefit of using Groovy as a pipeline-exchange format is that it can
 actually get all its dependencies itself via Grape. The Groovy script is quite
 self-contained (although it relies on the Maven infrastructure for
 downloading its dependencies).
 Another is that, thanks to uimaFIT, the Groovy code is much less verbose
 than the XML descriptors.
 
 At the UKP Lab, we also use Groovy sometimes for high-level experiment
 

RE: OrangeBook missing?

2013-12-03 Thread Chen, Pei
Hi Dima,
I believe OrangeBook doesn't have a license restriction like UMLS; hence it
was included in
http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-res/src/main/resources/org/apache/ctakes/dictionary/lookup/OrangeBook/

 -Original Message-
 From: Dligach, Dmitriy [mailto:dmitriy.dlig...@childrens.harvard.edu]
 Sent: Tuesday, December 03, 2013 11:14 AM
 To: cTAKES Developer list
 Subject: OrangeBook missing?
 
 Hi,
 
 It seems like OrangeBook is not included in
 http://sourceforge.net/projects/ctakesresources/files/ctakes-resources-3.1.0.zip.
 
 However, when I downloaded
 http://sourceforge.net/projects/ctakesresources/files/ctakes-resources-3.0.1.zip
 it was there. The 3.1.0 bundle is about half the size of the 3.0.1 bundle.
 
 Does anybody know why OrangeBook is missing from the 3.1.0 resource bundle?
 
 Dima
 
 
 



RE: cTAKES Groovy...

2013-11-27 Thread Chen, Pei
The sample constituency parser printer should be working now...
Just copy and paste the text to parser.groovy and make it executable.
All you should need is groovy installed on your machine.
http://svn.apache.org/repos/asf/ctakes/sandbox/groovy/parser.groovy
$ parser.groovy input
Reading from directory: input
 (TOP (S (NP-SBJ (NN patient)) (VP (VBD took) (NP (NP (NNS 50mg)) (PP (IN of)
 (NP (NP (NN aspirin)) (PP (IN for) (NP (NP (NN pain)) (PP-LOC (IN in) (NP (NN
 knee)))))))))) (. .)))

Maybe we could create one that will output UMLS CUIs/codes... and then others
could easily modify it to their needs.

--Pei
 -Original Message-
 From: William Karl Thompson [mailto:w...@northwestern.edu]
 Sent: Tuesday, November 26, 2013 10:46 PM
 To: dev@ctakes.apache.org
 Subject: RE: cTAKES Groovy...
 
 That is very cool!
 
 Since we're talking Groovy, I'd just like to make a plug for Gradle, a fantastic
 build/deployment/dependency management tool that is in many ways much
 nicer to work with than Maven, though it plays nicely with Maven (for
 example, it can use Maven repositories). Gradle is also proven technology:
 it's the build tool for the Android operating system.
 
 From: Chen, Pei [pei.c...@childrens.harvard.edu]
 Sent: Tuesday, November 26, 2013 4:13 PM
 To: dev@ctakes.apache.org
 Subject: cTAKES Groovy...
 
 Tim had a good end user use case:
 I just want to use the ctakes constituency parser and output the tree text to
 console.
 So I was inspired by Richard's example of Groovy...
 Check out:
 http://svn.apache.org/repos/asf/ctakes/sandbox/groovy/parser.groovy
 
 The Groovy script will automagically download the required classes, jars,
 and resources, and then run.
 It no longer requires the user to have any knowledge of UIMA, cTAKES, etc.
 Sample:
 $ parser.groovy input
 Reading from directory: input
 patient took 50mg of aspirin for pain in knee.
 begin:0 end:48
 
 Pretty cool, 'eh...
 --Pei


RE: ytex branch

2013-11-26 Thread Chen, Pei
Hi VJ,
Sounds cool.  I guess once things are in the branch, we can start to take a 
look to see if it makes sense to incorporate them directly into existing ctakes 
modules or not?
Just curious: were the type system changes mainly adding additional fields?
I'm just planning ahead, especially for proposed type system changes...

--Pei

 -Original Message-
 From: vijay garla [mailto:vnga...@gmail.com]
 Sent: Monday, November 25, 2013 5:07 PM
 To: ctakes-...@incubator.apache.org
 Subject: ytex branch
 
 Hello All,
 
 I'm close to done with the port of ytex to ctakes.  I would like to create a
 branch to commit the changes to, for review by the ctakes elders and other
 developers.  I will be adding the following projects:
 * ctakes-ytex-res - resources
 * ctakes-ytex - no uima/ctakes dependencies - primarily semantic similarity
 code
 * ctakes-ytex-uima - ctakes annotators and pipeline configs
 
 I made very few changes to other ctakes modules, these include:
 * fixing spring version conflicts
 * treatment of newlines in various annotators
 * added properties to OntologyConcept type to support word sense
 disambiguation
 
 Any objections to a branch?
 
 The main thing left to do is packaging for the binary distro.
 * setup ant scripts: I think bin\scripts would be a good spot
 * adding to ctakes-resources download: I have the following to add:
 - delimited text file with lookup dictionary (similar to hsqldb for current
 dictionary lookup)
 - concept graphs for semantic similarity and WSD
 - libraries for jdbc drivers (mysql, oracle, sql server) and hibernate
 
 For the ctakes-resources additions, I can create a zip file to add to the
 ctakes-resources, and send it to somebody (I think it will be a bit big to
 attach to a ticket, and the whole point is not to have non-asf compliant stuff
 lurking around apache)
 
 TIA,
 
 VJ

