Resending without attachments.

2024-05-09 Thread Peter Abramowitsch
Shifting this thread back to the main ctakes thread where it belongs...

Hi Joel,

From your dump, it looks as if the main concept dictionary is missing.

*"No Resource at
resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab.script"*
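A quick way to confirm whether the dictionary script is visible at all is to look for it both as a classpath resource and as a file relative to the working directory. This is a minimal sketch that assumes those are the two places worth checking; cTAKES's own lookup may try other locations as well. DictionaryResourceCheck is a hypothetical helper, not part of cTAKES.

```java
import java.io.File;
import java.net.URL;

/**
 * Hypothetical helper, not part of cTAKES: reports whether a dictionary
 * script can be found on the classpath or on disk.
 */
public class DictionaryResourceCheck {

    /** True if the name resolves as a classpath resource ('/'-separated, leading slash ignored). */
    public static boolean onClasspath(String resourceName) {
        String name = resourceName.startsWith("/") ? resourceName.substring(1) : resourceName;
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = DictionaryResourceCheck.class.getClassLoader();
        }
        URL url = loader.getResource(name);
        return url != null;
    }

    /** True if the path names an existing regular file (relative paths resolve against the working directory). */
    public static boolean onFilesystem(String path) {
        return new File(path).isFile();
    }

    public static void main(String[] args) {
        // Path as printed in the "No Resource at ..." message.
        String fsPath = "resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab.script";
        // Assumption: when the resources directory itself is on the classpath,
        // the resource name drops the leading "resources/".
        String cpName = fsPath.substring("resources/".length());
        System.out.println("working directory : " + new File("").getAbsolutePath());
        System.out.println("found on classpath: " + onClasspath(cpName));
        System.out.println("found on disk     : " + onFilesystem(fsPath));
    }
}
```

If both checks print false, the script (and the database it points to) has not been installed where the pipeline is looking.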

It's currently configured to run with a standard but older dictionary.  But
first we need to establish whether you have a UMLS API key that gives you
access to that vocabulary resource.  If not, here's where to begin:
https://documentation.uts.nlm.nih.gov/rest/authentication.html

The dictionary in question is rather dated and intended to be a sample.  I
found it here:
https://github.com/CDCgov/NLPWorkbench/blob/master/ctakes-patch/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab.script.
Once you have your UMLS license, you can also download the entire UMLS
vocabulary resource onto your machine, then run the cTakes Dictionary
Creator application to build the vocabulary you need.  It selectively
fetches the parts you want from the UMLS files and builds a database for
use in cTakes.  I think most cTakes users build their own dictionaries
after they've become familiar with the application.

There are also models you may need but not have.  These large binary
objects got moved when the source was transferred to GitHub, and I'm not
sure where they are stored now.  Others on this thread will know.

*But first I recommend you get your license key and follow the instructions
about how to configure it into the WAR file.*  I haven't used that module
before, and it's probably been a decade since I last used Apache Tomcat.  In
any case, you will continue to get a rather cryptic resource initialization
error until you've passed the API key correctly.

I'm about to head off to Europe, so you may need to lean on another
resource to get started.  That's why I've cc'd the ctakes thread and you
can take it from there.

Peter


Re: Remaining Maven errors visible in Eclipse [EXTERNAL]

2024-05-06 Thread Peter Abramowitsch
Hi Sean
All the rest of the errors are identical and they apply to every
subproject.  Because this email doesn't allow images, I'll send you a little
screen capture via your other email.  But the error that shows up on every
project is:

*Cannot parse lifecycle mapping metadata for maven project MavenProject:
org.apache.ctakes:ctakes:5.1.0 @
/Users/peterabramowitsch/projects/apache/ctakes-5.0/ctakes/pom.xml Cause:
Unrecognised tag: 'version' (position: START_TAG seen ...\n
 ... @23:18) *

I have no idea why "version" isn't expected.
When I look at the pom in Eclipse's pom inspector, it doesn't show an
error. For instance:

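<modelVersion>4.0.0</modelVersion>
<artifactId>ctakes-drug-ner</artifactId>
<name>Apache cTAKES Drug NER</name>
<parent>
  <groupId>org.apache.ctakes</groupId>
  <artifactId>ctakes</artifactId>
  <version>5.1.0</version>
</parent>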

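Checking the module POM outside Eclipse can help separate a malformed POM from a problem in m2e's own lifecycle-mapping metadata, which is what the error message names. A minimal sketch using the JDK's built-in DOM parser: it parses POM text and returns the parent coordinates. The POM string is an assumed reconstruction of the ctakes-drug-ner header (the archive stripped the XML tags), and PomParentCheck is a hypothetical name. A clean parse only shows that the XML is well-formed, not that m2e accepts it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class PomParentCheck {

    /**
     * Parses POM text and returns "groupId:artifactId:version" of its parent,
     * or null if there is no <parent> element. Malformed XML raises IllegalArgumentException.
     */
    public static String parentCoordinates(String pomXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)));
            Element parent = firstChild(doc.getDocumentElement(), "parent");
            if (parent == null) {
                return null;
            }
            return text(parent, "groupId") + ":" + text(parent, "artifactId") + ":" + text(parent, "version");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("POM could not be parsed: " + e.getMessage(), e);
        }
    }

    /** First direct child element with the given tag name, or null. */
    private static Element firstChild(Element el, String tag) {
        NodeList children = el.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE && tag.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Trimmed text of the named child element, or null if it is missing. */
    private static String text(Element el, String tag) {
        Element child = firstChild(el, tag);
        return child == null ? null : child.getTextContent().trim();
    }

    public static void main(String[] args) {
        // Assumed reconstruction of the ctakes-drug-ner header.
        String pom = String.join("\n",
                "<project>",
                "  <modelVersion>4.0.0</modelVersion>",
                "  <artifactId>ctakes-drug-ner</artifactId>",
                "  <name>Apache cTAKES Drug NER</name>",
                "  <parent>",
                "    <groupId>org.apache.ctakes</groupId>",
                "    <artifactId>ctakes</artifactId>",
                "    <version>5.1.0</version>",
                "  </parent>",
                "</project>");
        System.out.println(parentCoordinates(pom)); // prints org.apache.ctakes:ctakes:5.1.0
    }
}
```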

On Mon, May 6, 2024 at 6:48 AM Finan, Sean
 wrote:

> Hi Peter,
>
> Thanks again for testing.  I didn't have a problem with that
> ctakes.version in ytex-web, but I stuck a definition of it in 
> just in case.  It does the same thing as using parent.version, but just in
> case we ever change the parent I went with a definition of ctakes.version
> in the ytex-web pom.
>
> Do you have a listing of the rest of the errors reported by Eclipse?  I
> use IntelliJ, and while I do get a bunch of version warnings, I don't get any
> errors.  I think that ytex-web would need a fair amount of code overhaul to
> get rid of them, and unless there is a major demand from the community I
> don't know that it is worth doing.  Personally, I'd like to put ytex-web in
> the attic and refer to ctakes-web-rest as a replacement.  Perhaps we can do
> that in ctakes 6?
>
> Thanks,
>
> Sean
>
> 
> From: Peter Abramowitsch 
> Sent: Sunday, May 5, 2024 4:48 PM
> To: dev@ctakes.apache.org 
> Subject: Remaining Maven errors visible in Eclipse [EXTERNAL]
>
> Hi Sean, there are some minor 5.1.0 Maven glitches picked up by Eclipse,
> one of which I can fix and others not.
>
> In ctakes-ytex-web's pom.xml, I changed *ctakes*.version to
> *parent*.version.
> I have not checked it in, in case it wasn't the right thing to do, but it
> made the error go away.
>
> <dependency>
>   <groupId>org.apache.ctakes</groupId>
>   <artifactId>ctakes-user-resources</artifactId>
>   <version>${parent.version}</version>
> </dependency>
>
> But every pom except the master pom shows this error:
>
> "Cannot parse lifecycle mapping for maven project Maven Project"
>
> In fact, there is no lifecycle mapping file.
>
> I looked at various solutions online and none of them worked, including
> creating a dummy mapping and including it in the project - all it did was
> insert a blank line between every line of the master pom!  *Che palle*
> ("what a pain"), as they say in Italian.
>
> I'm pretty sure it's not my Eclipse installation because my own Maven
> projects (admittedly smaller) don't show this error.
>
> Is anyone else seeing a red 'x' error next to every pom in the source tree
> in Eclipse?
>
> Eclipse Version: 2023-12 (4.30.0)
> M2E - Maven Integration for Eclipse 2.6.0.20240220-1109
> org.eclipse.m2e.feature.feature.group Eclipse.org - m2e
>
> Peter
>

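Both spellings rely on Maven property interpolation: ${parent.version} is a built-in model expression (Maven 3 warns that it is deprecated in favour of ${project.parent.version}), whereas ${ctakes.version} resolves only if a ctakes.version property is defined in the module's POM or in a POM it inherits from; otherwise Maven leaves the expression as literal text. The deliberately simplified model below illustrates that difference. PropertyInterpolation is a hypothetical name, and real Maven interpolation also consults system properties, environment variables, profiles and more.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified model of ${...} substitution; not Maven's actual implementation. */
public class PropertyInterpolation {

    private static final Pattern EXPR = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${name} with its value; names with no value are left untouched. */
    public static String interpolate(String text, Map<String, String> props) {
        Matcher m = EXPR.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = props.get(m.group(1));
            String replacement = value != null ? value : m.group(0);
            // quoteReplacement stops '$' or '\' in values from being treated as group refs.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        // Stand-ins for the values Maven derives from the <parent> element.
        props.put("project.parent.version", "5.1.0");
        props.put("parent.version", "5.1.0"); // deprecated alias
        System.out.println(interpolate("${parent.version}", props)); // 5.1.0
        System.out.println(interpolate("${ctakes.version}", props)); // ${ctakes.version}
        // Defining the property, as was done in the ytex-web pom, makes it resolve.
        props.put("ctakes.version", "5.1.0");
        System.out.println(interpolate("${ctakes.version}", props)); // 5.1.0
    }
}
```

Defining ctakes.version explicitly, as Sean describes, decouples the module from whatever the parent's version happens to be.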

Remaining Maven errors visible in Eclipse

2024-05-05 Thread Peter Abramowitsch
Hi Sean, there are some minor 5.1.0 Maven glitches picked up by Eclipse,
one of which I can fix and others not.

In ctakes-ytex-web's pom.xml, I changed *ctakes*.version to *parent*.version.
I have not checked it in, in case it wasn't the right thing to do, but it
made the error go away.

<dependency>
  <groupId>org.apache.ctakes</groupId>
  <artifactId>ctakes-user-resources</artifactId>
  <version>${parent.version}</version>
</dependency>

But every pom except the master pom shows this error:

"Cannot parse lifecycle mapping for maven project Maven Project"

In fact, there is no lifecycle mapping file.

I looked at various solutions online and none of them worked, including
creating a dummy mapping and including it in the project - all it did was
insert a blank line between every line of the master pom!  *Che palle*
("what a pain"), as they say in Italian.

I'm pretty sure it's not my Eclipse installation because my own Maven
projects (admittedly smaller) don't show this error.

Is anyone else seeing a red 'x' error next to every pom in the source tree
in Eclipse?

Eclipse Version: 2023-12 (4.30.0)
M2E - Maven Integration for Eclipse 2.6.0.20240220-1109
org.eclipse.m2e.feature.feature.group Eclipse.org - m2e

Peter


Mastif Zoner is there now

2024-05-02 Thread Peter Abramowitsch
Hi Sean

I did a clean build, also removing the mastif zoner library from my maven
cache.  It does get into the distribution now.

My git branch got a bit confused when I tried to merge the tag into it.
But by destroying my branch and using git switch -c to create a new one off
the 5.1.0 tag, it seemed to do the right thing.  I guess when 5.1.0 is
merged into main, that won't be an issue.

Peter


Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL]

2024-05-02 Thread Peter Abramowitsch
I'll test it today.  Thanks Sean.

Peter

On Wed, May 1, 2024, 14:05 Finan, Sean
 wrote:

> Hi Peter,
>
> I think that I have the ctakes-mastif-zoner module behavior as desired.
> Let me know if you have any problems with the new candidate.
>
> Sean
> ____
> From: Peter Abramowitsch 
> Sent: Friday, April 26, 2024 11:41 PM
> To: dev@ctakes.apache.org 
> Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate
> [EXTERNAL]
>
> Hi Sean,
>
> It all compiles, but one of the jars is missing from the distribution.
> It's the one I added:  ctakes-mastif-zoner which is required if you're
> going to use the Zone Annotator.
>
> It's in the master pom, and in the pom of ctakes-distribution, and the jar
> got built in its project, but it's not scooped up into the distribution.
> I'm not sure where else to look. Can you fix it?
>
> Peter
>
>
> On Fri, Apr 26, 2024 at 8:59 AM Finan, Sean
>  wrote:
>
> > Hi all,
> >
> > There is a candidate for version 5.1.0 of Apache cTAKES source code in a
> > staging repository:
> >
> >
> > https://repository.apache.org/content/repositories/staging/org/apache/ctakes/ctakes/5.1.0/
> >
> > The code is contained within the file:
> > ctakes-5.1.0-source-release.zip<
> > https://repository.apache.org/content/repositories/staging/org/apache/ctakes/ctakes/5.1.0/ctakes-5.1.0-source-release.zip
> > >
> >
> > I welcome you all to test your favorite pipeline(s) and report any
> > issues.
> > I am calling a vote from the PMC to finish by 12:nn Eastern time, next
> > Wednesday May 1.  Please report any issues before that time.  If any
> > 'road-block' issues are found they will need to be addressed before a
> > release.
> >
> > Thank you,
> > Sean
> >
> >
> > p.s.
> >
> > The 5.1.0 candidate is based upon the source code in the ctakes-5.1.0
> > tag:
> > https://github.com/apache/ctakes/releases/tag/ctakes-5.1.0
> >
> > The ctakes-5.1.0 tag was made from the 5.1.0 branch:
> > https://github.com/apache/ctakes/tree/5.1.0
> >
> > The 5.1.0 branch is a copy of the main branch:
> > https://github.com/apache/ctakes/tree/main
> > The version number in the 5.1.0 branch is different, but there are no
> > code differences between the two branches.
> >
> >
> >
>


Re: Please test the Apache cTAKES 5.1.0 release candidate

2024-04-29 Thread Peter Abramowitsch
axb.ObjectFactoryUtil.getJAXBElementBySrcXml(ObjectFactoryUtil.java:49)
> [INFO]  [java]  at org.apache.ctakes.jdl.data.xml.jaxb.ObjectFactoryUtil.getConnTypeBySrcXml(ObjectFactoryUtil.java:86)
> [INFO]  [java]  at org.apache.ctakes.jdl.data.xml.jaxb.ObjectFactoryUtil.getJdbcTypeBySrcXml(ObjectFactoryUtil.java:64)
> [INFO]  [java]  at org.apache.ctakes.jdl.AppJdl.execute(AppJdl.java:80)
> [INFO]  [java]  at org.apache.ctakes.jdl.AppMain.main(AppMain.java:84)
> [INFO]  [java] Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make protected final java.lang.Class java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int) throws java.lang.ClassFormatError accessible: module java.base does not "opens java.lang" to unnamed module @61ca2dfa
> [INFO]  [java]  at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354)
> [INFO]  [java]  at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
> [INFO]  [java]  at java.base/java.lang.reflect.Method.checkCanSetAccessible(Method.java:199)
> [INFO]  [java]  at java.base/java.lang.reflect.Method.setAccessible(Method.java:193)
> [INFO]  [java]  at com.sun.xml.bind.v2.runtime.reflect.opt.Injector$1.run(Injector.java:177)
> [INFO]  [java]  at com.sun.xml.bind.v2.runtime.reflect.opt.Injector$1.run(Injector.java:172)
> [INFO]  [java]  at java.base/java.security.AccessController.doPrivileged(AccessController.java:318)
> [INFO]  [java]  at com.sun.xml.bind.v2.runtime.reflect.opt.Injector.(Injector.java:172)
> [INFO]  [java]  ... 34 more
>
> On Mon, 29 Apr 2024 at 21:21, Peter Abramowitsch
> wrote:
>
> > I think this is the class where Java is exiting with 1
> > /ctakes-ytex/src/test/java/org/apache/ctakes/jdl/AppMainTest.java
> >
> > btw, my environment is MacOS and I notice yours is Windows, so the root
> > cause of why this class is giving you trouble is something I wouldn't be
> > able to help you with.  But some debug statements rather than asserts
> > would tell you, I think.
> >
> > Peter
> >
> > On Mon, Apr 29, 2024 at 8:43 AM Peter Abramowitsch <
> > pabramowit...@gmail.com>
> > wrote:
> >
> > > Hi Gandhi
> > > This project is an odd one in the sense that when you tell it to skip
> > > the tests, it still goes through the effort in building up the db
> > > environment that the tests would use.  But in any case, for me it does
> > > build either way.  In the attached log, I've run a maven clean before
> > > doing the build without tests.
> > >
> > > However, check my previous email about your issue.  Whereas you'd
> > > narrowed it down to a script, I found a line in your email which showed
> > > the error within that script's execution: a Java program (jdl) running
> > > as AppMain threw an assertion on one of the tasks connected with the
> > > mysql database it was trying to configure.  You could put some
> > > debugging statements in there to see which one.
> > >
> > > Peter
> > >
> > > On Mon, Apr 29, 2024 at 4:55 AM gandhi rajan
> > > wrote:
> > >
> > >> Thanks for the insights Peter. I didn't make it clear that I did run
> > >> the install on ytex module with test case execution toggled off. I used
> > >> the following command - "mvn -e clean install -Dmaven.test.skip=true"
> > >> and I still hit the same error.
> > >>
> > >> On digging deep, I could find that the build process is trying to
> > >> execute "" in build-main.xml which in turn is trying to invoke the
> > >> following target in build.setup.xml:
> > >>
> > >>  depends="generateTestYtexProperties,templateToConfig,deleteTestDb">
> > >>
> > >>
> > >>
> > >> Did you try running this on a fresh setup Peter?
> > >>
> > >> On Sun, 28 Apr 2024 at 01:17, Peter Abramowitsch <
> > >> pabramowit...@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi Gandhi
> > >> > Your error appears to be at this line
> > >> >
> > >> > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\data\build.xml:456: Java
> > >> >

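The "Caused by" line in the stack trace quoted above shows JAXB's optimized Injector failing to call setAccessible on ClassLoader.defineClass, because java.base does not open java.lang to classpath (unnamed-module) code; JDK 16 and later deny this by default. A commonly used JVM-level workaround is the option --add-opens java.base/java.lang=ALL-UNNAMED on the java process that Ant launches (for example through a jvmarg element on a forked Ant java task); whether that is the right fix for ctakes-ytex is an assumption, not something confirmed in this thread. The hypothetical OpensCheck class below (JDK 9 or later) only reports whether the current JVM grants that access.

```java
import java.lang.reflect.Method;

/**
 * Hypothetical diagnostic (JDK 9+): can code on the classpath reflectively
 * open java.lang.ClassLoader#defineClass, as JAXB's Injector tries to do?
 */
public class OpensCheck {

    /** True if java.base opens package java.lang to this class's module (normally the unnamed module). */
    public static boolean javaLangOpenToCaller() {
        Module javaBase = Object.class.getModule();
        Module caller = OpensCheck.class.getModule();
        return javaBase.isOpen("java.lang", caller);
    }

    /**
     * True if setAccessible(true) on ClassLoader.defineClass(String, byte[], int, int)
     * would succeed. trySetAccessible returns false instead of throwing
     * InaccessibleObjectException.
     */
    public static boolean canAccessDefineClass() {
        try {
            Method defineClass = ClassLoader.class.getDeclaredMethod(
                    "defineClass", String.class, byte[].class, int.class, int.class);
            return defineClass.trySetAccessible();
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version       : " + System.getProperty("java.version"));
        System.out.println("java.lang opened   : " + javaLangOpenToCaller());
        System.out.println("defineClass usable : " + canAccessDefineClass());
        if (!canAccessDefineClass()) {
            System.out.println("Access denied; compare with: java --add-opens java.base/java.lang=ALL-UNNAMED ...");
        }
    }
}
```

Running it once plainly and once with --add-opens java.base/java.lang=ALL-UNNAMED on a JDK 16+ runtime should show both values change from false to true.
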
Re: Please test the Apache cTAKES 5.1.0 release candidate

2024-04-29 Thread Peter Abramowitsch
I think this is the class where Java is exiting with 1
/ctakes-ytex/src/test/java/org/apache/ctakes/jdl/AppMainTest.java

btw, my environment is MacOS and I notice yours is Windows, so the root
cause of why this class is giving you trouble is something I wouldn't be able
to help you with.  But some debug statements rather than asserts would tell
you, I think.

Peter
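The suggestion to use debug statements rather than bare asserts can be sketched as a small step runner: each setup step runs under a name, and the first failure is reported before the process exits with status 1, so "Java returned: 1" comes with context. StepRunner and the step names are hypothetical placeholders, not the actual tasks inside AppMain or AppMainTest.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

public class StepRunner {

    /**
     * Runs the steps in insertion order and returns the name of the first step that
     * returned false (or null) or threw, or null if every step succeeded. Each step is
     * logged to stderr before it runs, so the last line printed identifies the culprit.
     */
    public static String firstFailingStep(Map<String, Callable<Boolean>> steps) {
        for (Map.Entry<String, Callable<Boolean>> step : steps.entrySet()) {
            System.err.println("[debug] running step: " + step.getKey());
            try {
                if (!Boolean.TRUE.equals(step.getValue().call())) {
                    System.err.println("[debug] step returned false: " + step.getKey());
                    return step.getKey();
                }
            } catch (Exception e) {
                System.err.println("[debug] step threw: " + step.getKey() + " -> " + e);
                return step.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the steps in the order they were added.
        Map<String, Callable<Boolean>> steps = new LinkedHashMap<>();
        // Hypothetical placeholder steps standing in for the real database setup tasks.
        steps.put("load JDBC driver", () -> true);
        steps.put("connect to test database", () -> true);
        steps.put("load test data", () -> { throw new IllegalStateException("table missing"); });
        String failed = firstFailingStep(steps);
        if (failed != null) {
            System.err.println("First failing step: " + failed);
            System.exit(1); // same non-zero status Ant reports, now with context
        }
    }
}
```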

On Mon, Apr 29, 2024 at 8:43 AM Peter Abramowitsch 
wrote:

> Hi Gandhi
> This project is an odd one in the sense that when you tell it to skip the
> tests, it still goes through the effort in building up the db environment
> that the tests would use.  But in any case, for me it does build either
> way.  In the attached log, I've run a maven clean before doing the build
> without tests.
>
> However, check my previous email about your issue.  Whereas you'd narrowed
> it down to a script, I found a line in your email which showed the error
> within that script's execution: a Java program (jdl) running as AppMain
> threw an assertion on one of the tasks connected with the mysql database it
> was trying to configure.  You could put some debugging statements in there
> to see which one.
>
> Peter
>
> On Mon, Apr 29, 2024 at 4:55 AM gandhi rajan 
> wrote:
>
>> Thanks for the insights Peter. I didn't make it clear that I did run the
>> install on ytex module with test case execution toggled off. I used the
>> following command - "mvn -e clean install -Dmaven.test.skip=true" and I
>> still hit the same error.
>>
>> On digging deep, I could find that the build process is trying to execute
>> "" in
>> build-main.xml which in turn is trying to invoke the following target in
>> build.setup.xml:
>>
>>  depends="generateTestYtexProperties,templateToConfig,deleteTestDb">
>> 
>> 
>>
>> Did you try running this on a fresh setup Peter?
>>
>> On Sun, 28 Apr 2024 at 01:17, Peter Abramowitsch
>> wrote:
>>
>> > Hi Gandhi
>> > Your error appears to be at this line
>> >
>> > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\data\build.xml:456: Java
>> > returned: 1
>> >
>> > A test application being run here, AppMain, is in charge of loading a
>> > temporary mysqldb that is used to test that part of ytex.  For me it is
>> > working, but if you can find a way to run that surefire test in the
>> > debugger, you can find out why it's failing on one of the assertions.
>> > Otherwise you can take this shortcut
>> >
>> > mvn clean install -Dmaven.test.skip=true
>> >
>> > to build the project without running any tests.
>> >
>> > On Sat, Apr 27, 2024 at 7:35 AM gandhi rajan 
>> > wrote:
>> >
>> > > Hi Sean,
>> > >
>> > > When I tried to build the complete ctakes suite, I get a build failure
>> > > for ctakes-ytex module with the following error:
>> > >
>> > > [ERROR] Failed to execute goal
>> > > org.apache.maven.plugins:maven-antrun-plugin:3.1.0:run
>> > > (generate-test-config) on project ctakes-ytex: An Ant BuildException
>> > > has occured: The following error occurred while executing this line:
>> > > [ERROR]
>> > > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\build-setup.xml:149:
>> > > The following error occurred while executing this line:
>> > > [ERROR]
>> > > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\data\build.xml:148:
>> > > The following error occurred while executing this line:
>> > > [ERROR]
>> > > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\data\build.xml:295:
>> > > The following error occurred while executing this line:
>> > > [ERROR]
>> > > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\data\build.xml:456:
>> > > Java returned: 1
>> > > [ERROR] around Ant part ...> dir="scripts"
>> > > target="test.setup">... @ 5:70 in
>> > > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\target\antrun\build-main.xml
>> > >
>> > > Is this expected Sean?
>> > >
>> > > On Fri, 26 Apr 2024 at 21:30, Finan, Sean
>> > >  wrote:
>> > >
>> > > > Hi all,
>> > > >
>> > > > There is a candidate for version 5.1.0 of Apache cTAKES source code
>> > > > in a staging repository:
>> > > >
>> > > >

Re: Please test the Apache cTAKES 5.1.0 release candidate

2024-04-29 Thread Peter Abramowitsch
Hi Gandhi
This project is an odd one in the sense that when you tell it to skip the
tests, it still goes through the effort in building up the db environment
that the tests would use.  But in any case, for me it does build either
way.  In the attached log, I've run a maven clean before doing the build
without tests.

However, check my previous email about your issue.  Whereas you'd narrowed
it down to a script, I found a line in your email which showed the error
within that script's execution: a Java program (jdl) running as AppMain
threw an assertion on one of the tasks connected with the mysql database it
was trying to configure.  You could put some debugging statements in there
to see which one.

Peter

On Mon, Apr 29, 2024 at 4:55 AM gandhi rajan 
wrote:

> Thanks for the insights Peter. I didn't make it clear that I did run the
> install on ytex module with test case execution toggled off. I used the
> following command - "mvn -e clean install -Dmaven.test.skip=true" and I
> still hit the same error.
>
> On digging deep, I could find that the build process is trying to execute
> "" in
> build-main.xml which in turn is trying to invoke the following target in
> build.setup.xml:
>
>  depends="generateTestYtexProperties,templateToConfig,deleteTestDb">
> 
> 
>
> Did you try running this on a fresh setup Peter?
>
> On Sun, 28 Apr 2024 at 01:17, Peter Abramowitsch 
> wrote:
>
> > Hi Gandhi
> > Your error appears to be at this line
> >
> > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\data\build.xml:456: Java
> > returned: 1
> >
> > A test application being run here, AppMain, is in charge of loading a
> > temporary mysqldb that is used to test that part of ytex.  For me it is
> > working, but if you can find a way to run that surefire test in the
> > debugger, you can find out why it's failing on one of the assertions.
> > Otherwise you can take this shortcut
> >
> > mvn clean install -Dmaven.test.skip=true
> >
> > to build the project without running any tests.
> >
> > On Sat, Apr 27, 2024 at 7:35 AM gandhi rajan 
> > wrote:
> >
> > > Hi Sean,
> > >
> > > When I tried to build the complete ctakes suite, I get a build failure
> > > for ctakes-ytex module with the following error:
> > >
> > > [ERROR] Failed to execute goal
> > > org.apache.maven.plugins:maven-antrun-plugin:3.1.0:run
> > > (generate-test-config) on project ctakes-ytex: An Ant BuildException
> > > has occured: The following error occurred while executing this line:
> > > [ERROR]
> > > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\build-setup.xml:149:
> > > The following error occurred while executing this line:
> > > [ERROR]
> > > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\data\build.xml:148:
> > > The following error occurred while executing this line:
> > > [ERROR]
> > > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\data\build.xml:295:
> > > The following error occurred while executing this line:
> > > [ERROR]
> > > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\data\build.xml:456:
> > > Java returned: 1
> > > [ERROR] around Ant part ... target="test.setup">... @ 5:70 in
> > > C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\target\antrun\build-main.xml
> > >
> > > Is this expected Sean?
> > >
> > > On Fri, 26 Apr 2024 at 21:30, Finan, Sean
> > >  wrote:
> > >
> > > > Hi all,
> > > >
> > > > There is a candidate for version 5.1.0 of Apache cTAKES source code
> > > > in a staging repository:
> > > >
> > > > https://repository.apache.org/content/repositories/staging/org/apache/ctakes/ctakes/5.1.0/
> > > >
> > > > The code is contained within the file:
> > > > ctakes-5.1.0-source-release.zip<
> > > > https://repository.apache.org/content/repositories/staging/org/apache/ctakes/ctakes/5.1.0/ctakes-5.1.0-source-release.zip
> > > > >
> > > >
> > > > I welcome you all to test your favorite pipeline(s) and report any
> > > > issues.
> > > > I am calling a vote from the PMC to finish by 12:nn Eastern time, next
> > > > Wednesday May 1.  Please report any issues before that time.  If any
> > > > 'road-block' issues are found they will need to be addressed before a
> > > > release.
> > > >

Re: Please test the Apache cTAKES 5.1.0 release candidate

2024-04-27 Thread Peter Abramowitsch
Hi Gandhi
Your error appears to be at this line

C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\data\build.xml:456: Java
returned: 1

A test application being run here, AppMain, is in charge of loading a
temporary mysqldb that is used to test that part of ytex.  For me it is
working, but if you can find a way to run that surefire test in the
debugger, you can find out why it's failing on one of the assertions.
Otherwise you can take this shortcut

mvn clean install -Dmaven.test.skip=true

to build the project without running any tests.

On Sat, Apr 27, 2024 at 7:35 AM gandhi rajan 
wrote:

> Hi Sean,
>
> When I tried to build the complete ctakes suite, I get a build failure for
> ctakes-ytex module with the following error:
>
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-antrun-plugin:3.1.0:run
> (generate-test-config) on project ctakes-ytex: An Ant BuildException has
> occured: The following error occurred while executing this line:
> [ERROR]
> C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\build-setup.xml:149: The
> following error occurred while executing this line:
> [ERROR]
> C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\data\build.xml:148: The
> following error occurred while executing this line:
> [ERROR]
> C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\data\build.xml:295: The
> following error occurred while executing this line:
> [ERROR]
> C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\scripts\data\build.xml:456: Java
> returned: 1
> [ERROR] around Ant part ... target="test.setup">... @ 5:70 in
> C:\Gandhi\Project\ctakes-5.1.0\ctakes-ytex\target\antrun\build-main.xml
>
> Is this expected Sean?
>
> On Fri, 26 Apr 2024 at 21:30, Finan, Sean
>  wrote:
>
> > Hi all,
> >
> > There is a candidate for version 5.1.0 of Apache cTAKES source code in a
> > staging repository:
> >
> >
> https://repository.apache.org/content/repositories/staging/org/apache/ctakes/ctakes/5.1.0/
> >
> > The code is contained within the file:
> > ctakes-5.1.0-source-release.zip<
> >
> https://repository.apache.org/content/repositories/staging/org/apache/ctakes/ctakes/5.1.0/ctakes-5.1.0-source-release.zip
> > >
> >
> > I welcome you all to test your favorite pipeline(s) and report any
> > issues.
> > I am calling a vote from the PMC to finish by 12:nn Eastern time, next
> > Wednesday May 1.  Please report any issues before that time.  If any
> > 'road-block' issues are found they will need to be addressed before a
> > release.
> >
> > Thank you,
> > Sean
> >
> >
> > p.s.
> >
> > The 5.1.0 candidate is based upon the source code in the ctakes-5.1.0
> > tag:
> > https://github.com/apache/ctakes/releases/tag/ctakes-5.1.0
> >
> > The ctakes-5.1.0 tag was made from the 5.1.0 branch:
> > https://github.com/apache/ctakes/tree/5.1.0
> >
> > The 5.1.0 branch is a copy of the main branch:
> > https://github.com/apache/ctakes/tree/main
> > The version number in the 5.1.0 branch is different, but there are no
> > code differences between the two branches.
> >
> >
> >
>
> --
> Regards,
> Gandhi
>
> "The best way to find urself is to lose urself in the service of others
> !!!"
>


Re: Please test the Apache cTAKES 5.1.0 release candidate

2024-04-26 Thread Peter Abramowitsch
Hi again Sean
Perfect compile.
Within our context and our pipeline, it runs well.
Tried with simple and complex pipelines.
I have not used most of the piperRunner/Creator scripts.
I haven't exercised any of the PBJ stuff yet.
I don't use the REST projects or the YTEX DB stuff - we have our own.

Apart from the missing project I mentioned in the previous email, which does
need to be fixed, I would give 5.1.0 a plus for release.

Peter

On Fri, Apr 26, 2024 at 8:41 PM Peter Abramowitsch 
wrote:

> Hi Sean,
>
> It all compiles, but one of the jars is missing from the distribution.
> It's the one I added:  ctakes-mastif-zoner which is required if you're
> going to use the Zone Annotator.
>
> It's in the master pom, and in the pom of ctakes-distribution, and the jar
> got built in its project, but it's not scooped up into the distribution.
> I'm not sure where else to look. Can you fix it?
>
> Peter
>
>
> On Fri, Apr 26, 2024 at 8:59 AM Finan, Sean
>  wrote:
>
>> Hi all,
>>
>> There is a candidate for version 5.1.0 of Apache cTAKES source code in a
>> staging repository:
>>
>> https://repository.apache.org/content/repositories/staging/org/apache/ctakes/ctakes/5.1.0/
>>
>> The code is contained within the file:
>> ctakes-5.1.0-source-release.zip<
>> https://repository.apache.org/content/repositories/staging/org/apache/ctakes/ctakes/5.1.0/ctakes-5.1.0-source-release.zip
>> >
>>
>> I welcome you all to test your favorite pipeline(s) and report any issues.
>> I am calling a vote from the PMC to finish by 12:nn Eastern time, next
>> Wednesday May 1.  Please report any issues before that time.  If any
>> 'road-block' issues are found they will need to be addressed before a
>> release.
>>
>> Thank you,
>> Sean
>>
>>
>> p.s.
>>
>> The 5.1.0 candidate is based upon the source code in the ctakes-5.1.0 tag:
>> https://github.com/apache/ctakes/releases/tag/ctakes-5.1.0
>>
>> The ctakes-5.1.0 tag was made from the 5.1.0 branch:
>> https://github.com/apache/ctakes/tree/5.1.0
>>
>> The 5.1.0 branch is a copy of the main branch:
>> https://github.com/apache/ctakes/tree/main
>> The version number in the 5.1.0 branch is different, but there are no
>> code differences between the two branches.
>>
>>
>>


Re: Please test the Apache cTAKES 5.1.0 release candidate

2024-04-26 Thread Peter Abramowitsch
Hi Sean,

It all compiles, but one of the jars is missing from the distribution.
It's the one I added:  ctakes-mastif-zoner which is required if you're
going to use the Zone Annotator.

It's in the master pom, and in the pom of ctakes-distribution, and the jar
got built in its project, but it's not scooped up into the distribution.
I'm not sure where else to look. Can you fix it?

Peter
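To see whether a module's jar actually made it into the packaged distribution, the archive's entries can be listed directly. A minimal sketch, assuming the binary distribution is a .zip (java.util.zip cannot read a .tar.gz); you supply the archive path, and DistributionCheck is a hypothetical name. Matching on the file-name prefix keeps ctakes-mastif-zoner jars distinct from the old external mastif-zoner-1.4.jar.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class DistributionCheck {

    /** Returns the entry names in the zip whose file name starts with jarPrefix and ends in ".jar". */
    public static List<String> findJars(String zipPath, String jarPrefix) throws IOException {
        List<String> hits = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                // Compare only the last path segment, e.g. lib/ctakes-mastif-zoner-5.1.0.jar
                String fileName = name.substring(name.lastIndexOf('/') + 1);
                if (fileName.startsWith(jarPrefix) && fileName.endsWith(".jar")) {
                    hits.add(name);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DistributionCheck <path-to-distribution.zip>");
            return;
        }
        List<String> hits = findJars(args[0], "ctakes-mastif-zoner");
        System.out.println(hits.isEmpty() ? "ctakes-mastif-zoner jar NOT found" : "found: " + hits);
    }
}
```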


On Fri, Apr 26, 2024 at 8:59 AM Finan, Sean
 wrote:

> Hi all,
>
> There is a candidate for version 5.1.0 of Apache cTAKES source code in a
> staging repository:
>
> https://repository.apache.org/content/repositories/staging/org/apache/ctakes/ctakes/5.1.0/
>
> The code is contained within the file:
> ctakes-5.1.0-source-release.zip<
> https://repository.apache.org/content/repositories/staging/org/apache/ctakes/ctakes/5.1.0/ctakes-5.1.0-source-release.zip
> >
>
> I welcome you all to test your favorite pipeline(s) and report any issues.
> I am calling a vote from the PMC to finish by 12:nn Eastern time, next
> Wednesday May 1.  Please report any issues before that time.  If any
> 'road-block' issues are found they will need to be addressed before a
> release.
>
> Thank you,
> Sean
>
>
> p.s.
>
> The 5.1.0 candidate is based upon the source code in the ctakes-5.1.0 tag:
> https://github.com/apache/ctakes/releases/tag/ctakes-5.1.0
>
> The ctakes-5.1.0 tag was made from the 5.1.0 branch:
> https://github.com/apache/ctakes/tree/5.1.0
>
> The 5.1.0 branch is a copy of the main branch:
> https://github.com/apache/ctakes/tree/main
> The version number in the 5.1.0 branch is different, but there are no code
> differences between the two branches.
>
>
>


Re: Please test the Apache cTAKES 5.1.0 release candidate

2024-04-26 Thread Peter Abramowitsch
Hi Sean

I'll do a run-through.  But looking through the commits, except for label
changes, it looks as if most of the last code additions & changes were mine
from a couple of months ago.  I'm planning to put this release (wrapped in
our webservice framework) into production in a month or so.

Peter



On Fri, Apr 26, 2024 at 8:59 AM Finan, Sean
 wrote:

> Hi all,
>
> There is a candidate for version 5.1.0 of Apache cTAKES source code in a
> staging repository:
>
> https://repository.apache.org/content/repositories/staging/org/apache/ctakes/ctakes/5.1.0/
>
> The code is contained within the file:
> ctakes-5.1.0-source-release.zip<
> https://repository.apache.org/content/repositories/staging/org/apache/ctakes/ctakes/5.1.0/ctakes-5.1.0-source-release.zip
> >
>
> I welcome you all to test your favorite pipeline(s) and report any issues.
> I am calling a vote from the PMC to finish by 12:nn Eastern time, next
> Wednesday May 1.  Please report any issues before that time.  If any
> 'road-block' issues are found they will need to be addressed before a
> release.
>
> Thank you,
> Sean
>
>
> p.s.
>
> The 5.1.0 candidate is based upon the source code in the ctakes-5.1.0 tag:
> https://github.com/apache/ctakes/releases/tag/ctakes-5.1.0
>
> The ctakes-5.1.0 tag was made from the 5.1.0 branch:
> https://github.com/apache/ctakes/tree/5.1.0
>
> The 5.1.0 branch is a copy of the main branch:
> https://github.com/apache/ctakes/tree/main
> The version number in the 5.1.0 branch is different, but there are no code
> differences between the two branches.
>
>
>


Changes to the Assertion-Zoner

2024-02-06 Thread Peter Abramowitsch
Hi all.

I've merged in an update to the AssertionZoner (for detection of sections
in clinical text).  There's a new markdown file in the project
ctakes-assertion-zoner describing the changes and how to take advantage of
the new features.  The update (code graciously permitted by my client
UCSF) does not include a further configuration (sections_regex.xml) beyond
the one supplied originally with ctakes.  You'll probably need to extend it
for the characteristics of the text you are working with, but it's a good
starting point, and one of the new features checked in here will help you
determine which section definitions you'll need to add.

There's also a new ctakes project, ctakes-mastif-zoner, which brings in an
external support library that needed modification, so the external
one, mastif-zoner-1.4.jar, has been removed as a dependency.  The new
"internal" version is based on the mastif source version 1.6.  Make
sure you also remove the old mastif-zoner-1.4.jar from any runtime
areas.

Peter


Re: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL]

2023-12-05 Thread Peter Abramowitsch
Thanks for the great discussion, Sean.

About Mastif, I started with 1.6 and then made further changes.  If the
currently checked-in ZoneAnnotator works well with 1.6, then I can first
update the dependency to 1.6, and then find a way to mix in the changes within
the cTakes package structure at the same time I add my Annotator changes.

Among other things, I modified the code to allow generic document
sections and subsections to be printed into the log: it detects things
that look like document sections but are not yet accommodated in the
regex list of "known sections".  This helps the user identify which
sections still need to be configured.  There are some other changes; I'll
check with UCSF and see if we can release these publicly.  We've used the
new code on 130 million notes with good results.
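The idea of logging spans that look like sections but match no known pattern can be illustrated with a small sketch. This is a hypothetical example, not the AssertionZoner code: the patterns in `KNOWN_SECTIONS`, the "word run followed by a colon" header heuristic and the function name `unknown_headers` are all assumptions for illustration, and the real section list lives in sections_regex.xml, whose format is not reproduced here.

```python
import re
from typing import List

# Hypothetical "known section" patterns (the real list lives in sections_regex.xml).
KNOWN_SECTIONS = [
    re.compile(r"history of present illness\b", re.IGNORECASE),
    re.compile(r"medications?\b", re.IGNORECASE),
    re.compile(r"assessment and plan\b", re.IGNORECASE),
]

# Heuristic for "looks like a section header": a short run of words at the
# start of a line followed by a colon, e.g. "FAMILY HISTORY:".
HEADER_LIKE = re.compile(r"^\s*([A-Za-z][A-Za-z /&-]{2,40}):")


def unknown_headers(text: str) -> List[str]:
    """Return header-like labels that none of KNOWN_SECTIONS recognizes."""
    unknown = []
    for line in text.splitlines():
        match = HEADER_LIKE.match(line)
        if match is None:
            continue
        label = match.group(1).strip()
        # pattern.match anchors at the start of the label only.
        if not any(pattern.match(label) for pattern in KNOWN_SECTIONS):
            unknown.append(label)
    return unknown


note = """HISTORY OF PRESENT ILLNESS: 54 yo with chest pain.
FAMILY HISTORY: Father with MI at 60.
Medications: aspirin 81 mg daily.
Social History: Non-smoker.
"""
print(unknown_headers(note))  # ['FAMILY HISTORY', 'Social History']
```

A label printed this way is only a candidate; deciding whether it deserves a section definition is still a human judgment.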

Peter

On Tue, Dec 5, 2023 at 7:59 PM Finan, Sean
 wrote:

> Hi Peter,
>
> My thoughts on this:
>
> > a newer version of Mastif than Ctakes is packaged with, and additional
> modifications that I've made
> - Are you saying that you made a local/custom version of mastif that
> builds upon their latest (1.6?) version?  Or do you just need to update the
> ctakes dependency from 1.4 to 1.6?
> - If you have a custom version of mastif, one way that I have dealt with
> this in other projects is to keep changes to the standard library 'in
> parallel' and call the parallel versions in my/our code.  If mastif is
> written in java then this is fairly easy to do by creating another module
> that shares the package structure of mastif.  The parallel code can be put
> into ctakes.
> - Another way to deal with it is to distribute your custom code and jar
> with ctakes.  Mastif appears to use an Apache 2.0 license, so I think that
> this can be done.  If your changes are extensive or make the parallel
> option inconvenient then this may be the way to go.  "Developed by the
> Apache Software Foundation <https://www.apache.org/> and introduced in
> 2004, the Apache 2.0 License is a permissive free software license.
> The license permits use of the software for any purpose, users are able to
> distribute it, to modify it, and to distribute modified versions of the
> software."  -
> https://pitt.libguides.com/openlicensing/apache2#:~:text=Apache%20License,modified%20versions%20of%20the%20software
> .
> - For an inclusion of a modified mastif in ctakes, maybe just put the
> whole thing into a project named "ctakes-mastif".
>
> > no documentation or references
> - A common problem with ctakes code, tests, resources ...  unfortunate and
> difficult to deal with.  I am guilty of some of this paucity of information.
>
> > dependent on a BertREST server
> - In instances such as this I would say that somebody checked in
> unfinished code, or that somebody forgot to check in a resource.  However,
> this particular code probably came from a developer working on an external
> project and checked in code that is intended to be used by that project.
> - For any new developers out there: It is a 'best practice' to create your
> own project and include ctakes as a dependency.  Keep your project code
> only in your project repository.  If you want to make changes to ctakes in
> parallel, you can also create a module in your ctakes source root and put
> your non-ctakes code only in that module.  Don't check in that module!
> - All that said, everybody forgets/makes mistakes/hurries ...
>
>
> Sean
>
> 
> From: Peter Abramowitsch 
> Sent: Tuesday, December 5, 2023 12:38 PM
> To: dev@ctakes.apache.org 
> Subject: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL]
>
> * External Email - Caution *
>
>
> The question is:  what is our policy if a resource in the ctakes archive
> depends upon another resource that is not in the archive and may not be
> available elsewhere.  I'm sure there are other examples, but here are
> two:
>
> 1.   I've done some enhancements to the ZoneAnnotator for note section
> detection, but these depend upon a newer version of Mastif than Ctakes is
> packaged with, and additional modifications that I've made.   If I do add
> the updates to the Zone Annotator, where should I put the customized Mastif
> library - does it belong in cTakes?
>
> 2.  I found a couple of interesting annotators in the archive that are
> dependent on a BertREST server, but there's no documentation or references
> as to what code base that server comes from or whether its BERT model is
> even publicly available.
>
> DocTimeRelBertRestAnnotator
> TemporalBertRestAnnotator
> PolarityBertRestAnnotator
>
> Here's my feeling:  Ctakes sources should be packaged to either be
> self-sufficient or based on publicly avail

Examining Ctakes 5.0 - two sides of the same question

2023-12-05 Thread Peter Abramowitsch
The question is:  what is our policy if a resource in the ctakes archive
depends upon another resource that is not in the archive and may not be
available elsewhere.  I'm sure there are other examples, but here are
two:

1.   I've done some enhancements to the ZoneAnnotator for note section
detection, but these depend upon a newer version of Mastif than Ctakes is
packaged with, and additional modifications that I've made.   If I do add
the updates to the Zone Annotator, where should I put the customized Mastif
library - does it belong in cTakes?

2.  I found a couple of interesting annotators in the archive that are
dependent on a BertREST server, but there's no documentation or references
as to what code base that server comes from or whether its BERT model is
even publicly available.

DocTimeRelBertRestAnnotator
TemporalBertRestAnnotator
PolarityBertRestAnnotator

Here's my feeling:  Ctakes sources should be packaged to be either
self-sufficient or based on publicly available dependencies at the time of
check-in.  If we really want to keep dangling sources, there should be a
separate folder for them rather than mixing them in with the living
product.  But, for now, I would be even happier if whoever checked in the
BertRest-based annotators could provide links and documentation for the
dependencies.

Your thoughts?
Peter


Re: Compilation Errors and the context.tokenizer [EXTERNAL]

2023-11-28 Thread Peter Abramowitsch
Hi Gandhi

I checked in the various changes that helped me get it to compile
properly.  I checked my maven settings and there are no repositories
mentioned, i.e. it is running completely on its defaults.  Have you tried
removing the entry you mentioned in your email rather than trying to add to
or change your existing ones?
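Removing a mirror entry can matter more than adding repositories because Maven routes a repository's requests through a mirror whose `<mirrorOf>` pattern matches that repository. The error quoted later in this thread ("Cannot access nexus (https://repo1.maven.org/maven2)") suggests a mirror named nexus that matches everything, which would also capture the apache.snapshots repository. The function below is a simplified, hypothetical sketch of the documented pattern forms (`*`, an exact id, `external:*`, `external:http:*`, `!id` exclusions, comma-separated lists); it is not Maven's implementation, and its notion of "external" (not localhost, not a file: URL) is an approximation.

```python
from urllib.parse import urlparse


def _is_external(url: str) -> bool:
    # Approximation: treat file: URLs and localhost/127.0.0.1 as non-external.
    parsed = urlparse(url)
    return parsed.scheme != "file" and parsed.hostname not in ("localhost", "127.0.0.1")


def mirror_matches(mirror_of: str, repo_id: str, repo_url: str) -> bool:
    """Simplified sketch of how a <mirrorOf> value is matched against a repository."""
    if mirror_of == "*" or mirror_of == repo_id:
        return True
    matched = False
    for pattern in (p.strip() for p in mirror_of.split(",")):
        if len(pattern) > 1 and pattern.startswith("!"):
            if pattern[1:] == repo_id:
                return False  # explicit exclusion ends the scan
        elif pattern == repo_id:
            return True  # explicit inclusion ends the scan
        elif pattern == "*":
            matched = True  # keep scanning: a later "!id" may still exclude it
        elif pattern == "external:*" and _is_external(repo_url):
            matched = True
        elif (pattern == "external:http:*" and _is_external(repo_url)
              and urlparse(repo_url).scheme == "http"):
            matched = True
    return matched


SNAPSHOTS = ("apache.snapshots", "https://repository.apache.org/content/groups/snapshots/")
print(mirror_matches("*", *SNAPSHOTS))                     # True: a catch-all mirror captures it
print(mirror_matches("*,!apache.snapshots", *SNAPSHOTS))  # False: excluded by id
print(mirror_matches("external:http:*", *SNAPSHOTS))      # False: it is https
```

With a catch-all mirror, a value such as `<mirrorOf>*,!apache.snapshots</mirrorOf>` is one way to let that repository bypass the mirror; whether that fits a given organization's network policy is a local decision.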

Peter

On Mon, Nov 27, 2023 at 5:45 PM gandhi rajan 
wrote:

> Hi Sean/Peter,
>
> Thanks for the insights. As Sean mentioned, I do have the root pom.xml
> changes in my codebase as well. But in my settings.xml file, I have
> specified the maven repo as "https://repo1.maven.org/maven2", and the
> maven build is trying to fetch
> "org.apache.ctakes:ctakes-core-models:jar:5.0.0-SNAPSHOT" from that repo,
> where it's obviously not available. The repository settings changes in
> pom.xml don't seem to be working in my case.
>
> But "https://repository.apache.org/content/groups/snapshots/" is the only
> repo where I could find the missing jars the build is looking for. I can
> probably work around this problem by referencing this link in settings.xml,
> but my question is whether that is a feasible approach.
>
> On Mon, 27 Nov 2023 at 21:24, Peter Abramowitsch 
> wrote:
>
> > Thanks Sean.  Could be that slf4j is available from two sites, one of them
> > still on http and for whatever reason it is coming up first.
> > I'm not sure though whether all this is the same as the problem Gandhi has
> > been having.  Let's see what he says.
> >
> > - Peter
> >
> > On Mon, Nov 27, 2023 at 4:43 PM Finan, Sean
> >  wrote:
> >
> > > As for slf4j being on http:, I don't know that I ever saw that.  If you
> > > check maven central it is actually https:
> > >
> > > https://repo1.maven.org/maven2/org/slf4j/
> > >
> > > As referred to here:
> > > https://mvnrepository.com/artifact/org.slf4j/slf4j-api/2.0.5
> > >
> > > I will do some more research on this tonight, though I welcome people to
> > > beat me to a solution!
> > >
> > > Sean
> > >
> > > 
> > > From: Peter Abramowitsch 
> > > Sent: Monday, November 27, 2023 7:01 AM
> > > To: dev@ctakes.apache.org 
> > > Subject: Re: Compilation Errors and the context.tokenizer [EXTERNAL]
> > >
> > > * External Email - Caution *
> > >
> > >
> > > Gandhi,
> > >
> > > As I mentioned at the beginning of this thread, there's only one change I
> > > needed to make to the settings.xml:  to comment out the dummy mirror server
> > > that maven uses as a way of enforcing the https requirement.  Commenting it
> > > out allowed maven to fetch slf4j-api.
> > >
> > > Since slf4j  is integral to the build, it halted everything.   The error
> > > manifested itself as maven just hanging with a message saying it was trying
> > > to access http://0.0.0.0   I made no other changes and there's nothing
> > > about the settings.xml which is ctakes-specific.
> > >
> > > Comment out this part of settings:
> > >
> > > <mirror>
> > >   <id>maven-default-http-blocker</id>
> > >   <mirrorOf>external:http:*</mirrorOf>
> > >   <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
> > >   <url>http://0.0.0.0/</url>
> > >   <blocked>true</blocked>
> > > </mirror>
> > >
> > > Have you already tried doing it?
> > >
> > > For the model.jars it seems they are just brought to your local maven repo
> > > from maven.apache.org.
> > > Then a goal called unpack-dependencies copies the relevant files into your
> > > build tree.
> > >
> > > Generally, if you look very carefully at the maven output, it will tell you
> > > what the original cause of your error is.
> > > Don't just go by the last few output lines.
> > >
> > > If this doesn't help, I'm afraid you'll need to ask someone else who may
> > > be able to ask you better questions about your environment.
> > >
> > > Peter
> > >
> > > On Mon, Nov 27, 2023 at 11:07 AM gandhi rajan
> > > wrote:

Re: Compilation Errors and the context.tokenizer [EXTERNAL]

2023-11-27 Thread Peter Abramowitsch
Thanks Sean.  Could be that slf4j is available from two sites, one of them
still on http and for whatever reason it is coming up first.
I'm not sure though whether all this is the same as the problem Gandhi has
been having.  Let's see what he says.

- Peter

On Mon, Nov 27, 2023 at 4:43 PM Finan, Sean
 wrote:

> As for slf4j being on http:, I don't know that I ever saw that.  If you
> check maven central it is actually https:
>
> https://repo1.maven.org/maven2/org/slf4j/
>
> As referred to here:
> https://mvnrepository.com/artifact/org.slf4j/slf4j-api/2.0.5
>
> I will do some more research on this tonight, though I welcome people to
> beat me to a solution!
>
> Sean
>
> ____
> From: Peter Abramowitsch 
> Sent: Monday, November 27, 2023 7:01 AM
> To: dev@ctakes.apache.org 
> Subject: Re: Compilation Errors and the context.tokenizer [EXTERNAL]
>
> * External Email - Caution *
>
>
> Gandhi,
>
> As I mentioned at the beginning of this thread, there's only one change I
> needed to make to the settings.xml:  to comment out the dummy mirror server
> that maven uses as a way of enforcing the https requirement.  Commenting it
> out allowed maven to fetch slf4j-api.
>
> Since slf4j  is integral to the build, it halted everything.   The error
> manifested itself as maven just hanging with a message saying it was trying
> to access http://0.0.0.0   I made no other changes and there's nothing
> about the settings.xml which is ctakes-specific.
>
> Comment out this part of settings:
>
> <mirror>
>   <id>maven-default-http-blocker</id>
>   <mirrorOf>external:http:*</mirrorOf>
>   <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
>   <url>http://0.0.0.0/</url>
>   <blocked>true</blocked>
> </mirror>
>
> Have you already tried doing it?
>
> For the model.jars it seems they are just brought to your local maven repo
> from maven.apache.org.
> Then a goal called unpack-dependencies copies the relevant files into your
> build tree.
>
> Generally, if you look very carefully at the maven output, it will tell you
> what the original cause of your error is.
> Don't just go by the last few output lines.
>
> If this doesn't help,  I'm afraid you'll need to ask someone else who may
> be able to ask you better questions about your environment.
>
> Peter
>
> On Mon, Nov 27, 2023 at 11:07 AM gandhi rajan 
> wrote:
>
> > Hi Peter,
> >
> > I found your old discussion with Sean at the following link:
> > https://lists.apache.org/thread/w9c33421vxb21bnr6gd9r2tb3n1odnnw
> >
> > I am facing the same issue while building the ctakes core module. Could you
> > please send me the URL details from your 'settings.xml' under the
> > '/usr/local/maven/config' folder, so I can figure out which repo you are
> > pulling the dependencies from during the build?
> >
> > On Mon, 27 Nov 2023 at 13:22, Peter Abramowitsch <pabramowit...@gmail.com>
> > wrote:
> >
> > > Hi Gandhi
> > > I did some checking around and sure enough, the resource files are not in
> > > the git archive.  I remember a conversation about this from long ago, that
> > > git wasn't the best place for large binaries.  And they're not in git.  So
> > > I looked in my maven repository to see where my model files had come from.
> > > Those models and in fact all the resource data for ctakes come from these
> > > sources:
> > >
> > > here's the content of my
> > > .m2/repository/org/apache/ctakes/ctakes-core-models/5.0.0-SNAPSHOT
> > >
> > > _remote.repositories
> > > ctakes-core-models-5.0.0-20221224.062752-3.jar
> > > ctakes-core-models-5.0.0-20221224.062752-3.jar.sha1
> > > ctakes-core-models-5.0.0-20221224.062752-3.pom
> > > ctakes-core-models-5.0.0-20221224.062752-3.pom.sha1
> > > ctakes-core-models-5.0.0-SNAPSHOT-javadoc.jar.lastUpdated
> > > ctakes-core-models-5.0.0-SNAPSHOT-sources.jar.lastUpdated
> > > ctakes-core-models-5.0.0-SNAPSHOT.jar
> > > ctakes-core-models-5.0.0-SNAPSHOT.pom
> > > m2e-lastUpdated.properties
> > > maven-metadata-apache.snaps

Re: Compilation Errors and the context.tokenizer

2023-11-27 Thread Peter Abramowitsch
Gandhi,

As I mentioned at the beginning of this thread, there's only one change I
needed to make to the settings.xml:  to comment out the dummy mirror server
that maven uses as a way of enforcing the https requirement.  Commenting it
out allowed maven to fetch slf4j-api.

Since slf4j  is integral to the build, it halted everything.   The error
manifested itself as maven just hanging with a message saying it was trying
to access http://0.0.0.0   I made no other changes and there's nothing
about the settings.xml which is ctakes-specific.

Comment out this part of settings:

<mirror>
  <id>maven-default-http-blocker</id>
  <mirrorOf>external:http:*</mirrorOf>
  <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
  <url>http://0.0.0.0/</url>
  <blocked>true</blocked>
</mirror>
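To confirm which mirrors are still active after editing settings.xml, a short script can parse the file and list them. Python's ElementTree parser discards XML comments by default, so a commented-out `<mirror>` simply does not appear. This is a hypothetical helper (`active_mirrors` is not part of Maven); the blocker normally sits in the global conf/settings.xml under the Maven installation, and `mvn help:effective-settings` shows the merged settings Maven actually uses.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def _local(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree adds to namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def active_mirrors(root: ET.Element) -> Dict[str, Dict[str, str]]:
    """Map mirror id -> {child tag: text} for every <mirror> left in the parsed tree."""
    mirrors = {}
    for elem in root.iter():
        if _local(elem.tag) != "mirror":
            continue
        fields = {_local(child.tag): (child.text or "").strip() for child in elem}
        mirrors[fields.get("id", "")] = fields
    return mirrors


BLOCKER = """
    <mirror>
      <id>maven-default-http-blocker</id>
      <mirrorOf>external:http:*</mirrorOf>
      <url>http://0.0.0.0/</url>
      <blocked>true</blocked>
    </mirror>"""

NS = 'xmlns="http://maven.apache.org/SETTINGS/1.2.0"'
active = f"<settings {NS}><mirrors>{BLOCKER}</mirrors></settings>"
commented = f"<settings {NS}><mirrors><!--{BLOCKER}\n--></mirrors></settings>"

print(active_mirrors(ET.fromstring(active))["maven-default-http-blocker"]["blocked"])  # true
print("maven-default-http-blocker" in active_mirrors(ET.fromstring(commented)))      # False
```

For a real file, `active_mirrors(ET.parse(path).getroot())` reads it directly and lets the parser handle the XML declaration's encoding.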

Have you already tried doing it?

For the model.jars it seems they are just brought to your local maven repo
from maven.apache.org.
Then a goal called unpack-dependencies copies the relevant files into your
build tree.
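Since a model jar is an ordinary zip archive, you can look inside the copy in your local repository to see which resource files an unpack step would have to work with. The helper below is a hypothetical sketch; the default `prefix` of `org/apache/ctakes/` is an assumption about where cTAKES resources sit inside the jar, and the commented path assumes the default ~/.m2 layout shown in the repository listing elsewhere in this thread.

```python
import io
import zipfile
from pathlib import Path
from typing import BinaryIO, List, Union


def list_jar_resources(jar: Union[str, Path, BinaryIO],
                       prefix: str = "org/apache/ctakes/") -> List[str]:
    """Sorted file entries under `prefix` in a jar (jars are ordinary zip archives)."""
    with zipfile.ZipFile(jar) as archive:
        return sorted(name for name in archive.namelist()
                      if name.startswith(prefix) and not name.endswith("/"))


# Self-contained demo with an in-memory jar; the entry names are made up.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    archive.writestr("org/apache/ctakes/demo/model.bin", b"\x00")
print(list_jar_resources(demo))  # ['org/apache/ctakes/demo/model.bin']

# Against a real download (path assumes the default ~/.m2 layout):
# jar = (Path.home() / ".m2/repository/org/apache/ctakes/ctakes-core-models/5.0.0-SNAPSHOT"
#        / "ctakes-core-models-5.0.0-SNAPSHOT.jar")
# print("\n".join(list_jar_resources(jar)))
```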

Generally, if you look very carefully at the maven output, it will tell you
what the original cause of your error is.
Don't just go by the last few output lines.
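Reading past the last lines of output can be partly automated. The hypothetical helpers below return the first `[ERROR]` line and any coordinates Maven marked `(absent)`; the regular expression is written against the wording of the errors quoted in this thread, and other Maven versions may phrase things differently, so treat this as a convenience rather than a reliable log parser.

```python
import re
from typing import List, Optional

# groupId:artifactId:packaging:version immediately followed by " (absent)",
# matching the wording of the resolution errors quoted in this thread.
ABSENT = re.compile(r"([\w.-]+:[\w.-]+:[\w.-]+:[\w.-]+) \(absent\)")


def first_error(log: str) -> Optional[str]:
    """First line starting with [ERROR]; later errors are often just consequences."""
    for line in log.splitlines():
        if line.startswith("[ERROR]"):
            return line
    return None


def absent_artifacts(log: str) -> List[str]:
    """Unique coordinates reported as '(absent)', in order of first appearance."""
    found: List[str] = []
    for coordinate in ABSENT.findall(log):
        if coordinate not in found:
            found.append(coordinate)
    return found


# The error line quoted in this thread, joined onto one line.
sample = (
    "[ERROR] Failed to execute goal on project ctakes-core: Could not resolve dependencies "
    "for project org.apache.ctakes:ctakes-core:jar:5.0.0-SNAPSHOT: The following artifacts "
    "could not be resolved: org.apache.ctakes:ctakes-core-models:jar:5.0.0-SNAPSHOT (absent): "
    "Cannot access nexus (https://repo1.maven.org/maven2) in offline mode\n"
)
print(absent_artifacts(sample))  # ['org.apache.ctakes:ctakes-core-models:jar:5.0.0-SNAPSHOT']
```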

If this doesn't help,  I'm afraid you'll need to ask someone else who may
be able to ask you better questions about your environment.

Peter

On Mon, Nov 27, 2023 at 11:07 AM gandhi rajan 
wrote:

> Hi Peter,
>
> I found your old discussion with Sean at the following link:
> https://lists.apache.org/thread/w9c33421vxb21bnr6gd9r2tb3n1odnnw
>
> I am facing the same issue while building the ctakes core module. Could you
> please send me the URL details from your 'settings.xml' under the
> '/usr/local/maven/config' folder, so I can figure out which repo you are
> pulling the dependencies from during the build?
>
> On Mon, 27 Nov 2023 at 13:22, Peter Abramowitsch 
> wrote:
>
> > Hi Gandhi
> > I did some checking around and sure enough, the resource files are not in
> > the git archive.  I remember a conversation about this from long ago, that
> > git wasn't the best place for large binaries.  And they're not in git.  So
> > I looked in my maven repository to see where my model files had come from.
> > Those models and in fact all the resource data for ctakes come from these
> > sources:
> >
> > here's the content of my
> > .m2/repository/org/apache/ctakes/ctakes-core-models/5.0.0-SNAPSHOT
> >
> > _remote.repositories
> > ctakes-core-models-5.0.0-20221224.062752-3.jar
> > ctakes-core-models-5.0.0-20221224.062752-3.jar.sha1
> > ctakes-core-models-5.0.0-20221224.062752-3.pom
> > ctakes-core-models-5.0.0-20221224.062752-3.pom.sha1
> > ctakes-core-models-5.0.0-SNAPSHOT-javadoc.jar.lastUpdated
> > ctakes-core-models-5.0.0-SNAPSHOT-sources.jar.lastUpdated
> > ctakes-core-models-5.0.0-SNAPSHOT.jar
> > ctakes-core-models-5.0.0-SNAPSHOT.pom
> > m2e-lastUpdated.properties
> > maven-metadata-apache.snapshots.xml
> > maven-metadata-apache.snapshots.xml.sha1
> > resolver-status.properties
> >
> > and here's the content of m2e-lastUpdated.properties
> >
> > #Fri Nov 17 22:03:31 CET 2023
> > apache.snapshots|https\://repository.apache.org/content/groups/snapshots/|javadoc=1700255011056
> > apache.snapshots|https\://repository.apache.org/content/groups/snapshots/|sources=1700255007489
> > central|https\://repo.maven.apache.org/maven2|javadoc=1700255011056
> > central|https\://repo.maven.apache.org/maven2|sources=1700255007489
> >
> > I'm not a maven expert, but perhaps where you're located (inside an
> > organization?) you cannot access these addresses, or they need to be put in
> > a whitelist somewhere?  Sean might know.
> >
> > Sorry about the confusion.  Maven makes things both transparent and opaque
> > at the same time.  Most disturbin
> > Peter
> >
> >
> >
> > On Mon, Nov 27, 2023 at 7:50 AM gandhi rajan 
> > wrote:
> >
> > > Hi Peter,
> > >
> > > I have pulled the code from main branch and the local copy is clean

Re: Compilation Errors and the context.tokenizer

2023-11-26 Thread Peter Abramowitsch
Hi Gandhi
I did some checking around and sure enough, the resource files are not in
the git archive.  I remember a conversation about this from long ago, that
git wasn't the best place for large binaries.  And they're not in git.  So
I looked in my maven repository to see where my model files had come from.
Those models and in fact all the resource data for ctakes come from these
sources:

here's the content of my
.m2/repository/org/apache/ctakes/ctakes-core-models/5.0.0-SNAPSHOT

_remote.repositories
ctakes-core-models-5.0.0-20221224.062752-3.jar
ctakes-core-models-5.0.0-20221224.062752-3.jar.sha1
ctakes-core-models-5.0.0-20221224.062752-3.pom
ctakes-core-models-5.0.0-20221224.062752-3.pom.sha1
ctakes-core-models-5.0.0-SNAPSHOT-javadoc.jar.lastUpdated
ctakes-core-models-5.0.0-SNAPSHOT-sources.jar.lastUpdated
ctakes-core-models-5.0.0-SNAPSHOT.jar
ctakes-core-models-5.0.0-SNAPSHOT.pom
m2e-lastUpdated.properties
maven-metadata-apache.snapshots.xml
maven-metadata-apache.snapshots.xml.sha1
resolver-status.properties
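To look at the same directory on your own machine, the path follows Maven's default local-repository layout: the groupId's dots become directories, followed by the artifactId and the version. The helpers below are a hypothetical sketch that assumes the default `~/.m2/repository` location (a `<localRepository>` entry in settings.xml would move it).

```python
from pathlib import Path
from typing import List


def local_artifact_dir(group_id: str, artifact_id: str, version: str,
                       repo: Path = Path.home() / ".m2" / "repository") -> Path:
    """Directory holding one artifact version in the default local-repository layout."""
    return repo.joinpath(*group_id.split("."), artifact_id, version)


def list_artifact_files(directory: Path) -> List[str]:
    """Sorted file names in `directory`, or [] if it does not exist."""
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if p.is_file())


d = local_artifact_dir("org.apache.ctakes", "ctakes-core-models", "5.0.0-SNAPSHOT")
files = list_artifact_files(d)
print(d)
# If no .jar is listed, the artifact has not been downloaded into this repository.
print([f for f in files if f.endswith(".jar")] or "no jar present")
```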

and here's the content of m2e-lastUpdated.properties

#Fri Nov 17 22:03:31 CET 2023
apache.snapshots|https\://repository.apache.org/content/groups/snapshots/|javadoc=1700255011056
apache.snapshots|https\://repository.apache.org/content/groups/snapshots/|sources=1700255007489
central|https\://repo.maven.apache.org/maven2|javadoc=1700255011056
central|https\://repo.maven.apache.org/maven2|sources=1700255007489
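The repository ids and URLs in a file like this can be extracted with a few lines of code, which makes it easier to compare what your own machine tried against these addresses. This hypothetical helper assumes the `id|url|classifier=timestamp` key layout visible in the sample, with `\:` as the properties-file escape for `:`. Note that m2e-lastUpdated.properties is written by Eclipse's m2e, so a plain command-line Maven setup may not have one.

```python
from typing import List, Tuple


def repos_consulted(properties_text: str) -> List[Tuple[str, str]]:
    """Unique (repository id, URL) pairs from m2e-lastUpdated.properties-style lines."""
    pairs: List[Tuple[str, str]] = []
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and the timestamp comment
        key = line.split("=", 1)[0]  # drop "=<timestamp>"
        fields = key.split("|")      # id | url | classifier
        if len(fields) < 2:
            continue
        pair = (fields[0], fields[1].replace("\\:", ":"))  # undo the "\:" escape
        if pair not in pairs:
            pairs.append(pair)
    return pairs


sample = r"""#Fri Nov 17 22:03:31 CET 2023
apache.snapshots|https\://repository.apache.org/content/groups/snapshots/|javadoc=1700255011056
apache.snapshots|https\://repository.apache.org/content/groups/snapshots/|sources=1700255007489
central|https\://repo.maven.apache.org/maven2|javadoc=1700255011056
central|https\://repo.maven.apache.org/maven2|sources=1700255007489
"""
for repo_id, url in repos_consulted(sample):
    print(repo_id, url)
```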

I'm not a maven expert, but perhaps where you're located (inside an
organization?) you cannot access these addresses, or they need to be put in
a whitelist somewhere?  Sean might know.
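One way to test whether these addresses are blocked from inside an organization is to request a file Maven itself would fetch. For a SNAPSHOT, that file is the version's maven-metadata.xml in the standard remote layout. The sketch below is hypothetical; note that Python's urllib honors proxy environment variables such as https_proxy, whereas Maven uses the `<proxies>` section of settings.xml, so the two can behave differently.

```python
import urllib.request


def snapshot_metadata_url(repo_url: str, group_id: str, artifact_id: str, version: str) -> str:
    """URL of a SNAPSHOT version's maven-metadata.xml in the standard repository layout."""
    return "/".join([repo_url.rstrip("/"), *group_id.split("."), artifact_id, version,
                     "maven-metadata.xml"])


def can_fetch(url: str, timeout: float = 10.0) -> bool:
    """True if a GET returns 2xx; False on HTTP errors, DNS/proxy failures, timeouts or bad URLs."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):  # URLError/HTTPError and socket timeouts are OSErrors
        return False


url = snapshot_metadata_url("https://repository.apache.org/content/groups/snapshots/",
                            "org.apache.ctakes", "ctakes-core-models", "5.0.0-SNAPSHOT")
print(url)
# print(can_fetch(url))  # performs a real network request
```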

Sorry about the confusion.  Maven makes things both transparent and opaque
at the same time.  Most disturbin
Peter



On Mon, Nov 27, 2023 at 7:50 AM gandhi rajan 
wrote:

> Hi Peter,
>
> I have pulled the code from the main branch, and the local copy is clean, without
> any changes to it. My issue still remains the same. The build is trying to
> pull the org.apache.ctakes:ctakes-core:jar:5.0.0-SNAPSHOT jar from the
> remote maven repo, which it is not able to find.
>
> Complete error trace as follows:
>
> [ERROR] Failed to execute goal on project ctakes-core: Could not resolve
> dependencies for project org.apache.ctakes:ctakes-core:jar:5.0.0-SNAPSHOT:
> The following artifacts could not be resolved:
> org.apache.ctakes:ctakes-core-models:jar:5.0.0-SNAPSHOT (absent): Cannot
> access nexus (https://repo1.maven.org/maven2) in offline mode and the
> artifact org.apache.ctakes:ctakes-core-models:jar:5.0.0-SNAPSHOT has not
> been downloaded from it before.
>
>
> On Sun, 26 Nov 2023 at 23:25, Peter Abramowitsch 
> wrote:
>
> > Gandhi,  I'm going out for a bit.   Not sure how familiar you are with git,
> > so forgive me if this is obvious:
> >
> > $ git status
> > On branch main
> > nothing to commit, working tree clean
> >
> > If you've started making changes, stash them in a branch and make sure your
> > tree is clean to simplify what you are finding.
> >
> > Peter
> >
> > On Sun, Nov 26, 2023 at 6:47 PM Peter Abramowitsch <pabramowit...@gmail.com>
> > wrote:
> >
> > > Nope,  that is not the issue. Those version changes are intentional and
> > > necessary.   The core pom and the subsidiary poms have been changed to
> > > 5.0.0-SNAPSHOT as well because this is a new release. For me it
> > > compiles and runs fine. (I am waiting for permission to check in my changes
> > > for the context token path issue to help with eclipse building.)  But if
> > > you're just using mvn to build you should be fine.
> > >
> > > The problem you're having seems to be that maven isn't able to fetch the
> > > dependencies you need to build it.
> > >
> > > From your top level, if you do a *git status* and see that there are no
> > > modifications or missing files,
> > >
> > > then try: *mvn --offline package*  (to see if the relevant
> > > dependencies are already present)
> > >
> > > and send me the output.
> > >
> > >
> > >
> > > On Sun, Nov 26, 2023 at 6:30 PM gandhi rajan 
> > > wrote:
> > >
> > >> hi Peter,
> > >>
> > >> Thanks for the inputs. But I could see that the ctakes-core module's pom.xml
> > >> has changed from 4.0.1 snapshot to 5.0.0 snapshot and I guess this is
> > >> causing the issue in my case. Please find the pom.xml changes for
> > >> your reference:
> > >>
> > >> *ctakes-core 4.0.1:*
> > >>
> > >> <dependency>
> > >> <groupId>org.apache.ctakes</groupId>
> > >> <artifactId>ctakes-core-res</artifactId>
> > >> </dependency>

Re: Compilation Errors and the context.tokenizer

2023-11-26 Thread Peter Abramowitsch
Gandhi,  I'm going out for a bit.   Not sure how familiar you are with git,
so forgive me if this is obvious:



$ git status
On branch main
nothing to commit, working tree clean

If you've started making changes, stash them in a branch and make sure your
tree is clean to simplify what you are finding.

Peter

On Sun, Nov 26, 2023 at 6:47 PM Peter Abramowitsch 
wrote:

> Nope,  that is not the issue. Those version changes are intentional and
> necessary.   The core pom and the subsidiary poms have been changed to
> 5.0.0-SNAPSHOT as well because this is a new release. For me it
> compiles and runs fine. (I am waiting for permission to check in my changes
> for the context token path issue to help with eclipse building.)  But if
> you're just using mvn to build you should be fine.
>
> The problem you're having seems to be that maven isn't able to fetch the
> dependencies you need to build it.
>
> From your top level, if you do a *git status* and see that there are no
> modifications or missing files,
>
> then try: *mvn --offline package*  (to see if the relevant
> dependencies are already present)
>
> and send me the output.
>
>
>
> On Sun, Nov 26, 2023 at 6:30 PM gandhi rajan 
> wrote:
>
>> hi Peter,
>>
>> Thanks for the inputs. But I could see that the ctakes-core module's pom.xml
>> has changed from 4.0.1 snapshot to 5.0.0 snapshot and I guess this is
>> causing the issue in my case. Please find the pom.xml changes for
>> your reference:
>>
>> *ctakes-core 4.0.1:*
>>
>> <dependency>
>> <groupId>org.apache.ctakes</groupId>
>> <artifactId>ctakes-core-res</artifactId>
>> </dependency>
>>
>> *ctakes-core 5.0.0:*
>>
>> <dependency>
>> <groupId>org.apache.ctakes</groupId>
>> <artifactId>ctakes-core-models</artifactId>
>> <version>${ctakes.models.version}</version>
>> </dependency>
>>
>> Do you know about the changes by any chance, Peter? Or can only Sean shed
>> some light on these changes?
>>
>> On Sun, 26 Nov 2023 at 22:54, Peter Abramowitsch
>> wrote:
>>
>> > Hi Gandhi
>> >
>> > See how far you get when you build using --offline
>> >
>> > Peter
>> >
>> > On Sun, Nov 26, 2023 at 6:00 PM gandhi rajan 
>> > wrote:
>> >
>> > > Hi Peter,
>> > >
>> > > I am not using any IDE to build the project. I just used git bash to pull
>> > > the project from https://github.com/apache/ctakes to a local folder and
>> > > tried to build the project from the root pom.xml.
>> > >
>> > > The issue reported by the build seems strange.
>> > >
>> > > [ERROR] Failed to execute goal on project ctakes-core: Could not resolve
>> > > dependencies for project org.apache.ctakes:ctakes-core:jar:5.0.0-SNAPSHOT:
>> > > The following artifacts could not be resolved:
>> > > org.apache.ctakes:ctakes-core-models:jar:5.0.0-SNAPSHOT (absent):
>> > > org.apache.ctakes:ctakes-core-models:jar:5.0.0-SNAPSHOT was not found in
>> > > https://repo1.maven.org/maven2 during a previous attempt. This failure was
>> > > cached in the local repository and resolution is not reattempted until the
>> > > update interval of nexus has elapsed or updates are forced -> [Help 1]
>> > > org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute
>> > > goal on project ctakes-core: Could not resolve dependencies for project
>> > > org.apache.ctakes:ctakes-core:jar:5.0.0-SNAPSHOT: The following artifacts
>> > > could not be resolved:
>> > > org.apache.ctakes:ctakes-core-models:jar:5.0.0-SNAPSHOT (absent):
>> > > org.apache.ctakes:ctakes-core-models:jar:5.0.0-SNAPSHOT was not found in
>> > > https://repo1.maven.org/maven2 during a previous attempt. This failure was
>> > > cached in the local repository and resolution is not reattempted until the
>> > > update interval of nexus has elapsed or updates are forced.
>> > >
>> > > It looks like the maven build is expecting "ctakes:ctakes-core-models" in the
>> > > maven repository and is trying to pull it from the remote repo.
>> > >
>> > >
>> > >
>> > > On Sun, 26 Nov 2023 at 21:19, Peter Abramowitsch <pabramowit...@gmail.com>
>> > > wrote:
>> > >
>> > > > Just a curiosity - Gandhi, are you using Eclipse+Maven?  If not, do you
>> > > > have another IDE wrapped around Maven?
>> > > >

Re: Compilation Errors and the context.tokenizer

2023-11-26 Thread Peter Abramowitsch
Nope,  that is not the issue. Those version changes are intentional and
necessary.   The core pom and the subsidiary poms have been changed to
5.0.0-SNAPSHOT as well because this is a new release. For me it
compiles and runs fine. (I am waiting for permission to check in my changes
for the context token path issue to help with eclipse building.)  But if
you're just using mvn to build you should be fine.
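To see which concrete version a placeholder such as `${ctakes.models.version}` stands for, you can read it from the pom's `<properties>` section. The sketch below is hypothetical and deliberately shallow: it only looks at properties declared in the same pom (not ones inherited from a parent) and ignores dependencyManagement inheritance; `mvn help:effective-pom` shows the fully resolved model. The sample property value 5.0.0-SNAPSHOT is illustrative, chosen to match the version in the errors quoted in this thread.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Tuple

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def _local(tag: str) -> str:
    return tag.rsplit("}", 1)[-1]  # strip the POM namespace, if any


def declared_dependencies(pom_xml: str) -> List[Tuple[str, str, str]]:
    """(groupId, artifactId, version) per <dependency>, filling ${...} from this pom's <properties>."""
    root = ET.fromstring(pom_xml)
    props = {}
    for child in root:
        if _local(child.tag) == "properties":
            props.update({_local(p.tag): (p.text or "").strip() for p in child})
    deps = []
    for dep in root.iter():
        if _local(dep.tag) != "dependency":
            continue
        fields = {_local(c.tag): (c.text or "").strip() for c in dep}
        # Unknown placeholders are left as-is rather than guessed.
        version = PLACEHOLDER.sub(lambda m: props.get(m.group(1), m.group(0)),
                                  fields.get("version", ""))
        deps.append((fields.get("groupId", ""), fields.get("artifactId", ""), version))
    return deps


pom = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <properties><ctakes.models.version>5.0.0-SNAPSHOT</ctakes.models.version></properties>
  <dependencies>
    <dependency>
      <groupId>org.apache.ctakes</groupId>
      <artifactId>ctakes-core-models</artifactId>
      <version>${ctakes.models.version}</version>
    </dependency>
  </dependencies>
</project>"""
print(declared_dependencies(pom))
```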

The problem you're having seems to be that maven isn't able to fetch the
dependencies you need to build it.

From your top level, if you do a *git status* and see that there are no
modifications or missing files,

then try: *mvn --offline package*  (to see if the relevant dependencies
are already present)

and send me the output.



On Sun, Nov 26, 2023 at 6:30 PM gandhi rajan 
wrote:

> hi Peter,
>
> Thanks for the inputs. But I could see that the ctakes-core module's pom.xml
> has changed from 4.0.1 snapshot to 5.0.0 snapshot and I guess this is
> causing the issue in my case. Please find the pom.xml changes for
> your reference:
>
> *ctakes-core 4.0.1:*
>
> <dependency>
> <groupId>org.apache.ctakes</groupId>
> <artifactId>ctakes-core-res</artifactId>
> </dependency>
>
> *ctakes-core 5.0.0:*
>
> <dependency>
> <groupId>org.apache.ctakes</groupId>
> <artifactId>ctakes-core-models</artifactId>
> <version>${ctakes.models.version}</version>
> </dependency>
>
> Do you know about the changes by any chance, Peter? Or can only Sean shed
> some light on these changes?
>
> On Sun, 26 Nov 2023 at 22:54, Peter Abramowitsch 
> wrote:
>
> > Hi Gandhi
> >
> > See how far you get when you build using --offline
> >
> > Peter
> >
> > On Sun, Nov 26, 2023 at 6:00 PM gandhi rajan 
> > wrote:
> >
> > > Hi Peter,
> > >
> > > I am not using any IDE to build the project. I just used git bash to pull
> > > the project from https://github.com/apache/ctakes to a local folder and
> > > tried to build the project from the root pom.xml.
> > >
> > > The issue reported by the build seems strange.
> > >
> > > [ERROR] Failed to execute goal on project ctakes-core: Could not resolve
> > > dependencies for project org.apache.ctakes:ctakes-core:jar:5.0.0-SNAPSHOT:
> > > The following artifacts could not be resolved:
> > > org.apache.ctakes:ctakes-core-models:jar:5.0.0-SNAPSHOT (absent):
> > > org.apache.ctakes:ctakes-core-models:jar:5.0.0-SNAPSHOT was not found in
> > > https://repo1.maven.org/maven2 during a previous attempt. This failure was
> > > cached in the local repository and resolution is not reattempted until the
> > > update interval of nexus has elapsed or updates are forced -> [Help 1]
> > > org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute
> > > goal on project ctakes-core: Could not resolve dependencies for project
> > > org.apache.ctakes:ctakes-core:jar:5.0.0-SNAPSHOT: The following artifacts
> > > could not be resolved:
> > > org.apache.ctakes:ctakes-core-models:jar:5.0.0-SNAPSHOT (absent):
> > > org.apache.ctakes:ctakes-core-models:jar:5.0.0-SNAPSHOT was not found in
> > > https://repo1.maven.org/maven2 during a previous attempt. This failure was
> > > cached in the local repository and resolution is not reattempted until the
> > > update interval of nexus has elapsed or updates are forced.
> > >
> > > It looks like the maven build is expecting "ctakes:ctakes-core-models" in the
> > > maven repository and is trying to pull it from the remote repo.
> > >
> > >
> > >
> > > On Sun, 26 Nov 2023 at 21:19, Peter Abramowitsch <pabramowit...@gmail.com>
> > > wrote:
> > >
> > > > Just a curiosity - Gandhi, are you using Eclipse+Maven?  If not, do you
> > > > have another IDE wrapped around Maven?
> > > >
> > > >
> > > >
> > > > On Sun, Nov 26, 2023 at 4:43 PM Peter Abramowitsch <pabramowit...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Gandhi,
> > > > >
> > > > > That's one of the fundamental jars that gets built when you start from
> > > > > the top.  And if you encounter the error I found, "core" isn't going to
> > > > > be built, and therefore any succeeding component that depends on core
> > > > > will also fail.   Check your build log and see whether it mentions the
> > > > > "contexttokenizer".
> > > > >
> > > > > I will be checking in a fix.  A bunch 

Re: Compilation Errors and the context.tokenizer

2023-11-26 Thread Peter Abramowitsch
Hi Gandhi

See how far you get when you build using --offline.

Peter

On Sun, Nov 26, 2023 at 6:00 PM gandhi rajan 
wrote:

> Hi Peter,
>
> I am not using any IDE to build the project. I just used git bash to pull
> the project from https://github.com/apache/ctakes to local folder and
> tried
> build the project from the root pom.xml
>
> The issue reported by build seems to be something strange.
>
> [ERROR] Failed to execute goal on project ctakes-core: Could not resolve
> dependencies for project org.apache.ctakes:ctakes-core:jar:5.0.0-SNAPSHOT:
> The following artifacts could not be resolved:
> org.apache.ctakes:ctakes-core-models:jar:5.0.0-SNAPSHOT (absent):
> org.apache.ctakes:ctakes-core-models:jar:5.0.0-SNAPSHOT was not found in
> https://repo1.maven.org/maven2 during a previous attempt. This failure was
> cached in the local repository and resolution is not reattempted until the
> update interval of nexus has elapsed or updates are forced -> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute
> goal on project ctakes-core: Could not resolve dependencies for project
> org.apache.ctakes:ctakes-core:jar:5.0.0-SNAPSHOT: The following artifacts
> could not be resolved:
> org.apache.ctakes:ctakes-core-models:jar:5.0.0-SNAPSHOT (absent):
> org.apache.ctakes:ctakes-core-models:jar:5.0.0-SNAPSHOT was not found in
> https://repo1.maven.org/maven2 during a previous attempt. This failure was
> cached in the local repository and resolution is not reattempted until the
> update interval of nexus has elapsed or updates are forced.
>
> Looks like maven build is expecting "ctakes:ctakes-core-models" in the
> maven repository and trying to pull the same from remote repo.
>
>
>
> On Sun, 26 Nov 2023 at 21:19, Peter Abramowitsch 
> wrote:
>
> > Just a curiosity - Ghandi, are you using Eclipse+Maven?  If not, do you
> > have another IDE wrapped around Maven ?
> >
> >
> >
> > On Sun, Nov 26, 2023 at 4:43 PM Peter Abramowitsch <
> > pabramowit...@gmail.com>
> > wrote:
> >
> > > HI Ghandi,
> > >
> > > That's one of the fundamental jars that gets built when you start from
> > the
> > > top.  And if you encounter the error I found, "core" isn't going to be
> > > built and therefore any succeeding component also dependent on core
> will
> > > also fail.   Check your build log and see if it doesn't mention the
> > > "contexttokenizer"
> > >
> > > I will be checking in a fix.  A bunch of files, in the next day or so.
> > >
> > > Peter
> > >
> > > On Sun, Nov 26, 2023 at 3:46 PM gandhi rajan 
> > > wrote:
> > >
> > >> Hi Peter,
> > >>
> > >> I tried building the ctakes project from
> > https://github.com/apache/ctakes
> > >> out of curiosity to check on this issue. But I am hitting on a
> different
> > >> issue in building ctakes-core module.  The error is "Could not resolve
> > >> dependencies for project
> > org.apache.ctakes:ctakes-core:jar:5.0.0-SNAPSHOT"
> > >>
> > >> Am I missing something? Where do I get or build
> > >> org.apache.ctakes:ctakes-core.jar?
> > >>
> > >> On Sun, 26 Nov 2023 at 13:47, Peter Abramowitsch <
> > pabramowit...@gmail.com
> > >> >
> > >> wrote:
> > >>
> > >> > About package naming and the context tokenizer, I was quite puzzled
> as
> > >> to
> > >> > why no one had so far complained about the compilation issues in the
> > Git
> > >> > Archive which I noticed.
> > >> >
> > >> > The issue is that a bunch of the ctakes files refer to a package
> > >> >
> > >> > *org.apache.ctakes.contexttokenizer...*
> > >> >
> > >> > when its contents actually live in the folder
> > >> >
> > >> > *org/apache/ctakes/context/tokenizer/*
> > >> >
> > >> > I did some research and discovered something that I hadn't known.
> > >> > Apparently the Java spec suggests but doesn't enforce that package
> > names
> > >> > and folder structure should mirror each other.
> > >> >
> > >> > While Eclipse enforces it, some other build environments may not.
> > This
> > >> was
> > >> > reported to the Eclipse team years ago and was assigned "wont-fix"
> > >> status.
> > >> > I think I a

Re: Compilation Errors and the context.tokenizer

2023-11-26 Thread Peter Abramowitsch
Hi Gandhi

I am not an expert in these things, but I did notice that I needed to
modify the Maven configuration to allow HTTP fetching instead of only
HTTPS. Otherwise I get an error while fetching dependencies and the whole
build blocks. Unfortunately, slf4j-api comes from a source that has an HTTP
URL. So on my Mac, in /usr/local/maven/conf/settings.xml, I needed to comment
this out:

   
  <mirror>
    <id>maven-default-http-blocker</id>
    <mirrorOf>external:http:*</mirrorOf>
    <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
    <url>http://0.0.0.0/</url>
    <blocked>true</blocked>
  </mirror>

See if that helps.

Once your local repository is up to date for this project, you can also use
mvn --offline to speed things up.



On Sun, Nov 26, 2023 at 6:00 PM gandhi rajan 
wrote:

> Hi Peter,
>
> I am not using any IDE to build the project. I just used git bash to pull
> the project from https://github.com/apache/ctakes to local folder and
> tried
> build the project from the root pom.xml
>
> The issue reported by build seems to be something strange.
>
> [ERROR] Failed to execute goal on project ctakes-core: Could not resolve
> dependencies for project org.apache.ctakes:ctakes-core:jar:5.0.0-SNAPSHOT:
> The following artifacts could not be resolved:
> org.apache.ctakes:ctakes-core-models:jar:5.0.0-SNAPSHOT (absent):
> org.apache.ctakes:ctakes-core-models:jar:5.0.0-SNAPSHOT was not found in
> https://repo1.maven.org/maven2 during a previous attempt. This failure was
> cached in the local repository and resolution is not reattempted until the
> update interval of nexus has elapsed or updates are forced -> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute
> goal on project ctakes-core: Could not resolve dependencies for project
> org.apache.ctakes:ctakes-core:jar:5.0.0-SNAPSHOT: The following artifacts
> could not be resolved:
> org.apache.ctakes:ctakes-core-models:jar:5.0.0-SNAPSHOT (absent):
> org.apache.ctakes:ctakes-core-models:jar:5.0.0-SNAPSHOT was not found in
> https://repo1.maven.org/maven2 during a previous attempt. This failure was
> cached in the local repository and resolution is not reattempted until the
> update interval of nexus has elapsed or updates are forced.
>
> Looks like maven build is expecting "ctakes:ctakes-core-models" in the
> maven repository and trying to pull the same from remote repo.
>
>
>
> On Sun, 26 Nov 2023 at 21:19, Peter Abramowitsch 
> wrote:
>
> > Just a curiosity - Ghandi, are you using Eclipse+Maven?  If not, do you
> > have another IDE wrapped around Maven ?
> >
> >
> >
> > On Sun, Nov 26, 2023 at 4:43 PM Peter Abramowitsch <
> > pabramowit...@gmail.com>
> > wrote:
> >
> > > HI Ghandi,
> > >
> > > That's one of the fundamental jars that gets built when you start from
> > the
> > > top.  And if you encounter the error I found, "core" isn't going to be
> > > built and therefore any succeeding component also dependent on core
> will
> > > also fail.   Check your build log and see if it doesn't mention the
> > > "contexttokenizer"
> > >
> > > I will be checking in a fix.  A bunch of files, in the next day or so.
> > >
> > > Peter
> > >
> > > On Sun, Nov 26, 2023 at 3:46 PM gandhi rajan 
> > > wrote:
> > >
> > >> Hi Peter,
> > >>
> > >> I tried building the ctakes project from
> > https://github.com/apache/ctakes
> > >> out of curiosity to check on this issue. But I am hitting on a
> different
> > >> issue in building ctakes-core module.  The error is "Could not resolve
> > >> dependencies for project
> > org.apache.ctakes:ctakes-core:jar:5.0.0-SNAPSHOT"
> > >>
> > >> Am I missing something? Where do I get or build
> > >> org.apache.ctakes:ctakes-core.jar?
> > >>
> > >> On Sun, 26 Nov 2023 at 13:47, Peter Abramowitsch <
> > pabramowit...@gmail.com
> > >> >
> > >> wrote:
> > >>
> > >> > About package naming and the context tokenizer, I was quite puzzled
> as
> > >> to
> > >> > why no one had so far complained about the compilation issues in the
> > Git
> > >> > Archive which I noticed.
> > >> >
> > >> > The issue is that a bunch of the ctakes files refer to a package
> > >> >
> > >> > *org.apache.ctakes.contexttokenizer...*
> > >> >
> > >> > when its contents actually live in the folder
> > >> >
> > >> > *org/apache/ctakes/contex

Re: Compilation Errors and the context.tokenizer

2023-11-26 Thread Peter Abramowitsch
Just out of curiosity, Gandhi: are you using Eclipse+Maven? If not, do you
have another IDE wrapped around Maven?



On Sun, Nov 26, 2023 at 4:43 PM Peter Abramowitsch 
wrote:

> HI Ghandi,
>
> That's one of the fundamental jars that gets built when you start from the
> top.  And if you encounter the error I found, "core" isn't going to be
> built and therefore any succeeding component also dependent on core will
> also fail.   Check your build log and see if it doesn't mention the
> "contexttokenizer"
>
> I will be checking in a fix.  A bunch of files, in the next day or so.
>
> Peter
>
> On Sun, Nov 26, 2023 at 3:46 PM gandhi rajan 
> wrote:
>
>> Hi Peter,
>>
>> I tried building the ctakes project from https://github.com/apache/ctakes
>> out of curiosity to check on this issue. But I am hitting on a different
>> issue in building ctakes-core module.  The error is "Could not resolve
>> dependencies for project org.apache.ctakes:ctakes-core:jar:5.0.0-SNAPSHOT"
>>
>> Am I missing something? Where do I get or build
>> org.apache.ctakes:ctakes-core.jar?
>>
>> On Sun, 26 Nov 2023 at 13:47, Peter Abramowitsch
>> wrote:
>>
>> > About package naming and the context tokenizer, I was quite puzzled as
>> to
>> > why no one had so far complained about the compilation issues in the Git
>> > Archive which I noticed.
>> >
>> > The issue is that a bunch of the ctakes files refer to a package
>> >
>> > *org.apache.ctakes.contexttokenizer...*
>> >
>> > when its contents actually live in the folder
>> >
>> > *org/apache/ctakes/context/tokenizer/*
>> >
>> > I did some research and discovered something that I hadn't known.
>> > Apparently the Java spec suggests but doesn't enforce that package names
>> > and folder structure should mirror each other.
>> >
>> > While Eclipse enforces it, some other build environments may not.  This
>> was
>> > reported to the Eclipse team years ago and was assigned "wont-fix"
>> status.
>> > I think I agree with that decision. Since Java's consistency is one of
>> its
>> > great virtues, with class names required to mirror file names, why allow
>> > fuzzy folder placement of sources?
>> >
>> > In the case of the Git archive for ctakes, the folders are already
>> logical
>> > and "correct", but in some files the package names and imports for the
>> > *context.tokenizer* are mismatching.  Since I do use Eclipse, I know
>> that
>> > the context.tokenizer is the only instance of this issue.
>> >
>> > Would anyone mind if I corrected the package names and references to
>> match
>> > the folders?
>> >
>> > Peter
>> >
>>
>>
>> --
>> Regards,
>> Gandhi
>>
>> "The best way to find urself is to lose urself in the service of others
>> !!!"
>>
>


Re: Compilation Errors and the context.tokenizer

2023-11-26 Thread Peter Abramowitsch
Hi Gandhi,

That's one of the fundamental jars that gets built when you start from the
top. And if you encounter the error I found, "core" isn't going to be
built, and therefore every succeeding component that depends on core will
also fail. Check your build log and see whether it mentions
"contexttokenizer".

I will be checking in a fix, touching a bunch of files, in the next day or so.

Peter

On Sun, Nov 26, 2023 at 3:46 PM gandhi rajan 
wrote:

> Hi Peter,
>
> I tried building the ctakes project from https://github.com/apache/ctakes
> out of curiosity to check on this issue. But I am hitting on a different
> issue in building ctakes-core module.  The error is "Could not resolve
> dependencies for project org.apache.ctakes:ctakes-core:jar:5.0.0-SNAPSHOT"
>
> Am I missing something? Where do I get or build
> org.apache.ctakes:ctakes-core.jar?
>
> On Sun, 26 Nov 2023 at 13:47, Peter Abramowitsch 
> wrote:
>
> > About package naming and the context tokenizer, I was quite puzzled as to
> > why no one had so far complained about the compilation issues in the Git
> > Archive which I noticed.
> >
> > The issue is that a bunch of the ctakes files refer to a package
> >
> > *org.apache.ctakes.contexttokenizer...*
> >
> > when its contents actually live in the folder
> >
> > *org/apache/ctakes/context/tokenizer/*
> >
> > I did some research and discovered something that I hadn't known.
> > Apparently the Java spec suggests but doesn't enforce that package names
> > and folder structure should mirror each other.
> >
> > While Eclipse enforces it, some other build environments may not.  This
> was
> > reported to the Eclipse team years ago and was assigned "wont-fix"
> status.
> > I think I agree with that decision. Since Java's consistency is one of
> its
> > great virtues, with class names required to mirror file names, why allow
> > fuzzy folder placement of sources?
> >
> > In the case of the Git archive for ctakes, the folders are already
> logical
> > and "correct", but in some files the package names and imports for the
> > *context.tokenizer* are mismatching.  Since I do use Eclipse, I know that
> > the context.tokenizer is the only instance of this issue.
> >
> > Would anyone mind if I corrected the package names and references to
> match
> > the folders?
> >
> > Peter
> >
>
>
> --
> Regards,
> Gandhi
>
> "The best way to find urself is to lose urself in the service of others
> !!!"
>


Compilation Errors and the context.tokenizer

2023-11-26 Thread Peter Abramowitsch
About package naming and the context tokenizer: I was quite puzzled as to
why no one had so far complained about the compilation issues I noticed in
the Git archive.

The issue is that a bunch of the ctakes files refer to a package

*org.apache.ctakes.contexttokenizer...*

when its contents actually live in the folder

*org/apache/ctakes/context/tokenizer/*

I did some research and discovered something that I hadn't known.
Apparently the Java spec suggests but doesn't enforce that package names
and folder structure should mirror each other.

While Eclipse enforces it, some other build environments may not. This was
reported to the Eclipse team years ago and was assigned "won't fix" status.
I think I agree with that decision. Since Java's consistency is one of its
great virtues, with public class names required to match their file names,
why allow fuzzy folder placement of sources?

In the case of the Git archive for ctakes, the folders are already logical
and "correct", but in some files the package names and imports for the
*context.tokenizer* do not match them. Since I use Eclipse, which enforces
the rule, I know that the context.tokenizer is the only instance of this issue.

Would anyone mind if I corrected the package names and references to match
the folders?

Peter


Starting to look at 5.0 repo and found this...

2023-11-17 Thread Peter Abramowitsch
Hi all,   Looking at the 5.0 repo, there's a compilation error across many
projects because what is being imported as
*org.apache.ctakes.contexttokenizer.ae.**

is actually located in package
*org.apache.ctakes.context.tokenizer.ae*
and the maven artifact is declared as
*ctakes-context-tokenizer*

I've changed the import statements in 13 files, but if there's anyone who
feels strongly that I should leave those alone and change the package &
folder instead, let me know.

Did anyone else notice this too?

Peter


Re: Junk E-Mail Fwd: Initial CTakes analysis

2023-08-11 Thread Peter Abramowitsch
Paul
There are many images and papers online that describe cTakes at a high
level.  Probably none of them are 100% comprehensive but they will get you
started.

Try this:
https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+Fast+Dictionary+Lookup

Peter

On Fri, Aug 11, 2023 at 10:37 AM Peter Abramowitsch 
wrote:

> Hi Paul
>
> 1.  The cTakes ecosystem is Java with some optional Python code. I have
> little experience running it in a Windows environment and so perhaps
> someone else in the group can give you pointers.   My instinct would be to
> run it in a Linux based Docker instance - which I do anyway for some
> clients.   You can package it yourself as a standalone application talking
> to a database or you can use a Webservice wrapper around it which exists in
> the codebase (that is either Dockerized or packaged as a WAR or both).
> Then you can implement a REST client in a pure Windows environment if that
> is easier for you.
>
> 2.  cTakes is an open source project going back to 2012, and as such, uses
> many different technical approaches in its various components:  pattern
> recognition, state machines, POS and treebank extractors  and some ML
> techniques but it does not have a user friendly training mechanism for
> those components, although there are some examples.
>
> The best way to understand it is to download it and get started.
>
> Peter
>
> On Fri, Aug 11, 2023 at 6:45 AM Paul Stearns 
> wrote:
>
>> Peter:
>>
>> Thanks for the detailed and thoughtful explanation.
>>
>> The easiest part for me to understand and work through would be #6. My MO
>> for this sort of thing with both currently used in the existing target
>> system are Windows services with associated DB queues and DLLs called from
>> the application. The former for items which are not needed as part of the
>> "real time" application and the latter for those which are.
>>
>> I currently have a homegrown application which looks for keywords and
>> negation modifiers within a certain distance from the keywords which works
>> moderately well.
>>
>> My ignorance regarding NLP algorithms like CTakes is whether it is
>> keyword driven, or it is self learning. If it is the latter, I have a
>> fairly large collection of human curated data which I could feed a training
>> module.
>>
>> Where can I find an "executive overview" (30,000 foot view) of how the
>> CTakes works?
>>
>> Paul R. Stearns
>> Advanced Consulting Enterprises, Inc.
>> 15150 NW 79th Court,
>> Suite: 206
>> Miami Lakes Fl, 33016
>>
>> Voice: (305)623-0360 x107
>> Fax: (305)623-4588
>>
>> 
>> From: "Peter Abramowitsch" 
>> Sent: 8/10/23 11:59 PM
>> To: dev@ctakes.apache.org
>> Subject: Junk E-Mail Fwd: Initial CTakes analysis
>>
>> Hi Paul
>> Out of the box, cTakes would get you part of the way there, but would
>> require several types of customization to meet your requirements. All of
>> these are the kind of customizations that most of us have had to do, so
>> there's nothing new here, but they are not trivial. As I see it they fall
>> into these categories.
>>
>> 1. getting familiar with the cTakes Application, pipeline, annotator and
>> vocabulary ecosystem
>> 2. choosing a vocabulary subset that gives the best coverage of the terms
>> you are looking for
>> 3. adding one or more custom dictionaries to add terms & synonyms that are
>> not present -
>> 4. maybe employing the anatomical site annotator in your pipeline
>> 5. deciding how to harvest and structure the data you extract from the CAS
>> object which all the annotators target
>> 6. decide how to deploy the application (standalone?, webservices host?
>> multi-instance? ). Many considerations go into this and greatly affect
>> ability to scale. There is more than one architectural solution that will
>> work and allow you to get to your "fully automated" goal, but you will
>> need
>> to implement that yourself.
>>
>> A hint about highlighting the text - all annotations carry text offsets so
>> with these you can write code (usually JS and CSS) to do your
>> highlighting. native cTakes does not have any graphical display
>> functionality.
>>
>> Another hint learned from experience. If you have many large texts (say,
>> 20kb and above with lots of potential terms to discover), you can achieve
>> much better throughput by breaking these into smaller chunks at sentence
>> boundaries and tweaking offsets accordingly as you reassemble the 

Re: Junk E-Mail Fwd: Initial CTakes analysis

2023-08-11 Thread Peter Abramowitsch
Hi Paul

1.  The cTakes ecosystem is Java with some optional Python code. I have
little experience running it in a Windows environment, so perhaps
someone else in the group can give you pointers. My instinct would be to
run it in a Linux-based Docker instance, which I do anyway for some
clients. You can package it yourself as a standalone application talking
to a database, or you can use the web service wrapper that exists in
the codebase (it can be Dockerized, packaged as a WAR, or both).
Then you can implement a REST client in a pure Windows environment if that
is easier for you.
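A minimal Java 11 sketch of such a REST client using java.net.http. The endpoint URL, the plain-text request body, and the class name are hypothetical assumptions; check the web service module's own documentation for its real path, parameters, and response format. The same request could equally be issued from .NET on the Windows side.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical client for a cTAKES web service wrapper; the endpoint below is an assumption. */
public class CtakesRestClient {

    // Assumed URL: replace with the path your deployed WAR or Docker container actually exposes.
    static final String ENDPOINT = "http://localhost:8080/ctakes-web-rest/service/analyze";

    /** Builds a POST carrying the raw note text; nothing is sent yet. */
    public static HttpRequest buildRequest(String endpoint, String noteText) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .timeout(Duration.ofSeconds(60))
                .header("Content-Type", "text/plain; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(noteText)) // encodes as UTF-8
                .build();
    }

    /** Sends the note and returns the response body (its format depends on the service). */
    public static String analyze(HttpClient client, String endpoint, String noteText)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(buildRequest(endpoint, noteText), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(10)).build();
        System.out.println(analyze(client, ENDPOINT, "Patient denies chest pain."));
    }
}
```

Keeping request construction separate from sending makes it easy to test the client without a running server.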

2.  cTakes is an open-source project going back to 2012 and, as such, uses
many different technical approaches in its various components: pattern
recognition, state machines, POS and treebank extractors, and some ML
techniques. However, it does not have a user-friendly training mechanism for
those components, although there are some examples.

The best way to understand it is to download it and get started.

Peter

On Fri, Aug 11, 2023 at 6:45 AM Paul Stearns 
wrote:

> Peter:
>
> Thanks for the detailed and thoughtful explanation.
>
> The easiest part for me to understand and work through would be #6. My MO
> for this sort of thing with both currently used in the existing target
> system are Windows services with associated DB queues and DLLs called from
> the application. The former for items which are not needed as part of the
> "real time" application and the latter for those which are.
>
> I currently have a homegrown application which looks for keywords and
> negation modifiers within a certain distance from the keywords which works
> moderately well.
>
> My ignorance regarding NLP algorithms like CTakes is whether it is keyword
> driven, or it is self learning. If it is the latter, I have a fairly large
> collection of human curated data which I could feed a training module.
>
> Where can I find an "executive overview" (30,000 foot view) of how the
> CTakes works?
>
> Paul R. Stearns
> Advanced Consulting Enterprises, Inc.
> 15150 NW 79th Court,
> Suite: 206
> Miami Lakes Fl, 33016
>
> Voice: (305)623-0360 x107
> Fax: (305)623-4588
>
> 
> From: "Peter Abramowitsch" 
> Sent: 8/10/23 11:59 PM
> To: dev@ctakes.apache.org
> Subject: Junk E-Mail Fwd: Initial CTakes analysis
>
> Hi Paul
> Out of the box, cTakes would get you part of the way there, but would
> require several types of customization to meet your requirements. All of
> these are the kind of customizations that most of us have had to do, so
> there's nothing new here, but they are not trivial. As I see it they fall
> into these categories.
>
> 1. getting familiar with the cTakes Application, pipeline, annotator and
> vocabulary ecosystem
> 2. choosing a vocabulary subset that gives the best coverage of the terms
> you are looking for
> 3. adding one or more custom dictionaries to add terms & synonyms that are
> not present -
> 4. maybe employing the anatomical site annotator in your pipeline
> 5. deciding how to harvest and structure the data you extract from the CAS
> object which all the annotators target
> 6. decide how to deploy the application (standalone?, webservices host?
> multi-instance? ). Many considerations go into this and greatly affect
> ability to scale. There is more than one architectural solution that will
> work and allow you to get to your "fully automated" goal, but you will need
> to implement that yourself.
>
> A hint about highlighting the text - all annotations carry text offsets so
> with these you can write code (usually JS and CSS) to do your
> highlighting. native cTakes does not have any graphical display
> functionality.
>
> Another hint learned from experience. If you have many large texts (say,
> 20kb and above with lots of potential terms to discover), you can achieve
> much better throughput by breaking these into smaller chunks at sentence
> boundaries and tweaking offsets accordingly as you reassemble the chunks.
> The memory requirements grow rapidly with the size of the note.
>
> In summary, a strong developer background is a good starting point. To
> that you'd want to add medical informatics, and experience with scalable
> architectures. cTakes is a great kernel to your system but be prepared to
> dive deep.
>
> Peter
>
> On Thu, Aug 10, 2023 at 10:06 AM Paul Stearns 
> wrote:
>
> > I am looking for a NLP to read pathology reports and extract cancer
> > related site, histology, stage and any other DX/RX data available. In
> > looking at CTakes, I have a few questions;
> >
> > - Is CTakes an appropriate tool to automate this task?
> > - The end goal wou

Fwd: Initial CTakes analysis

2023-08-10 Thread Peter Abramowitsch
Hi Paul
Out of the box, cTakes would get you part of the way there, but would
require several types of customization to meet your requirements.  All of
these are the kind of customizations that most of us have had to do, so
there's nothing new here, but they are not trivial.  As I see it they fall
into these categories.

1. getting familiar with the cTakes Application, pipeline, annotator and
vocabulary ecosystem
2. choosing a vocabulary subset that gives the best coverage of the terms
you are looking for
3. adding one or more custom dictionaries to add terms & synonyms that are
not present
4. maybe employing the anatomical site annotator in your pipeline
5. deciding how to harvest and structure the data you extract from the CAS
object which all the annotators target
6. deciding how to deploy the application (standalone? web services host?
multi-instance?). Many considerations go into this and greatly affect
the ability to scale. There is more than one architectural solution that will
work and allow you to get to your "fully automated" goal, but you will need
to implement that yourself.

A hint about highlighting the text: all annotations carry text offsets, so
with these you can write code (usually JS and CSS) to do your
highlighting. Native cTakes does not have any graphical display
functionality.
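A minimal Java sketch of offset-based highlighting, assuming each annotation has already been reduced to a begin offset (inclusive) and an end offset (exclusive) into the original note, as UIMA annotation offsets are. The OffsetHighlighter and Span names are hypothetical, overlapping spans are simply skipped, and the same logic carries over to JS on the client side.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: wraps annotated spans of a note in HTML <mark> tags. */
public class OffsetHighlighter {

    /** Hypothetical span: begin is inclusive, end is exclusive. */
    public static final class Span {
        final int begin;
        final int end;

        public Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Escapes the characters that are significant inside HTML text.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static String highlight(String text, List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt(s -> s.begin));
        StringBuilder html = new StringBuilder();
        int cursor = 0;
        for (Span s : sorted) {
            if (s.begin < cursor) {
                continue; // overlaps the previous highlight; skipped for simplicity
            }
            html.append(escape(text.substring(cursor, s.begin)))
                .append("<mark>")
                .append(escape(text.substring(s.begin, s.end)))
                .append("</mark>");
            cursor = s.end;
        }
        html.append(escape(text.substring(cursor)));
        return html.toString();
    }

    public static void main(String[] args) {
        String note = "No evidence of carcinoma.";
        System.out.println(highlight(note, List.of(new Span(15, 24))));
        // prints: No evidence of <mark>carcinoma</mark>.
    }
}
```

Escaping the unhighlighted text as well matters because clinical notes often contain "<" and ">" in lab values.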

Another hint learned from experience: if you have many large texts (say,
20 KB and above, with lots of potential terms to discover), you can achieve
much better throughput by breaking them into smaller chunks at sentence
boundaries and adjusting the offsets accordingly as you reassemble the chunks.
Memory requirements grow rapidly with the size of the note.
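A minimal Java sketch of the chunking idea, assuming java.text.BreakIterator as a stand-in sentence splitter (cTAKES's own sentence detection may draw boundaries differently). Each hypothetical Chunk records where its first character sits in the full note, so an annotation offset found inside a chunk maps back to the note by adding that offset.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: splits a long note into sentence-aligned chunks that remember their offsets. */
public class SentenceChunker {

    public static final class Chunk {
        public final int offset; // index of the chunk's first character in the full note
        public final String text;

        Chunk(int offset, String text) {
            this.offset = offset;
            this.text = text;
        }

        /** Maps an offset inside this chunk (e.g. an annotation's begin) back to the full note. */
        public int toNoteOffset(int chunkOffset) {
            return offset + chunkOffset;
        }
    }

    /** Groups whole sentences into chunks of about maxChars; one longer sentence becomes its own chunk. */
    public static List<Chunk> chunk(String note, int maxChars) {
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.US);
        sentences.setText(note);
        List<Chunk> chunks = new ArrayList<>();
        int chunkStart = 0;
        int lastBoundary = 0;
        for (int end = sentences.next(); end != BreakIterator.DONE; end = sentences.next()) {
            // Adding this sentence would overflow the chunk: close it at the previous boundary.
            if (end - chunkStart > maxChars && lastBoundary > chunkStart) {
                chunks.add(new Chunk(chunkStart, note.substring(chunkStart, lastBoundary)));
                chunkStart = lastBoundary;
            }
            lastBoundary = end;
        }
        if (lastBoundary > chunkStart) {
            chunks.add(new Chunk(chunkStart, note.substring(chunkStart, lastBoundary)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        String note = "First sentence. Second one here. Third.";
        for (Chunk c : chunk(note, 20)) {
            System.out.println(c.offset + ": [" + c.text + "]");
        }
    }
}
```

Because the chunks are contiguous, concatenating their texts reproduces the note exactly, and each chunk can be sent through the pipeline independently.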

In summary, a strong developer background is a good starting point. To
that you'd want to add medical informatics and experience with scalable
architectures. cTakes is a great kernel for your system, but be prepared to
dive deep.

Peter



On Thu, Aug 10, 2023 at 10:06 AM Paul Stearns 
wrote:

> I am looking for a NLP to read pathology reports and extract cancer
> related site, histology, stage and any other DX/RX data available. In
> looking at CTakes, I have a few questions;
>
> - Is CTakes an appropriate tool to automate this task?
> - The end goal would be a fully automated tool where text was presented to
> an API and data was returned.
> - An added bonus, would be for the tool to annotate the text, so that a
> reviewer can more easily find the relevant data.
> - For someone with a strong IT/software development background, but no NLP
> background what is the level of difficulty in getting started with this
> product?
>
> Paul R. Stearns
> Advanced Consulting Enterprises, Inc.
> 15150 NW 79th Court,
> Suite: 206
> Miami Lakes Fl, 33016
>
> Voice: (305)623-0360 x107
> Fax: (305)623-4588
>


Re: Testing the 5.0 version [EXTERNAL]

2023-08-10 Thread Peter Abramowitsch
Hi Sean and everyone, I'm happy to receive suggestions from others, but
because this is funded by a client, I will eventually have to put more
effort into areas that are of interest to them. Also, it is a limited
engagement, so I can't guarantee how thorough my testing will be, but I'll
certainly report back what I find. I'll probably begin this work towards
the middle of September.

One area I probably will not test is the web REST module, as we already have
a REST framework that we'll continue to use: it is lightweight,
tailored to our production and security needs, and its performance under
heavy load is well understood. It is based on a SparkJava-Jetty
foundation rather than Spring. So maybe someone else can test the official
version.

regards
Peter

On Thu, Aug 10, 2023 at 7:37 AM Finan, Sean
 wrote:

> Hi Peter,
>
> That is great news.  Sometime soon I will take a gander at ctakes and
> see if I can identify areas of importance or concern to me and what I might
> do to test them.  However, don't think of that as being a definitive list.
>
> All, please take advantage of Peter's offer and share items that you would
> like to receive some attention.
>
> If anybody can, please work with Peter to help keep ctakes a top-notch
> application for clinical NLP.
>
> Cheers,
>
> Sean
>
> 
> From: Peter Abramowitsch 
> Sent: Monday, August 7, 2023 11:48 AM
> To: dev@ctakes.apache.org 
> Subject: Testing the 5.0 version [EXTERNAL]
>
> * External Email - Caution *
>
>
> Hi Sean,   looks like my funding for some experimentation with 5.0 is
> finally going to happen in a month or so.  I'm going to be looking at all
> the new functionality (I'm back on a branch of 4.0.1 on a custom
> webservices platform),  but is there any particular area of 5.0 that you'd
> like me to exercise?
>
> Peter
>


Testing the 5.0 version

2023-08-07 Thread Peter Abramowitsch
Hi Sean,   looks like my funding for some experimentation with 5.0 is
finally going to happen in a month or so.  I'm going to be looking at all
the new functionality (I'm back on a branch of 4.0.1 on a custom
webservices platform),  but is there any particular area of 5.0 that you'd
like me to exercise?

Peter


What's new in 5.0 && testing JDK 11

2023-05-26 Thread Peter Abramowitsch
Hi Sean,

It looks like I may get some support from my employer to explore 5.0 this
summer and, while doing so, to test the JDK 11 build, but I have a couple
of quick questions.

1.  If the system would still require Java 1.8 to run because of certain
dependencies, what would be the advantage of building it under 11? Or
were you suggesting that an 11 runtime would be possible by upgrading those
dependencies too?

2.  In building the complete 5.0 from Git, I've run into a problem with
Maven blocking certain artifacts because of HTTP/HTTPS issues. There are
global fixes and project-specific fixes. Which do you
recommend? Ideally, should Maven be run with -o?


[INFO] --- maven-remote-resources-plugin:1.4:process (default) @ ctakes-core ---
Downloading from maven-default-http-blocker: http://0.0.0.0/org/apache/ctakes/ctakes-models/5.0.0-SNAPSHOT/maven-metadata.xml

3.  Finally, I had asked a while back if someone could point me to a list
of improvements or significant additions to cTakes that have occurred over
the last year or so. Since no one responded, I decided to read through the
SVN and Git commit messages and diff the sources.

I did come across the PBJ project.  The readme doesn't actually explain
what it is for, and there are various meanings of the term PBJ in the Python
community.  This one looks like infrastructure to allow ctakes to be called
from a Python pipeline, using Artemis to decouple the processes -- or am I
wrong, and is it the reverse (calling Python from within a cTakes pipeline)?

If there are any areas where concept lookup has been improved through
better semantic contextualization, please let us know!

Peter


Re: cTAKES running slower with each run

2023-04-12 Thread Peter Abramowitsch
There are many ways to package ctakes, and admittedly ours is unlike the
console app and we have our own multithreaded API, but we regularly process
millions of documents at a time and haven't seen this issue.  The core
application with our fairly standard pipeline stays up for a month at a time
with no degradation.

Are you using any unusual or deprecated annotators?  It could be that one
of the less used ones doesn't properly separate initialization from its
processing method and is caching something that it shouldn't.  Are you
seeing a concomitant growth in memory footprint?
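
To make that suspected failure mode concrete, here is a minimal, self-contained Java sketch. Both classes are hypothetical and are not cTAKES or UIMA code; their `initialize()` and `process()` methods only loosely mirror the once-per-pipeline versus once-per-document split of a UIMA annotator. The first keeps per-document state in a field that is never cleared, so every document pays for a scan over everything seen before (and its results drift too); the second builds document-independent resources once and keeps per-document state local.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical classes, not cTAKES/UIMA code. Both count the distinct
// tokens in a document.
class LeakyTokenCounter {
    // BUG: per-document state kept in a field and never cleared, so it
    // accumulates across every document the pipeline has ever processed.
    private final List<String> seen = new ArrayList<>();

    int process(String document) {
        int distinct = 0;
        for (String token : document.split("\\s+")) {
            // List.contains scans the whole history, so each document costs a
            // little more than the previous one, and words from earlier
            // documents are wrongly treated as already seen.
            if (!seen.contains(token)) {
                seen.add(token);
                distinct++;
            }
        }
        return distinct;
    }

    int cachedEntries() {
        return seen.size();
    }
}

class SeparatedTokenCounter {
    private Set<String> ignored; // document-independent resource

    // Called once per pipeline: load word lists, models, regexes here.
    void initialize() {
        ignored = new HashSet<>(Arrays.asList("the", "of", "and"));
    }

    // Called once per document: all per-document state is local.
    int process(String document) {
        Set<String> seen = new HashSet<>();
        for (String token : document.split("\\s+")) {
            if (!ignored.contains(token)) {
                seen.add(token);
            }
        }
        return seen.size();
    }
}
```

A per-document cost that creeps up by a roughly constant amount with each document, together with a steadily growing heap, would be consistent with the first pattern.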

Try running it under jvisualvm; it may give you a clue.

Peter

On Wed, Apr 12, 2023 at 7:41 PM Milinovich, Alex  wrote:

> I’m running ctakes both as an API and as a console app.   Each time I hit
> ctakes, the run time per document is getting incrementally slower by a few
> thousandths of a millisecond per xml element (to normalize for different
> document sizes) than the previous document.  Compound this over 1000
> documents in 20 minutes and the runs are going from 0.06 milliseconds per
> xml element to 1.5 milliseconds per xml element.  It’s a very consistent
> 0.002 millisecond increase in the rate for each subsequent document I throw
> at cTAKES.
>
>
>
> Is there any caching or garbage collection or something I should be on the
> lookout to adjust or fix?
>
>
>
> Thanks
>
>
>
> ~Alex
>
>
>
>
> *Alex Milinovich*
>
> Director of Research – Data Science Analytics  |  Quantitative Health
> Sciences
> 9500 Euclid Ave. – JJN3 | Cleveland, OH 44195 | m: (216) 245-7655
>
>


Re: It is Official! Steps toward a cTAKES 5.0 release. [EXTERNAL]

2023-02-22 Thread Peter Abramowitsch
Hi Sean and all,

If you expect the release process to last a couple of months, I can
volunteer.  At the moment and for the next few weeks I'm really busy.

One thing that would really help is to have a list of all the major changes
& additions that have happened since the 4.0.0 release.   I think that
would be valuable for everyone.
I also have some additions to fold in, but without a good knowledge of
what's been added/changed and why, it would not be safe to do that.

For instance, there's a project ctakes-pbj in the sources.  Unless I've
missed something, its Readme doesn't have any explanation of what it
actually is.  And there are new annotators and functionality.  Is there a
comprehensive list?  Probably it would be best for each author to document
their own additions, for accuracy and completeness.  I will be doing that
for sure.

Peter

On Wed, Feb 22, 2023 at 7:12 AM Finan, Sean
 wrote:

> Hi Gandhi,
>
> Thank you very much for volunteering!
>
> I am waiting to see if anybody else volunteers to be the RM, but I will
> help anybody that volunteers for any position as much as I can.
>
> Cheers,
>
> Sean
> 
> From: gandhi rajan 
> Sent: Monday, February 20, 2023 11:06 PM
> To: dev@ctakes.apache.org 
> Subject: Re: It is Official! Steps toward a cTAKES 5.0 release. [EXTERNAL]
>
> * External Email - Caution *
>
>
> Hi Sean,
>
> I can volunteer for co-RM so that I can work under your guidance. Thanks.
>
> On Tue, 21 Feb 2023 at 03:43, Finan, Sean
>  wrote:
>
> > Hi all,
> >
> > The cTAKES Project Management Committee has voted that it is time to
> > officially begin the release process for cTAKES 5.0
> >
> > It has been almost 6 years since version 4.0.0 was released, and with a
> > worldwide user count estimated in the thousands, a new release will be
> > extremely valuable.
> >
> > Releasing cTAKES 5.0 will involve some work, and the project needs
> > volunteers to assist in the process.
> >
> > The most important thing right now is the appointment of a Release
> > Manager (RM).
> > While the position is not to be taken lightly and does involve work, it
> > can be a great experience (and a resume builder).
> >
> > We need a cTAKES committer to be the RM, but I am going to split the
> > general responsibilities below.
> > I am doing this because I believe that any user familiar with cTAKES can
> > be a co-RM.
> >
> > Requiring a committer:
> > 1.  Creating Release Candidates of the code.
> > 2.  Deploying and Signing the actual Official Release.
> >
> > Not requiring a committer:
> > 1.  Coordinating people performing documentation, testing and bug fixing.
> > 2.  Communicating progress with the developer list.
> >
> > I am sure that I am forgetting something, but those are the 4 tasks that I
> > can think of right now.
> >
> > If you would like to be the Release Manager (or a co-RM), please volunteer
> > on the dev@ctakes.apache.org mailing list.
> >
> > Other tasks that must be performed for a release include:
> > 1.  Testing the release candidates.
> > 2.  Contributing documentation.
> > 3.  Writing fixes for bugs that can be fixed for the release.
> > 4.  Updating the release information on ctakes.apache.org
> >
> > Anybody can test release candidates.  There are countless pipelines that
> > can be built and tested, but I think that we should try to cover the 'most
> > commonly used' pipelines.  If you run any pipeline, please report success -
> > Documentation can be contributed by any user.  A cTAKES committer is
> > required to actually push the documentation to the wiki, readme, release
> > notes, etc. Sending out markdown, images, plain text or just
> > recommendations is open to all users.
> > While only committers can actually push changes to cTAKES code, any user
> > can contribute fixes by creating code patches or even just copy-pasting
> > code in an email.
> > Updating the ctakes.apache.org website will require a committer, but
> > non-committer assistance is possible just like it is for bug fixes.
> >
> > One person (Tim Miller) has already volunteered to perform testing and
> > another (Dennis Johns) is currently working on the GitHub wiki.
> > I don't think that people need to officially volunteer to perform the last 4
> > listed tasks, but it may be beneficial to identify areas that you would
> > like to cover in order to prevent duplicated work.
> >
> > I suspect that I am forgetting at least some minor items, but they will
> > come to light when encountered.
> >
> > I urge you all to take part in the release process.  You can earn good
> > karma, become famous as a cTAKES power user, and perhaps be nominated as a
> > Committer!
> >
> > Thank you all,
> >
> > Sean
> >
> >
>
> --
> Regards,
> Gandhi
>
> "The best way to find urself is to lose urself in the service of others
> !!!"
>


Re: Apache cTAKES is now on GitHub ! [EXTERNAL]

2023-01-01 Thread Peter Abramowitsch
Great, thanks Sean

Peter

On Sat, Dec 31, 2022 at 12:38 PM Finan, Sean
 wrote:

> Hi Peter,
>
> Privileges for the GitHub repository are the same as they were for the old
> SVN repository.  Anybody who is an Apache cTAKES Committer then should have
> write permission for the repository.
>
> You do need to have a GitHub account and connect it to your Apache account.
> There are a couple of ways to handle it and I did this myself quite a
> while ago, but I think that the most direct method is:
>
>   1.   Visit the Apache Account Utility at https://id.apache.org/
>   2.  Log in with your Apache username and Apache password.
>   3.  Halfway down the page, enter your GitHub Username in the first box
> labeled "Your GitHub Username".
>   4.  Save Changes.
>
> I think that it takes several hours for the system to establish the
> connection, at which point you might get some kind of notification email.
>
>
> Regarding the NegEx, ZoneAnnotator and anything else that you might have
> locally, please do check it in!  For some reason I thought that the Negex
> change was done a long time ago, but I wasn't really paying close
> attention.  Thanks again for the improvements!
>
> Cheers,
> Sean
>
> 
> From: Peter Abramowitsch 
> Sent: Friday, December 30, 2022 11:31 PM
> To: dev@ctakes.apache.org 
> Subject: Re: Apache cTAKES is now on GitHub ! [EXTERNAL]
>
> * External Email - Caution *
>
>
> Thank you Sean - looks like you & others put in a lot of work to make this
> transition.  I'm looking forward to the "toys" you mentioned.
> Will the repository protocol be the same as it was during the SVN days with
> designated contributors?
>
> Although I didn't receive any feedback, I might check in some improvements
> I made to the Negex module and to the ZoneAnnotator.  These have been in
> production for a year now, so I'm pretty sure they're stable.
>
> Peter
>
> On Fri, Dec 30, 2022 at 10:49 AM Finan, Sean
>  wrote:
>
> > Hi all,
> >
> > I am pleased to announce that the cTAKES source code is now on GitHub at
> > https://github.com/apache/ctakes
> >
> > All current and future code development should be performed on the source
> > in GitHub.
> >
> >
> >Changes ( vs. Subversion Repository )
> >=
> >
> >   *   VERSION:   The project in GitHub has been versioned 5.0.0-SNAPSHOT.
> >   *   STRUCTURE:   The project has been slightly restructured at a high
> > level.  The typical user should not notice the difference.
> >   *   CODE API:   All package, class, method and constant names remain the
> > same, so your code should not need to be refactored.
> >   *   DEPENDENCIES:   If you include cTAKES modules as dependencies in
> > your maven project, you can simply change the version to obtain new
> > 5.0.0-SNAPSHOT builds. *
> >   *   BINARY PACKAGE:   The binary package has some minor differences, but
> > the typical user should not notice them.
> >
> > * If you use maven dependency exclusions for resource ('-res') modules
> > because of unwanted ML models, you need to change the excluded name
> > extension from '-res' to '-model'.
> >
> >
> >Moving forward from the Subversion Repository
> >=
> >
> >   *   VERSION:   The project in the SVN repository was versioned
> > 4.0.1-SNAPSHOT.
> >   *   DEPRECATION:   The code and resources in the 4.0.1-SNAPSHOT
> > Subversion (SVN) repository will remain av

Re: Apache cTAKES is now on GitHub !

2022-12-30 Thread Peter Abramowitsch
Thank you Sean - looks like you & others put in a lot of work to make this
transition.  I'm looking forward to the "toys" you mentioned.
Will the repository protocol be the same as it was during the SVN days with
designated contributors?

Although I didn't receive any feedback, I might check in some improvements
I made to the Negex module and to the ZoneAnnotator.  These have been in
production for a year now, so I'm pretty sure they're stable.

Peter

On Fri, Dec 30, 2022 at 10:49 AM Finan, Sean
 wrote:

> Hi all,
>
> I am pleased to announce that the cTAKES source code is now on GitHub at
> https://github.com/apache/ctakes
>
> All current and future code development should be performed on the source
> in GitHub.
>
>
>Changes ( vs. Subversion Repository )
>=
>
>   *   VERSION:   The project in GitHub has been versioned 5.0.0-SNAPSHOT.
>   *   STRUCTURE:   The project has been slightly restructured at a high
> level.  The typical user should not notice the difference.
>   *   CODE API:   All package, class, method and constant names remain the
> same, so your code should not need to be refactored.
>   *   DEPENDENCIES:   If you include cTAKES modules as dependencies in
> your maven project, you can simply change the version to obtain new
> 5.0.0-SNAPSHOT builds. *
>   *   BINARY PACKAGE:   The binary package has some minor differences, but
> the typical user should not notice them.
>
> * If you use maven dependency exclusions for resource ('-res') modules
> because of unwanted ML models, you need to change the excluded name
> extension from '-res' to '-model'.
>
>
>Moving forward from the Subversion Repository
>=
>
>   *   VERSION:   The project in the SVN repository was versioned
> 4.0.1-SNAPSHOT.
>   *   DEPRECATION:   The code and resources in the 4.0.1-SNAPSHOT
> Subversion (SVN) repository will remain available for checkout, but should
> be considered read-only.  4.0.1-SNAPSHOT built modules will remain
> available for maven dependencies.  All current and future code development
> should be performed on the source in GitHub.
>   *   RELEASE:   There is no cTAKES 4.0.1 release.
>
>Next Anticipated Release
>
>
>   *   VERSION:   As you might guess from the snapshot version change, we
> are gearing up for a version 5.0.0 release.
>   *   WHY 5.0.0:   There are so many new features over cTAKES 4.0.0,
> including completely new modules, that the version number was bumped up.
>   *   DOCUMENTATION:   All of the new toys will be documented in the
> confluence wiki at the time of the 5.0.0 release.
>   *   DATE:   There is no release date yet, but hopefully it will be very
> very soon ...
>
> Happy New Year,
>
> Sean
>
>
>


Re: Best practices for documenting NLP versions

2022-10-21 Thread Peter Abramowitsch
Interesting, but it would depend on how the Docker image is set up.  Ours,
for instance, encapsulates all the code and imported jars, as you imply,
but the piper and other runtime configuration such as section regex, negex,
bsvs, etc. are imported from a mounted FS at the container's runtime.
Having them frozen into the Docker images would proliferate vast numbers
of image tars with 99% redundant data.  Or do you have a cleverer
solution?

Peter

On Fri, Oct 21, 2022 at 10:18 PM Greg Silverman  wrote:

> Why not use Docker and versioning by tags? See "C. Boettiger, An
> introduction to Docker for reproducible research, SIGOPS Oper. Syst. Rev.
> 49 (2015) 71–79. doi:10.1145/2723872.2723882."
>
> --
> Greg M. Silverman
> Senior Systems Developer
> NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
> Department of Surgery
> University of Minnesota
> g...@umn.edu
>


Re: Best practices for documenting NLP versions

2022-10-21 Thread Peter Abramowitsch
Well, obviously, the full range of permutations of all source files and all
annotators and pre- and post-ctakes code would require a huge amount of
commit information on thousands of files... and not only ctakes
files... recently I made some pretty significant changes to the ZonerCli
library, which is only a dependency of the ctakes distribution.  How would
all the commit info be used to tag the end results?  I think the answer is
that it's simply not feasible or useful, so we haven't gone to those
lengths.  The furthest we go at the UCs is to version the piper file and then
write the versioned_name of the piper back into the JSON object returned
for each note.  We have our own REST service and our own Java and Python
clients, but they don't touch the internals of the message in a way that
interferes with the clinical informatics.  The note concept collection
object, with its piper version, is then persisted in our data store.  The
server jar also has a version, which is written into a log and is updated
whenever any significant framework changes are implemented.  But the
server version is not written into the data store.
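
A minimal sketch of the piper-versioning idea, under stated assumptions: here the version tag is derived from a SHA-256 hash of the piper file plus any runtime configuration (for example section regex or negex files mounted into a container) instead of being maintained by hand, and it is stored under a hypothetical `piper_version` key. `PiperVersionTag` and its methods are illustrative names, not part of cTAKES or of the UC service described above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: tags each note's output with the configuration that produced it.
class PiperVersionTag {
    // Returns e.g. "default_clinical-1a2b3c4d": a human-chosen base name plus the
    // first 8 hex chars of a SHA-256 over the piper and config file contents.
    static String versionedName(String baseName, String... configContents) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            for (String content : configContents) {
                byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
                // Length-prefix each file so ("ab", "c") and ("a", "bc") hash differently.
                sha.update(Integer.toString(bytes.length).getBytes(StandardCharsets.UTF_8));
                sha.update((byte) ':');
                sha.update(bytes);
            }
            byte[] digest = sha.digest();
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 4; i++) {
                hex.append(String.format("%02x", digest[i]));
            }
            return baseName + "-" + hex;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available in every JDK", e);
        }
    }

    // Copies a per-note result and adds the tag before it is serialized and persisted.
    static Map<String, Object> tagNoteResult(Map<String, Object> noteResult, String tag) {
        Map<String, Object> tagged = new LinkedHashMap<>(noteResult);
        tagged.put("piper_version", tag);
        return tagged;
    }
}
```

Because the tag depends only on file contents, two containers started from the same image but mounted with different configuration would record different tags, without the configuration being baked into the image.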

Not sure if any of this was helpful.

On Fri, Oct 21, 2022 at 8:03 PM Miller, Timothy
 wrote:

> We’ve recently been using cTAKES for some internal projects where we make
> modifications, often using the REST server, combined with an open-source
> python client that makes the output of the REST server easy to post-process:
> https://github.com/Machine-Learning-for-Medical-Language/ctakes-client-py
> written by my colleagues Andy McMurry and Mike Terry, and pip installable.
> The output is then either converted to FHIR or written to whatever
> convenient format we need.
>
> But it’s useful to know for a given run on a given project, what was the
> NLP configuration that produced this output? Obviously, there are things
> like version numbers, but since cTAKES is highly configurable, and our
> post-processing libraries have versions, and we may use trunk or a previous
> commit instead of releases, things get complicated quickly. Does anyone
> have an existing solution they are willing to share? Or does anyone have
> any thoughts on this topic? This question goes slightly beyond cTAKES, but
> cTAKES is responsible for a lot of the complexity in figuring this out
> since it’s the most configurable component.
>
> Thanks
> Tim
>
>


Re: Two Questions about OverlapJcasTermAnnotator [EXTERNAL]

2022-08-23 Thread Peter Abramowitsch
Thanks Sean.
Glad to know there wasn't any special behavior with prefterms that I hadn't
known about all these years.

Peter

On Tue, Aug 23, 2022 at 4:31 PM Finan, Sean
 wrote:

> Hi Peter,
>
> the "blood, urine"... in the example did work when I originally tested,
> but the default settings (window size, etc.) may have been changed since
> then.
>
> Everything in preftext is simple string literal.  It is likely that
> certain things will not appear in raw text.  The UMLS has some interesting
> synonym sources.
>
> Sean
>
> ____
> From: Peter Abramowitsch 
> Sent: Tuesday, August 23, 2022 6:00 PM
> To: dev@ctakes.apache.org 
> Subject: Two Questions about OverlapJcasTermAnnotator [EXTERNAL]
>
> * External Email - Caution *
>
>
> Hi Sean (or whoever has some historical knowledge)
>
> I'm trying to improve the term annotators for speed and have noticed that
> the overlap term annotator does not seem to pass even the most rudimentary
> use cases suggested in the code comments:
>
> // things like "blood, urine, sputum cultures" should pick up "blood
> culture" and "urine culture"
>
> I'm happy to fix this, but my question is whether anyone can attest to
> whether it ever has worked, or what use cases you have to indicate that it
> does today.
>
> The other question is about the conventions in the term dictionary.  When a
> PREFTERM has symbols embedded in its text - like so:
>
> *'electrocardiogram ; 24 hour'*
> or so
> *'us . doppler . cw'*
> or so
> *'angioscopies , microscopic'*
>
> Do the symbols have any implied meaning or behavior somewhere in the
> pipeline, or are they literally part of the text? (which is usually an
> impossibility in real notes)
>


Two Questions about OverlapJcasTermAnnotator

2022-08-23 Thread Peter Abramowitsch
Hi Sean (or whoever has some historical knowledge)

I'm trying to improve the term annotators for speed and have noticed that
the overlap term annotator does not seem to pass even the most rudimentary
use cases suggested in the code comments:

// things like "blood, urine, sputum cultures" should pick up "blood
culture" and "urine culture"

I'm happy to fix this, but my question is whether anyone can attest to
whether it ever has worked, or what use cases you have to indicate that it
does today.
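
For reference, the behavior that code comment asks for can be expressed as in-order matching with gaps. The sketch below is hypothetical and is not the OverlapJcasTermAnnotator implementation: `maxSkip` is an assumed stand-in for the annotator's window settings, and tokens are assumed to be lowercased and lemmatized already ("cultures" becoming "culture").

```java
import java.util.ArrayList;
import java.util.List;

class OverlapMatchSketch {
    /**
     * Returns [start, end] token spans where every token of {@code term}
     * occurs in order, with at most {@code maxSkip} unrelated tokens between
     * consecutive term tokens. Tokens are assumed to be normalized already.
     */
    static List<int[]> find(List<String> tokens, List<String> term, int maxSkip) {
        List<int[]> spans = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            if (!tokens.get(start).equals(term.get(0))) {
                continue;
            }
            int pos = start;
            boolean matched = true;
            for (int k = 1; k < term.size() && matched; k++) {
                matched = false;
                // Look ahead, skipping at most maxSkip tokens, for the next term token.
                int limit = Math.min(tokens.size() - 1, pos + 1 + maxSkip);
                for (int j = pos + 1; j <= limit; j++) {
                    if (tokens.get(j).equals(term.get(k))) {
                        pos = j;
                        matched = true;
                        break;
                    }
                }
            }
            if (matched) {
                spans.add(new int[] {start, pos});
            }
        }
        return spans;
    }
}
```

With the lemmatized tokens of "blood, urine, sputum cultures", "sputum culture" matches with `maxSkip = 0`, "urine culture" needs 2 and "blood culture" needs 4, so in a scheme like this the example passes or fails depending on the window setting.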

The other question is about the conventions in the term dictionary.  When a
PREFTERM has symbols embedded in its text - like so:

*'electrocardiogram ; 24 hour'*
or so
*'us . doppler . cw'*
or so
*'angioscopies , microscopic'*

Do the symbols have any implied meaning or behavior somewhere in the
pipeline, or are they literally part of the text? (which is usually an
impossibility in real notes)


Re: Issue in running developers version of Apache cTakes to process DefaultClinicalPipeline

2022-04-07 Thread Peter Abramowitsch
Hi Ankit,

One normally doesn't put the input and output folders in the trunk area,
but I just tried it and it works fine:

07 Apr 2022 08:15:53  INFO FinishedLogger - Run Start Time:   Thu Apr 07 08:14:53 CEST 2022
07 Apr 2022 08:15:53  INFO FinishedLogger - Processing Start Time:   Thu Apr 07 08:15:18 CEST 2022
07 Apr 2022 08:15:53  INFO FinishedLogger - Processing End Time:   Thu Apr 07 08:15:53 CEST 2022
07 Apr 2022 08:15:53  INFO FinishedLogger - Initialization Time Elapsed:   24 seconds
07 Apr 2022 08:15:53  INFO FinishedLogger - Processing Time Elapsed:   35 seconds
07 Apr 2022 08:15:53  INFO FinishedLogger - Total Run Time Elapsed:   1 minutes, 0 seconds

Your error message is complaining about your *command line option for the
input folder*: it says that the option is missing and that other arguments
it finds are unexpected.

So my guess is that your command line is not exactly as you have written it.
For instance, if the input folder had a space and a hyphen in its name,
passing -i input -folder instead of -i input-folder would
cause an error like yours.  I tried that and got your error:

Option must have a value: [--option_u -u value]
Option must have a value: [--option_t -t value]
    at com.lexicalscope.jewel.cli.ValidationErrorBuilderImpl.validate(ValidationErrorBuilderImpl.java:64)
    at com.lexicalscope.jewel.cli.validation.ArgumentValidatorImpl.finishedProcessing(ArgumentValidatorImpl.java:104)
    at com.lexicalscope.jewel.cli.ArgumentCollectionBuilder.processArguments(ArgumentCollectionBuilder.java:129)
    at com.lexicalscope.jewel.cli.AbstractCliImpl.parseArguments(AbstractCliImpl.java:42)
    at com.lexicalscope.jewel.cli.CliFactory.parseArguments(CliFactory.java:67)
    at org.apache.ctakes.core.pipeline.PiperFileRunner.run(PiperFileRunner.java:39)
    at org.apache.ctakes.core.pipeline.PiperFileRunner.main(PiperFileRunner.java:30)
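
A hypothetical pre-flight check can flag this kind of split argument before the pipeline starts.  It is not part of cTAKES or jewel-cli: the allow-list below contains only the options from the command in the original question, while the errors above show that the runner accepts more (-x, -m, -u, -t), so the list would need extending before real use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PiperArgsPreflight {
    // Hypothetical allow-list: only the options used in the command above.
    static final Set<String> KNOWN_OPTIONS = new HashSet<>(Arrays.asList(
            "-i", "--inputDir", "--xmiOut", "--user", "--key"));

    /** Returns human-readable problems; an empty list means nothing obvious is wrong. */
    static List<String> check(String[] args) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (!arg.startsWith("-")) {
                // A bare word where an option was expected, e.g. the tail of a split path.
                problems.add("Stray value '" + arg + "'");
            } else if (!KNOWN_OPTIONS.contains(arg)) {
                problems.add("Unknown option '" + arg + "' (was a value split by a space?)");
            } else if (i + 1 >= args.length || args[i + 1].startsWith("-")) {
                // Assumes no legitimate option value begins with '-'.
                problems.add("Option '" + arg + "' has no value");
            } else {
                i++; // consume this option's value
            }
        }
        return problems;
    }
}
```

For example, `check(new String[]{"-i", "input", "-folder"})` reports the unknown option `-folder`, which points straight at the stray space.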

Peter


On Thu, Apr 7, 2022 at 1:46 AM Anand, Ankit (Campus)  wrote:

> Hi,
>
> I installed developer’s version of ctakes and I was able to build it
> successfully.
>
> Now, when I try to run the default Clinical pipeline using command –
>
> bin/runClinicalPipeline.sh -i input --xmiOut output --user ankitk --key umlsAPIKey
>
> I am inside the trunk directory, which has the bin, input (where I have
> stored the file to process) and output directories.
>
> I am getting an error message –
>
> log4j: reset attribute= "false".
> log4j: Threshold ="null".
> log4j: Retreiving an instance of org.apache.log4j.Logger.
> log4j: Setting [ProgressAppender] additivity to [false].
> log4j: Level value for ProgressAppender is [INFO].
> log4j: ProgressAppender level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%m].
> log4j: Adding appender named [noEolAppender] to category [ProgressAppender].
> log4j: Retreiving an instance of org.apache.log4j.Logger.
> log4j: Setting [ProgressDone] additivity to [false].
> log4j: Level value for ProgressDone is [INFO].
> log4j: ProgressDone level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%m%n].
> log4j: Adding appender named [eolAppender] to category [ProgressDone].
> log4j: Level value for root is [INFO].
> log4j: root level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%d{dd MMM  HH:mm:ss} %5p %c{1} - %m%n].
> log4j: Adding appender named [consoleAppender] to category [root].
> Exception in thread "main" com.lexicalscope.jewel.cli.ArgumentValidationException:
> Option must have a value: [--inputDir -i value] : path to the directory containing the clinical notes to be processed.
> Option must have a value: [--option_x -x value]
> Option must have a value: [--option_m -m value]
> Unexpected Option: O
> Option must have a value: [--option_u -u value]
>     at com.lexicalscope.jewel.cli.ValidationErrorBuilderImpl.validate(ValidationErrorBuilderImpl.java:64)
>     at com.lexicalscope.jewel.cli.validation.ArgumentValidatorImpl.finishedProcessing(ArgumentValidatorImpl.java:104)
>     at com.lexicalscope.jewel.cli.ArgumentCollectionBuilder.processArguments(ArgumentCollectionBuilder.java:129)
>     at com.lexicalscope.jewel.cli.AbstractCliImpl.parseArguments(AbstractCliImpl.java:42)
>     at com.lexicalscope.jewel.cli.CliFactory.parseArguments(CliFactory.java:67)
>     at org.apache.ctakes.core.pipeline.PiperFileRunner.main(PiperFileRunner.java:27)
>
> Can you please help me fix this!
>
> -Ankit.
>
> Best,
>
> Ankit Anand
> Graduate Assistant | Informatics Institute, UAB<
> https://www.uab.edu/medicine/informatics/>
> Department of Computer Science, UAB<
> https://www.uab.edu/cas/computerscience/>
> University of Alabama at Birmingham
>
>


Re: doubt

2022-04-05 Thread Peter Abramowitsch
Bavithra,

There are many possible encapsulations of cTakes as a service. You are free
to create your own, as I have done.  But for the team and particularly for
the developer of  cTakes Web Rest,  it would be useful to document in some
detail what your experience with it has been.

Peter


On Tue, Apr 5, 2022 at 8:18 AM Chaithanya Sree 
wrote:

> Dear Developer,
>I am having doubts in ctakes web rest
>
> Thanks,
> Bavithra
>


Re: Segment annotation type

2022-03-22 Thread Peter Abramowitsch
Hi Greg
I don't bother about segments but have been pretty successful using this to
get a document's sections.

add org.mitre.medfacts.uima.ZoneAnnotator SectionRegex=org/mitre/medfacts/uima/section_regex.xml

Have you checked out this annotator?  It creates "Heading" types and the
config file above is a good place to start from.  It has a nice ability to
normalize section types so that if note types A and B both have assessment
sections that are titled somewhat differently, you can have them both
tagged with the same label.
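
Conceptually, that normalization maps several heading spellings onto one label.  The Java sketch below shows only the idea: it is not the ZoneAnnotator code, it does not read the section_regex.xml format, and its patterns and labels (`assessment_plan`, `history_present_illness`) are hypothetical.  The patterns are compiled once in the constructor rather than per document, the same concern as the re-initialization problem described in the next paragraph.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class SectionNormalizerSketch {
    // Heading pattern -> canonical label, checked in insertion order.
    private final Map<Pattern, String> rules = new LinkedHashMap<>();

    SectionNormalizerSketch() {
        // Hypothetical rules, compiled once here rather than for every document.
        rules.put(Pattern.compile(
                "^\\s*(assessment\\s*(and|&|/)\\s*plan|a/p|impression\\s*(and|/)\\s*plan)\\s*:",
                Pattern.CASE_INSENSITIVE), "assessment_plan");
        rules.put(Pattern.compile(
                "^\\s*(history of present illness|hpi)\\s*:",
                Pattern.CASE_INSENSITIVE), "history_present_illness");
    }

    /** Returns "offset:label" for every line that begins with a recognized heading. */
    List<String> findSections(String note) {
        List<String> found = new ArrayList<>();
        int offset = 0;
        for (String line : note.split("\n", -1)) {
            for (Map.Entry<Pattern, String> rule : rules.entrySet()) {
                if (rule.getKey().matcher(line).find()) {
                    found.add(offset + ":" + rule.getValue());
                    break; // first matching rule wins
                }
            }
            offset += line.length() + 1; // +1 for the '\n' removed by split
        }
        return found;
    }
}
```

Here "Assessment and Plan:", "A/P:" and "Impression/Plan:" all come out as `assessment_plan`, which is the kind of cross-note-type labeling described above.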

The annotator had some rough edges and unwanted printing in the log which
I've recently modified.  Also I did some optimization of the code which was
wasting compute cycles by re-initializing itself for every document.   I
can check it in, but you can get a good flavor of it by trying what's in
the codebase already.

Peter




On Tue, Mar 22, 2022 at 6:33 PM Greg Silverman  wrote:

> How do I modify org.apache.ctakes.typesystem.type.textspan.Segment to
> actually create annotations for document segments/sections?
>
> Also, how do I disable annotations for the SemanticRoleRelation annotation
> type?
>
> Thanks!
>
> Greg--
>
> --
> Greg M. Silverman
> Senior Systems Developer
> NLP/IE 
> Department of Surgery
> University of Minnesota
> g...@umn.edu
>


Question for Sean et al?

2022-02-10 Thread Peter Abramowitsch
Hi all

I've started using the mitre ZoneAnnotator and am making some optimizations and
necessary changes which prevent it from leaking note text into the log file
(horrors!!!)

But I have a question.  It turns out that back in 2012 someone left raw
System.out calls in the mastif-zoner code, which comes to cTakes as a
dependency.  They are not part of the cTakes repo.

These print
input (character offset): 0
input (character offset): 34
...
in the log,  which can get really annoying when you have 100 million notes
as I have.
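If the choice is to live with the dependency for now, one stopgap is to redirect `System.out` around the noisy call. This is a minimal sketch, assuming the call into the zoner can be wrapped in a `Supplier`; it is not part of cTAKES. `System.out` is JVM-wide, so output from other threads is also swallowed while the call runs, and `synchronized` only keeps two wrapped calls from interleaving their save and restore. Patching or absorbing the library remains the cleaner fix.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.util.function.Supplier;

final class QuietCall {
    private QuietCall() {}

    /** Runs the call with System.out pointed at a discarding stream, then restores it. */
    static synchronized <T> T withStdoutSilenced(Supplier<T> call) {
        PrintStream original = System.out;
        System.setOut(new PrintStream(OutputStream.nullOutputStream()));
        try {
            return call.get();
        } finally {
            // Restore even if the wrapped call throws.
            System.setOut(original);
        }
    }
}
```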

Here's my question ... what would you do?

   - Would you try to see about changing it in the original code (not sure
   who maintains it)
   - Use a modified version locally  (mind my own business)
   - Absorb this small library into the cTakes repo and remove the external
   dependency?

Peter


Re: Ctakes + UMLS dictionary

2022-01-18 Thread Peter Abramowitsch
As distributed, it contains the mappings of CUIs to the 2015 SNOMED and
RxNorm vocabularies. It does not contain ICD-9 or ICD-10 mappings. But
creating a custom dictionary is a normal aspect of any serious installation;
this is how you can incorporate more recent versions of the UMLS and other
vocabularies. See the cTAKES Dictionary Creator for more information.
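For orientation, the heart of a custom dictionary build is selecting UMLS concept rows by source vocabulary. The sketch below shows only that filtering step and is not the Dictionary Creator's implementation. It assumes the standard pipe-delimited `MRCONSO.RRF` layout (CUI in field 0, LAT in field 1, SAB in field 11, CODE in field 13, STR in field 14) and SAB names such as `RXNORM`, `SNOMEDCT_US`, or `ICD10CM`. Which SABs are selected determines which codes a lookup can return; ICD codes, for instance, only appear if an ICD vocabulary is selected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class MrconsoFilter {
    record Term(String cui, String sab, String code, String text) {}

    // Field positions in the pipe-delimited MRCONSO.RRF layout.
    private static final int CUI = 0, LAT = 1, SAB = 11, CODE = 13, STR = 14;

    private MrconsoFilter() {}

    /** Keeps English rows whose source vocabulary (SAB) is in wantedSabs. */
    static List<Term> select(Iterable<String> lines, Set<String> wantedSabs) {
        List<Term> out = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split("\\|", -1); // -1 keeps empty fields
            if (f.length <= STR) {
                continue; // malformed or truncated row
            }
            if (!"ENG".equals(f[LAT]) || !wantedSabs.contains(f[SAB])) {
                continue;
            }
            out.add(new Term(f[CUI], f[SAB], f[CODE], f[STR]));
        }
        return out;
    }
}
```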

On Tue, Jan 18, 2022, 7:36 AM Shyam Bhimani  wrote:

> Hello,
>
>
>
> When I dug a little deeper, I found the information below on the cTAKES
> wiki. Does it mean the default clinical pipeline uses the 2015 versions of
> SNOMED, RxNorm, ICD9, ICD10? Please advise.
>
>
>
>
>
> Shyam Bhimani
>
>
>
> *From:* Shyam Bhimani 
> *Sent:* Thursday, January 13, 2022 8:10 PM
> *To:* dev@ctakes.apache.org
> *Subject:* Ctakes + UMLS dictionary
>
>
>
> Hello,
>
>
>
> I am new to cTAKES and having a hard time understanding what year/version
> of each dictionary (SNOMED-CT, RxNorm, ICD9, etc.) is used by the cTAKES
> default clinical pipeline.
>
> I have some medication names that are not being picked up by cTAKES, e.g.
> dupilumab and dupixent, so I am trying to understand why. Please advise.
>
>
>
> TIA
>
>
>
> Shyam Bhimani
>
> *Software Engineer*
>
>
>
> *Target RWE *


Re: What URLs does ctakes use

2022-01-13 Thread Peter Abramowitsch
The only one you need for ctakes is
https://utslogin.nlm.nih.gov

The other one (uts-ws.nlm.nih.gov) is for accessing the NLM's APIs for
content, such as the Metathesaurus. It is not used by ctakes.

peter

On Thu, Jan 13, 2022 at 12:11 PM John Doe  wrote:

> Hello,
>
> I'm trying to run ctakes in an environment where requests are restricted.
> It failed to authenticate my UMLS account (timeout error), which I assume
> is because it is using the UMLS API, which accesses
> https://utslogin.nlm.nih.gov to authenticate my account. So, if I'm going
> to be able to run ctakes in this environment, I will need to allow access
> to this URL. I think I would also need https://uts-ws.nlm.nih.gov. Are
> there any other URLs that I would also need to allow?
>


Re: Performance of the cleartk history module [EXTERNAL]

2022-01-04 Thread Peter Abramowitsch
Hi Tim,
The performance boost was the frosting on the cake: I had to make changes
(at least for our team) because Negex was not working correctly in
sentences with multiple identified annotations, only some of which were
meant to be negated. Negex became over-eager, applying negation when it
shouldn't have. But even in its original version it was much more
effective than the cleartk polarity module. Shifting from Polarity to
the original Negex was decidedly slower; you could feel it.
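The over-eager behavior is usually a scope problem: a trigger's negation leaks onto later concepts in the same sentence. Original NegEx bounds a trigger's reach and ends it at termination terms such as "but". The following is a toy plain-Java illustration of that idea, not the cTAKES Negex code or Peter's fix; the trigger list, terminator list, and 5-token window are assumptions.

```java
import java.util.List;
import java.util.Set;

final class NegationScope {
    private static final Set<String> TRIGGERS = Set.of("no", "denies", "without");
    private static final Set<String> TERMINATORS = Set.of("but", "however", "except");
    private static final int WINDOW = 5; // how many tokens a trigger can reach (assumed)

    private NegationScope() {}

    /** Returns a parallel array: true where a token falls inside some trigger's scope. */
    static boolean[] negatedTokens(List<String> tokens) {
        boolean[] negated = new boolean[tokens.size()];
        for (int i = 0; i < tokens.size(); i++) {
            if (!TRIGGERS.contains(tokens.get(i).toLowerCase())) {
                continue;
            }
            for (int j = i + 1; j < tokens.size() && j <= i + WINDOW; j++) {
                if (TERMINATORS.contains(tokens.get(j).toLowerCase())) {
                    break; // scope ends at a termination term
                }
                negated[j] = true;
            }
        }
        return negated;
    }
}
```

With the tokens `denies fever but has cough`, only `fever` is marked; without the terminator check, the 5-token window would also reach `cough`.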

However, you're right that it would be good to benchmark it and get some
real numbers. But as I say, the need to fix some of its problems was the
primary issue. I suspect that the CPU load from the regexes wasn't a big
issue in the early days of Negex, when testing on grammatical biomedical
text with only a few negex trigger patterns. But with 310 potential
patterns and extremely dense notes it can make a real difference. The
compiled regex from each pattern is fairly complex as well.

I don't like code that does unnecessary work (literally billions of times
in my case)  - and in a large suite like cTakes all the little coding
shortcuts that waste CPU do add up.

I'll do a test and publish the results when I check in the code.

On Tue, Jan 4, 2022 at 8:54 PM Miller, Timothy <
timothy.mil...@childrens.harvard.edu> wrote:

> Peter,
> That sounds really useful! Were you able to benchmark it for runtime on a
> reasonably sized sample of your notes? Just curious because I wouldn't have
> expected regex to be that much of a bottleneck.
> Tim
>
>
> On Tue, 2022-01-04 at 17:36 -0800, Peter Abramowitsch wrote:
>
> * External Email - Caution *
>
>
>
> Thank you for the fulsome and humorous response.  Yes, I understand
>
> perfectly.  We definitely think along the same lines.  One of the drawbacks
>
> of static and simple to understand utility functions like JCasUtil's  is
>
> that one can just slap things together without getting to grips with the
>
> wastage of resources that sometimes occur.
>
>
> This brings me to the topic of Negex.  I've done a lot of improvements to
>
> it, also after I sent you that version last year.  It has been well tested
>
> in over 100 million notes so i think I can check it in.  But back to
>
> performance - it used to execute 200+ regular expressions multiple times on
>
> every sentence covering an identified annotation regardless of whether
>
> there was any hope of any of them matching.   My solution was to build an
>
> inverted index of the compiled expressions keyed on unique words found in
>
> the expressions, so based on the sentence,  I could look up and execute
>
> only the expressions that might match.  This might cut the number of regex
>
> operations down to 5 or 10 and sometimes none at all.There were many
>
> other changes that related to negation detection, of course.  For instance
>
> - handling sentences that switch between negating and non negating phrases
>
> within the same sentence.
>
>
> Peter
>
>
> On Tue, Jan 4, 2022 at 10:47 AM Finan, Sean <
>
> <mailto:sean.fi...@childrens.harvard.edu>
>
> sean.fi...@childrens.harvard.edu
>
> > wrote:
>
>
> Great question.
>
>
> The package name "windowed" isn't helpfully self-descriptive.  It contains
>
> yet another bit of code that I wrote as quickly as possible to help
>
> somebody in real-time with a problem.
>
> * There is only a 'procedural' difference between the two.  The models and
>
> methods are the same.
>
>
> The assertion engine has a bunch of objects delegating to objects
>
> delegating to more objects.  Each object calls one or more
>
> JCasUtil.select() frequently for the same types.  They also redundantly
>
> call JCasUtil.selectCovered() and selectCovering() for the same types.
>
>
> process( jcas ) {
>
>   Collection<..> sentences = ...select(..);
>
>   delegateA.do( sentences );
>
> }
>
> class DelegateA {
>
>   void do( Collection<..> sentences ) {
>
>for ( Sentence sentence : sentences ) {
>
>   Collection tokens = JCasUtil.selectCovered( jcas,
>
> Token.class, sentence );
>
>   delegateB.use( tokens );
>
>  }
>
> }
>
> class DelegateB {
>
>   void use( Collection<..> tokens ) {
>
>  Collection sentence = JCasUtil.selectCovering( jcas,
>
> Sentece.class, tokens );
>
> ...
>
>   }
>
> }
>
>
> The above isn't an exact representation, but you get the point.
>
> The problem with code like this is repeated traversal of the (object)
>
> array in the cas.  Every JCasUtil.select* pours through the whole thing.
>
> For a small document with a sma

Re: Performance of the cleartk history module [EXTERNAL]

2022-01-04 Thread Peter Abramowitsch
Thank you for the fulsome and humorous response. Yes, I understand
perfectly. We definitely think along the same lines. One of the drawbacks
of static, simple-to-understand utility functions like JCasUtil's is
that one can just slap things together without getting to grips with the
waste of resources that sometimes occurs.

This brings me to the topic of Negex. I've made a lot of improvements to
it, including some after I sent you that version last year. It has been
well tested on over 100 million notes, so I think I can check it in. But
back to performance: it used to execute 200+ regular expressions multiple
times on every sentence covering an identified annotation, regardless of
whether there was any hope of any of them matching. My solution was to
build an inverted index of the compiled expressions keyed on unique words
found in the expressions, so that, based on the sentence, I could look up
and execute only the expressions that might match. This might cut the
number of regex operations down to 5 or 10, and sometimes none at all.
There were many other changes related to negation detection, of course;
for instance, handling sentences that switch between negating and
non-negating phrases within the same sentence.
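The inverted-index idea can be sketched in plain Java as follows. This is an illustration under assumptions, not the checked-in Negex code: each compiled pattern is filed by hand under literal words such that any match must contain at least one of them (deriving those words from the regex source automatically is left out), and a sentence only runs the patterns filed under words it actually contains.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

final class TriggerIndex {
    private final Map<String, List<Pattern>> byWord = new HashMap<>();

    /** Files a pattern under words, at least one of which every match must contain. */
    void add(Pattern pattern, String... keyWords) {
        for (String w : keyWords) {
            byWord.computeIfAbsent(w.toLowerCase(Locale.ROOT), k -> new ArrayList<>()).add(pattern);
        }
    }

    /** Patterns worth running on this sentence (deduplicated, in first-seen order). */
    Set<Pattern> candidates(String sentence) {
        Set<Pattern> out = new LinkedHashSet<>();
        for (String word : sentence.toLowerCase(Locale.ROOT).split("\\W+")) {
            List<Pattern> filed = byWord.get(word);
            if (filed != null) {
                out.addAll(filed);
            }
        }
        return out;
    }

    /** Runs only the candidate patterns and returns those that actually match. */
    List<Pattern> matching(String sentence) {
        List<Pattern> hits = new ArrayList<>();
        for (Pattern p : candidates(sentence)) {
            if (p.matcher(sentence).find()) {
                hits.add(p);
            }
        }
        return hits;
    }
}
```

A sentence such as "Vitals stable" contains no indexed word, so no regex runs at all.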

Peter

On Tue, Jan 4, 2022 at 10:47 AM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Great question.
>
> The package name "windowed" isn't helpfully self-descriptive.  It contains
> yet another bit of code that I wrote as quickly as possible to help
> somebody in real-time with a problem.
> * There is only a 'procedural' difference between the two.  The models and
> methods are the same.
>
> The assertion engine has a bunch of objects delegating to objects
> delegating to more objects.  Each object calls one or more
> JCasUtil.select() frequently for the same types.  They also redundantly
> call JCasUtil.selectCovered() and selectCovering() for the same types.
>
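> public void process( JCas jcas ) {
>   Collection<Sentence> sentences = JCasUtil.select( jcas, Sentence.class );
>   delegateA.run( jcas, sentences );
> }
> class DelegateA {
>   void run( JCas jcas, Collection<Sentence> sentences ) {
>     for ( Sentence sentence : sentences ) {
>       // one scan of the CAS per sentence
>       List<Token> tokens = JCasUtil.selectCovered( jcas, Token.class, sentence );
>       delegateB.use( jcas, tokens );
>     }
>   }
> }
> class DelegateB {
>   void use( JCas jcas, List<Token> tokens ) {
>     for ( Token token : tokens ) {
>       // another scan per token, to rediscover the sentence DelegateA already had
>       List<Sentence> covering = JCasUtil.selectCovering( jcas, Sentence.class, token );
>       ...
>     }
>   }
> }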
>
> The above isn't an exact representation, but you get the point.
> The problem with code like this is repeated traversal of the (object)
> array in the cas.  Every JCasUtil.select* pours through the whole thing.
> For a small document with a small cas (or early in a pipeline), that array
> may be small and the traversal fast.  However, when people are
> (unadvisably) processing a single document that sizes in the gigabyte
> range, repeatedly going through the cas takes a long time.
>
> So, what I did was create a single container object that holds Collections
> of the types of interest and their covering relationships, populate all
> that stuff once per process( jcas ) and pass that container through to each
> delegate object.  Basically, a jcas lite.  The biggest culprit in the
> assertion engines was repeatedly iterating over the array for covered and
> covering windows, hence the subpackage name "windowed".
>
> Is it faster for smaller docs?  Not so much.  Does it instantaneously
> process the Encyclopaedia Britannica as one text?  Of course not.  Is it
> orders of magnitude faster on such onerous docs?  In my tests, yes.
>
> Going through my delegating example above, the end delegate is the same.
> Hence the processing is the same and repeatable.  In my tests on both small
> and gargantuan documents the windowed version and the original version
> produced the same output.
>
> Sean
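The "jcas lite" idea boils down to computing covering relationships once per document and handing the result to every delegate. Below is a plain-Java sketch of that idea with a hypothetical `Span` record standing in for annotations: tokens are grouped under their covering sentence in one sort-and-sweep pass instead of one CAS scan per sentence. It assumes sentences do not overlap and is not the code in the `windowed` package; for real CAS types, uimaFIT's `JCasUtil.indexCovered` provides a comparable precomputed map.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CoveringIndex {
    record Span(int begin, int end) {}

    private CoveringIndex() {}

    /** Maps each sentence to the tokens it fully covers, in one sweep over sorted inputs. */
    static Map<Span, List<Span>> tokensBySentence(List<Span> sentences, List<Span> tokens) {
        List<Span> sortedSentences = new ArrayList<>(sentences);
        List<Span> sortedTokens = new ArrayList<>(tokens);
        sortedSentences.sort(Comparator.comparingInt(Span::begin));
        sortedTokens.sort(Comparator.comparingInt(Span::begin));

        Map<Span, List<Span>> result = new LinkedHashMap<>();
        int t = 0;
        for (Span s : sortedSentences) {
            List<Span> covered = new ArrayList<>();
            // Skip tokens that start before this sentence (e.g. between sentences).
            while (t < sortedTokens.size() && sortedTokens.get(t).begin() < s.begin()) {
                t++;
            }
            // Consume tokens starting inside this sentence; keep those that also end inside it.
            while (t < sortedTokens.size() && sortedTokens.get(t).begin() < s.end()) {
                Span tok = sortedTokens.get(t++);
                if (tok.end() <= s.end()) {
                    covered.add(tok);
                }
            }
            result.put(s, covered);
        }
        return result;
    }
}
```

Sorting costs O(n log n) once per document; every delegate then reads the map instead of re-scanning the CAS.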
>
>
>
>
>

Re: Performance of the cleartk history module [EXTERNAL]

2022-01-04 Thread Peter Abramowitsch
Hi Sean
Ok..  I was confused whether I was meant to find it in the sources.
But while you're reading this, is there a brief way to describe the
difference between the older package

org.apache.ctakes.assertion.medfacts.cleartk;
and
org.apache.ctakes.assertion.medfacts.cleartk.windowed

Peter





On Tue, Jan 4, 2022 at 7:47 AM Finan, Sean 
wrote:

> Hi Peter,
>
> I created a second engine that just used text matching or regular
> expressions given the discovered events.  It also uses covering section
> types, formatted text and other things, but the text match might be the
> most impactful item.
>
> You are an accomplished developer so the email scratch below is for the
> benefit of others who search archives.
>
> class LazyHistoryFinder extends JCasAnnotator_ImplBase {
>   private static final String[] HISTORY = { "history of", "h/o", "h / o" };
>
>   private static boolean isHistory( final EventMention event ) {
>     final String text = event.getCoveredText().toLowerCase();
>     return Arrays.stream( HISTORY ).anyMatch( text::startsWith );
>   }
>
>   @Override
>   public void process( final JCas jcas ) throws AnalysisEngineProcessException {
>     JCasUtil.select( jcas, EventMention.class )
>             .stream()
>             .filter( LazyHistoryFinder::isHistory )
>             .forEach( e -> e.setHistoryOf( CONST.NE_HISTORY_OF_PRESENT ) );
>   }
> }
>
> It requires a stroll through the monstrous cas array and it certainly
> isn't sexy, but it gets the job done.
>
> Sean
>
>
>
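Sean's `startsWith` check catches the rows in Peter's table where the cue sits inside the event span; rows where the cue precedes the span ("H/O: postpartum psychosis") need a look just before the span as well. A minimal plain-Java sketch of that string test, with an assumed cue pattern and an assumed 20-character lookback; note that it would also fire on "family history of", which a real annotator would need to exclude.

```java
import java.util.regex.Pattern;

final class HistoryCue {
    // "history of", "h/o", "h / o", optionally followed by ':' (assumed cue list).
    private static final String CUE = "\\b(?:history\\s+of|h\\s*/\\s*o)\\s*:?";
    private static final Pattern AT_START = Pattern.compile("^\\s*" + CUE, Pattern.CASE_INSENSITIVE);
    private static final Pattern AT_END = Pattern.compile(CUE + "\\s*$", Pattern.CASE_INSENSITIVE);
    private static final int LOOKBACK = 20; // characters before the event to inspect (assumed)

    private HistoryCue() {}

    /** True if the cue opens the event span or immediately precedes it. */
    static boolean isHistory(String docText, int begin, int end) {
        String covered = docText.substring(begin, end);
        if (AT_START.matcher(covered).find()) {
            return true;
        }
        String before = docText.substring(Math.max(0, begin - LOOKBACK), begin);
        return AT_END.matcher(before).find();
    }
}
```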


Re: Performance of the cleartk history module [EXTERNAL]

2022-01-03 Thread Peter Abramowitsch
Thanks Sean

By "following engine", you mean a second instance of the history engine
that uses only the event spans, or you modified the current one to traverse
the event-span within the context window? I see you made some source
changes in that area and will check tomorrow.

Peter

On Mon, Jan 3, 2022 at 2:26 PM Finan, Sean 
wrote:

> Hi Peter,
>
> I have noticed this and just added a following engine that recognized text
> within event spans.  It is a lazy solution, but it fit my needs and
> available time.
>
> Sean


Performance of the cleartk history module

2022-01-03 Thread Peter Abramowitsch
Hi All

I've noticed that the HistoryCleartkAnalysisEngine misses many common forms
of subject history, including the obvious "h/o" prefix. Looking into the
distribution, there's a model.jar and what appears to be a weights file
containing trigger words:
resources/org/apache/ctakes/assertion/models/history.txt, where h, o, and /
are all given their own weights. But I'm not sure that they're actually
used in this way: see below. However, there's also a tiny file,
/org/apache/ctakes/assertion/semantic_classes/history.txt,
which does contain a few entries including "h/o", which I assume is used
for training but is never referred to anywhere.

Here's the behavior I'm seeing:

example input             | condition                                            | term found | history feature marked | range text
history of pregnancies    | "history of" included in the cu_term and prefterm    | yes        | no                     | history of pregnancies
history of adenopathy     | "history of" not included in the cu_term or prefterm | yes        | yes                    | adenopathy
H/O postpartum psychosis  | "h/o" not included in the prefterm or cu_term        | yes        | yes                    | postpartum psychosis
H/O: postpartum psychosis | "h/o" not included in the prefterm or cu_term        | yes        | no                     | postpartum psychosis
H/O pregnancies           | "h/o" included in the cu_term                        | yes        | no                     | h/o pregnancies

You can see that it is quite perverse -  there is a pattern suggesting that
if the concept definition occupies the history words, then they cannot be
seen by the history annotation engine.

Has anyone else noticed this - and have they done anything about it?

Peter


Re: empty preferredText [EXTERNAL]

2021-12-07 Thread Peter Abramowitsch
I think the issue is that preferred text in the dictionary is only
populated by matches from the "dest" vocabularies, and it uses *their*
preferred text. If there's no match in any of them, then it should put the
CUI's own preferred text into the dictionary, but it doesn't. I'm
pretty sure it's available during the dictionary creation process, but
it's probably not used.
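On the consuming side, the workaround discussed in this thread amounts to a fallback chain. A minimal sketch with hypothetical inputs rather than cTAKES types: prefer the dictionary's preferred text, then the CUI's own preferred name if you keep one, and only then the covered text, which is flagged because it comes from the note and may contain PHI.

```java
import java.util.Optional;

final class DisplayName {
    /** The chosen label, plus whether it was copied from note text (possible PHI). */
    record Choice(String text, boolean fromNoteText) {}

    private DisplayName() {}

    static Choice choose(String dictionaryPreferred, String cuiPreferred, String coveredText) {
        return nonBlank(dictionaryPreferred).map(t -> new Choice(t, false))
                .or(() -> nonBlank(cuiPreferred).map(t -> new Choice(t, false)))
                .orElseGet(() -> new Choice(coveredText, true));
    }

    private static Optional<String> nonBlank(String s) {
        return (s == null || s.isBlank()) ? Optional.empty() : Optional.of(s);
    }
}
```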

On Tue, Dec 7, 2021 at 6:22 PM Miller, Timothy <
timothy.mil...@childrens.harvard.edu> wrote:

> OK, I thought this might be what's happening. I did check my 2021 UMLS
> release and the cui does seem to have a preferred text but I think my
> container is using an older release. For what it's worth the CUI is:
> C0360554
>
> and a sentence that reproduces the issue in CVD with the current release
> is:
>
> 'Patient had problems tolerating oral hydrocortisone.'
>
> I will see if I can find the older UMLS release lying around. I think the
> right workaround for now is your suggestion of using the covered text.
>
> Tim
>
>


Re: empty preferredText

2021-12-07 Thread Peter Abramowitsch
Hi Tim,

Yes, I've definitely encountered it.   It happens when the concept has a
CUI_TERM which has matched the text, but there is no corresponding entry in
the SNOMED or other vocab table mapping CUI to SNOMED.  The obvious choice
is to use the covered text as a surrogate, but technically it could be PHI
if that matters to you.  The other thing is to see if there's an MSH term
that maps using the metathesaurus.  If so, including MSH in your dictionary
as a src AND dest vocab will solve the problem.

Peter


On Tue, Dec 7, 2021 at 5:45 PM Miller, Timothy <
timothy.mil...@childrens.harvard.edu> wrote:

> Hello,
> I'm using the dictionary lookup (through ctakes-web-rest) and trying to
> read off the preferredText that comes back as a human-readable way to
> display the CUI. On a very small percentage, there does not seem to be any
> preferredText. Has anyone else encountered this? Is this a limitation of
> the underlying ontologies or a bug we can address?
> Tim
>
>


Question about Relation Extractors

2021-10-27 Thread Peter Abramowitsch
Thanks Sean and Tim for the background & code on these annotators & models.

Just looking at how the EventTimeRelationAnnotator works, I think the
internal representation would be a bit different, but I get the gist of it
for sure. Unless I'm using outdated code, the "actual data" is captured in
the SemanticRoleRelations, Predicate, and the Semantic Arguments.
Strangely, the BinaryTextRelation is declared in the CAS but never actually
used.

I found a couple of good resources on this:

https://aclanthology.org/W04-2412.pdf
https://aclanthology.org/D08-1008.pdf
https://web.stanford.edu/~jurafsky/slp3/slides/22_SRL.pdf

Peter


followup question

2021-10-27 Thread Peter Abramowitsch
Hi Sean,   I've been doing a bit of reading on propbanks, framesets, etc in
relation to what I'm seeing in the  CAS when I turn on some of the relation
extractors that do work (in contrast to the ones I mentioned before that
are missing a model).

Is it safe to say that these extractors are mostly experiments used to
validate semantic approaches proposed back in the day, and that there isn't
yet much code that pulls these relations out in any user-friendly way?   By
user-friendly I mean  creating a simpler "edge object" that simply joins
two identified events in a way that hides all the intermediate collections
of structures generated by these relation extractors and their dependencies.
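Such an edge object might look like the following plain-Java sketch. Everything here is hypothetical: `Predicate` and `Argument` are simplified stand-ins for what the extractors leave in the CAS, and the flattening rule (one edge per predicate that has both an `ARG0` and an `ARG1`, labeled with the predicate text) is an assumption for illustration, not how cTAKES models relations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class EdgeFlattener {
    record Argument(String role, String text) {}
    record Predicate(String text, List<Argument> arguments) {}
    /** The simplified "edge": two mentions joined by a relation label. */
    record Edge(String from, String relation, String to) {}

    private EdgeFlattener() {}

    /** One edge per predicate that has both an ARG0 and an ARG1 (assumed rule). */
    static List<Edge> flatten(List<Predicate> predicates) {
        List<Edge> edges = new ArrayList<>();
        for (Predicate p : predicates) {
            Optional<String> a0 = argument(p, "ARG0");
            Optional<String> a1 = argument(p, "ARG1");
            if (a0.isPresent() && a1.isPresent()) {
                edges.add(new Edge(a0.get(), p.text(), a1.get()));
            }
        }
        return edges;
    }

    private static Optional<String> argument(Predicate p, String role) {
        return p.arguments().stream()
                .filter(a -> a.role().equals(role))
                .map(Argument::text)
                .findFirst();
    }
}
```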

Peter


Another question about relationship extractors

2021-10-27 Thread Peter Abramowitsch
Hi (probably Sean),  are the default model.jars for the
*CausesBringsAboutRelationExtractorAnnotator* and the
*ManagesTreatsRelationExtractorAnnotator* not part of the cTakes
sources? I looked through the source at all pipers and all unit tests
and on the net and I didn't find references to the usage of these
annotators.  When I run with them, they are definitely looking for models
of their own, and there is code to do the training, but this is an area
that's still a mystery to me.  Are these models proprietary to U of
Colorado which is where the source seems to come from?

Peter


Re: Question about use of Time Annotators in 4.0.1 (trunk) [EXTERNAL]

2021-10-26 Thread Peter Abramowitsch
That's great info, thank you, Sean!
I keep forgetting that the sub pipers may have more recent information than
some of the unit tests.

P.


On Tue, Oct 26, 2021 at 3:24 PM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Peter,
>
> I use the piper files, and temporal sub piper TemporalSubPipe.piper in
> ctakes-temporal-res/src/main/resources/org/apache/ctakes/temporal/pipeline
>
> contains the following:
>
> // Commands and parameters to create a default temporal processing
> // sub-pipeline.  This is not a full pipeline.
>
> // 'Generic' Events.  Use addDescription and let the EventAnnotator set
> // itself up with defaults.
> addDescription EventAnnotator
>
> // Times.  Use addLogged to log start and finish of processing.  There
> // aren't default models, so set specifically
> add BackwardsTimeAnnotator classifierJarPath=/org/apache/ctakes/temporal/ae/timeannotator/model.jar
>
> // DocTimeRel: the relation bin for Events to the Document Creation Time.
> add DocTimeRelAnnotator classifierJarPath=/org/apache/ctakes/temporal/ae/doctimerel/model.jar
>
> // Event - Time binary relations.
> add EventTimeRelationAnnotator classifierJarPath=/org/apache/ctakes/temporal/ae/eventtime/model.jar
>
> // Event - Event binary relations.
> add EventEventRelationAnnotator classifierJarPath=/org/apache/ctakes/temporal/ae/eventevent/model.jar
>
>
> The last time that I ran this it completed successfully.
>
> Sean
>
>


Question about use of Time Annotators in 4.0.1 (trunk)

2021-10-26 Thread Peter Abramowitsch
I have a couple of questions about the TimeAnnotators, forward and backward.

1.  The BackwardsTimeAnnotator complains that it doesn't know whether it is
in training mode when there is no "inTraining" parameter. But when I
supply it with the value false, it then complains that it doesn't have a
classifier jar path, as if it now really thinks it's in training! So
what's the trick to make it happy that it's not in training?

2.  The unit tests for the time annotators contain two extra pipeline
steps:

   - CopyNPChunksToLookupWindowAnnotations.class
   - RemoveEnclosedLookupWindows.class

Are these needed for regular use of the Time annotators, or are they just a
unit test feature?

Peter


Re: An exception occured while executing the Java class. URI is not hierarchical [EXTERNAL]

2021-08-19 Thread Peter Abramowitsch
Regarding the Dependency Parser, it is part of the release.
Just use the PiperCreator app and look for it in the list of modules. It
will tell you its prerequisites, and therefore where in your Piper file it
belongs.

addDescription ClearNLPDependencyParserAE

- Peter

On Thu, Aug 19, 2021 at 2:39 AM Benjamin hansen 
wrote:

> Thanks for the input.
> So I am trying to use
> the ClinicalPipelineFactory.getTokenProcessingPipeline, which includes
> LVG, which seems to be having the issue. But you say I can just replace the
> LVG part with the DependencyParser in that pipeline? Can you elaborate a
> bit on where I can find the DependencyParser?
>
>
> On Wed, Aug 18, 2021 at 5:56 PM Peter Abramowitsch <
> pabramowit...@gmail.com>
> wrote:
>
> > Hi Benjamin,
> >
> > If what you're looking for are the lemmas of tokens, just use the
> > DependencyParser instead of LVG
> > I had another problem with LVG as well, but I think that it might be
> simply
> > that all the needed resources are not being copied into the right place.
> > This was done without the lvg
> >
> > {
> >   "_type": "ConllDependencyNode",
> >   "sofa": 1,
> >   "begin": 3,
> >   "end": 10,
> >   "id": 2,
> >   "form": "decided",
> >   "lemma": "decide",
> >   "cpostag": "VBD",
> >   "postag": "VBD",
> >   "feats": "_",
> >   "head": 137,
> >   "deprel": "root",
> >   "pdeprel": "_"
> > }
> >
> > On Wed, Aug 18, 2021 at 7:29 AM Finan, Sean <
> > sean.fi...@childrens.harvard.edu> wrote:
> >
> > > Hi Benjamin,
> > >
> > > My first question is: what pipeline are you trying to run?
> > >
> > > My second question is: Do you really need to use LVG?
> > >
> > > Sean
> > > 
> > > From: Benjamin hansen 
> > > Sent: Wednesday, August 18, 2021 3:07 AM
> > > To: dev@ctakes.apache.org
> > > Subject: An exception occured while executing the Java class. URI is
> not
> > > hierarchical [EXTERNAL]
> > >
> > > * External Email - Caution *
> > >
> > >
> > > While working at a simple pipeline example I got this error:
> > >
> > > *java.lang.IllegalArgumentException*: *URI is not hierarchical*
> > >
> > > *at* java.io.File. (*File.java:420*)
> > >
> > > *at* org.apache.ctakes.lvg.resource.LvgCmdApiResourceImpl.load (
> > > *LvgCmdApiResourceImpl.java:65*)
> > >
> > >
> > > I found that this issue has already been reported 4 years ago here
> > >
> > >
> >
> https://urldefense.com/v3/__https://issues.apache.org/jira/browse/CTAKES-445__;!!NZvER7FxgEiBAiR_!9HBmXkq30TUdwnSpuHc8_7iEVkoMAiJ3p_rSTXE5d90TARHEdioOMNukOUaL6eB5CboTRBMsOYI$
> > >
> > >
> > > I am on MacOS which the workaround patched proposed in that thread does
> > not
> > > fix... And like the last comment in the thread says - the patch likely
> > also
> > > does not work on linux.
> > >
> > >
> > > This seems to be quite a serious bug, since both macOS and Linux are
> > > serious development and production platforms for ctakes users.
> > >
> > >
> > > Is there no fix for this after 4 years?
> > >
> >
>


Re: An exception occured while executing the Java class. URI is not hierarchical [EXTERNAL]

2021-08-18 Thread Peter Abramowitsch
Hi Benjamin,

If what you're looking for are the lemmas of tokens, just use the
DependencyParser instead of LVG.
I had another problem with LVG as well, but I think that it might simply be
that not all of the needed resources are being copied into the right place.
This was done without LVG:

{
  "_type": "ConllDependencyNode",
  "sofa": 1,
  "begin": 3,
  "end": 10,
  "id": 2,
  "form": "decided",
  "lemma": "decide",
  "cpostag": "VBD",
  "postag": "VBD",
  "feats": "_",
  "head": 137,
  "deprel": "root",
  "pdeprel": "_"
}

On Wed, Aug 18, 2021 at 7:29 AM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Benjamin,
>
> My first question is: what pipeline are you trying to run?
>
> My second question is: Do you really need to use LVG?
>
> Sean
> 
> From: Benjamin hansen 
> Sent: Wednesday, August 18, 2021 3:07 AM
> To: dev@ctakes.apache.org
> Subject: An exception occured while executing the Java class. URI is not
> hierarchical [EXTERNAL]
>
>
>
> While working on a simple pipeline example, I got this error:
>
> *java.lang.IllegalArgumentException*: *URI is not hierarchical*
>
> *at* java.io.File. (*File.java:420*)
>
> *at* org.apache.ctakes.lvg.resource.LvgCmdApiResourceImpl.load (
> *LvgCmdApiResourceImpl.java:65*)
>
>
> I found that this issue was already reported 4 years ago here:
>
> https://urldefense.com/v3/__https://issues.apache.org/jira/browse/CTAKES-445__;!!NZvER7FxgEiBAiR_!9HBmXkq30TUdwnSpuHc8_7iEVkoMAiJ3p_rSTXE5d90TARHEdioOMNukOUaL6eB5CboTRBMsOYI$
>
>
> I am on macOS, which the workaround patch proposed in that thread does not
> fix... And as the last comment in the thread says, the patch likely also
> does not work on Linux.
>
>
> This seems to be quite a serious bug, since both macOS and Linux are
> serious development and production platforms for ctakes users.
>
>
> Is there no fix for this after 4 years?
>
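The "URI is not hierarchical" message in the stack trace above is thrown by the `java.io.File(URI)` constructor. That constructor only accepts absolute, hierarchical `file:` URIs. One plausible cause, though not confirmed here against the LVG code, is a resource that is packaged inside a jar. Such a resource has an opaque `jar:file:...!/...` URI, so turning it straight into a `File` fails. The minimal Java sketch below reproduces the exception and shows one generic workaround for single-file resources: copy the stream to a temporary file. The class name and the jar path are made up for illustration. This is not the cTAKES or LVG implementation.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ResourceFiles {

    /** True if java.io.File(URI) will accept this URI: absolute, hierarchical, scheme "file". */
    static boolean isFileBacked(URI uri) {
        return uri.isAbsolute() && !uri.isOpaque() && "file".equalsIgnoreCase(uri.getScheme());
    }

    /**
     * Returns a File for the resource. When the resource only exists inside a jar
     * (opaque "jar:" URI), it is copied to a temp file first. Only suitable for single files.
     */
    static File toFile(URI uri) throws IOException {
        if (isFileBacked(uri)) {
            return new File(uri);
        }
        Path tmp = Files.createTempFile("resource-", ".tmp");
        tmp.toFile().deleteOnExit();
        try (InputStream in = uri.toURL().openStream()) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp.toFile();
    }

    public static void main(String[] args) {
        // Hypothetical path; any resource packaged in a jar has this opaque shape.
        URI inJar = URI.create("jar:file:/opt/ctakes/lib/example.jar!/config/example.properties");
        System.out.println(inJar.isOpaque()); // true
        try {
            new File(inJar);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // URI is not hierarchical
        }
    }
}
```

Copying one file is not enough for a component such as LVG, which reads a whole directory tree of data. In that case the more practical route is usually to make sure the resources exist unpacked on disk. This fits Peter's guess that the needed resources are not being copied into the right place.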


Re: ctakes activity gauge

2021-07-30 Thread Peter Abramowitsch
Hi Ben
If you watch my presentation from ApacheCon you'll see how we went about
mass extraction of notes.   This video contains two presentations and mine
starts about halfway through:   https://www.youtube.com/watch?v=F5WCCPWz7Z0

But in the same conference thread, there were two other groups working on
similar projects using different approaches.
Here's one of them: https://www.youtube.com/watch?v=kZw42pGzyHs

Peter

On Thu, Jul 29, 2021 at 11:25 AM Benjamin hansen 
wrote:

> Thank you for these insights Peter.
> Your project sounds very interesting. Are you using the UIMA pipeline on a
> cluster to process that many notes? And how long does it take?
>
> I have been considering using uimaFIT + cTAKES together with Apache Spark
> for distributed computation. I saw a video from Philip Ogren from 2014
> describing this, but unfortunately he does not give any details on how he
> did it.
> Would you by any chance know where I can find more information about how to
> achieve this?
>
> Best regards
>
> On Thu, Jul 29, 2021 at 6:21 PM Peter Abramowitsch <
> pabramowit...@gmail.com>
> wrote:
>
> > Hi Ben,
> >
> > I can only speak for myself, but I am using cTakes extensively at two major
> > California universities in multiple projects. The kinds of customizations I
> > am doing are mostly specific to the facility and to the project and
> > therefore wouldn't be suitable for inclusion in the source repository. We
> > have just finished using it to extract concepts from 102 million notes.
> >
> > Unless I am wrong, updates are going into the Apache SVN repository and
> > GitHub has acted as a backup repo. It's true that there isn't a well
> > organized update, bug-fix, and release team and schedule. Others can speak
> > to that too, but I suspect that part of this is due to the fact that the
> > core is very stable and most modifications and enhancements are, like mine,
> > local to a project. However, there has been talk, but not much definitive
> > action, on two initiatives: to upgrade to the current version of UIMA and
> > to include the Ruta engine as one of the pluggable components.
> >
> > The user base is fairly substantial for a project this specialized.  I
> > suggest you have a look at the presentations at the cTakes thread of the
> > 2020 ApacheCon conference.
> >
> > You'll have to search through this list:
> > https://www.youtube.com/playlist?list=PLU2OcwpQkYCy_awEe5xwlxGTk5UieA37m
> >
> > By all means, come and join us!
> >
> > Regards, Peter
> >
> >
> >
> > On Thu, Jul 29, 2021 at 12:24 AM Benjamin hansen <
> > benjaminkakke...@gmail.com>
> > wrote:
> >
> > > Hi all,
> > >
> > >
> > > I hope it's okay that I send this mail here; I was not sure where else to
> > > pose my question.
> > >
> > >
> > > We are considering using cTakes in our applications; however, I got
> > > concerned when I saw the lack of activity in the GitHub repository, which
> > > is why I want to ask:
> > >
> > >
> > >
> > > Is cTakes still being actively developed and maintained, or have
> > > developers gone on to develop other systems instead?
> > >
> > >
> > > Is there any kind of up-to-date roadmap for the active development of
> > > cTakes?
> > >
> > >
> > > What is the cTakes user base like these days? Is it growing, dwindling,
> > > stable, non-existent?
> > >
> > >
> > >
> > >
> > > Thanks in advance.
> > >
> >
>


Re: ctakes activity gauge

2021-07-29 Thread Peter Abramowitsch
Hi Ben,

I can only speak for myself, but I am using cTakes extensively at two major
California universities in multiple projects. The kinds of customizations I
am doing are mostly specific to the facility and to the project and
therefore wouldn't be suitable for inclusion in the source repository. We have
just finished using it to extract concepts from 102 million notes.

Unless I am wrong, updates are going into the Apache SVN repository and
GitHub has acted as a backup repo. It's true that there isn't a well
organized update, bug-fix, and release team and schedule. Others can speak to
that too, but I suspect that part of this is due to the fact that the
core is very stable and most modifications and enhancements are, like mine,
local to a project. However, there has been talk, but not much definitive
action, on two initiatives: to upgrade to the current version of UIMA and
to include the Ruta engine as one of the pluggable components.

The user base is fairly substantial for a project this specialized.  I
suggest you have a look at the presentations at the cTakes thread of the
2020 ApacheCon conference.

You'll have to search through this list:
https://www.youtube.com/playlist?list=PLU2OcwpQkYCy_awEe5xwlxGTk5UieA37m

By all means, come and join us!

Regards, Peter



On Thu, Jul 29, 2021 at 12:24 AM Benjamin hansen 
wrote:

> Hi all,
>
>
> I hope it's okay that I send this mail here; I was not sure where else to pose
> my question.
>
>
> We are considering using cTakes in our applications; however, I got
> concerned when I saw the lack of activity in the GitHub repository, which is
> why I want to ask:
>
>
>
> Is cTakes still being actively developed and maintained, or have developers
> gone on to develop other systems instead?
>
>
> Is there any kind of up-to-date roadmap for the active development of
> cTakes?
>
>
> What is the cTakes user base like these days? Is it growing, dwindling,
> stable, non-existent?
>
>
>
>
> Thanks in advance.
>


Re: UMLS Changes

2021-07-15 Thread Peter Abramowitsch
Hi Dylan,

I guess you missed many emails about this. The new regime has been in place
since January, and changes have been made to the 4.0.0.1 release and on the
trunk to accommodate the use of an API key in place of a user name and password.
If you go to the ctakes wiki, you will find the information you need. The first
thing to do is log into your UMLS account, go to the profile section, and get
your API key string.

Peter

Sent from my iPad

> On Jul 15, 2021, at 06:49, Dylan Price-Ginno  
> wrote:
> 
> To whom it may concern,
> 
> UMLS has changed the way you log in: you no longer have a password
> but log in via a Google account or a university account. How does
> this work when trying to operate ctakes?
> 
> Warm regards
> Dylan


Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

2021-05-19 Thread Peter Abramowitsch
Sean & everyone, totally agree. Ruta is an obvious candidate because it
is already so tightly coupled to UIMA. It provides a very rich overlay to
the annotations and the type system. Does anyone know if Ruta instances
are thread safe (assuming the JCas is in thread-local storage)? I saw one
conversation from a while ago asking the same question, but I don't think I
saw an answer.

At times I've wondered whether a more generic rules engine that exposed
rules to the CAS could also be useful. The logic wouldn't be restricted to
doing text interrogation. Like Ruta, it would access the JCas via a rules
language, but a predicate-wiring API could provide support for a wide range
of operations involving external logic and data. Another useful feature would
be the ability to invoke the rules stage multiple times in the same pipeline
with different rule sets. Perhaps all this could already be handled by Ruta's
extension mechanism.
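Below is a minimal Java sketch of the kind of rules stage described above. It rests on several assumptions:

- `Mention` is a hypothetical, simplified stand-in for a CAS annotation together with the words of its covering chunk. No real cTAKES or UIMA types are used.
- Rules are plain `java.util.function.Predicate`s.
- "Predicate wiring" here only means that a rule condition can be any predicate, including one backed by external data.
- `run` can be called at several points in a pipeline with different rule lists.

The example rule is Kean Kaufmann's `DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention` from later in this thread. This is neither Ruta nor an existing cTAKES API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class RuleStageSketch {

    /** Hypothetical stand-in for an annotation plus the words of the chunk that covers it. */
    record Mention(String type, String text, List<String> chunkWords) {}

    /** A rule: a condition over a mention and the annotation type to emit when it holds. */
    record Rule(String name, Predicate<Mention> when, String emitType) {}

    /** Runs one rule set; a pipeline could call this at several points with different sets. */
    static List<Mention> run(List<Rule> rules, List<Mention> mentions) {
        List<Mention> added = new ArrayList<>();
        for (Mention m : mentions) {
            for (Rule r : rules) {
                if (r.when().test(m)) {
                    added.add(new Mention(r.emitType(), m.text(), m.chunkWords()));
                }
            }
        }
        return added;
    }

    /** Text-level condition: mention type plus a word in the same chunk matching a regex. */
    static Predicate<Mention> typeWithWord(String type, String regex) {
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        return m -> m.type().equals(type)
                && m.chunkWords().stream().anyMatch(w -> p.matcher(w).matches());
    }

    /** Condition wired to external data; a real one might query a database or a service. */
    static Predicate<Mention> textIn(Set<String> externalList) {
        return m -> externalList.contains(m.text().toLowerCase());
    }
}
```

Because conditions are ordinary predicates, they compose with `Predicate.and`/`or`. For example, `typeWithWord("DiseaseDisorderMention", "screen(ing)?").and(textIn(curatedList))` restricts the screening rule to mentions found in an externally curated list.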

Peter


On Wed, May 19, 2021 at 5:30 AM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi all,
> Correct.
>
> Tim  is correct in the sense that he is using a custom dictionary (custom
> synonyms, cuis, etc.) which kind of changes the "rules" of what the
> standard dictionary lookup considers a valid term based upon available
> tokens in the text.  There are other simple settings that further qualify
> how the standard dictionary lookup accepts or discards synonyms.
>
> I think that what Greg is asking about is something with introduced
> "logic" that can alter or remove terms already discovered by the standard
> dictionary lookup.
>
> Peter and Kean both outline some custom annotators that they have created
> to use logic that can alter/add/remove terms discovered by the standard
> dictionary lookup.  I do the same thing for different projects and advise
> everybody that applies ctakes to specific domains do the same.
>
> ctakes is a general purpose tool and results can definitely be improved
> when catered to a more narrow purpose.
>
> Back to Greg, I got the feeling that he might be interested in a more
> versatile annotator.  Introducing an engine that can utilize something like
> ruta has several advantages:
> 1.  You  can "easily" add complex rules in one place.
> 2.  You can change rules external to code ...
>   2a. the same pipeline can be catered to different projects without
> changing code in an annotator or creating a new annotator.
>   2b.  An end user who knows nothing about ctakes can change a ruta script
> to fit their purposes.
> 3. Rules are supported and documented by uima ruta, so you don't have to
> worry about that extra headache.
> 4. Once Greg adds it to apache ctakes (right? ;^) everybody in the
> community can apply ruta rules to their project.
>
> When I looked at it a few years ago it was for reason 2b.  In the end we
> went for different annotators like Peter and Kean outlined and just use
> piper file changes to satisfy #2 as that is definitely much easier.
> However, it doesn't benefit the community as a whole (#4).
>
> Cheers all, this is a great conversation!
>
> Sean
>
>
>
>
> 
> From: Kean Kaufmann 
> Sent: Wednesday, May 19, 2021 7:50 AM
> To: dev@ctakes.apache.org
> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
>
>
>
> > yes,  the line between "lookup" and rule execution is a little blurry
> sometimes.
>
> Sure is.  I blur it with a set of annotators that extend dictionary
> annotations based on words or annotations covered by the same Chunk, e.g.
>
> DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention
> MedicationMention + /dependenc[ey]|addiction/i = DiseaseDisorderMention
> DiseaseDisorderMention + AnatomicalSiteMention in same Chunk = DiseaseDisorderMention
> ProcedureMention + AnatomicalSiteMention in same Chunk = ProcedureMention
>
> Higher recall than the regular UmlsLookupAnnotator;
> higher precision than the UmlsOverlapLookupAnnotator (which skips a
> specified number of tokens regardless of syntax).
>
> I've been wanting a more general framework to fit this into, and thinking
> it might be Ruta.
> Thanks for the pointer to TokensRegex; I'll look at that as well.
>
>
> On Tue, May 18, 2021 at 5:39 PM Peter Abramowitsch <
> pabramowit...@gmail.com>
> wrote:
>
> > Hi All,  yes,  the line between "lookup" and rule execution is a little
> > blurry sometimes.   Here's some more blurriness.
> >
> > I've done something related, adapting a UIMA tokens regex engine for
> > Ctakes.  You create a new type in the TypeSystem.  In my case it uses
> > CONLLDEP Annotations as the tokens to reason over.   

Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

2021-05-18 Thread Peter Abramowitsch
Hi All,  yes,  the line between "lookup" and rule execution is a little
blurry sometimes.   Here's some more blurriness.

I've done something related, adapting a UIMA tokens regex engine for
cTakes. You create a new type in the TypeSystem. In my case it uses
CONLLDEP annotations as the tokens to reason over. You can set up
expressions (rules) that look like this
(yes, this case is already covered by the dictionary, but it's an example):

Matcher A:   (lemma=="be");
Matcher B:   /partially|partly/;
Matcher C:   /vaccinated/;

Rule  vaccinated|CUI1234|SNOMED5678:  A? B?  C;

You get the Annotation you've delegated to this task, with the entity
value "vaccinated|1234|5678" and the range spanning the tokens that
caused the annotation rule to fire.

(See Stanford's Tokens Regex)
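To make the `A? B? C` rule above concrete, here is a small hand-rolled Java matcher. It is not Stanford TokensRegex and not Peter's cTAKES adaptation; `Token`, `Element`, and `Match` are hypothetical records. Each rule element tests one token (by lemma or by a case-insensitive regex on the surface form) and may be optional. A rule fires on a contiguous run of tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class TokenRuleSketch {

    /** Hypothetical token: surface form plus lemma (e.g. taken from a dependency node). */
    record Token(String form, String lemma) {}

    /** One rule element: a test on a single token, possibly optional (like "A?"). */
    record Element(Predicate<Token> test, boolean optional) {}

    /** A rule firing: entity string plus token span [begin, end). */
    record Match(String entity, int begin, int end) {}

    static Element lemma(String value, boolean optional) {
        return new Element(t -> t.lemma().equals(value), optional);
    }

    static Element form(String regex, boolean optional) {
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        return new Element(t -> p.matcher(t.form()).matches(), optional);
    }

    /** End index of a match of elements[e..] starting at token i, or -1 if none. */
    static int matchFrom(List<Token> tokens, int i, List<Element> elements, int e) {
        if (e == elements.size()) {
            return i;
        }
        Element el = elements.get(e);
        // First try to consume token i with this element...
        if (i < tokens.size() && el.test().test(tokens.get(i))) {
            int end = matchFrom(tokens, i + 1, elements, e + 1);
            if (end >= 0) {
                return end;
            }
        }
        // ...otherwise skip the element if it is optional.
        return el.optional() ? matchFrom(tokens, i, elements, e + 1) : -1;
    }

    /** Scans left to right and returns non-overlapping, non-empty matches. */
    static List<Match> apply(String entity, List<Element> rule, List<Token> tokens) {
        List<Match> matches = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int end = matchFrom(tokens, i, rule, 0);
            if (end > i) {
                matches.add(new Match(entity, i, end));
                i = end;
            } else {
                i++;
            }
        }
        return matches;
    }
}
```

The rule `A? B? C` corresponds to `List.of(lemma("be", true), form("partially|partly", true), form("vaccinated", false))`. On the tokens "Patient was partially vaccinated", it fires once, covering token indices 1 to 3 (end index 4, exclusive). A real engine would also map token indices back to character offsets in the CAS.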

Peter


On Tue, May 18, 2021 at 1:29 PM Miller, Timothy <
timothy.mil...@childrens.harvard.edu> wrote:

> But Sean, isn't what he's asking for essentially already implemented in
> cTAKES as the custom dictionary? I'm currently using that approach for my
> covid container:
>
> https://github.com/Machine-Learning-for-Medical-Language/ctakes-covid-container
> Tim
>
> 
> From: Finan, Sean 
> Sent: Tuesday, May 18, 2021 11:55 AM
> To: dev@ctakes.apache.org
> Cc: Himanshu Shekhar Sahoo
> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
>
>
>
> Hi Greg,
>
> From 30,000 ft, I think that you would want to use the RutaEngine.
>
>
> https://urldefense.com/v3/__https://uima.apache.org/d/ruta-current/tools.ruta.book.html*ugr.tools.ruta.ae.basic__;Iw!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickztninUTU$
>
> https://urldefense.com/v3/__https://javadoc.io/doc/org.apache.uima/ruta-core/latest/org/apache/uima/ruta/engine/RutaEngine.html__;!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickzI7QF5CI$
>
> https://urldefense.com/v3/__http://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/main/java/org/apache/uima/ruta/engine/RutaEngine.java__;!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickzJJ96zT4$
>
> That seems to be the actual analysis engine that loads and uses rules to
> create annotations.
> While you could use an xml descriptor or use the piper "set" command and
> do things like mapping ruta to ctakes type systems, I would take the
> alternate approach of "copying" the initialize(..) and process (..) methods
> and modify them to use ctakes types directly.
>
> Disclaimer:  I know very little about uima ruta.  At some point I did look
> into it but it was for a specific (ctakes-derivative) project and I didn't
> go further than basic doc perusal.
>
> If you move forward with this please let us all know what you find.  I
> think that there will be great interest in the community.
>
> Sean
> 
> From: Greg Silverman 
> Sent: Tuesday, May 18, 2021 11:13 AM
> To: dev@ctakes.apache.org
> Cc: Himanshu Shekhar Sahoo
> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
>
>
>
> Hi Sean,
> I was wondering if there was a way to use rule-based lookup of a custom
> lexicon within cTAKES (say, a locally curated list of COVID-19 symptoms).
> When I Googled around, I stumbled on UIMA Ruta, but couldn't find anything
> with respect to cTAKES specifics.
>
> Thanks!
>
>
> Greg--
>
> On Tue, May 18, 2021 at 10:04 AM Finan, Sean <
> sean.fi...@childrens.harvard.edu> wrote:
>
> >  To which ctakes component(s) are you referring?
> > 
> > From: Greg Silverman 
> > Sent: Sunday, May 16, 2021 6:02 PM
> > To: dev@ctakes.apache.org; Himanshu Shekhar Sahoo
> > Subject: rule-based lookup for custom lexicon [EXTERNAL]
> >
> >
> >
> > I looked all over and could not find any information on how to add this
> > pipeline component to cTAKES. I assume it uses UIMA Ruta?
> >
> > Thanks in advance!
> >
> > Greg--
> > --
> > Greg M. Silverman
> > Senior Systems Developer
> > NLP/IE <
> >
> https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!6hN356eDesvWNYzsrDMaXgF6IkZw313QU2QUQw5M8Jysvh1K1JxjEBeztZicX1DM2jC0o7_0qAA$
> > >
> > Department of Surgery
> > University of Minnesota
> > g...@umn.edu
> >
>
>
> --
> Greg M. Silverman
> Senior Systems Developer
> NLP/IE <
> https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!8uKf_4SXyKdCmvlMHvRGddxlzofg64D4_zsPdCThqeMAyn2akyMNI8wqM6yNUZA2N93F-aAsR7I$
> >
> Department of Surgery
> University of Minnesota
> g...@umn.edu
>


Re: Healthcare Division for Apache

2021-05-03 Thread Peter Abramowitsch
Hi Javi,

I'm a long-term ctakes user and contributor who has worked in various
professional, incubator, and standards-creating environments for
healthcare. I like your idea in theory, but I see an on-the-ground issue
with a specific health-oriented section of Apache, and that is a lack of
content. Unless I have missed something, cTakes is the only mature Apache
project in this space; it is not as if there is a set of domain-related
projects in this space waiting to be promoted.

There would also be something materially different about a Healthcare
specific component of Apache - which is not to say it isn't a valid idea,
but it is worth discussing. The difference is that almost all of the
projects in Apache are embedded in the information technology domain and
its challenges - storage, search, management, presentation etc - but tend
to be agnostic with regard to the information content over which the Apache
technology is implemented.

Health, on the other hand, is its own specific domain, so the tasks,
problems, and solutions here are not just about creation, transport and
presentation of health related information, but what the information
actually is, how it is used, and by whom.  Here we're talking about
clinical informatics and a host of other information spaces that require
very high levels of experience and understanding that are tangential to the
problems of system infrastructure.

If I could invent an analogy...  A lot of Apache's software is probably
being used in the logistics domain,  warehousing, supply chain
optimization, package deliveries and such, but would that be a reason to
create an Apache shipping and delivery software presence?

So this is my hesitation about endorsing a Health initiative at the
moment.  As I've said, it would be a wonderful idea, but the cart is
somewhat before the horse in my opinion.   More localized and feasible
might be a cTakes user group that goes beyond just this email address.

Regards
Peter


Re: multi-threads on REST client?

2021-03-25 Thread Peter Abramowitsch
I did a different implementation in my own REST service, where I
instantiated not only a fresh CAS but a fresh pipeline for each thread-pool
object. It's memory hungry but safe: it ran for weeks with 35
simultaneous threads and zero errors. I wasn't convinced that all the AEs
were thread-safe enough to be shared. So have a look at how the REST
server sets up its thread pool and what it does after the service has
completed a request in terms of releasing resources. On RHEL, you may also
need to increase your ulimit if you're getting connection-refused errors
under heavy load. (I discovered that myself.)
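The following is a minimal Java sketch of the one-pipeline-per-thread approach described above. It assumes that `Pipeline` is a hypothetical stand-in for a cTAKES AnalysisEngine plus its own CAS; building a real one (for example from a piper file) is omitted. Each worker thread in a fixed pool lazily creates its own pipeline through a `ThreadLocal`, so no analysis engine or CAS is ever shared. The owner check in `process` would throw if one were.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerThreadPipelines {

    /** Hypothetical stand-in for a cTAKES pipeline (AnalysisEngine) plus its own CAS. */
    static final class Pipeline {
        private final Thread owner = Thread.currentThread();
        private final StringBuilder cas = new StringBuilder(); // stand-in for a JCas

        String process(String note) {
            if (Thread.currentThread() != owner) {
                throw new IllegalStateException("pipeline used from a second thread");
            }
            cas.setLength(0); // like resetting the CAS between documents
            cas.append(note.trim().toLowerCase());
            return cas.toString();
        }
    }

    // Each worker thread builds its own pipeline on first use; nothing is shared.
    private static final ThreadLocal<Pipeline> PIPELINE = ThreadLocal.withInitial(Pipeline::new);

    static List<String> processAll(List<String> notes, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String note : notes) {
                futures.add(pool.submit(() -> PIPELINE.get().process(note)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // results come back in submission order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("note processing failed", e.getCause());
        } finally {
            pool.shutdown(); // pipelines become unreachable once the worker threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println(processAll(List.of(" Note A ", "Note B", "NOTE C"), 2));
        // [note a, note b, note c]
    }
}
```

The trade-off is the one Peter mentions: memory grows with the number of threads, because each thread holds a full pipeline. In this sketch, the pipelines are released when the pool shuts down. A long-running service would instead keep the pool for its lifetime and size it to the available memory.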

Peter

On Thu, Mar 25, 2021 at 3:26 PM Miller, Timothy <
timothy.mil...@childrens.harvard.edu> wrote:

> Just wondering what the logistics of this are. The REST interface has a
> CAS pool of 10, and when it gets a new request, it grabs a CAS and
> sends it into a pipeline. So what happens if the REST endpoint is
> getting hit by tons of different requests at the same time? I'm
> experimenting with this in python and getting hard to understand errors
> (as best I can tell, it looks like it's complaining that the output is
> None). Just wondering if anyone has any insight about what's going on
> on the server side and whether a) this _should_ work, b) it _could_
> work if done properly.
>
> Thanks
> Tim
>
>


Re: Issue with dictionary creator? [EXTERNAL]

2021-03-18 Thread Peter Abramowitsch
Thanks Sean and Eugenia,

I'm glad that it's not just me. I'll do some stats on this. I also found
another possible issue where there's no current SNOMED mapping for a CUI
(perhaps one of them is obsolete). What happens then is that the CUI-based
entry remains in TUI and in CUI_TERMS, but there's none in either SNOMEDCT
or PREFTERM.

Peter

On Thu, Mar 18, 2021 at 1:28 PM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Peter, Eugenia,
>
> I haven't noticed the occurrence, but can believe that this could happen.
> I am not sure why or how.  I suppose you could run two counts to find out
> how many are missing - I don't want to know.
>
> There are probably a few things that could be done to 'cover' this
> problem, but the best would be making the dictionary creator fill in the
> blanks.
>
> Sean
> 
> From: Monogyiou, Eugenia 
> Sent: Thursday, March 18, 2021 6:34 AM
> To: dev@ctakes.apache.org
> Subject: RE: Issue with dictionary creator? [EXTERNAL]
>
>
>
> Just to clarify, the cases I encountered were not valid conceptually but
> appeared to be valid, i.e. had a CUI, TUI and SNOMED code. I used "today" as
> an example to show exactly that "conceptual" issue, but it has been many
> months since the last time I encountered this, so I don't have any proper
> examples to list at the moment.
>
> Kind Regards,
>
> Eugenia Monogyiou | NTT Data UK
> Consulting & IT Solutions Ltd. 1 Royal Exchange, London EC3V 3DG
>
> Mob: +44 (0)7971623683 Email: eugenia.monogy...@nttdata.com
>
>
> -Original Message-
> From: Monogyiou, Eugenia
> Sent: 18 March 2021 10:27
> To: dev@ctakes.apache.org
> Subject: RE: Issue with dictionary creator?
>
> Hi Peter,
>
> Yes, I have (before I started using the cased format), and it was only
> medications indeed; however, they were drugs that should not have been
> annotated as such in the first place, e.g. "today" as an antibiotic, which
> led me to think it may have had something to do with broken links, or even
> residuals from efforts to "fix" certain entries perhaps? Our cohort was for
> heart attack, so not a very broad range of meds was present in the letters;
> perhaps I did not encounter "valid" cases just out of luck because of the
> specific cohort...?
>
> Kind Regards,
>
> Eugenia Monogyiou
>
> -Original Message-
> From: Peter Abramowitsch 
> Sent: 18 March 2021 10:19
> To: dev@ctakes.apache.org
> Subject: Issue with dictionary creator?
>
> Has anyone seen an issue where a dictionary is created from UMLS sources
> where there is no entry in PREFTERM for a valid CUI that is present in TUI,
> CUI_TERMS, and SNOMEDCT_US?
>
> It seems to be happening in certain medication mentions where there is a
> base drug instance and then various forms.  It could be that one of the
> form descriptions is obsolete, but that wouldn't explain why it was only
> half-present in the resulting dictionary.
>
> For instance:
>
> clobetasol CUI 8992 has an entry in every table
> clobetasol emollient CUI 4520933 has all information in every table
> clobetasol topical CUI 3207574 is missing only from PREFTERM
>
> cui_term
> '3207574','0','2','clobetasol topical','clobetasol'
> '3207574','0','8','clobetasol - containing product in cutaneous dose
> form','clobetasol'
>
> Tui
> '3207574','200'
>
> Snomed
> '3207574','771278006'
>
> Prefterm
> Blank.
>
> Peter
>


Issue with dictionary creator?

2021-03-18 Thread Peter Abramowitsch
Has anyone seen an issue where a dictionary is created from UMLS sources
where there is no entry in PREFTERM for a valid CUI that is present in TUI,
CUI_TERMS, and SNOMEDCT_US?

It seems to be happening in certain medication mentions where there is a
base drug instance and then various forms.  It could be that one of the
form descriptions is obsolete, but that wouldn't explain why it was only
half-present in the resulting dictionary.

For instance:

clobetasol CUI 8992 has an entry in every table
clobetasol emollient CUI 4520933 has all information in every table
clobetasol topical CUI 3207574 is missing only from PREFTERM

cui_term
'3207574','0','2','clobetasol topical','clobetasol'
'3207574','0','8','clobetasol - containing product in cutaneous dose
form','clobetasol'

Tui
'3207574','200'

Snomed
'3207574','771278006'

Prefterm
Blank.
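Sean suggests running counts to see how many CUIs are affected. The following is a minimal Java sketch of that consistency check. It assumes the CUI column of each dictionary table has already been read into a set, for example with one `SELECT` per table; that loading code is omitted. The method reports, for every CUI in CUI_TERMS, which of the other tables have no row for it. The table names and sample CUIs are taken from the clobetasol example above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class DictionaryGapCheck {

    /**
     * For every CUI in CUI_TERMS, lists the other tables that have no row for it.
     * Each table is represented only by the set of CUIs it contains.
     */
    static Map<Long, List<String>> findGaps(Set<Long> cuiTerms, Map<String, Set<Long>> otherTables) {
        Map<Long, List<String>> gaps = new TreeMap<>();
        for (Long cui : cuiTerms) {
            List<String> missing = new ArrayList<>();
            for (Map.Entry<String, Set<Long>> table : otherTables.entrySet()) {
                if (!table.getValue().contains(cui)) {
                    missing.add(table.getKey());
                }
            }
            if (!missing.isEmpty()) {
                gaps.put(cui, missing);
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        Map<String, Set<Long>> others = new LinkedHashMap<>();
        others.put("TUI", Set.of(8992L, 4520933L, 3207574L));
        others.put("SNOMEDCT_US", Set.of(8992L, 4520933L, 3207574L));
        others.put("PREFTERM", Set.of(8992L, 4520933L));
        System.out.println(findGaps(Set.of(8992L, 4520933L, 3207574L), others));
        // {3207574=[PREFTERM]}
    }
}
```

The same report also covers the second case Peter mentions, where a CUI is missing from both SNOMEDCT_US and PREFTERM: that CUI would be listed with both table names.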

Peter


Re: 4.0.0.1 patch [EXTERNAL]

2021-03-01 Thread Peter Abramowitsch
Hi Sean,

Apropos your last email, I'm never quite sure when/how to let Maven access
outside repositories. I notice that sometimes it is downloading the very
jars I am trying to build, but if I tell it to be local only, then the
build will sometimes fail because of other dependencies it needs to
update. It's a bit of an octopus. And so, if I were to create a Maven
submodule with my permissions, how could I be sure not to contaminate the
global archive when I certainly don't want to?

Peter

On Mon, Mar 1, 2021 at 6:13 PM Finan, Sean 
wrote:

> Hi Sean (M.),
>
> I don't know if this helps at all, but I usually keep all custom code in a
> separate maven submodule (e.g. ctakes-myproject) and just throw that module
> reference into the main ctakes pom.  There are a lot of benefits to doing
> things this way, including sharing code (vcs or directly), sharing prebuilt
> jars, version independence, etc. You might already do this for other
> things, in which case this is just a reminder.  You can still keep using
> the ctakes package system which makes piper specifications easier.
>
> Sean
>
>
> 
> From: Mullane, Sean *HS 
> Sent: Monday, March 1, 2021 11:19 AM
> To: dev@ctakes.apache.org
> Subject: RE: 4.0.0.1 patch [EXTERNAL]
>
>
>
> Sean,
>
> You're right that CuiFilterAnnotator is a custom annotator. It looks like
> I will need to merge the auth changes from 4.0.0.1 into our branch and
> recompile.
>
> Thanks,
> Sean
>
> -Original Message-
> From: Finan, Sean 
> Sent: Saturday, February 27, 2021 2:33 PM
> To: dev@ctakes.apache.org
> Subject: Re: 4.0.0.1 patch [EXTERNAL]
>
> Hi Sean (Mullane),
>
> For the original problem:
>
> > > No Analysis Component found for org.apache.ctakes.core.ae.CuiFilterAnnotator
>
> This means that somewhere in your piper file you have an "add" command for
> an analysis engine named "CuiFilterAnnotator".
>
> There is no  org.apache.ctakes.core.ae.CuiFilterAnnotator in ctakes 4.0.0
> or trunk:
>
> https://urldefense.com/v3/__https://svn.apache.org/repos/asf/ctakes/tags/ctakes-4.0.0/ctakes-core/src/main/java/org/apache/ctakes/core/ae/__;!!NZvER7FxgEiBAiR_!9EqSrWbJ5y-PF0tnZc30W49RI0H8lQQb2HgfBO659g3w5yAPh0C-ZyhCeNRsIc7y9t9Q3wkVLpM$
>
> https://urldefense.com/v3/__https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-core/src/main/java/org/apache/ctakes/core/ae/__;!!NZvER7FxgEiBAiR_!9EqSrWbJ5y-PF0tnZc30W49RI0H8lQQb2HgfBO659g3w5yAPh0C-ZyhCeNRsIc7y9t9QkmE1D-E$
>
> I also cannot find a class named "CuiFilterAnnotator" in my copy of trunk
> in any of the projects - including ytex.
>
> That leads me to believe that CuiFilterAnnotator is a custom annotator in
> your local installation, which I hope you didn't overwrite with a
> copy of 4.0.0.1.
>
> If you can find the CuiFilterAnnotator file in your previous installation,
> just copy it over in a custom jar.  After rerunning you may find another
> missing class, then another.  Such can be the pains of working with custom
> alterations.
>
> Sean
>
> 
> From: Peter Abramowitsch 
> Sent: Friday, February 26, 2021 12:27 PM
> To: dev@ctakes.apache.org
> Subject: Re: 4.0.0.1 patch [EXTERNAL]
>
> * External Email - Caution *
>
>
> I don't mean to butt in on another conversation, but for the UMLS
> authentication to work in 4.0.0.1, you would need to replace
> ctakes-dictionary-lookup-fast jar, the ctakes-dictionary-lookup.jar and for
> safety's sake the ctakes-core jar too.
>
> Peter
>
> On Fri, Feb 26, 2021 at 6:01 PM Mullane, Sean *HS <
> sp...@hscmail.mcc.virginia.edu> wrote:
>
> > Can you check my understanding of how to apply the patch? I replaced
> > the ctakes-core-4.0.0.jar file in ./lib with ctakes-core-4.0.0.1.jar.
> > I also replaced the user/pass values in ctakes.profile with the API key
> values.
> > Are there any steps I missed there?
> >
> > Sean
> >
> > -Original Message-
> > From: Miller, Timothy 
> > Sent: Friday, February 26, 2021 6:41 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: 4.0.0.1 patch [EXTERNAL]
> >
> > Hi Sean,
> > I can't answer your primary question, but my recollection is that
> > 4.0.0.1 was an absolutely minimalist change to just fix the
> > authentication, so I don't think ytex would've been touched.
> > Tim
> >
> >
> > On Thu, 2021-02-25 at 17:24 +, Mullane, Sean *HS wrote:
> > >
> > >
> > > Hello,
> > >
> > > I am just catching up with the NLM auth c

Re: 4.0.0.1 patch [EXTERNAL]

2021-02-26 Thread Peter Abramowitsch
I don't mean to butt in on another conversation, but for the UMLS
authentication to work in 4.0.0.1,
you would need to replace ctakes-dictionary-lookup-fast jar, the
ctakes-dictionary-lookup.jar and for safety's sake the ctakes-core jar too.

Peter

On Fri, Feb 26, 2021 at 6:01 PM Mullane, Sean *HS <
sp...@hscmail.mcc.virginia.edu> wrote:

> Can you check my understanding of how to apply the patch? I replaced the
> ctakes-core-4.0.0.jar file in ./lib with ctakes-core-4.0.0.1.jar. I also
> replaced the user/pass values in ctakes.profile with the API key values.
> Are there any steps I missed there?
>
> Sean
>
> -Original Message-
> From: Miller, Timothy 
> Sent: Friday, February 26, 2021 6:41 AM
> To: dev@ctakes.apache.org
> Subject: Re: 4.0.0.1 patch [EXTERNAL]
>
> Hi Sean,
> I can't answer your primary question, but my recollection is that
> 4.0.0.1 was an absolutely minimalist change to just fix the
> authentication, so I don't think ytex would've been touched.
> Tim
>
>
> On Thu, 2021-02-25 at 17:24 +, Mullane, Sean *HS wrote:
> >
> >
> > Hello,
> >
> > I am just catching up with the NLM auth changes. I tried replacing the
> > ctakes-core-4.0.0.jar file with ctakes-core-4.0.0.1.jar, and am
> > getting this error:
> >
> > ERROR [PiperFileRunner] MESSAGE LOCALIZATION FAILED: Can't find
> > resource for bundle java.util.PropertyResourceBundle, key No Analysis
> > Component found for org.apache.ctakes.core.ae.CuiFilterAnnotator
> >
> > I saw a message from Tim Miller from December mentioning removing ytex
> > components from ctakes-core. Was this done on the released version of
> > 4.0.0.1? We're using ytex so I wonder if that may be the cause of this
> > error. Or maybe applying the patch isn't as simple as drop-in
> > replacing the jar? (I changed the API key in my config files and that
> > seems to be working as expected).
> >
> > Thanks,
> > Sean
> >
> >
>


Looking for comparable experiences with mysql

2021-02-25 Thread Peter Abramowitsch
Hi all,

As an experiment I extracted my rather large HSQL UMLS dictionary into a
local MySQL instance and ran the equivalent of 3 simultaneous ctakes
pipelines with the overlap lookup annotator against it on a set of 1000
notes.

Comparing that with the same setup running against the traditional
in-memory HSQL database (three separate instances), I was surprised to find
that the MySQL implementation was 30% faster, even though it is an
out-of-process DB.
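"30% faster" can mean less wall-clock time or higher throughput. With two timed runs over the same 1000 notes, both figures are easy to state explicitly. This is a small sketch. The timings in `main` are hypothetical placeholders, not Peter's measurements.

```java
public class SpeedupReport {

    /** Percentage of wall-clock time saved by the candidate relative to the baseline. */
    public static double percentTimeSaved(double baselineSeconds, double candidateSeconds) {
        requirePositive(baselineSeconds, candidateSeconds);
        return 100.0 * (baselineSeconds - candidateSeconds) / baselineSeconds;
    }

    /** Percentage increase in throughput (notes per second) of the candidate over the baseline. */
    public static double percentThroughputGain(double baselineSeconds, double candidateSeconds) {
        requirePositive(baselineSeconds, candidateSeconds);
        return 100.0 * (baselineSeconds / candidateSeconds - 1.0);
    }

    private static void requirePositive(double a, double b) {
        if (a <= 0 || b <= 0) {
            throw new IllegalArgumentException("Timings must be positive");
        }
    }

    public static void main(String[] args) {
        // Hypothetical wall-clock times for the same note set; substitute measured values.
        double hsqlSeconds = 1000.0;
        double mysqlSeconds = 700.0;
        System.out.printf("time saved:      %.1f%%%n", percentTimeSaved(hsqlSeconds, mysqlSeconds));
        System.out.printf("throughput gain: %.1f%%%n", percentThroughputGain(hsqlSeconds, mysqlSeconds));
    }
}
```

With these placeholder numbers, the run takes 30% less time, which corresponds to roughly 43% higher throughput. When comparing results, it is worth saying which of the two is meant.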

Has that been anyone else's experience as well?  And if so, do you have any
experience with a MySQL-based UMLS dictionary with 100+ pipeline
connections?

Peter


Re: Identifying dates for identified annotations

2021-02-19 Thread Peter Abramowitsch
Hi Muhammad,

Obviously, the first thing to say is that you can only extract information
that is available in the note.  So if the note doesn't give you enough
clues, then there's no magic way to infer dates.   Sometimes what you want
is already in structured data that is available alongside the note.
Especially the dates of procedures.  That would be your most painless way
of getting that information.

Look for the note structure first - is it free text or is it just a
rendered table, where the date you're looking for is in another column?

But back to your topic,  one way I've tackled it in the past is to allow
for degrees of "accuracy" or fuzziness.  It was a while ago so I don't
remember which of the temporal annotators I was using, but the process is
the same.  Perhaps someone else here has spent time on this with the
current temporal annotators and can give you a more detailed answer.

1.   Start with the date of the note as the coarsest timestamp.  That
would have the lowest accuracy and could potentially be used as an anchor.
2.   Then see if there's a better anchor date in the note (perhaps at the
head of an inner section).
3.  Then use the temporal annotator to see if an actual date is
available for the procedure.  These are rarely found in free text, but when
present they would have the highest accuracy.
4.  Then look for relative time clues like "yesterday", "tomorrow",
"last monday", "two days ago".  A temporal annotator should be able to find
these.  If they're present, a relative date can be created by using the
anchor datetime as a baseline plus an offset, and calculating the putative
time of the event.
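The steps above amount to keeping several candidate dates and preferring the most trustworthy one. This is a minimal sketch with a hypothetical ranking: the note date is least trusted and an explicit date most trusted. Where relative hints belong in that order is a judgment call, and all names are illustrative.

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class DateCandidates {

    /** Hypothetical trust ranking, lowest first; declaration order defines the ranking. */
    public enum Source { NOTE_DATE, SECTION_ANCHOR, RELATIVE_HINT, EXPLICIT_DATE }

    /** A candidate event date together with where it came from. */
    public static final class Candidate {
        public final LocalDate date;
        public final Source source;

        public Candidate(LocalDate date, Source source) {
            this.date = date;
            this.source = source;
        }
    }

    /** Returns the candidate from the most trusted source, or empty if there are none. */
    public static Optional<Candidate> best(List<Candidate> candidates) {
        return candidates.stream().max(Comparator.comparing((Candidate c) -> c.source));
    }

    public static void main(String[] args) {
        List<Candidate> found = List.of(
                new Candidate(LocalDate.of(2021, 2, 18), Source.NOTE_DATE),
                new Candidate(LocalDate.of(2021, 2, 17), Source.RELATIVE_HINT));
        best(found).ifPresent(c -> System.out.println(c.date + " from " + c.source));
    }
}
```

Keeping the source alongside the date means downstream code can still report how fuzzy each date is, rather than treating every date as exact.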

This is usually done in two stages (here just using pseudocode):
    "relative time hint"  becomes  coded relative offset, as in
    "yesterday"  becomes  day -1
    then putativeDate = strtotime(anchor, coded_offset)
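The two-stage pseudocode above can be expressed with `java.time`, where adding the coded offset to the anchor replaces `strtotime`. This is a minimal sketch with a hypothetical hint table. It covers only a few fixed phrases plus "N days ago" written with digits. Word numbers such as "two" and weekday phrases such as "last monday" are not handled.

```java
import java.time.LocalDate;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RelativeDates {

    // Stage 1 lookup table (hypothetical): fixed phrases and their day offsets.
    private static final Map<String, Integer> FIXED_OFFSETS = Map.of(
            "yesterday", -1,
            "today", 0,
            "tomorrow", 1);

    // "N day(s) ago" with N written in digits.
    private static final Pattern DAYS_AGO = Pattern.compile("(\\d+) days? ago");

    /** Stage 1: relative time hint -> coded offset in days, if recognized. */
    public static Optional<Integer> codeOffset(String hint) {
        String h = hint.trim().toLowerCase(Locale.ROOT);
        Integer fixed = FIXED_OFFSETS.get(h);
        if (fixed != null) {
            return Optional.of(fixed);
        }
        Matcher m = DAYS_AGO.matcher(h);
        if (m.matches()) {
            return Optional.of(-Integer.parseInt(m.group(1)));
        }
        return Optional.empty();
    }

    /** Stage 2: anchor date plus coded offset -> putative event date. */
    public static Optional<LocalDate> putativeDate(LocalDate anchor, String hint) {
        return codeOffset(hint).map(offset -> anchor.plusDays(offset));
    }

    public static void main(String[] args) {
        LocalDate noteDate = LocalDate.of(2021, 2, 18);
        for (String hint : new String[] {"yesterday", "3 days ago", "last monday"}) {
            System.out.println(hint + " -> " + putativeDate(noteDate, hint));
        }
    }
}
```

Returning an empty `Optional` for unrecognized hints keeps "no date found" distinct from a computed date. As noted below, even a computed date may be anchored to the wrong reference point.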

Even if you can calculate a date this way, it is often highly inaccurate
because the relative time hints may be relative to something other than the
note itself.
In short, it is a very tricky area.

Peter

On Thu, Feb 18, 2021 at 10:32 PM Muhammad Ali Syed  wrote:

> Hi there,
>
> I'm a new cTakes user and have been following the recent developments in
> cTakes for a couple of months now.
>
> My question is: Is there a component that assigns mentioned dates, within
> the clinical notes, to the identified annotations? I know the temporal
> module can place events into coarse temporal bins but I am talking about
> the date / exact-time level..
>
> If the answer is no, can someone share how they have approached this
> problem(e.g. finding the date a procedure was performed)? I would
> appreciate any suggestions.
>
> Best Regards,
> Muhammad
>


Re: Dictionary "bad" codes

2021-02-15 Thread Peter Abramowitsch
Good information to know.  Thanks Kean.

Peter

On Mon, Feb 15, 2021 at 4:47 PM Kean Kaufmann  wrote:

> Yes, but that way you can't get the SNOMED code if you need it - the code
>> becomes a custom vocabulary item.
>
>
> Not if you hang it off an existing UMLS CUI and integrate the
> BsvRareWordDictionary and the UmlsJdbcRareWordDictionary using
>  .  Don't recall exactly where this is in the Fast
> Dictionary Lookup
> <https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+Fast+Dictionary+Lookup>
>  docs/examples,
> but it's there somewhere...
>
> [image: image.png]
>
> I do use the BSV for concepts that are not in the script file at all.
>
>
> I use BSV files for adding both synonyms and concepts.  The concept BSV
> has its own ; the synonym BSV doesn't.
>
>
> On Mon, Feb 15, 2021 at 10:21 AM Peter Abramowitsch <
> pabramowit...@gmail.com> wrote:
>
>> HI Kean
>>
>> Yes, but that way you can't get the SNOMED code if you need it - the code
>> becomes a custom vocabulary item. Adding a synonym gets you the real
>> linkage.  I do use the BSV for concepts that are not in the script file at
>> all.
>>
>> I don't "edit" the HSQL file per se.  I have a parallel SED script which
>> does the editing (add, change, delete) of many items.  I maintain the sed
>> script and use it to massage the dictionary I created using the creator.
>>
>> Peter
>>
>>
>>
>> On Mon, Feb 15, 2021 at 4:16 PM Kean Kaufmann 
>> wrote:
>>
>> > FWIW, rather than editing the HSQLDB script, we use Sean's
>> > BsvRareWordDictionary to add phrases with a BSV file:
>> >
>> > cTakesHsql.xml:
>> >
>> > 
>> > 
>> > 
>> > 
>> > AddPhrases
>> >
>> >
>> >
>> org.apache.ctakes.dictionary.lookup2.dictionary.BsvRareWordDictionary
>> > 
>> > > >
>> >
>> value="org/apache/ctakes/dictionary/lookup/fast/custom2020ab/AddPhrases.bsv"/>
>> > 
>> > 
>> >
>> > AddPhrases.bsv:
>> >
>> > C1276061|nstemi
>> > C1276061|non st elevation mi
>> > C1276061|non-st elevation mi
>> > C1536222|non-stemi
>> > C1276061|non st elevation myocardial infarction
>> > C1276061|non st elevated myocardial infarction
>> > C1276061|non-st elevated myocardial infarction
>> > C1276061|non st elevated mi
>> > C1276061|non-st elevated mi
>> >
>> > On Mon, Feb 15, 2021 at 10:09 AM Peter Abramowitsch <
>> > pabramowit...@gmail.com>
>> > wrote:
>> >
>> > > Hi Eugenia
>> > > There may be better ways to do this, but I would insert these as
>> Synonyms
>> > > in the CUI_TERMS table.  The concept is already there:
>> > >
>> > > insert this:
>> > > INSERT INTO CUI_TERMS VALUES(1303258,0,1,'stemi','stemi')
>> > >
>> > > the concept is already declared here
>> > > INSERT INTO PREFTERM VALUES(1303258,'Acute ST segment elevation
>> > myocardial
>> > > infarction (disorder)')
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Mon, Feb 15, 2021 at 3:46 PM Monogyiou, Eugenia <
>> > > eugenia.monogy...@nttdata.com> wrote:
>> > >
>> > > > Hello,
>> > > >
>> > > > Peter we had a discussion quite some time ago about additions and
>> > > removals
>> > > > that would result into a good dictionary. I have a specific example
>> I
>> > > would
>> > > > ask for your help please (or anyone else that has an idea :) )
>> > > >
>> > > > So STEMI and NSTEMI are correctly annotated but "Code" and "Coding
>> > > Scheme"
>> > > > are null... when that happens in my tests it is usually because of a
>> > > badly
>> > > > formed term interpretation which is to be discarded anyway. I have
>> > found
>> > > > similar cases mentioned as "ghost" terms in the UMLS documentation;
>> in
>> > > this
>> > > > case, however, they are valid interpretations and quite important
>> to my
>> > > > tests because I am working on heart attack and cardiac disease
>> cohorts.
>> > > >
>> > > > I tried associating with the appropriate code by adding the
>> following
>> > > > inserts but it did not work ...
>> > > >
>> > > > INSERT INTO UPPER VALUES(1276061,'','NSTEMI','',31,1)
>> > > > INSERT INTO UPPER VALUES(1303258,'','STEMI','',31,1)
>> > > >
>> > > > Any ideas please?
>> > > >
>> > > > Many thanks in advance,
>> > > >
>> > > > Kind Regards,
>> > > >
>> > > > Eugenia Monogyiou | NTT Data UK
>> > > > Consulting & IT Solutions Ltd. 1 Royal Exchange, London EC3V 3DG
>> > > >
>> > > > Mob: +44 (0)7971623683 Email: eugenia.monogy...@nttdata.com> > > > eugenia.monogy...@nttdata.com>
>> > > >
>> > > > Disclaimer: This email and any attachments are sent in strictest
>> > > > confidence for the sole use of the addressee and may contain legally
>> > > > privileged, confidential, and proprietary data. If you are not the
>> > > intended
>> > > > recipient, please advise the sender by replying promptly to this
>> email
>> > > and
>> > > > then delete and destroy this email and any attachments without any
>> > > further
>> > > > use, copying or forwarding.
>> > > >
>> > >
>> >
>>
>
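Kean's AddPhrases.bsv is a two-column, pipe-separated list of CUI and synonym text. A small validator like the hypothetical one below can catch malformed rows before the dictionary loads them. Its class name and rules are illustrative, not part of cTAKES. Grouping by CUI also makes outliers easy to spot: in the list above, "non-stemi" maps to C1536222 while every other row maps to C1276061, which may or may not be intended.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class SynonymBsv {

    // UMLS CUIs are "C" followed by seven digits.
    private static final Pattern CUI = Pattern.compile("C\\d{7}");

    /** Parses CUI|text rows into CUI -> synonyms; blank lines are skipped, malformed rows rejected. */
    public static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> byCui = new LinkedHashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] cols = line.split("\\|", -1);
            if (cols.length != 2 || !CUI.matcher(cols[0]).matches() || cols[1].trim().isEmpty()) {
                throw new IllegalArgumentException("Bad BSV row " + (i + 1) + ": " + line);
            }
            byCui.computeIfAbsent(cols[0], k -> new ArrayList<>()).add(cols[1].trim());
        }
        return byCui;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SynonymBsv AddPhrases.bsv
        Map<String, List<String>> byCui = parse(Files.readAllLines(Path.of(args[0])));
        byCui.forEach((cui, texts) -> System.out.println(cui + " (" + texts.size() + "): " + texts));
    }
}
```

This only checks the two-column synonym layout. Kean mentions that the concept BSV is configured differently, and its layout is not shown in the thread.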


Re: Dictionary "bad" codes

2021-02-15 Thread Peter Abramowitsch
Hi Kean

Yes, but that way you can't get the SNOMED code if you need it; the code
becomes a custom vocabulary item. Adding a synonym gets you the real
linkage.  I do use the BSV for concepts that are not in the script file at
all.

I don't "edit" the HSQL file per se.  I have a separate sed script that
does the editing (add, change, delete) of many items.  I maintain the sed
script and use it to massage the dictionary I created with the Dictionary Creator.
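Keeping dictionary edits in a separate script means they can be re-applied after each rebuild. The same idea can be expressed in Java as a hypothetical sketch, assuming:

- the HSQLDB .script file is plain text with one SQL statement per line, like the INSERT lines quoted in this thread;
- the database is not open while the file is rewritten.

The delete pattern in `main` is an illustrative placeholder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ScriptEditor {

    /**
     * Drops lines matching any delete pattern, applies {regex, replacement} pairs with
     * String.replaceAll semantics ('$' and '\' are special in replacements), then appends additions.
     */
    public static List<String> apply(List<String> scriptLines,
                                     List<Pattern> deletes,
                                     List<String[]> replacements,
                                     List<String> additions) {
        List<String> out = new ArrayList<>();
        for (String line : scriptLines) {
            if (deletes.stream().anyMatch(p -> p.matcher(line).find())) {
                continue;
            }
            String edited = line;
            for (String[] r : replacements) {
                edited = edited.replaceAll(r[0], r[1]);
            }
            out.add(edited);
        }
        out.addAll(additions);
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical paths: read the creator's output, write an edited copy.
        Path in = Path.of(args[0]);
        Path out = Path.of(args[1]);
        List<String> edited = apply(
                Files.readAllLines(in),
                List.of(Pattern.compile("'some unwanted term'")),   // placeholder delete rule
                List.of(),
                List.of("INSERT INTO CUI_TERMS VALUES(1303258,0,1,'stemi','stemi')"));
        Files.write(out, edited);
    }
}
```

Writing to a new file rather than overwriting the input keeps the creator's original output available for diffing.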

Peter



On Mon, Feb 15, 2021 at 4:16 PM Kean Kaufmann  wrote:

> FWIW, rather than editing the HSQLDB script, we use Sean's
> BsvRareWordDictionary to add phrases with a BSV file:
>
> cTakesHsql.xml:
>
> 
> 
> 
> 
> AddPhrases
>
>
> org.apache.ctakes.dictionary.lookup2.dictionary.BsvRareWordDictionary
> 
> 
> value="org/apache/ctakes/dictionary/lookup/fast/custom2020ab/AddPhrases.bsv"/>
> 
> 
>
> AddPhrases.bsv:
>
> C1276061|nstemi
> C1276061|non st elevation mi
> C1276061|non-st elevation mi
> C1536222|non-stemi
> C1276061|non st elevation myocardial infarction
> C1276061|non st elevated myocardial infarction
> C1276061|non-st elevated myocardial infarction
> C1276061|non st elevated mi
> C1276061|non-st elevated mi
>
> On Mon, Feb 15, 2021 at 10:09 AM Peter Abramowitsch <
> pabramowit...@gmail.com>
> wrote:
>
> > Hi Eugenia
> > There may be better ways to do this, but I would insert these as Synonyms
> > in the CUI_TERMS table.  The concept is already there:
> >
> > insert this:
> > INSERT INTO CUI_TERMS VALUES(1303258,0,1,'stemi','stemi')
> >
> > the concept is already declared here
> > INSERT INTO PREFTERM VALUES(1303258,'Acute ST segment elevation
> myocardial
> > infarction (disorder)')
> >
> >
> >
> >
> >
> >
> > On Mon, Feb 15, 2021 at 3:46 PM Monogyiou, Eugenia <
> > eugenia.monogy...@nttdata.com> wrote:
> >
> > > Hello,
> > >
> > > Peter we had a discussion quite some time ago about additions and
> > removals
> > > that would result into a good dictionary. I have a specific example I
> > would
> > > ask for your help please (or anyone else that has an idea :) )
> > >
> > > So STEMI and NSTEMI are correctly annotated but "Code" and "Coding
> > Scheme"
> > > are null... when that happens in my tests it is usually because of a
> > badly
> > > formed term interpretation which is to be discarded anyway. I have
> found
> > > similar cases mentioned as "ghost" terms in the UMLS documentation; in
> > this
> > > case, however, they are valid interpretations and quite important to my
> > > tests because I am working on heart attack and cardiac disease cohorts.
> > >
> > > I tried associating with the appropriate code by adding the following
> > > inserts but it did not work ...
> > >
> > > INSERT INTO UPPER VALUES(1276061,'','NSTEMI','',31,1)
> > > INSERT INTO UPPER VALUES(1303258,'','STEMI','',31,1)
> > >
> > > Any ideas please?
> > >
> > > Many thanks in advance,
> > >
> > > Kind Regards,
> > >
> > > Eugenia Monogyiou | NTT Data UK
> > > Consulting & IT Solutions Ltd. 1 Royal Exchange, London EC3V 3DG
> > >
> > > Mob: +44 (0)7971623683 Email: eugenia.monogy...@nttdata.com
> > >
> > >
> >
>


Re: Dictionary "bad" codes

2021-02-15 Thread Peter Abramowitsch
Hi Eugenia
There may be better ways to do this, but I would insert these as synonyms
in the CUI_TERMS table.  The concept is already there.

Insert this:
INSERT INTO CUI_TERMS VALUES(1303258,0,1,'stemi','stemi')

The concept is already declared here:
INSERT INTO PREFTERM VALUES(1303258,'Acute ST segment elevation myocardial infarction (disorder)')
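The CUI_TERMS row above follows a fixed column layout. This minimal Java sketch builds such a synonym row from a CUI string. It assumes, from the example rather than from documentation, that the five values are:

- the numeric CUI without its leading "C";
- the index of the rare word among the term's tokens;
- the token count;
- the lowercased term text;
- the rare word itself.

Tokenizing on whitespace only is a simplification, since the dictionary creator may split punctuation such as hyphens differently. The class and method names are hypothetical.

```java
import java.util.Locale;

public class CuiTermsInsert {

    /** Builds an INSERT for a synonym row, using the column order assumed above. */
    public static String build(String cui, String text, int rareWordIndex) {
        if (!cui.matches("C\\d{7}")) {
            throw new IllegalArgumentException("Not a UMLS CUI: " + cui);
        }
        long cuiNumber = Long.parseLong(cui.substring(1));     // "C1303258" -> 1303258
        String term = text.trim().toLowerCase(Locale.ROOT);
        String[] tokens = term.split("\\s+");                  // whitespace only (simplification)
        if (rareWordIndex < 0 || rareWordIndex >= tokens.length) {
            throw new IllegalArgumentException("Rare word index out of range: " + rareWordIndex);
        }
        return String.format(Locale.ROOT, "INSERT INTO CUI_TERMS VALUES(%d,%d,%d,'%s','%s')",
                cuiNumber, rareWordIndex, tokens.length,
                sqlQuote(term), sqlQuote(tokens[rareWordIndex]));
    }

    // Standard SQL escaping for a string literal: double any single quote.
    private static String sqlQuote(String s) {
        return s.replace("'", "''");
    }

    public static void main(String[] args) {
        System.out.println(build("C1303258", "STEMI", 0));
    }
}
```

For "STEMI" this reproduces the INSERT line above exactly. For a multi-word synonym you still have to choose which token is the rare word.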






On Mon, Feb 15, 2021 at 3:46 PM Monogyiou, Eugenia <
eugenia.monogy...@nttdata.com> wrote:

> Hello,
>
> Peter we had a discussion quite some time ago about additions and removals
> that would result into a good dictionary. I have a specific example I would
> ask for your help please (or anyone else that has an idea :) )
>
> So STEMI and NSTEMI are correctly annotated but "Code" and "Coding Scheme"
> are null... when that happens in my tests it is usually because of a badly
> formed term interpretation which is to be discarded anyway. I have found
> similar cases mentioned as "ghost" terms in the UMLS documentation; in this
> case, however, they are valid interpretations and quite important to my
> tests because I am working on heart attack and cardiac disease cohorts.
>
> I tried associating with the appropriate code by adding the following
> inserts but it did not work ...
>
> INSERT INTO UPPER VALUES(1276061,'','NSTEMI','',31,1)
> INSERT INTO UPPER VALUES(1303258,'','STEMI','',31,1)
>
> Any ideas please?
>
> Many thanks in advance,
>
> Kind Regards,
>
> Eugenia Monogyiou | NTT Data UK
> Consulting & IT Solutions Ltd. 1 Royal Exchange, London EC3V 3DG
>
> Mob: +44 (0)7971623683 Email: eugenia.monogy...@nttdata.com
>
>


Re: authentication issue?

2021-02-10 Thread Peter Abramowitsch
Hi Greg

401 means either you are not authenticated or there's a problem at the UMLS
end, which does happen every now and then.  I've already told their contact
about it.  But at this very instant (19:18 GMT), it is working:

10 Feb 2021 20:18:43  INFO UmlsUserApprover - Checking UMLS Account at
https://utslogin.nlm.nih.gov/cas/v1/api-key:
..10 Feb 2021 20:18:45  INFO UmlsUserApprover -   UMLS Account has been
validated
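One way to tell a rejected key from a UTS outage is to test the key outside cTAKES. This minimal sketch assumes:

- Java 11+ and the endpoint shown in the logs here;
- the convention, as described in NLM's UTS REST authentication documentation, of form-posting the key in an `apikey` field.

The environment variable name `UMLS_API_KEY` and the class name are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class UtsKeyCheck {

    // Endpoint seen in the UmlsUserApprover log lines.
    static final URI API_KEY_ENDPOINT = URI.create("https://utslogin.nlm.nih.gov/cas/v1/api-key");

    /** Mirrors the approver's complaint: the shipped CHANGE_ME value is only a placeholder. */
    public static boolean isPlaceholder(String key) {
        return key == null || key.isBlank() || key.equals("CHANGE_ME");
    }

    /** Form-encoded POST of the key (field name "apikey" assumed from the UTS docs). */
    public static HttpRequest buildRequest(String apiKey) {
        String form = "apikey=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(API_KEY_ENDPOINT)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String key = System.getenv("UMLS_API_KEY");   // hypothetical variable name
        if (isPlaceholder(key)) {
            System.err.println("No real API key configured; expect a 401.");
            return;
        }
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(key), HttpResponse.BodyHandlers.ofString());
        // A 2xx status means the key was accepted. A 401 means it was rejected
        // or, as noted above, that the UTS service itself is having trouble.
        System.out.println("HTTP " + response.statusCode());
    }
}
```

If this standalone check succeeds but cTAKES still logs `CHANGE_ME` errors, the key is most likely not reaching the pipeline's configuration.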


Peter

On Wed, Feb 10, 2021 at 7:12 PM Greg Silverman  wrote:

> This had been working (4.0.0.1), but now am getting the following error:
>
> 10 Feb 2021 18:03:40  INFO SentenceDetector - Sentence detector model file:
> org/apache/ctakes/core/sentdetect/sd-med-model.zip
> 10 Feb 2021 18:03:40  INFO TokenizerAnnotatorPTB - Initializing
> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> 10 Feb 2021 18:03:40  INFO ContextDependentTokenizerAnnotator - Finite
> state machines loaded.
> 10 Feb 2021 18:03:40  INFO POSTagger - POS tagger model file:
> org/apache/ctakes/postagger/models/mayo-pos.zip
> 10 Feb 2021 18:03:41  INFO Chunker - Chunker model file:
> org/apache/ctakes/chunker/models/chunker-model.zip
> 10 Feb 2021 18:03:42  INFO AbstractJCasTermAnnotator - Using dictionary
> lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
> 10 Feb 2021 18:03:42  INFO AbstractJCasTermAnnotator - Exclusion tagset
> loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN
> VBP VBZ WDT WP WPS WRB
> 10 Feb 2021 18:03:42  INFO AbstractJCasTermAnnotator - Using minimum term
> text span: 3
> 10 Feb 2021 18:03:42  INFO AbstractJCasTermAnnotator - Using Dictionary
> Descriptor: org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
> 10 Feb 2021 18:03:42  INFO DictionaryDescriptorParser - Parsing dictionary
> specifications:
> 10 Feb 2021 18:03:42 ERROR UmlsUserApprover -   umlsUser CHANGE_ME not
> allowed.  It is a placeholder reminder.
> 10 Feb 2021 18:03:42 ERROR UmlsUserApprover -   umlsPass CHANGE_ME not
> allowed.  It is a placeholder reminder.
> 10 Feb 2021 18:03:42  INFO UmlsUserApprover - Checking UMLS Account at
> https://utslogin.nlm.nih.gov/cas/v1/api-key:
> ..
> 10 Feb 2021 18:03:43 ERROR UmlsUserApprover - Server returned HTTP response
> code: 401 for URL: https://utslogin.nlm.nih.gov/cas/v1/api-key
> 10 Feb 2021 18:03:43 ERROR PiperFileRunner - Initialization of annotator
> class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator"
> failed.  (Descriptor: )
> /data/ctakes_out /usr/share/ctakes
>
>
>
> --
> Greg M. Silverman
> Senior Systems Developer
> NLP/IE 
> Department of Surgery
> University of Minnesota
> g...@umn.edu
>


Re: error: CRITICAL

2021-02-10 Thread Peter Abramowitsch
Hi Greg

I would help you, but unfortunately I'm in Italy at the back end of an
abysmally slow internet connection, especially for uploads.  Still, it would
be interesting to see an example anonymized note that triggers this
floating-point (FP) parse error, as I could try it on my version of 4.0.0.
The fact that 4.0.1 is no longer working makes me wonder whether you have
some kind of deployment context error rather than a problem in the core
cTakes code itself.

Peter

On Wed, Feb 10, 2021 at 4:25 PM Greg Silverman  wrote:

> We're running version 4.0.0.1 on ~12K notes. The first time we ran it I got
> a heap space error at ~10.5k notes processed (at about ~38 hours).
>
> I increased the heap space params and then reran. This time it died at the
> same place, but with a different error (see below):
>
> SEVERE: Exception occurred
> org.apache.uima.analysis_engine.AnalysisEngineProcessException
>     at org.apache.ctakes.contexttokenizer.ae.ContextDependentTokenizerAnnotator.process(ContextDependentTokenizerAnnotator.java:105)
>     at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:396)
>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:314)
>     at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:570)
>     at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:412)
>     at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:344)
>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265)
> ...
> Caused by: java.lang.NumberFormatException: For input string: "f"
>     at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
>     at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
>     at java.lang.Double.parseDouble(Double.java:538)
> 
>
> Thus, it looks like a string is being detected as a float. This had worked
> in version 4.0.1, so it must have been fixed at some point. Even after I
> made changes for the new NLM authentication for UMLS and tested it in 4.0.1
> based on Peter's authentication solution, it stopped working after January
> 15th.  Unfortunately, we're not set up to compile 4.0.1.
>
> That being said, does someone have a working version of 4.0.1 built from
> the trunk? If so, could you please send me a copy?
>
> If not, how can I find the offending file?
>
> This is kind of critical, since we're in the middle of an experiment and
> another side effect of reverting to 4.0.0.1 is it is a LOT slower than
> 4.0.1.
>
> Thanks very much in advance!
>
> Greg--
>
> --
> Greg M. Silverman
> Senior Systems Developer
> NLP/IE 
> Department of Surgery
> University of Minnesota
> g...@umn.edu
>
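On Greg's question of how to find the offending file: one approach is to run each note through the pipeline independently and record which ones throw, rather than letting a single exception end a 38-hour run. This is a minimal sketch. It assumes a hypothetical `NoteProcessor` that wraps whatever per-note call your deployment makes; nothing here is a PiperFileRunner option.

```java
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FailingNoteFinder {

    /** Hypothetical hook: processes one note, e.g. by wrapping a per-note pipeline call. */
    @FunctionalInterface
    public interface NoteProcessor {
        void process(Path note) throws Exception;
    }

    /** Processes every note and maps each failing note to its root-cause message. */
    public static Map<Path, String> findFailures(List<Path> notes, NoteProcessor processor) {
        Map<Path, String> failures = new LinkedHashMap<>();
        for (Path note : notes) {
            try {
                processor.process(note);
            } catch (Exception e) {
                // Walk down to the innermost cause, e.g. the NumberFormatException
                // wrapped inside the AnalysisEngineProcessException.
                Throwable root = e;
                while (root.getCause() != null) {
                    root = root.getCause();
                }
                failures.put(note, root.getClass().getSimpleName() + ": " + root.getMessage());
            }
        }
        return failures;
    }
}
```

Errors such as `OutOfMemoryError` are not `Exception`s and are deliberately not caught here, so a heap problem still stops the run.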


Re: Passing SectionsBsv to piper containing BsvRegexSectionizer [EXTERNAL] [SUSPICIOUS]

2021-01-30 Thread Peter Abramowitsch
Hi Tom,  I think there is a way to do what you were thinking of.  I'm not
suggesting it's a better solution.  It's just a thought.   In Java you can
create a new ClassLoader, use it to load a second definition of a class,
and from that create a new instance of the class that shares nothing with
the first, not even its static fields, thus allowing you to create two
singletons of the same "class".  I doubt that you would be able to do it
via a piper, though.  You'd have to create the pipeline programmatically
using the AnalysisEngine APIs and add the BSV lookups from the different
class loaders.   It would be a lot of work and not as tidy as the kind of
thing Sean is suggesting.  But you can lie awake at night thinking about it
anyway.
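Peter's class-loader idea can be demonstrated in isolation. The sketch below is hypothetical and deliberately small: a child-first loader re-defines one class from the same bytecode, so each copy gets its own static fields. It assumes:

- the class file is readable as a resource from the parent loader, which is true when running from a normal classpath;
- Java 9+ for `readAllBytes`.

Wiring such copies into a cTAKES pipeline would still require building the pipeline programmatically, as described above.

```java
import java.io.IOException;
import java.io.InputStream;

public class IsolatedSingletons {

    /** Hypothetical stand-in for a class that keeps shared state in a static field. */
    public static class Registry {
        public static int counter = 0;
        public static int next() { return ++counter; }
    }

    /** Child-first loader: defines its own copy of one named class, delegates everything else. */
    static final class IsolatingLoader extends ClassLoader {
        private final String isolatedName;

        IsolatingLoader(ClassLoader parent, String isolatedName) {
            super(parent);
            this.isolatedName = isolatedName;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(isolatedName)) {
                return super.loadClass(name, resolve);   // normal parent-first delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    // Read the same bytecode the parent would use and define it in this loader.
                    String resource = name.replace('.', '/') + ".class";
                    try (InputStream in = getParent().getResourceAsStream(resource)) {
                        if (in == null) {
                            throw new ClassNotFoundException(name);
                        }
                        byte[] bytes = in.readAllBytes();
                        c = defineClass(name, bytes, 0, bytes.length);
                    } catch (IOException e) {
                        throw new ClassNotFoundException(name, e);
                    }
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    /** Returns a fresh copy of the class, with its own statics, from a new loader. */
    public static Class<?> isolatedCopy(Class<?> original) {
        try {
            return new IsolatingLoader(original.getClassLoader(), original.getName())
                    .loadClass(original.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Calls the static next() method on whichever copy of Registry is passed in. */
    public static int callNext(Class<?> registryCopy) {
        try {
            return (Integer) registryCopy.getMethod("next").invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Class identity is per loader, so an instance of one copy cannot be cast to the other copy's type. That is one reason this approach is harder to work with than a single shared definition.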

Peter

On Sat, Jan 30, 2021 at 4:27 PM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Tom,
>
> You aren't the only person with an itchy "send" clicker.
>
> You probably don't want
>    if ( sections.size() <= 1 ) {
>       return;
>    }
> Because you may want to rename that first section.
>
>
> Anyway, all of the code would go into a new JCasAnnotator_ImplBase class
> process( jCas ) method.
>
> class SectionAdjuster extends JCasAnnotator_ImplBase {
>    @Override
>    public void process( final JCas jCas ) throws AnalysisEngineProcessException {
>       // all that code ...
>    }
> }
>
> Then that would go into the piper after the BsvRegexSectionizer
>
> // Fetch Sections
> add BsvRegexSectionizer SectionsBsv=/my/custom/file.bsv
> // Change or remove Sections
> add my.java.package.SectionAdjuster
>
>
>
> Sean
>
>
> 
> From: Finan, Sean 
> Sent: Saturday, January 30, 2021 10:00 AM
> To: dev@ctakes.apache.org
> Subject: Re: Passing SectionsBsv to piper containing BsvRegexSectionizer
> [EXTERNAL] [SUSPICIOUS]
>
>
>
> Hi Thomas,
>
> Short answer:
> You can't do that.  The collection of Section definitions is shared
> through all of the pipelines.
>
> Long answer:
> I think that there might be another approach.
>
> My guess is that within your two different note types there is some common
> section header expression, but the content and intention and use of the
> section information is different.
>
> If that is the case, I would propose the following:
>
> 1.  Use just a single sectionizer.
> -- sectionization, as with any regex process, can be "slow".  It is better
> to detect a common word by running just a single regex over text than two
> different regex that look for the same word.
> 2.  Use one pipeline definition.
> -- While using two unlike pipelines simultaneously, if processing n notes
> of type A takes X seconds and processing n' notes of type B takes >>X
> seconds then you are stuck waiting on B process time.
> -- It also makes later description of a single pipeline easier ...  as
> below (hopefully).
> 3.  Make a simple annotation engine that determines note type and adjusts
> the properties of sections identified with the common section header based
> upon the note type.
> -- The complexity of this depends upon the differences in sections with
> common headers.
>
> -- Please Note: I am typing this freehand, so there are probably typos and
> missing items.  There are also probably better ways to do the same thing.
> It should give you the general idea.  A lot of people in the community
> don't dream in java so I sometimes add this kind of thing to (hopefully)
> save time.
>
>
> String noteType = new NoteSpecs( jCas ).getDocumentType();
>
> List<Segment> sections = new ArrayList<>( JCasUtil.select( jCas,
> Segment.class ) );
> sections.sort( Comparator.comparingInt( Segment::getBegin ) );
>
> if ( sections.size() <= 1 ) {
>    return;
> }
>
> //  Join sections if one is unwanted.
> Collection<Segment> unwantedSections = new HashSet<>();
> Segment previousSection = sections.get( 0 );
> for ( int i=1; i<sections.size(); i++ ) {
>    Segment section = sections.get( i );
>    if ( !isWantedSection( noteType, section.getPreferredText() ) ) {
>       previousSection.setEnd( section.getEnd() );
>       unwantedSections.add( section );
>       section.removeFromIndexes();
>       continue;
>    }
>    previousSection = section;
> }
> sections.removeAll( unwantedSections );
>
> // Rename Sections
> sections.forEach( s -> adjustSectionInfo( noteType, s ) );
>
>
> //  Something to define unwanted sections:
> Collection<String> BAD_A_SECTIONS = Arrays.asList( "Bilge", "Plumbing" );
> Collection<String> BAD_B_SECTIONS = Arrays.asList( "Joint", "Elbow" );
> boolean isWantedSection( String noteType, String sectionType ) {
>    // Wanted unless it is on the "bad" list for this note type.
>    return !( ( noteType.equals( "A" ) && BAD_A_SECTIONS.contains( sectionType ) )
>           || ( noteType.equals( "B" ) && BAD_B_SECTIONS.contains( sectionType ) ) );
> }
>
> // And something to adjust properties of certain section types:
> Map<String,String> X_TO_A_SECTIONS = new HashMap<>();
> Map<String,String> X_TO_B_SECTIONS = new HashMap<>();
> void initRenameMaps() {
>    X_TO_A_SECTIONS.put( "Stern", "Sternum" );
>

Re: performance report [EXTERNAL]

2021-01-25 Thread Peter Abramowitsch
Great, thanks Greg.  I'd like to see the kind of stats that are available
beyond what one can scrape from log4j

Peter

On Mon, Jan 25, 2021 at 5:16 PM Greg Silverman  wrote:

> Hi Sean,
> Thanks! I'll give it a whirl and let you know how it works out.
>
> Best!
>
> On Mon, Jan 25, 2021 at 8:48 AM Finan, Sean <
> sean.fi...@childrens.harvard.edu> wrote:
>
> > Hi Greg, Peter,
> >
> > I believe that the performance report comes from a
> > CollectionProcessingEngine (CPE)
> >
> https://uima.apache.org/d/uimaj-current/apidocs/org/apache/uima/collection/CollectionProcessingEngine.html
> >
> >
> > I think that UIMA's CPE GUI runs the pipeline through a CPE - hence the
> > tool's name, but that may have changed in recent years.
> >
> > The PipelineBuilder class in ctakes.core used by the PiperFileRunner
> could
> > be changed to use this style of running a single-threaded pipeline -
> right
> > now it uses a simpler UIMAFit method.
> > The code changes are relatively minor, but obviously significant testing
> > would be required.  The ctakes PipelineBuilder does use a CPE for
> > multi-threaded pipelines, so there has already been some testing on that
> > front.
> >
> > You can look at the ctakes PipelineBuilder run() method.  If you get rid
> > of the if (threadCount==1) {..} else { wrapper, then the CPE will always be used.
> > Then just add a cpe.getPerformanceReport() after cpe.process() and you should
> > have a ProcessTrace object.  This is where my guessing ends, as I have never
> > used a ProcessTrace and don't know exactly what to beg of it.
> >
> > I hope that is a decent start,
> > Sean
> > 
> > From: Greg Silverman 
> > Sent: Saturday, January 23, 2021 3:01 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: performance report [EXTERNAL]
> >
> >
> >
> > Hi Peter,
> > I have no doubt about performance differences regarding variance between
> > note styles and pipeline components.
> >
> > We're looking for a way to benchmark the standard/non-customized pipeline
> > performance for processing a largish set of identical notes using several
> > clinical NLP annotators (specifically, ctakes, biomedicus, metamap and
> > clamp). At the command line, both metamap and biomedicus output a
> standard
> > performance report with total timings and the details for each specific
> > pipeline component. I assume there is a way to enable the performance
> > report output available in the GUI version of ctakes at the command line
> -
> > which is what I'm really interested in.
> >
> > We're fine with information at a very coarse level, since we're
> interested
> > in a particular note type, so the aforementioned report should be
> > sufficient. I'm just wondering how to enable it using the standard
> pipeline
> > in cTAKES.
> >
> > Thanks!
> >
> > Greg--
> >
> >
> >
> > On Sat, Jan 23, 2021 at 12:26 PM Peter Abramowitsch <
> > pabramowit...@gmail.com>
> > wrote:
> >
> > > Hi Greg,
> > >
> > > I’ve found that there’s so much difference between note styles that have
> > > performance implications and so many interactions between pipeline
> > > configurations which affect overall performance, that really the only way
> > > to get a sense of performance is either on a very coarse level, measuring
> > > process time across large collections of varied notes, or very granular
> > > using something like jvisualvm.   Using the latter I saw some surprising
> > > things, some of which I was able to tackle with minor software changes,
> > > while others are deep in UIMA utilities used by cTakes.  The biggest
> > > factor, in my experience after processing millions of notes, is notes
> > > that have reached about 5k AND are missing punctuation.  At around this
> > > size begins a geometric rise in complexity of internal structures that
> > > depend on sentences and a serious elevation of processing time.
> > >
> > > Peter
> > >
> > > Sent from my iPad
> > >
> > > > On Jan 23, 2021, at 18:09, Greg Silverman 
> wrote:
> > > >
> > > > I found this:
> > > >
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__medium.com_-40felix-5Fchan_install-2Dapache-2Dctakes-2D924c40967ce2=DwIFaQ=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4
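Peter's observation suggests a cheap pre-screen for notes likely to be slow: flag notes that are long and sparsely punctuated. Two assumptions here are hypothetical knobs to tune against your own timings:

- reading "5k" as 5,000 characters;
- allowing at most one sentence end per 500 characters.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LongNoteCheck {

    // Hypothetical thresholds: "5k" read as 5,000 characters; at most one sentence end per 500 chars.
    static final int LENGTH_THRESHOLD = 5_000;
    static final int MAX_CHARS_PER_SENTENCE_END = 500;

    /** True if the note is long and has few sentence-ending punctuation marks. */
    public static boolean looksRisky(String text) {
        if (text.length() < LENGTH_THRESHOLD) {
            return false;
        }
        long sentenceEnds = text.chars().filter(c -> c == '.' || c == '?' || c == '!').count();
        return sentenceEnds == 0 || text.length() / sentenceEnds > MAX_CHARS_PER_SENTENCE_END;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            if (looksRisky(Files.readString(Path.of(arg)))) {
                System.out.println("possibly slow: " + arg);
            }
        }
    }
}
```

Flagged notes could be routed to a separate queue or pre-split, so that one pathological note does not hold up a batch.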

Re: performance report [EXTERNAL]

2021-01-25 Thread Peter Abramowitsch
Thanks Sean.  The CPE ProcessTrace object was something I wasn't familiar
with.

Definitely, though, the piper file runner, by default,  should be as
lightweight and simple as possible.  Other options for threading or for
tracing should be injected or layered in without modifying default
behavior.  It is currently very stable.  In my alternative threading model
it runs thirty or more pipeline instances for weeks in a single process
under very heavy stress.
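Timing can be layered on without changing the default runner by collecting per-component timings in a wrapper around each step. This is a hypothetical plain-Java sketch, not the UIMA CPE `ProcessTrace` that Sean describes. The class name and report format are illustrative. It is not thread-safe, so use one instance per pipeline thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

public class ComponentTimer {

    private final LongSupplier nanoClock;                           // injectable for testing
    private final Map<String, long[]> stats = new LinkedHashMap<>(); // name -> {calls, totalNanos}

    public ComponentTimer(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    public ComponentTimer() {
        this(System::nanoTime);
    }

    /** Runs one step and adds its elapsed time to the named component's totals. */
    public void time(String component, Runnable step) {
        long start = nanoClock.getAsLong();
        try {
            step.run();
        } finally {
            long[] s = stats.computeIfAbsent(component, k -> new long[2]);
            s[0] += 1;
            s[1] += nanoClock.getAsLong() - start;
        }
    }

    public long calls(String component) {
        long[] s = stats.get(component);
        return s == null ? 0 : s[0];
    }

    public long totalNanos(String component) {
        long[] s = stats.get(component);
        return s == null ? 0 : s[1];
    }

    /** One line per component, in first-seen order: calls, total ms, mean ms per call. */
    public String report() {
        StringBuilder sb = new StringBuilder();
        stats.forEach((name, s) -> sb.append(String.format(
                "%-30s %8d calls %12.1f ms total %10.3f ms/call%n",
                name, s[0], s[1] / 1e6, s[1] / 1e6 / s[0])));
        return sb.toString();
    }
}
```

Each component call would be wrapped in `time(...)`. Calls that throw checked exceptions, such as annotator `process` methods, need to be wrapped in an unchecked exception to fit `Runnable`.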

Peter


On Mon, Jan 25, 2021 at 3:48 PM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Greg, Peter,
>
> I believe that the performance report comes from a
> CollectionProcessingEngine (CPE)
> https://uima.apache.org/d/uimaj-current/apidocs/org/apache/uima/collection/CollectionProcessingEngine.html
>
>
> I think that UIMA's CPE GUI runs the pipeline through a CPE - hence the
> tool's name, but that may have changed in recent years.
>
> The PipelineBuilder class in ctakes.core used by the PiperFileRunner could
> be changed to use this style of running a single-threaded pipeline - right
> now it uses a simpler UIMAFit method.
> The code changes are relatively minor, but obviously significant testing
> would be required.  The ctakes PipelineBuilder does use a CPE for
> multi-threaded pipelines, so there has already been some testing on that
> front.
>
> You can look at the ctakes PipelineBuilder run() method.  If you get rid
> of the if (threadCount==1) {..} else {   the the CPE will always be used.
> Then just add a cpe.getPerformanceReport() after cpe.process() you should
> have a ProcessTrace object.  This is where my guessing ends as I have never
> used a ProcessTrace and don't know exactly what to beg of it.
>
> I hope that is a decent start,
> Sean
> 
> From: Greg Silverman 
> Sent: Saturday, January 23, 2021 3:01 PM
> To: dev@ctakes.apache.org
> Subject: Re: performance report [EXTERNAL]
>
>
>
> Hi Peter,
> I have no doubt about performance differences regarding variance between
> note styles and pipeline components.
>
> We're looking for a way to benchmark the standard/non-customized pipeline
> performance for processing a largish set of identical notes using several
> clinical NLP annotators (specifically, ctakes, biomedicus, metamap and
> clamp). At the command line, both metamap and biomedicus output a standard
> performance report with total timings and the details for each specific
> pipeline component. I assume there is a way to enable the performance
> report output available in the GUI version of ctakes at the command line -
> which is what I'm really interested in.
>
> We're fine with information at a very coarse level, since we're interested
> in a particular note type, so the aforementioned report should be
> sufficient. I'm just wondering how to enable it using the standard pipeline
> in cTAKES.
>
> Thanks!
>
> Greg--
>
>
>
> On Sat, Jan 23, 2021 at 12:26 PM Peter Abramowitsch <
> pabramowit...@gmail.com>
> wrote:
>
> > Hi Greg,
> >
> > I’ve found that there’s so much difference between note styles that have
> > performance implications and so many interactions between pipeline
> > configurations which affect overall performance, that really the only way
> > to get a sense of performance is either on a very coarse level, measuring
> > process time across large collections of varied notes, or very granular
> > using something like jvisualvm.   Using the latter I saw some surprising
> > things, some of which I was able to tackle with minor software changes,
> > while others are deep in UIMA utilities used by cTakes.  The biggest
> > factor, in my experience after processing millions of notes, is notes that
> > have reached about 5k AND are missing punctuation.  At around this size
> > begins a geometric rise in complexity of internal structures that depend
> > on sentences and a serious elevation of processing time.
> >
> > Peter
> >
> > Sent from my iPad
> >
> > > On Jan 23, 2021, at 18:09, Greg Silverman  wrote:
> > >
> > > I found this:
> > > https://medium.com/@felix_chan/install-apache-ctakes-924c40967ce2, which
> > > states: "A performance report is generated when the process is done."
> > >
> > > However, we are running this from the command line and no such report is
> > > being generated.

Re: neural negation model in ctakes

2021-01-24 Thread Peter Abramowitsch
That's great, Tim - it sounds very sophisticated!

In fact I had made some changes to the Negex Annotator last fall which I
hadn't checked in because I was waiting for Sean to test them.  In a great
deal of my own testing I discovered that Negex, which is easily expandable to
accommodate new constructions, had only a couple of serious flaws, and I
believe I have fixed these, as well as a performance issue it had.   If
you're interested in testing it against yours, that would be great.
Reading your description above, I wondered how it would do in the case of
strings of entities which are negated by a single negating trigger phrase
either ahead of or behind the series, and what happens when a series of
entities which begins as all being negated has one expressed in a way that
stops the negation pattern.  These are the weaknesses I addressed in my
changes.

Regards
Peter

On Sun, Jan 24, 2021 at 5:08 PM Miller, Timothy <
timothy.mil...@childrens.harvard.edu> wrote:

> Hi all,
> I just checked in a usable proof-of-concept for a neural (RoBERTa-based to
> be specific) negation classifier. The way it works is a tiny bit of python
> code (using FastAPI) sets up a REST interface that runs the classifier:
> ctakes-assertion/src/main/python/negation_rest.py
>
> it runs a default model that I trained and uploaded into Huggingface
> modelhub. It will automatically download the first time the server is run.
>
> there is a startup script there too:
> ctakes-assertion/src/main/python/start_negation_rest.sh
>
> The idea would be to run this on whatever machine you have with the
> appropriate GPU resources and it creates 3 REST endpoints:
> /negation/initialize  -- to load the model (takes longer the first time as
> it will download)
> /negation/process -- to classify the data and return negation values
> /negation/collection_process_complete -- to unload the model
>
> to mirror UIMA workflows. Then, the UIMA analysis engine sits in:
>
> ctakes-assertion/src/main/java/org/apache/ctakes/assertion/ae/PolarityBertRestAnnotator.java
>
> The main work here is converting the cTAKES entities/events into a simpler
> data structure that gets sent to the python REST server, making the REST
> call, and then converting the classifier output into the polarity property.
>
> Performance:
> The accuracy of this classifier is much better in my testing. I am looking
> forward to this hopefully making the path to improving performance easier,
> since an improvement could potentially be just a change to the model string
> so that it grabs a new model from modelhub.
>
> The speed is marginally slower if we do a 1-for-1 swap, but that's a
> little bit misleading, because we currently run 2 parsers to generate
> features for the default ML negation module. If we don't need those parsers
> we can dramatically cut the processing time, even with the neural
> negation module. I tested this with the python code running on a machine
> with a 1070ti. The goal for these methods going forward if we want to scale
> should be to have the neural call do a few things with a single pass,
> especially if we are using large transformer models. But this proof of
> concept of a single task will hopefully make it easier for other folks to
> do that if they wish.
>
> FYI, another way of doing this is by using python libraries like cassis
> and actually having python functions be essentially UIMA AEs -- I think
> there will be a place for both approaches and I'm not trying to wall off
> work in that direction.
>
> Tim
>
>
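The three endpoints Tim lists line up one-to-one with the UIMA analysis-component lifecycle (initialize, process, collectionProcessComplete). As a minimal sketch of that mapping from the Java side, assuming only a base URL for the machine running negation_rest.py (the address and the class name NegationEndpoints are made up for illustration), the paths could be kept in one place like this. It only builds URIs; the HTTP method and the request/response payloads are defined by negation_rest.py and PolarityBertRestAnnotator and are not described in the thread, so they are not shown.

```java
import java.net.URI;

// Hypothetical client-side helper: pairs each UIMA lifecycle call with the
// endpoint path listed for negation_rest.py. Host and port are assumptions.
final class NegationEndpoints {

    /** UIMA analysis-component lifecycle steps and their REST paths. */
    enum Step {
        INITIALIZE("/negation/initialize"),                                   // AE initialize()
        PROCESS("/negation/process"),                                         // AE process(JCas)
        COLLECTION_PROCESS_COMPLETE("/negation/collection_process_complete"); // AE collectionProcessComplete()

        final String path;

        Step(String path) {
            this.path = path;
        }
    }

    private final String base;

    NegationEndpoints(String baseUrl) {
        // Drop one trailing slash so concatenation does not yield "//negation/...".
        this.base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
    }

    /** Absolute URI for a lifecycle step; the payload format is up to the server. */
    URI uriFor(Step step) {
        return URI.create(base + step.path);
    }

    public static void main(String[] args) {
        // "http://localhost:8000" is an assumed address, not a documented default.
        NegationEndpoints endpoints = new NegationEndpoints("http://localhost:8000");
        for (Step step : Step.values()) {
            System.out.println(step + " -> " + endpoints.uriFor(step));
        }
    }
}
```

Keeping the paths in an enum keyed by lifecycle step makes it hard for the Java annotator and the Python server to drift apart silently.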


Re: performance report

2021-01-23 Thread Peter Abramowitsch
Hi Greg,

I’ve found that there’s so much difference between note styles that have
performance implications and so many interactions between pipeline
configurations which affect overall performance, that really the only way to
get a sense of performance is either on a very coarse level, measuring process
time across large collections of varied notes, or very granular using something
like jvisualvm.   Using the latter I saw some surprising things, some of which
I was able to tackle with minor software changes, while others are deep in UIMA
utilities used by cTakes.  The biggest factor, in my experience after
processing millions of notes, is notes that have reached about 5k AND are
missing punctuation.  At around this size begins a geometric rise in complexity
of internal structures that depend on sentences and a serious elevation of
processing time.
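The "very coarse level" of measurement can be made a little more telling by bucketing. A minimal sketch, assuming the pipeline can be invoked once per note (here it is a plain Consumer<String> stand-in, not a cTAKES API) and assuming "5k" means roughly 5,000 characters: time each note with System.nanoTime() and group the totals by note length and by whether the note contains any sentence-ending punctuation, the combination described above as expensive. CoarseTimer and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical coarse timing harness: wall-clock time per note, aggregated
// by (length bucket, has sentence punctuation).
final class CoarseTimer {

    /** Running totals for one bucket. */
    static final class Stats {
        long notes;
        long totalNanos;

        double meanMillis() {
            return notes == 0 ? 0.0 : totalNanos / 1e6 / notes;
        }
    }

    /** True if the text contains '.', '?' or '!' anywhere. */
    static boolean hasSentencePunctuation(String text) {
        return text.chars().anyMatch(c -> c == '.' || c == '?' || c == '!');
    }

    /** Bucket label such as "long/no-punct"; the threshold is in characters (assumed unit). */
    static String bucket(String note, int longThreshold) {
        String size = note.length() >= longThreshold ? "long" : "short";
        String punct = hasSentencePunctuation(note) ? "punct" : "no-punct";
        return size + "/" + punct;
    }

    /** Runs the stand-in pipeline on each note and accumulates elapsed time per bucket. */
    static Map<String, Stats> time(List<String> notes, Consumer<String> pipeline, int longThreshold) {
        Map<String, Stats> byBucket = new LinkedHashMap<>();
        for (String note : notes) {
            long start = System.nanoTime();
            pipeline.accept(note);
            long elapsed = System.nanoTime() - start;
            Stats s = byBucket.computeIfAbsent(bucket(note, longThreshold), k -> new Stats());
            s.notes++;
            s.totalNanos += elapsed;
        }
        return byBucket;
    }

    public static void main(String[] args) {
        List<String> notes = List.of("Patient denies chest pain.", "x".repeat(6000));
        // No-op stand-in; replace with a call that runs one note through your pipeline.
        Map<String, Stats> report = time(notes, note -> { }, 5000);
        report.forEach((k, v) -> System.out.printf("%s: %d notes, %.3f ms mean%n", k, v.notes, v.meanMillis()));
    }
}
```

Per-note wall-clock numbers include JIT warm-up and garbage collection, so discarding the first notes or repeating the run gives steadier figures; for per-method detail a profiler such as jvisualvm is still the right tool.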

Peter

Sent from my iPad

> On Jan 23, 2021, at 18:09, Greg Silverman  wrote:
> 
> I found this:
> https://medium.com/@felix_chan/install-apache-ctakes-924c40967ce2, which
> states: "A performance report is generated when the process is done."
> 
> However, we are running this from the command line and no such report is
> being generated.
> 
> Thanks!
> 
>> On Sat, Jan 23, 2021 at 11:05 AM Greg Silverman  wrote:
>> 
>> Hi all,
>> Is there a way to easily generate a performance report similar to the one
>> generated by MetaMap (with timings for each task, etc.)?
>> 
>> Thanks in advance!
>> 
>> Greg--
>> 
>> --
>> Greg M. Silverman
>> Senior Systems Developer
>> NLP/IE 
>> Department of Surgery
>> University of Minnesota
>> g...@umn.edu
>> 
>> 
> 
> -- 
> Greg M. Silverman
> Senior Systems Developer
> NLP/IE 
> Department of Surgery
> University of Minnesota
> g...@umn.edu


Re: Apache cTAKES 4.0.0.1 : UMLS Authentication Patch

2021-01-20 Thread Peter Abramowitsch
Thanks Sean!

Peter

On Wed, Jan 20, 2021 at 4:25 PM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> As some have experienced, the U.S.A. National Library of Medicine (NLM)
> has changed the authentication method for using the Unified Medical
> Language System (UMLS).
>
> https://www.nlm.nih.gov/research/umls/index.html
>
>
> Though a bit late in its arrival, Apache cTAKES now has a patch release
> that supports the new UMLS authentication method.
>
>
> The release number is 4.0.0.1, an update of the previous release version
> 4.0.0 with a single change to enable the new UMLS authentication.
>
> No other code or functionality has been modified and there are no
> enhancements to the previous release 4.0.0
>
>
> There are instructions for use on the Apache cTAKES wiki.
>
> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0.0.1
>
>
> The source code is available in the 4.0.0.1 tag of the Subversion (svn)
> repository.
>
> https://svn.apache.org/repos/asf/ctakes/tags/ctakes-4.0.0.1/
>
>
> The jar and pom files are available from Maven Central, and any
> applications utilizing Apache cTAKES as an Apache Maven dependency should
> update their pom files.
>
> https://search.maven.org/search?q=ctakes
>
>
> At this time the Apache infra script that points mirror download servers
> to the pre-built zip/archive files has not run.  I hope that the mirror
> servers are updated in a day or two.
>
> When the mirror servers are updated the buttons on the "Downloads" page of
> ctakes.apache.org should trigger a download of the patch version.  Until
> then you will get a "page not found" error.
>
> Until the pre-built archive downloads are available through the website,
> you can find them in the release repository.
>
>
> https://repository.apache.org/content/repositories/releases/org/apache/ctakes/ctakes-core/4.0.0.1/
>
>
> For more information please visit the wiki page on the Apache cTAKES
> 4.0.0.1 patch release.
>
> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0.0.1
>
>
>
> A very special thanks goes to Peter Abramowitsch for conception and
> original implementation of the authentication code and workflow.
>
>
> Many thanks to those who boldly tested, documented and otherwise made this
> patch and its trunk equivalent possible, including
>
> Kean Kaufmann
>
> Gandhi Rajan
>
> Eugenia Monogyiou
>
> Timothy Miller
>
> and anybody else that I have forgotten (apologies).
>
>
> And for those of you who gave me a bit of prodding to get this wrapped
> up and published ... in the end I am grateful and you have done us all a
> service.
>
>
> Cheers,
>
> Sean
>
>


Sean... about the UMLS URL

2021-01-11 Thread Peter Abramowitsch
Hi Sean

While helping Eugenia, I discovered that there's another spot that needs
changing which I had forgotten about.  I'm happy to help, but you first need
to decide how you want it done.

When one uses the DictionaryCreator, it, of course, writes a Dictionary
Descriptor XML file, and this will now contain lots of incorrect UMLS
information.

This happens in DictionaryXmlWriter.writeXmlFile

So in the minimal case, we need to remind users that if they create a new
dictionary they need to re-edit the xml file that accompanies it.

Peter


Re: Running PiperGui produces authentication error

2021-01-08 Thread Peter Abramowitsch
Hi Eugenia
Yes, you need to set the url value to "" in the sno_rx.xml.

The system still maintains the ability to give an alternate URL, as we do in
our installation where we've created a proxy process, but it sounds like
you need your instance of cTakes to contact the UTS site directly.   In
your case, cTakes is treating the old url in your file as the alternate
url.

So remove that URL definition in your sno_rx.xml and it will then point to
the correct url, which is https://utslogin.nlm.nih.gov/cas/v1/api-key and
is hard-wired into the code module.  You don't need to define it anywhere.
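The URL rule described here is small enough to state as code. A minimal sketch, assuming the descriptor value has already been read into a string (UmlsUrlResolver and resolve are hypothetical names, not the actual UmlsUserValidator code): a blank or missing value falls back to the hard-wired endpoint, and anything else is taken as an alternate URL, which is exactly why a stale old URL gets used.

```java
// Hypothetical helper illustrating the URL rule; not the UmlsUserValidator source.
final class UmlsUrlResolver {

    // Endpoint hard-wired into the validator, per this message.
    static final String DEFAULT_UTS_URL = "https://utslogin.nlm.nih.gov/cas/v1/api-key";

    /**
     * Blank or missing descriptor value -> built-in UTS endpoint.
     * Anything else is treated as an alternate URL (e.g. a local proxy),
     * so an old URL left in sno_rx.xml is used verbatim.
     */
    static String resolve(String configuredUrl) {
        if (configuredUrl == null || configuredUrl.trim().isEmpty()) {
            return DEFAULT_UTS_URL;
        }
        return configuredUrl.trim();
    }

    public static void main(String[] args) {
        System.out.println(resolve(""));                         // falls back to the built-in endpoint
        System.out.println(resolve("http://proxy.example/uts")); // hypothetical proxy, used as-is
    }
}
```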

Peter

On Fri, Jan 8, 2021 at 8:05 PM Monogyiou, Eugenia <
eugenia.monogy...@nttdata.com> wrote:

> Hi ,
>
> Thanks for the prompt response.
> (First of all, please ignore the contextAnnotator error: the classpath was not
> applied when "use of module" etc.; a restart, apply etc. sorted that out.)
> Now on to the authentication error
>
> When I look into the following files I cannot find any vendor, umlsuser,
> umlspw and umlsaddr values to set to an empty string.
>
> 1. resources/org/apache/ctakes/dictionary/lookup/fast/*.xml
> 2. desc/ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml
> 3. desc/ctakes-side-effect/desc/analysis_engine/DictionaryLookupAnnotator_sideEffectUMLS.xml
> 4. desc/ctakes-clinical-pipeline/desc/analysis_engine/auto/defaultPipeline.xml
>
> However, when I check sno_rx_16ab.xml I do find the following, which I
> removed entirely:
>
> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
>  
>  
>  
>
> The error I get :
> [...] Using dictionary descriptor
> org/apache/ctakes/dictionary/lookup/fast/ sno_rx_16ab.xml
> Using alternate umlsURL found via : properties
> Checking UMLS account at
> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>   UMLS account at
> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/> is not valid.
> Verify you are setting command line etc. etc. for umlsuser and umlspwd
> correctly
> Initialization of annotator class 
> org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator
> failed
>
> Any ideas please? Many thanks for your assistance
>
> Kind Regards,
>
> Eugenia Monogyiou | NTT Data UK
> Consulting & IT Solutions Ltd. 1 Royal Exchange, London EC3V 3DG
>
> Mob: +44 (0)7971623683 Email: eugenia.monogy...@nttdata.com
>
>
> -Original Message-
> From: Peter Abramowitsch 
> Sent: 08 January 2021 18:47
> To: dev@ctakes.apache.org
> Subject: Re: Running PiperGui produces No Analysis Component found for
> ContextDependentTokenizerAnnotator error
>
> Hi Eugenia
>
> Not sure if Sean has changed this part of the code that I had checked in,
> I'm still using my original check-in.  So there may be a small alignment
> issue if you are using his newer version and my original instructions.
>
> If you raise the log4j level to DEBUG, the UmlsUserValidator module should
> report where it is getting the credentials from. That should show
> you if there are any "rogue" credentials that are being read in favor of
> your environment variable.   Also make sure that the URL fields in those
> XML files are nulled out in case they still contain the old UTS url.
> The validator module will supply the correct URL unless you need to
> override it.
>
> Let us know the exact value of the variable(s) you are supplying.  If, for
> instance, you have a left-over UMLSUSER variable and it still has your old
> value, that would be a problem.  If you do have it, its value must be the
> string  "umls_api_key"  and the password variable would contain the key
> itself.
>
> Regards
> Peter
>
>
>
>
> On Fri, Jan 8, 2021 at 4:44 PM Monogyiou, Eugenia <
> eugenia.monogy...@nttdata.com> wrote:
>
> > Hello,
> >
> > Sean is probably very busy with the release, so if anyone else could
> > please provide any guidance on the below, it would be very much appreciated,
> > as I am working on a tight deadline at the moment :(
> >
> > I recently switched to IntelliJ and I am no longer encountering all the
> > weird maven plugin errors produced with Eclipse - however I have not
> > been able to run the Gui successfully yet.
> >
> > 1.   I got the code from the latest trunk and I do not seem to have
> > any password/username to remove from any xml files (am I missing
> > something?) I have set an environment variable with the api key
> >
> > 2.   I run the PiperRunnerGui by navigating to
> > \ctakes-gui\src\main\java\org\apache\ctakes\gui\pipeline\PiperRunnerGui ,
> > right-click and run successfully

Re: Running PiperGui produces No Analysis Component found for ContextDependentTokenizerAnnotator error

2021-01-08 Thread Peter Abramowitsch
Hi Eugenia

Not sure if Sean has changed this part of the code that I had checked in,
I'm still using my original check-in.  So there may be a small alignment
issue if you are using his newer version and my original instructions.

If you raise the log4j level to DEBUG, the UmlsUserValidator module should
report where it is getting the credentials from. That should show
you if there are any "rogue" credentials that are being read in favor of
your environment variable.   Also make sure that the URL fields in those
XML files are nulled out in case they still contain the old UTS url.
The validator module will supply the correct URL unless you need to
override it.

Let us know the exact value of the variable(s) you are supplying.  If, for
instance, you have a left-over UMLSUSER variable and it still has your old
value, that would be a problem.  If you do have it, its value must be the
string  "umls_api_key"  and the password variable would contain the key
itself.
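The credential rule in this message can be checked before starting a pipeline. A minimal sketch, assuming the environment is available as a map; UmlsEnvCheck is a hypothetical name, and because the password variable's name is not given here it is passed in as a parameter (the "UMLSPW" in main is an assumption). The check only applies when UMLSUSER is present, matching the "if you do have it" wording.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical pre-flight check of the UMLSUSER rule; not cTAKES code.
final class UmlsEnvCheck {

    static final String USER_VAR = "UMLSUSER";
    static final String REQUIRED_USER_VALUE = "umls_api_key";

    /**
     * Returns a problem description, or empty if nothing is wrong under this rule.
     * The rule only constrains the pair when UMLSUSER is present.
     */
    static Optional<String> check(Map<String, String> env, String passwordVar) {
        String user = env.get(USER_VAR);
        if (user == null) {
            return Optional.empty();
        }
        if (!REQUIRED_USER_VALUE.equals(user)) {
            return Optional.of(USER_VAR + " holds a left-over value; it must be \"" + REQUIRED_USER_VALUE + "\"");
        }
        String key = env.get(passwordVar);
        if (key == null || key.isBlank()) {
            return Optional.of(passwordVar + " must contain the UMLS API key itself");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // "UMLSPW" is an assumed variable name; substitute the one your setup reads.
        System.out.println(check(System.getenv(), "UMLSPW").orElse("No problem found under this rule"));
    }
}
```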

Regards
Peter




On Fri, Jan 8, 2021 at 4:44 PM Monogyiou, Eugenia <
eugenia.monogy...@nttdata.com> wrote:

> Hello,
>
> Sean is probably very busy with the release, so if anyone else could please
> provide any guidance on the below, it would be very much appreciated, as I am
> working on a tight deadline at the moment :(
>
> I recently switched to IntelliJ and I am no longer encountering all the weird
> maven plugin errors produced with Eclipse - however I have not been able to
> run the Gui successfully yet.
>
> 1.   I got the code from the latest trunk and I do not seem to have
> any password/username to remove from any xml files (am I missing
> something?) I have set an environment variable with the api key
>
> 2.   I run the PiperRunnerGui by navigating to
> \ctakes-gui\src\main\java\org\apache\ctakes\gui\pipeline\PiperRunnerGui ,
> right-click and run successfully (no run configuration set)
>
> 3.   I navigate to
> \ctakes-clinical-pipeline-res\src\main\resources\org\apache\ctakes\clinical\pipeline\
> and I load the DefaultFastPipeline.piper
>
> 4.   First of all I had to copy all the required subPipe files into this
> directory before the file loaded successfully - I am not sure whether this is
> default behaviour or I am missing some config
>
> 5.   and then when I attempted to run, it failed immediately with
> org.apache.uima.resource.ResourceInitializationException: MESSAGE
> LOCALIZATION FAILED: Can't find resource for bundle
> java.util.PropertyResourceBundle, key No Analysis Component found for
> ContextDependentTokenizerAnnotator
>
> Adding dependencies to the clinical pipeline and/or ctakes-gui pom file
> did not help.
>
> Many sincere thanks in advance,
>
>
> Kind Regards,
>
> Eugenia Monogyiou | NTT Data UK
> Consulting & IT Solutions Ltd. 1 Royal Exchange, London EC3V 3DG
>
> Mob: +44 (0)7971623683 Email: eugenia.monogy...@nttdata.com
>
> Disclaimer: This email and any attachments are sent in strictest
> confidence for the sole use of the addressee and may contain legally
> privileged, confidential, and proprietary data. If you are not the intended
> recipient, please advise the sender by replying promptly to this email and
> then delete and destroy this email and any attachments without any further
> use, copying or forwarding.
>


Re: Sean... & UMLS

2021-01-04 Thread Peter Abramowitsch
Hi Eugenia, yes, it is checked in.

I believe Sean is about to publish an updated version of what I checked in
for 4.0.1, and of what I supplied but didn't check in for 4.0.0.  The
difference is that his update allows old auth info living in XML files to
be ignored, while mine requires it to be explicitly removed or emptied,
and the key then to be supplied (following the instructions I attached to
Jira item CT-545).

What you do to upgrade all depends on whether you want to start today or
soon.   Following those instructions should work now and still be valid
when Sean checks in the slightly more permissive version.

Peter



On Mon, Jan 4, 2021 at 3:54 PM Monogyiou, Eugenia <
eugenia.monogy...@nttdata.com> wrote:

> Hello,
>
> Happy New Year! Hope everyone is well!
> I am just starting to work on this again and trying to catch up on the
> latest... I have not migrated yet; I am just about to...
> Peter, is your contribution in the trunk yet and should I be using it or
> should I just be following your advice in older posts? Are there JIRA posts
> I should be looking into as well?
>
> Thank you in advance,
>
>
> Kind Regards,
>
> Eugenia Monogyiou | NTT Data UK
> Consulting & IT Solutions Ltd. 1 Royal Exchange, London EC3V 3DG
>
> Mob: +44 (0)7971623683 Email: eugenia.monogy...@nttdata.com
>
> -Original Message-
> From: Peter Abramowitsch 
> Sent: 28 December 2020 09:46
> To: dev@ctakes.apache.org
> Subject: Sean... & UMLS
>
> Hi Sean
>
> Corresponding with Kean on another matter, it surfaced that people don't
> know about the UMLS change in the trunk and that those instructions I
> distributed to the early adopters didn't get circulated (Gandhi - see other
> email).
>
> Is there anything you'd like me to do to facilitate this?   Also, what about
> the 4.0.0 version I supplied?
>
> Peter
>


Intermittent instability of the new UMLS API webservice

2020-12-31 Thread Peter Abramowitsch
Hi All

I've been seeing that every so often the new UMLS web service that we are
using for validation is unavailable.  This condition lasts for a short
time (a minute or two) and then it returns.   Initially I thought it was the
network at my end, but it happened too often, and it looked more like their
service was just rebooting, since the service host could be found even during
their service downtimes.

I wrote to my technical contact at the NLM and they confirmed that there is
indeed a problem.  From the wording of their reply, my guess is that they
are only now discovering that it is a regular event and also that we are
the only ones putting it to a real test.

I will write with any updates.

Peter


Re: LabValueFinderTester updated [EXTERNAL]

2020-12-30 Thread Peter Abramowitsch
Thanks, Sean!

Btw, I noticed that someone has added two duplicate new issues (546 and 547)
about UMLS auth to JIRA without having noticed that there's a 545 we
created to show the issue was resolved.  For whatever reason I only have
permission to create an issue or add a comment to an issue, so I
couldn't mark those others as dupes.  So I just commented.  You may want to
fix that.

Peter

On Wed, Dec 30, 2020 at 5:43 PM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Excellent, thanks!  It looks like this component is getting some great
> attention!
> ____
> From: Peter Abramowitsch 
> Sent: Wednesday, December 30, 2020 11:27 AM
> To: dev@ctakes.apache.org
> Subject: LabValueFinderTester updated [EXTERNAL]
>
>
> Hi Sean and Kean
>
> I updated the LVF tester in regression-tests to fix the mismatch
> problem that made the LVF seem to fail more often than it actually did.
> The test is much more talkative in its output and does not transmit failure
> to the build process - it seemed more convenient that way.  Inserted a few
> more comments.
>
> Peter
>


LabValueFinderTester updated

2020-12-30 Thread Peter Abramowitsch
Hi Sean and Kean

I updated the LVF tester in regression-tests to fix the mismatch
problem that made the LVF seem to fail more often than it actually did.
The test is much more talkative in its output and does not transmit failure
to the build process - it seemed more convenient that way.  Inserted a few
more comments.

Peter


Re: Sean... & UMLS [EXTERNAL]

2020-12-28 Thread Peter Abramowitsch
Hi Sean
The debug lines in reference to...?

If you're thinking about my LabValueFinder conversation with Kean the other
day, I've been working with him on another email thread and we've found
some answers.   Is that what you were interested in?

Peter

On Mon, Dec 28, 2020 at 4:35 PM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Could you send me the debug lines and/or unit tests?
> ____
> From: Peter Abramowitsch 
> Sent: Monday, December 28, 2020 4:45 AM
> To: dev@ctakes.apache.org
> Subject: Sean... & UMLS [EXTERNAL]
>
>
> Hi Sean
>
> Corresponding with Kean on another matter, it surfaced that people don't
> know about the UMLS change in the trunk and that those instructions I
> distributed to the early adopters didn't get circulated (Gandhi - see other
> email).
>
> Is there anything you'd like me to do to facilitate this?   Also, what about
> the 4.0.0 version I supplied?
>
> Peter
>


Sean... & UMLS

2020-12-28 Thread Peter Abramowitsch
Hi Sean

Corresponding with Kean on another matter, it surfaced that people don't
know about the UMLS change in the trunk and that those instructions I
distributed to the early adopters didn't get circulated (Gandhi - see other
email).

Is there anything you'd like me to do to facilitate this?   Also, what about
the 4.0.0 version I supplied?

Peter


Re: Lab Value Finder

2020-12-24 Thread Peter Abramowitsch
Hi Kean
I hope this doesn't make you regret having mentioned that you wrote the
LabValueFinder in an email a few weeks back... (:-)

I've been trying to figure out why it isn't working for me in a production
setting.

So I'm now just running it with its own unit tests (which I turned into a
simple JUnit test) against the usual sno_rx_16ab dictionary.  They all
fail.   I added one extremely simple test to your more complex ones, but
that fails too, which makes me think there's something fundamental that
may be easy to fix.

It finds the concepts and many of the candidate "values", but matches the
wrong value, or no value, with a concept.  The debug output shows odd things
like this: two entries for one concept but with different search windows,
one of them with the window begin and end swapped.


24 Dec 2020 12:56:38 DEBUG LabValueFinder - Seeking value for: LabMention(14-21): Lactate between 21 and 14
24 Dec 2020 12:56:38 DEBUG LabValueFinder - Seeking value for: LabMention(14-21): Lactate between 21 and 29
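The symptom in the log (two windows recorded for the same LabMention(14-21), one of which runs from offset 21 back to 14) is easy to flag automatically while debugging. A minimal sketch, assuming a search window is just a pair of character offsets; the Window class and WindowCheck are hypothetical and are not LabValueFinder's own types or its fix.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical debugging aid: flag search windows whose begin is past their end.
final class WindowCheck {

    static final class Window {
        final int begin;
        final int end;

        Window(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        /** An inverted window cannot contain any candidate value. */
        boolean isInverted() {
            return begin > end;
        }

        @Override
        public String toString() {
            return begin + "-" + end;
        }
    }

    /** Returns the windows whose begin offset exceeds their end offset. */
    static List<Window> inverted(List<Window> windows) {
        return windows.stream().filter(Window::isInverted).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The two windows logged for the "Lactate" mention.
        List<Window> logged = List.of(new Window(21, 14), new Window(21, 29));
        System.out.println("Inverted windows: " + inverted(logged));
    }
}
```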

I've been going through the code, but it would probably be easier for you
to spot the issue rather than a person unfamiliar with it.   Do you have a
moment?

I've attached the JUnit test based on your unit test and its debug
output.  You'll have to change the package name, though.

Peter


Re: What to do about 4.0.0 and UMLS [EXTERNAL]

2020-12-09 Thread Peter Abramowitsch
Thanks Sean.

Since you plan to furnish just about everything, a poll wouldn't be
needed.  I was just thinking of a way that we could pare down the work to
the minimum needed for 4.0.0 users.

Peter

On Wed, Dec 9, 2020 at 3:40 PM Finan, Sean 
wrote:

> Hi Peter, Kean and others,
>
> > Seems to me that if we don't do anything,
> - I plan to give this 100% of my time starting next week.  Well, right
> after I clean the basement.
>
> > we will have a flurry of support
> > emails in 4.0.0 as soon as the NLM completes its roll-over.
> - I would call that optimistic  :^)
>
> > 1.  offer a tar file containing the two needed jars with instructions on
> > setting up new credentials and blanking out fields in the various xml
> > resources.
> > 2.  for 4.0.0 users that have done no customization whatsoever,
> - are good options for people that only ever install the binary.
> - This should be easy enough.  I will try to host it from the ctakes
> website.
>
> >> 3.  for 4.0.0 users that compile their own, provide a tar file containing
> > the sources plus instructions for modifying xml files and removing the
> > obsolete Junit file.
> >+1 for Option 3!
> - This should also be easy enough.  I will try to host it from the ctakes
> website.
>
> >> 4.  Create a new 4.0.0.x tag and build new binary and source packs with
> > pre-blanked xml files
> - I plan to do at least this much, and I would hope that it covers
> everybody that uses the source directly.
>
> >> Is it worth a quick email poll of 4.0.0 users?
> -Go for it!  Just to warn you, users may not be very responsive.
>
> Cheers,
> Sean
>
>
> 
> From: Kean Kaufmann 
> Sent: Wednesday, December 9, 2020 9:04 AM
> To: dev@ctakes.apache.org
> Subject: Re: What to do about 4.0.0 and UMLS [EXTERNAL]
>
>
> >
> > 3.  for 4.0.0 users that compile their own, provide a tar file containing
> > the sources plus instructions for modifying xml files and removing the
> > obsolete Junit file.
>
>
>  Is it worth a quick email poll of 4.0.0 users?
>
>
> +1 for Option 3!  Thanks Peter (and everybody)...
>
>
> On Wed, Dec 9, 2020 at 8:48 AM Peter Abramowitsch  >
> wrote:
>
> > Hi Sean et al
> >
> > Seems to me that if we don't do anything, we will have a flurry of support
> > emails from 4.0.0 users as soon as the NLM completes its roll-over.   I can
> > see several options and wonder which of these would stem the most calls and
> > provide the least work for source maintainers.
> >
> > 1.  offer a tar file containing the two needed jars with instructions on
> > setting up new credentials and blanking out fields in the various xml
> > resources.
> >
> > 2.  for 4.0.0 users that have done no customization whatsoever, offer a tar
> > file containing the new jars plus pre-modified xml files in desc and in
> > resources that can just overlay a virgin release.  This should just run as
> > soon as the user has added the umls key to the environment or command line.
> >
> > 3.  for 4.0.0 users that compile their own, provide a tar file containing
> > the sources plus instructions for modifying xml files and removing the
> > obsolete Junit file.
> >
> > 4.  Create a new 4.0.0.x tag and build new binary and source packs with
> > pre-blanked xml files   (Obviously this is the most work but may not be
> > needed)
> >
> > Is it worth a quick email poll of 4.0.0 users?
> >
> > Peter
> >
>


What to do about 4.0.0 and UMLS

2020-12-09 Thread Peter Abramowitsch
Hi Sean et al

Seems to me that if we don't do anything, we will have a flurry of support
emails from 4.0.0 users as soon as the NLM completes its roll-over.   I can see
several options and wonder which of these would stem the most calls and
provide the least work for source maintainers.

1.  offer a tar file containing the two needed jars with instructions on
setting up new credentials and blanking out fields in the various xml
resources.

2.  for 4.0.0 users that have done no customization whatsoever, offer a tar
file containing the new jars plus pre-modified xml files in desc and in
resources that can just overlay a virgin release.  This should just run as
soon as the user has added the umls key to the environment or command line.

3.  for 4.0.0 users that compile their own, provide a tar file containing
the sources plus instructions for modifying xml files and removing the obsolete
Junit file.

4.  Create a new 4.0.0.x tag and build new binary and source packs with
pre-blanked xml files   (Obviously this is the most work but may not be
needed)

Is it worth a quick email poll of 4.0.0 users?

Peter


Re: 4.0.0 trial UserTester [EXTERNAL] [SUSPICIOUS]

2020-12-08 Thread Peter Abramowitsch
Hi Sean
Hope I did the right thing.

I felt better creating the tester file to fit the configuration rather than
changing the configuration, especially since the other tester didn't do much;
it was just a stub.

Peter

On Tue, Dec 8, 2020 at 9:33 PM Peter Abramowitsch 
wrote:

> exactement.
>
> On Tue, Dec 8, 2020 at 8:02 PM Finan, Sean <
> sean.fi...@childrens.harvard.edu> wrote:
>
>> Argh.  And now I notice that there is no /*Tester.java  ...
>> 
>> From: Finan, Sean 
>> Sent: Tuesday, December 8, 2020 1:56 PM
>> To: dev@ctakes.apache.org
>> Subject: Re: 4.0.0 trial UserTester [EXTERNAL] [SUSPICIOUS]
>>
>>
>> Peter, thanks for making the test(s) actually do something.
>>
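>> Just as an aside, since it doesn't much matter right now: supposedly the
>> following Surefire configuration
>>
>> <plugin>
>>   <artifactId>maven-surefire-plugin</artifactId>
>>   <version>${maven-surefire-plugin.version}</version>
>>   <configuration>
>>     <includes>
>>       <include>**/Test*.java</include>
>>       <include>**/*Test.java</include>
>>       <include>**/*Tests.java</include>
>>       <include>**/*TestCase.java</include>
>>     </includes>
>>   </configuration>
>> </plugin>
>>
>> removes that JUnit class naming requirement when testing with Maven.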
>>
>> In JUnit 5, "Test*" was added to the default test class search used when
>> running JUnit by other means:
>>
>> https://github.com/junit-team/junit5/commit/ce303c347e42b606ce51eaf72294dabf19a0ae72
>>
>> JUnit 5: yet another cTAKES update that would be nice to have.
>>
>> Sean
>>
>> 
>> From: Peter Abramowitsch 
>> Sent: Tuesday, December 8, 2020 1:03 PM
>> To: dev@ctakes.apache.org
>> Subject: 4.0.0 trial UserTester [EXTERNAL]
>>
>>
>> Tim: yes, UmlsUserTester.java was old code and needs to go. It never
>> did anything and didn't have a name recognized by JUnit. I created a new
>> one alongside it called UmlsUserTest.java. It works, but it is disabled
>> because you need to modify it and your environment to perform all the
>> tests it does. Requirements for testing are documented in the code.
>>
>> Peter
>>
>
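
To see why a class named UmlsUserTester.java is never picked up by Maven while
UmlsUserTest.java is, here is a rough sketch that checks file paths against the
four include patterns quoted in Sean's message. It uses the JDK's glob
PathMatcher as a stand-in for Surefire's own pattern matcher, so it is only an
approximation (for example, a JDK glob `**/` needs at least one directory in
front of the file name). The `src/test/java/` path is a hypothetical example.

```java
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;

public class SurefireIncludeCheck {

    // The include patterns quoted in the thread.
    static final List<String> INCLUDES = List.of(
            "**/Test*.java", "**/*Test.java", "**/*Tests.java", "**/*TestCase.java");

    /** Returns true if the path matches at least one include glob (JDK glob semantics). */
    static boolean isIncluded(String sourcePath) {
        FileSystem fs = FileSystems.getDefault();
        Path path = fs.getPath(sourcePath);
        for (String pattern : INCLUDES) {
            PathMatcher matcher = fs.getPathMatcher("glob:" + pattern);
            if (matcher.matches(path)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical test-source directory; "**/" in a JDK glob needs a directory part.
        String dir = "src/test/java/";
        for (String name : List.of("UmlsUserTester.java", "UmlsUserTest.java")) {
            System.out.println(name + " -> " + isIncluded(dir + name));
        }
    }
}
```

Running it should print false for UmlsUserTester.java and true for
UmlsUserTest.java, which matches `**/*Test.java`.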


Re: 4.0.0 trial UserTester [EXTERNAL] [SUSPICIOUS]

2020-12-08 Thread Peter Abramowitsch
Exactly.


4.0.0 trial UserTester

2020-12-08 Thread Peter Abramowitsch
Tim: yes, UmlsUserTester.java was old code and needs to go. It never
did anything and didn't have a name recognized by JUnit. I created a new
one alongside it called UmlsUserTest.java. It works, but it is disabled
because you need to modify it and your environment to perform all the tests
it does. Requirements for testing are documented in the code.

Peter
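
Outside Maven, the JUnit Platform ConsoleLauncher filters test classes by fully
qualified class name. As I understand the JUnit 5 User Guide, its default
`--include-classname` pattern is `^(Test.*|.+[.$]Test.*|.*Tests?)$`; please
verify this against the JUnit version in use. The sketch below applies that
regex to class names in a hypothetical package `org.example.umls` to show that
a `*Tester` name falls outside it.

```java
import java.util.List;
import java.util.regex.Pattern;

public class JUnitPlatformNameCheck {

    // Default class-name include pattern of the JUnit Platform ConsoleLauncher
    // (as documented in the JUnit 5 User Guide; verify for your version).
    static final Pattern DEFAULT_INCLUDE =
            Pattern.compile("^(Test.*|.+[.$]Test.*|.*Tests?)$");

    /** True if the fully qualified class name matches the default include pattern. */
    static boolean isIncluded(String fullyQualifiedClassName) {
        return DEFAULT_INCLUDE.matcher(fullyQualifiedClassName).matches();
    }

    public static void main(String[] args) {
        // Hypothetical package name; substitute the real one.
        String pkg = "org.example.umls.";
        for (String simpleName : List.of("UmlsUserTester", "UmlsUserTest", "TestUmlsUser")) {
            System.out.println(simpleName + " -> " + isIncluded(pkg + simpleName));
        }
    }
}
```

Running it should print false for UmlsUserTester and true for UmlsUserTest
and TestUmlsUser.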

