Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2024-05-15 Thread Miller, Timothy

Thanks Sean,
I was able to get it working – definitely a user/documentation issue and not an 
issue with the code. Looks like a great release. I’m happy to vote for release 
+1.
Tim

From: Finan, Sean 
Date: Tuesday, May 14, 2024 at 10:35 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]
* External Email - Caution *

Ah - are you just running the class within IntelliJ?  If so, you need to set 
the classpath in the run configuration to be ctakes-examples.  Otherwise the 
classpath doesn't contain anything from modules outside ctakes-gui and 
ctakes-core.

Alternatively, run the Maven compile step with the "runPiperGui" profile 
selected.  That will also run the piper file submitter GUI with the correct 
classpath.

Using a binary build, after running bin/getUmlsDictionary, running 
bin/runPiperSubmitter also works.

I don't want to do it for 5.1.0, but I should make the names of the class, 
profile, and script match.

I will check the wiki instructions and make sure that -exact- details are in 
there.

Sean


Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2024-05-14 Thread Miller, Timothy

I can check out and build successfully with mvn from the command line. I can 
successfully open the project in IntelliJ and run the piper file submitter. I 
get an error trying to run the default fast pipeline piper file:

Loading Piper File DefaultTokenizerPipeline ...


Error: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle 
java.util.PropertyResourceBundle, key No Analysis Component found for 
ContextDependentTokenizerAnnotator



It doesn’t seem to be able to find the ContextDependentTokenizerAnnotator.

Tim




Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] [SUSPICIOUS]

2024-05-14 Thread Miller, Timothy

What would you recommend for testing? Download the release tag to a clean 
system and try to do mvn compile and run some tests?
Tim

From: Finan, Sean 
Date: Thursday, May 2, 2024 at 6:57 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL] 
[SUSPICIOUS]

Hi Gandhi,

  *   So post release I would be able to run mvn clean
install on web rest module rather than relying on resources in .m2 folder

The opposite.  Pre-release there are no jars on maven central, post-release 
there are.
Running 'mvn package' directly on the ctakes-web-rest project (in its 
directory), or running 'mvn package' on the ctakes -main- project (in the main 
ctakes root directory) with the web-rest-build profile enabled 
('-Pweb-rest-build'), will build the ctakes-web-rest.war web package.
That profile is defined in the main ctakes pom.
https://urldefense.com/v3/__https://github.com/apache/ctakes/blob/main/pom.xml*L1074__;Iw!!NZvER7FxgEiBAiR_!qEC094V4WptTp0dDb8gSoucu-ATor3CRJ8D064AyK511nLCL-ngQkVe-b3Ci3xgtI2BMcKLF1VFuuZZ7Q0sAXaNzpr0cHK4p5MRW6FKEcjQdPO4jLXZJfA$

I added a little bit to your instructions in the ctakes-web-rest README  
https://urldefense.com/v3/__https://github.com/apache/ctakes/blob/main/ctakes-web-rest/README__;!!NZvER7FxgEiBAiR_!qEC094V4WptTp0dDb8gSoucu-ATor3CRJ8D064AyK511nLCL-ngQkVe-b3Ci3xgtI2BMcKLF1VFuuZZ7Q0sAXaNzpr0cHK4p5MRW6FKEcjQdPO7xPAX45w$

The lines here indirectly apply to pre-release builds:
https://urldefense.com/v3/__https://github.com/apache/ctakes/blob/main/ctakes-web-rest/README*L22__;Iw!!NZvER7FxgEiBAiR_!qEC094V4WptTp0dDb8gSoucu-ATor3CRJ8D064AyK511nLCL-ngQkVe-b3Ci3xgtI2BMcKLF1VFuuZZ7Q0sAXaNzpr0cHK4p5MRW6FKEcjQdPO77G-Kmpw$

The 5.1.0-SNAPSHOT version of ctakes-web-rest has a dependency on the 5.1.0 
version of ctakes modules (not the SNAPSHOT).
https://urldefense.com/v3/__https://github.com/apache/ctakes/blob/main/ctakes-web-rest/pom.xml*L14__;Iw!!NZvER7FxgEiBAiR_!qEC094V4WptTp0dDb8gSoucu-ATor3CRJ8D064AyK511nLCL-ngQkVe-b3Ci3xgtI2BMcKLF1VFuuZZ7Q0sAXaNzpr0cHK4p5MRW6FKEcjQdPO7IKyYTAw$

The pre-release basically contains an equivalent to "changed code or resources" 
in that the code and resources in the pre-release do not exist on maven 
central, which is where a maven build would normally get them.
When maven builds the pre-release it will not be able to find version 5.1.0 of 
any jars through maven central, so it will look for them in your local .m2 
directory.
Maven puts the 5.1.0 jars in your .m2 directory when you run 'mvn install' on 
the main ctakes project.

In summary,
To build ctakes-web-rest to test the pre-release war, one must run 'mvn 
install' on the ctakes main project before they run 'mvn package' on the 
ctakes-web-rest project (or on the main project's web-rest-build profile).
To build ctakes-web-rest once ctakes 5.1.0 has been released, the extra 
preliminary step of running 'mvn install' will not be necessary.
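The build order in the summary above can be sketched as a small helper. This is only an illustration, not part of the cTAKES tooling: the function name `web_rest_build_steps` and the checkout directory name `ctakes` are assumptions, while the Maven commands and the `-Pweb-rest-build` profile are the ones named in the message.

```python
from typing import List, Tuple


def web_rest_build_steps(released: bool,
                         via_main_profile: bool = False,
                         main_dir: str = "ctakes") -> List[Tuple[str, str]]:
    """Return (directory, command) pairs for building ctakes-web-rest.

    main_dir is a hypothetical checkout location of the main ctakes project.
    """
    steps: List[Tuple[str, str]] = []
    if not released:
        # Pre-release: the 5.1.0 jars are not on Maven Central yet, so they
        # must first be installed into the local .m2 directory.
        steps.append((main_dir, "mvn install"))
    if via_main_profile:
        # Build the war from the main project with the profile enabled.
        steps.append((main_dir, "mvn package -Pweb-rest-build"))
    else:
        # Build the war directly in the ctakes-web-rest project directory.
        steps.append((f"{main_dir}/ctakes-web-rest", "mvn package"))
    return steps
```

Calling `web_rest_build_steps(released=False)` lists the extra `mvn install` step first; with `released=True` that step is omitted.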

  *   If you have some time this week, we can connect to understand what 
exactly is the problem.

I can meet you tomorrow evening your time (4-7 pm IST) to work with you on the 
SQL problem.  If you'd rather keep your Friday night to yourself, I can work 
with the same time slot any time through next Monday evening.

Before the 6.0.0 release I will put some Release Manager information in the 
wiki.  The maven release process using a GitHub repo requires a little trick 
that took me a long time to figure out, and the pre-release testing deserves 
some recorded documentation.

Sean

From: gandhi rajan 
Sent: Thursday, May 2, 2024 1:42 AM
To: dev@ctakes.apache.org 
Subject: Re: Please test the Apache cTAKES 5.1.0 release candidate [EXTERNAL]

Hi Sean,

Thanks for the update. So you mean that post release I would be able to run
mvn clean install on the web rest module rather than relying on resources in
the .m2 folder? In fact, I was trying to build them on a machine which doesn't
have any historic jars in the .m2 folder

Re: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2023-12-20 Thread Miller, Timothy

To some extent I think (and hope!) it will be superseded by the PBJ code that 
will be in cTAKES 5.0.0 anyways.
Tim


From: Finan, Sean 
Date: Wednesday, December 20, 2023 at 3:43 PM
To: dev@ctakes.apache.org 
Subject: Re: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL] 
[SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]


Hi Tim,

Thanks for the explanation.  I am going to remove the BERTRest classes.

Sean


Re: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL] [SUSPICIOUS]

2023-12-20 Thread Miller, Timothy

Hi Sean and Peter,
I put the BERTRest stuff in, with the intention of finishing it and adding the 
python code to run the REST server, but just never finished it up. I’m ok with 
leaving it out for now. (Now that we are on GitHub it would be so much easier 
to do things like this in branches and only merge when it’s actually finished!)
Thanks
Tim


From: Finan, Sean 
Date: Tuesday, December 5, 2023 at 10:59 AM
To: dev@ctakes.apache.org 
Subject: Re: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL] 
[SUSPICIOUS]


Hi Peter,

My thoughts on this:

> a newer version of Mastif than Ctakes is packaged with, and additional 
> modifications that I've made
- Are you saying that you made a local/custom version of mastif that builds 
upon their latest (1.6?) version?  Or do you just need to update the ctakes 
dependency from 1.4 to 1.6?
- If you have a custom version of mastif, one way that I have dealt with this 
in other projects is to keep changes to the standard library 'in parallel' and 
call the parallel versions in my/our code.  If mastif is written in java then 
this is fairly easy to do by creating another module that shares the package 
structure of mastif.  The parallel code can be put into ctakes.
- Another way to deal with it is to distribute your custom code and jar with 
ctakes.  Mastif appears to use an Apache 2.0 license, so I think that this can 
be done.  If your changes are extensive or make the parallel option 
inconvenient then this may be the way to go.  "Developed by the Apache Software 
Foundation and introduced in 2004, the Apache 2.0 License is a permissive free 
software license. The license permits use of the software for any purpose, 
users are able to distribute it, to modify it, and to distribute modified 
versions of the software."  - 
https://urldefense.com/v3/__https://pitt.libguides.com/openlicensing/apache2*:*:text=Apache*20License,modified*20versions*20of*20the*20software__;I34lJSUlJQ!!NZvER7FxgEiBAiR_!v4zkgN5rD-SYoU0guBeLZRHcTP_prpbCx0vc8SAx_RQhjGimd-7xb6PbvmISt3NmGl9Il1XIjNX47wK3l06ENC92i2qu7uQbxhON4pqoXmHM8nTgt3CGPw$
- For an inclusion of a modified mastif in ctakes, maybe just put the whole 
thing into a project named "ctakes-mastif".

> no documentation or references
- A common problem with ctakes code, tests, resources ...  unfortunate and 
difficult to deal with.  I am guilty of some of this paucity of information.

> dependent on a BertREST server
- In instances such as this I would say that somebody checked in unfinished 
code, or that somebody forgot to check in a resource.  However, this particular 
code probably came from a developer working on an external project and checked 
in code that is intended to be used by that project.
- For any new developers out there: It is a 'best practice' to create your own 
project and include ctakes as a dependency.  Keep your project code only in 
your project repository.  If you want to make changes to ctakes in parallel, 
you can also create a module in your ctakes source root and put your non-ctakes 
code only in that module.  Don't check in that module!
- All that said, everybody forgets/makes mistakes/hurries ...


Sean


From: Peter Abramowitsch 
Sent: Tuesday, December 5, 2023 12:38 PM
To: dev@ctakes.apache.org 
Subject: Examining Ctakes 5.0 - two sides of the same question [EXTERNAL]


The question is: what is our policy if a resource in the ctakes archive
depends upon another resource that is not in the archive and may not be
available elsewhere?  I'm sure there are other examples, but here are
two:

1.   I've done some enhancements to the ZoneAnnotator for note section
detection, but these depend upon a newer version of Mastif than Ctakes is
packaged with, and additional modifications that I've made.   If I do add
the updates to the Zone Annotator, where should I put the customized Mastif
library - does it belong in cTakes?

2.  I found a couple of interesting annotators in the archive that are
dependent on a BertREST server, but there's no documentation or references
as to what code base that server comes from or whether its BERT model is
even publicly available.

DocTimeRelBertRestAnnotator
TemporalBertRestAnnotator
PolarityBertRestAnnotator

Here's my feeling:  Ctakes sources should be packaged to either be
self-sufficient or based on publicly available dependencies at the time of
check in.  If we really want to keep

Best practices for documenting NLP versions

2022-10-21 Thread Miller, Timothy

We’ve recently been using cTAKES for some internal projects where we make 
modifications, often using the REST server, combined with an open-source python 
client that makes the output of the REST server easy to post-process:
https://github.com/Machine-Learning-for-Medical-Language/ctakes-client-py
written by my colleagues Andy McMurry and Mike Terry, and pip installable. The 
output is then either converted to FHIR or written to whatever convenient 
format we need.

But it’s useful to know for a given run on a given project, what was the NLP 
configuration that produced this output? Obviously, there are things like 
version numbers, but since cTAKES is highly configurable, and our 
post-processing libraries have versions, and we may use trunk or a previous 
commit instead of releases, things get complicated quickly. Does anyone have an 
existing solution they are willing to share? Or does anyone have any thoughts 
on this topic? This question goes slightly beyond cTAKES, but cTAKES is 
responsible for a lot of the complexity in figuring this out since it’s the 
most configurable component.
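One possible shape for such a record, offered only as a sketch and not as an existing cTAKES or ctakes-client-py feature: collect the version (or commit) of each component and a hash of the pipeline configuration into a small manifest, then derive a short identifier that can be stored next to the output. The function names and the manifest fields below are assumptions made for illustration.

```python
import hashlib
import json
from typing import Dict


def build_run_manifest(component_versions: Dict[str, str],
                       config_text: str) -> Dict[str, object]:
    """Describe one NLP run: component versions plus a hash of the config.

    component_versions maps a component name to a release number or commit id;
    config_text is the full text of the pipeline configuration that was used.
    """
    digest = hashlib.sha256(config_text.encode("utf-8")).hexdigest()
    return {
        "components": dict(sorted(component_versions.items())),
        "config_sha256": digest,
    }


def manifest_id(manifest: Dict[str, object]) -> str:
    """Short, stable identifier derived from the canonical JSON form."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
```

Two runs share an identifier only if every recorded version and the configuration text are the same, so the identifier can be written into each output record and the full manifest stored once per project.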

Thanks
Tim

Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL] [SUSPICIOUS]

2022-06-02 Thread Miller, Timothy

My recollection was that we ran into issues in previous attempts at migration 
with the large file sizes in our repo.
Tim

On Thu, 2022-06-02 at 20:55 +, Finan, Sean wrote:

Thank you Gandhi and Richard.

Unless somebody else beats me to it I will perform some research and see what 
approaches can be used and which might be best.  In the end the cTAKES Project 
Management Committee will need to vote for any action as sweeping as moving to 
github.

Sean

From: gandhi rajan <gandhiraja...@gmail.com>
Sent: Thursday, June 2, 2022 9:02 AM
To: dev@ctakes.apache.org
Subject: Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]

Hi Sean,

If we are sure that the SVN has all the latest changes and active
development is primarily on SVN, then why don't we request a fresh git
repository and push all the changes over there?

More info on
https://urldefense.com/v3/__https://infra.apache.org/svn-to-git-migration.html__;!!NZvER7FxgEiBAiR_!rXFMCtlZM4NpDPkgzeq-X2pj1rNwzQNTpZkMZXDoYiZKdJp0n4tDY6q9IcsGRPGrA6KhvmouV_1y_txDVok-tGy3dVLaqefQlQ$

On Thu, Jun 2, 2022 at 5:52 PM Finan, Sean
<sean.fi...@childrens.harvard.edu.invalid> wrote:

Hi Richard, you bring up a valid concern.

cTAKES Developers:

The Apache Foundation has had an initiative to "move" all projects to
GitHub for some time now.
I don't know much about how this is done.  If anybody out there has
knowledge or experience that they can pass on, please share.

Thanks,
Sean

From: Richard Eckart de Castilho <r...@apache.org>
Sent: Thursday, June 2, 2022 3:39 AM
To: dev@ctakes.apache.org
Subject: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL]

Hi,

it appears that the GitHub mirror of Apache cTAKES may be stuck.

When I check the svn log of
https://urldefense.com/v3/__https://svn.apache.org/repos/asf/ctakes/trunk/__;!!NZvER7FxgEiBAiR_!pH7M7eePuLp7ejJW09QaoQOZsyoj1CD8QySUDx79FZmu6CUuooFcB0dk0hJQ7aI7G3Sq3Mz_GzoiL9XZi-zSEw$
, I can see activity as recent as May 2022.

However, on GitHub, I can only see stale branches:
https://urldefense.com/v3/__https://github.com/apache/ctakes/branches__;!!NZvER7FxgEiBAiR_!pH7M7eePuLp7ejJW09QaoQOZsyoj1CD8QySUDx79FZmu6CUuooFcB0dk0hJQ7aI7G3Sq3Mz_GzoiL9Uu2s-59w$

Wouldn't it be good if the GitHub mirror would be kept up-to-date?

Best,

-- Richard

--
Regards,
Gandhi
"The best way to find urself is to lose urself in the service of others !!!"

Re: Ctakes + UMLS dictionary [EXTERNAL]

2022-01-18 Thread Miller, Timothy

I recently posted an updated 2021AA UMLS file to the ctakes resource 
sourceforge repo:
https://sourceforge.net/projects/ctakesresources/files/

which should be a drop-in replacement for the version included in the last 
ctakes release.

Extract this new file in the same directory as your release version. This 
container setup is an example of how to download the file and where to put it:
https://github.com/Machine-Learning-for-Medical-Language/ctakes-rest-package/blob/master/Dockerfile
and it references the updated dictionary descriptor here:
https://github.com/Machine-Learning-for-Medical-Language/ctakes-rest-package/blob/master/customDictionary.xml

Tim



On Tue, 2022-01-18 at 16:52 +, Shyam Bhimani wrote:

Peter,

Appreciate your response.

Shyam Bhimani


-Original Message-
From: Peter Abramowitsch <pabramowit...@gmail.com>
Sent: Tuesday, January 18, 2022 9:19 AM
To: dev@ctakes.apache.org
Subject: Re: Ctakes + UMLS dictionary



As distributed, it contains the mappings of CUIs to the 2015 SNOMED and RxNorm 
vocabularies.  It does not contain ICD 9 or 10 mappings.  But creating a custom 
dictionary is a normal aspect of any serious installation. This is how you can 
incorporate more recent versions of the UMLS and other vocabularies.  See the 
ctakes dictionary creator for more information.


On Tue, Jan 18, 2022, 7:36 AM Shyam Bhimani <sbhim...@targetrwe.com> wrote:

Hello,

When I dug a little deeper I found the information below on the cTAKES wiki.
Does it mean the default clinical pipeline uses the 2015 version of SNOMED,
RxNorm, ICD9, ICD10? Please advise.

Shyam Bhimani




From: Shyam Bhimani <sbhim...@targetrwe.com>
Sent: Thursday, January 13, 2022 8:10 PM
To: dev@ctakes.apache.org
Subject: Ctakes + UMLS dictionary

Hello,

I am new to cTAKES and having a hard time understanding what
year/version of the dictionaries (SNOMED-CT, RxNorm, ICD9, etc.) is being
used by the cTAKES default clinical pipeline.

I have some medication names that are not being picked up by cTAKES, e.g.
dupilumab, dupixent, so I am trying to understand why. Please advise.

TIA

Shyam Bhimani



Re: Performance of the cleartk history module [EXTERNAL]

2022-01-04 Thread Miller, Timothy

Peter,
That sounds really useful! Were you able to benchmark it for runtime on a 
reasonably sized sample of your notes? Just curious because I wouldn't have 
expected regex to be that much of a bottleneck.
Tim

On Tue, 2022-01-04 at 17:36 -0800, Peter Abramowitsch wrote:

* External Email - Caution *

Thank you for the fulsome and humorous response.  Yes, I understand
perfectly.  We definitely think along the same lines.  One of the drawbacks
of static, simple-to-understand utility functions like JCasUtil's is that
one can just slap things together without getting to grips with the waste
of resources that sometimes occurs.

This brings me to the topic of Negex.  I've made a lot of improvements to
it, including after I sent you that version last year.  It has been well
tested on over 100 million notes, so I think I can check it in.  But back to
performance: it used to execute 200+ regular expressions multiple times on
every sentence covering an identified annotation, regardless of whether
there was any hope of any of them matching.  My solution was to build an
inverted index of the compiled expressions keyed on the unique words found
in the expressions, so that, based on the sentence, I could look up and
execute only the expressions that might match.  This might cut the number
of regex operations down to 5 or 10, and sometimes to none at all.  There
were many other changes related to negation detection, of course: for
instance, handling sentences that switch between negating and non-negating
phrases.
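
For readers who want to picture the inverted-index idea, here is a minimal sketch. It is not the Negex code: the class name, method names and trigger words are hypothetical, and it assumes each pattern is registered under words of which at least one must appear in any text the pattern can match.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical sketch: look up candidate regexes by the words in a sentence. */
final class RegexInvertedIndex {
    // word (lower case) -> compiled patterns that can only match if that word is present
    private final Map<String, Set<Pattern>> patternsByWord = new HashMap<>();

    /** Register a compiled pattern under each of its trigger words. */
    void add(Pattern pattern, String... triggerWords) {
        for (String word : triggerWords) {
            patternsByWord
                .computeIfAbsent(word.toLowerCase(Locale.ROOT), k -> new LinkedHashSet<>())
                .add(pattern);
        }
    }

    /** Return only the patterns indexed under some word of the sentence. */
    Set<Pattern> candidates(String sentence) {
        Set<Pattern> result = new LinkedHashSet<>();
        for (String token : sentence.toLowerCase(Locale.ROOT).split("\\W+")) {
            Set<Pattern> hits = patternsByWord.get(token);
            if (hits != null) {
                result.addAll(hits);
            }
        }
        return result;
    }

    /** Run the candidate patterns only; the rest are never executed. */
    boolean anyMatch(String sentence) {
        for (Pattern pattern : candidates(sentence)) {
            if (pattern.matcher(sentence).find()) {
                return true;
            }
        }
        return false;
    }
}
```

A sentence with no trigger words yields an empty candidate set, so no regex runs for it at all.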

Peter

On Tue, Jan 4, 2022 at 10:47 AM Finan, Sean <

sean.fi...@childrens.harvard.edu

> wrote:

Great question.

The package name "windowed" isn't helpfully self-descriptive.  It contains

yet another bit of code that I wrote as quickly as possible to help

somebody in real-time with a problem.

* There is only a 'procedural' difference between the two.  The models and

methods are the same.

The assertion engine has a bunch of objects delegating to objects

delegating to more objects.  Each object calls one or more

JCasUtil.select() frequently for the same types.  They also redundantly

call JCasUtil.selectCovered() and selectCovering() for the same types.

process( jcas ) {
  Collection<..> sentences = ...select(..);
  delegateA.do( sentences );
}

class DelegateA {
  void do( Collection<..> sentences ) {
    for ( Sentence sentence : sentences ) {
      // walks the cas again for every sentence
      Collection tokens = JCasUtil.selectCovered( jcas, Token.class, sentence );
      delegateB.use( tokens );
    }
  }
}

class DelegateB {
  void use( Collection<..> tokens ) {
    // walks the cas yet again to recover the sentence the caller already had
    Collection sentence = JCasUtil.selectCovering( jcas, Sentence.class, tokens );
    ...
  }
}

The above isn't an exact representation, but you get the point.

The problem with code like this is the repeated traversal of the (object)
array in the cas.  Every JCasUtil.select* goes through the whole thing.
For a small document with a small cas (or early in a pipeline), that array
may be small and the traversal fast.  However, when people are
(inadvisably) processing a single document whose size is in the gigabyte
range, repeatedly going through the cas takes a long time.

So, what I did was create a single container object that holds Collections

of the types of interest and their covering relationships, populate all

that stuff once per process( jcas ) and pass that container through to each

delegate object.  Basically, a jcas lite.  The biggest culprit in the

assertion engines was repeatedly iterating over the array for covered and

covering windows, hence the subpackage name "windowed".
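
The container idea can be sketched in plain Java. This is only an illustration, not the code in the "windowed" package: Span is a hypothetical stand-in for a UIMA annotation, and WindowedContainer and its method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a UIMA annotation: a typed character span. */
final class Span {
    final String type;
    final int begin;
    final int end;

    Span(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }
}

/** Built once per document, then handed to every delegate instead of the cas. */
final class WindowedContainer {
    private final List<Span> sentences = new ArrayList<>();
    private final Map<Span, List<Span>> tokensBySentence = new IdentityHashMap<>();
    private final Map<Span, Span> sentenceByToken = new IdentityHashMap<>();

    WindowedContainer(Collection<Span> sentenceSpans, Collection<Span> tokenSpans) {
        sentences.addAll(sentenceSpans);
        for (Span sentence : sentences) {
            tokensBySentence.put(sentence, new ArrayList<>());
        }
        // The covering relationships are computed here, once, not in each delegate.
        for (Span token : tokenSpans) {
            for (Span sentence : sentences) {
                if (sentence.begin <= token.begin && token.end <= sentence.end) {
                    tokensBySentence.get(sentence).add(token);
                    sentenceByToken.put(token, sentence);
                    break;
                }
            }
        }
    }

    List<Span> getSentences() {
        return Collections.unmodifiableList(sentences);
    }

    /** Cached equivalent of a "covered" lookup: a map read, no traversal. */
    List<Span> getCoveredTokens(Span sentence) {
        return tokensBySentence.getOrDefault(sentence, Collections.emptyList());
    }

    /** Cached equivalent of a "covering" lookup; null if the token is in no sentence. */
    Span getCoveringSentence(Span token) {
        return sentenceByToken.get(token);
    }
}
```

The nested loop in the constructor is deliberately naive; the point is only that the cost is paid once per document and every delegate afterwards does map lookups.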

Is it faster for smaller docs?  Not so much.  Does it instantaneously
process the Encyclopaedia Britannica as one text?  Of course not.  Is it
orders of magnitude faster on such onerous docs?  In my tests, yes.

Going through my delegating example above, the end delegate is the same.

Hence the processing is the same and repeatable.  In my tests on both small

and gargantuan documents the windowed version and the original version

produced the same output.

Sean

From: Peter Abramowitsch <

pabramowit...@gmail.com

>

Sent: Tuesday, January 4, 2022 11:39 AM

To:

dev@ctakes.apache.org

Subject: Re: Performance of the cleartk history module [EXTERNAL]

* External Email - Caution *

Hi Sean

Ok..  I was confused whether I was meant to find it in the sources.

But while you're reading this, is there a brief way to describe the
difference between the older package

org.apache.ctakes.assertion.medfacts.cleartk

and

org.apache.ctakes.assertion.medfacts.cleartk.windowed ?

Peter

On Tue, Jan 4, 2022 at 7:47 AM Finan, Sean <

sean.fi...@childrens.harvard.edu

>

wrote:

Hi Peter,

I created a second engine that just used text

Re: empty preferredText [EXTERNAL]

2021-12-07 Thread Miller, Timothy

OK, I thought this might be what's happening. I did check my 2021 UMLS release 
and the cui does seem to have a preferred text but I think my container is 
using an older release. For what it's worth the CUI is:
C0360554

and a sentence that reproduces the issue in CVD with the current release is:

'Patient had problems tolerating oral hydrocortisone.'

I will see if I can find the older UMLS release lying around. I think the right 
workaround for now is your suggestion of using the covered text.

Tim


On Tue, 2021-12-07 at 17:59 +0100, Peter Abramowitsch wrote:

* External Email - Caution *



Hi Tim,


Yes, I've definitely encountered it.   It happens when the concept has a

CUI_TERM which has matched the text, but there is no corresponding entry in

the SNOMED or other vocab table mapping CUI to SNOMED.  The obvious choice

is to use the covered text as a surrogate, but technically it could be PHI

if that matters to you.  The other thing is to see if there's an MSH term

that maps using the metathesaurus.  If so, including MSH in your dictionary

as a src AND dest vocab will solve the problem.
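
The covered-text workaround could look something like the sketch below. The class, method and parameter names are hypothetical and this is not part of cTAKES or ctakes-web-rest; the flag is there because the covered text may be PHI.

```java
/** Hypothetical helper: choose a human-readable label for a concept. */
final class PreferredTextFallback {

    private PreferredTextFallback() {
    }

    /**
     * Prefer the dictionary's preferred text; fall back to the covered text
     * only when the caller allows it; otherwise show the bare CUI.
     */
    static String displayText(String cui, String preferredText,
                              String coveredText, boolean coveredTextAllowed) {
        if (preferredText != null && !preferredText.trim().isEmpty()) {
            return preferredText;
        }
        if (coveredTextAllowed && coveredText != null && !coveredText.trim().isEmpty()) {
            return coveredText;
        }
        return cui;
    }
}
```

Passing false for the flag keeps document text out of the display and shows the CUI instead.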


Peter



On Tue, Dec 7, 2021 at 5:45 PM Miller, Timothy <

<mailto:timothy.mil...@childrens.harvard.edu>

timothy.mil...@childrens.harvard.edu

> wrote:


Hello,

I'm using the dictionary lookup (through ctakes-web-rest) and trying to

read off the preferredText that comes back as a human-readable way to

display the CUI. On a very small percentage, there does not seem to be any

preferredText. Has anyone else encountered this? Is this a limitation of

the underlying ontologies or a bug we can address?

Tim

empty preferredText

2021-12-07 Thread Miller, Timothy

Hello,
I'm using the dictionary lookup (through ctakes-web-rest) and trying to read 
off the preferredText that comes back as a human-readable way to display the 
CUI. On a very small percentage, there does not seem to be any preferredText. 
Has anyone else encountered this? Is this a limitation of the underlying 
ontologies or a bug we can address?
Tim

Re: Another question about relationship extractors [EXTERNAL]

2021-10-27 Thread Miller, Timothy

Hi Peter,
I guess you're asking why there is annotator code for all the relations but 
only released models for location_of and degree_of (severity)? The simple 
reason is those are the only two that we felt were accurate enough to release. 
We had an annotated training corpus with all the relations, but some relation 
types did not have enough instances to train accurate models with the methods 
of the time. We're circling back pretty regularly to discuss whether newer 
methods might be able to do better with less data, we'll try to keep in touch 
about that.

Thanks
Tim


On Wed, 2021-10-27 at 11:35 +0200, Peter Abramowitsch wrote:

* External Email - Caution *



Hi (probably Sean),  are the default model.jars for the

*CausesBringsAboutRelationExtractorAnnotator* and the

*ManagesTreatsRelationExtractorAnnotator* not part of the cTAKES

sources?  I looked through the source at all pipers and all unit tests

and on the net and I didn't find references to the usage of these

annotators.  When I run with them, they are definitely looking for models

of their own, and there is code to do the training, but this is an area

that's still a mystery to me.  Are these models proprietary to U of

Colorado which is where the source seems to come from?


Peter

Re: Loading model - what? [EXTERNAL]

2021-09-13 Thread Miller, Timothy

Hi Ben,
Those come from the dependency parser and SRL system, and I think are generated 
from the external library (ClearNLP?) we depend on for those modules. As for 
the models themselves, the files are in ctakes-dependency-parser-res, but they 
are binary files that will be difficult to understand without ClearNLP.
Tim


On Mon, 2021-09-13 at 22:17 +0200, Benjamin hansen wrote:

* External Email - Caution *



Hi, when i run my ctakes code i see in the stdout loads of


Loading model:

.

Loading model:

...

Loading model:

.

Loading model:



etc.


I understand that my pipeline is loading a lot of models - but what models

are they? Is there any way i can find out what models are being loaded in

the pipeline?


I have tried to search for "Loading model" in both the ctakes and opennlp

source code to figure out where it's being printed from, but to no avail :(

Where is this being printed from? How can I find out what models are loaded?


Thanks in advance

Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

2021-05-18 Thread Miller, Timothy

But Sean, isn't what he's asking for essentially already implemented in cTAKES 
as the custom dictionary? I'm currently using that approach for my covid 
container:
https://github.com/Machine-Learning-for-Medical-Language/ctakes-covid-container
Tim

From: Finan, Sean 
Sent: Tuesday, May 18, 2021 11:55 AM
To: dev@ctakes.apache.org
Cc: Himanshu Shekhar Sahoo
Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

* External Email - Caution *

Hi Greg,

From 30,000 ft, I think that you would want to use the RutaEngine.

https://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.ae.basic
https://javadoc.io/doc/org.apache.uima/ruta-core/latest/org/apache/uima/ruta/engine/RutaEngine.html
http://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/main/java/org/apache/uima/ruta/engine/RutaEngine.java

That seems to be the actual analysis engine that loads and uses rules to create 
annotations.
While you could use an xml descriptor or use the piper "set" command and do 
things like mapping ruta to ctakes type systems, I would take the alternate 
approach of "copying" the initialize(..) and process (..) methods and modify 
them to use ctakes types directly.

Disclaimer:  I know very little about uima ruta.  At some point I did look into 
it but it was for a specific (ctakes-derivative) project and I didn't go 
further than basic doc perusal.

If you move forward with this please let us all know what you find.  I think 
that there will be great interest in the community.

Sean

From: Greg Silverman 
Sent: Tuesday, May 18, 2021 11:13 AM
To: dev@ctakes.apache.org
Cc: Himanshu Shekhar Sahoo
Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]

* External Email - Caution *

Hi Sean,
I was wondering if there was a way to use rule-based lookup of a custom
lexicon within cTAKES (say, a locally curated list of COVID-19 symptoms).
When I Googled around, I stumbled on UIMA Ruta, but couldn't find anything
specific to cTAKES.

Thanks!

Greg--

On Tue, May 18, 2021 at 10:04 AM Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

>  To which ctakes component(s) are you referring?
> 
> From: Greg Silverman 
> Sent: Sunday, May 16, 2021 6:02 PM
> To: dev@ctakes.apache.org; Himanshu Shekhar Sahoo
> Subject: rule-based lookup for custom lexicon [EXTERNAL]
>
> * External Email - Caution *
>
>
> I looked all over and could not find any information on how to add this
> pipeline component to cTAKES. I assume it uses UIMA Ruta?
>
> Thanks in advance!
>
> Greg--
> --
> Greg M. Silverman
> Senior Systems Developer
> NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
> Department of Surgery
> University of Minnesota
> g...@umn.edu
>

--
Greg M. Silverman
Senior Systems Developer
NLP/IE 

Department of Surgery
University of Minnesota
g...@umn.edu

multi-threads on REST client?

2021-03-25 Thread Miller, Timothy

Just wondering what the logistics of this are. The REST interface has a
CAS pool of 10, and when it gets a new request, it grabs a CAS and
sends it into a pipeline. So what happens if the REST endpoint is
getting hit by tons of different requests at the same time? I'm
experimenting with this in python and getting hard-to-understand errors
(as best I can tell, it's complaining that the output is
None). Just wondering if anyone has any insight about what's going on
on the server side and whether a) this _should_ work, b) it _could_
work if done properly.
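
To make the question concrete, here is a sketch of a generic bounded pool. It is not the ctakes-web-rest code, the class and method names are hypothetical, and I am not claiming this is what the server does. It only shows how an 11th simultaneous request can come back empty-handed if the checkout does not wait, and how a timed checkout would behave instead.

```java
import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a fixed-size pool of reusable objects (e.g. CASes). */
final class BoundedPool<T> {
    private final BlockingQueue<T> idle;

    /** The pool size is fixed by the (non-empty) collection of items given here. */
    BoundedPool(Collection<T> items) {
        idle = new ArrayBlockingQueue<>(items.size(), false, items);
    }

    /** Non-blocking checkout: returns null when every item is in use. */
    T tryAcquire() {
        return idle.poll();
    }

    /** Waits up to the timeout for an item to be returned; null if none was. */
    T acquire(long timeout, TimeUnit unit) throws InterruptedException {
        return idle.poll(timeout, unit);
    }

    /** Give the item back so that a waiting request can use it. */
    void release(T item) {
        idle.offer(item);
    }
}
```

With the non-blocking variant the caller has to handle null itself, for example by returning a "busy, retry later" response; with the timed variant requests queue up for a while before giving up.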

Thanks
Tim

Re: 4.0.0.1 patch [EXTERNAL]

2021-02-26 Thread Miller, Timothy

Hi Sean,
I can't answer your primary question, but my recollection is that
4.0.0.1 was an absolutely minimalist change to just fix the
authentication, so I don't think ytex would've been touched.
Tim


On Thu, 2021-02-25 at 17:24 +, Mullane, Sean *HS wrote:
> * External Email - Caution *
> 
> 
> Hello,
> 
> I am just catching up with the NLM auth changes. I tried replacing
> the ctakes-core-4.0.0.jar file with ctakes-core-4.0.0.1.jar, and am
> getting this error:
> 
> ERROR [PiperFileRunner] MESSAGE LOCALIZATION FAILED: Can't find
> resource for bundle java.util.PropertyResourceBundle, key No Analysis
> Component found for org.apache.ctakes.core.ae.CuiFilterAnnotator
> 
> I saw a message from Tim Miller from December mentioning removing
> ytex components from ctakes-core. Was this done on the released
> version of 4.0.0.1? We're using ytex so I wonder if that may be the
> cause of this error. Or maybe applying the patch isn't as simple as
> drop-in replacing the jar? (I changed the API key in my config files
> and that seems to be working as expected).
> 
> Thanks,
> Sean
> 
>

Re: Looking for comparable experiences with mysql [EXTERNAL]

2021-02-25 Thread Miller, Timothy

Gandhi,
Is that code public at all? I made a docker container for the REST
server that uses the hsql, but if mysql is even faster and the
dictionary building can be containerized that might be a nice next step
for better performance of the container.
Tim


On Thu, 2021-02-25 at 20:33 +0530, gandhi rajan wrote:
> * External Email - Caution *
> 
> 
> Hi Peter,
> 
> I noticed similar behavior while working on the cTAKES REST module. The
> in-memory HSQL in my case was stressing the application server memory
> and ended up slowing down the process, whereas MySQL performed better.
> The engine you use in MySQL matters as well.
> 
> We tested a MySQL-based UMLS dictionary using multiple pods running
> ctakes rest, and it scaled fairly well, but we haven't explored 100+
> connections. I guess that with connection pool configuration in the
> MySQL DB it should be manageable. Hope it helps.
> 
> On Thu, Feb 25, 2021 at 7:37 PM Peter Abramowitsch <
> pabramowit...@gmail.com>
> wrote:
> 
> > Hi all,
> > 
> > As an experiment I extracted my rather large HSQL UMLS dictionary
> > into a
> > local MYSQL instance and ran the equivalent of 3 simultaneous
> > ctakes
> > pipelines with the overlap lookup annotator against it with a set
> > of 1000
> > notes.
> > 
> > Comparing that with the same setup running against the traditional
> > in-memory HSQL database (three separate instances), I was surprised
> > to find
> > that the MySQL implementation was 30% faster even though it is
> > an out of
> > process DB
> > 
> > Has that been anyone else's experience as well?  And if so, do you
> > have any
> > experience with a MYSQL based UMLS dictionary with 100+ pipeline
> > connections?
> > 
> > Peter
> > 
> 
>

Re: neural negation model in ctakes [EXTERNAL]

2021-01-24 Thread Miller, Timothy

Peter, I'd be happy to try it, especially if it's made easy with a ctakes 
module! At the very least that sounds like it would be a good baseline 
comparison to use if we are benchmarking new ML methods. We have several 
datasets available internally that are not widely available in the research 
community.
Tim


From: Peter Abramowitsch 
Sent: Sunday, January 24, 2021 12:05 PM
To: dev@ctakes.apache.org
Subject: Re: neural negation model in ctakes [EXTERNAL]

* External Email - Caution *


That's great, Tim - it sounds very sophisticated!

In fact I had made some changes to the Negex Annotator last fall which I
hadn't checked in but was waiting for Sean to test.  In a great deal of my
own testing I discovered that Negex, which is easily expandable to
accommodate new constructions, had only a couple of serious flaws and I
believe I have fixed these, as well as a performance issue it had.   If
you're interested in testing it up against yours that would be great.
Reading your description above, I wondered how it would do in the case of
strings of entities which were negated by a single negating trigger phrase
either ahead or behind the series.  Or what happens when a series of
entities which begins as all being negated has one expressed in a way that
stops the negation pattern.  These are the weaknesses I addressed in my
changes.

Regards
Peter

On Sun, Jan 24, 2021 at 5:08 PM Miller, Timothy <
timothy.mil...@childrens.harvard.edu> wrote:

> Hi all,
> I just checked in a usable proof-of-concept for a neural (RoBERTa-based to
> be specific) negation classifier. The way it works is a tiny bit of python
> code (using FastAPI) sets up a REST interface that runs the classifier:
> ctakes-assertion/src/main/python/negation_rest.py
>
> it runs a default model that I trained and uploaded into Huggingface
> modelhub. It will automatically download the first time the server is run.
>
> there is a startup script there too:
> ctakes-assertion/src/main/python/start_negation_rest.sh
>
> The idea would be to run this on whatever machine you have with the
> appropriate GPU resources and it creates 3 REST endpoints:
> /negation/initialize  -- to load the model (takes longer the first time as
> it will download)
> /negation/process -- to classify the data and return negation values
> /negation/collection_process_complete -- to unload the model
>
> to mirror UIMA workflows. Then, the UIMA analysis engine sits in:
>
> ctakes-assertion/src/main/java/org/apache/ctakes/assertion/ae/PolarityBertRestAnnotator.java
>
> The main work here is converting the cTAKES entities/events into a simpler
> data structure that gets sent to the python REST server, making the REST
> call, and then converting the classifier output into the polarity property.
>
> Performance:
> The accuracy of this classifier is much better in my testing. I am looking
> forward to being able to hopefully make the path to improving the
> performance easier as it can potentially just be a change to the model
> string to have it grab a new model on modelhub.
>
> The speed is marginally slower if we do a 1-for-1 swap, but that's a
> little bit misleading, because we currently run 2 parsers to generate
> features for the default ML negation module. If we don't need those parsers
> we can dramatically cut the processing time even with the neural
> negation module. I tested this with the python code running on a machine
> with a 1070ti. The goal for these methods going forward if we want to scale
> should be to have the neural call do a few things with a single pass,
> especially if we are using large transformer models. But this proof of
> concept of a single task will hopefully make it easier for other folks to
> do that if they wish.
>
> FYI, another way of doing this is by using python libraries like cassis
> and actually having python functions be essentially UIMA AEs -- I think
> there will be a place for both approaches and I'm not trying to wall off
> work in that direction.
>
> Tim
>
>

neural negation model in ctakes

2021-01-24 Thread Miller, Timothy

Hi all,
I just checked in a usable proof-of-concept for a neural (RoBERTa-based to be
specific) negation classifier. The way it works is a tiny bit of python code
(using FastAPI) sets up a REST interface that runs the classifier:
ctakes-assertion/src/main/python/negation_rest.py

it runs a default model that I trained and uploaded into Huggingface modelhub.
It will automatically download the first time the server is run.

there is a startup script there too:
ctakes-assertion/src/main/python/start_negation_rest.sh

The idea would be to run this on whatever machine you have with the appropriate
GPU resources and it creates 3 REST endpoints:
/negation/initialize -- to load the model (takes longer the first time as it
will download)
/negation/process -- to classify the data and return negation values
/negation/collection_process_complete -- to unload the model

to mirror UIMA workflows. Then, the UIMA analysis engine sits in:
ctakes-assertion/src/main/java/org/apache/ctakes/assertion/ae/PolarityBertRestAnnotator.java

The main work here is converting the cTAKES entities/events into a simpler data
structure that gets sent to the python REST server, making the REST call, and
then converting the classifier output into the polarity property.

Performance:
The accuracy of this classifier is much better in my testing. I am looking
forward to being able to hopefully make the path to improving the performance
easier as it can potentially just be a change to the model string to have it
grab a new model on modelhub.

The speed is marginally slower if we do a 1-for-1 swap, but that's a little bit
misleading, because we currently run 2 parsers to generate features for the
default ML negation module. If we don't need those parsers we can dramatically
cut the processing time even with the neural negation module. I tested
this with the python code running on a machine with a 1070ti. The goal for
these methods going forward if we want to scale should be to have the neural
call do a few things with a single pass, especially if we are using large
transformer models. But this proof of concept of a single task will hopefully
make it easier for other folks to do that if they wish.

FYI, another way of doing this is by using python libraries like cassis and
actually having python functions be essentially UIMA AEs -- I think there will
be a place for both approaches and I'm not trying to wall off work in that
direction.

Tim

Re: Apache cTAKES 4.0.0.1 : UMLS Authentication Patch [EXTERNAL]

2021-01-21 Thread Miller, Timothy

Seconded, thanks a lot Sean and Peter for getting this working and
turned around so quickly! 
Tim

On Wed, 2021-01-20 at 23:13 +0100, Peter Abramowitsch wrote:
> * External Email - Caution *
> 
> 
> Thanks Sean!
> 
> Peter
> 
> On Wed, Jan 20, 2021 at 4:25 PM Finan, Sean <
> sean.fi...@childrens.harvard.edu> wrote:
> 
> > As some have experienced, the U.S.A. National Library of
> > Medicine (NLM)
> > has changed the authentication method for using the Unified Medical
> > Language System (UMLS).
> > 
> > https://www.nlm.nih.gov/research/umls/index.html
> >  
> > 
> > 
> > Though a bit late in its arrival, Apache cTAKES now has a patch
> > release
> > that supports the new UMLS authentication method.
> > 
> > 
> > The release number is 4.0.0.1, an update of the previous release
> > version
> > 4.0.0 with a single change to enable the new UMLS authentication.
> > 
> > No other code or functionality has been modified and there are no
> > enhancements to the previous release 4.0.0
> > 
> > 
> > There are instructions for use on the Apache cTAKES wiki.
> > 
> > https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0.0.1
> >  
> > 
> > 
> > The source code is available in the 4.0.0.1 tag Subversion (svn)
> > repository.
> > 
> > https://svn.apache.org/repos/asf/ctakes/tags/ctakes-4.0.0.1/
> >  
> > 
> > 
> > The jar and pom files are available from maven central and any
> > Applications utilizing Apache cTAKES as an Apache Maven dependency
> > should
> > update their pom files.
> > 
> > https://search.maven.org/search?q=ctakes
> >  
> > 
> > 
> > At this time the Apache infra script that points mirror download
> > servers
> > to the pre-built zip/archive files has not run.  I hope that the
> > mirror
> > servers are updated in a day or two.
> > 
> > When the mirror servers are updated the buttons on the "Downloads"
> > page of
> > ctakes.apache.org should trigger a download of the patch
> > version.  Until
> > then you will get a "page not found" error.
> > 
> > Until the pre-built archive downloads are available through the
> > website,
> > you can find them in the release repository.
> > 
> > 
> > https://repository.apache.org/content/repositories/releases/org/apache/ctakes/ctakes-core/4.0.0.1/
> >  
> > 
> > 
> > For more information please visit the wiki page on the Apache
> > cTAKES
> > 4.0.0.1 patch release.
> > 
> > https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0.0.1
> >  
> > 
> > 
> > 
> > A very special thanks goes to Peter Abramowitsch for conception and
> > original implementation of the authentication code and workflow.
> > 
> > 
> > Many thanks to those who boldly tested, documented and otherwise
> > made this
> > patch and its trunk equivalent possible, including
> > 
> > Kean Kaufmann
> > 
> > Gandhi Rajan
> > 
> > Eugenia Monogyiou
> > 
> > Timothy Miller
> > 
> > and anybody else that I have forgotten (apologies).
> > 
> > 
> > And for those of you who gave me a bit of prodding to get this
> > wrapped
> > up and published ... in the end I am grateful and you have done us
> > all a
> > service.
> > 
> > 
> > Cheers,
> > 
> > Sean
> > 
> >

Re: 4.0.0 UMLS Authentication Patch - for Developers - Not a release [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2020-12-08 Thread Miller, Timothy

Those are the ones that I set to the empty string, I don't know how
it's still finding something. I'll poke around.
Tim


On Tue, 2020-12-08 at 17:39 +, Finan, Sean wrote:
> * External Email - Caution *
> 
> 
> In your dictionary configuration xml file?  That would be
> resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml  -
> depending upon what dictionary you are using.
> 
> You will find two sections that look like this:
> 
>  https://urldefense.proofpoint.com/v2/url?u=https-3A__uts-2Dws.nlm.nih.gov_restful_isValidUMLSUser-2522_=DwIFAw=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU=Heup-IbsIg9Q1TPOylpP9FE4GTK-OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h=uRA81eRtCuJYVkMEzd47jQTacPEI0XTrHeDpgKY_Ma0=9SE2vJimnmdqHHlSYjb0EtK6QJ0DDzB7O7PBZQ6ayJI=
> >
>  
>  
>  
> 
> 
> 
> ____
> From: Miller, Timothy 
> Sent: Tuesday, December 8, 2020 12:18 PM
> To: dev@ctakes.apache.org
> Subject: Re: 4.0.0 UMLS Authentication Patch - for Developers - Not a
> release [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]
> [SUSPICIOUS]
> 
> * External Email - Caution *
> 
> 
> I also forgot to follow some of the instructions for setting umls url
> and other fields in the descriptor to empty string. But now I get:
> 
> 08 Dec 2020 12:15:15  WARN UmlsUserApprover - Using alternate umlsURL
> found via: properties
> 08 Dec 2020 12:15:15  INFO UmlsUserApprover - Checking UMLS Account
> at
> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser:
> 
> 
> where is it finding this other umlsURL???
> 
> Tim
> 
> 
> On Tue, 2020-12-08 at 17:06 +, Finan, Sean wrote:
> > * External Email - Caution *
> > 
> > 
> > Hi Tim, Peter,
> > 
> > Just in case Peter can't get back to you right away,
> > 
> > > I'm actually setting this via my VM options as:
> > -Dctakes_umlspw=
> > 
> > I think that you want to use
> > -Dctakes.umls_apikey
> > 
> > On some systems/shells the dot doesn't work.  ctakes will also
> > accept
> > (dot to underscore)
> > -Dctakes_umls_apikey
> > 
> > 
> > I think that is what I used ...
> > 
> > 
> > From: Miller, Timothy 
> > Sent: Tuesday, December 8, 2020 11:52 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: 4.0.0 UMLS Authentication Patch - for Developers - Not
> > a
> > release [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
> > 
> > * External Email - Caution *
> > 
> > 
> > Hi Peter,
> > Sorry to leave you in the lurch so long, I've been trying it this
> > morning and running into some issues (I don't think patch-related
> > issues but just trouble getting to where I can test it).
> > 
> > So far:
> > 1) Had to add dictionary-fast to the dictionary pom to get it to
> > compile
> > 2) had to remove all ytex modules from main pom to get it to
> > compile
> > 3) org/apache/ctakes/dictionary/lookup2/util/UmlsUserTester.java
> > looks
> > to have some old code and doesn't compile (bypassed by changing to
> > "no
> > error check"
> > 
> > Then I had to remember what classes in 4.0.0 I could try to check
> > it,
> > and settled on
> > org/apache/ctakes/clinicalpipeline/ClinicalPipelineFactory.java
> > 
> > with no credentials, I get a UMLS authentication error (as
> > expected).
> > 
> > with my old credentials, I get these errors:
> > 08 Dec 2020 11:43:21 ERROR UmlsUserApprover - The user property
> > must
> > now be set to 'umls_api_key'
> > 08 Dec 2020 11:43:21 ERROR UmlsUserApprover -  Verify that you are
> > setting command-line option --user, or ctakes property umlsUser, or
> > environment variable umlsUser properly.
> > 08 Dec 2020 11:43:21 ERROR UmlsDictionaryLookupAnnotator - Error:
> > Invalid UMLS License.  A UMLS License is required to use the UMLS
> > dictionary lookup.
> > 
> > (also seems about right).
> > 
> > If i set ctakes_umlsuser=umls_api_key and ctakes_umlspw= > api
> > key>, I'm still getting an error:
> > 
> > 08 Dec 2020 11:49:05 ERROR UmlsUserApprover -   UMLS Account at
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__uts-2Dws.nlm.nih.gov_restful_isValidUMLSUser=DwIGaQ=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao=1zoXt5XWc

Re: 4.0.0 UMLS Authentication Patch - for Developers - Not a release [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2020-12-08 Thread Miller, Timothy

I also forgot to follow some of the instructions for setting umls url
and other fields in the descriptor to empty string. But now I get:

08 Dec 2020 12:15:15  WARN UmlsUserApprover - Using alternate umlsURL
found via: properties
08 Dec 2020 12:15:15  INFO UmlsUserApprover - Checking UMLS Account at 
https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser:


where is it finding this other umlsURL???

Tim


On Tue, 2020-12-08 at 17:06 +, Finan, Sean wrote:
> Hi Tim, Peter,
> 
> Just in case Peter can't get back to you right away,
> 
> > I'm actually setting this via my VM options as:
> -Dctakes_umlspw=
> 
> I think that you want to use
> -Dctakes.umls_apikey
> 
> On some systems/shells the dot doesn't work.  ctakes will also accept
> (dot to underscore)
> -Dctakes_umls_apikey
> 
> 
> I think that is what I used ...
> 
> 
> From: Miller, Timothy 
> Sent: Tuesday, December 8, 2020 11:52 AM
> To: dev@ctakes.apache.org
> Subject: Re: 4.0.0 UMLS Authentication Patch - for Developers - Not a
> release [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
> 
> Hi Peter,
> Sorry to leave you in the lurch so long, I've been trying it this
> morning and running into some issues (I don't think patch-related
> issues but just trouble getting to where I can test it).
> 
> So far:
> 1) Had to add dictionary-fast to the dictionary pom to get it to
> compile
> 2) had to remove all ytex modules from main pom to get it to compile
> 3) org/apache/ctakes/dictionary/lookup2/util/UmlsUserTester.java
> looks
> to have some old code and doesn't compile (bypassed by changing to
> "no
> error check"
> 
> Then I had to remember what classes in 4.0.0 I could try to check it,
> and settled on
> org/apache/ctakes/clinicalpipeline/ClinicalPipelineFactory.java
> 
> with no credentials, I get a UMLS authentication error (as expected).
> 
> with my old credentials, I get these errors:
> 08 Dec 2020 11:43:21 ERROR UmlsUserApprover - The user property must
> now be set to 'umls_api_key'
> 08 Dec 2020 11:43:21 ERROR UmlsUserApprover -  Verify that you are
> setting command-line option --user, or ctakes property umlsUser, or
> environment variable umlsUser properly.
> 08 Dec 2020 11:43:21 ERROR UmlsDictionaryLookupAnnotator - Error:
> Invalid UMLS License.  A UMLS License is required to use the UMLS
> dictionary lookup.
> 
> (also seems about right).
> 
> If I set ctakes_umlsuser=umls_api_key and ctakes_umlspw=<api key>, I'm still getting an error:
> 
> 08 Dec 2020 11:49:05 ERROR UmlsUserApprover -   UMLS Account at
> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not valid.
> 08 Dec 2020 11:49:05 ERROR UmlsUserApprover -  Verify that you are
> setting command-line option --user, or ctakes property umlsUser, or
> environment variable umlsUser properly.
> 08 Dec 2020 11:49:05 ERROR UmlsUserApprover -  Verify that you are
> setting command-line option --pass, or ctakes property umlsPass, or
> environment variable umlsPass properly.
> 
> 
> I'm actually setting this via my VM options as:
> -Dctakes_umlspw=
> 
> should I be doing something else?
> 
> Thanks
> Tim
> 
> 
> On Tue, 2020-12-08 at 12:21 +0100, Peter Abramowitsch wrote:
> > Attn Tim Miller
> > =
> > Hi Tim,
> > 
> > Were you able to test out the 4.0.0 umls authentication patch? It
> > would be good to know if it and its instructions can be dropped in
> > without much further work.
> > 
> > Peter
> > 
> > On Tue, Dec 1, 2020 at 3:34 PM Miller, Timothy <
> > timothy.mil...@childrens.harvard.edu> wrote:
> > 
> > > Peter, I saw the readme attachment, but it sounded from your
> > > email
> > > like
> > > there was a patch attachment too that I didn't see. Did that not
> > > come
> > > through?
> > > Tim
> > > 
> > > On Fri, 2020-11-27 at 18:19 +, Finan, Sean wrote:
> > > > Thanks Peter,
> > > > 
> > > > 
> > > > Happy Thanksgiving all
> > > > 
> > > > 
> > > > 
> > > > From: Peter Abramowit

Re: 4.0.0 UMLS Authentication Patch - for Developers - Not a release [EXTERNAL] [SUSPICIOUS]

2020-12-08 Thread Miller, Timothy

Hi Peter,
Sorry to leave you in the lurch so long, I've been trying it this
morning and running into some issues (I don't think patch-related
issues but just trouble getting to where I can test it).

So far:
1) Had to add dictionary-fast to the dictionary pom to get it to
compile
2) had to remove all ytex modules from main pom to get it to compile
3) org/apache/ctakes/dictionary/lookup2/util/UmlsUserTester.java looks
to have some old code and doesn't compile (bypassed by changing to "no
error check").

Then I had to remember what classes in 4.0.0 I could try to check it,
and settled on 
org/apache/ctakes/clinicalpipeline/ClinicalPipelineFactory.java

with no credentials, I get a UMLS authentication error (as expected).

with my old credentials, I get these errors:
08 Dec 2020 11:43:21 ERROR UmlsUserApprover - The user property must
now be set to 'umls_api_key' 
08 Dec 2020 11:43:21 ERROR UmlsUserApprover -  Verify that you are
setting command-line option --user, or ctakes property umlsUser, or
environment variable umlsUser properly.
08 Dec 2020 11:43:21 ERROR UmlsDictionaryLookupAnnotator - Error:
Invalid UMLS License.  A UMLS License is required to use the UMLS
dictionary lookup. 

(also seems about right).

If I set ctakes_umlsuser=umls_api_key and ctakes_umlspw=<api key>, I'm still getting an error:

08 Dec 2020 11:49:05 ERROR UmlsUserApprover -   UMLS Account at 
https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not valid.
08 Dec 2020 11:49:05 ERROR UmlsUserApprover -  Verify that you are
setting command-line option --user, or ctakes property umlsUser, or
environment variable umlsUser properly.
08 Dec 2020 11:49:05 ERROR UmlsUserApprover -  Verify that you are
setting command-line option --pass, or ctakes property umlsPass, or
environment variable umlsPass properly.

I'm actually setting this via my VM options as:
-Dctakes_umlspw=

should I be doing something else?

Thanks
Tim
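
A minimal sketch of the mechanics, for anyone following along: a -D VM
option becomes a JVM system property, which code can read with
System.getProperty. The names ctakes.umls_apikey and ctakes_umls_apikey
come from Sean's reply in this thread. The PropertyLookup class and its
"dotted name first, then underscores" order are hypothetical illustrations,
not the actual cTAKES lookup code.

```java
import java.util.Optional;

/** Hypothetical helper for illustration only; not part of cTAKES. */
final class PropertyLookup {

    private PropertyLookup() {
    }

    /**
     * Reads a JVM system property by its dotted name and, if that is
     * missing or empty, by the same name with dots replaced by underscores.
     * The fallback order is an assumption made for this sketch.
     */
    static Optional<String> find(String dottedName) {
        String value = System.getProperty(dottedName);
        if (value == null || value.isEmpty()) {
            value = System.getProperty(dottedName.replace('.', '_'));
        }
        return (value == null || value.isEmpty())
                ? Optional.empty()
                : Optional.of(value);
    }

    public static void main(String[] args) {
        // Run with, for example: java -Dctakes_umls_apikey=YOUR_KEY PropertyLookup
        String name = "ctakes.umls_apikey";
        // Only report presence, so the key itself never lands in a log.
        System.out.println(find(name).isPresent()
                ? name + " is set"
                : name + " is NOT set");
    }
}
```

Running a check like this with the same VM options as the pipeline is a
quick way to confirm that the option reaches the JVM at all before looking
at the cTAKES side.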

On Tue, 2020-12-08 at 12:21 +0100, Peter Abramowitsch wrote:
> Attn Tim Miller
> =
> Hi Tim,
> 
> Were you able to test out the 4.0.0 umls authentication patch? It
> would be good to know if it and its instructions can be dropped in
> without much further work.
> 
> Peter
> 
> On Tue, Dec 1, 2020 at 3:34 PM Miller, Timothy <
> timothy.mil...@childrens.harvard.edu> wrote:
> 
> > Peter, I saw the readme attachment, but it sounded from your email
> > like
> > there was a patch attachment too that I didn't see. Did that not
> > come
> > through?
> > Tim
> > 
> > On Fri, 2020-11-27 at 18:19 +, Finan, Sean wrote:
> > > Thanks Peter,
> > > 
> > > 
> > > Happy Thanksgiving all
> > > 
> > > 
> > > 
> > > From: Peter Abramowitsch 
> > > Sent: Friday, November 27, 2020 11:47 AM
> > > To: dev@ctakes.apache.org
> > > Subject: 4.0.0 UMLS Authentication Patch - for Developers - Not a
> > > release [EXTERNAL]
> > > 
> > > Hi Sean
> > > 
> > > Given that you're still deciding about the tagging or branching for
> > > the 4.0.0 back-patch, I won't check the changes in, but they are
> > > attached here. They need to be unloaded at the top of the source
> > > tree.
> > > 
> > > Gandhi:  I've attached a slightly modified version of the
> > > instructions for your Wiki updates.
> > > If anyone wants the two unofficial 4.0.0 jars for testing, I
> > > would be
> > > happy to put them in dropbox
> > > 
> > > Regards & Happy Thanksgiving
> > > Peter
> > >

Re: 4.0.0 UMLS Authentication Patch - for Developers - Not a release [EXTERNAL] [SUSPICIOUS]

2020-12-01 Thread Miller, Timothy

Peter, I saw the readme attachment, but it sounded from your email like
there was a patch attachment too that I didn't see. Did that not come
through?
Tim

On Fri, 2020-11-27 at 18:19 +, Finan, Sean wrote:
> Thanks Peter,
> 
> 
> Happy Thanksgiving all
> 
> 
> 
> From: Peter Abramowitsch 
> Sent: Friday, November 27, 2020 11:47 AM
> To: dev@ctakes.apache.org
> Subject: 4.0.0 UMLS Authentication Patch - for Developers - Not a
> release [EXTERNAL]
> 
> Hi Sean
> 
> Given that you're still deciding about the tagging or branching for
> the 4.0.0 back-patch, I won't check the changes in, but they are
> attached here. They need to be unloaded at the top of the source
> tree.
> 
> Gandhi:  I've attached a slightly modified version of the
> instructions for your Wiki updates.
> If anyone wants the two unofficial 4.0.0 jars for testing, I would be
> happy to put them in dropbox
> 
> Regards & Happy Thanksgiving
> Peter
>

Re: Changes to UTS Authentication for Authorized Content Distributors [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2020-11-25 Thread Miller, Timothy

That link doesn't say anything there about incremental update releases,
but even with the normal process I think we can get 4.0.1 out faster
than usual because it is such a small change and there are unlikely to
be multiple RCs to get one that works for everyone.
Does anyone want to volunteer to be release manager? It needs to be
someone on the PMC, so Sean, myself, Gandhi, or Chen probably.

Tim


On Tue, 2020-11-24 at 18:10 +, Finan, Sean wrote:
> > > I haven't looked into whether or not Apache svn servers have a
> > > locking mechanism ...
> > I think it's worth checking -- if we're allowed to just branch off
> > the
> > 4.0.0 tag we can get a 4.0.1 distribution that just has this one
> > change, and we could have it built and uploaded quickly so we're
> > ready
> > for the UMLS change. How would we find out?
> 
> A 4.0.1 made directly from 4.0 with only the authentication update is
> probably the way to go.
> I suppose that for people with dependencies, downloads etc. fixed at
> 4.0 would have to get their new umls key and change their ctakes
> config anyway, so telling them to update any coded version numbers
> doesn't involve too much extra effort.
> 
> The main apache org how to release documentation is at
> https://infra.apache.org/release-publishing.html
>  
> I am not sure of anything specifically regarding patches.
> I don't know if we need to go through the full process for a point
> release ...
> 
> 
> 
> From: Miller, Timothy 
> Sent: Tuesday, November 24, 2020 12:45 PM
> To: dev@ctakes.apache.org
> Subject: Re: Changes to UTS Authentication for Authorized Content
> Distributors [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]
> 
> On Tue, 2020-11-24 at 16:29 +, Finan, Sean wrote:
> > Hi Tim and all,
> > 
> > Peter kindly checked this into trunk last week.
> > I tested that version and it seemed to work.
> > 
> > Another question might be "how do we get this into the/a release?"
> > 
> > I haven't looked into whether or not Apache svn servers have a
> > locking mechanism on release branches, but if not I think that a
> > patch of 4.0 using the version that you and Greg tested should be a
> > simple checkin.
> 
> I think it's worth checking -- if we're allowed to just branch off
> the
> 4.0.0 tag we can get a 4.0.1 distribution that just has this one
> change, and we could have it built and uploaded quickly so we're
> ready
> for the UMLS change. How would we find out?
> 
> Tim
> 
> > I am sure that everybody is tired of hearing me say this, but I
> > would
> > like to get out a version 5 asap and disclaim that it is required
> > for
> > the new umls authentication.  That would make patching v4 a non-
> > issue.
> > 
> > Regardless of repository inclusion, the documentation (also written
> > by Peter) needs to get to the ctakes wiki  - and probably the main
> > ctakes web site.  On that note, the web site needs to be redone
> > asap.
> > 
> > Anyway, cheers to Peter for taking upon himself this update!
> > We do still have a few things left to do.
> > Volunteers?
> > 
> > Sean
> > 
> > 
> > From: Miller, Timothy 
> > Sent: Tuesday, November 24, 2020 11:07 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: Changes to UTS Authentication for Authorized Content
> > Distributors [EXTERNAL] [SUSPICIOUS]
> > 
> > Peter,
> > I was able to try your changes and get this new authentication
> > mechanism to work in the default pipeline. Peter, Sean, et al, what
> > are
> > the next steps for getting this in to trunk? If you're not
> > comfortable
> > checking in directly maybe you can share the patch for review.
> > Tim
> > 
> > On Sun, 2020-11-15 at 20:54 +0100, Peter Abramowitsch wrote:
> > > Hi Greg
> > > 
> > > I've got the modifications finished for the new UMLS
> > > authentication
> > > method
> > > using API keys.  If you're game, I'd like you to be next to test
> > > it.
> > >

Re: Changes to UTS Authentication for Authorized Content Distributors [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2020-11-24 Thread Miller, Timothy

There's no doubt, maybe even 5.0.0 could be justified, but the hope (at
least my hope!) was that if we could get out a 4.0.0.0.0.1 release
with just this change, it would satisfy anyone who just wants to make
sure their setup still works when the NLM switches off the REST server.
Tim


On Tue, 2020-11-24 at 22:19 +0100, Peter Abramowitsch wrote:
> Sounds reasonable, but just a thought:  Are the changes in trunk
> sufficient to warrant a new major release?   Are there major
> structural or compatibility issues between 4.0 and trunk?  - it
> doesn’t strike me that there are. How about 4.0.0 going to 4.0.1
> and trunk becoming 4.1.0-SNAPSHOT?   I.e. a new feature release...
> when it comes.
> 
> Peter
> 
> Sent from my iPad
> 
> > On Nov 24, 2020, at 22:10, Finan, Sean <
> > sean.fi...@childrens.harvard.edu> wrote:
> > 
> > Webapp email is killing me ... that email was sent prematurely.
> > 
> > > ctakes-4.0.0-rc3   to   ctakes-4.0.1
> > 
> > I think that is certainly one way to do it.
> > 
> > One could checkout the branch
> > https://svn.apache.org/repos/asf/ctakes/branches/ctakes-4.0.0/
> >  
> > and make the changes to that code.
> > 
> > Would the method be:
> > 1.  Checkout 4.0.0 branch
> > 2.  Apply the patch
> > 3.  Continue with the full release process, checkin and tag 4.0.1 ?
> > 4.  Keep working on trunk for the next release
> > 5.  Change the version numbers in trunk to ctakes-5.0.0-SNAPSHOT
> > ^- this would force all external projects using trunk to update
> > their dependency version.
> > 
> > Let us keep this rolling,
> > Sean
> > 
> > From: Finan, Sean 
> > Sent: Tuesday, November 24, 2020 4:02 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: Changes to UTS Authentication for Authorized Content
> > Distributors [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]
> > [SUSPICIOUS]
> > 
> > > ctakes-4.0.0-rc3   to   ctakes-4.0.1
> > 
> > I think that is certainly one way to do it.
> > 
> > One could checkout the branch
> > 
> > Would the method be:
> > 1.  Checkout 4.0.0-rc3
> > 2.  Apply the patch
> > 3.  Continue with the full release process, checkin and tag?
> > 4.  Keep working on trunk for the next release
> > 
> > 
> > 
> > From: Miller, Timothy 
> > Sent: Tuesday, November 24, 2020 2:22 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: Changes to UTS Authentication for Authorized Content
> > Distributors [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]
> > 
> > Specifically, is the way to go to branch from the tag at:
> > https://svn.apache.org/repos/asf/ctakes/tags/ctakes-4.0.0-rc3
> > 
> > (the latest release candidate before release I believe)
> > into
> > https://svn.apache.org/repos/asf/ctakes/tags/ctakes-4.0.1
> > 
> > ?
> > 
> > Tim
> > 
> > > On Tue, 2020-11-24 at 20:14 +0100, Peter Abramowitsch wrote:
> > > Right, then.
> > > I'll get that done.
> > > 
> > > Peter
> > > 
> > > On Tue, Nov 24, 2020 at 7:53 PM Finan, Sean <
> > > sean.fi...@childrens.harvard.edu> wrote:
> > > 
> > > > I think so.  Whether we can 'release' it or not, branching code
> > > > from the
> > > > 4.0 release is probably a first step.
> > > > 
> > > > From: Peter Abramowitsch 
> > > > Sent: Tuesday, November 24, 2020 1:23 PM
> > > > To: dev@ctakes.apache.org
> > > > Subject: Re: Changes to UTS Authentication f

Re: Changes to UTS Authentication for Authorized Content Distributors [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2020-11-24 Thread Miller, Timothy

On Tue, 2020-11-24 at 16:29 +, Finan, Sean wrote:
> Hi Tim and all,
> 
> Peter kindly checked this into trunk last week.  
> I tested that version and it seemed to work.
> 
> Another question might be "how do we get this into the/a release?"
> 
> I haven't looked into whether or not Apache svn servers have a
> locking mechanism on release branches, but if not I think that a
> patch of 4.0 using the version that you and Greg tested should be a
> simple checkin.

I think it's worth checking -- if we're allowed to just branch off the
4.0.0 tag we can get a 4.0.1 distribution that just has this one
change, and we could have it built and uploaded quickly so we're ready
for the UMLS change. How would we find out?

Tim

> 
> I am sure that everybody is tired of hearing me say this, but I would
> like to get out a version 5 asap and disclaim that it is required for
> the new umls authentication.  That would make patching v4 a non-
> issue.  
> 
> Regardless of repository inclusion, the documentation (also written
> by Peter) needs to get to the ctakes wiki  - and probably the main
> ctakes web site.  On that note, the web site needs to be redone
> asap.  
> 
> Anyway, cheers to Peter for taking upon himself this update!  
> We do still have a few things left to do.  
> Volunteers?
> 
> Sean
> 
> 
> From: Miller, Timothy 
> Sent: Tuesday, November 24, 2020 11:07 AM
> To: dev@ctakes.apache.org
> Subject: Re: Changes to UTS Authentication for Authorized Content
> Distributors [EXTERNAL] [SUSPICIOUS]
> 
> Peter,
> I was able to try your changes and get this new authentication
> mechanism to work in the default pipeline. Peter, Sean, et al, what
> are
> the next steps for getting this in to trunk? If you're not
> comfortable
> checking in directly maybe you can share the patch for review.
> Tim
> 
> On Sun, 2020-11-15 at 20:54 +0100, Peter Abramowitsch wrote:
> > Hi Greg
> > 
> > I've got the modifications finished for the new UMLS authentication
> > method
> > using API keys.  If you're game, I'd like you to be next to test
> > it.
> > Contact me at pabramowit...@gmail.com and I'll get you a new
> > ctakes-dictionary-lookup-fast.4.0,1,x,jar  and Readme.
> > 
> > If it's smooth for you, I'll talk with Sean about checking it in
> > and
> > what
> > wiki locations need to be updated.
> > 
> > To get your key you'll need to log into UMLS. If you've not been
> > there recently you'll need to go through their profile upgrade
> > process where user details will be rerouted through one of the
> > public authentication mechanisms.
> > Once in, go to your profile section and you'll find the API_KEY.
> > 
> > All of you will need to do this eventually.
> > 
> > Regards
> > Peter
> > 
> > Regards, Peter
> > 
> > On Wed, Nov 11, 2020 at 10:13 PM Greg Silverman <
> > g...@umn.edu.invalid>
> > wrote:
> > 
> > > Hi Peter,
> > > Thanks, that would be great. I like the backwards compatible
> > > method. Our
> > > issue is that we have custom configurations for use in Docker and
> > > Kubernetes with UIMA-AS, so this would be ideal.
> > > 
> > > Greg--
> > > 
> > > 
> > > On Wed, Nov 11, 2020 at 3:07 PM Peter Abramowitsch <
> > > pabramowit...@gmail.com>
> > > wrote:
> > > 
> > > > Hi Greg
> > > > It's actually extremely simple for current UMLS licensees.
> > > > The new API uses an API_KEY instead of user/password. Just log
> > > > in to the UTS site, go to your profile area and check on your key.
> > > > I or someone else will make changes to the cTAKES validator to
> > > > accept this key in lieu of name and password.
> > > > 
> > > > New UMLS users will need a couple of extra steps. They will get
> > > > an identity from one of the authentication providers like
> > > > Login.gov as a part of the UTS registration process. But having
> > > > completed that, they will have a profile page with the API_KEY as
> > > > above.
> > > > 
> > > > 
> > > > 
> > > > On Wed, Nov 11, 2020 at 7:27 PM Greg Silverman <
> > > > g...@um

Re: Changes to UTS Authentication for Authorized Content Distributors [EXTERNAL]

2020-11-24 Thread Miller, Timothy

Peter,
I was able to try your changes and get this new authentication
mechanism to work in the default pipeline. Peter, Sean, et al, what are
the next steps for getting this in to trunk? If you're not comfortable
checking in directly maybe you can share the patch for review.
Tim

On Sun, 2020-11-15 at 20:54 +0100, Peter Abramowitsch wrote:
> Hi Greg
> 
> I've got the modifications finished for the new UMLS authentication
> method
> using API keys.  If you're game, I'd like you to be next to test it.
> Contact me at pabramowit...@gmail.com and I'll get you a new
> ctakes-dictionary-lookup-fast.4.0,1,x,jar  and Readme.
> 
> If it's smooth for you, I'll talk with Sean about checking it in and
> what
> wiki locations need to be updated.
> 
> To get your key you'll need to log into UMLS. If you've not been
> there recently you'll need to go through their profile upgrade
> process where user details will be rerouted through one of the
> public authentication mechanisms.
> Once in, go to your profile section and you'll find the API_KEY.
> 
> All of you will need to do this eventually.
> 
> Regards
> Peter
> 
> Regards, Peter
> 
> On Wed, Nov 11, 2020 at 10:13 PM Greg Silverman 
> wrote:
> 
> > Hi Peter,
> > Thanks, that would be great. I like the backwards compatible
> > method. Our
> > issue is that we have custom configurations for use in Docker and
> > Kubernetes with UIMA-AS, so this would be ideal.
> > 
> > Greg--
> > 
> > 
> > On Wed, Nov 11, 2020 at 3:07 PM Peter Abramowitsch <
> > pabramowit...@gmail.com>
> > wrote:
> > 
> > > Hi Greg
> > > It's actually extremely simple for current UMLS licensees.
> > > The new API uses an API_KEY instead of user/password. Just log
> > > in to the UTS site, go to your profile area and check on your key.
> > > I or someone else will make changes to the cTAKES validator to
> > > accept this key in lieu of name and password.
> > > 
> > > New UMLS users will need a couple of extra steps. They will get
> > > an identity from one of the authentication providers like
> > > Login.gov as a part of the UTS registration process. But having
> > > completed that, they will have a profile page with the API_KEY as
> > > above.
> > > 
> > > 
> > > 
> > > On Wed, Nov 11, 2020 at 7:27 PM Greg Silverman <
> > > g...@umn.edu.invalid>
> > > wrote:
> > > 
> > > > For example, the user installation guide has not been updated to
> > > > reflect the changes NLM is implementing. The impact for our
> > > > workflow is pretty significant, so without a clear picture about
> > > > what we need to do in order to not have any down time is - to put
> > > > it mildly - leaving us in the dark.
> > > > Greg--
> > > > 
> > > > On Tue, Nov 10, 2020 at 9:18 AM Greg Silverman 
> > > > wrote:
> > > > 
> > > > > It's still unclear what this means for me as a user of a piece
> > > > > of software that uses UTS for authentication purposes. Could
> > > > > someone please, in plain language, describe what we as normal
> > > > > users who use software reliant on this authentication mechanism
> > > > > will have to do in order to not disrupt any running workflows?
> > > > > 
> > > > > Thanks!
> > > > > 
> > > > > Greg--
> > > > > 
> > > > > 
> > > > > On Mon, Nov 9, 2020 at 7:13 AM McLaughlin, Patrick (NIH/NLM)
> > > > > [E]
> > > > >  wrote:
> > > > > 
> > > > > > Hello,
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > The UMLS Terminology Services (UTS) is moving from a
> > > > > > username/password login to an NIH-federal identity provider
> > > > > > system on Monday, November 9. UMLS users will begin migrating
> > > > > > their accounts to the new system on this date with a migration
> > > > > > deadline of January 15, 2021.
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > You will need to update any systems that use the UMLS user
> > > > > > validation API
> > > > > > <https://uts.nlm.nih.gov/help/license/validateumlsuserhelp.html>,
> > > > > > as described in my previous emails. We recommend you implement
> > > > > > the new workflow as soon as possible after November 9.
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Attached are instructions for implementing UMLS user
> > > > > > validation with the new system. You MUST supply NLM with the
> > > > > > domains (e.g.,
> > > > > >

Re: cTAKES data flow [EXTERNAL]

2020-10-13 Thread Miller, Timothy

With the default pipelines, the only information that leaves your
computer is your UMLS credentials, which are used to verify that you
are a registered/current UMLS user.
Tim


On Tue, 2020-10-13 at 15:37 +0530, moinuddeen smrk wrote:
> Hi Team,
> I am one of many users of cTAKES. I work with sensitive clinical trial
> data. I wanted to know about the data flow that cTAKES has. Following
> are my questions:
> 
> 1. Does cTAKES send any information (the text in the files) outside
> my workspace/computer?
> 2. Does cTAKES store any information passed to it outside of my
> computer?
> 
> Please do let me know the answers for this as soon as you can.
> 
> Thanks!
> 
> Regards,
> Riyaz

Re: I think I found a bug. [EXTERNAL]

2020-08-31 Thread Miller, Timothy

Peter,
I think the email server doesn't let images through. Can you post an
imgur link maybe?
Tim

On Sun, 2020-08-30 at 14:35 -0700, Peter Abramowitsch wrote:
> Hi,
> I was getting a StringIndexOutOfBoundsException in
> DependencyUtil.doesSubsume(annot1, annot2)  with exactly this
> situation:
> 
> negex annotator
> the text begins  "negative for "
> 
> If the chunk negative for xyz is preceded by anything else, even a
> space, the problem goes away.  It also goes away when you choose
> another style of negation.   "no headache", for instance
> 
> I've traced the problem back to some illegal entries in the jCAS. You
> can see from the image below that the ContextAnnotation's begin
> offset is illegal.  
> 
> Clearly there's an off-by-one error and this triggered the exception
> because in my example, the Annotation is created right from the 0th
> char of my note text.  But it occurred to me that in every other
> case, where the annotation doesn't begin on the first character and
> it doesn't throw an exception, it might cause  downstream methods
> like doesSubsume to give the wrong result because the begin/end
> offsets are wrong.
> 
> I'm not sure how to follow this up.  But if anyone wants to tackle
> it?
> 
> This is from HistoryAttributeClassifier beginning at line 274
> 
> 
> 
> 
>

Re: Sentence detector changes [EXTERNAL] [SUSPICIOUS]

2020-06-12 Thread Miller, Timothy

Hi Abad,
I've been following the thread but don't have much to add on top of what Sean's 
saying. The BIO version has one major benefit, in that it allows sentences to 
wrap newlines. But it does seem to break on Mr. and Dr. unfortunately. The 
solution is to create more training data but it's hard to get people excited 
about that. The next best solution is along the lines of what Sean suggested, 
to use post-processing to fix mistakes.
Tim
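
To make the post-processing idea concrete, here is a rough sketch, assuming
sentences are plain strings rather than UIMA Sentence annotations.
TitleSentenceJoiner is a hypothetical class written for this illustration.
It is not the MrsDrSentenceJoiner that ships with cTAKES, which (judging by
the snippet Sean quotes further down) also looks at newline positions and
a.m./p.m.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of a post-processing pass; not cTAKES code. */
final class TitleSentenceJoiner {

    // Titles after which a sentence break is assumed to be a mistake.
    private static final String[] TITLES = { "Mr.", "Mrs.", "Dr." };

    private TitleSentenceJoiner() {
    }

    /** True if the text is a bare title or ends with " Mr.", " Mrs." or " Dr.". */
    static boolean endsWithTitle(String text) {
        for (String title : TITLES) {
            if (text.equals(title) || text.endsWith(" " + title)) {
                return true;
            }
        }
        return false;
    }

    /** Merges each sentence that ends with a title into the sentence after it. */
    static List<String> join(List<String> sentences) {
        List<String> joined = new ArrayList<>();
        StringBuilder pending = null;
        for (String sentence : sentences) {
            if (pending == null) {
                pending = new StringBuilder(sentence);
            } else {
                pending.append(' ').append(sentence);
            }
            // Keep accumulating while the text so far still ends with a title.
            if (!endsWithTitle(pending.toString())) {
                joined.add(pending.toString());
                pending = null;
            }
        }
        if (pending != null) {
            // The last sentence ended with a title and has nothing to join to.
            joined.add(pending.toString());
        }
        return joined;
    }
}
```

Given the detector output "She saw Dr." followed by "Smith today.", join
returns the single sentence "She saw Dr. Smith today." and leaves other
sentences untouched.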


From: Finan, Sean 
Sent: Friday, June 12, 2020 1:20 PM
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: Re: Sentence detector changes [EXTERNAL] [SUSPICIOUS]

Hi Abad,

I can't say anything about Timothy Miller's availability.  He is on the ctakes 
dev mailing list so he may respond if he feels it is necessary.  He is quite 
busy with a lot of groundbreaking work, but I wanted to make sure that he got 
credit for the ..BIO annotator.

The piper file would be just as it was before for the Sentence..BIO with the 
classifier specified.
That would be followed by the lines

add EolSentenceFixer
add MrsDrSentenceJoiner
add AbadsNewDigitJoiner

where AbadsNewDigitJoiner is a custom AE using the logic of MrsDr.. that checks 
for digits before and after the dot (eg "5.5") instead of checking for a person 
title before the dot (eg "Mrs.")

Sean

From: abad.ay...@cognizant.com 
Sent: Friday, June 12, 2020 11:50 AM
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: RE: Sentence detector changes [EXTERNAL]

Thank you for that quick response, Sean :). So you mean we can add a new
custom AE using logic similar to MrsDr.. and reference it in the piper file?
In that case, do we need to specify the classifier jar path again, as
"classifierJarPath=/org/apache/ctakes/core/sentdetect/model.jar"?

Also, is Timothy Miller available to help us with the issues in
'SentenceDetectorAnnotatorBIO', where sentences are split on decimals or on
dates separated with '.'? I hope you guys are safe and doing well during this
lockdown. Stay safe :)

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



-Original Message-
From: Finan, Sean 
Sent: Friday, June 12, 2020 9:06 PM
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: Re: Sentence detector changes [EXTERNAL]

Hi Abad,

The expert on SentenceDetectorAnnotatorBIO is Timothy Miller, so he might be 
able to weigh in on some of this.

I haven't noticed Sentence..BIO splitting sentences on decimals, but as an AI 
trained model you never quite know what might happen.

You could easily make something like the MrsDr.. that handles decimal problems.

Basically, a copy of MrsDr.. with the condition around line 62 changed from

 if ( (text.endsWith( " Mr." ) || text.endsWith( " Mrs." ) || text.endsWith( " Dr." )
       || text.endsWith( " a.m." ) || text.endsWith( " p.m." )
       || text.equals( "Mr." ) || text.equals( "Mrs." ) || text.equals( "Dr." ))
      && i < sentenceCount - 1
      && !newlines.contains( sentence.getEnd() ) ) {

to something like

 if ( text.length() > 1
      && text.charAt( text.length()-1 ) == '.'
      && Character.isDigit( text.charAt( text.length()-2 ) )
      && i < sentenceCount - 1
      && !sentences.get( i+1 ).getCoveredText().isEmpty()
      && Character.isDigit( sentences.get( i+1 ).getCoveredText().charAt( 0 ) ) ) {

That if (..) could be cleaned up a little, but that should do it.
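
As a rough sketch, the digit check can be pulled out into a plain method that can be tried without UIMA. The class and method names here (DigitJoinCheck, shouldJoin) are made up for illustration and are not part of cTAKES; the two strings stand in for the covered text of sentence i and sentence i+1.

```java
public class DigitJoinCheck {

    // True when `text` ends in "<digit>." and `nextText` starts with a digit,
    // which suggests the split fell inside a decimal ("5.5") or a dotted date ("01.01.2020").
    static boolean shouldJoin(String text, String nextText) {
        return text.length() > 1
                && text.charAt(text.length() - 1) == '.'
                && Character.isDigit(text.charAt(text.length() - 2))
                && !nextText.isEmpty()
                && Character.isDigit(nextText.charAt(0));
    }

    public static void main(String[] args) {
        System.out.println(shouldJoin("Potassium was 5.", "5 today."));   // true
        System.out.println(shouldJoin("Seen by Dr.", "Smith today."));    // false
        System.out.println(shouldJoin("Admitted on 01.", "01.2020."));    // true
    }
}
```

Note that this rule will also join a real sentence that ends in a number when the next sentence happens to start with a digit, so it may need an extra guard (for example, skipping the join when the break falls on a newline, as MrsDr.. does).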

Sean





From: abad.ay...@cognizant.com 
Sent: Friday, June 12, 2020 11:21 AM
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: RE: Sentence detector changes [EXTERNAL]

Hi Sean,

Thank you for your advice. We tried using 'SentenceDetectorAnnotatorBIO' 
along with the piper file changes you mentioned, and we found that it splits 
sentences based on '.' only. We were actually able to get similar output from 
'SentenceDetectorAnnotator' itself by using '.' as the only eosCandidate in 
the EOSScannerImpl class.

So can 'SentenceDetectorAnnotatorBIO' extract sentences in some other way? 
The problem we face is that 'SentenceDetectorAnnotatorBIO' splits the 
sentence whenever it sees a decimal point, like 5.5, or a date separated 
using '.', like 01.01.2020.

Can the AEs EolSentenceFixer & MrsDrSentenceJoiner resolve these issues, 
where sentences are split on decimals or '.' separated dates? If they can, 
what changes do we need to make in the piper file to incorporate them?

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028


-Original Message-
From: Finan, Sean 
Sent: Thursday, June 11, 2020 9:14 PM
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: Re: Sentence detector

Re: Missing Medication Frequency and Allergy attributes from MedicationMention [EXTERNAL]

2020-06-06 Thread Miller, Timothy

Hi Honey,
I created a module last year for doing some medication attribute extraction, 
but it is not part of core ctakes yet so you would have to integrate it 
yourself. It uses the typesystem and most of the regular ctakes pipeline so it 
shouldn't be that difficult.
Check it out here:
https://github.com/tmills/ctakes-ade

If you want to give it a try and have questions I'll be happy to try to help. 
There is also a ctakes-drugner module that probably does similar things but I 
don't have experience with it myself.

Tim


From: Honey gandhi 
Sent: Saturday, June 6, 2020 2:53 AM
To: dev@ctakes.apache.org
Subject: Re: Missing Medication Frequency and Allergy attributes from 
MedicationMention [EXTERNAL]

Is there any other way to find relationship between medication and its 
dose/route/frequency or between anatomical site and its sign symptoms?

Thanks,
Honey G.

> On 06-Jun-2020, at 12:09 PM, Peter Abramowitsch  
> wrote:
>
> Some granular areas are unfinished in cTakes and in these cases, attributes
> mentioned are just placeholders for functionality that needs to be filled
> in.   I can't speak specifically to Medication Freq/Dose/Route, but much
> work is left to be done and contributed throughout the system.  Bodysite is
> another one of these.  Or conditionality and confidence.  In some cases you
> will never find them populated, or in others you'll find that values can
> only be detected in a small number of contexts.
>
> Unless an army of highly qualified developers and informaticists with free
> time materializes to take it much further, cTakes will always be a work in
> progress.  But many of us have already found it to be highly effective in
> its current form, and some have made private customizations to suit our own
> needs.
>
> Peter
>
> On Fri, Jun 5, 2020 at 2:58 AM Honey gandhi
>  wrote:
>
>> Hi
>>
>> We are exploring ctakes capabilities to use it as our NLP engine to parse
>> clinical data.
>>
>> Though we are able to parse the data at high level. We are not able to get
>> values for medication frequency, duration, allergy and other related
>> specifications.
>> It should have ideally populated values for ‘MedicationFrequency',
>> ‘MedicationAllergy' and other related fields in ‘MedicationMention’
>>
>> I have also tried including the RelationSubPipe.piper file from
>> ctakes-relation-extractor in my Full.piper file in the ctakes-web-rest
>> module. But I don’t see any difference, as I am still not able to figure
>> out the relation between a medication entity and its frequency, dosage, etc.
>>
>> We are relatively new to this. Please advise on how to proceed further.
>>
>>
>> Thanks,
>> Honey G.

Re: how to activate inactive features in cTAKES? [EXTERNAL]

2020-04-30 Thread Miller, Timothy

Akram, the typesystem in ctakes was created by a project with the aim of 
specifying things that are useful, without specifying implementations for them 
all. There are many items in the data model that there are no ctakes modules to 
fill. The idea was that when people bring things online there are placeholders 
for that information, so that new functionality is not added in a completely ad 
hoc way. So of the examples you describe:

- discoveryTechnique is always the same because you are running the same 
pipeline
- confidence is not filled in by the dictionary lookup -- the current method 
used does not generate a confidence score
- disambiguated is not filled but is technically correct because there is no 
disambiguation algorithm running
- polarity, uncertainty, conditional, generic, historyOf, can be filled in by 
certain pipelines. You will have to add them after the DictionarySubPipe to see 
them filled in.

Tim


From: Akram 
Sent: Thursday, April 30, 2020 4:37 AM
To: dev@ctakes.apache.org
Subject: how to activate inactive features in cTAKES? [EXTERNAL]

Hi
I can extract many tags when I use the default .piper in cTakes.
Tags such as LabMention, AnatomicalSiteMention, ProcedureMention, etc. are all 
extracted by applying this piper:

load DefaultTokenizerPipeline

load DictionarySubPipe

writeHtml
writeXmis

The problem is that some features do not change no matter how the text 
changes.
Most importantly, confidence is always 0.
How can I get the confidence of each term?
Other features behave the same way:
discoveryTechnique is always 1

polarity always 0

uncertainty always 0

conditional always false

generic always false

historyOf always 0

score always 0

disambiguated always false

how can I get these features working and where can I find more info about these 
features and what do they mean?
Thanks

Re: ML NER for cTakes [EXTERNAL]

2019-08-20 Thread Miller, Timothy

Yes, this is still true. I know there are different folks working on ML-based 
NER but none of it is in main line cTAKES yet. There is some ML in the 
pre-processing stages, and the outputs of that are used by the dictionary tool, 
but the lookup itself is done without learning.
Tim

-Original Message-
From: Maral Amir
To: dev@ctakes.apache.org
Subject: ML NER for cTakes [EXTERNAL]
Date: Tue, 13 Aug 2019 11:52:40 -0700


Hi,

According to cTakes paper, "the ML NER module is not part of the current
cTAKES release". I was wondering if this is still true and the current
release still uses lookup for NER or we have ML NER for the current version.

Thanks,
Maral

Re: Clinical Processor [EXTERNAL]

2019-08-20 Thread Miller, Timothy

Can you send an error message that is as complete as possible? It is hard to 
tell from the information you've given.
Thanks
Tim


-Original Message-
From: Sébastien Boussard
To: dev@ctakes.apache.org
Subject: Clinical Processor [EXTERNAL]
Date: Thu, 15 Aug 2019 10:28:51 -0700


I'm working on making a clinical processor, and I've been having a lot of
trouble with the JCasTermAnnotator. It's telling me that it's failing to
initialize. It is connecting to umls and validating. I've had this problem
for a while, is there any other java class I could use. I have the
dictionary and I tried to make a custom dictionary.

Thanks,
Sebastien Boussard

Re: unicode issues [EXTERNAL]

2019-07-18 Thread Miller, Timothy

Thanks Remy, that makes sense, but I'm wondering why I get the correct offsets 
in one way of accessing ctakes (the CVD) but the wrong offsets through another 
way (the REST interface)?

I guess for the fake notes I'm fully in favor of saving as plain text/ascii 
files to simplify things. But there are more unicode characters than we can 
write smart rules for and I'd like to make sure unicode strings at least don't 
screw up offsets, even if we don't process them meaningfully. I'm sure we all 
look forward to generation Z doctor's notes that use the thumbs up/down emojis 
for patient prognosis :).

Tim



-Original Message-
From: Remy Sanouillet
To: dev@ctakes.apache.org
Subject: Re: unicode issues [EXTERNAL]
Date: Thu, 18 Jul 2019 13:37:33 -0700

Hi Tim,

What is happening is that your o'clock contains a smart quote (Unicode U+2019), 
which is encoded in UTF-8 as three bytes: 0xE2 0x80 0x99, so you have to take 
those two extra bytes into account when counting offsets. For that particular 
character, it is much easier to just preprocess the text and replace all 
occurrences with the simple apostrophe (ASCII 0x27). The one on your keyboard. 
It won't change any interpretation and it makes life simpler for everyone 
downstream. You will probably want to deal with all extended Unicode 
characters, like emojis, as well; otherwise you will encounter the same offset 
issues.

Rémy Sanouillet
NLP Engineer
re...@foreseemed.com

ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130


On Thu, Jul 18, 2019 at 1:20 PM Miller, Timothy wrote:
I'm having a weird issue with unicode characters in one of the sample notes 
distributed with ctakes. The sentence is:

The right breast and axilla were sterilely prepped and draped in the usual 
standard fashion.  First the right 1 o’clock position 5 cm from the nipple was 
targeted.  Local anesthesia was obtained with 2% xylocaine.  A small skin 
incision was made.  Under ultrasound guidance from a medial approach, 2 passes 
with a 14 gauge biopsy device were performed and sent to pathology.  A clip was 
placed.

The unicode character is the right single quote in "o’clock". If I just put 
it in the CVD everything works fine, e.g. I find the drug "xylocaine" at 
location 203-212 and it's highlighted correctly. However, if I use the REST 
interface and send it using the python requests API, I get back the span 
205:214. If we then grab that span we get the wrong string (offset by 2, so 
something like "locaine. ").

Any thoughts on where things might be going wrong for the REST interface? Does 
anyone more knowledgeable than me know how UIMA and cTAKES (and java for that 
matter) normally handle unicode?

Tim

Re: Accessing the External Resource from the UimaContext without Using XML descriptor [EXTERNAL] [SUSPICIOUS]

2019-06-30 Thread Miller, Timothy

Just wanted to make a general comment about this. I've worked on the spelling 
correction problem a tiny bit and it has all of the difficulties you all 
describe, and I think it is also slow in a kind of unavoidable way because it's 
doing quite a bit of extra work on each word.

I still would like a better solution, but I find myself wondering if there's 
good evidence for spelling correction having a real impact on a problem. I 
would like to see a paper saying, "we corrected all the spelling in this subset 
of Mimic, and it had the following effect on performance:"

phenotyping: X -> Y
NER: X -> Y
adverse event detection: X -> Y

This is a serious amount of work to carry out these experiments, and 
potentially for a result that could be negative and difficult to publish. Even 
if I just do it as a thought experiment I have a hard time convincing myself 
that I'll see large gains in these categories.

Tim

From: Finan, Sean 
Sent: Saturday, June 29, 2019 7:00 PM
To: dev@ctakes.apache.org
Subject: Re: Accessing the External Resource from the UimaContext without Using 
XML descriptor [EXTERNAL] [SUSPICIOUS]

I implemented a quick and dirty soundex a few years ago.  Terrible precision.  
I tried using it as a "catch" for terms that were not netted by the regular 
lookup.   Then I found myself running down that rabbit hole trying to identify 
topics like you (Pete) mention ... which just means that I had turned an 
attempt at solving one nlp problem into an attempt at solving two.   I crawled 
out and haven't looked back.

Sean

From: Peter Abramowitsch 
Sent: Saturday, June 29, 2019 12:02 PM
To: dev@ctakes.apache.org
Subject: Re: Accessing the External Resource from the UimaContext without Using 
XML descriptor [EXTERNAL]

I've been wondering whether Levenshtein Distance or Soundex have any
potential in the cTakes pipeline. For example, if, after failing the
dictionary lookup, one used something like CSpell to find a potential
concept, but then used one of these linguistic similarity methods to
quantify the difference between it and the source over the text range and
turn that into a confidence value, would it help mitigate overfitting?  I
guess the answer would be how often radically different concepts can differ
by a single character.  Another factor as was hinted at above is that
spelling issues in consumer provided text are completely different in
character from that of the rushed clinician, and these may require
completely different solutions.
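
To make the idea concrete, here is a small sketch of turning Levenshtein distance into a score between 0 and 1. The normalization (one minus distance over the longer length) is only one possible choice and is an assumption here, not something cTAKES or CSpell does; the class and method names are illustrative.

```java
public class EditDistance {

    // Classic dynamic-programming Levenshtein distance
    // (insert, delete and substitute each cost 1).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical confidence: 1.0 for identical strings, lower as they diverge.
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / longest;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("aneurysm", "aneurism"));  // 1
        System.out.println(levenshtein("paquetes", "paqeutes"));  // 2
        System.out.println(similarity("paquetes", "paqeutes"));   // 0.75
    }
}
```

Plain Levenshtein counts a transposition such as "ue" to "eu" as two edits; a Damerau-style variant would count it as one, which may suit rushed typing better.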

On Fri, Jun 28, 2019 at 6:34 AM Remy Sanouillet 
wrote:

> Hi Siamak,
>
> I agree with Sean. Spelling correction in NLP is a bit of a tar baby. We
> attempted to integrate CSpell (
> https://lsg3.nlm.nih.gov/Specialist/Summary/cSpell.html ) to improve
> recall.
> Unfortunately we had to take it out because the overfitting affected
> precision and increased ambiguity too much.
>
>Remy
>
> On Fri, Jun 28, 2019 at 5:20 AM Finan, Sean <
> sean.fi...@childrens.harvard.edu> wrote:
>
> > Hi Siamak,
> >
> > The problem of misspelled terms is a big one.  I have read about
> > approaches taken by others for research, but nothing has been implemented
> > for ctakes.
> >
> > The only thing that has been done on my projects is addition to the
> > dictionary of common misspellings for a directed project.  For instance, in
> > a project specifically addressing brain aneurysms I added to the (project)
> > dictionary misspellings like "aneurism", "anurism" and "anurysm".  I didn't
> > worry about misspellings for terms that didn't apply to the project; I
> > didn't bother adding things like "skelatal" for "skeletal" because I didn't
> > really care if that term was missed.
> >
> > Sean
> > 
> > From: Siamak Barzegar 
> > Sent: Friday, June 28, 2019 6:12 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: Accessing the External Resource from the UimaContext without
> > Using XML descriptor [EXTERNAL]
> >
> > Dear Sean,
> >
> > Thank you very much for your help.
> > As you suggested, I use "BsvRareWordDictionary" and create a BSV file for
> > my small lexicon.
> > I am using it on Spanish medical documents. As you know, medical
> > documents have a lot of typos.  I was wondering whether there is any
> > dictionary lookup in cTAKES, or another component from other projects, that
> > can detect these small typos.
> > For example, if we have this entry in the dictionary file:
> > C001|T01|Fumador 2 paq*ue*tes
> >
> > and in the document we have "fumador 2 paq*eu*tes", is there any way to
> > annotate this misspelled term as well?
> >
> > With Best Wishes,
> > Siamak
> >
> >
> >
> > On Tue, 25 Jun 2019 at 18:38,

RE: Convert type system of a component to cTakes typesysem [EXTERNAL]

2019-06-07 Thread Miller, Timothy

I don't have much experience with Heideltime, but I think this would be a great 
addition to ctakes, so if you know Heideltime a bit and you're willing to put 
in the effort I'm happy to help you understand the typesystem. I don't 
know that there's an easy way of 'converting' other than just writing some java 
code in a UIMA analysis engine that converts UIMA types to whatever Heideltime 
reads, makes a call to Heideltime, and then iterates over the types output by 
Heideltime and creates the equivalent UIMA types. If you have some more info 
on what sort of conversion you had in mind let me know.
Tim

-Original Message-
From: Siamak Barzegar  
Sent: Thursday, June 6, 2019 5:59 AM
To: dev@ctakes.apache.org
Subject: Convert type system of a component to cTakes typesysem [EXTERNAL]

Dear All,

I want to integrate the HeidelTime project
(https://github.com/HeidelTime/heideltime)
as a component into cTakes to use it with other components in ctakes (to build 
a pipeline for my task). But the problem is that the two projects (HeidelTime 
and cTakes) have different typesystems.

Is there any way to convert the HeidelTime typesystem to the cTakes one?

PS: It seems Nactem had code for this, but it does not work
(https://github.com/argo-nactem/nactem-type-mapper)


With Best Wishes,
Siamak

Re: Looking for cTakes deployment strategies [EXTERNAL]

2019-01-29 Thread Miller, Timothy

Yousof,
I have seen this with SentenceDetectorAnnotatorBIO.xml annotator, but with the 
one you describe, I thought it had a hard-coded rule to break on newlines and 
split them into sentences. Do you have any log files that you can copy/paste 
the initialization lines so we can verify which sentence segmenter you're 
running?
Tim

From: Joseph Erfani 
Sent: Monday, January 28, 2019 6:19 PM
To: dev@ctakes.apache.org
Subject: Re: Looking for cTakes deployment strategies [EXTERNAL]

Hello everyone,
I have a question regarding the cTakes sentence detector. I am using
the "SentenceDetectorAnnotator.xml"
analysis engine located in the ctakes-core for sentence boundary detection.
It seems that the sentence boundary engine is not able to find the sentence
boundary, when a sentence is finished with a carriage return instead of a
period or several spaces.
e.g. the note
"He is a smoker
He has hypertension"

all the text is considered as one sentence, even though there is a carriage
return after the word 'smoker' (at the end of the first sentence).
Have you encountered a similar problem, or do you have any suggestions for
this?

Thank you
Yousof

On Wed, Jan 16, 2019 at 10:47 AM Anusha Balasubramaniam <
anus...@foreseemed.com> wrote:

> Hello everyone,
>
> I am looking for a strategy to use cTakes to asynchronously process
> thousands of clinical notes by listening to a queue on AWS and maintaining
> a hot process with all the dictionaries loaded in memory. So far I've had
> some success using the REST server wrapper I found here:
> https://github.com/dirkweissenborn/ctakes-server, but it's still a
> synchronous call, which I found hard to scale.
> Are there any other wrappers out there that could be used to enable cTakes
> to listen to a port for input? Can anyone share some strategies they used
> to implement cTakes on AWS to achieve similar requirements?
>
> Thanks and Regards,
> Anusha
>

Re: ctakes-web-rest changes [EXTERNAL]

2019-01-23 Thread Miller, Timothy

I checked in some code to wrap the REST server in a docker container. The good 
news is, it lets you run a ctakes rest server with a pretty simple build 
command that should be system independent! The bad news is, the image is 16 GB, 
and it has a hard time running on a machine with 8 GB. So this is a work in 
progress, but if anyone wants to try it I'd be happy to hear how it works for 
you. It is in ctakes-web-rest/docker.
Just run:
docker build -t ctakes-web-rest .
from that directory, then run:
start_rest.sh
It will take a while for the server to start up because it needs to unpack the 
.war file and initialize all the UIMA modules. If you run:
docker logs 
you will be able to see how much progress it has made.
Once it's started you can navigate in a web browser to 
localhost:8080/ctakes-web-rest and you should see it. Or from a REST client api 
the url will be localhost:8080/ctakes-web-rest/service/analyze
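
For anyone calling the service from Java rather than a browser, a request might be built roughly as below. Only the URL comes from the message above; the POST method, the plain-text body and the UTF-8 content type are assumptions about how the endpoint expects its input, so check them against the ctakes-web-rest source before relying on this. The sketch needs Java 11 or later for java.net.http, and AnalyzeRequest is an illustrative name.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class AnalyzeRequest {

    // URL taken from the message above; everything else in this class is an assumption.
    static final String ANALYZE_URL = "http://localhost:8080/ctakes-web-rest/service/analyze";

    // Build a POST whose body is the note text, encoded explicitly as UTF-8.
    static HttpRequest build(String noteText) {
        return HttpRequest.newBuilder(URI.create(ANALYZE_URL))
                .header("Content-Type", "text/plain; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(noteText, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("Local anesthesia was obtained with 2% xylocaine.");
        System.out.println(request.method() + " " + request.uri());
        // To actually send it (the container must be up):
        // java.net.http.HttpClient.newHttpClient()
        //         .send(request, java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```

Stating the charset explicitly on both the header and the body publisher avoids depending on platform defaults when notes contain non-ASCII characters.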

Thanks
Tim


-Original Message-
From: gandhi rajan
To: dev@ctakes.apache.org
Subject: Re: ctakes-web-rest changes [EXTERNAL]
Date: Sat, 22 Dec 2018 08:40:20 +0530


Thanks Tim. Great work.

On Friday, December 21, 2018, Miller, Timothy
<timothy.mil...@childrens.harvard.edu> wrote:



There is certainly no need to apologize! It's 100x easier for me to change
an existing version that runs than to write it from scratch since I don't
really know REST that well, so thanks for contributing that code. That's
the beauty of open source teams with different expertise!
Tim


From: Gandhi Rajan Natarajan
Sent: Friday, December 21, 2018 3:13 AM
To: dev@ctakes.apache.org
Subject: RE: ctakes-web-rest changes [EXTERNAL]

Hi Tim,

Thanks for taking your time out and checking this. Have left my comments
in the JIRA issue. Sorry that I could not improvise on the REST module
which is more suitable for our business needs due to lack of domain
expertise.

Regards,
Gandhi

-----Original Message-
From: Miller, Timothy
Sent: Friday, December 21, 2018 1:54 AM
To: dev@ctakes.apache.org
Subject: ctakes-web-rest changes

Hello all,
I've been trying out the ctakes-web-rest module for a project that uses
python where I wanted an easy way to send a sentence and get back some CUI
annotations. There was an issue where the returned json map was keyed by
the string of the concept, so it would only return one discovered concept
if more than one had the same string. In the course of fixing that I
noticed the code was writing the CAS to xmi, then manually parsing that
file, rather than just interrogating the JCas object, so I rewrote that as
well to use uimafit. Finally, I commented out the "full" pipeline -- it is
just too resource heavy to try to run 2 independent pipelines in parallel
on the same machine. I think the state of the module right now is suitable
for people who want to try and would make their own changes if they want
different pipelines (i.e., it's not yet shrink-wrapped) so I would prefer
it in a state with a simple pipeline that runs well.
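
The keying problem described above can be shown with plain collections. This is only an illustration of the data-structure issue, not the actual ctakes-web-rest code: the Mention class is hypothetical and the CUI values are placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ConceptGrouping {

    // Hypothetical stand-in for one discovered concept: covered text plus its CUI.
    static final class Mention {
        final String text;
        final String cui;

        Mention(String text, String cui) {
            this.text = text;
            this.cui = cui;
        }
    }

    // Keyed by covered text with a single value: a later mention overwrites an earlier one.
    static Map<String, String> keyedByText(List<Mention> mentions) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Mention m : mentions) {
            out.put(m.text, m.cui);
        }
        return out;
    }

    // Keyed by covered text with a list value: every mention is kept.
    static Map<String, List<String>> groupedByText(List<Mention> mentions) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Mention m : mentions) {
            out.computeIfAbsent(m.text, k -> new ArrayList<>()).add(m.cui);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Mention> mentions = List.of(
                new Mention("cold", "CUI_A"),   // placeholder CUIs, not real UMLS codes
                new Mention("cold", "CUI_B"));
        System.out.println(keyedByText(mentions));    // {cold=CUI_B}
        System.out.println(groupedByText(mentions));  // {cold=[CUI_A, CUI_B]}
    }
}
```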

Please take a look at the following issue with the attached patch and let
me know if there are any obvious problems.
https://issues.apache.org/jira/browse/CTAKES-529

Overall, it's in nice shape and I'm excited to get it into a usable shape,
I think this is a use case that would satisfy a lot of users.

Tim

Re: uima-as examples [EXTERNAL]

2019-01-18 Thread Miller, Timothy

Greg - I've developed a cluster-like architecture that uses Docker-wrapped 
UIMA-AS components on AWS for scalability. It's a work in progress but it might 
be helpful:
https://github.com/tmills/ctakes-docker
Tim


-Original Message-
From: Greg Silverman
To: dev@ctakes.apache.org
Cc: Raymond Finzel, Reed McEwan
Subject: Re: uima-as examples [EXTERNAL]
Date: Fri, 18 Jan 2019 12:23:53 -0600


Thanks Peter,
The architecture for our project
(https://github.com/nlpie/nlp-adapt-kube,
uima-as branch under current development), relies heavily on uima-as to
work in conjunction with ActiveMQ and a home spun multiplexer/collection
processing client to do all the heavy lifting for the nlp-engines we're
using. Currently, CLAMP, and BioMedICUS both support UIMA-AS out-of-the-box
(I'm looking into MetaMap, as I type this).

To the best of my knowledge, the MQ and broker work together (at least in
ActiveMQ).

Given the volume of documents we need to process and the constraint of
being tied to UIMA, UIMA-AS is the easiest option for implementing at
scale, for both speed and fault tolerance.

If anyone has done any work trying to integrate UIMA-AS into cTAKES we
would be very interested in this. Retrofitting a different solution into
our architecture at this time is not feasible.

Thanks very much!

Best!

Greg-



On Thu, Jan 17, 2019 at 10:08 PM Peter Abramowitsch wrote:



I used a completely different approach that allows parallel but not async
processing.  Multiple [analysis engine+cas] pair objects pre-instantiated
into a threadsafe pool running behind a web service interface. We can
fully saturate a single ctakes server process using multiple client
processes talking to that API each working synchronously and arriving at an
overall speed of 10-15 6K notes per second on a single server process.
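
A rough sketch of that kind of pool, using a blocking queue so that a request waits when every engine is busy. This is not the poster's code: EnginePool and withEngine are illustrative names, and the type parameter stands in for whatever [analysis engine + CAS] pair object is used.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

public class EnginePool<E> {

    private final BlockingQueue<E> idle;

    // Engines are built up front by the caller, so no request pays the start-up cost.
    // The list must not be empty (ArrayBlockingQueue needs a capacity of at least 1).
    public EnginePool(List<E> engines) {
        idle = new ArrayBlockingQueue<>(engines.size(), true, engines);
    }

    // Borrow an engine, blocking while all are busy, and always hand it back.
    public <R> R withEngine(Function<E, R> work) {
        E engine;
        try {
            engine = idle.take();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for an engine", ex);
        }
        try {
            return work.apply(engine);
        } finally {
            idle.add(engine);
        }
    }

    public int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        // Strings stand in for the [analysis engine + CAS] pairs.
        EnginePool<String> pool = new EnginePool<>(List.of("engine-1", "engine-2"));
        System.out.println(pool.withEngine(e -> e + " processed note"));
        System.out.println(pool.idleCount());
    }
}
```

Each web service thread calls withEngine synchronously; the pool size then caps how many notes are processed at once in the server process.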

I haven't used AS but it looks as if that middleware could have too many
moving parts for our needs.  They would generate many wakeups and context
switches adding undesired latency as a request makes its way to the
server.   I'm assuming that in AS, the broker and the MQ are separate
processes and not just in-process subsystems to the ctakes server process.
Is that right?

On Thu, Jan 17, 2019 at 4:09 PM Greg Silverman wrote:



Anyone out there developed a pipeline using UIMA-AS, as opposed to the
CPE/CPM file reader?

Thanks in advance!

Greg--

--
Greg M. Silverman
Senior Systems Developer
NLP/IE 

Cardiovascular Informatics 

University of Minnesota
g...@umn.edu

 ›  evaluate-it.org  ‹

Re: Looking for cTakes deployment strategies [EXTERNAL]

2019-01-16 Thread Miller, Timothy

Hi Anusha,
I've been working on a project that hasn't merged with ctakes yet, but has a 
github page:
https://github.com/tmills/ctakes-docker

it is a work in progress and so documentation is not great, but I've used it to 
do exactly what you're asking about -- setup a ctakes cluster on AWS to process 
millions of notes.

See the README for a general introduction and then take a look at the script 
bin/launch_cluster.sh

Tim


-Original Message-
From: Anusha Balasubramaniam
To: dev@ctakes.apache.org
Subject: Looking for cTakes deployment strategies [EXTERNAL]
Date: Wed, 16 Jan 2019 10:40:55 -0800


Hello everyone,

I am looking for a strategy to use cTakes to asynchronously process
thousands of clinical notes by listening to a queue on AWS and maintaining
a hot process with all the dictionaries loaded in memory. So far I've had
some success using the REST server wrapper I found here:
https://github.com/dirkweissenborn/ctakes-server, but it's still a
synchronous call, which I found hard to scale.
Are there any other wrappers out there that could be used to enable cTakes
to listen to a port for input? Can anyone share some strategies they used
to implement cTakes on AWS to achieve similar requirements?

Thanks and Regards,
Anusha

Re: Question about negation [EXTERNAL]

2019-01-16 Thread Miller, Timothy

No, SHARPn was a later project. I'm not sure if there is any overlap in the 
datasets.

There are 2 ways to look at the features, one is to read this paper:
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0112774

and another is to look at the source:
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-assertion/src/main/java/org/apache/ctakes/assertion/medfacts/cleartk/AssertionCleartkAnalysisEngine.java?view=markup

Tim

-Original Message-
From: ouyeyu panyu <ouy...@gmail.com>
To: u...@ctakes.apache.org
Cc: dev@ctakes.apache.org
Subject: Re: Question about negation [EXTERNAL]
Date: Wed, 16 Jan 2019 08:09:06 -0800

Hi Timothy,

Thank you very much for the quick response.

https://pdfs.semanticscholar.org/8f2c/a8b638d216a3e9ec10cd1c21bdaeaa74a229.pdf
says:
The Mayo-derived linguistically annotated corpus (Mayo) was developed in-house 
and consisted of 273 clinical notes (100 650 tokens; 7299 sentences; 61 
consult; 1 discharge summary; 4 educational visit; 4 general medical 
examination; 48 limited exam; 19 multi-system evaluation; 43 miscellaneous; 1 
preoperative medical evaluation; 3 report; 3 specialty evaluation; 5 dismissal 
summary; 73 subsequent visit; 5 therapy; 3 test-oriented miscellaneous).

Is SHARPn based on the aforementioned 273 clinical notes?
Also, is there a way for me to look into the trained SVM model, for example to 
see which features it uses and their weights?

Best,
Yu Pan


On Wed, Jan 16, 2019 at 7:58 AM Miller, Timothy 
<timothy.mil...@childrens.harvard.edu> wrote:
It uses an SVM model. The training data is from a project called SHARPn; it is 
notes from Mayo Clinic with a variety of note types and specialties represented.

As for the example, is it a real example that someone wrote "Deny hepatitis"? 
That sounds more like a command than documentation of a negated concept 
("denies" or "denied" would seem more common?). Even if that is a real example, 
I think it's unusual enough that there are probably not examples of "Deny X" in 
the training data.

Tim


-Original Message-
From: ouyeyu panyu <ouy...@gmail.com>
Reply-to: u...@ctakes.apache.org
To: u...@ctakes.apache.org, dev@ctakes.apache.org
Subject: Question about negation [EXTERNAL]
Date: Wed, 16 Jan 2019 07:51:20 -0800

Hi ctakes dev team,

I have one question, hope someone can help me with it.
For negation, "Denies hepatitis" returns polarity=-1, but "Deny hepatitis" 
returns polarity=1.
It is said cTAKES uses ClearTK’s PolarityCleartkAnalysisEngine for negation, 
which is machine learning based.
It seems this issue is caused by the training data. Is this true? And what is 
the training data and what machine learning algorithm is used? Logistic 
regression, SVM, random forest or something else?
Thanks.

Re: Question about negation [EXTERNAL]

2019-01-16 Thread Miller, Timothy

It uses an SVM model. The training data is from a project called SHARPn; it is 
notes from Mayo Clinic with a variety of note types and specialties represented.

As for the example, is it a real example that someone wrote "Deny hepatitis"? 
That sounds more like a command than documentation of a negated concept 
("denies" or "denied" would seem more common?). Even if that is a real example, 
I think it's unusual enough that there are probably not examples of "Deny X" in 
the training data.

Tim


-Original Message-
From: ouyeyu panyu <ouy...@gmail.com>
To: u...@ctakes.apache.org, dev@ctakes.apache.org
Subject: Question about negation [EXTERNAL]
Date: Wed, 16 Jan 2019 07:51:20 -0800

Hi ctakes dev team,

I have one question, hope someone can help me with it.
For negation, "Denies hepatitis" returns polarity=-1, but "Deny hepatitis" 
returns polarity=1.
It is said cTAKES uses ClearTK’s PolarityCleartkAnalysisEngine for negation, 
which is machine learning based.
It seems this issue is caused by the training data. Is this true? And what is 
the training data and what machine learning algorithm is used? Logistic 
regression, SVM, random forest or something else?
Thanks.

Re: AggregateCdaUmlsprocessor only annotates last section of CDA document [EXTERNAL] [SUSPICIOUS]

2019-01-11 Thread Miller, Timothy

Looks like someone fixed that as part of a different issue:
https://issues.apache.org/jira/browse/CTAKES-500
Tim

-Original Message-
From: "Finan, Sean" <sean.fi...@childrens.harvard.edu>
To: dev@ctakes.apache.org
Subject: Re: AggregateCdaUmlsprocessor only annotates last section of CDA 
document [EXTERNAL] [SUSPICIOUS]
Date: Fri, 11 Jan 2019 16:05:21 +

Hi Sana,

This might be related to

https://issues.apache.org/jira/browse/CTAKES-450?filter=-5&jql=project%20%3D%20CTAKES%20AND%20resolution%20%3D%20Unresolved%20AND%20%22Attachment%20count%22%20%3C%3D%20%222%22%20AND%20%22Attachment%20count%22%20%3E%3D%20%221%22%20order%20by%20priority%20DESC%2Cupdated%20DESC

If anybody has time to test and approve the patch attached to that tar please 
let me know so that it can be checked in.

Thanks,
Sean

From: Sana Riaz <sana.r...@xflowresearch.com>
Sent: Friday, January 11, 2019 5:33 AM
To: dev@ctakes.apache.org
Subject: AggregateCdaUmlsprocessor only annotates last section of CDA document 
[EXTERNAL]

Hi,
I am trying to process CDA documents with AggregateCdaUMLSProcessor.xml
descriptor (clinical-pipeline). The cda document includes sections like
problems, medications, allergies, tests etc. In the plain_view, all these
sections are visible in CVD, but all the annotations extracted by
AggregateCdaUMLSProcessor are only on the last section, i.e. there's no
annotation on the medications or problems.

I've looked into CdaCasInitializer output , and it only passes one segment
(the last one) so all the other annotators only process on that. In
addition to that, every section's id (including last) is assigned null as
[start section id="null"]

[end section id="null"]

Do I have to assign section IDs myself? Any suggestion would be very
helpful.

Warm Regards,

Sana Riaz

SemanticCleanupTermConsumer

2018-12-31 Thread Miller, Timothy

Sean (and team),
I was using PrecisionTermConsumer for my ctakes-web-rest implementation hoping 
to avoid any overlaps at all, but when I saw some overlaps I noticed the 
comment:
PrecisionTermConsumer will only persist only the longest overlapping span of 
any semantic group.

So with this term consumer, "colon cancer" goes from 3 spans (colon, cancer, 
colon cancer) to 2 (colon, colon cancer) since cancer and colon cancer have the 
same semantic group. But if I want it to go to 1 (colon cancer), is that what 
SemanticCleanupTermConsumer does?

Tim
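
The overlap behaviour described above can be sketched in a few lines. This is only an illustration, not the PrecisionTermConsumer source: the Span class, the keepLongest method and the group names ANATOMY and DISORDER are all made up for the example. It assumes that a span is dropped whenever a strictly longer overlapping span exists, optionally restricted to the same semantic group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LongestSpanFilter {
    /** Hypothetical stand-in for an annotation: offsets plus a semantic group name. */
    public static final class Span {
        public final String text;
        public final int begin;
        public final int end;
        public final String group;

        public Span(String text, int begin, int end, String group) {
            this.text = text;
            this.begin = begin;
            this.end = end;
            this.group = group;
        }

        int length() { return end - begin; }

        boolean overlaps(Span other) { return begin < other.end && other.begin < end; }
    }

    /**
     * Drops every span that overlaps a strictly longer span.
     * With sameGroupOnly = true, only spans of the same semantic group compete.
     * Overlapping spans of equal length are all kept.
     */
    public static List<Span> keepLongest(List<Span> spans, boolean sameGroupOnly) {
        List<Span> kept = new ArrayList<>();
        for (Span s : spans) {
            boolean dominated = false;
            for (Span o : spans) {
                if (o == s || !s.overlaps(o)) continue;
                if (sameGroupOnly && !s.group.equals(o.group)) continue;
                if (o.length() > s.length()) { dominated = true; break; }
            }
            if (!dominated) kept.add(s);
        }
        return kept;
    }

    /** The "colon cancer" example from the message; the group names are assumptions. */
    public static List<Span> sample() {
        return Arrays.asList(
            new Span("colon", 0, 5, "ANATOMY"),
            new Span("cancer", 6, 12, "DISORDER"),
            new Span("colon cancer", 0, 12, "DISORDER"));
    }

    public static void main(String[] args) {
        for (Span s : keepLongest(sample(), true)) System.out.println(s.text);
    }
}
```

With sameGroupOnly set to true the sample keeps "colon" and "colon cancer", matching the 3-to-2 reduction described above. Passing false ignores the group and keeps only "colon cancer"; whether SemanticCleanupTermConsumer behaves that way is exactly the open question in this message.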

Re: ctakes-web-rest changes [EXTERNAL]

2018-12-21 Thread Miller, Timothy

There is certainly no need to apologize! It's 100x easier for me to change an 
existing version that runs than to write it from scratch since I don't really 
know REST that well, so thanks for contributing that code. That's the beauty of 
open source teams with different expertise!
Tim

From: Gandhi Rajan Natarajan 
Sent: Friday, December 21, 2018 3:13 AM
To: dev@ctakes.apache.org
Subject: RE: ctakes-web-rest changes [EXTERNAL]

Hi Tim,

Thanks for taking your time out and checking this. Have left my comments in the 
JIRA issue. Sorry that I could not improvise on the REST module which is more 
suitable for our business needs due to lack of domain expertise.

Regards,
Gandhi

-Original Message-
From: Miller, Timothy 
Sent: Friday, December 21, 2018 1:54 AM
To: dev@ctakes.apache.org
Subject: ctakes-web-rest changes

Hello all,
I've been trying out the ctakes-web-rest module for a project that uses python 
where I wanted an easy way to send a sentence and get back some CUI 
annotations. There was an issue where the returned json map was keyed by the 
string of the concept, so it would only return one discovered concept if more 
than one had the same string. In the course of fixing that I noticed the code 
was writing the CAS to xmi, then manually parsing that file, rather than just 
interrogating the JCas object, so I rewrote that as well to use uimafit. 
Finally, I commented out the "full" pipeline -- it is just too resource heavy 
to try to run 2 independent pipelines in parallel on the same machine. I think 
the state of the module right now is suitable for people who want to try and 
would make their own changes if they want different pipelines (i.e., it's not 
yet shrink-wrapped) so I would prefer it in a state with a simple pipeline that 
runs well.

Please take a look at the following issue with the attached patch and let me 
know if there are any obvious problems.
https://issues.apache.org/jira/browse/CTAKES-529

Overall, it's in nice shape and I'm excited to get it into a usable shape, I 
think this is a use case that would satisfy a lot of users.

Tim
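
To illustrate the keying problem mentioned in the message above (a map keyed by the concept string returns only one concept when several share the same string), here is a small sketch. It is not the ctakes-web-rest code: the Concept class, the method names and the CUI values are placeholders invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ConceptKeying {
    /** Hypothetical stand-in for one discovered concept. */
    public static final class Concept {
        public final String text;
        public final String cui;

        public Concept(String text, String cui) {
            this.text = text;
            this.cui = cui;
        }
    }

    /** Keyed by covered text: a later concept with the same text replaces the earlier one. */
    public static Map<String, Concept> keyedByText(List<Concept> concepts) {
        Map<String, Concept> out = new LinkedHashMap<>();
        for (Concept c : concepts) {
            out.put(c.text, c);
        }
        return out;
    }

    /** Grouped by covered text: every concept is kept in a list under its text. */
    public static Map<String, List<Concept>> groupedByText(List<Concept> concepts) {
        Map<String, List<Concept>> out = new LinkedHashMap<>();
        for (Concept c : concepts) {
            out.computeIfAbsent(c.text, k -> new ArrayList<>()).add(c);
        }
        return out;
    }

    /** Two concepts found on the same string; the CUI values are placeholders. */
    public static List<Concept> sample() {
        List<Concept> list = new ArrayList<>();
        list.add(new Concept("cold", "CUI_A"));
        list.add(new Concept("cold", "CUI_B"));
        return list;
    }

    public static void main(String[] args) {
        System.out.println(keyedByText(sample()).size());
        System.out.println(groupedByText(sample()).get("cold").size());
    }
}
```

Grouping into lists is one possible fix; keying by span offsets or returning a flat list would work as well, and the patch on the JIRA issue may have chosen differently.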

This email and any files transmitted with it are confidential and intended 
solely for the use of the individual or entity to whom they are addressed. If 
you are not the named addressee you should not disseminate, distribute or copy 
this e-mail. Please notify the sender or system manager by email immediately if 
you have received this e-mail by mistake and delete this e-mail from your 
system. If you are not the intended recipient you are notified that disclosing, 
copying, distributing or taking any action in reliance on the contents of this 
information is strictly prohibited and against the law.

Re: Recognising Concept and its Value for text without space [EXTERNAL]

2018-11-07 Thread Miller, Timothy

Hi Zakir,
I think the problem here is that the default tokenizer will never split up a 
string like POD10 into ['POD', '10'] since there is no whitespace. The 
dictionary lookup uses tokens as the unit of analysis, so unless something like 
POD10 is in the dictionary database you will not get a hit for POD (which I 
assume is what you wanted). The only solution I can think of is to write your 
own tokenizer class, and swap it for the default tokenizer and re-run your 
pipeline.
Tim
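
As a starting point for such a tokenizer, the splitting rule itself might look like the sketch below. It is plain Java with no cTAKES or UIMA classes; AlphaNumSplitter is a hypothetical name, and wiring the rule into an annotator that creates token annotations is left out. It assumes ASCII letters and digits only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlphaNumSplitter {
    // A run of letters, a run of digits, or any other single non-space character.
    private static final Pattern PIECE =
        Pattern.compile("[A-Za-z]+|[0-9]+|[^A-Za-z0-9\\s]");

    /** Splits on whitespace and additionally at every letter/digit boundary. */
    public static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = PIECE.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("POD10 Pulse120")); // prints [POD, 10, Pulse, 120]
    }
}
```

Note the trade-off: a rule this blunt also splits strings that should stay whole (for example a vitamin name like B12), so a real tokenizer would probably need an exception list.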


-Original Message-
From: Zakir Saifi <zakir.sa...@raxa.com>
To: dev@ctakes.apache.org
Subject: Recognising Concept and its Value for text without space [EXTERNAL]
Date: Thu, 1 Nov 2018 16:38:41 +0530


Hi, Everyone. I want Ctakes to recognise a concept and its value from the text
for those strings in which there is no space between the concept and its value,
e.g. POD10 (Post Operative Day 10), Pulse120. How can I achieve this in
Ctakes?

test

2018-09-14 Thread Miller, Timothy

Please ignore.
Tim

Re: Cannot authenticate license on REST API TRACKING:000308016 [EXTERNAL]

2018-07-19 Thread Miller, Timothy

Are you providing your password via the xml descriptor file or an environment 
variable? The only thing I can think of is that there might be some 
misformatting in the xml, like an extra trailing space/newline in the field 
where one of the username/password goes.
Tim
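
A quick way to test the stray-whitespace theory is to compare the configured value with its trimmed form before it is sent anywhere. The sketch below is generic Java, not cTAKES code; CredentialCheck and hasStrayWhitespace are hypothetical names.

```java
public class CredentialCheck {
    /** True if the value has leading or trailing whitespace, including newlines. */
    public static boolean hasStrayWhitespace(String value) {
        return value != null && !value.equals(value.trim());
    }

    public static void main(String[] args) {
        // A password copied into an XML element often picks up a trailing newline.
        String fromXml = "secret\n";
        System.out.println(hasStrayWhitespace(fromXml));
    }
}
```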

From: Jain, Ritika 
Sent: Thursday, July 19, 2018 7:15 AM
To: dev@ctakes.apache.org
Subject: RE: Cannot authenticate license on REST API TRACKING:000308016 
[EXTERNAL]

Hi Sean

See this reply from UMLS support

That endpoint (documented here:
https://uts.nlm.nih.gov/help/license/validateumlsuserhelp.html)
is not meant for end users, so it will not work with your license code and 
username.

The ctakes CVD uses the same endpoint (also pointed out in the logs I 
shared).

Regards,
Ritika

-Original Message-
From: Finan, Sean 
Sent: Thursday, July 19, 2018 4:39 PM
To: dev@ctakes.apache.org
Subject: Re: Cannot authenticate license on REST API TRACKING:000308016 
[EXTERNAL]

Hi Ritika,

I am glad that adding your proxy information got you one step closer to a 
working configuration.  However, I cannot say why your password isn't being 
properly validated.  If you can reach the umls server and your credentials are 
correct then the umls server should reply positively and ctakes should let the 
pipeline continue.

Does anybody else on the devlist have any ideas?

Sean

From: Jain, Ritika 
Sent: Thursday, July 19, 2018 5:06 AM
To: dev@ctakes.apache.org
Subject: RE: Cannot authenticate license on REST API TRACKING:000308016 
[EXTERNAL]

I can get it working adding proxy parameters in the java command, now I do not 
get the connection timeout, but a different error that the user is not valid. 
If you follow the email chain below, the support person from UMLS says that my 
user is a valid user and the account user to validate the user is not for end 
point users.

Can you help me with this?

14:29:06,054 DEBUG [DataBinder] DataBinder requires binding of required fields []
14:29:06,060 TRACE [TypeConverterDelegate] Converting String to [class java.lang.String] using property editor [org.apache.uima.fit.internal.propertyeditors.GetAsTextStringEditor@39c6fd02]
14:29:06,067 TRACE [TypeConverterDelegate] Converting String to [class java.lang.String] using property editor [org.apache.uima.fit.internal.propertyeditors.GetAsTextStringEditor@39c6fd02]
14:29:06,072 INFO  [Chunker] Chunker model file: org/apache/ctakes/chunker/models/chunker-model.zip
14:29:07,745 DEBUG [DataBinder] DataBinder requires binding of required fields []
14:29:07,746 INFO  [TokenizerAnnotatorPTB] Initializing org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
14:29:07,756 DEBUG [DataBinder] DataBinder requires binding of required fields []
14:29:07,770 INFO  [ContextDependentTokenizerAnnotator] Finite state machines loaded.
14:29:07,778 DEBUG [DataBinder] DataBinder requires binding of required fields []
14:29:07,779 TRACE [TypeConverterDelegate] Converting String to [class java.lang.String] using property editor [org.apache.uima.fit.internal.propertyeditors.GetAsTextStringEditor@725e7dcc]
14:29:07,782 TRACE [TypeConverterDelegate] Converting String to [class java.lang.String] using property editor [org.apache.uima.fit.internal.propertyeditors.GetAsTextStringEditor@725e7dcc]
14:29:07,785 TRACE [TypeConverterDelegate] Converting String to [class java.lang.String] using property editor [org.apache.uima.fit.internal.propertyeditors.GetAsTextStringEditor@725e7dcc]
14:29:07,792 TRACE [TypeConverterDelegate] Converting String to [class java.lang.String] using property editor [org.apache.uima.fit.internal.propertyeditors.GetAsTextStringEditor@725e7dcc]
14:29:07,854 DEBUG [StandardEnvironment] Initializing new StandardEnvironment
14:29:07,857 DEBUG

Re: Parse Medical Research Papers [EXTERNAL]

2018-06-18 Thread Miller, Timothy

To get predicate argument structure the best method is probably to use the SRL 
(Semantic Role Labeling) annotator which is part of the 
ctakes-dependency-parser module. Check in the desc/ directory in that module 
for some sample pipelines to see its dependencies. Once you have that running, 
look for the types:
org.apache.ctakes.typesystem.type.textsem.Predicate
org.apache.ctakes.typesystem.type.textsem.SemanticArgument
org.apache.ctakes.typesystem.type.textsem.SemanticRoleRelation

in the CVD to get a feel for how predicate arguments are represented in the CAS.
If you are not familiar with SRL maybe check out this demo:
http://cogcomp.org/page/demo_view/SRL
and these slides (specifically the propbank, that is the style cTAKES uses):
https://nlp.stanford.edu/kristina/papers/SRL-Tutorial-post-HLT-NAACL-06.pdf

I believe StanfordNLP has a module to do this too, but of course not trained on 
clinical data and not using the augmented set of verb senses that were created 
by the PropBank team for the clinical domain.

Tim



From: Don Flinn 
Sent: Monday, June 18, 2018 5:40 AM
To: dev@ctakes.apache.org
Subject: Parse Medical Research Papers [EXTERNAL]

I want to parse medical research papers and am looking at using Ctakes.  I
do realize that Ctakes is aimed at Clinical Reports, but I would like to see
if I can use it for my purposes.  I'm initially looking to get a tuple of
Subject, Predicate, Object for each sentence and later additional semantic
information.

I modified ClinicalPipelineFactory.java to use  the following portion of a
research report -

"A research team based in Houston has developed a prototype for a
“bionic” heart replacement. Other designs all mimic the beating of
a heart, but due to many moving parts, the mechanical hearts
would quickly wear out. The heart developed by BiVACOR does not
beat, and instead has one moving part which propels the blood
throughout the body. The bionic heart has been safely and
successfully transplanted into animals leading to very promising
results."

I got the following result -
Entity: heart === Polarity: 1 === Uncertain? false === Subject: patient ===
Generic? false === Conditional? false === History? false
Entity: replacement === Polarity: 1 === Uncertain? false === Subject:
patient === Generic? false === Conditional? false === History? false
Entity: mimic === Polarity: 0 === Uncertain? false === Subject: null ===
Generic? false === Conditional? false === History? false
Entity: heart === Polarity: 1 === Uncertain? false === Subject: patient ===
Generic? false === Conditional? false === History? false
Entity: heart === Polarity: 1 === Uncertain? false === Subject: patient ===
Generic? false === Conditional? false === History? false
Entity: heart === Polarity: 1 === Uncertain? false === Subject: patient ===
Generic? false === Conditional? false === History? false

I assume my problem is related to the Snomed database, which is not trained
for what I want.

My questions -
Is my assumption correct?
Should I attempt to modify/extend Snomed?
Is there a better/different way to query Snomed to meet my needs?
Is there an existing database that I could use with Ctakes that would more
meet my needs?
Should I instead use the Stanford Java NLP system or the Apache OpenNLP?
I'll still need a database.

Thank you for any suggestions
Don

Re: issues with line endings [EXTERNAL]

2018-05-07 Thread Miller, Timothy

Yes, there is a setting in git but I think I'm in a bit of a catch-22 with 
git-svn. If I don't do anything, it auto-changes a bunch of files and won't let 
me even pull without checking in those changes. I can modify the .gitattributes 
file to not care about line endings, but then I can't pull because I have the 
modified .gitattributes file! I think my solution is to check out a totally 
clean repo with git-svn, immediately push back the files with corrected (Unix) 
line endings, and then work from that copy.
Tim
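
For the "push back the files with corrected (Unix) line endings" step, the conversion itself is simple enough to sketch. This is an assumption-laden illustration, not a project script: LineEndings is a hypothetical class, and the caller is expected to pass text files only, since rewriting a binary file this way would corrupt it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LineEndings {
    /** Replaces every CRLF pair with a single LF; lone CR or LF characters are left alone. */
    public static String toLf(String content) {
        return content.replace("\r\n", "\n");
    }

    /**
     * Rewrites one text file with LF endings and reports whether it changed.
     * ISO-8859-1 maps every byte to exactly one char, so bytes other than
     * the removed CRs round-trip unchanged whatever the file's real encoding is.
     */
    public static boolean normalize(Path file) throws IOException {
        String original = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
        String fixed = toLf(original);
        if (fixed.equals(original)) {
            return false;
        }
        Files.write(file, fixed.getBytes(StandardCharsets.ISO_8859_1));
        return true;
    }
}
```

In practice a tool such as dos2unix or git's own end-of-line settings would do the same job; the point here is only that the change is mechanical and can be applied to a clean checkout in one pass.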


From: Gandhi Rajan Natarajan <gandhi.natara...@arisglobal.com>
Sent: Saturday, May 5, 2018 3:38 AM
To: dev@ctakes.apache.org
Subject: RE: issues with line endings [EXTERNAL]

Hi Tim,



Though I'm not an expert in git, I guess there is a setting to turn off this 
feature of auto correcting line endings in git-svn.



Just have a look at this link - 
https://dzone.com/articles/git-showing-file-modified-even
and see if it helps.



Regards,

Gandhi





-Original Message-

From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]

Sent: Saturday, May 05, 2018 2:25 AM

To: dev@ctakes.apache.org

Subject: issues with line endings



I'm trying to use git-svn to do ctakes development but it has this weird issue 
where it auto "fixes" line endings (mainly in -ytex*

modules) to be LF from CRLF. So it won't let me pull until I've checked in 
those changes. And because it's automatic I can't clean my local copy (if I try 
they just show up again, it's like trying to strangle a ghost). Anyways, should 
we just do a brute force commit of all files to LF endings?

Tim


issues with line endings

2018-05-04 Thread Miller, Timothy

I'm trying to use git-svn to do ctakes development but it has this
weird issue where it auto "fixes" line endings (mainly in -ytex*
modules) to be LF from CRLF. So it won't let me pull until I've checked
in those changes. And because it's automatic I can't clean my local
copy (if I try they just show up again, it's like trying to strangle a
ghost). Anyways, should we just do a brute force commit of all files to
LF endings?
Tim

Re: SentenceDetector [EXTERNAL]

2018-04-06 Thread Miller, Timothy

The changes were mainly meant to adapt the OpenNLP model to
idiosyncrasies of clinical text, but you're right that they have some
shortcomings.

The newline thing is in the data sources used originally to build the
model, there were frequent cases of headings/sentence fragments by
themselves on a line, and _no_ cases of mid-sentence newlines. That,
combined with the fact that OpenNLP's train file format (at the time)
itself used newlines as a separator, led to the creation of that simple
rule rather than trying to retrain with newline as a candidate sentence
splitter. I created a different training file format and annotator that
does what you suggest, and built an alternative sentence splitter
model, here:
org/apache/ctakes/core/ae/SentenceDetectorAnnotatorBIO.java

it operates at the character level and splits a document into
sentences. For some people it works better. For data where there are
potentially mid-sentence newlines (like MIMIC), it is probably the only
model with usable results. Its typical failure mode is to lump two
sentences together, while the default annotator does the opposite.

Tim
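
To make "operates at the character level" concrete, here is a sketch of how per-character labels could be turned back into sentences. It assumes one B/I/O tag per character (B starts a sentence, I continues it, O is outside any sentence), which the annotator's name suggests but which should be checked against SentenceDetectorAnnotatorBIO itself. BioSentenceDecoder and its methods are hypothetical, and the classifier that would predict the tags is not shown.

```java
import java.util.ArrayList;
import java.util.List;

public class BioSentenceDecoder {
    /** Turns one B/I/O tag per character into a list of sentence strings. */
    public static List<String> decode(String text, String tags) {
        if (text.length() != tags.length()) {
            throw new IllegalArgumentException("need exactly one tag per character");
        }
        List<String> sentences = new ArrayList<>();
        int start = -1; // start offset of the open sentence, or -1 if none is open
        for (int i = 0; i < text.length(); i++) {
            char tag = tags.charAt(i);
            if (tag == 'B') {
                if (start >= 0) sentences.add(text.substring(start, i));
                start = i;
            } else if (tag == 'O' && start >= 0) {
                sentences.add(text.substring(start, i));
                start = -1;
            }
            // 'I' keeps the current sentence open, even across a newline.
        }
        if (start >= 0) sentences.add(text.substring(start));
        return sentences;
    }

    /** Helper that builds a run of n identical tags. */
    public static String run(char tag, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append(tag);
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Pain in\nleft leg. No fever.";
        // The newline at offset 7 is tagged I, so it does not end the sentence.
        String tags = "B" + run('I', 16) + "O" + "B" + run('I', 8);
        System.out.println(decode(text, tags).size()); // prints 2
    }
}
```

Because a newline is just another character that can be tagged I, a mid-sentence line break does not force a split, which is the difference from the default annotator's newline rule described above.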

On Fri, 2018-04-06 at 02:11 +, Ewan Mellor wrote:
> I'm looking at SentenceDetector from ctakes-core.  It has a
> surprising
> idea of what counts as a "sentence".  Before I delve any deeper,
> I wanted to ask whether there is a reason for what it's doing, in
> particular
> whether there's anything in the clinical pipeline that's depending on
> its
> behavior specifically.
> 
> The main problem I have is that it's splitting on characters like
> colon and
> semicolon, which aren't usually considered sentence separators, with
> the
> result that it often ends up tagging phrases rather than whole
> sentences.
> 
> It's using SentenceDetectorCtakes and EndOfSentenceScannerImpl, which
> seem
> to be derived from equivalents in OpenNLP, but with changes that I
> can't
> track (they date from the original edu.mayo import as far as I can
> tell).
> Other than the additional separator characters, I can't tell whether
> these
> classes are doing anything important that you wouldn't equally get
> from
> OpenNLP's SentenceDetectorME, so I don't know why they're being used.
> 
> SentenceDetector is also splitting on newlines after passing the text
> through
> the max entropy sentence model.  I don't see the point in this -- if
> you're
> going to split on newlines anyway, then why not do that before
> passing
> through the entropy model?  Or just have newline as one of the
> potential
> EOS characters and treat it as a possible break point rather than a
> definite
> one?
> 
> Any insight would be welcome.
> 
> Thanks,
> 
> Ewan.

Re: consequences of change to typesystem [EXTERNAL]

2018-04-03 Thread Miller, Timothy

Yes, that's right. Especially for one-off contributions, it is really
helpful to the project if you open up a jira issue and attach the patch
to the issue, then one of the committers will check it and commit it.
Let us know if you have any questions about that.

For people interested in contributing more regularly (i.e., getting
committing privileges), which we are more than happy to see, that is
usually a good way to start as well.

Tim



On Tue, 2018-04-03 at 18:10 +, Gandhi Rajan Natarajan wrote:
> Hi Sean,
> 
> Please find the response from Sean Finan for the similar question I
> asked him earlier:
> 
> "Ctakes doesn't really have a steadfast process for making upgrades.
> 
> You should create a jira item or use an existing one.  Any commits
> should have a comment/message starting with the jira item.  For
> instance "CTAKES-441: Add LabValueFinder".
> 
> You can use patch files, attaching them to a jira item and requesting
> that somebody test them before the changes are committed.  You may
> want to create the patch using your git version and then commit it to
> ctakes using svn.
> https://www.devroom.io/2009/10/26/how-to-create-and-apply-a-patch-with-git/
> https://www.devroom.io/2007/07/03/how-to-create-and-apply-a-patch-with-subversion/
> 
> If the change is significant then you could create an svn branch of
> ctakes and then commit your changes to that branch.  Ask for
> assistance testing the branch and then merge the branch into trunk."
> 
> Hope it makes sense.
> 
> Regards,
> Gandhi
> 
> -Original Message-
> From: Mullane, Sean *HS [mailto:sp...@hscmail.mcc.virginia.edu]
> Sent: Tuesday, April 03, 2018 11:28 PM
> To: 'Finan, Sean' ; d...@ctakes.apache.org
> Subject: RE: consequences of change to typesystem [EXTERNAL]
> 
> I have made some minor changes to DocumentMapperServiceImpl.java to
> fix this. The bodyLocation attributes now get added via the anno_link
> table in the database. I created JIRA issue 503 [0] for this issue,
> per the cTAKES wiki.
> 
> Since this is my first time committing a change to the project I'm
> not sure how to go about it. Is there a tutorial on how to file a
> pull request I can reference?
> 
> [0] https://issues.apache.org/jira/browse/CTAKES-503
> 
> Thanks,
> Sean
> 
> -Original Message-
> From: Mullane, Sean *HS [mailto:sp...@hscmail.mcc.virginia.edu]
> Sent: Wednesday, March 28, 2018 6:54 PM
> To: 'Finan, Sean'; dev@ctakes.apache.org
> Subject: RE: consequences of change to typesystem [EXTERNAL]
> 
> Sean,
> 
> Glad I asked. I will try either what you suggested or the similar
> approach of adding some code to handle the bare-annotation-as-feature 
> case similarly to how annotations inside FSArrays are handled.
> 
> Thanks,
> Sean
> 
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Wednesday, March 28, 2018 8:40 AM
> To: dev@ctakes.apache.org
> Subject: Re: consequences of change to typesystem [EXTERNAL]
> 
> Hi Sean,
> 
> In case nobody else has replied,
> Yes, this would definitely break a whole lot of things.  I am not
> saying that it is a bad idea, just that the current
> BinaryTextRelation interface is used as-is in probably a thousand
> places, and while some refactoring might be trivial I wouldn't bet
> that it all would be as easy as one would like.
> 
> I haven't looked at the ytex DBConsumer, but could it possibly be
> easier to add some code there that would check BinaryTextRelations
> and create a new FSArray for each?  Stick those arrays in the cas
> immediately before and db write() and you should be able to do what
> you want without impacting the rest of ctakes.
> 
> Sean
> 
> From: Mullane, Sean *HS 
> Sent: Tuesday, March 27, 2018 6:05 PM
> To: dev@ctakes.apache.org
> Subject: consequences of change to typesystem [EXTERNAL]
> 
> I am trying out a change to the typesystem (explained below). If it
> works as I hope, I would want to contribute this back to the trunk.
> Before I invest too much time into this, can anyone tell me if this
> is likely to

uima 3

2018-03-15 Thread Miller, Timothy

Has some cool looking useful new functionality:
https://uima.apache.org/d/uimaj-3.0.0-alpha02/version_3_users_guide.html#uv3.overview.new

Support for arbitrary Java objects, transportable in the CAS
New types: FSHashSet
Automatic garbage collection of unreferenced Feature Structures
better performance

And an interesting new select api that interacts with java streaming
api:

Set<Type> foundTypes =
   myIndex.select(MyType.class)
   .coveredBy(myBoundingAnnotation)
   .nonOverlapping()
   .map(fs -> fs.getType())
   .collect(Collectors.toCollection(TreeSet::new));

Re: Sentence splitter [EXTERNAL]

2018-03-13 Thread Miller, Timothy

That sounds bizarre! I can think of two possibilities: a sentence break in the 
middle of the word (unlikely), or the different sentence splits caused the POS 
tagger some confusion, and tagged the word aspirin as a forbidden part of 
speech, like a preposition or something. If you check the token annotation on 
the word aspirin you should be able to see the part of speech tag for that word.
Tim


From: Tomasz Oliwa 
Sent: Tuesday, March 13, 2018 5:34 PM
To: dev@ctakes.apache.org
Subject: Re: Sentence splitter [EXTERNAL]

Hi,

I tested SentenceDetectorAnnotatorBIO in cTAKES 4.0.0, simply by replacing 
SentenceDetectorAnnotator.xml with SentenceDetectorAnnotatorBIO.xml in 
AggregatePlaintextFastUMLSProcessor.xml.

While it seemed to work, I noticed that in one example, an IdentifiedAnnotation 
was not found, that was found for the same input with just 
SentenceDetectorAnnotator.xml.

Could somebody check this please? Run the cTAKES CVD with the following input 
(without the "):

"
aspirin

his leg
"

On the machine I tested this, the MedicationMention does not show up with 
SentenceDetectorAnnotatorBIO, but it does with SentenceDetectorAnnotator.


From: Masoud Rouhizadeh 
Sent: Tuesday, March 13, 2018 3:02:35 PM
To: dev@ctakes.apache.org
Subject: Re: Sentence splitter [EXTERNAL]

Hi Sean,

Thank you for the pointer. I was able to run the SentenceDetectorAnnotatorBIO 
from ctakes-core. The results are way better than the SentenceDetectorAnnotator 
but I still see some issues such as splitting “Dr.” as a separate sentence 
(most likely due to the period after the abbreviation). Do you think there is a 
way to define an abbreviation list for SentenceDetectorAnnotatorBIO so that it 
knows that this is a word-final (i.e. abbreviation-final) and not a 
sentence-final period?

Thanks again,
Masoud





On 3/9/18, 5:35 PM, "Finan, Sean"  wrote:


Hi Masoud,

There is a very nice SentenceDetectorBIO in ctakes-core.  It will split 
sentences based upon features other than just a newline character, which 
appears to be what you want.

Sean



From: Masoud Rouhizadeh 
Sent: Friday, March 9, 2018 4:41 PM
To: dev@ctakes.apache.org
Subject: Sentence splitter [EXTERNAL]

Hello cTAKES team!



I was wondering what types of sentence splitters are available in cTAKES? 
The default sentence splitter does not appear to be the best one. See output 
for the demo example from the example in cTAKES installation guide:



Dr. Nutritious Medical Nutrition Therapy for Hyperlipidemia Referral from:

Julie Tester, RD, LD, CNSD Phone contact:

(555)

555-1212 Height:

144 cm Current Weight:

45 kg Date of current weight: 02-29-2001 Admit Weight:

[...]



Thanks so much,

Masoud







Masoud Rouhizadeh, PhD

NLP Specialist / Software Engineer

Institute for Clinical and Translational Research

Johns Hopkins University


https://urldefense.proofpoint.com/v2/url?u=http-3A__pages.jh.edu_-7Emrouhiz1=DwIGaQ=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao=aZ4yDE4zQbRJuUQ8p-T5nPrjhYvXF28sFoJWEtP3sGU=ob0U2sSfS7UijTI8PqCh_MwMucxPc14ovmcC2vq7rDA=

Re: UmlsUserApprover Error [EXTERNAL]

2018-02-26 Thread Miller, Timothy

Is it possible there is some network issue preventing connectivity? New
institutional firewall maybe?

Otherwise, it looks like somehow your credentials are not getting into
the right place. Possibly a configuration file had them before and it's
been changed out from under you?

One thing you can try, if you are using an IDE, you can directly put
your credentials into the VM options for your run configuration with:
-Dctakes.umlsuser=<username> -Dctakes.umlspw=<password>

and see if you still get the issue.
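
To confirm the VM options are actually reaching the JVM, a tiny check like the following can be run with the same run configuration. This is only a sketch: the two property names are the ones mentioned above, while the class name and output wording are made up for illustration and are not part of cTAKES.

```java
class UmlsCredentialCheck {
    // Property names as given above; everything else here is illustrative.
    static final String USER_KEY = "ctakes.umlsuser";
    static final String PASS_KEY = "ctakes.umlspw";

    // True only if the system property is set to a non-blank value.
    static boolean isSet(String key) {
        String value = System.getProperty(key);
        return value != null && !value.trim().isEmpty();
    }

    // Reports presence only; the values themselves are never printed.
    static String describe() {
        return USER_KEY + (isSet(USER_KEY) ? " is set" : " is MISSING") + "; "
                + PASS_KEY + (isSet(PASS_KEY) ? " is set" : " is MISSING");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If either one is reported as missing, the problem is in how the run configuration passes the options rather than in the UMLS account.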

Tim


On Sat, 2018-02-24 at 18:42 -0600, Andrew Phillips wrote:
> Hello,
> 
> I am getting an error after recompiling a script in my pipeline. My
> setup was working fine the last time I did a compile several months
> ago, and I have logged into my UMLS account to ensure it isn't an
> issue with my credentials, as well as done a complete reinstall from
> the GitHub repo and checked out the 4.0.0 release. The minor change I
> made in the script was just uncommenting something that I've used
> before, so I know there are no errors in it. Any insights as to what
> the issue may be? I've included the complete output below. Thank you.
> 
> 
> [INFO] Scanning for projects...
> [WARNING] The POM for org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 is
> missing, no dependency information available
> [WARNING] Failed to retrieve plugin descriptor for
> org.eclipse.m2e:lifecycle-mapping:1.0.0: Plugin
> org.eclipse.m2e:lifecycle-mapping:1.0.0 or one of its dependencies
> could
> not be resolved: Failure to find
> org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 in
> https://urldefense.proofpoint.com/v2/url?u=https-3A__repo.maven.apach
> e.org_maven2=DwIBaQ=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU
> =Heup-IbsIg9Q1TPOylpP9FE4GTK-
> OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h=NHws3pftXkncEWsu-
> Y6fCtMKfY3WWkYQmDYrA4AVcvU=1C-i1p8UnA38es-UT_d0FMIUOx5yrfK0NQh-
> PSEuxpA= was cached in the local repository,
> resolution will not be reattempted until the update interval of
> central has
> elapsed or updates are forced
> [INFO]
> [INFO]
> ---
> -
> [INFO] Building Apache cTAKES Temporal Information Extraction 4.0.1-
> SNAPSHOT
> [INFO]
> ---
> -
> [WARNING] The POM for org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 is
> missing, no dependency information available
> [WARNING] Failed to retrieve plugin descriptor for
> org.eclipse.m2e:lifecycle-mapping:1.0.0: Plugin
> org.eclipse.m2e:lifecycle-mapping:1.0.0 or one of its dependencies
> could
> not be resolved: Failure to find
> org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 in
> https://urldefense.proofpoint.com/v2/url?u=https-3A__repo.maven.apach
> e.org_maven2=DwIBaQ=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU
> =Heup-IbsIg9Q1TPOylpP9FE4GTK-
> OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h=NHws3pftXkncEWsu-
> Y6fCtMKfY3WWkYQmDYrA4AVcvU=1C-i1p8UnA38es-UT_d0FMIUOx5yrfK0NQh-
> PSEuxpA= was cached in the local repository,
> resolution will not be reattempted until the update interval of
> central has
> elapsed or updates are forced
> [INFO]
> [INFO] >>> exec-maven-plugin:1.2.1:java (default-cli) > validate @
> ctakes-misc >>>
> [INFO]
> [INFO] <<< exec-maven-plugin:1.2.1:java (default-cli) < validate @
> ctakes-misc <<<
> [INFO]
> [INFO]
> [INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ ctakes-misc
> ---
> log4j: reset attribute= "false".
> log4j: Threshold ="null".
> log4j: Retreiving an instance of org.apache.log4j.Logger.
> log4j: Setting [ProgressAppender] additivity to [false].
> log4j: Level value for ProgressAppender is  [INFO].
> log4j: ProgressAppender level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%m].
> log4j: Adding appender named [noEolAppender] to category
> [ProgressAppender].
> log4j: Retreiving an instance of org.apache.log4j.Logger.
> log4j: Setting [ProgressDone] additivity to [false].
> log4j: Level value for ProgressDone is  [INFO].
> log4j: ProgressDone level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%m%n].
> log4j: Adding appender named [eolAppender] to category
> [ProgressDone].
> log4j: Level value for root is  [INFO].
> log4j: root level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%d{dd MMM 
> HH:mm:ss}
> %5p %c{1} - %m%n].
> log4j: Adding appender named [consoleAppender] to category [root].
> 24 Feb 2018 18:22:23  INFO LvgAnnotator - URL for lvg.properties
> =/home/aphillips5/ctakes/ctakes-
> misc/target/classes/org/apache/ctakes/lvg/data/config/lvg.properties
> 24 Feb 2018 18:22:23  INFO SentenceDetector - Sentence

Re: Fast UMLS dictionary lookup description [EXTERNAL] [SUSPICIOUS]

2018-02-23 Thread Miller, Timothy

Didn't you have some slides at some point as well? I don't know if they
are suitable for public consumption but I remember it was helpful for
me at least.
Tim

On Fri, 2018-02-23 at 15:34 +, Finan, Sean wrote:
> Unfortunately, writing is not my jam.  I wrote about 50% of a paper
> and then shoved it aside for other tasks.  Now I have no idea where I
> saved it ...
> 
> However, there is an outline of sorts in the code repository within
> the ctakes-dictionary-lookup-fast module.  The doc/ directory
> contains a few files and the DictionaryLookupHelp document may
> address your question.  I apparently wrote it in March of 2014 (time
> flies) so I am guessing that some minor details have changed, but the
> main flow is the same.
> 
> Sean
> 
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
>  
> Sent: Friday, February 23, 2018 2:57 AM
> To: dev@ctakes.apache.org
> Subject: RE: Fast UMLS dictionary lookup description [EXTERNAL]
> 
> Hi Masoud,
> 
> 
> 
> In this link - https://urldefense.proofpoint.com/v2/url?u=https-3A__c
> wiki.apache.org_confluence_display_CTAKES_cTAKES-2B4.0-2B-2D-2BFast-
> 2BDictionary-
> 2BLookup=DwIGaQ=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU=fs6
> 7GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao=2lx9jiMXTJ4lNLDbef7KG0qSHx
> D_AZH_DYqrQyAZWSY=UpxVWvyK8fZ_8vnYhIrFZlUza0qBHuqVme5n-8zEeqw=, I
> could see an information stating " A paper on rare word indexing is
> currently in progress."
> 
> 
> 
> Maybe Sean or Tim will be able to provide info on this.
> 
> 
> 
> Regards,
> 
> Gandhi
> 
> 
> 
> -Original Message-
> 
> From: Masoud Rouhizadeh [mailto:m...@jhu.edu]
> 
> Sent: Thursday, February 22, 2018 9:57 PM
> 
> To: dev@ctakes.apache.org
> 
> Subject: Fast UMLS dictionary lookup description
> 
> 
> 
> Hello, cTAKES developing team,
> 
> 
> 
> We are using and comparing various NLP tools (including cTAKES) for
> processing over 5 million clinical notes within Johns Hopkins Medical
> Institutes. As a part of our comparisons, we are exploring the
> architecture of the NER and (UMLS) concept linking components of the
> tools.
> 
> 
> 
> I was able to find the description on the cTAKES default/original
> dictionary lookup in the Savova et al. 2010 paper but I was not
> able to find a paper or tech report describing the fast UMLS
> dictionary lookup (Fast UMLS Processor) yet.
> 
> 
> 
> Any description of the fast dictionary lookup algorithm is highly
> appreciated.
> 
> 
> 
> Thank you,
> 
> Masoud Rouhizadeh
> 
> 
> 
> 
> 
> Masoud Rouhizadeh, PhD
> 
> 
> 
> NLP Specialist / Software Engineer
> 
> Institute for Clinical and Translational Research Center for Clinical
> Data Analysis School of Medicine, Johns Hopkins University
> 
> https://urldefense.proofpoint.com/v2/url?u=http-3A__pages.jh.edu_-7Em
> rouhiz1=DwIGaQ=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU=fs67
> GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao=2lx9jiMXTJ4lNLDbef7KG0qSHxD
> _AZH_DYqrQyAZWSY=sqC6maCH-rhpZGJ_y6zc1q1K1z5FDYjcN6HhX8e_ZbY=
> 
> 
> 
> This email and any files transmitted with it are confidential and
> intended solely for the use of the individual or entity to whom they
> are addressed. If you are not the named addressee you should not
> disseminate, distribute or copy this e-mail. Please notify the sender
> or system manager by email immediately if you have received this e-
> mail by mistake and delete this e-mail from your system. If you are
> not the intended recipient you are notified that disclosing, copying,
> distributing or taking any action in reliance on the contents of this
> information is strictly prohibited and against the law.
>

Re: using umls dictionary lookup offline [EXTERNAL] [SUSPICIOUS]

2018-02-15 Thread Miller, Timothy

Again, not legal advice, but this is my rule of thumb:
- If you had to enter your UMLS credentials to download the copy of the
UMLS you're using with cTAKES, then you don't need to have the online
credentials check. (As Sean said, you are responsible for following
licenses in terms of redistribution.)
- If you did _not_ enter your UMLS credentials to download the copy of
the UMLS you're using with cTAKES (e.g., from our sourceforge mirror),
then you DO need to have the online credentials check. It is very
beneficial to the cTAKES project that we are allowed to redistribute
the UMLS in a format that's convenient for users getting started, so it
is really important not to abuse this.

Tim


On Thu, 2018-02-15 at 14:13 +, Finan, Sean wrote:
> Hi Devi,
> 
> There is a lot to say on this topic, and I can't possibly cover it
> all.  Disclaimer: the following is not meant to be complete.  It is
> the rambling of a layman, not a lawyer, who hasn't slept.  I did not
> draft the UMLS license, nor have I thoroughly read it since ... I
> want to say October.  If anybody notices that I state something
> inaccurate please correct me.  Also, apologies for shouting TAKES.
> 
> !!!  Please visit the UMLS license start page [1] for complete
> information on what you should do regarding its use.  Apache has no
> affiliation that I know of and this is not the best forum for legal
> matters.
> 
> In short, as things apply to Apache cTAKES:
> 1) There is available on sourceforge a prebuilt database containing a
> subset of the UMLS that is usable by Apache cTAKES.  
> 2) That database is not distributed or supported by Apache.  The
> licenses are incompatible.
> 3) The Apache cTAKES website downloads page [2] provides a link to it
> as a courtesy.
> 
> 4) Just like help on anything else 3rd party [3], information on
> using the dictionary [4] in the Apache cTAKES wiki, Apache cTAKES
> mailing lists, etc. is provided for assistance.
> 5) There are inherent expectations that those utilizing said help
> abide by all laws and restrictions of the third party.
> 
> 6) The "default" Apache cTAKES dictionary lookup uses a "rare word
> index" schema. [5]
> 7) While the database on sourceforge adheres to the rare word index
> schema,
> 8) An infinite number of databases can be created that conform to
> said schema and can be used by Apache cTAKES.  [6]
> 
> 9) There is also code in Apache cTAKES that can use other database
> schemas or bar-separated value flat files.
> 
> 10) While the "default clinical pipeline" [7] is possibly the most
> commonly run configuration,
> 11) The default clinical pipeline is far from being the only way to
> use Apache cTAKES.
> 
> 12) While the default dictionary lookup does require a check of the
> end user's UMLS license during initialization,
> 13) it is possible that the end user may want to run Apache cTAKES
> without the herein mentioned sourceforge database.
> 14) For that reason, there are configurations of the dictionary
> lookup that do not require a UMLS credential check.
> 
> I have run out of steam.  So,
> 1)  If you use the subset of the UMLS that exists on sourceforge,
> PLEASE keep the UMLS credential check enabled.
> 2)  If you use another database of your own making, you can do what
> you want.
> 3) I should also say that if you create your own dictionary using the
> UMLS, I am pretty certain that you are not allowed to distribute it
> without expressed permission from the NLM.  Please consult the UMLS
> license. [1]
> 
> 
> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__www.nlm.nih.
> gov_databases_umls.html=DwIGaQ=qS4goWBT7poplM69zy_3xhKwEW14JZMSdi
> oCoppxeFU=Heup-IbsIg9Q1TPOylpP9FE4GTK-
> OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h=kLwhTWLiycm7tPA3QG6BD8swPRXxgO
> TpWHc6l_TCKoA=IpjGTDhTHstuDCNdgaxEo9doI7Djf-cWL7JWrtOeKwE=
> [2] https://urldefense.proofpoint.com/v2/url?u=http-3A__ctakes.apache
> .org_downloads.cgi=DwIGaQ=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCopp
> xeFU=Heup-IbsIg9Q1TPOylpP9FE4GTK-
> OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h=kLwhTWLiycm7tPA3QG6BD8swPRXxgO
> TpWHc6l_TCKoA=5Pv5xjzH7FP4OSYumoLEsWrAzY5lRiZVBsYOmMoIR68=
> [3] https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache
> .org_confluence_display_CTAKES_External-2BTools-2Band-
> 2BApplications=DwIGaQ=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU
> =Heup-IbsIg9Q1TPOylpP9FE4GTK-
> OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h=kLwhTWLiycm7tPA3QG6BD8swPRXxgO
> TpWHc6l_TCKoA=8I3gCGeAzw4jkeGDPg536JUlUHJvmIacIg8Jjx46_kQ=
> [4] https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache
> .org_confluence_display_CTAKES_cTAKES-2B4.0-2BDictionaries-2Band-
> 2BModels=DwIGaQ=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU=Heu
> p-IbsIg9Q1TPOylpP9FE4GTK-
> OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h=kLwhTWLiycm7tPA3QG6BD8swPRXxgO
> TpWHc6l_TCKoA=nFS7-kIWdv_QpbHxdxl26WBnm3yGaauhs8cRHlpqMYM=
> [5] https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache
> .org_confluence_display_CTAKES_cTAKES-2B4.0-2B-2D-2BFast-
>

Re: SubjectClearTkAnalysisEngine not working [EXTERNAL]

2018-01-16 Thread Miller, Timothy

OK, it sounds like a slight misunderstanding of what "subject" refers
to. The subject field refers to _who_ is the subject of an event.

This is important to differentiate diseases that are mentioned because
the patient is experiencing them ("pt has colon cancer") from those
that might be mentioned because a family member had them ("mother had
breast cancer").

What you're talking about sounds more like "Sections", which I think in
ctakes are called "segments". There is a regex-based section finder in
cTAKES but it is not enabled by default because it would usually need
to be customized for a given institution's notes.
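
To give a rough idea of what regex-based section finding involves, here is a minimal sketch in plain Java. It is not the cTAKES section finder: the header list, the pattern, and the "UNKNOWN" fallback are all assumptions picked to match the examples in this thread, and a real note collection would need its own header list.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SectionSketch {
    // Hypothetical header pattern: a known section name alone on its line,
    // optionally followed by a colon. Case-insensitive, multiline.
    static final Pattern HEADER = Pattern.compile(
            "(?im)^\\s*(Vital Signs|Physical Examination|Family Medical History|Lab Results)\\s*:?\\s*$");

    // Name of the last header that starts at or before the offset,
    // or "UNKNOWN" if the offset comes before any header.
    static String sectionAt(String note, int offset) {
        String current = "UNKNOWN";
        Matcher m = HEADER.matcher(note);
        while (m.find() && m.start() <= offset) {
            current = m.group(1);
        }
        return current;
    }

    public static void main(String[] args) {
        String note = "Vital Signs:\nBP 120/80\nFamily Medical History:\nMother had breast cancer\n";
        System.out.println(sectionAt(note, note.indexOf("Mother")));
    }
}
```

Given an entity's begin offset, a lookup like this would say which section it falls in, which is a different question from who the subject of the event is.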

Tim


On Wed, 2018-01-17 at 01:10 +0530, Ratan Sharma wrote:
> I am trying to find out whether an entity falls into one of these
> categories, and my understanding was that subject could give me this
> information.
> 
> SUBJECT it belongs to like -
> *"Vital Signs", "BP", "Physical Examination", "Family Medical
> History",
> "Lab Results"*
> 
> Any idea how to achieve this.
> 
> 
> On Wed, Jan 17, 2018 at 1:05 AM, Miller, Timothy <
> timothy.mil...@childrens.harvard.edu> wrote:
> 
> > 
> > What output would you like? What are you expecting?
> > 
> > This field in theory could have a few different values: patient,
> > family_member, other, donor(iirc?)
> > 
> > But in reality our training data was very skewed towards the patient
> > label, and the representation we used for training is not great at
> > picking up section-wide cues that would be helpful (like a family
> > history section header). So in practice it almost always will say
> > "patient." It may occasionally get something very obvious: "Mother
> > had breast cancer"
> > I don't know if it will get this exact example, it probably needs to
> > look exactly like a training instance because we had very few to
> > generalize from.
> > Thanks
> > Tim
> > 
> > 
> > On Wed, 2018-01-17 at 00:57 +0530, Ratan Sharma wrote:
> > > 
> > > I am able to pull entity information for different sections
> > > correctly, but I am facing issues when it comes to pulling subject
> > > information. The subject is always pulled as "PATIENT".
> > > 
> > > I do have this added in the AssertionPipeline
> > > builder.add(
> > > SubjectCleartkAnalysisEngine.createAnnotatorDescription() );
> > > 
> > > 
> > > Here are some sample output :
> > > 
> > > Entity: 3 === Text: Blood Transfusion === Polarity: 1 ===
> > > Subject:
> > > patient
> > > === EntityName:
> > > org.apache.ctakes.typesystem.type.textsem.SignSymptomMention
> > > Entity: 6 === Text: Blood === Polarity: 1 === Subject: patient
> > > ===
> > > EntityName:
> > > org.apache.ctakes.typesystem.type.textsem.AnatomicalSiteMention
> > > Entity: 3 === Text: Transfusion Reaction === Polarity: 1 ===
> > > Subject:
> > > patient === EntityName:
> > > org.apache.ctakes.typesystem.type.textsem.SignSymptomMention
> > > Entity: 5 === Text: Transfusion === Polarity: 1 === Subject:
> > > patient
> > > ===
> > > EntityName:
> > > org.apache.ctakes.typesystem.type.textsem.ProcedureMention
> > > Entity: 2 === Text: HIV === Polarity: 1 === Subject: patient ===
> > > EntityName:
> > > org.apache.ctakes.typesystem.type.textsem.DiseaseDisorderMention
> > > Entity: 6 === Text: Sickle Cell === Polarity: 1 === Subject:
> > > patient
> > > ===
> > > EntityName:
> > > org.apache.ctakes.typesystem.type.textsem.AnatomicalSiteMention
> > > Entity: 2 === Text: Neurologic Disorders === Polarity: 1 ===
> > > Subject:
> > > patient === EntityName:
> > > org.apache.ctakes.typesystem.type.textsem.DiseaseDisorderMention
> > > Entity: 2 === Text: Autoimmune Disorders === Polarity: 1 ===
> > > Subject:
> > > patient === EntityName:
> > > org.apache.ctakes.typesystem.type.textsem.DiseaseDisorderMention
> > > Entity: 3 === Text: Autoimmune === Polarity: 1 === Subject:
> > > patient
> > > ===
> > > EntityName:
> > > org.apache.ctakes.typesystem.type.textsem.SignSymptomMention
> > > Entity: 2 === Text: Autoimmune Disorders === Polarity: -1 ===
> > > Subject:
> > > patient === EntityName:
> > > org.apache.ctakes.typesystem.type.textsem.DiseaseDisorderMention
> > > Entity: 3 === Text: Autoimmune === Polarity: 1 === Subject:
> > > patient
> > > ===
> > > EntityName:
> > > org.apache.ctakes.typesystem.type.textsem.SignSymptomMention

Re: Can we build CollectionReader from database [EXTERNAL]

2018-01-12 Thread Miller, Timothy

Hi Kishore,
Take a look in this directory for many different collection reader options:
http://svn.apache.org/viewvc/ctakes/trunk/ctakes-core/src/main/java/org/apache/ctakes/core/cr/

JdbcCollectionReader may work for you.

Here are the parameters with comments:

  /**
   * SQL statement to retrieve the document.
   */
  public static final String PARAM_SQL = "SqlStatement";

  /**
   * Name of column from resultset that contains the document text. Supported
   * column types are CHAR, VARCHAR, and CLOB.
   */
  public static final String PARAM_DOCTEXT_COL = "DocTextColName";

  /**
   * Name of external resource for database connection.
   */
  public static final String PARAM_DB_CONN_RESRC = "DbConnResrcName";

  /**
   * Optional parameter. Specifies column names that will be used to form a
   * document ID.
   */
  public static final String PARAM_DOCID_COLS = "DocIdColNames";

  /**
   * Optional parameter. Specifies delimiter used when document ID is built.
   */
  public static final String PARAM_DOCID_DELIMITER = "DocIdDelimiter";
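
As a rough illustration of what the two optional parameters describe (a document ID formed from several column values joined by a delimiter), here is a small sketch in plain Java. It is not the reader's actual implementation: the method, the map standing in for a result-set row, and the column names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DocIdSketch {
    // Joins the values of the chosen columns, in the order given, with the delimiter.
    // The map is a stand-in for one row of a JDBC result set.
    static String buildDocId(Map<String, Object> row, List<String> idColumns, String delimiter) {
        List<String> parts = new ArrayList<>();
        for (String column : idColumns) {
            parts.add(String.valueOf(row.get(column)));
        }
        return String.join(delimiter, parts);
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("patient_id", 42);          // hypothetical column
        row.put("note_id", 7);              // hypothetical column
        row.put("note_text", "pt has colon cancer");
        System.out.println(buildDocId(row, Arrays.asList("patient_id", "note_id"), "_"));
    }
}
```

Picking ID columns that are unique together (for example patient plus note identifier) keeps the output files from overwriting one another.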


Tim


From: kishore 
Sent: Friday, January 12, 2018 6:26 AM
To: dev@ctakes.apache.org
Subject: Can we build CollectionReader from database [EXTERNAL]

Hi,
I got to know we can build CollectionReader using FileCollectionReader.
Do we have option to build CollectionReader from database? Can you suggest
me how to do that?

Thanks,
Kishore.

Re: Unable to understand the importance of attributes in IdentifiedAnnotations [EXTERNAL]

2018-01-06 Thread Miller, Timothy

>
> =PROCEDURES==
>   [,,,SNOMEDCT_US:32413006/C0018823] Heart Transplantation Cardiac
> transplant 15 2018-01-01 15:00:00 +0100~
> --
>
>
> Again, it picked up the History Of in the first clause where "history of"
> preceded its predicate, but not subsequent ones, or after a time
> expression indicating the past.
>
> I have a mind to work on this one day, but I think I'll be doing it in my
> CAS post processor rather than the annotator itself as the problem really
> involves a whole new solution that looks at the semantics of the whole
> sentence and not just "history of (x)"  For that we'd start looking at the
> conldep nodes, time annotations, and more.
>
> Peter
>
>
>
>
>
> On 1/5/18, 12:58 PM, "Miller, Timothy"
> <timothy.mil...@childrens.harvard.edu> wrote:
>
> >Uncertainty is when the text indicates some hedging about the concept:
> >"possible asthma" should have asthma as an IdentifiedAnnotation with the
> >uncertainty flag set to 1.
> >This is done by machine learning and it is not easy so it is not perfect.
> >
> >HistoryOf is for concepts that are explicitly in patient history, often
> >in a history section.
> >"history of lymphoma as a child"
> >lymphoma should have its history flag set to 1.
> >This is done by machine learning and it is not easy so it is not perfect.
> >
> >Confidence is a field that I don't believe gets set by any current
> >annotators, but in theory it is for methods that might use statistical
> >methods that output a score to set the score there.
> >The cTAKES dictionary lookup either hits or doesn't, so it doesn't set
> >that score.
> >
> >DiscoveryTechnique is a way to flag which entities were annotated by
> >which annotator, since it's possible to have, e.g., multiple clinical
> >concept taggers. We use it occasionally internally
> >to separate gold standard entities from system-discovered entities (in a
> >machine learning evaluation) but I don't know if any standard pipeline
> >components set it.
> >
> >Tim
> >
> >
> >From: Kumari,Puja <puja.kuma...@cerner.com>
> >Sent: Friday, January 5, 2018 12:03 AM
> >To: dev@ctakes.apache.org
> >Subject: Re: Unable to understand the importance of attributes in
> >IdentifiedAnnotations [EXTERNAL]
> >
> >Hi,
> >
> >
> >
> >Thanks for the replies but I am still not able to understand the
> >significance of the attributes such as Uncertainty, HistoryOf,
> >Confidence, DiscoveryTechniques.
> >
> >Can anyone give some examples or any information which will help me to
> >understand these concepts in more depth?
> >
> >
> >
> >Thanks.
> >
> >Puja Kumari
> >
> >
> >
> >On 1/4/18, 5:30 PM, "Gandhi Rajan Natarajan"
> ><gandhi.natara...@arisglobal.com> wrote:
> >
> >
> >
> >Try out this link -
> >"https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__na01.safelinks.prote
> >ction.outlook.com_-3Furl-3Dhttps-253A-252F-252Fcwiki.
> apache.org-252Fconflu
> >ence-252Fdisplay-252FCTAKES-252FcTAKES-252B4.0-252B-2D-
> 252BAssertion-26dat
> >a-3D02-257C01-257CPuja.Kumari3-2540cerner.com-
> 257C989437995db145fcbaa808d5
> >536ac609-257Cfbc493a80d244454a815f4ca58e8c09d-257C0-257C0-
> 257C636506640417
> >310103-26sdata-3D8WN2HIq9RiCiZJiTtp0i6Sk7ZVDM
> gNGoUbJRW1Hevp4-253D-26reserv
> >ed-3D0=DwIGaQ=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU=
> Heup-IbsIg
> >9Q1TPOylpP9FE4GTK-OqdTDRRNQXipowRLRjx0ibQrHEo8uY
> x6674h=muQ5_Uh4Q-5Uui87e
> >9eAWy2afrJRgcg4FrOmy2VyFP8=0NlpH8OCzjaVbZq3yTy4pQcWgTYtUK
> JOD5orbrpKGro
> >="
> >
> >
> >
> >Regards,
> >
> >Gandhi
> >
> >
> >
> >
> >
> >-Original Message-
> >
> >From: Kumari,Puja [mailto:puja.kuma...@cerner.com]
> >
> >Sent: Thursday, January 04, 2018 3:11 PM
> >
> >To: dev@ctakes.apache.org
> >
> >Subject: Re: Unable to understand the importance of attributes in
> >IdentifiedAnnotations
> >
> >
> >
> >Hi,
> >
> >
> >
> >Thanks for your reply Krishnareddy but the link given says "page not
> >found". Any other suggestions/links that you can share would be
> >appreciable.
> >
> >
> >
> >Thanks
> >
> >Puja Kumari
> >
> >
> >
> >On 1/4/18, 2:51 PM, &q

Re: Unable to get Confidence score for any entity [EXTERNAL]

2017-12-28 Thread Miller, Timothy

These items are created by a dictionary lookup -- not any kind of probabilistic 
algorithm -- which doesn't set the confidence score. There is nothing really 
like confidence distinguishing different kinds of found dictionary concepts.
Tim

From: Ratan Sharma 
Sent: Thursday, December 28, 2017 2:09 PM
To: dev@ctakes.apache.org
Subject: Unable to get Confidence score for any entity [EXTERNAL]

I am trying to find the confidence score for different section entities like -

ProcedureMention
SignSymptomMention
MedicationMention
DiseaseDisorderMention
AnatomicalSiteMention

But for all entities under any category, the Confidence score is always 0.0

Is there a specific setting I need to turn on to get these results?

Any suggestion / link to understand more would be helpful.

Re: non Medical entity extraction [EXTERNAL]

2017-12-21 Thread Miller, Timothy

By structured fields I mean non-note sources. Notes might be stored in
a database and other columns/tables in that database will contain
patient metadata, such as sex, birthdate, insurance status, etc.
Extracting this information is probably institution-specific. If you
don't have access to this kind of database and want to get it from
notes you will need to write your own uima annotator. There are
examples of how to do this in the ctakes-examples module.
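
For a sense of the matching logic such an annotator might wrap, here is a minimal sketch that pulls a patient age out of note text with a regular expression. It is plain Java with no UIMA or cTAKES classes, and the pattern is an assumption that covers only a few common phrasings; real notes would need a broader, institution-specific pattern.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AgeExtractionSketch {
    // Hypothetical pattern: "67 year old", "67-year-old", "3 years old", "67 y/o", "67 yo".
    static final Pattern AGE = Pattern.compile(
            "\\b(\\d{1,3})[\\s-]*(?:years?[\\s-]*old|y/o|yo)\\b",
            Pattern.CASE_INSENSITIVE);

    // First age mentioned in the note, or empty if no phrasing matches.
    static OptionalInt findAge(String note) {
        Matcher m = AGE.matcher(note);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(findAge("Pt is a 67-year-old woman with asthma."));
    }
}
```

Inside a real annotator the same match offsets would be used to create an annotation in the CAS instead of returning a number.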
Tim


On Thu, 2017-12-21 at 07:14 -0800, Vedic Baatein wrote:
> That makes sense. 
> 
> What would be a good way to extract information about the “structured
> fields” from the notes? Is there a specific module for it?
> 
> Thanks,
> Nitesh
> 
> > 
> > On Dec 21, 2017, at 4:24 AM, Miller, Timothy <Timothy.Miller@childr
> > ens.harvard.edu> wrote:
> > 
> > No, there is not that I'm aware of. While that information is often
> > in the note, it is also usually in structured fields where it can be
> > extracted with ~100% accuracy so it's not a high priority for NLP.
> > Thanks
> > Tim
> > 
> > 
> > On Thu, 2017-12-21 at 09:26 +, abilash.mat...@cognizant.com
> > wrote:
> > > 
> > > Hi All,
> > > 
> > > Is there an option currently available with CTAKES for extracting
> > > patient name, age etc. from Medical records and lab reports?
> > > 
> > > Thanks,
> > > Abilash Mathew
> > > This e-mail and any files transmitted with it are for the sole
> > > use of
> > > the intended recipient(s) and may contain confidential and
> > > privileged
> > > information. If you are not the intended recipient(s), please
> > > reply
> > > to the sender and destroy all copies of the original message. Any
> > > unauthorized review, use, disclosure, dissemination, forwarding,
> > > printing or copying of this email, and/or any action taken in
> > > reliance on the contents of this e-mail is strictly prohibited
> > > and
> > > may be unlawful. Where permitted by applicable law, this e-mail
> > > and
> > > other e-mail communications sent to and from Cognizant e-mail
> > > addresses may be monitored.

Re: non Medical entity extraction [EXTERNAL]

2017-12-21 Thread Miller, Timothy

No, there is not that I'm aware of. While that information is often in
the note, it is also usually in structured fields where it can be
extracted with ~100% accuracy so it's not a high priority for NLP.
Thanks
Tim


On Thu, 2017-12-21 at 09:26 +, abilash.mat...@cognizant.com wrote:
> Hi All,
> 
> Is there an option currently available with CTAKES for extracting
> patient name, age etc. from Medical records and lab reports?
> 
> Thanks,
> Abilash Mathew

Re: cTAKES as REST service [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2017-12-15 Thread Miller, Timothy

Great, that's very helpful.

I'll be happy to help with extracting the information needed from the
CAS the easy way. Sean, am I remembering right that there was an API
started for that somewhere? Or maybe that was part of DeepPhe?

Tim


On Fri, 2017-12-15 at 03:52 +, Gandhi Rajan Natarajan wrote:
> Hi Tim,
> 
> Thanks for taking time out and having a look at this. As you
> mentioned, the dictionary descriptor file contains details specific
> to my setup which needs to be changed to 127.0.0.1 by default. Will
> make the change accordingly.
> 
> The only reason we went ahead with the approach of parsing XML to
> JSON is due to our lack of in-depth knowledge in cTAKES
> implementations. If I could get some guidance on how to get the
> required JSON details directly from type systems, will be happy to
> implement the same as it will be a huge performance gain.
> 
> Also as you said we have two directories named ctakes-web-rest and
> ctakes-rest-service. The ctakes-rest-service directory is no longer
> active and is obsolete. We are just keeping it for reference for the
> time being and will remove it soon.
> 
> Thanks again for the detailed feedback.
> 
> Regards,
> Gandhi
> 
> 
> -Original Message-
> From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
> Sent: Friday, December 15, 2017 1:25 AM
> To: dev@ctakes.apache.org
> Subject: Re: cTAKES as REST service [EXTERNAL] [SUSPICIOUS]
> [SUSPICIOUS]
> 
> I looked at this today. Looks like a great start!
> 
> I was able to get as far as deploying to tomcat, seeing the web form,
> and submitting, but didn't get correct feedback because I don't have
> a mysql dictionary set up, which the default descriptor points at. I
> didn't see any instructions for building that and didn't have time to
> figure that out.
> 
> I think I mentioned in a different thread that if this whole thing
> could be wrapped in a docker container that would be really powerful,
> but if not, there are a few things that are obvious to you as
> developers but would make it easier for novices (like me) to deploy.
> 
> * download tomcat bin and start with bin/startup.sh (check at
> localhost:8080)
> * run mvn install on my ctakes installation to populate jar files in
> the .m2 directory that were missing
> * run mvn package inside the ctakes-web-rest subdirectory
> * copy the .war file into the webapps directory in my tomcat
> installation.
> * While I couldn't get the dictionary to work pointing to mysql, I
> noticed that the dictionary descriptor file has a hardcoded IP
> address when maybe it should be 127.0.0.1?
> 
> One other thing I noticed in the code is that in sending back JSON it
> looks like you're turning the JCas into xml and then parsing it
> yourself. It should be easier just to access typesystem objects
> directly. Sean may have some API code laying around to simplify that
> as well.
> 
> To iterate over signs/symptoms, for example, you would do:
> 
> for(SignSymptomMention ss : JCasUtil.select(jcas,
> SignSymptomMention.class)){
>   int begin = ss.getBegin(); // begin offset
>   int end = ss.getEnd(); // end offset ...
> }
> 
> Using the typesystem directly may help you to speed up that code or
> make it easier to read. But maybe there is a reason to write it to
> xml that I'm not aware of.
> 
> Finally, I see there are two sub-directories with similar names,
> ctakes-rest-service and ctakes-web-rest. If they are duplicates can
> you delete the old one?
> 
> I'll keep poking around, but hopefully this is helpful feedback for
> you guys. Thanks again for getting this off the ground!
> 
> Tim
> 
> 
> 
> 
> On Thu, 2017-12-07 at 14:16 +, Miller, Timothy wrote:
> > 
> > I am really interested in this too, just waiting until I have a few
> > free hours to look around. Don't want you to think it's not of
> > interest.
> > Tim
> > 
> > 
> > On Tue, 2017-12-05 at 19:18 +, Finan, Sean wrote:
> > > 
> > > 
> > > Hi all,
> > > 
> > > > I am trying to clear a backlog at work.  I will most likely not be
> > > > able to do anything with ctakes for another week.  Hopefully some
> > > > rest expert out there can prove their worth by testing ...
> > > 
> > > Sean
> > > 
> > > -Original Message-
> > > From: Matthew Vita [mailto:matthewvit...@gmail.com]
> > > Sent: Tuesday, December 05, 2017 1:58 PM
> > > To: dev@ctakes.apache.org
> > > Subject: Re: cTAKES as REST service [EXTERNAL]
> > > 
> > > 
> > > Hi Gandhi, Sean, Tim, Alex, James,
> > > 
> >

Re: cTAKES as REST service [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2017-12-14 Thread Miller, Timothy

Another thought I just had is that it seems to load the pipeline when
the first call is made -- without knowing the REST APIs that well, is
it possible to load the pipelines when the war is deployed? With some
of our larger pipelines the first call may take quite a while. Would
every call re-load the pipeline?
Tim
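
One common way to avoid a slow first request is to build the pipeline once, at deployment time, and reuse it for every call. The sketch below only illustrates that pattern in plain Java: `Pipeline`, `init()` and `analyze()` are hypothetical stand-ins, not classes from ctakes-web-rest, and nothing in this thread confirms how the module actually loads its pipeline. In a servlet container, `init()` could be called from a startup hook such as a `ServletContextListener`.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative holder that builds an expensive object once and reuses it. */
final class PipelineHolderSketch {

    /** Counts how many times the (hypothetical) pipeline was constructed. */
    static final AtomicInteger LOAD_COUNT = new AtomicInteger();

    /** Hypothetical stand-in for an expensive-to-build cTAKES pipeline. */
    static final class Pipeline {
        Pipeline() {
            LOAD_COUNT.incrementAndGet(); // pretend this is the slow model loading
        }

        String process(String text) {
            return "processed:" + text;
        }
    }

    private static volatile Pipeline pipeline;

    /** Intended to be called once at deployment; safe to call more than once. */
    static synchronized void init() {
        if (pipeline == null) {
            pipeline = new Pipeline();
        }
    }

    /** Handles one request; falls back to lazy loading if init() was never called. */
    static String analyze(String text) {
        if (pipeline == null) {
            init();
        }
        return pipeline.process(text);
    }

    public static void main(String[] args) {
        init();                                  // eager load at "deployment"
        System.out.println(analyze("first call"));
        System.out.println(analyze("second call"));
        System.out.println("loads: " + LOAD_COUNT.get());
    }
}
```

With this arrangement only the call to `init()` pays the loading cost; later calls to `analyze()` reuse the same instance.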

On Thu, 2017-12-14 at 20:16 +, Finan, Sean wrote:
> Hi Tim,
> 
> Many thanks for testing the new rest service!  And double that for
> the setup instructions!
> 
> > 
> > if this whole thing could be wrapped in a docker container that
> > would be really powerful
> - Matthew and I have had a short discussion or two on a docker that
> he is working on.  It was working, but performed a lot of the spring
> updates and some workarounds that should no longer be needed.  The
> next iteration should be cleaner and simpler.  We have also talked
> about making the container more compact.  He is busy with real work,
> but I think that this is definitely just over the horizon.
> 
> > 
> > One other thing I noticed in the code is that in sending back JSON
> > it looks like you're turning the JCas into xml and then parsing it
> > yourself. It should be easier just to access typesystem objects
> > directly. Sean may have some API code laying around to simplify
> > that as well.
> -  I am actually looking at the rest/util/XmlParser and had the very
> same thought.  It is a great start though, and as far as I know it is
> the first publicly available ctakes json writer.  If anybody else out
> there already has or knows of another, please share!
> 
> 
> Cheers all,
> Sean
> 
> 
> -Original Message-
> From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] 
> Sent: Thursday, December 14, 2017 2:55 PM
> To: dev@ctakes.apache.org
> Subject: Re: cTAKES as REST service [EXTERNAL] [SUSPICIOUS]
> [SUSPICIOUS] [SUSPICIOUS]
> 
> I looked at this today. Looks like a great start!
> 
> I was able to get as far as deploying to tomcat, seeing the web form,
> and submitting, but didn't get correct feedback because I don't have
> a mysql dictionary set up, which the default descriptor points at. I
> didn't see any instructions for building that and didn't have time to
> figure that out.
> 
> I think I mentioned in a different thread that if this whole thing
> could be wrapped in a docker container that would be really powerful,
> but if not, there are a few things that are obvious to you as
> developers but would make it easier for novices (like me) to deploy.
> 
> * download tomcat bin and start with bin/startup.sh (check at
> localhost:8080)
> * run mvn install on my ctakes installation to populate jar files in
> the .m2 directory that were missing
> * run mvn package inside the ctakes-web-rest subdirectory
> * copy the .war file into the webapps directory in my tomcat
> installation.
> * While I couldn't get the dictionary to work pointing to mysql, I
> noticed that the dictionary descriptor file has a hardcoded IP
> address when maybe it should be 127.0.0.1?
> 
> One other thing I noticed in the code is that in sending back JSON it
> looks like you're turning the JCas into xml and then parsing it
> yourself. It should be easier just to access typesystem objects
> directly. Sean may have some API code laying around to simplify that
> as well.
> 
> To iterate over signs/symptoms, for example, you would do:
> 
> for(SignSymptomMention ss : JCasUtil.select(jcas,
> SignSymptomMention.class)){
>   int begin = ss.getBegin(); // begin offset
>   int end = ss.getEnd();     // end offset ...
> }
> 
> Using the typesystem directly may help you to speed up that code or
> make it easier to read. But maybe there is a reason to write it to
> xml that I'm not aware of.
> 
> Finally, I see there are two sub-directories with similar names,
> ctakes-rest-service and ctakes-web-rest. If they are duplicates can
> you delete the old one?
> 
> I'll keep poking around, but hopefully this is helpful feedback for
> you guys. Thanks again for getting this off the ground!
> 
> Tim
> 
> 
> 
> 
> On Thu, 2017-12-07 at 14:16 +, Miller, Timothy wrote:
> > 
> > > I am really interested in this too, just waiting until I have a few
> > > free hours to look around. Don't want you to think it's not of
> > > interest.
> > Tim
> > 
> > 
> > On Tue, 2017-12-05 at 19:18 +, Finan, Sean wrote:
> > > 
> > > 
> > > Hi all,
> > > 
> > > I am trying to clear a backlog at work.  I will most likely not
> > > be 
> > > able to do anything with ctakes for another week.  Hopefully
> > > some 
> > > rest e

Re: cTAKES as REST service [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2017-12-14 Thread Miller, Timothy

I looked at this today. Looks like a great start!

I was able to get as far as deploying to tomcat, seeing the web form,
and submitting, but didn't get correct feedback because I don't have a
mysql dictionary set up, which the default descriptor points at. I
didn't see any instructions for building that and didn't have time to
figure that out.

I think I mentioned in a different thread that if this whole thing
could be wrapped in a docker container that would be really powerful,
but if not, there are a few things that are obvious to you as
developers but would make it easier for novices (like me) to deploy.

* download tomcat bin and start with bin/startup.sh (check at
localhost:8080)
* run mvn install on my ctakes installation to populate jar files in
the .m2 directory that were missing
* run mvn package inside the ctakes-web-rest subdirectory
* copy the .war file into the webapps directory in my tomcat
installation.
* While I couldn't get the dictionary to work pointing to mysql, I
noticed that the dictionary descriptor file has a hardcoded IP address
when maybe it should be 127.0.0.1?

One other thing I noticed in the code is that in sending back JSON it
looks like you're turning the JCas into xml and then parsing it
yourself. It should be easier just to access typesystem objects
directly. Sean may have some API code laying around to simplify that as
well.

To iterate over signs/symptoms, for example, you would do:

for(SignSymptomMention ss : JCasUtil.select(jcas,
SignSymptomMention.class)){
  int begin = ss.getBegin(); // begin offset
  int end = ss.getEnd();     // end offset
...
}

Using the typesystem directly may help you to speed up that code or
make it easier to read. But maybe there is a reason to write it to xml
that I'm not aware of.
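
As a rough sketch of what "using the typesystem directly" could look like for the JSON response, the example below groups the covered text of each mention under its type name and renders a small JSON object. In real cTAKES code the mentions would come from `JCasUtil.select(...)` as in the loop above and the text from the annotation's `getCoveredText()`; the `Mention` class, `find`, `groupByType` and `toJson` here are hypothetical stand-ins so the sketch runs without UIMA on the classpath. The input sentence is the sample from earlier in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class MentionJsonSketch {

    /** Hypothetical stand-in for a cTAKES annotation: a type name plus begin/end offsets. */
    static final class Mention {
        final String type;
        final int begin;
        final int end;

        Mention(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Builds a Mention for the first occurrence of phrase in doc (illustration only). */
    static Mention find(String doc, String type, String phrase) {
        int begin = doc.indexOf(phrase);
        return new Mention(type, begin, begin + phrase.length());
    }

    /** Groups the upper-cased covered text of each mention under its type name. */
    static Map<String, List<String>> groupByType(String doc, List<Mention> mentions) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Mention m : mentions) {
            String covered = doc.substring(m.begin, m.end).toUpperCase();
            grouped.computeIfAbsent(m.type, k -> new ArrayList<>()).add(covered);
        }
        return grouped;
    }

    /** Minimal JSON rendering with no string escaping; a real service should use a JSON library. */
    static String toJson(Map<String, List<String>> grouped) {
        return grouped.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\": ["
                        + e.getValue().stream()
                                .map(t -> "\"" + t + "\"")
                                .collect(Collectors.joining(", "))
                        + "]")
                .collect(Collectors.joining(", ", "{", "}"));
    }

    public static void main(String[] args) {
        String doc = "Patient has cancer and nausea. Earlier he has been deducted for red eye.";
        List<Mention> mentions = new ArrayList<>();
        mentions.add(find(doc, "DiseaseDisorderMention", "cancer"));
        mentions.add(find(doc, "SignSymptomMention", "nausea"));
        mentions.add(find(doc, "SignSymptomMention", "red eye"));
        System.out.println(toJson(groupByType(doc, mentions)));
    }
}
```

The point of the design is that no XML is written or parsed: offsets and text are read straight from the annotation objects and collected in a single pass.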

Finally, I see there are two sub-directories with similar names,
ctakes-rest-service and ctakes-web-rest. If they are duplicates can you
delete the old one?

I'll keep poking around, but hopefully this is helpful feedback for you
guys. Thanks again for getting this off the ground!

Tim

On Thu, 2017-12-07 at 14:16 +, Miller, Timothy wrote:
> I am really interested in this too, just waiting until I have a few
> free hours to look around. Don't want you to think it's not of
> interest.
> Tim
> 
> 
> On Tue, 2017-12-05 at 19:18 +, Finan, Sean wrote:
> > 
> > Hi all,
> > 
> > I am trying to clear a backlog at work.  I will most likely not be
> > able to do anything with ctakes for another week.  Hopefully some
> > rest expert out there can prove their worth by testing ...
> > 
> > Sean
> > 
> > -Original Message-
> > From: Matthew Vita [mailto:matthewvit...@gmail.com] 
> > Sent: Tuesday, December 05, 2017 1:58 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: cTAKES as REST service [EXTERNAL]
> > 
> > 
> > Hi Gandhi, Sean, Tim, Alex, James,
> > 
> > > I'm still getting back into the swing of things after my trip (I'm on
> > > business traveling at the moment, here in the states). I will be
> > > jumping right back into cTAKES REST development next week personally
> > > and with a new team mate from the open source team.
> > 
> > I'm so sorry for my silence/lack of updates!!! Very excited to see
> > what Gandhi's updates are looking like and enriching the JSON
> > response payload.
> > 
> > Thanks,
> > 
> > Matthew Vita
> > www.matthewvita.com
> > 
> > > On Tue, Dec 5, 2017 at 10:24 AM, Gandhi Rajan Natarajan
> > > <Gandhi.Natara...@arisglobal.com> wrote:
> > 
> > > 
> > > 
> > > > Could someone help me out on the resources cleanup at least, if not
> > > > the review?
> > > 
> > > Regards,
> > > Gandhi
> > > 
> > > 
> > > -Original Message-
> > > > From: Gandhi Rajan Natarajan [mailto:Gandhi.Natarajan@arisglobal.com]
> > > Sent: Monday, December 04, 2017 10:05 PM
> > > To: dev@ctakes.apache.org
> > > Subject: RE: cTAKES as REST service [EXTERNAL]
> > > 
> > > Hi Sean, Tim, Alex, Matthew, James and All,
> > > 
> > > I have placed the first cut version of cTAKES REST module in the 
> > > following path - 
> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_G
> > > oT
> > > eam
> > > Epsilon_ctakes-2Drest-
> > > 2Dservice_tree_=DwIFaQ=qS4goWBT7poplM69zy_3x
> > > hKwEW14JZMSdioCoppxeFU=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4g
> > > Ta
> > > o
> > > =AaXwWeHrvVgjd3l30MX0K74_d9uL4nLj63jy45d5x_Y=KZ65xiQopzQNQarVc3
> > > BP
> > > MxK
> > > izpqJwoUJtjIJZC8C6i

Re: cTAKES as REST service [EXTERNAL] [SUSPICIOUS]

2017-12-07 Thread Miller, Timothy

I am really interested in this too, just waiting until I have a few
free hours to look around. Don't want you to think it's not of
interest.
Tim


On Tue, 2017-12-05 at 19:18 +, Finan, Sean wrote:
> Hi all,
> 
> I am trying to clear a backlog at work.  I will most likely not be
> able to do anything with ctakes for another week.  Hopefully some
> rest expert out there can prove their worth by testing ...
> 
> Sean
> 
> -Original Message-
> From: Matthew Vita [mailto:matthewvit...@gmail.com] 
> Sent: Tuesday, December 05, 2017 1:58 PM
> To: dev@ctakes.apache.org
> Subject: Re: cTAKES as REST service [EXTERNAL]
> 
> 
> Hi Gandhi, Sean, Tim, Alex, James,
> 
> I'm still getting back into the swing of things after my trip (I'm on
> business traveling at the moment, here in the states). I will be
> jumping right back into cTAKES REST development next week personally
> and with a new team mate from the open source team.
> 
> I'm so sorry for my silence/lack of updates!!! Very excited to see
> what Gandhi's updates are looking like and enriching the JSON
> response payload.
> 
> Thanks,
> 
> Matthew Vita
> www.matthewvita.com
> 
> On Tue, Dec 5, 2017 at 10:24 AM, Gandhi Rajan Natarajan
> <Gandhi.Natara...@arisglobal.com> wrote:
> 
> > 
> > Could someone help me out on the resources cleanup at least, if not
> > the review?
> > 
> > Regards,
> > Gandhi
> > 
> > 
> > -Original Message-
> > From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> > Sent: Monday, December 04, 2017 10:05 PM
> > To: dev@ctakes.apache.org
> > Subject: RE: cTAKES as REST service [EXTERNAL]
> > 
> > Hi Sean, Tim, Alex, Matthew, James and All,
> > 
> > I have placed the first cut version of cTAKES REST module in the
> > following path -
> > https://github.com/GoTeamEpsilon/ctakes-rest-service/tree/master/ctakes-web-rest/
> > 
> > Things pending in the module:
> > 1) Index Page to test the rest module using AJAX call
> > 2) Revamping the final output XML
> > 
> > Request you all to have a look at this module and provide your
> > feedback. I would also require expert advice to clean up the
> > resources folder -
> > https://github.com/GoTeamEpsilon/ctakes-rest-service/tree/master/ctakes-web-rest/src/main/resources/org
> > 
> > This module can be deployed as a web-app in Tomcat using the generated
> > WAR file. It can be tested using any REST client (like Chrome's
> > Postman app) by accessing the following URL -
> > http://:/ctakes-web-rest/service/analyze
> > and providing the analysis text as request body.
> > 
> > Sample input : "Patient has cancer and nausea. Earlier he has been 
> > deducted for red eye."
> > Sample output:
> > {
> >   "DrugChangeStatusAnnotation": [],
> >   "StrengthAnnotation": [],
> >   "FractionStrengthAnnotation": [],
> >   "FrequencyUnitAnnotation": [],
> >   "CompanyAnnotation": [],
> >   "DiseaseDisorderMention": [
> >     "CANCER"
> >   ],
> >   "SignSymptomMention": [
> >     "RED EYE",
> >     "NAUSEA"
> >   ],
> >   "RouteAnnotation": [],
> >   "DateAnnotation": [],
> >   "MeasurementAnnotation": [],
> >   "ProcedureMention": [],
> >   "TimeMention": [],
> >   "StrengthUnitAnnotation": []
> > }
> > 
> > Regards,
> > Gandhi
> > 
> > -Original Message-
> > From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> > Sent: Sunday, November 19, 2017 1:45 PM
> > To: dev@ctakes.apache.org
> > Subject: RE: cTAKES as REST service [EXTERNAL]
> > 
> > Hi All,
> > 
> > Have completed cTAKES Spring upgrade changes and checked in the
> > same 
> > to SVN. Please revert in case of any issues.
> > 
> > @Alex, Thanks a lot for taking time out and providing your review 
> > comments on Spring upgrade. Really appreciate it.
> > 
> > Now it will ease our effort in creating ctakes rest module.
> > 
> > Regards,
> > Gandhi
> > 
> > 
> > -Original Message-
> > From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> > Sent: Sunday, November 19, 2017 4:20 AM
> > To: dev@ctakes.apache.org
> > Subject: RE: cTAKES as REST service [EXTERNAL]
> > 
> > Hi,
> > 
> > I have attached the patch file for cTAKES Spring upgrade in
> > https://issues.apache.org/jira/browse/CTAKES-472

Re: polarity tag in output for mention/concept. [EXTERNAL] [SUSPICIOUS]

2017-11-28 Thread Miller, Timothy

I'll just point out -- the kind of examples Kathy gave were the bane of
our existence while working on the ML-based assertion system. Even
though it is obvious what is going on to a human it was hard to encode
as a feature in a way that was learnable. But I think most rule-based
algorithms will also run into problems with this type of example
eventually if they have a hard-coded scoping mechanism (e.g., scope
extends up to 10 words to the right). If you make it larger then you
may increase the number of false positives your algorithm finds
(confusingly, here a false positive is an example the algorithm calls
negated that is not actually being negated).
Tim
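
The trade-off described above can be seen with a toy fixed-window rule. This is only a minimal sketch under stated assumptions: the trigger list, the tokenization and the `negatedTokens` method are made up for illustration and are not the cTAKES ContextAnnotator or the ML-based assertion module. A small right-hand scope misses the end of a "denies" list, while a large one spills into the next sentence and marks an asserted finding as negated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy fixed-window negation rule; not the cTAKES implementation. */
final class NegationScopeSketch {

    /** Hypothetical trigger list, for illustration only. */
    private static final List<String> TRIGGERS = Arrays.asList("denies", "denied", "no");

    /** Returns every token within maxRightScope tokens to the right of a trigger. */
    static List<String> negatedTokens(String sentence, int maxRightScope) {
        // Naive tokenization: lower-case, split on whitespace, commas and periods.
        String[] tokens = sentence.toLowerCase().split("[\\s,.]+");
        List<String> negated = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (!TRIGGERS.contains(tokens[i])) {
                continue;
            }
            // The scope is a hard-coded window that ignores sentence boundaries.
            int end = Math.min(tokens.length, i + 1 + maxRightScope);
            for (int j = i + 1; j < end; j++) {
                negated.add(tokens[j]);
            }
        }
        return negated;
    }

    public static void main(String[] args) {
        String text = "denies fatigue, malaise, fever, weight loss. Reports cough";
        // Small window: "weight" and "loss" fall outside the scope (missed negation).
        System.out.println(negatedTokens(text, 3));
        // Large window: "cough" is swept in although it is asserted (false positive).
        System.out.println(negatedTokens(text, 10));
    }
}
```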


On Tue, 2017-11-28 at 17:22 +, Finan, Sean wrote:
> Hi Kathy,
> 
> I am glad that you checked the wiki!  I should have pointed to it ...
> 
> In the example I sent the "relevant distance" between trigger terms
> and events would be 10.  There isn't any maximum as far as I know,
> but I think that 10 is the most that I've ever used.  The default is
> 7, and you can try with that (remove "*=*") before increasing the
> number(s).
> 
> The piper files aren't source code, they are just plain text and
> don't require compiling, etc.  How are you running the pipeline right
> now?  From a binary with a bin/run* script?
> 
> Sean
> 
> 
> -Original Message-
> From: Kathy Ferro [mailto:healthcare1...@gmail.com] 
> Sent: Tuesday, November 28, 2017 12:11 PM
> To: dev@ctakes.apache.org
> Subject: Re: polarity tag in output for mention/concept. [EXTERNAL]
> 
> Sean,
> 
> Thank you for information.
> 
> I was reading the document.  So, are MaxLeftScopeSize and
> MaxRightScopeSize limited to 10?  Is there any way to adjust them
> without modifying the source code?
> 
> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+NE+Contexts
> 
> 
> Thanks again,
> Kathy
> 
> 
> 
> On Tue, Nov 28, 2017 at 9:31 AM, Finan, Sean
> <Sean.Finan@childrens.harvard.edu> wrote:
> 
> > 
> > Hi Kathy,
> > 
> > The negation annotator used in the default clinical pipeline is based
> > upon machine learning and trained on real data.  It is possible that
> > such "denies" lists were underrepresented in the training data.  One
> > thing that you can try is adding another negation annotator.  The
> > ContextAnnotator in ctakes-ne-contexts will add negation to terms
> > without removing existing negation.  It also has configurable
> > scope/distance that may be helpful.
> > 
> > To use this, create a new piper file containing the two lines
> > 
> > load DefaultFastPipeline
> > add ContextAnnotator MaxLeftScopeSize=10 MaxRightScopeSize=10
> > 
> > The default scope sizes are 7, but increasing the MaxRight* might
> > help with your "denies" discoveries.  7 might be ok for the left, so
> > feel free to remove "MaxLeftScopeSize=10" from the line.
> > 
> > Then run your piper file (command line, gui, maven profile, etc.)
> > https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
> > 
> > Sean
> > 
> > -Original Message-
> > From: Kathy Ferro [mailto:healthcare1...@gmail.com]
> > Sent: Monday, November 27, 2017 8:10 PM
> > To: dev@ctakes.apache.org
> > Subject: polarity tag in output for mention/concept. [EXTERNAL]
> > 
> > Good evening,
> > 
> > I ran a few sentences through default clinical pipeline.
> > 
> > It is really reliable if there is only one term after the negation, but
> > I get inconsistent polarity values for a list of terms.  Please see the
> > examples below.
> > 
> > 1.   denies fatigue, malaise, fever, weight loss
> > SignSymptomMention:
> > polarity = -1: fatigue, malaise, fever
> > polarity = 1: weight loss.
> > Why does weight loss get singled out?
> > 
> > 2.   denies ear pain or discharge, nasal obstruction or discharge, sore
> > throat
> > polarity = -1: ear pain or discharge
> > polarity = 1: nasal obstruction or discharge, obstruction, sore throat
> > It doesn't even acknowledge the list.
> > 
> > 3.   denies back pain, joint swelling, joint stiffness, joint pain
> > polarity = -1: back pain, Swelling
> > polarity = 1: Joint swelling, Stiffness, pain
> > What! This totally messes up the pattern.
> > 
> > 4.   denied back pain, joint swelling, joint stiffness, joint pain
> > OK, maybe it doesn't like the word "denies"; I changed it to "denied",
> > "deny", etc.
> > polarity = -1: Swelling
> > everything else is 1.
> > 
> > 
> > My question is:
> > How do I handle the negative claims in the document?

Re: Contribute to ctakes: it is in your best interests! RE: unknown dependencies [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2017-11-21 Thread Miller, Timothy

> > 7.8M  ./ctakes-pos-tagger-res/src/main/resources/org/apache/ctakes/postagger/models/clearnlp/mayo-en-pos-1.3.0.jar
> > 4.0K  ./ctakes-coreference-res/src/main/resources/org/apache/ctakes/coreference/models/mention-cluster/model.jar
> > 1.5M  ./ctakes-core-res/src/main/resources/org/apache/ctakes/core/sentdetect/model.jar
> > 504K  ./ctakes-assertion-res/src/main/resources/org/apache/ctakes/assertion/models/subject/model.jar
> > 588K  ./ctakes-assertion-res/src/main/resources/org/apache/ctakes/assertion/models/historyOf/model.jar
> > 332K  ./ctakes-assertion-res/src/main/resources/org/apache/ctakes/assertion/models/uncertainty/model.jar
> > 740K  ./ctakes-assertion-res/src/main/resources/org/apache/ctakes/assertion/models/conditional/model.jar
> > 592K  ./ctakes-assertion-res/src/main/resources/org/apache/ctakes/assertion/models/polarity/sharpi2b2mipacqnegex/model.jar
> > 572K  ./ctakes-assertion-res/src/main/resources/org/apache/ctakes/assertion/models/generic/model.jar
> > 1.5M  ./ctakes-assertion-res/resources/model/sharpi2b2mipacqnegex/polarity/model.jar
> > 312K  ./ctakes-dependency-parser-res/src/main/resources/org/apache/ctakes/dependency/parser/models/lemmatizer/dictionary-1.3.1.jar
> > 228M  ./ctakes-dependency-parser-res/src/main/resources/org/apache/ctakes/dependency/parser/models/clearparser_models.jar
> > 5.8M  ./ctakes-dependency-parser-res/src/main/resources/org/apache/ctakes/dependency/parser/models/srl/mayo-en-srl-1.3.0.jar
> > 452K  ./ctakes-dependency-parser-res/src/main/resources/org/apache/ctakes/dependency/parser/models/pred/mayo-en-pred-1.3.0.jar
> > 1.2M  ./ctakes-dependency-parser-res/src/main/resources/org/apache/ctakes/dependency/parser/models/role/mayo-en-role-1.3.0.jar
> > 25M   ./ctakes-dependency-parser-res/src/main/resources/org/apache/ctakes/dependency/parser/models/dependency/mayo-en-dep-1.3.0.jar
> > 688K  ./ctakes-relation-extractor-res/src/main/resources/org/apache/ctakes/relationextractor/models/location_of/model.jar
> > 488K  ./ctakes-relation-extractor-res/src/main/resources/org/apache/ctakes/relationextractor/models/degree_of/model.jar
> > 300K  ./ctakes-relation-extractor-res/src/main/resources/org/apache/ctakes/relationextractor/models/modifier_extractor/model.jar
> > 
> > 282M  total
> > 
> > or
> > 
> > $ find ./ -type f -size +5M | grep -v "\.jar" | grep -v "\.svn" | grep -v "\.git" | xargs du -hsc
> > 9.2M  ./ctakes-coreference-res/src/main/resources/org/apache/ctakes/coreference/models/index_med_5k/_3.prx
> > 20M   ./ctakes-coreference-res/src/main/resources/org/apache/ctakes/coreference/models/index_med_5k/_3.tvf
> > 6.9M  ./ctakes-coreference-res/src/main/resources/org/apache/ctakes/coreference/pref_probs.txt
> > 13M   ./ctakes-chunker-res/src/main/resources/org/apache/ctakes/chunker/models/chunker-model.zip
> > 6.4M  ./ctakes-constituency-parser-res/src/main/resources/org/apache/ctakes/constituency/parser/models/thyme.bin
> > 15M   ./ctakes-constituency-parser-res/src/main/resources/org/apache/ctakes/constituency/parser/models/sharpacq-3.1.bin
> > 12M   ./ctakes-constituency-parser-res/src/main/resources/org/apache/ctakes/constituency/parser/models/sharpacq-1.5.bin
> > 84M   ./resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab.script
> > 11M   ./ctakes-assertion-res/src/main/resources/org/apache/ctakes/assertion/models/pos.model
> > 38M   ./ctakes-assertion-res/resources/model/sharpi2b2mipacqnegex/polarity/training-data.liblinear
> > 9.6M  ./ctakes-temporal/src/main/resources/org/apache/ctakes/temporal/thyme_word2vec_mapped_50.vec
> > 91M   ./ctakes-temporal/src/main/resources/org/apache/ctakes/temporal/gloveresult_3
> > 67M   ./ctakes-temporal/src/main/resources/org/apache/ctakes/temporal/mimic_vectors.txt
> > 
> > 378M  total
> > 
> > Are all these resources still relevant? Is there a way to generate
> > them?
> > 
> > I do not wish to open the Pandora box though, Alex

Re: Contribute to ctakes: it is in your best interests! RE: unknown dependencies [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2017-11-20 Thread Miller, Timothy

t I think that those
> 4 tasks can be performed by anybody of any experience level.   They build
> upon each other and should help the implementers better understand ctakes.
> After that the sky is the limit.
>
> A couple of years ago I sat on a panel at a workshop for open source
> scientific software.  For the half dozen or so highlighted projects
> (ctakes was one!) the common thread was that getting people to
> contribute is extremely difficult.
> I have a tendency to assume that people always act in their best
> interests.  Any student thinking of going towards industry should be
> jumping at the opportunity to contribute to a large,
> production-quality project.  They should also realize that
> contribution means potential recommendation (and possibly hiring
> interest) by established developers, physicians and researchers that
> use ctakes.  Even just answering questions on a user or dev list creates 
> credibility and can build a network.
> Active researchers could discover common thoughts and directions that
> could lead to collaboration outside ctakes.  Researchers and companies
> trying to build upon open source should realize that direct
> contribution is easier than custom substitution.  Plus, it is in their
> best interests that code does what they need it to do in the fastest,
> lightest, most stable way possible.
> With a project like ctakes there are a lot of things that can be done,
> there are great opportunities to really shine.  "I wrote this tool for
> my thesis that performs some nlp task" sounds good.  Appending "in an
> Apache product and it has been taken up by thousands across the globe"
> makes it sound a lot better.
> At my previous job in industry the company actively contributed to
> several open source projects.  We had a few people for whom that was
> 50% of their job.  Why?  Because we made a commitment to use that open source 
> software.
> It was a better use of our resources to contribute to it, improve it
> and keep its momentum going and prevent it from becoming stale (or
> abandoned) while our software continued to move forward.
>
> Hmm, that was a touch more than I had planned to write.  A whole cup
> of coffee in that one.
>
> Sean
>
>
>
>
> -Original Message-
> From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
> Sent: Saturday, November 18, 2017 8:13 AM
> To: dev@ctakes.apache.org
> Subject: Re: unknown dependencies [EXTERNAL] [SUSPICIOUS]
>
> Thanks Alex, looks like that was probably a fat-fingered auto-import
> on my part.
>
> I like your idea, and I don't know the best way to start either,
> but maybe one suggestion is to start with one or two focused things to
> clean up, and then ask for volunteers to take on specific modules?
> Then people can contribute an hour here and there to do cleanup on
> their task/module and try to fix that thing in a 1-2-month long
> sprint. I am happy to contribute to cleanup, I am responsible for my
> fair share of unclean code, but since I don't have strong software
> engineering chops it would be good to have people with that background
> propose the tasks and describe exactly what needs to be done. My idea
> of cleaning is just to delete commented out sections of evaluation code.
>
> Tim
>
> 
> From: Alexandru Zbarcea <al...@apache.org>
> Sent: Friday, November 17, 2017 4:46 PM
> To: Apache cTAKES Dev
> Subject: unknown dependencies [EXTERNAL]
>
> Hi,
>
> I noticed that a mistaken dependency has slipped into the code:
> jdk.internal.org.objectweb.asm.commons.AnalyzerAdapter;
>
> Now that the Jenkins build is successful, I think it is easier to
> clean up the code. I would like it to be a common effort. I don't know
> the best way to approach this.
>
> Looking forward to your advice,
> Alex
>

Re: source code of user installation of cTakes. [EXTERNAL] [SUSPICIOUS]

2017-11-14 Thread Miller, Timothy

) method. This has
> > > > given results like POS tag, Polarity, etc.
> > > > Now, I am more interested in finding Procedure, Medication, Drug,
> > > > etc.
> > > > Could you please point me to the code file or help with code
> > > > snippet to capture above terms.
> > > >
> > > >
> > > >
> > > > On 30 October 2017 at 19:36, Finan, Sean
> > > > <sean.fi...@childrens.harvard.edu>
> > > > wrote:
> > > >
> > > > > Hi Bhagwat,
> > > > >
> > > > > If you are interested in the default clinical pipeline, you
> > > > > can look at the wiki here:
> > > > > https://cwiki.apache.org/confluence/display/CTAKES/Default+Clinical+Pipeline
> > > > > For a visual representation of what Tim described.
> > > > >
> > > > > The AEs used for the ctakes 4.0 default clinical pipeline are
> > > > > shown at the bottom of this wiki page:
> > > > > > https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
> > > > > The Class names are shown, but not the packages.  If you have
> > > > > a decent IDE they should be easy to find - for Intellij press
> > > > > CTRL-N and type the name of the class.
> > > > >
> > > > > Another option is to use the Simple Pipeline Fabricator gui to
> > > > > look at the available readers and AEs and see what they do
> > > > > (and their required inputs).  Check the wiki at:
> > > > > > https://cwiki.apache.org/confluence/display/CTAKES/Simple+Pipeline+Fabricator+GUI
> > > > > If you launch the gui and let it gather information, you can
> > > > > look at the pipe bit names and descriptions (reader, AE).  If
> > > > > it interests you, click the "add" button (big '+') and on the
> > > > > right you will see the path to the source code for that bit of
> > > > > > the pipeline.  Not all AEs are described ...
> > > > > > calling all community ...  but I think that most are.
> > > > >
> > > > > Sean
> > > > >
> > > > >
> > > > > -Original Message-
> > > > > From: Miller, Timothy
> > > > > > [mailto:timothy.mil...@childrens.harvard.edu]
> > > > > Sent: Monday, October 30, 2017 9:48 AM
> > > > > > To: dev@ctakes.apache.org
> > > > > Subject: Re: source code of user installation of cTakes.
> > > > > [EXTERNAL] [SUSPICIOUS]
> > > > >
> > > > > cTAKES is based on Apache UIMA, which is a pipeline-building tool.
> > > > > So the output you see in the CVD is the result of many
> > > > > different pieces of the pipeline run in

Re: source code of user installation of cTakes. [EXTERNAL] [SUSPICIOUS]

2017-11-08 Thread Miller, Timothy

scribed.
> > >
> > > The AEs used for the ctakes 4.0 default clinical pipeline are shown
> > > at the bottom of this wiki page: 
> > > https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files
> > > The Class names are shown, but not the packages.  If you have a
> > > decent IDE they should be easy to find - for Intellij press CTRL-N
> > > and type the name of the class.
> > >
> > > Another option is to use the Simple Pipeline Fabricator gui to look
> > > at the available readers and AEs and see what they do (and their
> > > required inputs).  Check the wiki at: 
> > > https://cwiki.apache.org/confluence/display/CTAKES/Simple+Pipeline+Fabricator+GUI
> > > If you launch the gui and let it gather information, you can look at
> > > the pipe bit names and descriptions (reader, AE).  If it interests
> > > you, click the "add" button (big '+') and on the right you will see
> > > the path to the source code for that bit of the pipeline.  Not all
> > > AEs are described ...
> > > calling all community ...  but I think that most are.
> > >
> > > Sean
> > >
> > >
> > > -Original Message-
> > > From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
> > > Sent: Monday, October 30, 2017 9:48 AM
> > > To: dev@ctakes.apache.org
> > > Subject: Re: source code of user installation of cTakes. [EXTERNAL]
> > > [SUSPICIOUS]
> > >
> > > cTAKES is based on Apache UIMA, which is a pipeline-building tool.
> > > So the output you see in the CVD is the result of many different
> > > pieces of the pipeline run in succession, and they are each in
> > > different modules of cTAKES. ctakes-core has the most basic elements
> > > that will run for every pipeline -- tokens, sentences, etc.
> > > ctakes-dictionary-lookup-fast is what maps text spans to UMLS concepts.
> > > ctakes-assertion finds negation status.
> > > ctakes-chunker creates syntactic chunks and ctakes-pos-tagger finds
> > > part-of-speech tags for tokens. There are many others but I think
> > > this covers the basics. In general, if you see a type in the CVD
> > > that you find interesting, your best bet is to grep the code for
> > > that type and see where it is being created (if you don't want to
> > > wait for an email from the list).
> > > Pipeline components are known as "Analysis Engines" (AEs) in UIMA
> > > lingo and as a result are often in a package ending in .ae.
> > > Hope this helps you navigate the code!
> > > Tim
> > >
> > > 
> > > From: Bhagwat Posane <bhagwat.pos...@gmail.com>
> > > Sent: Monday, October 30, 2017 7:24 AM
> > > To: dev@ctakes.apache.org
> > > Subject: Re: source code of user installation of cTakes. [EXTERNAL]
> > >
> > > Thanks Gandhi, for the quick response.
> > >
> > > I have source code of cTAKES which is available under
> > > https://svn.apache.org/repos/asf/ctakes/trunk
> > > I see there are many projects in it.
> > >
> > > I am checking the user version using \CTAKES_HOME\bin\runctakesCVD.bat,
> > > which opens a UI. I could run the analysis engine on a clinical note
> > > according to the guidelines in the user-install guide.
> > > It gives me a decent result in the left pane of the UI.
> > > Now I am looking for the source code that produces this result for a
> > > clinical note. Could you please point me to the project where I can
> > > find it in ctakes-trunk?
> > >
> > >
> > >
> > > On 30 October 2017 at 16:36, Gandhi Rajan Natarajan <
> > > gandhi.natara...@
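
Tim's advice in the quoted message above (grep the code for a type and see where it is being created) can be sketched as a small script. This is only an illustration under stated assumptions: the function name find_type_usages, the demo tree, and the type name ExampleMention are all made up here and are not part of cTAKES; a plain recursive grep over a checkout does the same job.

```python
import tempfile
from pathlib import Path


def find_type_usages(source_root, type_name, suffix=".java"):
    """Return (path, line_number, line) for every source line mentioning type_name."""
    hits = []
    for path in sorted(Path(source_root).rglob(f"*{suffix}")):
        # errors="replace" keeps the scan going on files with odd encodings.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if type_name in line:
                hits.append((str(path), number, line.strip()))
    return hits


def demo():
    # Build a tiny throwaway tree so the sketch runs without a cTAKES checkout.
    # The module, class, and type names below are hypothetical.
    with tempfile.TemporaryDirectory() as root:
        ae_dir = Path(root) / "example-module" / "ae"
        ae_dir.mkdir(parents=True)
        (ae_dir / "ExampleAnnotator.java").write_text(
            "class ExampleAnnotator {\n"
            "    Object make() { return new ExampleMention(); }\n"
            "}\n",
            encoding="utf-8",
        )
        (ae_dir / "Other.java").write_text("class Other {}\n", encoding="utf-8")
        hits = find_type_usages(root, "new ExampleMention")
        return [(Path(path).name, number) for path, number, _ in hits]


if __name__ == "__main__":
    print(demo())
```

Searching for "new SomeType" rather than just the type name narrows the hits to places where the annotation is created, which is usually the Analysis Engine of interest.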

Re: cTAKES as REST service [EXTERNAL]

2017-10-29 Thread Miller, Timothy

Sounds great, Matthew and Gandhi, thanks for sharing your solution.
Tim

From: Matthew Vita 
Sent: Sunday, October 29, 2017 11:59 AM
To: dev@ctakes.apache.org
Subject: Re: cTAKES as REST service [EXTERNAL]

Sean,

Gandhi and I have met and we both agreed that his solution is superior to
the one I was working on. Therefore, I will be helping to see this project
through to the end so we can get it into the codebase!

Here are the remaining work items that I will be spending time on:

   1. Get it running (I'm using Linux Mint)
   2. Test it out (including stress tests)
   3. Automate it to run in Docker (just need UMLS credentials)
   4. Make a call to
   https://github.com/GoTeamEpsilon/cTAKES-Concept-Mention-Parser to get a
   nice JSON payload that is easy to traverse (this can be an optional switch,
   of course - I believe it may be best to rewrite this in Java should this be
   included with the solution)
   5. Test the output in my web viewer:
   https://github.com/GoTeamEpsilon/cTAKES-Friendly-Web-UI
   6. Work on preparing the solution for the cTAKES core codebase. I will
   prepare it with a very rich README.

I will provide my updates over the coming days.

Thanks,

Matthew Vita
www.matthewvita.com

On Sun, Oct 29, 2017 at 7:47 AM, Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Gandhi,
>
> Thank you for the additional information.  Having a reliable rest service
> included with ctakes would be a boon for everybody interested in web
> access.  I look forward to checking out the info in github as soon as I am
> able.
>
> Thanks to you and Matthew both!
>
> Sean
>
>
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> Sent: Sunday, October 29, 2017 5:44 AM
> To: dev@ctakes.apache.org
> Subject: RE: cTAKES as REST service [EXTERNAL]
>
> Hi Sean,
>
> I feel it's better to upgrade the cTAKES Spring version to 4.x so that exposing
> it as a rest service becomes seamless. Here is the github link that
> contains the proposed changes for the Spring upgrade in cTAKES:
>
> https://github.com/gandhirajan/cTAKES/tree/master/SpringUpgrade/ctakes-SVN-src
>
> I have not tested the changes in ytex modules as I'm not sure how to go
> about that.
>
> Matthew Vita will be reviewing the changes. He is also reviewing and
> testing my rest service changes. He will provide more info once we
> are done with our testing, so that we can discuss productizing the
> changes.
>
> Regards,
> Gandhi
>
>
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Friday, October 27, 2017 12:53 AM
> To: dev@ctakes.apache.org
> Subject: RE: cTAKES as REST service [EXTERNAL]
>
> Hi Gandhi,
>
> That sounds really great!  Thank you for sharing the process!
>
> Sean
>
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> Sent: Thursday, October 26, 2017 3:02 PM
> To: dev@ctakes.apache.org
> Subject: RE: cTAKES as REST service [EXTERNAL]
>
> Hi Sean,
>
> I'm glad to report that I was able to upgrade cTAKES to Spring 4 in my
> sandbox. As you mentioned, it is used by uima fit for firing some
> queries.
>
> In brief, I made the following changes:
>
> 1) Changing SimpleJdbcTemplate to JdbcTemplate in uima modules
> 2) Changing Spring version in cTAKES root pom.xml
> 3) Adding Spring versions in ctakes type system, ctakes assertion, ctakes
> ytex and ctakes ytex web modules.
>
> Now I'm able to expose cTAKES as a rest service which takes the clinical
> text as Input and outputs the result.
>
>  Hope it helps someone.
>
> Regards,
> Gandhi
>
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> Sent: Wednesday, October 25, 2017 7:33 PM
> To: dev@ctakes.apache.org
> Subject: RE: cTAKES as REST service [EXTERNAL]
>
> Hi Sean,
>
> Thanks for the instant response. Will try to upgrade to Spring 4 and keep
> you posted about the progress.
>
> Regards,
> Gandhi
>
>
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Wednesday, October 25, 2017 7:28 PM
> To: dev@ctakes.apache.org
>

Re: CAS Visual Debugger - [EXTERNAL]

2017-10-25 Thread Miller, Timothy

I've had the same thought, and come to the same conclusions.
Tim

From: Melvin Ma 
Sent: Wednesday, October 25, 2017 1:33 PM
To: dev@ctakes.apache.org
Subject: CAS Visual Debugger - [EXTERNAL]

This is more of a question. I am fully aware that CAS Visual Debugger is
maintained in UIMA project.

For now, I frequently need to use CVD to view .xmi files. It
would be really nice if I could pass the type system xml as a CVD
startup argument (instead of manually looking up this file and loading it).
Do you know of any way to do it? I checked the documentation multiple times
and was not able to find anything.

Thanks.

Melvin

Re: Missing resources for script that extracts markables from a corpus for analysis [EXTERNAL]

2017-10-04 Thread Miller, Timothy

I had in mind the notes in:
/ctakes-examples-res/src/main/resources/org/apache/ctakes/examples/notes/rtf

which I believe are the fake notes Dr. John Green wrote for us. I don't know 
why they are rtf but they are nice, non-toy-length notes.
Tim


From: Alexandru Zbarcea <al...@apache.org>
Sent: Tuesday, October 3, 2017 5:32 PM
To: Apache cTAKES Dev
Subject: Re: Missing resources for script that extracts markables from a corpus 
for analysis [EXTERNAL]

Hi Tim,

That's great news. If you think there are sample notes that can be used, I
can start working on the Lucene index and slowly build the UTest for them.

I have created CTAKES-462[1] where we can track this work.

Looking into the ctakes-examples-res, what I can find is:
$ find . -type f | grep -v "\.class" | grep -v "\.iml" | grep -v "\.jar" |
grep -v "\.rtf" | grep -v "\.xml" | grep -v "\.bsv" | grep -v "\.piper"
./main/resources/org/apache/ctakes/examples/notes/pain_no_swelling.txt
./main/resources/org/apache/ctakes/examples/notes/claudication
./main/resources/org/apache/ctakes/examples/notes/shark_bite.txt
./main/resources/org/apache/ctakes/examples/notes/edge_cases_plaintext_1.txt
./main/resources/org/apache/ctakes/examples/notes/dr_nutritious_1.txt
./main/resources/org/apache/ctakes/examples/notes/right_knee_arthroscopy
./main/resources/org/apache/ctakes/examples/notes/SampleInputRadiologyNotes.txt
./main/resources/org/apache/ctakes/examples/notes/smoker/doc1_07543210_sample_past_smoker.txt
./main/resources/org/apache/ctakes/examples/notes/smoker/doc2_07543210_sample_past_smoker.txt
./main/resources/org/apache/ctakes/examples/notes/smoker/doc2_07543210_sample_current.txt
./main/resources/org/apache/ctakes/examples/notes/smoker/doc1_07543210_sample_unknown.txt
./main/resources/org/apache/ctakes/examples/notes/smoker/doc1_07543210_sample_current.txt
./main/resources/org/apache/ctakes/examples/notes/mother_goose/README
./main/resources/org/apache/ctakes/examples/notes/mother_goose/OneMistyMoistyMorning.txt
./main/resources/org/apache/ctakes/examples/notes/dr_nutritious_2.txt
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/Peds_RoutBirthNote_1/Peds_RoutBirthNote_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/VascSurg_AAA_Leak_1/VascSurg_AAA_Leak_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/Peds_Dysphagia_1/Peds_Dysphagia_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/OBGYN_LaborProgressNote_1/OBGYN_LaborProgressNote_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/OBGYN_IUD_1/OBGYN_IUD_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/OBGYN_HysterectomyAndBSO_1/OBGYN_HysterectomyAndBSO_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/VascSurg_FollowUp_1/VascSurg_FollowUp_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/OBGYN_PROMCheck_1/OBGYN_PROMCheck_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/OBGYN_Gen_Abscess_1/OBGYN_Gen_Abscess_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/Peds_FebrileSez_1/Peds_FebrileSez_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/VascSurg_RO_AAA_1/VascSurg_RO_AAA_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/VascSurg_RO_DVT_1/VascSurg_RO_DVT_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/GenSurg_UmbilicalHernia_1/GenSurg_UmbilicalHernia_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/VascSurg_PVD_1/VascSurg_PVD_1
./main/resources/org/apache/ctakes/examples/annotation/anafora_annotated/OBGYN_MVAPrego_1/OBGYN_MVAPrego_1
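
The same filtering can be sketched in Python for anyone who wants to feed the resulting list into a test fixture. This is a rough equivalent, not a drop-in replacement: the names EXCLUDED_SUFFIXES and list_candidate_notes are made up for this sketch, and it checks only the file suffix, whereas the grep -v chain above drops any path that contains one of those strings anywhere.

```python
import tempfile
from pathlib import Path

# Suffixes taken from the grep -v chain above.
EXCLUDED_SUFFIXES = frozenset(
    {".class", ".iml", ".jar", ".rtf", ".xml", ".bsv", ".piper"}
)


def list_candidate_notes(root, excluded=EXCLUDED_SUFFIXES):
    """Return sorted relative paths of files whose suffix is not excluded."""
    root = Path(root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.suffix not in excluded
    )


def demo():
    # Throwaway tree with placeholder files; the layout is illustrative only.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel in (
            "notes/shark_bite.txt",
            "notes/claudication",
            "notes/rtf/sample.rtf",
            "pipeline/Default.piper",
        ):
            target = root / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text("placeholder\n", encoding="utf-8")
        return list_candidate_notes(root)


if __name__ == "__main__":
    print(demo())
```

Files without any extension (such as notes/claudication) are kept, which matches the listing above.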

Which notes do you think I should start with (all of them)?

Alex

[1] - https://issues.apache.org/jira/browse/CTAKES-462


On Mon, Oct 2, 2017 at 6:46 PM, Miller, Timothy <Timothy.Miller@childrens.
harvard.edu> wrote:

> Yeah, it might be nice to build a lucene index of all the sample notes in
> the ctakes-example module. I'll create a jira for it but probably won't be
> able to get to it right away.
> Tim
>
> 
> From: Alexandru Zbarcea <al...@apache.org>
> Sent: Monday, October 2, 2017 5:31 PM
> To: Apache cTAKES Dev
> Subject: Re: Missing resources for script that extracts markables from a
> corpus for analysis [EXTERNAL]
>
> Hi Tim,
>
> I understand, makes sense. Is it possible to anonymize the data you have or
> come up with a separate body

Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS] [SUSPICIOUS]

2017-10-03 Thread Miller, Timothy

lization for a right atrial clot in 03/02 hepatocellular
> carcinoma was first noted and he was referred to an oncologist.  The
> patient started study treatment of Thalomid 200mg (days 1-21), and
> Epirubicin, 20 mg/m2 (days 1, 8, and 15) on 06/07/02 for the
> treatment of hepatocellular carcinoma.  He was concomitantly
> receiving Cardura, Ambien (for insomnia), Megace, Coumadin, and
> Oxycodone. This patient presented to the emergency room with the
> chief complaint of hematochezia. He reported noticing bright red
> blood and small clots mixed in with his stool. On 07/13/02, he was
> admitted due to gastrointestinal bleed.  The physician ordered 2
> large bore intravenous lines and planned to transfuse for hematocrit
> less than 30%. Due to the  INR (international normalized ratio) level
> of 3.0, Coumadin was held. He was also noted to have bilateral lower
> extremity edema with dyspnea on exertion.  On 07/13/02, he had a
> chest X-ray PA and lateral done that showed no evidence of acute
> pneumonia or congestive heart failure.  On 07/14/02, he underwent  an
> ultrasound which was negative for deep vein thrombosis. This patient
> did not take Thalomid on the day of his admittance to the hospital,
> but resumed treatment shortly after with no return of symptoms. On
> 07/15/02, he was discharged in stable condition. There have been no
> further reports of bleeding at this time. Thedoctor has assessed the
> hematochezia as related to Coumadin treatment and previously
> diagnosed diverticulosis, and not to protocol therapy with Thalomid
> and Epirubicin.Additional information received from the investigator
> on 27Aug02 reveals that this male patient began on 07Jun02 two cycles
> of therapy with Thalidomide and Epirubicin.  His post cycle two
> computed tomography scans revealed increase in size of liver lesion
> with development of multiple new satellite nodules.  On 29Jul02, the
> investigator removed this patient from protocol for progressive
> disease and recommended hospice care.  After seeking a second opinion
> from two other institutions, this patient was admitted to hospice on
> 05Aug02.  On 20Aug02, the investigator noted that this patient was
> suffering worsening fatigue and got tired getting out of his
> chair.  On 25Aug02, this patient died due to disease
> progression.  The investigator assessed the death as not related to
> study treatment and expected"
> 
> 
> 
> 
> -Original Message-
> From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
> Sent: Monday, October 02, 2017 10:36 AM
> To: dev@ctakes.apache.org
> Subject: Re: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]
> 
> My bad, I didn't read too closely and thought this was going to be a
> coreference patch. I don't know this FSM code that well, so I am not an
> expert. My biggest concern at a glance: these additions help find more
> true positives (as in your examples), but can we verify that they
> won't create false positives?
> Tim
> 
> 
> 
> 
> 
> On Fri, 2017-09-29 at 06:25 +, Gandhi Rajan Natarajan wrote:
> 
> > Hi Sean,
> >
> > Thanks again for the response. I guess it was a mistake on my side that
> > I didn't send the complete text. Did you mean that with the text I
> > sent, the co-reference superscript-1 will be lost?
> >
> > Also, as per your advice, we have created an issue -
> > https://issues.apache.org/jira/browse/CTAKES-459 - for the
> > measurement FSM changes and attached the modified files. Could
> > someone have a look and let us know your thoughts, please?
> >
> > Regards,
> > Gandhi
> >
> > -Original Message-
> > From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> > Sent: Thursday, September 28, 2017 8:21 PM
>

Re: Missing resources for script that extracts markables from a corpus for analysis [EXTERNAL]

2017-10-02 Thread Miller, Timothy

Yeah, it might be nice to build a lucene index of all the sample notes in the 
ctakes-example module. I'll create a jira for it but probably won't be able to 
get to it right away.
Tim


From: Alexandru Zbarcea <al...@apache.org>
Sent: Monday, October 2, 2017 5:31 PM
To: Apache cTAKES Dev
Subject: Re: Missing resources for script that extracts markables from a corpus 
for analysis [EXTERNAL]

Hi Tim,

I understand, makes sense. Is it possible to anonymize the data you have or
come up with a separate body of test data to generate a Lucene index and
unit test the code? I think this would have the double benefit of the code
being tested and showing dev/users how the code is supposed to be used.

What do you think?

Alex


On Mon, Oct 2, 2017 at 9:45 AM, Miller, Timothy <
timothy.mil...@childrens.harvard.edu> wrote:

> Thanks Alex,
> This code is for processing a clinical text data corpus stored as a
> lucene index -- data that cannot be redistributed for privacy reasons.
> Since it's so related to the coref stuff I thought it should go
> alongside the coreference module. But maybe it makes more sense as an
> external project since it can't really function without externally
> created resources -- what do you think?
> Tim
>
>
> On Sun, 2017-10-01 at 19:54 -0400, Alexandru Zbarcea wrote:
> > Hi,
> >
> > I was trying to write a UTest for
> > org.apache.ctakes.coreference.data.PrintMimicMarkables (recently added),
> > but I couldn't find any existing resource that can be used for
> > this. Can anyone help by pointing me to a resource (Lucene index) folder?
> >
> > org.apache.ctakes.coreference.data.PrintMimicMarkables \
> >
> > /home/alex/projects/apache/ctakes/ctakes-dictionary-lookup-
> > res/target/classes/org/apache/ctakes/dictionary/lookup/rxnorm_index
> > \
> > index.out
> >
> > I was trying with the following lucene folder/resource:
> > ./ctakes-coreference-
> > res/src/main/resources/org/apache/ctakes/coreference/models/index_med
> > _5k
> >
> > And also the dictionaries:
> > ./ctakes-dictionary-lookup-
> > res/src/main/resources/org/apache/ctakes/dictionary/lookup/snomed-
> > like_codes_sample
> > ./ctakes-dictionary-lookup-
> > res/src/main/resources/org/apache/ctakes/dictionary/lookup/assertion_
> > cue_phrase_index
> > ./ctakes-dictionary-lookup-
> > res/src/main/resources/org/apache/ctakes/dictionary/lookup/OrangeBook
> > ./ctakes-dictionary-lookup-
> > res/src/main/resources/org/apache/ctakes/dictionary/lookup/snomed-
> > like_sample
> > ./ctakes-dictionary-lookup-
> > res/src/main/resources/org/apache/ctakes/dictionary/lookup/drug_index
> >
> > Any execution looks like:
> > 01 Oct 2017 19:50:19  INFO ConstituencyParser - Initializing
> > parser...
> > Oct 01, 2017 7:50:20 PM
> > org.apache.uima.collection.impl.cpm.engine.ArtifactProducer process
> > WARNING: Got Exception. (Thread Name: [CollectionReader Thread]::)
> > Message:
> > docID must be >= 0 and < maxDoc=5000 (got docID=5000)
> > Oct 01, 2017 7:50:20 PM
> > org.apache.uima.collection.impl.cpm.engine.ArtifactProducer run(820)
> > WARNING: docID must be >= 0 and < maxDoc=5000 (got docID=5000)
> > java.lang.IllegalArgumentException: docID must be >= 0 and <
> > maxDoc=5000
> > (got docID=5000)
> > at
> > org.apache.lucene.index.BaseCompositeReader.readerIndex(BaseComposite
> > Reader.java:152)
> > at
> > org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeRea
> > der.java:115)
> > at org.apache.lucene.index.IndexReader.document(IndexReader.java:436)
> > at
> > org.apache.ctakes.core.cr.LuceneCollectionReader.getNext(LuceneCollec
> > tionReader.java:90)
> > at
> > org.apache.uima.collection.impl.cpm.engine.ArtifactProducer.readNext(
> > ArtifactProducer.java:494)
> > at
> > org.apache.uima.collection.impl.cpm.engine.ArtifactProducer.run(Artif
> > actProducer.java:711)
> >
> > Collection process complete called, closing file writer.
> >
> > I appreciate any of your help,
> > Alex
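
The message in the log above (docID must be >= 0 and < maxDoc=5000, got docID=5000) reads like a request for a document one position past the end of the index, since valid ids run from 0 to maxDoc - 1. The sketch below only illustrates that bound; it is a guess at the failure mode, not a diagnosis of LuceneCollectionReader. FakeIndexReader and read_all are stand-ins invented here and are not the Lucene API.

```python
class FakeIndexReader:
    """Stand-in for an index reader; it mimics only the bounds check."""

    def __init__(self, documents):
        self._documents = list(documents)

    def max_doc(self):
        # One greater than the largest valid document id.
        return len(self._documents)

    def document(self, doc_id):
        if not 0 <= doc_id < self.max_doc():
            raise ValueError(
                f"docID must be >= 0 and < maxDoc={self.max_doc()} "
                f"(got docID={doc_id})"
            )
        return self._documents[doc_id]


def read_all(reader):
    """Visit every document exactly once, stopping before max_doc()."""
    return [reader.document(doc_id) for doc_id in range(reader.max_doc())]


if __name__ == "__main__":
    reader = FakeIndexReader(["note-0", "note-1", "note-2"])
    print(read_all(reader))
    try:
        reader.document(reader.max_doc())  # one past the end
    except ValueError as error:
        print(error)
```

A loop that tests "index <= max_doc" instead of "index < max_doc", or a has-next check that is off by one, would produce exactly one failing request after the last document.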

Re: CTAKES-460: coreference Test should not be part of main [EXTERNAL]

2017-10-02 Thread Miller, Timothy

Thanks Alex, I've committed this patch.
I unfortunately looked at the wrong tab when typing my commit message
and committed it with the wrong issue number (459).

Tim

On Mon, 2017-10-02 at 08:17 -0400, Alexandru Zbarcea wrote:
> Hi,
> 
> I have refactored a main class that should have been a UTest.
> https://issues.apache.org/jira/browse/CTAKES-460
> 
> This moves the test code from src/main to src/test and also adds
> some refactoring.
> 
> No impact. Can easily be merged.
> 
> Alex

Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS]

2017-10-02 Thread Miller, Timothy

My bad, I didn't read too closely and thought this was going to be a
coreference patch. I don't know this FSM code that well, so I am not an
expert. My biggest concern at a glance: these additions help find more
true positives (as in your examples), but can we verify that they
won't create false positives?
Tim
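
One lightweight way to approach the false-positive question is a small regression list: strings that should be recognized and strings that should not, checked together. The sketch below uses a hypothetical regular expression as a stand-in for the change; the real MeasurementFSM is a Java finite-state machine and is not reproduced here. The sample strings are taken from the note quoted in this thread; DOSE_PATTERN and evaluate are names invented for this sketch.

```python
import re

# Hypothetical stand-in: a number, an optional space, a mass or volume unit,
# and an optional "per" part such as /m2 or /kg.
DOSE_PATTERN = re.compile(
    r"\b\d+(?:\.\d+)?\s?(?:mg|mcg|g|ml)(?:/(?:m2|kg|ml))?\b",
    re.IGNORECASE,
)

# Phrases that should be found (true positives).
SHOULD_MATCH = [
    "Thalomid 200mg (days 1-21)",
    "Epirubicin, 20 mg/m2",
]

# Phrases that must not be found (potential false positives).
SHOULD_NOT_MATCH = [
    "admitted on 07/13/02",
    "days 1, 8, and 15",
    "INR level of 3.0",
]


def evaluate(pattern, positives, negatives):
    """Return (missed positives, unexpected hits on negatives)."""
    missed = [text for text in positives if not pattern.search(text)]
    false_hits = [text for text in negatives if pattern.search(text)]
    return missed, false_hits


if __name__ == "__main__":
    missed, false_hits = evaluate(DOSE_PATTERN, SHOULD_MATCH, SHOULD_NOT_MATCH)
    print("missed:", missed)
    print("false hits:", false_hits)
```

The same idea carries over to a unit test around the real FSM: keep the negative list growing with dates, day ranges, and lab values so that a later change cannot quietly start matching them.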


On Fri, 2017-09-29 at 06:25 +, Gandhi Rajan Natarajan wrote:
> Hi Sean,
> 
> Thanks again for the response. I guess it was a mistake on my side that
> I didn't send the complete text. Did you mean that with the text I
> sent, the co-reference superscript-1 will be lost?
> 
> Also, as per your advice, we have created an issue -
> https://issues.apache.org/jira/browse/CTAKES-459 - for the
> measurement FSM changes and attached the modified files. Could
> someone have a look and let us know your thoughts, please?
> 
> Regards,
> Gandhi
> 
> 
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Thursday, September 28, 2017 8:21 PM
> To: dev@ctakes.apache.org
> Cc: Miller, Timothy <timothy.mil...@childrens.harvard.edu>
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS]
> 
> Hi Gandhi,
> 
> I don't recall you sending me that entire snippet of text.  I think
> that I only had your single example sentence.
> You have discovered one of the quirks of software: "change the data,
> change the result."
> Ctakes is a system with many moving parts.  Things that precede or
> follow your original example sentence will change the evaluation of
> that sentence.
> With the pipeline you are using and the full note, you should see a
> number (mine is 4) next to the first "thalomid" in the original
> example sentence.  If you click that number you should see (to the
> right) 4 instances of "thalomid".
> Tim can correct me here, but maybe the coreference module ranked the
> links between "thalomid" as much higher than the rank between "study
> treatment of thalomid 200mg" and "the treatment of hepatocellular
> carcinoma" and discarded the encapsulating treatment texts from
> markables?  It is probably more complex than that.
> 
> > 
> > we have also made some code changes in MeasurementFSM.java to
> > identify certain measurements like '20 mg/m2' which was not
> > identified out of the box.  Should we send the code changes to you
> > so that you can consider the same to be productized ? Please
> > advise."
> I don't know if you've noticed the recent emails on the dev list
> involving Alexandru Zbarcea.  Alex has been creating or commenting on
> Jira items and attaching code for  fixes and enhancements.  This is a
> widely used process and is fairly easy to follow.   I think that the
> following links are relevant:
> Working with issues:  https://confluence.atlassian.com/jiracoreserver073/working-with-issues-861257307.html
> Creating patches:   https://confluence.atlassian.com/crucible/creating-patch-files-for-pre-commit-reviews-298977458.html
> Attaching files:   https://confluence.atlassian.com/jiracorecloud/attaching-files-and-screenshots-to-issues-765593805.html
> 
> I don't know if you have a jira account and permissions for the
> ctakes project.  An administrator may need to set that up for you.
> 
> Thanks,
> Sean
> 
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
> Sent: Thursday, September 28, 2017 4:09 AM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS]
>

Re: Missing resources for script that extracts markables from a corpus for analysis [EXTERNAL]

2017-10-02 Thread Miller, Timothy

Thanks Alex,
This code is for processing a clinical text data corpus stored as a
lucene index -- data that cannot be redistributed for privacy reasons.
Since it's so related to the coref stuff I thought it should go
alongside the coreference module. But maybe it makes more sense as an
external project since it can't really function without externally
created resources -- what do you think?
Tim


On Sun, 2017-10-01 at 19:54 -0400, Alexandru Zbarcea wrote:
> Hi,
> 
> I was trying to do a UTest for the
> org.apache.ctakes.coreference.data.PrintMimicMarkables (recently
> added),
> but I couldn't find any existing resources that can be used for
> this. Can anyone help by pointing me to a resource (Lucene index)
> folder?
> 
> org.apache.ctakes.coreference.data.PrintMimicMarkables \
> 
> /home/alex/projects/apache/ctakes/ctakes-dictionary-lookup-
> res/target/classes/org/apache/ctakes/dictionary/lookup/rxnorm_index
> \
> index.out
> 
> I was trying with the following lucene folder/resource:
> ./ctakes-coreference-
> res/src/main/resources/org/apache/ctakes/coreference/models/index_med
> _5k
> 
> And also the dictionaries:
> ./ctakes-dictionary-lookup-
> res/src/main/resources/org/apache/ctakes/dictionary/lookup/snomed-
> like_codes_sample
> ./ctakes-dictionary-lookup-
> res/src/main/resources/org/apache/ctakes/dictionary/lookup/assertion_
> cue_phrase_index
> ./ctakes-dictionary-lookup-
> res/src/main/resources/org/apache/ctakes/dictionary/lookup/OrangeBook
> ./ctakes-dictionary-lookup-
> res/src/main/resources/org/apache/ctakes/dictionary/lookup/snomed-
> like_sample
> ./ctakes-dictionary-lookup-
> res/src/main/resources/org/apache/ctakes/dictionary/lookup/drug_index
> 
> Any execution looks like:
> 01 Oct 2017 19:50:19  INFO ConstituencyParser - Initializing
> parser...
> Oct 01, 2017 7:50:20 PM
> org.apache.uima.collection.impl.cpm.engine.ArtifactProducer process
> WARNING: Got Exception. (Thread Name: [CollectionReader Thread]::)
> Message:
> docID must be >= 0 and < maxDoc=5000 (got docID=5000)
> Oct 01, 2017 7:50:20 PM
> org.apache.uima.collection.impl.cpm.engine.ArtifactProducer run(820)
> WARNING: docID must be >= 0 and < maxDoc=5000 (got docID=5000)
> java.lang.IllegalArgumentException: docID must be >= 0 and <
> maxDoc=5000
> (got docID=5000)
> at
> org.apache.lucene.index.BaseCompositeReader.readerIndex(BaseComposite
> Reader.java:152)
> at
> org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeRea
> der.java:115)
> at org.apache.lucene.index.IndexReader.document(IndexReader.java:436)
> at
> org.apache.ctakes.core.cr.LuceneCollectionReader.getNext(LuceneCollec
> tionReader.java:90)
> at
> org.apache.uima.collection.impl.cpm.engine.ArtifactProducer.readNext(
> ArtifactProducer.java:494)
> at
> org.apache.uima.collection.impl.cpm.engine.ArtifactProducer.run(Artif
> actProducer.java:711)
> 
> Collection process complete called, closing file writer.
> 
> I appreciate any of your help,
> Alex
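
A note on the trace quoted above: the message "docID must be >= 0 and < maxDoc=5000 (got docID=5000)" is consistent with a reader asking for one document past the end of the index, since valid docIDs run from 0 to maxDoc - 1. The thread does not confirm the cause, so the following is only a minimal sketch of the bounds check involved. BoundedDocReader and its methods are hypothetical names for illustration and are not the cTAKES LuceneCollectionReader.

```java
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for an index-backed reader; not the cTAKES class. */
final class BoundedDocReader {
    private final List<String> docs; // stands in for the Lucene index
    private int next = 0;            // next docID to read

    BoundedDocReader(List<String> docs) {
        this.docs = docs;
    }

    /** Valid docIDs run from 0 to maxDoc - 1, so stop once next reaches maxDoc. */
    boolean hasNext() {
        return next < docs.size();
    }

    /** Returns the next document, refusing to read past the last valid docID. */
    String getNext() {
        if (!hasNext()) {
            throw new NoSuchElementException(
                "docID " + next + " is out of range, maxDoc=" + docs.size());
        }
        return docs.get(next++);
    }

    /** Reads every document and returns how many were read. */
    static int readAll(BoundedDocReader reader) {
        int count = 0;
        while (reader.hasNext()) {
            reader.getNext();
            count++;
        }
        return count;
    }
}
```

With a 5000-document index, a loop guarded this way stops after docID 4999 instead of requesting docID 5000.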

Re: Enabling drugner pipeline and identifying dates [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2017-09-29 Thread Miller, Timothy

It is a very busy time for me but this is on my todo list. Don't be
afraid to ping in a week or so if you don't hear anything.

Tim

On Fri, 2017-09-29 at 14:04 +, Finan, Sean wrote:
> Hi Gandhi,
> > 
> > Did you mean that with the text I sent, the co-reference
> > superscript-1 will be lost?
> Yes.  Well, to be more clear, the coreference that was resolved as #1
> in your original sentence alone will be lost.  However, there are
> eight or none coreference chains discovered in your full paragraph,
> and one of those will have superscript 1s.
> 
> > 
> > Could someone have a look and know your thoughts please?
> Thank you for creating the jira and the patch.  I am sure that
> somebody will take a look.
> 
> Thanks,
> Sean
> 
> 
> -Original Message-
> From: Gandhi Rajan Natarajan [mailto:gandhi.natara...@arisglobal.com]
>  
> Sent: Friday, September 29, 2017 2:25 AM
> To: dev@ctakes.apache.org
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS]
> 
> Hi Sean,
> 
> Thanks again for the response. I guess it was a mistake on my side that
> I didn't send the complete text. Did you mean that with the text I
> sent, the co-reference superscript-1 will be lost?
> 
> Also as per your advice, we have created an issue -
> https://issues.apache.org/jira/browse/CTAKES-459 for
> measurement FSM changes and attached the modified file changes. Could
> someone have a look and know your thoughts please?
> 
> Regards,
> Gandhi
> 
> 
> -Original Message-
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Thursday, September 28, 2017 8:21 PM
> To: dev@ctakes.apache.org
> Cc: Miller, Timothy <timothy.mil...@childrens.harvard.edu>
> Subject: RE: Enabling drugner pipeline and identifying dates
> [EXTERNAL] [SUSPICIOUS]
> 
> Hi Gandhi,
> 
> I don't recall you sending me that entire snippet of text.  I think
> that I only had your single example sentence.
> You have discovered one of the quirks of software: "change the data,
> change the result."
> Ctakes is a system with many moving parts.  Things that precede or
> follow your original example sentence will change the evaluation of
> that sentence.
> With the pipeline you are using and the full note, you should see a
> number (mine is 4) next to the first "thalomid" in the original
> example sentence.  If you click that number you should see (to the
> right) 4 instances of "thalomid".
> Tim can correct me here, but maybe the coreference module ranked the
> links between "thalomid" as much higher than the rank between "study
> treatment of thalomid 200mg" and "the treatment of hepatocellular
> carcinoma" and discarded the encapsulating treatment texts from
> markables?  It is probably more complex than that.
> 
> > 
> > we have also made some code changes in MeasurementFSM.java to
> > identify certain measurements like '20 mg/m2' which was not
> > identified out of the box.  Should we send the code changes to you
> > so that you can consider the same to be productized ? Please
> > advise."
> I don't know if you've noticed the recent emails on the dev list
> involving Alexandru Zbarcea.  Alex has been creating or commenting on
> Jira items and attaching code for  fixes and enhancements.  This is a
> widely used process and is fairly easy to follow.   I think that the
> following links are relevant:
> Working with issues:  https://confluence.atlassian.com/jiracoreserver073/working-with-issues-861257307.html
> Creating patches:   https://confluence.atlassian.com/crucible/creating-patch-files-for-pre-commit-reviews-298977458.html
> Attaching files:   https://confluence.atlassian.com/jiracorecloud/attaching-files-and-screenshots-to-issues-765593805.html

Re: bitwise operator vs logical operator [EXTERNAL]

2017-09-25 Thread Miller, Timothy

Thanks Alexz,
I've committed the patch.
Tim

On Sun, 2017-09-24 at 21:57 -0400, Alexandru Zbarcea wrote:
> Hi,
> 
> I have reported and provided patch for:
> https://issues.apache.org/jira/browse/CTAKES-456
> 
> I hope it helps to improve readability at least.
> 
> Is there anything else related to the process of providing patches?
> 
> Regards,
> Alexz
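
For readers following the thread title: the contents of the CTAKES-456 patch are not shown here, so this is only a generic sketch of the difference between the two operator families on booleans in Java. OperatorDemo and its members are hypothetical names used for illustration.

```java
/** Hypothetical demo class; not taken from the CTAKES-456 patch. */
final class OperatorDemo {
    static int calls = 0; // counts how many operands were evaluated

    static boolean check(boolean value) {
        calls++;
        return value;
    }

    /** Non-short-circuit: both operands are always evaluated. */
    static boolean bitwiseAnd(boolean a, boolean b) {
        return check(a) & check(b);
    }

    /** Short-circuit: the right operand is skipped when the left is false. */
    static boolean logicalAnd(boolean a, boolean b) {
        return check(a) && check(b);
    }
}
```

Both forms return the same value for boolean operands. The logical form states the intent more clearly and avoids evaluating a right-hand side that may be costly or unsafe, such as a method call on a reference the left-hand side has just found to be null.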

Re: semantic Role mapping [EXTERNAL]

2017-09-15 Thread Miller, Timothy

The image isn't rendering for me -- can you upload to imgur and post a link 
maybe?

Tim

From: abilash.mat...@cognizant.com 
Sent: Friday, September 15, 2017 12:42 AM
To: dev@ctakes.apache.org
Subject: RE: semantic Role mapping [EXTERNAL]

Hi Sean,

I am looking for the relation between the semantic groups to get printed in the 
output text file.

For example in the sentence,  "The patient underwent a CT scan in April which 
did not reveal lesions in his liver"

I want the "Semantic Role Labeling"( as in the below image) - relation between 
semantic groups identified to get printed in the output text file.

[cid:image001.png@01D32E0B.1A55E0B0]

Thanks,

Abilash Mathew

-Original Message-
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Thursday, September 14, 2017 6:48 PM
To: dev@ctakes.apache.org
Subject: RE: semantic Role mapping [EXTERNAL]

Hi Abilash,

What exactly are you looking for?

Sean

-Original Message-

From: abilash.mat...@cognizant.com 
[mailto:abilash.mat...@cognizant.com]

Sent: Thursday, September 14, 2017 3:13 AM

To: dev@ctakes.apache.org

Subject: semantic Role mapping [EXTERNAL]

Hi,

Please let me know if there is a method in CTAKES other than  
dumpSRLOutput(Annotation annotation) to get the semantic Role mapping of input 
sentences.

Does DependencyNodeWriter.java have any role in this?

Thanks,

Abilash Mathew

This e-mail and any files transmitted with it are for the sole use of the 
intended recipient(s) and may contain confidential and privileged information. 
If you are not the intended recipient(s), please reply to the sender and 
destroy all copies of the original message. Any unauthorized review, use, 
disclosure, dissemination, forwarding, printing or copying of this email, 
and/or any action taken in reliance on the contents of this e-mail is strictly 
prohibited and may be unlawful. Where permitted by applicable law, this e-mail 
and other e-mail communications sent to and from Cognizant e-mail addresses may 
be monitored.


question about PersonTitleAnnotation class

2017-07-03 Thread Miller, Timothy

More specifically:

/ctakes-type-system/target/generated-sources/jcasgen/org/apache/ctakes/typesystem/type/textsem/PersonTitleAnnotation.java


Just curious what this type is intended to represent. Is it titles, as in "Dr." 
or "Mrs."?

And is there a type for just representing a person?


If anyone has recollection I'll add it to the description.

Tim

Re: Proposed improvements [EXTERNAL] [SUSPICIOUS]

2017-06-27 Thread Miller, Timothy

Yeah, actually, I have no idea why that's there. All the actual default parser 
models are in their own directories (dependency, srl, etc.). This almost looks 
like just a collection of additional models, which the average user would have 
no idea how to use and take up a lot of space.
Tim


From: Finan, Sean <sean.fi...@childrens.harvard.edu>
Sent: Tuesday, June 27, 2017 10:07 PM
To: dev@ctakes.apache.org
Subject: RE: Proposed improvements [EXTERNAL] [SUSPICIOUS]

Hi all,

> I would like to have (and work on it) much leaner distribution
One bigfoot is the clearparser_models.jar in ctakes-dependency-parser-res.  As 
far as I know this is not used by default or in any checked-in non-default 
configuration.  As it is 1/4 GB, I would like to move it to its own module to 
keep it out of projects that use ctakes "as a library".  I hunted the net to 
see if a duplicate is available elsewhere for alternative inclusion methods but 
couldn't find one.

Thoughts?

Thanks,
Sean

-Original Message-
From: Andrey Kurdumov [mailto:kant2...@googlemail.com]
Sent: Sunday, June 25, 2017 1:52 AM
To: cTakes developers list
Subject: Re: Proposed improvements [EXTERNAL]

Just want to note that ASF PMC want to make GitHub primary repository and 
Apache servers secondary soon.

Regarding improvements:
I personally want better support for embedding. Right now the cTakes distribution 
comes with LVG and the UMLS dictionary, and the size of cTakes thus becomes very large.
I would like to have (and work on) a much leaner distribution, let's name it 
cTakes Core, which will just provide the cTakes executable without the need for data.
Right now I constantly have to rip out that data after the cTakes build, which slows 
down my build significantly.

Personally I support Hadrian's initiative to have better logging, since the cTakes 
setup has some quirks which could be resolved faster with better logging.


2017-06-23 17:38 GMT+06:00 Miller, Timothy <
timothy.mil...@childrens.harvard.edu>:

> Thanks Hadrian, I hadn't heard of OSEHRA but it looks interesting and
> like something where we should be making people aware of cTAKES!
>
> svn vs. git -- I'm with you on preferring git, but not by so much that
> it's worth spending time on an argument if it turns into an argument
> :). As far as I know we've never really had a discussion about it.
> It's probably getting to the point where new developers have _only_
> used git and would find it a complete roadblock to use svn but for me
> it's just a mild annoyance.
>
> All others you mentioned -- if you are willing to contribute a patch
> we are happy to accept one-off contributions, and we are also
> interested in growing the developer community with people who are
> interested in contributing regularly over time.
>
> Tim
>
> 
> From: Hadrian Zbarcea <hzbar...@gmail.com>
> Sent: Thursday, June 22, 2017 9:14 PM
> To: dev@ctakes.apache.org
> Subject: Proposed improvements [EXTERNAL]
>
> Last week I presented at the OSEHRA Summit about ActiveMQ (and a few
> other projects) and the ASF in general.
>
> I was surprised that most didn't know much about the ASF and more
> importantly that nobody knew about cTakes, the only (directly)
> healthcare related project at the ASF. There was no cTakes talk at
> ApacheCon in Miami, but at OSEHRA, which is all about healthcare we
> should have had a presence. I will probably submit a talk for next
> year, but until then, because I think I created a bit of interest in
> cTakes I went to build cTakes myself and try a few things.
>
> Some of my findings are:
> * test failures with openjdk; granted the docs mention oracle jdk as a
> prerequisite, but think it's easy to support openjdk
> * use of svn vs git; this is a debatable topic, but by now everybody
> and their uncles are on git so moving to git (which I'd recommend)
> would probably foster adoption (yes, I know about the github mirror)
> * no support for OSGi, many large players use it
> * improvements in logging could go a long way, starting with moving to
> slf4j
>
> Suggesting improvements implies that I volunteer to do a good chunk of
> the work, but before that I'm interested more in how much the
> community would welcome such improvements. I am curious what are
> considered more low hanging fruits, for the more controversial topics
> we could take them to [discuss] threads. Because every community has
> its own culture and I am not that familiar with the cTakes one,
> although I went through the mail archives, I thought a prudent first step 
> would be to start with this.
>
> Feedback appreciated,
> Hadrian
>

Re: Proposed improvements [EXTERNAL]

2017-06-23 Thread Miller, Timothy

Thanks Hadrian, I hadn't heard of OSEHRA but it looks interesting and like 
something where we should be making people aware of cTAKES!

svn vs. git -- I'm with you on preferring git, but not by so much that it's 
worth spending time on an argument if it turns into an argument :). As far as I 
know we've never really had a discussion about it. It's probably getting to the 
point where new developers have _only_ used git and would find it a complete 
roadblock to use svn but for me it's just a mild annoyance.

All others you mentioned -- if you are willing to contribute a patch we are 
happy to accept one-off contributions, and we are also interested in growing 
the developer community with people who are interested in contributing 
regularly over time.

Tim


From: Hadrian Zbarcea 
Sent: Thursday, June 22, 2017 9:14 PM
To: dev@ctakes.apache.org
Subject: Proposed improvements [EXTERNAL]

Last week I presented at the OSEHRA Summit about ActiveMQ (and a few
other projects) and the ASF in general.

I was surprised that most didn't know much about the ASF and more
importantly that nobody knew about cTakes, the only (directly)
healthcare related project at the ASF. There was no cTakes talk at
ApacheCon in Miami, but at OSEHRA, which is all about healthcare we
should have had a presence. I will probably submit a talk for next year,
but until then, because I think I created a bit of interest in cTakes I
went to build cTakes myself and try a few things.

Some of my findings are:
* test failures with openjdk; granted the docs mention oracle jdk as a
prerequisite, but think it's easy to support openjdk
* use of svn vs git; this is a debatable topic, but by now everybody and
their uncles are on git so moving to git (which I'd recommend) would
probably foster adoption (yes, I know about the github mirror)
* no support for OSGi, many large players use it
* improvements in logging could go a long way, starting with moving to slf4j

Suggesting improvements implies that I volunteer to do a good chunk of the
work, but before that I'm interested more in how much the community
would welcome such improvements. I am curious what are considered more
low hanging fruits, for the more controversial topics we could take them
to [discuss] threads. Because every community has its own culture and I
am not that familiar with the cTakes one, although I went through the
mail archives, I thought a prudent first step would be to start with this.

Feedback appreciated,
Hadrian

Re: negation/uncertainty: pipeline runs very slowly [EXTERNAL]

2017-06-23 Thread Miller, Timothy

Something I just thought of is that if you are using the new (beta) sentence 
detector trained on Mimic, it is a bit of a "lumper" rather than a "splitter," 
meaning it is more likely to miss a sentence break and make longer sentences, 
sometimes absurdly long if there are no clear cues. I know that will slow down 
the constituency parser and dependency parser, but not sure why it would only 
slow down when negation processing is added. So, not a solution but something 
to keep in mind while debugging, especially if it interacts with Steve and 
Sean's feedback.
Tim
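
One way to check whether "lumped" sentences are a factor is to count tokens per detected sentence and flag the outliers before they reach the parsers. This is a rough sketch under stated assumptions: sentences are passed in as plain strings, tokens are split on whitespace, and LongSentenceCheck, findLong and the threshold are hypothetical names and values, not part of cTAKES.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical debugging helper; not part of cTAKES. */
final class LongSentenceCheck {
    /** Returns the indices of sentences whose whitespace-separated token count exceeds maxTokens. */
    static List<Integer> findLong(List<String> sentences, int maxTokens) {
        List<Integer> flagged = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) {
            String s = sentences.get(i).trim();
            int tokens = s.isEmpty() ? 0 : s.split("\\s+").length;
            if (tokens > maxTokens) {
                flagged.add(i);
            }
        }
        return flagged;
    }
}
```

Running something like this over the sentence texts from a slow document and a fast one would show whether the slow documents contain unusually long sentences.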



From: Dligach, Dmitriy <ddlig...@luc.edu>
Sent: Wednesday, June 21, 2017 9:18 PM
To: dev@ctakes.apache.org
Cc: Miller, Timothy
Subject: Re: negation/uncertainty: pipeline runs very slowly [EXTERNAL]

Sean, thanks for your comments. You are right. The slowdown doesn’t have 
anything to do with documentID.

I am now convinced that the slowdown has to do with the Polarity annotator. The 
reason you and others haven’t seen this in other pipelines is that you’ve 
probably been processing relatively small files.

I am processing MIMIC patient files, which typically have thousands of words. I 
just tried to process 300 files from the THYME corpus (where the files have 
hundreds of words) and the slowdown was barely noticeable. When running the 
same pipeline on the MIMIC files, the slowdown becomes very noticeable.


Dima



> On Jun 5, 2017, at 10:42, Finan, Sean <sean.fi...@childrens.harvard.edu> 
> wrote:
>
> Hi Dima,
>
> It looks like the UriCollectionReader that you are using never sets a 
> document id (type DocumentID) in the cas.  However, this shouldn't be a 
> problem as each document will be assigned a unique id "UnknownDocument"{###} 
> where {###} is a number incremented per new document with an unknown id.  The 
> message that you are seeing is just a warning.  The code fetching the 
> documentID and creating a default are very simple and should not take any 
> real processing time.
>
> The call to get document id is the very first line in 
> AssertionCleartkAnalysisEngine:
>  @Override
>  public void process(JCas jCas) throws AnalysisEngineProcessException
>  {
>    String documentId = DocumentIDAnnotationUtil.getDocumentID(jCas);
>
> So, the slowdown occurring after the warning message leads me to believe that 
> the problem lies later in the process ...
>
> My suggestion is that you put a breakpoint there and run your pipeline 
> through a debugger.  Optionally, there are a couple of log.debug messages in 
> that class, so you could change the granularity of your log4j and see if you 
> can narrow down the problem.  Add more debug statements if it helps.
>
> At any rate, I have not seen this problem in other pipelines.
>
> Sean
>
> -Original Message-
> From: Dligach, Dmitriy [mailto:ddlig...@luc.edu]
> Sent: Wednesday, May 24, 2017 10:34 AM
> To: cTAKES Developer list
> Subject: negation/uncertainty: pipeline runs very slowly
>
> Dear cTAKES developers,
>
> I am observing something strange. As soon as I add at the end of my pipeline 
> the uncertainty/negation AEs:
>
> aggregateBuilder.add( 
> PolarityCleartkAnalysisEngine.createAnnotatorDescription() ); 
> aggregateBuilder.add( 
> UncertaintyCleartkAnalysisEngine.createAnnotatorDescription() );
>
> the pipeline becomes 10-20 times slower. I just confirmed this again. As soon 
> as I remove these two AEs at the end of my pipeline, it runs very fast again.
>
> It seems to get stuck often right after it outputs this warning:
> WARN DocumentIDAnnotationUtil - Unable to find DocumentIDAnnotation
>
> If I remove the two AEs, this warning disappears.
>
> The full pipeline is here:
> https://github.com/dmitriydligach/ctakes-misc/blob/master/src/main/java/org/apache/ctakes/pipelines/UmlsLookupPipeline.java
>
> Any clues?
>
> Thank you very much,
>
> Dima
>
>
>
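
The fallback that Sean describes in the quoted message, a generated "UnknownDocument" id with a number incremented per document, can be sketched roughly as below. This is not the code in DocumentIDAnnotationUtil; FallbackDocId and resolve are hypothetical names, and the exact id format is an assumption based only on the description in the thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a default document id; not the cTAKES implementation. */
final class FallbackDocId {
    private static final AtomicInteger COUNTER = new AtomicInteger();

    /** Returns the given id, or a generated "UnknownDocument" + n id when none was set. */
    static String resolve(String idFromCas) {
        if (idFromCas != null && !idFromCas.isEmpty()) {
            return idFromCas;
        }
        return "UnknownDocument" + COUNTER.incrementAndGet();
    }
}
```

A lookup of this kind is a constant-time operation per document, which matches Sean's point that the warning itself is unlikely to be the source of the slowdown.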

Re: Get the Annotator descriptor file [EXTERNAL]

2017-06-14 Thread Miller, Timothy

Someone else can probably be more helpful for building a custom dictionary. I 
haven't looked too much but if someone has a pointer to a guide that would be 
helpful to me too. I do know if you have a lookup descriptor there is a version 
of the method that takes its location:
DefaultJCasTermAnnotator.java.createAnnotatorDescription(final String 
descriptorPath)

Tim


From: Kumar, Avanish <avanish.ku...@optum.com>
Sent: Wednesday, June 14, 2017 3:25 PM
To: dev@ctakes.apache.org
Subject: RE: Get the Annotator descriptor file [EXTERNAL]

Hi Tim,
 Thanks for your suggestion. I need a little more help. Can you describe the full 
procedure for building a custom dictionary that includes terms defined by 
me, and then how to configure that custom dictionary with the annotator?

Thanks,
Avanish kumar

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Wednesday, June 14, 2017 5:17 PM
To: dev@ctakes.apache.org
Subject: Re: Get the Annotator descriptor file [EXTERNAL]

You should be able to add the dictionary to an AggregateBuilder with
DefaultJCasTermAnnotator.java.createAnnotatorDescription()
and then get a descriptor with toXml() (or whatever the uimafit method is 
called).

If you've tried that and it's not working then I think we'll need more 
information about the error to help you.

Thanks
Tim


From: Kumar, Avanish <avanish.ku...@optum.com>
Sent: Wednesday, June 14, 2017 12:44 PM
To: dev@ctakes.apache.org
Subject: Get the Annotator descriptor file [EXTERNAL]

Hi ,

I am using UIMAFit so the program is unable to generate the annotator 
descriptor file(.xml file) for the dictionary lookup annotator.
If anyone has any idea how to get the descriptor file while using UIMAFit 
please help me out.

Thanks
Avanish Kumar

This e-mail, including attachments, may include confidential and/or proprietary 
information, and may be used only by the person or entity to which it is 
addressed. If the reader of this e-mail is not the intended recipient or his or 
her authorized agent, the reader is hereby notified that any dissemination, 
distribution or copying of this e-mail is prohibited. If you have received this 
e-mail in error, please notify the sender by replying to this message and 
delete this e-mail immediately.



Re: Get the Annotator descriptor file [EXTERNAL]

2017-06-14 Thread Miller, Timothy

You should be able to add the dictionary to an AggregateBuilder with
DefaultJCasTermAnnotator.java.createAnnotatorDescription()
and then get a descriptor with toXml() (or whatever the uimafit method is 
called).

If you've tried that and it's not working then I think we'll need more 
information about the error to help you.

Thanks
Tim


From: Kumar, Avanish 
Sent: Wednesday, June 14, 2017 12:44 PM
To: dev@ctakes.apache.org
Subject: Get the Annotator descriptor file [EXTERNAL]

Hi ,

I am using UIMAFit so the program is unable to generate the annotator 
descriptor file(.xml file) for the dictionary lookup annotator.
If anyone has any idea how to get the descriptor file while using UIMAFit 
please help me out.

Thanks
Avanish Kumar

This e-mail, including attachments, may include confidential and/or
proprietary information, and may be used only by the person or entity
to which it is addressed. If the reader of this e-mail is not the intended
recipient or his or her authorized agent, the reader is hereby notified
that any dissemination, distribution or copying of this e-mail is
prohibited. If you have received this e-mail in error, please notify the
sender by replying to this message and delete this e-mail immediately.

Re: cTAKES 4.0.0 Release

2017-04-24 Thread Miller, Timothy

Congrats cTAKES team! This is an important milestone!
Tim


On Mon, 2017-04-24 at 09:02 -0400, Murali Minnah wrote:
> The Apache cTAKES team is pleased to announce the availability of the
> 4.0.0 release.
> 
> For the complete release notes, please visit
> https://s.apache.org/ctakes-4.0.0-release-notes
> 
> Apache clinical Text Analysis and Knowledge Extraction System
> (cTAKES) is
> an open-source natural language processing system for information
> extraction from electronic medical record clinical free-text.
> 
> The release can be downloaded from
> http://ctakes.apache.org/downloads.cgi
> 
> For further information, please visit the project website at
> http://ctakes.apache.org/
> 
> -- The Apache cTAKES Team

Re: Docker

2017-04-24 Thread Miller, Timothy

One of those that Oleg found is my github repo which is very early
stages:
https://github.com/tmills/ctakes-docker

it can create 2 docker images, one for a UIMA AS queue server and
another that downloads ctakes, installs the dictionary, and starts a
basic concept extraction server with a UIMA AS descriptor. There is a
sample environment variables file where you need to enter your UMLS
credentials. It is a big image but because it is a simple pipeline it
can run with a smaller memory footprint. This project just started so
the reader end is underdeveloped, I've just been pointing to it from a
CVD with a remote descriptor.

Tim

On Sun, 2017-04-23 at 18:59 -0600, John Travis Green wrote:
> Sean: if you have it dockerized at harvard can you share the setup
> files? Regarding legalities, this is for in-house work, not
> redistribution. I work for the federal government and to deploy with
> our security restrictions it's very convenient to have it deployed as
> a docker instance.  I'm a very strong advocate in the dod regarding
> ctakes use. I don't know of anyone else in my position pushing it.
> We're looking at some major uses regarding accessibility of legacy
> data (recall the dod is transitioning to cerner, but we have a lot of
> data that will still need accessibility, not the least of which are
> physician researchers). But ctakes is difficult to deploy on dod
> systems because of our security requirements. If we can containerize
> it then it will make it more likely we use it.  Thanks, John  
> 
> 
> 
> Keep in mind one very important thing:
> 
> 
> 
> You need to be very careful about redistribution of a umls
> database.  Many years ago ctakes had to get special permission to
> post a copy on sourceforge.  As you all know, use of that
> distribution requires a umls username and password check per-ctakes
> launch.  This was also a requirement placed upon ctakes by the nlm
> per the agreement.
> 
> 
> 
> Public distribution of Oracle Java in a docker container is
> technically illegal, but in the beginning a lot of people were not
> reading eula info and went smooth criminal.  Strange but true.  Now
> people know to use OpenJDK.  I have not contacted the nlm regarding
> docker and the umls.  Has anybody else out there?  If so please let
> us know.
> 
> 
> 
> For a private container inclusion of the dictionary is fine (we have
> one at harvard).   Otherwise there are ways to use / copy s3 files at
> runtime, you would just need to document a static location for the
> database, etc. etc.
> 
> 
> 
> Sean
> 
> 
> 
> -Original Message-
> 
> From: Jay Vyas [mailto:jayunit100.apa...@gmail.com]  
> Sent: Sunday, April 23, 2017 5:56 AM
> 
> To: dev@ctakes.apache.org
> 
> Subject: Re: Docker
> 
> 
> 
> Dockerizing ctakes as a build was useful at one time for sure.
> 
> 
> 
> If running as a microservice remember the size of the image is
> problematic ; you don't want it on lots of different nodes if using
> something like kubernetes.
> 
> 
> 
> Also remember to make sure you run with Xmx args so that cgroups don't
> constrain the jvm memory guess, otherwise you'll get OOME errors.
> 
> 
> 
> > 
> > On Apr 23, 2017, at 4:38 AM, Oleg Tikhonov  wrote:
> > 
> >  
> > I've tried to create service from
> > 
> > https://hub.docker.com/r/llin/docker_apache_ctakes/~/dockerfile/ , without
> > success.
> > 
> >  
> > However Docker file looks as follows:
> > 
> >  
> > FROM java:7
> > 
> > ADD http://mirror.softaculous.com/apache/ctakes/ctakes-3.2.2/apache-ctakes-3.2.2-bin.tar.gz
> > 
> > ADD https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/ytex/ctakes-ytex-lib-3.1.2-SNAPSHOT.zip
> > 
> > RUN tar -xzf apache-ctakes-3.2.2-bin.tar.gz
> > 
> > RUN ln -s /apache-ctakes-3.2.2 /apache-ctakes
> > 
> > RUN mkdir temp
> > 
> > RUN unzip ctakes-ytex-lib-3.1.2-SNAPSHOT.zip -d temp/
> > 
> > RUN cp -a temp/lib/. /apache-ctakes/lib/
> > 
> > RUN rm apache-ctakes-3.2.2-bin.tar.gz
> > 
> > RUN rm ctakes-ytex-lib-3.1.2-SNAPSHOT.zip
> > 
> > RUN rm -r temp
> > 
> >  
> > Hope it helps.
> > 
> >  
> >  
> >  
> >  
> > > 
> > > On Sun, Apr 23, 2017 at 8:00 AM, Oleg Tikhonov 
> > > wrote:
> > 
>
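
The Xmx advice quoted above can be made concrete with a small launcher helper. What follows is only a sketch in Python, not anything shipped with cTAKES: the 75% heap fraction, the 2048 MB fallback, and the function names are assumptions chosen for illustration. It reads the container memory limit from the usual cgroup files and turns it into an explicit -Xmx flag, so the JVM does not size its heap from the host's memory.

```python
from pathlib import Path

# Usual locations of the container memory limit (cgroup v2 first, then v1).
CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),
)

# cgroup v1 reports a huge number when no limit is set; treat it as unlimited.
UNLIMITED_THRESHOLD = 1 << 60


def read_cgroup_limit(paths=CGROUP_LIMIT_FILES):
    """Return the memory limit in bytes, or None if unlimited or unknown."""
    for path in paths:
        try:
            raw = path.read_text().strip()
        except OSError:
            continue  # file missing or unreadable, try the next location
        # cgroup v2 writes the literal string "max" when there is no limit.
        if raw.isdigit() and int(raw) < UNLIMITED_THRESHOLD:
            return int(raw)
    return None


def xmx_flag(limit_bytes, fraction=0.75, fallback_mb=2048):
    """Build a -Xmx flag that leaves headroom below the container limit.

    fraction and fallback_mb are illustrative defaults, not recommendations.
    """
    if limit_bytes is None:
        return f"-Xmx{fallback_mb}m"
    heap_mb = max(1, int(limit_bytes * fraction) // (1024 * 1024))
    return f"-Xmx{heap_mb}m"


if __name__ == "__main__":
    # Example: a 1 GiB container limit yields -Xmx768m.
    print(xmx_flag(1024 ** 3))
    print(xmx_flag(read_cgroup_limit()))
```

The resulting flag would be passed to whatever java command starts the pipeline inside the container; the headroom is there because the JVM also needs memory outside the heap.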

Re: [VOTE] Release Apache cTAKES 4.0.0 (rc3)

2017-04-20 Thread Miller, Timothy

Sorry for the delay, but I finally got around to testing RC3. I did a
dictionary download for the bin release following the wiki, a test of
the timex annotator, and a test of the coref annotator, and all worked
to my satisfaction.

My vote is +1.

Thanks
Tim

On Thu, 2017-04-20 at 14:53 -0400, James Masanz wrote:
> +1 for rc3 to be released as 4.0.0
> 
> Pei, Sean, and Murali, knowing that Tuesdays are good days for release
> announcements, do you see any problem with announcing the release on
> Tuesday the 25th if we end up with enough votes by end of Friday or even
> by the end of Saturday?  If there is, I'd be willing to help.
> 
> 
> On Thu, Apr 20, 2017 at 12:50 PM, Lin, Chen wrote:
> 
> > 
> > > I have run the temporal module of rc3 for the event-event,
> > > event-time, and DocTimeRel annotators. All evaluation scripts
> > > returned proper outputs. Speaking of the temporal part:
> > > +1
> > 
> > Best,
> > Chen
> > 
> > 
> > 
> > On 4/17/17, 11:47 PM, "Pei Chen"  wrote:
> > 
> > > 
> > > > This is a call for a vote on releasing the following candidate (rc3)
> > > > as Apache cTAKES 4.0.0.
> > > > 
> > > > For more detailed information on the changes/release notes, please
> > > > visit:
> > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313621&version=12340211
> > > > 
> > > > The release was made using the cTAKES release process documented
> > > > here:
> > > > https://ctakes.apache.org/ctakes-release-guide.html
> > > > 
> > > > The candidate is available at:
> > > > https://dist.apache.org/repos/dist/dev/ctakes/ctakes-4.0.0-rc3/apache-ctakes-4.0.0-src.tar.gz
> > > > /.zip
> > > > 
> > > > The tag to be voted on:
> > > > http://svn.apache.org/repos/asf/ctakes/tags/ctakes-4.0.0-rc3
> > > > 
> > > > The MD5 checksum of the tarball can be found at:
> > > > https://dist.apache.org/repos/dist/dev/ctakes/ctakes-4.0.0-rc3/apache-ctakes-4.0.0-src.tar.gz.md5
> > > > /.zip.md5
> > > > 
> > > > The signature of the tarball can be found at:
> > > > https://dist.apache.org/repos/dist/dev/ctakes/ctakes-4.0.0-rc3/apache-ctakes-4.0.0-src.tar.gz.asc
> > > > /.zip.asc
> > > > 
> > > > Apache cTAKES' KEYS file, containing the PGP keys used to sign the
> > > > release:
> > > > https://dist.apache.org/repos/dist/dev/ctakes/KEYS
> > > > 
> > > > Please vote on releasing these packages as Apache cTAKES 4.0.0. The
> > > > vote is open for at least the next 72 hours.
> > > > 
> > > > The vote passes if at least three binding +1 votes are cast.
> > > > [ ] +1 Release the packages as Apache cTAKES 4.0.0
> > > > [ ] -1 Do not release the packages because...
> > > Also, the convenience binary can be found at:
> > > https://urldefense.proofpoint.com/v2/url?u=https-
> > 3A__dist.apache.org_repos
> > > 
> > >
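
The vote email quoted above publishes an MD5 checksum next to the source tarball. For anyone testing a candidate, the comparison can be sketched in Python as below. This is a minimal sketch under stated assumptions: it assumes the .md5 file holds the hex digest as its first whitespace-separated token (published checksum files vary in layout), and the function names are hypothetical. It does not cover the PGP signature, which needs a separate check against the KEYS file.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, md5_text):
    """Compare a file against the text content of its published .md5 file.

    Assumption: the digest is the first whitespace-separated token.
    """
    tokens = md5_text.split()
    if not tokens:
        return False
    return md5_of_file(path) == tokens[0].lower()
```

A typical use would be to download both the tarball and its .md5 file, read the latter as text, and call matches_published_md5 with the tarball path.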

Re: testing release candidates Re: Release Apache cTAKES 4.0.0 (rc2) [SUSPICIOUS]

2017-04-13 Thread Miller, Timothy

OK. By logging into confluence I found the draft of version 4.0
documentation, but maybe it's worth sending an email to dev with a few 
pages that need help and people can improve as they test?

I will do the same.

Thanks
Tim

On Thu, 2017-04-13 at 12:24 -0400, James Masanz wrote:
> I agree.
> There are (or were) some places that have TBD, and the part about
> unzipping resources needs to be expanded to include what to do if you
> just download the fast dictionary and not the entire set of
> dictionaries.  If no one beats me to it I will improve those sections by
> the time we announce, but the documentation is ready for comments and
> can continually be improved, even past an announced release if needed.
> 
> On Thu, Apr 13, 2017 at 12:14 PM, Finan, Sean <
> sean.fi...@childrens.harvard.edu> wrote:
> 
> > 
> > Hi Tim,
> > 
> > Excellent question/point.
> > 
> > I think that you are welcome to follow any online instructions.  We
> > are aware that the wiki is far from complete, and one thing that I
> > welcome everybody to do is become active on documentation.
> > 
> > So, if you find instructions for installation, workflow, etc. please
> > "test" the instructions.  If there are none then comment on the
> > absence.  However, I think that a paucity of documentation should not
> > hold up the code/bin release.  I could be in the minority opinion.
> > 
> > Sean
> > 
> > -Original Message-
> > From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
> > Sent: Thursday, April 13, 2017 11:55 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: testing release candidates Re: Release Apache cTAKES
> > 4.0.0
> > (rc2) [SUSPICIOUS]
> > 
> > Thanks all for your hard work. I added some minor instructions to the
> > spreadsheet that are hopefully helpful.
> > 
> > I want to test the cvd for standard dictionary lookup with the
> > separate resources. Am I meant to be testing documentation as well? As
> > in, something I can follow along and make sure it's correct? Or should
> > I just do it the way I know how to do it?
> > Tim
> > 
> > On Wed, 2017-04-12 at 20:21 -0400, James Masanz wrote:
> > > 
> > > Hi Everyone,
> > > 
> > > > We could use a google spreadsheet to end up with a sense of testing
> > > > coverage and maybe reduce duplicate testing effort too.
> > > > https://docs.google.com/spreadsheets/d/1FK-kEhwewLJaVCBgWsSAMhL2KNCD6L8AMfFM33oKR2Y/edit?usp=sharing
> > > > And we can compare future releases to this one.
> > > > 
> > > > I put a few example lines of the first things I plan to test. I'll
> > > > start testing tomorrow and add more lines for myself then.
> > > > If you don't want to update the spreadsheet twice, it would still be
> > > > helpful to list what you've done after you do testing, without
> > > > listing what you plan to do ahead of time.
> > > 
> > > Thanks,
> > > -- James
> > > 
> > > 
> > > On Wed, Apr 12, 2017 at 5:31 PM, Pei Chen <chen...@apache.org>
> > > wrote:
> > > 
> > > > 
> > > > 
> > > > > This is a call for a vote on releasing the following candidate
> > > > > (rc2) as Apache cTAKES 4.0.0.
> > > > > 
> > > > > For more detailed information on the changes/release notes,
> > > > > please visit:
> > > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313621&version=12340211
> > > > 
> > > > The release was made using the cTAKES release process
> > > &

Re: testing release candidates Re: Release Apache cTAKES 4.0.0 (rc2)

2017-04-13 Thread Miller, Timothy

Thanks all for your hard work. I added some minor instructions to the
spreadsheet that are hopefully helpful.

I want to test the cvd for standard dictionary lookup with the separate
resources. Am I meant to be testing documentation as well? As in,
something I can follow along and make sure it's correct? Or should I
just do it the way I know how to do it?
Tim

On Wed, 2017-04-12 at 20:21 -0400, James Masanz wrote:
> Hi Everyone,
> 
> We could use a google spreadsheet to end up with a sense of testing
> coverage and maybe reduce duplicate testing effort too.
> https://docs.google.com/spreadsheets/d/1FK-kEhwewLJaVCBgWsSAMhL2KNCD6L8AMfFM33oKR2Y/edit?usp=sharing
> And we can compare future releases to this one.
> 
> I put a few example lines of the first things I plan to test. I'll
> start testing tomorrow and add more lines for myself then.
> If you don't want to update the spreadsheet twice, it would still be
> helpful to list what you've done after you do testing, without
> listing what you plan to do ahead of time.
> 
> Thanks,
> -- James
> 
> 
> On Wed, Apr 12, 2017 at 5:31 PM, Pei Chen  wrote:
> 
> > 
> > This is a call for a vote on releasing the following candidate (rc2)
> > as Apache cTAKES 4.0.0.
> > 
> > For more detailed information on the changes/release notes, please
> > visit:
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313621&version=12340211
> > 
> > The release was made using the cTAKES release process documented
> > here:
> > https://ctakes.apache.org/ctakes-release-guide.html
> > 
> > The candidate is available at:
> > https://dist.apache.org/repos/dist/dev/ctakes/ctakes-4.0.0-rc2/apache-ctakes-4.0.0-src.tar.gz
> > /.zip
> > 
> > The tag to be voted on:
> > http://svn.apache.org/repos/asf/ctakes/tags/ctakes-4.0.0-rc2
> > 
> > The MD5 checksum of the tarball can be found at:
> > https://dist.apache.org/repos/dist/dev/ctakes/ctakes-4.0.0-rc2/apache-ctakes-4.0.0-src.tar.gz.md5
> > /.zip.md5
> > 
> > The signature of the tarball can be found at:
> > https://dist.apache.org/repos/dist/dev/ctakes/ctakes-4.0.0-rc2/apache-ctakes-4.0.0-src.tar.gz.asc
> > /.zip.asc
> > 
> > Apache cTAKES' KEYS file, containing the PGP keys used to sign the
> > release:
> > https://dist.apache.org/repos/dist/dev/ctakes/KEYS
> > 
> > Please vote on releasing these packages as Apache cTAKES 4.0.0. The
> > vote is open for at least the next 72 hours.
> > 
> > The vote passes if at least three binding +1 votes are cast.
> > [ ] +1 Release the packages as Apache cTAKES 4.0.0
> > [ ] -1 Do not release the packages because...
> > 
> > Also, the convenience binary can be found at:
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__dist.apache.or
> >

Re: Regarding Negation, Uncertainty Pipes

2017-04-05 Thread Miller, Timothy

Thanks for that perspective, Yiming.

I contributed to the ClearTK version of the system. At that time we
evaluated it for negation [1] and found that it was more generalizable
than the rule-based negation detectors like Negex. Since then, we've
found on some projects that Negex is easier to modify for new data
sets, simply by tuning a parameter for how wide of a window to look at.
It also is easier to explain to physicians who may not be NLP savvy,
since its rules are simple and its mistakes are easier to explain. It
also seems that the ClearTK system was biased towards precision and
Negex towards recall, and our collaborators seemed more comfortable
with the recall bias. All that said, we don't have exhaustive
comparisons across attributes (uncertainty, generic, hypothetical,
subject, conditional) because performance along those other attributes
with any system is quite often below what would be considered
acceptable for presenting to users, and so any use of those is almost
beta testing.

I believe the default system is not changing in the upcoming release,
but I think the rule-based system will stick around and so you can
switch to it if the default does not work on your data.

Hope that is helpful.

Tim

[1] http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0112774
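
The point in the message above about tuning "how wide of a window to look at" can be illustrated with a toy example. This is a sketch of the general window idea only, written in Python; it is not the NegEx algorithm or the cTAKES implementation, and the trigger list and function name are assumptions made for illustration.

```python
# Toy trigger list; a real rule-based system uses a much larger, curated one.
NEGATION_TRIGGERS = {"no", "not", "denies", "without"}


def is_negated(tokens, entity_start, window=5, triggers=NEGATION_TRIGGERS):
    """Return True if a trigger occurs within `window` tokens before the entity.

    tokens       -- list of word tokens for one sentence
    entity_start -- index of the first token of the entity mention
    window       -- how many preceding tokens to inspect (the tunable knob)
    """
    left = max(0, entity_start - window)
    return any(tok.lower() in triggers for tok in tokens[left:entity_start])


tokens = "Patient denies any history of chest pain".split()
# Indices: Patient(0) denies(1) any(2) history(3) of(4) chest(5) pain(6)
print(is_negated(tokens, entity_start=5, window=5))  # True: "denies" is in range
print(is_negated(tokens, entity_start=5, window=2))  # False: window too narrow
```

Widening the window catches more negated mentions at the cost of more false alarms, which is one way to picture the recall-leaning behavior described above.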

On Wed, 2017-04-05 at 13:30 -0400, Zuo Yiming wrote:
> Hi Devi,
> 
> My feeling is that the ClearTK based pipelines are the currently
> recommended ones to use. They have a specific pipeline for each property
> you may want to detect (neg, uncertainty, subject, etc), thus providing
> a more concise and understandable coding style. 
> 
> From my personal experience, the Assertion non-ClearTK pipelines work
> well in detecting generic and subject properties, while I have
> difficulty putting the ClearTK based pipelines to work for these two
> properties in user mode. In developer mode, I was not able to apply the
> Assertion non-ClearTK pipelines due to some errors, so it looks like the
> ClearTK based pipelines are the only choice you have. I don’t think the
> context annotators are still recommended now.
> 
> Again, that’s only my feedback, welcome more discussion.
> 
> Best,
> Yiming   
> 
> > 
> > On Apr 5, 2017, at 5:38 AM, Devikiran Ramadas wrote:
> > 
> > Hi,
> > 
> > I have been looking into the cTAKES code base for some time now and
> > am confused about how to identify Negation, Uncertainty and other
> > properties for an identified named entity.
> > 
> > It can be done by:
> > 1) Assertion module
> >  - ClearTK based (neg, Uncertainty, subject, history, generic etc.)
> >  - Non ClearTK (generic, subject)
> > 2) Context Annotator (Negation, Uncertainty and history)
> > 
> > I have been using the Context Annotator but saw usage of the ClearTK
> > based pipes even in the svn trunk version of the
> > ClinicalPipelineFactory class.
> > 
> > Could someone please clear the air on which pipe is recommended for
> > each of the properties of an Identified Annotation?
> > 
> > Regards,
> > Devi

Re: Evaluate cTAKES perfomance

2017-03-18 Thread Miller, Timothy

To save you a little trouble, in ctakes-temporal we rely a lot on an outside 
library called ClearTK that has some evaluation APIs built in that work well 
with UIMA frameworks and typical NLP tasks. We use the following classes:
http://cleartk.github.io/cleartk/apidocs/2.0.0/org/cleartk/eval/AnnotationStatistics.html
http://cleartk.github.io/cleartk/apidocs/2.0.0/org/cleartk/eval/Evaluation_ImplBase.html

The simplest place to start looking in ctakes-temporal is probably the 
EventAnnotator and its evaluation, since they are simple one word spans. Then 
the TimeAnnotator is slightly more complicated with multi-word spans. Then if 
you are interested in evaluating relations I would suggest switching over to 
ctakes-relation-extractor which is more stable than the ctakes-temporal 
relation code, which is an area of highly active (i.e., funded) research and so 
the code has not been cleaned up as much.
Tim
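
For readers who export both system output and gold annotations to flat files, as proposed later in this thread (DocName|Spans|CUI|CoveredText|ConceptType), the scoring itself can be sketched in a few lines of Python. This is a minimal exact-match sketch, not the ClearTK AnnotationStatistics API mentioned above; the function names and the sample rows (including the placeholder CUIs) are hypothetical.

```python
def load_annotations(lines):
    """Parse 'DocName|Spans|CUI|CoveredText|ConceptType' rows into a set of keys.

    Only document name, span and CUI are used for matching, so two rows
    count as the same annotation when those three fields agree exactly.
    """
    keys = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        doc, span, cui = line.split("|")[:3]
        keys.add((doc, span, cui))
    return keys


def precision_recall_f1(gold, predicted):
    """Exact-match precision, recall and F1 over two sets of annotation keys."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Illustrative rows only; the CUIs are placeholders, not real concepts.
gold_rows = [
    "note1|10-20|C0000001|chest pain|SignSymptom",
    "note1|35-43|C0000002|diabetes|DiseaseDisorder",
]
system_rows = [
    "note1|10-20|C0000001|chest pain|SignSymptom",
    "note1|50-58|C0000003|headache|SignSymptom",
]

p, r, f1 = precision_recall_f1(load_annotations(gold_rows),
                               load_annotations(system_rows))
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Exact span matching is the strictest choice; overlap-based matching for multi-word spans would need a different comparison than plain set intersection.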


From: Leander Melms 
Sent: Friday, March 17, 2017 3:05 PM
To: dev@ctakes.apache.org
Subject: Re: Evaluate cTAKES perfomance

Thanks! I'll have a look at it and will try to give something back to the 
community!

Leander


> On 17 Mar 2017, at 19:42, Finan, Sean wrote:
>
> Ah - you meant best way to test.  Sorry, I misread your inquiry as a best way 
> to write output.
>
> Yes, that is a great introduction document for ctakes and early tests.  There 
> are a few small test classes in ctakes that read anafora files, run ctakes 
> and run agreement numbers.  You can find some in the ctakes-temporal module.  
> I didn't write them, and I think that they are built-to-fit purpose-driven 
> classes, but you could try to adapt them to a general purpose case.  That 
> would be a great thing to have in ctakes!
>
> Sean
>
> -Original Message-
> From: Leander Melms [mailto:me...@students.uni-marburg.de]
> Sent: Friday, March 17, 2017 1:46 PM
> To: dev@ctakes.apache.org
> Subject: Re: Evaluate cTAKES perfomance
>
> Hi Sean,
>
> thank you (again) for your help and feedback! I'll give it a try! Seems like 
> the authors of the publication "Mayo clinical Text analysis and Knowledge 
> Extraction System" 
> (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2995668/) did this as well.
>
> Thank you
> Leander
>
>
>
>> On 17 Mar 2017, at 18:33, Finan, Sean wrote:
>>
>> Hi Leander,
>>
>> There is no single correct way to do this, but a couple of similar
>> classes exist.  Well, one sat in my sandbox for two years until about 5 
>> seconds ago as I only just checked it in.  Anyway, take a look at two 
>> classes in ctakes-core (org.apache.ctakes.core). They are TextSpanWriter 
>> and CuiCountFileWriter.
>>
>> TextSpanWriter writes annotation name | span | covered text in a file, one 
>> per document.
>>
>> CuiCountFileWriter writes a list of discovered cuis and their counts.
>>
>> It sounds like you are interested in a combination of both - basically 
>> TextSpanWriter with the added output of CUIs.
>>
>> You can also have a look at EntityCollector of 
>> org.apache.ctakes.core.pipeline.  It has an annotation engine that keeps a 
>> running list of "entities" for the whole run, doc ids, spans, text and cuis.
>>
>> Sean
>>
>>
>> -Original Message-
>> From: Leander Melms [mailto:me...@students.uni-marburg.de]
>> Sent: Friday, March 17, 2017 1:09 PM
>> To: dev@ctakes.apache.org
>> Subject: Re: Evaluate cTAKES perfomance
>>
>> Sorry for writing again. I just have a quick question: My idea is to parse 
>> the cTAKES output to a text file with a structure like this 
>> DocName|Spans|CUI|CoveredText|ConceptType and do the same with the gold 
>> standard (from anafora).
>>
>> Is this a correct way to do this?
>>
>> I'm new to the subject and happy about the tiniest information on the topic.
>>
>> Thanks
>> Leander
>>
>>> On 17 Mar 2017, at 12:05, Leander Melms wrote:
>>>
>>> Hi,
>>>
>>> I've integrated a custom dictionary, retrained some of the OpenNLP models 
>>> and would like to evaluate the changes on a gold standard. I'd like to 
>>> calculate the precision, the recall and the f1-score to compare the results.
>>>
>>> My question is: Does cTAKES ship with some evaluation / test scripts? What 
>>> is the best strategy to do this? Has anyone dealt with this topic before?
>>>
>>> I'm happy to share the results afterwards
