Implementation Improvements for cTAKES on top of Spark

2017-07-25 Thread Michael Trepanier
Hi, I am currently leveraging cTAKES inside of Apache Spark and have written a function that takes in a single clinical note as a string and does the following: 1) Sets the UMLS system properties. 2) Instantiates a JCas object. 3) Runs the default pipeline. 4) (Not shown below) Grabs the annotation

Re: Implementation Improvements for cTAKES on top of Spark

2017-07-28 Thread Michael Trepanier
Peter wrote: > About your second question with UMLS, you can build the pipeline > initially and it will verify the license info, then just reuse the > pipeline on each call. > > > > On 7/25/17, 4:53 PM, "Michael Trepanier" wrote: > >>Hi, >> >>I
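Peter's suggestion means paying the setup cost once per JVM rather than once per note. Here the setup cost is the UMLS check and the dictionary load. The sketch below shows that pattern in plain Scala, with no dependencies. `ExpensivePipeline`, `buildPipeline`, `PipelineHolder` and `ReuseDemo` are hypothetical stand-ins for the real cTAKES setup and per-note run, not cTAKES APIs.

```scala
// Sketch: build an expensive resource once per JVM and reuse it for every note.
// All names here are hypothetical stand-ins for the real cTAKES setup
// (UMLS check, dictionary loading) and the per-note pipeline run.
import java.util.concurrent.atomic.AtomicInteger

final class ExpensivePipeline(val id: Int) {
  // Stand-in for running the pipeline on one note; here it just counts tokens.
  def process(note: String): Int = note.split("\\s+").count(_.nonEmpty)
}

object PipelineHolder {
  // Counts how many times the expensive setup actually ran.
  val builds = new AtomicInteger(0)

  private def buildPipeline(): ExpensivePipeline =
    new ExpensivePipeline(builds.incrementAndGet())

  // A lazy val in an object is initialized at most once per JVM (thread-safely),
  // so every task running in the same executor shares this instance.
  lazy val pipeline: ExpensivePipeline = buildPipeline()
}

object ReuseDemo {
  // Shaped like a Spark mapPartitions function: Iterator in, Iterator out.
  def annotate(notes: Iterator[String]): Iterator[Int] =
    notes.map(note => PipelineHolder.pipeline.process(note))
}
```

In Spark this function could be passed to `rdd.mapPartitions(ReuseDemo.annotate)`, so each executor builds the pipeline once. An executor may run several tasks concurrently. In that case the shared object must be safe to use from multiple threads. If it is not, one instance per thread (for example via a `ThreadLocal`) would be the safer variant.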

Re: Implementation Improvements for cTAKES on top of Spark

2017-08-11 Thread Michael Trepanier
Data Science Group (IRDS) > Adjunct Associate Professor, Computer Science Department > University of Southern California, Los Angeles, CA 90089 USA > WWW: http://irds.usc.edu/ > ++++++++++ > > > On 7/28/17, 1

cTAKES Fast Pipeline Failing

2017-09-01 Thread Michael Trepanier
Hi All, We've been attempting to scale our cTAKES Pipeline on top of Spark, so we've switched from using the "getDefaultPipeline" method to the "getFastPipeline" method to boost the processing speed. However, while the default pipeline works fine with Spark, the fast pipeline is throwing the below

Re: cTAKES Fast Pipeline Failing

2017-09-03 Thread Michael Trepanier
ulting in > Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -7 > > Are you using cTAKES 4.0 (either from the convenience binary download or as a maven dependency) or are you using cTAKES in some other way? > > -- James > > > On Fri, Sep 1, 2017 at 3:

Setting the Lvg Resources Location in lvg.properties

2017-09-26 Thread Michael Trepanier
I am attempting to run cTAKES from an executable UberJar. While the fast pipeline seems to run correctly (in terms of producing an output), when stepping through the LvgAnnotator related steps, cTAKES produces the below error. 26 Sep 2017 22:47:01 INFO LvgAnnotator - URL for lvg.properties =file:

Re: Setting the Lvg Resources Location in lvg.properties

2017-09-29 Thread Michael Trepanier
revisit that when I look at that patch. > > -- James > > > > On Tue, Sep 26, 2017 at 5:53 PM, Michael Trepanier > wrote: > >> I am attempting to run cTAKES from an executable UberJar. While the fast >> pipeline seems to run correctly (in terms of producing an o

GetOriginalText() Always Returning NULL for Identified Annotations

2018-02-13 Thread Michael Trepanier
Hi, I am attempting to run the default FastPipeline to extract various features from clinical text. One of the features I'd like to capture is the covered text. However, when running the below Scala code, calling getOriginalText yields a "null" value for every annotation of type IdentifiedAnnotati

Re: GetOriginalText() Always Returning NULL for Identified Annotations

2018-02-13 Thread Michael Trepanier
Thanks Jessica - that does exactly what I need. On Tue, Feb 13, 2018 at 1:49 PM, Jessica Glover wrote: > Hi Mike, > > Have you tried the getCoveredText() method that IdentifiedAnnotation > inherits from Annotation? > > - Jessica > > On Tue, Feb 13, 2018 at 2:42 PM, Mi
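Jessica's suggestion works because `getCoveredText()` is derived from an annotation's `begin`/`end` offsets into the document text. `getOriginalText()`, by contrast, reads a feature that the fast pipeline apparently leaves unset. The sketch below imitates that derivation using a hypothetical `Span` class. In real code you would iterate `JCasUtil.select(jcas, classOf[IdentifiedAnnotation])` and call `getCoveredText` on each annotation directly.

```scala
// Hypothetical stand-in for a UIMA annotation: only the offsets are stored,
// and the covered text is recomputed from the document text on demand.
final case class Span(begin: Int, end: Int, label: String) {
  def coveredText(documentText: String): String = documentText.substring(begin, end)
}

object CoveredTextDemo {
  val doc = "Patient denies chest pain but reports nausea."
  // Offsets chosen by hand for this example sentence.
  val spans = List(Span(15, 25, "SignSymptom"), Span(38, 44, "SignSymptom"))
  def texts: List[String] = spans.map(_.coveredText(doc))
}
```

Here `CoveredTextDemo.texts` yields `List("chest pain", "nausea")`.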

Leveraging cTAKES without a UMLS Credential Check

2018-03-09 Thread Michael Trepanier
Hi All, Is it possible to avoid the UMLS credential check each time cTAKES is run? It seems like cTAKES would be configurable in such a way to use UMLS credentials to acquire the sno_rx_16abterms dictionary once, and then not need to check against UMLS in future runs. In particular, I am thinking

Re: [EXTERNAL] Leveraging cTAKES without a UMLS Credential Check

2018-03-10 Thread Michael Trepanier
eryone a while back. From what we > understand so far it seems that this may go away once you build and load > your own dictionary vs using the default you mentioned. But we haven't > tested that yet. Lincoln > > > > *From:* Michael Trepanier [mailto:m...@metistream.com]

Packaging cTAKES in a Jar - LVG Related Configuration Error

2018-03-28 Thread Michael Trepanier
Hi All, I am attempting to package cTAKES in a jar while avoiding it copying the lvg related files to /tmp/ as it does in /ctakes/trunk/ctakes-lvg/src/main/java/org/apache/ctakes/lvg/ae/LvgAnnotator.java. Everything works up until cTAKES tries to path the lvg.properties file within the jar
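A file packaged inside a jar has no filesystem path of its own. Its URI takes the form `jar:file:/opt/app.jar!/lvg.properties` (example path). Code that expects a plain `java.io.File` path therefore cannot open it directly. That is presumably why LvgAnnotator copies its files to /tmp/. The sketch below shows both the check and the copy-out workaround, using only standard JDK calls. `JarResources` and its methods are hypothetical helpers, not part of cTAKES or LVG.

```scala
import java.io.InputStream
import java.net.URI
import java.nio.file.{Files, Path, StandardCopyOption}

// Hypothetical helpers (not part of cTAKES or LVG).
object JarResources {
  /** A resource inside a jar has a "jar:" URI, which cannot be opened as a java.io.File. */
  def isInsideJar(uri: URI): Boolean = uri.getScheme == "jar"

  /** Copy a resource stream to a temporary file so that code requiring a real
    * filesystem path can read it. The file is deleted when the JVM exits. */
  def copyToTemp(in: InputStream, prefix: String, suffix: String): Path = {
    val tmp = Files.createTempFile(prefix, suffix)
    tmp.toFile.deleteOnExit()
    try Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING)
    finally in.close()
    tmp
  }
}
```

In practice the URI would come from `getClass.getResource(...).toURI` and the stream from `getClass.getResourceAsStream(...)`. Both are standard JDK methods. The resource name passed to them would be whatever your jar layout uses.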

Re: How do I add a dictionary (like NCI) to cTakes lookup?

2018-08-20 Thread Michael Trepanier
Ory, In response to Gandhi's comments, the video below outlines custom dictionary creation in detail: https://www.youtube.com/watch?v=4aOnafv-NQs Best, Mike On Mon, Aug 20, 2018 at 2:09 AM, Gandhi Rajan Natarajan < gandhi.natara...@arisglobal.com> wrote: > Hi Ory, > > I guess RxNORM and SNO

cTAKES Blocking when Run in Separate JVMs

2018-12-21 Thread Michael Trepanier
Hi, I am running multiple cTAKES pipelines on a single machine in parallel, each in their own JVM. Looking across the logs of each JVM, it appears that severe blocking is occurring after the annotations are generated for a particular segment. In particular, it looks like only one JVM is processing

Re: cTAKES heuristic for memory/CPU allocation

2019-01-31 Thread Michael Trepanier
The cTAKES Default Processing Pipeline requires a minimum of about 3 GB of RAM because of the size of the embedded HSQLDB databases it uses by default. However, leaving a fair amount of headroom beyond that is generally a good idea. As for multi-threading, I have been using the ThreadSafeLvg class. Per the component-use g
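The ~3 GB per-pipeline floor can be turned into a rough sizing estimate. The sketch below shows one such back-of-the-envelope calculation. The 3 GB default comes from the message above. The 25% headroom and the 2 GB reserved for the OS are illustrative assumptions, not measured values.

```scala
object PipelineSizing {
  /** How many independent cTAKES JVMs fit on one machine under a simple memory model.
    * Only the ~3 GB floor comes from the thread; the other defaults are assumptions. */
  def pipelinesPerMachine(
      machineGb: Double,
      perPipelineGb: Double = 3.0, // approximate minimum cited for the default pipeline
      headroom: Double = 0.25,     // assumed 25% extra per JVM for GC and working set
      reservedGb: Double = 2.0     // assumed memory kept free for the OS and other processes
  ): Int = {
    val usable = math.max(0.0, machineGb - reservedGb)
    val perJvm = perPipelineGb * (1.0 + headroom)
    math.floor(usable / perJvm).toInt
  }
}
```

Under these assumptions, `pipelinesPerMachine(32)` returns 8 and `pipelinesPerMachine(16)` returns 3. Each JVM's heap (`-Xmx`) would then be set near `perPipelineGb * (1 + headroom)`.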

Temporal Relations Absent from cTAKES Output

2019-02-19 Thread Michael Trepanier
Hi, I have a pipeline defined by the below aggregate description (in Scala). aed = { builder.add(SimpleSegmentAnnotator.createAnnotatorDescription) builder.add(SentenceDetector.createAnnotatorDescription) builder.add(TokenizerAnnotatorPTB.createAnnotatorDescription) builder.add(ThreadSafeL

Processing Extraordinarily Long Documents

2019-02-27 Thread Michael Trepanier
Hi, We currently have a pipeline which is generating ontology mappings for a repository of clinical notes. However, this repository contains documents which, after RTF parsing, can contain over 900,000 characters (albeit this is a very small percentage of notes, out of ~13 million, around 50 conta

Re: Processing Extraordinarily Long Documents

2019-02-28 Thread Michael Trepanier
endently from the smaller documents. > > Best, > > Dima > > > > > On Feb 27, 2019, at 16:59, Michael Trepanier wrote: > > Hi, > > We currently have a pipeline which is generating ontology mappings for a > repository of clinical notes. However, this repos
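Dima suggests handling the very long documents separately from the smaller ones. This can be done with a simple length-based split before the main job, as in the dependency-free sketch below. `Note`, `LengthRouting` and the 12,000-character default are hypothetical choices, not cTAKES settings. The default only echoes the 12K-13K figure mentioned in the thread.

```scala
// Hypothetical record for one clinical note.
final case class Note(id: String, text: String)

object LengthRouting {
  /** Split notes into (short, long) so that long ones can go to a separate job.
    * The threshold is a hypothetical tuning parameter. */
  def splitByLength(notes: Seq[Note], maxChars: Int = 12000): (Seq[Note], Seq[Note]) =
    notes.partition(_.text.length <= maxChars)
}
```

In Spark, the same predicate could be applied with two `filter` calls on the notes dataset. The long-document job could then run with more memory per executor.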

Re: Processing Extraordinarily Long Documents

2019-02-28 Thread Michael Trepanier
curs once > documents get above 12K-13K. We also target processing as many as 10 > annotators in a single pass of the corpus. This approach has worked well > for us. > > > > Thanks, > > Ron > > > > > > > > > > *From: *Michael Trepanier > *
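One common way to keep very long notes tractable is chunking, which is not necessarily the approach described in Ron's reply. You split each note into bounded chunks at line or word boundaries and run the pipeline on each chunk. You then shift each annotation's offsets back by the chunk's start position. The sketch below is minimal. `Chunk`, `Chunker` and the boundary rule are hypothetical choices.

```scala
// Hypothetical chunk record: start offset in the original document plus its text.
final case class Chunk(offset: Int, text: String)

object Chunker {
  /** Split `doc` into chunks of at most `maxChars` characters, preferring to break
    * just after the last newline (else the last space) inside each window.
    * Concatenating the chunk texts reproduces `doc` exactly. */
  def chunk(doc: String, maxChars: Int): Vector[Chunk] = {
    require(maxChars > 0, "maxChars must be positive")
    val out = Vector.newBuilder[Chunk]
    var start = 0
    while (start < doc.length) {
      val hardEnd = math.min(start + maxChars, doc.length)
      val end =
        if (hardEnd == doc.length) hardEnd
        else {
          val window = doc.substring(start, hardEnd)
          val nl = window.lastIndexOf('\n')
          val sp = window.lastIndexOf(' ')
          val cut = if (nl > 0) nl else sp
          if (cut > 0) start + cut + 1 else hardEnd // no boundary found: hard cut
        }
      out += Chunk(start, doc.substring(start, end))
      start = end
    }
    out.result()
  }

  /** Map an annotation's (begin, end) within a chunk back to document offsets. */
  def toDocOffsets(c: Chunk, begin: Int, end: Int): (Int, Int) =
    (c.offset + begin, c.offset + end)
}
```

Chunk boundaries can still split a sentence or a multi-word concept. Breaking at newlines, or at section boundaries where they are known, reduces but does not eliminate that risk.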

Re: Temporal Relations Absent from cTAKES Output

2019-03-11 Thread Michael Trepanier
e of a DiseaseDisorderMention, calling diseaseDisorderMention.getRelativeTemporalContext always returns null. Am I missing pipeline steps that link the TemporalTextRelation instances to EventMention instances, or is it necessary to do this manually? Thanks, Mike On Tue, Feb 19, 2019 at 4:47 PM Michael Trepanier wrot