Re: Processing Extraordinarily Long Documents

2019-02-28 Thread Michael Trepanier
once > documents get above 12K-13K. We also target processing as many as 10 > annotators in a single pass of the corpus. This approach has worked well > for us. > > > > Thanks, > > Ron > > > > > > > > > > *From: *Michael Trepanier > *Date:

Re: Processing Extraordinarily Long Documents

2019-02-28 Thread Michael Trepanier
tly from the smaller documents. > > Best, > > Dima > > > > > On Feb 27, 2019, at 16:59, Michael Trepanier wrote: > > Hi, > > We currently have a pipeline which is generating ontology mappings for a > repository of clinical notes. However, this repository

Processing Extraordinarily Long Documents

2019-02-27 Thread Michael Trepanier
Hi, We currently have a pipeline which is generating ontology mappings for a repository of clinical notes. However, this repository contains documents which, after RTF parsing, can contain over 900,000 characters (albeit this is a very small percentage of notes, out of ~13 million, around 50

cTAKES Blocking when Run in Separate JVMs

2018-12-21 Thread Michael Trepanier
Hi, I am running multiple cTAKES pipelines on a single machine in parallel, each in their own JVM. Looking across the logs of each JVM, it appears that severe blocking is occurring after the annotations are generated for a particular segment. In particular, it looks like only one JVM is

Re: How do I add a dictionary (like NCI) to cTakes lookup?

2018-08-20 Thread Michael Trepanier
Ory, In response to Gandhi's comments, the video below outlines custom dictionary creation in detail: https://www.youtube.com/watch?v=4aOnafv-NQs Best, Mike On Mon, Aug 20, 2018 at 2:09 AM, Gandhi Rajan Natarajan < gandhi.natara...@arisglobal.com> wrote: > Hi Ory, > > I guess RxNORM and

Packaging cTAKES in a Jar - LVG Related Configuration Error

2018-03-28 Thread Michael Trepanier
Hi All, I am attempting to package cTAKES in a jar while avoiding it copying the lvg related files to /tmp/ as it does in /ctakes/trunk/ctakes-lvg/src/main/java/org/apache/ctakes/lvg/ae/LvgAnnotator.java. Everything works up until cTAKES tries to path the lvg.properties file within the

Re: [EXTERNAL] Leveraging cTAKES without a UMLS Credential Check

2018-03-10 Thread Michael Trepanier
the same question by everyone a while back. From what we > understand so far it seems that this may go away once you build and load > your own dictionary vs using the default you mentioned. But we haven't > tested that yet. Lincoln > > > > *From:* Michael Trepanier [mailt

Leveraging cTAKES without a UMLS Credential Check

2018-03-09 Thread Michael Trepanier
Hi All, Is it possible to avoid the UMLS credential check each time cTAKES is run? It seems like cTAKES would be configurable in such a way to use UMLS credentials to acquire the sno_rx_16abterms dictionary once, and then not need to check against UMLS in future runs. In particular, I am

Re: Setting the Lvg Resources Location in lvg.properties

2017-09-29 Thread Michael Trepanier
ll have to revisit that when I look at that patch. > > -- James > > > > On Tue, Sep 26, 2017 at 5:53 PM, Michael Trepanier <m...@metistream.com> > wrote: > >> I am attempting to run cTAKES from an executable UberJar. While the fast >> pipeline seems to run cor

Re: cTAKES Fast Pipeline Failing

2017-09-03 Thread Michael Trepanier
in > Caused by: java.lang.StringIndexOutOfBoundsException: String index out of > range: > -7 > > Are you using cTAKES 4.0 (either from the convenience binary download or > as a maven dependency) or are you using cTAKES in some other way > > -- James > > > On Fri,

cTAKES Fast Pipeline Failing

2017-09-01 Thread Michael Trepanier
Hi All, We've been attempting to scale our cTAKES Pipeline on top of Spark, so we've switched from using the "getDefaultPipeline" method to the "getFastPipeline" method to boost the processing speed. However, while the default pipeline works fine with Spark, the fast pipeline is throwing the

Implementation Improvements for cTAKES on top of Spark

2017-07-25 Thread Michael Trepanier
Hi, I am currently leveraging cTAKES inside of Apache Spark and have written a function that takes in a single clinical note as a string and does the following: 1) Sets the UMLS system properties. 2) Instantiates a JCAS object. 3) Runs the default pipeline. 4) (Not shown below) Grabs the