Hi Mike,

Thanks for posting this here. I just want to know if you found the answers to these questions. I am facing the same issue and have the same questions.
If yes, please share them with me.

Thanks,
Harsha

On 26 July 2017 at 05:23, Michael Trepanier <[email protected]> wrote:
> Hi,
>
> I am currently leveraging cTAKES inside of Apache Spark and have
> written a function that takes in a single clinical note as a string
> and does the following:
>
> 1) Sets the UMLS system properties.
> 2) Instantiates a JCas object.
> 3) Runs the default pipeline.
> 4) (Not shown below) Grabs the annotations and places them in a JSON
> object for each note.
>
> def generateAnnotations(paragraph: String): String = {
>   System.setProperty("ctakes.umlsuser", "MY_UMLS_USERNAME")
>   System.setProperty("ctakes.umlspw", "MY_UMLS_PASSWORD")
>
>   var jcas = JCasFactory.createJCas("org.apache.ctakes.typesystem.types.TypeSystem")
>   var aed = ClinicalPipelineFactory.getDefaultPipeline()
>   jcas.setDocumentText(paragraph)
>   SimplePipeline.runPipeline(jcas, aed)
>   ...
>
> This function is being implemented as a UDF which is applied to a
> Spark DataFrame with clinical notes in each row. I have two
> implementation questions:
>
> 1) When cTAKES is being applied iteratively to clinical notes, is it
> necessary to instantiate a new JCas object for every note? Or
> can the same JCas object be reused over and over with only the
> document text being changed?
> 2) For each application of this function, the
> UmlsDictionaryLookupAnnotator has to connect to UMLS using the
> provided UMLS credentials. Is there any way to perform this step
> locally instead, i.e. ingest UMLS and place it in HDFS or just
> mount it somewhere on each node? I'm worried about spamming the UMLS
> server in this step, and about how long it seems to take.
>
> Thanks,
>
> Mike
>
> --
>
> Mike Trepanier | Big Data Engineer | MetiStream, Inc. |
> [email protected] | 845 - 270 - 3129 (m) | www.metistream.com
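For question 1, the pattern I would expect to help (a minimal sketch, not verified against cTAKES or Spark) is to build the expensive objects once per Spark partition and reset a single JCas between notes, instead of creating a new JCas and pipeline inside the UDF for every row. The Scala below uses only the standard library: `ReusableDoc` and `Engine` are hypothetical stand-ins for a JCas and an initialized cTAKES `AnalysisEngine`, and `annotatePartition` is a hypothetical name for the function handed to `mapPartitions`. In a real job the stand-ins would presumably map to `JCasFactory.createJCas(...)`, `AnalysisEngineFactory.createEngine(ClinicalPipelineFactory.getDefaultPipeline())`, `jcas.reset()` and `SimplePipeline.runPipeline(jcas, engine)`. If the UMLS check happens when the engine is initialized rather than per document, this would also cut the UMLS connections from one per note to one per partition.

```scala
import scala.collection.mutable.ArrayBuffer

object CtakesReuseSketch {
  // Counts constructions of the hypothetical expensive engine, so the
  // "one engine per partition" behaviour can be checked.
  var engineInits: Int = 0

  // Hypothetical stand-in for a reusable JCas: holds one document's text
  // and its annotations until reset() clears both.
  final class ReusableDoc {
    private var text: Option[String] = None
    private val annotations = ArrayBuffer.empty[String]

    def reset(): Unit = { text = None; annotations.clear() }

    def setDocumentText(t: String): Unit = {
      // Refuse to overwrite text without a reset, so a forgotten reset() fails loudly.
      require(text.isEmpty, "document text already set; call reset() first")
      text = Some(t)
    }

    def documentText: String = text.getOrElse("")
    def addAnnotation(a: String): Unit = annotations += a
    def annotationList: List[String] = annotations.toList
  }

  // Hypothetical stand-in for an initialized cTAKES AnalysisEngine; the
  // constructor represents the costly setup (dictionaries, UMLS check).
  final class Engine {
    engineInits += 1

    // Toy "annotation": lower-cased whitespace tokens.
    def process(doc: ReusableDoc): Unit =
      doc.documentText
        .split("\\s+")
        .filter(_.nonEmpty)
        .foreach(t => doc.addAnnotation(t.toLowerCase))
  }

  // Function to pass to mapPartitions: one engine and one document per
  // partition, with the document reset before every note.
  def annotatePartition(notes: Iterator[String]): Iterator[String] = {
    val engine = new Engine
    val doc = new ReusableDoc
    notes.map { note =>
      doc.reset()
      doc.setDocumentText(note)
      engine.process(doc)
      // Copy results out as an immutable JSON-like string before the next reset.
      doc.annotationList.map(a => "\"" + a + "\"").mkString("[", ",", "]")
    }
  }
}
```

Each note's annotations are copied into a fresh String before the shared document is reset. That matters because `Iterator.map` is lazy: returning the reused document itself would leave every output pointing at the same, repeatedly overwritten object. With `spark.implicits._` imported, something like `notes.as[String].mapPartitions(CtakesReuseSketch.annotatePartition)` should wire this into a Dataset, though I have not tried it.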
