Re: Implementation Improvements for cTAKES on top of Spark

Harsh Mishra Wed, 26 Jul 2017 11:29:47 -0700

Hi Mike,

Thanks for adding this here. I just want to know if you have found answers to
these questions.
I am facing the same issue and would like to address the same questions.

If yes, please share them with me.

Thanks,

Harsha



On 26 July 2017 at 05:23, Michael Trepanier <[email protected]> wrote:

> Hi,
>
> I am currently leveraging cTAKES inside of Apache Spark and have
> written a function that takes in a single clinical note as a string
> and does the following:
>
> 1) Sets the UMLS system properties.
> 2) Instantiates a JCAS object.
> 3) Runs the default pipeline.
> 4) (Not shown below) Grabs the annotations and places them in a JSON
> object for each note.
>
>   def generateAnnotations(paragraph: String): String = {
>     // Step 1: set the UMLS system properties
>     System.setProperty("ctakes.umlsuser", "MY_UMLS_USERNAME")
>     System.setProperty("ctakes.umlspw", "MY_UMLS_PASSWORD")
>
>     // Step 2: instantiate the JCAS object and the default pipeline description
>     val jcas = JCasFactory.createJCas("org.apache.ctakes.typesystem.types.TypeSystem")
>     val aed = ClinicalPipelineFactory.getDefaultPipeline()
>     jcas.setDocumentText(paragraph)
>     // Step 3: run the default pipeline on this note
>     SimplePipeline.runPipeline(jcas, aed)
>     ...
>
> This function is being implemented as a UDF which is applied to a
> Spark Dataframe with clinical notes in each row. I have two
> implementation questions that follow:
>
> 1) When cTAKES is being applied iteratively to clinical notes, is it
> necessary to instantiate a new JCAS object for every annotation? Or
> can the same JCAS object be utilized over and over with the document
> text being changed?
> 2) For each application of this function, the
> UmlsDictionaryLookupAnnotator has to connect to UMLS using the
> provided UMLS information. Is there any way to instead perform
> this step locally, i.e., ingest UMLS and place it in either HDFS or just
> mount it somewhere on each node? I'm worried about spamming the UMLS
> server in this step, and about how long this seems to take.
>
> Thanks,
>
> Mike
>
>
> --
>
> Mike Trepanier| Big Data Engineer | MetiStream, Inc. |
> [email protected] | 845 - 270 - 3129 (m) | www.metistream.com
>
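
For reference, the reuse pattern that question 1 asks about can be sketched
in plain Scala. This is only an illustration under stated assumptions, not a
confirmed cTAKES recommendation: FakeEngine and FakeCas are hypothetical
stand-ins (they are not cTAKES or UIMA classes) for an object that is
expensive to build and an object that is cleared between documents. In UIMA
terms they would roughly correspond to an AnalysisEngine built once (for
example with uimaFIT's AnalysisEngineFactory.createEngine) and a JCas cleared
with reset() between notes. Whether the default cTAKES pipeline behaves well
under that reuse is not verified here.

```scala
object ReuseSketch {
  // Hypothetical stand-in for a resettable CAS; NOT a UIMA class.
  final class FakeCas {
    private var text: String = ""
    def setDocumentText(t: String): Unit = { text = t }
    def documentText: String = text
    def reset(): Unit = { text = "" } // plays the role of JCas.reset()
  }

  // Hypothetical stand-in for an expensive-to-build engine; NOT a cTAKES class.
  final class FakeEngine {
    FakeEngine.builds += 1 // count how often the costly setup runs
    // Toy "annotation": count whitespace-separated tokens.
    def process(cas: FakeCas): Int =
      cas.documentText.split("\\s+").count(_.nonEmpty)
  }
  object FakeEngine { var builds: Int = 0 }

  // One holder per JVM, so one per Spark executor; lazy vals build on first use.
  object PipelineHolder {
    lazy val engine: FakeEngine = new FakeEngine
    lazy val cas: FakeCas = new FakeCas
  }

  // Body of the would-be UDF: reuse the engine and the CAS for every note.
  // A shared CAS is not thread-safe, so calls are serialized per JVM here.
  def annotate(note: String): Int = PipelineHolder.synchronized {
    val cas = PipelineHolder.cas
    cas.reset() // clear the previous note before reuse
    cas.setDocumentText(note)
    PipelineHolder.engine.process(cas)
  }

  def main(args: Array[String]): Unit = {
    val notes = Seq("chest pain on exertion", "no acute distress")
    println(notes.map(annotate))
    println(s"engine builds: ${FakeEngine.builds}")
  }
}
```

The singleton object keeps the costly setup out of the per-row path. The
synchronized block is the simplest guard when several tasks share one
executor JVM; holding one engine and CAS per thread would be an alternative
if that lock becomes a bottleneck.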
