Re: Implementation Improvements for cTAKES on top of Spark

Abramowitsch, Peter Fri, 28 Jul 2017 11:09:56 -0700

I've done the same thing. The CAS has a reset method that lets you
repopulate it with new patient data each time, so a single instance can be reused.
My server-side function looks like this:



  private void runPipeline(spark.Request req, spark.Response res)
      throws AnalysisEngineProcessException,
             ResourceInitializationException, SAXException, IOException {
    // Load the request body (one clinical note) into the reused JCas.
    _jcas.setDocumentText(req.body());
    _xxx.process(_jcas);
    _yyy.process(_jcas);
    res.header("Content-Type", "application/json");
    JsonCasSerializer.jsonSerialize(_jcas.getCas(),
        res.raw().getOutputStream());
    // Clear the JCas so the next request can repopulate it.
    _jcas.reset();
  }
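
A minimal sketch of the same reuse pattern, with the reset placed in a finally block so that a note which fails part-way does not leave the CAS populated for the next request. FakeCas is a hypothetical stand-in, not a UIMA class: it only imitates the rule (as I understand UIMA's behavior) that document text cannot be set again until the CAS has been reset. The class and method names are made up for illustration.

```java
public class ReuseWithReset {

    /** Hypothetical stand-in for a JCas: holds one document at a time. */
    static final class FakeCas {
        private String text;                       // null means "empty / reset"

        void setDocumentText(String t) {
            // Imitates the rule that text cannot be set twice without a reset.
            if (text != null) {
                throw new IllegalStateException("CAS not reset before reuse");
            }
            text = t;
        }

        String getDocumentText() { return text; }

        void reset() { text = null; }
    }

    // Created once and reused for every note.
    private final FakeCas cas = new FakeCas();

    /** Processes one note; the finally block resets the CAS even on failure. */
    String annotate(String note) {
        try {
            cas.setDocumentText(note);
            if (note.isEmpty()) {
                // Simulated pipeline failure.
                throw new IllegalArgumentException("empty note");
            }
            return "{\"length\": " + cas.getDocumentText().length() + "}";
        } finally {
            cas.reset();
        }
    }

    public static void main(String[] args) {
        ReuseWithReset server = new ReuseWithReset();
        System.out.println(server.annotate("chest pain"));
        try {
            server.annotate("");
        } catch (IllegalArgumentException e) {
            System.out.println("failed: " + e.getMessage());
        }
        // Still usable, because the failed call reset the CAS on its way out.
        System.out.println(server.annotate("no fever"));
    }
}
```

Without the finally block, the third call in main would hit the "not reset" check, because the failed second call would have skipped the reset.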


- Peter

From: Harsh Mishra <[email protected]>
Reply-To: [email protected]
Date: Wednesday, July 26, 2017 at 11:29 AM
To: [email protected]
Subject: Re: Implementation Improvements for cTAKES on top of Spark

Hi Mike,

Thanks for adding this here. I'd like to know whether you have found answers
to these questions.
I am facing the same issue and have the same questions.

If so, please share them with me.

Thanks,

Harsha



On 26 July 2017 at 05:23, Michael Trepanier <[email protected]> wrote:
Hi,

I am currently using cTAKES inside of Apache Spark and have
written a function that takes in a single clinical note as a string
and does the following:

1) Sets the UMLS system properties.
2) Instantiates a JCas object.
3) Runs the default pipeline.
4) (Not shown below) Grabs the annotations and places them in a JSON
object for each note.

  def generateAnnotations(paragraph: String): String = {
    System.setProperty("ctakes.umlsuser", "MY_UMLS_USERNAME")
    System.setProperty("ctakes.umlspw", "MY_UMLS_PASSWORD")

    val jcas = JCasFactory.createJCas("org.apache.ctakes.typesystem.types.TypeSystem")
    val aed = ClinicalPipelineFactory.getDefaultPipeline()
    jcas.setDocumentText(paragraph)
    SimplePipeline.runPipeline(jcas, aed)
    ...

This function is implemented as a UDF that is applied to a
Spark DataFrame with one clinical note in each row. I have two
implementation questions:

1) When cTAKES is being applied iteratively to clinical notes, is it
necessary to instantiate a new JCAS object for every annotation? Or
can the same JCAS object be reused over and over with the document
text being changed?
2) For each application of this function, the
UmlsDictionaryLookupAnnotator has to connect to UMLS using the
provided UMLS information. Is there any way to perform this step
locally instead, i.e. ingest UMLS and either place it in HDFS or just
mount it somewhere on each node? I'm worried about spamming the UMLS
server in this step, and about how long it seems to take.
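
For question 1 in a UDF setting, one possible shape is to build the expensive objects lazily, once per worker thread, and have each row reuse them. The sketch below shows only that pattern using the Java standard library. Pipeline is a hypothetical stand-in for the JCas plus analysis engine pair, and annotate is a made-up name for what the UDF body would call. It assumes one instance per thread is wanted because a CAS is not meant to be shared across threads.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class PerThreadPipeline {

    /** Counts how many times the (stand-in) expensive setup actually runs. */
    static final AtomicInteger BUILDS = new AtomicInteger();

    /** Hypothetical stand-in for the JCas plus analysis engine pair. */
    static final class Pipeline {
        Pipeline() { BUILDS.incrementAndGet(); }   // pretend this is slow

        String run(String note) { return note.toUpperCase(); }
    }

    // One Pipeline per thread, built lazily on that thread's first call.
    private static final ThreadLocal<Pipeline> PIPELINE =
            ThreadLocal.withInitial(Pipeline::new);

    /** What a UDF body could call for each row. */
    static String annotate(String note) {
        return PIPELINE.get().run(note);
    }

    public static void main(String[] args) throws InterruptedException {
        // Many notes on the main thread share a single Pipeline.
        for (int i = 0; i < 1000; i++) {
            annotate("note " + i);
        }
        // A second thread gets its own Pipeline on first use.
        Thread other = new Thread(() -> annotate("from another thread"));
        other.start();
        other.join();
        System.out.println("pipelines built: " + BUILDS.get());
    }
}
```

The setup cost is then paid per thread rather than per note; whether that also removes the repeated UMLS connection depends on where the annotator performs it, which this sketch does not model.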

Thanks,

Mike


--

Mike Trepanier | Big Data Engineer | MetiStream, Inc. |
[email protected] | 845 - 270 - 3129 (m) | www.metistream.com
