At the VA, we use cTAKES with UIMA-AS. Here is a very simple example of how it can be implemented: http://decipher.chpc.utah.edu/gitblit/summary/?r=examples/ctakes-test.git
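For context, the UIMA-AS distribution ships command-line scripts for deploying an analysis engine as a service and for driving it from a client. A minimal sketch of that flow (the `$UIMA_HOME` path, broker URL, queue name, and descriptor file names below are illustrative assumptions, not taken from the linked repo):

```shell
#!/bin/sh
# Sketch: deploy an engine as a UIMA-AS service, then feed it documents
# with the bundled client script. Broker URL, queue name, and descriptor
# file names are illustrative assumptions; ActiveMQ must be running first.
BROKER="tcp://localhost:61616"
QUEUE="ctakesQueue"
# Deploy the service from its deployment descriptor:
echo "\$UIMA_HOME/bin/deployAsyncService.sh ctakesAggregateDeploy.xml"
# ...then send CASes to it (a collection reader supplies the documents):
echo "\$UIMA_HOME/bin/runRemoteAsyncAE.sh $BROKER $QUEUE -c notesCollectionReader.xml"
```

The commands are echoed rather than executed here, since both require a live ActiveMQ broker and a UIMA-AS install; check the script usage in your UIMA-AS version before running them.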
-- Olga

From: John Doe <lucanus...@gmail.com>
Reply-To: "user@ctakes.apache.org" <user@ctakes.apache.org>
Date: Wednesday, November 18, 2020 at 7:23 AM
To: "user@ctakes.apache.org" <user@ctakes.apache.org>
Cc: Serguei Pakhomov <pakh0...@umn.edu>, Raymond Finzel <finze...@umn.edu>
Subject: Re: Scaling out cTAKES

Thank you all for the responses. For now, I am going to learn more about how UIMA-AS works to determine whether it will work for my use case. If not, I will check out your other suggestions.

On Tue, Nov 17, 2020 at 5:22 PM Greg Silverman <g...@umn.edu> wrote:

FYI, I just doubled the number of backends and clients and increased the throughput to ~1000 docs/second. Server utilization is now only minimal. I should note that, unlike a Spark cluster, this is running on two old servers and a VM. The nice thing about Kubernetes is that you can easily scale the number of instances up or down using horizontal pod autoscaling. Plus, it's a lot easier to manage than a Spark cluster. We just started running the cTAKES pipeline on this, so it's an experiment in progress. So far, the results are very decent. I'll scale it up even more in a day or so.

Greg--

On Tue, Nov 17, 2020 at 11:10 AM Greg Silverman <g...@umn.edu> wrote:

We at the UMN NLP/IE Lab have developed NLP-ADAPT-kube to scale out four UIMA NLP annotators using Kubernetes/UIMA-AS: cTAKES, CLAMP, MetaMap (using the UIMA wrapper), and our own homegrown BioMedICUS. Our project is here: https://github.com/nlpie/nlp-adapt-kube

There are two versions: one for CPM, which includes QuickUMLS, and the other for UIMA-AS. The AS versions are under the docker folder and the argo-k8s folder and use the four engines mentioned above. There is a project wiki (but it is slightly out of date). We are in the process of working non-UIMA engines (like QuickUMLS and our new version of BioMedICUS) into the AS workflow (we're using AMQ for message queuing).
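The horizontal pod autoscaling Greg mentions can be enabled with a single command against an existing Deployment. A minimal sketch, assuming a hypothetical Deployment named `ctakes-backend` (the name and the thresholds are illustrative, not taken from the NLP-ADAPT-kube repo):

```shell
#!/bin/sh
# Sketch: put a hypothetical cTAKES backend Deployment under the control
# of the Kubernetes Horizontal Pod Autoscaler. The Deployment name and
# the min/max/CPU thresholds are illustrative assumptions.
DEPLOY="ctakes-backend"
MIN=2
MAX=12
CPU=70
# Against a real cluster you would run:
echo "kubectl autoscale deployment $DEPLOY --min=$MIN --max=$MAX --cpu-percent=$CPU"
```

With this in place, Kubernetes adds backend pods when average CPU utilization exceeds the target and removes them when load drops, which matches the "easily scale up or down" behavior described above.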
We're currently running cTAKES using the Kubernetes HPA with six backends and two clients across three compute nodes, getting very decent throughput (~150 docs/second). We could definitely scale it up even further. For a comparison of how well this scales: we were running 64 MetaMap backends with 16 clients across five compute nodes and getting ~40 docs/second on very large clinical documents (which for MetaMap is very decent).

If you're interested, we can assist with implementation. The client does require some customization based on the backend database you're using (https://github.com/nlpie/nlp-adapt-kube/tree/master/docker/as/client), but that is pretty straightforward.

Best!
Greg--

On Tue, Nov 17, 2020 at 10:47 AM John Doe <lucanus...@gmail.com> wrote:

Hello, I'm new to cTAKES and was wondering what the options are for scaling out the default clinical pipeline. I'm running it on a large number of clinical notes using runClinicalPipeline.bat and specifying the input directory with the notes. What are the best options for doing this in a more scalable way? For example, can I parallelize it with UIMA-AS? Or should I manually use multiple command prompts to run the clinical pipeline on a different set of clinical notes in parallel? I'm not sure if there is any built-in solution or community resource that uses EMR/Spark or some other method to achieve this. Thank you for your help.

--
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Department of Surgery
University of Minnesota
g...@umn.edu
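The "multiple command prompts" approach from the original question can be scripted: split the input notes round-robin into N chunk directories, then launch one clinical-pipeline run per chunk in the background. A minimal sketch (the dummy notes are created for demonstration; `$CTAKES_HOME` and the pipeline flags are assumptions to verify against your cTAKES version):

```shell
#!/bin/sh
# Sketch: split a directory of notes into N chunks for parallel runs of
# the cTAKES clinical pipeline. The dummy notes below stand in for real
# input; $CTAKES_HOME and the pipeline flags are assumptions.
set -e
N=4
mkdir -p notes
for k in 1 2 3 4 5 6 7 8; do
  echo "note $k" > "notes/n$k.txt"   # demo input only
done

# Distribute notes round-robin into chunk0..chunk3:
i=0
for f in notes/*.txt; do
  d="chunk$((i % N))"
  mkdir -p "$d"
  cp "$f" "$d/"
  i=$((i + 1))
done

# With a real install you would then launch one pipeline per chunk and
# wait for all of them (commands echoed here, not executed):
for d in chunk0 chunk1 chunk2 chunk3; do
  echo "\$CTAKES_HOME/bin/runClinicalPipeline.sh -i $d --xmiOut out_$d &"
done
echo "wait"
```

This keeps each pipeline process on a disjoint set of notes, so the runs never contend for the same files; UIMA-AS (or the Kubernetes setup described upthread) is the more scalable version of the same idea, with a broker distributing documents instead of the filesystem.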