At the VA, we use cTAKES with UIMA-AS. Here is a very simple example of how it can be implemented: http://decipher.chpc.utah.edu/gitblit/summary/?r=examples/ctakes-test.git
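For context, the UIMA-AS distribution ships command-line scripts for deploying an analysis engine as a service and for driving it from a client. A minimal sketch of that flow (the `$UIMA_HOME` path, broker URL, queue name, and descriptor file names below are illustrative assumptions, not taken from the linked repo):

```shell
#!/bin/sh
# Sketch: deploy an engine as a UIMA-AS service, then feed it documents
# with the bundled client script. Broker URL, queue name, and descriptor
# file names are illustrative assumptions; ActiveMQ must be running first.
BROKER="tcp://localhost:61616"
QUEUE="ctakesQueue"
# Deploy the service from its deployment descriptor:
echo "\$UIMA_HOME/bin/deployAsyncService.sh ctakesAggregateDeploy.xml"
# ...then send CASes to it (a collection reader supplies the documents):
echo "\$UIMA_HOME/bin/runRemoteAsyncAE.sh $BROKER $QUEUE -c notesCollectionReader.xml"
```

The commands are echoed rather than executed here, since both require a live ActiveMQ broker and a UIMA-AS install; check the script usage in your UIMA-AS version before running them.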
-- Olga

From: John Doe <lucanus...@gmail.com>
Reply-To: "user@ctakes.apache.org" <user@ctakes.apache.org>
Date: Wednesday, November 18, 2020 at 7:23 AM
To: "user@ctakes.apache.org" <user@ctakes.apache.org>
Cc: Serguei Pakhomov <pakh0...@umn.edu>, Raymond Finzel <finze...@umn.edu>
Subject: Re: Scaling out cTAKES

Thank you all for the responses. For now, I am going to learn more about how UIMA-AS works to determine whether it will work for my use case. If not, I will check out your other suggestions.

On Tue, Nov 17, 2020 at 5:22 PM Greg Silverman <g...@umn.edu> wrote:

FYI, I just doubled the number of backends and clients and increased the throughput to ~1000 docs/second. Server utilization is now only minimal. I should note that, unlike a Spark cluster, this is running on two old servers and a VM. The nice thing about Kubernetes is that you can easily scale the number of instances up or down using horizontal pod autoscaling. Plus, it's a lot easier to manage than a Spark cluster. We just started running the cTAKES pipeline on this, so it's an experiment in progress. So far, the results are very decent. I'll scale it up even more in a day or so.

Greg--

On Tue, Nov 17, 2020 at 11:10 AM Greg Silverman <g...@umn.edu> wrote:

We at the UMN NLP/IE Lab have developed NLP-ADAPT-kube to scale out four UIMA NLP annotators using Kubernetes/UIMA-AS: cTAKES, CLAMP, MetaMap (using the UIMA wrapper), and our own homegrown BioMedICUS. Our project is here: https://github.com/nlpie/nlp-adapt-kube

There are two versions: one for CPM, which includes QuickUMLS, and the other for UIMA-AS. The AS versions are under the docker folder and the argo-k8s folder and use the four engines mentioned above. There is a project wiki (but it is slightly out of date). We are in the process of working non-UIMA engines (like QuickUMLS and our new version of BioMedICUS) into the AS workflow (we're using AMQ for message queuing).
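The horizontal pod autoscaling Greg mentions can be enabled with a single command against an existing Deployment. A minimal sketch, assuming a hypothetical Deployment named `ctakes-backend` (the name and the thresholds are illustrative, not taken from the NLP-ADAPT-kube repo):

```shell
#!/bin/sh
# Sketch: put a hypothetical cTAKES backend Deployment under the control
# of the Kubernetes Horizontal Pod Autoscaler. The Deployment name and
# the min/max/CPU thresholds are illustrative assumptions.
DEPLOY="ctakes-backend"
MIN=2
MAX=12
CPU=70
# Against a real cluster you would run:
echo "kubectl autoscale deployment $DEPLOY --min=$MIN --max=$MAX --cpu-percent=$CPU"
```

With this in place, Kubernetes adds backend pods when average CPU utilization exceeds the target and removes them when load drops, which matches the "easily scale up or down" behavior described above.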
We're currently running cTAKES using the Kubernetes HPA with six backends and two clients across three compute nodes, getting very decent throughput (~150 docs/second). We could definitely scale it up even further. For a comparison of how well this scales: we were running 64 MetaMap backends with 16 clients across five compute nodes and getting ~40 docs/second on very large clinical documents (which for MetaMap is very decent).

If you're interested, we can assist with implementation. The client does require some customization based on the backend database you're using (https://github.com/nlpie/nlp-adapt-kube/tree/master/docker/as/client), but that is pretty straightforward.

Best!
Greg--

On Tue, Nov 17, 2020 at 10:47 AM John Doe <lucanus...@gmail.com> wrote:

Hello, I'm new to cTAKES and was wondering what the options are for scaling out the default clinical pipeline. I'm running it on a large number of clinical notes using runClinicalPipeline.bat and specifying the input directory with the notes. What are the best options for doing this in a more scalable way? For example, can I parallelize it with UIMA-AS? Or should I manually use multiple command prompts to run the clinical pipeline on a different set of clinical notes in parallel? I'm not sure if there is any built-in solution or community resource that uses EMR/Spark or some other method to achieve this. Thank you for your help.

--
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Department of Surgery
University of Minnesota
g...@umn.edu
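The "multiple command prompts" approach from the original question can be scripted: split the input notes round-robin into N chunk directories, then launch one clinical-pipeline run per chunk in the background. A minimal sketch (the dummy notes are created for demonstration; `$CTAKES_HOME` and the pipeline flags are assumptions to verify against your cTAKES version):

```shell
#!/bin/sh
# Sketch: split a directory of notes into N chunks for parallel runs of
# the cTAKES clinical pipeline. The dummy notes below stand in for real
# input; $CTAKES_HOME and the pipeline flags are assumptions.
set -e
N=4
mkdir -p notes
for k in 1 2 3 4 5 6 7 8; do
  echo "note $k" > "notes/n$k.txt"   # demo input only
done

# Distribute notes round-robin into chunk0..chunk3:
i=0
for f in notes/*.txt; do
  d="chunk$((i % N))"
  mkdir -p "$d"
  cp "$f" "$d/"
  i=$((i + 1))
done

# With a real install you would then launch one pipeline per chunk and
# wait for all of them (commands echoed here, not executed):
for d in chunk0 chunk1 chunk2 chunk3; do
  echo "\$CTAKES_HOME/bin/runClinicalPipeline.sh -i $d --xmiOut out_$d &"
done
echo "wait"
```

This keeps each pipeline process on a disjoint set of notes, so the runs never contend for the same files; UIMA-AS (or the Kubernetes setup described upthread) is the more scalable version of the same idea, with a broker distributing documents instead of the filesystem.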