Re: Running cTAKES using all cores available

Miller, Timothy Sun, 11 Oct 2015 05:13:12 -0700

Hi Frank,

I've only been able to do so using UIMA-AS (Asynchronous Scaleout). I was 
thinking of writing up a quickstart/tutorial on this, but here is the brief 
list of steps. A near-future release may offer an easier way: until now one of 
our external dependencies (LVG) was not thread-safe, but it has since been 
improved, so local multithreading could potentially be made easier. What I am 
about to describe could also be extended to multiple machines.



1) Download UIMA-AS binaries for your desired version and unpack

2) Go into that directory and start the ActiveMQ (AMQ) broker with 
bin/startBroker.{sh,bat} -- UIMA-AS uses it to set up all the necessary queues 
between readers/pipelines and between pipelines

3) Create a deployment descriptor file. Yes, another UIMA XML descriptor type. 
Luckily, if you use Eclipse this is really easy to do with the UIMA and UIMA-AS 
tooling. Otherwise you can look at the UIMA documentation [1]. For the simplest 
case of multiplying out a single pipeline, it basically just has to point to an 
analysis engine descriptor and specify a number of CASes, which will correspond 
to the number of pipelines you want to run. You can also get more complicated 
and scale out individual analysis engines within your pipeline, but I will 
leave that to you to learn more about.

4) Set up your environment variables: UIMA_HOME should point at the UIMA-AS 
download directory, and UIMA_CLASSPATH needs to list all the jars/directories 
that your analysis engines need.

5) Start up your pipelines with bin/deployAsyncService.{sh,bat}, passing it the 
deployment descriptor from step 3

6) Debug by looking in uima.log to see error messages.
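
To make step 3 concrete, here is a rough sketch of a deployment descriptor 
that scales one aggregate pipeline out to four instances. The element layout 
is modeled on the example descriptors shipped with UIMA-AS, so check it against 
the deployment descriptor documentation [1] before relying on it. The queue 
name, the descriptor path, and the instance count are made-up placeholders, and 
the broker URL assumes the broker from step 2 is running locally on the default 
ActiveMQ port.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<analysisEngineDeploymentDescription
    xmlns="http://uima.apache.org/resourceSpecifier">
  <name>cTAKES pipeline scaleout (example)</name>
  <description>Runs four copies of one aggregate pipeline</description>
  <deployment protocol="jms" provider="activemq">
    <!-- One CAS per pipeline instance, so every instance can be busy at once -->
    <casPool numberOfCASes="4"/>
    <service>
      <!-- Hypothetical queue name; assumes a local broker on the default port -->
      <inputQueue endpoint="ctakesPipelineQueue"
                  brokerURL="tcp://localhost:61616"/>
      <topDescriptor>
        <!-- Hypothetical path to your aggregate analysis engine descriptor -->
        <import location="desc/MyAggregatePipeline.xml"/>
      </topDescriptor>
      <!-- async="false" keeps the aggregate as one unit; scaleout replicates it -->
      <analysisEngine async="false">
        <scaleout numberOfInstances="4"/>
      </analysisEngine>
    </service>
  </deployment>
</analysisEngineDeploymentDescription>
```

Whatever client sends the documents would then need to use the same broker URL 
and queue name as the inputQueue element above.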


Now to get documents to your pipelines, see the API docs [2] for how to set up 
the engine information about your pipelines. Then, instead of creating a CAS 
and calling sendCAS(), you can create a collection reader in uimaFIT and call 
setCollectionReader() on the UIMA-AS engine object.


You may or may not want to go this route. It's more complicated than just 
saying "use 8 cores," but it is nice if you eventually want to set up a bunch 
of pipelines on a cluster or something.


Tim





[1] UIMA-AS deployment descriptor documentation: 
https://uima.apache.org/d/uima-as-2.6.0/uima_async_scaleout.html#ugr.ref.async.deploy

[2] UIMA-AS client API usage documentation: 
https://uima.apache.org/d/uima-as-2.6.0/uima_async_scaleout.html#ugr.ref.async.api.usage

________________________________
From: Franck Dernoncourt <[email protected]>
Sent: Saturday, October 10, 2015 7:37 PM
To: [email protected]
Subject: Running cTAKES using all cores available

Hi,

When processing several text documents with cTAKES, is there any way to use all 
the cores available on the machine? When I batch process documents using the 
CPE Configurator, cTAKES only uses one core. I read it is possible to have 
cTAKES use all cores available programmatically (e.g. 
http://ctakes.markmail.org/search/?q=list%3Aorg.apache.incubator.ctakes-user+multi#query:list%3Aorg.apache.incubator.ctakes-user%20multi+page:1+mid:7xancosdfbnmm67d+state:results),
but I wonder whether it's possible to do so through the GUI or the config 
files.

Thanks,
Franck


----
Franck Dernoncourt
[email protected]
http://francky.me
