Hi Franck,
I've only been able to do so using UIMA-AS (Asynchronous Scaleout). I was
thinking of writing up a quickstart/tutorial on this but here is the brief list
of steps. In a near-future release there may be an easier way: up until now one
of our external dependencies (LVG) was not thread-safe, but it has been
improved, so local multithreading could potentially be made easier. What I am
about to describe could also be extended to multiple machines.
1) Download the UIMA-AS binaries for your desired version and unpack them.
2) Go into that directory and start the ActiveMQ (AMQ) broker with
bin/startBroker.{sh,bat} -- UIMA uses this to set up all the necessary queues
between readers/pipelines and between pipelines.
3) Create a deployment descriptor file. Yes, another UIMA XML descriptor type.
But luckily, if you use Eclipse, this is really easy to do with the UIMA and
UIMA-AS tooling. Otherwise you can look at the UIMA documentation [1]. For the
simplest case of multiplying out a single pipeline, it basically just has to
point to an analysis engine descriptor and specify a number of CASes, which
will correspond to the number of pipelines you want to run. You can also get
more complicated and scale out individual analysis engines within your
pipeline, but I will leave that to you to learn more about.
4) Set up your environment variables: UIMA_HOME should point at the UIMA-AS
download directory, and UIMA_CLASSPATH needs to include all the
jars/directories that your analysis engines need.
5) Start up your pipelines with bin/deployAsyncService.{sh,bat}
6) Debug by looking in uima.log to see error messages.
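
To make step 3 a bit more concrete, here is a rough sketch of what a minimal
deployment descriptor could look like, written as a small Java helper that
prints the XML. This is not taken from cTAKES. The class name, the queue name
"ctakesQueue", and the path "desc/MyPipeline.xml" are made up for illustration,
and tcp://localhost:61616 is assumed to be the broker URL (the usual ActiveMQ
default). The element names are as I recall them from the UIMA-AS
documentation, so please check them against [1] before relying on them.

```java
public class DeploymentDescriptorSketch {

    /**
     * Renders a minimal UIMA-AS deployment descriptor as a string.
     * Inputs are inserted as-is (no XML escaping), so keep them free of
     * characters such as '&' and '<'.
     */
    static String render(String aeDescriptorPath, String queueName,
                         String brokerUrl, int pipelines) {
        if (pipelines < 1) {
            throw new IllegalArgumentException("pipelines must be at least 1");
        }
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<analysisEngineDeploymentDescription\n");
        xml.append("    xmlns=\"http://uima.apache.org/resourceSpecifier\">\n");
        xml.append("  <name>Scaled-out pipeline</name>\n");
        xml.append("  <deployment protocol=\"jms\" provider=\"activemq\">\n");
        // One CAS per pipeline instance, matching step 3 above.
        xml.append("    <casPool numberOfCASes=\"").append(pipelines).append("\"/>\n");
        xml.append("    <service>\n");
        xml.append("      <inputQueue endpoint=\"").append(queueName)
           .append("\" brokerURL=\"").append(brokerUrl).append("\"/>\n");
        xml.append("      <topDescriptor>\n");
        xml.append("        <import location=\"").append(aeDescriptorPath).append("\"/>\n");
        xml.append("      </topDescriptor>\n");
        // async="false" treats the whole pipeline as one unit;
        // scaleout then asks for that many copies of it.
        xml.append("      <analysisEngine async=\"false\">\n");
        xml.append("        <scaleout numberOfInstances=\"").append(pipelines).append("\"/>\n");
        xml.append("      </analysisEngine>\n");
        xml.append("    </service>\n");
        xml.append("  </deployment>\n");
        xml.append("</analysisEngineDeploymentDescription>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        // Hypothetical values: four pipelines behind one queue on a local broker.
        System.out.print(render("desc/MyPipeline.xml", "ctakesQueue",
                                "tcp://localhost:61616", 4));
    }
}
```

Redirect the output to a file and pass that file to
bin/deployAsyncService.{sh,bat} in step 5.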
Now to get documents to your pipelines, see the API docs [2] for how to set up
the engine information about your pipelines. Then, instead of creating a CAS
and calling sendCAS(), you can create a collection reader in uimaFIT and
call setCollectionReader() on the UIMA-AS engine object.
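
A rough sketch of that client side, assuming the UIMA-AS and uimaFIT 2.x jars
are on the classpath and the broker and service from the steps above are
running. The class name, the queue name "ctakesQueue", and the reader
descriptor path "desc/MyCollectionReader.xml" are placeholders I made up. The
queue name has to match whatever your deployment descriptor uses. Treat this
as a starting point and check the calls against [2] for your UIMA-AS version.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.uima.aae.client.UimaAsynchronousEngine;
import org.apache.uima.adapter.jms.client.BaseUIMAAsynchronousEngine_impl;
import org.apache.uima.collection.CollectionReader;
import org.apache.uima.fit.factory.CollectionReaderFactory;

public class UimaAsClientSketch {

    /** Builds the application context that tells the client where the service lives. */
    static Map<String, Object> buildContext(String brokerUrl, String queueName,
                                            int casPoolSize) {
        Map<String, Object> ctx = new HashMap<String, Object>();
        ctx.put(UimaAsynchronousEngine.ServerUri, brokerUrl);
        ctx.put(UimaAsynchronousEngine.ENDPOINT, queueName);
        // Number of CASes the client keeps in flight at once.
        ctx.put(UimaAsynchronousEngine.CasPoolSize, casPoolSize);
        return ctx;
    }

    public static void main(String[] args) throws Exception {
        UimaAsynchronousEngine engine = new BaseUIMAAsynchronousEngine_impl();

        // Placeholder descriptor path: point this at your own collection reader.
        CollectionReader reader =
            CollectionReaderFactory.createReaderFromPath("desc/MyCollectionReader.xml");

        try {
            // Register the reader before initializing the client.
            engine.setCollectionReader(reader);
            engine.initialize(buildContext("tcp://localhost:61616", "ctakesQueue", 4));
            // Pulls documents from the reader and sends them to the queue;
            // returns once the whole collection has been processed.
            engine.process();
        } finally {
            engine.stop();
        }
    }
}
```

The CAS pool size of 4 here is only an example. Matching it to the number of
pipelines you deployed seems like a reasonable place to start.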
You may or may not want to go this route -- it's more complicated than just
saying "use 8 cores," but it is nice if you want to eventually set up a bunch of
pipelines on a cluster or something.
Tim
[1] UIMA-AS deployment descriptor documentation:
https://uima.apache.org/d/uima-as-2.6.0/uima_async_scaleout.html#ugr.ref.async.deploy
[2] UIMA-AS API usage documentation:
https://uima.apache.org/d/uima-as-2.6.0/uima_async_scaleout.html#ugr.ref.async.api.usage
________________________________
From: Franck Dernoncourt <[email protected]>
Sent: Saturday, October 10, 2015 7:37 PM
To: [email protected]
Subject: Running cTAKES using all cores available
Hi,
When processing several text documents with cTAKES, is there any way to use all
the cores available on the machine? When I batch process documents using the
CPE Configurator, cTAKES only uses one core. I read it is possible to have
cTAKES use all cores available programmatically (e.g.
http://ctakes.markmail.org/search/?q=list%3Aorg.apache.incubator.ctakes-user+multi#query:list%3Aorg.apache.incubator.ctakes-user%20multi+page:1+mid:7xancosdfbnmm67d+state:results),
but I wonder whether it's possible to do so through the GUI or the config
files.
Thanks,
Franck
----
Franck Dernoncourt
[email protected]
http://francky.me