Accidentally replied off list ...

---------- Forwarded message ----------
From: Eddie Epstein
Date: Tue, Aug 16, 2011 at 2:45 PM
Subject: Re: Performance of CPE/CPM vs AS
To: Charles Bearden


> Thanks again for your reply. I thought that I was deploying the pipeline in
> one AS process with the first option for running it:
>
> runRemoteAsyncAE.sh tcp://localhost:61616 CollectionReader \
>  -c sbmi-ctsa/desc/asynchronous_scaleout/SentencesFromDBReader.xml \
>  -d sbmi-ctsa/desc/asynchronous_scaleout/Deploy_DictionaryTest.xml
>
> It looks like one process in the output of ps. I'm just surprised that the
> performance is so much slower (16x slower).

Right, it is all in one process, but the client still talks to the
service over the same broker connection that would be used between
separate processes. As a quick test, create a new aggregate with these
two delegates: SentencesFromDBReader.xml and
SbmiUmlsSmallAggregatePlaintextProcessor.xml. Then create a deployment
descriptor for this aggregate, say
Deploy_OneProcessDictionaryTest.xml, and test it with:

runRemoteAsyncAE.sh tcp://localhost:61616 OneProcessQueue \
-d sbmi-ctsa/desc/asynchronous_scaleout/Deploy_OneProcessDictionaryTest.xml

Without a collection reader, runRemoteAsyncAE will send a single empty
CAS to the service. This will kick off the embedded collection reader
in the aggregate, and hopefully you'll see times similar to the CPE.
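
To make that flow concrete, here is a rough plain-Java model of what the
one-process aggregate does with the single empty CAS. It is an analogy,
not the UIMA API: ToyCas, EmbeddedReader and OneProcessAggregate are
hypothetical names, the reader's process/hasNext/next methods only mimic
the shape of UIMA's JCasMultiplier_ImplBase, and a list of strings
stands in for the database rows.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a CAS: one document's text plus results added to it.
final class ToyCas {
    String documentText = "";                          // stays empty for the seed CAS
    final List<String> annotations = new ArrayList<>();
}

// Hypothetical model of a collection reader recast as a CAS multiplier. It follows
// the process/hasNext/next shape of UIMA's JCasMultiplier_ImplBase but is plain Java.
// The seed CAS carries no data; receiving it is what starts the iteration.
final class EmbeddedReader {
    private final List<String> rows;                   // stands in for the DB query results
    private Iterator<String> cursor;

    EmbeddedReader(List<String> rows) { this.rows = rows; }

    void process(ToyCas seed) { cursor = rows.iterator(); }

    boolean hasNext() { return cursor != null && cursor.hasNext(); }

    ToyCas next() {
        ToyCas child = new ToyCas();
        child.documentText = cursor.next();
        return child;
    }
}

// Hypothetical one-process aggregate: each child CAS goes from the reader to the
// downstream delegate by a plain method call, with no broker hop per CAS.
final class OneProcessAggregate {
    static List<ToyCas> run(EmbeddedReader reader, ToyCas seed) {
        List<ToyCas> processed = new ArrayList<>();
        reader.process(seed);                          // the single empty CAS kicks things off
        while (reader.hasNext()) {
            ToyCas cas = reader.next();
            // Stand-in for the dictionary delegate: record each lower-cased token.
            for (String token : cas.documentText.split("\\s+")) {
                if (!token.isEmpty()) cas.annotations.add(token.toLowerCase());
            }
            processed.add(cas);
        }
        return processed;
    }

    public static void main(String[] args) {
        EmbeddedReader reader = new EmbeddedReader(List.of("Aspirin relieves pain", "Fever resolved"));
        List<ToyCas> out = run(reader, new ToyCas());
        System.out.println(out.size() + " CASes processed from one empty seed CAS");
    }
}
```

The thing the model is meant to highlight is where the loop lives: the
reader and the downstream delegate share one loop inside the service, so
in this model the child CASes never leave that loop.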

> To create a pipeline with an architecture like Figure 5, I would use the
> example in "4.6. Asynchronous Client API Usage Scenarios" on p. 30 of the
> uima_async_scaleout.pdf for 2.3.1?

That would be one way. The important points are 1) to send a CAS that
points at some subset of the collection, and 2) to change the embedded
collection reader inside the service to a CasMultiplier that can read
that CAS and generate the sub-collection of CASes for the pipeline.
Given these two, a static set of CASes could be created and
runRemoteAsyncAE used to send them to the service.
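
A rough sketch of those two points, again as a plain-Java model rather
than UIMA code. WorkItem, WorkItemPlanner and RangeMultiplier are
hypothetical names, and the sketch assumes the collection can be
addressed by a numeric row id, so that "some subset" is a half-open
range [firstRow, endRow). In a real service the work item would be a CAS
and the multiplier would extend something like JCasMultiplier_ImplBase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical "work item": the CAS that points at a subset of the collection.
// Here the subset is assumed to be the half-open row-id range [firstRow, endRow).
final class WorkItem {
    final int firstRow;
    final int endRow;

    WorkItem(int firstRow, int endRow) { this.firstRow = firstRow; this.endRow = endRow; }
}

// Point 1: build the static set of work items that the client would send.
final class WorkItemPlanner {
    // Splits rows [0, totalRows) into consecutive ranges of at most batchSize rows.
    static List<WorkItem> plan(int totalRows, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<WorkItem> items = new ArrayList<>();
        for (int start = 0; start < totalRows; start += batchSize) {
            items.add(new WorkItem(start, Math.min(start + batchSize, totalRows)));
        }
        return items;
    }
}

// Point 2: a CAS-multiplier-shaped model of the service-side reader. process() reads
// the range out of the incoming work item; hasNext()/next() then emit one child per row.
final class RangeMultiplier {
    private int nextRow;
    private int endRow;

    void process(WorkItem item) { nextRow = item.firstRow; endRow = item.endRow; }

    boolean hasNext() { return nextRow < endRow; }

    // Returns the row id; a real multiplier would fetch that row and fill a new CAS.
    int next() { return nextRow++; }

    public static void main(String[] args) {
        RangeMultiplier multiplier = new RangeMultiplier();
        for (WorkItem item : WorkItemPlanner.plan(10, 4)) {   // [0,4) [4,8) [8,10)
            multiplier.process(item);
            int children = 0;
            while (multiplier.hasNext()) { multiplier.next(); children++; }
            System.out.println("[" + item.firstRow + ", " + item.endRow + ") -> " + children + " CASes");
        }
    }
}
```

Because each work item only names a range, the messages the client sends
stay tiny; the actual database reads happen inside the service.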

Eddie
