Accidentally replied off list ...

---------- Forwarded message ----------
From: Eddie Epstein <[email protected]>
Date: Tue, Aug 16, 2011 at 2:45 PM
Subject: Re: Performance of CPE/CPM vs AS
To: Charles Bearden <[email protected]>
> Thanks again for your reply. I thought that I was deploying the pipeline in
> one AS process with the first option for running it:
>
> runRemoteAsyncAE.sh tcp://localhost:61616 CollectionReader \
> -c sbmi-ctsa/desc/asynchronous_scaleout/SentencesFromDBReader.xml \
> -d sbmi-ctsa/desc/asynchronous_scaleout/Deploy_DictionaryTest.xml
>
> It looks like one process in the output of ps. I'm just surprised that the
> performance is so much slower (16x slower).

Right, all in one process, but the connection between client and service is the same one used between multiple processes, so every CAS still makes a round trip through the JMS broker.

As a quick test, create a new aggregate with these two delegates: SentencesFromDBReader.xml and SbmiUmlsSmallAggregatePlaintextProcessor.xml. Then create a deployment descriptor for this aggregate, say Deploy_OneProcessDictionaryTest.xml, and test it with:

runRemoteAsyncAE.sh tcp://localhost:61616 OneProcessQueue \
-d sbmi-ctsa/desc/asynchronous_scaleout/Deploy_OneProcessDictionaryTest.xml

Without a collection reader, runRemoteAsyncAE will send a single empty CAS to the service. This will kick off the embedded collection reader in the aggregate, and hopefully you'll see times similar to the CPE.

> To create a pipeline with an architecture like Figure 5, I would use the
> example in "4.6. Asynchronous Client API Usage Scenarios" on p. 30 of the
> uima_async_scaleout.pdf for 2.3.1?

That would be one way. The important points are 1) to send a CAS that points at some subset of the collection, and 2) to change the embedded collection reader inside the service to a CasMultiplier that can access that CAS and generate the sub-collection of CASes for the pipeline. Given these two, a static set of CASes to be sent to the service could be created and runRemoteAsyncAE used to send them.

Eddie
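The two points above (a CAS that describes a subset of the collection, and a CasMultiplier that expands it into the sub-collection) can be illustrated with a minimal, self-contained Java sketch that deliberately does not use the UIMA API. Assumptions, all hypothetical: the "subset" is modeled as a `WorkItem` holding a range of database row ids, `partition` stands in for whatever builds the static set of input CASes on the client side, and `RangeMultiplier` stands in for the CasMultiplier, following the `hasNext()`/`next()` pattern that UIMA CAS multipliers use, with plain strings in place of output CASes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class SubCollectionSketch {

    /** Hypothetical work item: what one input CAS would point at (a half-open range of DB rows). */
    static final class WorkItem {
        final int firstRow;
        final int lastRowExclusive;

        WorkItem(int firstRow, int lastRowExclusive) {
            this.firstRow = firstRow;
            this.lastRowExclusive = lastRowExclusive;
        }
    }

    /** Client side: split totalRows rows into work items of at most chunkSize rows each. */
    static List<WorkItem> partition(int totalRows, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<WorkItem> items = new ArrayList<>();
        for (int start = 0; start < totalRows; start += chunkSize) {
            items.add(new WorkItem(start, Math.min(start + chunkSize, totalRows)));
        }
        return items;
    }

    /**
     * Service side: stands in for a CasMultiplier. A real one would read the range from
     * the incoming CAS and, in next(), fill a new CAS with one sentence from the database;
     * here each "output CAS" is just a placeholder string.
     */
    static final class RangeMultiplier implements Iterator<String> {
        private final WorkItem item;
        private int current;

        RangeMultiplier(WorkItem item) {
            this.item = item;
            this.current = item.firstRow;
        }

        @Override
        public boolean hasNext() {
            return current < item.lastRowExclusive;
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return "sentence-" + current++;
        }
    }

    public static void main(String[] args) {
        List<WorkItem> items = partition(10, 4);
        System.out.println(items.size()); // prints 3
        int generated = 0;
        for (WorkItem w : items) {
            RangeMultiplier m = new RangeMultiplier(w);
            while (m.hasNext()) {
                m.next();
                generated++;
            }
        }
        System.out.println(generated); // prints 10
    }
}
```

In an actual deployment the range would have to be written into each input CAS (for example as its document text), and the multiplier's `process()` would read it back before `hasNext()`/`next()` query the database; how that is encoded is a design choice left open here.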
