The CPE runs everything in the same process. UIMA AS could deploy this
pipeline in one process and get the same performance as the CPE.

Scaling out to multiple processes incurs overhead, which for UIMA AS
essentially consists of CAS serialization and communication. The
configuration shown in Figure 5 on
http://uima.apache.org/doc-uimaas-what.html will have much lower
overhead for this scenario.
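
A rough way to see how large that overhead is in the runs quoted below
is to divide each reported run time by the number of sentences. This is
only a sketch: it assumes one CAS per sentence and takes "about one
minute" as 60 seconds, and the names in it are just illustrative.

```python
# Wall-clock figures as reported in the quoted message; nothing is measured here.
SENTENCES = 2553  # assumption: each sentence is processed as one CAS

runs_seconds = {
    "CPE, single process": 1 * 60,
    "runRemoteAsyncAE.sh with -d": 16 * 60,
    "five deployAsyncService.sh instances": 12 * 60,
}


def per_cas_ms(total_seconds, n=SENTENCES):
    """Average wall-clock milliseconds per CAS for a whole run."""
    return 1000.0 * total_seconds / n


baseline = per_cas_ms(runs_seconds["CPE, single process"])
for name, seconds in runs_seconds.items():
    ms = per_cas_ms(seconds)
    print(f"{name}: {ms:.1f} ms per CAS ({ms - baseline:.1f} ms above the CPE)")
```

Since the annotators are the same in all three runs, the extra time per
CAS is a first estimate of what serialization, communication, and
queueing cost in each setup. For the five-instance run it is elapsed
time per CAS, not time spent inside any one service.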

Eddie

On Tue, Aug 16, 2011 at 11:48 AM, Charles Bearden
<[email protected]> wrote:
> Thank you Jerry & Eddie for your responses to my previous questions. I
> appreciate the opportunity to learn.
>
> Based on a little testing, I'm starting to think that AS is not designed for
> performance-enhancing scale-out, but maybe rather for architectural clarity.
> I have a CPE that has a collection reader that reads sentences from a
> database, and an aggregate AE that is the cTAKES AggregatePlaintextProcessor
> (using our dictionary for dictionary lookup) plus an AE that writes the
> concept annotations to a database. When I put these together as a CPE and
> run it against a test set of 2553 sentences, it takes about one minute,
> sometimes a few seconds less. The CpmFrame GUI indicates that the CR
> accounts for about 5% of the processing time, and the AE for the other 95%,
> with the LVG annotator & dictionary lookup each accounting for between
> 35%-45%.
>
> When I use the same CR & aggregate AE like this:
>
> runRemoteAsyncAE.sh tcp://localhost:61616 CollectionReader \
>  -c sbmi-ctsa/desc/asynchronous_scaleout/SentencesFromDBReader.xml \
>  -d sbmi-ctsa/desc/asynchronous_scaleout/Deploy_DictionaryTest.xml
>
> it takes 16 minutes to process the same 2553 sentences.
> Deploy_DictionaryTest.xml is the deployment descriptor; you can see its
> contents here: <http://pastebin.com/6nhuaC4H>.
>
> When I deploy the AE five times with 'deployAsyncService.sh' like this:
>
> deployAsyncService.sh \
>  sbmi-ctsa/desc/asynchronous_scaleout/Deploy_DictionaryTest.xml
>
> and then use 'runRemoteAsyncAE.sh' to connect the CR to the input queue like
> this:
>
> runRemoteAsyncAE.sh tcp://localhost:61616 CollectionReader \
>  -c sbmi-ctsa/desc/asynchronous_scaleout/SentencesFromDBReader.xml
>
> it still takes 12 minutes to process the 2553 sentences. I can see from the
> log files that the processing is being scaled out. Given that in the CPE the
> CASes spent only 5% of their time in the CR, I'm skeptical it has become the
> bottleneck, though I could be wrong. I'm just wondering if this kind of
> performance difference is expected.
>
> Thanks,
> Chuck
> --
> Chuck Bearden
> Programmer Analyst IV
> The University of Texas Health Science Center at Houston
> School of Biomedical Informatics
> Email: [email protected]
> Phone: 713.500.9672
>
>
>
