Thank you Jerry & Eddie for your responses to my previous questions. I appreciate the opportunity to learn.

Based on a little testing, I'm starting to think that AS is not designed for performance-enhancing scale-out, but rather for architectural clarity. I have a CPE with a collection reader that reads sentences from a database, and an aggregate AE consisting of the cTAKES AggregatePlaintextProcessor (using our dictionary for dictionary lookup) plus an AE that writes the concept annotations to a database. When I put these together as a CPE and run it against a test set of 2553 sentences, it takes about one minute, sometimes a few seconds less. The CpmFrame GUI indicates that the CR accounts for about 5% of the processing time and the AE for the other 95%, with the LVG annotator and the dictionary lookup each accounting for between 35% and 45%.

When I use the same CR & aggregate AE like this:

runRemoteAsyncAE.sh tcp://localhost:61616 CollectionReader \
  -c sbmi-ctsa/desc/asynchronous_scaleout/SentencesFromDBReader.xml \
  -d sbmi-ctsa/desc/asynchronous_scaleout/Deploy_DictionaryTest.xml

it takes 16 minutes to process the same 2553 sentences. Deploy_DictionaryTest.xml is the deployment descriptor; you can see its contents here: <http://pastebin.com/6nhuaC4H>.

When I deploy the AE five times with 'deployAsyncService.sh' like this:

deployAsyncService.sh \
  sbmi-ctsa/desc/asynchronous_scaleout/Deploy_DictionaryTest.xml

and then use 'runRemoteAsyncAE.sh' to connect the CR to the input queue like this:

runRemoteAsyncAE.sh tcp://localhost:61616 CollectionReader \
  -c sbmi-ctsa/desc/asynchronous_scaleout/SentencesFromDBReader.xml

it still takes 12 minutes to process the 2553 sentences. I can see from the log files that the processing is being scaled out. Given that in the CPE the CR accounted for only about 5% of the processing time, I'm skeptical that the CR has become the bottleneck, though I could be wrong. I'm just wondering whether this kind of performance difference is expected.
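
As a rough sanity check on those numbers, here is a minimal Python sketch of the throughput each configuration implies. The wall-clock times are the approximate ones quoted above (about 1, 16, and 12 minutes); the labels and the helper name `throughput` are only for illustration.

```python
# Throughput implied by the approximate timings quoted in this message.
SENTENCES = 2553

# Approximate wall-clock seconds for each configuration.
runs = {
    "CPE": 1 * 60,
    "runRemoteAsyncAE.sh with -d": 16 * 60,
    "five deployAsyncService.sh deployments": 12 * 60,
}


def throughput(sentences, seconds):
    """Return sentences processed per second."""
    return sentences / seconds


baseline = throughput(SENTENCES, runs["CPE"])
for name, seconds in runs.items():
    rate = throughput(SENTENCES, seconds)
    # The ratio shows how many times longer the run took than the CPE.
    print(f"{name}: {rate:.2f} sentences/s, {baseline / rate:.0f}x the CPE run time")
```

So the CPE handles roughly 43 sentences per second, while both AS configurations stay under 4 per second.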

Thanks,
Chuck
--
Chuck Bearden
Programmer Analyst IV
The University of Texas Health Science Center at Houston
School of Biomedical Informatics
Email: [email protected]
Phone: 713.500.9672


