The simplest way to vertically scale a Job process is to specify the analysis pipeline using core UIMA descriptors and then use --process_thread_count to say how many copies of the pipeline to deploy, each in a different thread. No UIMA-AS is involved at all. Please check out the "Raw Text Processing" sample application that comes with DUCC.
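Purely as a sketch of what that could look like (the descriptor names, paths and sizes below are placeholders, not taken from your project), a Job specification along those lines might be something like:

    # Minimal DUCC job specification sketch; names and paths are placeholders.
    # Each job process runs 5 copies of the plain core-UIMA aggregate, one per thread.
    driver_descriptor_CR=desc/MyCollectionReader.xml
    process_descriptor_AE=desc/MyAggregateAE.xml
    process_thread_count=5
    process_memory_size=4
    scheduling_class=normal

You would then submit it with something like $DUCC_HOME/bin/ducc_submit -f myjob.job; DUCC instantiates the aggregate once per thread, so the pipeline as a whole scales without any deployment descriptor.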
On Wed, Apr 29, 2015 at 12:30 AM, reshu.agarwal <[email protected]> wrote:
>
> Ohh!!! I misunderstood this. I thought this would scale both my Aggregate
> and my AEs.
>
> I want to scale the aggregate as well as the individual AEs. Is there any
> way of doing this in UIMA AS/DUCC?
>
>
> On 04/28/2015 07:14 PM, Jaroslaw Cwiklik wrote:
>
>> In an async aggregate you scale the individual AEs, not the aggregate as
>> a whole. The configuration below should do that. Are there any warnings
>> from dd2spring at startup with your configuration?
>>
>> <analysisEngine async="true">
>>     <delegates>
>>         <analysisEngine key="ChunkerDescriptor">
>>             <scaleout numberOfInstances="5" />
>>         </analysisEngine>
>>         <analysisEngine key="NEDescriptor">
>>             <scaleout numberOfInstances="5" />
>>         </analysisEngine>
>>         <analysisEngine key="StemmerDescriptor">
>>             <scaleout numberOfInstances="5" />
>>         </analysisEngine>
>>         <analysisEngine key="ConsumerDescriptor">
>>             <scaleout numberOfInstances="5" />
>>         </analysisEngine>
>>     </delegates>
>> </analysisEngine>
>>
>> Jerry
>>
>> On Tue, Apr 28, 2015 at 5:20 AM, reshu.agarwal <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I was trying to scale my processing pipeline to run in a DUCC
>>> environment with UIMA-AS process_dd. When I tried to scale using the
>>> configuration below, the threads started were not as expected:
>>>
>>> <analysisEngineDeploymentDescription
>>>         xmlns="http://uima.apache.org/resourceSpecifier">
>>>
>>>     <name>Uima v3 Deployment Descriptor</name>
>>>     <description>Deploys Uima v3 Aggregate AE using the Advanced
>>>         Fixed Flow Controller</description>
>>>
>>>     <deployment protocol="jms" provider="activemq">
>>>         <casPool numberOfCASes="5" />
>>>         <service>
>>>             <inputQueue endpoint="UIMA_Queue_test"
>>>                 brokerURL="tcp://localhost:61617?jms.useCompression=true"
>>>                 prefetch="0" />
>>>             <topDescriptor>
>>>                 <import
>>>                     location="../Uima_v3_test/desc/orkash/ae/aggregate/FlowController_Uima.xml" />
>>>             </topDescriptor>
>>>             <analysisEngine async="true" key="FlowControllerAgg"
>>>                 internalReplyQueueScaleout="10" inputQueueScaleout="10">
>>>                 <scaleout numberOfInstances="5"/>
>>>                 <delegates>
>>>                     <analysisEngine key="ChunkerDescriptor">
>>>                         <scaleout numberOfInstances="5" />
>>>                     </analysisEngine>
>>>                     <analysisEngine key="NEDescriptor">
>>>                         <scaleout numberOfInstances="5" />
>>>                     </analysisEngine>
>>>                     <analysisEngine key="StemmerDescriptor">
>>>                         <scaleout numberOfInstances="5" />
>>>                     </analysisEngine>
>>>                     <analysisEngine key="ConsumerDescriptor">
>>>                         <scaleout numberOfInstances="5" />
>>>                     </analysisEngine>
>>>                 </delegates>
>>>             </analysisEngine>
>>>         </service>
>>>     </deployment>
>>>
>>> </analysisEngineDeploymentDescription>
>>>
>>> I expected 5 threads of FlowControllerAgg, where each thread would have
>>> 5 more threads each of ChunkerDescriptor, NEDescriptor,
>>> StemmerDescriptor and ConsumerDescriptor.
>>>
>>> But I don't think that is actually happening in the case of DUCC.
>>>
>>> Thanks in advance.
>>>
>>> Reshu.
