The simplest way to vertically scale a Job process is to specify the analysis pipeline using core UIMA descriptors and then use --process_thread_count to say how many copies of the pipeline to deploy, each in a different thread. No UIMA-AS is involved at all. Please check out the "Raw Text Processing" sample application that comes with DUCC.
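Purely as a sketch of what that could look like (the descriptor names, paths and sizes below are placeholders, not taken from your project), a Job specification along those lines might be something like:

    # Minimal DUCC job specification sketch; names and paths are placeholders.
    # Each job process runs 5 copies of the plain core-UIMA aggregate, one per thread.
    driver_descriptor_CR=desc/MyCollectionReader.xml
    process_descriptor_AE=desc/MyAggregateAE.xml
    process_thread_count=5
    process_memory_size=4
    scheduling_class=normal

You would then submit it with something like $DUCC_HOME/bin/ducc_submit -f myjob.job; DUCC instantiates the aggregate once per thread, so the pipeline as a whole scales without any deployment descriptor.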
On Wed, Apr 29, 2015 at 12:30 AM, reshu.agarwal <[email protected]> wrote:
>
> Ohh!!! I misunderstood this. I thought this would scale both my Aggregate
> and my AEs.
>
> I want to scale the aggregate as well as the individual AEs. Is there any
> way of doing this in UIMA AS/DUCC?
>
>
> On 04/28/2015 07:14 PM, Jaroslaw Cwiklik wrote:
>
>> In an async aggregate you scale the individual AEs, not the aggregate as
>> a whole. The configuration below should do that. Are there any warnings
>> from dd2spring at startup with your configuration?
>>
>> <analysisEngine async="true">
>>     <delegates>
>>         <analysisEngine key="ChunkerDescriptor">
>>             <scaleout numberOfInstances="5" />
>>         </analysisEngine>
>>         <analysisEngine key="NEDescriptor">
>>             <scaleout numberOfInstances="5" />
>>         </analysisEngine>
>>         <analysisEngine key="StemmerDescriptor">
>>             <scaleout numberOfInstances="5" />
>>         </analysisEngine>
>>         <analysisEngine key="ConsumerDescriptor">
>>             <scaleout numberOfInstances="5" />
>>         </analysisEngine>
>>     </delegates>
>> </analysisEngine>
>>
>> Jerry
>>
>> On Tue, Apr 28, 2015 at 5:20 AM, reshu.agarwal <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I was trying to scale my processing pipeline to run in a DUCC
>>> environment with UIMA-AS process_dd. When I tried to scale using the
>>> configuration below, the threads started were not as expected:
>>>
>>> <analysisEngineDeploymentDescription
>>>         xmlns="http://uima.apache.org/resourceSpecifier">
>>>
>>>     <name>Uima v3 Deployment Descriptor</name>
>>>     <description>Deploys Uima v3 Aggregate AE using the Advanced
>>>         Fixed Flow Controller</description>
>>>
>>>     <deployment protocol="jms" provider="activemq">
>>>         <casPool numberOfCASes="5" />
>>>         <service>
>>>             <inputQueue endpoint="UIMA_Queue_test"
>>>                 brokerURL="tcp://localhost:61617?jms.useCompression=true"
>>>                 prefetch="0" />
>>>             <topDescriptor>
>>>                 <import
>>>                     location="../Uima_v3_test/desc/orkash/ae/aggregate/FlowController_Uima.xml" />
>>>             </topDescriptor>
>>>             <analysisEngine async="true" key="FlowControllerAgg"
>>>                 internalReplyQueueScaleout="10" inputQueueScaleout="10">
>>>                 <scaleout numberOfInstances="5"/>
>>>                 <delegates>
>>>                     <analysisEngine key="ChunkerDescriptor">
>>>                         <scaleout numberOfInstances="5" />
>>>                     </analysisEngine>
>>>                     <analysisEngine key="NEDescriptor">
>>>                         <scaleout numberOfInstances="5" />
>>>                     </analysisEngine>
>>>                     <analysisEngine key="StemmerDescriptor">
>>>                         <scaleout numberOfInstances="5" />
>>>                     </analysisEngine>
>>>                     <analysisEngine key="ConsumerDescriptor">
>>>                         <scaleout numberOfInstances="5" />
>>>                     </analysisEngine>
>>>                 </delegates>
>>>             </analysisEngine>
>>>         </service>
>>>     </deployment>
>>>
>>> </analysisEngineDeploymentDescription>
>>>
>>> I expected 5 threads of FlowControllerAgg, where each thread would have
>>> 5 more threads each of ChunkerDescriptor, NEDescriptor,
>>> StemmerDescriptor and ConsumerDescriptor.
>>>
>>> But I don't think that is actually happening in the case of DUCC.
>>>
>>> Thanks in advance.
>>>
>>> Reshu.
