Thanks so much for that clarification, Jim and Burn - all useful info as I am still learning this framework. I also discovered another parameter, ducc.threads.limit, set to 500 by default in the resources/ducc.properties file, that was also limiting my scaleout.
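For anyone else searching the archives, here is a rough sketch of what Jim's suggested changes plus the thread limit look like in ducc_runtime/resources/ducc.properties (the values are only illustrative - check the defaults in your own copy and pick a thread limit that suits your cluster):

    # take the full initial allocation instead of a small initialization cap
    ducc.rm.initialization.cap = 99
    # expand straight to the job's maximum rather than doubling each cycle
    ducc.rm.expand.by.doubling = false
    # overall thread limit; it defaults to 500 - 1000 here is just an example value
    ducc.threads.limit = 1000

followed by a bounce of the RM (or all of DUCC) as Jim describes below so the new values are picked up:

    ducc_runtime/admin/stop_ducc -c rm
    ducc_runtime/admin/start_ducc -c rm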
Thanks again,

On Sun, Oct 19, 2014 at 2:07 PM, Jim Challenger <[email protected]> wrote:

> Amit,
>     The component that decides when it's time to scale out is the DUCC
> Resource Manager, or RM.
>
>     The decision to scale out is made essentially from a combination of
> (a) how much work is left, (b) how long work items take to complete, and
> (c) how long it takes to initialize a new process. If it looks like it
> will take longer to start a new process than to just let all the work run
> out in the existing processes, no new processes are spawned. This produces
> nice behavior when there are a lot of users trying to share the resources.
>
>     If there's only "you" or a very limited quantity of resources so that
> sharing isn't really practical, you can configure the RM to scale out the
> job to its maximum immediately. If this is done the RM will just give you
> everything it can within the constraints of fair-share if there are other
> jobs or users.
>
>     Here is how to configure the RM to fully expand a job immediately -
> you need to modify ducc_runtime/resources/ducc.properties thus:
>
> 1) Change ducc.rm.initialization.cap to some high number such as 99 or 999:
>        ducc.rm.initialization.cap = 99
>    That will be the number of processes initially allocated, constrained
>    by the number of resources that exist. So if you only have resources
>    for 20 processes, you'll get 20 processes, not 99.
>
> 2) Change ducc.rm.expand.by.doubling to false:
>        ducc.rm.expand.by.doubling = false
>    If this is true, the job will double up to its maximum each scheduling
>    cycle AFTER the job has initialized. If it is false, it will expand
>    fully immediately after the first successful initialization.
>
> 3) Bounce the RM to force a re-read of the config file (or all of DUCC if
>    you prefer). To bounce only the RM:
>        a) ducc_runtime/admin/stop_ducc -c rm
>        b) ducc_runtime/admin/start_ducc -c rm
>
>     Working together, those two parameters ensure your job expands fully
> immediately.
>
>     Be aware, RM will never expand beyond physical resources, and it will
> never expand beyond the number of processes needed to run your job. For
> example, if you have 100 work items and each process runs 4 threads, you
> will never get more than 25 processes (4*25=100 work items processed
> simultaneously).
>
> Jim
>
>
> On 10/16/14 3:26 PM, Burn Lewis wrote:
>
>> If your job has unprocessed work then perhaps the unused nodes are not
>> in the scheduling class you specified, or are too small. Note that the
>> example below has all of its work either completed or active, so has no
>> work waiting to be processed.
>>
>>     State: Running  Workitems: 16  Done: 12  Error: 0  Dispatch: 4
>>     Unassigned: 0  Limbo: 0
>>
>> ~Burn
>>
>> On Thu, Oct 16, 2014 at 2:38 PM, Lou DeGenaro <[email protected]>
>> wrote:
>>
>>> Amit,
>>>
>>> DUCC should use all available resources as configured by your
>>> ducc.classes and ducc.nodes files.
>>>
>>> Lou.
>>>
>>> On Thu, Oct 16, 2014 at 11:59 AM, Amit Gupta <[email protected]>
>>> wrote:
>>>
>>>> Thanks for the clarification Burn,
>>>>
>>>> So indeed there is no way to "force" a job to scale out to the maximum
>>>> resources available?
>>>>
>>>> What I'm finding is that even though a job takes > 1 hour to complete
>>>> using 2 nodes, it doesn't use some extra available nodes which are part
>>>> of the ducc cluster.
>>>>
>>>> a. Is there no configuration option to deal with this (I'm guessing
>>>>    this requirement may have come up before)?
>>>>
>>>> b. Would you happen to know what part of UIMA code makes that decision
>>>>    (i.e. the trigger to spawn a process on a new node or not)?
>>>>
>>>> Thanks again for your help,
>>>>
>>>> Best,
>>>> Amit
>>>>
>>>> On Thu, Oct 16, 2014 at 9:32 AM, Burn Lewis <[email protected]>
>>>> wrote:
>>>>
>>>>> Yes, that parameter only limits the maximum scaleout. DUCC will ramp
>>>>> up the number of processes based on the available resources and the
>>>>> amount of work to be done. It initially starts only 1 or 2, and only
>>>>> when one initializes successfully will it start more. It may not
>>>>> start more if it suspects that all the work will be completed on the
>>>>> existing nodes before any new ones are ready.
>>>>>
>>>>> There is an additional type of scaleout, within each process,
>>>>> controlled by --process_thread_count, which controls how many threads
>>>>> in each process are capable of processing separate work items.
>>>>>
>>>>> ~Burn
>>>>>
>>>>> On Wed, Oct 15, 2014 at 7:11 PM, Amit Gupta <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I've been trying to find the options related to configuration of
>>>>>> scaleout of a ducc job.
>>>>>>
>>>>>> Thus far the only ones I've found are:
>>>>>>
>>>>>> process_deployments_max:
>>>>>> which limits the maximum number of processes spawned by a ducc job.
>>>>>>
>>>>>> At what point does DUCC decide to spawn a new process or spread
>>>>>> processing out to a new node? Is there a tuning parameter for an
>>>>>> optimal number of work items per process spawned? Can the user
>>>>>> control this behavior?
>>>>>>
>>>>>> For example,
>>>>>> I have a job large enough that DUCC natively spreads it across 2
>>>>>> nodes. I haven't been able to force this job, via a config parameter,
>>>>>> to spread across 4 nodes (or "X" nodes) for faster processing times.
>>>>>>
>>>>>> Does anyone know if there's a parameter that can directly control
>>>>>> scaleout in this manner?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> --
>>>>>> Amit Gupta
>>>>
>>>> --
>>>> Amit Gupta

--
Amit Gupta
