Amit,
The component that decides when it's time to scale out is the DUCC Resource Manager, or RM.

The decision to scale out is based essentially on a combination of (a) how much work is left, (b) how long work items take to complete, and (c) how long it takes to initialize a new process. If it looks like it will take longer to start a new process than to just let all the work run out in the existing processes, no new processes are spawned. This produces good behavior when many users are trying to share the resources.
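
For illustration (the numbers here are made up): say 20 work items remain, you already have 2 processes running 4 threads each (8 items in flight at a time), and items average 30 seconds, so the remaining work drains in roughly a minute and a half. If a new process needs 3 minutes to initialize, starting one gains you nothing, so the RM doesn't.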

If it's only "you", or resources are so limited that sharing isn't really practical, you can configure the RM to scale a job out to its maximum immediately. If you do this, the RM will still give you everything it can within the constraints of fair-share when there are other jobs or users.

Here is how to configure the RM to fully expand a job immediately. You need to modify ducc_runtime/resources/ducc.properties as follows: 1) Change ducc.rm.initialization.cap to some high number such as 99 or 999:
    ducc.rm.initialization.cap = 99
That will be the number of processes initially allocated, constrained by the number of resources that exist. So if you only have resources for 20 processes, you'll get 20 processes, not 99.

2) Change ducc.rm.expand.by.doubling to false:
  ducc.rm.expand.by.doubling = false
If this is true, the job will double up to its maximum each scheduling cycle AFTER the job has initialized. If it is false, the job expands fully immediately after the first successful initialization.
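
To illustrate: a job that could ultimately use 16 processes and starts with 2 would grow roughly 2 -> 4 -> 8 -> 16 over successive scheduling cycles with doubling on; with doubling off it goes from 2 straight to 16 (subject, as always, to fair-share and available resources) once the first process initializes successfully.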

3) Bounce the RM to force a re-read of the config file (or restart all of DUCC if you prefer). To bounce only the RM:
   a) ducc_runtime/admin/stop_ducc -c rm
   b) ducc_runtime/admin/start_ducc -c rm
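(To confirm the RM came back up, I believe ducc_runtime/admin/check_ducc will show it running again; the web server's daemons page should too, if I remember right.)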

Working together, those two parameters ensure your job expands fully immediately.

Be aware that the RM will never expand beyond the physical resources, and it will never expand beyond the number of processes needed to run your job. For example, if you have 100 work items and each process runs 4 threads, you will never get more than 25 processes (4*25 = 100 work items processed simultaneously).
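
Just to tie the submit-side options Burn mentioned to the arithmetic above, a sketch (the paths and job name are illustrative; the two options are the ones discussed earlier in this thread):

    ducc_runtime/bin/ducc_submit --specification my.job \
        --process_thread_count 4 \
        --process_deployments_max 25

With 100 work items that caps you at 25 processes no matter what; the RM settings above just determine whether you get those 25 right away or ramp up to them.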

Jim



On 10/16/14 3:26 PM, Burn Lewis wrote:
If your job has unprocessed work then perhaps the unused nodes are not in
the scheduling class you specified, or are too small.  Note that the
example below has all of its work either completed or active, so has no
work waiting to be processed.

State: Running  Workitems: 16  Done: 12  Error: 0  Dispatch: 4  Unassigned: 0  Limbo: 0
~Burn

On Thu, Oct 16, 2014 at 2:38 PM, Lou DeGenaro <[email protected]>
wrote:

Amit,

DUCC should use all available resources as configured by your ducc.classes
and ducc.nodes files.

Lou.


On Thu, Oct 16, 2014 at 11:59 AM, Amit Gupta <[email protected]>
wrote:

Thanks for the clarification, Burn,

So indeed there is no way to "force" a job to scale out to the maximum
resources available?

What I'm finding is that even though a job takes > 1 hour to complete
using 2 nodes, it doesn't use some extra available nodes which are part
of the DUCC cluster.

a. Is there no configuration option to deal with this (I'm guessing this
requirement may have come up before)?

b. Would you happen to know what part of the UIMA code makes that decision
(i.e. the trigger to spawn a process on a new node or not)?


Thanks again for your help,

Best,
Amit





On Thu, Oct 16, 2014 at 9:32 AM, Burn Lewis <[email protected]> wrote:

Yes, that parameter only limits the maximum scaleout.  DUCC will ramp up
the number of processes based on the available resources and the amount
of work to be done.  It initially starts only 1 or 2, and only when one
initializes successfully will it start more.  It may not start more if it
suspects that all the work will be completed on the existing nodes before
any new ones are ready.

There is an additional type of scaleout, within each process, controlled
by --process_thread_count, which determines how many threads in each
process can process separate work items.

~Burn

On Wed, Oct 15, 2014 at 7:11 PM, Amit Gupta <[email protected]>
wrote:

Hi,
I've been trying to find the options related to configuring scaleout
of a DUCC job.

Thus far the only ones I've found are:

process_deployments_max:
which limits the maximum number of processes spawned by a DUCC job.

At what point does DUCC decide to spawn a new process or spread processing
out to a new node?  Is there a tuning parameter for an optimal number of
work items per process spawned?  Can the user control this behavior?

For example,
I have a job large enough that DUCC natively spreads it across 2 nodes.
I haven't been able to force this job, via a config parameter, to spread
across 4 nodes (or "X" nodes) for faster processing times.

Does anyone know if there's a parameter that can directly control scaleout
in this manner?

Thanks,

--
Amit Gupta



--
Amit Gupta

