Amit,
The component that decides when it's time to scale out is the DUCC Resource Manager, or RM.

The decision to scale out is based essentially on a combination of (a) how much work is left, (b) how long work items take to complete, and (c) how long it takes to initialize a new process. If it looks like it will take longer to start a new process than to just let all the work run out in the existing processes, no new processes are spawned. This produces good behavior when many users are trying to share the resources.
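
For illustration (the numbers here are made up): say 20 work items remain, you already have 2 processes running 4 threads each (8 items in flight at a time), and items average 30 seconds, so the remaining work drains in roughly a minute and a half. If a new process needs 3 minutes to initialize, starting one gains you nothing, so the RM doesn't.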

If it's only "you", or resources are so limited that sharing isn't really practical, you can configure the RM to scale a job out to its maximum immediately. If you do this, the RM will still give you everything it can within the constraints of fair-share when there are other jobs or users.

Here is how to configure the RM to fully expand a job immediately. You need to modify ducc_runtime/resources/ducc.properties as follows: 1) Change ducc.rm.initialization.cap to some high number such as 99 or 999:
    ducc.rm.initialization.cap = 99
That will be the number of processes initially allocated, constrained by the number of resources that exist. So if you only have resources for 20 processes, you'll get 20 processes, not 99.

2) Change ducc.rm.expand.by.doubling to false:
  ducc.rm.expand.by.doubling = false
If this is true, the job will double up to its maximum each scheduling cycle AFTER the job has initialized. If it is false, the job expands fully immediately after the first successful initialization.
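
To illustrate: a job that could ultimately use 16 processes and starts with 2 would grow roughly 2 -> 4 -> 8 -> 16 over successive scheduling cycles with doubling on; with doubling off it goes from 2 straight to 16 (subject, as always, to fair-share and available resources) once the first process initializes successfully.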

3) Bounce the RM to force a re-read of the config file (or restart all of DUCC if you prefer). To bounce only the RM:
   a) ducc_runtime/admin/stop_ducc -c rm
   b) ducc_runtime/admin/start_ducc -c rm
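(To confirm the RM came back up, I believe ducc_runtime/admin/check_ducc will show it running again; the web server's daemons page should too, if I remember right.)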

Working together, those two parameters ensure your job expands fully immediately.

Be aware that the RM will never expand beyond the physical resources, and it will never expand beyond the number of processes needed to run your job. For example, if you have 100 work items and each process runs 4 threads, you will never get more than 25 processes (4*25 = 100 work items processed simultaneously).
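
Just to tie the submit-side options Burn mentioned to the arithmetic above, a sketch (the paths and job name are illustrative; the two options are the ones discussed earlier in this thread):

    ducc_runtime/bin/ducc_submit --specification my.job \
        --process_thread_count 4 \
        --process_deployments_max 25

With 100 work items that caps you at 25 processes no matter what; the RM settings above just determine whether you get those 25 right away or ramp up to them.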

Jim



On 10/16/14 3:26 PM, Burn Lewis wrote:
If your job has unprocessed work then perhaps the unused nodes are not in
the scheduling class you specified, or are too small.  Note that the
example below has all of its work either completed or active, so has no
work waiting to be processed.

State: Running  Workitems: 16  Done: 12  Error: 0  Dispatch: 4  Unassigned: 0  Limbo: 0
~Burn

On Thu, Oct 16, 2014 at 2:38 PM, Lou DeGenaro <[email protected]>
wrote:

Amit,

DUCC should use all available resources as configured by your ducc.classes
and ducc.nodes files.

Lou.


On Thu, Oct 16, 2014 at 11:59 AM, Amit Gupta <[email protected]>
wrote:

Thanks for the clarification, Burn,

So indeed there is no way to "force" a job to scale out to the maximum
resources available?

What I'm finding is that even though a job takes > 1 hour to complete
using 2 nodes, it doesn't use some extra available nodes which are part
of the DUCC cluster.

a. Is there no configuration option to deal with this (I'm guessing this
requirement may have come up before)?

b. Would you happen to know what part of the UIMA code makes that decision
(i.e. the trigger to spawn a process on a new node or not)?


Thanks again for your help,

Best,
Amit





On Thu, Oct 16, 2014 at 9:32 AM, Burn Lewis <[email protected]> wrote:

Yes, that parameter only limits the maximum scaleout.  DUCC will ramp up
the number of processes based on the available resources and the amount
of work to be done.  It initially starts only 1 or 2, and only when one
initializes successfully will it start more.  It may not start more if it
suspects that all the work will be completed on the existing nodes before
any new ones are ready.

There is an additional type of scaleout, within each process, controlled
by --process_thread_count, which determines how many threads in each
process can process separate work items.

~Burn

On Wed, Oct 15, 2014 at 7:11 PM, Amit Gupta <[email protected]>
wrote:

Hi,
I've been trying to find the options related to configuring scaleout
of a DUCC job.

Thus far the only ones I've found are:

process_deployments_max:
which limits the maximum number of processes spawned by a DUCC job.

At what point does DUCC decide to spawn a new process or spread processing
out to a new node?  Is there a tuning parameter for an optimal number of
work items per process spawned?  Can the user control this behavior?

For example,
I have a job large enough that DUCC natively spreads it across 2 nodes.
I haven't been able to force this job, via a config parameter, to spread
across 4 nodes (or "X" nodes) for faster processing times.

Does anyone know if there's a parameter that can directly control scaleout
in this manner?

Thanks,

--
Amit Gupta



--
Amit Gupta

