Thanks so much for that clarification, Jim and Burn - all useful info as I am still learning this framework. I also discovered another parameter, ducc.threads.limit, set to 500 by default in the resources/ducc.properties file, that was also limiting my scaleout.
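For anyone else searching the archives, here is a rough sketch of what Jim's suggested changes plus the thread limit look like in ducc_runtime/resources/ducc.properties (the values are only illustrative - check the defaults in your own copy and pick a thread limit that suits your cluster):

    # take the full initial allocation instead of a small initialization cap
    ducc.rm.initialization.cap = 99
    # expand straight to the job's maximum rather than doubling each cycle
    ducc.rm.expand.by.doubling = false
    # overall thread limit; it defaults to 500 - 1000 here is just an example value
    ducc.threads.limit = 1000

followed by a bounce of the RM (or all of DUCC) as Jim describes below so the new values are picked up:

    ducc_runtime/admin/stop_ducc -c rm
    ducc_runtime/admin/start_ducc -c rm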
Thanks again,

On Sun, Oct 19, 2014 at 2:07 PM, Jim Challenger <[email protected]> wrote:

> Amit,
>     The component that decides when it's time to scale out is the DUCC
> Resource Manager, or RM.
>
>     The decision to scale out is made essentially from a combination of
> (a) how much work is left, (b) how long work items take to complete, and
> (c) how long it takes to initialize a new process. If it looks like it
> will take longer to start a new process than to just let all the work run
> out in the existing processes, no new processes are spawned. This produces
> nice behavior when there are a lot of users trying to share the resources.
>
>     If there's only "you" or a very limited quantity of resources so that
> sharing isn't really practical, you can configure the RM to scale out the
> job to its maximum immediately. If this is done the RM will just give you
> everything it can within the constraints of fair-share if there are other
> jobs or users.
>
>     Here is how to configure the RM to fully expand a job immediately -
> you need to modify ducc_runtime/resources/ducc.properties thus:
>
> 1) Change ducc.rm.initialization.cap to some high number such as 99 or 999:
>        ducc.rm.initialization.cap = 99
>    That will be the number of processes initially allocated, constrained
>    by the number of resources that exist. So if you only have resources
>    for 20 processes, you'll get 20 processes, not 99.
>
> 2) Change ducc.rm.expand.by.doubling to false:
>        ducc.rm.expand.by.doubling = false
>    If this is true, the job will double up to its maximum each scheduling
>    cycle AFTER the job has initialized. If it is false, it will expand
>    fully immediately after the first successful initialization.
>
> 3) Bounce the RM to force a re-read of the config file (or all of DUCC if
>    you prefer). To bounce only the RM:
>        a) ducc_runtime/admin/stop_ducc -c rm
>        b) ducc_runtime/admin/start_ducc -c rm
>
>     Working together, those two parameters ensure your job expands fully
> immediately.
>
>     Be aware, RM will never expand beyond physical resources, and it will
> never expand beyond the number of processes needed to run your job. For
> example, if you have 100 work items and each process runs 4 threads, you
> will never get more than 25 processes (4*25=100 work items processed
> simultaneously).
>
> Jim
>
>
> On 10/16/14 3:26 PM, Burn Lewis wrote:
>
>> If your job has unprocessed work then perhaps the unused nodes are not
>> in the scheduling class you specified, or are too small. Note that the
>> example below has all of its work either completed or active, so has no
>> work waiting to be processed.
>>
>>     State: Running  Workitems: 16  Done: 12  Error: 0  Dispatch: 4
>>     Unassigned: 0  Limbo: 0
>>
>> ~Burn
>>
>> On Thu, Oct 16, 2014 at 2:38 PM, Lou DeGenaro <[email protected]>
>> wrote:
>>
>>> Amit,
>>>
>>> DUCC should use all available resources as configured by your
>>> ducc.classes and ducc.nodes files.
>>>
>>> Lou.
>>>
>>> On Thu, Oct 16, 2014 at 11:59 AM, Amit Gupta <[email protected]>
>>> wrote:
>>>
>>>> Thanks for the clarification Burn,
>>>>
>>>> So indeed there is no way to "force" a job to scale out to the maximum
>>>> resources available?
>>>>
>>>> What I'm finding is that even though a job takes > 1 hour to complete
>>>> using 2 nodes, it doesn't use some extra available nodes which are part
>>>> of the ducc cluster.
>>>>
>>>> a. Is there no configuration option to deal with this (I'm guessing
>>>>    this requirement may have come up before)?
>>>>
>>>> b. Would you happen to know what part of UIMA code makes that decision
>>>>    (i.e. the trigger to spawn a process on a new node or not)?
>>>>
>>>> Thanks again for your help,
>>>>
>>>> Best,
>>>> Amit
>>>>
>>>> On Thu, Oct 16, 2014 at 9:32 AM, Burn Lewis <[email protected]>
>>>> wrote:
>>>>
>>>>> Yes, that parameter only limits the maximum scaleout. DUCC will ramp
>>>>> up the number of processes based on the available resources and the
>>>>> amount of work to be done. It initially starts only 1 or 2, and only
>>>>> when one initializes successfully will it start more. It may not
>>>>> start more if it suspects that all the work will be completed on the
>>>>> existing nodes before any new ones are ready.
>>>>>
>>>>> There is an additional type of scaleout, within each process,
>>>>> controlled by --process_thread_count, which controls how many threads
>>>>> in each process are capable of processing separate work items.
>>>>>
>>>>> ~Burn
>>>>>
>>>>> On Wed, Oct 15, 2014 at 7:11 PM, Amit Gupta <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I've been trying to find the options related to configuration of
>>>>>> scaleout of a ducc job.
>>>>>>
>>>>>> Thus far the only ones I've found are:
>>>>>>
>>>>>> process_deployments_max:
>>>>>> which limits the maximum number of processes spawned by a ducc job.
>>>>>>
>>>>>> At what point does DUCC decide to spawn a new process or spread
>>>>>> processing out to a new node? Is there a tuning parameter for an
>>>>>> optimal number of work items per process spawned? Can the user
>>>>>> control this behavior?
>>>>>>
>>>>>> For example,
>>>>>> I have a job large enough that DUCC natively spreads it across 2
>>>>>> nodes. I haven't been able to force this job, via a config parameter,
>>>>>> to spread across 4 nodes (or "X" nodes) for faster processing times.
>>>>>>
>>>>>> Does anyone know if there's a parameter that can directly control
>>>>>> scaleout in this manner?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> --
>>>>>> Amit Gupta
>>>>
>>>> --
>>>> Amit Gupta

--
Amit Gupta
