On 11.08.2012, at 20:30, Joseph Farran wrote:

> Yes, all my queues have the same "0" for "seq_no".
>
> Here is my scheduler load formula:
>
> qconf -ssconf
> algorithm                      default
> schedule_interval              0:0:15
> maxujobs                       0
> queue_sort_method              load
> job_load_adjustments           NONE
> load_adjustment_decay_time     0
> load_formula                   -cores_in_use
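For reference: a site-defined complex such as "cores_in_use" would have been added to the complex configuration (qconf -mc) with a line roughly like the one below. The shortcut, relop, and default shown here are only illustrative, not Joseph's actual definition (as Reuti notes just below, an earlier post mentioned the relation == for it):

    #name          shortcut  type  relop  requestable  consumable  default  urgency
    cores_in_use   ciu       INT   <=     YES          NO          0        0

With queue_sort_method "load", queue instances are ordered by ascending value of the load_formula, so the negated -cores_in_use lets the busiest suitable host sort first and jobs get packed onto it, which matches the behaviour of the serial jobs below.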
Can you please try it with -slots? It should behave the same as your own
complex. In one of your former posts you mentioned a different relation ==
for it. (A sketch of this change, with commands to verify it, is appended at
the end of the thread.)

-- Reuti

> Here is a sample display of what is going on. My compute nodes have 64
> cores each:
>
> I submit four 1-core jobs to my bio queue. Note: I wait around 30 seconds
> before submitting each 1-core job, long enough for my "cores_in_use" to
> report back correctly:
>
> job-ID  name  user  state  queue            slots
> -----------------------------------------------------
> 2324    TEST  me    r      bio@compute-2-3  1
> 2325    TEST  me    r      bio@compute-2-3  1
> 2326    TEST  me    r      bio@compute-2-3  1
> 2327    TEST  me    r      bio@compute-2-3  1
>
> Everything works great with single 1-core jobs. Jobs 2324 through 2327
> packed onto one node (compute-2-3) correctly. The "cores_in_use" for
> compute-2-3 reports "4".
>
> Now I submit one 16-core "openmp" PE job:
>
> job-ID  name  user  state  queue            slots
> -----------------------------------------------------
> 2324    TEST  me    r      bio@compute-2-3  1
> 2325    TEST  me    r      bio@compute-2-3  1
> 2326    TEST  me    r      bio@compute-2-3  1
> 2327    TEST  me    r      bio@compute-2-3  1
> 2328    TEST  me    r      bio@compute-2-6  16
>
> The scheduler should have picked compute-2-3 since it has 4 cores_in_use,
> but instead it picked compute-2-6, which had 0 cores_in_use. So here the
> scheduler is now behaving differently than with 1-core jobs.
>
> As a further test I wait until my cores_in_use reports back that
> compute-2-6 has "16" cores in use. I now submit another 16-core "openmp"
> job:
>
> job-ID  name  user  state  queue            slots
> -----------------------------------------------------
> 2324    TEST  me    r      bio@compute-2-3  1
> 2325    TEST  me    r      bio@compute-2-3  1
> 2326    TEST  me    r      bio@compute-2-3  1
> 2327    TEST  me    r      bio@compute-2-3  1
> 2328    TEST  me    r      bio@compute-2-6  16
> 2329    TEST  me    r      bio@compute-2-7  16
>
> The scheduler now picks yet another node, compute-2-7, which had 0
> cores_in_use. I have tried this several times with many config changes to
> the scheduler, and it sure looks like the scheduler is *not* using the
> "load_formula" for PE jobs. From what I can tell, the scheduler chooses
> nodes at random for PE jobs.
>
> Here is my "openmp" PE:
>
> # qconf -sp openmp
> pe_name             openmp
> slots               9999
> user_lists          NONE
> xuser_lists         NONE
> start_proc_args     NONE
> stop_proc_args      NONE
> allocation_rule     $pe_slots
> control_slaves      TRUE
> job_is_first_task   FALSE
> urgency_slots       min
> accounting_summary  TRUE
>
> Here is my "bio" Q showing relevant info:
>
> # qconf -sq bio | egrep "qname|slots|pe_list"
> qname     bio
> pe_list   make mpi openmp
> slots     64
>
> Thanks for taking a look at this!
>
>
> On 8/11/2012 4:32 AM, Reuti wrote:
>> On 11.08.2012, at 02:57, Joseph Farran <[email protected]> wrote:
>>
>>> Reuti,
>>>
>>> Are you sure this works in GE2011.11?
>>>
>>> I have defined my own complex called "cores_in_use" which counts both
>>> single cores and PE cores correctly.
>>>
>>> It works great for single-core jobs, but not for PE jobs using the
>>> "$pe_slots" allocation rule.
>>>
>>> # qconf -sp openmp
>>> pe_name             openmp
>>> slots               9999
>>> user_lists          NONE
>>> xuser_lists         NONE
>>> start_proc_args     NONE
>>> stop_proc_args      NONE
>>> allocation_rule     $pe_slots
>>> control_slaves      TRUE
>>> job_is_first_task   FALSE
>>> urgency_slots       min
>>> accounting_summary  TRUE
>>>
>>> # qconf -ssconf
>>> algorithm                      default
>>> schedule_interval              0:0:15
>>> maxujobs                       0
>>> queue_sort_method              seqno
>>
>> The seq_no is the same for the queue instances in question?
>>
>> -- Reuti
>>
>>> job_load_adjustments           cores_in_use=1
>>> load_adjustment_decay_time     0
>>> load_formula                   -cores_in_use
>>> schedd_job_info                true
>>> flush_submit_sec               5
>>> flush_finish_sec               5
>>>
>>> I wait until the node reports the correct "cores_in_use" complex, then I
>>> submit a PE openmp job and it totally ignores the "load_formula" in the
>>> scheduler.
>>>
>>> Joseph
>>>
>>> On 08/09/2012 12:50 PM, Reuti wrote:
>>>> Correct. It uses the "allocation_rule" specified in the PE instead. Only
>>>> when "allocation_rule" is set to $pe_slots will it also use the
>>>> "load_formula". Unfortunately there is nothing you can do to change the
>>>> behavior.
>>>>
>>>> -- Reuti
>>>>
>>>> On 09.08.2012, at 21:23, Joseph Farran <[email protected]> wrote:
>>>>
>>>>> Howdy.
>>>>>
>>>>> I am using GE2011.11.
>>>>>
>>>>> I am successfully using the GE "load_formula" to place jobs by core
>>>>> count using my own "load_sensor" script.
>>>>>
>>>>> All works as expected with single-core jobs; however, for PE jobs it
>>>>> seems as if GE does not abide by the "load_formula".
>>>>>
>>>>> Does the scheduler use a different "load" formula for single-core jobs
>>>>> versus parallel jobs using the PE environment setup?
>>>>>
>>>>> Joseph
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> [email protected]
>>>>> https://gridengine.org/mailman/listinfo/users

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
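The "load_sensor" script mentioned in the original post follows the standard Grid Engine load-sensor protocol: it loops, waits on stdin for a report request (or the word "quit"), and answers with a begin/end block of host:complex:value lines. A minimal sketch is below; how the core count is actually computed is site-specific, so count_cores_in_use here is only a hypothetical placeholder, not Joseph's script:

    #!/bin/sh
    # minimal Grid Engine load sensor reporting the custom complex "cores_in_use"
    HOST=`uname -n`

    count_cores_in_use() {
        # hypothetical placeholder: a real sensor would derive this value from
        # running job processes, cgroups, or qstat output on this host
        echo 0
    }

    while true; do
        read request || exit 1              # execd requests a report via stdin
        [ "$request" = "quit" ] && exit 0
        echo "begin"
        echo "$HOST:cores_in_use:`count_cores_in_use`"
        echo "end"
    done

Such a script would be registered via the load_sensor parameter in the execd configuration (qconf -mconf, globally or per host), and the reported value can then be checked with qhost -F cores_in_use.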

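A minimal way to try the -slots suggestion from Reuti's reply above, assuming admin access to the qmaster: edit the scheduler configuration, replace the custom complex in the load formula, and then watch the per-host values while jobs are running. The commands below only sketch the procedure; qconf -msconf opens the configuration in an editor:

    # edit the scheduler configuration and change the line
    #     load_formula    -cores_in_use
    # to
    #     load_formula    -slots
    qconf -msconf

    # verify the change
    qconf -ssconf | grep load_formula

    # watch the values the scheduler sees, per host / queue instance
    qhost -F cores_in_use
    qstat -F slots

Per Reuti's remark, -slots combined with queue_sort_method "load" should behave like -cores_in_use and keep packing jobs onto the busiest host, so a different outcome for PE jobs would point at the scheduler rather than at the custom complex.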