On 11.08.2012, at 20:30, Joseph Farran wrote:

> Yes, all my queues have the same "0" for "seq_no".
>
> Here is my scheduler load formula:
>
> qconf -ssconf
> algorithm                      default
> schedule_interval              0:0:15
> maxujobs                       0
> queue_sort_method              load
> job_load_adjustments           NONE
> load_adjustment_decay_time     0
> load_formula                   -cores_in_use
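For reference: a site-defined complex such as "cores_in_use" would have been added to the complex configuration (qconf -mc) with a line roughly like the one below. The shortcut, relop, and default shown here are only illustrative, not Joseph's actual definition (as Reuti notes just below, an earlier post mentioned the relation == for it):

    #name          shortcut  type  relop  requestable  consumable  default  urgency
    cores_in_use   ciu       INT   <=     YES          NO          0        0

With queue_sort_method "load", queue instances are ordered by ascending value of the load_formula, so the negated -cores_in_use lets the busiest suitable host sort first and jobs get packed onto it, which matches the behaviour of the serial jobs below.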
Can you please try it with -slots? It should behave the same as your own
complex. In one of your former posts you mentioned a different relation ==
for it. (A sketch of this change, with commands to verify it, is appended at
the end of the thread.)

-- Reuti

> Here is a sample display of what is going on. My compute nodes have 64
> cores each:
>
> I submit four 1-core jobs to my bio queue. Note: I wait around 30 seconds
> before submitting each 1-core job, long enough for my "cores_in_use" to
> report back correctly:
>
> job-ID  name  user  state  queue            slots
> -----------------------------------------------------
> 2324    TEST  me    r      bio@compute-2-3  1
> 2325    TEST  me    r      bio@compute-2-3  1
> 2326    TEST  me    r      bio@compute-2-3  1
> 2327    TEST  me    r      bio@compute-2-3  1
>
> Everything works great with single 1-core jobs. Jobs 2324 through 2327
> packed onto one node (compute-2-3) correctly. The "cores_in_use" for
> compute-2-3 reports "4".
>
> Now I submit one 16-core "openmp" PE job:
>
> job-ID  name  user  state  queue            slots
> -----------------------------------------------------
> 2324    TEST  me    r      bio@compute-2-3  1
> 2325    TEST  me    r      bio@compute-2-3  1
> 2326    TEST  me    r      bio@compute-2-3  1
> 2327    TEST  me    r      bio@compute-2-3  1
> 2328    TEST  me    r      bio@compute-2-6  16
>
> The scheduler should have picked compute-2-3 since it has 4 cores_in_use,
> but instead it picked compute-2-6, which had 0 cores_in_use. So here the
> scheduler is now behaving differently than with 1-core jobs.
>
> As a further test I wait until my cores_in_use reports back that
> compute-2-6 has "16" cores in use. I now submit another 16-core "openmp"
> job:
>
> job-ID  name  user  state  queue            slots
> -----------------------------------------------------
> 2324    TEST  me    r      bio@compute-2-3  1
> 2325    TEST  me    r      bio@compute-2-3  1
> 2326    TEST  me    r      bio@compute-2-3  1
> 2327    TEST  me    r      bio@compute-2-3  1
> 2328    TEST  me    r      bio@compute-2-6  16
> 2329    TEST  me    r      bio@compute-2-7  16
>
> The scheduler now picks yet another node, compute-2-7, which had 0
> cores_in_use. I have tried this several times with many config changes to
> the scheduler, and it sure looks like the scheduler is *not* using the
> "load_formula" for PE jobs. From what I can tell, the scheduler chooses
> nodes at random for PE jobs.
>
> Here is my "openmp" PE:
>
> # qconf -sp openmp
> pe_name             openmp
> slots               9999
> user_lists          NONE
> xuser_lists         NONE
> start_proc_args     NONE
> stop_proc_args      NONE
> allocation_rule     $pe_slots
> control_slaves      TRUE
> job_is_first_task   FALSE
> urgency_slots       min
> accounting_summary  TRUE
>
> Here is my "bio" Q showing relevant info:
>
> # qconf -sq bio | egrep "qname|slots|pe_list"
> qname     bio
> pe_list   make mpi openmp
> slots     64
>
> Thanks for taking a look at this!
>
>
> On 8/11/2012 4:32 AM, Reuti wrote:
>> On 11.08.2012, at 02:57, Joseph Farran <[email protected]> wrote:
>>
>>> Reuti,
>>>
>>> Are you sure this works in GE2011.11?
>>>
>>> I have defined my own complex called "cores_in_use" which counts both
>>> single cores and PE cores correctly.
>>>
>>> It works great for single-core jobs, but not for PE jobs using the
>>> "$pe_slots" allocation rule.
>>>
>>> # qconf -sp openmp
>>> pe_name             openmp
>>> slots               9999
>>> user_lists          NONE
>>> xuser_lists         NONE
>>> start_proc_args     NONE
>>> stop_proc_args      NONE
>>> allocation_rule     $pe_slots
>>> control_slaves      TRUE
>>> job_is_first_task   FALSE
>>> urgency_slots       min
>>> accounting_summary  TRUE
>>>
>>> # qconf -ssconf
>>> algorithm                      default
>>> schedule_interval              0:0:15
>>> maxujobs                       0
>>> queue_sort_method              seqno
>>
>> The seq_no is the same for the queue instances in question?
>>
>> -- Reuti
>>
>>> job_load_adjustments           cores_in_use=1
>>> load_adjustment_decay_time     0
>>> load_formula                   -cores_in_use
>>> schedd_job_info                true
>>> flush_submit_sec               5
>>> flush_finish_sec               5
>>>
>>> I wait until the node reports the correct "cores_in_use" complex, then I
>>> submit a PE openmp job and it totally ignores the "load_formula" in the
>>> scheduler.
>>>
>>> Joseph
>>>
>>> On 08/09/2012 12:50 PM, Reuti wrote:
>>>> Correct. It uses the "allocation_rule" specified in the PE instead. Only
>>>> when "allocation_rule" is set to $pe_slots will it also use the
>>>> "load_formula". Unfortunately there is nothing you can do to change the
>>>> behavior.
>>>>
>>>> -- Reuti
>>>>
>>>> On 09.08.2012, at 21:23, Joseph Farran <[email protected]> wrote:
>>>>
>>>>> Howdy.
>>>>>
>>>>> I am using GE2011.11.
>>>>>
>>>>> I am successfully using the GE "load_formula" to place jobs by core
>>>>> count using my own "load_sensor" script.
>>>>>
>>>>> All works as expected with single-core jobs; however, for PE jobs it
>>>>> seems as if GE does not abide by the "load_formula".
>>>>>
>>>>> Does the scheduler use a different "load" formula for single-core jobs
>>>>> versus parallel jobs using the PE environment setup?
>>>>>
>>>>> Joseph
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> [email protected]
>>>>> https://gridengine.org/mailman/listinfo/users

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
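The "load_sensor" script mentioned in the original post follows the standard Grid Engine load-sensor protocol: it loops, waits on stdin for a report request (or the word "quit"), and answers with a begin/end block of host:complex:value lines. A minimal sketch is below; how the core count is actually computed is site-specific, so count_cores_in_use here is only a hypothetical placeholder, not Joseph's script:

    #!/bin/sh
    # minimal Grid Engine load sensor reporting the custom complex "cores_in_use"
    HOST=`uname -n`

    count_cores_in_use() {
        # hypothetical placeholder: a real sensor would derive this value from
        # running job processes, cgroups, or qstat output on this host
        echo 0
    }

    while true; do
        read request || exit 1              # execd requests a report via stdin
        [ "$request" = "quit" ] && exit 0
        echo "begin"
        echo "$HOST:cores_in_use:`count_cores_in_use`"
        echo "end"
    done

Such a script would be registered via the load_sensor parameter in the execd configuration (qconf -mconf, globally or per host), and the reported value can then be checked with qhost -F cores_in_use.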

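A minimal way to try the -slots suggestion from Reuti's reply above, assuming admin access to the qmaster: edit the scheduler configuration, replace the custom complex in the load formula, and then watch the per-host values while jobs are running. The commands below only sketch the procedure; qconf -msconf opens the configuration in an editor:

    # edit the scheduler configuration and change the line
    #     load_formula    -cores_in_use
    # to
    #     load_formula    -slots
    qconf -msconf

    # verify the change
    qconf -ssconf | grep load_formula

    # watch the values the scheduler sees, per host / queue instance
    qhost -F cores_in_use
    qstat -F slots

Per Reuti's remark, -slots combined with queue_sort_method "load" should behave like -cores_in_use and keep packing jobs onto the busiest host, so a different outcome for PE jobs would point at the scheduler rather than at the custom complex.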