On 14.08.2012, at 00:27, Joseph Farran wrote:

> Hi Alex.
>
> Thanks for the info, but the issue is more complex.
>
> The issue is that slots cannot be used with subordinate queues.
>
> Why not? The reason is here:
>
> http://gridengine.org/pipermail/users/2012-August/004372.html

But it seems to work for packing (at least) serial jobs by "load_formula slots" even if you don't attach the slots complex to each exechost.

-- Reuti
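A minimal sketch of how this setup could be applied, assuming the usual dump/edit/reload cycle; the file name, host name, and slot count below are illustrative:

    # dump the scheduler configuration, edit it, and load it back
    qconf -ssconf > sched.conf
    #   in sched.conf, set (as in Alex's working config quoted below):
    #   queue_sort_method  seqno
    #   load_formula       slots
    qconf -Msconf sched.conf

    # attaching the slots complex to an exechost -- per Reuti's remark
    # above, packing of serial jobs appears to work even without this
    qconf -mattr exechost complex_values slots=64 compute-2-3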
> Best,
> Joseph
>
> On 08/13/2012 03:12 PM, Alex Chekholko wrote:
>> Hi,
>>
>> I'm not sure if this helps, but we have a working config with:
>>
>> queue_sort_method                 seqno
>> load_formula                      slots
>>
>> That puts single-slot jobs onto a single node if a bunch of nodes are empty,
>> rather than distributing them evenly across empty nodes.
>>
>> Regards,
>> Alex
>>
>> On 08/13/2012 09:14 AM, Joseph Farran wrote:
>>> Hi Reuti / Rayson.
>>>
>>> To make sure we are on the same page, are you saying that for PE jobs
>>> using "$pe_slots" for the "allocation_rule", Grid Engine does indeed
>>> ignore the "load_formula" on the scheduler?
>>>
>>> If yes, a couple of questions please:
>>>
>>> 1) Was there a point at which GE did *not* ignore the "load_formula"
>>>    for PE jobs using "$pe_slots"?
>>> 2) Will this be brought back to GE in a future release?
>>>
>>> Joseph
>>>
>>> On 08/13/2012 08:22 AM, Reuti wrote:
>>>> On 12.08.2012, at 19:55, Joseph Farran wrote:
>>>>
>>>>> Hi Rayson.
>>>>>
>>>>> Here is one particular entry:
>>>>> http://gridengine.org/pipermail/users/2012-May/003495.html
>>>>>
>>>>> I am using the Grid Engine 2011.11 binary:
>>>>> http://dl.dropbox.com/u/47200624/respin/ge2011.11.tar.gz
>>>>
>>>> First of all, sorry for using the wrong expression. Where you used
>>>> "-cores_in_use", it should be the positive "slots". As a lower value
>>>> is taken first, a host with a lower remaining number of slots should
>>>> be taken first. It's working as it should for serial jobs.
>>>>
>>>> But for parallel ones, even with $pe_slots as the allocation rule, it
>>>> was already ignored in 6.2u5.
>>>>
>>>> -- Reuti
>>>>
>>>>> Thanks,
>>>>> Joseph
>>>>>
>>>>> On 8/12/2012 10:10 AM, Rayson Ho wrote:
>>>>>> On Sun, Aug 12, 2012 at 5:27 AM, Joseph Farran <[email protected]> wrote:
>>>>>>> I saw some old postings that this used to be a bug with GE, that
>>>>>>> parallel jobs were not using the scheduler load_formula. Was this
>>>>>>> bug corrected in GE2011.11?
>>>>>>
>>>>>> Hi Joseph,
>>>>>>
>>>>>> Can you point me to the previous discussion? We did not receive a
>>>>>> bug report related to this problem before...
>>>>>>
>>>>>> So far, our main focus is to fix issues & bugs reported by our users
>>>>>> first, and maybe we've missed the discussion on this bug.
>>>>>>
>>>>>> Rayson
>>>>>>
>>>>>>> Anyone able to test this in GE2011.11 to see if it was fixed?
>>>>>>>
>>>>>>> Joseph
>>>>>>>
>>>>>>> On 8/11/2012 1:51 PM, Reuti wrote:
>>>>>>>> On 11.08.2012, at 20:30, Joseph Farran wrote:
>>>>>>>>
>>>>>>>>> Yes, all my queues have the same "0" for "seq_no".
>>>>>>>>>
>>>>>>>>> Here is my scheduler load formula:
>>>>>>>>>
>>>>>>>>> qconf -ssconf
>>>>>>>>> algorithm                         default
>>>>>>>>> schedule_interval                 0:0:15
>>>>>>>>> maxujobs                          0
>>>>>>>>> queue_sort_method                 load
>>>>>>>>> job_load_adjustments              NONE
>>>>>>>>> load_adjustment_decay_time        0
>>>>>>>>> load_formula                      -cores_in_use
>>>>>>>>
>>>>>>>> Can you please try it with -slots? It should behave the same as
>>>>>>>> your own complex. In one of your former posts you mentioned a
>>>>>>>> different relation == for it.
>>>>>>>>
>>>>>>>> -- Reuti
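The relation Reuti mentions lives in the complex configuration. For comparison, the relevant rows of `qconf -sc` might look like this; the slots row is the stock definition, while the cores_in_use row is a hypothetical reconstruction of Joseph's custom complex:

    #name         shortcut  type  relop  requestable  consumable  default  urgency
    cores_in_use  ciu       INT   ==     YES          NO          0        0
    slots         s         INT   <=     YES          YES         1        1000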
>>>>>>>>> Here is a sample display of what is going on. My compute nodes
>>>>>>>>> have 64 cores each.
>>>>>>>>>
>>>>>>>>> I submit 4 1-core jobs to my bio queue. Note: I wait around 30
>>>>>>>>> seconds before submitting each 1-core job, long enough for my
>>>>>>>>> "cores_in_use" to report back correctly:
>>>>>>>>>
>>>>>>>>> job-ID  name  user  state  queue            slots
>>>>>>>>> -----------------------------------------------------
>>>>>>>>> 2324    TEST  me    r      bio@compute-2-3  1
>>>>>>>>> 2325    TEST  me    r      bio@compute-2-3  1
>>>>>>>>> 2326    TEST  me    r      bio@compute-2-3  1
>>>>>>>>> 2327    TEST  me    r      bio@compute-2-3  1
>>>>>>>>>
>>>>>>>>> Everything works great with single 1-core jobs. Jobs 2324 through
>>>>>>>>> 2327 packed onto one node (compute-2-3) correctly. The
>>>>>>>>> "cores_in_use" for compute-2-3 reports "4".
>>>>>>>>>
>>>>>>>>> Now I submit one 16-core "openmp" PE job:
>>>>>>>>>
>>>>>>>>> job-ID  name  user  state  queue            slots
>>>>>>>>> -----------------------------------------------------
>>>>>>>>> 2324    TEST  me    r      bio@compute-2-3  1
>>>>>>>>> 2325    TEST  me    r      bio@compute-2-3  1
>>>>>>>>> 2326    TEST  me    r      bio@compute-2-3  1
>>>>>>>>> 2327    TEST  me    r      bio@compute-2-3  1
>>>>>>>>> 2328    TEST  me    r      bio@compute-2-6  16
>>>>>>>>>
>>>>>>>>> The scheduler should have picked compute-2-3 since it has 4
>>>>>>>>> cores_in_use, but instead it picked compute-2-6, which had 0
>>>>>>>>> cores_in_use. So here the scheduler is behaving differently than
>>>>>>>>> with 1-core jobs.
>>>>>>>>>
>>>>>>>>> As a further test, I wait until my cores_in_use reports back that
>>>>>>>>> compute-2-6 has "16" cores in use. I now submit another 16-core
>>>>>>>>> "openmp" job:
>>>>>>>>>
>>>>>>>>> job-ID  name  user  state  queue            slots
>>>>>>>>> -----------------------------------------------------
>>>>>>>>> 2324    TEST  me    r      bio@compute-2-3  1
>>>>>>>>> 2325    TEST  me    r      bio@compute-2-3  1
>>>>>>>>> 2326    TEST  me    r      bio@compute-2-3  1
>>>>>>>>> 2327    TEST  me    r      bio@compute-2-3  1
>>>>>>>>> 2328    TEST  me    r      bio@compute-2-6  16
>>>>>>>>> 2329    TEST  me    r      bio@compute-2-7  16
>>>>>>>>>
>>>>>>>>> The scheduler now picks yet another node, compute-2-7, which had
>>>>>>>>> 0 cores_in_use. I have tried this several times with many config
>>>>>>>>> changes to the scheduler, and it sure looks like the scheduler is
>>>>>>>>> *not* using the "load_formula" for PE jobs. From what I can tell,
>>>>>>>>> the scheduler chooses nodes at random for PE jobs.
>>>>>>>>>
>>>>>>>>> Here is my "openmp" PE:
>>>>>>>>>
>>>>>>>>> # qconf -sp openmp
>>>>>>>>> pe_name            openmp
>>>>>>>>> slots              9999
>>>>>>>>> user_lists         NONE
>>>>>>>>> xuser_lists        NONE
>>>>>>>>> start_proc_args    NONE
>>>>>>>>> stop_proc_args     NONE
>>>>>>>>> allocation_rule    $pe_slots
>>>>>>>>> control_slaves     TRUE
>>>>>>>>> job_is_first_task  FALSE
>>>>>>>>> urgency_slots      min
>>>>>>>>> accounting_summary TRUE
>>>>>>>>>
>>>>>>>>> Here is my "bio" queue showing the relevant info:
>>>>>>>>>
>>>>>>>>> # qconf -sq bio | egrep "qname|slots|pe_list"
>>>>>>>>> qname                 bio
>>>>>>>>> pe_list               make mpi openmp
>>>>>>>>> slots                 64
>>>>>>>>>
>>>>>>>>> Thanks for taking a look at this!
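The test sequence described above could be reproduced with something like the following; the job script test.sh is hypothetical:

    # four serial jobs, pausing so the load sensor can report in between
    for i in 1 2 3 4; do qsub -q bio -N TEST test.sh; sleep 30; done

    # one 16-slot job through the openmp PE
    qsub -q bio -pe openmp 16 -N TEST test.sh

    # check where the slots landed and what each host reports
    qstat -g t
    qhost -F cores_in_use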
>>>>>>>>>
>>>>>>>>> On 8/11/2012 4:32 AM, Reuti wrote:
>>>>>>>>>> On 11.08.2012, at 02:57, Joseph Farran <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Reuti,
>>>>>>>>>>>
>>>>>>>>>>> Are you sure this works in GE2011.11?
>>>>>>>>>>>
>>>>>>>>>>> I have defined my own complex called "cores_in_use" which
>>>>>>>>>>> counts both single cores and PE cores correctly.
>>>>>>>>>>>
>>>>>>>>>>> It works great for single core jobs, but not for PE jobs using
>>>>>>>>>>> the "$pe_slots" allocation rule.
>>>>>>>>>>>
>>>>>>>>>>> # qconf -sp openmp
>>>>>>>>>>> pe_name            openmp
>>>>>>>>>>> slots              9999
>>>>>>>>>>> user_lists         NONE
>>>>>>>>>>> xuser_lists        NONE
>>>>>>>>>>> start_proc_args    NONE
>>>>>>>>>>> stop_proc_args     NONE
>>>>>>>>>>> allocation_rule    $pe_slots
>>>>>>>>>>> control_slaves     TRUE
>>>>>>>>>>> job_is_first_task  FALSE
>>>>>>>>>>> urgency_slots      min
>>>>>>>>>>> accounting_summary TRUE
>>>>>>>>>>>
>>>>>>>>>>> # qconf -ssconf
>>>>>>>>>>> algorithm                         default
>>>>>>>>>>> schedule_interval                 0:0:15
>>>>>>>>>>> maxujobs                          0
>>>>>>>>>>> queue_sort_method                 seqno
>>>>>>>>>>
>>>>>>>>>> The seq_no is the same for the queue instances in question?
>>>>>>>>>>
>>>>>>>>>> -- Reuti
>>>>>>>>>>
>>>>>>>>>>> job_load_adjustments              cores_in_use=1
>>>>>>>>>>> load_adjustment_decay_time        0
>>>>>>>>>>> load_formula                      -cores_in_use
>>>>>>>>>>> schedd_job_info                   true
>>>>>>>>>>> flush_submit_sec                  5
>>>>>>>>>>> flush_finish_sec                  5
>>>>>>>>>>>
>>>>>>>>>>> I wait until the node reports the correct "cores_in_use"
>>>>>>>>>>> complex; I then submit a PE openmp job and it totally ignores
>>>>>>>>>>> the "load_formula" on the scheduler.
>>>>>>>>>>>
>>>>>>>>>>> Joseph
>>>>>>>>>>>
>>>>>>>>>>> On 08/09/2012 12:50 PM, Reuti wrote:
>>>>>>>>>>>> Correct. It uses the "allocation_rule" specified in the PE
>>>>>>>>>>>> instead. Only for "allocation_rule" set to $pe_slots will it
>>>>>>>>>>>> also use the "load_formula". Unfortunately there is nothing
>>>>>>>>>>>> you can do to change the behavior.
>>>>>>>>>>>>
>>>>>>>>>>>> -- Reuti
>>>>>>>>>>>>
>>>>>>>>>>>> On 09.08.2012, at 21:23, Joseph Farran <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Howdy.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am using GE2011.11.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am successfully using the GE "load_formula" to load jobs by
>>>>>>>>>>>>> core count using my own "load_sensor" script.
>>>>>>>>>>>>>
>>>>>>>>>>>>> All works as expected with single core jobs; however, for PE
>>>>>>>>>>>>> jobs, it seems as if GE does not abide by the "load_formula".
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does the scheduler use a different "load" formula for single
>>>>>>>>>>>>> core jobs versus parallel jobs using the PE environment setup?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Joseph
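The "cores_in_use" values in this thread come from Joseph's own load sensor, which is not shown. A minimal sketch of such a script, following the standard Grid Engine load-sensor protocol; the counting method here is only a placeholder:

    #!/bin/sh
    # Minimal load-sensor sketch reporting a custom "cores_in_use" complex.
    HOST=`hostname`
    while :; do
        # sge_execd writes a line to stdin for each report; "quit" ends the loop
        read input
        if [ "$input" = "quit" ]; then
            exit 0
        fi
        # placeholder metric: number of processes currently in the run state
        USED=`ps -e -o state= | grep -c R`
        echo "begin"
        echo "$HOST:cores_in_use:$USED"
        echo "end"
    done

Such a script would be registered via the load_sensor parameter of the host or global configuration (qconf -mconf), with the complex itself defined via qconf -mc.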
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users