...
...
...
[gladden@stuart ~]$
Please note that queue_sort_method="load" and load_formula="slots",
as per our discussion of how to configure the scheduler to "pack"
jobs on nodes.
And here is an example of what happens when I submit a job. First,
the abbreviated output of "qhost -q", showing the state of the queue
instances on the first nine compute nodes on the cluster:
[gladden@stuart ~]$ qhost -q
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
compute-1-1             lx26-amd64      8  8.60   23.5G   16.3G    7.8G    7.8G
   serial.q             BIP   0/8
   stf.q                BIP   8/8
   all.q                BIP   0/8
compute-1-2             lx26-amd64      8  7.35   23.5G   21.4G    7.8G   38.9M
   serial.q             BIP   0/8
   stf.q                BIP   8/8
   all.q                BIP   0/8
compute-1-3             lx26-amd64      8  0.00   23.5G   89.2M    7.8G   37.2M
   serial.q             BIP   0/8
   stf.q                BIP   0/8
   all.q                BIP   0/8
compute-1-4             lx26-amd64      8  0.00   23.5G   97.2M    7.8G   37.6M
   serial.q             BIP   0/8
   stf.q                BIP   0/8
   all.q                BIP   0/8
compute-1-5             lx26-amd64      8  7.32   23.5G   21.2G    7.8G   33.6M
   serial.q             BIP   0/8
   stf.q                BIP   8/8
   all.q                BIP   0/8
compute-1-6             lx26-amd64      8  7.98   23.5G  701.7M    7.8G   25.1M
   serial.q             BIP   0/8
   stf.q                BIP   8/8
   all.q                BIP   0/8
compute-1-7             lx26-amd64      8  0.00   23.5G  167.1M    7.8G   36.0M
   serial.q             BIP   0/8
   stf.q                BIP   0/8
   all.q                BIP   0/8
compute-1-8             lx26-amd64      8  0.00   23.5G   97.8M    7.8G   35.0M
   serial.q             BIP   0/8
   stf.q                BIP   4/8
   all.q                BIP   0/8
compute-1-9             lx26-amd64      8  7.77   23.5G  528.3M    7.8G   33.8M
   serial.q             BIP   0/8
   stf.q                BIP   8/8
   all.q                BIP   0/8
....
....
[gladden@stuart ~]$
Note that compute-1-3 is the first "empty" node on the list, and
that compute-1-8 is partially subscribed, with 4 of its 8 slots in
use. And here is the output of "qhost -F slots" confirming the state
of the "slots" resource:
[gladden@stuart ~]$ qhost -F slots
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
compute-1-1             lx26-amd64      8  8.68   23.5G   16.0G    7.8G    7.8G
   Host Resource(s):   hc:slots=0.000000
compute-1-2             lx26-amd64      8  7.32   23.5G   21.4G    7.8G   38.9M
   Host Resource(s):   hc:slots=0.000000
compute-1-3             lx26-amd64      8  0.00   23.5G   89.2M    7.8G   37.2M
   Host Resource(s):   hc:slots=8.000000
compute-1-4             lx26-amd64      8  0.00   23.5G   97.2M    7.8G   37.6M
   Host Resource(s):   hc:slots=8.000000
compute-1-5             lx26-amd64      8  7.32   23.5G   21.2G    7.8G   33.6M
   Host Resource(s):   hc:slots=0.000000
compute-1-6             lx26-amd64      8  7.98   23.5G  701.8M    7.8G   25.1M
   Host Resource(s):   hc:slots=0.000000
compute-1-7             lx26-amd64      8  0.00   23.5G  167.2M    7.8G   36.0M
   Host Resource(s):   hc:slots=8.000000
compute-1-8             lx26-amd64      8  0.00   23.5G   97.8M    7.8G   35.0M
   Host Resource(s):   hc:slots=4.000000
compute-1-9             lx26-amd64      8  7.80   23.5G  528.5M    7.8G   33.8M
   Host Resource(s):   hc:slots=0.000000
...
...
[gladden@stuart ~]$
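Just to spell out the order I would expect "packing" to produce,
here is a throwaway one-liner (my own ad-hoc reading of the qhost
output, not anything the scheduler itself runs) that ranks hosts by
their remaining hc:slots value, smallest first, skipping hosts that
are already full:

# rank hosts by free slots, ascending; packing should prefer the top entry
[gladden@stuart ~]$ qhost -F slots | \
      awk '/^compute/ {h=$1} /hc:slots/ {sub(/.*=/,""); if ($0+0 > 0) print $0, h}' | \
      sort -n

For the nine hosts shown above, that puts compute-1-8 (4.000000)
ahead of the three empty hosts (8.000000), which is exactly where I
would expect the next serial job to land.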
However, if I submit a job like this:
[gladden@stuart ~]$ qsub -q stf.q submit_test
Your job 524987 ("test") has been submitted
The result is this:
[gladden@stuart ~]$ qstat -u gladden
job-ID  prior    name  user     state  submit/start at      queue                  slots  ja-task-ID
------------------------------------------------------------------------------------------------------
 524987  0.55500  test  gladden  r      05/31/2011 11:36:57  [email protected]       1
Note that the submitted job ended up on compute-1-3 (the first empty
node) rather than on compute-1-8, the partially consumed node where
the job should have been "packed". The stf.q instances on this
system are sequentially numbered starting with node compute-1-1, so
it appears that the scheduler simply picked the lowest
sequence-numbered queue instance on a node with an available slot.
I see no evidence that it sorted the queue instances by load as the
scheduler configuration specifies.
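For completeness, the sequence numbers themselves live in the queue
configuration and can be checked with something like this (seq_no is
the relevant attribute; the per-host values in the comment are only
a guess for illustration):

# a per-host seq_no list such as "0,[compute-1-1=1],[compute-1-2=2],..."
# would explain the dispatch order above (values here are a guess)
[gladden@stuart ~]$ qconf -sq stf.q | grep seq_no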
I've done this experiment several times and the result appears to be
consistent. Is there some additional configuration issue I need to
address? Or, perhaps, was there a bug in this version (6.2U2-1)
that was later addressed?
James Gladden
On 5/27/2011 1:02 PM, Reuti wrote:
On 27.05.2011 at 21:42, James Gladden wrote:
On 5/18/2011 1:47 PM, Dave Love wrote:
James Gladden<[email protected]> writes:
The scheduler picked stf.q@compute-1-1 which was the unloaded node,
instead of "packing" the job into one of the four available slots on
compute-1-12 as was desired and expected. I should add that
stf.q@compute-1-1 is the lowest sequence number instance in stf.q,
so this looks like the job was assigned by sequence number rather
than by our "-slots" load formula.
Well, what's the queue_sort_method?
However, as I said, there's a bug, but it happens Univa just fixed
it -- see commits from a couple of days ago, I think.
Any suggestions? I have poked around in the archive without finding
the error of my ways.
BTW, why the (-) inversion in the load formula? Don't you want to
favour more loaded nodes?
Yes, I do. "Slots" is a consumable resource, right? In the case of
our systems, the starting value for each execution host is set to 8.
If the load formula is "-slots", then for an unloaded node we have:
load = (-8)
If we then dispatch a single-slot job to that node, the load value
changes to:
load = (-7)
Algebraically,
(-7) > (-8)
so the scheduler will perceive the empty node as "less loaded" and
dispatch to it in preference to the node with one slot already
consumed. I don't see how this formula "favors more loaded nodes."
On the other hand, if we go with a load formula of just (slots),
then we get
7 < 8
so the scheduler should perceive the partially consumed node (load =
7) as "less loaded" and dispatch to it in preference to the empty
node (load = 8). Is there some flaw in this logic?
There is none. It's as outlined here:
http://blogs.oracle.com/sgrell/entry/grid_engine_scheduler_hacks_least
-- Reuti
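The sign argument is also easy to sanity-check outside of SGE; the
one-liner below just compares a made-up empty host with 8 free slots
against a partly used one with 7 free slots under both formulas (the
host names are hypothetical):

[gladden@stuart ~]$ printf 'compute-A 8\ncompute-B 7\n' | awk '{printf "%-10s  slots=%d  -slots=%d\n", $1, $2, -$2}'
compute-A   slots=8  -slots=-8
compute-B   slots=7  -slots=-7

With queue_sort_method=load, a smaller value means "less loaded", so
"slots" ranks compute-B (the partly used host) first, while "-slots"
ranks compute-A (the empty host) first.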
Alas, all of this seems moot as I have not been able to establish
that the scheduler actually pays any attention to the setting of
the load formula or the queue sort method. My system appears to
dispatch jobs based on queue sequence number irrespective of these
settings. For example, changing the load formula from "slots" to
"-slots" appears to make no difference. The problem is
demonstrable with serial jobs, so it is not a bug associated with
PEs. I have even tried restarting qmaster on the theory that perhaps
it would only recognize the change at daemon startup. No luck.
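For reference, a non-interactive way to flip the formula (rather
than going through the qconf -msconf editor) would be roughly the
following; the file name and the sed pattern are only illustrative,
and -Msconf is the "load scheduler configuration from a file"
variant, if I remember the option correctly:

# sketch only: dump the scheduler config, swap the formula, load it back
[gladden@stuart ~]$ qconf -ssconf > /tmp/sched.conf
[gladden@stuart ~]$ sed -i 's/^load_formula .*/load_formula  -slots/' /tmp/sched.conf
[gladden@stuart ~]$ qconf -Msconf /tmp/sched.conf

Going through a file also leaves a record of exactly what was loaded,
which is handy when comparing runs.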
Have any of you actually observed this feature to work? If so,
what version of SGE are you running?
James Gladden
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users