On 18.05.2011 at 01:34, Hung-Sheng Tsao (Lao Tsao), Ph.D. wrote:

As a side remark: since you configure three queues for each host, then in order not to oversubscribe the CPUs you will need to define h_slots as a consumable complex.
See this paper: http://www.sun.com/blueprints/0607/820-1695.pdf

Table 2-1. h_slots Settings

Name     Shortcut  Type  Relop  Requestable  Consumable  Default  Urgency
h_slots  h_slots   INT   <=     Forced       Yes         0        0

Then for each execution host add a line such as

complex_values        h_slots=8,use_mem_size=<mem>

(the paper's example uses complex_values h_slots=16,use_mem_size=32).
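A sketch of those configuration steps (not from the original mail; it assumes an SGE qmaster is reachable and uses compute-1-1 as an example host):

```shell
# Register h_slots as a FORCED, consumable INT complex.
# 'qconf -mc' opens the complex table in $EDITOR; append this line:
#
#   h_slots  h_slots  INT  <=  FORCED  YES  0  0
qconf -mc

# Give each execution host a capacity, e.g. 8 slots:
# 'qconf -me compute-1-1' opens the host entry; set:
#
#   complex_values  h_slots=8,use_mem_size=<mem>
qconf -me compute-1-1
```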

Why do you want to define a new complex for this purpose? You can achieve the same with the normal "slots" complex and avoid forgetting to request this additional one during your submissions.

I also checked the PDF you mentioned - I still see no need to use it.

-- Reuti


You then need to request it when submitting the job:

qsub -l h_slots=n ...    (n could be 1)

On 5/17/2011 3:25 PM, James Gladden wrote:

Dave,

Thank you for your reply. I have set up the scheduler as you suggested, but I still do not get the desired result.

Here's my scheduler configuration:

[gladden@stuart ~]$ qconf -ssconf|grep load
queue_sort_method                 load
job_load_adjustments              NONE
load_adjustment_decay_time        0:7:30
load_formula                      -slots
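For anyone reproducing this, the two relevant values above are set by editing the scheduler configuration (a sketch; qconf opens $EDITOR):

```shell
# Edit the scheduler configuration and set:
#
#   queue_sort_method  load
#   load_formula       -slots
qconf -msconf
```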

And here is the state of the hosts prior to doing a test job submission:

[gladden@stuart ~]$ qhost -q
HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
-------------------------------------------------------------------------------
global - - - - - - -
compute-1-1 lx26-amd64 8 0.00 23.5G 77.9M 7.8G 33.4M
   serial.q             BIP   0/8
   stf.q                BIP   0/8
   all.q                BIP   0/8
compute-1-10 lx26-amd64 8 8.02 23.5G 15.5G 7.8G 32.9M
   serial.q             BIP   0/8
   stf.q                BIP   8/8
   all.q                BIP   0/8
compute-1-11 lx26-amd64 8 8.52 23.5G 19.7G 7.8G 5.7G
   serial.q             BIP   0/8
   stf.q                BIP   8/8
   all.q                BIP   0/8
compute-1-12 lx26-amd64 8 4.00 23.5G 1.1G 7.8G 24.0M
   serial.q             BIP   0/8
   stf.q                BIP   4/8
   all.q                BIP   0/8
compute-1-13 lx26-amd64 8 7.00 23.5G 22.2G 7.8G 30.8M
   serial.q             BIP   0/8
   stf.q                BIP   8/8
   all.q                BIP   0/8
compute-1-14 lx26-amd64 8 7.11 23.5G 22.2G 7.8G 216.7M
   serial.q             BIP   0/8
   stf.q                BIP   8/8
   all.q                BIP   0/8
compute-1-15 lx26-amd64 8 8.35 23.5G 21.1G 7.8G 5.7G
   serial.q             BIP   0/8
   stf.q                BIP   8/8
   all.q                BIP   0/8
compute-1-16 lx26-amd64 8 7.38 23.5G 23.2G 7.8G 32.4M
   serial.q             BIP   0/8
   stf.q                BIP   8/8
   all.q                BIP   0/8
compute-1-17 lx26-amd64 8 7.38 23.5G 23.2G 7.8G 32.3M
   serial.q             BIP   0/8
   stf.q                BIP   8/8
   all.q                BIP   0/8
compute-1-18 lx26-amd64 8 0.00 23.5G 144.0M 7.8G 18.2M
   all.q                BIP   0/8
   turecek.q            BIP   0/8
compute-1-19 lx26-amd64 8 0.00 23.5G 359.8M 7.8G 11.5M
   all.q                BIP   0/8
   turecek.q            BIP   0/8
compute-1-2 lx26-amd64 8 8.03 23.5G 561.4M 7.8G 29.3M
   serial.q             BIP   0/8
   stf.q                BIP   8/8
   all.q                BIP   0/8
compute-1-20 lx26-amd64 8 0.00 23.5G 383.3M 7.8G 14.0M
   all.q                BIP   0/8
   turecek.q            BIP   0/8
compute-1-21 lx26-amd64 8 8.00 23.5G 13.0G 7.8G 22.9M
   robinson.q           BIP   8/8
   all.q                BIP   0/8
compute-1-22 lx26-amd64 8 1.00 23.5G 288.9M 7.8G 30.4M
   robinson.q           BIP   1/8
   all.q                BIP   0/8
compute-1-23 lx26-amd64 8 8.00 23.5G 13.2G 7.8G 35.2M
   robinson.q           BIP   8/8
   all.q                BIP   0/8
compute-1-24 lx26-amd64 8 0.00 23.5G 146.5M 7.8G 25.0M
   khalil.q             BIP   0/8
   all.q                BIP   0/8
compute-1-3 lx26-amd64 8 8.05 23.5G 742.0M 7.8G 32.9M
   serial.q             BIP   0/8
   stf.q                BIP   8/8
   all.q                BIP   0/8
compute-1-4 lx26-amd64 8 0.00 23.5G 104.2M 7.8G 31.6M
   serial.q             BIP   0/8
   stf.q                BIP   0/8
   all.q                BIP   0/8
compute-1-5 lx26-amd64 8 8.03 23.5G 8.4G 7.8G 23.4M
   serial.q             BIP   0/8
   stf.q                BIP   8/8
   all.q                BIP   0/8
compute-1-6 lx26-amd64 8 5.63 23.5G 1.3G 7.8G 25.0M
   serial.q             BIP   0/8
   stf.q                BIP   8/8
   all.q                BIP   0/8
compute-1-7 lx26-amd64 8 0.00 23.5G 155.7M 7.8G 33.3M
   serial.q             BIP   0/8
   stf.q                BIP   0/8
   all.q                BIP   0/8
compute-1-8 lx26-amd64 8 0.00 23.5G 126.0M 7.8G 29.5M
   serial.q             BIP   0/8
   stf.q                BIP   0/8
   all.q                BIP   0/8
compute-1-9 lx26-amd64 8 8.02 23.5G 1.3G 7.8G 32.7M
   serial.q             BIP   0/8
   stf.q                BIP   8/8
   all.q                BIP   0/8

Please note in particular that node compute-1-1 is idle - none of the eight slots are in use by any of the queues. However, node compute-1-12 has four slots in use, with the other four slots available on the stf.q queue, which is the queue to which the test job will be submitted.

Here is the submission:

[gladden@stuart ~]$ qsub  -q stf.q submit_test
Your job 523949 ("test") has been submitted

And here is the result:

[gladden@stuart ~]$ qstat -u gladden
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
523949 0.55500 test gladden r 05/17/2011 11:58:18 stf.q@compute-1-1 1

The scheduler picked stf.q@compute-1-1 which was the unloaded node, instead of "packing" the job into one of the four available slots on compute-1-12 as was desired and expected. I should add that stf.q@compute-1-1 is the lowest sequence number instance in stf.q, so this looks like the job was assigned by sequence number rather than by our "-slots" load formula.

Any suggestions? I have poked around in the archive without finding the error of my ways. BTW, why the (-) inversion in the load formula?
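For what it's worth, the inversion makes sense if the scheduler sorts hosts by ascending load_formula value: with -slots, busier hosts get smaller (more negative) values and sort first, which yields packing. A toy illustration of that ordering (hypothetical used-slot counts, not SGE itself):

```shell
# Hosts with used-slot counts; compute load_formula = -slots and
# sort ascending, as a load-based queue sort would. The busiest
# host sorts first, i.e. it gets "packed" first.
printf '%s\n' 'compute-1-1 0' 'compute-1-12 4' 'compute-1-10 8' |
while read host used; do
    echo "$((-used)) $host"
done | sort -n
# -8 compute-1-10
# -4 compute-1-12
# 0 compute-1-1
```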

Jim


Dave Love wrote:

That's a typical requirement:

$ qconf -ssconf|grep load
job_load_adjustments              NONE
load_adjustment_decay_time        0:7:30
load_formula                      -slots

See past discussion here and on the old list. Note that there is a bug
which may prevent it working properly with parallel jobs.



_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
