On 04.08.2012 at 07:06, Joseph Farran wrote:

> Found the issue. If I start with the count being the number of cores,
> counting down, then it works.

Yep, that is why I wrote -slots (the minus sign makes the value negative by
intention), but counting down will also work.

-- Reuti
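For anyone wiring this up themselves, a minimal "counting down" load sensor
could look like the sketch below. It follows the standard load sensor
protocol (loop on stdin, exit on "quit", report between "begin" and "end"
lines). The hard-coded core count and the qstat parsing are illustrative
assumptions; adjust both to your site. Array tasks and multi-host parallel
jobs would need extra handling (qstat -g t would be more accurate there):

#!/bin/sh
# Report free cores as "cores_in_use", counting down from TOTAL, so the
# fullest host has the smallest value and sorts first in the load formula.
HOST=`hostname`
TOTAL=8                  # cores per node in this test setup (assumption)
while read line; do
    [ "$line" = "quit" ] && exit 0
    # Sum the slots column of running jobs on this host; in default qstat
    # output field 8 is the queue instance and field 9 is the slot count.
    USED=`qstat -s r -u '*' | awk -v h="@$HOST" \
        '$8 ~ h { sum += $9 } END { print sum + 0 }'`
    echo "begin"
    echo "$HOST:cores_in_use:`expr $TOTAL - $USED`"
    echo "end"
done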
> On 8/3/2012 4:29 PM, Joseph Farran wrote:
>> I created a load sensor and it is reporting accordingly. I am not sure
>> whether I got the sensor options correct:
>>
>> # qconf -sc | egrep cores_in_use
>> cores_in_use cu INT == YES NO 0 0
>>
>> The nodes are reporting cores in use. compute-3-2 has two jobs and qhost
>> reports accordingly:
>>
>> # qhost -F -h compute-3-2 | egrep cores_in_use
>> hl:cores_in_use=2.000000
>>
>> I set up the scheduler with:
>>
>> # qconf -ssconf | egrep "queue|load"
>> queue_sort_method                 seqno
>> job_load_adjustments              NONE
>> load_adjustment_decay_time        0
>> load_formula                      cores_in_use
>>
>> But jobs are not packing.
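For readers reproducing this, two pieces wire such a sensor into the
cluster: the complex definition and the load_sensor parameter. A sketch
(the sensor path is an assumption):

# Register the complex from a file; the added line matches the one above:
qconf -sc > /tmp/complexes
echo "cores_in_use cu INT == YES NO 0 0" >> /tmp/complexes
qconf -Mc /tmp/complexes

# Attach the sensor in the host (or global) configuration; qconf -mconf
# opens an editor, where one line is added:
#   load_sensor /opt/sge/local/cores_in_use.sh
qconf -mconf compute-3-1
qconf -mconf compute-3-2

sge_execd starts the sensor itself and polls it at each load report
interval, so nothing needs to run the script by hand.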
>>
>> On 08/03/2012 12:58 PM, Reuti wrote:
>>> Well, for single-core jobs you can change the sort order to pack jobs
>>> on nodes. But instead of the usual -slots you will need a special load
>>> sensor reporting only the slots used by the owner queue, and use this
>>> variable.
>>>
>>> -- Reuti
>>>
>>> Sent from my iPad
>>>
>>> On 03.08.2012 at 21:23, Joseph Farran <[email protected]> wrote:
>>>
>>>> For others who are trying to pack jobs on nodes while using
>>>> subordinate queues, here is an example of why job-packing is so
>>>> critical.
>>>>
>>>> Consider the following scenario. We have two queues, "owner" and
>>>> "free", with "free" being the subordinate queue.
>>>>
>>>> Our two compute nodes have 8 cores each.
>>>>
>>>> We load up our free queue with 16 single-core jobs:
>>>>
>>>> job-ID  prior    name   user      state  queue              slots  ja-task-ID
>>>> -----------------------------------------------------------------------------
>>>>   8560  0.55500  FREE   testfree  r      free@compute-3-2   1
>>>>   8561  0.55500  FREE   testfree  r      free@compute-3-2   1
>>>>   8562  0.55500  FREE   testfree  r      free@compute-3-2   1
>>>>   8563  0.55500  FREE   testfree  r      free@compute-3-2   1
>>>>   8564  0.55500  FREE   testfree  r      free@compute-3-2   1
>>>>   8565  0.55500  FREE   testfree  r      free@compute-3-2   1
>>>>   8566  0.55500  FREE   testfree  r      free@compute-3-2   1
>>>>   8567  0.55500  FREE   testfree  r      free@compute-3-2   1
>>>>   8568  0.55500  FREE   testfree  r      free@compute-3-1   1
>>>>   8569  0.55500  FREE   testfree  r      free@compute-3-1   1
>>>>   8570  0.55500  FREE   testfree  r      free@compute-3-1   1
>>>>   8571  0.55500  FREE   testfree  r      free@compute-3-1   1
>>>>   8572  0.55500  FREE   testfree  r      free@compute-3-1   1
>>>>   8573  0.55500  FREE   testfree  r      free@compute-3-1   1
>>>>   8574  0.55500  FREE   testfree  r      free@compute-3-1   1
>>>>   8575  0.55500  FREE   testfree  r      free@compute-3-1   1
>>>>
>>>> The owner now submits ONE single-core job:
>>>>
>>>> $ qstat
>>>> job-ID  prior    name   user      state  queue              slots  ja-task-ID
>>>> -----------------------------------------------------------------------------
>>>>   8560  0.55500  FREE   testfree  S      free@compute-3-2   1
>>>>   8561  0.55500  FREE   testfree  S      free@compute-3-2   1
>>>>   8562  0.55500  FREE   testfree  S      free@compute-3-2   1
>>>>   8563  0.55500  FREE   testfree  S      free@compute-3-2   1
>>>>   8564  0.55500  FREE   testfree  S      free@compute-3-2   1
>>>>   8565  0.55500  FREE   testfree  S      free@compute-3-2   1
>>>>   8566  0.55500  FREE   testfree  S      free@compute-3-2   1
>>>>   8567  0.55500  FREE   testfree  S      free@compute-3-2   1
>>>>   8568  0.55500  FREE   testfree  r      free@compute-3-1   1
>>>>   8569  0.55500  FREE   testfree  r      free@compute-3-1   1
>>>>   8570  0.55500  FREE   testfree  r      free@compute-3-1   1
>>>>   8571  0.55500  FREE   testfree  r      free@compute-3-1   1
>>>>   8572  0.55500  FREE   testfree  r      free@compute-3-1   1
>>>>   8573  0.55500  FREE   testfree  r      free@compute-3-1   1
>>>>   8574  0.55500  FREE   testfree  r      free@compute-3-1   1
>>>>   8575  0.55500  FREE   testfree  r      free@compute-3-1   1
>>>>   8584  0.55500  OWNER  testbio   r      owner@compute-3-2  1
>>>>
>>>> All eight free jobs on compute-3-2 are suspended in order to run that
>>>> one single-core owner job, #8584.
>>>>
>>>> Not the ideal or best setup, but we can live with this.
>>>>
>>>> However, here is where it gets nasty.
>>>>
>>>> The owner now submits another ONE-core job. At this point, compute-3-2
>>>> has 7 free cores on which it could schedule this additional ONE-core
>>>> job, but no, GE likes to spread jobs:
>>>>
>>>> $ qstat
>>>> job-ID  prior    name   user      state  queue              slots  ja-task-ID
>>>> -----------------------------------------------------------------------------
>>>>   8560  0.55500  FREE   testfree  S      free@compute-3-2   1
>>>>   8561  0.55500  FREE   testfree  S      free@compute-3-2   1
>>>>   8562  0.55500  FREE   testfree  S      free@compute-3-2   1
>>>>   8563  0.55500  FREE   testfree  S      free@compute-3-2   1
>>>>   8564  0.55500  FREE   testfree  S      free@compute-3-2   1
>>>>   8565  0.55500  FREE   testfree  S      free@compute-3-2   1
>>>>   8566  0.55500  FREE   testfree  S      free@compute-3-2   1
>>>>   8567  0.55500  FREE   testfree  S      free@compute-3-2   1
>>>>   8568  0.55500  FREE   testfree  S      free@compute-3-1   1
>>>>   8569  0.55500  FREE   testfree  S      free@compute-3-1   1
>>>>   8570  0.55500  FREE   testfree  S      free@compute-3-1   1
>>>>   8571  0.55500  FREE   testfree  S      free@compute-3-1   1
>>>>   8572  0.55500  FREE   testfree  S      free@compute-3-1   1
>>>>   8573  0.55500  FREE   testfree  S      free@compute-3-1   1
>>>>   8574  0.55500  FREE   testfree  S      free@compute-3-1   1
>>>>   8575  0.55500  FREE   testfree  S      free@compute-3-1   1
>>>>   8584  0.55500  OWNER  testbio   r      owner@compute-3-2  1
>>>>   8585  0.55500  OWNER  testbio   r      owner@compute-3-1  1
>>>>
>>>> The new single-core job #8585 starts on compute-3-1 instead of on
>>>> compute-3-2, suspending all 8 free jobs there.
>>>>
>>>> If job-packing with subordinate queues were available, job #8585 would
>>>> have started on compute-3-2, since that node has cores available.
>>>>
>>>> Two ONE-core jobs suspend 16 single-core jobs. Nasty and wasteful!
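A quick way to reproduce this scenario on a two-node test setup (a sketch;
sleep stands in for real work, and the outcome shown above assumes the
subordinate-queue configuration discussed later in this thread):

# Fill the subordinate queue with 16 single-core jobs:
for i in `seq 1 16`; do
    qsub -q free -N FREE -b y sleep 3600
done

# Submit two single-core owner jobs; with the default sort order each one
# lands on a different node, suspending all the free jobs on both nodes:
qsub -q owner -N OWNER -b y sleep 3600
qsub -q owner -N OWNER -b y sleep 3600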
>>>>
>>>> On 08/03/2012 10:10 AM, Joseph Farran wrote:
>>>>> On 08/03/2012 09:57 AM, Reuti wrote:
>>>>>> On 03.08.2012 at 18:50, Joseph Farran wrote:
>>>>>>> On 08/03/2012 09:18 AM, Reuti wrote:
>>>>>>>> On 03.08.2012 at 18:04, Joseph Farran wrote:
>>>>>>>>> I pack jobs onto nodes using the following GE setup:
>>>>>>>>>
>>>>>>>>> # qconf -ssconf | egrep "queue|load"
>>>>>>>>> queue_sort_method                 seqno
>>>>>>>>> job_load_adjustments              NONE
>>>>>>>>> load_adjustment_decay_time        0
>>>>>>>>> load_formula                      slots
>>>>>>>>>
>>>>>>>>> I also set my nodes with the slots complex value:
>>>>>>>>>
>>>>>>>>> # qconf -rattr exechost complex_values "slots=64" compute-2-1
>>>>>>>>
>>>>>>>> Don't limit it here. Just define 64 for slots in both queues.
>>>>>>>
>>>>>>> Yes, I tried that approach as well, but then parallel jobs will not
>>>>>>> suspend an equal number of serial jobs.
>>>>>>>
>>>>>>> So after I set up the above (note my test queues and nodes have 8
>>>>>>> cores, not 64):
>>>>>>>
>>>>>>> # qconf -sq owner | egrep "slots"
>>>>>>> slots                 8
>>>>>>> subordinate_list      slots=8(free:0:sr)
>>>>>>>
>>>>>>> # qconf -sq free | egrep "slots"
>>>>>>> slots                 8
>>>>>>>
>>>>>>> # qconf -se compute-3-1 | egrep complex
>>>>>>> complex_values        NONE
>>>>>>> # qconf -se compute-3-2 | egrep complex
>>>>>>> complex_values        NONE
>>>>>>>
>>>>>>> When I submit one 8-slot parallel job to owner, only one free job is
>>>>>>> suspended instead of 8.
>>>>>>>
>>>>>>> Here is the qstat listing:
>>>>>>>
>>>>>>> job-ID  prior    name   user      state  queue              slots
>>>>>>> ----------------------------------------------------------------
>>>>>>>   8531  0.50500  FREE   testfree  r      free@compute-3-1   1
>>>>>>>   8532  0.50500  FREE   testfree  r      free@compute-3-1   1
>>>>>>>   8533  0.50500  FREE   testfree  r      free@compute-3-1   1
>>>>>>>   8534  0.50500  FREE   testfree  r      free@compute-3-1   1
>>>>>>>   8535  0.50500  FREE   testfree  r      free@compute-3-1   1
>>>>>>>   8536  0.50500  FREE   testfree  r      free@compute-3-1   1
>>>>>>>   8537  0.50500  FREE   testfree  r      free@compute-3-1   1
>>>>>>>   8538  0.50500  FREE   testfree  S      free@compute-3-1   1
>>>>>>>   8539  0.50500  FREE   testfree  r      free@compute-3-2   1
>>>>>>>   8540  0.50500  FREE   testfree  r      free@compute-3-2   1
>>>>>>>   8541  0.50500  FREE   testfree  r      free@compute-3-2   1
>>>>>>>   8542  0.50500  FREE   testfree  r      free@compute-3-2   1
>>>>>>>   8543  0.50500  FREE   testfree  r      free@compute-3-2   1
>>>>>>>   8544  0.50500  FREE   testfree  r      free@compute-3-2   1
>>>>>>>   8545  0.50500  FREE   testfree  r      free@compute-3-2   1
>>>>>>>   8546  0.50500  FREE   testfree  r      free@compute-3-2   1
>>>>>>>   8547  0.60500  Owner  me        r      owner@compute-3-1  8
>>>>>>>
>>>>>>> Job 8547 in the owner queue starts just fine, running with 8 cores
>>>>>>> on compute-3-1, *but* only one free-queue job on compute-3-1 is
>>>>>>> suspended instead of 8.
>>>>>>
>>>>>> AFAIR this is a known bug for parallel jobs.
>>>>>
>>>>> So the answer to my original question is that no, it cannot be done.
>>>>>
>>>>> Is there another open source GE flavor that has fixed this bug, or is
>>>>> this bug present across all open source GE flavors?
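For readers puzzling over the subordinate_list entry above: this is the
slot-wise preemption syntax introduced with Grid Engine 6.2u5. A reading of
it (check queue_conf(5) in your own distribution, as flavors differ):

# subordinate_list slots=8(free:0:sr)
#   slots=8  start suspending once more than 8 slots are busy on the
#            host, counting the owner and free queues together
#   free     the subordinate queue whose jobs get suspended
#   0        sequence number, ordering several subordinate queues
#   sr       suspend the shortest-running job first ("lr" would pick
#            the longest-running one)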
>>>>>
>>>>>>>>> Serial jobs are all packed nicely onto a node until the node is
>>>>>>>>> full, and then they go onto the next node.
>>>>>>>>>
>>>>>>>>> The issue I am having is that my subordinate queue breaks when I
>>>>>>>>> set my nodes with the node complex value above.
>>>>>>>>>
>>>>>>>>> I have two queues: the owner queue and the free queue:
>>>>>>>>>
>>>>>>>>> # qconf -sq owner | egrep "subordinate|shell"
>>>>>>>>> shell                 /bin/bash
>>>>>>>>> shell_start_mode      posix_compliant
>>>>>>>>> subordinate_list      free=1
>>>>>>>>
>>>>>>>> subordinate_list slots=64(free)
>>>>>>>>
>>>>>>>>> # qconf -sq free | egrep "subordinate|shell"
>>>>>>>>> shell                 /bin/bash
>>>>>>>>> shell_start_mode      posix_compliant
>>>>>>>>> subordinate_list      NONE
>>>>>>>>>
>>>>>>>>> When I fill up the free queue with serial jobs and then submit a
>>>>>>>>> job to the owner queue, the owner job will not suspend the free
>>>>>>>>> jobs. The qstat scheduling info says:
>>>>>>>>>
>>>>>>>>> queue instance "free@..." dropped because it is full
>>>>>>>>> queue instance "free@..." dropped because it is full
>>>>>>>>>
>>>>>>>>> If I remove the "complex_values" from my nodes, then jobs in the
>>>>>>>>> free queue are correctly suspended and the owner job runs just
>>>>>>>>> fine.
>>>>>>>>
>>>>>>>> Yes, and what's the problem with this setup?
>>>>>>>
>>>>>>> What is wrong with the above setup is that the 'owner' job cannot
>>>>>>> run, because free jobs are not suspended.
>>>>>>
>>>>>> They are not suspended in advance. The suspension is the result of an
>>>>>> additional job being started thereon, not the other way round.
>>>>>
>>>>> Right, but the idea of a subordinate queue (job preemption) is that
>>>>> when a job *is* scheduled, the subordinate queue suspends jobs. I
>>>>> mean, that's the whole idea.
>>>>>
>>>>>> -- Reuti
>>>>>>
>>>>>>>>> So how can I accomplish both items above?
>>>>>>>>>
>>>>>>>>> *** By the way, here are some pre-answers to questions I am going
>>>>>>>>> to be asked:
>>>>>>>>>
>>>>>>>>> Why pack jobs? Because in any HPC environment that runs a mixture
>>>>>>>>> of serial and parallel jobs, you really don't want to spread
>>>>>>>>> single-core jobs across multiple nodes, especially 64-core nodes.
>>>>>>>>> You want to keep nodes whole for parallel jobs (this is HPC 101).
>>>>>>>>
>>>>>>>> Depends on the application. E.g. Molcas writes a lot to the local
>>>>>>>> scratch disk, so it's better to spread such jobs across the cluster
>>>>>>>> and use the remaining cores on each exechost for jobs with little
>>>>>>>> or no disk access.
>>>>>>>
>>>>>>> Yes, there will always be exceptions. I should have said in 99% of
>>>>>>> circumstances.
>>>>>>>
>>>>>>>> -- Reuti
>>>>>>>>
>>>>>>>>> Suspended jobs will not free up resources: Yep, but the jobs will
>>>>>>>>> *not* be consuming CPU cycles, which is what I want.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Joseph
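Pulling Reuti's suggestions from this thread together, the recommended
direction looks like the sketch below (queue and host names taken from the
thread; untested here, and note that slot-wise suspension of parallel jobs
was still reported as buggy above):

# Give both queues the full slot count instead of limiting the exechost:
qconf -mattr queue slots 64 owner
qconf -mattr queue slots 64 free

# Let slot-wise subordination do the per-host counting across both queues:
qconf -mattr queue subordinate_list "slots=64(free)" owner

# Drop the per-host limit that made the queue instances report "full":
qconf -dattr exechost complex_values "slots=64" compute-2-1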
