Re: [gridengine users] Filling up nodes when using gepetools
> So it sounds like the $pe_slots=serial behavior doesn't hold for your Grid Engine. An alternative would be to convert single-node jobs to serial jobs that request a consumable (other than slots) representing the number of cores on the node. However, this will change the behavior when requesting other consumables and probably won't interact well with core binding.

I'm just testing another trick: I gave the queue instance on each host its own seq_no (numbers increasing by 1). E.g.

# qconf -sq mpi
seq_no 0,[f01=101],[f02=102], ...

So far the tests look pretty good! :-)
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
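A per-host seq_no list like the one in this message is tedious to type for a few dozen hosts. The following is a minimal sketch that builds the attribute value; the function name, the starting number 101 and the default 0 are illustrative assumptions taken from the example line, not from any Grid Engine tool. Note also that the thread does not state the scheduler's queue_sort_method for this test; the sequence numbers are the primary sort key only when it is set to seqno.

```python
def build_seq_no(hosts, default=0, start=101):
    """Return a seq_no attribute value with one override per host.

    The layout mirrors the `seq_no 0,[f01=101],[f02=102], ...` line from
    the message; `default` and `start` are illustrative values.
    """
    overrides = [f"[{host}={start + i}]" for i, host in enumerate(hosts)]
    return ",".join([str(default)] + overrides)


if __name__ == "__main__":
    # Hypothetical host list f01..f03; replace with the real hostlist.
    hosts = [f"f{n:02d}" for n in range(1, 4)]
    print("seq_no", build_seq_no(hosts))
```

The resulting string would still have to be pasted into the queue configuration by hand (or via qconf); the sketch only produces the text.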
Re: [gridengine users] Filling up nodes when using gepetools
> What happens with serial jobs (i.e. no PE at all)? If they exhibit the same behavior, then we just need to figure out how to tweak the scheduler config to fill up nodes. If serial jobs are already clumping onto a few nodes, then the claim on the Oracle blog that Reuti pointed to, that $pe_slots behaves like serial, may not be an accurate description of your version of Grid Engine.

Because gepetools is configured as the default JSV (jsv_url /software/sge/gepetools/pe.jsv) for all jobs, serial jobs are affected too (nodes=1,ppn=1). Just for testing I disabled the JSV temporarily: scheduling of serial jobs then works as described in the Oracle blog (= filling up).
Re: [gridengine users] Filling up nodes when using gepetools
> Have you got slots defined per host or just per queue? It may need to be defined at a host level via complex_values in order to work.

Both:

# qconf -sc | grep '#name\|slots'
#name            shortcut  type  relop  requestable  consumable  default  urgency
occupied_slots   o_slots   INT   =      YES          YES         1        0
slots            s         INT   =      YES          YES         1        1000

# qconf -se f20
hostname          f20
load_scaling      NONE
complex_values    occupied_slots=12,slots=12,num_proc=12,h_vmem=45.01G, \
                  exclusive=true

# qconf -sq mpi
...
slots             1,[f01=12],[f02=12],[f38=12],[f03=12],[f04=12],[f05=12], \
                  [f06=12],[f07=12],[f08=12],[f09=12],[f10=12],[f11=12], \
                  [f12=12],[f13=12],[f14=12],[f15=12],[f16=12],[f17=12], \
                  [f18=12],[f19=12],[f20=12],[f21=12],[f22=12],[f23=12], \
                  [f25=12],[f26=12],[f27=12],[f28=12],[f29=12],[f30=12], \
                  [f31=12],[f32=12],[f33=12],[f34=12],[f35=12],[f36=12], \
                  [f37=12],[f24=12]
...

Maybe occupied_slots is the key???
Re: [gridengine users] Filling up nodes when using gepetools
> > complex_values    occupied_slots=12,slots=12,num_proc=12,h_vmem=45.01G, \
> >                   exclusive=true
>
> To streamline this (but it won't solve your problem): I usually recommend not setting num_proc, as it's a property of the host which SGE detects already. Limits are already implemented by the slots complex.

Then I'll remove it.

> > slots             1,[f01=12],[f02=12],[f38=12],[f03=12],[f04=12],[f05=12], \
> >                   [f06=12],[f07=12],[f08=12],[f09=12],[f10=12],[f11=12], \
> >                   [f12=12],[f13=12],[f14=12],[f15=12],[f16=12],[f17=12], \
> >                   [f18=12],[f19=12],[f20=12],[f21=12],[f22=12],[f23=12], \
> >                   [f25=12],[f26=12],[f27=12],[f28=12],[f29=12],[f30=12], \
> >                   [f31=12],[f32=12],[f33=12],[f34=12],[f35=12],[f36=12], \
> >                   [f37=12],[f24=12]
>
> When all of them have 12, it can simply be given as a single value here.

OK. Thank you for your optimization recommendations!
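The suggestion to replace identical per-host overrides with a single value can be illustrated with a small sketch. It assumes every host in the queue's hostlist has an override (otherwise collapsing would also change the value for hosts that currently fall back to the default of 1); the function name is hypothetical.

```python
def collapse_slots(default, per_host):
    """Collapse per-host slot overrides into one value when they all agree.

    `per_host` maps host name to slot count. If every override is the
    same, that single value is returned; otherwise the original
    `default,[host=n],...` form is rebuilt.
    """
    if not per_host:
        return str(default)
    values = set(per_host.values())
    if len(values) == 1:
        return str(values.pop())
    overrides = ",".join(f"[{h}={n}]" for h, n in sorted(per_host.items()))
    return f"{default},{overrides}"


if __name__ == "__main__":
    print("slots", collapse_slots(1, {"f01": 12, "f02": 12, "f38": 12}))
    print("slots", collapse_slots(1, {"f01": 12, "f02": 8}))
```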
Re: [gridengine users] Filling up nodes when using gepetools
On Fri, 31 Jul 2015 05:19:32 + Winkler, Ursula (ursula.wink...@uni-graz.at) ursula.wink...@uni-graz.at wrote:
> > > Well, I created an additional PE with allocation_rule $pe_slots and added an if condition to pe.jsv so that all jobs requesting just a single node are assigned to this new PE. But the annoying situation didn't change. The scheduler configuration is set to queue_sort_method load and load_formula slots. So what am I still missing?
> >
> > I believe it should be a load_formula of -slots, so the more slots are available (fewest used), the lower the load and the more attractive the node. The page Reuti pointed to manages to write this both ways around.
>
> Setting load_formula to -slots doesn't change anything - every job still starts on a separate host (but in this case that should be the correct behavior, if I don't misinterpret the instructions on the web page Reuti mentioned). I must be missing something else and pretty basic...

Have you got slots defined per host or just per queue? It may need to be defined at a host level via complex_values in order to work.
Re: [gridengine users] Filling up nodes when using gepetools
On Thu, 30 Jul 2015 12:57:13 + Winkler, Ursula (ursula.wink...@uni-graz.at) ursula.wink...@uni-graz.at wrote:
> > My suggestion was to modify your JSV/gepetools to force single-node parallel jobs into PEs with $pe_slots allocation rules (which gives you control over where they are scheduled via queue_sort_method and load_formula) while sending the others to PEs with other (appropriate) allocation rules that won't cause (ii).
>
> Well, I created an additional PE with allocation_rule $pe_slots and added an if condition to pe.jsv so that all jobs requesting just a single node are assigned to this new PE. But the annoying situation didn't change. The scheduler configuration is set to queue_sort_method load and load_formula slots. So what am I still missing?

How is job_load_adjustments configured?
Re: [gridengine users] Filling up nodes when using gepetools
> Sorry to step into the discussion: `qstat -j ...` shows the requested one, the granted one is in `qstat -r`.
>
> $ qsub -pe * 2 test.sh
> Your job 44329 (test.sh) has been submitted
> $ qstat -j 44329
> ...
> parallel environment: * range: 2
> ...

My jobs:

qstat -j ...
...
parallel environment: gepetools_1host range 2
...

That's the PE I created for that purpose. So qstat -j shows the right info.

> $ qstat -r
> ...
> Requested PE: * 2
> Granted PE: make 2

qstat -r
...
Requested PE: gepetools_1host 2
Granted PE: gepetools_1host 2
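To compare requested and granted PE for many jobs, the two lines shown in the `qstat -r` excerpts can be picked out with a short script. This is a minimal sketch that only assumes the `Requested PE:` / `Granted PE:` line format quoted in this thread; the sample text and the function name are illustrative, and the real output contains many more fields.

```python
import re

# Sample lines copied from the qstat -r excerpts in the thread.
SAMPLE = """\
Requested PE: * 2
Granted PE: make 2
"""

PE_LINE = re.compile(r"\s*(Requested|Granted) PE:\s*(\S+)\s+(\S+)")


def parse_pe_lines(text):
    """Return {'requested': (pe, slots), 'granted': (pe, slots)} as found.

    Slot counts are kept as strings because a request may be a range.
    """
    result = {}
    for line in text.splitlines():
        match = PE_LINE.match(line)
        if match:
            kind, pe, slots = match.groups()
            result[kind.lower()] = (pe, slots)
    return result


if __name__ == "__main__":
    info = parse_pe_lines(SAMPLE)
    print("requested:", info.get("requested"))
    print("granted:  ", info.get("granted"))
```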
Re: [gridengine users] Filling up nodes when using gepetools
> > Well, I created an additional PE with allocation_rule $pe_slots and added an if condition to pe.jsv so that all jobs requesting just a single node are assigned to this new PE. But the annoying situation didn't change. The scheduler configuration is set to queue_sort_method load and load_formula slots. So what am I still missing?
>
> I believe it should be a load_formula of -slots, so the more slots are available (fewest used), the lower the load and the more attractive the node. The page Reuti pointed to manages to write this both ways around.

Setting load_formula to -slots doesn't change anything - every job still starts on a separate host (but in this case that should be the correct behavior, if I don't misinterpret the instructions on the web page Reuti mentioned). I must be missing something else and pretty basic...
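The confusion about the sign of load_formula can be checked against a toy model. The sketch below is not the Grid Engine scheduler: it assumes that with queue_sort_method load the host with the lowest load value wins, and that the `slots` term evaluates to the number of free slots on a host. Host names, slot counts and the tie-break by host name are made up for illustration, and effects such as job_load_adjustments are ignored.

```python
def pick_host(free_slots, needed, load_formula):
    """Pick the eligible host with the lowest load value (toy model).

    free_slots   -- dict of host name -> free slots
    needed       -- slots the job needs on a single host
    load_formula -- "slots" (load = free slots) or "-slots" (load = -free)
    """
    if load_formula not in ("slots", "-slots"):
        raise ValueError("this sketch only models 'slots' and '-slots'")
    sign = -1 if load_formula == "-slots" else 1
    eligible = {h: free for h, free in free_slots.items() if free >= needed}
    if not eligible:
        return None
    # Ties are broken by host name to keep the toy model deterministic.
    return min(sorted(eligible), key=lambda h: sign * eligible[h])


if __name__ == "__main__":
    free = {"f01": 10, "f02": 12, "f03": 12}  # f01 already runs a 2-slot job
    print("slots  ->", pick_host(free, 2, "slots"))   # fewest free slots first
    print("-slots ->", pick_host(free, 2, "-slots"))  # most free slots first
```

Under these assumptions `slots` packs the new job onto the partly used host, while `-slots` sends it to an empty one, which matches the observation that `-slots` kept starting every job on a separate host.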
Re: [gridengine users] Filling up nodes when using gepetools
On Thu, 30 Jul 2015 06:12:52 + Winkler, Ursula (ursula.wink...@uni-graz.at) ursula.wink...@uni-graz.at wrote:
> -----Original Message-----
> From: Reuti [mailto:re...@staff.uni-marburg.de]
> Sent: Wednesday, 29 July 2015 15:10
> To: Winkler, Ursula (ursula.wink...@uni-graz.at)
> Cc: users@gridengine.org
> Subject: Re: [gridengine users] Filling up nodes when using gepetools
>
> > Hi,
> >
> > On 29.07.2015 at 12:50, Winkler, Ursula (ursula.wink...@uni-graz.at) ursula.wink...@uni-graz.at wrote:
> > > Node1 has 12 cores/slots and one MPI job with 2 slots is running on it. A user submits job2, which requires at most 10 slots. Regardless of the schedule_interval, job_load_adjustments, load_formula and/or load_adjustment_decay_time settings, job2 usually won't start on Node1 if
> >
> > What about queue_sort_method?
>
> Doesn't work either.
>
> > As long as the requested PE has $pe_slots as allocation_rule, it should be possible to use a fill-up configuration:
> > https://blogs.oracle.com/sgrell/entry/grid_engine_scheduler_hacks_least
>
> Thank you for the link, I didn't know that about $pe_slots. But unfortunately it still doesn't work - maybe because of the gepetools sub-PEs. Setting $pe_slots there too has the effect that jobs don't start anymore.
>
> Ursula

$pe_slots restricts you to a single node, so I'm guessing the jobs that don't start are jobs that need more than one node. While we don't use gepetools, we do have a JSV that rewrites people's requested PE based on the number

What you need, I think, is something that routes jobs that request 1 node to PEs with a $pe_slots allocation rule, while other jobs are routed to PEs with an allocation rule equal to the requested ppn. In all cases the number of slots to request should be nodes*ppn.
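The routing rule described at the end of this message can be written down as a small decision function. This is only a sketch of the logic, not gepetools or actual JSV code: `gepetools_1host` is the single-node PE name that appears elsewhere in this thread, while the `ppn_<n>` naming for multi-node PEs is a hypothetical placeholder for whatever PEs with a fixed allocation rule exist on the cluster.

```python
def route_job(nodes, ppn):
    """Return (pe_name, slots) for a job requesting nodes x ppn cores.

    Single-node jobs go to a PE with allocation_rule $pe_slots so that
    queue_sort_method/load_formula can steer them; multi-node jobs go to
    a PE whose allocation rule equals the requested ppn.
    """
    if nodes < 1 or ppn < 1:
        raise ValueError("nodes and ppn must be positive")
    slots = nodes * ppn  # total slots to request in every case
    if nodes == 1:
        pe_name = "gepetools_1host"  # PE with allocation_rule $pe_slots
    else:
        pe_name = f"ppn_{ppn}"       # hypothetical PE, allocation_rule = ppn
    return pe_name, slots


if __name__ == "__main__":
    print(route_job(1, 10))  # single node, 10 cores
    print(route_job(4, 12))  # four nodes, 12 cores each
```

In a real JSV the returned pair would be used to rewrite the job's PE request before it is accepted.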
Re: [gridengine users] Filling up nodes when using gepetools
> I believe it should be a load_formula of -slots, so the more slots are available (fewest used), the lower the load and the more attractive the node. The page Reuti pointed to manages to write this both ways around.

I'll try it out tomorrow - I'm not at the office now and it's a little bit difficult from here.
Re: [gridengine users] Filling up nodes when using gepetools
> On Thu, 30 Jul 2015 12:57:13 + Winkler, Ursula (ursula.wink...@uni-graz.at) ursula.wink...@uni-graz.at wrote:
> > > My suggestion was to modify your JSV/gepetools to force single-node parallel jobs into PEs with $pe_slots allocation rules (which gives you control over where they are scheduled via queue_sort_method and load_formula) while sending the others to PEs with other (appropriate) allocation rules that won't cause (ii).
> >
> > Well, I created an additional PE with allocation_rule $pe_slots and added an if condition to pe.jsv so that all jobs requesting just a single node are assigned to this new PE. But the annoying situation didn't change. The scheduler configuration is set to queue_sort_method load and load_formula slots. So what am I still missing?
>
> Ignore previous message. Me getting it back to front, I think. That looks correct (I think). Have you checked that the jobs show the right granted PE with qstat -j?

Yes, of course.
Re: [gridengine users] Filling up nodes when using gepetools
On 30.07.2015 at 18:14, Winkler, Ursula (ursula.wink...@uni-graz.at) ursula.wink...@uni-graz.at wrote:
> > On Thu, 30 Jul 2015 12:57:13 + Winkler, Ursula (ursula.wink...@uni-graz.at) ursula.wink...@uni-graz.at wrote:
> > > > My suggestion was to modify your JSV/gepetools to force single-node parallel jobs into PEs with $pe_slots allocation rules (which gives you control over where they are scheduled via queue_sort_method and load_formula) while sending the others to PEs with other (appropriate) allocation rules that won't cause (ii).
> > >
> > > Well, I created an additional PE with allocation_rule $pe_slots and added an if condition to pe.jsv so that all jobs requesting just a single node are assigned to this new PE. But the annoying situation didn't change. The scheduler configuration is set to queue_sort_method load and load_formula slots. So what am I still missing?
> >
> > Ignore previous message. Me getting it back to front, I think. That looks correct (I think). Have you checked that the jobs show the right granted PE with qstat -j?
>
> Yes, of course.

Sorry to step into the discussion: `qstat -j ...` shows the requested one, the granted one is in `qstat -r`.

$ qsub -pe * 2 test.sh
Your job 44329 (test.sh) has been submitted
$ qstat -j 44329
...
parallel environment: * range: 2
...
$ qstat -r
...
Requested PE: * 2
Granted PE: make 2

-- Reuti
Re: [gridengine users] Filling up nodes when using gepetools
On 30.07.2015 at 18:29, Reuti re...@staff.uni-marburg.de wrote:
> On 30.07.2015 at 18:14, Winkler, Ursula (ursula.wink...@uni-graz.at) ursula.wink...@uni-graz.at wrote:
> > > On Thu, 30 Jul 2015 12:57:13 + Winkler, Ursula (ursula.wink...@uni-graz.at) ursula.wink...@uni-graz.at wrote:
> > > > > My suggestion was to modify your JSV/gepetools to force single-node parallel jobs into PEs with $pe_slots allocation rules (which gives you control over where they are scheduled via queue_sort_method and load_formula) while sending the others to PEs with other (appropriate) allocation rules that won't cause (ii).
> > > >
> > > > Well, I created an additional PE with allocation_rule $pe_slots and added an if condition to pe.jsv so that all jobs requesting just a single node are assigned to this new PE. But the annoying situation didn't change. The scheduler configuration is set to queue_sort_method load and load_formula slots. So what am I still missing?
> > >
> > > Ignore previous message. Me getting it back to front, I think. That looks correct (I think). Have you checked that the jobs show the right granted PE with qstat -j?
> >
> > Yes, of course.
>
> Sorry to step into the discussion: `qstat -j ...` shows the requested one, the granted one is in `qstat -r`.
>
> $ qsub -pe * 2 test.sh
> Your job 44329 (test.sh) has been submitted
> $ qstat -j 44329
> ...
> parallel environment: * range: 2
> ...
> $ qstat -r
> ...
> Requested PE: * 2
> Granted PE: make 2
>
> -- Reuti

At the moment I can't say whether I checked it with qstat -j, but I did check it - when I'm in the office again I probably still have the output in some screen window, so I can tell you exactly. And I did run a test: I temporarily removed the PE from the queue, with the result that the jobs could not start anymore (as expected).