Hi,

On 29.01.2014 at 20:01, Daniel Kamalic wrote:
> Slotwise preemption doesn't seem to be working correctly for single jobs that
> take up multiple slots on my setup:

Unfortunately, that's true. I can't find a discussion about it in the mailing list, though. I thought this was an issue that had been fixed in the meantime.

-- Reuti


> enggrid1:~# qconf -sq hyness.q|grep subordinate
> subordinate_list slots=32(me.q:0:sr)
>
> enggrid1:~# qconf -sq me.q|grep subordinate
> subordinate_list slots=32(lowpriority.q:0:sr)
>
> This mimics the configuration shown in section 2c of the subordinate_list
> heading of the man page at
> http://gridscheduler.sourceforge.net/htmlman/htmlman5/queue_conf.html .
>
> I have a parallel environment "-pe threaded" set up for users to request
> multiple cores on a node:
>
> enggrid1:~# qconf -sp threaded
> pe_name            threaded
> slots              99999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    /bin/true
> stop_proc_args     /bin/true
> allocation_rule    $pe_slots
> control_slaves     FALSE
> job_is_first_task  TRUE
> urgency_slots      min
> accounting_summary FALSE
>
>
> But a 16-slot multi-thread job:
>
> >> [email protected] BIP 0/16/32 47.25 linux-x64
> >> 6635028 0.56000 script_mbr ebru r 01/03/2014 13:56:00 16
>
> is only counting as ONE slot as far as the preemption mechanism is concerned.
> This causes only ONE single-slot job on the subordinate queue to get
> suspended, when in fact 16 of them should get suspended. See below for my
> correspondence with the researcher, with full qstat output, for more detail.
>
> Any ideas what I'm doing wrong here, and how I should go about fixing it?
>
>
> Thanks!
> Dan
>
>
> >
> > What's going on here is it seems that Ebru's multi-thread "-pe threaded
> > 16" jobs are only counting as "one slot" each from the queue's
> > perspective. That's why when she ran two 16-thread jobs on
> > me-compsim-6, it suspended two of your one-slot jobs -- when it should
> > have suspended all 32 of your one-slot jobs.
> > What we want is for these jobs to count as the same number of "slots"
> > as "threads".
> >
> > There's got to be a way for either her to report the number of slots her
> > jobs are using correctly, or for me to get the queue to count the number
> > of slots she's using correctly. The "-l slots=16" option does not work
> > correctly, but we can play with some more of the options and figure it out.
> >
> >
> > On 01/04/2014 01:48 PM, [email protected] via RT wrote:
> >> <URL: http://eng-rt.bu.edu/rt/Ticket/Display.html?id=126980>
> >>
> >> Queue: ENG [Grid]
> >>
> >> Hi Dan,
> >>
> >> I think there is still a problem, unfortunately.
> >> Ebru is running some jobs from the hyness queue, and to give her priority
> >> I'm using the me queue instead. Even though she's using most of the slots
> >> of some servers, I am still able to run 32 jobs (or so).
> >>
> >> I'm printing the stats about hyness.q and me.q below. As you can see, I
> >> should not be able to run 32 jobs on the me-compsim-02,06,07 servers from
> >> me.q. The 06 server is not even listed as being in S state when printing
> >> stats from the me.q (same with lowpriority.q).
> >>
> >> Guilhem
> >>
> >>
> >> -bash-3.2$ qstat -f -q hyness.q -u '*'
> >> queuename qtype resv/used/tot. load_avg arch states
> >> ---------------------------------------------------------------------------------
> >> [email protected] BIP 0/0/32 28.03 linux-x64
> >> ---------------------------------------------------------------------------------
> >> [email protected] BIP 0/16/32 47.25 linux-x64
> >> 6635028 0.56000 script_mbr ebru r 01/03/2014 13:56:00 16
> >> ---------------------------------------------------------------------------------
> >> [email protected] BIP 0/0/32 33.29 linux-x64
> >> ---------------------------------------------------------------------------------
> >> [email protected] BIP 0/12/32 -NA- linux-x64 au
> >> 6619437 0.51000 matlab -no grichard dr 12/18/2013 13:16:38 1
> >> 6619476 0.51000 matlab -no grichard dr 12/18/2013 13:17:23 1
> >> 6619481 0.51000 matlab -no grichard dr 12/18/2013 13:17:23 1
> >> 6619501 0.51000 matlab -no grichard dr 12/18/2013 13:17:53 1
> >> 6619516 0.51000 matlab -no grichard dr 12/18/2013 13:18:08 1
> >> 6619518 0.51000 matlab -no grichard dr 12/18/2013 13:18:08 1
> >> 6619528 0.51000 matlab -no grichard dr 12/18/2013 13:18:08 1
> >> 6619546 0.51000 matlab -no grichard dr 12/18/2013 13:18:38 1
> >> 6619554 0.51000 matlab -no grichard dr 12/18/2013 13:18:38 1
> >> 6619564 0.51000 matlab -no grichard dr 12/18/2013 13:18:53 1
> >> 6619581 0.51000 matlab -no grichard dr 12/18/2013 13:19:08 1
> >> 6619647 0.51000 matlab -no grichard dr 12/18/2013 13:20:23 1
> >> ---------------------------------------------------------------------------------
> >> [email protected] BIP 0/31/32 42.67 linux-x64
> >> 6635029 0.61000 script_mbr ebru r 01/03/2014 13:57:15 31
> >> ---------------------------------------------------------------------------------
> >> [email protected] BIP 0/32/32 51.56 linux-x64
> >> 6632977 0.56000 script_mbr ebru r 01/02/2014 12:07:44 16
> >> 6633378 0.56000 script_mbr ebru r 01/02/2014 13:51:44 16
> >> ---------------------------------------------------------------------------------
> >> [email protected] BIP 0/0/32 32.35 linux-x64
> >>
> >>
> >> -bash-3.2$ qstat -f -q me.q -u '*'
> >> queuename qtype resv/used/tot. load_avg arch states
> >> ---------------------------------------------------------------------------------
> >> [email protected] BIP 0/28/32 28.03 linux-x64
> >> 6633178 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633195 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633199 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633205 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633209 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633216 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633221 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633228 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633233 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633240 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633245 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633261 0.51000 matlab -no grichard r 01/02/2014 13:23:44 1
> >> 6633299 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633302 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633307 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633310 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633315 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633318 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633323 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633326 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633331 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633334 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633339 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633342 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633347 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633350 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633353 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633355 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> ---------------------------------------------------------------------------------
> >> [email protected] BIP 0/32/32 46.49 linux-x64
> >> 6633200 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633204 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633210 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633215 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633222 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633227 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633234 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633239 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633246 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633252 0.51000 matlab -no grichard r 01/02/2014 13:23:44 1
> >> 6633256 0.51000 matlab -no grichard r 01/02/2014 13:23:44 1
> >> 6633262 0.51000 matlab -no grichard r 01/02/2014 13:23:44 1
> >> 6633268 0.51000 matlab -no grichard r 01/02/2014 13:23:44 1
> >> 6633274 0.51000 matlab -no grichard r 01/02/2014 13:23:44 1
> >> 6633285 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633287 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633290 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633293 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633296 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633300 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633304 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633308 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633312 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633316 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633320 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633324 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633328 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633332 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633336 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633341 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633345 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633349 0.51000 matlab -no grichard S 01/02/2014 13:50:14 1
> >> ---------------------------------------------------------------------------------
> >> [email protected] BIP 0/32/32 33.21 linux-x64
> >> 6633214 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633220 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633226 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633232 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633238 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633244 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633250 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633257 0.51000 matlab -no grichard r 01/02/2014 13:23:44 1
> >> 6633263 0.51000 matlab -no grichard r 01/02/2014 13:23:44 1
> >> 6633269 0.51000 matlab -no grichard r 01/02/2014 13:23:44 1
> >> 6633275 0.51000 matlab -no grichard r 01/02/2014 13:23:44 1
> >> 6633279 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633280 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633281 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633282 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633283 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633284 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633286 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633288 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633291 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633294 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633297 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633301 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633305 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633309 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633313 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633317 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633321 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633325 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633329 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633333 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633337 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> ---------------------------------------------------------------------------------
> >> [email protected] BIP 0/0/32 -NA- linux-x64 au
> >> ---------------------------------------------------------------------------------
> >> [email protected] BIP 0/32/32 43.34 linux-x64
> >> 6633184 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633186 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633188 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633190 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633192 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633194 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633197 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633202 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633207 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633212 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633218 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633224 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633230 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633236 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633242 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633248 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633253 0.51000 matlab -no grichard r 01/02/2014 13:23:44 1
> >> 6633259 0.51000 matlab -no grichard r 01/02/2014 13:23:44 1
> >> 6633265 0.51000 matlab -no grichard r 01/02/2014 13:23:44 1
> >> 6633271 0.51000 matlab -no grichard r 01/02/2014 13:23:44 1
> >> 6633277 0.51000 matlab -no grichard r 01/02/2014 13:23:44 1
> >> 6633340 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633344 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633348 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633352 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633354 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633356 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633357 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633358 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633359 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633360 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633361 0.51000 matlab -no grichard S 01/02/2014 13:50:14 1
> >> ---------------------------------------------------------------------------------
> >> [email protected] BIP 0/32/32 52.10 linux-x64
> >> 6633179 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633180 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633181 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633182 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633183 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633185 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633187 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633189 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633191 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633193 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633196 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633201 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633206 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633211 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633217 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633223 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633229 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633235 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633241 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633247 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633258 0.51000 matlab -no grichard r 01/02/2014 13:23:44 1
> >> 6633264 0.51000 matlab -no grichard r 01/02/2014 13:23:44 1
> >> 6633270 0.51000 matlab -no grichard r 01/02/2014 13:23:44 1
> >> 6633276 0.51000 matlab -no grichard r 01/02/2014 13:23:44 1
> >> 6633362 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633363 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633364 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633365 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633366 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633367 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633368 0.51000 matlab -no grichard S 01/02/2014 13:50:14 1
> >> 6633369 0.51000 matlab -no grichard S 01/02/2014 13:50:14 1
> >> ---------------------------------------------------------------------------------
> >> [email protected] BIP 0/32/32 32.30 linux-x64
> >> 6633198 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633203 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633208 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633213 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633219 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633225 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633231 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633237 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633243 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633249 0.51000 matlab -no grichard r 01/02/2014 13:23:29 1
> >> 6633254 0.51000 matlab -no grichard r 01/02/2014 13:23:44 1
> >> 6633260 0.51000 matlab -no grichard r 01/02/2014 13:23:44 1
> >> 6633266 0.51000 matlab -no grichard r 01/02/2014 13:23:44 1
> >> 6633272 0.51000 matlab -no grichard r 01/02/2014 13:23:44 1
> >> 6633278 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633289 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633292 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633295 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633298 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633303 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633306 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633311 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633314 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633319 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633322 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633327 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633330 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633335 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633338 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633343 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633346 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >> 6633351 0.51000 matlab -no grichard r 01/02/2014 13:50:14 1
> >>
> >>
> >
> --
> Daniel Kamalic
> Manager of Research Computing
> College of Engineering           Web: www.eng.bu.edu
> Boston University                     www.massopencloud.org
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
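To make the slot arithmetic in this thread concrete, here is a minimal Python sketch of slotwise suspension as we read queue_conf(5). It assumes that, per host, when the slots used in the superordinate queue plus its subordinate queue exceed the threshold (32 here, from slots=32(me.q:0:sr)), subordinate tasks are suspended, shortest-running first for "sr", until the total is no longer above the threshold. The Task type, tasks_to_suspend() and the job IDs 100/101 are hypothetical illustrations, not Grid Engine code; the model ignores the sequence number, the chained me.q to lowpriority.q subordination, and parallel jobs inside the subordinate queue.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """One running task on a host (hypothetical model, not a Grid Engine type)."""
    job_id: int
    slots: int  # slots granted, e.g. 16 for "-pe threaded 16"
    start: int  # start time in any increasing unit, used for "sr"/"lr" ordering


def tasks_to_suspend(threshold: int,
                     superordinate: List[Task],
                     subordinate: List[Task],
                     shortest_running_first: bool = True) -> List[Task]:
    """Return the subordinate tasks that slotwise suspension would stop on one host.

    Assumed rule: if superordinate + subordinate slots exceed `threshold`,
    suspend subordinate tasks until the total is no longer above it.
    """
    used = sum(t.slots for t in superordinate) + sum(t.slots for t in subordinate)
    excess = used - threshold
    # "sr" = shortest running first = most recently started first;
    # "lr" would be the reverse (earliest start first).
    candidates = sorted(subordinate, key=lambda t: t.start,
                        reverse=shortest_running_first)
    victims: List[Task] = []
    for task in candidates:
        if excess <= 0:
            break
        victims.append(task)
        excess -= task.slots
    return victims


if __name__ == "__main__":
    # 32 single-slot me.q tasks, as on the fully used hosts in the thread.
    me_q = [Task(job_id=i, slots=1, start=i) for i in range(32)]

    one_pe_job = [Task(job_id=100, slots=16, start=0)]
    two_pe_jobs = one_pe_job + [Task(job_id=101, slots=16, start=1)]

    print(len(tasks_to_suspend(32, one_pe_job, me_q)))   # prints 16
    print(len(tasks_to_suspend(32, two_pe_jobs, me_q)))  # prints 32

    # Observed behaviour: each PE job counted as a single slot.
    counted_as_one = [Task(job_id=100, slots=1, start=0)]
    print(len(tasks_to_suspend(32, counted_as_one, me_q)))  # prints 1
```

Counting the 16-slot PE job as one slot (1 + 32 = 33) reproduces the single suspension reported above, while counting all of its slots (16 + 32 = 48) gives the 16 suspensions Dan expected, and two such jobs (32 + 32 = 64) give all 32.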
