On 18.03.2013 at 07:56, Joseph Farran wrote:

> On 3/17/2013 1:42 PM, Reuti wrote:
>> On 17.03.2013 at 19:15, Joseph Farran wrote:
>>
>>> On 3/17/2013 2:14 AM, Reuti wrote:
>>>> On 17.03.2013 at 07:22, Joseph Farran wrote:
>>>>
>>>>> On 1/4/2013 10:37 AM, Reuti wrote:
>>>>>> On 02.01.2013 at 05:08, Joseph Farran wrote:
>>>>>>
>>>>>>> Hello Reuti.
>>>>>>>
>>>>>>> Yes, the job(s) are not suspending (S) as they normally do. So it's
>>>>>>> not the queue, but the jobs.
>>>>>> But is the queue in suspended state (qstat -f)?
>>>>> Sorry Reuti, missed your question.
>>>>>
>>>>> Yes, the queue is SUSPENDED but jobs continue to run. Here is one
>>>>> example:
>>>>>
>>>>> [email protected] BIP 0/4/64 11.21 lx-amd64 S
>>>>> 242709 0.00355 CMAPNN mengfant r 03/15/2013 02:27:23 2 20
>>>>> 242709 0.00355 CMAPNN mengfant r 03/15/2013 02:27:23 2 33
>>>> Were these slave tasks of a parallel job?
>>> No, they are part of a job array:
>>>
>>> qstat|fgrep compute-14-18
>>> 242709 0.00610 CMAPNN mengfant S 03/15/2013 02:27:23
>>> [email protected] 2 20
>>> 242709 0.00610 CMAPNN mengfant S 03/15/2013 02:27:23
>>> [email protected] 2 33
>> But it's using 2 slots - so only 2 slots via $pe_slots on the same machine?
>>
>> -- Reuti
>
> Yes and correction. It's a job array running with -pe with each task using
> 2 cores. So yes.

How is the subordination defined? Does it trigger only when the complete queue instance is filled up on this particular exechost?

-- Reuti

>>> I was able to suspend the queue "[email protected]" manually, but
>>> this happens every so often that Grid Engine "forgets".
>>>
>>>> -- Reuti
>>>>
>>>>> Any idea why it keeps forgetting to suspend? Only happens once in a
>>>>> while but it overloads the nodes when it does happen.
>>>>>
>>>>>> -- Reuti
>>>>>>
>>>>>>> Normally as soon as 1 or more core jobs enters the node through the
>>>>>>> queue, the subordinate jobs suspend immediately. Once in a while,
>>>>>>> the jobs that go in through the subordinate queue do not suspend as
>>>>>>> they should.
>>>>>>>
>>>>>>> On 1/1/2013 7:04 AM, Reuti wrote:
>>>>>>>> Engine Forgets and does not suspend and the node is overloaded.
>>>>>>>> The queue is not going into the "S" state or the jobs therein are just
>>>>>>>> not suspended?
>>>>>>>>
>>>>>>>> -- Reuti
