On 18.03.2013 at 07:56, Joseph Farran wrote:

> On 3/17/2013 1:42 PM, Reuti wrote:
>> On 17.03.2013 at 19:15, Joseph Farran wrote:
>> 
>>> On 3/17/2013 2:14 AM, Reuti wrote:
>>>> On 17.03.2013 at 07:22, Joseph Farran wrote:
>>>> 
>>>>> On 1/4/2013 10:37 AM, Reuti wrote:
>>>>>> On 02.01.2013 at 05:08, Joseph Farran wrote:
>>>>>> 
>>>>>>> Hello Reuti.
>>>>>>> 
>>>>>>> Yes, the job(s) are not suspending (S) as they normally do.   So it's 
>>>>>>> not the queue, but the jobs.
>>>>>> But is the queue in suspended state (qstat -f)?
>>>>> Sorry Reuti, missed your question.
>>>>> 
>>>>> Yes, the queue is SUSPENDED but jobs continue to run. Here is one 
>>>>> example:
>>>>> 
>>>>> [email protected]     BIP   0/4/64         11.21 lx-amd64      S
>>>>> 242709 0.00355 CMAPNN     mengfant     r     03/15/2013 02:27:23     2 20
>>>>> 242709 0.00355 CMAPNN     mengfant     r     03/15/2013 02:27:23     2 33
>>>> Were these slave tasks of a parallel job?
>>> No, they are part of a job array:
>>> 
>>> qstat|fgrep compute-14-18
>>> 242709 0.00610 CMAPNN     mengfant     S     03/15/2013 02:27:23 
>>> [email protected]         2 20
>>> 242709 0.00610 CMAPNN     mengfant     S     03/15/2013 02:27:23 
>>> [email protected]         2 33
>> But it's using 2 slots - so only 2 slots via $pe_slots on the same machine?
>> 
>> -- Reuti
> 
> Yes, and a correction: it's a job array submitted with -pe, with each task 
> using 2 cores.

How is the subordination defined - when the complete queue instance is filled 
up on this particular exechost?

-- Reuti
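
To spot the symptom described in this thread (a queue instance showing state S while its jobs still show r), the `qstat -f` output can be scanned automatically. The following is only a minimal sketch, assuming output shaped like the excerpt quoted above. The queue instance name `subq@compute-14-18` is a hypothetical placeholder, since the real name is obscured in the archive, and the function name `find_unsuspended_jobs` is likewise made up for illustration.

```python
# Sample shaped like the qstat -f excerpt quoted in this thread.
# "subq@compute-14-18" is a hypothetical queue instance name.
SAMPLE = """\
subq@compute-14-18     BIP   0/4/64         11.21 lx-amd64      S
242709 0.00355 CMAPNN     mengfant     r     03/15/2013 02:27:23     2 20
242709 0.00355 CMAPNN     mengfant     r     03/15/2013 02:27:23     2 33
"""


def find_unsuspended_jobs(qstat_text):
    """Return (queue_instance, job_id, task_id) for every job that is still
    in state 'r' while its queue instance reports state 'S'."""
    offenders = []
    current_queue = None
    queue_suspended = False
    for line in qstat_text.splitlines():
        fields = line.split()
        # Skip blank lines, separator rules, and the column header.
        if not fields or line.startswith(("-", "#", "queuename")):
            continue
        if fields[0].isdigit():
            # Job line: id, priority, name, user, state, date, time, slots[, task]
            if len(fields) < 5:
                continue
            state = fields[4]
            if queue_suspended and state == "r":
                task_id = fields[-1] if len(fields) >= 9 else None
                offenders.append((current_queue, fields[0], task_id))
        elif "@" in fields[0]:
            # Queue instance line: name, type, slots, load, arch[, states]
            current_queue = fields[0]
            states = fields[5] if len(fields) > 5 else ""
            queue_suspended = "S" in states
    return offenders
```

Calling `find_unsuspended_jobs(SAMPLE)` flags both array tasks (20 and 33) of job 242709. On a live cluster the text would come from running `qstat -f`; the column layout there may differ from this excerpt, so the field positions should be checked against the actual output first.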


>>> I was able to suspend the queue "[email protected]" manually, but 
>>> every so often Grid Engine "forgets" to do it.
>>> 
>>>> -- Reuti
>>>> 
>>>> 
>>>>> Any idea why it keeps forgetting to suspend?    Only happens once in a 
>>>>> while but it overloads the nodes when it does happen.
>>>>> 
>>>>> 
>>>>> 
>>>>>> -- Reuti
>>>>>> 
>>>>>> 
>>>>>>> Normally, as soon as one or more core jobs enter the node through the 
>>>>>>> queue, the subordinate jobs suspend immediately. Once in a while, 
>>>>>>> the jobs that go in through the subordinate queue do not suspend as 
>>>>>>> they should.
>>>>>>> 
>>>>>>> On 1/1/2013 7:04 AM, Reuti wrote:
>>>>>>>> Engine Forgets and does not suspend and the node is overloaded.
>>>>>>>> The queue is not going into the "S" state or the jobs therein are just 
>>>>>>>> not suspended?
>>>>>>>> 
>>>>>>>> -- Reuti
>>>>>>>> 
>> 
> 

