Re: [gridengine users] SGE 8.1.3 and USE_CGROUPS sets hosts in error state

Mikael Brandström Durling Wed, 13 Mar 2013 15:41:39 -0700

And the shepherd trace:

failed before prolog: shepherd exited with exit status 7: before prolog
Shepherd trace:
03/13/2013 22:39:47 [0:17310]: shepherd called with uid = 0, euid = 0
03/13/2013 22:39:47 [400:17310]: starting up 8.1.3
03/13/2013 22:39:47 [400:17310]: can't open file pid: Permission denied
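
For reading the trace above: the bracketed pair looks like [effective uid:pid]. That is an assumption, inferred from the first line reporting euid = 0 next to [0:17310]. Under that assumption, a minimal Python sketch that parses the pasted lines and reports which uid hit the permission error could look like this. The names parse_trace and first_permission_error are made up for illustration and are not part of SGE.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed layout of a shepherd trace line:
#   MM/DD/YYYY HH:MM:SS [uid:pid]: message
# Reading the bracket as [uid:pid] is an assumption based on the trace above.
TRACE_LINE = re.compile(
    r"^(?P<stamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<uid>\d+):(?P<pid>\d+)\]: (?P<msg>.*)$"
)


class TraceEntry(NamedTuple):
    stamp: str
    uid: int
    pid: int
    msg: str


def parse_trace(text: str) -> List[TraceEntry]:
    """Return one TraceEntry per line that matches the assumed layout."""
    entries = []
    for line in text.splitlines():
        m = TRACE_LINE.match(line.strip())
        if m:
            entries.append(
                TraceEntry(m["stamp"], int(m["uid"]), int(m["pid"]), m["msg"])
            )
    return entries


def first_permission_error(entries: List[TraceEntry]) -> Optional[TraceEntry]:
    """Return the first entry whose message mentions 'Permission denied'."""
    for entry in entries:
        if "Permission denied" in entry.msg:
            return entry
    return None


# The three trace lines exactly as pasted in this message.
TRACE = """\
03/13/2013 22:39:47 [0:17310]: shepherd called with uid = 0, euid = 0
03/13/2013 22:39:47 [400:17310]: starting up 8.1.3
03/13/2013 22:39:47 [400:17310]: can't open file pid: Permission denied
"""

entries = parse_trace(TRACE)
failed = first_permission_error(entries)
if failed is not None:
    print(f"uid {failed.uid} (pid {failed.pid}) failed: {failed.msg}")
```

On this trace the failing line carries uid 400 rather than 0, which would fit the shepherd having already dropped from root to the admin user while the job spool directory was still owned by root.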


Mikael

---
Sent from a crippled computer (a.k.a a phone)

On 13 Mar 2013, at 23:10, "Mikael Brandström Durling" <[email protected]> wrote:

> Hi sge users,
> I have been testing the USE_CGROUPS option that is available to execd. When 
> USE_CGROUPS is enabled it works fine to submit jobs one by one. But when I 
> submitted 70 serial jobs, all queues on all hosts were set to error state. It 
> happens after 2 or more jobs have started on the host. The error message 
> is that the shepherd exited with return code 7, and the shepherd's trace is 
> pasted below. Jobs that successfully start have job spool directories owned 
> by the gridadmin administrative user (the user SGE runs as), while the spool 
> directories of the failed jobs are still owned by root.
> If I turn off USE_CGROUPS everything works OK. It seems as if there is some 
> race condition that can be triggered when jobs are started rapidly, but I 
> have not been able to figure out what is really happening.
> Mikael
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

