And the shepherd trace:

failed before prolog: shepherd exited with exit status 7: before prolog
Shepherd trace:
03/13/2013 22:39:47 [0:17310]: shepherd called with uid = 0, euid = 0
03/13/2013 22:39:47 [400:17310]: starting up 8.1.3
03/13/2013 22:39:47 [400:17310]: can't open file pid: Permission denied

Mikael

---
Sent from a crippled computer (a.k.a. a phone)

On 13 Mar 2013 at 23:10, "Mikael Brandström Durling" <[email protected]> wrote:

> Hi sge users,
>
> I have been testing the USE_CGROUPS option that is available to execd. When
> USE_CGROUPS is enabled, submitting jobs one by one works fine. But when I
> submitted 70 serial jobs, all queues on all hosts were set to error state.
> This happens after 2 or more jobs have started on a host. The error message
> says that the shepherd exited with return code 7; the shepherd trace is
> pasted below. Jobs that start successfully have job spool directories owned
> by the gridadmin administrative user (the user SGE runs as), while the spool
> directories of the failed jobs are still owned by root.
>
> If I turn off USE_CGROUPS, everything works OK. It seems as if there is some
> race condition that can be triggered when jobs are started rapidly, but I
> have not been able to figure out what is really happening.
>
> Mikael
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
