Joseph Farran <[email protected]> writes:

> Howdy.
>
> We have a cluster running Rocks 6.1 with Grid Engine 8.1.2.
>
> Every once in a while, we get jobs that fail because they cannot set the
> user id (setuid fails).
>
> The nodes have the correct /etc/passwd entry, since many jobs from the
> same user work and only a few fail every once in a while.

[The failure is different if the user isn't defined on the node.]

> The user
> submits several hundred 1-core jobs at once

Presumably not running on one node?

> so I am not sure if that
> is contributing to the failures, but it should not. The failures
> are random and happen at a rate of about 1 in every 300 jobs.
>
> Any suggestions on what could be causing this?

No, but it might be caused by a previous, ignored error.  There are a
load of relevant return values ignored by the code in that area.  I'll
make the setuid message use strerror, at least, for more info.  You
could perhaps patch it to do that and figure out which of the two
possible failures (assuming Linux) it is.
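
A minimal sketch of that kind of patch, not the actual Grid Engine
code: the helper names below are hypothetical, and reading the "two
possible failures" as EAGAIN and EPERM (both documented for setuid(2)
on Linux) is an assumption. The point is only to save errno right
after the failed call and put strerror() output into the message.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: symbolic name for the errno values of interest. */
const char *setuid_errno_name(int err)
{
    switch (err) {
    case EAGAIN: return "EAGAIN"; /* e.g. RLIMIT_NPROC would be exceeded for the target uid */
    case EPERM:  return "EPERM";  /* caller lacks the privilege to switch to that uid */
    default:     return "other";
    }
}

/* Hypothetical helper: build a diagnostic that includes strerror() text. */
void format_setuid_error(uid_t uid, int err, char *msg, size_t len)
{
    assert(msg != NULL && len > 0);
    snprintf(msg, len, "setuid(%lu) failed: %s (%s)",
             (unsigned long)uid, strerror(err), setuid_errno_name(err));
}

/* Call setuid() and, on failure, fill msg and return the saved errno.
 * Returns 0 on success. */
int setuid_with_report(uid_t uid, char *msg, size_t len)
{
    if (setuid(uid) == 0) {
        return 0;
    }
    int err = errno; /* save before any later call can overwrite it */
    format_setuid_error(uid, err, msg, len);
    return err;
}
```

With a message like this in the job's error output, an EAGAIN would
point towards a per-user process limit or a transient resource
shortage on the node, while an EPERM would point towards the calling
process having lost the privilege it needs.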

-- 
Community Grid Engine:  http://arc.liv.ac.uk/SGE/