Sorry, I apparently missed this before.
William Hay <[email protected]> writes:
> On 01/08/13 04:09, Jewell, Chris wrote:
>> Hello all,
>>
>> A while since I posted here, so good to be back!
>>
>> My installation of GE 8.1.3 from the Scientific Linux 6.3 RPM repo
>> has started misbehaving of late, since I introduced a share tree
>> policy the other day.
[I doubt there's a direct connexion.]
>> My setup is contained entirely on my 32 cpu, 2 NVIDIA Tesla card
>> machine (both qmaster and execd), and the spool directory is
>> mounted in /opt which is on the root partition. Having had a very
>> stable vanilla setup, I decided to implement a share tree policy to
>> give myself priority on my machine, and keep other users at bay.
>> All of a sudden, I'm getting lots of error messages in the execd
>> messages file:
>>
>> 07/29/2013 09:55:54| main | it060123 | C | can't switch
>> _user/group: Success
> I wonder if you are running out of unix groups to assign to the jobs?
The message comes from a failure of setuid(2) or similar.  I don't know
whether it's a libc bug that errno seems not to be set ("Success") as it
should be.
The two possible cases are:

    EAGAIN The uid does not match the current uid and uid brings process
           over its RLIMIT_NPROC resource limit.

i.e. check the limit on processes/user (ulimit -u), and

    EPERM  The user is not privileged (Linux: does not have the CAP_SETUID
           capability) and uid does not match the real UID or saved set-
           user-ID of the calling process.

possibly because there was a previous failure to switch back to root
somehow.  Many cases still don't have a check for errors,
unfortunately.  (At least in some cases failure to drop privileges
should probably be fatal.)
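
To make the two cases concrete, here is a rough sketch in Python (not
what execd actually does; the function names explain_setuid_error and
try_switch_user are made up for illustration) of checking the result of
a setuid call and mapping errno to the likely cause:

```python
import errno
import os


def explain_setuid_error(err):
    """Map an errno value from a failed setuid(2) to a likely cause."""
    if err == errno.EAGAIN:
        return "EAGAIN: target user is at its RLIMIT_NPROC limit (check ulimit -u)"
    if err == errno.EPERM:
        return "EPERM: caller lacks privilege (perhaps an earlier switch back to root failed)"
    if err == 0:
        # This is the "Success" case from the log: errno carries no information.
        return "errno not set: cause unknown"
    return "unexpected errno %d (%s)" % (err, os.strerror(err))


def try_switch_user(uid):
    """Attempt setuid and report a failure instead of carrying on silently.

    Returns None on success, otherwise a diagnostic string.
    """
    try:
        os.setuid(uid)
    except OSError as exc:
        return explain_setuid_error(exc.errno)
    return None
```

The point is only that the return status is checked and errno is looked
at immediately after the failing call, before anything else can
overwrite it.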
In the absence of errno info, EAGAIN seems more likely.
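
If you want to test the EAGAIN theory, something like the following
sketch compares a user's current task count with the process limit.  It
assumes Linux with a readable /proc; count_tasks and nproc_headroom are
hypothetical helper names.  It counts threads as well as processes,
since both count against RLIMIT_NPROC on Linux, and it matches on the
owner of the /proc entry, which is only an approximation of the real
uid the kernel uses.

```python
import os
import resource


def count_tasks(uid, proc_dir="/proc"):
    """Count tasks (processes and threads) whose /proc entry is owned by uid."""
    total = 0
    for name in os.listdir(proc_dir):
        if not name.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        pid_dir = os.path.join(proc_dir, name)
        try:
            if os.stat(pid_dir).st_uid != uid:
                continue
            # Each entry under task/ is one thread of this process.
            total += len(os.listdir(os.path.join(pid_dir, "task")))
        except OSError:
            continue  # the process went away while we were looking
    return total


def nproc_headroom(uid, proc_dir="/proc"):
    """Return soft RLIMIT_NPROC minus current task count, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft - count_tasks(uid, proc_dir)


if __name__ == "__main__":
    print("headroom for uid %d: %r" % (os.getuid(), nproc_headroom(os.getuid())))
```

Note that the limit reported is that of the process running the
script, which may differ from the limit sge_execd was started with;
/proc/<pid>/limits for the running daemon shows the latter.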
--
Community Grid Engine: http://arc.liv.ac.uk/SGE/