> The message is from a failure of setuid(2) or similar. I don't know if
> it's a libc bug that errno seems not to be set ("Success") as it should
> be.
>
> The two possible cases are:
>
> EAGAIN The uid does not match the current uid and uid brings process
> over its RLIMIT_NPROC resource limit.
>
> i.e. check the limit on processes/user (ulimit -u), and
Checked ulimit -u, which shows 1024 processes per user. Monitoring processes
per user with:
$ while (true) ; do ps hax -o user | sort | uniq -c; sleep 0.5; done
shows that sgeadmin and my username never go above 120 processes. I've
increased the process limit in /etc/security/limits.d/90-nproc.conf to 4096
to see if this improves things, but...
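Two details may affect the numbers above. First, on Linux RLIMIT_NPROC is checked against all tasks (threads) owned by a real UID, so counting processes with `ps hax -o user` can undercount. Second, files in /etc/security/limits.d are applied by pam_limits when a session is opened, so a daemon started from an init script may be running with a different limit from the one `ulimit -u` reports in a login shell. The following is a minimal sketch for checking both, assuming Linux /proc and that the execution daemon's process name is `sge_execd`; the helper `nproc_limit` is hypothetical, introduced only for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the soft RLIMIT_NPROC ("Max processes") of a
# process, read from /proc/<pid>/limits (Linux only; may be "unlimited").
nproc_limit() {
    awk '/^Max processes/ {print $3}' "/proc/$1/limits"
}

echo "this shell: $(nproc_limit $$)"

# Assumption: the execution daemon's process name is "sge_execd".
for pid in $(pgrep -x sge_execd); do
    echo "sge_execd (pid $pid): $(nproc_limit "$pid")"
done

# Count threads (-L) per real UID, largest first: the kernel's
# RLIMIT_NPROC check counts every task belonging to the real UID.
ps -eLo ruid= | sort | uniq -c | sort -rn
```

In bash, the "this shell" value should agree with `ulimit -u`. The thread count can be wrapped in the same `while (true) ... sleep 0.5` loop as above to watch for peaks.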
> EPERM The user is not privileged (Linux: does not have the CAP_SETUID
> capability) and uid does not match the real UID or saved set-
> user-ID of the calling process.
>
> possibly because there was a previous failure to switch back to root
> somehow -- many cases still don't have a check for errors,
> unfortunately. (At least in some cases failure to drop privileges
> should probably be fatal.)
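For the EPERM case, it may help to see which UIDs a long-running daemon actually holds. Following the man-page text quoted above, a process that is meant to switch between root and the admin user but shows neither a real nor a saved UID of 0 can no longer get back to root. A minimal sketch, assuming Linux /proc and the process name `sge_execd` (the short-lived shepherd processes would have to be caught while a job is running). The helper `show_uids` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the real, effective, saved and filesystem UIDs
# of a process, taken from the "Uid:" line of /proc/<pid>/status (Linux only).
show_uids() {
    awk -v pid="$1" '
        /^Name:/ { name = $2 }
        /^Uid:/  { printf "%s %s real=%s effective=%s saved=%s fs=%s\n",
                          pid, name, $2, $3, $4, $5 }' "/proc/$1/status"
}

# Assumption: the execution daemon's process name is "sge_execd".
for pid in $(pgrep -x sge_execd); do
    show_uids "$pid"
done
```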
>
> In the absence of errno info, EAGAIN seems more likely.
Well, I'm not sure, given the above. Is there anything else I can do to
gather more info? Is it possible to get GE not to delete the directories in
$SGE_ROOT/default/spool/hostname/active_jobs so that I can get a trace?
Cheers,
Chris
--
Dr Chris Jewell
Lecturer in Biostatistics
Institute of Fundamental Sciences
Massey University
Private Bag 11222
Palmerston North 4442
New Zealand
Tel: +64 (0) 6 350 5701 Extn: 3586
_______________________________________________
users mailing list
https://gridengine.org/mailman/listinfo/users