"Jewell, Chris" <[email protected]> writes:

>> The message is from a failure of setuid(2) or similar.  I don't know if
>> it's a libc bug that errno seems not to be set ("Success") as it should
>> be.
>>
>> The two possible cases are:
>>
>>   EAGAIN The uid does not match the current uid and uid brings process
>>          over its RLIMIT_NPROC resource limit.
>>
>> i.e. check the limit on processes/user (ulimit -u), and
>
> Checked ulimit -u, which shows 1024 processes per user.

Is that actually in the job context?

> Monitoring jobs/user with:
>
> $ while (true) ; do ps hax -o user | sort | uniq -c; sleep 0.5; done
>
> shows that sgeadmin and my username never go above 120 jobs.

Jobs or processes?  (I'm not sure that measures "never".)

> I've increased the process limit
> in /etc/security/limits.d/90-nproc.conf to 4096 to see if this
> improves things, but...

Did it?

> Well, not sure, given the above.  Anything else I can do to try to
> gather more info?

It's possible a trace from execd would show something useful (see
sge_dl(8)), but given the apparently-bogus errno from the setuid or
whatever, I'd resort to running it under gdb, which is probably not what
you want to hear.  I'll see if I can reproduce it sometime with a low
process limit.

-- 
Community Grid Engine:  http://arc.liv.ac.uk/SGE/
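
For comparing the limit in a login shell with the one a job actually sees, something like the following might help. It is only a sketch, assuming bash and procps ps on Linux; the function names count_by_user and report_nproc are made up for illustration. It counts threads rather than processes, since on Linux RLIMIT_NPROC is charged per thread.

```shell
#!/bin/bash
# Hypothetical helpers, not part of Grid Engine.

# Read user names on stdin (one per line) and print "user count" pairs.
count_by_user() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Print the soft per-user process limit and the current per-user thread
# counts, as seen from wherever this script happens to run.
report_nproc() {
    echo "soft nproc limit: $(ulimit -u)"
    # -L gives one line per thread; "user=" suppresses the header line.
    ps -eL -o user= | count_by_user
}

# Only produce a report when asked to, e.g. "./nproc.sh report".
if [ "${1:-}" = "report" ]; then
    report_nproc
fi
```

Running it once interactively and once as a job script (submitted with qsub, with "report" as the script argument) should show whether the two contexts agree. The limit a job sees is inherited from execd, so it need not match what a login shell reports.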
