"Jewell, Chris" <[email protected]> writes:

>> The message is from a failure of setuid(2) or similar.  I don't know if
>> it's a libc bug that errno seems not to be set ("Success") as it should
>> be.
>> 
>> The two possible cases are:
>> 
>>    EAGAIN The uid does not match the current uid and uid brings process
>>           over its RLIMIT_NPROC resource limit.
>> 
>> i.e. check the limit on processes/user (ulimit -u), and
>
> Checked ulimit -u, which shows 1024 processes per user.

Is that actually in the job context?
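
One way to check, as a minimal sketch assuming Linux with
/proc/<pid>/limits available: read the limit from a running process
rather than from a login shell.  The helper name nproc_limit_of is
made up for illustration and is not part of Grid Engine.

```shell
#!/bin/sh
# Print the "Max processes" (RLIMIT_NPROC) line for a given PID.
# Assumes Linux, where /proc/<pid>/limits exists; nproc_limit_of is a
# hypothetical helper name.
nproc_limit_of() {
    grep '^Max processes' "/proc/$1/limits"
}

# Limit seen by the current shell; running the same line from inside a
# submitted job shows the value in the job context instead.
nproc_limit_of $$
```

Pointing it at the PID of the running sge_execd should show the limit
the daemon itself has, which is what its children inherit unless
something changes it.  Settings under /etc/security/limits.d are
applied by pam_limits at session start, so an already-running daemon
probably won't pick up a change until it is restarted.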

> Monitoring jobs/user with:
>
> $ while (true) ; do ps hax -o user | sort | uniq -c; sleep 0.5; done
>
> shows that sgeadmin and my username never go above 120 jobs.

Jobs or processes?  (I'm not sure that measures "never".)
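
For what it's worth, on Linux RLIMIT_NPROC counts threads rather than
just processes, and against the real uid (see getrlimit(2)), whereas
"ps hax -o user" prints one line per process under the effective user,
so the loop above can undercount.  A rough sketch that counts threads
per real user instead, assuming procps ps; the function name is only
illustrative, and sampling like this still can't prove the limit was
never hit between samples.

```shell
#!/bin/sh
# Count threads (LWPs) per real user.  With procps ps, -L prints one
# line per thread and "ruser=" selects the real user with no header.
# threads_per_user is a hypothetical helper name.
threads_per_user() {
    ps -eLo ruser= | sort | uniq -c | sort -rn
}

threads_per_user
```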

> I've increased the process limit
> in /etc/security/limits.d/90-nproc.conf to 4096 to see if this
> improves things, but...

Did it?

> Well, not sure, given the above.  Anything else I can do to try to
> gather more info?

It's possible a trace from execd would show something useful (see
sge_dl(8)), but given the apparently-bogus errno from the setuid or
whatever, I'd resort to running it under gdb, which is probably not what
you want to hear.

I'll see if I can reproduce it sometime with a low process limit.

-- 
Community Grid Engine:  http://arc.liv.ac.uk/SGE/
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users