Sorry for the kerfufflage. 

'ssh -vvv' wasn't particularly informative (but I agree that's the 1st thing 
to try after such a rejection; it's often /very/ informative).

We did look at the logs and saw lots of this:

Feb 13 12:51:29 compute-1-1 sshd[56824]: Accepted hostbased for mcherian from 10.1.255.239 port 44435 ssh2
Feb 13 12:51:29 compute-1-1 sshd[56824]: pam_unix(sshd:session): session opened for user mcherian by (uid=0)
Feb 13 12:51:29 compute-1-1 sshd[56828]: fatal: setresuid 649: Resource temporarily unavailable
Feb 13 12:51:29 compute-1-1 sshd[56824]: pam_unix(sshd:session): session closed for user mcherian

It looks like ssh is running into this restriction:

<https://bugzilla.redhat.com/show_bug.cgi?id=165156>
   and 
<http://www.linuxquestions.org/questions/linux-newbie-8/unable-to-login-user-account-orasp-on-linux-server-936871/>

So there are probably too many processes running for the limits set in 
'/etc/security/limits.conf' 
   i.e.
<http://gerardnico.com/wiki/linux/limits.conf>
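
For anyone who wants to see which nproc limits are configured, here is a 
minimal sketch in Python. It assumes the usual four-column limits.conf layout 
(domain, type, item, value) with '#' comments; the function name 
nproc_entries is my own invention, not part of any tool.

```python
def nproc_entries(text):
    """Return (domain, type, value) tuples for every 'nproc' line.

    Assumes the usual limits.conf layout: <domain> <type> <item> <value>,
    with '#' starting a comment. Lines in any other shape are ignored.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        fields = line.split()
        if len(fields) == 4 and fields[2] == "nproc":
            domain, kind, _item, value = fields
            entries.append((domain, kind, value))
    return entries
```

To use it, pass in the contents of '/etc/security/limits.conf', e.g. 
nproc_entries(open("/etc/security/limits.conf").read()). pam_limits also 
reads files under '/etc/security/limits.d/', so those are worth checking the 
same way.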

I.e., it may have nothing to do with GE per se; it may just be that the 
number of processes started by that GE job exceeds the number allowed by 
limits.conf.
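
A rough way to check that theory on a node is to compare how many processes a 
user owns against the per-user process limit. The sketch below is only an 
approximation under a few assumptions: it counts the numeric entries in /proc 
by owner, whereas the kernel also counts threads against the limit, and 
getrlimit() reports the limit of the process running the script, which may 
differ from what an sshd session gets. The names count_processes and 
nproc_report are hypothetical helpers, not part of any tool.

```python
import os
import resource


def count_processes(uid, proc_root="/proc"):
    """Count numeric entries under proc_root that are owned by uid.

    This counts processes only; threads, which the kernel also charges
    against the per-user limit, are not included, so it can undercount.
    """
    count = 0
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-PID entries such as 'self' or 'meminfo'
        try:
            if os.stat(os.path.join(proc_root, name)).st_uid == uid:
                count += 1
        except OSError:
            continue  # the process exited while we were looking
    return count


def nproc_report(uid=None, proc_root="/proc"):
    """Return the process count for uid next to this process's nproc limits."""
    if uid is None:
        uid = os.getuid()
    # A value equal to resource.RLIM_INFINITY means "unlimited".
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "uid": uid,
        "running": count_processes(uid, proc_root),
        "soft": soft,
        "hard": hard,
    }
```

Running nproc_report(649) as root on the affected node while the job is active 
would show whether uid 649 (the uid in the setresuid error above) is close to 
its limit.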

We're trying out the suggestion to increase those limits, at least 
temporarily, to see if that identifies the ssh interaction.

hjm


On Thursday, February 14, 2013 09:39:05 AM William Hay wrote:
> On 13 February 2013 19:47, Joseph Farran <[email protected]> wrote:
> > Hi Harry.
> > 
> > Mathew did and I asked him to ask here - I got too many fires on the
> > burner.
> > 
> > Another followup to my posting:
> > 
> > When Mathew cannot ssh to a node, I can ssh just fine from my own regular
> > account or from root to the node.
> > 
> > So this leaves me to believe it's Grid Engine doing the restriction.
> > 
> Grid engine doesn't normally interfere with sshd unless you have an
> appropriate pam module installed which would normally work the other way
> (ie permit him to ssh only if he has a job on the node).  The first thing
> I would do is look at the
> logs for the node in question to see if a reason for the drop is logged but
> wild ass guess you have RLIMIT_NPROC set to some value (eg via ulimit -u)
> low enough so that when the user's job goes berserk it consumes all the
> available processes for that user then when sshd tries to drop privs via
> setuid it gets EAGAIN.   But as I said that is a wild ass guess and if I'm
> correct this has nothing to do with grid engine so I'd look at the logs if
> I were you.  Additionally trying the ssh with a few -v might help.
> 
> William
> 
> > _______________________________________________
> > users mailing list
> > [email protected]
> > https://gridengine.org/mailman/listinfo/users

---
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
---
"Something must be done. [X] is something. Therefore, we must do it."
Bruce Schneier, on American response to just about anything.
