Sorry for the kerfufflage. 'ssh -vvv' wasn't particularly informative (but I agree that's the first thing to try after such a rejection; it's often /very/ informative).
We did look at the logs and saw lots of this:

Feb 13 12:51:29 compute-1-1 sshd[56824]: Accepted hostbased for mcherian from 10.1.255.239 port 44435 ssh2
Feb 13 12:51:29 compute-1-1 sshd[56824]: pam_unix(sshd:session): session opened for user mcherian by (uid=0)
Feb 13 12:51:29 compute-1-1 sshd[56828]: fatal: setresuid 649: Resource temporarily unavailable
Feb 13 12:51:29 compute-1-1 sshd[56824]: pam_unix(sshd:session): session closed for user mcherian

It looks like ssh is running into this restriction:
<https://bugzilla.redhat.com/show_bug.cgi?id=165156>
and
<http://www.linuxquestions.org/questions/linux-newbie-8/unable-to-login-user-account-orasp-on-linux-server-936871/>

So there are probably too many processes running for the limits set in
'/etc/security/limits.conf', ie
<http://gerardnico.com/wiki/linux/limits.conf>

ie, it may have nothing to do with GE per se, but just that the number of
processes that start due to that GE job exceeds the number allowed by
limits.conf.

We're trying out the suggestion to increase them, at least temporarily, to see
if that identifies the ssh interaction.

hjm

On Thursday, February 14, 2013 09:39:05 AM William Hay wrote:
> On 13 February 2013 19:47, Joseph Farran <[email protected]> wrote:
> > Hi Harry.
> >
> > Mathew did and I asked him to ask here - I got too many fires on the
> > burner.
> >
> > Another followup to my posting:
> >
> > When Mathew cannot ssh to a node, I can ssh just fine from my own regular
> > account or from root to the node.
> >
> > So this leaves me to believe it's Grid Engine doing the restriction.
>
> Grid engine doesn't normally interfere with sshd unless you have an
> appropriate pam module installed which would normally work the other way
> (ie permit him to ssh only if he has a job on the node).
> The first thing I would do is look at the
> logs for the node in question to see if a reason for the drop is logged but
> wild ass guess you have RLIMIT_NPROC set to some value (eg via ulimit -u)
> low enough so that when the user's job goes berserk it consumes all the
> available processes for that user then when sshd tries to drop privs via
> setuid it gets EAGAIN. But as I said that is a wild ass guess and if I'm
> correct this has nothing to do with grid engine so I'd look at the logs if
> I were you. Additionally trying the ssh with a few -v might help.
>
> William
>
> > _______________________________________________
> > users mailing list
> > [email protected]
> > https://gridengine.org/mailman/listinfo/users

---
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
---
"Something must be done. [X] is something. Therefore, we must do it."
Bruce Schneier, on American response to just about anything.
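PS: for anyone who wants to check whether a user is close to the per-user process limit that the setresuid failure above points to, here is a rough sketch in Python. It is not something we ran on the cluster; the function names are made up for illustration, and it assumes a Linux node with /proc mounted. It counts tasks by the owner of each /proc/<pid> directory, which is only an approximation of what the kernel charges against RLIMIT_NPROC (that accounting is per real UID and includes threads). The limits it prints are those of the shell it is run from, which may differ from what limits.conf gives an sshd session.

```python
import os
import resource


def nproc_limits():
    """Return the (soft, hard) RLIMIT_NPROC values for the current process."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def count_tasks_for_uid(uid, proc_root="/proc"):
    """Approximate the number of tasks (threads) owned by uid.

    On Linux, RLIMIT_NPROC is charged per thread, so this walks
    /proc/<pid>/task rather than counting only process directories.
    """
    total = 0
    try:
        entries = os.listdir(proc_root)
    except OSError:
        # No /proc here (or it is unreadable): nothing to count.
        return 0
    for name in entries:
        if not name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        pid_dir = os.path.join(proc_root, name)
        try:
            if os.stat(pid_dir).st_uid != uid:
                continue
            total += len(os.listdir(os.path.join(pid_dir, "task")))
        except OSError:
            # The process exited while we were scanning it.
            continue
    return total


def format_limit(value):
    """Render a limit value the way 'ulimit -u' would describe it."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    uid = os.getuid()
    soft, hard = nproc_limits()
    print("uid %d: %d tasks, soft limit %s, hard limit %s"
          % (uid, count_tasks_for_uid(uid),
             format_limit(soft), format_limit(hard)))
```

Run it as the affected user on the node in question (for example from inside a job). If the task count is at or near the soft limit, that would be consistent with sshd getting EAGAIN when it tries to drop privileges to that user.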
