On Mon, 21 Apr 2014 16:04:50 +0000 Prentice Bisbal <prentice.bis...@rutgers.edu> wrote:
> GridEngine Users,
>
> I'm having the following recurring problem with my GridEngine
> installation. The problem started last week. Users reported to me
> that the qrsh command wasn't working. What happens is that after
> running qrsh from the login node, there is a pause and then you get a
> prompt on the login node again, instead of the prompt for an
> execution host, like this:
>
> [xxxx@login ~]$ qrsh
> [xxxx@login ~]$
>
> This seems to be dependent on the execution host assigned to the
> task, and I've determined which hosts are the problems. For example,
> if I shutdown the problematic hosts, qrsh works:
>
> [xxxx@login ~]$ qrsh
> [xxxx@c9n3 ~]$
>
> After one of these qrsh jobs fails, I get the following e-mail:
>
> Job 5326173 caused action: Job 5326173 set to ERROR
> User = xxxx
> Queue =pow1...@yyyy.zzzz
> Start Time = <unknown>
> End Time = <unknown>
> failed assumedly before job:can't get password entry for user "xxxx".
> Either the user does not exist or NIS error!
>
> This error indicates there's something wrong with getting user
> information. However, I can ssh into the problematic execution hosts
> just fine, and when I do a 'getent passwd <username>', I get the
> correct results. I've gone over my PAM configuration, and
> my /etc/nsswitch.conf configuration, but I don't see anything
> obviously wrong. It appears to me that sge_execd is using some other
> mechanism for getting user information that is not configured
> correctly on these hosts.
>
> Any ideas as to what could be wrong? Any suggestions on how to debug
> this?

Are you running nscd on your system? In the past we've had issues with
nscd getting confused and needing a good kicking.

William
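One way to follow up on both the getent check and the nscd question is to resolve the user from a small script on a suspect execution host, run as root, since sge_execd typically runs as root. This is a hypothetical sketch, not GridEngine tooling: the helper names are made up, it assumes sge_execd looks users up through the standard libc getpwnam() path (NSS, plus nscd when it is running), and the socket path assumes a glibc-based Linux host.

```python
#!/usr/bin/env python3
"""Hypothetical helper: check that users resolve via NSS on this host.

Assumption: sge_execd resolves users through the standard libc/NSS
getpwnam() path. Run this on a suspect execution host, as root.
"""
import os
import pwd
import sys

# Assumption: glibc's nscd listens on this socket on glibc-based Linux.
NSCD_SOCKET = "/var/run/nscd/socket"


def lookup(user):
    """Return ("found", uid, home) or ("not found", None, None)."""
    try:
        # Goes through NSS, and through nscd if it is running.
        entry = pwd.getpwnam(user)
    except KeyError:
        # Python raises KeyError both for "no such user" and for a
        # failed NSS lookup, much like the ambiguous GridEngine message.
        return ("not found", None, None)
    return ("found", entry.pw_uid, entry.pw_dir)


def nscd_running():
    """Rough check only: a stale socket file can outlive the daemon."""
    return os.path.exists(NSCD_SOCKET)


if __name__ == "__main__":
    for name in sys.argv[1:]:
        status, uid, home = lookup(name)
        print(f"{name}: {status} uid={uid} home={home}")
    print("nscd socket present:", nscd_running())
```

For example, run `python3 check_user.py xxxx` (script name hypothetical) as root on one of the problematic hosts. If the lookup fails there while getent works elsewhere, or the nscd socket is present, invalidating nscd's passwd cache with `nscd -i passwd` or restarting nscd is a reasonable next step.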
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users