GridEngine Users,
I'm having the following recurring problem with my GridEngine
installation. The problem started last week. Users reported to me that
the qrsh command wasn't working. What happens is that after running qrsh
from the login node, there is a pause and then you get a prompt on the
login node again, instead of the prompt for an execution host, like this:
[xxxx@login ~]$ qrsh
[xxxx@login ~]$
This seems to be dependent on the execution host assigned to the task,
and I've determined which hosts are the problem. For example, if I
shut down the problematic hosts, qrsh works:
[xxxx@login ~]$ qrsh
[xxxx@c9n3 ~]$
After one of these qrsh jobs fails, I get the following e-mail:
Job 5326173 caused action: Job 5326173 set to ERROR
User = xxxx
Queue =pow1...@yyyy.zzzz
Start Time = <unknown>
End Time = <unknown>
failed assumedly before job:can't get password entry for user "xxxx".
Either the user does not exist or NIS error!
This error indicates there's something wrong with getting user
information. However, I can ssh into the problematic execution hosts
just fine, and when I do a 'getent passwd <username>', I get the correct
results. I've gone over my PAM configuration, and my /etc/nsswitch.conf
configuration, but I don't see anything obviously wrong. It appears to
me that sge_execd is using some other mechanism for getting user
information that is not configured correctly on these hosts.
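One way to test that theory is sketched below. It assumes the daemon resolves users through the standard C library passwd interface (the getpwnam family, which goes through NSS); I have not confirmed that against the OGS source. The script name and the check_user helper are hypothetical, made up for illustration. Python's pwd module wraps the same libc calls, so running this as root on a bad host and on a good host should show whether that lookup path itself is failing.

```python
#!/usr/bin/env python
"""Look up users through the C library's passwd interface (NSS).

Assumption: sge_execd resolves users via the getpwnam family.
check_user is a hypothetical helper, not part of Grid Engine.
"""
import pwd
import sys


def check_user(name):
    """Return (True, entry) if the passwd lookup succeeds, else (False, reason)."""
    try:
        entry = pwd.getpwnam(name)
    except KeyError:
        # pwd.getpwnam raises KeyError when no matching entry is found
        return False, "no passwd entry for %r" % name
    return True, entry


if __name__ == "__main__":
    # Usage: python check_user.py xxxx [another_user ...]
    for user in sys.argv[1:]:
        ok, result = check_user(user)
        if ok:
            print("%s: uid=%d gid=%d home=%s shell=%s" % (
                user, result.pw_uid, result.pw_gid,
                result.pw_dir, result.pw_shell))
        else:
            print("%s: %s" % (user, result))
```

If the lookup succeeds here but the job still fails, that would suggest the difference lies in the environment the daemon runs in rather than in the NSS configuration itself.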
Any ideas as to what could be wrong? Any suggestions on how to debug this?
I googled this error message, and the only hit I found with a
solution was my own post here from several years ago, which I don't
think applies in this case:
http://gridengine.org/pipermail/users/2012-February/002734.html
The cluster is using OGS 6.2u5p2:
$ rpm -q ogs
ogs-6.2u5p2-2.x86_64
I know this is an older version, but I'm not really in a position to
upgrade the entire cluster right now. I have users who can't run jobs, so
I need to get the system up and running as quickly as possible.
Thanks for your help.
Prentice