GridEngine Users,

I'm having the following recurring problem with my GridEngine installation. The problem started last week. Users reported to me that the qrsh command wasn't working. What happens is that after running qrsh from the login node, there is a pause and then you get a prompt on the login node again, instead of the prompt for an execution host, like this:

[xxxx@login ~]$ qrsh
[xxxx@login ~]$

This seems to depend on the execution host assigned to the task, and I've determined which hosts are the problem. For example, if I shut down the problematic hosts, qrsh works:

[xxxx@login ~]$ qrsh
[xxxx@c9n3 ~]$

After one of these qrsh jobs fails, I get the following e-mail:

Job 5326173 caused action: Job 5326173 set to ERROR
 User        = xxxx
 Queue       =pow1...@yyyy.zzzz
 Start Time  = <unknown>
 End Time    = <unknown>
failed assumedly before job:can't get password entry for user "xxxx". Either the user does not exist or NIS error!


This error indicates there's something wrong with getting user information. However, I can ssh into the problematic execution hosts just fine, and when I do a 'getent passwd <username>', I get the correct results. I've gone over my PAM configuration, and my /etc/nsswitch.conf configuration, but I don't see anything obviously wrong. It appears to me that sge_execd is using some other mechanism for getting user information that is not configured correctly on these hosts.
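The manual check described above could be scripted roughly as follows. This is only a sketch: the function names check_user and check_user_clean_env are made up for illustration, and it assumes that sge_execd resolves users through the same standard passwd lookup that getent uses, which has not been confirmed here. The second function repeats the lookup with a nearly empty environment, on the assumption that a daemon started at boot sees something closer to that than an interactive ssh session does.

```shell
#!/bin/sh
# Hypothetical helper (not part of GridEngine): report whether a user name
# resolves through the system's passwd lookup in the current environment.
check_user() {
    user="$1"
    if getent passwd "$user" >/dev/null 2>&1; then
        echo "OK: $user resolves"
        return 0
    else
        echo "FAIL: $user does not resolve"
        return 1
    fi
}

# Hypothetical helper: same lookup, but with the environment stripped down
# to a minimal PATH. Assumption: this is closer to a daemon's environment
# than a login shell is. Returns getent's exit status and prints nothing.
check_user_clean_env() {
    env -i PATH=/usr/bin:/bin getent passwd "$1" >/dev/null 2>&1
}
```

Running check_user <username> and check_user_clean_env <username> on a good host and on a problematic host, then comparing the results, would at least show whether the lookup itself differs between the two.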

Any ideas as to what could be wrong? Any suggestions on how to debug this?

I googled this error message, and the only hit with a solution was my own post here from several years ago, which I don't think applies in this case:

http://gridengine.org/pipermail/users/2012-February/002734.html

The cluster is using OGS 6.2u5p2:

$ rpm -q ogs
ogs-6.2u5p2-2.x86_64

I know this is an older version, but I'm not really in a position to upgrade the entire cluster right now. I have users who can't run jobs, so I need to get the system up and running as quickly as possible.

Thanks for your help.

Prentice
