On Mon, 21 Apr 2014 16:04:50 +0000
Prentice Bisbal <prentice.bis...@rutgers.edu> wrote:

> GridEngine Users,
> 
> I'm having the following recurring problem with my GridEngine 
> installation. The problem started last week. Users reported to me
> that the qrsh command wasn't working. What happens is that after
> running qrsh from the login node, there is a pause and then you get a
> prompt on the login node again, instead of the prompt for an
> execution host, like this:
> 
> [xxxx@login ~]$ qrsh
> [xxxx@login ~]$
> 
> This seems to be dependent on the execution host assigned to the
> task, and I've determined which hosts are the problem. For example,
> if I shut down the problematic hosts, qrsh works:
> 
> [xxxx@login ~]$ qrsh
> [xxxx@c9n3 ~]$
> 
> After one of these qrsh jobs fails, I get the following e-mail:
> 
> Job 5326173 caused action: Job 5326173 set to ERROR
>   User        = xxxx
>   Queue       =pow1...@yyyy.zzzz
>   Start Time  = <unknown>
>   End Time    = <unknown>
> failed assumedly before job:can't get password entry for user "xxxx". 
> Either the user does not exist or NIS error!
> 
> 
> This error indicates there's something wrong with getting user 
> information. However, I can ssh into the problematic execution hosts 
> just fine, and when I do a 'getent passwd <username>', I get the
> correct results. I've gone over my PAM configuration, and
> my /etc/nsswitch.conf configuration, but I don't see anything
> obviously wrong. It appears to me that sge_execd is using some other
> mechanism for getting user information that is not configured
> correctly on these hosts.
> 
> Any ideas as to what could be wrong? Any suggestions on how to debug
> this?

Are you running nscd on your system?  In the past we've had issues
with nscd getting confused and needing a good kicking (i.e. a restart).
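
If it helps with debugging, here is a rough sketch of one way to check
whether password lookups on a suspect execution host fail intermittently.
It assumes Python 3 is available on that host; the function name
count_lookup_failures and its parameters are made up for illustration.
It only exercises the standard libc getpwnam() path (which honours
/etc/nsswitch.conf and goes through nscd when nscd is running), and it
is not a statement about how sge_execd itself resolves users.

```python
import pwd
import time


def count_lookup_failures(username, attempts=50, delay=0.1):
    """Call getpwnam() repeatedly and count how often the lookup fails.

    Hypothetical helper for illustration only. A user who really exists
    should give 0 failures; a non-zero count hints that lookups are
    flaky (for example a confused name-service cache).
    """
    failures = 0
    for _ in range(attempts):
        try:
            # pwd.getpwnam wraps libc getpwnam(3) and raises KeyError
            # when no passwd entry is found for the name.
            pwd.getpwnam(username)
        except KeyError:
            failures += 1
        if delay:
            time.sleep(delay)
    return failures
```

Running count_lookup_failures("xxxx") on one of the bad hosts and on a
good host, and comparing the two counts, might show whether the lookups
themselves are unreliable. If they are, restarting nscd or invalidating
its passwd cache with "nscd -i passwd" would be the first thing I'd try.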

William 

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users