Thanks for your reply.  Here is the output I get at the end of the ssh
session before it allows me in:

(a lot of lines above this...)
debug2: channel 0: request shell confirm 0
debug2: fd 3 setting TCP_NODELAY
debug2: callback done
debug2: channel 0: open confirm rwindow 0 rmax 32768
debug2: channel 0: rcvd adjust 131072
Last login: Fri Jul 13 17:53:53 2007 from blah.blah.clemson.edu
do_ypcall: clnt_call: RPC: Timed out

At this point I am logged in, but the output doesn't really tell me
anything other than the window size being used.  Is anyone good at
diagnosing this output?  (The blah.blah domain name is not real, just for
security...)
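
Since the debug output alone does not show how long each step takes, one
way to find the stall is to timestamp every line of "ssh -vvv" and look for
the longest pause.  The following is only a minimal sketch, assuming
Python 3 is available on the head node; the host name comes from the
command line, the remote command "true" is just a placeholder, and the
helper names are made up for illustration.

```python
import subprocess
import sys
import time


def timestamp_lines(lines, clock=time.monotonic):
    """Pair each line with the seconds elapsed since the first line arrived."""
    start = None
    for line in lines:
        now = clock()
        if start is None:
            start = now
        yield now - start, line.rstrip("\n")


def largest_gap(stamped):
    """Return (gap_seconds, line_before, line_after) for the longest pause."""
    best = (0.0, None, None)
    prev_t, prev_line = None, None
    for t, line in stamped:
        if prev_t is not None and t - prev_t > best[0]:
            best = (t - prev_t, prev_line, line)
        prev_t, prev_line = t, line
    return best


def main(host):
    # ssh writes its -vvv debug output to stderr, so merge it into stdout.
    proc = subprocess.Popen(
        ["ssh", "-vvv", host, "true"],  # "true" is a placeholder remote command
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    gap, before, after = largest_gap(timestamp_lines(proc.stdout))
    proc.wait()
    print(f"longest pause: {gap:.1f}s")
    print(f"  pause came after: {before}")
    print(f"  next line was:    {after}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])  # host name of a compute node, supplied by the user
```

Run from the head node with a compute node name as the only argument; the
two lines printed around the longest pause should show which step ssh was
waiting on.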

You might have a good point on the driver/bios issue.  We have the same
computes but have changed the head node.  I will look into this tomorrow. 
But has anyone else seen this type of problem?

(Thanks for your help Michael).

> Try doing "ssh -vvv" from the head to the compute nodes and see where
> it is timing out.
> Cexec just uses ssh, so this should help diagnose the problem.
>
> What network cards and switch are you using?  It is possible that
> there is a driver/bios issue with your networking hardware...
>
> On 7/13/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>> Hello, I have been through the lists and have seen a couple of posts
>> similar to this.  I have tried the resolutions but to no avail.
>>
>> Here is my oscar environment:
>>
>> 1) 1 head node and 22 computes (all running rhel4 u4, using oscar 5.0)
>> 2) oscar installation went well, computes functioning fine
>> 3) head node:  2 nics, eth0 external 130.127.X.X, eth1 172.16.0.100,
>> running NAT for computes traffic outside the network
>> 4) head and computes are NIS and NFS clients
>>
>> Operability:
>>
>> 1) head node and computes can mount nfs drives and ypbind starts for NIS
>> 2) NIS users can log in (although ypbind has started to become
>> sporadic).
>> 3) NIS users can submit jobs to PBS.
>>
>> Situation:
>>
>> 1) Using head node LOCAL root account, ssh'ing into computes starts to
>> get slow after setting up NIS.  CEXEC also is very slow.
>> 2) When logging in to a compute as local root, I get
>>
>> do_ypcall: clnt_call: RPC: Timed out
>>
>> 3) Also get
>>
>> do_ypcall: clnt_call: RPC: Timed out; errno=cannot send
>>
>> Questions:
>>
>> NIS is functional.  Users can log in.  Cexec is basically inoperable,
>> too slow.  Has someone had this problem?  I know I might get an iptables
>> response (NFS and NIS are operational, so packets are going outside the
>> cluster).  Is there an OSCAR setting that might be slowing cexec down?
>> Or is it something else?
>>
>> NOTE:  NO computes have /etc/resolv.conf.  I saw a solution to something
>> like this.  All computes can ping NIS server and NFS servers.
>>
>> Thanks for any help.
>>
>> Vince
>>
>>




_______________________________________________
Oscar-users mailing list
Oscar-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oscar-users
