I remember running into something similar to your problem a long time ago. 
If I remember correctly, our problem was eventually traced to spanning tree 
on our switches. Once we disabled spanning tree, we no longer needed the 
sleep statement before NFS attempted its mounts, because the switches no 
longer paused to check for loops in the network.
 
If you have switches that implement spanning tree, maybe you could try turning 
it off and seeing whether that is what was causing the network issues?
 
--Joe

________________________________

From: Nicolas Triantafillou [mailto:n...@uow.edu.au]
Sent: Wed 3/31/2010 11:43 PM
To: oscar-users@lists.sourceforge.net
Subject: Re: [Oscar-users] /home mount failed, ssh server->node / node->server 
failed



Thank you, Ibad. This certainly pointed me in the right direction, as our
servers all have dual integrated NICs (Dell PowerEdge 1750s).

Unfortunately, the BIOS in these servers can't disable just one of the
integrated NICs; it's both or none. I found an alternative solution on
another website:
http://crazytoon.com/2007/05/11/centos-and-redhat-problem-nfs-mount-at-boot-up-fails-with-error-system-error-no-route-to-host/

For email archive history in case that site goes down, this was the
solution I used:

vi /etc/init.d/netfs
insert: action $"Sleeping for 30 secs: " sleep 30
right after: [ ! -f /var/lock/subsys/portmap ] && service portmap start
and right before: action $"Mounting NFS filesystems: " mount -a -t nfs,nfs4
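
A fixed 30-second sleep works, but it is a guess at how long the network
needs. As a possible alternative, the same spot in /etc/init.d/netfs could
retry a reachability check until the NFS server answers. This is only a
sketch: wait_until and NFS_SERVER are hypothetical names, not part of netfs
or OSCAR.

```shell
# Hypothetical helper (not part of netfs or OSCAR).
# Usage: wait_until MAX_RETRIES COMMAND [ARGS...]
# Runs COMMAND until it succeeds, pausing one second between attempts.
# Returns 0 as soon as COMMAND succeeds, 1 after MAX_RETRIES failed retries.
wait_until() {
    local max_retries=$1
    shift
    local tries=0
    until "$@" >/dev/null 2>&1; do
        if [ "$tries" -ge "$max_retries" ]; then
            return 1
        fi
        sleep 1
        tries=$((tries + 1))
    done
    return 0
}

# Possible use just before the "Mounting NFS filesystems" line, where
# NFS_SERVER is a placeholder for the head node's name or address:
#   wait_until 30 ping -c 1 -W 1 "$NFS_SERVER"
```

Compared with the fixed sleep, this continues as soon as the server is
reachable and still gives up after a bounded number of retries.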

That solves one of our problems. Now to find out why there's no
oscartst user on any of my client machines :)

Cheers,
Nick.

On 31/03/2010 9:13 PM, I.Kureshi U0850037 wrote:
> In our cluster we have found that when the compute nodes have multiple 
> NICs, they often fail to reconnect to the head node after a reboot. This 
> usually happens because the system mixes up which eth device is which: 
> after initializing eth0, it fails at mounting the NFS filesystems. We have 
> bypassed this by editing the ifcfg-eth0 and ifcfg-eth1 files and hardcoding 
> the MAC addresses. This still sometimes doesn't work; the best way is to 
> disable the NIC you are not using.
>
> Hope this helps
>
> Ibad
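
The MAC hardcoding described above is normally done with the HWADDR key in
the interface files on CentOS 5. A minimal sketch for one interface follows;
the MAC address is a placeholder and the DHCP setting is an assumption, so
substitute the values for your own nodes.

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0
# Placeholder values: replace HWADDR with the real MAC of the NIC that
# should always come up as eth0.
DEVICE=eth0
HWADDR=00:11:22:33:44:55
ONBOOT=yes
# Assumption: the node gets its address over DHCP.
BOOTPROTO=dhcp
```

A matching ifcfg-eth1 would carry the second NIC's MAC address in the same
way.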
> ________________________________________
> From: Nicolas Triantafillou [n...@uow.edu.au]
> Sent: Wednesday, March 31, 2010 6:11 AM
> To: oscar-users@lists.sourceforge.net
> Subject: [Oscar-users] /home mount failed, ssh server->node / 
> node->server failed
>
> Hello,
>
> I recently installed OSCAR 6.0.5svn03312010 on CentOS 5.4. (The 'latest
> release' version wasn't working at all so I went to the development
> version).
>
> When I run the 'test_cluster' script at the end of the installation
> wizard, the following happens:
>
> ---
>
> [r...@h-node01 testing]# ./test_cluster
> Performing root tests...
> /home mounts            7 nodes
> failed                [FAILED]
>
> Preparing user tests...
> Performing user tests...
> SSH ping test         [PASSED]
> SSH server->node      [FAILED]
> SSH node->server
> Permission denied, please try again.
> Permission denied, please try again.
> Permission denied (publickey,gssapi-with-mic,password).
> SSH node->server      [FAILED]
>
> ---
>
> 1. The /home mounts are failing on boot due to the error 'no route to
> host', even though /etc/rc3.d/S25netfs is clearly being run after
> /etc/rc3.d/S10network, which succeeds. I moved it to S99netfs and it
> still fails to mount /home on boot. Immediately after booting I can
> manually ssh to the client and mount /home and it works perfectly.
>
> 2. The SSH problem is due to the oscartst user not existing on any of
> the client nodes. The test_cluster script seems to execute useradd only
> on the head node, and only if /home/oscartst doesn't exist. However, it
> does exist on the head node, as does the user, just not on the clients.
>
> Does anyone have any idea how to resolve either of these issues?
>
> Also, I found this in the test_cluster script (while trying to work out
> why $test_user_homedir/oscartestfile disappears even when the unlink
> command is commented out):
>
> # Cleanup before copying base files
> `rm -rf $test_user_homedir/*`;
>
> This looks very dangerous, especially if $test_user_homedir is somehow
> unset. :)
>
> Cheers,
> Nick.
>
> --
> Nick Triantafillou
> Computer Systems Officer
> Faculty of Informatics
> University of Wollongong
>
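
On the `rm -rf $test_user_homedir/*` line quoted above: one common way to
guard that kind of cleanup in shell is to refuse to run when the variable is
empty or does not name a real directory. The sketch below is not taken from
test_cluster; clean_dir is a hypothetical helper, and the same check could be
made before the script shells out to rm.

```shell
# Hypothetical helper (not from test_cluster): empty a directory only when
# the argument is non-empty, is not "/", and names an existing directory.
clean_dir() {
    local dir=$1
    if [ -z "$dir" ] || [ "$dir" = "/" ] || [ ! -d "$dir" ]; then
        echo "clean_dir: refusing to clean '$dir'" >&2
        return 1
    fi
    # ${dir:?} makes the expansion fail if dir is empty, so the
    # command can never turn into "rm -rf /*".
    rm -rf -- "${dir:?}"/*
}
```

Like the original command, the glob leaves dotfiles in place; the only
change is that an unset or bogus directory is rejected instead of expanded.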

------------------------------------------------------------------------------
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
_______________________________________________
Oscar-users mailing list
Oscar-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oscar-users

