On our cluster we have found that compute nodes with multiple NICs often fail
to reconnect to the head node after a reboot. This usually happens because the
node mixes up which interface is eth0 and which is eth1: it initializes eth0
and then fails at mounting the NFS file system. We have worked around this by
editing the ifcfg-eth0 and ifcfg-eth1 files and hardcoding the MAC addresses.
Even that sometimes doesn't work. The most reliable fix is to disable the NIC
you are not using.
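
Here is a rough sketch of what those ifcfg entries look like. The make_ifcfg helper and both MAC addresses are made up for illustration. DEVICE, HWADDR and ONBOOT are the standard keys in /etc/sysconfig/network-scripts/ifcfg-* on CentOS 5. I have left out BOOTPROTO and any addressing keys, so keep whatever your existing files already have for those.

```shell
# Hypothetical helper (not part of OSCAR): print an ifcfg stanza that pins
# a device name to one MAC address.
#   $1 = device name, $2 = MAC address, $3 = yes/no for ONBOOT
make_ifcfg() {
    dev="$1"; mac="$2"; onboot="$3"
    printf 'DEVICE=%s\nHWADDR=%s\nONBOOT=%s\n' "$dev" "$mac" "$onboot"
}

# Example with made-up MAC addresses: keep eth0, disable the unused eth1.
make_ifcfg eth0 00:11:22:33:44:55 yes
make_ifcfg eth1 00:11:22:33:44:66 no
```

On a running node you can read the real MAC of each interface with cat /sys/class/net/eth0/address (and the same for eth1) before writing it into the file.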

Hope this helps

Ibad
________________________________________
From: Nicolas Triantafillou [n...@uow.edu.au]
Sent: Wednesday, March 31, 2010 6:11 AM
To: oscar-users@lists.sourceforge.net
Subject: [Oscar-users] /home mount failed, ssh server->node / node->server failed

Hello,

I recently installed OSCAR 6.0.5svn03312010 on CentOS 5.4. (The 'latest
release' version wasn't working at all so I went to the development
version).

I ran the 'test_cluster' script at the end of the installation wizard and
the following is happening:

---

[r...@h-node01 testing]# ./test_cluster
Performing root tests...
/home mounts            7 nodes
failed                [FAILED]

Preparing user tests...
Performing user tests...
SSH ping test         [PASSED]
SSH server->node      [FAILED]
SSH node->server
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,gssapi-with-mic,password).
SSH node->server      [FAILED]

---

1. The /home mounts are failing on boot due to the error 'no route to
host', even though /etc/rc3.d/S25netfs is clearly being run after
/etc/rc3.d/S10network, which succeeds. I moved it to S99netfs and it
still fails to mount /home on boot. Immediately after booting I can
manually ssh to the client and mount /home and it works perfectly.

2. The SSH problem is due to the oscartst user not existing on any of
the client nodes. The test_cluster script seems to execute useradd only
on the head node, and only if /home/oscartst doesn't exist. However, it
does exist on the head node, as does the user; the user is just missing
on the clients.
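
One stopgap I am considering for the first issue, purely as an assumption and not a known fix, is to retry the mount for a while after boot instead of attempting it once. The 'retry' helper below is hypothetical (it is not part of OSCAR or CentOS); it reruns any command until it succeeds or the attempts run out.

```shell
# Hypothetical helper, not part of OSCAR: run a command up to TRIES times,
# sleeping DELAY seconds between failed attempts.
# Usage: retry TRIES DELAY COMMAND [ARGS...]
retry() {
    tries="$1"; delay="$2"; shift 2
    attempt=1
    while :; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        # Give up once the allowed number of attempts is used.
        if [ "$attempt" -ge "$tries" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

Something like 'retry 30 2 mount /home' from /etc/rc.local would be one way to try it, assuming /home is listed in /etc/fstab on the clients. It would only hide the 'no route to host' error, not explain it.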

Does anyone have any idea how to resolve either of these issues?

Also, I found this in the test_cluster script (while trying to work out
why $test_user_homedir/oscartestfile disappears even when the unlink
command is commented out):

# Cleanup before copying base files
`rm -rf $test_user_homedir/*`;

This looks very dangerous, especially if $test_user_homedir is somehow
unset. :)
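
To illustrate the kind of guard I mean, here is a minimal sketch written as plain shell rather than the script's own Perl. The 'clean_test_home' function is hypothetical and not taken from test_cluster; it just refuses to run when the directory argument is empty, is '/', or is not a directory.

```shell
# Hypothetical shell equivalent of the cleanup step; clean_test_home is
# not part of test_cluster.
clean_test_home() {
    target="$1"
    # Refuse to run if the directory is unset, empty, or the filesystem root.
    if [ -z "$target" ] || [ "$target" = "/" ]; then
        echo "refusing to clean: '$target'" >&2
        return 1
    fi
    # Refuse anything that is not an existing directory.
    if [ ! -d "$target" ]; then
        echo "not a directory: $target" >&2
        return 1
    fi
    # ${target:?} is a second line of defence: the expansion fails
    # instead of collapsing to "/*" if the variable is empty.
    rm -rf "${target:?}"/*
}
```

With a check like this, an unset $test_user_homedir produces an error message instead of an 'rm -rf /*'.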

Cheers,
Nick.

--
Nick Triantafillou
Computer Systems Officer
Faculty of Informatics
University of Wollongong

------------------------------------------------------------------------------
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
_______________________________________________
Oscar-users mailing list
Oscar-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oscar-users


