In our cluster we have found that compute nodes with multiple NICs often fail to reconnect to the head node after a reboot. This usually happens because the node mixes up which eth device is which: after initializing eth0, it fails at "Mounting NFS file system". We have bypassed this by editing the ifcfg-eth0 and ifcfg-eth1 files and hardcoding the MAC addresses. This still sometimes doesn't work. The best way is to disable the NIC you are not using.
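As a rough sketch of that workaround, assuming the usual CentOS layout under /etc/sysconfig/network-scripts: the helper name render_ifcfg and both MAC addresses below are made up for illustration, and a real file will need whatever other settings (BOOTPROTO and so on) your nodes already use. HWADDR ties the config to one physical card, and ONBOOT=no keeps the unused NIC down at boot.

```shell
#!/bin/sh
# Print a minimal ifcfg-style config for one interface.
# Usage: render_ifcfg DEVICE MAC ONBOOT
# render_ifcfg is a hypothetical helper; the MAC addresses are placeholders.
render_ifcfg() {
    dev=$1
    mac=$2
    onboot=$3
    printf 'DEVICE=%s\n' "$dev"     # interface name the config applies to
    printf 'HWADDR=%s\n' "$mac"     # pin the name to this physical card
    printf 'ONBOOT=%s\n' "$onboot"  # "no" leaves the NIC down at boot
}

# Print both configs to stdout for review before writing any files.
render_ifcfg eth0 00:11:22:33:44:55 yes
render_ifcfg eth1 00:11:22:33:44:66 no
```

Once the output looks right, it can be redirected into ifcfg-eth0 and ifcfg-eth1 on the node, with the real addresses taken from ifconfig -a.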
Hope this helps

Ibad
________________________________________
From: Nicolas Triantafillou [n...@uow.edu.au]
Sent: Wednesday, March 31, 2010 6:11 AM
To: oscar-users@lists.sourceforge.net
Subject: [Oscar-users] /home mount failed, ssh server->node / node->server failed

Hello,

I recently installed OSCAR 6.0.5svn03312010 on CentOS 5.4. (The 'latest release' version wasn't working at all so I went to the development version). I ran the 'test_cluster' script at the end of the installation wizard and the following is happening:

---
[r...@h-node01 testing]# ./test_cluster
Performing root tests...
/home mounts 7 nodes failed [FAILED]
Preparing user tests...
Performing user tests...
SSH ping test [PASSED]
SSH server->node [FAILED]
SSH node->server Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,gssapi-with-mic,password).
SSH node->server [FAILED]
---

1. The /home mounts are failing on boot due to the error 'no route to host', even though /etc/rc3.d/S25netfs is clearly being run after /etc/rc3.d/S10network, which succeeds. I moved it to S99netfs and it still fails to mount /home on boot. Immediately after booting I can manually ssh to the client and mount /home and it works perfectly.

2. The SSH problem is due to the oscartst user not existing on any of the client nodes. The test_cluster script seems to be trying to execute useradd only on the head node if /home/oscartst doesn't exist, however it does exist on the head node, as does the user, just not the clients.

Does anyone have any idea how to resolve either of these issues?

Also, I found this in the test_cluster script (while trying to work out why $test_user_homedir/oscartestfile disappears even when the unlink command is commented out):

    # Cleanup before copying base files
    `rm -rf $test_user_homedir/*`;

This looks very dangerous, especially if $test_user_homedir is somehow unset. :)

Cheers,
Nick.
-- 
Nick Triantafillou
Computer Systems Officer
Faculty of Informatics
University of Wollongong

_______________________________________________
Oscar-users mailing list
Oscar-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oscar-users

---
This transmission is confidential and may be legally privileged. If you receive it in error, please notify us immediately by e-mail and remove it from your system. If the content of this e-mail does not relate to the business of the University of Huddersfield, then we do not endorse it and will accept no liability.
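
On the rm -rf concern raised in the quoted message: one possible guard, written here as plain shell rather than the Perl that test_cluster itself uses, is to refuse to run when the directory variable is empty or points at the filesystem root. The function name safe_clean is hypothetical and the checks are not exhaustive; this is only a sketch of the idea, not a patch to the actual script.

```shell
#!/bin/sh
# safe_clean is a hypothetical helper, not part of test_cluster.
# Usage: safe_clean DIRECTORY
safe_clean() {
    dir=$1
    # Refuse an empty value or the filesystem root outright.
    case "$dir" in
        ''|/) echo "refusing to clean '$dir'" >&2; return 1 ;;
    esac
    # Refuse anything that is not an existing directory.
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    # ${dir:?} makes the shell abort the command if dir is empty here,
    # so the glob can never expand to a bare "/*".
    rm -rf -- "${dir:?}"/*
}
```

Called as safe_clean "$test_user_homedir", an unset variable produces an error message and a non-zero return instead of a delete. Like the original command, it leaves dotfiles in the directory alone.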