I did a clean install again and have run into exactly the same problem at step 8. The /home mount test scrolls through all 31 nodes and fails. I get no errors whatsoever in the earlier steps. This time, however, in the subsequent tests I am asked for the oscartst user's password as many times as there are nodes.

It looks like a very minor problem, but we don't know how to fix it. Unfortunately, we are non-professional Linux users and are trying to get the cluster working for teaching bioinformatics.

1. I have installed Fedora Core 4 as 'workstation'. Should I go for a custom install?
2. I have also checked 'network' in the package selection.
3. I have selected MPICH instead of LAM.

Could someone please suggest some tests to check whether NFS is working properly? Should we change to Fedora Core 5 or 6, or some other distribution?

By the way, how is the whole cluster shut down? Currently, I ssh into every node and issue halt -p :)

Still trying and haven't lost hope yet,
lutfullah

On 4/24/07, Michael Edwards <[EMAIL PROTECTED]> wrote:
> The home directories should contain the same files as the head node, which
> to start with is probably just a bunch of .whatever files.
>
> I have had problems with the passwordless ssh when I was fiddling around
> with getting LDAP authentication set up, so you may have changed something
> which fixed one problem but is now causing this one. Adding things to
> hosts.allow should not be necessary for instance, though as far as I know it
> shouldn't cause problems either.
>
> After you were having set up problems, did you go back and start with a
> clean OS install again? It shouldn't take nearly as long the second time :)
>
> On 4/24/07, Dr. Lutfullah <[EMAIL PROTECTED]> wrote:
> > /etc/hosts is like this on all nodes:
> >
> > # Do not remove the following line, or various programs
> > # that require network functionality will fail.
> > 127.0.0.1 localhost.localdomain localhost
> > 192.168.0.100 cc32.kust.edu.pk cc32 oscar_server nfs_oscar pbs_oscar
> >
> > # These entries are managed by SIS, please don't modify them.
> > 192.168.0.1 oscarnode1.kust.edu.pk oscarnode1
> > 192.168.0.2 oscarnode2.kust.edu.pk oscarnode2
> > 192.168.0.3 oscarnode3.kust.edu.pk oscarnode3
> > 192.168.0.4 oscarnode4.kust.edu.pk oscarnode4
> > 192.168.0.5 oscarnode5.kust.edu.pk oscarnode5
> > 192.168.0.6 oscarnode6.kust.edu.pk oscarnode6
> > 192.168.0.7 oscarnode7.kust.edu.pk oscarnode7
> > 192.168.0.8 oscarnode8.kust.edu.pk oscarnode8
> > 192.168.0.9 oscarnode9.kust.edu.pk oscarnode9
> > 192.168.0.10 oscarnode10.kust.edu.pk oscarnode10
> > --------------------- rest deleted---------------
> >
> > lutfullah
> >
> > On 4/24/07, Michael Edwards <[EMAIL PROTECTED]> wrote:
> > > Try rebooting the head node, waiting until it is completely booted, and
> > > then rebooting all the client nodes. If everything is set up right, that
> > > might fix the problem. You can check if it is working by sshing to the
> > > compute node and doing "ls /home" and comparing it to the head node.
> > >
> > > Check /etc/exports and make sure /home is exported to the internal
> > > network, and the fstab in the image and nodes to make sure it is being
> > > mounted.
> > >
> > > Another common problem is an incorrect /etc/hosts file, see if it is
> > > something like
> > > 127.0.0.1 localhost.localdomain localhost
> > > 10.0.0.1 oscarmaster.oscardomain oscarmaster nfs_oscar pbs_oscar
> > >
> > > If the cluster hostname is on the same line as localhost, that will
> > > cause this kind of problem because the nodes will be trying to contact
> > > themselves instead of the head node for nfs mounts.
> > >
> > > Not much will work without nfs working unfortunately.
> > >
> > > On 4/23/07, Dr. Lutfullah <[EMAIL PROTECTED]> wrote:
> > > > Thanks a lot. There are 15 log files. I am attaching two.
> > > >
> > > > lutfullah
> > > >
> > > > On 4/23/07, Michael Edwards <[EMAIL PROTECTED]> wrote:
> > > > > Please send your oscarinstall.log file.
> > > > >
> > > > > On 4/23/07, Dr. Lutfullah <[EMAIL PROTECTED]> wrote:
> > > > > > Hello,
> > > > > >
> > > > > > I am using fedora core 4 and trying to install oscar on a cluster
> > > > > > with 32 HP P4 computers.
> > > > > > Everything has gone well except for the last step in which I get
> > > > > > tests FAILED errors.
> > > > > > Something like:
> > > > > > /home mounts 31 nodes FAILED
> > > > > > SSH ping test PASSED
> > > > > > SSH server -> node FAILED
> > > > > > SSH node -> server FAILED
> > > > > > TORQUE shell test
> > > > > > goes into a loop and produces
> > > > > > Checking for NFS propagation of MPI preferences not yet
> > > > > > this message then keeps on appearing.
> > > > > > Could anyone please help.
> > > > > > This is our first experience with a cluster.
> > > > > >
> > > > > > lutfullah

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
Oscar-users mailing list
Oscar-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oscar-users
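
The /etc/hosts pitfall described in the thread (the head node's name sitting on the same line as localhost, so nodes try to contact themselves for NFS mounts) can be checked mechanically. The following is only a minimal sketch under stated assumptions: the function names `loopback_names` and `hostname_on_loopback` are hypothetical helpers, not part of OSCAR; `GOOD_HOSTS` is adapted from the file posted in the thread, and `BAD_HOSTS` is a made-up example of the misconfiguration.

```python
# Sketch: detect a cluster hostname listed on a loopback line of a hosts file.
# All names here are illustrative; nothing below is an OSCAR API.

def loopback_names(hosts_text):
    """Return every name the hosts-file text maps to a loopback address."""
    names = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue  # skip blank and comment-only lines
        fields = line.split()
        address, aliases = fields[0], fields[1:]
        if address.startswith("127.") or address == "::1":
            names.extend(aliases)
    return names


def hostname_on_loopback(hosts_text, hostname):
    """True if hostname (short or fully qualified) appears on a loopback line."""
    short = hostname.split(".", 1)[0]
    return any(
        name == hostname or name.split(".", 1)[0] == short
        for name in loopback_names(hosts_text)
    )


# Adapted from the /etc/hosts posted in the thread: head node on its own line.
GOOD_HOSTS = """\
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
192.168.0.100 cc32.kust.edu.pk cc32 oscar_server nfs_oscar pbs_oscar
"""

# Made-up example of the misconfiguration: head node name on the localhost line.
BAD_HOSTS = """\
127.0.0.1 cc32.kust.edu.pk cc32 localhost.localdomain localhost
"""

print(hostname_on_loopback(GOOD_HOSTS, "cc32"))  # False
print(hostname_on_loopback(BAD_HOSTS, "cc32"))   # True
```

To use it on a real machine, one would pass in the contents of /etc/hosts and the head node's name. By this check the file posted in the thread looks fine, which suggests looking next at /etc/exports and the fstab on the nodes, as suggested in the quoted reply.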