The MPI implementation has nothing to do with NFS mounts or passwordless ssh. If
you ssh onto the nodes and /home is not mounted, then NFS is not working.
Things to do:
1. Check /var/log/messages on the client and the server for NFS-related errors.
2. Is the firewall turned off on the head node? On the clients? On the switch?
   All internal traffic should be allowed.
3. Check whether /home is listed in /etc/fstab on the compute nodes.
4. Make sure /home is listed in /etc/exports on the head node and is exported
   to the internal cluster network.
5. Check whether you can ssh to nfs_oscar from the compute nodes.
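For the log check, a quick filter over /var/log/messages can help. This is a minimal Python sketch, not an OSCAR tool: the keyword list in `NFS_PATTERN` is an assumption about what NFS-related syslog lines typically contain, and `nfs_log_lines` is a hypothetical helper name. Reading /var/log/messages normally requires root.

```python
import re

# Assumed keyword list: NFS client/server, mount and RPC daemons.
# Adjust it to whatever your logs actually contain.
NFS_PATTERN = re.compile(
    r"\b(nfs\w*|mountd?|portmap|rpc[\w.]*|statd|lockd)\b", re.IGNORECASE
)


def nfs_log_lines(log_text: str) -> list[str]:
    """Return the lines of a syslog-style text that mention NFS-related terms."""
    return [line for line in log_text.splitlines() if NFS_PATTERN.search(line)]


if __name__ == "__main__":
    try:
        # errors="replace" avoids crashing on stray non-UTF-8 bytes in the log.
        with open("/var/log/messages", errors="replace") as f:
            for line in nfs_log_lines(f.read()):
                print(line)
    except OSError as err:  # missing file or not running as root
        print(f"could not read log: {err}")
```

Run it on both the head node and a compute node, since either side may be the one logging the failure.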
As for doing things on all the nodes at the same time, that is what C3, one of
the nicer packages included in OSCAR, is for. It lets you do all sorts of useful
things. Here is a list of some of the included cluster commands; please refer to
the man pages for syntax:
http://svn.oscar.openclustergroup.org/wiki/oscar:5.0:administration_guide:ch4.3.1_c3_overview
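C3's own commands (for example `cexec`) are the intended way to do this; check their man pages for the exact syntax. Purely to illustrate what such a tool does, here is a minimal Python sketch that loops over the nodes with plain ssh. It assumes the `oscarnodeN` hostnames from the /etc/hosts file quoted further down; `node_names` and `run_on_all` are hypothetical helpers. `BatchMode=yes` is a standard OpenSSH option that makes ssh fail instead of prompting for a password, so nodes where passwordless ssh is broken show up as failures rather than stalling the loop.

```python
import subprocess


def node_names(count: int, prefix: str = "oscarnode") -> list[str]:
    """Hostnames oscarnode1..oscarnodeN, matching the SIS-managed /etc/hosts entries."""
    return [f"{prefix}{i}" for i in range(1, count + 1)]


def run_on_all(nodes, command, dry_run=True):
    """Run `command` on each node over ssh, one node at a time.

    Returns (node, argv, returncode) tuples. With dry_run=True (the default)
    nothing is executed and returncode is None.
    """
    results = []
    for node in nodes:
        # BatchMode=yes: fail instead of prompting for a password.
        argv = ["ssh", "-o", "BatchMode=yes", node, command]
        if dry_run:
            results.append((node, argv, None))
            continue
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append((node, argv, proc.returncode))
    return results


if __name__ == "__main__":
    # Print what would run; pass dry_run=False to actually execute it.
    for node, argv, _ in run_on_all(node_names(31), "ls /home"):
        print(" ".join(argv))
```

The same loop with "halt -p" as the command (run as root) would cover the shutdown question asked later in this thread, though the C3 tools are the better fit once they work.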
On 4/26/07, Lutfullah Kakakhel <[EMAIL PROTECTED]> wrote:
I did a clean install again and have run into exactly the same problem
at step 8.
The /home mount test scrolls through all 31 nodes and fails.
I get no errors whatsoever in the earlier steps.
This time, however, I am asked for the oscartst user's password as
many times as there are nodes in the subsequent tests.
It looks like a very minor problem, but we don't know how to fix it.
Unfortunately, we are not professional Linux users; we are trying to
get the cluster working for teaching bioinformatics.
1. I installed Fedora Core 4 as 'workstation'. Should I go for a
custom install?
2. I also checked 'network' in the package selection.
3. I selected MPICH instead of LAM.
Could someone please suggest some tests to check whether NFS is working
properly?
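One simple test, in line with the reply at the top, is to log in to a compute node and check whether /home is actually an NFS mount there. A minimal Python sketch, assuming a Linux node that exposes /proc/mounts; `home_nfs_source` is a hypothetical helper name.

```python
import os


def home_nfs_source(mounts_text: str):
    """Return the source of the NFS filesystem mounted on /home, or None.

    Expects /proc/mounts format: device mountpoint fstype options dump pass.
    If /home appears more than once, the last (topmost) entry wins.
    """
    source = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == "/home":
            source = fields[0] if fields[2].startswith("nfs") else None
    return source


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            src = home_nfs_source(f.read())
        print(src if src else "/home is NOT an NFS mount on this node")
```

On a working node this should print something naming the head node's export; on a broken one it reports that /home is not an NFS mount.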
Should we change to Fedora Core 5 or 6, or to some other distribution?
By the way, how do you shut down the whole cluster?
Currently, I ssh into every node and issue halt -p :)
Still trying and haven't lost hope yet,
lutfullah
On 4/24/07, Michael Edwards <[EMAIL PROTECTED]> wrote:
> The home directories should contain the same files as on the head node,
> which to start with is probably just a bunch of .whatever files.
>
> I have had problems with passwordless ssh when I was fiddling around
> with getting LDAP authentication set up, so you may have changed something
> that fixed one problem but is now causing this one. Adding things to
> hosts.allow should not be necessary, for instance, though as far as I know
> it shouldn't cause problems either.
>
> After you ran into setup problems, did you go back and start with a
> clean OS install again? It shouldn't take nearly as long the second time :)
>
>
> On 4/24/07, Dr. Lutfullah <[EMAIL PROTECTED]> wrote:
> > /etc/hosts is like this on all nodes:
> >
> > # Do not remove the following line, or various programs
> > # that require network functionality will fail.
> > 127.0.0.1 localhost.localdomain localhost
> > 192.168.0.100 cc32.kust.edu.pk cc32 oscar_server nfs_oscar pbs_oscar
> >
> > # These entries are managed by SIS, please don't modify them.
> > 192.168.0.1 oscarnode1.kust.edu.pk oscarnode1
> > 192.168.0.2 oscarnode2.kust.edu.pk oscarnode2
> > 192.168.0.3 oscarnode3.kust.edu.pk oscarnode3
> > 192.168.0.4 oscarnode4.kust.edu.pk oscarnode4
> > 192.168.0.5 oscarnode5.kust.edu.pk oscarnode5
> > 192.168.0.6 oscarnode6.kust.edu.pk oscarnode6
> > 192.168.0.7 oscarnode7.kust.edu.pk oscarnode7
> > 192.168.0.8 oscarnode8.kust.edu.pk oscarnode8
> > 192.168.0.9 oscarnode9.kust.edu.pk oscarnode9
> > 192.168.0.10 oscarnode10.kust.edu.pk oscarnode10
> > --------------------- rest deleted---------------
> >
> > lutfullah
> >
> > On 4/24/07, Michael Edwards <[EMAIL PROTECTED]> wrote:
> > > Try rebooting the head node, waiting until it is completely booted, and
> > > then rebooting all the client nodes. If everything is set up right, that
> > > might fix the problem. You can check whether it is working by sshing to a
> > > compute node, running "ls /home", and comparing the output to the head node.
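The comparison described above can be scripted once you have both `ls /home` outputs (for example, captured over ssh). A minimal Python sketch; `compare_listings` is a hypothetical helper, and splitting on whitespace assumes directory names without spaces, which holds for ordinary usernames. If /home is not mounted, the node's listing will typically be empty or stale, so user directories show up as missing.

```python
def compare_listings(head_ls: str, node_ls: str) -> tuple[set[str], set[str]]:
    """Compare two `ls /home` outputs.

    Returns (missing_on_node, extra_on_node). Names are split on whitespace,
    so directory names containing spaces are not supported.
    """
    head = set(head_ls.split())
    node = set(node_ls.split())
    return head - node, node - head


if __name__ == "__main__":
    # Example: the node shows nothing under /home, as when the NFS mount failed.
    missing, extra = compare_listings("oscartst\n", "")
    print(sorted(missing), sorted(extra))  # ['oscartst'] []
```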
> > >
> > > Check /etc/exports and make sure /home is exported to the internal
> > > network, and check the fstab in the image and on the nodes to make sure
> > > /home is being mounted.
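Both of these file checks can be sketched in Python against the text of the files. This is a minimal sketch under stated assumptions: the internal network is 192.168.0.0/24 (inferred from the /etc/hosts file posted in this thread), the sample exports and fstab lines are hypothetical, and the helper names are made up. The exports parser only understands IP or CIDR client specifications; hostnames, wildcards and netgroups are skipped.

```python
import ipaddress


def home_exported_to(exports_text: str, network: str) -> bool:
    """True if /etc/exports text exports /home to a client spec covering `network`."""
    target = ipaddress.ip_network(network)
    for raw in exports_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        path, *clients = line.split()
        if path != "/home":
            continue
        for client in clients:
            host = client.split("(", 1)[0]  # strip "(rw,sync)"-style options
            try:
                allowed = ipaddress.ip_network(host, strict=False)
            except ValueError:
                continue  # hostname, wildcard or netgroup: not handled here
            if allowed.version == target.version and target.subnet_of(allowed):
                return True
    return False


def fstab_mounts_home(fstab_text: str) -> bool:
    """True if fstab text contains an NFS entry whose mount point is /home."""
    for raw in fstab_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 3 and fields[1] == "/home" and fields[2].startswith("nfs"):
            return True
    return False


if __name__ == "__main__":
    exports = "/home 192.168.0.0/255.255.255.0(rw,sync,no_root_squash)\n"  # hypothetical
    fstab = "nfs_oscar:/home /home nfs rw 0 0\n"  # hypothetical
    print(home_exported_to(exports, "192.168.0.0/24"))  # True
    print(fstab_mounts_home(fstab))  # True
```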
> > >
> > > Another common problem is an incorrect /etc/hosts file; check that it
> > > looks something like
> > > 127.0.0.1 localhost.localdomain localhost
> > > 10.0.0.1 oscarmaster.oscardomain oscarmaster nfs_oscar pbs_oscar
> > >
> > > If the cluster hostname is on the same line as localhost, that will cause
> > > this kind of problem, because the nodes will try to contact themselves
> > > instead of the head node for NFS mounts.
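The /etc/hosts check described above can be sketched in Python: find the address a name such as nfs_oscar maps to and flag it if that address is a loopback address. Assumptions: the first matching line wins, as with the usual files lookup, and the helper names are hypothetical. As far as the posted portion goes, the /etc/hosts shared in this thread maps nfs_oscar to 192.168.0.100, so it does not show this particular problem.

```python
import ipaddress


def resolve_alias(hosts_text: str, name: str):
    """Return the address on the first /etc/hosts-style line listing `name`, or None."""
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2 and name in fields[1:]:
            return fields[0]
    return None


def points_to_loopback(hosts_text: str, name: str) -> bool:
    """True if `name` resolves (via the hosts text) to a loopback address."""
    addr = resolve_alias(hosts_text, name)
    return addr is not None and ipaddress.ip_address(addr).is_loopback


if __name__ == "__main__":
    posted = (
        "127.0.0.1 localhost.localdomain localhost\n"
        "192.168.0.100 cc32.kust.edu.pk cc32 oscar_server nfs_oscar pbs_oscar\n"
    )
    print(resolve_alias(posted, "nfs_oscar"))  # 192.168.0.100
    print(points_to_loopback(posted, "nfs_oscar"))  # False
```

Running the same check for the head node's own hostname (cc32 here) covers the "hostname on the localhost line" case as well.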
> > >
> > > Unfortunately, not much will work without NFS.
> > >
> > >
> > > On 4/23/07, Dr. Lutfullah <[EMAIL PROTECTED]> wrote:
> > > > Thanks a lot. There are 15 log files. I am attaching two.
> > > >
> > > > lutfullah
> > > >
> > > > On 4/23/07, Michael Edwards <[EMAIL PROTECTED]> wrote:
> > > > > Please send your oscarinstall.log file.
> > > > >
> > > > >
> > > > > On 4/23/07, Dr. Lutfullah <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > I am using Fedora Core 4 and trying to install OSCAR on a cluster
> > > > > > of 32 HP P4 computers.
> > > > > > Everything has gone well except for the last step, in which some
> > > > > > tests report FAILED.
> > > > > > Something like:
> > > > > > /home mounts 31 nodes FAILED
> > > > > > SSH ping test PASSED
> > > > > > SSH server -> node FAILED
> > > > > > SSH node -> server FAILED
> > > > > > The TORQUE shell test goes into a loop and produces
> > > > > > Checking for NFS propagation of MPI preferences not yet
> > > > > > and this message then keeps on appearing.
> > > > > > Could anyone please help?
> > > > > > This is our first experience with a cluster.
> > > > > >
> > > > > > lutfullah
> > > > > >
> > > > > >
> > > > > > -------------------------------------------------------------------------
> > > > > > This SF.net email is sponsored by DB2 Express
> > > > > > Download DB2 Express C - the FREE version of DB2 express and take
> > > > > > control of your XML. No limits. Just data. Click to get it now.
> > > > > > http://sourceforge.net/powerbar/db2/
> > > > > > _______________________________________________
> > > > > > Oscar-users mailing list
> > > > > > Oscar-users@lists.sourceforge.net
> > > > > > https://lists.sourceforge.net/lists/listinfo/oscar-users