Hi,

We have traced the problem to the Cisco Catalyst 2950 switch. We
replaced it with two ordinary 16-port 10/100 switches and also
switched to Fedora Core 5. It looks like the NFS problem is resolved.
We are, however, getting two errors on the Test Cluster Setup:
1. MPICH (via TORQUE) FAILED
2. Checking for nn free nodes FAILED
    Not enough free nodes. Tests incomplete
All the other tests have PASSED.
An earlier test installation with only two nodes passed all the tests.
We are still trying to figure out how to configure the Cisco switch (it
was in its default configuration when we used it).
Any reason for the MPICH failure? There are no .err or .out files in
the directory.

Thanks in advance,

Lutfullah



On 4/26/07, Michael Edwards <[EMAIL PROTECTED]> wrote:
> The MPI implementation has nothing to do with NFS mounts or passwordless ssh.  If
> you ssh onto the nodes and /home is not mounted, then NFS is not working.
>
> Things to do:
> Check /var/log/messages on the client and server for nfs related errors.
> Do you have the firewall on the head node turned off?  Clients?  Switch?
> All internal traffic should be allowed.
> Check and see if /home is in /etc/fstab on the compute nodes.
> Check and make sure /home is in /etc/exports on the head node and refers to
> the internal cluster network.
> Check and see if you can ssh to nfs_oscar from the compute nodes.
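The /etc/exports and /etc/fstab checks in the list above can be scripted. What follows is only a minimal sketch: the function names and the sample file contents are assumptions made up for illustration, not taken from an actual OSCAR install. On a real cluster the text would be read from /etc/exports on the head node and /etc/fstab on a compute node.

```python
def exported_clients(exports_text, path="/home"):
    """Return the client specs that `path` is exported to, or [] if it is not exported."""
    for raw in exports_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        if fields and fields[0] == path:
            return fields[1:]
    return []


def nfs_source_for(fstab_text, mount_point="/home"):
    """Return the NFS source (server:/path) mounted at `mount_point`, or None."""
    for raw in fstab_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        # fstab columns: device, mount point, filesystem type, options, dump, pass
        if len(fields) >= 3 and fields[1] == mount_point and fields[2].startswith("nfs"):
            return fields[0]
    return None


# Hypothetical sample contents, for illustration only.
SAMPLE_EXPORTS = "/home 192.168.0.0/255.255.255.0(rw,no_root_squash)\n"
SAMPLE_FSTAB = "/dev/hda1 / ext3 defaults 1 1\nnfs_oscar:/home /home nfs rw 0 0\n"

print(exported_clients(SAMPLE_EXPORTS))
print(nfs_source_for(SAMPLE_FSTAB))
```

An empty list from the first function, or None from the second, would point at the export or the mount entry being missing. Neither function checks that the exported network really is the internal cluster network.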
>
> As for doing things on all the nodes at the same time, that is what c3, one
> of the nicer packages included in OSCAR, is for.  It lets you do all sorts of
> nice things.  Here is a list of some of the included cluster commands;
> please refer to the man pages for syntax:
> http://svn.oscar.openclustergroup.org/wiki/oscar:5.0:administration_guide:ch4.3.1_c3_overview
>
>
> On 4/26/07, Lutfullah Kakakhel <[EMAIL PROTECTED]> wrote:
> >
> > I did a clean install again and have run into exactly the same problem
> > at step-8.
> > The /home mount scrolls on all 31 nodes and fails in the test.
> > I get no errors whatsoever in the first steps.
> > This time, however, I am asked for the password of the oscartst user as
> > many times as there are nodes in the subsequent tests.
> > It looks like a very minor problem but we don't know how to fix it.
> > Unfortunately, we are not professional Linux users and are trying to
> > fix the cluster for teaching bioinformatics.
> > 1. I have installed Fedora Core 4 as 'workstation'. Should I go for a
> > custom install?
> > 2. I have also checked 'network' in the selection of packages.
> > 3. I have selected MPICH instead of LAM.
> > Could someone please suggest some tests to check if NFS is working
> > properly?
> > Should we change to Fedora Core 5 or 6 or some other distribution?
> > By the way, how is the whole cluster shut down?
> > Currently, I ssh into every node and issue halt -p :)
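The manual approach described here (ssh into each node and run halt -p) can be generated in a loop. This is a rough sketch only: the helper name is hypothetical, and the oscarnodeN naming and the count of 31 compute nodes are assumed from elsewhere in this thread. It is a dry run that prints the commands without executing anything. The c3 tools that ship with OSCAR are meant for this kind of job.

```python
def halt_commands(node_count, prefix="oscarnode"):
    """Build one `ssh <node> halt -p` argument list per compute node."""
    return [["ssh", f"{prefix}{n}", "halt", "-p"] for n in range(1, node_count + 1)]


# Dry run: print the commands instead of running them.
for argv in halt_commands(31):
    print(" ".join(argv))
```

Each argument list could be passed to subprocess.run once the printed commands look right. The head node is deliberately left out and would be halted last.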
> >
> > Still trying and haven't lost hope yet,
> >
> > lutfullah
> >
> >
> >
> >
> > On 4/24/07, Michael Edwards <[EMAIL PROTECTED]> wrote:
> > > The home directories should contain the same files as the head node, which
> > > to start with is probably just a bunch of .whatever files.
> > >
> > > I have had problems with the passwordless ssh when I was fiddling around
> > > with getting LDAP authentication set up, so you may have changed something
> > > which fixed one problem but is now causing this one.  Adding things to
> > > hosts.allow should not be necessary for instance, though as far as I know it
> > > shouldn't cause problems either.
> > >
> > > After you were having set up problems, did you go back and start with a
> > > clean OS install again?  It shouldn't take nearly as long the second time :)
> > >
> > >
> > > On 4/24/07, Dr. Lutfullah < [EMAIL PROTECTED]> wrote:
> > > > /etc/hosts is like this on all nodes:
> > > >
> > > > # Do not remove the following line, or various programs
> > > > # that require network functionality will fail.
> > > > 127.0.0.1       localhost.localdomain    localhost
> > > > 192.168.0.100 cc32.kust.edu.pk cc32 oscar_server nfs_oscar pbs_oscar
> > > >
> > > > # These entries are managed by SIS, please don't modify them.
> > > > 192.168.0.1     oscarnode1.kust.edu.pk     oscarnode1
> > > > 192.168.0.2     oscarnode2.kust.edu.pk     oscarnode2
> > > > 192.168.0.3     oscarnode3.kust.edu.pk     oscarnode3
> > > > 192.168.0.4     oscarnode4.kust.edu.pk     oscarnode4
> > > > 192.168.0.5     oscarnode5.kust.edu.pk     oscarnode5
> > > > 192.168.0.6     oscarnode6.kust.edu.pk     oscarnode6
> > > > 192.168.0.7     oscarnode7.kust.edu.pk     oscarnode7
> > > > 192.168.0.8     oscarnode8.kust.edu.pk     oscarnode8
> > > > 192.168.0.9     oscarnode9.kust.edu.pk     oscarnode9
> > > > 192.168.0.10    oscarnode10.kust.edu.pk    oscarnode10
> > > > --------------------- rest deleted---------------
> > > >
> > > > lutfullah
> > > >
> > > > On 4/24/07, Michael Edwards <[EMAIL PROTECTED] > wrote:
> > > > > Try rebooting the head node, waiting until it is completely booted, and then
> > > > > rebooting all the client nodes.  If everything is set up right, that might
> > > > > fix the problem.  You can check if it is working by sshing to the compute
> > > > > node and doing "ls /home" and comparing it to the head node.
> > > > >
> > > > > Check /etc/exports and make sure /home is exported to the internal network,
> > > > > and the fstab in the image and nodes to make sure it is being mounted.
> > > > >
> > > > > Another common problem is an incorrect /etc/hosts file; see if it is
> > > > > something like
> > > > > 127.0.0.1 localhost.localdomain localhost
> > > > > 10.0.0.1 oscarmaster.oscardomain oscarmaster nfs_oscar pbs_oscar
> > > > >
> > > > > If the cluster hostname is on the same line as localhost, that will cause
> > > > > this kind of problem because the nodes will be trying to contact themselves
> > > > > instead of the head node for nfs mounts.
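The /etc/hosts problem described above can be checked mechanically. The sketch below is only illustrative: the function names are made up, GOOD_HOSTS is modelled on the file posted later in this thread, and BAD_HOSTS is a hypothetical broken variant with the cluster hostname on the loopback line.

```python
def loopback_names(hosts_text):
    """Collect every hostname that /etc/hosts text maps to a 127.x.x.x address."""
    names = set()
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2 and fields[0].startswith("127."):
            names.update(fields[1:])
    return names


def misplaced_cluster_names(hosts_text, cluster_names):
    """Return the cluster names that wrongly resolve to the loopback address."""
    return sorted(loopback_names(hosts_text) & set(cluster_names))


GOOD_HOSTS = """\
127.0.0.1       localhost.localdomain    localhost
192.168.0.100 cc32.kust.edu.pk cc32 oscar_server nfs_oscar pbs_oscar
"""

# Hypothetical broken file: the head node's name shares the localhost line.
BAD_HOSTS = """\
127.0.0.1       cc32.kust.edu.pk cc32 localhost.localdomain localhost
192.168.0.100 oscar_server nfs_oscar pbs_oscar
"""

CLUSTER_NAMES = ["cc32", "cc32.kust.edu.pk", "oscar_server", "nfs_oscar", "pbs_oscar"]

print(misplaced_cluster_names(GOOD_HOSTS, CLUSTER_NAMES))
print(misplaced_cluster_names(BAD_HOSTS, CLUSTER_NAMES))
```

Any name reported for a node's hosts file is one that the node would resolve to itself rather than to the head node.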
> > > > >
> > > > > Not much will work without nfs working unfortunately.
> > > > >
> > > > >
> > > > > On 4/23/07, Dr. Lutfullah <[EMAIL PROTECTED] > wrote:
> > > > > > Thanks a lot. There are 15 log files. I am attaching two.
> > > > > >
> > > > > > lutfullah
> > > > > >
> > > > > > On 4/23/07, Michael Edwards < [EMAIL PROTECTED]> wrote:
> > > > > > > Please send your oscarinstall.log file.
> > > > > > >
> > > > > > >
> > > > > > > On 4/23/07, Dr. Lutfullah <[EMAIL PROTECTED]> wrote:
> > > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > I am using Fedora Core 4 and trying to install OSCAR on a cluster with
> > > > > > > > 32 HP P4 computers.
> > > > > > > > Everything has gone well except for the last step, in which I get tests
> > > > > > > > FAILED errors.
> > > > > > > > Something like:
> > > > > > > > /home mounts 31 nodes FAILED
> > > > > > > > SSH ping test PASSED
> > > > > > > > SSH server -> node FAILED
> > > > > > > > SSH node -> server FAILED
> > > > > > > > TORQUE shell test
> > > > > > > > goes into a loop and produces
> > > > > > > > Checking for NFS propagation of MPI preferences     not yet
> > > > > > > > this message then keeps on appearing.
> > > > > > > > Could anyone please help?
> > > > > > > > This is our first experience with a cluster.
> > > > > > > >
> > > > > > > > lutfullah

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
Oscar-users mailing list
Oscar-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oscar-users
