Hi,

I am attempting to install and configure OSCAR 1.4 under RedHat 7.1.  I
understand that OSCAR 1.4 is only experimentally supported under 7.1
(according to the installation documentation), so I know I'm starting off
with a handicap.  I can't upgrade to RH 7.2 or 7.3 because some of the
software in use on my cluster has only been tested and verified under 7.1
by the vendor.  Anyway, I've encountered a few bumps in the road and would
like to pick other users' brains.  My cluster consists of a server (P3 450)
and 6 clients (1 P3 450, 5 P2 450s).  All are single-processor machines.
All but one of the NICs are 3Coms; the other is an EEPro100.  They
communicate through a Cisco 2900XL series switch (100 Mb).  LAM/MPI is the
default MPI environment.

First, during the server setup phase, when the distro and update RPMs in
/tftpboot/rpm are being checked for the latest and greatest version #, a
message is written to the screen about a non-numeric comparison between
versions. Here's an excerpt from the installation log:

             .
             .
             .

        --> Successfully ran wizard_prep
        --> Running: "./oscar_wizard eth1"

        
        =============================================================================
        == Running step 1 of the OSCAR wizard
        =============================================================================

        --> Step 1: MPI selected: lam-6.5.6

        --> Step 1: Running: ./server_prep eth1
         [OSCAR::PackageBest :: Line 302] Reading package directory
         [OSCAR::PackageBest :: Line 314] Reading cache file.
         [OSCAR::PackageBest :: Line 327] Comparing cache to directory.
         [OSCAR::PackageBest :: Line 352] Writing new cache file.
====>    [OSCAR::PackageBest :: Line 229] - non numeric comparison 3.4.3/3.4.3 cmp 3.4.4+6/3.4.4+6
        Preparing...                ##################################################
        openpbs-oscar               ##################################################
        openpbs-oscar-client        ##################################################
        openpbs-oscar-server        ##################################################
        rrdtool-oscar               ##################################################
             .
             .
             .

There are other messages like this, too.  This particular message is for the
version comparison between the latest PVM RPMs from RedHat and the OSCAR PVM
distro.  I don't think this is an OSCAR error, or an error at all, but
rather a warning issued by the RPM command that two versions of an RPM
could not be compared because of non-numeric characters in the version
number of one or both of them.  Does anyone know if this is really a
problem to be concerned with?  The installation doesn't crash, and it
appears that the latest version of each RPM gets selected and installed
anyway.  Just for good measure, I removed all old/outdated RPMs from the
/tftpboot/rpm directory and the messages went away.
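
To illustrate why a version string such as 3.4.4+6 cannot be compared as a
single number, here is a minimal sh sketch of a segment-by-segment
comparison.  The vercmp function is a hypothetical helper written for this
illustration; it is not the code in OSCAR::PackageBest, and it only
approximates how rpm orders versions.

```sh
# Hypothetical helper (not OSCAR or rpm code): compare two version strings
# segment by segment and print -1, 0, or 1.
vercmp() {
    # Treat every character that is not a letter or digit as a separator.
    a=$(printf '%s' "$1" | tr -c 'A-Za-z0-9' ' ')
    b=$(printf '%s' "$2" | tr -c 'A-Za-z0-9' ' ')
    # Collapse runs of spaces and trim both ends.
    a=$(echo $a)
    b=$(echo $b)
    while [ -n "$a" ] || [ -n "$b" ]; do
        # Take the first segment of each string, then drop it.
        sa=${a%% *}
        sb=${b%% *}
        case $a in *' '*) a=${a#* } ;; *) a= ;; esac
        case $b in *' '*) b=${b#* } ;; *) b= ;; esac
        # The string that runs out of segments first is the older one.
        if [ -z "$sa" ]; then printf '%s\n' -1; return; fi
        if [ -z "$sb" ]; then printf '%s\n' 1; return; fi
        case $sa$sb in
            *[!0-9]*)
                # At least one segment contains a letter: compare as strings.
                if expr "x$sa" \< "x$sb" >/dev/null; then printf '%s\n' -1; return; fi
                if expr "x$sa" \> "x$sb" >/dev/null; then printf '%s\n' 1; return; fi
                ;;
            *)
                # Both segments are all digits: compare as numbers.
                if [ "$sa" -lt "$sb" ]; then printf '%s\n' -1; return; fi
                if [ "$sa" -gt "$sb" ]; then printf '%s\n' 1; return; fi
                ;;
        esac
    done
    printf '%s\n' 0
}

vercmp 3.4.3 3.4.4+6   # prints -1: 3.4.3 sorts before 3.4.4+6
```

With the two versions from the log line, the third segment (3 versus 4)
decides the order before the "+6" suffix is ever looked at.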

Second, the MPICH and LAM tests are failing.  After much investigation and
hand-modification of the pbs_script.lam and pbs_script.mpich scripts to
isolate the source(s) of the failure, I believe that the problem may have to
do with the setup of the LAM and MPICH operating environments.  I modified
the scripts to execute only certain parts of the tests to see if any parts
would pass, and was able to determine the following:
(1) MPICH tests:
     (a) successfully compiles C test program, fails to execute (more on
this in a moment);
     (b) fails C++ test program compile (undefined references, see below),
unable as yet to debug (missing a header file?)

        /tmp/ccmZEHqI.o: In function `main':
        /tmp/ccmZEHqI.o(.text+0x1f): undefined reference to `MPI::COMM_WORLD'
        /tmp/ccmZEHqI.o(.text+0x34): undefined reference to `MPI::COMM_WORLD'
        /tmp/ccmZEHqI.o: In function `MPI::Finalize(void)':
        /tmp/ccmZEHqI.o(.MPI::gnu.linkonce.t.Finalize(void)+0xa): undefined reference to `MPI::ERRORS_THROW_EXCEPTIONS'
        /tmp/ccmZEHqI.o: In function `MPI::Real_init(void)':
        /tmp/ccmZEHqI.o(.MPI::gnu.linkonce.t.Real_init(void)+0xa): undefined reference to `MPI::ERRORS_THROW_EXCEPTIONS'
        /tmp/ccmZEHqI.o: In function `MPI::Errhandler::init(void) const':
        /tmp/ccmZEHqI.o(.MPI::Errhandler::gnu.linkonce.t.init(void) const+0x11): undefined reference to `throw_excptn_fctn'
        collect2: ld returned 1 exit status
        hcc: No such file or directory

     (c) fails FORTRAN test program compile because the header file mpif.h
cannot be found.  Specifying the full path of the header file in the test
program source code fixes the problem (successfully compiles);
     (d) fails all test program executions, seemingly because the mpirun
command does not recognize the -machinefile option on the command line in
the script.  The usage/help information for the mpirun command is returned
upon execution.  When the -machinefile option is removed, mpirun complains
that there is no lamd running on the execution host and requests that
lamboot be executed to start it.  Running lamboot and starting lamd on all
nodes does nothing to change this.  It's as if the LAM environment is still
in effect for the MPICH test rather than the MPICH environment.  Should
switcher be used to invoke the MPICH environment before running the MPICH
tests and then to return to the default (LAM) environment afterwards?
(2) LAM tests:
     (a) lamboot at beginning of pbs_script.lam attempts to rsh and fails.
rsh is deactivated by default on my cluster and I would like to keep it this
way.  Setting up a link named rsh that points to ssh fixes this.  Can the
OSCAR distros of LAM and MPICH be configured to use ssh instead of rsh?
     (b) successfully compiles and executes C test program;
     (c) fails C++ test program compile (undefined references, same as with
MPICH, see above), unable as yet to debug (missing a header file?);
     (d) fails FORTRAN test program compile because the header file mpif.h
cannot be found (same as with MPICH).  Specifying the full path of the
header file in the test program source code fixes the problem (successfully
compiles, as with MPICH, and successfully executes).
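
One way to check the suspicion in (1)(d), that LAM's tools are the ones
being picked up during the MPICH test, is to print where each MPI command
resolves on the current PATH.  The sketch below is only a diagnostic aid;
mpi_tool_paths is a made-up function name, and the tool list is an
assumption (hcc is included because it appears in the C++ error output
above and is, as far as I know, LAM's compiler wrapper).

```sh
# Hypothetical diagnostic: show which MPI-related commands are first on PATH.
mpi_tool_paths() {
    for tool in mpirun mpicc mpiCC mpif77 hcc lamboot; do
        if path=$(command -v "$tool" 2>/dev/null); then
            printf '%-8s %s\n' "$tool" "$path"
        else
            printf '%-8s %s\n' "$tool" "(not found on PATH)"
        fi
    done
}

mpi_tool_paths
```

Running it from inside pbs_script.mpich would show whether the mpirun that
rejects -machinefile lives under the LAM or the MPICH installation
directory.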

Are these known problems with OSCAR under RedHat 7.1?  Is there a way to
resolve them?
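
On the ssh question in (2)(a), a possible alternative to the rsh symlink is
to set the remote-shell command through environment variables.  This is a
sketch under assumptions, not a verified OSCAR setting: it assumes LAM/MPI
reads LAMRSH and that MPICH's ch_p4 device reads P4_RSHCOMMAND, so both
should be checked against the documentation for the installed versions.

```sh
# Assumption: LAM/MPI takes its remote-shell command from LAMRSH.
# The -x flag turns off X11 forwarding in ssh.
LAMRSH="ssh -x"
export LAMRSH

# Assumption: MPICH's ch_p4 device takes its remote shell from P4_RSHCOMMAND.
P4_RSHCOMMAND="ssh"
export P4_RSHCOMMAND
```

The variables would need to be set in the environment that runs lamboot and
mpirun, for example near the top of the PBS test scripts.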

Many thanks for any help you can give, and my apologies for the lengthy
post, but I wanted to be as thorough as possible up front to minimize
re-re-re-re-re-responses.

Chris Hazelrig
Simulation Technologies,Inc.
H455, Bldg. 5400, RSA
(256)955-7305
(256)876-4204
[EMAIL PROTECTED]




