Hello,

I am running OSCAR 5.0 on SLES10SP1 (x86_64) but still get two failures in
the last step, when I test the cluster. The logs are below.

The last and biggest change I made was the switch from OpenMPI 1.1.1 to
OpenMPI 1.2.5. After building the RPM I changed the configuration so that
OSCAR can use the new package.

The build system is also the test system, so it is not a clean (fresh)
installation of SLES10SP1 and OSCAR 5.0. I assume package dependency
problems and similar issues are always a risk on a system that is not
clean, correct?

I built OpenMPI from the official OpenMPI source RPM. It has tm
support.

# ompi_info | grep tm
              MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.5)
                 MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.5)
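
Note that `grep tm` also matches `ptmalloc2`, so only the `ras` and `pls` lines above actually indicate TM support. A stricter check could look like the following minimal Python sketch. The function name `frameworks_with_component` and the regular expression are illustrative assumptions, not part of Open MPI or OSCAR, and the line format is assumed to match the `ompi_info` output shown above.

```python
import re

# Matches lines such as "   MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.5)".
# Assumption: every component line has the form "MCA <framework>: <component> (".
MCA_LINE = re.compile(r"^\s*MCA\s+(\w+):\s+(\S+)\s+\(")


def frameworks_with_component(ompi_info_text, component="tm"):
    """Return the set of MCA frameworks whose component name is exactly `component`."""
    found = set()
    for line in ompi_info_text.splitlines():
        match = MCA_LINE.match(line)
        # Compare the whole component name, so "ptmalloc2" does not count as "tm".
        if match and match.group(2) == component:
            found.add(match.group(1))
    return found


# Sample text copied from the ompi_info output above.
sample = """\
              MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.5)
                 MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.5)
"""

print(sorted(frameworks_with_component(sample)))
```

For the sample this reports the `pls` and `ras` frameworks and ignores `memory`. On a real system the text would come from the captured output of `ompi_info` instead of the hard-coded sample.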

Regarding the LAM problem, I read
(http://www.mail-archive.com/[email protected]/msg04656.html)
that it can occur if the package was built on a host without TORQUE.
But that report is very old and concerns OSCAR 4.2. I tried it with TORQUE
installed, but that did not solve the problem.

There were no errors during the LAM and OpenMPI RPM builds, and in none of
the logs I looked at could I find more information about what causes
these two problems.

The PVM and MPICH tests pass, so I don't think it is a switcher
problem.

I would be grateful for any help. If you need more information or a
log file, just let me know.



Performing root tests...
Maui service check:maui                                        [PASSED]
TORQUE node check                                              [PASSED]
TORQUE service check:pbs_server                                [PASSED]
/home mounts                                                   [PASSED]

Preparing user tests...
Performing user tests...
SSH ping test                                                  [PASSED]
SSH server->node                                               [PASSED]
SSH node->server                                               [PASSED]
LAM/MPI (via TORQUE)                                           [FAILED]
PVM (via TORQUE)                                               [PASSED]
MPICH (via TORQUE)                                             [PASSED]
Ganglia setup test                                             [PASSED]
Ganglia node count test                                        [PASSED]
TORQUE default queue definition                                [PASSED]
TORQUE Shell Test                                              [PASSED]
Open MPI (via TORQUE)                                          [FAILED]
qdel: Request invalid for state of job 106.sles10oscar

Run APItests...

Running Installation tests for pvm
[PASS]       2008-01-23 09:54:49   pvmd-path-ls.apt
[PASS]       2008-01-23 09:54:49   envvar-pvm_arch.apt
[PASS]       2008-01-23 09:54:49   envvar-pvm_root.apt
[PASS]       2008-01-23 09:54:49   envvar.apb
[PASS]       2008-01-23 09:54:49   pvmd-path-which.apt
[PASS]       2008-01-23 09:54:49   modulecmd-path-ls.apt
[PASS]       2008-01-23 09:54:49   pvm-module-list.apt
[PASS]       2008-01-23 09:54:49   pvm-module-show-pvm_rsh.apt
[PASS]       2008-01-23 09:54:49   pvm-module-show-pvm_arch.apt
[PASS]       2008-01-23 09:54:49   pvm-module-show-pvm_root.apt
[PASS]       2008-01-23 09:54:49   pvm-module-show.apb
[PASS]       2008-01-23 09:54:49   pvm-module.apb
[PASS]       2008-01-23 09:54:49   install_tests.apb

There are 2 failed/skipped tests (see above).
Please check for .err and .out files in /home/oscartst/<package>.

...Hit <ENTER> to close this window...

--------------------------------------------------------------------------------

sles10oscar:/home/oscartst/lam # cat lamtest.out
Running LAM/MPI test
sles10oscar:/home/oscartst/lam # cat lamtest.err

ERROR: LAM/MPI does not appear to have the tm boot SSI module!
       This test script will now abort.

sles10oscar:/home/oscartst/lam #

--------------------------------------------------------------------------------

sles10oscar:/home/oscartst/openmpi # cat openmpitest.err
[oscarnode2:08400] pls:tm: failed to poll for a spawned proc, return status = 17002
[oscarnode2:08400] [0,0,0] ORTE_ERROR_LOG: In errno in file rmgr_urm.c at line 462
[oscarnode2:08400] mpiexec: spawn failed with errno=-11
sles10oscar:/home/oscartst/openmpi # cat openmpitest.out
Running Open MPI test
Open MPI appears to have TM suppport.  Yippee!

--> MPI C bindings test:

TEST FAILED!
Commands: cp cpi.c /tmp/openmpi-test && cd /tmp/openmpi-test && mpicc cpi.c -o openmpi-cpi && cp openmpi-cpi /home/oscartst/openmpi && cd /home/oscartst/openmpi && mpiexec -machinefile /var/spool/pbs/aux//106.sles10oscar -n 2 openmpi-cpi
sles10oscar:/home/oscartst/openmpi #




-- 
Best regards

Dominik Schips

Tel.: +49 (0)21 61 - 46 43-112
Fax:  +49 (0)21 61 - 46 43-100

credativ GmbH, HRB Mönchengladbach 12080
Hohenzollernstr. 133, 41061 Mönchengladbach
Managing directors: Dr. Michael Meskes, Jörg Folz


_______________________________________________
Oscar-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oscar-devel
