Hi Dominik,

Have you double-checked that the MPI names for switcher (lam-XXX and
openmpi-XXX) exist on your client nodes?
http://www.mail-archive.com/[email protected]/msg08516.html
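One way to run that check is sketched below. It assumes `switcher mpi --list` prints one available name per line. The helper name `missing_names`, the node name, and the version strings in the usage comment are only placeholders, so substitute the names your packages actually register.

```shell
#!/bin/sh
# missing_names (hypothetical helper): reads a list of switcher names on
# stdin, one per line, and prints every expected name (given as arguments)
# that does not appear in that list.
missing_names() {
    # Trim leading/trailing whitespace so indented output still matches.
    mn_available=$(sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
    for mn_name in "$@"; do
        printf '%s\n' "$mn_available" | grep -Fxq -- "$mn_name" \
            || printf '%s\n' "$mn_name"
    done
}

# Usage (assumption: run from the head node, once per client node):
#   ssh oscarnode1 switcher mpi --list | missing_names lam-XXX openmpi-XXX
# Any name printed is missing on that node.
```

If the helper prints nothing for a node, both names are registered there.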

Can you post the options you used to rebuild LAM?
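It may also help to see what the rebuilt LAM reports about itself. This is a minimal sketch, assuming `laminfo` lists boot modules on lines of the form `SSI boot: tm (...)`. The helper name `has_tm_boot` is hypothetical.

```shell
#!/bin/sh
# has_tm_boot (hypothetical helper): reads `laminfo` output on stdin and
# succeeds only if a "tm" boot SSI module is listed.
has_tm_boot() {
    grep -Eq '^[[:space:]]*SSI boot:[[:space:]]*tm([[:space:]]|$)'
}

# Usage (assumption: laminfo from the rebuilt package is first in PATH):
#   laminfo | has_tm_boot && echo "tm boot module present" \
#                         || echo "tm boot module missing"
```

If it reports the module as missing on the head node, the build itself did not pick up TORQUE; if it is missing only on the client nodes, they are probably still running the old package.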

Also, once Open MPI and LAM were rebuilt and installed on your head node, I
assume the new packages were installed on the client nodes as well, right?

Regards,

- DongInn


Dominik Schips wrote:
> Hello,
> 
> I have OSCAR 5.0 on SLES10SP1 (x86_64) but still get 2 errors at the
> last step when I test the cluster. The logs are below.
> 
> The last and biggest change I made was the switch from OpenMPI 1.1.1 to
> OpenMPI 1.2.5. After the RPM build I changed the configuration so that
> OSCAR can use the new package.
> 
> The build system is also the testing system, so it isn't a clean (fresh)
> SLES10SP1 and OSCAR 5.0 installation. I think package dependency problems
> and similar issues are always likely if the system isn't clean, correct?
> 
> I built OpenMPI from the official OpenMPI source RPM. It has tm
> support.
> 
> # ompi_info | grep tm
>               MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.5)
>                  MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.5)
>                  MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.5)
> 
> About the LAM problem, I read
> (http://www.mail-archive.com/[email protected]/msg04656.html)
> that it could be a problem if the package was built on a host without TORQUE.
> But that report is very old and from OSCAR 4.2. I tried it with TORQUE, but
> that didn't help me solve it.
> 
> There were no errors during the LAM and OpenMPI RPM builds,
> and none of the logs I looked at give more information about what could
> cause these 2 problems.
> 
> The PVM and MPICH tests pass, so I don't think it can be a switcher
> problem.
> 
> I would be glad of any help. If you need more information or a
> logfile, just let me know.
> 
> 
> 
> Performing root tests...
> Maui service check:maui                                        [PASSED]
> TORQUE node check                                              [PASSED]
> TORQUE service check:pbs_server                                [PASSED]
> /home mounts                                                   [PASSED]
> 
> Preparing user tests...
> Performing user tests...
> SSH ping test                                                  [PASSED]
> SSH server->node                                               [PASSED]
> SSH node->server                                               [PASSED]
> LAM/MPI (via TORQUE)                                           [FAILED]
> PVM (via TORQUE)                                               [PASSED]
> MPICH (via TORQUE)                                             [PASSED]
> Ganglia setup test                                             [PASSED]
> Ganglia node count test                                        [PASSED]
> TORQUE default queue definition                                [PASSED]
> TORQUE Shell Test                                              [PASSED]
> Open MPI (via TORQUE)                                          [FAILED]
> qdel: Request invalid for state of job 106.sles10oscar
> 
> Run APItests...
> 
> Running Installation tests for pvm
> [PASS]       2008-01-23 09:54:49   pvmd-path-ls.apt
> [PASS]       2008-01-23 09:54:49   envvar-pvm_arch.apt
> [PASS]       2008-01-23 09:54:49   envvar-pvm_root.apt
> [PASS]       2008-01-23 09:54:49   envvar.apb
> [PASS]       2008-01-23 09:54:49   pvmd-path-which.apt
> [PASS]       2008-01-23 09:54:49   modulecmd-path-ls.apt
> [PASS]       2008-01-23 09:54:49   pvm-module-list.apt
> [PASS]       2008-01-23 09:54:49   pvm-module-show-pvm_rsh.apt
> [PASS]       2008-01-23 09:54:49   pvm-module-show-pvm_arch.apt
> [PASS]       2008-01-23 09:54:49   pvm-module-show-pvm_root.apt
> [PASS]       2008-01-23 09:54:49   pvm-module-show.apb
> [PASS]       2008-01-23 09:54:49   pvm-module.apb
> [PASS]       2008-01-23 09:54:49   install_tests.apb
> 
> There are 2 failed/skipped tests (see above).
> Please check for .err and .out files in /home/oscartst/<package>.
> 
> ...Hit <ENTER> to close this window...
> 
> --------------------------------------------------------------------------------
> 
> sles10oscar:/home/oscartst/lam # cat lamtest.out
> Running LAM/MPI test
> sles10oscar:/home/oscartst/lam # cat lamtest.err
> 
> ERROR: LAM/MPI does not appear to have the tm boot SSI module!
>        This test script will now abort.
> 
> sles10oscar:/home/oscartst/lam #
> 
> --------------------------------------------------------------------------------
> 
> sles10oscar:/home/oscartst/openmpi # cat openmpitest.err
> [oscarnode2:08400] pls:tm: failed to poll for a spawned proc, return status = 17002
> [oscarnode2:08400] [0,0,0] ORTE_ERROR_LOG: In errno in file rmgr_urm.c at line 462
> [oscarnode2:08400] mpiexec: spawn failed with errno=-11
> sles10oscar:/home/oscartst/openmpi # cat openmpitest.out
> Running Open MPI test
> Open MPI appears to have TM suppport.  Yippee!
> 
> --> MPI C bindings test:
> 
> TEST FAILED!
> Commands: cp cpi.c /tmp/openmpi-test && cd /tmp/openmpi-test && mpicc
> cpi.c -o openmpi-cpi && cp openmpi-cpi /home/oscartst/openmpi &&
> cd /home/oscartst/openmpi && mpiexec
> -machinefile /var/spool/pbs/aux//106.sles10oscar -n 2 openmpi-cpi
> sles10oscar:/home/oscartst/openmpi #
> 
> 
> 
> 

_______________________________________________
Oscar-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oscar-devel
