On 1.1.2, that error is telling you that it didn't find any nodes in the environment. The bproc allocator looks for an environment variable named NODES that contains the list of nodes assigned to you; this error indicates it found nothing.

Did you get an allocation prior to running the job? Could you check whether NODES appears in your environment?
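If it helps, here is a quick standalone check you could compile and run in the same environment as the job (just a minimal sketch using getenv; it is not part of Open MPI):

<------------------------------------------------------------------->
/* Minimal sketch: print the NODES variable that the bproc (bjs)
 * allocator expects to find in the environment. */
#include <stdio.h>
#include <stdlib.h>

int main( void ) {
    const char *nodes = getenv( "NODES" );

    if ( NULL == nodes || '\0' == nodes[0] ) {
        printf( "NODES is not set or is empty\n" );
    } else {
        printf( "NODES = %s\n", nodes );
    }
    return 0;
}
<------------------------------------------------------------------->

If that comes back empty, the bjs RAS component has no allocation to work from, which would match the ras_bjs.c error you are seeing.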
Ralph

On 10/30/06 8:47 AM, "hpe...@infonie.fr" <hpe...@infonie.fr> wrote:

> Hi,
>
> I have a problem using MPI_Comm_spawn_multiple together with bproc.
>
> I want to use the MPI_Comm_spawn_multiple call to spawn a set of executables,
> but in a bproc environment the program either crashes or hangs on this call
> (depending on the Open MPI release used).
>
> I have created a test program that spawns one other program on the same host
> (see the code listings at the end of this mail).
>
> * With Open MPI 1.1.2, the program crashes on the MPI_Comm_spawn_multiple call:
> <--------------------------------->
> [myhost:17061] [0,0,0] ORTE_ERROR_LOG: Not available in file ras_bjs.c at line 253
> main_exe: Begining of main_exe
> main_exe: Call MPI_Init
> main_exe: Call MPI_Comm_spawn_multiple()
> [myhost:17061] [0,0,0] ORTE_ERROR_LOG: Not available in file ras_bjs.c at line 253
> Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
> Failing at addr:(nil)
> [0] func:/usr/local/Mpi/openmpi-1.1.2/lib/libopal.so.0 [0xb7f70ccf]
> [1] func:[0xffffe440]
> [2] func:/usr/local/Mpi/openmpi-1.1.2/lib/liborte.so.0(orte_schema_base_get_node_tokens+0x7f) [0xb7fdc41f]
> [3] func:/usr/local/Mpi/openmpi-1.1.2/lib/liborte.so.0(orte_ras_base_node_assign+0x20b) [0xb7fd230b]
> [4] func:/usr/local/Mpi/openmpi-1.1.2/lib/liborte.so.0(orte_ras_base_allocate_nodes+0x41) [0xb7fd0371]
> [5] func:/usr/local/Mpi/openmpi-1.1.2/lib/openmpi/mca_ras_hostfile.so [0xb7538ba8]
> [6] func:/usr/local/Mpi/openmpi-1.1.2/lib/liborte.so.0(orte_ras_base_allocate+0xd0) [0xb7fd0470]
> [7] func:/usr/local/Mpi/openmpi-1.1.2/lib/openmpi/mca_rmgr_urm.so [0xb754d62f]
> [8] func:/usr/local/Mpi/openmpi-1.1.2/lib/liborte.so.0(orte_rmgr_base_cmd_dispatch+0x137) [0xb7fd9187]
> [9] func:/usr/local/Mpi/openmpi-1.1.2/lib/openmpi/mca_rmgr_urm.so [0xb754e09e]
> [10] func:/usr/local/Mpi/openmpi-1.1.2/lib/liborte.so.0 [0xb7fcd00e]
> [11] func:/usr/local/Mpi/openmpi-1.1.2/lib/openmpi/mca_oob_tcp.so [0xb7585084]
> [12] func:/usr/local/Mpi/openmpi-1.1.2/lib/openmpi/mca_oob_tcp.so [0xb7586763]
> [13] func:/usr/local/Mpi/openmpi-1.1.2/lib/libopal.so.0(opal_event_loop+0x199) [0xb7f5f7a9]
> [14] func:/usr/local/Mpi/openmpi-1.1.2/lib/libopal.so.0 [0xb7f60353]
> [15] func:/lib/tls/libpthread.so.0 [0xb7ef7b63]
> [16] func:/lib/tls/libc.so.6(__clone+0x5a) [0xb7e9518a]
> *** End of error message ***
> <----------------------------------------------->
>
> * With Open MPI 1.1.1, the program simply hangs on the MPI_Comm_spawn_multiple call:
> <--------------------------------->
> [myhost:17187] [0,0,0] ORTE_ERROR_LOG: Not available in file ras_bjs.c at line 253
> main_exe: Begining of main_exe
> main_exe: Call MPI_Init
> main_exe: Call MPI_Comm_spawn_multiple()
> [myhost:17187] [0,0,0] ORTE_ERROR_LOG: Not available in file ras_bjs.c at line 253
> <--------------------------------->
>
> * With Open MPI 1.0.2, the program also hangs on the MPI_Comm_spawn_multiple call, but there is no ORTE_ERROR_LOG:
> <--------------------------------->
> main_exe: Begining of main_exe
> main_exe: Call MPI_Init
> main_exe: Call MPI_Comm_spawn_multiple()
> <--------------------------------->
>
> * With Open MPI 1.1.2 in a non-bproc environment, the program works just fine:
> <--------------------------------->
> main_exe: Begining of main_exe
> main_exe: Call MPI_Init
> main_exe: Call MPI_Comm_spawn_multiple()
> spawned_exe: Begining of spawned_exe
> spawned_exe: Call MPI_Init
> main_exe: Back from MPI_Comm_spawn_multiple() result = 0
> main_exe: Spawned exe returned errcode = 0
> spawned_exe: This exe does not do really much thing actually
> main_exe: Call MPI_finalize
> main_exe: End of main_exe
> <--------------------------------->
>
> Can you help me solve this problem?
>
> Regards.
>
> Herve
>
> The bproc release is:
> bproc: Beowulf Distributed Process Space Version 4.0.0pre8
> bproc: (C) 1999-2003 Erik Hendriks <e...@hendriks.cx>
> bproc: Initializing node set. node_ct=1 id_ct=1
>
> The system is Debian Sarge with a 2.6.9 kernel installed and patched for bproc.
>
> Finally, here is the ompi_info output for the Open MPI 1.1.2 release:
> Open MPI: 1.1.2
> Open MPI SVN revision: r12073
> Open RTE: 1.1.2
> Open RTE SVN revision: r12073
> OPAL: 1.1.2
> OPAL SVN revision: r12073
> Prefix: /usr/local/Mpi/openmpi-1.1.2
> Configured architecture: i686-pc-linux-gnu
> Configured by: itrsat
> Configured on: Mon Oct 23 12:55:17 CEST 2006
> Configure host: myhost
> Built by: setics
> Built on: lun oct 23 13:09:47 CEST 2006
> Built host: myhost
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: no
> Fortran90 bindings: no
> Fortran90 bindings size: na
> C compiler: gcc
> C compiler absolute: /usr/bin/gcc
> C++ compiler: g++
> C++ compiler absolute: /usr/bin/g++
> Fortran77 compiler: none
> Fortran77 compiler abs: none
> Fortran90 compiler: none
> Fortran90 compiler abs: none
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: no
> Fortran90 profiling: no
> C++ exceptions: no
> Thread support: posix (mpi: yes, progress: yes)
> Internal debug support: no
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> libltdl support: yes
> MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1.2)
> MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1.2)
> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1.2)
> MCA timer: linux (MCA v1.0, API v1.0, Component v1.1.2)
> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> MCA coll: basic (MCA v1.0, API v1.0, Component v1.1.2)
> MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1.2)
> MCA coll: self (MCA v1.0, API v1.0, Component v1.1.2)
> MCA coll: sm (MCA v1.0, API v1.0, Component v1.1.2)
> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1.2)
> MCA io: romio (MCA v1.0, API v1.0, Component v1.1.2)
> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1.2)
> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1.2)
> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1.2)
> MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1.2)
> MCA btl: self (MCA v1.0, API v1.0, Component v1.1.2)
> MCA btl: sm (MCA v1.0, API v1.0, Component v1.1.2)
> MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
> MCA topo: unity (MCA v1.0, API v1.0, Component v1.1.2)
> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
> MCA gpr: null (MCA v1.0, API v1.0, Component v1.1.2)
> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1.2)
> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1.2)
> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1.2)
> MCA iof: svc (MCA v1.0, API v1.0, Component v1.1.2)
> MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1.2)
> MCA ns: replica (MCA v1.0, API v1.0, Component v1.1.2)
> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
> MCA ras: bjs (MCA v1.0, API v1.0, Component v1.1.2)
> MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1.2)
> MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1.2)
> MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1.2)
> MCA ras: lsf_bproc (MCA v1.0, API v1.0, Component v1.1.2)
> MCA ras: poe (MCA v1.0, API v1.0, Component v1.1.2)
> MCA ras: slurm (MCA v1.0, API v1.0, Component v1.1.2)
> MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1.2)
> MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1.2)
> MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1.2)
> MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1.2)
> MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1.2)
> MCA rml: oob (MCA v1.0, API v1.0, Component v1.1.2)
> MCA pls: bproc (MCA v1.0, API v1.0, Component v1.1.2)
> MCA pls: bproc_orted (MCA v1.0, API v1.0, Component v1.1.2)
> MCA pls: fork (MCA v1.0, API v1.0, Component v1.1.2)
> MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1.2)
> MCA pls: slurm (MCA v1.0, API v1.0, Component v1.1.2)
> MCA sds: bproc (MCA v1.0, API v1.0, Component v1.1.2)
> MCA sds: env (MCA v1.0, API v1.0, Component v1.1.2)
> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1.2)
> MCA sds: seed (MCA v1.0, API v1.0, Component v1.1.2)
> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.1.2)
> MCA sds: slurm (MCA v1.0, API v1.0, Component v1.1.2)
> MCA soh: bproc (MCA v1.0, API v1.0, Component v1.1.2)
>
> Below are the code listings:
>
> * main_exe.c
> <------------------------------------------------------------------->
> #include "mpi.h"
> #include <stdlib.h>
> #include <stdio.h>
> #include <unistd.h>
>
> int gethostname(char *nom, size_t lg);
>
> int main( int argc, char **argv ) {
>
>     /*
>      * MPI_Comm_spawn_multiple parameters
>      */
>     int result, count, root;
>     int maxprocs;
>     char **commands;
>     MPI_Info infos;
>     int errcodes;
>
>     MPI_Comm intercomm, newintracomm;
>     int rank;
>     char hostname[80];
>     int len = sizeof (hostname);   /* size of the hostname buffer */
>
>     printf( "main_exe: Begining of main_exe\n");
>     printf( "main_exe: Call MPI_Init\n");
>     MPI_Init( &argc, &argv );
>     MPI_Comm_rank( MPI_COMM_WORLD, &rank );
>
>     /*
>      * MPI_Comm_spawn_multiple parameters
>      */
>     count = 1;
>     maxprocs = 1;
>     root = rank;
>
>     commands = malloc (sizeof (char *));
>     commands[0] = calloc (80, sizeof (char));
>     sprintf (commands[0], "./spawned_exe");
>
>     MPI_Info_create( &infos );
>
>     /* set proc/cpu info */
>     result = MPI_Info_set( infos, "soft", "0:1" );
>
>     /* set host info */
>     result = gethostname ( hostname, len );
>     if ( -1 == result ) {
>         printf ("main_exe: Problem in gethostname\n");
>     }
>     result = MPI_Info_set( infos, "host", hostname );
>
>     printf( "main_exe: Call MPI_Comm_spawn_multiple()\n");
>     result = MPI_Comm_spawn_multiple( count,
>                                       commands,
>                                       MPI_ARGVS_NULL,
>                                       &maxprocs,
>                                       &infos,
>                                       root,
>                                       MPI_COMM_WORLD,
>                                       &intercomm,
>                                       &errcodes );
>     printf( "main_exe: Back from MPI_Comm_spawn_multiple() result = %d\n", result);
>     printf( "main_exe: Spawned exe returned errcode = %d\n", errcodes );
>
>     MPI_Intercomm_merge( intercomm, 0, &newintracomm );
>
>     /* Synchronisation with spawned exe */
>     MPI_Barrier( newintracomm );
>
>     free( commands[0] );
>     free( commands );
>     MPI_Comm_free( &newintracomm );
>
>     printf( "main_exe: Call MPI_finalize\n");
>     MPI_Finalize( );
>
>     printf( "main_exe: End of main_exe\n");
>     return 0;
> }
> <------------------------------------------------------------------->
>
> * spawned_exe.c
> <------------------------------------------------------------------->
> #include "mpi.h"
> #include <stdio.h>
>
> int main( int argc, char **argv ) {
>     MPI_Comm parent, newintracomm;
>
>     printf ("spawned_exe: Begining of spawned_exe\n");
>     printf( "spawned_exe: Call MPI_Init\n");
>     MPI_Init( &argc, &argv );
>
>     MPI_Comm_get_parent ( &parent );
>     MPI_Intercomm_merge ( parent, 1, &newintracomm );
>
>     printf( "spawned_exe: This exe does not do really much thing actually\n" );
>
>     /* Synchronisation with main exe */
>     MPI_Barrier( newintracomm );
>
>     MPI_Comm_free( &newintracomm );
>
>     printf( "spawned_exe: Call MPI_finalize\n");
>     MPI_Finalize( );
>
>     printf( "spawned_exe: End of spawned_exe\n");
>     return 0;
> }
> <------------------------------------------------------------------->
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users