OK. The -np only run:
---
sh-3.1$ mpirun -np 2 --display-allocation --display-devel-map mpi_hello
======================   ALLOCATED NODES   ======================

 Data for node: Name: cut1n7  Launch id: -1  Arch: ffc91200  State: 2
    Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
    Daemon: [[51868,0],0]  Daemon launched: True
    Num slots: 1  Slots in use: 0
    Num slots allocated: 1  Max slots: 0
    Username on node: NULL
    Num procs: 0  Next node_rank: 0

 Data for node: Name: cut1n8  Launch id: -1  Arch: 0  State: 2
    Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
    Daemon: Not defined  Daemon launched: False
    Num slots: 0  Slots in use: 0
    Num slots allocated: 0  Max slots: 0
    Username on node: NULL
    Num procs: 0  Next node_rank: 0

=================================================================

 Map generated by mapping policy: 0400
    Npernode: 0  Oversubscribe allowed: TRUE  CPU Lists: FALSE
    Num new daemons: 1  New daemon starting vpid 1
    Num nodes: 2

 Data for node: Name: cut1n7  Launch id: -1  Arch: ffc91200  State: 2
    Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
    Daemon: [[51868,0],0]  Daemon launched: True
    Num slots: 1  Slots in use: 1
    Num slots allocated: 1  Max slots: 0
    Username on node: NULL
    Num procs: 1  Next node_rank: 1
    Data for proc: [[51868,1],0]
        Pid: 0  Local rank: 0  Node rank: 0
        State: 0  App_context: 0  Slot list: NULL

 Data for node: Name: cut1n8  Launch id: -1  Arch: 0  State: 2
    Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
    Daemon: [[51868,0],1]  Daemon launched: False
    Num slots: 0  Slots in use: 1
    Num slots allocated: 0  Max slots: 0
    Username on node: NULL
    Num procs: 1  Next node_rank: 1
    Data for proc: [[51868,1],1]
        Pid: 0  Local rank: 0  Node rank: 0
        State: 0  App_context: 0  Slot list: NULL

Hello, I am node cut1n8 with rank 1
Hello, I am node cut1n7 with rank 0
---

Before the segfault I got (using -npernode):
---
sh-3.1$ mpirun -npernode 1 --display-allocation --display-devel-map mpi_hello

======================   ALLOCATED NODES   ======================

 Data for node: Name: cut1n7  Launch id: -1  Arch: ffc91200  State: 2
    Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
    Daemon: [[51942,0],0]  Daemon launched: True
    Num slots: 1  Slots in use: 0
    Num slots allocated: 1  Max slots: 0
    Username on node: NULL
    Num procs: 0  Next node_rank: 0

 Data for node: Name: cut1n8  Launch id: -1  Arch: 0  State: 2
    Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
    Daemon: Not defined  Daemon launched: False
    Num slots: 0  Slots in use: 0
    Num slots allocated: 0  Max slots: 0
    Username on node: NULL
    Num procs: 0  Next node_rank: 0

=================================================================

 Map generated by mapping policy: 0400
    Npernode: 1  Oversubscribe allowed: TRUE  CPU Lists: FALSE
    Num new daemons: 1  New daemon starting vpid 1
    Num nodes: 2

 Data for node: Name: cut1n7  Launch id: -1  Arch: ffc91200  State: 2
    Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
    Daemon: [[51942,0],0]  Daemon launched: True
    Num slots: 1  Slots in use: 1
    Num slots allocated: 1  Max slots: 0
    Username on node: NULL
    Num procs: 1  Next node_rank: 1
    Data for proc: [[51942,1],0]
        Pid: 0  Local rank: 0  Node rank: 0
        State: 0  App_context: 0  Slot list: NULL

 Data for node: Name: cut1n8  Launch id: -1  Arch: 0  State: 2
    Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
    Daemon: [[51942,0],1]  Daemon launched: False
    Num slots: 0  Slots in use: 1
    Num slots allocated: 0  Max slots: 0
    Username on node: NULL
    Num procs: 1  Next node_rank: 1
    Data for proc: [[51942,1],0]
        Pid: 0  Local rank: 0  Node rank: 0
        State: 0  App_context: 0  Slot list: NULL

[cut1n7:19375] *** Process received signal ***
[cut1n7:19375] Signal: Segmentation fault (11)
[cut1n7:19375] Signal code: Address not mapped (1)
[cut1n7:19375] Failing at address: 0x50
[cut1n7:19375] [ 0] /lib64/libpthread.so.0 [0x37bda0de80]
[cut1n7:19375] [ 1] /apps/mpi/openmpi/1.4.2-gcc-4.1.2-may.12.10/lib/libopen-rte.so.0(orte_util_encode_pidmap+0xdb) [0x2aed0f93af8b]
[cut1n7:19375] [ 2] /apps/mpi/openmpi/1.4.2-gcc-4.1.2-may.12.10/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x655) [0x2aed0f9462f5]
[cut1n7:19375] [ 3] /apps/mpi/openmpi/1.4.2-gcc-4.1.2-may.12.10/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x10b) [0x2aed0f94d31b]
[cut1n7:19375] [ 4] /apps/mpi/openmpi/1.4.2-gcc-4.1.2-may.12.10/lib/openmpi/mca_plm_slurm.so [0x2aed107f6ecf]
[cut1n7:19375] [ 5] mpirun [0x40335a]
[cut1n7:19375] [ 6] mpirun [0x4029f3]
[cut1n7:19375] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x37bce1d8b4]
[cut1n7:19375] [ 8] mpirun [0x402929]
[cut1n7:19375] *** End of error message ***
Segmentation fault
---

I'll look into a SLURM version update. Previously, SLURM 1.0.30 and Open MPI 1.3.2 were working together. Just curious what was giving me heartache here ...

On Mon, May 17, 2010 at 4:06 PM, Ralph Castain <r...@open-mpi.org> wrote:

> That's a pretty old version of SLURM - I don't have access to anything that
> old to test against. You could try running it with --display-allocation
> --display-devel-map to see what ORTE thinks the allocation is and how it
> mapped the procs. It sounds like something may be having a problem there...
>
>
> On Mon, May 17, 2010 at 11:08 AM, Christopher Maestas <cdmaes...@gmail.com> wrote:
>
>> Hello,
>>
>> I've been having some trouble with Open MPI 1.4.X and SLURM recently. I
>> seem to be able to run jobs this way OK:
>> ---
>> sh-3.1$ mpirun -np 2 mpi_hello
>> Hello, I am node cut1n7 with rank 0
>> Hello, I am node cut1n8 with rank 1
>> ---
>>
>> However, if I try to use the -npernode option I get:
>> ---
>> sh-3.1$ mpirun -npernode 1 mpi_hello
>> [cut1n7:16368] *** Process received signal ***
>> [cut1n7:16368] Signal: Segmentation fault (11)
>> [cut1n7:16368] Signal code: Address not mapped (1)
>> [cut1n7:16368] Failing at address: 0x50
>> [cut1n7:16368] [ 0] /lib64/libpthread.so.0 [0x37bda0de80]
>> [cut1n7:16368] [ 1] /apps/mpi/openmpi/1.4.2-gcc-4.1.2-may.12.10/lib/libopen-rte.so.0(orte_util_encode_pidmap+0xdb) [0x2b73eb84df8b]
>> [cut1n7:16368] [ 2] /apps/mpi/openmpi/1.4.2-gcc-4.1.2-may.12.10/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x655) [0x2b73eb8592f5]
>> [cut1n7:16368] [ 3] /apps/mpi/openmpi/1.4.2-gcc-4.1.2-may.12.10/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x10b) [0x2b73eb86031b]
>> [cut1n7:16368] [ 4] /apps/mpi/openmpi/1.4.2-gcc-4.1.2-may.12.10/lib/openmpi/mca_plm_slurm.so [0x2b73ec709ecf]
>> [cut1n7:16368] [ 5] mpirun [0x40335a]
>> [cut1n7:16368] [ 6] mpirun [0x4029f3]
>> [cut1n7:16368] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x37bce1d8b4]
>> [cut1n7:16368] [ 8] mpirun [0x402929]
>> [cut1n7:16368] *** End of error message ***
>> Segmentation fault
>> ---
>>
>> This is ompi 1.4.2, gcc 4.1.1 and slurm 2.0.9 ... I'm sure it's a rather
>> silly detail on my end, but figured I should start this thread for any
>> insights and feedback I can help provide to resolve this.
>>
>> Thanks,
>> -cdm
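
For reference, the mpi_hello source was never posted in this thread; judging from the "Hello, I am node ... with rank ..." output, it is presumably a standard MPI hello-world along the lines of the sketch below (an assumption, not the poster's actual code; build with something like mpicc mpi_hello.c -o mpi_hello):
---
/* Hypothetical reconstruction of the mpi_hello test program used in this
 * thread: each rank prints its host name and rank, which matches the
 * "Hello, I am node ... with rank ..." lines in the output above. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, name_len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                   /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);     /* rank of this process */
    MPI_Get_processor_name(name, &name_len);  /* host name of this node */

    printf("Hello, I am node %s with rank %d\n", name, rank);

    MPI_Finalize();
    return 0;
}
---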