Re: [OMPI users] Simple MPI_Comm_spawn program hangs
To add more info, here is a backtrace of the spawned (hung) program.

(gdb) bt
#0  0xe410 in __kernel_vsyscall ()
#1  0x402cdaec in sched_yield () from /lib/tls/libc.so.6
#2  0x4016360c in opal_progress () at runtime/opal_progress.c:301
#3  0x403a9b29 in mca_oob_tcp_msg_wait (msg=0x805cc70, rc=0xbfffba40) at oob_tcp_msg.c:108
#4  0x403b09a5 in mca_oob_tcp_recv (peer=0xbfffbba8, iov=0xbfffba88, count=1, tag=0, flags=4) at oob_tcp_recv.c:138
#5  0x40119420 in mca_oob_recv_packed (peer=0xbfffbba8, buf=0x821b200, tag=0) at base/oob_base_recv.c:69
#6  0x4003c28b in ompi_comm_allreduce_intra_oob (inbuf=0xbfffbb48, outbuf=0xbfffbb44, count=1, op=0x400d14a0, comm=0x8049d38, bridgecomm=0x0, lleader=0xbfffbc04, rleader=0xbfffbba8, send_first=1) at communicator/comm_cid.c:674
#7  0x4003adf2 in ompi_comm_nextcid (newcomm=0x807c4f8, comm=0x8049d38, bridgecomm=0x0, local_leader=0xbfffbc04, remote_leader=0xbfffbba8, mode=256, send_first=1) at communicator/comm_cid.c:176
#8  0x4003cc2c in ompi_comm_connect_accept (comm=0x8049d38, root=0, port=0x807a5c0, send_first=1, newcomm=0xbfffbc28, tag=2000) at communicator/comm_dyn.c:208
#9  0x4003ec97 in ompi_comm_dyn_init () at communicator/comm_dyn.c:668
#10 0x4005465a in ompi_mpi_init (argc=1, argv=0xbfffbf64, requested=0, provided=0xbfffbd14) at runtime/ompi_mpi_init.c:704
#11 0x40090367 in PMPI_Init (argc=0xbfffbee0, argv=0xbfffbee4) at pinit.c:71
#12 0x08048983 in main (argc=1, argv=0xbfffbf64) at slave.c:43
(gdb)

Prakash

On Dec 6, 2007, at 12:08 AM, Prakash Velayutham wrote:
> Hi Edgar,
> I changed the spawned program from /bin/hostname to a very simple MPI program as below. But now, the slave hangs right at MPI_Init line. What could the issue be?
> [...]
Re: [OMPI users] Simple MPI_Comm_spawn program hangs
Hi Edgar,

I changed the spawned program from /bin/hostname to a very simple MPI program, shown below. But now the slave hangs right at the MPI_Init line. What could the issue be?

slave.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include "mpi.h"
#include <sys/types.h>   /* standard system types */
#include <netinet/in.h>  /* Internet address structures */
#include <sys/socket.h>  /* socket interface functions */
#include <netdb.h>       /* host to IP resolution */

int gdb_var;

int main(int argc, char **argv)
{
    int tag = 0;
    int my_rank;
    int num_proc;
    MPI_Status status;
    MPI_Comm inter_comm;

    gdb_var = 0;
    char hostname[64];
    FILE *f;

    /* wait here until a debugger attaches and sets gdb_var */
    while (0 == gdb_var)
        sleep(5);

    gethostname(hostname, 64);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_proc);
    MPI_Comm_get_parent(&inter_comm);

    MPI_Finalize();
    exit(0);
}

Thanks,
Prakash

On Dec 2, 2007, at 8:36 PM, Edgar Gabriel wrote:
> MPI_Comm_spawn is tested nightly by our test suites, so it should definitely work...
> [...]

___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Simple MPI_Comm_spawn program hangs
MPI_Comm_spawn is tested nightly by our test suites, so it should definitely work...

Thanks
Edgar

Prakash Velayutham wrote:
> Thanks Edgar. I did not know that. Really? Anyways, you are sure, an MPI job will work as a spawned process instead of "hostname"?
> [...]

--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
Re: [OMPI users] Simple MPI_Comm_spawn program hangs
Thanks Edgar. I did not know that. Really? Anyway, are you sure an MPI job will work as a spawned process instead of "hostname"?

Thanks,
Prakash

On Dec 1, 2007, at 5:56 PM, Edgar Gabriel wrote:
> MPI_Comm_spawn has to build an intercommunicator with the child process that it spawns. Thus, you cannot spawn a non-MPI job such as /bin/hostname, since the parent process waits for some messages from the child process(es) in order to set up the intercommunicator.
> [...]
Re: [OMPI users] Simple MPI_Comm_spawn program hangs
MPI_Comm_spawn has to build an intercommunicator with the child process that it spawns. Thus, you cannot spawn a non-MPI job such as /bin/hostname, since the parent process waits for some messages from the child process(es) in order to set up the intercommunicator.

Thanks
Edgar
[OMPI users] Simple MPI_Comm_spawn program hangs
Hello,

Open MPI 1.2.4

I am trying to run a simple C program.

##
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main(int argc, char **argv)
{
    int tag = 0;
    int my_rank;
    int num_proc;
    char message_0[] = "hello slave, i'm your master";
    char message_1[50];
    char master_data[] = "slaves to work";
    int array_of_errcodes[10];
    int num;
    MPI_Status status;
    MPI_Comm inter_comm;
    MPI_Info info;
    int arr[1];
    int rc1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_proc);

    printf("MASTER : spawning a slave ... \n");
    rc1 = MPI_Comm_spawn("/bin/hostname", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter_comm, arr);

    MPI_Finalize();
    exit(0);
}
##

This program hangs as below:

prakash@bmi-xeon1-01:~/thesis/CS/Samples> ./master1
MASTER : spawning a slave ...
bmi-xeon1-01

Any ideas why?

Thanks,
Prakash