To add more info, here is a backtrace of the spawned (hung) program.
(gdb) bt
#0 0xffffe410 in __kernel_vsyscall ()
#1 0x402cdaec in sched_yield () from /lib/tls/libc.so.6
#2 0x4016360c in opal_progress () at runtime/opal_progress.c:301
#3 0x403a9b29 in mca_oob_tcp_msg_wait (msg=0x805cc70, rc=0xbfffba40)
at oob_tcp_msg.c:108
#4 0x403b09a5 in mca_oob_tcp_recv (peer=0xbfffbba8, iov=0xbfffba88,
count=1, tag=0, flags=4) at oob_tcp_recv.c:138
#5 0x40119420 in mca_oob_recv_packed (peer=0xbfffbba8, buf=0x821b200,
tag=0) at base/oob_base_recv.c:69
#6 0x4003c28b in ompi_comm_allreduce_intra_oob (inbuf=0xbfffbb48,
outbuf=0xbfffbb44, count=1, op=0x400d14a0,
comm=0x8049d38, bridgecomm=0x0, lleader=0xbfffbc04,
rleader=0xbfffbba8, send_first=1) at communicator/comm_cid.c:674
#7 0x4003adf2 in ompi_comm_nextcid (newcomm=0x807c4f8,
comm=0x8049d38, bridgecomm=0x0, local_leader=0xbfffbc04,
remote_leader=0xbfffbba8, mode=256, send_first=1) at communicator/comm_cid.c:176
#8 0x4003cc2c in ompi_comm_connect_accept (comm=0x8049d38, root=0,
port=0x807a5c0, send_first=1, newcomm=0xbfffbc28,
tag=2000) at communicator/comm_dyn.c:208
#9 0x4003ec97 in ompi_comm_dyn_init () at communicator/comm_dyn.c:668
#10 0x4005465a in ompi_mpi_init (argc=1, argv=0xbfffbf64, requested=0,
provided=0xbfffbd14)
at runtime/ompi_mpi_init.c:704
#11 0x40090367 in PMPI_Init (argc=0xbfffbee0, argv=0xbfffbee4) at
pinit.c:71
#12 0x08048983 in main (argc=1, argv=0xbfffbf64) at slave.c:43
(gdb)
Prakash
On Dec 6, 2007, at 12:08 AM, Prakash Velayutham wrote:
Hi Edgar,
I changed the spawned program from /bin/hostname to the very simple MPI
program below, but now the slave hangs right at the MPI_Init call.
What could the issue be?
slave.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>     /* gethostname(), sleep() */
#include "mpi.h"

/* Debugging hook: the slave spins until gdb is attached and this is set
   to nonzero, e.g. "set var gdb_var = 1". volatile keeps the compiler
   from caching the value across loop iterations. */
volatile int gdb_var = 0;

int
main(int argc, char **argv)
{
    int my_rank;
    int num_proc;
    MPI_Comm inter_comm;    /* intercommunicator to the parent (master) */
    char hostname[64];

    gethostname(hostname, sizeof hostname);
    while (0 == gdb_var)
        sleep(5);

    MPI_Init(&argc, &argv);    /* the slave hangs inside this call */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_proc);
    MPI_Comm_get_parent(&inter_comm);
    MPI_Finalize();
    exit(0);
}
Thanks,
Prakash
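The gdb_var spin loop in slave.c is the usual way to hold a spawned process before MPI_Init so a debugger can be attached. A minimal sketch of the same trick as a reusable helper, which also prints the host and PID so you know where to attach. The names debugger_release, attach_banner and wait_for_debugger are hypothetical, not part of Open MPI.

```c
#define _XOPEN_SOURCE 700   /* expose gethostname() under strict compilers */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical release flag: set it to 1 from gdb to let the process
   continue. volatile keeps the compiler from caching it in the loop. */
volatile int debugger_release = 0;

/* Format the "where to attach" banner; returns snprintf's result. */
int attach_banner(char *buf, size_t len, const char *host, long pid)
{
    return snprintf(buf, len, "PID %ld on %s ready for attach\n", pid, host);
}

/* Announce host and PID on stderr, then park until a debugger flips
   the flag. Intended to be called before MPI_Init. */
void wait_for_debugger(void)
{
    char host[256];
    char banner[320];

    if (gethostname(host, sizeof host) != 0)
        strcpy(host, "unknown-host");
    host[sizeof host - 1] = '\0';   /* a truncated name may be unterminated */

    attach_banner(banner, sizeof banner, host, (long)getpid());
    fputs(banner, stderr);
    fflush(stderr);

    while (0 == debugger_release)
        sleep(5);
}
```

Call wait_for_debugger() first thing in main (build with -g), then on the printed host run gdb -p with the printed PID, type "set var debugger_release = 1" and "continue", and use bt once the process hangs.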
On Dec 2, 2007, at 8:36 PM, Edgar Gabriel wrote:
MPI_Comm_spawn is tested nightly by our test suites, so it should
definitely work...
Thanks
Edgar
Prakash Velayutham wrote:
Thanks Edgar. I did not know that. Really?
Anyway, are you sure an MPI job will work as a spawned process
instead of "hostname"?
Thanks,
Prakash
On Dec 1, 2007, at 5:56 PM, Edgar Gabriel wrote:
MPI_Comm_spawn has to build an intercommunicator with the child
process that it spawns. Thus, you cannot spawn a non-MPI job such as
/bin/hostname, since the parent process waits for some messages from
the child process(es) in order to set up the intercommunicator.
Thanks
Edgar
Prakash Velayutham wrote:
Hello,
Open MPI 1.2.4
I am trying to run a simple C program.
######################################################################################
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include "mpi.h"

int
main(int argc, char **argv)
{
    int my_rank;
    int num_proc;
    char message_0[] = "hello slave, i'm your master";
    char message_1[50];
    MPI_Comm inter_comm;    /* intercommunicator to the spawned child */
    int arr[1];             /* one error code per spawned process */
    int rc1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_proc);
    printf("MASTER : spawning a slave ... \n");
    /* Spawn one process from rank 0; the call also builds inter_comm,
       which requires the child to be an MPI program. */
    rc1 = MPI_Comm_spawn("/bin/hostname", MPI_ARGV_NULL, 1,
                         MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter_comm, arr);
    MPI_Finalize();
    exit(0);
}
######################################################################################
This program hangs as below:
prakash@bmi-xeon1-01:~/thesis/CS/Samples> ./master1
MASTER : spawning a slave ...
bmi-xeon1-01
Any ideas why?
Thanks,
Prakash
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab http://pstl.cs.uh.edu
Department of Computer Science University of Houston
Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335