To add more info, here is a backtrace of the spawned (hung) program.

(gdb) bt
#0  0xffffe410 in __kernel_vsyscall ()
#1  0x402cdaec in sched_yield () from /lib/tls/libc.so.6
#2  0x4016360c in opal_progress () at runtime/opal_progress.c:301
#3  0x403a9b29 in mca_oob_tcp_msg_wait (msg=0x805cc70, rc=0xbfffba40) at oob_tcp_msg.c:108
#4  0x403b09a5 in mca_oob_tcp_recv (peer=0xbfffbba8, iov=0xbfffba88, count=1, tag=0, flags=4)
    at oob_tcp_recv.c:138
#5  0x40119420 in mca_oob_recv_packed (peer=0xbfffbba8, buf=0x821b200, tag=0)
    at base/oob_base_recv.c:69
#6  0x4003c28b in ompi_comm_allreduce_intra_oob (inbuf=0xbfffbb48, outbuf=0xbfffbb44, count=1,
    op=0x400d14a0, comm=0x8049d38, bridgecomm=0x0, lleader=0xbfffbc04, rleader=0xbfffbba8,
    send_first=1) at communicator/comm_cid.c:674
#7  0x4003adf2 in ompi_comm_nextcid (newcomm=0x807c4f8, comm=0x8049d38, bridgecomm=0x0,
    local_leader=0xbfffbc04, remote_leader=0xbfffbba8, mode=256, send_first=1)
    at communicator/comm_cid.c:176
#8  0x4003cc2c in ompi_comm_connect_accept (comm=0x8049d38, root=0, port=0x807a5c0, send_first=1,
    newcomm=0xbfffbc28, tag=2000) at communicator/comm_dyn.c:208
#9  0x4003ec97 in ompi_comm_dyn_init () at communicator/comm_dyn.c:668
#10 0x4005465a in ompi_mpi_init (argc=1, argv=0xbfffbf64, requested=0, provided=0xbfffbd14)
    at runtime/ompi_mpi_init.c:704
#11 0x40090367 in PMPI_Init (argc=0xbfffbee0, argv=0xbfffbee4) at pinit.c:71
#12 0x08048983 in main (argc=1, argv=0xbfffbf64) at slave.c:43
(gdb)


Prakash


On Dec 6, 2007, at 12:08 AM, Prakash Velayutham wrote:

Hi Edgar,

I changed the spawned program from /bin/hostname to a very simple MPI
program, shown below. But now the slave hangs right at the MPI_Init
line. What could the issue be?

slave.c

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>        /* sleep(), gethostname()      */
#include "mpi.h"
#include <sys/types.h>     /* standard system types       */
#include <netinet/in.h>    /* Internet address structures */
#include <sys/socket.h>    /* socket interface functions  */
#include <netdb.h>         /* host to IP resolution       */

volatile int gdb_var;      /* set from gdb to break out of the wait loop */

int
main(int argc, char **argv)
{
        int             tag = 0;
        int             my_rank;
        int             num_proc;
        MPI_Status      status;
        MPI_Comm        inter_comm;
        char            hostname[64];
        FILE            *f;

        gdb_var = 0;
        gethostname(hostname, sizeof(hostname));

        /* Spin here until a debugger attaches and sets gdb_var = 1 */
        while (0 == gdb_var) sleep(5);

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
        MPI_Comm_size(MPI_COMM_WORLD, &num_proc);

        MPI_Comm_get_parent(&inter_comm);

        MPI_Finalize();
        return 0;
}

Thanks,
Prakash


On Dec 2, 2007, at 8:36 PM, Edgar Gabriel wrote:

MPI_Comm_spawn is tested nightly by our test suites, so it should
definitely work...

Thanks
Edgar

Prakash Velayutham wrote:
Thanks Edgar. I did not know that. Really?

Anyway, are you sure an MPI job will work as a spawned process
instead of "hostname"?

Thanks,
Prakash


On Dec 1, 2007, at 5:56 PM, Edgar Gabriel wrote:

MPI_Comm_spawn has to build an intercommunicator with the child
process that it spawns. Thus, you cannot spawn a non-MPI job such as
/bin/hostname, since the parent process waits for some messages from
the child process(es) in order to set up the intercommunicator.

Thanks
Edgar

Prakash Velayutham wrote:
Hello,

Open MPI 1.2.4

I am trying to run a simple C program.

######################################################################################

#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include "mpi.h"

int
main(int argc, char **argv)
{

      int             tag = 0;
      int             my_rank;
      int             num_proc;
      char            message_0[] = "hello slave, i'm your master";
      char            message_1[50];
      char            master_data[] = "slaves to work";
      int             array_of_errcodes[10];
      int             num;
      MPI_Status      status;
      MPI_Comm        inter_comm;
      MPI_Info        info;
      int             arr[1];
      int             rc1;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
      MPI_Comm_size(MPI_COMM_WORLD, &num_proc);

      printf("MASTER : spawning a slave ... \n");
      rc1 = MPI_Comm_spawn("/bin/hostname", MPI_ARGV_NULL, 1,
                           MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter_comm, arr);

      MPI_Finalize();
      return 0;
}

######################################################################################
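For reference, a typical way to build and launch programs like the two listings in this thread with Open MPI's wrapper compiler and launcher (the file names master1.c and slave.c are assumed from the thread, not confirmed):

```shell
# Build both programs with debugging symbols (assumed file names).
mpicc -g -o master1 master1.c
mpicc -g -o slave slave.c

# Launch only the master; it spawns the child itself via MPI_Comm_spawn.
mpirun -np 1 ./master1
```

Note that, per Edgar's point above, the first argument to MPI_Comm_spawn must name an MPI program (e.g. ./slave), not a plain executable like /bin/hostname.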


This program hangs as below:

prakash@bmi-xeon1-01:~/thesis/CS/Samples> ./master1
MASTER : spawning a slave ...
bmi-xeon1-01

Any ideas why?

Thanks,
Prakash
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335
