Re: [OMPI users] Simple MPI_Comm_spawn program hangs

2007-12-06 Thread Prakash Velayutham

To add more info, here is a backtrace of the spawned (hung) program.

(gdb) bt
#0  0xe410 in __kernel_vsyscall ()
#1  0x402cdaec in sched_yield () from /lib/tls/libc.so.6
#2  0x4016360c in opal_progress () at runtime/opal_progress.c:301
#3  0x403a9b29 in mca_oob_tcp_msg_wait (msg=0x805cc70, rc=0xbfffba40)
    at oob_tcp_msg.c:108
#4  0x403b09a5 in mca_oob_tcp_recv (peer=0xbfffbba8, iov=0xbfffba88,
    count=1, tag=0, flags=4) at oob_tcp_recv.c:138
#5  0x40119420 in mca_oob_recv_packed (peer=0xbfffbba8, buf=0x821b200,
    tag=0) at base/oob_base_recv.c:69
#6  0x4003c28b in ompi_comm_allreduce_intra_oob (inbuf=0xbfffbb48,
    outbuf=0xbfffbb44, count=1, op=0x400d14a0, comm=0x8049d38,
    bridgecomm=0x0, lleader=0xbfffbc04, rleader=0xbfffbba8, send_first=1)
    at communicator/comm_cid.c:674
#7  0x4003adf2 in ompi_comm_nextcid (newcomm=0x807c4f8, comm=0x8049d38,
    bridgecomm=0x0, local_leader=0xbfffbc04, remote_leader=0xbfffbba8,
    mode=256, send_first=1) at communicator/comm_cid.c:176
#8  0x4003cc2c in ompi_comm_connect_accept (comm=0x8049d38, root=0,
    port=0x807a5c0, send_first=1, newcomm=0xbfffbc28, tag=2000)
    at communicator/comm_dyn.c:208
#9  0x4003ec97 in ompi_comm_dyn_init () at communicator/comm_dyn.c:668
#10 0x4005465a in ompi_mpi_init (argc=1, argv=0xbfffbf64, requested=0,
    provided=0xbfffbd14) at runtime/ompi_mpi_init.c:704
#11 0x40090367 in PMPI_Init (argc=0xbfffbee0, argv=0xbfffbee4)
    at pinit.c:71
#12 0x08048983 in main (argc=1, argv=0xbfffbf64) at slave.c:43
(gdb)


Prakash



Re: [OMPI users] Simple MPI_Comm_spawn program hangs

2007-12-06 Thread Prakash Velayutham

Hi Edgar,

I changed the spawned program from /bin/hostname to a very simple MPI
program, shown below. But now the slave hangs right at the MPI_Init line.
What could the issue be?


slave.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>      /* sleep(), gethostname()      */
#include "mpi.h"
#include <sys/types.h>   /* standard system types       */
#include <netinet/in.h>  /* Internet address structures */
#include <sys/socket.h>  /* socket interface functions  */
#include <netdb.h>       /* host to IP resolution       */

int gdb_var;

int
main(int argc, char **argv)
{
    int tag = 0;
    int my_rank;
    int num_proc;
    MPI_Status status;
    MPI_Comm inter_comm;
    char hostname[64];
    FILE *f;

    /* Spin until a debugger attaches and sets gdb_var to a nonzero value. */
    gdb_var = 0;
    while (0 == gdb_var) sleep(5);
    gethostname(hostname, 64);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_proc);

    /* Fetch the intercommunicator to the spawning (parent) job. */
    MPI_Comm_get_parent(&inter_comm);

    MPI_Finalize();
    exit(0);
}

Thanks,
Prakash






Re: [OMPI users] Simple MPI_Comm_spawn program hangs

2007-12-02 Thread Edgar Gabriel
MPI_Comm_spawn is tested nightly by our test suites, so it should
definitely work...


Thanks
Edgar

--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524  Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335


Re: [OMPI users] Simple MPI_Comm_spawn program hangs

2007-12-01 Thread Prakash Velayutham

Thanks Edgar. I did not know that. Really?

Anyway, are you sure an MPI job will work as a spawned process
instead of "hostname"?


Thanks,
Prakash






Re: [OMPI users] Simple MPI_Comm_spawn program hangs

2007-12-01 Thread Edgar Gabriel
MPI_Comm_spawn has to build an intercommunicator with the child process
that it spawns. Thus, you cannot spawn a non-MPI job such as
/bin/hostname, since the parent process waits for some messages from the
child process(es) in order to set up the intercommunicator.


Thanks
Edgar



--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524  Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335
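
Editor's note: the blocking behaviour described in the message above can be imitated without MPI. The sketch below is only an analogy, not Open MPI's implementation: a POSIX pipe stands in for the startup messaging, and the function name wait_for_child_handshake and its parameters are hypothetical. The parent forks a child and waits for a one-byte "hello". A child that checks in (like an MPI program calling MPI_Init) lets the parent continue. A child that just runs and exits (like /bin/hostname) leaves the parent waiting. A timeout is used here only so the demonstration terminates.

```c
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Analogy only: fork a child and wait up to timeout_ms for its "hello".
 * child_sends_hello == 1 models a child that checks in with the parent;
 * child_sends_hello == 0 models a program that never does.
 * Returns 1 if the hello arrived, 0 on timeout, -1 on error. */
int wait_for_child_handshake(int child_sends_hello, int timeout_ms)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: optionally announce itself, then exit. */
        close(fds[0]);
        if (child_sends_hello) {
            char hello = 'h';
            if (write(fds[1], &hello, 1) != 1)
                _exit(1);
        }
        _exit(0);
    }

    /* Parent: it deliberately keeps its own write end open, so the
     * child's exit does not show up as end-of-file. Nothing tells the
     * parent that the child is gone; it can only wait for data. */
    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);

    int result;
    if (ready < 0)
        result = -1;
    else if (ready > 0 && (pfd.revents & POLLIN))
        result = 1;
    else
        result = 0;   /* timed out: no hello ever arrived */

    close(fds[0]);
    close(fds[1]);
    waitpid(pid, NULL, 0);
    return result;
}
```

Calling wait_for_child_handshake(1, 2000) returns 1 as soon as the child's byte arrives, while wait_for_child_handshake(0, 200) returns 0 only after the 200 ms timeout expires. Without a timeout, the second case would wait indefinitely, which is the kind of hang reported in this thread.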


[OMPI users] Simple MPI_Comm_spawn program hangs

2007-12-01 Thread Prakash Velayutham

Hello,

Open MPI 1.2.4

I am trying to run a simple C program.

##

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int
main(int argc, char **argv)
{
    int tag = 0;
    int my_rank;
    int num_proc;
    char message_0[] = "hello slave, i'm your master";
    char message_1[50];
    char master_data[] = "slaves to work";
    int array_of_errcodes[10];
    int num;
    MPI_Status status;
    MPI_Comm inter_comm;
    MPI_Info info;
    int arr[1];
    int rc1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_proc);

    printf("MASTER : spawning a slave ... \n");
    /* Spawn one child process; arr receives its per-process error code. */
    rc1 = MPI_Comm_spawn("/bin/hostname", MPI_ARGV_NULL, 1,
                         MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter_comm, arr);

    MPI_Finalize();
    exit(0);
}

##


This program hangs as below:

prakash@bmi-xeon1-01:~/thesis/CS/Samples> ./master1
MASTER : spawning a slave ...
bmi-xeon1-01

Any ideas why?

Thanks,
Prakash