done...

Jeff Squyres wrote:
Edgar --

Can you file a CMR for v1.2?

On Apr 10, 2008, at 8:10 AM, Edgar Gabriel wrote:
thanks for reporting the bug; it is fixed on the trunk. The problem this
time was not in the algorithm but in the checking of the preconditions.
If recvcount was zero and the rank was not equal to the rank of the root,
then we did not even start the scatter, assuming that there was nothing
to do. For inter-communicators, however, the check has to be extended to
accept recvcount=0 for root=MPI_ROOT. The fix is in the trunk in
rev. 18123.
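
For reference, the corrected precondition is roughly of the following
shape (an illustrative sketch only, with made-up variable names, not the
actual rev. 18123 code):

   /* Illustrative sketch, not the actual trunk code: a zero
    * recvcount may short-circuit the scatter only for non-root
    * ranks.  On an inter-communicator the root passes
    * root == MPI_ROOT and must still take part in the operation
    * even though its recvcount is 0. */
   if (recvcount == 0 && root != MPI_ROOT && rank != root) {
       return MPI_SUCCESS;   /* genuinely nothing to do */
   }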

Thanks
Edgar

Edgar Gabriel wrote:
I don't think that anybody has answered your email so far; I'll have a
look at it on Thursday...

Thanks
Edgar

Audet, Martin wrote:
Hi,

I don't know if it is my sample code or a problem with MPI_Scatter() on an inter-communicator (maybe similar to the problem we found with MPI_Allgather() on an inter-communicator a few weeks ago), but a simple program I wrote freezes during the second iteration of a loop doing an MPI_Scatter() over an inter-communicator.

For example if I compile as follows:

 mpicc -Wall scatter_bug.c -o scatter_bug

I get no errors or warnings. Then if I start it with np=2 as follows:

   mpiexec -n 2 ./scatter_bug

it prints:

  beginning Scatter i_root_group=0
  ending Scatter i_root_group=0
  beginning Scatter i_root_group=1

and then hangs...

Note also that if I change the for loop to execute only the MPI_Scatter() of the second iteration (e.g. replacing "i_root_group=0;" with "i_root_group=1;"), it prints:

   beginning Scatter i_root_group=1

and then hangs...

The problem therefore seems to be related to the second iteration itself.

Please note that this program runs fine with mpich2 1.0.7rc2 (ch3:sock device) for many different numbers of processes (np), whether the executable is run with or without valgrind.

The OpenMPI version I use is 1.2.6rc3 and was configured as follows:

./configure --prefix=/usr/local/openmpi-1.2.6rc3 --disable-mpi-f77 --disable-mpi-f90 --disable-mpi-cxx --disable-cxx-exceptions --with-io-romio-flags=--with-file-system=ufs+nfs

Note also that all processes (when using OpenMPI or mpich2) were started on the same machine.

Also, if you look at the source code, you will notice that some arguments to MPI_Scatter() are NULL or 0. This may look strange and problematic for a normal intra-communicator. However, according to the book "MPI - The Complete Reference", vol. 2 (on MPI-2), for MPI_Scatter() with an inter-communicator:

"The sendbuf, sendcount and sendtype arguments are significant only at the root process. The recvbuf, recvcount, and recvtype arguments are significant only at the processes of the leaf group."

If anyone else could have a look at this program and try it, it would be helpful.

Thanks,

Martin


#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
  int ret_code = 0;
  int comm_size, comm_rank;

  MPI_Init(&argc, &argv);

  MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
  MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);

  if (comm_size > 1) {
     MPI_Comm subcomm, intercomm;
     const int group_id = comm_rank % 2;
     int i_root_group;

     /* split process in two groups:  even and odd comm_ranks. */
     MPI_Comm_split(MPI_COMM_WORLD, group_id, 0, &subcomm);

     /* The remote leader comm_rank for the even and odd groups is
        respectively 1 and 0. */
     MPI_Intercomm_create(subcomm, 0, MPI_COMM_WORLD, 1-group_id, 0,
                          &intercomm);

     /* For i_root_group==0, the process with comm_rank==0 scatters data
        to all processes with odd comm_rank. */
     /* For i_root_group==1, the process with comm_rank==1 scatters data
        to all processes with even comm_rank. */
     for (i_root_group=0; i_root_group < 2; i_root_group++) {
        if (comm_rank == 0) {
printf("beginning Scatter i_root_group=%d \n",i_root_group);
        }
        if (group_id == i_root_group) {
           const int  is_root  = (comm_rank == i_root_group);
           int       *send_buf = NULL;
           if (is_root) {
              const int nbr_other = (comm_size+i_root_group)/2;
              int       ii;
              send_buf = malloc(nbr_other*sizeof(*send_buf));
              for (ii=0; ii < nbr_other; ii++) {
                  send_buf[ii] = ii;
              }
           }
           MPI_Scatter(send_buf, 1, MPI_INT,
                       NULL,     0, MPI_INT,
                       (is_root ? MPI_ROOT : MPI_PROC_NULL), intercomm);

           if (is_root) {
              free(send_buf);
           }
        }
        else {
           int an_int;
           MPI_Scatter(NULL,    0, MPI_INT,
                       &an_int, 1, MPI_INT, 0, intercomm);
        }
        if (comm_rank == 0) {
           printf("ending Scatter i_root_group=%d\n",i_root_group);
        }
     }

     MPI_Comm_free(&intercomm);
     MPI_Comm_free(&subcomm);
  }
  else {
     fprintf(stderr, "%s: error, this program must be started with np > 1\n",
             argv[0]);
     ret_code = 1;
  }

  MPI_Finalize();

  return ret_code;
}

--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335