Hi David:

On Wed, Jul 21, 2010 at 02:10:53PM -0400, David Ronis wrote:
> I've got a mpi program on an 8-core box that runs in a master-slave
> mode.   The slaves calculate something, pass data to the master, and
> then call MPI_Bcast waiting for the master to update and return some
> data via a MPI_Bcast originating on the master.  
> 
> One of the things the master does while the slaves are waiting is to
> make heavy use of fftw3 FFT routines which can support multi-threading.
> However, for threading to make sense, the slaves on same physical
> machine have to give up their CPU usage, and this doesn't seem to be the
> case (top shows them running at close to 100%).  Is there another MPI
> routine that polls for data and then gives up its time-slice? 
> 
> Any other suggestions?

I ran into the same problem some time ago.  My situation seems
similar to yours:
  1. the data in the MPI application has a to-and-fro nature.
  2. I cannot afford an MPI process that consumes 100% cpu 
     while doing nothing.

My solution was to link two extra routines with my (FORTRAN)
application.  These routines intercept mpi_recv and mpi_send, test the
status of the request, and sleep if it is not ready.  The sleep time
grows exponentially: it has a start value, a growth factor, and a
maximum.

I made no source code changes to my application.  When I include these
two routines at link time, the load from the application drops from
2.0 to 1.0.

I use these with OpenMPI-1.2.8.

I have not tried "-mca yield_when_idle 1"; I am not sure whether it
even exists in 1.2.8.
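
(For what it is worth: in the Open MPI builds I have looked at, the
parameter is spelled mpi_yield_when_idle, so if your release has it the
run line would look something like

  mpirun --mca mpi_yield_when_idle 1 -np 8 ./your_app

where the executable name and process count are placeholders.  That
only makes the idle ranks yield the CPU; they still poll, so the
sleep-based wrappers below remain useful.)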

Hope that helps
Douglas.
-- 
  Douglas Guptill                       voice: 902-461-9749
  Research Assistant, LSC 4640          email: douglas.gupt...@dal.ca
  Oceanography Department               fax:   902-494-3877
  Dalhousie University
  Halifax, NS, B3H 4J1, Canada

/*
 * Intercept MPI_Recv, and
 * call PMPI_Irecv, loop over PMPI_Request_get_status and sleep, until done
 *
 * Revision History:
 *  2008-12-17: copied from MPI_Send.c
 *  2008-12-18: tweaking.
 *
 * See MPI_Send.c for additional comments, 
 *  especially w.r.t. PMPI_Request_get_status.
 **/

#include "mpi.h"
#define _POSIX_C_SOURCE 199309 
#include <time.h>

int MPI_Recv(void *buff, int count, MPI_Datatype datatype,
             int from, int tag, MPI_Comm comm, MPI_Status *status) {

  int err, flag, nsec_start = 1000, nsec_max = 100000;
  struct timespec ts;
  MPI_Request req;

  ts.tv_sec  = 0;
  ts.tv_nsec = nsec_start;

  /* Post the receive without blocking, then sleep-and-poll until it
   * completes.  The sleep doubles each pass, capped at nsec_max. */
  err = PMPI_Irecv(buff, count, datatype, from, tag, comm, &req);
  if (err != MPI_SUCCESS) return err;

  do {
    nanosleep(&ts, NULL);
    ts.tv_nsec *= 2;
    ts.tv_nsec = (ts.tv_nsec > nsec_max) ? nsec_max : ts.tv_nsec;
    err = PMPI_Request_get_status(req, &flag, status);
    if (err != MPI_SUCCESS) return err;
  } while (!flag);

  /* status->MPI_ERROR is not reliably set for single completions (and
   * status may be MPI_STATUS_IGNORE), so just report success here. */
  return MPI_SUCCESS;
}
/*
 * Intercept MPI_Send, and
 * call PMPI_Isend, loop over PMPI_Request_get_status and sleep, until done
 *
 * Revision History:
 *  2008-12-12: skeleton by Jeff Squyres <jsquy...@cisco.com>
 *  2008-12-16->18: adding parameters, variable wait, 
 *     change MPI_Test to MPI_Request_get_status
 *      Douglas Guptill <douglas.gupt...@dal.ca>
 **/

/* When we use this:
 *   PMPI_Test(&req, &flag, &status); 
 * we get:
 * dguptill@DOME:$ mpirun -np 2 mpi_send_recv_test_mine
 * This is process            0  of            2 .
 * This is process            1  of            2 .
 * error: proc            0 ,mpi_send returned -1208109376
 * error: proc            1 ,mpi_send returned -1208310080
 *     1 changed to            3
 *
 * Using MPI_Request_get_status cures the problem.
 *
 * A read of mpi21-report.pdf confirms that MPI_Request_get_status
 * is the appropriate choice, since there seems to be something
 * between the call to MPI_SEND (MPI_RECV) in my FORTRAN program
 * and MPI_Send.c (MPI_Recv.c)
 **/


#include "mpi.h"
#define _POSIX_C_SOURCE 199309 
#include <time.h>

int MPI_Send(void *buff, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm) {

  int err, flag, nsec_start = 1000, nsec_max = 100000;
  struct timespec ts;
  MPI_Request req;

  ts.tv_sec  = 0;
  ts.tv_nsec = nsec_start;

  /* Start the send without blocking, then sleep-and-poll as in MPI_Recv. */
  err = PMPI_Isend(buff, count, datatype, dest, tag, comm, &req);
  if (err != MPI_SUCCESS) return err;

  do {
    nanosleep(&ts, NULL);
    ts.tv_nsec *= 2;
    ts.tv_nsec = (ts.tv_nsec > nsec_max) ? nsec_max : ts.tv_nsec;
    /* The completed status is not needed here. */
    err = PMPI_Request_get_status(req, &flag, MPI_STATUS_IGNORE);
    if (err != MPI_SUCCESS) return err;
  } while (!flag);

  return MPI_SUCCESS;
}
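
By the way, the original question was about slaves parked in MPI_Bcast
rather than MPI_Recv/MPI_Send.  The same intercept-and-poll idea only
carries over to MPI_Bcast if the library offers a non-blocking
broadcast (MPI_Ibcast/PMPI_Ibcast, an MPI-3 addition, so not in Open
MPI 1.2.x).  A sketch under that assumption, with the same backoff
constants as above:

/*
 * Sketch only: assumes an MPI-3 library that provides PMPI_Ibcast
 * (not available in Open MPI 1.2.x).
 **/

#define _POSIX_C_SOURCE 199309L
#include <time.h>
#include "mpi.h"

int MPI_Bcast(void *buff, int count, MPI_Datatype datatype,
              int root, MPI_Comm comm) {

  int err, flag, nsec_start = 1000, nsec_max = 100000;
  struct timespec ts;
  MPI_Request req;

  ts.tv_sec  = 0;
  ts.tv_nsec = nsec_start;

  /* Start the broadcast without blocking, then sleep-and-poll. */
  err = PMPI_Ibcast(buff, count, datatype, root, comm, &req);
  if (err != MPI_SUCCESS) return err;

  do {
    nanosleep(&ts, NULL);
    ts.tv_nsec *= 2;
    ts.tv_nsec = (ts.tv_nsec > nsec_max) ? nsec_max : ts.tv_nsec;
    err = PMPI_Request_get_status(req, &flag, MPI_STATUS_IGNORE);
    if (err != MPI_SUCCESS) return err;
  } while (!flag);

  return MPI_SUCCESS;
}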
