I have a very simple program that spawns a number of slave processes, and I am
getting erratic results from it. All the slave processes appear to be spawned,
but not all of them complete MPI_Init() before the master program ends. I also
get the following error messages, for which I haven't been able to find any
documentation:

[turkana:26736] [0,0,0] ORTE_ERROR_LOG: Not found in file base/soh_base_get_proc_soh.c at line 80
[turkana:26736] [0,0,0] ORTE_ERROR_LOG: Not found in file base/oob_base_xcast.c at line 108
[turkana:26736] [0,0,0] ORTE_ERROR_LOG: Not found in file base/rmgr_base_stage_gate.c at line 276
[turkana:26736] [0,0,0] ORTE_ERROR_LOG: Not found in file base/soh_base_get_proc_soh.c at line 80
[turkana:26736] [0,0,0] ORTE_ERROR_LOG: Not found in file base/oob_base_xcast.c at line 108
[turkana:26736] [0,0,0] ORTE_ERROR_LOG: Not found in file base/rmgr_base_stage_gate.c at line 276

I am using Open MPI 1.1 on FC4 (Fedora Core 4) on a dual AMD Athlon machine.

My program is as follows:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include <unistd.h>   /* getpid() */

int
main(int ac, char *av[])
{
   int  rank, size;
   char name[MPI_MAX_PROCESSOR_NAME];
   int  nameLen;
   int  n = 5, i;
   int  slave = 0;
   int  errs[5];
   /* MPI_Comm_spawn's argv excludes the command name (passed separately as av[0]) */
   char *args[] = { "-W", NULL };
   MPI_Comm intercomm;
   int  err;

   memset(name, 0, sizeof(name));   /* value and length arguments were swapped */

   for(i=1; i<ac; i++){
       if (strcmp(av[i],"-W") == 0){
           slave = 1;
       }
   }

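   fprintf(stderr, "%s before MPI_Init() in %d\n", slave?"slave":"master",
           (int)getpid());
   MPI_Init(&ac, &av);
   fprintf(stderr, "%s after MPI_Init() in %d\n", slave?"slave":"master",
           (int)getpid());

   /* fill in the host name printed in the final report */
   MPI_Get_processor_name(name, &nameLen);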

   if (!slave){
       err = MPI_Comm_spawn(av[0], args, n, MPI_INFO_NULL, 0,
                            MPI_COMM_SELF, &intercomm, errs);
       if (err){
           fprintf(stderr, "MPI_Comm_spawn generated error %d.\n", err);
       }
   }
   else {
       fprintf(stderr, "%s before MPI_Comm_get_parent() in %d\n",
               slave?"slave":"master", (int)getpid());
       MPI_Comm_get_parent(&intercomm);
   }

   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &size);

   fprintf(stderr, "%s %d (%s) of %d\n", slave?"slave":"master", rank,
           name, size);

   MPI_Finalize();

   return 0;
}
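In C, memset has the signature memset(void *s, int c, size_t n): destination, fill byte, then byte count. A call written as memset(name, sizeof(name), 0) therefore asks for zero bytes to be written and leaves name untouched. The sketch below contrasts the two argument orders; clear_buffer and clear_buffer_swapped are hypothetical helper names used only for illustration, not part of the program or of MPI.

```c
#include <string.h>

/* Hypothetical helper: correct argument order, i.e. destination,
 * fill byte (0), then the number of bytes to clear. */
static void clear_buffer(char *buf, size_t len)
{
    memset(buf, 0, len);
}

/* Hypothetical helper reproducing the swapped order: the byte count
 * is 0, so memset writes nothing and the buffer keeps its old bytes.
 * Compilers such as GCC may warn about transposed arguments here. */
static void clear_buffer_swapped(char *buf, size_t len)
{
    memset(buf, (int)len, 0);
}
```

With the swapped order the buffer keeps whatever it held before; with the correct order every byte becomes 0, which is what the program needs before printing name as a string.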
