Hi,

I have installed openmpi-master-201702010209-6cb484a on my "SUSE Linux
Enterprise Server 12.2 (x86_64)" with Sun C 5.14 and gcc-6.3.0.
Unfortunately, I get errors when I run my spawn programs.


loki spawn 107 mpiexec -np 1 --host loki,loki,nfs1 spawn_intra_comm
Parent process 0: I create 2 slave processes
[nfs1:27716] PMIX ERROR: ERROR in file ../../../../../../../openmpi-master-201702010209-6cb484a/opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.c at line 1029
[nfs1:27716] PMIX ERROR: ERROR in file ../../../../../../../openmpi-master-201702010209-6cb484a/opal/mca/pmix/pmix2x/pmix/src/server/pmix_server_get.c at line 501
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[42193,2],1]) is on host: nfs1
  Process 2 ([[42193,1],0]) is on host: unknown!
  BTLs attempted: self tcp

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
[nfs1:27727] [[42193,2],1] ORTE_ERROR_LOG: Unreachable in file ../../openmpi-master-201702010209-6cb484a/ompi/dpm/dpm.c at line 426
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_dpm_dyn_init() failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
[nfs1:27727] *** An error occurred in MPI_Init
[nfs1:27727] *** reported by process [2765160450,1]
[nfs1:27727] *** on a NULL communicator
[nfs1:27727] *** Unknown error
[nfs1:27727] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[nfs1:27727] ***    and potentially your MPI job)
loki spawn 108



I used the following commands to build and install the package.
${SYSTEM_ENV} is "Linux" and ${MACHINE_ENV} is "x86_64" on my
Linux machine. The options "--enable-mpi-cxx-bindings" and
"--enable-mpi-thread-multiple" are now unrecognized; presumably
these features are enabled automatically nowadays (a small
runtime check for the thread level is sketched after the build
commands below). "configure" also prints a warning that it asks
me to report.


mkdir openmpi-master-201702010209-6cb484a-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
cd openmpi-master-201702010209-6cb484a-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc

../openmpi-master-201702010209-6cb484a/configure \
  --prefix=/usr/local/openmpi-master_64_cc \
  --libdir=/usr/local/openmpi-master_64_cc/lib64 \
  --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \
  --with-jdk-headers=/usr/local/jdk1.8.0_66/include \
  JAVA_HOME=/usr/local/jdk1.8.0_66 \
  LDFLAGS="-m64 -mt -Wl,-z -Wl,noexecstack" CC="cc" CXX="CC" FC="f95" \
  CFLAGS="-m64 -mt" CXXFLAGS="-m64" FCFLAGS="-m64" \
  CPP="cpp" CXXCPP="cpp" \
  --enable-mpi-cxx \
  --enable-mpi-cxx-bindings \
  --enable-cxx-exceptions \
  --enable-mpi-java \
  --enable-mpi-thread-multiple \
  --with-hwloc=internal \
  --without-verbs \
  --with-wrapper-cflags="-m64 -mt" \
  --with-wrapper-cxxflags="-m64" \
  --with-wrapper-fcflags="-m64" \
  --with-wrapper-ldflags="-mt" \
  --enable-debug \
  |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc

make |& tee log.make.$SYSTEM_ENV.$MACHINE_ENV.64_cc
rm -r /usr/local/openmpi-master_64_cc.old
mv /usr/local/openmpi-master_64_cc /usr/local/openmpi-master_64_cc.old
make install |& tee log.make-install.$SYSTEM_ENV.$MACHINE_ENV.64_cc
make check |& tee log.make-check.$SYSTEM_ENV.$MACHINE_ENV.64_cc
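
Since "--enable-mpi-thread-multiple" is no longer recognized, I can only
check at run time which thread level the installed library really grants.
A minimal sketch of such a check (this little helper and its name are my
own, it is not one of my spawn programs):

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

/* Request MPI_THREAD_MULTIPLE and print the thread level that the
 * library actually provides.
 */
int main (int argc, char *argv[])
{
  int provided;

  MPI_Init_thread (&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  printf ("requested thread level: %d, provided thread level: %d\n",
          MPI_THREAD_MULTIPLE, provided);
  MPI_Finalize ();
  return EXIT_SUCCESS;
}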



...
checking numaif.h usability... no
checking numaif.h presence... yes
configure: WARNING: numaif.h: present but cannot be compiled
configure: WARNING: numaif.h:     check for missing prerequisite headers?
configure: WARNING: numaif.h: see the Autoconf documentation
configure: WARNING: numaif.h:     section "Present But Cannot Be Compiled"
configure: WARNING: numaif.h: proceeding with the compiler's result
configure: WARNING:     ## ------------------------------------------------------ ##
configure: WARNING:     ## Report this to http://www.open-mpi.org/community/help/ ##
configure: WARNING:     ## ------------------------------------------------------ ##
checking for numaif.h... no
...




I get the following errors if I run "spawn_master" or "spawn_multiple_master"
(their sources are not attached; a simplified sketch of what the parent side
does follows after the output below).

loki spawn 108 mpiexec -np 1 --host loki,loki,loki,nfs1,nfs1 spawn_master

Parent process 0 running on loki
  I create 4 slave processes

[nfs1:29189] *** Process received signal ***
[nfs1:29189] Signal: Aborted (6)
[nfs1:29189] Signal code:  (-6)
[nfs1:29189] PMIX ERROR: ERROR in file ../../../../../../../openmpi-master-201702010209-6cb484a/opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.c at line 1029
[nfs1:29189] PMIX ERROR: ERROR in file ../../../../../../../openmpi-master-201702010209-6cb484a/opal/mca/pmix/pmix2x/pmix/src/server/pmix_server_get.c at line 501
[nfs1:29189] PMIX ERROR: ERROR in file ../../../../../../../openmpi-master-201702010209-6cb484a/opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.c at line 1029
[nfs1:29189] PMIX ERROR: ERROR in file ../../../../../../../openmpi-master-201702010209-6cb484a/opal/mca/pmix/pmix2x/pmix/src/server/pmix_server_get.c at line 501
Warning :: pmix_list_remove_item - the item 0x7f03e001b5b0 is not on the list 0x7f03e8760fc8
orted: ../../../../../../../openmpi-master-201702010209-6cb484a/opal/mca/pmix/pmix2x/pmix/src/server/pmix_server_get.c:587: pmix_pending_resolve: Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) == ((pmix_object_t *) (ptr))->obj_magic_id' failed.
[nfs1:29189] [ 0] /lib64/libpthread.so.0(+0xf870)[0x7f03eaca5870]
[nfs1:29189] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x7f03ea9230c7]
[nfs1:29189] [ 2] /lib64/libc.so.6(abort+0x118)[0x7f03ea924478]
[nfs1:29189] [ 3] /lib64/libc.so.6(+0x2e146)[0x7f03ea91c146]
[nfs1:29189] [ 4] /lib64/libc.so.6(+0x2e1f2)[0x7f03ea91c1f2]
[nfs1:29189] [ 5] /usr/local/openmpi-master_64_cc/lib64/openmpi/mca_pmix_pmix2x.so(pmix_pending_resolve+0x2bc)[0x7f03e8382fbc]
[nfs1:29189] [ 6] /usr/local/openmpi-master_64_cc/lib64/openmpi/mca_pmix_pmix2x.so(+0x1557b9)[0x7f03e83837b9]
[nfs1:29189] [ 7] /usr/local/openmpi-master_64_cc/lib64/libopen-pal.so.0(+0x270d2b)[0x7f03ec065d2b]
[nfs1:29189] [ 8] /usr/local/openmpi-master_64_cc/lib64/libopen-pal.so.0(+0x27106a)[0x7f03ec06606a]
[nfs1:29189] [ 9] /usr/local/openmpi-master_64_cc/lib64/libopen-pal.so.0(opal_libevent2022_event_base_loop+0x2d9)[0x7f03ec0669b9]
[nfs1:29189] [10] /usr/local/openmpi-master_64_cc/lib64/openmpi/mca_pmix_pmix2x.so(+0x1e1dc4)[0x7f03e840fdc4]
[nfs1:29189] [11] /lib64/libpthread.so.0(+0x80a4)[0x7f03eac9e0a4]
[nfs1:29189] [12] /lib64/libc.so.6(clone+0x6d)[0x7f03ea9d302d]
[nfs1:29189] *** End of error message ***
Abort
--------------------------------------------------------------------------
ORTE has lost communication with its daemon located on node:

  hostname:  nfs1

This is usually due to either a failure of the TCP network
connection to the node, or possibly an internal failure of
the daemon itself. We cannot recover from this failure, and
therefore will terminate the job.
--------------------------------------------------------------------------
loki spawn 109
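
The sources of "spawn_master" and "spawn_multiple_master" are not attached
here. Roughly, the parent side does nothing more than spawn a fixed number
of children of a separate slave executable, along the lines of the following
simplified sketch (the name "spawn_slave" and the other details are only
placeholders, not the exact program):

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define NUM_SLAVES 4                    /* number of child processes    */
#define SLAVE_PROG "spawn_slave"        /* placeholder slave executable */

int main (int argc, char *argv[])
{
  MPI_Comm COMM_CHILD_PROCESSES;        /* inter-communicator           */
  int      mytid,                       /* my task id in MPI_COMM_WORLD */
           namelen;                     /* length of processor name     */
  char     processor_name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
  MPI_Get_processor_name (processor_name, &namelen);
  if (mytid == 0)
  {
    printf ("Parent process %d running on %s\n"
            "  I create %d slave processes\n",
            mytid, processor_name, NUM_SLAVES);
  }
  /* all parent processes take part in the collective spawn; the root
   * (rank 0) actually starts the children
   */
  MPI_Comm_spawn (SLAVE_PROG, MPI_ARGV_NULL, NUM_SLAVES, MPI_INFO_NULL,
                  0, MPI_COMM_WORLD, &COMM_CHILD_PROCESSES,
                  MPI_ERRCODES_IGNORE);
  MPI_Finalize ();
  return EXIT_SUCCESS;
}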



I would be grateful if somebody could fix these problems. Do you need
anything else from me? Thank you very much in advance for any help.


Kind regards

Siegmar
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define NUM_SLAVES	2		/* create NUM_SLAVES processes	*/


int main (int argc, char *argv[])
{
  MPI_Comm COMM_ALL_PROCESSES,		/* intra-communicator		*/
	   COMM_CHILD_PROCESSES,	/* inter-communicator		*/
	   COMM_PARENT_PROCESSES;	/* inter-communicator		*/
  int	   ntasks_world,		/* # of tasks in MPI_COMM_WORLD	*/
	   ntasks_local,		/* COMM_CHILD_PROCESSES local	*/
	   ntasks_remote,		/* COMM_CHILD_PROCESSES remote	*/
	   ntasks_all,			/* tasks in COMM_ALL_PROCESSES	*/
	   mytid_world,			/* my task id in MPI_COMM_WORLD	*/
	   mytid_all,			/* id in COMM_ALL_PROCESSES	*/
	   namelen;			/* length of processor name	*/
  char	   processor_name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &mytid_world);
  /* First we must decide whether this program is executed by a parent
   * or a child process, because only a parent is allowed to spawn
   * child processes (otherwise the child process with rank 0 would
   * itself spawn child processes, and so on). "MPI_Comm_get_parent ()"
   * returns the parent inter-communicator for a spawned MPI rank and
   * MPI_COMM_NULL if the process wasn't spawned, i.e., it was started
   * statically via "mpiexec" on the command line.
   */
  MPI_Comm_get_parent (&COMM_PARENT_PROCESSES);
  if (COMM_PARENT_PROCESSES == MPI_COMM_NULL)
  {
    /* All parent processes must call "MPI_Comm_spawn ()" but only
     * the root process (in our case the process with rank 0) will
     * spawn child processes. All other processes of the
     * intra-communicator (in our case MPI_COMM_WORLD) will ignore
     * the values of all arguments before the "root" parameter.
     */
    if (mytid_world == 0)
    {
      printf ("Parent process 0: I create %d slave processes\n",
	      NUM_SLAVES);
    }
    MPI_Comm_spawn (argv[0], MPI_ARGV_NULL, NUM_SLAVES,
		    MPI_INFO_NULL, 0, MPI_COMM_WORLD,
		    &COMM_CHILD_PROCESSES, MPI_ERRCODES_IGNORE);
  }
  /* Merge all processes into one intra-communicator. The "high" flag
   * determines the order of the processes in the intra-communicator.
   * If parent and child processes use the same flag, the order may
   * be arbitrary; otherwise the processes with "high == 0" will have
   * a lower rank than the processes with "high == 1".
   */
  if (COMM_PARENT_PROCESSES == MPI_COMM_NULL)
  {
    /* parent processes							*/
    MPI_Intercomm_merge (COMM_CHILD_PROCESSES, 0, &COMM_ALL_PROCESSES);
  }
  else
  {
    /* spawned child processes						*/
    MPI_Intercomm_merge (COMM_PARENT_PROCESSES, 1, &COMM_ALL_PROCESSES);
  }
  MPI_Comm_size	(MPI_COMM_WORLD, &ntasks_world);
  MPI_Comm_size (COMM_ALL_PROCESSES, &ntasks_all);
  MPI_Comm_rank (COMM_ALL_PROCESSES, &mytid_all);
  MPI_Get_processor_name (processor_name, &namelen);
  /* With the following printf statement every process executing this
   * code will print some lines on the display. The lines may get
   * mixed up because the display is a critical section. In general
   * only one process (usually the process with rank 0) prints on the
   * display and all other processes send their messages to this
   * process. Nevertheless, for debugging purposes (or to demonstrate
   * that it is possible) it may be useful if every process prints
   * for itself.
   */
  if (COMM_PARENT_PROCESSES == MPI_COMM_NULL)
  {
    MPI_Comm_size	 (COMM_CHILD_PROCESSES, &ntasks_local);
    MPI_Comm_remote_size (COMM_CHILD_PROCESSES, &ntasks_remote);
    printf ("\nParent process %d running on %s\n"
	    "    MPI_COMM_WORLD ntasks:              %d\n"
	    "    COMM_CHILD_PROCESSES ntasks_local:  %d\n"
	    "    COMM_CHILD_PROCESSES ntasks_remote: %d\n"
	    "    COMM_ALL_PROCESSES ntasks:          %d\n"
	    "    mytid in COMM_ALL_PROCESSES:        %d\n",
	    mytid_world, processor_name, ntasks_world, ntasks_local,
	    ntasks_remote, ntasks_all, mytid_all);
  }
  else
  {
    printf ("\nChild process %d running on %s\n"
	    "    MPI_COMM_WORLD ntasks:              %d\n"
	    "    COMM_ALL_PROCESSES ntasks:          %d\n"
	    "    mytid in COMM_ALL_PROCESSES:        %d\n",
	    mytid_world, processor_name, ntasks_world, ntasks_all,
	    mytid_all);
  }
  MPI_Finalize ();
  return EXIT_SUCCESS;
}