Hi justa tester tester

Is your p2p1 interface an Infiniband port, or is it Ethernet?
If it is Ethernet, try removing "--mca btl_openib_if_include p2p1"
from your mpiexec command line, because it would conflict with
the other mca parameter you chose "--mca btl openib,sm,self".
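
One quick way to check the interface type, assuming a Linux node with sysfs mounted: the kernel exposes each interface's ARP hardware type in /sys/class/net/<iface>/type, where 1 means Ethernet and 32 means InfiniBand. The helper names below (classify_arphrd, iface_kind) are made up for this sketch; only the interface name p2p1 comes from your command line.

```shell
# Map a Linux ARP hardware type number to a readable label.
# 1 = Ethernet (ARPHRD_ETHER), 32 = InfiniBand (ARPHRD_INFINIBAND).
classify_arphrd() {
    case "$1" in
        1)  echo "ethernet" ;;
        32) echo "infiniband" ;;
        *)  echo "other" ;;
    esac
}

# Hypothetical helper: report the kind of a named network interface,
# or "missing" if the interface has no readable sysfs entry on this host.
iface_kind() {
    type_file="/sys/class/net/$1/type"
    if [ -r "$type_file" ]; then
        classify_arphrd "$(cat "$type_file")"
    else
        echo "missing"
    fi
}

# p2p1 is the interface name taken from the original mpirun command line.
iface_kind p2p1
```

If this prints "ethernet", p2p1 is not an InfiniBand port. Note also that, as far as I know, btl_openib_if_include expects OpenFabrics device names (the kind of name ibv_devices lists) rather than Linux network interface names, which would explain the "nonexistent" warning.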

Simpler may be better: Have you tried to use just
"--mca btl openib,sm,self" ?
This way OMPI will find the Infiniband interface(s) for you.
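
As a sketch, the simplified command line could look like the one below. It keeps the process count, hostfile, benchmark path, and benchmark arguments from your original command and drops the other openib-specific mca parameters. The variable name mpirun_args is only for illustration, and the actual launch line is left commented out.

```shell
# Same benchmark invocation as in the quoted message, but with only
# "--mca btl openib,sm,self", so Open MPI selects the port itself.
# The hostfile name and IMB path are copied from the original post.
mpirun_args="--mca btl openib,sm,self -np 2 -hostfile hosts ./3.2.4/src/IMB-MPI1 -npmin 2 -iter 10 PingPong"

# Print the command for inspection before running it.
echo "mpirun $mpirun_args"

# To actually launch the job, uncomment the next line:
# mpirun $mpirun_args
```

If this works, the other parameters can be added back one at a time to see which one causes the failure.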

Just a guess,
Gus Correa

On 09/18/2013 01:49 PM, justa tester tester wrote:

I'm new to Open MPI and have a question about an error I'm seeing.
I compiled the OFED stack (version 3.5) to enable RDMA, built Open MPI
with the OFED install path specified, and installed the Intel MPI
Benchmarks.  I was able to run over TCP, but runs using openib do not
complete successfully; we see the error below:

[root@dhcp-8-168 imb]# mpirun --mca btl openib,sm,self --mca
btl_openib_cpc_include rdmacm --mca btl_openib_if_include p2p1  --mca
btl_openib_verbose 2 -np 2 -hostfile hosts ./3.2.4/src/IMB-MPI1 -npmin 2
-iter 10 PingPong
WARNING: One or more nonexistent OpenFabrics devices/ports were

   Host:                 dhcp-8-168
   MCA parameter:        mca_btl_if_include
   Nonexistent entities: p2p1

These entities will be ignored.  You can disable this warning by
setting the btl_openib_warn_nonexistent_if MCA parameter to 0.
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

   Process 1 ([[60771,1],0]) is on host: dhcp-8-168
   Process 2 ([[60771,1],1]) is on host: 169
   BTLs attempted: self sm

Your MPI job is now going to abort; sorry.
MPI_INIT has failed because at least one MPI process is unreachable
from another.  This *usually* means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used.  Your MPI job will now abort.

You may wish to try to narrow down the problem;

  * Check the output of ompi_info to see which BTL/MTL plugins are
  * Run your application with MPI_THREAD_SINGLE.
  * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
    if using MTL-based communications) to see exactly which
    communication plugins were considered and/or discarded.
[dhcp-8-168:3503] *** An error occurred in MPI_Init
[dhcp-8-168:3503] *** on a NULL communicator
[dhcp-8-168:3503] *** Unknown error
[dhcp-8-168:3503] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
An MPI process is aborting at a time when it cannot guarantee that all
of its peer processes in the job will be killed properly.  You should
double check that everything has shut down cleanly.

   Reason:     Before MPI_INIT completed
   Local host: dhcp-8-168
   PID:        3503
mpirun has exited due to process rank 0 with PID 3503 on
node dhcp-8-168 exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
[dhcp-8-168:03501] 1 more process has sent help message
help-mpi-btl-openib.txt / nonexistent port
[dhcp-8-168:03501] Set MCA parameter "orte_base_help_aggregate" to 0 to
see all help / error messages
[dhcp-8-168:03501] 1 more process has sent help message
help-mca-bml-r2.txt / unreachable proc
[dhcp-8-168:03501] 1 more process has sent help message help-mpi-runtime
/ mpi_init:startup:pml-add-procs-fail
[dhcp-8-168:03501] 1 more process has sent help message
help-mpi-errors.txt / mpi_errors_are_fatal unknown handle
[dhcp-8-168:03501] 1 more process has sent help message
help-mpi-runtime.txt / ompi mpi abort:cannot guarantee all killed

