Re: [OMPI users] OpenFabrics warning

2018-11-12 Thread Andrei Berceanu
Problem solved, thank you!

Best,
Andrei

On Mon, Nov 12, 2018 at 6:33 PM Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:

> Andrei,
>
> you can
>
> mpirun --mca btl ^openib ...
>
> in order to "disable" infiniband
>
>
> Cheers,
>
> Gilles
> On Mon, Nov 12, 2018 at 9:52 AM Andrei Berceanu
>  wrote:
> >
> > The node has an IB card, but it is a stand-alone node, disconnected from
> the rest of the cluster.
> > I am using OMPI to communicate internally between the GPUs of this node
> (and not between nodes).
> > So how can I disable the IB?
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] OpenFabrics warning

2018-11-12 Thread Gilles Gouaillardet
Andrei,

you can

mpirun --mca btl ^openib ...

in order to "disable" infiniband


Cheers,

Gilles
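
For the archive, the suggestion above can be sketched as a couple of equivalent invocations (the application name ./my_cuda_app is hypothetical; vader is the shared-memory BTL in Open MPI 3.x):

```shell
# Exclude the openib BTL via the environment, so every mpirun in this
# session inherits the setting (quote the caret so the shell ignores it):
export OMPI_MCA_btl='^openib'
echo "$OMPI_MCA_btl"

# Per-run equivalents (./my_cuda_app is a hypothetical binary):
#   mpirun --mca btl ^openib -np 2 ./my_cuda_app
#   mpirun --mca btl self,vader -np 2 ./my_cuda_app  # allow only on-node BTLs
```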
On Mon, Nov 12, 2018 at 9:52 AM Andrei Berceanu
 wrote:
>
> The node has an IB card, but it is a stand-alone node, disconnected from the 
> rest of the cluster.
> I am using OMPI to communicate internally between the GPUs of this node (and 
> not between nodes).
> So how can I disable the IB?


Re: [OMPI users] OpenFabrics warning

2018-11-12 Thread Andrei Berceanu
The node has an IB card, but it is a stand-alone node, disconnected from
the rest of the cluster.
I am using OMPI to communicate internally between the GPUs of this node
(and not between nodes).
So how can I disable the IB?

Re: [OMPI users] OpenFabrics warning

2018-11-12 Thread Michael Di Domenico
On Mon, Nov 12, 2018 at 8:08 AM Andrei Berceanu
 wrote:
>
> Running a CUDA+MPI application on a node with 2 K80 GPUs, I get the following 
> warnings:
>
> --
> WARNING: There is at least non-excluded one OpenFabrics device found,
> but there are no active ports detected (or Open MPI was unable to use
> them).  This is most certainly not what you wanted.  Check your
> cables, subnet manager configuration, etc.  The openib BTL will be
> ignored for this job.
>
>   Local host: gpu01
> --
> [gpu01:107262] 1 more process has sent help message help-mpi-btl-openib.txt / 
> no active ports found
> [gpu01:107262] Set MCA parameter "orte_base_help_aggregate" to 0 to see all 
> help / error messages
>
> Any idea of what is going on and how I can fix this?
> I am using OpenMPI 3.1.2.

It looks like Open MPI found an InfiniBand card in the compute node
you're using, but it is not active/usable.

As for a fix, it depends.

If you have an IB card, should it be active?  If so, you'd have to
check the connections (cables, subnet manager) to see why it's down.

If not, you can tell Open MPI to disregard the IB ports, which will
clear the warning, but that might mean you're using a slower
interface for message passing.
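
As a sketch of the first option: on most Linux systems the port state can be inspected with the rdma-core / infiniband-diags tools, if they are installed (device and port names vary per system):

```shell
# Query IB devices and port states. "PORT_ACTIVE" means the link is up;
# "PORT_DOWN" typically points to a cabling or subnet-manager problem.
if command -v ibv_devinfo >/dev/null 2>&1; then
    ibv_devinfo | grep -E 'hca_id|state'
else
    echo "ibv_devinfo not installed"
fi
```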


[OMPI users] OpenFabrics warning

2018-11-12 Thread Andrei Berceanu
Hi all,

Running a CUDA+MPI application on a node with 2 K80 GPUs, I get the
following warnings:

--
WARNING: There is at least non-excluded one OpenFabrics device found,
but there are no active ports detected (or Open MPI was unable to use
them).  This is most certainly not what you wanted.  Check your
cables, subnet manager configuration, etc.  The openib BTL will be
ignored for this job.

  Local host: gpu01
--
[gpu01:107262] 1 more process has sent help message help-mpi-btl-openib.txt
/ no active ports found
[gpu01:107262] Set MCA parameter "orte_base_help_aggregate" to 0 to see all
help / error messages

Any idea of what is going on and how I can fix this?
I am using OpenMPI 3.1.2.

Best,
Andrei