Hi,
today I installed openmpi-1.8.2rc2r32288 on my machines (Solaris 10
Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with
Sun C 5.12 and gcc-4.9.0. Unfortunately I have problems with both
compilers on "Solaris 10 Sparc". My small program works as expected
on "Solaris 10 x86_64" and
Ahsan,
This link might be helpful in trying to diagnose and treat IB fabric issues:
http://docs.oracle.com/cd/E18476_01/doc.220/e18478/fabric.htm#CIHIHJGD
You might try resetting the problematic port, or just use port 2 for your
jobs as a quick workaround:
-mca btl_openib_if_include mlx4_0:2
It seems that the network was not consistently wired.
Port DOWN means that the port is not wired (or has a bad cable). Moreover, on some
nodes port 1 is connected, while on others it is port 2.
My concern is that they are not connected to the same subnet. If you have at
least one port on each node connected to the
It's supposed to, so it sounds like we have a bug in the connection failover
mechanism. I'll address it
On Jul 23, 2014, at 1:21 AM, Timur Ismagilov wrote:
Thanks, Ralph!
When I add --mca oob_tcp_if_include ib0 (where ib0 is the InfiniBand interface
shown by ifconfig) to mpirun, it starts working correctly!
Why doesn't Open MPI do this itself?
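Since the workaround here is to pin the runtime's out-of-band (OOB) TCP traffic to ib0, a small launch wrapper could check that the interface actually exists before building the mpirun command. The following is a minimal Python sketch, not part of Open MPI; the helper name mpirun_argv and the program ./my_app are hypothetical. It only inspects the local host, so nodes where the InfiniBand interface has a different name would still need separate handling.

```python
import shlex
import socket


def mpirun_argv(program, nprocs, oob_if, available=None):
    """Return an mpirun argument list that restricts Open MPI's out-of-band (OOB)
    TCP traffic to oob_if, after checking that the interface exists on this host."""
    if available is None:
        # Names of the local network interfaces, e.g. "lo", "eth0", "ib0"
        available = [name for _, name in socket.if_nameindex()]
    if oob_if not in available:
        raise ValueError(f"interface {oob_if!r} not found; have {sorted(available)}")
    return ["mpirun", "-np", str(nprocs),
            "--mca", "oob_tcp_if_include", oob_if,
            program]


if __name__ == "__main__":
    # Pass the interface list explicitly so the example does not depend on this host.
    argv = mpirun_argv("./my_app", 4, "ib0", available=["lo", "eth0", "ib0"])
    print(shlex.join(argv))  # mpirun -np 4 --mca oob_tcp_if_include ib0 ./my_app
```

Returning an argument list instead of a single string keeps the result safe to hand to subprocess.run without shell quoting issues; shlex.join is used only for display.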
On Tue, 22 Jul 2014 11:26:16 -0700, Ralph Castain wrote:
>Okay, the problem is that the connection back
Hi Josh,
It was my mistake. The status of the node generating the error is pasted below:
Infiniband device 'mlx4_0' port 1 status:
    default gid: fe80::::0018:8b90:97fe:94fe
    base lid: 0x0
    sm lid: 0x0
    state: 1: DOWN
    phys state:
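To survey port states on each node, one could parse ibstatus output like the above, report any port that is not ACTIVE, and turn the ACTIVE ones into a value for the btl_openib_if_include parameter suggested earlier in the thread (which takes a comma-separated list of devices or device:port pairs). This is a minimal Python sketch assuming the header and "state:" line layout shown above; the helper names parse_ibstatus and openib_if_include are hypothetical, and the port 2 section of the sample input is made up for illustration. It would have to be run on each node, since ibstatus only reports local ports.

```python
import re
from dataclasses import dataclass

# Header line that opens each port section, e.g. "Infiniband device 'mlx4_0' port 1 status:"
HEADER_RE = re.compile(r"Infiniband device '([^']+)' port (\d+) status:")
# Logical state line, e.g. "state: 1: DOWN"; the leading anchor keeps "phys state:" from matching
STATE_RE = re.compile(r"^\s*state:\s*\d+:\s*(\w+)")


@dataclass
class PortStatus:
    device: str
    port: int
    state: str  # logical port state, e.g. DOWN, INIT, ACTIVE


def parse_ibstatus(text):
    """Return one PortStatus per port section found in ibstatus output."""
    ports = []
    current = None
    for line in text.splitlines():
        header = HEADER_RE.search(line)
        if header:
            current = PortStatus(header.group(1), int(header.group(2)), "UNKNOWN")
            ports.append(current)
            continue
        state = STATE_RE.match(line)
        if state and current is not None:
            current.state = state.group(1)
    return ports


def openib_if_include(ports):
    """Build a comma-separated device:port list of ACTIVE ports for btl_openib_if_include."""
    return ",".join(f"{p.device}:{p.port}" for p in ports if p.state == "ACTIVE")


# Port 1 mirrors the output above; the port 2 section is hypothetical.
SAMPLE = """\
Infiniband device 'mlx4_0' port 1 status:
    base lid: 0x0
    state: 1: DOWN
Infiniband device 'mlx4_0' port 2 status:
    state: 4: ACTIVE
    phys state: 5: LinkUp
"""

if __name__ == "__main__":
    ports = parse_ibstatus(SAMPLE)
    for p in ports:
        if p.state != "ACTIVE":
            print(f"{p.device} port {p.port} is {p.state}")
    print("-mca btl_openib_if_include " + openib_if_include(ports))
```

For the sample input this reports port 1 as DOWN and prints the same workaround option that was suggested above, -mca btl_openib_if_include mlx4_0:2. Note that a port in INIT state is also excluded, since only ACTIVE ports can carry traffic.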
Dear Pasha,
The ibstatus output is not from two different machines; it is from the same machine.
There are two InfiniBand ports showing up on all nodes. I checked all the
nodes: on each one, one of the ports is always in INIT state and the other is ACTIVE.
Now please see below the ibstatus output of the node causing the problem: