Hi,
Basically, here is how things happen by default:
- each BTL (openib, tcp, sm, ...) has three parameters: {exclusivity,
bandwidth, latency} (see the example after this list for how to check them)
- in order to communicate with another task, all the available BTLs with
the highest exclusivity are used
- in the case of the openib BTL, there is an option to use only the
closest IB port, but I cannot remember whether it is on by default
(tasks running on socket 0 will use the IB port connected to the PCI bus
of socket 0, and tasks running on socket 1 will use the IB port connected to
the PCI bus of socket 1)
- if several BTLs are used, then messages are split across all of them
(e.g., across ports), and in this case, bandwidth/latency are used to
split the messages efficiently
(ideally, if IB is QDR, large-message traffic should be routed 80%
to the IB port and 20% to the 10GbE port)
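As a side note, you can check these values on your install with ompi_info.
For example (a rough sketch; the exact parameter names and the --level
option may vary between Open MPI releases, so check your own output):

    ompi_info --param btl tcp --level 9 | grep -E 'exclusivity|bandwidth|latency'
    ompi_info --param btl openib --level 9 | grep -E 'exclusivity|bandwidth|latency'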
In most IB configurations, two tasks on different nodes are able to
communicate via TCP or IB.
Since the openib BTL has a higher exclusivity than the tcp BTL, only IB
is used.
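If you want to see this for yourself, you can restrict the BTLs
explicitly on the mpirun command line; for example (illustration only,
adjust the BTL list and program name to your setup):

    # force IB only (plus shared memory and self for local peers)
    mpirun --mca btl openib,sm,self ./a.out
    # exclude openib and fall back to tcp
    mpirun --mca btl ^openib ./a.out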
In your configuration, it seems that the openib BTL is also used for the
10GbE port, so you did the right thing by only including the first IB port.
At this stage, I think the RDMA errors are likely a bug, but using the
10GbE port is most likely a feature.
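In the meantime, the workaround you already found is the right one:
explicitly list the IB port, and keep the verbose option handy to
double-check which ports the openib BTL ends up using, for example:

    mpirun --mca btl_openib_if_include mlx4_0:1 --mca btl_base_verbose 30 ./a.out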
I will follow up on the devel mailing list, since we might want to
decrease the exclusivity of the 10GbE port, so it is no longer used by
default in this kind of configuration.
Cheers,
Gilles
On 6/11/2016 12:25 AM, Grodowitz, Nathan T. wrote:
Hello,
We recently ran into an issue with a cluster and dual-port ConnectX-3 cards. We
are using the cards with one port set up for IB and one port set up for 10GbE. We
ran into scaling issues when using the openib BTL where the system tried to run
over the 10GbE port rather than the IB port. This caused lots of RDMA errors
(RDMA_CM_EVENT_ADDR_ERROR) which were somewhat hard to diagnose. We were able
to discover the issue via "--mca btl_base_verbose 30", which showed the ports
being used. From there, we were able to set up our openmpi module to use
btl_openib_if_include "mlx4_0:1" and put Open MPI traffic over the proper
port. There wasn't much documentation on the issue, so I wanted to send it out
to the mailing list.
Also, is there a reason that openib attempts to use the 10GbE interface as
well? Why is this the default behavior? If this sort of
configuration gets more common, it may come up more in the future.
Thank you,
Nathan Grodowitz
ITSD Linux R&D Scientific Platforms
HPC Admin
Office:865-576-4715
Cell:865-347-4247