Please see inline (marked with "Pasha >").

From: users <users-boun...@open-mpi.org> on behalf of John Marshall <john.marsh...@ssc-spc.gc.ca>
Reply-To: Open Users <us...@open-mpi.org>
List-Post: users@lists.open-mpi.org
Date: Monday, October 19, 2015 11:06 AM
To: Open Users <us...@open-mpi.org>
Subject: Re: [OMPI users] openib issue with 1.6.5 but not later releases

Further efforts have shown that if we add:

    export OMPI_MCA_btl_openib_if_include=<device>

where device corresponds to the IB interface (e.g., mlx4_14), then our test does not fail (yet, anyways).

Pasha > This is a pretty clear indicator that each container sees more than a single device. Can you run ibv_devinfo -V within container and see what happens?

So, is this setting required if there are multiple IB interfaces (as when there are multiple eth interfaces)? What is curious is that there is only one interface visible from the container. Does the openib btl look deeper and find all that exist in the host?

Pasha > Not really. We use Verbs driver to fetch the list of devices on the "node".

Is there something about the openib implementations in 1.8 and 1.10 that may handle this differently since we do not set OMPI_MCA_btl_openib_if_include but our tests seem to work? Or, is it a fluke?

Pasha > I was not involved that much in 1.8 and 1.10 so it is a bit hard to comment. I would suspect that this might be somehow related to the locality feature and openib btl selects and creates only one btl instance and ignores all the rest.

Best,
Pasha
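
For reference, a minimal sketch of the workaround discussed in this thread, assuming the device name (mlx4_14 here, taken from the example above) is already known. The helper name select_ib_device and the ./a.out placeholder are hypothetical; the device listing is skipped if ibv_devinfo is not installed.

```shell
#!/bin/sh
# Hypothetical helper: restrict the openib BTL to a single device by
# exporting the MCA parameter mentioned in the thread.
select_ib_device() {
    export OMPI_MCA_btl_openib_if_include="$1"
}

# List the devices that Verbs reports inside this container, if the
# tool is available (output depends entirely on the local setup).
if command -v ibv_devinfo >/dev/null 2>&1; then
    ibv_devinfo -l
fi

# mlx4_14 is the example device name from the thread; substitute your own.
select_ib_device mlx4_14
echo "openib restricted to: $OMPI_MCA_btl_openib_if_include"

# The same restriction can be given on the command line instead
# (./a.out is a placeholder for the MPI program):
#   mpirun --mca btl_openib_if_include mlx4_14 ./a.out
```

Comparing the ibv_devinfo listing inside the container with the one on the host would show whether the container really sees only one device.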