Hi,

That gives me an avenue to pursue.

Thanks,
John

On Mon, 2018-04-23 at 15:12 +0000, Jeff Squyres (jsquyres) wrote:

On Apr 23, 2018, at 11:00 AM, Marshall2, John (SSC/SPC)
<john.marsha...@canada.ca> wrote:



Only one ib interface shows up via ifconfig and at /sys/class/net/ibX.

But, under /sys/class/infiniband and /sys/class/infiniband_cm, all the mlx4_Y
do show up. E.g.,
mlx4_0  mlx4_10  mlx4_12  mlx4_14  mlx4_16  mlx4_3  mlx4_5  mlx4_7  mlx4_9
mlx4_1  mlx4_11  mlx4_13  mlx4_15  mlx4_2   mlx4_4  mlx4_6  mlx4_8

I'm not sure if this can be avoided.

So, where is Open MPI looking for the available mlx4_Y? Under one of those two
directories, or whatever is at /sys/class/net/ibX/device/infiniband/mlx4_Y?



It will use whatever devices libibverbs reports back.

It's been quite a while since I've looked in the libibverbs code, but it 
*might* return all the devices...?  What does ibv_devinfo(1) return inside one 
of your containers?  That's probably the same information that is returned to 
Open MPI programmatically via the libibverbs API.

If libibverbs is returning all devices vs. just the one that is actually 
available in your container, then that might explain the performance disparity.
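
As a rough way to see that mismatch from inside a container, here is a small
Python sketch that compares the two sysfs locations mentioned earlier in this
thread. It does not call libibverbs and may not match what ibv_devinfo(1)
reports; it only lists what is visible under /sys/class/infiniband and what
sits behind each ibX interface under /sys/class/net. The function name
list_ib_devices and its sys_root parameter are made up for this example.

```python
from pathlib import Path


def list_ib_devices(sys_root="/sys"):
    """Return (all_devices, netdev_devices) as sorted lists of device names.

    Hypothetical helper: it only walks the sysfs paths named in this thread.
    """
    root = Path(sys_root)

    # Every HCA entry visible under /sys/class/infiniband (e.g. mlx4_0, mlx4_1).
    ib_class = root / "class" / "infiniband"
    all_devices = sorted(p.name for p in ib_class.iterdir()) if ib_class.is_dir() else []

    # Only the HCAs found under /sys/class/net/ibX/device/infiniband.
    netdev_devices = set()
    net_class = root / "class" / "net"
    if net_class.is_dir():
        for iface in net_class.glob("ib*"):
            dev_dir = iface / "device" / "infiniband"
            if dev_dir.is_dir():
                netdev_devices.update(p.name for p in dev_dir.iterdir())

    return all_devices, sorted(netdev_devices)


if __name__ == "__main__":
    all_devs, net_devs = list_ib_devices()
    extra = sorted(set(all_devs) - set(net_devs))
    print("under /sys/class/infiniband:", " ".join(all_devs) or "(none)")
    print("behind ibX interfaces:", " ".join(net_devs) or "(none)")
    print("visible but without an ibX interface:", " ".join(extra) or "(none)")
```

If the last line lists many devices while only one sits behind an ibX
interface, that matches the situation described above, and the ibv_devinfo(1)
output is the next thing to compare against.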


_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
