Hi,

That gives me an avenue to pursue.

Thanks,
John

On Mon, 2018-04-23 at 15:12 +0000, Jeff Squyres (jsquyres) wrote:
> On Apr 23, 2018, at 11:00 AM, Marshall2, John (SSC/SPC) <john.marsha...@canada.ca> wrote:
>> Only one ib interface shows up via ifconfig and at /sys/class/net/ibX.
>> But, under /sys/class/infiniband and /sys/class/infiniband_cm, all the
>> mlx4_Y do show up. E.g.,
>>
>> mlx4_0 mlx4_10 mlx4_12 mlx4_14 mlx4_16 mlx4_3 mlx4_5 mlx4_7 mlx4_9
>> mlx4_1 mlx4_11 mlx4_13 mlx4_15 mlx4_2 mlx4_4 mlx4_6 mlx4_8
>>
>> I'm not sure if this can be avoided. So, where is openmpi looking for
>> the available mlx4_Y? Under one of those two directories or whatever is
>> at /sys/class/net/ibX/device/infiniband/mlx4_Y?
>
> It will use whatever devices libibverbs reports back. It's been quite a
> while since I've looked in the libibverbs code, but it *might* return all
> the devices...?
>
> What does ibv_devinfo(1) return inside one of your containers? That's
> probably the same information that is returned to Open MPI
> programmatically via the libibverbs API.
>
> If libibverbs is returning all devices vs. just the one that is actually
> available in your container, then that might explain the performance
> disparity.
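
For anyone wanting to check the same thing from inside a container, here is a rough Python sketch that compares the devices listed under /sys/class/infiniband with the ones reachable through /sys/class/net/ib*/device/infiniband/ (the paths mentioned above). It does not call libibverbs, so ibv_devinfo(1) is still the check that matters for what Open MPI sees. The function names and the `sys_root` parameter are made up for illustration, and it assumes the sysfs layout described in this thread.

```python
import glob
import os


def verbs_devices(sys_root="/sys"):
    """Device names listed under <sys_root>/class/infiniband."""
    path = os.path.join(sys_root, "class", "infiniband")
    return sorted(os.listdir(path)) if os.path.isdir(path) else []


def netdev_backed_devices(sys_root="/sys"):
    """Device names reachable via <sys_root>/class/net/ib*/device/infiniband/."""
    pattern = os.path.join(
        sys_root, "class", "net", "ib*", "device", "infiniband", "*"
    )
    return sorted({os.path.basename(p) for p in glob.glob(pattern)})


def extra_devices(sys_root="/sys"):
    """Devices that are listed but have no ib network interface behind them."""
    backed = set(netdev_backed_devices(sys_root))
    return [d for d in verbs_devices(sys_root) if d not in backed]


if __name__ == "__main__":
    # sys_root defaults to the real sysfs; point it elsewhere to test.
    print("listed:", verbs_devices())
    print("backed by an ib interface:", netdev_backed_devices())
    print("listed without an interface:", extra_devices())
```

If the last list is non-empty inside the container, that would be consistent with more mlx4_Y devices being visible than are actually usable there, though only the ibv_devinfo output would confirm what libibverbs itself reports.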
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users