Ok, that makes total sense. I'm leaning towards us fixing this in the OFI MTL
rather than making everyone load. I agree with you that it probably doesn't
matter, but let's not create a corner case. I'm also going to follow up with
the dev who wrote this code, but my guess is that we should ad
If you call "hwloc_topology_load", then hwloc merrily does its discovery and
slams many-core systems. If you call "opal_hwloc_get_topology", then that is
fine - it checks if we already have it, tries to get it from PMIx (using shared
mem for hwloc 2.x), and only does the discovery if no other method works.
But that does raise the question: should we call get_topology() for belt and
suspenders in OFI? Or will that cause your concerns from the start of this
thread?
Brian
From: Ralph Castain
Date: Friday, March 20, 2020 at 9:31 AM
To: OpenMPI Devel
Cc: "Barrett, Brian"
Subject: RE: [EXTERNAL] [OMPI d
https://github.com/open-mpi/ompi/pull/7547 fixes it and has an explanation as
to why it wasn't catching us elsewhere in the MPI code
On Mar 20, 2020, at 9:22 AM, Ralph Castain via devel <devel@lists.open-mpi.org> wrote:
Odd - the topology object gets filled in during init, well before the fence (as
it doesn't need the fence, being a purely local op). Let me take a look
> On Mar 20, 2020, at 9:15 AM, Barrett, Brian wrote:
PMIx folks -
When using mpirun for launching, it looks like opal_hwloc_topology isn't filled
in at the point where we need the information (mtl_ofi_component_init()). This
would end up being before the modex fence, since the goal is to figure out
which address the process should publish. I'm