Hello
This is a hwloc issue; the mailing list is
hwloc-us...@lists.open-mpi.org (please update the CCed address if you
reply to this message).
Try building with --enable-debug to get a lot of debug messages when running lstopo.
Or run "hwloc-gather-topology foo" and send the resulting foo.tar.gz.
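For example, something like this (a sketch; run from the hwloc source tree, and "foo" is just the basename used above):

  ./configure --enable-debug && make   # rebuild hwloc with debug output enabled
  ./utils/lstopo/lstopo                # debug messages are printed while it runs
  hwloc-gather-topology foo            # produces foo.tar.gz (and foo.output)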
Hello
OS X doesn't support binding at all, which is why hwloc and Open MPI don't
support it there either.
Brice
On 17/03/2022 at 20:23, Sajid Ali via users wrote:
Hi OpenMPI-developers,
When trying to run a program with process binding and oversubscription
(on a GitHub Actions CI instance)
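For reference, the kind of launch line involved looks something like this (illustrative only; the program name and process count are placeholders):

  mpirun -np 4 --oversubscribe --bind-to core ./a.out   # bind while oversubscribing
  mpirun -np 4 --oversubscribe --bind-to none ./a.out   # on macOS, disable binding instead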
Hello Dwaipayan
You seem to be running a very old hwloc (maybe embedded inside an old
Open MPI release?). Can you install a more recent hwloc from
https://www.open-mpi.org/projects/hwloc/, build it, and run its "lstopo"
to check whether the error remains?
If so, could you open an issue on
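Something like this should be enough (a sketch; 2.x.y stands for whatever recent release you download from that page):

  tar xzf hwloc-2.x.y.tar.gz && cd hwloc-2.x.y
  ./configure && make
  ./utils/lstopo/lstopo    # check whether the error message still appears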
Hello
The hwloc/X11 stuff is caused by Open MPI using a hwloc that was built
with the GL backend enabled (in your case, because the libhwloc-plugins
package is installed). That backend is used for querying the locality of
X11 displays running on NVIDIA GPUs (using libxnvctrl). Does
running
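One way to check whether that backend is the culprit (a sketch, using hwloc's standard HWLOC_COMPONENTS exclusion syntax; ./a.out is a placeholder) is to disable it for a single run:

  HWLOC_COMPONENTS=-gl lstopo                  # exclude the gl component
  HWLOC_COMPONENTS=-gl mpirun -np 2 ./a.out    # same, for an MPI run
  sudo apt remove libhwloc-plugins             # or remove the plugin package (Debian/Ubuntu)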
Hello
Both 1.10.1 and 1.11.10 are very old. Any chance you could try at least
1.11.13, or even 2.x, on these machines? I can't remember everything we
changed in this code over the last 5 years, unfortunately.
We are not aware of any issue on Intel Haswell, but it's not impossible
that something is buggy in the
Hello
This is hwloc (the hardware detection tool) complaining that something
is wrong in your hardware or operating system. It won't prevent your MPI
code from working; however, process binding may not be optimal.
You may want to upgrade your operating system kernel and/or BIOS. If you
want to
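To see what you are currently running (a sketch; dmidecode requires root):

  uname -r                              # kernel version
  sudo dmidecode -s bios-version        # BIOS version
  sudo dmidecode -s bios-release-date   # BIOS release date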
On 08/01/2020 at 21:51, Prentice Bisbal via users wrote:
>
> On 1/8/20 3:30 PM, Brice Goglin via users wrote:
>> On 08/01/2020 at 21:20, Prentice Bisbal via users wrote:
>>> We just added about a dozen nodes to our cluster, which have AMD EPYC
>>> 7281 processors
On 08/01/2020 at 21:20, Prentice Bisbal via users wrote:
> We just added about a dozen nodes to our cluster, which have AMD EPYC
> 7281 processors. When a particular user's jobs fall on one of these
> nodes, he gets these error messages:
>
>
The attached patch (against 4.0.2) should fix it; I'll prepare a PR to
fix this upstream.
Brice
On 27/11/2019 at 00:41, Brice Goglin via users wrote:
> It looks like NUMA is broken, while others such as SOCKET and L3CACHE
> work fine. A quick look in opal_hwloc_base_get_relative_lo
It looks like NUMA is broken, while others such as SOCKET and L3CACHE
work fine. A quick look at opal_hwloc_base_get_relative_locality() and
friends tells me that those functions were not properly updated for the
hwloc 2.0 NUMA changes. I'll try to understand what's going on tomorrow.
Rebuilding OMPI
Hello
We have a platform with an old MLX4 partition and another OPA partition.
We want a single OMPI installation that works for both kinds of nodes. When
we enable UCX in OMPI for MLX4, UCX ends up being used on the OPA
partition too, and the performance is poor (3 GB/s instead of 10). The
problem
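For context, forcing the transport per partition at run time would look something like this (a sketch, assuming the standard pml/mtl component names; ./app is a placeholder):

  mpirun --mca pml ucx -np 64 ./app                  # MLX4 (InfiniBand) nodes: keep UCX
  mpirun --mca pml cm --mca mtl psm2 -np 64 ./app    # OPA nodes: bypass UCX, use the PSM2 MTL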