I opened an issue, and a fix looks like it went into the 4.1.2 release branch
already. I tested the patch on my 4.1.1 release tarball, and the error no
longer occurs.
Here is the link to the issue:
https://github.com/open-mpi/ompi/issues/9617
Thanks,
David
Could you please ensure it was configured with --enable-debug and then add
"--mca rmaps_base_verbose 5" to the mpirun cmd line?
On Nov 3, 2021, at 9:10 AM, Mccall, Kurt E. (MSFC-EV41) via users
<users@lists.open-mpi.org> wrote:
Gilles and Ralph,
I did build with --with-tm. I tried Gilles' workaround, but the failure still
occurred. What do I need to provide you so that you can investigate this
possible bug?
Thanks,
Kurt
From: users On Behalf Of Ralph Castain via
users
Sent: Wednesday, November 3, 2021 8:45 AM
This seemed to help me as well, so far at least. I still have a lot
more testing to do. (An example of setting the variable is shown below the quote.)
On Tue, Nov 2, 2021 at 4:15 PM Shrader, David Lee wrote:
>
> As a workaround for now, I have found that setting OMPI_MCA_pml=ucx seems to
> get around this issue. I'm not sure why this works, but perhaps there
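A minimal sketch of that workaround, assuming a bash-like shell (the process count and executable name are placeholders):

export OMPI_MCA_pml=ucx
mpirun -np 21 ./your_app

The equivalent directly on the command line would be "mpirun --mca pml ucx -np 21 ./your_app".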
Sounds like a bug to me - regardless of configuration, if the hostfile contains
an entry for each slot on a node, OMPI should have added those up.
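To illustrate "an entry for each slot": a hostfile like the one below (hostnames are placeholders) should be counted as 3 slots on each node, exactly as if it had been written as "node01 slots=3" and "node02 slots=3" on single lines:

node01
node01
node01
node02
node02
node02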
On Nov 3, 2021, at 2:49 AM, Gilles Gouaillardet via users
<users@lists.open-mpi.org> wrote:
Kurt,
Assuming you built Open MPI with tm support (default if tm is detected at
configure time, but you can configure --with-tm to have it abort if tm
support is not found), you should not need to use a hostfile.
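For reference, a configure line along these lines (the install prefix is a placeholder) will abort early if tm support cannot be found:

./configure --with-tm --prefix=/opt/openmpi-4.1.1
make && make install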
As a workaround, I would suggest you try to
mpirun --map-by node -np 21 ...
I'm using Open MPI 4.1.1 compiled with Nvidia's nvc++ 20.9 and built with
Torque support.
I want to reserve multiple slots on each node, and then launch a single manager
process on each node. The remaining slots would be filled up as each manager
spawns new processes with MPI_Comm_spawn on its own node.
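To give an idea of what each manager does, here is a stripped-down sketch of the spawn call (the worker executable name and the one-process-per-call count are placeholders, not my actual code):

#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm worker_comm;
    MPI_Info host_info;
    char hostname[MPI_MAX_PROCESSOR_NAME];
    int len;

    MPI_Init(&argc, &argv);
    MPI_Get_processor_name(hostname, &len);

    /* Ask for the worker to be placed on this manager's own node;
       the "host" info key is recognized by Open MPI's MPI_Comm_spawn. */
    MPI_Info_create(&host_info);
    MPI_Info_set(host_info, "host", hostname);

    /* Spawn one worker into one of the remaining slots on this node. */
    MPI_Comm_spawn("worker", MPI_ARGV_NULL, 1, host_info,
                   0, MPI_COMM_SELF, &worker_comm, MPI_ERRCODES_IGNORE);
    MPI_Info_free(&host_info);

    /* ... exchange work with the spawned process over worker_comm ... */

    MPI_Comm_disconnect(&worker_comm);
    MPI_Finalize();
    return 0;
}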