These are very, very old versions of UCX and HCOLL installed in your
environment. Also, MXM was deprecated years ago in favor of UCX. What
version of MOFED is installed (run ofed_info -s)? What HCA generation is
present (run ibstat)?
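For reference (both are standard Mellanox/InfiniBand diagnostics; the exact
output depends on your MOFED release and adapter):

  ofed_info -s   # prints the MOFED version string, e.g. MLNX_OFED_LINUX-x.y
  ibstat         # per-HCA model ("CA type"), firmware, and port state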
Josh
On Tue, Mar 1, 2022 at 6:42 AM Angel de Vicente via users
This is an ancient version of HCOLL. Please upgrade to the latest version
(you can do this by installing HPC-X
https://www.mellanox.com/products/hpc-x-toolkit)
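Roughly, the install-and-load sequence looks like this (archive name and paths
are placeholders; see the HPC-X documentation for specifics):

  tar -xjf hpcx-vX.Y.tbz && cd hpcx-vX.Y
  source hpcx-init.sh
  hpcx_load        # puts the bundled UCX/HCOLL/OpenMPI into the environment
  mpirun -np 80 ./your_app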
Josh
On Wed, Feb 5, 2020 at 4:35 AM Angel de Vicente
wrote:
> Hi,
>
> Joshua Ladd writes:
>
> > We cannot reproduce this. On four node
We cannot reproduce this. On four nodes at 20 PPN (80 ranks), it takes exactly
the same 19 seconds with and without hcoll.
What version of HCOLL are you using? Command line?
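For reference, one way to check (assuming an RPM-based install; the package
name and layout can differ, e.g. inside an HPC-X tree, and ./your_app below
stands in for your binary):

  rpm -qi hcoll              # version of the installed hcoll package
  ompi_info | grep hcoll     # confirm OpenMPI's coll/hcoll component is there

  # To compare runs with hcoll disabled, coll_hcoll_enable is the usual knob:
  mpirun -np 80 -mca coll_hcoll_enable 0 ./your_app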
Josh
On Tue, Feb 4, 2020 at 8:44 AM George Bosilca via users <
users@lists.open-mpi.org> wrote:
> Hcoll will be present in many cases
OK. Please try:
mpirun -np 128 --debug-daemons --map-by ppr:64:socket hostname
Josh
On Tue, Jan 28, 2020 at 12:49 PM Collin Strassburger <
cstrassbur...@bihrle.com> wrote:
> Input: mpirun -np 128 --debug-daemons -mca plm rsh hostname
>
>
>
> Output:
>
> [Gen2Node3:54039] [[16643,0],0] orted_cmd: received add_local_procs
Sorry, typo, try:
mpirun -np 128 --debug-daemons -mca plm rsh hostname
Josh
On Tue, Jan 28, 2020 at 12:45 PM Joshua Ladd wrote:
> And if you try:
> mpirun -np 128 --debug-daemons -plm rsh hostname
>
> Josh
>
> On Tue, Jan 28, 2020 at 12:34 PM Collin Strassburger <
> cstrassbur...@bihrle.com> wrote:
And if you try:
mpirun -np 128 --debug-daemons -plm rsh hostname
Josh
On Tue, Jan 28, 2020 at 12:34 PM Collin Strassburger <
cstrassbur...@bihrle.com> wrote:
> Input: mpirun -np 128 --debug-daemons hostname
>
>
>
> Output:
>
> [Gen2Node3:54023] [[16659,0],0] orted_cmd: received add_local_procs
Interesting. Can you try:
mpirun -np 128 --debug-daemons hostname
Josh
On Tue, Jan 28, 2020 at 12:14 PM Collin Strassburger <
cstrassbur...@bihrle.com> wrote:
> In relation to the multi-node attempt, I haven’t set that up yet, as
> the per-node configuration doesn’t pass its tests (full node
Also, can you try running:
mpirun -np 128 hostname
Josh
On Tue, Jan 28, 2020 at 11:49 AM Joshua Ladd wrote:
> I don't see how this can be diagnosed as a "problem with the Mellanox
> Software". This is on a single node. What happens when you try to launch on
> more than one node?
>
> Josh
>
> O
I don't see how this can be diagnosed as a "problem with the Mellanox
Software". This is on a single node. What happens when you try to launch on
more than one node?
Josh
On Tue, Jan 28, 2020 at 11:43 AM Collin Strassburger <
cstrassbur...@bihrle.com> wrote:
> Here’s the I/O for these high local
Can you send the output of a failed run, including your command line?
Josh
On Tue, Jan 28, 2020 at 11:26 AM Ralph Castain via users <
users@lists.open-mpi.org> wrote:
> Okay, so this is a problem with the Mellanox software - copying Artem.
>
> On Jan 28, 2020, at 8:15 AM, Collin Strassburger
> wrote:
Did you build UCX with CUDA support (--with-cuda)?
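For reference, a typical CUDA-aware build sketch (prefixes and the CUDA path
are illustrative):

  # UCX with CUDA support
  ./contrib/configure-release --prefix=$UCX_DIR --with-cuda=/usr/local/cuda
  make -j install

  # OpenMPI against that UCX, also with CUDA support
  ./configure --prefix=$OMPI_DIR --with-ucx=$UCX_DIR --with-cuda=/usr/local/cuda
  make -j install

  # Verify the resulting OpenMPI is CUDA-aware
  ompi_info --parsable --all | grep mpi_built_with_cuda_support:value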
Josh
On Thu, Sep 5, 2019 at 8:45 PM AFernandez via users <
users@lists.open-mpi.org> wrote:
> Hello OpenMPI Team,
>
> I'm trying to use CUDA-aware OpenMPI but the system simply ignores the GPU
> and the code runs on the CPUs. I've tried different
ll not. Still, it would be good to iron out
>> this kink so that if users do hit it we have a solution. As noted, UCX is very
>> new to us, so it is entirely possible that we are missing something in
>> its interaction with OpenMPI. Our MPI is compiled as follows:
>>
>>
>
> I will note that when I built this, it was built using the default version
> of UCX that comes with EPEL (1.5.1). We only built 1.6.0 because the version
> provided by EPEL did not build with MT enabled, which seems strange to me,
> as I don't see any reason not to build with MT enabled
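For reference, building UCX from source with thread support enabled is roughly
(prefix illustrative):

  ./contrib/configure-release --prefix=$UCX_DIR --enable-mt
  make -j && make install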
Paul,
Can you provide a reproducer and command line, please? Also, what network
hardware are you using?
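For example, to identify the fabric hardware (standard tools; ibv_devinfo
ships with the verbs utilities):

  lspci | grep -i mellanox                          # PCI view of the adapters
  ibv_devinfo | grep -E 'hca_id|fw_ver|link_layer'  # device, firmware, link type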
Josh
On Fri, Aug 23, 2019 at 3:35 PM Paul Edmon via users <
users@lists.open-mpi.org> wrote:
> I have a code using MPI_THREAD_MULTIPLE along with MPI-RMA that I'm
> running with OpenMPI 4.0.1. Since 4.0
Hi, Noam
Can you try your original command line with the following addition:
mpirun --mca pml ucx --mca btl ^vader,tcp,openib --mca osc ucx
I think we're seeing some conflict between UCX PML and UCT OSC.
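To list the one-sided (osc) components your build actually contains (output
format varies by OpenMPI version):

  ompi_info | grep osc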
Josh
On Wed, Jun 19, 2019 at 4:36 PM Noam Bernstein via users <
users@lists.open-mpi.org>