Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD 7742 when utilizing 100+ processors per node

2020-01-27 Thread Jeff Squyres (jsquyres) via users
Can you please send all the information listed here: https://www.open-mpi.org/community/help/ Thanks! On Jan 27, 2020, at 12:00 PM, Collin Strassburger via users <users@lists.open-mpi.org> wrote: Hello, I had initially thought the same thing about the streams, but I have 2 sockets

Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD 7742 when utilizing 100+ processors per node

2020-01-27 Thread Collin Strassburger via users
Hello, I had initially thought the same thing about the streams, but I have 2 sockets with 64 cores each. Additionally, I have not yet turned multithreading off, so lscpu reports a total of 256 logical cores and 128 physical cores. As such, I don’t see how it could be running out of streams u

Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD 7742 when utilizing 100+ processors per node

2020-01-27 Thread Ray Sheppard via users
Hi All, Just my two cents, I think error code 63 is saying it is running out of streams to use. I think you have only 64 cores, so at 100, you are overloading most of them. It feels like you are running out of resources trying to swap in and out ranks on physical cores. Ray On 1/27/202

Re: [OMPI users] OMPI returns error 63 on AMD 7742 when utilizing 100+ processors per node

2020-01-27 Thread Collin Strassburger via users
Hello Howard, To rule out potential interactions, I tested without UCX and hcoll support and found that the issue persists. Run command: mpirun -np 128 bin/xhpcg Output: -- mpirun was unable to start the specified application as it

Re: [OMPI users] OMPI returns error 63 on AMD 7742 when utilizing 100+ processors per node

2020-01-27 Thread Howard Pritchard via users
Hello Collin, Could you provide more information about the error? Is there any output from either Open MPI or, maybe, UCX, that could provide more information about the problem you are hitting? Howard On Mon, Jan 27, 2020 at 08:38, Collin Strassburger via users < users@lists.open-m

[OMPI users] OMPI returns error 63 on AMD 7742 when utilizing 100+ processors per node

2020-01-27 Thread Collin Strassburger via users
Hello, I am having difficulty with OpenMPI versions 4.0.2 and 3.1.5. Both of these versions cause the same error (error code 63) when utilizing more than 100 cores on a single node. The processors I am utilizing are AMD Epyc "Rome" 7742s. The OS is CentOS 8.1. I have tried compiling with bo