Hello,
I am having difficulty with Open MPI versions 4.0.2 and 3.1.5. Both of these
versions produce the same error (error code 63) when utilizing more than 100
cores on a single node. The processors I am utilizing are AMD Epyc "Rome"
7742s. The OS is CentOS 8.1. I have tried compiling with bo…
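As a side note on the error number: on Linux, errno 63 is ENOSR ("out of streams resources"), which would match the later mention of "streams" in this thread. This errno mapping is an assumption about the platform, not something stated in the thread; it can be checked with a short Python snippet:

```python
import errno
import os

# On Linux, errno 63 maps to ENOSR ("Out of streams resources").
# Note: errno numbering is platform-specific; this holds on Linux,
# which matches the CentOS 8.1 system described above.
print(errno.ENOSR)                    # 63 on Linux
print(os.strerror(errno.ENOSR))       # human-readable description
```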
Hello Howard,
To remove potential interactions, I have found that the issue persists without
ucx and hcoll.
…here:
https://www.open-mpi.org/community/help/
Thanks!

On Jan 27, 2020, at 12:00 PM, Collin Strassburger via users wrote:

Hello,
I had initially thought the same thing about the streams, but I have 2 sockets
with 64 cores each. Additionally, I h…
Subject: Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD 7742 when
utilizing 100+ processors per node

Does it work with pbs but not Mellanox? Just trying to isolate the problem.
On Jan 28, 2020, at 6:39 AM, Collin Strassburger via users wrote:

Hello,
I have done some additional testing…
I agree that it is odd that the issue does not appear until after the Mellanox
drivers have been installed (and the configure flags set to use them). As
requested, here are the results:

Input: mpirun -np 128 --mca odls_base_verbose 10 --mca state_base_verbose 10 hostname

Output:
[Gen2Node3:54…
Wonderful! I am happy to confirm that this resolves the issue!
Many thanks to everyone for their assistance,
Collin
Hello,
Just a quick comment on this: is your code written in C/C++ or Fortran?
Fortran has issues writing at a decent speed regardless of the MPI setup, and
as such it should be avoided for file I/O (yet I still occasionally see it
implemented).
Collin
Cheers,
Gilles
Hello David,
The slot calculation is based on physical cores rather than logical cores. The
4 CPUs you are seeing are logical CPUs: since your processor has 2 threads per
core, you have two physical cores, yielding a total of 4 logical CPUs (which is
what lscpu reports). On machine…

> it instructs mpirun to treat the HWTs as independent cpus, so you would
> have 4 slots in this case.
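The slot arithmetic described above can be sketched in a few lines of Python. The socket and core counts below are the figures from the example (2 physical cores, 2 threads per core), not measured values:

```python
# Sketch of the slot arithmetic described above: Open MPI's default
# slot count is based on physical cores, while mpirun's
# --use-hwthread-cpus option treats each hardware thread (HWT) as an
# independent cpu.
sockets = 1
cores_per_socket = 2          # physical cores per socket
threads_per_core = 2          # SMT hardware threads per core

physical_cores = sockets * cores_per_socket
logical_cpus = physical_cores * threads_per_core   # what lscpu reports

slots_default = physical_cores      # default: one slot per physical core
slots_hwthreads = logical_cpus      # with --use-hwthread-cpus

print(slots_default, slots_hwthreads)   # 2 slots by default, 4 with HWTs
```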
Since it is happening on this cluster and not on others, have you checked the
InfiniBand counters to ensure it’s not a bad cable or something along those
lines? I believe the command is ibdiagnet (or something similar).
Collin