Thanks Howard,
  Yes, I think they are using a single node.  I believe they have nodes that 
now support 256 ranks.  I am the middleman, but I will pass it along and report 
the result.  I am not sure it should be a new formal issue, since they think it 
is part of 12979 (sorry about the Windoze "Safe Links" the school now uses).  
If you think it should be a new issue, though, I could pass back that they 
should open one.  I do not know more than the email we received; they could 
fill in the necessary info.
         Ray


________________________________________
From: 'Pritchard Jr., Howard' via Open MPI users <[email protected]>
Sent: Wednesday, January 14, 2026 12:12 PM
To: [email protected]
Cc: [email protected]
Subject: Re: [EXTERNAL] [OMPI users] OpenMPI issue reported to SPEC HPG group.  
Thoughts?

Hello Ray,

A few questions to help better understand the problem.

- Are you running the benchmark on a single node?
- If the answer to that question is yes, could you try using UCX for messaging 
(mpirun --mca pml ucx, as in the example below) and see if you still observe 
the hang?
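
For example, an invocation along these lines (the executable path and the 
benchmark arguments are placeholders, not the actual SPEC MPI harness command):

    mpirun -np 256 --mca pml ucx ./122.tachyon <benchmark arguments>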

Also, it would help to open an issue for this problem at 
https://github.com/open-mpi/ompi/issues

Thanks,

Howard

On 1/14/26, 9:46 AM, "[email protected] on behalf of Sheppard, 
Raymond W <[email protected]>" wrote:


Hi,
Ray Sheppard here, wearing my SPEC hat. We received an email from AMD that we 
are not sure how to deal with, so I thought I would pass it along in case 
anyone has relevant thoughts about it. It looks like Jeff S. filed the issue 
they cite. We are sort of fishing for a response to them, so any info is 
appreciated. Thanks.
Ray


Dear Support,


I am an engineer at AMD who is currently running the SPECMPI2007 benchmarks, 
and we are experiencing issues with running the 122.Tachyon benchmark when 
compiled with OpenMPI 5. It is our goal to be able to run SPECMPI with OpenMPI 
5 to minimize the overhead of MPI in our benchmarking.


In our usual configuration, we run the benchmark on 256 ranks using OpenMPI 5 
with the cross-memory attach (CMA) fabric. In that configuration, the 
122.Tachyon benchmark appears to deadlock. When running Tachyon with OpenMPI 
4.1.8 and the UCX fabric, this issue does not occur.


On investigating further, we observe the following:
- With MPICH v4.3.0, the benchmark fails to run because MPICH detects an MPI 
error: an MPI_Allgather() call uses the same array for both the send and 
receive buffers, which is disallowed by the MPI specification.
- After modifying the benchmark to correct the Allgather call (a sketch of one 
possible correction follows this list), we see:
  - MPICH runs to completion, then crashes at finalization.
  - OpenMPI still deadlocks.
- The deadlock is only observed when running on >35 ranks and is present in 
multiple versions of OpenMPI (v5.0.5, v5.0.8).
- We discovered this issue for OpenMPI when investigating 
https://github.com/open-mpi/ompi/issues/12979, which may be relevant.
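
For context, here is a minimal sketch of the kind of correction described 
above, assuming the aliasing can be resolved with MPI_IN_PLACE; the function 
and variable names are illustrative and are not taken from the Tachyon source:

    #include <mpi.h>

    /* Illustrative only: each rank's contribution is assumed to already
     * sit in its own segment of the shared 'data' array. */
    static void gather_all(double *data, int count_per_rank, MPI_Comm comm)
    {
        /* Non-conforming form (send buffer aliases the receive buffer):
         *   MPI_Allgather(data, count_per_rank, MPI_DOUBLE,
         *                 data, count_per_rank, MPI_DOUBLE, comm);
         */

        /* Standard-conforming in-place form: sendbuf is MPI_IN_PLACE and
         * the send count/datatype arguments are ignored. */
        MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                      data, count_per_rank, MPI_DOUBLE, comm);
    }

Whether the actual benchmark source permits this pattern is an assumption; the 
point is only that MPICH's complaint concerns buffer aliasing, not the 
communication pattern itself.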


Is this a known issue with 122.Tachyon benchmark, and are you able to help us 
run 122.Tachyon on OpenMPI 5?


Thank you in advance for your help. If you require any further information, 
please do not hesitate to reach out to me.


Thanks
James







To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].

