Hi Ray,

I think you're correct about opening a new issue. I added a link to this email 
chain to that issue.

Howard

On 1/14/26, 10:22 AM, "[email protected] on behalf of Sheppard, Raymond W" 
<[email protected]> wrote:


Thanks, Howard.
Yes, I think they are using a single node. I believe they have nodes that now 
support 256 ranks. I am the middle man, but I will pass it along and report the 
result. I am not sure it should be a new formal issue, since they think it is 
part of 12979 (sorry about the Windoze "Safe Links" the school now uses). If 
you think it should be a new issue, though, I could pass back that they should 
open one. I do not know more than the email we received; they could fill in the 
necessary info.
Ray




________________________________________
From: 'Pritchard Jr., Howard' via Open MPI users <[email protected]>
Sent: Wednesday, January 14, 2026 12:12 PM
To: [email protected]
Cc: [email protected]
Subject: Re: [EXTERNAL] [OMPI users] OpenMPI issue reported to SPEC HPG group. 
Thoughts?


Hello Ray,


A few questions to help us better understand the problem:


- Are you running the benchmark on a single node?
- If so, could you try using UCX for messaging (mpirun --mca pml ucx) and see 
if you still observe the hang?


Also, it would help to open an issue for this problem at 
https://github.com/open-mpi/ompi/issues


Thanks,


Howard


On 1/14/26, 9:46 AM, "[email protected] on behalf of Sheppard, Raymond W" 
<[email protected]> wrote:




Hi,
Ray Sheppard here, wearing my SPEC hat. We received an email from AMD that we 
are not sure how to deal with, so I thought I would pass it along in case 
anyone has relevant thoughts about it. It looks like Jeff S. filed the issue 
they cite. We are sort of fishing for a response to them, so any info is 
appreciated. Thanks.
Ray




Dear Support,




I am an engineer at AMD currently running the SPECMPI2007 benchmarks, and we 
are experiencing issues running the 122.Tachyon benchmark when compiled with 
OpenMPI 5. Our goal is to run SPECMPI with OpenMPI 5 to minimize the overhead 
of MPI in our benchmarking.




In our usual configuration, running the benchmark on 256 ranks using OpenMPI 5 
with the cross-memory attach (CMA) fabric, the 122.Tachyon benchmark appears to 
deadlock. When running Tachyon with OpenMPI 4.1.8 and the UCX fabric, this 
issue does not occur.




On investigating further, we observe the following:
- With MPICH v4.3.0 the benchmark fails to run because MPICH detects an MPI 
error: an MPI_Allgather() call uses the same array for both the send and 
receive buffers, which is disallowed by the MPI spec.
- After modifying the benchmark to correct the Allgather call (see the sketch 
after this list):
  - MPICH runs to completion, then crashes at finalization.
  - OpenMPI still deadlocks.
- The deadlock is only observed when running on >35 ranks and is present in 
multiple versions of OpenMPI (v5.0.5, v5.0.8).
- We discovered this issue for OpenMPI while investigating 
https://github.com/open-mpi/ompi/issues/12979, which may be relevant.
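
For concreteness, here is a minimal sketch of the aliasing pattern described 
above and the conventional MPI_IN_PLACE correction. This is not the actual 
122.Tachyon source; the buffer name, element type, and counts are purely 
illustrative.

/* Disallowed pattern (illustrative, not the Tachyon code): the same array is
 * used as both the send and receive buffer of MPI_Allgather, e.g.
 *   MPI_Allgather(buf + rank, 1, MPI_INT, buf, 1, MPI_INT, MPI_COMM_WORLD);
 * The conventional fix is to pass MPI_IN_PLACE as the send buffer, so each
 * rank's contribution is read from its own block of the receive buffer. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    int *buf = malloc((size_t)nranks * sizeof(int));
    buf[rank] = rank;   /* each rank fills only its own slot */

    /* With MPI_IN_PLACE, the send count and send datatype are ignored. */
    MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                  buf, 1, MPI_INT, MPI_COMM_WORLD);

    free(buf);
    MPI_Finalize();
    return 0;
}

With the in-place form, the library is told explicitly that the gather 
operates on a single buffer, rather than silently aliasing two arguments.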




Is this a known issue with the 122.Tachyon benchmark, and are you able to help 
us run 122.Tachyon on OpenMPI 5?




Thank you in advance for your help. If you require any further information, 
please do not hesitate to reach out to me.




Thanks
James



