RE: [EXTERNAL] [OMPI users] Multi-host troubleshooting

2025-12-10 Thread 'Collin Strassburger' via Open MPI users
to note that down for the future. Collin Strassburger (he/him) From: 'George Bosilca' via Open MPI users Sent: Wednesday, December 10, 2025 3:32 PM To: [email protected] Subject: Re: [EXTERNAL] [OMPI users] Multi-host troubleshooting CAUTION: This email originated from outs

Re: [EXTERNAL] [OMPI users] Multi-host troubleshooting

2025-12-10 Thread 'George Bosilca' via Open MPI users
sday, December 10, 2025 10:42 AM > *To:* [email protected] > *Subject:* Re: [EXTERNAL] [OMPI users] Multi-host troubleshooting > > > > *CAUTION:* This email originated from outside of the organization. Do not > click links or open attachments unless you recognize the sender

RE: [EXTERNAL] [OMPI users] Multi-host troubleshooting

2025-12-10 Thread 'Collin Strassburger' via Open MPI users
ubject: Re: [EXTERNAL] [OMPI users] Multi-host troubleshooting There you go, the misconfiguration of the second host prevents UCC,

Re: [EXTERNAL] [OMPI users] Multi-host troubleshooting

2025-12-10 Thread 'George Bosilca' via Open MPI users
> Collin Strassburger (he/him) > > -Original Message- > From: 'Joachim Jenke' via Open MPI users > Sent: Wednesday, December 10, 2025 10:12 AM > To: [email protected] > Subject: Re: [EXTERNAL] [OMPI users] Multi-host troubleshooting > > Hi Collin, >

RE: [EXTERNAL] [OMPI users] Multi-host troubleshooting

2025-12-10 Thread 'Collin Strassburger' via Open MPI users
Open MPI users Sent: Wednesday, December 10, 2025 10:12 AM To: [email protected] Subject: Re: [EXTERNAL] [OMPI users] Multi-host troubleshooting Hi Collin, On 10.12.25 at 15:36, 'Collin Strassburger' via Open MPI users wrote: > /opt/hpcx/ucc/lib/ucc/libucc_tl_cuda.so (libcuda.

Re: [EXTERNAL] [OMPI users] Multi-host troubleshooting

2025-12-10 Thread 'Joachim Jenke' via Open MPI users
Hi Collin, On 10.12.25 at 15:36, 'Collin Strassburger' via Open MPI users wrote: > /opt/hpcx/ucc/lib/ucc/libucc_tl_cuda.so (libcuda.so.1: cannot open shared object file: No such file or directory) Is it only the second host that cannot find libcuda.so? Do you have the library installed on both
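
One way to answer the question above is to run `ldd /opt/hpcx/ucc/lib/ucc/libucc_tl_cuda.so` on each host and compare which dependencies fail to resolve. The following is a minimal sketch of that comparison, not something posted in the thread: the helper name `missing_libraries` is hypothetical, and the sample `ldd` output is made up for illustration.

```python
import re


def missing_libraries(ldd_output: str) -> list[str]:
    """Return the names of shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # ldd prints unresolved dependencies as "<name> => not found".
        match = re.match(r"\s*(\S+)\s+=>\s+not found", line)
        if match:
            missing.append(match.group(1))
    return missing


# Illustrative ldd output (invented for this sketch, not taken from the thread).
sample = """\
\tlinux-vdso.so.1 (0x00007ffd3a5f2000)
\tlibcuda.so.1 => not found
\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1c2a000000)
"""

print(missing_libraries(sample))  # ['libcuda.so.1']
```

Feeding the function the output captured from each host (for example via `ssh remotehost ldd ...`) would show whether only the second host is missing `libcuda.so.1`.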

RE: [EXTERNAL] [OMPI users] Multi-host troubleshooting

2025-12-10 Thread 'Collin Strassburger' via Open MPI users
ge Bosilca' via Open MPI users Sent: Tuesday, December 9, 2025 5:12 PM To: [email protected] Subject: Re: [EXTERNAL] [OMPI users] Multi-host troubleshooting CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the se

Re: [EXTERNAL] [OMPI users] Multi-host troubleshooting

2025-12-09 Thread 'George Bosilca' via Open MPI users
> > Warm regards, > > Collin Strassburger (he/him) > > > > *From:* 'Pritchard Jr., Howard' via Open MPI users < > [email protected]> > *Sent:* Tuesday, December 9, 2025 4:24 PM > *To:* [email protected] > *Subject:* Re: [EXTERNAL] [OMPI users]

RE: [EXTERNAL] [OMPI users] Multi-host troubleshooting

2025-12-09 Thread 'Collin Strassburger' via Open MPI users
10 PM To: "[email protected]" Subject: RE: [EXTERNAL] [OMPI users] Multi-host troubleshooting Hello Howard, Running with export OMPI_MCA_coll=^ucc resulted in a working run of the code! Are there any

Re: [EXTERNAL] [OMPI users] Multi-host troubleshooting

2025-12-09 Thread 'Pritchard Jr., Howard' via Open MPI users
pen-mpi.org" Subject: RE: [EXTERNAL] [OMPI users] Multi-host troubleshooting Hello Howard, Running with export OMPI_MCA_coll=^ucc resulted in a working run of the code! Are there any downsides to using OMPI_MCA_coll=^ucc to side-step this issue? Warm regards, Collin Strassburger (he/him)
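
For reference, the workaround discussed here relies on Open MPI reading MCA parameters from environment variables named `OMPI_MCA_<param>`; the value `^ucc` excludes the `ucc` component from the `coll` framework. A minimal sketch of applying it to a single launch from a script rather than exporting it shell-wide follows. The helper name `env_without_ucc` and the commented-out `mpirun` arguments are hypothetical.

```python
import os


def env_without_ucc(base_env=None):
    """Copy an environment and exclude the UCC collective component."""
    env = dict(os.environ if base_env is None else base_env)
    # Same effect as `export OMPI_MCA_coll=^ucc`, but scoped to this copy.
    env["OMPI_MCA_coll"] = "^ucc"
    return env


env = env_without_ucc()
print(env["OMPI_MCA_coll"])  # ^ucc

# Hypothetical launch; adjust the host list and program to your setup:
# import subprocess
# subprocess.run(["mpirun", "--host", "hostA,hostB", "./my_mpi_program"], env=env)
```

The same setting can also be given per run on the command line as `mpirun --mca coll ^ucc ...`.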

RE: [EXTERNAL] [OMPI users] Multi-host troubleshooting

2025-12-09 Thread 'Collin Strassburger' via Open MPI users
er 9, 2025 3:54 PM To: [email protected] Subject: Re: [EXTERNAL] [OMPI users] Multi-host troubleshooting Hi Collin, Th

Re: [EXTERNAL] [OMPI users] Multi-host troubleshooting

2025-12-09 Thread 'Pritchard Jr., Howard' via Open MPI users
ollin Strassburger' via Open MPI users Reply-To: "[email protected]" Date: Tuesday, December 9, 2025 at 1:50 PM To: "[email protected]" Subject: RE: [EXTERNAL] [OMPI users] Multi-host troubleshooting Hit “enter” a little too soon. Here’s the rest that was intended to

RE: [EXTERNAL] [OMPI users] Multi-host troubleshooting

2025-12-09 Thread 'Collin Strassburger' via Open MPI users
000\000\000\004%\016{\377\177\000\000\326\354\2606Ww", '\000' ... Collin Strassburger (he/him) From: 'Collin Strassburger' via Open MPI users Sent: Tuesday, December 9, 2025 3:40 PM To: [email protected] Subject: RE: [EXTERNAL] [OMPI users] Multi-host troubleshooting CAU

RE: [EXTERNAL] [OMPI users] Multi-host troubleshooting

2025-12-09 Thread 'Collin Strassburger' via Open MPI users
/libopen-pal.so.40 Collin Strassburger (he/him) From: 'Pritchard Jr., Howard' via Open MPI users Sent: Tuesday, December 9, 2025 3:27 PM To: [email protected] Subject: Re: [EXTERNAL] [OMPI users] Multi-host troubleshooting CAUTION: This email originated from outside of the org

Re: [EXTERNAL] [OMPI users] Multi-host troubleshooting

2025-12-09 Thread 'Pritchard Jr., Howard' via Open MPI users
ecember 9, 2025 at 1:19 PM To: Open MPI Users Subject: [EXTERNAL] [OMPI users] Multi-host troubleshooting Hello, I am dealing with an odd mpi issue that I am unsure how to continue diagnosing. Following the outline given by: https://www.open-mpi.org/faq/?category=running#diagnose-multi

[OMPI users] Multi-host troubleshooting

2025-12-09 Thread 'Collin Strassburger' via Open MPI users
Hello, I am dealing with an odd mpi issue that I am unsure how to continue diagnosing. Following the outline given by: https://www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems, steps 1-3 complete without any issues i.e. ssh remotehost hostname works Paths include the nvidia h