Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-path

2021-02-08 Thread Patrick Begou via users
IFS are you running?
> 2. Are you using CUDA cards by any chance? If so, what version of CUDA?
>
> -----Original Message-----
> From: Heinz, Michael William
> Sent: Wednesday, January 27, 2021 3:45 PM
> To: Open MPI Users
> Subject: RE: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.

Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-path

2021-01-28 Thread Heinz, Michael William via users
27, 2021 3:37 PM To: Open MPI Users Cc: Heinz, Michael William Subject: Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-path Unfortunately, OPA/PSM support for Debian isn't handled by Intel directly or by Cornelis Networks - but I should point out you can download the latest

Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-path

2021-01-28 Thread Peter Kjellström via users
On Wed, 27 Jan 2021 15:31:40 -0500 Michael Di Domenico via users wrote: > if you have OPA cards, for openmpi you only need --with-ofi, you don't > need psm/psm2/verbs/ucx. I agree with Michael and would add for clarity that on the system you always need PSM2 and optionally libfabric (if you go

Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-path

2021-01-27 Thread Heinz, Michael William via users
, Michael William Subject: Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-path Unfortunately, OPA/PSM support for Debian isn't handled by Intel directly or by Cornelis Networks - but I should point out you can download the latest official source for PSM2 and the drivers from GitHub

Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-path

2021-01-27 Thread Heinz, Michael William via users
: Wednesday, January 27, 2021 3:32 PM To: Open MPI Users Cc: Michael Di Domenico Subject: Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-path if you have OPA cards, for openmpi you only need --with-ofi, you don't need psm/psm2/verbs/ucx. but this assumes you're running a rhel based

Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-path

2021-01-27 Thread Michael Di Domenico via users
if you have OPA cards, for openmpi you only need --with-ofi, you don't need psm/psm2/verbs/ucx. but this assumes you're running a rhel based distro and have installed the OPA fabric suite of software from Intel/CornelisNetworks. which is what i have. perhaps there's something really odd in

Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-path

2021-01-27 Thread Patrick Begou via users
Hi Howard and Michael, first, many thanks for testing with my short application. Yes, when the test code runs fine it just shows the max RSS size of the rank 0 process. When it runs wrong it prints a message about each invalid value found. As I said, I have also deployed OpenMPI on various clusters (in

Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-path

2021-01-27 Thread Pritchard Jr., Howard via users
Hi Folks, I'm also having problems reproducing this on one of our OPA clusters: libpsm2-11.2.78-1.el7.x86_64 libpsm2-devel-11.2.78-1.el7.x86_64 cluster runs RHEL 7.8 hca_id: hfi1_0 transport: InfiniBand (0) fw_ver: 1.27.0