Re: [OMPI users] no openmpi over IB on new CentOS 7 system
> On Oct 10, 2018, at 4:51 AM, Dave Love wrote:
>
> RDMA was just broken in the last-but-one(?) RHEL7 kernel release, in
> case that's the problem. (Fixed in 3.10.0-862.14.4.)

I strongly suspect that this is it. In the process of getting everything organized to collect the info various people suggested would be useful, I noticed some kernel package inconsistencies, and when I made them consistent by upgrading to 862.14, it started working.

If the problem comes back, I guess I’ll be back here, but for the moment it appears to be working.

Thanks to everyone for the suggestions.

Noam
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
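For anyone checking the same thing on their own nodes, here is a minimal sketch that compares a kernel release string against the 3.10.0-862.14.4 release quoted above. The function names are made up for illustration, and comparing the numeric fields as a tuple is only an approximation of how RPM orders versions, so treat the answer as a hint rather than a verdict.

```python
import platform
import re

# Release that Dave Love reports as containing the RDMA fix.
FIXED_RELEASE = "3.10.0-862.14.4"


def release_tuple(release):
    """Turn the leading numeric part of a kernel release into a tuple of ints.

    "3.10.0-862.14.4.el7.x86_64" becomes (3, 10, 0, 862, 14, 4); the
    distribution and architecture suffixes are ignored.
    """
    match = re.match(r"\d+(?:[.-]\d+)*", release)
    if match is None:
        raise ValueError("unrecognised kernel release: {!r}".format(release))
    return tuple(int(part) for part in re.split(r"[.-]", match.group(0)))


def has_rdma_fix(release, fixed=FIXED_RELEASE):
    """Return True if `release` is at least the release reported as fixed."""
    return release_tuple(release) >= release_tuple(fixed)


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints
    verdict = "at or after" if has_rdma_fix(running) else "older than"
    print("{} is {} {}".format(running, verdict, FIXED_RELEASE))
```

This only looks at the running kernel. It does not detect the kind of mismatch between installed kernel packages described above, which would need a separate look at the package database.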
Re: [OMPI users] no openmpi over IB on new CentOS 7 system
On that system please tell us what these return:

ibstat
ibstatus
sminfo
ibdiagnet

On Wed, 10 Oct 2018 at 12:49, John Hearns wrote:
>
> Noam, what does ompi_info say - specifically which BTLs are available?
> Stupid question though - this is a single system with no connection to a
> switch?
> You probably don't have an OpenSM subnet manager running then - could that be
> the root cause?
>
> On Wed, 10 Oct 2018 at 09:53, Dave Love wrote:
> >
> > RDMA was just broken in the last-but-one(?) RHEL7 kernel release, in
> > case that's the problem. (Fixed in 3.10.0-862.14.4.)
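A small sketch for gathering the output of the four commands requested above in one go, so it can be pasted into a reply. It assumes the tools are run with no arguments and simply skips any that are not on the PATH; the helper name `collect` and the timeout value are arbitrary choices, not anything the tools require.

```python
import shutil
import subprocess

# The four tools requested in the message above.
IB_COMMANDS = ["ibstat", "ibstatus", "sminfo", "ibdiagnet"]


def collect(commands, timeout=120):
    """Run each command with no arguments.

    Returns a dict mapping the command name to its combined stdout and
    stderr, or to None when the command is not installed.
    """
    results = {}
    for name in commands:
        path = shutil.which(name)
        if path is None:
            results[name] = None
            continue
        try:
            proc = subprocess.run(
                [path], capture_output=True, text=True, timeout=timeout
            )
            results[name] = proc.stdout + proc.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            results[name] = "failed to run: {}".format(exc)
    return results


if __name__ == "__main__":
    for name, output in collect(IB_COMMANDS).items():
        print("=====", name, "=====")
        print(output if output is not None else "(not found on PATH)")
```

Some of these tools may need root privileges to query the fabric, in which case the captured error text is still worth including in the report.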
Re: [OMPI users] no openmpi over IB on new CentOS 7 system
Noam, what does ompi_info say - specifically which BTLs are available?
Stupid question though - this is a single system with no connection to a switch?
You probably don't have an OpenSM subnet manager running then - could that be the root cause?

On Wed, 10 Oct 2018 at 09:53, Dave Love wrote:
>
> RDMA was just broken in the last-but-one(?) RHEL7 kernel release, in
> case that's the problem. (Fixed in 3.10.0-862.14.4.)
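To answer the "which BTLs are available" question quickly, the sketch below pulls the BTL component names out of ompi_info output. It assumes component lines look like "MCA btl: openib (MCA v2.1.0, ...)", which is how ompi_info normally prints them; the function name is invented for this example, and the regular expression may need adjusting if your build formats the output differently.

```python
import re
import shutil
import subprocess


def btl_components(ompi_info_text):
    """Return the BTL component names found in ompi_info output, in order."""
    names = []
    for line in ompi_info_text.splitlines():
        # Assumed line shape: "   MCA btl: <name> (MCA v..., API v..., ...)"
        match = re.match(r"\s*MCA btl:\s*(\S+)", line)
        if match:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    exe = shutil.which("ompi_info")
    if exe is None:
        print("ompi_info not found on PATH")
    else:
        text = subprocess.run([exe], capture_output=True, text=True).stdout
        print("BTLs:", ", ".join(btl_components(text)) or "(none)")
```

If openib is missing from the list, the build did not pick up verbs support at configure time, which is a different problem from openib being present but failing at run time.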
Re: [OMPI users] no openmpi over IB on new CentOS 7 system
RDMA was just broken in the last-but-one(?) RHEL7 kernel release, in case that's the problem. (Fixed in 3.10.0-862.14.4.)
Re: [OMPI users] no openmpi over IB on new CentOS 7 system
Noam,

Start with the FAQ, etc., under "Getting Help/Support" in the left-column menu at https://www.open-mpi.org/

Andy

*From:* Noam Bernstein
*Sent:* Tuesday, October 09, 2018 2:26PM
*To:* Open Mpi Users
*Cc:*
*Subject:* [OMPI users] no openmpi over IB on new CentOS 7 system

Hi - I’m trying to get OpenMPI working on a newly configured CentOS 7 system, and I’m not even sure what information would be useful to provide. I’m using the CentOS built-in libibverbs and/or libfabric, and I configure openmpi with just

--with-verbs --with-ofi --prefix=$DEST

I also tried --without-ofi, no change. Basically, I can run with “--mca btl self,vader”, but if I try “--mca btl,openib” I get an error from each process:

[compute-0-0][[24658,1],5][connect/btl_openib_connect_udcm.c:1245:udcm_rc_qp_to_rtr] error modifing QP to RTR errno says Invalid argument

If I don’t specify the btl it appears to try to set up openib with the same errors, then crashes on some free() related segfault, presumably when it tries to actually use vader.

The machine seems to be able to see its IB interface, as reported by things like ibstatus or ibv_devinfo. I’m not sure what else to look for. I also confirmed that “ulimit -l” reports unlimited.

Does anyone have any suggestions as to how to diagnose this issue?

thanks,
Noam
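On the “ulimit -l” check mentioned above: a short sketch that reads the same locked-memory limit from inside a process, using Python's standard resource module (Unix only). The helper name is made up. Note that the shell's ulimit -l reports 1024-byte units while getrlimit returns bytes, hence the division.

```python
import resource


def describe_memlock(limit):
    """Format a RLIMIT_MEMLOCK value the way `ulimit -l` would show it."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return "{} KiB".format(limit // 1024)


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    print("soft:", describe_memlock(soft), "hard:", describe_memlock(hard))
```

This reports the limit inherited by the Python process from whatever launched it. Processes started through ssh or a resource manager can end up with a different limit than an interactive shell, so it may be worth running the check the same way the MPI ranks are launched.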