Re: [OMPI users] no openmpi over IB on new CentOS 7 system

2018-10-10, Noam Bernstein
> On Oct 10, 2018, at 4:51 AM, Dave Love  wrote:
> 
> RDMA was just broken in the last-but-one(?) RHEL7 kernel release, in
> case that's the problem.  (Fixed in 3.10.0-862.14.4.)

I strongly suspect that this is it.  In the process of getting everything 
organized to collect the info various people suggested would be useful, I 
noticed some kernel package inconsistencies, and when I made them consistent by 
upgrading to 862.14, it started working.  If the problem comes back, I guess 
I’ll be back here, but for the moment it appears to be working.  Thanks to
everyone for the suggestions.

Noam
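Noam doesn't say which kernel packages were inconsistent. One way to look for that kind of mismatch, sketched in Python under the assumption of an RPM-based node such as CentOS 7: query the kernel-related packages with `rpm -q --qf` and flag any that have no installed build matching the running kernel. The `PACKAGES` list and the helper names are hypothetical; adjust them for your site.

```python
import platform
import shutil
import subprocess

# Hypothetical list of kernel-related packages to compare; adjust as needed.
PACKAGES = ["kernel", "kernel-devel", "kernel-headers", "kernel-tools"]


def parse_rpm_query(output: str) -> dict:
    """Parse 'NAME VERSION-RELEASE' lines into {name: {version-release, ...}}.

    Other lines, such as rpm's 'package X is not installed', are skipped.
    A set is used because the kernel package is often installed several
    times side by side.
    """
    versions = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            name, version = parts
            versions.setdefault(name, set()).add(version)
    return versions


def mismatched(versions: dict, running: str) -> list:
    """Sorted names of packages with no installed build equal to `running`."""
    return sorted(name for name, found in versions.items() if running not in found)


def running_version_release() -> str:
    """`uname -r` without the trailing architecture, e.g. '3.10.0-862.14.4.el7',
    which matches how rpm reports VERSION-RELEASE."""
    release, arch = platform.release(), platform.machine()
    suffix = "." + arch
    return release[: -len(suffix)] if release.endswith(suffix) else release


if __name__ == "__main__":
    if shutil.which("rpm") is None:
        print("rpm not found; this check assumes an RPM-based system")
    else:
        query = subprocess.run(
            ["rpm", "-q", "--qf", "%{NAME} %{VERSION}-%{RELEASE}\\n", *PACKAGES],
            stdout=subprocess.PIPE, universal_newlines=True,
        )
        running = running_version_release()
        for name in mismatched(parse_rpm_query(query.stdout), running):
            print(f"{name}: no installed build matches running kernel {running}")
```

The `kernel` package itself counts as matching as long as any installed build equals the running one, since several kernels are usually installed at once.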

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] no openmpi over IB on new CentOS 7 system

2018-10-10, John Hearns via users
On that system please tell us what these return:
ibstat
ibstatus
sminfo
ibdiagnet
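To gather everything John asks for in one go, a minimal Python sketch can run each of the four commands and keep going if one is missing or hangs. The `collect` helper and the 120-second timeout are assumptions for illustration; `ibdiagnet` sweeps the fabric, so it may be the slowest of the four.

```python
import shutil
import subprocess

# The diagnostics requested in the thread; not every node has all of them installed.
COMMANDS = ["ibstat", "ibstatus", "sminfo", "ibdiagnet"]


def collect(commands=COMMANDS, timeout=120) -> dict:
    """Run each command and return {command: combined stdout/stderr}.

    Missing commands and timeouts are recorded as text instead of raising,
    so one failure does not hide the rest of the report.
    """
    report = {}
    for cmd in commands:
        if shutil.which(cmd) is None:
            report[cmd] = "(not installed)"
            continue
        try:
            proc = subprocess.run(
                [cmd],
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,  # keep error messages in the report
                universal_newlines=True,
                timeout=timeout,
            )
            report[cmd] = proc.stdout
        except subprocess.TimeoutExpired:
            report[cmd] = f"(timed out after {timeout}s)"
    return report


if __name__ == "__main__":
    for cmd, output in collect().items():
        print(f"===== {cmd} =====")
        print(output)
```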

On Wed, 10 Oct 2018 at 12:49, John Hearns  wrote:
>
> Noam,  what does ompi_info say - specifically which BTLs are available?
> Stupid question though - this is a single system with no connection to a 
> switch?
> You probably don't have an OpenSM subnet manager running then - could that be
> the root cause?
>
> On Wed, 10 Oct 2018 at 09:53, Dave Love  wrote:
> >
> > RDMA was just broken in the last-but-one(?) RHEL7 kernel release, in
> > case that's the problem.  (Fixed in 3.10.0-862.14.4.)


Re: [OMPI users] no openmpi over IB on new CentOS 7 system

2018-10-10, John Hearns via users
Noam,  what does ompi_info say - specifically which BTLs are available?
Stupid question though - this is a single system with no connection to a switch?
You probably don't have an OpenSM subnet manager running then - could
that be the root cause?
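Both of John's questions can be answered on the node itself. The Python sketch below assumes that `ompi_info` prints one `MCA btl: <name> (...)` line per BTL component and that the kernel exposes each port's logical state under `/sys/class/infiniband/<device>/ports/<port>/state`; `available_btls` and `port_states` are hypothetical helper names. A port stuck in `INIT` rather than `ACTIVE` is the usual sign that no subnet manager has configured it.

```python
import re
import shutil
import subprocess
from pathlib import Path


def available_btls(ompi_info_output: str) -> list:
    """Return BTL component names from `ompi_info` output.

    Assumes lines of the form '   MCA btl: openib (MCA v2.1.0, ...)'.
    """
    return re.findall(r"^\s*MCA btl:\s*(\S+)", ompi_info_output, re.MULTILINE)


def port_states(sysfs_root: str = "/sys/class/infiniband") -> dict:
    """Map 'device/port' to the logical port state, e.g. '4: ACTIVE'.

    Returns an empty dict if the sysfs directory does not exist.
    """
    states = {}
    for state_file in sorted(Path(sysfs_root).glob("*/ports/*/state")):
        port_dir = state_file.parent            # .../<device>/ports/<port>
        device = port_dir.parent.parent.name    # <device>, e.g. mlx4_0
        states[f"{device}/{port_dir.name}"] = state_file.read_text().strip()
    return states


if __name__ == "__main__":
    if shutil.which("ompi_info"):
        info = subprocess.run(["ompi_info"], stdout=subprocess.PIPE,
                              universal_newlines=True).stdout
        print("BTLs:", ", ".join(available_btls(info)) or "none found")
    else:
        print("ompi_info not on PATH")
    for port, state in port_states().items():
        # Without a subnet manager, a cabled port normally stays in INIT.
        hint = "" if state.endswith("ACTIVE") else "  (not ACTIVE: is a subnet manager running?)"
        print(f"{port}: {state}{hint}")
```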

On Wed, 10 Oct 2018 at 09:53, Dave Love  wrote:
>
> RDMA was just broken in the last-but-one(?) RHEL7 kernel release, in
> case that's the problem.  (Fixed in 3.10.0-862.14.4.)


Re: [OMPI users] no openmpi over IB on new CentOS 7 system

2018-10-10, Dave Love
RDMA was just broken in the last-but-one(?) RHEL7 kernel release, in
case that's the problem.  (Fixed in 3.10.0-862.14.4.)
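Whether a node already has the fixed kernel can be checked by comparing the running release (`uname -r`) with the one Dave gives. A minimal Python sketch; `version_key` and `has_rdma_fix` are hypothetical helper names, and the numeric comparison assumes RHEL7-style `3.10.0-<build>` release strings, so it is only meaningful within that kernel series.

```python
import platform
import re

# The release Dave reports as carrying the RDMA fix.
FIXED_KERNEL = "3.10.0-862.14.4"


def version_key(release: str) -> tuple:
    """Turn '3.10.0-862.14.4.el7.x86_64' into (3, 10, 0, 862, 14, 4).

    Parsing stops at the first non-numeric field (here 'el7'), so the
    distribution and architecture suffixes are ignored.
    """
    fields = []
    for part in re.split(r"[.-]", release):
        if not part.isdigit():
            break
        fields.append(int(part))
    return tuple(fields)


def has_rdma_fix(release: str, fixed: str = FIXED_KERNEL) -> bool:
    """True if `release` is at least the kernel that carries the fix."""
    return version_key(release) >= version_key(fixed)


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    status = "includes" if has_rdma_fix(running) else "predates"
    print(f"running kernel {running} {status} the {FIXED_KERNEL} fix")
```

Comparing integer tuples avoids the plain string-comparison trap, where `862.9` would sort after `862.14`.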


Re: [OMPI users] no openmpi over IB on new CentOS 7 system

2018-10-09, Andy Riebs

Noam,

Start with the FAQ, etc., under "Getting Help/Support" in the 
left-column menu at https://www.open-mpi.org/


Andy


*From:* Noam Bernstein 
*Sent:* Tuesday, October 09, 2018 2:26PM
*To:* Open Mpi Users 
*Cc:*
*Subject:* [OMPI users] no openmpi over IB on new CentOS 7 system

Hi - I’m trying to get OpenMPI working on a newly configured CentOS 7 
system, and I’m not even sure what information would be useful to 
provide.  I’m using the CentOS built-in libibverbs and/or libfabric, and
I configure openmpi with just

--with-verbs --with-ofi --prefix=$DEST

I also tried --without-ofi, with no change.  Basically, I can run with
"--mca btl self,vader", but if I try "--mca btl openib" I get an error
from each process:

   [compute-0-0][[24658,1],5][connect/btl_openib_connect_udcm.c:1245:udcm_rc_qp_to_rtr]
   error modifing QP to RTR errno says Invalid argument

If I don’t specify the BTL, it appears to try to set up openib with the
same errors, then crashes with a free()-related segfault, presumably
when it actually tries to use vader.


The machine seems to be able to see its IB interface, as reported by 
things like ibstatus or ibv_devinfo.  I’m not sure what else to look 
for.  I also confirmed that “ulimit -l” reports unlimited.
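`ulimit -l` in a login shell shows only that shell's limit; processes started by `mpirun` (over ssh or through a resource-manager daemon) can inherit a different one. A minimal Python sketch that each MPI process can run to print the locked-memory limit it actually sees; the helper names and the script file name in the usage line are hypothetical.

```python
import resource
import socket


def describe(limit: int) -> str:
    """Format an RLIMIT_MEMLOCK value in KiB, the unit `ulimit -l` uses."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit // 1024} KiB"


def memlock_unlimited() -> bool:
    """True if the soft limit (the one `ulimit -l` prints) is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return soft == resource.RLIM_INFINITY


def memlock_report() -> str:
    """One line per process: host name plus soft and hard memlock limits."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return f"{socket.gethostname()}: memlock soft={describe(soft)} hard={describe(hard)}"


if __name__ == "__main__":
    print(memlock_report())
```

For example, `mpirun -np 2 python3 memlock_check.py` prints one line per process, showing the limits as the MPI processes see them rather than as the interactive shell does.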


Does anyone have any suggestions as to how to diagnose this issue?

thanks,
Noam

