RE: [ofa-general] Re: [ewg] /dev/infiniband/rdma_cm not created

2009-05-13 Thread Davis, Arlin R
 
FWIW, I see the following in /etc/infiniband/openibd.conf:


# Load RDMA_CM module
RDMA_CM_LOAD=yes


Is RDMA_UCM_LOAD=yes?
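Both keys can be checked in one pass. Below is a minimal sketch. It assumes openibd.conf contains plain, uncommented KEY=value lines like the ones quoted above. conf_value is a hypothetical helper, not part of OFED.

```shell
#!/bin/sh
# Hypothetical helper (not part of OFED).
# conf_value FILE KEY: print the value of the last uncommented KEY=value
# line in FILE. When a shell script sources such a file, a later
# assignment overrides an earlier one, hence "last line wins".
# Assumes unquoted values; quoted values are printed with their quotes.
conf_value() {
    sed -n "s/^[[:space:]]*$2=//p" "$1" | tail -n 1
}

# Example: report the two RDMA CM settings if the file exists.
conf=/etc/infiniband/openibd.conf
if [ -r "$conf" ]; then
    for key in RDMA_CM_LOAD RDMA_UCM_LOAD; do
        printf '%s=%s\n' "$key" "$(conf_value "$conf" "$key")"
    done
fi
```

A file that sets a key twice reports only the later value. This matches what the init script would see after sourcing the file.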

What do you see with modinfo rdma_cm rdma_ucm?

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ofa-general] Re: [ewg] /dev/infiniband/rdma_cm not created

2009-05-13 Thread Jeff Squyres

On May 13, 2009, at 3:03 PM, Davis, Arlin R wrote:


FWIW, I see the following in /etc/infiniband/openibd.conf:


# Load RDMA_CM module
RDMA_CM_LOAD=yes

Is RDMA_UCM_LOAD=yes?



Yes; sorry, I didn't see that one the first time around:

# Load RDMA_UCM module
RDMA_UCM_LOAD=yes


What do you see with modinfo rdma_cm rdma_ucm?


[r...@svbu-mpi055 ~]# modinfo rdma_cm rdma_ucm
filename:       /lib/modules/2.6.9-67.ELsmp/updates/kernel/drivers/infiniband/core/rdma_cm.ko
parm:           cma_response_timeout:CMA_CM_RESPONSE_TIMEOUT default=20
parm:           unify_tcp_port_space:Unify the host TCP and RDMA port space allocation (default=0)
parm:           tavor_quirk:Tavor performance quirk: limit MTU to 1K if > 0
license:        Dual BSD/GPL
description:    Generic RDMA CM Agent
author:         Sean Hefty
depends:        ib_addr,ib_cm,iw_cm,ib_core,ib_sa
vermagic:       2.6.9-67.ELsmp SMP gcc-3.4
filename:       /lib/modules/2.6.9-67.ELsmp/updates/kernel/drivers/infiniband/core/rdma_ucm.ko
license:        Dual BSD/GPL
description:    RDMA Userspace Connection Manager Access
author:         Sean Hefty
depends:        rdma_cm,ib_uverbs,ib_core,rdma_cm
vermagic:       2.6.9-67.ELsmp SMP gcc-3.4
[r...@svbu-mpi055 ~]#
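The depends: lines above list what each module needs. For rdma_ucm, that is rdma_cm, ib_uverbs and ib_core; rdma_cm appears twice in the output. Below is a minimal sketch that turns such a line into a module list and reports which modules are not present. It assumes a Linux 2.6 system, where loaded modules show up as directories under /sys/module. The helper names and the SYSFS_MODULE_DIR override are hypothetical. The override exists only so the check can be pointed at a test directory.

```shell
#!/bin/sh
# Hypothetical helper: unique_deps DEPENDS
# Split a modinfo "depends" value (comma-separated) into one module name
# per line, dropping empty entries and duplicates.
unique_deps() {
    printf '%s\n' "$1" | tr ',' '\n' | sed '/^$/d' | LC_ALL=C sort -u
}

# Hypothetical helper: missing_modules MODULE...
# Print each module that has no directory under the module sysfs dir
# (/sys/module by default). A directory there means the module is
# loaded (or built in).
missing_modules() {
    dir=${SYSFS_MODULE_DIR:-/sys/module}   # hypothetical override for testing
    for m in "$@"; do
        [ -d "$dir/$m" ] || printf '%s\n' "$m"
    done
}
```

On a live system with a modinfo that supports -F, a call like missing_modules $(unique_deps "$(modinfo -F depends rdma_ucm)") would print any dependency of rdma_ucm that is absent.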


--
Jeff Squyres
Cisco Systems



Re: [ofa-general] Re: [ewg] /dev/infiniband/rdma_cm not created

2009-05-13 Thread Jeff Squyres
Ok, I figured it out.  I have some creative
/etc/sysconfig/network-scripts/ifcfg-ib* scripts that may choose to do
nothing if no device is present (or if some other esoteric criterion
specific to Jeff's cluster is met); they call exit 0 in this case.
This apparently causes the top-level /etc/init.d/openibd to exit (!).
I've fixed this (they now never call exit), and now everything works
as expected.


Upon reflection, I can see that this was totally my fault -- ifcfg-*  
scripts are always sourced and should therefore never call exit.
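The failure mode is easy to reproduce. Below is a minimal sketch, assuming a POSIX sh. The "." builtin runs a file in the current shell, so an exit inside that file ends the shell that sourced it. The two fragments are stand-ins, not real ifcfg-ib* files. /nonexistent/device is a placeholder for "no device present".

```shell
#!/bin/sh
# Why a sourced file must not call "exit": the "." builtin runs the file
# in the *current* shell, so "exit" ends that shell too.

tmpdir=$(mktemp -d)

# Stand-in fragment that bails out early with exit (the problematic pattern).
cat > "$tmpdir/bad.conf" <<'EOF'
[ -e /nonexistent/device ] || exit 0
DEVICE=ib0
EOF

# The same logic written as a guard, so nothing ever exits.
cat > "$tmpdir/good.conf" <<'EOF'
if [ -e /nonexistent/device ]; then
    DEVICE=ib0
fi
EOF

# Mimic an init script: source the fragment, then carry on with its own
# work. The subshell keeps the demo itself from being killed.
run_caller() {
    ( . "$1"; echo "caller continued" )
}

bad_result=$(run_caller "$tmpdir/bad.conf")
good_result=$(run_caller "$tmpdir/good.conf")

echo "bad:  '${bad_result}'"    # empty: the sourced exit ended the caller
echo "good: '${good_result}'"   # the caller reached its next command

rm -rf "$tmpdir"
```

The guarded form keeps every code path inside the fragment. The caller therefore always regains control after the "." line.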


But given that /etc/init.d/openibd is sooo complex and has sooo many
moving parts, it would be nice if there were a way to track down
problems a little more easily; perhaps a verbose setting in
/etc/infiniband/openibd.conf, or some such.  Indeed, since OFED is
targeted at the datacenter, monitors attached to the servers in
question and/or serial consoles may not be readily available.  Hence,
the ability to send some verbose output to syslog during boot, for
example, might be quite useful to sysadmins and network admins when
troubleshooting.
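Below is a sketch of how such a switch might look. OPENIBD_VERBOSE is a hypothetical variable; nothing in this thread shows openibd.conf defining it. It would sit alongside the *_LOAD settings. vlog is a hypothetical helper the init script would call at each step. Its messages go to stderr, which is the console at boot. They also go to syslog through logger(1) under the tag openibd.

```shell
#!/bin/sh
# Hypothetical setting (not an existing OFED option); quiet by default.
OPENIBD_VERBOSE=${OPENIBD_VERBOSE:-no}

# Hypothetical helper: vlog MESSAGE...
# When OPENIBD_VERBOSE=yes, print MESSAGE to stderr and also hand it to
# syslog via logger(1) with the tag "openibd". Errors from logger (e.g.
# no syslog daemon running yet) are ignored so logging never breaks boot.
vlog() {
    [ "$OPENIBD_VERBOSE" = "yes" ] || return 0
    printf 'openibd: %s\n' "$*" >&2
    if command -v logger >/dev/null 2>&1; then
        logger -t openibd "$*" 2>/dev/null || :
    fi
    return 0
}
```

For example, placing vlog "loading rdma_ucm" next to each load step would leave a trail in syslog. That trail shows the last step reached before the script stopped, with no console needed.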


Just my $0.02.

Thanks for the tips on where to look, Woody!



On May 13, 2009, at 3:18 PM, Jeff Squyres (jsquyres) wrote:


On May 13, 2009, at 3:12 PM, Woodruff, Robert J wrote:

 Check to see if some other driver failed to load.
 I think I have seen before that if another driver
 fails to load, the start script bails out and
 does not load the other drivers.

 Perhaps try doing a /etc/init.d/openibd restart
 manually to see if something is failing to load.
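Woody's description suggests a start script that stops at the first module that fails to load. Below is a minimal sketch of the opposite policy: try every module and report all failures at the end. load_all is hypothetical and is not taken from openibd. The load command (normally modprobe) is passed in, so the logic can be exercised without touching the kernel.

```shell
#!/bin/sh
# Hypothetical helper: load_all CMD MODULE...
# Run "CMD MODULE" for every module, continuing past failures. Then print
# the failed ones to stderr and return 1 if any failed. CMD is normally
# modprobe. POSIX sh has no "local", so cmd, failed and m are ordinary
# (global) variables.
load_all() {
    cmd=$1
    shift
    failed=""
    for m in "$@"; do
        if ! "$cmd" "$m" >/dev/null 2>&1; then
            failed="$failed $m"
        fi
    done
    if [ -n "$failed" ]; then
        printf 'failed to load:%s\n' "$failed" >&2
        return 1
    fi
    return 0
}
```

A call such as load_all modprobe ib_core ib_uverbs rdma_cm rdma_ucm would attempt all four modules. It would then name every one that failed, rather than only the first.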


Weird -- doing it manually shows no problem:

[r...@svbu-mpi055 ~]# /etc/init.d/openibd restart
Unloading HCA driver:  [  OK  ]
Loading HCA driver and Access Layer:   [  OK  ]
Setting up InfiniBand network interfaces:
Bringing up interface ib0: [  OK  ]
Bringing up interface ib1: [  OK  ]
Setting up service network . . .   [  done  ]
[r...@svbu-mpi055 ~]# ls -l /dev/infiniband/rdma_cm
crw-rw-rw-  1 root root 10, 62 May 13 12:17 /dev/infiniband/rdma_cm
[r...@svbu-mpi055 ~]#
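The ls -l check above can be scripted as an unattended boot-time sanity check. Below is a minimal sketch. check_char_dev is a hypothetical helper, and its path defaults to the one in this thread. It only tests whether the path exists and is a character device (the "c" in crw-rw-rw- above).

```shell
#!/bin/sh
# Hypothetical helper: check_char_dev [PATH]
# Print "ok: PATH" if PATH exists and is a character device; otherwise
# print "missing: PATH" and return 1.
check_char_dev() {
    dev=${1:-/dev/infiniband/rdma_cm}
    if [ -c "$dev" ]; then
        printf 'ok: %s\n' "$dev"
    else
        printf 'missing: %s\n' "$dev"
        return 1
    fi
}
```

Combined with logger, a line like check_char_dev || logger -t openibd "rdma_cm device node not created" would record the problem in syslog. No one needs to be at the console.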

Something must be going wrong during bootup.  Unfortunately, I'm
several thousand miles from the server and don't have a serial
console.  I guess I'll insert some initlog calls into /etc/init.d/openibd...

--
Jeff Squyres
Cisco Systems

___
general mailing list
gene...@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



--
Jeff Squyres
Cisco Systems
