Michael,
> > The ib_mthca module now initializes correctly on
> > both EM64T machines. I noticed some discussion between you and Roland about
> > making the parameter "fw_cmd_doorbell=0" the default. Did this
> > occur in RC5?
>
> Yes, we changed fw_cmd_doorbell to 0 by default for now because it seemed
> safer. I expect if you load mthca with fw_cmd_doorbell=1 you still get an
> error, isn't that right?
>
Although the RC5 change to fw_cmd_doorbell *seemed* to allow the ib_mthca module to initialize, I don't think I am out of the woods yet on this particular machine. The link never comes up, and the other machine, which is connected back to back with this one and is where I am trying to run OpenSM, gets no response to its MAD packets. When I try to shut down the openib stack with the "/etc/init.d/openibd stop" script, the processes hang while trying to set device "ib0" down. Here is an excerpt from a terminal session:
[jatoba] (ib) ib> ibstat
CA 'mthca0'
    CA type: MT25204
    Number of ports: 1
    Firmware version: 1.0.800
    Hardware version: a0
    Node GUID: 0x0002c90200216e40
    System image GUID: 0x0002c90200216e43
    Port 1:
        State: Initializing
        Physical state: LinkUp
        Rate: 20
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x02510a68
        Port GUID: 0x0002c90200216e41
[jatoba] (ib) ib> ibstatus
Infiniband device 'mthca0' port 1 status:
    default gid: fe80:0000:0000:0000:0002:c902:0021:6e41
    base lid: 0x0
    sm lid: 0x0
    state: 2: INIT
    phys state: 5: LinkUp
    rate: 20 Gb/sec (4X DDR)
[jatoba] (ib) ib> /etc/init.d/opensmd status
opensm is stopped
[jatoba] (ib) ib> /etc/init.d/openibd status
HCA driver loaded
Configured devices:
ib0
Currently active devices:
ib0
The following modules are also loaded:
ib_cm
[jatoba] (ib) ib> /etc/init.d/openibd stop
At this point the command hangs. Doing a "ps -ef" from another terminal reveals:
root 6882 6755 0 15:31 pts/0 00:00:00 /bin/bash /etc/init.d/openibd stop
root 7012 6882 0 15:31 pts/0 00:00:00 /bin/bash /sbin/ifdown ib0
root 7031 7012 0 15:31 pts/0 00:00:00 ip link set dev ib0 down
I tried attaching gdb to process 7031 to see its stack, but that hung too, as did an attempt to check the status of the interface with "/sbin/ifconfig".
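One way to check whether the stuck processes are in uninterruptible sleep, without attaching a debugger, might be to read their entries under /proc. The following is only a minimal sketch, assuming a Linux /proc filesystem that provides /proc/<pid>/stat and /proc/<pid>/wchan; the helper names parse_stat_state and describe are made up for illustration, and the wchan value may be uninformative depending on how the kernel was built.

```python
import sys


def parse_stat_state(stat_line):
    """Return (comm, state) parsed from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST closing parenthesis.
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    comm = stat_line[lparen + 1:rparen]
    # The first field after the command name is the one-letter state
    # (R = running, S = sleeping, D = uninterruptible sleep, ...).
    state = stat_line[rparen + 1:].split()[0]
    return comm, state


def describe(pid):
    """Return (comm, state, wchan) for a PID (hypothetical helper)."""
    with open(f"/proc/{pid}/stat") as f:
        comm, state = parse_stat_state(f.read())
    try:
        # wchan names the kernel function the task is waiting in, if known.
        with open(f"/proc/{pid}/wchan") as f:
            wchan = f.read().strip() or "?"
    except OSError:
        wchan = "?"
    return comm, state, wchan


if __name__ == "__main__":
    # PIDs are taken from the command line, e.g. 6882 7012 7031.
    for arg in sys.argv[1:]:
        print(arg, *describe(int(arg)))
```

Run as "python procstate.py 6882 7012 7031" (the script name is hypothetical), this would print one line per PID. A state of D would indicate uninterruptible sleep inside the kernel, which would be consistent with gdb hanging on attach.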
It is rather difficult for me to debug this sort of hang, since I telecommute from Tucson and the machines are located in Phoenix. Anyone have any suggestions?
-Don Albert-
_______________________________________________ openib-general mailing list [email protected] http://openib.org/mailman/listinfo/openib-general
