Hello Troy,

this morning I looked in detail at the problem you reported on Oct 10 via the OpenIB mailing list [1]. It seems that the kernel panic is an IPoIB issue.
[1]: http://openib.org/pipermail/openib-general/2005-October/012353.html

The following things happen:

1. modprobe hcad_mod ehca_nr_ports=1
   The eHCA InfiniBand Device Driver is loaded.

2. modprobe ib_mad
   The ib_mad stack creates an AQP1. This starts the port activation process. By my count it takes more than 110 / 120 seconds to activate a port. Our device driver gets a timeout, which means that the port is NOT active, and ib_modify_qp will not work (for any QP, no matter whether it was created in the ib_mad stack or in the ib_ipoib stack).

3. modprobe ib_ipoib
   All resources for IPoIB are allocated (CQ, QPs, MR, etc.)

4. A user runs ifconfig ib0 xxx.xxx.xxx.xxx, which executes the following functions: ipoib_open -> ipoib_ib_dev_open -> ipoib_qp_create. The user should see the following error message:

   l2:/home/schickhj/ibt/linstack/ehca2/ehca2 # ifconfig ib0 192.168.8.8
   SIOCSIFFLAGS: Invalid argument

5. The function ipoib_qp_create modifies the QP from Reset -> Init -> RTR -> RTS. If one of these three ib_modify_qp calls fails, the IPoIB QP (priv->qp) is destroyed (by the ipoib_qp_create error routine / out_fail) and priv->qp will be NULL.
   --> see /src/linux-kernel/infiniband/ulp/ipoib/ipoib_verbs.c, function ipoib_qp_create

6. A user runs (again) ifconfig ib0 xxx.xxx.xxx.xxx, which executes (again) the following functions: ipoib_open -> ipoib_ib_dev_open -> ipoib_qp_create

7. ipoib_qp_create wants to modify the IPoIB QP (priv->qp), which is NULL because the QP was destroyed earlier by the error handling routine in ipoib_qp_create (see 5.)

I think this error could also show up on Mellanox based IB cards when ib_modify_qp fails in ipoib_qp_create.

In dmesg you should see:

(see 1.)
eHCA Infiniband Device Driver (Rel.: )
xics_enable_irq: irq=9029: ibm_int_on returned fffffffd
eHCA Infiniband Device Driver (Rel.: )

(see 2.)
PU0000 000b0078:ehca_define_sqp HCAD_ERROR Port 1 is not active.
PU0000 000b0387:ehca_create_qp HCAD_ERROR ehca_define_sqp() failed rc=ffffffffffffffff
PU0000 000b03ae:ehca_create_qp <<< failed ret=ffffffea
ib_mad: Couldn't create ib_mad QP1
ib_mad: Couldn't open ehca0 port 1
PU0001 00060103:ehca_parse_ec EHCA port 1 is available.
PU0000 000b00bd:plpar_hcall_7arg_7ret HCAD_ERROR HCALL77_IN r3=168 r4=1001000503000004 r5=200100000000002c r6=8a40000000000000 3ed48000 r8=0 r9=0 r10=0
PU0000 000b00c4:plpar_hcall_7arg_7ret HCAD_ERROR HCALL77_OUT r3=ffffffffffffffd3 r4=0 r5=0 r6=0 r7=4 r8=0 r9=800000000005aa18 r10=0

(see 4.)
PU0000 000b0564:internal_modify_qp HCAD_ERROR hipz_h_modify_qp() failed rc=ffffffffffffffd3 ehca_qp=c000000003ba4e00 qp_num=2c
ib0: failed to modify QP to init, ret = -22
ib0: ipoib_qp_create returned -22

Kind regards,

Heiko Joerg Schick

IBM Deutschland Entwicklung GmbH
I/Ox Microcode Development
Linux Infiniband Device Drivers
Schoenaicher Str. 220
71032 Boeblingen
E-Mail: [EMAIL PROTECTED]
External: 49-7031-16-0 x4219, t/l: 120-4219
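
P.S. To make steps 5 to 7 easier to follow, here is a minimal user-space sketch in C of the pattern I believe is involved. It is not the actual IPoIB code: fake_qp, fake_priv, fake_modify_qp and fake_qp_create are hypothetical stand-ins for the kernel structures and functions, and the -ENODEV return value is only my assumption of how a guard might report the problem. The sketch shows the error path freeing the QP and setting the pointer to NULL, and a NULL check on entry that would turn the second ifconfig into a clean error instead of a panic.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins; NOT the real struct ib_qp / ipoib_dev_priv. */
struct fake_qp { int transitions; };
struct fake_priv { struct fake_qp *qp; };

/* Simulates ib_modify_qp: fails with -EINVAL (-22) while the port is
 * not active, otherwise counts one successful state transition. */
static int fake_modify_qp(struct fake_qp *qp, int port_active)
{
    if (!port_active)
        return -EINVAL;
    qp->transitions++;
    return 0;
}

/* Simulates the flow described for ipoib_qp_create: three transitions
 * (Reset -> Init -> RTR -> RTS). On failure the QP is destroyed and
 * priv->qp becomes NULL, as in step 5. */
static int fake_qp_create(struct fake_priv *priv, int port_active)
{
    int i, ret;

    /* Assumed guard: without this check, a second call after a failure
     * would dereference a NULL QP (step 7). */
    if (!priv->qp)
        return -ENODEV;

    for (i = 0; i < 3; i++) {
        ret = fake_modify_qp(priv->qp, port_active);
        if (ret) {
            free(priv->qp);   /* error path: destroy the QP */
            priv->qp = NULL;
            return ret;
        }
    }
    return 0;
}
```

Calling fake_qp_create twice with an inactive port returns -EINVAL the first time and, thanks to the guard, -ENODEV the second time. Whether the real fix should be such a check, or re-creating the QP on every open, is a design question for the IPoIB maintainers.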
