On Thu, Oct 20, 2005 at 03:32:13PM -0700, Roland Dreier wrote:
>     Troy> There is some sort of strange initializiation error going on here..
>
> Yes, very strange.  Can you add
>
> 	printk(KERN_ERR "hca->node_type = %d\n", hca->node_type);
>
> to the beginning of ipoib_add_port(), and
>
> 	printk(KERN_ERR "dev->ib_dev.node_type = %d\n", dev->ib_dev.node_type);
>
> right before the call to ib_register_device() in
> mthca_register_device() and send the output that you get when hotplug
> loads ib_mthca vs. when you load ib_mthca by hand?

When loaded at boot:

[586811.915831] ib_mthca: Mellanox InfiniBand HCA driver v0.06 (June 23, 2005)
[586811.915849] ib_mthca: Initializing 0000:d9:00.0
[586811.916634] PCI: Enabling device: (0000:d9:00.0), cmd 142
[586818.501595] openafs: module license 'http://www.openafs.org/dl/license10.html' taints kernel.
[586818.504651] Found system call table at 0xc000000000013e68 (scan: close+ioctl)
[586818.520240] Starting AFS cache scan...Memory cache: Allocating 12500 dcache entries...found 0 non-empty cache files (0%).
[586875.848354] afs: Lost contact with volume location server 147.155.137.10 in cell scl.ameslab.gov
[586875.848374] afs: Lost contact with volume location server 147.155.137.10 in cell scl.ameslab.gov
[587154.758768] hca->node_type = 236
[587154.760578] hca->node_type = 236
[587154.761511] hca->node_type = 236
[587154.761572] mthca0: ib_query_pkey port 3 failed (ret = -22)
[587154.761584] hca->node_type = 236
[587154.761633] mthca0: ib_query_pkey port 4 failed (ret = -22)
[587154.761644] hca->node_type = 236
[587154.762506] hca->node_type = 236
[587154.763422] hca->node_type = 236
[587154.763480] mthca0: ib_query_pkey port 7 failed (ret = -22)
[587154.763491] hca->node_type = 236
[587154.763542] mthca0: ib_query_pkey port 8 failed (ret = -22)
[587154.763553] hca->node_type = 236
[587154.765698] hca->node_type = 236
[587154.767136] hca->node_type = 236
[587154.767312] mthca0: ib_query_pkey port 11 failed (ret = -22)
[587154.767324] hca->node_type = 236
[587154.767455] mthca0: ib_query_pkey port 12 failed (ret = -22)
[587154.767471] hca->node_type = 236
[587154.769140] hca->node_type = 236
[587154.772116] hca->node_type = 236
[587154.772180] mthca0: ib_query_pkey port 15 failed (ret = -22)
[587154.772192] hca->node_type = 236
[587154.772243] mthca0: ib_query_pkey port 16 failed (ret = -22)
[587154.772255] hca->node_type = 236
[587154.773401] hca->node_type = 236
[587154.776817] hca->node_type = 236
[587154.776974] mthca0: ib_query_pkey port 19 failed (ret = -22)
[587154.776986] hca->node_type = 236
[587154.778179] mthca0: ib_query_pkey port 20 failed (ret = -22)
[587154.778198] hca->node_type = 236
[587154.780159] hca->node_type = 236
[587154.785406] hca->node_type = 236
[587154.785512] mthca0: ib_query_pkey port 23 failed (ret = -22)
[587154.785523] hca->node_type = 236
[587154.785582] mthca0: ib_query_pkey port 24 failed (ret = -22)
[587154.785599] hca->node_type = 236
[587154.789427] hca->node_type = 236
[587154.794314] hca->node_type = 236
[587154.794458] mthca0: ib_query_pkey port 27 failed (ret = -22)
[587154.794474] hca->node_type = 236
[587154.794634] mthca0: ib_query_pkey port 28 failed (ret = -22)
[587154.794646] hca->node_type = 236
[587154.797133] hca->node_type = 236
[587154.803507] hca->node_type = 236
[587154.803597] mthca0: ib_query_pkey port 31 failed (ret = -22)
[587154.803608] hca->node_type = 236
[587154.803667] mthca0: ib_query_pkey port 32 failed (ret = -22)
[587154.803679] hca->node_type = 236
[587154.820947] hca->node_type = 236
[587154.829795] hca->node_type = 236
[587154.831921] mthca0: ib_query_pkey port 35 failed (ret = -22)
[587154.831934] hca->node_type = 236
[587154.834932] mthca0: ib_query_pkey port 36 failed (ret = -22)
[587154.834946] hca->node_type = 236
[587154.844314] hca->node_type = 236
[587154.853591] hca->node_type = 236
[587154.853680] mthca0: ib_query_pkey port 39 failed (ret = -22)
[587154.853692] hca->node_type = 236
[587154.853745] mthca0: ib_query_pkey port 40 failed (ret = -22)
[587154.853761] hca->node_type = 236
[587154.869483] hca->node_type = 236
[587154.874749] hca->node_type = 236
[587154.874952] mthca0: ib_query_pkey port 43 failed (ret = -22)
[587154.874969] hca->node_type = 236
[587154.875609] mthca0: ib_query_pkey port 44 failed (ret = -22)
[587154.875624] hca->node_type = 236
[587154.894612] hca->node_type = 236
[587154.908058] hca->node_type = 236
[587154.909244] mthca0: ib_query_pkey port 47 failed (ret = -22)
[587154.909261] hca->node_type = 236
[587154.909323] mthca0: ib_query_pkey port 48 failed (ret = -22)
[587154.909334] hca->node_type = 236
[587154.918749] hca->node_type = 236
[587154.939629] hca->node_type = 236
[587154.939729] mthca0: ib_query_pkey port 51 failed (ret = -22)
[587154.939745] hca->node_type = 236
[587154.939866] mthca0: ib_query_pkey port 52 failed (ret = -22)
[587154.939883] hca->node_type = 236
[587154.957219] hca->node_type = 236
[587154.971523] hca->node_type = 236
[587154.971643] mthca0: ib_query_pkey port 55 failed (ret = -22)
[587154.971664] hca->node_type = 236
[587154.972717] mthca0: ib_query_pkey port 56 failed (ret = -22)
[587154.972733] hca->node_type = 236
[587154.984707] hca->node_type = 236
[587154.999129] hca->node_type = 236
[587154.999963] mthca0: ib_query_pkey port 59 failed (ret = -22)
[587154.999976] hca->node_type = 236
[587155.000264] mthca0: ib_query_pkey port 60 failed (ret = -22)
[587155.000282] hca->node_type = 236
[587155.012766] hca->node_type = 236
[587155.041105] hca->node_type = 236
[587155.041178] mthca0: ib_query_pkey port 63 failed (ret = -22)
[587155.041189] hca->node_type = 236
[587155.041319] mthca0: ib_query_pkey port 64 failed (ret = -22)
[587155.041332] hca->node_type = 236
[587155.066730] hca->node_type = 236
[587155.077348] hca->node_type = 236
[587155.077576] mthca0: ib_query_pkey port 67 failed (ret = -22)
[587155.077593] hca->node_type = 236
[587155.077883] mthca0: ib_query_pkey port 68 failed (ret = -22)
[587155.077896] hca->node_type = 236
[587155.097490] hca->node_type = 236
[587155.117809] hca->node_type = 236
[587155.117946] mthca0: ib_query_pkey port 71 failed (ret = -22)
[587155.117962] hca->node_type = 236
[587155.118016] mthca0: ib_query_pkey port 72 failed (ret = -22)
[587155.118031] hca->node_type = 236
[587155.138066] hca->node_type = 236
[587155.170056] hca->node_type = 236
[587155.170137] mthca0: ib_query_pkey port 75 failed (ret = -22)
[587155.170153] hca->node_type = 236
[587155.170213] mthca0: ib_query_pkey port 76 failed (ret = -22)
[587155.170225] hca->node_type = 236
[587155.205813] hca->node_type = 236
[587155.238014] hca->node_type = 236
[587155.238154] mthca0: ib_query_pkey port 79 failed (ret = -22)
[587155.238168] hca->node_type = 236
[587155.238242] mthca0: ib_query_pkey port 80 failed (ret = -22)
[587155.238256] hca->node_type = 236
[587155.266483] hca->node_type = 236
[587155.381938] hca->node_type = 236
[587155.382011] mthca0: ib_query_pkey port 83 failed (ret = -22)
[587155.382027] hca->node_type = 236
[587155.382113] mthca0: ib_query_pkey port 84 failed (ret = -22)
[587155.382125] hca->node_type = 236
[587155.418259] hca->node_type = 236
[587155.457782] hca->node_type = 236
[587155.457870] mthca0: ib_query_pkey port 87 failed (ret = -22)
[587155.457886] hca->node_type = 236
[587155.457953] mthca0: ib_query_pkey port 88 failed (ret = -22)
[587155.457966] hca->node_type = 236
[587155.477128] hca->node_type = 236
[587155.501172] hca->node_type = 236
[587155.501235] mthca0: ib_query_pkey port 91 failed (ret = -22)
[587155.501245] hca->node_type = 236
[587155.501312] mthca0: ib_query_pkey port 92 failed (ret = -22)
[587155.501323] hca->node_type = 236
[587155.580150] hca->node_type = 236
[587155.611763] hca->node_type = 236
[587155.611842] mthca0: ib_query_pkey port 95 failed (ret = -22)
[587155.611855] hca->node_type = 236
[587155.611913] mthca0: ib_query_pkey port 96 failed (ret = -22)
[587155.611929] hca->node_type = 236
[587155.663057] hca->node_type = 236
[587155.692342] hca->node_type = 236
[587155.692482] mthca0: ib_query_pkey port 99 failed (ret = -22)
[587155.692494] hca->node_type = 236
[587155.692554] mthca0: ib_query_pkey port 100 failed (ret = -22)
[587155.692572] hca->node_type = 236
[587155.759843] hca->node_type = 236
[587155.808226] hca->node_type = 236
[587155.808297] mthca0: ib_query_pkey port 103 failed (ret = -22)
[587155.808317] hca->node_type = 236
[587155.808370] mthca0: ib_query_pkey port 104 failed (ret = -22)
[587155.808383] hca->node_type = 236
[587155.847076] hca->node_type = 236
[587155.870709] hca->node_type = 236
[587155.870781] mthca0: ib_query_pkey port 107 failed (ret = -22)
[587155.870797] hca->node_type = 236
[587155.870857] mthca0: ib_query_pkey port 108 faile6
[587155.986258] mthca0: ib_query_pkey port 111 failed (ret = -22)
[587155.986269] hca->node_type = 236
[587155.986338] mthca0: ib_query_pkey port 112 failed (ret = -22)
[587155.986353] hca->node_type = 236
[587156.020368] hca->node_type = 236
[587156.068549] hca->node_type = 236
[587156.068626] mthca0: ib_query_pkey port 115 failed (ret = -22)
[587156.068643] hca->node_type = 236
[587156.068700] mthca0: ib_query_pkey port 116 failed (ret = -22)
[587156.068719] hca->node_type = 236

p5l1:~#
p5l1:~#
p5l1:~#
p5l1:~# # reload......
p5l1:~#
p5l1:~# rmmod ib_ipoib
p5l1:~# rmmod ib_mad
ERROR: Module ib_mad is in use by ib_sa,ib_mthca
p5l1:~# rmmod ib_sa
p5l1:~# rmmod ib_mthca
p5l1:~# rmmod ib_mad
p5l1:~# rmmod ib_core
p5l1:~#
p5l1:~# modprobe ib_mthca
p5l1:~# modprobe <kernel panics here>.

[587324.500037] ib_mthca: Mellanox InfiniBand HCA driver v0.06 (June 23, 2005)
[587324.500056] ib_mthca: Initializing 0000:d9:00.0
[587325.778913] dev->ib_dev.node_type = 1
[587330.812591] Oops: Kernel access of bad area, sig: 7 [#1]
[587330.812605] SMP NR_CPUS=8 NUMA PSERIES LPAR
[587330.812618] Modules linked in: ib_mthca ib_mad ib_core openafs
[587330.812637] NIP: D0000000098BF558 XER: 2000000B LR: C000000000057B2C CTR: D0000000098BF4F0
[587330.812653] REGS: c0000001e3fb3490 TRAP: 0300 Tainted: P (2.6.13.3-power5)
[587330.812669] MSR: 8000000000009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11 CR: 28000084
[587330.812682] DAR: d000010082187a04 DSISR: 0000000040000000
[587330.812694] TASK: c0000003dbf4d640[0] 'swapper' THREAD: c0000001e3fb0000 CPU: 5
[587330.812708] GPR00: 0000000000000010 C0000001E3FB3710 D0000000098D64C0 D000010082187A04
[587330.812729] GPR04: 0000000000000008 000000010003727D 0000000000000000 00000000000007D0
[587330.812748] GPR08: C0000001E3E08910 0000000000000000 C0000001E3FB3840 D000010082187A04
[587330.812770] GPR12: 0000000048000082 C0000000004BEC00 0000000000000000 000000000FA8536C
[587330.812790] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[587330.812809] GPR20: 0000000000000000 C0000000005F7ED8 C0000000005F7F40 C000000000606500
[587330.812830] GPR24: C0000001EAE84498 C0000001E3FB3840 C0000001E3FB0000 C0000001E3E08000
[587330.812852] GPR28: 0000000000000100 C0000001E3E08000 D0000000098D4E40 0000000000000000
[587330.812875] NIP [d0000000098bf558] .poll_catas+0x68/0x2f0 [ib_mthca]
[587330.812914] LR [c000000000057b2c] .run_timer_softirq+0x15c/0x260
[587330.812932] Call Trace:
[587330.812940] [c0000001e3fb3710] [c0000001e3fb37d0] 0xc0000001e3fb37d0 (unreliable)
[587330.812959] [c0000001e3fb37d0] [c000000000057b2c] .run_timer_softirq+0x15c/0x260
[587330.812979] [c0000001e3fb3890] [c000000000051e68] .__do_softirq+0xe8/0x1c0
[587330.812997] [c0000001e3fb3950] [c000000000051fc4] .do_softirq+0x84/0x90
[587330.813016] [c0000001e3fb39d0] [c0000000000108f0] .timer_interrupt+0xd0/0x410
[587330.813036] [c0000001e3fb3ad0] [c00000000000a2b4] decrementer_common+0xb4/0x100
[587330.813052] --- Exception: 901 at .pseries_dedicated_idle+0x108/0x280
[587330.813071] LR = .pseries_dedicated_idle+0x1e0/0x280
[587330.813083] [c0000001e3fb3e90] [c00000000000f460] .cpu_idle+0x40/0x60
[587330.813101] [c0000001e3fb3f00] [c000000000032fa0] .start_secondary+0x120/0x150
[587330.813120] [c0000001e3fb3f90] [c00000000000ba7c] .enable_64b_mode+0x0/0x28
[587330.813136] Instruction dump:
[587330.813144] 3be00000 48000020 2fab0000 381f0001 7c1f07b4 409e0058 801d0908 7f9f0040
[587330.813169] 409c00c8 e97d08f8 7be91764 7c6b4a14 <7c001c2c> 0c000000 4c00012c 780b0020
[587330.813193] <0>Kernel panic - not syncing: Fatal exception in interrupt
[587330.813208]
