-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Right. I've run this machine and HCA with both the OpenIB and MTI stacks just fine in the past, with the OS in a much more vanilla configuration. It's a dual Opteron with 8GB RAM and a PCI-X MT23108. The same problem also happens with this kernel on fairly different hardware we're using, though.
It is Fedora Core 1 with a vanilla 2.4.24-based kernel plus the Lustre 1.2.6 patches/mods. Almost nothing in the kernel is modular; each option is either off or compiled in. In fact, ACPI is turned off. Perhaps enabling it would be beneficial? I have attached the config file in case that helps. Perhaps I have unknowingly disabled something critical.
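A quick way to check the "unknowingly disabled" worry is to scan the kernel config for the ACPI and APIC options. The following is only a minimal sketch: the symbol names `CONFIG_ACPI`, `CONFIG_X86_LOCAL_APIC`, and `CONFIG_X86_IO_APIC` are assumptions about which options matter here, and the helper `config_flags` is hypothetical, not part of any OpenIB or Mellanox tooling.

```python
import re


def config_flags(config_text, names):
    """Return {name: value} for the requested CONFIG_ symbols.

    The value is the text after '=' (e.g. 'y' or 'm'), 'n' for an explicit
    '# CONFIG_FOO is not set' line, or None if the symbol never appears.
    """
    found = {name: None for name in names}
    for line in config_text.splitlines():
        line = line.strip()
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if m and m.group(1) in found:
            found[m.group(1)] = m.group(2)
            continue
        m = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if m and m.group(1) in found:
            found[m.group(1)] = "n"
    return found


if __name__ == "__main__":
    # Made-up sample text; in practice read the real .config file instead.
    sample = "CONFIG_SMP=y\n# CONFIG_ACPI is not set\nCONFIG_X86_LOCAL_APIC=y\n"
    wanted = ["CONFIG_ACPI", "CONFIG_X86_LOCAL_APIC", "CONFIG_X86_IO_APIC"]
    for name, value in config_flags(sample, wanted).items():
        print(name, value)
```

A symbol reported as `None` is simply absent from the file, which is different from one that was explicitly turned off.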
Another, fairly naive question: at what point are the Lion Cub (PCI Express) cards supported in the OpenIB stack? I seem to remember the Tavor code supporting them inherently, but inefficiently unless native code was used.
Ken
Tziporet Koren wrote:
| The problem is that the driver does not get the interrupt for the command
| completion, and thus you get the error: "Command not completed after timeout".
|
| It is related to the OS & system you are using. What is the distribution you
| are using? We once saw such problems with older versions of SuSE.
|
| Try to add append="acpi=off" to the lilo you are using or add also
| disableapic in the same append line.
|
| Tziporet
|
| -----Original Message-----
| From: Ken MacInnis [mailto:[EMAIL PROTECTED]
| Sent: Sunday, October 31, 2004 8:20 PM
| To: [EMAIL PROTECTED]
| Subject: [openib-general] Problem with 2.4.24 and gen1
|
| I've got a fairly modified kernel here I'm trying to get a OpenIB stack
| running on. It's a vanilla 2.4.24 kernel with Lustre and other patches
| in it, but I'm seeing this when I modprobe ib_tavor:
|
| Oct 31 13:13:05 samwise kernel: THH(1): cmdif.c[1190]: Command not completed after timeout: cmd=TAVOR_IF_CMD_MAD_IFC (0x24), token=0x1400, pid=0x8E1, go=0
| Oct 31 13:13:05 samwise kernel: THH(1): CMD ERROR DUMP. opcode=0x24, opc_mod = 0x1, exec_time_micro=300000000
| .
| .
| Oct 31 13:13:06 samwise kernel: THH(1): cmdif.c[842]: Failed command 0x24 (TAVOR_IF_CMD_MAD_IFC): status=0x103 (0x0103 - unexpected error - fatal)
| Oct 31 13:13:06 samwise kernel:
| Oct 31 13:13:06 samwise kernel: THH(1): thh_hob.c[2790]: THH_hob_query_port_prop: cmdif returned FATAL
| Oct 31 13:13:06 samwise kernel: VIPKL(1): qpm.c[278]: QPM_new: HOBKL_query_port_prop returned with error: -254 = VAPI_EFATAL
| Oct 31 13:13:06 samwise kernel: VIPKL(1): qpm.c[302]: QPM_new: returned with error: -254 = VAPI_EFATAL
| Oct 31 13:13:06 samwise kernel: THH(1): thh_hob.c[3474]: THH_hob_fatal_err_thread: RECEIVED FATAL ERROR WAKEUP
| Oct 31 13:13:06 samwise kernel: THH(1): thh_hob.c[4490]: THH_hob_halt_hca: HALT HCA returned 0x103
| Oct 31 13:13:06 samwise kernel: THH(1): thh_hob.c[1620]: THH_hob_destroy: FATAL ERROR
| Oct 31 13:13:06 samwise kernel: THH(1): thh_hob.c[1627]: THH_hob_destroy: PERFORMING SW RESET. pa=0xFE9F0010 va=0xF8A01010
| Oct 31 13:13:06 samwise kernel:
| Oct 31 13:13:06 samwise kernel: Mellanox Tavor Device Driver is creating device "InfiniHost0" (bus=04, devfn=00)
| Oct 31 13:13:06 samwise kernel:
| Oct 31 13:13:06 samwise kernel: [KERNEL_IB][_tsIbTavorInitOne][tavor_main.c:86]InfiniHost0: VAPI_open_hca failed, status -254 (Fatal error (Local Catastrophic Error))
| Oct 31 13:13:06 samwise kernel: [SRPTP][srp_host_init][srp_host.c:1495]SRP Host using indirect addressing
|
| This occurs with an older openib rev (200-ish) as well as one up-to-date
| as of today.
|
| Everything else (modules.conf, etc.) is set up as it has been when I was
| messing with 2.4 kernels and OpenIB a few months ago, so I'm not
| thinking it's related to such.
|
| Any ideas? Yes, I know it's 2.4 as well as a fairly older 2.4, but I
| have no choice here. :) lspci -vvv bits follow.
|
| 03:01.0 PCI bridge: Mellanox Technology: Unknown device 5a46 (rev a1) (prog-if 00 [Normal decode])
|     Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
|     Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
|     Latency: 64, cache line size 10
|     Bus: primary=03, secondary=04, subordinate=04, sec-latency=64
|     I/O behind bridge: 0000f000-00000fff
|     Memory behind bridge: fe700000-fe9fffff
|     Prefetchable memory behind bridge: 00000000eb200000-00000000fc200000
|     BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
|     Capabilities: [70] PCI-X non-bridge device.
|         Command: DPERE+ ERO+ RBC=0 OST=4
|         Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
|
| 04:00.0 InfiniBand: Mellanox Technology: Unknown device 5a44 (rev a1)
|     Subsystem: Mellanox Technology: Unknown device 5a44
|     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
|     Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
|     Latency: 64, cache line size 10
|     Interrupt: pin A routed to IRQ 25
|     Region 0: Memory at fe900000 (64-bit, non-prefetchable) [size=1M]
|     Region 2: Memory at fb800000 (64-bit, prefetchable) [size=8M]
|     Region 4: Memory at f0000000 (64-bit, prefetchable) [size=128M]
|     Capabilities: [40] #11 [001f]
|     Capabilities: [60] Message Signalled Interrupts: 64bit+ Queue=0/5 Enable-
|         Address: 0000000000000000 Data: 0000
|     Capabilities: [70] PCI-X non-bridge device.
|         Command: DPERE- ERO- RBC=3 OST=1
|         Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
|
| Ken
|
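
A rough way to test the missing-interrupt explanation quoted above is to compare the counter for the HCA's IRQ (25 in the lspci output) in /proc/interrupts before and after the modprobe. The sketch below is only an illustration under stated assumptions: the helpers `irq_counts` and `interrupts_advanced` are hypothetical, and it assumes the usual /proc/interrupts layout of an IRQ label, a colon, then one count per CPU.

```python
def irq_counts(interrupts_text, irq):
    """Sum the per-CPU counts on one IRQ line of /proc/interrupts text.

    Returns None if no line for that IRQ exists.
    """
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or label.strip() != str(irq):
            continue
        total = 0
        for field in rest.split():
            if not field.isdigit():
                break  # reached the controller/device name columns
            total += int(field)
        return total
    return None


def interrupts_advanced(before_text, after_text, irq):
    """True if the IRQ count grew between two snapshots, None if not comparable."""
    before = irq_counts(before_text, irq)
    after = irq_counts(after_text, irq)
    if before is None or after is None:
        return None
    return after > before


if __name__ == "__main__":
    # Snapshot the file now; take a second snapshot after loading the driver
    # and pass both to interrupts_advanced(..., 25).
    with open("/proc/interrupts") as f:
        print(irq_counts(f.read(), 25))
```

If the count stays flat while the driver is waiting on a command, that would be consistent with the interrupt never being delivered, though it does not by itself say whether ACPI or APIC routing is at fault.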
- --
Ken MacInnis - Systems Engineer, PSC - http://www.psc.edu/~kcm/
kcm at psc dot edu - +1 412 268 9833 (w) - +1 412 268 5832 (f)
Pittsburgh Supercomputing Center - 4400 Fifth Ave - Pittsburgh, PA 15213
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (MingW32)

iD8DBQFBhi9mnT0C17PQhv4RAvckAKComYvuQ8dZ+B3tZBuBvkH6q+MDSgCfe3Bz
DtsqzV39ekgtfzWIGx6vNzk=
=zkFD
-----END PGP SIGNATURE-----
_______________________________________________
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general
