Hi Ira.

Ira Weiny wrote:
> I got the following error running with OFED 1.1 on a modified 2.6.9 RHEL4
> kernel. Hal mentioned that there might be a catastrophic error recovery patch
> submitted since then? I can't find a mention of that in the mailing list. If
> possible I would like to try such a patch.
>
> Thanks,
> Ira
>
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0: Catastrophic error detected: unknown error
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0:   buf[00]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0:   buf[01]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0:   buf[02]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0:   buf[03]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0:   buf[04]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0:   buf[05]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0:   buf[06]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0:   buf[07]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0:   buf[08]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0:   buf[09]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0:   buf[0a]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0:   buf[0b]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0:   buf[0c]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0:   buf[0d]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0:   buf[0e]: ffffffff
> 2006-10-17 21:31:47 ib_mthca 0000:07:00.0:   buf[0f]: ffffffff
>
> # rhea277 /root
> /sbin/lspci -vv -s 07:00.0
> 07:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex (rev 20)
>     Subsystem: Mellanox Technologies MT25208 InfiniHost III Ex
>     Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
>     Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>     Interrupt: pin A routed to IRQ 217
>     Region 0: Memory at dff00000 (64-bit, non-prefetchable) [disabled] [size=1M]
>     Region 2: Memory at de800000 (64-bit, prefetchable) [disabled] [size=8M]
>     Capabilities: [40] Power Management version 2
>         Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>         Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>     Capabilities: [48] Vital Product Data
>     Capabilities: [90] Message Signalled Interrupts: 64bit+ Queue=0/5 Enable-
>         Address: 0000000000000000  Data: 0000
>     Capabilities: [84] MSI-X: Enable- Mask- TabSize=32
>         Vector table: BAR=0 offset=00082000
>         PBA: BAR=0 offset=00082200
>     Capabilities: [60] Express Endpoint IRQ 0
>         Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
>         Device: Latency L0s <64ns, L1 unlimited
>         Device: AtnBtn- AtnInd- PwrInd-
>         Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
>         Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>         Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
>         Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 8
>         Link: Latency L0s unlimited, L1 unlimited
>         Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
>         Link: Speed 2.5Gb/s, Width x8
>
Can you please give me some information on how you got this error:

* What did you do that caused this error?
* Which FW version do you have?
* What is the board_id of the HCA? (You can find this information using ibv_devinfo.)

Thanks,
Dotan
