Re: [PATCH v5 00/21] EEH reorganization

2012-04-16 Thread Gavin Shan
I just hit this on mainline from today (3.4.0-rc2-00065-gf549e08). Haven't had a chance to narrow it down yet. Thanks for the information. I'll try to reproduce the issue on Firebird-L today. By the way, it seems that mstmread is some user-level application accessing the config space while the

Re: [PATCH v5 00/21] EEH reorganization

2012-04-16 Thread Anton Blanchard
Hi, Thanks for the information. I'll try to reproduce the issue on Firebird-L today. By the way, it seems that mstmread is some user-level application accessing the config space while the problem happened? The EEH error is caused by the Mellanox firmware tools. It seems the crash was caused

Re: [PATCH v5 00/21] EEH reorganization

2012-04-16 Thread Benjamin Herrenschmidt
On Tue, 2012-04-17 at 11:37 +1000, Anton Blanchard wrote: No. I replaced that backtrace in eeh_dn_check_failure with a WARN_ON() because the backtrace doesn't give us enough info. I'm submitting a patch for that today. Bottom line is mstmread has been causing an EEH error since at least

Re: [PATCH v5 00/21] EEH reorganization

2012-04-16 Thread Gavin Shan
Ben, thanks a lot for the backtrace, which helped narrow down the root cause. Also, thanks a lot for showing how to parse the backtrace and register stuff printed by the oops ;-) Finally, I successfully reproduced the issue on a Firebird-L machine without loading the corresponding device driver for Emulex

Re: [PATCH v5 00/21] EEH reorganization

2012-04-12 Thread Anton Blanchard
Hi Gavin, This series of patches is going to reorganize EEH so that it can support multiple platforms in the future. The requirements were raised from the aspects. I just hit this on mainline from today (3.4.0-rc2-00065-gf549e08). Haven't had a chance to narrow it down yet. Oops: Kernel

Re: [PATCH v5 00/21] EEH reorganization

2012-04-12 Thread Anton Blanchard
Hi, I just hit this on mainline from today (3.4.0-rc2-00065-gf549e08). Haven't had a chance to narrow it down yet. Looking closer, it was caused by an EEH error at boot. It looks like the Mellanox InfiniBand card gets an error when probed by their firmware tool (mstmread), but only if the

Re: [PATCH v5 00/21] EEH reorganization

2012-02-28 Thread Gavin Shan
Hi Ben, Could you please take a look at this when you have time? Thanks, Gavin This series of patches is going to reorganize EEH so that it can support multiple platforms in the future. The requirements were raised from the following aspects. * The original EEH implementation only supports pSeries