Re: MCE Bug?

2015-06-18 Thread Borislav Petkov
On Wed, Jun 17, 2015 at 11:53:53PM +, Luck, Tony wrote:
> > if you want to give those changes a run, I've uploaded them here:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git#tip-ras
>
> Latest experiments show that sometimes checking keventd_up() before calling

Re: MCE bug?

2015-06-18 Thread Borislav Petkov
On Thu, Jun 18, 2015 at 05:18:45PM +0800, Rui Wang wrote:
> I see a different panic with this kernel. Not seen every time.
> It was after reboot due to injected errors.
Yeah, we did debug a bit last night with Tony - this is all in the workqueue code which we're apparently calling too early into.

Re: MCE bug?

2015-06-18 Thread Rui Wang
> On Wed, Jun 17, 2015 at 11:41:56AM +0200, Borislav Petkov wrote:
>> And I was waiting in line to get a chance to do some injection on our
>> EINJ box here too. But it seems you have the required setup already so
>> if you want to give those changes a run, I've uploaded them here:
>>

RE: MCE Bug?

2015-06-17 Thread Luck, Tony
> if you want to give those changes a run, I've uploaded them here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git#tip-ras
Latest experiments show that sometimes checking keventd_up() before calling schedule_work() helps ... but mostly only when I fake some early logs from low

Re: MCE Bug?

2015-06-17 Thread Luck, Tony
On Wed, Jun 17, 2015 at 11:41:56AM +0200, Borislav Petkov wrote:
> And I was waiting in line to get a chance to do some injection on our
> EINJ box here too. But it seems you have the required setup already so
> if you want to give those changes a run, I've uploaded them here:
>

Re: MCE Bug?

2015-06-17 Thread Borislav Petkov
Hi Rui,
On Wed, Jun 17, 2015 at 02:04:56AM +, Wang, Rui Y wrote:
> Is it a known problem? I'm based on Linux 4.1.0-rc3-7.
Yeah, I triggered a similar thing in conjunction with testing Gong's stuff:
https://lkml.kernel.org/r/20150611082747.ga30...@pd.tnic
And I was waiting in line to get a

MCE bug?

2015-06-16 Thread Rui Wang
Hi Boris & Tony,
While injecting MCEs using einj, I encountered a panic:
[0.305697] mce: CPU supports 22 MCE banks
[0.310288] BUG: unable to handle kernel NULL pointer dereference at 0100
[0.319057] IP: []
