Re: [Xen-devel] [BUG] xen-mceinj tool testing cause dom0 crash

Hao, Xudong Fri, 03 Nov 2017 01:30:33 -0700

> -----Original Message-----
> From: Jan Beulich [mailto:[email protected]]
> Sent: Wednesday, May 24, 2017 2:25 PM
> To: Hao, Xudong <[email protected]>
> Cc: Julien Grall <[email protected]>; George Dunlap
> <[email protected]>; Lars Kurth <[email protected]>; Zhang,
> Haozhong <[email protected]>; [email protected]
> Subject: RE: [Xen-devel] [BUG] xen-mceinj tool testing cause dom0 crash
> 
> >>> On 24.05.17 at 07:32, <[email protected]> wrote:
> >>  -----Original Message-----
> >> From: Xen-devel [mailto:[email protected]] On Behalf Of
> >> Hao, Xudong
> >> Sent: Tuesday, May 23, 2017 5:34 PM
> >> To: Jan Beulich <[email protected]>
> >> Cc: Lars Kurth <[email protected]>; Julien Grall
> >> <[email protected]>; George Dunlap <[email protected]>;
> >> Zhang, Haozhong <[email protected]>; [email protected]
> >> Subject: Re: [Xen-devel] [BUG] xen-mceinj tool testing cause dom0
> >> crash
> >>
> >> > -----Original Message-----
> >> > From: Xen-devel [mailto:[email protected]] On Behalf
> >> > Of Jan Beulich
> >> > Sent: Tuesday, May 23, 2017 12:06 AM
> >> > To: Hao, Xudong <[email protected]>
> >> > Cc: Lars Kurth <[email protected]>; Julien Grall
> >> > <[email protected]>; [email protected]; George Dunlap
> >> > <[email protected]>; Zhang, Haozhong
> >> > <[email protected]>
> >> > Subject: Re: [Xen-devel] [BUG] xen-mceinj tool testing cause dom0
> >> > crash
> >> >
> >> > >>> On 22.05.17 at 10:39, <[email protected]> wrote:
> >> > > (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
> >> >
> >> > Not this - Xen is unavoidably going to go down in such a case, yet
> >> > your log has no hint at all what kind of problem Dom0 experienced
> >> > (e.g. whether one of the injected #MC-s caused this).
> >> >
> >>
> >> Jan,
> >> The first mail attached the complete log from Xen booting, hope there
> >> is some hint from the full log.
> >>
> >> > > (XEN) ----[ Xen-4.9-rc  x86_64  debug=y   Tainted: MCE  ]----
> >> > > (XEN) CPU:    0
> >> > > (XEN) RIP:    e008:[<0000000065eb1e13>] 0000000065eb1e13
> >> > > ...
> >> > > (XEN) Pagetable walk from 00000000682ab009:
> >> > > (XEN)  L4[0x000] = 000000102c961063 ffffffffffffffff
> >> > > (XEN)  L3[0x001] = 000000005f812063 ffffffffffffffff
> >> > > (XEN)  L2[0x141] = 0000000000000000 ffffffffffffffff
> >> >
> >> > Here you're apparently hitting a firmware bug: While RIP points
> >> > into runtime services memory, CR2 doesn't:
> >> >
> >> > (XEN)  0000065eb8000-00000682acfff type=0 attr=000000000000000f
> >> >
> >> > You may try working around this via one of "reboot=acpi" or
> >> > "efi=no-rs" on the hypervisor command line.
> >> >
> >>
> >> Will try them.
> >>
> >
> > Neither "reboot=acpi" nor "efi=no-rs" can work around this issue.
> 
> Apparently I didn't express myself clearly enough: These workarounds were
> supposed to help with the Xen crash, not the Dom0 one. And as your logs prove
> they did fulfill that purpose. Yet still there are no Dom0 log messages at all
> near the crash, which leaves open whether there is a completely silent path in
> its MCE handling, or whether some messages simply don't make it through. Right
> now I can't see any Xen side of the issue here though, so from a 4.9
> perspective I think we're fine.
>


We figured out the problem: some corner-case scripts triggered the error
injection on the same page (pfn 0x180020) twice, i.e. "./xen-mceinj -t 0" was
run more than once, which resulted in the Dom0 crash.

Let's close this bug thread. Sorry for the invalid report, and thanks to Jan
for the analysis.
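For anyone reading this thread later, the page-table walk in the quoted log
can be checked by hand. With x86-64 4-level paging, bits 47:39, 38:30, 29:21
and 20:12 of a virtual address select the L4, L3, L2 and L1 entries, so the
faulting address 0x682ab009 gives L4[0x000], L3[0x001], L2[0x141], matching
the log. The same address falls inside the type=0 (EfiReservedMemoryType)
range 0x65eb8000-0x682acfff, whose attr 0xf carries only cacheability bits and
not EFI_MEMORY_RUNTIME (bit 63). That is consistent with Jan's point that CR2
is outside runtime services memory. Below is a minimal C sketch of both checks;
the helper names are hypothetical, and it assumes the end address in Xen's
memory-map dump is inclusive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86-64 4-level paging: every level indexes 512 entries (9 address bits).
 * level 4 -> bits 47:39, 3 -> 38:30, 2 -> 29:21, 1 -> 20:12. */
static unsigned int pt_index(uint64_t va, unsigned int level)
{
    assert(level >= 1 && level <= 4);
    unsigned int shift = 12 + 9 * (level - 1);
    return (unsigned int)((va >> shift) & 0x1ff);
}

/* Hypothetical helper: inclusive range check, assuming the "start-end"
 * form printed in Xen's EFI memory map dump has an inclusive end. */
static bool in_range(uint64_t addr, uint64_t start, uint64_t end_inclusive)
{
    return addr >= start && addr <= end_inclusive;
}

/* EFI_MEMORY_RUNTIME is bit 63 of the UEFI memory descriptor attribute. */
static bool efi_attr_is_runtime(uint64_t attr)
{
    return (attr >> 63) & 1;
}
```

Feeding the CR2 value 0x682ab009 and the attr 0xf from the log into these
helpers reproduces the indices above and shows the region is not flagged as
runtime.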


Thanks,
-Xudong


_______________________________________________
Xen-devel mailing list
[email protected]
https://lists.xen.org/xen-devel