> -----Original Message-----
> From: Jan Beulich [mailto:[email protected]]
> Sent: Wednesday, May 24, 2017 2:25 PM
> To: Hao, Xudong <[email protected]>
> Cc: Julien Grall <[email protected]>; George Dunlap
> <[email protected]>; Lars Kurth <[email protected]>; Zhang,
> Haozhong <[email protected]>; [email protected]
> Subject: RE: [Xen-devel] [BUG] xen-mceinj tool testing cause dom0 crash
>
> >>> On 24.05.17 at 07:32, <[email protected]> wrote:
> >> -----Original Message-----
> >> From: Xen-devel [mailto:[email protected]] On Behalf Of
> >> Hao, Xudong
> >> Sent: Tuesday, May 23, 2017 5:34 PM
> >> To: Jan Beulich <[email protected]>
> >> Cc: Lars Kurth <[email protected]>; Julien Grall
> >> <[email protected]>; George Dunlap <[email protected]>;
> >> Zhang, Haozhong <[email protected]>; [email protected]
> >> Subject: Re: [Xen-devel] [BUG] xen-mceinj tool testing cause dom0
> >> crash
> >>
> >> > -----Original Message-----
> >> > From: Xen-devel [mailto:[email protected]] On Behalf
> >> > Of Jan Beulich
> >> > Sent: Tuesday, May 23, 2017 12:06 AM
> >> > To: Hao, Xudong <[email protected]>
> >> > Cc: Lars Kurth <[email protected]>; Julien Grall
> >> > <[email protected]>; [email protected]; George Dunlap
> >> > <[email protected]>; Zhang, Haozhong
> >> > <[email protected]>
> >> > Subject: Re: [Xen-devel] [BUG] xen-mceinj tool testing cause dom0
> >> > crash
> >> >
> >> > >>> On 22.05.17 at 10:39, <[email protected]> wrote:
> >> > > (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
> >> >
> >> > Not this - Xen is unavoidably going to go down in such a case, yet
> >> > your log has no hint at all what kind of problem Dom0 experienced
> >> > (e.g. whether one of the injected #MC-s caused this).
> >> >
> >>
> >> Jan,
> >> The first mail attached the complete log from Xen booting, hope there
> >> is some hint from the full log.
> >>
> >> > > (XEN) ----[ Xen-4.9-rc x86_64 debug=y Tainted: MCE ]----
> >> > > (XEN) CPU: 0
> >> > > (XEN) RIP: e008:[<0000000065eb1e13>] 0000000065eb1e13
> >> > > ...
> >> > > (XEN) Pagetable walk from 00000000682ab009:
> >> > > (XEN) L4[0x000] = 000000102c961063 ffffffffffffffff
> >> > > (XEN) L3[0x001] = 000000005f812063 ffffffffffffffff
> >> > > (XEN) L2[0x141] = 0000000000000000 ffffffffffffffff
> >> >
> >> > Here you're apparently hitting a firmware bug: While RIP points
> >> > into runtime services memory, CR2 doesn't:
> >> >
> >> > (XEN) 0000065eb8000-00000682acfff type=0 attr=000000000000000f
> >> >
> >> > You may try working around this via one of "reboot=acpi" or
> >> > "efi=no-rs" on the hypervisor command line.
> >> >
> >>
> >> Will try them.
> >>
> >
> > Neither "reboot=acpi" nor "efi=no-rs" can work around this issue.
>
> Apparently I didn't express myself clearly enough: These workarounds were
> supposed to help with the Xen crash, not the Dom0 one. And as your logs prove
> they did fulfill that purpose. Yet still there are no Dom0 log messages at
> all near the crash, which leaves open whether there is a completely silent
> path in its MCE handling, or whether some messages simply don't make it
> through. Right now I can't see any Xen side of the issue here though, so
> from a 4.9 perspective I think we're fine.
>
We figured out the problem: some corner-case scripts triggered the error injection on the same page (pfn 0x180020) twice, i.e. "./xen-mceinj -t 0" was run more than once, which resulted in the Dom0 crash.

Let's close this bug thread. Sorry for the invalid report, and thanks to Jan for the analysis.

Thanks,
-Xudong
