On 02/22/2017 at 02:20 AM, Luck, Tony wrote:
>> It's from my understanding, I didn't get the explicit description from the
>> intel SDM on this point.
>> If a broadcast SRAO comes on real hardware, will MSR_IA32_MCG_STATUS of each
>> cpu have MCG_STATUS_RIPV bit set?
> MCG_STATUS is a per-thread
> It's from my understanding, I didn't get the explicit description from the
> intel SDM on this point.
> If a broadcast SRAO comes on real hardware, will MSR_IA32_MCG_STATUS of each
> cpu have MCG_STATUS_RIPV bit set?
MCG_STATUS is a per-thread MSR and will contain the status appropriate for
On 02/17/2017 at 05:07 PM, Borislav Petkov wrote:
> On Fri, Feb 17, 2017 at 09:53:21AM +0800, Xunlei Pang wrote:
>> It changes the value of cpu_online_mask/etc which will cause confusion to
>> vmcore analysis.
> Then export the crashing_cpu variable, initialize it to something
> invalid in the
On Fri, Feb 17, 2017 at 09:53:21AM +0800, Xunlei Pang wrote:
> It changes the value of cpu_online_mask/etc which will cause confusion to
> vmcore analysis.
Then export the crashing_cpu variable, initialize it to something
invalid in the first kernel, -1 for example, and test it in the #MC
On 02/16/2017 at 08:22 PM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 07:52:09PM +0800, Xunlei Pang wrote:
>> then mce will be broadcast to the other cpus which are still running
>> in the first kernel(i.e. looping in crash_nmi_callback).
> Simple: the crash code should really mark
On Thu, Feb 16, 2017 at 07:52:09PM +0800, Xunlei Pang wrote:
> then mce will be broadcast to the other cpus which are still running
> in the first kernel(i.e. looping in crash_nmi_callback).
Simple: the crash code should really mark CPUs as not being online:
void do_machine_check(struct
On 02/16/2017 at 06:18 PM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 01:36:37PM +0800, Xunlei Pang wrote:
>> I tried to use qemu to inject SRAO("mce -b 0 0 0xb100 0x5 0x0
>> 0x0"),
>> it works well in 1st kernel, but it doesn't work for 1st kernel after kdump
>> boots(seems
>>
On Thu, Feb 16, 2017 at 01:36:37PM +0800, Xunlei Pang wrote:
> I tried to use qemu to inject SRAO("mce -b 0 0 0xb100 0x5 0x0
> 0x0"),
> it works well in 1st kernel, but it doesn't work for 1st kernel after kdump
> boots(seems
> the cpus remain in 1st kernel don't respond to the
On 01/26/2017 at 02:44 PM, Borislav Petkov wrote:
> On Thu, Jan 26, 2017 at 02:30:02PM +0800, Xunlei Pang wrote:
>> The hardware machine check is hard to reproduce, but the mce code of
>> RHEL7 is quite the same as that of tip/master, anyway we are able to
>> inject software mce to reproduce it.
>
On Thu, Jan 26, 2017 at 02:30:02PM +0800, Xunlei Pang wrote:
> The hardware machine check is hard to reproduce, but the mce code of
> RHEL7 is quite the same as that of tip/master, anyway we are able to
> inject software mce to reproduce it.
Please give me your exact steps so that I can try to
On 01/24/2017 at 08:22 PM, Borislav Petkov wrote:
> On Tue, Jan 24, 2017 at 09:27:45AM +0800, Xunlei Pang wrote:
>> It occurred on real hardware when testing crash dump.
>>
>> 1) SysRq-c was injected for the test in 1st kernel
>> [ 49.897279] SysRq : Trigger a crash
>> 2) The 2nd kernel started for
On Tue, Jan 24, 2017 at 09:27:45AM +0800, Xunlei Pang wrote:
> It occurred on real hardware when testing crash dump.
>
> 1) SysRq-c was injected for the test in 1st kernel
> [ 49.897279] SysRq : Trigger a crash
> 2) The 2nd kernel started for kdump
> [ 0.00] Command line:
On 01/23/2017 at 10:50 PM, Borislav Petkov wrote:
> On Mon, Jan 23, 2017 at 09:35:53PM +0800, Xunlei Pang wrote:
>> One possible timing sequence would be:
>> 1st kernel running on multiple cpus panicked
>> then the crash dump code starts
>> the crash dump code stops the other cpus except the
On 01/24/2017 at 02:14 AM, Borislav Petkov wrote:
> On Mon, Jan 23, 2017 at 10:01:53AM -0800, Luck, Tony wrote:
>> will ignore the machine check on the other cpus ... assuming
>> that "cpu_is_offline(smp_processor_id())" does the right thing
>> in the kexec case where this is an "old" cpu that
On 01/24/2017 at 09:46 AM, Xunlei Pang wrote:
> On 01/24/2017 at 01:51 AM, Borislav Petkov wrote:
>> Hey Tony,
>>
>> a "welcome back" is in order? :-)
>>
>> On Mon, Jan 23, 2017 at 09:40:09AM -0800, Luck, Tony wrote:
>>> If the system had experienced some memory corruption, but
>>> recovered ...
On 01/24/2017 at 01:51 AM, Borislav Petkov wrote:
> Hey Tony,
>
> a "welcome back" is in order? :-)
>
> On Mon, Jan 23, 2017 at 09:40:09AM -0800, Luck, Tony wrote:
>> If the system had experienced some memory corruption, but
>> recovered ... then there would be some pages sitting around
>> that
On Mon, Jan 23, 2017 at 10:01:53AM -0800, Luck, Tony wrote:
> will ignore the machine check on the other cpus ... assuming
> that "cpu_is_offline(smp_processor_id())" does the right thing
> in the kexec case where this is an "old" cpu that isn't online
> in the new kernel.
Nice. And kdump did do
On Mon, Jan 23, 2017 at 06:51:30PM +0100, Borislav Petkov wrote:
> Hey Tony,
>
> a "welcome back" is in order? :-)
Yes - first day back today. Lots of catching up to do.
> And apparently crash knows about poisoned pages and handles them:
>
> static int __init crash_save_vmcoreinfo_init(void)
>
Hey Tony,
a "welcome back" is in order? :-)
On Mon, Jan 23, 2017 at 09:40:09AM -0800, Luck, Tony wrote:
> If the system had experienced some memory corruption, but
> recovered ... then there would be some pages sitting around
> that the old kernel had marked as POISON and stopped using.
> The
On Mon, Jan 23, 2017 at 03:50:56PM +0100, Borislav Petkov wrote:
> On Mon, Jan 23, 2017 at 09:35:53PM +0800, Xunlei Pang wrote:
> > One possible timing sequence would be:
> > 1st kernel running on multiple cpus panicked
> > then the crash dump code starts
> > the crash dump code stops the other
On Mon, Jan 23, 2017 at 09:35:53PM +0800, Xunlei Pang wrote:
> One possible timing sequence would be:
> 1st kernel running on multiple cpus panicked
> then the crash dump code starts
> the crash dump code stops the other cpus except the crashing one
> 2nd kernel boots up on the crash cpu with
On 01/23/2017 at 08:51 PM, Borislav Petkov wrote:
> On Mon, Jan 23, 2017 at 04:01:51PM +0800, Xunlei Pang wrote:
>> We met an issue for kdump: after kdump kernel boots up,
>> and there comes a broadcasted mce in first kernel, the
> How does that even happen?
>
> Lemme try to understand this
On Mon, Jan 23, 2017 at 04:01:51PM +0800, Xunlei Pang wrote:
> We met an issue for kdump: after kdump kernel boots up,
> and there comes a broadcasted mce in first kernel, the
How does that even happen?
Lemme try to understand this correctly: the first kernel gets an
MCE, kdump starts and boots
23 matches