Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-02-21 Thread Xunlei Pang
On 02/22/2017 at 02:20 AM, Luck, Tony wrote:
>> This is based on my understanding; I didn't find an explicit description
>> of this point in the Intel SDM.
>> If a broadcast SRAO comes on real hardware, will MSR_IA32_MCG_STATUS of
>> each CPU have the MCG_STATUS_RIPV bit set?
> MCG_STATUS is a per-thread MSR and will contain the status appropriate for
> that thread when #MC is delivered.
> So the RIPV bit will be set if, and only if, the thread saved a valid
> return address for this exception. The net result is that it is almost
> always set for "innocent bystander" CPUs that were dragged into the
> exception handler because of a broadcast #MC. We make the test because if
> it isn't set, then do_machine_check() had better not return, because we
> have no idea where it will return to, since there is no valid return IP.
>

Got it, thanks for the details.

Regards,
Xunlei


RE: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-02-21 Thread Luck, Tony
> This is based on my understanding; I didn't find an explicit description
> of this point in the Intel SDM.
> If a broadcast SRAO comes on real hardware, will MSR_IA32_MCG_STATUS of
> each CPU have the MCG_STATUS_RIPV bit set?

MCG_STATUS is a per-thread MSR and will contain the status appropriate for
that thread when #MC is delivered.
So the RIPV bit will be set if, and only if, the thread saved a valid return
address for this exception. The net result is that it is almost always set
for "innocent bystander" CPUs that were dragged into the exception handler
because of a broadcast #MC. We make the test because if it isn't set, then
do_machine_check() had better not return, because we have no idea where it
will return to, since there is no valid return IP.

-Tony
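Tony's rule can be expressed as a pure predicate over the MCG_STATUS value. This is a minimal user-space sketch, not kernel code: `bystander_may_return()` is a hypothetical name, and it assumes, as in the Intel SDM, that RIPV is bit 0 of IA32_MCG_STATUS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: RIPV ("restart IP valid") is bit 0 of IA32_MCG_STATUS. */
#define MCG_STATUS_RIPV (1ULL << 0)

/*
 * Hypothetical helper (not a kernel function): may a CPU that was dragged
 * into a broadcast #MC as an "innocent bystander" clear MCG_STATUS and
 * return? Only if RIPV is set, i.e. the saved return IP is valid.
 */
static bool bystander_may_return(uint64_t mcgstatus)
{
	return (mcgstatus & MCG_STATUS_RIPV) != 0;
}
```

A value of 0x4 (MCIP only) yields false: without a valid return IP the handler must not simply return.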


Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-02-17 Thread Xunlei Pang
On 02/17/2017 at 05:07 PM, Borislav Petkov wrote:
> On Fri, Feb 17, 2017 at 09:53:21AM +0800, Xunlei Pang wrote:
>> It changes the value of cpu_online_mask etc., which would confuse vmcore
>> analysis.
> Then export the crashing_cpu variable, initialize it to something
> invalid in the first kernel, -1 for example, and test it in the #MC
> handler like this:
>
> 	int cpu;
>
> 	...
>
> 	cpu = smp_processor_id();
>
> 	if (cpu_is_offline(cpu) ||
> 	    ((crashing_cpu != -1) && (crashing_cpu != cpu))) {
> 		u64 mcgstatus;
>
> 		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
> 		if (mcgstatus & MCG_STATUS_RIPV) {
> 			mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
> 			return;
> 		}
> 	}

Yes, that is doable. I will do some tests later.

>> Moreover, for the code (see the inline comment):
>>
>> if (cpu_is_offline(smp_processor_id())) {
>> 	u64 mcgstatus;
>>
>> 	mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
>> 	/*
>> 	 * This condition may not be true: the MCE triggered on the kdump
>> 	 * CPU doesn't need to have this bit set for the other CPUs that
>> 	 * remain in the 1st kernel.
>> 	 */
>> 	if (mcgstatus & MCG_STATUS_RIPV) {
> Is this on kvm or on a real hardware? Because for kvm I don't care. And
> don't say "theoretically".
>

This is based on my understanding; I didn't find an explicit description of
this point in the Intel SDM.
If a broadcast SRAO comes on real hardware, will MSR_IA32_MCG_STATUS of each
CPU have the MCG_STATUS_RIPV bit set?

Regards,
Xunlei


Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-02-17 Thread Borislav Petkov
On Fri, Feb 17, 2017 at 09:53:21AM +0800, Xunlei Pang wrote:
> It changes the value of cpu_online_mask etc., which would confuse vmcore
> analysis.

Then export the crashing_cpu variable, initialize it to something
invalid in the first kernel, -1 for example, and test it in the #MC
handler like this:

	int cpu;

	...

	cpu = smp_processor_id();

	if (cpu_is_offline(cpu) ||
	    ((crashing_cpu != -1) && (crashing_cpu != cpu))) {
		u64 mcgstatus;

		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
		if (mcgstatus & MCG_STATUS_RIPV) {
			mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
			return;
		}
	}

> Moreover, for the code (see the inline comment):
>
> if (cpu_is_offline(smp_processor_id())) {
> 	u64 mcgstatus;
>
> 	mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
> 	/*
> 	 * This condition may not be true: the MCE triggered on the kdump
> 	 * CPU doesn't need to have this bit set for the other CPUs that
> 	 * remain in the 1st kernel.
> 	 */
> 	if (mcgstatus & MCG_STATUS_RIPV) {

Is this on kvm or on a real hardware? Because for kvm I don't care. And
don't say "theoretically".

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
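Boris's proposed early exit can be modelled in user space to see which CPUs it silences. A minimal sketch under stated assumptions: `mce_should_ignore()` is a hypothetical name, the offline state is passed in rather than read from a real mask, RIPV is assumed to be bit 0, and the MSR write that clears MCG_STATUS is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MCG_STATUS_RIPV (1ULL << 0)	/* assumed: bit 0, "restart IP valid" */

/*
 * Models the exported crashing_cpu variable: -1 means "no crash in
 * progress", otherwise the id of the CPU that panicked.
 */
static int crashing_cpu = -1;

/*
 * Hypothetical model of the proposed check in do_machine_check().
 * Returns true when this CPU should ignore the #MC. The real handler
 * would also write 0 to MCG_STATUS before returning; not modelled here.
 */
static bool mce_should_ignore(int cpu, bool cpu_offline, uint64_t mcgstatus)
{
	if (cpu_offline || (crashing_cpu != -1 && crashing_cpu != cpu))
		return (mcgstatus & MCG_STATUS_RIPV) != 0;
	return false;
}
```

The -1 sentinel keeps the check inert in normal operation; only after a crash has recorded a CPU id do the other CPUs start ignoring a broadcast #MC, and even then only when they have a valid return IP.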


Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-02-16 Thread Xunlei Pang
On 02/16/2017 at 08:22 PM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 07:52:09PM +0800, Xunlei Pang wrote:
>> then the MCE will be broadcast to the other CPUs, which are still running
>> in the first kernel (i.e. looping in crash_nmi_callback).
> Simple: the crash code should really mark CPUs as not being online:
>
> void do_machine_check(struct pt_regs *regs, long error_code)
>
> 	...
>
> 	/* If this CPU is offline, just bail out. */
> 	if (cpu_is_offline(smp_processor_id())) {
> 		u64 mcgstatus;
>
> 		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
> 		if (mcgstatus & MCG_STATUS_RIPV) {
> 			mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
> 			return;
> 		}
> 	}
>
> because looping in crash_nmi_callback() does not really denote them as
> CPUs being online.
>
> And just so that you don't disturb the machine too much during crashing,
> you could simply clear them from the online masks, i.e., perhaps call
> remove_cpu_from_maps() with the proper locking around it instead of
> doing a full cpu_down().

It changes the value of cpu_online_mask etc., which would confuse vmcore
analysis.
Moreover, for the code (see the inline comment):

if (cpu_is_offline(smp_processor_id())) {
	u64 mcgstatus;

	mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
	/*
	 * This condition may not be true: the MCE triggered on the kdump
	 * CPU doesn't need to have this bit set for the other CPUs that
	 * remain in the 1st kernel.
	 */
	if (mcgstatus & MCG_STATUS_RIPV) {
		mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
		return;
	}
}


Regards,
Xunlei

>
> The machine will be killed anyway after kdump is done writing out
> memory.
>



Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-02-16 Thread Borislav Petkov
On Thu, Feb 16, 2017 at 07:52:09PM +0800, Xunlei Pang wrote:
> then the MCE will be broadcast to the other CPUs, which are still running
> in the first kernel (i.e. looping in crash_nmi_callback).

Simple: the crash code should really mark CPUs as not being online:

void do_machine_check(struct pt_regs *regs, long error_code)

	...

	/* If this CPU is offline, just bail out. */
	if (cpu_is_offline(smp_processor_id())) {
		u64 mcgstatus;

		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
		if (mcgstatus & MCG_STATUS_RIPV) {
			mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
			return;
		}
	}

because looping in crash_nmi_callback() does not really denote them as
CPUs being online.

And just so that you don't disturb the machine too much during crashing,
you could simply clear them from the online masks, i.e., perhaps call
remove_cpu_from_maps() with the proper locking around it instead of
doing a full cpu_down().

The machine will be killed anyway after kdump is done writing out
memory.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
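The "clear them from the online masks" idea can be illustrated with a toy 64-bit mask. This is a hypothetical sketch: `toy_cpumask_t`, `toy_cpu_is_offline()` and `toy_clear_all_but()` are invented stand-ins, not the kernel's cpumask API, and the real suggestion (remove_cpu_from_maps() with proper locking) is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy online mask for up to 64 CPUs: bit n set means CPU n is online.
 * Hypothetical stand-in for cpu_online_mask, not the kernel API.
 */
typedef uint64_t toy_cpumask_t;

static bool toy_cpu_is_offline(toy_cpumask_t online, int cpu)
{
	return (online & (1ULL << cpu)) == 0;
}

/*
 * Roughly what clearing the other CPUs at crash time would mean:
 * keep only the crashing CPU marked online (crashing_cpu must be < 64).
 */
static toy_cpumask_t toy_clear_all_but(toy_cpumask_t online, int crashing_cpu)
{
	return online & (1ULL << crashing_cpu);
}
```

With four CPUs online (mask 0xF) and a crash on cpu1, the mask becomes 0x2, so the offline check now fires on cpu0. The trade-off is that the modified mask is also what ends up in the vmcore, which is the objection Xunlei raises in his reply.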


Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-02-16 Thread Xunlei Pang
On 02/16/2017 at 06:18 PM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 01:36:37PM +0800, Xunlei Pang wrote:
>> I tried to use qemu to inject an SRAO ("mce -b 0 0 0xb100 0x5 0x0 0x0");
>> it works well in the 1st kernel, but it doesn't work for the 1st kernel
>> after kdump boots (it seems the CPUs remaining in the 1st kernel don't
>> respond to the simulated broadcast MCE).
>>
>> But in theory, we know the CPUs belonging to the kdump kernel can't respond
>> to the old MCE handler, so a single SRAO injection in the 1st kernel should
>> be similar.
>> For example, I used "... -smp 2 -cpu Haswell" to launch a simulation with
>> broadcast MCE supported, and injected an SRAO into cpu0 only through the
>> qemu monitor ("mce 0 0 0xb100 0x5 0x0 0x0"); cpu0 will time out, panic
>> and reboot the machine as follows (running on linux-4.9):
>>   Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
> Sounds to me like you're trying hard to prove some point of yours which
> doesn't make much sense to me. And when you say "in theory", that makes
> it even less believable. So I remember asking you for exact steps. That
> above doesn't read like steps but like some babbling and I've actually
> tried to make sense of it for a couple of minutes but failed.
>
> So lemme spell it out for ya. I'd like for you to give me this:
>
> 1. Build kernel with this config
> 2. Boot it in kvm with this settings
> 3. Do this in the guest
> 4. Do that in the guest
> 5. ...
> 6. ...
>
>
> And all should be exact commands so that I can do them here on my machine.
>

Sorry, I missed your point.

The steps are as follows:
1. Prepare a multi-core Intel machine with broadcast MCE support.
   Enable kdump (crashkernel=256M) and configure the kdump kernel to boot
   with "nr_cpus=1".
2. Activate kdump and crash the first kernel on some CPU, say cpu1
   (taskset -c 1 echo c > /proc/sysrq-trigger); kdump will then boot on cpu1.
3. After kdump boots up (let it enter the shell), trigger an SRAO on cpu1
   (QEMU monitor cmd: mce -b 1 0 0xb100 0x5 0x0 0x0);
   the MCE will then be broadcast to the other CPUs, which are still running
   in the first kernel (i.e. looping in crash_nmi_callback).
   If you have hardware that can inject MCEs, that would be great, as QEMU
   does not work correctly for me.
4. Then something like the following is expected to happen:

[1.468556] tsc: Refined TSC clocksource calibration: 2933.437 MHz
 Starting Kdump Vmcore Save Service...
kdump: saving to /sysroot//var/crash/127.0.0.1-2015-09-01-05:07:03/
kdump: saving vmcore-dmesg.txt
[   39.10] mce: [Hardware Error]: CPU 0: Machine Check Exception: 0 Bank 2: bd00017a
[   39.10] mce: [Hardware Error]: TSC 0 ADDR 6160 MISC 8c
[   39.10] mce: [Hardware Error]: PROCESSOR 0:106a3 TIME 1441083980 SOCKET 0 APIC 0 microcode 1
[   39.10] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[   39.10] Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
[   39.10] Shutting down cpus with NMI
[1.758463] Uhhuh. NMI received for unknown reason 20 on CPU 0.
[1.758463] Do you have a strange power saving mode enabled?
[1.758463] Dazed and confused, but trying to continue
[   39.10] Rebooting in 30 seconds..

Regards,
Xunlei
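Assuming QEMU's monitor syntax is `mce [-b] cpu bank status mcgstatus addr misc`, the `0x5` in step 3 is the MCG_STATUS value injected on the target CPU. The sketch below decodes it using the IA32_MCG_STATUS bit layout from the Intel SDM; `mcg_has()` is a made-up helper for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCG_STATUS bit layout as given in the Intel SDM. */
#define MCG_STATUS_RIPV (1ULL << 0)	/* restart IP valid */
#define MCG_STATUS_EIPV (1ULL << 1)	/* error IP valid */
#define MCG_STATUS_MCIP (1ULL << 2)	/* machine check in progress */

/* Hypothetical helper: is the given MCG_STATUS bit set? */
static bool mcg_has(uint64_t mcgstatus, uint64_t bit)
{
	return (mcgstatus & bit) != 0;
}
```

Decoded, 0x5 means RIPV and MCIP are set and EIPV is clear. This describes only the CPU named on the command line; what the other CPUs see with `-b` depends on how QEMU emulates the broadcast, which this sketch does not attempt to model.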


Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-02-16 Thread Borislav Petkov
On Thu, Feb 16, 2017 at 01:36:37PM +0800, Xunlei Pang wrote:
> I tried to use qemu to inject an SRAO ("mce -b 0 0 0xb100 0x5 0x0 0x0");
> it works well in the 1st kernel, but it doesn't work for the 1st kernel
> after kdump boots (it seems the CPUs remaining in the 1st kernel don't
> respond to the simulated broadcast MCE).
> 
> But in theory, we know the CPUs belonging to the kdump kernel can't respond
> to the old MCE handler, so a single SRAO injection in the 1st kernel should
> be similar.
> For example, I used "... -smp 2 -cpu Haswell" to launch a simulation with
> broadcast MCE supported, and injected an SRAO into cpu0 only through the
> qemu monitor ("mce 0 0 0xb100 0x5 0x0 0x0"); cpu0 will time out, panic
> and reboot the machine as follows (running on linux-4.9):
>   Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler

Sounds to me like you're trying hard to prove some point of yours which
doesn't make much sense to me. And when you say "in theory", that makes
it even less believable. So I remember asking you for exact steps. That
above doesn't read like steps but like some babbling and I've actually
tried to make sense of it for a couple of minutes but failed.

So lemme spell it out for ya. I'd like for you to give me this:

1. Build kernel with this config
2. Boot it in kvm with this settings
3. Do this in the guest
4. Do that in the guest
5. ...
6. ...


And all should be exact commands so that I can do them here on my machine.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-02-15 Thread Xunlei Pang
On 01/26/2017 at 02:44 PM, Borislav Petkov wrote:
> On Thu, Jan 26, 2017 at 02:30:02PM +0800, Xunlei Pang wrote:
>> The hardware machine check is hard to reproduce, but the mce code of
>> RHEL7 is quite the same as that of tip/master, anyway we are able to
>> inject software mce to reproduce it.
> Please give me your exact steps so that I can try to reproduce it here
> too.
>

Hi Borislav,

I tried to use qemu to inject an SRAO ("mce -b 0 0 0xb100 0x5 0x0 0x0"); it
works well in the 1st kernel, but it doesn't work for the 1st kernel after
kdump boots (it seems the CPUs remaining in the 1st kernel don't respond to
the simulated broadcast MCE).

But in theory, we know the CPUs belonging to the kdump kernel can't respond to
the old MCE handler, so a single SRAO injection in the 1st kernel should be
similar.
For example, I used "... -smp 2 -cpu Haswell" to launch a simulation with
broadcast MCE supported, and injected an SRAO into cpu0 only through the qemu
monitor ("mce 0 0 0xb100 0x5 0x0 0x0"); cpu0 will time out, panic and reboot
the machine as follows (running on linux-4.9):
  Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
  Kernel Offset: disabled
  Rebooting in 30 seconds..

Regards,
Xunlei
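The theoretical argument above can be illustrated with a simple counting model of the broadcast #MC rendezvous: the first kernel's handler waits for every CPU it still believes is online, and a CPU that now runs the kdump kernel never checks in. This is a simplified, hypothetical sketch (no real timeout, no Monarch election), only loosely modelled on what the kernel's mce_start() does; `rendezvous_times_out()` is an invented name.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the broadcast #MC rendezvous. entered[cpu] says whether
 * that CPU reached the first kernel's handler. If any CPU the first
 * kernel counts as online never arrives, the handler would panic with
 * "Timeout: Not all CPUs entered broadcast exception handler".
 */
static bool rendezvous_times_out(int online_cpus, const bool entered[])
{
	int callin = 0;

	for (int cpu = 0; cpu < online_cpus; cpu++)
		if (entered[cpu])
			callin++;
	return callin < online_cpus;
}
```

In the two-CPU case, cpu0 (looping in crash_nmi_callback) enters the old handler while cpu1 (now running the kdump kernel) does not, so the rendezvous can never complete.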


Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-01-25 Thread Borislav Petkov
On Thu, Jan 26, 2017 at 02:30:02PM +0800, Xunlei Pang wrote:
> The hardware machine check is hard to reproduce, but the mce code of
> RHEL7 is quite the same as that of tip/master, anyway we are able to
> inject software mce to reproduce it.

Please give me your exact steps so that I can try to reproduce it here
too.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-01-25 Thread Xunlei Pang
On 01/24/2017 at 08:22 PM, Borislav Petkov wrote:
> On Tue, Jan 24, 2017 at 09:27:45AM +0800, Xunlei Pang wrote:
>> It occurred on real hardware when testing crash dump.
>>
>> 1) SysRq-c was injected for the test in 1st kernel
>> [ 49.897279] SysRq : Trigger a crash
>> 2) The 2nd kernel started for kdump
>> [ 0.00] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-229.el7.x86_64 root=UUID=976a15c8-8cbe-44ad-bb91-23f9b18e8789
> Yeah, no, I'm not debugging the RH Frankenstein kernel.
>
> Please retrigger this with latest tip/master first.
>

The hardware machine check is hard to reproduce, but the mce code of RHEL7 is
quite the same as that of tip/master; anyway, we are able to inject a software
mce to reproduce it.

It is also clear from the theoretical analysis of the code.

Regards,
Xunlei


Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-01-24 Thread Borislav Petkov
On Tue, Jan 24, 2017 at 09:27:45AM +0800, Xunlei Pang wrote:
> It occurred on real hardware when testing crash dump.
> 
> 1) SysRq-c was injected for the test in 1st kernel
> [ 49.897279] SysRq : Trigger a crash
> 2) The 2nd kernel started for kdump
> [ 0.00] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-229.el7.x86_64 root=UUID=976a15c8-8cbe-44ad-bb91-23f9b18e8789

Yeah, no, I'm not debugging the RH Frankenstein kernel.

Please retrigger this with latest tip/master first.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-01-23 Thread Xunlei Pang
On 01/24/2017 at 02:14 AM, Borislav Petkov wrote:
> On Mon, Jan 23, 2017 at 10:01:53AM -0800, Luck, Tony wrote:
>> will ignore the machine check on the other cpus ... assuming
>> that "cpu_is_offline(smp_processor_id())" does the right thing
>> in the kexec case where this is an "old" cpu that isn't online
>> in the new kernel.
> Nice. And kdump did do the dumping on one CPU, AFAIR. So we should be
> good there.
>

"nr_cpus=N" consumes more memory; with a very large N the kdump kernel can
hardly boot, given the limited crash memory reserved.

For some large machines nr_cpus=1 might not be enough, and we have to use
nr_cpus=4 or more; it also helps with parallel vmcore dumping :-)

Regards,
Xunlei


Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-01-23 Thread Xunlei Pang
On 01/24/2017 at 09:46 AM, Xunlei Pang wrote:
> On 01/24/2017 at 01:51 AM, Borislav Petkov wrote:
>> Hey Tony,
>>
>> a "welcome back" is in order? :-)
>>
>> On Mon, Jan 23, 2017 at 09:40:09AM -0800, Luck, Tony wrote:
>>> If the system had experienced some memory corruption, but
>>> recovered ... then there would be some pages sitting around
>>> that the old kernel had marked as POISON and stopped using.
>>> The kexec'd kernel doesn't know about these, so may touch that
>>> memory while taking a crash dump ...
>> Hmm, pass a list of poisoned pages to the kdump kernel so as not to
>> touch. Looks like there's already functionality for that:
>>
>> "makedumpfile can exclude the following types of pages while copying
>> VMCORE to DUMPFILE, and a user can choose which type of pages will be
>> excluded.
>>
>> - Pages filled with zero
>> - Cache pages
>> - User process data pages
>> - Free pages"
>>
>>  (there is a makedumpfile manpage somewhere)
>>
>> And apparently crash knows about poisoned pages and handles them:
>>
>> static int __init crash_save_vmcoreinfo_init(void)
>> {
>>  ...
>> #ifdef CONFIG_MEMORY_FAILURE
>> VMCOREINFO_NUMBER(PG_hwpoison);
>> #endif
>>
>> so if that works, the kexeced kernel should know about that list.
> From the log in my previous reply, MCE occurred before makedumpfile dumping,
> so I guess if the poisoned ones belong to the crash reserved memory or other
> type of events?

Another possibility is system-reserved or PCIe memory, which is shared
between the 1st and 2nd kernels.

>
> Besides, some kdump kernel may not use makedumpfile, for example a simple "cp"
> is also allowed to process "/proc/vmcore".
>
>>> and then you have a broadcast machine check (on older[1] Intel CPUs
>>> that don't support local machine check).
>> Right.
>>
>>> This is hard to work around. You really need all the CPUs to have set
>>> CR4.MCE=1 (if any didn't, then they will force a reset when they see
>>> the machine check). Also you need to make sure that they jump to the
>>> copy of do_machine_check() in the new kernel, not the old kernel.
>> Doesn't matter, right? The new copy is as clueless as the old one about
>> those MCEs.
>>
> It's the code in mce_start(), it waits for all the online cpus including the
> cpus that kdump boots on to synchronize.
>
> So for new mce handler of kdump kernel, it is fine as the number of online
> cpus is correct; as for old mce handler of 1st kernel, it's not true because
> some cpus which are regarded online from 1st kernel's view are running the
> 2nd kernel now, they can't respond to the old mce handler which will timeout
> the old mce handler.
>
> Regards,
> Xunlei



Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-01-23 Thread Xunlei Pang
On 01/24/2017 at 01:51 AM, Borislav Petkov wrote:
> Hey Tony,
>
> a "welcome back" is in order? :-)
>
> On Mon, Jan 23, 2017 at 09:40:09AM -0800, Luck, Tony wrote:
>> If the system had experienced some memory corruption, but
>> recovered ... then there would be some pages sitting around
>> that the old kernel had marked as POISON and stopped using.
>> The kexec'd kernel doesn't know about these, so may touch that
>> memory while taking a crash dump ...
> Hmm, pass a list of poisoned pages to the kdump kernel so as not to
> touch. Looks like there's already functionality for that:
>
> "makedumpfile can exclude the following types of pages while copying
> VMCORE to DUMPFILE, and a user can choose which type of pages will be
> excluded.
>
> - Pages filled with zero
> - Cache pages
> - User process data pages
> - Free pages"
>
>  (there is a makedumpfile manpage somewhere)
>
> And apparently crash knows about poisoned pages and handles them:
>
> static int __init crash_save_vmcoreinfo_init(void)
> {
>   ...
> #ifdef CONFIG_MEMORY_FAILURE
> VMCOREINFO_NUMBER(PG_hwpoison);
> #endif
>
> so if that works, the kexeced kernel should know about that list.

From the log in my previous reply, the MCE occurred before makedumpfile started
dumping, so I wonder whether the poisoned pages belong to the crash-reserved
memory, or whether this is some other type of event?

Besides, some kdump kernels may not use makedumpfile at all; for example, a
simple "cp" of "/proc/vmcore" is also allowed.

>
>> and then you have a broadcast machine check (on older[1] Intel CPUs
>> that don't support local machine check).
> Right.
>
>> This is hard to work around. You really need all the CPUs to have set
>> CR4.MCE=1 (if any didn't, then they will force a reset when they see
>> the machine check). Also you need to make sure that they jump to the
>> copy of do_machine_check() in the new kernel, not the old kernel.
> Doesn't matter, right? The new copy is as clueless as the old one about
> those MCEs.
>

It's the code in mce_start(): it waits for all the online CPUs, including the
CPUs that kdump boots on, to synchronize.

For the new MCE handler of the kdump kernel this is fine, because its number
of online CPUs is correct. For the old MCE handler of the 1st kernel it is not:
some CPUs that are online from the 1st kernel's point of view are now running
the 2nd kernel, so they can't respond to the old MCE handler, which then
times out.
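To make the timeout argument concrete, here is a minimal user-space model of a counting rendezvous, assuming a simplified handler that just compares the CPUs that checked in against its own count of online CPUs. The function `rendezvous_completes()` and its parameters are hypothetical; as far as I know the real mce_start() spins on an atomic call-in counter with a timeout rather than counting in a loop, so treat this as a sketch of the idea only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model, not kernel code. cpu_in_handler[i] says whether
 * CPU i, which this handler's kernel counts as online, actually entered
 * *this* kernel's #MC handler. A CPU that has moved on to the kdump
 * kernel never checks in with the 1st kernel's handler.
 */
static bool rendezvous_completes(const bool cpu_in_handler[], size_t n_online)
{
	size_t arrived = 0;

	assert(n_online > 0);

	/* Each CPU that took the broadcast #MC here checks in once. */
	for (size_t i = 0; i < n_online; i++)
		if (cpu_in_handler[i])
			arrived++;

	/*
	 * The real handler would spin until every online CPU has arrived
	 * or a timeout expires. Nobody else can arrive in this model, so a
	 * missing CPU means the wait ends in the timeout panic.
	 */
	return arrived == n_online;
}
```

With nr_cpus=1 the kdump kernel's view is a single CPU that does check in, so the call succeeds; the 1st kernel's view still counts the CPU now running kdump, which never arrives, so the call fails.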

Regards,
Xunlei


Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-01-23 Thread Xunlei Pang
On 01/23/2017 at 10:50 PM, Borislav Petkov wrote:
> On Mon, Jan 23, 2017 at 09:35:53PM +0800, Xunlei Pang wrote:
>> One possible timing sequence would be:
>> 1st kernel running on multiple cpus panicked
>> then the crash dump code starts
>> the crash dump code stops the others cpus except the crashing one
>> 2nd kernel boots up on the crash cpu with "nr_cpus=1"
>> some broadcasted mce comes on some cpu amongst the other cpus(not the 
>> crashing cpu)
> Where does this broadcasted MCE come from?
>
> The crash dump code triggered it? Or it happened before the panic()?
>
> Are you talking about an *actual* sequence which you're experiencing on
> real hw or is this something hypothetical?
>

It occurred on real hardware when testing crash dump.

1) SysRq-c was injected for the test in the 1st kernel:
[ 49.897279] SysRq : Trigger a crash

2) The 2nd kernel started for kdump:
[ 0.00] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-229.el7.x86_64 root=UUID=976a15c8-8cbe-44ad-bb91-23f9b18e8789 ro console=ttyS1,115200 nmi_watchdog=0 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug disable_cpu_apicid=0 elfcorehdr=869772K

3) An MCE came to the 1st kernel, a timeout panic occurred, and the machine rebooted:
[6.095706] Dazed and confused, but trying to continue  // message of the 1st kernel
[   81.655507] Kernel panic - not syncing: Timeout synchronizing machine check over CPUs
[   82.729324] Shutting down cpus with NMI
[   82.774539] drm_kms_helper: panic occurred, switching back to text console
[   82.782257] Rebooting in 10 seconds..

Please see the attached file for the full log.

Regards,
Xunlei

[   49.897279] SysRq : Trigger a crash
[   49.901218] BUG: unable to handle kernel NULL pointer dereference at (null)
[   49.909988] IP: [] sysrq_handle_crash+0x16/0x20
[   49.916805] PGD 868add067 PUD 867139067 PMD 0
[   49.921805] Oops: 0002 [#1] SMP
[   49.925432] Modules linked in: ipmi_devintf intel_powerclamp coretemp intel_rapl kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt sb_edac iTCO_vendor_support ntb mei_me pcspkr edac_core ioatdma lpc_ich i2c_i801 ipmi_si mei mfd_core shpchp dca ipmi_msghandler acpi_pad acpi_power_meter xfs sd_mod sr_mod crc_t10dif cdrom crct10dif_common usb_storage mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ata_generic ttm bnx2x pata_acpi mdio drm ata_piix ptp libata i2c_core pps_core libcrc32c
[   49.984994] CPU: 9 PID: 9463 Comm: do-test.sh Not tainted 3.10.0-229.el7.x86_64 #1
[   49.993456] Hardware name: NEC Express5800/B120d-h [N8400-126Y]/G7LDV, BIOS 4.6.2013 10/24/2012
[   50.003164] task: 88043370 ti: 8808653b8000 task.ti: 8808653b8000
[   50.011514] RIP: 0010:[]  [] sysrq_handle_crash+0x16/0x20
[   50.021045] RSP: 0018:8808653bbe80  EFLAGS: 00010046
[   50.026976] RAX: 000f RBX: 819c18a0 RCX:
[   50.034939] RDX:  RSI: 88087fc2d488 RDI: 0063
[   50.042908] RBP: 8808653bbe80 R08: 0092 R09: 0608
[   50.050870] R10: 0607 R11: 0003 R12: 0063
[   50.058837] R13: 0246 R14: 0007 R15:
[   50.066799] FS:  7f0faaf54740() GS:88087fc2() knlGS:
[   50.075828] CS:  0010 DS:  ES:  CR0: 80050033
[   50.082244] CR2:  CR3: 000866d07000 CR4: 000407e0
[   50.090212] DR0:  DR1:  DR2:
[   50.098173] DR3:  DR6: 0ff0 DR7: 0400
[   50.106133] Stack:
[   50.108388]  8808653bbeb8 81397c32 0002 7f0faaf58000
[   50.116671]  8808653bbf48 0002  8808653bbed0
[   50.124963]  8139810f 8804674a6540 8808653bbef0 8122de0d
[   50.133257] Call Trace:
[   50.135993]  [] __handle_sysrq+0xa2/0x170
[   50.142219]  [] write_sysrq_trigger+0x2f/0x40
[   50.148841]  [] proc_reg_write+0x3] Code: eb 9b 45 01 f4 45 39 65 34 75 e5 4c 89 ef e8 e2 f7 ff ff eb db 66 66 66 66 90 55 c7 05 50 d7 59 00 01 00 00 00 48 89 e5 0f ae f8  04 25 00 00 00 00 01 5d c3 66 66 66 66 90 55 31 c0 c7 05 ce
[   50.194758] RIP  [] sysrq_handle_crash+0x16/0x20
[   50.201669]  RSP
[   50.205558] CR2:
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Initializing cgroup subsys cpuacct
[0.00] Linux version 3.10.0-229.el7.x86_64 (mockbu...@x86-035.build.eng.bos.redhat.com) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-7) (GCC) ) #1 SMP Thu Jan 29 18:37:38 EST 2015
[0.00] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-229.el7.x86_64

Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-01-23 Thread Borislav Petkov
On Mon, Jan 23, 2017 at 10:01:53AM -0800, Luck, Tony wrote:
> will ignore the machine check on the other cpus ... assuming
> that "cpu_is_offline(smp_processor_id())" does the right thing
> in the kexec case where this is an "old" cpu that isn't online
> in the new kernel.

Nice. And kdump did do the dumping on one CPU, AFAIR. So we should be
good there.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-01-23 Thread Luck, Tony
On Mon, Jan 23, 2017 at 06:51:30PM +0100, Borislav Petkov wrote:
> Hey Tony,
> 
> a "welcome back" is in order? :-)

Yes - first day back today. Lots of catching up to do.

> And apparently crash knows about poisoned pages and handles them:
> 
> static int __init crash_save_vmcoreinfo_init(void)
> {
>   ...
> #ifdef CONFIG_MEMORY_FAILURE
> VMCOREINFO_NUMBER(PG_hwpoison);
> #endif
> 
> so if that works, the kexeced kernel should know about that list.

Oh good ... it is smarter than I thought.

> Doesn't matter, right? The new copy is as clueless as the old one about
> those MCEs.

If things are well enough initialized that we don't reset, and
get to do_machine_check(), then this code from Ashok:

	/* If this CPU is offline, just bail out. */
	if (cpu_is_offline(smp_processor_id())) {
		u64 mcgstatus;

		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
		if (mcgstatus & MCG_STATUS_RIPV) {
			mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
			return;
		}
	}

will ignore the machine check on the other cpus ... assuming
that "cpu_is_offline(smp_processor_id())" does the right thing
in the kexec case where this is an "old" cpu that isn't online
in the new kernel.
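The decision made by the snippet above can be modelled in user space as a small pure function. This is a sketch, not kernel code: `offline_cpu_ignores_mce()` is a hypothetical name, a plain `uint64_t` stands in for MSR_IA32_MCG_STATUS, and the `cpu_offline` flag stands in for `cpu_is_offline(smp_processor_id())`, whose behaviour for an "old" CPU under kexec is exactly the open question. The bit positions follow the architectural IA32_MCG_STATUS layout (RIPV is bit 0, MCIP bit 2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural IA32_MCG_STATUS bits used below. */
#define MCG_STATUS_RIPV (1ULL << 0)	/* restart IP valid */
#define MCG_STATUS_MCIP (1ULL << 2)	/* machine check in progress */

/*
 * Returns true if the handler would quietly return. *mcg_status models
 * MSR_IA32_MCG_STATUS and is cleared on that path; cpu_offline models
 * cpu_is_offline(smp_processor_id()).
 */
static bool offline_cpu_ignores_mce(bool cpu_offline, uint64_t *mcg_status)
{
	if (!cpu_offline)
		return false;		/* online CPUs take the normal path */

	if (*mcg_status & MCG_STATUS_RIPV) {
		*mcg_status = 0;	/* models mce_wrmsrl(MSR_IA32_MCG_STATUS, 0) */
		return true;
	}

	/* No valid return IP: returning is not safe, so do not bail out. */
	return false;
}
```

The RIPV check is what keeps the early return safe: without a valid restart IP there is nowhere sensible to return to, so the CPU has to fall through to the normal (panicking) path.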

-Tony


Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-01-23 Thread Borislav Petkov
Hey Tony,

a "welcome back" is in order? :-)

On Mon, Jan 23, 2017 at 09:40:09AM -0800, Luck, Tony wrote:
> If the system had experienced some memory corruption, but
> recovered ... then there would be some pages sitting around
> that the old kernel had marked as POISON and stopped using.
> The kexec'd kernel doesn't know about these, so may touch that
> memory while taking a crash dump ...

Hmm, pass a list of poisoned pages to the kdump kernel so as not to
touch. Looks like there's already functionality for that:

"makedumpfile can exclude the following types of pages while copying
VMCORE to DUMPFILE, and a user can choose which type of pages will be
excluded.

- Pages filled with zero
- Cache pages
- User process data pages
- Free pages"

 (there is a makedumpfile manpage somewhere)

And apparently crash knows about poisoned pages and handles them:

static int __init crash_save_vmcoreinfo_init(void)
{
	...
#ifdef CONFIG_MEMORY_FAILURE
	VMCOREINFO_NUMBER(PG_hwpoison);
#endif

so if that works, the kexeced kernel should know about that list.
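A dump tool in the kdump kernel could use that export roughly as follows. This is a minimal sketch under stated assumptions: it assumes VMCOREINFO_NUMBER() emits a line of the form `NUMBER(PG_hwpoison)=<n>`, and the helpers `find_pg_hwpoison()` and `skip_poisoned_page()` are hypothetical, not makedumpfile's actual code. Reading each page's flags from the old kernel's memory is left out. As noted later in the thread, a plain "cp" of /proc/vmcore does no such filtering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up "NUMBER(PG_hwpoison)=<n>" in vmcoreinfo text. Returns the page
 * flag bit, or -1 if the entry is absent (e.g. the 1st kernel was built
 * without CONFIG_MEMORY_FAILURE).
 */
static long find_pg_hwpoison(const char *vmcoreinfo)
{
	static const char key[] = "NUMBER(PG_hwpoison)=";
	const char *p = strstr(vmcoreinfo, key);

	if (p == NULL)
		return -1;
	return strtol(p + strlen(key), NULL, 10);
}

/* Skip a page whose flags word has the hwpoison bit set. */
static bool skip_poisoned_page(unsigned long page_flags, long pg_hwpoison)
{
	if (pg_hwpoison < 0 || pg_hwpoison >= (long)(8 * sizeof(unsigned long)))
		return false;	/* unknown bit: cannot filter */
	return (page_flags >> pg_hwpoison) & 1UL;
}
```

The bit value 23 used when exercising this is made up for illustration; the real value comes from whatever the 1st kernel exported.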

> and then you have a broadcast machine check (on older[1] Intel CPUs
> that don't support local machine check).

Right.

> This is hard to work around. You really need all the CPUs to have set
> CR4.MCE=1 (if any didn't, then they will force a reset when they see
> the machine check). Also you need to make sure that they jump to the
> copy of do_machine_check() in the new kernel, not the old kernel.

Doesn't matter, right? The new copy is as clueless as the old one about
those MCEs.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-01-23 Thread Luck, Tony
On Mon, Jan 23, 2017 at 03:50:56PM +0100, Borislav Petkov wrote:
> On Mon, Jan 23, 2017 at 09:35:53PM +0800, Xunlei Pang wrote:
> > One possible timing sequence would be:
> > 1st kernel running on multiple cpus panicked
> > then the crash dump code starts
> > the crash dump code stops the others cpus except the crashing one
> > 2nd kernel boots up on the crash cpu with "nr_cpus=1"
> > some broadcasted mce comes on some cpu amongst the other cpus(not the 
> > crashing cpu)
> 
> Where does this broadcasted MCE come from?
> 
> The crash dump code triggered it? Or it happened before the panic()?
> 
> Are you talking about an *actual* sequence which you're experiencing on
> real hw or is this something hypothetical?

If the system had experienced some memory corruption, but
recovered ... then there would be some pages sitting around
that the old kernel had marked as POISON and stopped using.
The kexec'd kernel doesn't know about these, so may touch that
memory while taking a crash dump ... and then you have a
broadcast machine check (on older[1] Intel CPUs that don't support
local machine check).

This is hard to work around.  You really need all the CPUs to
have set CR4.MCE=1 (if any didn't, then they will force a reset
when they see the machine check). Also you need to make sure that
they jump to the copy of do_machine_check() in the new kernel, not
the old kernel.
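The first requirement above (every CPU must have CR4.MCE set) can be stated as a one-line predicate over all CPUs. The sketch below is a hypothetical model with a made-up function name; it covers only the CR4.MCE condition (bit 6 of CR4) and does not model which copy of do_machine_check() a CPU ends up in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define X86_CR4_MCE (1UL << 6)	/* CR4.MCE: machine-check enable */

/*
 * Hypothetical model: a broadcast #MC reaches every CPU, so a single CPU
 * with CR4.MCE clear is enough to force a reset instead of letting the
 * machine check handlers run.
 */
static bool broadcast_mce_forces_reset(const unsigned long cr4[], size_t ncpus)
{
	for (size_t i = 0; i < ncpus; i++)
		if (!(cr4[i] & X86_CR4_MCE))
			return true;
	return false;
}
```

This is why bringing CPUs only partly online in the kdump kernel is tricky: any CPU left with CR4.MCE clear is enough for a broadcast machine check to reset the machine.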

A while ago I played with the nr_cpus=N code to have it bring
all the CPUs far enough online to get the machine check initialization
done, then any extras above "N" just go back offline again.
But I never got this to work reliably.

-Tony

[1] older == all released ones, at the moment.


Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-01-23 Thread Borislav Petkov
On Mon, Jan 23, 2017 at 09:35:53PM +0800, Xunlei Pang wrote:
> One possible timing sequence would be:
> 1st kernel running on multiple cpus panicked
> then the crash dump code starts
> the crash dump code stops the other cpus except the crashing one
> 2nd kernel boots up on the crash cpu with "nr_cpus=1"
> some broadcasted mce comes on some cpu amongst the other cpus (not the crashing cpu)

Where does this broadcasted MCE come from?

The crash dump code triggered it? Or it happened before the panic()?

Are you talking about an *actual* sequence which you're experiencing on
real hw or is this something hypothetical?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-01-23 Thread Xunlei Pang
On 01/23/2017 at 08:51 PM, Borislav Petkov wrote:
> On Mon, Jan 23, 2017 at 04:01:51PM +0800, Xunlei Pang wrote:
>> We met an issue for kdump: after kdump kernel boots up,
>> and there comes a broadcasted mce in first kernel, the
> How does that even happen?
>
> Lemme try to understand this correctly: the first kernel gets an
> MCE, kdump starts and boots a *whole* kernel and *then* you get the
> broadcasted MCE? I have a real hard time believing that.
>
> What happened to the approach of clearing CR4.MCE before loading the
> kdump kernel, in native_machine_shutdown() or wherever the kdump kernel
> gets loaded...
>

One possible timing sequence would be:
1st kernel running on multiple cpus panicked
then the crash dump code starts
the crash dump code stops the other cpus except the crashing one
2nd kernel boots up on the crash cpu with "nr_cpus=1"
some broadcasted mce comes on some cpu amongst the other cpus (not the crashing cpu)
the other cpus enter the old mce handler of the 1st kernel, while the crash cpu enters the new mce handler of the 2nd kernel
the old mce handler of the 1st kernel will time out and panic due to mce synchronization under the default setting

Regards,
Xunlei
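
A minimal user-space model of the sequence above, assuming (as the sequence implies) that the 1st kernel's handler still waits for every CPU it had online before the panic, while the kdump kernel booted with "nr_cpus=1" waits only for itself. All names here (`broadcast_mce`, `rendezvous_times_out`, the counters) are hypothetical and are not the kernel's mce code; the handler's timeout is reduced to comparing check-ins against expected CPUs.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model, not kernel code: each kernel's #MC handler waits
 * for the CPUs *it* believes are online to check in.
 */
enum mc_kernel { OLD_KERNEL = 0, NEW_KERNEL = 1 };

struct rendezvous_result {
	int callin[2];   /* CPUs that entered each kernel's handler */
	int expected[2]; /* CPUs each kernel's handler waits for */
};

/*
 * A broadcast #MC reaches every CPU: the crash CPU runs the kdump
 * kernel's handler, every CPU stopped by the crash dump code still
 * runs the 1st kernel's handler.
 */
static struct rendezvous_result broadcast_mce(int nr_cpus_old, int crash_cpu,
					      int nr_cpus_new)
{
	struct rendezvous_result r = {
		.callin = {0, 0},
		.expected = {nr_cpus_old, nr_cpus_new},
	};

	assert(crash_cpu >= 0 && crash_cpu < nr_cpus_old);
	for (int cpu = 0; cpu < nr_cpus_old; cpu++) {
		enum mc_kernel k = (cpu == crash_cpu) ? NEW_KERNEL : OLD_KERNEL;
		r.callin[k]++;
	}
	return r;
}

/*
 * The rendezvous completes only if every expected CPU checked in;
 * otherwise the waiting handler eventually hits its timeout.
 */
static bool rendezvous_times_out(const struct rendezvous_result *r,
				 enum mc_kernel k)
{
	return r->callin[k] < r->expected[k];
}
```

For example, with 4 CPUs and CPU 0 as the crash CPU, `broadcast_mce(4, 0, 1)` records 3 check-ins against 4 expected for the old kernel, so `rendezvous_times_out(&r, OLD_KERNEL)` is true, while the new kernel's single check-in satisfies its own count.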


Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-01-23 Thread Borislav Petkov
On Mon, Jan 23, 2017 at 04:01:51PM +0800, Xunlei Pang wrote:
> We met an issue for kdump: after kdump kernel boots up,
> and there comes a broadcasted mce in first kernel, the

How does that even happen?

Lemme try to understand this correctly: the first kernel gets an
MCE, kdump starts and boots a *whole* kernel and *then* you get the
broadcasted MCE? I have a real hard time believing that.

What happened to the approach of clearing CR4.MCE before loading the
kdump kernel, in native_machine_shutdown() or wherever the kdump kernel
gets loaded...

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
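
A minimal sketch of what clearing CR4.MCE changes, written as a user-space model rather than kernel code. CR4.MCE is bit 6 of CR4 (`X86_CR4_MCE` in the kernel; in-kernel code would presumably clear it with something like `cr4_clear_bits(X86_CR4_MCE)`). The outcome rule, that a CPU with CR4.MCE clear shuts down instead of taking #MC, follows Tony Luck's reply in this thread, which says such CPUs force a reset. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* CR4.MCE is bit 6 of CR4 (the kernel names this X86_CR4_MCE). */
#define CR4_MCE (UINT64_C(1) << 6)
static_assert(CR4_MCE == 0x40, "CR4.MCE is bit 6");

enum mc_outcome { MC_DELIVERED, MC_SHUTDOWN };

/*
 * Hypothetical model: what a CPU does when a machine check arrives,
 * depending only on CR4.MCE. Other conditions (e.g. MCIP already set)
 * are deliberately ignored here.
 */
static enum mc_outcome machine_check_outcome(uint64_t cr4)
{
	return (cr4 & CR4_MCE) ? MC_DELIVERED : MC_SHUTDOWN;
}

/* The CR4 value a CPU would have after clearing only the MCE bit. */
static uint64_t cr4_without_mce(uint64_t cr4)
{
	return cr4 & ~CR4_MCE;
}
```

The model captures only that single rule: a stopped CPU with CR4.MCE cleared no longer enters the 1st kernel's handler, but per that reply a broadcast #MC would then reset the machine rather than be handled.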

