On Tue, Feb 21, 2017 at 08:37:21PM +0800, Xunlei Pang wrote:
> -/* If this CPU is offline, just bail out. */
> -if (cpu_is_offline(smp_processor_id())) {
> +/*
> + * Cases to bail out to avoid rendezvous process timeout:
> + * 1) If crashing_cpu was set, e.g. entering kdump,
> +
On 02/20/2017 at 07:09 PM, Borislav Petkov wrote:
> On Mon, Feb 20, 2017 at 02:10:37PM +0800, Xunlei Pang wrote:
>> @@ -1128,8 +1129,9 @@ void do_machine_check(struct pt_regs *regs, long
>> error_code)
>> */
>> int lmce = 1;
>>
>> -/* If this CPU is offline, just bail out. */
>> -
On Tue, Feb 21, 2017 at 09:28:20AM +0800, Xunlei Pang wrote:
> Not when the kdump kernel starts dumping, but during nmi_shootdown_cpus():
> if an MCE arrives after crashing_cpu was set and we don't skip crashing_cpu,
> the crashing CPU will enter the MCE handler and trigger the synchronization issue.
Ok,
On 02/21/2017 at 04:26 AM, Borislav Petkov wrote:
> On Mon, Feb 20, 2017 at 09:29:24PM +0800, Xunlei Pang wrote:
>> There is a small window between the crash and the kdump kernel booting,
>> so if an SRAO arrives within this window it will also cause the MCE
>> synchronization problem on the crashing CPU if we d
On 02/20/2017 at 09:29 PM, Xunlei Pang wrote:
> On 02/20/2017 at 07:09 PM, Borislav Petkov wrote:
>> On Mon, Feb 20, 2017 at 02:10:37PM +0800, Xunlei Pang wrote:
>>> @@ -1128,8 +1129,9 @@ void do_machine_check(struct pt_regs *regs, long
>>> error_code)
>>> */
>>> int lmce = 1;
>>>
>>> -
On Mon, Feb 20, 2017 at 09:29:24PM +0800, Xunlei Pang wrote:
> There is a small window between the crash and the kdump kernel booting,
> so if an SRAO arrives within this window it will also cause the MCE
> synchronization problem on the crashing CPU if we don't bail out the
> crashing CPU.
You mean, in the win
On 02/20/2017 at 07:09 PM, Borislav Petkov wrote:
> On Mon, Feb 20, 2017 at 02:10:37PM +0800, Xunlei Pang wrote:
>> @@ -1128,8 +1129,9 @@ void do_machine_check(struct pt_regs *regs, long
>> error_code)
>> */
>> int lmce = 1;
>>
>> -/* If this CPU is offline, just bail out. */
>> -
On Mon, Feb 20, 2017 at 02:10:37PM +0800, Xunlei Pang wrote:
> @@ -1128,8 +1129,9 @@ void do_machine_check(struct pt_regs *regs, long
> error_code)
> */
> int lmce = 1;
>
> - /* If this CPU is offline, just bail out. */
> - if (cpu_is_offline(smp_processor_id())) {
> + /
We hit an issue with kdump: after the kdump kernel boots up, if a
broadcast MCE arrives in the first kernel, the other CPUs remaining
in the first kernel enter the first kernel's old MCE handler, then
time out and panic due to MCE synchronization, and finally reset
the kdump CPUs.
This patch lets cpus
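
The bail-out condition discussed in this thread can be modelled in user space
roughly as below. This is only a sketch, not the patch itself: the names
mce_should_bail_out, struct cpu_model and NO_CRASHING_CPU are hypothetical,
and it assumes the variant argued for above, where every CPU (the crashing
CPU included) returns early once crashing_cpu has been set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are the kernel's. */
#define NO_CRASHING_CPU (-1)

struct cpu_model {
    int  id;       /* logical CPU number */
    bool offline;  /* stands in for cpu_is_offline() */
};

/*
 * Return true when the handler should return early instead of joining
 * the rendezvous. Assumption: once crashing_cpu is set, every CPU bails
 * out, including the crashing CPU itself.
 */
bool mce_should_bail_out(const struct cpu_model *cpu, int crashing_cpu)
{
    if (cpu->offline)
        return true;
    return crashing_cpu != NO_CRASHING_CPU;
}

/* Usage: CPU 0 crashed; CPU 1 is still looping in the first kernel. */
void mce_bail_out_example(void)
{
    struct cpu_model cpu1 = { .id = 1, .offline = false };

    assert(!mce_should_bail_out(&cpu1, NO_CRASHING_CPU)); /* normal operation */
    assert(mce_should_bail_out(&cpu1, 0));                /* kdump in progress */
}
```

Whether the crashing CPU itself should also be skipped is the point debated
in the replies, so the second return statement is the part most likely to
differ from the real change.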