Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD

2014-05-26 Thread Borislav Petkov
On Thu, May 22, 2014 at 09:55:38PM +0200, Borislav Petkov wrote:
> From: Borislav Petkov 
> Date: Thu, 22 May 2014 16:40:54 +0200
> Subject: [PATCH] x86, MCE: Kill CPU_POST_DEAD
> 
> In conjunction with cleaning up CPU hotplug, we want to get rid of
> CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the
> end of CPU_DEAD.
> 
> Signed-off-by: Borislav Petkov 
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c 
> b/arch/x86/kernel/cpu/mcheck/mce.c
> index 68317c80de7f..bfde4871848f 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -2391,6 +2391,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned 
> long action, void *hcpu)
>   threshold_cpu_callback(action, cpu);
>   mce_device_remove(cpu);
>   mce_intel_hcpu_update(cpu);
> +
> + /* intentionally ignoring frozen here */
> + if (!(action & CPU_TASKS_FROZEN))
> + cmci_rediscover();
>   break;
>   case CPU_DOWN_PREPARE:
>   smp_call_function_single(cpu, mce_disable_cpu, , 1);
> @@ -2402,11 +2406,6 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned 
> long action, void *hcpu)
>   break;
>   }
>  
> - if (action == CPU_POST_DEAD) {
> - /* intentionally ignoring frozen here */
> - cmci_rediscover();
> - }
> -
>   return NOTIFY_OK;
>  }

Ok, so I did a little hammering on this one by running a hotplug toggler
script, reading out files under /sys/devices/system/machinecheck... and
suspending to disk and resuming, all at the same time. 'Round 10ish
cycles I did and the box was chugging away happily without any issues.

So, I'm going to queue this one for 3.17, along with the
panic-on-timeout for the default tolerance level one:

http://lkml.kernel.org/r/20140523091041.ga21...@pd.tnic

if you don't have any objections. I'm saying 3.17 because both are
not really critical stuff and could use a full cycle of simmering in
linux-next just fine.

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD

2014-05-26 Thread Borislav Petkov
On Thu, May 22, 2014 at 09:55:38PM +0200, Borislav Petkov wrote:
 From: Borislav Petkov b...@suse.de
 Date: Thu, 22 May 2014 16:40:54 +0200
 Subject: [PATCH] x86, MCE: Kill CPU_POST_DEAD
 
 In conjunction with cleaning up CPU hotplug, we want to get rid of
 CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the
 end of CPU_DEAD.
 
 Signed-off-by: Borislav Petkov b...@suse.de
 ---
  arch/x86/kernel/cpu/mcheck/mce.c | 9 -
  1 file changed, 4 insertions(+), 5 deletions(-)
 
 diff --git a/arch/x86/kernel/cpu/mcheck/mce.c 
 b/arch/x86/kernel/cpu/mcheck/mce.c
 index 68317c80de7f..bfde4871848f 100644
 --- a/arch/x86/kernel/cpu/mcheck/mce.c
 +++ b/arch/x86/kernel/cpu/mcheck/mce.c
 @@ -2391,6 +2391,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned 
 long action, void *hcpu)
   threshold_cpu_callback(action, cpu);
   mce_device_remove(cpu);
   mce_intel_hcpu_update(cpu);
 +
 + /* intentionally ignoring frozen here */
 + if (!(action  CPU_TASKS_FROZEN))
 + cmci_rediscover();
   break;
   case CPU_DOWN_PREPARE:
   smp_call_function_single(cpu, mce_disable_cpu, action, 1);
 @@ -2402,11 +2406,6 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned 
 long action, void *hcpu)
   break;
   }
  
 - if (action == CPU_POST_DEAD) {
 - /* intentionally ignoring frozen here */
 - cmci_rediscover();
 - }
 -
   return NOTIFY_OK;
  }

Ok, so I did a little hammering on this one by running a hotplug toggler
script, reading out files under /sys/devices/system/machinecheck... and
suspending to disk and resuming, all at the same time. 'Round 10ish
cycles I did and the box was chugging away happily without any issues.

So, I'm going to queue this one for 3.17, along with the
panic-on-timeout for the default tolerance level one:

http://lkml.kernel.org/r/20140523091041.ga21...@pd.tnic

if you don't have any objections. I'm saying 3.17 because both are
not really critical stuff and could use a full cycle of simmering in
linux-next just fine.

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD

2014-05-22 Thread Srivatsa S. Bhat
On 05/23/2014 01:25 AM, Borislav Petkov wrote:
> On Thu, May 22, 2014 at 03:50:21PM +, Luck, Tony wrote:
 So I think we can reduce it to just the one rwsem (with recursion) if we
 shoot CPU_POST_DEAD in the head.
>>>
>>> Here's the first bullet. Stressing my box here with Steve's hotplug
>>> script seems to work fine.
>>>
>>> Tony, any objections?
>>
>> what was this comment referring to:
>>
>> /* intentionally ignoring frozen here */
>>
>> After you move the cmci_rediscover() call, it is now in a place where we are
>> no longer ignoring frozen (i.e. the old placement did the rediscover even if 
>> the
>> CPU_TASKS_FROZEN bit was set - with the new placement we will skip 
>> rediscovery.
>>
>> So we were working around some interaction between cpu hotplug and frozen.
>> Do we no longer need to do that?
> 
> Hmm, that FROZEN thing is supposedly for hotplug operations while
> suspend is happening. I guess it makes a little sense to rediscover CMCI
> banks while suspend is in progress. Whatever.
> 
> Let's keep it before more crap ensues, that was a good catch, thanks.
> 
> So, I guess something like that instead.
> 
> Which means, I'd need to run a couple of suspend/resume rounds while
> hotplugging cores to see whether we're still kosher.
> 
> More fun tomorrow.
> 
> ---
> From: Borislav Petkov 
> Date: Thu, 22 May 2014 16:40:54 +0200
> Subject: [PATCH] x86, MCE: Kill CPU_POST_DEAD
> 
> In conjunction with cleaning up CPU hotplug, we want to get rid of
> CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the
> end of CPU_DEAD.
> 
> Signed-off-by: Borislav Petkov 

Reviewed-by: Srivatsa S. Bhat 

Regards,
Srivatsa S. Bhat

> ---
>  arch/x86/kernel/cpu/mcheck/mce.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c 
> b/arch/x86/kernel/cpu/mcheck/mce.c
> index 68317c80de7f..bfde4871848f 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -2391,6 +2391,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned 
> long action, void *hcpu)
>   threshold_cpu_callback(action, cpu);
>   mce_device_remove(cpu);
>   mce_intel_hcpu_update(cpu);
> +
> + /* intentionally ignoring frozen here */
> + if (!(action & CPU_TASKS_FROZEN))
> + cmci_rediscover();
>   break;
>   case CPU_DOWN_PREPARE:
>   smp_call_function_single(cpu, mce_disable_cpu, , 1);
> @@ -2402,11 +2406,6 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned 
> long action, void *hcpu)
>   break;
>   }
> 
> - if (action == CPU_POST_DEAD) {
> - /* intentionally ignoring frozen here */
> - cmci_rediscover();
> - }
> -
>   return NOTIFY_OK;
>  }
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD

2014-05-22 Thread Srivatsa S. Bhat
On 05/23/2014 03:01 AM, Borislav Petkov wrote:
> On Fri, May 23, 2014 at 02:43:31AM +0530, Srivatsa S. Bhat wrote:
 After you move the cmci_rediscover() call, it is now in a place where we 
 are
 no longer ignoring frozen (i.e. the old placement did the rediscover even 
 if the
 CPU_TASKS_FROZEN bit was set - with the new placement we will skip 
 rediscovery.

>>
>> That's not quite true. The existing code already ignores FROZEN for all the 
>> cases,
>> by ignoring it at the top of the switch-case itself:
> 
> No, Tony's right and you got confused:
> 
> Before my change, the code did:
> 
>   if (action == CPU_POST_DEAD) {
>   /* intentionally ignoring frozen here */
>   cmci_rediscover();
>   }
> 
> which is only CPU_POST_DEAD *without* the CPU_TASKS_FROZEN bit.
> 
> If I move it in the switch-case, cmci_rediscover() *ignores the FROZEN
> bit and gets executed for both:
> 
>   CPU_DEAD:
>   CPU_DEAD_FROZEN:
> 
> because with the FROZEN bit masked out, they're the same.
> 
> But we don't want to execute it for the FROZEN bit - look for the other
> two tests for CPU_TASKS_FROZEN in mce.c for an example.
> 
> So, before we go and change the FROZEN aspect and break things in
> strange ways, let's keep the _FROZEN ignore. I certainly don't want to
> go down that road and chase why we needed FROZEN or not.
> 
> Ok?
> 

Right, I got confused about who meant what by the term 'ignore' -
ignore the FROZEN _bit_ as in execute all the time irrespective of that
bit being set or unset, or ignore the FROZEN _case_ as in don't execute
during suspend/resume.

Anyway, sorry for the confusion! Your latest code looks correct to me.

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD

2014-05-22 Thread Borislav Petkov
On Fri, May 23, 2014 at 02:43:31AM +0530, Srivatsa S. Bhat wrote:
> >> After you move the cmci_rediscover() call, it is now in a place where we 
> >> are
> >> no longer ignoring frozen (i.e. the old placement did the rediscover even 
> >> if the
> >> CPU_TASKS_FROZEN bit was set - with the new placement we will skip 
> >> rediscovery.
> >>
> 
> That's not quite true. The existing code already ignores FROZEN for all the 
> cases,
> by ignoring it at the top of the switch-case itself:

No, Tony's right and you got confused:

Before my change, the code did:

if (action == CPU_POST_DEAD) {
/* intentionally ignoring frozen here */
cmci_rediscover();
}

which is only CPU_POST_DEAD *without* the CPU_TASKS_FROZEN bit.

If I move it in the switch-case, cmci_rediscover() *ignores the FROZEN
bit and gets executed for both:

CPU_DEAD:
CPU_DEAD_FROZEN:

because with the FROZEN bit masked out, they're the same.

But we don't want to execute it for the FROZEN bit - look for the other
two tests for CPU_TASKS_FROZEN in mce.c for an example.

So, before we go and change the FROZEN aspect and break things in
strange ways, let's keep the _FROZEN ignore. I certainly don't want to
go down that road and chase why we needed FROZEN or not.

Ok?

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD

2014-05-22 Thread Srivatsa S. Bhat
On 05/23/2014 01:25 AM, Borislav Petkov wrote:
> On Thu, May 22, 2014 at 03:50:21PM +, Luck, Tony wrote:
 So I think we can reduce it to just the one rwsem (with recursion) if we
 shoot CPU_POST_DEAD in the head.
>>>
>>> Here's the first bullet. Stressing my box here with Steve's hotplug
>>> script seems to work fine.
>>>
>>> Tony, any objections?
>>
>> what was this comment referring to:
>>
>> /* intentionally ignoring frozen here */
>>
>> After you move the cmci_rediscover() call, it is now in a place where we are
>> no longer ignoring frozen (i.e. the old placement did the rediscover even if 
>> the
>> CPU_TASKS_FROZEN bit was set - with the new placement we will skip 
>> rediscovery.
>>

That's not quite true. The existing code already ignores FROZEN for all the 
cases,
by ignoring it at the top of the switch-case itself:

switch (action & ~CPU_TASKS_FROZEN) {
case CPU_ONLINE:
[...]
break;
case CPU_DEAD:
if (threshold_cpu_callback)
threshold_cpu_callback(action, cpu);
mce_device_remove(cpu);
mce_intel_hcpu_update(cpu);
break;

Then I started wondering what the comment really meant, and commit 1a65f970d1
made things clear: its actually the _other_ way around! That is, 
cmci_rediscover()
didn't have to be invoked* during suspend/resume, so it was kept separate from
the rest.

* or maybe it was not _supposed_ to be invoked; I don't know which is the case..
the original commit 88ccbedd9 didn't explain that.

Either way, cmci_rediscover() doesn't seem to have any reason why it should
be called specifically in the POST_DEAD stage only. So I'm sure we can get rid
of that one way or other easily.

Regards,
Srivatsa S. Bhat

>> So we were working around some interaction between cpu hotplug and frozen.
>> Do we no longer need to do that?
> 
> Hmm, that FROZEN thing is supposedly for hotplug operations while
> suspend is happening. I guess it makes a little sense to rediscover CMCI
> banks while suspend is in progress. Whatever.
> 
> Let's keep it before more crap ensues, that was a good catch, thanks.
> 
> So, I guess something like that instead.
> 
> Which means, I'd need to run a couple of suspend/resume rounds while
> hotplugging cores to see whether we're still kosher.
> 
> More fun tomorrow.
> 
> ---
> From: Borislav Petkov 
> Date: Thu, 22 May 2014 16:40:54 +0200
> Subject: [PATCH] x86, MCE: Kill CPU_POST_DEAD
> 
> In conjunction with cleaning up CPU hotplug, we want to get rid of
> CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the
> end of CPU_DEAD.
> 
> Signed-off-by: Borislav Petkov 
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c 
> b/arch/x86/kernel/cpu/mcheck/mce.c
> index 68317c80de7f..bfde4871848f 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -2391,6 +2391,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned 
> long action, void *hcpu)
>   threshold_cpu_callback(action, cpu);
>   mce_device_remove(cpu);
>   mce_intel_hcpu_update(cpu);
> +
> + /* intentionally ignoring frozen here */
> + if (!(action & CPU_TASKS_FROZEN))
> + cmci_rediscover();
>   break;
>   case CPU_DOWN_PREPARE:
>   smp_call_function_single(cpu, mce_disable_cpu, , 1);
> @@ -2402,11 +2406,6 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned 
> long action, void *hcpu)
>   break;
>   }
> 
> - if (action == CPU_POST_DEAD) {
> - /* intentionally ignoring frozen here */
> - cmci_rediscover();
> - }
> -
>   return NOTIFY_OK;
>  }
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD

2014-05-22 Thread Borislav Petkov
On Thu, May 22, 2014 at 03:50:21PM +, Luck, Tony wrote:
> >> So I think we can reduce it to just the one rwsem (with recursion) if we
> >> shoot CPU_POST_DEAD in the head.
> >
> > Here's the first bullet. Stressing my box here with Steve's hotplug
> > script seems to work fine.
> >
> > Tony, any objections?
> 
> what was this comment referring to:
> 
> /* intentionally ignoring frozen here */
> 
> After you move the cmci_rediscover() call, it is now in a place where we are
> no longer ignoring frozen (i.e. the old placement did the rediscover even if 
> the
> CPU_TASKS_FROZEN bit was set - with the new placement we will skip 
> rediscovery.
>
> So we were working around some interaction between cpu hotplug and frozen.
> Do we no longer need to do that?

Hmm, that FROZEN thing is supposedly for hotplug operations while
suspend is happening. I guess it makes a little sense to rediscover CMCI
banks while suspend is in progress. Whatever.

Let's keep it before more crap ensues, that was a good catch, thanks.

So, I guess something like that instead.

Which means, I'd need to run a couple of suspend/resume rounds while
hotplugging cores to see whether we're still kosher.

More fun tomorrow.

---
From: Borislav Petkov 
Date: Thu, 22 May 2014 16:40:54 +0200
Subject: [PATCH] x86, MCE: Kill CPU_POST_DEAD

In conjunction with cleaning up CPU hotplug, we want to get rid of
CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the
end of CPU_DEAD.

Signed-off-by: Borislav Petkov 
---
 arch/x86/kernel/cpu/mcheck/mce.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 68317c80de7f..bfde4871848f 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2391,6 +2391,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned 
long action, void *hcpu)
threshold_cpu_callback(action, cpu);
mce_device_remove(cpu);
mce_intel_hcpu_update(cpu);
+
+   /* intentionally ignoring frozen here */
+   if (!(action & CPU_TASKS_FROZEN))
+   cmci_rediscover();
break;
case CPU_DOWN_PREPARE:
smp_call_function_single(cpu, mce_disable_cpu, , 1);
@@ -2402,11 +2406,6 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned 
long action, void *hcpu)
break;
}
 
-   if (action == CPU_POST_DEAD) {
-   /* intentionally ignoring frozen here */
-   cmci_rediscover();
-   }
-
return NOTIFY_OK;
 }
 
-- 
1.9.0


-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] x86, MCE: Kill CPU_POST_DEAD

2014-05-22 Thread Luck, Tony
>> So I think we can reduce it to just the one rwsem (with recursion) if we
>> shoot CPU_POST_DEAD in the head.
>
> Here's the first bullet. Stressing my box here with Steve's hotplug
> script seems to work fine.
>
> Tony, any objections?

what was this comment referring to:

/* intentionally ignoring frozen here */

After you move the cmci_rediscover() call, it is now in a place where we are
no longer ignoring frozen (i.e. the old placement did the rediscover even if the
CPU_TASKS_FROZEN bit was set - with the new placement we will skip rediscovery.

So we were working around some interaction between cpu hotplug and frozen.
Do we no longer need to do that?

-Tony

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

RE: [PATCH] x86, MCE: Kill CPU_POST_DEAD

2014-05-22 Thread Luck, Tony
 So I think we can reduce it to just the one rwsem (with recursion) if we
 shoot CPU_POST_DEAD in the head.

 Here's the first bullet. Stressing my box here with Steve's hotplug
 script seems to work fine.

 Tony, any objections?

what was this comment referring to:

/* intentionally ignoring frozen here */

After you move the cmci_rediscover() call, it is now in a place where we are
no longer ignoring frozen (i.e. the old placement did the rediscover even if the
CPU_TASKS_FROZEN bit was set - with the new placement we will skip rediscovery.

So we were working around some interaction between cpu hotplug and frozen.
Do we no longer need to do that?

-Tony

N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD

2014-05-22 Thread Borislav Petkov
On Thu, May 22, 2014 at 03:50:21PM +, Luck, Tony wrote:
  So I think we can reduce it to just the one rwsem (with recursion) if we
  shoot CPU_POST_DEAD in the head.
 
  Here's the first bullet. Stressing my box here with Steve's hotplug
  script seems to work fine.
 
  Tony, any objections?
 
 what was this comment referring to:
 
 /* intentionally ignoring frozen here */
 
 After you move the cmci_rediscover() call, it is now in a place where we are
 no longer ignoring frozen (i.e. the old placement did the rediscover even if 
 the
 CPU_TASKS_FROZEN bit was set - with the new placement we will skip 
 rediscovery.

 So we were working around some interaction between cpu hotplug and frozen.
 Do we no longer need to do that?

Hmm, that FROZEN thing is supposedly for hotplug operations while
suspend is happening. I guess it makes a little sense to rediscover CMCI
banks while suspend is in progress. Whatever.

Let's keep it before more crap ensues, that was a good catch, thanks.

So, I guess something like that instead.

Which means, I'd need to run a couple of suspend/resume rounds while
hotplugging cores to see whether we're still kosher.

More fun tomorrow.

---
From: Borislav Petkov b...@suse.de
Date: Thu, 22 May 2014 16:40:54 +0200
Subject: [PATCH] x86, MCE: Kill CPU_POST_DEAD

In conjunction with cleaning up CPU hotplug, we want to get rid of
CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the
end of CPU_DEAD.

Signed-off-by: Borislav Petkov b...@suse.de
---
 arch/x86/kernel/cpu/mcheck/mce.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 68317c80de7f..bfde4871848f 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2391,6 +2391,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned 
long action, void *hcpu)
threshold_cpu_callback(action, cpu);
mce_device_remove(cpu);
mce_intel_hcpu_update(cpu);
+
+   /* intentionally ignoring frozen here */
+   if (!(action  CPU_TASKS_FROZEN))
+   cmci_rediscover();
break;
case CPU_DOWN_PREPARE:
smp_call_function_single(cpu, mce_disable_cpu, action, 1);
@@ -2402,11 +2406,6 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned 
long action, void *hcpu)
break;
}
 
-   if (action == CPU_POST_DEAD) {
-   /* intentionally ignoring frozen here */
-   cmci_rediscover();
-   }
-
return NOTIFY_OK;
 }
 
-- 
1.9.0


-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD

2014-05-22 Thread Srivatsa S. Bhat
On 05/23/2014 01:25 AM, Borislav Petkov wrote:
 On Thu, May 22, 2014 at 03:50:21PM +, Luck, Tony wrote:
 So I think we can reduce it to just the one rwsem (with recursion) if we
 shoot CPU_POST_DEAD in the head.

 Here's the first bullet. Stressing my box here with Steve's hotplug
 script seems to work fine.

 Tony, any objections?

 what was this comment referring to:

 /* intentionally ignoring frozen here */

 After you move the cmci_rediscover() call, it is now in a place where we are
 no longer ignoring frozen (i.e. the old placement did the rediscover even if 
 the
 CPU_TASKS_FROZEN bit was set - with the new placement we will skip 
 rediscovery.


That's not quite true. The existing code already ignores FROZEN for all the 
cases,
by ignoring it at the top of the switch-case itself:

switch (action  ~CPU_TASKS_FROZEN) {
case CPU_ONLINE:
[...]
break;
case CPU_DEAD:
if (threshold_cpu_callback)
threshold_cpu_callback(action, cpu);
mce_device_remove(cpu);
mce_intel_hcpu_update(cpu);
break;

Then I started wondering what the comment really meant, and commit 1a65f970d1
made things clear: its actually the _other_ way around! That is, 
cmci_rediscover()
didn't have to be invoked* during suspend/resume, so it was kept separate from
the rest.

* or maybe it was not _supposed_ to be invoked; I don't know which is the case..
the original commit 88ccbedd9 didn't explain that.

Either way, cmci_rediscover() doesn't seem to have any reason why it should
be called specifically in the POST_DEAD stage only. So I'm sure we can get rid
of that one way or other easily.

Regards,
Srivatsa S. Bhat

 So we were working around some interaction between cpu hotplug and frozen.
 Do we no longer need to do that?
 
 Hmm, that FROZEN thing is supposedly for hotplug operations while
 suspend is happening. I guess it makes a little sense to rediscover CMCI
 banks while suspend is in progress. Whatever.
 
 Let's keep it before more crap ensues, that was a good catch, thanks.
 
 So, I guess something like that instead.
 
 Which means, I'd need to run a couple of suspend/resume rounds while
 hotplugging cores to see whether we're still kosher.
 
 More fun tomorrow.
 
 ---
 From: Borislav Petkov b...@suse.de
 Date: Thu, 22 May 2014 16:40:54 +0200
 Subject: [PATCH] x86, MCE: Kill CPU_POST_DEAD
 
 In conjunction with cleaning up CPU hotplug, we want to get rid of
 CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the
 end of CPU_DEAD.
 
 Signed-off-by: Borislav Petkov b...@suse.de
 ---
  arch/x86/kernel/cpu/mcheck/mce.c | 9 -
  1 file changed, 4 insertions(+), 5 deletions(-)
 
 diff --git a/arch/x86/kernel/cpu/mcheck/mce.c 
 b/arch/x86/kernel/cpu/mcheck/mce.c
 index 68317c80de7f..bfde4871848f 100644
 --- a/arch/x86/kernel/cpu/mcheck/mce.c
 +++ b/arch/x86/kernel/cpu/mcheck/mce.c
 @@ -2391,6 +2391,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned 
 long action, void *hcpu)
   threshold_cpu_callback(action, cpu);
   mce_device_remove(cpu);
   mce_intel_hcpu_update(cpu);
 +
 + /* intentionally ignoring frozen here */
 + if (!(action  CPU_TASKS_FROZEN))
 + cmci_rediscover();
   break;
   case CPU_DOWN_PREPARE:
   smp_call_function_single(cpu, mce_disable_cpu, action, 1);
 @@ -2402,11 +2406,6 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned 
 long action, void *hcpu)
   break;
   }
 
 - if (action == CPU_POST_DEAD) {
 - /* intentionally ignoring frozen here */
 - cmci_rediscover();
 - }
 -
   return NOTIFY_OK;
  }
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD

2014-05-22 Thread Borislav Petkov
On Fri, May 23, 2014 at 02:43:31AM +0530, Srivatsa S. Bhat wrote:
  After you move the cmci_rediscover() call, it is now in a place where we 
  are
  no longer ignoring frozen (i.e. the old placement did the rediscover even 
  if the
  CPU_TASKS_FROZEN bit was set - with the new placement we will skip 
  rediscovery.
 
 
 That's not quite true. The existing code already ignores FROZEN for all the 
 cases,
 by ignoring it at the top of the switch-case itself:

No, Tony's right and you got confused:

Before my change, the code did:

if (action == CPU_POST_DEAD) {
/* intentionally ignoring frozen here */
cmci_rediscover();
}

which is only CPU_POST_DEAD *without* the CPU_TASKS_FROZEN bit.

If I move it in the switch-case, cmci_rediscover() *ignores the FROZEN
bit and gets executed for both:

CPU_DEAD:
CPU_DEAD_FROZEN:

because with the FROZEN bit masked out, they're the same.

But we don't want to execute it for the FROZEN bit - look for the other
two tests for CPU_TASKS_FROZEN in mce.c for an example.

So, before we go and change the FROZEN aspect and break things in
strange ways, let's keep the _FROZEN ignore. I certainly don't want to
go down that road and chase why we needed FROZEN or not.

Ok?

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD

2014-05-22 Thread Srivatsa S. Bhat
On 05/23/2014 03:01 AM, Borislav Petkov wrote:
 On Fri, May 23, 2014 at 02:43:31AM +0530, Srivatsa S. Bhat wrote:
 After you move the cmci_rediscover() call, it is now in a place where we 
 are
 no longer ignoring frozen (i.e. the old placement did the rediscover even 
 if the
 CPU_TASKS_FROZEN bit was set - with the new placement we will skip 
 rediscovery.


 That's not quite true. The existing code already ignores FROZEN for all the 
 cases,
 by ignoring it at the top of the switch-case itself:
 
 No, Tony's right and you got confused:
 
 Before my change, the code did:
 
   if (action == CPU_POST_DEAD) {
   /* intentionally ignoring frozen here */
   cmci_rediscover();
   }
 
 which is only CPU_POST_DEAD *without* the CPU_TASKS_FROZEN bit.
 
 If I move it in the switch-case, cmci_rediscover() *ignores the FROZEN
 bit and gets executed for both:
 
   CPU_DEAD:
   CPU_DEAD_FROZEN:
 
 because with the FROZEN bit masked out, they're the same.
 
 But we don't want to execute it for the FROZEN bit - look for the other
 two tests for CPU_TASKS_FROZEN in mce.c for an example.
 
 So, before we go and change the FROZEN aspect and break things in
 strange ways, let's keep the _FROZEN ignore. I certainly don't want to
 go down that road and chase why we needed FROZEN or not.
 
 Ok?
 

Right, I got confused about who meant what by the term 'ignore' -
ignore the FROZEN _bit_ as in execute all the time irrespective of that
bit being set or unset, or ignore the FROZEN _case_ as in don't execute
during suspend/resume.

Anyway, sorry for the confusion! Your latest code looks correct to me.

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD

2014-05-22 Thread Srivatsa S. Bhat
On 05/23/2014 01:25 AM, Borislav Petkov wrote:
 On Thu, May 22, 2014 at 03:50:21PM +, Luck, Tony wrote:
 So I think we can reduce it to just the one rwsem (with recursion) if we
 shoot CPU_POST_DEAD in the head.

 Here's the first bullet. Stressing my box here with Steve's hotplug
 script seems to work fine.

 Tony, any objections?

 what was this comment referring to:

 /* intentionally ignoring frozen here */

 After you move the cmci_rediscover() call, it is now in a place where we are
 no longer ignoring frozen (i.e. the old placement did the rediscover even if 
 the
 CPU_TASKS_FROZEN bit was set - with the new placement we will skip 
 rediscovery.

 So we were working around some interaction between cpu hotplug and frozen.
 Do we no longer need to do that?
 
 Hmm, that FROZEN thing is supposedly for hotplug operations while
 suspend is happening. I guess it makes a little sense to rediscover CMCI
 banks while suspend is in progress. Whatever.
 
 Let's keep it before more crap ensues, that was a good catch, thanks.
 
 So, I guess something like that instead.
 
 Which means, I'd need to run a couple of suspend/resume rounds while
 hotplugging cores to see whether we're still kosher.
 
 More fun tomorrow.
 
 ---
 From: Borislav Petkov b...@suse.de
 Date: Thu, 22 May 2014 16:40:54 +0200
 Subject: [PATCH] x86, MCE: Kill CPU_POST_DEAD
 
 In conjunction with cleaning up CPU hotplug, we want to get rid of
 CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the
 end of CPU_DEAD.
 
 Signed-off-by: Borislav Petkov b...@suse.de

Reviewed-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com

Regards,
Srivatsa S. Bhat

 ---
  arch/x86/kernel/cpu/mcheck/mce.c | 9 -
  1 file changed, 4 insertions(+), 5 deletions(-)
 
 diff --git a/arch/x86/kernel/cpu/mcheck/mce.c 
 b/arch/x86/kernel/cpu/mcheck/mce.c
 index 68317c80de7f..bfde4871848f 100644
 --- a/arch/x86/kernel/cpu/mcheck/mce.c
 +++ b/arch/x86/kernel/cpu/mcheck/mce.c
 @@ -2391,6 +2391,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned 
 long action, void *hcpu)
   threshold_cpu_callback(action, cpu);
   mce_device_remove(cpu);
   mce_intel_hcpu_update(cpu);
 +
 + /* intentionally ignoring frozen here */
 + if (!(action  CPU_TASKS_FROZEN))
 + cmci_rediscover();
   break;
   case CPU_DOWN_PREPARE:
   smp_call_function_single(cpu, mce_disable_cpu, action, 1);
 @@ -2402,11 +2406,6 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned 
 long action, void *hcpu)
   break;
   }
 
 - if (action == CPU_POST_DEAD) {
 - /* intentionally ignoring frozen here */
 - cmci_rediscover();
 - }
 -
   return NOTIFY_OK;
  }
 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/