Re: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()

2012-10-05 Thread Srivatsa S. Bhat
On 10/05/2012 08:54 AM, Yasuaki Ishimatsu wrote:
> 2012/10/04 15:16, Srivatsa S. Bhat wrote:
>> On 10/04/2012 02:43 AM, Andrew Morton wrote:
>>> On Wed, 03 Oct 2012 18:23:09 +0530
>>> "Srivatsa S. Bhat"  wrote:
>>>
>>>> The synchronization between CPU hotplug readers and writers is
>>>> achieved by
>>>> means of refcounting, safe-guarded by the cpu_hotplug.lock.
>>>>
>>>> get_online_cpus() increments the refcount, whereas put_online_cpus()
>>>> decrements
>>>> it. If we ever hit an imbalance between the two, we end up
>>>> compromising the
>>>> guarantees of the hotplug synchronization i.e, for example, an extra
>>>> call to
>>>> put_online_cpus() can end up allowing a hotplug reader to execute
>>>> concurrently with
>>>> a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect
>>>> such cases
>>>> where the refcount can go negative.
>>>>
>>>> Signed-off-by: Srivatsa S. Bhat 
>>>> ---
>>>>
>>>>   kernel/cpu.c |1 +
>>>>   1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>>>> index f560598..00d29bc 100644
>>>> --- a/kernel/cpu.c
>>>> +++ b/kernel/cpu.c
>>>> @@ -80,6 +80,7 @@ void put_online_cpus(void)
>>>>   if (cpu_hotplug.active_writer == current)
>>>>   return;
>>>>   mutex_lock(_hotplug.lock);
>>>> +BUG_ON(cpu_hotplug.refcount == 0);
>>>>   if (!--cpu_hotplug.refcount &&
>>>> unlikely(cpu_hotplug.active_writer))
>>>>   wake_up_process(cpu_hotplug.active_writer);
>>>>   mutex_unlock(_hotplug.lock);
>>>
>>> I think calling BUG() here is a bit harsh.  We should only do that if
>>> there's a risk to proceeding: a risk of data loss, a reduced ability to
>>> analyse the underlying bug, etc.
>>>
>>> But a cpu-hotplug locking imbalance is a really really really minor
>>> problem!  So how about we emit a warning then try to fix things up?
>>
>> That would be better indeed, thanks!
>>
>>> This should increase the chance that the machine will keep running and
>>> so will increase the chance that a user will be able to report the bug
>>> to us.
>>>
>>
>> Yep, sounds good.
>>
>>>
>>> ---
>>> a/kernel/cpu.c~cpu-hotplug-debug-detect-imbalance-between-get_online_cpus-and-put_online_cpus-fix
>>>
>>> +++ a/kernel/cpu.c
>>> @@ -80,9 +80,12 @@ void put_online_cpus(void)
>>>   if (cpu_hotplug.active_writer == current)
>>>   return;
>>>   mutex_lock(_hotplug.lock);
>>> -BUG_ON(cpu_hotplug.refcount == 0);
>>> -if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>>> -wake_up_process(cpu_hotplug.active_writer);
>>> +if (!--cpu_hotplug.refcount) {
>>
>> This won't catch it. We'll enter this 'if' condition only when
>> cpu_hotplug.refcount was
>> decremented to zero. We'll miss out the case when it went negative
>> (which we intended to detect).
>>
>>> +if (WARN_ON(cpu_hotplug.refcount == -1))
>>> +cpu_hotplug.refcount++;/* try to fix things up */
>>> +if (unlikely(cpu_hotplug.active_writer))
>>> +wake_up_process(cpu_hotplug.active_writer);
>>> +}
>>>   mutex_unlock(_hotplug.lock);
>>>
>>>   }
>>
>> So how about something like below:
>>
>> -->
>>
>> From: Srivatsa S. Bhat 
>> Subject: [PATCH] CPU hotplug, debug: Detect imbalance between
>> get_online_cpus() and put_online_cpus()
>>
>> The synchronization between CPU hotplug readers and writers is
>> achieved by
>> means of refcounting, safe-guarded by the cpu_hotplug.lock.
>>
>> get_online_cpus() increments the refcount, whereas put_online_cpus()
>> decrements
>> it. If we ever hit an imbalance between the two, we end up
>> compromising the
>> guarantees of the hotplug synchronization i.e, for example, an extra
>> call to
>> put_online_cpus() can end up allowing a hotplug reader to execute
>> concurrently with
>> a hotplug writer. So, add a WARN_ON() in put_online_cpus() to detect
>> such cases
>> where the refcount can go negative, and also attempt to fix it up, so
>>

Re: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()

2012-10-05 Thread Srivatsa S. Bhat
On 10/05/2012 08:54 AM, Yasuaki Ishimatsu wrote:
 2012/10/04 15:16, Srivatsa S. Bhat wrote:
 On 10/04/2012 02:43 AM, Andrew Morton wrote:
 On Wed, 03 Oct 2012 18:23:09 +0530
 Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote:

 The synchronization between CPU hotplug readers and writers is
 achieved by
 means of refcounting, safe-guarded by the cpu_hotplug.lock.

 get_online_cpus() increments the refcount, whereas put_online_cpus()
 decrements
 it. If we ever hit an imbalance between the two, we end up
 compromising the
 guarantees of the hotplug synchronization i.e, for example, an extra
 call to
 put_online_cpus() can end up allowing a hotplug reader to execute
 concurrently with
 a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect
 such cases
 where the refcount can go negative.

 Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
 ---

   kernel/cpu.c |1 +
   1 file changed, 1 insertion(+)

 diff --git a/kernel/cpu.c b/kernel/cpu.c
 index f560598..00d29bc 100644
 --- a/kernel/cpu.c
 +++ b/kernel/cpu.c
 @@ -80,6 +80,7 @@ void put_online_cpus(void)
   if (cpu_hotplug.active_writer == current)
   return;
   mutex_lock(cpu_hotplug.lock);
 +BUG_ON(cpu_hotplug.refcount == 0);
   if (!--cpu_hotplug.refcount 
 unlikely(cpu_hotplug.active_writer))
   wake_up_process(cpu_hotplug.active_writer);
   mutex_unlock(cpu_hotplug.lock);

 I think calling BUG() here is a bit harsh.  We should only do that if
 there's a risk to proceeding: a risk of data loss, a reduced ability to
 analyse the underlying bug, etc.

 But a cpu-hotplug locking imbalance is a really really really minor
 problem!  So how about we emit a warning then try to fix things up?

 That would be better indeed, thanks!

 This should increase the chance that the machine will keep running and
 so will increase the chance that a user will be able to report the bug
 to us.


 Yep, sounds good.


 ---
 a/kernel/cpu.c~cpu-hotplug-debug-detect-imbalance-between-get_online_cpus-and-put_online_cpus-fix

 +++ a/kernel/cpu.c
 @@ -80,9 +80,12 @@ void put_online_cpus(void)
   if (cpu_hotplug.active_writer == current)
   return;
   mutex_lock(cpu_hotplug.lock);
 -BUG_ON(cpu_hotplug.refcount == 0);
 -if (!--cpu_hotplug.refcount  unlikely(cpu_hotplug.active_writer))
 -wake_up_process(cpu_hotplug.active_writer);
 +if (!--cpu_hotplug.refcount) {

 This won't catch it. We'll enter this 'if' condition only when
 cpu_hotplug.refcount was
 decremented to zero. We'll miss out the case when it went negative
 (which we intended to detect).

 +if (WARN_ON(cpu_hotplug.refcount == -1))
 +cpu_hotplug.refcount++;/* try to fix things up */
 +if (unlikely(cpu_hotplug.active_writer))
 +wake_up_process(cpu_hotplug.active_writer);
 +}
   mutex_unlock(cpu_hotplug.lock);

   }

 So how about something like below:

 --

 From: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
 Subject: [PATCH] CPU hotplug, debug: Detect imbalance between
 get_online_cpus() and put_online_cpus()

 The synchronization between CPU hotplug readers and writers is
 achieved by
 means of refcounting, safe-guarded by the cpu_hotplug.lock.

 get_online_cpus() increments the refcount, whereas put_online_cpus()
 decrements
 it. If we ever hit an imbalance between the two, we end up
 compromising the
 guarantees of the hotplug synchronization i.e, for example, an extra
 call to
 put_online_cpus() can end up allowing a hotplug reader to execute
 concurrently with
 a hotplug writer. So, add a WARN_ON() in put_online_cpus() to detect
 such cases
 where the refcount can go negative, and also attempt to fix it up, so
 that we can
 continue to run.

 Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
 ---
 
 Looks good to me.
 Reviewed-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
 

Thanks for your review Yasuaki!

Regards,
Srivatsa S. Bhat


   kernel/cpu.c |4 
   1 file changed, 4 insertions(+)

 diff --git a/kernel/cpu.c b/kernel/cpu.c
 index f560598..42bd331 100644
 --- a/kernel/cpu.c
 +++ b/kernel/cpu.c
 @@ -80,6 +80,10 @@ void put_online_cpus(void)
   if (cpu_hotplug.active_writer == current)
   return;
   mutex_lock(cpu_hotplug.lock);
 +
 +if (WARN_ON(!cpu_hotplug.refcount))
 +cpu_hotplug.refcount++; /* try to fix things up */
 +
   if (!--cpu_hotplug.refcount  unlikely(cpu_hotplug.active_writer))
   wake_up_process(cpu_hotplug.active_writer);
   mutex_unlock(cpu_hotplug.lock);



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()

2012-10-04 Thread Yasuaki Ishimatsu

2012/10/04 15:16, Srivatsa S. Bhat wrote:

On 10/04/2012 02:43 AM, Andrew Morton wrote:

On Wed, 03 Oct 2012 18:23:09 +0530
"Srivatsa S. Bhat"  wrote:


The synchronization between CPU hotplug readers and writers is achieved by
means of refcounting, safe-guarded by the cpu_hotplug.lock.

get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
it. If we ever hit an imbalance between the two, we end up compromising the
guarantees of the hotplug synchronization i.e, for example, an extra call to
put_online_cpus() can end up allowing a hotplug reader to execute concurrently 
with
a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect such cases
where the refcount can go negative.

Signed-off-by: Srivatsa S. Bhat 
---

  kernel/cpu.c |1 +
  1 file changed, 1 insertion(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index f560598..00d29bc 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -80,6 +80,7 @@ void put_online_cpus(void)
if (cpu_hotplug.active_writer == current)
return;
mutex_lock(_hotplug.lock);
+   BUG_ON(cpu_hotplug.refcount == 0);
if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
wake_up_process(cpu_hotplug.active_writer);
mutex_unlock(_hotplug.lock);


I think calling BUG() here is a bit harsh.  We should only do that if
there's a risk to proceeding: a risk of data loss, a reduced ability to
analyse the underlying bug, etc.

But a cpu-hotplug locking imbalance is a really really really minor
problem!  So how about we emit a warning then try to fix things up?


That would be better indeed, thanks!


This should increase the chance that the machine will keep running and
so will increase the chance that a user will be able to report the bug
to us.



Yep, sounds good.



--- 
a/kernel/cpu.c~cpu-hotplug-debug-detect-imbalance-between-get_online_cpus-and-put_online_cpus-fix
+++ a/kernel/cpu.c
@@ -80,9 +80,12 @@ void put_online_cpus(void)
if (cpu_hotplug.active_writer == current)
return;
mutex_lock(_hotplug.lock);
-   BUG_ON(cpu_hotplug.refcount == 0);
-   if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
-   wake_up_process(cpu_hotplug.active_writer);
+   if (!--cpu_hotplug.refcount) {


This won't catch it. We'll enter this 'if' condition only when 
cpu_hotplug.refcount was
decremented to zero. We'll miss out the case when it went negative (which we 
intended to detect).


+   if (WARN_ON(cpu_hotplug.refcount == -1))
+   cpu_hotplug.refcount++; /* try to fix things up */
+   if (unlikely(cpu_hotplug.active_writer))
+   wake_up_process(cpu_hotplug.active_writer);
+   }
mutex_unlock(_hotplug.lock);

  }


So how about something like below:

-->

From: Srivatsa S. Bhat 
Subject: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() 
and put_online_cpus()

The synchronization between CPU hotplug readers and writers is achieved by
means of refcounting, safe-guarded by the cpu_hotplug.lock.

get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
it. If we ever hit an imbalance between the two, we end up compromising the
guarantees of the hotplug synchronization i.e, for example, an extra call to
put_online_cpus() can end up allowing a hotplug reader to execute concurrently 
with
a hotplug writer. So, add a WARN_ON() in put_online_cpus() to detect such cases
where the refcount can go negative, and also attempt to fix it up, so that we 
can
continue to run.

Signed-off-by: Srivatsa S. Bhat 
---


Looks good to me.
Reviewed-by: Yasuaki Ishimatsu 



  kernel/cpu.c |4 
  1 file changed, 4 insertions(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index f560598..42bd331 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -80,6 +80,10 @@ void put_online_cpus(void)
if (cpu_hotplug.active_writer == current)
return;
mutex_lock(_hotplug.lock);
+
+   if (WARN_ON(!cpu_hotplug.refcount))
+   cpu_hotplug.refcount++; /* try to fix things up */
+
if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
wake_up_process(cpu_hotplug.active_writer);
mutex_unlock(_hotplug.lock);


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majord...@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: mailto:"d...@kvack.org;> em...@kvack.org 




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()

2012-10-04 Thread Srivatsa S. Bhat
On 10/04/2012 02:43 AM, Andrew Morton wrote:
> On Wed, 03 Oct 2012 18:23:09 +0530
> "Srivatsa S. Bhat"  wrote:
> 
>> The synchronization between CPU hotplug readers and writers is achieved by
>> means of refcounting, safe-guarded by the cpu_hotplug.lock.
>>
>> get_online_cpus() increments the refcount, whereas put_online_cpus() 
>> decrements
>> it. If we ever hit an imbalance between the two, we end up compromising the
>> guarantees of the hotplug synchronization i.e, for example, an extra call to
>> put_online_cpus() can end up allowing a hotplug reader to execute 
>> concurrently with
>> a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect such 
>> cases
>> where the refcount can go negative.
>>
>> Signed-off-by: Srivatsa S. Bhat 
>> ---
>>
>>  kernel/cpu.c |1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>> index f560598..00d29bc 100644
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -80,6 +80,7 @@ void put_online_cpus(void)
>>  if (cpu_hotplug.active_writer == current)
>>  return;
>>  mutex_lock(_hotplug.lock);
>> +BUG_ON(cpu_hotplug.refcount == 0);
>>  if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>>  wake_up_process(cpu_hotplug.active_writer);
>>  mutex_unlock(_hotplug.lock);
> 
> I think calling BUG() here is a bit harsh.  We should only do that if
> there's a risk to proceeding: a risk of data loss, a reduced ability to
> analyse the underlying bug, etc.
> 
> But a cpu-hotplug locking imbalance is a really really really minor
> problem!  So how about we emit a warning then try to fix things up? 

That would be better indeed, thanks!

> This should increase the chance that the machine will keep running and
> so will increase the chance that a user will be able to report the bug
> to us.
>

Yep, sounds good.
 
> 
> --- 
> a/kernel/cpu.c~cpu-hotplug-debug-detect-imbalance-between-get_online_cpus-and-put_online_cpus-fix
> +++ a/kernel/cpu.c
> @@ -80,9 +80,12 @@ void put_online_cpus(void)
>   if (cpu_hotplug.active_writer == current)
>   return;
>   mutex_lock(_hotplug.lock);
> - BUG_ON(cpu_hotplug.refcount == 0);
> - if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
> - wake_up_process(cpu_hotplug.active_writer);
> + if (!--cpu_hotplug.refcount) {

This won't catch it. We'll enter this 'if' condition only when 
cpu_hotplug.refcount was
decremented to zero. We'll miss out the case when it went negative (which we 
intended to detect).

> + if (WARN_ON(cpu_hotplug.refcount == -1))
> + cpu_hotplug.refcount++; /* try to fix things up */
> +     if (unlikely(cpu_hotplug.active_writer))
> + wake_up_process(cpu_hotplug.active_writer);
> + }
>   mutex_unlock(_hotplug.lock);
> 
>  }

So how about something like below:

-->

From: Srivatsa S. Bhat 
Subject: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() 
and put_online_cpus()

The synchronization between CPU hotplug readers and writers is achieved by
means of refcounting, safe-guarded by the cpu_hotplug.lock.

get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
it. If we ever hit an imbalance between the two, we end up compromising the
guarantees of the hotplug synchronization i.e, for example, an extra call to
put_online_cpus() can end up allowing a hotplug reader to execute concurrently 
with
a hotplug writer. So, add a WARN_ON() in put_online_cpus() to detect such cases
where the refcount can go negative, and also attempt to fix it up, so that we 
can
continue to run.

Signed-off-by: Srivatsa S. Bhat 
---

 kernel/cpu.c |4 
 1 file changed, 4 insertions(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index f560598..42bd331 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -80,6 +80,10 @@ void put_online_cpus(void)
if (cpu_hotplug.active_writer == current)
return;
mutex_lock(_hotplug.lock);
+
+   if (WARN_ON(!cpu_hotplug.refcount))
+   cpu_hotplug.refcount++; /* try to fix things up */
+
if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
wake_up_process(cpu_hotplug.active_writer);
mutex_unlock(_hotplug.lock);


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()

2012-10-04 Thread Srivatsa S. Bhat
On 10/04/2012 02:43 AM, Andrew Morton wrote:
 On Wed, 03 Oct 2012 18:23:09 +0530
 Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote:
 
 The synchronization between CPU hotplug readers and writers is achieved by
 means of refcounting, safe-guarded by the cpu_hotplug.lock.

 get_online_cpus() increments the refcount, whereas put_online_cpus() 
 decrements
 it. If we ever hit an imbalance between the two, we end up compromising the
 guarantees of the hotplug synchronization i.e, for example, an extra call to
 put_online_cpus() can end up allowing a hotplug reader to execute 
 concurrently with
 a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect such 
 cases
 where the refcount can go negative.

 Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
 ---

  kernel/cpu.c |1 +
  1 file changed, 1 insertion(+)

 diff --git a/kernel/cpu.c b/kernel/cpu.c
 index f560598..00d29bc 100644
 --- a/kernel/cpu.c
 +++ b/kernel/cpu.c
 @@ -80,6 +80,7 @@ void put_online_cpus(void)
  if (cpu_hotplug.active_writer == current)
  return;
  mutex_lock(cpu_hotplug.lock);
 +BUG_ON(cpu_hotplug.refcount == 0);
  if (!--cpu_hotplug.refcount  unlikely(cpu_hotplug.active_writer))
  wake_up_process(cpu_hotplug.active_writer);
  mutex_unlock(cpu_hotplug.lock);
 
 I think calling BUG() here is a bit harsh.  We should only do that if
 there's a risk to proceeding: a risk of data loss, a reduced ability to
 analyse the underlying bug, etc.
 
 But a cpu-hotplug locking imbalance is a really really really minor
 problem!  So how about we emit a warning then try to fix things up? 

That would be better indeed, thanks!

 This should increase the chance that the machine will keep running and
 so will increase the chance that a user will be able to report the bug
 to us.


Yep, sounds good.
 
 
 --- 
 a/kernel/cpu.c~cpu-hotplug-debug-detect-imbalance-between-get_online_cpus-and-put_online_cpus-fix
 +++ a/kernel/cpu.c
 @@ -80,9 +80,12 @@ void put_online_cpus(void)
   if (cpu_hotplug.active_writer == current)
   return;
   mutex_lock(cpu_hotplug.lock);
 - BUG_ON(cpu_hotplug.refcount == 0);
 - if (!--cpu_hotplug.refcount  unlikely(cpu_hotplug.active_writer))
 - wake_up_process(cpu_hotplug.active_writer);
 + if (!--cpu_hotplug.refcount) {

This won't catch it. We'll enter this 'if' condition only when 
cpu_hotplug.refcount was
decremented to zero. We'll miss out the case when it went negative (which we 
intended to detect).

 + if (WARN_ON(cpu_hotplug.refcount == -1))
 + cpu_hotplug.refcount++; /* try to fix things up */
 + if (unlikely(cpu_hotplug.active_writer))
 + wake_up_process(cpu_hotplug.active_writer);
 + }
   mutex_unlock(cpu_hotplug.lock);
 
  }

So how about something like below:

--

From: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
Subject: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() 
and put_online_cpus()

The synchronization between CPU hotplug readers and writers is achieved by
means of refcounting, safe-guarded by the cpu_hotplug.lock.

get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
it. If we ever hit an imbalance between the two, we end up compromising the
guarantees of the hotplug synchronization i.e, for example, an extra call to
put_online_cpus() can end up allowing a hotplug reader to execute concurrently 
with
a hotplug writer. So, add a WARN_ON() in put_online_cpus() to detect such cases
where the refcount can go negative, and also attempt to fix it up, so that we 
can
continue to run.

Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
---

 kernel/cpu.c |4 
 1 file changed, 4 insertions(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index f560598..42bd331 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -80,6 +80,10 @@ void put_online_cpus(void)
if (cpu_hotplug.active_writer == current)
return;
mutex_lock(cpu_hotplug.lock);
+
+   if (WARN_ON(!cpu_hotplug.refcount))
+   cpu_hotplug.refcount++; /* try to fix things up */
+
if (!--cpu_hotplug.refcount  unlikely(cpu_hotplug.active_writer))
wake_up_process(cpu_hotplug.active_writer);
mutex_unlock(cpu_hotplug.lock);


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()

2012-10-04 Thread Yasuaki Ishimatsu

2012/10/04 15:16, Srivatsa S. Bhat wrote:

On 10/04/2012 02:43 AM, Andrew Morton wrote:

On Wed, 03 Oct 2012 18:23:09 +0530
Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote:


The synchronization between CPU hotplug readers and writers is achieved by
means of refcounting, safe-guarded by the cpu_hotplug.lock.

get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
it. If we ever hit an imbalance between the two, we end up compromising the
guarantees of the hotplug synchronization i.e, for example, an extra call to
put_online_cpus() can end up allowing a hotplug reader to execute concurrently 
with
a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect such cases
where the refcount can go negative.

Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
---

  kernel/cpu.c |1 +
  1 file changed, 1 insertion(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index f560598..00d29bc 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -80,6 +80,7 @@ void put_online_cpus(void)
if (cpu_hotplug.active_writer == current)
return;
mutex_lock(cpu_hotplug.lock);
+   BUG_ON(cpu_hotplug.refcount == 0);
if (!--cpu_hotplug.refcount  unlikely(cpu_hotplug.active_writer))
wake_up_process(cpu_hotplug.active_writer);
mutex_unlock(cpu_hotplug.lock);


I think calling BUG() here is a bit harsh.  We should only do that if
there's a risk to proceeding: a risk of data loss, a reduced ability to
analyse the underlying bug, etc.

But a cpu-hotplug locking imbalance is a really really really minor
problem!  So how about we emit a warning then try to fix things up?


That would be better indeed, thanks!


This should increase the chance that the machine will keep running and
so will increase the chance that a user will be able to report the bug
to us.



Yep, sounds good.



--- 
a/kernel/cpu.c~cpu-hotplug-debug-detect-imbalance-between-get_online_cpus-and-put_online_cpus-fix
+++ a/kernel/cpu.c
@@ -80,9 +80,12 @@ void put_online_cpus(void)
if (cpu_hotplug.active_writer == current)
return;
mutex_lock(cpu_hotplug.lock);
-   BUG_ON(cpu_hotplug.refcount == 0);
-   if (!--cpu_hotplug.refcount  unlikely(cpu_hotplug.active_writer))
-   wake_up_process(cpu_hotplug.active_writer);
+   if (!--cpu_hotplug.refcount) {


This won't catch it. We'll enter this 'if' condition only when 
cpu_hotplug.refcount was
decremented to zero. We'll miss out the case when it went negative (which we 
intended to detect).


+   if (WARN_ON(cpu_hotplug.refcount == -1))
+   cpu_hotplug.refcount++; /* try to fix things up */
+   if (unlikely(cpu_hotplug.active_writer))
+   wake_up_process(cpu_hotplug.active_writer);
+   }
mutex_unlock(cpu_hotplug.lock);

  }


So how about something like below:

--

From: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
Subject: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() 
and put_online_cpus()

The synchronization between CPU hotplug readers and writers is achieved by
means of refcounting, safe-guarded by the cpu_hotplug.lock.

get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
it. If we ever hit an imbalance between the two, we end up compromising the
guarantees of the hotplug synchronization i.e, for example, an extra call to
put_online_cpus() can end up allowing a hotplug reader to execute concurrently 
with
a hotplug writer. So, add a WARN_ON() in put_online_cpus() to detect such cases
where the refcount can go negative, and also attempt to fix it up, so that we 
can
continue to run.

Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
---


Looks good to me.
Reviewed-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com



  kernel/cpu.c |4 
  1 file changed, 4 insertions(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index f560598..42bd331 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -80,6 +80,10 @@ void put_online_cpus(void)
if (cpu_hotplug.active_writer == current)
return;
mutex_lock(cpu_hotplug.lock);
+
+   if (WARN_ON(!cpu_hotplug.refcount))
+   cpu_hotplug.refcount++; /* try to fix things up */
+
if (!--cpu_hotplug.refcount  unlikely(cpu_hotplug.active_writer))
wake_up_process(cpu_hotplug.active_writer);
mutex_unlock(cpu_hotplug.lock);


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majord...@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: a href=mailto:d...@kvack.org; em...@kvack.org /a




--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ

Re: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()

2012-10-03 Thread Andrew Morton
On Wed, 03 Oct 2012 18:23:09 +0530
"Srivatsa S. Bhat"  wrote:

> The synchronization between CPU hotplug readers and writers is achieved by
> means of refcounting, safe-guarded by the cpu_hotplug.lock.
> 
> get_online_cpus() increments the refcount, whereas put_online_cpus() 
> decrements
> it. If we ever hit an imbalance between the two, we end up compromising the
> guarantees of the hotplug synchronization i.e, for example, an extra call to
> put_online_cpus() can end up allowing a hotplug reader to execute 
> concurrently with
> a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect such cases
> where the refcount can go negative.
> 
> Signed-off-by: Srivatsa S. Bhat 
> ---
> 
>  kernel/cpu.c |1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index f560598..00d29bc 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -80,6 +80,7 @@ void put_online_cpus(void)
>   if (cpu_hotplug.active_writer == current)
>   return;
>   mutex_lock(_hotplug.lock);
> + BUG_ON(cpu_hotplug.refcount == 0);
>   if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>   wake_up_process(cpu_hotplug.active_writer);
>   mutex_unlock(_hotplug.lock);

I think calling BUG() here is a bit harsh.  We should only do that if
there's a risk to proceeding: a risk of data loss, a reduced ability to
analyse the underlying bug, etc.

But a cpu-hotplug locking imbalance is a really really really minor
problem!  So how about we emit a warning then try to fix things up? 
This should increase the chance that the machine will keep running and
so will increase the chance that a user will be able to report the bug
to us.


--- 
a/kernel/cpu.c~cpu-hotplug-debug-detect-imbalance-between-get_online_cpus-and-put_online_cpus-fix
+++ a/kernel/cpu.c
@@ -80,9 +80,12 @@ void put_online_cpus(void)
if (cpu_hotplug.active_writer == current)
return;
mutex_lock(_hotplug.lock);
-   BUG_ON(cpu_hotplug.refcount == 0);
-   if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
-   wake_up_process(cpu_hotplug.active_writer);
+   if (!--cpu_hotplug.refcount) {
+   if (WARN_ON(cpu_hotplug.refcount == -1))
+   cpu_hotplug.refcount++; /* try to fix things up */
+   if (unlikely(cpu_hotplug.active_writer))
+   wake_up_process(cpu_hotplug.active_writer);
+   }
mutex_unlock(_hotplug.lock);
 
 }
_

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()

2012-10-03 Thread Srivatsa S. Bhat
On 10/03/2012 05:52 PM, Srivatsa S. Bhat wrote:
> On 10/03/2012 03:16 PM, Jiri Kosina wrote:
>> On Wed, 3 Oct 2012, Jiri Kosina wrote:
>>
>>> Good question. I believe it should be safe to drop slab_mutex earlier, as 
>>> cachep has already been unlinked. I am adding slab people and linux-mm to 
>>> CC (the whole thread on LKML can be found at 
>>> https://lkml.org/lkml/2012/10/2/296 for reference).
>>>
[...]
> 
> But, I'm also quite surprised that the put_online_cpus() code as it stands 
> today
> doesn't have any checks for the refcount going negative. I believe that such a
> check would be valuable to help catch cases where we might end up 
> inadvertently
> causing an imbalance between get_online_cpus() and put_online_cpus(). I'll 
> post
> that as a separate patch.
> 


---------------


From: Srivatsa S. Bhat 
Subject: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() 
and put_online_cpus()

The synchronization between CPU hotplug readers and writers is achieved by
means of refcounting, safe-guarded by the cpu_hotplug.lock.

get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
it. If we ever hit an imbalance between the two, we end up compromising the
guarantees of the hotplug synchronization i.e, for example, an extra call to
put_online_cpus() can end up allowing a hotplug reader to execute concurrently 
with
a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect such cases
where the refcount can go negative.

Signed-off-by: Srivatsa S. Bhat 
---

 kernel/cpu.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index f560598..00d29bc 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -80,6 +80,7 @@ void put_online_cpus(void)
if (cpu_hotplug.active_writer == current)
return;
mutex_lock(_hotplug.lock);
+   BUG_ON(cpu_hotplug.refcount == 0);
if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
wake_up_process(cpu_hotplug.active_writer);
mutex_unlock(_hotplug.lock);


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()

2012-10-03 Thread Srivatsa S. Bhat
On 10/03/2012 05:52 PM, Srivatsa S. Bhat wrote:
 On 10/03/2012 03:16 PM, Jiri Kosina wrote:
 On Wed, 3 Oct 2012, Jiri Kosina wrote:

 Good question. I believe it should be safe to drop slab_mutex earlier, as 
 cachep has already been unlinked. I am adding slab people and linux-mm to 
 CC (the whole thread on LKML can be found at 
 https://lkml.org/lkml/2012/10/2/296 for reference).

[...]
 
 But, I'm also quite surprised that the put_online_cpus() code as it stands 
 today
 doesn't have any checks for the refcount going negative. I believe that such a
 check would be valuable to help catch cases where we might end up 
 inadvertently
 causing an imbalance between get_online_cpus() and put_online_cpus(). I'll 
 post
 that as a separate patch.
 


---


From: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
Subject: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() 
and put_online_cpus()

The synchronization between CPU hotplug readers and writers is achieved by
means of refcounting, safe-guarded by the cpu_hotplug.lock.

get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
it. If we ever hit an imbalance between the two, we end up compromising the
guarantees of the hotplug synchronization i.e, for example, an extra call to
put_online_cpus() can end up allowing a hotplug reader to execute concurrently 
with
a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect such cases
where the refcount can go negative.

Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
---

 kernel/cpu.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index f560598..00d29bc 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -80,6 +80,7 @@ void put_online_cpus(void)
if (cpu_hotplug.active_writer == current)
return;
mutex_lock(cpu_hotplug.lock);
+   BUG_ON(cpu_hotplug.refcount == 0);
if (!--cpu_hotplug.refcount  unlikely(cpu_hotplug.active_writer))
wake_up_process(cpu_hotplug.active_writer);
mutex_unlock(cpu_hotplug.lock);


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()

2012-10-03 Thread Andrew Morton
On Wed, 03 Oct 2012 18:23:09 +0530
Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote:

 The synchronization between CPU hotplug readers and writers is achieved by
 means of refcounting, safe-guarded by the cpu_hotplug.lock.
 
 get_online_cpus() increments the refcount, whereas put_online_cpus() 
 decrements
 it. If we ever hit an imbalance between the two, we end up compromising the
 guarantees of the hotplug synchronization i.e, for example, an extra call to
 put_online_cpus() can end up allowing a hotplug reader to execute 
 concurrently with
 a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect such cases
 where the refcount can go negative.
 
 Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
 ---
 
  kernel/cpu.c |1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/kernel/cpu.c b/kernel/cpu.c
 index f560598..00d29bc 100644
 --- a/kernel/cpu.c
 +++ b/kernel/cpu.c
 @@ -80,6 +80,7 @@ void put_online_cpus(void)
   if (cpu_hotplug.active_writer == current)
   return;
   mutex_lock(cpu_hotplug.lock);
 + BUG_ON(cpu_hotplug.refcount == 0);
   if (!--cpu_hotplug.refcount  unlikely(cpu_hotplug.active_writer))
   wake_up_process(cpu_hotplug.active_writer);
   mutex_unlock(cpu_hotplug.lock);

I think calling BUG() here is a bit harsh.  We should only do that if
there's a risk to proceeding: a risk of data loss, a reduced ability to
analyse the underlying bug, etc.

But a cpu-hotplug locking imbalance is a really really really minor
problem!  So how about we emit a warning then try to fix things up? 
This should increase the chance that the machine will keep running and
so will increase the chance that a user will be able to report the bug
to us.


--- 
a/kernel/cpu.c~cpu-hotplug-debug-detect-imbalance-between-get_online_cpus-and-put_online_cpus-fix
+++ a/kernel/cpu.c
@@ -80,9 +80,12 @@ void put_online_cpus(void)
if (cpu_hotplug.active_writer == current)
return;
mutex_lock(cpu_hotplug.lock);
-   BUG_ON(cpu_hotplug.refcount == 0);
-   if (!--cpu_hotplug.refcount  unlikely(cpu_hotplug.active_writer))
-   wake_up_process(cpu_hotplug.active_writer);
+   if (!--cpu_hotplug.refcount) {
+   if (WARN_ON(cpu_hotplug.refcount == -1))
+   cpu_hotplug.refcount++; /* try to fix things up */
+   if (unlikely(cpu_hotplug.active_writer))
+   wake_up_process(cpu_hotplug.active_writer);
+   }
mutex_unlock(cpu_hotplug.lock);
 
 }
_

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/