Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-24 Thread Srivatsa S. Bhat
On 12/23/2012 10:12 PM, Oleg Nesterov wrote:
> On 12/23, Srivatsa S. Bhat wrote:
>>
>> On 12/20/2012 07:12 PM, Oleg Nesterov wrote:
>>>
>>> We need mb() + rmb(). Plus cli/sti unless this arch has optimized
>>> this_cpu_add() like x86 (as you pointed out).
>>>
>>
>> Hey, IIUC, we actually don't need mb() in the reader!! Just an rmb() will do.
> 
> Well. I don't think so. But when it comes to the barriers I am never sure
> until Paul confirms my understanding ;)
> 
>> #define reader_nested_percpu()					\
>> 	(__this_cpu_read(reader_percpu_refcnt) & READER_REFCNT_MASK)
>>
>> #define writer_active()						\
>> 	(__this_cpu_read(writer_signal))
>>
>>
>> #define READER_PRESENT   (1UL << 16)
>> #define READER_REFCNT_MASK   (READER_PRESENT - 1)
>>
>> void get_online_cpus_atomic(void)
>> {
>>  preempt_disable();
>>
>>  /*
>>   * First and foremost, make your presence known to the writer.
>>   */
>>  this_cpu_add(reader_percpu_refcnt, READER_PRESENT);
>>
>>  /*
>>   * If we are already using per-cpu refcounts, it is not safe to switch
>>   * the synchronization scheme. So continue using the refcounts.
>>   */
>>  if (reader_nested_percpu()) {
>>  this_cpu_inc(reader_percpu_refcnt);
>>  } else {
>>  smp_rmb();
>>  if (unlikely(writer_active())) {
>>  ... //take hotplug_rwlock
>>  }
>>  }
>>
>>  ...
>>
>>  /* Prevent reordering of any subsequent reads of cpu_online_mask. */
>>  smp_rmb();
>> }
>>
>> The smp_rmb() before writer_active() ensures that LOAD(writer_signal) follows
>> LOAD(reader_percpu_refcnt) (at the 'if' condition). And in turn, that load is
>> automatically going to follow the STORE(reader_percpu_refcnt)
> 
> But why should this STORE be visible on another CPU before we
> LOAD(writer_signal)?
> 
> Let's discuss a simple, artificial example. Suppose we have
> 
>   int X, Y;
> 
>   int func(void)
>   {
>   X = 1;  // suppose that nobody else can change it
>   mb();
>   return Y;
>   }
> 
> Now you are saying that we can change it and avoid the costly mb():
> 
>   int func(void)
>   {
>   X = 1;
> 
>   if (X != 1)
>   BUG();
>   
>   rmb();
>   return Y;
>   }
> 
> I doubt it. rmb() can only guarantee that the preceding LOADs are
> completed. Without mb() it is possible that this CPU won't write X to
> memory at all.
> 

Oh, ok :-( Thanks for correcting me and for the detailed explanation!
For a moment, I really thought we had it solved at last! ;-(

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-23 Thread Oleg Nesterov
On 12/23, Srivatsa S. Bhat wrote:
>
> On 12/20/2012 07:12 PM, Oleg Nesterov wrote:
> >
> > We need mb() + rmb(). Plus cli/sti unless this arch has optimized
> > this_cpu_add() like x86 (as you pointed out).
> >
>
> Hey, IIUC, we actually don't need mb() in the reader!! Just an rmb() will do.

Well. I don't think so. But when it comes to the barriers I am never sure
until Paul confirms my understanding ;)

> #define reader_nested_percpu()					\
> 	(__this_cpu_read(reader_percpu_refcnt) & READER_REFCNT_MASK)
>
> #define writer_active()						\
> 	(__this_cpu_read(writer_signal))
>
>
> #define READER_PRESENT(1UL << 16)
> #define READER_REFCNT_MASK(READER_PRESENT - 1)
>
> void get_online_cpus_atomic(void)
> {
>   preempt_disable();
>
>   /*
>* First and foremost, make your presence known to the writer.
>*/
>   this_cpu_add(reader_percpu_refcnt, READER_PRESENT);
>
>   /*
>* If we are already using per-cpu refcounts, it is not safe to switch
>* the synchronization scheme. So continue using the refcounts.
>*/
>   if (reader_nested_percpu()) {
>   this_cpu_inc(reader_percpu_refcnt);
>   } else {
>   smp_rmb();
>   if (unlikely(writer_active())) {
>   ... //take hotplug_rwlock
>   }
>   }
>
>   ...
>
>   /* Prevent reordering of any subsequent reads of cpu_online_mask. */
>   smp_rmb();
> }
>
> The smp_rmb() before writer_active() ensures that LOAD(writer_signal) follows
> LOAD(reader_percpu_refcnt) (at the 'if' condition). And in turn, that load is
> automatically going to follow the STORE(reader_percpu_refcnt)

But why should this STORE be visible on another CPU before we
LOAD(writer_signal)?

Let's discuss a simple, artificial example. Suppose we have

int X, Y;

int func(void)
{
X = 1;  // suppose that nobody else can change it
mb();
return Y;
}

Now you are saying that we can change it and avoid the costly mb():

int func(void)
{
X = 1;

if (X != 1)
BUG();

rmb();
return Y;
}

I doubt it. rmb() can only guarantee that the preceding LOADs are
completed. Without mb() it is possible that this CPU won't write X to
memory at all.

Oleg.



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-22 Thread Srivatsa S. Bhat
On 12/20/2012 07:12 PM, Oleg Nesterov wrote:
> On 12/20, Srivatsa S. Bhat wrote:
>>
>> On 12/20/2012 12:44 AM, Oleg Nesterov wrote:
>>>
>>> We need 2 helpers for writer, the 1st one does synchronize_sched() and the
>>> 2nd one takes rwlock. A generic percpu_write_lock() simply calls them both.
>>>
>>
>> Ah, that's the problem no? Users of reader-writer locks expect to run in
>> atomic context (ie., they don't want to sleep).
> 
> Ah, I misunderstood.
> 
> Sure, percpu_write_lock() should be might_sleep(), and this is not
> symmetric to percpu_read_lock().
> 
>> We can't expose an API that
>> can make the task go to sleep under the covers!
> 
> Why? Just this should be documented. However I would not worry until we
> find another user. Until then we do not even need to add percpu_write_lock
> or try to generalize this code too much.
> 
>>> To me, the main question is: can we use synchronize_sched() in cpu_down?
>>> It is slow.
>>>
>>
>> Haha :-) So we don't want smp_mb() in the reader,
> 
> We need mb() + rmb(). Plus cli/sti unless this arch has optimized
> this_cpu_add() like x86 (as you pointed out).
> 

Hey, IIUC, we actually don't need mb() in the reader!! Just an rmb() will do.

This is the reader code I have so far:

#define reader_nested_percpu()  \
 (__this_cpu_read(reader_percpu_refcnt) & READER_REFCNT_MASK)

#define writer_active() \
(__this_cpu_read(writer_signal))


#define READER_PRESENT  (1UL << 16)
#define READER_REFCNT_MASK  (READER_PRESENT - 1)

void get_online_cpus_atomic(void)
{
preempt_disable();

/*
 * First and foremost, make your presence known to the writer.
 */
this_cpu_add(reader_percpu_refcnt, READER_PRESENT);

/*
 * If we are already using per-cpu refcounts, it is not safe to switch
 * the synchronization scheme. So continue using the refcounts.
 */
if (reader_nested_percpu()) {
this_cpu_inc(reader_percpu_refcnt);
} else {
smp_rmb();
if (unlikely(writer_active())) {
... //take hotplug_rwlock
}
}

...

/* Prevent reordering of any subsequent reads of cpu_online_mask. */
smp_rmb();
}

The smp_rmb() before writer_active() ensures that LOAD(writer_signal) follows
LOAD(reader_percpu_refcnt) (at the 'if' condition). And in turn, that load is
automatically going to follow the STORE(reader_percpu_refcnt) (at the
this_cpu_add()) due to the data dependency. So it is something like a
transitive relation.

So the result is that we mark ourselves as active in reader_percpu_refcnt
before we check writer_signal. This is exactly what we wanted to do, right?
And luckily, due to the dependency, we can achieve it without using the heavy
smp_mb(). And we can't crib about the smp_rmb(), because it is unavoidable
anyway (we want to prevent reordering of the reads of cpu_online_mask, like
you pointed out earlier).

I hope I'm not missing anything...

Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-20 Thread Srivatsa S. Bhat
On 12/20/2012 07:12 PM, Oleg Nesterov wrote:
> On 12/20, Srivatsa S. Bhat wrote:
>>
>> On 12/20/2012 12:44 AM, Oleg Nesterov wrote:
>>>
>>> We need 2 helpers for writer, the 1st one does synchronize_sched() and the
>>> 2nd one takes rwlock. A generic percpu_write_lock() simply calls them both.
>>>
>>
>> Ah, that's the problem no? Users of reader-writer locks expect to run in
>> atomic context (ie., they don't want to sleep).
> 
> Ah, I misunderstood.
> 
> Sure, percpu_write_lock() should be might_sleep(), and this is not
> symmetric to percpu_read_lock().
> 
>> We can't expose an API that
>> can make the task go to sleep under the covers!
> 
> Why? Just this should be documented. However I would not worry until we
> find another user. Until then we do not even need to add percpu_write_lock
> or try to generalize this code too much.
> 

Hmm.. But considering the disable_nonboot_cpus() case you mentioned below, I'm
only getting farther away from using synchronize_sched() ;-) And that also makes
it easier to expose a generic percpu rwlock API, like Tejun was suggesting.
So I'll give it a shot.

>>> To me, the main question is: can we use synchronize_sched() in cpu_down?
>>> It is slow.
>>>
>>
>> Haha :-) So we don't want smp_mb() in the reader,
> 
> We need mb() + rmb(). Plus cli/sti unless this arch has optimized
> this_cpu_add() like x86 (as you pointed out).
> 
>> *and* also don't want
>> synchronize_sched() in the writer! Sounds like saying we want to have the
>> cake and eat it too ;-) :P
> 
> Personally I'd vote for synchronize_sched() but I am not sure. And I do
> not really understand the problem space.
> 
>> And moreover, since I'm still not convinced about the writer API part if use
>> synchronize_sched(), I'd rather avoid synchronize_sched().)
> 
> Understand.
> 
> And yes, synchronize_sched() adds more problems. For example, where should
> we call it? I do not think _cpu_down() should do this; in this case, say,
> disable_nonboot_cpus() needs num_online_cpus() synchronize_sched's.
> 

Ouch! I should have seen that coming!

> So probably cpu_down() should call it before cpu_maps_update_begin(), this
> makes the locking even less obvious.
> 

True.

> In short: what I am trying to say is, don't ask me, I do not know ;)
>

OK then, I'll go with what I believe is a reasonably good way (not necessarily
the best way) to deal with this:

I'll avoid the use of synchronize_sched(), expose a decent-looking percpu
rwlock implementation, use it in CPU hotplug and get rid of stop_machine().
That would certainly be a good starting base, IMHO.

Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-20 Thread Oleg Nesterov
On 12/20, Srivatsa S. Bhat wrote:
>
> On 12/20/2012 12:44 AM, Oleg Nesterov wrote:
> >
> > We need 2 helpers for writer, the 1st one does synchronize_sched() and the
> > 2nd one takes rwlock. A generic percpu_write_lock() simply calls them both.
> >
>
> Ah, that's the problem no? Users of reader-writer locks expect to run in
> atomic context (ie., they don't want to sleep).

Ah, I misunderstood.

Sure, percpu_write_lock() should be might_sleep(), and this is not
symmetric to percpu_read_lock().

> We can't expose an API that
> can make the task go to sleep under the covers!

Why? Just this should be documented. However I would not worry until we
find another user. Until then we do not even need to add percpu_write_lock
or try to generalize this code too much.

> > To me, the main question is: can we use synchronize_sched() in cpu_down?
> > It is slow.
> >
>
> Haha :-) So we don't want smp_mb() in the reader,

We need mb() + rmb(). Plus cli/sti unless this arch has optimized
this_cpu_add() like x86 (as you pointed out).

> *and* also don't want
> synchronize_sched() in the writer! Sounds like saying we want to have the cake
> and eat it too ;-) :P

Personally I'd vote for synchronize_sched() but I am not sure. And I do
not really understand the problem space.

> And moreover, since I'm still not convinced about the writer API part if use
> synchronize_sched(), I'd rather avoid synchronize_sched().)

Understand.

And yes, synchronize_sched() adds more problems. For example, where should
we call it? I do not think _cpu_down() should do this; in this case, say,
disable_nonboot_cpus() needs num_online_cpus() synchronize_sched's.

So probably cpu_down() should call it before cpu_maps_update_begin(), this
makes the locking even less obvious.

In short: what I am trying to say is, don't ask me, I do not know ;)

Oleg.



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-19 Thread Srivatsa S. Bhat
On 12/20/2012 12:44 AM, Oleg Nesterov wrote:
> On 12/19, Srivatsa S. Bhat wrote:
>>
>> BTW, there is a small problem with the synchronize_sched() approach:
>> We can't generalize the synchronization scheme as a generic percpu rwlock
>> because the writer's role is split into 2, the first part (the one having
>> synchronize_sched()) which should be run in process context, and the
>> second part (the rest of it), which can be run in atomic context.
> 
> Yes,
> 
>> That is, needing the writer to be able to sleep (due to synchronize_sched())
>> will make the locking scheme unusable (in other usecases) I think. And
>> the split (blocking and non-blocking part) itself seems hard to express
>> as a single writer API.
> 
> I do not think this is the problem.
> 
> We need 2 helpers for writer, the 1st one does synchronize_sched() and the
> 2nd one takes rwlock. A generic percpu_write_lock() simply calls them both.
>

Ah, that's the problem no? Users of reader-writer locks expect to run in
atomic context (ie., they don't want to sleep). We can't expose an API that
can make the task go to sleep under the covers!
(And that too, definitely not with a name such as "percpu_write_lock_irqsave()"
... because we are going to be interrupt-safe as well, in the second part...).
 
> In fact I think that the 1st helper should not do synchronize_sched(),
> and (say) cpu_hotplug_begin() should call it itself. Because if we also
> change cpu_hotplug_begin() to use percpu_rw_sem we do not want to do
> synchronize_sched() twice. But lets ignore this for now.
> 

Yeah, let's ignore this for now, else I'll get all confused ;-)

> But,
> 
>> Hmmm.. so what do we do? Shall we say "We anyway have to do smp_rmb() in the
>> reader in the fast path. Instead let's do a full smp_mb() and get rid of
>> the synchronize_sched() in the writer, and thereby expose this
>> synchronization scheme as generic percpu rwlocks?" Or should we rather use
>> synchronize_sched() and keep this synchronization scheme restricted to CPU
>> hotplug only?
> 
> Oh, I do not know ;)
> 
> To me, the main question is: can we use synchronize_sched() in cpu_down?
> It is slow.
>

Haha :-) So we don't want smp_mb() in the reader, *and* also don't want
synchronize_sched() in the writer! Sounds like saying we want to have the cake
and eat it too ;-) :P

But seriously speaking, I understand that it's a bit of a hard choice to make.
On one side, we can avoid synchronize_sched() at the writer, but have to bear
the burden of smp_mb() at the reader even in the fastpath, when no writer is
active. On the other side, we can make the reader avoid smp_mb(), but the
writer has to suffer a full synchronize_sched(). An important point, though,
is that at the writer side we already wait on so many mutex locks and such,
as in the CPU_DOWN_PREPARE stage (as determined by the callbacks). So adding
a synchronize_sched() to the mix shouldn't make it noticeably worse, should it?

(I'm not arguing in favor of using synchronize_sched(). I'm just trying to
point out that using it might not be as bad as we think it is, in the CPU
hotplug case. And moreover, since I'm still not convinced about the writer API
part if we use synchronize_sched(), I'd rather avoid it.)

Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-19 Thread Oleg Nesterov
On 12/19, Srivatsa S. Bhat wrote:
>
> BTW, there is a small problem with the synchronize_sched() approach:
> We can't generalize the synchronization scheme as a generic percpu rwlock
> because the writer's role is split into 2, the first part (the one having
> synchronize_sched()) which should be run in process context, and the
> second part (the rest of it), which can be run in atomic context.

Yes,

> That is, needing the writer to be able to sleep (due to synchronize_sched())
> will make the locking scheme unusable (in other usecases) I think. And
> the split (blocking and non-blocking part) itself seems hard to express
> as a single writer API.

I do not think this is the problem.

We need 2 helpers for writer, the 1st one does synchronize_sched() and the
2nd one takes rwlock. A generic percpu_write_lock() simply calls them both.

In fact I think that the 1st helper should not do synchronize_sched(),
and (say) cpu_hotplug_begin() should call it itself. Because if we also
change cpu_hotplug_begin() to use percpu_rw_sem we do not want to do
synchronize_sched() twice. But let's ignore this for now.

But,

> Hmmm.. so what do we do? Shall we say "We anyway have to do smp_rmb() in the
> reader in the fast path. Instead let's do a full smp_mb() and get rid of
> the synchronize_sched() in the writer, and thereby expose this synchronization
> scheme as generic percpu rwlocks?" Or should we rather use synchronize_sched()
> and keep this synchronization scheme restricted to CPU hotplug only?

Oh, I do not know ;)

To me, the main question is: can we use synchronize_sched() in cpu_down?
It is slow.

Oleg.



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-19 Thread Srivatsa S. Bhat
On 12/19/2012 10:09 PM, Oleg Nesterov wrote:
> (sorry if you see this email twice)
> 
> On 12/19, Srivatsa S. Bhat wrote:
>>
>> On 12/19/2012 01:13 AM, Oleg Nesterov wrote:
>>
>>>> I tried thinking about other ways to avoid that smp_mb() in the reader,
>>>
>>> Just in case, I think there is no way to avoid mb() in _get (although
>>> perhaps it can be "implicit").
>>>
>>
>> Actually, I was trying to avoid mb() in the _fastpath_, when there is no
>> active writer. I missed stating that clearly, sorry.
> 
> Yes, I meant the fastpath too,
> 
>>> The writer changes cpu_online_mask and drops the lock. We need to ensure
>>> that the reader sees the change in cpu_online_mask after _get returns.
>>>
>>
>> The write_unlock() will ensure the completion of the update to 
>> cpu_online_mask,
>> using the same semi-permeable logic that you pointed above. So readers will
>> see the update as soon as the writer releases the lock, right?
> 
> Why?
> 
> Once again, the writer (say) removes CPU_1 from cpu_online_mask, and
> sets writer_signal[0] = 0, this "enables" fast path on CPU_0.
> 
> But, without a barrier, there is no guarantee that CPU_0 will see the
> change in cpu_online_mask after get_online_cpus_atomic() checks
> writer_signal[0] and returns.
> 

Hmm, because a reader's code can get reordered to something like:

read cpu_online_mask

get_online_cpus_atomic()

You are right, I missed that.

> Hmm. And this means that the last version lacks the additional rmb()
> (at least) if writer_active() == F.
> 

Yes, the fast path needs that smp_rmb(), whereas the slow-path is fine,
because it takes the read_lock().
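[Editorial note: the fast-path hazard discussed here can be sketched in C11 atomics. This is a userspace illustration only; an acquire fence stands in for the smp_rmb(), and all names are made up.]

```c
#include <stdatomic.h>
#include <assert.h>

/* Sketch of the fast-path hazard: without a read barrier between the
 * writer_signal check and the later use of cpu_online_mask, the CPU may
 * hoist the mask load above the check -- effectively executing
 * "read cpu_online_mask; get_online_cpus_atomic()". */

static atomic_int  writer_signal;		/* 0: fast path enabled */
static atomic_uint cpu_online_mask = 4;		/* toy mask: only CPU 2 online */

static unsigned reader_snapshot_mask(void)
{
	if (atomic_load_explicit(&writer_signal, memory_order_relaxed)) {
		/* slow path: would take hotplug_rwlock here */
	}

	/* The smp_rmb(): keeps the mask load below from being reordered
	 * before the writer_signal check above. */
	atomic_thread_fence(memory_order_acquire);

	return atomic_load_explicit(&cpu_online_mask, memory_order_relaxed);
}
```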

BTW, there is a small problem with the synchronize_sched() approach:
We can't generalize the synchronization scheme as a generic percpu rwlock
because the writer's role is split into 2, the first part (the one having
synchronize_sched()) which should be run in process context, and the
second part (the rest of it), which can be run in atomic context.

That is, needing the writer to be able to sleep (due to synchronize_sched())
will make the locking scheme unusable (in other usecases) I think. And
the split (blocking and non-blocking part) itself seems hard to express
as a single writer API.

Hmmm.. so what do we do? Shall we say "We anyway have to do smp_rmb() in the
reader in the fast path. Instead let's do a full smp_mb() and get rid of
the synchronize_sched() in the writer, and thereby expose this synchronization
scheme as generic percpu rwlocks?" Or should we rather use synchronize_sched()
and keep this synchronization scheme restricted to CPU hotplug only?

Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-19 Thread Oleg Nesterov
(sorry if you see this email twice)

On 12/19, Srivatsa S. Bhat wrote:
>
> On 12/19/2012 01:13 AM, Oleg Nesterov wrote:
>
> >> I tried thinking about other ways to avoid that smp_mb() in the reader,
> >
> > Just in case, I think there is no way to avoid mb() in _get (although
> > perhaps it can be "implicit").
> >
>
> Actually, I was trying to avoid mb() in the _fastpath_, when there is no
> active writer. I missed stating that clearly, sorry.

Yes, I meant the fastpath too,

> > The writer changes cpu_online_mask and drops the lock. We need to ensure
> > that the reader sees the change in cpu_online_mask after _get returns.
> >
>
> The write_unlock() will ensure the completion of the update to 
> cpu_online_mask,
> using the same semi-permeable logic that you pointed above. So readers will
> see the update as soon as the writer releases the lock, right?

Why?

Once again, the writer (say) removes CPU_1 from cpu_online_mask, and
sets writer_signal[0] = 0, this "enables" fast path on CPU_0.

But, without a barrier, there is no guarantee that CPU_0 will see the
change in cpu_online_mask after get_online_cpus_atomic() checks
writer_signal[0] and returns.

Hmm. And this means that the last version lacks the additional rmb()
(at least) if writer_active() == F.

Oleg.



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-19 Thread Srivatsa S. Bhat
On 12/20/2012 12:44 AM, Oleg Nesterov wrote:
> On 12/19, Srivatsa S. Bhat wrote:
>>
>> BTW, there is a small problem with the synchronize_sched() approach:
>> We can't generalize the synchronization scheme as a generic percpu rwlock
>> because the writer's role is split into 2, the first part (the one having
>> synchronize_sched()) which should be run in process context, and the
>> second part (the rest of it), which can be run in atomic context.
> 
> Yes,
> 
>> That is, needing the writer to be able to sleep (due to synchronize_sched())
>> will make the locking scheme unusable (in other usecases) I think. And
>> the split (blocking and non-blocking part) itself seems hard to express
>> as a single writer API.
> 
> I do not think this is the problem.
> 
> We need 2 helpers for writer, the 1st one does synchronize_sched() and the
> 2nd one takes rwlock. A generic percpu_write_lock() simply calls them both.


Ah, that's the problem, no? Users of reader-writer locks expect to run in
atomic context (i.e., they don't want to sleep). We can't expose an API that
can make the task go to sleep under the covers!
(And that too, definitely not with a name such as percpu_write_lock_irqsave()
... because we are going to be interrupt-safe as well, in the second part...).
 
> In fact I think that the 1st helper should not do synchronize_sched(),
> and (say) cpu_hotplug_begin() should call it itself. Because if we also
> change cpu_hotplug_begin() to use percpu_rw_sem we do not want to do
> synchronize_sched() twice. But let's ignore this for now.
 

Yeah, let's ignore this for now, else I'll get all confused ;-)

> But,
> 
>> Hmmm.. so what do we do? Shall we say "We anyway have to do smp_rmb() in the
>> reader in the fast path. Instead let's do a full smp_mb() and get rid of
>> the synchronize_sched() in the writer, and thereby expose this
>> synchronization scheme as generic percpu rwlocks?" Or should we rather use
>> synchronize_sched() and keep this synchronization scheme restricted to CPU
>> hotplug only?
> 
> Oh, I do not know ;)
> 
> To me, the main question is: can we use synchronize_sched() in cpu_down?
> It is slow.


Haha :-) So we don't want smp_mb() in the reader, *and* also don't want
synchronize_sched() in the writer! Sounds like saying we want to have the cake
and eat it too ;-) :P

But seriously speaking, I understand that it's a bit of a hard choice to make.
On one side, we can avoid synchronize_sched() at the writer, but have to bear
the burden of smp_mb() at the reader even in the fastpath, when no writer is
active. On the other side, we can make the reader avoid smp_mb(), but the
writer has to suffer a full synchronize_sched(). But an important point is
that, at the writer side, we already wait for so many mutex locks and stuff,
like in the CPU_DOWN_PREPARE stage (as determined by the callbacks). So adding
a synchronize_sched() to the mix shouldn't make it noticeably bad, should it?

(I'm not arguing in favor of using synchronize_sched(). I'm just trying to point
out that using it might not be as bad as we think it is, in the CPU hotplug
case. And moreover, since I'm still not convinced about the writer API part if
we use synchronize_sched(), I'd rather avoid synchronize_sched().)

Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-18 Thread Srivatsa S. Bhat
On 12/19/2012 01:13 AM, Oleg Nesterov wrote:
> On 12/18, Srivatsa S. Bhat wrote:
>>
>> So now that we can't avoid disabling and enabling interrupts,
> 
> Still I think it would be better to not use local_irq_save/restore
> directly.

Sure, we can use this_cpu_add() itself. I explicitly used
local_irq_save/restore here just to explain my question.

> And,
> 
>> I was
>> wondering if we could exploit this to avoid the smp_mb()..
>>
>> Maybe this is a stupid question, but I'll shoot it anyway...
>> Does local_irq_disable()/enable provide any ordering guarantees by any 
>> chance?
> 
> Oh, I do not know.
> 
> But please look at the comment above prepare_to_wait(). It assumes
> that even spin_unlock_irqrestore() is not the full barrier.
> 

Semi-permeable barrier.. Hmm.. 
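[Editorial note: "semi-permeable" corresponds to release semantics in C11 terms. A toy model, purely illustrative: the lock is a plain flag, the names are made up, and this is not kernel code.]

```c
#include <stdatomic.h>
#include <assert.h>

/* An unlock is a release: accesses *before* it cannot be reordered past
 * it, but accesses after it may move up into the critical section. So
 * an unlock publishes the cpu_online_mask update to the *next acquirer*
 * of the lock; by itself it orders nothing on a CPU that never takes
 * the lock -- which is why the fast-path reader still needs a barrier. */

static atomic_uint cpu_online_mask = 7;	/* toy: CPUs 0-2 online */
static atomic_int  hotplug_lock;	/* 0 = unlocked (model only) */

static void writer_removes_cpu1(void)
{
	/* lock: acquire (uncontended in this single-threaded model) */
	while (atomic_exchange_explicit(&hotplug_lock, 1,
					memory_order_acquire))
		;

	atomic_fetch_and_explicit(&cpu_online_mask, ~2u,
				  memory_order_relaxed);

	/* unlock: release -- the mask update above cannot sink below
	 * this store, but later accesses may float above it */
	atomic_store_explicit(&hotplug_lock, 0, memory_order_release);
}
```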

> In any case. get_online_cpus_atomic() has to use irq_restore, not
> irq_enable. And _restore does nothing "special" if irqs were already
> disabled, so I think we can't rely on sti.
> 

Right, I forgot about the _restore part.

>> I tried thinking about other ways to avoid that smp_mb() in the reader,
> 
> Just in case, I think there is no way to avoid mb() in _get (although
> perhaps it can be "implicit").
> 

Actually, I was trying to avoid mb() in the _fastpath_, when there is no
active writer. I missed stating that clearly, sorry.

> The writer changes cpu_online_mask and drops the lock. We need to ensure
> that the reader sees the change in cpu_online_mask after _get returns.
> 

The write_unlock() will ensure the completion of the update to cpu_online_mask,
using the same semi-permeable logic that you pointed above. So readers will
see the update as soon as the writer releases the lock, right?

>> but was unsuccessful. So if the above assumption is wrong, I guess we'll
>> just have to go with the version that uses synchronize_sched() at the
>> writer-side.
> 
> In this case we can also convert get_online_cpus() to use percpu_rwsem
> and avoid mutex_lock(_hotplug.lock), but this is minor I guess.
> I do not think get_online_cpus() is called too often.
> 

Yes, we could do that as well. I remember you saying that you had some
patches for percpu_rwsem to help use it in cpu hotplug code (to make it
recursive, IIRC).

So, I guess we'll go with the synchronize_sched() approach for percpu rwlocks
then. Tejun, it is still worthwhile to expose this as a generic percpu rwlock
and then use it inside cpu hotplug code, right?


Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-18 Thread Oleg Nesterov
On 12/18, Srivatsa S. Bhat wrote:
>
> So now that we can't avoid disabling and enabling interrupts,

Still I think it would be better to not use local_irq_save/restore
directly. And,

> I was
> wondering if we could exploit this to avoid the smp_mb()..
>
> Maybe this is a stupid question, but I'll shoot it anyway...
> Does local_irq_disable()/enable provide any ordering guarantees by any chance?

Oh, I do not know.

But please look at the comment above prepare_to_wait(). It assumes
that even spin_unlock_irqrestore() is not the full barrier.

In any case. get_online_cpus_atomic() has to use irq_restore, not
irq_enable. And _restore does nothing "special" if irqs were already
disabled, so I think we can't rely on sti.

> I tried thinking about other ways to avoid that smp_mb() in the reader,

Just in case, I think there is no way to avoid mb() in _get (although
perhaps it can be "implicit").

The writer changes cpu_online_mask and drops the lock. We need to ensure
that the reader sees the change in cpu_online_mask after _get returns.

> but was unsuccessful. So if the above assumption is wrong, I guess we'll
> just have to go with the version that uses synchronize_sched() at the
> writer-side.

In this case we can also convert get_online_cpus() to use percpu_rwsem
and avoid mutex_lock(_hotplug.lock), but this is minor I guess.
I do not think get_online_cpus() is called too often.

Oleg.



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-18 Thread Srivatsa S. Bhat
On 12/14/2012 11:33 PM, Oleg Nesterov wrote:
> On 12/13, Srivatsa S. Bhat wrote:
>>
>> On 12/13/2012 09:47 PM, Oleg Nesterov wrote:
>>> On 12/13, Srivatsa S. Bhat wrote:

>>>> On 12/13/2012 12:42 AM, Srivatsa S. Bhat wrote:
>>>>>
>>>>> Even I don't spot anything wrong with it. But I'll give it some more
>>>>> thought..
>>>>
>>>> Since an interrupt handler can also run get_online_cpus_atomic(), we
>>>> cannot use the __this_cpu_* versions for modifying reader_percpu_refcnt,
>>>> right?
>>>
>>> Hmm. I thought that __this_cpu_* must be safe under preempt_disable().
>>> IOW, I thought that, say, this_cpu_inc() is "equal" to preempt_disable +
>>> __this_cpu_inc() correctness-wise.
>>>
>>> And. I thought that this_cpu_inc() is safe wrt interrupt, like local_t.
>>>
>>> But when I try to read the comments percpu.h, I am starting to think that
>>> even this_cpu_inc() is not safe if irq handler can do the same?
>>>
>>
>> The comment seems to say that its not safe wrt interrupts. But looking at
>> the code in include/linux/percpu.h, IIUC, that is true only about
>> this_cpu_read() because it only disables preemption.
>>
>> However, this_cpu_inc() looks safe wrt interrupts because it wraps the
>> increment within raw_local_irqsave()/restore().
> 
> You mean _this_cpu_generic_to_op() I guess. So yes, I think you are right,
> this_cpu_* should be irq-safe, but __this_cpu_* is not.
> 

Yes.

> Thanks.
> 
> At least on x86 there is no difference between this_ and __this_, both do
> percpu_add_op() without local_irq_disable/enable. But it seems that most
> of architectures use generic code.
> 

So now that we can't avoid disabling and enabling interrupts, I was
wondering if we could exploit this to avoid the smp_mb()..

Maybe this is a stupid question, but I'll shoot it anyway...
Does local_irq_disable()/enable provide any ordering guarantees by any chance?
I think the answer is no, but if it is yes, I guess we can do as shown
below to ensure that STORE(reader_percpu_refcnt) happens before
LOAD(writer_signal).

void get_online_cpus_atomic(void)
{
	unsigned long flags;

	preempt_disable();

	//only for writer
	local_irq_save(flags);
	__this_cpu_add(reader_percpu_refcnt, READER_PRESENT);
	local_irq_restore(flags);

	//no need of an explicit smp_mb()

	if (__this_cpu_read(reader_percpu_refcnt) & MASK) {
		this_cpu_inc(reader_percpu_refcnt);
	} else if (writer_active()) {
		...
	}

	this_cpu_sub(reader_percpu_refcnt, READER_PRESENT);
}
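[Editorial note: the ordering being sought here is the store-then-load case, STORE(reader_percpu_refcnt) before LOAD(writer_signal). A Dekker-style userspace sketch in C11 atomics; a seq_cst fence plays the role of smp_mb(), and the names are illustrative, not the kernel's.]

```c
#include <stdatomic.h>
#include <assert.h>

/* Store->load is the one ordering that only a full barrier provides
 * (smp_mb() in the kernel, a seq_cst fence here); acquire/release
 * fences are not enough -- which is why it matters whether
 * local_irq_disable()/enable happens to order anything. */

static atomic_uint reader_percpu_refcnt;
static atomic_int  writer_signal;

static int reader_checks_writer(void)
{
	/* Publish our presence to the writer... */
	atomic_fetch_add_explicit(&reader_percpu_refcnt, 1,
				  memory_order_relaxed);

	/* ...and make that store globally visible before looking at
	 * writer_signal. Without this full fence the CPU may perform
	 * the load first, so reader and writer can each miss the
	 * other's store. */
	atomic_thread_fence(memory_order_seq_cst);

	return atomic_load_explicit(&writer_signal, memory_order_relaxed);
}
```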

I tried thinking about other ways to avoid that smp_mb() in the reader,
but was unsuccessful. So if the above assumption is wrong, I guess we'll
just have to go with the version that uses synchronize_sched() at the
writer-side.

Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-14 Thread Oleg Nesterov
On 12/13, Srivatsa S. Bhat wrote:
>
> On 12/13/2012 09:47 PM, Oleg Nesterov wrote:
> > On 12/13, Srivatsa S. Bhat wrote:
> >>
> >> On 12/13/2012 12:42 AM, Srivatsa S. Bhat wrote:
> >>>
> >>> Even I don't spot anything wrong with it. But I'll give it some more
> >>> thought..
> >>
> >> Since an interrupt handler can also run get_online_cpus_atomic(), we
> >> cannot use the __this_cpu_* versions for modifying reader_percpu_refcnt,
> >> right?
> >
> > Hmm. I thought that __this_cpu_* must be safe under preempt_disable().
> > IOW, I thought that, say, this_cpu_inc() is "equal" to preempt_disable +
> > __this_cpu_inc() correctness-wise.
> >
> > And. I thought that this_cpu_inc() is safe wrt interrupt, like local_t.
> >
> > But when I try to read the comments percpu.h, I am starting to think that
> > even this_cpu_inc() is not safe if irq handler can do the same?
> >
>
> The comment seems to say that its not safe wrt interrupts. But looking at
> the code in include/linux/percpu.h, IIUC, that is true only about
> this_cpu_read() because it only disables preemption.
>
> However, this_cpu_inc() looks safe wrt interrupts because it wraps the
> increment within raw_local_irqsave()/restore().

You mean _this_cpu_generic_to_op() I guess. So yes, I think you are right,
this_cpu_* should be irq-safe, but __this_cpu_* is not.

Thanks.

At least on x86 there is no difference between this_ and __this_, both do
percpu_add_op() without local_irq_disable/enable. But it seems that most
of architectures use generic code.

Oleg.



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-13 Thread Srivatsa S. Bhat
On 12/13/2012 09:47 PM, Oleg Nesterov wrote:
> On 12/13, Srivatsa S. Bhat wrote:
>>
>> On 12/13/2012 12:42 AM, Srivatsa S. Bhat wrote:
>>>
>>> Even I don't spot anything wrong with it. But I'll give it some more
>>> thought..
>>
>> Since an interrupt handler can also run get_online_cpus_atomic(), we
>> cannot use the __this_cpu_* versions for modifying reader_percpu_refcnt,
>> right?
> 
> Hmm. I thought that __this_cpu_* must be safe under preempt_disable().
> IOW, I thought that, say, this_cpu_inc() is "equal" to preempt_disable +
> __this_cpu_inc() correctness-wise.
>
> And. I thought that this_cpu_inc() is safe wrt interrupt, like local_t.
> 
> But when I try to read the comments percpu.h, I am starting to think that
> even this_cpu_inc() is not safe if irq handler can do the same?
> 

The comment seems to say that it's not safe wrt interrupts. But looking at
the code in include/linux/percpu.h, IIUC, that is true only about
this_cpu_read() because it only disables preemption.

However, this_cpu_inc() looks safe wrt interrupts because it wraps the
increment within raw_local_irq_save()/restore().

> Confused...
> 
> I am shy to ask... will, say, DEFINE_PER_CPU(local_t) and
> local_inc(__this_cpu_ptr(...)) work??
> 
>> But still, this scheme is better, because the reader doesn't have to spin
>> on the read_lock() with interrupts disabled.
> 
> Yes, but my main concern is that irq_disable/enable itself is not that cheap.
> 

Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-13 Thread Tejun Heo
Hello, Oleg.

On Thu, Dec 13, 2012 at 05:17:09PM +0100, Oleg Nesterov wrote:
> Hmm. I thought that __this_cpu_* must be safe under preempt_disable().
> IOW, I thought that, say, this_cpu_inc() is "equal" to preempt_disable +
> __this_cpu_inc() correctness-wise.

this_cpu_inc() equals local_irq_save() + __this_cpu_inc().

> And. I thought that this_cpu_inc() is safe wrt interrupt, like local_t.

Yes, it is safe.

> But when I try to read the comments percpu.h, I am starting to think that
> even this_cpu_inc() is not safe if irq handler can do the same?
> 
> Confused...

Yeah, the comment is confusing and the way these macros are defined
doesn't help.  There used to be three variants and it looks like we
didn't update the comment while removing the preempt safe ones.  Gotta
clean those up.  Anyways, yes, this_cpu_*() are safe against irqs.

Thanks.

-- 
tejun


Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-13 Thread Oleg Nesterov
On 12/13, Srivatsa S. Bhat wrote:
>
> On 12/13/2012 12:42 AM, Srivatsa S. Bhat wrote:
> >
> > Even I don't spot anything wrong with it. But I'll give it some more
> > thought..
>
> Since an interrupt handler can also run get_online_cpus_atomic(), we
> cannot use the __this_cpu_* versions for modifying reader_percpu_refcnt,
> right?

Hmm. I thought that __this_cpu_* must be safe under preempt_disable().
IOW, I thought that, say, this_cpu_inc() is "equal" to preempt_disable +
__this_cpu_inc() correctness-wise.

And. I thought that this_cpu_inc() is safe wrt interrupt, like local_t.

But when I try to read the comments percpu.h, I am starting to think that
even this_cpu_inc() is not safe if irq handler can do the same?

Confused...

I am shy to ask... will, say, DEFINE_PER_CPU(local_t) and
local_inc(__this_cpu_ptr(...)) work??

> But still, this scheme is better, because the reader doesn't have to spin
> on the read_lock() with interrupts disabled.

Yes, but my main concern is that irq_disable/enable itself is not that cheap.

Oleg.



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-13 Thread Srivatsa S. Bhat
On 12/13/2012 12:42 AM, Srivatsa S. Bhat wrote:
> On 12/13/2012 12:18 AM, Oleg Nesterov wrote:
>> On 12/13, Srivatsa S. Bhat wrote:
>>>
>>> On 12/12/2012 11:32 PM, Oleg Nesterov wrote:
>>>> And _perhaps_ get_ can avoid it too?
>>>>
>>>> I didn't really try to think, probably this is not right, but can't
>>>> something like this work?
>>>>
>>>> #define (1 << 16)
>>>> #define MASK( -1)
>>>>
>>>> void get_online_cpus_atomic(void)
>>>> {
>>>> preempt_disable();
>>>>
>>>> // only for writer
>>>> __this_cpu_add(reader_percpu_refcnt, );
>>>>
>>>> if (__this_cpu_read(reader_percpu_refcnt) & MASK) {
>>>> __this_cpu_inc(reader_percpu_refcnt);
>>>> } else {
>>>> smp_wmb();
>>>> if (writer_active()) {
>>>> ...
>>>> }
>>>> }
>>>>
>>>> __this_cpu_dec(reader_percpu_refcnt, );
>>>> }
>>>>
>>>
>>> Sorry, may be I'm too blind to see, but I didn't understand the logic
>>> of how the mask helps us avoid disabling interrupts..
>>
>> Why do we need cli/sti at all? We should prevent the following race:
>>
>>  - the writer already holds hotplug_rwlock, so get_ must not
>>succeed.
>>
>>  - the new reader comes, it increments reader_percpu_refcnt,
>>but before it checks writer_active() ...
>>
>>  - irq handler does get_online_cpus_atomic() and sees
>>reader_nested_percpu() == T, so it simply increments
>>reader_percpu_refcnt and succeeds.
>>
>> OTOH, why do we need to increment reader_percpu_refcnt the counter
>> in advance? To ensure that either we see writer_active() or the
>> writer should see reader_percpu_refcnt != 0 (and that is why they
>> should write/read in reverse order).
>>
>> The code above tries to avoid this race using the lower 16 bits
>> as a "nested-counter", and the upper bits to avoid the race with
>> the writer.
>>
>>  // only for writer
>>  __this_cpu_add(reader_percpu_refcnt, );
>>
>> If irq comes and does get_online_cpus_atomic(), it won't be confused
>> by __this_cpu_add(), it will check the lower bits and switch to
>> the "slow path".
>>
> 
> This is a very clever scheme indeed! :-) Thanks a lot for explaining
> it in detail.
> 
>>
>> But once again, so far I didn't really try to think. It is quite
>> possible I missed something.
>>
> 
> Even I don't spot anything wrong with it. But I'll give it some more
> thought..

Since an interrupt handler can also run get_online_cpus_atomic(), we
cannot use the __this_cpu_* versions for modifying reader_percpu_refcnt,
right?

To maintain the integrity of the update itself, we will have to use the
this_cpu_* variant, which basically plays spoil-sport on this whole
scheme... :-(

But still, this scheme is better, because the reader doesn't have to spin
on the read_lock() with interrupts disabled. That way, interrupt handlers
that are not hotplug readers can continue to run on this CPU while taking
another CPU offline.

Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-12 Thread Oleg Nesterov
On 12/13, Srivatsa S. Bhat wrote:
>
> On 12/13/2012 01:06 AM, Oleg Nesterov wrote:
> >
> > But perhaps there is another reason to make it per-cpu...

Actually this is not the reason, please see below. But let me repeat,
it is not that I suggest to remove "per-cpu".

> > It seems we can avoid cpu_hotplug.active_writer == current check in
> > get/put.
> >
> > take_cpu_down() can clear this_cpu(writer_signal) right after it takes
> > hotplug_rwlock for writing. It runs with irqs and preemption disabled,
> > nobody else will ever look at writer_signal on its CPU.
> >
>
> Hmm.. And then the get/put_ on that CPU will increment/decrement the per-cpu
> refcount, but we don't care.. because we only need to ensure that they don't
> deadlock by taking the rwlock for read.

Yes, but...

Probably it would be more clean to simply do this_cpu_inc(reader_percpu_refcnt)
after write_lock(hotplug_rwlock). This will have the same effect for get/put,
and we still can make writer_signal global (if we want).

And note that this will also simplify the lockdep annotations which we (imho)
should add later.

Ignoring all complications get_online_cpus_atomic() does:

if (this_cpu_read(reader_percpu_refcnt))
this_cpu_inc(reader_percpu_refcnt);
else if (!writer_signal)
this_cpu_inc(reader_percpu_refcnt); // same as above
else
read_lock(&hotplug_rwlock);

But for lockdep it should do:

if (this_cpu_read(reader_percpu_refcnt))
this_cpu_inc(reader_percpu_refcnt);
else if (!writer_signal) {
this_cpu_inc(reader_percpu_refcnt);
// pretend we take hotplug_rwlock for lockdep
rwlock_acquire_read(&hotplug_rwlock.dep_map, 0, 0);
}
else
read_lock(&hotplug_rwlock);

And we need to ensure that rwlock_acquire_read() is not called under
write_lock(hotplug_rwlock).

If we use reader_percpu_refcnt to fool get/put, we should not worry.

Oleg.



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-12 Thread Srivatsa S. Bhat
On 12/13/2012 01:06 AM, Oleg Nesterov wrote:
> On 12/12, Oleg Nesterov wrote:
>>
>> On 12/12, Srivatsa S. Bhat wrote:
>>>
>>> On 12/12/2012 10:47 PM, Oleg Nesterov wrote:

 Why it needs to be per-cpu? It can be global and __read_mostly to avoid
 the false-sharing. OK, perhaps to put reader_percpu_refcnt/writer_signal
 into a single cacheline...
>>>
>>> Even I realized this (that we could use a global) after posting out the
>>> series.. But do you think that it would be better to retain the per-cpu
>>> variant itself, due to the cache effects?
>>
>> I don't really know, up to you. This was the question ;)
> 
> But perhaps there is another reason to make it per-cpu...
> 
> It seems we can avoid cpu_hotplug.active_writer == current check in
> get/put.
> 
> take_cpu_down() can clear this_cpu(writer_signal) right after it takes
> hotplug_rwlock for writing. It runs with irqs and preemption disabled,
> nobody else will ever look at writer_signal on its CPU.
> 

Hmm.. And then the get/put_ on that CPU will increment/decrement the per-cpu
refcount, but we don't care.. because we only need to ensure that they don't
deadlock by taking the rwlock for read.

This sounds great!

Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-12 Thread Oleg Nesterov
On 12/12, Oleg Nesterov wrote:
>
> On 12/12, Srivatsa S. Bhat wrote:
> >
> > On 12/12/2012 10:47 PM, Oleg Nesterov wrote:
> > >
> > > Why it needs to be per-cpu? It can be global and __read_mostly to avoid
> > > the false-sharing. OK, perhaps to put reader_percpu_refcnt/writer_signal
> > > into a single cacheline...
> >
> > Even I realized this (that we could use a global) after posting out the
> > series.. But do you think that it would be better to retain the per-cpu
> > variant itself, due to the cache effects?
>
> I don't really know, up to you. This was the question ;)

But perhaps there is another reason to make it per-cpu...

It seems we can avoid cpu_hotplug.active_writer == current check in
get/put.

take_cpu_down() can clear this_cpu(writer_signal) right after it takes
hotplug_rwlock for writing. It runs with irqs and preemption disabled,
nobody else will ever look at writer_signal on its CPU.

Oleg.



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-12 Thread Srivatsa S. Bhat
On 12/13/2012 12:18 AM, Oleg Nesterov wrote:
> On 12/13, Srivatsa S. Bhat wrote:
>>
>> On 12/12/2012 11:32 PM, Oleg Nesterov wrote:
>>> And _perhaps_ get_ can avoid it too?
>>>
>>> I didn't really try to think, probably this is not right, but can't
>>> something like this work?
>>>
>>> #define (1 << 16)
>>> #define MASK( -1)
>>>
>>> void get_online_cpus_atomic(void)
>>> {
>>> preempt_disable();
>>>
>>> // only for writer
>>> __this_cpu_add(reader_percpu_refcnt, );
>>>
>>> if (__this_cpu_read(reader_percpu_refcnt) & MASK) {
>>> __this_cpu_inc(reader_percpu_refcnt);
>>> } else {
>>> smp_wmb();
>>> if (writer_active()) {
>>> ...
>>> }
>>> }
>>>
>>> __this_cpu_dec(reader_percpu_refcnt, );
>>> }
>>>
>>
>> Sorry, may be I'm too blind to see, but I didn't understand the logic
>> of how the mask helps us avoid disabling interrupts..
> 
> Why do we need cli/sti at all? We should prevent the following race:
> 
>   - the writer already holds hotplug_rwlock, so get_ must not
> succeed.
> 
>   - the new reader comes, it increments reader_percpu_refcnt,
> but before it checks writer_active() ...
> 
>   - irq handler does get_online_cpus_atomic() and sees
> reader_nested_percpu() == T, so it simply increments
> reader_percpu_refcnt and succeeds.
> 
> OTOH, why do we need to increment reader_percpu_refcnt the counter
> in advance? To ensure that either we see writer_active() or the
> writer should see reader_percpu_refcnt != 0 (and that is why they
> should write/read in reverse order).
> 
> The code above tries to avoid this race using the lower 16 bits
> as a "nested-counter", and the upper bits to avoid the race with
> the writer.
> 
>   // only for writer
>   __this_cpu_add(reader_percpu_refcnt, );
> 
> If irq comes and does get_online_cpus_atomic(), it won't be confused
> by __this_cpu_add(), it will check the lower bits and switch to
> the "slow path".
> 

This is a very clever scheme indeed! :-) Thanks a lot for explaining
it in detail.

> 
> But once again, so far I didn't really try to think. It is quite
> possible I missed something.
> 

Even I don't spot anything wrong with it. But I'll give it some more
thought..

Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-12 Thread Oleg Nesterov
On 12/13, Srivatsa S. Bhat wrote:
>
> On 12/12/2012 11:32 PM, Oleg Nesterov wrote:
> > And _perhaps_ get_ can avoid it too?
> >
> > I didn't really try to think, probably this is not right, but can't
> > something like this work?
> >
> > #define (1 << 16)
> > #define MASK( -1)
> >
> > void get_online_cpus_atomic(void)
> > {
> > preempt_disable();
> >
> > // only for writer
> > __this_cpu_add(reader_percpu_refcnt, );
> >
> > if (__this_cpu_read(reader_percpu_refcnt) & MASK) {
> > __this_cpu_inc(reader_percpu_refcnt);
> > } else {
> > smp_wmb();
> > if (writer_active()) {
> > ...
> > }
> > }
> >
> > __this_cpu_dec(reader_percpu_refcnt, );
> > }
> >
>
> Sorry, may be I'm too blind to see, but I didn't understand the logic
> of how the mask helps us avoid disabling interrupts..

Why do we need cli/sti at all? We should prevent the following race:

- the writer already holds hotplug_rwlock, so get_ must not
  succeed.

- the new reader comes, it increments reader_percpu_refcnt,
  but before it checks writer_active() ...

- irq handler does get_online_cpus_atomic() and sees
  reader_nested_percpu() == T, so it simply increments
  reader_percpu_refcnt and succeeds.

OTOH, why do we need to increment reader_percpu_refcnt the counter
in advance? To ensure that either we see writer_active() or the
writer should see reader_percpu_refcnt != 0 (and that is why they
should write/read in reverse order).

The code above tries to avoid this race using the lower 16 bits
as a "nested-counter", and the upper bits to avoid the race with
the writer.

// only for writer
__this_cpu_add(reader_percpu_refcnt, );

If irq comes and does get_online_cpus_atomic(), it won't be confused
by __this_cpu_add(), it will check the lower bits and switch to
the "slow path".


But once again, so far I didn't really try to think. It is quite
possible I missed something.

Oleg.



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-12 Thread Srivatsa S. Bhat
On 12/12/2012 11:53 PM, Oleg Nesterov wrote:
> On 12/12, Srivatsa S. Bhat wrote:
>>
>> On 12/12/2012 10:54 PM, Oleg Nesterov wrote:
>>
>>> And when I look at get_online_cpus_atomic() again it uses rmb(). This
>>> doesn't look correct, we need the full barrier between this_cpu_inc()
>>> and writer_active().
>>
>> Hmm..
>>
>>> At the same time reader_nested_percpu() can be checked before mb().
>>
>> I thought that since the increment and the check (reader_nested_percpu)
>> act on the same memory location, they will naturally be run in the given
>> order, without any need for barriers. Am I wrong?
> 
> And this is what I meant, you do not need a barrier before
> reader_nested_percpu().
> 

Ah, ok!

> But you need to ensure that WRITE(reader_percpu_refcnt) and 
> READ(writer_signal)
> can't be reordered, so you need mb() in between. rmb() can serialize LOADs and
> STOREs.
> 

OK, got it. (I know you meant s/can/can't).

I'm trying to see if we can somehow exploit the fact that the writer can
potentially tolerate if a reader ignores his signal (to switch to rwlocks)
for a while... and use this to get rid of barriers in the reader path
(without using synchronize_sched() at the writer, of course). And perhaps
also take advantage of the fact that the read_lock() acts as a one-way
barrier..

I don't know, maybe it's not possible after all.. :-/
Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-12 Thread Srivatsa S. Bhat
On 12/12/2012 11:32 PM, Oleg Nesterov wrote:
> On 12/12, Srivatsa S. Bhat wrote:
>>
>> On 12/12/2012 10:47 PM, Oleg Nesterov wrote:
>>>
>>> Why it needs to be per-cpu? It can be global and __read_mostly to avoid
>>> the false-sharing. OK, perhaps to put reader_percpu_refcnt/writer_signal
>>> into a single cacheline...
>>
>> Even I realized this (that we could use a global) after posting out the
>> series.. But do you think that it would be better to retain the per-cpu
>> variant itself, due to the cache effects?
> 
> I don't really know, up to you. This was the question ;)

OK :-)

> 
>>> Do we really need local_irq_save/restore in put_ ?
>>>
>>
>> Hmm.. good point! I don't think we need it.
> 
> And _perhaps_ get_ can avoid it too?
> 
> I didn't really try to think, probably this is not right, but can't
> something like this work?
> 
>   #define (1 << 16)
>   #define MASK( -1)
> 
>   void get_online_cpus_atomic(void)
>   {
>   preempt_disable();
> 
>   // only for writer
>   __this_cpu_add(reader_percpu_refcnt, );
> 
>   if (__this_cpu_read(reader_percpu_refcnt) & MASK) {
>   __this_cpu_inc(reader_percpu_refcnt);
>   } else {
>   smp_wmb();
>   if (writer_active()) {
>   ...
>   }
>   }
> 
>   __this_cpu_dec(reader_percpu_refcnt, );
>   }
> 

Sorry, may be I'm too blind to see, but I didn't understand the logic
of how the mask helps us avoid disabling interrupts.. Can you kindly
elaborate?

>   void put_online_cpus_atomic(void)
>   {
>   if (__this_cpu_read(reader_percpu_refcnt) & MASK)
>   __this_cpu_dec(reader_percpu_refcnt);
>   else
>   read_unlock(&hotplug_rwlock);
>   preempt_enable();
>   }
> 

Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-12 Thread Oleg Nesterov
On 12/12, Srivatsa S. Bhat wrote:
>
> On 12/12/2012 10:54 PM, Oleg Nesterov wrote:
>
> > And when I look at get_online_cpus_atomic() again it uses rmb(). This
> > doesn't look correct, we need the full barrier between this_cpu_inc()
> > and writer_active().
>
> Hmm..
>
> > At the same time reader_nested_percpu() can be checked before mb().
>
> I thought that since the increment and the check (reader_nested_percpu)
> act on the same memory location, they will naturally be run in the given
> order, without any need for barriers. Am I wrong?

And this is what I meant, you do not need a barrier before
reader_nested_percpu().

But you need to ensure that WRITE(reader_percpu_refcnt) and READ(writer_signal)
can't be reordered, so you need mb() in between. rmb() can serialize LOADs and
STOREs.

Or I misunderstood?

Oleg.



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-12 Thread Srivatsa S. Bhat
On 12/12/2012 10:54 PM, Oleg Nesterov wrote:
> On 12/12, Oleg Nesterov wrote:
>>
>> On 12/11, Srivatsa S. Bhat wrote:
>>>
>>> IOW, the hotplug readers just increment/decrement their per-cpu refcounts
>>> when no writer is active.
>>
>> plus cli/sti ;) and increment/decrement are atomic.
>   ^
> 
> OOPS, sorry I was going to say "adds mb()".
> 

Ok, got it :)

> And when I look at get_online_cpus_atomic() again it uses rmb(). This
> doesn't look correct, we need the full barrier between this_cpu_inc()
> and writer_active().
> 

Hmm..

> At the same time reader_nested_percpu() can be checked before mb().
> 

I thought that since the increment and the check (reader_nested_percpu)
act on the same memory location, they will naturally be run in the given
order, without any need for barriers. Am I wrong?

(I referred Documentation/memory-barriers.txt again to verify this, and
the second point under the "Guarantees" section looked like it said the
same thing : "Overlapping loads and stores within a particular CPU will
appear to be ordered within that CPU").

Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-12 Thread Oleg Nesterov
On 12/12, Srivatsa S. Bhat wrote:
>
> On 12/12/2012 10:47 PM, Oleg Nesterov wrote:
> >
> > Why it needs to be per-cpu? It can be global and __read_mostly to avoid
> > the false-sharing. OK, perhaps to put reader_percpu_refcnt/writer_signal
> > into a single cacheline...
>
> Even I realized this (that we could use a global) after posting out the
> series.. But do you think that it would be better to retain the per-cpu
> variant itself, due to the cache effects?

I don't really know, up to you. This was the question ;)

> > Do we really need local_irq_save/restore in put_ ?
> >
>
> Hmm.. good point! I don't think we need it.

And _perhaps_ get_ can avoid it too?

I didn't really try to think, probably this is not right, but can't
something like this work?

#define READER_PRESENT	(1 << 16)
#define MASK		(READER_PRESENT - 1)

void get_online_cpus_atomic(void)
{
preempt_disable();

// only for writer
__this_cpu_add(reader_percpu_refcnt, READER_PRESENT);

if (__this_cpu_read(reader_percpu_refcnt) & MASK) {
__this_cpu_inc(reader_percpu_refcnt);
} else {
smp_wmb();
if (writer_active()) {
...
}
}

__this_cpu_sub(reader_percpu_refcnt, READER_PRESENT);
}

void put_online_cpus_atomic(void)
{
if (__this_cpu_read(reader_percpu_refcnt) & MASK)
__this_cpu_dec(reader_percpu_refcnt);
else
read_unlock(&hotplug_rwlock);
preempt_enable();
}

Oleg.



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-12 Thread Srivatsa S. Bhat
On 12/12/2012 10:47 PM, Oleg Nesterov wrote:
> On 12/11, Srivatsa S. Bhat wrote:
>>
>> IOW, the hotplug readers just increment/decrement their per-cpu refcounts
>> when no writer is active.
> 
> plus cli/sti ;)

Of course, forgot to mention it, again! :)

> and increment/decrement are atomic.
> 
> At first glance looks correct to me, but I'll try to read it carefully
> later.
> 
> A couple of minor nits,
> 
>> +static DEFINE_PER_CPU(bool, writer_signal);
> 
> Why it needs to be per-cpu? It can be global and __read_mostly to avoid
> the false-sharing. OK, perhaps to put reader_percpu_refcnt/writer_signal
> into a single cacheline...
> 

Even I realized this (that we could use a global) after posting out the
series.. But do you think that it would be better to retain the per-cpu
variant itself, due to the cache effects?

>> +void get_online_cpus_atomic(void)
>> +{
>> +unsigned long flags;
>> +
>> +preempt_disable();
>> +
>> +if (cpu_hotplug.active_writer == current)
>> +return;
>> +
>> +local_irq_save(flags);
> 
> Yes... this is still needed, we are going to increment reader_percpu_refcnt
> unconditionally and this makes reader_nested_percpu() == T.
> 
> But,
> 
>> +void put_online_cpus_atomic(void)
>> +{
>> +unsigned long flags;
>> +
>> +if (cpu_hotplug.active_writer == current)
>> +goto out;
>> +
>> +local_irq_save(flags);
>> +
>> +/*
>> + * We never allow heterogeneous nesting of readers. So it is trivial
>> + * to find out the kind of reader we are, and undo the operation
>> + * done by our corresponding get_online_cpus_atomic().
>> + */
>> +if (__this_cpu_read(reader_percpu_refcnt))
>> +__this_cpu_dec(reader_percpu_refcnt);
>> +else
>> +read_unlock(&hotplug_rwlock);
>> +
>> +local_irq_restore(flags);
>> +out:
>> +preempt_enable();
>> +}
> 
> Do we really need local_irq_save/restore in put_ ?
>

Hmm.. good point! I don't think we need it.
 

Regards,
Srivatsa S. Bhat




Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-12 Thread Oleg Nesterov
On 12/12, Oleg Nesterov wrote:
>
> On 12/11, Srivatsa S. Bhat wrote:
> >
> > IOW, the hotplug readers just increment/decrement their per-cpu refcounts
> > when no writer is active.
>
> plus cli/sti ;) and increment/decrement are atomic.
  ^

OOPS, sorry I was going to say "adds mb()".

And when I look at get_online_cpus_atomic() again it uses rmb(). This
doesn't look correct, we need the full barrier between this_cpu_inc()
and writer_active().

At the same time reader_nested_percpu() can be checked before mb().

Oleg.



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-12 Thread Oleg Nesterov
On 12/11, Srivatsa S. Bhat wrote:
>
> IOW, the hotplug readers just increment/decrement their per-cpu refcounts
> when no writer is active.

plus cli/sti ;) and increment/decrement are atomic.

At first glance looks correct to me, but I'll try to read it carefully
later.

A couple of minor nits,

> +static DEFINE_PER_CPU(bool, writer_signal);

Why it needs to be per-cpu? It can be global and __read_mostly to avoid
the false-sharing. OK, perhaps to put reader_percpu_refcnt/writer_signal
into a single cacheline...

> +void get_online_cpus_atomic(void)
> +{
> + unsigned long flags;
> +
> + preempt_disable();
> +
> + if (cpu_hotplug.active_writer == current)
> + return;
> +
> + local_irq_save(flags);

Yes... this is still needed, we are going to increment reader_percpu_refcnt
unconditionally and this makes reader_nested_percpu() == T.

But,

> +void put_online_cpus_atomic(void)
> +{
> + unsigned long flags;
> +
> + if (cpu_hotplug.active_writer == current)
> + goto out;
> +
> + local_irq_save(flags);
> +
> + /*
> +  * We never allow heterogeneous nesting of readers. So it is trivial
> +  * to find out the kind of reader we are, and undo the operation
> +  * done by our corresponding get_online_cpus_atomic().
> +  */
> + if (__this_cpu_read(reader_percpu_refcnt))
> + __this_cpu_dec(reader_percpu_refcnt);
> + else
> + read_unlock(&hotplug_rwlock);
> +
> + local_irq_restore(flags);
> +out:
> + preempt_enable();
> +}

Do we really need local_irq_save/restore in put_ ?

Oleg.



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-12 Thread Srivatsa S. Bhat
On 12/12/2012 11:53 PM, Oleg Nesterov wrote:
> On 12/12, Srivatsa S. Bhat wrote:
>>
>> On 12/12/2012 10:54 PM, Oleg Nesterov wrote:
>>
>>> And when I look at get_online_cpus_atomic() again it uses rmb(). This
>>> doesn't look correct, we need the full barrier between this_cpu_inc()
>>> and writer_active().
>>
>> Hmm..
>>
>>> At the same time reader_nested_percpu() can be checked before mb().
>>
>> I thought that since the increment and the check (reader_nested_percpu)
>> act on the same memory location, they will naturally be run in the given
>> order, without any need for barriers. Am I wrong?
>
> And this is what I meant, you do not need a barrier before
> reader_nested_percpu().
>

Ah, ok!

> But you need to ensure that WRITE(reader_percpu_refcnt) and READ(writer_signal)
> can't be reordered, so you need mb() in between. rmb() can serialize LOADs and
> STOREs.
>

OK, got it. (I know you meant s/can/can't).

I'm trying to see if we can somehow exploit the fact that the writer can
potentially tolerate if a reader ignores his signal (to switch to rwlocks)
for a while... and use this to get rid of barriers in the reader path (without
using synchronize_sched() at the writer, of course). And perhaps also take 
advantage
of the fact that the read_lock() acts as a one-way barrier..

I don't know, maybe its not possible after all.. :-/
 
Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-12 Thread Oleg Nesterov
On 12/13, Srivatsa S. Bhat wrote:

> On 12/12/2012 11:32 PM, Oleg Nesterov wrote:
>> And _perhaps_ get_ can avoid it too?
>>
>> I didn't really try to think, probably this is not right, but can't
>> something like this work?
>>
>>	#define READER_PRESENT	(1 << 16)
>>	#define MASK		(READER_PRESENT - 1)
>>
>>	void get_online_cpus_atomic(void)
>>	{
>>		preempt_disable();
>>
>>		// only for writer
>>		__this_cpu_add(reader_percpu_refcnt, READER_PRESENT);
>>
>>		if (__this_cpu_read(reader_percpu_refcnt) & MASK) {
>>			__this_cpu_inc(reader_percpu_refcnt);
>>		} else {
>>			smp_wmb();
>>			if (writer_active()) {
>>				...
>>			}
>>		}
>>
>>		__this_cpu_sub(reader_percpu_refcnt, READER_PRESENT);
>>	}
>>
>
> Sorry, may be I'm too blind to see, but I didn't understand the logic
> of how the mask helps us avoid disabling interrupts..

Why do we need cli/sti at all? We should prevent the following race:

- the writer already holds hotplug_rwlock, so get_ must not
  succeed.

- the new reader comes, it increments reader_percpu_refcnt,
  but before it checks writer_active() ...

- irq handler does get_online_cpus_atomic() and sees
  reader_nested_percpu() == T, so it simply increments
  reader_percpu_refcnt and succeeds.

OTOH, why do we need to increment reader_percpu_refcnt the counter
in advance? To ensure that either we see writer_active() or the
writer should see reader_percpu_refcnt != 0 (and that is why they
should write/read in reverse order).

The code above tries to avoid this race using the lower 16 bits
as a nested-counter, and the upper bits to avoid the race with
the writer.

	// only for writer
	__this_cpu_add(reader_percpu_refcnt, READER_PRESENT);

If irq comes and does get_online_cpus_atomic(), it won't be confused
by __this_cpu_add(), it will check the lower bits and switch to
the slow path.


But once again, so far I didn't really try to think. It is quite
possible I missed something.

Oleg.



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-12 Thread Srivatsa S. Bhat
On 12/13/2012 12:18 AM, Oleg Nesterov wrote:
> On 12/13, Srivatsa S. Bhat wrote:
>>
>> On 12/12/2012 11:32 PM, Oleg Nesterov wrote:
>>> And _perhaps_ get_ can avoid it too?
>>>
>>> I didn't really try to think, probably this is not right, but can't
>>> something like this work?
>>>
>>>	#define READER_PRESENT	(1 << 16)
>>>	#define MASK		(READER_PRESENT - 1)
>>>
>>>	void get_online_cpus_atomic(void)
>>>	{
>>>		preempt_disable();
>>>
>>>		// only for writer
>>>		__this_cpu_add(reader_percpu_refcnt, READER_PRESENT);
>>>
>>>		if (__this_cpu_read(reader_percpu_refcnt) & MASK) {
>>>			__this_cpu_inc(reader_percpu_refcnt);
>>>		} else {
>>>			smp_wmb();
>>>			if (writer_active()) {
>>>				...
>>>			}
>>>		}
>>>
>>>		__this_cpu_sub(reader_percpu_refcnt, READER_PRESENT);
>>>	}
>>>
>>
>> Sorry, may be I'm too blind to see, but I didn't understand the logic
>> of how the mask helps us avoid disabling interrupts..
>
> Why do we need cli/sti at all? We should prevent the following race:
>
> 	- the writer already holds hotplug_rwlock, so get_ must not
> 	  succeed.
>
> 	- the new reader comes, it increments reader_percpu_refcnt,
> 	  but before it checks writer_active() ...
>
> 	- irq handler does get_online_cpus_atomic() and sees
> 	  reader_nested_percpu() == T, so it simply increments
> 	  reader_percpu_refcnt and succeeds.
>
> OTOH, why do we need to increment reader_percpu_refcnt the counter
> in advance? To ensure that either we see writer_active() or the
> writer should see reader_percpu_refcnt != 0 (and that is why they
> should write/read in reverse order).
>
> The code above tries to avoid this race using the lower 16 bits
> as a nested-counter, and the upper bits to avoid the race with
> the writer.
>
> 	// only for writer
> 	__this_cpu_add(reader_percpu_refcnt, READER_PRESENT);
>
> If irq comes and does get_online_cpus_atomic(), it won't be confused
> by __this_cpu_add(), it will check the lower bits and switch to
> the slow path.
>

This is a very clever scheme indeed! :-) Thanks a lot for explaining
it in detail.

 
 But once again, so far I didn't really try to think. It is quite
 possible I missed something.
 

Even I don't spot anything wrong with it. But I'll give it some more
thought..

Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-12 Thread Oleg Nesterov
On 12/12, Oleg Nesterov wrote:

> On 12/12, Srivatsa S. Bhat wrote:
>>
>> On 12/12/2012 10:47 PM, Oleg Nesterov wrote:
>>>
>>> Why it needs to be per-cpu? It can be global and __read_mostly to avoid
>>> the false-sharing. OK, perhaps to put reader_percpu_refcnt/writer_signal
>>> into a single cacheline...
>>
>> Even I realized this (that we could use a global) after posting out the
>> series.. But do you think that it would be better to retain the per-cpu
>> variant itself, due to the cache effects?
>
> I don't really know, up to you. This was the question ;)

But perhaps there is another reason to make it per-cpu...

It seems we can avoid cpu_hotplug.active_writer == current check in
get/put.

take_cpu_down() can clear this_cpu(writer_signal) right after it takes
hotplug_rwlock for writing. It runs with irqs and preemption disabled,
nobody else will ever look at writer_signal on its CPU.

Oleg.



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-12 Thread Srivatsa S. Bhat
On 12/13/2012 01:06 AM, Oleg Nesterov wrote:
> On 12/12, Oleg Nesterov wrote:
>>
>> On 12/12, Srivatsa S. Bhat wrote:
>>>
>>> On 12/12/2012 10:47 PM, Oleg Nesterov wrote:
>>>>
>>>> Why it needs to be per-cpu? It can be global and __read_mostly to avoid
>>>> the false-sharing. OK, perhaps to put reader_percpu_refcnt/writer_signal
>>>> into a single cacheline...
>>>
>>> Even I realized this (that we could use a global) after posting out the
>>> series.. But do you think that it would be better to retain the per-cpu
>>> variant itself, due to the cache effects?
>>
>> I don't really know, up to you. This was the question ;)
>
> But perhaps there is another reason to make it per-cpu...
>
> It seems we can avoid cpu_hotplug.active_writer == current check in
> get/put.
>
> take_cpu_down() can clear this_cpu(writer_signal) right after it takes
> hotplug_rwlock for writing. It runs with irqs and preemption disabled,
> nobody else will ever look at writer_signal on its CPU.
>

Hmm.. And then the get/put_ on that CPU will increment/decrement the per-cpu
refcount, but we don't care.. because we only need to ensure that they don't
deadlock by taking the rwlock for read.

This sounds great!

Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-12 Thread Oleg Nesterov
On 12/13, Srivatsa S. Bhat wrote:

> On 12/13/2012 01:06 AM, Oleg Nesterov wrote:
>>
>> But perhaps there is another reason to make it per-cpu...

Actually this is not the reason, please see below. But let me repeat,
it is not that I suggest to remove per-cpu.

>> It seems we can avoid cpu_hotplug.active_writer == current check in
>> get/put.
>>
>> take_cpu_down() can clear this_cpu(writer_signal) right after it takes
>> hotplug_rwlock for writing. It runs with irqs and preemption disabled,
>> nobody else will ever look at writer_signal on its CPU.
>>
>
> Hmm.. And then the get/put_ on that CPU will increment/decrement the per-cpu
> refcount, but we don't care.. because we only need to ensure that they don't
> deadlock by taking the rwlock for read.

Yes, but...

Probably it would be more clean to simply do this_cpu_inc(reader_percpu_refcnt)
after write_lock(hotplug_rwlock). This will have the same effect for get/put,
and we still can make writer_signal global (if we want).

And note that this will also simplify the lockdep annotations which we (imho)
should add later.

Ignoring all complications get_online_cpus_atomic() does:

	if (this_cpu_read(reader_percpu_refcnt))
		this_cpu_inc(reader_percpu_refcnt);
	else if (!writer_signal)
		this_cpu_inc(reader_percpu_refcnt);	// same as above
	else
		read_lock(&hotplug_rwlock);

But for lockdep it should do:

	if (this_cpu_read(reader_percpu_refcnt))
		this_cpu_inc(reader_percpu_refcnt);
	else if (!writer_signal) {
		this_cpu_inc(reader_percpu_refcnt);
		// pretend we take hotplug_rwlock for lockdep
		rwlock_acquire_read(&hotplug_rwlock.dep_map, 0, 0);
	}
	else
		read_lock(&hotplug_rwlock);

And we need to ensure that rwlock_acquire_read() is not called under
write_lock(hotplug_rwlock).

If we use reader_percpu_refcnt to fool get/put, we should not worry.

Oleg.
