Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command

2017-10-08 Thread Andrea Parri
On Fri, Oct 06, 2017 at 10:32:19AM +0200, Peter Zijlstra wrote:
> > AFAIU the scheduler rq->lock is held while preemption is disabled.
> > synchronize_sched() is used here to ensure that all pre-existing
> > preempt-off critical sections have completed.
> > 
> > So saying that we use synchronize_sched() to synchronize with rq->lock
> > would be stretching the truth a bit. It's actually only true because the
> > scheduler holding the rq->lock is surrounded by a preempt-off
> > critical section.
> 
> No, rq->lock is sufficient, note that rq->lock is a raw_spinlock_t which
> implies !preempt. Yes, we also surround the rq->lock usage with a
> slightly larger preempt_disable() section but that's not in fact
> required for this.

That is what the current sources indicate: we seemed to agree that what we
rely on here is a preempt-off critical section, and that the start of this
critical section is not marked by that raw_spin_lock.

  Andrea


Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command

2017-10-08 Thread Andrea Parri
On Fri, Oct 6, 2017 at 12:02 AM, Andrea Parri  wrote:
> On Thu, Oct 05, 2017 at 04:02:06PM +, Mathieu Desnoyers wrote:
>> - On Oct 5, 2017, at 8:12 AM, Peter Zijlstra pet...@infradead.org wrote:
>>
>> > On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote:
>> >> diff --git a/arch/powerpc/kernel/membarrier.c b/arch/powerpc/kernel/membarrier.c
>> >> new file mode 100644
>> >> index ..b0d79a5f5981
>> >> --- /dev/null
>> >> +++ b/arch/powerpc/kernel/membarrier.c
>> >> @@ -0,0 +1,45 @@
>> >
>> >> +void membarrier_arch_register_private_expedited(struct task_struct *p)
>> >> +{
>> >> +  struct task_struct *t;
>> >> +
>> >> +  if (get_nr_threads(p) == 1) {
>> >> +  set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED);
>> >> +  return;
>> >> +  }
>> >> +  /*
>> >> +   * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
>> >> +   * fork is protected by siglock.
>> >> +   */
>> >> +  spin_lock(&p->sighand->siglock);
>> >> +  for_each_thread(p, t)
>> >> +  set_ti_thread_flag(task_thread_info(t),
>> >> +  TIF_MEMBARRIER_PRIVATE_EXPEDITED);
>> >
>> > I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD.
>>
>> The intent here is to hold the sighand siglock to provide mutual
>> exclusion against invocation of membarrier_fork(p, clone_flags)
>> by copy_process().
>>
>> copy_process() grabs spin_lock(&current->sighand->siglock) for both
>> CLONE_THREAD and not CLONE_THREAD flags.
>>
>> What am I missing here ?
>>
>> >
>> >> +  spin_unlock(&p->sighand->siglock);
>> >> +  /*
>> >> +   * Ensure all future scheduler executions will observe the new
>> >> +   * thread flag state for this process.
>> >> +   */
>> >> +  synchronize_sched();
>> >
>> > This relies on the flag being read inside rq->lock, right?
>>
>> Yes. The flag is read by membarrier_arch_switch_mm(), invoked
>> within switch_mm_irqs_off(), called by context_switch() before
>> finish_task_switch() releases the rq lock.
>
> I fail to graps the relation between this synchronize_sched() and rq->lock.

s/graps/grasp

  Andrea


>
> (Besides, we made no reference to rq->lock while discussing:
>
>   
> https://github.com/paulmckrcu/litmus/commit/47039df324b60ace0cf7b2403299580be730119b
>   replace membarrier_arch_sched_in with switch_mm_irqs_off )
>
> Could you elaborate?
>
>   Andrea
>
>
>>
>> Is the comment clear enough, or do you have suggestions for
>> improvements ?
>
>
>
>>
>> Thanks,
>>
>> Mathieu
>>
>> >
>> > > +}
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com


Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command

2017-10-08 Thread Andrea Parri
On Thu, Oct 05, 2017 at 04:02:06PM +, Mathieu Desnoyers wrote:
> - On Oct 5, 2017, at 8:12 AM, Peter Zijlstra pet...@infradead.org wrote:
> 
> > On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote:
> >> diff --git a/arch/powerpc/kernel/membarrier.c b/arch/powerpc/kernel/membarrier.c
> >> new file mode 100644
> >> index ..b0d79a5f5981
> >> --- /dev/null
> >> +++ b/arch/powerpc/kernel/membarrier.c
> >> @@ -0,0 +1,45 @@
> > 
> >> +void membarrier_arch_register_private_expedited(struct task_struct *p)
> >> +{
> >> +  struct task_struct *t;
> >> +
> >> +  if (get_nr_threads(p) == 1) {
> >> +  set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED);
> >> +  return;
> >> +  }
> >> +  /*
> >> +   * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
> >> +   * fork is protected by siglock.
> >> +   */
> >> +  spin_lock(&p->sighand->siglock);
> >> +  for_each_thread(p, t)
> >> +  set_ti_thread_flag(task_thread_info(t),
> >> +  TIF_MEMBARRIER_PRIVATE_EXPEDITED);
> > 
> > I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD.
> 
> The intent here is to hold the sighand siglock to provide mutual
> exclusion against invocation of membarrier_fork(p, clone_flags)
> by copy_process().
> 
> copy_process() grabs spin_lock(&current->sighand->siglock) for both
> CLONE_THREAD and not CLONE_THREAD flags.
> 
> What am I missing here ?
> 
> > 
> >> +  spin_unlock(&p->sighand->siglock);
> >> +  /*
> >> +   * Ensure all future scheduler executions will observe the new
> >> +   * thread flag state for this process.
> >> +   */
> >> +  synchronize_sched();
> > 
> > This relies on the flag being read inside rq->lock, right?
> 
> Yes. The flag is read by membarrier_arch_switch_mm(), invoked
> within switch_mm_irqs_off(), called by context_switch() before
> finish_task_switch() releases the rq lock.

I fail to grasp the relation between this synchronize_sched() and rq->lock.

(Besides, we made no reference to rq->lock while discussing:

  
https://github.com/paulmckrcu/litmus/commit/47039df324b60ace0cf7b2403299580be730119b
  replace membarrier_arch_sched_in with switch_mm_irqs_off )

Could you elaborate?

  Andrea


> 
> Is the comment clear enough, or do you have suggestions for
> improvements ?



> 
> Thanks,
> 
> Mathieu
> 
> > 
> > > +}
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com


Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command

2017-10-08 Thread Avi Kivity



On 10/05/2017 07:23 AM, Nicholas Piggin wrote:

On Wed,  4 Oct 2017 14:37:53 -0700
"Paul E. McKenney"  wrote:


From: Mathieu Desnoyers 

Provide a new command allowing processes to register their intent to use
the private expedited command.

This allows PowerPC to skip the full memory barrier in switch_mm(), and
only issue the barrier when scheduling into a task belonging to a
process that has registered to use expedited private.

Processes are now required to register before using
MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM.
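[Editor's sketch of the register-then-issue flow described above, from
userspace. The command names/values are assumptions taken from the proposed
uapi and should be checked against linux/membarrier.h on the target kernel;
the helper degrades gracefully where the syscall is absent.]

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed uapi values; verify against your linux/membarrier.h. */
#define MEMBARRIER_CMD_QUERY                        0
#define MEMBARRIER_CMD_PRIVATE_EXPEDITED            (1 << 3)
#define MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED   (1 << 4)

static int membarrier(int cmd, int flags)
{
#ifdef __NR_membarrier
	return syscall(__NR_membarrier, cmd, flags);
#else
	errno = ENOSYS;
	return -1;
#endif
}

/*
 * Register the process's intent, then issue the expedited private
 * barrier. Returns 0 on success, -1 if the kernel does not provide
 * the command. Per the changelog, issuing
 * MEMBARRIER_CMD_PRIVATE_EXPEDITED without the prior registration
 * would fail with EPERM.
 */
static int private_expedited_barrier(void)
{
	int mask = membarrier(MEMBARRIER_CMD_QUERY, 0);

	if (mask < 0 || !(mask & MEMBARRIER_CMD_PRIVATE_EXPEDITED))
		return -1;	/* command not provided by this kernel */
	if (membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0))
		return -1;
	return membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0);
}
```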

Changes since v1:
- Use test_ti_thread_flag(next, ...) instead of test_thread_flag() in
   powerpc membarrier_arch_sched_in(), given that we want to specifically
   check the next thread state.
- Add missing ARCH_HAS_MEMBARRIER_HOOKS in Kconfig.
- Use task_thread_info() to pass thread_info from task to
   *_ti_thread_flag().

Changes since v2:
- Move membarrier_arch_sched_in() call to finish_task_switch().
- Check for NULL t->mm in membarrier_arch_fork().
- Use membarrier_sched_in() in generic code, which invokes the
   arch-specific membarrier_arch_sched_in(). This fixes allnoconfig
   build on PowerPC.
- Move asm/membarrier.h include under CONFIG_MEMBARRIER, fixing
   allnoconfig build on PowerPC.
- Build and runtime tested on PowerPC.

Changes since v3:
- Simply rely on copy_mm() to copy the membarrier_private_expedited mm
   field on fork.
- powerpc: test thread flag instead of reading
   membarrier_private_expedited in membarrier_arch_fork().
- powerpc: skip memory barrier in membarrier_arch_sched_in() if coming
   from kernel thread, since mmdrop() implies a full barrier.
- Set membarrier_private_expedited to 1 only after arch registration
   code, thus eliminating a race where concurrent commands could succeed
   when they should fail if issued concurrently with process
   registration.
- Use READ_ONCE() for membarrier_private_expedited field access in
   membarrier_private_expedited. Matches WRITE_ONCE() performed in
   process registration.

Changes since v4:
- Move powerpc hook from sched_in() to switch_mm(), based on feedback
   from Nicholas Piggin.

For now, the powerpc approach is okay by me. I plan to test
others (e.g., taking runqueue locks) on larger systems, but that can
be sent as an incremental patch at a later time.

The main thing I would like is for people to review the userspace API.



As a future satisfied user of the expedited private membarrier syscall, 
I am happy with the change.




Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command

2017-10-06 Thread Peter Zijlstra
> AFAIU the scheduler rq->lock is held while preemption is disabled.
> synchronize_sched() is used here to ensure that all pre-existing
> preempt-off critical sections have completed.
> 
> So saying that we use synchronize_sched() to synchronize with rq->lock
> would be stretching the truth a bit. It's actually only true because the
> scheduler holding the rq->lock is surrounded by a preempt-off
> critical section.

No, rq->lock is sufficient, note that rq->lock is a raw_spinlock_t which
implies !preempt. Yes, we also surround the rq->lock usage with a
slightly larger preempt_disable() section but that's not in fact
required for this.


Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command

2017-10-05 Thread Steven Rostedt
On Thu, 5 Oct 2017 22:19:15 + (UTC)
Mathieu Desnoyers  wrote:

> AFAIU the scheduler rq->lock is held while preemption is disabled.
> synchronize_sched() is used here to ensure that all pre-existing
> preempt-off critical sections have completed.
> 
> So saying that we use synchronize_sched() to synchronize with rq->lock
> would be stretching the truth a bit. It's actually only true because the
> scheduler holding the rq->lock is surrounded by a preempt-off
> critical section.

Not only is preemption disabled (which is true for all spinlocks, at
least with non PREEMPT_RT), but the rq->lock must also only be taken
with interrupts disabled (for PREEMPT_RT as well). Because it can also
be taken in interrupt context.

-- Steve


Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command

2017-10-05 Thread Mathieu Desnoyers
- On Oct 5, 2017, at 6:02 PM, Andrea Parri parri.and...@gmail.com wrote:

> On Thu, Oct 05, 2017 at 04:02:06PM +, Mathieu Desnoyers wrote:
>> - On Oct 5, 2017, at 8:12 AM, Peter Zijlstra pet...@infradead.org wrote:
>> 
>> > On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote:
>> >> diff --git a/arch/powerpc/kernel/membarrier.c b/arch/powerpc/kernel/membarrier.c
>> >> new file mode 100644
>> >> index ..b0d79a5f5981
>> >> --- /dev/null
>> >> +++ b/arch/powerpc/kernel/membarrier.c
>> >> @@ -0,0 +1,45 @@
>> > 
>> >> +void membarrier_arch_register_private_expedited(struct task_struct *p)
>> >> +{
>> >> + struct task_struct *t;
>> >> +
>> >> + if (get_nr_threads(p) == 1) {
>> >> + set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED);
>> >> + return;
>> >> + }
>> >> + /*
>> >> +  * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
>> >> +  * fork is protected by siglock.
>> >> +  */
>> >> + spin_lock(&p->sighand->siglock);
>> >> + for_each_thread(p, t)
>> >> + set_ti_thread_flag(task_thread_info(t),
>> >> + TIF_MEMBARRIER_PRIVATE_EXPEDITED);
>> > 
>> > I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD.
>> 
>> The intent here is to hold the sighand siglock to provide mutual
>> exclusion against invocation of membarrier_fork(p, clone_flags)
>> by copy_process().
>> 
>> copy_process() grabs spin_lock(&current->sighand->siglock) for both
>> CLONE_THREAD and not CLONE_THREAD flags.
>> 
>> What am I missing here ?
>> 
>> > 
>> >> + spin_unlock(&p->sighand->siglock);
>> >> + /*
>> >> +  * Ensure all future scheduler executions will observe the new
>> >> +  * thread flag state for this process.
>> >> +  */
>> >> + synchronize_sched();
>> > 
>> > This relies on the flag being read inside rq->lock, right?
>> 
>> Yes. The flag is read by membarrier_arch_switch_mm(), invoked
>> within switch_mm_irqs_off(), called by context_switch() before
>> finish_task_switch() releases the rq lock.
> 
> I fail to grasp the relation between this synchronize_sched() and rq->lock.
> 
> (Besides, we made no reference to rq->lock while discussing:
> 
>  
> https://github.com/paulmckrcu/litmus/commit/47039df324b60ace0cf7b2403299580be730119b
>  replace membarrier_arch_sched_in with switch_mm_irqs_off )
> 
> Could you elaborate?

Hi Andrea,

AFAIU the scheduler rq->lock is held while preemption is disabled.
synchronize_sched() is used here to ensure that all pre-existing
preempt-off critical sections have completed.

So saying that we use synchronize_sched() to synchronize with rq->lock
would be stretching the truth a bit. It's actually only true because the
scheduler holding the rq->lock is surrounded by a preempt-off
critical section.

Thanks,

Mathieu



> 
>  Andrea
> 
> 
>> 
>> Is the comment clear enough, or do you have suggestions for
>> improvements ?
> 
> 
> 
>> 
>> Thanks,
>> 
>> Mathieu
>> 
>> > 
>> > > +}
>> 
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
> > http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command

2017-10-05 Thread Mathieu Desnoyers
- On Oct 5, 2017, at 12:21 PM, Peter Zijlstra pet...@infradead.org wrote:

> On Thu, Oct 05, 2017 at 04:02:06PM +, Mathieu Desnoyers wrote:
>> - On Oct 5, 2017, at 8:12 AM, Peter Zijlstra pet...@infradead.org wrote:
>> 
>> > On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote:
>> >> diff --git a/arch/powerpc/kernel/membarrier.c b/arch/powerpc/kernel/membarrier.c
>> >> new file mode 100644
>> >> index ..b0d79a5f5981
>> >> --- /dev/null
>> >> +++ b/arch/powerpc/kernel/membarrier.c
>> >> @@ -0,0 +1,45 @@
>> > 
>> >> +void membarrier_arch_register_private_expedited(struct task_struct *p)
>> >> +{
>> >> + struct task_struct *t;
>> >> +
>> >> + if (get_nr_threads(p) == 1) {
>> >> + set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED);
>> >> + return;
>> >> + }
>> >> + /*
>> >> +  * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
>> >> +  * fork is protected by siglock.
>> >> +  */
>> >> + spin_lock(&p->sighand->siglock);
>> >> + for_each_thread(p, t)
>> >> + set_ti_thread_flag(task_thread_info(t),
>> >> + TIF_MEMBARRIER_PRIVATE_EXPEDITED);
>> > 
>> > I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD.
>> 
>> The intent here is to hold the sighand siglock to provide mutual
>> exclusion against invocation of membarrier_fork(p, clone_flags)
>> by copy_process().
>> 
>> copy_process() grabs spin_lock(&current->sighand->siglock) for both
>> CLONE_THREAD and not CLONE_THREAD flags.
>> 
>> What am I missing here ?
> 
> If you do CLONE_VM without CLONE_THREAD you'll end up sharing the mm but
> you'll not be part of thread_head, so the for_each_thread() iteration
> will not find the task.

Excellent point. Please see the follow up RFC patch I posted taking care of
this matter.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command

2017-10-05 Thread Peter Zijlstra
On Thu, Oct 05, 2017 at 04:02:06PM +, Mathieu Desnoyers wrote:
> - On Oct 5, 2017, at 8:12 AM, Peter Zijlstra pet...@infradead.org wrote:
> 
> > On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote:
> >> diff --git a/arch/powerpc/kernel/membarrier.c b/arch/powerpc/kernel/membarrier.c
> >> new file mode 100644
> >> index ..b0d79a5f5981
> >> --- /dev/null
> >> +++ b/arch/powerpc/kernel/membarrier.c
> >> @@ -0,0 +1,45 @@
> > 
> >> +void membarrier_arch_register_private_expedited(struct task_struct *p)
> >> +{
> >> +  struct task_struct *t;
> >> +
> >> +  if (get_nr_threads(p) == 1) {
> >> +  set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED);
> >> +  return;
> >> +  }
> >> +  /*
> >> +   * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
> >> +   * fork is protected by siglock.
> >> +   */
> >> +  spin_lock(&p->sighand->siglock);
> >> +  for_each_thread(p, t)
> >> +  set_ti_thread_flag(task_thread_info(t),
> >> +  TIF_MEMBARRIER_PRIVATE_EXPEDITED);
> > 
> > I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD.
> 
> The intent here is to hold the sighand siglock to provide mutual
> exclusion against invocation of membarrier_fork(p, clone_flags)
> by copy_process().
> 
> copy_process() grabs spin_lock(&current->sighand->siglock) for both
> CLONE_THREAD and not CLONE_THREAD flags.
> 
> What am I missing here ?

If you do CLONE_VM without CLONE_THREAD you'll end up sharing the mm but
you'll not be part of thread_head, so the for_each_thread() iteration
will not find the task.
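
[Editor's sketch of Peter's scenario from userspace, assuming Linux with a
downward-growing stack and glibc >= 2.25 (so getpid() performs a real
syscall): a CLONE_VM-but-not-CLONE_THREAD child shares the mm yet gets a
thread group of its own, which is exactly the task a for_each_thread() walk
of the parent's group would miss.]

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Both variables live in the mm shared via CLONE_VM. */
static volatile int shared_flag;
static volatile pid_t child_pid;

static int child_fn(void *arg)
{
	(void)arg;
	shared_flag = 1;       /* write is visible to the parent: shared mm */
	child_pid = getpid();  /* own PID == own thread group, not the parent's */
	return 0;
}

/*
 * Spawn a child that shares our address space but not our thread
 * group. Returns 0 on success, -1 if clone() failed.
 */
static int demo_clone_vm_without_clone_thread(void)
{
	const size_t stack_size = 64 * 1024;
	char *stack = malloc(stack_size);
	pid_t pid;

	if (stack == NULL)
		return -1;
	shared_flag = 0;
	child_pid = 0;
	/* SIGCHLD (and no CLONE_THREAD) makes the child waitable like fork(). */
	pid = clone(child_fn, stack + stack_size, CLONE_VM | SIGCHLD, NULL);
	if (pid < 0) {
		free(stack);
		return -1;
	}
	waitpid(pid, NULL, 0);
	free(stack);
	return 0;
}
```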


Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command

2017-10-05 Thread Mathieu Desnoyers

- On Oct 5, 2017, at 8:24 AM, Peter Zijlstra pet...@infradead.org wrote:

> On Thu, Oct 05, 2017 at 02:12:50PM +0200, Peter Zijlstra wrote:
>> On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote:
>> > diff --git a/arch/powerpc/kernel/membarrier.c b/arch/powerpc/kernel/membarrier.c
>> > new file mode 100644
>> > index ..b0d79a5f5981
>> > --- /dev/null
>> > +++ b/arch/powerpc/kernel/membarrier.c
>> > @@ -0,0 +1,45 @@
>> 
>> > +void membarrier_arch_register_private_expedited(struct task_struct *p)
>> > +{
>> > +  struct task_struct *t;
>> > +
>> > +  if (get_nr_threads(p) == 1) {
>> > +  set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED);
>> > +  return;
>> > +  }
>> > +  /*
>> > +   * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
>> > +   * fork is protected by siglock.
>> > +   */
>> > +  spin_lock(&p->sighand->siglock);
>> > +  for_each_thread(p, t)
>> > +  set_ti_thread_flag(task_thread_info(t),
>> > +  TIF_MEMBARRIER_PRIVATE_EXPEDITED);
>> 
>> I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD.
> 
> Also, for easier reading I would suggest putting { } around the block.

Will do, thanks,

Mathieu

> 
>> > +  spin_unlock(&p->sighand->siglock);
>> > +  /*
>> > +   * Ensure all future scheduler executions will observe the new
>> > +   * thread flag state for this process.
>> > +   */
>> > +  synchronize_sched();
>> 
>> This relies on the flag being read inside rq->lock, right?
>> 
> > > +}

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command

2017-10-05 Thread Mathieu Desnoyers
- On Oct 5, 2017, at 8:12 AM, Peter Zijlstra pet...@infradead.org wrote:

> On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote:
>> diff --git a/arch/powerpc/kernel/membarrier.c b/arch/powerpc/kernel/membarrier.c
>> new file mode 100644
>> index ..b0d79a5f5981
>> --- /dev/null
>> +++ b/arch/powerpc/kernel/membarrier.c
>> @@ -0,0 +1,45 @@
> 
>> +void membarrier_arch_register_private_expedited(struct task_struct *p)
>> +{
>> +struct task_struct *t;
>> +
>> +if (get_nr_threads(p) == 1) {
>> +set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED);
>> +return;
>> +}
>> +/*
>> + * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
>> + * fork is protected by siglock.
>> + */
>> +spin_lock(&p->sighand->siglock);
>> +for_each_thread(p, t)
>> +set_ti_thread_flag(task_thread_info(t),
>> +TIF_MEMBARRIER_PRIVATE_EXPEDITED);
> 
> I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD.

The intent here is to hold the sighand siglock to provide mutual
exclusion against invocation of membarrier_fork(p, clone_flags)
by copy_process().

copy_process() grabs spin_lock(&current->sighand->siglock) for both
CLONE_THREAD and not CLONE_THREAD flags.

What am I missing here ?

> 
>> +spin_unlock(&p->sighand->siglock);
>> +/*
>> + * Ensure all future scheduler executions will observe the new
>> + * thread flag state for this process.
>> + */
>> +synchronize_sched();
> 
> This relies on the flag being read inside rq->lock, right?

Yes. The flag is read by membarrier_arch_switch_mm(), invoked
within switch_mm_irqs_off(), called by context_switch() before
finish_task_switch() releases the rq lock.

Is the comment clear enough, or do you have suggestions for
improvements ?

Thanks,

Mathieu

> 
> > +}

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command

2017-10-05 Thread Mathieu Desnoyers
- On Oct 5, 2017, at 8:22 AM, Avi Kivity a...@scylladb.com wrote:

> On 10/05/2017 07:23 AM, Nicholas Piggin wrote:
>> On Wed,  4 Oct 2017 14:37:53 -0700
>> "Paul E. McKenney"  wrote:
>>
>>> From: Mathieu Desnoyers 
>>>
>>> Provide a new command allowing processes to register their intent to use
>>> the private expedited command.
>>>
>>> This allows PowerPC to skip the full memory barrier in switch_mm(), and
>>> only issue the barrier when scheduling into a task belonging to a
>>> process that has registered to use expedited private.
>>>
>>> Processes are now required to register before using
>>> MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM.
>>>
>>> Changes since v1:
>>> - Use test_ti_thread_flag(next, ...) instead of test_thread_flag() in
>>>powerpc membarrier_arch_sched_in(), given that we want to specifically
>>>check the next thread state.
>>> - Add missing ARCH_HAS_MEMBARRIER_HOOKS in Kconfig.
>>> - Use task_thread_info() to pass thread_info from task to
>>>*_ti_thread_flag().
>>>
>>> Changes since v2:
>>> - Move membarrier_arch_sched_in() call to finish_task_switch().
>>> - Check for NULL t->mm in membarrier_arch_fork().
>>> - Use membarrier_sched_in() in generic code, which invokes the
>>>arch-specific membarrier_arch_sched_in(). This fixes allnoconfig
>>>build on PowerPC.
>>> - Move asm/membarrier.h include under CONFIG_MEMBARRIER, fixing
>>>allnoconfig build on PowerPC.
>>> - Build and runtime tested on PowerPC.
>>>
>>> Changes since v3:
>>> - Simply rely on copy_mm() to copy the membarrier_private_expedited mm
>>>field on fork.
>>> - powerpc: test thread flag instead of reading
>>>membarrier_private_expedited in membarrier_arch_fork().
>>> - powerpc: skip memory barrier in membarrier_arch_sched_in() if coming
>>>from kernel thread, since mmdrop() implies a full barrier.
>>> - Set membarrier_private_expedited to 1 only after arch registration
>>>code, thus eliminating a race where concurrent commands could succeed
>>>when they should fail if issued concurrently with process
>>>registration.
>>> - Use READ_ONCE() for membarrier_private_expedited field access in
>>>membarrier_private_expedited. Matches WRITE_ONCE() performed in
>>>process registration.
>>>
>>> Changes since v4:
>>> - Move powerpc hook from sched_in() to switch_mm(), based on feedback
>>>from Nicholas Piggin.
>> For now, the powerpc approach is okay by me. I plan to test
>> others (e.g., taking runqueue locks) on larger systems, but that can
>> be sent as an incremental patch at a later time.
>>
>> The main thing I would like is for people to review the userspace API.
>>
> 
> As a future satisfied user of the expedited private membarrier syscall,
> I am happy with the change.

Thanks Avi for your input on the userspace API.

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command

2017-10-05 Thread Mathieu Desnoyers
- On Oct 5, 2017, at 12:23 AM, Nicholas Piggin npig...@gmail.com wrote:

> On Wed,  4 Oct 2017 14:37:53 -0700
> "Paul E. McKenney"  wrote:
> 
>> From: Mathieu Desnoyers 
>> 
>> Provide a new command allowing processes to register their intent to use
>> the private expedited command.
>> 
>> This allows PowerPC to skip the full memory barrier in switch_mm(), and
>> only issue the barrier when scheduling into a task belonging to a
>> process that has registered to use expedited private.
>> 
>> Processes are now required to register before using
>> MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM.
>> 
>> Changes since v1:
>> - Use test_ti_thread_flag(next, ...) instead of test_thread_flag() in
>>   powerpc membarrier_arch_sched_in(), given that we want to specifically
>>   check the next thread state.
>> - Add missing ARCH_HAS_MEMBARRIER_HOOKS in Kconfig.
>> - Use task_thread_info() to pass thread_info from task to
>>   *_ti_thread_flag().
>> 
>> Changes since v2:
>> - Move membarrier_arch_sched_in() call to finish_task_switch().
>> - Check for NULL t->mm in membarrier_arch_fork().
>> - Use membarrier_sched_in() in generic code, which invokes the
>>   arch-specific membarrier_arch_sched_in(). This fixes allnoconfig
>>   build on PowerPC.
>> - Move asm/membarrier.h include under CONFIG_MEMBARRIER, fixing
>>   allnoconfig build on PowerPC.
>> - Build and runtime tested on PowerPC.
>> 
>> Changes since v3:
>> - Simply rely on copy_mm() to copy the membarrier_private_expedited mm
>>   field on fork.
>> - powerpc: test thread flag instead of reading
>>   membarrier_private_expedited in membarrier_arch_fork().
>> - powerpc: skip memory barrier in membarrier_arch_sched_in() if coming
>>   from kernel thread, since mmdrop() implies a full barrier.
>> - Set membarrier_private_expedited to 1 only after arch registration
>>   code, thus eliminating a race where concurrent commands could succeed
>>   when they should fail if issued concurrently with process
>>   registration.
>> - Use READ_ONCE() for membarrier_private_expedited field access in
>>   membarrier_private_expedited. Matches WRITE_ONCE() performed in
>>   process registration.
>> 
>> Changes since v4:
>> - Move powerpc hook from sched_in() to switch_mm(), based on feedback
>>   from Nicholas Piggin.
> 
> For now, the powerpc approach is okay by me. I plan to test
> others (e.g., taking runqueue locks) on larger systems, but that can
> be sent as an incremental patch at a later time.
> 
> The main thing I would like is for people to review the userspace API.
> 
> 
>> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
>> index 3a19c253bdb1..4af1b719c65f 100644
>> --- a/include/linux/sched/mm.h
>> +++ b/include/linux/sched/mm.h
>> @@ -205,4 +205,54 @@ static inline void memalloc_noreclaim_restore(unsigned int flags)
>>  current->flags = (current->flags & ~PF_MEMALLOC) | flags;
>>  }
>>  
>> +#ifdef CONFIG_MEMBARRIER
>> +
>> +#ifdef CONFIG_ARCH_HAS_MEMBARRIER_HOOKS
>> +#include <asm/membarrier.h>
>> +#else
>> +static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
>> +struct mm_struct *next, struct task_struct *tsk)
>> +{
>> +}
> 
> This is no longer required in architecture independent code, is it?

Yes, good point! I'll remove this unused code in a follow up patch.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command

2017-10-05 Thread Peter Zijlstra
On Thu, Oct 05, 2017 at 02:12:50PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote:
> > diff --git a/arch/powerpc/kernel/membarrier.c b/arch/powerpc/kernel/membarrier.c
> > new file mode 100644
> > index ..b0d79a5f5981
> > --- /dev/null
> > +++ b/arch/powerpc/kernel/membarrier.c
> > @@ -0,0 +1,45 @@
> 
> > +void membarrier_arch_register_private_expedited(struct task_struct *p)
> > +{
> > +   struct task_struct *t;
> > +
> > +   if (get_nr_threads(p) == 1) {
> > +   set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED);
> > +   return;
> > +   }
> > +   /*
> > +* Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
> > +* fork is protected by siglock.
> > +*/
> > +   spin_lock(&p->sighand->siglock);
> > +   for_each_thread(p, t)
> > +   set_ti_thread_flag(task_thread_info(t),
> > +   TIF_MEMBARRIER_PRIVATE_EXPEDITED);
> 
> I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD.

Also, for easier reading I would suggest putting { } around the block.

> > +   spin_unlock(&p->sighand->siglock);
> > +   /*
> > +* Ensure all future scheduler executions will observe the new
> > +* thread flag state for this process.
> > +*/
> > +   synchronize_sched();
> 
> This relies on the flag being read inside rq->lock, right?
> 
> > +}


Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command

2017-10-05 Thread Peter Zijlstra
On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote:
> diff --git a/arch/powerpc/kernel/membarrier.c b/arch/powerpc/kernel/membarrier.c
> new file mode 100644
> index ..b0d79a5f5981
> --- /dev/null
> +++ b/arch/powerpc/kernel/membarrier.c
> @@ -0,0 +1,45 @@

> +void membarrier_arch_register_private_expedited(struct task_struct *p)
> +{
> + struct task_struct *t;
> +
> + if (get_nr_threads(p) == 1) {
> + set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED);
> + return;
> + }
> + /*
> +  * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
> +  * fork is protected by siglock.
> +  */
> + spin_lock(&p->sighand->siglock);
> + for_each_thread(p, t)
> + set_ti_thread_flag(task_thread_info(t),
> + TIF_MEMBARRIER_PRIVATE_EXPEDITED);

I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD.

> + spin_unlock(&p->sighand->siglock);
> + /*
> +  * Ensure all future scheduler executions will observe the new
> +  * thread flag state for this process.
> +  */
> + synchronize_sched();

This relies on the flag being read inside rq->lock, right?

> +}


Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command

2017-10-04 Thread Nicholas Piggin
On Wed,  4 Oct 2017 14:37:53 -0700
"Paul E. McKenney"  wrote:

> From: Mathieu Desnoyers 
> 
> Provide a new command allowing processes to register their intent to use
> the private expedited command.
> 
> This allows PowerPC to skip the full memory barrier in switch_mm(), and
> only issue the barrier when scheduling into a task belonging to a
> process that has registered to use expedited private.
> 
> Processes are now required to register before using
> MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM.
> 
> Changes since v1:
> - Use test_ti_thread_flag(next, ...) instead of test_thread_flag() in
>   powerpc membarrier_arch_sched_in(), given that we want to specifically
>   check the next thread state.
> - Add missing ARCH_HAS_MEMBARRIER_HOOKS in Kconfig.
> - Use task_thread_info() to pass thread_info from task to
>   *_ti_thread_flag().
> 
> Changes since v2:
> - Move membarrier_arch_sched_in() call to finish_task_switch().
> - Check for NULL t->mm in membarrier_arch_fork().
> - Use membarrier_sched_in() in generic code, which invokes the
>   arch-specific membarrier_arch_sched_in(). This fixes allnoconfig
>   build on PowerPC.
> - Move asm/membarrier.h include under CONFIG_MEMBARRIER, fixing
>   allnoconfig build on PowerPC.
> - Build and runtime tested on PowerPC.
> 
> Changes since v3:
> - Simply rely on copy_mm() to copy the membarrier_private_expedited mm
>   field on fork.
> - powerpc: test thread flag instead of reading
>   membarrier_private_expedited in membarrier_arch_fork().
> - powerpc: skip memory barrier in membarrier_arch_sched_in() if coming
>   from kernel thread, since mmdrop() implies a full barrier.
> - Set membarrier_private_expedited to 1 only after arch registration
>   code, thus eliminating a race where concurrent commands could succeed
>   when they should fail if issued concurrently with process
>   registration.
> - Use READ_ONCE() for membarrier_private_expedited field access in
>   membarrier_private_expedited. Matches WRITE_ONCE() performed in
>   process registration.
> 
> Changes since v4:
> - Move powerpc hook from sched_in() to switch_mm(), based on feedback
>   from Nicholas Piggin.

For now, the powerpc approach is okay by me. I plan to test
others (e.g., taking runqueue locks) on larger systems, but that can
be sent as an incremental patch at a later time.

The main thing I would like is for people to review the userspace API.


> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index 3a19c253bdb1..4af1b719c65f 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -205,4 +205,54 @@ static inline void memalloc_noreclaim_restore(unsigned int flags)
>   current->flags = (current->flags & ~PF_MEMALLOC) | flags;
>  }
>  
> +#ifdef CONFIG_MEMBARRIER
> +
> +#ifdef CONFIG_ARCH_HAS_MEMBARRIER_HOOKS
> +#include <asm/membarrier.h>
> +#else
> +static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
> + struct mm_struct *next, struct task_struct *tsk)
> +{
> +}

This is no longer required in architecture independent code, is it?