Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
On Fri, Oct 06, 2017 at 10:32:19AM +0200, Peter Zijlstra wrote: > > AFAIU the scheduler rq->lock is held while preemption is disabled. > > synchronize_sched() is used here to ensure that all pre-existing > > preempt-off critical sections have completed. > > > > So saying that we use synchronize_sched() to synchronize with rq->lock > > would be stretching the truth a bit. It's actually only true because the > > scheduler holding the rq->lock is surrounded by a preempt-off > > critical section. > > No, rq->lock is sufficient, note that rq->lock is a raw_spinlock_t which > implies !preempt. Yes, we also surround the rq->lock usage with a > slightly larger preempt_disable() section but that's not in fact > required for this. That's what it is, according to the current sources: we seemed to agree that a preempt-off critical section is what we rely on here and that the start of this critical section is not marked by that raw_spin_lock. Andrea
Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
On Fri, Oct 6, 2017 at 12:02 AM, Andrea Parriwrote: > On Thu, Oct 05, 2017 at 04:02:06PM +, Mathieu Desnoyers wrote: >> - On Oct 5, 2017, at 8:12 AM, Peter Zijlstra pet...@infradead.org wrote: >> >> > On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote: >> >> diff --git a/arch/powerpc/kernel/membarrier.c >> >> b/arch/powerpc/kernel/membarrier.c >> >> new file mode 100644 >> >> index ..b0d79a5f5981 >> >> --- /dev/null >> >> +++ b/arch/powerpc/kernel/membarrier.c >> >> @@ -0,0 +1,45 @@ >> > >> >> +void membarrier_arch_register_private_expedited(struct task_struct *p) >> >> +{ >> >> + struct task_struct *t; >> >> + >> >> + if (get_nr_threads(p) == 1) { >> >> + set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED); >> >> + return; >> >> + } >> >> + /* >> >> + * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread >> >> + * fork is protected by siglock. >> >> + */ >> >> + spin_lock(>sighand->siglock); >> >> + for_each_thread(p, t) >> >> + set_ti_thread_flag(task_thread_info(t), >> >> + TIF_MEMBARRIER_PRIVATE_EXPEDITED); >> > >> > I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD. >> >> The intent here is to hold the sighand siglock to provide mutual >> exclusion against invocation of membarrier_fork(p, clone_flags) >> by copy_process(). >> >> copy_process() grabs spin_lock(>sighand->siglock) for both >> CLONE_THREAD and not CLONE_THREAD flags. >> >> What am I missing here ? >> >> > >> >> + spin_unlock(>sighand->siglock); >> >> + /* >> >> + * Ensure all future scheduler executions will observe the new >> >> + * thread flag state for this process. >> >> + */ >> >> + synchronize_sched(); >> > >> > This relies on the flag being read inside rq->lock, right? >> >> Yes. The flag is read by membarrier_arch_switch_mm(), invoked >> within switch_mm_irqs_off(), called by context_switch() before >> finish_task_switch() releases the rq lock. > > I fail to graps the relation between this synchronize_sched() and rq->lock. s/graps/grasp Andrea > > (Besides, we made no reference to rq->lock while discussing: > > > https://github.com/paulmckrcu/litmus/commit/47039df324b60ace0cf7b2403299580be730119b > replace membarrier_arch_sched_in with switch_mm_irqs_off ) > > Could you elaborate? > > Andrea > > >> >> Is the comment clear enough, or do you have suggestions for >> improvements ? > > > >> >> Thanks, >> >> Mathieu >> >> > >> > > +} >> >> -- >> Mathieu Desnoyers >> EfficiOS Inc. >> http://www.efficios.com
Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
On Thu, Oct 05, 2017 at 04:02:06PM +, Mathieu Desnoyers wrote: > - On Oct 5, 2017, at 8:12 AM, Peter Zijlstra pet...@infradead.org wrote: > > > On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote: > >> diff --git a/arch/powerpc/kernel/membarrier.c > >> b/arch/powerpc/kernel/membarrier.c > >> new file mode 100644 > >> index ..b0d79a5f5981 > >> --- /dev/null > >> +++ b/arch/powerpc/kernel/membarrier.c > >> @@ -0,0 +1,45 @@ > > > >> +void membarrier_arch_register_private_expedited(struct task_struct *p) > >> +{ > >> + struct task_struct *t; > >> + > >> + if (get_nr_threads(p) == 1) { > >> + set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED); > >> + return; > >> + } > >> + /* > >> + * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread > >> + * fork is protected by siglock. > >> + */ > >> + spin_lock(>sighand->siglock); > >> + for_each_thread(p, t) > >> + set_ti_thread_flag(task_thread_info(t), > >> + TIF_MEMBARRIER_PRIVATE_EXPEDITED); > > > > I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD. > > The intent here is to hold the sighand siglock to provide mutual > exclusion against invocation of membarrier_fork(p, clone_flags) > by copy_process(). > > copy_process() grabs spin_lock(>sighand->siglock) for both > CLONE_THREAD and not CLONE_THREAD flags. > > What am I missing here ? > > > > >> + spin_unlock(>sighand->siglock); > >> + /* > >> + * Ensure all future scheduler executions will observe the new > >> + * thread flag state for this process. > >> + */ > >> + synchronize_sched(); > > > > This relies on the flag being read inside rq->lock, right? > > Yes. The flag is read by membarrier_arch_switch_mm(), invoked > within switch_mm_irqs_off(), called by context_switch() before > finish_task_switch() releases the rq lock. I fail to graps the relation between this synchronize_sched() and rq->lock. (Besides, we made no reference to rq->lock while discussing: https://github.com/paulmckrcu/litmus/commit/47039df324b60ace0cf7b2403299580be730119b replace membarrier_arch_sched_in with switch_mm_irqs_off ) Could you elaborate? Andrea > > Is the comment clear enough, or do you have suggestions for > improvements ? > > Thanks, > > Mathieu > > > > > > +} > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com
Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
On 10/05/2017 07:23 AM, Nicholas Piggin wrote: On Wed, 4 Oct 2017 14:37:53 -0700 "Paul E. McKenney"wrote: From: Mathieu Desnoyers Provide a new command allowing processes to register their intent to use the private expedited command. This allows PowerPC to skip the full memory barrier in switch_mm(), and only issue the barrier when scheduling into a task belonging to a process that has registered to use expedited private. Processes are now required to register before using MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM. Changes since v1: - Use test_ti_thread_flag(next, ...) instead of test_thread_flag() in powerpc membarrier_arch_sched_in(), given that we want to specifically check the next thread state. - Add missing ARCH_HAS_MEMBARRIER_HOOKS in Kconfig. - Use task_thread_info() to pass thread_info from task to *_ti_thread_flag(). Changes since v2: - Move membarrier_arch_sched_in() call to finish_task_switch(). - Check for NULL t->mm in membarrier_arch_fork(). - Use membarrier_sched_in() in generic code, which invokes the arch-specific membarrier_arch_sched_in(). This fixes allnoconfig build on PowerPC. - Move asm/membarrier.h include under CONFIG_MEMBARRIER, fixing allnoconfig build on PowerPC. - Build and runtime tested on PowerPC. Changes since v3: - Simply rely on copy_mm() to copy the membarrier_private_expedited mm field on fork. - powerpc: test thread flag instead of reading membarrier_private_expedited in membarrier_arch_fork(). - powerpc: skip memory barrier in membarrier_arch_sched_in() if coming from kernel thread, since mmdrop() implies a full barrier. - Set membarrier_private_expedited to 1 only after arch registration code, thus eliminating a race where concurrent commands could succeed when they should fail if issued concurrently with process registration. - Use READ_ONCE() for membarrier_private_expedited field access in membarrier_private_expedited. Matches WRITE_ONCE() performed in process registration. Changes since v4: - Move powerpc hook from sched_in() to switch_mm(), based on feedback from Nicholas Piggin. For now, the powerpc approach is okay by me. I plan to test others (e.g., taking runqueue locks) on larger systems, but that can be sent as an incremental patch at a later time. The main thing I would like is for people to review the userspace API. As a future satisfied user of the expedited private membarrier syscall, I am happy with the change.
Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
> AFAIU the scheduler rq->lock is held while preemption is disabled. > synchronize_sched() is used here to ensure that all pre-existing > preempt-off critical sections have completed. > > So saying that we use synchronize_sched() to synchronize with rq->lock > would be stretching the truth a bit. It's actually only true because the > scheduler holding the rq->lock is surrounded by a preempt-off > critical section. No, rq->lock is sufficient, note that rq->lock is a raw_spinlock_t which implies !preempt. Yes, we also surround the rq->lock usage with a slightly larger preempt_disable() section but that's not in fact required for this.
Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
On Thu, 5 Oct 2017 22:19:15 + (UTC) Mathieu Desnoyerswrote: > AFAIU the scheduler rq->lock is held while preemption is disabled. > synchronize_sched() is used here to ensure that all pre-existing > preempt-off critical sections have completed. > > So saying that we use synchronize_sched() to synchronize with rq->lock > would be stretching the truth a bit. It's actually only true because the > scheduler holding the rq->lock is surrounded by a preempt-off > critical section. Not only is preemption disabled (which is true for all spinlocks, at least with non PREEMPT_RT), but the rq->lock must also only be taken with interrupts disabled (for PREEMPT_RT as well). Because it can also be taken in interrupt context. -- Steve
Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
- On Oct 5, 2017, at 6:02 PM, Andrea Parri parri.and...@gmail.com wrote: > On Thu, Oct 05, 2017 at 04:02:06PM +, Mathieu Desnoyers wrote: >> - On Oct 5, 2017, at 8:12 AM, Peter Zijlstra pet...@infradead.org wrote: >> >> > On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote: >> >> diff --git a/arch/powerpc/kernel/membarrier.c >> >> b/arch/powerpc/kernel/membarrier.c >> >> new file mode 100644 >> >> index ..b0d79a5f5981 >> >> --- /dev/null >> >> +++ b/arch/powerpc/kernel/membarrier.c >> >> @@ -0,0 +1,45 @@ >> > >> >> +void membarrier_arch_register_private_expedited(struct task_struct *p) >> >> +{ >> >> + struct task_struct *t; >> >> + >> >> + if (get_nr_threads(p) == 1) { >> >> + set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED); >> >> + return; >> >> + } >> >> + /* >> >> + * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread >> >> + * fork is protected by siglock. >> >> + */ >> >> + spin_lock(>sighand->siglock); >> >> + for_each_thread(p, t) >> >> + set_ti_thread_flag(task_thread_info(t), >> >> + TIF_MEMBARRIER_PRIVATE_EXPEDITED); >> > >> > I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD. >> >> The intent here is to hold the sighand siglock to provide mutual >> exclusion against invocation of membarrier_fork(p, clone_flags) >> by copy_process(). >> >> copy_process() grabs spin_lock(>sighand->siglock) for both >> CLONE_THREAD and not CLONE_THREAD flags. >> >> What am I missing here ? >> >> > >> >> + spin_unlock(>sighand->siglock); >> >> + /* >> >> + * Ensure all future scheduler executions will observe the new >> >> + * thread flag state for this process. >> >> + */ >> >> + synchronize_sched(); >> > >> > This relies on the flag being read inside rq->lock, right? >> >> Yes. The flag is read by membarrier_arch_switch_mm(), invoked >> within switch_mm_irqs_off(), called by context_switch() before >> finish_task_switch() releases the rq lock. > > I fail to grap the relation between this synchronize_sched() and rq->lock. > > (Besides, we made no reference to rq->lock while discussing: > > > https://github.com/paulmckrcu/litmus/commit/47039df324b60ace0cf7b2403299580be730119b > replace membarrier_arch_sched_in with switch_mm_irqs_off ) > > Could you elaborate? Hi Andrea, AFAIU the scheduler rq->lock is held while preemption is disabled. synchronize_sched() is used here to ensure that all pre-existing preempt-off critical sections have completed. So saying that we use synchronize_sched() to synchronize with rq->lock would be stretching the truth a bit. It's actually only true because the scheduler holding the rq->lock is surrounded by a preempt-off critical section. Thanks, Mathieu > > Andrea > > >> >> Is the comment clear enough, or do you have suggestions for >> improvements ? > > > >> >> Thanks, >> >> Mathieu >> >> > >> > > +} >> >> -- >> Mathieu Desnoyers >> EfficiOS Inc. > > http://www.efficios.com -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
- On Oct 5, 2017, at 12:21 PM, Peter Zijlstra pet...@infradead.org wrote: > On Thu, Oct 05, 2017 at 04:02:06PM +, Mathieu Desnoyers wrote: >> - On Oct 5, 2017, at 8:12 AM, Peter Zijlstra pet...@infradead.org wrote: >> >> > On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote: >> >> diff --git a/arch/powerpc/kernel/membarrier.c >> >> b/arch/powerpc/kernel/membarrier.c >> >> new file mode 100644 >> >> index ..b0d79a5f5981 >> >> --- /dev/null >> >> +++ b/arch/powerpc/kernel/membarrier.c >> >> @@ -0,0 +1,45 @@ >> > >> >> +void membarrier_arch_register_private_expedited(struct task_struct *p) >> >> +{ >> >> + struct task_struct *t; >> >> + >> >> + if (get_nr_threads(p) == 1) { >> >> + set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED); >> >> + return; >> >> + } >> >> + /* >> >> + * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread >> >> + * fork is protected by siglock. >> >> + */ >> >> + spin_lock(>sighand->siglock); >> >> + for_each_thread(p, t) >> >> + set_ti_thread_flag(task_thread_info(t), >> >> + TIF_MEMBARRIER_PRIVATE_EXPEDITED); >> > >> > I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD. >> >> The intent here is to hold the sighand siglock to provide mutual >> exclusion against invocation of membarrier_fork(p, clone_flags) >> by copy_process(). >> >> copy_process() grabs spin_lock(>sighand->siglock) for both >> CLONE_THREAD and not CLONE_THREAD flags. >> >> What am I missing here ? > > If you do CLONE_VM without CLONE_THREAD you'll end up sharing the mm but > you'll not be part of thread_head, so the for_each_thread() iteration > will not find the task. Excellent point. Please see the follow up RFC patch I posted taking care of this matter. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
On Thu, Oct 05, 2017 at 04:02:06PM +, Mathieu Desnoyers wrote: > - On Oct 5, 2017, at 8:12 AM, Peter Zijlstra pet...@infradead.org wrote: > > > On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote: > >> diff --git a/arch/powerpc/kernel/membarrier.c > >> b/arch/powerpc/kernel/membarrier.c > >> new file mode 100644 > >> index ..b0d79a5f5981 > >> --- /dev/null > >> +++ b/arch/powerpc/kernel/membarrier.c > >> @@ -0,0 +1,45 @@ > > > >> +void membarrier_arch_register_private_expedited(struct task_struct *p) > >> +{ > >> + struct task_struct *t; > >> + > >> + if (get_nr_threads(p) == 1) { > >> + set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED); > >> + return; > >> + } > >> + /* > >> + * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread > >> + * fork is protected by siglock. > >> + */ > >> + spin_lock(>sighand->siglock); > >> + for_each_thread(p, t) > >> + set_ti_thread_flag(task_thread_info(t), > >> + TIF_MEMBARRIER_PRIVATE_EXPEDITED); > > > > I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD. > > The intent here is to hold the sighand siglock to provide mutual > exclusion against invocation of membarrier_fork(p, clone_flags) > by copy_process(). > > copy_process() grabs spin_lock(>sighand->siglock) for both > CLONE_THREAD and not CLONE_THREAD flags. > > What am I missing here ? If you do CLONE_VM without CLONE_THREAD you'll end up sharing the mm but you'll not be part of thread_head, so the for_each_thread() iteration will not find the task.
Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
- On Oct 5, 2017, at 8:24 AM, Peter Zijlstra pet...@infradead.org wrote: > On Thu, Oct 05, 2017 at 02:12:50PM +0200, Peter Zijlstra wrote: >> On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote: >> > diff --git a/arch/powerpc/kernel/membarrier.c >> > b/arch/powerpc/kernel/membarrier.c >> > new file mode 100644 >> > index ..b0d79a5f5981 >> > --- /dev/null >> > +++ b/arch/powerpc/kernel/membarrier.c >> > @@ -0,0 +1,45 @@ >> >> > +void membarrier_arch_register_private_expedited(struct task_struct *p) >> > +{ >> > + struct task_struct *t; >> > + >> > + if (get_nr_threads(p) == 1) { >> > + set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED); >> > + return; >> > + } >> > + /* >> > + * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread >> > + * fork is protected by siglock. >> > + */ >> > + spin_lock(>sighand->siglock); >> > + for_each_thread(p, t) >> > + set_ti_thread_flag(task_thread_info(t), >> > + TIF_MEMBARRIER_PRIVATE_EXPEDITED); >> >> I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD. > > Also, for easier reading I would suggest putting { } around the block. Will do, thanks, Mathieu > >> > + spin_unlock(>sighand->siglock); >> > + /* >> > + * Ensure all future scheduler executions will observe the new >> > + * thread flag state for this process. >> > + */ >> > + synchronize_sched(); >> >> This relies on the flag being read inside rq->lock, right? >> > > > +} -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
- On Oct 5, 2017, at 8:12 AM, Peter Zijlstra pet...@infradead.org wrote: > On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote: >> diff --git a/arch/powerpc/kernel/membarrier.c >> b/arch/powerpc/kernel/membarrier.c >> new file mode 100644 >> index ..b0d79a5f5981 >> --- /dev/null >> +++ b/arch/powerpc/kernel/membarrier.c >> @@ -0,0 +1,45 @@ > >> +void membarrier_arch_register_private_expedited(struct task_struct *p) >> +{ >> +struct task_struct *t; >> + >> +if (get_nr_threads(p) == 1) { >> +set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED); >> +return; >> +} >> +/* >> + * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread >> + * fork is protected by siglock. >> + */ >> +spin_lock(>sighand->siglock); >> +for_each_thread(p, t) >> +set_ti_thread_flag(task_thread_info(t), >> +TIF_MEMBARRIER_PRIVATE_EXPEDITED); > > I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD. The intent here is to hold the sighand siglock to provide mutual exclusion against invocation of membarrier_fork(p, clone_flags) by copy_process(). copy_process() grabs spin_lock(>sighand->siglock) for both CLONE_THREAD and not CLONE_THREAD flags. What am I missing here ? > >> +spin_unlock(>sighand->siglock); >> +/* >> + * Ensure all future scheduler executions will observe the new >> + * thread flag state for this process. >> + */ >> +synchronize_sched(); > > This relies on the flag being read inside rq->lock, right? Yes. The flag is read by membarrier_arch_switch_mm(), invoked within switch_mm_irqs_off(), called by context_switch() before finish_task_switch() releases the rq lock. Is the comment clear enough, or do you have suggestions for improvements ? Thanks, Mathieu > > > +} -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
- On Oct 5, 2017, at 8:22 AM, Avi Kivity a...@scylladb.com wrote: > On 10/05/2017 07:23 AM, Nicholas Piggin wrote: >> On Wed, 4 Oct 2017 14:37:53 -0700 >> "Paul E. McKenney"wrote: >> >>> From: Mathieu Desnoyers >>> >>> Provide a new command allowing processes to register their intent to use >>> the private expedited command. >>> >>> This allows PowerPC to skip the full memory barrier in switch_mm(), and >>> only issue the barrier when scheduling into a task belonging to a >>> process that has registered to use expedited private. >>> >>> Processes are now required to register before using >>> MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM. >>> >>> Changes since v1: >>> - Use test_ti_thread_flag(next, ...) instead of test_thread_flag() in >>>powerpc membarrier_arch_sched_in(), given that we want to specifically >>>check the next thread state. >>> - Add missing ARCH_HAS_MEMBARRIER_HOOKS in Kconfig. >>> - Use task_thread_info() to pass thread_info from task to >>>*_ti_thread_flag(). >>> >>> Changes since v2: >>> - Move membarrier_arch_sched_in() call to finish_task_switch(). >>> - Check for NULL t->mm in membarrier_arch_fork(). >>> - Use membarrier_sched_in() in generic code, which invokes the >>>arch-specific membarrier_arch_sched_in(). This fixes allnoconfig >>>build on PowerPC. >>> - Move asm/membarrier.h include under CONFIG_MEMBARRIER, fixing >>>allnoconfig build on PowerPC. >>> - Build and runtime tested on PowerPC. >>> >>> Changes since v3: >>> - Simply rely on copy_mm() to copy the membarrier_private_expedited mm >>>field on fork. >>> - powerpc: test thread flag instead of reading >>>membarrier_private_expedited in membarrier_arch_fork(). >>> - powerpc: skip memory barrier in membarrier_arch_sched_in() if coming >>>from kernel thread, since mmdrop() implies a full barrier. >>> - Set membarrier_private_expedited to 1 only after arch registration >>>code, thus eliminating a race where concurrent commands could succeed >>>when they should fail if issued concurrently with process >>>registration. >>> - Use READ_ONCE() for membarrier_private_expedited field access in >>>membarrier_private_expedited. Matches WRITE_ONCE() performed in >>>process registration. >>> >>> Changes since v4: >>> - Move powerpc hook from sched_in() to switch_mm(), based on feedback >>>from Nicholas Piggin. >> For now, the powerpc approach is okay by me. I plan to test >> others (e.g., taking runqueue locks) on larger systems, but that can >> be sent as an incremental patch at a later time. >> >> The main thing I would like is for people to review the userspace API. >> > > As a future satisfied user of the expedited private membarrier syscall, > I am happy with the change. Thanks Avi for your input on the userspace API. Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
- On Oct 5, 2017, at 12:23 AM, Nicholas Piggin npig...@gmail.com wrote: > On Wed, 4 Oct 2017 14:37:53 -0700 > "Paul E. McKenney"wrote: > >> From: Mathieu Desnoyers >> >> Provide a new command allowing processes to register their intent to use >> the private expedited command. >> >> This allows PowerPC to skip the full memory barrier in switch_mm(), and >> only issue the barrier when scheduling into a task belonging to a >> process that has registered to use expedited private. >> >> Processes are now required to register before using >> MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM. >> >> Changes since v1: >> - Use test_ti_thread_flag(next, ...) instead of test_thread_flag() in >> powerpc membarrier_arch_sched_in(), given that we want to specifically >> check the next thread state. >> - Add missing ARCH_HAS_MEMBARRIER_HOOKS in Kconfig. >> - Use task_thread_info() to pass thread_info from task to >> *_ti_thread_flag(). >> >> Changes since v2: >> - Move membarrier_arch_sched_in() call to finish_task_switch(). >> - Check for NULL t->mm in membarrier_arch_fork(). >> - Use membarrier_sched_in() in generic code, which invokes the >> arch-specific membarrier_arch_sched_in(). This fixes allnoconfig >> build on PowerPC. >> - Move asm/membarrier.h include under CONFIG_MEMBARRIER, fixing >> allnoconfig build on PowerPC. >> - Build and runtime tested on PowerPC. >> >> Changes since v3: >> - Simply rely on copy_mm() to copy the membarrier_private_expedited mm >> field on fork. >> - powerpc: test thread flag instead of reading >> membarrier_private_expedited in membarrier_arch_fork(). >> - powerpc: skip memory barrier in membarrier_arch_sched_in() if coming >> from kernel thread, since mmdrop() implies a full barrier. >> - Set membarrier_private_expedited to 1 only after arch registration >> code, thus eliminating a race where concurrent commands could succeed >> when they should fail if issued concurrently with process >> registration. >> - Use READ_ONCE() for membarrier_private_expedited field access in >> membarrier_private_expedited. Matches WRITE_ONCE() performed in >> process registration. >> >> Changes since v4: >> - Move powerpc hook from sched_in() to switch_mm(), based on feedback >> from Nicholas Piggin. > > For now, the powerpc approach is okay by me. I plan to test > others (e.g., taking runqueue locks) on larger systems, but that can > be sent as an incremental patch at a later time. > > The main thing I would like is for people to review the userspace API. > > >> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h >> index 3a19c253bdb1..4af1b719c65f 100644 >> --- a/include/linux/sched/mm.h >> +++ b/include/linux/sched/mm.h >> @@ -205,4 +205,54 @@ static inline void memalloc_noreclaim_restore(unsigned >> int >> flags) >> current->flags = (current->flags & ~PF_MEMALLOC) | flags; >> } >> >> +#ifdef CONFIG_MEMBARRIER >> + >> +#ifdef CONFIG_ARCH_HAS_MEMBARRIER_HOOKS >> +#include >> +#else >> +static inline void membarrier_arch_switch_mm(struct mm_struct *prev, >> +struct mm_struct *next, struct task_struct *tsk) >> +{ >> +} > > This is no longer required in architecture independent code, is it? Yes, good point! I'll remove this unused code in a follow up patch. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
On Thu, Oct 05, 2017 at 02:12:50PM +0200, Peter Zijlstra wrote: > On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote: > > diff --git a/arch/powerpc/kernel/membarrier.c > > b/arch/powerpc/kernel/membarrier.c > > new file mode 100644 > > index ..b0d79a5f5981 > > --- /dev/null > > +++ b/arch/powerpc/kernel/membarrier.c > > @@ -0,0 +1,45 @@ > > > +void membarrier_arch_register_private_expedited(struct task_struct *p) > > +{ > > + struct task_struct *t; > > + > > + if (get_nr_threads(p) == 1) { > > + set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED); > > + return; > > + } > > + /* > > +* Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread > > +* fork is protected by siglock. > > +*/ > > + spin_lock(>sighand->siglock); > > + for_each_thread(p, t) > > + set_ti_thread_flag(task_thread_info(t), > > + TIF_MEMBARRIER_PRIVATE_EXPEDITED); > > I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD. Also, for easier reading I would suggest putting { } around the block. > > + spin_unlock(>sighand->siglock); > > + /* > > +* Ensure all future scheduler executions will observe the new > > +* thread flag state for this process. > > +*/ > > + synchronize_sched(); > > This relies on the flag being read inside rq->lock, right? > > > +}
Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote: > diff --git a/arch/powerpc/kernel/membarrier.c > b/arch/powerpc/kernel/membarrier.c > new file mode 100644 > index ..b0d79a5f5981 > --- /dev/null > +++ b/arch/powerpc/kernel/membarrier.c > @@ -0,0 +1,45 @@ > +void membarrier_arch_register_private_expedited(struct task_struct *p) > +{ > + struct task_struct *t; > + > + if (get_nr_threads(p) == 1) { > + set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED); > + return; > + } > + /* > + * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread > + * fork is protected by siglock. > + */ > + spin_lock(>sighand->siglock); > + for_each_thread(p, t) > + set_ti_thread_flag(task_thread_info(t), > + TIF_MEMBARRIER_PRIVATE_EXPEDITED); I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD. > + spin_unlock(>sighand->siglock); > + /* > + * Ensure all future scheduler executions will observe the new > + * thread flag state for this process. > + */ > + synchronize_sched(); This relies on the flag being read inside rq->lock, right? > +}
Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
On Wed, 4 Oct 2017 14:37:53 -0700 "Paul E. McKenney"wrote: > From: Mathieu Desnoyers > > Provide a new command allowing processes to register their intent to use > the private expedited command. > > This allows PowerPC to skip the full memory barrier in switch_mm(), and > only issue the barrier when scheduling into a task belonging to a > process that has registered to use expedited private. > > Processes are now required to register before using > MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM. > > Changes since v1: > - Use test_ti_thread_flag(next, ...) instead of test_thread_flag() in > powerpc membarrier_arch_sched_in(), given that we want to specifically > check the next thread state. > - Add missing ARCH_HAS_MEMBARRIER_HOOKS in Kconfig. > - Use task_thread_info() to pass thread_info from task to > *_ti_thread_flag(). > > Changes since v2: > - Move membarrier_arch_sched_in() call to finish_task_switch(). > - Check for NULL t->mm in membarrier_arch_fork(). > - Use membarrier_sched_in() in generic code, which invokes the > arch-specific membarrier_arch_sched_in(). This fixes allnoconfig > build on PowerPC. > - Move asm/membarrier.h include under CONFIG_MEMBARRIER, fixing > allnoconfig build on PowerPC. > - Build and runtime tested on PowerPC. > > Changes since v3: > - Simply rely on copy_mm() to copy the membarrier_private_expedited mm > field on fork. > - powerpc: test thread flag instead of reading > membarrier_private_expedited in membarrier_arch_fork(). > - powerpc: skip memory barrier in membarrier_arch_sched_in() if coming > from kernel thread, since mmdrop() implies a full barrier. > - Set membarrier_private_expedited to 1 only after arch registration > code, thus eliminating a race where concurrent commands could succeed > when they should fail if issued concurrently with process > registration. > - Use READ_ONCE() for membarrier_private_expedited field access in > membarrier_private_expedited. Matches WRITE_ONCE() performed in > process registration. > > Changes since v4: > - Move powerpc hook from sched_in() to switch_mm(), based on feedback > from Nicholas Piggin. For now, the powerpc approach is okay by me. I plan to test others (e.g., taking runqueue locks) on larger systems, but that can be sent as an incremental patch at a later time. The main thing I would like is for people to review the userspace API. > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h > index 3a19c253bdb1..4af1b719c65f 100644 > --- a/include/linux/sched/mm.h > +++ b/include/linux/sched/mm.h > @@ -205,4 +205,54 @@ static inline void memalloc_noreclaim_restore(unsigned > int flags) > current->flags = (current->flags & ~PF_MEMALLOC) | flags; > } > > +#ifdef CONFIG_MEMBARRIER > + > +#ifdef CONFIG_ARCH_HAS_MEMBARRIER_HOOKS > +#include > +#else > +static inline void membarrier_arch_switch_mm(struct mm_struct *prev, > + struct mm_struct *next, struct task_struct *tsk) > +{ > +} This is no longer required in architecture independent code, is it?