Re: [PATCH 1/3] task: Add a count of task rcu users

2019-09-05 Thread Frederic Weisbecker
On Wed, Sep 04, 2019 at 11:20:03AM -0700, Linus Torvalds wrote:
> On Wed, Sep 4, 2019 at 9:33 AM Frederic Weisbecker wrote:
> >
> > I thought the point of these rcu_users was to be able to do:
> >
> > rcu_read_lock()
> > p = rcu_dereference(task)
> > if (!refcount_inc_not_zero(p->rcu_users)) {
> 
> No. Because of the shared state, you can't do that from RCU context.
> 
> But you *could* increase rcu_users from within the process context
> itself (as long as you do it before the exit path, ie during normal
> system call execution), or possibly while holding the tasklist_lock
> and verifying that the task hasn't died yet.
> 
> I'm not sure there is any sensible case for doing that, though. It
> would have to have a similar pattern to the runqueue use, where you
> add a new RCU lookup point for the task. I'm sure something like that
> _could_ exist, I just can't think of any right now.

Yeah indeed, in fact I just hadn't read the patches entirely, so I missed
the point. I see now that the purpose of rcu_users is to postpone the
RCU-delayed put_task_struct() until we are done with both the final schedule
out and release_task().
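
To put that in a quick sketch (the second call site is what the later
patches in the series add, as far as I've read them; shown only to
illustrate the "both" above):

    /* Sketch, not part of this patch: the two eventual droppers of
     * rcu_users.  Only the final drop repurposes the memory as an
     * rcu_head and queues the RCU-delayed put_task_struct().
     */

    /* 1) release_task(), once the task has been reaped: */
    put_task_struct_rcu_user(p);

    /* 2) finish_task_switch(), after the dead task's final schedule out
     *    (the part the later patches add):
     */
    put_task_struct_rcu_user(prev);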

Sorry for the noise :o)


Re: [PATCH 1/3] task: Add a count of task rcu users

2019-09-04 Thread Linus Torvalds
On Wed, Sep 4, 2019 at 9:33 AM Frederic Weisbecker  wrote:
>
> I thought the point of these rcu_users was to be able to do:
>
> rcu_read_lock()
> p = rcu_dereference(task)
> if (!refcount_inc_not_zero(p->rcu_users)) {

No. Because of the shared state, you can't do that from RCU context.

But you *could* increase rcu_users from within the process context
itself (as long as you do it before the exit path, ie during normal
system call execution), or possibly while holding the tasklist_lock
and verifying that the task hasn't died yet.

I'm not sure there is any sensible case for doing that, though. It
would have to have a similar pattern to the runqueue use, where you
add a new RCU lookup point for the task. I'm sure something like that
_could_ exist, I just can't think of any right now.
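
For illustration, a minimal sketch of that second option (the helper name
is made up, nothing in this series adds it, and the caller is assumed to
already have a valid task pointer, e.g. one found while walking the
tasklist under the same lock):

    static bool task_get_rcu_user(struct task_struct *p)
    {
        bool got = false;

        read_lock(&tasklist_lock);
        /*
         * release_task() unhashes the task with tasklist_lock held for
         * writing and only drops its rcu_users reference after that,
         * so a still-hashed pid here means rcu_users is at least 1 and
         * is still a refcount, not an rcu_head.
         */
        if (pid_alive(p)) {
            refcount_inc(&p->rcu_users);
            got = true;
        }
        read_unlock(&tasklist_lock);

        return got;
    }

The first option is simpler still: a plain refcount_inc(&current->rcu_users)
from the task's own context during normal system call execution, since
release_task() cannot have run on a task that has not exited yet.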

 Linus


Re: [PATCH 1/3] task: Add a count of task rcu users

2019-09-04 Thread Frederic Weisbecker
On Wed, Sep 04, 2019 at 05:32:46PM +0200, Oleg Nesterov wrote:
> On 09/04, Frederic Weisbecker wrote:
> >
> > So what happens if, say:
> >
> >
> >        CPU 1                                   CPU 2
> >        -----                                   -----
> >        rcu_read_lock()
> >        p = rcu_dereference(rq->task)
> >        if (refcount_inc_not_zero(p->rcu_users)) {
> >            .....
> >                                        release_task() {
> >                                            put_task_struct_rcu_user() {
> >                                                call_rcu() {
> >                                                    queue rcu_head
> 
> in this particular case call_rcu() won't be called, so

Yeah, I confused myself in constructing the scenario but, like you say below,
release_task() can simply happen between the p = rcu_dereference() and the
refcount_inc() and break things.

I thought the point of these rcu_users was to be able to do:

    rcu_read_lock()
    p = rcu_dereference(task)
    if (!refcount_inc_not_zero(p->rcu_users)) {
        rcu_read_unlock();
        return -1;
    }
    rcu_read_unlock();

    //do stuff with task

    put_task_struct_rcu_user(p);

Thanks.

> 
> >                                                }
> >                                            }
> >                                        }
> >        put_task_struct_rcu_user(); //here rcu_users has been overwritten
> 
> rcu_users won't be overwritten.
> 
> But nobody should try to increment ->rcu_users,
> 
>   rcu_read_lock();
>   p = rcu_dereference(rq->task);
>   refcount_inc_not_zero(p->rcu_users);
> 
> is already wrong because both release_task/last-schedule can happen in
> between, before refcount_inc_not_zero().
> 
> Oleg.
> 


Re: [PATCH 1/3] task: Add a count of task rcu users

2019-09-04 Thread Oleg Nesterov
On 09/04, Frederic Weisbecker wrote:
>
> So what happens if, say:
>
>
>        CPU 1                                   CPU 2
>        -----                                   -----
>        rcu_read_lock()
>        p = rcu_dereference(rq->task)
>        if (refcount_inc_not_zero(p->rcu_users)) {
>            .....
>                                        release_task() {
>                                            put_task_struct_rcu_user() {
>                                                call_rcu() {
>                                                    queue rcu_head

in this particular case call_rcu() won't be called, so

>                                                }
>                                            }
>                                        }
>        put_task_struct_rcu_user(); //here rcu_users has been overwritten

rcu_users won't be overwritten.

But nobody should try to increment ->rcu_users,

    rcu_read_lock();
    p = rcu_dereference(rq->task);
    refcount_inc_not_zero(p->rcu_users);

is already wrong because both release_task/last-schedule can happen in
between, before refcount_inc_not_zero().
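
For contrast, roughly the pattern this series is meant to allow once the
scheduler side is converted (a sketch using the same rq->task placeholder
as the diagram above; the point is that a lookup takes an ordinary ->usage
reference and never touches ->rcu_users):

    rcu_read_lock();
    p = rcu_dereference(rq->task);
    if (p)
        get_task_struct(p);     /* plain ->usage reference */
    rcu_read_unlock();

    /* ... use p ... */

    if (p)
        put_task_struct(p);

That is safe because with rcu_users every final drop goes through
call_rcu(&t->rcu, delayed_put_task_struct), so inside the read-side
section the task_struct has not been freed and ->usage cannot yet have
reached zero.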

Oleg.



Re: [PATCH 1/3] task: Add a count of task rcu users

2019-09-04 Thread Frederic Weisbecker
On Mon, Sep 02, 2019 at 11:51:34PM -0500, Eric W. Biederman wrote:
> 
> Add a count of the number of rcu users (currently 1) of the task
> struct so that we can later add the scheduler case and get rid of the
> very subtle task_rcu_dereference, and just use rcu_dereference.
> 
> As suggested by Oleg have the count overlap rcu_head so that no
> additional space in task_struct is required.
> 
> Inspired-by: Linus Torvalds 
> Inspired-by: Oleg Nesterov 
> Signed-off-by: "Eric W. Biederman" 
> ---
>  include/linux/sched.h      | 5 ++++-
>  include/linux/sched/task.h | 1 +
>  kernel/exit.c              | 7 ++++++-
>  kernel/fork.c              | 7 +++----
>  4 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 9f51932bd543..99a4518b9b17 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1142,7 +1142,10 @@ struct task_struct {
>  
>   struct tlbflush_unmap_batch tlb_ubc;
>  
> - struct rcu_head rcu;
> + union {
> + refcount_t  rcu_users;
> + struct rcu_head rcu;

So what happens if, say:


       CPU 1                                   CPU 2
       -----                                   -----
       rcu_read_lock()
       p = rcu_dereference(rq->task)
       if (refcount_inc_not_zero(p->rcu_users)) {
           .....
                                       release_task() {
                                           put_task_struct_rcu_user() {
                                               call_rcu() {
                                                   queue rcu_head
                                               }
                                           }
                                       }
       put_task_struct_rcu_user(); //here rcu_users has been overwritten


Thanks.


Re: [PATCH 1/3] task: Add a count of task rcu users

2019-09-04 Thread Oleg Nesterov
On 09/02, Eric W. Biederman wrote:
>
> @@ -900,10 +900,9 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
> - /*
> -  * One for us, one for whoever does the "release_task()" (usually
> -  * parent)
> -  */
> + /* One for the user space visible state that goes away when reaped. */
> + refcount_set(&tsk->rcu_users, 1);

forgot to mention...

This is minor, but we don't really need to initialize child->rcu_users in
dup_task_struct(), we can just add

.rcu_users = REFCOUNT_INIT(2),

into init_task's initializer.

Until we have a refcount_inc(task->rcu_users) user, of course.
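
I.e. something like this in init_task's initializer in init/init_task.c
(sketch; the 2 assumes the rest of the series, where both release_task()
and the scheduler end up holding a reference):

    .rcu_users      = REFCOUNT_INIT(2),

Children then pick the value up through the struct copy in
dup_task_struct(), which stays correct only as long as nothing increments
a live task's ->rcu_users at run time, hence the caveat above.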

Oleg.



[PATCH 1/3] task: Add a count of task rcu users

2019-09-02 Thread Eric W. Biederman


Add a count of the number of rcu users (currently 1) of the task
struct so that we can later add the scheduler case and get rid of the
very subtle task_rcu_dereference, and just use rcu_dereference.

As suggested by Oleg have the count overlap rcu_head so that no
additional space in task_struct is required.

Inspired-by: Linus Torvalds 
Inspired-by: Oleg Nesterov 
Signed-off-by: "Eric W. Biederman" 
---
 include/linux/sched.h      | 5 ++++-
 include/linux/sched/task.h | 1 +
 kernel/exit.c              | 7 ++++++-
 kernel/fork.c              | 7 +++----
 4 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9f51932bd543..99a4518b9b17 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1142,7 +1142,10 @@ struct task_struct {
 
struct tlbflush_unmap_batch tlb_ubc;
 
-   struct rcu_head rcu;
+   union {
+   refcount_t  rcu_users;
+   struct rcu_head rcu;
+   };
 
/* Cache last used pipe for splice(): */
struct pipe_inode_info  *splice_pipe;
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 0497091e40c1..4c44c37236b2 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -116,6 +116,7 @@ static inline void put_task_struct(struct task_struct *t)
 }
 
 struct task_struct *task_rcu_dereference(struct task_struct **ptask);
+void put_task_struct_rcu_user(struct task_struct *task);
 
 #ifdef CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT
 extern int arch_task_struct_size __read_mostly;
diff --git a/kernel/exit.c b/kernel/exit.c
index 5b4a5dcce8f8..2e259286f4e7 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -182,6 +182,11 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
put_task_struct(tsk);
 }
 
+void put_task_struct_rcu_user(struct task_struct *task)
+{
+   if (refcount_dec_and_test(&task->rcu_users))
+       call_rcu(&task->rcu, delayed_put_task_struct);
+}
 
 void release_task(struct task_struct *p)
 {
@@ -222,7 +227,7 @@ void release_task(struct task_struct *p)
 
write_unlock_irq(&tasklist_lock);
release_thread(p);
-   call_rcu(&p->rcu, delayed_put_task_struct);
+   put_task_struct_rcu_user(p);
 
p = leader;
if (unlikely(zap_leader))
diff --git a/kernel/fork.c b/kernel/fork.c
index 2852d0e76ea3..9f04741d5c70 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -900,10 +900,9 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
if (orig->cpus_ptr == &orig->cpus_mask)
tsk->cpus_ptr = &tsk->cpus_mask;
 
-   /*
-* One for us, one for whoever does the "release_task()" (usually
-* parent)
-*/
+   /* One for the user space visible state that goes away when reaped. */
+   refcount_set(&tsk->rcu_users, 1);
+   /* One for the rcu users, and one for the scheduler */
refcount_set(&tsk->usage, 2);
 #ifdef CONFIG_BLK_DEV_IO_TRACE
tsk->btrace_seq = 0;
-- 
2.21.0.dirty