Re: [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work

2020-07-23 Thread Frederic Weisbecker
On Thu, Jul 23, 2020 at 10:32:54AM +0200, Thomas Gleixner wrote:
> Frederic Weisbecker  writes:
> > On Fri, Jul 17, 2020 at 12:50:34AM +0200, Peter Zijlstra wrote:
> >> On Thu, Jul 16, 2020 at 10:19:26PM +0200, Thomas Gleixner wrote:
> >> > +static void __run_posix_cpu_timers(struct task_struct *tsk)
> >> > +{
> >> > +	struct posix_cputimers *pct = &tsk->posix_cputimers;
> >> > +
> >> > +	if (!test_and_set_bit(CPUTIMERS_WORK_SCHEDULED, &pct->flags))
> >> > +		task_work_add(tsk, &pct->task_work, true);
> >> > +}
> >> > +
> >> > +static inline void posix_cpu_timers_enable_work(struct task_struct *tsk)
> >> > +{
> >> > +	clear_bit(CPUTIMERS_WORK_SCHEDULED, &tsk->posix_cputimers.flags);
> >>	/*
> >>	 * Ensure we observe everything before a failing test_and_set()
> >>	 * in __run_posix_cpu_timers().
> >>	 */
> >>	smp_mb__after_atomic();
> >> > +}
> >> 
> >> Such that when another timer interrupt happens while we run this, we're
> >> guaranteed to either see it, or get re-queued and thus re-run the
> >> function.
> >
> > But each thread in the process enqueues its own task work and flips its
> > own flags. So if task A runs the task work and task B runs
> > __run_posix_cpu_timers(), they wouldn't be ordering against the same
> > flags.
> 
> If two tasks queue work independently of each other, then one of them
> will find it done already, which is the same as if two tasks of the same
> process execute run_posix_cpu_timers() in parallel.
> 
> I really don't want to go into the rathole of making the work or the
> synchronization process wide. That's a guarantee for disaster.
> 
> Handling task work strictly per task is straightforward and simple. The
> resulting contention on sighand lock in task work is unavoidable, but
> that's a reasonable tradeoff vs. the complexity needed to handle task
> work process wide.

Definitely!

I was only commenting on the barrier suggestion. But I believe it shouldn't
be needed in the end.

If we were to have a per-task work for thread timers and a per-process work
for process timers, we would need to split the whole thing up and also take
care of timers firing after exit_task_work(). That isn't an issue in the
thread case, as the work is simply ignored for an exiting task, but it's a
big issue in the case of process-wide handling.

Anyway, the current layout is simple enough.


Re: [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work

2020-07-23 Thread Thomas Gleixner
Frederic Weisbecker  writes:
> On Fri, Jul 17, 2020 at 12:50:34AM +0200, Peter Zijlstra wrote:
>> On Thu, Jul 16, 2020 at 10:19:26PM +0200, Thomas Gleixner wrote:
>> > +static void __run_posix_cpu_timers(struct task_struct *tsk)
>> > +{
>> > +	struct posix_cputimers *pct = &tsk->posix_cputimers;
>> > +
>> > +	if (!test_and_set_bit(CPUTIMERS_WORK_SCHEDULED, &pct->flags))
>> > +		task_work_add(tsk, &pct->task_work, true);
>> > +}
>> > +
>> > +static inline void posix_cpu_timers_enable_work(struct task_struct *tsk)
>> > +{
>> > +	clear_bit(CPUTIMERS_WORK_SCHEDULED, &tsk->posix_cputimers.flags);
>>	/*
>>	 * Ensure we observe everything before a failing test_and_set()
>>	 * in __run_posix_cpu_timers().
>>	 */
>>	smp_mb__after_atomic();
>> > +}
>> 
>> Such that when another timer interrupt happens while we run this, we're
>> guaranteed to either see it, or get re-queued and thus re-run the
>> function.
>
> But each thread in the process enqueues its own task work and flips its
> own flags. So if task A runs the task work and task B runs
> __run_posix_cpu_timers(), they wouldn't be ordering against the same
> flags.

If two tasks queue work independently of each other, then one of them
will find it done already, which is the same as if two tasks of the same
process execute run_posix_cpu_timers() in parallel.

I really don't want to go into the rathole of making the work or the
synchronization process wide. That's a guarantee for disaster.

Handling task work strictly per task is straightforward and simple. The
resulting contention on sighand lock in task work is unavoidable, but
that's a reasonable tradeoff vs. the complexity needed to handle task
work process wide.

Thanks,

tglx


Re: [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work

2020-07-22 Thread Frederic Weisbecker
On Fri, Jul 17, 2020 at 12:50:34AM +0200, Peter Zijlstra wrote:
> On Thu, Jul 16, 2020 at 10:19:26PM +0200, Thomas Gleixner wrote:
> > +static void __run_posix_cpu_timers(struct task_struct *tsk)
> > +{
> > +	struct posix_cputimers *pct = &tsk->posix_cputimers;
> > +
> > +	if (!test_and_set_bit(CPUTIMERS_WORK_SCHEDULED, &pct->flags))
> > +		task_work_add(tsk, &pct->task_work, true);
> > +}
> > +
> > +static inline void posix_cpu_timers_enable_work(struct task_struct *tsk)
> > +{
> > +	clear_bit(CPUTIMERS_WORK_SCHEDULED, &tsk->posix_cputimers.flags);
>	/*
>	 * Ensure we observe everything before a failing test_and_set()
>	 * in __run_posix_cpu_timers().
>	 */
>	smp_mb__after_atomic();
> > +}
> 
> Such that when another timer interrupt happens while we run this, we're
> guaranteed to either see it, or get re-queued and thus re-run the
> function.

But each thread in the process enqueues its own task work and flips its
own flags. So if task A runs the task work and task B runs
__run_posix_cpu_timers(), they wouldn't be ordering against the same
flags.


Re: [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work

2020-07-21 Thread Thomas Gleixner
Thomas Gleixner  writes:
>
> Bah, that creates a dependency on sched/core ...

Only when looking at the wrong tree :)


Re: [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work

2020-07-19 Thread Thomas Gleixner
Thomas Gleixner  writes:
> Peter Zijlstra  writes:
>> On Thu, Jul 16, 2020 at 10:19:26PM +0200, Thomas Gleixner wrote:
>>> +static void __run_posix_cpu_timers(struct task_struct *tsk)
>>> +{
>>> +	struct posix_cputimers *pct = &tsk->posix_cputimers;
>>> +
>>> +	if (!test_and_set_bit(CPUTIMERS_WORK_SCHEDULED, &pct->flags))
>>> +		task_work_add(tsk, &pct->task_work, true);
>>
>> s/true/TWA_RESUME/g
>>
>> see: e91b48162332 ("task_work: teach task_work_add() to do signal_wake_up()")
>
> Duh, yes.

Bah, that creates a dependency on sched/core ...


Re: [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work

2020-07-17 Thread Thomas Gleixner
Peter Zijlstra  writes:
> On Thu, Jul 16, 2020 at 10:19:26PM +0200, Thomas Gleixner wrote:
>> +static void __run_posix_cpu_timers(struct task_struct *tsk)
>> +{
>> +	struct posix_cputimers *pct = &tsk->posix_cputimers;
>> +
>> +	if (!test_and_set_bit(CPUTIMERS_WORK_SCHEDULED, &pct->flags))
>> +		task_work_add(tsk, &pct->task_work, true);
>
> s/true/TWA_RESUME/g
>
> see: e91b48162332 ("task_work: teach task_work_add() to do signal_wake_up()")

Duh, yes.


Re: [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work

2020-07-17 Thread Thomas Gleixner
Peter Zijlstra  writes:
> On Thu, Jul 16, 2020 at 10:19:26PM +0200, Thomas Gleixner wrote:
>
>> @@ -1096,6 +1099,12 @@ static void __run_posix_cpu_timers(struc
>> 	check_process_timers(tsk, &firing);
>> 
>> 	/*
>> +	 * Allow new work to be scheduled. The expiry cache
>> +	 * is up to date.
>> +	 */
>> +	posix_cpu_timers_enable_work(tsk);
>> +
>> +	/*
>> 	 * We must release these locks before taking any timer's lock.
>> 	 * There is a potential race with timer deletion here, as the
>> 	 * siglock now protects our private firing list.  We have set
>
> I think I would feel more comfortable if this was done at the very
> beginning of that function, possibly even with:
>
>> +static void __run_posix_cpu_timers(struct task_struct *tsk)
>> +{
>> +	struct posix_cputimers *pct = &tsk->posix_cputimers;
>> +
>> +	if (!test_and_set_bit(CPUTIMERS_WORK_SCHEDULED, &pct->flags))
>> +		task_work_add(tsk, &pct->task_work, true);
>> +}
>> +
>> +static inline void posix_cpu_timers_enable_work(struct task_struct *tsk)
>> +{
>> +	clear_bit(CPUTIMERS_WORK_SCHEDULED, &tsk->posix_cputimers.flags);
>	/*
>	 * Ensure we observe everything before a failing test_and_set()
>	 * in __run_posix_cpu_timers().
>	 */
>	smp_mb__after_atomic();
>> +}
>
> Such that when another timer interrupt happens while we run this, we're
> guaranteed to either see it, or get re-queued and thus re-run the
> function.

Makes sense.

Thanks,

tglx


Re: [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work

2020-07-17 Thread Thomas Gleixner
Oleg,

Oleg Nesterov  writes:
> Looks correct to me, but I forgot everything about posix-timers.c

that's not a problem because this is about posix-cpu-timers.c :)

> this obviously means that the expired timer won't fire until the
> task returns to user-mode but probably we don't care.

If the signal goes to the task itself it does not matter at all because
it's going to be delivered when the task goes out to user space.

If the signal goes to a supervisor process, then it will be slightly
delayed but I could not find a problem with that at all.

I'll add more reasoning to the changelog on V3.

>> +#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
>> +void posix_cpu_timers_work(struct callback_head *work);
>> +
>> +static inline void posix_cputimer_init_work(struct posix_cputimers *pct)
>> +{
>> +pct->task_work.func = posix_cpu_timers_work;
>
> init_task_work() ?

Yeah.
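
(For reference, init_task_work() in include/linux/task_work.h is just a
wrapper around that same assignment:

	static inline void
	init_task_work(struct callback_head *twork, task_work_func_t func)
	{
		twork->func = func;
	}

so this is purely about not open-coding the initialization.)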

>> +}
>> +#else
>> +static inline void posix_cputimer_init_work(struct posix_cputimers *pct) { }
>> +#endif
>> +
>>  static inline void posix_cputimers_init(struct posix_cputimers *pct)
>>  {
>>  memset(pct, 0, sizeof(*pct));
>>  pct->bases[0].nextevt = U64_MAX;
>>  pct->bases[1].nextevt = U64_MAX;
>>  pct->bases[2].nextevt = U64_MAX;
>> +posix_cputimer_init_work(pct);
>>  }
>
> And I can't resist. I know this is a common practice, please ignore, but to me
>
>   static inline void posix_cputimers_init(struct posix_cputimers *pct)
>   {
>   memset(pct, 0, sizeof(*pct));
>   pct->bases[0].nextevt = U64_MAX;
>   pct->bases[1].nextevt = U64_MAX;
>   pct->bases[2].nextevt = U64_MAX;
>   #ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
>   init_task_work(&pct->task_work, posix_cpu_timers_work);
>   #endif
>   }
>
> looks better than 2 posix_cputimer_init_work() definitions above.

Gah, I hate ifdefs in the middle of the code :)

> Note also that signal_struct->posix_cputimers.task_work is never used, perhaps
> it would be better to move this task_work into task_struct? This way we would
> not even need to change posix_cputimers_init(); we could simply initialize
> init_task.posix_task_work.

Let me look into that.

Thanks,

tglx


Re: [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work

2020-07-17 Thread Oleg Nesterov
Looks correct to me, but I forgot everything about posix-timers.c

this obviously means that the expired timer won't fire until the
task returns to user-mode but probably we don't care.

One cosmetic nit below,

On 07/16, Thomas Gleixner wrote:
>
> +#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
> +void posix_cpu_timers_work(struct callback_head *work);
> +
> +static inline void posix_cputimer_init_work(struct posix_cputimers *pct)
> +{
> + pct->task_work.func = posix_cpu_timers_work;

init_task_work() ?

> +}
> +#else
> +static inline void posix_cputimer_init_work(struct posix_cputimers *pct) { }
> +#endif
> +
>  static inline void posix_cputimers_init(struct posix_cputimers *pct)
>  {
>   memset(pct, 0, sizeof(*pct));
>   pct->bases[0].nextevt = U64_MAX;
>   pct->bases[1].nextevt = U64_MAX;
>   pct->bases[2].nextevt = U64_MAX;
> + posix_cputimer_init_work(pct);
>  }

And I can't resist. I know this is a common practice, please ignore, but to me

static inline void posix_cputimers_init(struct posix_cputimers *pct)
{
memset(pct, 0, sizeof(*pct));
pct->bases[0].nextevt = U64_MAX;
pct->bases[1].nextevt = U64_MAX;
pct->bases[2].nextevt = U64_MAX;
#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
init_task_work(&pct->task_work, posix_cpu_timers_work);
#endif
}

looks better than 2 posix_cputimer_init_work() definitions above.

Note also that signal_struct->posix_cputimers.task_work is never used, perhaps
it would be better to move this task_work into task_struct? This way we would
not even need to change posix_cputimers_init(); we could simply initialize
init_task.posix_task_work.
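
IOW, something like this (just a sketch, naming obviously up for
discussion):

	/* in struct task_struct */
	struct callback_head		posix_task_work;

	/* and __run_posix_cpu_timers() would then do */
	if (!test_and_set_bit(CPUTIMERS_WORK_SCHEDULED, &tsk->posix_cputimers.flags))
		task_work_add(tsk, &tsk->posix_task_work, true);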

Oleg.



Re: [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work

2020-07-16 Thread Peter Zijlstra
On Thu, Jul 16, 2020 at 10:19:26PM +0200, Thomas Gleixner wrote:
> +static void __run_posix_cpu_timers(struct task_struct *tsk)
> +{
> +	struct posix_cputimers *pct = &tsk->posix_cputimers;
> +
> +	if (!test_and_set_bit(CPUTIMERS_WORK_SCHEDULED, &pct->flags))
> +		task_work_add(tsk, &pct->task_work, true);

s/true/TWA_RESUME/g

see: e91b48162332 ("task_work: teach task_work_add() to do signal_wake_up()")

> +}
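
I.e., with that commit in the tree, the call would read (a sketch, modulo
the surrounding context):

	if (!test_and_set_bit(CPUTIMERS_WORK_SCHEDULED, &pct->flags))
		task_work_add(tsk, &pct->task_work, TWA_RESUME);

TWA_RESUME requests that the work runs on the next return to user mode,
which is the behaviour wanted here anyway.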


Re: [patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work

2020-07-16 Thread Peter Zijlstra
On Thu, Jul 16, 2020 at 10:19:26PM +0200, Thomas Gleixner wrote:

> @@ -1096,6 +1099,12 @@ static void __run_posix_cpu_timers(struc
> 	check_process_timers(tsk, &firing);
> 
> 	/*
> +	 * Allow new work to be scheduled. The expiry cache
> +	 * is up to date.
> +	 */
> +	posix_cpu_timers_enable_work(tsk);
> +
> +	/*
> 	 * We must release these locks before taking any timer's lock.
> 	 * There is a potential race with timer deletion here, as the
> 	 * siglock now protects our private firing list.  We have set

I think I would feel more comfortable if this was done at the very
beginning of that function, possibly even with:

> +static void __run_posix_cpu_timers(struct task_struct *tsk)
> +{
> +	struct posix_cputimers *pct = &tsk->posix_cputimers;
> +
> +	if (!test_and_set_bit(CPUTIMERS_WORK_SCHEDULED, &pct->flags))
> +		task_work_add(tsk, &pct->task_work, true);
> +}
> +
> +static inline void posix_cpu_timers_enable_work(struct task_struct *tsk)
> +{
> +	clear_bit(CPUTIMERS_WORK_SCHEDULED, &tsk->posix_cputimers.flags);
	/*
	 * Ensure we observe everything before a failing test_and_set()
	 * in __run_posix_cpu_timers().
	 */
	smp_mb__after_atomic();
> +}

Such that when another timer interrupt happens while we run this, we're
guaranteed to either see it, or get re-queued and thus re-run the
function.
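
To spell the scenario out (a sketch of the intended pairing):

	CPU0 (task work)		CPU1 (timer interrupt)
	----------------		----------------------
	clear_bit(WORK_SCHEDULED)
	smp_mb__after_atomic()		update expiry state
	re-evaluate expiry state	test_and_set_bit(WORK_SCHEDULED)

Either CPU1 finds the bit already cleared and queues the work again, or
its test_and_set_bit() fails, in which case the barrier guarantees that
CPU0's re-evaluation after the clear_bit() observes CPU1's update.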


[patch V2 3/5] posix-cpu-timers: Provide mechanisms to defer timer handling to task_work

2020-07-16 Thread Thomas Gleixner
Running posix cpu timers in hard interrupt context has a few downsides:

 - For PREEMPT_RT it cannot work as the expiry code needs to take
   sighand lock, which is a 'sleeping spinlock' in RT. The original RT
   approach of offloading the posix CPU timer handling into a high
   priority thread was clumsy and provided no real benefit in general.

 - For fine grained accounting it's just wrong to run this in the context
   of the timer interrupt, because that way process specific CPU time gets
   accounted to the timer interrupt.

 - Long running timer interrupts can be caused by a large number of
   expiring timers, which can be created and armed by unprivileged user
   space.

There is no hard requirement to expire them in interrupt context.

Provide infrastructure to schedule task work which allows splitting the
posix CPU timer code into a quick check in interrupt context and a thread
context expiry and signal delivery function. This has to be enabled by
architectures as it requires that the architecture specific KVM
implementation handles pending task work before exiting to guest mode.
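
A rough sketch of the obligation this puts on an architecture
(hypothetical; the real hook is arch specific and part of the follow-up
patches):

	/* in the arch KVM vcpu run loop, before entering guest mode */
	if (unlikely(current->task_works))
		task_work_run();

Without such a flush, an expired timer queued as task work would be
delayed for as long as the vCPU stays in guest mode.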

Signed-off-by: Thomas Gleixner 
---
 include/linux/posix-timers.h   |   17 
 kernel/time/Kconfig|5 
 kernel/time/posix-cpu-timers.c |   42 -
 3 files changed, 63 insertions(+), 1 deletion(-)

--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -112,25 +112,42 @@ struct posix_cputimer_base {
 enum {
CPUTIMERS_ACTIVE,
CPUTIMERS_EXPIRING,
+   CPUTIMERS_WORK_SCHEDULED,
 };
 
 /**
  * posix_cputimers - Container for posix CPU timer related data
  * @bases: Base container for posix CPU clocks
  * @flags: Flags for various CPUTIMERS_* states
+ * @task_work: Task work to defer timer expiry into task context
  * Used in task_struct and signal_struct
  */
 struct posix_cputimers {
struct posix_cputimer_base  bases[CPUCLOCK_MAX];
unsigned long   flags;
+#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
+   struct callback_headtask_work;
+#endif
 };
 
+#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
+void posix_cpu_timers_work(struct callback_head *work);
+
+static inline void posix_cputimer_init_work(struct posix_cputimers *pct)
+{
+   pct->task_work.func = posix_cpu_timers_work;
+}
+#else
+static inline void posix_cputimer_init_work(struct posix_cputimers *pct) { }
+#endif
+
 static inline void posix_cputimers_init(struct posix_cputimers *pct)
 {
memset(pct, 0, sizeof(*pct));
pct->bases[0].nextevt = U64_MAX;
pct->bases[1].nextevt = U64_MAX;
pct->bases[2].nextevt = U64_MAX;
+   posix_cputimer_init_work(pct);
 }
 
 void posix_cputimers_group_init(struct posix_cputimers *pct, u64 cpu_limit);
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -52,6 +52,11 @@ config GENERIC_CLOCKEVENTS_MIN_ADJUST
 config GENERIC_CMOS_UPDATE
bool
 
+# Select to handle posix CPU timers from task_work
+# and not from the timer interrupt context
+config POSIX_CPU_TIMERS_TASK_WORK
+   bool
+
 if GENERIC_CLOCKEVENTS
 menu "Timers subsystem"
 
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -14,6 +14,7 @@
 #include <linux/tick.h>
 #include <linux/workqueue.h>
 #include <linux/compat.h>
+#include <linux/task_work.h>
 #include <linux/sched/deadline.h>
 
 #include "posix-timers.h"
@@ -1075,7 +1076,9 @@ static inline bool fastpath_timer_check(
return false;
 }
 
-static void __run_posix_cpu_timers(struct task_struct *tsk)
+static inline void posix_cpu_timers_enable_work(struct task_struct *tsk);
+
+static void handle_posix_cpu_timers(struct task_struct *tsk)
 {
struct k_itimer *timer, *next;
unsigned long flags;
@@ -1096,6 +1099,12 @@ static void __run_posix_cpu_timers(struc
 	check_process_timers(tsk, &firing);
 
 	/*
+	 * Allow new work to be scheduled. The expiry cache
+	 * is up to date.
+	 */
+	posix_cpu_timers_enable_work(tsk);
+
+	/*
 	 * We must release these locks before taking any timer's lock.
 	 * There is a potential race with timer deletion here, as the
 	 * siglock now protects our private firing list.  We have set
@@ -1130,6 +1139,37 @@ static void __run_posix_cpu_timers(struc
lockdep_posixtimer_exit();
 }
 
+#ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
+
+void posix_cpu_timers_work(struct callback_head *work)
+{
+   handle_posix_cpu_timers(current);
+}
+
+static void __run_posix_cpu_timers(struct task_struct *tsk)
+{
+	struct posix_cputimers *pct = &tsk->posix_cputimers;
+
+	if (!test_and_set_bit(CPUTIMERS_WORK_SCHEDULED, &pct->flags))
+		task_work_add(tsk, &pct->task_work, true);
+}
+
+static inline void posix_cpu_timers_enable_work(struct task_struct *tsk)
+{
+	clear_bit(CPUTIMERS_WORK_SCHEDULED, &tsk->posix_cputimers.flags);
+}
+
+#else
+
+static void __run_posix_cpu_timers(struct task_struct *tsk)
+{
+   handle_posix_cpu_timers(tsk);
+}
+
+static inline void posix_cpu_timers_enable_work(struct task_struct *tsk) { }
+
+#endif /* CONFIG_POSIX_CPU_TIMERS_TASK_WORK */