Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable

2021-04-13 Thread Paul E. McKenney
On Tue, Apr 13, 2021 at 10:49:11PM +0200, Thomas Gleixner wrote:
> Paul,
> 
> On Mon, Apr 12 2021 at 16:18, Paul E. McKenney wrote:
> > On Mon, Apr 12, 2021 at 10:37:10PM +0200, Thomas Gleixner wrote:
> >> On Mon, Apr 12 2021 at 12:57, Paul E. McKenney wrote:
> >> > On Mon, Apr 12, 2021 at 08:54:03PM +0200, Thomas Gleixner wrote:
> >> >> > I will send a new series out later today, Pacific Time.
> >> >> 
> >> >> Can you do me a favour and send it standalone and not as yet another
> >> >> reply to this existing thread maze. A trivial lore link to the previous
> >> >> version gives enough context.
> >> >
> >> > Will do!
> >> >
> >> > Of course, it turns out that lockdep also doesn't like waited-on
> >> > smp_call_function_single() invocations from timer handlers,
> >> > so I am currently looking at other options for dealing with that
> >> > potential use-after-free.  I am starting to like the looks of "only set
> >> > CLOCK_SOURCE_VERIFY_PERCPU on statically allocated clocksource structures
> >> > and let KASAN enforce this restriction", but I have not quite given up
> >> > on making it more general.
> >> 
> >> The simplest point is in the thread under the clocksource_mutex which
> >> prevents anything from vanishing under your feet.
> >
> > And lockdep is -much- happier with the setup shown below, so thank
> > you again!
> 
> But it is too simple now :) ...
> 
> > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> > index f047c6cb056c..34dc38b6b923 100644
> > --- a/kernel/time/clocksource.c
> > +++ b/kernel/time/clocksource.c
> > @@ -519,6 +515,13 @@ static int __clocksource_watchdog_kthread(void)
> > unsigned long flags;
> > int select = 0;
> >  
> > +	/* Do any required per-CPU skew verification. */
> > +	list_for_each_entry(cs, &watchdog_list, wd_list) {
> > +		if ((cs->flags & (CLOCK_SOURCE_UNSTABLE | CLOCK_SOURCE_VERIFY_PERCPU)) ==
> > +		    (CLOCK_SOURCE_UNSTABLE | CLOCK_SOURCE_VERIFY_PERCPU))
> > +			clocksource_verify_percpu(cs);
> > +	}
> 
> because that list is _NOT_ protected by the clocksource_mutex as you
> noticed yourself already.
> 
> But you don't have to walk that list at all because the only interesting
> thing is the currently active clocksource, which is about to be changed
> in case the watchdog marked it unstable and cannot be changed by any
> other code concurrently because clocksource_mutex is held.
> 
> So all you need is:
> 
> 	if (curr_clocksource &&
> 	    curr_clocksource->flags & CLOCK_SOURCE_UNSTABLE &&
> 	    curr_clocksource->flags & CLOCK_SOURCE_VERIFY_PERCPU)
> 		clocksource_verify_percpu_wreckage(curr_clocksource);
> 
> Hmm?

With the addition of a clocksource=tsc boot parameter, this approach
does appear to work, thank you!  I sent out the updated series.

Thanx, Paul


Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable

2021-04-13 Thread Thomas Gleixner
Paul,

On Mon, Apr 12 2021 at 16:18, Paul E. McKenney wrote:
> On Mon, Apr 12, 2021 at 10:37:10PM +0200, Thomas Gleixner wrote:
>> On Mon, Apr 12 2021 at 12:57, Paul E. McKenney wrote:
>> > On Mon, Apr 12, 2021 at 08:54:03PM +0200, Thomas Gleixner wrote:
>> >> > I will send a new series out later today, Pacific Time.
>> >> 
>> >> Can you do me a favour and send it standalone and not as yet another
>> >> reply to this existing thread maze. A trivial lore link to the previous
>> >> version gives enough context.
>> >
>> > Will do!
>> >
>> > Of course, it turns out that lockdep also doesn't like waited-on
>> > smp_call_function_single() invocations from timer handlers,
>> > so I am currently looking at other options for dealing with that
>> > potential use-after-free.  I am starting to like the looks of "only set
>> > CLOCK_SOURCE_VERIFY_PERCPU on statically allocated clocksource structures
>> > and let KASAN enforce this restriction", but I have not quite given up
>> > on making it more general.
>> 
>> The simplest point is in the thread under the clocksource_mutex which
>> prevents anything from vanishing under your feet.
>
> And lockdep is -much- happier with the setup shown below, so thank
> you again!

But it is too simple now :) ...

> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index f047c6cb056c..34dc38b6b923 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -519,6 +515,13 @@ static int __clocksource_watchdog_kthread(void)
>   unsigned long flags;
>   int select = 0;
>  
> +	/* Do any required per-CPU skew verification. */
> +	list_for_each_entry(cs, &watchdog_list, wd_list) {
> +		if ((cs->flags & (CLOCK_SOURCE_UNSTABLE | CLOCK_SOURCE_VERIFY_PERCPU)) ==
> +		    (CLOCK_SOURCE_UNSTABLE | CLOCK_SOURCE_VERIFY_PERCPU))
> +			clocksource_verify_percpu(cs);
> +	}

because that list is _NOT_ protected by the clocksource_mutex as you
noticed yourself already.

But you don't have to walk that list at all because the only interesting
thing is the currently active clocksource, which is about to be changed
in case the watchdog marked it unstable and cannot be changed by any
other code concurrently because clocksource_mutex is held.

So all you need is:

	if (curr_clocksource &&
	    curr_clocksource->flags & CLOCK_SOURCE_UNSTABLE &&
	    curr_clocksource->flags & CLOCK_SOURCE_VERIFY_PERCPU)
		clocksource_verify_percpu_wreckage(curr_clocksource);

Hmm?

Thanks,

tglx




Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable

2021-04-12 Thread Paul E. McKenney
On Mon, Apr 12, 2021 at 10:37:10PM +0200, Thomas Gleixner wrote:
> On Mon, Apr 12 2021 at 12:57, Paul E. McKenney wrote:
> > On Mon, Apr 12, 2021 at 08:54:03PM +0200, Thomas Gleixner wrote:
> >> > I will send a new series out later today, Pacific Time.
> >> 
> >> Can you do me a favour and send it standalone and not as yet another
> >> reply to this existing thread maze. A trivial lore link to the previous
> >> version gives enough context.
> >
> > Will do!
> >
> > Of course, it turns out that lockdep also doesn't like waited-on
> > smp_call_function_single() invocations from timer handlers,
> > so I am currently looking at other options for dealing with that
> > potential use-after-free.  I am starting to like the looks of "only set
> > CLOCK_SOURCE_VERIFY_PERCPU on statically allocated clocksource structures
> > and let KASAN enforce this restriction", but I have not quite given up
> > on making it more general.
> 
> The simplest point is in the thread under the clocksource_mutex which
> prevents anything from vanishing under your feet.

And lockdep is -much- happier with the setup shown below, so thank
you again!

Thanx, Paul



diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index f047c6cb056c..34dc38b6b923 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -519,6 +515,13 @@ static int __clocksource_watchdog_kthread(void)
 	unsigned long flags;
 	int select = 0;
 
+	/* Do any required per-CPU skew verification. */
+	list_for_each_entry(cs, &watchdog_list, wd_list) {
+		if ((cs->flags & (CLOCK_SOURCE_UNSTABLE | CLOCK_SOURCE_VERIFY_PERCPU)) ==
+		    (CLOCK_SOURCE_UNSTABLE | CLOCK_SOURCE_VERIFY_PERCPU))
+			clocksource_verify_percpu(cs);
+	}
+
 	spin_lock_irqsave(&watchdog_lock, flags);
 	list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list) {
 		if (cs->flags & CLOCK_SOURCE_UNSTABLE) {


Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable

2021-04-12 Thread Thomas Gleixner
On Mon, Apr 12 2021 at 12:57, Paul E. McKenney wrote:
> On Mon, Apr 12, 2021 at 08:54:03PM +0200, Thomas Gleixner wrote:
>> > I will send a new series out later today, Pacific Time.
>> 
>> Can you do me a favour and send it standalone and not as yet another
>> reply to this existing thread maze. A trivial lore link to the previous
>> version gives enough context.
>
> Will do!
>
> Of course, it turns out that lockdep also doesn't like waited-on
> smp_call_function_single() invocations from timer handlers,
> so I am currently looking at other options for dealing with that
> potential use-after-free.  I am starting to like the looks of "only set
> CLOCK_SOURCE_VERIFY_PERCPU on statically allocated clocksource structures
> and let KASAN enforce this restriction", but I have not quite given up
> on making it more general.

The simplest point is in the thread under the clocksource_mutex which
prevents anything from vanishing under your feet.

Thanks,

tglx


Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable

2021-04-12 Thread Paul E. McKenney
On Mon, Apr 12, 2021 at 08:54:03PM +0200, Thomas Gleixner wrote:
> Paul,
> 
> On Mon, Apr 12 2021 at 11:20, Paul E. McKenney wrote:
> > On Mon, Apr 12, 2021 at 03:08:16PM +0200, Thomas Gleixner wrote:
> >> The reason for irqsave is again historical AFAICT and nobody bothered to
> >> clean it up. spin_lock_bh() should be sufficient to serialize against
> >> the watchdog timer, though I haven't looked at all possible scenarios.
> >
> > Though if BH is disabled, there is not so much advantage to
> > invoking it from __clocksource_watchdog_kthread().  Might as
> > well just invoke it directly from clocksource_watchdog().
> >
> >> > 2.   Invoke clocksource_verify_percpu() from its original
> >> >  location in clocksource_watchdog(), just before the call to
> >> >  __clocksource_unstable().  This relies on the fact that
> >> >  clocksource_watchdog() acquires watchdog_lock without
> >> >  disabling interrupts.
> >> 
> >> That should be fine, but this might cause the softirq to 'run' for a
> >> very long time which is not pretty either.
> >> 
> >> Aside of that, do we really need to check _all_ online CPUs? What you
> >> are trying to figure out is whether the wreckage is CPU local or global,
> >> right?
> >> 
> >> Wouldn't a shirt-sleeve approach of just querying _one_ CPU be good
> >> enough? Either the other CPU has the same wreckage, then it's global or
> >> it hasn't which points to a per CPU local issue.
> >> 
> >> Sure it does not catch the case where a subset (>1) of all CPUs is
> >> affected, but I'm not seeing how that really buys us anything.
> >
> > Good point!  My thought is to randomly pick eight CPUs to keep the
> > duration reasonable while having a good chance of hitting "interesting"
> > CPU choices in multicore and multisocket systems.
> >
> > However, if a hard-to-reproduce problem occurred, it would be good to take
> > the hit and scan all the CPUs.  Additionally, there are some workloads
> > for which the switch from TSC to HPET is fatal anyway due to increased
> > overhead.  For these workloads, the full CPU scan is no additional pain.
> >
> > So I am thinking in terms of a default that probes eight randomly selected
> > CPUs without worrying about duplicates (as in there would be some chance
> > that fewer CPUs would actually be probed), but with a boot-time flag
> > that does all CPUs.  I would add the (default) random selection as a
> > separate patch.
> 
> You can't do without making it complex, right? Keep it simple is not an
> option for an RCU hacker, it seems :)

But it was simple!  It just hit all the CPUs.

However, you (quite rightly) pointed out that this simple approach had
a few shortcomings.  ;-)

> > I will send a new series out later today, Pacific Time.
> 
> Can you do me a favour and send it standalone and not as yet another
> reply to this existing thread maze. A trivial lore link to the previous
> version gives enough context.

Will do!

Of course, it turns out that lockdep also doesn't like waited-on
smp_call_function_single() invocations from timer handlers,
so I am currently looking at other options for dealing with that
potential use-after-free.  I am starting to like the looks of "only set
CLOCK_SOURCE_VERIFY_PERCPU on statically allocated clocksource structures
and let KASAN enforce this restriction", but I have not quite given up
on making it more general.

Thanx, Paul


Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable

2021-04-12 Thread Thomas Gleixner
Paul,

On Mon, Apr 12 2021 at 11:20, Paul E. McKenney wrote:
> On Mon, Apr 12, 2021 at 03:08:16PM +0200, Thomas Gleixner wrote:
>> The reason for irqsave is again historical AFAICT and nobody bothered to
>> clean it up. spin_lock_bh() should be sufficient to serialize against
>> the watchdog timer, though I haven't looked at all possible scenarios.
>
> Though if BH is disabled, there is not so much advantage to
> invoking it from __clocksource_watchdog_kthread().  Might as
> well just invoke it directly from clocksource_watchdog().
>
>> > 2.	Invoke clocksource_verify_percpu() from its original
>> >	location in clocksource_watchdog(), just before the call to
>> >	__clocksource_unstable().  This relies on the fact that
>> >	clocksource_watchdog() acquires watchdog_lock without
>> >	disabling interrupts.
>> 
>> That should be fine, but this might cause the softirq to 'run' for a
>> very long time which is not pretty either.
>> 
>> Aside of that, do we really need to check _all_ online CPUs? What you
>> are trying to figure out is whether the wreckage is CPU local or global,
>> right?
>> 
>> Wouldn't a shirt-sleeve approach of just querying _one_ CPU be good
>> enough? Either the other CPU has the same wreckage, then it's global or
>> it hasn't which points to a per CPU local issue.
>> 
>> Sure it does not catch the case where a subset (>1) of all CPUs is
>> affected, but I'm not seeing how that really buys us anything.
>
> Good point!  My thought is to randomly pick eight CPUs to keep the
> duration reasonable while having a good chance of hitting "interesting"
> CPU choices in multicore and multisocket systems.
>
> However, if a hard-to-reproduce problem occurred, it would be good to take
> the hit and scan all the CPUs.  Additionally, there are some workloads
> for which the switch from TSC to HPET is fatal anyway due to increased
> overhead.  For these workloads, the full CPU scan is no additional pain.
>
> So I am thinking in terms of a default that probes eight randomly selected
> CPUs without worrying about duplicates (as in there would be some chance
> that fewer CPUs would actually be probed), but with a boot-time flag
> that does all CPUs.  I would add the (default) random selection as a
> separate patch.

You can't do without making it complex, right? Keep it simple is not an
option for an RCU hacker, it seems :)

> I will send a new series out later today, Pacific Time.

Can you do me a favour and send it standalone and not as yet another
reply to this existing thread maze. A trivial lore link to the previous
version gives enough context.

Thanks,

tglx
 


Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable

2021-04-12 Thread Paul E. McKenney
On Mon, Apr 12, 2021 at 03:08:16PM +0200, Thomas Gleixner wrote:
> On Sun, Apr 11 2021 at 21:21, Paul E. McKenney wrote:
> > On Sun, Apr 11, 2021 at 09:46:12AM -0700, Paul E. McKenney wrote:
> >> So what I need to do is inline clocksource_verify_percpu_wq()
> >> into clocksource_verify_percpu() and then move the call to
> >> clocksource_verify_percpu() to __clocksource_watchdog_kthread(), right
> >> before the existing call to list_del_init().  Will do!
> >
> > Except that this triggers the WARN_ON_ONCE() in smp_call_function_single()
> > due to interrupts being disabled across that list_del_init().
> >
> > Possibilities include:
> >
> > 1.	Figure out why interrupts must be disabled only sometimes while
> >	holding watchdog_lock, in the hope that they need not be across
> >	the entire critical section for __clocksource_watchdog_kthread().
> >	As in:
> > 
> >		local_irq_restore(flags);
> >		clocksource_verify_percpu(cs);
> >		local_irq_save(flags);
> > 
> >	Trying this first with lockdep enabled.  Might be spectacular.
> 
> Yes, it's a possible deadlock against the watchdog timer firing ...

And lockdep most emphatically agrees with you.  ;-)

> The reason for irqsave is again historical AFAICT and nobody bothered to
> clean it up. spin_lock_bh() should be sufficient to serialize against
> the watchdog timer, though I haven't looked at all possible scenarios.

Though if BH is disabled, there is not so much advantage to
invoking it from __clocksource_watchdog_kthread().  Might as
well just invoke it directly from clocksource_watchdog().
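Along the lines of this untested sketch, with the call placed just before
the clocksource is marked unstable:

	/* In clocksource_watchdog(), roughly: */
	if (cs->flags & CLOCK_SOURCE_VERIFY_PERCPU)
		clocksource_verify_percpu(cs);
	__clocksource_unstable(cs);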

> > 2.  Invoke clocksource_verify_percpu() from its original
> > location in clocksource_watchdog(), just before the call to
> > __clocksource_unstable().  This relies on the fact that
> > clocksource_watchdog() acquires watchdog_lock without
> > disabling interrupts.
> 
> That should be fine, but this might cause the softirq to 'run' for a
> very long time which is not pretty either.
> 
> Aside of that, do we really need to check _all_ online CPUs? What you
> are trying to figure out is whether the wreckage is CPU local or global,
> right?
> 
> Wouldn't a shirt-sleeve approach of just querying _one_ CPU be good
> enough? Either the other CPU has the same wreckage, then it's global or
> it hasn't which points to a per CPU local issue.
> 
> Sure it does not catch the case where a subset (>1) of all CPUs is
> affected, but I'm not seeing how that really buys us anything.

Good point!  My thought is to randomly pick eight CPUs to keep the
duration reasonable while having a good chance of hitting "interesting"
CPU choices in multicore and multisocket systems.

However, if a hard-to-reproduce problem occurred, it would be good to take
the hit and scan all the CPUs.  Additionally, there are some workloads
for which the switch from TSC to HPET is fatal anyway due to increased
overhead.  For these workloads, the full CPU scan is no additional pain.

So I am thinking in terms of a default that probes eight randomly selected
CPUs without worrying about duplicates (as in there would be some chance
that fewer CPUs would actually be probed), but with a boot-time flag
that does all CPUs.  I would add the (default) random selection as a
separate patch.
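
Concretely, something like this untested sketch, in which the cpus_chosen
mask and the verify_n_cpus knob are new (and purely hypothetical) names:

	static cpumask_t cpus_chosen;
	static int verify_n_cpus = 8;	/* Hypothetical boot parameter; -1 selects all CPUs. */

	static void clocksource_verify_choose_cpus(void)
	{
		int cpu, i;

		if (verify_n_cpus < 0) {
			cpumask_copy(&cpus_chosen, cpu_online_mask);
			return;
		}
		cpumask_clear(&cpus_chosen);
		for (i = 0; i < verify_n_cpus; i++) {
			/*
			 * Duplicates are tolerated, so fewer than
			 * verify_n_cpus CPUs might actually be probed.
			 */
			cpu = prandom_u32() % nr_cpu_ids;
			cpu = cpumask_next(cpu - 1, cpu_online_mask);
			if (cpu >= nr_cpu_ids)
				cpu = cpumask_first(cpu_online_mask);
			cpumask_set_cpu(cpu, &cpus_chosen);
		}
	}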

I will send a new series out later today, Pacific Time.

Thanx, Paul


Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable

2021-04-12 Thread Thomas Gleixner
On Sun, Apr 11 2021 at 21:21, Paul E. McKenney wrote:
> On Sun, Apr 11, 2021 at 09:46:12AM -0700, Paul E. McKenney wrote:
>> So what I need to do is inline clocksource_verify_percpu_wq()
>> into clocksource_verify_percpu() and then move the call to
>> clocksource_verify_percpu() to __clocksource_watchdog_kthread(), right
>> before the existing call to list_del_init().  Will do!
>
> Except that this triggers the WARN_ON_ONCE() in smp_call_function_single()
> due to interrupts being disabled across that list_del_init().
>
> Possibilities include:
>
> 1.	Figure out why interrupts must be disabled only sometimes while
>   holding watchdog_lock, in the hope that they need not be across
>   the entire critical section for __clocksource_watchdog_kthread().
>   As in:
>
>   local_irq_restore(flags);
>   clocksource_verify_percpu(cs);
>   local_irq_save(flags);
>
>   Trying this first with lockdep enabled.  Might be spectacular.

Yes, it's a possible deadlock against the watchdog timer firing ...

The reason for irqsave is again historical AFAICT and nobody bothered to
clean it up. spin_lock_bh() should be sufficient to serialize against
the watchdog timer, though I haven't looked at all possible scenarios.

> 2.	Invoke clocksource_verify_percpu() from its original
>   location in clocksource_watchdog(), just before the call to
>   __clocksource_unstable().  This relies on the fact that
>   clocksource_watchdog() acquires watchdog_lock without
>   disabling interrupts.

That should be fine, but this might cause the softirq to 'run' for a
very long time which is not pretty either.

Aside of that, do we really need to check _all_ online CPUs? What you
are trying to figure out is whether the wreckage is CPU local or global,
right?

Wouldn't a shirt-sleeve approach of just querying _one_ CPU be good
enough? Either the other CPU has the same wreckage, then it's global or
it hasn't which points to a per CPU local issue.

Sure it does not catch the case where a subset (>1) of all CPUs is
affected, but I'm not seeing how that really buys us anything.
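
Something like this untested fragment ought to be sufficient:

	int cpu = cpumask_any_but(cpu_online_mask, smp_processor_id());

	if (cpu < nr_cpu_ids)
		smp_call_function_single(cpu, clocksource_verify_one_cpu, cs, 1);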

Thanks,

tglx


Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable

2021-04-11 Thread Paul E. McKenney
On Sun, Apr 11, 2021 at 09:46:12AM -0700, Paul E. McKenney wrote:
> On Sun, Apr 11, 2021 at 12:33:44PM +0200, Thomas Gleixner wrote:
> > On Sat, Apr 10 2021 at 17:20, Paul E. McKenney wrote:
> > > On Sat, Apr 10, 2021 at 11:00:25AM +0200, Thomas Gleixner wrote:
> > >> > +	if (WARN_ON_ONCE(!cs))
> > >> > +		return;
> > >> > +	pr_warn("Checking clocksource %s synchronization from CPU %d.\n",
> > >> > +		cs->name, smp_processor_id());
> > >> > +	cpumask_clear(&cpus_ahead);
> > >> > +	cpumask_clear(&cpus_behind);
> > >> > +	csnow_begin = cs->read(cs);
> > >> 
> > >> So this is invoked via work and the actual clocksource change is done
> > >> via work too. Once the clocksource is no longer actively used for
> > >> timekeeping it can go away. What's guaranteeing that this runs prior to
> > >> the clocksource change and 'cs' is valid throughout this function?
> > >
> > > From what I can see, cs->read() doesn't care whether or not the
> > > clocksource has been marked unstable.  So it should be OK to call
> > > cs->read() before, during, or after the call to __clocksource_unstable().
> > >
> > > Also, this is only done on clocksources marked CLOCK_SOURCE_VERIFY_PERCPU,
> > > so any clocksource that did not like cs->read() being invoked during
> > > or after the call to __clocksource_unstable() should leave off the
> > > CLOCK_SOURCE_VERIFY_PERCPU bit.
> > >
> > > Or did I take a wrong turn somewhere in the pointers to functions?
> > 
> > Right. cs->read() does not care, but what guarantees that cs is valid
> > and not freed yet? It's not an issue with TSC and KVMCLOCK, but
> > conceptually the following is possible:
> > 
> > watchdog()
> >   queue_work(synccheck);
> >   queue_work(clocksource_change);
> > 
> > work:
> >   synccheck()                clocksource_change()
> >     preemption                 ...
> >                                some_other_code():
> >                                  unregister_clocksource(cs)
> >                                  free(cs)
> >     cs->read()  <- UAF
> 
> Got it, with the ingenic_tcu_init() function being a case in point.
> It invokes clocksource_unregister() shortly followed by clk_put(), which,
> if I found the correct clk_put(), can kfree() it.
> 
> Thank you!
> 
> > >> > +	queue_work(system_highpri_wq, &clocksource_verify_work);
> > >> 
> > >> This does not guarantee anything. So why does this need an extra work
> > >> function which is scheduled separately?
> > >
> > > Because I was concerned about doing smp_call_function() while holding
> > > watchdog_lock, which is also acquired elsewhere using spin_lock_irqsave().
> > > And it still looks like on x86 that spin_lock_irqsave() spins with irqs
> > > disabled, which could result in deadlock.  The smp_call_function_single()
> > > would wait for the target CPU to enable interrupts, which would not
> > > happen until after the smp_call_function_single() returned due to its
> > > caller holding watchdog_lock.
> > >
> > > Or is there something that I am missing that prevents this deadlock
> > > from occurring?
> > 
> > The unstable mechanism is:
> > 
> > watchdog()
> >   __clocksource_unstable()
> >     schedule_work(&watchdog_work);
> > 
> > watchdog_work()
> >   kthread_run(clocksource_watchdog_thread);
> > 
> > cs_watchdog_thread()
> >   mutex_lock(&clocksource_mutex);
> >   if (__clocksource_watchdog_kthread())
> >     clocksource_select();
> >   mutex_unlock(&clocksource_mutex);
> > 
> > So what prevents you from doing that right in watchdog_work() or even in
> > cs_watchdog_thread() properly ordered against the actual clocksource
> > switch?
> > 
> > Hmm?
> 
> My own confusion, apparently.  :-/
> 
> So what I need to do is inline clocksource_verify_percpu_wq()
> into clocksource_verify_percpu() and then move the call to
> clocksource_verify_percpu() to __clocksource_watchdog_kthread(), right
> before the existing call to list_del_init().  Will do!

Except that this triggers the WARN_ON_ONCE() in smp_call_function_single()
due to interrupts being disabled across that list_del_init().

Possibilities include:

1.	Figure out why interrupts must be disabled only sometimes while
	holding watchdog_lock, in the hope that they need not be across
	the entire critical section for __clocksource_watchdog_kthread().
	As in:

		local_irq_restore(flags);
		clocksource_verify_percpu(cs);
		local_irq_save(flags);

	Trying this first with lockdep enabled.  Might be spectacular.

2.	Invoke clocksource_verify_percpu() from its original
	location in clocksource_watchdog(), just before the call to
	__clocksource_unstable().  This relies on the fact that
	clocksource_watchdog() acquires watchdog_lock without
	disabling interrupts.

3.	Restrict CLOCK_SOURCE_VERIFY_PERCPU to statically allocated
	clocksource structures and let KASAN enforce this restriction.

Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable

2021-04-11 Thread Paul E. McKenney
On Sun, Apr 11, 2021 at 12:33:44PM +0200, Thomas Gleixner wrote:
> On Sat, Apr 10 2021 at 17:20, Paul E. McKenney wrote:
> > On Sat, Apr 10, 2021 at 11:00:25AM +0200, Thomas Gleixner wrote:
> >> > +	if (WARN_ON_ONCE(!cs))
> >> > +		return;
> >> > +	pr_warn("Checking clocksource %s synchronization from CPU %d.\n",
> >> > +		cs->name, smp_processor_id());
> >> > +	cpumask_clear(&cpus_ahead);
> >> > +	cpumask_clear(&cpus_behind);
> >> > +	csnow_begin = cs->read(cs);
> >> 
> >> So this is invoked via work and the actual clocksource change is done
> >> via work too. Once the clocksource is no longer actively used for
> >> timekeeping it can go away. What's guaranteeing that this runs prior to
> >> the clocksource change and 'cs' is valid throughout this function?
> >
> > From what I can see, cs->read() doesn't care whether or not the
> > clocksource has been marked unstable.  So it should be OK to call
> > cs->read() before, during, or after the call to __clocksource_unstable().
> >
> > Also, this is only done on clocksources marked CLOCK_SOURCE_VERIFY_PERCPU,
> > so any clocksource that did not like cs->read() being invoked during
> > or after the call to __clocksource_unstable() should leave off the
> > CLOCK_SOURCE_VERIFY_PERCPU bit.
> >
> > Or did I take a wrong turn somewhere in the pointers to functions?
> 
> Right. cs->read() does not care, but what guarantees that cs is valid
> and not freed yet? It's not an issue with TSC and KVMCLOCK, but
> conceptually the following is possible:
> 
> watchdog()
>   queue_work(synccheck);
>   queue_work(clocksource_change);
> 
> work:
>   synccheck()                clocksource_change()
>     preemption                 ...
>                                some_other_code():
>                                  unregister_clocksource(cs)
>                                  free(cs)
>     cs->read()  <- UAF

Got it, with the ingenic_tcu_init() function being a case in point.
It invokes clocksource_unregister() shortly followed by clk_put(), which,
if I found the correct clk_put(), can kfree() it.

Thank you!

> >> > +	queue_work(system_highpri_wq, &clocksource_verify_work);
> >> 
> >> This does not guarantee anything. So why does this need an extra work
> >> function which is scheduled separately?
> >
> > Because I was concerned about doing smp_call_function() while holding
> > watchdog_lock, which is also acquired elsewhere using spin_lock_irqsave().
> > And it still looks like on x86 that spin_lock_irqsave() spins with irqs
> > disabled, which could result in deadlock.  The smp_call_function_single()
> > would wait for the target CPU to enable interrupts, which would not
> > happen until after the smp_call_function_single() returned due to its
> > caller holding watchdog_lock.
> >
> > Or is there something that I am missing that prevents this deadlock
> > from occurring?
> 
> The unstable mechanism is:
> 
> watchdog()
>   __clocksource_unstable()
>     schedule_work(&watchdog_work);
> 
> watchdog_work()
>   kthread_run(clocksource_watchdog_thread);
> 
> cs_watchdog_thread()
>   mutex_lock(&clocksource_mutex);
>   if (__clocksource_watchdog_kthread())
>     clocksource_select();
>   mutex_unlock(&clocksource_mutex);
> 
> So what prevents you from doing that right in watchdog_work() or even in
> cs_watchdog_thread() properly ordered against the actual clocksource
> switch?
> 
> Hmm?

My own confusion, apparently.  :-/

So what I need to do is inline clocksource_verify_percpu_wq()
into clocksource_verify_percpu() and then move the call to
clocksource_verify_percpu() to __clocksource_watchdog_kthread(), right
before the existing call to list_del_init().  Will do!
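
That is, something like this untested placement:

	/* In __clocksource_watchdog_kthread(): */
	list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list) {
		if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
			if (cs->flags & CLOCK_SOURCE_VERIFY_PERCPU)
				clocksource_verify_percpu(cs);
			list_del_init(&cs->wd_list);
			/* ... existing unstable handling ... */
		}
	}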

Thanx, Paul


Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable

2021-04-11 Thread Thomas Gleixner
On Sat, Apr 10 2021 at 17:20, Paul E. McKenney wrote:
> On Sat, Apr 10, 2021 at 11:00:25AM +0200, Thomas Gleixner wrote:
>> > +	if (WARN_ON_ONCE(!cs))
>> > +		return;
>> > +	pr_warn("Checking clocksource %s synchronization from CPU %d.\n",
>> > +		cs->name, smp_processor_id());
>> > +	cpumask_clear(&cpus_ahead);
>> > +	cpumask_clear(&cpus_behind);
>> > +	csnow_begin = cs->read(cs);
>> 
>> So this is invoked via work and the actual clocksource change is done
>> via work too. Once the clocksource is no longer actively used for
>> timekeeping it can go away. What's guaranteeing that this runs prior to
>> the clocksource change and 'cs' is valid throughout this function?
>
> From what I can see, cs->read() doesn't care whether or not the
> clocksource has been marked unstable.  So it should be OK to call
> cs->read() before, during, or after the call to __clocksource_unstable().
>
> Also, this is only done on clocksources marked CLOCK_SOURCE_VERIFY_PERCPU,
> so any clocksource that did not like cs->read() being invoked during
> or after the call to __clocksource_unstable() should leave off the
> CLOCK_SOURCE_VERIFY_PERCPU bit.
>
> Or did I take a wrong turn somewhere in the pointers to functions?

Right. cs->read() does not care, but what guarantees that cs is valid
and not freed yet? It's not an issue with TSC and KVMCLOCK, but
conceptually the following is possible:

watchdog()
  queue_work(synccheck);
  queue_work(clocksource_change);

work:
  synccheck()                clocksource_change()
    preemption                 ...
                               some_other_code():
                                 unregister_clocksource(cs)
                                 free(cs)
    cs->read()  <- UAF

>> > +	queue_work(system_highpri_wq, &clocksource_verify_work);
>> 
>> This does not guarantee anything. So why does this need an extra work
>> function which is scheduled separately?
>
> Because I was concerned about doing smp_call_function() while holding
> watchdog_lock, which is also acquired elsewhere using spin_lock_irqsave().
> And it still looks like on x86 that spin_lock_irqsave() spins with irqs
> disabled, which could result in deadlock.  The smp_call_function_single()
> would wait for the target CPU to enable interrupts, which would not
> happen until after the smp_call_function_single() returned due to its
> caller holding watchdog_lock.
>
> Or is there something that I am missing that prevents this deadlock
> from occurring?

The unstable mechanism is:

watchdog()
  __clocksource_unstable()
    schedule_work(&watchdog_work);

watchdog_work()
  kthread_run(clocksource_watchdog_thread);

cs_watchdog_thread()
  mutex_lock(&clocksource_mutex);
  if (__clocksource_watchdog_kthread())
    clocksource_select();
  mutex_unlock(&clocksource_mutex);

So what prevents you from doing that right in watchdog_work() or even in
cs_watchdog_thread() properly ordered against the actual clocksource
switch?

Hmm?

Thanks,

tglx


Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable

2021-04-10 Thread Paul E. McKenney
On Sat, Apr 10, 2021 at 11:00:25AM +0200, Thomas Gleixner wrote:
> On Fri, Apr 02 2021 at 15:49, paulmck wrote:
> >
> > +static void clocksource_verify_percpu_wq(struct work_struct *unused)
> > +{
> > +   int cpu;
> > +   struct clocksource *cs;
> > +   int64_t cs_nsec;
> > +   u64 csnow_begin;
> > +   u64 csnow_end;
> > +   u64 delta;
> 
> Please use reverse fir tree ordering and stick variables of the same
> type together:
> 
> 	u64 csnow_begin, csnow_end, delta;
> 	struct clocksource *cs;
> 	s64 cs_nsec;
> 	int cpu;

Will do.

> > +
> > +	cs = smp_load_acquire(&clocksource_verify_work_cs); // pairs with release
> 
> Please don't use tail comments. They are a horrible distraction.

I will remove it.

> > +	if (WARN_ON_ONCE(!cs))
> > +		return;
> > +	pr_warn("Checking clocksource %s synchronization from CPU %d.\n",
> > +		cs->name, smp_processor_id());
> > +	cpumask_clear(&cpus_ahead);
> > +	cpumask_clear(&cpus_behind);
> > +	csnow_begin = cs->read(cs);
> 
> So this is invoked via work and the actual clocksource change is done
> via work too. Once the clocksource is no longer actively used for
> timekeeping it can go away. What's guaranteeing that this runs prior to
> the clocksource change and 'cs' is valid throughout this function?

From what I can see, cs->read() doesn't care whether or not the
clocksource has been marked unstable.  So it should be OK to call
cs->read() before, during, or after the call to __clocksource_unstable().

Also, this is only done on clocksources marked CLOCK_SOURCE_VERIFY_PERCPU,
so any clocksource that did not like cs->read() being invoked during
or after the call to __clocksource_unstable() should leave off the
CLOCK_SOURCE_VERIFY_PERCPU bit.

Or did I take a wrong turn somewhere in the pointers to functions?

> > +	queue_work(system_highpri_wq, &clocksource_verify_work);
> 
> This does not guarantee anything. So why does this need an extra work
> function which is scheduled separately?

Because I was concerned about doing smp_call_function() while holding
watchdog_lock, which is also acquired elsewhere using spin_lock_irqsave().
And it still looks like on x86 that spin_lock_irqsave() spins with irqs
disabled, which could result in deadlock.  The smp_call_function_single()
would wait for the target CPU to enable interrupts, which would not
happen until after the smp_call_function_single() returned due to its
caller holding watchdog_lock.

Or is there something that I am missing that prevents this deadlock
from occurring?
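
The scenario I am worried about looks roughly like this:

CPU 0                                      CPU 1
spin_lock_irqsave(&watchdog_lock, flags);
                                           spin_lock_irqsave(&watchdog_lock, flags);
                                             /* spins with irqs disabled */
smp_call_function_single(..., 1);
  /* waits for CPU 1's IPI handler, but
     CPU 1 cannot take the IPI with irqs
     disabled, so both CPUs wait forever */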

Thanx, Paul


Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable

2021-04-10 Thread Thomas Gleixner
On Fri, Apr 02 2021 at 15:49, paulmck wrote:
>
> +static void clocksource_verify_percpu_wq(struct work_struct *unused)
> +{
> + int cpu;
> + struct clocksource *cs;
> + int64_t cs_nsec;
> + u64 csnow_begin;
> + u64 csnow_end;
> + u64 delta;

Please use reverse fir tree ordering and stick variables of the same
type together:

	u64 csnow_begin, csnow_end, delta;
	struct clocksource *cs;
	s64 cs_nsec;
	int cpu;
> +
> +	cs = smp_load_acquire(&clocksource_verify_work_cs); // pairs with release

Please don't use tail comments. They are a horrible distraction.

> +	if (WARN_ON_ONCE(!cs))
> +		return;
> +	pr_warn("Checking clocksource %s synchronization from CPU %d.\n",
> +		cs->name, smp_processor_id());
> +	cpumask_clear(&cpus_ahead);
> +	cpumask_clear(&cpus_behind);
> +	csnow_begin = cs->read(cs);

So this is invoked via work and the actual clocksource change is done
via work too. Once the clocksource is no longer actively used for
timekeeping it can go away. What's guaranteeing that this runs prior to
the clocksource change and 'cs' is valid throughout this function?

> +	queue_work(system_highpri_wq, &clocksource_verify_work);

This does not guarantee anything. So why does this need an extra work
function which is scheduled separately?

Thanks,

tglx


[PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable

2021-04-02 Thread paulmck
From: "Paul E. McKenney" 

Some sorts of per-CPU clock sources have a history of going out of
synchronization with each other.  However, this problem has purportedly
been solved in the past ten years.  Except that it is all too possible
that the problem has instead simply been made less likely, which might
mean that some of the occasional "Marking clocksource 'tsc' as unstable"
messages might be due to desynchronization.  How would anyone know?

This commit therefore adds CPU-to-CPU synchronization checking
for newly unstable clocksources that are marked with the new
CLOCK_SOURCE_VERIFY_PERCPU flag.  Lists of desynchronized CPUs are
printed, with the caveat that if it is the reporting CPU that is itself
desynchronized, it will appear that all the other clocks are wrong.
Just like in real life.

Cc: John Stultz 
Cc: Thomas Gleixner 
Cc: Stephen Boyd 
Cc: Jonathan Corbet 
Cc: Mark Rutland 
Cc: Marc Zyngier 
Cc: Andi Kleen 
Reported-by: Chris Mason 
[ paulmck: Add "static" to clocksource_verify_one_cpu() per kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney 
---
 arch/x86/kernel/kvmclock.c  |  2 +-
 arch/x86/kernel/tsc.c   |  3 +-
 include/linux/clocksource.h |  2 +-
 kernel/time/clocksource.c   | 73 +
 4 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 1fc0962..97eeaf1 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -169,7 +169,7 @@ struct clocksource kvm_clock = {
 	.read	= kvm_clock_get_cycles,
 	.rating	= 400,
 	.mask	= CLOCKSOURCE_MASK(64),
-	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
+	.flags	= CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_VERIFY_PERCPU,
 	.enable	= kvm_cs_enable,
 };
 EXPORT_SYMBOL_GPL(kvm_clock);
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index f70dffc..5628917 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1151,7 +1151,8 @@ static struct clocksource clocksource_tsc = {
 	.mask			= CLOCKSOURCE_MASK(64),
 	.flags			= CLOCK_SOURCE_IS_CONTINUOUS |
 				  CLOCK_SOURCE_VALID_FOR_HRES |
-				  CLOCK_SOURCE_MUST_VERIFY,
+				  CLOCK_SOURCE_MUST_VERIFY |
+				  CLOCK_SOURCE_VERIFY_PERCPU,
 	.vdso_clock_mode	= VDSO_CLOCKMODE_TSC,
 	.enable			= tsc_cs_enable,
 	.resume			= tsc_resume,
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 86d143d..83a3ebf 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -131,7 +131,7 @@ struct clocksource {
 #define CLOCK_SOURCE_UNSTABLE			0x40
 #define CLOCK_SOURCE_SUSPEND_NONSTOP		0x80
 #define CLOCK_SOURCE_RESELECT			0x100
-
+#define CLOCK_SOURCE_VERIFY_PERCPU		0x200
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) GENMASK_ULL((bits) - 1, 0)
 
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 3f734c6..663bc53 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -211,6 +211,78 @@ static void clocksource_watchdog_inject_delay(void)
 	WARN_ON_ONCE(injectfail < 0);
 }
 
+static struct clocksource *clocksource_verify_work_cs;
+static DEFINE_PER_CPU(u64, csnow_mid);
+static cpumask_t cpus_ahead;
+static cpumask_t cpus_behind;
+
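+/* Run via IPI on each probed CPU: snapshot this clocksource as seen there. */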
+static void clocksource_verify_one_cpu(void *csin)
+{
+	struct clocksource *cs = (struct clocksource *)csin;
+
+	__this_cpu_write(csnow_mid, cs->read(cs));
+}
+
+static void clocksource_verify_percpu_wq(struct work_struct *unused)
+{
+	int cpu;
+	struct clocksource *cs;
+	int64_t cs_nsec;
+	u64 csnow_begin;
+	u64 csnow_end;
+	u64 delta;
+
+	cs = smp_load_acquire(&clocksource_verify_work_cs); // pairs with release
+	if (WARN_ON_ONCE(!cs))
+		return;
+	pr_warn("Checking clocksource %s synchronization from CPU %d.\n",
+		cs->name, smp_processor_id());
+	cpumask_clear(&cpus_ahead);
+	cpumask_clear(&cpus_behind);
+	csnow_begin = cs->read(cs);
+	smp_call_function(clocksource_verify_one_cpu, cs, 1);
+	csnow_end = cs->read(cs);
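+	/*
+	 * A CPU is behind if its midpoint read precedes csnow_begin, and
+	 * ahead if csnow_end precedes its midpoint read; the masked
+	 * subtractions below make these comparisons wrap-safe.
+	 */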
+	for_each_online_cpu(cpu) {
+		if (cpu == smp_processor_id())
+			continue;
+		delta = (per_cpu(csnow_mid, cpu) - csnow_begin) & cs->mask;
+		if ((s64)delta < 0)
+			cpumask_set_cpu(cpu, &cpus_behind);
+		delta = (csnow_end - per_cpu(csnow_mid, cpu)) & cs->mask;
+		if ((s64)delta < 0)
+			cpumask_set_cpu(cpu, &cpus_ahead);
+	}
+	if (!cpumask_empty(&cpus_ahead))
+		pr_warn("CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
+			cpumask_pr_args(&cpus_ahead), smp_processor_id(), cs->name);