Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
On Tue, Apr 13, 2021 at 10:49:11PM +0200, Thomas Gleixner wrote:
> Paul,
>
> On Mon, Apr 12 2021 at 16:18, Paul E. McKenney wrote:
> > On Mon, Apr 12, 2021 at 10:37:10PM +0200, Thomas Gleixner wrote:
> >> On Mon, Apr 12 2021 at 12:57, Paul E. McKenney wrote:
> >> > On Mon, Apr 12, 2021 at 08:54:03PM +0200, Thomas Gleixner wrote:
> >> >> > I will send a new series out later today, Pacific Time.
> >> >>
> >> >> Can you do me a favour and send it standalone and not as yet another reply to this existing thread maze. A trivial lore link to the previous version gives enough context.
> >> >
> >> > Will do!
> >> >
> >> > Of course, it turns out that lockdep also doesn't like waited-on smp_call_function_single() invocations from timer handlers, so I am currently looking at other options for dealing with that potential use-after-free. I am starting to like the looks of "only set CLOCK_SOURCE_VERIFY_PERCPU on statically allocated clocksource structures and let KASAN enforce this restriction", but I have not quite given up on making it more general.
> >>
> >> The simplest point is in the thread under the clocksource_mutex which prevents anything from vanishing under your feet.
> >
> > And lockdep is -much- happier with the setup shown below, so thank you again!
>
> But it is too simple now :) ...
>
> > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> > index f047c6cb056c..34dc38b6b923 100644
> > --- a/kernel/time/clocksource.c
> > +++ b/kernel/time/clocksource.c
> > @@ -519,6 +515,13 @@ static int __clocksource_watchdog_kthread(void)
> >  	unsigned long flags;
> >  	int select = 0;
> >
> > +	/* Do any required per-CPU skew verification. */
> > +	list_for_each_entry(cs, &watchdog_list, wd_list) {
> > +		if ((cs->flags & (CLOCK_SOURCE_UNSTABLE | CLOCK_SOURCE_VERIFY_PERCPU)) ==
> > +		    (CLOCK_SOURCE_UNSTABLE | CLOCK_SOURCE_VERIFY_PERCPU))
> > +			clocksource_verify_percpu(cs);
> > +	}
>
> because that list is _NOT_ protected by the clocksource_mutex as you noticed yourself already.
>
> But you don't have to walk that list at all because the only interesting thing is the currently active clocksource, which is about to be changed in case the watchdog marked it unstable and cannot be changed by any other code concurrently because clocksource_mutex is held.
>
> So all you need is:
>
>	if (curr_clocksource &&
>	    curr_clocksource->flags & CLOCK_SOURCE_UNSTABLE &&
>	    curr_clocksource->flags & CLOCK_SOURCE_VERIFY_PERCPU)
>		clocksource_verify_percpu_wreckage(curr_clocksource);
>
> Hmm?

With the addition of a clocksource=tsc boot parameter, this approach does appear to work, thank you! I sent out the updated series.

							Thanx, Paul
Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
Paul,

On Mon, Apr 12 2021 at 16:18, Paul E. McKenney wrote:
> On Mon, Apr 12, 2021 at 10:37:10PM +0200, Thomas Gleixner wrote:
>> On Mon, Apr 12 2021 at 12:57, Paul E. McKenney wrote:
>> > On Mon, Apr 12, 2021 at 08:54:03PM +0200, Thomas Gleixner wrote:
>> >> > I will send a new series out later today, Pacific Time.
>> >>
>> >> Can you do me a favour and send it standalone and not as yet another reply to this existing thread maze. A trivial lore link to the previous version gives enough context.
>> >
>> > Will do!
>> >
>> > Of course, it turns out that lockdep also doesn't like waited-on smp_call_function_single() invocations from timer handlers, so I am currently looking at other options for dealing with that potential use-after-free. I am starting to like the looks of "only set CLOCK_SOURCE_VERIFY_PERCPU on statically allocated clocksource structures and let KASAN enforce this restriction", but I have not quite given up on making it more general.
>>
>> The simplest point is in the thread under the clocksource_mutex which prevents anything from vanishing under your feet.
>
> And lockdep is -much- happier with the setup shown below, so thank you again!

But it is too simple now :) ...

> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index f047c6cb056c..34dc38b6b923 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -519,6 +515,13 @@ static int __clocksource_watchdog_kthread(void)
>  	unsigned long flags;
>  	int select = 0;
>
> +	/* Do any required per-CPU skew verification. */
> +	list_for_each_entry(cs, &watchdog_list, wd_list) {
> +		if ((cs->flags & (CLOCK_SOURCE_UNSTABLE | CLOCK_SOURCE_VERIFY_PERCPU)) ==
> +		    (CLOCK_SOURCE_UNSTABLE | CLOCK_SOURCE_VERIFY_PERCPU))
> +			clocksource_verify_percpu(cs);
> +	}

because that list is _NOT_ protected by the clocksource_mutex as you noticed yourself already.

But you don't have to walk that list at all because the only interesting thing is the currently active clocksource, which is about to be changed in case the watchdog marked it unstable and cannot be changed by any other code concurrently because clocksource_mutex is held.

So all you need is:

	if (curr_clocksource &&
	    curr_clocksource->flags & CLOCK_SOURCE_UNSTABLE &&
	    curr_clocksource->flags & CLOCK_SOURCE_VERIFY_PERCPU)
		clocksource_verify_percpu_wreckage(curr_clocksource);

Hmm?

Thanks,

        tglx
Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
On Mon, Apr 12, 2021 at 10:37:10PM +0200, Thomas Gleixner wrote:
> On Mon, Apr 12 2021 at 12:57, Paul E. McKenney wrote:
> > On Mon, Apr 12, 2021 at 08:54:03PM +0200, Thomas Gleixner wrote:
> >> > I will send a new series out later today, Pacific Time.
> >>
> >> Can you do me a favour and send it standalone and not as yet another reply to this existing thread maze. A trivial lore link to the previous version gives enough context.
> >
> > Will do!
> >
> > Of course, it turns out that lockdep also doesn't like waited-on smp_call_function_single() invocations from timer handlers, so I am currently looking at other options for dealing with that potential use-after-free. I am starting to like the looks of "only set CLOCK_SOURCE_VERIFY_PERCPU on statically allocated clocksource structures and let KASAN enforce this restriction", but I have not quite given up on making it more general.
>
> The simplest point is in the thread under the clocksource_mutex which prevents anything from vanishing under your feet.

And lockdep is -much- happier with the setup shown below, so thank you again!

							Thanx, Paul

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index f047c6cb056c..34dc38b6b923 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -519,6 +515,13 @@ static int __clocksource_watchdog_kthread(void)
 	unsigned long flags;
 	int select = 0;
 
+	/* Do any required per-CPU skew verification. */
+	list_for_each_entry(cs, &watchdog_list, wd_list) {
+		if ((cs->flags & (CLOCK_SOURCE_UNSTABLE | CLOCK_SOURCE_VERIFY_PERCPU)) ==
+		    (CLOCK_SOURCE_UNSTABLE | CLOCK_SOURCE_VERIFY_PERCPU))
+			clocksource_verify_percpu(cs);
+	}
+
 	spin_lock_irqsave(&watchdog_lock, flags);
 	list_for_each_entry_safe(cs, tmp, &watchdog_list, wd_list) {
 		if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
On Mon, Apr 12 2021 at 12:57, Paul E. McKenney wrote:
> On Mon, Apr 12, 2021 at 08:54:03PM +0200, Thomas Gleixner wrote:
>> > I will send a new series out later today, Pacific Time.
>>
>> Can you do me a favour and send it standalone and not as yet another reply to this existing thread maze. A trivial lore link to the previous version gives enough context.
>
> Will do!
>
> Of course, it turns out that lockdep also doesn't like waited-on smp_call_function_single() invocations from timer handlers, so I am currently looking at other options for dealing with that potential use-after-free. I am starting to like the looks of "only set CLOCK_SOURCE_VERIFY_PERCPU on statically allocated clocksource structures and let KASAN enforce this restriction", but I have not quite given up on making it more general.

The simplest point is in the thread under the clocksource_mutex which prevents anything from vanishing under your feet.

Thanks,

        tglx
Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
On Mon, Apr 12, 2021 at 08:54:03PM +0200, Thomas Gleixner wrote:
> Paul,
>
> On Mon, Apr 12 2021 at 11:20, Paul E. McKenney wrote:
> > On Mon, Apr 12, 2021 at 03:08:16PM +0200, Thomas Gleixner wrote:
> >> The reason for irqsave is again historical AFAICT and nobody bothered to clean it up. spin_lock_bh() should be sufficient to serialize against the watchdog timer, though I haven't looked at all possible scenarios.
> >
> > Though if BH is disabled, there is not so much advantage to invoking it from __clocksource_watchdog_kthread(). Might as well just invoke it directly from clocksource_watchdog().
> >
> >> > 2. Invoke clocksource_verify_percpu() from its original location in clocksource_watchdog(), just before the call to __clocksource_unstable(). This relies on the fact that clocksource_watchdog() acquires watchdog_lock without disabling interrupts.
> >>
> >> That should be fine, but this might cause the softirq to 'run' for a very long time which is not pretty either.
> >>
> >> Aside of that, do we really need to check _all_ online CPUs? What you are trying to figure out is whether the wreckage is CPU local or global, right?
> >>
> >> Wouldn't a shirt-sleeve approach of just querying _one_ CPU be good enough? Either the other CPU has the same wreckage, then it's global or it hasn't which points to a per CPU local issue.
> >>
> >> Sure it does not catch the case where a subset (>1) of all CPUs is affected, but I'm not seeing how that really buys us anything.
> >
> > Good point! My thought is to randomly pick eight CPUs to keep the duration reasonable while having a good chance of hitting "interesting" CPU choices in multicore and multisocket systems.
> >
> > However, if a hard-to-reproduce problem occurred, it would be good to take the hit and scan all the CPUs. Additionally, there are some workloads for which the switch from TSC to HPET is fatal anyway due to increased overhead. For these workloads, the full CPU scan is no additional pain.
> >
> > So I am thinking in terms of a default that probes eight randomly selected CPUs without worrying about duplicates (as in there would be some chance that fewer CPUs would actually be probed), but with a boot-time flag that does all CPUs. I would add the (default) random selection as a separate patch.
>
> You can't do without making it complex, right? Keep it simple is not an option for an RCU hacker it seems :)

But it was simple! It just hit all the CPUs. However, you (quite rightly) pointed out that this simple approach had a few shortcomings. ;-)

> > I will send a new series out later today, Pacific Time.
>
> Can you do me a favour and send it standalone and not as yet another reply to this existing thread maze. A trivial lore link to the previous version gives enough context.

Will do!

Of course, it turns out that lockdep also doesn't like waited-on smp_call_function_single() invocations from timer handlers, so I am currently looking at other options for dealing with that potential use-after-free. I am starting to like the looks of "only set CLOCK_SOURCE_VERIFY_PERCPU on statically allocated clocksource structures and let KASAN enforce this restriction", but I have not quite given up on making it more general.

							Thanx, Paul
Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
Paul,

On Mon, Apr 12 2021 at 11:20, Paul E. McKenney wrote:
> On Mon, Apr 12, 2021 at 03:08:16PM +0200, Thomas Gleixner wrote:
>> The reason for irqsave is again historical AFAICT and nobody bothered to clean it up. spin_lock_bh() should be sufficient to serialize against the watchdog timer, though I haven't looked at all possible scenarios.
>
> Though if BH is disabled, there is not so much advantage to invoking it from __clocksource_watchdog_kthread(). Might as well just invoke it directly from clocksource_watchdog().
>
>> > 2. Invoke clocksource_verify_percpu() from its original location in clocksource_watchdog(), just before the call to __clocksource_unstable(). This relies on the fact that clocksource_watchdog() acquires watchdog_lock without disabling interrupts.
>>
>> That should be fine, but this might cause the softirq to 'run' for a very long time which is not pretty either.
>>
>> Aside of that, do we really need to check _all_ online CPUs? What you are trying to figure out is whether the wreckage is CPU local or global, right?
>>
>> Wouldn't a shirt-sleeve approach of just querying _one_ CPU be good enough? Either the other CPU has the same wreckage, then it's global or it hasn't which points to a per CPU local issue.
>>
>> Sure it does not catch the case where a subset (>1) of all CPUs is affected, but I'm not seeing how that really buys us anything.
>
> Good point! My thought is to randomly pick eight CPUs to keep the duration reasonable while having a good chance of hitting "interesting" CPU choices in multicore and multisocket systems.
>
> However, if a hard-to-reproduce problem occurred, it would be good to take the hit and scan all the CPUs. Additionally, there are some workloads for which the switch from TSC to HPET is fatal anyway due to increased overhead. For these workloads, the full CPU scan is no additional pain.
>
> So I am thinking in terms of a default that probes eight randomly selected CPUs without worrying about duplicates (as in there would be some chance that fewer CPUs would actually be probed), but with a boot-time flag that does all CPUs. I would add the (default) random selection as a separate patch.

You can't do without making it complex, right? Keep it simple is not an option for an RCU hacker it seems :)

> I will send a new series out later today, Pacific Time.

Can you do me a favour and send it standalone and not as yet another reply to this existing thread maze. A trivial lore link to the previous version gives enough context.

Thanks,

        tglx
Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
On Mon, Apr 12, 2021 at 03:08:16PM +0200, Thomas Gleixner wrote:
> On Sun, Apr 11 2021 at 21:21, Paul E. McKenney wrote:
> > On Sun, Apr 11, 2021 at 09:46:12AM -0700, Paul E. McKenney wrote:
> >> So I need to inline clocksource_verify_percpu_wq() into clocksource_verify_percpu() and then move the call to clocksource_verify_percpu() to __clocksource_watchdog_kthread(), right before the existing call to list_del_init(). Will do!
> >
> > Except that this triggers the WARN_ON_ONCE() in smp_call_function_single() due to interrupts being disabled across that list_del_init().
> >
> > Possibilities include:
> >
> > 1. Figure out why interrupts must be disabled only sometimes while holding watchdog_lock, in the hope that they need not be across the entire critical section for __clocksource_watchdog_kthread(). As in:
> >
> >	local_irq_restore(flags);
> >	clocksource_verify_percpu(cs);
> >	local_irq_save(flags);
> >
> > Trying this first with lockdep enabled. Might be spectacular.
>
> Yes, it's a possible deadlock against the watchdog timer firing ...

And lockdep most emphatically agrees with you. ;-)

> The reason for irqsave is again historical AFAICT and nobody bothered to clean it up. spin_lock_bh() should be sufficient to serialize against the watchdog timer, though I haven't looked at all possible scenarios.

Though if BH is disabled, there is not so much advantage to invoking it from __clocksource_watchdog_kthread(). Might as well just invoke it directly from clocksource_watchdog().

> > 2. Invoke clocksource_verify_percpu() from its original location in clocksource_watchdog(), just before the call to __clocksource_unstable(). This relies on the fact that clocksource_watchdog() acquires watchdog_lock without disabling interrupts.
>
> That should be fine, but this might cause the softirq to 'run' for a very long time which is not pretty either.
>
> Aside of that, do we really need to check _all_ online CPUs? What you are trying to figure out is whether the wreckage is CPU local or global, right?
>
> Wouldn't a shirt-sleeve approach of just querying _one_ CPU be good enough? Either the other CPU has the same wreckage, then it's global or it hasn't which points to a per CPU local issue.
>
> Sure it does not catch the case where a subset (>1) of all CPUs is affected, but I'm not seeing how that really buys us anything.

Good point! My thought is to randomly pick eight CPUs to keep the duration reasonable while having a good chance of hitting "interesting" CPU choices in multicore and multisocket systems.

However, if a hard-to-reproduce problem occurred, it would be good to take the hit and scan all the CPUs. Additionally, there are some workloads for which the switch from TSC to HPET is fatal anyway due to increased overhead. For these workloads, the full CPU scan is no additional pain.

So I am thinking in terms of a default that probes eight randomly selected CPUs without worrying about duplicates (as in there would be some chance that fewer CPUs would actually be probed), but with a boot-time flag that does all CPUs. I would add the (default) random selection as a separate patch.

I will send a new series out later today, Pacific Time.

							Thanx, Paul
Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
On Sun, Apr 11 2021 at 21:21, Paul E. McKenney wrote:
> On Sun, Apr 11, 2021 at 09:46:12AM -0700, Paul E. McKenney wrote:
>> So I need to inline clocksource_verify_percpu_wq() into clocksource_verify_percpu() and then move the call to clocksource_verify_percpu() to __clocksource_watchdog_kthread(), right before the existing call to list_del_init(). Will do!
>
> Except that this triggers the WARN_ON_ONCE() in smp_call_function_single() due to interrupts being disabled across that list_del_init().
>
> Possibilities include:
>
> 1. Figure out why interrupts must be disabled only sometimes while holding watchdog_lock, in the hope that they need not be across the entire critical section for __clocksource_watchdog_kthread(). As in:
>
>	local_irq_restore(flags);
>	clocksource_verify_percpu(cs);
>	local_irq_save(flags);
>
> Trying this first with lockdep enabled. Might be spectacular.

Yes, it's a possible deadlock against the watchdog timer firing ...

The reason for irqsave is again historical AFAICT and nobody bothered to clean it up. spin_lock_bh() should be sufficient to serialize against the watchdog timer, though I haven't looked at all possible scenarios.

> 2. Invoke clocksource_verify_percpu() from its original location in clocksource_watchdog(), just before the call to __clocksource_unstable(). This relies on the fact that clocksource_watchdog() acquires watchdog_lock without disabling interrupts.

That should be fine, but this might cause the softirq to 'run' for a very long time which is not pretty either.

Aside of that, do we really need to check _all_ online CPUs? What you are trying to figure out is whether the wreckage is CPU local or global, right?

Wouldn't a shirt-sleeve approach of just querying _one_ CPU be good enough? Either the other CPU has the same wreckage, then it's global or it hasn't which points to a per CPU local issue.

Sure it does not catch the case where a subset (>1) of all CPUs is affected, but I'm not seeing how that really buys us anything.

Thanks,

        tglx
Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
On Sun, Apr 11, 2021 at 09:46:12AM -0700, Paul E. McKenney wrote:
> On Sun, Apr 11, 2021 at 12:33:44PM +0200, Thomas Gleixner wrote:
> > On Sat, Apr 10 2021 at 17:20, Paul E. McKenney wrote:
> > > On Sat, Apr 10, 2021 at 11:00:25AM +0200, Thomas Gleixner wrote:
> > >> > +	if (WARN_ON_ONCE(!cs))
> > >> > +		return;
> > >> > +	pr_warn("Checking clocksource %s synchronization from CPU %d.\n",
> > >> > +		cs->name, smp_processor_id());
> > >> > +	cpumask_clear(&cpus_ahead);
> > >> > +	cpumask_clear(&cpus_behind);
> > >> > +	csnow_begin = cs->read(cs);
> > >>
> > >> So this is invoked via work and the actual clocksource change is done via work too. Once the clocksource is no longer actively used for timekeeping it can go away. What's guaranteeing that this runs prior to the clocksource change and 'cs' is valid throughout this function?
> > >
> > > From what I can see, cs->read() doesn't care whether or not the clocksource has been marked unstable. So it should be OK to call cs->read() before, during, or after the call to __clocksource_unstable().
> > >
> > > Also, this is only done on clocksources marked CLOCK_SOURCE_VERIFY_PERCPU, so any clocksource that did not like cs->read() being invoked during or after the call to __clocksource_unstable() should leave off the CLOCK_SOURCE_VERIFY_PERCPU bit.
> > >
> > > Or did I take a wrong turn somewhere in the pointers to functions?
> >
> > Right. cs->read() does not care, but what guarantees that cs is valid and not freed yet? It's not an issue with TSC and KVMCLOCK, but conceptually the following is possible:
> >
> >	watchdog()
> >	  queue_work(synccheck);
> >	  queue_work(clocksource_change);
> >
> > work:
> >	synccheck()			clocksource_change()
> >	  preemption...
> >					...
> >					some_other_code():
> >					  unregister_clocksource(cs)
> >					  free(cs)
> >	  cs->read()  <- UAF
>
> Got it, with the ingenic_tcu_init() function being a case in point. It invokes clocksource_unregister() shortly followed by clk_put(), which, if I found the correct clk_put(), can kfree() it.
>
> Thank you!
>
> > >> > +	queue_work(system_highpri_wq, &clocksource_verify_work);
> > >>
> > >> This does not guarantee anything. So why does this need an extra work function which is scheduled separately?
> > >
> > > Because I was concerned about doing smp_call_function() while holding watchdog_lock, which is also acquired elsewhere using spin_lock_irqsave(). And it still looks like on x86 that spin_lock_irqsave() spins with irqs disabled, which could result in deadlock. The smp_call_function_single() would wait for the target CPU to enable interrupts, which would not happen until after the smp_call_function_single() returned due to its caller holding watchdog_lock.
> > >
> > > Or is there something that I am missing that prevents this deadlock from occurring?
> >
> > The unstable mechanism is:
> >
> >	watchdog()
> >	  __clocksource_unstable()
> >	    schedule_work(&watchdog_work);
> >
> >	watchdog_work()
> >	  kthread_run(clocksource_watchdog_thread);
> >
> >	cs_watchdog_thread()
> >	  mutex_lock(&clocksource_mutex);
> >	  if (__clocksource_watchdog_kthread())
> >		clocksource_select();
> >	  mutex_unlock(&clocksource_mutex);
> >
> > So what prevents you from doing that right in watchdog_work() or even in cs_watchdog_thread() properly ordered against the actual clocksource switch?
> >
> > Hmm?
>
> My own confusion, apparently. :-/
>
> So I need to inline clocksource_verify_percpu_wq() into clocksource_verify_percpu() and then move the call to clocksource_verify_percpu() to __clocksource_watchdog_kthread(), right before the existing call to list_del_init(). Will do!

Except that this triggers the WARN_ON_ONCE() in smp_call_function_single() due to interrupts being disabled across that list_del_init().

Possibilities include:

1.	Figure out why interrupts must be disabled only sometimes while holding watchdog_lock, in the hope that they need not be across the entire critical section for __clocksource_watchdog_kthread(). As in:

	local_irq_restore(flags);
	clocksource_verify_percpu(cs);
	local_irq_save(flags);

	Trying this first with lockdep enabled. Might be spectacular.

2.	Invoke clocksource_verify_percpu() from its original location in clocksource_watchdog(), just before the call to __clocksource_unstable(). This relies on the fact that clocksource_watchdog() acquires watchdog_lock without disabling interrupts.

3.
Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
On Sun, Apr 11, 2021 at 12:33:44PM +0200, Thomas Gleixner wrote:
> On Sat, Apr 10 2021 at 17:20, Paul E. McKenney wrote:
> > On Sat, Apr 10, 2021 at 11:00:25AM +0200, Thomas Gleixner wrote:
> >> > +	if (WARN_ON_ONCE(!cs))
> >> > +		return;
> >> > +	pr_warn("Checking clocksource %s synchronization from CPU %d.\n",
> >> > +		cs->name, smp_processor_id());
> >> > +	cpumask_clear(&cpus_ahead);
> >> > +	cpumask_clear(&cpus_behind);
> >> > +	csnow_begin = cs->read(cs);
> >>
> >> So this is invoked via work and the actual clocksource change is done via work too. Once the clocksource is no longer actively used for timekeeping it can go away. What's guaranteeing that this runs prior to the clocksource change and 'cs' is valid throughout this function?
> >
> > From what I can see, cs->read() doesn't care whether or not the clocksource has been marked unstable. So it should be OK to call cs->read() before, during, or after the call to __clocksource_unstable().
> >
> > Also, this is only done on clocksources marked CLOCK_SOURCE_VERIFY_PERCPU, so any clocksource that did not like cs->read() being invoked during or after the call to __clocksource_unstable() should leave off the CLOCK_SOURCE_VERIFY_PERCPU bit.
> >
> > Or did I take a wrong turn somewhere in the pointers to functions?
>
> Right. cs->read() does not care, but what guarantees that cs is valid and not freed yet? It's not an issue with TSC and KVMCLOCK, but conceptually the following is possible:
>
>	watchdog()
>	  queue_work(synccheck);
>	  queue_work(clocksource_change);
>
> work:
>	synccheck()			clocksource_change()
>	  preemption...
>					...
>					some_other_code():
>					  unregister_clocksource(cs)
>					  free(cs)
>	  cs->read()  <- UAF

Got it, with the ingenic_tcu_init() function being a case in point. It invokes clocksource_unregister() shortly followed by clk_put(), which, if I found the correct clk_put(), can kfree() it.

Thank you!

> >> > +	queue_work(system_highpri_wq, &clocksource_verify_work);
> >>
> >> This does not guarantee anything. So why does this need an extra work function which is scheduled separately?
> >
> > Because I was concerned about doing smp_call_function() while holding watchdog_lock, which is also acquired elsewhere using spin_lock_irqsave(). And it still looks like on x86 that spin_lock_irqsave() spins with irqs disabled, which could result in deadlock. The smp_call_function_single() would wait for the target CPU to enable interrupts, which would not happen until after the smp_call_function_single() returned due to its caller holding watchdog_lock.
> >
> > Or is there something that I am missing that prevents this deadlock from occurring?
>
> The unstable mechanism is:
>
>	watchdog()
>	  __clocksource_unstable()
>	    schedule_work(&watchdog_work);
>
>	watchdog_work()
>	  kthread_run(clocksource_watchdog_thread);
>
>	cs_watchdog_thread()
>	  mutex_lock(&clocksource_mutex);
>	  if (__clocksource_watchdog_kthread())
>		clocksource_select();
>	  mutex_unlock(&clocksource_mutex);
>
> So what prevents you from doing that right in watchdog_work() or even in cs_watchdog_thread() properly ordered against the actual clocksource switch?
>
> Hmm?

My own confusion, apparently. :-/

So I need to inline clocksource_verify_percpu_wq() into clocksource_verify_percpu() and then move the call to clocksource_verify_percpu() to __clocksource_watchdog_kthread(), right before the existing call to list_del_init(). Will do!

							Thanx, Paul
Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
On Sat, Apr 10 2021 at 17:20, Paul E. McKenney wrote:
> On Sat, Apr 10, 2021 at 11:00:25AM +0200, Thomas Gleixner wrote:
>> > +	if (WARN_ON_ONCE(!cs))
>> > +		return;
>> > +	pr_warn("Checking clocksource %s synchronization from CPU %d.\n",
>> > +		cs->name, smp_processor_id());
>> > +	cpumask_clear(&cpus_ahead);
>> > +	cpumask_clear(&cpus_behind);
>> > +	csnow_begin = cs->read(cs);
>>
>> So this is invoked via work and the actual clocksource change is done via work too. Once the clocksource is no longer actively used for timekeeping it can go away. What's guaranteeing that this runs prior to the clocksource change and 'cs' is valid throughout this function?
>
> From what I can see, cs->read() doesn't care whether or not the clocksource has been marked unstable. So it should be OK to call cs->read() before, during, or after the call to __clocksource_unstable().
>
> Also, this is only done on clocksources marked CLOCK_SOURCE_VERIFY_PERCPU, so any clocksource that did not like cs->read() being invoked during or after the call to __clocksource_unstable() should leave off the CLOCK_SOURCE_VERIFY_PERCPU bit.
>
> Or did I take a wrong turn somewhere in the pointers to functions?

Right. cs->read() does not care, but what guarantees that cs is valid and not freed yet? It's not an issue with TSC and KVMCLOCK, but conceptually the following is possible:

	watchdog()
	  queue_work(synccheck);
	  queue_work(clocksource_change);

work:
	synccheck()			clocksource_change()
	  preemption...
					...
					some_other_code():
					  unregister_clocksource(cs)
					  free(cs)
	  cs->read()  <- UAF

>> > +	queue_work(system_highpri_wq, &clocksource_verify_work);
>>
>> This does not guarantee anything. So why does this need an extra work function which is scheduled separately?
>
> Because I was concerned about doing smp_call_function() while holding watchdog_lock, which is also acquired elsewhere using spin_lock_irqsave(). And it still looks like on x86 that spin_lock_irqsave() spins with irqs disabled, which could result in deadlock. The smp_call_function_single() would wait for the target CPU to enable interrupts, which would not happen until after the smp_call_function_single() returned due to its caller holding watchdog_lock.
>
> Or is there something that I am missing that prevents this deadlock from occurring?

The unstable mechanism is:

	watchdog()
	  __clocksource_unstable()
	    schedule_work(&watchdog_work);

	watchdog_work()
	  kthread_run(clocksource_watchdog_thread);

	cs_watchdog_thread()
	  mutex_lock(&clocksource_mutex);
	  if (__clocksource_watchdog_kthread())
		clocksource_select();
	  mutex_unlock(&clocksource_mutex);

So what prevents you from doing that right in watchdog_work() or even in cs_watchdog_thread() properly ordered against the actual clocksource switch?

Hmm?

Thanks,

        tglx
Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
On Sat, Apr 10, 2021 at 11:00:25AM +0200, Thomas Gleixner wrote: > On Fri, Apr 02 2021 at 15:49, paulmck wrote: > > > > +static void clocksource_verify_percpu_wq(struct work_struct *unused) > > +{ > > + int cpu; > > + struct clocksource *cs; > > + int64_t cs_nsec; > > + u64 csnow_begin; > > + u64 csnow_end; > > + u64 delta; > > Please use reverse fir tree ordering and stick variables of the same > type together: > > u64 csnow_begin, csnow_end, delta; > struct clocksource *cs; > s64 cs_nsec; > int cpu; Will do. > > + > > + cs = smp_load_acquire(&clocksource_verify_work_cs); // pairs with > > release > > Please don't use tail comments. They are a horrible distraction. I will remove it. > > + if (WARN_ON_ONCE(!cs)) > > + return; > > + pr_warn("Checking clocksource %s synchronization from CPU %d.\n", > > + cs->name, smp_processor_id()); > > + cpumask_clear(&cpus_ahead); > > + cpumask_clear(&cpus_behind); > > + csnow_begin = cs->read(cs); > > So this is invoked via work and the actual clocksource change is done > via work too. Once the clocksource is not longer actively used for > timekeeping it can go away. What's guaranteeing that this runs prior to > the clocksource change and 'cs' is valid throughout this function? >From what I can see, cs->read() doesn't care whether or not the clocksource has been marked unstable. So it should be OK to call cs->read() before, during, or after the call to __clocksource_unstable(). Also, this is only done on clocksources marked CLOCK_SOURCE_VERIFY_PERCPU, so any clocksource that did not like cs->read() being invoked during or after the call to __clocksource_unstable() should leave off the CLOCK_SOURCE_VERIFY_PERCPU bit. Or did I take a wrong turn somewhere in the pointers to functions? > > + queue_work(system_highpri_wq, &clocksource_verify_work); > > This does not guarantee anything. So why does this need an extra work > function which is scheduled seperately? 
Because I was concerned about doing smp_call_function() while holding
watchdog_lock, which is also acquired elsewhere using spin_lock_irqsave().
And it still looks like on x86 that spin_lock_irqsave() spins with irqs
disabled, which could result in deadlock: the smp_call_function_single()
would wait for the target CPU to enable interrupts, which would not
happen until after the smp_call_function_single() returned, due to its
caller holding watchdog_lock.

Or is there something that I am missing that prevents this deadlock
from occurring?

							Thanx, Paul
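The lock ordering Paul is worried about can be sketched as follows. This is kernel-style pseudocode for illustration only, not compilable code; `watchdog_lock`, `spin_lock_irqsave()`, and `smp_call_function_single()` are the real kernel symbols discussed in this thread, but the interleaving shown is a hypothetical worst case:

```c
/* CPU 0 (watchdog path)                  CPU 1 (another watchdog_lock user)
 * ----------------------                 ----------------------------------
 * spin_lock_irqsave(&watchdog_lock, f);
 *                                        spin_lock_irqsave(&watchdog_lock, f);
 *                                          // spins with irqs disabled,
 *                                          // waiting for CPU 0 to release
 * smp_call_function_single(1, fn, a, 1);
 *   // wait=1: blocks until CPU 1 runs
 *   // fn() in IPI context.  But CPU 1
 *   // cannot service the IPI until it
 *   // re-enables irqs, which requires
 *   // first acquiring watchdog_lock,
 *   // which CPU 0 holds.  Deadlock.
 */
```

This is why a waited-on cross-CPU call must not be issued while holding a lock that other CPUs may spin on with interrupts disabled.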
Re: [PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
On Fri, Apr 02 2021 at 15:49, paulmck wrote:
> +static void clocksource_verify_percpu_wq(struct work_struct *unused)
> +{
> +	int cpu;
> +	struct clocksource *cs;
> +	int64_t cs_nsec;
> +	u64 csnow_begin;
> +	u64 csnow_end;
> +	u64 delta;

Please use reverse fir tree ordering and stick variables of the same
type together:

       u64 csnow_begin, csnow_end, delta;
       struct clocksource *cs;
       s64 cs_nsec;
       int cpu;

> +
> +	cs = smp_load_acquire(&clocksource_verify_work_cs); // pairs with release

Please don't use tail comments. They are a horrible distraction.

> +	if (WARN_ON_ONCE(!cs))
> +		return;
> +	pr_warn("Checking clocksource %s synchronization from CPU %d.\n",
> +		cs->name, smp_processor_id());
> +	cpumask_clear(&cpus_ahead);
> +	cpumask_clear(&cpus_behind);
> +	csnow_begin = cs->read(cs);

So this is invoked via work and the actual clocksource change is done
via work too. Once the clocksource is no longer actively used for
timekeeping it can go away. What's guaranteeing that this runs prior to
the clocksource change and 'cs' is valid throughout this function?

> +	queue_work(system_highpri_wq, &clocksource_verify_work);

This does not guarantee anything. So why does this need an extra work
function which is scheduled separately?

Thanks,

        tglx
[PATCH v7 clocksource 3/5] clocksource: Check per-CPU clock synchronization when marked unstable
From: "Paul E. McKenney"

Some sorts of per-CPU clock sources have a history of going out of
synchronization with each other.  However, this problem has purportedly
been solved in the past ten years.  Except that it is all too possible
that the problem has instead simply been made less likely, which might
mean that some of the occasional "Marking clocksource 'tsc' as unstable"
messages might be due to desynchronization.  How would anyone know?

This commit therefore adds CPU-to-CPU synchronization checking for newly
unstable clocksources that are marked with the new
CLOCK_SOURCE_VERIFY_PERCPU flag.  Lists of desynchronized CPUs are
printed, with the caveat that if it is the reporting CPU that is itself
desynchronized, it will appear that all the other clocks are wrong.
Just like in real life.

Cc: John Stultz
Cc: Thomas Gleixner
Cc: Stephen Boyd
Cc: Jonathan Corbet
Cc: Mark Rutland
Cc: Marc Zyngier
Cc: Andi Kleen
Reported-by: Chris Mason
[ paulmck: Add "static" to clocksource_verify_one_cpu() per kernel test
  robot feedback. ]
Signed-off-by: Paul E. McKenney
---
 arch/x86/kernel/kvmclock.c  |  2 +-
 arch/x86/kernel/tsc.c       |  3 +-
 include/linux/clocksource.h |  2 +-
 kernel/time/clocksource.c   | 73 +
 4 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 1fc0962..97eeaf1 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -169,7 +169,7 @@ struct clocksource kvm_clock = {
 	.read	= kvm_clock_get_cycles,
 	.rating	= 400,
 	.mask	= CLOCKSOURCE_MASK(64),
-	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
+	.flags	= CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_VERIFY_PERCPU,
 	.enable	= kvm_cs_enable,
 };
 EXPORT_SYMBOL_GPL(kvm_clock);
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index f70dffc..5628917 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1151,7 +1151,8 @@ static struct clocksource clocksource_tsc = {
 	.mask			= CLOCKSOURCE_MASK(64),
 	.flags			= CLOCK_SOURCE_IS_CONTINUOUS |
 				  CLOCK_SOURCE_VALID_FOR_HRES |
-				  CLOCK_SOURCE_MUST_VERIFY,
+				  CLOCK_SOURCE_MUST_VERIFY |
+				  CLOCK_SOURCE_VERIFY_PERCPU,
 	.vdso_clock_mode	= VDSO_CLOCKMODE_TSC,
 	.enable			= tsc_cs_enable,
 	.resume			= tsc_resume,
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 86d143d..83a3ebf 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -131,7 +131,7 @@ struct clocksource {
 #define CLOCK_SOURCE_UNSTABLE			0x40
 #define CLOCK_SOURCE_SUSPEND_NONSTOP		0x80
 #define CLOCK_SOURCE_RESELECT			0x100
-
+#define CLOCK_SOURCE_VERIFY_PERCPU		0x200
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) GENMASK_ULL((bits) - 1, 0)
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 3f734c6..663bc53 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -211,6 +211,78 @@ static void clocksource_watchdog_inject_delay(void)
 	WARN_ON_ONCE(injectfail < 0);
 }

+static struct clocksource *clocksource_verify_work_cs;
+static DEFINE_PER_CPU(u64, csnow_mid);
+static cpumask_t cpus_ahead;
+static cpumask_t cpus_behind;
+
+static void clocksource_verify_one_cpu(void *csin)
+{
+	struct clocksource *cs = (struct clocksource *)csin;
+
+	__this_cpu_write(csnow_mid, cs->read(cs));
+}
+
+static void clocksource_verify_percpu_wq(struct work_struct *unused)
+{
+	int cpu;
+	struct clocksource *cs;
+	int64_t cs_nsec;
+	u64 csnow_begin;
+	u64 csnow_end;
+	u64 delta;
+
+	cs = smp_load_acquire(&clocksource_verify_work_cs); // pairs with release
+	if (WARN_ON_ONCE(!cs))
+		return;
+	pr_warn("Checking clocksource %s synchronization from CPU %d.\n",
+		cs->name, smp_processor_id());
+	cpumask_clear(&cpus_ahead);
+	cpumask_clear(&cpus_behind);
+	csnow_begin = cs->read(cs);
+	smp_call_function(clocksource_verify_one_cpu, cs, 1);
+	csnow_end = cs->read(cs);
+	for_each_online_cpu(cpu) {
+		if (cpu == smp_processor_id())
+			continue;
+		delta = (per_cpu(csnow_mid, cpu) - csnow_begin) & cs->mask;
+		if ((s64)delta < 0)
+			cpumask_set_cpu(cpu, &cpus_behind);
+		delta = (csnow_end - per_cpu(csnow_mid, cpu)) & cs->mask;
+		if ((s64)delta < 0)
+			cpumask_set_cpu(cpu, &cpus_ahead);
+	}
+	if (!cpumask_empty(&cpus_ahead))
+		pr_warn("CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
+