Re: [PATCH RESEND] PM / sleep: Fix racing timers
On Fri, 23 Jan 2015, Sören Brinkmann wrote: > On Mon, 2015-01-12 at 04:14PM +, Lorenzo Pieralisi wrote: > > I thought that a shutdown clock event device explicitly disables IRQ > > assertion, that's why I am inquiring, I do not understand how this > > can happen - how can you have a pending timer IRQ at step 3 above. > > It does I guess, but the shutdown of the IRQ happens after disabling > IRQs. During that window the IRQ can become pening without anybody > handling it. Shutting the timer down usually just ensures that no new > interrupts are signaled but it apparently doesn't check whether any are > still pending (which isn't necessary in almost all cases since the ISR > would take care of it). Well, the issue is that lots of platforms are just not affected by this because they either power off or simply do not propagate interrupts which are not explicitely marked as wakeup sources. After thinking more about it, we really should try to handle it at the core code, especially because the freezer folks who try to shutdown the tick completely run into similar issues. But, I don't think that just moving the suspend() call before disabling interrupts is sufficient. I rather think it's outright dangerous. suspend_ce() local_irq_disable(); ---> Interrupt becomes pending shutdown(ce); local_irq_enable(); ---> Interrupt fires There are implementations which actually switch of clocks and such at shutdown. So depending on the code in the actual interrupt handler (not the clockevent->handler, though that might have issues as well) we might touch a disfunctional device with possibly fatal consequences. Right now this cannot happen as the invocation of the interrupt handler is guaranteed to happen _after_ the device has been resumed. Just because it does not explode in your implementation does not mean its safe to inflict on everyone. So if we try to handle this issue at the core level, then we really must deal with the interrupt itself. Timer interrupts are always marked IRQF_NO_SUSPEND, because we still need them until syscore_suspend(). So in case we really shut down the clockevent device via tick_suspend() then we must disable the interrupt there as well and reenable it in tick_resume(). But that's not really possible at the core level today. Timer interrupts are delivered by various mechanisms: - plain interrupts - percpu interrupts - per cpu direct vectors - more complex stuff and we have no idea which type we are dealing with. So the only option would be to have a callback and let the suspend code do: if (cedev->disable_irq) cedev->disable_irq(cddev); and the reverse operation for resume. So we can provide default implementations for the callbacks: void ce_[dis|en]able_irq(...) void ce_[dis|en]able_percpu_irq(...) which would rely on cedev->irq. Those two would cover most of the implementations. The ones with special requirements can implement their own. Now at the irq level, we'd need a new function for the device irqs: disable_and_mask_irq_nosync(irq) which makes sure that the line is masked at the hardware level. That's necessary because the lazy interrupt disable is not sufficient. The percpu irqs are already masking at the hardware level. One might argue that we could handle this in set_mode() as well, but we are thinking about going away from that multiplex call. Aside of that, ensuring the symmetry of disable/enable invocations would be problematic with the existing set_mode() implementations and usage. Thanks, tglx
Re: [PATCH RESEND] PM / sleep: Fix racing timers
On Fri, 23 Jan 2015, Sören Brinkmann wrote: On Mon, 2015-01-12 at 04:14PM +, Lorenzo Pieralisi wrote: I thought that a shutdown clock event device explicitly disables IRQ assertion, that's why I am inquiring, I do not understand how this can happen - how can you have a pending timer IRQ at step 3 above. It does I guess, but the shutdown of the IRQ happens after disabling IRQs. During that window the IRQ can become pening without anybody handling it. Shutting the timer down usually just ensures that no new interrupts are signaled but it apparently doesn't check whether any are still pending (which isn't necessary in almost all cases since the ISR would take care of it). Well, the issue is that lots of platforms are just not affected by this because they either power off or simply do not propagate interrupts which are not explicitely marked as wakeup sources. After thinking more about it, we really should try to handle it at the core code, especially because the freezer folks who try to shutdown the tick completely run into similar issues. But, I don't think that just moving the suspend() call before disabling interrupts is sufficient. I rather think it's outright dangerous. suspend_ce() local_irq_disable(); --- Interrupt becomes pending shutdown(ce); local_irq_enable(); --- Interrupt fires There are implementations which actually switch of clocks and such at shutdown. So depending on the code in the actual interrupt handler (not the clockevent-handler, though that might have issues as well) we might touch a disfunctional device with possibly fatal consequences. Right now this cannot happen as the invocation of the interrupt handler is guaranteed to happen _after_ the device has been resumed. Just because it does not explode in your implementation does not mean its safe to inflict on everyone. So if we try to handle this issue at the core level, then we really must deal with the interrupt itself. Timer interrupts are always marked IRQF_NO_SUSPEND, because we still need them until syscore_suspend(). So in case we really shut down the clockevent device via tick_suspend() then we must disable the interrupt there as well and reenable it in tick_resume(). But that's not really possible at the core level today. Timer interrupts are delivered by various mechanisms: - plain interrupts - percpu interrupts - per cpu direct vectors - more complex stuff and we have no idea which type we are dealing with. So the only option would be to have a callback and let the suspend code do: if (cedev-disable_irq) cedev-disable_irq(cddev); and the reverse operation for resume. So we can provide default implementations for the callbacks: void ce_[dis|en]able_irq(...) void ce_[dis|en]able_percpu_irq(...) which would rely on cedev-irq. Those two would cover most of the implementations. The ones with special requirements can implement their own. Now at the irq level, we'd need a new function for the device irqs: disable_and_mask_irq_nosync(irq) which makes sure that the line is masked at the hardware level. That's necessary because the lazy interrupt disable is not sufficient. The percpu irqs are already masking at the hardware level. One might argue that we could handle this in set_mode() as well, but we are thinking about going away from that multiplex call. Aside of that, ensuring the symmetry of disable/enable invocations would be problematic with the existing set_mode() implementations and usage. Thanks, tglx
Re: [PATCH RESEND] PM / sleep: Fix racing timers
On Mon, 2015-01-12 at 12:20AM +0100, Rafael J. Wysocki wrote: > On Friday, January 09, 2015 01:50:59 PM Sören Brinkmann wrote: > > On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote: > > > Hi Rafael, > > > > > > On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote: > > > > On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote: > > > > > Hi Rafael, > > > > > > > > Hi, > > > > > > > > Sorry for the huge delay. > > > > > > > > > On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote: > > > > > > On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote: > > > > > > > On platforms that do not power off during suspend, successfully > > > > > > > entering > > > > > > > suspend races with timers. > > > > > > > > > > > > > > The race happening in a couple of location is: > > > > > > > > > > > > > > 1. disable IRQs (e.g. > > > > > > > arch_suspend_disable_irqs()) > > > > > > > ... > > > > > > > 2. syscore_suspend() > > > > > > > -> timekeeping_suspend() > > > > > > >-> clockevents_notify(SUSPEND) > > > > > > > -> tick_suspend() (timers are turned off here) > > > > > > > ... > > > > > > > 3. wfi (wait for wake-IRQ here) > > > > > > > > > > > > > > Between steps 1 and 2 the timers can still generate interrupts > > > > > > > that are > > > > > > > not handled and stay pending until step 3. That pending IRQ > > > > > > > causes an > > > > > > > immediate - spurious - wake. > > > > > > > > > > > > > > The solution is to move the clockevents suspend/resume > > > > > > > notification > > > > > > > out of the syscore_suspend step and explictly call them at the > > > > > > > appropriate > > > > > > > time in the suspend/hibernation paths. I.e. timers are suspend > > > > > > > _before_ > > > > > > > IRQs get disabled. And accordingly in the resume path. > > > > > > > > > > > > > > Signed-off-by: Soren Brinkmann > > > > > > > --- > > > > > > > Hi, > > > > > > > > > > > > > > there was not a lot of discussion on the last submission. Just > > > > > > > one comment from > > > > > > > Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I > > > > > > > outlined in my > > > > > > > response, does not apply, IMHO, since the platform does not > > > > > > > re-enable > > > > > > > interrupts. > > > > > > > > > > > > Well, you just don't agree with it. > > > > > > > > > > > > The problem with your approach is that timer interrupts aren't > > > > > > actually as > > > > > > special as you think and any other IRQF_NO_SUSPEND interrupts would > > > > > > have caused > > > > > > similar issues to appear under specific conditions. > > > > > > > > > > > > The solution I would suggest and that actually covers all > > > > > > IRQF_NO_SUSPEND > > > > > > interrupts would be to use a wait_event() loop like the one in > > > > > > freeze_enter() > > > > > > (on top of the current linux-next or the pm-genirq branch of > > > > > > linux-pm.git), > > > > > > but wait for pm_abort_suspend to become true, to implement system > > > > > > suspend. > > > > > > > > > > sorry, it took me a while since I needed to get some dependencies > > > > > ported > > > > > to the pm-genirq base. Once I had that, it reproduced my original > > > > > issue. > > > > > So far so good. I then looked into finding a solution following your > > > > > guidance. I'm not sure I really found what you had in mind, but below > > > > > is > > > > > what I came up with, which seems to do it. > > > > > Please let me know how far off I am. > > > > > > > > > > Thanks, > > > > > Sören > > > > > > > > > > ---8<--8<8<8<--- > > > > > diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c > > > > > index c2744b30d5d9..a4f9914571f1 100644 > > > > > --- a/drivers/base/power/wakeup.c > > > > > +++ b/drivers/base/power/wakeup.c > > > > > @@ -25,7 +25,7 @@ > > > > > bool events_check_enabled __read_mostly; > > > > > > > > > > /* If set and the system is suspending, terminate the suspend. */ > > > > > -static bool pm_abort_suspend __read_mostly; > > > > > +bool pm_abort_suspend __read_mostly; > > > > > > > > > > /* > > > > > * Combined counters of registered wakeup events and wakeup events > > > > > in progress. > > > > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > > > > > index 6dadb25cb0d8..e6a6de8f76d0 100644 > > > > > --- a/kernel/power/suspend.c > > > > > +++ b/kernel/power/suspend.c > > > > > @@ -33,6 +33,7 @@ > > > > > > > > > > static const char *pm_labels[] = { "mem", "standby", "freeze", }; > > > > > const char *pm_states[PM_SUSPEND_MAX]; > > > > > +extern bool pm_abort_suspend; > > > > > > > > > > static const struct platform_suspend_ops *suspend_ops; > > > > > static const struct platform_freeze_ops *freeze_ops; > > > > > @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, > > >
Re: [PATCH RESEND] PM / sleep: Fix racing timers
Sorry for the delay, but a lot of stuff came in between and I need to rebase a couple of branches and re-test things. Some comments inline below: On Mon, 2015-01-12 at 04:14PM +, Lorenzo Pieralisi wrote: > On Mon, Jan 12, 2015 at 03:55:05PM +, Sören Brinkmann wrote: > > On Mon, 2015-01-12 at 03:43PM +, Lorenzo Pieralisi wrote: > > > Hi Rafael, Soren, > > > > > > On Sun, Jan 11, 2015 at 11:20:36PM +, Rafael J. Wysocki wrote: > > > > On Friday, January 09, 2015 01:50:59 PM Sören Brinkmann wrote: > > > > > On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote: > > > > > > Hi Rafael, > > > > > > > > > > > > On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote: > > > > > > > On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote: > > > > > > > > Hi Rafael, > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > Sorry for the huge delay. > > > > > > > > > > > > > > > On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote: > > > > > > > > > On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann > > > > > > > > > wrote: > > > > > > > > > > On platforms that do not power off during suspend, > > > > > > > > > > successfully entering > > > > > > > > > > suspend races with timers. > > > > > > > > > > > > > > > > > > > > The race happening in a couple of location is: > > > > > > > > > > > > > > > > > > > > 1. disable IRQs (e.g. > > > > > > > > > > arch_suspend_disable_irqs()) > > > > > > > > > > ... > > > > > > > > > > 2. syscore_suspend() > > > > > > > > > > -> timekeeping_suspend() > > > > > > > > > >-> clockevents_notify(SUSPEND) > > > > > > > > > > -> tick_suspend() (timers are turned off here) > > > > > > > > > > ... > > > > > > > > > > 3. wfi(wait for wake-IRQ here) > > > > > > > > > > > > > > > > > > > > Between steps 1 and 2 the timers can still generate > > > > > > > > > > interrupts that are > > > > > > > > > > not handled and stay pending until step 3. That pending IRQ > > > > > > > > > > causes an > > > > > > > > > > immediate - spurious - wake. > > > > > > > > > > > > > > > > > > > > The solution is to move the clockevents suspend/resume > > > > > > > > > > notification > > > > > > > > > > out of the syscore_suspend step and explictly call them at > > > > > > > > > > the appropriate > > > > > > > > > > time in the suspend/hibernation paths. I.e. timers are > > > > > > > > > > suspend _before_ > > > > > > > > > > IRQs get disabled. And accordingly in the resume path. > > > > > > > > > > > > > > > > > > > > Signed-off-by: Soren Brinkmann > > > > > > > > > > --- > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > there was not a lot of discussion on the last submission. > > > > > > > > > > Just one comment from > > > > > > > > > > Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I > > > > > > > > > > outlined in my > > > > > > > > > > response, does not apply, IMHO, since the platform does not > > > > > > > > > > re-enable > > > > > > > > > > interrupts. > > > > > > > > > > > > > > > > > > Well, you just don't agree with it. > > > > > > > > > > > > > > > > > > The problem with your approach is that timer interrupts > > > > > > > > > aren't actually as > > > > > > > > > special as you think and any other IRQF_NO_SUSPEND interrupts > > > > > > > > > would have caused > > > > > > > > > similar issues to appear under specific conditions. > > > > > > > > > > > > > > > > > > The solution I would suggest and that actually covers all > > > > > > > > > IRQF_NO_SUSPEND > > > > > > > > > interrupts would be to use a wait_event() loop like the one > > > > > > > > > in freeze_enter() > > > > > > > > > (on top of the current linux-next or the pm-genirq branch of > > > > > > > > > linux-pm.git), > > > > > > > > > but wait for pm_abort_suspend to become true, to implement > > > > > > > > > system suspend. > > > > > > > > > > > > > > > > sorry, it took me a while since I needed to get some > > > > > > > > dependencies ported > > > > > > > > to the pm-genirq base. Once I had that, it reproduced my > > > > > > > > original issue. > > > > > > > > So far so good. I then looked into finding a solution following > > > > > > > > your > > > > > > > > guidance. I'm not sure I really found what you had in mind, but > > > > > > > > below is > > > > > > > > what I came up with, which seems to do it. > > > > > > > > Please let me know how far off I am. > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Sören > > > > > > > > > > > > > > > > ---8<--8<8<8<--- > > > > > > > > diff --git a/drivers/base/power/wakeup.c > > > > > > > > b/drivers/base/power/wakeup.c > > > > > > > > index c2744b30d5d9..a4f9914571f1 100644 > > > > > > > > --- a/drivers/base/power/wakeup.c > > > > > > > > +++ b/drivers/base/power/wakeup.c > > > > > > > > @@ -25,7 +25,7 @@ > > > > > > > > bool events_check_enabled
Re: [PATCH RESEND] PM / sleep: Fix racing timers
On Mon, 2015-01-12 at 12:20AM +0100, Rafael J. Wysocki wrote: On Friday, January 09, 2015 01:50:59 PM Sören Brinkmann wrote: On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote: Hi Rafael, On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote: On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote: Hi Rafael, Hi, Sorry for the huge delay. On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote: On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote: On platforms that do not power off during suspend, successfully entering suspend races with timers. The race happening in a couple of location is: 1. disable IRQs (e.g. arch_suspend_disable_irqs()) ... 2. syscore_suspend() - timekeeping_suspend() - clockevents_notify(SUSPEND) - tick_suspend() (timers are turned off here) ... 3. wfi (wait for wake-IRQ here) Between steps 1 and 2 the timers can still generate interrupts that are not handled and stay pending until step 3. That pending IRQ causes an immediate - spurious - wake. The solution is to move the clockevents suspend/resume notification out of the syscore_suspend step and explictly call them at the appropriate time in the suspend/hibernation paths. I.e. timers are suspend _before_ IRQs get disabled. And accordingly in the resume path. Signed-off-by: Soren Brinkmann soren.brinkm...@xilinx.com --- Hi, there was not a lot of discussion on the last submission. Just one comment from Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in my response, does not apply, IMHO, since the platform does not re-enable interrupts. Well, you just don't agree with it. The problem with your approach is that timer interrupts aren't actually as special as you think and any other IRQF_NO_SUSPEND interrupts would have caused similar issues to appear under specific conditions. The solution I would suggest and that actually covers all IRQF_NO_SUSPEND interrupts would be to use a wait_event() loop like the one in freeze_enter() (on top of the current linux-next or the pm-genirq branch of linux-pm.git), but wait for pm_abort_suspend to become true, to implement system suspend. sorry, it took me a while since I needed to get some dependencies ported to the pm-genirq base. Once I had that, it reproduced my original issue. So far so good. I then looked into finding a solution following your guidance. I'm not sure I really found what you had in mind, but below is what I came up with, which seems to do it. Please let me know how far off I am. Thanks, Sören ---8--888--- diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c index c2744b30d5d9..a4f9914571f1 100644 --- a/drivers/base/power/wakeup.c +++ b/drivers/base/power/wakeup.c @@ -25,7 +25,7 @@ bool events_check_enabled __read_mostly; /* If set and the system is suspending, terminate the suspend. */ -static bool pm_abort_suspend __read_mostly; +bool pm_abort_suspend __read_mostly; /* * Combined counters of registered wakeup events and wakeup events in progress. diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index 6dadb25cb0d8..e6a6de8f76d0 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -33,6 +33,7 @@ static const char *pm_labels[] = { mem, standby, freeze, }; const char *pm_states[PM_SUSPEND_MAX]; +extern bool pm_abort_suspend; static const struct platform_suspend_ops *suspend_ops; static const struct platform_freeze_ops *freeze_ops; @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) if (error || suspend_test(TEST_CPUS)) goto Enable_cpus; - arch_suspend_disable_irqs(); - BUG_ON(!irqs_disabled()); - - error = syscore_suspend(); - if (!error) { - *wakeup = pm_wakeup_pending(); - if (!(suspend_test(TEST_CORE) || *wakeup)) { - trace_suspend_resume(TPS(machine_suspend), - state, true); - error = suspend_ops-enter(state); - trace_suspend_resume(TPS(machine_suspend), -
Re: [PATCH RESEND] PM / sleep: Fix racing timers
Sorry for the delay, but a lot of stuff came in between and I need to rebase a couple of branches and re-test things. Some comments inline below: On Mon, 2015-01-12 at 04:14PM +, Lorenzo Pieralisi wrote: On Mon, Jan 12, 2015 at 03:55:05PM +, Sören Brinkmann wrote: On Mon, 2015-01-12 at 03:43PM +, Lorenzo Pieralisi wrote: Hi Rafael, Soren, On Sun, Jan 11, 2015 at 11:20:36PM +, Rafael J. Wysocki wrote: On Friday, January 09, 2015 01:50:59 PM Sören Brinkmann wrote: On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote: Hi Rafael, On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote: On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote: Hi Rafael, Hi, Sorry for the huge delay. On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote: On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote: On platforms that do not power off during suspend, successfully entering suspend races with timers. The race happening in a couple of location is: 1. disable IRQs (e.g. arch_suspend_disable_irqs()) ... 2. syscore_suspend() - timekeeping_suspend() - clockevents_notify(SUSPEND) - tick_suspend() (timers are turned off here) ... 3. wfi(wait for wake-IRQ here) Between steps 1 and 2 the timers can still generate interrupts that are not handled and stay pending until step 3. That pending IRQ causes an immediate - spurious - wake. The solution is to move the clockevents suspend/resume notification out of the syscore_suspend step and explictly call them at the appropriate time in the suspend/hibernation paths. I.e. timers are suspend _before_ IRQs get disabled. And accordingly in the resume path. Signed-off-by: Soren Brinkmann soren.brinkm...@xilinx.com --- Hi, there was not a lot of discussion on the last submission. Just one comment from Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in my response, does not apply, IMHO, since the platform does not re-enable interrupts. Well, you just don't agree with it. The problem with your approach is that timer interrupts aren't actually as special as you think and any other IRQF_NO_SUSPEND interrupts would have caused similar issues to appear under specific conditions. The solution I would suggest and that actually covers all IRQF_NO_SUSPEND interrupts would be to use a wait_event() loop like the one in freeze_enter() (on top of the current linux-next or the pm-genirq branch of linux-pm.git), but wait for pm_abort_suspend to become true, to implement system suspend. sorry, it took me a while since I needed to get some dependencies ported to the pm-genirq base. Once I had that, it reproduced my original issue. So far so good. I then looked into finding a solution following your guidance. I'm not sure I really found what you had in mind, but below is what I came up with, which seems to do it. Please let me know how far off I am. Thanks, Sören ---8--888--- diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c index c2744b30d5d9..a4f9914571f1 100644 --- a/drivers/base/power/wakeup.c +++ b/drivers/base/power/wakeup.c @@ -25,7 +25,7 @@ bool events_check_enabled __read_mostly; /* If set and the system is suspending, terminate the suspend. */ -static bool pm_abort_suspend __read_mostly; +bool pm_abort_suspend __read_mostly; /* * Combined counters of registered wakeup events and wakeup events in progress. diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index 6dadb25cb0d8..e6a6de8f76d0 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -33,6 +33,7 @@ static const char *pm_labels[] = { mem, standby, freeze, }; const char *pm_states[PM_SUSPEND_MAX]; +extern bool pm_abort_suspend; static const struct platform_suspend_ops *suspend_ops; static const struct platform_freeze_ops
Re: [PATCH RESEND] PM / sleep: Fix racing timers
On Mon, Jan 12, 2015 at 03:55:05PM +, Sören Brinkmann wrote: > On Mon, 2015-01-12 at 03:43PM +, Lorenzo Pieralisi wrote: > > Hi Rafael, Soren, > > > > On Sun, Jan 11, 2015 at 11:20:36PM +, Rafael J. Wysocki wrote: > > > On Friday, January 09, 2015 01:50:59 PM Sören Brinkmann wrote: > > > > On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote: > > > > > Hi Rafael, > > > > > > > > > > On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote: > > > > > > On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote: > > > > > > > Hi Rafael, > > > > > > > > > > > > Hi, > > > > > > > > > > > > Sorry for the huge delay. > > > > > > > > > > > > > On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote: > > > > > > > > On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote: > > > > > > > > > On platforms that do not power off during suspend, > > > > > > > > > successfully entering > > > > > > > > > suspend races with timers. > > > > > > > > > > > > > > > > > > The race happening in a couple of location is: > > > > > > > > > > > > > > > > > > 1. disable IRQs (e.g. > > > > > > > > > arch_suspend_disable_irqs()) > > > > > > > > > ... > > > > > > > > > 2. syscore_suspend() > > > > > > > > > -> timekeeping_suspend() > > > > > > > > >-> clockevents_notify(SUSPEND) > > > > > > > > > -> tick_suspend() (timers are turned off here) > > > > > > > > > ... > > > > > > > > > 3. wfi(wait for wake-IRQ here) > > > > > > > > > > > > > > > > > > Between steps 1 and 2 the timers can still generate > > > > > > > > > interrupts that are > > > > > > > > > not handled and stay pending until step 3. That pending IRQ > > > > > > > > > causes an > > > > > > > > > immediate - spurious - wake. > > > > > > > > > > > > > > > > > > The solution is to move the clockevents suspend/resume > > > > > > > > > notification > > > > > > > > > out of the syscore_suspend step and explictly call them at > > > > > > > > > the appropriate > > > > > > > > > time in the suspend/hibernation paths. I.e. timers are > > > > > > > > > suspend _before_ > > > > > > > > > IRQs get disabled. And accordingly in the resume path. > > > > > > > > > > > > > > > > > > Signed-off-by: Soren Brinkmann > > > > > > > > > --- > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > there was not a lot of discussion on the last submission. > > > > > > > > > Just one comment from > > > > > > > > > Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I > > > > > > > > > outlined in my > > > > > > > > > response, does not apply, IMHO, since the platform does not > > > > > > > > > re-enable > > > > > > > > > interrupts. > > > > > > > > > > > > > > > > Well, you just don't agree with it. > > > > > > > > > > > > > > > > The problem with your approach is that timer interrupts aren't > > > > > > > > actually as > > > > > > > > special as you think and any other IRQF_NO_SUSPEND interrupts > > > > > > > > would have caused > > > > > > > > similar issues to appear under specific conditions. > > > > > > > > > > > > > > > > The solution I would suggest and that actually covers all > > > > > > > > IRQF_NO_SUSPEND > > > > > > > > interrupts would be to use a wait_event() loop like the one in > > > > > > > > freeze_enter() > > > > > > > > (on top of the current linux-next or the pm-genirq branch of > > > > > > > > linux-pm.git), > > > > > > > > but wait for pm_abort_suspend to become true, to implement > > > > > > > > system suspend. > > > > > > > > > > > > > > sorry, it took me a while since I needed to get some dependencies > > > > > > > ported > > > > > > > to the pm-genirq base. Once I had that, it reproduced my original > > > > > > > issue. > > > > > > > So far so good. I then looked into finding a solution following > > > > > > > your > > > > > > > guidance. I'm not sure I really found what you had in mind, but > > > > > > > below is > > > > > > > what I came up with, which seems to do it. > > > > > > > Please let me know how far off I am. > > > > > > > > > > > > > > Thanks, > > > > > > > Sören > > > > > > > > > > > > > > ---8<--8<8<8<--- > > > > > > > diff --git a/drivers/base/power/wakeup.c > > > > > > > b/drivers/base/power/wakeup.c > > > > > > > index c2744b30d5d9..a4f9914571f1 100644 > > > > > > > --- a/drivers/base/power/wakeup.c > > > > > > > +++ b/drivers/base/power/wakeup.c > > > > > > > @@ -25,7 +25,7 @@ > > > > > > > bool events_check_enabled __read_mostly; > > > > > > > > > > > > > > /* If set and the system is suspending, terminate the suspend. */ > > > > > > > -static bool pm_abort_suspend __read_mostly; > > > > > > > +bool pm_abort_suspend __read_mostly; > > > > > > > > > > > > > > /* > > > > > > > * Combined counters of registered wakeup events and wakeup > > > > > > > events in progress. > > > > > > > diff --git a/kernel/power/suspend.c
Re: [PATCH RESEND] PM / sleep: Fix racing timers
On Mon, 2015-01-12 at 03:43PM +, Lorenzo Pieralisi wrote: > Hi Rafael, Soren, > > On Sun, Jan 11, 2015 at 11:20:36PM +, Rafael J. Wysocki wrote: > > On Friday, January 09, 2015 01:50:59 PM Sören Brinkmann wrote: > > > On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote: > > > > Hi Rafael, > > > > > > > > On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote: > > > > > On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote: > > > > > > Hi Rafael, > > > > > > > > > > Hi, > > > > > > > > > > Sorry for the huge delay. > > > > > > > > > > > On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote: > > > > > > > On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote: > > > > > > > > On platforms that do not power off during suspend, successfully > > > > > > > > entering > > > > > > > > suspend races with timers. > > > > > > > > > > > > > > > > The race happening in a couple of location is: > > > > > > > > > > > > > > > > 1. disable IRQs (e.g. > > > > > > > > arch_suspend_disable_irqs()) > > > > > > > > ... > > > > > > > > 2. syscore_suspend() > > > > > > > > -> timekeeping_suspend() > > > > > > > >-> clockevents_notify(SUSPEND) > > > > > > > > -> tick_suspend() (timers are turned off here) > > > > > > > > ... > > > > > > > > 3. wfi(wait for wake-IRQ here) > > > > > > > > > > > > > > > > Between steps 1 and 2 the timers can still generate interrupts > > > > > > > > that are > > > > > > > > not handled and stay pending until step 3. That pending IRQ > > > > > > > > causes an > > > > > > > > immediate - spurious - wake. > > > > > > > > > > > > > > > > The solution is to move the clockevents suspend/resume > > > > > > > > notification > > > > > > > > out of the syscore_suspend step and explictly call them at the > > > > > > > > appropriate > > > > > > > > time in the suspend/hibernation paths. I.e. timers are suspend > > > > > > > > _before_ > > > > > > > > IRQs get disabled. And accordingly in the resume path. > > > > > > > > > > > > > > > > Signed-off-by: Soren Brinkmann > > > > > > > > --- > > > > > > > > Hi, > > > > > > > > > > > > > > > > there was not a lot of discussion on the last submission. Just > > > > > > > > one comment from > > > > > > > > Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I > > > > > > > > outlined in my > > > > > > > > response, does not apply, IMHO, since the platform does not > > > > > > > > re-enable > > > > > > > > interrupts. > > > > > > > > > > > > > > Well, you just don't agree with it. > > > > > > > > > > > > > > The problem with your approach is that timer interrupts aren't > > > > > > > actually as > > > > > > > special as you think and any other IRQF_NO_SUSPEND interrupts > > > > > > > would have caused > > > > > > > similar issues to appear under specific conditions. > > > > > > > > > > > > > > The solution I would suggest and that actually covers all > > > > > > > IRQF_NO_SUSPEND > > > > > > > interrupts would be to use a wait_event() loop like the one in > > > > > > > freeze_enter() > > > > > > > (on top of the current linux-next or the pm-genirq branch of > > > > > > > linux-pm.git), > > > > > > > but wait for pm_abort_suspend to become true, to implement system > > > > > > > suspend. > > > > > > > > > > > > sorry, it took me a while since I needed to get some dependencies > > > > > > ported > > > > > > to the pm-genirq base. Once I had that, it reproduced my original > > > > > > issue. > > > > > > So far so good. I then looked into finding a solution following your > > > > > > guidance. I'm not sure I really found what you had in mind, but > > > > > > below is > > > > > > what I came up with, which seems to do it. > > > > > > Please let me know how far off I am. > > > > > > > > > > > > Thanks, > > > > > > Sören > > > > > > > > > > > > ---8<--8<8<8<--- > > > > > > diff --git a/drivers/base/power/wakeup.c > > > > > > b/drivers/base/power/wakeup.c > > > > > > index c2744b30d5d9..a4f9914571f1 100644 > > > > > > --- a/drivers/base/power/wakeup.c > > > > > > +++ b/drivers/base/power/wakeup.c > > > > > > @@ -25,7 +25,7 @@ > > > > > > bool events_check_enabled __read_mostly; > > > > > > > > > > > > /* If set and the system is suspending, terminate the suspend. */ > > > > > > -static bool pm_abort_suspend __read_mostly; > > > > > > +bool pm_abort_suspend __read_mostly; > > > > > > > > > > > > /* > > > > > > * Combined counters of registered wakeup events and wakeup events > > > > > > in progress. > > > > > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > > > > > > index 6dadb25cb0d8..e6a6de8f76d0 100644 > > > > > > --- a/kernel/power/suspend.c > > > > > > +++ b/kernel/power/suspend.c > > > > > > @@ -33,6 +33,7 @@ > > > > > > > > > > > > static const char *pm_labels[] = { "mem", "standby", "freeze", }; > > > >
Re: [PATCH RESEND] PM / sleep: Fix racing timers
Hi Rafael, Soren, On Sun, Jan 11, 2015 at 11:20:36PM +, Rafael J. Wysocki wrote: > On Friday, January 09, 2015 01:50:59 PM Sören Brinkmann wrote: > > On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote: > > > Hi Rafael, > > > > > > On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote: > > > > On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote: > > > > > Hi Rafael, > > > > > > > > Hi, > > > > > > > > Sorry for the huge delay. > > > > > > > > > On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote: > > > > > > On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote: > > > > > > > On platforms that do not power off during suspend, successfully > > > > > > > entering > > > > > > > suspend races with timers. > > > > > > > > > > > > > > The race happening in a couple of location is: > > > > > > > > > > > > > > 1. disable IRQs (e.g. > > > > > > > arch_suspend_disable_irqs()) > > > > > > > ... > > > > > > > 2. syscore_suspend() > > > > > > > -> timekeeping_suspend() > > > > > > >-> clockevents_notify(SUSPEND) > > > > > > > -> tick_suspend() (timers are turned off here) > > > > > > > ... > > > > > > > 3. wfi (wait for wake-IRQ here) > > > > > > > > > > > > > > Between steps 1 and 2 the timers can still generate interrupts > > > > > > > that are > > > > > > > not handled and stay pending until step 3. That pending IRQ > > > > > > > causes an > > > > > > > immediate - spurious - wake. > > > > > > > > > > > > > > The solution is to move the clockevents suspend/resume > > > > > > > notification > > > > > > > out of the syscore_suspend step and explictly call them at the > > > > > > > appropriate > > > > > > > time in the suspend/hibernation paths. I.e. timers are suspend > > > > > > > _before_ > > > > > > > IRQs get disabled. And accordingly in the resume path. > > > > > > > > > > > > > > Signed-off-by: Soren Brinkmann > > > > > > > --- > > > > > > > Hi, > > > > > > > > > > > > > > there was not a lot of discussion on the last submission. Just > > > > > > > one comment from > > > > > > > Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I > > > > > > > outlined in my > > > > > > > response, does not apply, IMHO, since the platform does not > > > > > > > re-enable > > > > > > > interrupts. > > > > > > > > > > > > Well, you just don't agree with it. > > > > > > > > > > > > The problem with your approach is that timer interrupts aren't > > > > > > actually as > > > > > > special as you think and any other IRQF_NO_SUSPEND interrupts would > > > > > > have caused > > > > > > similar issues to appear under specific conditions. > > > > > > > > > > > > The solution I would suggest and that actually covers all > > > > > > IRQF_NO_SUSPEND > > > > > > interrupts would be to use a wait_event() loop like the one in > > > > > > freeze_enter() > > > > > > (on top of the current linux-next or the pm-genirq branch of > > > > > > linux-pm.git), > > > > > > but wait for pm_abort_suspend to become true, to implement system > > > > > > suspend. > > > > > > > > > > sorry, it took me a while since I needed to get some dependencies > > > > > ported > > > > > to the pm-genirq base. Once I had that, it reproduced my original > > > > > issue. > > > > > So far so good. I then looked into finding a solution following your > > > > > guidance. I'm not sure I really found what you had in mind, but below > > > > > is > > > > > what I came up with, which seems to do it. > > > > > Please let me know how far off I am. > > > > > > > > > > Thanks, > > > > > Sören > > > > > > > > > > ---8<--8<8<8<--- > > > > > diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c > > > > > index c2744b30d5d9..a4f9914571f1 100644 > > > > > --- a/drivers/base/power/wakeup.c > > > > > +++ b/drivers/base/power/wakeup.c > > > > > @@ -25,7 +25,7 @@ > > > > > bool events_check_enabled __read_mostly; > > > > > > > > > > /* If set and the system is suspending, terminate the suspend. */ > > > > > -static bool pm_abort_suspend __read_mostly; > > > > > +bool pm_abort_suspend __read_mostly; > > > > > > > > > > /* > > > > > * Combined counters of registered wakeup events and wakeup events > > > > > in progress. > > > > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > > > > > index 6dadb25cb0d8..e6a6de8f76d0 100644 > > > > > --- a/kernel/power/suspend.c > > > > > +++ b/kernel/power/suspend.c > > > > > @@ -33,6 +33,7 @@ > > > > > > > > > > static const char *pm_labels[] = { "mem", "standby", "freeze", }; > > > > > const char *pm_states[PM_SUSPEND_MAX]; > > > > > +extern bool pm_abort_suspend; > > > > > > > > > > static const struct platform_suspend_ops *suspend_ops; > > > > > static const struct platform_freeze_ops *freeze_ops; > > > > > @@ -294,25 +295,27 @@ static int
Re: [PATCH RESEND] PM / sleep: Fix racing timers
Hi Rafael, Soren, On Sun, Jan 11, 2015 at 11:20:36PM +, Rafael J. Wysocki wrote: On Friday, January 09, 2015 01:50:59 PM Sören Brinkmann wrote: On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote: Hi Rafael, On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote: On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote: Hi Rafael, Hi, Sorry for the huge delay. On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote: On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote: On platforms that do not power off during suspend, successfully entering suspend races with timers. The race happening in a couple of location is: 1. disable IRQs (e.g. arch_suspend_disable_irqs()) ... 2. syscore_suspend() - timekeeping_suspend() - clockevents_notify(SUSPEND) - tick_suspend() (timers are turned off here) ... 3. wfi (wait for wake-IRQ here) Between steps 1 and 2 the timers can still generate interrupts that are not handled and stay pending until step 3. That pending IRQ causes an immediate - spurious - wake. The solution is to move the clockevents suspend/resume notification out of the syscore_suspend step and explictly call them at the appropriate time in the suspend/hibernation paths. I.e. timers are suspend _before_ IRQs get disabled. And accordingly in the resume path. Signed-off-by: Soren Brinkmann soren.brinkm...@xilinx.com --- Hi, there was not a lot of discussion on the last submission. Just one comment from Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in my response, does not apply, IMHO, since the platform does not re-enable interrupts. Well, you just don't agree with it. The problem with your approach is that timer interrupts aren't actually as special as you think and any other IRQF_NO_SUSPEND interrupts would have caused similar issues to appear under specific conditions. The solution I would suggest and that actually covers all IRQF_NO_SUSPEND interrupts would be to use a wait_event() loop like the one in freeze_enter() (on top of the current linux-next or the pm-genirq branch of linux-pm.git), but wait for pm_abort_suspend to become true, to implement system suspend. sorry, it took me a while since I needed to get some dependencies ported to the pm-genirq base. Once I had that, it reproduced my original issue. So far so good. I then looked into finding a solution following your guidance. I'm not sure I really found what you had in mind, but below is what I came up with, which seems to do it. Please let me know how far off I am. Thanks, Sören ---8--888--- diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c index c2744b30d5d9..a4f9914571f1 100644 --- a/drivers/base/power/wakeup.c +++ b/drivers/base/power/wakeup.c @@ -25,7 +25,7 @@ bool events_check_enabled __read_mostly; /* If set and the system is suspending, terminate the suspend. */ -static bool pm_abort_suspend __read_mostly; +bool pm_abort_suspend __read_mostly; /* * Combined counters of registered wakeup events and wakeup events in progress. diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index 6dadb25cb0d8..e6a6de8f76d0 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -33,6 +33,7 @@ static const char *pm_labels[] = { mem, standby, freeze, }; const char *pm_states[PM_SUSPEND_MAX]; +extern bool pm_abort_suspend; static const struct platform_suspend_ops *suspend_ops; static const struct platform_freeze_ops *freeze_ops; @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) if (error || suspend_test(TEST_CPUS)) goto Enable_cpus; - arch_suspend_disable_irqs(); - BUG_ON(!irqs_disabled()); - - error = syscore_suspend(); - if (!error) { - *wakeup = pm_wakeup_pending(); - if (!(suspend_test(TEST_CORE) || *wakeup)) { - trace_suspend_resume(TPS(machine_suspend), - state, true); - error = suspend_ops-enter(state); -
Re: [PATCH RESEND] PM / sleep: Fix racing timers
On Mon, 2015-01-12 at 03:43PM +, Lorenzo Pieralisi wrote: Hi Rafael, Soren, On Sun, Jan 11, 2015 at 11:20:36PM +, Rafael J. Wysocki wrote: On Friday, January 09, 2015 01:50:59 PM Sören Brinkmann wrote: On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote: Hi Rafael, On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote: On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote: Hi Rafael, Hi, Sorry for the huge delay. On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote: On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote: On platforms that do not power off during suspend, successfully entering suspend races with timers. The race happening in a couple of location is: 1. disable IRQs (e.g. arch_suspend_disable_irqs()) ... 2. syscore_suspend() - timekeeping_suspend() - clockevents_notify(SUSPEND) - tick_suspend() (timers are turned off here) ... 3. wfi(wait for wake-IRQ here) Between steps 1 and 2 the timers can still generate interrupts that are not handled and stay pending until step 3. That pending IRQ causes an immediate - spurious - wake. The solution is to move the clockevents suspend/resume notification out of the syscore_suspend step and explictly call them at the appropriate time in the suspend/hibernation paths. I.e. timers are suspend _before_ IRQs get disabled. And accordingly in the resume path. Signed-off-by: Soren Brinkmann soren.brinkm...@xilinx.com --- Hi, there was not a lot of discussion on the last submission. Just one comment from Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in my response, does not apply, IMHO, since the platform does not re-enable interrupts. Well, you just don't agree with it. The problem with your approach is that timer interrupts aren't actually as special as you think and any other IRQF_NO_SUSPEND interrupts would have caused similar issues to appear under specific conditions. The solution I would suggest and that actually covers all IRQF_NO_SUSPEND interrupts would be to use a wait_event() loop like the one in freeze_enter() (on top of the current linux-next or the pm-genirq branch of linux-pm.git), but wait for pm_abort_suspend to become true, to implement system suspend. sorry, it took me a while since I needed to get some dependencies ported to the pm-genirq base. Once I had that, it reproduced my original issue. So far so good. I then looked into finding a solution following your guidance. I'm not sure I really found what you had in mind, but below is what I came up with, which seems to do it. Please let me know how far off I am. Thanks, Sören ---8--888--- diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c index c2744b30d5d9..a4f9914571f1 100644 --- a/drivers/base/power/wakeup.c +++ b/drivers/base/power/wakeup.c @@ -25,7 +25,7 @@ bool events_check_enabled __read_mostly; /* If set and the system is suspending, terminate the suspend. */ -static bool pm_abort_suspend __read_mostly; +bool pm_abort_suspend __read_mostly; /* * Combined counters of registered wakeup events and wakeup events in progress. diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index 6dadb25cb0d8..e6a6de8f76d0 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -33,6 +33,7 @@ static const char *pm_labels[] = { mem, standby, freeze, }; const char *pm_states[PM_SUSPEND_MAX]; +extern bool pm_abort_suspend; static const struct platform_suspend_ops *suspend_ops; static const struct platform_freeze_ops *freeze_ops; @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) if (error || suspend_test(TEST_CPUS)) goto Enable_cpus; - arch_suspend_disable_irqs(); - BUG_ON(!irqs_disabled()); - - error = syscore_suspend(); - if (!error) { - *wakeup = pm_wakeup_pending(); - if (!(suspend_test(TEST_CORE) || *wakeup)) { - trace_suspend_resume(TPS(machine_suspend),
Re: [PATCH RESEND] PM / sleep: Fix racing timers
On Mon, Jan 12, 2015 at 03:55:05PM +, Sören Brinkmann wrote: On Mon, 2015-01-12 at 03:43PM +, Lorenzo Pieralisi wrote: Hi Rafael, Soren, On Sun, Jan 11, 2015 at 11:20:36PM +, Rafael J. Wysocki wrote: On Friday, January 09, 2015 01:50:59 PM Sören Brinkmann wrote: On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote: Hi Rafael, On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote: On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote: Hi Rafael, Hi, Sorry for the huge delay. On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote: On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote: On platforms that do not power off during suspend, successfully entering suspend races with timers. The race happening in a couple of location is: 1. disable IRQs (e.g. arch_suspend_disable_irqs()) ... 2. syscore_suspend() - timekeeping_suspend() - clockevents_notify(SUSPEND) - tick_suspend() (timers are turned off here) ... 3. wfi(wait for wake-IRQ here) Between steps 1 and 2 the timers can still generate interrupts that are not handled and stay pending until step 3. That pending IRQ causes an immediate - spurious - wake. The solution is to move the clockevents suspend/resume notification out of the syscore_suspend step and explictly call them at the appropriate time in the suspend/hibernation paths. I.e. timers are suspend _before_ IRQs get disabled. And accordingly in the resume path. Signed-off-by: Soren Brinkmann soren.brinkm...@xilinx.com --- Hi, there was not a lot of discussion on the last submission. Just one comment from Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in my response, does not apply, IMHO, since the platform does not re-enable interrupts. Well, you just don't agree with it. The problem with your approach is that timer interrupts aren't actually as special as you think and any other IRQF_NO_SUSPEND interrupts would have caused similar issues to appear under specific conditions. The solution I would suggest and that actually covers all IRQF_NO_SUSPEND interrupts would be to use a wait_event() loop like the one in freeze_enter() (on top of the current linux-next or the pm-genirq branch of linux-pm.git), but wait for pm_abort_suspend to become true, to implement system suspend. sorry, it took me a while since I needed to get some dependencies ported to the pm-genirq base. Once I had that, it reproduced my original issue. So far so good. I then looked into finding a solution following your guidance. I'm not sure I really found what you had in mind, but below is what I came up with, which seems to do it. Please let me know how far off I am. Thanks, Sören ---8--888--- diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c index c2744b30d5d9..a4f9914571f1 100644 --- a/drivers/base/power/wakeup.c +++ b/drivers/base/power/wakeup.c @@ -25,7 +25,7 @@ bool events_check_enabled __read_mostly; /* If set and the system is suspending, terminate the suspend. */ -static bool pm_abort_suspend __read_mostly; +bool pm_abort_suspend __read_mostly; /* * Combined counters of registered wakeup events and wakeup events in progress. diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index 6dadb25cb0d8..e6a6de8f76d0 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -33,6 +33,7 @@ static const char *pm_labels[] = { mem, standby, freeze, }; const char *pm_states[PM_SUSPEND_MAX]; +extern bool pm_abort_suspend; static const struct platform_suspend_ops *suspend_ops; static const struct platform_freeze_ops *freeze_ops; @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) if (error || suspend_test(TEST_CPUS)) goto Enable_cpus; - arch_suspend_disable_irqs(); - BUG_ON(!irqs_disabled()); - - error = syscore_suspend(); - if (!error) { -
Re: [PATCH RESEND] PM / sleep: Fix racing timers
On Friday, January 09, 2015 01:50:59 PM Sören Brinkmann wrote: > On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote: > > Hi Rafael, > > > > On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote: > > > On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote: > > > > Hi Rafael, > > > > > > Hi, > > > > > > Sorry for the huge delay. > > > > > > > On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote: > > > > > On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote: > > > > > > On platforms that do not power off during suspend, successfully > > > > > > entering > > > > > > suspend races with timers. > > > > > > > > > > > > The race happening in a couple of location is: > > > > > > > > > > > > 1. disable IRQs (e.g. > > > > > > arch_suspend_disable_irqs()) > > > > > > ... > > > > > > 2. syscore_suspend() > > > > > > -> timekeeping_suspend() > > > > > >-> clockevents_notify(SUSPEND) > > > > > > -> tick_suspend() (timers are turned off here) > > > > > > ... > > > > > > 3. wfi(wait for wake-IRQ here) > > > > > > > > > > > > Between steps 1 and 2 the timers can still generate interrupts that > > > > > > are > > > > > > not handled and stay pending until step 3. That pending IRQ causes > > > > > > an > > > > > > immediate - spurious - wake. > > > > > > > > > > > > The solution is to move the clockevents suspend/resume notification > > > > > > out of the syscore_suspend step and explictly call them at the > > > > > > appropriate > > > > > > time in the suspend/hibernation paths. I.e. timers are suspend > > > > > > _before_ > > > > > > IRQs get disabled. And accordingly in the resume path. > > > > > > > > > > > > Signed-off-by: Soren Brinkmann > > > > > > --- > > > > > > Hi, > > > > > > > > > > > > there was not a lot of discussion on the last submission. Just one > > > > > > comment from > > > > > > Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined > > > > > > in my > > > > > > response, does not apply, IMHO, since the platform does not > > > > > > re-enable > > > > > > interrupts. > > > > > > > > > > Well, you just don't agree with it. > > > > > > > > > > The problem with your approach is that timer interrupts aren't > > > > > actually as > > > > > special as you think and any other IRQF_NO_SUSPEND interrupts would > > > > > have caused > > > > > similar issues to appear under specific conditions. > > > > > > > > > > The solution I would suggest and that actually covers all > > > > > IRQF_NO_SUSPEND > > > > > interrupts would be to use a wait_event() loop like the one in > > > > > freeze_enter() > > > > > (on top of the current linux-next or the pm-genirq branch of > > > > > linux-pm.git), > > > > > but wait for pm_abort_suspend to become true, to implement system > > > > > suspend. > > > > > > > > sorry, it took me a while since I needed to get some dependencies ported > > > > to the pm-genirq base. Once I had that, it reproduced my original issue. > > > > So far so good. I then looked into finding a solution following your > > > > guidance. I'm not sure I really found what you had in mind, but below is > > > > what I came up with, which seems to do it. > > > > Please let me know how far off I am. > > > > > > > > Thanks, > > > > Sören > > > > > > > > ---8<--8<8<8<--- > > > > diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c > > > > index c2744b30d5d9..a4f9914571f1 100644 > > > > --- a/drivers/base/power/wakeup.c > > > > +++ b/drivers/base/power/wakeup.c > > > > @@ -25,7 +25,7 @@ > > > > bool events_check_enabled __read_mostly; > > > > > > > > /* If set and the system is suspending, terminate the suspend. */ > > > > -static bool pm_abort_suspend __read_mostly; > > > > +bool pm_abort_suspend __read_mostly; > > > > > > > > /* > > > > * Combined counters of registered wakeup events and wakeup events in > > > > progress. > > > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > > > > index 6dadb25cb0d8..e6a6de8f76d0 100644 > > > > --- a/kernel/power/suspend.c > > > > +++ b/kernel/power/suspend.c > > > > @@ -33,6 +33,7 @@ > > > > > > > > static const char *pm_labels[] = { "mem", "standby", "freeze", }; > > > > const char *pm_states[PM_SUSPEND_MAX]; > > > > +extern bool pm_abort_suspend; > > > > > > > > static const struct platform_suspend_ops *suspend_ops; > > > > static const struct platform_freeze_ops *freeze_ops; > > > > @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, > > > > bool *wakeup) > > > > if (error || suspend_test(TEST_CPUS)) > > > > goto Enable_cpus; > > > > > > > > - arch_suspend_disable_irqs(); > > > > - BUG_ON(!irqs_disabled()); > > > > - > > > > - error = syscore_suspend(); > > > > - if (!error) { > > > > -
Re: [PATCH RESEND] PM / sleep: Fix racing timers
On Friday, January 09, 2015 01:50:59 PM Sören Brinkmann wrote: On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote: Hi Rafael, On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote: On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote: Hi Rafael, Hi, Sorry for the huge delay. On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote: On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote: On platforms that do not power off during suspend, successfully entering suspend races with timers. The race happening in a couple of location is: 1. disable IRQs (e.g. arch_suspend_disable_irqs()) ... 2. syscore_suspend() - timekeeping_suspend() - clockevents_notify(SUSPEND) - tick_suspend() (timers are turned off here) ... 3. wfi(wait for wake-IRQ here) Between steps 1 and 2 the timers can still generate interrupts that are not handled and stay pending until step 3. That pending IRQ causes an immediate - spurious - wake. The solution is to move the clockevents suspend/resume notification out of the syscore_suspend step and explictly call them at the appropriate time in the suspend/hibernation paths. I.e. timers are suspend _before_ IRQs get disabled. And accordingly in the resume path. Signed-off-by: Soren Brinkmann soren.brinkm...@xilinx.com --- Hi, there was not a lot of discussion on the last submission. Just one comment from Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in my response, does not apply, IMHO, since the platform does not re-enable interrupts. Well, you just don't agree with it. The problem with your approach is that timer interrupts aren't actually as special as you think and any other IRQF_NO_SUSPEND interrupts would have caused similar issues to appear under specific conditions. The solution I would suggest and that actually covers all IRQF_NO_SUSPEND interrupts would be to use a wait_event() loop like the one in freeze_enter() (on top of the current linux-next or the pm-genirq branch of linux-pm.git), but wait for pm_abort_suspend to become true, to implement system suspend. sorry, it took me a while since I needed to get some dependencies ported to the pm-genirq base. Once I had that, it reproduced my original issue. So far so good. I then looked into finding a solution following your guidance. I'm not sure I really found what you had in mind, but below is what I came up with, which seems to do it. Please let me know how far off I am. Thanks, Sören ---8--888--- diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c index c2744b30d5d9..a4f9914571f1 100644 --- a/drivers/base/power/wakeup.c +++ b/drivers/base/power/wakeup.c @@ -25,7 +25,7 @@ bool events_check_enabled __read_mostly; /* If set and the system is suspending, terminate the suspend. */ -static bool pm_abort_suspend __read_mostly; +bool pm_abort_suspend __read_mostly; /* * Combined counters of registered wakeup events and wakeup events in progress. diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index 6dadb25cb0d8..e6a6de8f76d0 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -33,6 +33,7 @@ static const char *pm_labels[] = { mem, standby, freeze, }; const char *pm_states[PM_SUSPEND_MAX]; +extern bool pm_abort_suspend; static const struct platform_suspend_ops *suspend_ops; static const struct platform_freeze_ops *freeze_ops; @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) if (error || suspend_test(TEST_CPUS)) goto Enable_cpus; - arch_suspend_disable_irqs(); - BUG_ON(!irqs_disabled()); - - error = syscore_suspend(); - if (!error) { - *wakeup = pm_wakeup_pending(); - if (!(suspend_test(TEST_CORE) || *wakeup)) { - trace_suspend_resume(TPS(machine_suspend), - state, true); - error = suspend_ops-enter(state); - trace_suspend_resume(TPS(machine_suspend), - state, false); - events_check_enabled = false; + while (!pm_abort_suspend) { That won't work in general, because
Re: [PATCH RESEND] PM / sleep: Fix racing timers
On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote: > Hi Rafael, > > On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote: > > On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote: > > > Hi Rafael, > > > > Hi, > > > > Sorry for the huge delay. > > > > > On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote: > > > > On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote: > > > > > On platforms that do not power off during suspend, successfully > > > > > entering > > > > > suspend races with timers. > > > > > > > > > > The race happening in a couple of location is: > > > > > > > > > > 1. disable IRQs (e.g. arch_suspend_disable_irqs()) > > > > > ... > > > > > 2. syscore_suspend() > > > > > -> timekeeping_suspend() > > > > >-> clockevents_notify(SUSPEND) > > > > > -> tick_suspend() (timers are turned off here) > > > > > ... > > > > > 3. wfi (wait for wake-IRQ here) > > > > > > > > > > Between steps 1 and 2 the timers can still generate interrupts that > > > > > are > > > > > not handled and stay pending until step 3. That pending IRQ causes an > > > > > immediate - spurious - wake. > > > > > > > > > > The solution is to move the clockevents suspend/resume notification > > > > > out of the syscore_suspend step and explictly call them at the > > > > > appropriate > > > > > time in the suspend/hibernation paths. I.e. timers are suspend > > > > > _before_ > > > > > IRQs get disabled. And accordingly in the resume path. > > > > > > > > > > Signed-off-by: Soren Brinkmann > > > > > --- > > > > > Hi, > > > > > > > > > > there was not a lot of discussion on the last submission. Just one > > > > > comment from > > > > > Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined > > > > > in my > > > > > response, does not apply, IMHO, since the platform does not re-enable > > > > > interrupts. > > > > > > > > Well, you just don't agree with it. > > > > > > > > The problem with your approach is that timer interrupts aren't actually > > > > as > > > > special as you think and any other IRQF_NO_SUSPEND interrupts would > > > > have caused > > > > similar issues to appear under specific conditions. > > > > > > > > The solution I would suggest and that actually covers all > > > > IRQF_NO_SUSPEND > > > > interrupts would be to use a wait_event() loop like the one in > > > > freeze_enter() > > > > (on top of the current linux-next or the pm-genirq branch of > > > > linux-pm.git), > > > > but wait for pm_abort_suspend to become true, to implement system > > > > suspend. > > > > > > sorry, it took me a while since I needed to get some dependencies ported > > > to the pm-genirq base. Once I had that, it reproduced my original issue. > > > So far so good. I then looked into finding a solution following your > > > guidance. I'm not sure I really found what you had in mind, but below is > > > what I came up with, which seems to do it. > > > Please let me know how far off I am. > > > > > > Thanks, > > > Sören > > > > > > ---8<--8<8<8<--- > > > diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c > > > index c2744b30d5d9..a4f9914571f1 100644 > > > --- a/drivers/base/power/wakeup.c > > > +++ b/drivers/base/power/wakeup.c > > > @@ -25,7 +25,7 @@ > > > bool events_check_enabled __read_mostly; > > > > > > /* If set and the system is suspending, terminate the suspend. */ > > > -static bool pm_abort_suspend __read_mostly; > > > +bool pm_abort_suspend __read_mostly; > > > > > > /* > > > * Combined counters of registered wakeup events and wakeup events in > > > progress. > > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > > > index 6dadb25cb0d8..e6a6de8f76d0 100644 > > > --- a/kernel/power/suspend.c > > > +++ b/kernel/power/suspend.c > > > @@ -33,6 +33,7 @@ > > > > > > static const char *pm_labels[] = { "mem", "standby", "freeze", }; > > > const char *pm_states[PM_SUSPEND_MAX]; > > > +extern bool pm_abort_suspend; > > > > > > static const struct platform_suspend_ops *suspend_ops; > > > static const struct platform_freeze_ops *freeze_ops; > > > @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, > > > bool *wakeup) > > > if (error || suspend_test(TEST_CPUS)) > > > goto Enable_cpus; > > > > > > - arch_suspend_disable_irqs(); > > > - BUG_ON(!irqs_disabled()); > > > - > > > - error = syscore_suspend(); > > > - if (!error) { > > > - *wakeup = pm_wakeup_pending(); > > > - if (!(suspend_test(TEST_CORE) || *wakeup)) { > > > - trace_suspend_resume(TPS("machine_suspend"), > > > - state, true); > > > - error = suspend_ops->enter(state); > > > - trace_suspend_resume(TPS("machine_suspend"), > > > - state, false); > > > -
Re: [PATCH RESEND] PM / sleep: Fix racing timers
On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote: Hi Rafael, On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote: On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote: Hi Rafael, Hi, Sorry for the huge delay. On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote: On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote: On platforms that do not power off during suspend, successfully entering suspend races with timers. The race happening in a couple of location is: 1. disable IRQs (e.g. arch_suspend_disable_irqs()) ... 2. syscore_suspend() - timekeeping_suspend() - clockevents_notify(SUSPEND) - tick_suspend() (timers are turned off here) ... 3. wfi (wait for wake-IRQ here) Between steps 1 and 2 the timers can still generate interrupts that are not handled and stay pending until step 3. That pending IRQ causes an immediate - spurious - wake. The solution is to move the clockevents suspend/resume notification out of the syscore_suspend step and explictly call them at the appropriate time in the suspend/hibernation paths. I.e. timers are suspend _before_ IRQs get disabled. And accordingly in the resume path. Signed-off-by: Soren Brinkmann soren.brinkm...@xilinx.com --- Hi, there was not a lot of discussion on the last submission. Just one comment from Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in my response, does not apply, IMHO, since the platform does not re-enable interrupts. Well, you just don't agree with it. The problem with your approach is that timer interrupts aren't actually as special as you think and any other IRQF_NO_SUSPEND interrupts would have caused similar issues to appear under specific conditions. The solution I would suggest and that actually covers all IRQF_NO_SUSPEND interrupts would be to use a wait_event() loop like the one in freeze_enter() (on top of the current linux-next or the pm-genirq branch of linux-pm.git), but wait for pm_abort_suspend to become true, to implement system suspend. sorry, it took me a while since I needed to get some dependencies ported to the pm-genirq base. Once I had that, it reproduced my original issue. So far so good. I then looked into finding a solution following your guidance. I'm not sure I really found what you had in mind, but below is what I came up with, which seems to do it. Please let me know how far off I am. Thanks, Sören ---8--888--- diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c index c2744b30d5d9..a4f9914571f1 100644 --- a/drivers/base/power/wakeup.c +++ b/drivers/base/power/wakeup.c @@ -25,7 +25,7 @@ bool events_check_enabled __read_mostly; /* If set and the system is suspending, terminate the suspend. */ -static bool pm_abort_suspend __read_mostly; +bool pm_abort_suspend __read_mostly; /* * Combined counters of registered wakeup events and wakeup events in progress. diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index 6dadb25cb0d8..e6a6de8f76d0 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -33,6 +33,7 @@ static const char *pm_labels[] = { mem, standby, freeze, }; const char *pm_states[PM_SUSPEND_MAX]; +extern bool pm_abort_suspend; static const struct platform_suspend_ops *suspend_ops; static const struct platform_freeze_ops *freeze_ops; @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) if (error || suspend_test(TEST_CPUS)) goto Enable_cpus; - arch_suspend_disable_irqs(); - BUG_ON(!irqs_disabled()); - - error = syscore_suspend(); - if (!error) { - *wakeup = pm_wakeup_pending(); - if (!(suspend_test(TEST_CORE) || *wakeup)) { - trace_suspend_resume(TPS(machine_suspend), - state, true); - error = suspend_ops-enter(state); - trace_suspend_resume(TPS(machine_suspend), - state, false); - events_check_enabled = false; + while (!pm_abort_suspend) { That won't work in general, because pm_abort_suspend may not be set on some platforms on wakeup. It is only set if a wakeup interrupt triggers which may not be the case on ACPI systems if the BIOS has woken up the system. But that could be addressed by making those platforms simply set pm_wakeup_pending in their BIOS exit path. +
Re: [PATCH RESEND] PM / sleep: Fix racing timers
Hi Rafael, On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote: > On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote: > > Hi Rafael, > > Hi, > > Sorry for the huge delay. > > > On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote: > > > On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote: > > > > On platforms that do not power off during suspend, successfully entering > > > > suspend races with timers. > > > > > > > > The race happening in a couple of location is: > > > > > > > > 1. disable IRQs (e.g. arch_suspend_disable_irqs()) > > > > ... > > > > 2. syscore_suspend() > > > > -> timekeeping_suspend() > > > >-> clockevents_notify(SUSPEND) > > > > -> tick_suspend() (timers are turned off here) > > > > ... > > > > 3. wfi(wait for wake-IRQ here) > > > > > > > > Between steps 1 and 2 the timers can still generate interrupts that are > > > > not handled and stay pending until step 3. That pending IRQ causes an > > > > immediate - spurious - wake. > > > > > > > > The solution is to move the clockevents suspend/resume notification > > > > out of the syscore_suspend step and explictly call them at the > > > > appropriate > > > > time in the suspend/hibernation paths. I.e. timers are suspend _before_ > > > > IRQs get disabled. And accordingly in the resume path. > > > > > > > > Signed-off-by: Soren Brinkmann > > > > --- > > > > Hi, > > > > > > > > there was not a lot of discussion on the last submission. Just one > > > > comment from > > > > Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in > > > > my > > > > response, does not apply, IMHO, since the platform does not re-enable > > > > interrupts. > > > > > > Well, you just don't agree with it. > > > > > > The problem with your approach is that timer interrupts aren't actually as > > > special as you think and any other IRQF_NO_SUSPEND interrupts would have > > > caused > > > similar issues to appear under specific conditions. > > > > > > The solution I would suggest and that actually covers all IRQF_NO_SUSPEND > > > interrupts would be to use a wait_event() loop like the one in > > > freeze_enter() > > > (on top of the current linux-next or the pm-genirq branch of > > > linux-pm.git), > > > but wait for pm_abort_suspend to become true, to implement system suspend. > > > > sorry, it took me a while since I needed to get some dependencies ported > > to the pm-genirq base. Once I had that, it reproduced my original issue. > > So far so good. I then looked into finding a solution following your > > guidance. I'm not sure I really found what you had in mind, but below is > > what I came up with, which seems to do it. > > Please let me know how far off I am. > > > > Thanks, > > Sören > > > > ---8<--8<8<8<--- > > diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c > > index c2744b30d5d9..a4f9914571f1 100644 > > --- a/drivers/base/power/wakeup.c > > +++ b/drivers/base/power/wakeup.c > > @@ -25,7 +25,7 @@ > > bool events_check_enabled __read_mostly; > > > > /* If set and the system is suspending, terminate the suspend. */ > > -static bool pm_abort_suspend __read_mostly; > > +bool pm_abort_suspend __read_mostly; > > > > /* > > * Combined counters of registered wakeup events and wakeup events in > > progress. > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > > index 6dadb25cb0d8..e6a6de8f76d0 100644 > > --- a/kernel/power/suspend.c > > +++ b/kernel/power/suspend.c > > @@ -33,6 +33,7 @@ > > > > static const char *pm_labels[] = { "mem", "standby", "freeze", }; > > const char *pm_states[PM_SUSPEND_MAX]; > > +extern bool pm_abort_suspend; > > > > static const struct platform_suspend_ops *suspend_ops; > > static const struct platform_freeze_ops *freeze_ops; > > @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, bool > > *wakeup) > > if (error || suspend_test(TEST_CPUS)) > > goto Enable_cpus; > > > > - arch_suspend_disable_irqs(); > > - BUG_ON(!irqs_disabled()); > > - > > - error = syscore_suspend(); > > - if (!error) { > > - *wakeup = pm_wakeup_pending(); > > - if (!(suspend_test(TEST_CORE) || *wakeup)) { > > - trace_suspend_resume(TPS("machine_suspend"), > > - state, true); > > - error = suspend_ops->enter(state); > > - trace_suspend_resume(TPS("machine_suspend"), > > - state, false); > > - events_check_enabled = false; > > + while (!pm_abort_suspend) { > > That won't work in general, because pm_abort_suspend may not be set on some > platforms on wakeup. It is only set if a wakeup interrupt triggers which > may not be the case on ACPI systems if the BIOS has woken up the system. > > But that could
Re: [PATCH RESEND] PM / sleep: Fix racing timers
Hi Rafael, On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote: On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote: Hi Rafael, Hi, Sorry for the huge delay. On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote: On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote: On platforms that do not power off during suspend, successfully entering suspend races with timers. The race happening in a couple of location is: 1. disable IRQs (e.g. arch_suspend_disable_irqs()) ... 2. syscore_suspend() - timekeeping_suspend() - clockevents_notify(SUSPEND) - tick_suspend() (timers are turned off here) ... 3. wfi(wait for wake-IRQ here) Between steps 1 and 2 the timers can still generate interrupts that are not handled and stay pending until step 3. That pending IRQ causes an immediate - spurious - wake. The solution is to move the clockevents suspend/resume notification out of the syscore_suspend step and explictly call them at the appropriate time in the suspend/hibernation paths. I.e. timers are suspend _before_ IRQs get disabled. And accordingly in the resume path. Signed-off-by: Soren Brinkmann soren.brinkm...@xilinx.com --- Hi, there was not a lot of discussion on the last submission. Just one comment from Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in my response, does not apply, IMHO, since the platform does not re-enable interrupts. Well, you just don't agree with it. The problem with your approach is that timer interrupts aren't actually as special as you think and any other IRQF_NO_SUSPEND interrupts would have caused similar issues to appear under specific conditions. The solution I would suggest and that actually covers all IRQF_NO_SUSPEND interrupts would be to use a wait_event() loop like the one in freeze_enter() (on top of the current linux-next or the pm-genirq branch of linux-pm.git), but wait for pm_abort_suspend to become true, to implement system suspend. sorry, it took me a while since I needed to get some dependencies ported to the pm-genirq base. Once I had that, it reproduced my original issue. So far so good. I then looked into finding a solution following your guidance. I'm not sure I really found what you had in mind, but below is what I came up with, which seems to do it. Please let me know how far off I am. Thanks, Sören ---8--888--- diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c index c2744b30d5d9..a4f9914571f1 100644 --- a/drivers/base/power/wakeup.c +++ b/drivers/base/power/wakeup.c @@ -25,7 +25,7 @@ bool events_check_enabled __read_mostly; /* If set and the system is suspending, terminate the suspend. */ -static bool pm_abort_suspend __read_mostly; +bool pm_abort_suspend __read_mostly; /* * Combined counters of registered wakeup events and wakeup events in progress. diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index 6dadb25cb0d8..e6a6de8f76d0 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -33,6 +33,7 @@ static const char *pm_labels[] = { mem, standby, freeze, }; const char *pm_states[PM_SUSPEND_MAX]; +extern bool pm_abort_suspend; static const struct platform_suspend_ops *suspend_ops; static const struct platform_freeze_ops *freeze_ops; @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) if (error || suspend_test(TEST_CPUS)) goto Enable_cpus; - arch_suspend_disable_irqs(); - BUG_ON(!irqs_disabled()); - - error = syscore_suspend(); - if (!error) { - *wakeup = pm_wakeup_pending(); - if (!(suspend_test(TEST_CORE) || *wakeup)) { - trace_suspend_resume(TPS(machine_suspend), - state, true); - error = suspend_ops-enter(state); - trace_suspend_resume(TPS(machine_suspend), - state, false); - events_check_enabled = false; + while (!pm_abort_suspend) { That won't work in general, because pm_abort_suspend may not be set on some platforms on wakeup. It is only set if a wakeup interrupt triggers which may not be the case on ACPI systems if the BIOS has woken up the system. But that could be addressed by making those platforms simply set pm_wakeup_pending in their BIOS exit path. + arch_suspend_disable_irqs(); + BUG_ON(!irqs_disabled()); + + error = syscore_suspend(); Also it shouldn't be necessary to do syscore_suspend()/syscore_resume()
Re: [PATCH RESEND] PM / sleep: Fix racing timers
On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote: > Hi Rafael, Hi, Sorry for the huge delay. > On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote: > > On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote: > > > On platforms that do not power off during suspend, successfully entering > > > suspend races with timers. > > > > > > The race happening in a couple of location is: > > > > > > 1. disable IRQs (e.g. arch_suspend_disable_irqs()) > > > ... > > > 2. syscore_suspend() > > > -> timekeeping_suspend() > > >-> clockevents_notify(SUSPEND) > > > -> tick_suspend() (timers are turned off here) > > > ... > > > 3. wfi (wait for wake-IRQ here) > > > > > > Between steps 1 and 2 the timers can still generate interrupts that are > > > not handled and stay pending until step 3. That pending IRQ causes an > > > immediate - spurious - wake. > > > > > > The solution is to move the clockevents suspend/resume notification > > > out of the syscore_suspend step and explictly call them at the appropriate > > > time in the suspend/hibernation paths. I.e. timers are suspend _before_ > > > IRQs get disabled. And accordingly in the resume path. > > > > > > Signed-off-by: Soren Brinkmann > > > --- > > > Hi, > > > > > > there was not a lot of discussion on the last submission. Just one > > > comment from > > > Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in my > > > response, does not apply, IMHO, since the platform does not re-enable > > > interrupts. > > > > Well, you just don't agree with it. > > > > The problem with your approach is that timer interrupts aren't actually as > > special as you think and any other IRQF_NO_SUSPEND interrupts would have > > caused > > similar issues to appear under specific conditions. > > > > The solution I would suggest and that actually covers all IRQF_NO_SUSPEND > > interrupts would be to use a wait_event() loop like the one in > > freeze_enter() > > (on top of the current linux-next or the pm-genirq branch of linux-pm.git), > > but wait for pm_abort_suspend to become true, to implement system suspend. > > sorry, it took me a while since I needed to get some dependencies ported > to the pm-genirq base. Once I had that, it reproduced my original issue. > So far so good. I then looked into finding a solution following your > guidance. I'm not sure I really found what you had in mind, but below is > what I came up with, which seems to do it. > Please let me know how far off I am. > > Thanks, > Sören > > ---8<--8<8<8<--- > diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c > index c2744b30d5d9..a4f9914571f1 100644 > --- a/drivers/base/power/wakeup.c > +++ b/drivers/base/power/wakeup.c > @@ -25,7 +25,7 @@ > bool events_check_enabled __read_mostly; > > /* If set and the system is suspending, terminate the suspend. */ > -static bool pm_abort_suspend __read_mostly; > +bool pm_abort_suspend __read_mostly; > > /* > * Combined counters of registered wakeup events and wakeup events in > progress. > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > index 6dadb25cb0d8..e6a6de8f76d0 100644 > --- a/kernel/power/suspend.c > +++ b/kernel/power/suspend.c > @@ -33,6 +33,7 @@ > > static const char *pm_labels[] = { "mem", "standby", "freeze", }; > const char *pm_states[PM_SUSPEND_MAX]; > +extern bool pm_abort_suspend; > > static const struct platform_suspend_ops *suspend_ops; > static const struct platform_freeze_ops *freeze_ops; > @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, bool > *wakeup) > if (error || suspend_test(TEST_CPUS)) > goto Enable_cpus; > > - arch_suspend_disable_irqs(); > - BUG_ON(!irqs_disabled()); > - > - error = syscore_suspend(); > - if (!error) { > - *wakeup = pm_wakeup_pending(); > - if (!(suspend_test(TEST_CORE) || *wakeup)) { > - trace_suspend_resume(TPS("machine_suspend"), > - state, true); > - error = suspend_ops->enter(state); > - trace_suspend_resume(TPS("machine_suspend"), > - state, false); > - events_check_enabled = false; > + while (!pm_abort_suspend) { That won't work in general, because pm_abort_suspend may not be set on some platforms on wakeup. It is only set if a wakeup interrupt triggers which may not be the case on ACPI systems if the BIOS has woken up the system. But that could be addressed by making those platforms simply set pm_wakeup_pending in their BIOS exit path. > + arch_suspend_disable_irqs(); > + BUG_ON(!irqs_disabled()); > + > + error = syscore_suspend(); Also it shouldn't be necessary to do
Re: [PATCH RESEND] PM / sleep: Fix racing timers
On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote: Hi Rafael, Hi, Sorry for the huge delay. On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote: On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote: On platforms that do not power off during suspend, successfully entering suspend races with timers. The race happening in a couple of location is: 1. disable IRQs (e.g. arch_suspend_disable_irqs()) ... 2. syscore_suspend() - timekeeping_suspend() - clockevents_notify(SUSPEND) - tick_suspend() (timers are turned off here) ... 3. wfi (wait for wake-IRQ here) Between steps 1 and 2 the timers can still generate interrupts that are not handled and stay pending until step 3. That pending IRQ causes an immediate - spurious - wake. The solution is to move the clockevents suspend/resume notification out of the syscore_suspend step and explictly call them at the appropriate time in the suspend/hibernation paths. I.e. timers are suspend _before_ IRQs get disabled. And accordingly in the resume path. Signed-off-by: Soren Brinkmann soren.brinkm...@xilinx.com --- Hi, there was not a lot of discussion on the last submission. Just one comment from Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in my response, does not apply, IMHO, since the platform does not re-enable interrupts. Well, you just don't agree with it. The problem with your approach is that timer interrupts aren't actually as special as you think and any other IRQF_NO_SUSPEND interrupts would have caused similar issues to appear under specific conditions. The solution I would suggest and that actually covers all IRQF_NO_SUSPEND interrupts would be to use a wait_event() loop like the one in freeze_enter() (on top of the current linux-next or the pm-genirq branch of linux-pm.git), but wait for pm_abort_suspend to become true, to implement system suspend. sorry, it took me a while since I needed to get some dependencies ported to the pm-genirq base. Once I had that, it reproduced my original issue. So far so good. I then looked into finding a solution following your guidance. I'm not sure I really found what you had in mind, but below is what I came up with, which seems to do it. Please let me know how far off I am. Thanks, Sören ---8--888--- diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c index c2744b30d5d9..a4f9914571f1 100644 --- a/drivers/base/power/wakeup.c +++ b/drivers/base/power/wakeup.c @@ -25,7 +25,7 @@ bool events_check_enabled __read_mostly; /* If set and the system is suspending, terminate the suspend. */ -static bool pm_abort_suspend __read_mostly; +bool pm_abort_suspend __read_mostly; /* * Combined counters of registered wakeup events and wakeup events in progress. diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index 6dadb25cb0d8..e6a6de8f76d0 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -33,6 +33,7 @@ static const char *pm_labels[] = { mem, standby, freeze, }; const char *pm_states[PM_SUSPEND_MAX]; +extern bool pm_abort_suspend; static const struct platform_suspend_ops *suspend_ops; static const struct platform_freeze_ops *freeze_ops; @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) if (error || suspend_test(TEST_CPUS)) goto Enable_cpus; - arch_suspend_disable_irqs(); - BUG_ON(!irqs_disabled()); - - error = syscore_suspend(); - if (!error) { - *wakeup = pm_wakeup_pending(); - if (!(suspend_test(TEST_CORE) || *wakeup)) { - trace_suspend_resume(TPS(machine_suspend), - state, true); - error = suspend_ops-enter(state); - trace_suspend_resume(TPS(machine_suspend), - state, false); - events_check_enabled = false; + while (!pm_abort_suspend) { That won't work in general, because pm_abort_suspend may not be set on some platforms on wakeup. It is only set if a wakeup interrupt triggers which may not be the case on ACPI systems if the BIOS has woken up the system. But that could be addressed by making those platforms simply set pm_wakeup_pending in their BIOS exit path. + arch_suspend_disable_irqs(); + BUG_ON(!irqs_disabled()); + + error = syscore_suspend(); Also it shouldn't be necessary to do syscore_suspend()/syscore_resume() in every iteration of the loop. + if (!error) { + *wakeup = pm_wakeup_pending(); Plus pm_wakeup_pending() returns true if
Re: [PATCH RESEND] PM / sleep: Fix racing timers
Hi Rafael, any opinion on this? Thanks, Sören On Thu, 2014-10-02 at 09:01AM -0700, Sören Brinkmann wrote: > Hi Rafael, > > On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote: > > On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote: > > > On platforms that do not power off during suspend, successfully entering > > > suspend races with timers. > > > > > > The race happening in a couple of location is: > > > > > > 1. disable IRQs (e.g. arch_suspend_disable_irqs()) > > > ... > > > 2. syscore_suspend() > > > -> timekeeping_suspend() > > >-> clockevents_notify(SUSPEND) > > > -> tick_suspend() (timers are turned off here) > > > ... > > > 3. wfi (wait for wake-IRQ here) > > > > > > Between steps 1 and 2 the timers can still generate interrupts that are > > > not handled and stay pending until step 3. That pending IRQ causes an > > > immediate - spurious - wake. > > > > > > The solution is to move the clockevents suspend/resume notification > > > out of the syscore_suspend step and explictly call them at the appropriate > > > time in the suspend/hibernation paths. I.e. timers are suspend _before_ > > > IRQs get disabled. And accordingly in the resume path. > > > > > > Signed-off-by: Soren Brinkmann > > > --- > > > Hi, > > > > > > there was not a lot of discussion on the last submission. Just one > > > comment from > > > Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in my > > > response, does not apply, IMHO, since the platform does not re-enable > > > interrupts. > > > > Well, you just don't agree with it. > > > > The problem with your approach is that timer interrupts aren't actually as > > special as you think and any other IRQF_NO_SUSPEND interrupts would have > > caused > > similar issues to appear under specific conditions. > > > > The solution I would suggest and that actually covers all IRQF_NO_SUSPEND > > interrupts would be to use a wait_event() loop like the one in > > freeze_enter() > > (on top of the current linux-next or the pm-genirq branch of linux-pm.git), > > but wait for pm_abort_suspend to become true, to implement system suspend. > > sorry, it took me a while since I needed to get some dependencies ported > to the pm-genirq base. Once I had that, it reproduced my original issue. > So far so good. I then looked into finding a solution following your > guidance. I'm not sure I really found what you had in mind, but below is > what I came up with, which seems to do it. > Please let me know how far off I am. > > Thanks, > Sören > > ---8<--8<8<8<--- > diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c > index c2744b30d5d9..a4f9914571f1 100644 > --- a/drivers/base/power/wakeup.c > +++ b/drivers/base/power/wakeup.c > @@ -25,7 +25,7 @@ > bool events_check_enabled __read_mostly; > > /* If set and the system is suspending, terminate the suspend. */ > -static bool pm_abort_suspend __read_mostly; > +bool pm_abort_suspend __read_mostly; > > /* > * Combined counters of registered wakeup events and wakeup events in > progress. > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > index 6dadb25cb0d8..e6a6de8f76d0 100644 > --- a/kernel/power/suspend.c > +++ b/kernel/power/suspend.c > @@ -33,6 +33,7 @@ > > static const char *pm_labels[] = { "mem", "standby", "freeze", }; > const char *pm_states[PM_SUSPEND_MAX]; > +extern bool pm_abort_suspend; > > static const struct platform_suspend_ops *suspend_ops; > static const struct platform_freeze_ops *freeze_ops; > @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, bool > *wakeup) > if (error || suspend_test(TEST_CPUS)) > goto Enable_cpus; > > - arch_suspend_disable_irqs(); > - BUG_ON(!irqs_disabled()); > - > - error = syscore_suspend(); > - if (!error) { > - *wakeup = pm_wakeup_pending(); > - if (!(suspend_test(TEST_CORE) || *wakeup)) { > - trace_suspend_resume(TPS("machine_suspend"), > - state, true); > - error = suspend_ops->enter(state); > - trace_suspend_resume(TPS("machine_suspend"), > - state, false); > - events_check_enabled = false; > + while (!pm_abort_suspend) { > + arch_suspend_disable_irqs(); > + BUG_ON(!irqs_disabled()); > + > + error = syscore_suspend(); > + if (!error) { > + *wakeup = pm_wakeup_pending(); > + if (!(suspend_test(TEST_CORE) || *wakeup)) { > + trace_suspend_resume(TPS("machine_suspend"), > + state, true); > + error = suspend_ops->enter(state); > +
Re: [PATCH RESEND] PM / sleep: Fix racing timers
Hi Rafael, any opinion on this? Thanks, Sören On Thu, 2014-10-02 at 09:01AM -0700, Sören Brinkmann wrote: Hi Rafael, On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote: On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote: On platforms that do not power off during suspend, successfully entering suspend races with timers. The race happening in a couple of location is: 1. disable IRQs (e.g. arch_suspend_disable_irqs()) ... 2. syscore_suspend() - timekeeping_suspend() - clockevents_notify(SUSPEND) - tick_suspend() (timers are turned off here) ... 3. wfi (wait for wake-IRQ here) Between steps 1 and 2 the timers can still generate interrupts that are not handled and stay pending until step 3. That pending IRQ causes an immediate - spurious - wake. The solution is to move the clockevents suspend/resume notification out of the syscore_suspend step and explictly call them at the appropriate time in the suspend/hibernation paths. I.e. timers are suspend _before_ IRQs get disabled. And accordingly in the resume path. Signed-off-by: Soren Brinkmann soren.brinkm...@xilinx.com --- Hi, there was not a lot of discussion on the last submission. Just one comment from Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in my response, does not apply, IMHO, since the platform does not re-enable interrupts. Well, you just don't agree with it. The problem with your approach is that timer interrupts aren't actually as special as you think and any other IRQF_NO_SUSPEND interrupts would have caused similar issues to appear under specific conditions. The solution I would suggest and that actually covers all IRQF_NO_SUSPEND interrupts would be to use a wait_event() loop like the one in freeze_enter() (on top of the current linux-next or the pm-genirq branch of linux-pm.git), but wait for pm_abort_suspend to become true, to implement system suspend. sorry, it took me a while since I needed to get some dependencies ported to the pm-genirq base. Once I had that, it reproduced my original issue. So far so good. I then looked into finding a solution following your guidance. I'm not sure I really found what you had in mind, but below is what I came up with, which seems to do it. Please let me know how far off I am. Thanks, Sören ---8--888--- diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c index c2744b30d5d9..a4f9914571f1 100644 --- a/drivers/base/power/wakeup.c +++ b/drivers/base/power/wakeup.c @@ -25,7 +25,7 @@ bool events_check_enabled __read_mostly; /* If set and the system is suspending, terminate the suspend. */ -static bool pm_abort_suspend __read_mostly; +bool pm_abort_suspend __read_mostly; /* * Combined counters of registered wakeup events and wakeup events in progress. diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index 6dadb25cb0d8..e6a6de8f76d0 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -33,6 +33,7 @@ static const char *pm_labels[] = { mem, standby, freeze, }; const char *pm_states[PM_SUSPEND_MAX]; +extern bool pm_abort_suspend; static const struct platform_suspend_ops *suspend_ops; static const struct platform_freeze_ops *freeze_ops; @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) if (error || suspend_test(TEST_CPUS)) goto Enable_cpus; - arch_suspend_disable_irqs(); - BUG_ON(!irqs_disabled()); - - error = syscore_suspend(); - if (!error) { - *wakeup = pm_wakeup_pending(); - if (!(suspend_test(TEST_CORE) || *wakeup)) { - trace_suspend_resume(TPS(machine_suspend), - state, true); - error = suspend_ops-enter(state); - trace_suspend_resume(TPS(machine_suspend), - state, false); - events_check_enabled = false; + while (!pm_abort_suspend) { + arch_suspend_disable_irqs(); + BUG_ON(!irqs_disabled()); + + error = syscore_suspend(); + if (!error) { + *wakeup = pm_wakeup_pending(); + if (!(suspend_test(TEST_CORE) || *wakeup)) { + trace_suspend_resume(TPS(machine_suspend), + state, true); + error = suspend_ops-enter(state); + trace_suspend_resume(TPS(machine_suspend), + state, false); + events_check_enabled = false; +
Re: [PATCH RESEND] PM / sleep: Fix racing timers
Hi Rafael, On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote: > On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote: > > On platforms that do not power off during suspend, successfully entering > > suspend races with timers. > > > > The race happening in a couple of location is: > > > > 1. disable IRQs (e.g. arch_suspend_disable_irqs()) > > ... > > 2. syscore_suspend() > > -> timekeeping_suspend() > >-> clockevents_notify(SUSPEND) > > -> tick_suspend() (timers are turned off here) > > ... > > 3. wfi(wait for wake-IRQ here) > > > > Between steps 1 and 2 the timers can still generate interrupts that are > > not handled and stay pending until step 3. That pending IRQ causes an > > immediate - spurious - wake. > > > > The solution is to move the clockevents suspend/resume notification > > out of the syscore_suspend step and explictly call them at the appropriate > > time in the suspend/hibernation paths. I.e. timers are suspend _before_ > > IRQs get disabled. And accordingly in the resume path. > > > > Signed-off-by: Soren Brinkmann > > --- > > Hi, > > > > there was not a lot of discussion on the last submission. Just one comment > > from > > Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in my > > response, does not apply, IMHO, since the platform does not re-enable > > interrupts. > > Well, you just don't agree with it. > > The problem with your approach is that timer interrupts aren't actually as > special as you think and any other IRQF_NO_SUSPEND interrupts would have > caused > similar issues to appear under specific conditions. > > The solution I would suggest and that actually covers all IRQF_NO_SUSPEND > interrupts would be to use a wait_event() loop like the one in freeze_enter() > (on top of the current linux-next or the pm-genirq branch of linux-pm.git), > but wait for pm_abort_suspend to become true, to implement system suspend. sorry, it took me a while since I needed to get some dependencies ported to the pm-genirq base. Once I had that, it reproduced my original issue. So far so good. I then looked into finding a solution following your guidance. I'm not sure I really found what you had in mind, but below is what I came up with, which seems to do it. Please let me know how far off I am. Thanks, Sören ---8<--8<8<8<--- diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c index c2744b30d5d9..a4f9914571f1 100644 --- a/drivers/base/power/wakeup.c +++ b/drivers/base/power/wakeup.c @@ -25,7 +25,7 @@ bool events_check_enabled __read_mostly; /* If set and the system is suspending, terminate the suspend. */ -static bool pm_abort_suspend __read_mostly; +bool pm_abort_suspend __read_mostly; /* * Combined counters of registered wakeup events and wakeup events in progress. diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index 6dadb25cb0d8..e6a6de8f76d0 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -33,6 +33,7 @@ static const char *pm_labels[] = { "mem", "standby", "freeze", }; const char *pm_states[PM_SUSPEND_MAX]; +extern bool pm_abort_suspend; static const struct platform_suspend_ops *suspend_ops; static const struct platform_freeze_ops *freeze_ops; @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) if (error || suspend_test(TEST_CPUS)) goto Enable_cpus; - arch_suspend_disable_irqs(); - BUG_ON(!irqs_disabled()); - - error = syscore_suspend(); - if (!error) { - *wakeup = pm_wakeup_pending(); - if (!(suspend_test(TEST_CORE) || *wakeup)) { - trace_suspend_resume(TPS("machine_suspend"), - state, true); - error = suspend_ops->enter(state); - trace_suspend_resume(TPS("machine_suspend"), - state, false); - events_check_enabled = false; + while (!pm_abort_suspend) { + arch_suspend_disable_irqs(); + BUG_ON(!irqs_disabled()); + + error = syscore_suspend(); + if (!error) { + *wakeup = pm_wakeup_pending(); + if (!(suspend_test(TEST_CORE) || *wakeup)) { + trace_suspend_resume(TPS("machine_suspend"), + state, true); + error = suspend_ops->enter(state); + trace_suspend_resume(TPS("machine_suspend"), + state, false); + events_check_enabled = false; + } + syscore_resume(); } - syscore_resume(); -
Re: [PATCH RESEND] PM / sleep: Fix racing timers
Hi Rafael, On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote: On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote: On platforms that do not power off during suspend, successfully entering suspend races with timers. The race happening in a couple of location is: 1. disable IRQs (e.g. arch_suspend_disable_irqs()) ... 2. syscore_suspend() - timekeeping_suspend() - clockevents_notify(SUSPEND) - tick_suspend() (timers are turned off here) ... 3. wfi(wait for wake-IRQ here) Between steps 1 and 2 the timers can still generate interrupts that are not handled and stay pending until step 3. That pending IRQ causes an immediate - spurious - wake. The solution is to move the clockevents suspend/resume notification out of the syscore_suspend step and explictly call them at the appropriate time in the suspend/hibernation paths. I.e. timers are suspend _before_ IRQs get disabled. And accordingly in the resume path. Signed-off-by: Soren Brinkmann soren.brinkm...@xilinx.com --- Hi, there was not a lot of discussion on the last submission. Just one comment from Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in my response, does not apply, IMHO, since the platform does not re-enable interrupts. Well, you just don't agree with it. The problem with your approach is that timer interrupts aren't actually as special as you think and any other IRQF_NO_SUSPEND interrupts would have caused similar issues to appear under specific conditions. The solution I would suggest and that actually covers all IRQF_NO_SUSPEND interrupts would be to use a wait_event() loop like the one in freeze_enter() (on top of the current linux-next or the pm-genirq branch of linux-pm.git), but wait for pm_abort_suspend to become true, to implement system suspend. sorry, it took me a while since I needed to get some dependencies ported to the pm-genirq base. Once I had that, it reproduced my original issue. So far so good. I then looked into finding a solution following your guidance. I'm not sure I really found what you had in mind, but below is what I came up with, which seems to do it. Please let me know how far off I am. Thanks, Sören ---8--888--- diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c index c2744b30d5d9..a4f9914571f1 100644 --- a/drivers/base/power/wakeup.c +++ b/drivers/base/power/wakeup.c @@ -25,7 +25,7 @@ bool events_check_enabled __read_mostly; /* If set and the system is suspending, terminate the suspend. */ -static bool pm_abort_suspend __read_mostly; +bool pm_abort_suspend __read_mostly; /* * Combined counters of registered wakeup events and wakeup events in progress. diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index 6dadb25cb0d8..e6a6de8f76d0 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -33,6 +33,7 @@ static const char *pm_labels[] = { mem, standby, freeze, }; const char *pm_states[PM_SUSPEND_MAX]; +extern bool pm_abort_suspend; static const struct platform_suspend_ops *suspend_ops; static const struct platform_freeze_ops *freeze_ops; @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) if (error || suspend_test(TEST_CPUS)) goto Enable_cpus; - arch_suspend_disable_irqs(); - BUG_ON(!irqs_disabled()); - - error = syscore_suspend(); - if (!error) { - *wakeup = pm_wakeup_pending(); - if (!(suspend_test(TEST_CORE) || *wakeup)) { - trace_suspend_resume(TPS(machine_suspend), - state, true); - error = suspend_ops-enter(state); - trace_suspend_resume(TPS(machine_suspend), - state, false); - events_check_enabled = false; + while (!pm_abort_suspend) { + arch_suspend_disable_irqs(); + BUG_ON(!irqs_disabled()); + + error = syscore_suspend(); + if (!error) { + *wakeup = pm_wakeup_pending(); + if (!(suspend_test(TEST_CORE) || *wakeup)) { + trace_suspend_resume(TPS(machine_suspend), + state, true); + error = suspend_ops-enter(state); + trace_suspend_resume(TPS(machine_suspend), + state, false); + events_check_enabled = false; + } + syscore_resume(); } - syscore_resume(); - } - arch_suspend_enable_irqs(); -
Re: [PATCH RESEND] PM / sleep: Fix racing timers
On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote: > On platforms that do not power off during suspend, successfully entering > suspend races with timers. > > The race happening in a couple of location is: > > 1. disable IRQs (e.g. arch_suspend_disable_irqs()) > ... > 2. syscore_suspend() > -> timekeeping_suspend() >-> clockevents_notify(SUSPEND) > -> tick_suspend() (timers are turned off here) > ... > 3. wfi (wait for wake-IRQ here) > > Between steps 1 and 2 the timers can still generate interrupts that are > not handled and stay pending until step 3. That pending IRQ causes an > immediate - spurious - wake. > > The solution is to move the clockevents suspend/resume notification > out of the syscore_suspend step and explictly call them at the appropriate > time in the suspend/hibernation paths. I.e. timers are suspend _before_ > IRQs get disabled. And accordingly in the resume path. > > Signed-off-by: Soren Brinkmann > --- > Hi, > > there was not a lot of discussion on the last submission. Just one comment > from > Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in my > response, does not apply, IMHO, since the platform does not re-enable > interrupts. Well, you just don't agree with it. The problem with your approach is that timer interrupts aren't actually as special as you think and any other IRQF_NO_SUSPEND interrupts would have caused similar issues to appear under specific conditions. The solution I would suggest and that actually covers all IRQF_NO_SUSPEND interrupts would be to use a wait_event() loop like the one in freeze_enter() (on top of the current linux-next or the pm-genirq branch of linux-pm.git), but wait for pm_abort_suspend to become true, to implement system suspend. Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RESEND] PM / sleep: Fix racing timers
On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote: On platforms that do not power off during suspend, successfully entering suspend races with timers. The race happening in a couple of location is: 1. disable IRQs (e.g. arch_suspend_disable_irqs()) ... 2. syscore_suspend() - timekeeping_suspend() - clockevents_notify(SUSPEND) - tick_suspend() (timers are turned off here) ... 3. wfi (wait for wake-IRQ here) Between steps 1 and 2 the timers can still generate interrupts that are not handled and stay pending until step 3. That pending IRQ causes an immediate - spurious - wake. The solution is to move the clockevents suspend/resume notification out of the syscore_suspend step and explictly call them at the appropriate time in the suspend/hibernation paths. I.e. timers are suspend _before_ IRQs get disabled. And accordingly in the resume path. Signed-off-by: Soren Brinkmann soren.brinkm...@xilinx.com --- Hi, there was not a lot of discussion on the last submission. Just one comment from Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in my response, does not apply, IMHO, since the platform does not re-enable interrupts. Well, you just don't agree with it. The problem with your approach is that timer interrupts aren't actually as special as you think and any other IRQF_NO_SUSPEND interrupts would have caused similar issues to appear under specific conditions. The solution I would suggest and that actually covers all IRQF_NO_SUSPEND interrupts would be to use a wait_event() loop like the one in freeze_enter() (on top of the current linux-next or the pm-genirq branch of linux-pm.git), but wait for pm_abort_suspend to become true, to implement system suspend. Rafael -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/