Re: [PATCH RESEND] PM / sleep: Fix racing timers

2015-01-24 Thread Thomas Gleixner
On Fri, 23 Jan 2015, Sören Brinkmann wrote:
> On Mon, 2015-01-12 at 04:14PM +, Lorenzo Pieralisi wrote:
> > I thought that a shutdown clock event device explicitly disables IRQ
> > assertion, that's why I am inquiring, I do not understand how this
> > can happen - how can you have a pending timer IRQ at step 3 above.
> 
> It does I guess, but the shutdown of the IRQ happens after disabling
> IRQs. During that window the IRQ can become pening without anybody
> handling it. Shutting the timer down usually just ensures that no new
> interrupts are signaled but it apparently doesn't check whether any are
> still pending (which isn't necessary in almost all cases since the ISR
> would take care of it).

Well, the issue is that lots of platforms are just not affected by
this because they either power off or simply do not propagate
interrupts which are not explicitely marked as wakeup sources.

After thinking more about it, we really should try to handle it at the
core code, especially because the freezer folks who try to shutdown
the tick completely run into similar issues.

But, I don't think that just moving the suspend() call before
disabling interrupts is sufficient. I rather think it's outright
dangerous.

suspend_ce()
   local_irq_disable();

---> Interrupt becomes pending

   shutdown(ce);
   local_irq_enable();

---> Interrupt fires

There are implementations which actually switch of clocks and such at
shutdown. So depending on the code in the actual interrupt handler (not the
clockevent->handler, though that might have issues as well) we might
touch a disfunctional device with possibly fatal consequences.

Right now this cannot happen as the invocation of the interrupt
handler is guaranteed to happen _after_ the device has been
resumed. Just because it does not explode in your implementation does
not mean its safe to inflict on everyone.

So if we try to handle this issue at the core level, then we really
must deal with the interrupt itself.

Timer interrupts are always marked IRQF_NO_SUSPEND, because we still
need them until syscore_suspend(). So in case we really shut down the
clockevent device via tick_suspend() then we must disable the
interrupt there as well and reenable it in tick_resume().

But that's not really possible at the core level today. Timer
interrupts are delivered by various mechanisms:

  - plain interrupts
  - percpu interrupts
  - per cpu direct vectors
  - more complex stuff

and we have no idea which type we are dealing with. So the only option
would be to have a callback and let the suspend code do:

   if (cedev->disable_irq)
  cedev->disable_irq(cddev);

and the reverse operation for resume. So we can provide default
implementations for the callbacks:

   void ce_[dis|en]able_irq(...)
   void ce_[dis|en]able_percpu_irq(...)

which would rely on cedev->irq. Those two would cover most of the
implementations. The ones with special requirements can implement
their own.

Now at the irq level, we'd need a new function for the device irqs:

disable_and_mask_irq_nosync(irq)

which makes sure that the line is masked at the hardware level. That's
necessary because the lazy interrupt disable is not sufficient. The
percpu irqs are already masking at the hardware level.

One might argue that we could handle this in set_mode() as well, but
we are thinking about going away from that multiplex call. Aside of
that, ensuring the symmetry of disable/enable invocations would be
problematic with the existing set_mode() implementations and usage.

Thanks,

tglx



Re: [PATCH RESEND] PM / sleep: Fix racing timers

2015-01-24 Thread Thomas Gleixner
On Fri, 23 Jan 2015, Sören Brinkmann wrote:
 On Mon, 2015-01-12 at 04:14PM +, Lorenzo Pieralisi wrote:
  I thought that a shutdown clock event device explicitly disables IRQ
  assertion, that's why I am inquiring, I do not understand how this
  can happen - how can you have a pending timer IRQ at step 3 above.
 
 It does I guess, but the shutdown of the IRQ happens after disabling
 IRQs. During that window the IRQ can become pening without anybody
 handling it. Shutting the timer down usually just ensures that no new
 interrupts are signaled but it apparently doesn't check whether any are
 still pending (which isn't necessary in almost all cases since the ISR
 would take care of it).

Well, the issue is that lots of platforms are just not affected by
this because they either power off or simply do not propagate
interrupts which are not explicitely marked as wakeup sources.

After thinking more about it, we really should try to handle it at the
core code, especially because the freezer folks who try to shutdown
the tick completely run into similar issues.

But, I don't think that just moving the suspend() call before
disabling interrupts is sufficient. I rather think it's outright
dangerous.

suspend_ce()
   local_irq_disable();

--- Interrupt becomes pending

   shutdown(ce);
   local_irq_enable();

--- Interrupt fires

There are implementations which actually switch of clocks and such at
shutdown. So depending on the code in the actual interrupt handler (not the
clockevent-handler, though that might have issues as well) we might
touch a disfunctional device with possibly fatal consequences.

Right now this cannot happen as the invocation of the interrupt
handler is guaranteed to happen _after_ the device has been
resumed. Just because it does not explode in your implementation does
not mean its safe to inflict on everyone.

So if we try to handle this issue at the core level, then we really
must deal with the interrupt itself.

Timer interrupts are always marked IRQF_NO_SUSPEND, because we still
need them until syscore_suspend(). So in case we really shut down the
clockevent device via tick_suspend() then we must disable the
interrupt there as well and reenable it in tick_resume().

But that's not really possible at the core level today. Timer
interrupts are delivered by various mechanisms:

  - plain interrupts
  - percpu interrupts
  - per cpu direct vectors
  - more complex stuff

and we have no idea which type we are dealing with. So the only option
would be to have a callback and let the suspend code do:

   if (cedev-disable_irq)
  cedev-disable_irq(cddev);

and the reverse operation for resume. So we can provide default
implementations for the callbacks:

   void ce_[dis|en]able_irq(...)
   void ce_[dis|en]able_percpu_irq(...)

which would rely on cedev-irq. Those two would cover most of the
implementations. The ones with special requirements can implement
their own.

Now at the irq level, we'd need a new function for the device irqs:

disable_and_mask_irq_nosync(irq)

which makes sure that the line is masked at the hardware level. That's
necessary because the lazy interrupt disable is not sufficient. The
percpu irqs are already masking at the hardware level.

One might argue that we could handle this in set_mode() as well, but
we are thinking about going away from that multiplex call. Aside of
that, ensuring the symmetry of disable/enable invocations would be
problematic with the existing set_mode() implementations and usage.

Thanks,

tglx



Re: [PATCH RESEND] PM / sleep: Fix racing timers

2015-01-23 Thread Sören Brinkmann
On Mon, 2015-01-12 at 12:20AM +0100, Rafael J. Wysocki wrote:
> On Friday, January 09, 2015 01:50:59 PM Sören Brinkmann wrote:
> > On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote:
> > > Hi Rafael,
> > > 
> > > On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote:
> > > > On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote:
> > > > > Hi Rafael,
> > > > 
> > > > Hi,
> > > > 
> > > > Sorry for the huge delay.
> > > > 
> > > > > On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote:
> > > > > > On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote:
> > > > > > > On platforms that do not power off during suspend, successfully 
> > > > > > > entering
> > > > > > > suspend races with timers.
> > > > > > > 
> > > > > > > The race happening in a couple of location is:
> > > > > > > 
> > > > > > >   1. disable IRQs (e.g. 
> > > > > > > arch_suspend_disable_irqs())
> > > > > > >  ...
> > > > > > >   2. syscore_suspend()
> > > > > > >   -> timekeeping_suspend()
> > > > > > >-> clockevents_notify(SUSPEND)
> > > > > > > -> tick_suspend() (timers are turned off here)
> > > > > > >  ...
> > > > > > >   3. wfi  (wait for wake-IRQ here)
> > > > > > > 
> > > > > > > Between steps 1 and 2 the timers can still generate interrupts 
> > > > > > > that are
> > > > > > > not handled and stay pending until step 3. That pending IRQ 
> > > > > > > causes an
> > > > > > > immediate - spurious - wake.
> > > > > > > 
> > > > > > > The solution is to move the clockevents suspend/resume 
> > > > > > > notification
> > > > > > > out of the syscore_suspend step and explictly call them at the 
> > > > > > > appropriate
> > > > > > > time in the suspend/hibernation paths. I.e. timers are suspend 
> > > > > > > _before_
> > > > > > > IRQs get disabled. And accordingly in the resume path.
> > > > > > > 
> > > > > > > Signed-off-by: Soren Brinkmann 
> > > > > > > ---
> > > > > > > Hi,
> > > > > > > 
> > > > > > > there was not a lot of discussion on the last submission. Just 
> > > > > > > one comment from
> > > > > > > Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I 
> > > > > > > outlined in my
> > > > > > > response, does not apply, IMHO, since the platform does not 
> > > > > > > re-enable
> > > > > > > interrupts.
> > > > > > 
> > > > > > Well, you just don't agree with it.
> > > > > > 
> > > > > > The problem with your approach is that timer interrupts aren't 
> > > > > > actually as
> > > > > > special as you think and any other IRQF_NO_SUSPEND interrupts would 
> > > > > > have caused
> > > > > > similar issues to appear under specific conditions.
> > > > > > 
> > > > > > The solution I would suggest and that actually covers all 
> > > > > > IRQF_NO_SUSPEND
> > > > > > interrupts would be to use a wait_event() loop like the one in 
> > > > > > freeze_enter()
> > > > > > (on top of the current linux-next or the pm-genirq branch of 
> > > > > > linux-pm.git),
> > > > > > but wait for pm_abort_suspend to become true, to implement system 
> > > > > > suspend.
> > > > > 
> > > > > sorry, it took me a while since I needed to get some dependencies 
> > > > > ported
> > > > > to the pm-genirq base. Once I had that, it reproduced my original 
> > > > > issue.
> > > > > So far so good. I then looked into finding a solution following your
> > > > > guidance. I'm not sure I really found what you had in mind, but below 
> > > > > is
> > > > > what I came up with, which seems to do it.
> > > > > Please let me know how far off I am.
> > > > > 
> > > > >   Thanks,
> > > > >   Sören
> > > > > 
> > > > > ---8<--8<8<8<---
> > > > > diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
> > > > > index c2744b30d5d9..a4f9914571f1 100644
> > > > > --- a/drivers/base/power/wakeup.c
> > > > > +++ b/drivers/base/power/wakeup.c
> > > > > @@ -25,7 +25,7 @@
> > > > >  bool events_check_enabled __read_mostly;
> > > > >  
> > > > >  /* If set and the system is suspending, terminate the suspend. */
> > > > > -static bool pm_abort_suspend __read_mostly;
> > > > > +bool pm_abort_suspend __read_mostly;
> > > > >  
> > > > >  /*
> > > > >   * Combined counters of registered wakeup events and wakeup events 
> > > > > in progress.
> > > > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > > > > index 6dadb25cb0d8..e6a6de8f76d0 100644
> > > > > --- a/kernel/power/suspend.c
> > > > > +++ b/kernel/power/suspend.c
> > > > > @@ -33,6 +33,7 @@
> > > > >  
> > > > >  static const char *pm_labels[] = { "mem", "standby", "freeze", };
> > > > >  const char *pm_states[PM_SUSPEND_MAX];
> > > > > +extern bool pm_abort_suspend;
> > > > >  
> > > > >  static const struct platform_suspend_ops *suspend_ops;
> > > > >  static const struct platform_freeze_ops *freeze_ops;
> > > > > @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, 
> > > 

Re: [PATCH RESEND] PM / sleep: Fix racing timers

2015-01-23 Thread Sören Brinkmann
Sorry for the delay, but a lot of stuff came in between and I need to
rebase a couple of branches and re-test things.
Some comments inline below:

On Mon, 2015-01-12 at 04:14PM +, Lorenzo Pieralisi wrote:
> On Mon, Jan 12, 2015 at 03:55:05PM +, Sören Brinkmann wrote:
> > On Mon, 2015-01-12 at 03:43PM +, Lorenzo Pieralisi wrote:
> > > Hi Rafael, Soren,
> > >
> > > On Sun, Jan 11, 2015 at 11:20:36PM +, Rafael J. Wysocki wrote:
> > > > On Friday, January 09, 2015 01:50:59 PM Sören Brinkmann wrote:
> > > > > On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote:
> > > > > > Hi Rafael,
> > > > > >
> > > > > > On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote:
> > > > > > > On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote:
> > > > > > > > Hi Rafael,
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Sorry for the huge delay.
> > > > > > >
> > > > > > > > On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote:
> > > > > > > > > On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann 
> > > > > > > > > wrote:
> > > > > > > > > > On platforms that do not power off during suspend, 
> > > > > > > > > > successfully entering
> > > > > > > > > > suspend races with timers.
> > > > > > > > > >
> > > > > > > > > > The race happening in a couple of location is:
> > > > > > > > > >
> > > > > > > > > >   1. disable IRQs   (e.g. 
> > > > > > > > > > arch_suspend_disable_irqs())
> > > > > > > > > >  ...
> > > > > > > > > >   2. syscore_suspend()
> > > > > > > > > >   -> timekeeping_suspend()
> > > > > > > > > >-> clockevents_notify(SUSPEND)
> > > > > > > > > > -> tick_suspend()   (timers are turned off here)
> > > > > > > > > >  ...
> > > > > > > > > >   3. wfi(wait for wake-IRQ here)
> > > > > > > > > >
> > > > > > > > > > Between steps 1 and 2 the timers can still generate 
> > > > > > > > > > interrupts that are
> > > > > > > > > > not handled and stay pending until step 3. That pending IRQ 
> > > > > > > > > > causes an
> > > > > > > > > > immediate - spurious - wake.
> > > > > > > > > >
> > > > > > > > > > The solution is to move the clockevents suspend/resume 
> > > > > > > > > > notification
> > > > > > > > > > out of the syscore_suspend step and explictly call them at 
> > > > > > > > > > the appropriate
> > > > > > > > > > time in the suspend/hibernation paths. I.e. timers are 
> > > > > > > > > > suspend _before_
> > > > > > > > > > IRQs get disabled. And accordingly in the resume path.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Soren Brinkmann 
> > > > > > > > > > ---
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > there was not a lot of discussion on the last submission. 
> > > > > > > > > > Just one comment from
> > > > > > > > > > Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I 
> > > > > > > > > > outlined in my
> > > > > > > > > > response, does not apply, IMHO, since the platform does not 
> > > > > > > > > > re-enable
> > > > > > > > > > interrupts.
> > > > > > > > >
> > > > > > > > > Well, you just don't agree with it.
> > > > > > > > >
> > > > > > > > > The problem with your approach is that timer interrupts 
> > > > > > > > > aren't actually as
> > > > > > > > > special as you think and any other IRQF_NO_SUSPEND interrupts 
> > > > > > > > > would have caused
> > > > > > > > > similar issues to appear under specific conditions.
> > > > > > > > >
> > > > > > > > > The solution I would suggest and that actually covers all 
> > > > > > > > > IRQF_NO_SUSPEND
> > > > > > > > > interrupts would be to use a wait_event() loop like the one 
> > > > > > > > > in freeze_enter()
> > > > > > > > > (on top of the current linux-next or the pm-genirq branch of 
> > > > > > > > > linux-pm.git),
> > > > > > > > > but wait for pm_abort_suspend to become true, to implement 
> > > > > > > > > system suspend.
> > > > > > > >
> > > > > > > > sorry, it took me a while since I needed to get some 
> > > > > > > > dependencies ported
> > > > > > > > to the pm-genirq base. Once I had that, it reproduced my 
> > > > > > > > original issue.
> > > > > > > > So far so good. I then looked into finding a solution following 
> > > > > > > > your
> > > > > > > > guidance. I'm not sure I really found what you had in mind, but 
> > > > > > > > below is
> > > > > > > > what I came up with, which seems to do it.
> > > > > > > > Please let me know how far off I am.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Sören
> > > > > > > >
> > > > > > > > ---8<--8<8<8<---
> > > > > > > > diff --git a/drivers/base/power/wakeup.c 
> > > > > > > > b/drivers/base/power/wakeup.c
> > > > > > > > index c2744b30d5d9..a4f9914571f1 100644
> > > > > > > > --- a/drivers/base/power/wakeup.c
> > > > > > > > +++ b/drivers/base/power/wakeup.c
> > > > > > > > @@ -25,7 +25,7 @@
> > > > > > > >  bool events_check_enabled 

Re: [PATCH RESEND] PM / sleep: Fix racing timers

2015-01-23 Thread Sören Brinkmann
On Mon, 2015-01-12 at 12:20AM +0100, Rafael J. Wysocki wrote:
 On Friday, January 09, 2015 01:50:59 PM Sören Brinkmann wrote:
  On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote:
   Hi Rafael,
   
   On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote:
On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote:
 Hi Rafael,

Hi,

Sorry for the huge delay.

 On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote:
  On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote:
   On platforms that do not power off during suspend, successfully 
   entering
   suspend races with timers.
   
   The race happening in a couple of location is:
   
 1. disable IRQs (e.g. 
   arch_suspend_disable_irqs())
...
 2. syscore_suspend()
 - timekeeping_suspend()
  - clockevents_notify(SUSPEND)
   - tick_suspend() (timers are turned off here)
...
 3. wfi  (wait for wake-IRQ here)
   
   Between steps 1 and 2 the timers can still generate interrupts 
   that are
   not handled and stay pending until step 3. That pending IRQ 
   causes an
   immediate - spurious - wake.
   
   The solution is to move the clockevents suspend/resume 
   notification
   out of the syscore_suspend step and explictly call them at the 
   appropriate
   time in the suspend/hibernation paths. I.e. timers are suspend 
   _before_
   IRQs get disabled. And accordingly in the resume path.
   
   Signed-off-by: Soren Brinkmann soren.brinkm...@xilinx.com
   ---
   Hi,
   
   there was not a lot of discussion on the last submission. Just 
   one comment from
   Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I 
   outlined in my
   response, does not apply, IMHO, since the platform does not 
   re-enable
   interrupts.
  
  Well, you just don't agree with it.
  
  The problem with your approach is that timer interrupts aren't 
  actually as
  special as you think and any other IRQF_NO_SUSPEND interrupts would 
  have caused
  similar issues to appear under specific conditions.
  
  The solution I would suggest and that actually covers all 
  IRQF_NO_SUSPEND
  interrupts would be to use a wait_event() loop like the one in 
  freeze_enter()
  (on top of the current linux-next or the pm-genirq branch of 
  linux-pm.git),
  but wait for pm_abort_suspend to become true, to implement system 
  suspend.
 
 sorry, it took me a while since I needed to get some dependencies 
 ported
 to the pm-genirq base. Once I had that, it reproduced my original 
 issue.
 So far so good. I then looked into finding a solution following your
 guidance. I'm not sure I really found what you had in mind, but below 
 is
 what I came up with, which seems to do it.
 Please let me know how far off I am.
 
   Thanks,
   Sören
 
 ---8--888---
 diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
 index c2744b30d5d9..a4f9914571f1 100644
 --- a/drivers/base/power/wakeup.c
 +++ b/drivers/base/power/wakeup.c
 @@ -25,7 +25,7 @@
  bool events_check_enabled __read_mostly;
  
  /* If set and the system is suspending, terminate the suspend. */
 -static bool pm_abort_suspend __read_mostly;
 +bool pm_abort_suspend __read_mostly;
  
  /*
   * Combined counters of registered wakeup events and wakeup events 
 in progress.
 diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
 index 6dadb25cb0d8..e6a6de8f76d0 100644
 --- a/kernel/power/suspend.c
 +++ b/kernel/power/suspend.c
 @@ -33,6 +33,7 @@
  
  static const char *pm_labels[] = { mem, standby, freeze, };
  const char *pm_states[PM_SUSPEND_MAX];
 +extern bool pm_abort_suspend;
  
  static const struct platform_suspend_ops *suspend_ops;
  static const struct platform_freeze_ops *freeze_ops;
 @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, 
 bool *wakeup)
   if (error || suspend_test(TEST_CPUS))
   goto Enable_cpus;
  
 - arch_suspend_disable_irqs();
 - BUG_ON(!irqs_disabled());
 -
 - error = syscore_suspend();
 - if (!error) {
 - *wakeup = pm_wakeup_pending();
 - if (!(suspend_test(TEST_CORE) || *wakeup)) {
 - trace_suspend_resume(TPS(machine_suspend),
 - state, true);
 - error = suspend_ops-enter(state);
 - trace_suspend_resume(TPS(machine_suspend),
 -   

Re: [PATCH RESEND] PM / sleep: Fix racing timers

2015-01-23 Thread Sören Brinkmann
Sorry for the delay, but a lot of stuff came in between and I need to
rebase a couple of branches and re-test things.
Some comments inline below:

On Mon, 2015-01-12 at 04:14PM +, Lorenzo Pieralisi wrote:
 On Mon, Jan 12, 2015 at 03:55:05PM +, Sören Brinkmann wrote:
  On Mon, 2015-01-12 at 03:43PM +, Lorenzo Pieralisi wrote:
   Hi Rafael, Soren,
  
   On Sun, Jan 11, 2015 at 11:20:36PM +, Rafael J. Wysocki wrote:
On Friday, January 09, 2015 01:50:59 PM Sören Brinkmann wrote:
 On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote:
  Hi Rafael,
 
  On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote:
   On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote:
Hi Rafael,
  
   Hi,
  
   Sorry for the huge delay.
  
On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote:
 On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann 
 wrote:
  On platforms that do not power off during suspend, 
  successfully entering
  suspend races with timers.
 
  The race happening in a couple of location is:
 
1. disable IRQs   (e.g. 
  arch_suspend_disable_irqs())
   ...
2. syscore_suspend()
- timekeeping_suspend()
 - clockevents_notify(SUSPEND)
  - tick_suspend()   (timers are turned off here)
   ...
3. wfi(wait for wake-IRQ here)
 
  Between steps 1 and 2 the timers can still generate 
  interrupts that are
  not handled and stay pending until step 3. That pending IRQ 
  causes an
  immediate - spurious - wake.
 
  The solution is to move the clockevents suspend/resume 
  notification
  out of the syscore_suspend step and explictly call them at 
  the appropriate
  time in the suspend/hibernation paths. I.e. timers are 
  suspend _before_
  IRQs get disabled. And accordingly in the resume path.
 
  Signed-off-by: Soren Brinkmann soren.brinkm...@xilinx.com
  ---
  Hi,
 
  there was not a lot of discussion on the last submission. 
  Just one comment from
  Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I 
  outlined in my
  response, does not apply, IMHO, since the platform does not 
  re-enable
  interrupts.

 Well, you just don't agree with it.

 The problem with your approach is that timer interrupts 
 aren't actually as
 special as you think and any other IRQF_NO_SUSPEND interrupts 
 would have caused
 similar issues to appear under specific conditions.

 The solution I would suggest and that actually covers all 
 IRQF_NO_SUSPEND
 interrupts would be to use a wait_event() loop like the one 
 in freeze_enter()
 (on top of the current linux-next or the pm-genirq branch of 
 linux-pm.git),
 but wait for pm_abort_suspend to become true, to implement 
 system suspend.
   
sorry, it took me a while since I needed to get some 
dependencies ported
to the pm-genirq base. Once I had that, it reproduced my 
original issue.
So far so good. I then looked into finding a solution following 
your
guidance. I'm not sure I really found what you had in mind, but 
below is
what I came up with, which seems to do it.
Please let me know how far off I am.
   
Thanks,
Sören
   
---8--888---
diff --git a/drivers/base/power/wakeup.c 
b/drivers/base/power/wakeup.c
index c2744b30d5d9..a4f9914571f1 100644
--- a/drivers/base/power/wakeup.c
+++ b/drivers/base/power/wakeup.c
@@ -25,7 +25,7 @@
 bool events_check_enabled __read_mostly;
   
 /* If set and the system is suspending, terminate the suspend. 
*/
-static bool pm_abort_suspend __read_mostly;
+bool pm_abort_suspend __read_mostly;
   
 /*
  * Combined counters of registered wakeup events and wakeup 
events in progress.
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 6dadb25cb0d8..e6a6de8f76d0 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -33,6 +33,7 @@
   
 static const char *pm_labels[] = { mem, standby, freeze, 
};
 const char *pm_states[PM_SUSPEND_MAX];
+extern bool pm_abort_suspend;
   
 static const struct platform_suspend_ops *suspend_ops;
 static const struct platform_freeze_ops 

Re: [PATCH RESEND] PM / sleep: Fix racing timers

2015-01-12 Thread Lorenzo Pieralisi
On Mon, Jan 12, 2015 at 03:55:05PM +, Sören Brinkmann wrote:
> On Mon, 2015-01-12 at 03:43PM +, Lorenzo Pieralisi wrote:
> > Hi Rafael, Soren,
> >
> > On Sun, Jan 11, 2015 at 11:20:36PM +, Rafael J. Wysocki wrote:
> > > On Friday, January 09, 2015 01:50:59 PM Sören Brinkmann wrote:
> > > > On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote:
> > > > > Hi Rafael,
> > > > >
> > > > > On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote:
> > > > > > On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote:
> > > > > > > Hi Rafael,
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Sorry for the huge delay.
> > > > > >
> > > > > > > On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote:
> > > > > > > > On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote:
> > > > > > > > > On platforms that do not power off during suspend, 
> > > > > > > > > successfully entering
> > > > > > > > > suspend races with timers.
> > > > > > > > >
> > > > > > > > > The race happening in a couple of location is:
> > > > > > > > >
> > > > > > > > >   1. disable IRQs   (e.g. 
> > > > > > > > > arch_suspend_disable_irqs())
> > > > > > > > >  ...
> > > > > > > > >   2. syscore_suspend()
> > > > > > > > >   -> timekeeping_suspend()
> > > > > > > > >-> clockevents_notify(SUSPEND)
> > > > > > > > > -> tick_suspend()   (timers are turned off here)
> > > > > > > > >  ...
> > > > > > > > >   3. wfi(wait for wake-IRQ here)
> > > > > > > > >
> > > > > > > > > Between steps 1 and 2 the timers can still generate 
> > > > > > > > > interrupts that are
> > > > > > > > > not handled and stay pending until step 3. That pending IRQ 
> > > > > > > > > causes an
> > > > > > > > > immediate - spurious - wake.
> > > > > > > > >
> > > > > > > > > The solution is to move the clockevents suspend/resume 
> > > > > > > > > notification
> > > > > > > > > out of the syscore_suspend step and explictly call them at 
> > > > > > > > > the appropriate
> > > > > > > > > time in the suspend/hibernation paths. I.e. timers are 
> > > > > > > > > suspend _before_
> > > > > > > > > IRQs get disabled. And accordingly in the resume path.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Soren Brinkmann 
> > > > > > > > > ---
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > there was not a lot of discussion on the last submission. 
> > > > > > > > > Just one comment from
> > > > > > > > > Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I 
> > > > > > > > > outlined in my
> > > > > > > > > response, does not apply, IMHO, since the platform does not 
> > > > > > > > > re-enable
> > > > > > > > > interrupts.
> > > > > > > >
> > > > > > > > Well, you just don't agree with it.
> > > > > > > >
> > > > > > > > The problem with your approach is that timer interrupts aren't 
> > > > > > > > actually as
> > > > > > > > special as you think and any other IRQF_NO_SUSPEND interrupts 
> > > > > > > > would have caused
> > > > > > > > similar issues to appear under specific conditions.
> > > > > > > >
> > > > > > > > The solution I would suggest and that actually covers all 
> > > > > > > > IRQF_NO_SUSPEND
> > > > > > > > interrupts would be to use a wait_event() loop like the one in 
> > > > > > > > freeze_enter()
> > > > > > > > (on top of the current linux-next or the pm-genirq branch of 
> > > > > > > > linux-pm.git),
> > > > > > > > but wait for pm_abort_suspend to become true, to implement 
> > > > > > > > system suspend.
> > > > > > >
> > > > > > > sorry, it took me a while since I needed to get some dependencies 
> > > > > > > ported
> > > > > > > to the pm-genirq base. Once I had that, it reproduced my original 
> > > > > > > issue.
> > > > > > > So far so good. I then looked into finding a solution following 
> > > > > > > your
> > > > > > > guidance. I'm not sure I really found what you had in mind, but 
> > > > > > > below is
> > > > > > > what I came up with, which seems to do it.
> > > > > > > Please let me know how far off I am.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Sören
> > > > > > >
> > > > > > > ---8<--8<8<8<---
> > > > > > > diff --git a/drivers/base/power/wakeup.c 
> > > > > > > b/drivers/base/power/wakeup.c
> > > > > > > index c2744b30d5d9..a4f9914571f1 100644
> > > > > > > --- a/drivers/base/power/wakeup.c
> > > > > > > +++ b/drivers/base/power/wakeup.c
> > > > > > > @@ -25,7 +25,7 @@
> > > > > > >  bool events_check_enabled __read_mostly;
> > > > > > >
> > > > > > >  /* If set and the system is suspending, terminate the suspend. */
> > > > > > > -static bool pm_abort_suspend __read_mostly;
> > > > > > > +bool pm_abort_suspend __read_mostly;
> > > > > > >
> > > > > > >  /*
> > > > > > >   * Combined counters of registered wakeup events and wakeup 
> > > > > > > events in progress.
> > > > > > > diff --git a/kernel/power/suspend.c 

Re: [PATCH RESEND] PM / sleep: Fix racing timers

2015-01-12 Thread Sören Brinkmann
On Mon, 2015-01-12 at 03:43PM +, Lorenzo Pieralisi wrote:
> Hi Rafael, Soren,
> 
> On Sun, Jan 11, 2015 at 11:20:36PM +, Rafael J. Wysocki wrote:
> > On Friday, January 09, 2015 01:50:59 PM Sören Brinkmann wrote:
> > > On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote:
> > > > Hi Rafael,
> > > > 
> > > > On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote:
> > > > > On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote:
> > > > > > Hi Rafael,
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > Sorry for the huge delay.
> > > > > 
> > > > > > On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote:
> > > > > > > On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote:
> > > > > > > > On platforms that do not power off during suspend, successfully 
> > > > > > > > entering
> > > > > > > > suspend races with timers.
> > > > > > > > 
> > > > > > > > The race happening in a couple of location is:
> > > > > > > > 
> > > > > > > >   1. disable IRQs   (e.g. 
> > > > > > > > arch_suspend_disable_irqs())
> > > > > > > >  ...
> > > > > > > >   2. syscore_suspend()
> > > > > > > >   -> timekeeping_suspend()
> > > > > > > >-> clockevents_notify(SUSPEND)
> > > > > > > > -> tick_suspend()   (timers are turned off here)
> > > > > > > >  ...
> > > > > > > >   3. wfi(wait for wake-IRQ here)
> > > > > > > > 
> > > > > > > > Between steps 1 and 2 the timers can still generate interrupts 
> > > > > > > > that are
> > > > > > > > not handled and stay pending until step 3. That pending IRQ 
> > > > > > > > causes an
> > > > > > > > immediate - spurious - wake.
> > > > > > > > 
> > > > > > > > The solution is to move the clockevents suspend/resume 
> > > > > > > > notification
> > > > > > > > out of the syscore_suspend step and explictly call them at the 
> > > > > > > > appropriate
> > > > > > > > time in the suspend/hibernation paths. I.e. timers are suspend 
> > > > > > > > _before_
> > > > > > > > IRQs get disabled. And accordingly in the resume path.
> > > > > > > > 
> > > > > > > > Signed-off-by: Soren Brinkmann 
> > > > > > > > ---
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > there was not a lot of discussion on the last submission. Just 
> > > > > > > > one comment from
> > > > > > > > Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I 
> > > > > > > > outlined in my
> > > > > > > > response, does not apply, IMHO, since the platform does not 
> > > > > > > > re-enable
> > > > > > > > interrupts.
> > > > > > > 
> > > > > > > Well, you just don't agree with it.
> > > > > > > 
> > > > > > > The problem with your approach is that timer interrupts aren't 
> > > > > > > actually as
> > > > > > > special as you think and any other IRQF_NO_SUSPEND interrupts 
> > > > > > > would have caused
> > > > > > > similar issues to appear under specific conditions.
> > > > > > > 
> > > > > > > The solution I would suggest and that actually covers all 
> > > > > > > IRQF_NO_SUSPEND
> > > > > > > interrupts would be to use a wait_event() loop like the one in 
> > > > > > > freeze_enter()
> > > > > > > (on top of the current linux-next or the pm-genirq branch of 
> > > > > > > linux-pm.git),
> > > > > > > but wait for pm_abort_suspend to become true, to implement system 
> > > > > > > suspend.
> > > > > > 
> > > > > > sorry, it took me a while since I needed to get some dependencies 
> > > > > > ported
> > > > > > to the pm-genirq base. Once I had that, it reproduced my original 
> > > > > > issue.
> > > > > > So far so good. I then looked into finding a solution following your
> > > > > > guidance. I'm not sure I really found what you had in mind, but 
> > > > > > below is
> > > > > > what I came up with, which seems to do it.
> > > > > > Please let me know how far off I am.
> > > > > > 
> > > > > > Thanks,
> > > > > > Sören
> > > > > > 
> > > > > > ---8<--8<8<8<---
> > > > > > diff --git a/drivers/base/power/wakeup.c 
> > > > > > b/drivers/base/power/wakeup.c
> > > > > > index c2744b30d5d9..a4f9914571f1 100644
> > > > > > --- a/drivers/base/power/wakeup.c
> > > > > > +++ b/drivers/base/power/wakeup.c
> > > > > > @@ -25,7 +25,7 @@
> > > > > >  bool events_check_enabled __read_mostly;
> > > > > >  
> > > > > >  /* If set and the system is suspending, terminate the suspend. */
> > > > > > -static bool pm_abort_suspend __read_mostly;
> > > > > > +bool pm_abort_suspend __read_mostly;
> > > > > >  
> > > > > >  /*
> > > > > >   * Combined counters of registered wakeup events and wakeup events 
> > > > > > in progress.
> > > > > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > > > > > index 6dadb25cb0d8..e6a6de8f76d0 100644
> > > > > > --- a/kernel/power/suspend.c
> > > > > > +++ b/kernel/power/suspend.c
> > > > > > @@ -33,6 +33,7 @@
> > > > > >  
> > > > > >  static const char *pm_labels[] = { "mem", "standby", "freeze", };
> > > > 

Re: [PATCH RESEND] PM / sleep: Fix racing timers

2015-01-12 Thread Lorenzo Pieralisi
Hi Rafael, Soren,

On Sun, Jan 11, 2015 at 11:20:36PM +, Rafael J. Wysocki wrote:
> On Friday, January 09, 2015 01:50:59 PM Sören Brinkmann wrote:
> > On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote:
> > > Hi Rafael,
> > > 
> > > On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote:
> > > > On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote:
> > > > > Hi Rafael,
> > > > 
> > > > Hi,
> > > > 
> > > > Sorry for the huge delay.
> > > > 
> > > > > On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote:
> > > > > > On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote:
> > > > > > > On platforms that do not power off during suspend, successfully 
> > > > > > > entering
> > > > > > > suspend races with timers.
> > > > > > > 
> > > > > > > The race happening in a couple of location is:
> > > > > > > 
> > > > > > >   1. disable IRQs (e.g. 
> > > > > > > arch_suspend_disable_irqs())
> > > > > > >  ...
> > > > > > >   2. syscore_suspend()
> > > > > > >   -> timekeeping_suspend()
> > > > > > >-> clockevents_notify(SUSPEND)
> > > > > > > -> tick_suspend() (timers are turned off here)
> > > > > > >  ...
> > > > > > >   3. wfi  (wait for wake-IRQ here)
> > > > > > > 
> > > > > > > Between steps 1 and 2 the timers can still generate interrupts 
> > > > > > > that are
> > > > > > > not handled and stay pending until step 3. That pending IRQ 
> > > > > > > causes an
> > > > > > > immediate - spurious - wake.
> > > > > > > 
> > > > > > > The solution is to move the clockevents suspend/resume 
> > > > > > > notification
> > > > > > > out of the syscore_suspend step and explictly call them at the 
> > > > > > > appropriate
> > > > > > > time in the suspend/hibernation paths. I.e. timers are suspend 
> > > > > > > _before_
> > > > > > > IRQs get disabled. And accordingly in the resume path.
> > > > > > > 
> > > > > > > Signed-off-by: Soren Brinkmann 
> > > > > > > ---
> > > > > > > Hi,
> > > > > > > 
> > > > > > > there was not a lot of discussion on the last submission. Just 
> > > > > > > one comment from
> > > > > > > Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I 
> > > > > > > outlined in my
> > > > > > > response, does not apply, IMHO, since the platform does not 
> > > > > > > re-enable
> > > > > > > interrupts.
> > > > > > 
> > > > > > Well, you just don't agree with it.
> > > > > > 
> > > > > > The problem with your approach is that timer interrupts aren't 
> > > > > > actually as
> > > > > > special as you think and any other IRQF_NO_SUSPEND interrupts would 
> > > > > > have caused
> > > > > > similar issues to appear under specific conditions.
> > > > > > 
> > > > > > The solution I would suggest and that actually covers all 
> > > > > > IRQF_NO_SUSPEND
> > > > > > interrupts would be to use a wait_event() loop like the one in 
> > > > > > freeze_enter()
> > > > > > (on top of the current linux-next or the pm-genirq branch of 
> > > > > > linux-pm.git),
> > > > > > but wait for pm_abort_suspend to become true, to implement system 
> > > > > > suspend.
> > > > > 
> > > > > sorry, it took me a while since I needed to get some dependencies 
> > > > > ported
> > > > > to the pm-genirq base. Once I had that, it reproduced my original 
> > > > > issue.
> > > > > So far so good. I then looked into finding a solution following your
> > > > > guidance. I'm not sure I really found what you had in mind, but below 
> > > > > is
> > > > > what I came up with, which seems to do it.
> > > > > Please let me know how far off I am.
> > > > > 
> > > > >   Thanks,
> > > > >   Sören
> > > > > 
> > > > > ---8<--8<8<8<---
> > > > > diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
> > > > > index c2744b30d5d9..a4f9914571f1 100644
> > > > > --- a/drivers/base/power/wakeup.c
> > > > > +++ b/drivers/base/power/wakeup.c
> > > > > @@ -25,7 +25,7 @@
> > > > >  bool events_check_enabled __read_mostly;
> > > > >  
> > > > >  /* If set and the system is suspending, terminate the suspend. */
> > > > > -static bool pm_abort_suspend __read_mostly;
> > > > > +bool pm_abort_suspend __read_mostly;
> > > > >  
> > > > >  /*
> > > > >   * Combined counters of registered wakeup events and wakeup events 
> > > > > in progress.
> > > > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > > > > index 6dadb25cb0d8..e6a6de8f76d0 100644
> > > > > --- a/kernel/power/suspend.c
> > > > > +++ b/kernel/power/suspend.c
> > > > > @@ -33,6 +33,7 @@
> > > > >  
> > > > >  static const char *pm_labels[] = { "mem", "standby", "freeze", };
> > > > >  const char *pm_states[PM_SUSPEND_MAX];
> > > > > +extern bool pm_abort_suspend;
> > > > >  
> > > > >  static const struct platform_suspend_ops *suspend_ops;
> > > > >  static const struct platform_freeze_ops *freeze_ops;
> > > > > @@ -294,25 +295,27 @@ static int 

Re: [PATCH RESEND] PM / sleep: Fix racing timers

2015-01-12 Thread Lorenzo Pieralisi
Hi Rafael, Soren,

On Sun, Jan 11, 2015 at 11:20:36PM +, Rafael J. Wysocki wrote:
 On Friday, January 09, 2015 01:50:59 PM Sören Brinkmann wrote:
  On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote:
   Hi Rafael,
   
   On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote:
On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote:
 Hi Rafael,

Hi,

Sorry for the huge delay.

 On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote:
  On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote:
   On platforms that do not power off during suspend, successfully 
   entering
   suspend races with timers.
   
   The race happening in a couple of location is:
   
 1. disable IRQs (e.g. 
   arch_suspend_disable_irqs())
...
 2. syscore_suspend()
 - timekeeping_suspend()
  - clockevents_notify(SUSPEND)
   - tick_suspend() (timers are turned off here)
...
 3. wfi  (wait for wake-IRQ here)
   
   Between steps 1 and 2 the timers can still generate interrupts 
   that are
   not handled and stay pending until step 3. That pending IRQ 
   causes an
   immediate - spurious - wake.
   
   The solution is to move the clockevents suspend/resume 
   notification
   out of the syscore_suspend step and explictly call them at the 
   appropriate
   time in the suspend/hibernation paths. I.e. timers are suspend 
   _before_
   IRQs get disabled. And accordingly in the resume path.
   
   Signed-off-by: Soren Brinkmann soren.brinkm...@xilinx.com
   ---
   Hi,
   
   there was not a lot of discussion on the last submission. Just 
   one comment from
   Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I 
   outlined in my
   response, does not apply, IMHO, since the platform does not 
   re-enable
   interrupts.
  
  Well, you just don't agree with it.
  
  The problem with your approach is that timer interrupts aren't 
  actually as
  special as you think and any other IRQF_NO_SUSPEND interrupts would 
  have caused
  similar issues to appear under specific conditions.
  
  The solution I would suggest and that actually covers all 
  IRQF_NO_SUSPEND
  interrupts would be to use a wait_event() loop like the one in 
  freeze_enter()
  (on top of the current linux-next or the pm-genirq branch of 
  linux-pm.git),
  but wait for pm_abort_suspend to become true, to implement system 
  suspend.
 
 sorry, it took me a while since I needed to get some dependencies 
 ported
 to the pm-genirq base. Once I had that, it reproduced my original 
 issue.
 So far so good. I then looked into finding a solution following your
 guidance. I'm not sure I really found what you had in mind, but below 
 is
 what I came up with, which seems to do it.
 Please let me know how far off I am.
 
   Thanks,
   Sören
 
 ---8--888---
 diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
 index c2744b30d5d9..a4f9914571f1 100644
 --- a/drivers/base/power/wakeup.c
 +++ b/drivers/base/power/wakeup.c
 @@ -25,7 +25,7 @@
  bool events_check_enabled __read_mostly;
  
  /* If set and the system is suspending, terminate the suspend. */
 -static bool pm_abort_suspend __read_mostly;
 +bool pm_abort_suspend __read_mostly;
  
  /*
   * Combined counters of registered wakeup events and wakeup events 
 in progress.
 diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
 index 6dadb25cb0d8..e6a6de8f76d0 100644
 --- a/kernel/power/suspend.c
 +++ b/kernel/power/suspend.c
 @@ -33,6 +33,7 @@
  
  static const char *pm_labels[] = { mem, standby, freeze, };
  const char *pm_states[PM_SUSPEND_MAX];
 +extern bool pm_abort_suspend;
  
  static const struct platform_suspend_ops *suspend_ops;
  static const struct platform_freeze_ops *freeze_ops;
 @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, 
 bool *wakeup)
   if (error || suspend_test(TEST_CPUS))
   goto Enable_cpus;
  
 - arch_suspend_disable_irqs();
 - BUG_ON(!irqs_disabled());
 -
 - error = syscore_suspend();
 - if (!error) {
 - *wakeup = pm_wakeup_pending();
 - if (!(suspend_test(TEST_CORE) || *wakeup)) {
 - trace_suspend_resume(TPS(machine_suspend),
 - state, true);
 - error = suspend_ops-enter(state);
 - 

Re: [PATCH RESEND] PM / sleep: Fix racing timers

2015-01-12 Thread Sören Brinkmann
On Mon, 2015-01-12 at 03:43PM +, Lorenzo Pieralisi wrote:
 Hi Rafael, Soren,
 
 On Sun, Jan 11, 2015 at 11:20:36PM +, Rafael J. Wysocki wrote:
  On Friday, January 09, 2015 01:50:59 PM Sören Brinkmann wrote:
   On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote:
Hi Rafael,

On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote:
 On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote:
  Hi Rafael,
 
 Hi,
 
 Sorry for the huge delay.
 
  On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote:
   On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote:
On platforms that do not power off during suspend, successfully 
entering
suspend races with timers.

The race happening in a couple of location is:

  1. disable IRQs   (e.g. 
arch_suspend_disable_irqs())
 ...
  2. syscore_suspend()
  - timekeeping_suspend()
   - clockevents_notify(SUSPEND)
- tick_suspend()   (timers are turned off here)
 ...
  3. wfi(wait for wake-IRQ here)

Between steps 1 and 2 the timers can still generate interrupts 
that are
not handled and stay pending until step 3. That pending IRQ 
causes an
immediate - spurious - wake.

The solution is to move the clockevents suspend/resume 
notification
out of the syscore_suspend step and explictly call them at the 
appropriate
time in the suspend/hibernation paths. I.e. timers are suspend 
_before_
IRQs get disabled. And accordingly in the resume path.

Signed-off-by: Soren Brinkmann soren.brinkm...@xilinx.com
---
Hi,

there was not a lot of discussion on the last submission. Just 
one comment from
Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I 
outlined in my
response, does not apply, IMHO, since the platform does not 
re-enable
interrupts.
   
   Well, you just don't agree with it.
   
   The problem with your approach is that timer interrupts aren't 
   actually as
   special as you think and any other IRQF_NO_SUSPEND interrupts 
   would have caused
   similar issues to appear under specific conditions.
   
   The solution I would suggest and that actually covers all 
   IRQF_NO_SUSPEND
   interrupts would be to use a wait_event() loop like the one in 
   freeze_enter()
   (on top of the current linux-next or the pm-genirq branch of 
   linux-pm.git),
   but wait for pm_abort_suspend to become true, to implement system 
   suspend.
  
  sorry, it took me a while since I needed to get some dependencies 
  ported
  to the pm-genirq base. Once I had that, it reproduced my original 
  issue.
  So far so good. I then looked into finding a solution following your
  guidance. I'm not sure I really found what you had in mind, but 
  below is
  what I came up with, which seems to do it.
  Please let me know how far off I am.
  
  Thanks,
  Sören
  
  ---8--888---
  diff --git a/drivers/base/power/wakeup.c 
  b/drivers/base/power/wakeup.c
  index c2744b30d5d9..a4f9914571f1 100644
  --- a/drivers/base/power/wakeup.c
  +++ b/drivers/base/power/wakeup.c
  @@ -25,7 +25,7 @@
   bool events_check_enabled __read_mostly;
   
   /* If set and the system is suspending, terminate the suspend. */
  -static bool pm_abort_suspend __read_mostly;
  +bool pm_abort_suspend __read_mostly;
   
   /*
* Combined counters of registered wakeup events and wakeup events 
  in progress.
  diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
  index 6dadb25cb0d8..e6a6de8f76d0 100644
  --- a/kernel/power/suspend.c
  +++ b/kernel/power/suspend.c
  @@ -33,6 +33,7 @@
   
   static const char *pm_labels[] = { mem, standby, freeze, };
   const char *pm_states[PM_SUSPEND_MAX];
  +extern bool pm_abort_suspend;
   
   static const struct platform_suspend_ops *suspend_ops;
   static const struct platform_freeze_ops *freeze_ops;
  @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t 
  state, bool *wakeup)
  if (error || suspend_test(TEST_CPUS))
  goto Enable_cpus;
   
  -   arch_suspend_disable_irqs();
  -   BUG_ON(!irqs_disabled());
  -
  -   error = syscore_suspend();
  -   if (!error) {
  -   *wakeup = pm_wakeup_pending();
  -   if (!(suspend_test(TEST_CORE) || *wakeup)) {
  -   trace_suspend_resume(TPS(machine_suspend),
  

Re: [PATCH RESEND] PM / sleep: Fix racing timers

2015-01-12 Thread Lorenzo Pieralisi
On Mon, Jan 12, 2015 at 03:55:05PM +, Sören Brinkmann wrote:
 On Mon, 2015-01-12 at 03:43PM +, Lorenzo Pieralisi wrote:
  Hi Rafael, Soren,
 
  On Sun, Jan 11, 2015 at 11:20:36PM +, Rafael J. Wysocki wrote:
   On Friday, January 09, 2015 01:50:59 PM Sören Brinkmann wrote:
On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote:
 Hi Rafael,

 On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote:
  On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote:
   Hi Rafael,
 
  Hi,
 
  Sorry for the huge delay.
 
   On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote:
On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote:
 On platforms that do not power off during suspend, 
 successfully entering
 suspend races with timers.

 The race happening in a couple of location is:

   1. disable IRQs   (e.g. 
 arch_suspend_disable_irqs())
  ...
   2. syscore_suspend()
   - timekeeping_suspend()
- clockevents_notify(SUSPEND)
 - tick_suspend()   (timers are turned off here)
  ...
   3. wfi(wait for wake-IRQ here)

 Between steps 1 and 2 the timers can still generate 
 interrupts that are
 not handled and stay pending until step 3. That pending IRQ 
 causes an
 immediate - spurious - wake.

 The solution is to move the clockevents suspend/resume 
 notification
 out of the syscore_suspend step and explictly call them at 
 the appropriate
 time in the suspend/hibernation paths. I.e. timers are 
 suspend _before_
 IRQs get disabled. And accordingly in the resume path.

 Signed-off-by: Soren Brinkmann soren.brinkm...@xilinx.com
 ---
 Hi,

 there was not a lot of discussion on the last submission. 
 Just one comment from
 Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I 
 outlined in my
 response, does not apply, IMHO, since the platform does not 
 re-enable
 interrupts.
   
Well, you just don't agree with it.
   
The problem with your approach is that timer interrupts aren't 
actually as
special as you think and any other IRQF_NO_SUSPEND interrupts 
would have caused
similar issues to appear under specific conditions.
   
The solution I would suggest and that actually covers all 
IRQF_NO_SUSPEND
interrupts would be to use a wait_event() loop like the one in 
freeze_enter()
(on top of the current linux-next or the pm-genirq branch of 
linux-pm.git),
but wait for pm_abort_suspend to become true, to implement 
system suspend.
  
   sorry, it took me a while since I needed to get some dependencies 
   ported
   to the pm-genirq base. Once I had that, it reproduced my original 
   issue.
   So far so good. I then looked into finding a solution following 
   your
   guidance. I'm not sure I really found what you had in mind, but 
   below is
   what I came up with, which seems to do it.
   Please let me know how far off I am.
  
   Thanks,
   Sören
  
   ---8--888---
   diff --git a/drivers/base/power/wakeup.c 
   b/drivers/base/power/wakeup.c
   index c2744b30d5d9..a4f9914571f1 100644
   --- a/drivers/base/power/wakeup.c
   +++ b/drivers/base/power/wakeup.c
   @@ -25,7 +25,7 @@
bool events_check_enabled __read_mostly;
  
/* If set and the system is suspending, terminate the suspend. */
   -static bool pm_abort_suspend __read_mostly;
   +bool pm_abort_suspend __read_mostly;
  
/*
 * Combined counters of registered wakeup events and wakeup 
   events in progress.
   diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
   index 6dadb25cb0d8..e6a6de8f76d0 100644
   --- a/kernel/power/suspend.c
   +++ b/kernel/power/suspend.c
   @@ -33,6 +33,7 @@
  
static const char *pm_labels[] = { mem, standby, freeze, };
const char *pm_states[PM_SUSPEND_MAX];
   +extern bool pm_abort_suspend;
  
static const struct platform_suspend_ops *suspend_ops;
static const struct platform_freeze_ops *freeze_ops;
   @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t 
   state, bool *wakeup)
   if (error || suspend_test(TEST_CPUS))
   goto Enable_cpus;
  
   -   arch_suspend_disable_irqs();
   -   BUG_ON(!irqs_disabled());
   -
   -   error = syscore_suspend();
   -   if (!error) {
   -  

Re: [PATCH RESEND] PM / sleep: Fix racing timers

2015-01-11 Thread Rafael J. Wysocki
On Friday, January 09, 2015 01:50:59 PM Sören Brinkmann wrote:
> On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote:
> > Hi Rafael,
> > 
> > On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote:
> > > On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote:
> > > > Hi Rafael,
> > > 
> > > Hi,
> > > 
> > > Sorry for the huge delay.
> > > 
> > > > On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote:
> > > > > On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote:
> > > > > > On platforms that do not power off during suspend, successfully 
> > > > > > entering
> > > > > > suspend races with timers.
> > > > > > 
> > > > > > The race happening in a couple of location is:
> > > > > > 
> > > > > >   1. disable IRQs   (e.g. 
> > > > > > arch_suspend_disable_irqs())
> > > > > >  ...
> > > > > >   2. syscore_suspend()
> > > > > >   -> timekeeping_suspend()
> > > > > >-> clockevents_notify(SUSPEND)
> > > > > > -> tick_suspend()   (timers are turned off here)
> > > > > >  ...
> > > > > >   3. wfi(wait for wake-IRQ here)
> > > > > > 
> > > > > > Between steps 1 and 2 the timers can still generate interrupts that 
> > > > > > are
> > > > > > not handled and stay pending until step 3. That pending IRQ causes 
> > > > > > an
> > > > > > immediate - spurious - wake.
> > > > > > 
> > > > > > The solution is to move the clockevents suspend/resume notification
> > > > > > out of the syscore_suspend step and explictly call them at the 
> > > > > > appropriate
> > > > > > time in the suspend/hibernation paths. I.e. timers are suspend 
> > > > > > _before_
> > > > > > IRQs get disabled. And accordingly in the resume path.
> > > > > > 
> > > > > > Signed-off-by: Soren Brinkmann 
> > > > > > ---
> > > > > > Hi,
> > > > > > 
> > > > > > there was not a lot of discussion on the last submission. Just one 
> > > > > > comment from
> > > > > > Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined 
> > > > > > in my
> > > > > > response, does not apply, IMHO, since the platform does not 
> > > > > > re-enable
> > > > > > interrupts.
> > > > > 
> > > > > Well, you just don't agree with it.
> > > > > 
> > > > > The problem with your approach is that timer interrupts aren't 
> > > > > actually as
> > > > > special as you think and any other IRQF_NO_SUSPEND interrupts would 
> > > > > have caused
> > > > > similar issues to appear under specific conditions.
> > > > > 
> > > > > The solution I would suggest and that actually covers all 
> > > > > IRQF_NO_SUSPEND
> > > > > interrupts would be to use a wait_event() loop like the one in 
> > > > > freeze_enter()
> > > > > (on top of the current linux-next or the pm-genirq branch of 
> > > > > linux-pm.git),
> > > > > but wait for pm_abort_suspend to become true, to implement system 
> > > > > suspend.
> > > > 
> > > > sorry, it took me a while since I needed to get some dependencies ported
> > > > to the pm-genirq base. Once I had that, it reproduced my original issue.
> > > > So far so good. I then looked into finding a solution following your
> > > > guidance. I'm not sure I really found what you had in mind, but below is
> > > > what I came up with, which seems to do it.
> > > > Please let me know how far off I am.
> > > > 
> > > > Thanks,
> > > > Sören
> > > > 
> > > > ---8<--8<8<8<---
> > > > diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
> > > > index c2744b30d5d9..a4f9914571f1 100644
> > > > --- a/drivers/base/power/wakeup.c
> > > > +++ b/drivers/base/power/wakeup.c
> > > > @@ -25,7 +25,7 @@
> > > >  bool events_check_enabled __read_mostly;
> > > >  
> > > >  /* If set and the system is suspending, terminate the suspend. */
> > > > -static bool pm_abort_suspend __read_mostly;
> > > > +bool pm_abort_suspend __read_mostly;
> > > >  
> > > >  /*
> > > >   * Combined counters of registered wakeup events and wakeup events in 
> > > > progress.
> > > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > > > index 6dadb25cb0d8..e6a6de8f76d0 100644
> > > > --- a/kernel/power/suspend.c
> > > > +++ b/kernel/power/suspend.c
> > > > @@ -33,6 +33,7 @@
> > > >  
> > > >  static const char *pm_labels[] = { "mem", "standby", "freeze", };
> > > >  const char *pm_states[PM_SUSPEND_MAX];
> > > > +extern bool pm_abort_suspend;
> > > >  
> > > >  static const struct platform_suspend_ops *suspend_ops;
> > > >  static const struct platform_freeze_ops *freeze_ops;
> > > > @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, 
> > > > bool *wakeup)
> > > > if (error || suspend_test(TEST_CPUS))
> > > > goto Enable_cpus;
> > > >  
> > > > -   arch_suspend_disable_irqs();
> > > > -   BUG_ON(!irqs_disabled());
> > > > -
> > > > -   error = syscore_suspend();
> > > > -   if (!error) {
> > > > -   

Re: [PATCH RESEND] PM / sleep: Fix racing timers

2015-01-11 Thread Rafael J. Wysocki
On Friday, January 09, 2015 01:50:59 PM Sören Brinkmann wrote:
 On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote:
  Hi Rafael,
  
  On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote:
   On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote:
Hi Rafael,
   
   Hi,
   
   Sorry for the huge delay.
   
On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote:
 On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote:
  On platforms that do not power off during suspend, successfully 
  entering
  suspend races with timers.
  
  The race happening in a couple of location is:
  
1. disable IRQs   (e.g. 
  arch_suspend_disable_irqs())
   ...
2. syscore_suspend()
- timekeeping_suspend()
 - clockevents_notify(SUSPEND)
  - tick_suspend()   (timers are turned off here)
   ...
3. wfi(wait for wake-IRQ here)
  
  Between steps 1 and 2 the timers can still generate interrupts that 
  are
  not handled and stay pending until step 3. That pending IRQ causes 
  an
  immediate - spurious - wake.
  
  The solution is to move the clockevents suspend/resume notification
  out of the syscore_suspend step and explictly call them at the 
  appropriate
  time in the suspend/hibernation paths. I.e. timers are suspend 
  _before_
  IRQs get disabled. And accordingly in the resume path.
  
  Signed-off-by: Soren Brinkmann soren.brinkm...@xilinx.com
  ---
  Hi,
  
  there was not a lot of discussion on the last submission. Just one 
  comment from
  Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined 
  in my
  response, does not apply, IMHO, since the platform does not 
  re-enable
  interrupts.
 
 Well, you just don't agree with it.
 
 The problem with your approach is that timer interrupts aren't 
 actually as
 special as you think and any other IRQF_NO_SUSPEND interrupts would 
 have caused
 similar issues to appear under specific conditions.
 
 The solution I would suggest and that actually covers all 
 IRQF_NO_SUSPEND
 interrupts would be to use a wait_event() loop like the one in 
 freeze_enter()
 (on top of the current linux-next or the pm-genirq branch of 
 linux-pm.git),
 but wait for pm_abort_suspend to become true, to implement system 
 suspend.

sorry, it took me a while since I needed to get some dependencies ported
to the pm-genirq base. Once I had that, it reproduced my original issue.
So far so good. I then looked into finding a solution following your
guidance. I'm not sure I really found what you had in mind, but below is
what I came up with, which seems to do it.
Please let me know how far off I am.

Thanks,
Sören

---8--888---
diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
index c2744b30d5d9..a4f9914571f1 100644
--- a/drivers/base/power/wakeup.c
+++ b/drivers/base/power/wakeup.c
@@ -25,7 +25,7 @@
 bool events_check_enabled __read_mostly;
 
 /* If set and the system is suspending, terminate the suspend. */
-static bool pm_abort_suspend __read_mostly;
+bool pm_abort_suspend __read_mostly;
 
 /*
  * Combined counters of registered wakeup events and wakeup events in 
progress.
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 6dadb25cb0d8..e6a6de8f76d0 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -33,6 +33,7 @@
 
 static const char *pm_labels[] = { mem, standby, freeze, };
 const char *pm_states[PM_SUSPEND_MAX];
+extern bool pm_abort_suspend;
 
 static const struct platform_suspend_ops *suspend_ops;
 static const struct platform_freeze_ops *freeze_ops;
@@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, 
bool *wakeup)
if (error || suspend_test(TEST_CPUS))
goto Enable_cpus;
 
-   arch_suspend_disable_irqs();
-   BUG_ON(!irqs_disabled());
-
-   error = syscore_suspend();
-   if (!error) {
-   *wakeup = pm_wakeup_pending();
-   if (!(suspend_test(TEST_CORE) || *wakeup)) {
-   trace_suspend_resume(TPS(machine_suspend),
-   state, true);
-   error = suspend_ops-enter(state);
-   trace_suspend_resume(TPS(machine_suspend),
-   state, false);
-   events_check_enabled = false;
+   while (!pm_abort_suspend) {
   
   That won't work in general, because 

Re: [PATCH RESEND] PM / sleep: Fix racing timers

2015-01-09 Thread Sören Brinkmann
On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote:
> Hi Rafael,
> 
> On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote:
> > On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote:
> > > Hi Rafael,
> > 
> > Hi,
> > 
> > Sorry for the huge delay.
> > 
> > > On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote:
> > > > On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote:
> > > > > On platforms that do not power off during suspend, successfully 
> > > > > entering
> > > > > suspend races with timers.
> > > > > 
> > > > > The race happening in a couple of location is:
> > > > > 
> > > > >   1. disable IRQs (e.g. arch_suspend_disable_irqs())
> > > > >  ...
> > > > >   2. syscore_suspend()
> > > > >   -> timekeeping_suspend()
> > > > >-> clockevents_notify(SUSPEND)
> > > > > -> tick_suspend() (timers are turned off here)
> > > > >  ...
> > > > >   3. wfi  (wait for wake-IRQ here)
> > > > > 
> > > > > Between steps 1 and 2 the timers can still generate interrupts that 
> > > > > are
> > > > > not handled and stay pending until step 3. That pending IRQ causes an
> > > > > immediate - spurious - wake.
> > > > > 
> > > > > The solution is to move the clockevents suspend/resume notification
> > > > > out of the syscore_suspend step and explictly call them at the 
> > > > > appropriate
> > > > > time in the suspend/hibernation paths. I.e. timers are suspend 
> > > > > _before_
> > > > > IRQs get disabled. And accordingly in the resume path.
> > > > > 
> > > > > Signed-off-by: Soren Brinkmann 
> > > > > ---
> > > > > Hi,
> > > > > 
> > > > > there was not a lot of discussion on the last submission. Just one 
> > > > > comment from
> > > > > Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined 
> > > > > in my
> > > > > response, does not apply, IMHO, since the platform does not re-enable
> > > > > interrupts.
> > > > 
> > > > Well, you just don't agree with it.
> > > > 
> > > > The problem with your approach is that timer interrupts aren't actually 
> > > > as
> > > > special as you think and any other IRQF_NO_SUSPEND interrupts would 
> > > > have caused
> > > > similar issues to appear under specific conditions.
> > > > 
> > > > The solution I would suggest and that actually covers all 
> > > > IRQF_NO_SUSPEND
> > > > interrupts would be to use a wait_event() loop like the one in 
> > > > freeze_enter()
> > > > (on top of the current linux-next or the pm-genirq branch of 
> > > > linux-pm.git),
> > > > but wait for pm_abort_suspend to become true, to implement system 
> > > > suspend.
> > > 
> > > sorry, it took me a while since I needed to get some dependencies ported
> > > to the pm-genirq base. Once I had that, it reproduced my original issue.
> > > So far so good. I then looked into finding a solution following your
> > > guidance. I'm not sure I really found what you had in mind, but below is
> > > what I came up with, which seems to do it.
> > > Please let me know how far off I am.
> > > 
> > >   Thanks,
> > >   Sören
> > > 
> > > ---8<--8<8<8<---
> > > diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
> > > index c2744b30d5d9..a4f9914571f1 100644
> > > --- a/drivers/base/power/wakeup.c
> > > +++ b/drivers/base/power/wakeup.c
> > > @@ -25,7 +25,7 @@
> > >  bool events_check_enabled __read_mostly;
> > >  
> > >  /* If set and the system is suspending, terminate the suspend. */
> > > -static bool pm_abort_suspend __read_mostly;
> > > +bool pm_abort_suspend __read_mostly;
> > >  
> > >  /*
> > >   * Combined counters of registered wakeup events and wakeup events in 
> > > progress.
> > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > > index 6dadb25cb0d8..e6a6de8f76d0 100644
> > > --- a/kernel/power/suspend.c
> > > +++ b/kernel/power/suspend.c
> > > @@ -33,6 +33,7 @@
> > >  
> > >  static const char *pm_labels[] = { "mem", "standby", "freeze", };
> > >  const char *pm_states[PM_SUSPEND_MAX];
> > > +extern bool pm_abort_suspend;
> > >  
> > >  static const struct platform_suspend_ops *suspend_ops;
> > >  static const struct platform_freeze_ops *freeze_ops;
> > > @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, 
> > > bool *wakeup)
> > >   if (error || suspend_test(TEST_CPUS))
> > >   goto Enable_cpus;
> > >  
> > > - arch_suspend_disable_irqs();
> > > - BUG_ON(!irqs_disabled());
> > > -
> > > - error = syscore_suspend();
> > > - if (!error) {
> > > - *wakeup = pm_wakeup_pending();
> > > - if (!(suspend_test(TEST_CORE) || *wakeup)) {
> > > - trace_suspend_resume(TPS("machine_suspend"),
> > > - state, true);
> > > - error = suspend_ops->enter(state);
> > > - trace_suspend_resume(TPS("machine_suspend"),
> > > - state, false);
> > > - 

Re: [PATCH RESEND] PM / sleep: Fix racing timers

2015-01-09 Thread Sören Brinkmann
On Sat, 2014-11-08 at 03:56PM -0800, Sören Brinkmann wrote:
 Hi Rafael,
 
 On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote:
  On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote:
   Hi Rafael,
  
  Hi,
  
  Sorry for the huge delay.
  
   On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote:
On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote:
 On platforms that do not power off during suspend, successfully 
 entering
 suspend races with timers.
 
 The race happening in a couple of location is:
 
   1. disable IRQs (e.g. arch_suspend_disable_irqs())
  ...
   2. syscore_suspend()
   - timekeeping_suspend()
- clockevents_notify(SUSPEND)
 - tick_suspend() (timers are turned off here)
  ...
   3. wfi  (wait for wake-IRQ here)
 
 Between steps 1 and 2 the timers can still generate interrupts that 
 are
 not handled and stay pending until step 3. That pending IRQ causes an
 immediate - spurious - wake.
 
 The solution is to move the clockevents suspend/resume notification
 out of the syscore_suspend step and explictly call them at the 
 appropriate
 time in the suspend/hibernation paths. I.e. timers are suspend 
 _before_
 IRQs get disabled. And accordingly in the resume path.
 
 Signed-off-by: Soren Brinkmann soren.brinkm...@xilinx.com
 ---
 Hi,
 
 there was not a lot of discussion on the last submission. Just one 
 comment from
 Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined 
 in my
 response, does not apply, IMHO, since the platform does not re-enable
 interrupts.

Well, you just don't agree with it.

The problem with your approach is that timer interrupts aren't actually 
as
special as you think and any other IRQF_NO_SUSPEND interrupts would 
have caused
similar issues to appear under specific conditions.

The solution I would suggest and that actually covers all 
IRQF_NO_SUSPEND
interrupts would be to use a wait_event() loop like the one in 
freeze_enter()
(on top of the current linux-next or the pm-genirq branch of 
linux-pm.git),
but wait for pm_abort_suspend to become true, to implement system 
suspend.
   
   sorry, it took me a while since I needed to get some dependencies ported
   to the pm-genirq base. Once I had that, it reproduced my original issue.
   So far so good. I then looked into finding a solution following your
   guidance. I'm not sure I really found what you had in mind, but below is
   what I came up with, which seems to do it.
   Please let me know how far off I am.
   
 Thanks,
 Sören
   
   ---8--888---
   diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
   index c2744b30d5d9..a4f9914571f1 100644
   --- a/drivers/base/power/wakeup.c
   +++ b/drivers/base/power/wakeup.c
   @@ -25,7 +25,7 @@
bool events_check_enabled __read_mostly;

/* If set and the system is suspending, terminate the suspend. */
   -static bool pm_abort_suspend __read_mostly;
   +bool pm_abort_suspend __read_mostly;

/*
 * Combined counters of registered wakeup events and wakeup events in 
   progress.
   diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
   index 6dadb25cb0d8..e6a6de8f76d0 100644
   --- a/kernel/power/suspend.c
   +++ b/kernel/power/suspend.c
   @@ -33,6 +33,7 @@

static const char *pm_labels[] = { mem, standby, freeze, };
const char *pm_states[PM_SUSPEND_MAX];
   +extern bool pm_abort_suspend;

static const struct platform_suspend_ops *suspend_ops;
static const struct platform_freeze_ops *freeze_ops;
   @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, 
   bool *wakeup)
 if (error || suspend_test(TEST_CPUS))
 goto Enable_cpus;

   - arch_suspend_disable_irqs();
   - BUG_ON(!irqs_disabled());
   -
   - error = syscore_suspend();
   - if (!error) {
   - *wakeup = pm_wakeup_pending();
   - if (!(suspend_test(TEST_CORE) || *wakeup)) {
   - trace_suspend_resume(TPS(machine_suspend),
   - state, true);
   - error = suspend_ops-enter(state);
   - trace_suspend_resume(TPS(machine_suspend),
   - state, false);
   - events_check_enabled = false;
   + while (!pm_abort_suspend) {
  
  That won't work in general, because pm_abort_suspend may not be set on some
  platforms on wakeup.  It is only set if a wakeup interrupt triggers which
  may not be the case on ACPI systems if the BIOS has woken up the system.
  
  But that could be addressed by making those platforms simply set 
  pm_wakeup_pending
  in their BIOS exit path.
  
   + 

Re: [PATCH RESEND] PM / sleep: Fix racing timers

2014-11-08 Thread Sören Brinkmann
Hi Rafael,

On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote:
> On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote:
> > Hi Rafael,
> 
> Hi,
> 
> Sorry for the huge delay.
> 
> > On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote:
> > > On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote:
> > > > On platforms that do not power off during suspend, successfully entering
> > > > suspend races with timers.
> > > > 
> > > > The race happening in a couple of location is:
> > > > 
> > > >   1. disable IRQs   (e.g. arch_suspend_disable_irqs())
> > > >  ...
> > > >   2. syscore_suspend()
> > > >   -> timekeeping_suspend()
> > > >-> clockevents_notify(SUSPEND)
> > > > -> tick_suspend()   (timers are turned off here)
> > > >  ...
> > > >   3. wfi(wait for wake-IRQ here)
> > > > 
> > > > Between steps 1 and 2 the timers can still generate interrupts that are
> > > > not handled and stay pending until step 3. That pending IRQ causes an
> > > > immediate - spurious - wake.
> > > > 
> > > > The solution is to move the clockevents suspend/resume notification
> > > > out of the syscore_suspend step and explictly call them at the 
> > > > appropriate
> > > > time in the suspend/hibernation paths. I.e. timers are suspend _before_
> > > > IRQs get disabled. And accordingly in the resume path.
> > > > 
> > > > Signed-off-by: Soren Brinkmann 
> > > > ---
> > > > Hi,
> > > > 
> > > > there was not a lot of discussion on the last submission. Just one 
> > > > comment from
> > > > Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in 
> > > > my
> > > > response, does not apply, IMHO, since the platform does not re-enable
> > > > interrupts.
> > > 
> > > Well, you just don't agree with it.
> > > 
> > > The problem with your approach is that timer interrupts aren't actually as
> > > special as you think and any other IRQF_NO_SUSPEND interrupts would have 
> > > caused
> > > similar issues to appear under specific conditions.
> > > 
> > > The solution I would suggest and that actually covers all IRQF_NO_SUSPEND
> > > interrupts would be to use a wait_event() loop like the one in 
> > > freeze_enter()
> > > (on top of the current linux-next or the pm-genirq branch of 
> > > linux-pm.git),
> > > but wait for pm_abort_suspend to become true, to implement system suspend.
> > 
> > sorry, it took me a while since I needed to get some dependencies ported
> > to the pm-genirq base. Once I had that, it reproduced my original issue.
> > So far so good. I then looked into finding a solution following your
> > guidance. I'm not sure I really found what you had in mind, but below is
> > what I came up with, which seems to do it.
> > Please let me know how far off I am.
> > 
> > Thanks,
> > Sören
> > 
> > ---8<--8<8<8<---
> > diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
> > index c2744b30d5d9..a4f9914571f1 100644
> > --- a/drivers/base/power/wakeup.c
> > +++ b/drivers/base/power/wakeup.c
> > @@ -25,7 +25,7 @@
> >  bool events_check_enabled __read_mostly;
> >  
> >  /* If set and the system is suspending, terminate the suspend. */
> > -static bool pm_abort_suspend __read_mostly;
> > +bool pm_abort_suspend __read_mostly;
> >  
> >  /*
> >   * Combined counters of registered wakeup events and wakeup events in 
> > progress.
> > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> > index 6dadb25cb0d8..e6a6de8f76d0 100644
> > --- a/kernel/power/suspend.c
> > +++ b/kernel/power/suspend.c
> > @@ -33,6 +33,7 @@
> >  
> >  static const char *pm_labels[] = { "mem", "standby", "freeze", };
> >  const char *pm_states[PM_SUSPEND_MAX];
> > +extern bool pm_abort_suspend;
> >  
> >  static const struct platform_suspend_ops *suspend_ops;
> >  static const struct platform_freeze_ops *freeze_ops;
> > @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, bool 
> > *wakeup)
> > if (error || suspend_test(TEST_CPUS))
> > goto Enable_cpus;
> >  
> > -   arch_suspend_disable_irqs();
> > -   BUG_ON(!irqs_disabled());
> > -
> > -   error = syscore_suspend();
> > -   if (!error) {
> > -   *wakeup = pm_wakeup_pending();
> > -   if (!(suspend_test(TEST_CORE) || *wakeup)) {
> > -   trace_suspend_resume(TPS("machine_suspend"),
> > -   state, true);
> > -   error = suspend_ops->enter(state);
> > -   trace_suspend_resume(TPS("machine_suspend"),
> > -   state, false);
> > -   events_check_enabled = false;
> > +   while (!pm_abort_suspend) {
> 
> That won't work in general, because pm_abort_suspend may not be set on some
> platforms on wakeup.  It is only set if a wakeup interrupt triggers which
> may not be the case on ACPI systems if the BIOS has woken up the system.
> 
> But that could 

Re: [PATCH RESEND] PM / sleep: Fix racing timers

2014-11-08 Thread Sören Brinkmann
Hi Rafael,

On Thu, 2014-11-06 at 01:33AM +0100, Rafael J. Wysocki wrote:
 On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote:
  Hi Rafael,
 
 Hi,
 
 Sorry for the huge delay.
 
  On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote:
   On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote:
On platforms that do not power off during suspend, successfully entering
suspend races with timers.

The race happening in a couple of location is:

  1. disable IRQs   (e.g. arch_suspend_disable_irqs())
 ...
  2. syscore_suspend()
  - timekeeping_suspend()
   - clockevents_notify(SUSPEND)
- tick_suspend()   (timers are turned off here)
 ...
  3. wfi(wait for wake-IRQ here)

Between steps 1 and 2 the timers can still generate interrupts that are
not handled and stay pending until step 3. That pending IRQ causes an
immediate - spurious - wake.

The solution is to move the clockevents suspend/resume notification
out of the syscore_suspend step and explictly call them at the 
appropriate
time in the suspend/hibernation paths. I.e. timers are suspend _before_
IRQs get disabled. And accordingly in the resume path.

Signed-off-by: Soren Brinkmann soren.brinkm...@xilinx.com
---
Hi,

there was not a lot of discussion on the last submission. Just one 
comment from
Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in 
my
response, does not apply, IMHO, since the platform does not re-enable
interrupts.
   
   Well, you just don't agree with it.
   
   The problem with your approach is that timer interrupts aren't actually as
   special as you think and any other IRQF_NO_SUSPEND interrupts would have 
   caused
   similar issues to appear under specific conditions.
   
   The solution I would suggest and that actually covers all IRQF_NO_SUSPEND
   interrupts would be to use a wait_event() loop like the one in 
   freeze_enter()
   (on top of the current linux-next or the pm-genirq branch of 
   linux-pm.git),
   but wait for pm_abort_suspend to become true, to implement system suspend.
  
  sorry, it took me a while since I needed to get some dependencies ported
  to the pm-genirq base. Once I had that, it reproduced my original issue.
  So far so good. I then looked into finding a solution following your
  guidance. I'm not sure I really found what you had in mind, but below is
  what I came up with, which seems to do it.
  Please let me know how far off I am.
  
  Thanks,
  Sören
  
  ---8--888---
  diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
  index c2744b30d5d9..a4f9914571f1 100644
  --- a/drivers/base/power/wakeup.c
  +++ b/drivers/base/power/wakeup.c
  @@ -25,7 +25,7 @@
   bool events_check_enabled __read_mostly;
   
   /* If set and the system is suspending, terminate the suspend. */
  -static bool pm_abort_suspend __read_mostly;
  +bool pm_abort_suspend __read_mostly;
   
   /*
* Combined counters of registered wakeup events and wakeup events in 
  progress.
  diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
  index 6dadb25cb0d8..e6a6de8f76d0 100644
  --- a/kernel/power/suspend.c
  +++ b/kernel/power/suspend.c
  @@ -33,6 +33,7 @@
   
   static const char *pm_labels[] = { mem, standby, freeze, };
   const char *pm_states[PM_SUSPEND_MAX];
  +extern bool pm_abort_suspend;
   
   static const struct platform_suspend_ops *suspend_ops;
   static const struct platform_freeze_ops *freeze_ops;
  @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, bool 
  *wakeup)
  if (error || suspend_test(TEST_CPUS))
  goto Enable_cpus;
   
  -   arch_suspend_disable_irqs();
  -   BUG_ON(!irqs_disabled());
  -
  -   error = syscore_suspend();
  -   if (!error) {
  -   *wakeup = pm_wakeup_pending();
  -   if (!(suspend_test(TEST_CORE) || *wakeup)) {
  -   trace_suspend_resume(TPS(machine_suspend),
  -   state, true);
  -   error = suspend_ops-enter(state);
  -   trace_suspend_resume(TPS(machine_suspend),
  -   state, false);
  -   events_check_enabled = false;
  +   while (!pm_abort_suspend) {
 
 That won't work in general, because pm_abort_suspend may not be set on some
 platforms on wakeup.  It is only set if a wakeup interrupt triggers which
 may not be the case on ACPI systems if the BIOS has woken up the system.
 
 But that could be addressed by making those platforms simply set 
 pm_wakeup_pending
 in their BIOS exit path.
 
  +   arch_suspend_disable_irqs();
  +   BUG_ON(!irqs_disabled());
  +
  +   error = syscore_suspend();
 
 Also it shouldn't be necessary to do syscore_suspend()/syscore_resume() 

Re: [PATCH RESEND] PM / sleep: Fix racing timers

2014-11-05 Thread Rafael J. Wysocki
On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote:
> Hi Rafael,

Hi,

Sorry for the huge delay.

> On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote:
> > On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote:
> > > On platforms that do not power off during suspend, successfully entering
> > > suspend races with timers.
> > > 
> > > The race happening in a couple of location is:
> > > 
> > >   1. disable IRQs (e.g. arch_suspend_disable_irqs())
> > >  ...
> > >   2. syscore_suspend()
> > >   -> timekeeping_suspend()
> > >-> clockevents_notify(SUSPEND)
> > > -> tick_suspend() (timers are turned off here)
> > >  ...
> > >   3. wfi  (wait for wake-IRQ here)
> > > 
> > > Between steps 1 and 2 the timers can still generate interrupts that are
> > > not handled and stay pending until step 3. That pending IRQ causes an
> > > immediate - spurious - wake.
> > > 
> > > The solution is to move the clockevents suspend/resume notification
> > > out of the syscore_suspend step and explictly call them at the appropriate
> > > time in the suspend/hibernation paths. I.e. timers are suspend _before_
> > > IRQs get disabled. And accordingly in the resume path.
> > > 
> > > Signed-off-by: Soren Brinkmann 
> > > ---
> > > Hi,
> > > 
> > > there was not a lot of discussion on the last submission. Just one 
> > > comment from
> > > Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in my
> > > response, does not apply, IMHO, since the platform does not re-enable
> > > interrupts.
> > 
> > Well, you just don't agree with it.
> > 
> > The problem with your approach is that timer interrupts aren't actually as
> > special as you think and any other IRQF_NO_SUSPEND interrupts would have 
> > caused
> > similar issues to appear under specific conditions.
> > 
> > The solution I would suggest and that actually covers all IRQF_NO_SUSPEND
> > interrupts would be to use a wait_event() loop like the one in 
> > freeze_enter()
> > (on top of the current linux-next or the pm-genirq branch of linux-pm.git),
> > but wait for pm_abort_suspend to become true, to implement system suspend.
> 
> sorry, it took me a while since I needed to get some dependencies ported
> to the pm-genirq base. Once I had that, it reproduced my original issue.
> So far so good. I then looked into finding a solution following your
> guidance. I'm not sure I really found what you had in mind, but below is
> what I came up with, which seems to do it.
> Please let me know how far off I am.
> 
>   Thanks,
>   Sören
> 
> ---8<--8<8<8<---
> diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
> index c2744b30d5d9..a4f9914571f1 100644
> --- a/drivers/base/power/wakeup.c
> +++ b/drivers/base/power/wakeup.c
> @@ -25,7 +25,7 @@
>  bool events_check_enabled __read_mostly;
>  
>  /* If set and the system is suspending, terminate the suspend. */
> -static bool pm_abort_suspend __read_mostly;
> +bool pm_abort_suspend __read_mostly;
>  
>  /*
>   * Combined counters of registered wakeup events and wakeup events in 
> progress.
> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> index 6dadb25cb0d8..e6a6de8f76d0 100644
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -33,6 +33,7 @@
>  
>  static const char *pm_labels[] = { "mem", "standby", "freeze", };
>  const char *pm_states[PM_SUSPEND_MAX];
> +extern bool pm_abort_suspend;
>  
>  static const struct platform_suspend_ops *suspend_ops;
>  static const struct platform_freeze_ops *freeze_ops;
> @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, bool 
> *wakeup)
>   if (error || suspend_test(TEST_CPUS))
>   goto Enable_cpus;
>  
> - arch_suspend_disable_irqs();
> - BUG_ON(!irqs_disabled());
> -
> - error = syscore_suspend();
> - if (!error) {
> - *wakeup = pm_wakeup_pending();
> - if (!(suspend_test(TEST_CORE) || *wakeup)) {
> - trace_suspend_resume(TPS("machine_suspend"),
> - state, true);
> - error = suspend_ops->enter(state);
> - trace_suspend_resume(TPS("machine_suspend"),
> - state, false);
> - events_check_enabled = false;
> + while (!pm_abort_suspend) {

That won't work in general, because pm_abort_suspend may not be set on some
platforms on wakeup.  It is only set if a wakeup interrupt triggers which
may not be the case on ACPI systems if the BIOS has woken up the system.

But that could be addressed by making those platforms simply set 
pm_wakeup_pending
in their BIOS exit path.

> + arch_suspend_disable_irqs();
> + BUG_ON(!irqs_disabled());
> +
> + error = syscore_suspend();

Also it shouldn't be necessary to do 

Re: [PATCH RESEND] PM / sleep: Fix racing timers

2014-11-05 Thread Rafael J. Wysocki
On Thursday, October 02, 2014 09:01:15 AM Sören Brinkmann wrote:
 Hi Rafael,

Hi,

Sorry for the huge delay.

 On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote:
  On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote:
   On platforms that do not power off during suspend, successfully entering
   suspend races with timers.
   
   The race happening in a couple of location is:
   
 1. disable IRQs (e.g. arch_suspend_disable_irqs())
...
 2. syscore_suspend()
 - timekeeping_suspend()
  - clockevents_notify(SUSPEND)
   - tick_suspend() (timers are turned off here)
...
 3. wfi  (wait for wake-IRQ here)
   
   Between steps 1 and 2 the timers can still generate interrupts that are
   not handled and stay pending until step 3. That pending IRQ causes an
   immediate - spurious - wake.
   
   The solution is to move the clockevents suspend/resume notification
   out of the syscore_suspend step and explictly call them at the appropriate
   time in the suspend/hibernation paths. I.e. timers are suspend _before_
   IRQs get disabled. And accordingly in the resume path.
   
   Signed-off-by: Soren Brinkmann soren.brinkm...@xilinx.com
   ---
   Hi,
   
   there was not a lot of discussion on the last submission. Just one 
   comment from
   Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in my
   response, does not apply, IMHO, since the platform does not re-enable
   interrupts.
  
  Well, you just don't agree with it.
  
  The problem with your approach is that timer interrupts aren't actually as
  special as you think and any other IRQF_NO_SUSPEND interrupts would have 
  caused
  similar issues to appear under specific conditions.
  
  The solution I would suggest and that actually covers all IRQF_NO_SUSPEND
  interrupts would be to use a wait_event() loop like the one in 
  freeze_enter()
  (on top of the current linux-next or the pm-genirq branch of linux-pm.git),
  but wait for pm_abort_suspend to become true, to implement system suspend.
 
 sorry, it took me a while since I needed to get some dependencies ported
 to the pm-genirq base. Once I had that, it reproduced my original issue.
 So far so good. I then looked into finding a solution following your
 guidance. I'm not sure I really found what you had in mind, but below is
 what I came up with, which seems to do it.
 Please let me know how far off I am.
 
   Thanks,
   Sören
 
 ---8--888---
 diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
 index c2744b30d5d9..a4f9914571f1 100644
 --- a/drivers/base/power/wakeup.c
 +++ b/drivers/base/power/wakeup.c
 @@ -25,7 +25,7 @@
  bool events_check_enabled __read_mostly;
  
  /* If set and the system is suspending, terminate the suspend. */
 -static bool pm_abort_suspend __read_mostly;
 +bool pm_abort_suspend __read_mostly;
  
  /*
   * Combined counters of registered wakeup events and wakeup events in 
 progress.
 diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
 index 6dadb25cb0d8..e6a6de8f76d0 100644
 --- a/kernel/power/suspend.c
 +++ b/kernel/power/suspend.c
 @@ -33,6 +33,7 @@
  
  static const char *pm_labels[] = { mem, standby, freeze, };
  const char *pm_states[PM_SUSPEND_MAX];
 +extern bool pm_abort_suspend;
  
  static const struct platform_suspend_ops *suspend_ops;
  static const struct platform_freeze_ops *freeze_ops;
 @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, bool 
 *wakeup)
   if (error || suspend_test(TEST_CPUS))
   goto Enable_cpus;
  
 - arch_suspend_disable_irqs();
 - BUG_ON(!irqs_disabled());
 -
 - error = syscore_suspend();
 - if (!error) {
 - *wakeup = pm_wakeup_pending();
 - if (!(suspend_test(TEST_CORE) || *wakeup)) {
 - trace_suspend_resume(TPS(machine_suspend),
 - state, true);
 - error = suspend_ops-enter(state);
 - trace_suspend_resume(TPS(machine_suspend),
 - state, false);
 - events_check_enabled = false;
 + while (!pm_abort_suspend) {

That won't work in general, because pm_abort_suspend may not be set on some
platforms on wakeup.  It is only set if a wakeup interrupt triggers which
may not be the case on ACPI systems if the BIOS has woken up the system.

But that could be addressed by making those platforms simply set 
pm_wakeup_pending
in their BIOS exit path.

 + arch_suspend_disable_irqs();
 + BUG_ON(!irqs_disabled());
 +
 + error = syscore_suspend();

Also it shouldn't be necessary to do syscore_suspend()/syscore_resume() in
every iteration of the loop.

 + if (!error) {
 + *wakeup = pm_wakeup_pending();

Plus pm_wakeup_pending() returns true if 

Re: [PATCH RESEND] PM / sleep: Fix racing timers

2014-10-28 Thread Sören Brinkmann
Hi Rafael,

any opinion on this?

Thanks,
Sören

On Thu, 2014-10-02 at 09:01AM -0700, Sören Brinkmann wrote:
> Hi Rafael,
> 
> On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote:
> > On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote:
> > > On platforms that do not power off during suspend, successfully entering
> > > suspend races with timers.
> > > 
> > > The race happening in a couple of location is:
> > > 
> > >   1. disable IRQs (e.g. arch_suspend_disable_irqs())
> > >  ...
> > >   2. syscore_suspend()
> > >   -> timekeeping_suspend()
> > >-> clockevents_notify(SUSPEND)
> > > -> tick_suspend() (timers are turned off here)
> > >  ...
> > >   3. wfi  (wait for wake-IRQ here)
> > > 
> > > Between steps 1 and 2 the timers can still generate interrupts that are
> > > not handled and stay pending until step 3. That pending IRQ causes an
> > > immediate - spurious - wake.
> > > 
> > > The solution is to move the clockevents suspend/resume notification
> > > out of the syscore_suspend step and explictly call them at the appropriate
> > > time in the suspend/hibernation paths. I.e. timers are suspend _before_
> > > IRQs get disabled. And accordingly in the resume path.
> > > 
> > > Signed-off-by: Soren Brinkmann 
> > > ---
> > > Hi,
> > > 
> > > there was not a lot of discussion on the last submission. Just one 
> > > comment from
> > > Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in my
> > > response, does not apply, IMHO, since the platform does not re-enable
> > > interrupts.
> > 
> > Well, you just don't agree with it.
> > 
> > The problem with your approach is that timer interrupts aren't actually as
> > special as you think and any other IRQF_NO_SUSPEND interrupts would have 
> > caused
> > similar issues to appear under specific conditions.
> > 
> > The solution I would suggest and that actually covers all IRQF_NO_SUSPEND
> > interrupts would be to use a wait_event() loop like the one in 
> > freeze_enter()
> > (on top of the current linux-next or the pm-genirq branch of linux-pm.git),
> > but wait for pm_abort_suspend to become true, to implement system suspend.
> 
> sorry, it took me a while since I needed to get some dependencies ported
> to the pm-genirq base. Once I had that, it reproduced my original issue.
> So far so good. I then looked into finding a solution following your
> guidance. I'm not sure I really found what you had in mind, but below is
> what I came up with, which seems to do it.
> Please let me know how far off I am.
> 
>   Thanks,
>   Sören
> 
> ---8<--8<8<8<---
> diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
> index c2744b30d5d9..a4f9914571f1 100644
> --- a/drivers/base/power/wakeup.c
> +++ b/drivers/base/power/wakeup.c
> @@ -25,7 +25,7 @@
>  bool events_check_enabled __read_mostly;
>  
>  /* If set and the system is suspending, terminate the suspend. */
> -static bool pm_abort_suspend __read_mostly;
> +bool pm_abort_suspend __read_mostly;
>  
>  /*
>   * Combined counters of registered wakeup events and wakeup events in 
> progress.
> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> index 6dadb25cb0d8..e6a6de8f76d0 100644
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -33,6 +33,7 @@
>  
>  static const char *pm_labels[] = { "mem", "standby", "freeze", };
>  const char *pm_states[PM_SUSPEND_MAX];
> +extern bool pm_abort_suspend;
>  
>  static const struct platform_suspend_ops *suspend_ops;
>  static const struct platform_freeze_ops *freeze_ops;
> @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, bool 
> *wakeup)
>   if (error || suspend_test(TEST_CPUS))
>   goto Enable_cpus;
>  
> - arch_suspend_disable_irqs();
> - BUG_ON(!irqs_disabled());
> -
> - error = syscore_suspend();
> - if (!error) {
> - *wakeup = pm_wakeup_pending();
> - if (!(suspend_test(TEST_CORE) || *wakeup)) {
> - trace_suspend_resume(TPS("machine_suspend"),
> - state, true);
> - error = suspend_ops->enter(state);
> - trace_suspend_resume(TPS("machine_suspend"),
> - state, false);
> - events_check_enabled = false;
> + while (!pm_abort_suspend) {
> + arch_suspend_disable_irqs();
> + BUG_ON(!irqs_disabled());
> +
> + error = syscore_suspend();
> + if (!error) {
> + *wakeup = pm_wakeup_pending();
> + if (!(suspend_test(TEST_CORE) || *wakeup)) {
> + trace_suspend_resume(TPS("machine_suspend"),
> + state, true);
> + error = suspend_ops->enter(state);
> +  

Re: [PATCH RESEND] PM / sleep: Fix racing timers

2014-10-28 Thread Sören Brinkmann
Hi Rafael,

any opinion on this?

Thanks,
Sören

On Thu, 2014-10-02 at 09:01AM -0700, Sören Brinkmann wrote:
 Hi Rafael,
 
 On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote:
  On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote:
   On platforms that do not power off during suspend, successfully entering
   suspend races with timers.
   
   The race happening in a couple of location is:
   
 1. disable IRQs (e.g. arch_suspend_disable_irqs())
...
 2. syscore_suspend()
 - timekeeping_suspend()
  - clockevents_notify(SUSPEND)
   - tick_suspend() (timers are turned off here)
...
 3. wfi  (wait for wake-IRQ here)
   
   Between steps 1 and 2 the timers can still generate interrupts that are
   not handled and stay pending until step 3. That pending IRQ causes an
   immediate - spurious - wake.
   
   The solution is to move the clockevents suspend/resume notification
   out of the syscore_suspend step and explictly call them at the appropriate
   time in the suspend/hibernation paths. I.e. timers are suspend _before_
   IRQs get disabled. And accordingly in the resume path.
   
   Signed-off-by: Soren Brinkmann soren.brinkm...@xilinx.com
   ---
   Hi,
   
   there was not a lot of discussion on the last submission. Just one 
   comment from
   Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in my
   response, does not apply, IMHO, since the platform does not re-enable
   interrupts.
  
  Well, you just don't agree with it.
  
  The problem with your approach is that timer interrupts aren't actually as
  special as you think and any other IRQF_NO_SUSPEND interrupts would have 
  caused
  similar issues to appear under specific conditions.
  
  The solution I would suggest and that actually covers all IRQF_NO_SUSPEND
  interrupts would be to use a wait_event() loop like the one in 
  freeze_enter()
  (on top of the current linux-next or the pm-genirq branch of linux-pm.git),
  but wait for pm_abort_suspend to become true, to implement system suspend.
 
 sorry, it took me a while since I needed to get some dependencies ported
 to the pm-genirq base. Once I had that, it reproduced my original issue.
 So far so good. I then looked into finding a solution following your
 guidance. I'm not sure I really found what you had in mind, but below is
 what I came up with, which seems to do it.
 Please let me know how far off I am.
 
   Thanks,
   Sören
 
 ---8--888---
 diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
 index c2744b30d5d9..a4f9914571f1 100644
 --- a/drivers/base/power/wakeup.c
 +++ b/drivers/base/power/wakeup.c
 @@ -25,7 +25,7 @@
  bool events_check_enabled __read_mostly;
  
  /* If set and the system is suspending, terminate the suspend. */
 -static bool pm_abort_suspend __read_mostly;
 +bool pm_abort_suspend __read_mostly;
  
  /*
   * Combined counters of registered wakeup events and wakeup events in 
 progress.
 diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
 index 6dadb25cb0d8..e6a6de8f76d0 100644
 --- a/kernel/power/suspend.c
 +++ b/kernel/power/suspend.c
 @@ -33,6 +33,7 @@
  
  static const char *pm_labels[] = { mem, standby, freeze, };
  const char *pm_states[PM_SUSPEND_MAX];
 +extern bool pm_abort_suspend;
  
  static const struct platform_suspend_ops *suspend_ops;
  static const struct platform_freeze_ops *freeze_ops;
 @@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, bool 
 *wakeup)
   if (error || suspend_test(TEST_CPUS))
   goto Enable_cpus;
  
 - arch_suspend_disable_irqs();
 - BUG_ON(!irqs_disabled());
 -
 - error = syscore_suspend();
 - if (!error) {
 - *wakeup = pm_wakeup_pending();
 - if (!(suspend_test(TEST_CORE) || *wakeup)) {
 - trace_suspend_resume(TPS(machine_suspend),
 - state, true);
 - error = suspend_ops-enter(state);
 - trace_suspend_resume(TPS(machine_suspend),
 - state, false);
 - events_check_enabled = false;
 + while (!pm_abort_suspend) {
 + arch_suspend_disable_irqs();
 + BUG_ON(!irqs_disabled());
 +
 + error = syscore_suspend();
 + if (!error) {
 + *wakeup = pm_wakeup_pending();
 + if (!(suspend_test(TEST_CORE) || *wakeup)) {
 + trace_suspend_resume(TPS(machine_suspend),
 + state, true);
 + error = suspend_ops-enter(state);
 + trace_suspend_resume(TPS(machine_suspend),
 + state, false);
 + events_check_enabled = false;
 +   

Re: [PATCH RESEND] PM / sleep: Fix racing timers

2014-10-02 Thread Sören Brinkmann
Hi Rafael,

On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote:
> On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote:
> > On platforms that do not power off during suspend, successfully entering
> > suspend races with timers.
> > 
> > The race happening in a couple of location is:
> > 
> >   1. disable IRQs   (e.g. arch_suspend_disable_irqs())
> >  ...
> >   2. syscore_suspend()
> >   -> timekeeping_suspend()
> >-> clockevents_notify(SUSPEND)
> > -> tick_suspend()   (timers are turned off here)
> >  ...
> >   3. wfi(wait for wake-IRQ here)
> > 
> > Between steps 1 and 2 the timers can still generate interrupts that are
> > not handled and stay pending until step 3. That pending IRQ causes an
> > immediate - spurious - wake.
> > 
> > The solution is to move the clockevents suspend/resume notification
> > out of the syscore_suspend step and explictly call them at the appropriate
> > time in the suspend/hibernation paths. I.e. timers are suspend _before_
> > IRQs get disabled. And accordingly in the resume path.
> > 
> > Signed-off-by: Soren Brinkmann 
> > ---
> > Hi,
> > 
> > there was not a lot of discussion on the last submission. Just one comment 
> > from
> > Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in my
> > response, does not apply, IMHO, since the platform does not re-enable
> > interrupts.
> 
> Well, you just don't agree with it.
> 
> The problem with your approach is that timer interrupts aren't actually as
> special as you think and any other IRQF_NO_SUSPEND interrupts would have 
> caused
> similar issues to appear under specific conditions.
> 
> The solution I would suggest and that actually covers all IRQF_NO_SUSPEND
> interrupts would be to use a wait_event() loop like the one in freeze_enter()
> (on top of the current linux-next or the pm-genirq branch of linux-pm.git),
> but wait for pm_abort_suspend to become true, to implement system suspend.

sorry, it took me a while since I needed to get some dependencies ported
to the pm-genirq base. Once I had that, it reproduced my original issue.
So far so good. I then looked into finding a solution following your
guidance. I'm not sure I really found what you had in mind, but below is
what I came up with, which seems to do it.
Please let me know how far off I am.

Thanks,
Sören

---8<--8<8<8<---
diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
index c2744b30d5d9..a4f9914571f1 100644
--- a/drivers/base/power/wakeup.c
+++ b/drivers/base/power/wakeup.c
@@ -25,7 +25,7 @@
 bool events_check_enabled __read_mostly;
 
 /* If set and the system is suspending, terminate the suspend. */
-static bool pm_abort_suspend __read_mostly;
+bool pm_abort_suspend __read_mostly;
 
 /*
  * Combined counters of registered wakeup events and wakeup events in progress.
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 6dadb25cb0d8..e6a6de8f76d0 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -33,6 +33,7 @@
 
 static const char *pm_labels[] = { "mem", "standby", "freeze", };
 const char *pm_states[PM_SUSPEND_MAX];
+extern bool pm_abort_suspend;
 
 static const struct platform_suspend_ops *suspend_ops;
 static const struct platform_freeze_ops *freeze_ops;
@@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, bool 
*wakeup)
if (error || suspend_test(TEST_CPUS))
goto Enable_cpus;
 
-   arch_suspend_disable_irqs();
-   BUG_ON(!irqs_disabled());
-
-   error = syscore_suspend();
-   if (!error) {
-   *wakeup = pm_wakeup_pending();
-   if (!(suspend_test(TEST_CORE) || *wakeup)) {
-   trace_suspend_resume(TPS("machine_suspend"),
-   state, true);
-   error = suspend_ops->enter(state);
-   trace_suspend_resume(TPS("machine_suspend"),
-   state, false);
-   events_check_enabled = false;
+   while (!pm_abort_suspend) {
+   arch_suspend_disable_irqs();
+   BUG_ON(!irqs_disabled());
+
+   error = syscore_suspend();
+   if (!error) {
+   *wakeup = pm_wakeup_pending();
+   if (!(suspend_test(TEST_CORE) || *wakeup)) {
+   trace_suspend_resume(TPS("machine_suspend"),
+   state, true);
+   error = suspend_ops->enter(state);
+   trace_suspend_resume(TPS("machine_suspend"),
+   state, false);
+   events_check_enabled = false;
+   }
+   syscore_resume();
}
-   syscore_resume();
- 

Re: [PATCH RESEND] PM / sleep: Fix racing timers

2014-10-02 Thread Sören Brinkmann
Hi Rafael,

On Tue, 2014-09-23 at 01:01AM +0200, Rafael J. Wysocki wrote:
 On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote:
  On platforms that do not power off during suspend, successfully entering
  suspend races with timers.
  
  The race happening in a couple of location is:
  
1. disable IRQs   (e.g. arch_suspend_disable_irqs())
   ...
2. syscore_suspend()
- timekeeping_suspend()
 - clockevents_notify(SUSPEND)
  - tick_suspend()   (timers are turned off here)
   ...
3. wfi(wait for wake-IRQ here)
  
  Between steps 1 and 2 the timers can still generate interrupts that are
  not handled and stay pending until step 3. That pending IRQ causes an
  immediate - spurious - wake.
  
  The solution is to move the clockevents suspend/resume notification
  out of the syscore_suspend step and explictly call them at the appropriate
  time in the suspend/hibernation paths. I.e. timers are suspend _before_
  IRQs get disabled. And accordingly in the resume path.
  
  Signed-off-by: Soren Brinkmann soren.brinkm...@xilinx.com
  ---
  Hi,
  
  there was not a lot of discussion on the last submission. Just one comment 
  from
  Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in my
  response, does not apply, IMHO, since the platform does not re-enable
  interrupts.
 
 Well, you just don't agree with it.
 
 The problem with your approach is that timer interrupts aren't actually as
 special as you think and any other IRQF_NO_SUSPEND interrupts would have 
 caused
 similar issues to appear under specific conditions.
 
 The solution I would suggest and that actually covers all IRQF_NO_SUSPEND
 interrupts would be to use a wait_event() loop like the one in freeze_enter()
 (on top of the current linux-next or the pm-genirq branch of linux-pm.git),
 but wait for pm_abort_suspend to become true, to implement system suspend.

sorry, it took me a while since I needed to get some dependencies ported
to the pm-genirq base. Once I had that, it reproduced my original issue.
So far so good. I then looked into finding a solution following your
guidance. I'm not sure I really found what you had in mind, but below is
what I came up with, which seems to do it.
Please let me know how far off I am.

Thanks,
Sören

---8--888---
diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
index c2744b30d5d9..a4f9914571f1 100644
--- a/drivers/base/power/wakeup.c
+++ b/drivers/base/power/wakeup.c
@@ -25,7 +25,7 @@
 bool events_check_enabled __read_mostly;
 
 /* If set and the system is suspending, terminate the suspend. */
-static bool pm_abort_suspend __read_mostly;
+bool pm_abort_suspend __read_mostly;
 
 /*
  * Combined counters of registered wakeup events and wakeup events in progress.
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 6dadb25cb0d8..e6a6de8f76d0 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -33,6 +33,7 @@
 
 static const char *pm_labels[] = { mem, standby, freeze, };
 const char *pm_states[PM_SUSPEND_MAX];
+extern bool pm_abort_suspend;
 
 static const struct platform_suspend_ops *suspend_ops;
 static const struct platform_freeze_ops *freeze_ops;
@@ -294,25 +295,27 @@ static int suspend_enter(suspend_state_t state, bool 
*wakeup)
if (error || suspend_test(TEST_CPUS))
goto Enable_cpus;
 
-   arch_suspend_disable_irqs();
-   BUG_ON(!irqs_disabled());
-
-   error = syscore_suspend();
-   if (!error) {
-   *wakeup = pm_wakeup_pending();
-   if (!(suspend_test(TEST_CORE) || *wakeup)) {
-   trace_suspend_resume(TPS(machine_suspend),
-   state, true);
-   error = suspend_ops-enter(state);
-   trace_suspend_resume(TPS(machine_suspend),
-   state, false);
-   events_check_enabled = false;
+   while (!pm_abort_suspend) {
+   arch_suspend_disable_irqs();
+   BUG_ON(!irqs_disabled());
+
+   error = syscore_suspend();
+   if (!error) {
+   *wakeup = pm_wakeup_pending();
+   if (!(suspend_test(TEST_CORE) || *wakeup)) {
+   trace_suspend_resume(TPS(machine_suspend),
+   state, true);
+   error = suspend_ops-enter(state);
+   trace_suspend_resume(TPS(machine_suspend),
+   state, false);
+   events_check_enabled = false;
+   }
+   syscore_resume();
}
-   syscore_resume();
-   }
 
-   arch_suspend_enable_irqs();
-   

Re: [PATCH RESEND] PM / sleep: Fix racing timers

2014-09-22 Thread Rafael J. Wysocki
On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote:
> On platforms that do not power off during suspend, successfully entering
> suspend races with timers.
> 
> The race happening in a couple of location is:
> 
>   1. disable IRQs (e.g. arch_suspend_disable_irqs())
>  ...
>   2. syscore_suspend()
>   -> timekeeping_suspend()
>-> clockevents_notify(SUSPEND)
> -> tick_suspend() (timers are turned off here)
>  ...
>   3. wfi  (wait for wake-IRQ here)
> 
> Between steps 1 and 2 the timers can still generate interrupts that are
> not handled and stay pending until step 3. That pending IRQ causes an
> immediate - spurious - wake.
> 
> The solution is to move the clockevents suspend/resume notification
> out of the syscore_suspend step and explictly call them at the appropriate
> time in the suspend/hibernation paths. I.e. timers are suspend _before_
> IRQs get disabled. And accordingly in the resume path.
> 
> Signed-off-by: Soren Brinkmann 
> ---
> Hi,
> 
> there was not a lot of discussion on the last submission. Just one comment 
> from
> Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in my
> response, does not apply, IMHO, since the platform does not re-enable
> interrupts.

Well, you just don't agree with it.

The problem with your approach is that timer interrupts aren't actually as
special as you think and any other IRQF_NO_SUSPEND interrupts would have caused
similar issues to appear under specific conditions.

The solution I would suggest and that actually covers all IRQF_NO_SUSPEND
interrupts would be to use a wait_event() loop like the one in freeze_enter()
(on top of the current linux-next or the pm-genirq branch of linux-pm.git),
but wait for pm_abort_suspend to become true, to implement system suspend.

Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RESEND] PM / sleep: Fix racing timers

2014-09-22 Thread Rafael J. Wysocki
On Monday, September 22, 2014 10:07:03 AM Soren Brinkmann wrote:
 On platforms that do not power off during suspend, successfully entering
 suspend races with timers.
 
 The race happening in a couple of location is:
 
   1. disable IRQs (e.g. arch_suspend_disable_irqs())
  ...
   2. syscore_suspend()
   - timekeeping_suspend()
- clockevents_notify(SUSPEND)
 - tick_suspend() (timers are turned off here)
  ...
   3. wfi  (wait for wake-IRQ here)
 
 Between steps 1 and 2 the timers can still generate interrupts that are
 not handled and stay pending until step 3. That pending IRQ causes an
 immediate - spurious - wake.
 
 The solution is to move the clockevents suspend/resume notification
 out of the syscore_suspend step and explictly call them at the appropriate
 time in the suspend/hibernation paths. I.e. timers are suspend _before_
 IRQs get disabled. And accordingly in the resume path.
 
 Signed-off-by: Soren Brinkmann soren.brinkm...@xilinx.com
 ---
 Hi,
 
 there was not a lot of discussion on the last submission. Just one comment 
 from
 Rafael (https://lkml.org/lkml/2014/8/26/780), which - as I outlined in my
 response, does not apply, IMHO, since the platform does not re-enable
 interrupts.

Well, you just don't agree with it.

The problem with your approach is that timer interrupts aren't actually as
special as you think and any other IRQF_NO_SUSPEND interrupts would have caused
similar issues to appear under specific conditions.

The solution I would suggest and that actually covers all IRQF_NO_SUSPEND
interrupts would be to use a wait_event() loop like the one in freeze_enter()
(on top of the current linux-next or the pm-genirq branch of linux-pm.git),
but wait for pm_abort_suspend to become true, to implement system suspend.

Rafael

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/