Re: Linux 4.15-rc2

2018-02-21 Thread Eugene Syromiatnikov
On Sun, Dec 03, 2017 at 11:22:56AM -0500, Linus Torvalds wrote:
> 
> Linus Torvalds (6):
>   Rename superblock flags (MS_xyz -> SB_xyz)

This commit, while claims that it changes internal flags, also touches
an UAPI header (include/uapi/linux/bfs_fs.h), specifically, the macro
BFS_UNCLEAN.  I expect that either this macro should be in a private
header, or (if this check is expected to be available to the userspace)
at least SB_RDONLY should be also defined there.


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-15 Thread Thomas Gleixner
On Thu, 14 Dec 2017, Linus Torvalds wrote:

> On Thu, Dec 14, 2017 at 2:36 PM, Thomas Gleixner  wrote:
> >
> > But, because the silly firmware comes out of suspend with all PIC lines
> > unmasked for whatever reason, the PIC can observe that IRQ being raised and
> > the CPU not handling it. So yes, I forgot about 7 being magic, but I still
> > think it's the firmware which causes it by unmasking the PIC irqs.
> 
> Yes, that sounds quite likely.

And just for the record I was able to figure out which interrupt comes in
and goes away again. It's the only level triggered interrupt, which is the
ACPI interrupt.

Thanks,

tglx


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-14 Thread Rafael J. Wysocki
On Thu, Dec 14, 2017 at 11:36 PM, Thomas Gleixner  wrote:
> On Thu, 14 Dec 2017, Linus Torvalds wrote:
>> On Thu, Dec 14, 2017 at 3:54 AM, Thomas Gleixner  wrote:
>> I just wanted to pipe up about that "irq7", because judging from your
>> email it seems like you think it's a real irq:
>>
>> >Now there is a race
>> > whether the kernel resume path manages to mask the PIC again early enough
>> > before something triggers IRQ7 or not.
>>
>> ..and that's not how the PIC works.
>>
>> In fact, "legacy irq 7" is the _normal_ and very traditional spurious
>> interrupt, and it's documented. If the PIC gets an interrupt from
>> _any_ source, but the interrupt goes away before the PIC gets an
>> acknowledge from the CPU (and by "acknowledge", I'm not talking about
>> the explicit software IRQ ACK, I'm talking about the hardware
>> protocol, between the PIC and the CPU), the PIC will then report irq 7
>> as the interrupt - regardless of what the original was.
>>
>> The reason is almost always something like
>>
>>  - CPU interrupts are disabled or masked
>>
>>  - driver does a write to the external hardware that causes an
>> interrupt to be raised
>
> Which should be a non issue because _ALL_ PIC irq lines are masked at the
> PIC itself. All interrupts are routed through IOAPIC. So unless the IOAPIC
> sports similar behaviour the PIC should not ever observe that scenario.
>
> But, because the silly firmware comes out of suspend with all PIC lines
> unmasked for whatever reason, the PIC can observe that IRQ being raised and
> the CPU not handling it. So yes, I forgot about 7 being magic, but I still
> think it's the firmware which causes it by unmasking the PIC irqs.

That's my understanding too.

Thanks,
Rafael


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-14 Thread Linus Torvalds
On Thu, Dec 14, 2017 at 2:36 PM, Thomas Gleixner  wrote:
>
> But, because the silly firmware comes out of suspend with all PIC lines
> unmasked for whatever reason, the PIC can observe that IRQ being raised and
> the CPU not handling it. So yes, I forgot about 7 being magic, but I still
> think it's the firmware which causes it by unmasking the PIC irqs.

Yes, that sounds quite likely.

Linus


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-14 Thread Thomas Gleixner
On Thu, 14 Dec 2017, Linus Torvalds wrote:
> On Thu, Dec 14, 2017 at 3:54 AM, Thomas Gleixner  wrote:
> I just wanted to pipe up about that "irq7", because judging from your
> email it seems like you think it's a real irq:
> 
> >Now there is a race
> > whether the kernel resume path manages to mask the PIC again early enough
> > before something triggers IRQ7 or not.
> 
> ..and that's not how the PIC works.
> 
> In fact, "legacy irq 7" is the _normal_ and very traditional spurious
> interrupt, and it's documented. If the PIC gets an interrupt from
> _any_ source, but the interrupt goes away before the PIC gets an
> acknowledge from the CPU (and by "acknowledge", I'm not talking about
> the explicit software IRQ ACK, I'm talking about the hardware
> protocol, between the PIC and the CPU), the PIC will then report irq 7
> as the interrupt - regardless of what the original was.
> 
> The reason is almost always something like
> 
>  - CPU interrupts are disabled or masked
> 
>  - driver does a write to the external hardware that causes an
> interrupt to be raised

Which should be a non issue because _ALL_ PIC irq lines are masked at the
PIC itself. All interrupts are routed through IOAPIC. So unless the IOAPIC
sports similar behaviour the PIC should not ever observe that scenario.

But, because the silly firmware comes out of suspend with all PIC lines
unmasked for whatever reason, the PIC can observe that IRQ being raised and
the CPU not handling it. So yes, I forgot about 7 being magic, but I still
think it's the firmware which causes it by unmasking the PIC irqs.

Thanks,

tglx




Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-14 Thread Linus Torvalds
On Thu, Dec 14, 2017 at 3:54 AM, Thomas Gleixner  wrote:
>
> So the old scheme silently consumed the spurious vector. I added debug code
> to that effect to 4.14 and on that machine IRQ7 is triggered at the same
> point post resume and the core code drops it silently because the interrupt
> is marked masked and no action assigned.
>
> So the only difference to today is that the new code complains, while the
> old one does an extra mask of the already masked IOAPIC pin and silently
> returns.

Great debugging, and it looks like Rafael has a patch that already got
positive testing.

I just wanted to pipe up about that "irq7", because judging from your
email it seems like you think it's a real irq:

>Now there is a race
> whether the kernel resume path manages to mask the PIC again early enough
> before something triggers IRQ7 or not.

..and that's not how the PIC works.

In fact, "legacy irq 7" is the _normal_ and very traditional spurious
interrupt, and it's documented. If the PIC gets an interrupt from
_any_ source, but the interrupt goes away before the PIC gets an
acknowledge from the CPU (and by "acknowledge", I'm not talking about
the explicit software IRQ ACK, I'm talking about the hardware
protocol, between the PIC and the CPU), the PIC will then report irq 7
as the interrupt - regardless of what the original was.

The reason is almost always something like

 - CPU interrupts are disabled or masked

 - driver does a write to the external hardware that causes an
interrupt to be raised

 - CPU doesn't react to the irq due to the disabled/masked nature

 - but the driver then does something that masks the interrupt again

 - interrupts are enabled/unmasked on the CPU

 - CPU now acks the interrupt, but the PIC no longer sees any
interrupt source, so the PIC (that has to reply with *something*)
replies with that documented spurious irq7.

To confuse things further, irq7 is not _exclusively_ the spurious
interrupt, You can definitely put real hardware and connect it to pin7
of the PIC, and get real irq7 reports.

And to confuse things even *more*, this "irq7" thing is per-PIC, and
the PC model obviously has the whole "nested PIC" thing where the
second PIC is connected to irq2 of the first PIC. So there are *two*
different "spurious interrupt" reports, one for each PIC.

Anyway, to avoid this issue, drivers should strive to

 (a) actually take the interrupt when doing things that can cause
them, and have the interrupt handler do whatever it is that causes the
interrupt to go away (ie: "normal operation")

 (b) if you play games with clearing the source of the interrupt
_without_ taking the interrupt, you should strive to basically mask
the interrupt first.

So to do (b) you can do something like

  mask_device_interrupt(dev);
  read_from_device_to_synchronize(dev);

instead of (or perhaps _before_) disabling interrupts at a CPU level.
Suspend/resume obviously does tend to play games with these kinds of
things where you are no longer in "normal operation" and you do setup
without having interrupts actually enabled.

Or you can just decide that spurious interrupts are ok, and ignore the
issue. But they *can* be very confusing, and obviously in this case
that confusion then seems to have caused actual problems.

Linus


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-14 Thread Maarten Lankhorst
Op 14-12-17 om 16:54 schreef Rafael J. Wysocki:
> On Thursday, December 14, 2017 4:52:22 PM CET Thomas Gleixner wrote:
>> On Thu, 14 Dec 2017, Rafael J. Wysocki wrote:
>>> The problem here is that pci_pm_thaw_noirq() calls pci_restore_state() which
>>> in fact requires the device to be in D0, so the caller should put it into
>>> D0 instead of trying to "update" its power state.
>>>
>>> [Note that the PCI layer doesn't put devices into low-power states during 
>>> the
>>> hibernation's "freeze" transition, but drivers can legitimately do that in
>>> their "freeze" callbacks which was overlooked in that code and that's what
>>> i915 does.]
>>>
>>> So IMO what we need is the change below.  I'm going to test it shortly,
>>> but please give it a go too.
>> So now this looks more reasonable:
>>
>>   irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
>>   __pci_write_msi_msg: :00:02.0 fee0100c 412a 
>>   __pci_write_msi_msg: Not written
>>   ...
>>   device_pm_callback_start: i915 :00:02.0, parent: pci:00, noirq bus 
>> [thaw]
>>   pci_pm_thaw_noirq <-dpm_run_callback
>>   __pci_write_msi_msg: :00:02.0 fee0100c 412a 
>>   device_pm_callback_end: i915 :00:02.0, err=0
>>   ...
>>   resume_irqs: Resume 125
>>   ...
>>   irq_handler_entry: irq=125 name=i915
> Cool.
>
> Let me respin it with a changelog etc then.
>
> Thanks,
> Rafael
>
>
The machine I was using for reproducing the bug appears to be fixed with this 
patch, so I now sent
it to intel's trybot for results.

https://patchwork.freedesktop.org/series/35367/

Thanks for looking at the bug!

~Maarten



Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-14 Thread Rafael J. Wysocki
On Thursday, December 14, 2017 4:52:22 PM CET Thomas Gleixner wrote:
> On Thu, 14 Dec 2017, Rafael J. Wysocki wrote:
> > The problem here is that pci_pm_thaw_noirq() calls pci_restore_state() which
> > in fact requires the device to be in D0, so the caller should put it into
> > D0 instead of trying to "update" its power state.
> > 
> > [Note that the PCI layer doesn't put devices into low-power states during 
> > the
> > hibernation's "freeze" transition, but drivers can legitimately do that in
> > their "freeze" callbacks which was overlooked in that code and that's what
> > i915 does.]
> > 
> > So IMO what we need is the change below.  I'm going to test it shortly,
> > but please give it a go too.
> 
> So now this looks more reasonable:
> 
>   irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
>   __pci_write_msi_msg: :00:02.0 fee0100c 412a 
>   __pci_write_msi_msg: Not written
>   ...
>   device_pm_callback_start: i915 :00:02.0, parent: pci:00, noirq bus 
> [thaw]
>   pci_pm_thaw_noirq <-dpm_run_callback
>   __pci_write_msi_msg: :00:02.0 fee0100c 412a 
>   device_pm_callback_end: i915 :00:02.0, err=0
>   ...
>   resume_irqs: Resume 125
>   ...
>   irq_handler_entry: irq=125 name=i915

Cool.

Let me respin it with a changelog etc then.

Thanks,
Rafael



Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-14 Thread Thomas Gleixner
On Thu, 14 Dec 2017, Rafael J. Wysocki wrote:
> The problem here is that pci_pm_thaw_noirq() calls pci_restore_state() which
> in fact requires the device to be in D0, so the caller should put it into
> D0 instead of trying to "update" its power state.
> 
> [Note that the PCI layer doesn't put devices into low-power states during the
> hibernation's "freeze" transition, but drivers can legitimately do that in
> their "freeze" callbacks which was overlooked in that code and that's what
> i915 does.]
> 
> So IMO what we need is the change below.  I'm going to test it shortly,
> but please give it a go too.

So now this looks more reasonable:

  irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
  __pci_write_msi_msg: :00:02.0 fee0100c 412a 
  __pci_write_msi_msg: Not written
  ...
  device_pm_callback_start: i915 :00:02.0, parent: pci:00, noirq bus 
[thaw]
  pci_pm_thaw_noirq <-dpm_run_callback
  __pci_write_msi_msg: :00:02.0 fee0100c 412a 
  device_pm_callback_end: i915 :00:02.0, err=0
  ...
  resume_irqs: Resume 125
  ...
  irq_handler_entry: irq=125 name=i915

Thanks,

tglx

> ---
>  drivers/pci/pci-driver.c |7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> Index: linux-pm/drivers/pci/pci-driver.c
> ===
> --- linux-pm.orig/drivers/pci/pci-driver.c
> +++ linux-pm/drivers/pci/pci-driver.c
> @@ -1027,7 +1027,12 @@ static int pci_pm_thaw_noirq(struct devi
>   if (pci_has_legacy_pm_support(pci_dev))
>   return pci_legacy_resume_early(dev);
>  
> - pci_update_current_state(pci_dev, PCI_D0);
> + /*
> +  * pci_restore_state() requires the device to be in D0 (because of MSI
> +  * restoration among other things), so force it into D0 in case the
> +  * driver's "freeze" callbacks put it into a low-power state directly.
> +  */
> + pci_set_power_state(pci_dev, PCI_D0);
>   pci_restore_state(pci_dev);
>  
>   if (drv && drv->pm && drv->pm->thaw_noirq)
> 
> 


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-14 Thread Rafael J. Wysocki
On Thursday, December 14, 2017 1:30:37 PM CET Thomas Gleixner wrote:
> On Thu, 14 Dec 2017, Rafael J. Wysocki wrote:
> > On Thursday, December 14, 2017 12:54:05 PM CET Thomas Gleixner wrote:
> > > Now the graphics issue is a different story. That only happens on
> > > hibernation after doing the snapshot. There all non boot cpus are onlined
> > > again and after that the devices are 'thawed'. The following reenable of
> > > interrupts fails because i915 is not in PCI_D0 state.
> > > 
> > > Suspend:
> > > 
> > >irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
> > >__pci_write_msi_msg: :00:02.0 fee0100c 412a
> > >__pci_write_msi_msg: Not written <- Device not in PCI_D0
> > >
> > >device_pm_callback_start: i915 :00:02.0, parent: pci:00, noirq 
> > > bus [resume]
> > >pci_pm_resume_noirq <-dpm_run_callback
> > >pci_pm_resume_noirq <-dpm_run_callback
> > >pci_pm_default_resume_early <-pci_pm_resume_noirq
> > >pci_pm_default_resume_early <-pci_pm_resume_noirq
> > >__pci_write_msi_msg: :00:02.0 fee0100c 412a  <-- Set 
> > > the new affinity
> > >device_pm_callback_end: i915 :00:02.0, err=0
> > 
> > So this works, because we power up the device during resume even if it
> > had been suspended (via runtime PM) before the suspend started.
> > 
> > > Hibernate:
> > > 
> > >irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
> > >__pci_write_msi_msg: :00:02.0 fee0100c 412a
> > >__pci_write_msi_msg: Not written <- Device not in PCI_D0
> > >
> > >device_pm_callback_start: i915 :00:02.0, parent: pci:00, noirq 
> > > bus [thaw]
> > >pci_pm_thaw_noirq <-dpm_run_callback
> > >__pci_write_msi_msg: :00:02.0 fee0100c 412a
> > >__pci_write_msi_msg: Not written  <--- Device is not in PCI_D0
> > >device_pm_callback_end: i915 :00:02.0, err=0
> > 
> > And here we try to leave the device alone which is OK for devices in D0,
> > but not for suspended ones.
> > 
> > It looks like we need to power up them at the "thaw" time too or at least
> > I don't see how to address that differently.
> 
> The question is whether the code which brings the device out of D0 should
> write the message unconditionally. That would be sufficient I think.

It doesn't have to do that.

The problem here is that pci_pm_thaw_noirq() calls pci_restore_state() which
in fact requires the device to be in D0, so the caller should put it into
D0 instead of trying to "update" its power state.

[Note that the PCI layer doesn't put devices into low-power states during the
hibernation's "freeze" transition, but drivers can legitimately do that in
their "freeze" callbacks which was overlooked in that code and that's what
i915 does.]

So IMO what we need is the change below.  I'm going to test it shortly,
but please give it a go too.

---
 drivers/pci/pci-driver.c |7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Index: linux-pm/drivers/pci/pci-driver.c
===
--- linux-pm.orig/drivers/pci/pci-driver.c
+++ linux-pm/drivers/pci/pci-driver.c
@@ -1027,7 +1027,12 @@ static int pci_pm_thaw_noirq(struct devi
if (pci_has_legacy_pm_support(pci_dev))
return pci_legacy_resume_early(dev);
 
-   pci_update_current_state(pci_dev, PCI_D0);
+   /*
+* pci_restore_state() requires the device to be in D0 (because of MSI
+* restoration among other things), so force it into D0 in case the
+* driver's "freeze" callbacks put it into a low-power state directly.
+*/
+   pci_set_power_state(pci_dev, PCI_D0);
pci_restore_state(pci_dev);
 
if (drv && drv->pm && drv->pm->thaw_noirq)



Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-14 Thread Thomas Gleixner
On Thu, 14 Dec 2017, Thomas Gleixner wrote:
> Now, what's different vs. 4.14:
> 
> The 4.14 code accidentaly had the irq descriptor for this vector still
> populated in the old CPU due to the convoluted way the vector allocation
> worked. I have still to investigate if one of those cases is actually
> leaking the descriptor, which would be a fatal bug.

It doesn't leak. It repopulates it at the same place out of sheer luck.

Thanks,

tglx


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-14 Thread Thomas Gleixner
On Thu, 14 Dec 2017, Rafael J. Wysocki wrote:
> On Thursday, December 14, 2017 12:54:05 PM CET Thomas Gleixner wrote:
> > Now the graphics issue is a different story. That only happens on
> > hibernation after doing the snapshot. There all non boot cpus are onlined
> > again and after that the devices are 'thawed'. The following reenable of
> > interrupts fails because i915 is not in PCI_D0 state.
> > 
> > Suspend:
> > 
> >irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
> >__pci_write_msi_msg: :00:02.0 fee0100c 412a
> >__pci_write_msi_msg: Not written <- Device not in PCI_D0
> >
> >device_pm_callback_start: i915 :00:02.0, parent: pci:00, noirq 
> > bus [resume]
> >pci_pm_resume_noirq <-dpm_run_callback
> >pci_pm_resume_noirq <-dpm_run_callback
> >pci_pm_default_resume_early <-pci_pm_resume_noirq
> >pci_pm_default_resume_early <-pci_pm_resume_noirq
> >__pci_write_msi_msg: :00:02.0 fee0100c 412a  <-- Set the 
> > new affinity
> >device_pm_callback_end: i915 :00:02.0, err=0
> 
> So this works, because we power up the device during resume even if it
> had been suspended (via runtime PM) before the suspend started.
> 
> > Hibernate:
> > 
> >irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
> >__pci_write_msi_msg: :00:02.0 fee0100c 412a
> >__pci_write_msi_msg: Not written <- Device not in PCI_D0
> >
> >device_pm_callback_start: i915 :00:02.0, parent: pci:00, noirq 
> > bus [thaw]
> >pci_pm_thaw_noirq <-dpm_run_callback
> >__pci_write_msi_msg: :00:02.0 fee0100c 412a
> >__pci_write_msi_msg: Not written  <--- Device is not in PCI_D0
> >device_pm_callback_end: i915 :00:02.0, err=0
> 
> And here we try to leave the device alone which is OK for devices in D0,
> but not for suspended ones.
> 
> It looks like we need to power up them at the "thaw" time too or at least
> I don't see how to address that differently.

The question is whether the code which brings the device out of D0 should
write the message unconditionally. That would be sufficient I think.

Thanks,

tglx



Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-14 Thread Rafael J. Wysocki
On Thursday, December 14, 2017 12:54:05 PM CET Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > > On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > > > On Wed, 13 Dec 2017, Linus Torvalds wrote:
> > > > 
> > > > > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner  
> > > > > wrote:
> > > > > >
> > > > > > Definitely. That was fragile forever but puzzles me is that I can't 
> > > > > > figure
> > > > > > out what now causes that spurious interrupt to surface out of the 
> > > > > > blue.
> > > > > 
> > > > > Perhaps just timing?
> > > > 
> > > > That's what I'm trying to figure out right now, because that is the only
> > > > sensible explanation left. The whole machinery of suspend is exactly the
> > > > same with and without the vector changes. I instrumented all functions
> > > > involved and the picture is the same. I even do not see any fundamental
> > > > timing differences where one would say: That's it.
> > > > 
> > > > What puzzles me even more is that in the range of commits I'm fiddling 
> > > > with
> > > > there is no other change than the vector management stuff and the point
> > > > where it breaks makes no sense at all. The point Maarten bisected it to
> > > > works nicely here, so that might just point to a very subtle timing 
> > > > issue.
> > > 
> > > After doing more debugging on this it turns out that this looks like a
> > > legacy interrupt coming in. The vector number is always 55, which is 
> > > legacy
> > > IRQ 7 as seen from the PIC. The corresponding IOAPIC interrupt pin is
> > > masked and vector 55 is completely unused.
> > > 
> > > More questions than answers. Still investigating.
> 
> At least that one could be explained by the changes. In the previous
> management scheme the IOAPIC interrupts were always allocated even when the
> interrupt was not in use. The new scheme does not longer do that because
> people complained about the vector waste (16 vectors on each CPU) and it
> got rid of all the special casing of IRQ0-15.
> 
> So the old scheme silently consumed the spurious vector. I added debug code
> to that effect to 4.14 and on that machine IRQ7 is triggered at the same
> point post resume and the core code drops it silently because the interrupt
> is marked masked and no action assigned.
> 
> So the only difference to today is that the new code complains, while the
> old one does an extra mask of the already masked IOAPIC pin and silently
> returns.
> 
> After quite some investigation I found out that its independent of the
> graphics thing. That's a genuine issue on that platform which seems to emit
> random legacy vectors which were never ever used for unknown reasons. I
> verified that both the IOAPIC and the PIC are masked, so they cannot send
> crap. Though it turned out that the silly firmware unmasks the PIC and
> leaves it that way when it returns from suspend. Now there is a race
> whether the kernel resume path manages to mask the PIC again early enough
> before something triggers IRQ7 or not. Adding/removing debug code makes the
> problem come and go. So I really don't worry about that one and rather
> prefer to have the spurious interrupt printed than silently consumed by
> chance.

OK

> Now the graphics issue is a different story. That only happens on
> hibernation after doing the snapshot. There all non boot cpus are onlined
> again and after that the devices are 'thawed'. The following reenable of
> interrupts fails because i915 is not in PCI_D0 state.
> 
> Suspend:
> 
>irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
>__pci_write_msi_msg: :00:02.0 fee0100c 412a
>__pci_write_msi_msg: Not written <- Device not in PCI_D0
>
>device_pm_callback_start: i915 :00:02.0, parent: pci:00, noirq bus 
> [resume]
>pci_pm_resume_noirq <-dpm_run_callback
>pci_pm_resume_noirq <-dpm_run_callback
>pci_pm_default_resume_early <-pci_pm_resume_noirq
>pci_pm_default_resume_early <-pci_pm_resume_noirq
>__pci_write_msi_msg: :00:02.0 fee0100c 412a  <-- Set the 
> new affinity
>device_pm_callback_end: i915 :00:02.0, err=0

So this works, because we power up the device during resume even if it
had been suspended (via runtime PM) before the suspend started.

> Hibernate:
> 
>irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
>__pci_write_msi_msg: :00:02.0 fee0100c 412a
>__pci_write_msi_msg: Not written <- Device not in PCI_D0
>
>device_pm_callback_start: i915 :00:02.0, parent: pci:00, noirq bus 
> [thaw]
>pci_pm_thaw_noirq <-dpm_run_callback
>__pci_write_msi_msg: :00:02.0 fee0100c 412a
>__pci_write_msi_msg: Not written  <--- Device is not in PCI_D0
>device_pm_callback_end: i915 :00:02.0, err=0

And here we try to leave the device alone which is OK for devices in D0,
but not for suspended 

Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-14 Thread Thomas Gleixner
On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > > On Wed, 13 Dec 2017, Linus Torvalds wrote:
> > > 
> > > > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner  
> > > > wrote:
> > > > >
> > > > > Definitely. That was fragile forever but puzzles me is that I can't 
> > > > > figure
> > > > > out what now causes that spurious interrupt to surface out of the 
> > > > > blue.
> > > > 
> > > > Perhaps just timing?
> > > 
> > > That's what I'm trying to figure out right now, because that is the only
> > > sensible explanation left. The whole machinery of suspend is exactly the
> > > same with and without the vector changes. I instrumented all functions
> > > involved and the picture is the same. I even do not see any fundamental
> > > timing differences where one would say: That's it.
> > > 
> > > What puzzles me even more is that in the range of commits I'm fiddling 
> > > with
> > > there is no other change than the vector management stuff and the point
> > > where it breaks makes no sense at all. The point Maarten bisected it to
> > > works nicely here, so that might just point to a very subtle timing issue.
> > 
> > After doing more debugging on this it turns out that this looks like a
> > legacy interrupt coming in. The vector number is always 55, which is legacy
> > IRQ 7 as seen from the PIC. The corresponding IOAPIC interrupt pin is
> > masked and vector 55 is completely unused.
> > 
> > More questions than answers. Still investigating.

At least that one could be explained by the changes. In the previous
management scheme the IOAPIC interrupts were always allocated even when the
interrupt was not in use. The new scheme does not longer do that because
people complained about the vector waste (16 vectors on each CPU) and it
got rid of all the special casing of IRQ0-15.

So the old scheme silently consumed the spurious vector. I added debug code
to that effect to 4.14 and on that machine IRQ7 is triggered at the same
point post resume and the core code drops it silently because the interrupt
is marked masked and no action assigned.

So the only difference to today is that the new code complains, while the
old one does an extra mask of the already masked IOAPIC pin and silently
returns.

After quite some investigation I found out that its independent of the
graphics thing. That's a genuine issue on that platform which seems to emit
random legacy vectors which were never ever used for unknown reasons. I
verified that both the IOAPIC and the PIC are masked, so they cannot send
crap. Though it turned out that the silly firmware unmasks the PIC and
leaves it that way when it returns from suspend. Now there is a race
whether the kernel resume path manages to mask the PIC again early enough
before something triggers IRQ7 or not. Adding/removing debug code makes the
problem come and go. So I really don't worry about that one and rather
prefer to have the spurious interrupt printed than silently consumed by
chance.

Now the graphics issue is a different story. That only happens on
hibernation after doing the snapshot. There all non boot cpus are onlined
again and after that the devices are 'thawed'. The following reenable of
interrupts fails because i915 is not in PCI_D0 state.

Suspend:

   irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
   __pci_write_msi_msg: :00:02.0 fee0100c 412a
   __pci_write_msi_msg: Not written <- Device not in PCI_D0
   
   device_pm_callback_start: i915 :00:02.0, parent: pci:00, noirq bus 
[resume]
   pci_pm_resume_noirq <-dpm_run_callback
   pci_pm_resume_noirq <-dpm_run_callback
   pci_pm_default_resume_early <-pci_pm_resume_noirq
   pci_pm_default_resume_early <-pci_pm_resume_noirq
   __pci_write_msi_msg: :00:02.0 fee0100c 412a  <-- Set the new 
affinity
   device_pm_callback_end: i915 :00:02.0, err=0

Hibernate:

   irq_migrate_all_off_this_cpu: Mask 125 pci_msi_mask_irq+0x0/0x10
   __pci_write_msi_msg: :00:02.0 fee0100c 412a
   __pci_write_msi_msg: Not written <- Device not in PCI_D0
   
   device_pm_callback_start: i915 :00:02.0, parent: pci:00, noirq bus 
[thaw]
   pci_pm_thaw_noirq <-dpm_run_callback
   __pci_write_msi_msg: :00:02.0 fee0100c 412a
   __pci_write_msi_msg: Not written  <--- Device is not in PCI_D0
   device_pm_callback_end: i915 :00:02.0, err=0

So that code path fails to set the new affinity because at the point where
the MSI msg should be written the device state is != PCI_D0.

Now, what's different vs. 4.14:

The 4.14 code accidentaly had the irq descriptor for this vector still
populated in the old CPU due to the convoluted way the vector allocation
worked. I have still to investigate if one of those cases is actually
leaking the descriptor, which would be a fatal bug.

But the new code does a proper cleanup and does not repopulate it on 

Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-13 Thread Rafael J. Wysocki
On Wed, Dec 13, 2017 at 11:39 PM, Rafael J. Wysocki  wrote:
> On Wednesday, December 13, 2017 7:19:17 PM CET Thomas Gleixner wrote:
>> On Wed, 13 Dec 2017, Linus Torvalds wrote:
>>
>> > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner  
>> > wrote:
>> > >
>> > > Definitely. That was fragile forever but puzzles me is that I can't 
>> > > figure
>> > > out what now causes that spurious interrupt to surface out of the blue.
>> >
>> > Perhaps just timing?
>>
>> That's what I'm trying to figure out right now, because that is the only
>> sensible explanation left. The whole machinery of suspend is exactly the
>> same with and without the vector changes. I instrumented all functions
>> involved and the picture is the same. I even do not see any fundamental
>> timing differences where one would say: That's it.
>>
>> What puzzles me even more is that in the range of commits I'm fiddling with
>> there is no other change than the vector management stuff and the point
>> where it breaks makes no sense at all. The point Maarten bisected it to
>> works nicely here, so that might just point to a very subtle timing issue.
>>
>> > How hard would it be to change the ordering to just redirect irqs first?
>>
>> The whole interrupt redirection happens when the non boot CPUs are brought
>> down, which is the very last step before the actual suspend happens.
>>
>> We could probably do that earlier, but that's something Rafael needs to
>> answer ultimately.
>
> Well, that's both flattering and concerning. ;-)
>
> Anyway, yes, we can do that earlier AFAICS.  Action handlers are not going to
> run after we've called suspend_device_irqs() which happens before the final
> stage of PCI devices suspend (suspend_noirq) and it doesn't matter which CPU
> gets the interrupt from that point on (it is either wakeup or unwanted then).

There is a catch that we don't and likely should not do that for
suspend-to-idle, but since we have pm_suspend_target_state now, that
case can be distinguished from the "full suspend" one readily.

Thanks,
Rafael


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-13 Thread Rafael J. Wysocki
On Wednesday, December 13, 2017 10:06:40 PM CET Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > > On Wed, 13 Dec 2017, Linus Torvalds wrote:
> > > 
> > > > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner  
> > > > wrote:
> > > > >
> > > > > Definitely. That was fragile forever but puzzles me is that I can't 
> > > > > figure
> > > > > out what now causes that spurious interrupt to surface out of the 
> > > > > blue.
> > > > 
> > > > Perhaps just timing?
> > > 
> > > That's what I'm trying to figure out right now, because that is the only
> > > sensible explanation left. The whole machinery of suspend is exactly the
> > > same with and without the vector changes. I instrumented all functions
> > > involved and the picture is the same. I even do not see any fundamental
> > > timing differences where one would say: That's it.
> > > 
> > > What puzzles me even more is that in the range of commits I'm fiddling 
> > > with
> > > there is no other change than the vector management stuff and the point
> > > where it breaks makes no sense at all. The point Maarten bisected it to
> > > works nicely here, so that might just point to a very subtle timing issue.
> > 
> > After doing more debugging on this it turns out that this looks like a
> > legacy interrupt coming in. The vector number is always 55, which is legacy
> > IRQ 7 as seen from the PIC. The corresponding IOAPIC interrupt pin is
> > masked and vector 55 is completely unused.
> > 
> > More questions than answers. Still investigating.
> 
> And it does not explain Maartens report which gets a spurious vector 33 on
> CPU4 after the non boot cpus have been brought online again. And that's the
> vector which was assigned before the affinity was moved by unplugging CPU4.
> 
> Hrmpf. Even more mystery to solve.

Any chance to look at /proc/interrupts from a machine where that can be
reproduced?

I'm also curious if that can be reproduced by doing CPU offline/online
without suspending?



Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-13 Thread Rafael J. Wysocki
On Wednesday, December 13, 2017 7:19:17 PM CET Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Linus Torvalds wrote:
> 
> > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner  wrote:
> > >
> > > Definitely. That was fragile forever but puzzles me is that I can't figure
> > > out what now causes that spurious interrupt to surface out of the blue.
> > 
> > Perhaps just timing?
> 
> That's what I'm trying to figure out right now, because that is the only
> sensible explanation left. The whole machinery of suspend is exactly the
> same with and without the vector changes. I instrumented all functions
> involved and the picture is the same. I even do not see any fundamental
> timing differences where one would say: That's it.
> 
> What puzzles me even more is that in the range of commits I'm fiddling with
> there is no other change than the vector management stuff and the point
> where it breaks makes no sense at all. The point Maarten bisected it to
> works nicely here, so that might just point to a very subtle timing issue.
> 
> > How hard would it be to change the ordering to just redirect irqs first?
> 
> The whole interrupt redirection happens when the non boot CPUs are brought
> down, which is the very last step before the actual suspend happens.
> 
> We could probably do that earlier, but that's something Rafael needs to
> answer ultimately.

Well, that's both flattering and concerning. ;-)

Anyway, yes, we can do that earlier AFAICS.  Action handlers are not going to
run after we've called suspend_device_irqs() which happens before the final
stage of PCI devices suspend (suspend_noirq) and it doesn't matter which CPU
gets the interrupt from that point on (it is either wakeup or unwanted then).

Thanks,
Rafael



Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-13 Thread Thomas Gleixner
On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > On Wed, 13 Dec 2017, Linus Torvalds wrote:
> > 
> > > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner  
> > > wrote:
> > > >
> > > > Definitely. That was fragile forever but puzzles me is that I can't 
> > > > figure
> > > > out what now causes that spurious interrupt to surface out of the blue.
> > > 
> > > Perhaps just timing?
> > 
> > That's what I'm trying to figure out right now, because that is the only
> > sensible explanation left. The whole machinery of suspend is exactly the
> > same with and without the vector changes. I instrumented all functions
> > involved and the picture is the same. I even do not see any fundamental
> > timing differences where one would say: That's it.
> > 
> > What puzzles me even more is that in the range of commits I'm fiddling with
> > there is no other change than the vector management stuff and the point
> > where it breaks makes no sense at all. The point Maarten bisected it to
> > works nicely here, so that might just point to a very subtle timing issue.
> 
> After doing more debugging on this it turns out that this looks like a
> legacy interrupt coming in. The vector number is always 55, which is legacy
> IRQ 7 as seen from the PIC. The corresponding IOAPIC interrupt pin is
> masked and vector 55 is completely unused.
> 
> More questions than answers. Still investigating.

And it does not explain Maartens report which gets a spurious vector 33 on
CPU4 after the non boot cpus have been brought online again. And that's the
vector which was assigned before the affinity was moved by unplugging CPU4.

Hrmpf. Even more mystery to solve.

Thanks,

tglx


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-13 Thread Thomas Gleixner
On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Linus Torvalds wrote:
> 
> > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner  wrote:
> > >
> > > Definitely. That was fragile forever but puzzles me is that I can't figure
> > > out what now causes that spurious interrupt to surface out of the blue.
> > 
> > Perhaps just timing?
> 
> That's what I'm trying to figure out right now, because that is the only
> sensible explanation left. The whole machinery of suspend is exactly the
> same with and without the vector changes. I instrumented all functions
> involved and the picture is the same. I even do not see any fundamental
> timing differences where one would say: That's it.
> 
> What puzzles me even more is that in the range of commits I'm fiddling with
> there is no other change than the vector management stuff and the point
> where it breaks makes no sense at all. The point Maarten bisected it to
> works nicely here, so that might just point to a very subtle timing issue.

After doing more debugging on this it turns out that this looks like a
legacy interrupt coming in. The vector number is always 55, which is legacy
IRQ 7 as seen from the PIC. The corresponding IOAPIC interrupt pin is
masked and vector 55 is completely unused.

More questions than answers. Still investigating.

Thanks,

tglx


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-13 Thread Andy Lutomirski
On Wed, Dec 13, 2017 at 3:16 AM, Jarkko Nikula
 wrote:
> On 12/13/2017 12:10 AM, Andy Lutomirski wrote:
>>
>> On Tue, Dec 12, 2017 at 10:36 AM, Linus Torvalds
>>  wrote:
>>>
>>> On Tue, Dec 12, 2017 at 10:05 AM, Andy Lutomirski 
>>> wrote:
>>
>> Like this?
>>
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes=cb855aa9679a15adbe43732f5854270de2b35856
>>
>> I've barely tested it.  It suspended and resumed once in a 64-bit VM.
>> It compiles on 32-bit.
>>
> I tested with "rtcwake -s 5 -m mem" 2-3 times on a real machine. Both 64-bit
> and 32-bit work.
>
> Tested-by: Jarkko Nikula 

Thanks!

I've split the patch into three and put your Tested-By on all of them,
since the end state is exactly the same as what you tested.  I'll
email them out once I test them myself :)


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-13 Thread Thomas Gleixner
On Wed, 13 Dec 2017, Linus Torvalds wrote:

> On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner  wrote:
> >
> > Definitely. That was fragile forever but puzzles me is that I can't figure
> > out what now causes that spurious interrupt to surface out of the blue.
> 
> Perhaps just timing?

That's what I'm trying to figure out right now, because that is the only
sensible explanation left. The whole machinery of suspend is exactly the
same with and without the vector changes. I instrumented all functions
involved and the picture is the same. I even do not see any fundamental
timing differences where one would say: That's it.

What puzzles me even more is that in the range of commits I'm fiddling with
there is no other change than the vector management stuff and the point
where it breaks makes no sense at all. The point Maarten bisected it to
works nicely here, so that might just point to a very subtle timing issue.

> How hard would it be to change the ordering to just redirect irqs first?

The whole interrupt redirection happens when the non boot CPUs are brought
down, which is the very last step before the actual suspend happens.

We could probably do that earlier, but that's something Rafael needs to
answer ultimately.

Thanks,

tglx


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-13 Thread Linus Torvalds
On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner  wrote:
>
> Definitely. That was fragile forever but puzzles me is that I can't figure
> out what now causes that spurious interrupt to surface out of the blue.

Perhaps just timing?

How hard would it be to change the ordering to just redirect irqs first?

 Linus


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-13 Thread Thomas Gleixner
On Wed, 13 Dec 2017, Bjorn Helgaas wrote:
> [+cc linux-pci, linux-pm]
> 
> On Wed, Dec 13, 2017 at 04:57:56PM +0100, Thomas Gleixner wrote:
> > So I was finally able to figure out what the hell is going on:
> > 
> > Suspend:
> > 
> >  - The device suspend code puts the graphics card into a power
> >state != PCI_D0.
> > 
> >  - Offline non boot CPUs
> > 
> >  - Break interrupt affinity. Allocate new vector on CPU 0, compose and
> >write MSI message which ends up in:
> > 
> >__pci_write_msi_msg(entry, msg)
> >{
> > if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev)) {
> >/* Don't touch the hardware now */
> > } else {
> >
> > }
> > entry->msg = *msg;
> >}
> >  
> >   So because the device is not in PCI_D0 the message is not written. It's
> >   written in the device resume path.
> 
> I'm not a PM guru, but this ordering seems fragile.  If we offline
> CPUs before re-targeting interrupts directed at those CPUs, aren't we
> always going to be at risk of sending interrupts to an offline CPU?
> 
> Even if the device is now asleep and therefore should not generate an
> interrupt, it seems like there's a window when the device returns to
> PCI_D0 where it could generate an interrupt before we have a chance to
> update the MSI message.

Definitely. That was fragile forever but puzzles me is that I can't figure
out what now causes that spurious interrupt to surface out of the blue.

Thanks,

tglx


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-13 Thread Bjorn Helgaas
[+cc linux-pci, linux-pm]

On Wed, Dec 13, 2017 at 04:57:56PM +0100, Thomas Gleixner wrote:
> So I was finally able to figure out what the hell is going on:
> 
> Suspend:
> 
>  - The device suspend code puts the graphics card into a power
>state != PCI_D0.
> 
>  - Offline non boot CPUs
> 
>  - Break interrupt affinity. Allocate new vector on CPU 0, compose and
>write MSI message which ends up in:
> 
>__pci_write_msi_msg(entry, msg)
>{
>   if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev)) {
>  /* Don't touch the hardware now */
>   } else {
>  
>   }
>   entry->msg = *msg;
>}
>  
>   So because the device is not in PCI_D0 the message is not written. It's
>   written in the device resume path.

I'm not a PM guru, but this ordering seems fragile.  If we offline
CPUs before re-targeting interrupts directed at those CPUs, aren't we
always going to be at risk of sending interrupts to an offline CPU?

Even if the device is now asleep and therefore should not generate an
interrupt, it seems like there's a window when the device returns to
PCI_D0 where it could generate an interrupt before we have a chance to
update the MSI message.

> Resume:
> [  139.670446] ACPI: Low-level resume complete
> [  139.670541] PM: Restoring platform NVS memory
> [  139.672462] do_IRQ: 0.55 No irq handler for vector
> [  139.672475] Enabling non-boot CPUs ...
> 
> So the spurious interrupt happens early and way before the device resume
> code writes the new MSI message.
> 
> I checked the behaviour on 4.14. The MSI write is delayed there in the same
> way, but there is no spurious interrupt. There is no interrupt coming in at
> all _BEFORE_ the device is put out of PCI_D0.
> 
> And this has certainly nothing to do with the vector management changes,
> but I can't figure yet what makes that spurious interrupt to be sent.
> 
> Any ideas welcome.
> 
> Thanks,
> 
>   tglx
> 


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-13 Thread Thomas Gleixner
So I was finally able to figure out what the hell is going on:

Suspend:

 - The device suspend code puts the graphics card into a power
   state != PCI_D0.

 - Offline non boot CPUs

 - Break interrupt affinity. Allocate new vector on CPU 0, compose and
   write MSI message which ends up in:

   __pci_write_msi_msg(entry, msg)
   {
if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev)) {
   /* Don't touch the hardware now */
} else {
   
}
entry->msg = *msg;
   }
 
  So because the device is not in PCI_D0 the message is not written. It's
  written in the device resume path.

Resume:
[  139.670446] ACPI: Low-level resume complete
[  139.670541] PM: Restoring platform NVS memory
[  139.672462] do_IRQ: 0.55 No irq handler for vector
[  139.672475] Enabling non-boot CPUs ...

So the spurious interrupt happens early and way before the device resume
code writes the new MSI message.

I checked the behaviour on 4.14. The MSI write is delayed there in the same
way, but there is no spurious interrupt. There is no interrupt coming in at
all _BEFORE_ the device is put out of PCI_D0.

And this has certainly nothing to do with the vector management changes,
but I can't figure yet what makes that spurious interrupt to be sent.

Any ideas welcome.

Thanks,

tglx



Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-13 Thread Ingo Molnar

* Jarkko Nikula  wrote:

> On 12/13/2017 12:10 AM, Andy Lutomirski wrote:
> > On Tue, Dec 12, 2017 at 10:36 AM, Linus Torvalds
> >  wrote:
> > > On Tue, Dec 12, 2017 at 10:05 AM, Andy Lutomirski  wrote:
> > Like this?
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes=cb855aa9679a15adbe43732f5854270de2b35856
> > 
> > I've barely tested it.  It suspended and resumed once in a 64-bit VM.
> > It compiles on 32-bit.
> > 
> I tested with "rtcwake -s 5 -m mem" 2-3 times on a real machine. Both 64-bit
> and 32-bit work.
> 
> Tested-by: Jarkko Nikula 

Great, thanks for the testing!

Andy, mind sending the final fix with a changelog and the Reported-by/Tested-by 
tags, etc?

Thanks,

Ingo


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-13 Thread Jarkko Nikula

On 12/13/2017 12:10 AM, Andy Lutomirski wrote:

On Tue, Dec 12, 2017 at 10:36 AM, Linus Torvalds
 wrote:

On Tue, Dec 12, 2017 at 10:05 AM, Andy Lutomirski  wrote:

Like this?

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes=cb855aa9679a15adbe43732f5854270de2b35856

I've barely tested it.  It suspended and resumed once in a 64-bit VM.
It compiles on 32-bit.

I tested with "rtcwake -s 5 -m mem" 2-3 times on a real machine. Both 
64-bit and 32-bit work.


Tested-by: Jarkko Nikula 


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-12 Thread Andy Lutomirski
On Tue, Dec 12, 2017 at 2:33 PM, Linus Torvalds
 wrote:
> On Tue, Dec 12, 2017 at 2:10 PM, Andy Lutomirski  wrote:
>>
>> (That link might not work for a little bit.  I'm not sure what's up.)
>
> I think your link is just bogus.
>
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes
>
> works.

If I click "commit" on that page, I get my link, and it's still busted.

>
> Anyway, the code looks much better to me.
>
> Whether it _works_ is another matter, of course.
>
> Linus


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-12 Thread Linus Torvalds
On Tue, Dec 12, 2017 at 2:10 PM, Andy Lutomirski  wrote:
>
> (That link might not work for a little bit.  I'm not sure what's up.)

I think your link is just bogus.

  
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes

works.

Anyway, the code looks much better to me.

Whether it _works_ is another matter, of course.

Linus


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-12 Thread Andy Lutomirski
On Tue, Dec 12, 2017 at 10:36 AM, Linus Torvalds
 wrote:
> On Tue, Dec 12, 2017 at 10:05 AM, Andy Lutomirski  wrote:
>>>
>>> - do NOT use "load_gs_index()", which does that swapgs dance (twice!)
>>> and plays with interrupt state. Just load the segment register, and
>>> then do the wrmsrl() of the {FS,GS,KERNEL_GS}_BASE values. There is no
>>> need for the swapgs dance.
>>
>> Using what helper?  On x86_64, it can fault, and IIRC we explicitly
>> don't allow loadsegment(gs, ...).
>
> Just do the loadsegment() thing. The fact that we don't have a gs
> version of it is legacy - to catch bad users. It shouldn't stop us
> from having good users.
>
> That said - can it really fault? Because if it can, then why can't %fs
> fault? And on x86-64, we just do
>
> asm volatile ("movw %0, %%fs" :: "r" (ctxt->fs));
>
> and don't actually use 'loadsegment()' for _any_ of the segments.  We
> only do the fault protection on 32-bit.
>
> In fact, we really should try to avoid taking faults here anyway,
> shouldn't we? We haven't loaded enough of the context yet.
>
> Hmm.
>
> Maybe we should load only the fixed kernel segments at this point, and
> then do all the loadsegment() of gs/fs in the later phase when we're
> all set up.
>
> THERE we can do the swapgs dance with interrupt tracing etc, because
> *there* we actually are fully set up. I guess that means reloading the
> FS/GS base MSR's,

Like this?

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes=cb855aa9679a15adbe43732f5854270de2b35856

I've barely tested it.  It suspended and resumed once in a 64-bit VM.
It compiles on 32-bit.

(That link might not work for a little bit.  I'm not sure what's up.)


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-12 Thread Linus Torvalds
On Tue, Dec 12, 2017 at 10:05 AM, Andy Lutomirski  wrote:
>>
>> - do NOT use "load_gs_index()", which does that swapgs dance (twice!)
>> and plays with interrupt state. Just load the segment register, and
>> then do the wrmsrl() of the {FS,GS,KERNEL_GS}_BASE values. There is no
>> need for the swapgs dance.
>
> Using what helper?  On x86_64, it can fault, and IIRC we explicitly
> don't allow loadsegment(gs, ...).

Just do the loadsegment() thing. The fact that we don't have a gs
version of it is legacy - to catch bad users. It shouldn't stop us
from having good users.

That said - can it really fault? Because if it can, then why can't %fs
fault? And on x86-64, we just do

asm volatile ("movw %0, %%fs" :: "r" (ctxt->fs));

and don't actually use 'loadsegment()' for _any_ of the segments.  We
only do the fault protection on 32-bit.

In fact, we really should try to avoid taking faults here anyway,
shouldn't we? We haven't loaded enough of the context yet.

Hmm.

Maybe we should load only the fixed kernel segments at this point, and
then do all the loadsegment() of gs/fs in the later phase when we're
all set up.

THERE we can do the swapgs dance with interrupt tracing etc, because
*there* we actually are fully set up. I guess that means reloading the
FS/GS base MSR's,

  Linus


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-12 Thread Andy Lutomirski
On Tue, Dec 12, 2017 at 9:27 AM, Linus Torvalds
 wrote:
> On Sun, Dec 10, 2017 at 1:28 PM, Linus Torvalds
>  wrote:
>>
>> That said, this *all* smells wrong. Why is there a special
>> fix_processor_context() function at all with different 32-bit and
>> 64-bit behavior? This code is all written to be maximally confusing.
>
> Hmm. Looking a bit more at this, I think it should be solved by:
>
>  - load the original read-write GDT early, along with the IDT.
>
>   We have already saved it off in __save_processor_state(), and it may
> have already gotten loaded very early in at least some of the paths,
> but it definitely hasn't gotten reloaded in all the paths (think
> "suspend/resume testing" etc).
>
>  - add the LDT descriptor to the save area too, exactly like we
> already have IDT/GDT.
>
>Then, we can do "load_ldt()" early (along with IDT and GDT).
>
>  - now we can just load all the segments early, and get the percpu and
> stack canary stuff without any special cases
>
> - do NOT use "load_gs_index()", which does that swapgs dance (twice!)
> and plays with interrupt state. Just load the segment register, and
> then do the wrmsrl() of the {FS,GS,KERNEL_GS}_BASE values. There is no
> need for the swapgs dance.

Using what helper?  On x86_64, it can fault, and IIRC we explicitly
don't allow loadsegment(gs, ...).

>
>  - now that we have a fully working system, then the
> "fix_processor_context()" code can do the "fancy" stuff like setting
> up the RO fixmap GDT and re-initializing the TLB state. Those want
> percpu stuff.
>
> In other words, why not try to organize this so that we do a simple
> and fairly mindless restore of the low-level state first? Avoid all
> the "system is halfway up" crud.
>
> Would that work for people? Andy?

Other than the above, more or less.

But we should really do all the user segments last.  They're not at
all needed for normal kernel execution, so I think they should be the
very last part.

I'll try to get to this today.


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-12 Thread Linus Torvalds
On Sun, Dec 10, 2017 at 1:28 PM, Linus Torvalds
 wrote:
>
> That said, this *all* smells wrong. Why is there a special
> fix_processor_context() function at all with different 32-bit and
> 64-bit behavior? This code is all written to be maximally confusing.

Hmm. Looking a bit more at this, I think it should be solved by:

 - load the original read-write GDT early, along with the IDT.

  We have already saved it off in __save_processor_state(), and it may
have already gotten loaded very early in at least some of the paths,
but it definitely hasn't gotten reloaded in all the paths (think
"suspend/resume testing" etc).

 - add the LDT descriptor to the save area too, exactly like we
already have IDT/GDT.

   Then, we can do "load_ldt()" early (along with IDT and GDT).

 - now we can just load all the segments early, and get the percpu and
stack canary stuff without any special cases

- do NOT use "load_gs_index()", which does that swapgs dance (twice!)
and plays with interrupt state. Just load the segment register, and
then do the wrmsrl() of the {FS,GS,KERNEL_GS}_BASE values. There is no
need for the swapgs dance.

 - now that we have a fully working system, then the
"fix_processor_context()" code can do the "fancy" stuff like setting
up the RO fixmap GDT and re-initializing the TLB state. Those want
percpu stuff.

In other words, why not try to organize this so that we do a simple
and fairly mindless restore of the low-level state first? Avoid all
the "system is halfway up" crud.

Would that work for people? Andy?

   Linus


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-12 Thread Pavel Machek
Hi!

> > > ...should take 10 seconds or so.
> > I'm told 0day does *some* suspend/resume testing, but I think it's
> > pretty limited, partly because the kinds of machines it primarily
> > works on don't really support suspend/resume at all.
> 
> currently, we're running suspend test on 1 platform only, with 64 bit
> kernel. suspend test will be enabled on more platforms (laptops) in
> next two weeks.

Thanks!

> >  I'm also not sure
> > just how many of those machines are 32-bit at all..
> 
> for this, I suppose it can be reproduced if we use 32-bit kernel and
> rootfs, right? Then it's easier to enable this in 0Day.

Yes, Intel cpus are pretty good at backwards compatibility, and most
problems are not subtle at all. So yes, 32-bit kernel / rootfs on
recent machine should be good for testing.

Best regards,
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-11 Thread Andy Lutomirski
On Mon, Dec 11, 2017 at 6:09 AM, Zhang Rui  wrote:
> On Sun, 2017-12-10 at 12:30 -0800, Linus Torvalds wrote:
>> On Sun, Dec 10, 2017 at 10:56 AM, Pavel Machek  wrote:
>> >
>> >
>> > Confirmed, revert fixes it. You see how it moves
>> > fix_processor_context
>> > around #ifdef CONFIG_X86_32 block? And how people forget 32-bit
>> > machines exist? Aha.
>> Yeah, people do.
>>
>> Andy?
>>
>> >
>> > Which brings me to .. various people do automated testing of
>> > kernel. Testing 32-bit kernel for boot, and both 32-bit and 64-bit
>> > for
>> > boot and suspend would be very nice. The last item is not hard,
>> > either:
>> >
>> > sudo rtcwake -l -m mem -s 5
>> >
>> > ...should take 10 seconds or so.
>> I'm told 0day does *some* suspend/resume testing, but I think it's
>> pretty limited, partly because the kinds of machines it primarily
>> works on don't really support suspend/resume at all.
>
> currently, we're running suspend test on 1 platform only, with 64 bit
> kernel. suspend test will be enabled on more platforms (laptops) in
> next two weeks.
>
> I will check why it does not find the first regression introduced by
> ca37e57bbe0c ("x86/entry/64: Add missing irqflags tracing to
> native_load_gs_index()").
>
>>  I'm also not sure
>> just how many of those machines are 32-bit at all..
>
> for this, I suppose it can be reproduced if we use 32-bit kernel and
> rootfs, right? Then it's easier to enable this in 0Day.
>

Yes.

The 64-bit problem should also be reproducible with rtcwake even in a vm.

Also, on this topic, could make run_tests in
tools/testing/selftests/x86 be added to the rotation as well?  The
testing dir should match the kernel being tested IMO.

> thanks,
> rui
>>
>> But I'm adding Zhang Rui to the cc, to see if my recollection is
>> right.
>>
>> Because you're right, more suspend/resume automated testing would be
>> good to have. And yes, people test mainly 64-bit these days.
>>
>> Also, I'm not even sure what the 0day rules are for just plain
>> mainline. I don't tend to see a lot of breakage reports, even though
>> I'd expect to. This came in from the x86 trees (and those do their
>> own
>> tests too, but probably not suspend/resume either), but it hit my
>> tree
>> fairly soon after going into the x86 -tip trees.
>>
>> Linus


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-11 Thread Zhang Rui
On Sun, 2017-12-10 at 12:30 -0800, Linus Torvalds wrote:
> On Sun, Dec 10, 2017 at 10:56 AM, Pavel Machek  wrote:
> > 
> > 
> > Confirmed, revert fixes it. You see how it moves
> > fix_processor_context
> > around #ifdef CONFIG_X86_32 block? And how people forget 32-bit
> > machines exist? Aha.
> Yeah, people do.
> 
> Andy?
> 
> > 
> > Which brings me to .. various people do automated testing of
> > kernel. Testing 32-bit kernel for boot, and both 32-bit and 64-bit
> > for
> > boot and suspend would be very nice. The last item is not hard,
> > either:
> > 
> > sudo rtcwake -l -m mem -s 5
> > 
> > ...should take 10 seconds or so.
> I'm told 0day does *some* suspend/resume testing, but I think it's
> pretty limited, partly because the kinds of machines it primarily
> works on don't really support suspend/resume at all.

currently, we're running suspend test on 1 platform only, with 64 bit
kernel. suspend test will be enabled on more platforms (laptops) in
next two weeks.

I will check why it does not find the first regression introduced by
ca37e57bbe0c ("x86/entry/64: Add missing irqflags tracing to
native_load_gs_index()").

>  I'm also not sure
> just how many of those machines are 32-bit at all..

for this, I suppose it can be reproduced if we use 32-bit kernel and
rootfs, right? Then it's easier to enable this in 0Day.

thanks,
rui
> 
> But I'm adding Zhang Rui to the cc, to see if my recollection is
> right.
> 
> Because you're right, more suspend/resume automated testing would be
> good to have. And yes, people test mainly 64-bit these days.
> 
> Also, I'm not even sure what the 0day rules are for just plain
> mainline. I don't tend to see a lot of breakage reports, even though
> I'd expect to. This came in from the x86 trees (and those do their
> own
> tests too, but probably not suspend/resume either), but it hit my
> tree
> fairly soon after going into the x86 -tip trees.
> 
> Linus


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-10 Thread Pavel Machek
On Sun 2017-12-10 13:28:50, Linus Torvalds wrote:
> On Sun, Dec 10, 2017 at 12:43 PM, Pavel Machek  wrote:
> >
> > For the record... this should fix it. Tested on x60. More tests pending.
> 
> This can't be right.
> 
> At the very least, now the comment is wrong. And the comment does seem
> relevant for 32-bit too:

Well, take a look at orignal patch. I'm reverting 32-bit code to
v4.15-rc1 version, while keeping 64-bit code at v4.15-rc3
version. Yes, my brain hurts from looking at the code :-(.

In the meantime, I did short test on 64-bit machine. No ill effect observed.

Hmm. Aha. Yes, the comment is wrong... as it was in wrong in -rc1.

> > -   fix_processor_context();
> > -
> > /*
> >  * Restore segment registers.  This happens after restoring the GDT
> >  * and LDT, which happen in fix_processor_context().
> 
> Notice? You've moved down the 32-bit fix_processor_context() call to
> after the loadsegment() calls, which smells wrong.

Yeah, I did. There's where it was in v4.15-rc1, and that's what ws
working for me. 

> That said, this *all* smells wrong. Why is there a special
> fix_processor_context() function at all with different 32-bit and
> 64-bit behavior? This code is all written to be maximally confusing.
> 
> I think this could do with some re-org to make it more logical. That
> "some random things done in fix_processor_context(), other random
> things done directly in __restore_processor_state()" makes no sense at
> all to me. There's no logic to what is done where.

I have to agree.
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-10 Thread Linus Torvalds
On Sun, Dec 10, 2017 at 12:43 PM, Pavel Machek  wrote:
>
> For the record... this should fix it. Tested on x60. More tests pending.

This can't be right.

At the very least, now the comment is wrong. And the comment does seem
relevant for 32-bit too:

> -   fix_processor_context();
> -
> /*
>  * Restore segment registers.  This happens after restoring the GDT
>  * and LDT, which happen in fix_processor_context().

Notice? You've moved down the 32-bit fix_processor_context() call to
after the loadsegment() calls, which smells wrong.

That said, this *all* smells wrong. Why is there a special
fix_processor_context() function at all with different 32-bit and
64-bit behavior? This code is all written to be maximally confusing.

I think this could do with some re-org to make it more logical. That
"some random things done in fix_processor_context(), other random
things done directly in __restore_processor_state()" makes no sense at
all to me. There's no logic to what is done where.

Linus


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-10 Thread Pavel Machek
On Sun 2017-12-10 12:30:52, Linus Torvalds wrote:
> On Sun, Dec 10, 2017 at 10:56 AM, Pavel Machek  wrote:
> >
> > Confirmed, revert fixes it. You see how it moves fix_processor_context
> > around #ifdef CONFIG_X86_32 block? And how people forget 32-bit
> > machines exist? Aha.
> 
> Yeah, people do.
> 
> Andy?

For the record... this should fix it. Tested on x60. More tests pending.

Signed-off-by: Pavel Machek 


diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 5191de1..d59f05f 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -216,8 +216,8 @@ static void notrace __restore_processor_state(struct 
saved_context *ctxt)
write_cr0(ctxt->cr0);
 
/*
-* now restore the descriptor tables to their proper values
-* ltr is done i fix_processor_context().
+* Now restore the descriptor tables to their proper values
+* ltr is done in fix_processor_context().
 */
 #ifdef CONFIG_X86_32
load_idt(>idt);
@@ -235,8 +235,6 @@ static void notrace __restore_processor_state(struct 
saved_context *ctxt)
wrmsrl(MSR_GS_BASE, ctxt->gs_base);
 #endif
 
-   fix_processor_context();
-
/*
 * Restore segment registers.  This happens after restoring the GDT
 * and LDT, which happen in fix_processor_context().
@@ -252,8 +250,12 @@ static void notrace __restore_processor_state(struct 
saved_context *ctxt)
 */
if (boot_cpu_has(X86_FEATURE_SEP))
enable_sep_cpu();
+
+   fix_processor_context();
 #else
 /* CONFIG_X86_64 */
+   fix_processor_context();
+
asm volatile ("movw %0, %%ds" :: "r" (ctxt->ds));
asm volatile ("movw %0, %%es" :: "r" (ctxt->es));
asm volatile ("movw %0, %%fs" :: "r" (ctxt->fs));


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-10 Thread Linus Torvalds
On Sun, Dec 10, 2017 at 10:56 AM, Pavel Machek  wrote:
>
> Confirmed, revert fixes it. You see how it moves fix_processor_context
> around #ifdef CONFIG_X86_32 block? And how people forget 32-bit
> machines exist? Aha.

Yeah, people do.

Andy?

> Which brings me to .. various people do automated testing of
> kernel. Testing 32-bit kernel for boot, and both 32-bit and 64-bit for
> boot and suspend would be very nice. The last item is not hard, either:
>
> sudo rtcwake -l -m mem -s 5
>
> ...should take 10 seconds or so.

I'm told 0day does *some* suspend/resume testing, but I think it's
pretty limited, partly because the kinds of machines it primarily
works on don't really support suspend/resume at all. I'm also not sure
just how many of those machines are 32-bit at all..

But I'm adding Zhang Rui to the cc, to see if my recollection is right.

Because you're right, more suspend/resume automated testing would be
good to have. And yes, people test mainly 64-bit these days.

Also, I'm not even sure what the 0day rules are for just plain
mainline. I don't tend to see a lot of breakage reports, even though
I'd expect to. This came in from the x86 trees (and those do their own
tests too, but probably not suspend/resume either), but it hit my tree
fairly soon after going into the x86 -tip trees.

Linus


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-10 Thread Michal Hocko
On Thu 07-12-17 08:55:08, Michal Hocko wrote:
> On Wed 06-12-17 13:14:52, Michal Hocko wrote:
> > On Mon 04-12-17 14:36:20, Linus Torvalds wrote:
> > > On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki  
> > > wrote:
> > > >
> > > > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > > > systems I have tested, so it is probably safe to assume it to be
> > > > broken everywhere.
> > > 
> > > Oh, it's definitely not broken everywhere, because I use it myself,
> > > and was traveling last week due to my mom's bday.
> > > 
> > > HOWEVER.
> > > 
> > > Some of the x86 work seems to have broken it for some configurations.
> > > In particular, do you have a big "everything enabled" kernel config -
> > > particularly lockdep and irqflags tracing enabled?
> > > 
> > > Andy has a patch, but it hasn't made it to me yet (probably because
> > > the x86 people are very busy with the kaiser work):
> > > 
> > > https://lkml.org/lkml/2017/11/30/546
> > > 
> > > (also note his follow-up "fix the commit message" note, but that one
> > > doesn't actually affect the code itself).
> > 
> > merging tip/x86/urgent on top of your tree fixed this problem for me,
> > but I am seeing something else
> > [  131.711412] ACPI: Preparing to enter system sleep state S3
> > [  131.755328] ACPI: EC: event blocked
> > [  131.755328] ACPI: EC: EC stopped
> > [  131.755328] PM: Saving platform NVS memory
> > [  131.755344] Disabling non-boot CPUs ...
> > [  131.779330] IRQ 124: no longer affine to CPU1
> > [  131.780334] smpboot: CPU 1 is now offline
> > [  131.804465] smpboot: CPU 2 is now offline
> > [  131.827291] IRQ 122: no longer affine to CPU3
> > [  131.827292] IRQ 123: no longer affine to CPU3
> > [  131.828293] smpboot: CPU 3 is now offline
> > [  131.830991] ACPI: Low-level resume complete
> > [  131.831092] ACPI: EC: EC started
> > [  131.831093] PM: Restoring platform NVS memory
> > [  131.831864] do_IRQ: 0.55 No irq handler for vector
> > [  131.831884] Enabling non-boot CPUs ...
> > [  131.831909] x86: Booting SMP configuration:
> > [  131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > [  131.832913]  cache: parent cpu1 should not be sleeping
> > [  131.833058] CPU1 is up
> > [  131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
> > [  131.833864]  cache: parent cpu2 should not be sleeping
> > [  131.833983] CPU2 is up
> > [  131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
> > [  131.834776]  cache: parent cpu3 should not be sleeping
> > [  131.834923] CPU3 is up
> > 
> > "No irq handler" part looks a bit scary (maybe related to lost affinity
> > messages?) but the following messages look quite as well. Is this
> > something known? The system seems to be up and running without any
> > visible issues.
> 
> Hmm, there is still something bad going on during resume. My laptop
> haven't woken up from s2ram this morning. The screen was powered on
> but the system hasn't come up.

It's been few days and I haven't seen this problem again. And I am doing
s2ram all the time...
-- 
Michal Hocko
SUSE Labs


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-10 Thread Pavel Machek
On Sun 2017-12-10 08:37:56, Linus Torvalds wrote:
> On Sun, Dec 10, 2017 at 8:23 AM, Pavel Machek  wrote:
> >
> > Can you do something about html emails? Quoting them doesn't work too well.
> 
> Yeah, and they don't show up onlkml either because of rules. I try to
> avoid them, but have been more on mobile for various reasons lately
> than usual. That should be over and done with after today, though.
> 
> >> Any chance to bisect it? This doesn't sound like the other problem
> >> we had.
> >
> > Let me try...
> >
> > v4.15-rc2: suspends/resumes ok.
> > 4ded3be: hangs on resume.
> >
> > Given that between those, there was supposed "fix" for suspend, I
> > believe I should try reverting that one first. .. if someone can tell
> > me commit id, that would help.
> 
> The fix in there should be 5b06bbcfc2c6 ("x86/power: Fix some ordering
> bugs in __restore_processor_context()") so you can certainly see if it
> works before that (or just reverting it).

Revert is easier.

> But there are also a few other x86 low-level things there, and that
> fix really looks very safe, so I'd almost expect something else to
> have triggered your problem. There's less than 500 commits in that
> range you have, so a few bisections should narrow it down a lot.

No, that commit does _look_ pretty suspect to me...

Confirmed, revert fixes it. You see how it moves fix_processor_context
around #ifdef CONFIG_X86_32 block? And how people forget 32-bit
machines exist? Aha.

Which brings me to .. various people do automated testing of
kernel. Testing 32-bit kernel for boot, and both 32-bit and 64-bit for
boot and suspend would be very nice. The last item is not hard, either:

sudo rtcwake -l -m mem -s 5

...should take 10 seconds or so. 

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-10 Thread Linus Torvalds
On Sun, Dec 10, 2017 at 8:23 AM, Pavel Machek  wrote:
>
> Can you do something about html emails? Quoting them doesn't work too well.

Yeah, and they don't show up onlkml either because of rules. I try to
avoid them, but have been more on mobile for various reasons lately
than usual. That should be over and done with after today, though.

>> Any chance to bisect it? This doesn't sound like the other problem
>> we had.
>
> Let me try...
>
> v4.15-rc2: suspends/resumes ok.
> 4ded3be: hangs on resume.
>
> Given that between those, there was supposed "fix" for suspend, I
> believe I should try reverting that one first. .. if someone can tell
> me commit id, that would help.

The fix in there should be 5b06bbcfc2c6 ("x86/power: Fix some ordering
bugs in __restore_processor_context()") so you can certainly see if it
works before that (or just reverting it).

But there are also a few other x86 low-level things there, and that
fix really looks very safe, so I'd almost expect something else to
have triggered your problem. There's less than 500 commits in that
range you have, so a few bisections should narrow it down a lot.

Linus


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-10 Thread Pavel Machek
On Sat 2017-12-09 14:47:41, Linus Torvalds wrote:
> On Dec 9, 2017 14:01, "Pavel Machek"  wrote:
> 
> 
> Strange. I was at 4.15-rc1, and suspend worked there (thinkpad x60,
> 32-bit). It was broken in -next. I updated to current mainline (
> 4ded3bec65a07343258ed8fd9d46483f032d866f ) and suspend is broken.
> 

Can you do something about html emails? Quoting them doesn't work too well.

> Any chance to bisect it? This doesn't sound like the other problem
> we had.

Let me try...

v4.15-rc2: suspends/resumes ok.
4ded3be: hangs on resume.

Given that between those, there was supposed "fix" for suspend, I
believe I should try reverting that one first. .. if someone can tell
me commit id, that would help.

And yes, if everything else fails, I can probably bisect.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-09 Thread Pavel Machek
Hi!

On Sat 2017-12-09 10:55:14, Linus Torvalds wrote:
> On Dec 9, 2017 02:33, "Pavel Machek"  wrote:
> 
> > I believe I have the issue here, too (-next on thinkpad x60). Which
> > patch is expected to fix it? Let me try recent -next...
> 
> 
> It should be fixed in mainline. I don't know if next has picked it
> up yet.

Strange. I was at 4.15-rc1, and suspend worked there (thinkpad x60,
32-bit). It was broken in -next. I updated to current mainline (
4ded3bec65a07343258ed8fd9d46483f032d866f ) and suspend is broken.

It suspends ok, I press Fn button to make it resume, fans spin up but
moon LED is still lit.

Best regards,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-09 Thread Pavel Machek
On Sat 2017-12-09 11:33:25, Pavel Machek wrote:
> On Tue 2017-12-05 01:25:55, Rafael J. Wysocki wrote:
> > On Monday, December 4, 2017 11:41:06 PM CET Rafael J. Wysocki wrote:
> > > On Monday, December 4, 2017 11:38:54 PM CET Thomas Gleixner wrote:
> > > > On Mon, 4 Dec 2017, Linus Torvalds wrote:
> > > > 
> > > > > On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki 
> > > > >  wrote:
> > > > > >
> > > > > > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > > > > > systems I have tested, so it is probably safe to assume it to be
> > > > > > broken everywhere.
> > > > > 
> > > > > Oh, it's definitely not broken everywhere, because I use it myself,
> > > > > and was traveling last week due to my mom's bday.
> > > > > 
> > > > > HOWEVER.
> > > > > 
> > > > > Some of the x86 work seems to have broken it for some configurations.
> > > > > In particular, do you have a big "everything enabled" kernel config -
> > > > > particularly lockdep and irqflags tracing enabled?
> > > > > 
> > > > > Andy has a patch, but it hasn't made it to me yet (probably because
> > > > > the x86 people are very busy with the kaiser work):
> > > 
> > > This definitely fixes the problem at least on one of the affected 
> > > machines.
> > 
> > I can confirm that the Andy's patch fixes it on all systems that had this
> > issue here.
> 
> I believe I have the issue here, too (-next on thinkpad x60). Which
> patch is expected to fix it? Let me try recent -next...

Still there AFAICT.
Pavel


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-09 Thread Pavel Machek
On Tue 2017-12-05 01:25:55, Rafael J. Wysocki wrote:
> On Monday, December 4, 2017 11:41:06 PM CET Rafael J. Wysocki wrote:
> > On Monday, December 4, 2017 11:38:54 PM CET Thomas Gleixner wrote:
> > > On Mon, 4 Dec 2017, Linus Torvalds wrote:
> > > 
> > > > On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki  
> > > > wrote:
> > > > >
> > > > > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > > > > systems I have tested, so it is probably safe to assume it to be
> > > > > broken everywhere.
> > > > 
> > > > Oh, it's definitely not broken everywhere, because I use it myself,
> > > > and was traveling last week due to my mom's bday.
> > > > 
> > > > HOWEVER.
> > > > 
> > > > Some of the x86 work seems to have broken it for some configurations.
> > > > In particular, do you have a big "everything enabled" kernel config -
> > > > particularly lockdep and irqflags tracing enabled?
> > > > 
> > > > Andy has a patch, but it hasn't made it to me yet (probably because
> > > > the x86 people are very busy with the kaiser work):
> > 
> > This definitely fixes the problem at least on one of the affected machines.
> 
> I can confirm that the Andy's patch fixes it on all systems that had this
> issue here.

I believe I have the issue here, too (-next on thinkpad x60). Which
patch is expected to fix it? Let me try recent -next...

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-08 Thread Thomas Gleixner
On Thu, 7 Dec 2017, Maarten Lankhorst wrote:
> Op 06-12-17 om 15:15 schreef Thomas Gleixner:
> > On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
> >> Op 06-12-17 om 13:46 schreef Thomas Gleixner:
> >>> On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
>  Op 06-12-17 om 13:15 schreef Michal Hocko:
> > "No irq handler" part looks a bit scary (maybe related to lost affinity
> > messages?) but the following messages look quite as well. Is this
> > something known? The system seems to be up and running without any
> > visible issues.
>  Another reproducer for 
>  https://bugzilla.kernel.org/show_bug.cgi?id=198033 ?
>  Symptoms are similar..
> >>> Well, the spurious interrupt is one thing, but you obviously lose
> >>> interrupts for some reason.
> >>>
> >>> Did you ever manage to get the data out which I asked for?
> >>>
> >>> Thanks,
> >>>
> >>>   tglx
> >>>
> >> Yes, sent this out about an hour ago
> >>
> >> https://lkml.org/lkml/2017/12/6/215
> > Weird. Did not reach me
> >
> But do you have any idea?

Can you please provide the full trace, dmesg and the full output of
.../debug/irq/... ?

Thanks,

tglx


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-07 Thread Maarten Lankhorst
Op 06-12-17 om 15:15 schreef Thomas Gleixner:
> On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
>> Op 06-12-17 om 13:46 schreef Thomas Gleixner:
>>> On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
 Op 06-12-17 om 13:15 schreef Michal Hocko:
> "No irq handler" part looks a bit scary (maybe related to lost affinity
> messages?) but the following messages look quite as well. Is this
> something known? The system seems to be up and running without any
> visible issues.
 Another reproducer for https://bugzilla.kernel.org/show_bug.cgi?id=198033 ?
 Symptoms are similar..
>>> Well, the spurious interrupt is one thing, but you obviously lose
>>> interrupts for some reason.
>>>
>>> Did you ever manage to get the data out which I asked for?
>>>
>>> Thanks,
>>>
>>> tglx
>>>
>> Yes, sent this out about an hour ago
>>
>> https://lkml.org/lkml/2017/12/6/215
> Weird. Did not reach me
>
But do you have any idea?

~Maarten



Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-06 Thread Michal Hocko
On Wed 06-12-17 13:14:52, Michal Hocko wrote:
> On Mon 04-12-17 14:36:20, Linus Torvalds wrote:
> > On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki  
> > wrote:
> > >
> > > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > > systems I have tested, so it is probably safe to assume it to be
> > > broken everywhere.
> > 
> > Oh, it's definitely not broken everywhere, because I use it myself,
> > and was traveling last week due to my mom's bday.
> > 
> > HOWEVER.
> > 
> > Some of the x86 work seems to have broken it for some configurations.
> > In particular, do you have a big "everything enabled" kernel config -
> > particularly lockdep and irqflags tracing enabled?
> > 
> > Andy has a patch, but it hasn't made it to me yet (probably because
> > the x86 people are very busy with the kaiser work):
> > 
> > https://lkml.org/lkml/2017/11/30/546
> > 
> > (also note his follow-up "fix the commit message" note, but that one
> > doesn't actually affect the code itself).
> 
> merging tip/x86/urgent on top of your tree fixed this problem for me,
> but I am seeing something else
> [  131.711412] ACPI: Preparing to enter system sleep state S3
> [  131.755328] ACPI: EC: event blocked
> [  131.755328] ACPI: EC: EC stopped
> [  131.755328] PM: Saving platform NVS memory
> [  131.755344] Disabling non-boot CPUs ...
> [  131.779330] IRQ 124: no longer affine to CPU1
> [  131.780334] smpboot: CPU 1 is now offline
> [  131.804465] smpboot: CPU 2 is now offline
> [  131.827291] IRQ 122: no longer affine to CPU3
> [  131.827292] IRQ 123: no longer affine to CPU3
> [  131.828293] smpboot: CPU 3 is now offline
> [  131.830991] ACPI: Low-level resume complete
> [  131.831092] ACPI: EC: EC started
> [  131.831093] PM: Restoring platform NVS memory
> [  131.831864] do_IRQ: 0.55 No irq handler for vector
> [  131.831884] Enabling non-boot CPUs ...
> [  131.831909] x86: Booting SMP configuration:
> [  131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [  131.832913]  cache: parent cpu1 should not be sleeping
> [  131.833058] CPU1 is up
> [  131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
> [  131.833864]  cache: parent cpu2 should not be sleeping
> [  131.833983] CPU2 is up
> [  131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
> [  131.834776]  cache: parent cpu3 should not be sleeping
> [  131.834923] CPU3 is up
> 
> "No irq handler" part looks a bit scary (maybe related to lost affinity
> messages?) but the following messages look quite as well. Is this
> something known? The system seems to be up and running without any
> visible issues.

Hmm, there is still something bad going on during resume. My laptop
haven't woken up from s2ram this morning. The screen was powered on
but the system hasn't come up.

The last thing that made it into the kernel log on fs is this

Dec  6 19:32:29 tiehlicka kernel: [21898.084685] PM: suspend entry (deep)

which won't tell us much I suspect. I've tried dozen s2ram cycles and it
hasn't reproduced so it smells like a timing issue.
-- 
Michal Hocko
SUSE Labs


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-06 Thread Thomas Gleixner
On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
> Op 06-12-17 om 13:46 schreef Thomas Gleixner:
> > On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
> >> Op 06-12-17 om 13:15 schreef Michal Hocko:
> >>> "No irq handler" part looks a bit scary (maybe related to lost affinity
> >>> messages?) but the following messages look quite as well. Is this
> >>> something known? The system seems to be up and running without any
> >>> visible issues.
> >> Another reproducer for https://bugzilla.kernel.org/show_bug.cgi?id=198033 ?
> >> Symptoms are similar..
> > Well, the spurious interrupt is one thing, but you obviously lose
> > interrupts for some reason.
> >
> > Did you ever manage to get the data out which I asked for?
> >
> > Thanks,
> >
> > tglx
> >
> Yes, sent this out about an hour ago
> 
> https://lkml.org/lkml/2017/12/6/215

Weird. Did not reach me


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-06 Thread Rafael J. Wysocki
On Wednesday, December 6, 2017 1:23:34 PM CET Thomas Gleixner wrote:
> On Wed, 6 Dec 2017, Michal Hocko wrote:
> > merging tip/x86/urgent on top of your tree fixed this problem for me,
> > but I am seeing something else
> > [  131.711412] ACPI: Preparing to enter system sleep state S3
> > [  131.755328] ACPI: EC: event blocked
> > [  131.755328] ACPI: EC: EC stopped
> > [  131.755328] PM: Saving platform NVS memory
> > [  131.755344] Disabling non-boot CPUs ...
> > [  131.779330] IRQ 124: no longer affine to CPU1
> > [  131.780334] smpboot: CPU 1 is now offline
> > [  131.804465] smpboot: CPU 2 is now offline
> > [  131.827291] IRQ 122: no longer affine to CPU3
> > [  131.827292] IRQ 123: no longer affine to CPU3
> > [  131.828293] smpboot: CPU 3 is now offline
> > [  131.830991] ACPI: Low-level resume complete
> > [  131.831092] ACPI: EC: EC started
> > [  131.831093] PM: Restoring platform NVS memory
> > [  131.831864] do_IRQ: 0.55 No irq handler for vector
> 
> Hmm, that's really odd.
> 
> > [  131.831884] Enabling non-boot CPUs ...
> > [  131.831909] x86: Booting SMP configuration:
> > [  131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > [  131.832913]  cache: parent cpu1 should not be sleeping
> 
> This is an old one. 
> 
> > [  131.833058] CPU1 is up
> > [  131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
> > [  131.833864]  cache: parent cpu2 should not be sleeping
> > [  131.833983] CPU2 is up
> > [  131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
> > [  131.834776]  cache: parent cpu3 should not be sleeping
> > [  131.834923] CPU3 is up
> > 
> > "No irq handler" part looks a bit scary (maybe related to lost affinity
> > messages?) but the following messages look quite as well. Is this
> > something known? The system seems to be up and running without any
> > visible issues.
> 
> I assume it's due to the affinity break, just that we don't know right now
> on which CPU that do_IRQ() message triggered. I assume it's CPU0 because
> the others are offline already, but 

This is resume from S3, so the firmware might do something odd to the other
CPUs, but in case it didn't (which is quite likely or we would have seen more
of these messages), they are offline and in mwait_play_dead(), so IMO it is
safe to assume that this was CPU0.

And this appears to have happened at the atch_suspend_enable_irqs() time,
which is just local_irq_enable() on x86 running on CPU0.

> I'll think about it how we can figure out what's going on.

It looks like an interrupt that have triggered right after we've enabled
interrupts on the boot CPU.

Thanks,
Rafael



Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-06 Thread Maarten Lankhorst
Op 06-12-17 om 13:46 schreef Thomas Gleixner:
> On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
>> Op 06-12-17 om 13:15 schreef Michal Hocko:
>>> "No irq handler" part looks a bit scary (maybe related to lost affinity
>>> messages?) but the following messages look quite as well. Is this
>>> something known? The system seems to be up and running without any
>>> visible issues.
>> Another reproducer for https://bugzilla.kernel.org/show_bug.cgi?id=198033 ?
>> Symptoms are similar..
> Well, the spurious interrupt is one thing, but you obviously lose
> interrupts for some reason.
>
> Did you ever manage to get the data out which I asked for?
>
> Thanks,
>
>   tglx
>
Yes, sent this out about an hour ago

https://lkml.org/lkml/2017/12/6/215

Cheers,

Maarten



Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-06 Thread Thomas Gleixner
On Wed, 6 Dec 2017, Maarten Lankhorst wrote:
> Op 06-12-17 om 13:15 schreef Michal Hocko:
> >
> > "No irq handler" part looks a bit scary (maybe related to lost affinity
> > messages?) but the following messages look quite as well. Is this
> > something known? The system seems to be up and running without any
> > visible issues.
> 
> Another reproducer for https://bugzilla.kernel.org/show_bug.cgi?id=198033 ?
> Symptoms are similar..

Well, the spurious interrupt is one thing, but you obviously lose
interrupts for some reason.

Did you ever manage to get the data out which I asked for?

Thanks,

tglx


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-06 Thread Maarten Lankhorst
Op 06-12-17 om 13:15 schreef Michal Hocko:
> On Mon 04-12-17 14:36:20, Linus Torvalds wrote:
>> On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki  wrote:
>>> So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
>>> systems I have tested, so it is probably safe to assume it to be
>>> broken everywhere.
>> Oh, it's definitely not broken everywhere, because I use it myself,
>> and was traveling last week due to my mom's bday.
>>
>> HOWEVER.
>>
>> Some of the x86 work seems to have broken it for some configurations.
>> In particular, do you have a big "everything enabled" kernel config -
>> particularly lockdep and irqflags tracing enabled?
>>
>> Andy has a patch, but it hasn't made it to me yet (probably because
>> the x86 people are very busy with the kaiser work):
>>
>> https://lkml.org/lkml/2017/11/30/546
>>
>> (also note his follow-up "fix the commit message" note, but that one
>> doesn't actually affect the code itself).
> merging tip/x86/urgent on top of your tree fixed this problem for me,
> but I am seeing something else
> [  131.711412] ACPI: Preparing to enter system sleep state S3
> [  131.755328] ACPI: EC: event blocked
> [  131.755328] ACPI: EC: EC stopped
> [  131.755328] PM: Saving platform NVS memory
> [  131.755344] Disabling non-boot CPUs ...
> [  131.779330] IRQ 124: no longer affine to CPU1
> [  131.780334] smpboot: CPU 1 is now offline
> [  131.804465] smpboot: CPU 2 is now offline
> [  131.827291] IRQ 122: no longer affine to CPU3
> [  131.827292] IRQ 123: no longer affine to CPU3
> [  131.828293] smpboot: CPU 3 is now offline
> [  131.830991] ACPI: Low-level resume complete
> [  131.831092] ACPI: EC: EC started
> [  131.831093] PM: Restoring platform NVS memory
> [  131.831864] do_IRQ: 0.55 No irq handler for vector
> [  131.831884] Enabling non-boot CPUs ...
> [  131.831909] x86: Booting SMP configuration:
> [  131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [  131.832913]  cache: parent cpu1 should not be sleeping
> [  131.833058] CPU1 is up
> [  131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
> [  131.833864]  cache: parent cpu2 should not be sleeping
> [  131.833983] CPU2 is up
> [  131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
> [  131.834776]  cache: parent cpu3 should not be sleeping
> [  131.834923] CPU3 is up
>
> "No irq handler" part looks a bit scary (maybe related to lost affinity
> messages?) but the following messages look quite as well. Is this
> something known? The system seems to be up and running without any
> visible issues.

Another reproducer for https://bugzilla.kernel.org/show_bug.cgi?id=198033 ?
Symptoms are similar..

~Maarten



Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-06 Thread Thomas Gleixner
On Wed, 6 Dec 2017, Michal Hocko wrote:
> merging tip/x86/urgent on top of your tree fixed this problem for me,
> but I am seeing something else
> [  131.711412] ACPI: Preparing to enter system sleep state S3
> [  131.755328] ACPI: EC: event blocked
> [  131.755328] ACPI: EC: EC stopped
> [  131.755328] PM: Saving platform NVS memory
> [  131.755344] Disabling non-boot CPUs ...
> [  131.779330] IRQ 124: no longer affine to CPU1
> [  131.780334] smpboot: CPU 1 is now offline
> [  131.804465] smpboot: CPU 2 is now offline
> [  131.827291] IRQ 122: no longer affine to CPU3
> [  131.827292] IRQ 123: no longer affine to CPU3
> [  131.828293] smpboot: CPU 3 is now offline
> [  131.830991] ACPI: Low-level resume complete
> [  131.831092] ACPI: EC: EC started
> [  131.831093] PM: Restoring platform NVS memory
> [  131.831864] do_IRQ: 0.55 No irq handler for vector

Hmm, that's really odd.

> [  131.831884] Enabling non-boot CPUs ...
> [  131.831909] x86: Booting SMP configuration:
> [  131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [  131.832913]  cache: parent cpu1 should not be sleeping

This is an old one. 

> [  131.833058] CPU1 is up
> [  131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
> [  131.833864]  cache: parent cpu2 should not be sleeping
> [  131.833983] CPU2 is up
> [  131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
> [  131.834776]  cache: parent cpu3 should not be sleeping
> [  131.834923] CPU3 is up
> 
> "No irq handler" part looks a bit scary (maybe related to lost affinity
> messages?) but the following messages look quite as well. Is this
> something known? The system seems to be up and running without any
> visible issues.

I assume it's due to the affinity break, just that we don't know right now
on which CPU that do_IRQ() message triggered. I assume it's CPU0 because
the others are offline already, but 

I'll think about it how we can figure out what's going on.

Thanks,

tglx


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-06 Thread Michal Hocko
On Mon 04-12-17 14:36:20, Linus Torvalds wrote:
> On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki  wrote:
> >
> > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > systems I have tested, so it is probably safe to assume it to be
> > broken everywhere.
> 
> Oh, it's definitely not broken everywhere, because I use it myself,
> and was traveling last week due to my mom's bday.
> 
> HOWEVER.
> 
> Some of the x86 work seems to have broken it for some configurations.
> In particular, do you have a big "everything enabled" kernel config -
> particularly lockdep and irqflags tracing enabled?
> 
> Andy has a patch, but it hasn't made it to me yet (probably because
> the x86 people are very busy with the kaiser work):
> 
> https://lkml.org/lkml/2017/11/30/546
> 
> (also note his follow-up "fix the commit message" note, but that one
> doesn't actually affect the code itself).

merging tip/x86/urgent on top of your tree fixed this problem for me,
but I am seeing something else
[  131.711412] ACPI: Preparing to enter system sleep state S3
[  131.755328] ACPI: EC: event blocked
[  131.755328] ACPI: EC: EC stopped
[  131.755328] PM: Saving platform NVS memory
[  131.755344] Disabling non-boot CPUs ...
[  131.779330] IRQ 124: no longer affine to CPU1
[  131.780334] smpboot: CPU 1 is now offline
[  131.804465] smpboot: CPU 2 is now offline
[  131.827291] IRQ 122: no longer affine to CPU3
[  131.827292] IRQ 123: no longer affine to CPU3
[  131.828293] smpboot: CPU 3 is now offline
[  131.830991] ACPI: Low-level resume complete
[  131.831092] ACPI: EC: EC started
[  131.831093] PM: Restoring platform NVS memory
[  131.831864] do_IRQ: 0.55 No irq handler for vector
[  131.831884] Enabling non-boot CPUs ...
[  131.831909] x86: Booting SMP configuration:
[  131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
[  131.832913]  cache: parent cpu1 should not be sleeping
[  131.833058] CPU1 is up
[  131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
[  131.833864]  cache: parent cpu2 should not be sleeping
[  131.833983] CPU2 is up
[  131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
[  131.834776]  cache: parent cpu3 should not be sleeping
[  131.834923] CPU3 is up

"No irq handler" part looks a bit scary (maybe related to lost affinity
messages?) but the following messages look quite as well. Is this
something known? The system seems to be up and running without any
visible issues.
-- 
Michal Hocko
SUSE Labs


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-04 Thread Rafael J. Wysocki
On Monday, December 4, 2017 11:41:06 PM CET Rafael J. Wysocki wrote:
> On Monday, December 4, 2017 11:38:54 PM CET Thomas Gleixner wrote:
> > On Mon, 4 Dec 2017, Linus Torvalds wrote:
> > 
> > > On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki  
> > > wrote:
> > > >
> > > > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > > > systems I have tested, so it is probably safe to assume it to be
> > > > broken everywhere.
> > > 
> > > Oh, it's definitely not broken everywhere, because I use it myself,
> > > and was traveling last week due to my mom's bday.
> > > 
> > > HOWEVER.
> > > 
> > > Some of the x86 work seems to have broken it for some configurations.
> > > In particular, do you have a big "everything enabled" kernel config -
> > > particularly lockdep and irqflags tracing enabled?
> > > 
> > > Andy has a patch, but it hasn't made it to me yet (probably because
> > > the x86 people are very busy with the kaiser work):
> 
> This definitely fixes the problem at least on one of the affected machines.

I can confirm that the Andy's patch fixes it on all systems that had this
issue here.

Thanks,
Rafael



Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-04 Thread Rafael J. Wysocki
On Monday, December 4, 2017 11:38:54 PM CET Thomas Gleixner wrote:
> On Mon, 4 Dec 2017, Linus Torvalds wrote:
> 
> > On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki  
> > wrote:
> > >
> > > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > > systems I have tested, so it is probably safe to assume it to be
> > > broken everywhere.
> > 
> > Oh, it's definitely not broken everywhere, because I use it myself,
> > and was traveling last week due to my mom's bday.
> > 
> > HOWEVER.
> > 
> > Some of the x86 work seems to have broken it for some configurations.
> > In particular, do you have a big "everything enabled" kernel config -
> > particularly lockdep and irqflags tracing enabled?
> > 
> > Andy has a patch, but it hasn't made it to me yet (probably because
> > the x86 people are very busy with the kaiser work):

This definitely fixes the problem at least on one of the affected machines.
 
> Picking it up right now.

Cool, thanks!

Rafael



Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-04 Thread Thomas Gleixner
On Mon, 4 Dec 2017, Linus Torvalds wrote:

> On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki  wrote:
> >
> > So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> > systems I have tested, so it is probably safe to assume it to be
> > broken everywhere.
> 
> Oh, it's definitely not broken everywhere, because I use it myself,
> and was traveling last week due to my mom's bday.
> 
> HOWEVER.
> 
> Some of the x86 work seems to have broken it for some configurations.
> In particular, do you have a big "everything enabled" kernel config -
> particularly lockdep and irqflags tracing enabled?
> 
> Andy has a patch, but it hasn't made it to me yet (probably because
> the x86 people are very busy with the kaiser work):

Picking it up right now.

Thanks,

tglx


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-04 Thread Linus Torvalds
On Mon, Dec 4, 2017 at 2:25 PM, Rafael J. Wysocki  wrote:
>
> So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
> systems I have tested, so it is probably safe to assume it to be
> broken everywhere.

Oh, it's definitely not broken everywhere, because I use it myself,
and was traveling last week due to my mom's bday.

HOWEVER.

Some of the x86 work seems to have broken it for some configurations.
In particular, do you have a big "everything enabled" kernel config -
particularly lockdep and irqflags tracing enabled?

Andy has a patch, but it hasn't made it to me yet (probably because
the x86 people are very busy with the kaiser work):

https://lkml.org/lkml/2017/11/30/546

(also note his follow-up "fix the commit message" note, but that one
doesn't actually affect the code itself).

Does that patch fix it for  you?

  Linus


Re: Linux 4.15-rc2: Regression in resume from ACPI S3

2017-12-04 Thread Rafael J. Wysocki
On Sunday, December 3, 2017 5:22:56 PM CET Linus Torvalds wrote:
> It's Sunday, but a few hours earlier than usual, since I'm on the east
> coast, three hours ahead of my normal release schedule.
> 
> It's a slightly bigger rc2 than I would have wished for, but this
> early in the release process I don't worry about it. The appended
> shortlog gives the details, it's fixes all over the place -
> architectures, drivers, filesystems, networking, core kernel.
> 
> One thing I'll point out is that I'm trying to get some kernel ASLR
> leaks plugged, and as part of that we now hash any pointers printed by
> "%p' by default. That won't affect a lot of people, but where it is a
> debugging problem (rather than leaking interesting kernel pointers),
> we will have to fix things up.
> 
> It can be a small annoyance, but the alternatives (trying to actually
> find all the cases where we might be leaking) were worse. But let's
> see if anybody even notices - a lot of the pointer printouts are stale
> debug information from when some driver was originally written, and
> aren't actually really interesting.
> 
> There will probably be some more leak fixes during this rc process,
> we'll see how that all sorts out.

So far, resume from suspend-to-RAM (ACPI S3) is broken on all of the
systems I have tested, so it is probably safe to assume it to be
broken everywhere.

I'm quite confident that this is not something that went in through the
PM tree, because I was running those changes on the systems that turn
out to be broken now.

It looks like the the ACPI waking vector mechanism stopped working, so
I'm suspecting some x86 changes having to do with virtual-to-physical
address mapping.

I've just started bisection.

Thanks,
Rafael



Linux 4.15-rc2

2017-12-03 Thread Linus Torvalds
   mm, thp: Do not make pmd/pud dirty without a reason
  mm/hugetlb: fix NULL-pointer dereference on 5-level paging machine

Laurent Pinchart (1):
  drm: omapdrm: Fix DPI on platforms using the DSI VDDS

Leo (Sunpeng) Li (3):
  drm/amd/display: Should disable when new stream is null
  drm/amd/display: Do DC mode-change check when adding CRTCs
  drm/amd/display: Do not put drm_atomic_state on resume

Leo Liu (1):
  drm/amdgpu: move UVD/VCE and VCN structure out from union

Linus Torvalds (6):
  Rename superblock flags (MS_xyz -> SB_xyz)
  proc: don't report kernel addresses in /proc//stack
  Revert "mm, thp: Do not make pmd/pud dirty without a reason"
  kallsyms: take advantage of the new '%px' format
  vsprintf: don't use 'restricted_pointer()' when not restricting
  Linux 4.15-rc2

Liran Alon (6):
  KVM: x86: pvclock: Handle first-time write to pvclock-page
contains random junk
  KVM: nVMX/nSVM: Don't intercept #UD when running L2
  KVM: x86: Exit to user-mode on #UD intercept when emulator requires
  KVM: x86: emulator: Return to user-mode on L1 CPL=0 emulation failure
  KVM: x86: Don't re-execute instruction when not passing CR2 value
  KVM: nVMX: Fix vmx_check_nested_events() return value in case an
event was reinjected to L2

Liu Bo (3):
  Btrfs: add write_flags for compression bio
  Btrfs: bail out gracefully rather than BUG_ON
  Btrfs: fix list_add corruption and soft lockups in fsync

Liu, Changcheng (1):
  scripts/faddr2line: extend usage on generic arch

Liviu Dudau (3):
  drm: hdlcd: Update PM code to save/restore console.
  drm: mali-dp: Separate static internal data into a read-only structure.
  drm: mali-dp: Disable planes when their CRTC gets disabled.

Lucas Stach (2):
  drm/atomic: make drm_atomic_helper_wait_for_vblanks more agressive
  drm/imx: always call wait_for_flip_done in commit_tail

Lv Zheng (1):
  ACPI / EC: Fix regression related to PM ops support in ECDT device

Maarten Lankhorst (2):
  drm/vblank: Pass crtc_id to page_flip_ioctl.
  drm/fb_helper: Disable all crtc's when initial setup fails.

Mahesh Salgaonkar (1):
  powerpc/powernv: Fix kexec crashes caused by tlbie tracing

Marcos Paulo de Souza (1):
  blktrace: Use blk_trace_bio_get_cgid inside blk_add_trace_bio

Mark Rutland (1):
  arm64: mm: cleanup stale AIVIVT references

Martin Schwidefsky (4):
  s390: fix alloc_pgste check in init_new_context again
  s390: sthyi: add SPDX identifiers to the remaining files
  s390: revert ELF_ET_DYN_BASE base changes
  s390/gs: add compat regset for the guarded storage broadcast control block

Max Gurtovoy (1):
  nvme-rdma: fix memory leak during queue allocation

Michael Ellerman (1):
  powerpc/kexec: Fix kexec/kdump in P9 guest kernels

Michael Lyle (1):
  bcache: check return value of register_shrinker

Michal Hocko (3):
  xfs: fortify xfs_alloc_buftarg error handling
  mm, memory_hotplug: do not back off draining pcp free pages from
kworker context
  Revert "mm/page-writeback.c: print a warning if the vm dirtiness
settings are illogical"

Michel Dänzer (2):
  drm/amdgpu: Set adev->vcn.irq.num_types for VCN
  drm/amdgpu: Use unsigned ring indices in amdgpu_queue_mgr_map

Mika Westerberg (1):
  net: thunderbolt: Stop using zero to mean no valid DMA mapping

Mike Kravetz (1):
  mm/cma: fix alloc_contig_range ret code/potential leak

Mike Maloney (1):
  packet: fix crash in fanout_demux_rollover()

Mikulas Patocka (1):
  block: remove useless assignment in bio_split

Minwoo Im (2):
  nvme-pci: avoid hmb desc array idx out-of-bound when hmmaxd set.
  nvme-pci: fix NULL pointer dereference in nvme_free_host_mem()

Mirza Krak (1):
  drm/rockchip: dw-mipi-dsi: fix possible un-balanced runtime PM enable

Nadav Amit (1):
  fs/hugetlbfs/inode.c: change put_page/unlock_page order in
hugetlbfs_fallocate()

Naofumi Honda (1):
  nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat

Nikita Leshenko (5):
  KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race
  KVM: x86: ioapic: Don't fire level irq when Remote IRR set
  KVM: x86: ioapic: Remove redundant check for Remote IRR in ioapic_set_irq
  KVM: x86: ioapic: Clear Remote IRR when entry is switched to
edge-triggered
  KVM: x86: ioapic: Preserve read-only values in the redirection table

Nikolay Borisov (1):
  btrfs: Fix transaction abort during failure in btrfs_rm_dev_item

OGAWA Hirofumi (1):
  fs/fat/inode.c: fix sb_rdonly() change

Oded Gabbay (2):
  microblaze: add missing include to mmu_context_mm.h
  drm/radeon: remove init of CIK VMIDs 8-16 for amdkfd

Olof Johansson (8):
  RISC-V: use generic serial.h
  RISC-V: use RISCV_{INT,SHORT} instead of {INT,SHORT} for asm macros
  RISC-V: io.h: type fixes for warnings
  RISC-V: mov

[GIT PULL] KVM fixes for Linux 4.15-rc2

2017-11-30 Thread Paolo Bonzini
Linus,

The following changes since commit d02fcf50779ec9d8eb7a81473fd76efe3f04b3a5:

  kvm: vmx: Allow disabling virtual NMI support (2017-11-17 13:20:07 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

for you to fetch changes up to a63dd7480d8f3c2b52e1e2bcab83e3e64c7c61a1:

  Merge tag 'kvm-ppc-fixes-4.15-1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-master 
(2017-11-27 17:54:13 +0100)



* x86 bugfixes: APIC, nested virtualization, IOAPIC
* PPC bugfix: HPT guests on a POWER9 radix host


David Hildenbrand (1):
  KVM: x86: fix em_fxstor() sleeping while in atomic

Dr. David Alan Gilbert (2):
  KVM: lapic: Split out x2apic ldr calculation
  KVM: lapic: Fixup LDR on load in x2apic

Eyal Moscovici (1):
  KVM: x86: Allow suppressing prints on RDMSR/WRMSR of unhandled MSRs

Jan H. Schönherr (1):
  KVM: Let KVM_SET_SIGNAL_MASK work as advertised

Janakarajan Natarajan (1):
  KVM: x86: Fix CPUID function for word 6 (8001_ECX)

Liran Alon (6):
  KVM: x86: pvclock: Handle first-time write to pvclock-page contains 
random junk
  KVM: nVMX/nSVM: Don't intercept #UD when running L2
  KVM: x86: Exit to user-mode on #UD intercept when emulator requires
  KVM: x86: emulator: Return to user-mode on L1 CPL=0 emulation failure
  KVM: x86: Don't re-execute instruction when not passing CR2 value
  KVM: nVMX: Fix vmx_check_nested_events() return value in case an event 
was reinjected to L2

Nikita Leshenko (5):
  KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race
  KVM: x86: ioapic: Don't fire level irq when Remote IRR set
  KVM: x86: ioapic: Remove redundant check for Remote IRR in ioapic_set_irq
  KVM: x86: ioapic: Clear Remote IRR when entry is switched to 
edge-triggered
  KVM: x86: ioapic: Preserve read-only values in the redirection table

Paolo Bonzini (3):
  KVM: x86: inject exceptions produced by x86_decode_insn
  KVM: vmx: use X86_CR4_UMIP and X86_FEATURE_UMIP
  Merge tag 'kvm-ppc-fixes-4.15-1' of 
git://git.kernel.org/.../paulus/powerpc into kvm-master

Paul Mackerras (1):
  KVM: PPC: Book3S HV: Fix migration and HPT resizing of HPT guests on 
radix hosts

Wanpeng Li (6):
  KVM: X86: Fix operand/address-size during instruction decoding
  KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry
  KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure
  KVM: X86: Fix softlockup when get the current kvmclock
  KVM: VMX: Fix rflags cache during vCPU reset
  KVM: VMX: Fix vmx->nested freeing when no SMI handler

 arch/mips/kvm/mips.c|  7 +---
 arch/powerpc/include/asm/kvm_ppc.h  |  1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 37 ---
 arch/powerpc/kvm/book3s_hv.c|  3 +-
 arch/powerpc/kvm/powerpc.c  |  7 +---
 arch/s390/kvm/kvm-s390.c|  7 +---
 arch/x86/include/asm/kvm_host.h |  3 +-
 arch/x86/kvm/cpuid.h|  2 +-
 arch/x86/kvm/emulate.c  | 39 +---
 arch/x86/kvm/ioapic.c   | 34 -
 arch/x86/kvm/lapic.c| 12 +-
 arch/x86/kvm/svm.c  | 11 +-
 arch/x86/kvm/vmx.c  | 73 ++---
 arch/x86/kvm/x86.c  | 42 ++---
 include/linux/kvm_host.h|  3 ++
 virt/kvm/arm/arm.c  |  8 ++--
 virt/kvm/kvm_main.c | 23 
 17 files changed, 214 insertions(+), 98 deletions(-)