Re: [regression 3.15-rc3] Resume from s4 broken by 1f81b6d22a5980955b01e08cf27fb745dc9b686f

2014-05-09 Thread Mathias Nyman



> > I can't see how this relates to Julius patch though, and I'm not sure yet why it
> > only triggers when devices are connected to SS ports. Maybe just unlucky timing?
>
> I think the non-SS ports are connected to the EHCI controllers rather
> than the XHCI controllers. So that explains at least one detail. And I
> guess timing is as good an excuse as any why this gets exposed by the
> patch in question.

That's right, sometimes I forget that there exists something else than xHCI.

> > Does this help?:
>
> Indeed it does. The machine just survived a dozen or so suspend+resume
> cycles without a hitch. The bug was 100% reproducible on this machine,
> so the fix seems solid.
>
> Tested-by: Ville Syrjälä <ville.syrj...@linux.intel.com>



Great, a patch with your Tested-by tag pushed to my tree at:

git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git for-usb-linus

-Mathias


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [regression 3.15-rc3] Resume from s4 broken by 1f81b6d22a5980955b01e08cf27fb745dc9b686f

2014-05-07 Thread Ville Syrjälä
On Wed, May 07, 2014 at 04:48:48PM +0300, Mathias Nyman wrote:
> On 05/06/2014 02:41 PM, Ville Syrjälä wrote:
> > On Mon, May 05, 2014 at 12:32:22PM -0700, Julius Werner wrote:
> >> Hmmm... very odd. I unfortunately don't have a machine that can easily
> >> do S4 at hand, but I did test this on an IVB with XHCI_RESET_ON_RESUME
> >> in S3 (essentially the same code path), and I didn't run into any
> >> problems.
> >>
> >> How exactly does your machine fail on resume? Is it a kernel crash or
> >> just a hang? Can you try getting some debug output (by setting 'echo N
> >>> /sys/module/printk/parameters/console_suspend' and trying to catch
> >> the crash on the screen or a serial line, or maybe through pstore)? I
> >> really don't see much that could go wrong with this patch, so without
> >> more info it will be hard to understand your problem.
> >>
> >> Also, I noticed that you have two HID devices plugged in during
> >> suspend. Does it make a difference if you have different devices (e.g.
> >> a mass storage stick) or none at all?
> >
> > Looks like it doesn't like it when there's anything plugged into the
> > "SS" ports. I tried with just a HID keyboard or with just a hub. In
> > both cases it fails to resume. If I have nothing connected to the "SS"
> > ports then it resumes just fine.
> >
> > I managed to catch something with ramoops. Looks like it's hitting
> > POISON_FREE when trying to delete some list entry.
> >
> 
> > <4>[  107.047230] xhci_hcd :00:14.0: Slot 1 endpoint 2 not removed from BW list!
> > <4>[  107.047574] general protection fault:  [#1] PREEMPT SMP
> 
> I took a look at the xhci_mem_cleanup() function and to me it looks
> like it tries to access a list_head that is already freed.
> 
> The struct list_head xhci->devs[].eps[].bw_endpoint_list is added to an
> endpoint list in xhci->rh_bw[].bw_table.interval_bw[].endpoints
> 
> xhci_mem_cleanup() frees all devices (the allocated xhci->devs[], containing
> the bw_endpoint_list) before it starts to loop through, and delete entries
> from the xhci->rh_bw[].bw_table.interval_bw[].endpoints list.
> 
> I can't see how this relates to Julius patch though, and I'm not sure yet why
> it only triggers when devices are connected to SS ports. Maybe just unlucky
> timing?

I think the non-SS ports are connected to the EHCI controllers rather
than the XHCI controllers. So that explains at least one detail. And I
guess timing is as good an excuse as any why this gets exposed by the
patch in question.

> 
> Does this help?:

Indeed it does. The machine just survived a dozen or so suspend+resume
cycles without a hitch. The bug was 100% reproducible on this machine,
so the fix seems solid.

Tested-by: Ville Syrjälä <ville.syrj...@linux.intel.com>

> 
> diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
> index c089668..b1a8a5f 100644
> --- a/drivers/usb/host/xhci-mem.c
> +++ b/drivers/usb/host/xhci-mem.c
> @@ -1822,6 +1822,16 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
>  		kfree(cur_cd);
>  	}
>  
> +	num_ports = HCS_MAX_PORTS(xhci->hcs_params1);
> +	for (i = 0; i < num_ports; i++) {
> +		struct xhci_interval_bw_table *bwt = &xhci->rh_bw[i].bw_table;
> +		for (j = 0; j < XHCI_MAX_INTERVAL; j++) {
> +			struct list_head *ep = &bwt->interval_bw[j].endpoints;
> +			while (!list_empty(ep))
> +				list_del_init(ep->next);
> +		}
> +	}
> +
>  	for (i = 1; i < MAX_HC_SLOTS; ++i)
>  		xhci_free_virt_device(xhci, i);
>  
> @@ -1857,16 +1867,6 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
>  	if (!xhci->rh_bw)
>  		goto no_bw;
>  
> -	num_ports = HCS_MAX_PORTS(xhci->hcs_params1);
> -	for (i = 0; i < num_ports; i++) {
> -		struct xhci_interval_bw_table *bwt = &xhci->rh_bw[i].bw_table;
> -		for (j = 0; j < XHCI_MAX_INTERVAL; j++) {
> -			struct list_head *ep = &bwt->interval_bw[j].endpoints;
> -			while (!list_empty(ep))
> -				list_del_init(ep->next);
> -		}
> -	}
> -
>  	for (i = 0; i < num_ports; i++) {
>  		struct xhci_tt_bw_info *tt, *n;
>  		list_for_each_entry_safe(tt, n, &xhci->rh_bw[i].tts, tt_list) {

-- 
Ville Syrjälä
Intel OTC


Re: [regression 3.15-rc3] Resume from s4 broken by 1f81b6d22a5980955b01e08cf27fb745dc9b686f

2014-05-07 Thread Mathias Nyman

On 05/06/2014 02:41 PM, Ville Syrjälä wrote:
> On Mon, May 05, 2014 at 12:32:22PM -0700, Julius Werner wrote:
> > Hmmm... very odd. I unfortunately don't have a machine that can easily
> > do S4 at hand, but I did test this on an IVB with XHCI_RESET_ON_RESUME
> > in S3 (essentially the same code path), and I didn't run into any
> > problems.
> >
> > How exactly does your machine fail on resume? Is it a kernel crash or
> > just a hang? Can you try getting some debug output (by setting 'echo N
> > > /sys/module/printk/parameters/console_suspend' and trying to catch
> > the crash on the screen or a serial line, or maybe through pstore)? I
> > really don't see much that could go wrong with this patch, so without
> > more info it will be hard to understand your problem.
> >
> > Also, I noticed that you have two HID devices plugged in during
> > suspend. Does it make a difference if you have different devices (e.g.
> > a mass storage stick) or none at all?
>
> Looks like it doesn't like it when there's anything plugged into the
> "SS" ports. I tried with just a HID keyboard or with just a hub. In
> both cases it fails to resume. If I have nothing connected to the "SS"
> ports then it resumes just fine.
>
> I managed to catch something with ramoops. Looks like it's hitting
> POISON_FREE when trying to delete some list entry.
>
> <4>[  107.047230] xhci_hcd :00:14.0: Slot 1 endpoint 2 not removed from BW list!
> <4>[  107.047574] general protection fault:  [#1] PREEMPT SMP


I took a look at the xhci_mem_cleanup() function and to me it looks
like it tries to access a list_head that is already freed.

The struct list_head xhci->devs[].eps[].bw_endpoint_list is added to an endpoint 
list in xhci->rh_bw[].bw_table.interval_bw[].endpoints


xhci_mem_cleanup() frees all devices (the allocated xhci->devs[], containing the 
bw_endpoint_list) before it starts to loop through, and delete entries from the 
xhci->rh_bw[].bw_table.interval_bw[].endpoints list.
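
To make the ordering problem described above concrete, here is a minimal userspace sketch in C. It is not the xhci driver code: struct demo_ep, struct demo_bw_table and demo_cleanup() are hypothetical stand-ins, and the list helpers are simplified re-implementations of the kernel's list_head API so that the snippet compiles outside the kernel. What it illustrates is that a list node embedded in an object has to be unlinked before the object is freed, otherwise the list head keeps pointing into freed memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified userspace stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink an entry and leave it pointing at itself, like list_del_init(). */
static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	init_list_head(entry);
}

/* Hypothetical endpoint object that embeds its list node. */
struct demo_ep {
	int id;
	struct list_head bw_endpoint_list;
};

/* Hypothetical table that only holds the head of the endpoint list. */
struct demo_bw_table {
	struct list_head endpoints;
};

/*
 * Safe teardown order: drain the table first so it no longer points into
 * the endpoint objects, then free the objects.
 * Returns the number of entries that were unlinked.
 */
static int demo_cleanup(struct demo_bw_table *bwt, struct demo_ep **eps, int n)
{
	int unlinked = 0;

	while (!list_empty(&bwt->endpoints)) {
		list_del_init(bwt->endpoints.next);
		unlinked++;
	}
	/* Invariant: nothing in the table references the objects any more. */
	assert(list_empty(&bwt->endpoints));

	for (int i = 0; i < n; i++) {
		free(eps[i]);
		eps[i] = NULL;
	}
	return unlinked;
}
```

Swapping the two steps in demo_cleanup() (free first, drain afterwards) would make list_del_init() read and write next/prev pointers that live inside already freed objects, which is the pattern the analysis above attributes to xhci_mem_cleanup().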


I can't see how this relates to Julius patch though, and I'm not sure yet why it 
only triggers when devices are connected to SS ports. Maybe just unlucky timing?


Does this help?:

diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index c089668..b1a8a5f 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -1822,6 +1822,16 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
 		kfree(cur_cd);
 	}
 
+	num_ports = HCS_MAX_PORTS(xhci->hcs_params1);
+	for (i = 0; i < num_ports; i++) {
+		struct xhci_interval_bw_table *bwt = &xhci->rh_bw[i].bw_table;
+		for (j = 0; j < XHCI_MAX_INTERVAL; j++) {
+			struct list_head *ep = &bwt->interval_bw[j].endpoints;
+			while (!list_empty(ep))
+				list_del_init(ep->next);
+		}
+	}
+
 	for (i = 1; i < MAX_HC_SLOTS; ++i)
 		xhci_free_virt_device(xhci, i);
 
@@ -1857,16 +1867,6 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
 	if (!xhci->rh_bw)
 		goto no_bw;
 
-	num_ports = HCS_MAX_PORTS(xhci->hcs_params1);
-	for (i = 0; i < num_ports; i++) {
-		struct xhci_interval_bw_table *bwt = &xhci->rh_bw[i].bw_table;
-		for (j = 0; j < XHCI_MAX_INTERVAL; j++) {
-			struct list_head *ep = &bwt->interval_bw[j].endpoints;
-			while (!list_empty(ep))
-				list_del_init(ep->next);
-		}
-	}
-
 	for (i = 0; i < num_ports; i++) {
 		struct xhci_tt_bw_info *tt, *n;
 		list_for_each_entry_safe(tt, n, &xhci->rh_bw[i].tts, tt_list) {



Re: [regression 3.15-rc3] Resume from s4 broken by 1f81b6d22a5980955b01e08cf27fb745dc9b686f

2014-05-06 Thread Ville Syrjälä
On Mon, May 05, 2014 at 12:32:22PM -0700, Julius Werner wrote:
> Hmmm... very odd. I unfortunately don't have a machine that can easily
> do S4 at hand, but I did test this on an IVB with XHCI_RESET_ON_RESUME
> in S3 (essentially the same code path), and I didn't run into any
> problems.
> 
> How exactly does your machine fail on resume? Is it a kernel crash or
> just a hang? Can you try getting some debug output (by setting 'echo N
> > /sys/module/printk/parameters/console_suspend' and trying to catch
> the crash on the screen or a serial line, or maybe through pstore)? I
> really don't see much that could go wrong with this patch, so without
> more info it will be hard to understand your problem.
> 
> Also, I noticed that you have two HID devices plugged in during
> suspend. Does it make a difference if you have different devices (e.g.
> a mass storage stick) or none at all?

Looks like it doesn't like it when there's anything plugged into the
"SS" ports. I tried with just a HID keyboard or with just a hub. In
both cases it fails to resume. If I have nothing connected to the "SS"
ports then it resumes just fine.

I managed to catch something with ramoops. Looks like it's hitting
POISON_FREE when trying to delete some list entry.

Oops#1 Part1
<4>[  106.321876]  [] ? kthread_create_on_node+0x210/0x210
<4>[  106.321878]  [] ret_from_fork+0x7c/0xb0
<4>[  106.321879]  [] ? kthread_create_on_node+0x210/0x210
<4>[  106.321879] ---[ end trace f5b8b9411bd5e24b ]---
<6>[  106.719552] PM: freeze of devices complete after 513.577 msecs
<6>[  106.720978] PM: late freeze of devices complete after 1.377 msecs
<6>[  106.723388] PM: noirq freeze of devices complete after 2.378 msecs
<6>[  106.723795] ACPI: Preparing to enter system sleep state S4
<6>[  106.727934] PM: Saving platform NVS memory
<4>[  106.740582] Disabling non-boot CPUs ...
<6>[  106.743252] kvm: disabling virtualization on CPU1
<6>[  106.743332] smpboot: CPU 1 is now offline
<6>[  106.750476] kvm: disabling virtualization on CPU2
<6>[  106.750518] smpboot: CPU 2 is now offline
<6>[  106.754634] kvm: disabling virtualization on CPU3
<6>[  106.754682] smpboot: CPU 3 is now offline
<6>[  106.758510] kvm: disabling virtualization on CPU4
<6>[  106.758817] smpboot: CPU 4 is now offline
<6>[  106.761210] kvm: disabling virtualization on CPU5
<6>[  106.761253] smpboot: CPU 5 is now offline
<6>[  106.763567] kvm: disabling virtualization on CPU6
<6>[  106.763596] smpboot: CPU 6 is now offline
<6>[  106.765906] kvm: disabling virtualization on CPU7
<6>[  106.765943] smpboot: CPU 7 is now offline
<6>[  106.766958] PM: Creating hibernation image:
<6>[  106.786249] PM: Need to copy 73589 pages
<6>[  106.768456] PM: Restoring platform NVS memory
<6>[  106.769104] microcode: CPU0 sig=0x306a9, pf=0x2, revision=0x19
<6>[  106.770518] Enabling non-boot CPUs ...
<6>[  106.771473] x86: Booting SMP configuration:
<6>[  106.771536] smpboot: Booting Node 0 Processor 1 APIC 0x2
<6>[  106.783221] CPU1 microcode updated early to revision 0x19, date = 2013-06-13
<6>[  106.783921] kvm: enabling virtualization on CPU1
<6>[  106.788131] microcode: CPU1 sig=0x306a9, pf=0x2, revision=0x19
<6>[  106.794579] CPU1 is up
<6>[  106.795048] smpboot: Booting Node 0 Processor 2 APIC 0x4
<6>[  106.806241] CPU2 microcode updated early to revision 0x19, date = 2013-06-13
<6>[  106.806963] kvm: enabling virtualization on CPU2
<6>[  106.811056] microcode: CPU2 sig=0x306a9, pf=0x2, revision=0x19
<6>[  106.817512] CPU2 is up
<6>[  106.817999] smpboot: Booting Node 0 Processor 3 APIC 0x6
<6>[  106.829157] CPU3 microcode updated early to revision 0x19, date = 2013-06-13
<6>[  106.829918] kvm: enabling virtualization on CPU3
<6>[  106.834104] microcode: CPU3 sig=0x306a9, pf=0x2, revision=0x19
<6>[  106.840666] CPU3 is up
<6>[  106.841118] smpboot: Booting Node 0 Processor 4 APIC 0x1
<6>[  106.852238] CPU4 microcode updated early to revision 0x19, date = 2013-06-13
<6>[  106.853485] kvm: enabling virtualization on CPU4
<6>[  106.857868] microcode: CPU4 sig=0x306a9, pf=0x2, revision=0x19
<6>[  106.864443] CPU4 is up
<6>[  106.864911] smpboot: Booting Node 0 Processor 5 APIC 0x3
<6>[  106.876633] kvm: enabling virtualization on CPU5
<6>[  106.881188] microcode: CPU5 sig=0x306a9, pf=0x2, revision=0x19
<6>[  106.887793] CPU5 is up
<6>[  106.888264] smpboot: Booting Node 0 Processor 6 APIC 0x5
<6>[  106.96] kvm: enabling virtualization on CPU6
<6>[  106.904526] microcode: CPU6 sig=0x306a9, pf=0x2, revision=0x19
<6>[  106.911141] CPU6 is up
<6>[  106.911605] smpboot: Booting Node 0 Processor 7 APIC 0x7
<6>[  106.923408] kvm: enabling virtualization on CPU7
<6>[  106.928161] microcode: CPU7 sig=0x306a9, pf=0x2, revision=0x19
<6>[  106.934883] CPU7 is up
<6>[  106.957959] ACPI: Waking up from system sleep state S4
<6>[  106.990680] PM: noirq restore of devices complete after 11.474 msecs
<6>[  106.993975] PM: early restore of devices complete after 3.024 msecs
<4>[  107.046519] usb usb3: root hub lost power or was reset
<4>[  107.046549] usb usb1: root hub lost power or was reset
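
A note on why freed list memory can surface as a general protection fault, sketched in C under stated assumptions. With slab poisoning enabled, the kernel fills freed objects with the byte POISON_FREE (0x6b, from include/linux/poison.h), so a next pointer read out of a freed list_head would be 0x6b6b6b6b6b6b6b6b on a 64-bit machine. On x86-64 with 48-bit virtual addresses that value is non-canonical, and dereferencing a non-canonical address raises a general protection fault rather than a page fault. The helper names below are hypothetical, and whether this is exactly what happened on the machine in this report is an assumption, not something the log proves.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same value as POISON_FREE in include/linux/poison.h. */
#define DEMO_POISON_FREE 0x6b

/* Build the 64-bit value a pointer field holds after being poisoned. */
static uint64_t demo_poison_pattern(void)
{
	uint64_t raw;

	memset(&raw, DEMO_POISON_FREE, sizeof(raw));
	return raw;
}

/*
 * Assumes x86-64 with 48-bit virtual addresses: an address is canonical
 * only if bits 63..47 are all zeros or all ones.
 */
static int demo_is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the upper 17 bits */

	return top == 0 || top == 0x1ffff;
}

/* Self-check: the poison pattern is not a canonical address. */
static void demo_self_check(void)
{
	assert(!demo_is_canonical_48(demo_poison_pattern()));
}
```

Calling demo_self_check() passes silently, which matches the reading that a list_del on a poisoned entry would trip over a non-canonical pointer.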

Re: [regression 3.15-rc3] Resume from s4 broken by 1f81b6d22a5980955b01e08cf27fb745dc9b686f

2014-05-05 Thread Julius Werner
Hmmm... very odd. I unfortunately don't have a machine that can easily
do S4 at hand, but I did test this on an IVB with XHCI_RESET_ON_RESUME
in S3 (essentially the same code path), and I didn't run into any
problems.

How exactly does your machine fail on resume? Is it a kernel crash or
just a hang? Can you try getting some debug output (by setting 'echo N
> /sys/module/printk/parameters/console_suspend' and trying to catch
the crash on the screen or a serial line, or maybe through pstore)? I
really don't see much that could go wrong with this patch, so without
more info it will be hard to understand your problem.

Also, I noticed that you have two HID devices plugged in during
suspend. Does it make a difference if you have different devices (e.g.
a mass storage stick) or none at all?

