Re: [regression 3.15-rc3] Resume from s4 broken by 1f81b6d22a5980955b01e08cf27fb745dc9b686f
> > I can't see how this relates to Julius patch though, and I'm not sure yet
> > why it only triggers when devices are connected to SS ports. Maybe just
> > unlucky timing?
>
> I think the non-SS ports are connected to the EHCI controllers rather
> than the XHCI controllers. So that explains at least one detail. And I
> guess timing is as good an excuse as any why this gets exposed by the
> patch in question.

That's right, sometimes I forget that there exists something else than xHCI.

> > Does this help?:
>
> Indeed it does. The machine just survived a dozen or so suspend+resume
> cycles without a hitch. The bug was 100% reproducible on this machine,
> so the fix seems solid.
>
> Tested-by: Ville Syrjälä <ville.syrj...@linux.intel.com>

Great, a patch with your Tested-by tag pushed to my tree at:

git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git for-usb-linus

-Mathias
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [regression 3.15-rc3] Resume from s4 broken by 1f81b6d22a5980955b01e08cf27fb745dc9b686f
On Wed, May 07, 2014 at 04:48:48PM +0300, Mathias Nyman wrote:
> On 05/06/2014 02:41 PM, Ville Syrjälä wrote:
> > On Mon, May 05, 2014 at 12:32:22PM -0700, Julius Werner wrote:
> >> Hmmm... very odd. I unfortunately don't have a machine that can easily
> >> do S4 at hand, but I did test this on an IVB with XHCI_RESET_ON_RESUME
> >> in S3 (essentially the same code path), and I didn't run into any
> >> problems.
> >>
> >> How exactly does your machine fail on resume? Is it a kernel crash or
> >> just a hang? Can you try getting some debug output (by setting
> >> 'echo N > /sys/module/printk/parameters/console_suspend' and trying to
> >> catch the crash on the screen or a serial line, or maybe through
> >> pstore)? I really don't see much that could go wrong with this patch,
> >> so without more info it will be hard to understand your problem.
> >>
> >> Also, I noticed that you have two HID devices plugged in during
> >> suspend. Does it make a difference if you have different devices (e.g.
> >> a mass storage stick) or none at all?
> >
> > Looks like it doesn't like it when there's anything plugged into the
> > "SS" ports. I tried with just a HID keyboard or with just a hub. In
> > both cases it fails to resume. If I have nothing connected to the "SS"
> > ports then it resumes just fine.
> >
> > I managed to catch something with ramoops. Looks like it's hitting
> > POISON_FREE when trying to delete some list entry.
> >
> > <4>[ 107.047230] xhci_hcd :00:14.0: Slot 1 endpoint 2 not removed from BW list!
> > <4>[ 107.047574] general protection fault: [#1] PREEMPT SMP
>
> I took a look at the xhci_mem_cleanup() function and to me it looks
> like it tries to access a list_head that is already freed.
>
> The struct list_head xhci->devs[].eps[].bw_endpoint_list is added to an
> endpoint list in xhci->rh_bw[].bw_table.interval_bw[].endpoints
>
> xhci_mem_cleanup() frees all devices (the allocated xhci->devs[],
> containing the bw_endpoint_list) before it starts to loop through, and
> delete entries from the xhci->rh_bw[].bw_table.interval_bw[].endpoints
> list.
>
> I can't see how this relates to Julius patch though, and I'm not sure yet
> why it only triggers when devices are connected to SS ports. Maybe just
> unlucky timing?

I think the non-SS ports are connected to the EHCI controllers rather
than the XHCI controllers. So that explains at least one detail. And I
guess timing is as good an excuse as any why this gets exposed by the
patch in question.

> Does this help?:

Indeed it does. The machine just survived a dozen or so suspend+resume
cycles without a hitch. The bug was 100% reproducible on this machine,
so the fix seems solid.

Tested-by: Ville Syrjälä <ville.syrj...@linux.intel.com>

> diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
> index c089668..b1a8a5f 100644
> --- a/drivers/usb/host/xhci-mem.c
> +++ b/drivers/usb/host/xhci-mem.c
> @@ -1822,6 +1822,16 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
>  		kfree(cur_cd);
>  	}
>  
> +	num_ports = HCS_MAX_PORTS(xhci->hcs_params1);
> +	for (i = 0; i < num_ports; i++) {
> +		struct xhci_interval_bw_table *bwt = &xhci->rh_bw[i].bw_table;
> +		for (j = 0; j < XHCI_MAX_INTERVAL; j++) {
> +			struct list_head *ep = &bwt->interval_bw[j].endpoints;
> +			while (!list_empty(ep))
> +				list_del_init(ep->next);
> +		}
> +	}
> +
>  	for (i = 1; i < MAX_HC_SLOTS; ++i)
>  		xhci_free_virt_device(xhci, i);
>  
> @@ -1857,16 +1867,6 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
>  	if (!xhci->rh_bw)
>  		goto no_bw;
>  
> -	num_ports = HCS_MAX_PORTS(xhci->hcs_params1);
> -	for (i = 0; i < num_ports; i++) {
> -		struct xhci_interval_bw_table *bwt = &xhci->rh_bw[i].bw_table;
> -		for (j = 0; j < XHCI_MAX_INTERVAL; j++) {
> -			struct list_head *ep = &bwt->interval_bw[j].endpoints;
> -			while (!list_empty(ep))
> -				list_del_init(ep->next);
> -		}
> -	}
> -
>  	for (i = 0; i < num_ports; i++) {
>  		struct xhci_tt_bw_info *tt, *n;
>  		list_for_each_entry_safe(tt, n, &xhci->rh_bw[i].tts, tt_list) {

-- 
Ville Syrjälä
Intel OTC
Re: [regression 3.15-rc3] Resume from s4 broken by 1f81b6d22a5980955b01e08cf27fb745dc9b686f
On 05/06/2014 02:41 PM, Ville Syrjälä wrote:
> On Mon, May 05, 2014 at 12:32:22PM -0700, Julius Werner wrote:
>> Hmmm... very odd. I unfortunately don't have a machine that can easily
>> do S4 at hand, but I did test this on an IVB with XHCI_RESET_ON_RESUME
>> in S3 (essentially the same code path), and I didn't run into any
>> problems.
>>
>> How exactly does your machine fail on resume? Is it a kernel crash or
>> just a hang? Can you try getting some debug output (by setting
>> 'echo N > /sys/module/printk/parameters/console_suspend' and trying to
>> catch the crash on the screen or a serial line, or maybe through
>> pstore)? I really don't see much that could go wrong with this patch,
>> so without more info it will be hard to understand your problem.
>>
>> Also, I noticed that you have two HID devices plugged in during
>> suspend. Does it make a difference if you have different devices (e.g.
>> a mass storage stick) or none at all?
>
> Looks like it doesn't like it when there's anything plugged into the
> "SS" ports. I tried with just a HID keyboard or with just a hub. In
> both cases it fails to resume. If I have nothing connected to the "SS"
> ports then it resumes just fine.
>
> I managed to catch something with ramoops. Looks like it's hitting
> POISON_FREE when trying to delete some list entry.
>
> <4>[ 107.047230] xhci_hcd :00:14.0: Slot 1 endpoint 2 not removed from BW list!
> <4>[ 107.047574] general protection fault: [#1] PREEMPT SMP

I took a look at the xhci_mem_cleanup() function and to me it looks
like it tries to access a list_head that is already freed.

The struct list_head xhci->devs[].eps[].bw_endpoint_list is added to an
endpoint list in xhci->rh_bw[].bw_table.interval_bw[].endpoints

xhci_mem_cleanup() frees all devices (the allocated xhci->devs[],
containing the bw_endpoint_list) before it starts to loop through, and
delete entries from the xhci->rh_bw[].bw_table.interval_bw[].endpoints
list.

I can't see how this relates to Julius patch though, and I'm not sure yet
why it only triggers when devices are connected to SS ports. Maybe just
unlucky timing?

Does this help?:

diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index c089668..b1a8a5f 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -1822,6 +1822,16 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
 		kfree(cur_cd);
 	}
 
+	num_ports = HCS_MAX_PORTS(xhci->hcs_params1);
+	for (i = 0; i < num_ports; i++) {
+		struct xhci_interval_bw_table *bwt = &xhci->rh_bw[i].bw_table;
+		for (j = 0; j < XHCI_MAX_INTERVAL; j++) {
+			struct list_head *ep = &bwt->interval_bw[j].endpoints;
+			while (!list_empty(ep))
+				list_del_init(ep->next);
+		}
+	}
+
 	for (i = 1; i < MAX_HC_SLOTS; ++i)
 		xhci_free_virt_device(xhci, i);
 
@@ -1857,16 +1867,6 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
 	if (!xhci->rh_bw)
 		goto no_bw;
 
-	num_ports = HCS_MAX_PORTS(xhci->hcs_params1);
-	for (i = 0; i < num_ports; i++) {
-		struct xhci_interval_bw_table *bwt = &xhci->rh_bw[i].bw_table;
-		for (j = 0; j < XHCI_MAX_INTERVAL; j++) {
-			struct list_head *ep = &bwt->interval_bw[j].endpoints;
-			while (!list_empty(ep))
-				list_del_init(ep->next);
-		}
-	}
-
 	for (i = 0; i < num_ports; i++) {
 		struct xhci_tt_bw_info *tt, *n;
 		list_for_each_entry_safe(tt, n, &xhci->rh_bw[i].tts, tt_list) {
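The ordering problem described in this message can be modelled outside the kernel. What follows is only a userspace sketch, not the driver code: `struct fake_ep`, `cleanup_in_safe_order()` and the small list helpers are hypothetical stand-ins written for illustration (the helpers mimic the kernel's `list_head` operations of the same names). It shows the order the patch moves to: unlink every node from the shared list while the memory that embeds it is still allocated, and only then free that memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	init_list_head(e);
}

/* Hypothetical stand-in for an endpoint that embeds the list node. */
struct fake_ep { struct list_head bw_endpoint_list; };

/* Returns the number of entries unlinked, or -1 if nothing could be set up. */
int cleanup_in_safe_order(int count)
{
	struct list_head endpoints;
	struct fake_ep **eps;
	int i, made, unlinked = 0;

	if (count <= 0)
		return 0;
	eps = calloc((size_t)count, sizeof(*eps));
	if (!eps)
		return -1;

	init_list_head(&endpoints);
	for (made = 0; made < count; made++) {
		eps[made] = malloc(sizeof(*eps[made]));
		if (!eps[made])
			break;
		list_add_tail(&eps[made]->bw_endpoint_list, &endpoints);
	}

	/* Step 1: drain the list while every node is still valid memory. */
	while (!list_empty(&endpoints)) {
		list_del_init(endpoints.next);
		unlinked++;
	}

	/* Step 2: only now free the objects that embed the nodes. */
	for (i = 0; i < made; i++)
		free(eps[i]);
	free(eps);
	return unlinked;
}
```

Swapping step 1 and step 2 would make `list_del_init()` read and write `next`/`prev` pointers inside freed objects, which is the kind of access the analysis above attributes to the pre-patch `xhci_mem_cleanup()`.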
Re: [regression 3.15-rc3] Resume from s4 broken by 1f81b6d22a5980955b01e08cf27fb745dc9b686f
On Mon, May 05, 2014 at 12:32:22PM -0700, Julius Werner wrote:
> Hmmm... very odd. I unfortunately don't have a machine that can easily
> do S4 at hand, but I did test this on an IVB with XHCI_RESET_ON_RESUME
> in S3 (essentially the same code path), and I didn't run into any
> problems.
>
> How exactly does your machine fail on resume? Is it a kernel crash or
> just a hang? Can you try getting some debug output (by setting
> 'echo N > /sys/module/printk/parameters/console_suspend' and trying to
> catch the crash on the screen or a serial line, or maybe through
> pstore)? I really don't see much that could go wrong with this patch,
> so without more info it will be hard to understand your problem.
>
> Also, I noticed that you have two HID devices plugged in during
> suspend. Does it make a difference if you have different devices (e.g.
> a mass storage stick) or none at all?

Looks like it doesn't like it when there's anything plugged into the
"SS" ports. I tried with just a HID keyboard or with just a hub. In
both cases it fails to resume. If I have nothing connected to the "SS"
ports then it resumes just fine.

I managed to catch something with ramoops. Looks like it's hitting
POISON_FREE when trying to delete some list entry.

Oops#1 Part1
<4>[ 106.321876] [8106bb10] ? kthread_create_on_node+0x210/0x210
<4>[ 106.321878] [8151522c] ret_from_fork+0x7c/0xb0
<4>[ 106.321879] [8106bb10] ? kthread_create_on_node+0x210/0x210
<4>[ 106.321879] ---[ end trace f5b8b9411bd5e24b ]---
<6>[ 106.719552] PM: freeze of devices complete after 513.577 msecs
<6>[ 106.720978] PM: late freeze of devices complete after 1.377 msecs
<6>[ 106.723388] PM: noirq freeze of devices complete after 2.378 msecs
<6>[ 106.723795] ACPI: Preparing to enter system sleep state S4
<6>[ 106.727934] PM: Saving platform NVS memory
<4>[ 106.740582] Disabling non-boot CPUs ...
<6>[ 106.743252] kvm: disabling virtualization on CPU1
<6>[ 106.743332] smpboot: CPU 1 is now offline
<6>[ 106.750476] kvm: disabling virtualization on CPU2
<6>[ 106.750518] smpboot: CPU 2 is now offline
<6>[ 106.754634] kvm: disabling virtualization on CPU3
<6>[ 106.754682] smpboot: CPU 3 is now offline
<6>[ 106.758510] kvm: disabling virtualization on CPU4
<6>[ 106.758817] smpboot: CPU 4 is now offline
<6>[ 106.761210] kvm: disabling virtualization on CPU5
<6>[ 106.761253] smpboot: CPU 5 is now offline
<6>[ 106.763567] kvm: disabling virtualization on CPU6
<6>[ 106.763596] smpboot: CPU 6 is now offline
<6>[ 106.765906] kvm: disabling virtualization on CPU7
<6>[ 106.765943] smpboot: CPU 7 is now offline
<6>[ 106.766958] PM: Creating hibernation image:
<6>[ 106.786249] PM: Need to copy 73589 pages
<6>[ 106.768456] PM: Restoring platform NVS memory
<6>[ 106.769104] microcode: CPU0 sig=0x306a9, pf=0x2, revision=0x19
<6>[ 106.770518] Enabling non-boot CPUs ...
<6>[ 106.771473] x86: Booting SMP configuration:
<6>[ 106.771536] smpboot: Booting Node 0 Processor 1 APIC 0x2
<6>[ 106.783221] CPU1 microcode updated early to revision 0x19, date = 2013-06-13
<6>[ 106.783921] kvm: enabling virtualization on CPU1
<6>[ 106.788131] microcode: CPU1 sig=0x306a9, pf=0x2, revision=0x19
<6>[ 106.794579] CPU1 is up
<6>[ 106.795048] smpboot: Booting Node 0 Processor 2 APIC 0x4
<6>[ 106.806241] CPU2 microcode updated early to revision 0x19, date = 2013-06-13
<6>[ 106.806963] kvm: enabling virtualization on CPU2
<6>[ 106.811056] microcode: CPU2 sig=0x306a9, pf=0x2, revision=0x19
<6>[ 106.817512] CPU2 is up
<6>[ 106.817999] smpboot: Booting Node 0 Processor 3 APIC 0x6
<6>[ 106.829157] CPU3 microcode updated early to revision 0x19, date = 2013-06-13
<6>[ 106.829918] kvm: enabling virtualization on CPU3
<6>[ 106.834104] microcode: CPU3 sig=0x306a9, pf=0x2, revision=0x19
<6>[ 106.840666] CPU3 is up
<6>[ 106.841118] smpboot: Booting Node 0 Processor 4 APIC 0x1
<6>[ 106.852238] CPU4 microcode updated early to revision 0x19, date = 2013-06-13
<6>[ 106.853485] kvm: enabling virtualization on CPU4
<6>[ 106.857868] microcode: CPU4 sig=0x306a9, pf=0x2, revision=0x19
<6>[ 106.864443] CPU4 is up
<6>[ 106.864911] smpboot: Booting Node 0 Processor 5 APIC 0x3
<6>[ 106.876633] kvm: enabling virtualization on CPU5
<6>[ 106.881188] microcode: CPU5 sig=0x306a9, pf=0x2, revision=0x19
<6>[ 106.887793] CPU5 is up
<6>[ 106.888264] smpboot: Booting Node 0 Processor 6 APIC 0x5
<6>[ 106.96] kvm: enabling virtualization on CPU6
<6>[ 106.904526] microcode: CPU6 sig=0x306a9, pf=0x2, revision=0x19
<6>[ 106.911141] CPU6 is up
<6>[ 106.911605] smpboot: Booting Node 0 Processor 7 APIC 0x7
<6>[ 106.923408] kvm: enabling virtualization on CPU7
<6>[ 106.928161] microcode: CPU7 sig=0x306a9, pf=0x2, revision=0x19
<6>[ 106.934883] CPU7 is up
<6>[ 106.957959] ACPI: Waking up from system sleep state S4
<6>[ 106.990680] PM: noirq restore of devices complete after 11.474 msecs
<6>[ 106.993975] PM: early restore of devices complete after 3.024 msecs
<4>[ 107.046519] usb usb3: root hub lost power or was reset
<4>[ 107.046549] usb usb1: root hub lost power or was reset
<4>[ 107.046694] usb usb4:
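For readers wondering why a stale list entry would show up as a general protection fault rather than an ordinary page fault, here is a small sketch. It rests on assumptions that the thread does not confirm: that slab poisoning was enabled on the test kernel, and that the machine uses 48-bit virtual addresses. `POISON_FREE` is 0x6b in the kernel's include/linux/poison.h; the two function names are made up for illustration. A pointer read back from freed, poisoned memory then consists of 0x6b bytes, which is not a canonical x86-64 address, and dereferencing a non-canonical address raises #GP.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte written over freed slab objects when poisoning is on (poison.h). */
#define POISON_FREE 0x6b

/* What a pointer-sized field reads as once its object has been poisoned. */
uint64_t poisoned_pointer_value(void)
{
	unsigned char raw[sizeof(uint64_t)];
	uint64_t v;

	memset(raw, POISON_FREE, sizeof(raw));
	memcpy(&v, raw, sizeof(v));
	return v;
}

/*
 * Canonical check, assuming 48-bit virtual addresses: bits 63..47 must be
 * all zeros or all ones.
 */
int is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the upper 17 bits */

	return top == 0 || top == 0x1ffff;
}
```

Under those assumptions, `poisoned_pointer_value()` yields 0x6b6b6b6b6b6b6b6b and `is_canonical_48()` rejects it, which would be consistent with the "general protection fault" line in the oops when the list code follows such a pointer.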
Re: [regression 3.15-rc3] Resume from s4 broken by 1f81b6d22a5980955b01e08cf27fb745dc9b686f
Hmmm... very odd. I unfortunately don't have a machine that can easily
do S4 at hand, but I did test this on an IVB with XHCI_RESET_ON_RESUME
in S3 (essentially the same code path), and I didn't run into any
problems.

How exactly does your machine fail on resume? Is it a kernel crash or
just a hang? Can you try getting some debug output (by setting
'echo N > /sys/module/printk/parameters/console_suspend' and trying to
catch the crash on the screen or a serial line, or maybe through
pstore)? I really don't see much that could go wrong with this patch,
so without more info it will be hard to understand your problem.

Also, I noticed that you have two HID devices plugged in during
suspend. Does it make a difference if you have different devices (e.g.
a mass storage stick) or none at all?