subject:"Machine crashes right *after* ~successful resume"

On Fri, Oct 31, 2014 at 5:00 PM, Wilmer van der Gaast  wrote:
> Hello,
>
> Patch #1 worked after a simple s/&&/)/. And patch #2 seems to fix the
> problem as well!

updated first #1.
---
 drivers/pci/pci.c |   18 ++
 1 file changed, 18 insertions(+)

Index: linux-2.6/drivers/pci/pci.c
===
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1265,6 +1265,24 @@ static void pci_enable_bridge(struct pci
 	pci_set_master(dev);
 }
 
+static void pci_enable_ite(struct pci_dev *dev)
+{
+	/*
+	 * FW enable the bridge already, so call pci_enable_bridge()
+	 * to keep enable_cnt consistent, then later we can go through
+	 * pci_pm_resume/pci_pm_reenable_device to enable it again.
+	 * --- for pci bridge without driver case.
+	 */
+	if (!pci_is_enabled(dev)) {
+		u16 cmd;
+
+		pci_read_config_word(dev, PCI_COMMAND, &cmd);
+		if (cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY))
+			pci_enable_bridge(dev);
+	}
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_ite);
+
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
 	struct pci_dev *bridge;
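
The earlier versions of this fixup wrote the mask as `PCI_COMMAND_IO || PCI_COMMAND_MEMORY`, while the updated patch uses `|`. A minimal userspace sketch of why that matters follows. The two helper names are hypothetical and exist only for this illustration; the bit values are the standard PCI command register bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* PCI command register bits (standard values). */
#define PCI_COMMAND_IO      0x1
#define PCI_COMMAND_MEMORY  0x2
#define PCI_COMMAND_MASTER  0x4

/* Hypothetical helper: logical OR collapses the mask to 1,
 * so only the I/O decode bit is actually tested. */
static bool decode_enabled_logical(uint16_t cmd)
{
	return cmd & (PCI_COMMAND_IO || PCI_COMMAND_MEMORY);
}

/* Hypothetical helper: bitwise OR builds the intended 0x3 mask. */
static bool decode_enabled_bitwise(uint16_t cmd)
{
	return cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY);
}
```

With the logical form, a bridge that has only memory decoding enabled would be treated as not enabled, which is probably not what the fixup intends.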

Re: Machine crashes right after ~successful resume


Hello,

Patch #1 worked after a simple s/&&/)/. And patch #2 seems to fix the 
problem as well!



Wilmer v/d Gaast.

--
+ .''`. - -- ---+  +- -- ---  - --+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---+  +-- -  --- -- -+
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Machine crashes right after ~successful resume

On Fri, Oct 31, 2014 at 2:22 PM, Yinghai Lu  wrote:
> On Fri, Oct 31, 2014 at 2:13 PM, Wilmer van der Gaast  
> wrote:
>> On 31-10-14 16:11, Yinghai Lu wrote:
>>>
>>>
>>> Good. Please check if attached one on top of 3.17 only would work too.
>>>
>> No luck, sadly. :-( Unsuccessful third resume.

Please try attached two patches separately on top of 3.17.
---
 drivers/pci/pci.c |   18 ++
 1 file changed, 18 insertions(+)

Index: linux-2.6/drivers/pci/pci.c
===
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1265,6 +1265,24 @@ static void pci_enable_bridge(struct pci
 	pci_set_master(dev);
 }
 
+static void pci_enable_ite(struct pci_dev *dev)
+{
+	/*
+	 * FW enable the bridge already, so call pci_enable_bridge()
+	 * to keep enable_cnt consistent, then later we can go through
+	 * pci_pm_resume/pci_pm_reenable_device to enable it again.
+	 * --- for pci bridge without driver case.
+	 */
+	if (!pci_is_enabled(dev)) {
+		u16 cmd;
+
+		pci_read_config_word(dev, PCI_COMMAND, &cmd);
+		if ((cmd & (PCI_COMMAND_IO || PCI_COMMAND_MEMORY)) &&
+			pci_enable_bridge(dev);
+	}
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_ite);
+
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
 	struct pci_dev *bridge;
---
 drivers/pci/pci-driver.c |9 +
 1 file changed, 9 insertions(+)

Index: linux-2.6/drivers/pci/pci-driver.c
===
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -519,8 +519,17 @@ static void pci_pm_set_unknown_state(str
  */
 static int pci_pm_reenable_device(struct pci_dev *pci_dev)
 {
+	u16 cmd;
 	int retval;
 
+	/* update enable_cnt according to cmd register */
+	pci_read_config_word(pci_dev, PCI_COMMAND, &cmd);
+	if (!pci_dev->is_busmaster && (cmd & PCI_COMMAND_MASTER))
+		pci_dev->is_busmaster = true;
+	if (!pci_is_enabled(pci_dev) &&
+	    (cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)))
+		atomic_inc(&pci_dev->enable_cnt);
+
 	/* if the device was enabled before suspend, reenable */
 	retval = pci_reenable_device(pci_dev);
 	/*

Re: Machine crashes right after ~successful resume

On Fri, Oct 31, 2014 at 2:13 PM, Wilmer van der Gaast  wrote:
> On 31-10-14 16:11, Yinghai Lu wrote:
>>
>>
>> Good. Please check if attached one on top of 3.17 only would work too.
>>
> No luck, sadly. :-( Unsuccessful third resume.
>
> I forgot to set up the serial console, would that still be useful?

never mind, let me go through suspend/resume code path again.

Re: Machine crashes right after ~successful resume


On 31-10-14 16:11, Yinghai Lu wrote:
> Good. Please check if attached one on top of 3.17 only would work too.

No luck, sadly. :-( Unsuccessful third resume.

I forgot to set up the serial console, would that still be useful?


Wilmer v/d Gaast.

--
+ .''`. - -- ---+  +- -- ---  - --+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---+  +-- -  --- -- -+

Re: Machine crashes right after ~successful resume

On Fri, Oct 31, 2014 at 2:39 AM, Wilmer van der Gaast  wrote:
> Hello Yinghai,
>
> On 31-10-14 02:13, Yinghai Lu wrote:
>>
>> Last try:
>>
>> Please check attached patch that will keep state consistent.
>
>
> Good news: This last patch worked! For good measure, I ran my test twice
> with a reboot in between. Worked consistently.
>
> And similarly, to ensure that your debugging-at-boottime-only patch wasn't
> just working by accident yesterday, I tested it twice more with the same
> effect.

Good. Please check if attached one on top of 3.17 only would work too.

Thanks

Yinghai
---
 drivers/pci/pci.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/pci.c
===
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1063,7 +1063,9 @@ static void pci_restore_config_space(str
 		pci_restore_config_space_range(pdev, 4, 9, 10);
 		pci_restore_config_space_range(pdev, 0, 3, 0);
 	} else {
-		pci_restore_config_space_range(pdev, 0, 15, 0);
+		/* Restore BARs before the command register. */
+		pci_restore_config_space_range(pdev, 4, 15, 0);
+		pci_restore_config_space_range(pdev, 0, 3, 0);
 	}
 }
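
The ordering the comment in this patch asks for (BARs written back before the command register) can be illustrated with a small userspace model. This is only a sketch: `struct restore_log`, `restore_range()` and `position_of()` are hypothetical names, and the model assumes that a range restore walks from the `end` dword down to the `start` dword. The command register lives in dword 1 and the BARs in dwords 4 to 9 of the standard header.

```c
#include <stddef.h>

#define CFG_DWORDS 16

/* Hypothetical log of the order in which config dwords are written back. */
struct restore_log {
	int order[2 * CFG_DWORDS];
	size_t count;
};

/* Model of one range restore.
 * ASSUMPTION: dwords are written from 'end' down to 'start'. */
static void restore_range(struct restore_log *log, int start, int end)
{
	for (int index = end; index >= start; index--)
		if (log->count < 2 * CFG_DWORDS)
			log->order[log->count++] = index;
}

/* Position of a dword in the log, or -1 if it was never written. */
static int position_of(const struct restore_log *log, int dword)
{
	for (size_t i = 0; i < log->count; i++)
		if (log->order[i] == dword)
			return (int)i;
	return -1;
}
```

Calling `restore_range(&log, 4, 15)` followed by `restore_range(&log, 0, 3)` mirrors the two calls in the patch; comparing `position_of(&log, 4)` with `position_of(&log, 1)` then shows whether a BAR dword was written before the command dword.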

Re: Machine crashes right after ~successful resume


Hello Yinghai,

On 31-10-14 02:13, Yinghai Lu wrote:
> Last try:
>
> Please check attached patch that will keep state consistent.

Good news: This last patch worked! For good measure, I ran my test twice
with a reboot in between. Worked consistently.

And similarly, to ensure that your debugging-at-boottime-only patch
wasn't just working by accident yesterday, I tested it twice more with
the same effect.



Thanks,

Wilmer van der Gaast.

--
+ .''`. - -- ---+  +- -- ---  - --+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---+  +-- -  --- -- -+

Re: Machine crashes right after ~successful resume

On Thu, Oct 30, 2014 at 5:43 PM, Yinghai Lu  wrote:
> On Thu, Oct 30, 2014 at 4:24 PM, Wilmer van der Gaast  
> wrote:
>>
>>
>> Same problem like this morning: Failure after the second resume already. :-(
>>
> can not find out any magic line in pci_enable_bridge that could cause
> the difference.
>
> so either use attached pcie_enable_bridge_ite.patch or just revert the
> commit 928bea9?

Last try:

Please check attached patch that will keep state consistent.

Thanks

Yinghai
---
 drivers/pci/pci.c |   20 
 1 file changed, 20 insertions(+)

Index: linux-2.6/drivers/pci/pci.c
===
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1264,6 +1264,26 @@ static void pci_enable_bridge(struct pci
 	pci_set_master(dev);
 }
 
+static void pci_enable_ite(struct pci_dev *dev)
+{
+	u16 cmd;
+
+	/*
+	 * FW enable the bridge already, so keep enable_cnt consistent,
+	 * then later we can go through pci_pm_resume/pci_pm_reenable_device
+	 * to enable it again.
+	 * --- for pci bridge without driver case.
+	 */
+	pci_read_config_word(dev, PCI_COMMAND, &cmd);
+	if (cmd & PCI_COMMAND_MASTER)
+		dev->is_busmaster = true;
+
+	if (cmd & (PCI_COMMAND_IO || PCI_COMMAND_MEMORY))
+		atomic_inc(&dev->enable_cnt);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x244e, pci_enable_ite);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_ite);
+
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
 	struct pci_dev *bridge;
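
The "keep state consistent" idea in this patch can be sketched with a toy userspace model. Everything here is hypothetical (`struct toy_dev`, `adopt_firmware_state()`, `resume_reenables()`), and it assumes that the resume path only reprograms a device whose enable count is non-zero, which is how the patch comment describes the `pci_pm_resume`/`pci_pm_reenable_device` path.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_COMMAND_IO      0x1
#define PCI_COMMAND_MEMORY  0x2
#define PCI_COMMAND_MASTER  0x4

/* Toy stand-in for the few struct pci_dev fields that matter here. */
struct toy_dev {
	uint16_t cmd;      /* what firmware left in PCI_COMMAND */
	int enable_cnt;    /* kernel-side enable reference count */
	bool is_busmaster; /* kernel-side bus-master flag */
};

/* Mirrors the fixup idea: copy firmware state into kernel bookkeeping. */
static void adopt_firmware_state(struct toy_dev *dev)
{
	if (dev->cmd & PCI_COMMAND_MASTER)
		dev->is_busmaster = true;
	if (dev->enable_cnt == 0 &&
	    (dev->cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)))
		dev->enable_cnt++;
}

/* ASSUMPTION: resume re-enables only devices with a non-zero count. */
static bool resume_reenables(const struct toy_dev *dev)
{
	return dev->enable_cnt > 0;
}
```

In this model a driverless bridge that firmware enabled starts with a count of zero and is skipped on resume; after `adopt_firmware_state()` it is picked up.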

Re: Machine crashes right after ~successful resume

On Thu, Oct 30, 2014 at 4:24 PM, Wilmer van der Gaast  wrote:
>
>
> Same problem like this morning: Failure after the second resume already. :-(
>
can not find out any magic line in pci_enable_bridge that could cause
the difference.

so either use attached pcie_enable_bridge_ite.patch or just revert the
commit 928bea9?

Bjorn, please check which one that you want to go on.

Thanks

Yinghai
---
 drivers/pci/pci.c |6 ++
 1 file changed, 6 insertions(+)

Index: linux-2.6/drivers/pci/pci.c
===
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1265,6 +1265,12 @@ static void pci_enable_bridge(struct pci
 	pci_set_master(dev);
 }
 
+static void pci_enable_ite(struct pci_dev *dev)
+{
+	pci_enable_bridge(dev);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_ite);
+
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
 	struct pci_dev *bridge;
diff --git a/arch/arm/kernel/bios32.c b/arch/arm/kernel/bios32.c
index 17a26c1..306ca53 100644
--- a/arch/arm/kernel/bios32.c
+++ b/arch/arm/kernel/bios32.c
@@ -538,6 +538,12 @@ void pci_common_init_dev(struct device *parent, struct hw_pci *hw)
 			 * Assign resources.
 			 */
 			pci_bus_assign_resources(bus);
+
+
+			/*
+			 * Enable bridges
+			 */
+			pci_enable_bridges(bus);
 		}
 
 		/*
diff --git a/arch/m68k/coldfire/pci.c b/arch/m68k/coldfire/pci.c
index df96792..b33f97a 100644
--- a/arch/m68k/coldfire/pci.c
+++ b/arch/m68k/coldfire/pci.c
@@ -319,6 +319,7 @@ static int __init mcf_pci_init(void)
 	pci_fixup_irqs(pci_common_swizzle, mcf_pci_map_irq);
 	pci_bus_size_bridges(rootbus);
 	pci_bus_assign_resources(rootbus);
+	pci_enable_bridges(rootbus);
 	return 0;
 }
 
diff --git a/arch/mips/pci/pci.c b/arch/mips/pci/pci.c
index 1bf60b1..4f2e17d 100644
--- a/arch/mips/pci/pci.c
+++ b/arch/mips/pci/pci.c
@@ -113,6 +113,7 @@ static void pcibios_scanbus(struct pci_controller *hose)
 		if (!pci_has_flag(PCI_PROBE_ONLY)) {
 			pci_bus_size_bridges(bus);
 			pci_bus_assign_resources(bus);
+			pci_enable_bridges(bus);
 		}
 	}
 }
diff --git a/arch/sh/drivers/pci/pci.c b/arch/sh/drivers/pci/pci.c
index 1bc09ee..5272327 100644
--- a/arch/sh/drivers/pci/pci.c
+++ b/arch/sh/drivers/pci/pci.c
@@ -69,6 +69,7 @@ static void pcibios_scanbus(struct pci_channel *hose)
 
 		pci_bus_size_bridges(bus);
 		pci_bus_assign_resources(bus);
+		pci_enable_bridges(bus);
 	} else {
 		pci_free_resource_list();
 	}
diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index cd4de7e..c15bc3c 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -614,6 +614,9 @@ static int acpi_pci_root_add(struct acpi_device *device,
 	if (system_state != SYSTEM_BOOTING) {
 		pcibios_resource_survey_bus(root->bus);
 		pci_assign_unassigned_root_bus_resources(root->bus);
+
+		/* need to after hot-added ioapic is registered */
+		pci_enable_bridges(root->bus);
 	}
 
 	pci_lock_rescan_remove();
diff --git a/drivers/parisc/lba_pci.c b/drivers/parisc/lba_pci.c
index 37e71ff..19f6f70 100644
--- a/drivers/parisc/lba_pci.c
+++ b/drivers/parisc/lba_pci.c
@@ -1590,6 +1590,7 @@ lba_driver_probe(struct parisc_device *dev)
 		lba_dump_res(&lba_dev->hba.lmmio_space, 2);
 #endif
 	}
+	pci_enable_bridges(lba_bus);
 
 	/*
 	** Once PCI register ops has walked the bus, access to config
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 73aef51..761601e 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -283,6 +283,26 @@ void pci_bus_add_devices(const struct pci_bus *bus)
 }
 EXPORT_SYMBOL(pci_bus_add_devices);
 
+void pci_enable_bridges(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+	int retval;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		if (dev->subordinate) {
+			if (!pci_is_enabled(dev)) {
+				retval = pci_enable_device(dev);
+				if (retval)
+					dev_err(&dev->dev, "Error enabling bridge (%d), continuing\n", retval);
+				pci_set_master(dev);
+			}
+			pci_enable_bridges(dev->subordinate);
+		}
+	}
+}
+EXPORT_SYMBOL(pci_enable_bridges);
+
+
 /** pci_walk_bus - walk devices on/under bus, calling callback.
  *  @top  bus whose devices should be walked
  *  @cb   callback to be called for each device found
diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index bcb90e4..8fadc84 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -511,6 +511,7 @@ static void enable_slot(struct acpiphp_slot *slot)
 	acpiphp_sanitize_bus(bus);
 	pcie_bus_configure_settings(bus);
 	acpiphp_set_acpi_region(slot);
+	pci_enable_bridges(bus);
 
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		/* Assume that newly added devices are powered on already. */
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 625a4ac..4121518 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1242,31 +1242,8 @@ int pci_reenable_device(struct pci_dev *dev)
 }
 EXPORT_SYMBOL(pci_reenable_device);
 
-static

Re: Machine crashes right after ~successful resume




On 30-10-14 23:02, Yinghai Lu wrote:
>> http://gaast.net/~wilmer/.lkml/good3.17-patched-megadebug.txt
>
> no difference except on 00:1c.3
>
> --- before.txt	2014-10-30 15:20:35.782886485 -0700
> +++ after.txt	2014-10-30 15:21:37.034882515 -0700
> @@ -49,10 +49,10 @@
>  02f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  0300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  0310: 00 00 00 00 1b 36 3a 74 00 00 14 14 31 17 42 00
> -0320: 5b 60 09 00 00 20 00 0a ec 19 b8 04 eb 09 b9 06
> +0320: 5b 60 09 00 00 20 00 0a 33 1a b8 04 32 0a 00 07
>  0330: 16 00 00 28 bc b5 bc 4a 00 00 00 00 74 4c 85 00
> -0340: 33 03 33 00 64 03 3f 00 30 00 0c 00 45 02 b9 00
> -0350: 4b 02 c1 00 01 00 08 00 00 00 00 00 00 00 00 00
> +0340: 33 03 33 00 64 03 3f 00 30 00 0c 00 46 02 00 00
> +0350: 4c 02 08 00 01 00 08 00 00 00 00 00 00 00 00 00
>  0360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  0370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  0380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Those diffs are in exactly the same offsets like the dumps I was diffing
a few days ago it seems.

> Please try attached patch on top of 3.17 without other patches.

Same problem like this morning: Failure after the second resume already. :-(

> If it is working, please dump acpi tables include dsdt.
> need to check if there extra work in _PRT.

Original files and iasl interpretations in:
http://gaast.net/~wilmer/.lkml/tables/



Thanks,

Wilmer v/d Gaast.

--
+ .''`. - -- ---+  +- -- ---  - --+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---+  +-- -  --- -- -+

Re: Machine crashes right after ~successful resume

On Thu, Oct 30, 2014 at 2:54 PM, Wilmer van der Gaast  wrote:

> http://gaast.net/~wilmer/.lkml/good3.17-patched-megadebug.txt

no difference except on 00:1c.3

--- before.txt	2014-10-30 15:20:35.782886485 -0700
+++ after.txt	2014-10-30 15:21:37.034882515 -0700
@@ -49,10 +49,10 @@
 02f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0310: 00 00 00 00 1b 36 3a 74 00 00 14 14 31 17 42 00
-0320: 5b 60 09 00 00 20 00 0a ec 19 b8 04 eb 09 b9 06
+0320: 5b 60 09 00 00 20 00 0a 33 1a b8 04 32 0a 00 07
 0330: 16 00 00 28 bc b5 bc 4a 00 00 00 00 74 4c 85 00
-0340: 33 03 33 00 64 03 3f 00 30 00 0c 00 45 02 b9 00
-0350: 4b 02 c1 00 01 00 08 00 00 00 00 00 00 00 00 00
+0340: 33 03 33 00 64 03 3f 00 30 00 0c 00 46 02 00 00
+0350: 4c 02 08 00 01 00 08 00 00 00 00 00 00 00 00 00
 0360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Please try attached patch on top of 3.17 without other patches.

If it is working, please dump acpi tables include dsdt.
need to check if there extra work in _PRT.

Thanks

Yinghai
---
 arch/x86/pci/common.c |8 
 1 file changed, 8 insertions(+)

Index: linux-2.6/arch/x86/pci/common.c
===
--- linux-2.6.orig/arch/x86/pci/common.c
+++ linux-2.6/arch/x86/pci/common.c
@@ -719,6 +719,14 @@ int pcibios_enable_device(struct pci_dev
 	return 0;
 }
 
+static void pci_enable_irq_ite(struct pci_dev *dev)
+{
+	if (!pci_dev_msi_enabled(dev))
+		pcibios_enable_irq(dev);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x244e, pci_enable_irq_ite);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_irq_ite);
+
 void pcibios_disable_device (struct pci_dev *dev)
 {
 	if (!pci_dev_msi_enabled(dev) && pcibios_disable_irq)
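
The before/after register dump compared earlier in this message was diffed by hand. A small sketch of doing the same comparison in code is below; `diff_offsets()` is a hypothetical helper written for this illustration, not something from the thread or the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Collect the offsets at which two equally sized dumps differ.
 * 'base' is the config-space offset of byte 0 of the buffers.
 * Writes at most 'max' offsets to 'out' and returns the total
 * number of differing bytes.
 */
static size_t diff_offsets(const uint8_t *before, const uint8_t *after,
			   size_t len, size_t base, size_t *out, size_t max)
{
	size_t found = 0;

	for (size_t i = 0; i < len; i++) {
		if (before[i] == after[i])
			continue;
		if (found < max)
			out[found] = base + i;
		found++;
	}
	return found;
}
```

Fed the two `0320:` rows from the dump with `base` set to `0x320`, it reports six differing bytes between `0x328` and `0x32f`.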

Re: Machine crashes right after ~successful resume


Hello,

On 30-10-14 16:57, Yinghai Lu wrote:
>> Sadly, with that patch (applied against a vanilla 3.17 tree like all the
>> others) the second resume fails already. :-(
>
> oh, no. Really want to know which bit causes the problem.

Good question. And I think you will find my new finding even more
confusing: With your two patches from this e-mail, I could
suspend+resume 3× with no problems.. With just your two debugging
patches applied.

Lovely heisenbug here. I'll add that for every test so far I've removed
the kernel source tree, re-untarred it and applied the patches from your
e-mails on that, so the tests should be consistent. As is the bug
normally, before we started testing patches the crashes were already
always *very* reliably happening exactly after the third resume.

Just to be sure this morning was not a fluke, I've retested your patch
from this morning, and still a crash on the second resume.

> Please check debug patch...that will print out pci conf space before
> ...and after...

http://gaast.net/~wilmer/.lkml/good3.17-patched-megadebug.txt


Wilmer v/d Gaast.

--
+ .''`. - -- ---+  +- -- ---  - --+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---+  +-- -  --- -- -+


Re: Machine crashes right after ~successful resume

On Thu, Oct 30, 2014 at 3:36 AM, Wilmer van der Gaast  wrote:

> Sadly, with that patch (applied against a vanilla 3.17 tree like all the
> others) the second resume fails already. :-(

oh, no. Really want to know which bit causes the problem.

Please check debug patch...that will print out pci conf space before
...and after...
Subject: [PATCH] pci: print out about pci=dump

debug print out before later driver hang

Signed-off-by: Yinghai Lu 

---
 drivers/pci/pci.c |   52 +++-
 1 file changed, 51 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/pci.c
===
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -3858,6 +3858,54 @@ void __weak pci_fixup_cardbus(struct pci
 }
 EXPORT_SYMBOL(pci_fixup_cardbus);
 
+static void dump_pci_device_range(struct pci_dev *dev, unsigned start_reg,
+	 unsigned size)
+{
+	int i;
+	int j;
+	u32 val;
+	int end = start_reg + size;
+
+	printk(KERN_DEBUG "PCI: %s", pci_name(dev));
+
+	for (i = start_reg; i < end; i += 4) {
+		if (!(i & 0x0f))
+			printk("\n%04x:", i);
+
+		pci_read_config_dword(dev, i, &val);
+		for (j = 0; j < 4; j++) {
+			printk(" %02x", val & 0xff);
+			val >>= 8;
+		}
+	}
+	printk("\n");
+}
+
+static int dump_pci_devices(void)
+{
+	struct pci_dev *dev = NULL;
+
+	while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL)
+		dump_pci_device_range(dev, 0, dev->cfg_size);
+
+	return 0;
+}
+
+static int pci_dump_regs;
+static void pci_dump(void)
+{
+	pci_dump_regs = 1;
+}
+
+static int pci_init(void)
+{
+	if (pci_dump_regs)
+		dump_pci_devices();
+
+	return 0;
+}
+device_initcall(pci_init);
+
 static int __init pci_setup(char *str)
 {
 	while (str) {
@@ -3865,7 +3913,9 @@ static int __init pci_setup(char *str)
 		if (k)
 			*k++ = 0;
 		if (*str && (str = pcibios_setup(str)) && *str) {
-			if (!strcmp(str, "nomsi")) {
+			if (!strcmp(str, "dump")) {
+				pci_dump();
+			} else if (!strcmp(str, "nomsi")) {
 				pci_no_msi();
 			} else if (!strcmp(str, "noaer")) {
 				pci_no_aer();
---
 drivers/pci/pci.c |   14 ++
 1 file changed, 14 insertions(+)

Index: linux-2.6/drivers/pci/pci.c
===
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1265,6 +1265,20 @@ static void pci_enable_bridge(struct pci
 	pci_set_master(dev);
 }
 
+static int dump_pci_devices(void);
+
+static void pci_enable_ite(struct pci_dev *dev)
+{
+	pr_info("before...\n");
+	dump_pci_devices();
+
+	pci_enable_bridge(dev);
+
+	pr_info("after...\n");
+	dump_pci_devices();
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_ite);
+
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
 	struct pci_dev *bridge;
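
The output layout of `dump_pci_device_range()` in the debug patch (an offset label at every 16-byte boundary, then each dword printed low byte first) can be reproduced in userspace. This is a sketch only: `format_config_dump()` is a hypothetical helper that formats into a buffer instead of calling `printk`, and it takes the dwords from an array rather than reading real config space.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format 'ndwords' config dwords in the debug patch's layout:
 * "\n%04x:" at every 16-byte boundary, then the four bytes of each
 * dword from least to most significant.
 * Returns the number of characters stored (excluding the NUL).
 */
static size_t format_config_dump(const uint32_t *dwords, unsigned int ndwords,
				 unsigned int start_reg, char *out, size_t outlen)
{
	size_t used = 0;

	if (outlen == 0)
		return 0;
	out[0] = '\0';

	for (unsigned int n = 0; n < ndwords; n++) {
		unsigned int reg = start_reg + 4 * n;
		uint32_t val = dwords[n];

		if (!(reg & 0x0f) && used < outlen)
			used += snprintf(out + used, outlen - used, "\n%04x:", reg);
		for (int j = 0; j < 4 && used < outlen; j++) {
			used += snprintf(out + used, outlen - used, " %02x",
					 (unsigned int)(val & 0xff));
			val >>= 8;
		}
	}
	/* snprintf reports the untruncated length, so clamp the result. */
	return used < outlen ? used : outlen - 1;
}
```

Because each dword is emitted low byte first, the rows read in the same byte order as the dumps quoted elsewhere in this thread.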

Re: Machine crashes right after ~successful resume


Hello,

On 30-10-14 00:53, Yinghai Lu wrote:
>> Done, and that did work! Four suspend+resume cycles later and it's still
>> stable.
>
> Then can you test attached simplified one.

Sadly, with that patch (applied against a vanilla 3.17 tree like all the
others) the second resume fails already. :-(



Wilmer v/d Gaast.

--
+ .''`. - -- ---+  +- -- ---  - --+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---+  +-- -  --- -- -+

Re: Machine crashes right after ~successful resume


Hello,

On 30-10-14 00:53, Yinghai Lu wrote:

Done, and that did work! Four suspend+resume cycles later and it's still
stable.

Then can you test attached simplified one.

Sadly, with that patch (applied against a vanilla 3.17 tree like all the 
others) the second resume fails already. :-(



Wilmer v/d Gaast.

--
+ .''`. - -- ---+  +- -- ---  - --+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---+  +-- -  --- -- -+
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Machine crashes right after ~successful resume

On Thu, Oct 30, 2014 at 3:36 AM, Wilmer van der Gaast wil...@gaast.net wrote:

 Sadly, with that patch (applied against a vanilla 3.17 tree like all the 
 others) the second resume fails already. :-(

oh, no. Really want to know which bit causes the problem.

Please check debug patch...that will print out pci conf space before
...and after...
Subject: [PATCH] pci: print out about pci=dump

debug print out before later driver hang

Signed-off-by: Yinghai Lu ying...@kernel.org

---
 drivers/pci/pci.c |   52 +++-
 1 file changed, 51 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/pci.c
===
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -3858,6 +3858,54 @@ void __weak pci_fixup_cardbus(struct pci
 }
 EXPORT_SYMBOL(pci_fixup_cardbus);
 
+static void dump_pci_device_range(struct pci_dev *dev, unsigned start_reg,
+	 unsigned size)
+{
+	int i;
+	int j;
+	u32 val;
+	int end = start_reg + size;
+
+	printk(KERN_DEBUG "PCI: %s", pci_name(dev));
+
+	for (i = start_reg; i < end; i += 4) {
+		if (!(i & 0x0f))
+			printk("\n%04x:", i);
+
+		pci_read_config_dword(dev, i, &val);
+		for (j = 0; j < 4; j++) {
+			printk(" %02x", val & 0xff);
+			val >>= 8;
+		}
+	}
+	printk("\n");
+}
+
+static int dump_pci_devices(void)
+{
+	struct pci_dev *dev = NULL;
+
+	while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL)
+		dump_pci_device_range(dev, 0, dev->cfg_size);
+
+	return 0;
+}
+
+static int pci_dump_regs;
+static void pci_dump(void)
+{
+	pci_dump_regs = 1;
+}
+
+static int pci_init(void)
+{
+	if (pci_dump_regs)
+		dump_pci_devices();
+
+	return 0;
+}
+device_initcall(pci_init);
+
 static int __init pci_setup(char *str)
 {
 	while (str) {
@@ -3865,7 +3913,9 @@ static int __init pci_setup(char *str)
 		if (k)
 			*k++ = 0;
 		if (*str && (str = pcibios_setup(str)) && *str) {
-			if (!strcmp(str, "nomsi")) {
+			if (!strcmp(str, "dump")) {
+				pci_dump();
+			} else if (!strcmp(str, "nomsi")) {
 				pci_no_msi();
 			} else if (!strcmp(str, "noaer")) {
 				pci_no_aer();
---
 drivers/pci/pci.c |   14 ++
 1 file changed, 14 insertions(+)

Index: linux-2.6/drivers/pci/pci.c
===
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1265,6 +1265,20 @@ static void pci_enable_bridge(struct pci
 	pci_set_master(dev);
 }
 
+static int dump_pci_devices(void);
+
+static void pci_enable_ite(struct pci_dev *dev)
+{
+	pr_info("before...\n");
+	dump_pci_devices();
+
+	pci_enable_bridge(dev);
+
+	pr_info("after...\n");
+	dump_pci_devices();
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_ite);
+
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
 	struct pci_dev *bridge;
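
For readers following the dump format: the debug patch reads config space one dword at a time and prints each dword's bytes lowest byte first, 16 bytes per row. Below is a minimal userspace sketch of that formatting, assuming the dwords have already been read into an array. The helper name `format_cfg_dump` and its buffer-based interface are hypothetical and not part of the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format `ndwords` config-space dwords in the layout used by
 * dump_pci_device_range(): a "%04x:" offset every 16 bytes, then each
 * dword's bytes from least to most significant. Hypothetical helper for
 * illustration; it writes into `out` instead of calling printk().
 * Returns the number of characters the full dump needs (excluding NUL).
 */
static size_t format_cfg_dump(const uint32_t *dwords, size_t ndwords,
			      char *out, size_t outlen)
{
	size_t used = 0;

	assert(out != NULL && outlen > 0);
	out[0] = '\0';

	for (size_t i = 0; i < ndwords; i++) {
		uint32_t val = dwords[i];
		size_t offset = i * 4;

		/* Start a new row at every 16-byte boundary. */
		if (!(offset & 0x0f) && used < outlen)
			used += snprintf(out + used, outlen - used,
					 "%s%04zx:", i ? "\n" : "", offset);

		/* Emit the four bytes of this dword, low byte first. */
		for (int j = 0; j < 4 && used < outlen; j++) {
			used += snprintf(out + used, outlen - used,
					 " %02x", (unsigned)(val & 0xff));
			val >>= 8;
		}
	}
	return used;
}
```

Fed the two dwords 0x88921283 and 0x00100007, the sketch produces a row beginning `0000: 83 12 92 88 07 00 10 00`, which is the byte order seen in the dumps posted in this thread.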

Re: Machine crashes right after ~successful resume


Hello,

On 30-10-14 16:57, Yinghai Lu wrote:

>> Sadly, with that patch (applied against a vanilla 3.17 tree like all the
>> others) the second resume fails already. :-(
>
> oh, no. Really want to know which bit causes the problem.

Good question. And I think you will find my new finding even more 
confusing: With your two patches from this e-mail, I could 
suspend+resume 3× with no problems.. With just your two debugging 
patches applied.


Lovely heisenbug here. I'll add that for every test so far I've removed 
the kernel source tree, re-untarred it and applied the patches from your 
e-mails on that, so the tests should be consistent. As is the bug 
normally, before we started testing patches the crashes were already 
always *very* reliably happening exactly after the third resume.


Just to be sure this morning was not a fluke, I've retested your patch 
from this morning, and still a crash on the second resume.



> Please check debug patch...that will print out pci conf space before
> ...and after...


http://gaast.net/~wilmer/.lkml/good3.17-patched-megadebug.txt


Wilmer v/d Gaast.

--
+ .''`. - -- ---+  +- -- ---  - --+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---+  +-- -  --- -- -+


Re: Machine crashes right after ~successful resume

On Thu, Oct 30, 2014 at 2:54 PM, Wilmer van der Gaast <wil...@gaast.net> wrote:

> http://gaast.net/~wilmer/.lkml/good3.17-patched-megadebug.txt

no difference except on 00:1c.3

--- before.txt	2014-10-30 15:20:35.782886485 -0700
+++ after.txt	2014-10-30 15:21:37.034882515 -0700
@@ -49,10 +49,10 @@
 02f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0310: 00 00 00 00 1b 36 3a 74 00 00 14 14 31 17 42 00
-0320: 5b 60 09 00 00 20 00 0a ec 19 b8 04 eb 09 b9 06
+0320: 5b 60 09 00 00 20 00 0a 33 1a b8 04 32 0a 00 07
 0330: 16 00 00 28 bc b5 bc 4a 00 00 00 00 74 4c 85 00
-0340: 33 03 33 00 64 03 3f 00 30 00 0c 00 45 02 b9 00
-0350: 4b 02 c1 00 01 00 08 00 00 00 00 00 00 00 00 00
+0340: 33 03 33 00 64 03 3f 00 30 00 0c 00 46 02 00 00
+0350: 4c 02 08 00 01 00 08 00 00 00 00 00 00 00 00 00
 0360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
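
Comparing two dumps by eye is error-prone, so here is a small sketch of listing the byte offsets at which two config-space snapshots differ. It assumes both snapshots are already available as byte arrays of equal length. `cfg_diff_offsets` is a hypothetical helper written for illustration, not an existing kernel or pciutils function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Record the offsets at which two equally sized config-space snapshots
 * differ. Hypothetical helper: at most `max` offsets are stored in
 * `offsets` (which may be NULL), but the return value counts every
 * differing byte.
 */
static size_t cfg_diff_offsets(const uint8_t *before, const uint8_t *after,
			       size_t len, size_t *offsets, size_t max)
{
	size_t ndiff = 0;

	assert(before != NULL && after != NULL);

	for (size_t i = 0; i < len; i++) {
		if (before[i] == after[i])
			continue;
		if (offsets != NULL && ndiff < max)
			offsets[ndiff] = i;
		ndiff++;
	}
	return ndiff;
}
```

Applied to the 0x320 row of the diff above, this reports six changed bytes, at 0x328, 0x329 and 0x32c through 0x32f.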

Please try attached patch on top of 3.17 without other patches.

If it is working, please dump acpi tables include dsdt.
need to check if there extra work in _PRT.

Thanks

Yinghai
---
 arch/x86/pci/common.c |8 
 1 file changed, 8 insertions(+)

Index: linux-2.6/arch/x86/pci/common.c
===
--- linux-2.6.orig/arch/x86/pci/common.c
+++ linux-2.6/arch/x86/pci/common.c
@@ -719,6 +719,14 @@ int pcibios_enable_device(struct pci_dev
 	return 0;
 }
 
+static void pci_enable_irq_ite(struct pci_dev *dev)
+{
+	if (!pci_dev_msi_enabled(dev))
+		pcibios_enable_irq(dev);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x244e, pci_enable_irq_ite);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_irq_ite);
+
 void pcibios_disable_device (struct pci_dev *dev)
 {
 	if (!pci_dev_msi_enabled(dev) && pcibios_disable_irq)

Re: Machine crashes right after ~successful resume




On 30-10-14 23:02, Yinghai Lu wrote:

>> http://gaast.net/~wilmer/.lkml/good3.17-patched-megadebug.txt
>
> no difference except on 00:1c.3
>
> --- before.txt	2014-10-30 15:20:35.782886485 -0700
> +++ after.txt	2014-10-30 15:21:37.034882515 -0700
> @@ -49,10 +49,10 @@
>  02f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  0300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  0310: 00 00 00 00 1b 36 3a 74 00 00 14 14 31 17 42 00
> -0320: 5b 60 09 00 00 20 00 0a ec 19 b8 04 eb 09 b9 06
> +0320: 5b 60 09 00 00 20 00 0a 33 1a b8 04 32 0a 00 07
>  0330: 16 00 00 28 bc b5 bc 4a 00 00 00 00 74 4c 85 00
> -0340: 33 03 33 00 64 03 3f 00 30 00 0c 00 45 02 b9 00
> -0350: 4b 02 c1 00 01 00 08 00 00 00 00 00 00 00 00 00
> +0340: 33 03 33 00 64 03 3f 00 30 00 0c 00 46 02 00 00
> +0350: 4c 02 08 00 01 00 08 00 00 00 00 00 00 00 00 00
>  0360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  0370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  0380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Those diffs are in exactly the same offsets like the dumps I was diffing 
a few days ago it seems.



> Please try attached patch on top of 3.17 without other patches.


Same problem like this morning: Failure after the second resume already. :-(


> If it is working, please dump acpi tables include dsdt.
> need to check if there extra work in _PRT.

Original files and iasl interpretations in: 
http://gaast.net/~wilmer/.lkml/tables/



Thanks,

Wilmer v/d Gaast.

--
+ .''`. - -- ---+  +- -- ---  - --+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---+  +-- -  --- -- -+

Re: Machine crashes right after ~successful resume

On Thu, Oct 30, 2014 at 4:24 PM, Wilmer van der Gaast <wil...@gaast.net> wrote:

> Same problem like this morning: Failure after the second resume already. :-(

can not find out any magic line in pci_enable_bridge that could cause
the difference.

so either use attached pcie_enable_bridge_ite.patch or just revert the
commit 928bea9?

Bjorn, please check which one that you want to go on.

Thanks

Yinghai
---
 drivers/pci/pci.c |6 ++
 1 file changed, 6 insertions(+)

Index: linux-2.6/drivers/pci/pci.c
===
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1265,6 +1265,12 @@ static void pci_enable_bridge(struct pci
 	pci_set_master(dev);
 }
 
+static void pci_enable_ite(struct pci_dev *dev)
+{
+	pci_enable_bridge(dev);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_ite);
+
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
 	struct pci_dev *bridge;
diff --git a/arch/arm/kernel/bios32.c b/arch/arm/kernel/bios32.c
index 17a26c1..306ca53 100644
--- a/arch/arm/kernel/bios32.c
+++ b/arch/arm/kernel/bios32.c
@@ -538,6 +538,12 @@ void pci_common_init_dev(struct device *parent, struct hw_pci *hw)
 			 * Assign resources.
 			 */
 			pci_bus_assign_resources(bus);
+
+
+			/*
+			 * Enable bridges
+			 */
+			pci_enable_bridges(bus);
 		}
 
 		/*
diff --git a/arch/m68k/coldfire/pci.c b/arch/m68k/coldfire/pci.c
index df96792..b33f97a 100644
--- a/arch/m68k/coldfire/pci.c
+++ b/arch/m68k/coldfire/pci.c
@@ -319,6 +319,7 @@ static int __init mcf_pci_init(void)
 	pci_fixup_irqs(pci_common_swizzle, mcf_pci_map_irq);
 	pci_bus_size_bridges(rootbus);
 	pci_bus_assign_resources(rootbus);
+	pci_enable_bridges(rootbus);
 	return 0;
 }
 
diff --git a/arch/mips/pci/pci.c b/arch/mips/pci/pci.c
index 1bf60b1..4f2e17d 100644
--- a/arch/mips/pci/pci.c
+++ b/arch/mips/pci/pci.c
@@ -113,6 +113,7 @@ static void pcibios_scanbus(struct pci_controller *hose)
 		if (!pci_has_flag(PCI_PROBE_ONLY)) {
 			pci_bus_size_bridges(bus);
 			pci_bus_assign_resources(bus);
+			pci_enable_bridges(bus);
 		}
 	}
 }
diff --git a/arch/sh/drivers/pci/pci.c b/arch/sh/drivers/pci/pci.c
index 1bc09ee..5272327 100644
--- a/arch/sh/drivers/pci/pci.c
+++ b/arch/sh/drivers/pci/pci.c
@@ -69,6 +69,7 @@ static void pcibios_scanbus(struct pci_channel *hose)
 
 		pci_bus_size_bridges(bus);
 		pci_bus_assign_resources(bus);
+		pci_enable_bridges(bus);
 	} else {
 		pci_free_resource_list(resources);
 	}
diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index cd4de7e..c15bc3c 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -614,6 +614,9 @@ static int acpi_pci_root_add(struct acpi_device *device,
 	if (system_state != SYSTEM_BOOTING) {
 		pcibios_resource_survey_bus(root->bus);
 		pci_assign_unassigned_root_bus_resources(root->bus);
+
+		/* need to after hot-added ioapic is registered */
+		pci_enable_bridges(root->bus);
 	}
 
 	pci_lock_rescan_remove();
diff --git a/drivers/parisc/lba_pci.c b/drivers/parisc/lba_pci.c
index 37e71ff..19f6f70 100644
--- a/drivers/parisc/lba_pci.c
+++ b/drivers/parisc/lba_pci.c
@@ -1590,6 +1590,7 @@ lba_driver_probe(struct parisc_device *dev)
 		lba_dump_res(&lba_dev->hba.lmmio_space, 2);
 #endif
 	}
+	pci_enable_bridges(lba_bus);
 
 	/*
 	** Once PCI register ops has walked the bus, access to config
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 73aef51..761601e 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -283,6 +283,26 @@ void pci_bus_add_devices(const struct pci_bus *bus)
 }
 EXPORT_SYMBOL(pci_bus_add_devices);
 
+void pci_enable_bridges(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+	int retval;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		if (dev->subordinate) {
+			if (!pci_is_enabled(dev)) {
+				retval = pci_enable_device(dev);
+				if (retval)
+					dev_err(&dev->dev, "Error enabling bridge (%d), continuing\n", retval);
+				pci_set_master(dev);
+			}
+			pci_enable_bridges(dev->subordinate);
+		}
+	}
+}
+EXPORT_SYMBOL(pci_enable_bridges);
+
+
 /** pci_walk_bus - walk devices on/under bus, calling callback.
  *  @top  bus whose devices should be walked
  *  @cb   callback to be called for each device found
diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index bcb90e4..8fadc84 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -511,6 +511,7 @@ static void enable_slot(struct acpiphp_slot *slot)
 	acpiphp_sanitize_bus(bus);
 	pcie_bus_configure_settings(bus);
 	acpiphp_set_acpi_region(slot);
+	pci_enable_bridges(bus);
 
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		/* Assume that newly added devices are powered on already. */
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 625a4ac..4121518 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1242,31 +1242,8 @@ int pci_reenable_device(struct pci_dev *dev)
 }

Re: Machine crashes right after ~successful resume

On Thu, Oct 30, 2014 at 5:43 PM, Yinghai Lu <ying...@kernel.org> wrote:
> On Thu, Oct 30, 2014 at 4:24 PM, Wilmer van der Gaast <wil...@gaast.net>
> wrote:
>>
>> Same problem like this morning: Failure after the second resume already. :-(
>
> can not find out any magic line in pci_enable_bridge that could cause
> the difference.
>
> so either use attached pcie_enable_bridge_ite.patch or just revert the
> commit 928bea9?

Last try:

Please check attached patch that will keep state consistent.

Thanks

Yinghai
---
 drivers/pci/pci.c |   20 
 1 file changed, 20 insertions(+)

Index: linux-2.6/drivers/pci/pci.c
===
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1264,6 +1264,26 @@ static void pci_enable_bridge(struct pci
 	pci_set_master(dev);
 }
 
+static void pci_enable_ite(struct pci_dev *dev)
+{
+	u16 cmd;
+
+	/*
+	 * FW enable the bridge already, so keep enable_cnt consistent,
+	 * then later we can go through pci_pm_resume/pci_pm_reenable_device
+	 * to enable it again.
+	 * --- for pci bridge without driver case.
+	 */
+	pci_read_config_word(dev, PCI_COMMAND, &cmd);
+	if (cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY))
+		atomic_inc(&dev->enable_cnt);
+
+	if (cmd & PCI_COMMAND_MASTER)
+		dev->is_busmaster = true;
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x244e, pci_enable_ite);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_ite);
+
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
 	struct pci_dev *bridge;
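
One detail worth checking in a fixup like this is how the command-register mask is built. It has to use bitwise `|`. With logical `||` the expression collapses to 1, which happens to equal PCI_COMMAND_IO, so a bridge with only memory decoding enabled would be missed. The userspace sketch below illustrates the difference. The three constants carry the values defined in the kernel's pci_regs.h, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in the kernel's include/uapi/linux/pci_regs.h. */
#define PCI_COMMAND_IO		0x1	/* respond to I/O space */
#define PCI_COMMAND_MEMORY	0x2	/* respond to memory space */
#define PCI_COMMAND_MASTER	0x4	/* bus mastering enabled */

/* Intended test: true if I/O or memory decoding is enabled. */
static bool decode_enabled(uint16_t cmd)
{
	return (cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)) != 0;
}

/* Mistaken variant: (IO || MEMORY) evaluates to 1, so only bit 0 is tested. */
static bool decode_enabled_logical_or(uint16_t cmd)
{
	return (cmd & (PCI_COMMAND_IO || PCI_COMMAND_MEMORY)) != 0;
}

/* With memory decoding only (0x2) the two variants disagree. */
static void show_difference(void)
{
	assert(decode_enabled(PCI_COMMAND_MEMORY));
	assert(!decode_enabled_logical_or(PCI_COMMAND_MEMORY));
}
```

A compiler may warn about the constant operand to `||` in the second helper; that warning is the point of the example.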

Re: Machine crashes right after ~successful resume

2014-10-29 Thread Yinghai Lu

On Wed, Oct 29, 2014 at 2:37 AM, Wilmer van der Gaast  wrote:
>
>> Anyway please try attached patched on top of 3.17.
>>
> Done, and that did work! Four suspend+resume cycles later and it's still
> stable.

Then can you test attached simplified one.
---
 drivers/pci/pci.c |   13 +
 1 file changed, 13 insertions(+)

Index: linux-2.6/drivers/pci/pci.c
===
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1265,6 +1265,19 @@ static void pci_enable_bridge(struct pci
 	pci_set_master(dev);
 }
 
+static void ite_set_d0(struct pci_dev *dev)
+{
+	if (dev->pm_cap) {
+		u16 pmcsr;
+		pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
+		dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
+	}
+
+	pci_set_power_state(dev, PCI_D0);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x244e, ite_set_d0);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, ite_set_d0);
+
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
 	struct pci_dev *bridge;
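
The fixup above refreshes `current_state` from the low two bits of the power management control/status register (PMCSR) before requesting D0. As a rough userspace illustration of that decoding, assuming the PMCSR value has already been read: the mask value matches the kernel's pci_regs.h, while `pmcsr_to_dstate` and `dstate_name` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_PM_CTRL_STATE_MASK	0x0003	/* as in the kernel's pci_regs.h */

/* Hypothetical helper: extract the D-state (0 = D0 ... 3 = D3hot). */
static unsigned int pmcsr_to_dstate(uint16_t pmcsr)
{
	return pmcsr & PCI_PM_CTRL_STATE_MASK;
}

/* Hypothetical helper: human-readable name for a decoded D-state. */
static const char *dstate_name(unsigned int state)
{
	static const char *const names[] = { "D0", "D1", "D2", "D3hot" };

	assert(state <= 3);
	return names[state];
}
```

For example, a PMCSR value of 0x0103 decodes to state 3 (D3hot) and 0x0008 decodes to state 0 (D0); the upper bits do not affect the result.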

Re: Machine crashes right after ~successful resume

2014-10-29 Thread Wilmer van der Gaast


Hello,

On 29-10-14 05:17, Yinghai Lu wrote:

>> (Diff is in the Intel device, not the ITE one.)
>
> That is strange.

I did wonder later, why was I not seeing the ff* dump anymore after the 
resume..



> Anyway please try attached patched on top of 3.17.

Done, and that did work! Four suspend+resume cycles later and it's still 
stable.



Wilmer v/d Gaast.

--
+ .''`. - -- ---+  +- -- ---  - --+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---+  +-- -  --- -- -+

Re: Machine crashes right after ~successful resume

2014-10-28 Thread Yinghai Lu

On Tue, Oct 28, 2014 at 4:34 PM, Wilmer van der Gaast  wrote:
>
> I've run the commands twice, once before and once after a single
> suspend+resume cycle. Small difference and only before that cycle:
>
> ruby:~/crashit# diff -u lspcixx-*
> --- lspcixx-nopatch.txt 2014-10-28 23:26:08.679690828 +
> +++ lspcixx-patched.txt 2014-10-28 23:10:05.391896757 +
> @@ -92,10 +92,10 @@
>  2f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  310: 00 00 00 00 1b 36 3a 74 00 00 14 14 31 17 42 00
> -320: 5b 60 09 00 00 20 00 0a 08 10 b8 04 07 00 d5 0c
> +320: 5b 60 09 00 00 20 00 0a 90 10 b8 04 8f 00 5d 0d
>  330: 16 00 00 28 bc b5 bc 4a 00 00 00 00 74 4c 85 00
> -340: 33 03 33 00 64 03 3f 00 30 00 0c 00 f8 07 d5 00
> -350: fe 07 dd 00 01 00 08 00 00 00 00 00 00 00 00 00
> +340: 33 03 33 00 64 03 3f 00 30 00 0c 00 f8 07 5d 00
> +350: fe 07 65 00 01 00 08 00 00 00 00 00 00 00 00 00
>  360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> (Diff is in the Intel device, not the ITE one.)
>

That is strange.

Anyway please try attached patched on top of 3.17.

Thanks

Yinghai
---
 drivers/pci/pci.c |2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6/drivers/pci/pci.c
===
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1265,6 +1265,8 @@ static void pci_enable_bridge(struct pci
 	pci_set_master(dev);
 }
 
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, pci_enable_bridge);
+
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
 	struct pci_dev *bridge;

Re: Machine crashes right after ~successful resume


Hello,

On 28-10-14 01:12, Yinghai Lu wrote:

> lspci -vv -s 00:1c.3
> lspci -vv -s 04:00.0
> before reverting enable bridge early patch


http://gaast.net/~wilmer/.lkml/lspcixx-nopatch.txt (So that's 3.17 + 
your revert patch)



> and after reverting on 3.17+?


http://gaast.net/~wilmer/.lkml/lspcixx-patched.txt

plain 3.17.

I've run the commands twice, once before and once after a single 
suspend+resume cycle. Small difference and only before that cycle:


ruby:~/crashit# diff -u lspcixx-*
--- lspcixx-nopatch.txt 2014-10-28 23:26:08.679690828 +
+++ lspcixx-patched.txt 2014-10-28 23:10:05.391896757 +
@@ -92,10 +92,10 @@
 2f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 310: 00 00 00 00 1b 36 3a 74 00 00 14 14 31 17 42 00
-320: 5b 60 09 00 00 20 00 0a 08 10 b8 04 07 00 d5 0c
+320: 5b 60 09 00 00 20 00 0a 90 10 b8 04 8f 00 5d 0d
 330: 16 00 00 28 bc b5 bc 4a 00 00 00 00 74 4c 85 00
-340: 33 03 33 00 64 03 3f 00 30 00 0c 00 f8 07 d5 00
-350: fe 07 dd 00 01 00 08 00 00 00 00 00 00 00 00 00
+340: 33 03 33 00 64 03 3f 00 30 00 0c 00 f8 07 5d 00
+350: fe 07 65 00 01 00 08 00 00 00 00 00 00 00 00 00
 360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

(Diff is in the Intel device, not the ITE one.)


Wilmer v/d Gaast.

--
+ .''`. - -- ---+  +- -- ---  - --+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---+  +-- -  --- -- -+

Re: Machine crashes right after ~successful resume


On 28-10-14 04:03, Yinghai Lu wrote:


> Please check if attached patch could fix the problem on your setup.

Sadly it looks like it did not. :-( Applied your patch on a vanilla 3.17 
tree, still seeing the same crash.


I'll get more debugging output and the output you asked for in your 
previous e-mail tonight, need to go to work now.



Cheers,

Wilmer v/d Gaast.

--
+ .''`. - -- ---+  +- -- ---  - --+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---+  +-- -  --- -- -+

Re: Machine crashes right after ~successful resume

On Mon, Oct 27, 2014 at 6:12 PM, Yinghai Lu  wrote:
> On Mon, Oct 27, 2014 at 5:03 PM, Wilmer van der Gaast  
> wrote:
>> I was curious about that already, did that with a 3.16.6 that I think just
>> had your revert applied (and using lspci - to get the dump which I
>> assumed would be the same): No changes to 04:00 at all.
>>
>> Confirmed that this is the case with 3.17 + those patches as well, it's
>> showing this at all times:
>
> can you post
> lspci -vv -s 00:1c.3
> lspci -vv -s 04:00.0
> before reverting enable bridge early patch
> and after reverting on 3.17+?

Please check if attached patch could fix the problem on your setup.

Thanks

Yinghai
---
 drivers/pci/quirks.c |6 ++
 1 file changed, 6 insertions(+)

Index: linux-2.6/drivers/pci/quirks.c
===
--- linux-2.6.orig/drivers/pci/quirks.c
+++ linux-2.6/drivers/pci/quirks.c
@@ -3098,6 +3098,12 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_IN
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x8c02, quirk_remove_d3_delay);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x8c22, quirk_remove_d3_delay);
 
+static void enable_pci_bridge_d0(struct pci_dev *dev)
+{
+	pci_set_power_state(dev, PCI_D0);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, enable_pci_bridge_d0);
+
 /*
  * Some devices may pass our check in pci_intx_mask_supported if
  * PCI_COMMAND_INTX_DISABLE works though they actually do not properly

Re: Machine crashes right after ~successful resume

On Mon, Oct 27, 2014 at 5:03 PM, Wilmer van der Gaast  wrote:
> I was curious about that already, did that with a 3.16.6 that I think just
> had your revert applied (and using lspci - to get the dump which I
> assumed would be the same): No changes to 04:00 at all.
>
> Confirmed that this is the case with 3.17 + those patches as well, it's
> showing this at all times:

can you post
lspci -vv -s 00:1c.3
lspci -vv -s 04:00.0
before reverting enable bridge early patch
and after reverting on 3.17+?

Thanks

Yinghai

Re: Machine crashes right after ~successful resume


On 27-10-14 23:41, Yinghai Lu wrote:


> Can you only apply the patch that revert enable bridge early and
> two pci dump patches to see if 04:00.0 readout is 0xff?

I was curious about that already, did that with a 3.16.6 that I think 
just had your revert applied (and using lspci - to get the dump 
which I assumed would be the same): No changes to 04:00 at all.


Confirmed that this is the case with 3.17 + those patches as well, it's 
showing this at all times:


[  130.000122] PCI: 0000:04:00.0
0000: 83 12 92 88 07 00 10 00 10 01 04 06 01 00 01 00
0010: 00 00 00 00 00 00 00 00 04 05 05 20 d1 d1 20 22
0020: c0 fb c0 fb f1 ff 01 00 00 00 00 00 00 00 00 00
0030: 00 00 00 00 90 00 00 00 00 00 00 00 0b 01 00 02
0040: 0c 31 00 00 08 06 00 00 00 00 00 00 ff 00 00 00
0050: 72 ab b9 6d 00 00 00 00 20 c9 8e 00 00 00 00 00
0060: 00 00 00 00 aa 0d 00 10 00 44 00 00 00 00 00 80
0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0090: 01 a0 42 fe 00 00 00 00 00 00 00 00 00 00 00 00
00a0: 0d 00 00 00 58 14 00 50 00 00 00 00 00 00 00 00
00b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00f0: 00 00 00 00 00 1f 00 00 00 00 00 00 00 00 00 00


Wilmer v/d Gaast.

--
+ .''`. - -- ---+  +- -- ---  - --+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---+  +-- -  --- -- -+

Re: Machine crashes right after ~successful resume

On Mon, Oct 27, 2014 at 3:22 PM, Wilmer van der Gaast  wrote:
> Hello,
>
> On 27-10-14 18:23, Yinghai Lu wrote:
>>
>>
>> 04:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892
>>
>> So that ITE will not work after suspend/resume?
>>
> Even after the first one already, you mean?

Yes.

>
> Honestly, I don't really know what its purpose is, and it doesn't have any
> child nodes in the PCI tree from what I can tell. Possibly because I don't
> have any PCI cards in the machine, just a PCIe video card - assuming this is
> a PCI bridge taking care of legacy PCI plugin cards?
>
>> Please apply 4 attached patches and try to remove the device like
>>
>> echo 1 > /sys/bus/pci/devices/\:04\:00.0/remove
>> echo 1 > /sys/bus/pci/devices/\:00\:1c.3/pcie_link_disable
>>
>> before suspend/resume test.
>>
> That worked! Resumed properly now.
>
> Full log in http://gaast.net/~wilmer/.lkml/good3.17.txt . Including the PCI
> dump at boot time, where that device doesn't dump just ff's.

Can you apply only the patch that reverts the early bridge enable, plus the
two PCI dump patches, to see whether the 04:00.0 readout is 0xff?

Thanks

Yinghai

Re: Machine crashes right after ~successful resume


Hello,

On 27-10-14 18:23, Yinghai Lu wrote:

> 04:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892
>
> So that ITE will not work after suspend/resume?

Even after the first one already, you mean?

Honestly, I don't really know what its purpose is, and it doesn't have 
any child nodes in the PCI tree from what I can tell. Possibly because I 
don't have any PCI cards in the machine, just a PCIe video card - 
assuming this is a PCI bridge taking care of legacy PCI plugin cards?

> Please apply 4 attached patches and try to remove the device like
>
> echo 1 > /sys/bus/pci/devices/0000\:04\:00.0/remove
> echo 1 > /sys/bus/pci/devices/0000\:00\:1c.3/pcie_link_disable
>
> before suspend/resume test.


That worked! Resumed properly now.

Full log in http://gaast.net/~wilmer/.lkml/good3.17.txt . Including the 
PCI dump at boot time, where that device doesn't dump just ff's.



Wilmer van der Gaast.

--
+ .''`. - -- ---+  +- -- ---  - --+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---+  +-- -  --- -- -+

Re: Machine crashes right after ~successful resume

2014-10-27 Thread Pavel Machek

On Mon 2014-10-27 10:50:04, Wilmer van der Gaast wrote:
> Hello Yinghai,
> 
> Thanks again for your time!
> 
> I've applied your two patches, and as a wild guess also added pci=dump to my
> kernel cmdline though I guess that just gave me a boot-time dump - which
> mostly didn't make it into my dmesg.
> 
> I accidentally booted with no_console_suspend on the first run, which still
> caused no output at all on the failed resume. I'm including the output of
> that anyway, but also I have a run with that flag removed, and annoyingly
> the crash appears to happen before the dump during the crash finishes -
> while dumping info for this device, it seems:
> 
> 04:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892 (rev 10)
> (prog-if 01 [Subtractive decode])
> 
> (More info in my lspci.txt)
> 
> Wondering what device that is exactly, I stumbled upon
> http://sourceforge.net/p/linux1394/mailman/message/29755048/ where someone
> describes it as a "cheap and crappy PCI bridge". More and more I wonder if I
> should just buy a new motherboard - sadly this one wasn't even that
> cheap.

It is probably not just you that is affected, and we already know what
change broke it. So we really should fix it.

> :-( Though I don't know if the output stopping while dumping output for this
> device means that it is the culprit, is printk() to the serial console in
> any way blocking/buffered?

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: Machine crashes right after ~successful resume

On Mon, Oct 27, 2014 at 3:50 AM, Wilmer van der Gaast  wrote:

> http://gaast.net/~wilmer/.lkml/bad3.17-pcidumps.txt

[  252.028142] PCI: 0000:04:00.0
0000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0010: ff ff ff ff ff ff ff ff


04:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892
(rev 10) (prog-if 01 [Subtractive decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 4 bytes
Bus: primary=04, secondary=05, subordinate=05, sec-latency=32
I/O behind bridge: 0000d000-0000dfff
Memory behind bridge: fbc00000-fbcfffff
Secondary status: 66MHz+ FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr+ DiscTmrStat- DiscTmrSERREn-
Capabilities: [90] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=55mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [a0] Subsystem: Gigabyte Technology Co., Ltd Device 5000
under

00:1c.3 PCI bridge: Intel Corporation 82801 PCI Bridge (rev b5)
(prog-if 01 [Subtractive decode])

So that ITE will not work after suspend/resume?

Please apply 4 attached patches and try to remove the device like

echo 1 > /sys/bus/pci/devices/0000\:04\:00.0/remove
echo 1 > /sys/bus/pci/devices/0000\:00\:1c.3/pcie_link_disable

before suspend/resume test.

Thanks

Yinghai
Subject: [PATCH] PCI: Add generic pcie_link_disable

Remove not needed return value checking that Linus pointed out before.

Will use it from /sys/.../pcie/link_disable

Signed-off-by: Yinghai Lu 

---
 drivers/pci/Makefile|2 +-
 drivers/pci/pcie-link.c |   42 ++
 include/linux/pci.h |2 ++
 3 files changed, 45 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/pcie-link.c
===
--- /dev/null
+++ linux-2.6/drivers/pci/pcie-link.c
@@ -0,0 +1,42 @@
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/errno.h>
+#include <linux/jiffies.h>
+#include <linux/delay.h>
+
+int pcie_link_disable_get(struct pci_dev *dev)
+{
+	u16 lnk_ctrl;
+	if (!pci_is_pcie(dev))
+		return 0;
+
+	pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnk_ctrl);
+
+	return !!(lnk_ctrl & PCI_EXP_LNKCTL_LD);
+}
+
+void pcie_link_disable_set(struct pci_dev *dev, int bit)
+{
+	u16 lnk_ctrl, old_lnk_ctrl;
+
+	if (!pci_is_pcie(dev))
+		return;
+
+	pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnk_ctrl);
+	old_lnk_ctrl = lnk_ctrl;
+
+	if (!bit)
+		lnk_ctrl &= ~PCI_EXP_LNKCTL_LD;
+	else
+		lnk_ctrl |= PCI_EXP_LNKCTL_LD;
+
+	if (old_lnk_ctrl == lnk_ctrl)
+		return;
+
+	pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnk_ctrl);
+
+	dev_printk(KERN_DEBUG, &dev->dev, "%s: lnk_ctrl = %x\n", __func__,
+			 lnk_ctrl);
+}
+EXPORT_SYMBOL(pcie_link_disable_set);
Index: linux-2.6/include/linux/pci.h
===
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -842,6 +842,8 @@ struct pci_bus *pci_scan_root_bus(struct
 struct pci_bus *pci_add_new_bus(struct pci_bus *parent, struct pci_dev *dev,
 int busnr);
 void pcie_update_link_speed(struct pci_bus *bus, u16 link_status);
+void pcie_link_disable_set(struct pci_dev *dev, int bit);
+int pcie_link_disable_get(struct pci_dev *dev);
 struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
  const char *name,
  struct hotplug_slot *hotplug);
Index: linux-2.6/drivers/pci/Makefile
===
--- linux-2.6.orig/drivers/pci/Makefile
+++ linux-2.6/drivers/pci/Makefile
@@ -4,7 +4,7 @@
 
 obj-y		+= access.o bus.o probe.o host-bridge.o remove.o pci.o \
 			pci-driver.o search.o pci-sysfs.o rom.o setup-res.o \
-			irq.o vpd.o setup-bus.o vc.o
+			irq.o vpd.o setup-bus.o pcie-link.o vc.o
 obj-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_SYSFS) += slot.o
 
Subject: [PATCH] PCI, pciehp: Use generic pcie_link_disable

Also remove old version with not needed return check.

Signed-off-by: Yinghai Lu 

---
 drivers/pci/hotplug/pciehp_hpc.c |   30 +++---
 1 file changed, 3 insertions(+), 27 deletions(-)

Index: linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
===
--- linux-2.6.orig/drivers/pci/hotplug/pciehp_hpc.c
+++ linux-2.6/drivers/pci/hotplug/pciehp_hpc.c
@@ -305,28 +305,6 @@ int pciehp_check_link_status(struct cont
 	return 0;
 }
 
-static int __pciehp_link_set(struct controller *ctrl, bool enable)
-{
-	struct pci_dev *pdev = ctrl_dev(ctrl);
-	u16 lnk_ctrl;
-
-	pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &lnk_ctrl);
-
-	if (enable)
-		lnk_ctrl &= ~PCI_EXP_LNKCTL_LD;
-	else
-		lnk_ctrl |= PCI_EXP_LNKCTL_LD;
-
-	pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnk_ctrl);
-	ctrl_dbg(ctrl, "%s: lnk_ctrl = %x\n", __func__, lnk_ctrl);
-	return 0;
-}
-
-static int pciehp_link_enable(struct controller *ctrl)
-{
-	return __pciehp_link_set(ctrl, true);
-}
-
 void pciehp_get_attention_status(struct slot *slot, u8 *status)
 {
 	struct controller *ctrl = slot->ctrl;
@@ -473,7 +451,6 @@ int pciehp_power_on_slot(struct slot * s
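
A note on the helper in the first patch above: pcie_link_disable_set() is a
read-modify-write of a single bit (Link Disable, value 0x0010) in the PCIe
Link Control register, and it skips the write when the value does not change.
The following is a minimal userspace sketch of that logic only, not the kernel
code. The register value is passed in as a plain integer instead of being read
from config space, and the names LNKCTL_LD, lnkctl_apply_disable() and
lnkctl_is_disabled() are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Link Disable bit of the PCIe Link Control register (the kernel calls
 * it PCI_EXP_LNKCTL_LD). */
#define LNKCTL_LD 0x0010u

/* Hypothetical helper: update *lnk_ctrl for the requested state.
 * Returns true when the value changed, i.e. when a caller would need
 * to write the register back to the device. */
static bool lnkctl_apply_disable(uint16_t *lnk_ctrl, bool disable)
{
	uint16_t old;

	assert(lnk_ctrl != NULL);
	old = *lnk_ctrl;

	if (disable)
		*lnk_ctrl = (uint16_t)(old | LNKCTL_LD);
	else
		*lnk_ctrl = (uint16_t)(old & ~LNKCTL_LD);

	return *lnk_ctrl != old;
}

/* Same test as pcie_link_disable_get(): 1 if the link is disabled. */
static int lnkctl_is_disabled(uint16_t lnk_ctrl)
{
	return !!(lnk_ctrl & LNKCTL_LD);
}
```

Returning "changed" from the helper keeps the decision to skip a redundant
register write in one place, which is what the early return in the patch does.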

Re: Machine crashes right after ~successful resume


Hello Yinghai,

Thanks again for your time!

I've applied your two patches, and as a wild guess also added pci=dump 
to my kernel cmdline though I guess that just gave me a boot-time dump - 
which mostly didn't make it into my dmesg.


I accidentally booted with no_console_suspend on the first run, which 
still caused no output at all on the failed resume. I'm including the 
output of that anyway, but also I have a run with that flag removed, and 
annoyingly the crash appears to happen before the dump during the crash 
finishes - while dumping info for this device, it seems:


04:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892 (rev 
10) (prog-if 01 [Subtractive decode])


(More info in my lspci.txt)

Wondering what device that is exactly, I stumbled upon 
http://sourceforge.net/p/linux1394/mailman/message/29755048/ where 
someone describes it as a "cheap and crappy PCI bridge". More and more I 
wonder if I should just buy a new motherboard - sadly this one wasn't 
even that cheap. :-( Though I don't know if the output stopping while 
dumping output for this device means that it is the culprit, is printk() 
to the serial console in any way blocking/buffered?


Anyway, dumps are in:

http://gaast.net/~wilmer/.lkml/bad3.17-pcidumps-no_console_suspend.txt
http://gaast.net/~wilmer/.lkml/bad3.17-pcidumps.txt


Cheers,

Wilmer van der Gaast.

--
+ .''`. - -- ---+  +- -- ---  - --+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---+  +-- -  --- -- -+
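
For context on the pci=dump option mentioned in this message: the debug patch
posted elsewhere in this thread adds a "dump" branch to the comma-separated
option loop in pci_setup(). Below is a small userspace sketch of that kind of
loop, assuming a simplified flag structure. struct pci_opts and
parse_pci_opts() are illustrative names, not kernel API, and the sketch only
sets flags where the kernel calls pci_dump(), pci_no_msi() and pci_no_aer().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Flags recorded by this sketch. "dump" comes from the debug patch;
 * "nomsi" and "noaer" are the existing options visible in its context
 * lines. */
struct pci_opts {
	int dump;
	int nomsi;
	int noaer;
};

/* Walk a comma-separated option string in place and record the options
 * we recognise. The string is modified: each ',' becomes a NUL. */
static void parse_pci_opts(char *str, struct pci_opts *opts)
{
	assert(opts != NULL);

	while (str) {
		char *k = strchr(str, ',');

		if (k)
			*k++ = '\0';
		if (!strcmp(str, "dump"))
			opts->dump = 1;
		else if (!strcmp(str, "nomsi"))
			opts->nomsi = 1;
		else if (!strcmp(str, "noaer"))
			opts->noaer = 1;
		str = k;	/* NULL after the last token ends the loop */
	}
}
```

Because the flag is only set while parsing and acted on later from an initcall,
pci=dump on its own gives a boot-time dump, which matches what is described
above.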

Re: Machine crashes right after ~successful resume


On 27-10-14 23:41, Yinghai Lu wrote:


Can you only apply the patch that revert enable bridge early and
two pci dump patches to see if 04:00.0 readout is 0xff?

I was curious about that already, did that with a 3.16.6 that I think 
just had your revert applied (and using lspci -xxxx to get the dump 
which I assumed would be the same): No changes to 04:00 at all.


Confirmed that this is the case with 3.17 + those patches as well, it's 
showing this at all times:


[  130.000122] PCI: 0000:04:00.0
0000: 83 12 92 88 07 00 10 00 10 01 04 06 01 00 01 00
0010: 00 00 00 00 00 00 00 00 04 05 05 20 d1 d1 20 22
0020: c0 fb c0 fb f1 ff 01 00 00 00 00 00 00 00 00 00
0030: 00 00 00 00 90 00 00 00 00 00 00 00 0b 01 00 02
0040: 0c 31 00 00 08 06 00 00 00 00 00 00 ff 00 00 00
0050: 72 ab b9 6d 00 00 00 00 20 c9 8e 00 00 00 00 00
0060: 00 00 00 00 aa 0d 00 10 00 44 00 00 00 00 00 80
0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0090: 01 a0 42 fe 00 00 00 00 00 00 00 00 00 00 00 00
00a0: 0d 00 00 00 58 14 00 50 00 00 00 00 00 00 00 00
00b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00f0: 00 00 00 00 00 1f 00 00 00 00 00 00 00 00 00 00


Wilmer v/d Gaast.

--
+ .''`. - -- ---+  +- -- ---  - --+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---+  +-- -  --- -- -+
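
The dump above can be read against the standard type 1 (PCI-to-PCI bridge)
header layout: bytes 00-03 are the vendor and device IDs (0x1283 is ITE,
0x8892 the device), bytes 04-05 the command register (0x0007, i.e. I/O, memory
and bus mastering enabled), bytes 18-1a the primary, secondary and subordinate
bus numbers, and bytes 20-23 the memory window. A dump of all ff, as in the
failing case, means the vendor ID reads back as 0xffff and the device is not
answering config cycles. A minimal userspace decoder, offered as a sketch with
made-up names (struct bridge_info, cfg_read16(), decode_bridge()):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of a type 1 configuration header that matter in this thread. */
struct bridge_info {
	uint16_t vendor, device, command;
	uint8_t primary, secondary, subordinate;
	uint32_t mem_base, mem_limit;
	int responding;	/* 0 when the vendor ID reads back as 0xffff */
};

/* Read a little-endian 16-bit value from a raw config-space dump. */
static uint16_t cfg_read16(const uint8_t *cfg, size_t off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static struct bridge_info decode_bridge(const uint8_t *cfg, size_t len)
{
	struct bridge_info b = {0};

	assert(cfg != NULL && len >= 0x24);

	b.vendor = cfg_read16(cfg, 0x00);
	b.device = cfg_read16(cfg, 0x02);
	b.command = cfg_read16(cfg, 0x04);
	b.responding = (b.vendor != 0xffff);
	b.primary = cfg[0x18];
	b.secondary = cfg[0x19];
	b.subordinate = cfg[0x1a];
	/* Bits 15:4 of the base/limit registers hold address bits 31:20;
	 * the limit covers the whole of its last 1 MiB block. */
	b.mem_base = (uint32_t)(cfg_read16(cfg, 0x20) & 0xfff0) << 16;
	b.mem_limit = ((uint32_t)(cfg_read16(cfg, 0x22) & 0xfff0) << 16) | 0xfffff;
	return b;
}
```

Fed the first 0x24 bytes of the dump above, this yields bus numbers 04/05/05
and a memory window of fbc00000-fbcfffff.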

Re: Machine crashes right after ~successful resume

On Mon, Oct 27, 2014 at 5:03 PM, Wilmer van der Gaast wil...@gaast.net wrote:
> I was curious about that already, did that with a 3.16.6 that I think just
> had your revert applied (and using lspci -xxxx to get the dump which I
> assumed would be the same): No changes to 04:00 at all.
>
> Confirmed that this is the case with 3.17 + those patches as well, it's
> showing this at all times:

can you post
lspci -vv -s 00:1c.3
lspci -vv -s 04:00.0
before reverting enable bridge early patch
and after reverting on 3.17+?

Thanks

Yinghai

Re: Machine crashes right after ~successful resume

On Mon, Oct 27, 2014 at 6:12 PM, Yinghai Lu ying...@kernel.org wrote:
> On Mon, Oct 27, 2014 at 5:03 PM, Wilmer van der Gaast wil...@gaast.net
> wrote:
>> I was curious about that already, did that with a 3.16.6 that I think just
>> had your revert applied (and using lspci -xxxx to get the dump which I
>> assumed would be the same): No changes to 04:00 at all.
>>
>> Confirmed that this is the case with 3.17 + those patches as well, it's
>> showing this at all times:
>
> can you post
> lspci -vv -s 00:1c.3
> lspci -vv -s 04:00.0
> before reverting enable bridge early patch
> and after reverting on 3.17+?

Please check if attached patch could fix the problem on your setup.

Thanks

Yinghai
---
 drivers/pci/quirks.c |6 ++
 1 file changed, 6 insertions(+)

Index: linux-2.6/drivers/pci/quirks.c
===
--- linux-2.6.orig/drivers/pci/quirks.c
+++ linux-2.6/drivers/pci/quirks.c
@@ -3098,6 +3098,12 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_IN
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x8c02, quirk_remove_d3_delay);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x8c22, quirk_remove_d3_delay);
 
+static void enable_pci_bridge_d0(struct pci_dev *dev)
+{
+	pci_set_power_state(dev, PCI_D0);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ITE, 0x8892, enable_pci_bridge_d0);
+
 /*
  * Some devices may pass our check in pci_intx_mask_supported if
  * PCI_COMMAND_INTX_DISABLE works though they actually do not properly
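
The quirk above forces the bridge into D0. The current power state of a device
is held in bits 1:0 of the PMCSR register, which sits 4 bytes into the Power
Management capability. In the dump posted earlier in the thread that
capability is at offset 0x90 (bytes 01 a0 42 fe 00 00), so PMCSR reads 0x0000,
i.e. D0, matching the "Status: D0" line from lspci. A small userspace sketch
of that lookup, with hypothetical names (PM_CAP_ID, PM_CTRL_OFF,
PM_STATE_MASK, pm_state_from_dump()) and the capability offset passed in
rather than found by walking the capability list:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PM_CAP_ID	0x01	/* PCI Power Management capability ID */
#define PM_CTRL_OFF	4	/* PMCSR offset inside the capability */
#define PM_STATE_MASK	0x0003	/* PMCSR bits 1:0: 0 = D0 ... 3 = D3hot */

/* Return the D-state (0..3) recorded in a raw config-space dump, or -1
 * if the bytes at cap_off are not a Power Management capability or the
 * dump is too short to contain PMCSR. */
static int pm_state_from_dump(const uint8_t *cfg, size_t len, size_t cap_off)
{
	uint16_t pmcsr;

	assert(cfg != NULL);
	if (cap_off + PM_CTRL_OFF + 2 > len || cfg[cap_off] != PM_CAP_ID)
		return -1;

	pmcsr = (uint16_t)(cfg[cap_off + PM_CTRL_OFF] |
			   (cfg[cap_off + PM_CTRL_OFF + 1] << 8));
	return pmcsr & PM_STATE_MASK;
}
```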

Re: Machine crashes right after ~successful resume

2014-10-26 Thread Yinghai Lu

On Wed, Oct 22, 2014 at 5:53 AM, Wilmer van der Gaast  wrote:
> That seems to be the case yes:
>
> [  106.661152] PM: ... nb fw_pm_notify+0x0/0x150 done
> [  106.665939] PM: calling nb bsp_pm_callback+0x0/0x50
> [  106.670814] PM: ... nb bsp_pm_callback+0x0/0x50 done
> [  106.675775] pm_restore_console() before move
>
> Then nothing, during the third resume.
>
> http://gaast.net/~wilmer/.lkml/bad3.17-patched-console-restore.txt has
> the full log.
>
> (Some of your other debug lines in your patch don't seem to be logging
> anything during my repro BTW.)

Please try attached two debug patches to check the pci registers
between the suspend/resume.
Subject: [PATCH] pci: print out about pci=dump

debug print out before later driver hang

Signed-off-by: Yinghai Lu 

---
 drivers/pci/pci.c |   52 +++-
 1 file changed, 51 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/pci.c
===
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -3858,6 +3858,54 @@ void __weak pci_fixup_cardbus(struct pci
 }
 EXPORT_SYMBOL(pci_fixup_cardbus);
 
+static void dump_pci_device_range(struct pci_dev *dev, unsigned start_reg,
+	 unsigned size)
+{
+	int i;
+	int j;
+	u32 val;
+	int end = start_reg + size;
+
+	printk(KERN_DEBUG "PCI: %s", pci_name(dev));
+
+	for (i = start_reg; i < end; i += 4) {
+		if (!(i & 0x0f))
+			printk("\n%04x:", i);
+
+		pci_read_config_dword(dev, i, &val);
+		for (j = 0; j < 4; j++) {
+			printk(" %02x", val & 0xff);
+			val >>= 8;
+		}
+	}
+	printk("\n");
+}
+
+static int dump_pci_devices(void)
+{
+	struct pci_dev *dev = NULL;
+
+	while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL)
+		dump_pci_device_range(dev, 0, dev->cfg_size);
+
+	return 0;
+}
+
+static int pci_dump_regs;
+static void pci_dump(void)
+{
+	pci_dump_regs = 1;
+}
+
+static int pci_init(void)
+{
+	if (pci_dump_regs)
+		dump_pci_devices();
+
+	return 0;
+}
+device_initcall(pci_init);
+
 static int __init pci_setup(char *str)
 {
 	while (str) {
@@ -3865,7 +3913,9 @@ static int __init pci_setup(char *str)
 		if (k)
 			*k++ = 0;
 		if (*str && (str = pcibios_setup(str)) && *str) {
-			if (!strcmp(str, "nomsi")) {
+			if (!strcmp(str, "dump")) {
+				pci_dump();
+			} else if (!strcmp(str, "nomsi")) {
 				pci_no_msi();
 			} else if (!strcmp(str, "noaer")) {
 				pci_no_aer();
---
 drivers/pci/pci.c  |2 +-
 kernel/power/suspend.c |2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/pci.c
===
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -4462,7 +4462,7 @@ static void dump_pci_device_range(struct
 	printk("\n");
 }
 
-static int dump_pci_devices(void)
+int dump_pci_devices(void)
 {
 	struct pci_dev *dev = NULL;
 
Index: linux-2.6/kernel/power/suspend.c
===
--- linux-2.6.orig/kernel/power/suspend.c
+++ linux-2.6/kernel/power/suspend.c
@@ -401,6 +401,7 @@ int suspend_devices_and_enter(suspend_st
 	goto Resume_devices;
 }
 
+int dump_pci_devices(void);
 /**
  * suspend_finish - Clean up before finishing the suspend sequence.
  *
@@ -411,6 +412,7 @@ static void suspend_finish(void)
 {
 	suspend_thaw_processes();
 	pm_notifier_call_chain(PM_POST_SUSPEND);
+	dump_pci_devices();
 	pm_restore_console();
 }
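
The dump format produced by dump_pci_device_range() in the first patch is a
"%04x:" offset at every 16-byte boundary, followed by the four bytes of each
32-bit register, lowest byte first. The sketch below reproduces that layout in
userspace so that dumps like the ones in this thread can be regenerated from a
byte array. It is an illustration under assumptions, not the patch itself: it
writes into a caller-supplied buffer instead of calling printk(), it omits the
"PCI: <name>" header line, and format_cfg_dump() is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format len bytes of config space, 16 bytes per line. len must be a
 * multiple of 4. Returns the number of characters written (excluding
 * the terminating NUL), or -1 if out is too small. */
static int format_cfg_dump(const uint8_t *cfg, size_t len,
			   char *out, size_t out_len)
{
	size_t pos = 0;

	assert(cfg != NULL && out != NULL && len % 4 == 0);
	if (out_len == 0)
		return -1;
	out[0] = '\0';

	for (size_t i = 0; i < len; i += 4) {
		/* Assemble the dword as a 32-bit config read would return it. */
		uint32_t val = (uint32_t)cfg[i] | (uint32_t)cfg[i + 1] << 8 |
			       (uint32_t)cfg[i + 2] << 16 |
			       (uint32_t)cfg[i + 3] << 24;
		int n;

		if (!(i & 0x0f)) {
			/* New line with the offset at each 16-byte boundary. */
			n = snprintf(out + pos, out_len - pos, "%s%04zx:",
				     i ? "\n" : "", i);
			if (n < 0 || (size_t)n >= out_len - pos)
				return -1;
			pos += (size_t)n;
		}
		for (int j = 0; j < 4; j++) {
			n = snprintf(out + pos, out_len - pos, " %02x",
				     (unsigned)(val & 0xff));
			if (n < 0 || (size_t)n >= out_len - pos)
				return -1;
			pos += (size_t)n;
			val >>= 8;
		}
	}
	return (int)pos;
}
```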

Re: Machine crashes right after ~successful resume

2014-10-26 Thread Yinghai Lu

On Wed, Oct 22, 2014 at 5:53 AM, Wilmer van der Gaast wil...@gaast.net wrote:
 That seems to be the case yes:

 [  106.661152] PM: ... nb fw_pm_notify+0x0/0x150 done
 [  106.665939] PM: calling nb bsp_pm_callback+0x0/0x50
 [  106.670814] PM: ... nb bsp_pm_callback+0x0/0x50 done
 [  106.675775] pm_restore_console() before move

 Then nothing, during the third resume.

 http://gaast.net/~wilmer/.lkml/bad3.17-patched-console-restore.txt has
 the full log.

 (Some of your other debug lines in your patch don't seem to be logging
 anything during my repro BTW.)

Please try attached two debug patches to check the pci registers
between the suspend/resume.
Subject: [PATCH] pci: print out about pci=dump

debug print out before later driver hang

Signed-off-by: Yinghai Lu ying...@kernel.org

---
 drivers/pci/pci.c |   52 +++-
 1 file changed, 51 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/pci.c
===
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -3858,6 +3858,54 @@ void __weak pci_fixup_cardbus(struct pci
 }
 EXPORT_SYMBOL(pci_fixup_cardbus);
 
+static void dump_pci_device_range(struct pci_dev *dev, unsigned start_reg,
+	 unsigned size)
+{
+	int i;
+	int j;
+	u32 val;
+	int end = start_reg + size;
+
+	printk(KERN_DEBUG PCI: %s, pci_name(dev));
+
+	for (i = start_reg; i  end; i += 4) {
+		if (!(i  0x0f))
+			printk(\n%04x:, i);
+
+		pci_read_config_dword(dev, i, val);
+		for (j = 0; j  4; j++) {
+			printk( %02x, val  0xff);
+			val = 8;
+		}
+	}
+	printk(\n);
+}
+
+static int dump_pci_devices(void)
+{
+	struct pci_dev *dev = NULL;
+
+	while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL)
+		dump_pci_device_range(dev, 0, dev-cfg_size);
+
+	return 0;
+}
+
+static int pci_dump_regs;
+static void pci_dump(void)
+{
+	pci_dump_regs = 1;
+}
+
+static int pci_init(void)
+{
+	if (pci_dump_regs)
+		dump_pci_devices();
+
+	return 0;
+}
+device_initcall(pci_init);
+
 static int __init pci_setup(char *str)
 {
 	while (str) {
@@ -3865,7 +3913,9 @@ static int __init pci_setup(char *str)
 		if (k)
 			*k++ = 0;
 		if (*str  (str = pcibios_setup(str))  *str) {
-			if (!strcmp(str, nomsi)) {
+			if (!strcmp(str, dump)) {
+pci_dump();
+			} else if (!strcmp(str, nomsi)) {
 pci_no_msi();
 			} else if (!strcmp(str, noaer)) {
 pci_no_aer();
---
 drivers/pci/pci.c  |2 +-
 kernel/power/suspend.c |2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/pci.c
===
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -4462,7 +4462,7 @@ static void dump_pci_device_range(struct
 	printk(\n);
 }
 
-static int dump_pci_devices(void)
+int dump_pci_devices(void)
 {
 	struct pci_dev *dev = NULL;
 
Index: linux-2.6/kernel/power/suspend.c
===
--- linux-2.6.orig/kernel/power/suspend.c
+++ linux-2.6/kernel/power/suspend.c
@@ -401,6 +401,7 @@ int suspend_devices_and_enter(suspend_st
 	goto Resume_devices;
 }
 
+int dump_pci_devices(void);
 /**
  * suspend_finish - Clean up before finishing the suspend sequence.
  *
@@ -411,6 +412,7 @@ static void suspend_finish(void)
 {
 	suspend_thaw_processes();
 	pm_notifier_call_chain(PM_POST_SUSPEND);
+	dump_pci_devices();
 	pm_restore_console();
 }
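
The dump format produced by dump_pci_device_range() can be tried outside the kernel. The following is only a user-space sketch under stated assumptions: read_cfg_dword() and format_cfg_range() are hypothetical names, and an in-memory byte array stands in for real config space and pci_read_config_dword().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for pci_read_config_dword(): assemble one
 * little-endian dword from an in-memory copy of config space.
 */
static uint32_t read_cfg_dword(const uint8_t *cfg, unsigned int off)
{
	return (uint32_t)cfg[off] |
	       (uint32_t)cfg[off + 1] << 8 |
	       (uint32_t)cfg[off + 2] << 16 |
	       (uint32_t)cfg[off + 3] << 24;
}

/* Append formatted text to out; later calls are skipped once it is full. */
#define EMIT(...)							\
	do {								\
		if (n < outlen)						\
			n += (size_t)snprintf(out + n, outlen - n,	\
					      __VA_ARGS__);		\
	} while (0)

/*
 * Format [start, start + size) the way the debug patch prints it:
 * a "%04x:" offset label every 16 bytes, then the bytes in address order.
 * A return value >= outlen means the output was truncated.
 */
static size_t format_cfg_range(const uint8_t *cfg, unsigned int start,
			       unsigned int size, char *out, size_t outlen)
{
	size_t n = 0;
	unsigned int i, j;

	for (i = start; i < start + size; i += 4) {
		uint32_t val = read_cfg_dword(cfg, i);

		if (!(i & 0x0f))
			EMIT("\n%04x:", i);
		/* Peel off the low byte first so bytes appear in address order. */
		for (j = 0; j < 4; j++) {
			EMIT(" %02x", (unsigned int)(val & 0xff));
			val >>= 8;
		}
	}
	EMIT("\n");
	return n;
}
```

Because each dword is printed low byte first, the dump reads byte by byte in address order, which makes it easy to diff a dump taken at boot (pci=dump) against one taken after resume.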

Re: Machine crashes right after ~successful resume

2014-10-22 Thread Wilmer van der Gaast

Hello Yinghai,

This looks more promising!

Yinghai Lu (ying...@kernel.org) wrote:
> >
> > And then nothing, and it's hung. Looks the same to me (apart from the tsc
> > issues + hpet switch) as a successful resume:
> 
> then it stuck in pm_restore_console()?
> 
That seems to be the case yes:

[  106.661152] PM: ... nb fw_pm_notify+0x0/0x150 done
[  106.665939] PM: calling nb bsp_pm_callback+0x0/0x50
[  106.670814] PM: ... nb bsp_pm_callback+0x0/0x50 done
[  106.675775] pm_restore_console() before move

Then nothing, during the third resume.

http://gaast.net/~wilmer/.lkml/bad3.17-patched-console-restore.txt has
the full log.

(Some of your other debug lines in your patch don't seem to be logging
anything during my repro BTW.)


Wilmer v/d Gaast.

-- 
+ .''`. - -- ---+  +- -- ---  - --+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---+  +-- -  --- -- -+

Re: Machine crashes right after ~successful resume

2014-10-21 Thread Yinghai Lu

On Tue, Oct 21, 2014 at 2:40 PM, Wilmer van der Gaast  wrote:
> Hello,
>
> Sorry for the delay, finally poked at this again. It looks like the
> no_console_suspend flag was causing troubles, which I didn't really need
> anyway with logging going to my serial port.
>
> This is what I get now on the failing resume:
>
> [  112.879390] PM: resume of devices complete after 2239.905 msecs
> [  112.880068] r8169 :07:00.0 eth0: link up
> [  112.880078] Switched to clocksource hpet
> [  116.069248] PM: Finishing wakeup.
> [  116.072574] Restarting tasks ... done.
> [  116.076664] PM: calling nb rcu_pm_notify+0x0/0x60
> [  116.081439] PM: ... nb rcu_pm_notify+0x0/0x60 done
> [  116.086267] PM: calling nb cpu_hotplug_pm_callback+0x0/0x50
> [  116.088526] systemd[1]: Got notification message for unit
> systemd-journald.service
> [  116.099442] PM: ... nb cpu_hotplug_pm_callback+0x0/0x50 done
> [  116.105099] PM: calling nb fw_pm_notify+0x0/0x150
> [  116.109812] PM: ... nb fw_pm_notify+0x0/0x150 done
> [  116.114623] PM: calling nb bsp_pm_callback+0x0/0x50
> [  116.119504] PM: ... nb bsp_pm_callback+0x0/0x50 done
>
> And then nothing, and it's hung. Looks the same to me (apart from the tsc
> issues + hpet switch) as a successful resume:

then it stuck in pm_restore_console()?

Please check attached debug patch.

Thanks

Yinghai
---
 kernel/power/console.c |9 +
 1 file changed, 9 insertions(+)

Index: linux-2.6/kernel/power/console.c
===
--- linux-2.6.orig/kernel/power/console.c
+++ linux-2.6/kernel/power/console.c
@@ -51,6 +51,7 @@ void pm_vt_switch_required(struct device
 		if (tmp->dev == dev) {
 			/* already registered, update requirement */
 			tmp->required = required;
+			dev_info(dev, "pm_vt_switch_required() update %d\n", required);
 			goto out;
 		}
 	}
@@ -61,6 +62,7 @@ void pm_vt_switch_required(struct device
 
 	entry->required = required;
 	entry->dev = dev;
+	dev_info(dev, "pm_vt_switch_required() added %d\n", required);
 
 	list_add(&entry->head, &pm_vt_switch_list);
 out:
@@ -81,6 +83,7 @@ void pm_vt_switch_unregister(struct devi
 	mutex_lock(&vt_switch_mutex);
 	list_for_each_entry(tmp, &pm_vt_switch_list, head) {
 		if (tmp->dev == dev) {
+			dev_info(dev, "pm_vt_switch_required() removed %d\n", tmp->required);
 			list_del(&tmp->head);
 			kfree(tmp);
 			break;
@@ -131,11 +134,14 @@ int pm_prepare_console(void)
 	if (!pm_vt_switch())
 		return 0;
 
+	pr_info("pm_prepare_console() before move\n");
 	orig_fgconsole = vt_move_to_console(SUSPEND_CONSOLE, 1);
 	if (orig_fgconsole < 0)
 		return 1;
 
+	pr_info("pm_prepare_console() before redirect\n");
 	orig_kmsg = vt_kmsg_redirect(SUSPEND_CONSOLE);
+	pr_info("pm_prepare_console() done\n");
 	return 0;
 }
 
@@ -145,7 +151,10 @@ void pm_restore_console(void)
 		return;
 
 	if (orig_fgconsole >= 0) {
+		pr_info("pm_restore_console() before move\n");
 		vt_move_to_console(orig_fgconsole, 0);
+		pr_info("pm_restore_console() before redirect\n");
 		vt_kmsg_redirect(orig_kmsg);
+		pr_info("pm_restore_console() done\n");
 	}
 }

Re: Machine crashes right after ~successful resume

2014-10-21 Thread Wilmer van der Gaast


Hello,

Sorry for the delay, finally poked at this again. It looks like the 
no_console_suspend flag was causing troubles, which I didn't really need 
anyway with logging going to my serial port.


This is what I get now on the failing resume:

[  112.879390] PM: resume of devices complete after 2239.905 msecs
[  112.880068] r8169 :07:00.0 eth0: link up
[  112.880078] Switched to clocksource hpet
[  116.069248] PM: Finishing wakeup.
[  116.072574] Restarting tasks ... done.
[  116.076664] PM: calling nb rcu_pm_notify+0x0/0x60
[  116.081439] PM: ... nb rcu_pm_notify+0x0/0x60 done
[  116.086267] PM: calling nb cpu_hotplug_pm_callback+0x0/0x50
[  116.088526] systemd[1]: Got notification message for unit 
systemd-journald.service

[  116.099442] PM: ... nb cpu_hotplug_pm_callback+0x0/0x50 done
[  116.105099] PM: calling nb fw_pm_notify+0x0/0x150
[  116.109812] PM: ... nb fw_pm_notify+0x0/0x150 done
[  116.114623] PM: calling nb bsp_pm_callback+0x0/0x50
[  116.119504] PM: ... nb bsp_pm_callback+0x0/0x50 done

And then nothing, and it's hung. Looks the same to me (apart from the 
tsc issues + hpet switch) as a successful resume:


[   95.499513] PM: resume of devices complete after 1240.115 msecs
[   96.368940] r8169 :07:00.0 eth0: link up
[   98.676455] PM: Finishing wakeup.
[   98.679765] Restarting tasks ... done.
[   98.683821] PM: calling nb rcu_pm_notify+0x0/0x60
[   98.688524] PM: ... nb rcu_pm_notify+0x0/0x60 done
[   98.692044] systemd[1]: Got notification message for unit 
systemd-journald.service

[   98.700897] PM: calling nb cpu_hotplug_pm_callback+0x0/0x50
[   98.706470] PM: ... nb cpu_hotplug_pm_callback+0x0/0x50 done
[   98.712132] PM: calling nb fw_pm_notify+0x0/0x150
[   98.716848] PM: ... nb fw_pm_notify+0x0/0x150 done
[   98.721644] PM: calling nb bsp_pm_callback+0x0/0x50
[   98.726536] PM: ... nb bsp_pm_callback+0x0/0x50 done

Full logs in http://gaast.net/~wilmer/.lkml/bad3.17-patched-megadebug.txt


Wilmer v/d Gaast.

--
+ .''`. - -- ---+  +- -- ---  - --+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---+  +-- -  --- -- -+

Re: Machine crashes right after ~successful resume

2014-10-19 Thread Wilmer van der Gaast


Hello,

On 19-10-14 05:29, Yinghai Lu wrote:
>
> Please try to "debug ignore_loglevel no_console_suspend".
>
Same thing. :-(

[   72.572354] Restarting tasks ... done.
[   72.576554] PM: calling nb rcu_pm_notify+0x0/0x60
[   72.581277] PM: ... nb rcu_pm_notify+0x0/0x60 done
[   72.586115] PM: calling nb cpu_hotplug_pm_callback+0x0/0x50
[   72.591692] PM: ... nb cpu_hotplug_pm_callback+0x0/0x50 done
[   72.597345] PM: calling nb fw_pm_notify+0x0/0x150
[   72.602047] PM: ... nb fw_pm_notify+0x0/0x150 done
[   72.606839] PM: calling nb bsp_pm_callback+0x0/0x50
[   72.611711] PM: ... nb bsp_pm_callback+0x0/0x50 done
[   73.382175] r8169 :07:00.0 eth0: link up
[   78.857526] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   79.025718] ata3.00: configured for UDMA/133
[   81.379533] ata4: softreset failed (device not ready)
[   82.623212] PM: Syncing filesystems ... done.
[   82.661564] PM: Preparing system for mem sleep
[   82.669405] Freezing user space processes ... (elapsed 0.001 seconds) 
done.
[   82.677729] Freezing remaining freezable tasks ... (elapsed 0.001 
seconds) done.

[   82.686338] PM: Entering mem sleep

And nothing related to resume. :-(

Is there any point of me retrying with the initcall_debug flag but 
without your patch?


Looking at your patch again, it seems pretty mad that this would cause 
such a big difference. Overnight I remembered how my machine has TSC 
issues at the time this bug shows, so I tried setting hpet as the 
clocksource. (hpet=force on the cmdline did not seem to have that effect 
so I used sysfs instead) No effect either.
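
For reference, switching the clocksource through sysfs normally means writing its name to the current_clocksource file. A minimal C sketch of that step, assuming the standard sysfs path; set_clocksource() is a hypothetical helper and takes the path as a parameter so it can be exercised against an ordinary file:

```c
#include <stdio.h>

/* Standard sysfs location of the active clocksource on Linux. */
#define CLOCKSOURCE_PATH \
	"/sys/devices/system/clocksource/clocksource0/current_clocksource"

/*
 * Write a clocksource name (for example "hpet") to the given file.
 * Returns 0 on success and -1 on failure. Writing to the real sysfs
 * path needs root.
 */
static int set_clocksource(const char *path, const char *name)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs(name, f) == EOF) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes, so a rejected write can surface here. */
	return fclose(f) == 0 ? 0 : -1;
}
```

Calling set_clocksource(CLOCKSOURCE_PATH, "hpet") as root would request the switch; reading the same file back shows which clocksource is actually in use.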


I need to go now, can experiment a little more tonight.


Thanks,

Wilmer v/d Gaast.

--
+ .''`. - -- ---+  +- -- ---  - --+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---+  +-- -  --- -- -+

Re: Machine crashes right after ~successful resume

2014-10-19 Thread Pavel Machek

On Sun 2014-10-19 00:57:12, Wilmer van der Gaast wrote:
> (Resending, forgot to hit reply-to-all.)
> 
> Hello Yinghai,
> 
> On 18-10-14 22:28, Yinghai Lu wrote:
> >
> > Please apply attached debug patch on top of v3.17 and boot with
> > "debug ignore_loglevel initcall_debug no_console_suspend".
> >
> > Hope we can find out which nb notifier cause problem.
> >
> Did that. Strangely, or better said, quite annoyingly, I'm now getting no
> output anymore at all on the third resume! :-(
> 
> I could try non-serial instead if you think that's worth a shot, but the
> most annoying thing is that my video doesn't get initialised properly after
> resume unless I have the tainting nvidia driver loaded. I could try if
> nouveau helps.

Tainting should not be a problem. If it works for you, it works...

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: Machine crashes right after ~successful resume

On Sat, Oct 18, 2014 at 4:57 PM, Wilmer van der Gaast  wrote:
> On 18-10-14 22:28, Yinghai Lu wrote:
>>
>> Please apply attached debug patch on top of v3.17 and boot with
>> "debug ignore_loglevel initcall_debug no_console_suspend".
>>
>> Hope we can find out which nb notifier cause problem.
>>
> Did that. Strangely, or better said, quite annoyingly, I'm now getting no
> output anymore at all on the third resume! :-(
>
> I could try non-serial instead if you think that's worth a shot, but the
> most annoying thing is that my video doesn't get initialised properly after
> resume unless I have the tainting nvidia driver loaded. I could try if
> nouveau helps.

oh no.

Please try to "debug ignore_loglevel no_console_suspend".

Thanks

Yinghai

Re: Machine crashes right after ~successful resume

2014-10-18 Thread Wilmer van der Gaast


(Resending, forgot to hit reply-to-all.)

Hello Yinghai,

On 18-10-14 22:28, Yinghai Lu wrote:
>
> Please apply attached debug patch on top of v3.17 and boot with
> "debug ignore_loglevel initcall_debug no_console_suspend".
>
> Hope we can find out which nb notifier cause problem.
>
Did that. Strangely, or better said, quite annoyingly, I'm now getting 
no output anymore at all on the third resume! :-(


I could try non-serial instead if you think that's worth a shot, but the 
most annoying thing is that my video doesn't get initialised properly 
after resume unless I have the tainting nvidia driver loaded. I could 
try if nouveau helps.


I've dropped all the debugging output in the same directory as before;
look for files named like
http://roy.gaast.net/~wilmer/.lkml/bad3.17-patched-initcall.txt



Thanks,

Wilmer v/d Gaast.

--
+ .''`. - -- ---+  +- -- ---  - --+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---+  +-- -  --- -- -+

Re: Machine crashes right after ~successful resume

On Thu, Oct 16, 2014 at 2:08 PM, Wilmer van der Gaast  wrote:
> Did that on this run, no difference either. For full completeness, I
> reproduced this problem with no modules loaded (done from initramfs) at all,
> with a kernel with your workaround included, logs are here:
> http://gaast.net/~wilmer/.lkml/bad3.17-patched-debug-initramfs.txt

Yes, those output are good.

Please apply attached debug patch on top of v3.17 and boot with
"debug ignore_loglevel initcall_debug no_console_suspend".

Hope we can find out which nb notifier cause problem.

Thanks

Yinghai
---
 kernel/notifier.c   |9 +
 kernel/power/main.c |4 +++-
 2 files changed, 12 insertions(+), 1 deletion(-)

Index: linux-2.6/kernel/power/main.c
===
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -24,16 +24,18 @@ DEFINE_MUTEX(pm_mutex);
 
 /* Routines for PM-transition notifications */
 
-static BLOCKING_NOTIFIER_HEAD(pm_chain_head);
+BLOCKING_NOTIFIER_HEAD(pm_chain_head);
 
 int register_pm_notifier(struct notifier_block *nb)
 {
+	pr_info("PM: registering nb %pF\n", nb->notifier_call);
 	return blocking_notifier_chain_register(&pm_chain_head, nb);
 }
 EXPORT_SYMBOL_GPL(register_pm_notifier);
 
 int unregister_pm_notifier(struct notifier_block *nb)
 {
+	pr_info("PM: unregistering nb %pF\n", nb->notifier_call);
 	return blocking_notifier_chain_unregister(&pm_chain_head, nb);
 }
 EXPORT_SYMBOL_GPL(unregister_pm_notifier);
Index: linux-2.6/kernel/notifier.c
===
--- linux-2.6.orig/kernel/notifier.c
+++ linux-2.6/kernel/notifier.c
@@ -59,6 +59,9 @@ static int notifier_chain_unregister(str
 	return -ENOENT;
 }
 
+extern struct blocking_notifier_head pm_chain_head;
+#define PM_POST_SUSPEND		0x0004 /* Suspend finished */
+
 /**
  * notifier_call_chain - Informs the registered notifiers about an event.
  *	@nl:		Pointer to head of the blocking notifier chain
@@ -90,8 +93,14 @@ static int notifier_call_chain(struct no
 			continue;
 		}
 #endif
+		if (nl == &pm_chain_head.head && val == PM_POST_SUSPEND)
+			pr_info("PM: calling nb %pF\n", nb->notifier_call);
+
 		ret = nb->notifier_call(nb, val, v);
 
+		if (nl == &pm_chain_head.head && val == PM_POST_SUSPEND)
+			pr_info("PM: ... nb %pF done\n", nb->notifier_call);
+
 		if (nr_calls)
 			(*nr_calls)++;
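
The idea behind this debug patch is to bracket every PM notifier callback with a log line, so that a callback which never returns leaves a "calling" line with no matching "done". A user-space sketch of the same bracketing follows; struct notifier, call_chain_traced() and sample_notify() are illustrative names, not kernel API, and a plain string stands in for the %pF symbol output.

```c
#include <stddef.h>
#include <stdio.h>

#define PM_POST_SUSPEND 0x0004	/* same value the debug patch defines */

/* Illustrative model of a notifier block; "name" stands in for %pF output. */
struct notifier {
	const char *name;
	int (*call)(unsigned long event);
	struct notifier *next;
};

/* Callback entered most recently, and the last one that returned. */
static const char *last_started;
static const char *last_finished;

/*
 * Walk the chain, logging before and after every callback for the event
 * of interest. Returns how many callbacks ran to completion.
 */
static int call_chain_traced(struct notifier *head, unsigned long event)
{
	int done = 0;
	struct notifier *nb;

	for (nb = head; nb; nb = nb->next) {
		if (event == PM_POST_SUSPEND)
			fprintf(stderr, "PM: calling nb %s\n", nb->name);
		last_started = nb->name;

		nb->call(event);

		last_finished = nb->name;
		if (event == PM_POST_SUSPEND)
			fprintf(stderr, "PM: ... nb %s done\n", nb->name);
		done++;
	}
	return done;
}

/* Hypothetical callback that simply succeeds. */
static int sample_notify(unsigned long event)
{
	(void)event;
	return 0;
}
```

If a callback hung, last_started and last_finished would differ and the final log line would name the culprit. When every "calling" line is paired with a "done" line, the hang has to be after the notifier chain.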

Re: Machine crashes right after ~successful resume


Hello,

I have filed a bug now:
https://bugzilla.kernel.org/show_bug.cgi?id=86421
We should probably continue the discussion there now? I've added just you
to the CC field, not sure who else on this thread is still interested at
this point.


On 16-10-14 17:36, Yinghai Lu wrote:
>
> Can you put "debug ignore_loglevel" in boot command line?
> So we can compare output from serial console between good one and bad
> one directly.
>
Did that, will throw the output in the same log dir. Those arguments
resulted in very little extra output. :-/

> Also did you try to remove r8169 every time before suspend?
>
Did that on this run, no difference either. For full completeness, I
reproduced this problem with no modules loaded (done from initramfs) at
all, with a kernel with your workaround included, logs are here:
http://gaast.net/~wilmer/.lkml/bad3.17-patched-debug-initramfs.txt



Wilmer v/d Gaast.

--
+ .''`. - -- ---+  +- -- ---  - --+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---+  +-- -  --- -- -+

Re: Machine crashes right after ~successful resume

2014-10-16 Thread Yinghai Lu

On Thu, Oct 16, 2014 at 2:36 AM, Wilmer van der Gaast  wrote:
> Hello,
>
> On 16-10-14 05:32, Yinghai Lu wrote:
>>
>>
>> Can you please try attached patch? that should workaround the problem.
>>
> Sadly, no luck. (I do assume you meant me to use the patch against a clean
> 3.17 tree *without* yesterday's revert patch applied.) Back to a crash
> at/after the third resume:
>
> [  372.502897] usb 3-1.1: reset high-speed USB device number 3 using ehci-pci
> [  372.678765] usb 2-1.5: reset low-speed USB device number 3 using ehci-pci
> [  373.398437] Clocksource tsc unstable (delta = -136457848 ns)
> [  373.897503] Switched to clocksource hpet
> [  373.897536] PM: resume of devices complete after 2143.535 msecs
> [  373.898225] r8169 :07:00.0 eth0: link up
> [  374.319311] Restarting tasks ... done.
> (And then nothing.)
>
> Interestingly I did see the "resume of devices" time grow on each resume
> again this time. I'll put the full dmesg dump in the same place like before:
> http://gaast.net/~wilmer/.lkml/

Checked that dmesg and console output, looks ok from last resume.

Can you put "debug ignore_loglevel" in boot command line?
So we can compare output from serial console between good one and bad
one directly.
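
A minimal sketch of that comparison, assuming both serial-console captures have been saved to files. printk timestamps differ on every boot, so the helper strips them before diffing. The names `strip_ts` and `compare_logs` are illustrative, not existing tools.

```shell
#!/bin/sh
# Remove the leading "[  372.502897] " printk timestamp from each line.
strip_ts() {
    sed -E 's/^\[ *[0-9]+\.[0-9]+\] //'
}

# Diff two console captures with their timestamps removed.
# Exit status is that of diff: 0 when identical, 1 when they differ.
compare_logs() {
    good_tmp=$(mktemp) || return 2
    bad_tmp=$(mktemp) || { rm -f "$good_tmp"; return 2; }
    strip_ts < "$1" > "$good_tmp"
    strip_ts < "$2" > "$bad_tmp"
    diff -u "$good_tmp" "$bad_tmp"
    status=$?
    rm -f "$good_tmp" "$bad_tmp"
    return "$status"
}
```

Calling `compare_logs good.log bad.log` (hypothetical file names) then shows only the messages that differ between the two runs.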

Also did you try to remove r8169 every time before suspend?

Thanks

Yinghai

Re: Machine crashes right after ~successful resume


Hello,

On 16-10-14 05:32, Yinghai Lu wrote:


> Can you please try attached patch? that should workaround the problem.

Sadly, no luck. (I do assume you meant me to use the patch against a 
clean 3.17 tree *without* yesterday's revert patch applied.) Back to a 
crash at/after the third resume:


[  372.502897] usb 3-1.1: reset high-speed USB device number 3 using ehci-pci
[  372.678765] usb 2-1.5: reset low-speed USB device number 3 using ehci-pci
[  373.398437] Clocksource tsc unstable (delta = -136457848 ns)
[  373.897503] Switched to clocksource hpet
[  373.897536] PM: resume of devices complete after 2143.535 msecs
[  373.898225] r8169 :07:00.0 eth0: link up
[  374.319311] Restarting tasks ... done.
(And then nothing.)

Interestingly I did see the "resume of devices" time grow on each resume 
again this time. I'll put the full dmesg dump in the same place like 
before: http://gaast.net/~wilmer/.lkml/


There's a lspci -vv dump there as well, as Bjorn asked for. I'll file a 
bug on bugzilla tonight.



> as some driver is using pci_enable_device in .resume instead of
> pci_reenable_device

Maybe this doesn't matter, but I could reproduce this issue even with no 
modules loaded at all (so barebone that I couldn't even mount my rootfs 
and had to do this testing in the initrd), so with only mainline kernel 
code running.



Thanks,

Wilmer v/d Gaast.


Re: Machine crashes right after ~successful resume

On Wed, Oct 15, 2014 at 4:34 PM, Wilmer van der Gaast  wrote:
>
> Is there anything I can do now to find out why your change is causing my
> machine to crash?

Can you please try the attached patch? It should work around the problem,
as some driver is using pci_enable_device() in .resume instead of
pci_reenable_device().

We should skip pci_enable_bridge() in those pci_enable_device() calls to
avoid contention between async device_resume() runs.

Thanks

Yinghai
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 625a4ac..6567831 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1266,7 +1266,6 @@ static void pci_enable_bridge(struct pci_dev *dev)
 
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
-	struct pci_dev *bridge;
 	int err;
 	int i, bars = 0;
 
@@ -1285,9 +1284,19 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 	if (atomic_inc_return(&dev->enable_cnt) > 1)
 		return 0;		/* already enabled */
 
-	bridge = pci_upstream_bridge(dev);
-	if (bridge)
-		pci_enable_bridge(bridge);
+	/*
+	 * Do not enable bridge again on resume path, as parent state
+	 * get restored before.
+	 * Also could avoid delay between different async resume.
+	 */
+	if (!(dev->dev.power.is_suspended ||
+	  dev->dev.power.is_noirq_suspended ||
+	  dev->dev.power.is_late_suspended)) {
+		struct pci_dev *bridge = pci_upstream_bridge(dev);
+
+		if (bridge)
+			pci_enable_bridge(bridge);
+	}
 
 	/* only skip sriov related */
 	for (i = 0; i <= PCI_ROM_RESOURCE; i++)

Re: Machine crashes right after ~successful resume


Hello Yinghai,

On 15-10-14 19:39, Yinghai Lu wrote:


> so third resume will not work? that is strange.
> second and third should not use same code path...

Always exactly the third time, yes. Seems strange indeed. :-( I was 
under the impression that on each resume, completion time of device 
resumes was growing, and wondered whether that could be related. However 
looking back at my logs, this is not consistent, in some cases the time 
is constant.


Anyway, your patch works! Had to tweak it slightly to apply cleanly to 
the 3.17 tarball I have, but my machine now went through eleven 
successful suspend+resume cycles again.
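
How such a run of cycles might be scripted, as a sketch only. It assumes `rtcwake` from util-linux is installed, that the RTC can wake this machine, and that the script runs as root; it is not necessarily how the cycles above were driven. `run_cycles` and `SUSPEND_CMD` are illustrative names.

```shell
#!/bin/sh
# Command used for one suspend/resume cycle (assumption: rtcwake suspends
# to RAM and wakes the machine again after 30 seconds).
: "${SUSPEND_CMD:=rtcwake -m mem -s 30}"

# Run the given number of cycles, printing the cycle number before each
# suspend so the last line on the console shows where a hang occurred.
run_cycles() {
    total=$1
    n=1
    while [ "$n" -le "$total" ]; do
        echo "cycle $n of $total"
        # Unquoted on purpose so the command string splits into words.
        $SUSPEND_CMD || { echo "cycle $n failed" >&2; return 1; }
        n=$((n + 1))
    done
    echo "$total cycles completed"
}
```

With the crash appearing on the third resume, `run_cycles 5` would be enough to tell a bad kernel from a good one.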


Is there anything I can do now to find out why your change is causing my 
machine to crash?


Thank you!


Wilmer v/d Gaast.


Re: Machine crashes right after ~successful resume

On Wed, Oct 15, 2014 at 6:58 AM, Bjorn Helgaas  wrote:
> [+cc Yinghai, author of 928bea964827 ("PCI: Delay enabling bridges
> until they're needed")]
>
> On Wed, Oct 15, 2014 at 5:16 AM, Wilmer van der Gaast 
>> Not sure why 2e8b... was initially found guilty by git bisect, I fear
>> that my testing was not thorough enough. I've verified a couple of times
>> now that 928bea96... does cause crashes and the previous revision does not.

so third resume will not work? that is strange.
second and third should not use same code path...

>>
>> 928bea... seems to reshuffle PCI initialisation a little bit and has
>> caused more troubles, judging from a Google query for it. Some changes
>> were made already as a result, and this unfortunately makes a revert on
>> a later kernel tree (to see if that fixes the problem for me) much less
>> straight-forward. :-(
>
> More details (from initial post) here: http://roy.gaast.net/~wilmer/.lkml/

Please check whether the attached revert patch works on 3.17.

Yinghai
diff --git a/arch/arm/kernel/bios32.c b/arch/arm/kernel/bios32.c
index 17a26c1..306ca53 100644
--- a/arch/arm/kernel/bios32.c
+++ b/arch/arm/kernel/bios32.c
@@ -538,6 +538,12 @@ void pci_common_init_dev(struct device *parent, struct hw_pci *hw)
 			 * Assign resources.
 			 */
 			pci_bus_assign_resources(bus);
+
+
+			/*
+			 * Enable bridges
+			 */
+			pci_enable_bridges(bus);
 		}
 
 		/*
diff --git a/arch/m68k/coldfire/pci.c b/arch/m68k/coldfire/pci.c
index df96792..b33f97a 100644
--- a/arch/m68k/coldfire/pci.c
+++ b/arch/m68k/coldfire/pci.c
@@ -319,6 +319,7 @@ static int __init mcf_pci_init(void)
 	pci_fixup_irqs(pci_common_swizzle, mcf_pci_map_irq);
 	pci_bus_size_bridges(rootbus);
 	pci_bus_assign_resources(rootbus);
+	pci_enable_bridges(rootbus);
 	return 0;
 }
 
diff --git a/arch/mips/pci/pci.c b/arch/mips/pci/pci.c
index 1bf60b1..4f2e17d 100644
--- a/arch/mips/pci/pci.c
+++ b/arch/mips/pci/pci.c
@@ -113,6 +113,7 @@ static void pcibios_scanbus(struct pci_controller *hose)
 		if (!pci_has_flag(PCI_PROBE_ONLY)) {
 			pci_bus_size_bridges(bus);
 			pci_bus_assign_resources(bus);
+			pci_enable_bridges(bus);
 		}
 	}
 }
diff --git a/arch/sh/drivers/pci/pci.c b/arch/sh/drivers/pci/pci.c
index 1bc09ee..5272327 100644
--- a/arch/sh/drivers/pci/pci.c
+++ b/arch/sh/drivers/pci/pci.c
@@ -69,6 +69,7 @@ static void pcibios_scanbus(struct pci_channel *hose)
 
 		pci_bus_size_bridges(bus);
 		pci_bus_assign_resources(bus);
+		pci_enable_bridges(bus);
 	} else {
 		pci_free_resource_list(&resources);
 	}
diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index cd4de7e..c15bc3c 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -614,6 +614,9 @@ static int acpi_pci_root_add(struct acpi_device *device,
 	if (system_state != SYSTEM_BOOTING) {
 		pcibios_resource_survey_bus(root->bus);
 		pci_assign_unassigned_root_bus_resources(root->bus);
+
+		/* need to after hot-added ioapic is registered */
+		pci_enable_bridges(root->bus);
 	}
 
 	pci_lock_rescan_remove();
diff --git a/drivers/parisc/lba_pci.c b/drivers/parisc/lba_pci.c
index 37e71ff..19f6f70 100644
--- a/drivers/parisc/lba_pci.c
+++ b/drivers/parisc/lba_pci.c
@@ -1590,6 +1590,7 @@ lba_driver_probe(struct parisc_device *dev)
 		lba_dump_res(&lba_dev->hba.lmmio_space, 2);
 #endif
 	}
+	pci_enable_bridges(lba_bus);
 
 	/*
 	** Once PCI register ops has walked the bus, access to config
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 73aef51..761601e 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -283,6 +283,26 @@ void pci_bus_add_devices(const struct pci_bus *bus)
 }
 EXPORT_SYMBOL(pci_bus_add_devices);
 
+void pci_enable_bridges(struct pci_bus *bus)
+{
+	struct pci_dev *dev;
+	int retval;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+		if (dev->subordinate) {
+			if (!pci_is_enabled(dev)) {
+				retval = pci_enable_device(dev);
+				if (retval)
+					dev_err(&dev->dev, "Error enabling bridge (%d), continuing\n", retval);
+				pci_set_master(dev);
+			}
+			pci_enable_bridges(dev->subordinate);
+		}
+	}
+}
+EXPORT_SYMBOL(pci_enable_bridges);
+
+
 /** pci_walk_bus - walk devices on/under bus, calling callback.
  *  @top  bus whose devices should be walked
  *  @cb   callback to be called for each device found
diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index bcb90e4..8fadc84 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -511,6 +511,7 @@ static void enable_slot(struct acpiphp_slot *slot)
 	acpiphp_sanitize_bus(bus);
 	pcie_bus_configure_settings(bus);
 	acpiphp_set_acpi_region(slot);
+	pci_enable_bridges(bus);
 
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		/* Assume that newly added devices are powered on already. */
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 625a4ac..4121518 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1242,31 +1242,8 @@ int pci_reenable_device(struct pci_dev *dev)
 }

Re: Machine crashes right after ~successful resume

2014-10-15 Thread Bjorn Helgaas

[+cc Yinghai, author of 928bea964827 ("PCI: Delay enabling bridges
until they're needed")]

On Wed, Oct 15, 2014 at 5:16 AM, Wilmer van der Gaast  wrote:
> Hello Rafael,
>
> Rafael J. Wysocki (r...@rjwysocki.net) wrote:
>> > Would it be feasible to revert 2e8b... to see if it fixes it on 3.17?
>> That's a merge, isn't it?
>>
> Correct, it was, and I did try to figure out which of its parents was
> the guilty one, but then I found out the real problem is
> 928bea964827d7824b548c1f8e06eccbbc4d0d7d.
>
> Not sure why 2e8b... was initially found guilty by git bisect, I fear
> that my testing was not thorough enough. I've verified a couple of times
> now that 928bea96... does cause crashes and the previous revision does not.
>
> 928bea... seems to reshuffle PCI initialisation a little bit and has
> caused more troubles, judging from a Google query for it. Some changes
> were made already as a result, and this unfortunately makes a revert on
> a later kernel tree (to see if that fixes the problem for me) much less
> straight-forward. :-(

More details (from initial post) here: http://roy.gaast.net/~wilmer/.lkml/

Can you open a report at http://bugzilla.kernel.org, please?  Please
also attach the complete "lspci -vv" output.

Bjorn

Re: Machine crashes right after ~successful resume

Hello Rafael,

Rafael J. Wysocki (r...@rjwysocki.net) wrote:
> > Would it be feasible to revert 2e8b... to see if it fixes it on 3.17?
> That's a merge, isn't it?
> 
Correct, it was, and I did try to figure out which of its parents was
the guilty one, but then I found out the real problem is
928bea964827d7824b548c1f8e06eccbbc4d0d7d.

Not sure why 2e8b... was initially found guilty by git bisect, I fear
that my testing was not thorough enough. I've verified a couple of times
now that 928bea96... does cause crashes and the previous revision does not.

928bea... seems to reshuffle PCI initialisation a little bit and has
caused more troubles, judging from a Google query for it. Some changes
were made already as a result, and this unfortunately makes a revert on
a later kernel tree (to see if that fixes the problem for me) much less
straight-forward. :-(

I can look at the code and see how to revert this now, but I'm
definitely not very proficient outside userland.

Wilmer v/d Gaast.


Re: Machine crashes right after ~successful resume

2014-10-13 Thread Rafael J. Wysocki

On Sunday, October 12, 2014 10:40:32 PM Pavel Machek wrote:
> Bjorn, any ideas?
> 
> Would it be feasible to revert 2e8b... to see if it fixes it on 3.17?

That's a merge, isn't it?

I'd rather check what the pci/misc branch was based on and then bisect that
branch.

If you do

$ git show fed2451

you'll see (among other things) that this indeed is the PCI branch merged
by that commit and that it is based on

3b2f64d00c46 Linux 3.11-rc2

So, you can do

$ git bisect 3b2f64d00c46..fed2451

and see which of the commits in there introduced the problem you're seeing.

Note: Test fed2451 itself *first* and if that is bad already, then the merge
itself was problematic, in which case please let me know.
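
The range described above can be handed to `git bisect start`, which takes the bad commit first and the good one second. A minimal sketch follows; `bisect_branch` is an illustrative helper name, not a git command.

```shell
#!/bin/sh
# Start a bisect between a known-good base and a known-bad tip.
# Arguments: $1 = good commit, $2 = bad commit.
bisect_branch() {
    good=$1
    bad=$2
    # git bisect start expects <bad> before <good>.
    git bisect start "$bad" "$good"
}
```

For this thread that would be `bisect_branch 3b2f64d00c46 fed2451`, run inside the kernel tree. After each build and test, mark the result with `git bisect good` or `git bisect bad`, and finish with `git bisect reset`.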


> On Sun 2014-10-12 16:49:18, Wilmer van der Gaast wrote:
> > Hello,
> > 
> > Many thanks for your response!
> > 
> > On 12-10-14 15:30, Pavel Machek wrote:
> > >
> > >Has it ever worked ok? ...aha, in 3.10, ok.
> > >
> > Correct. And I've tried a few more kernels now, compiled on my own. 3.17
> > still has this issue, 3.10 is completely fine all the way up to 3.10.57
> > (I've tested just under 50 cycles last night). 3.11 I tried but it seems to
> > have other suspend-resume stability issues not present anymore in later
> > kernels, I've mostly not used those results.
> > 
> > git bisect: I've finally succeeded! I've tried automating it completely, but
> > sadly Gigabyte couldn't be bothered wiring up the motherboard to make the
> > watchdog work. :-(
> > 
> > The culprit appears to be this one: 2e8b5f621dbe29425906852c6079afb6b28720cb
> > 
> > Merge: 07f2daa fed2451
> > Author: Bjorn Helgaas 
> > Date:   Wed Aug 28 20:55:41 2013 -0600
> > 
> > Merge branch 'pci/misc' into next
> > 
> > * pci/misc:
> >   PCI: Remove pcie_cap_has_devctl()
> >   PCI: Support PCIe Capability Slot registers only for ports with slots
> >   PCI: Remove PCIe Capability version checks
> >   PCI: Allow PCIe Capability link-related register access for switches
> >   PCI: Add offsets of PCIe capability registers
> >   PCI: Tidy bitmasks and spacing of PCIe capability definitions
> >   PCI: Remove obsolete comment reference to pci_pcie_cap2()
> >   PCI: Clarify PCI_EXP_TYPE_PCI_BRIDGE comment
> >   PCI: Rename PCIe capability definitions to follow convention
> >   PCI: Disable decoding for BAR sizing only when it was actually enabled
> >   PCI: Add comment about needing pci_msi_off() even when CONFIG_PCI_MSI=n
> >   PCI: Add pcibios_pm_ops for optional arch-specific hibernate functionality
> > 
> > I've then tried to narrow down which of the merged changes is my issue but
> > with no luck, possibly because there's a problem with a combination of one
> > of these changes, and a change that was not in the pci/misc branch at the
> > time. I could do a manual test instead.
> > 
> > >>I've already tried to skip the NVidia + VMware modules at boot time (as you
> > >>can see from the logs they're not loaded at any point), but it didn't help.
> > >>I could try omitting more modules.
> > >Yes, try with minimal modules (and no s2ram) would be nice.
> > >   
> > I've tried unloading a bunch of modules (sound and NIC IIRC), same results.
> > I can try this again with an even more minimal set. If this improves the
> > situation, I'll post again.
> > 
> > 
> > Wilmer van der Gaast.
> > 
> 
> 

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Re: Machine crashes right after ~successful resume


On 12-10-14 21:40, Pavel Machek wrote:
> Bjorn, any ideas?
>
> Would it be feasible to revert 2e8b... to see if it fixes it on 3.17?


I've tried this, too many conflicts unfortunately.

Just noticed this message appear during failing resumes by the way:

[   54.203072] Clocksource tsc unstable (delta = -499956111 ns)
[   54.203151] Switched to clocksource hpet
[   54.203166] PM: resume of devices complete after 2142.341 msecs

Though not all the time. Feels like it's more another symptom of the 
same problem. In my original e-mail I already noted timing strangeness, 
with a 0.01s ping interval growing to 0.4s+.


Anyway, my previous bisect result appears to be wrong. :-( I've done 
another bisect on a narrow range around it, now 
928bea964827d7824b548c1f8e06eccbbc4d0d7d is considered guilty. I've 
rerun the test twice with that revision and the one before it 
(55ed83a615730c2578da155bc99b68f4417ffe20), and the result seems 
consistent now; 928bea gets me just two clean suspend+resumes, 55ed83 more.
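
The good/bad verdict above rests on counting clean suspend+resume cycles per kernel. A minimal sketch of automating that count, assuming some command that suspends the machine and returns after wake-up. `rtcwake` from util-linux is used as the example here and is not something named in this thread, and `count_clean_cycles` is a hypothetical helper name.

```shell
# Hypothetical helper: run a suspend command up to N times and print how many
# cycles completed. It stops at the first command that fails.
count_clean_cycles() {
    want=$1
    shift
    ok=0
    while [ "$ok" -lt "$want" ]; do
        "$@" || break
        ok=$((ok + 1))
    done
    echo "$ok"
}

# Example (assumes util-linux rtcwake is installed; run as root):
#   count_clean_cycles 10 rtcwake -m mem -s 20
```

If the machine hangs on resume, as in the failures described here, the function never returns, so in practice the running count would have to be written to disk after every cycle.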


I have tried to revert this change in a 3.17 tree but it didn't apply
cleanly. One issue was an "Unreversed patch detected!" message, which looks
to me like some of this work has been changed already. Even against a 3.12
tree I get this issue.


Just to be sure, I've tried ignoring the unreversed patch warning and 
tweaked the patch in two more places to make it apply, but indeed that 
does not solve my problem.


A Google search for the revision number shows that there has been quite 
a discussion about it already. Maybe my machine has found another issue 
(though I suppose my machine's more guilty than the kernel! :-/).



>> I've tried unloading a bunch of modules (sound and NIC IIRC), same results.
>> I can try this again with an even more minimal set. If this improves the
>> situation, I'll post again.

This is done: Still seeing the same issue. (And I'm using raw echo 
mem>/proc/... for all testing now.) Same for a "make defconfig" kernel.



Wilmer van der Gaast.

--
+ .''`. - -- ---+  +- -- ---  - --+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---+  +-- -  --- -- -+

Re: Machine crashes right after ~successful resume

Bjorn, any ideas?

Would it be feasible to revert 2e8b... to see if it fixes it on 3.17?

Thanks,
Pavel

On Sun 2014-10-12 16:49:18, Wilmer van der Gaast wrote:
> Hello,
> 
> Many thanks for your response!
> 
> On 12-10-14 15:30, Pavel Machek wrote:
> >
> >Has it ever worked ok? ...aha, in 3.10, ok.
> >
> Correct. And I've tried a few more kernels now, compiled on my own. 3.17
> still has this issue, 3.10 is completely fine all the way up to 3.10.57
> (I've tested just under 50 cycles last night). 3.11 I tried but it seems to
> have other suspend-resume stability issues not present anymore in later
> kernels, I've mostly not used those results.
> 
> git bisect: I've finally succeeded! I've tried automating it completely, but
> sadly Gigabyte couldn't be bothered wiring up the motherboard to make the
> watchdog work. :-(
> 
> The culprit appears to be this one: 2e8b5f621dbe29425906852c6079afb6b28720cb
> 
> Merge: 07f2daa fed2451
> Author: Bjorn Helgaas 
> Date:   Wed Aug 28 20:55:41 2013 -0600
> 
> Merge branch 'pci/misc' into next
> 
> * pci/misc:
>   PCI: Remove pcie_cap_has_devctl()
>   PCI: Support PCIe Capability Slot registers only for ports with slots
>   PCI: Remove PCIe Capability version checks
>   PCI: Allow PCIe Capability link-related register access for switches
>   PCI: Add offsets of PCIe capability registers
>   PCI: Tidy bitmasks and spacing of PCIe capability definitions
>   PCI: Remove obsolete comment reference to pci_pcie_cap2()
>   PCI: Clarify PCI_EXP_TYPE_PCI_BRIDGE comment
>   PCI: Rename PCIe capability definitions to follow convention
>   PCI: Disable decoding for BAR sizing only when it was actually enabled
>   PCI: Add comment about needing pci_msi_off() even when CONFIG_PCI_MSI=n
>   PCI: Add pcibios_pm_ops for optional arch-specific hibernate functionality
> 
> I've then tried to narrow down which of the merged changes is my issue but
> with no luck, possibly because there's a problem with a combination of one
> of these changes, and a change that was not in the pci/misc branch at the
> time. I could do a manual test instead.
> 
> >>I've already tried to skip the NVidia + VMware modules at boot time (as you
> >>can see from the logs they're not loaded at any point), but it didn't help.
> >>I could try omitting more modules.
> >Yes, try with minimal modules (and no s2ram) would be nice.
> > 
> I've tried unloading a bunch of modules (sound and NIC IIRC), same results.
> I can try this again with an even more minimal set. If this improves the
> situation, I'll post again.
> 
> 
> Wilmer van der Gaast.
> 

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: Machine crashes right after ~successful resume


Hello,

Many thanks for your response!

On 12-10-14 15:30, Pavel Machek wrote:
> Has it ever worked ok? ...aha, in 3.10, ok.

Correct. And I've tried a few more kernels now, compiled on my own. 3.17 
still has this issue, 3.10 is completely fine all the way up to 3.10.57 
(I've tested just under 50 cycles last night). 3.11 I tried but it seems 
to have other suspend-resume stability issues not present anymore in 
later kernels, I've mostly not used those results.


git bisect: I've finally succeeded! I've tried automating it completely, 
but sadly Gigabyte couldn't be bothered wiring up the motherboard to 
make the watchdog work. :-(


The culprit appears to be this one: 2e8b5f621dbe29425906852c6079afb6b28720cb

Merge: 07f2daa fed2451
Author: Bjorn Helgaas <bhelg...@google.com>
Date:   Wed Aug 28 20:55:41 2013 -0600

Merge branch 'pci/misc' into next

* pci/misc:
  PCI: Remove pcie_cap_has_devctl()
  PCI: Support PCIe Capability Slot registers only for ports with slots
  PCI: Remove PCIe Capability version checks
  PCI: Allow PCIe Capability link-related register access for switches
  PCI: Add offsets of PCIe capability registers
  PCI: Tidy bitmasks and spacing of PCIe capability definitions
  PCI: Remove obsolete comment reference to pci_pcie_cap2()
  PCI: Clarify PCI_EXP_TYPE_PCI_BRIDGE comment
  PCI: Rename PCIe capability definitions to follow convention
  PCI: Disable decoding for BAR sizing only when it was actually enabled
  PCI: Add comment about needing pci_msi_off() even when CONFIG_PCI_MSI=n
  PCI: Add pcibios_pm_ops for optional arch-specific hibernate functionality


I've then tried to narrow down which of the merged changes is my issue 
but with no luck, possibly because there's a problem with a combination 
of one of these changes, and a change that was not in the pci/misc 
branch at the time. I could do a manual test instead.



>> I've already tried to skip the NVidia + VMware modules at boot time (as you
>> can see from the logs they're not loaded at any point), but it didn't help.
>> I could try omitting more modules.
>
> Yes, try with minimal modules (and no s2ram) would be nice.

I've tried unloading a bunch of modules (sound and NIC IIRC), same 
results. I can try this again with an even more minimal set. If this 
improves the situation, I'll post again.



Wilmer van der Gaast.

--
+ .''`. - -- ---+  +- -- ---  - --+
| wilmer : :'  :  gaast.net |  | OSS Programmer   www.bitlbee.org |
| lintux `. `~'  debian.org |  | Full-time geek  wilmer.gaast.net |
+--- -- -  ` ---+  +-- -  --- -- -+

Re: Machine crashes right after ~successful resume

Hi!

> Rafael, including you on this since 
> http://linuxconcloudopenna2013.sched.org/event/d708f47d07cd44b9669610778c024708#.VDRzTDS_EUF
> mentions you as the maintainer for Linux + power management. I hope this is
> still accurate.
> 
> Since Linux 3.12 (Debian version 3.12.9-1~bpo70+1) and all the way up to
> 3.16 (Debian version 3.16.3-2), I'm having suspend-resume issues on my
> machine (Intel Z68, i7-3770K) that are somewhat less obvious.
> 
> After every boot, I get two successful suspend+resume cycles, but after the
> third suspend, it won't resume successfully. On the VGA console I've never
> had anything useful logged, luckily over the serial console I've had more
> luck. I seem to get as far as:

Has it ever worked ok? ...aha, in 3.10, ok.

> I've found out about pm_trace, which always points at the same line (and no
> device):
> 
> /var/log/syslog.1:Oct 10 16:43:58 ruby kernel: [0.780503]   Magic number: 0:52:740
> /var/log/syslog.1:Oct 10 16:43:58 ruby kernel: [0.780599]   hash matches /tmp/linux-3.16.3/drivers/base/power/main.c:812
> 
> In my source tree that line is:
> 
> TRACE_RESUME(error);


if it resumes ok, this kind of tracking will not help.
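
For reference, a minimal sketch of pulling the reported source location out of a log, assuming the "hash matches" line format quoted above with each message on a single line. `pm_trace_location` is a hypothetical helper name.

```shell
# Hypothetical helper: read kernel log lines on stdin and print the
# file:line reported by pm_trace "hash matches" messages.
# (pm_trace itself is enabled with: echo 1 > /sys/power/pm_trace)
pm_trace_location() {
    sed -n 's/.*hash matches[[:space:]]*\(.*:[0-9][0-9]*\)[[:space:]]*$/\1/p'
}

# Example:
#   dmesg | pm_trace_location
```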

> With kernels 3.10 and older I have no such problems, I can suspend+resume as
> often as I want.

Is there a chance to bisect?

> I've already tried to skip the NVidia + VMware modules at boot time (as you
> can see from the logs they're not loaded at any point), but it didn't help.
> I could try omitting more modules.

Yes, try with minimal modules (and no s2ram) would be nice.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: Machine crashes right after ~successful resume