Re: omap4-panda-es boot issues with v3.15-rc4
On 05/13/2014 01:07 AM, Tony Lindgren wrote:
> * Santosh Shilimkar santosh.shilim...@ti.com [140512 14:41]:
>> On Sunday 11 May 2014 11:55 AM, Tony Lindgren wrote:
>>> * Kevin Hilman khil...@linaro.org [140509 16:46]:
>>>> Roger Quadros rog...@ti.com writes:
>>>>> Kevin,
>>>>>
>>>>> On 05/09/2014 01:15 AM, Kevin Hilman wrote:
>>>>>> Tony Lindgren t...@atomide.com writes:
>>>>>> [...]
>>>>>>> ..but I think I found the cause for recent hangs on panda, just a wild guess based on looking at the recent cpuidle patches after v3.14. Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts until all coupled CPUs leave idle) makes booting work reliably again on panda. Can you guys confirm, so far no issues here after few boot tests, but it might be too early to tell.
>>>>>> Reverting that makes things a bit more stable, but it still eventually fails in the same way. For me it took 8 boots for it to eventually fail. However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable (20+ boots in a row and still going.)
>>>>> Can you please test with CPU_IDLE enabled but C3 disabled as in below patch? It worked for me 10/10 boots.
>>>> Yup, it worked for me too for 10/10 boots in a row.
>>> But what has caused this regression, does it work reliably with let's say v3.13 or v3.12?
>> IIRC things were stable till some CPUIDLE code consolidation happened. I don't recall exactly but some one did discuss about it a while back.
> OK that's good to hear.
>> Can you re-run your test-cases with patch at end of the email. This is just a hunch so don't blame me if I waste your time testing the patch.
> Seems to work after adding #include <linux/clockchips.h>. I did about 10 reboots and they all succeeded for me. Without your revert, I'm getting a hang (with sysrq not working) about 1/3 of the boots. Kevin, Roger, does the revert from Santosh work for you too?

next-20140508 worked for me 10/10 times with Santosh's patch. The heartbeat LED behaves normally as well. So I like it :).

cheers,
-roger

> BTW, I think the RCU stall was/is a separate issue. That's different where the system actually recovers after about a minute, or after sysrq ctrl-a f h or l. Sorry, I no longer know if the RCU stall is only with the older kernels around v3.10 time, or if it's still also happening.
>
> Regards,
>
> Tony
>
>> From bdd30d68f8fa659aa0e3ce436f94029a7719036b Mon Sep 17 00:00:00 2001
>> From: Santosh Shilimkar santosh.shilim...@ti.com
>> Date: Mon, 12 May 2014 17:37:59 -0400
>> Subject: [PATCH] Revert "cpuidle / omap4 : use CPUIDLE_FLAG_TIMER_STOP flag"
>>
>> This reverts commit cb7094e848f7bcaa0a4cda3db4b232f08dbf5b78.
>>
>> Conflicts:
>> 	arch/arm/mach-omap2/cpuidle44xx.c
>> ---
>>  arch/arm/mach-omap2/cpuidle44xx.c |   11 +++++++----
>>  1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c
>> index 01fc710..aae3606 100644
>> --- a/arch/arm/mach-omap2/cpuidle44xx.c
>> +++ b/arch/arm/mach-omap2/cpuidle44xx.c
>> @@ -83,6 +83,7 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
>>  {
>>  	struct idle_statedata *cx = state_ptr + index;
>>  	u32 mpuss_can_lose_context = 0;
>> +	int cpu_id = smp_processor_id();
>>
>>  	/*
>>  	 * CPU0 has to wait and stay ON until CPU1 is OFF state.
>> @@ -110,6 +111,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
>>  	mpuss_can_lose_context = (cx->mpu_state == PWRDM_POWER_RET) &&
>>  				 (cx->mpu_logic_state == PWRDM_POWER_OFF);
>>
>> +	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu_id);
>> +
>>  	/*
>>  	 * Call idle CPU PM enter notifier chain so that
>>  	 * VFP and per CPU interrupt context is saved.
>> @@ -165,6 +168,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
>>  	if (dev->cpu == 0 && mpuss_can_lose_context)
>>  		cpu_cluster_pm_exit();
>>
>> +	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu_id);
>> +
>>  fail:
>>  	cpuidle_coupled_parallel_barrier(dev, &abort_barrier);
>>  	cpu_done[dev->cpu] = false;
>> @@ -189,8 +194,7 @@ static struct cpuidle_driver omap4_idle_driver = {
>>  			/* C2 - CPU0 OFF + CPU1 OFF + MPU CSWR */
>>  			.exit_latency = 328 + 440,
>>  			.target_residency = 960,
>> -			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED |
>> -				 CPUIDLE_FLAG_TIMER_STOP,
>> +			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED,
>>  			.enter = omap_enter_idle_coupled,
>>  			.name = "C2",
>>  			.desc = "CPUx OFF, MPUSS CSWR",
>> @@ -199,8 +203,7 @@ static struct cpuidle_driver omap4_idle_driver = {
>>  			/* C3 - CPU0 OFF + CPU1 OFF + MPU OSWR */
>>  			.exit_latency = 460 + 518,
>>  			.target_residency = 1100,
>> -			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED |
>> -				 CPUIDLE_FLAG_TIMER_STOP,
>> +
Re: omap4-panda-es boot issues with v3.15-rc4
On Tuesday 13 May 2014 04:10 AM, Roger Quadros wrote:
> On 05/13/2014 01:07 AM, Tony Lindgren wrote:
>> * Santosh Shilimkar santosh.shilim...@ti.com [140512 14:41]:
>>> On Sunday 11 May 2014 11:55 AM, Tony Lindgren wrote:
>>>> * Kevin Hilman khil...@linaro.org [140509 16:46]:
>>>>> Roger Quadros rog...@ti.com writes:
>>>>>> Kevin,
>>>>>>
>>>>>> On 05/09/2014 01:15 AM, Kevin Hilman wrote:
>>>>>>> Tony Lindgren t...@atomide.com writes:
>>>>>>> [...]
>>>>>>>> ..but I think I found the cause for recent hangs on panda, just a wild guess based on looking at the recent cpuidle patches after v3.14. Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts until all coupled CPUs leave idle) makes booting work reliably again on panda. Can you guys confirm, so far no issues here after few boot tests, but it might be too early to tell.
>>>>>>> Reverting that makes things a bit more stable, but it still eventually fails in the same way. For me it took 8 boots for it to eventually fail. However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable (20+ boots in a row and still going.)
>>>>>> Can you please test with CPU_IDLE enabled but C3 disabled as in below patch? It worked for me 10/10 boots.
>>>>> Yup, it worked for me too for 10/10 boots in a row.
>>>> But what has caused this regression, does it work reliably with let's say v3.13 or v3.12?
>>> IIRC things were stable till some CPUIDLE code consolidation happened. I don't recall exactly but some one did discuss about it a while back.
>> OK that's good to hear.
>>> Can you re-run your test-cases with patch at end of the email. This is just a hunch so don't blame me if I waste your time testing the patch.
>> Seems to work after adding #include <linux/clockchips.h>. I did about 10 reboots and they all succeeded for me. Without your revert, I'm getting a hang (with sysrq not working) about 1/3 of the boots. Kevin, Roger, does the revert from Santosh work for you too?
> next-20140508 worked for me 10/10 times with Santosh's patch. The heartbeat LED behaves normally as well. So I like it :).

Great.
Will post the patch with change log updated and cc you guys. Regards, Santosh -- To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: omap4-panda-es boot issues with v3.15-rc4
On Sunday 11 May 2014 11:55 AM, Tony Lindgren wrote:
> * Kevin Hilman khil...@linaro.org [140509 16:46]:
>> Roger Quadros rog...@ti.com writes:
>>> Kevin,
>>>
>>> On 05/09/2014 01:15 AM, Kevin Hilman wrote:
>>>> Tony Lindgren t...@atomide.com writes:
>>>> [...]
>>>>> ..but I think I found the cause for recent hangs on panda, just a wild guess based on looking at the recent cpuidle patches after v3.14. Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts until all coupled CPUs leave idle) makes booting work reliably again on panda. Can you guys confirm, so far no issues here after few boot tests, but it might be too early to tell.
>>>> Reverting that makes things a bit more stable, but it still eventually fails in the same way. For me it took 8 boots for it to eventually fail. However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable (20+ boots in a row and still going.)
>>> Can you please test with CPU_IDLE enabled but C3 disabled as in below patch? It worked for me 10/10 boots.
>> Yup, it worked for me too for 10/10 boots in a row.
> But what has caused this regression, does it work reliably with let's say v3.13 or v3.12?

IIRC things were stable till some CPUIDLE code consolidation happened. I don't recall exactly but some one did discuss about it a while back.

Can you re-run your test-cases with patch at end of the email. This is just a hunch so don't blame me if I waste your time testing the patch.

regards,
Santosh

From bdd30d68f8fa659aa0e3ce436f94029a7719036b Mon Sep 17 00:00:00 2001
From: Santosh Shilimkar santosh.shilim...@ti.com
Date: Mon, 12 May 2014 17:37:59 -0400
Subject: [PATCH] Revert "cpuidle / omap4 : use CPUIDLE_FLAG_TIMER_STOP flag"

This reverts commit cb7094e848f7bcaa0a4cda3db4b232f08dbf5b78.

Conflicts:
	arch/arm/mach-omap2/cpuidle44xx.c
---
 arch/arm/mach-omap2/cpuidle44xx.c |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c
index 01fc710..aae3606 100644
--- a/arch/arm/mach-omap2/cpuidle44xx.c
+++ b/arch/arm/mach-omap2/cpuidle44xx.c
@@ -83,6 +83,7 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
 {
 	struct idle_statedata *cx = state_ptr + index;
 	u32 mpuss_can_lose_context = 0;
+	int cpu_id = smp_processor_id();

 	/*
 	 * CPU0 has to wait and stay ON until CPU1 is OFF state.
@@ -110,6 +111,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
 	mpuss_can_lose_context = (cx->mpu_state == PWRDM_POWER_RET) &&
 				 (cx->mpu_logic_state == PWRDM_POWER_OFF);

+	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu_id);
+
 	/*
 	 * Call idle CPU PM enter notifier chain so that
 	 * VFP and per CPU interrupt context is saved.
@@ -165,6 +168,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
 	if (dev->cpu == 0 && mpuss_can_lose_context)
 		cpu_cluster_pm_exit();

+	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu_id);
+
 fail:
 	cpuidle_coupled_parallel_barrier(dev, &abort_barrier);
 	cpu_done[dev->cpu] = false;
@@ -189,8 +194,7 @@ static struct cpuidle_driver omap4_idle_driver = {
 			/* C2 - CPU0 OFF + CPU1 OFF + MPU CSWR */
 			.exit_latency = 328 + 440,
 			.target_residency = 960,
-			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED |
-				 CPUIDLE_FLAG_TIMER_STOP,
+			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED,
 			.enter = omap_enter_idle_coupled,
 			.name = "C2",
 			.desc = "CPUx OFF, MPUSS CSWR",
@@ -199,8 +203,7 @@ static struct cpuidle_driver omap4_idle_driver = {
 			/* C3 - CPU0 OFF + CPU1 OFF + MPU OSWR */
 			.exit_latency = 460 + 518,
 			.target_residency = 1100,
-			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED |
-				 CPUIDLE_FLAG_TIMER_STOP,
+			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED,
 			.enter = omap_enter_idle_coupled,
 			.name = "C3",
 			.desc = "CPUx OFF, MPUSS OSWR",
--
1.7.9.5
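The ordering that this revert puts back into omap_enter_idle_coupled() can be illustrated with a small user-space mock. This is only a sketch under stated assumptions: `mock_record`, `mock_enter_idle` and the event log are hypothetical names invented for illustration, not kernel APIs (the real driver calls clockevents_notify() as in the patch). It also assumes, as the hunk positions suggest but the patch does not show in full, that the early jump to the `fail:` label happens before the ENTER notification, so ENTER and EXIT stay paired on both paths.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds standing in for the steps of the idle path. */
enum mock_event { MOCK_BROADCAST_ENTER, MOCK_LOW_POWER, MOCK_BROADCAST_EXIT };

#define MOCK_LOG_MAX 8

struct mock_log {
    enum mock_event events[MOCK_LOG_MAX];
    size_t count;
};

/* Append one event to the log, silently dropping it if the log is full. */
static void mock_record(struct mock_log *log, enum mock_event ev)
{
    assert(log != NULL);
    if (log->count < MOCK_LOG_MAX)
        log->events[log->count++] = ev;
}

/*
 * Mock of the idle entry path. `bail_out_early` models the jump to the
 * fail label before any notification is sent (an assumption about the
 * surrounding driver code). Returns the number of events recorded.
 */
size_t mock_enter_idle(struct mock_log *log, int bail_out_early)
{
    assert(log != NULL);
    log->count = 0;

    if (bail_out_early)
        goto fail;

    mock_record(log, MOCK_BROADCAST_ENTER); /* hand wakeups to the broadcast timer */
    mock_record(log, MOCK_LOW_POWER);       /* the local timer may stop in here */
    mock_record(log, MOCK_BROADCAST_EXIT);  /* take the local timer back */

fail:
    return log->count;
}
```

Calling `mock_enter_idle(&log, 0)` records ENTER, LOW_POWER, EXIT in that order, while `mock_enter_idle(&log, 1)` records nothing, which is the balance the explicit notify calls are meant to keep.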
Re: omap4-panda-es boot issues with v3.15-rc4
* Santosh Shilimkar santosh.shilim...@ti.com [140512 14:41]:
> On Sunday 11 May 2014 11:55 AM, Tony Lindgren wrote:
>> * Kevin Hilman khil...@linaro.org [140509 16:46]:
>>> Roger Quadros rog...@ti.com writes:
>>>> Kevin,
>>>>
>>>> On 05/09/2014 01:15 AM, Kevin Hilman wrote:
>>>>> Tony Lindgren t...@atomide.com writes:
>>>>> [...]
>>>>>> ..but I think I found the cause for recent hangs on panda, just a wild guess based on looking at the recent cpuidle patches after v3.14. Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts until all coupled CPUs leave idle) makes booting work reliably again on panda. Can you guys confirm, so far no issues here after few boot tests, but it might be too early to tell.
>>>>> Reverting that makes things a bit more stable, but it still eventually fails in the same way. For me it took 8 boots for it to eventually fail. However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable (20+ boots in a row and still going.)
>>>> Can you please test with CPU_IDLE enabled but C3 disabled as in below patch? It worked for me 10/10 boots.
>>> Yup, it worked for me too for 10/10 boots in a row.
>> But what has caused this regression, does it work reliably with let's say v3.13 or v3.12?
> IIRC things were stable till some CPUIDLE code consolidation happened. I don't recall exactly but some one did discuss about it a while back.

OK that's good to hear.

> Can you re-run your test-cases with patch at end of the email. This is just a hunch so don't blame me if I waste your time testing the patch.

Seems to work after adding #include <linux/clockchips.h>. I did about 10 reboots and they all succeeded for me. Without your revert, I'm getting a hang (with sysrq not working) about 1/3 of the boots. Kevin, Roger, does the revert from Santosh work for you too?

BTW, I think the RCU stall was/is a separate issue. That's different where the system actually recovers after about a minute, or after sysrq ctrl-a f h or l. Sorry, I no longer know if the RCU stall is only with the older kernels around v3.10 time, or if it's still also happening.

Regards,

Tony

> From bdd30d68f8fa659aa0e3ce436f94029a7719036b Mon Sep 17 00:00:00 2001
> From: Santosh Shilimkar santosh.shilim...@ti.com
> Date: Mon, 12 May 2014 17:37:59 -0400
> Subject: [PATCH] Revert "cpuidle / omap4 : use CPUIDLE_FLAG_TIMER_STOP flag"
>
> This reverts commit cb7094e848f7bcaa0a4cda3db4b232f08dbf5b78.
>
> Conflicts:
> 	arch/arm/mach-omap2/cpuidle44xx.c
> ---
>  arch/arm/mach-omap2/cpuidle44xx.c |   11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c
> index 01fc710..aae3606 100644
> --- a/arch/arm/mach-omap2/cpuidle44xx.c
> +++ b/arch/arm/mach-omap2/cpuidle44xx.c
> @@ -83,6 +83,7 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
>  {
>  	struct idle_statedata *cx = state_ptr + index;
>  	u32 mpuss_can_lose_context = 0;
> +	int cpu_id = smp_processor_id();
>
>  	/*
>  	 * CPU0 has to wait and stay ON until CPU1 is OFF state.
> @@ -110,6 +111,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
>  	mpuss_can_lose_context = (cx->mpu_state == PWRDM_POWER_RET) &&
>  				 (cx->mpu_logic_state == PWRDM_POWER_OFF);
>
> +	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu_id);
> +
>  	/*
>  	 * Call idle CPU PM enter notifier chain so that
>  	 * VFP and per CPU interrupt context is saved.
> @@ -165,6 +168,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
>  	if (dev->cpu == 0 && mpuss_can_lose_context)
>  		cpu_cluster_pm_exit();
>
> +	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu_id);
> +
>  fail:
>  	cpuidle_coupled_parallel_barrier(dev, &abort_barrier);
>  	cpu_done[dev->cpu] = false;
> @@ -189,8 +194,7 @@ static struct cpuidle_driver omap4_idle_driver = {
>  			/* C2 - CPU0 OFF + CPU1 OFF + MPU CSWR */
>  			.exit_latency = 328 + 440,
>  			.target_residency = 960,
> -			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED |
> -				 CPUIDLE_FLAG_TIMER_STOP,
> +			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED,
>  			.enter = omap_enter_idle_coupled,
>  			.name = "C2",
>  			.desc = "CPUx OFF, MPUSS CSWR",
> @@ -199,8 +203,7 @@ static struct cpuidle_driver omap4_idle_driver = {
>  			/* C3 - CPU0 OFF + CPU1 OFF + MPU OSWR */
>  			.exit_latency = 460 + 518,
>  			.target_residency = 1100,
> -			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED |
> -				 CPUIDLE_FLAG_TIMER_STOP,
> +			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED,
>  			.enter =
Re: omap4-panda-es boot issues with v3.15-rc4
Santosh Shilimkar santosh.shilim...@ti.com writes:
> On Sunday 11 May 2014 11:55 AM, Tony Lindgren wrote:
>> * Kevin Hilman khil...@linaro.org [140509 16:46]:
>>> Roger Quadros rog...@ti.com writes:
>>>> Kevin,
>>>>
>>>> On 05/09/2014 01:15 AM, Kevin Hilman wrote:
>>>>> Tony Lindgren t...@atomide.com writes:
>>>>> [...]
>>>>>> ..but I think I found the cause for recent hangs on panda, just a wild guess based on looking at the recent cpuidle patches after v3.14. Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts until all coupled CPUs leave idle) makes booting work reliably again on panda. Can you guys confirm, so far no issues here after few boot tests, but it might be too early to tell.
>>>>> Reverting that makes things a bit more stable, but it still eventually fails in the same way. For me it took 8 boots for it to eventually fail. However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable (20+ boots in a row and still going.)
>>>> Can you please test with CPU_IDLE enabled but C3 disabled as in below patch? It worked for me 10/10 boots.
>>> Yup, it worked for me too for 10/10 boots in a row.
>> But what has caused this regression, does it work reliably with let's say v3.13 or v3.12?
> IIRC things were stable till some CPUIDLE code consolidation happened. I don't recall exactly but some one did discuss about it a while back.
>
> Can you re-run your test-cases with patch at end of the email. This is just a hunch so don't blame me if I waste your time testing the patch.

With your patch applied on top of next-20140512, my 4460 Panda-ES has booted 25 times in a row, and still going.

Kevin
Re: omap4-panda-es boot issues with v3.15-rc4
* Kevin Hilman khil...@linaro.org [140509 16:46]:
> Roger Quadros rog...@ti.com writes:
>> Kevin,
>>
>> On 05/09/2014 01:15 AM, Kevin Hilman wrote:
>>> Tony Lindgren t...@atomide.com writes:
>>> [...]
>>>> ..but I think I found the cause for recent hangs on panda, just a wild guess based on looking at the recent cpuidle patches after v3.14. Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts until all coupled CPUs leave idle) makes booting work reliably again on panda. Can you guys confirm, so far no issues here after few boot tests, but it might be too early to tell.
>>> Reverting that makes things a bit more stable, but it still eventually fails in the same way. For me it took 8 boots for it to eventually fail. However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable (20+ boots in a row and still going.)
>> Can you please test with CPU_IDLE enabled but C3 disabled as in below patch? It worked for me 10/10 boots.
> Yup, it worked for me too for 10/10 boots in a row.

But what has caused this regression, does it work reliably with let's say v3.13 or v3.12?

Regards,

Tony
Re: omap4-panda-es boot issues with v3.15-rc4
On 05/08/2014 07:55 PM, Tony Lindgren wrote:
> * Kevin Hilman khil...@linaro.org [140508 08:40]:
>> On Thu, May 8, 2014 at 8:31 AM, Kevin Hilman khil...@linaro.org wrote:
>>> Roger Quadros rog...@ti.com writes:
>>>> Hi,
>>>>
>>>> Nishant pointed me to a booting issue with omap4-panda-es on linux-next but I'm observing similar issues, although less frequent, with v3.15-rc4 as well.
>>>>
>>>> Configuration:
>>>> - kernel v3.15-rc4 or linux-next (20140507)
>>>> - multi_v7_defconfig with LEDS_TRIGGER_HEARTBEAT and LEDS_GPIO enabled
>>>> - u-boot/master 173d294b94cf
>>>>
>>>> Observations:
>>>> - Out of 10 boots a few may not succeed and hang midway without any warnings. Heartbeat LED stops. e.g. http://www.hastebin.com/ebumojegoq.vhdl
>>>> - Hang more noticeable on linux-next (20140507) than on v3.15-rc4
>>> I've been noticing the same thing for a while with my boot tests. For me, next-20140508 is failing most of the time now.
>>>> - Hang more noticeable with USB_EHCI_HCD enabled but hang observed even without USB_EHCI_HCD. Maybe related to when high speed interrupts occur in the boot process.
>>>> - On successful boots following warning is seen
>>>>   [4.010375] gic_timer_retrigger: lost localtimer interrupt
>>>> - On successful boots heartbeat LED stops blinking after boot process and left idle. LED can remain stuck in ON state as well. It does blink again when doing activity on console.
>>>>
>>>> Workaround:
>>>> - Disabling CPU_IDLE or even just disabling C3 (MPU OSWR) seems to fix all the above issues.
>>>>
>>>> I don't really know what exactly is the issue but it seems to be specific to OMAP4, GIC, MPU OSWR.
>>> I can confirm that disabling CONFIG_CPU_IDLE seems to make the problem go away. Hmm
>> Another finger pointing in the same direction: omap2plus_defconfig + CONFIG_CPU_IDLE=y also fails to boot rather consistently in today's -next.
> Booting today's next with multi_v7_defconfig (so cpuidle enabled) on omap4 sdp seems to boot reliably. And it's not producing these:
>
> gic_timer_retrigger: lost localtimer interrupt
>
> while panda is producing those errors like Roger mentioned.
>
> It seems that the USB networking is the main difference between omap4 sdp and panda?

Is your sdp using omap4430?

To confirm 4430 vs 4460 I ran 10 tests each on omap4430 panda and omap4460 panda. 4430panda fails 2/10 times. 4460panda fails 7/10 times.

cheers,
-roger
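As a rough aid for reading boot-test counts like the ones reported in this thread, the chance of a lucky clean streak can be worked out directly. The sketch below assumes every boot fails independently with a fixed probability, which is a simplification (the hangs may well depend on timing or configuration), and the function names are made up for this example.

```c
#include <assert.h>

/*
 * Probability of seeing `boots` clean boots in a row purely by chance,
 * assuming each boot fails independently with probability `fail_rate`.
 */
double clean_streak_probability(double fail_rate, int boots)
{
    double p = 1.0;

    assert(fail_rate >= 0.0 && fail_rate <= 1.0);
    assert(boots >= 0);
    for (int i = 0; i < boots; i++)
        p *= 1.0 - fail_rate;
    return p;
}

/*
 * Smallest number of consecutive clean boots after which the chance of
 * a lucky streak has dropped to `alpha` or below.
 */
int boots_needed(double fail_rate, double alpha)
{
    double p = 1.0;
    int n = 0;

    assert(fail_rate > 0.0 && fail_rate <= 1.0);
    assert(alpha > 0.0 && alpha < 1.0);
    while (p > alpha) {
        p *= 1.0 - fail_rate;
        n++;
    }
    return n;
}
```

With the 2/10 failure rate seen on the 4430 panda, ten clean boots in a row would still happen by chance roughly 10.7% of the time, and `boots_needed(0.2, 0.05)` returns 14. At the 7/10 rate seen on the 4460 panda, `boots_needed(0.7, 0.05)` returns 3, so a short clean run is already meaningful there.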
Re: omap4-panda-es boot issues with v3.15-rc4
Kevin,

On 05/09/2014 01:15 AM, Kevin Hilman wrote:
> Tony Lindgren t...@atomide.com writes:
> [...]
>> ..but I think I found the cause for recent hangs on panda, just a wild guess based on looking at the recent cpuidle patches after v3.14. Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts until all coupled CPUs leave idle) makes booting work reliably again on panda. Can you guys confirm, so far no issues here after few boot tests, but it might be too early to tell.
> Reverting that makes things a bit more stable, but it still eventually fails in the same way. For me it took 8 boots for it to eventually fail. However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable (20+ boots in a row and still going.)

Can you please test with CPU_IDLE enabled but C3 disabled as in below patch? It worked for me 10/10 boots.

diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c
index 01fc710..99362ff 100644
--- a/arch/arm/mach-omap2/cpuidle44xx.c
+++ b/arch/arm/mach-omap2/cpuidle44xx.c
@@ -206,7 +206,12 @@ static struct cpuidle_driver omap4_idle_driver = {
 			.desc = "CPUx OFF, MPUSS OSWR",
 		},
 	},
-	.state_count = ARRAY_SIZE(omap4_idle_data),
+/*
+ * Disable C3 state since it is unstable
+ *
+ * .state_count = ARRAY_SIZE(omap4_idle_data),
+ */
+	.state_count = 2,
 	.safe_state_index = 0,
 };

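Why lowering .state_count keeps C3 from ever being entered can be shown with a user-space mock. This is a sketch under stated assumptions, not kernel code: the struct and function names are hypothetical, the selection rule is a toy stand-in rather than the kernel's governor, the C2 and C3 numbers are the ones quoted in the revert patch elsewhere in this thread, and the C1 numbers are placeholders.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for one idle state. */
struct mock_idle_state {
    const char *name;
    unsigned int exit_latency_us;
    unsigned int target_residency_us;
};

#define MOCK_MAX_STATES 3

struct mock_idle_driver {
    struct mock_idle_state states[MOCK_MAX_STATES];
    int state_count;      /* states at index >= state_count are never considered */
    int safe_state_index;
};

/* Build the mock table with the first `state_count` states usable. */
struct mock_idle_driver mock_omap4_driver(int state_count)
{
    struct mock_idle_driver drv = {
        .states = {
            { "C1", 4, 5 },              /* placeholder values, assumed */
            { "C2", 328 + 440, 960 },    /* from the quoted patch */
            { "C3", 460 + 518, 1100 },   /* from the quoted patch */
        },
        .state_count = state_count,
        .safe_state_index = 0,
    };

    assert(state_count >= 1 && state_count <= MOCK_MAX_STATES);
    return drv;
}

/*
 * Toy selection rule: pick the deepest state below state_count whose
 * target residency fits into the predicted idle time.
 */
int mock_select_state(const struct mock_idle_driver *drv,
                      unsigned int predicted_idle_us)
{
    int best = drv->safe_state_index;

    for (int i = 0; i < drv->state_count; i++) {
        if (drv->states[i].target_residency_us <= predicted_idle_us)
            best = i;
    }
    return best;
}
```

With all three states available, a 5000 us predicted idle period selects index 2 (C3); with the count capped at 2, the same prediction selects index 1 (C2), because the loop never looks at the third entry even though it is still present in the table.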
Re: omap4-panda-es boot issues with v3.15-rc4
Grygorii,

On 05/08/2014 08:12 PM, Grygorii Strashko wrote:
> Hi,
>
> On 05/08/2014 06:40 PM, Kevin Hilman wrote:
>> On Thu, May 8, 2014 at 8:31 AM, Kevin Hilman khil...@linaro.org wrote:
>>> Roger Quadros rog...@ti.com writes:
>>>> Hi,
>>>>
>>>> Nishant pointed me to a booting issue with omap4-panda-es on linux-next but I'm observing similar issues, although less frequent, with v3.15-rc4 as well.
>>>>
>>>> Configuration:
>>>> - kernel v3.15-rc4 or linux-next (20140507)
>>>> - multi_v7_defconfig with LEDS_TRIGGER_HEARTBEAT and LEDS_GPIO enabled
>>>> - u-boot/master 173d294b94cf
>>>>
>>>> Observations:
>>>> - Out of 10 boots a few may not succeed and hang midway without any warnings. Heartbeat LED stops. e.g. http://www.hastebin.com/ebumojegoq.vhdl
>>>> - Hang more noticeable on linux-next (20140507) than on v3.15-rc4
>>> I've been noticing the same thing for a while with my boot tests. For me, next-20140508 is failing most of the time now.
>>>> - Hang more noticeable with USB_EHCI_HCD enabled but hang observed even without USB_EHCI_HCD. Maybe related to when high speed interrupts occur in the boot process.
>>>> - On successful boots following warning is seen
>>>>   [4.010375] gic_timer_retrigger: lost localtimer interrupt
>>>> - On successful boots heartbeat LED stops blinking after boot process and left idle. LED can remain stuck in ON state as well. It does blink again when doing activity on console.
>>>>
>>>> Workaround:
>>>> - Disabling CPU_IDLE or even just disabling C3 (MPU OSWR) seems to fix all the above issues.
>>>>
>>>> I don't really know what exactly is the issue but it seems to be specific to OMAP4, GIC, MPU OSWR.
>>> I can confirm that disabling CONFIG_CPU_IDLE seems to make the problem go away. Hmm
>> Another finger pointing in the same direction: omap2plus_defconfig + CONFIG_CPU_IDLE=y also fails to boot rather consistently in today's -next.
> Is it observed on OMAP4460 only? If not, it's something new. If yes, maybe some race condition is still present.

I could observe it on 4430 as well, but just less frequent. 2/10 times on 4430 vs 7/10 times on 4460.

> Roger, is it possible to connect debugger and check GIC distributor status (gic_dist_base_addr + GIC_DIST_CTRL) in case of failure?

Sorry, I do not have a debugger with me at the moment.

> According to the current code (OMAP4460), CPU0 can get stuck only if CPU1 is somehow kicked out of the PWRDM_POWER_OFF state by something other than CPU0. The code assumes that CPU1 can exit the PWRDM_POWER_OFF state only when CPU0 calls clkdm_wakeup(cpu_clkdm[1]).
>
> Sorry, but I'm not able to debug it now.

Stupid question, is heartbeat LED even supposed to stop blinking in C3 state? It would make a user think that the board is dead.

cheers,
-roger
Re: omap4-panda-es boot issues with v3.15-rc4
On Fri, May 9, 2014 at 3:30 AM, Roger Quadros rog...@ti.com wrote:
> Stupid question, is heartbeat LED even supposed to stop blinking in C3 state? It would make a user think that the board is dead.

I believe yes - we have tick suppression. Otherwise we'd just be wasting power by waking up only to blink an LED. Some deeper C states need higher latencies.

Regards,
Nishanth Menon
Re: omap4-panda-es boot issues with v3.15-rc4
Roger Quadros rog...@ti.com writes:
> Kevin,
>
> On 05/09/2014 01:15 AM, Kevin Hilman wrote:
>> Tony Lindgren t...@atomide.com writes:
>> [...]
>>> ..but I think I found the cause for recent hangs on panda, just a wild guess based on looking at the recent cpuidle patches after v3.14. Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts until all coupled CPUs leave idle) makes booting work reliably again on panda. Can you guys confirm, so far no issues here after few boot tests, but it might be too early to tell.
>> Reverting that makes things a bit more stable, but it still eventually fails in the same way. For me it took 8 boots for it to eventually fail. However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable (20+ boots in a row and still going.)
> Can you please test with CPU_IDLE enabled but C3 disabled as in below patch? It worked for me 10/10 boots.

Yup, it worked for me too for 10/10 boots in a row.

Kevin
Re: omap4-panda-es boot issues with v3.15-rc4
Roger Quadros rog...@ti.com writes:
> Hi,
>
> Nishant pointed me to a booting issue with omap4-panda-es on linux-next but I'm observing similar issues, although less frequent, with v3.15-rc4 as well.
>
> Configuration:
> - kernel v3.15-rc4 or linux-next (20140507)
> - multi_v7_defconfig with LEDS_TRIGGER_HEARTBEAT and LEDS_GPIO enabled
> - u-boot/master 173d294b94cf
>
> Observations:
> - Out of 10 boots a few may not succeed and hang midway without any warnings. Heartbeat LED stops. e.g. http://www.hastebin.com/ebumojegoq.vhdl
> - Hang more noticeable on linux-next (20140507) than on v3.15-rc4

I've been noticing the same thing for a while with my boot tests. For me, next-20140508 is failing most of the time now.

> - Hang more noticeable with USB_EHCI_HCD enabled but hang observed even without USB_EHCI_HCD. Maybe related to when high speed interrupts occur in the boot process.
> - On successful boots following warning is seen
>   [4.010375] gic_timer_retrigger: lost localtimer interrupt
> - On successful boots heartbeat LED stops blinking after boot process and left idle. LED can remain stuck in ON state as well. It does blink again when doing activity on console.
>
> Workaround:
> - Disabling CPU_IDLE or even just disabling C3 (MPU OSWR) seems to fix all the above issues.
>
> I don't really know what exactly is the issue but it seems to be specific to OMAP4, GIC, MPU OSWR.

I can confirm that disabling CONFIG_CPU_IDLE seems to make the problem go away. Hmm

Kevin
Re: omap4-panda-es boot issues with v3.15-rc4
On Thu, May 8, 2014 at 8:31 AM, Kevin Hilman khil...@linaro.org wrote:
> Roger Quadros rog...@ti.com writes:
>> Hi,
>>
>> Nishant pointed me to a booting issue with omap4-panda-es on linux-next but I'm observing similar issues, although less frequent, with v3.15-rc4 as well.
>>
>> Configuration:
>> - kernel v3.15-rc4 or linux-next (20140507)
>> - multi_v7_defconfig with LEDS_TRIGGER_HEARTBEAT and LEDS_GPIO enabled
>> - u-boot/master 173d294b94cf
>>
>> Observations:
>> - Out of 10 boots a few may not succeed and hang midway without any warnings. Heartbeat LED stops. e.g. http://www.hastebin.com/ebumojegoq.vhdl
>> - Hang more noticeable on linux-next (20140507) than on v3.15-rc4
> I've been noticing the same thing for a while with my boot tests. For me, next-20140508 is failing most of the time now.
>> - Hang more noticeable with USB_EHCI_HCD enabled but hang observed even without USB_EHCI_HCD. Maybe related to when high speed interrupts occur in the boot process.
>> - On successful boots following warning is seen
>>   [4.010375] gic_timer_retrigger: lost localtimer interrupt
>> - On successful boots heartbeat LED stops blinking after boot process and left idle. LED can remain stuck in ON state as well. It does blink again when doing activity on console.
>>
>> Workaround:
>> - Disabling CPU_IDLE or even just disabling C3 (MPU OSWR) seems to fix all the above issues.
>>
>> I don't really know what exactly is the issue but it seems to be specific to OMAP4, GIC, MPU OSWR.
> I can confirm that disabling CONFIG_CPU_IDLE seems to make the problem go away. Hmm

Another finger pointing in the same direction: omap2plus_defconfig + CONFIG_CPU_IDLE=y also fails to boot rather consistently in today's -next.

Kevin
Re: omap4-panda-es boot issues with v3.15-rc4
Hi, On 05/08/2014 06:40 PM, Kevin Hilman wrote: On Thu, May 8, 2014 at 8:31 AM, Kevin Hilman khil...@linaro.org wrote: Roger Quadros rog...@ti.com writes: Hi, Nishant pointed me to a booting issue with omap4-panda-es on linux-next but I'm observing similar issues, although less frequent, with v3.15-rc4 as well. Configuration: - kernel v3.15-rc4 or linux-next (20140507) - multi_v7_defconfig with LEDS_TRIGGER_HEARTBEAT and LEDS_GPIO enabled - u-boot/master 173d294b94cf Observations: - Out of 10 boots a few may not succeed and hang midway without any warnings. Heartbeat LED stops. e.g. http://www.hastebin.com/ebumojegoq.vhdl - Hang more noticeable on linux-next (20140507) than on v3.15-rc4 I've beeen noticing the same thing for awhile with my boot tests. For me, next-20140508 is failing most of the time now. - Hang more noticeable with USB_EHCI_HCD enabled but hang observed even without USB_EHCI_HCD. Maybe related to when high speed interrupts occur in the boot process. - On successful boots following warning is seen [4.010375] gic_timer_retrigger: lost localtimer interrupt - On successful boots heartbeat LED stops blinking after boot process and left idle. LED can remain stuck in ON state as well. It does blink again when doing activity on console. Workaround: - Disabling CPU_IDLE or even just disabling C3 (MPU OSWR) seems to fix all the above issues. I don't really know what exactly is the issue but it seems to be specific to OMAP4, GIC, MPU OSWR. I can confirm that disabling CONFIG_CPU_IDLE seems to make the problem go away. Hmm Another finger pointing in the same direction: omap2plus_defconfig + CONFIG_CPU_IDLE=y also fails to boot rather consistently in today's -next. Is it observed on OMAP4460 only? if no - it's smth new. if yes - may be some racing condition is still present. Roger, is it possible to connect debugger and check GIC distributor status (gic_dist_base_addr + GIC_DIST_CTRL) in case of failure? 
According to the current code (OMAP4460), CPU0 can get stuck only if CPU1 is somehow kicked out of the PWRDM_POWER_OFF state by something other than CPU0. The code assumes that CPU1 can exit the PWRDM_POWER_OFF state only when CPU0 calls clkdm_wakeup(cpu_clkdm[1]);

Sorry, but I'm not able to debug it now.

Regards,
-grygorii
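The distributor check requested above amounts to reading one 32-bit register at gic_dist_base_addr + GIC_DIST_CTRL and looking at its enable bit. A minimal sketch of that decoding, assuming the value has already been read out with the debugger. The function names are hypothetical; GIC_DIST_CTRL is the 0x000 offset used by the kernel's GIC driver; the base address shown is an assumption for OMAP4 and should be checked against the TRM:

```python
# Offset of the distributor control register inside the GIC distributor
# block, matching GIC_DIST_CTRL in the kernel's GIC header.
GIC_DIST_CTRL = 0x000

# Assumption: physical base of the GIC distributor on OMAP4.
# Verify against the TRM before relying on it.
ASSUMED_OMAP4_GIC_DIST_BASE = 0x48241000


def dist_ctrl_address(gic_dist_base_addr):
    """Address to inspect: distributor base plus the control offset."""
    return gic_dist_base_addr + GIC_DIST_CTRL


def distributor_enabled(dist_ctrl_value):
    """True if bit 0 (the enable bit the kernel sets) is set.

    With the bit clear, the distributor does not forward interrupts
    to the CPU interfaces.
    """
    return bool(dist_ctrl_value & 0x1)


if __name__ == "__main__":
    addr = dist_ctrl_address(ASSUMED_OMAP4_GIC_DIST_BASE)
    print("read 32 bits at", hex(addr))
    # 0x0 here is a made-up readback, not a value from a real hang.
    print("distributor enabled:", distributor_enabled(0x0))
```

A readback with bit 0 clear at the time of the hang would mean the distributor was left disabled, which is the state the question above is trying to rule in or out.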
Re: omap4-panda-es boot issues with v3.15-rc4
* Kevin Hilman khil...@linaro.org [140508 08:40]:

On Thu, May 8, 2014 at 8:31 AM, Kevin Hilman khil...@linaro.org wrote:

Roger Quadros rog...@ti.com writes:

Hi,

Nishant pointed me to a booting issue with omap4-panda-es on linux-next but I'm observing similar issues, although less frequent, with v3.15-rc4 as well.

Configuration:
- kernel v3.15-rc4 or linux-next (20140507)
- multi_v7_defconfig with LEDS_TRIGGER_HEARTBEAT and LEDS_GPIO enabled
- u-boot/master 173d294b94cf

Observations:
- Out of 10 boots a few may not succeed and hang midway without any warnings. Heartbeat LED stops. e.g. http://www.hastebin.com/ebumojegoq.vhdl
- Hang more noticeable on linux-next (20140507) than on v3.15-rc4

I've been noticing the same thing for a while with my boot tests. For me, next-20140508 is failing most of the time now.

- Hang more noticeable with USB_EHCI_HCD enabled but hang observed even without USB_EHCI_HCD. Maybe related to when high speed interrupts occur in the boot process.
- On successful boots following warning is seen
  [4.010375] gic_timer_retrigger: lost localtimer interrupt
- On successful boots heartbeat LED stops blinking after boot process and left idle. LED can remain stuck in ON state as well. It does blink again when doing activity on console.

Workaround:
- Disabling CPU_IDLE or even just disabling C3 (MPU OSWR) seems to fix all the above issues.

I don't really know what exactly is the issue but it seems to be specific to OMAP4, GIC, MPU OSWR.

I can confirm that disabling CONFIG_CPU_IDLE seems to make the problem go away.

Hmm

Another finger pointing in the same direction: omap2plus_defconfig + CONFIG_CPU_IDLE=y also fails to boot rather consistently in today's -next.

Booting today's next with multi_v7_defconfig (so cpuidle enabled) on omap4 sdp seems to boot reliably. And it's not producing these:

gic_timer_retrigger: lost localtimer interrupt

while panda is producing those errors like Roger mentioned.
It seems that the USB networking is the main difference between omap4 sdp and panda?

Regards,

Tony
Re: omap4-panda-es boot issues with v3.15-rc4
Added a few cpuidle people to Cc on this regression.

* Tony Lindgren t...@atomide.com [140508 09:57]:

* Kevin Hilman khil...@linaro.org [140508 08:40]:

On Thu, May 8, 2014 at 8:31 AM, Kevin Hilman khil...@linaro.org wrote:

Roger Quadros rog...@ti.com writes:

Hi,

Nishant pointed me to a booting issue with omap4-panda-es on linux-next but I'm observing similar issues, although less frequent, with v3.15-rc4 as well.

Configuration:
- kernel v3.15-rc4 or linux-next (20140507)
- multi_v7_defconfig with LEDS_TRIGGER_HEARTBEAT and LEDS_GPIO enabled
- u-boot/master 173d294b94cf

Observations:
- Out of 10 boots a few may not succeed and hang midway without any warnings. Heartbeat LED stops. e.g. http://www.hastebin.com/ebumojegoq.vhdl
- Hang more noticeable on linux-next (20140507) than on v3.15-rc4

I've been noticing the same thing for a while with my boot tests. For me, next-20140508 is failing most of the time now.

- Hang more noticeable with USB_EHCI_HCD enabled but hang observed even without USB_EHCI_HCD. Maybe related to when high speed interrupts occur in the boot process.
- On successful boots following warning is seen
  [4.010375] gic_timer_retrigger: lost localtimer interrupt
- On successful boots heartbeat LED stops blinking after boot process and left idle. LED can remain stuck in ON state as well. It does blink again when doing activity on console.

Workaround:
- Disabling CPU_IDLE or even just disabling C3 (MPU OSWR) seems to fix all the above issues.

I don't really know what exactly is the issue but it seems to be specific to OMAP4, GIC, MPU OSWR.

I can confirm that disabling CONFIG_CPU_IDLE seems to make the problem go away.

Hmm

Another finger pointing in the same direction: omap2plus_defconfig + CONFIG_CPU_IDLE=y also fails to boot rather consistently in today's -next.

Booting today's next with multi_v7_defconfig (so cpuidle enabled) on omap4 sdp seems to boot reliably.
And it's not producing these:

gic_timer_retrigger: lost localtimer interrupt

Still seeing the above, looks like the lost localtimer interrupt above is a separate issue..

while panda is producing those errors like Roger mentioned.

It seems that the USB networking is the main difference between omap4 sdp and panda?

..but I think I found the cause for recent hangs on panda, just a wild guess based on looking at the recent cpuidle patches after v3.14. Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts until all coupled CPUs leave idle) makes booting work reliably again on panda.

Can you guys confirm, so far no issues here after few boot tests, but it might be too early to tell.

Regards,

Tony
Re: omap4-panda-es boot issues with v3.15-rc4
Tony Lindgren t...@atomide.com writes:

[...]

..but I think I found the cause for recent hangs on panda, just a wild guess based on looking at the recent cpuidle patches after v3.14. Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts until all coupled CPUs leave idle) makes booting work reliably again on panda.

Can you guys confirm, so far no issues here after few boot tests, but it might be too early to tell.

Reverting that makes things a bit more stable, but it still eventually fails in the same way. For me it took 8 boots for it to eventually fail.

However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable (20+ boots in a row and still going.)

Kevin
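The "too early to tell" caveat can be put in numbers. A minimal sketch, assuming every boot hangs independently with the same fixed probability (an assumption; nothing in this thread establishes that). The 1-in-8 rate below is only an illustrative guess based on the single failure on the eighth boot, and the function names are hypothetical:

```python
import math


def prob_all_boots_pass(hang_rate, boots):
    """Chance that `boots` boots in a row all succeed when each boot
    hangs independently with probability `hang_rate`."""
    if not 0.0 <= hang_rate <= 1.0:
        raise ValueError("hang_rate must be between 0 and 1")
    return (1.0 - hang_rate) ** boots


def boots_needed(hang_rate, confidence=0.95):
    """Smallest clean streak that would be seen with probability at most
    1 - confidence if the hang rate were still `hang_rate`."""
    if not 0.0 < hang_rate < 1.0:
        raise ValueError("hang_rate must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hang_rate))


if __name__ == "__main__":
    rate = 1.0 / 8.0  # illustrative guess, not a measured value
    print(round(prob_all_boots_pass(rate, 10), 3))  # about 0.263
    print(boots_needed(rate))  # 23
```

Under these assumptions a 10/10 run would still turn up roughly one time in four with the bug present, while a run of 20+ clean boots is much stronger evidence.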