Re: omap4-panda-es boot issues with v3.15-rc4

2014-05-13 Thread Roger Quadros
On 05/13/2014 01:07 AM, Tony Lindgren wrote:
 * Santosh Shilimkar santosh.shilim...@ti.com [140512 14:41]:
 On Sunday 11 May 2014 11:55 AM, Tony Lindgren wrote:
 * Kevin Hilman khil...@linaro.org [140509 16:46]:
 Roger Quadros rog...@ti.com writes:

 Kevin,

 On 05/09/2014 01:15 AM, Kevin Hilman wrote:
 Tony Lindgren t...@atomide.com writes:

 [...]

 ..but I think I found the cause for recent hangs on panda, just a wild
 guess based on looking at the recent cpuidle patches after v3.14.

 Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts
 until all coupled CPUs leave idle) makes booting work reliably again
 on panda.

 Can you guys confirm, so far no issues here after few boot tests,
 but it might be too early to tell.

 Reverting that makes things a bit more stable, but it still eventually
 fails in the same way.  For me it took 8 boots for it to eventually
 fail.

 However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable
 (20+ boots in a row and still going.)


 Can you please test with CPU_IDLE enabled but C3 disabled as in below 
 patch?
 It worked for me 10/10 boots.

 Yup, it worked for me too for 10/10 boots in a row.

 But what has caused this regression, does it work reliably with let's
 say v3.13 or v3.12?

 IIRC things were stable till some CPUIDLE code consolidation happened.
 I don't recall exactly but some one did discuss about it a while back.
 
 OK that's good to hear.
  
 Can you re-run your test-cases with patch at end of the email. This
 is just a hunch so don't blame me if I waste your time testing the
 patch.
 
 Seems to work after adding #include <linux/clockchips.h>. I did about 10
 reboots and they all succeeded for me. Without your revert, I'm getting
 a hang (with sysrq not working) about 1/3 of the boots.
 
 Kevin, Roger, does the revert from Santosh work for you too?
 

next-20140508 worked for me 10/10 times with Santosh's patch.
The heartbeat LED behaves normally as well. So I like it :).

cheers,
-roger

 BTW, I think the RCU stall was/is a separate issue. That's different
 where the system actually recovers after about a minute, or after sysrq
 ctrl-a f h or l. Sorry, I no longer know if the RCU stall is only with the
 older kernels around v3.10 time, or if it's still also happening.
 
 Regards,
 
 Tony
  
 From bdd30d68f8fa659aa0e3ce436f94029a7719036b Mon Sep 17 00:00:00 2001
 From: Santosh Shilimkar santosh.shilim...@ti.com
 Date: Mon, 12 May 2014 17:37:59 -0400
 Subject: [PATCH] Revert "cpuidle / omap4 : use CPUIDLE_FLAG_TIMER_STOP flag"

 This reverts commit cb7094e848f7bcaa0a4cda3db4b232f08dbf5b78.

 Conflicts:

 	arch/arm/mach-omap2/cpuidle44xx.c
 ---
  arch/arm/mach-omap2/cpuidle44xx.c |   11 +++++++----
  1 file changed, 7 insertions(+), 4 deletions(-)

 diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c
 index 01fc710..aae3606 100644
 --- a/arch/arm/mach-omap2/cpuidle44xx.c
 +++ b/arch/arm/mach-omap2/cpuidle44xx.c
 @@ -83,6 +83,7 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
  {
  	struct idle_statedata *cx = state_ptr + index;
  	u32 mpuss_can_lose_context = 0;
 +	int cpu_id = smp_processor_id();
  
  	/*
  	 * CPU0 has to wait and stay ON until CPU1 is OFF state.
 @@ -110,6 +111,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
  	mpuss_can_lose_context = (cx->mpu_state == PWRDM_POWER_RET) &&
  				 (cx->mpu_logic_state == PWRDM_POWER_OFF);
  
 +	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu_id);
 +
  	/*
  	 * Call idle CPU PM enter notifier chain so that
  	 * VFP and per CPU interrupt context is saved.
 @@ -165,6 +168,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
  	if (dev->cpu == 0 && mpuss_can_lose_context)
  		cpu_cluster_pm_exit();
  
 +	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu_id);
 +
  fail:
  	cpuidle_coupled_parallel_barrier(dev, &abort_barrier);
  	cpu_done[dev->cpu] = false;
 @@ -189,8 +194,7 @@ static struct cpuidle_driver omap4_idle_driver = {
  			/* C2 - CPU0 OFF + CPU1 OFF + MPU CSWR */
  			.exit_latency = 328 + 440,
  			.target_residency = 960,
 -			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED |
 -				 CPUIDLE_FLAG_TIMER_STOP,
 +			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED,
  			.enter = omap_enter_idle_coupled,
  			.name = "C2",
  			.desc = "CPUx OFF, MPUSS CSWR",
 @@ -199,8 +203,7 @@ static struct cpuidle_driver omap4_idle_driver = {
  			/* C3 - CPU0 OFF + CPU1 OFF + MPU OSWR */
  			.exit_latency = 460 + 518,
  			.target_residency = 1100,
 -			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED |
 -				 CPUIDLE_FLAG_TIMER_STOP,
 +  

Re: omap4-panda-es boot issues with v3.15-rc4

2014-05-13 Thread Santosh Shilimkar
On Tuesday 13 May 2014 04:10 AM, Roger Quadros wrote:
 On 05/13/2014 01:07 AM, Tony Lindgren wrote:
 * Santosh Shilimkar santosh.shilim...@ti.com [140512 14:41]:
 On Sunday 11 May 2014 11:55 AM, Tony Lindgren wrote:
 * Kevin Hilman khil...@linaro.org [140509 16:46]:
 Roger Quadros rog...@ti.com writes:

 Kevin,

 On 05/09/2014 01:15 AM, Kevin Hilman wrote:
 Tony Lindgren t...@atomide.com writes:

 [...]

 ..but I think I found the cause for recent hangs on panda, just a wild
 guess based on looking at the recent cpuidle patches after v3.14.

 Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts
 until all coupled CPUs leave idle) makes booting work reliably again
 on panda.

 Can you guys confirm, so far no issues here after few boot tests,
 but it might be too early to tell.

 Reverting that makes things a bit more stable, but it still eventually
 fails in the same way.  For me it took 8 boots for it to eventually
 fail.

 However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable
 (20+ boots in a row and still going.)


 Can you please test with CPU_IDLE enabled but C3 disabled as in below 
 patch?
 It worked for me 10/10 boots.

 Yup, it worked for me too for 10/10 boots in a row.

 But what has caused this regression, does it work reliably with let's
 say v3.13 or v3.12?

 IIRC things were stable till some CPUIDLE code consolidation happened.
 I don't recall exactly but some one did discuss about it a while back.

 OK that's good to hear.
  
 Can you re-run your test-cases with patch at end of the email. This
 is just a hunch so don't blame me if I waste your time testing the
 patch.

 Seems to work after adding #include <linux/clockchips.h>. I did about 10
 reboots and they all succeeded for me. Without your revert, I'm getting
 a hang (with sysrq not working) about 1/3 of the boots.

 Kevin, Roger, does the revert from Santosh work for you too?

 
 next-20140508 worked for me 10/10 times with Santosh's patch.
 The heartbeat LED behaves normally as well. So I like it :).
 
Great. Will post the patch with change log updated and cc
you guys.

Regards,
Santosh

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: omap4-panda-es boot issues with v3.15-rc4

2014-05-12 Thread Santosh Shilimkar
On Sunday 11 May 2014 11:55 AM, Tony Lindgren wrote:
 * Kevin Hilman khil...@linaro.org [140509 16:46]:
 Roger Quadros rog...@ti.com writes:

 Kevin,

 On 05/09/2014 01:15 AM, Kevin Hilman wrote:
 Tony Lindgren t...@atomide.com writes:

 [...]

 ..but I think I found the cause for recent hangs on panda, just a wild
 guess based on looking at the recent cpuidle patches after v3.14.

 Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts
 until all coupled CPUs leave idle) makes booting work reliably again
 on panda.

 Can you guys confirm, so far no issues here after few boot tests,
 but it might be too early to tell.

 Reverting that makes things a bit more stable, but it still eventually
 fails in the same way.  For me it took 8 boots for it to eventually
 fail.

 However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable
 (20+ boots in a row and still going.)


 Can you please test with CPU_IDLE enabled but C3 disabled as in below patch?
 It worked for me 10/10 boots.

 Yup, it worked for me too for 10/10 boots in a row.
 
 But what has caused this regression, does it work reliably with let's
 say v3.13 or v3.12?
 
IIRC things were stable until some CPUIDLE code consolidation happened.
I don't recall exactly, but someone discussed it a while back.

Can you re-run your test cases with the patch at the end of this email? This
is just a hunch, so don't blame me if I waste your time testing the
patch.

regards,
Santosh

From bdd30d68f8fa659aa0e3ce436f94029a7719036b Mon Sep 17 00:00:00 2001
From: Santosh Shilimkar santosh.shilim...@ti.com
Date: Mon, 12 May 2014 17:37:59 -0400
Subject: [PATCH] Revert "cpuidle / omap4 : use CPUIDLE_FLAG_TIMER_STOP flag"

This reverts commit cb7094e848f7bcaa0a4cda3db4b232f08dbf5b78.

Conflicts:

	arch/arm/mach-omap2/cpuidle44xx.c
---
 arch/arm/mach-omap2/cpuidle44xx.c |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c
index 01fc710..aae3606 100644
--- a/arch/arm/mach-omap2/cpuidle44xx.c
+++ b/arch/arm/mach-omap2/cpuidle44xx.c
@@ -83,6 +83,7 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
 {
 	struct idle_statedata *cx = state_ptr + index;
 	u32 mpuss_can_lose_context = 0;
+	int cpu_id = smp_processor_id();
 
 	/*
 	 * CPU0 has to wait and stay ON until CPU1 is OFF state.
@@ -110,6 +111,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
 	mpuss_can_lose_context = (cx->mpu_state == PWRDM_POWER_RET) &&
 				 (cx->mpu_logic_state == PWRDM_POWER_OFF);
 
+	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu_id);
+
 	/*
 	 * Call idle CPU PM enter notifier chain so that
 	 * VFP and per CPU interrupt context is saved.
@@ -165,6 +168,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
 	if (dev->cpu == 0 && mpuss_can_lose_context)
 		cpu_cluster_pm_exit();
 
+	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu_id);
+
 fail:
 	cpuidle_coupled_parallel_barrier(dev, &abort_barrier);
 	cpu_done[dev->cpu] = false;
@@ -189,8 +194,7 @@ static struct cpuidle_driver omap4_idle_driver = {
 			/* C2 - CPU0 OFF + CPU1 OFF + MPU CSWR */
 			.exit_latency = 328 + 440,
 			.target_residency = 960,
-			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED |
-				 CPUIDLE_FLAG_TIMER_STOP,
+			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED,
 			.enter = omap_enter_idle_coupled,
 			.name = "C2",
 			.desc = "CPUx OFF, MPUSS CSWR",
@@ -199,8 +203,7 @@ static struct cpuidle_driver omap4_idle_driver = {
 			/* C3 - CPU0 OFF + CPU1 OFF + MPU OSWR */
 			.exit_latency = 460 + 518,
 			.target_residency = 1100,
-			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED |
-				 CPUIDLE_FLAG_TIMER_STOP,
+			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED,
 			.enter = omap_enter_idle_coupled,
 			.name = "C3",
 			.desc = "CPUx OFF, MPUSS OSWR",
-- 
1.7.9.5
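
For readers following along, the shape of what the revert puts back into the driver can be sketched in plain userspace C: switch to the broadcast timer before the CPU powers down, and switch back after it wakes up. This is only a toy model, not kernel code. Every `mock_` name below is hypothetical, and the model assumes nothing more than "a CPU whose local timer stops needs some other wakeup source". It says nothing about why the flag-based path misbehaved on panda, which the thread does not establish.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two broadcast notifications in the patch. */
enum mock_evt { MOCK_BROADCAST_ENTER, MOCK_BROADCAST_EXIT };

struct mock_cpu {
	bool local_timer_running; /* false while the CPU is powered down */
	bool on_broadcast;        /* true while the broadcast timer covers this CPU */
	bool lost_tick;           /* set if the CPU went down with no wakeup source */
};

/* Toy version of the notify call: only records who provides the wakeup. */
static void mock_clockevents_notify(enum mock_evt evt, struct mock_cpu *cpu)
{
	assert(cpu);
	cpu->on_broadcast = (evt == MOCK_BROADCAST_ENTER);
}

/* Toy low-power state: the local timer stops for the duration. */
static void mock_low_power_state(struct mock_cpu *cpu)
{
	cpu->local_timer_running = false;
	if (!cpu->on_broadcast)
		cpu->lost_tick = true; /* nothing was armed to wake this CPU */
	cpu->local_timer_running = true; /* timer is back once the CPU is up */
}

/* Idle entry with or without the enter/exit bracketing from the patch. */
static void mock_enter_idle(struct mock_cpu *cpu, bool bracket)
{
	if (bracket)
		mock_clockevents_notify(MOCK_BROADCAST_ENTER, cpu);
	mock_low_power_state(cpu);
	if (bracket)
		mock_clockevents_notify(MOCK_BROADCAST_EXIT, cpu);
}
```

In this model, calling `mock_enter_idle()` with `bracket` set leaves `lost_tick` clear, while calling it without the bracketing sets it.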




Re: omap4-panda-es boot issues with v3.15-rc4

2014-05-12 Thread Tony Lindgren
* Santosh Shilimkar santosh.shilim...@ti.com [140512 14:41]:
 On Sunday 11 May 2014 11:55 AM, Tony Lindgren wrote:
  * Kevin Hilman khil...@linaro.org [140509 16:46]:
  Roger Quadros rog...@ti.com writes:
 
  Kevin,
 
  On 05/09/2014 01:15 AM, Kevin Hilman wrote:
  Tony Lindgren t...@atomide.com writes:
 
  [...]
 
  ..but I think I found the cause for recent hangs on panda, just a wild
  guess based on looking at the recent cpuidle patches after v3.14.
 
  Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts
  until all coupled CPUs leave idle) makes booting work reliably again
  on panda.
 
  Can you guys confirm, so far no issues here after few boot tests,
  but it might be too early to tell.
 
  Reverting that makes things a bit more stable, but it still eventually
  fails in the same way.  For me it took 8 boots for it to eventually
  fail.
 
  However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable
  (20+ boots in a row and still going.)
 
 
  Can you please test with CPU_IDLE enabled but C3 disabled as in below 
  patch?
  It worked for me 10/10 boots.
 
  Yup, it worked for me too for 10/10 boots in a row.
  
  But what has caused this regression, does it work reliably with let's
  say v3.13 or v3.12?
  
 IIRC things were stable till some CPUIDLE code consolidation happened.
 I don't recall exactly but some one did discuss about it a while back.

OK that's good to hear.
 
 Can you re-run your test-cases with patch at end of the email. This
 is just a hunch so don't blame me if I waste your time testing the
 patch.

Seems to work after adding #include <linux/clockchips.h>. I did about 10
reboots and they all succeeded for me. Without your revert, I'm getting
a hang (with sysrq not working) about 1/3 of the boots.
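
As a rough sanity check on how much weight ten clean boots carry against a hang rate of about 1/3, the chance of a clean run can be computed directly. The sketch below assumes boots are independent and the per-boot hang probability is constant, which is a simplification; the function name is made up for illustration.

```c
#include <assert.h>

/*
 * Probability that n boots in a row all succeed when each boot hangs
 * with probability p_hang. Assumes independent boots.
 */
static double clean_run_probability(double p_hang, unsigned int n)
{
	double p = 1.0;
	unsigned int i;

	assert(p_hang >= 0.0 && p_hang <= 1.0);
	for (i = 0; i < n; i++)
		p *= 1.0 - p_hang;
	return p;
}
```

With `p_hang` = 1/3 and `n` = 10 this gives (2/3)^10, a bit under 2%, so under those assumptions ten clean boots are unlikely to be a fluke but not impossible. Longer runs tighten that further.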

Kevin, Roger, does the revert from Santosh work for you too?

BTW, I think the RCU stall was/is a separate issue. That one is different
in that the system actually recovers after about a minute, or after sysrq
ctrl-a f h or l. Sorry, I no longer know whether the RCU stall only happens
with the older kernels from around v3.10, or whether it still happens too.

Regards,

Tony
 
 From bdd30d68f8fa659aa0e3ce436f94029a7719036b Mon Sep 17 00:00:00 2001
 From: Santosh Shilimkar santosh.shilim...@ti.com
 Date: Mon, 12 May 2014 17:37:59 -0400
 Subject: [PATCH] Revert "cpuidle / omap4 : use CPUIDLE_FLAG_TIMER_STOP flag"

 This reverts commit cb7094e848f7bcaa0a4cda3db4b232f08dbf5b78.

 Conflicts:

 	arch/arm/mach-omap2/cpuidle44xx.c
 ---
  arch/arm/mach-omap2/cpuidle44xx.c |   11 +++++++----
  1 file changed, 7 insertions(+), 4 deletions(-)

 diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c
 index 01fc710..aae3606 100644
 --- a/arch/arm/mach-omap2/cpuidle44xx.c
 +++ b/arch/arm/mach-omap2/cpuidle44xx.c
 @@ -83,6 +83,7 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
  {
  	struct idle_statedata *cx = state_ptr + index;
  	u32 mpuss_can_lose_context = 0;
 +	int cpu_id = smp_processor_id();
  
  	/*
  	 * CPU0 has to wait and stay ON until CPU1 is OFF state.
 @@ -110,6 +111,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
  	mpuss_can_lose_context = (cx->mpu_state == PWRDM_POWER_RET) &&
  				 (cx->mpu_logic_state == PWRDM_POWER_OFF);
  
 +	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu_id);
 +
  	/*
  	 * Call idle CPU PM enter notifier chain so that
  	 * VFP and per CPU interrupt context is saved.
 @@ -165,6 +168,8 @@ static int omap_enter_idle_coupled(struct cpuidle_device *dev,
  	if (dev->cpu == 0 && mpuss_can_lose_context)
  		cpu_cluster_pm_exit();
  
 +	clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu_id);
 +
  fail:
  	cpuidle_coupled_parallel_barrier(dev, &abort_barrier);
  	cpu_done[dev->cpu] = false;
 @@ -189,8 +194,7 @@ static struct cpuidle_driver omap4_idle_driver = {
  			/* C2 - CPU0 OFF + CPU1 OFF + MPU CSWR */
  			.exit_latency = 328 + 440,
  			.target_residency = 960,
 -			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED |
 -				 CPUIDLE_FLAG_TIMER_STOP,
 +			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED,
  			.enter = omap_enter_idle_coupled,
  			.name = "C2",
  			.desc = "CPUx OFF, MPUSS CSWR",
 @@ -199,8 +203,7 @@ static struct cpuidle_driver omap4_idle_driver = {
  			/* C3 - CPU0 OFF + CPU1 OFF + MPU OSWR */
  			.exit_latency = 460 + 518,
  			.target_residency = 1100,
 -			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED |
 -				 CPUIDLE_FLAG_TIMER_STOP,
 +			.flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_COUPLED,
   .enter = 

Re: omap4-panda-es boot issues with v3.15-rc4

2014-05-12 Thread Kevin Hilman
Santosh Shilimkar santosh.shilim...@ti.com writes:

 On Sunday 11 May 2014 11:55 AM, Tony Lindgren wrote:
 * Kevin Hilman khil...@linaro.org [140509 16:46]:
 Roger Quadros rog...@ti.com writes:

 Kevin,

 On 05/09/2014 01:15 AM, Kevin Hilman wrote:
 Tony Lindgren t...@atomide.com writes:

 [...]

 ..but I think I found the cause for recent hangs on panda, just a wild
 guess based on looking at the recent cpuidle patches after v3.14.

 Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts
 until all coupled CPUs leave idle) makes booting work reliably again
 on panda.

 Can you guys confirm, so far no issues here after few boot tests,
 but it might be too early to tell.

 Reverting that makes things a bit more stable, but it still eventually
 fails in the same way.  For me it took 8 boots for it to eventually
 fail.

 However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable
 (20+ boots in a row and still going.)


 Can you please test with CPU_IDLE enabled but C3 disabled as in below 
 patch?
 It worked for me 10/10 boots.

 Yup, it worked for me too for 10/10 boots in a row.
 
 But what has caused this regression, does it work reliably with let's
 say v3.13 or v3.12?
 
 IIRC things were stable till some CPUIDLE code consolidation happened.
 I don't recall exactly but some one did discuss about it a while back.

 Can you re-run your test-cases with patch at end of the email. This
 is just a hunch so don't blame me if I waste your time testing the
 patch.

With your patch applied on top of next-20140512, my 4460 Panda-ES has
booted 25 times in a row, and still going.

Kevin


Re: omap4-panda-es boot issues with v3.15-rc4

2014-05-11 Thread Tony Lindgren
* Kevin Hilman khil...@linaro.org [140509 16:46]:
 Roger Quadros rog...@ti.com writes:
 
  Kevin,
 
  On 05/09/2014 01:15 AM, Kevin Hilman wrote:
  Tony Lindgren t...@atomide.com writes:
  
  [...]
  
  ..but I think I found the cause for recent hangs on panda, just a wild
  guess based on looking at the recent cpuidle patches after v3.14.
 
  Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts
  until all coupled CPUs leave idle) makes booting work reliably again
  on panda.
 
  Can you guys confirm, so far no issues here after few boot tests,
  but it might be too early to tell.
  
  Reverting that makes things a bit more stable, but it still eventually
  fails in the same way.  For me it took 8 boots for it to eventually
  fail.
  
  However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable
  (20+ boots in a row and still going.)
  
 
  Can you please test with CPU_IDLE enabled but C3 disabled as in below patch?
  It worked for me 10/10 boots.
 
 Yup, it worked for me too for 10/10 boots in a row.

But what caused this regression? Does it work reliably with, say,
v3.13 or v3.12?

Regards,

Tony


Re: omap4-panda-es boot issues with v3.15-rc4

2014-05-09 Thread Roger Quadros
On 05/08/2014 07:55 PM, Tony Lindgren wrote:
 * Kevin Hilman khil...@linaro.org [140508 08:40]:
 On Thu, May 8, 2014 at 8:31 AM, Kevin Hilman khil...@linaro.org wrote:
 Roger Quadros rog...@ti.com writes:

 Hi,

 Nishant pointed me to a booting issue with omap4-panda-es on linux-next 
 but I'm observing
 similar issues, although less frequent, with v3.15-rc4 as well.

 Configuration:

 - kernel v3.15-rc4 or linux-next (20140507)
 - multi_v7_defconfig with LEDS_TRIGGER_HEARTBEAT and LEDS_GPIO enabled
 - u-boot/master   173d294b94cf

 Observations:

 - Out of 10 boots a few may not succeed and hang midway without any 
 warnings. Heartbeat LED stops.
 e.g. http://www.hastebin.com/ebumojegoq.vhdl

 - Hang more noticeable on linux-next (20140507) than on v3.15-rc4

 I've been noticing the same thing for a while with my boot tests.  For
 me, next-20140508 is failing most of the time now.

 - Hang more noticeable with USB_EHCI_HCD enabled but hang observed even 
 without USB_EHCI_HCD.
 Maybe related to when high speed interrupts occur in the boot process.

 - On successful boots following warning is seen
 [4.010375] gic_timer_retrigger: lost localtimer interrupt

 - On successful boots heartbeat LED stops blinking after boot process and 
 left idle. LED can remain stuck in
 ON state as well. It does blink again when doing activity on console.

 Workaround:

 - Disabling CPU_IDLE or even just disabling C3 (MPU OSWR) seems to fix all 
 the above issues.

 I don't really know what exactly is the issue but it seems to be specific 
 to OMAP4, GIC, MPU OSWR.

 I can confirm that disabling CONFIG_CPU_IDLE seems to make the problem
 go away.  Hmm

 Another finger pointing in the same direction: omap2plus_defconfig +
 CONFIG_CPU_IDLE=y also fails to boot rather consistently in today's
 -next.
 
 Booting today's next with multi_v7_defconfig (so cpuidle enabled) on
 omap4 sdp seems to boot reliably. And it's not producing these:
 
 gic_timer_retrigger: lost localtimer interrupt 
 
 while panda is producing those errors like Roger mentioned.
 
 It seems that the USB networking is the main difference between
 omap4 sdp and panda?

Is your sdp using omap4430?

To confirm 4430 vs 4460 I ran 10 tests each on omap4430 panda and omap4460 
panda.

4430panda fails 2/10 times.
4460panda fails 7/10 times.

cheers,
-roger


Re: omap4-panda-es boot issues with v3.15-rc4

2014-05-09 Thread Roger Quadros
Kevin,

On 05/09/2014 01:15 AM, Kevin Hilman wrote:
 Tony Lindgren t...@atomide.com writes:
 
 [...]
 
 ..but I think I found the cause for recent hangs on panda, just a wild
 guess based on looking at the recent cpuidle patches after v3.14.

 Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts
 until all coupled CPUs leave idle) makes booting work reliably again
 on panda.

 Can you guys confirm, so far no issues here after few boot tests,
 but it might be too early to tell.
 
 Reverting that makes things a bit more stable, but it still eventually
 fails in the same way.  For me it took 8 boots for it to eventually
 fail.
 
 However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable
 (20+ boots in a row and still going.)
 

Can you please test with CPU_IDLE enabled but C3 disabled, as in the patch below?
It worked for me in 10/10 boots.

diff --git a/arch/arm/mach-omap2/cpuidle44xx.c b/arch/arm/mach-omap2/cpuidle44xx.c
index 01fc710..99362ff 100644
--- a/arch/arm/mach-omap2/cpuidle44xx.c
+++ b/arch/arm/mach-omap2/cpuidle44xx.c
@@ -206,7 +206,12 @@ static struct cpuidle_driver omap4_idle_driver = {
 			.desc = "CPUx OFF, MPUSS OSWR",
 		},
 	},
-	.state_count = ARRAY_SIZE(omap4_idle_data),
+/*
+ * Disable C3 state since it is unstable
+ *
+ * .state_count = ARRAY_SIZE(omap4_idle_data),
+ */
+	.state_count = 2,
 	.safe_state_index = 0,
 };
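
To illustrate why lowering .state_count is enough to keep C3 out of play, here is a small userspace sketch of a state picker that only looks at the first state_count entries. It is a deliberate simplification and not the kernel's governor: the `sketch_` names are hypothetical, the C2 and C3 numbers are the ones visible in this thread's patch context, and the C1 numbers are an assumption since they do not appear in the thread.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_state {
	const char *name;
	unsigned int exit_latency;     /* microseconds */
	unsigned int target_residency; /* microseconds */
};

static const struct sketch_state sketch_states[] = {
	{ "C1", 2 + 2, 5 },        /* assumed values, not taken from the thread */
	{ "C2", 328 + 440, 960 },  /* from the patch context */
	{ "C3", 460 + 518, 1100 }, /* from the patch context */
};

/*
 * Return the index of the deepest state among the first state_count
 * entries whose target residency fits the predicted idle time and whose
 * exit latency stays within the latency limit. Index 0 is the fallback.
 */
static size_t sketch_select(size_t state_count, unsigned int predicted_us,
			    unsigned int latency_limit_us)
{
	size_t i, best = 0;

	assert(state_count >= 1 && state_count <= 3);
	for (i = 1; i < state_count; i++) {
		if (sketch_states[i].target_residency > predicted_us)
			break;
		if (sketch_states[i].exit_latency > latency_limit_us)
			break;
		best = i;
	}
	return best;
}
```

With a predicted idle time of 2000 us and a generous latency limit, `sketch_select(3, ...)` returns index 2 (C3), while `sketch_select(2, ...)` can never return more than index 1 (C2), which is the effect of the one-line change above.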
 





Re: omap4-panda-es boot issues with v3.15-rc4

2014-05-09 Thread Roger Quadros
Grygorii,

On 05/08/2014 08:12 PM, Grygorii Strashko wrote:
 Hi,
 
 On 05/08/2014 06:40 PM, Kevin Hilman wrote:
 On Thu, May 8, 2014 at 8:31 AM, Kevin Hilman khil...@linaro.org wrote:
 Roger Quadros rog...@ti.com writes:

 Hi,

 Nishant pointed me to a booting issue with omap4-panda-es on linux-next 
 but I'm observing
 similar issues, although less frequent, with v3.15-rc4 as well.

 Configuration:

 - kernel v3.15-rc4 or linux-next (20140507)
 - multi_v7_defconfig with LEDS_TRIGGER_HEARTBEAT and LEDS_GPIO enabled
 - u-boot/master   173d294b94cf

 Observations:

 - Out of 10 boots a few may not succeed and hang midway without any 
 warnings. Heartbeat LED stops.
 e.g. http://www.hastebin.com/ebumojegoq.vhdl

 - Hang more noticeable on linux-next (20140507) than on v3.15-rc4

 I've been noticing the same thing for a while with my boot tests.  For
 me, next-20140508 is failing most of the time now.

 - Hang more noticeable with USB_EHCI_HCD enabled but hang observed even 
 without USB_EHCI_HCD.
 Maybe related to when high speed interrupts occur in the boot process.

 - On successful boots following warning is seen
 [4.010375] gic_timer_retrigger: lost localtimer interrupt

 - On successful boots heartbeat LED stops blinking after boot process and 
 left idle. LED can remain stuck in
 ON state as well. It does blink again when doing activity on console.

 Workaround:

 - Disabling CPU_IDLE or even just disabling C3 (MPU OSWR) seems to fix all 
 the above issues.

 I don't really know what exactly is the issue but it seems to be specific 
 to OMAP4, GIC, MPU OSWR.

 I can confirm that disabling CONFIG_CPU_IDLE seems to make the problem
 go away.  Hmm

 Another finger pointing in the same direction: omap2plus_defconfig +
 CONFIG_CPU_IDLE=y also fails to boot rather consistently in today's
 -next.
 
 Is it observed on OMAP4460 only?
 if no - it's smth new.
 if yes - may be some racing condition is still present.

I could observe it on 4430 as well, just less frequently: 2/10 times on 4430
vs 7/10 times on 4460.

 
 Roger, is it possible to connect debugger and check GIC distributor status
 (gic_dist_base_addr + GIC_DIST_CTRL) in case of failure?

Sorry, I do not have a debugger with me at the moment.
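
For anyone who does have a debugger attached, the check suggested above boils down to reading one register and looking at its enable bit. The helpers below are a hedged sketch with hypothetical `sketch_` names: the 0x000 offset matches the kernel's GIC_DIST_CTRL define, bit 0 is assumed to be the distributor enable bit, and the base address has to come from the platform (it is not given in this thread).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_GIC_DIST_CTRL	0x000u    /* distributor control register offset */
#define SKETCH_GIC_DIST_ENABLE	(1u << 0) /* assumed: bit 0 enables forwarding */

/* Address to inspect, given the distributor base supplied by the caller. */
static uintptr_t sketch_gic_dist_ctrl_addr(uintptr_t gic_dist_base)
{
	assert(gic_dist_base != 0);
	return gic_dist_base + SKETCH_GIC_DIST_CTRL;
}

/* Decode a value read from that address: is the distributor enabled? */
static bool sketch_gic_dist_enabled(uint32_t ctrl_value)
{
	return (ctrl_value & SKETCH_GIC_DIST_ENABLE) != 0;
}
```

A value with bit 0 clear after a hang would suggest the distributor was left disabled. A value with bit 0 set rules that out, but says nothing about the other failure modes discussed here.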
 
 According to the current code (OMAP4460), CPU0 can get stuck only if
 CPU1 is somehow kicked out of the PWRDM_POWER_OFF state by something
 other than CPU0. The code assumes that CPU1 can exit the PWRDM_POWER_OFF
 state only when CPU0 calls clkdm_wakeup(cpu_clkdm[1]);
 
 Sorry, but I'm not able to debug it now.

Stupid question, is heartbeat LED even supposed to stop blinking in C3 state?
It would make a user think that the board is dead.

cheers,
-roger


Re: omap4-panda-es boot issues with v3.15-rc4

2014-05-09 Thread Nishanth Menon
On Fri, May 9, 2014 at 3:30 AM, Roger Quadros rog...@ti.com wrote:

 Stupid question, is heartbeat LED even supposed to stop blinking in C3 state?
 It would make a user think that the board is dead.

I believe yes - we have tick suppression. Otherwise we'd just be wasting
power by waking up only to blink an LED. Some deeper C states need
higher latencies.

Regards,
Nishanth Menon


Re: omap4-panda-es boot issues with v3.15-rc4

2014-05-09 Thread Kevin Hilman
Roger Quadros rog...@ti.com writes:

 Kevin,

 On 05/09/2014 01:15 AM, Kevin Hilman wrote:
 Tony Lindgren t...@atomide.com writes:
 
 [...]
 
 ..but I think I found the cause for recent hangs on panda, just a wild
 guess based on looking at the recent cpuidle patches after v3.14.

 Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts
 until all coupled CPUs leave idle) makes booting work reliably again
 on panda.

 Can you guys confirm, so far no issues here after few boot tests,
 but it might be too early to tell.
 
 Reverting that makes things a bit more stable, but it still eventually
 fails in the same way.  For me it took 8 boots for it to eventually
 fail.
 
 However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable
 (20+ boots in a row and still going.)
 

 Can you please test with CPU_IDLE enabled but C3 disabled as in below patch?
 It worked for me 10/10 boots.

Yup, it worked for me too for 10/10 boots in a row.

Kevin


Re: omap4-panda-es boot issues with v3.15-rc4

2014-05-08 Thread Kevin Hilman
Roger Quadros rog...@ti.com writes:

 Hi,

 Nishant pointed me to a booting issue with omap4-panda-es on linux-next but 
 I'm observing
 similar issues, although less frequent, with v3.15-rc4 as well.

 Configuration:

 - kernel v3.15-rc4 or linux-next (20140507)
 - multi_v7_defconfig with LEDS_TRIGGER_HEARTBEAT and LEDS_GPIO enabled
 - u-boot/master   173d294b94cf

 Observations:

 - Out of 10 boots a few may not succeed and hang midway without any warnings. 
 Heartbeat LED stops.
 e.g. http://www.hastebin.com/ebumojegoq.vhdl

 - Hang more noticeable on linux-next (20140507) than on v3.15-rc4

I've been noticing the same thing for a while with my boot tests.  For
me, next-20140508 is failing most of the time now.

 - Hang more noticeable with USB_EHCI_HCD enabled but hang observed even 
 without USB_EHCI_HCD.
 Maybe related to when high speed interrupts occur in the boot process.

 - On successful boots following warning is seen
 [4.010375] gic_timer_retrigger: lost localtimer interrupt

 - On successful boots heartbeat LED stops blinking after boot process and 
 left idle. LED can remain stuck in
 ON state as well. It does blink again when doing activity on console.

 Workaround:

 - Disabling CPU_IDLE or even just disabling C3 (MPU OSWR) seems to fix all 
 the above issues.

 I don't really know what exactly is the issue but it seems to be specific to 
 OMAP4, GIC, MPU OSWR.

I can confirm that disabling CONFIG_CPU_IDLE seems to make the problem
go away.  Hmm

Kevin




Re: omap4-panda-es boot issues with v3.15-rc4

2014-05-08 Thread Kevin Hilman
On Thu, May 8, 2014 at 8:31 AM, Kevin Hilman khil...@linaro.org wrote:
 Roger Quadros rog...@ti.com writes:

 Hi,

 Nishant pointed me to a booting issue with omap4-panda-es on linux-next but 
 I'm observing
 similar issues, although less frequent, with v3.15-rc4 as well.

 Configuration:

 - kernel v3.15-rc4 or linux-next (20140507)
 - multi_v7_defconfig with LEDS_TRIGGER_HEARTBEAT and LEDS_GPIO enabled
 - u-boot/master   173d294b94cf

 Observations:

 - Out of 10 boots a few may not succeed and hang midway without any 
 warnings. Heartbeat LED stops.
 e.g. http://www.hastebin.com/ebumojegoq.vhdl

 - Hang more noticeable on linux-next (20140507) than on v3.15-rc4

 I've been noticing the same thing for a while with my boot tests.  For
 me, next-20140508 is failing most of the time now.

 - Hang more noticeable with USB_EHCI_HCD enabled but hang observed even 
 without USB_EHCI_HCD.
 Maybe related to when high speed interrupts occur in the boot process.

 - On successful boots following warning is seen
 [4.010375] gic_timer_retrigger: lost localtimer interrupt

 - On successful boots heartbeat LED stops blinking after boot process and 
 left idle. LED can remain stuck in
 ON state as well. It does blink again when doing activity on console.

 Workaround:

 - Disabling CPU_IDLE or even just disabling C3 (MPU OSWR) seems to fix all 
 the above issues.

 I don't really know what exactly is the issue but it seems to be specific to 
 OMAP4, GIC, MPU OSWR.

 I can confirm that disabling CONFIG_CPU_IDLE seems to make the problem
 go away.  Hmm

Another finger pointing in the same direction: omap2plus_defconfig +
CONFIG_CPU_IDLE=y also fails to boot rather consistently in today's
-next.

Kevin


Re: omap4-panda-es boot issues with v3.15-rc4

2014-05-08 Thread Grygorii Strashko
Hi,

On 05/08/2014 06:40 PM, Kevin Hilman wrote:
 On Thu, May 8, 2014 at 8:31 AM, Kevin Hilman khil...@linaro.org wrote:
 Roger Quadros rog...@ti.com writes:

 Hi,

 Nishant pointed me to a booting issue with omap4-panda-es on linux-next, but
 I'm observing similar issues, although less frequent, with v3.15-rc4 as well.

 Configuration:

 - kernel v3.15-rc4 or linux-next (20140507)
 - multi_v7_defconfig with LEDS_TRIGGER_HEARTBEAT and LEDS_GPIO enabled
 - u-boot/master   173d294b94cf

 Observations:

 - Out of 10 boots a few may not succeed and hang midway without any 
 warnings. Heartbeat LED stops.
 e.g. http://www.hastebin.com/ebumojegoq.vhdl

 - Hang more noticeable on linux-next (20140507) than on v3.15-rc4

 I've been noticing the same thing for a while with my boot tests.  For
 me, next-20140508 is failing most of the time now.

 - Hang more noticeable with USB_EHCI_HCD enabled but hang observed even 
 without USB_EHCI_HCD.
 Maybe related to when high speed interrupts occur in the boot process.

 - On successful boots following warning is seen
 [4.010375] gic_timer_retrigger: lost localtimer interrupt

 - On successful boots the heartbeat LED stops blinking after the boot process
 when the board is left idle. The LED can remain stuck in the ON state as well.
 It does blink again when there is activity on the console.

 Workaround:

 - Disabling CPU_IDLE or even just disabling C3 (MPU OSWR) seems to fix all 
 the above issues.

 I don't really know what exactly is the issue but it seems to be specific 
 to OMAP4, GIC, MPU OSWR.

 I can confirm that disabling CONFIG_CPU_IDLE seems to make the problem
 go away.  Hmm
 
 Another finger pointing in the same direction: omap2plus_defconfig +
 CONFIG_CPU_IDLE=y also fails to boot rather consistently in today's
 -next.

Is it observed on OMAP4460 only?
If not, it's something new.
If yes, maybe some race condition is still present.

Roger, is it possible to connect debugger and check GIC distributor status
(gic_dist_base_addr + GIC_DIST_CTRL) in case of failure?
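
A minimal sketch of how a value read back from gic_dist_base_addr +
GIC_DIST_CTRL with a debugger could be interpreted, assuming the standard GIC
layout in which bit 0 of the distributor control register is the enable bit.
The helper name gic_dist_enabled and the sample values are hypothetical and
only for illustration; they are not taken from the kernel sources.

```python
# Bit 0 of the GIC distributor control register is the enable bit
# (assumption stated above: standard GIC register layout).
GICD_CTLR_ENABLE = 0x1


def gic_dist_enabled(ctrl_value: int) -> bool:
    """Return True if the raw register value has the enable bit set.

    ctrl_value is the 32-bit word read at gic_dist_base_addr + GIC_DIST_CTRL.
    """
    return bool(ctrl_value & GICD_CTLR_ENABLE)


if __name__ == "__main__":
    # Hypothetical values, as they might be typed in from a debugger window.
    for raw in (0x0, 0x1):
        state = "enabled" if gic_dist_enabled(raw) else "disabled"
        print(f"GIC_DIST_CTRL = {raw:#010x}: distributor {state}")
```

If the distributor turns out to be disabled at the point of the hang, that
would be consistent with the race suspected here, but it would not prove it.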

According to the current code (OMAP4460), CPU0 can get stuck only if CPU1 is
kicked out of the PWRDM_POWER_OFF state somehow, but not by CPU0. The code
assumes that CPU1 can exit the PWRDM_POWER_OFF state only when CPU0 calls
clkdm_wakeup(cpu_clkdm[1]);

Sorry, but I'm not able to debug it now.

Regards,
-grygorii


Re: omap4-panda-es boot issues with v3.15-rc4

2014-05-08 Thread Tony Lindgren
* Kevin Hilman khil...@linaro.org [140508 08:40]:
 On Thu, May 8, 2014 at 8:31 AM, Kevin Hilman khil...@linaro.org wrote:
  Roger Quadros rog...@ti.com writes:
 
  Hi,
 
  Nishant pointed me to a booting issue with omap4-panda-es on linux-next, but
  I'm observing similar issues, although less frequent, with v3.15-rc4 as well.
 
  Configuration:
 
  - kernel v3.15-rc4 or linux-next (20140507)
  - multi_v7_defconfig with LEDS_TRIGGER_HEARTBEAT and LEDS_GPIO enabled
  - u-boot/master   173d294b94cf
 
  Observations:
 
  - Out of 10 boots a few may not succeed and hang midway without any 
  warnings. Heartbeat LED stops.
  e.g. http://www.hastebin.com/ebumojegoq.vhdl
 
  - Hang more noticeable on linux-next (20140507) than on v3.15-rc4
 
  I've been noticing the same thing for a while with my boot tests.  For
  me, next-20140508 is failing most of the time now.
 
  - Hang more noticeable with USB_EHCI_HCD enabled but hang observed even 
  without USB_EHCI_HCD.
  Maybe related to when high speed interrupts occur in the boot process.
 
  - On successful boots following warning is seen
  [4.010375] gic_timer_retrigger: lost localtimer interrupt
 
  - On successful boots the heartbeat LED stops blinking after the boot process
  when the board is left idle. The LED can remain stuck in the ON state as well.
  It does blink again when there is activity on the console.
 
  Workaround:
 
  - Disabling CPU_IDLE or even just disabling C3 (MPU OSWR) seems to fix all 
  the above issues.
 
  I don't really know what exactly is the issue but it seems to be specific 
  to OMAP4, GIC, MPU OSWR.
 
  I can confirm that disabling CONFIG_CPU_IDLE seems to make the problem
  go away.  Hmm
 
 Another finger pointing in the same direction: omap2plus_defconfig +
 CONFIG_CPU_IDLE=y also fails to boot rather consistently in today's
 -next.

Booting today's next with multi_v7_defconfig (so cpuidle enabled) on
omap4 sdp seems to boot reliably. And it's not producing these:

gic_timer_retrigger: lost localtimer interrupt 

while panda is producing those errors like Roger mentioned.

It seems that the USB networking is the main difference between
omap4 sdp and panda?
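
For sorting captured console logs from repeated boot tests, a small helper
along these lines could count the warning quoted above. This is only a sketch:
the function name and the second sample line are made up for illustration; the
warning string is copied from the log line quoted in this thread.

```python
# Warning text copied verbatim from the boot log quoted in this thread.
LOST_TIMER_WARNING = "gic_timer_retrigger: lost localtimer interrupt"


def count_lost_timer_warnings(log_text: str) -> int:
    """Count console log lines that contain the lost-localtimer warning."""
    return sum(1 for line in log_text.splitlines()
               if LOST_TIMER_WARNING in line)


if __name__ == "__main__":
    # First line is from the thread; the second is a placeholder, not real output.
    sample = (
        "[4.010375] gic_timer_retrigger: lost localtimer interrupt\n"
        "[4.020000] placeholder line, not from a real log\n"
    )
    print(count_lost_timer_warnings(sample), "warning(s) found")
```

Running it over logs from both boards would make the sdp versus panda
comparison easy to repeat on later -next tags.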

Regards,

Tony


Re: omap4-panda-es boot issues with v3.15-rc4

2014-05-08 Thread Tony Lindgren
Added a few cpuidle people to Cc on this regression.

* Tony Lindgren t...@atomide.com [140508 09:57]:
 * Kevin Hilman khil...@linaro.org [140508 08:40]:
  On Thu, May 8, 2014 at 8:31 AM, Kevin Hilman khil...@linaro.org wrote:
   Roger Quadros rog...@ti.com writes:
  
   Hi,
  
   Nishant pointed me to a booting issue with omap4-panda-es on linux-next, but
   I'm observing similar issues, although less frequent, with v3.15-rc4 as well.
  
   Configuration:
  
   - kernel v3.15-rc4 or linux-next (20140507)
   - multi_v7_defconfig with LEDS_TRIGGER_HEARTBEAT and LEDS_GPIO enabled
   - u-boot/master   173d294b94cf
  
   Observations:
  
   - Out of 10 boots a few may not succeed and hang midway without any 
   warnings. Heartbeat LED stops.
   e.g. http://www.hastebin.com/ebumojegoq.vhdl
  
   - Hang more noticeable on linux-next (20140507) than on v3.15-rc4
  
   I've been noticing the same thing for a while with my boot tests.  For
   me, next-20140508 is failing most of the time now.
  
   - Hang more noticeable with USB_EHCI_HCD enabled but hang observed even 
   without USB_EHCI_HCD.
   Maybe related to when high speed interrupts occur in the boot process.
  
   - On successful boots following warning is seen
   [4.010375] gic_timer_retrigger: lost localtimer interrupt
  
   - On successful boots the heartbeat LED stops blinking after the boot
   process when the board is left idle. The LED can remain stuck in the ON
   state as well. It does blink again when there is activity on the console.
  
   Workaround:
  
   - Disabling CPU_IDLE or even just disabling C3 (MPU OSWR) seems to fix 
   all the above issues.
  
   I don't really know what exactly is the issue but it seems to be 
   specific to OMAP4, GIC, MPU OSWR.
  
   I can confirm that disabling CONFIG_CPU_IDLE seems to make the problem
   go away.  Hmm
  
  Another finger pointing in the same direction: omap2plus_defconfig +
  CONFIG_CPU_IDLE=y also fails to boot rather consistently in today's
  -next.
 
 Booting today's next with multi_v7_defconfig (so cpuidle enabled) on
 omap4 sdp seems to boot reliably. And it's not producing these:
 
 gic_timer_retrigger: lost localtimer interrupt 

Still seeing the above, looks like the lost localtimer interrupt
above is a separate issue..
 
 while panda is producing those errors like Roger mentioned.
 
 It seems that the USB networking is the main difference between
 omap4 sdp and panda?

..but I think I found the cause for recent hangs on panda, just a wild
guess based on looking at the recent cpuidle patches after v3.14.

Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts
until all coupled CPUs leave idle) makes booting work reliably again
on panda.

Can you guys confirm? So far no issues here after a few boot tests,
but it might be too early to tell.

Regards,

Tony


Re: omap4-panda-es boot issues with v3.15-rc4

2014-05-08 Thread Kevin Hilman
Tony Lindgren t...@atomide.com writes:

[...]

 ..but I think I found the cause for recent hangs on panda, just a wild
 guess based on looking at the recent cpuidle patches after v3.14.

 Looks like reverting 0b89e9aa2856 (cpuidle: delay enabling interrupts
 until all coupled CPUs leave idle) makes booting work reliably again
 on panda.

 Can you guys confirm? So far no issues here after a few boot tests,
 but it might be too early to tell.

Reverting that makes things a bit more stable, but it still eventually
fails in the same way.  For me it took 8 boots for it to eventually
fail.

However, if I build with CONFIG_CPU_IDLE=n, it becomes much more stable
(20+ boots in a row and still going.)
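
To put rough numbers on "too early to tell", here is a back-of-the-envelope
sketch. It assumes each boot hangs independently with a fixed probability; the
1-in-8 rate used in the example is purely illustrative (one hang on the 8th
boot is not a measured rate), and the function names are hypothetical.

```python
import math


def p_all_boots_pass(hang_rate: float, boots: int) -> float:
    """Probability that `boots` consecutive boots all succeed."""
    if not 0.0 < hang_rate < 1.0:
        raise ValueError("hang_rate must be strictly between 0 and 1")
    return (1.0 - hang_rate) ** boots


def boots_needed(hang_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of boots that shows at least one hang with the
    given confidence, if the true per-boot hang rate is `hang_rate`."""
    if not 0.0 < hang_rate < 1.0:
        raise ValueError("hang_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hang_rate))


if __name__ == "__main__":
    rate = 1 / 8  # illustrative assumption only, not a measurement
    print(f"P(10 clean boots) = {p_all_boots_pass(rate, 10):.3f}")
    print(f"boots for 95% confidence = {boots_needed(rate)}")
```

Under that assumed rate, 10 clean boots in a row would still happen roughly
26% of the time with the bug present, and about 23 boots are needed before a
clean run is strong evidence. That fits with 20+ clean boots being more
convincing than a handful.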

Kevin



