subject:"Re\: DSS2\/PM on 3.2 broken\?"

Re: OMAP HDQ: was Re: DSS2/PM on 3.2 broken?

2012-02-01 Thread Paul Walmsley

On Wed, 1 Feb 2012, NeilBrown wrote:

 On Fri, 27 Jan 2012 23:02:51 -0700 (MST) Paul Walmsley p...@pwsan.com wrote:
 
  Since the HDQ module doesn't support the idle protocol, the target clock 
  FSM in the CM is what should determine whether the module is considered 
  idle or not.  And as long as the bit in the CM_FCLKEN register 
  corresponding to HDQ_FCLK is set, the FSM should consider the module as 
  active, and the clockdomain should not be allowed to go inactive.
  
  I write should.  Given that big caution at the bottom of section 20.4.5 
  System Power Management and Wakeup, this appears to have not been 
  correctly implemented.
 
 Can you help me understand why you say should there.
 You seem to be implying that a down-stream clock gate should cause an
 upstream clock to stay active, but that doesn't make sense to me given the
 presence of the much richer SIDLE framework.  Also I cannot find it in the
 TRM (though there a lots of words in there and I might have missed some).

Functional clocks are almost never auto-idled by the hardware[*].  So 
when the kernel sets a CM_FCLKEN* bit in a register for a clockdomain, 
that clockdomain should never go inactive.

As I understand it, main functional clocks on I/O IP blocks aren't 
automatically idled for a few basic reasons.  One is that the main 
functional clock has to be running in order for register accesses to 
succeed, so initiators have to have some way of preventing it from going 
off while they are interacting with it.  The PRCM doesn't snoop the 
interconnect :-)  Another is that some of the I/O IP blocks can continue 
to operate after they've idle-acked.  They idle-ack, which could allow 
clocks to be gated to interconnects, but the module can still continue to 
fill or drain FIFOs, which obviously requires a clock.  This was what is 
not working correctly with the UARTs... some events that generate 
interrupts can't generate SWAKEUPs :-(  And there are a few other lesser 
reasons that I am aware of that are not worth going into at this point.

Even interface clock auto-idle works in a different way than what one 
might expect.  Even if a module idle-acks, the interface clock will stay 
on until the interface clockdomain can go idle.  So suppose that you have 
an interface clockdomain with 20 modules and 19 of them idle-ack.  The 
interface clock stays supplied to those 19 modules until the 20th 
idle-acks.  This ensures that an initiator access to a target module will 
succeed.  (This is true for OMAP2/3; OMAP4 does something more clever.) In 
general, setting ICLKEN/FCLKEN bits to zero doesn't shut down clocks 
directly; it simply indicates that the clocks may be shut down once other 
users of that clock also indicate that they don't need the clocks.

 Most modules use SIDLE to keep the upstream clock active.

It's the other way around.  The device driver uses the FCLKEN bits to keep 
the upstream clock active :-)

  Disabling AUTO_HDQ is probably a reasonable workaround.  It would prevent 
  the CORE_L4 clockdomain from going inactive, and that happens to be where 
  the functional clock originates from, also.
  
  We have a partial facility to handle this type of bug in the existing 
  code.  Could you please try the patch enclosed at the bottom of this 
  E-mail and see if it helps?
 
 Yes, that patch fixes my problem too - the HDQ keeps working.

Great; thanks for the test.  We'll implement a variant of that for 
mainline.

Please also let me know if you wind up having wakeup problems with HDQ, 
especially with MPU off-mode.  If you don't, obviously that's good; but I 
suspect you might want it at some point.

 (it's a bit smoke-and-mirrors though .. I want fclk to stay on, so let's make
 sure iclk doesn't autoidle, because we *know* they have the same source :-)

A theoretically clearer workaround would be to place the main functional 
clock's clockdomain into software-supervised wakeup as long as the hwmod 
code thinks that the HDQ IP block is active.  But that is not so easy to 
implement with our current codebase.  We'd need to implement usecounting 
for our clockdomain control operations, and the other PM code assumes that 
it can force clockdomain states directly.  So I'm not too eager to do that 
rewrite at the moment given all the other stuff that's going on...  
Meanwhile, this fix targets the target clock FSM, and that's where the bug 
is, albeit on a slightly different path.  It's also functionally 
equivalent, given that the HDQ apparently can't send wakeups.


- Paul


*. Main functional clocks can be autoidled if they are 
interface+functional clocks.  This is the case for some purely 
computational, non-I/O IP blocks such as MAILBOXES.  Those combined 
interface+functional clocks are controlled via ICLKEN registers, not 
FCLKEN registers.


--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at

Re: OMAP HDQ: was Re: DSS2/PM on 3.2 broken?

2012-01-31 Thread NeilBrown

On Fri, 27 Jan 2012 23:02:51 -0700 (MST) Paul Walmsley p...@pwsan.com wrote:


 Since the HDQ module doesn't support the idle protocol, the target clock 
 FSM in the CM is what should determine whether the module is considered 
 idle or not.  And as long as the bit in the CM_FCLKEN register 
 corresponding to HDQ_FCLK is set, the FSM should consider the module as 
 active, and the clockdomain should not be allowed to go inactive.
 
 I write should.  Given that big caution at the bottom of section 20.4.5 
 System Power Management and Wakeup, this appears to have not been 
 correctly implemented.

Can you help me understand why you say should there.
You seem to be implying that a down-stream clock gate should cause an
upstream clock to stay active, but that doesn't make sense to me given the
presence of the much richer SIDLE framework.  Also I cannot find it in the
TRM (though there a lots of words in there and I might have missed some).

Most modules use SIDLE to keep the upstream clock active.  HDQ doesn't have
that so it needs software over-rid to keep it active.

 
  It also mentions the AUTO_HDQ bit of PRCM.CM_AUTOIDLE1_CORE.  This had me
  confused for a while, but I think it only affects the iclk.  
 
 That's correct.
 
  I don't think the iclk is the problem.  It is the fclk that is 
  disappearing because the clkdm that feeds it is being turned off.
 
 Disabling AUTO_HDQ is probably a reasonable workaround.  It would prevent 
 the CORE_L4 clockdomain from going inactive, and that happens to be where 
 the functional clock originates from, also.
 
 We have a partial facility to handle this type of bug in the existing 
 code.  Could you please try the patch enclosed at the bottom of this 
 E-mail and see if it helps?

Yes, that patch fixes my problem too - the HDQ keeps working.

(it's a bit smoke-and-mirrors though .. I want fclk to stay on, so let's make
sure iclk doesn't autoidle, because we *know* they have the same source :-)


Thanks,
NeilBrown



signature.asc
Description: PGP signature

Re: OMAP HDQ: was Re: DSS2/PM on 3.2 broken?

2012-01-27 Thread NeilBrown

On Thu, 26 Jan 2012 07:19:19 -0700 (MST) Paul Walmsley p...@pwsan.com wrote:

 On Tue, 24 Jan 2012, NeilBrown wrote:
 
  On Sat, 21 Jan 2012 17:07:18 -0700 (MST) Paul Walmsley p...@pwsan.com 
  wrote:
  
   On Wed, 18 Jan 2012, NeilBrown wrote:
   
Oh - another thing.
Sometimes during early boot I get:

[0.158447] omap_hwmod: usbtll_fck: missing clockdomain for 
usbtll_fck.
[0.176879] omap_hwmod: hdq: softreset failed (waited 1 usec)
   
   This latter bug just got fixed, it's updated in the new HDQ series that I 
   sent out:
   
   http://marc.info/?l=linux-omapm=132719073710874w=2
   
   BTW, if you could give that a spin for us at some point, it would be 
   greatly appreciated; I don't think I have a board with a 1-wire device on 
   it (at least, not that I know of).
  
  Yes, the patch series seems to work OK - at least it doesn't break anything:
  I can still access my battery charge meter.
 
 Thanks.  Should I add a Tested-by: from you on those patches?

Well... I only tested them after backporting to 3.2, and Linus is a big
proponent of the idea that once you rebase patches you invalidate any
testing
However I don't think there difference between 3.2 and 3.3-rc1 is big enough
in the area, so yes:

   Tested-by: NeilBrown ne...@suse.de

Thanks.


 
  It doesn't fix the problem I am having where HDQ gets confused when CPUIDLE
  kicks in, but I think I might have a handle on that now.
  
  Unlike other modules, HDQ doesn't have SIDLEMODE.
  According to 18.4.5 (AM/DM37x Multimedia Device Silicon Revision) it acts as
  though SIDLEMODE were set to SIDLE_FORCE so if the clock-domain trys to
  switch off, HDQ lets it.
 
 Hmm.  Do you know if the problem occurs in the middle of the HDQ 
 transaction, or when HDQ is completely idle?  I'd suspect the former.

I cannot tell.  Each transaction involves receiving two interrupts - one to
say the write completed and one to say the read completed.
I don't recall seeing any transaction which got the write-completion
interrupt but not the read-completion, but I haven't watched closely enough
to be sure.

 
 Here's a theory: perhaps the MPU powerdomain is hitting a low-power state 
 while waiting for an HDQ interrupt.  When the MPU powerdomain is in a low 
 power state, so is the MPU interrupt controller, so the only way that the 
 MPU can wake up is if the HDQ can issue a wakeup event to the PRCM.  And I 
 don't see any evidence that the HDQ is capable of doing this, based on the 
 HDQ sections of the TRM.  What a huge energy waste, if true.

In the config I am testing MPU only goes to RET, never OFF. Same for CORE.

 
 Maybe try something like the following patch -- compile-tested only here.
 
 If this works, you might want to try dropping this patch and using the pad 
 mux to set a wakeup event on the 1-wire pad when the signal goes low.  
 That might be a more power-efficient approach.  You may still have to use 
 some PM QoS request there to ensure that the HDQ can wake up fast enough 
 to see the pulse, but the constraint shouldn't need to be as ludicrously 
 low as it is in the following patch.

Doesn't work - crashes :-(

pm_qos_power_init is defined as a late_initcall, so you cannot call
pm_qos_update_request until after all the initcalls have run.
But with your patch, the probe of the bq27000 triggers a read of the registers
which tries to update_request - and it goes 'bang'.

I really think the problem is the CORE pwrdm gating a clock because no module
says it needs it - i.e. nothing to do with MPU at all.

We want to keep CORE active when an HDQ transaction is happening, but MPU is
welcome to go to sleep.  I don't think you can express that with 'qos'.  I
think it needs some omap-specific machinery.

I can 'fix' the problem simply by making sure

pwrdm_for_each_clkdm(core_pd, _cpuidle_deny_idle);

runs in omap3_enter_idle whenever HDQ is active.

One of the reasons that I think it is a clock problem rather than just
missing a wakeup event is that once the problem starts happening I cannot
recovery without rebooting.
i.e. even if I tell the UARTs to keep the clocks on permanently and keep the
CPUIDLE state at 0, the HDQ doesn't start working again.  It has clearly
become confused.
The HDQ doco makes a point of saying that you shouldn't issue any commands
(except 'enable clock') when the clock is disabled.  I think we end up doing
that and it gets confused and cannot recover.

I note that there is an ad-hoc dependency between the camera and various
power states as well.  Maybe we need a little bit of infrastructure so that
camera can say Keep CORE and MPU on (or whatever it needs) and HDQ can say
Keep CORE on.  ???

Thanks,
NeilBrown


 
 
 - Paul
 
 commit 92fa561191b60ddf1296e02a6206160e8a5718f0
 Author: Paul Walmsley p...@pwsan.com
 Date:   Thu Jan 26 07:04:40 2012 -0700
 
 Attempt to work around lack of HDQ wakeup support
 
 diff --git a/drivers/w1/masters/omap_hdq.c

Re: OMAP HDQ: was Re: DSS2/PM on 3.2 broken?

2012-01-27 Thread Paul Walmsley

On Sat, 28 Jan 2012, NeilBrown wrote:

 On Thu, 26 Jan 2012 07:19:19 -0700 (MST) Paul Walmsley p...@pwsan.com wrote:

  Thanks.  Should I add a Tested-by: from you on those patches?
 
 Well... I only tested them after backporting to 3.2, and Linus is a big
 proponent of the idea that once you rebase patches you invalidate any
 testing
 However I don't think there difference between 3.2 and 3.3-rc1 is big enough
 in the area, so yes:
 
Tested-by: NeilBrown ne...@suse.de

Okay, thanks.

  Here's a theory: perhaps the MPU powerdomain is hitting a low-power state 
  while waiting for an HDQ interrupt.  When the MPU powerdomain is in a low 
  power state, so is the MPU interrupt controller, so the only way that the 
  MPU can wake up is if the HDQ can issue a wakeup event to the PRCM.  And I 
  don't see any evidence that the HDQ is capable of doing this, based on the 
  HDQ sections of the TRM.  What a huge energy waste, if true.
 
 In the config I am testing MPU only goes to RET, never OFF. Same for CORE.

That qualifies as a low-power state for the MPU :-)

In MPU RET, at least in theory, the MPU INTC will not be operational, and 
thus unable to respond to interrupts.

  Maybe try something like the following patch -- compile-tested only here.
  
  If this works, you might want to try dropping this patch and using the pad 
  mux to set a wakeup event on the 1-wire pad when the signal goes low.  
  That might be a more power-efficient approach.  You may still have to use 
  some PM QoS request there to ensure that the HDQ can wake up fast enough 
  to see the pulse, but the constraint shouldn't need to be as ludicrously 
  low as it is in the following patch.
 
 Doesn't work - crashes :-(
 
 pm_qos_power_init is defined as a late_initcall, so you cannot call
 pm_qos_update_request until after all the initcalls have run.
 But with your patch, the probe of the bq27000 triggers a read of the registers
 which tries to update_request - and it goes 'bang'.

Hmm, okay.  Could you please remove the register read from the HDQ probe 
and see if that makes a difference?

 I really think the problem is the CORE pwrdm gating a clock because no module
 says it needs it - i.e. nothing to do with MPU at all.

Until pm_runtime_put*() is called, the usecount of hdq_fck will still be 
non-zero.  So the CORE shouldn't be able to gate it or hdq_ick at that 
time, and thus should not be able to enter idle. Hence the question about 
where the problems occur: whether they occur in the middle of the 
transaction or when the HDQ clocks are disabled.

 We want to keep CORE active when an HDQ transaction is happening, but MPU is
 welcome to go to sleep.  I don't think you can express that with 'qos'.  I
 think it needs some omap-specific machinery.

The OMAP PRCM hardware should keep the CORE* clkdms active when the 
hdq_fck is enabled.  So it's possible there could be a PRCM silicon bug 
that doesn't take hdq_fck into account when determining whether the CORE_* 
clkdms are inactive.

 I can 'fix' the problem simply by making sure
 
   pwrdm_for_each_clkdm(core_pd, _cpuidle_deny_idle);
 
 runs in omap3_enter_idle whenever HDQ is active.

Hmm, that does suggest that it's not wakeup related.

 One of the reasons that I think it is a clock problem rather than just 
 missing a wakeup event is that once the problem starts happening I 
 cannot recovery without rebooting. i.e. even if I tell the UARTs to keep 
 the clocks on permanently and keep the CPUIDLE state at 0, the HDQ 
 doesn't start working again.  It has clearly become confused. The HDQ 
 doco makes a point of saying that you shouldn't issue any commands 
 (except 'enable clock') when the clock is disabled.  I think we end up 
 doing that and it gets confused and cannot recover.
 
 I note that there is an ad-hoc dependency between the camera and various 
 power states as well.  Maybe we need a little bit of infrastructure so 
 that camera can say Keep CORE and MPU on (or whatever it needs) and 
 HDQ can say Keep CORE on.  ???

I'm not familiar with the camera problems, but in the HDQ case, this 
should only be needed if a silicon bug exists.  Which is certainly 
possible; we've seen this problem with one other IP block in the past. 
Based on a quick glance at the errata, I don't see anything related to the 
HDQ, but that doesn't really mean anything.

In any case, we should be able to work around this via the hwmod layer and 
a special flag, if the problem really is a PRCM bug.  This will depend on 
the functional powerstate conversion.  

Thanks for the detailed test reports.

- Paul
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: OMAP HDQ: was Re: DSS2/PM on 3.2 broken?

2012-01-27 Thread NeilBrown

On Fri, 27 Jan 2012 15:58:40 -0700 (MST) Paul Walmsley p...@pwsan.com wrote:

 On Sat, 28 Jan 2012, NeilBrown wrote:

   Here's a theory: perhaps the MPU powerdomain is hitting a low-power state 
   while waiting for an HDQ interrupt.  When the MPU powerdomain is in a low 
   power state, so is the MPU interrupt controller, so the only way that the 
   MPU can wake up is if the HDQ can issue a wakeup event to the PRCM.  And 
   I 
   don't see any evidence that the HDQ is capable of doing this, based on 
   the 
   HDQ sections of the TRM.  What a huge energy waste, if true.
  
  In the config I am testing MPU only goes to RET, never OFF. Same for CORE.
 
 That qualifies as a low-power state for the MPU :-)
 
 In MPU RET, at least in theory, the MPU INTC will not be operational, and 
 thus unable to respond to interrupts.
 
   Maybe try something like the following patch -- compile-tested only here.
   
   If this works, you might want to try dropping this patch and using the 
   pad 
   mux to set a wakeup event on the 1-wire pad when the signal goes low.  
   That might be a more power-efficient approach.  You may still have to use 
   some PM QoS request there to ensure that the HDQ can wake up fast enough 
   to see the pulse, but the constraint shouldn't need to be as ludicrously 
   low as it is in the following patch.
  
  Doesn't work - crashes :-(
  
  pm_qos_power_init is defined as a late_initcall, so you cannot call
  pm_qos_update_request until after all the initcalls have run.
  But with your patch, the probe of the bq27000 triggers a read of the 
  registers
  which tries to update_request - and it goes 'bang'.
 
 Hmm, okay.  Could you please remove the register read from the HDQ probe 
 and see if that makes a difference?

Sorry, I misdiagnosed.  The problem was that the pm_qos_request structure
needs to be zeroed before pm_qos_add_request() or it complains, doesn't
initialise it, and future calls can crash.

So with 'kzalloc' in place of 'kmalloc' where hdq_data is allocated, the qos
patch does make my problem go away.  It feels a bit heavy-handed though.

 
  I really think the problem is the CORE pwrdm gating a clock because no 
  module
  says it needs it - i.e. nothing to do with MPU at all.
 
 Until pm_runtime_put*() is called, the usecount of hdq_fck will still be 
 non-zero.  So the CORE shouldn't be able to gate it or hdq_ick at that 
 time, and thus should not be able to enter idle. Hence the question about 
 where the problems occur: whether they occur in the middle of the 
 transaction or when the HDQ clocks are disabled.
 
  We want to keep CORE active when an HDQ transaction is happening, but MPU is
  welcome to go to sleep.  I don't think you can express that with 'qos'.  I
  think it needs some omap-specific machinery.
 
 The OMAP PRCM hardware should keep the CORE* clkdms active when the 
 hdq_fck is enabled.  So it's possible there could be a PRCM silicon bug 
 that doesn't take hdq_fck into account when determining whether the CORE_* 
 clkdms are inactive.

That isn't the way I understand the TRM and the code.  I admit there is a lot
that I don't understand, but I've really drilled down in this area and my
understanding aligns with what I see experimentally, so

clkdm_allow_idle(struct clockdomain *clkdm) tells the hardware to allow that
clkdm to go idle, and it does so with no reference to whether any clk in the
clkdm is current enabled.  This seems to be called conditional only on which
IDLE state is being entered. Any state below C1 is entered with
clkdm_allow_idle in force.

When the hardware has been told that it can idle a clockdomain, it will
handshake with each module that uses the clockdomain.  i.e. it doesn't check
if the clock is enabled, it asks the module directly.

The module answers according to the SIDLE (slave idle) setting. This can be
set to
 - always say no, don't idle, I'm busy
 - always say yes, force me idle, I don't care
 - 'smartidle' where the response is based on the internal state of the
   module.

A good example of smartidle is the UART.  If it hasn't seen a character in or
out for about 10 (I think) character times it will say yes, idle the clock,
else it will say no.   When RX then goes low it will handshake back to say
Hey, I'm busy all of a sudden, can I have the clock back please which it
then gets in time to count off the start bit and then read the rest of the
incoming byte.(*)

However HDQ doesn't have SIDLE.  So the PRCM doesn't handshake with it.  It
just assumes that HDQ will say yes, force me idle, I don't care.

The diversion about UARTs is relevant because when the UARTs have SIDLE set
to 'no' I don't see the problem (they keep the clock domain active).  It is
only when the UARTs set their SIDLE to smartidle that the problem appears.

So: if the HDQ driver wants the clkdm to stay on it has to explicitly arrange
it in software by appropriate clkdm_deny_idle() calls.

From the System Power Management and Wakeup section of the

Re: OMAP HDQ: was Re: DSS2/PM on 3.2 broken?

2012-01-27 Thread Paul Walmsley

On Sat, 28 Jan 2012, NeilBrown wrote:

 On Fri, 27 Jan 2012 15:58:40 -0700 (MST) Paul Walmsley p...@pwsan.com wrote:
 
  Hmm, okay.  Could you please remove the register read from the HDQ probe 
  and see if that makes a difference?
 
 So with 'kzalloc' in place of 'kmalloc' where hdq_data is allocated, the qos
 patch does make my problem go away.  It feels a bit heavy-handed though.

Thanks for the test.  It's not clear to me how this driver could really 
work well if the MPU powerdomain is able to go to a low power state 
between the two phases of the transaction.  There's no wakeup source, so 
the interrupt won't be handled until another device wakes the system up.  
Of course, if you have a lot of wakeups coming from other devices, or if 
something is forcing the MPU powerdomain to stay ON or INACTIVE, this may 
not really matter.

So you might want that patch, at least on 3.3.  But from your following 
text, it sounds like there is another problem.

  The OMAP PRCM hardware should keep the CORE* clkdms active when the 
  hdq_fck is enabled.  So it's possible there could be a PRCM silicon bug 
  that doesn't take hdq_fck into account when determining whether the CORE_* 
  clkdms are inactive.

...

 clkdm_allow_idle(struct clockdomain *clkdm) tells the hardware to allow that
 clkdm to go idle, and it does so with no reference to whether any clk in the
 clkdm is current enabled.

Since the HDQ module doesn't support the idle protocol, the target clock 
FSM in the CM is what should determine whether the module is considered 
idle or not.  And as long as the bit in the CM_FCLKEN register 
corresponding to HDQ_FCLK is set, the FSM should consider the module as 
active, and the clockdomain should not be allowed to go inactive.

I write should.  Given that big caution at the bottom of section 20.4.5 
System Power Management and Wakeup, this appears to have not been 
correctly implemented.

 It also mentions the AUTO_HDQ bit of PRCM.CM_AUTOIDLE1_CORE.  This had me
 confused for a while, but I think it only affects the iclk.  

That's correct.

 I don't think the iclk is the problem.  It is the fclk that is 
 disappearing because the clkdm that feeds it is being turned off.

Disabling AUTO_HDQ is probably a reasonable workaround.  It would prevent 
the CORE_L4 clockdomain from going inactive, and that happens to be where 
the functional clock originates from, also.

We have a partial facility to handle this type of bug in the existing 
code.  Could you please try the patch enclosed at the bottom of this 
E-mail and see if it helps?

 My idea of a fix would be a secondary reference count on each clkdm.
 When a clk that doesn't have an associated SIDLE setting is activated we
 increase this reference count (and decrease it when the clk is deactivated).
 
 While the count is non-zero we deny_idle for that clkdm.
 
 Or something like that.

I think we can work around this case with the OCPIF_SWSUP_IDLE flag, since 
the functional clock and interface clock are in the same clockdomain.

 Thanks for listening :-)

Thanks for working through this bug and helping test :-)

 (*) Another note about UARTs.
   In 3.2 (haven't checked 3.3) the driver not only sets SMARTIDLE but also
   stops the clocks when things are idle.
   So when RX goes low and the UART wakes us up it doesn't get a clock
   straight away but we have to wait for code to enable the clock.
   In my experiments, this is fast enough to stay in-sync for speeds up to
   38400 (I think).  Above there we miss the first bit and get corrupt
   characters.
   I think that if the UART is expecting data it should not disable the clock
   but should use SMARTIDLE to let it turn off, and then turn back on again
   quickly but maybe 3.3 gets that right.

With the most recent UART patch set that was posted against 3.3-rc1, as 
long as the CORE powerdomain isn't in a low-power state, the UART can 
enter idle with clocks cut and still wake up quickly enough to handle 
incoming data at 115200 8n1 on the 35xx BeagleBoard.  That all happens in 
hardware; no kernel involvement is needed.  I don't see the corrupt 
characters that you do, but there may be a 35xx vs. 37xx difference here.  
(You're using an OMAP37xx?)


- Paul

From 16b9f4ee44f96846fad5a9cb0e55bcd25b77ed59 Mon Sep 17 00:00:00 2001
From: Paul Walmsley p...@pwsan.com
Date: Fri, 27 Jan 2012 22:46:34 -0700
Subject: [PATCH] Try software-supervised control of HDQ_ICLK

 Do not automatically enable interface clock autoidle on
 HDQ_ICLK. Require the hwmod layer to use software gating on
 HDQ_ICLK. This is because the CM FSM that handles HDQ_FCLK
 appears to not be working as expected.  This patch is
 experimental, untested, and for bug-testing purposes only.

---
 arch/arm/mach-omap2/clock3xxx_data.c   |1 +
 arch/arm/mach-omap2/omap_hwmod_3xxx_data.c |2 +-
 arch/arm/plat-omap/clock.c |4 +++-
 arch/arm/plat-omap/include/plat/clock.h|1 +
 4 files changed, 6 insertions(+), 2

Re: OMAP HDQ: was Re: DSS2/PM on 3.2 broken?

2012-01-26 Thread Paul Walmsley

On Tue, 24 Jan 2012, NeilBrown wrote:

 On Sat, 21 Jan 2012 17:07:18 -0700 (MST) Paul Walmsley p...@pwsan.com wrote:
 
  On Wed, 18 Jan 2012, NeilBrown wrote:
  
   Oh - another thing.
   Sometimes during early boot I get:
   
   [0.158447] omap_hwmod: usbtll_fck: missing clockdomain for usbtll_fck.
   [0.176879] omap_hwmod: hdq: softreset failed (waited 1 usec)
  
  This latter bug just got fixed, it's updated in the new HDQ series that I 
  sent out:
  
  http://marc.info/?l=linux-omapm=132719073710874w=2
  
  BTW, if you could give that a spin for us at some point, it would be 
  greatly appreciated; I don't think I have a board with a 1-wire device on 
  it (at least, not that I know of).
 
 Yes, the patch series seems to work OK - at least it doesn't break anything:
 I can still access my battery charge meter.

Thanks.  Should I add a Tested-by: from you on those patches?

 It doesn't fix the problem I am having where HDQ gets confused when CPUIDLE
 kicks in, but I think I might have a handle on that now.
 
 Unlike other modules, HDQ doesn't have SIDLEMODE.
 According to 18.4.5 (AM/DM37x Multimedia Device Silicon Revision) it acts as
 though SIDLEMODE were set to SIDLE_FORCE so if the clock-domain trys to
 switch off, HDQ lets it.

Hmm.  Do you know if the problem occurs in the middle of the HDQ 
transaction, or when HDQ is completely idle?  I'd suspect the former.

Here's a theory: perhaps the MPU powerdomain is hitting a low-power state 
while waiting for an HDQ interrupt.  When the MPU powerdomain is in a low 
power state, so is the MPU interrupt controller, so the only way that the 
MPU can wake up is if the HDQ can issue a wakeup event to the PRCM.  And I 
don't see any evidence that the HDQ is capable of doing this, based on the 
HDQ sections of the TRM.  What a huge energy waste, if true.

Maybe try something like the following patch -- compile-tested only here.

If this works, you might want to try dropping this patch and using the pad 
mux to set a wakeup event on the 1-wire pad when the signal goes low.  
That might be a more power-efficient approach.  You may still have to use 
some PM QoS request there to ensure that the HDQ can wake up fast enough 
to see the pulse, but the constraint shouldn't need to be as ludicrously 
low as it is in the following patch.


- Paul

commit 92fa561191b60ddf1296e02a6206160e8a5718f0
Author: Paul Walmsley p...@pwsan.com
Date:   Thu Jan 26 07:04:40 2012 -0700

Attempt to work around lack of HDQ wakeup support

diff --git a/drivers/w1/masters/omap_hdq.c b/drivers/w1/masters/omap_hdq.c
index 63e3eda..990e38b 100644
--- a/drivers/w1/masters/omap_hdq.c
+++ b/drivers/w1/masters/omap_hdq.c
@@ -17,6 +17,7 @@
 #include linux/io.h
 #include linux/sched.h
 #include linux/pm_runtime.h
+#include linux/pm_qos.h
 
 #include asm/irq.h
 #include mach/hardware.h
@@ -61,6 +62,8 @@ struct hdq_data {
/* lock status update */
struct  mutex   hdq_mutex;
int hdq_usecount;
+   struct pm_qos_request   pm_qos_request;
+
u8  hdq_irqstatus;
/* device lock */
spinlock_t  hdq_spinlock;
@@ -409,6 +412,18 @@ static int omap_hdq_get(struct hdq_data *hdq_data)
goto rtn;
}
 
+   /*
+* XXX Horrible hack to keep the MPU powerdomain out of a
+* low-power state during HDQ transfers, since the HDQ doesn't
+* appear to be able to wake the system.  This constraint is
+* way too short - it will probably keep the MPU out of WFI
+* which is a total waste - but we don't have a
+* platform-independent way to control powerdomain states at
+* the moment and there is no obvious replacement for platform_data
+* fn ptrs AFAIK
+*/
+   pm_qos_update_request(hdq_data-pm_qos_request, 1);
+
if (OMAP_HDQ_MAX_USER == hdq_data-hdq_usecount) {
dev_dbg(hdq_data-dev, attempt to exceed the max use count);
ret = -EINVAL;
@@ -464,6 +479,10 @@ static int omap_hdq_put(struct hdq_data *hdq_data)
if (0 == hdq_data-hdq_usecount)
pm_runtime_put_sync(hdq_data-dev);
}
+
+   pm_qos_update_request(hdq_data-pm_qos_request,
+ PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE);
+
mutex_unlock(hdq_data-hdq_mutex);
 
return ret;
@@ -578,6 +597,10 @@ static int __devinit omap_hdq_probe(struct platform_device 
*pdev)
hdq_data-hdq_usecount = 0;
mutex_init(hdq_data-hdq_mutex);
 
+   pm_qos_add_request(hdq_data-pm_qos_request,
+  PM_QOS_CPU_DMA_LATENCY,
+  PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE);
+
pm_runtime_enable(pdev-dev);
pm_runtime_get_sync(pdev-dev);
 
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at

OMAP HDQ: was Re: DSS2/PM on 3.2 broken?

2012-01-24 Thread NeilBrown

On Sat, 21 Jan 2012 17:07:18 -0700 (MST) Paul Walmsley p...@pwsan.com wrote:

 On Wed, 18 Jan 2012, NeilBrown wrote:
 
  Oh - another thing.
  Sometimes during early boot I get:
  
  [0.158447] omap_hwmod: usbtll_fck: missing clockdomain for usbtll_fck.
  [0.176879] omap_hwmod: hdq: softreset failed (waited 1 usec)
 
 This latter bug just got fixed, it's updated in the new HDQ series that I 
 sent out:
 
 http://marc.info/?l=linux-omapm=132719073710874w=2
 
 BTW, if you could give that a spin for us at some point, it would be 
 greatly appreciated; I don't think I have a board with a 1-wire device on 
 it (at least, not that I know of).

Yes, the patch series seems to work OK - at least it doesn't break anything:
I can still access my battery charge meter.

It doesn't fix the problem I am having where HDQ gets confused when CPUIDLE
kicks in, but I think I might have a handle on that now.

Unlike other modules, HDQ doesn't have SIDLEMODE.
According to 18.4.5 (AM/DM37x Multimedia Device Silicon Revision) it acts as
though SIDLEMODE were set to SIDLE_FORCE so if the clock-domain trys to
switch off, HDQ lets it.

Normally UART has SIDLEMODE set to SIDLE_NO so that keeps the necessary clock
domain active.  But if I let the UARTs sleep, SIDLEMODE changes to
SIDLE_SMART which can allow the clockdomain to be stopped, which affects HDQ.

However I'm not sure how PRCM.CM_AUTOIDLE1_CORE fits into this. if AUTO_HDQ
is clear - which it is by default and I cannot see code that changes it -
then the clock for HDQ should be independent of  something.

I definitely see behaviour that looks like the HDQ clock stopping
when the UART allows SIDLE_SMART or when I explicitly deny idling of the
clockdomains.

Should there be special handling of HDQ because it doesn't have SIDLEMODE??

Thanks,
NeilBrown


signature.asc
Description: PGP signature

Re: DSS2/PM on 3.2 broken?

2012-01-24 Thread NeilBrown

On Mon, 23 Jan 2012 14:11:16 -0800 Kevin Hilman khil...@ti.com wrote:

 NeilBrown ne...@suse.de writes:
 
  On Thu, 19 Jan 2012 16:22:37 -0800 Kevin Hilman khil...@ti.com wrote:
 
  NeilBrown ne...@suse.de writes:
  
   On Thu, 19 Jan 2012 11:37:39 -0800 Kevin Hilman khil...@ti.com wrote:
  
   Joe Woodward j...@terrafix.co.uk writes:
 
   
 
 [...]
 
  At least this part is expected.  
  
  In the kernel you're using the UART clocks are enabled/disabled during
  the idle path depending on the low-power state being targetted, so would
  expect to see lots of UART clock gating going on.
 
  Even though uarts 1,2,4 aren't even in use?
 
 Yes.  Our UART idle management before v3.3 was, um... a disaster.
 
  Maybe this is fixed in 3.3?
 
 As of v3.3, the UARTs are managed independently using runtime PM
 autosupend, so only UART that are in use should be coming on and back
 off again.
 
 However, as Paul has recently posted some UART fixes, you'll see that we
 have a few kinks to work out in the UART driver as well.

Good news ... I'll probably hold out until -rc2 before I start experimenting
with 3.3 much.

I think I understand why I was getting single-character corruption when
waking the UART while you just lost the character - I suspect you had
off_mode enabled.  That makes the difference for me.
Hopefully we can get rid of both the corruption and the character loss, but
I'll wait until I've tried 3.3 before I pursue that much more.

Thanks,
NeilBrown


signature.asc
Description: PGP signature

Re: DSS2/PM on 3.2 broken?

2012-01-23 Thread Kevin Hilman

NeilBrown ne...@suse.de writes:

 On Thu, 19 Jan 2012 16:22:37 -0800 Kevin Hilman khil...@ti.com wrote:

 NeilBrown ne...@suse.de writes:
 
  On Thu, 19 Jan 2012 11:37:39 -0800 Kevin Hilman khil...@ti.com wrote:
 
  Joe Woodward j...@terrafix.co.uk writes:

  

[...]

 At least this part is expected.  
 
 In the kernel you're using the UART clocks are enabled/disabled during
 the idle path depending on the low-power state being targetted, so would
 expect to see lots of UART clock gating going on.

 Even though uarts 1,2,4 aren't even in use?

Yes.  Our UART idle management before v3.3 was, um... a disaster.

 Maybe this is fixed in 3.3?

As of v3.3, the UARTs are managed independently using runtime PM
autosupend, so only UART that are in use should be coming on and back
off again.

However, as Paul has recently posted some UART fixes, you'll see that we
have a few kinks to work out in the UART driver as well.

Kevin
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-22 Thread NeilBrown

On Sat, 21 Jan 2012 17:07:18 -0700 (MST) Paul Walmsley p...@pwsan.com wrote:

 On Wed, 18 Jan 2012, NeilBrown wrote:
 
  Oh - another thing.
  Sometimes during early boot I get:
  
  [0.158447] omap_hwmod: usbtll_fck: missing clockdomain for usbtll_fck.
  [0.176879] omap_hwmod: hdq: softreset failed (waited 1 usec)
 
 This latter bug just got fixed, it's updated in the new HDQ series that I 
 sent out:
 
 http://marc.info/?l=linux-omapm=132719073710874w=2
 
 BTW, if you could give that a spin for us at some point, it would be 
 greatly appreciated; I don't think I have a board with a 1-wire device on 
 it (at least, not that I know of).

I'll try them against 3.2 tomorrow.  I'll be a while longer before I move up
to 3.3-rc but when I do I'll try to test there too.

Thanks,
NeilBrown


signature.asc
Description: PGP signature

Re: DSS2/PM on 3.2 broken?

2012-01-21 Thread NeilBrown

On Thu, 19 Jan 2012 16:22:37 -0800 Kevin Hilman khil...@ti.com wrote:

 NeilBrown ne...@suse.de writes:
 
  On Thu, 19 Jan 2012 11:37:39 -0800 Kevin Hilman khil...@ti.com wrote:
 
  Joe Woodward j...@terrafix.co.uk writes:
  
  [...]
  
   If I do (either from the console or via a button press on the screen)
   then I never get a SYNC_LOST.
  
 echo 0  /sys/devices/omapdss/display0/enabled
 echo 1  /sys/devices/omapdss/display0/enabled
  
   Just trying to think of some ideas that may be affecting the DSS...
 - Could it to be to do with the GPIO being used as a wake source (i.e. 
   does the GPIO driver do runtime_pm properly?)?
 - Could it to be to do with the UART as it seems to fix itself 
   whenever a character is pressed?
 - Could it to be to do with the ordering in which drivers are resumed?
  
  Here's my guess/hunch as to why the UART wakeup helps and GPIO doesn't.
  
  The UART's are idled using timeouts, so after any activity (including a
  wakeup) the UART timeout will not alow the UARTs to idle (and thus the
  system to hit low power states) for a given timeout period.
 
  While I agree with that, I'm wondering if there might be something else odd
  relating to the UARTs that we haven't spotted yet.
 
 I suspect the problem for both is constraints that aren't stated
 explicitly.
 
  I'm chasing done the different problem of HDQ not coping with CPUIDLE but I
  think it is very likely to have a similar root cause, as HDQ goes wrong at
  the same time that DSS goes wrong, and is a vaguely similar way.
 
  I just tried compiling omap_hwmod.c with #define DEBUG and got some 
  strange
  results.
 
 Strange indeed.
 
 I assume you are you using the runtime PM converted driver from Paul?  I
 tried that branch on my 3530/Overo (console on UART3) and didn't see any
 nul's on my UART3 console.  

I found out what is causing this.  I'm open to suggestions for fixing it.

I turned on debugging in clock.c and found that just before the nuls start
appearing, I'm told
clock: dpll4_ck: disabling in hardware
and just before it starts working again I see
clock: dpll4_ck: enabling in hardware

The uart3 fck depends on dpll4_ck, but this early it boot, uart3 hasn't been
initialised (I use earlyprintk to see this at all).  So as soon as something
that depends on dpll4_ck goes to sleep and disables its clock, the uart3
clock can get turned off because the dependency isn't known yet.

I think it is mmc3 turning off that initially turns dpll4_ck off.
Then when HDQ wants hdq_fck which depends on dpll4_ck, we get the uart clock
back for a while.
Then when uart2 gets enabled, uart3 comes back on, and soon after uart3 gets
enabled, which keeps it on.

So it seems we need some way to avoid turning off any clocks until all
dependencies have been registered.  At least when earlyprintk is being used.

Would that make sense?

Unfortunately understanding this issue doesn't seem to give any insight into
the other issues that seemed to be related - ah well...


 
 [...]
 
  The other thing I discovered is that when I set the uart 'sleep_timeout' to 
  5
  (all uarts) so that CPUIDLE can let CORE enter the lower power states, I get
  a fairly steady stream of:
 
  [ 1168.490478] omap_hwmod: uart1: enabling
  [ 1168.490509] omap_hwmod: uart1: enabling clocks
  [ 1168.490539] omap_hwmod: uart2: enabling
  [ 1168.490539] omap_hwmod: uart2: enabling clocks
  [ 1168.490631] omap_hwmod: uart3: enabling
  [ 1168.490661] omap_hwmod: uart3: enabling clocks
  [ 1168.490661] omap_hwmod: uart4: enabling
  [ 1168.490692] omap_hwmod: uart4: enabling clocks
  [ 1168.578796] omap_hwmod: uart3: idling
  [ 1168.578796] omap_hwmod: uart3: disabling clocks
  [ 1168.578826] omap_hwmod: uart4: idling
  [ 1168.578826] omap_hwmod: uart4: disabling clocks
  [ 1168.578857] omap_hwmod: uart1: idling
  [ 1168.578857] omap_hwmod: uart1: disabling clocks
  [ 1168.578887] omap_hwmod: uart2: idling
  [ 1168.578887] omap_hwmod: uart2: disabling clocks
  [ 1168.714263] omap_hwmod: uart1: enabling
  [ 1168.714294] omap_hwmod: uart1: enabling clocks
  [ 1168.714324] omap_hwmod: uart2: enabling
  [ 1168.714324] omap_hwmod: uart2: enabling clocks
  [ 1168.714416] omap_hwmod: uart3: enabling
  [ 1168.714447] omap_hwmod: uart3: enabling clocks
  [ 1168.714447] omap_hwmod: uart4: enabling
  [ 1168.714477] omap_hwmod: uart4: enabling clocks
  [ 1168.803344] omap_hwmod: uart3: idling
  [ 1168.803344] omap_hwmod: uart3: disabling clocks
  [ 1168.803375] omap_hwmod: uart4: idling
  [ 1168.803375] omap_hwmod: uart4: disabling clocks
  [ 1168.851257] omap_hwmod: uart3: enabling
  [ 1168.851287] omap_hwmod: uart3: enabling clocks
  [ 1168.851318] omap_hwmod: uart4: enabling
  [ 1168.851318] omap_hwmod: uart4: enabling clocks
  [ 1168.885955] omap_hwmod: uart3: idling
  [ 1168.885986] omap_hwmod: uart3: disabling clocks
  [ 1168.885986] omap_hwmod: uart4: idling
  [ 1168.886016] omap_hwmod: uart4: disabling clocks
  [ 1168.886383]

Re: DSS2/PM on 3.2 broken?

2012-01-21 Thread Paul Walmsley

On Wed, 18 Jan 2012, NeilBrown wrote:

 Oh - another thing.
 Sometimes during early boot I get:
 
 [0.158447] omap_hwmod: usbtll_fck: missing clockdomain for usbtll_fck.
 [0.176879] omap_hwmod: hdq: softreset failed (waited 1 usec)

This latter bug just got fixed, it's updated in the new HDQ series that I 
sent out:

http://marc.info/?l=linux-omapm=132719073710874w=2

BTW, if you could give that a spin for us at some point, it would be 
greatly appreciated; I don't think I have a board with a 1-wire device on 
it (at least, not that I know of).

 At the same time the UART seem to hiccup and my USB-Serial port loses track
 and doesn't see any more characters until I close and re-open.
 
 Could that be related?  Any idea what it means?

It shouldn't be related.


- Paul
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-20 Thread Kevin Hilman

Tomi Valkeinen tomi.valkei...@ti.com writes:

 On Thu, 2012-01-19 at 11:24 -0800, Kevin Hilman wrote:
 Tomi Valkeinen tomi.valkei...@ti.com writes:

  Now, you already said using pm_runtime_put_sync version is the correct
  way when suspending. But to use that I need to either always use
  pm_runtime_put_sync, or add an extra boolean which marks that we're
  suspending, and pass that around, or make it a DSS global variable.
 
 I'm not sure why can't you use the sync version just in the suspend
 callback?

 To do that the suspend callback should be the one that disables the
 device and calls pm_runtime_suspend. With DSS that's not the case, it's
 the panel drivers that are in charge of enabling/disabling DSS (by
 calling appropriate functions in omapdss).

Ah, OK.  makes sense.

 The only thing that the omapdss suspend callback does is to call the
 normal disable functions on the display drivers. This leads to calls to
 disable-functions on the dss hwmod drivers, and then pm_runtime_put
 calls. But at the point the pm_runtime funcs are called, the code has no
 idea that we're actually doing system suspend. Thus I'd need to pass
 that information somehow, probably with a global variable.

 And while that's not difficult to do, it sure feels a bit ugly.

 I think I'll just change the pm_runtime_put calls to sync versions for
 now. I imagine the perf impact with the change should be negligible.
 I'll return to this issue after the devtree adaptation has been made, as
 it changes the child-parent relations of the dss related devices, and
 perhaps managing the PM states will also get a bit easier.

Sounds reasonable.

Kevin
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-19 Thread Joe Woodward

...snip...
 
cat /sys/kernel/debug/omapdss/clk
 
 is below and reports 66461538 for fck, so 66MHz?  Still safe for OPP50.
 
 And disabling SMART REFLEX had no obvious effect.
 
 If you can think of anything else I could try to explore to narrow down
 the source of this, I am very happy to test or examine anything you
 suggest.
 
 Thanks,
 NeilBrown
 

I'm not sure if this will give you any pointers but i'll tell you what I'm 
fairly certain I'm seeing...

If I run the stock 3.2 with the changes to dss.c and dispc.c to make the 
pm_runtime_put()'s in to pm_runtime_put_sync()'s then the DSS warnings when 
suspending do indeed go away.

I have two wake sources, one being the UART the console is on and the other 
being a GPIO button.

I've tested the following situations:
 - If I sleep using the console and wake by typing a character then I *never* 
get a SYNC_LOST error.
 - If I sleep by pressing a button on the display (which does a system (echo 
mem  /sys/power/state) and then wake by typing a character in to the console 
then 
I *never* get a SYNC_LOST error.
 - If I sleep by pressing a button on the display (which does a system (echo 
mem  /sys/power/state) and then wake by pressing a button attached to the 
GPIO 
then I get a sequence of SYNC_LOST errors which continue at a rate of about one 
every 0.5 seconds until a type a character in to the console and then 
everything 
settles down and starts working again.

So, at least from what I'm seeing, the SYNC_LOST errors seem related somehow to 
the UART, if that is indeed possible?

Cheers,
Joe


--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-19 Thread Tomi Valkeinen

On Thu, 2012-01-19 at 10:17 +, Joe Woodward wrote:
 ...snip...
  
 cat /sys/kernel/debug/omapdss/clk
  
  is below and reports 66461538 for fck, so 66MHz?  Still safe for OPP50.
  
  And disabling SMART REFLEX had no obvious effect.
  
  If you can think of anything else I could try to explore to narrow down
  the source of this, I am very happy to test or examine anything you
  suggest.
  
  Thanks,
  NeilBrown
  
 
 I'm not sure if this will give you any pointers but i'll tell you what I'm 
 fairly certain I'm seeing...
 
 If I run the stock 3.2 with the changes to dss.c and dispc.c to make the 
 pm_runtime_put()'s in to pm_runtime_put_sync()'s then the DSS warnings when 
 suspending do indeed go away.
 
 I have two wake sources, one being the UART the console is on and the other 
 being a GPIO button.
 
 I've tested the following situations:
  - If I sleep using the console and wake by typing a character then I *never* 
 get a SYNC_LOST error.
  - If I sleep by pressing a button on the display (which does a system (echo 
 mem  /sys/power/state) and then wake by typing a character in to the 
 console then 
 I *never* get a SYNC_LOST error.
  - If I sleep by pressing a button on the display (which does a system (echo 
 mem  /sys/power/state) and then wake by pressing a button attached to the 
 GPIO 
 then I get a sequence of SYNC_LOST errors which continue at a rate of about 
 one every 0.5 seconds until a type a character in to the console and then 
 everything 
 settles down and starts working again.
 
 So, at least from what I'm seeing, the SYNC_LOST errors seem related somehow 
 to the UART, if that is indeed possible?

Well, I don't see how UART could directly affect DSS. The thing that
comes to my mind is that typing a char in the console causes a change in
the power management, which then fixes the DSS. But then again, I'd
expect the console to go into sleep/idle state after a short while,
which should again cause sync_losts. But that's not happening...

Can you show the dss clocks you use (cat debugfs/omapdss/clk)?

It's been a while since I did anything with PM, so I don't remember how
and if it can be done, but a simple test would be to lock the OMAP to
always use OPP100.

 Tomi



signature.asc
Description: This is a digitally signed message part

Re: DSS2/PM on 3.2 broken?

2012-01-19 Thread Joe Woodward

... (apologies for awful formatting of the previous mail) ...
 
 Well, I don't see how UART could directly affect DSS. The thing that
 comes to my mind is that typing a char in the console causes a change
 in
 the power management, which then fixes the DSS. But then again, I'd
 expect the console to go into sleep/idle state after a short while,
 which should again cause sync_losts. But that's not happening...
 
 Can you show the dss clocks you use (cat debugfs/omapdss/clk)?
 
 It's been a while since I did anything with PM, so I don't remember how
 and if it can be done, but a simple test would be to lock the OMAP to
 always use OPP100.
 
  Tomi
 

Right,

I'm running with both CPUIDLE and CPUFREQ disabled at present so I assume that 
means I should always be in OPP100?

The panel is 800x480 (I've added a configuration to the generic-panel with the 
correct panel parameters), with just 1 FB device (/dev/fb0).

Clock dump is shown below:

# pwd
/debug/omapdss
# cat clk
[   53.146057] omapdss DSS: dss_runtime_get
[   53.151000] omapdss DSS: dss_runtime_put
[   53.155151] omapdss DISPC: dispc_runtime_get
[   53.160430] omapdss DISPC: dispc_runtime_put
- DSS -
dpll4_ck 43200
DSS_FCK (DSS1_ALWON_FCLK) = 43200 / 12 * 2 = 7200
- DISPC -
dispc fclk source = DSS_FCK (DSS1_ALWON_FCLK)
fck 7200
- LCD1 -
lcd1_clk source = DSS_FCK (DSS1_ALWON_FCLK)
lck 7200lck div 1
pck 3600pck div 2


The SYNC_LOST errors sometimes don't happen, and sometimes fix themselves. It 
does seem that whenever a character is typed in to the UART the SYNC_LOST 
errors stop immediately.

This is a question more for Kevin, but is there some debug/tracing I can enable 
within the PM framework to see a bit more of whats going on?

I've added some more traces below...

Trace with no errors:
[   91.806427] PM: Syncing filesystems ... done.
[   91.811370] PM: Preparing system for mem sleep
[   91.816986] Freezing user space processes ... (elapsed 0.02 seconds) done.
[   91.846221] Freezing remaining freezable tasks ... (elapsed 0.02 seconds) 
done.
[   91.877441] PM: Entering mem sleep
[   91.881317] Suspending console(s) (use no_console_suspend to debug)
[   92.001068] omapdss CORE: suspend 2
[   92.184600] omapdss DISPC: dispc_runtime_put
[   92.184631] omapdss DISPC: dispc_save_context
[   92.184692] omapdss DISPC: context saved, ctx_loss_count 0
[   92.184722] omapdss DSS: dss_runtime_put
[   92.184753] omapdss DSS: dss_runtime_put
[   92.184783] omapdss DSS: dss_save_context
[   92.184783] omapdss DSS: context saved
[   92.185485] PM: suspend of devices complete after 297.088 msecs
[   92.186401] PM: late suspend of devices complete after 0.885 msecs
[   94.111755] Successfully put all powerdomains to target state
[   94.112518] PM: early resume of devices complete after 0.549 msecs
[   94.113555] omapdss CORE: resume
[   94.114135] omapdss DSS: dss_runtime_get
[   94.114166] omapdss DSS: dss_restore_context
[   94.114196] omapdss DSS: context restored
[   94.114196] omapdss DISPC: dispc_runtime_get
[   94.114227] omapdss DSS: dss_runtime_get
[   94.114257] omapdss DISPC: dispc_restore_context
[   94.114288] omapdss DISPC: onoff 0 rf 0 ieo 0 ipc 0 ihs 1 ivs 1 acbi 0 acb 0
[   94.114318] omapdss DSS: dispc clock info found from cache.
[   94.114318] omapdss DSS: dpll4_m4 = 43200
[   94.114349] omapdss DSS: fck = 7200 (12)
[   94.114379] omapdss DISPC: lck = 7200 (1)
[   94.114379] omapdss DISPC: pck = 3600 (2)
[   94.114410] omapdss DISPC: channel 0 xres 800 yres 480
[   94.114440] omapdss DISPC: pck 36000
[   94.114440] omapdss DISPC: hsw 10 hfp 164 hbp 89 vsw 10 vfp 10 vbp 23
[   94.114471] omapdss DISPC: hsync 33866Hz, vsync 64Hz
[   94.698455] PM: resume of devices complete after 585.082 msecs
[   94.847900] PM: Finishing wakeup.
[   94.851409] Restarting tasks ... done.

Trace with SYNC_LOST errors:
[  106.635437] PM: Syncing filesystems ... done.
[  106.640563] PM: Preparing system for mem sleep
[  106.646118] Freezing user space processes ... (elapsed 0.02 seconds) done.
[  106.677703] Freezing remaining freezable tasks ... (elapsed 0.02 seconds) 
done.
[  106.708923] PM: Entering mem sleep
[  106.712768] Suspending console(s) (use no_console_suspend to debug)
[  106.832672] omapdss CORE: suspend 2
[  107.011260] omapdss DISPC: dispc_runtime_put
[  107.011291] omapdss DISPC: dispc_save_context
[  107.011352] omapdss DISPC: context saved, ctx_loss_count 0
[  107.011383] omapdss DSS: dss_runtime_put
[  107.011413] omapdss DSS: dss_runtime_put
[  107.011444] omapdss DSS: dss_save_context
[  107.011444] omapdss DSS: context saved
[  107.012145] PM: suspend of devices complete after 292.266 msecs
[  107.013061] PM: late suspend of devices complete after 0.885 msecs
[  107.716400] Successfully put all powerdomains to target state
[  107.717102] PM: early resume of devices complete after 0.488 msecs
[  107.718139] omapdss CORE: resume
[  107.718780]

Re: DSS2/PM on 3.2 broken?

2012-01-19 Thread Tomi Valkeinen

On Thu, 2012-01-19 at 11:29 +, Joe Woodward wrote:
 ... (apologies for awful formatting of the previous mail) ...
  
  Well, I don't see how UART could directly affect DSS. The thing that
  comes to my mind is that typing a char in the console causes a change
  in
  the power management, which then fixes the DSS. But then again, I'd
  expect the console to go into sleep/idle state after a short while,
  which should again cause sync_losts. But that's not happening...
  
  Can you show the dss clocks you use (cat debugfs/omapdss/clk)?
  
  It's been a while since I did anything with PM, so I don't remember how
  and if it can be done, but a simple test would be to lock the OMAP to
  always use OPP100.
  
   Tomi
  
 
 Right,
 
 I'm running with both CPUIDLE and CPUFREQ disabled at present so I assume 
 that means I should always be in OPP100?
 
 The panel is 800x480 (I've added a configuration to the generic-panel with 
 the correct panel parameters), with just 1 FB device (/dev/fb0).
 
 Clock dump is shown below:
 
 # pwd
 /debug/omapdss
 # cat clk
 [   53.146057] omapdss DSS: dss_runtime_get
 [   53.151000] omapdss DSS: dss_runtime_put
 [   53.155151] omapdss DISPC: dispc_runtime_get
 [   53.160430] omapdss DISPC: dispc_runtime_put
 - DSS -
 dpll4_ck 43200
 DSS_FCK (DSS1_ALWON_FCLK) = 43200 / 12 * 2 = 7200
 - DISPC -
 dispc fclk source = DSS_FCK (DSS1_ALWON_FCLK)
 fck 7200
 - LCD1 -
 lcd1_clk source = DSS_FCK (DSS1_ALWON_FCLK)
 lck 7200lck div 1
 pck 3600pck div 2
 
 
 The SYNC_LOST errors sometimes don't happen, and sometimes fix themselves. It 
 does seem that whenever a character is typed in to the UART the SYNC_LOST 
 errors stop immediately.

Yep, the clocks are so low that they should work fine with OPP50 also...

 Tomi


--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-19 Thread Joe Woodward

-Original Message-
From: Tomi Valkeinen tomi.valkei...@ti.com
To: Joe Woodward j...@terrafix.co.uk
Cc: NeilBrown ne...@suse.de, Paul Walmsley p...@pwsan.com, khil...@ti.com, 
t-kri...@ti.com, govindra...@ti.com, linux-omap@vger.kernel.org
Date: Thu, 19 Jan 2012 13:36:59 +0200
Subject: Re: DSS2/PM on 3.2 broken?

 On Thu, 2012-01-19 at 11:29 +, Joe Woodward wrote:
  ... (apologies for awful formatting of the previous mail) ...

   Well, I don't see how UART could directly affect DSS. The thing
 that
   comes to my mind is that typing a char in the console causes a
 change
   in
   the power management, which then fixes the DSS. But then again,
 I'd
   expect the console to go into sleep/idle state after a short while,
   which should again cause sync_losts. But that's not happening...

   Can you show the dss clocks you use (cat debugfs/omapdss/clk)?

   It's been a while since I did anything with PM, so I don't remember
 how
   and if it can be done, but a simple test would be to lock the OMAP
 to
   always use OPP100.

Tomi

  Right,

  I'm running with both CPUIDLE and CPUFREQ disabled at present so I
 assume that means I should always be in OPP100?

  The panel is 800x480 (I've added a configuration to the generic-panel
 with the correct panel parameters), with just 1 FB device (/dev/fb0).

  Clock dump is shown below:

  # pwd
  /debug/omapdss
  # cat clk
  [   53.146057] omapdss DSS: dss_runtime_get
  [   53.151000] omapdss DSS: dss_runtime_put
  [   53.155151] omapdss DISPC: dispc_runtime_get
  [   53.160430] omapdss DISPC: dispc_runtime_put
  - DSS -
  dpll4_ck 43200
  DSS_FCK (DSS1_ALWON_FCLK) = 43200 / 12 * 2 = 7200
  - DISPC -
  dispc fclk source = DSS_FCK (DSS1_ALWON_FCLK)
  fck 7200
  - LCD1 -
  lcd1_clk source = DSS_FCK (DSS1_ALWON_FCLK)
  lck 7200lck div 1
  pck 3600pck div 2

  The SYNC_LOST errors sometimes don't happen, and sometimes fix
 themselves. It does seem that whenever a character is typed in to the
 UART the SYNC_LOST errors stop immediately.

 Yep, the clocks are so low that they should work fine with OPP50
 also...

If I do (either from the console or via a button press on the screen) then I 
never get a SYNC_LOST.
  echo 0  /sys/devices/omapdss/display0/enabled
  echo 1  /sys/devices/omapdss/display0/enabled

Just trying to think of some ideas that may be affecting the DSS...
  - Could it to be to do with the GPIO being used as a wake source (i.e. does 
the GPIO driver do runtime_pm properly?)?
  - Could it to be to do with the UART as it seems to fix itself whenever a 
character is pressed?
  - Could it to be to do with the ordering in which drivers are resumed?

Is the SYNC_LOST normally due to a lack of memory bandwidth? If so, it is 
possible to find out what the kernel is doing during the resume?

And before looking at this too much more, is the changing of the pm_runtime_put 
to the _sync versions the correct fix?

Sorry for so many questions, but I'm interested in getting this fixed as it's 
the only thing stopping me from switching to 3.2 from 3.0!

Cheers,
Joe

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-19 Thread Tomi Valkeinen

On Thu, 2012-01-19 at 12:21 +, Joe Woodward wrote:

 If I do (either from the console or via a button press on the screen) then I 
 never get a SYNC_LOST.
   echo 0  /sys/devices/omapdss/display0/enabled
   echo 1  /sys/devices/omapdss/display0/enabled
 
 Just trying to think of some ideas that may be affecting the DSS...
   - Could it to be to do with the GPIO being used as a wake source (i.e. does 
 the GPIO driver do runtime_pm properly?)?
   - Could it to be to do with the UART as it seems to fix itself whenever a 
 character is pressed?
   - Could it to be to do with the ordering in which drivers are resumed?

Well, none of these sound probable to me. I don't see any connection
with GPIOs or UART as such with DSS (I mean something that could cause
sync losts).

The only thing that I can see affecting DSS via GPIO/UART is indirectly
via PM, with voltages or clocks. But if DSS responds in some way, I
presume the voltages are ok. And your clocks should be low enough to
work with lower OPPs also. So I'm quite out of ideas.

Of course there could be a problem in the omapdss when it returns its
registers. But it wouldn't explain why the synclosts end if you press a
key in the console.

 Is the SYNC_LOST normally due to a lack of memory bandwidth? If so, it is 
 possible to find out what the kernel is doing during the resume?

No. That should cause fifo underflows. I don't know the exact reasons
for sync_lost, but I think it generally means that the DSS sub-modules
lose sync with each other. Mostly this is due to clock config, but can
be cause also by other (wrong) configuration which makes a DSS
sub-module behave somehow wrong (or halt totally).

 And before looking at this too much more, is the changing of the 
 pm_runtime_put to the _sync versions the correct fix?

Well. I think it's a correct fix, in functional sense. However, I would
like to use non-sync versions normally, but that's just for performance
optimization.

 Sorry for so many questions, but I'm interested in getting this fixed as it's 
 the only thing stopping me from switching to 3.2 from 3.0!

The only idea I have currently is to add/enable debug prints which show
information about PM changing its states, so we could see if something
actually changes at the moment you press a key in the console.

What kind of setup did you have again? I wonder if I could reproduce it
easily with overo/beagle (it was omap3, wasn't it?)?

 Tomi



signature.asc
Description: This is a digitally signed message part

Re: DSS2/PM on 3.2 broken?

2012-01-19 Thread Kevin Hilman

Tomi Valkeinen tomi.valkei...@ti.com writes:

 On Fri, 2012-01-13 at 11:30 -0800, Kevin Hilman wrote:
 Tomi Valkeinen tomi.valkei...@ti.com writes:

  pm_runtime_put() is called inside omapdss driver's .suspend callback. So
  I guess that means the system suspend has started. However, the logs
  show that the runtime_suspend callback _is_ being called before the
  system suspend is finished, so the workqueue can't be frozen...
 
 ...or there are other ways that the runtime_suspend callback is called.
 
 In order to ensure that devices are properly idled for system-wide
 suspend, The OMAP PM domain layer will call the drivers
 -runtime_suspend callback during late suspend (using the PM domain's
 _noirq callbacks.)
 
 This is there so that even when runtime PM has been disabled (via
 userspace, or pm_runtime_forbid() calls) the driver can still be
 properly idled before suspend.
 
 In your case, I suspect you're seeing the driver's -runtime_suspend
 callback called during late suspend, not by the PM workqueue.

 Ok, that explains the invocation of the callbacks.

 I don't understand DSS enough to make sense of the logs you sent, so not
 sure how to be of more help.

 Well, I think I can summarize the situation:

 Normally, when the user wants to turn a display off, the panel drivers
 call DSS functions to disable the corresponding output, leading to
 pm_runtime_put() calls.

 In system suspend case, the suspend callback initiates the turning off
 of the displays, which again leads to pm_runtime_put() calls.

 Now, you already said using pm_runtime_put_sync version is the correct
 way when suspending. But to use that I need to either always use
 pm_runtime_put_sync, or add an extra boolean which marks that we're
 suspending, and pass that around, or make it a DSS global variable.

I'm not sure why can't you use the sync version just in the suspend
callback?

 Both of those options sound a bit ugly to me. So my two questions are,

 1) Is there something funny with the DSS architecture to have a problem
 like this? I tried to grep around, but I didn't see any other driver
 doing something like that (but I could've missed it).

 2) why can't the PM framework handle this, by directing the
 pm_runtime_put calls to the _sync version when suspending. Somehow it
 feels strange to me that the driver needs to care about that.

This has been a big debate with the runtime PM framework in general.
Some of us have argued that system PM and runtime PM should be more
unified, and that runtime PM calls should work normally from system
suspend.  After all, system suspend could be considered simply a forced
idle/quiecient state state where all the runtime callbacks kick in to
idle the system.  However those of us arguing for that lost the battle
with the PM core maintainers, and system PM and runtime PM have to be
managed separately.

I've worked around that by using the PM domain level calls to ensure
runtime PM callbacks are called on system suspend, but the side effect
is the complications you've run into.

So for now, if you use runtime PM in your system PM you have to use the
sync versions.  For most drivers I've dealt with, that means just using
sync version in the suspend callback is required, but maybe the
interaction of the various DSS subblocks means you need the sync version
everywhere.

Kevin



--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-19 Thread Kevin Hilman

Joe Woodward j...@terrafix.co.uk writes:

[...]

 If I do (either from the console or via a button press on the screen)
 then I never get a SYNC_LOST.

   echo 0  /sys/devices/omapdss/display0/enabled
   echo 1  /sys/devices/omapdss/display0/enabled

 Just trying to think of some ideas that may be affecting the DSS...
   - Could it to be to do with the GPIO being used as a wake source (i.e. does 
 the GPIO driver do runtime_pm properly?)?
   - Could it to be to do with the UART as it seems to fix itself whenever a 
 character is pressed?
   - Could it to be to do with the ordering in which drivers are resumed?

Here's my guess/hunch as to why the UART wakeup helps and GPIO doesn't.

The UART's are idled using timeouts, so after any activity (including a
wakeup) the UART timeout will not alow the UARTs to idle (and thus the
system to hit low power states) for a given timeout period.

I don't know what DSS does on wakeup, but presumably it has a burst of
activity to do right away.  Because of the UART timeout, the DSS
probably completes its burst of activity before the system ever idles
to a low power state.

With a GPIO wakeup, the system can (re)idle and possibly hit low power
states immediately after the wakeup.

If that is the case, that suggests that the DSS probably needs a
constraint someplace to prevent this from happening, at least during
this burst period.

Kevin
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-19 Thread Kevin Hilman

NeilBrown ne...@suse.de writes:

 On Thu, 19 Jan 2012 11:37:39 -0800 Kevin Hilman khil...@ti.com wrote:

 Joe Woodward j...@terrafix.co.uk writes:
 
 [...]
 
  If I do (either from the console or via a button press on the screen)
  then I never get a SYNC_LOST.
 
echo 0  /sys/devices/omapdss/display0/enabled
echo 1  /sys/devices/omapdss/display0/enabled
 
  Just trying to think of some ideas that may be affecting the DSS...
- Could it to be to do with the GPIO being used as a wake source (i.e. 
  does the GPIO driver do runtime_pm properly?)?
- Could it to be to do with the UART as it seems to fix itself whenever 
  a character is pressed?
- Could it to be to do with the ordering in which drivers are resumed?
 
 Here's my guess/hunch as to why the UART wakeup helps and GPIO doesn't.
 
 The UART's are idled using timeouts, so after any activity (including a
 wakeup) the UART timeout will not alow the UARTs to idle (and thus the
 system to hit low power states) for a given timeout period.

 While I agree with that, I'm wondering if there might be something else odd
 relating to the UARTs that we haven't spotted yet.

I suspect the problem for both is constraints that aren't stated
explicitly.

 I'm chasing done the different problem of HDQ not coping with CPUIDLE but I
 think it is very likely to have a similar root cause, as HDQ goes wrong at
 the same time that DSS goes wrong, and is a vaguely similar way.

 I just tried compiling omap_hwmod.c with #define DEBUG and got some strange
 results.

Strange indeed.

I assume you are you using the runtime PM converted driver from Paul?  I
tried that branch on my 3530/Overo (console on UART3) and didn't see any
nul's on my UART3 console.  

[...]

 The other thing I discovered is that when I set the uart 'sleep_timeout' to 5
 (all uarts) so that CPUIDLE can let CORE enter the lower power states, I get
 a fairly steady stream of:

 [ 1168.490478] omap_hwmod: uart1: enabling
 [ 1168.490509] omap_hwmod: uart1: enabling clocks
 [ 1168.490539] omap_hwmod: uart2: enabling
 [ 1168.490539] omap_hwmod: uart2: enabling clocks
 [ 1168.490631] omap_hwmod: uart3: enabling
 [ 1168.490661] omap_hwmod: uart3: enabling clocks
 [ 1168.490661] omap_hwmod: uart4: enabling
 [ 1168.490692] omap_hwmod: uart4: enabling clocks
 [ 1168.578796] omap_hwmod: uart3: idling
 [ 1168.578796] omap_hwmod: uart3: disabling clocks
 [ 1168.578826] omap_hwmod: uart4: idling
 [ 1168.578826] omap_hwmod: uart4: disabling clocks
 [ 1168.578857] omap_hwmod: uart1: idling
 [ 1168.578857] omap_hwmod: uart1: disabling clocks
 [ 1168.578887] omap_hwmod: uart2: idling
 [ 1168.578887] omap_hwmod: uart2: disabling clocks
 [ 1168.714263] omap_hwmod: uart1: enabling
 [ 1168.714294] omap_hwmod: uart1: enabling clocks
 [ 1168.714324] omap_hwmod: uart2: enabling
 [ 1168.714324] omap_hwmod: uart2: enabling clocks
 [ 1168.714416] omap_hwmod: uart3: enabling
 [ 1168.714447] omap_hwmod: uart3: enabling clocks
 [ 1168.714447] omap_hwmod: uart4: enabling
 [ 1168.714477] omap_hwmod: uart4: enabling clocks
 [ 1168.803344] omap_hwmod: uart3: idling
 [ 1168.803344] omap_hwmod: uart3: disabling clocks
 [ 1168.803375] omap_hwmod: uart4: idling
 [ 1168.803375] omap_hwmod: uart4: disabling clocks
 [ 1168.851257] omap_hwmod: uart3: enabling
 [ 1168.851287] omap_hwmod: uart3: enabling clocks
 [ 1168.851318] omap_hwmod: uart4: enabling
 [ 1168.851318] omap_hwmod: uart4: enabling clocks
 [ 1168.885955] omap_hwmod: uart3: idling
 [ 1168.885986] omap_hwmod: uart3: disabling clocks
 [ 1168.885986] omap_hwmod: uart4: idling
 [ 1168.886016] omap_hwmod: uart4: disabling clocks
 [ 1168.886383] omap_hwmod: uart3: enabling
 [ 1168.886383] omap_hwmod: uart3: enabling clocks
 [ 1168.886383] omap_hwmod: uart4: enabling
 [ 1168.886413] omap_hwmod: uart4: enabling clocks
 [ 1168.944885] omap_hwmod: uart3: idling
 [ 1168.944915] omap_hwmod: uart3: disabling clocks

At least this part is expected.  

In the kernel you're using the UART clocks are enabled/disabled during
the idle path depending on the low-power state being targetted, so would
expect to see lots of UART clock gating going on.

Kevin

 It seems to mostly be uart3 and uart4.
 If there is any interaction between the uart clock and the hdq clock, this
 could explain why the HDQ gets confused when all this is happening.

 And if there is a connection there, then there could also be a connection
 with a DSS clock which could be confusing it.

 I admit this is largely surmise and hypothesis - but there is definitely
 something odd here!

 Thanks,
 NeilBrown
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-19 Thread Tomi Valkeinen

On Thu, 2012-01-19 at 11:24 -0800, Kevin Hilman wrote:
 Tomi Valkeinen tomi.valkei...@ti.com writes:

  Now, you already said using pm_runtime_put_sync version is the correct
  way when suspending. But to use that I need to either always use
  pm_runtime_put_sync, or add an extra boolean which marks that we're
  suspending, and pass that around, or make it a DSS global variable.
 
 I'm not sure why can't you use the sync version just in the suspend
 callback?

To do that the suspend callback should be the one that disables the
device and calls pm_runtime_suspend. With DSS that's not the case, it's
the panel drivers that are in charge of enabling/disabling DSS (by
calling appropriate functions in omapdss).

The only thing that the omapdss suspend callback does is to call the
normal disable functions on the display drivers. This leads to calls to
disable-functions on the dss hwmod drivers, and then pm_runtime_put
calls. But at the point the pm_runtime funcs are called, the code has no
idea that we're actually doing system suspend. Thus I'd need to pass
that information somehow, probably with a global variable.

And while that's not difficult to do, it sure feels a bit ugly.

I think I'll just change the pm_runtime_put calls to sync versions for
now. I imagine the perf impact with the change should be negligible.
I'll return to this issue after the devtree adaptation has been made, as
it changes the child-parent relations of the dss related devices, and
perhaps managing the PM states will also get a bit easier.

 Tomi



signature.asc
Description: This is a digitally signed message part

Re: DSS2/PM on 3.2 broken?

2012-01-18 Thread NeilBrown

On Wed, 18 Jan 2012 09:13:59 +0200 Tomi Valkeinen tomi.valkei...@ti.com
wrote:

 On Fri, 2012-01-13 at 22:20 +1100, NeilBrown wrote:
  Having CPUIDLE makes the DSS2 problem worse: lots of 
  
  [   21.085113] omapdss DISPC error: SYNC_LOST on channel lcd,
  restarting the output with video overlays disabled
  
  messages whenever the CPU isn't busy.
 
 I'm not sure if it is the case here, but DSS has restrictions about the
 max DSS clocks on different OPPs. For example, on OMAP4430 LCD clock
 maximum is 186MHz at OPP100, and 93MHz at OPP50. So it's a quite big
 drop, causing problems with all but the rather small displays.
 
 And the DSS driver doesn't have any support to handle this at the
 moment, as there isn't support in the PM framework to do this. I think
 the only way to handle this at the moment is for the DSS driver to set
 an arbitrarily high constraint on, say, mem throughput, and hope that it
 keeps the OMAP in the required OPP.
 
  Tomi
 

This LCD panel on this device sets:
   .pixel_clock = 22000,
in the struct omap_video_timings so I'm guessing that is 22MHz?

I'll try building without SMARTREFLEX and see if that makes a difference -
presumably that should keep it at OPP100 (??).

Thanks,
NeilBrown


signature.asc
Description: PGP signature

Re: DSS2/PM on 3.2 broken?

2012-01-18 Thread Tomi Valkeinen

On Wed, 2012-01-18 at 22:15 +1100, NeilBrown wrote:
 On Wed, 18 Jan 2012 09:13:59 +0200 Tomi Valkeinen tomi.valkei...@ti.com
 wrote:
 
  On Fri, 2012-01-13 at 22:20 +1100, NeilBrown wrote:
   Having CPUIDLE makes the DSS2 problem worse: lots of 
   
   [   21.085113] omapdss DISPC error: SYNC_LOST on channel lcd,
   restarting the output with video overlays disabled
   
   messages whenever the CPU isn't busy.
  
  I'm not sure if it is the case here, but DSS has restrictions about the
  max DSS clocks on different OPPs. For example, on OMAP4430 LCD clock
  maximum is 186MHz at OPP100, and 93MHz at OPP50. So it's a quite big
  drop, causing problems with all but the rather small displays.
  
  And the DSS driver doesn't have any support to handle this at the
  moment, as there isn't support in the PM framework to do this. I think
  the only way to handle this at the moment is for the DSS driver to set
  an arbitrarily high constraint on, say, mem throughput, and hope that it
  keeps the OMAP in the required OPP.
  
   Tomi
  
 
 This LCD panel on this device sets:
.pixel_clock   = 22000,
 in the struct omap_video_timings so I'm guessing that is 22MHz?

No, that's the pixel clock. There are probably limitations on the pix
clock also, but usually the problem is the functional clocks, which need
to be n x pck, where n depends on the needs for scaling.

 Tomi


--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-18 Thread NeilBrown

On Wed, 18 Jan 2012 13:42:20 +0200 Tomi Valkeinen tomi.valkei...@ti.com
wrote:

 On Wed, 2012-01-18 at 22:15 +1100, NeilBrown wrote:
  On Wed, 18 Jan 2012 09:13:59 +0200 Tomi Valkeinen tomi.valkei...@ti.com
  wrote:
  
   On Fri, 2012-01-13 at 22:20 +1100, NeilBrown wrote:
Having CPUIDLE makes the DSS2 problem worse: lots of 

[   21.085113] omapdss DISPC error: SYNC_LOST on channel lcd,
restarting the output with video overlays disabled

messages whenever the CPU isn't busy.
   
   I'm not sure if it is the case here, but DSS has restrictions about the
   max DSS clocks on different OPPs. For example, on OMAP4430 LCD clock
   maximum is 186MHz at OPP100, and 93MHz at OPP50. So it's a quite big
   drop, causing problems with all but the rather small displays.
   
   And the DSS driver doesn't have any support to handle this at the
   moment, as there isn't support in the PM framework to do this. I think
   the only way to handle this at the moment is for the DSS driver to set
   an arbitrarily high constraint on, say, mem throughput, and hope that it
   keeps the OMAP in the required OPP.
   
Tomi
   
  
  This LCD panel on this device sets:
 .pixel_clock = 22000,
  in the struct omap_video_timings so I'm guessing that is 22MHz?
 
 No, that's the pixel clock. There are probably limitations on the pix
 clock also, but usually the problem is the functional clocks, which need
 to be n x pck, where n depends on the needs for scaling.

Ahh..

   cat /sys/kernel/debug/omapdss/clk

is below and reports 66461538 for fck, so 66MHz?  Still safe for OPP50.

And disabling SMART REFLEX had no obvious effect.

If you can think of anything else I could try to explore to narrow down 
the source of this, I am very happy to test or examine anything you suggest.

Thanks,
NeilBrown



- DSS -
dpll4_ck 86400
DSS_FCK (DSS1_ALWON_FCLK) = 86400 / 13  = 66461538
- DISPC -
dispc fclk source = DSS_FCK (DSS1_ALWON_FCLK)
fck 66461538
- LCD1 -
lcd1_clk source = DSS_FCK (DSS1_ALWON_FCLK)
lck 66461538lck div 1
pck 22153846pck div 3
- DSI1 PLL -
dsi pll source = pclkfree
Fint0   regn 0
CLKIN4DDR   0   regm 0
DSS_FCK (DSS1_ALWON_FCLK)   0   regm_dispc 0(off)
DSS_FCK (DSS1_ALWON_FCLK)   0   regm_dsi 0  (off)
- DSI1 -
dsi fclk source = DSS_FCK (DSS1_ALWON_FCLK)
DSI_FCLK66461538
DDR_CLK 0
TxByteClkHS 0
LP_CLK  0




signature.asc
Description: PGP signature

Re: DSS2/PM on 3.2 broken?

2012-01-17 Thread NeilBrown

On Sat, 14 Jan 2012 10:09:15 +1100 NeilBrown ne...@suse.de wrote:

 On Fri, 13 Jan 2012 04:31:37 -0700 (MST) Paul Walmsley p...@pwsan.com wrote:
 
  On Fri, 13 Jan 2012, NeilBrown wrote:
  
   Also, the HDQ access to the battery 'fuel gauge' is working fine, so
   presumably that gets disturbed by one of the lower power states (3,4,5).
   I guess it is 4,5,6 when CORE goes to RET or OFF.  So we need some way for
   HDQ to say I'm busy just like UART can.  omap_hdq_can_sleep() ???
   There must be a cleaner way...
  
  The first thing to try is the HDQ runtime PM conversion series that was 
  sent separately.  Maybe let us know if that fixes the problem.
 
 Not completely - but I'm still exploring.
 


I've finished exploring (for now) and am seriously stumped.  Maybe you can
help... 

I modified your patch so that hdq uses autosuspend - because it seemed like a
good idea and ends up providing slightly more informative failure modes.

The normal behaviour of the bq27000 driver is to request lots of values in
quick succession.  So using autosuspend (with 100ms timeout) means that we
have only one pm_resume and one pm_suspend for each batch of requests which
at least reduces the noise.

I have added an omap_hdq_can_sleep() function which only succeeds if the hdq
device is run-time suspended, and call it in the same place that
omap_uart_can_sleep is called, to ensure the core never goes to a lower state
while he hdq is active.  I've double-checked that this really works.

HDQ all works find normally, but then normally CORE stays ON because the UART
keeps it on.

If I tell the UARTs they can sleep (by setting sleep_timeout) and then access
the bq27000 via HDQ it doesn't work properly.

The normal sequence is that it resets the HDQ and sends a 'BREAK' signal
(possibly not really necessary) for which we get a TIMEOUT interrupt.
Then 
 write address - get TX interrupt - wait a bit - get RX interrupt - read byte.

We normally get the TIMEOUT interrupt and often get the first TX and then RX
interrupts but then the device goes silent.  No more interrupts come in, and
the status register (which shows interrupt source) reads as zero.

Occasionally I have seen maybe half a dozen successful reads before it goes
silent, but usually it is zero or 1.

It is as though something is timing out and going to sleep.  A clock
auto-sleeping?  Seems unlikely.

If I set all the UART sleep_timeout values to 0 to keep CORE always ON, the
problem doesn't fix itself.  Whatever got turned off stays off.

Once it starts failing, it never works again until a reboot.

Could we be caching something in memory which we think is on, but due to CORE
going into RET (or maybe OFF), the thing is now off and we never bother to
reset it because the cached value is 'on' ??  I'm wondering about
_sysc_cache, but I haven't looked very deeply into it.

Any other ideas?

Oh - another thing.
Sometimes during early boot I get:

[0.158447] omap_hwmod: usbtll_fck: missing clockdomain for usbtll_fck.
[0.176879] omap_hwmod: hdq: softreset failed (waited 1 usec)

At the same time the UART seem to hiccup and my USB-Serial port loses track
and doesn't see any more characters until I close and re-open.

Could that be related?  Any idea what it means?


Thanks,
NeilBrown


signature.asc
Description: PGP signature

Re: DSS2/PM on 3.2 broken?

2012-01-17 Thread Tomi Valkeinen

On Fri, 2012-01-13 at 22:20 +1100, NeilBrown wrote:
 Having CPUIDLE makes the DSS2 problem worse: lots of 
 
 [   21.085113] omapdss DISPC error: SYNC_LOST on channel lcd,
 restarting the output with video overlays disabled
 
 messages whenever the CPU isn't busy.

I'm not sure if it is the case here, but DSS has restrictions about the
max DSS clocks on different OPPs. For example, on OMAP4430 LCD clock
maximum is 186MHz at OPP100, and 93MHz at OPP50. So it's a quite big
drop, causing problems with all but the rather small displays.

And the DSS driver doesn't have any support to handle this at the
moment, as there isn't support in the PM framework to do this. I think
the only way to handle this at the moment is for the DSS driver to set
an arbitrarily high constraint on, say, mem throughput, and hope that it
keeps the OMAP in the required OPP.

 Tomi



signature.asc
Description: This is a digitally signed message part

Re: DSS2/PM on 3.2 broken?

2012-01-13 Thread Paul Walmsley

cc Tero, Govindraj

On Thu, 12 Jan 2012, NeilBrown wrote:

 On Wed, 11 Jan 2012 06:43:04 -0700 (MST) Paul Walmsley p...@pwsan.com wrote:
 
 I spent some time exploring why cpuidle never drops below state 0 and found
 out that the code explicitly disables other states when uart is active - for
 a fairly broad definition of 'active'.
 
 I found the sleep_timeout setting and set them all to 1 second.  This meant
 that cpuidle started working, but I got a lot of garbage characters.
 

...

 Does this really mean CPU_IDLE cannot be used when you might expect chars
 from a serial port - e.g. GPS or Bluetooth?

Hmmm.  Looking at mach-omap2/pm34xx.c and mach-omap2/cpuidle34xx.c, it 
indeed prevents even the MPU from entering a low-power state when the UART 
is active, as you write.  That doesn't seem correct.

Please try the following patch.  It works fine for me on v3.2 with 
omap2plus_defconfig on a 35xx BeagleBoard.  No dropped or garbled 
characters with console use.  Not too surprising, since the UART has a 
receive FIFO to withstand interrupt servicing delays.


- Paul

From: Paul Walmsley p...@pwsan.com
Date: Fri, 13 Jan 2012 02:10:30 -0700
Subject: [PATCH] ARM: OMAP3: PM: allow MPU to enter low-power states even
 when the UART is active

For some reason, both the existing OMAP3 PM code and the OMAP3 CPUIdle
driver prevent the MPU powerdomain from entering low-power modes when
any UART isn't asleep.  Possibly it is intended to minimize the ARM
wakeup latency when UART activity arrives, but the UART has a FIFO
that should handle this for most cases, with no dropped characters.  I
may be forgetting something important, though.  And CORE/PER low-power
states are a different matter entirely.

Thanks to NeilBrown ne...@suse.de for reporting the problem.

Signed-off-by: Paul Walmsley p...@pwsan.com
Cc: NeilBrown ne...@suse.de
Cc: Joe Woodward j...@terrafix.co.uk
Cc: Tero Kristo t-kri...@ti.com
Cc: Kevin Hilman khil...@ti.com
Cc: Govindraj.R govindraj.r...@ti.com
---
 arch/arm/mach-omap2/cpuidle34xx.c |8 +++-
 arch/arm/mach-omap2/pm.h  |1 -
 arch/arm/mach-omap2/pm34xx.c  |   10 --
 3 files changed, 3 insertions(+), 16 deletions(-)

diff --git a/arch/arm/mach-omap2/cpuidle34xx.c 
b/arch/arm/mach-omap2/cpuidle34xx.c
index 942bb4f..4ef32d4 100644
--- a/arch/arm/mach-omap2/cpuidle34xx.c
+++ b/arch/arm/mach-omap2/cpuidle34xx.c
@@ -183,6 +183,9 @@ static int next_valid_state(struct cpuidle_device *dev,
core_deepest_state = PWRDM_POWER_OFF;
}
 
+   if (!omap_uart_can_sleep())
+   core_deepest_state = PWRDM_POWER_ON;
+
/* Check if current state is valid */
if ((cx-valid) 
(cx-mpu_state = mpu_deepest_state) 
@@ -244,11 +247,6 @@ static int omap3_enter_idle_bm(struct cpuidle_device *dev,
struct omap3_idle_statedata *cx;
int ret;
 
-   if (!omap3_can_sleep()) {
-   new_state_idx = drv-safe_state_index;
-   goto select_state;
-   }
-
/*
 * Prevent idle completely if CAM is active.
 * CAM does not have wakeup capability in OMAP3.
diff --git a/arch/arm/mach-omap2/pm.h b/arch/arm/mach-omap2/pm.h
index 4e166ad..eac6fce 100644
--- a/arch/arm/mach-omap2/pm.h
+++ b/arch/arm/mach-omap2/pm.h
@@ -18,7 +18,6 @@
 extern void *omap3_secure_ram_storage;
 extern void omap3_pm_off_mode_enable(int);
 extern void omap_sram_idle(void);
-extern int omap3_can_sleep(void);
 extern int omap_set_pwrdm_state(struct powerdomain *pwrdm, u32 state);
 extern int omap3_idle_init(void);
 
diff --git a/arch/arm/mach-omap2/pm34xx.c b/arch/arm/mach-omap2/pm34xx.c
index efa6649..1c49824 100644
--- a/arch/arm/mach-omap2/pm34xx.c
+++ b/arch/arm/mach-omap2/pm34xx.c
@@ -484,21 +484,11 @@ console_still_active:
clkdm_allow_idle(mpu_pwrdm-pwrdm_clkdms[0]);
 }
 
-int omap3_can_sleep(void)
-{
-   if (!omap_uart_can_sleep())
-   return 0;
-   return 1;
-}
-
 static void omap3_pm_idle(void)
 {
local_irq_disable();
local_fiq_disable();
 
-   if (!omap3_can_sleep())
-   goto out;
-
if (omap_irq_pending() || need_resched())
goto out;
 
-- 
1.7.8.3

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-13 Thread Paul Walmsley

On Thu, 12 Jan 2012, NeilBrown wrote:

 On Wed, 11 Jan 2012 06:43:04 -0700 (MST) Paul Walmsley p...@pwsan.com wrote:
 
 Not off-mode - just retention.

Okay.  It's expected that the UART will drop the first byte also when CORE 
is in retention, due to the delay involved in waking up the chip.  But 
that shouldn't prevent the MPU from going into a low-power state.  I've 
sent a patch under separate cover that should fix this.

  If the former, this is hardly surprising as the HDQ driver 
  (drivers/w1/masters/omap_hdq.c) doesn't contain any context save/restore 
  code.  In fact the HDQ driver doesn't even use PM runtime, so quite a bit 
  of work is needed there. 
 
 The HDQ driver does enable/disable the iclk and fclk.  I thought I read that
 having the clocks enabled would prevent the powerdomain from dropping to a
 lower-power state ??

Several years ago, we started handling all of the operations needed to 
quiesce and wake an IP block in a common integration layer called 'hwmod', 
which is currently called through the PM runtime functions.  This includes 
clk_{enable,disable}().  Unfortunately, due to some internal political 
reasons, not all of our drivers are converted to use PM runtime functions 
yet.  You just happened to hit one that isn't -- apparently very few 
people care very much about this driver...

 Could you suggest some other driver which does do the right sort of runtime
 PM that I could try copying?

I did a quick conversion for you.  The series is at 
git://git.pwsan.com/linux-2.6 in the branch named 'hdq_hwmod_runtime_pm'.  
Please let me know if it works.

I did boot-test it, and noticed that the hwmod subsystem was unable to 
softreset the device:

[0.182769] omap_hwmod: hdq: softreset failed (waited 1 usec)

This could be because I screwed up something in the hwmod data, or it 
could be that the HDQ reset hardware doesn't work correctly, similar to 
I2C.  If that presents a problem, you might add the HWMOD_INIT_NO_RESET 
flag to the struct omap_hwmod.flags field for the HDQ.

It also spit out these messages later, but I have no idea what they mean.  
At least the kernel seems to be able to access the HDQ registers:

[1.685272] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in 
Interrupt mode
[1.888153] w1_master_driver w1_bus_master1: Family 1 for 
01..3d is not registered.


- Paul
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-13 Thread NeilBrown

On Fri, 13 Jan 2012 03:05:03 -0700 (MST) Paul Walmsley p...@pwsan.com wrote:

 cc Tero, Govindraj
 
 On Thu, 12 Jan 2012, NeilBrown wrote:
 
  On Wed, 11 Jan 2012 06:43:04 -0700 (MST) Paul Walmsley p...@pwsan.com 
  wrote:
  
  I spent some time exploring why cpuidle never drops below state 0 and found
  out that the code explicitly disables other states when uart is active - for
  a fairly broad definition of 'active'.
  
  I found the sleep_timeout setting and set them all to 1 second.  This 
  meant
  that cpuidle started working, but I got a lot of garbage characters.
  
 
 ...
 
  Does this really mean CPU_IDLE cannot be used when you might expect chars
  from a serial port - e.g. GPS or Bluetooth?
 
 Hmmm.  Looking at mach-omap2/pm34xx.c and mach-omap2/cpuidle34xx.c, it 
 indeed prevents even the MPU from entering a low-power state when the UART 
 is active, as you write.  That doesn't seem correct.
 
 Please try the following patch.  It works fine for me on v3.2 with 
 omap2plus_defconfig on a 35xx BeagleBoard.  No dropped or garbled 
 characters with console use.  Not too surprising, since the UART has a 
 receive FIFO to withstand interrupt servicing delays.

Much nicer!!

I now see CPUIDLE using state1 and state2 as well as the normal state0.

I also see idle current drain drop from about 160mA to 95mA

And there are no garbled characters when I type.

Having CPUIDLE makes the DSS2 problem worse: lots of 

[   21.085113] omapdss DISPC error: SYNC_LOST on channel lcd, restarting the 
output with video overlays disabled

messages whenever the CPU isn't busy.

The only way to get rid of them that I have found is to blank the display:
# echo 1  /sys/devices/platform/omapfb/graphics/fb0/blank

Also, the HDQ access to the battery 'fuel gauge' is working fine, so
presumably that gets disturbed by one of the lower power states (3,4,5).
I guess it is 4,5,6 when CORE goes to RET or OFF.  So we need some way for
HDQ to say I'm busy just like UART can.  omap_hdq_can_sleep() ???
There must be a cleaner way...

More experiments:
 - I enabled off_mode and started seeing state3 happening.  UART and HDQ
   still fine - as expected.
 - I set the  /sys/devices/system/cpu/cpu0/cpuidle/state?/usage values
   all to '10'.
   Now things start to break;
   Console is mostly OK, but the first char after 10 seconds of idle is bad.
   I type 'enter' and get 'C'.  In general the first char typed is mapped to
   something else in a consistent way:
  0a - c3
  20 - 90
  55 - d5
  2a - 95
   It is a bit like we are missing a start bit, and a stop bit is shifting
   down into the msb .. sometimes.

   HDQ also breaks.
   That surprised me a bit as the HDQ access is immediately after typing so
   it should be in the 10-second timeout when the UART keeps CORE powered on.
   Maybe we need runtime_pm to run omap_hdq_break() again to reinitialise
   the HDQ port if CORE might have been turned off.

   Also saw states all the way down to 6.  Cannot tell if this helps current
   drain as I need HDQ working for that.

So this is real progress, but there is still room for improvement.  I might
try writing an omap_hdq_can_sleep() and use it like omap_uart_can_sleep().
And get runtime_pm working on HDQ so it turns off after a short delay, and
re-initialises if we have to turn it back on again.

Thanks for the patch, it's been:
   Tested-by: NeilBrown ne...@suse.de


Thanks,
NeilBrown


 
 
 - Paul
 
 From: Paul Walmsley p...@pwsan.com
 Date: Fri, 13 Jan 2012 02:10:30 -0700
 Subject: [PATCH] ARM: OMAP3: PM: allow MPU to enter low-power states even
  when the UART is active
 
 For some reason, both the existing OMAP3 PM code and the OMAP3 CPUIdle
 driver prevent the MPU powerdomain from entering low-power modes when
 any UART isn't asleep.  Possibly it is intended to minimize the ARM
 wakeup latency when UART activity arrives, but the UART has a FIFO
 that should handle this for most cases, with no dropped characters.  I
 may be forgetting something important, though.  And CORE/PER low-power
 states are a different matter entirely.
 
 Thanks to NeilBrown ne...@suse.de for reporting the problem.
 
 Signed-off-by: Paul Walmsley p...@pwsan.com
 Cc: NeilBrown ne...@suse.de
 Cc: Joe Woodward j...@terrafix.co.uk
 Cc: Tero Kristo t-kri...@ti.com
 Cc: Kevin Hilman khil...@ti.com
 Cc: Govindraj.R govindraj.r...@ti.com
 ---
  arch/arm/mach-omap2/cpuidle34xx.c |8 +++-
  arch/arm/mach-omap2/pm.h  |1 -
  arch/arm/mach-omap2/pm34xx.c  |   10 --
  3 files changed, 3 insertions(+), 16 deletions(-)
 
 diff --git a/arch/arm/mach-omap2/cpuidle34xx.c 
 b/arch/arm/mach-omap2/cpuidle34xx.c
 index 942bb4f..4ef32d4 100644
 --- a/arch/arm/mach-omap2/cpuidle34xx.c
 +++ b/arch/arm/mach-omap2/cpuidle34xx.c
 @@ -183,6 +183,9 @@ static int next_valid_state(struct cpuidle_device *dev,
   core_deepest_state = PWRDM_POWER_OFF;
   }
  
 + if (!omap_uart_can_sleep())
 +

Re: DSS2/PM on 3.2 broken?

2012-01-13 Thread Paul Walmsley

On Fri, 13 Jan 2012, NeilBrown wrote:

 Also, the HDQ access to the battery 'fuel gauge' is working fine, so
 presumably that gets disturbed by one of the lower power states (3,4,5).
 I guess it is 4,5,6 when CORE goes to RET or OFF.  So we need some way for
 HDQ to say I'm busy just like UART can.  omap_hdq_can_sleep() ???
 There must be a cleaner way...

The first thing to try is the HDQ runtime PM conversion series that was 
sent separately.  Maybe let us know if that fixes the problem.

 More experiments:
  - I enabled off_mode and started seeing state3 happening.  UART and HDQ
still fine - as expected.
  - I set the  /sys/devices/system/cpu/cpu0/cpuidle/state?/usage values
all to '10'.
Now things start to break;
Console is mostly OK, but the first char after 10 seconds of idle is bad.
I type 'enter' and get 'C'.  In general the first char typed is mapped to
something else in a consistent way:
   0a - c3
   20 - 90
   55 - d5
   2a - 95
It is a bit like we are missing a start bit, and a stop bit is shifting
down into the msb .. sometimes.

I'm not seeing garbling at all on the OMAP3 BeagleBoard here, running v3.2 
with omap2plus_defconfig.  The first character from the console gets lost, 
which again is expected when the CORE enters a low-power state and there's 
no flow control on the serial interface -- but no garbling.

HDQ also breaks.
That surprised me a bit as the HDQ access is immediately after typing so
it should be in the 10-second timeout when the UART keeps CORE powered on.
Maybe we need runtime_pm to run omap_hdq_break() again to reinitialise
the HDQ port if CORE might have been turned off.
 
Also saw states all the way down to 6.  Cannot tell if this helps current
drain as I need HDQ working for that.

If CORE is going off, the HDQ device context will be lost, since the HDQ 
controller will be powered off.  You'll need to add code to either save 
and restore the HDQ register context into the HDQ driver, or to 
reinitialize the HDQ when it is re-enabled.  The latter is what we do for 
I2C, I think.

As far as the DSS goes, your best bet there is to try to work it out with 
Archit and Tomi.

 Thanks for the patch, it's been:
Tested-by: NeilBrown ne...@suse.de

Thanks.


- Paul
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-13 Thread Govindraj

On Fri, Jan 13, 2012 at 3:35 PM, Paul Walmsley p...@pwsan.com wrote:
 cc Tero, Govindraj

 On Thu, 12 Jan 2012, NeilBrown wrote:

 On Wed, 11 Jan 2012 06:43:04 -0700 (MST) Paul Walmsley p...@pwsan.com 
 wrote:

 I spent some time exploring why cpuidle never drops below state 0 and found
 out that the code explicitly disables other states when uart is active - for
 a fairly broad definition of 'active'.

 I found the sleep_timeout setting and set them all to 1 second.  This meant
 that cpuidle started working, but I got a lot of garbage characters.


 ...

 Does this really mean CPU_IDLE cannot be used when you might expect chars
 from a serial port - e.g. GPS or Bluetooth?

 Hmmm.  Looking at mach-omap2/pm34xx.c and mach-omap2/cpuidle34xx.c, it
 indeed prevents even the MPU from entering a low-power state when the UART
 is active, as you write.  That doesn't seem correct.

 Please try the following patch.  It works fine for me on v3.2 with
 omap2plus_defconfig on a 35xx BeagleBoard.  No dropped or garbled
 characters with console use.  Not too surprising, since the UART has a
 receive FIFO to withstand interrupt servicing delays.


 - Paul

 From: Paul Walmsley p...@pwsan.com
 Date: Fri, 13 Jan 2012 02:10:30 -0700
 Subject: [PATCH] ARM: OMAP3: PM: allow MPU to enter low-power states even
  when the UART is active

 For some reason, both the existing OMAP3 PM code and the OMAP3 CPUIdle
 driver prevent the MPU powerdomain from entering low-power modes when
 any UART isn't asleep.  Possibly it is intended to minimize the ARM
 wakeup latency when UART activity arrives, but the UART has a FIFO
 that should handle this for most cases, with no dropped characters.  I
 may be forgetting something important, though.  And CORE/PER low-power
 states are a different matter entirely.

 Thanks to NeilBrown ne...@suse.de for reporting the problem.

 Signed-off-by: Paul Walmsley p...@pwsan.com
 Cc: NeilBrown ne...@suse.de
 Cc: Joe Woodward j...@terrafix.co.uk
 Cc: Tero Kristo t-kri...@ti.com
 Cc: Kevin Hilman khil...@ti.com
 Cc: Govindraj.R govindraj.r...@ti.com
 ---
  arch/arm/mach-omap2/cpuidle34xx.c |    8 +++-
  arch/arm/mach-omap2/pm.h          |    1 -
  arch/arm/mach-omap2/pm34xx.c      |   10 --
  3 files changed, 3 insertions(+), 16 deletions(-)

 diff --git a/arch/arm/mach-omap2/cpuidle34xx.c 
 b/arch/arm/mach-omap2/cpuidle34xx.c
 index 942bb4f..4ef32d4 100644
 --- a/arch/arm/mach-omap2/cpuidle34xx.c
 +++ b/arch/arm/mach-omap2/cpuidle34xx.c
 @@ -183,6 +183,9 @@ static int next_valid_state(struct cpuidle_device *dev,
                        core_deepest_state = PWRDM_POWER_OFF;
        }

 +       if (!omap_uart_can_sleep())
 +               core_deepest_state = PWRDM_POWER_ON;
 +

omap_uart_can_sleep is removed part of uart runtime conversion,
and is no more available on latest mainline, looks like uart runtime
patches got merged after 3.2 tagging.

--
Thanks,
Govindraj.R
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-13 Thread Paul Walmsley

On Fri, 13 Jan 2012, Govindraj wrote:

  diff --git a/arch/arm/mach-omap2/cpuidle34xx.c 
  b/arch/arm/mach-omap2/cpuidle34xx.c
  index 942bb4f..4ef32d4 100644
  --- a/arch/arm/mach-omap2/cpuidle34xx.c
  +++ b/arch/arm/mach-omap2/cpuidle34xx.c
  @@ -183,6 +183,9 @@ static int next_valid_state(struct cpuidle_device *dev,
                         core_deepest_state = PWRDM_POWER_OFF;
         }
 
  +       if (!omap_uart_can_sleep())
  +               core_deepest_state = PWRDM_POWER_ON;
  +
 
 omap_uart_can_sleep is removed part of uart runtime conversion,
 and is no more available on latest mainline, looks like uart runtime
 patches got merged after 3.2 tagging.

Great, so sounds like this patch is not needed...

- Paul

Re: DSS2/PM on 3.2 broken?

2012-01-13 Thread Kevin Hilman

Paul Walmsley p...@pwsan.com writes:

 cc Tero, Govindraj

 On Thu, 12 Jan 2012, NeilBrown wrote:

 On Wed, 11 Jan 2012 06:43:04 -0700 (MST) Paul Walmsley p...@pwsan.com 
 wrote:
 
 I spent some time exploring why cpuidle never drops below state 0 and found
 out that the code explicitly disables other states when uart is active - for
 a fairly broad definition of 'active'.
 
 I found the sleep_timeout setting and set them all to 1 second.  This meant
 that cpuidle started working, but I got a lot of garbage characters.
 

 ...

 Does this really mean CPU_IDLE cannot be used when you might expect chars
 from a serial port - e.g. GPS or Bluetooth?

 Hmmm.  Looking at mach-omap2/pm34xx.c and mach-omap2/cpuidle34xx.c, it 
 indeed prevents even the MPU from entering a low-power state when the UART 
 is active, as you write.  That doesn't seem correct.

 Please try the following patch.  It works fine for me on v3.2 with 
 omap2plus_defconfig on a 35xx BeagleBoard.  No dropped or garbled 
 characters with console use.  Not too surprising, since the UART has a 
 receive FIFO to withstand interrupt servicing delays.


 - Paul

 From: Paul Walmsley p...@pwsan.com
 Date: Fri, 13 Jan 2012 02:10:30 -0700
 Subject: [PATCH] ARM: OMAP3: PM: allow MPU to enter low-power states even
  when the UART is active

 For some reason, both the existing OMAP3 PM code and the OMAP3 CPUIdle
 driver prevent the MPU powerdomain from entering low-power modes when
 any UART isn't asleep.  Possibly it is intended to minimize the ARM
 wakeup latency when UART activity arrives, but the UART has a FIFO
 that should handle this for most cases, with no dropped characters.  I
 may be forgetting something important, though.  And CORE/PER low-power
 states are a different matter entirely.

Just FYI... the UART can_sleep hackery was removed for v3.3 and replaced
by using a PM QoS constraint:

commit 2fd149645eb46d26130d7070c6de037dddf34880
Author: Govindraj.R govindraj.r...@ti.com
Date:   Wed Nov 9 17:41:21 2011 +0530

ARM: OMAP2+: UART: Remove omap_uart_can_sleep and add pm_qos

Kevin
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-13 Thread Kevin Hilman

Tomi Valkeinen tomi.valkei...@ti.com writes:

 On Thu, 2012-01-12 at 14:40 -0800, Kevin Hilman wrote:
 Tomi Valkeinen tomi.valkei...@ti.com writes:
 
  On Mon, 2012-01-09 at 12:46 +, Joe Woodward wrote:
  I'm running on a Gumstix Overo (OMAP3530) with an 24-bit LCD panel 
  connected via the DPI interface (using the generic panel driver).
  
  Entering standby used to work just fine on 3.0, but on 3.2 I get the 
  following:
 
  I've been debugging this, but I'm at loss. I added some debug prints,
  which I show below, and I also pushed them to
  git://gitorious.org/linux-omap-dss2/linux.git pm-test-prints if
  somebody wants to see exactly what they print.
 
  Kevin, perhaps you have an idea what could be wrong here. Long version
  below, short version: doing system suspend breaks omapdss, if omapdss
  uses pm_runtime_put, but works with pm_runtime_put_sync.
 
 Is the pm_runtime_put() done after system suspend has started?  
 
 After system suspend has begun, the workqueue used for async callbacks
 is frozen, so any runtime PM calls that you want to work during
 suspend/resume must use the _sync versions.

 pm_runtime_put() is called inside omapdss driver's .suspend callback. So
 I guess that means the system suspend has started. However, the logs
 show that the runtime_suspend callback _is_ being called before the
 system suspend is finished, so the workqueue can't be frozen...

...or there are other ways that the runtime_suspend callback is called.

In order to ensure that devices are properly idled for system-wide
suspend, The OMAP PM domain layer will call the drivers
-runtime_suspend callback during late suspend (using the PM domain's
_noirq callbacks.)

This is there so that even when runtime PM has been disabled (via
userspace, or pm_runtime_forbid() calls) the driver can still be
properly idled before suspend.

In your case, I suspect you're seeing the driver's -runtime_suspend
callback called during late suspend, not by the PM workqueue.

I don't understand DSS enough to make sense of the logs you sent, so not
sure how to be of more help.

Kevin



--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-13 Thread Kevin Hilman

Kevin Hilman khil...@ti.com writes:

[...]

 From: Paul Walmsley p...@pwsan.com
 Date: Fri, 13 Jan 2012 02:10:30 -0700
 Subject: [PATCH] ARM: OMAP3: PM: allow MPU to enter low-power states even
  when the UART is active

 For some reason, both the existing OMAP3 PM code and the OMAP3 CPUIdle
 driver prevent the MPU powerdomain from entering low-power modes when
 any UART isn't asleep.  Possibly it is intended to minimize the ARM
 wakeup latency when UART activity arrives, but the UART has a FIFO
 that should handle this for most cases, with no dropped characters.  I
 may be forgetting something important, though.  And CORE/PER low-power
 states are a different matter entirely.

 Just FYI... the UART can_sleep hackery was removed for v3.3 and replaced
 by using a PM QoS constraint:

However, looking closer I see now that the constraint being used in the
mailine driver uses the CPU_DMA_LATENCY QoS constraint, which is preventing 
deeper C states as well just like the current code (before your patch),
so an similar fix will still be needed for mainline.

Kevin
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-13 Thread Paul Walmsley

On Fri, 13 Jan 2012, Kevin Hilman wrote:

 Kevin Hilman khil...@ti.com writes:

 [...]

  From: Paul Walmsley p...@pwsan.com
  Date: Fri, 13 Jan 2012 02:10:30 -0700
  Subject: [PATCH] ARM: OMAP3: PM: allow MPU to enter low-power states even
   when the UART is active

  For some reason, both the existing OMAP3 PM code and the OMAP3 CPUIdle
  driver prevent the MPU powerdomain from entering low-power modes when
  any UART isn't asleep.  Possibly it is intended to minimize the ARM
  wakeup latency when UART activity arrives, but the UART has a FIFO
  that should handle this for most cases, with no dropped characters.  I
  may be forgetting something important, though.  And CORE/PER low-power
  states are a different matter entirely.

  Just FYI... the UART can_sleep hackery was removed for v3.3 and replaced
  by using a PM QoS constraint:

 However, looking closer I see now that the constraint being used in the
 mailine driver uses the CPU_DMA_LATENCY QoS constraint, which is preventing 
 deeper C states as well just like the current code (before your patch),
 so an similar fix will still be needed for mainline.

Hmm.  This is the following code in serial_omap_set_termios() ? 

/* calculate wakeup latency constraint */
up-calc_latency = (100 * up-port.fifosize) /
(1000 * baud / 8);
up-latency = up-calc_latency;
schedule_work(up-qos_work);

Let's work out this formula for 115.2kbps:

(100 * 16) / (1000 * 115200 / 8)

= 1600 / 1440

= 1.11...  (presumably the unit here is milliseconds)

= 1 (since up-calc_latency is a u32)

Then up-calc_latency is passed to pm_qos_update_request().  According to 
Documentation/power/pm_qos_interface.txt, it is expecting microseconds.
So that's one problem.  To fix that it should be 

up-calc_latency = (100 * up-port.fifosize) /
(baud / 8);

With that change, it should be possible to enter C3, with the timings that 
are in the mainline cpuidle34xx.c.

- Paul
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-13 Thread NeilBrown

On Fri, 13 Jan 2012 04:31:37 -0700 (MST) Paul Walmsley p...@pwsan.com wrote:

 On Fri, 13 Jan 2012, NeilBrown wrote:
 
  Also, the HDQ access to the battery 'fuel gauge' is working fine, so
  presumably that gets disturbed by one of the lower power states (3,4,5).
  I guess it is 4,5,6 when CORE goes to RET or OFF.  So we need some way for
  HDQ to say I'm busy just like UART can.  omap_hdq_can_sleep() ???
  There must be a cleaner way...
 
 The first thing to try is the HDQ runtime PM conversion series that was 
 sent separately.  Maybe let us know if that fixes the problem.

Not completely - but I'm still exploring.

 
  More experiments:
   - I enabled off_mode and started seeing state3 happening.  UART and HDQ
 still fine - as expected.
   - I set the  /sys/devices/system/cpu/cpu0/cpuidle/state?/usage values
 all to '10'.
 Now things start to break;
 Console is mostly OK, but the first char after 10 seconds of idle is bad.
 I type 'enter' and get 'C'.  In general the first char typed is mapped 
  to 
 something else in a consistent way:
0a - c3
20 - 90
55 - d5
2a - 95
 It is a bit like we are missing a start bit, and a stop bit is shifting
 down into the msb .. sometimes.
 
 I'm not seeing garbling at all on the OMAP3 BeagleBoard here, running v3.2 
 with omap2plus_defconfig.  The first character from the console gets lost, 
 which again is expected when the CORE enters a low-power state and there's 
 no flow control on the serial interface -- but no garbling.

What baud rate?

If I use 57600 or lower I see no problem at all.  The first char I type gets
through fine.
If I try higher rates (115200 is the default, I've tried 230400 and 460800)
I get consistent corruption, though different corruptions at different rates.

I've tried enabling hardware handshaking but it makes no difference -
however it is possibly there is something wrong with the wiring so I'll check
that before pursuing that direction much further.

Presumably with working hardware handshaking we should be able to avoid any
loss or corruption??

(I'm not sure if I said - but the board I am working on is the GTA04 Openmoko
board: www.gta04.org).

Thanks,
NeilBrown



signature.asc
Description: PGP signature

Re: DSS2/PM on 3.2 broken?

2012-01-13 Thread Paul Walmsley

Hi

just to follow up on this, I think there are some other bugs in this 
formula... maybe someone might want to tackle them.

On Fri, 13 Jan 2012, Paul Walmsley wrote:

 up-calc_latency = (100 * up-port.fifosize) /
 (baud / 8);

One problem is that UARTs generally don't transmit only 8 bits per byte.  
In the most common framing arrangement, 8N1, there are ten bits per byte.  
See for example

https://en.wikipedia.org/wiki/8-N-1

So that part of this formula will contribute to an underestimation of the 
time it takes to fill the FIFO.  Using 10 data bits per byte adds another 
277 microseconds.

So ideally this formula should change the 8 depending on the number of 
data bits, whether there is a parity bit, and on the number of stop bits.  
The real formula to determine the number of bits per byte should be 
something like:

bits_per_byte = number of data bits + 1 bit if parity is used + number 
of stop bits

Those quantities should all be easily available in 
serial_omap_set_termios().  So then the formula would be:

 up-calc_latency = (100 * up-port.fifosize) /
 (baud / bits_per_byte);


...

Another issue is that the formula is missing the receive FIFO trigger 
level watermark.  That might contribute to an overestimation of the amount 
of time available for the MPU to service the FIFO.  So rather than using 
simply up-port.fifosize, it should use (up-port.fifosize - 
up.port-rxtrigthres).  up.port-rxtrigthres should be the appropriate 
number derived from the 34xx TRM vZR section 17.4.2.1.2 Receive FIFO 
Trigger section - it comes from the UART's TCR register.

So then it would be

 up-calc_latency = 
   (100 * (up-port.fifosize - up-port.rxtrigthres)) /
 (baud / bits_per_byte);

It seems however that omap-serial.c just hardcodes the rxtrigthres to 0, 
so maybe this is not such a significant issue:

serial_out(up, UART_TI752_TCR, OMAP_UART_TCR_TRIG);

Ideally the two FIFO thresholds here would be configurable by the UART 
user - especially given that the OMAP UART can be quite deep - 60 bytes.

...

One other issue.  If flow control is enabled, then IMHO, the UART probably 
shouldn't set a wakeup latency constraint at all.  That's because there 
would be no risk of data loss.  Users of the UART might have some 
performance constraints, but those should be set by the UART-using code or 
some system-wide wakeup latency value.

...

As an aside, I wonder how we handle the UART flow control lines during 
chip off-mode?  I assume those would need an appropriate off-mode mux 
value to indicate that the OMAP UART isn't ready; I wonder if we do that?

  
- Paul
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-13 Thread Paul Walmsley

On Sat, 14 Jan 2012, NeilBrown wrote:

 On Fri, 13 Jan 2012 04:31:37 -0700 (MST) Paul Walmsley p...@pwsan.com wrote:
 
  I'm not seeing garbling at all on the OMAP3 BeagleBoard here, running v3.2 
  with omap2plus_defconfig.  The first character from the console gets lost, 
  which again is expected when the CORE enters a low-power state and there's 
  no flow control on the serial interface -- but no garbling.
 
 What baud rate?

115200 bps, no flow control.

 Presumably with working hardware handshaking we should be able to avoid any
 loss or corruption??

Yes.  If you're using off-mode, you'll need to ensure that the I/O ring is 
set to wake the system when it sees the appropriate signal level on the 
incoming flow control line.

 (I'm not sure if I said - but the board I am working on is the GTA04 Openmoko
 board: www.gta04.org).

Did not know that.  Exciting!  Maybe you can send some to the OMAP 
maintainers ;-)
 

- Paul
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-13 Thread Paul Walmsley

On Fri, 13 Jan 2012, Paul Walmsley wrote:

 Let's work out this formula for 115.2kbps:
 
 (100 * 16) / (1000 * 115200 / 8)
 
 = 1600 / 1440
 
 = 1.11...  (presumably the unit here is milliseconds)
 
 = 1 (since up-calc_latency is a u32)
 
 Then up-calc_latency is passed to pm_qos_update_request().  According to 
 Documentation/power/pm_qos_interface.txt, it is expecting microseconds.
 So that's one problem.  To fix that it should be 
 
 up-calc_latency = (100 * up-port.fifosize) /
 (baud / 8);
 
 With that change, it should be possible to enter C3, with the timings that 
 are in the mainline cpuidle34xx.c.

Actually, just noticed that the fifosize is 60 bytes.  So that's even 
better.  With the fixed formula above, it should be ~ 5.2 ms wakeup 
latency - enough to enter C4 in the CPUIdle data in mainline -- at least 
in theory...


- Paul
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-13 Thread NeilBrown

On Fri, 13 Jan 2012 16:34:06 -0700 (MST) Paul Walmsley p...@pwsan.com wrote:


 
 So ideally this formula should change the 8 depending on the number of 
 data bits, whether there is a parity bit, and on the number of stop bits.  
 The real formula to determine the number of bits per byte should be 
 something like:
 
 bits_per_byte = number of data bits + 1 bit if parity is used + number 
 of stop bits
 + 1 for the start bit.

NeilBrown


signature.asc
Description: PGP signature

Re: DSS2/PM on 3.2 broken?

2012-01-13 Thread Paul Walmsley

On Sat, 14 Jan 2012, NeilBrown wrote:

 On Fri, 13 Jan 2012 16:34:06 -0700 (MST) Paul Walmsley p...@pwsan.com wrote:
 
 
  
  So ideally this formula should change the 8 depending on the number of 
  data bits, whether there is a parity bit, and on the number of stop bits.  
  The real formula to determine the number of bits per byte should be 
  something like:
  
  bits_per_byte = number of data bits + 1 bit if parity is used + number 
  of stop bits
  + 1 for the start bit.

Indeed...


- Paul
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-12 Thread Tomi Valkeinen

Hi,

On Wed, 2012-01-11 at 15:15 +, Joe Woodward wrote:

  The pm_runtime_get_sync() call in dss_runtime_get() is returning  a 
  negative value, could you print out that value too?
  
  If the call to pm_runtime_get_sync() fails, we bail out immediately. So
  the LCD interface shouldn't have been enabled at all. I don't know why 
  we get sync lost errors.
  
  There is a possibility that the LCD interface wasn't switched off when 
  for some reason when we suspended, and switching on the clocks again 
  tries to start the transfer without DSS being configured correctly?

I can reproduce this on my Overo. The problem goes away if I change
dss's and dispc'c pm_runtime_put() calls to pm_runtime_put_sync(). But
I'm not quite sure why.

I would imagine that the pm framework would flush the async
pm_runtime_put calls before the system goes into suspend. Comparing the
logs of successful and non-successful suspends it seems that the order
of suspend and resume events changes. Perhaps the suspend goes fine, but
for whatever reason the resume calls happen in a different order,
causing the problems. I need to study this more to understand the
problem.

Kevin, can you confirm that just using pm_runtime_put() is enough even
in system suspend case, and the framework handles doing the runtime_put
before suspending?

 Tomi



signature.asc
Description: This is a digitally signed message part

Re: DSS2/PM on 3.2 broken?

2012-01-12 Thread Tomi Valkeinen

On Thu, 2012-01-12 at 11:28 +0200, Tomi Valkeinen wrote:
 Hi,
 
 On Wed, 2012-01-11 at 15:15 +, Joe Woodward wrote:
 
   The pm_runtime_get_sync() call in dss_runtime_get() is returning  a 
   negative value, could you print out that value too?
   
   If the call to pm_runtime_get_sync() fails, we bail out immediately. So
   the LCD interface shouldn't have been enabled at all. I don't know why 
   we get sync lost errors.
   
   There is a possibility that the LCD interface wasn't switched off when 
   for some reason when we suspended, and switching on the clocks again 
   tries to start the transfer without DSS being configured correctly?
 
 I can reproduce this on my Overo. The problem goes away if I change
 dss's and dispc'c pm_runtime_put() calls to pm_runtime_put_sync(). But
 I'm not quite sure why.

Oh, one more thing. I didn't get any sync losts, but I may just have
been lucky. So it may or may not be a separate problem.

 Tomi



signature.asc
Description: This is a digitally signed message part

Re: DSS2/PM on 3.2 broken?

2012-01-12 Thread Tomi Valkeinen

On Thu, 2012-01-12 at 11:28 +0200, Tomi Valkeinen wrote:
 Hi,
 
 On Wed, 2012-01-11 at 15:15 +, Joe Woodward wrote:
 
   The pm_runtime_get_sync() call in dss_runtime_get() is returning  a 
   negative value, could you print out that value too?
   
   If the call to pm_runtime_get_sync() fails, we bail out immediately. So
   the LCD interface shouldn't have been enabled at all. I don't know why 
   we get sync lost errors.
   
   There is a possibility that the LCD interface wasn't switched off when 
   for some reason when we suspended, and switching on the clocks again 
   tries to start the transfer without DSS being configured correctly?
 
 I can reproduce this on my Overo. The problem goes away if I change
 dss's and dispc'c pm_runtime_put() calls to pm_runtime_put_sync(). But
 I'm not quite sure why.

(I seem to have not waken up today...)

I also forgot to mention that the fail happens in
drivers/base/power/runtime.c:rpm_resume(), where the check
dev-power.disable_depth  0 causes the function to return EACCES.

I'm not sure what disable_depth  0 means.

 Tomi



signature.asc
Description: This is a digitally signed message part

Re: DSS2/PM on 3.2 broken?

2012-01-12 Thread Tomi Valkeinen

On Mon, 2012-01-09 at 12:46 +, Joe Woodward wrote:
 I'm running on a Gumstix Overo (OMAP3530) with an 24-bit LCD panel connected 
 via the DPI interface (using the generic panel driver).
 
 Entering standby used to work just fine on 3.0, but on 3.2 I get the 
 following:

I've been debugging this, but I'm at loss. I added some debug prints,
which I show below, and I also pushed them to
git://gitorious.org/linux-omap-dss2/linux.git pm-test-prints if
somebody wants to see exactly what they print.

Kevin, perhaps you have an idea what could be wrong here. Long version
below, short version: doing system suspend breaks omapdss, if omapdss
uses pm_runtime_put, but works with pm_runtime_put_sync.

First, as a test case, here's what happens if I manually disable a
display (with my comments beginning with #):

echo 0  /sys/devices/platform/omapdss/display0/enabled 
# here dpi.c calls dispc_runtime_put
[   44.528015] dispc_runtime_put
[   44.531158] dispc disable depth: 0
[   44.535308] dispc disable depth: 0
[   44.538879] dispc_runtime_put end 0
# here dpi.c calls dss_runtime_put
[   44.542633] dss_runtime_put
[   44.545593] dss disable depth: 0
[   44.548980] dss disable depth: 0
[   44.552429] dss_runtime_put end 0

# then the runtime PM runs dispc suspend in its workqueue
[   44.557861] dispc_runtime_suspend start
# here dispc_runtime_suspend uses dss_runtime_put
[   44.561981] dss_runtime_put
[   44.564910] dss disable depth: 0
[   44.568420] dss disable depth: 0
[   44.571838] dss_runtime_put end 0
[   44.575378] dispc_runtime_suspend end
# then the runtime PM runs dss suspend in its workqueue
[   44.579315] dss_runtime_suspend start
[   44.583221] dss_runtime_suspend end


And similarly when enabling the display:

echo 1  /sys/devices/platform/omapdss/display0/enabled 
[  442.109222] dss_runtime_get
[  442.112304] dss disable depth: 0
[  442.115753] dss_runtime_resume start
[  442.119537] dss_runtime_resume end
[  442.123199] dss disable depth: 0
[  442.126586] dss_runtime_get end 0
[  442.130126] dispc_runtime_get
[  442.133270] dispc disable depth: 0
[  442.136932] dispc_runtime_resume start
[  442.140869] dss_runtime_get
[  442.143890] dss disable depth: 0
[  442.147308] dss disable depth: 0
[  442.150695] dss_runtime_get end 1
[  442.154235] dispc_runtime_resume end
[  442.158020] dispc disable depth: 0
[  442.161651] dispc_runtime_get end 0


This all works fine. Now, if I suspend the system:

echo mem  /sys/power/state 
[  482.097930] PM: Syncing filesystems ... done.
[  482.105041] PM: Preparing system for mem sleep
[  482.116394] Freezing user space processes ... (elapsed 0.02 seconds) done.
[  482.146331] Freezing remaining freezable tasks ... (elapsed 0.02 seconds) 
done.
[  482.177520] PM: Entering mem sleep
# dss core suspend disables the displays, as was done manually above
[  482.197570] core suspend 2
[  482.213867] dispc_runtime_put
[  482.217010] dispc disable depth: 0
[  482.220886] dispc disable depth: 0
[  482.224578] dispc_runtime_put end 0
[  482.228271] dss_runtime_put
[  482.231262] dss disable depth: 0
[  482.234649] dss disable depth: 0
[  482.238128] dss_runtime_put end 0
[  482.242218] core suspend end 0
# so far looks exactly the same as above

[  482.246795] PM: suspend of devices complete after 60.260 msecs
# now runtime PM runs the suspends in its workqueue
[  482.256744] dispc_runtime_suspend start
[  482.260955] dss_runtime_put

# dss's disable_depth is 1?? Which, afaik, means something like runtime PM is 
disabled for DSS.

[  482.263916] dss disable depth: 1
[  482.267303] dss disable depth: 1
[  482.270782] dss_runtime_put end 0
[  482.274261] dispc_runtime_suspend end
[  482.278381] dss_runtime_suspend start
[  482.282257] dss_runtime_suspend end
[  482.286224] PM: late suspend of devices complete after 33.209 msecs
[  482.292968] Disabling non-boot CPUs ...


Except the disable_depth, the suspend went as in manual case. But on resume 
things are rather broken:


[  696.654754] Successfully put all powerdomains to target state

# Here the PM framework calls dss's runtime_resume callback, even if nobody has 
called pm_rutime_get for dss!

[  696.661254] dss_runtime_resume start
[  696.665100] dss_runtime_resume end
[  696.668731] dispc_runtime_resume start
[  696.672760] dss_runtime_get
[  696.675689] dss disable depth: 1
[  696.679168] EACCESS RPM_RESUME
[  696.682373] dss disable depth: 1

# And here pm_runtime_get fails for DSS, because dss disable_depth is 1...

[  696.685791] [ cut here ]
[  696.690734] WARNING: at drivers/video/omap2/dss/dss.c:716 
dss_runtime_get+0x74/0x8c()
[  696.699035] Modules linked in:
[  696.702392] [c001b3c4] (unwind_backtrace+0x0/0xf0) from [c005248c] 
(warn_slowpath_common+0x4c
/0x64)
[  696.712371] [c005248c] (warn_slowpath_common+0x4c/0x64) from [c00524c0] 
(warn_slowpath_null+0
x1c/0x24)
[  696.722625] [c00524c0] (warn_slowpath_null+0x1c/0x24) from [c0273cd8] 
(dss_runtime_get+0x74/0
x8c)
[  696.732421]

Re: DSS2/PM on 3.2 broken?

2012-01-12 Thread Kevin Hilman

Tomi Valkeinen tomi.valkei...@ti.com writes:

 On Mon, 2012-01-09 at 12:46 +, Joe Woodward wrote:
 I'm running on a Gumstix Overo (OMAP3530) with an 24-bit LCD panel connected 
 via the DPI interface (using the generic panel driver).
 
 Entering standby used to work just fine on 3.0, but on 3.2 I get the 
 following:

 I've been debugging this, but I'm at loss. I added some debug prints,
 which I show below, and I also pushed them to
 git://gitorious.org/linux-omap-dss2/linux.git pm-test-prints if
 somebody wants to see exactly what they print.

 Kevin, perhaps you have an idea what could be wrong here. Long version
 below, short version: doing system suspend breaks omapdss, if omapdss
 uses pm_runtime_put, but works with pm_runtime_put_sync.

Is the pm_runtime_put() done after system suspend has started?  

After system suspend has begun, the workqueue used for async callbacks
is frozen, so any runtime PM calls that you want to work during
suspend/resume must use the _sync versions.

Kevin
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-12 Thread Tomi Valkeinen

On Thu, 2012-01-12 at 14:40 -0800, Kevin Hilman wrote:
 Tomi Valkeinen tomi.valkei...@ti.com writes:
 
  On Mon, 2012-01-09 at 12:46 +, Joe Woodward wrote:
  I'm running on a Gumstix Overo (OMAP3530) with an 24-bit LCD panel 
  connected via the DPI interface (using the generic panel driver).
  
  Entering standby used to work just fine on 3.0, but on 3.2 I get the 
  following:
 
  I've been debugging this, but I'm at loss. I added some debug prints,
  which I show below, and I also pushed them to
  git://gitorious.org/linux-omap-dss2/linux.git pm-test-prints if
  somebody wants to see exactly what they print.
 
  Kevin, perhaps you have an idea what could be wrong here. Long version
  below, short version: doing system suspend breaks omapdss, if omapdss
  uses pm_runtime_put, but works with pm_runtime_put_sync.
 
 Is the pm_runtime_put() done after system suspend has started?  
 
 After system suspend has begun, the workqueue used for async callbacks
 is frozen, so any runtime PM calls that you want to work during
 suspend/resume must use the _sync versions.

pm_runtime_put() is called inside omapdss driver's .suspend callback. So
I guess that means the system suspend has started. However, the logs
show that the runtime_suspend callback _is_ being called before the
system suspend is finished, so the workqueue can't be frozen...

 Tomi


--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-11 Thread Paul Walmsley

On Mon, 9 Jan 2012, Joe Woodward wrote:

 I'm running on a Gumstix Overo (OMAP3530) with an 24-bit LCD panel connected 
 via the DPI interface (using the generic panel driver).
 
 Entering standby used to work just fine on 3.0, but on 3.2 I get the 
 following:
 
 # echo mem  /sys/power/state

...

 [   24.466674] [ cut here ]
 [   24.466857] WARNING: at drivers/video/omap2/dss/dss.c:713 0xc01350f8()

...

 At which point the screen either restarts, or sometimes flickers and I get 
 the following:
 [   22.578796] omapdss DISPC error: SYNC_LOST on channel lcd, restarting the 
 output with video overlays disabled
 [   23.391571] omapdss DISPC error: SYNC_LOST on channel lcd, restarting the 
 output with video overlays disabled
 [   24.391571] omapdss DISPC error: SYNC_LOST on channel lcd, restarting the 
 output with video overlays disabled
 
 It normally recovers after doing this for a while...
 
 Anyone have any ideas?

Consider cc'ing the DSS maintainers and linux-fbdev list:

OMAP DISPLAY SUBSYSTEM and FRAMEBUFFER SUPPORT (DSS2)
M:  Tomi Valkeinen tomi.valkei...@ti.com
L:  linux-omap@vger.kernel.org
L:  linux-fb...@vger.kernel.org
S:  Maintained
F:  drivers/video/omap2/
F:  Documentation/arm/OMAP/DSS


- Paul
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-11 Thread Paul Walmsley

cc Kevin

On Tue, 10 Jan 2012, NeilBrown wrote:

 It seems that when cpuidle on an omap3 tries to switch to lower power
 states, various things misbehave:
  - UARTs lose characters

Is this with off-mode enabled, or is this with only retention idle 
enabled?

If off-mode is enabled, and incoming serial traffic is used to wake the 
system, it's known and expected that the first byte will be lost.  This is 
an artifact of the hardware and seems to be unavoidable.

  - dss loses sync

Best to take this up with the DSS maintainer directly.

  - HDQ seems to lose everything.

Is this with off-mode enabled, or just with retention idle?

If the former, this is hardly surprising as the HDQ driver 
(drivers/w1/masters/omap_hdq.c) doesn't contain any context save/restore 
code.  In fact the HDQ driver doesn't even use PM runtime, so quite a bit 
of work is needed there. 


- Paul
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-11 Thread Archit


Hi,

On Wednesday 11 January 2012 07:13 PM, Paul Walmsley wrote:

cc Kevin

On Tue, 10 Jan 2012, NeilBrown wrote:


It seems that when cpuidle on an omap3 tries to switch to lower power
states, various things misbehave:
  - UARTs lose characters


Is this with off-mode enabled, or is this with only retention idle
enabled?

If off-mode is enabled, and incoming serial traffic is used to wake the
system, it's known and expected that the first byte will be lost.  This is
an artifact of the hardware and seems to be unavoidable.


  - dss loses sync


Best to take this up with the DSS maintainer directly.


Could you enable omapdss debug by adding 'omapdss.debug=1 debug' in your 
bootargs and share logs?


The pm_runtime_get_sync() call in dss_runtime_get() is returning  a 
negative value, could you print out that value too?


If the call to pm_runtime_get_sync() fails, we bail out immediately. So 
the LCD interface shouldn't have been enabled at all. I don't know why 
we get sync lost errors.


There is a possibility that the LCD interface wasn't switched off when 
for some reason when we suspended, and switching on the clocks again 
tries to start the transfer without DSS being configured correctly?


Also, are you on the final 3.2, or the 3.2-rc's by any chance? There 
were some hwmod patches which got in 3.2-rc4 related to hwmod fixes.


Archit




  - HDQ seems to lose everything.


Is this with off-mode enabled, or just with retention idle?

If the former, this is hardly surprising as the HDQ driver
(drivers/w1/masters/omap_hdq.c) doesn't contain any context save/restore
code.  In fact the HDQ driver doesn't even use PM runtime, so quite a bit
of work is needed there.


- Paul
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-11 Thread Joe Woodward

...snip...

 
 Could you enable omapdss debug by adding 'omapdss.debug=1 debug' in
 your bootargs and share logs?

As requested:

# echo mem  /sys/power/state
[   37.371734] PM: Syncing filesystems ... done.
[   37.397460] PM: Preparing system for mem sleep
[   37.402923] Freezing user space processes ... (elapsed 0.02 seconds) done.
[   37.432312] Freezing remaining freezable tasks ... (elapsed 0.02 seconds) 
done.
[   37.463256] PM: Entering mem sleep
[   37.467132] Suspending console(s) (use no_console_suspend to debug)

 SYSTEM SUSPENDS HERE

[   37.587646] omapdss CORE: suspend 2
[   37.776000] omapdss DISPC: dispc_runtime_put
[   37.776336] omapdss DSS: dss_runtime_put
[   37.779693] PM: suspend of devices complete after 305.389 msecs
[   37.786010] omapdss DISPC: dispc_save_context
[   37.786437] omapdss DISPC: context saved, ctx_loss_count 0
[   37.786560] omapdss DSS: dss_runtime_put
[   37.786773] omapdss DSS: dss_save_context
[   37.786895] omapdss DSS: context saved
[   37.787384] PM: late suspend of devices complete after 7.446 msecs
[   40.932312] Successfully put all powerdomains to target state
[   40.934509] omapdss DSS: dss_restore_context
[   40.934661] omapdss DSS: context restored
[   40.934814] omapdss DSS: dss_runtime_get
[   40.934967] @@@ dss_runtime_get::pm_runtime_get_sync() return val = -13
[   40.935089] [ cut here ]
[   40.935241] WARNING: at drivers/video/omap2/dss/dss.c:716 0xc0137304()
[   40.935363] Modules linked in:
[   40.935516] Backtrace:
[   40.935638] Function entered at [c0010c48] from [c02bdad0]
[   40.935791]  r6:c02fa302 r5:02cc r4: r3:
[   40.936065] Function entered at [c02bdab8] from [c0034874]
[   40.936218] Function entered at [c0034820] from [c00348b0]
[   40.936340]  r8: r7:c0386c14 r6:c0028c48 r5:c0371c24 r4:fff3
[   40.936645] r3:0009
[   40.936767] Function entered at [c003488c] from [c0137304]
[   40.936920] Function entered at [c01372a8] from [c0137b20]
[   40.937042]  r4:dec50008 r3:c0137b10
[   40.937225] Function entered at [c0137b10] from [c0179ff0]
[   40.937347]  r6:c0028c48 r5:c0371c24 r4:dec50008 r3:c0137b10
[   40.937652] Function entered at [c0179fbc] from [c0028ca4]
[   40.937774] Function entered at [c0028c48] from [c017afc0]
[   40.937927]  r4:dec50008 r3:
[   40.938079] Function entered at [c017aee0] from [c017b5ec]
[   40.938232]  r9:c02d0044 r8: r6:dec5005c r5:0010 r4:dec50008
[   40.938568] Function entered at [c017b520] from [c0063e48]
[   40.938690]  r8:c02c3898 r7: r6:0003 r5: r4:
[   40.938995] Function entered at [c0063cc0] from [c0064004]
[   40.939147]  r7:0004 r6: r5:c02c388c r4:0003
[   40.939422] Function entered at [c0063ef4] from [c0062e78]
[   40.939544]  r6:0003 r5:0003 r4:d8869000 r3:006d
[   40.939819] Function entered at [c0062dd0] from [c0117ac8]
[   40.939971] Function entered at [c0117aac] from [c00e2e54]
[   40.940124] Function entered at [c00e2d44] from [c009903c]
[   40.940246] Function entered at [c0098f88] from [c00992b8]
[   40.940368]  r8:0004 r7: r6: r5:000aabf8 r4:d857c940
[   40.940704] Function entered at [c0099274] from [c000dcc0]
[   40.941101]  r8:c000de44 r7:0004 r6:000aabf8 r5:0004 r4:000a8e38
[   40.941436] ---[ end trace db0a5fd53a21ce6e ]---
[   40.943511] PM: early resume of devices complete after 9.307 msecs
[   40.950622] omapdss DISPC: dispc_save_context
[   40.951049] omapdss DISPC: context saved, ctx_loss_count 0
[   40.951171] omapdss DSS: dss_runtime_put
[   40.951812] omapdss CORE: resume
[   40.953460] omapdss DSS: dss_runtime_get
[   40.953613] @@@ dss_runtime_get::pm_runtime_get_sync() return val = 1
[   40.953735] omapdss DISPC: dispc_runtime_get
[   40.953948] omapdss DSS: dss_runtime_get
[   40.954071] @@@ dss_runtime_get::pm_runtime_get_sync() return val = 1
[   40.954193] omapdss DISPC: dispc_restore_context
[   40.954467] omapdss DISPC: onoff 0 rf 0 ieo 0 ipc 0 ihs 1 ivs 1 acbi 0 acb 0
[   40.954620] omapdss DSS: dispc clock info found from cache.
[   40.954772] omapdss DSS: dpll4_m4 = 43200
[   40.954986] omapdss DSS: fck = 7200 (12)
[   40.955108] omapdss DISPC: lck = 7200 (1)
[   40.955230] omapdss DISPC: pck = 3600 (2)
[   40.955383] omapdss DISPC: channel 0 xres 800 yres 480
[   40.955505] omapdss DISPC: pck 36000
[   40.955657] omapdss DISPC: hsw 10 hfp 164 hbp 89 vsw 10 vfp 10 vbp 23
[   40.955810] omapdss DISPC: hsync 33866Hz, vsync 64Hz
[   41.294464] PM: resume of devices complete after 345.153 msecs
[   41.688568] PM: Finishing wakeup.
[   41.692169] Restarting tasks ... done.

When SYNC is lost we get something like:
[  118.972534] omapdss DISPC error: SYNC_LOST on channel lcd, restarting the 
output with video overlays disabled
[  119.166839] omapdss DISPC: dispc_runtime_put
[  119.175292] omapdss DSS: dss_runtime_put
[  119.183746] omapdss DISPC: dispc_enable_plane 1, 0
[

Re: DSS2/PM on 3.2 broken?

2012-01-11 Thread Archit


Hi,

On Wednesday 11 January 2012 08:45 PM, Joe Woodward wrote:

...snip...



Could you enable omapdss debug by adding 'omapdss.debug=1 debug' in
your bootargs and share logs?


As requested:

# echo mem  /sys/power/state
[   37.371734] PM: Syncing filesystems ... done.
[   37.397460] PM: Preparing system for mem sleep
[   37.402923] Freezing user space processes ... (elapsed 0.02 seconds) done.
[   37.432312] Freezing remaining freezable tasks ... (elapsed 0.02 seconds) 
done.
[   37.463256] PM: Entering mem sleep
[   37.467132] Suspending console(s) (use no_console_suspend to debug)

 SYSTEM SUSPENDS HERE

[   37.587646] omapdss CORE: suspend 2
[   37.776000] omapdss DISPC: dispc_runtime_put
[   37.776336] omapdss DSS: dss_runtime_put
[   37.779693] PM: suspend of devices complete after 305.389 msecs
[   37.786010] omapdss DISPC: dispc_save_context
[   37.786437] omapdss DISPC: context saved, ctx_loss_count 0
[   37.786560] omapdss DSS: dss_runtime_put
[   37.786773] omapdss DSS: dss_save_context
[   37.786895] omapdss DSS: context saved
[   37.787384] PM: late suspend of devices complete after 7.446 msecs
[   40.932312] Successfully put all powerdomains to target state
[   40.934509] omapdss DSS: dss_restore_context
[   40.934661] omapdss DSS: context restored
[   40.934814] omapdss DSS: dss_runtime_get
[   40.934967] @@@ dss_runtime_get::pm_runtime_get_sync() return val = -13


This resume/restore looks to be happening a bit early according to me. 
All the resuming should happen after the print 'omapdss CORE: resume'.


One experiment worth trying would be to reboot/halt the kernel which 
will lead to only shutdown of DSS, and no resume. This will help us 
figure out if the issue is in the suspend or resume path.


Archit


[   40.935089] [ cut here ]
[   40.935241] WARNING: at drivers/video/omap2/dss/dss.c:716 0xc0137304()
[   40.935363] Modules linked in:
[   40.935516] Backtrace:
[   40.935638] Function entered at [c0010c48] from [c02bdad0]
[   40.935791]  r6:c02fa302 r5:02cc r4: r3:
[   40.936065] Function entered at [c02bdab8] from [c0034874]
[   40.936218] Function entered at [c0034820] from [c00348b0]
[   40.936340]  r8: r7:c0386c14 r6:c0028c48 r5:c0371c24 r4:fff3
[   40.936645] r3:0009
[   40.936767] Function entered at [c003488c] from [c0137304]
[   40.936920] Function entered at [c01372a8] from [c0137b20]
[   40.937042]  r4:dec50008 r3:c0137b10
[   40.937225] Function entered at [c0137b10] from [c0179ff0]
[   40.937347]  r6:c0028c48 r5:c0371c24 r4:dec50008 r3:c0137b10
[   40.937652] Function entered at [c0179fbc] from [c0028ca4]
[   40.937774] Function entered at [c0028c48] from [c017afc0]
[   40.937927]  r4:dec50008 r3:
[   40.938079] Function entered at [c017aee0] from [c017b5ec]
[   40.938232]  r9:c02d0044 r8: r6:dec5005c r5:0010 r4:dec50008
[   40.938568] Function entered at [c017b520] from [c0063e48]
[   40.938690]  r8:c02c3898 r7: r6:0003 r5: r4:
[   40.938995] Function entered at [c0063cc0] from [c0064004]
[   40.939147]  r7:0004 r6: r5:c02c388c r4:0003
[   40.939422] Function entered at [c0063ef4] from [c0062e78]
[   40.939544]  r6:0003 r5:0003 r4:d8869000 r3:006d
[   40.939819] Function entered at [c0062dd0] from [c0117ac8]
[   40.939971] Function entered at [c0117aac] from [c00e2e54]
[   40.940124] Function entered at [c00e2d44] from [c009903c]
[   40.940246] Function entered at [c0098f88] from [c00992b8]
[   40.940368]  r8:0004 r7: r6: r5:000aabf8 r4:d857c940
[   40.940704] Function entered at [c0099274] from [c000dcc0]
[   40.941101]  r8:c000de44 r7:0004 r6:000aabf8 r5:0004 r4:000a8e38
[   40.941436] ---[ end trace db0a5fd53a21ce6e ]---
[   40.943511] PM: early resume of devices complete after 9.307 msecs
[   40.950622] omapdss DISPC: dispc_save_context
[   40.951049] omapdss DISPC: context saved, ctx_loss_count 0
[   40.951171] omapdss DSS: dss_runtime_put
[   40.951812] omapdss CORE: resume
[   40.953460] omapdss DSS: dss_runtime_get
[   40.953613] @@@ dss_runtime_get::pm_runtime_get_sync() return val = 1
[   40.953735] omapdss DISPC: dispc_runtime_get
[   40.953948] omapdss DSS: dss_runtime_get
[   40.954071] @@@ dss_runtime_get::pm_runtime_get_sync() return val = 1
[   40.954193] omapdss DISPC: dispc_restore_context
[   40.954467] omapdss DISPC: onoff 0 rf 0 ieo 0 ipc 0 ihs 1 ivs 1 acbi 0 acb 0
[   40.954620] omapdss DSS: dispc clock info found from cache.
[   40.954772] omapdss DSS: dpll4_m4 = 43200
[   40.954986] omapdss DSS: fck = 7200 (12)
[   40.955108] omapdss DISPC: lck = 7200 (1)
[   40.955230] omapdss DISPC: pck = 3600 (2)
[   40.955383] omapdss DISPC: channel 0 xres 800 yres 480
[   40.955505] omapdss DISPC: pck 36000
[   40.955657] omapdss DISPC: hsw 10 hfp 164 hbp 89 vsw 10 vfp 10 vbp 23
[   40.955810] omapdss DISPC: hsync 33866Hz, vsync 64Hz
[   41.294464] PM: resume of devices

Re: DSS2/PM on 3.2 broken?

2012-01-11 Thread Joe Woodward

-Original Message-
From: Archit a0393...@ti.com
To: Joe Woodward j...@terrafix.co.uk
Cc: Paul Walmsley p...@pwsan.com, NeilBrown ne...@suse.de, 
khil...@ti.com, linux-omap@vger.kernel.org, Tomi Valkeinen 
tomi.valkei...@ti.com
Date: Wed, 11 Jan 2012 21:22:00 +0530
Subject: Re: DSS2/PM on 3.2 broken?

 Hi,

 On Wednesday 11 January 2012 08:45 PM, Joe Woodward wrote:
  ...snip...

  Could you enable omapdss debug by adding 'omapdss.debug=1 debug' in
  your bootargs and share logs?

  As requested:

  # echo mem  /sys/power/state
  [   37.371734] PM: Syncing filesystems ... done.
  [   37.397460] PM: Preparing system for mem sleep
  [   37.402923] Freezing user space processes ... (elapsed 0.02
 seconds) done.
  [   37.432312] Freezing remaining freezable tasks ... (elapsed 0.02
 seconds) done.
  [   37.463256] PM: Entering mem sleep
  [   37.467132] Suspending console(s) (use no_console_suspend to
 debug)

   SYSTEM SUSPENDS HERE

  [   37.587646] omapdss CORE: suspend 2
  [   37.776000] omapdss DISPC: dispc_runtime_put
  [   37.776336] omapdss DSS: dss_runtime_put
  [   37.779693] PM: suspend of devices complete after 305.389 msecs
  [   37.786010] omapdss DISPC: dispc_save_context
  [   37.786437] omapdss DISPC: context saved, ctx_loss_count 0
  [   37.786560] omapdss DSS: dss_runtime_put
  [   37.786773] omapdss DSS: dss_save_context
  [   37.786895] omapdss DSS: context saved
  [   37.787384] PM: late suspend of devices complete after 7.446 msecs
  [   40.932312] Successfully put all powerdomains to target state
  [   40.934509] omapdss DSS: dss_restore_context
  [   40.934661] omapdss DSS: context restored
  [   40.934814] omapdss DSS: dss_runtime_get
  [   40.934967] @@@ dss_runtime_get::pm_runtime_get_sync() return val
 = -13

 This resume/restore looks to be happening a bit early according to me. 
 All the resuming should happen after the print 'omapdss CORE: resume'.

 One experiment worth trying would be to reboot/halt the kernel which 
 will lead to only shutdown of DSS, and no resume. This will help us 
 figure out if the issue is in the suspend or resume path.

 Archit

I'm not 100% sure what you mean for me to do here?

Do you mean to halt the kernel after suspending, but before resuming? If so, 
how would I go about doing this?

If I just reboot or poweroff I get the following...

# poweroff
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
Requesting system poweroff
[   43.310607] omapdss CORE: shutdown
[   43.499053] omapdss DISPC: dispc_runtime_put
[   43.504272] omapdss DISPC: dispc_save_context
[   43.509399] omapdss DISPC: context saved, ctx_loss_count 0
[   43.515747] omapdss DSS: dss_runtime_put
[   43.521453] omapdss DSS: dss_runtime_put
[   43.527221] System halted.

# reboot
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
Requesting system reboot
[   25.652008] omapdss CORE: shutdown
[   25.836120] omapdss DISPC: dispc_runtime_put
[   25.841125] omapdss DISPC: dispc_save_context
[   25.846221] omapdss DSS: dss_runtime_put
[   25.851776] omapdss DISPC: context saved, ctx_loss_count 0
[   25.857788] omapdss DSS: dss_runtime_put
[   25.864074] Restarting system.

Cheers,
Joe

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-11 Thread Archit

Hi,

On Wednesday 11 January 2012 09:43 PM, Joe Woodward wrote:

-Original Message-
From: Archita0393...@ti.com
To: Joe Woodwardj...@terrafix.co.uk
Cc: Paul Walmsleyp...@pwsan.com, 
NeilBrownne...@suse.de,khil...@ti.com,linux-omap@vger.kernel.org, Tomi Valkeinen
tomi.valkei...@ti.com
Date: Wed, 11 Jan 2012 21:22:00 +0530
Subject: Re: DSS2/PM on 3.2 broken?

Hi,

On Wednesday 11 January 2012 08:45 PM, Joe Woodward wrote:

...snip...

Could you enable omapdss debug by adding 'omapdss.debug=1 debug' in
your bootargs and share logs?

snip

Archit

I'm not 100% sure what you mean for mmount -t debugfs debugfs 
/sys/kernel/debuge to do here?

Do you mean to halt the kernel after suspending, but before resuming? If so, 
how would I go about doing this?

I'd asked for reboot or power off as you have done below. The suspend 
path looks okay, I asked for the power off to see if there was a 
dss_runtime_get() was called at the wrong time by some interrupt while 
suspending.

One difference between the shutdown and suspend logs is that we don't 
see the prints 'omapdss DSS: dss_save_context' and 'omapdss DSS: context 
saved', that seems a bit fishy.

I'm not sure how to go ahead from here. We haven't seen this on OMAP4, 
and I don't think I have tried this on OAMP3. Maybe Tomi has more ideas.

Archit

If I just reboot or poweroff I get the following...

# poweroff
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
Requesting system poweroff
[   43.310607] omapdss CORE: shutdown
[   43.499053] omapdss DISPC: dispc_runtime_put
[   43.504272] omapdss DISPC: dispc_save_context
[   43.509399] omapdss DISPC: context saved, ctx_loss_count 0
[   43.515747] omapdss DSS: dss_runtime_put
[   43.521453] omapdss DSS: dss_runtime_put
[   43.527221] System halted.

# reboot
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
Requesting system reboot
[   25.652008] omapdss CORE: shutdown
[   25.836120] omapdss DISPC: dispc_runtime_put
[   25.841125] omapdss DISPC: dispc_save_context
[   25.846221] omapdss DSS: dss_runtime_put
[   25.851776] omapdss DISPC: context saved, ctx_loss_count 0
[   25.857788] omapdss DSS: dss_runtime_put
[   25.864074] Restarting system.

Cheers,
Joe

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DSS2/PM on 3.2 broken?

2012-01-11 Thread NeilBrown

On Wed, 11 Jan 2012 06:43:04 -0700 (MST) Paul Walmsley p...@pwsan.com wrote:

 cc Kevin
 
 On Tue, 10 Jan 2012, NeilBrown wrote:
 
  It seems that when cpuidle on an omap3 tries to switch to lower power
  states, various things misbehave:
   - UARTs lose characters
 
 Is this with off-mode enabled, or is this with only retention idle 
 enabled?

I haven't experimented with off-mode much.  When I try it the suspend-time
current drain goes from 65mA to about 90mA which doesn't seem the right
direction.  The only other observation is that the cam power domain
doesn't reach the target state of '0'.  I'm guessing they are related but
haven't progressed further. (I don't have anything like a camera driver
compiled in - nor do I have a camera attached).

So it is just retention.

I occasionally see evidence of a garbled char when I wake from suspend, but
that doesn't bother me much.

I spent some time exploring why cpuidle never drops below state 0 and found
out that the code explicitly disables other states when uart is active - for
a fairly broad definition of 'active'.

I found the sleep_timeout setting and set them all to 1 second.  This meant
that cpuidle started working, but I got a lot of garbage characters.

 
 If off-mode is enabled, and incoming serial traffic is used to wake the 
 system, it's known and expected that the first byte will be lost.  This is 
 an artifact of the hardware and seems to be unavoidable.

Does this really mean CPU_IDLE cannot be used when you might expect chars
from a serial port - e.g. GPS or Bluetooth?
Which power domain needs to stay non-off?  I would guess 'CORE' for UART1,2
and 'PER' for UART3,4.  I'm using UART3 for the console port, but it looks
like 'PER' doesn't get switched off by CPUIDLE.

What is the important issue here that I don't know about ? :-)

 
   - dss loses sync
 
 Best to take this up with the DSS maintainer directly.
 
   - HDQ seems to lose everything.
 
 Is this with off-mode enabled, or just with retention idle?

Not off-mode - just retention.

 
 If the former, this is hardly surprising as the HDQ driver 
 (drivers/w1/masters/omap_hdq.c) doesn't contain any context save/restore 
 code.  In fact the HDQ driver doesn't even use PM runtime, so quite a bit 
 of work is needed there. 

The HDQ driver does enable/disable the iclk and fclk.  I thought I read that
having the clocks enabled would prevent the powerdomain from dropping to a
lower-power state ??

Could you suggest some other driver which does do the right sort of runtime
PM that I could try copying?

Thanks,
NeilBrown



signature.asc
Description: PGP signature

Re: DSS2/PM on 3.2 broken?

2012-01-10 Thread Joe Woodward

-Original Message-
From: NeilBrown ne...@suse.de
To: Joe Woodward j...@terrafix.co.uk
Cc: linux-omap@vger.kernel.org
Date: Tue, 10 Jan 2012 08:08:49 +1100
Subject: Re: DSS2/PM on 3.2 broken?

 On Mon, 09 Jan 2012 12:46:43 + Joe Woodward j...@terrafix.co.uk
 wrote:
 
  I'm running on a Gumstix Overo (OMAP3530) with an 24-bit LCD panel
 connected via the DPI interface (using the generic panel driver).
  
  Entering standby used to work just fine on 3.0, but on 3.2 I get the
 following:
  
  # echo mem  /sys/power/state
  [   23.186279] PM: Syncing filesystems ... done.
  [   23.194244] Freezing user space processes ... (elapsed 0.01
 seconds) done.
  [   23.219543] Freezing remaining freezable tasks ... (elapsed 0.02
 seconds) done.
  [   23.251037] Suspending console(s) (use no_console_suspend to
 debug)
  [   23.554656] PM: suspend of devices complete after 296.417 msecs
  [   23.561859] PM: late suspend of devices complete after 6.957 msecs
  [   24.464813] Successfully put all powerdomains to target state
  [   24.466674] [ cut here ]
  [   24.466857] WARNING: at drivers/video/omap2/dss/dss.c:713
 0xc01350f8()
  [   24.467010] Modules linked in:
  [   24.467132] Backtrace:
  [   24.467254] Function entered at [c0010c3c] from [c02c4a60]
  [   24.467407]  r6:c02ffdaa r5:02c9 r4: r3:
  [   24.467651] Function entered at [c02c4a48] from [c00344d0]
  [   24.467803] Function entered at [c003447c] from [c003450c]
  [   24.467926]  r8: r7:c0390a84 r6:c00288a4 r5:c037b85c
 r4:fff3
  [   24.468200] r3:0009
  [   24.468322] Function entered at [c00344e8] from [c01350f8]
  [   24.468475] Function entered at [c01350ac] from [c013595c]
  [   24.468597]  r4:dec50208 r3:c013594c
  [   24.468780] Function entered at [c013594c] from [c0182724]
  [   24.468902]  r6:c00288a4 r5:c037b85c r4:dec50208 r3:c013594c
  [   24.469177] Function entered at [c01826f0] from [c0028900]
  [   24.469299] Function entered at [c00288a4] from [c01836f4]
  [   24.469421]  r4:dec50208 r3:
  [   24.469604] Function entered at [c0183614] from [c0183d20]
  [   24.469726]  r9:c02d7044 r8: r6:dec5025c r5:0010
 r4:dec50208
  [   24.470031] Function entered at [c0183c54] from [c0063aa4]
  [   24.470153]  r8:c02ca888 r7: r6:0003 r5:
 r4:
  [   24.470458] Function entered at [c006391c] from [c0063c60]
  [   24.470581]  r7:0004 r6: r5:c02ca87c r4:0003
  [   24.471130] Function entered at [c0063b50] from [c0062ad4]
  [   24.471282]  r6:0003 r5:0003 r4:c6a87000 r3:006d
  [   24.471557] Function entered at [c0062a2c] from [c0117728]
  [   24.471679] Function entered at [c011770c] from [c00e2ab0]
  [   24.471832] Function entered at [c00e29a0] from [c0098c98]
  [   24.471954] Function entered at [c0098be4] from [c0098f14]
  [   24.472076]  r8:0004 r7: r6: r5:000ac750
 r4:d8a70dc0
  [   24.472412] Function entered at [c0098ed0] from [c000dcc0]
  [   24.472534]  r8:c000de44 r7:0004 r6:000ac750 r5:0004
 r4:000a8e38
  [   24.472839] ---[ end trace 9f4f3053f6637dae ]---
  [   24.475006] PM: early resume of devices complete after 8.666 msecs
  [   25.040344] PM: resume of devices complete after 560.943 msecs
  [   25.277801] Restarting tasks ... done.
  
  At which point the screen either restarts, or sometimes flickers and
 I get the following:
  [   22.578796] omapdss DISPC error: SYNC_LOST on channel lcd,
 restarting the output with video overlays disabled
  [   23.391571] omapdss DISPC error: SYNC_LOST on channel lcd,
 restarting the output with video overlays disabled
  [   24.391571] omapdss DISPC error: SYNC_LOST on channel lcd,
 restarting the output with video overlays disabled
  
  It normally recovers after doing this for a while...
  
  Anyone have any ideas?
 
 I think I can help you work around the problem but I would much rather
 see it
 fixed.  So the main reason I'm replying is to make this thread seem
 more
 interesting so that more people look at it and hopefully the right
 person
 sees it :-)
 
 It seems that when cpuidle on an omap3 tries to switch to lower power
 states, various things misbehave:
  - UARTs lose characters
  - dss loses sync
  - HDQ seems to lose everything.
 
 The first is known in the code so cpuidle is disabled if there is any
 UART
 traffic.  At first boot it is assume thered might be uart activity and
 the
 timeout for there doesn't seem to be any activity is '0' meaning
 'don't
 time out' (look for sleep_timeout in sysfs).
 
 On suspend, the UARTs are allowed to go to sleep and they are not
 explicitly
 woken at resume.  So after a suspend/resume cycle, cpuidle is more
 likely to
 try to adjust idle states and so DSS and HDQ are more likely to fail.
 
 You can disable cpuidle by writing '0' to all the 'sleep_timeout'
 fields, or
 probably by
echo 1   /sys/module/cpuidle/parameters/off
 
 That should stablise the display.
 
 It would be great

Re: DSS2/PM on 3.2 broken?

2012-01-09 Thread NeilBrown

On Mon, 09 Jan 2012 12:46:43 + Joe Woodward j...@terrafix.co.uk wrote:

 I'm running on a Gumstix Overo (OMAP3530) with an 24-bit LCD panel connected 
 via the DPI interface (using the generic panel driver).
 
 Entering standby used to work just fine on 3.0, but on 3.2 I get the 
 following:
 
 # echo mem  /sys/power/state
 [   23.186279] PM: Syncing filesystems ... done.
 [   23.194244] Freezing user space processes ... (elapsed 0.01 seconds) done.
 [   23.219543] Freezing remaining freezable tasks ... (elapsed 0.02 seconds) 
 done.
 [   23.251037] Suspending console(s) (use no_console_suspend to debug)
 [   23.554656] PM: suspend of devices complete after 296.417 msecs
 [   23.561859] PM: late suspend of devices complete after 6.957 msecs
 [   24.464813] Successfully put all powerdomains to target state
 [   24.466674] [ cut here ]
 [   24.466857] WARNING: at drivers/video/omap2/dss/dss.c:713 0xc01350f8()
 [   24.467010] Modules linked in:
 [   24.467132] Backtrace:
 [   24.467254] Function entered at [c0010c3c] from [c02c4a60]
 [   24.467407]  r6:c02ffdaa r5:02c9 r4: r3:
 [   24.467651] Function entered at [c02c4a48] from [c00344d0]
 [   24.467803] Function entered at [c003447c] from [c003450c]
 [   24.467926]  r8: r7:c0390a84 r6:c00288a4 r5:c037b85c r4:fff3
 [   24.468200] r3:0009
 [   24.468322] Function entered at [c00344e8] from [c01350f8]
 [   24.468475] Function entered at [c01350ac] from [c013595c]
 [   24.468597]  r4:dec50208 r3:c013594c
 [   24.468780] Function entered at [c013594c] from [c0182724]
 [   24.468902]  r6:c00288a4 r5:c037b85c r4:dec50208 r3:c013594c
 [   24.469177] Function entered at [c01826f0] from [c0028900]
 [   24.469299] Function entered at [c00288a4] from [c01836f4]
 [   24.469421]  r4:dec50208 r3:
 [   24.469604] Function entered at [c0183614] from [c0183d20]
 [   24.469726]  r9:c02d7044 r8: r6:dec5025c r5:0010 r4:dec50208
 [   24.470031] Function entered at [c0183c54] from [c0063aa4]
 [   24.470153]  r8:c02ca888 r7: r6:0003 r5: r4:
 [   24.470458] Function entered at [c006391c] from [c0063c60]
 [   24.470581]  r7:0004 r6: r5:c02ca87c r4:0003
 [   24.471130] Function entered at [c0063b50] from [c0062ad4]
 [   24.471282]  r6:0003 r5:0003 r4:c6a87000 r3:006d
 [   24.471557] Function entered at [c0062a2c] from [c0117728]
 [   24.471679] Function entered at [c011770c] from [c00e2ab0]
 [   24.471832] Function entered at [c00e29a0] from [c0098c98]
 [   24.471954] Function entered at [c0098be4] from [c0098f14]
 [   24.472076]  r8:0004 r7: r6: r5:000ac750 r4:d8a70dc0
 [   24.472412] Function entered at [c0098ed0] from [c000dcc0]
 [   24.472534]  r8:c000de44 r7:0004 r6:000ac750 r5:0004 r4:000a8e38
 [   24.472839] ---[ end trace 9f4f3053f6637dae ]---
 [   24.475006] PM: early resume of devices complete after 8.666 msecs
 [   25.040344] PM: resume of devices complete after 560.943 msecs
 [   25.277801] Restarting tasks ... done.
 
 At which point the screen either restarts, or sometimes flickers and I get 
 the following:
 [   22.578796] omapdss DISPC error: SYNC_LOST on channel lcd, restarting the 
 output with video overlays disabled
 [   23.391571] omapdss DISPC error: SYNC_LOST on channel lcd, restarting the 
 output with video overlays disabled
 [   24.391571] omapdss DISPC error: SYNC_LOST on channel lcd, restarting the 
 output with video overlays disabled
 
 It normally recovers after doing this for a while...
 
 Anyone have any ideas?

I think I can help you work around the problem but I would much rather see it
fixed.  So the main reason I'm replying is to make this thread seem more
interesting so that more people look at it and hopefully the right person
sees it :-)

It seems that when cpuidle on an omap3 tries to switch to lower power
states, various things misbehave:
 - UARTs lose characters
 - dss loses sync
 - HDQ seems to lose everything.

The first is known in the code so cpuidle is disabled if there is any UART
traffic.  At first boot it is assume thered might be uart activity and the
timeout for there doesn't seem to be any activity is '0' meaning 'don't
time out' (look for sleep_timeout in sysfs).

On suspend, the UARTs are allowed to go to sleep and they are not explicitly
woken at resume.  So after a suspend/resume cycle, cpuidle is more likely to
try to adjust idle states and so DSS and HDQ are more likely to fail.

You can disable cpuidle by writing '0' to all the 'sleep_timeout' fields, or
probably by
   echo 1   /sys/module/cpuidle/parameters/off

That should stablise the display.

It would be great if you could check if cpuidle was active in 3.0 when this
all worked.  I would check by:

  grep . /sys/devices/system/cpu/cpu?/cpuidle/state?/usage

and see if any state other than state0 is used.

If cpuidle is changing power states but the display is happy, then it would
be fantastic if you

61 matches

Mail list logo