from:"Thomas Gleixner"

Re: [PATCH] treewide: Convert clockevents_notify to use int cpu

2015-01-22 Thread Thomas Gleixner

On Wed, 10 Dec 2014, Joe Perches wrote:
 As far as I can tell, there's no value indirecting
 the cpu passed to this function via a void *.
 
 Update all the callers and called functions from within
 clockevents_notify.

Aside of that there is no value for this 'notification' function at
all. This should be seperate explicit calls. The notify function is a
leftover from the original implementation which used actual notifier
chains. I'll send out a cleanup series later today.

Thanks,

tglx


--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC v1 15/25] genirq: Kill the first parameter 'irq' of irq_flow_handler_t

2015-05-20 Thread Thomas Gleixner

On Wed, 20 May 2015, Jiang Liu wrote:

 On 2015/5/20 23:28, Thomas Gleixner wrote:
  On Wed, 20 May 2015, Jiang Liu wrote:
  -static void locomo_handler(unsigned int irq, struct irq_desc *desc)
  +static void locomo_handler(struct irq_desc *desc)
   {
 struct locomo *lchip = irq_desc_get_chip_data(desc);
  +  unsigned int irq;
 int req, i;
  
  That leaves irq unitialized 
 That should be OK, 'irq' here is just a local variable.
 Actually it may be changed as:
 if (req) {
 /* generate the next interrupt(s) */
 -irq = lchip-irq_base;
 +unsigned int irq = lchip-irq_base;

Indeed. 
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC v1 15/25] genirq: Kill the first parameter 'irq' of irq_flow_handler_t

2015-05-20 Thread Thomas Gleixner

On Wed, 20 May 2015, Jiang Liu wrote:
 -static void locomo_handler(unsigned int irq, struct irq_desc *desc)
 +static void locomo_handler(struct irq_desc *desc)
  {
   struct locomo *lchip = irq_desc_get_chip_data(desc);
 + unsigned int irq;
   int req, i;

That leaves irq unitialized 

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] futex: lower the lock contention on the HB lock during wake up

2015-06-19 Thread Thomas Gleixner

On Fri, 19 Jun 2015, Kevin Hilman wrote:
 On Wed, Jun 17, 2015 at 1:33 AM, Sebastian Andrzej Siewior
 A handful of boot test failures on ARM/OMAP were found by kernelci.org
 in next-20150619[1] and were bisected down to this patch, which hit
 next-20150619 in the form of commit 881bd58d6e9e (futex: Lower the
 lock contention on the HB lock during wake up).  I confirmed that
 reverting that patch on top of next-20150619 gets things booting again
 for the affected platforms.
 
 I haven't debugged this any further, but full boot logs are available
 for the boot failures[2][3] and the linux-omap list and maintainer are
 Cc'd here to help investigate further if needed.

Found it. Dunno, how I missed that one. Fix below.

Thanks,

tglx
---

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 10dbeb6fe96f..5674b073473c 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1365,9 +1365,14 @@ rt_mutex_fastunlock(struct rt_mutex *lock,
if (likely(rt_mutex_cmpxchg(lock, current, NULL))) {
rt_mutex_deadlock_account_unlock(current);
 
-   } else if (slowfn(lock, wake_q)) {
+   } else {
+   bool deboost = slowfn(lock, wake_q);
+
+   wake_up_q(wake_q);
+
/* Undo pi boosting if necessary: */
-   rt_mutex_adjust_prio(current);
+   if (deboost)
+   rt_mutex_adjust_prio(current);
}
 }
 


--
To unsubscribe from this list: send the line unsubscribe linux-omap in

Re: [PATCH] clocksource: Allow toggling between runtime and persistent clocksource for idle

2015-07-06 Thread Thomas Gleixner

On Mon, 6 Jul 2015, Tony Lindgren wrote:
 * Thomas Gleixner t...@linutronix.de [150706 07:20]:
  On Mon, 6 Jul 2015, Tony Lindgren wrote:
 The timekeeping accuracy issue certainly needs some thinking, and
 also the resolution between the clocksources can be different.. In the
 test case I have the slow timer is always on and of a lower resolution
 than the ARM global timer being used during runtime.
 
 Got some handy timer test in mind you want me to run to provide data
 on the accuracy?

John Stultz might have something.
  
   +/**
   + * clocksource_pm_enter - change to a persistent clocksource before idle
   + *
   + * Changes system to use a persistent clocksource for idle. Intended to
   + * be called from CPUidle from the last active CPU.
   + */
   +int clocksource_pm_enter(void)
   +{
   + bool oneshot = tick_oneshot_mode_active();
   + struct clocksource *best;
   +
   + if (WARN_ONCE(!mutex_trylock(clocksource_mutex),
   +   Unable to get clocksource_mutex))
   + return -EINTR;
  
  This trylock serves which purpose?
 
 Well we don't want to start changing clocksource if something is
 running like you pointed out.

Well yes, but 
 
  I really cannot see how this is proper serialized.
 
 We need to allow this only from the last cpu before hitting idle.

And I cannot see anything which does so.

cpu0cpu1
is_idle 
go_idle()
  clocksource_pm_enter()
lock(cs_mutex);
wakeup()
clocksource_pm_exit()
  trylock fails 

...  
unlock(cs_mutex);
  
-- Crap!

   @@ -1086,7 +1086,18 @@ int timekeeping_notify(struct clocksource *clock)

 if (tk-tkr_mono.clock == clock)
 return 0;
   - stop_machine(change_clocksource, clock, NULL);
   +
   + /*
   +  * We may want to toggle between a fast and a persistent
   +  * clocksource from CPUidle on the last active CPU and can't
   +  * use stop_machine at that point.
   +  */
   + if (cpumask_test_cpu(smp_processor_id(), cpu_online_mask) 
  
  Can you please explain how this code gets called from an offline cpu?
 
 Last cpu getting idled..

That does not make any sense at all. How is idle related to the online
mask? An idle cpu is still online.

   + !rcu_is_watching())
  
  So pick some random combination of conditions and define that it is
  correct, right? How on earth does !rcu_watching() tell that this is
  the last running cpu.
 
 We have called rcu_idle_enter() from cpuidle_idle_call(). Do you have
 some better test in mind when the last cpu is about hit idle?

The cpuidle code should know that. And if it does not know, it better
should keep track of that information and based on it provide the
proper serialization, so the call into the timekeeping code is not a
subject to guess work and race conditions.

Thanks,

tglx

  
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] clocksource: Allow toggling between runtime and persistent clocksource for idle

2015-07-06 Thread Thomas Gleixner

On Mon, 6 Jul 2015, John Stultz wrote:
 On Mon, Jul 6, 2015 at 12:12 AM, Tony Lindgren t...@atomide.com wrote:
  Some persistent clocksources can be on a slow external bus. For shorter
  latencies for RT use, let's allow toggling the clocksource during idle
  between a faster non-persistent runtime clocksource and a slower persistent
  clocksource.
 
 So yea, as Thomas mentioned, this will cause problems for timekeeping
 accuracy, since we end up resetting the ntp state when we change
 clocksource (additionally you gain a bit of error each time you switch
 clocksources). So you'll most likely see trouble w/ ntpd steering the
 clock.
 
 I'm not sure its quite clear in the description as to the need here.
 Could you expand a bit as to the rational for why?  I assume it has to
 do with you have a high-res clocksource that is good for fine-grained
 clock_gettime() calls, but wraps quickly, making it poor for long idle
 times. So you're trying to swap to a less fine grained clocksource
 that won't wrap so fast?
 
 The persistent-clock-like name muddies things further, since the
 persistent-clock is used for suspend, while it looks like this is just
 for idle times.

Though that are idle states where the cpu is powered off so the fast
clock source is powered off as well

We all know how well that works from the x86/TSC horror. It's just the
same thing again with a different arch prefix.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] clocksource: Allow toggling between runtime and persistent clocksource for idle

2015-07-06 Thread Thomas Gleixner

On Mon, 6 Jul 2015, Tony Lindgren wrote:

 Some persistent clocksources can be on a slow external bus. For shorter
 latencies for RT use, let's allow toggling the clocksource during idle
 between a faster non-persistent runtime clocksource and a slower persistent
 clocksource.

I really cannot follow that RT argument here. The whole switchover
causes latencies itself and whats worse is, that this breaks
timekeeping accuracy because there is no way to switch clocksources
atomically without loss.

 ---
  include/linuxt-email-lkml-omap/clocksource.h |  2 ++

Interesting file name.

  extern int timekeeping_notify(struct clocksource *clock);
 +extern int clocksource_pm_enter(void);
 +extern void clocksource_pm_exit(void);

Unfortunately you are not providing the call site, so I cannot see
from which context this is going to be called.

I fear its from the guts of the idle code probably with interrupts
disabled etc , right?
  
 +/**
 + * clocksource_pm_enter - change to a persistent clocksource before idle
 + *
 + * Changes system to use a persistent clocksource for idle. Intended to
 + * be called from CPUidle from the last active CPU.
 + */
 +int clocksource_pm_enter(void)
 +{
 + bool oneshot = tick_oneshot_mode_active();
 + struct clocksource *best;
 +
 + if (WARN_ONCE(!mutex_trylock(clocksource_mutex),
 +   Unable to get clocksource_mutex))
 + return -EINTR;

This trylock serves which purpose?

 +
 + best = clocksource_find_best(oneshot, true, false);
 + if (best) {
 + if (curr_clocksource != best 
 + !timekeeping_notify(best)) {
 + runtime_clocksource = curr_clocksource;
 + curr_clocksource = best;
 + }
 + }
 + mutex_unlock(clocksource_mutex);
 +
 + return !!best;
 +}
 +
 +/**
 + * clocksource_pm_exit - change to a runtime clocksrouce after idle
 + *
 + * Changes system to use the best clocksource for runtime. Intended to
 + * be called after waking up from CPUidle on the first active CPU.
 + */
 +void clocksource_pm_exit(void)
 +{
 + if (WARN_ONCE(!mutex_trylock(clocksource_mutex),
 +   Unable to get clocksource_mutex))
 + return;
 +
 + if (runtime_clocksource) {
 + if (curr_clocksource != runtime_clocksource 
 + !timekeeping_notify(runtime_clocksource)) {
 + curr_clocksource = runtime_clocksource;
 + runtime_clocksource = NULL;
 + }
 + }
 + mutex_unlock(clocksource_mutex);
 +}

I really cannot see how this is proper serialized.

  #ifdef CONFIG_SYSFS
  /**
   * sysfs_show_current_clocksources - sysfs interface for current clocksource
 diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
 index bca3667..0379260 100644
 --- a/kernel/time/timekeeping.c
 +++ b/kernel/time/timekeeping.c
 @@ -1086,7 +1086,18 @@ int timekeeping_notify(struct clocksource *clock)
  
   if (tk-tkr_mono.clock == clock)
   return 0;
 - stop_machine(change_clocksource, clock, NULL);
 +
 + /*
 +  * We may want to toggle between a fast and a persistent
 +  * clocksource from CPUidle on the last active CPU and can't
 +  * use stop_machine at that point.
 +  */
 + if (cpumask_test_cpu(smp_processor_id(), cpu_online_mask) 

Can you please explain how this code gets called from an offline cpu?

 + !rcu_is_watching())

So pick some random combination of conditions and define that it is
correct, right? How on earth does !rcu_watching() tell that this is
the last running cpu.

 + change_clocksource(clock);
 + else
 + stop_machine(change_clocksource, clock, NULL);

This patch definitely earns a place in my ugly code museum under the
category 'Does not explode in my face, so it must be correct'.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] gpio: omap: use raw locks for locking

2015-07-28 Thread Thomas Gleixner

On Mon, 27 Jul 2015, Sebastian Andrzej Siewior wrote:
 On 07/27/2015 02:50 PM, Linus Walleij wrote:
  Patch applied.
 thanks.
 
  
  Now this question appear in my head:
  
  Is drivers/gpio full of stuff that will not work with the -RT kernel,
  and is this a change that should be done mutatis mutandis on
  all the GPIO drivers?
 
 I described two call paths where you need a rawlock_t. If your gpio
 driver uses irq_chip_generic then you a rawlock here and things should
 be fine.
 
 In general: If your gpio controller acts as an interrupts controller
 (that is via chained handler) then you need the raw-locks if you need
 any locking (if you have a write 1 to mask/unmask/enable/disable
 register then you probably don't need any locking here at all). If the
 gpio controller does not act as an interrupt controller than the
 spinlock_t type should be enough.
 If your gpio-interrupt controller requests its interrupt via
 requested_threaded_irq() then it should do handle_nested_irq() and a
 mutex is probably used for locking.
 
 Using request_irq() with 0 flags is kind of broken. It works in
 IRQ-context and delivers the interrupts with generic_handle_irq() and
 this one passes it the handler (like handle_edge_irq() /
 handle_level_irq()) which takes a raw_lock. Now, if you boot the
 vanilla kernel with threadedirq then the irq-handler runs in threaded
 context and you can't take a spinlock here anymore. So I think you
 should use here IRQF_NO_THREAD here (and the raw lock type). I added
 tglx on Cc: to back up because it is Monday.

Indeed it was monday.

It's an RT only problem. On mainline spinlock resovles to raw_spinlock.

Now there is the story what breaks in RT:

1) irq controller specific locks which are taken in the irq chip
   callbacks need to be raw_spinlocks because these functions are
   called with irq_desc-lock helds and interrupts disabled.

   If that irq chip lock is not raw you rightfully get a backtrace.

   raw_spinlock_irq(desc-lock);
   chip-callback()
 spin_lock(chip_private_lock);
   might_sleep() triggers

2) locks which are taken in chained interrupt handlers need to be raw
   on RT. These handlers run in hard interrupt context so again
   might_sleep() rightfully complains.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RESEND] irqchip: omap-intc: improve IRQ handler

2015-07-20 Thread Thomas Gleixner

On Mon, 20 Jul 2015, Felipe Balbi wrote:
 + irqnr = intc_readl(INTC_SIR);
 + irqnr = ACTIVEIRQ_MASK;
 + WARN(!irqnr, Spurious IRQ ?\n);

Shouldn't that be WARN_ONCE?

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] irqchip: omap-intc: improve IRQ handler

2015-07-15 Thread Thomas Gleixner

On Wed, 15 Jul 2015, Tony Lindgren wrote:
 Felipe,
 
 * Tony Lindgren t...@atomide.com [150119 13:41]:
  * Felipe Balbi ba...@ti.com [150102 10:50]:
   as it turns out the current IRQ number will
   *always* be available from SIR register which
   renders the reads of PENDING registers as plain
   unnecessary overhead.
   
   In order to catch any situation where SIR reads
   as zero, we're adding a WARN() to turn it into
   a very verbose error and users actually report
   it.
   
   With this patch average running time of
   omap_intc_handle_irq() reduced from about 28.5us
   to 19.8us as measured by the kernel function
   profiler.
   
   Tested with BeagleBoneBlack Rev A5C.
   
   Signed-off-by: Felipe Balbi ba...@ti.com
  
  Jason, looks like this is not showing up in Linux next. The
  same for the changes I did for dm81xx.
 
 Can you please resend this to Jason? Looks like this
 is still not merged.

Please send it to me asap and please cc lkml on irqchip patches.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] irqchip: omap-intc: fix spurious irq handling

2015-10-19 Thread Thomas Gleixner

On Mon, 19 Oct 2015, Sekhar Nori wrote:
> + /*
> +  * A spurious IRQ can result if interrupt that triggered the
> +  * sorting is no longer active during the sorting (10 INTC
> +  * functional clock cycles after interrupt assertion). Or a
> +  * change in interrupt mask affected the result during sorting
> +  * time. There is no special handling required except ignoring
> +  * the SIR register value just read and retrying.
> +  * See section 6.2.5 of AM335x TRM Literature Number: SPRUH73K
> +  */
> + if ((irqnr & SPURIOUSIRQ_MASK) == SPURIOUSIRQ_MASK) {
> + pr_debug_ratelimited("%s: spurious irq!\n", __func__);

I'd prefer that this is a pr_once() and the spurious interrupt counter
is incremented. That's far more useful as it gives you real
information about the frequency of the issue.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] sched_clock: add data pointer argument to read callback

2015-10-11 Thread Thomas Gleixner

Mans,

On Fri, 9 Oct 2015, Mans Rullgard wrote:

> This passes a data pointer specified in the sched_clock_register()
> call to the read callback allowing simpler implementations thereof.
>
> In this patch, existing uses of this interface are simply updated
> with a null pointer.

I can't see any simplifaction of any driver. This is called in hot
pathes and that extra unused argument just adds register pressure for
no value.

What are you actually trying to solve?

Thanks,

tglx

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] gpio: omap: convert to use generic irq handler

2015-10-12 Thread Thomas Gleixner

Grygorii,

can you please provide a patch set against 4.1-RT? That stuff rejects
left and right.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] gpio: omap: convert to use generic irq handler

2015-10-13 Thread Thomas Gleixner

Grygorii,

On Tue, 13 Oct 2015, Grygorii Strashko wrote:
> I'd very appreciate for any advice of how to better proceed with your request.
>  - I can try to apply and re-send only patches marked by '*'
>  - I can prepare branch with all above patches

Please provide a branch on top of 4.1.10 which contains the whole
lot. I have a look at it and decide then how to go from there.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: CONFIG_DEBUG_SHIRQ and PM

2015-08-28 Thread Thomas Gleixner

On Tue, 25 Aug 2015, Felipe Balbi wrote:
 Hi Ingo,

Thanks for not cc'ing the irq maintainer 
 
 I'm facing an issue with CONFIG_DEBUG_SHIRQ and pm_runtime when using
 devm_request_*irq().
 
 If we using devm_request_*irq(), that irq will be freed after device
 drivers' -remove() gets called. If on -remove(), we're calling
 pm_runtime_put_sync(); pm_runtime_disable(), device's clocks might get
 gated and, because we do an extra call to the device's IRQ handler when
 CONFIG_DEBUG_SHIRQ=y, we might trigger an abort exception if, inside the
 IRQ handler, we try to read a register which is clocked by the device's
 clock.
 
 This is, of course, really old code which has been in tree for many,
 many years. I guess nobody has been running their tests in the setup
 mentioned above (CONFIG_DEBUG_SHIRQ=y, pm_runtime_put_sync() on
 -remove(), a register read on IRQ handler, and a shared IRQ handler),
 so that's why we never caught this before.
 
 Disabling CONFIG_DEBUG_SHIRQ, of course, makes the problem go away, but
 if driver *must* be ready to receive, and handle, an IRQ even during
 module removal, I wonder what the IRQ handler should do. We can't, in
 most cases, call pm_runtime_put_sync() from IRQ handler.

Well, a shared interrupt handler must handle this situation, no matter
what. Assume the following:

irqreturn_t dev_irq(int irq, void *data)
{
struct devdata *dd = data;
u32 state;

state = readl(dd-base);
...
}

void module_exit(void)
{   
/* Write to the device interrupt register */
disable_device_irq(dd-base);
/*
 * After this point the device does not longer
 * raise an interrupt
 */
iounmap(dd-base);
free_irq();

If the other device which shares the interrupt line raises an
interrupt after the unmap and before free_irq() removed the device
handler from the irq, the machine is toast, because the dev_irq
handler is still called.

If the handler is shut down after critical parts of the driver/device
are shut down, then you can 

 - either can change the setup/teardown ordering

disable_device_irq(dd-base);
free_irq();
iounmap(dd-base);

 - or have a proper flag in the private data which tells the interrupt
   handler to sod off.

irqreturn_t dev_irq(int irq, void *data)
{
struct devdata *dd = data;

if (dd-shutdown)
return IRQ_NONE;
...

void module_exit(void)
{   
disable_device_irq(dd-base);
dd-shutdown = 1;

/* On an SMP machine you also need: */  
synchronize_irq(dd-irq);

So for the problem at hand, the devm magic needs to make sure that the
crucial parts are still alive when the devm allocated irq is released.

I have no idea how that runtime PM stuff is integrated into devm (I
fear not at all), so it's hard to give you a proper advise on that.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Routable IRQs

2015-12-30 Thread Thomas Gleixner

Felipe,

On Tue, 29 Dec 2015, Felipe Balbi wrote:
> Anyway, the interesting part is that PRUSS has 64 events (on current
> incarnations at least) and PRUSS has 10 physical IRQ lines to the ARM
> land. Each of these 64 events can be routed to any of these 10 IRQ
> lines. This might not be very useful on UP (AM335x & AM437x) other than
> the fact that soft-IP drivers running on Linux would need to guarantee
> they are the ones who should handle the IRQ. However, on SMP (AM57xx) we
> could have real tangible benefits by means of IRQ affinity, etc.
> 
> So, the question is, what is there in IRQ subsystem today for routable
> IRQ support ?
> 
> If a Diagram helps here's a simple one. Note that I'm not showing
> details on the PRUSS side, but that side can also map events pretty much
> any way it wants.
> 
>  .. ..
>  |  HOST CPU  | |   PRUSS|
>  || ||
>  || ||
>  |   irq0 |<-.--|evt0|
>  ||  |  ||
>  |   irq1 |  |  .---|evt1|
>  ||  |  |   ||
>  |   irq2 |  '--|evt2|
>  || |   ||
>  |   irq3 | |   ||
>  || |   ||
>  |   irq4 | |   | .  |
>  || |   ||
>  |   irq5 | |   | .  |
>  || |   ||
>  |   irq6 | |   | .  |
>  || |   ||
>  |   irq7 |<'   ||
>  || ||
>  |   irq8 | ||
>  || ||
>  |   irq9 |<|evtN|
>  '' ''
> 
> Given this setup, what I want to do, is let soft-IP drivers running on
> linux rely on standard *request_*irq() calls and DTS descrition. But I'm
> still considering how/if we should describe the routing itself or just
> go round-robin (i.o.w. irq0 -> evt0, irq1 -> evt1, ..., irq9 -> evt9,
> irq0 -> evt10, ...).
> 
> Thoughts ?

I have a few questions:

 - Is there a "mapping" block between PRUSS and the host interrupt controller
   or is this "mapping" block part of PRUSS?

 - We all know how well shared interrupts work. Is there a point of supporting
   64 interrupts when you only have 10 irq lines available?

 - I assume that the PRUSS interrupt mapping is more or less a question of the
   firmware implementation. So you either have a fixed association in the
   firmware which is reflected in the DT description of the IP block or you
   need an interface to tell the PRUSS firmware which event it should map to
   which irq line. Is there actually a value in doing the latter?

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Routable IRQs

2015-12-30 Thread Thomas Gleixner

Felipe,

On Wed, 30 Dec 2015, Felipe Balbi wrote:
> Thomas Gleixner <t...@linutronix.de> writes:
> >  - Is there a "mapping" block between PRUSS and the host interrupt 
> > controller
> >or is this "mapping" block part of PRUSS?
> 
> The description in TRM is a bit "poor", but from what I can gather, the
> mapping is done on an interrupt controller inside the PRUSS. However,
> Linux is the one who's got the driver for that INTC (well, Linux will be
> the one with the soft ethernet/uart/whatever IP to talk to). All of its
> (INTC's) registers are memory mapped to the ARM side.

Ok. And the INTC registers include the "mapping" configuration, right?

> >  - We all know how well shared interrupts work. Is there a point of 
> > supporting
> >64 interrupts when you only have 10 irq lines available?
> 
> I'm looking at these 64 events more like MSI kind of events. It's just

Well, that's fine to look at them this way, but they will end up shared no
matter what.

> that the events themselves can be routed to any of the 10 available HW
> IRQ lines.
> 
> >  - I assume that the PRUSS interrupt mapping is more or less a question of 
> > the
> >firmware implementation. So you either have a fixed association in the
> >firmware which is reflected in the DT description of the IP block or you
> >need an interface to tell the PRUSS firmware which event it should map to
> >which irq line. Is there actually a value in doing the latter?
> 
> right, I'd say the mapping is pretty static. Unless Suman has some extra
> information which I don't. I guess the question was really to see if
> there was an easy way for doing this so we don't have to mess with DTS
> for every other FW and their neighbor.

Well, you will need information about every other firmware simply because you
need to know which events the firmware is actually using and what the purpose
of the particular event is.

Assume you have a simple uart with 3 events (RX, TX, status). So how will the
firmware tell you which event is which? You have a few options:

 1) DT + fixed mapping scheme: 

Describe the PRUSS event number in DT and have a fixed mapping scheme like
the one you mentioned evt0 -> irq0 .

 2) DT + DT mapping scheme

Describe the PRUSS event number in DT and describe the mapping scheme in
DT as well

 3) DT + dynamic mapping scheme

Describe the PRUSS event number in DT and let your interrupt controller
associate the irq number dynamically. That's kind of similar to MSI with
the exception that it needs to support shared interrupts.

 4) Fully dynamic association

Have a query interface to the firmware which tells you which event it uses
for which particular purpose (RX, TX ...) and then establish a dynamic
mapping to one of the interrupts.

Not sure which level of complexity you want :)

Thanks,

tglx

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] clocksource: arm_global_timer: fix suspend resume

2015-11-20 Thread Thomas Gleixner

On Fri, 20 Nov 2015, John Stultz wrote:
> So its unlikely that the hardware both stays running through suspend,
> but also might halt in idle. That would be "unique".

The amount of creativity put into the next variants of differently
broken timers is amazing. So I wouldn't be too surprised if such a
thing actually surfaces.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/5] ARM: twd: Add context save restore support

2011-01-25 Thread Thomas Gleixner

On Tue, 25 Jan 2011, Russell King - ARM Linux wrote:
 On Mon, Jan 24, 2011 at 11:39:13PM -0800, Colin Cross wrote:
  On Mon, Jan 24, 2011 at 3:11 AM, Russell King - ARM Linux
  li...@arm.linux.org.uk wrote:
   On Mon, Jan 24, 2011 at 11:06:09AM +, Russell King - ARM Linux wrote:
   On Mon, Jan 24, 2011 at 02:21:17PM +0530, Santosh Shilimkar wrote:
In CPU low power state, local timer looses its register context. This
patch adds context save restore hooks which can be used by platforms
appropriately.
  
   I thought the whole point of CLOCK_EVT_FEAT_C3STOP was that the generic
   timer stuff wouldn't rely on it being kept alive?
  
   Hmm, it looks like we bypass the clockevents code by only setting the
   TWD load value at initialization time, not when we switch to periodic
   mode.  We really ought to rewrite it whenever we switch back to periodic
   mode.
  
   I suspect fixing that means you won't need this save/restore support.
  
   Untested, but should do what's required.
  
   diff --git a/arch/arm/kernel/smp_twd.c b/arch/arm/kernel/smp_twd.c
   index fd91566..60636f4 100644
   --- a/arch/arm/kernel/smp_twd.c
   +++ b/arch/arm/kernel/smp_twd.c
   @@ -36,6 +36,7 @@ static void twd_set_mode(enum clock_event_mode mode,
                  /* timer load already set up */
                  ctrl = TWD_TIMER_CONTROL_ENABLE | 
   TWD_TIMER_CONTROL_IT_ENABLE
                          | TWD_TIMER_CONTROL_PERIODIC;
   +               __raw_writel(twd_timer_rate / HZ, twd_base + 
   TWD_TIMER_LOAD);
                  break;
          case CLOCK_EVT_MODE_ONESHOT:
                  /* period set, and timer enabled in 'next_event' hook */
   @@ -81,7 +82,7 @@ int twd_timer_ack(void)
  
    static void __cpuinit twd_calibrate_rate(void)
    {
   -       unsigned long load, count;
   +       unsigned long count;
          u64 waitjiffies;
  
          /*
   @@ -116,10 +117,6 @@ static void __cpuinit twd_calibrate_rate(void)
                  printk(%lu.%02luMHz.\n, twd_timer_rate / 100,
                          (twd_timer_rate / 100) % 100);
          }
   -
   -       load = twd_timer_rate / HZ;
   -
   -       __raw_writel(load, twd_base + TWD_TIMER_LOAD);
    }
  
    /*
  
  This doesn't work for oneshot timers if the TWD_TIMER_CONTROL register
  gets reset by cpu idle between twd_set_mode and twd_set_next_event.
  Shadowing ctrl in a percpu variable and rewriting it during every call
  to twd_set_next_event does work, but that's not much different, and a
  little less efficient, than just saving and restoring the control and
  load registers in idle.  It does have the advantage that platforms
  don't need any extra calls.
 
 The next question is can we teach the generic time infrastructure about
 this so we don't have to modify every clock event driver for it?  We
 really need to get away from having this kind of knowledge buried down
 in the lowest levels of every driver.

In which way? I mean the generic code issues a call to the set_mode
function when we leave the broadcast mode. So what should the generic
code do more ?

Thanks,

tglx

Re: [PATCH 3/5] ARM: twd: Add context save restore support

2011-01-25 Thread Thomas Gleixner

On Tue, 25 Jan 2011, Russell King - ARM Linux wrote:
 On Tue, Jan 25, 2011 at 02:23:10PM +0100, Thomas Gleixner wrote:
  In which way? I mean the generic code issues a call to the set_mode
  function when we leave the broadcast mode. So what should the generic
  code do more ?
 
 I can't say because these patches only add the hooks, there's no
 implementation yet which uses the hooks.
 
 Given the description about _why_ those hooks are necessary, it seems
 that something is required.  Either we start adding custom hacks to
 each clockevent driver as is done with this patch, or we get some
 generic help in place.
 
 I'm not thrilled by the custom hack approach - and I thought the
 clockevent stuff was created to stop this kind of thing happening.

Yes, and it does the right thing:

 idle enter (where the cpu local tick device stops)

  tick_broadcast_oneshot_control(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)

clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN)
tick_broadcast_set_event(dev-next_event, 1)

idle exit

  tick_broadcast_oneshot_control(CLOCK_EVT_NOTIFY_BROADCAST_EXIT)

clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT)
tick_program_event(dev-next_event, 1)

So the generic code has all the calls in place. If a clock chip
implementation misses to set control registers in the
CLOCK_EVT_MODE_ONESHOT case, then it's not a short coming of the
generic code which needs magic hooks in the arch code.

The same applies for the periodic mode switch, which is handled via
tick_broadcast_on_off().

Am I missing something ?

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/5] ARM: twd: Add context save restore support

2011-01-25 Thread Thomas Gleixner

On Tue, 25 Jan 2011, Russell King - ARM Linux wrote:
 On Tue, Jan 25, 2011 at 03:12:24PM +0100, Thomas Gleixner wrote:
  On Tue, 25 Jan 2011, Russell King - ARM Linux wrote:
   On Tue, Jan 25, 2011 at 02:23:10PM +0100, Thomas Gleixner wrote:
In which way? I mean the generic code issues a call to the set_mode
function when we leave the broadcast mode. So what should the generic
code do more ?
   
   I can't say because these patches only add the hooks, there's no
   implementation yet which uses the hooks.
   
   Given the description about _why_ those hooks are necessary, it seems
   that something is required.  Either we start adding custom hacks to
   each clockevent driver as is done with this patch, or we get some
   generic help in place.
   
   I'm not thrilled by the custom hack approach - and I thought the
   clockevent stuff was created to stop this kind of thing happening.
  
  Yes, and it does the right thing:
  
   idle enter (where the cpu local tick device stops)
  

  tick_broadcast_oneshot_control(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
  
  clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN)
  tick_broadcast_set_event(dev-next_event, 1)
  
  idle exit
  

  tick_broadcast_oneshot_control(CLOCK_EVT_NOTIFY_BROADCAST_EXIT)
  
  clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT)
  tick_program_event(dev-next_event, 1)
  
  So the generic code has all the calls in place. If a clock chip
  implementation misses to set control registers in the
  CLOCK_EVT_MODE_ONESHOT case, then it's not a short coming of the
  generic code which needs magic hooks in the arch code.
  
  The same applies for the periodic mode switch, which is handled via
  tick_broadcast_on_off().
  
  Am I missing something ?
 
 I quote Colin:
 
 | This doesn't work for oneshot timers if the TWD_TIMER_CONTROL register
 | gets reset by cpu idle between twd_set_mode and twd_set_next_event.

And I fear that he means the twd_set_mode(ONESHOT) call which happens
during boot/cpuonline.

The point is that the above function needs to be called from the cpu
idle code to notify the clock events layer. The correct mechanism is
clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER/EXIT). The
existing callers are in:

arch/x86/kernel/process.c
drivers/acpi/acpi_pad.c
drivers/acpi/processor_idle.c
drivers/idle/intel_idle.c

So i'm not surprised, that this wont work on ARM :)

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: 2.6.33-rc8-rt1 on Beagle

2010-03-01 Thread Thomas Gleixner

On Mon, 1 Mar 2010, Uwe Kleine-König wrote:
 You might want to report that to the author(s) of
 drivers/mmc/host/omap_hsmmc.c.

FYI, thats fixed in 33-rt3 and the fix is going mainline as well.

Thanks,

tglx

[PATCH] mmc: omap_hsmmc: Fix conditional locking

2010-03-01 Thread Thomas Gleixner

Conditional locking on (!in_interrupt()) is broken by design and there
is no reason to keep the host-irq_lock across the call to
mmc_request_done(). Also the host-protect_card magic hack does not
depend on the context

Fix the mess by dropping host-irq_lock before calling
mmc_request_done().

Signed-off-by: Thomas Gleixner t...@linutronix.de

diff --git a/drivers/mmc/host/omap_hsmmc.c b/drivers/mmc/host/omap_hsmmc.c
index 4b23225..99a3383 100644
--- a/drivers/mmc/host/omap_hsmmc.c
+++ b/drivers/mmc/host/omap_hsmmc.c
@@ -468,14 +468,6 @@ omap_hsmmc_start_command(struct omap_hsmmc_host *host, 
struct mmc_command *cmd,
if (host-use_dma)
cmdreg |= DMA_EN;
 
-   /*
-* In an interrupt context (i.e. STOP command), the spinlock is unlocked
-* by the interrupt handler, otherwise (i.e. for a new request) it is
-* unlocked here.
-*/
-   if (!in_interrupt())
-   spin_unlock_irqrestore(host-irq_lock, host-flags);
-
OMAP_HSMMC_WRITE(host-base, ARG, cmd-arg);
OMAP_HSMMC_WRITE(host-base, CMD, cmdreg);
 }
@@ -506,7 +498,9 @@ omap_hsmmc_xfer_done(struct omap_hsmmc_host *host, struct 
mmc_data *data)
}
 
host-mrq = NULL;
+   spin_unlock(host-irq_lock);
mmc_request_done(host-mmc, mrq);
+   spin_lock(host-irq_lock);
return;
}
 
@@ -523,7 +517,9 @@ omap_hsmmc_xfer_done(struct omap_hsmmc_host *host, struct 
mmc_data *data)
 
if (!data-stop) {
host-mrq = NULL;
+   spin_unlock(host-irq_lock);
mmc_request_done(host-mmc, data-mrq);
+   spin_lock(host-irq_lock);
return;
}
omap_hsmmc_start_command(host, data-stop, NULL);
@@ -551,7 +547,9 @@ omap_hsmmc_cmd_done(struct omap_hsmmc_host *host, struct 
mmc_command *cmd)
}
if ((host-data == NULL  !host-response_busy) || cmd-error) {
host-mrq = NULL;
+   spin_unlock(host-irq_lock);
mmc_request_done(host-mmc, cmd-mrq);
+   spin_lock(host-irq_lock);
}
 }
 
@@ -1077,37 +1075,31 @@ static void omap_hsmmc_request(struct mmc_host *mmc, 
struct mmc_request *req)
struct omap_hsmmc_host *host = mmc_priv(mmc);
int err;
 
+   spin_lock_irqsave(host-irq_lock, host-flags);
/*
-* Prevent races with the interrupt handler because of unexpected
-* interrupts, but not if we are already in interrupt context i.e.
-* retries.
+* Protect the card from I/O if there is a possibility
+* it can be removed.
 */
-   if (!in_interrupt()) {
-   spin_lock_irqsave(host-irq_lock, host-flags);
-   /*
-* Protect the card from I/O if there is a possibility
-* it can be removed.
-*/
-   if (host-protect_card) {
-   if (host-reqs_blocked  3) {
-   /*
-* Ensure the controller is left in a consistent
-* state by resetting the command and data state
-* machines.
-*/
-   omap_hsmmc_reset_controller_fsm(host, SRD);
-   omap_hsmmc_reset_controller_fsm(host, SRC);
-   host-reqs_blocked += 1;
-   }
-   req-cmd-error = -EBADF;
-   if (req-data)
-   req-data-error = -EBADF;
-   spin_unlock_irqrestore(host-irq_lock, host-flags);
-   mmc_request_done(mmc, req);
-   return;
-   } else if (host-reqs_blocked)
-   host-reqs_blocked = 0;
-   }
+   if (host-protect_card) {
+   if (host-reqs_blocked  3) {
+   /*
+* Ensure the controller is left in a consistent
+* state by resetting the command and data state
+* machines.
+*/
+   omap_hsmmc_reset_controller_fsm(host, SRD);
+   omap_hsmmc_reset_controller_fsm(host, SRC);
+   host-reqs_blocked += 1;
+   }
+   req-cmd-error = -EBADF;
+   if (req-data)
+   req-data-error = -EBADF;
+   spin_unlock_irqrestore(host-irq_lock, host-flags);
+   mmc_request_done(mmc, req);
+   return;
+   } else if (host-reqs_blocked)
+   host-reqs_blocked = 0;
+
WARN_ON(host-mrq != NULL);
host-mrq = req;
err = omap_hsmmc_prepare_data(host, req);
@@ -1116,13 +1108,13 @@ static void omap_hsmmc_request(struct mmc_host *mmc, 
struct mmc_request *req

RE: [PATCH] mmc: omap_hsmmc: Fix conditional locking

2010-03-03 Thread Thomas Gleixner

On Tue, 2 Mar 2010, Madhusudhan wrote:
  Conditional locking on (!in_interrupt()) is broken by design and there
  is no reason to keep the host-irq_lock across the call to
  mmc_request_done(). Also the host-protect_card magic hack does not
  depend on the context
  
 
 Can you please elaborate why the existing logic is broken?

Locks are only to be held to serialize data or state. 

The mmc_request_done() call does _NOT_ require that at all. So
dropping the lock there is the right thing to do.

Also conditional locking on in_interrupt() is generally a nono as it
relies on assumptions which are not necessarily true in all
circumstances. Just one simple example: interrupt threading will make
it explode nicely and it did already with the realtime patches
applied.

Such code constructs prevent us to do generic changes to the kernel
behaviour without any real good reason.

 It locks at the new request and unlocks just before issuing the cmd. Further
 IRQ handler has these calls hence the !in_interrupt check.

Aside of the conditional locking I have several issues with that code:

1) The code flow is massively unreadable:

   omap_hsmmc_start_command()
   {
.
if (!in_interrupt())
   spin_unlock_irq();
   }
 
   omap_hsmmc_request()
   {
if (!in_interrupt())
   spin_lock_irq();

omap_hsmmc_start_command();
   }

We generally want to see the lock/unlock pairs in one function and not
having to figure out where the heck unlock happens.

2) The point of unlocking is patently wrong

   omap_hsmmc_start_command()
   {
.
if (!in_interrupt())
   spin_unlock_irq();
---
OMAP_HSMMC_WRITE(host-base, ARG, cmd-arg);
---
OMAP_HSMMC_WRITE(host-base, CMD, cmdreg);
   }

What happens, if you get a spurious interrupt here ? Same for SMP,
though you are probably protected by the core mmc code request
serialization there.

 How does this patch improve that? In fact with your patch for a data
 transfer cmd there are several lock-unlock calls.

1) The patch simply removes conditional locking and moves the lock
   sections to those places which protect something. Aside of that it
   makes the code easier to understand.

2) What's the point of not having those lock/unlocks ? On UP the
   spinlock is a NOOP anyway, so you won't even notice. On SMP you
   won't notice either, simply because the lock is cache hot and
   almost never contended.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] mfd: twl4030-irq: irq_desc-lock converted to raw_spinlock_t

2009-12-16 Thread Thomas Gleixner

On Wed, 16 Dec 2009, Felipe Balbi wrote:
 commit 239007b8440abff689632f50cdf0f2b9e895b534 converted
 the spinlock_t to raw_spinlock_t. Unfortunately twl4030-irq
 was left aside on the conversion.

 Cc: Thomas Gleixner t...@linutronix.de
 Cc: Tony Lindgren t...@atomide.com
 Cc: linux-omap@vger.kernel.org
 Signed-off-by: Felipe Balbi felipe.ba...@nokia.com
 ---

 I'm not sure this is the expected fix since twl4030-irq handler
 should be running in thread context. Currently mask/unmask/set_type
 are deferred to a workqueue. Thomas, should this be done ? How
 do you expect irq chips on slow busses to implement mask/unmask/set_type ?

That does not matter whether this runs in thread context or not. We
changed the type of the lock from spinlock to raw_spinlock. So we need
to use the corresponding raw_spin_* functions for it independent of
the context. Your patch is completely correct.

Acked-by: Thomas Gleixner t...@linutronix.de

  drivers/mfd/twl4030-irq.c |4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)

 diff --git a/drivers/mfd/twl4030-irq.c b/drivers/mfd/twl4030-irq.c
 index 20d29ba..9df9a5a 100644
 --- a/drivers/mfd/twl4030-irq.c
 +++ b/drivers/mfd/twl4030-irq.c
 @@ -568,12 +568,12 @@ static void twl4030_sih_do_edge(struct work_struct
 *work)
   bytes[byte] = ~(0x03  off);
  -spin_lock_irq(d-lock);
 + raw_spin_lock_irq(d-lock);
   if (d-status  IRQ_TYPE_EDGE_RISING)
   bytes[byte] |= BIT(off + 1);
   if (d-status  IRQ_TYPE_EDGE_FALLING)
   bytes[byte] |= BIT(off + 0);
 - spin_unlock_irq(d-lock);
 + raw_spin_unlock_irq(d-lock);
   edge_change = ~BIT(i);
   }
 -- 
 1.6.6.rc0

 -- 
 balbi

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-26 Thread Thomas Gleixner

Alan,

On Wed, 26 May 2010, Alan Cox wrote:

  Really, what are you getting at? Do you deny that there are programs,
  that prevent a device from sleeping? (Just think of the bouncing
  cows app)
  
  And if you have two kernels, one with which your device is dead after 1
  hour and one with which your device is dead after 10 hours. Which would
  you prefer? I mean really... this is ridiculous. 
 
 The problem you have is that this is policy. If I have the device wired
 to a big screen and I want cows bouncing on it I'll be most upset if
 instead it suspends. What you are essentially arguing for is for the
 kernel to disobey the userspace. It's as ridiculous (albeit usually less
 damaging) as a file system saying Ooh thats a rude file name, the app
 can't have meant it, I'll put your document soemwhere else
 
 The whole API feels wrong to me. It's breaking rule #1 of technology You
 cannot solve a social problem with technology. In this case you have a
 social/economic problem which is crap code. You solve it with an
 economics solution - creative incentives not to produce crap code like
 boxes that keep popping up saying App XYZ is using all your battery and
 red-amber-green powermeter scores in app stores.

I completely agree. 

We have already proven that the social pressure on crappy applications
works. When NOHZ was merged into the kernel we got no effect at all
because a big percentage of user space applications just used timers
at will and without any thoughts, also it unveiled busy polling and
other horrible coding constructs. So what happened ? Arjan created
powertop which lets the user analyse the worst offenders in his
system. As a result the offending apps got fixed rapidly simply
because no maintainer wanted to be on top of the powertop sh*tlist.

In the mobile app space it's basically the same problem. Users can
influence the app writers simply by voting and setting up public lists
of apps which are crappy or excellent. All it needs is a nice powertop
tool for the phone which allows the users to identify the crap on
their phones. That provides much more incentive - especially for
commercial players - to fix their crappy code.

Adding that sys_try_to_fix_crappy_userspace_code() API to the kernel
is just counter productive as it signals to the app provider: Go
ahead, keep on coding crap!

That's not a solution, that's just capitulation. 

It's absurd that some folks believe that giving up the most efficient
tool to apply pressure to crappy app providers is a good idea.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-26 Thread Thomas Gleixner

Florian,

On Wed, 26 May 2010, Florian Mickler wrote:

 On the other hand, applications can say, they don't need that much
 power and userspace guaranties and not take a suspend blocker.
 
 This is an option which they currently don't have.

Wrong. A well coded power aware application is very well able to
express that in various ways already today. Admittedly it's far from
perfect, but that can be fixed by adding interfaces which allow the
power aware coder to express the requirements of his application
actively, not by avoiding it.

suspend blockers are completely backwards as they basically disable
the kernels ability to do resource management.

Also they enforce a black and white scheme (suspend or run) on the
kernel which is stupid, as there are more options to efficiently save
power than those two. While current android devices might not provide
them, later hardware will have it and any atom based device has them
already.

So what the kernel needs to know to make better decisions are things
like:

  - how much slack can timers have (exisiting interface)
  - how much delay of wakeups is tolerated (missing interface)

and probably some others. That information would allow the kernel to
make better decisions versus power states, grouping timers, race to
idle and other things which influence the power consumption based on
the hardware you are running on.

 I don't think opportunistic suspend is a policy decision by the kernel.
 it is something new. Something which currently only the android
 userspace implements / supports. If you don't want to suspend while
 looking at the bouncing-cow, you have to take a suspend blocker and
 make yourself a user-visible power-eater, or don't do 
 
 echo opportunistic  /sys/power/policy 
 
 in the first place.
 
 This optionally being badly written, who cares? is a new feature the
 kernel can provide to applications. 

It's a misfeature which the kernel should not provide at all. It sends
out the completely wrong message: Hey, we can deal with your crappy
code, keep on coding that way.

While this seems to sound cool to certain people in the mobile space,
it's completely backwards and will backfire in no time. 

The power efficiency of a mobile device is depending on a sane overall
software stack and not on the ability to mitigate crappy software in
some obscure way which is prone to malfunction and disappoint users.

Thanks,

tglx

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-27 Thread Thomas Gleixner

On Thu, 27 May 2010, Florian Mickler wrote:

 On Wed, 26 May 2010 22:03:37 +0200
 Vitaly Wool vitalyw...@gmail.com wrote:
 
  On Wed, May 26, 2010 at 9:56 PM, Florian Mickler flor...@mickler.org 
  wrote:
  
   Your approach definitely sounds better than the current solution.
   What about mapping suspend blocker functionality later on, when this
   interface exists, on to this new approach and deprecating it?
  
  What about coming back after some while with the appropriate solution
  when it's ready instead of stubbornly pushing crap?
  
  ~Vitaly
 
 Because quite frankly, for a good part of linux users, suspend blockers
 is already in the kernel. It's just an historical mistake that they are
 not in the linux kernel's hosted on kernel.org. 

No, it's not a historical mistake. It's a technical decision _NOT_ to
merge crap. If we would accept every crappy patch which gets shipped
in large quantities as a defacto part of the kernel we would have a
completely unmaintainable mess since years.

 So why don't we do what we always do? Improve existing interfaces step
 by step? 

Exactly, that's what we are going to do. We improve and extend
existing interfaces step by step, but not by creating a horrible and
unmaintainable mess in the frist place which we can never get rid of
anymore.

 Top Down approaches fail from time to time. Also it is not clear, that
 that proposed interface works for the use cases. This has to be proven
 by providing an implementation. 

Nobody prevents you to sit down and start with a prove of concept
implementation.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-27 Thread Thomas Gleixner

On Wed, 26 May 2010, Florian Mickler wrote:

 On Wed, 26 May 2010 19:22:16 +0200 (CEST)
 Thomas Gleixner t...@linutronix.de wrote:
  On Wed, 26 May 2010, Florian Mickler wrote:
  
   On the other hand, applications can say, they don't need that much
   power and userspace guaranties and not take a suspend blocker.
   
   This is an option which they currently don't have.
  
  Wrong. A well coded power aware application is very well able to
  express that in various ways already today. Admittedly it's far from
  perfect, but that can be fixed by adding interfaces which allow the
  power aware coder to express the requirements of his application
  actively, not by avoiding it.
 
 Yeah, but a user can't say: I don't
 know programming, but I had this idea. Here try it out. 
 to his friend. Because his friends phone then will crap out.
 
 This is a negative. The phone just doesn't work well. 
 
 A knowledgeable programmer is able to do extra work to enable specific
 guarantees. A dummy-throw-away-programmer doesn't want to do
 any extra work. 

Trying to solve that inside the kernel is the patently wrong
approach. The only way to give Joe Clicker the ability to code his
idea is to give him a restricted sandbox environment which takes care
of the extra work and is created by knowledgable programmers with the
Joe Clickers in mind.

Every Joe Clicker can code websites and does this happily in his
sandbox which consists of a web server and a web application
framework. There is no single line of code in the kernel to make this
work.

As I said before we need new interfaces and new technologies to let
the kernel do better power management, but definitely not a big hammer
approach which is actively preventing the kernel to do smarter
decisions.

The correct approach is QoS based and everything else is just a
bandaid which is going to hurt us badly.

  
  suspend blockers are completely backwards as they basically disable
  the kernels ability to do resource management.
 
 I don't think this is a defect in the approach but the point of it. 

I know that this is the point of the approach, but that does not make
it less wrong. Me, Alan and others explained already in great length
why it is the wrong approach, but you refuse to listen.

You remind me of my 14 year old son, who argues in circles to convince
me that I must allow him something which is wrong. And if he would
think about it w/o his primary goal of getting permission in mind he
would concede that it's wrong.

  
  Also they enforce a black and white scheme (suspend or run) on the
  kernel which is stupid, as there are more options to efficiently save
  power than those two. While current android devices might not provide
  them, later hardware will have it and any atom based device has them
  already.
 
 This is true. Nonetheless, in my opinion, implementing the backwards
 approach in any form (suspend blockers or some other sort of sane
 interface) is necessary in the long run.  I also believe that Alan's
 approach is the more flexible one. But I'm not in a position to judge
 on this.
 
 If it really is better and superior, I think userland will switch
 over to it, as soon as it is possible to do it. The impact to the
 drivers code is needed anyway. What looses the kernel by implementing
 suspend blockers, and later a more finegrained approach and mapping the
 userspace part of suspend blockers on that finegrained approach later
 on?

The kernel loses the ability to remove suspend blockers once the
interface is exposed to user space. That's the whole point. We would
have to carry it on for a long time and trying to work around it when
implementing a technical correct approach.

And we have never seen crap move to a better interface. It will stay
there forever and hell will break lose when we break it.

  So what the kernel needs to know to make better decisions are things
  like:
  
- how much slack can timers have (exisiting interface)
 
 I liked this idea of Arjan, to give some applications infinite timer
 slack. But I don't know if it can made work in a dummy proof way.
 (I.e. it is too complicated to get it right in general, except for the
 infinity or not kind of tuning)

A mobile device can implement sensible defaults for the careless
applications and let the good apps override them.
 
- how much delay of wakeups is tolerated (missing interface)
 
 Doesn't solve the segregation problem and is probably overkill for most

It solves the segregration problem quite nicely, as again it can be
set to sensible defaults and to very long for the non verified apps.

 applications. I see this as an orthogonal thing (i.e. not
 affecting this merge). 

It's not orthogonal, it's essential to do QoS based power management,
which is the only sensible technical solution to do, as it allows the
kernel to optimize in various areas while at the same time
guaranteeing the response time to applications which require them.

Blockers are not orthogonal at all

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-27 Thread Thomas Gleixner

On Wed, 26 May 2010, Arve Hjønnevåg wrote:
 2010/5/26 Alan Cox a...@lxorguk.ukuu.org.uk:
  On Wed, 26 May 2010 15:30:58 -0700
  Arve Hjønnevåg a...@android.com wrote:
 
  On Wed, May 26, 2010 at 6:16 AM, Alan Cox a...@lxorguk.ukuu.org.uk wrote:
   Really, what are you getting at? Do you deny that there are programs,
   that prevent a device from sleeping? (Just think of the bouncing
   cows app)
  
   And if you have two kernels, one with which your device is dead after 1
   hour and one with which your device is dead after 10 hours. Which would
   you prefer? I mean really... this is ridiculous.
  
   The problem you have is that this is policy. If I have the device wired
   to a big screen and I want cows bouncing on it I'll be most upset if
   instead it suspends.
 
  We never suspend when the screen is on. If the screen is off, I would
  not be upset if it suspends.
 
  This is policy and platform specific. The OLPC can suspend with the
  screen on. Should I write my app to know about this once for Android and
  once for OLPC (and no doubt once for Apple). In the OLPC case cows could
  indeed suspend to RAM between frames if the wakeup time was suitable.
 
 Are you still talking about Linux suspend? If you can enter S3 from
 idle and still wake up for the next timer or interrupt, then do that.
 Suspend blockers should have not effect on this.

It does not matter whether it's S3 or any other power saving
state / mechanism in a particular device.

What matters is that the kernel needs to know about the QoS
requirements of the applications which are active to make optimal
decisions for power management. 

If we can go into any given power state from idle then the decision
which power state to select needs to be made on the requirements of an
application to react on the next event (timer, interrupt, ...).

So yes, we could go into S3 (with some effort) and arm the wakeup
source which will bring us back when the requirements of the apps are
known at the point where the decision is made.

 
  My app simply should not have to know this sort of crap, that's the whole
  point of an OS.
 
 
 Most apps does not have to know about this with opportunistic suspend
 either. If the user is interacting with the device, we don't suspend.
 If your apps needs to run when the user is not interacting with the
 device, then you can block suspend.

You can express that in QoS requirements as well. If you say max
wakeup latency 100ms then the kernel will select a power state which
will meet this requirement. So it can decide whether to go into full
suspend on a given system or select some other better suiting power
saving mechanism. But this allows us to express this in a completely
hardware independent way. If the hardware does not provide a suitable
state, then the box stays on.

   What you are essentially arguing for is for the
   kernel to disobey the userspace.
 
  No I'm not. User-space asked the kernel to suspend when possible.
  Suspend is an existing kernel feature. All opportunistic suspend adds
  is the ability to use it safely when there are wakeup event you cannot
  afford to ignore.
 
  Don't get me wrong - opportunistic suspend isn't the problem. Suspend
  blockers are - or more precisely - the manner in which they express
  intent from userspace. Opportunistic suspend is wonderful stuff for all
  sorts of things and if it persuades people like netbook manufacturers to
  think harder, and Linux driver authors to optimise their suspend/resume
  paths we all win.
 
  Our actual stating point is this: Some systems can only enter their
  lowest power state from suspend. So we added an api that allowed us to
  use suspend without loosing wakeup events. Since suspending also
  improves our battery life on platforms that enter the same power state
  from idle and suspend and we already have a way to safely use suspend,
  we would be crazy not to use it.
 
  Opportunistic suspend isn't special. Suspend is just a very deep idle. In
 
 Suspend as it is currently implemented in Linux is special. Regular
 timers stop, and only specially marked wakeup events can bring the
 system back to the normal state.

That's a matter of the current implementation. We can change that and
improve it to do better resource management. And this requirement is
not restricted to the mobile phone space, it's true for laptops,
virtualization and other areas as well.
 
  But you can express user suspend blocking in this manner. Your call
  incoming code would say 'I want good latency'. As someone needs good
  latency the box won't suspend. If your approach is to start with an
  initial 'anyone can set a low latency we don't care' then anyone can
  block suspends.
 
  Equally your call handling example is about latency not about suspend.
  You want the phone to stay on, to fetch a picture and display it promptly.
 
 
 I don't think a latency api is the right way to express this since the
 only latency values we would use is minimal-latency and

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-27 Thread Thomas Gleixner

On Thu, 27 May 2010, Matthew Garrett wrote:
 On Thu, May 27, 2010 at 12:39:43AM +0100, Alan Cox wrote:
 
  - Supporting Android needs well good
  - Opportunistic suspend good
  - Manner in which interface is expressed to userspace bad
  - Latency constraint interface would be better
  - Your existing behaviour can be implemented by a simplistic use of a
latency constraint interface
  - We can fix a pile of other directly connected things at the same time
  - Implementation internals I care far less about because we can fix those
later
  - Suspend is just a power state
  
  How does that fit your model and vision ?
 
 I don't entirely see how this works. In order to deal with poorly 
 written applications, it's necessary to (optionally, based on some 
 policy) ignore them when it comes to the scheduler. The problem is how 
 to implement the optional nature of this in a race-free manner. This is 
 obviously a pathological case, but imagine an application that does 
 something along the following lines:
 
 int input = open (/dev/input, O_RDONLY|O_NONBLOCK);
 char foo;
 
 while (1) {
   suspend_block();
   if (read(input, foo, 1)  0) {
   (do something)
   suspend_unblock();
   } else {
   suspend_unblock();
   (draw bouncing cows and clouds and tractor beams briefly)
   }
 }
 
 Now, if the user is playing this game, you want it to be scheduled. If 
 the user has put down their phone and the screen lock has kicked in, you 
 don't want it to be scheduled. So we could imagine some sort of cgroup 
 that contains untrusted tasks - when the session is active we set a flag 
 one way which indicates to the scheduler that tasks in TASK_RUNNING 
 should be scheduled, and when the session is idle we set the flag the 
 other way and all processes in that cgroup get shifted to 
 TASK_INTERRUPTIBLE or something.
 
 Except that doesn't work. If the session goes idle in the middle of the 
 app drawing a frame, we'll stop the process and the task will never call 
 read(). So the user hits a key, we wake up, nothing shifts from 
 TASK_INTERRUPTIBLE into TASK_RUNNING, the key never gets read, we go 
 back to sleep. The event never gets delivered.
 
 Now let's try this in the Android world. The user hits a key and the 
 system wakes up. The input layer takes a suspend block. The application 
 now draws all the cows it wants to, takes its own suspend block and 
 reads the input device. This empties the queue and the kernel-level 
 suspend block is released. The application then processes the event 
 before releasing the suspend block. The event has been delivered and 
 handled.

Thanks for providing this example:

  1) It proves that suspend blockers are solely designed to encourage
 people to code crap.

  2) It exposes the main conceptual problem of the approach:

 The input layer in the kernel magically takes a suspend blocker
 and releases it in an equally magic way just to allow the crappy
 application to reach the point where it takes it's own suspend
 blocker and can react on the user input.
 
 And you need to do that, because the user applications suspend
 blocker magic is racy as hell. To work around that you sprinkle
 your suspend blocker magic all over the kernel instead of telling
 people how to solve the problem correctly.

 And what are you achieving with this versus power saving ?

 Exaclty _NOTHING_ ! 

 Simply because you move the cow drawing CPU time from the point
 where the device wants to go into suspend to the point where the
 user hits a key again. You even delay the reaction of your app to
 the user input by the time it needs to finish drawing cows.
 
 So you need to come up with a way better example why you need
 your blockers than with this prove of misconception.

 You can't express that with resource limits or QoS constraints. If you 
 want to deal with this kind of situation then, as far as I can tell, you 
 need either suspend blockers or something so close to them that it makes 
 no difference.

Wrong. If your application is interactive then you set the QoS
requirement once to interactive and be done.

So the correct point to make a power state decision is when the app
waits for a key press. At this point the kernel can take several
pathes:

  1) Keep the system alive because the input device is in active
 state and a key press is expected

  2) Go into supsend because the input device is deactivated after
 the screen lock kicked in.

This behaves exactly the same way in terms of power consumption as
your blocker example just without all the mess you are trying to
create.

And it allows the kernel to use intermediate power saving methods
(between idle and suspend) which might be available on some hardware.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-27 Thread Thomas Gleixner

On Thu, 27 May 2010, Alan Stern wrote:
 On Thu, 27 May 2010, Peter Zijlstra wrote:
 
  On Thu, 2010-05-27 at 16:33 +0100, Alan Cox wrote:
   On Thu, 27 May 2010 17:09:16 +0200
   Peter Zijlstra pet...@infradead.org wrote:
   
On Thu, 2010-05-27 at 11:06 -0400, Alan Stern wrote:
 
 Opportunistic suspends are okay.
 
 The proposed userspace API is too Android-specific.

I would argue opportunistic suspends are not ok, and therefore the
proposed API is utterly irrelevant.
   
   Assuming you are happy that opportunistically entering C6 and the like is
   ok via ACPI can you explain why you have a problem with opportunistic
   suspend and why it is different to a very deep sleep CPU state such as we
   have now (and which on many embedded platforms we deal with *is* sleeping
   external devices too)
  
  Agreed, but I understood the opportunistic suspend line from Alan Stern
  to mean the echo opportunistic  /sys/power/foo thing.
 
 Yes, that's what I meant.  Why do you think it is any worse than echo 
 mem /sys/power/state?  The only difference is that it will block 
 until all in-kernel suspend blockers are disabled.
 
 Or do you also think that echo mem /sys/power/state is evil and 
 should be removed from the kernel as soon as possible?
 
 And to answer Thomas's question: The whole reason for having in-kernel 
 suspend blockers is so that userspace can tell the system to suspend 
 without losing wakeup events.
 
 Suppose a key is pressed just as a user program writes mem to
 /sys/power/state.  The keyboard driver handles the keystroke and queues
 an input event.  Then the system suspends and doesn't wake up until
 some other event occurs -- even though the keypress was an important
 one and it should have prevented the system from suspending.
 
 With in-kernel suspend blockers and opportunistic suspend, this 
 scenario is prevented.  That is their raison-d'etre.

I tend to disagree. You are still looking at suspend as some extra
esoteric mechanism. We should stop doing this and look at suspend (to
RAM) as an additional deep idle state, which is well defined in terms
of power savings and wakeup latency. That's what I think opportunistic
suspend is all about. Now if you look at our current idle states in
x86 the incoming keystroke is never lost. 

So when suspend does lose the wakeup event then we need to fix that,
but why do we need suspend blockers for this ? Let's take your
example:

 The keyboard driver handles the keystroke and queues an input
 event. Then the system goes into suspend 

Why do we go into suspend at all? If there is an event queued then
something is woken up and got running, which is reason enough _not_ to
enter suspend. If that's broken in the current implementation then we
need to fix it, but not with a big hammer by adding another
interface. We have that information already and obviously we do not
use it, so lets figure out why before adding suspend blockers just
because they paper over the underlying problem.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-27 Thread Thomas Gleixner

On Thu, 27 May 2010, Matthew Garrett wrote:
 On Thu, May 27, 2010 at 05:32:56PM +0200, Thomas Gleixner wrote:
  On Thu, 27 May 2010, Matthew Garrett wrote:
   Now let's try this in the Android world. The user hits a key and the 
   system wakes up. The input layer takes a suspend block. The application 
   now draws all the cows it wants to, takes its own suspend block and 
   reads the input device. This empties the queue and the kernel-level 
   suspend block is released. The application then processes the event 
   before releasing the suspend block. The event has been delivered and 
   handled.
  
  Thanks for providing this example:
  
1) It proves that suspend blockers are solely designed to encourage
   people to code crap.
 
 No. Suspend blockers are designed to ensure that suspend isn't racy with 
 respect to wakeup events. The bit that mitigates badly written 
 applications is the bit where the scheduler doesn't run any more.
 
 If you're happy with a single badly written application being able to 
 cripple your power management story, you don't need opportunistic 
 suspend. But you still have complications when it comes to deciding to 
 enter suspend at the same time as you receive a wakeup event.

Wrong. Setting the QoS requirements of the badly written app to any
latency will allow the kernel to suspend even if the crappy app is
active.

And again. I'm opposing the general chant that fixing crappy
applications in the kernel is a good thing. It's the worst decision we
could make.
 
   And you need to do that, because the user applications suspend
   blocker magic is racy as hell. To work around that you sprinkle
   your suspend blocker magic all over the kernel instead of telling
   people how to solve the problem correctly.
 
 What /is/ the correct way to solve this problem when entering explicit 
 suspend states? How do you guarantee that an event has been delivered to 
 userspace before transitioning into suspend? Now, this is a less 
 interesting problem if you're not using opportunistic suspend, but it's 
 still a problem.

Holy crap. If an event happens _before_ we go into an idle state - and
I see suspend as an deeper idle state - then we do not go there at all.

The whole notion of treating suspend to RAM any different than a plain
idle C-State is wrong. It's not different at all. You just use a
different mechanism which has longer takedown and wakeup latencies and
requires to shut down stuff and setup extra wakeup sources.

And there is the whole problem. Switching from normal event delivery
to those special wakeup sources. That needs to be engineered in any
case carefuly and it does not matter whether you add suspend blockers
or not.
 
   Simply because you move the cow drawing CPU time from the point
   where the device wants to go into suspend to the point where the
   user hits a key again. You even delay the reaction of your app to
   the user input by the time it needs to finish drawing cows.
 
 It's how application mainloops tend to work.

So what's the f*cking point ? You draw exactly the same amount of
power and still you are claiming that it's better or what ?
 
   You can't express that with resource limits or QoS constraints. If you 
   want to deal with this kind of situation then, as far as I can tell, you 
   need either suspend blockers or something so close to them that it makes 
   no difference.
  
  Wrong. If your application is interactive then you set the QoS
  requirement once to interactive and be done.
 
  So the correct point to make a power state decision is when the app
  waits for a key press. At this point the kernel can take several
  pathes:
  
1) Keep the system alive because the input device is in active
   state and a key press is expected
  
2) Go into supsend because the input device is deactivated after
   the screen lock kicked in.
 
 That's no good. If the input device has been deactivated, how does the 
 wakeup event get delivered to the application?
  
  This behaves exactly the same way in terms of power consumption as
  your blocker example just without all the mess you are trying to
  create.
 
 And means that wakeup events don't get delivered. That's a shortcoming.

That's utter nonsense. If we have a problem with missed wakeups then
it needs to be fixed and not papered over with suspend blocker magic.

I'm starting to get really grumpy about the chant that suspend
blockers are the only way to fix missed wakeups. That might be the
only way you can think of with your pink android glasses on, but again
this is not rocket science even if it does not fit into the current
way the kernel handles the whole suspend mechanism.

So if we really sit back and look at suspend as another idle state,
then we have first off the same requirements for entering it as we
have for any other idle state:

 No running tasks (and we can solve the don't care task problem
 nicely

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-27 Thread Thomas Gleixner

On Thu, 27 May 2010, Alan Cox wrote:
 That's all your need to do it right.
 
 In kernel yes your device driver probably does need to say things like
 'Don't go below C6 for a moment' just as a high speed serial port might
 want to say 'Nothing over 10mS please'
 
 I can't speak for Thomas, but I'm certainly not arguing that you don't
 need something that looks more like the blocker side of the logic *in
 kernel*, because there is stuff that you want to express which isn't tied
 to the task.

I'm not opposed, but yes it needs to be expressed in quantifiable
terms, i.e. wakeup latency. That's just contributing to the global QoS
state of affairs even if it is not tied to a particular task. 

And that allows the driver to be intelligent about it. The serial port
at 9600 has definitely different requirements than at 115200.

But that's quite a different concept than the big hammer approach of
the blockers.

Thanks,

tglx

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-27 Thread Thomas Gleixner

On Thu, 27 May 2010, Matthew Garrett wrote:

 On Thu, May 27, 2010 at 05:16:15PM +0100, Alan Cox wrote:
 
  I can't speak for Thomas, but I'm certainly not arguing that you don't
  need something that looks more like the blocker side of the logic *in
  kernel*, because there is stuff that you want to express which isn't tied
  to the task.
 
 Sure, if you're not using opportunistic suspend then I don't think 
 there's any real need for the userspace side of this. The question is 
 how to implement something with the useful properties of opportunistic 
 suspend without without implementing something pretty much equivalent to 
 the userspace suspend blockers. I've sent another mail expressing why I 
 don't think your proposed QoS style behaviour provides that.

Opportunistic suspend is just a deep idle state, nothing else. If the
overall QoS requirements allow to enter that deep idle state then the
kernel goes there. Same decision as for all other idle states. You
don't need any user space blocker for this decision, just sensible QoS
information.

Stop thinking about suspend as a special mechanism. It's not - except
for s2disk, which is an entirely different beast.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-27 Thread Thomas Gleixner

On Thu, 27 May 2010, Matthew Garrett wrote:

 On Thu, May 27, 2010 at 06:45:25PM +0200, Thomas Gleixner wrote:
  On Thu, 27 May 2010, Matthew Garrett wrote:
   No. Suspend blockers are designed to ensure that suspend isn't racy with 
   respect to wakeup events. The bit that mitigates badly written 
   applications is the bit where the scheduler doesn't run any more.
   
   If you're happy with a single badly written application being able to 
   cripple your power management story, you don't need opportunistic 
   suspend. But you still have complications when it comes to deciding to 
   enter suspend at the same time as you receive a wakeup event.
  
  Wrong. Setting the QoS requirements of the badly written app to any
  latency will allow the kernel to suspend even if the crappy app is
  active.
 
 That's not what I want if I'm using the app at the time...

Your crappy app does not use suspend blockers either.
 
  And again. I'm opposing the general chant that fixing crappy
  applications in the kernel is a good thing. It's the worst decision we
  could make.
 
 You still need the in-kernel suspend blockers if you want to guarantee 
 that you can't lose wakeup events. But yes, if you're not concerned 
 handling badly behaved applications then I believe that you can lose 
 opportunistic suspend and just use the scheduler.

No, we do not. We need correctly implemented drivers and a safe
switchover from normal event delivery to wakeup based.
 
  The whole notion of treating suspend to RAM any different than a plain
  idle C-State is wrong. It's not different at all. You just use a
  different mechanism which has longer takedown and wakeup latencies and
  requires to shut down stuff and setup extra wakeup sources.
 
 My question was about explicit suspend states, not implicitly handling 
 an identical state based on scheduler constraints. Suspend-as-a-C-state 
 isn't usable on x86 - you have to explicitly trigger it based on some 

And why not ? Just because suspend is not implemented as an ACPI
C-state ? 

Nonsense, if we want to push the system into suspend from the idle
state we can do that. It's just not implemented and we've never tried
to do it as it requires a non trivial amount of work, but I have done
it on an ARM two years ago as a prove of concept and it works like a
charm.

 policy. And if you want to be able to do that without risking the loss 
 of wakeup events then you need in-kernel suspend blockers.

Crap. Stop beating on those lost wakeup events. If we lose them then
the drivers are broken and do not handle the switch over correctly. Or
the suspend mechanism is broken as it does not evaluate the system
state correctly. Blockers are just papering over that w/o tackling the
real problem.

  So if we really sit back and look at suspend as another idle state,
  then we have first off the same requirements for entering it as we
  have for any other idle state:
 
 There are various platforms where we cannot treat suspend as an idle 
 state. Any solution that requires that doesn't actually solve the 
 problem. Yes, this is *trivial* if you can depend on the scheduler. But 
 we can't, and so it's difficult.

Stop handwaving. Which platforms prevent us to go into suspend from
idle ? Please point me to the relevant documentation which says so.

Just because we have not tried to implemented it does not mean that we
cannot implement it.

Thanks,

tglx


--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-27 Thread Thomas Gleixner

On Thu, 27 May 2010, Alan Stern wrote:

 On Thu, 27 May 2010, Thomas Gleixner wrote:
 
  The whole notion of treating suspend to RAM any different than a plain
  idle C-State is wrong. It's not different at all. You just use a
  different mechanism which has longer takedown and wakeup latencies and
  requires to shut down stuff and setup extra wakeup sources.
 
 This is where you are wrong.  Maybe not wrong in principle, but wrong 
 in practice -- the kernel's present suspend-to-RAM (or just suspend) 
 implementation is _very_ different from C-states (or just idle).

Holy bouncing cow. I damned good know that the current implementation
is not doing that and that suspend is implemented in a totally
different way. That does not mean that this is written in stone. We
CAN change that and fix it to gain opportunistic suspend.
 
 The primary difference is that the kernel can be forced into suspend
 even when the system isn't idle.  In particular, it may be in the
 middle of processing a potential wakeup event -- and the current design
 gives the PM core no way to know if that is so.  This is a weakness
 that in-kernel suspend blockers fix.

Oh no. They paper over a short coming. If there is a pending event,
the kernel knows that. It just does not make use of this
information. Blockers just paper over this by sprinkling
do_not_suspend() calls all over the place. What a sensible solution.

 With C-states this can't happen.  If the CPU goes into a deeper C-state 
 then ipso facto it isn't busy processing anything, much less a wakeup 
 event.

And that's the whole point of doing the opportunistic suspend with the
help of the scheduler.

 Now maybe this difference is a bad thing, and the whole PM 
 suspend/hibernate infrastructure should be rewritten.  But the fact, 
 remains, that's how it works now.  And it can't be changed easily or 
 quickly.

So what you are saying is that we better paper over the shortcomings
of our current implementation with do_not_suspend() code sprinkled all
over the place instead of sitting down and making suspend from idle
work. It's more or less trivial depending on the platform, but not
rocket science.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-27 Thread Thomas Gleixner

On Thu, 27 May 2010, Alan Stern wrote:

 On Thu, 27 May 2010, Felipe Balbi wrote:
 
  On Thu, May 27, 2010 at 05:06:23PM +0200, ext Alan Stern wrote:
  If people don't mind, here is a greatly simplified summary of the
  comments and objections I have seen so far on this thread:
  
 The in-kernel suspend blocker implementation is okay, even
 beneficial.
  
  I disagree here. I believe expressing that as QoS is much better. Let 
  the kernel decide which power state is better as long as I can say I 
  need 100us IRQ latency or 100ms wakeup latency.
 
 Does this mean you believe echo mem /sys/power/state is bad and
 should be removed?  Or echo disk /sys/power/state?  They pay no

mem should be replaced by an idle suspend to ram mechanism

 attention to latencies or other requirements.

s2disk is a totally different beast as it shuts down the box into the
complete power off state.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-27 Thread Thomas Gleixner

On Thu, 27 May 2010, Matthew Garrett wrote:
 On Thu, May 27, 2010 at 07:15:31PM +0200, Thomas Gleixner wrote:
  On Thu, 27 May 2010, Matthew Garrett wrote:
   My question was about explicit suspend states, not implicitly handling 
   an identical state based on scheduler constraints. Suspend-as-a-C-state 
   isn't usable on x86 - you have to explicitly trigger it based on some 
  
  And why not ? Just because suspend is not implemented as an ACPI
  C-state ? 
  
  Nonsense, if we want to push the system into suspend from the idle
  state we can do that. It's just not implemented and we've never tried
  to do it as it requires a non trivial amount of work, but I have done
  it on an ARM two years ago as a prove of concept and it works like a
  charm.
 
 ACPI provides no guarantees about what level of hardware functionality 
 remains during S3. You don't have any useful ability to determine which 
 events will generate wakeups. And from a purely practical point of view, 
 since the latency is in the range of seconds, you'll never have a low 
 enough wakeup rate to hit it.

Right, it does not as of today. So we cannot use that on x86
hardware. Fine. That does not prevent us to implement it for
architectures which can do it. And if x86 comes to the point where it
can handle it as well we're going to use it. Where is the problem ? If
x86 cannot guarantee the wakeup sources it's not going to be used for
such devices. The kernel just does not provide the service for it, so
what ?
 
   policy. And if you want to be able to do that without risking the loss 
   of wakeup events then you need in-kernel suspend blockers.
  
  Crap. Stop beating on those lost wakeup events. If we lose them then
  the drivers are broken and do not handle the switch over correctly. Or
  the suspend mechanism is broken as it does not evaluate the system
  state correctly. Blockers are just papering over that w/o tackling the
  real problem.
 
 Ger;kljaserf;kljf;kljer;klj. Suspend blockers are the mechanism for the 
 driver to indicate whether the wakeup event has been handled. That's 
 what they're there for. The in-kernel ones don't paper over anything.

So the driver says: events have been handled. Neat.

Now if that crappy app does not use the user space blockers or is not
allowed to use them, then what are you guaranteeing ? All you
guarantee is that the application has emptied the input queue. Nothing
else. And that's the same as setting the QoS guarantee to NONE.

An application which uses the blocker is just holding off the system
from going into deep idle. Nothing which cannot be done today.

So the only thing you are imposing to app writers is to use an
interface which solves nothing and does not save you any power at
all. 

If the application drops the blocker after processing the event and
before it goes back to the read event then you just postpone the CPU
usage and therefor power consumption to a later point in time instead
of going back to the blocking read right away. 

Again what is it saving ?  NOTHING!  And for nothing you want to mess
up the kernel big time ?

Runnable tasks and QoS guarantees are the indicators whether you can
go to opportunistic suspend or not. Everything else is just window
dressing.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-27 Thread Thomas Gleixner

On Thu, 27 May 2010, Matthew Garrett wrote:

 On Thu, May 27, 2010 at 07:24:02PM +0200, Thomas Gleixner wrote:
 
  Oh no. They paper over a short coming. If there is a pending event,
  the kernel knows that. It just does not make use of this
  information. Blockers just paper over this by sprinkling
  do_not_suspend() calls all over the place. What a sensible solution.
 
 Even if we could use suspend-via-deep-idle-state on PCs, we still need 
 to be able to enter suspend while the system isn't idle. There's two 
 ways to do that:
 
 1) Force the system to be idle. Doing this race-free is difficult.
 
 2) Enter suspend even though the system isn't idle. Since we can't rely 
 on the scheduler, we need drivers to know whether userspace has consumed 
 all wakeup events before allowing the transition to occur. Doing so 
 requires either in-kernel suspend blockers or something that's almost 
 identical.

You're just not getting it. If user space has consumed the event is
not relevant at all.

What's relevant is whether the app has processed the event and reacted
accordingly. That's all that matters.

Emptying your input queue is just the wrong indicator.

And as I explained several times now: It does _NOT_ matter when the
app goes back in to blocked/idle state. You have to spend the CPU
cycles and power for that anyway.

And for the apps which do not use the user space blockers the queue
empty indicator is just bullshit, because after emptying the queue the
kernel can go into suspend w/o any guarantee that the event has been
processed.

The whole concept sucks, as it does not solve anything. Burning power
now or in 100ms is the same thing power consumption wise.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-27 Thread Thomas Gleixner

On Thu, 27 May 2010, Alan Stern wrote:

 On Thu, 27 May 2010, Thomas Gleixner wrote:
 
  Crap. Stop beating on those lost wakeup events. If we lose them then
  the drivers are broken and do not handle the switch over correctly. Or
  the suspend mechanism is broken as it does not evaluate the system
  state correctly. Blockers are just papering over that w/o tackling the
  real problem.
 
 That's the point -- suspend does not evaluate the system state 
 correctly because it doesn't have the necessary information.  Suspend 
 blockers are a way of providing it that information.  They don't paper 
 over the problem; they solve it.

Nonsense. The system state is well defined when a event is pending and
we just have to say good bye to the idea that forced suspend is a good
solution. It's not as it does not guarantee the event processing in
badly written apps and it does move the power consumption to a later
point in time for those apps which acquire/drop the blockers.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-27 Thread Thomas Gleixner

On Thu, 27 May 2010, Matthew Garrett wrote:
 On Thu, May 27, 2010 at 07:35:50PM +0200, Peter Zijlstra wrote:
  On Thu, 2010-05-27 at 18:32 +0100, Matthew Garrett wrote:
   On Thu, May 27, 2010 at 07:28:08PM +0200, Peter Zijlstra wrote:
Why would you care about the screen for a network event?
   
   Because the application that needs to handle the network packet is 
   currently blocked trying to draw something to the screen.
  
  Then that's an application bug right there, isn't it?
  
  If should have listened to the window server telling its clients it was
  going to go away. Drawing after you get that is your own damn fault ;-)
 
 How long do you wait for applications to respond that they've stopped 
 drawing? What if the application is heavily in swap at the time?

Very realistic scenario on a mobile phone. 

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-27 Thread Thomas Gleixner

On Thu, 27 May 2010, Matthew Garrett wrote:

 On Thu, May 27, 2010 at 07:34:40PM +0200, Peter Zijlstra wrote:
   we still need to be able to enter suspend while the system isn't idle.
  
  _WHY_!?
 
 Because if I'm running a kernel build in a tmpfs and I hit the sleep 
 key, I need to go to sleep. Blocking processes on driver access isn't 
 sufficient.

That's the difference between opportunistic and enforced suspend. On
enforced suspend the QoS guarantee is forced to NONE for everything
and you go down no matter what. When you decide to push the wakeup
button the system restores.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-27 Thread Thomas Gleixner

On Thu, 27 May 2010, Matthew Garrett wrote:

 On Thu, May 27, 2010 at 06:49:18PM +0100, Alan Cox wrote:
   ACPI provides no guarantees about what level of hardware functionality 
   remains during S3. You don't have any useful ability to determine which 
   events will generate wakeups. And from a purely practical point of view, 
   since the latency is in the range of seconds, you'll never have a low 
   enough wakeup rate to hit it.
  
  So PCs with current ACPI don't get opportunistic suspend capability. It
  probably won't be supported on the Commodore Amiga either - your point ?
 
 Actually, the reverse - there's no terribly good way to make PCs work 
 with scheduler-based suspend, but there's no reason why they wouldn't 
 work with the current opportunistic suspend implementation.

How does that solve the problems you mentioned above ? Wakeup
guarantees, latencies ...

It's not a prove of the technical correctness of the approach if it
can provide a useless functionality.
 
Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-27 Thread Thomas Gleixner

On Thu, 27 May 2010, Matthew Garrett wrote:

 On Thu, May 27, 2010 at 07:59:02PM +0200, Thomas Gleixner wrote:
  On Thu, 27 May 2010, Matthew Garrett wrote:
   ACPI provides no guarantees about what level of hardware functionality 
   remains during S3. You don't have any useful ability to determine which 
   events will generate wakeups. And from a purely practical point of view, 
   since the latency is in the range of seconds, you'll never have a low 
   enough wakeup rate to hit it.
  
  Right, it does not as of today. So we cannot use that on x86
  hardware. Fine. That does not prevent us to implement it for
  architectures which can do it. And if x86 comes to the point where it
  can handle it as well we're going to use it. Where is the problem ? If
  x86 cannot guarantee the wakeup sources it's not going to be used for
  such devices. The kernel just does not provide the service for it, so
  what ?
 
 We were talking about PCs. Suspend-as-c-state is already implemented for 
 OMAP.
 
Ah, now we talk about PCs. And all of a sudden the problem of the
unability of determining wakeup sources is not longer relevant ? So
how do you guarantee that we don't miss one if we cant figure out
which ones are kept alive in S3 ?

  So the only thing you are imposing to app writers is to use an
  interface which solves nothing and does not save you any power at
  all. 
 
 It's already been demonstrated that the Android approach saves power.

Demonstrated ? Care to explain me how it makes a difference:

while (1) {
  block();
  read();
  process_event();
  unblock();
--- suspend
--- resume
  do_crap();100 cycles
}

vs.

while (1) {
  read();
--- suspend
--- resume
  process_event();
  do_crap();100 cycles
}

You spend the damned 1000 cycles in any case just at a different
point in time. So if you are so convinced and have fully understood
all the implications, please enlighten me why do_crap() costs less
power with the blockers approach.

An you are also stubbornly refusing to answer my analysis about the
effect on apps which do not use the blocker or are not allowed to.

1) The kernel blocker does not guarantee that the lousy app has
   processed the event. It just guarantees that the lousy app has
   emptied the input queue. So what's the point of the kernel blocker
   in that case ?

2) What's the difference on setting that app to QoS(NONE) and let the
   kernel happily ignore it.

Come up with real explanations and numbers and not just the it has
been demonstrated chant which is not holding water if you look at the
above.
 
  Runnable tasks and QoS guarantees are the indicators whether you can
  go to opportunistic suspend or not. Everything else is just window
  dressing.
 
 As I keep saying, this is all much less interesting if you don't care 
 about handling suboptimal applications. If you do care about them then 
 the Android approach works. Nobody has demonstrated a scheduler-based 
 one that does.

That does not make the android approach any better. They should have
talked to us upfront and not after the fact. Just because they decided
to do that in their google basement w/o talking to people who care is
not proving that it's a good solution and even less a reason to merge
it as is.

The kernel history is full of examples where crappy solutions got
rejected and kept out of the kernel for a long time even if there was
a need for them in the application field and they got shipped in
quantities with out of tree patches (NOHZ, high resolution timers,
...). At some point people stopped arguing for crappy solutions and
sat down and got it right. The problem of power management and
opportunistic suspend is not different in any way.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-27 Thread Thomas Gleixner

On Thu, 27 May 2010, Matthew Garrett wrote:

 On Thu, May 27, 2010 at 08:18:11PM +0200, Peter Zijlstra wrote:
  On Thu, 2010-05-27 at 19:14 +0100, Matthew Garrett wrote:
   If I get a WoL 
   packet in the 0.5 of a second between userspace deciding to suspend and 
   actually doing so, the system shouldn't suspend. 
  
  Please re-read Thomas' description of how a driver should do the state
  transition.
 
  So either we get the packet before suspend, and we cancel the suspend,
  or we get it after and we wake up. What's the problem, and how does that
  need suspend blockers?
 
 In order to cancel the suspend we need to keep track of whether 
 userspace has consumed the event and reacted appropriately. Since we 
 can't do this with the process scheduler, we end up with something that 
 looks like suspend blockers.

And the scheduler based approach does exactly this and provides
exactly the same non guarantees to applications which are set to
QoS(NONE) as it does to applications which do not (or are not allowed
to) use the user space suspend blockers.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-27 Thread Thomas Gleixner

On Thu, 27 May 2010, Alan Cox wrote:

  No, it's not. Forced suspend may be in response to hitting a key, but it 
 
 You are the only person here talking about 'forced' suspends. The rest of
 us are talking about idling down and ensuring we are always in a state we
 un-idle correctly.
 
  may also be in response to a 30 minute timeout expiring. If I get a WoL 
  packet in the 0.5 of a second between userspace deciding to suspend and 
  actually doing so, the system shouldn't suspend.
 
 I don't think that argument holds water in the form you have it
 
 What about 5 nanoseconds before you suspend. Now you can't do that (laws
 of physics and stuff).
 
 So your position would seem to be we have a race but can debate how big
 is permissible
 
 The usual model is
 
 At no point should we be in a position when entering a suspend style
  deep sleep where we neither abort the suspend, nor commit to a
  suspend/resume sequence if the events we care about occur
 
 and that is why the hardware model is
 
   Set wake flags
   Check if idle
   If idle
   Suspend
   else
   Clear wake flags
   Unwind
 
 and the wake flags guarantee that an event at any point after the wake
 flags are set until they are cleared will cause a suspend to be resumed,
 possibly immediately after the suspend.

And if a platform can not guarantee the wakeup or the lossless
transition of states then you can not solve this by throwing blockers
or whatever into the code.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-27 Thread Thomas Gleixner

On Thu, 27 May 2010, Matthew Garrett wrote:

 On Thu, May 27, 2010 at 08:53:11PM +0200, Thomas Gleixner wrote:
  On Thu, 27 May 2010, Matthew Garrett wrote:
   We were talking about PCs. Suspend-as-c-state is already implemented for 
   OMAP.
   
  Ah, now we talk about PCs. And all of a sudden the problem of the
  unability of determining wakeup sources is not longer relevant ? So
  how do you guarantee that we don't miss one if we cant figure out
  which ones are kept alive in S3 ?
 
 A wakeup event is defined as one that wakes the system - if a system 
 can't be woken by a specific event then it's impossible to lose it, 
 since it wasn't a wakeup event to begin with.

So where is the problem ?
 
So the only thing you are imposing to app writers is to use an
interface which solves nothing and does not save you any power at
all. 
   
   It's already been demonstrated that the Android approach saves power.
  
  Demonstrated ? Care to explain me how it makes a difference:
  
  while (1) {
block();
read();
process_event();
unblock();
  --- suspend
  --- resume
do_crap();100 cycles
  }
  
  vs.
  
  while (1) {
read();
  --- suspend
  --- resume
process_event();
do_crap();100 cycles
  }
  
  You spend the damned 1000 cycles in any case just at a different
  point in time. So if you are so convinced and have fully understood
  all the implications, please enlighten me why do_crap() costs less
  power with the blockers approach.
 
 Consider the case where the read() is non-blocking.

Which I consider in the same range as the application which does:

  while (1);

  1) The kernel blocker does not guarantee that the lousy app has
 processed the event. It just guarantees that the lousy app has
 emptied the input queue. So what's the point of the kernel blocker
 in that case ?
 
 Yes, I think you're right here. You need the userspace component as well 
 for this to work correctly.

  2) What's the difference on setting that app to QoS(NONE) and let the
 kernel happily ignore it.
 
 What sets that flag, and how does it interact with an application that 
 never makes a blocking system call?

  QoS(NONE) would be default policy for untrusted apps in the same way
  as you'd refuse the usage of supsend blockers for untrusted apps.
  
  while (1); is definitely not an application which should be granted
  trusted rights, right ?

  block_suspend(); while(1);
  
  is the same as:

  QoS(minimal latency); while(1);

  So if you really go to trust an while (1); application you better
  make sure that this app has the appropriate QoS(NONE) or QoS(10s)
  call in it before letting it lose.
 
  Come up with real explanations and numbers and not just the it has
  been demonstrated chant which is not holding water if you look at the
  above.
 
 The numbers were earlier in the thread.

  Numbers, yes. But I really give a sh*t about numbers w/o a detailed
  explanation of the scenario which created those numbers. And if the
  numbers boil down to: we handle the untrusted app which does while
  (1); then they are absolutely useless.

  The kernel history is full of examples where crappy solutions got
  rejected and kept out of the kernel for a long time even if there was
  a need for them in the application field and they got shipped in
  quantities with out of tree patches (NOHZ, high resolution timers,
  ...). At some point people stopped arguing for crappy solutions and
  sat down and got it right. The problem of power management and
  opportunistic suspend is not different in any way.
 
 Yes, and I'd agree with this if anyone seemed to have any idea how to do 
 it right. But despite all the heat and noise in this thread, all we've 
 got is an expression that it should be handled by the scheduler (a 
 viewpoint I agree with) without any explanation of how to switch 
 policies in such a way that applications idle without losing wakeup 
 events.

Why in the world should they lose wakeup events ? If an app goes idle,
it goes idle because it is waiting for an event. There is no other
sane reason except for those apps which are untrusted and force
idled. And for those you agreed already the suspend blockers don't
solve anything as they are either not implemented or the app is not
trusted to use them.

So we are back to round one of the whole discussion:

   Is it worth to create an utter mess in the kernel just to handle a
   subset of crappy coded applications ?

And the answer stays the same: No, not at all.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-27 Thread Thomas Gleixner

On Thu, 27 May 2010, Alan Stern wrote:

 On Thu, 27 May 2010, Matthew Garrett wrote:
 
  On Thu, May 27, 2010 at 10:23:50PM +0200, Thomas Gleixner wrote:
   On Thu, 27 May 2010, Matthew Garrett wrote:
A wakeup event is defined as one that wakes the system - if a system 
can't be woken by a specific event then it's impossible to lose it, 
since it wasn't a wakeup event to begin with.
   
   So where is the problem ?
  
  The problem is that, right now, if a wakeup event is received between 
  the point where userspace decides to start a suspend and userspace 
  actually starts a suspend, that event may not abort the suspend.
 
 The two of you are talking at cross purposes.  Thomas is referring to 
 idle-based suspend and Matthew is talking about forced suspend.

Yes, and forced suspend to disk is the same as force suspend to disk,
which has both nothing to do with sensible resource management.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-27 Thread Thomas Gleixner

On Thu, 27 May 2010, Rafael J. Wysocki wrote:

 On Thursday 27 May 2010, Thomas Gleixner wrote:
  On Thu, 27 May 2010, Alan Stern wrote:
  
   On Thu, 27 May 2010, Felipe Balbi wrote:
   
On Thu, May 27, 2010 at 05:06:23PM +0200, ext Alan Stern wrote:
If people don't mind, here is a greatly simplified summary of the
comments and objections I have seen so far on this thread:

   The in-kernel suspend blocker implementation is okay, even
   beneficial.

I disagree here. I believe expressing that as QoS is much better. Let 
the kernel decide which power state is better as long as I can say I 
need 100us IRQ latency or 100ms wakeup latency.
   
   Does this mean you believe echo mem /sys/power/state is bad and
   should be removed?  Or echo disk /sys/power/state?  They pay no
  
  mem should be replaced by an idle suspend to ram mechanism
 
 Well, what about when I want the machine to suspend _regardless_ of whether
 or not it's idle at the moment?  That actually happens quite often to me. :-)

Fair enough. Let's agree on a non ambigous terminology then:

 forced:

 suspend which you enforce via user interaction, which
 also implies that you risk losing wakeups depending on
 the hardware properties

 opportunistic:

 suspend driven from the idle context, which guarantees to
 not lose wakeups. Provided only when the hardware does
 provide the necessary capabilities.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-28 Thread Thomas Gleixner

On Thu, 27 May 2010, Alan Stern wrote:

 On Thu, 27 May 2010, Thomas Gleixner wrote:
 
   The two of you are talking at cross purposes.  Thomas is referring to 
   idle-based suspend and Matthew is talking about forced suspend.
  
  Yes, and forced suspend to disk is the same as force suspend to disk,
  which has both nothing to do with sensible resource management.
 
 If I understand correctly, you are saying that all the untrusted 
 applications should run with QoS(NONE).  Then they could do whatever 
 they wanted without causing any interference.
 
 And with idle-based power management (rather than forced suspend), 
 there would be no issue with wakeup events getting unduly delayed.
 
 Unless one of those events was meant for an untrusted application.  Is 
 that the source of the difficulty?

Probably, but that's not solved by suspend blockers either as I
explained several times now. Because those untrusted apps either lack
blocker calls or are not allowed to use them, so the blocker does not
help for those either.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-31 Thread Thomas Gleixner

On Sat, 29 May 2010, James Bottomley wrote:
 The job of the kernel is to accommodate hardware as best it can ...
 sometimes it might not be able to, but most of the time it does a pretty
 good job.
 
 The facts are that C states and S states are different and are entered
 differently. 

That's an x86'ism which is going away. And that's really completely
irrelevant for the mobile device space. Can we please stop trying to
fix todays x86 based laptop problems? They are simply not fixable.

 For some omap hardware, the power consumption in the
 lowest C state (with all the ancillary power control) is the same as S3,
 that's fine, suspend from idle works as well as suspend to ram modulo
 bad apps. For quite a lot of MSM hardware, the lowest C state power
 consumption is quite a bit above S3.  It's not acceptable to tell those
 people tough, your battery runs out in 30 minutes because you bought
 the wrong hardware.  We have to figure out how to get to S3 ... whether
 this is from idle or some other mechanism is again a discussion point,
 but not doing it is not an option.

If you'd have read the answers from Alan carefully, then you'd have
noticed that even x86 hardware is getting to the point where OMAP is
today. i.e. support of transparent suspend from idle. If that wouldn't
happen then x86 would be simply unusable for mobile devices. It's that
easy. And we really do _NOT_ care about the existing laptop hardware
which does not provide that because it's a lost case. Not only due to
the missing (or just disabled) wakeup sources, also due to the fact
that you cannot do sensible power management by completely disabling
clock and/or power of unused devices in the chipset. There is a damn
good reason why the mobile space is _NOT_ x86 based at the moment.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-31 Thread Thomas Gleixner

On Sat, 29 May 2010, James Bottomley wrote:
 On Sat, 2010-05-29 at 10:10 +0200, Peter Zijlstra wrote:
  
  Not using suspend is exactly the point. As Alan has argued, propagating
  suspend blockers up into all regions of userspace will take much longer
  than fixing the hardware.
 
 Strange, that's not what I heard as the possible solution.  I thought he
 was advocating expressing the kernel side of suspend blockers as QoS
 constraints.  Once we have QoS constraints correctly done in the kernel,
 userspace still has to express its requirements.  If the requirements
 are static, then they can be done from policy files or in some other
 static way but if they're dynamic, they'll still have to be in the
 applications ... in about the same places the android wakelocks are.

That's wrong. You only need the explicit dynamic QoS constraints for
applications which follow the scheme:

 while (1) {
   if (event_available())
  process_event();
   else
  do_useless_crap_which_consumes_power();
 } 

which need the following annotation:

 while (1) {
   block_suspend();
   if (event_available()) {
  process_event();
  unblock_suspend();
   } else {
  unblock_suspend();
  do_useless_crap_which_consumes_power();
   }
 } 

Plus the kernel counterpart of drivers which take the suspend blocker
in the interrupt handler and release it when the event queue is empty.

So that's done for making polling event handling power efficient.

Even worse, you need the same annotation for non polling mode and it
enforces the use of select() because you cannot take a suspend blocker
across a blocking read() without adding more invasive interactions to
the kernel..

So the sane app looks like:

   while (1) {
 select();
 block_suspend();
 process_events();
 unblock_suspend();
   }

I'm really tired of arguing that this promotion of programming style
is the worst idea ever, so let's look how you can do the same thing
QoS based.

s/block_suspend()/qos(INTERACTIVE)/ and
s/unblock_suspend()/qos(NONE)/ and
s/block_magic()/qos_magic()/ in the drivers.

Yes, it's mostly the same, with a subtle difference:

While android can use it in the big hammer approach to disable the
existing user initiated suspend via /sys/power/state, the rest of the
world can benefit as well in various ways.

 - Sane applications which use a blocking event wait can be handled
   with a static QoS setting simply because a blocking read relies on
   the QoS state of the underlying I/O system.

 - Idle based suspend as the logical consequence of idle states is
   just a matter of QoS constraint based decisions.

 - Untrusted apps can be confined in cgroups. The groups are set to
   QoS(None) when user land decides that it's time to safe power
   (e.g. screen lock kicks in)

 - QoS states can block applications from I/O when the I/O system is
   set to a state which is exclusive.

  - ...

So that allows to use the same mechanism for more than the android
sledge hammer approach and confines the controversial use cases into
android specific files without adding a hard to maintain user space
interface which would prevent or at least make it hard to do some of
the above mentioned things which we want to see implemented.

While I personally disagree with the android approach, I REALLY want
to provide them a mechanism to make it work, but not for the price
that it actively prevents solutions which are considered to be
technically superior by a relevant group of developers/maintainers.

  You got to realize this is about Linux as a whole, I really don't care
  one whit about the specific Android case. We want a solution that is
  generic enough to solve the power consumption problem and makes sense on
  future hardware.
 
 I don't think anyone disagrees with this. As long as we find a long term
 solution that satisfies the android case, everyone will be happy.

See above.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-31 Thread Thomas Gleixner

On Mon, 31 May 2010, James Bottomley wrote:
 
 For MSM hardware, it looks possible to unify the S and C states by doing
 suspend to ram from idle but I'm not sure how much work that is.

On ARM, it's not rocket science and we have in tree support for this
already (OMAP). I have done the same thing on a Samsung part as a
prove of concept two years ago and it's really easy as the hardware is
sane. Hint: It's designed for mobile devices :)

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-31 Thread Thomas Gleixner

On Mon, 31 May 2010, James Bottomley wrote:

 On Mon, 2010-05-31 at 22:49 +0200, Thomas Gleixner wrote:
  On Sat, 29 May 2010, James Bottomley wrote:
   The job of the kernel is to accommodate hardware as best it can ...
   sometimes it might not be able to, but most of the time it does a pretty
   good job.
   
   The facts are that C states and S states are different and are entered
   differently. 
  
  That's an x86'ism which is going away. And that's really completely
  irrelevant for the mobile device space. Can we please stop trying to
  fix todays x86 based laptop problems? They are simply not fixable.
 
 You're the one mentioning x86, not me.  I already explained that some
 MSM hardware (the G1 for example) has lower power consumption in S3
 (which I'm using as an ACPI shorthand for suspend to ram) than any
 suspend from idle C state.

Those machines can go from idle into S2RAM just fine w/o touching the
/sys/power/state S2RAM mechanism.

It's just a deeper C state, really.

The confusion is that S3 is considered to be a complete different
mechanism - which is true for PC style x86 - but not relevant for
hardware which is sane from the PM point of view.

Now some people think, that suspend blockers are a cure for the
existing x86/ACPI/BIOS mess, which cannot go to S3 from idle, but
that's simply not feasible.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-31 Thread Thomas Gleixner

On Tue, 1 Jun 2010, Rafael J. Wysocki wrote:
 On Monday 31 May 2010, Thomas Gleixner wrote:
  So that allows to use the same mechanism for more than the android
  sledge hammer approach and confines the controversial use cases into
  android specific files without adding a hard to maintain user space
  interface which would prevent or at least make it hard to do some of
  the above mentioned things which we want to see implemented.
 
 I generally agree.
 
 I think the Alan Stern's recent proposal goes along these lines, but it has
 the advantage of being a bit more specific. ;-)

Yes, Alan Stern's proposal is going into that direction and I'm not
opposed. Just wanted to get the overall picture straight for James :)

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-31 Thread Thomas Gleixner

On Tue, 1 Jun 2010, Rafael J. Wysocki wrote:
 On Tuesday 01 June 2010, Neil Brown wrote:
  My freerunner has a single core so without CONFIG_PREEMPT it may be that
  there is no actual race-window - maybe the PRE_SUSPENDs will all run before 
  a
  soft_irq thread has a chance to finish handling of the interrupt (my
  knowledge of these details is limits).  But on a muilti-core device I think
  there would definitely be a race-window.
 
 Yes, there always will be a race window.  The only thing we can do is to
 narrow it, but we cannot really close it (at least not on a PC, but I'm not
 really sure it can be closed at all).

It can be closed, when the state transition from normal event delivery
to wakeup mode is state safe, which it is on most platforms which are
designed for the mobile space.

Not so the current PC style x86 platforms, which are not relevant for
the problem at hand at all. Really, that stuff is going either to gain
sane properties or it's just going into the irrelevant realm.

Any attempt to solve the current x86/ACPI/BIOS/mess is waste of time
and is inevitably going to prevent progress.

 If you really want _all_ events to be delivered timely, the only way to go is
 to avoid using suspend (and use the idle framework for power management).

Amen.

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-05-31 Thread Thomas Gleixner

On Tue, 1 Jun 2010, Neil Brown wrote:
 And if you are right that the race window cannot be closed, then the whole
 suspend-blocker infrastructure is pointless as the purpose of it is simply to
 close that window.  If it really does not and cannot work, then it would be
 best to reject it for that reason rather than for less concrete aesthetic
 arguments.
 But presumably it does work in practice on Android hardware so . confused.
 
 Having just seen the email from Thomas, maybe you mean that it cannot be
 closed on devices using ACPI, but can on other devices.  I can sort-of
 imagine how that would be the case (I tried reading an ACPI spec once - my
 hat is of to those of you who understand it).
 That shouldn't prevent us from closing the race window on sane hardware
 that allows it.  This would, I think, be sufficient for Android's needs.
 
 I'm hoping we can get agreement on:
   - there is a race with suspend

That's a matter of how you define suspend. 

If suspend is another deep idle state and the hardware is sane,
there is no race at all - assumed that the driver/platform developer
got it right. It's not rocket science to transition from normal irq
delivery to wakeup based delivery raceless (except for PC style x86
hardware of today)

If suspend is the thing we are used to via /sys/power/state then the
race will persist forever except for the suspend blocker workaround,
which we can express in QoS terms as well w/o adding another suspend
related user space API.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-06-01 Thread Thomas Gleixner

On Tue, 1 Jun 2010, Neil Brown wrote:
 
 I think you have acknowledged that there is a race with suspend - thanks.
 Next step was can it be closed.
 You seem to suggest that it can, but you describe it as a work around
 rather than a bug fix...
 
 Do you agree that the race is a bug, and therefore it is appropriate to
 fix it assuming an acceptable fix can be found (which I think it can)?

If we can fix it, yes we definitely should do and not work around it.
 
Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-06-01 Thread Thomas Gleixner


On Mon, 31 May 2010, Arve Hjønnevåg wrote:

 On Mon, May 31, 2010 at 2:46 PM, Thomas Gleixner t...@linutronix.de wrote:
  On Mon, 31 May 2010, James Bottomley wrote:
 
  For MSM hardware, it looks possible to unify the S and C states by doing
  suspend to ram from idle but I'm not sure how much work that is.
 
  On ARM, it's not rocket science and we have in tree support for this
  already (OMAP). I have done the same thing on a Samsung part as a
  prove of concept two years ago and it's really easy as the hardware is
  sane. Hint: It's designed for mobile devices :)
 
 
 We already enter the same power state from idle and suspend on msm. In
 the absence of misbehaving apps, the difference in power consumption
 is entirely caused by periodic timers in the user-space framework
 _and_ kernel. It only takes a few timers triggering per second (I
 think 3 if they do no work) to double the average power consumption on
 the G1 if the radio is off. We originally added wakelocks because the
 hardware we had at the time had much lower power consumption in
 suspend then idle, but we still use suspend because it saves power.

So how do you differentiate between timers which _should_ fire and
those you do not care about ?

We have mechanisms in place to defer timers so the wakeups are
minimized. If that's not enough we need to revisit.

Thanks,

tglx

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-06-01 Thread Thomas Gleixner

On Mon, 31 May 2010, Arve Hjønnevåg wrote:

 2010/5/31 Rafael J. Wysocki r...@sisk.pl:
  On Monday 31 May 2010, Arve Hjønnevåg wrote:
  2010/5/30 Rafael J. Wysocki r...@sisk.pl:
  ...
 
  I think it makes more sense to block suspend while wakeup events are
  pending than blocking it everywhere timers are used by code that could
  be called while handling wakeup events or other critical work. Also,
  even if you did block suspend everywhere timers where used you still
  have the race where a wakeup interrupt happens right after you decided
  to suspend. In other words, you still need to block suspend in all the
  same places as with the current opportunistic suspend code, so what is
  the benefit of delaying suspend until idle?
 
  Assume for a while that you don't use suspend blockers, OK?  I realize you
  think that anything else doesn't make sense, but evidently some other people
  have that opinion about suspend blockers.
 
 
 It sounded like you were suggesting that initiating suspend from idle
 would somehow avoid the race condition with wakeup events. All I'm
 saying is that you would need to block suspend in all the same places.
 If you don't care about ignoring wakeup events, then sure you can
 initiate suspend from idle.

And why should you miss a wakeup there ? If you get an interrupt in
the transition, then you are not longer idle.

Thanks,

tglx

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-06-02 Thread Thomas Gleixner

On Tue, 1 Jun 2010, Arve Hjønnevåg wrote:
 2010/6/1 Thomas Gleixner t...@linutronix.de:
 
  On Mon, 31 May 2010, Arve Hjønnevåg wrote:
 
  On Mon, May 31, 2010 at 2:46 PM, Thomas Gleixner t...@linutronix.de 
  wrote:
   On Mon, 31 May 2010, James Bottomley wrote:
  
   For MSM hardware, it looks possible to unify the S and C states by doing
   suspend to ram from idle but I'm not sure how much work that is.
  
   On ARM, it's not rocket science and we have in tree support for this
   already (OMAP). I have done the same thing on a Samsung part as a
   prove of concept two years ago and it's really easy as the hardware is
   sane. Hint: It's designed for mobile devices :)
  
 
  We already enter the same power state from idle and suspend on msm. In
  the absence of misbehaving apps, the difference in power consumption
  is entirely caused by periodic timers in the user-space framework
  _and_ kernel. It only takes a few timers triggering per second (I
  think 3 if they do no work) to double the average power consumption on
  the G1 if the radio is off. We originally added wakelocks because the
  hardware we had at the time had much lower power consumption in
  suspend then idle, but we still use suspend because it saves power.
 
  So how do you differentiate between timers which _should_ fire and
  those you do not care about ?
 
 
 Only alarms are allowed to fire while suspended.
 
  We have mechanisms in place to defer timers so the wakeups are
  minimized. If that's not enough we need to revisit.
 
 
 Deferring the the timers forever without stopping the clock can cause
 problems. Our user space code has a lot of timeouts that will trigger
 an error if an app does not respond in time. Freezing everything and
 stopping the clock while suspended is a lot simpler than trying to
 stop individual timers and processes from running.

And resume updates timekeeping to account for the slept time. So the
only way to get away with that is to sleep under a second or just
ignoring the update by avoiding the access to rtc. 

So how do you keep timekeeping happy ?

Thanks,

tglx

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-06-02 Thread Thomas Gleixner

On Wed, 2 Jun 2010, Arve Hjønnevåg wrote:
 2010/6/2 Thomas Gleixner t...@linutronix.de:
  On Tue, 1 Jun 2010, Arve Hjønnevåg wrote:
  Deferring the the timers forever without stopping the clock can cause
  problems. Our user space code has a lot of timeouts that will trigger
  an error if an app does not respond in time. Freezing everything and
  stopping the clock while suspended is a lot simpler than trying to
  stop individual timers and processes from running.
 
  And resume updates timekeeping to account for the slept time. So the
 
 No, for the monotonic clock it does the opposite. The hardware clock
 is read on resume and the offset is set so the monotonic clock gets
 the same value as it had when suspend was called.

Grr, yes. Misread the code. -ENOTENOUGHCOFFEE

Thanks,

tglx

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-06-02 Thread Thomas Gleixner

On Tue, 1 Jun 2010, Arve Hjønnevåg wrote:
 2010/6/1 Thomas Gleixner t...@linutronix.de:
  On Mon, 31 May 2010, Arve Hjønnevåg wrote:
 
  2010/5/31 Rafael J. Wysocki r...@sisk.pl:
   On Monday 31 May 2010, Arve Hjønnevåg wrote:
   2010/5/30 Rafael J. Wysocki r...@sisk.pl:
   ...
  
   I think it makes more sense to block suspend while wakeup events are
   pending than blocking it everywhere timers are used by code that could
   be called while handling wakeup events or other critical work. Also,
   even if you did block suspend everywhere timers where used you still
   have the race where a wakeup interrupt happens right after you decided
   to suspend. In other words, you still need to block suspend in all the
   same places as with the current opportunistic suspend code, so what is
   the benefit of delaying suspend until idle?
  
   Assume for a while that you don't use suspend blockers, OK?  I realize 
   you
   think that anything else doesn't make sense, but evidently some other 
   people
   have that opinion about suspend blockers.
  
 
  It sounded like you were suggesting that initiating suspend from idle
  would somehow avoid the race condition with wakeup events. All I'm
  saying is that you would need to block suspend in all the same places.
  If you don't care about ignoring wakeup events, then sure you can
  initiate suspend from idle.
 
  And why should you miss a wakeup there ? If you get an interrupt in
  the transition, then you are not longer idle.
 
 
 Because suspend itself causes you to not be idle you cannot abort
 suspend just because you are not idle anymore.

You still are addicted to the current suspend mechanism. :)

The whole point of doing it from idle in the form of a deep power
state is to avoid the massive sh*tload of work which is neccesary to
run the existing suspend code. But that needs runtime power management
and tweaks to avoid your timers waking you, etc.

The mechanism you want to use is: suspend no matter what, like closing
the lid of the laptop, but with a few tweaks around it:

   1) An interrupt on a wakeup source which comes in while the suspend
  code runs, i.e before you transitioned into wakeup mode, must
  abort / prevent suspend.

   2) Prevent another attempt to suspend before the event has been
  delivered and the apps have signaled that they have not longer
  any work to do.

   As a side effect you confine crappy apps with that mechanism in
   some way.

In the laptop case we do not want the tweaks as not going into suspend
might set your backpack on fire.

If I understood you correctly then you can shutdown the CPU in idle
completelty already, but that's not enough due to:

1) crappy applications keeping the cpu away from idle
2) timers firing

Would solving those two issues be sufficient for you or am I missing
something ?

Thanks,

tglx

Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-06-02 Thread Thomas Gleixner

On Wed, 2 Jun 2010, Arve Hjønnevåg wrote:
 2010/6/2 Neil Brown ne...@suse.de:
  There would still need to be some sort of communication between the the
  suspend daemon on any event daemon to ensure that the events had been
  processed.  This could be very light weight interaction.  The point though 
  is
  that with this patch it becomes possible to avoid races.  Possible is better
  than impossible.
 
 
 We already have a solution. I don't think rejecting our solution but
 merging a worse solution should be the goal.

That's not the goal at all. We want a solution which is acceptable for
android and OTOH does not get into the way of other approaches.

The main problem I have is that suspend blockers are only addressing
one particular problem space of power management.

We have more requirements than that, e.g. an active device transfer
requires to prevent the idle code to select a deep power state due to
latency requirements. 

So we then have to implement two mechanisms in the relevant drivers:

   1) telling the idle code to limit latency
   2) telling the suspend code not to suspend

My main interest is to limit it to one mechanism, which is QoS based
and let idle and suspend make the appropriate decisions based on that
information.

Thanks,

tglx

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-06-03 Thread Thomas Gleixner

On Thu, 3 Jun 2010, James Bottomley wrote:

 On Thu, 2010-06-03 at 11:03 +0100, Alan Cox wrote:
   [mtg: ] This has been a pain point for the PM_QOS implementation.  They 
   change the constrain back and forth at the transaction level of the i2c 
   driver.  The pm_qos code really wasn't made to deal with such hot path 
   use, as each such change triggers a re-computation of what the aggregate 
   qos request is.
  
  That should be trivial in the usual case because 99% of the time you can
  hot path
  
  the QoS entry changing is the latest one
  there have been no other changes
  If it is valid I can use the cached previous aggregate I cunningly
  saved in the top QoS entry when I computed the new one
  
  (ie most of the time from the kernel side you have a QoS stack)
 
 It's not just the list based computation: that's trivial to fix, as you
 say ... the other problem is the notifier chain, because that's blocking
 and could be long.  Could we invoke the notifier through a workqueue?
 It doesn't seem to have veto power, so it's pure notification, does it
 matter if the notice is delayed (as long as it's in order)?

It depends on the information type and for a lot of things we might
get away without notifiers. 

The only real issue is when you need to get other cores out of their
deep idle state to make a new constraint work. That's what we do with
the DMA latency notifier right now.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: suspend blockers Android integration

2010-06-04 Thread Thomas Gleixner

On Fri, 4 Jun 2010, Peter Zijlstra wrote:

 On Fri, 2010-06-04 at 11:43 +0200, Peter Zijlstra wrote:
  I still believe containment is a cgroup problem. The freeze/snapshot/resume
  container folks seem to face many of the same problems. Including the
  pending timer one I suspect. Lets solve it there. 
 
 While talking to Thomas about this, we'd probably need a CLOCK_MONOTONIC
 namespace to pull this off, so that resumed apps don't see the jump in
 absolute time.
 
 This would also help with locating the relevant timers, since they'd be
 on the related timer base.
 
 The only 'interesting' issue I can see here is that if you create 1000
 CLOCK_MONOTONIC namepaces, we'd need to have a tree of trees in order to
 efficiently find the leftmost timer.

We can do more clever than that. All CLOCK_MONOTONIC timers can live
in the CLOCK_MONOTONIC rbtree, we just need proper annotation, i.e.:

struct hrtimer {
   ktime_t expires;
   ..
   struct list_head namespace;
   ktime_t base_offset;
};

So expires would be on CLOCK_MONOTONIC as seen from the kernel, just
the user space interfaces would take the base_offset into account.

On freeze we remove the timers from the rbtree (they are easy to
find via the namespace list) and on thaw we set the base_offset
accordingly and insert them again. So no surprise for user space and
no tree of trees to walk through.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: suspend blockers Android integration

2010-06-04 Thread Thomas Gleixner

On Sat, 5 Jun 2010, Rafael J. Wysocki wrote:
 I kind of agree here, so I'd like to focus a bit on that.
 
 Here's my idea in the very general terms:
 
 (1) Use the cgroup freezer to suspend the untrusted apps (ie. the ones
 that don't use suspend blockers aka wakelocks in the Android world) at the
 point Android would normally start opportunistic suspend.

There is an additional benefit to this approach:

 In the current android world a background task (e.g. download
 initiated before the screensaver kicked in) prevents the suspend,
 but that also means that the crapplications can still suck power
 completely unconfined.

 With the cgroup freezer you can suspend them right away and
 just keep the trusted background task(s) alive which allows us to
 go into deeper idle states instead of letting the crapplications
 run unconfined until the download finished and the suspend
 blocker goes away.
 
 (2) Allow the cpuidle framework to put CPUs into low-power states after the
 trusted apps (ie. the ones that use suspend blockers in the Android
 world) have gone idle.
 
 (3) Teach the cpuidle framework to schedule runtime suspend of I/O devices
 before idling the last CPU (*).
 
 (4) Design a mechanism to resume the I/O devices suspended in (3) so that
 they are not powered up unnecessarily (that's going to be difficult as far
 as I can see).
 
 This way, in principle, we should be able to save (at least almost) as much
 energy as the opportunistic suspend currently used by Android, provided that
 things will be capable of staying idle for extended periods of time.
 
 (*) That may require per-device PM QoS requirements to be used, in which case
 devices may even be suspended earlier if the PM QoS requirements of all
 of their users are met.
 
 I wonder what people think.  Is this realistic and if so, would it be 
 difficult
 to implement?

I think it's realistic and not overly complicated to implement.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: suspend blockers Android integration

2010-06-04 Thread Thomas Gleixner

Arve,

On Fri, 4 Jun 2010, Arve Hjønnevåg wrote:

 On Fri, Jun 4, 2010 at 5:05 PM, Thomas Gleixner t...@linutronix.de wrote:
  On Sat, 5 Jun 2010, Rafael J. Wysocki wrote:
  I kind of agree here, so I'd like to focus a bit on that.
 
  Here's my idea in the very general terms:
 
  (1) Use the cgroup freezer to suspend the untrusted apps (ie. the ones
      that don't use suspend blockers aka wakelocks in the Android world) at 
  the
      point Android would normally start opportunistic suspend.
 
  There is an additional benefit to this approach:
 
      In the current android world a background task (e.g. download
      initiated before the screensaver kicked in) prevents the suspend,
      but that also means that the crapplications can still suck power
      completely unconfined.
 
 
 Yes this can happen. It is usually only a big problem when you combine
 an (trusted) application that has a bug that blocks suspend forever
 with an application that wakes up too often for us to enter low power
 idle modes.

Why is it a BUG in the trusted app, when I initiate a download and put
the phone down ?

That download might take a minute or two, but that's not an
justification for the crapplication to run unconfined and prevent
lower power states.

      With the cgroup freezer you can suspend them right away and
      just keep the trusted background task(s) alive which allows us to
      go into deeper idle states instead of letting the crapplications
      run unconfined until the download finished and the suspend
      blocker goes away.
 
 
 Yes this would be better, but I want it in addition to suspend, not
 instead of it. It is also unclear if our user-space code could easily
 make use of it since our trusted code calls into untrusted code.

Sorry, that's really the worst argument I saw in this whole
discussion. 

You're basically saying, that you have no idea what your user space
stack is doing and you do not care at all as long as your suspend
blocker scheme makes things work somehow.

Up to that point, I really tried hard to step back from my initial
OMG, promoting crap is a nono reaction and work with you on a
sensible technical solution to confine crap and make it aligned with
other efforts in this area.

So now, after I spent a reasonable amount of time (as you did) to
understand what your requirements are, you come up with another
restriction which is so outside of any level of sanity, that I'm at
the point of giving up and just going into NAK mode.

Can you please answer the following question:

What is the point of having the distinction of trusted and
untrusted when you have no way to prevent trusted code calling
into untrusted code ?

That's violating any sense of abstraction and layering and makes it
entirely clear that the only way you can deal with your own design
failure is a big hammer which you need to force into the kernel.

Sorry, no. I'm perfectly willing to make progress on that, as long as
we walk on a sane ground. But abusing the kernel for fixing basic
engineering problems in the user space side of affairs is completely
out of discussion.

  I wonder what people think.  Is this realistic and if so, would it be 
  difficult
  to implement?
 
  I think it's realistic and not overly complicated to implement.
 
 
 The kernel support can be easily implemented on most arm hardware, I
 don't know if it can work on most existing x86 hardware. It does not

It does not matter. Even Intel folks told you more than once, that x86
hardware is going to be fixed pretty soon. Hint: that's crucial to
their business 

 give us the same power savings as suspend with existing software, but
 it can handle bad apps better (assuming you don't combine
 opportunistic suspend and cgroup freezing). The biggest hurdle is how
 to handle dependencies between processes that gets frozen and
 processes that don't get frozen.

See above.

Thanks,

tglx

Re: suspend blockers Android integration

2010-06-05 Thread Thomas Gleixner

B1;2005;0cOn Fri, 4 Jun 2010, Arve Hjønnevåg wrote:

 2010/6/4 Thomas Gleixner t...@linutronix.de:
  Arve,
 
  On Fri, 4 Jun 2010, Arve Hjønnevåg wrote:
 
  On Fri, Jun 4, 2010 at 5:05 PM, Thomas Gleixner t...@linutronix.de wrote:
   On Sat, 5 Jun 2010, Rafael J. Wysocki wrote:
   I kind of agree here, so I'd like to focus a bit on that.
  
   Here's my idea in the very general terms:
  
   (1) Use the cgroup freezer to suspend the untrusted apps (ie. the 
   ones
       that don't use suspend blockers aka wakelocks in the Android world) 
   at the
       point Android would normally start opportunistic suspend.
  
   There is an additional benefit to this approach:
  
       In the current android world a background task (e.g. download
       initiated before the screensaver kicked in) prevents the suspend,
       but that also means that the crapplications can still suck power
       completely unconfined.
  
 
  Yes this can happen. It is usually only a big problem when you combine
  an (trusted) application that has a bug that blocks suspend forever
  with an application that wakes up too often for us to enter low power
  idle modes.
 
  Why is it a BUG in the trusted app, when I initiate a download and put
  the phone down ?
 
 
 It is not, but we have had bugs where a trusted app does not unblock
 suspend after some failure case where it is no longer making any
 progress.

Well, that's simply an application bug which sucks battery with or
without suspend blockers. So it's unrelated to the freezing of
untrusted apps while a trusted app still works in the background
before allowing the machine to suspend.
 
  That download might take a minute or two, but that's not an
  justification for the crapplication to run unconfined and prevent
  lower power states.
 
 
 I agree, but this is not a simple problem to solve.

Not with suspend blockers, but with cgroup confinement of crap, it's
straight forward.

       With the cgroup freezer you can suspend them right away and
       just keep the trusted background task(s) alive which allows us to
       go into deeper idle states instead of letting the crapplications
       run unconfined until the download finished and the suspend
       blocker goes away.
  
 
  Yes this would be better, but I want it in addition to suspend, not
  instead of it. It is also unclear if our user-space code could easily
  make use of it since our trusted code calls into untrusted code.
 
  Sorry, that's really the worst argument I saw in this whole
  discussion.
 
  You're basically saying, that you have no idea what your user space
  stack is doing and you do not care at all as long as your suspend
  blocker scheme makes things work somehow.
 
 
 Yes I don't know everything our user-space stack is doing, but I do
 know that it makes many calls between processes (and in both
 directions). As far as I know it uses timeouts when calling into
 untrusted code, so a misbehaving application will cause an error
 dialog to pop up asking if the user if it should wait longer or
 terminate the application.

Sigh, the more I learn about the details of android and it's violation
of all sane engineering principles the more I understand why you
invented a huge nail to push through all layers in order to bring the
system into idle at all. And yes, you need a sledge hammer to drive
that big nail through everything, so you are using the right tool.

Seriously, the cross app call goes through your framework, which
already knows, that the untrusted part is frozen. So it can deal
nicely with it in any way you want including unfreezing.

  Up to that point, I really tried hard to step back from my initial
  OMG, promoting crap is a nono reaction and work with you on a
  sensible technical solution to confine crap and make it aligned with
  other efforts in this area.
 
  So now, after I spent a reasonable amount of time (as you did) to
  understand what your requirements are, you come up with another
  restriction which is so outside of any level of sanity, that I'm at
  the point of giving up and just going into NAK mode.
 
 
 I don't think this is a new restriction. Both Brian and I have
 mentioned that we have a lot of dependencies between processes.
 
  Can you please answer the following question:
 
     What is the point of having the distinction of trusted and
     untrusted when you have no way to prevent trusted code calling
     into untrusted code ?
 
 
 Trusted code that calls into untrusted code has to deal with the
 untrusted code not responding, but we only want to pop up a message
 that the application is not responding if it is misbehaving, not just
 because it was frozen though no fault of its own.

See above.
 
  That's violating any sense of abstraction and layering and makes it
  entirely clear that the only way you can deal with your own design
  failure is a big hammer which you need to force into the kernel.
 
 
 How can it be fixed? The user presses the back button

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-06-05 Thread Thomas Gleixner

On Sat, 5 Jun 2010, Florian Mickler wrote:
 On Sat, 5 Jun 2010 23:26:27 +0300
 Felipe Contreras felipe.contre...@gmail.com wrote:
 
  Supposing there's a perfect usage of suspend blockers from user-space
  on current x86 platforms (in theory Android would have that), is the
  benefit that big to consider this a strong argument in favor of
  suspend blockers? Considering the small amount of x86 platforms using
  Android (is there any?), the fact that nobody else will use suspend
  blockers, and that x86 is already being fixed (as mentioned many times
  before) so dynamic PM is possible, I don't think we should be
  considering current x86 at all for suspend blockers.
 
 A solution for the desktop to deprecate having to shut down the
 machine would not be that bad, wouldn;t it? (Why have I to shut down
 my machine anyway?) 
 
 In my opinion such a decision (when to shutdown) has to be guided by
 userspace knowledge.
 Future x86 hardware is to be fixed (as I read in this discussion), so
 using suspend blockers could be the tool of choice. 
 
 But alright. Let's be a little bit more focused on the present
 situation:
 
 1) There currently is no other solution.

Wrong. The solutions are there, just being refused.

 2) It is a first stepping stone to bringing android to mainline.

Crap. Once we have an userspace API we are bound to it. Period. 

Also there is no technical reason why the whole driver stack cannot
be merged w/o the suspend blockers.

 3) Any perfect solution will emerge anyway. As there are so many
 people with so strong opinions engaged in this discussion, I'm
 confident.

Yes, and that needs cooperation and compromises from both sides and
not just blindly defining that there is no other solution.

 4) If there is a better solution, android will shurely adapt it as
 soon as it is there. 

Sure, after we accepted the crap there is no incentive at all.

Stop that advertising campaing already.

No thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010-06-05 Thread Thomas Gleixner

On Sat, 5 Jun 2010, Florian Mickler wrote:

 On Sat, 5 Jun 2010 23:24:40 +0200 (CEST)
 Thomas Gleixner t...@linutronix.de wrote:
 
  Stop that advertising campaing already.
 
 Stop advertising that there is no problem. 
 
  
  No thanks,
  
  tglx
 
 Cheers,
 Flo
 
 (Sorry, crossfire. Caused
 by you answering in the wrong subthread. I know that you are
 engineering and not dismissing the problem. This was pointed at
 Felipes There is no problem/suspend blockers don't do what they are 
 designed to do well)

Welcome to my killfile.

 
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: suspend blockers Android integration

2010-06-05 Thread Thomas Gleixner

On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
 2010/6/5 Thomas Gleixner t...@linutronix.de:
  B1;2005;0cOn Fri, 4 Jun 2010, Arve Hjønnevåg wrote:
   Why is it a BUG in the trusted app, when I initiate a download and put
   the phone down ?
  
 
  It is not, but we have had bugs where a trusted app does not unblock
  suspend after some failure case where it is no longer making any
  progress.
 
  Well, that's simply an application bug which sucks battery with or
  without suspend blockers. So it's unrelated to the freezing of
  untrusted apps while a trusted app still works in the background
  before allowing the machine to suspend.
 
 
 It is not unrelated if the trusted app has stopped working but still
 blocks suspend. The battery drains when you combine them.

What you are describing is a problem which is not solvable either way.
If you take the lock and do not release it you're not going to
suspend. I never claimed that any other mechanism resolves this.

But this is not related to the fact that freezing crap while running a
sane background task is going to save you power vs. an approach where
running a sane background task allows crap to consume power unconfined
until it is done.

   That download might take a minute or two, but that's not an
   justification for the crapplication to run unconfined and prevent
   lower power states.
  
 
  I agree, but this is not a simple problem to solve.
 
  Not with suspend blockers, but with cgroup confinement of crap, it's
  straight forward.
 
 
 I don't think is is straight forward. If the a process in the frozen
 group holds a resource that a process in the unfrozen group needs, how
 do deal with that?

I'm going to fix the framework which puts the group into freeze state
w/o making sure that there is no held shared resource. Come on it's
not rocket science.
 
  Yes I don't know everything our user-space stack is doing, but I do
  know that it makes many calls between processes (and in both
  directions). As far as I know it uses timeouts when calling into
  untrusted code, so a misbehaving application will cause an error
  dialog to pop up asking if the user if it should wait longer or
  terminate the application.
 
  Sigh, the more I learn about the details of android and it's violation
  of all sane engineering principles the more I understand why you
  invented a huge nail to push through all layers in order to bring the
  system into idle at all. And yes, you need a sledge hammer to drive
  that big nail through everything, so you are using the right tool.
 
  Seriously, the cross app call goes through your framework, which
  already knows, that the untrusted part is frozen. So it can deal
  nicely with it in any way you want including unfreezing.
 
 Cross app calls do not go through a central process.

It's not about a central process, it goes through your framework,
which should be able to deal with it. If not, it's a design failure
which needs to be fixed at the place where the failure happened.
 
 
  How can it be fixed? The user presses the back button, the framework
  determines that app A is in the foreground and send the key to app A,
  app A decides that it it does not have anything internal to go back to
  and tells the framework to switch back to the previous app. If the
  user presses the back key again, the framework does not know which app
  this key should go to until app A has finished processing the first
  key press.
 
  Errm, what has this to do with frozen apps? If your system is
  handling input events then there are no frozen apps and even if they
  are frozen your framework can unfreeze them _before_ talking to them.
 
  So which unfixable problem are you describing with the above example ?
 
 
 You are claiming that trusted code should not have any dependencies on
 untrusted code. I gave you a visible example of such a dependency and
 want you to tell me how you can avoid this dependency. Since you are
 claiming that our user-space framework is fundamentally broken if it
 has to wait for untrusted code, I don't think it is unreasonable for
 you to answer this. Or do you think it is valid to communicate with
 untrusted code when the screen is on but not when it is off.

It does not matter whether the screen is off or not. If you need to
call into that untrusted app from your trusted app and you know about
the might be frozen state then you can deal with it.

So taking your example:

Event happens and gets delivered to the framework

  framework selects A because it is in the foreground
  
  if (A is frozen)
 unfreeze(A)

  deliver_event_to(A)

It's that simple.

If your framework cannot deal with that simple problem then you have a
much more serious problem already.

Thanks,

tglx

Re: suspend blockers Android integration

2010-06-05 Thread Thomas Gleixner

On Sat, 5 Jun 2010, Arjan van de Ven wrote:

 On Sat, 5 Jun 2010 15:26:36 -0700
 Brian Swetland swetl...@google.com wrote:
 
  
  I'm continually surprised by answers like this.  We run on hardware
  that power gates very aggressively and draws in the neighborhood of
  1-2mA at the battery when in the lowest state (3-5mA while the radio
  is connected to the network and paging).  Waking up out of that lowest
  state and executing code every few seconds or (worse) several times a
  second) will raise your average power consumption.  Being able to stay
  parked at the very bottom for minutes or hours at a time when nothing
  interesting is happening is very useful and can have a significant
  impact on overall battery life.
 
 It's relatively simple math.
 
 If you wake up for a burst of work, you burn power at the higher level
 P1 (versus the lower power level P2), for, lets say an average time T,
 with a relatively small T (few milliseconds at most).
 
 If you wake up X times per second (with X being a fractional number, so
 can be smaller than 1) the extra power consumption factor is
 
   X * T * P1
 ---
 X * T * P1 + (1.0 - X * T) * P2
 
 if you draw a graph of this, for real values of P and T, there's a real
 point where you hit diminishing returns.
 
 if say T is 5 milliseconds (that's a high amount), and X is 1
 wakeup/second, then there's already a 200:1 ratio in time an power.
 
 If X goes to once every 10 seconds (not unreasonable, especially since
 any real device will pull email and stuff in the backgroudn), you have
 2000:1 time and power ratios...
 
 Unless your on power is insane high (and hopefully it's not, since
 you're not turning on the whole device obviously, you do selective
 power and clock gating)... that divide by 200 or 2000 makes the whole
 problem go away.. in the seconds range for really low power devices.
 Not in hours range. 

That's the whole problem. Suspend blockers are a binary all on/off
approach so you waste power just to get the thing back to
suspend. They unleash the world and some more just to put it back
into oblivion with brute force.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: suspend blockers Android integration

2010-06-05 Thread Thomas Gleixner

On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
 2010/6/5 Thomas Gleixner t...@linutronix.de:
  On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
That download might take a minute or two, but that's not an
justification for the crapplication to run unconfined and prevent
lower power states.
   
  
   I agree, but this is not a simple problem to solve.
  
   Not with suspend blockers, but with cgroup confinement of crap, it's
   straight forward.
  
 
  I don't think is is straight forward. If the a process in the frozen
  group holds a resource that a process in the unfrozen group needs, how
  do deal with that?
 
  I'm going to fix the framework which puts the group into freeze state
  w/o making sure that there is no held shared resource. Come on it's
  not rocket science.
 
 
 I'm not sure which framework you are talking about here, but I don't
 think there is a single framework that knows about all shared
 resources.

Damn, it's not me talking about our framework, you are mentioning
when it fits your needs.

If you do not have a clearly defined user space framework, then we
talk about a completely random conglomeration of applications which
need to be brought into submission by some global brute force
approach.

I'm tired of this, really. You just use terminlology as it fits to
defend the complete design failure of android. But you fail to trick
me :)

Can you please explain in a consistent way how the application stack
and the underlying framework (which exists according to android docs)
is handling events and how the separation of trust level works ?

We need to know that, otherwise we turn in circles forever.

Thanks,

tglx

Re: suspend blockers Android integration

2010-06-05 Thread Thomas Gleixner

On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
 2010/6/5 Thomas Gleixner t...@linutronix.de:
   Well, that's simply an application bug which sucks battery with or
   without suspend blockers. So it's unrelated to the freezing of
   untrusted apps while a trusted app still works in the background
   before allowing the machine to suspend.
  
 
  It is not unrelated if the trusted app has stopped working but still
  blocks suspend. The battery drains when you combine them.
 
  What you are describing is a problem which is not solvable either way.
  If you take the lock and do not release it you're not going to
  suspend. I never claimed that any other mechanism resolves this.
 
 Whether you claimed it or not, this is the only case where using
 cgroups would have a significant power saving over what we get with
 suspend. The trusted app is idle and the untrusted app is frozen, so
 we enter a low power mode from idle.

Nothing else was what I said and depending on the usage pattern this
can be significant. Just you converted a perfectly sensible technical
argument into a quibble about BUGs in applicatins which are not
confinable by defintion.
 
  But this is not related to the fact that freezing crap while running a
  sane background task is going to save you power vs. an approach where
  running a sane background task allows crap to consume power unconfined
  until it is done.
 
 If the task that is blocking suspend is using the cpu anyway, then the
 bad app does not increase the power consumption nearly as much as if
 the task that blocked suspend is idle.

That's utter bullshit. If the app missed to release the supsend
blocker then your crappy while(1); app is killing you in no time,
while the same frozen crappy while(1); does no harm at all.
 
Thanks,

tglx

Re: suspend blockers Android integration

2010-06-05 Thread Thomas Gleixner

On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
 2010/6/5 Thomas Gleixner t...@linutronix.de:
  On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
  2010/6/5 Thomas Gleixner t...@linutronix.de:
   B1;2005;0cOn Fri, 4 Jun 2010, Arve Hjønnevåg wrote:
  Cross app calls do not go through a central process.
 
  It's not about a central process, it goes through your framework,
  which should be able to deal with it. If not, it's a design failure
  which needs to be fixed at the place where the failure happened.
 
  
   How can it be fixed? The user presses the back button, the framework
   determines that app A is in the foreground and send the key to app A,
   app A decides that it it does not have anything internal to go back to
   and tells the framework to switch back to the previous app. If the
   user presses the back key again, the framework does not know which app
   this key should go to until app A has finished processing the first
   key press.
  
   Errm, what has this to do with frozen apps? If your system is
   handling input events then there are no frozen apps and even if they
   are frozen your framework can unfreeze them _before_ talking to them.
  
   So which unfixable problem are you describing with the above example ?
  
 
  You are claiming that trusted code should not have any dependencies on
  untrusted code. I gave you a visible example of such a dependency and
  want you to tell me how you can avoid this dependency. Since you are
  claiming that our user-space framework is fundamentally broken if it
  has to wait for untrusted code, I don't think it is unreasonable for
  you to answer this. Or do you think it is valid to communicate with
  untrusted code when the screen is on but not when it is off.
 
  It does not matter whether the screen is off or not. If you need to
  call into that untrusted app from your trusted app and you know about
  the might be frozen state then you can deal with it.
 
  So taking your example:
 
  Event happens and gets delivered to the framework
 
       framework selects A because it is in the foreground
 
       if (A is frozen)
          unfreeze(A)
 
       deliver_event_to(A)
 
  It's that simple.
 
 
 That is too simple. You also have to prevent A from being frozen while
 it is processing the event or the result would be the same as if it
 was frozen beforehand.

The framework decides when to freeze the app in the first place (as
your framework does now when it decides to suspend)
 
 So it knows whether the app is frozen or not.

 So it knows damend well whether it processed the event or not.

Thanks,

tglx

Re: suspend blockers Android integration

2010-06-06 Thread Thomas Gleixner

On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
 2010/6/5 Thomas Gleixner t...@linutronix.de:
  On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
  That is too simple. You also have to prevent A from being frozen while
  it is processing the event or the result would be the same as if it
  was frozen beforehand.
 
  The framework decides when to freeze the app in the first place (as
  your framework does now when it decides to suspend)
 
      So it knows whether the app is frozen or not.
 
      So it knows damend well whether it processed the event or not.
 
 
 Our user-space code is not single-threaded. So just because an app was
 not frozen when you checked does not mean it will remain unfrozen. We
 can use the same user-space wakelock api we have now to prevent
 freezing apps instead of preventing suspend, but we loose any
 advantage we get from freezing just a subset of processes this way.

Errm. That does not matter whether its single threaded or not. And
right, you have to prevent that it gets frozen while you are calling
into it.

But that does not change the fact that you can do finer grained power
control even in the case when suspend is impossible because a
background application has work to finish and does that without
requiring interaction with the frozen part.

That's what I pointed out in the first place and you just argue in
circles why it is impossible to do so.

Let me recapitulate:

   Full on state:   No difference because everything runs
   Full suspend state:  No difference because everything is down

   Screen off, background work active:

  Suspend blocker held by the active background work lets
  other applications which are unrelated consume CPU cycles
  and power.

  versus

  Frozen apps restrict the CPU cycles and power consumption to
  the background work (if there are no interactions with
  frozen tasks) and therefor save more power than the on/off
  approach.

  If your user space stack cannot be distangled that way, then
  it's a problem of your user space stack and not changing the
  fact, that a well designedd system allows you to do that.

Any objections ?

Thanks,

tglx

Re: suspend blockers Android integration

2010-06-06 Thread Thomas Gleixner

On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
 2010/6/5 Thomas Gleixner t...@linutronix.de:
  On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
  2010/6/5 Thomas Gleixner t...@linutronix.de:
   On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
 That download might take a minute or two, but that's not an
 justification for the crapplication to run unconfined and prevent
 lower power states.

   
I agree, but this is not a simple problem to solve.
   
Not with suspend blockers, but with cgroup confinement of crap, it's
straight forward.
   
  
   I don't think is is straight forward. If the a process in the frozen
   group holds a resource that a process in the unfrozen group needs, how
   do deal with that?
  
   I'm going to fix the framework which puts the group into freeze state
   w/o making sure that there is no held shared resource. Come on it's
   not rocket science.
  
 
  I'm not sure which framework you are talking about here, but I don't
  think there is a single framework that knows about all shared
  resources.
 
  Damn, it's not me talking about our framework, you are mentioning
  when it fits your needs.
 
 You said you were going to fix the framework. I did know if you were
 talking about the cgroup framework, or the android user-space
 frameworks. I don't think either has knowledge about all shared
 resources.

The cgroup freezer makes sure that there are no in kernel resources
blocked. Of course the user space side has to do the same and it's not
rocket science.
 
 
  If you do not have a clearly defined user space framework, then we
  talk about a completely random conglomeration of applications which
  need to be brought into submission by some global brute force
  approach.
 
  I'm tired of this, really. You just use terminlology as it fits to
  defend the complete design failure of android. But you fail to trick
  me :)
 
  Can you please explain in a consistent way how the application stack
  and the underlying framework (which exists according to android docs)
  is handling events and how the separation of trust level works ?
 
 
 I don't think I can, since I only know small parts of it. I know some

Sigh. That's the main reason why this discussion goes nowhere.

How in heavens sake can we make a decision whether suspend blockers
are the right and only way to go, when the people 

 events like input event go though a single thread in our system
 process, while other events like network packets (which are also
 wakeup events) goes directly to the app.

Re: suspend blockers Android integration

2010-06-06 Thread Thomas Gleixner

On Sat, 5 Jun 2010, Arve Hjønnevåg wrote:
 2010/6/5 Thomas Gleixner t...@linutronix.de:
 
  Can you please explain in a consistent way how the application stack
  and the underlying framework (which exists according to android docs)
  is handling events and how the separation of trust level works ?
 
 
 I don't think I can, since I only know small parts of it. I know some

Sigh, thats the whole reason why this discussion goes nowhere.

How in heavens sake should we be able to decide whether suspend
blockers are the right and only thing which solves a problem, when the
folks advocating suspend blockers are not able to explain the problem
in the first place ?

 events like input event go though a single thread in our system
 process, while other events like network packets (which are also
 wakeup events) goes directly to the app.

Yes, we know that already, but that's a completely useless information
as it does not describe the full constraints and dependencies.

Lemme summarize:

  Android needs suspend blockers, because it works, but cannot explain
  why it works and why it only works that way.

A brilliant argument to merge them - NOT.

Thanks,

tglx

Re: [linux-pm] suspend blockers Android integration

2010-06-06 Thread Thomas Gleixner

On Sun, 6 Jun 2010, James Bottomley wrote:

  3. We've lost sight of one of the original goals, which was to
 bring the android tree close enough to the kernel so that the
 android downstream driver and board producers don't have to
 choose between the android kernel and vanilla kernel.

There are two ways to do that w/o creating a dependcy on anything.

1) merge the drivers w/o the suspend_blockers. It's not rocket science
   to have a patch which brings them back for android.

2) merge the drivers with empty stub implementations for annotation.
   android just has to patch in the real one.

While I'd prefer #1, I' not in the way of #2.

Both ways can get the drivers into the kernel and it could/should have
been done right from the beginning, but now we face a situation where
drivers are held hostage.

Then we can sit down more relaxed and fix the stuff in a way which
makes both sides happy. If we manage to replace them, we can deprecate
the stub implementation and remove it after a grace period. If we
rename them it's not an issue either. We can rename them right away to
a qos interface, but that does not really make a difference.

What we really want to avoid is implementing an user space contract in
a frenzy which binds us forever.

It's not the suspend_blockers which are the causing the nightmare,
it's solely the drivers itself especially when there are different
implementations in both trees. And frankly, the drivers in android are
not in a shape which makes them flood in within 2 weeks. That's
serious work to get them brushed up and polished. So that gives us
quite a period of time to solve the suspend problem.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] suspend blockers Android integration

2010-06-06 Thread Thomas Gleixner

James,

On Sun, 6 Jun 2010, James Bottomley wrote:

 On Sun, 2010-06-06 at 17:46 +0200, Thomas Gleixner wrote:
  On Sun, 6 Jun 2010, James Bottomley wrote:
  
3. We've lost sight of one of the original goals, which was to
   bring the android tree close enough to the kernel so that the
   android downstream driver and board producers don't have to
   choose between the android kernel and vanilla kernel.
  
  There are two ways to do that w/o creating a dependcy on anything.
  
  1) merge the drivers w/o the suspend_blockers. It's not rocket science
 to have a patch which brings them back for android.
 
 Well, we sort of tried this when Greg pulled some of them into the
 staging tree.  The problem is that without the annotations, the drivers
 are still different, and patches won't apply, so, unsurprisingly, they
 didn't get improved or even maintained.
 
  2) merge the drivers with empty stub implementations for annotation.
 android just has to patch in the real one.
 
 That's also possible.  This time, we would have a cosmetically closer
 tree ... however, what's in the kernel wouldn't be compilable for
 android ... which is where all the downstream wants to test, so they'd
 still be building for the android tree ... we just might have an easier
 time of it picking up their fixes.

The downstream users will be bound to the android tree anyway until
the full set of drivers for a given platform is completely merged. So
optimistically that would be 2.6.36, which gives us a couple of
months to sort out the whole thing.

Once a driver is merged mainline and the android tree switched over to
use the mainline version, fixes can be sent both ways and that's not a
real problem.

  While I'd prefer #1, I' not in the way of #2.
 
 I think 1 is unviable ... I'm not opposed to 2 but I'd like to try to
 get the kernel really closer to android before we go for the cosmetic
 only option.
 
  Both ways can get the drivers into the kernel and it could/should have
  been done right from the beginning, but now we face a situation where
  drivers are held hostage.
  
  Then we can sit down more relaxed and fix the stuff in a way which
  makes both sides happy. If we manage to replace them, we can deprecate
  the stub implementation and remove it after a grace period. If we
  rename them it's not an issue either. We can rename them right away to
  a qos interface, but that does not really make a difference.
  
  What we really want to avoid is implementing an user space contract in
  a frenzy which binds us forever.
 
 Well, that's why the QoS proposal ... it already has a userspace API ...
 we'd just be extending it for statistics, which looks like a wothwhile
 goal independent of android anyway.

Right, but there is no dependency for the driver merge on that.
 
  It's not the suspend_blockers which are the causing the nightmare,
  it's solely the drivers itself especially when there are different
  implementations in both trees. And frankly, the drivers in android are
  not in a shape which makes them flood in within 2 weeks. That's
  serious work to get them brushed up and polished. So that gives us
  quite a period of time to solve the suspend problem.
 
 Right, so the sooner we make it easier for the drivers to use the kernel
 as their main repository, the better.

Yep, the fastest way is to provide two stub inlines in pm.h and let
the driver flood come in.

I think all of us involved in that can do with a break, where we sit
back, calm down and rethink w/o time pressure.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] suspend blockers Android integration

2010-06-06 Thread Thomas Gleixner

On Sun, 6 Jun 2010, Brian Swetland wrote:

 On Sun, Jun 6, 2010 at 11:04 AM, Thomas Gleixner t...@linutronix.de wrote:
 
  Right, so the sooner we make it easier for the drivers to use the kernel
  as their main repository, the better.
 
  Yep, the fastest way is to provide two stub inlines in pm.h and let
  the driver flood come in.
 
 As mentioned previously, we didn't know this was an option (stubs
 without an implementation behind them).  If it is, and thus we can

That's what Greg did when he pulled stuff into staging, but there is
no reason not to do it outside of staging as well. We can simply put
the stub functions into Documentation/feature-removal.txt to ensure
that we don't forget about them :)

 simplify the driver merging process short-term while sorting out a
 long-term implementation or replacement for suspend blockers, then I
 think we're making real progress.

Yes, that way we do not lose the annotations. Replacing them, ripping
them out or whatever we agree on, is a nobrainer.

But it gets the drivers into the tree, so they are usable outside of
android as well and the delta between android and mainline shrinks
significantly.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] suspend blockers Android integration

2010-06-06 Thread Thomas Gleixner

Brian,

On Sun, 6 Jun 2010, Brian Swetland wrote:

 On Sun, Jun 6, 2010 at 12:24 PM, Christoph Hellwig h...@infradead.org wrote:
 
  Yes.  That's what makes me wonder about some parts of the discussion
  here.  Getting the drivers for one or more of the android plattforms
  in is not a problem.  I'd say it could have easily been done with the
  manweeks spent arguing in this and related threads.
 
  The much bigger issues is to get android userspace running on a more
  or less vanilla kernel, that is withoutthe  binder, without the rather
  interesting group ID based security hack^H^H^H^Hmodel, without the
  suspend blocker userspace API and so on, and so on.
 
 Somebody will have to broker a deal with the frameworks/apps folks to
 get rid of the binder.  They like it a lot.  Of course if somebody
 built a drop-in replacement for the userspace side that didn't require
 a kernel driver, had the same performance characteristics, solved the
 same problems, etc, they could probably make an argument for it (or
 just provide it as a drop-in replacement for people who want a more
 pure linux underneath Android, even if we didn't pick it up).
 
 The group ID stuff works incredibly well for gating device access --
 we ensure that devices that need access from various processes end up
 with perms like 0660 root audio (say for a raw audio interface), and
 then we assure that processes which have the may use audio hardware
 permission are executed with audio as an additional group.  We ended
 up using the same model to control socket, raw socket, and bt socket
 access because at the time we could not find a reasonable way to grant
 or exclude such permissions on a process by process basis.
 Maintaining about 20-30 lines of diffs to make that work was not a bad
 tradeoff (and we don't expect those patches to go upstream).  If
 there's a way to accomplish this without patching the kernel, we're
 all ears.
 
 While we do maintain some assorted patches to core code (like the
 permissions thing), we strongly prefer to keep our work localized to
 drivers (generic to android or specific to hardware), and try to
 migrate to common kernel features when possible, submit patches (like
 work Mike Chan is doing around cpufreq and power measurement), etc.
 Patches to core code cause more headaches when we rebase up to new
 kernel versions, after all.

thanks for the honest answer and thanks to Christoph for the
reminder! 

That takes a lot of the bullshit arguments about downstream users
being hurt out of the discussion. The above problems are way more
complex to resolve than the suspend blocker details.

That's another prove why we can let the drivers flow in (in the worst
case w/o the suspend blocker stubs) and have no pressure to resolve
the suspend blocker problem yesterday.

That said, after thinking more about it, I'm advocating the stubs
solution with a clear removal / decision date constraint
(e.g. 2.6.37), as it forces all involved parties to stay tuned and not
to forget about it. I'm curious about the outcome :)

Thanks,

tglx

Re: [linux-pm] suspend blockers Android integrationy

2010-06-06 Thread Thomas Gleixner

Alan,

On Sun, 6 Jun 2010, Alan Stern wrote:

 On Sun, 6 Jun 2010, Matthew Garrett wrote:
 
  The difference between idle-based suspend and opportunistic suspend is 
  that the former will continue to wake up for timers and will never be 
  entered if something is using CPU, whereas the latter will be entered 
  whenever no suspend blocks are held. The problem with opportunistic 
  suspend is that you might make the decision to suspend simultaneusly 
  with a wakeup event being received. Suspend blocks facilitate 
  synchronisation between the kernel and userspace to ensure that all such 
  events have been consumed and handld appropriately.
 
 Remember that suspend takes place in several phases, the first of which 
 is to freeze tasks.  The phases can be controlled individually by the 
 process carrying out a suspend, and there's nothing to prevent you from 
 stopping after the freezer phase.  Devices won't get powered down, but 
 Android uses aggressive runtime power management for its devices 
 anyway.
 
 If you do this then the synchronization can be carried out entirely
 from userspace, with no need for kernel modifications such as suspend
 blockers. And since Android can reach essentially the same low-power
 state from idle as from suspend, it appears that they really don't need
 any kernel changes at all.

Well there are some things to solve:

 1) the confinement of untrusted apps 

 2) the confinement of trusted apps firing periodic timers. 

 Aside of that they need to shut off undesired interrupt sources,
 but that's not a real problem to solve and probably possible
 today already.

#1 can be solved elegantly by cgroups. We know how to freeze the
   cgroup timers when the need arises, but that's not a real problem
   as all periodic timers are firing only once when the target app
   does not handle them.

   Though the cgroup based approach of freezing timers might be
   interesting for containers c/r as well and it might be necessary to
   emulate the suspend semantics of freezing CLOCK_MONOTONIC, but
   that's simple to do and basically no overhead.

#2 is a tad harder, as it requires to fix the trusted apps not to fire
   timers when there is nothing to do.

   Though you can solve it with cgroups as well. The unfreeze problem
   for real wakeups can be solved as mhelsley pointed out somewhere
   else in this thread.

But that depends on user space changes 

Though as we learned today that suspend blockers are the least of the
problems which android is facing vs. mainline and we have a plan to
get the drivers in we can relax a bit and think more about it.

I'm not saying that we should ignore the shortcomings of todays code,
but in the face of hardware which perfectly goes into the same power
state from idle as it does from suspend we can IMNSHO safely ignore
the x86/ACPI/BIOS crap and the user space wreckage (see above) and
just focus on a sane design based on current/future hardware.

That's nothing new, we do not go and make NOHZ/HIGHRES work on crappy
hardware either, even if there have been patches around to do so. We
do not make broken TSCs work, even if there are brute force ways to do
so. 

It's more sane to say Sorry, it does not work on your system than
trying to make it work under all circumstances for a questionable
benefit and paying the price for it in terms of maintainability and/or
complexity.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] suspend blockers Android integrationy

2010-06-07 Thread Thomas Gleixner

Alan,

On Mon, 7 Jun 2010, Alan Stern wrote:
 On Mon, 7 Jun 2010, Thomas Gleixner wrote:
  #2 is a tad harder, as it requires to fix the trusted apps not to fire
 timers when there is nothing to do.
 
 No; all you have to do is handle the trusted apps as though they were 
 untrusted -- just as in the original wakelock approach.
 
 Though you can solve it with cgroups as well. The unfreeze problem
 for real wakeups can be solved as mhelsley pointed out somewhere
 else in this thread.
  
  But that depends on user space changes 
 
 If you handle all the apps uniformly, very few userspace changes are
 needed.

Oh, I see. I misunderstood you. -ENOTENOUGHSLEEP
 
Thanks,

Thomas
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Timekeeping issue on aggressive suspend/resume

2010-06-09 Thread Thomas Gleixner

On Wed, 9 Jun 2010, Suresh Rajashekara wrote:

 I have an application (running on 2.6.29-omap1) which puts an OMAP1
 system to suspend aggressively. The system wakes up every 4 seconds
 and stays awake for about 35 milliseconds and sleeps again for another
 4 seconds. This design is to save power on a battery operated device.
 
 This aggressive suspend resume action seems like creating an issue to
 other applications in the system waiting for some timeout to happen
 (especially an application which is waiting using the mq_timedreceive
 and is supposed to timeout every 30 seconds. It seems to wake up every
 90 seconds). Seems like the timekeeping is not happening properly in
 side the kernel.
 
 If the suspend duration is changed from 4 second to 1 second, then
 things work somewhat better. On reducing it to 0.5 second (which was
 our earlier design on 2.6.16-rc3), the problem seems to disappear.
 
 Is this expected?

Yes, that's caused by the fact that suspend (via sys/power/state )
freezes the kernel internal timers and the user space visible timers
which are based on CLOCK_MONOTONIC or jiffies (like mq_timedreceive on
your .29 kernel). Only CLOCK_REALTIME based timers are kept correct as
we have to align to the wall clock time.

The reason for this is, that otherwise almost all timers are expired
when we resume and we get a thundering herd of apps and kernel
facilities due to firing timeouts.

Another problem is that jiffies can wrap around on 32 bit systems
during a long suspend though I don't think that's a real world problem
as it takes between 49 to 497 days of suspend depending on the HZ
setting. SO for your usecase it would not matter.

I'm more concerned about code getting surprised by firing timers as
the kernel has this behaviour for a long time now.

Though we could change that conditionally - the default would still be
the freeze of jiffies and CLOCK_MONOTONIC for historical compability.

There will be probably some accounting issues. uptime, cpu time of the
suspend task and some others, but that needs to be found out.

Thanks,

tglx



--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Timekeeping issue on aggressive suspend/resume

2010-06-14 Thread Thomas Gleixner

On Mon, 14 Jun 2010, john stultz wrote:

 On Mon, 2010-06-14 at 00:46 -0700, Suresh Rajashekara wrote:
  On Thu, Jun 10, 2010 at 12:52 PM, john stultz johns...@us.ibm.com wrote:
   I think Thomas was suggesting that you consider creating a option for
   where CLOCK_MONOTONIC included total_sleep_time.
  
   In that case the *hack* (and this is a hack, we'll need some more
   thoughtful discussion before anything like it could make it upstream)
   would be in timekeeping_resume() to comment out the lines that update
   wall_to_monotonic and total_sleep_time.
  
   It would be interesting to hear if that hack works for you, and we can
   try to come up with a better way to think about how to accommodate both
   views of how to account time over suspend.
  
  Thanks.
  
  I tried this fix. It seemed to help, however the accuracy of sleep
  time for the process was not quite right. A process thread which was
  supposed to wake up every (X) seconds, seemed to wake up every (X -
  delta X) seconds.
 
 Ah, the sleep time is probably too coarse (seconds). We probably need to
 increase the granularity from read_persistent_clock() and see if that
 helps (although most persistent clocks aren't very fine grained).
 
  Also another side effect of this change was that the system time was
  no longer in sync with the wall time.
 
 ? This doesn't make much sense to me, as you shouldn't be manipulating
 xtime differently.
 
 Just to be clear, you mean the value from date doesn't match your
 watch after resume? 
 
  These problems were more pronounced when the suspend/wakeup cycle time
  was brought down to 0.5 seconds from 4 seconds. The periodicity of
  most of the process threads were disturbed.
  
  I decided to NOT suspend/resume the timekeeping subsystem in the
  kernel and try. It seemed to work. Every application seems to work
  fine.
  
  Now my question is; Is it safe to disable suspend/resume of the
  timekeeping subsystem? Will it have an effect (on
  functionality/performance) which may not surface in my short
  experiments?
 
 Well, the difficultly here is what folks actually mean by suspend. On
 some hardware it means everything is powered off, and so on resume we
 have to re-init hardware values.
 
 It seems in your case that the hardware isn't completely powered off,
 since the clocksource you're using seemed to continue counting while the
 system was suspended. 
 
 So in this case you might be ok. Your suspend seems closer to an deep
 idle state on x86. So suspending timekeeping might not be necessary.
 
 However, you're right that there may be lurking issues:
 
 1) The suspend time would have to be limited to the clocksource's
 max_idle_ns value, since after that amount of cycles have past, we might
 overflow the accumulation function, or the clocksource may have wrapped.
 
 2) If the hardware does reset the clocksource at some point during the
 suspend, you'll have odd time issues.
 
 3) You could run into some difficulty keeping close sync with an NTP
 server, as the long delays between accumulation will probably cause an
 oscillating over-shoot and over-correction.
 
 I suspect these different definitions of suspend on all of the
 different hardware types out there is going to be a growing problem in
 the near term. Especially as deep idle states start to power off more
 hardware and becomes closer to suspend in behavior.

I don't think it's really an issue. Such hardware uses a 32.768kHz
driven (RTC alike) clocksource/event which is never powered off and
not affected by suspend/resume unless you run out of battery. That
hardware provides sub second resolution (~30us) contrary to the PC
style RTC which gives you seconds only. That's really good enough for
timekeeping, NOHZ and even HIGHRES.

The NTP sync might become an issue for real long sleep times, but
that's an NTP problem and needs to be addressed seperately.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 13/25] OMAP4: PM: Add WakeupGen module as OMAP gic_arch_extn

2011-09-09 Thread Thomas Gleixner

On Fri, 9 Sep 2011, Santosh wrote:

 On Thursday 08 September 2011 11:57 PM, Kevin Hilman wrote:
  Santosh Shilimkarsantosh.shilim...@ti.com  writes:
  
   OMAP WakeupGen is the interrupt controller extension used along
   with ARM GIC to wake the CPU out from low power states on
   external interrupts.
   
   The WakeupGen unit is responsible for generating wakeup event
   from the incoming interrupts and enable bits. It is implemented
   in MPU always ON power domain. During normal operation,
   WakeupGen delivers external interrupts directly to the GIC.
   
   Signed-off-by: Santosh Shilimkarsantosh.shilim...@ti.com
   Cc: Kevin Hilmankhil...@ti.com
  
  [...]
  
   +#ifdef CONFIG_PM
   +/*
   + * Masking wakeup irqs is handled by the IRQCHIP_MASK_ON_SUSPEND flag,
   + * so no action is necessary in set_wake, but implement an empty handler
   + * here to prevent enable_irq_wake() returning an error.
   + * FIXME: Remove the dummy handler once gen irq code fix above.
   + */
  
  Just curious... is there a fix for this pending for v3.2?
  
 I have sent a proposed change to Thomas but have not seen
 any response yet from him.
 
 http://www.spinics.net/lists/arm-kernel/msg134297.html

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 0a7840a..cd4bc01 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c

@@ -467,6 +467,9 @@ static int set_irq_wake_real(unsigned int irq, unsigned int 
on)

   struct irq_desc *desc = irq_to_desc(irq);
   int ret = -ENXIO;

+  if (irq_desc_get_chip(desc)-flags  IRQCHIP_MASK_ON_SUSPEND)
+ return 0;
+
if (desc-irq_data.chip-irq_set_wake)
   ret = desc-irq_data.chip-irq_set_wake(desc-irq_data, on);

The flag says: MASK ON SUSPEND and it does not imply that you don't
need a wake function. There might be cases where you want to setup
stuff in that function in order to have the wakeup happen on that
interrupt line despite of the mask on suspend.

We either need a separate flag or a global dummy set_wake function in
the core to avoid empty copies all over the place.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 13/25] OMAP4: PM: Add WakeupGen module as OMAP gic_arch_extn

2011-09-09 Thread Thomas Gleixner

On Fri, 9 Sep 2011, Santosh wrote:
 On Friday 09 September 2011 12:49 PM, Thomas Gleixner wrote:
  
  The flag says: MASK ON SUSPEND and it does not imply that you don't
  need a wake function. There might be cases where you want to setup
  stuff in that function in order to have the wakeup happen on that
  interrupt line despite of the mask on suspend.
  
 I see your point.
 
  We either need a separate flag or a global dummy set_wake function in
  the core to avoid empty copies all over the place.
  
 A flag is probably better since you mentioned that on some arch, there
 might be need to have actual set_wake() handler. Or if the global
 dummy can be over-ridden by platform, that's fine too.

Global dummy would mean:

int irq_set_wake_dummy(...)
{
return 0;
}

And you just add it to your chip, but either way I don't care whether
it's a dummy function or a flag.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 13/25] OMAP4: PM: Add WakeupGen module as OMAP gic_arch_extn

2011-09-12 Thread Thomas Gleixner

On Fri, 9 Sep 2011, Santosh wrote:

 On Friday 09 September 2011 01:48 PM, Thomas Gleixner wrote:
  On Fri, 9 Sep 2011, Santosh wrote:
   On Friday 09 September 2011 12:49 PM, Thomas Gleixner wrote:

The flag says: MASK ON SUSPEND and it does not imply that you don't
need a wake function. There might be cases where you want to setup
stuff in that function in order to have the wakeup happen on that
interrupt line despite of the mask on suspend.

   I see your point.
   
We either need a separate flag or a global dummy set_wake function in
the core to avoid empty copies all over the place.

   A flag is probably better since you mentioned that on some arch, there
   might be need to have actual set_wake() handler. Or if the global
   dummy can be over-ridden by platform, that's fine too.
  
  Global dummy would mean:
  
  int irq_set_wake_dummy(...)
  {
  return 0;
  }
  
  And you just add it to your chip, but either way I don't care whether
  it's a dummy function or a flag.
  
 Will below patch work for you then ?
 Attaching the same in case mailer damages it.

It does :)
 
 Regards,
 Santosh
 
 From d63d4347dc8fb144b19f4d4e7c0621397cccea94 Mon Sep 17 00:00:00 2001
 From: Santosh Shilimkar santosh.shilim...@ti.com
 Date: Fri, 9 Sep 2011 13:59:35 +0530
 Subject: [PATCH] irq: Add IRQCHIP_SKIP_SET_WAKE flag to avoid need of dummy
 set_wake() handler.
 
 Certain IRQCHIP's may not need to install the irq_set_wake() handler if
 the IRQCHIP_MASK_ON_SUSPEND flag is set. But but if it's not implemented,
 enable_irq_wake() will return an error.  That needs the IRQCHIP to install
 an empty set_wake handler.
 
 Add an 'IRQCHIP_SKIP_SET_WAKE' flag so that IRQCHIP can inform the core
 irq code that irq_set_wake() handler is not necessary.
 
 Signed-off-by: Santosh Shilimkar santosh.shilim...@ti.com
 Cc: Thomas Gleixner t...@linutronix.de

Queued to irq/core for 3.2

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: ARM SoC tree: OMAP PM dependency on tip irq/core

2011-10-07 Thread Thomas Gleixner

On Fri, 7 Oct 2011, Arnd Bergmann wrote:

 On Friday 07 October 2011, Kevin Hilman wrote:
   I've pulled in rmk/devel-stable as a dependency now, thanks for
   reminding me of that.
  
   Thomas, where should I get the irq-core branch (or whichever
   I should wait for) to pull in as another dependency. Is that
   branch one that never gets rebased? 
  
  git://tesla.tglx.de/git/linux-2.6-tip irq/core
  
  I asked Thomas about this earlier when I was going to build up the
  dependencies myself, and he said it won't be rebased.
 
 Ok, thanks for the info.
 
 I think I've now also come up with a workflow for tracking the dependencies:
 I have a depends/xxx branch for each other branch that I need to wait for
 getting merged first. When I want to send a pull request, I first check
 all the depends/* branches using 'git branch --merged next/xxx |
 grep depends' to see what the dependencies are, and 'git branch --merged
 torvalds/master | grep depends' to see if they are already merged upstream.
 When a dependency is already merged, I can remove its tracking branch
 from the arm-soc tree.

For your internal dependecies, yes. But for irq/core you don't have to
wait. You just need to tell Linus in the pull request that you pulled
my branch with my ack as it contains modifications which are
prerequisite for arm/whatever.

When I send my pull request later, then this wont do any damage as
Linus has the commits already. Same the other way round. We do that
all the time, otherwise pull dependencies would be a nightmare.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Seeking clarity on IRQCHIP_MASK_ON_SUSPEND

2012-09-10 Thread Thomas Gleixner

On Mon, 10 Sep 2012, NeilBrown wrote:
 
 The IRQCHIP_MASK_ON_SUSPEND flag seems to be hard to use correctly, so either
 I'm understanding it wrongly, or it could be made easier to use.
 If the first case, I'm hoping that some improvement to documentation might
 result.  If the second, then maybe we can fix the code.
... 
 Is anyone able to give a definitive answer on this?  Should
 IRQCHIP_MASK_ON_SUSPEND be removed?

The whole point of IRQCHIP_MASK_ON_SUSPEND is to deal with hardware
designed by geniuses.

Most SoCs have a way to mark the interrupts which serve as a wake up
source as such. All other interrupts are magically masked on entry
to suspend.

Now there is hardware which is missing such a control, so we need to
mask the non wakeup interrupts right before going into suspend.

That's what IRQCHIP_MASK_ON_SUSPEND does. Not more, not less. See
commit d209a699a0b for more ugly details.

You might be looking for a different functionality. Can you explain
what you need?

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Seeking clarity on IRQCHIP_MASK_ON_SUSPEND

2012-09-10 Thread Thomas Gleixner

On Mon, 10 Sep 2012, Shilimkar, Santosh wrote:
 Thomas,
 
 On Mon, Sep 10, 2012 at 3:58 PM, Thomas Gleixner t...@linutronix.de wrote:
  On Mon, 10 Sep 2012, NeilBrown wrote:
 
  The IRQCHIP_MASK_ON_SUSPEND flag seems to be hard to use correctly, so 
  either
  I'm understanding it wrongly, or it could be made easier to use.
  If the first case, I'm hoping that some improvement to documentation might
  result.  If the second, then maybe we can fix the code.
  ...
  Is anyone able to give a definitive answer on this?  Should
  IRQCHIP_MASK_ON_SUSPEND be removed?
 
  The whole point of IRQCHIP_MASK_ON_SUSPEND is to deal with hardware
  designed by geniuses.
 
  Most SoCs have a way to mark the interrupts which serve as a wake up
  source as such. All other interrupts are magically masked on entry
  to suspend.
 
 Just to support this, IRQCHIP_MASK_ON_SUSPEND does work with quite
 a few ARM platforms too. OMAP already uses it in mainline and I have
 seen patches for U500 and Tegra SOCs. Most of these usages are with
 IRQ chips who doesn't have any driver run time PM supported and
 the IRQ CHIP itself is shutdown when the CPU/CPU cluster gets
 power down. So as far as functionality concerned with the flag,
 it does what it suppose to do.
 
  Now there is hardware which is missing such a control, so we need to
  mask the non wakeup interrupts right before going into suspend.
 
  That's what IRQCHIP_MASK_ON_SUSPEND does. Not more, not less. See
  commit d209a699a0b for more ugly details.
 
  You might be looking for a different functionality. Can you explain
  what you need?
 
 Neil's email came from a discussion on the usage of this flag for
 OMAP GPIO irqchip which I proposed. With the flag, when the
 lazy check_irq routine is called, the GPIO driver is runtime suspended
 and hence the late mask/unmask calls take abort(clocks are already gated).
 GPIO IRQCHIP is secondary IRQCHIP connected to 1 interrupt line
 per bank(32 GPIOs) to the primary interrupt controller IRQCHIP.

So for this thing you need a IRQCHIP_MASK_BEFORE_SUSPEND variant? 

--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Seeking clarity on IRQCHIP_MASK_ON_SUSPEND

2012-09-11 Thread Thomas Gleixner

On Tue, 11 Sep 2012, NeilBrown wrote:
 On Mon, 10 Sep 2012 12:28:35 +0200 (CEST) Thomas Gleixner
 t...@linutronix.de wrote:
  You might be looking for a different functionality. Can you explain
  what you need?
 
 I want as particular GPIO interrupt to be masked before entering suspend.
 I produced code to get the -suspend() callback to perform this masking.
 Another developer (Santosh) felt that IRQCHIP_MASK_ON_SUSPEND was the
 preferred way to do this and on the surface this looks like it should be
 correct.  However it doesn't work as explained previously.
 I want a resolution to this difference of opinion that doesn't just sweep the
 issue under that carpet but provides a clear answer to this sort of issue.
 
 My current opinion is that IRQCHIP_MASK_ON_SUSPEND should be discarded.  The
 patch which introduced it says:
 
 Rather than working around this in the affected interrupt chip
 implementations we can solve this elegant in the core code itself.
 
 It appears that the solution in core code, while elegant, is wrong.  It
 happens too late to be generally usable.  It is easy enough to handle this

Sigh. The flag was invented with the semantics to keep the current
check for any interrupt pending functionality alive and then mask it
right before going down, so only the designated wakeup interrupts can
wake the device. That was the result of the discussion back then and
that was what the developer wanted to achieve with his driver suspend
hackery.

 issue in the interrupt chip drivers so maybe that is the best place
 to handle it.

And have the same keep track of wakeup interrupts code over and over
in the drivers.
 
 The the very least I think we need a big comment saying the
 IRQCHIP_MASK_ON_SUSPEND can only be used for irqchips which can always be
 programmed, even when they are suspended from an runtime-PM perspective,
 and that those chips must handle masking in their 'suspend' callback.

Sigh, no. Either we make IRQCHIP_MASK_ON_SUSPEND into an
implementation which masks the interrupts early, if the existing users
find this acceptable or have a separate IRQCHIP_MASK_BEFORE_SUSPEND
flag.

This GPIO driver at hand is probably not the last one which requires
this and we really want to do stuff in the core code instead of having
random implementations of the same stuff all over the place.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] tty: omap-serial: fix boot hang by converting to use a threaded IRQ handler (was Re: [PATCH] irq: always set IRQF_ONESHOT if no primary handler is specified)

2011-08-23 Thread Thomas Gleixner

Linus,

On Tue, 23 Aug 2011, Linus Torvalds wrote:
 but the there is no difference for others seems to be total crap,
 exactly because it results in this IRQF mismatch.
 
 So I think that commit should just be reverted. Thomas?

Yes. I missed that detail when I applied that patch. Reverting it
seems to be the only sensible solution.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] genirq: provide means to retrigger parent

2012-10-23 Thread Thomas Gleixner

On Tue, 23 Oct 2012, Kevin Hilman wrote:

 Russell King - ARM Linux li...@arm.linux.org.uk writes:
 
  On Tue, Oct 16, 2012 at 03:07:49PM -0700, Kevin Hilman wrote:
  From: Thomas Gleixner t...@linutronix.de
  
  Attempts to retrigger nested threaded IRQs currently fail because they
  have no primary handler.  In order to support retrigger of nested
  IRQs, the parent IRQ needs to be retriggered.
  
  To fix, when an IRQ needs to be resent, if the interrupt has a parent
  IRQ and runs in the context of the parent IRQ, then resend the parent.
  
  Also, handle_nested_irq() needs to clear the replay flag like the
  other handlers, otherwise check_irq_resend() will set it and it will
  never be cleared.  Without clearing, it results in the first resend
  working fine, but check_irq_resend() returning early on subsequent
  resends because the replay flag is still set.
  
  Problem discovered on ARM/OMAP platforms where a nested IRQ that's
  also a wakeup IRQ happens late in suspend and needed to be retriggered
  during the resume process.
  
  Reported-by: Kevin Hilman khil...@ti.com
  Tested-by: Kevin Hilman khil...@ti.com
  [khil...@ti.com: changelog edits, clear IRQS_REPLAY in handle_nested_irq()]
  Signed-off-by: Thomas Gleixner t...@linutronix.de
 
  Umm, we also have the converse situation.  We have platforms where the
  resend has to be done from the child IRQ, and the parent must not be
  touched.  I hope that doesn't break those.
 
 I'm assuming the child IRQs you're concerned with are not threaded,
 right?  This patch only addresses nested, threaded IRQs, and these don't
 have a primary handler to run at all, so cannot do any triggering.

And it involves that you activly set the parent irq via the new
interface: irq_set_parent()

You don't have that yet or you don't use that in your future changes,
then you're good. :)

Thanks,

tglx


--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/3] IRQ: allow check_wakeup_irqs to notice level-triggered interrupts.

2012-05-04 Thread Thomas Gleixner

Neil,

On Fri, 4 May 2012, NeilBrown wrote:

 On Wed, 25 Apr 2012 14:54:54 +0200 (CEST) Thomas Gleixner
 t...@linutronix.de wrote:
 
  Why not simply managing the pending bit for level irqs ?
  
 
 Hi Thomas,
  thanks again for the patch.  I finally made time to test it and it works as
 expected.  I've included it below with a change-log entry and tested-by:
 in case that helps.

thanks for testing. The changelog is great. You know how to make the
live of lazy buggers easier :)
  
   for_each_irq_desc(irq, desc) {
 - if (irqd_is_wakeup_set(desc-irq_data)) {
 + if (desc-depth == 1 
 + irqd_is_wakeup_set(desc-irq_data)) {
   if (desc-istate  IRQS_PENDING)
   return -EBUSY;
   continue;

I split that part into a separate patch, as it's really a different
issue.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs

2011-04-27 Thread Thomas Gleixner

On Wed, 27 Apr 2011, Menon, Nishanth wrote:
 On Wed, Apr 27, 2011 at 12:49, Colin Cross ccr...@google.com wrote:
  I proposed in a different thread on LKML that DVFS be handled within
  the generic clock implementation.  Platforms would register a
  regulator and a table of voltages for each struct clock that required
  DVFS, and the voltages would be changed on normal clk_* requests.
  This maintains compatibility with existing clk_* calls.
 
 It is upto SoC frameworks to implement the transitions. E.g. lets look
 at scalability: How'd the mechanism proposed work with temperature
 variances: Example: I dont want to hit 1.5GHz if temp 70C - wont it
 be an SoC specific hack I'd need to introduce?

Why is limiting the max core frequency depending on temperature a SoC
specific problem ?

Everyone wants to do that. x86 does it in hardware / SMM, other
architectures want the kernel to take care of it.

So the decision is simple. Something wants to set core freq to 1.5
GHz, so it calls clk_set_rate() and there we consult the DVFS code
first to validate that setting. If it can be set, fine, then DVFS will
set the voltages _before_ we change the frequency or it will simply
veto the change because one of the preliminaries for such a change is
not given.

Please stop thinking that your SoC is sooo special. It's NOT.

The HW concepts are quite similar all over the place, they are just
named differently and use different IP blocks with slightly different
functionality, but the problems are not unique to a particular SoC at
all.

 All OPP framework does is store that maps, and leaves it to users to
 choose regulators, clock framework variances, SoC temperature sensors
 or what ever mechanisms they choose to allow through a transition.

That's how it's implemented, but that does not say that the design is
correct and usable for more than the usecase it was modeled after.
 
We are looking into a common clock framework, which abstracts out the
duplicated functionality of the various implementations and reduces
them to the real thing: hardware drivers. So we really need to look
into that DVFS problem as well, simply because it is tightly coupled
and not a complete separate entity.

And looking at the struct clk disaster we really don't want another
incarnation in terms of DVFS where we end up with the same decision
functions in various SoCs over and over.

Thanks,

tglx

Re: [linux-pm] [RFC PATCH] PM: Introduce generic DVFS framework with device-specific OPPs

2011-04-27 Thread Thomas Gleixner

On Wed, 27 Apr 2011, Menon, Nishanth wrote:

 OPP table is just a storage and retrieval mechanism, it is upto SoC
 frameworks to choose the most adequate of solutions - e.g. OMAP has
 omap_device, hwmod and a clock framework for more intricate control to
 work in conjunction with cpuidle frameworks as well.

Can you please stop thinking about OMAP for a minute?

A clock framework is nothing SoC specific. A framework is an
abstraction of common HW functionality, which implements general
functionality and relies on the HW specific part to configure it and
to provide access to the hardware itself.

clocks are ordered as trees in HW, simply because you cannot have a
clock consumer be driven by more than one active clock at the same
time. A clock consumer may select a different clock producer, but that
merily changes the tree structure nothing else. So why should every
SoC implement it's own (different buggy) version of tree handling and
call it framework?

Yes, I know you might argue that some devices need two clocks enabled
to be functional. That's correct, but coupling those clocks at the
framework level is the wrong thing to do. If a device needs both an
interface clock and a separate interconnect clock to work, then it
needs to enable both clocks and become a consumer of them.
 
 There is cross domain dependency which OMAP (yet to be pushed to
 mainline) has - example: when OMAP4's MPUs are at a certain OPP, L3
 (OMAP's SoC bus) needs to be at least a certain OPP - these are
 framework which may be very custom to OMAP itself.

Wrong again. That's not a framework when you hack SoC specific
decision functions into it. It's the OMAP internal hackery to make
stuff work, but that's far from a framework.

What you are describing is a restriction which can be expressed in
tables or rules which are fed into a general framework.

Look at generic irqs, generic timekeeping, generic clockevents and
tons of other real frameworks in the kernel. They abstract out
concepts and provide generic interfaces rather than claiming that the
problem is unique to a particular piece of silicon.

Forget OMAP implementation details for a while, sit back and look at
the big picture.

Thanks,

tglx


--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

1 2 >

1 - 100 of 150 matches

Mail list logo