Re: [RFC PATCH 0/2] cpufreq: Introduce LAB cpufreq governor.

2013-04-10 Thread Lorenzo Pieralisi
On Wed, Apr 10, 2013 at 09:44:52AM +0100, Lukasz Majewski wrote:

[...]

  Have you also looked at the power clamp driver, which has a similar
  target ?
 
 I might be wrong here, but in my opinion the power clamp driver is a bit
 different:
 
 1. It is dedicated to Intel SoCs, which provide a special set of
 registers (i.e. MSR_PKG_Cx_RESIDENCY [*]) that force a processor to
 enter a certain C state for a given duration. Idle duration is calculated
 by a per-CPU set of high priority kthreads (which also program the [*]
 registers).
 

Those registers are used for compensation (i.e. the user asked for a given
idle ratio but the HW stats show a mismatch); they are not programmed,
they are just read. That code is Intel specific but it can easily be ported
to ARM; I did that, and most of the code is common with zero dependency
on the architecture.

 2. ARM SoCs don't have such infrastructure, so we depend on SW here.

Well, it is true that most of the SoCs I am working on do not have
a programming interface to monitor C-state residency, granted, this is
a problem. If those stats can be retrieved somehow (I did that on our TC2
platform) then power clamp can be used on ARM with minor modifications.
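
To illustrate the compensation idea described above (the requested idle
ratio is compared with what the residency stats report, and the injected
idle time is corrected in software), here is a minimal user-space sketch.
It is not the powerclamp driver code: the function names and the simple
"add the error back" policy are assumptions made for illustration only.

```c
#include <assert.h>

/* Hypothetical helper: clamp a value into [lo, hi]. */
static int clamp_int(int val, int lo, int hi)
{
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}

/*
 * Return the idle ratio (in percent) to inject in the next period.
 *
 * target_pct:   idle ratio requested by the user
 * measured_pct: idle ratio reported by the HW residency stats
 *
 * The stats are only read; the correction is applied in software.
 */
static int compensated_idle_pct(int target_pct, int measured_pct)
{
	int error;

	assert(target_pct >= 0 && target_pct <= 100);

	/* If the HW reports less idle than requested, ask for more. */
	error = target_pct - measured_pct;

	return clamp_int(target_pct + error, 0, 100);
}
```

With a 50% target and 40% measured residency, the sketch asks for 60% in
the next period; the result is always clamped to the 0..100 range.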

Lorenzo

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 4/7] tick: Handle broadcast wakeup of multiple cpus

2013-03-13 Thread Lorenzo Pieralisi
Hi Thomas,

On Wed, Mar 06, 2013 at 11:18:35AM +, Thomas Gleixner wrote:
 Some brilliant hardware implementations wake multiple cores when the
 broadcast timer fires. This leads to the following interesting
 problem:
 
 CPU0                        CPU1
 wakeup from idle            wakeup from idle
 
 leave broadcast mode        leave broadcast mode
  restart per cpu timer       restart per cpu timer
                              go back to idle
 handle broadcast
  (empty mask)
                             enter broadcast mode
                             program broadcast device
 enter broadcast mode
 program broadcast device
 
 So what happens is that due to the forced reprogramming of the cpu
 local timer, we need to set an event in the future. Now if we manage to
 go back to idle before the timer fires, we switch off the timer and
 arm the broadcast device with an already expired time (covered by
 forced mode). So in the worst case we repeat the above ping pong
 forever.
   
 Unfortunately we have no information about what caused the wakeup, but
 we can check current time against the expiry time of the local cpu. If
 the local event is already in the past, we know that the broadcast
 timer is about to fire and send an IPI. So we mark ourself as an IPI
 target even if we left broadcast mode and avoid the reprogramming of
 the local cpu timer.
 
 This still leaves the possibility that a CPU which is not handling the
 broadcast interrupt is going to reach idle again before the IPI
 arrives. This can't be solved in the core code and will be handled in
 follow up patches.
 
 Reported-by: Jason Liu liu.h.ja...@gmail.com
 Signed-off-by: Thomas Gleixner t...@linutronix.de
 ---
  kernel/time/tick-broadcast.c |   59 
 ++-
  1 file changed, 58 insertions(+), 1 deletion(-)
 
 Index: tip/kernel/time/tick-broadcast.c
 ===
 --- tip.orig/kernel/time/tick-broadcast.c
 +++ tip/kernel/time/tick-broadcast.c
 @@ -393,6 +393,7 @@ int tick_resume_broadcast(void)
  
  static cpumask_var_t tick_broadcast_oneshot_mask;
  static cpumask_var_t tick_broadcast_pending_mask;
 +static cpumask_var_t tick_broadcast_force_mask;
  
  /*
   * Exposed for debugging: see timer_list.c
 @@ -462,6 +463,10 @@ again:
   }
   }
  
 + /* Take care of enforced broadcast requests */
 + cpumask_or(tmpmask, tmpmask, tick_broadcast_force_mask);
 + cpumask_clear(tick_broadcast_force_mask);

I tested the set and it works fine on a dual cluster big.LITTLE testchip
using broadcast timer to manage deep idle cluster states.

Just asking a question: the force mask is cleared before sending the
timer IPI. Would it not be better to clear it after the IPI is sent in

tick_do_broadcast(...) ?

Can you spot a regression if we do this ? The idle thread checks that
mask with irqs disabled, so it is possible that we clear the mask before
the CPU has a chance to get the IPI. If we clear the mask after sending
the IPI, we are increasing the chances for the idle thread to get it.

It is just a further optimization, just asking, thanks.
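
To make the ordering question concrete, here is a small user-space sketch
of the two variants. It models the force mask as a plain bitmask; the type
and function names are made up for the example and are not the kernel's
cpumask API.

```c
#include <assert.h>

typedef unsigned int cpumask_sketch_t;	/* one bit per CPU, sketch only */

static cpumask_sketch_t force_mask;	/* models tick_broadcast_force_mask */

/* Idle path, IRQs disabled: is this CPU about to be hit by the timer IPI? */
static int cpu_in_force_mask(int cpu)
{
	assert(cpu >= 0 && cpu < 32);
	return (force_mask >> cpu) & 1u;
}

/*
 * Broadcast handler, ordering as in the patch: fold the force mask into
 * the set of IPI targets and clear it before the IPI is sent.
 */
static cpumask_sketch_t collect_targets_clear_first(cpumask_sketch_t expired)
{
	cpumask_sketch_t targets = expired | force_mask;

	force_mask = 0;
	return targets;		/* caller sends the IPI to these CPUs */
}

/* Alternative ordering: leave the mask alone until the IPI has been sent. */
static cpumask_sketch_t collect_targets(cpumask_sketch_t expired)
{
	return expired | force_mask;
}

static void clear_after_ipi_sent(void)
{
	force_mask = 0;
}
```

In the first variant a CPU that checks the mask between the clear and the
arrival of the IPI no longer sees its bit; in the second one the bit stays
visible until the IPI has been sent.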

 +
   /*
* Wakeup the cpus which have an expired event.
*/
 @@ -497,6 +502,7 @@ void tick_broadcast_oneshot_control(unsi
   struct clock_event_device *bc, *dev;
   struct tick_device *td;
   unsigned long flags;
 + ktime_t now;
   int cpu;
  
   /*
 @@ -524,7 +530,16 @@ void tick_broadcast_oneshot_control(unsi
   WARN_ON_ONCE(cpumask_test_cpu(cpu, tick_broadcast_pending_mask));
   if (!cpumask_test_and_set_cpu(cpu, tick_broadcast_oneshot_mask)) {
   clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
 - if (dev->next_event.tv64 < bc->next_event.tv64)
 + /*
 +  * We only reprogram the broadcast timer if we
 +  * did not mark ourself in the force mask and
 +  * if the cpu local event is earlier than the
 +  * broadcast event. If the current CPU is in
 +  * the force mask, then we are going to be
 +  * woken by the IPI right away.
 +  */
 + if (!cpumask_test_cpu(cpu, tick_broadcast_force_mask) &&
Is the test against tick_broadcast_force_mask necessary if we add the check
in the idle thread before entering idle ? It does not hurt, agreed, and we'd
better leave it there, it is just for my own understanding, thanks a lot.

Having said that, on the series:

Tested-by: Lorenzo Pieralisi lorenzo.pieral...@arm.com

 + dev->next_event.tv64 < bc->next_event.tv64)
   tick_broadcast_set_event(dev->next_event, 1);
   }
   } else {
 @@ -545,6 +560,47 @@ void tick_broadcast_oneshot_control(unsi

Re: too many timer retries happen when do local timer swtich with broadcast timer

2013-02-21 Thread Lorenzo Pieralisi
Hi Jason,

On Thu, Feb 21, 2013 at 06:16:51AM +, Jason Liu wrote:
 2013/2/20 Thomas Gleixner t...@linutronix.de:
  On Wed, 20 Feb 2013, Jason Liu wrote:
  void arch_idle(void)
  {
  
  clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, cpu);
 
  enter_the_wait_mode();
 
  clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, cpu);
  }
 
  when the broadcast timer interrupt arrives(this interrupt just wakeup
  the ARM, and ARM has no chance
  to handle it since local irq is disabled. In fact it's disabled in
  cpu_idle() of arch/arm/kernel/process.c)
 
  the broadcast timer interrupt will wake up the CPU and run:
 
  clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, cpu); ->
  tick_broadcast_oneshot_control(...);
  ->
  tick_program_event(dev->next_event, 1);
  ->
  tick_dev_program_event(dev, expires, force);
  ->
  for (i = 0;;) {
          int ret = clockevents_program_event(dev, expires, now);
          if (!ret || !force)
                  return ret;
 
          dev->retries++;
  
          now = ktime_get();
          expires = ktime_add_ns(now, dev->min_delta_ns);
  }
  clockevents_program_event(dev, expires, now);
 
  delta = ktime_to_ns(ktime_sub(expires, now));
 
  if (delta <= 0)
          return -ETIME;
 
  When the bc timer interrupt arrives, the last local timer has
  expired too, so clockevents_program_event will return -ETIME, which
  causes dev->retries++ when retrying to program the expired timer.
 
  In the worst case, after re-programming the expired timer, the CPU
  enters idle again before the re-programmed timer expires, which makes
  the system ping-pong forever,
 
  That's nonsense.
 
 I don't think so.
 
 
  The timer IPI brings the core out of the deep idle state.
 
  So after returning from enter_wait_mode() and after calling
  clockevents_notify() it returns from arch_idle() to cpu_idle().
 
  In cpu_idle() interrupts are reenabled, so the timer IPI handler is
  invoked. That calls the event_handler of the per cpu local clockevent
  device (the one which stops in C3). That ends up in the generic timer
  code which expires timers and reprograms the local clock event device
  with the next pending timer.
 
  So you cannot go idle again, before the expired timers of this event
  are handled and their callbacks invoked.
 
 That's true for the CPUs which do not respond to the global timer interrupt.
 Take our platform as an example: we have 4 CPUs (CPU0, CPU1, CPU2, CPU3).
 The global timer device keeps running even in deep idle mode, so it
 can be used as the broadcast timer device. The interrupt of this device
 is only raised to CPU0 when the timer expires; CPU0 then broadcasts the
 timer IPI to the other CPUs which are in deep idle mode.
 
 So for CPU1, CPU2 and CPU3 you are right: the timer IPI brings them out of
 the idle state, and after running clockevents_notify() they return from
 arch_idle() to cpu_idle(). After local_irq_enable(), the IPI handler is
 invoked; it handles the expired timers and re-programs the next pending
 timer.
 
 But that's not true for CPU0. The flow for CPU0 is:
 the global timer interrupt wakes up CPU0, which then calls:
 clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, cpu);
 
 which will cpumask_clear_cpu(cpu, tick_get_broadcast_oneshot_mask());
 in the function tick_broadcast_oneshot_control(),

For my own understanding: at this point in time CPU0 local timer is
also reprogrammed, with min_delta (ie 1us) if I got your description
right.
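
For reference, the forced reprogramming path quoted above can be modelled
in user space as follows. This is only a sketch with a simulated clock:
the struct and function names are invented for the example, and only the
"return -ETIME for an expired event, retry with now + min_delta when
forced" behaviour is taken from the quoted code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct clkdev_sketch {
	int64_t now_ns;		/* simulated current time */
	int64_t min_delta_ns;	/* smallest programmable delta */
	int64_t next_event_ns;	/* programmed expiry */
	unsigned long retries;	/* models dev->retries */
};

/* Models clockevents_program_event(): refuse events that already expired. */
static int program_event(struct clkdev_sketch *dev, int64_t expires_ns)
{
	if (expires_ns - dev->now_ns <= 0)
		return -ETIME;
	dev->next_event_ns = expires_ns;
	return 0;
}

/* Models the forced path: on -ETIME, retry with now + min_delta. */
static int program_event_forced(struct clkdev_sketch *dev, int64_t expires_ns,
				int force)
{
	/* A positive min_delta guarantees that the retry succeeds. */
	assert(dev->min_delta_ns > 0);

	for (;;) {
		int ret = program_event(dev, expires_ns);

		if (!ret || !force)
			return ret;
		dev->retries++;
		expires_ns = dev->now_ns + dev->min_delta_ns;
	}
}
```

With an already expired event and force set, the sketch counts one retry
and programs the device min_delta in the future, which is the situation
described for CPU0.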

 
 After returning from clockevents_notify(), it returns from arch_idle()
 to cpu_idle(); after local_irq_enable(), CPU0 responds to the global
 timer interrupt and calls the interrupt handler
 tick_handle_oneshot_broadcast():
 
 static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
 {
         struct tick_device *td;
         ktime_t now, next_event;
         int cpu;
 
         raw_spin_lock(&tick_broadcast_lock);
 again:
         dev->next_event.tv64 = KTIME_MAX;
         next_event.tv64 = KTIME_MAX;
         cpumask_clear(to_cpumask(tmpmask));
         now = ktime_get();
         /* Find all expired events */
         for_each_cpu(cpu, tick_get_broadcast_oneshot_mask()) {
                 td = &per_cpu(tick_cpu_device, cpu);
                 if (td->evtdev->next_event.tv64 <= now.tv64)
                         cpumask_set_cpu(cpu, to_cpumask(tmpmask));
                 else if (td->evtdev->next_event.tv64 < next_event.tv64)
                         next_event.tv64 = td->evtdev->next_event.tv64;
         }
 
         /*
          * Wakeup the cpus which have an expired event.
          */
         tick_do_broadcast(to_cpumask(tmpmask));
         ...
 }
 
 Since cpu0 has been removed from tick_get_broadcast_oneshot_mask(), if
 cpu1/2/3 all stay idle and have no expired timers, tmpmask will be 0
 when tick_do_broadcast() is called.
 
 static void 

Re: too many timer retries happen when do local timer swtich with broadcast timer

2013-02-22 Thread Lorenzo Pieralisi
On Thu, Feb 21, 2013 at 10:19:18PM +, Thomas Gleixner wrote:
 On Thu, 21 Feb 2013, Santosh Shilimkar wrote:
  On Thursday 21 February 2013 07:18 PM, Thomas Gleixner wrote:
   find below a completely untested patch, which should address that issue.
   
  After looking at the thread, I tried to see the issue on OMAP and could
  see the same issue as Jason.
 
 That's interesting. We have the same issue on x86 since 2007 and
 nobody noticed ever. It's basically the same problem there, but it
 seems that on x86 getting out of those low power states is way slower
 than the minimal reprogramming delta which is used to enforce the
 local timer to fire after the wakeup. 
 
 I'm still amazed that as Jason stated a 1us reprogramming delta is
 sufficient to get this ping-pong going. I somehow doubt that, but
 maybe ARM is really that fast :)

It also depends on when the idle driver exits broadcast mode.
Certainly if that's the last thing it does before enabling IRQs, that
might help trigger the issue.

I am still a bit sceptical myself too, and I take advantage of Thomas'
knowledge on the subject, which is way deeper than mine BTW, to ask a
question. The thread started with the subject "too many retries" and
here I have a doubt. If the fix is not applied, on the CPU affine to
the broadcast timer, it is _normal_ to have local timer retries, since
the CPU is setting/forcing the local timer to fire after min_delta_ns every
time the expired event was local to the CPU affine to the broadcast timer.

The problem, supposedly, is that the timer does not have enough time (sorry
for the mouthful) to expire (fire) before IRQs are disabled and the idle
thread goes
back to idle again. This means that we should notice a mismatch between
the number of broadcast timer IRQs and local timer IRQs on the CPU
affine to the broadcast timer IRQ (granted, we also have to take into
account broadcast timer IRQs meant to service (through IPI) a local timer
expired on a CPU which is not the one running the broadcast IRQ handler and
normal local timer IRQs as well).

I am not sure the sheer number of retries is a symptom of the problem
happening, but I might well be mistaken so I am asking.

For certain, with the fix applied, lots of duplicate IRQs on the CPU
affine to the broadcast timer are eliminated, since the local timer is
not reprogrammed anymore (before the fix, basically the broadcast timer
was firing an IRQ that did nothing since the CPU was already out of
broadcast mode by the time the broadcast handler was running, the real job
was carried out in the local timer handler).

 
  Your patch fixes the retries on both CPUs on my dual core machine. So
  you can use my Tested-by if you need one.
 
 They are always welcome.
 
  Tested-by: Santosh Shilimkar santosh.shilim...@ti.com

You can add mine too, we should fix the WARN_ON_ONCE mentioned in Santosh's
reply.

Tested-by: Lorenzo Pieralisi lorenzo.pieral...@arm.com



Re: too many timer retries happen when do local timer swtich with broadcast timer

2013-02-22 Thread Lorenzo Pieralisi
On Fri, Feb 22, 2013 at 10:24:00AM +, Thomas Gleixner wrote:
 On Fri, 22 Feb 2013, Santosh Shilimkar wrote:
  BTW, Lorenzo off-list mentioned to me about warning in boot-up
  which I missed while testing your patch. It will take bit more
  time for me to look into it and hence thought of reporting it.
  
  [2.186126] [ cut here ]
  [2.190979] WARNING: at kernel/time/tick-broadcast.c:501
  tick_broadcast_oneshot_control+0x1c0/0x21c()
 
 Which one is that? tick_broadcast_pending or tick_force_broadcast_mask ?

It is the tick_force_broadcast_mask and I think that's because on all
systems we are testing, the broadcast timer IRQ is a thundering herd,
all CPUs get out of idle at once and try to get out of broadcast mode
at more or less the same time.

Lorenzo



Re: too many timer retries happen when do local timer swtich with broadcast timer

2013-02-22 Thread Lorenzo Pieralisi
On Fri, Feb 22, 2013 at 12:07:30PM +, Thomas Gleixner wrote:
 On Fri, 22 Feb 2013, Santosh Shilimkar wrote:
 
  On Friday 22 February 2013 04:01 PM, Lorenzo Pieralisi wrote:
   On Fri, Feb 22, 2013 at 10:24:00AM +, Thomas Gleixner wrote:
On Fri, 22 Feb 2013, Santosh Shilimkar wrote:
 BTW, Lorenzo off-list mentioned to me about warning in boot-up
 which I missed while testing your patch. It will take bit more
 time for me to look into it and hence thought of reporting it.
 
 [2.186126] [ cut here ]
 [2.190979] WARNING: at kernel/time/tick-broadcast.c:501
 tick_broadcast_oneshot_control+0x1c0/0x21c()

Which one is that? tick_broadcast_pending or tick_force_broadcast_mask ?
   
   It is the tick_force_broadcast_mask and I think that's because on all
   systems we are testing, the broadcast timer IRQ is a thundering herd,
   all CPUs get out of idle at once and try to get out of broadcast mode
   at more or less the same time.
   
  So the issue comes up only when the idle state used is one where the CPUs
  wake up at more or less the same time, as Lorenzo mentioned. I have two
  platforms where I could test the patch, and I see the issue on only one
  of them.
  
  Yesterday I didn't notice the warning because it wasn't seen on that
  platform :-) OMAP4 idle entry and exit in deep state is staggered
  between CPUs and hence the warning isn't seen. On OMAP5 though,
  there is an additional C-state where idle entry/exit for CPU
  isn't staggered and I see the issue in that case.
  
  Actually the broad-cast code doesn't expect such a behavior
  from CPUs since only the broad-cast affine CPU should wake
  up and rest of the CPU should be woken up by the broad-cast
  IPIs.
 
 That's what I feared. We might have the same issue on x86, depending
 on the cpu model.
 
 But thinking more about it. It's actually not a real problem, just
 pointless burning of cpu cycles.
 
 So on the CPU which gets woken along with the target CPU of the
 broadcast the following happens:
 
   deep_idle()
                         <-- spurious wakeup
   broadcast_exit()
     set forced bit
 
   enable interrupts
 
                         <-- Nothing happens
 
   disable interrupts
 
   broadcast_enter()
                         <-- Here we observe the forced bit is set
   deep_idle()
 
 Now after that the target CPU of the broadcast runs the broadcast
 handler and finds the other CPU in both the broadcast and the forced
 mask, sends the IPI and stuff gets back to normal.
 
 So it's not actually harmful, just more evidence for the theory, that
 hardware designers have access to very special drug supplies.
 
 Now we could make use of that and avoid going deep idle just to come
 back right away via the IPI. Unfortunately the notification thingy has
 no return value, but we can fix that.
 
 To confirm that theory, could you please try the hack below and add
 some instrumentation (trace_printk)?

Applied, and it looks like that's exactly why the warning triggers, at least
on the platform I am testing on which is a dual-cluster ARM testchip.

There is still a time window, though, where the CPU (the IPI target) can get
back to idle (tick_broadcast_pending still not set) before the CPU target of
the broadcast has a chance to run tick_handle_oneshot_broadcast (and set
tick_broadcast_pending), or am I missing something ?
It is a corner case, granted. The best thing would be to check pending IRQs in
the idle driver back-end (or have always-on local timers :-)).
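
The forced-bit handshake discussed in this thread can be sketched in user
space as follows. The names are invented for the example and the mask is a
plain bitmask; the only behaviour assumed from the patch description is
that a CPU whose local event is already in the past marks itself as an IPI
target instead of reprogramming its local timer.

```c
#include <assert.h>
#include <stdint.h>

static unsigned int forced_cpus;	/* models the force mask, one bit per CPU */

/*
 * Broadcast exit. Returns 1 if the local timer has to be reprogrammed,
 * 0 if the CPU marked itself as an IPI target instead.
 */
static int broadcast_exit_sketch(int cpu, int64_t now_ns,
				 int64_t local_expiry_ns)
{
	assert(cpu >= 0 && cpu < 32);

	if (local_expiry_ns <= now_ns) {
		/* Local event is in the past: the broadcast IPI is imminent. */
		forced_cpus |= 1u << cpu;
		return 0;
	}
	return 1;
}

/*
 * Check done by the idle code with IRQs disabled before going deep:
 * non-zero means the IPI is expected and deep idle is pointless.
 */
static int broadcast_ipi_expected(int cpu)
{
	assert(cpu >= 0 && cpu < 32);
	return (forced_cpus >> cpu) & 1u;
}
```

A CPU woken together with the broadcast target, with an expired local
event, sets its bit on exit and sees it again on the next idle entry.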

Thanks,
Lorenzo



Re: too many timer retries happen when do local timer swtich with broadcast timer

2013-02-22 Thread Lorenzo Pieralisi
On Fri, Feb 22, 2013 at 03:03:02PM +, Thomas Gleixner wrote:
 On Fri, 22 Feb 2013, Lorenzo Pieralisi wrote:
  On Fri, Feb 22, 2013 at 12:07:30PM +, Thomas Gleixner wrote:
   Now we could make use of that and avoid going deep idle just to come
   back right away via the IPI. Unfortunately the notification thingy has
   no return value, but we can fix that.
   
   To confirm that theory, could you please try the hack below and add
   some instrumentation (trace_printk)?
  
  Applied, and it looks like that's exactly why the warning triggers, at least
  on the platform I am testing on which is a dual-cluster ARM testchip.
  
  There is still a time window, though, where the CPU (the IPI target) can get
  back to idle (tick_broadcast_pending still not set) before the CPU target of
  the broadcast has a chance to run tick_handle_oneshot_broadcast (and set
  tick_broadcast_pending), or am I missing something ?
 
 Well, the tick_broadcast_pending bit is uninteresting if the
 force_broadcast bit is set. Because if that bit is set we know for
 sure, that we got woken with the cpu which gets the broadcast timer
 and raced back to idle before the broadcast handler managed to
 send the IPI.

Gah, my bad sorry, I mixed things up. I thought

tick_check_broadcast_pending()

was checking against the tick_broadcast_pending mask not

tick_force_broadcast_mask

as it correctly does.

All clear now.

Thanks a lot,
Lorenzo



Re: too many timer retries happen when do local timer swtich with broadcast timer

2013-02-25 Thread Lorenzo Pieralisi
On Fri, Feb 22, 2013 at 06:52:14PM +, Thomas Gleixner wrote:
 On Fri, 22 Feb 2013, Lorenzo Pieralisi wrote:
  On Fri, Feb 22, 2013 at 03:03:02PM +, Thomas Gleixner wrote:
   On Fri, 22 Feb 2013, Lorenzo Pieralisi wrote:
On Fri, Feb 22, 2013 at 12:07:30PM +, Thomas Gleixner wrote:
 Now we could make use of that and avoid going deep idle just to come
 back right away via the IPI. Unfortunately the notification thingy has
 no return value, but we can fix that.
 
 To confirm that theory, could you please try the hack below and add
 some instrumentation (trace_printk)?

Applied, and it looks like that's exactly why the warning triggers, at least
on the platform I am testing on which is a dual-cluster ARM testchip.

There is still a time window, though, where the CPU (the IPI target) can get
back to idle (tick_broadcast_pending still not set) before the CPU target of
the broadcast has a chance to run tick_handle_oneshot_broadcast (and set
tick_broadcast_pending), or am I missing something ?
   
   Well, the tick_broadcast_pending bit is uninteresting if the
   force_broadcast bit is set. Because if that bit is set we know for
   sure, that we got woken with the cpu which gets the broadcast timer
   and raced back to idle before the broadcast handler managed to
   send the IPI.
  
  Gah, my bad sorry, I mixed things up. I thought
  
  tick_check_broadcast_pending()
  
  was checking against the tick_broadcast_pending mask not
  
  tick_force_broadcast_mask
 
 Yep, that's a misnomer. I just wanted to make sure that my theory is
 correct. I need to think about the real solution some more.
 
 We have two alternatives:
 
 1) Make the clockevents_notify function have a return value.
 
 2) Add something like the hack I gave you with a proper name.
 
 The latter has the beauty, that we just need to modify the platform
 independent idle code instead of going down to every callsite of the
 clockevents_notify thing.

Thank you.

I am not sure (1) would buy us anything compared to (2) and as you said we
would end up patching all callsites so (2) is preferred.

As I mentioned, we can even just apply your fixes and leave platform-specific
code to deal with this optimization. At the end of the day the idle driver
just has to check pending IRQs/wake-up sources (which would cover all IRQs,
not just the timer IPI) if and when it has to start time-consuming operations
like cache cleaning to enter deep idle. If it goes into a shallow C-state, so
be it.
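
As a rough sketch of what such a platform back-end check could look like
(user-space model only; the enum, the function name and the way the
pending state is obtained are assumptions, since that part is entirely
platform specific):

```c
#include <assert.h>
#include <stdbool.h>

enum idle_state_sketch { IDLE_SHALLOW, IDLE_DEEP };

/*
 * requested:      state chosen by the governor
 * wakeup_pending: snapshot of "any IRQ/wake-up source pending", taken
 *                 just before the expensive part of deep idle entry
 *
 * Demote to the shallow state rather than cleaning caches for nothing.
 */
static enum idle_state_sketch
pick_idle_state(enum idle_state_sketch requested, bool wakeup_pending)
{
	assert(requested == IDLE_SHALLOW || requested == IDLE_DEEP);

	if (requested == IDLE_DEEP && wakeup_pending)
		return IDLE_SHALLOW;
	return requested;
}
```

The point of the design is that the check covers every wake-up source,
so the core code does not need to know about the timer IPI at all.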

On x86 I think it is HW/FW that prevents C-state entering if IRQs are pending,
and on ARM it is likely to happen too, so I am just saying you should not
bother if you think the code becomes too hairy to justify this change.

Thank you very much for the fixes and your help,
Lorenzo



Re: [PATCH] ARM: suspend: use flush range instead of flush all

2012-09-12 Thread Lorenzo Pieralisi
On Wed, Sep 12, 2012 at 08:43:33AM +0100, Shilimkar, Santosh wrote:
 + Lorenzo,
 
 On Wed, Sep 12, 2012 at 12:48 PM, wzch w...@marvell.com wrote:
  From: Wenzeng Chen w...@marvell.com
 
  In cpu suspend function __cpu_suspend_save(), we save cp15 registers,
  resume function, sp and suspend_pgd, then flush the data to DDR, but
  no need to flush all D cache, only need to flush range.
 
  Change-Id: I591a1fde929f3f987c69306b601843ed975d3e41
 You should drop above.
 
  Signed-off-by: Wenzeng Chen w...@marvell.com
  ---
 Lorenzo and myself discussed about the above expensive flush and he
 is planning to post a similar patch but with small difference.
 
   arch/arm/kernel/suspend.c |4 +++-
   1 files changed, 3 insertions(+), 1 deletions(-)
 
  diff --git a/arch/arm/kernel/suspend.c b/arch/arm/kernel/suspend.c
  index 1794cc3..bb582d8 100644
  --- a/arch/arm/kernel/suspend.c
  +++ b/arch/arm/kernel/suspend.c
  @@ -17,6 +17,7 @@ extern void cpu_resume_mmu(void);
*/
   void __cpu_suspend_save(u32 *ptr, u32 ptrsz, u32 sp, u32 *save_ptr)
   {
  +   u32 *ptr_orig = ptr;
  *save_ptr = virt_to_phys(ptr);
 
  /* This must correspond to the LDM in cpu_resume() assembly */
  @@ -26,7 +27,8 @@ void __cpu_suspend_save(u32 *ptr, u32 ptrsz, u32 sp, u32 *save_ptr)
 
  cpu_do_suspend(ptr);
 
  -   flush_cache_all();
 Lorenzo's patch was limiting above flush to local cache (LOUs) instead
 of dropping it completely.

Because if we remove it completely we have to make sure that every
suspend finisher calls flush_cache_all(), hence from my viewpoint this
patch is incomplete. Either we remove it and add it to ALL suspend
finishers (or just make sure it is there), or we leave it, but then it
should use the new LoUIS API we are going to add.
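
To give an idea of why a by-range clean of the saved context is cheap
compared to a full flush, here is a small user-space sketch that counts
the cache lines a range operation would touch. The 64-byte line size and
the function name are assumptions for the example; no cache maintenance
is performed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SKETCH 64u	/* assumed line size, varies per CPU */

/* Number of cache lines a clean-by-range over [addr, addr + size) touches. */
static size_t lines_in_range(uintptr_t addr, size_t size)
{
	const uintptr_t mask = ~(uintptr_t)(CACHE_LINE_SKETCH - 1);
	uintptr_t start, end;

	/* The rounding below relies on a power-of-two line size. */
	assert((CACHE_LINE_SKETCH & (CACHE_LINE_SKETCH - 1)) == 0);

	if (!size)
		return 0;

	start = addr & mask;				/* round down */
	end = (addr + size + CACHE_LINE_SKETCH - 1) & mask;	/* round up */

	return (end - start) / CACHE_LINE_SKETCH;
}
```

A buffer of a few dozen words maps to a handful of lines, whereas a full
flush walks every set and way of the cache levels involved.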

 
  +   __cpuc_flush_dcache_area((void *)ptr_orig, ptrsz);
  +   __cpuc_flush_dcache_area((void *)save_ptr, sizeof(*save_ptr));
  outer_clean_range(*save_ptr, *save_ptr + ptrsz);
  outer_clean_range(virt_to_phys(save_ptr),
virt_to_phys(save_ptr) + sizeof(*save_ptr));
 
 Just thinking bit more, even in case we drop the flush_cache_all()
 completely, it should be safe since the suspend_finisher()  takes
 care of the cache maintenance already.

We already discussed this. Fine by me, but we have to make sure it is
called on all suspend finishers in the mainline.

Lorenzo



Re: [PATCH v2 RESEND 2/2] ARM: local timers: add timer support using IO mapped register

2012-09-28 Thread Lorenzo Pieralisi
On Fri, Sep 28, 2012 at 04:57:46PM +0100, Dave Martin wrote:
 [ Note: please aim to CC devicetree-disc...@lists.ozlabs.org with any
 patches or bindings relevant to device tree. ]
 
 [ Lorenzo, there's a question for you further down this mail. ]

[...]

+  If using the memory mapped interface, list the interrupts for each core,
+  starting with core 0.
  
  I take it that core 0 means physical cpu 0 (i.e. MPIDR.Aff{2,1,0} == 0)?
 
 Lorenzo, should we have a standard way of referring to CPUs and topology
 nodes documented as part of the topology bindings?  We certainly need
 rules of some kind, since when the topology is non-trivial there is no
 well-defined first CPU, nor any single correct order in which to list
 CPUs.

I think, and that's just my opinion, that whatever solution we go for to
describe the topology must contain the information needed by all kernel
subsystems to retrieve HW information. I do not think we should document
how devices connect to CPU(s)/Cluster(s) in the topology bindings per se,
since those are properties that belong to device nodes.

There must be a common way for all devices to link to the topology, though.

The topology must be descriptive enough to cater for all required cases
and that's what Mark with PMU and all of us are trying to come up with, a solid
way to represent with DT the topology of current and future ARM systems.

First idea I implemented and related LAK posting:

http://lists.infradead.org/pipermail/linux-arm-kernel/2012-January/080873.html

Are cluster nodes really needed or are cpu nodes enough ? I do not
know, let's get this discussion started, that's all I need.

But definitely declaring IRQs in physical CPU id order (and mind, as you say,
physical CPU ids, ie MPIDR, can be sparsely populated) and initializing them
*thinking* the order is the logical one is plainly broken.
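
To show what I mean by keeping physical and logical ids apart, here is a
minimal user-space sketch of an explicit logical-to-physical map. The
table contents reuse the sparse affinity example from this thread; the
names and the table itself are assumptions for illustration, not an
existing kernel interface.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 6

/* Pack affinity levels the way MPIDR lays them out: Aff2:Aff1:Aff0. */
#define MPIDR_AFF(a2, a1, a0) \
	(((uint32_t)(a2) << 16) | ((uint32_t)(a1) << 8) | (uint32_t)(a0))

/* Logical CPU index -> physical id, for a sparsely populated topology. */
static const uint32_t logical_map[NR_CPUS_SKETCH] = {
	MPIDR_AFF(0, 0, 0), MPIDR_AFF(0, 0, 1),
	MPIDR_AFF(0, 1, 0), MPIDR_AFF(0, 1, 1),
	MPIDR_AFF(0, 1, 2), MPIDR_AFF(0, 1, 3),
};

/* Physical id of a logical CPU. */
static uint32_t mpidr_of_logical_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	return logical_map[cpu];
}

/* Return the logical index for a physical id, or -1 if it is not present. */
static int logical_cpu_from_mpidr(uint32_t mpidr)
{
	for (int i = 0; i < NR_CPUS_SKETCH; i++)
		if (logical_map[i] == mpidr)
			return i;
	return -1;
}
```

A driver that receives per-CPU IRQs listed by physical id would go through
such a lookup instead of assuming that the n-th entry belongs to logical
CPU n.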

 The topology may also be sparsely populated (e.g.,
 Aff[2,1,0] in { (0,0,0), (0,0,1), (0,1,0), (0,1,1), (0,1,2), (0,1,3) })
 
 It would be bad if different driver bindings end up solving this in
 different ways (even non-broken ways)

Yes, I agree and code that relies on any temporary work-around to tackle
this problem should not be merged before we set in stone proper topology
bindings.

Lorenzo



Re: [PATCH v2 RESEND 2/2] ARM: local timers: add timer support using IO mapped register

2012-10-02 Thread Lorenzo Pieralisi
On Tue, Oct 02, 2012 at 12:27:04PM +0100, Dave Martin wrote:
 On Fri, Sep 28, 2012 at 06:15:53PM +0100, Lorenzo Pieralisi wrote:
  On Fri, Sep 28, 2012 at 04:57:46PM +0100, Dave Martin wrote:

[...]

  There must be a common way for all devices to link to the topology, though.
  
  The topology must be descriptive enough to cater for all required cases
  and that's what Mark with PMU and all of us are trying to come up with, a 
  solid
  way to represent with DT the topology of current and future ARM systems.
  
  First idea I implemented and related LAK posting:
  
  http://lists.infradead.org/pipermail/linux-arm-kernel/2012-January/080873.html
  
  Are cluster nodes really needed or cpu nodes are enough ? I do not
  know, let's get this discussion started, that's all I need.
 
 One thing which now occurs to me on this point is that if we want to describe
 the CCI properly in the DT (yes) then we need a way to describe the mapping
 between clusters and CCI slave ports.  Currently that knowledge just has to
 be a hard-coded hack somewhere: it's not probeable at all.

That's definitely a good point. We can still define CCI ports as belonging
to a range of CPUs, but that's a bit of a stretch IMHO.

 I'm not sure how we do that, or how we describe the cache topology, without
 the clusters being explicit in the DT
 
 ...unless you already have ideas ?

Either we define the cluster node explicitly or we can always see it as a
collection of CPUs, ie phandles to cpu nodes. That is the decision we
have to make. I think that describing it explicitly makes sense, but we
need to check all possible use cases to see if that's worthwhile.

Lorenzo



Re: RFC: revert request for cpuidle patches e11538d1 and 69a37bea

2013-07-29 Thread Lorenzo Pieralisi
On Mon, Jul 29, 2013 at 02:12:58PM +0100, Arjan van de Ven wrote:
 
  The menu governor tries to deduce the next wakeup but based on events
  per cpu. That means if a task with a specific behavior is migrated
  across cpus, the statistics will be wrong.
 
 
 btw this is largely a misunderstanding;
 tasks are not the issue; tasks use timers and those are perfectly predictable.
 It's interrupts that are not and the heuristics are for that.
 
 Now, if your hardware does the really-bad-for-power wake-all on any interrupt,
 then the menu governor logic is not good for you; rather than looking at the
 next timer on the current cpu you need to look at the earliest timer on the
 set of bundled cpus as the upper bound of the next wake event.

Yes, that's true and we have to look into this properly, but certainly
a wake-up for a CPU in a package C-state is not beneficial to x86 CPUs either,
or I am missing something ?

Even if the wake-up interrupts just power up one of the CPUs in a package
and leave other(s) alone, all HW state shared (ie caches) by those CPUs must
be turned on. What I am asking is: this bundled next event is a concept
that should apply to x86 CPUs too, or it is entirely managed in FW/HW
and the kernel just should not care ?

I still do not understand how this bundled next event is managed on
x86 with the menu governor, or better why it is not managed at all, given
the importance of package C-states.

 And maybe even more special casing is needed... but I doubt it.

I lost you here, can you elaborate pls ?

Thanks a lot,
Lorenzo



Re: RFC: revert request for cpuidle patches e11538d1 and 69a37bea

2013-07-29 Thread Lorenzo Pieralisi
On Mon, Jul 29, 2013 at 03:29:20PM +0100, Arjan van de Ven wrote:
 On 7/29/2013 7:14 AM, Lorenzo Pieralisi wrote:
 
 
  btw this is largely a misunderstanding;
  tasks are not the issue; tasks use timers and those are perfectly predictable.
  It's interrupts that are not and the heuristics are for that.
 
  Now, if your hardware does the really-bad-for-power wake-all on any interrupt,
  then the menu governor logic is not good for you; rather than looking at the
  next timer on the current cpu you need to look at the earliest timer on the
  set of bundled cpus as the upper bound of the next wake event.
 
  Yes, that's true and we have to look into this properly, but certainly
  a wake-up for a CPU in a package C-state is not beneficial to x86 CPUs either,
  or I am missing something ?
 
 a CPU core isn't in a package C state, the system is.
 (in a core C state the whole core is already powered down completely; a
 package C state just also turns off the memory controller/etc)
 
 package C states are global on x86 (not just per package); there's nothing one
 can do there in terms of grouping/etc.

So you are saying that package states are system states on x86, right?
Now things are a bit clearer. I was baffled at first when you mentioned
that package C-states allow PM to turn off the DRAM controller; the concept
of package C-state is a bit misleading and bears little resemblance to what
cluster states are now on ARM. That's why I asked in the first place,
thank you.

  Even if the wake-up interrupts just power up one of the CPUs in a package
  and leave other(s) alone, all HW state shared (ie caches) by those CPUs must
  be turned on. What I am asking is: this bundled next event is a concept
  that should apply to x86 CPUs too, or it is entirely managed in FW/HW
  and the kernel just should not care ?
 
 on Intel x86 cpus, there's not really a bundled concept. or rather, there is
 only 1 bundle (which amounts to the same thing).
 Yes in a multi-package setup there are some cache power effects... but there's
 not a lot one can do there.

On ARM there is: some basic optimizations, like avoiding cleaning caches if
an IRQ is pending or a next event is due shortly, for instance. The
difference is that on ARM cache management is done in SW and under
kernel or FW control (which in a way is closer to what x86 does, even
though I think on x86 most of power control is offloaded to HW).
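
As a rough sketch of the kind of check meant here (skip the cache clean when
an IRQ is pending or the next event is too close), assuming a hypothetical
struct idle_hint filled in by the caller. None of these names exist in the
kernel, and the threshold would be platform data:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inputs to the decision, gathered by the idle entry path. */
struct idle_hint {
	bool irq_pending;		/* an interrupt is already pending */
	uint64_t next_event_us;		/* time until the next known event */
	uint64_t min_residency_us;	/* below this, cleaning does not pay off */
};

/* Return true if cleaning the caches for a deeper state looks worthwhile. */
static bool worth_cleaning_caches(const struct idle_hint *hint)
{
	assert(hint != NULL);

	/* A pending IRQ would wake the CPU straight away. */
	if (hint->irq_pending)
		return false;

	/* The next known event is too close to recoup the clean cost. */
	if (hint->next_event_us < hint->min_residency_us)
		return false;

	return true;
}
```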

Given what you said above, I understand that even on a multi-package system,
package C-states are global, not per package. That's a pivotal point.

 The other cores don't wake up, so they still make their own correct decisions.
 
  I still do not understand how this bundled next event is managed on
  x86 with the menu governor, or better why it is not managed at all, given
  the importance of package C-states.
 
 package C states on x86 are basically OS invisible. The OS manages core level
 C states, the hardware manages the rest.
 The bundle part hurts you on a one wakes all system,
 not because of package level power effects, but because others wake up
 prematurely (compared to what they expected) which causes them to think
 future wakeups will also be earlier. All because they get the "what is the
 next known event" wrong, and start correcting for known events instead of
 only for 'unpredictable' interrupts.

Well, reality is a bit more complex and probably less drastic, cores that are
woken up spuriously can be put back to sleep without going through the governor
again, but your point is taken.

 Things will go very wonky if you do that for sure.
 (I've seen various simulation data on that, and the menu governor indeed acts
 quite poorly for that)

That's something we have been benchmarking, yes.

  And maybe even more special casing is needed... but I doubt it.
 
  I lost you here, can you elaborate pls ?
 
 well.. just looking at the earliest timer might not be enough; that timer
 might be on a different core that's still active, and may change after the
 current cpu has gone into an idle state.

Yes, I have worked on this, and certainly next events must be tied to
the state a core is in, the bare next event does not help.

 Fun.

Big :D

 Coupled C states on this level are a PAIN in many ways, and tend to totally
 suck for power due to this and the general too much is active reasons.

I think the trend is moving towards core gating, which closely resembles what
the x86 world does today. Still, the interaction between the menu governor and
cluster states has to be characterized and that's what we are doing at the
moment.

Thank you very much,
Lorenzo



Re: [RFC][PATCH 1/2] ARM64: add cpu topology definition

2013-07-29 Thread Lorenzo Pieralisi
On Mon, Jul 29, 2013 at 02:36:30PM +0100, Dave Martin wrote:
 On Mon, Jul 29, 2013 at 10:54:01AM +0100, Will Deacon wrote:
  On Mon, Jul 29, 2013 at 10:46:06AM +0100, Vincent Guittot wrote:
   On 27 July 2013 12:42, Hanjun Guo hanjun@linaro.org wrote:
    Power aware scheduling needs the cpu topology information to improve the
    cpu scheduler decision making.
   
   It's not only power aware scheduling. The scheduler already uses
   topology and cache sharing when CONFIG_SCHED_MC and/or
   CONFIG_SCHED_SMT are enabled. So you should also add these configs for
   arm64 so the scheduler can use it
  
  ... except that the architecture doesn't define what the AFF fields in MPIDR
  really represent. Using them to make key scheduling decisions relating to
 
 In fact, the ARM Architecture doesn't place any requirements on MPIDRs to
 force the aff fields to exist _at all_.  It's just a recommendation.
 Instead, you have a 24 or 32-bit number which is unique per CPU, and which
 is _probably_ assigned in a way resembling the aff fields.
 
  cache proximity seems pretty risky to me, especially given the track record
  we've seen already on AArch32 silicon. It's a convenient register if it
  contains the data we want it to contain, but we need to force ourselves to
  come to terms with reality here and simply use it as an identifier for a
  CPU.
 
 +1
 
 Also, we should align arm and arm64.  The problem is basically exactly
 the same, and the solution needs to be the same.  struct cputopo_arm is
 already being abused  -- for example, TC2 describes the A15 and A7
 clusters on a single die as having different socket_id values, even
 though this is obviously nonsense.  But there's no other way to describe
 that system today.
 
  Can't we just use the device-tree to represent this topological data for
  arm64? Lorenzo has been working on bindings in this area.
 
 This may become more important as we start to see things like asymmetric
 topologies appearing (different numbers of nodes and different
 interdependence characteristics in adjacent branches of the topology
 etc.)

Will and Dave summed up the existing issues with MPIDR definition related to
the topology description.
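
To make the point concrete, this is roughly how the affinity fields are
pulled out of a 32-bit MPIDR value (Aff0 in bits [7:0], Aff1 in [15:8], Aff2
in [23:16]). It is a self-contained sketch with illustrative macro names, and
as discussed above nothing guarantees that the extracted values describe
clusters or cache proximity; they are only safe to use as parts of a CPU
identifier.

```c
#include <assert.h>
#include <stdint.h>

#define MPIDR_LEVEL_BITS	8
#define MPIDR_LEVEL_MASK	((1u << MPIDR_LEVEL_BITS) - 1)

/* Extract affinity field 'level' (0, 1 or 2) from a 32-bit MPIDR value. */
static uint32_t mpidr_affinity_level(uint32_t mpidr, unsigned int level)
{
	assert(level <= 2);
	return (mpidr >> (MPIDR_LEVEL_BITS * level)) & MPIDR_LEVEL_MASK;
}
```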

FYI, a link to the current topology bindings posted on DT-discuss and LAKML:

https://lists.ozlabs.org/pipermail/devicetree-discuss/2013-April/031725.html

I am waiting for the dust to settle on the DT bindings review discussions to
repost them and get them finalized.

Lorenzo



Re: [RFC PATCH v4 2/2] drivers: mfd: vexpress: add Serial Power Controller (SPC) support

2013-06-18 Thread Lorenzo Pieralisi
Hi Olof,

thanks a lot.

On Mon, Jun 17, 2013 at 06:44:51PM +0100, Olof Johansson wrote:
 On Mon, Jun 17, 2013 at 04:51:09PM +0100, Lorenzo Pieralisi wrote:
  The TC2 versatile express core tile integrates a logic block that provides the
  interface between the dual cluster test-chip and the M3 microcontroller that
  carries out power management. The logic block, called Serial Power Controller
  (SPC), contains several memory mapped registers to control among other things
  low-power states, operating points and reset control.
  
  This patch provides a driver that enables run-time control of features
  implemented by the SPC control logic.
  
  The SPC control logic is required to be programmed very early in the boot
  process to reset secondary CPUs on the TC2 testchip, set-up jump addresses and
  wake-up IRQs for power management.
  Since the SPC logic is also used to control clocks and operating points,
  that have to be initialized early as well, the SPC interface consumers can not
  rely on early initcalls ordering, which is inconsistent, to wait for SPC
  initialization. Hence, in order to keep the components relying on the SPC
  coded in a sane way, the driver puts in place a synchronization scheme that
  allows kernel drivers to check if the SPC driver has been initialized and if
  not, to initialize it upon check.
  
  A status variable is kept in memory so that loadable modules that require SPC
  interface (eg CPUfreq drivers) can still check the correct initialization and
  use the driver correctly after functions used at boot to init the driver are
  freed.
  
  The driver also provides a bridge interface through the vexpress config
  infrastructure. Operations allowing to read/write operating points are
  made to go via the same interface as configuration transactions so that
  all requests to M3 are serialized.
  
  Device tree bindings documentation for the SPC component is provided with
  the patchset.
 
 Sorry, I got to think of this over the weekend and should have replied
 before you had a chance to repost, but still:
 
 Why is the operating point and frequency change code in this driver at all?
 Usually, the MFD driver contains a shared method to access register space on
 a multifunction device, but the actual operation of each subdevice is handled
 by individual drivers in the regular locations.
 
 So, in the case of operating points and frequencies, that should be in
 a cpufreq driver. And the clock setup should presumably be in a clk
 framework driver if needed.

Well, yes this can be done. I will probably move this code out of mfd
since this choice caused more issues than the current driver solves.

By moving the frequency changes into subsystems, we are certainly
trimming down the code, not sure we improve the maintainability though,
keep in mind that we already had to change the vexpress-config interface
to cope with SPC oddities, at least now these intricacies are self-contained.

What you are suggesting makes sense though, I will do it.

 Then all that would be left here is the functionality for submitting the two
 kinds of commands, and handling interrupts.

Not really. There are still a bunch of registers to set-up wake-up IRQs,
power down flags and warm-boot jump addresses that do not go through the
vexpress interface, they are ad-hoc. I could also split that stuff, but I
really do not think it is worth the effort.

 That'll trim down the driver to a point where I think you'll find it much
 easier to get merged. :-)

To start with I have to understand in which directory this code should
live. Moving the frequency settings in clk/CPUfreq drivers should be
feasible with extra DT complexity for their bindings.

 [...]
 
  +struct ve_spc_drvdata {
  +   void __iomem *baseaddr;
  +   /* A15 processor MPIDR[15:8] bitfield */
 
 A comment describing what the meaning is would be more useful, even if
 less technically specific. Or maybe something like Cluster ID of the
 A15 cluster, from MPIDR[15:8] or similar.
 
  +   u32 a15_clusid;
 
 
 (I reserve the right to have more comments later but I think we want to
 discuss the removal of frequency management code first. :-)

I will do that and comments are always welcome.

Thanks,
Lorenzo



Re: [RFC PATCH v4 2/2] drivers: mfd: vexpress: add Serial Power Controller (SPC) support

2013-06-18 Thread Lorenzo Pieralisi
On Tue, Jun 18, 2013 at 05:25:22AM +0100, Nicolas Pitre wrote:
 On Mon, 17 Jun 2013, Lorenzo Pieralisi wrote:
 
  The TC2 versatile express core tile integrates a logic block that provides the
  interface between the dual cluster test-chip and the M3 microcontroller that
  carries out power management. The logic block, called Serial Power Controller
  (SPC), contains several memory mapped registers to control among other things
  low-power states, operating points and reset control.
 
 [...]
 
 I slightly modified the following before committing this patch to my TC2
 branch:
 
  +/**
  + * ve_spc_cpu_wakeup_irq()
  + *
  + * Function to set/clear per-CPU wake-up IRQs. Not protected by locking since
  + * it might be used in code paths where normal cacheable locks are not
  + * working. Locking must be provided by the caller to ensure atomicity.
  + *
  + * @cpu: mpidr[7:0] bitfield describing cpu affinity level
  + * @cluster: mpidr[15:8] bitfield describing cluster affinity level
  + * @set: if true, wake-up IRQs are set, if false they are cleared
  + */
  +void ve_spc_cpu_wakeup_irq(u32 cpu, u32 cluster, bool set)
  +{
 
 I made cluster first then cpu.  All the other functions have the cluster
 argument first, and ve_spc_set_resume_addr() already uses that order.

Ok thanks.

 [...]
  +#ifdef CONFIG_VEXPRESS_SPC
  +int ve_spc_probe(void);
  +int ve_spc_get_freq(u32 cluster);
  +int ve_spc_set_freq(u32 cluster, u32 freq);
  +int ve_spc_get_freq_table(u32 cluster, const u32 **fptr);
  +void ve_spc_global_wakeup_irq(bool set);
  +void ve_spc_cpu_wakeup_irq(u32 cpu, u32 cluster, bool set);
  +void ve_spc_set_resume_addr(u32 cluster, u32 cpu, u32 addr);
  +u32 ve_spc_get_nr_cpus(u32 cluster);
  +void ve_spc_powerdown(u32 cluster, bool enable);
  +#else
  +static inline bool ve_spc_probe(void) { return -ENODEV; }
 
 s/bool/int/

Bah, sorry.
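
For the record, the reason the return type matters: a bool stub converts
-ENODEV to true, so a caller testing for a negative error code never sees the
failure. A minimal stand-alone sketch of the corrected stub follows; the
caller spc_available() is hypothetical and only there to show the intended
check, and it assumes the probe returns 0 or a negative errno.

```c
#include <assert.h>
#include <errno.h>

/* Stub for builds without the SPC driver; matches int ve_spc_probe(void). */
static inline int ve_spc_probe(void) { return -ENODEV; }

/* Hypothetical caller: any non-zero return means "no SPC available". */
static int spc_available(void)
{
	int ret = ve_spc_probe();

	assert(ret <= 0);	/* assumed contract: 0 or a negative errno */
	return ret == 0;
}
```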

Thanks a lot,
Lorenzo



Re: [PATCH] ARM: EXYNOS: Fix incorrect usage of S5P_ARM_CORE1_* registers

2013-06-19 Thread Lorenzo Pieralisi
On Wed, Jun 19, 2013 at 01:50:57PM +0100, Tomasz Figa wrote:
 On Wednesday 19 of June 2013 17:39:21 Chander Kashyap wrote:
  On 18 June 2013 23:29, Kukjin Kim kgene@samsung.com wrote:
   On 06/19/13 02:45, Tomasz Figa wrote:
   Ccing Arnd and Olof, because I forgot to add them to git send-email...
   
   Sorry for the noise.
   
   On Tuesday 18 of June 2013 17:26:31 Tomasz Figa wrote:
   S5P_ARM_CORE1_* registers affect only core 1. To control further cores
   properly other registers must be used.
   
   This patch replaces S5P_ARM_CORE1_* register definitions with
   S5P_ARM_CORE_*(x) macro which return addresses of registers for
   specified core.
   
   This fixes CPU hotplug on quad core Exynos SoCs on which currently
   offlining CPUs 2 or 3 caused CPU 1 to be turned off.
   
   Obviously this doesn't happen currently because of the if (cpu == 1),
   but 
   Yes, not happened...and just note exynos5440 doesn't support hotplug :)
   so this is available on exynos4412 and added 5420.
   
   if logical cpu1 turned out not to be physical cpu1, then it would
   crash.
   
   Best regards,
   Tomasz
   
   In addition,
   bring-up of CPU 2 and 3 is fixed on boards where bootloader powers off
   secondary cores by default.
   
   I need to test on board about above...
   
   Cc: sta...@vger.kernel.org
   Signed-off-by: Tomasz Figa t.f...@samsung.com
   Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com
   ---
   
 arch/arm/mach-exynos/hotplug.c   |  9 +
 arch/arm/mach-exynos/include/mach/regs-pmu.h | 10 +++---
 arch/arm/mach-exynos/platsmp.c   |  9 +
 3 files changed, 17 insertions(+), 11 deletions(-)
   
   diff --git a/arch/arm/mach-exynos/hotplug.c
   b/arch/arm/mach-exynos/hotplug.c index af90cfa..c089943 100644
   --- a/arch/arm/mach-exynos/hotplug.c
   +++ b/arch/arm/mach-exynos/hotplug.c
   @@ -93,10 +93,11 @@ static inline void cpu_leave_lowpower(void)
   
 static inline void platform_do_lowpower(unsigned int cpu, int
   
   *spurious) {
   
   for (;;) {
   
   +   void __iomem *reg_base;
   +   unsigned int phys_cpu = cpu_logical_map(cpu);
   
   -   /* make cpu1 to be turned off at next WFI command */
   -   if (cpu == 1)
   -   __raw_writel(0, S5P_ARM_CORE1_CONFIGURATION);
   +   reg_base = S5P_ARM_CORE_CONFIGURATION(phys_cpu);
  
  Tomasz,
  This will break for non-zero, MPIDR value.  Say if MPIDR is 1 then for
  cpu0 phys_cpu value will be 0x100,
  and address calculation will be   (S5P_ARM_CORE0_CONFIGURATION +
  ((0x101) * 0x80)), which is wrong.

Honestly, I did not understand the reasoning above, please clarify.

 
 Hmm, according to the code initializing __cpu_logical_map[] array this is
 not true.
 
 Here's the code:
 
 https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/tree/arch/arm/kernel/setup.c?id=refs/tags/next-20130619#n468
 
 and for used macros and bitmasks:
 
 https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/tree/arch/arm/include/asm/cputype.h?id=refs/tags/next-20130619#n45
 
 Now the structure of the MPIDR register:
 
 http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0388e/CIHEBGFG.html
 
 As you can see, the value read from the register in
 smp_setup_processor_id() is only the physical CPU ID, so I don't see any
 problem here.

Define physical CPU ID :-)

There is a problem here: the MPIDR is not an index, and the cpu_logical_map is
populated in arm_dt_init_cpu_maps in:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arm/kernel/devtree.c?id=refs/tags/v3.10-rc6

with all affinity levels.

You need to perform a mapping between logical cpus and register offsets,
you can't use the cpu_logical_map directly for that.
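
A sketch of what such a mapping could look like, using the 0x80 per-core
stride from the macro quoted above. The helper name is hypothetical, and
treating MPIDR[7:0] as the core index understood by the power controller is a
platform assumption that has to be checked for each SoC:

```c
#include <assert.h>
#include <stdint.h>

#define CORE_CONFIG_STRIDE	0x80u	/* per-core register spacing */
#define MPIDR_AFF0_MASK		0xffu	/* MPIDR[7:0] */

/*
 * Hypothetical helper: offset of a core's configuration register from the
 * core 0 register. Higher affinity levels are masked out, so a non-zero
 * cluster id in MPIDR[15:8] no longer leaks into the address calculation.
 */
static uint32_t core_config_offset(uint32_t mpidr, uint32_t nr_cores)
{
	uint32_t core = mpidr & MPIDR_AFF0_MASK;

	assert(core < nr_cores);
	return core * CORE_CONFIG_STRIDE;
}
```

With an MPIDR of 0x101 this yields 0x80, whereas multiplying the raw value by
the stride would give 0x8080.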

Next accident waiting to happen is GIC code (CONFIG_GIC_NON_BANKED), where
cpu_logical_map is used erroneously as an index.

Lorenzo



[RFC PATCH v5 0/1] drivers: mfd: Versatile Express SPC support

2013-07-16 Thread Lorenzo Pieralisi
Hello,

version v5 of the VExpress SPC driver, please read the changelog for major
changes and explanations.

The probing scheme is unchanged, since after trying the early platform
devices approach it appeared that the end result was no better than the
current one. The only clean solution relies either on changing how
secondaries are brought up in the kernel (later than now) or on enabling
early platform device registration through DT. Please check this
thread for the related discussion:

https://lists.ozlabs.org/pipermail/devicetree-discuss/2013-June/036542.html

The interface was adapted to regmap and again reverted to old driver for
the following reasons:

- Power down registers locking is hairy and requires arch spinlocks in
  the MCPM back end to work properly, normal spinlocks cannot be used
- Regmap adds unnecessary code to manage SPC since it is just a bunch of
  registers used to control power management flags, the overhead is just
  not worth it (talking about power down registers, not the vexpress config
  interface)
- The locking scheme behind regmap requires all registers in the map
  to be protected with the same lock, which is not exactly what we want
  here
- Given the reasons above, adding a regmap interface buys us nothing from
  a driver readability and maintainability perspective (again just talking
  about the power interface, a few registers) because for the SPC it would
  simply not be used

/drivers/mfd is probably not the right place for this code as it stands (but
probably will be when the entire driver, with DVFS and config interface, is
complete).

Thank you for the review in advance,
Lorenzo

This patch is v5 of a previous posting:

http://lists.infradead.org/pipermail/linux-arm-kernel/2013-June/177150.html

v5 changes:

- Completely removed vexpress-config interface waiting for refactoring
  based on regmap
- Removed frequency scaling interface and operating points programming
  retrieval
- Trimmed the driver code and DT bindings

v4 changes:
- Applied review comments (trimmed function names, added comments, refactored
  some APIs)
- Added comments throughout the set
- Fixed irq handler bug in checking the transaction status
- Improved commit log to explain early init synchro scheme
- Created a single static structure for variables dynamically allocated to
  remove usage of static
- Improved Kconfig entry

v3 changes:

- added __refdata to spc_check_loaded pointer
- removed some exported symbols
- added node pointer check in vexpress_spc_init()

v2 changes:

- Dropped timeout interface patch
- Converted interfaces to non-timeout ones, integrated and retested
- Removed mutex used at init
- Refactored code to work around init sections warning
- Fixed two minor bugs

Lorenzo Pieralisi (1):
  drivers: mfd: vexpress: add Serial Power Controller (SPC) support

 Documentation/devicetree/bindings/mfd/vexpress-spc.txt |  36 ++
 drivers/mfd/Kconfig|  10 +
 drivers/mfd/Makefile   |   1 +
 drivers/mfd/vexpress-spc.c | 253 ++
 include/linux/vexpress.h   |  17 +
 5 files changed, 317 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/mfd/vexpress-spc.txt
 create mode 100644 drivers/mfd/vexpress-spc.c

-- 
1.8.2.2




[RFC PATCH v5 1/1] drivers: mfd: vexpress: add Serial Power Controller (SPC) support

2013-07-16 Thread Lorenzo Pieralisi
The TC2 versatile express core tile integrates a logic block that provides the
interface between the dual cluster test-chip and the M3 microcontroller that
carries out power management. The logic block, called Serial Power Controller
(SPC), contains several memory mapped registers to control among other things
low-power states, wake-up irqs and per-CPU jump addresses registers.

This patch provides a driver that enables run-time control of features
implemented by the SPC power management control logic.

The SPC control logic is required to be programmed very early in the boot
process to reset secondary CPUs on the TC2 testchip, set-up jump addresses and
wake-up IRQs for power management. Hence, waiting for core changes to be
made in the device core code to enable early registration of platform
devices, the driver puts in place an early init scheme that allows kernel
drivers to initialize the SPC driver directly from the components requiring
it, if their initialization routine is called before the driver init
function by the boot process.
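
The early init scheme can be pictured with the following stand-alone sketch.
It is not the driver code: all names are hypothetical, spc_do_init() stands
in for the real register mapping, and there is no locking because the sketch
assumes callers run one at a time during early boot.

```c
#include <assert.h>
#include <stdbool.h>

static bool spc_initialized;	/* status kept in memory after boot */
static int spc_init_calls;	/* only here to make the sketch observable */

/* Stand-in for the real initialization (mapping registers and so on). */
static int spc_do_init(void)
{
	assert(!spc_initialized);	/* must never run twice */
	spc_init_calls++;
	return 0;
}

/* Hypothetical entry point: safe to call from any consumer, in any order. */
static int spc_check_and_init(void)
{
	int ret;

	if (spc_initialized)
		return 0;

	ret = spc_do_init();
	if (ret == 0)
		spc_initialized = true;
	return ret;
}
```

Whichever component runs first performs the initialization; later callers,
including the regular driver init function, only see the status flag.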

Device tree bindings documentation for the SPC component is provided with
the patchset.

Cc: Samuel Ortiz sa...@linux.intel.com
Cc: Olof Johansson o...@lixom.net
Cc: Pawel Moll pawel.m...@arm.com
Cc: Amit Kucheria amit.kuche...@linaro.org
Cc: Jon Medhurst t...@linaro.org
Signed-off-by: Achin Gupta achin.gu...@arm.com
Signed-off-by: Lorenzo Pieralisi lorenzo.pieral...@arm.com
Signed-off-by: Sudeep KarkadaNagesha sudeep.karkadanage...@arm.com
---
 Documentation/devicetree/bindings/mfd/vexpress-spc.txt |  36 ++
 drivers/mfd/Kconfig|  10 +
 drivers/mfd/Makefile   |   1 +
 drivers/mfd/vexpress-spc.c | 253 ++
 include/linux/vexpress.h   |  17 +
 5 files changed, 317 insertions(+)

diff --git a/Documentation/devicetree/bindings/mfd/vexpress-spc.txt b/Documentation/devicetree/bindings/mfd/vexpress-spc.txt
new file mode 100644
index 000..1614725
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/vexpress-spc.txt
@@ -0,0 +1,36 @@
+* ARM Versatile Express Serial Power Controller device tree bindings
+
+Latest ARM development boards implement a power management interface (serial
+power controller - SPC) that is capable of managing power states transitions,
+wake-up IRQs and resume addresses for ARM multiprocessor testchips.
+The serial controller can be programmed through a memory mapped interface
+that enables communication between firmware running on the microcontroller
+managing power states and the application processors.
+
+The SPC DT bindings are defined as follows:
+
+- spc node
+
+   - compatible:
+   Usage: required
+   Value type: stringlist
+   Definition: must be
+   "arm,vexpress-spc,v2p-ca15_a7", "arm,vexpress-spc"
+   - reg:
+   Usage: required
+   Value type: prop-encoded-array
+   Definition: A standard property that specifies the base address
+   and the size of the SPC address space
+   - interrupts:
+   Usage: required
+   Value type: prop-encoded-array
+   Definition:  SPC interrupt configuration. A standard property
+that follows ePAPR interrupts specifications
+
+Example:
+
+spc: spc@7fff {
+   compatible = "arm,vexpress-spc,v2p-ca15_a7", "arm,vexpress-spc";
+   reg = <0x7fff 0x1000>;
+   interrupts = <0 95 4>;
+};
diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
index 6959b8d..ebd23f4 100644
--- a/drivers/mfd/Kconfig
+++ b/drivers/mfd/Kconfig
@@ -1149,3 +1149,13 @@ config VEXPRESS_CONFIG
help
  Platform configuration infrastructure for the ARM Ltd.
  Versatile Express.
+
+config VEXPRESS_SPC
+   bool "Versatile Express SPC driver support"
+   depends on ARM
+   help
+ The Serial Power Controller (SPC) for ARM Ltd. test chips, is
+ an IP that provides a memory mapped interface to power controller
+ HW. The driver provides an API abstraction allowing to program
+ registers controlling low-level power management features like power
+ down flags, global and per-cpu wake-up IRQs.
diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile
index 718e94a..3a01203 100644
--- a/drivers/mfd/Makefile
+++ b/drivers/mfd/Makefile
@@ -153,5 +153,6 @@ obj-$(CONFIG_MFD_SEC_CORE)  += sec-core.o sec-irq.o
 obj-$(CONFIG_MFD_SYSCON)   += syscon.o
 obj-$(CONFIG_MFD_LM3533)   += lm3533-core.o lm3533-ctrlbank.o
 obj-$(CONFIG_VEXPRESS_CONFIG)  += vexpress-config.o vexpress-sysreg.o
+obj-$(CONFIG_VEXPRESS_SPC) += vexpress-spc.o
 obj-$(CONFIG_MFD_RETU) += retu-mfd.o
 obj-$(CONFIG_MFD_AS3711)   += as3711.o
diff --git a/drivers/mfd/vexpress-spc.c b/drivers/mfd/vexpress-spc.c
new file mode 100644
index 000..aa8c2a4
--- /dev/null
+++ b/drivers

Re: [RFC PATCH v5 0/1] drivers: mfd: Versatile Express SPC support

2013-07-17 Thread Lorenzo Pieralisi
On Wed, Jul 17, 2013 at 10:18:25AM +0100, Pawel Moll wrote:
 On Tue, 2013-07-16 at 17:05 +0100, Lorenzo Pieralisi wrote:
  /drivers/mfd is probably not the right place for this code as it stands (but
  probably will be when the entire driver, with DVFS and config interface, is
  complete).
 
 Not that it really matters now, but my vexpress-sysreg rework will -
 most likely - leave only skeleton in the MFD (registering mfd_cells) and
 other stuff is going to be spread all around. Then I'm planning to move
 the remaining of the vexpress-specific initialization to
 drivers/platform/arm/vexpress.c, so maybe sticking vexpress-spc.c to
 this (non-existing yet) directory would be the right thing to do?

Done. I do not think there is a point in splitting the patch to create
the dir and make infrastructure, so I squashed everything in the
original patch. I have not added any maintainer for that dir/file, I
guess it can wait till you finish the rework so that you can add
yourself there.

If that's all I need to change I do not even think that reposting is
necessary.

It does matter though, since it implies changes on who is in charge of
ack/nack'ing this code, if it is no longer an mfd matter.

I will wait for the opinions of all interested/concerned parties, which are
always welcome.

 Other than that:
 
 Acked-by: Pawel Moll pawel.m...@arm.com

Thank you !!!
Lorenzo



Re: [RFC PATCH v5 0/1] drivers: mfd: Versatile Express SPC support

2013-07-18 Thread Lorenzo Pieralisi
Hi Samuel,

On Wed, Jul 17, 2013 at 10:07:00PM +0100, Samuel Ortiz wrote:
 Hi Lorenzo,
 
 On Tue, Jul 16, 2013 at 05:05:42PM +0100, Lorenzo Pieralisi wrote:
  Hello,
  
  version v5 of VExpress SPC driver, please read on the changelog for major
  changes and explanations.
  
  The probing scheme is unchanged, since after trying the early platform
  devices approach it appeared that the end result was no better than the
  current one. The only clean solution relies either on changing how
  secondaries are brought up in the kernel (later than now) or enable
  early platform device registration through DT. Please check this
  thread for the related discussion:
  
  https://lists.ozlabs.org/pipermail/devicetree-discuss/2013-June/036542.html
  
  The interface was adapted to regmap and again reverted to old driver for
  the following reasons:
  
  - Power down registers locking is hairy and requires arch spinlocks in
    the MCPM back end to work properly, normal spinlocks cannot be used
  - Regmap adds unnecessary code to manage SPC since it is just a bunch of
    registers used to control power management flags, the overhead is just
    not worth it (talking about power down registers, not the vexpress config
    interface)
  - The locking scheme behind regmap requires all registers in the map
    to be protected with the same lock, which is not exactly what we want
    here
  - Given the reasons above, adding a regmap interface buys us nothing from
    a driver readability and maintainability perspective (again just talking
    about the power interface, a few registers) because for the SPC it would
    simply not be used
  
  /drivers/mfd is probably not the right place for this code as it stands (but
  probably will be when the entire driver, with DVFS and config interface, is
  complete).
 Could you please elaborate on how the SPC driver will extend into an MFD
 driver?

Reading through the thread I noticed Nico explained details properly, I
was about to mention a possible solution to the directory issue but I am
pretty sure that what he did will turn out for the best.

Usually, or better, historically, these pieces of code that program
PMICs lived in arch/arm/mach-* directories and that's something we could
have done as well (create a static mapping and write some functions to
peek and poke a few registers), but we thought that it was not the proper
way to go.

On top of that, the SPC is part of a component whose register space maps
disparate functions (config interface for voltage, clocks, energy probes,
frequency scaling and power states management) and basically that's the
reason we struggled to partition it properly (with further complexity
implied by the way requests - config and frequency scaling - have to be
serialized).

I hope the end result is reasonable, and overall I think it was a debate
that was worth having.

Thank you,
Lorenzo



Re: linux-next: manual merge of the arm-soc tree with the pm tree

2013-09-10 Thread Lorenzo Pieralisi
On Mon, Sep 09, 2013 at 06:22:16PM +0100, Kevin Hilman wrote:
 On Mon, Sep 2, 2013 at 11:09 AM, Lorenzo Pieralisi
 lorenzo.pieral...@arm.com wrote:
  On Thu, Aug 29, 2013 at 06:57:15PM +0100, Olof Johansson wrote:
  On Thu, Aug 29, 2013 at 06:04:25PM +1000, Stephen Rothwell wrote:
   Hi all,
  
   Today's linux-next merge of the arm-soc tree got a conflict in
   drivers/cpuidle/Makefile between commits b98e01ad4ed9 (cpuidle: Add
   Kconfig.arm and move calxeda, kirkwood and zynq) and d3f2950f2ade (ARM:
   ux500: cpuidle: Move ux500 cpuidle driver to drivers/cpuidle) from the
   pm tree and commit 14d2c34cfa00 (cpuidle: big.LITTLE: vexpress-TC2 CPU
   idle driver) from the arm-soc tree.
  
   I fixed it up (see below) and can carry the fix as necessary (no action
   is required).
  
   --
   Cheers,
   Stephen Rothwell                    s...@canb.auug.org.au
  
   diff --cc drivers/cpuidle/Makefile
   index 0b9d200,3b6445c..000
   --- a/drivers/cpuidle/Makefile
   +++ b/drivers/cpuidle/Makefile
   @@@ -5,9 -5,7 +5,10 @@@
 obj-y += cpuidle.o driver.o governor.o sysfs.o governors/
 obj-$(CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED) += coupled.o
  
-obj-$(CONFIG_CPU_IDLE_CALXEDA) += cpuidle-calxeda.o
-obj-$(CONFIG_ARCH_KIRKWOOD) += cpuidle-kirkwood.o
-obj-$(CONFIG_CPU_IDLE_ZYNQ) += cpuidle-zynq.o
-obj-$(CONFIG_CPU_IDLE_BIG_LITTLE) += cpuidle-big_little.o

   +##
+# ARM SoC drivers
+obj-$(CONFIG_ARM_HIGHBANK_CPUIDLE)+= cpuidle-calxeda.o
+obj-$(CONFIG_ARM_KIRKWOOD_CPUIDLE)+= cpuidle-kirkwood.o
+obj-$(CONFIG_ARM_ZYNQ_CPUIDLE)+= cpuidle-zynq.o
+obj-$(CONFIG_ARM_U8500_CPUIDLE) += cpuidle-ux500.o
   ++obj-$(CONFIG_CPU_IDLE_BIG_LITTLE)   += cpuidle-big_little.o
 
 
  Might want to sort u8500 before zynq, but otherwise looks fine.
 
  I noticed that owing to the merge, CONFIG_CPU_IDLE_BIG_LITTLE should be moved
  to the newly introduced Kconfig.arm. How are we going to handle this ? It is
  just a matter of renaming the config entry and moving it to Kconfig.arm.
 
 
 For the merge window, we'll merge it as is, but a cleanup/rename
 patch for v3.12-rc would be appreciated.

Agreed, thanks Kevin.

Lorenzo



Re: [PATCH RFC v1 3/3] ARM hibernation / suspend-to-disk

2014-02-22 Thread Lorenzo Pieralisi
On Sat, Feb 22, 2014 at 10:38:40AM +, Russell King - ARM Linux wrote:
 On Wed, Feb 19, 2014 at 04:12:54PM +, Lorenzo Pieralisi wrote:
  On Wed, Feb 19, 2014 at 01:52:09AM +, Sebastian Capella wrote:
   +/*
   + * Snapshot kernel memory and reset the system.
   + * After resume, the hibernation snapshot is written out.
   + */
   +static int notrace __swsusp_arch_save_image(unsigned long unused)
   +{
   + int ret;
   +
   + ret = swsusp_save();
   + if (ret == 0)
   + soft_restart(virt_to_phys(cpu_resume));
  
  By the time the suspend finisher (ie this function) is run, the
  processor state has been saved and I think that's all you have to do,
  function can just return after calling swsusp_save(), unless I am missing
  something.
  
  I do not understand why a soft_restart is required here. On a side note,
  finisher is called with irqs disabled so, since you added a function for
  soft restart noirq, it should be used, if needed, but I have to understand
  why in the first place.
 
 It's required because you can't just return from the finisher.  A normal
 return from the finisher will always be interpreted as an abort rather
 than success (because the state has to be unwound.)
 
 This is the only way to get a zero return from cpu_suspend().

Yes, that's the only reason why this code is jumping to cpu_resume, since
all that is needed is to snapshot the CPU context and by the time the
finisher is called that's done. I wanted to say that the soft reboot is not
useful (cache flushing and resume with MMU off), but what you are saying
is correct. We could save the swsusp_save() return value in a global
variable and just return from the finisher, but that's horrible and
given the amount of time it takes to snapshot the image to disk the
cost of this soft reboot will be dwarfed by that.

I wanted to ask and clarify why the code was written like this though, given
its complexity.

   +/*
   + * The framework loads the hibernation image into a linked list anchored
   + * at restore_pblist, for swsusp_arch_resume() to copy back to the proper
   + * destinations.
   + *
   + * To make this work if resume is triggered from initramfs, the
   + * pagetables need to be switched to allow writes to kernel mem.
  
  Can you elaborate a bit more on this please ?
  
   + */
   +static void notrace __swsusp_arch_restore_image(void *unused)
   +{
   + struct pbe *pbe;
   +
   + cpu_switch_mm(idmap_pgd, &init_mm);
  
  Same here, thanks.
  
   + for (pbe = restore_pblist; pbe; pbe = pbe->next)
   + copy_page(pbe->orig_address, pbe->address);
   +
   + soft_restart_noirq(virt_to_phys(cpu_resume));
  
  This soft_restart is justified so that you resume from the context saved
  when creating the image.
 
 You need the idmap_pgd in place to call cpu_resume at its physical
 address.  Other page tables just won't do here.  It's well established
 that this page table must be in place for the resume paths to work.

Well, we do not need idmap page tables for copying the restore_pblist,
but we do need a set of tables that won't be corrupted by the copy and
idmap does the trick (I was confused because 1:1 mappings are not needed
for the copy itself).

The switch to idmap is done for us in soft_restart anyway before jumping to
cpu_resume and that's required, as you said.
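
To make the copy step concrete, here is a minimal userspace sketch of the
loop that walks the restore list. It is not the kernel code: demo_pbe,
demo_restore_pages and DEMO_PAGE_SIZE are hypothetical names, and memcpy()
stands in for copy_page(). The point it illustrates is that the copy only
needs source and destination to be mapped; it does not care about 1:1
mappings, but it will happily overwrite whatever lives at orig_address,
which is why the page tables in use must not be among the clobbered pages.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096 /* assumed page size for the sketch */

/* Hypothetical stand-in for the kernel's struct pbe (page backup entry). */
struct demo_pbe {
	void *address;       /* where the saved copy of the page currently sits */
	void *orig_address;  /* where the page has to end up */
	struct demo_pbe *next;
};

/* Copy every listed page back to its original location; return the count. */
static size_t demo_restore_pages(struct demo_pbe *list)
{
	size_t copied = 0;
	struct demo_pbe *pbe;

	for (pbe = list; pbe; pbe = pbe->next) {
		memcpy(pbe->orig_address, pbe->address, DEMO_PAGE_SIZE);
		copied++;
	}
	return copied;
}
```

An empty list is a no-op, and each entry overwrites exactly one page at its
destination.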

Lorenzo



Re: [PATCH RFC v1 3/3] ARM hibernation / suspend-to-disk

2014-02-22 Thread Lorenzo Pieralisi
On Sat, Feb 22, 2014 at 10:16:55AM +, Russell King - ARM Linux wrote:
 On Thu, Feb 20, 2014 at 04:27:55PM +, Lorenzo Pieralisi wrote:
  I still do not understand why switching to idmap, which is a clone of
  init_mm + 1:1 kernel mappings is required here. Why idmap ?
  
  And while at it, can't the idmap be overwritten _while_ copying back the
  resume kernel ? Is it safe to use idmap page tables while copying ?
  
  I had a look at x86 and there idmap page tables used to resume are created
  on the fly using safe pages, on ARM idmap is created at boot.
 
 That's fine.
 
 Remember, you're required to boot exactly the same kernel image when
 resuming as the kernel which created the suspend image.  Unless you
 have random allocations going on, you should get the same layout for
 the idmap stuff at each boot.

Thanks Russell, now that's clear. We do need a copy of page tables
that are not tampered with while copying, and idmap works well for
that.

Lorenzo



Re: [PATCH v5 2/4] devicetree: bindings: Document Krait CPU/L1 EDAC

2014-02-25 Thread Lorenzo Pieralisi
Hi Stephen,

On Wed, Feb 19, 2014 at 12:20:43AM +, Stephen Boyd wrote:
 (Sorry, this discussion stalled due to merge window + life events)

Sorry for the delay in replying on my side too.

 On 01/17, Lorenzo Pieralisi wrote:
  On Thu, Jan 16, 2014 at 07:26:17PM +, Stephen Boyd wrote:
   On 01/16, Lorenzo Pieralisi wrote:
On Thu, Jan 16, 2014 at 06:05:05PM +, Stephen Boyd wrote:
 On 01/16, Lorenzo Pieralisi wrote:
  Do we really want to do that ? I am not sure. A cpus node is supposed to
  be a container node, we should not define this binding just because we
  know the kernel creates a platform device for it then.
 
 This is just copying more of the ePAPR spec into this document.
 It just so happens that having a compatible field here allows a
 platform device to be created. I don't see why that's a problem.

I do not see why you cannot define a node like pmu or arch-timer and stick
a compatible property in there. cpus node does not represent a device, and
must not be created as a platform device, that's my opinion.

   
   I had what you're suggesting before in the original revision of
   this patch. Please take a look at the original patch series[1]. I
   suppose it could be tweaked slightly to still have a cache node
   for the L2 interrupt and the next-level-cache pointer from the
   CPUs.
  
  Ok, sorry, we are running around in circles here, basically you moved
  the node to cpus according to reviews. I still think that treating cpus
  as a device is not a great idea, even though I am in the same
  position with C-states and probably will add C-state tables in the cpus
  node.
  
  http://comments.gmane.org/gmane.linux.power-management.general/41012
  
  I just would like to see under cpus nodes and properties that apply to
  all ARM systems, and avoid defining properties (eg interrupts) that
  have different meanings for different ARM cores.
  
  The question related to why the kernel should create a platform device
  out of cpus is still open. I really do not want to block your series
  for these simple issues but we have to make a decision and stick to that,
  I am fine either way if we have a plan.
  
 
 Do you just want a backup plan in case we don't make a platform
 device out of the cpus node? I believe we can always add code
 somewhere to create a platform device at runtime if we detect the
 cpus node has a compatible string equal to qcom,krait. We could
 probably change this driver's module_init() to scan the DT for
 such a compatible string and create the platform device right
 there. If we get more than one interrupt in the cpus node we can
 add interrupt-names and then have software look for interrupts by
 name instead of number.

As I mentioned, I do not like the idea of adding compatible properties
just to force the kernel to create platform devices out of device tree
nodes. On top of that I would avoid adding a compatible property
to the cpus node (after all properties like enable-method are common for all
cpus but still duplicated), my only concern being backward compatibility
here (ie if we do that for interrupts, we should do that also for other
common cpu nodes properties, otherwise we have different rules for
different properties).

I think you can then add interrupts to cpu nodes (qcom,krait specific),
and as you mentioned create a platform device for that.

Thanks,
Lorenzo



Re: [PATCH RFC v1 3/3] ARM hibernation / suspend-to-disk

2014-02-25 Thread Lorenzo Pieralisi
On Sun, Feb 23, 2014 at 08:02:08PM +, Sebastian Capella wrote:
 Quoting Lorenzo Pieralisi (2014-02-22 04:09:10)
  On Sat, Feb 22, 2014 at 10:38:40AM +, Russell King - ARM Linux wrote:
   On Wed, Feb 19, 2014 at 04:12:54PM +, Lorenzo Pieralisi wrote:
On Wed, Feb 19, 2014 at 01:52:09AM +, Sebastian Capella wrote:
 +/*
 + * Snapshot kernel memory and reset the system.
 + * After resume, the hibernation snapshot is written out.
 + */
 +static int notrace __swsusp_arch_save_image(unsigned long unused)
 +{
 + int ret;
 +
 + ret = swsusp_save();
 + if (ret == 0)
 + soft_restart(virt_to_phys(cpu_resume));

By the time the suspend finisher (ie this function) is run, the
processor state has been saved and I think that's all you have to do,
function can just return after calling swsusp_save(), unless I am missing
something.

I do not understand why a soft_restart is required here. On a side note,
finisher is called with irqs disabled so, since you added a function for
soft restart noirq, it should be used, if needed, but I have to understand
why in the first place.
   
   It's required because you can't just return from the finisher.  A normal
   return from the finisher will always be interpreted as an abort rather
   than success (because the state has to be unwound.)
   
   This is the only way to get a zero return from cpu_suspend().
  
  Yes, that's the only reason why this code is jumping to cpu_resume, since
  all that is needed is to snapshot the CPU context and by the time the
  finisher is called that's done. I wanted to say that the soft reboot is not
  useful (cache flushing and resume with MMU off), but what you are saying
  is correct. We could save the swsusp_save() return value in a global
  variable and just return from the finisher, but that's horrible and
  given the amount of time it takes to snapshot the image to disk the
  cost of this soft reboot will be dwarfed by that.
  
  I wanted to ask and clarify why the code was written like this though, given
  its complexity.
 
 We could also return a constant > 1.  __cpu_suspend code will replace
 a 0 return with 1 for paths exiting suspend, but will not change return
 values != 0.  

Yes, we could, but that's an API abuse and, as I mentioned, the soft_restart is
not a massive deal and should not block your series. It is certainly
something to be benchmarked though, since wiping the entire cache hierarchy
for nothing is not nifty.

 cpu_suspend_abort:
   ldmia   sp!, {r1 - r3}  @ pop phys pgd, virt SP, phys resume fn
   teq r0, #0
   moveq   r0, #1  @ force non-zero value
   mov sp, r2
   ldmfd   sp!, {r4 - r11, pc}
 
 We could take advantage of that if we wanted, but Lorenzo pointed out
 also that the relative benefit is very low since the cost of
 resuming is >> soft_restart.

The cost of writing to disk, to be precise. Again, this should be benchmarked.

 I'll go with leaving the soft_restart as is unless someone feels
 strongly against.

Leaving it as it is is fine for now, but it should be commented, because it is
not clear why it is needed from just reading the code.
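
As a rough illustration of why a plain return cannot signal success, the
following is a userspace model of the abort path quoted above. It is only a
sketch under the assumption that the quoted "teq r0, #0 / moveq r0, #1"
sequence is the whole story for the return value; the function names are
made up for the example and do not exist in the kernel.

```c
/*
 * Userspace model (a sketch, not kernel code) of how the abort path
 * treats the value returned by the suspend finisher: zero is forced to
 * one, every other value is passed through unchanged.
 */
static int demo_abort_path_retval(int finisher_ret)
{
	return finisher_ret == 0 ? 1 : finisher_ret;
}

/* Returns 1 if the caller of cpu_suspend() would observe success (zero). */
static int demo_caller_sees_success(int finisher_ret)
{
	return demo_abort_path_retval(finisher_ret) == 0;
}
```

Under this model no finisher return value can make the caller see zero,
which is why the patch reaches the success path by jumping to cpu_resume
through soft_restart instead.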

Thanks,
Lorenzo



Re: [PATCH RFC v1 3/3] ARM hibernation / suspend-to-disk

2014-02-26 Thread Lorenzo Pieralisi
On Tue, Feb 25, 2014 at 05:55:31PM +, Sebastian Capella wrote:
 Quoting Lorenzo Pieralisi (2014-02-25 03:32:51)
  On Sun, Feb 23, 2014 at 08:02:08PM +, Sebastian Capella wrote:
   I'll go with leaving the soft_restart as is unless someone feels
   strongly against.
  
  Leaving it as it is is fine for now, but it should be commented, because it is
  not clear why it is needed from just reading the code.
 
 Hi Lorenzo,
 
 How is something like this?
 
 /*
  * Snapshot kernel memory and reset the system.

Please add:

swsusp_save() is executed in the suspend finisher so that the CPU context
pointer and memory are part of the saved image, which is required by the
resume kernel image to restart execution from swsusp_arch_suspend()

  * soft_restart is not technically needed, but is used
  * to get success returned from cpu_suspend.
  * After resume, the hibernation snapshot is written out.

When soft reboot completes, the hibernation snapshot is written out.

Resume is confusing since this code is resuming twice :D on image saving
and on kernel image restoration.

Lorenzo

  */
 static int notrace __swsusp_arch_save_image(unsigned long unused)
 {
 int ret;
 
 ret = swsusp_save();
 if (ret == 0)
 soft_restart(virt_to_phys(cpu_resume));
 return ret;
 }
 
 Thanks again for all of the feedback!
 
 Sebastian
 



Re: [PATCH v5 2/4] devicetree: bindings: Document Krait CPU/L1 EDAC

2014-02-26 Thread Lorenzo Pieralisi
On Tue, Feb 25, 2014 at 08:48:38PM +, Kumar Gala wrote:
 
 On Feb 25, 2014, at 5:16 AM, Lorenzo Pieralisi lorenzo.pieral...@arm.com 
 wrote:
 
  Hi Stephen,
  
  On Wed, Feb 19, 2014 at 12:20:43AM +, Stephen Boyd wrote:
  (Sorry, this discussion stalled due to merge window + life events)
  
  Sorry for the delay in replying on my side too.
  
  On 01/17, Lorenzo Pieralisi wrote:
  On Thu, Jan 16, 2014 at 07:26:17PM +, Stephen Boyd wrote:
  On 01/16, Lorenzo Pieralisi wrote:
  On Thu, Jan 16, 2014 at 06:05:05PM +, Stephen Boyd wrote:
  On 01/16, Lorenzo Pieralisi wrote:
  Do we really want to do that ? I am not sure. A cpus node is supposed to
  be a container node, we should not define this binding just because we
  know the kernel creates a platform device for it then.
  
  This is just copying more of the ePAPR spec into this document.
  It just so happens that having a compatible field here allows a
  platform device to be created. I don't see why that's a problem.
  
  I do not see why you cannot define a node like pmu or arch-timer and stick
  a compatible property in there. cpus node does not represent a device, and
  must not be created as a platform device, that's my opinion.
  
  
  I had what you're suggesting before in the original revision of
  this patch. Please take a look at the original patch series[1]. I
  suppose it could be tweaked slightly to still have a cache node
  for the L2 interrupt and the next-level-cache pointer from the
  CPUs.
  
  Ok, sorry, we are running around in circles here, basically you moved
  the node to cpus according to reviews. I still think that treating cpus
  as a device is not a great idea, even though I am in the same
  position with C-states and probably will add C-state tables in the cpus
  node.
  
  http://comments.gmane.org/gmane.linux.power-management.general/41012
  
  I just would like to see under cpus nodes and properties that apply to
  all ARM systems, and avoid defining properties (eg interrupts) that
  have different meanings for different ARM cores.
  
  The question related to why the kernel should create a platform device
  out of cpus is still open. I really do not want to block your series
  for these simple issues but we have to make a decision and stick to that,
  I am fine either way if we have a plan.
  
  
  Do you just want a backup plan in case we don't make a platform
  device out of the cpus node? I believe we can always add code
  somewhere to create a platform device at runtime if we detect the
  cpus node has a compatible string equal to qcom,krait. We could
  probably change this driver's module_init() to scan the DT for
  such a compatible string and create the platform device right
  there. If we get more than one interrupt in the cpus node we can
  add interrupt-names and then have software look for interrupts by
  name instead of number.
  
  As I mentioned, I do not like the idea of adding compatible properties
  just to force the kernel to create platform devices out of device tree
  nodes. On top of that I would avoid adding a compatible property
  to the cpus node (after all properties like enable-method are common for all
  cpus but still duplicated), my only concern being backward compatibility
  here (ie if we do that for interrupts, we should do that also for other
  common cpu nodes properties, otherwise we have different rules for
  different properties).
  
  I think you can then add interrupts to cpu nodes (qcom,krait specific),
  and as you mentioned create a platform device for that.
  
  Thanks,
  Lorenzo
 
 So I agree with the statement that adding compatibles just to create
 platform devices is wrong.  However, it seems perfectly reasonable for a cpu
 node to have a compatible property.  I don't see why a CPU is any different
 from any other device described in a DT.

I was referring to the /cpus node, not to individual cpu nodes, where
the compatible property is already present now.

Lorenzo



Re: [PATCH RFC v1 3/3] ARM hibernation / suspend-to-disk

2014-02-26 Thread Lorenzo Pieralisi
On Wed, Feb 26, 2014 at 05:50:55PM +, Sebastian Capella wrote:
 Quoting Lorenzo Pieralisi (2014-02-26 02:24:27)
  On Tue, Feb 25, 2014 at 05:55:31PM +, Sebastian Capella wrote:
  
  Please add:
  
  swsusp_save() is executed in the suspend finisher so that the CPU context
  pointer and memory are part of the saved image, which is required by the
  resume kernel image to restart execution from swsusp_arch_suspend()
  
* soft_restart is not technically needed, but is used
* to get success returned from cpu_suspend.
* After resume, the hibernation snapshot is written out.
  
  When soft reboot completes, the hibernation snapshot is written out.
  
  Resume is confusing since this code is resuming twice :D on image saving
  and on kernel image restoration.
 
 Thanks Lorenzo!
 
 Here's what I've got.
 
 /*
  * Snapshot kernel memory and reset the system.
  *
  * swsusp_save() is executed in the suspend finisher so that the CPU
  * context pointer and memory are part of the saved image, which is
  * required by the resume kernel image to restart execution from
  * swsusp_arch_suspend().
  *
  * soft_restart is not technically needed, but is used to get success
  * returned from cpu_suspend.
  * 
  * When soft reboot completes, the hibernation snapshot is
  * written out.
  */
 
 Does this look ok?  I'll prepare a v4 patchset.

Yes it does, I will wait and review v4 then.

Thank you,
Lorenzo



Re: [PATCH v6 2/2] ARM hibernation / suspend-to-disk

2014-02-28 Thread Lorenzo Pieralisi
On Thu, Feb 27, 2014 at 11:57:58PM +, Sebastian Capella wrote:

[...]

 diff --git a/arch/arm/kernel/hibernate.c b/arch/arm/kernel/hibernate.c
 new file mode 100644
 index 000..a41e0e3
 --- /dev/null
 +++ b/arch/arm/kernel/hibernate.c
 @@ -0,0 +1,113 @@
 +/*
 + * Hibernation support specific for ARM
 + *
 + * Derived from work on ARM hibernation support by:
 + *
 + * Ubuntu project, hibernation support for mach-dove
 + * Copyright (C) 2010 Nokia Corporation (Hiroshi Doyu)
 + * Copyright (C) 2010 Texas Instruments, Inc. (Teerth Reddy et al.)
 + *  https://lkml.org/lkml/2010/6/18/4
 + *  https://lists.linux-foundation.org/pipermail/linux-pm/2010-June/027422.html
 + *  https://patchwork.kernel.org/patch/96442/
 + *
 + * Copyright (C) 2006 Rafael J. Wysocki r...@sisk.pl
 + *
 + * License terms: GNU General Public License (GPL) version 2
 + */
 +
 +#include <linux/mm.h>
 +#include <linux/suspend.h>
 +#include <asm/tlbflush.h>
 +#include <asm/cacheflush.h>

You can drop tlbflush.h and cacheflush.h, they do not seem to be needed.

 +#include <asm/system_misc.h>
 +#include <asm/idmap.h>
 +#include <asm/suspend.h>
 +
 +extern const void __nosave_begin, __nosave_end;
 +
 +int pfn_is_nosave(unsigned long pfn)
 +{
 + unsigned long nosave_begin_pfn =
 + __pa_symbol(&__nosave_begin) >> PAGE_SHIFT;
 + unsigned long nosave_end_pfn =
 + PAGE_ALIGN(__pa_symbol(&__nosave_end)) >> PAGE_SHIFT;
 +
 + return (pfn >= nosave_begin_pfn) && (pfn < nosave_end_pfn);
 +}
 +
 +void notrace save_processor_state(void)
 +{
 + WARN_ON(num_online_cpus() != 1);
 + local_fiq_disable();
 +}
 +
 +void notrace restore_processor_state(void)
 +{
 + local_fiq_enable();
 +}
 +
 +/*
 + * Snapshot kernel memory and reset the system.
 + *
 + * swsusp_save() is executed in the suspend finisher so that the CPU
 + * context pointer and memory are part of the saved image, which is
 + * required by the resume kernel image to restart execution from
 + * swsusp_arch_suspend().
 + *
 + * soft_restart is not technically needed, but is used to get success
 + * returned from cpu_suspend.
 + *
 + * When soft reboot completes, the hibernation snapshot is written out.
 + */
 +static int notrace arch_save_image(unsigned long unused)
 +{
 + int ret;
 +
 + ret = swsusp_save();
 + if (ret == 0)
 + soft_restart(virt_to_phys(cpu_resume));
 + return ret;
 +}
 +
 +/*
 + * Save the current CPU state before suspend / poweroff.
 + */
 +int notrace swsusp_arch_suspend(void)
 +{
 + return cpu_suspend(0, arch_save_image);
 +}
 +
 +/*
 + * The framework loads the hibernation image into a linked list anchored
 + * at restore_pblist, for swsusp_arch_resume() to copy back to the proper
 + * destinations.
 + *
 + * To make this work if resume is triggered from initramfs, the
 + * pagetables need to be switched to allow writes to kernel mem.
 + */

Comment above needs updating. We are switching page tables to a set of
page tables that are certain to live at the same location in the older
kernel, that's the only reason, as we discussed. soft_restart will make
sure (again) to switch to 1:1 page tables so that we can call cpu_resume
with the MMU off.

 +static void notrace arch_restore_image(void *unused)
 +{
 + struct pbe *pbe;
 +
 + cpu_switch_mm(idmap_pgd, &init_mm);
 + for (pbe = restore_pblist; pbe; pbe = pbe->next)
 + copy_page(pbe->orig_address, pbe->address);
 +
 + soft_restart(virt_to_phys(cpu_resume));
 +}
 +
 +static u8 resume_stack[PAGE_SIZE/2] __nosavedata;
 +
 +/*
 + * Resume from the hibernation image.
 + * Due to the kernel heap / data restore, stack contents change underneath
 + * and that would make function calls impossible; switch to a temporary
 + * stack within the nosave region to avoid that problem.
 + */
 +int swsusp_arch_resume(void)
 +{
 + extern void call_with_stack(void (*fn)(void *), void *arg, void *sp);
 + call_with_stack(arch_restore_image, 0,
 + resume_stack + sizeof(resume_stack));

This does not guarantee your stack is 8-byte aligned, that's not AAPCS
compliant and might buy you trouble.

Either you align the stack or you align the pointer you are passing.

Please have a look at kernel/process.c
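
For the "align the pointer you are passing" option, something along these
lines would do. This is a minimal userspace sketch rather than a proposed
change to the patch: demo_aligned_stack_top and DEMO_STACK_ALIGN are
hypothetical names, and the only fact relied upon is that the AAPCS wants
the stack pointer 8-byte aligned at public interfaces.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_STACK_ALIGN 8u /* AAPCS: SP is 8-byte aligned at public interfaces */

/* Round the one-past-the-end address of a buffer down to an 8-byte boundary. */
static void *demo_aligned_stack_top(void *base, size_t size)
{
	uintptr_t top = (uintptr_t)base + size;

	return (void *)(top & ~(uintptr_t)(DEMO_STACK_ALIGN - 1));
}
```

Rounding down (never up) keeps the initial stack pointer inside the buffer,
at the cost of wasting at most seven bytes at the top.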

Thanks,
Lorenzo



Re: [PATCH v6 2/2] ARM hibernation / suspend-to-disk

2014-02-28 Thread Lorenzo Pieralisi
On Fri, Feb 28, 2014 at 08:15:57PM +, Sebastian Capella wrote:

[...]

   +
   +/*
   + * The framework loads the hibernation image into a linked list anchored
   + * at restore_pblist, for swsusp_arch_resume() to copy back to the proper
   + * destinations.
   + *
   + * To make this work if resume is triggered from initramfs, the
   + * pagetables need to be switched to allow writes to kernel mem.
   + */
  
  Comment above needs updating. We are switching page tables to a set of
  page tables that are certain to live at the same location in the older
  kernel, that's the only reason, as we discussed. soft_restart will make
  sure (again) to switch to 1:1 page tables so that we can call cpu_resume
  with the MMU off.
 
 How does this look?
 
 The framework loads as much of the hibernation image to final physical
 pages as possible.  Any pages that were in use, will need to be restored
 prior to the soft_restart.  The pages to restore are maintained in
 the list anchored at restore_pblist.  At this point, we can swap the
 pages to their final location.  We must switch the mapping to 1:1 to
 ensure that when we overwrite the page table physical pages we're using
 a known physical location (idmap_pgd) with known contents.

It is ok, a tad too verbose. All I care about is a comment describing
what's really needed, the existing one was confusing and wrong.

   +/*
   + * Resume from the hibernation image.
   + * Due to the kernel heap / data restore, stack contents change underneath
   + * and that would make function calls impossible; switch to a temporary
   + * stack within the nosave region to avoid that problem.
   + */
   +int swsusp_arch_resume(void)
   +{
   + extern void call_with_stack(void (*fn)(void *), void *arg, void *sp);
   + call_with_stack(arch_restore_image, 0,
   + resume_stack + sizeof(resume_stack));
  
  This does not guarantee your stack is 8-byte aligned, that's not AAPCS
  compliant and might buy you trouble.
  
  Either you align the stack or you align the pointer you are passing.
  
  Please have a look at kernel/process.c
 
 I've added this for now, do you see any issues?
 
 -static u8 resume_stack[PAGE_SIZE/2] __nosavedata;
 +static u64 resume_stack[PAGE_SIZE/2/sizeof(u64)] __nosavedata;
 -   resume_stack + sizeof(resume_stack));
 +   resume_stack + ARRAY_SIZE(resume_stack));

I do not see why the stack depends on the PAGE_SIZE. I would be surprised
if you need more than a few bytes (given that soft_restart switches stack
again...), go through it with a debugger, it is easy to check the stack
usage and allow for some extra buffer (but half a page is not needed).

My main concern was alignment, and now that's fixed.

Thanks !
Lorenzo



Re: [PATCH v6 2/2] ARM hibernation / suspend-to-disk

2014-03-04 Thread Lorenzo Pieralisi
On Tue, Mar 04, 2014 at 09:55:31AM +, Sebastian Capella wrote:
 Quoting Sebastian Capella (2014-02-28 15:38:54)
  Quoting Lorenzo Pieralisi (2014-02-28 14:49:33)
   On Fri, Feb 28, 2014 at 08:15:57PM +, Sebastian Capella wrote:
 
 This does not guarantee your stack is 8-byte aligned, that's not AAPCS
 compliant and might buy you trouble.
 
 Either you align the stack or you align the pointer you are passing.
 
 Please have a look at kernel/process.c

I've added this for now, do you see any issues?

-static u8 resume_stack[PAGE_SIZE/2] __nosavedata;
+static u64 resume_stack[PAGE_SIZE/2/sizeof(u64)] __nosavedata;
-   resume_stack + sizeof(resume_stack));
+   resume_stack + ARRAY_SIZE(resume_stack));
   
   I do not see why the stack depends on the PAGE_SIZE. I would be surprised
   if you need more than a few bytes (given that soft_restart switches stack
   again...), go through it with a debugger, it is easy to check the stack
   usage and allow for some extra buffer (but half a page is not needed).
  
  I'm assuming this is because the no-save region is one page anyway (we skip
  restoring the no-save region physical page).  So maybe 1/2 is a way to
  leave some room for whatever else may need to be here, but in any case
  the 4k is used for nosave.  I think you're right that it can be much less.
 
 Hi Lorenzo,
 
 Are you ok with this just being half a page?  Or do you want me to try
 to reduce the stack size?  I am at Connect without my debugger, so in
 that case it would have to wait until next week.

I am ok, either you leave that as it is (that multiple division looks
horrible but it is just nitpicking on my side) or define it as a u8 array,
stick __attribute__((aligned(8))) to the definition (and explain why) and be
done with it.
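
A minimal userspace sketch of the second option, assuming a hypothetical
RESUME_STACK_SIZE of 256 bytes (the real figure should come from measuring
the stack usage) and standard C types in place of the kernel's u8. The
helper names are made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size: a stand-in for whatever measurement shows is needed. */
#define RESUME_STACK_SIZE 256

/*
 * AAPCS requires sp to be 8-byte aligned at public interfaces; a plain
 * byte array only guarantees byte alignment, hence the explicit attribute.
 */
static uint8_t resume_stack[RESUME_STACK_SIZE] __attribute__((aligned(8)));

/* Lowest address of the temporary stack. */
void *resume_stack_base(void)
{
	return resume_stack;
}

/*
 * Full-descending stack: the initial sp is one past the end of the array.
 * With a byte array, sizeof() is the right offset; with a u64 array the
 * pointer arithmetic scales by 8, which is why ARRAY_SIZE() is needed there.
 */
void *resume_stack_top(void)
{
	return resume_stack + sizeof(resume_stack);
}

/* Returns 1 if the pointer is 8-byte aligned, 0 otherwise. */
int is_aligned8(const void *p)
{
	return ((uintptr_t)p & 7) == 0;
}
```

The top of the stack stays 8-byte aligned only as long as the size is a
multiple of 8, so that constraint is worth a comment next to the definition.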

You can add my:

Reviewed-by: Lorenzo Pieralisi lorenzo.pieral...@arm.com



Re: [Linux-kernel] [PATCH] ARM: kernel: respect device tree status of cpu nodes

2014-03-06 Thread Lorenzo Pieralisi
[CC'ing BenH and Grant to check how this is handled in PowerPC]

On Thu, Mar 06, 2014 at 10:00:10AM +, Ben Dooks wrote:
 On 05/03/14 20:33, Stephen Boyd wrote:
  +Lorenzo
 
  On 02/24/14 03:22, Jürg Billeter wrote:
  Skip 'disabled' cpu nodes when building the cpu logical map. This avoids
  booting cpus that have been disabled in the device tree.
 
  Signed-off-by: Jürg Billeter j...@bitron.ch
  Reviewed-by: Ben Dooks ben.do...@codethink.co.uk
  ---
arch/arm/kernel/devtree.c | 4 
1 file changed, 4 insertions(+)
 
  diff --git a/arch/arm/kernel/devtree.c b/arch/arm/kernel/devtree.c
  index 739c3df..9aed299 100644
  --- a/arch/arm/kernel/devtree.c
  +++ b/arch/arm/kernel/devtree.c
  @@ -95,6 +95,10 @@ void __init arm_dt_init_cpu_maps(void)
 if (of_node_cmp(cpu->type, "cpu"))
 continue;
 
  +  /* Check if CPU is enabled */
  +  if (!of_device_is_available(cpu))
  +  continue;
  +
 pr_debug(" * %s...\n", cpu->full_name);
 /*
  * A device tree containing CPU nodes with missing reg
 
  This doesn't follow the ePAPR spec. According to ePAPR status=disabled
  in a cpu node means the cpu is in a quiescent state and one can enable
  it by using the enable-method. At the least, we should document this
  in bindings/arm/cpus.txt if we can all agree that we want this
  definition of disabled.
 
 My view was that disabled should be the same as for the device case
 as it makes sense that way.

Ok, it has been brought up before.

ePAPR v1.1, 2.3.4 status

disabled - Indicates that the device is not presently operational, but
 it might become operational in the future (for example,
 something is not plugged in, or switched off).
 Refer to the device binding for details on what disabled means
 for a given device.

So disabled for devices does not read that different to me to what
Stephen mentioned for a CPU.

After all you just want the CPU node to be ignored, and that's not a CPU
status, it is a DT trick to use one .dtsi for multiple boot scenarios.

I wonder how the status property has been used on PowerPC.

By grepping the sources, it is checked in arch/powerpc/kernel/prom_init.c
and that's just to check for CPUs that are already in status == "okay", so
they can be put in a holding loop (I guess).

It should be fine to deviate from the ePAPR and consider using:

status = "disabled"

for your aim (it has been ignored so far on all ARM platforms I am aware
of, so this should not break the kernel).

We need to update cpus.txt accordingly and override it for ARM.
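
For reference, a userspace sketch of the availability rule the patch relies
on, assuming (as far as I recall the semantics of of_device_is_available())
that a node counts as available when it has no status property or when
status is "okay" or "ok". The function name below is hypothetical and the
status string stands in for the DT property lookup:

```c
#include <stddef.h>
#include <string.h>

/*
 * Model of the availability check: status == NULL stands for a node
 * without a status property, which is treated as available.
 * Returns 1 if the node should be used, 0 if it should be skipped.
 */
int status_is_available(const char *status)
{
	if (!status)
		return 1;

	return !strcmp(status, "okay") || !strcmp(status, "ok");
}
```

With this rule a cpu node carrying status = "disabled" is skipped when
building the logical map, while existing device trees with no status
property in their cpu nodes keep working unchanged.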

 We have a position where we want to use the .dtsi files but not all
 cpus are available to Linux. If we cannot use disabled then we
 need some other method to say this cpu node is not available, so
 please do not try and bring it online (saves boot time waiting for
 CPU nodes that cannot come online)

Understood, see above.

Lorenzo



Re: [PATCH v5 2/4] devicetree: bindings: Document Krait CPU/L1 EDAC

2014-03-11 Thread Lorenzo Pieralisi
On Fri, Mar 07, 2014 at 11:08:56PM +, Stephen Boyd wrote:
 On 02/26, Lorenzo Pieralisi wrote:
  On Tue, Feb 25, 2014 at 08:48:38PM +, Kumar Gala wrote:
   
   On Feb 25, 2014, at 5:16 AM, Lorenzo Pieralisi 
   lorenzo.pieral...@arm.com wrote:

As I mentioned, I do not like the idea of adding compatible properties
just to force the kernel to create platform devices out of device tree
nodes. On top of that I would avoid adding a compatible property
to the cpus node (after all properties like enable-method are common 
for all
cpus but still duplicated), my only concern being backward compatibility
here (ie if we do that for interrupts, we should do that also for other
common cpu nodes properties, otherwise we have different rules for
different properties).

I think you can then add interrupts to cpu nodes (qcom,krait 
specific),
and as you mentioned create a platform device for that.

Thanks,
Lorenzo
   
   So I agree that adding compatibles just to create platform devices is
   wrong.  However it seems perfectly reasonable for a cpu node to have a
   compatible property.  I don't see why a CPU is any different from any
   other device described in a DT.
  
  I was referring to the /cpus node, not to individual cpu nodes, where
  the compatible property is already present now.
  
 
 Ok I think I'll go ahead with moving the interrupts into each cpu node, i.e.:
 
 cpus {
         #address-cells = <1>;
         #size-cells = <0>;
 
         cpu@0 {
                 compatible = "qcom,krait";
                 device_type = "cpu";
                 reg = <0>;
                 interrupts = <1 14 0x304>;
                 next-level-cache = <&L2>;
         };
 
         cpu@1 {
                 compatible = "qcom,krait";
                 device_type = "cpu";
                 reg = <1>;
                 interrupts = <1 14 0x304>;
                 next-level-cache = <&L2>;
         };
 
         L2: l2-cache {
                 compatible = "cache";
                 interrupts = <0 2 0x4>;
         };
 };
 
 Or should we be expressing the L1 cache as well? Something like:
 
 cpus {
         #address-cells = <1>;
         #size-cells = <0>;
 
         cpu@0 {
                 compatible = "qcom,krait";
                 device_type = "cpu";
                 reg = <0>;
                 next-level-cache = <&L1_0>;
 
                 L1_0: l1-cache {
                         compatible = "arm,arch-cache";
                         interrupts = <1 14 0x304>;
                         next-level-cache = <&L2>;
                 };
         };
 
         cpu@1 {
                 compatible = "qcom,krait";
                 device_type = "cpu";
                 reg = <1>;
                 next-level-cache = <&L1_1>;
 
                 L1_1: l1-cache {
                         compatible = "arm,arch-cache";
                         interrupts = <1 14 0x304>;
                         next-level-cache = <&L2>;
                 };
         };
 
         L2: l2-cache {
                 compatible = "arm,arch-cache";
                 interrupts = <0 2 0x4>;
         };
 };
 
 (I'm also wondering if the 3rd cell of the interrupt binding
 should only indicate the CPU that the interrupt property is
 inside?)

I am not aware of interrupts associated with vanilla :) arm,arch-cache
objects, so I think that should be handled as a qcom,krait specific property
(in the cpu node), or you should add another cache binding (compatible) for
that.

As you might have noticed (idle states thread) I am keen on defining objects
for L1 caches explicitly; that patch still requires an ACK though (and
you need to update it, since you cannot add an interrupt property for all
arm,arch-cache objects). I am sorry for being a pain, but I do not
think that's correct from a HW description standpoint.

 Finally we can have the edac driver look for a qcom,krait
 compatible node in cpus that it can create a platform device for,
 i.e..
 
 static int __init krait_edac_driver_init(void)
 {
         struct device_node *np;
 
         np = of_get_cpu_node(0, NULL);
         if (!np)
                 return 0;
 
         if (!krait_edacp && of_device_is_compatible(np, "qcom,krait"))
                 krait_edacp = of_platform_device_create(np, "krait_edac",
                                                         NULL);
         of_node_put(np);
 
         return platform_driver_register(&krait_edac_driver);
 }
 module_init(krait_edac_driver_init);

It seems fine to me, but it requires an ACK from platform bus and DT
maintainers.

Lorenzo


Re: [PATCH v4 4/6] devicetree: bindings: Document Krait L1/L2 EDAC

2014-01-10 Thread Lorenzo Pieralisi
On Thu, Jan 09, 2014 at 08:52:21PM +, Stephen Boyd wrote:
 On 01/08/14 02:05, Lorenzo Pieralisi wrote:
  On Tue, Jan 07, 2014 at 08:12:39PM +, Stephen Boyd wrote:
  On 01/07, Lorenzo Pieralisi wrote:
 
  I have a problem with the cache level definition, and in
  particular the numbering, ie what the level number represents. If we
  mean the cache level seen through the CLIDR and co., it is hard to use
  it for shared caches since the level seen by different CPUs can actually
  be different, or put differently the level number might not be unique
  for a shared cache. I need to think about a proper way to sort this out.
 
  Ok. I don't even use this property in my driver. All I really
  need is the phandle from cpus pointing to the L2 and the
  interrupts property in the L2 node.
 
  How do you want to proceed here? If your cache binding goes
  through I would just need to add the interrupts part. Or you
  could even add that part in the same patch, you could have my
  signed-off-by for that.
  Ok, I will try to update the bindings with the interrupt part and copy
  you in, even though the level definition worries me a bit, it is an
  important property for power management and I need to find a proper
  solution before bindings can get accepted (basically the problem is:
  if different CPUs can see a cache at different levels as defined in the
  CLIDR we cannot describe a cache with a single cache level or put it
  differently, level can not represent the value in the CLIDR hence we
  need to describe it differently).
 
 Ok. I've dropped the cache part from this patch. I left the example as
 is minus the cache-level attribute.
 
 Understanding how the cache-level value would be used might help. I
 wonder if the cache-level can just be a number that describes the
 largest value that the cache could be assigned. Then if you have
 different CPUs seeing different levels of cache they can traverse from
 their CPU node to the cache and count how many phandles they went through.

Yes, that's one of the solutions I envisaged, and likely to be the one
that I will put forward since it requires almost no changes. If we go that way
cache-level becomes pretty useless though (which might be a good thing) and I
do not like the implicit cache level obtained by counting phandles.
Another option would be making cache-level a list and adding a property,
cache-level-affinity, as a 1:1 list of phandles to cpu-map nodes that defines,
for each CPU, the level at which that cache is mapped, something like the
bindings described here for IRQ affinity:

http://lists.infradead.org/pipermail/linux-arm-kernel/2013-April/162466.html

I would say I tend to prefer the latter option, since I do not like relying
on unwritten rules (implicit level numbering implied by phandle traversal) but
I am open to suggestions.
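
To make the first option concrete, here is a small userspace sketch of the
implicit numbering obtained by counting next-level-cache hops. The struct
and function names are hypothetical stand-ins for DT nodes and phandle
lookups, and the chain is assumed to be acyclic:

```c
#include <stddef.h>

/* Hypothetical stand-in for a DT node with a next-level-cache phandle. */
struct node {
	const char *name;
	const struct node *next_level_cache;	/* NULL at the last level */
};

/*
 * Count the next-level-cache hops from a CPU node to a cache node.
 * Returns the level at which that CPU sees the cache (1 = first hop),
 * or 0 if the cache is not reachable from the CPU.
 */
int cache_level_seen_by(const struct node *cpu, const struct node *cache)
{
	const struct node *n;
	int level = 0;

	for (n = cpu->next_level_cache; n; n = n->next_level_cache) {
		level++;
		if (n == cache)
			return level;
	}

	return 0;
}
```

If one CPU reaches a shared cache through its own L1 and another points at
it directly, the same node comes out as level 2 for the former and level 1
for the latter, which is exactly the ambiguity a single cache-level value
cannot express.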

Thanks,
Lorenzo



Re: [PATCH RFC v1 3/3] ARM hibernation / suspend-to-disk

2014-02-19 Thread Lorenzo Pieralisi
On Wed, Feb 19, 2014 at 01:52:09AM +, Sebastian Capella wrote:

[...]

 diff --git a/arch/arm/kernel/hibernate.c b/arch/arm/kernel/hibernate.c
 new file mode 100644
 index 000..16f406f
 --- /dev/null
 +++ b/arch/arm/kernel/hibernate.c
 @@ -0,0 +1,106 @@
 +/*
 + * Hibernation support specific for ARM
 + *
 + * Derived from work on ARM hibernation support by:
 + *
 + * Ubuntu project, hibernation support for mach-dove
 + * Copyright (C) 2010 Nokia Corporation (Hiroshi Doyu)
 + * Copyright (C) 2010 Texas Instruments, Inc. (Teerth Reddy et al.)
 + *  https://lkml.org/lkml/2010/6/18/4
 + *  https://lists.linux-foundation.org/pipermail/linux-pm/2010-June/027422.html
 + *  https://patchwork.kernel.org/patch/96442/
 + *
 + * Copyright (C) 2006 Rafael J. Wysocki r...@sisk.pl
 + *
 + * License terms: GNU General Public License (GPL) version 2
 + */
 +
 +#include <linux/mm.h>
 +#include <linux/suspend.h>
 +#include <asm/tlbflush.h>
 +#include <asm/cacheflush.h>
 +#include <asm/system_misc.h>
 +#include <asm/idmap.h>
 +#include <asm/suspend.h>
 +
 +extern const void __nosave_begin, __nosave_end;
 +
 +int pfn_is_nosave(unsigned long pfn)
 +{
 + unsigned long nosave_begin_pfn =
 + __pa_symbol(&__nosave_begin) >> PAGE_SHIFT;
 + unsigned long nosave_end_pfn =
 + PAGE_ALIGN(__pa_symbol(&__nosave_end)) >> PAGE_SHIFT;
 +
 + return (pfn >= nosave_begin_pfn) && (pfn < nosave_end_pfn);
 +}
 +
 +void notrace save_processor_state(void)
 +{
 + WARN_ON(num_online_cpus() != 1);
 + flush_thread();

Can you explain to me please why we need to call flush_thread() here ?
At this point in time syscore_suspend() was already called and CPU
peripheral state saved through CPU PM notifiers.

 + local_fiq_disable();

To me it looks like we are using this hook to disable fiqs, since it is
not done in generic code.

 +}
 +
 +void notrace restore_processor_state(void)
 +{
 + local_fiq_enable();
 +}
 +
 +/*
 + * Snapshot kernel memory and reset the system.
 + * After resume, the hibernation snapshot is written out.
 + */
 +static int notrace __swsusp_arch_save_image(unsigned long unused)
 +{
 + int ret;
 +
 + ret = swsusp_save();
 + if (ret == 0)
 + soft_restart(virt_to_phys(cpu_resume));

By the time the suspend finisher (ie this function) is run, the
processor state has been saved and I think that's all you have to do,
function can just return after calling swsusp_save(), unless I am missing
something.

I do not understand why a soft_restart is required here. On a side note,
finisher is called with irqs disabled so, since you added a function for
soft restart noirq, it should be used, if needed, but I have to understand
why in the first place.

 + return ret;
 +}
 +
 +/*
 + * Save the current CPU state before suspend / poweroff.
 + */
 +int notrace swsusp_arch_suspend(void)
 +{
 + return cpu_suspend(0, __swsusp_arch_save_image);

If the goal of soft_restart is to return 0 on success from this call,
you can still do that without requiring a soft_restart in the first
place. IIUC all you want to achieve is to save processor context
registers so that when you resume from image you will actually return
from here.

 +}
 +
 +/*
 + * The framework loads the hibernation image into a linked list anchored
 + * at restore_pblist, for swsusp_arch_resume() to copy back to the proper
 + * destinations.
 + *
 + * To make this work if resume is triggered from initramfs, the
 + * pagetables need to be switched to allow writes to kernel mem.

Can you elaborate a bit more on this please ?

 + */
 +static void notrace __swsusp_arch_restore_image(void *unused)
 +{
 + struct pbe *pbe;
 +
 + cpu_switch_mm(idmap_pgd, &init_mm);

Same here, thanks.

 + for (pbe = restore_pblist; pbe; pbe = pbe->next)
 + copy_page(pbe->orig_address, pbe->address);
 +
 + soft_restart_noirq(virt_to_phys(cpu_resume));

This soft_restart is justified so that you resume from the context saved
when creating the image.

 +}
 +
 +static u8 __swsusp_resume_stk[PAGE_SIZE/2] __nosavedata;
 +
 +/*
 + * Resume from the hibernation image.
 + * Due to the kernel heap / data restore, stack contents change underneath
 + * and that would make function calls impossible; switch to a temporary
 + * stack within the nosave region to avoid that problem.
 + */
 +int __naked swsusp_arch_resume(void)
 +{
 + extern void call_with_stack(void (*fn)(void *), void *arg, void *sp);

Ok, a function with attribute __naked that still calls C functions, is
attr __naked really needed here ?

 + cpu_init(); /* get a clean PSR */

cpu_init is called in the cpu_resume path, why is this call needed here ?

Thanks,
Lorenzo



Re: [PATCH RFC v1 3/3] ARM hibernation / suspend-to-disk

2014-02-20 Thread Lorenzo Pieralisi
On Wed, Feb 19, 2014 at 07:10:31PM +, Russ Dill wrote:
 
 On 02/19/2014 08:12 AM, Lorenzo Pieralisi wrote:
 
 + *  https://patchwork.kernel.org/patch/96442/
 

I am guessing the snippets of code your comments refer to.

 I think the idea here is to get the CPU into a state so that later
 when we resume from the resume kernel, the actual CPU state matches
 the state we have in kernel. The main thing flush_thread does is clear
 out any and all FP state.

Which has already been saved through syscore_suspend()

 This may be part of the patchset that is OBE.

It has to be updated then.

 cpu_resume makes many assumptions about the state of the
 CPU, the primary being that the MMU is disabled, but also that all
 caches and IRQs are disabled. soft_restart does all this for us.
 
 
 
 ah, you are saying just return from __swsusp_arch_save_image and allow
 cpu_suspend_abort to be called, placing the result of swsusp_save
 somewhere else. This may work and would reduce the complexity of the
 code slightly.

Yes. Basically you are doing a soft reboot just to return 0.

 This is taken from the previous iteration of the patchset, I think the
 comment is OBE.

Update it please then.

 But this is still required to select the right mapping for our copying.

/me confused. Please describe what switching to idmap is meant to
achieve. In the patch above the copied swapper pgdir is not even used, I
would like to understand why this is done.

 I don't remember why I needed to prevent gcc from manipulating the
 stack here.

That's not a good reason to mark a function with attr __naked. If it is
needed we leave it there, if it is not it has to go.

 This is another holdover from previous patch versions that may be OBE.

See above.

Lorenzo



Re: [PATCH RFC v1 3/3] ARM hibernation / suspend-to-disk

2014-02-20 Thread Lorenzo Pieralisi
Hi Sebastian,

On Wed, Feb 19, 2014 at 07:33:15PM +, Sebastian Capella wrote:
 Quoting Lorenzo Pieralisi (2014-02-19 08:12:54)
  On Wed, Feb 19, 2014 at 01:52:09AM +, Sebastian Capella wrote:
  [...]
   diff --git a/arch/arm/kernel/hibernate.c b/arch/arm/kernel/hibernate.c
   new file mode 100644
   index 000..16f406f
   --- /dev/null
   +++ b/arch/arm/kernel/hibernate.c
   +void notrace save_processor_state(void)
   +{
   + WARN_ON(num_online_cpus() != 1);
   + flush_thread();
  
  Can you explain to me please why we need to call flush_thread() here ?
  At this point in time syscore_suspend() was already called and CPU
  peripheral state saved through CPU PM notifiers.
 
 Copying Russ' response here: 
 
 I think the idea here is to get the CPU into a state so that later
 when we resume from the resume kernel, the actual CPU state matches
 the state we have in kernel. The main thing flush_thread does is clear
 out any and all FP state. - Russ Dill

See my reply to Russ.

[...]

   +static void notrace __swsusp_arch_restore_image(void *unused)
   +{
   + struct pbe *pbe;
   +
   + cpu_switch_mm(idmap_pgd, &init_mm);
  
  Same here, thanks.
 
 At restore time, we take the save buffer data and restore it to the same
 physical locations used in the previous execution.  This will require having
 write access to all of memory, which may not be generally granted by the
 current mm.  So we switch to 1-1 init_mm to restore memory.

I still do not understand why switching to idmap, which is a clone of
init_mm + 1:1 kernel mappings is required here. Why idmap ?

And while at it, can't the idmap be overwritten _while_ copying back the
resume kernel ? Is it safe to use idmap page tables while copying ?

I had a look at x86 and there idmap page tables used to resume are created
on the fly using safe pages, on ARM idmap is created at boot.

I am grokking the code to understand what is really needed here, will get
back to you asap but I would like things to be clarified in the interim.

Thanks,
Lorenzo



Re: [RFC PATCH 3/3] idle: store the idle state index in the struct rq

2014-02-12 Thread Lorenzo Pieralisi
On Mon, Feb 03, 2014 at 04:17:47PM +, Arjan van de Ven wrote:

[...]

  1) A latency driven one
  2) A performance impact one
 
  first one is pretty much the exit latency related time, sort of a
  expected time to first instruction (currently menuidle has the
  99.999% worst case number, which is not useful for this, but is a
  first approximation). This is obviously the dominating number for
  expected-short running tasks
 
  second one is more of an "is there any cache/TLB left or is it flushed"
  kind of metric. It's more tricky to compute, since what is the cost of
  an empty cache (or even a cache migration) after all? But I
  suspect it's in part what the scheduler will care about more for
  expected-long running tasks.
 
  Yeah, so currently we 'assume' cache hotness based on runtime; see
  task_hot(). A hint that the CPU wiped its caches might help there.
 
 if there's a simple api like
 
 sched_cpu_cache_wiped(int llc)
 
 that would be very nice for this; the menuidle side knows this
 for some cases and thus can just call it. This would be a very
 small and minimal change

What do you mean by "menuidle side knows this for some cases" ?
You mean you know that some C-state entries imply llc clean/invalidate ?

Thanks,
Lorenzo



Re: [RFC PATCH 3/3] idle: store the idle state index in the struct rq

2014-02-12 Thread Lorenzo Pieralisi
On Wed, Feb 12, 2014 at 04:14:38PM +, Arjan van de Ven wrote:
 
  sched_cpu_cache_wiped(int llc)
 
  that would be very nice for this; the menuidle side knows this
  for some cases and thus can just call it. This would be a very
  small and minimal change
 
  What do you mean by menuidle side knows this for some cases ?
  You mean you know that some C-state entries imply llc clean/invalidate ?
 
 in the architectural idle code we can know if the llc got flushed
 there's also the per core flags where we know with reasonable certainty
 that the per core caches got flushed.

Ok, but that's arch specific, not something we can detect from the menu
governor in generic code; that's what I wanted to ask because it was not
clear.
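
To check my understanding of the proposal, a userspace sketch of how such a
hint could feed the runtime-based cache hotness guess. Everything here is an
assumption made for illustration: the hook is keyed on a CPU number rather
than on the llc argument suggested above, sched_cpu_cache_refilled() and
task_cache_hot() are made-up names, and the 500000 ns threshold is only a
placeholder for the scheduler's migration cost tunable:

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4	/* hypothetical, for the sketch only */

/* Placeholder threshold in ns; the real value is a scheduler tunable. */
static const uint64_t migration_cost_ns = 500000;

/* One flag per CPU, set when its caches are known to be empty. */
static bool cache_wiped[NR_CPUS];

/* Called by arch idle code when it knows the CPU's caches were flushed. */
void sched_cpu_cache_wiped(int cpu)
{
	cache_wiped[cpu] = true;
}

/* Called once a task has run on the CPU again and repopulated its caches. */
void sched_cpu_cache_refilled(int cpu)
{
	cache_wiped[cpu] = false;
}

/*
 * Runtime-based guess, overridden by the arch hint: a task that ran
 * recently is assumed cache hot unless the caches were wiped meanwhile.
 */
bool task_cache_hot(int cpu, uint64_t now_ns, uint64_t last_ran_ns)
{
	if (cache_wiped[cpu])
		return false;

	return now_ns - last_ran_ns < migration_cost_ns;
}
```

The point of the sketch is that the caller has to be arch idle code that
actually knows the flush happened; the generic governor has no way to set
the flag on its own.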

Thanks,
Lorenzo



Re: [PATCH] arm/dt: Don't add disabled CPUs to system topology

2013-06-07 Thread Lorenzo Pieralisi
On Fri, Jun 07, 2013 at 03:20:20PM +0100, Rob Herring wrote:
 On 06/07/2013 05:23 AM, Lorenzo Pieralisi wrote:
  Hi James,
  
  On Thu, Jun 06, 2013 at 06:11:25PM +0100, James King wrote:
  If CPUs are marked as disabled in the devicetree, make sure they do
  not exist in the system CPU information and CPU topology information.
  In this case these CPUs will not be able to be added to the system later
  using hot-plug. This allows a single chip with many CPUs to be easily
  used in a variety of hardware devices where they may have different
  actual processing requirements (eg for thermal/cost reasons).
 
  - Change devicetree.c to ignore any cpu nodes marked as disabled,
this effectively limits the number of active cpu cores so no need
for the max_cpus=x in the chosen node.
  - Change topology.c to ignore any cpu nodes marked as disabled, this
is where the scheduler would learn about big/LITTLE cores so this
effectively keeps the scheduler in sync.
 
  
  I have two questions:
  
  1) Since with this approach the DT should change anyway if on different
 hardware devices based on the same chip you want to allow booting a
 different number of CPUs, why do not we remove the cpu nodes instead of
 disabling them ? Put it another way: cpu nodes define a cpu as
 possible (currently), we can simply remove the node if we do not want
 that cpu to be seen by the kernel.
  2) If we go for the status property, why do not we use it to set present
 mask ? That way the cpu is possible but not present, you cannot
 hotplug it in. It is a bit of a stretch, granted, the cpu _is_ present,
 we just want to disable it, do not know how this is handled in x86
 and other archs though.
 
 The meaning of disabled for cpus in ePAPR is that the cpu is offline
 (i.e. in a spinloop or wfi), not that the cpu is unavailable. This is a
 bit of a departure and inconsistency from how status for devices are
 used. That would imply that we should be setting status to disabled for
 all secondary cores and that possibly the status value should get
 updated to reflect the state of the cpu.

Yes, that's what I understood from the ePAPR as well. According to
the ePAPR, as you say, a cpu with its status property == disabled is a
possible CPU, since it can be enabled (through a specific enable-method).

I am not sure status can be reused for the purpose this patch was developed
for without changing the bindings in the ePAPR (ie if DT parsing skips
cpu nodes with status == disabled, this is a significant departure
from what ePAPR defines, and it would force us to define an enable-method
to enable/online those CPUs which is not what this patch was developed for).

How was PowerPC tackling the problem James set about solving ?

Lorenzo



Re: [PATCH] arm/dt: Don't add disabled CPUs to system topology

2013-06-07 Thread Lorenzo Pieralisi
On Fri, Jun 07, 2013 at 12:48:58PM +0100, James King wrote:
 Hi Lorenzo,
 
 On 7 June 2013 11:23, Lorenzo Pieralisi lorenzo.pieral...@arm.com wrote:
  Hi James,
 
  On Thu, Jun 06, 2013 at 06:11:25PM +0100, James King wrote:
  If CPUs are marked as disabled in the devicetree, make sure they do
  not exist in the system CPU information and CPU topology information.
  In this case these CPUs will not be able to be added to the system later
  using hot-plug. This allows a single chip with many CPUs to be easily
  used in a variety of hardware devices where they may have different
  actual processing requirements (eg for thermal/cost reasons).
 
  - Change devicetree.c to ignore any cpu nodes marked as disabled,
this effectively limits the number of active cpu cores so no need
for the max_cpus=x in the chosen node.
  - Change topology.c to ignore any cpu nodes marked as disabled, this
is where the scheduler would learn about big/LITTLE cores so this
effectively keeps the scheduler in sync.
 
 
  I have two questions:
 
  1) Since with this approach the DT should change anyway if on different
 hardware devices based on the same chip you want to allow booting a
 different number of CPUs, why do not we remove the cpu nodes instead of
 disabling them ? Put it another way: cpu nodes define a cpu as
 possible (currently), we can simply remove the node if we do not want
 that cpu to be seen by the kernel.
 
 The reason we want disabled status rather than just remove the nodes
 is to use a common soc.dtsi file which is included in many board.dts
 files - eg:
 
 file soc.dtsi contains:
 
 cpus {
         cpu0: cpu@0 {
                 device_type = "cpu";
                 compatible = "arm,cortex-a7";
                 reg = <0>;
                 cluster = <&cluster0>;
                 core = <&core0>;

Minor nit, cluster and core phandles are not part of the cpu bindings
that will be merged this cycle, I know it is just an example.

         };
 
         cpu1: cpu@1 {
                 device_type = "cpu";
                 compatible = "arm,cortex-a7";
                 reg = <1>;
                 cluster = <&cluster0>;
                 core = <&core1>;
         };
 
         cpu2: cpu@2 {
                 device_type = "cpu";
                 compatible = "arm,cortex-a15";
                 reg = <2>;
                 cluster = <&cluster0>;
                 core = <&core2>;
         };
 };
 
 file board1.dts where we want the A15 disabled contains:
 
 /include/ "soc.dtsi"
 
 cpus {
         cpu2: cpu@2 {
                 status = "disabled";
         };
 };

Understood, see the other reply as far as the status property is concerned.

  2) If we go for the status property, why do not we use it to set present
 mask ? That way the cpu is possible but not present, you cannot
 hotplug it in. It is a bit of a stretch, granted, the cpu _is_ present,
 we just want to disable it, do not know how this is handled in x86
 and other archs though.
 
 I have been struggling to find any equivalent example for another
 arch, so just tried to solve our problem. I guess in the x86 world it
 is less likely to want to disable processors in a SoC for heat/battery
 issues so this has just never arisen. In this case the cpu is
 physically present but not possible, but I am not sure it should be in
 a present mask (giving the impression it can be used). Perhaps you
 could elaborate with an example what you are thinking about here?

I was thinking about using status == disabled to mark a cpu as
possible but not present; that is a bad idea for the reason you mentioned
and also for the point Rob raised related to the ePAPR.

I am not sure how we can solve this issue; as I mentioned, the easiest
solution consists in not defining those cpu nodes in the DT, or probably in
adding an additional property (other than status) with proper bindings
attached to it.

Lorenzo



Re: [RFC PATCH v3 0/2] drivers: mfd: Versatile Express SPC support

2013-06-11 Thread Lorenzo Pieralisi
Hi Samuel,

if nobody has objections I think this set is ready to get merged. As
Nico mentioned in:

http://lists.infradead.org/pipermail/linux-arm-kernel/2013-June/173541.html

since we would like to get it merged through the ARM SoC tree owing to
dependencies between this code and ARM power management back-ends, your ack
would be appreciated, if you think it is worth it of course.

Thank you very much indeed,
Lorenzo

On Thu, Jun 06, 2013 at 10:59:21AM +0100, Lorenzo Pieralisi wrote:
 This patch is v3 of a previous posting:
 
 http://lists.infradead.org/pipermail/linux-arm-kernel/2013-June/173464.html
 
 v3 changes:
 
 - added __refdata to spc_check_loaded pointer
 - removed some exported symbols
 - added node pointer check in vexpress_spc_init()
 
 v2 changes:
 
 - Dropped timeout interface patch
 - Converted interfaces to non-timeout ones, integrated and retested
 - Removed mutex used at init
 - Refactored code to work around init sections warning
 - Fixed two minor bugs
 
 This patch series introduces support for the Versatile Express Serial
 Power Controller (SPC) present in ARM Versatile Express TC2 core tiles.
 SPC driver is a fundamental component of TC2 power management and allows
 to carry out C-state management and DVFS for A15 and A7 clusters.
 
 First patch provides changes required by SPC to comply with the
 Versatile Express config API, second patch is the SPC driver implementation.
 
 Code extensively exercised through CPUidle and CPUfreq power states and
 operating point transitions.
 
 Lorenzo Pieralisi (1):
   drivers: mfd: vexpress: add Serial Power Controller (SPC) support
 
 Pawel Moll (1):
   drivers: mfd: refactor the vexpress config bridge API
 
  Documentation/devicetree/bindings/mfd/vexpress-spc.txt |  35 +
  drivers/mfd/Kconfig                                    |   7 +
  drivers/mfd/Makefile                                   |   1 +
  drivers/mfd/vexpress-config.c                          |  61 +-
  drivers/mfd/vexpress-spc.c                             | 633 ++
  drivers/mfd/vexpress-sysreg.c                          |   2 +-
  include/linux/vexpress.h                               |  59 +-
  7 files changed, 770 insertions(+), 28 deletions(-)
  create mode 100644 Documentation/devicetree/bindings/mfd/vexpress-spc.txt
  create mode 100644 drivers/mfd/vexpress-spc.c
 
 -- 
 1.8.2.2
 



Re: [RFC PATCH v3 2/2] drivers: mfd: vexpress: add Serial Power Controller (SPC) support

2013-06-13 Thread Lorenzo Pieralisi
Hi Samuel,

first things first, thanks a lot for having a look.

On Thu, Jun 13, 2013 at 01:01:43AM +0100, Samuel Ortiz wrote:
 Hi Lorenzo,
 
 I don't particularly like this code, but I guess most of my dislike
 comes from the whole bridge interface API and how that forces you into
 implementing pretty much static code.

I do not particularly like it either; you have to grant us though, as Nico
explained, that the usage of this piece of hardware very early at boot is
forcing us to find a solution that is not necessarily easy to implement.

 A few nitpicks:
 
 On Thu, Jun 06, 2013 at 10:59:23AM +0100, Lorenzo Pieralisi wrote:
  diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
  index d54e985..391eda1 100644
  --- a/drivers/mfd/Kconfig
  +++ b/drivers/mfd/Kconfig
  @@ -1148,3 +1148,10 @@ config VEXPRESS_CONFIG
  help
Platform configuration infrastructure for the ARM Ltd.
Versatile Express.
  +
  +config VEXPRESS_SPC
  +   bool "Versatile Express SPC driver support"
  +   depends on ARM
  +   depends on VEXPRESS_CONFIG
  +   help
 Please provide a detailed help entry here. 

Ok.

  + Serial Power Controller driver for ARM Ltd. test chips.
  diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile
  index 718e94a..3a01203 100644
  --- a/drivers/mfd/Makefile
  +++ b/drivers/mfd/Makefile
  @@ -153,5 +153,6 @@ obj-$(CONFIG_MFD_SEC_CORE)  += sec-core.o sec-irq.o
   obj-$(CONFIG_MFD_SYSCON)   += syscon.o
   obj-$(CONFIG_MFD_LM3533)   += lm3533-core.o lm3533-ctrlbank.o
   obj-$(CONFIG_VEXPRESS_CONFIG)  += vexpress-config.o vexpress-sysreg.o
  +obj-$(CONFIG_VEXPRESS_SPC) += vexpress-spc.o
 So you have Versatile Express platforms that will not need SPC ? i.e.
 why isn't all that stuff under a generic CONFIG_VEXPRESS symbol ?

You answered your own question, the Serial Power Controller aka SPC is
present only in one of the many coretiles that can be stacked on top
of the versatile express motherboard, so it requires a specific entry
unless we want to compile it in for all vexpress platforms.

  +static struct vexpress_spc_drvdata *info;
  +static u32 *vexpress_spc_config_data;
  +static struct vexpress_config_bridge *vexpress_spc_config_bridge;
  +static struct vexpress_config_func *opp_func, *perf_func;
  +
  +static int vexpress_spc_load_result = -EAGAIN;
 As I said, quite static...

I will have a look and see if I can improve it, I could include some of
those variables in the driver data and alloc them dynamically.

  +irqreturn_t vexpress_spc_irq_handler(int irq, void *data)
 missing a static here ?

Were not there enough :-) ? Correct, I will fix it.

  +static bool __init __vexpress_spc_check_loaded(void);
  +/*
  + * Pointer spc_check_loaded is swapped after init hence it is safe
  + * to initialize it to a function in the __init section
  + */
  +static bool (*spc_check_loaded)(void) __refdata = __vexpress_spc_check_loaded;
  +
  +static bool __init __vexpress_spc_check_loaded(void)
  +{
  +   if (vexpress_spc_load_result == -EAGAIN)
  +   vexpress_spc_load_result = vexpress_spc_init();
  +   spc_check_loaded = vexpress_spc_initialized;
  +   return vexpress_spc_initialized();
  +}
  +
  +/*
  + * Function exported to manage early_initcall ordering.
  + * SPC code is needed very early in the boot process
  + * to bring CPUs out of reset and initialize power
  + * management back-end. After boot swap pointers to
  + * make the functionality check available to loadable
  + * modules, when early boot init functions have been
  + * already freed from kernel address space.
  + */
  +bool vexpress_spc_check_loaded(void)
  +{
  +   return spc_check_loaded();
  +}
  +EXPORT_SYMBOL_GPL(vexpress_spc_check_loaded);
 That one and the previous function look really nasty to me.
 The simple fact that you need a static variable in your code to check if
 your module is loaded sounds really fishy.

Nico explained the reasons behind this nasty hack, because that's what it
is. The only solution is resorting to vexpress platform code to initialize
this driver directly (providing a static virtual memory mapping since that
has to happen very early) to remove all needs for early_initcall
synchronization and remove that variable. It won't look nicer though.

I will review the code again to see how I can improve it.

Thanks a lot,
Lorenzo
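
The pointer swap discussed above can be modelled outside the kernel. What
follows is a minimal user-space sketch, not the driver code: every demo_*
name is hypothetical, the init routine is a stub that always succeeds, and
the __init/__refdata section handling (the reason the swap exists in the
first place) cannot be reproduced in plain C and is only mentioned in
comments.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* All names below are hypothetical stand-ins, not the driver's symbols. */
static int demo_load_result = -EAGAIN;	/* -EAGAIN: init not attempted yet */
static int demo_init_calls;		/* how many times init really ran */

/* Stand-in for the boot-only init routine; pretend the probe succeeds. */
static int demo_init(void)
{
	demo_init_calls++;
	return 0;
}

/* Steady-state check: only looks at the cached result. */
static bool demo_initialized(void)
{
	return demo_load_result == 0;
}

static bool demo_check_first(void);

/* The pointer starts at the boot-time variant (__init in the kernel). */
static bool (*demo_check)(void) = demo_check_first;

/* Boot-time check: initialise on demand, then swap the pointer. */
static bool demo_check_first(void)
{
	if (demo_load_result == -EAGAIN)
		demo_load_result = demo_init();
	demo_check = demo_initialized;
	return demo_initialized();
}

/* Single entry point for every consumer, early or late. */
bool demo_check_loaded(void)
{
	assert(demo_check);	/* never NULL by construction */
	return demo_check();
}
```

The first call to demo_check_loaded() runs the stub init and redirects the
pointer; later calls only read the cached status, so nothing reaches the
boot-time function after the point where the kernel would have freed it.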



Re: [RFC PATCH v3 2/2] drivers: mfd: vexpress: add Serial Power Controller (SPC) support

2013-06-14 Thread Lorenzo Pieralisi
Hi Olof,

thank you very much for having a look.

On Thu, Jun 13, 2013 at 11:52:33PM +0100, Olof Johansson wrote:
 Hi,
 
 Overall this driver looks like it just needs more cooking
 time. It's... gritty.  Complicated when it should be simple and
 layered. Naming is nonobvious, and overall it's hard to glance at a
 function and say "ah, it does x".

It is hard to make it clear too (and I am not being defensive), since it is
not a standard component following a standard interface. Point taken, though:
I will do my best to improve it according to your reviews.

 I think some of this might be because of naming conventions. Lots of long
 prefixes, subsystem indirection, etc. I wish I had a straight answer to
 what would make it better, but just more overall polish.
 
 So, I have a bunch of comments below. Most of these are related to
 readability, which is one of the most important things of new code
 these days.
 
 Please find a shorter suitable prefix than vexpress_spc_.* too, it's
 way too long.

Ok.

 On Thu, Jun 06, 2013 at 10:59:23AM +0100, Lorenzo Pieralisi wrote:
  The TC2 versatile express core tile integrates a logic block that provides the
  interface between the dual cluster test-chip and the M3 microcontroller that
  carries out power management. The logic block, called Serial Power Controller
  (SPC), contains several memory mapped registers to control among other things
  low-power states, operating points and reset control.
 
  This patch provides a driver that enables run-time control of features
  implemented by the SPC control logic.
 
  The driver also provides a bridge interface through the vexpress config
  infrastructure. Operations allowing to read/write operating points are
  made to go via the same interface as configuration transactions so that
  all requests to M3 are serialized.
 
  Device tree bindings documentation for the SPC component is provided with
  the patchset.
 
  Cc: Samuel Ortiz sa...@linux.intel.com
  Cc: Pawel Moll pawel.m...@arm.com
  Cc: Nicolas Pitre nicolas.pi...@linaro.org
  Cc: Amit Kucheria amit.kuche...@linaro.org
  Cc: Jon Medhurst t...@linaro.org
  Signed-off-by: Achin Gupta achin.gu...@arm.com
  Signed-off-by: Lorenzo Pieralisi lorenzo.pieral...@arm.com
  Signed-off-by: Sudeep KarkadaNagesha sudeep.karkadanage...@arm.com
  Reviewed-by: Nicolas Pitre n...@linaro.org
  ---
   Documentation/devicetree/bindings/mfd/vexpress-spc.txt |  35 +
   drivers/mfd/Kconfig                                    |   7 +
   drivers/mfd/Makefile                                   |   1 +
   drivers/mfd/vexpress-spc.c                             | 633 ++
   include/linux/vexpress.h                               |  43 +
   5 files changed, 719 insertions(+)
 
  diff --git a/Documentation/devicetree/bindings/mfd/vexpress-spc.txt b/Documentation/devicetree/bindings/mfd/vexpress-spc.txt
  new file mode 100644
  index 000..1d71dc2
  --- /dev/null
  +++ b/Documentation/devicetree/bindings/mfd/vexpress-spc.txt
  @@ -0,0 +1,35 @@
  +* ARM Versatile Express Serial Power Controller device tree bindings
  +
  +Latest ARM development boards implement a power management interface (serial
  +power controller - SPC) that is capable of managing power/voltage and
  +operating point transitions, through memory mapped registers interface.
  +
  +On testchips like TC2 it also provides a configuration interface that can
  +be used to read/write values which cannot be read/written through simple
  +memory mapped reads/writes.
 
 A configuration interface for what? Just having it as a PMIC doesn't warrant
 it being an MFD, really.

Ok, description added.

  +- spc node
  +
  + - compatible:
  + Usage: required
  + Value type: stringlist
  + Definition: must be
  + "arm,vexpress-spc,v2p-ca15_a7","arm,vexpress-spc"
  + - reg:
  + Usage: required
  + Value type: prop-encoded-array
  + Definition: A standard property that specifies the base address
  + and the size of the SPC address space
  + - interrupts:
  + Usage: required
  + Value type: prop-encoded-array
  + Definition:  SPC interrupt configuration. A standard property
  +  that follows ePAPR interrupts specifications
  +
  +Example:
  +
  +spc: spc@7fff {
  + compatible = "arm,vexpress-spc,v2p-ca15_a7","arm,vexpress-spc";
 
 Nit: space after comma between strings.
 
  + reg = <0 0x7FFF 0 0x1000>;
 
 #size-cells 2 on the parent bus? That's somewhat unusual. We tend to prefer
 lowercase hex.
 

Ok on both counts.

  + interrupts = <0 95 4>;
  +};
  diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
  index d54e985..391eda1 100644
  --- a/drivers/mfd/Kconfig
  +++ b/drivers/mfd/Kconfig
  @@ -1148,3 +1148,10 @@ config VEXPRESS_CONFIG
help
  Platform configuration infrastructure for the ARM Ltd.
  Versatile

[RFC PATCH v4 0/2] drivers: mfd: Versatile Express SPC support

2013-06-17 Thread Lorenzo Pieralisi
This patch is v4 of a previous posting:

http://lists.infradead.org/pipermail/linux-arm-kernel/2013-June/173831.html

v4 changes:
- Applied review comments (trimmed function names, added comments, refactored
  some APIs)
- Added comments throughout the set
- Fixed irq handler bug in checking the transaction status
- Improved commit log to explain early init synchro scheme
- Created a single static structure for variables dynamically allocated to
  remove usage of static
- Improved Kconfig entry

v3 changes:

- added __refdata to spc_check_loaded pointer
- removed some exported symbols
- added node pointer check in vexpress_spc_init()

v2 changes:

- Dropped timeout interface patch
- Converted interfaces to non-timeout ones, integrated and retested
- Removed mutex used at init
- Refactored code to work around init sections warning
- Fixed two minor bugs

This patch series introduces support for the Versatile Express Serial
Power Controller (SPC) present in ARM Versatile Express TC2 core tiles.
The SPC driver is a fundamental component of TC2 power management and enables
C-state management and DVFS for the A15 and A7 clusters.

First patch provides changes required by SPC to comply with the
Versatile Express config API, second patch is the SPC driver implementation.

Code extensively exercised through CPUidle and CPUfreq power states and
operating point transitions.

Lorenzo Pieralisi (1):
  drivers: mfd: vexpress: add Serial Power Controller (SPC) support

Pawel Moll (1):
  drivers: mfd: refactor the vexpress config bridge API

 Documentation/devicetree/bindings/mfd/vexpress-spc.txt |  36 +
 drivers/mfd/Kconfig                                    |  13 +
 drivers/mfd/Makefile                                   |   1 +
 drivers/mfd/vexpress-config.c                          |  61 +-
 drivers/mfd/vexpress-spc.c                             | 666 ++
 drivers/mfd/vexpress-sysreg.c                          |   2 +-
 include/linux/vexpress.h                               |  49 +-
 7 files changed, 800 insertions(+), 28 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/mfd/vexpress-spc.txt
 create mode 100644 drivers/mfd/vexpress-spc.c

-- 
1.8.2.2




[RFC PATCH v4 2/2] drivers: mfd: vexpress: add Serial Power Controller (SPC) support

2013-06-17 Thread Lorenzo Pieralisi
The TC2 versatile express core tile integrates a logic block that provides the
interface between the dual cluster test-chip and the M3 microcontroller that
carries out power management. The logic block, called Serial Power Controller
(SPC), contains several memory mapped registers to control among other things
low-power states, operating points and reset control.

This patch provides a driver that enables run-time control of features
implemented by the SPC control logic.

The SPC control logic is required to be programmed very early in the boot
process to reset secondary CPUs on the TC2 testchip, set-up jump addresses and
wake-up IRQs for power management.
Since the SPC logic is also used to control clocks and operating points,
that have to be initialized early as well, the SPC interface consumers cannot
rely on early initcalls ordering, which is inconsistent, to wait for SPC
initialization. Hence, in order to keep the components relying on the SPC
coded in a sane way, the driver puts in place a synchronization scheme that
allows kernel drivers to check if the SPC driver has been initialized and if
not, to initialize it upon check.

A status variable is kept in memory so that loadable modules that require SPC
interface (eg CPUfreq drivers) can still check the correct initialization and
use the driver correctly after functions used at boot to init the driver are
freed.

The driver also provides a bridge interface through the vexpress config
infrastructure. Operations allowing to read/write operating points are
made to go via the same interface as configuration transactions so that
all requests to M3 are serialized.

Device tree bindings documentation for the SPC component is provided with
the patchset.

Cc: Samuel Ortiz sa...@linux.intel.com
Cc: Olof Johansson o...@lixom.net
Cc: Pawel Moll pawel.m...@arm.com
Cc: Amit Kucheria amit.kuche...@linaro.org
Cc: Jon Medhurst t...@linaro.org
Signed-off-by: Achin Gupta achin.gu...@arm.com
Signed-off-by: Lorenzo Pieralisi lorenzo.pieral...@arm.com
Signed-off-by: Sudeep KarkadaNagesha sudeep.karkadanage...@arm.com
Reviewed-by: Nicolas Pitre n...@linaro.org
---
 Documentation/devicetree/bindings/mfd/vexpress-spc.txt |  36 +
 drivers/mfd/Kconfig                                    |  13 +
 drivers/mfd/Makefile                                   |   1 +
 drivers/mfd/vexpress-spc.c                             | 666 ++
 include/linux/vexpress.h                               |  33 +
 5 files changed, 749 insertions(+)

diff --git a/Documentation/devicetree/bindings/mfd/vexpress-spc.txt b/Documentation/devicetree/bindings/mfd/vexpress-spc.txt
new file mode 100644
index 000..bd381d1
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/vexpress-spc.txt
@@ -0,0 +1,36 @@
+* ARM Versatile Express Serial Power Controller device tree bindings
+
+Latest ARM development boards implement a power management interface (serial
+power controller - SPC) that is capable of managing power/voltage and
+operating point transitions, through memory mapped registers interface.
+
+On testchips like TC2 it also provides a serial configuration interface
+that can be used to retrieve temperature sensors, energy/voltage/current
+probes and oscillators values through the SYS configuration protocol defined
+for versatile express motherboards.
+
+- spc node
+
+   - compatible:
+   Usage: required
+   Value type: stringlist
+   Definition: must be
+   "arm,vexpress-spc,v2p-ca15_a7","arm,vexpress-spc"
+   - reg:
+   Usage: required
+   Value type: prop-encoded-array
+   Definition: A standard property that specifies the base address
+   and the size of the SPC address space
+   - interrupts:
+   Usage: required
+   Value type: prop-encoded-array
+   Definition:  SPC interrupt configuration. A standard property
+that follows ePAPR interrupts specifications
+
+Example:
+
+spc: spc@7fff {
+   compatible = "arm,vexpress-spc,v2p-ca15_a7", "arm,vexpress-spc";
+   reg = <0x7fff 0x1000>;
+   interrupts = <0 95 4>;
+};
diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
index d54e985..e032099 100644
--- a/drivers/mfd/Kconfig
+++ b/drivers/mfd/Kconfig
@@ -1148,3 +1148,16 @@ config VEXPRESS_CONFIG
help
  Platform configuration infrastructure for the ARM Ltd.
  Versatile Express.
+
+config VEXPRESS_SPC
+   bool "Versatile Express SPC driver support"
+   depends on ARM
+   depends on VEXPRESS_CONFIG
+   help
+ The Serial Power Controller (SPC) for ARM Ltd. test chips, is
+ an IP that provides a memory mapped interface to power controller
+ HW and also a configuration interface compatible with the existing
+ Versatile Express SYS configuration protocol. The driver provides
+ an API abstraction allowing to control operating

[RFC PATCH v4 1/2] drivers: mfd: refactor the vexpress config bridge API

2013-06-17 Thread Lorenzo Pieralisi
From: Pawel Moll pawel.m...@arm.com

The introduction of Serial Power Controller (SPC) requires the vexpress
config interface to change slightly since the SPC memory mapped interface
can be used as configuration bus but also for operating points
programming and retrieval. The helper that allocates the bridge functions
requires an additional parameter allowing to request component specific
functions that need not be initialized through device tree bindings but
just using simple look-up and statically defined constants.

This patch introduces the necessary changes to the vexpress config layer
to cater for the new vexpress bridge interface requirements.
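
A possible shape for a bridge callback that uses the new id argument is
sketched below. This is a hedged user-space illustration rather than the
code in this series: the demo_* names, the "opp" and "perf" identifiers and
the command values are assumptions, and struct device / struct device_node
are replaced by void pointers to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical function descriptor; the real driver's layout may differ. */
struct demo_func {
	const char *id;		/* name a caller asks for */
	int command;		/* statically defined constant behind it */
};

/* Statically defined table: no device tree properties are consulted. */
static struct demo_func demo_funcs[] = {
	{ "opp",  1 },
	{ "perf", 2 },
};

/*
 * Callback in the shape of the refactored func_get(): the extra 'id'
 * argument selects a component specific function by name. 'dev' and
 * 'node' are unused here and only kept to mirror the signature.
 */
static void *demo_func_get(void *dev, void *node, const char *id)
{
	size_t i;

	(void)dev;
	(void)node;
	if (!id)
		return NULL;
	for (i = 0; i < sizeof(demo_funcs) / sizeof(demo_funcs[0]); i++)
		if (strcmp(demo_funcs[i].id, id) == 0)
			return &demo_funcs[i];
	return NULL;
}

/* Convenience wrapper for ids the caller knows are in the table. */
static int demo_command(const char *id)
{
	struct demo_func *f = demo_func_get(NULL, NULL, id);

	assert(f);	/* programming error if a built-in id is missing */
	return f->command;
}
```

A caller that has no device tree node can then ask for a function by name
alone, while a device-tree based caller would pass a NULL id and rely on
dev/node as before.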

Cc: Samuel Ortiz sa...@linux.intel.com
Cc: Achin Gupta achin.gu...@arm.com
Cc: Sudeep KarkadaNagesha sudeep.karkadanage...@arm.com
Cc: Pawel Moll pawel.m...@arm.com
Cc: Amit Kucheria amit.kuche...@linaro.org
Cc: Jon Medhurst t...@linaro.org
Signed-off-by: Pawel Moll pawel.m...@arm.com
Acked-by: Nicolas Pitre n...@linaro.org
---
 drivers/mfd/vexpress-config.c | 61 ++
 drivers/mfd/vexpress-sysreg.c |  2 +-
 include/linux/vexpress.h      | 16 ++-
 3 files changed, 51 insertions(+), 28 deletions(-)

diff --git a/drivers/mfd/vexpress-config.c b/drivers/mfd/vexpress-config.c
index 84ce6b9..1af2b0e 100644
--- a/drivers/mfd/vexpress-config.c
+++ b/drivers/mfd/vexpress-config.c
@@ -86,29 +86,13 @@ void vexpress_config_bridge_unregister(struct vexpress_config_bridge *bridge)
 }
 EXPORT_SYMBOL(vexpress_config_bridge_unregister);
 
-
-struct vexpress_config_func {
-   struct vexpress_config_bridge *bridge;
-   void *func;
-};
-
-struct vexpress_config_func *__vexpress_config_func_get(struct device *dev,
-   struct device_node *node)
+static struct vexpress_config_bridge *
+   vexpress_config_bridge_find(struct device_node *node)
 {
-   struct device_node *bridge_node;
-   struct vexpress_config_func *func;
int i;
+   struct vexpress_config_bridge *res = NULL;
+   struct device_node *bridge_node = of_node_get(node);
 
-   if (WARN_ON(dev && node && dev->of_node != node))
-   return NULL;
-   if (dev && !node)
-   node = dev->of_node;
-
-   func = kzalloc(sizeof(*func), GFP_KERNEL);
-   if (!func)
-   return NULL;
-
-   bridge_node = of_node_get(node);
while (bridge_node) {
const __be32 *prop = of_get_property(bridge_node,
"arm,vexpress,config-bridge", NULL);
@@ -129,13 +113,46 @@ struct vexpress_config_func *__vexpress_config_func_get(struct device *dev,
 
if (test_bit(i, vexpress_config_bridges_map) &&
bridge->node == bridge_node) {
-   func->bridge = bridge;
-   func->func = bridge->info->func_get(dev, node);
+   res = bridge;
break;
}
}
mutex_unlock(&vexpress_config_bridges_mutex);
 
+   return res;
+}
+
+
+struct vexpress_config_func {
+   struct vexpress_config_bridge *bridge;
+   void *func;
+};
+
+struct vexpress_config_func *__vexpress_config_func_get(
+   struct vexpress_config_bridge *bridge,
+   struct device *dev,
+   struct device_node *node,
+   const char *id)
+{
+   struct vexpress_config_func *func;
+
+   if (WARN_ON(dev && node && dev->of_node != node))
+   return NULL;
+   if (dev && !node)
+   node = dev->of_node;
+
+   if (!bridge)
+   bridge = vexpress_config_bridge_find(node);
+   if (!bridge)
+   return NULL;
+
+   func = kzalloc(sizeof(*func), GFP_KERNEL);
+   if (!func)
+   return NULL;
+
+   func->bridge = bridge;
+   func->func = bridge->info->func_get(dev, node, id);
+
if (!func->func) {
of_node_put(node);
kfree(func);
diff --git a/drivers/mfd/vexpress-sysreg.c b/drivers/mfd/vexpress-sysreg.c
index 96a020b..d2599aa 100644
--- a/drivers/mfd/vexpress-sysreg.c
+++ b/drivers/mfd/vexpress-sysreg.c
@@ -165,7 +165,7 @@ static u32 *vexpress_sysreg_config_data;
 static int vexpress_sysreg_config_tries;
 
 static void *vexpress_sysreg_config_func_get(struct device *dev,
-   struct device_node *node)
+   struct device_node *node, const char *id)
 {
struct vexpress_sysreg_config_func *config_func;
u32 site;
diff --git a/include/linux/vexpress.h b/include/linux/vexpress.h
index 6e7980d..50368e0 100644
--- a/include/linux/vexpress.h
+++ b/include/linux/vexpress.h
@@ -68,7 +68,8 @@
  */
 struct vexpress_config_bridge_info {
const char *name;
-   void *(*func_get)(struct device *dev, struct device_node *node);
+   void *(*func_get)(struct device *dev, struct device_node *node,
+ const char *id);
void (*func_put)(void *func);
int (*func_exec)(void *func, int offset, 

[RFC PATCH v2 2/2] drivers: mfd: vexpress: add Serial Power Controller (SPC) support

2013-06-05 Thread Lorenzo Pieralisi
The TC2 versatile express core tile integrates a logic block that provides the
interface between the dual cluster test-chip and the M3 microcontroller that
carries out power management. The logic block, called Serial Power Controller
(SPC), contains several memory mapped registers to control among other things
low-power states, operating points and reset control.

This patch provides a driver that enables run-time control of features
implemented by the SPC control logic.

The driver also provides a bridge interface through the vexpress config
infrastructure. Operations allowing to read/write operating points are
made to go via the same interface as configuration transactions so that
all requests to M3 are serialized.

Device tree bindings documentation for the SPC component is provided with
the patchset.

Cc: Samuel Ortiz sa...@linux.intel.com
Cc: Pawel Moll pawel.m...@arm.com
Cc: Nicolas Pitre nicolas.pi...@linaro.org
Cc: Amit Kucheria amit.kuche...@linaro.org
Cc: Jon Medhurst t...@linaro.org
Signed-off-by: Achin Gupta achin.gu...@arm.com
Signed-off-by: Lorenzo Pieralisi lorenzo.pieral...@arm.com
Signed-off-by: Sudeep KarkadaNagesha sudeep.karkadanage...@arm.com
---
 .../devicetree/bindings/mfd/vexpress-spc.txt   |  35 ++
 drivers/mfd/Kconfig                            |   7 +
 drivers/mfd/Makefile                           |   1 +
 drivers/mfd/vexpress-spc.c                     | 630 +
 include/linux/vexpress.h                       |  43 ++
 5 files changed, 716 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/mfd/vexpress-spc.txt
 create mode 100644 drivers/mfd/vexpress-spc.c

diff --git a/Documentation/devicetree/bindings/mfd/vexpress-spc.txt b/Documentation/devicetree/bindings/mfd/vexpress-spc.txt
new file mode 100644
index 000..1d71dc2
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/vexpress-spc.txt
@@ -0,0 +1,35 @@
+* ARM Versatile Express Serial Power Controller device tree bindings
+
+Latest ARM development boards implement a power management interface (serial
+power controller - SPC) that is capable of managing power/voltage and
+operating point transitions, through memory mapped registers interface.
+
+On testchips like TC2 it also provides a configuration interface that can
+be used to read/write values which cannot be read/written through simple
+memory mapped reads/writes.
+
+- spc node
+
+   - compatible:
+   Usage: required
+   Value type: stringlist
+   Definition: must be
+   "arm,vexpress-spc,v2p-ca15_a7","arm,vexpress-spc"
+   - reg:
+   Usage: required
+   Value type: prop-encoded-array
+   Definition: A standard property that specifies the base address
+   and the size of the SPC address space
+   - interrupts:
+   Usage: required
+   Value type: prop-encoded-array
+   Definition:  SPC interrupt configuration. A standard property
+that follows ePAPR interrupts specifications
+
+Example:
+
+spc: spc@7fff {
+   compatible = "arm,vexpress-spc,v2p-ca15_a7","arm,vexpress-spc";
+   reg = <0 0x7FFF 0 0x1000>;
+   interrupts = <0 95 4>;
+};
diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
index d54e985..391eda1 100644
--- a/drivers/mfd/Kconfig
+++ b/drivers/mfd/Kconfig
@@ -1148,3 +1148,10 @@ config VEXPRESS_CONFIG
help
  Platform configuration infrastructure for the ARM Ltd.
  Versatile Express.
+
+config VEXPRESS_SPC
+   bool "Versatile Express SPC driver support"
+   depends on ARM
+   depends on VEXPRESS_CONFIG
+   help
+ Serial Power Controller driver for ARM Ltd. test chips.
diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile
index 718e94a..3a01203 100644
--- a/drivers/mfd/Makefile
+++ b/drivers/mfd/Makefile
@@ -153,5 +153,6 @@ obj-$(CONFIG_MFD_SEC_CORE)  += sec-core.o sec-irq.o
 obj-$(CONFIG_MFD_SYSCON)   += syscon.o
 obj-$(CONFIG_MFD_LM3533)   += lm3533-core.o lm3533-ctrlbank.o
 obj-$(CONFIG_VEXPRESS_CONFIG)  += vexpress-config.o vexpress-sysreg.o
+obj-$(CONFIG_VEXPRESS_SPC) += vexpress-spc.o
 obj-$(CONFIG_MFD_RETU) += retu-mfd.o
 obj-$(CONFIG_MFD_AS3711)   += as3711.o
diff --git a/drivers/mfd/vexpress-spc.c b/drivers/mfd/vexpress-spc.c
new file mode 100644
index 000..1aaa673
--- /dev/null
+++ b/drivers/mfd/vexpress-spc.c
@@ -0,0 +1,630 @@
+/*
+ * Versatile Express Serial Power Controller (SPC) support
+ *
+ * Copyright (C) 2013 ARM Ltd.
+ *
+ * Author(s): Sudeep KarkadaNagesha sudeep.karkadanage...@arm.com
+ *Achin Gupta   achin.gu...@arm.com
+ *Lorenzo Pieralisi lorenzo.pieral...@arm.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation

[RFC PATCH v2 1/2] drivers: mfd: refactor the vexpress config bridge API

2013-06-05 Thread Lorenzo Pieralisi
From: Pawel Moll pawel.m...@arm.com

The introduction of Serial Power Controller (SPC) requires the vexpress
config interface to change slightly since the SPC memory mapped interface
can be used as configuration bus but also for operating points
programming and retrieval. The helper that allocates the bridge functions
requires an additional parameter allowing to request component specific
functions that need not be initialized through device tree bindings but
just using simple look-up and statically defined constants.

This patch introduces the necessary changes to the vexpress config layer
to cater for the new vexpress bridge interface requirements.

Cc: Samuel Ortiz sa...@linux.intel.com
Cc: Achin Gupta achin.gu...@arm.com
Cc: Sudeep KarkadaNagesha sudeep.karkadanage...@arm.com
Cc: Pawel Moll pawel.m...@arm.com
Cc: Nicolas Pitre nicolas.pi...@linaro.org
Cc: Amit Kucheria amit.kuche...@linaro.org
Cc: Jon Medhurst t...@linaro.org
Signed-off-by: Pawel Moll pawel.m...@arm.com
---
 drivers/mfd/vexpress-config.c | 61 +++
 drivers/mfd/vexpress-sysreg.c |  2 +-
 include/linux/vexpress.h      | 16
 3 files changed, 51 insertions(+), 28 deletions(-)

diff --git a/drivers/mfd/vexpress-config.c b/drivers/mfd/vexpress-config.c
index 84ce6b9..1af2b0e 100644
--- a/drivers/mfd/vexpress-config.c
+++ b/drivers/mfd/vexpress-config.c
@@ -86,29 +86,13 @@ void vexpress_config_bridge_unregister(struct vexpress_config_bridge *bridge)
 }
 EXPORT_SYMBOL(vexpress_config_bridge_unregister);
 
-
-struct vexpress_config_func {
-   struct vexpress_config_bridge *bridge;
-   void *func;
-};
-
-struct vexpress_config_func *__vexpress_config_func_get(struct device *dev,
-   struct device_node *node)
+static struct vexpress_config_bridge *
+   vexpress_config_bridge_find(struct device_node *node)
 {
-   struct device_node *bridge_node;
-   struct vexpress_config_func *func;
int i;
+   struct vexpress_config_bridge *res = NULL;
+   struct device_node *bridge_node = of_node_get(node);
 
-   if (WARN_ON(dev && node && dev->of_node != node))
-   return NULL;
-   if (dev && !node)
-   node = dev->of_node;
-
-   func = kzalloc(sizeof(*func), GFP_KERNEL);
-   if (!func)
-   return NULL;
-
-   bridge_node = of_node_get(node);
while (bridge_node) {
const __be32 *prop = of_get_property(bridge_node,
"arm,vexpress,config-bridge", NULL);
@@ -129,13 +113,46 @@ struct vexpress_config_func *__vexpress_config_func_get(struct device *dev,
 
if (test_bit(i, vexpress_config_bridges_map) &&
bridge->node == bridge_node) {
-   func->bridge = bridge;
-   func->func = bridge->info->func_get(dev, node);
+   res = bridge;
break;
}
}
mutex_unlock(&vexpress_config_bridges_mutex);
 
+   return res;
+}
+
+
+struct vexpress_config_func {
+   struct vexpress_config_bridge *bridge;
+   void *func;
+};
+
+struct vexpress_config_func *__vexpress_config_func_get(
+   struct vexpress_config_bridge *bridge,
+   struct device *dev,
+   struct device_node *node,
+   const char *id)
+{
+   struct vexpress_config_func *func;
+
+   if (WARN_ON(dev && node && dev->of_node != node))
+   return NULL;
+   if (dev && !node)
+   node = dev->of_node;
+
+   if (!bridge)
+   bridge = vexpress_config_bridge_find(node);
+   if (!bridge)
+   return NULL;
+
+   func = kzalloc(sizeof(*func), GFP_KERNEL);
+   if (!func)
+   return NULL;
+
+   func->bridge = bridge;
+   func->func = bridge->info->func_get(dev, node, id);
+
if (!func->func) {
of_node_put(node);
kfree(func);
diff --git a/drivers/mfd/vexpress-sysreg.c b/drivers/mfd/vexpress-sysreg.c
index 96a020b..d2599aa 100644
--- a/drivers/mfd/vexpress-sysreg.c
+++ b/drivers/mfd/vexpress-sysreg.c
@@ -165,7 +165,7 @@ static u32 *vexpress_sysreg_config_data;
 static int vexpress_sysreg_config_tries;
 
 static void *vexpress_sysreg_config_func_get(struct device *dev,
-   struct device_node *node)
+   struct device_node *node, const char *id)
 {
struct vexpress_sysreg_config_func *config_func;
u32 site;
diff --git a/include/linux/vexpress.h b/include/linux/vexpress.h
index 6e7980d..50368e0 100644
--- a/include/linux/vexpress.h
+++ b/include/linux/vexpress.h
@@ -68,7 +68,8 @@
  */
 struct vexpress_config_bridge_info {
const char *name;
-   void *(*func_get)(struct device *dev, struct device_node *node);
+   void *(*func_get)(struct device *dev, struct device_node *node,
+ const char *id);
void (*func_put)(void *func);
  

[RFC PATCH v2 0/2] drivers: mfd: Versatile Express SPC support

2013-06-05 Thread Lorenzo Pieralisi
This patch is v2 of a previous posting:

http://lists.infradead.org/pipermail/linux-arm-kernel/2013-May/170624.html

V2 changes:

- Dropped timeout interface patch
- Converted interfaces to non-timeout ones, integrated and retested
- Removed mutex used at init
- Refactored code to work around init sections warning
- Fixed interface type enumeration init
- Fixed data write in SYSCFG write commands

This patch series introduces support for the Versatile Express Serial
Power Controller (SPC) present in ARM Versatile Express TC2 core tiles.
The SPC driver is a fundamental component of TC2 power management and enables
C-state management and DVFS for the A15 and A7 clusters.

First patch provides changes required by SPC to comply with the
Versatile Express config API, second patch is the SPC driver implementation.

Code extensively exercised through CPUidle and CPUfreq power states and
operating point transitions.

Lorenzo Pieralisi (1):
  drivers: mfd: vexpress: add Serial Power Controller (SPC) support

Pawel Moll (1):
  drivers: mfd: refactor the vexpress config bridge API

 .../devicetree/bindings/mfd/vexpress-spc.txt   |  35 ++
 drivers/mfd/Kconfig                            |   7 +
 drivers/mfd/Makefile                           |   1 +
 drivers/mfd/vexpress-config.c                  |  61 +-
 drivers/mfd/vexpress-spc.c                     | 630 +
 drivers/mfd/vexpress-sysreg.c                  |   2 +-
 include/linux/vexpress.h                       |  59 +-
 7 files changed, 767 insertions(+), 28 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/mfd/vexpress-spc.txt
 create mode 100644 drivers/mfd/vexpress-spc.c

-- 
1.8.2.2


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v2 2/2] drivers: mfd: vexpress: add Serial Power Controller (SPC) support

2013-06-05 Thread Lorenzo Pieralisi
On Wed, Jun 05, 2013 at 07:08:33PM +0100, Jon Medhurst (Tixy) wrote:
> On Wed, 2013-06-05 at 12:46 +0100, Lorenzo Pieralisi wrote:
> [...]
> > +static const struct of_device_id vexpress_spc_ids[] __initconst = {
> > +   { .compatible = "arm,vexpress-spc,v2p-ca15_a7" },
> > +   { .compatible = "arm,vexpress-spc" },
> > +   {},
> > +};
> > +
> > +static int __init vexpress_spc_init(void)
> > +{
> > +   int ret;
> > +   struct device_node *node = of_find_matching_node(NULL,
> > +vexpress_spc_ids);
>
> To allow for devices without an SPC we should check for !node here and
> bail out, otherwise we get an ugly message from the WARN_ON further
> down. I see this on RTSM, and multiplatform kernels would suffer this as
> well.
>
> Even if the ugly warning wasn't there, it still seems cleaner to me to
> have a proper check for an absent spc node.

Absolutely, I will apply both fixes, thanks a lot for the review.

Lorenzo
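
A minimal user-space sketch of the check discussed above, assuming
hypothetical mock_* stand-ins for the kernel's device tree types and for
of_find_matching_node(); it is not the code from the patch. It only shows
the shape of the fix: return -ENODEV quietly when no SPC node matches, so
that a later WARN_ON is never reached on platforms without an SPC.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct device_node; not the kernel type. */
struct mock_node {
	const char *compatible;
};

/*
 * Hypothetical stand-in for of_find_matching_node(): returns the first
 * node whose compatible string appears in ids, or NULL if none does.
 */
static const struct mock_node *mock_find_matching_node(
		const struct mock_node *nodes, size_t n_nodes,
		const char *const *ids, size_t n_ids)
{
	size_t i, j;

	assert(n_nodes == 0 || nodes != NULL);
	for (i = 0; i < n_nodes; i++)
		for (j = 0; j < n_ids; j++)
			if (strcmp(nodes[i].compatible, ids[j]) == 0)
				return &nodes[i];
	return NULL;
}

/* Sketch of the init path with an explicit check for an absent node. */
static int mock_spc_init(const struct mock_node *nodes, size_t n_nodes)
{
	static const char *const ids[] = {
		"arm,vexpress-spc,v2p-ca15_a7",
		"arm,vexpress-spc",
	};
	const struct mock_node *node;

	node = mock_find_matching_node(nodes, n_nodes, ids, 2);
	if (!node)
		return -ENODEV;	/* no SPC on this platform: bail out quietly */

	/* Register mapping and IRQ setup would follow here. */
	return 0;
}
```

With a node list that lacks an SPC entry (as on RTSM in the report above)
mock_spc_init() returns -ENODEV; with an "arm,vexpress-spc" entry it
returns 0.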



[RFC PATCH v3 1/2] drivers: mfd: refactor the vexpress config bridge API

2013-06-06 Thread Lorenzo Pieralisi
From: Pawel Moll pawel.m...@arm.com

The introduction of Serial Power Controller (SPC) requires the vexpress
config interface to change slightly since the SPC memory mapped interface
can be used as configuration bus but also for operating points
programming and retrieval. The helper that allocates the bridge functions
requires an additional parameter allowing to request component specific
functions that need not be initialized through device tree bindings but
just using simple look-up and statically defined constants.

This patch introduces the necessary changes to the vexpress config layer
to cater for the new vexpress bridge interface requirements.
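
A rough user-space sketch of what the extra id parameter makes possible,
under stated assumptions: the mock_* names, the id strings and the register
offsets are all invented for illustration, and how the real bridges
interpret id is not shown in this posting. The idea is only that a func_get
callback can serve a statically defined function by name when id is given,
and fall back to device tree data otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical function descriptor handed out by a bridge. */
struct mock_func {
	const char *name;
	unsigned int reg_offset;
};

/* Statically defined, component-specific functions (invented values). */
static const struct mock_func mock_static_funcs[] = {
	{ "opp-a15", 0x10 },
	{ "opp-a7",  0x14 },
};

/* Descriptor that would normally be built from device tree properties. */
static const struct mock_func mock_dt_func = { "dt-config", 0x00 };

/* Simple look-up of a static function by its id string. */
static const struct mock_func *mock_lookup_by_id(const char *id)
{
	size_t i;
	size_t n = sizeof(mock_static_funcs) / sizeof(mock_static_funcs[0]);

	assert(id != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(mock_static_funcs[i].name, id) == 0)
			return &mock_static_funcs[i];
	return NULL;
}

/* Sketch of a func_get callback that takes the additional id argument. */
static const struct mock_func *mock_func_get(int has_dt_node, const char *id)
{
	if (id)
		return mock_lookup_by_id(id);	/* no device tree needed */
	return has_dt_node ? &mock_dt_func : NULL;
}
```

Callers that pass a NULL id keep the existing device tree based behaviour,
which is why existing bridges only need their func_get signature updated.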

Cc: Samuel Ortiz sa...@linux.intel.com
Cc: Achin Gupta achin.gu...@arm.com
Cc: Sudeep KarkadaNagesha sudeep.karkadanage...@arm.com
Cc: Pawel Moll pawel.m...@arm.com
Cc: Nicolas Pitre nicolas.pi...@linaro.org
Cc: Amit Kucheria amit.kuche...@linaro.org
Cc: Jon Medhurst t...@linaro.org
Signed-off-by: Pawel Moll pawel.m...@arm.com
Acked-by: Nicolas Pitre n...@linaro.org
---
 drivers/mfd/vexpress-config.c | 61 ++
 drivers/mfd/vexpress-sysreg.c |  2 +-
 include/linux/vexpress.h  | 16 ++-
 3 files changed, 51 insertions(+), 28 deletions(-)

diff --git a/drivers/mfd/vexpress-config.c b/drivers/mfd/vexpress-config.c
index 84ce6b9..1af2b0e 100644
--- a/drivers/mfd/vexpress-config.c
+++ b/drivers/mfd/vexpress-config.c
@@ -86,29 +86,13 @@ void vexpress_config_bridge_unregister(struct vexpress_config_bridge *bridge)
 }
 EXPORT_SYMBOL(vexpress_config_bridge_unregister);
 
-
-struct vexpress_config_func {
-   struct vexpress_config_bridge *bridge;
-   void *func;
-};
-
-struct vexpress_config_func *__vexpress_config_func_get(struct device *dev,
-   struct device_node *node)
+static struct vexpress_config_bridge *
+   vexpress_config_bridge_find(struct device_node *node)
 {
-   struct device_node *bridge_node;
-   struct vexpress_config_func *func;
int i;
+   struct vexpress_config_bridge *res = NULL;
+   struct device_node *bridge_node = of_node_get(node);
 
-   if (WARN_ON(dev && node && dev->of_node != node))
-   return NULL;
-   if (dev && !node)
-   node = dev->of_node;
-
-   func = kzalloc(sizeof(*func), GFP_KERNEL);
-   if (!func)
-   return NULL;
-
-   bridge_node = of_node_get(node);
while (bridge_node) {
const __be32 *prop = of_get_property(bridge_node,
"arm,vexpress,config-bridge", NULL);
@@ -129,13 +113,46 @@ struct vexpress_config_func *__vexpress_config_func_get(struct device *dev,
 
if (test_bit(i, vexpress_config_bridges_map) &&
bridge->node == bridge_node) {
-   func->bridge = bridge;
-   func->func = bridge->info->func_get(dev, node);
+   res = bridge;
break;
}
}
mutex_unlock(&vexpress_config_bridges_mutex);
 
+   return res;
+}
+
+
+struct vexpress_config_func {
+   struct vexpress_config_bridge *bridge;
+   void *func;
+};
+
+struct vexpress_config_func *__vexpress_config_func_get(
+   struct vexpress_config_bridge *bridge,
+   struct device *dev,
+   struct device_node *node,
+   const char *id)
+{
+   struct vexpress_config_func *func;
+
+   if (WARN_ON(dev && node && dev->of_node != node))
+   return NULL;
+   if (dev && !node)
+   node = dev->of_node;
+
+   if (!bridge)
+   bridge = vexpress_config_bridge_find(node);
+   if (!bridge)
+   return NULL;
+
+   func = kzalloc(sizeof(*func), GFP_KERNEL);
+   if (!func)
+   return NULL;
+
+   func->bridge = bridge;
+   func->func = bridge->info->func_get(dev, node, id);
+
if (!func->func) {
of_node_put(node);
kfree(func);
diff --git a/drivers/mfd/vexpress-sysreg.c b/drivers/mfd/vexpress-sysreg.c
index 96a020b..d2599aa 100644
--- a/drivers/mfd/vexpress-sysreg.c
+++ b/drivers/mfd/vexpress-sysreg.c
@@ -165,7 +165,7 @@ static u32 *vexpress_sysreg_config_data;
 static int vexpress_sysreg_config_tries;
 
 static void *vexpress_sysreg_config_func_get(struct device *dev,
-   struct device_node *node)
+   struct device_node *node, const char *id)
 {
struct vexpress_sysreg_config_func *config_func;
u32 site;
diff --git a/include/linux/vexpress.h b/include/linux/vexpress.h
index 6e7980d..50368e0 100644
--- a/include/linux/vexpress.h
+++ b/include/linux/vexpress.h
@@ -68,7 +68,8 @@
  */
 struct vexpress_config_bridge_info {
const char *name;
-   void *(*func_get)(struct device *dev, struct device_node *node);
+   void *(*func_get)(struct device *dev, struct device_node *node,
+ const char *id);
void (*func_put)(void *func);

[RFC PATCH v3 2/2] drivers: mfd: vexpress: add Serial Power Controller (SPC) support

2013-06-06 Thread Lorenzo Pieralisi
The TC2 versatile express core tile integrates a logic block that provides the
interface between the dual cluster test-chip and the M3 microcontroller that
carries out power management. The logic block, called Serial Power Controller
(SPC), contains several memory mapped registers to control among other things
low-power states, operating points and reset control.

This patch provides a driver that enables run-time control of features
implemented by the SPC control logic.

The driver also provides a bridge interface through the vexpress config
infrastructure. Operations allowing to read/write operating points are
made to go via the same interface as configuration transactions so that
all requests to M3 are serialized.

Device tree bindings documentation for the SPC component is provided with
the patchset.

Cc: Samuel Ortiz sa...@linux.intel.com
Cc: Pawel Moll pawel.m...@arm.com
Cc: Nicolas Pitre nicolas.pi...@linaro.org
Cc: Amit Kucheria amit.kuche...@linaro.org
Cc: Jon Medhurst t...@linaro.org
Signed-off-by: Achin Gupta achin.gu...@arm.com
Signed-off-by: Lorenzo Pieralisi lorenzo.pieral...@arm.com
Signed-off-by: Sudeep KarkadaNagesha sudeep.karkadanage...@arm.com
Reviewed-by: Nicolas Pitre n...@linaro.org
---
 Documentation/devicetree/bindings/mfd/vexpress-spc.txt |  35 +
 drivers/mfd/Kconfig|   7 +
 drivers/mfd/Makefile   |   1 +
 drivers/mfd/vexpress-spc.c | 633 ++
 include/linux/vexpress.h   |  43 +
 5 files changed, 719 insertions(+)

diff --git a/Documentation/devicetree/bindings/mfd/vexpress-spc.txt 
b/Documentation/devicetree/bindings/mfd/vexpress-spc.txt
new file mode 100644
index 000..1d71dc2
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/vexpress-spc.txt
@@ -0,0 +1,35 @@
+* ARM Versatile Express Serial Power Controller device tree bindings
+
+Latest ARM development boards implement a power management interface (serial
+power controller - SPC) that is capable of managing power/voltage and
+operating point transitions, through memory mapped registers interface.
+
+On testchips like TC2 it also provides a configuration interface that can
+be used to read/write values which cannot be read/written through simple
+memory mapped reads/writes.
+
+- spc node
+
+   - compatible:
+   Usage: required
+   Value type: stringlist
+   Definition: must be
+   "arm,vexpress-spc,v2p-ca15_a7", "arm,vexpress-spc"
+   - reg:
+   Usage: required
+   Value type: prop-encoded-array
+   Definition: A standard property that specifies the base address
+   and the size of the SPC address space
+   - interrupts:
+   Usage: required
+   Value type: prop-encoded-array
+   Definition:  SPC interrupt configuration. A standard property
+that follows ePAPR interrupts specifications
+
+Example:
+
+spc: spc@7fff {
+   compatible = "arm,vexpress-spc,v2p-ca15_a7", "arm,vexpress-spc";
+   reg = <0 0x7FFF 0 0x1000>;
+   interrupts = <0 95 4>;
+};
diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
index d54e985..391eda1 100644
--- a/drivers/mfd/Kconfig
+++ b/drivers/mfd/Kconfig
@@ -1148,3 +1148,10 @@ config VEXPRESS_CONFIG
help
  Platform configuration infrastructure for the ARM Ltd.
  Versatile Express.
+
+config VEXPRESS_SPC
+   bool "Versatile Express SPC driver support"
+   depends on ARM
+   depends on VEXPRESS_CONFIG
+   help
+ Serial Power Controller driver for ARM Ltd. test chips.
diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile
index 718e94a..3a01203 100644
--- a/drivers/mfd/Makefile
+++ b/drivers/mfd/Makefile
@@ -153,5 +153,6 @@ obj-$(CONFIG_MFD_SEC_CORE)  += sec-core.o sec-irq.o
 obj-$(CONFIG_MFD_SYSCON)   += syscon.o
 obj-$(CONFIG_MFD_LM3533)   += lm3533-core.o lm3533-ctrlbank.o
 obj-$(CONFIG_VEXPRESS_CONFIG)  += vexpress-config.o vexpress-sysreg.o
+obj-$(CONFIG_VEXPRESS_SPC) += vexpress-spc.o
 obj-$(CONFIG_MFD_RETU) += retu-mfd.o
 obj-$(CONFIG_MFD_AS3711)   += as3711.o
diff --git a/drivers/mfd/vexpress-spc.c b/drivers/mfd/vexpress-spc.c
new file mode 100644
index 000..0c6718a
--- /dev/null
+++ b/drivers/mfd/vexpress-spc.c
@@ -0,0 +1,633 @@
+/*
+ * Versatile Express Serial Power Controller (SPC) support
+ *
+ * Copyright (C) 2013 ARM Ltd.
+ *
+ * Authors: Sudeep KarkadaNagesha sudeep.karkadanage...@arm.com
+ *  Achin Gupta   achin.gu...@arm.com
+ *  Lorenzo Pieralisi lorenzo.pieral...@arm.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed as is WITHOUT ANY WARRANTY of any
+ * kind, whether

[RFC PATCH v3 0/2] drivers: mfd: Versatile Express SPC support

2013-06-06 Thread Lorenzo Pieralisi
This patch is v3 of a previous posting:

http://lists.infradead.org/pipermail/linux-arm-kernel/2013-June/173464.html

v3 changes:

- added __refdata to spc_check_loaded pointer
- removed some exported symbols
- added node pointer check in vexpress_spc_init()

v2 changes:

- Dropped timeout interface patch
- Converted interfaces to non-timeout ones, integrated and retested
- Removed mutex used at init
- Refactored code to work around init sections warning
- Fixed two minor bugs

This patch series introduces support for the Versatile Express Serial
Power Controller (SPC) present in ARM Versatile Express TC2 core tiles.
The SPC driver is a fundamental component of TC2 power management; it enables
C-state management and DVFS for the A15 and A7 clusters.

First patch provides changes required by SPC to comply with the
Versatile Express config API, second patch is the SPC driver implementation.

Code extensively exercised through CPUidle and CPUfreq power states and
operating point transitions.

Lorenzo Pieralisi (1):
  drivers: mfd: vexpress: add Serial Power Controller (SPC) support

Pawel Moll (1):
  drivers: mfd: refactor the vexpress config bridge API

 Documentation/devicetree/bindings/mfd/vexpress-spc.txt |  35 +
 drivers/mfd/Kconfig|   7 +
 drivers/mfd/Makefile   |   1 +
 drivers/mfd/vexpress-config.c  |  61 +-
 drivers/mfd/vexpress-spc.c | 633 ++
 drivers/mfd/vexpress-sysreg.c  |   2 +-
 include/linux/vexpress.h   |  59 +-
 7 files changed, 770 insertions(+), 28 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/mfd/vexpress-spc.txt
 create mode 100644 drivers/mfd/vexpress-spc.c

-- 
1.8.2.2




Re: [PATCH] arm/dt: Don't add disabled CPUs to system topology

2013-06-07 Thread Lorenzo Pieralisi
Hi James,

On Thu, Jun 06, 2013 at 06:11:25PM +0100, James King wrote:
> If CPUs are marked as disabled in the devicetree, make sure they do
> not exist in the system CPU information and CPU topology information.
> In this case these CPUs will not be able to be added to the system later
> using hot-plug. This allows a single chip with many CPUs to be easily
> used in a variety of hardware devices where they may have different
> actual processing requirements (eg for thermal/cost reasons).
>
> - Change devicetree.c to ignore any cpu nodes marked as disabled,
>   this effectively limits the number of active cpu cores so no need
>   for the max_cpus=x in the chosen node.
> - Change topology.c to ignore any cpu nodes marked as disabled, this
>   is where the scheduler would learn about big/LITTLE cores so this
>   effectively keeps the scheduler in sync.
>

I have two questions:

1) Since with this approach the DT has to change anyway when different
   hardware devices based on the same chip boot a different number of
   CPUs, why don't we remove the cpu nodes instead of disabling them?
   Put another way: cpu nodes (currently) define a cpu as possible, so
   we can simply remove the node if we do not want that cpu to be seen
   by the kernel.
2) If we go for the status property, why don't we use it to set the
   present mask? That way the cpu is possible but not present, and you
   cannot hotplug it in. It is a bit of a stretch, granted: the cpu _is_
   present, we just want to disable it. I do not know how this is handled
   on x86 and other archs, though.

I am just asking because I thought about this while writing the code that
parses the DT cpu map: we currently have no way to disable a cpu in the DT,
which is what you are doing, and I would like to understand the best way
to express it in the DT bindings.

Thanks,
Lorenzo
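
To make option 2) concrete, here is a small user-space sketch, assuming
hypothetical mock_* types rather than the kernel's cpumask and device tree
code. Every cpu node contributes to the possible mask, while only nodes
whose status is absent, "okay" or "ok" contribute to the present mask.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cpu node: status is NULL when the property is absent. */
struct mock_cpu_node {
	const char *status;
};

/* Hypothetical pair of bitmasks, one bit per logical cpu. */
struct mock_masks {
	unsigned long possible;
	unsigned long present;
};

/* A node counts as available if status is absent, "okay" or "ok". */
static int mock_node_available(const struct mock_cpu_node *n)
{
	return !n->status ||
	       strcmp(n->status, "okay") == 0 ||
	       strcmp(n->status, "ok") == 0;
}

/* All nodes are possible; only available nodes are present. */
static struct mock_masks mock_build_masks(const struct mock_cpu_node *nodes,
					  size_t n)
{
	struct mock_masks m = { 0, 0 };
	size_t i;

	assert(n <= sizeof(unsigned long) * 8);
	for (i = 0; i < n; i++) {
		m.possible |= 1UL << i;
		if (mock_node_available(&nodes[i]))
			m.present |= 1UL << i;
	}
	return m;
}
```

For four nodes where the last two are marked "disabled", this yields a
possible mask of 0xf and a present mask of 0x3. Option 1) corresponds to
dropping the last two nodes, which gives 0x3 for both masks.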



Re: linux-next: manual merge of the arm-mpidr tree with the arm tree

2013-06-26 Thread Lorenzo Pieralisi
Hi Stephen,

On Wed, Jun 26, 2013 at 02:04:11AM +0100, Stephen Rothwell wrote:
> Hi Lorenzo,
>
> Today's linux-next merge of the arm-mpidr tree got a conflict in
> arch/arm/kernel/suspend.c between commit 7604537bbb57 ("ARM: kernel:
> implement stack pointer save array through MPIDR hashing") from the arm
> tree and commit 3fed6a1e3bf0 ("ARM: kernel: implement stack pointer save
> array through MPIDR hashing") from the arm-mpidr tree.
>
> The former is just a rebase of the latter, so I used it.  It then turns
> out that the arm-mpidr contains nothing that is not already included in
> the arm tree, so is it needed any more i.e. is there further work coming?

No, arm-mpidr can be dropped from -next now that those patches, as you
correctly mentioned, are queued through the arm tree.

Thank you very much indeed,
Lorenzo



Re: [RFC PATCH 2/3] drivers: mfd: vexpress: add timeout API to vexpress config interface

2013-06-03 Thread Lorenzo Pieralisi
On Mon, Jun 03, 2013 at 11:15:32AM +0100, Jon Medhurst (Tixy) wrote:
> On Fri, 2013-05-24 at 13:53 +0100, Lorenzo Pieralisi wrote:
> > In case some transactions to the Serial Power Controller (SPC) are lost owing
> > to multiple operations handled at once by the M3 controller the OS needs to
> > rely on a configuration API that can time out so that failures do not result
> > in an unusable system.
> >
> > This patch adds a timeout API to the vexpress config programming interface,
> > and refactors the existing read/write functions so that they can be reused
> > seamlessly on top of the newly defined API.
>
> Isn't one of the main purposes of the config interface to serialise
> transactions to the config bus, so why would the SPC be handling
> multiple transactions at once? And if we can in fact lose transactions
> doesn't this mean we get random failures in the system? E.g. if this
> happened at boot in vexpress_spc_populate_opps then cpufreq will fail.

It has more to do with firmware carrying out background operations like
powering up a cluster when a DVFS is requested. You are absolutely right
though:

a) the timeout interface is broken, as you mentioned (I noticed after
   posting it)
b) we should not add a timeout interface to paper over FW issues

I can prepare a v2 with timeout interface dropped and extensively test that
one, I do not think we should add the required complexity that you describe
below for something that should never happen.

> Also, I think the code implementing timeouts is broken, see below.

I will have a look asap and repost a v2 accordingly.

> > Cc: Samuel Ortiz sa...@linux.intel.com
> > Cc: Achin Gupta achin.gu...@arm.com
> > Cc: Sudeep KarkadaNagesha sudeep.karkadanage...@arm.com
> > Cc: Pawel Moll pawel.m...@arm.com
> > Cc: Nicolas Pitre nicolas.pi...@linaro.org
> > Cc: Amit Kucheria amit.kuche...@linaro.org
> > Cc: Jon Medhurst t...@linaro.org
> > Signed-off-by: Lorenzo Pieralisi lorenzo.pieral...@arm.com
> > ---
> >  drivers/mfd/vexpress-config.c | 26 +++---
> >  include/linux/vexpress.h  | 23 ++--
> >  2 files changed, 37 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/mfd/vexpress-config.c b/drivers/mfd/vexpress-config.c
> > index 1af2b0e..6f4aa5a 100644
> > --- a/drivers/mfd/vexpress-config.c
> > +++ b/drivers/mfd/vexpress-config.c
> > @@ -266,8 +266,18 @@ int vexpress_config_wait(struct vexpress_config_trans *trans)
> >  }
> >  EXPORT_SYMBOL(vexpress_config_wait);
> >
> > -int vexpress_config_read(struct vexpress_config_func *func, int offset,
> > -   u32 *data)
> > +int vexpress_config_wait_timeout(struct vexpress_config_trans *trans,
> > +   long jiffies)
> > +{
> > +   int ret;
> > +   ret = wait_for_completion_timeout(&trans->completion, jiffies);
>
> If the request times out, don't we need to call vexpress_config_complete
> to dequeue the timed out request and trigger the next one? Though we
> will still have a problem where the timeout happens but the request
> then does in fact complete normally, in that case we would signal
> completion of the second request before it has in fact completed.
>
> So, if transactions really can get silently dropped by thing on the end
> of the config bus, then we must have a mechanism for associating a
> particular transaction with a completion signal, otherwise we won't know
> what transaction actually got completed OK and which ones were dropped
> and should receive -ETIMEDOUT.
>
> Finally, I don't think these issues are purely theoretical, I'm pretty
> certain that the kernel panics and spinlock bad magic errors I see with
> this patch series are due to requests completing after they have been
> timed out and then the stack based transaction object is being accessed
> after it has gone out of scope.

You are absolutely right, apologies for wasting your time in testing it.

Thanks a lot for the review,
Lorenzo
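
The mechanism asked for in the review, associating each transaction with
its own completion signal, can be sketched in user space as follows. This
is an illustration under assumptions: the mock_* names are hypothetical,
and it presumes the completion source can report which transaction it is
completing (here a sequence number), which the posted code does not
establish for the real hardware. A timed-out transaction is dequeued, and
a late completion carrying a stale sequence number is discarded instead of
being credited to the next request or touching an object that has gone out
of scope.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transaction, typically living on the caller's stack. */
struct mock_trans {
	uint32_t seq;	/* tag assigned at submission */
	bool done;
	int status;
};

/* Hypothetical serialised bus: at most one transaction in flight. */
struct mock_bus {
	uint32_t next_seq;
	struct mock_trans *pending;
	unsigned int stale;	/* late completions that were discarded */
};

static void mock_submit(struct mock_bus *bus, struct mock_trans *t)
{
	assert(bus->pending == NULL);	/* the bus handles one at a time */
	t->seq = ++bus->next_seq;
	t->done = false;
	t->status = 0;
	bus->pending = t;
}

/* The waiter gave up: dequeue so the bus no longer references t. */
static int mock_timeout(struct mock_bus *bus, struct mock_trans *t)
{
	if (bus->pending == t)
		bus->pending = NULL;
	t->status = -ETIMEDOUT;
	return t->status;
}

/* Completion tagged with the sequence number it belongs to. */
static bool mock_complete(struct mock_bus *bus, uint32_t seq, int status)
{
	struct mock_trans *t = bus->pending;

	if (!t || t->seq != seq) {
		bus->stale++;	/* completion for a timed-out request */
		return false;
	}
	t->status = status;
	t->done = true;
	bus->pending = NULL;
	return true;
}
```

Without the tag check, the late completion of the first request would be
taken as completion of the second one, which is the failure mode described
in the review.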



Re: [RFC PATCH 2/3] drivers: mfd: vexpress: add timeout API to vexpress config interface

2013-06-03 Thread Lorenzo Pieralisi
On Mon, Jun 03, 2013 at 01:03:50PM +0100, Jon Medhurst (Tixy) wrote:
> On Mon, 2013-06-03 at 12:52 +0100, Lorenzo Pieralisi wrote:
> > On Mon, Jun 03, 2013 at 11:15:32AM +0100, Jon Medhurst (Tixy) wrote:
> > > On Fri, 2013-05-24 at 13:53 +0100, Lorenzo Pieralisi wrote:
> > > > In case some transactions to the Serial Power Controller (SPC) are lost owing
> > > > to multiple operations handled at once by the M3 controller the OS needs to
> > > > rely on a configuration API that can time out so that failures do not result
> > > > in an unusable system.
> > > >
> > > > This patch adds a timeout API to the vexpress config programming interface,
> > > > and refactors the existing read/write functions so that they can be reused
> > > > seamlessly on top of the newly defined API.
> > >
> > > Isn't one of the main purposes of the config interface to serialise
> > > transactions to the config bus, so why would the SPC be handling
> > > multiple transactions at once? And if we can in fact lose transactions
> > > doesn't this mean we get random failures in the system? E.g. if this
> > > happened at boot in vexpress_spc_populate_opps then cpufreq will fail.
> >
> > It has more to do with firmware carrying out background operations like
> > powering up a cluster when a DVFS is requested.
>
> Would that make it drop transactions or just take a longer time to get
> around to servicing them?

It should just take longer to service them, that's what the behaviour
should be.

Lorenzo



[RFC PATCH 0/3] drivers: mfd: Versatile Express SPC support

2013-05-24 Thread Lorenzo Pieralisi
This patch series introduces support for the Versatile Express Serial
Power Controller (SPC) present in ARM Versatile Express TC2 core tiles.
The SPC driver is a fundamental component of TC2 power management; it enables
C-state management and DVFS for the A15 and A7 clusters.

First two patches provide changes required by SPC to comply with the
Versatile Express config API, third patch is the SPC driver implementation.

Code extensively exercised through CPUidle and CPUfreq power states and
operating point transitions.

Lorenzo Pieralisi (2):
  drivers: mfd: vexpress: add timeout API to vexpress config interface
  drivers: mfd: vexpress: add Serial Power Controller (SPC) support

Pawel Moll (1):
  drivers: mfd: refactor the vexpress config bridge API

 Documentation/devicetree/bindings/mfd/vexpress-spc.txt |  35 +
 drivers/mfd/Kconfig|   7 +
 drivers/mfd/Makefile   |   1 +
 drivers/mfd/vexpress-config.c  |  87 +-
 drivers/mfd/vexpress-spc.c | 626 ++
 drivers/mfd/vexpress-sysreg.c  |   2 +-
 include/linux/vexpress.h   |  82 +-
 7 files changed, 800 insertions(+), 40 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/mfd/vexpress-spc.txt
 create mode 100644 drivers/mfd/vexpress-spc.c

-- 
1.8.2.2




[RFC PATCH 1/3] drivers: mfd: refactor the vexpress config bridge API

2013-05-24 Thread Lorenzo Pieralisi
From: Pawel Moll pawel.m...@arm.com

The introduction of Serial Power Controller (SPC) requires the vexpress
config interface to change slightly since the SPC memory mapped interface
can be used as configuration bus but also for operating points
programming and retrieval. The helper that allocates the bridge functions
requires an additional parameter allowing to request component specific
functions that need not be initialized through device tree bindings but
just using simple look-up and statically defined constants.

This patch introduces the necessary changes to the vexpress config layer
to cater for the new vexpress bridge interface requirements.

Cc: Samuel Ortiz sa...@linux.intel.com
Cc: Achin Gupta achin.gu...@arm.com
Cc: Sudeep KarkadaNagesha sudeep.karkadanage...@arm.com
Cc: Pawel Moll pawel.m...@arm.com
Cc: Nicolas Pitre nicolas.pi...@linaro.org
Cc: Amit Kucheria amit.kuche...@linaro.org
Cc: Jon Medhurst t...@linaro.org
Signed-off-by: Pawel Moll pawel.m...@arm.com
---
 drivers/mfd/vexpress-config.c | 61 ++
 drivers/mfd/vexpress-sysreg.c |  2 +-
 include/linux/vexpress.h  | 16 ++-
 3 files changed, 51 insertions(+), 28 deletions(-)

diff --git a/drivers/mfd/vexpress-config.c b/drivers/mfd/vexpress-config.c
index 84ce6b9..1af2b0e 100644
--- a/drivers/mfd/vexpress-config.c
+++ b/drivers/mfd/vexpress-config.c
@@ -86,29 +86,13 @@ void vexpress_config_bridge_unregister(struct vexpress_config_bridge *bridge)
 }
 EXPORT_SYMBOL(vexpress_config_bridge_unregister);
 
-
-struct vexpress_config_func {
-   struct vexpress_config_bridge *bridge;
-   void *func;
-};
-
-struct vexpress_config_func *__vexpress_config_func_get(struct device *dev,
-   struct device_node *node)
+static struct vexpress_config_bridge *
+   vexpress_config_bridge_find(struct device_node *node)
 {
-   struct device_node *bridge_node;
-   struct vexpress_config_func *func;
int i;
+   struct vexpress_config_bridge *res = NULL;
+   struct device_node *bridge_node = of_node_get(node);
 
-   if (WARN_ON(dev && node && dev->of_node != node))
-   return NULL;
-   if (dev && !node)
-   node = dev->of_node;
-
-   func = kzalloc(sizeof(*func), GFP_KERNEL);
-   if (!func)
-   return NULL;
-
-   bridge_node = of_node_get(node);
while (bridge_node) {
const __be32 *prop = of_get_property(bridge_node,
"arm,vexpress,config-bridge", NULL);
@@ -129,13 +113,46 @@ struct vexpress_config_func *__vexpress_config_func_get(struct device *dev,
 
if (test_bit(i, vexpress_config_bridges_map) &&
bridge->node == bridge_node) {
-   func->bridge = bridge;
-   func->func = bridge->info->func_get(dev, node);
+   res = bridge;
break;
}
}
mutex_unlock(&vexpress_config_bridges_mutex);
 
+   return res;
+}
+
+
+struct vexpress_config_func {
+   struct vexpress_config_bridge *bridge;
+   void *func;
+};
+
+struct vexpress_config_func *__vexpress_config_func_get(
+   struct vexpress_config_bridge *bridge,
+   struct device *dev,
+   struct device_node *node,
+   const char *id)
+{
+   struct vexpress_config_func *func;
+
+   if (WARN_ON(dev && node && dev->of_node != node))
+   return NULL;
+   if (dev && !node)
+   node = dev->of_node;
+
+   if (!bridge)
+   bridge = vexpress_config_bridge_find(node);
+   if (!bridge)
+   return NULL;
+
+   func = kzalloc(sizeof(*func), GFP_KERNEL);
+   if (!func)
+   return NULL;
+
+   func->bridge = bridge;
+   func->func = bridge->info->func_get(dev, node, id);
+
if (!func->func) {
of_node_put(node);
kfree(func);
diff --git a/drivers/mfd/vexpress-sysreg.c b/drivers/mfd/vexpress-sysreg.c
index 96a020b..d2599aa 100644
--- a/drivers/mfd/vexpress-sysreg.c
+++ b/drivers/mfd/vexpress-sysreg.c
@@ -165,7 +165,7 @@ static u32 *vexpress_sysreg_config_data;
 static int vexpress_sysreg_config_tries;
 
 static void *vexpress_sysreg_config_func_get(struct device *dev,
-   struct device_node *node)
+   struct device_node *node, const char *id)
 {
struct vexpress_sysreg_config_func *config_func;
u32 site;
diff --git a/include/linux/vexpress.h b/include/linux/vexpress.h
index 6e7980d..50368e0 100644
--- a/include/linux/vexpress.h
+++ b/include/linux/vexpress.h
@@ -68,7 +68,8 @@
  */
 struct vexpress_config_bridge_info {
const char *name;
-   void *(*func_get)(struct device *dev, struct device_node *node);
+   void *(*func_get)(struct device *dev, struct device_node *node,
+ const char *id);
void (*func_put)(void *func);
int (*func_exec)(void *func, int 

[RFC PATCH 2/3] drivers: mfd: vexpress: add timeout API to vexpress config interface

2013-05-24 Thread Lorenzo Pieralisi
In case some transactions to the Serial Power Controller (SPC) are lost owing
to multiple operations handled at once by the M3 controller the OS needs to
rely on a configuration API that can time out so that failures do not result
in an unusable system.

This patch adds a timeout API to the vexpress config programming interface,
and refactors the existing read/write functions so that they can be reused
seamlessly on top of the newly defined API.
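
The layering described here, with the untimed calls expressed as thin
wrappers over the timed ones, can be sketched in user space as below. The
mock_* names and the latency model are assumptions made for illustration,
and LONG_MAX merely plays the role that MAX_SCHEDULE_TIMEOUT plays in the
patch. The sketch covers only the wrapper structure, not the handling of
requests that complete after their timeout has expired.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Stand-in for an "effectively infinite" timeout value. */
#define MOCK_NO_TIMEOUT LONG_MAX

/* Hypothetical device model: the reply is ready after `latency` ticks. */
struct mock_dev {
	long latency;
	unsigned int value;
};

/* Timed read: fails with -ETIMEDOUT if the reply arrives too late. */
static int mock_read_timeout(const struct mock_dev *dev, unsigned int *data,
			     long timeout)
{
	assert(dev != NULL && data != NULL);
	if (dev->latency > timeout)
		return -ETIMEDOUT;	/* *data is left untouched */
	*data = dev->value;
	return 0;
}

/* Untimed read: same code path, with a timeout that never expires. */
static inline int mock_read(const struct mock_dev *dev, unsigned int *data)
{
	return mock_read_timeout(dev, data, MOCK_NO_TIMEOUT);
}
```

Existing callers of the untimed interface keep their behaviour, and only
callers that opt in to a finite timeout ever see -ETIMEDOUT.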

Cc: Samuel Ortiz sa...@linux.intel.com
Cc: Achin Gupta achin.gu...@arm.com
Cc: Sudeep KarkadaNagesha sudeep.karkadanage...@arm.com
Cc: Pawel Moll pawel.m...@arm.com
Cc: Nicolas Pitre nicolas.pi...@linaro.org
Cc: Amit Kucheria amit.kuche...@linaro.org
Cc: Jon Medhurst t...@linaro.org
Signed-off-by: Lorenzo Pieralisi lorenzo.pieral...@arm.com
---
 drivers/mfd/vexpress-config.c | 26 +++---
 include/linux/vexpress.h  | 23 ++--
 2 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/drivers/mfd/vexpress-config.c b/drivers/mfd/vexpress-config.c
index 1af2b0e..6f4aa5a 100644
--- a/drivers/mfd/vexpress-config.c
+++ b/drivers/mfd/vexpress-config.c
@@ -266,8 +266,18 @@ int vexpress_config_wait(struct vexpress_config_trans *trans)
 }
 EXPORT_SYMBOL(vexpress_config_wait);
 
-int vexpress_config_read(struct vexpress_config_func *func, int offset,
-   u32 *data)
+int vexpress_config_wait_timeout(struct vexpress_config_trans *trans,
+   long jiffies)
+{
+   int ret;
+   ret = wait_for_completion_timeout(&trans->completion, jiffies);
+
+   return ret ? trans->status : -ETIMEDOUT;
+}
+EXPORT_SYMBOL(vexpress_config_wait_timeout);
+
+int vexpress_config_read_timeout(struct vexpress_config_func *func, int offset,
+   u32 *data, long jiffies)
 {
struct vexpress_config_trans trans = {
.func = func,
@@ -279,14 +289,14 @@ int vexpress_config_read(struct vexpress_config_func *func, int offset,
int status = vexpress_config_schedule(&trans);
 
if (status == VEXPRESS_CONFIG_STATUS_WAIT)
-   status = vexpress_config_wait(&trans);
+   status = vexpress_config_wait_timeout(&trans, jiffies);
 
return status;
 }
-EXPORT_SYMBOL(vexpress_config_read);
+EXPORT_SYMBOL(vexpress_config_read_timeout);
 
-int vexpress_config_write(struct vexpress_config_func *func, int offset,
-   u32 data)
+int vexpress_config_write_timeout(struct vexpress_config_func *func,
+ int offset, u32 data, long jiffies)
 {
struct vexpress_config_trans trans = {
.func = func,
@@ -298,8 +308,8 @@ int vexpress_config_write(struct vexpress_config_func *func, int offset,
int status = vexpress_config_schedule(&trans);
 
if (status == VEXPRESS_CONFIG_STATUS_WAIT)
-   status = vexpress_config_wait(&trans);
+   status = vexpress_config_wait_timeout(&trans, jiffies);
 
return status;
 }
-EXPORT_SYMBOL(vexpress_config_write);
+EXPORT_SYMBOL(vexpress_config_write_timeout);
diff --git a/include/linux/vexpress.h b/include/linux/vexpress.h
index 50368e0..e5015d8 100644
--- a/include/linux/vexpress.h
+++ b/include/linux/vexpress.h
@@ -15,6 +15,7 @@
 #define _LINUX_VEXPRESS_H
 
 #include <linux/device.h>
+#include <linux/sched.h>
 
 #define VEXPRESS_SITE_MB   0
 #define VEXPRESS_SITE_DB1  1
@@ -102,10 +103,24 @@ struct vexpress_config_func *__vexpress_config_func_get(
 void vexpress_config_func_put(struct vexpress_config_func *func);
 
 /* Both may sleep! */
-int vexpress_config_read(struct vexpress_config_func *func, int offset,
-   u32 *data);
-int vexpress_config_write(struct vexpress_config_func *func, int offset,
-   u32 data);
+int vexpress_config_read_timeout(struct vexpress_config_func *func, int offset,
+   u32 *data, long jiffies);
+int vexpress_config_write_timeout(struct vexpress_config_func *func,
+   int offset, u32 data, long jiffies);
+
+static inline int vexpress_config_read(struct vexpress_config_func *func,
+int offset, u32 *data)
+{
+   return vexpress_config_read_timeout(func, offset, data,
+MAX_SCHEDULE_TIMEOUT);
+}
+
+static inline int vexpress_config_write(struct vexpress_config_func *func,
+int offset, u32 data)
+{
+   return vexpress_config_write_timeout(func, offset, data,
+MAX_SCHEDULE_TIMEOUT);
+}
 
 /* Platform control */
 
-- 
1.8.2.2
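
A minimal user-space sketch of the wrapper pattern this patch uses: the timed call is the real implementation, and the untimed call forwards to it with a "wait forever" sentinel, so existing callers need no change. Every name below (`cfg_read_timeout`, `cfg_read`, `CFG_WAIT_FOREVER`, the `fake_hw_*` variables) is a hypothetical stand-in for illustration, not part of the vexpress API.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel playing the role of MAX_SCHEDULE_TIMEOUT. */
#define CFG_WAIT_FOREVER LONG_MAX

/* Hypothetical hardware model: the transaction completes after this many ticks. */
long fake_hw_ready_after = 3;
uint32_t fake_hw_value = 0x1234;

/* Timed variant: fails with -ETIMEDOUT if the transaction needs more ticks than allowed. */
int cfg_read_timeout(uint32_t *data, long timeout_ticks)
{
	assert(data != NULL);
	if (fake_hw_ready_after > timeout_ticks)
		return -ETIMEDOUT;
	*data = fake_hw_value;
	return 0;
}

/* Untimed variant: a thin wrapper around the timed one. */
int cfg_read(uint32_t *data)
{
	return cfg_read_timeout(data, CFG_WAIT_FOREVER);
}
```

A caller that cannot afford to block indefinitely would use the timed variant and handle `-ETIMEDOUT`; everything else keeps calling the untimed wrapper.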


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC PATCH 3/3] drivers: mfd: vexpress: add Serial Power Controller (SPC) support

2013-05-24 Thread Lorenzo Pieralisi
The TC2 versatile express core tile integrates a logic block that provides the
interface between the dual cluster test-chip and the M3 microcontroller that
carries out power management. The logic block, called Serial Power Controller
(SPC), contains several memory mapped registers to control among other things
low-power states, operating points and reset control.

This patch provides a driver that enables run-time control of features
implemented by the SPC control logic.

The driver also provides a bridge interface through the vexpress config
infrastructure. Operations that read or write operating points go through
the same interface as configuration transactions, so that all requests to
the M3 are serialized.

Device tree bindings documentation for the SPC component is provided with
the patchset.

Cc: Samuel Ortiz sa...@linux.intel.com
Cc: Pawel Moll pawel.m...@arm.com
Cc: Nicolas Pitre nicolas.pi...@linaro.org
Cc: Amit Kucheria amit.kuche...@linaro.org
Cc: Jon Medhurst t...@linaro.org
Signed-off-by: Achin Gupta achin.gu...@arm.com
Signed-off-by: Lorenzo Pieralisi lorenzo.pieral...@arm.com
Signed-off-by: Sudeep KarkadaNagesha sudeep.karkadanage...@arm.com
---
 Documentation/devicetree/bindings/mfd/vexpress-spc.txt |  35 +
 drivers/mfd/Kconfig|   7 +
 drivers/mfd/Makefile   |   1 +
 drivers/mfd/vexpress-spc.c | 626 ++
 include/linux/vexpress.h   |  43 +
 5 files changed, 712 insertions(+)

diff --git a/Documentation/devicetree/bindings/mfd/vexpress-spc.txt 
b/Documentation/devicetree/bindings/mfd/vexpress-spc.txt
new file mode 100644
index 000..1d71dc2
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/vexpress-spc.txt
@@ -0,0 +1,35 @@
+* ARM Versatile Express Serial Power Controller device tree bindings
+
+Latest ARM development boards implement a power management interface (serial
+power controller - SPC) that is capable of managing power/voltage and
+operating point transitions, through memory mapped registers interface.
+
+On testchips like TC2 it also provides a configuration interface that can
+be used to read/write values which cannot be read/written through simple
+memory mapped reads/writes.
+
+- spc node
+
+   - compatible:
+   Usage: required
+   Value type: stringlist
+   Definition: must be
+   "arm,vexpress-spc,v2p-ca15_a7","arm,vexpress-spc"
+   - reg:
+   Usage: required
+   Value type: prop-encoded-array
+   Definition: A standard property that specifies the base address
+   and the size of the SPC address space
+   - interrupts:
+   Usage: required
+   Value type: prop-encoded-array
+   Definition:  SPC interrupt configuration. A standard property
+that follows ePAPR interrupts specifications
+
+Example:
+
+spc: spc@7fff {
+   compatible = "arm,vexpress-spc,v2p-ca15_a7","arm,vexpress-spc";
+   reg = <0 0x7FFF 0 0x1000>;
+   interrupts = <0 95 4>;
+};
diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
index d9aed15..b5259af 100644
--- a/drivers/mfd/Kconfig
+++ b/drivers/mfd/Kconfig
@@ -1147,3 +1147,10 @@ config VEXPRESS_CONFIG
help
  Platform configuration infrastructure for the ARM Ltd.
  Versatile Express.
+
+config VEXPRESS_SPC
+   bool "Versatile Express SPC driver support"
+   depends on ARM
+   depends on VEXPRESS_CONFIG
+   help
+ Serial Power Controller driver for ARM Ltd. test chips.
diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile
index 718e94a..3a01203 100644
--- a/drivers/mfd/Makefile
+++ b/drivers/mfd/Makefile
@@ -153,5 +153,6 @@ obj-$(CONFIG_MFD_SEC_CORE)  += sec-core.o sec-irq.o
 obj-$(CONFIG_MFD_SYSCON)   += syscon.o
 obj-$(CONFIG_MFD_LM3533)   += lm3533-core.o lm3533-ctrlbank.o
 obj-$(CONFIG_VEXPRESS_CONFIG)  += vexpress-config.o vexpress-sysreg.o
+obj-$(CONFIG_VEXPRESS_SPC) += vexpress-spc.o
 obj-$(CONFIG_MFD_RETU) += retu-mfd.o
 obj-$(CONFIG_MFD_AS3711)   += as3711.o
diff --git a/drivers/mfd/vexpress-spc.c b/drivers/mfd/vexpress-spc.c
new file mode 100644
index 000..f78257a
--- /dev/null
+++ b/drivers/mfd/vexpress-spc.c
@@ -0,0 +1,626 @@
+/*
+ * Versatile Express Serial Power Controller (SPC) support
+ *
+ * Copyright (C) 2013 ARM Ltd.
+ *
+ * Author(s): Sudeep KarkadaNagesha sudeep.karkadanage...@arm.com
+ *Achin Gupta   achin.gu...@arm.com
+ *Lorenzo Pieralisi lorenzo.pieral...@arm.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed as is WITHOUT ANY WARRANTY of any
+ * kind, whether express or implied; without even

Re: [RFC PATCH v2 3/4] powerpc: refactor of_get_cpu_node to support other architectures

2013-08-29 Thread Lorenzo Pieralisi
On Wed, Aug 28, 2013 at 08:46:38PM +0100, Grant Likely wrote:
 On Thu, 22 Aug 2013 14:59:30 +0100, Mark Rutland mark.rutl...@arm.com wrote:
  On Mon, Aug 19, 2013 at 02:56:10PM +0100, Sudeep KarkadaNagesha wrote:
   On 19/08/13 14:02, Rob Herring wrote:
On 08/19/2013 05:19 AM, Mark Rutland wrote:
On Sat, Aug 17, 2013 at 11:09:36PM +0100, Benjamin Herrenschmidt wrote:
On Sat, 2013-08-17 at 12:50 +0200, Tomasz Figa wrote:
I wonder how would this handle uniprocessor ARM (pre-v7) cores, for
which 
the updated bindings[1] define #address-cells = 0 and so no reg 
property.
   
[1] - http://thread.gmane.org/gmane.linux.ports.arm.kernel/260795
   
Why did you do that in the binding ? That sounds like looking to 
create
problems ... 
   
Traditionally, UP setups just used 0 as the reg property on other
architectures, why do differently ?
   
The decision was taken because we defined our reg property to refer to
the MPIDR register's Aff{2,1,0} bitfields, and on UP cores before v7
there's no MPIDR register at all. Given there can only be a single CPU
in that case, describing a register that wasn't present didn't seem
necessary or helpful.

What exactly reg represents is up to the binding definition, but it
still should be present IMO. I don't see any issue with it being
different for pre-v7.

   Yes it's better to have 'reg' with value 0 than not having it.
   Otherwise this generic of_get_cpu_node implementation would need some
   _hack_ to handle that case.
  
  I'm not sure that having some code to handle a difference in standard
  between two architectures is a hack. If anything, I'd argue encoding a
  reg of 0 that corresponds to a nonexistent MPIDR value (given that's
  what the reg property is defined to map to on ARM) is more of a hack ;)
  
  I'm not averse to having a reg value of 0 for this case, but given that
  there are existing devicetrees without it, requiring a reg property will
  break compatibility with them.
 
 Then special-case those device trees, but changing the existing
 convention really needs to be avoided. The referenced documentation
 change is brand new, so we're not stuck with it.

I have no problem with changing the bindings and forcing:

#address-cells = <1>;
reg = <0>;

for UP predating v7, my big worry is related to in-kernel dts that we
already patched to follow the #address-cells = 0 rule (and we had to
do it since we got asked that question multiple times on the public
lists).

What do you mean by "special-case those device trees"? I had not
planned to patch them again, unless we really consider that a necessary
evil.
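
One possible reading of "special-casing", as a user-space sketch only: the lookup accepts both the proposed convention (a reg of 0) and existing UP trees that carry no reg at all. The `struct cpu_node` type and `cpu_node_matches()` helper are hypothetical and do not correspond to the real of_get_cpu_node() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a /cpus/cpu node. */
struct cpu_node {
	bool has_reg;   /* false models an existing UP tree with #address-cells = <0> */
	uint32_t reg;   /* hardware id (MPIDR affinity bits on ARM) when has_reg is true */
};

/* Return true if the node describes the CPU whose hardware id is hwid. */
bool cpu_node_matches(const struct cpu_node *node, uint32_t hwid)
{
	uint32_t reg;

	assert(node != NULL);
	/* Special case: a node without reg is treated as if it had reg = <0>. */
	reg = node->has_reg ? node->reg : 0;
	return reg == hwid;
}
```

With this shape, already-patched device trees keep working while new ones follow the common convention.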

Thanks,
Lorenzo



Re: linux-next: manual merge of the arm-soc tree with the pm tree

2013-09-02 Thread Lorenzo Pieralisi
On Thu, Aug 29, 2013 at 06:57:15PM +0100, Olof Johansson wrote:
 On Thu, Aug 29, 2013 at 06:04:25PM +1000, Stephen Rothwell wrote:
  Hi all,
  
  Today's linux-next merge of the arm-soc tree got a conflict in
  drivers/cpuidle/Makefile between commits b98e01ad4ed9 (cpuidle: Add
  Kconfig.arm and move calxeda, kirkwood and zynq) and d3f2950f2ade (ARM:
  ux500: cpuidle: Move ux500 cpuidle driver to drivers/cpuidle) from the
  pm tree and commit 14d2c34cfa00 (cpuidle: big.LITTLE: vexpress-TC2 CPU
  idle driver) from the arm-soc tree.
  
  I fixed it up (see below) and can carry the fix as necessary (no action
  is required).
  
  -- 
  Cheers,
  Stephen Rothwell s...@canb.auug.org.au
  
  diff --cc drivers/cpuidle/Makefile
  index 0b9d200,3b6445c..000
  --- a/drivers/cpuidle/Makefile
  +++ b/drivers/cpuidle/Makefile
  @@@ -5,9 -5,7 +5,10 @@@
obj-y += cpuidle.o driver.o governor.o sysfs.o governors/
obj-$(CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED) += coupled.o

   -obj-$(CONFIG_CPU_IDLE_CALXEDA) += cpuidle-calxeda.o
   -obj-$(CONFIG_ARCH_KIRKWOOD) += cpuidle-kirkwood.o
   -obj-$(CONFIG_CPU_IDLE_ZYNQ) += cpuidle-zynq.o
   -obj-$(CONFIG_CPU_IDLE_BIG_LITTLE) += cpuidle-big_little.o
   
  +##
   +# ARM SoC drivers
   +obj-$(CONFIG_ARM_HIGHBANK_CPUIDLE)+= cpuidle-calxeda.o
   +obj-$(CONFIG_ARM_KIRKWOOD_CPUIDLE)+= cpuidle-kirkwood.o
   +obj-$(CONFIG_ARM_ZYNQ_CPUIDLE)+= cpuidle-zynq.o
   +obj-$(CONFIG_ARM_U8500_CPUIDLE) += cpuidle-ux500.o
  ++obj-$(CONFIG_CPU_IDLE_BIG_LITTLE)   += cpuidle-big_little.o
 
 
 Might want to sort u8500 before zynq, but otherwise looks fine.

I noticed that owing to the merge, CONFIG_CPU_IDLE_BIG_LITTLE should be moved
to the newly introduced Kconfig.arm. How are we going to handle this ? It is
just a matter of renaming the config entry and moving it to Kconfig.arm.

Thanks,
Lorenzo



Re: [PATCH 0/2] ARM: sunxi: Convert DTSI to new CPU bindings

2013-07-05 Thread Lorenzo Pieralisi
On Sun, Jun 30, 2013 at 10:48:46AM +0100, Lorenzo Pieralisi wrote:
 On Sat, Jun 29, 2013 at 08:38:19PM +0100, Russell King - ARM Linux wrote:
  On Fri, Jun 28, 2013 at 01:05:42PM -0700, Olof Johansson wrote:
   On Fri, Jun 28, 2013 at 1:03 PM, Maxime Ripard
   maxime.rip...@free-electrons.com wrote:
On Fri, Jun 28, 2013 at 06:15:32PM +0100, Lorenzo Pieralisi wrote:
The patch above should already be queued in next/dt right ?
   
Indeed.
   
Then why did the latest patch of your patchset get into 3.10, while the
patches actually fixing the DT it would have impacted were delayed to
3.11?
   
(And why was it merged so late in the development cycle?)
   
   This. So now we have to scramble because some device trees will
   produce warnings at boot.
   
   Russell, the alternative is to revert Lorenzo's patch for 3.10 (and
   re-introduce it for 3.11). Do you have a preference?
  
  Sorry but I really don't understand what all the fuss in this thread
  is about.
  
  This thread seems to be saying that two development patches were
  merged, which were 7762/1 and 7763/1, and that 7764/1 is a fix?
  Are you sure about that, because that's not how they're described,
  and not how they look either.
 
 As Olof's warning downgrade is being merged (thanks for that, and apologies for
 failing to explain the patch dependencies and stable-related issues properly),
 7764/1 won't apply cleanly anymore. Can you please drop it from the patch
 system? I will update it and test it first thing tomorrow and send a
 final version to the patch system.

Patch 7779/1, replacing 7764/1, is in the patch system now and is ready
to get merged.

Unfortunately cpu/cpus bindings documentation updates, following:

https://lists.ozlabs.org/pipermail/devicetree-discuss/2013-June/036735.html
https://lists.ozlabs.org/pipermail/devicetree-discuss/2013-May/033779.html

were not pulled into the kernel. This is an issue, since it means that
we still have no reference, in the kernel or wherever it has to be, to
the final cpus/cpu bindings for ARM and ARM64 provided in the pull
request link above (which has been reviewed to death and acknowledged).

It is a significant overhaul of cpu/cpus bindings standard for ARM/ARM64,
covering all CPUs harking back to arm926 and beyond, and should be final.

dts updates following that standard have already been pulled into 3.11
through arm-soc.

IMHO the bindings contained in the pull request above must be merged in the
kernel asap; could you please tell me what I should do to get them in?
If we want to move the bindings documentation elsewhere let's do it;
as long as there is a published standard I am happy and will stop annoying
you with this stuff.

Thank you very much,
Lorenzo



Re: [PATCH] ARM: EXYNOS: mcpm: Don't rely on firmware's secondary_cpu_start

2014-06-07 Thread Lorenzo Pieralisi
On Fri, Jun 06, 2014 at 10:43:05PM +0100, Doug Anderson wrote:
 On exynos mcpm systems the firmware is hardcoded to jump to an address
 in SRAM (0x02073000) when secondary CPUs come up.  By default the
 firmware puts a bunch of code at that location.  That code expects the
 kernel to fill in a few slots with addresses that it uses to jump back
 to the kernel's entry point for secondary CPUs.
 
 Originally (on prerelease hardware) this firmware code contained a
 bunch of workarounds to deal with boot ROM bugs.  However on all
 shipped hardware we simply use this code to redirect to a kernel
 function for bringing up the CPUs.
 
 Let's stop relying on the code provided by the bootloader and just
 plumb in our own (simple) code jump to the kernel.  This has the nice
 benefit of fixing problems due to the fact that older bootloaders
 (like the one shipped on the Samsung Chromebook 2) might have put
 slightly different code into this location.
 
 Once suspend/resume is implemented for systems using exynos-mcpm we'll
 need to make sure we reinstall our fixed up code after resume.  ...but
 that's not anything new since IRAM (and thus the address of the
 mcpm_entry_point) is lost across suspend/resume anyway.

Can I ask you please what the firmware does for the boot (primary) cpu
on cold-boot and warm-boot (resume from suspend) ?

Does it jump to a specific (hardcoded) location ?

Is the primary CPU (MPIDR) hardcoded in firmware so that on both
cold and warm-boot firmware sees a specific MPIDR as special ?

I am asking to check if on this platform CPUidle (where the notion of
primary CPU disappears) has a chance to run properly.

Probably CPUidle won't attain idle states where IRAM content is lost, but I
am still worried about the primary vs secondaries firmware boot behaviour.

What happens on reboot from suspend to RAM (or to put it differently,
what does secure firmware do on reboot from suspend to RAM - in
particular how is the jump address to bootloader/kernel set ?)

Thank you very much.

Lorenzo

 
 Signed-off-by: Doug Anderson diand...@chromium.org
 ---
  arch/arm/mach-exynos/mcpm-exynos.c | 10 ++
  1 file changed, 6 insertions(+), 4 deletions(-)
 
 diff --git a/arch/arm/mach-exynos/mcpm-exynos.c 
 b/arch/arm/mach-exynos/mcpm-exynos.c
 index 0498d0b..3a7fad0 100644
 --- a/arch/arm/mach-exynos/mcpm-exynos.c
 +++ b/arch/arm/mach-exynos/mcpm-exynos.c
 @@ -343,11 +343,13 @@ static int __init exynos_mcpm_init(void)
   pr_info("Exynos MCPM support installed\n");
  
   /*
 -  * Future entries into the kernel can now go
 -  * through the cluster entry vectors.
 +  * U-Boot SPL is hardcoded to jump to the start of ns_sram_base_addr
 +  * as part of secondary_cpu_start().  Let's redirect it to the
 +  * mcpm_entry_point().
*/
 - __raw_writel(virt_to_phys(mcpm_entry_point),
 - ns_sram_base_addr + MCPM_BOOT_ADDR_OFFSET);
 + __raw_writel(0xe59f0000, ns_sram_base_addr); /* ldr r0, [pc, #0] */
 + __raw_writel(0xe12fff10, ns_sram_base_addr + 4); /* bx  r0 */
 + __raw_writel(virt_to_phys(mcpm_entry_point), ns_sram_base_addr + 8);
  
   iounmap(ns_sram_base_addr);
  
 -- 
 2.0.0.526.g5318336
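
A user-space sketch of the three-word trampoline the quoted patch installs. In ARM state the PC reads as the address of the current instruction plus 8, so `ldr r0, [pc, #0]` placed at offset 0 loads the word at offset 8, which is where the entry address is stored. The encodings 0xe59f0000 and 0xe12fff10 are the standard A32 encodings of the two instructions named in the patch comments; the helper names `build_trampoline()` and `ldr_literal_offset()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARM_LDR_R0_PC_0 0xe59f0000u /* ldr r0, [pc, #0] */
#define ARM_BX_R0       0xe12fff10u /* bx  r0 */

/* Fill a three-word trampoline that jumps to entry_phys. */
void build_trampoline(uint32_t tramp[3], uint32_t entry_phys)
{
	assert(tramp != NULL);
	tramp[0] = ARM_LDR_R0_PC_0;
	tramp[1] = ARM_BX_R0;
	tramp[2] = entry_phys;   /* literal read by the ldr at offset 0 */
}

/*
 * Byte offset of the literal read by an ARM-state "ldr rX, [pc, #imm12]"
 * located at insn_off. Assumes the add (U = 1) form, as used above.
 */
uint32_t ldr_literal_offset(uint32_t insn_off, uint32_t insn)
{
	return insn_off + 8 + (insn & 0xfffu);
}
```

The offset check is what ties the layout together: the literal must sit exactly two words after the load.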
 
 



Re: [PATCH] ARM: EXYNOS: mcpm: Don't rely on firmware's secondary_cpu_start

2014-06-09 Thread Lorenzo Pieralisi
On Mon, Jun 09, 2014 at 06:03:31PM +0100, Doug Anderson wrote:

[...]

 Cold boot and resume from suspend are detected via various special
 flags in various special locations.  Resume from suspend looks at
 INFORM1 (0x10048004) for flags.  This register is 0 during a cold boot
 and has special values set by the kernel at resume time.
 
 It also looks as if some code looks at 0x10040900 (PMU_SPARE0) to help
 tell initial cold boot and secondary CPU bringup.

Ok, thanks a lot. It looks like firmware paths should be ready to
detect cold vs warm boot, and hopefully do not rely on a specific
MPIDR to come up first out of power states.

  I am asking to check if on this platform CPUidle (where the notion of
  primary CPU disappears) has a chance to run properly.
 
 I believe it should be possible, but we don't have CPUidle implemented
 in our current system.  Abhilash may be able to comment more.

I am interested in more insights; that would be very helpful, thanks.

  Probably CPUidle won't attain idle states where IRAM content is lost, but I
  am still worried about the primary vs secondaries firmware boot behaviour.
 
 I don't think iRAM can be turned off for CPUidle.

A system state that turns it off might be added, but I doubt that too;
and if you are relying on registers for jump addresses, that is not even
a problem in the first place.

  What happens on reboot from suspend to RAM (or to put it differently,
  what does secure firmware do on reboot from suspend to RAM - in
  particular how is the jump address to bootloader/kernel set ?)
 
 Should be described above now.

Thank you very much.

Lorenzo



Re: [PATCH] bus/arm-cci: add dependency on OF && CPU_V7

2014-05-09 Thread Lorenzo Pieralisi
On Thu, May 08, 2014 at 03:56:18PM +0100, Arnd Bergmann wrote:
 The arm-cci code uses device tree helpers for initialization
 that don't work on kernels built without CONFIG_OF. Further,
 it contains an inline assembly in cci_enable_port_for_self()
 that uses ARMv7 instructions and fails to build when targeting
 other ARM instruction set versions.
 
 This works around both issues by limiting the scope of the
 Kconfig symbol to platforms that can actually build this driver
 cleanly.
 
 Signed-off-by: Arnd Bergmann a...@arndb.de
 Cc: Shawn Guo shawn@linaro.org
 Cc: Lorenzo Pieralisi lorenzo.pieral...@arm.com
 ---
  drivers/bus/Kconfig | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/drivers/bus/Kconfig b/drivers/bus/Kconfig
 index 552373c..d53417e 100644
 --- a/drivers/bus/Kconfig
 +++ b/drivers/bus/Kconfig
 @@ -37,7 +37,7 @@ config OMAP_INTERCONNECT
  
  config ARM_CCI
   bool "ARM CCI driver support"
 - depends on ARM
 + depends on ARM && OF && CPU_V7
   help
 Driver supporting the CCI cache coherent interconnect for ARM
 platforms.

The dependency on CPU_V7 will need reworking, since we might want to
enable the driver on arm64 platforms too (eg CCI PMUs), other than that:

Acked-by: Lorenzo Pieralisi lorenzo.pieral...@arm.com



Re: [Patch v4 5/5] mcpm: exynos: populate suspend and powered_up callbacks

2014-05-09 Thread Lorenzo Pieralisi
On Mon, May 05, 2014 at 10:27:20AM +0100, Chander Kashyap wrote:
 In order to support cpuidle through mcpm, suspend and powered-up
 callbacks are required in mcpm platform code.
 Hence populate the same callbacks.
 
 Signed-off-by: Chander Kashyap chander.kash...@linaro.org
 Signed-off-by: Chander Kashyap k.chan...@samsung.com
 ---
 Changes in v4: None
 Changes in v3:
   1. Removed coherancy enablement after suspend failure.

coherency

   2. Use generic function to poweron cpu.
 changes in v2:
   1. Fixed typo: enynos_pmu_cpunr to exynos_pmu_cpunr
  arch/arm/mach-exynos/mcpm-exynos.c |   34 ++
  1 file changed, 34 insertions(+)
 
 diff --git a/arch/arm/mach-exynos/mcpm-exynos.c 
 b/arch/arm/mach-exynos/mcpm-exynos.c
 index d0f7461..6d4a907 100644
 --- a/arch/arm/mach-exynos/mcpm-exynos.c
 +++ b/arch/arm/mach-exynos/mcpm-exynos.c
 @@ -256,10 +256,44 @@ static int exynos_power_down_finish(unsigned int cpu, 
 unsigned int cluster)
   return -ETIMEDOUT; /* timeout */
  }
  
 +void exynos_powered_up(void)

static ?

 +{
 + unsigned int mpidr, cpu, cluster;
 +
 + mpidr = read_cpuid_mpidr();
 + cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
 + cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
 +
 + arch_spin_lock(&exynos_mcpm_lock);
 + if (cpu_use_count[cpu][cluster] == 0)
 + cpu_use_count[cpu][cluster] = 1;
 + arch_spin_unlock(&exynos_mcpm_lock);
 +}
 +
 +static void exynos_suspend(u64 residency)
 +{
 + unsigned int mpidr, cpunr;
 +
 + mpidr = read_cpuid_mpidr();
 + cpunr = exynos_pmu_cpunr(mpidr);

If I were to be picky, I would compute these values only if they
are needed, ie move the computation after exynos_power_down().

There is another quite horrible issue here. We know this code works
because the processors A15/A7 hit the caches with C bit in SCTLR cleared.

On processors where this is not true, this sequence would explode
if power down fails (in case core is gated but L2 is still powered on,
the stack is stuck in L2) since it is going to read stack data that is
in L2 but can't be read.

This is not specific to this sequence; it is an issue in general, and I
wanted to mention it on the lists for public awareness.

The gist of what I am saying is: please add a comment to that effect
here, and in exynos_power_down() too.

 + __raw_writel(virt_to_phys(mcpm_entry_point), ns_sram_base_addr + 0x1c);

No magic numbers please (0x1c). You can add a macro/wrapper, as TC2 does.
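
A sketch of what such a macro/wrapper could look like, modelled in user space with a plain byte buffer standing in for the SRAM mapping. The names `EXYNOS_BOOT_VECTOR_OFFSET` and `exynos_set_boot_addr()` are hypothetical; in the kernel the store would go through `__raw_writel()` on the ioremapped base rather than `memcpy()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name for the offset the patch writes as a bare 0x1c. */
#define EXYNOS_BOOT_VECTOR_OFFSET 0x1c

/* Store the physical entry address into the boot slot of an SRAM image. */
void exynos_set_boot_addr(uint8_t *sram_base, uint32_t entry_phys)
{
	assert(sram_base != NULL);
	memcpy(sram_base + EXYNOS_BOOT_VECTOR_OFFSET, &entry_phys,
	       sizeof(entry_phys));
}
```

Call sites then read as `exynos_set_boot_addr(base, addr)` and the offset is defined in exactly one place.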

 + exynos_power_down();
 +
 + /*
 +  * Execution reaches here only if cpu did not power down.
 +  * Hence roll back the changes done in exynos_power_down function.
 + */
 + exynos_cpu_powerup(cpunr);

Please be aware that if this function returns MCPM will soft reboot, and
the CPUidle driver will have no way to detect a state entry failure.

I am just flagging this up, since fixing this behaviour is not easy, and
honestly, since power down failure should be the exception not the rule,
the idle stats should not be affected much.

I think this is the proper way of implementing the sequence but please
all keep in mind what I wrote above.

Lorenzo

 +}
 +
  static const struct mcpm_platform_ops exynos_power_ops = {
   .power_up   = exynos_power_up,
   .power_down = exynos_power_down,
   .power_down_finish  = exynos_power_down_finish,
 + .suspend= exynos_suspend,
 + .powered_up = exynos_powered_up,
  };
  
  static void __init exynos_mcpm_usage_count_init(void)
 -- 
 1.7.9.5
 
 



Re: [PATCH] ARM: EXYNOS: mcpm: Don't rely on firmware's secondary_cpu_start

2014-06-11 Thread Lorenzo Pieralisi
On Wed, Jun 11, 2014 at 05:52:10AM +0100, Chander Kashyap wrote:
 Hi Doug,
 
 On Tue, Jun 10, 2014 at 9:19 PM, Nicolas Pitre nicolas.pi...@linaro.org 
 wrote:
  On Tue, 10 Jun 2014, Doug Anderson wrote:
 
  My S-state knowledge is not strong, but I believe that Lorenzo's
  questions matter if we're using S2 for CPUidle (where we actually turn
  off power and hot unplug CPUs) but not when we're using S1 for CPUidle
  (where we just enter WFI/WFE).
 
 
 No, it's not plain WFI.
 
 All cores in Exynos5420 can be powered off independently.
 This functionality has been tested.
 
 Below is the link for the posted patches.
 
 https://lkml.org/lkml/2014/6/10/194
 
 And as Nicolas wrote, these patches need MCPM for that.

Chander, I cast a look into the code and I have a question
(you also told me on CPUidle review that only core off
is supported in CPUidle).

When you say all cores can be powered off independently, do
you also mean that clusters can be powered off (in CPUidle) ?

I am pointing this out since in the MCPM backend I noticed:

/* TODO: Turn off the cluster here to save power. */

I do not see any cluster power down request in the down path.

If I am wrong, ignore my message, I am just writing to help.

If you can only power down cores, but not the clusters on idle,
please keep in mind that you are currently cleaning and invalidating
the L2 when last man is running and this must not be taken
lightly since, if L2 stays on, that's a massive energy waste
for nothing.

So, if clusters stay up, you _have_ to tweak the MCPM backend slightly
to avoid cleaning L2, that's pivotal.
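
The point above can be sketched as a small decision helper: the shared L2 is only worth cleaning and invalidating when the last CPU standing goes down and the cluster will really be powered off. This is a user-space illustration under that assumption; `enum flush_level` and `pick_flush_level()` are hypothetical names, not the MCPM back-end API.

```c
#include <assert.h>
#include <stdbool.h>

enum flush_level {
	FLUSH_LOCAL, /* clean/invalidate only the core's own caches */
	FLUSH_ALL    /* also clean/invalidate the shared L2 */
};

/*
 * cpus_still_up counts the CPUs running in the cluster, including the caller.
 * cluster_can_power_off says whether the platform will gate the cluster.
 */
enum flush_level pick_flush_level(unsigned int cpus_still_up,
				  bool cluster_can_power_off)
{
	bool last_man;

	assert(cpus_still_up >= 1);
	last_man = (cpus_still_up == 1);

	/* If L2 stays powered, flushing it only wastes energy. */
	if (last_man && cluster_can_power_off)
		return FLUSH_ALL;
	return FLUSH_LOCAL;
}
```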

Lorenzo

 
  I believe that in ChromeOS we use S1 CPUidle and that it works fine.
  We've never implemented S2 that I'm aware of.
 
  You'll have to rely on MCPM for that.  That's probably why it hasn't
  been implemented before.
 
 
  Nicolas
 
  ___
  linux-arm-kernel mailing list
  linux-arm-ker...@lists.infradead.org
  http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
 



Re: [PATCH] ARM: EXYNOS: mcpm: Don't rely on firmware's secondary_cpu_start

2014-06-11 Thread Lorenzo Pieralisi
On Wed, Jun 11, 2014 at 01:14:21PM +0100, Chander Kashyap wrote:
 On Wed, Jun 11, 2014 at 3:43 PM, Lorenzo Pieralisi
 lorenzo.pieral...@arm.com wrote:
  On Wed, Jun 11, 2014 at 05:52:10AM +0100, Chander Kashyap wrote:
  Hi Doug,
 
  On Tue, Jun 10, 2014 at 9:19 PM, Nicolas Pitre nicolas.pi...@linaro.org 
  wrote:
   On Tue, 10 Jun 2014, Doug Anderson wrote:
  
   My S-state knowledge is not strong, but I believe that Lorenzo's
   questions matter if we're using S2 for CPUidle (where we actually turn
   off power and hot unplug CPUs) but not when we're using S1 for CPUidle
   (where we just enter WFI/WFE).
  
 
  No, it's not plain WFI.
 
  All cores in Exynos5420 can be powered off independently.
  This functionality has been tested.
 
  Below is the link for the posted patches.
 
  https://lkml.org/lkml/2014/6/10/194
 
  And as Nicolas wrote, these patches need MCPM for that.
 
  Chander, I cast a look into the code and I have a question
  (you also told me on CPUidle review that only core off
  is supported in CPUidle).
 
  When you say all cores can be powered off independently, do
  you also mean that clusters can be powered off (in CPUidle) ?
 
  I am pointing this out since in the MCPM backend I noticed:
 
  /* TODO: Turn off the cluster here to save power. */
 
  I do not see any cluster power down request in the down path.
 
  If I am wrong, ignore my message, I am just writing to help.
 
  If you can only power down cores, but not the clusters on idle,
  please keep in mind that you are currently cleaning and invalidating
  the L2 when last man is running and this must not be taken
  lightly since, if L2 stays on, that's a massive energy waste
  for nothing.
 
  So, if clusters stay up, you _have_ to tweak the MCPM backend slightly
  to avoid cleaning L2, that's pivotal.
 
 Hi Lorenzo,
 Cluster shutdown is in progress. Abhilash will add support for that.
 
 https://www.mail-archive.com/linux-samsung-soc@vger.kernel.org/msg31104.html

Hi Chander,

thanks. So, as a heads-up:

1) if you merge CPUidle support now as it is, it is at least suboptimal and
   may even burn more energy than it saves. Latencies in the bL idle driver
   are likely to be wrong, since they are for cluster shutdown on TC2, not
   for core power gating, which should require shorter target_residencies.
   On top of that, L2 is cleaned and invalidated for nothing.
2) when cluster support is merged, you might want to add an additional
   state to the CPUidle driver (ie C1 core gating, C2 cluster gating); to
   do that you should extend the driver and the MCPM back-end accordingly.
   I discussed that with Nico some time ago and it should be fairly easy
   to do, but we have got to talk about that.
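
To illustrate why separate core-gating and cluster-gating states need their own latencies, here is a user-space sketch of a state table and a simplified selection rule (deepest state whose target residency and exit latency both fit). The numbers are placeholders, not measurements for any platform, and `struct idle_state`, `example_states` and `pick_state()` are hypothetical names rather than the CPUidle API.

```c
#include <assert.h>
#include <stddef.h>

struct idle_state {
	const char *name;
	unsigned int exit_latency_us;
	unsigned int target_residency_us;
};

/* Placeholder numbers, ordered shallow to deep; real values must be measured. */
const struct idle_state example_states[] = {
	{ "WFI",               1,    1    },
	{ "C1-core-gating",    300,  1000 },
	{ "C2-cluster-gating", 1000, 5000 },
};

/* Return the index of the deepest state that fits both constraints. */
size_t pick_state(const struct idle_state *tbl, size_t n,
		  unsigned int predicted_idle_us,
		  unsigned int latency_limit_us)
{
	size_t best = 0; /* state 0 is always allowed */

	assert(tbl != NULL && n > 0);
	for (size_t i = 1; i < n; i++) {
		if (tbl[i].target_residency_us <= predicted_idle_us &&
		    tbl[i].exit_latency_us <= latency_limit_us)
			best = i;
	}
	return best;
}
```

If core gating were advertised with the cluster-shutdown numbers, short idle periods would never select it, which is the mismatch point 1) warns about.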

Thank you,
Lorenzo

 
 
 
  Lorenzo
 
 
   I believe that in ChromeOS we use S1 CPUidle and that it works fine.
   We've never implemented S2 that I'm aware of.
  
   You'll have to rely on MCPM for that.  That's probably why it hasn't
   been implemented before.
  
  
   Nicolas
  
   ___
   linux-arm-kernel mailing list
   linux-arm-ker...@lists.infradead.org
   http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
 
 
 



[PATCH] arm64: kernel: initialize broadcast hrtimer based clock event device

2014-05-29 Thread Lorenzo Pieralisi
On platforms implementing CPU power management, the CPUidle subsystem
can allow CPUs to enter idle states where local timers logic is lost on power
down. To keep the software timers functional the kernel relies on an
always-on broadcast timer to be present in the platform to relay the
interrupt signalling the timer expiries.

For platforms implementing CPU core gating that do not implement an always-on
HW timer or implement it in a broken way, this patch adds code to initialize
the kernel software broadcast hrtimer upon boot. It relies on a dynamically
chosen CPU to be always powered-up. This CPU then relays the timer interrupt
to CPUs in deep-idle states through its HW local timer device.

On systems with power management capabilities but no functional HW broadcast
tick device, the hrtimer based clock event device allows the kernel to
enter high-resolution timer mode, which improves system latencies and saves
dynamic power.

The side effect of having a CPU always-on has implications on power management
platform capabilities and makes CPUidle suboptimal, since at least a CPU is
kept always in a shallow idle state by the kernel to relay timer interrupts,
but at least leaves the kernel with a functional system with some working power
management capabilities.

The hrtimer based clock event device has lowest possible rating so that,
if a platform contains a functional HW clock event device with broadcast
capabilities, that device is always chosen as a tick broadcast device instead
of the software based one, now present by default.

Cc: Preeti U Murthy pre...@linux.vnet.ibm.com
Cc: Will Deacon will.dea...@arm.com
Acked-by: Mark Rutland mark.rutl...@arm.com
Signed-off-by: Lorenzo Pieralisi lorenzo.pieral...@arm.com
---
 arch/arm64/kernel/time.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
index 29c39d5..3d43900 100644
--- a/arch/arm64/kernel/time.c
+++ b/arch/arm64/kernel/time.c
@@ -18,6 +18,7 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/clockchips.h>
 #include <linux/export.h>
 #include <linux/kernel.h>
 #include <linux/interrupt.h>
@@ -67,6 +68,8 @@ void __init time_init(void)
 
 	clocksource_of_init();
 
+	tick_setup_hrtimer_broadcast();
+
 	arch_timer_rate = arch_timer_get_rate();
 	if (!arch_timer_rate)
 		panic("Unable to initialise architected timer.\n");
-- 
1.8.4
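
The relay described in the commit log can be pictured with a minimal
user-space sketch. It assumes a hypothetical per-CPU record (struct cpu_timer)
and is not the kernel's tick broadcast code: the CPU that stays powered up
programs its own local timer to the earliest expiry among the CPUs whose
local timers stopped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU record; not a kernel data structure. */
struct cpu_timer {
	bool in_deep_idle;      /* local timer logic lost on power down */
	uint64_t next_event_ns; /* next software timer expiry for this CPU */
};

/*
 * Return the expiry the always-on CPU should program on its own local
 * timer so that it can wake the sleeping CPUs in time, or UINT64_MAX if
 * no CPU is in a deep idle state.
 */
uint64_t broadcast_next_event(const struct cpu_timer *cpus, size_t ncpus)
{
	uint64_t earliest = UINT64_MAX;

	for (size_t i = 0; i < ncpus; i++) {
		if (cpus[i].in_deep_idle && cpus[i].next_event_ns < earliest)
			earliest = cpus[i].next_event_ns;
	}
	return earliest;
}
```

CPUs that are running or only in a shallow idle state keep their own timers,
so the sketch ignores them when computing the relay deadline.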




Re: [PATCH] arm64: kernel: initialize broadcast hrtimer based clock event device

2014-05-29 Thread Lorenzo Pieralisi
Hi Preeti,

On Thu, May 29, 2014 at 12:04:36PM +0100, Preeti U Murthy wrote:
 Hi Lorenzo,
 
 On 05/29/2014 02:53 PM, Lorenzo Pieralisi wrote:
  On platforms implementing CPU power management, the CPUidle subsystem
  can allow CPUs to enter idle states where local timers logic is lost on 
  power
  down. To keep the software timers functional the kernel relies on an
  always-on broadcast timer to be present in the platform to relay the
  interrupt signalling the timer expiries.
  
  For platforms implementing CPU core gating that do not implement an 
  always-on
  HW timer or implement it in a broken way, this patch adds code to initialize
  the kernel software broadcast hrtimer upon boot. It relies on a dynamically
 
 It would be best to use the term "hrtimer based broadcast device"
 throughout the changelog for uniformity and to avoid confusion instead
 of mixing it with "software broadcast".

Agreed.

  chosen CPU to be always powered-up. This CPU then relays the timer interrupt
  to CPUs in deep-idle states through its HW local timer device.
  
  On systems with power management capabilities but no functional HW broadcast
  tick device, the hrtimer based clock event device allows the kernel to
  enter high-resolution timer mode, which improves system latencies and saves
  dynamic power.
 
 Sorry, but I do not understand the above paragraph. What do you mean by
 "allows the kernel to enter high resolution timer mode"? And how does
 it improve system latency? I understand that the hrtimer based
 clockevent device saves dynamic power since it provides a mechanism in
 which cpus can enter deeper idle states.

See Mark's reply, I have nothing to add. I will remove this paragraph anyway.

  The side effect of having a CPU always-on has implications on power 
  management
  platform capabilities and makes CPUidle suboptimal, since at least a CPU is
  kept always in a shallow idle state by the kernel to relay timer interrupts,
  but at least leaves the kernel with a functional system with some working 
  power
  management capabilities.
  
  The hrtimer based clock event device has lowest possible rating so that,
  if a platform contains a functional HW clock event device with broadcast
  capabilities, that device is always chosen as a tick broadcast device 
  instead
  of the software based one, now present by default.
 
 I think this statement "instead of the software based one, now present
 by default" is incorrect. The hrtimer based clock event device will come
 into picture only when the arch calls tick_setup_hrtimer_broadcast()
 explicitly. Otherwise either the arch should register a real clock
 device which does broadcast or should disable deep idle states where the
 local timers stop. So I would suggest skipping the last paragraph as it
 is not conveying anything in specific. The fact that a clock device with
 the highest rating will be chosen is already known and need not be
 mentioned explicitly IMHO.
 
  
  Cc: Preeti U Murthy pre...@linux.vnet.ibm.com
  Cc: Will Deacon will.dea...@arm.com
  Acked-by: Mark Rutland mark.rutl...@arm.com
  Signed-off-by: Lorenzo Pieralisi lorenzo.pieral...@arm.com
  ---
   arch/arm64/kernel/time.c | 3 +++
   1 file changed, 3 insertions(+)
  
  diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
  index 29c39d5..3d43900 100644
  --- a/arch/arm64/kernel/time.c
  +++ b/arch/arm64/kernel/time.c
  @@ -18,6 +18,7 @@
    * along with this program.  If not, see <http://www.gnu.org/licenses/>.
    */
   
  +#include <linux/clockchips.h>
   #include <linux/export.h>
   #include <linux/kernel.h>
   #include <linux/interrupt.h>
  @@ -67,6 +68,8 @@ void __init time_init(void)
   
   	clocksource_of_init();
   
  +	tick_setup_hrtimer_broadcast();
  +
   	arch_timer_rate = arch_timer_get_rate();
   	if (!arch_timer_rate)
   		panic("Unable to initialise architected timer.\n");
  
 
 You have defined flag CPUIDLE_FLAG_TIMER_STOP for your deep idle
 states in which timer stops right?

Yes, I would have noticed otherwise =)

Thanks,
Lorenzo



Re: [PATCH] arm64: kernel: initialize broadcast hrtimer based clock event device

2014-05-29 Thread Lorenzo Pieralisi
On Thu, May 29, 2014 at 01:39:29PM +0100, Mark Rutland wrote:

[...]

   The side effect of having a CPU always-on has implications on power 
   management
   platform capabilities and makes CPUidle suboptimal, since at least a CPU 
   is
   kept always in a shallow idle state by the kernel to relay timer 
   interrupts,
   but at least leaves the kernel with a functional system with some working 
   power
   management capabilities.
   
   The hrtimer based clock event device has lowest possible rating so that,
   if a platform contains a functional HW clock event device with broadcast
   capabilities, that device is always chosen as a tick broadcast device 
   instead
   of the software based one, now present by default.
  
  I think this statement "instead of the software based one, now present
  by default" is incorrect. The hrtimer based clock event device will come
  into picture only when the arch calls tick_setup_hrtimer_broadcast()
  explicitly. Otherwise either the arch should register a real clock
  device which does broadcast or should disable deep idle states where the
  local timers stop. So I would suggest skipping the last paragraph as it
  is not conveying anything in specific. The fact that a clock device with
  the highest rating will be chosen is already known and need not be
  mentioned explicitly IMHO.
 
 I think it is worth keeping the paragraph to allay anyone's fear that
 the hrtimer based broadcast device might be selected in preference to a
 real suitable clock. I would otherwise not be aware that the hrtimer
 based broadcast device had the lowest rating (and would have to go and
 look that up separately).
 
 As the arch code has delegated timer registration to
 clocksource_of_init, it doesn't know whether any of the real devices
 that may have been registered are suitable as a broadcast source for
 oneshot events. So we can't conditionally register the hrtimer based
 broadcast device.
 
 Perhaps we could replace "now present by default" with "which is
 unconditionally registered in case no suitable hardware device is
 present"?

How about this:

-- >8 --
Subject: [PATCH] arm64: kernel: initialize broadcast hrtimer based clock event
 device

On platforms implementing CPU power management, the CPUidle subsystem
can allow CPUs to enter idle states where local timers logic is lost on power
down. To keep the software timers functional the kernel relies on an
always-on broadcast timer to be present in the platform to relay the
interrupt signalling the timer expiries.

For platforms implementing CPU core gating that do not implement an always-on
HW timer or implement it in a broken way, this patch adds code to initialize
the kernel hrtimer based clock event device upon boot (which can be chosen as
tick broadcast device by the kernel).
It relies on a dynamically chosen CPU to be always powered-up. This CPU then
relays the timer interrupt to CPUs in deep-idle states through its HW local
timer device.

The side effect of having a CPU always-on has implications on power management
platform capabilities and makes CPUidle suboptimal, since at least a CPU is
kept always in a shallow idle state by the kernel to relay timer interrupts,
but at least leaves the kernel with a functional system with some working power
management capabilities.

The hrtimer based clock event device has lowest possible rating so that,
if a platform contains a functional HW clock event device with broadcast
capabilities, that device is always chosen as a tick broadcast device instead
of the hrtimer based one, which is unconditionally registered in case no
suitable hardware clock event device is present.

Cc: Preeti U Murthy pre...@linux.vnet.ibm.com
Cc: Will Deacon will.dea...@arm.com
Acked-by: Mark Rutland mark.rutl...@arm.com
Signed-off-by: Lorenzo Pieralisi lorenzo.pieral...@arm.com
---
 arch/arm64/kernel/time.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
index 6815987..1a7125c 100644
--- a/arch/arm64/kernel/time.c
+++ b/arch/arm64/kernel/time.c
@@ -18,6 +18,7 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/clockchips.h>
 #include <linux/export.h>
 #include <linux/kernel.h>
 #include <linux/interrupt.h>
@@ -69,6 +70,8 @@ void __init time_init(void)
 	of_clk_init(NULL);
 	clocksource_of_init();
 
+	tick_setup_hrtimer_broadcast();
+
 	arch_timer_rate = arch_timer_get_rate();
 	if (!arch_timer_rate)
 		panic("Unable to initialise architected timer.\n");
-- 
1.8.4




[PATCH v2] arm64: kernel: initialize broadcast hrtimer based clock event device

2014-05-29 Thread Lorenzo Pieralisi
On platforms implementing CPU power management, the CPUidle subsystem
can allow CPUs to enter idle states where local timers logic is lost on power
down. To keep the software timers functional the kernel relies on an
always-on broadcast timer to be present in the platform to relay the
interrupt signalling the timer expiries.

For platforms implementing CPU core gating that do not implement an always-on
HW timer or implement it in a broken way, this patch adds code to initialize
the kernel hrtimer based clock event device upon boot (which can be chosen as
tick broadcast device by the kernel).
It relies on a dynamically chosen CPU to be always powered-up. This CPU then
relays the timer interrupt to CPUs in deep-idle states through its HW local
timer device.

Having a CPU always-on has implications on power management platform
capabilities and makes CPUidle suboptimal, since at least a CPU is kept
always in a shallow idle state by the kernel to relay timer interrupts,
but at least leaves the kernel with a functional system with some working
power management capabilities.

The hrtimer based clock event device is unconditionally registered, but
has the lowest possible rating such that any broadcast-capable HW clock
event device present will be chosen in preference as the tick broadcast
device.

Cc: Preeti U Murthy pre...@linux.vnet.ibm.com
Acked-by: Will Deacon will.dea...@arm.com
Acked-by: Mark Rutland mark.rutl...@arm.com
Signed-off-by: Lorenzo Pieralisi lorenzo.pieral...@arm.com
---
changes in v2:

- Reworded the commit log according to reviews

It should be ready to go.

Thanks,
Lorenzo

 arch/arm64/kernel/time.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
index 6815987..1a7125c 100644
--- a/arch/arm64/kernel/time.c
+++ b/arch/arm64/kernel/time.c
@@ -18,6 +18,7 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/clockchips.h>
 #include <linux/export.h>
 #include <linux/kernel.h>
 #include <linux/interrupt.h>
@@ -69,6 +70,8 @@ void __init time_init(void)
 	of_clk_init(NULL);
 	clocksource_of_init();
 
+	tick_setup_hrtimer_broadcast();
+
 	arch_timer_rate = arch_timer_get_rate();
 	if (!arch_timer_rate)
 		panic("Unable to initialise architected timer.\n");
-- 
1.8.4
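
The rating argument in the v2 commit log can be illustrated with a small
user-space sketch. The struct, the device names and the rating values are
hypothetical stand-ins, not the kernel's clockevents code; the only point is
that an unconditionally registered device with the lowest rating is chosen
only when no broadcast-capable hardware device exists.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a clock event device; not the kernel struct. */
struct clkevt {
	const char *name;
	int rating;             /* higher is preferred */
	bool can_broadcast;     /* keeps running while CPUs are powered down */
};

/* Pick the broadcast-capable device with the highest rating, or NULL. */
const struct clkevt *pick_broadcast(const struct clkevt *devs, size_t n)
{
	const struct clkevt *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!devs[i].can_broadcast)
			continue;
		if (!best || devs[i].rating > best->rating)
			best = &devs[i];
	}
	return best;
}
```

With a table holding an "hrtimer" entry at rating 0 and a broadcast-capable
hardware timer at any higher rating, the hardware timer wins; remove the
hardware timer and the hrtimer entry becomes the fallback.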




Re: [PATCH v4 1/5] devicetree: bindings: document Broadcom CPU enable method

2014-05-27 Thread Lorenzo Pieralisi
On Tue, May 20, 2014 at 06:43:46PM +0100, Alex Elder wrote:
 Broadcom mobile SoCs use a ROM-implemented holding pen for
 controlled boot of secondary cores.  A special register is
 used to communicate to the ROM that a secondary core should
 start executing kernel code.  This enable method is currently
 used for members of the bcm281xx and bcm21664 SoC families.
 
 The use of an enable method also allows the SMP operation vector to
 be assigned as a result of device tree content for these SoCs.
 
 Signed-off-by: Alex Elder el...@linaro.org

This is getting out of control, it is absolutely ghastly. I wonder how
I can manage to keep cpus.txt updated if anyone with a boot method
du jour adds into cpus.txt, and honestly in this specific case it is even
hard to understand why.

Can't it be done with bindings for the relevant register address space
(regmap?), with platform code just calling the register driver to set up the
jump address? It is platform specific code anyway; there is no way you
can make this generic.

I really do not see the point in cluttering cpus.txt with this stuff; it
is a platform specific hack and does not belong in generic bindings, in my
opinion.

Thanks,
Lorenzo

 ---
  Documentation/devicetree/bindings/arm/cpus.txt | 12 
  1 file changed, 12 insertions(+)
 
 diff --git a/Documentation/devicetree/bindings/arm/cpus.txt 
 b/Documentation/devicetree/bindings/arm/cpus.txt
 index 333f4ae..c6a2411 100644
 --- a/Documentation/devicetree/bindings/arm/cpus.txt
 +++ b/Documentation/devicetree/bindings/arm/cpus.txt
 @@ -185,6 +185,7 @@ nodes to be present and contain the properties described 
 below.
   qcom,gcc-msm8660
   qcom,kpss-acc-v1
   qcom,kpss-acc-v2
 + brcm,bcm11351-cpu-method
  
   - cpu-release-addr
   Usage: required for systems that have an enable-method
 @@ -209,6 +210,17 @@ nodes to be present and contain the properties described 
 below.
   Value type: phandle
   Definition: Specifies the ACC[2] node associated with this CPU.
  
 + - secondary-boot-reg
 + Usage:
 + Required for systems that have an enable-method
 + property value of brcm,bcm11351-cpu-method.
 + Value type: u32
 + Definition:
 + Specifies the physical address of the register used to
 + request the ROM holding pen code release a secondary
 + CPU.  The value written to the register is formed by
 + encoding the target CPU id into the low bits of the
 + physical start address it should jump to.
  
  Example 1 (dual-cluster big.LITTLE system 32-bit):
  
 -- 
 1.9.1
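
The register encoding described in the proposed binding text (the target CPU
id placed in the low bits of the physical start address) could look roughly
like the sketch below. The two-bit width of the CPU id field and all names
are assumptions made for illustration; the binding text does not state a
width, and this is not the Broadcom platform code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the low two bits carry the CPU id. */
#define BOOT_REG_CPUID_MASK 0x3u

/*
 * Build the word to write to the secondary boot register.
 * Fails if the start address is not aligned enough to leave the low
 * bits free, or if the CPU id does not fit in those bits.
 */
bool boot_reg_encode(uint32_t start_phys, uint32_t cpu_id, uint32_t *val)
{
	if ((start_phys & BOOT_REG_CPUID_MASK) || cpu_id > BOOT_REG_CPUID_MASK)
		return false;
	*val = start_phys | cpu_id;
	return true;
}

/* Recover the CPU id from an encoded register value. */
uint32_t boot_reg_cpu(uint32_t val)
{
	return val & BOOT_REG_CPUID_MASK;
}

/* Recover the start address from an encoded register value. */
uint32_t boot_reg_addr(uint32_t val)
{
	return val & ~BOOT_REG_CPUID_MASK;
}
```

Whether the register address comes from a DT property or from a register
block driver, the value written would be built the same way; only the lookup
of where to write it differs.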
 
 



Re: [PATCH v4 1/5] devicetree: bindings: document Broadcom CPU enable method

2014-05-28 Thread Lorenzo Pieralisi
On Wed, May 28, 2014 at 04:30:47AM +0100, Alex Elder wrote:
 On 05/27/2014 06:49 AM, Lorenzo Pieralisi wrote:
  On Tue, May 20, 2014 at 06:43:46PM +0100, Alex Elder wrote:
  Broadcom mobile SoCs use a ROM-implemented holding pen for
  controlled boot of secondary cores.  A special register is
  used to communicate to the ROM that a secondary core should
  start executing kernel code.  This enable method is currently
  used for members of the bcm281xx and bcm21664 SoC families.
 
  The use of an enable method also allows the SMP operation vector to
  be assigned as a result of device tree content for these SoCs.
 
  Signed-off-by: Alex Elder el...@linaro.org
  
  This is getting out of control, it is absolutely ghastly. I wonder how
  I can manage to keep cpus.txt updated if anyone with a boot method
  du jour adds into cpus.txt, and honestly in this specific case it is even
  hard to understand why.
 
 OK, in this message I'll focus on the particulars of this
 proposed binding.
 
  Can't it be done with bindings for the relative register address space
  (regmap ?) and platform code just calls the registers driver to set-up the
  jump address ? It is platform specific code anyway there is no way you
  can make this generic.
 
 I want to clarify what you're after here.
 
 My aim is to add SMP support for a class of Broadcom SMP
 machines.  To do so, I'm told I need to use the technique
 of assigning the SMP operations vector as a result of
 identifying an enable method in the DT.
 
 For 32-bit ARM, there are no generic enable-method values.
 (I did attempt to create one for spin-table but that was
 rejected by Russell King.)  For the machines I'm trying to
 enable, secondary CPUS start out spinning in a ROM-based
 holding pen, and there is no need for a kernel-based one.
 
 However, like a spin-table/holding pen enable method, a
 memory location is required for coordination between the
 boot CPU running kernel code and secondary CPUs running ROM
 code.  My proposal specifies it using a special numeric
 property value named secondary-boot-reg in the cpus
 node in the DT.
 
 And as I understand it, the issue you have relates to how
 this memory location is specified.

The issue I have relates to cluttering cpus.txt with all
sorts of platform specific SMP boot hacks.

 You suggest regmap.  I'm using a single 32-bit register,
 only at very early boot time, and thereafter access to
 it is meaningless.  It seems like overkill if it's only
 used for this purpose.  I could hide the register values
 in the code, but with the exception of that, the code I'm
 using is generic (in the context of this class of Broadcom
 machine).  I could specify the register differently somehow,
 in a different node, or with a different property.

Is that register part of a larger registers block ? What I wanted
to say is that you can use a driver API (we wish) to write that
register, something like eg vexpress does with sysflags:

drivers/mfd/vexpress-sysreg.c

vexpress_flags_set()

instead of grabbing the reg address from a platform specific boot
method DT entry.

I doubt that register exists on its own, even though I have to say this
would force you to write yet another platform specific driver to control
a bunch of registers, I do not see any other solution.

One thing is for certain: I really do not see the point in adding a boot
method per-SoC, and I do not want to end up having a cpus.txt file with a
gazillion entries just because every given platform reinvents the wheel when
it comes to booting an SMP system, cpus.txt would become a document that
describes platform quirks, not a proper binding anymore.

At least all platform specific quirks must be moved out of cpus.txt and
into platform documentation. I understand it is just a cosmetic change, but
I want to prevent cpus.txt from becoming an abomination.

 The bottom line here is I'm not sure whether I understand
 what you're suggesting, or perhaps why what you suggest is
 preferable.  I'm very open to suggestions, I just need it
 laid out a bit more detail in order to respond directly.

See above.

Thanks !
Lorenzo

 
 Thanks.
 
   -Alex
 
  I really do not see the point in cluttering cpus.txt with this stuff; it
  is a platform specific hack and does not belong in generic bindings, in my
  opinion.
  
  Thanks,
  Lorenzo
  
  ---
   Documentation/devicetree/bindings/arm/cpus.txt | 12 
   1 file changed, 12 insertions(+)
 
  diff --git a/Documentation/devicetree/bindings/arm/cpus.txt 
  b/Documentation/devicetree/bindings/arm/cpus.txt
  index 333f4ae..c6a2411 100644
  --- a/Documentation/devicetree/bindings/arm/cpus.txt
  +++ b/Documentation/devicetree/bindings/arm/cpus.txt
  @@ -185,6 +185,7 @@ nodes to be present and contain the properties 
  described below.
 qcom,gcc-msm8660
 qcom,kpss-acc-v1
 qcom,kpss-acc-v2
  +  brcm,bcm11351-cpu-method

Re: [PATCH v4 1/5] devicetree: bindings: document Broadcom CPU enable method

2014-05-28 Thread Lorenzo Pieralisi
On Wed, May 28, 2014 at 01:22:06PM +0100, Alex Elder wrote:
 On 05/28/2014 05:36 AM, Lorenzo Pieralisi wrote:
  On Wed, May 28, 2014 at 04:30:47AM +0100, Alex Elder wrote:
  On 05/27/2014 06:49 AM, Lorenzo Pieralisi wrote:
  On Tue, May 20, 2014 at 06:43:46PM +0100, Alex Elder wrote:
  Broadcom mobile SoCs use a ROM-implemented holding pen for
  controlled boot of secondary cores.  A special register is
  used to communicate to the ROM that a secondary core should
  start executing kernel code.  This enable method is currently
  used for members of the bcm281xx and bcm21664 SoC families.
 
  The use of an enable method also allows the SMP operation vector to
  be assigned as a result of device tree content for these SoCs.
 
  Signed-off-by: Alex Elder el...@linaro.org
 
  This is getting out of control, it is absolutely ghastly. I wonder how
  I can manage to keep cpus.txt updated if anyone with a boot method
  du jour adds into cpus.txt, and honestly in this specific case it is even
  hard to understand why.
 
  OK, in this message I'll focus on the particulars of this
  proposed binding.
 
  Can't it be done with bindings for the relative register address space
  (regmap ?) and platform code just calls the registers driver to set-up the
  jump address ? It is platform specific code anyway there is no way you
  can make this generic.
 
  I want to clarify what you're after here.
 
  My aim is to add SMP support for a class of Broadcom SMP
  machines.  To do so, I'm told I need to use the technique
  of assigning the SMP operations vector as a result of
  identifying an enable method in the DT.
 
  For 32-bit ARM, there are no generic enable-method values.
  (I did attempt to create one for spin-table but that was
  rejected by Russell King.)  For the machines I'm trying to
  enable, secondary CPUS start out spinning in a ROM-based
  holding pen, and there is no need for a kernel-based one.
 
  However, like a spin-table/holding pen enable method, a
  memory location is required for coordination between the
  boot CPU running kernel code and secondary CPUs running ROM
  code.  My proposal specifies it using a special numeric
  property value named secondary-boot-reg in the cpus
  node in the DT.
 
  And as I understand it, the issue you have relates to how
  this memory location is specified.
  
  The issue I have relates to cluttering cpus.txt with all
  sorts of platform specific SMP boot hacks.
 
 OK, as I mentioned in my other message, I totally
 agree with you here.  It's a total (and building)
 mess.  I discussed this with Mark Rutland at ELC
 last month and suggested splitting that stuff out
 of cpus.txt, which I have now proposed with a
 patch.
 https://lkml.org/lkml/2014/5/8/545

I think this makes sense and I will review that patchset. With this
approach agreed, I am OK with adding a platform specific boot method,
since it is split up nicely; do not bother adding a specific driver
to poke a register (it will be fun to see the number of files we have
to add to /cpu-enable-method though, big fun).

I still think that it is high time we started pushing back on these
platform hacks and move towards a common interface like PSCI to boot
(and suspend) ARM processors, there is no reason whatsoever why this
can't be done on the platforms you are trying to get merged unless I am
missing something.

Lorenzo



Re: [Patch v4 5/5] mcpm: exynos: populate suspend and powered_up callbacks

2014-05-13 Thread Lorenzo Pieralisi
On Tue, May 13, 2014 at 12:43:31PM +0100, Chander Kashyap wrote:

[...]

  +static void exynos_suspend(u64 residency)
  +{
  + unsigned int mpidr, cpunr;
  +
  + mpidr = read_cpuid_mpidr();
  + cpunr = exynos_pmu_cpunr(mpidr);
 
  If I were to be picky, I would compute these values only if they
  are needed, ie move the computation after exynos_power_down().
 
 Yes, that makes sense. I will realign it.
 
 
  There is another quite horrible issue here. We know this code works
  because the processors A15/A7 hit the caches with C bit in SCTLR cleared.
 
  On processors where this is not true, this sequence would explode
  if power down fails (in case core is gated but L2 is still powered on,
  the stack is stuck in L2) since it is going to read stack data that is
  in L2 but can't be read.
 
  It is not related to this sequence only, but it is an issue in general
  and wanted to mention that on the lists for public awareness.
 
 
 Can you please elaborate. I didn't understand.

It is not related to this patch only. This function carries out writes to the
stack (which might end up in eg L2) and then disables the C bit in SCTLR
through MCPM.

A15 and A7 processors hit the cache with the C bit clear in the SCTLR
so the processor still hits the stack values if the power down fails.
On processors where caches are not hit with the C bit clear (eg A9) this code
would fail since the stack values that sit in the caches cannot be read with
the C bit clear in SCTLR until the SCTLR is restored, so it will have to
be implemented in assembly with no stack usage (or better, no cacheable data
usage).

So, all I am saying is, to avoid copy'n'paste havoc and to avoid running
this code on Exynos platforms where it must not be run as-is, please add
a comment along these lines:

This function requires the stack data to be visible through power down
and can only be executed on processors like A15 and A7 that hit the cache
with the C bit clear in the SCTLR register.

Please let me know if that's clear.

Lorenzo

 
  The gist of what I am saying is, please add a comment to that effect
  here, and it should be added in exynos_power_down() too.
 
  + __raw_writel(virt_to_phys(mcpm_entry_point), ns_sram_base_addr + 
  0x1c);
 
  No magic numbers please (0x1c). You can add a macro/wrapper, as TC2 does.
 
 Yes, I will remove it.
 
 
  + exynos_power_down();
  +
  + /*
  +  * Execution reaches here only if cpu did not power down.
  +  * Hence roll back the changes done in exynos_power_down function.
  + */
  + exynos_cpu_powerup(cpunr);
 
  Please be aware that if this function returns MCPM will soft reboot, and
  the CPUidle driver will have no way to detect a state entry failure.
 
  I am just flagging this up, since fixing this behaviour is not easy, and
  honestly, since power down failure should be the exception not the rule,
  the idle stats should not be affected much.
 
  I think this is the proper way of implementing the sequence but please
  all keep in mind what I wrote above.
 
  Lorenzo
 
  +}
  +
   static const struct mcpm_platform_ops exynos_power_ops = {
.power_up   = exynos_power_up,
.power_down = exynos_power_down,
.power_down_finish  = exynos_power_down_finish,
  + .suspend= exynos_suspend,
  + .powered_up = exynos_powered_up,
   };
 
   static void __init exynos_mcpm_usage_count_init(void)
  --
  1.7.9.5
 
 
 
 
 
 
 -- 
 with warm regards,
 Chander Kashyap
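
The three review points in this thread (name the 0x1c offset, document the
cache assumption, and compute the CPU number only when the power down has
failed) can be pulled together in a user-space sketch. Everything here is a
stand-in: NS_SRAM_BOOT_VECTOR_OFFSET is a hypothetical macro name, the SRAM
is a plain array, and the power down is faked with a flag. It is not the
Exynos MCPM code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name for the offset written as a bare 0x1c in the patch. */
#define NS_SRAM_BOOT_VECTOR_OFFSET 0x1c

uint8_t ns_sram[0x100];      /* stand-in for the mapped non-secure SRAM */
bool power_down_succeeds;    /* knob: does the fake power down take effect? */
int cpunr_lookups;           /* how many times the CPU number was computed */

/* Stand-in for the __raw_writel() of the entry point. */
void set_boot_vector(uint32_t entry_phys)
{
	memcpy(&ns_sram[NS_SRAM_BOOT_VECTOR_OFFSET], &entry_phys,
	       sizeof(entry_phys));
}

/* Read back the stored entry point. */
uint32_t get_boot_vector(void)
{
	uint32_t v;

	memcpy(&v, &ns_sram[NS_SRAM_BOOT_VECTOR_OFFSET], sizeof(v));
	return v;
}

/* Stand-in for exynos_pmu_cpunr(read_cpuid_mpidr()). */
unsigned int fake_cpunr(void)
{
	cpunr_lookups++;
	return 0;
}

/*
 * This function requires the stack data to be visible through power down
 * and can only be executed on processors like A15 and A7 that hit the cache
 * with the C bit clear in the SCTLR register.
 *
 * Returns true if the (fake) power down took effect, false if it was
 * rolled back.
 */
bool sketch_suspend(uint32_t entry_phys)
{
	set_boot_vector(entry_phys);

	if (power_down_succeeds)
		return true;	/* real hardware would not return from here */

	/* Power down failed: only now is the CPU number needed. */
	(void)fake_cpunr();	/* would feed the power-up rollback */
	return false;
}
```

The comment block above sketch_suspend() is the wording suggested in the
reply, reproduced so that the constraint travels with the code if it is
copied to another platform.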
 



Re: [PATCH v5 4/6] driver: cpuidle: cpuidle-big-little: init driver for Exynos5420

2014-05-14 Thread Lorenzo Pieralisi
On Wed, May 14, 2014 at 02:04:51PM +0100, Arnd Bergmann wrote:
 On Wednesday 14 May 2014 13:33:55 Chander Kashyap wrote:
  
  diff --git a/drivers/cpuidle/cpuidle-big_little.c 
  b/drivers/cpuidle/cpuidle-big_little.c
  index 4cd02bd..344d79fa 100644
  --- a/drivers/cpuidle/cpuidle-big_little.c
  +++ b/drivers/cpuidle/cpuidle-big_little.c
  @@ -165,6 +165,7 @@ static int __init bl_idle_driver_init(struct 
  cpuidle_driver *drv, int cpu_id)
   
   static const struct of_device_id compatible_machine_match[] = {
   	{ .compatible = "arm,vexpress,v2p-ca15_a7" },
  +	{ .compatible = "samsung,exynos5420" },
   	{},
   };
 
 Does the cpuidle-big_little driver actually care about the platform?

No, platform specific bits are hidden behind MCPM. Apart from idle states
data, that will soon be initialized through DT too.

Actually, when idle for arm64 is merged we can even rework it and end up
with a single driver, DT based, that's the ultimate goal.

 If not, it would be good to add a generic string here as well, for
 future platforms to match.

Yes, you have a point.

 It still makes sense to list both the generic string and the platform
 specific one though, in case we have to work around subtle differences.

Agreed, but subtle differences do not belong in this driver, that's the
purpose of abstracting it the best we can. I have no problem in leaving
platform specific compatible there though.

Lorenzo

 
   Arnd
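
Arnd's suggestion of listing a generic string next to the platform specific
ones can be sketched in user space as follows. The string
"example,generic-big-little" is made up for illustration (no such compatible
is proposed in this thread), and the matching helper is a simplified
stand-in for the kernel's OF matching, not its implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Table of machines the driver accepts. The last entry is hypothetical. */
const char *const compatible_machines[] = {
	"arm,vexpress,v2p-ca15_a7",
	"samsung,exynos5420",
	"example,generic-big-little",
};

#define N_MACHINES (sizeof(compatible_machines) / sizeof(compatible_machines[0]))

/*
 * A machine lists its compatible strings from most specific to most
 * generic. Return true if any of them appears in the driver's table.
 */
bool machine_matches(const char *const *compat, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		for (size_t j = 0; j < N_MACHINES; j++) {
			if (strcmp(compat[i], compatible_machines[j]) == 0)
				return true;
		}
	}
	return false;
}
```

A future board would then need no driver change as long as its compatible
list ends with the generic string, while the specific entries remain
available if a platform quirk ever has to be keyed on them.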
 
 



Re: [RFC PATCH 3/3] idle: store the idle state index in the struct rq

2014-02-01 Thread Lorenzo Pieralisi
On Sat, Feb 01, 2014 at 06:00:40AM +, Brown, Len wrote:
  Right now (on ARM at least but I imagine this is pretty universal), the
  biggest impact on information accuracy for a CPU depends on what the
  other CPUs are doing.  The most obvious example is cluster power down.
  For a cluster to be powered down, all the CPUs sharing this cluster must
  also be powered down.  And all those CPUs must have agreed to a possible
  cluster power down in advance as well.  But it is not because an idle
  CPU has agreed to the extra latency imposed by a cluster power down that
  the cluster has actually powered down since another CPU in that cluster
  might still be running, in which case the recorded latency information
  for that idle CPU would be higher than it would be in practice at that
  moment.
 
 That will not work.
 
 When a CPU goes idle, it uses the CURRENT criteria for entering that state.
 If the criteria change after it has entered the state, are you going
 to wake it up so it can re-evaluate?  No.
 
 That is why the state must describe the worst case latency
 that CPU may see when waking from the state on THAT entry.
 
 That is why we use the package C-state numbers to describe
 core C-states on IA.

That's what we do on ARM too for cluster states. But the state decision
might turn out suboptimal in this case too for the same reasons you have
just mentioned.

There are some use cases where it matters (and where monitoring the
timers on all CPUs in a cluster shows that aborting cluster shutdown is
required because some CPUs have a pending timer and the governor decision is
stale); there are other use cases where it does not matter at all.
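
To illustrate the stale-decision case, here is a minimal userspace sketch
(not kernel code) of the check a CPU could run before committing to cluster
shutdown. The struct cluster layout, the function name and the microsecond
units are assumptions made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: next timer expiry (in us) for each CPU in a cluster. */
struct cluster {
	const uint64_t *next_timer_us;	/* one entry per CPU */
	size_t nr_cpus;
};

/*
 * Return true if cluster shutdown is still worth entering at time now_us,
 * ie no CPU in the cluster has a timer expiring before the cluster state
 * target residency has elapsed. A false result means the decision taken
 * by the governor when the CPU went idle is stale.
 */
static bool cluster_shutdown_allowed(const struct cluster *c, uint64_t now_us,
				     uint64_t target_residency_us)
{
	size_t i;

	assert(c && c->next_timer_us);

	for (i = 0; i < c->nr_cpus; i++) {
		/* A timer that is already due counts as an immediate wake-up. */
		if (c->next_timer_us[i] <= now_us)
			return false;
		if (c->next_timer_us[i] - now_us < target_residency_us)
			return false;
	}
	return true;
}
```

Whether such a check belongs in the kernel or in FW is a separate question;
the sketch only shows the comparison itself.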

We talked about this at LPC and I guess x86 FW/HW plays a role in
package states demotion too, we can do it in FW on ARM.

Overall we all know that, whatever we do, it is impossible to know the
precise C-state a CPU is in, even if we resort to HW probing. It is just
a matter of deciding what level of abstraction is necessary for the
scheduler to work properly.

Thanks for bringing this topic up.

Lorenzo



Re: [PATCH RFC 04/10] base: power: Add generic OF-based power domain look-up

2014-01-22 Thread Lorenzo Pieralisi
On Mon, Jan 20, 2014 at 05:32:53PM +, Tomasz Figa wrote:
 Hi Lorenzo,
 
 On 16.01.2014 17:34, Lorenzo Pieralisi wrote:
  Hi Tomasz,
 
  thank you for posting this series. I would like to use the DT bindings
  for power domains in the bindings for C-states on ARM:
 
  http://comments.gmane.org/gmane.linux.power-management.general/41012
 
  and in particular link a given C-state to a given power domain so that the
  kernel will have a way to actually check what devices are lost upon C-state
  entry (and for devices I also mean CPU peripheral like PMUs, GIC CPU IF,
  caches and possibly cpus, all of them already represented with DT nodes).
 
  I have a remark:
 
  -  Can we group device nodes under a single power-domain-parent so that
  all devices defined under that parent won't have to re-define a
  power-domain property (a property like interrupt-parent, so to speak)
 
  What do you think ?
 
 Hmm, I can see potential benefits of such construct on platforms with 
 clear hierarchy of devices, but to make sure I'm getting it correctly, 
 is the following what you have in mind?
 
 soc-domain-x@1234 {
   compatible = ...;
   reg = ...;
   power-domain-parent = <&power_domains DOMAIN_X>;
 
   device@1000 {
     compatible = ...;
     // inherits power-domain = <&power_domains DOMAIN_X>
   };
 
   device@2000 {
     compatible = ...;
     // inherits power-domain = <&power_domains DOMAIN_X>
   };
 };

Yes, exactly, it could avoid duplicated data. I still have an issue
with nodes that are per-cpu but define just one node (eg PMU), since
a CPU might belong in a power-domain on its own (ie one power domain
per-CPU) and basically this means that arch-timers, PMU & company should
link to multiple power domains, ie one per CPU, or we find a way to define
a power domain as banked.
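
The inheritance rule itself can be sketched in a few lines of userspace C.
This is only an illustration: struct node is a toy type, not the kernel's
struct device_node, and the lookup order (explicit property first, then the
closest ancestor) is an assumption about how such a binding would behave.
It does not address the per-CPU (banked) case.

```c
#include <assert.h>
#include <stddef.h>

/* Toy device node; not the kernel's struct device_node. */
struct node {
	const struct node *parent;
	const char *power_domain;		/* explicit per-device property, or NULL */
	const char *power_domain_parent;	/* default for all children, or NULL */
};

/*
 * Resolve the power domain of a node: an explicit power-domain wins,
 * otherwise the closest ancestor carrying power-domain-parent is used.
 * Returns NULL if no domain can be found.
 */
static const char *node_power_domain(const struct node *np)
{
	const struct node *p;

	assert(np);

	if (np->power_domain)
		return np->power_domain;

	for (p = np->parent; p; p = p->parent)
		if (p->power_domain_parent)
			return p->power_domain_parent;

	return NULL;
}
```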

I need to think about this a bit more, thanks for your feedback.

Lorenzo



Re: [PATCH 04/20] ARM64 / ACPI: Introduce arm_core.c and its related head file

2014-01-22 Thread Lorenzo Pieralisi
On Fri, Jan 17, 2014 at 12:24:58PM +, Hanjun Guo wrote:

[...]

 diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
 index bd9bbd0..2210353 100644
 --- a/arch/arm64/kernel/setup.c
 +++ b/arch/arm64/kernel/setup.c
 @@ -41,6 +41,7 @@
  #include <linux/memblock.h>
  #include <linux/of_fdt.h>
  #include <linux/of_platform.h>
 +#include <linux/acpi.h>
  
  #include <asm/cputype.h>
  #include <asm/elf.h>
 @@ -225,6 +226,11 @@ void __init setup_arch(char **cmdline_p)
  
   arm64_memblock_init();
  
 + /* Parse the ACPI tables for possible boot-time configuration */
 + acpi_boot_table_init();
 + early_acpi_boot_init();
 + acpi_boot_init();
 +
   paging_init();

Can I ask you please why we need to parse ACPI tables before
paging_init() ?

[...]

 +/*
 + * __acpi_map_table() will be called before page_init(), so early_ioremap()
 + * or early_memremap() should be called here.

Again, why is this needed ? What's needed before paging_init() from ACPI ?

[...]

 +/*
 + * acpi_boot_table_init() and acpi_boot_init()
 + *  called from setup_arch(), always.
 + *   1. checksums all tables
 + *   2. enumerates lapics
 + *   3. enumerates io-apics
 + *
 + * acpi_table_init() is separated to allow reading SRAT without
 + * other side effects.
 + */
 +void __init acpi_boot_table_init(void)
 +{
 + /*
 +  * If acpi_disabled, bail out
 +  */
 + if (acpi_disabled)
 + return;
 +
 + /*
 +  * Initialize the ACPI boot-time table parser.
 +  */
 + if (acpi_table_init()) {
 + disable_acpi();
 + return;
 + }
 +}
 +
 +int __init early_acpi_boot_init(void)
 +{
 + /*
 +  * If acpi_disabled, bail out
 +  */
 + if (acpi_disabled)
 + return -ENODEV;
 +
 + /*
 +  * Process the Multiple APIC Description Table (MADT), if present
 +  */
 + early_acpi_process_madt();
 +
 + return 0;
 +}
 +
 +int __init acpi_boot_init(void)
 +{
 + /*
 +  * If acpi_disabled, bail out
 +  */
 + if (acpi_disabled)
 + return -ENODEV;
 +
 + acpi_table_parse(ACPI_SIG_FADT, acpi_parse_fadt);
 +
 + /*
 +  * Process the Multiple APIC Description Table (MADT), if present
 +  */
 + acpi_process_madt();
 +
 + return 0;
 +}

Well, apart from having three init calls, one returning void and two
returning proper values (I do not understand why), I do not understand
why we need three calls in the first place. Why should we process the MADT
twice in two separate calls ? What is supposed to change in between that
prevents you from merging the two together ?

Thanks,
Lorenzo



Re: [PATCH 10/20] ARM64 / ACPI: Enumerate possible/present CPU set and map logical cpu id to APIC id

2014-01-22 Thread Lorenzo Pieralisi
On Fri, Jan 17, 2014 at 12:25:04PM +, Hanjun Guo wrote:

[...]

 +/* map logic cpu id to physical GIC id */
 +extern int arm_cpu_to_apicid[NR_CPUS];
 +#define cpu_physical_id(cpu) arm_cpu_to_apicid[cpu]

Sudeep already commented on this, please update it accordingly.

 +
  #else/* !CONFIG_ACPI */
  #define acpi_disabled 1  /* ACPI sometimes enabled on ARM */
  #define acpi_noirq 1 /* ACPI sometimes enabled on ARM */
 diff --git a/drivers/acpi/plat/arm-core.c b/drivers/acpi/plat/arm-core.c
 index 8ba3e6f..1d9b789 100644
 --- a/drivers/acpi/plat/arm-core.c
 +++ b/drivers/acpi/plat/arm-core.c
 @@ -31,6 +31,7 @@
  #include <linux/smp.h>
  
  #include <asm/pgtable.h>
 +#include <asm/cputype.h>
  
  /*
   * We never plan to use RSDT on arm/arm64 as its deprecated in spec but this
 @@ -52,6 +53,13 @@ EXPORT_SYMBOL(acpi_pci_disabled);
   */
  static u64 acpi_lapic_addr __initdata;

Is this variable actually needed ?

  
 +/* available_cpus here means enabled cpu in MADT */
 +static int available_cpus;

Ditto.

 +
 +/* Map logic cpu id to physical GIC id (physical CPU id). */
 +int arm_cpu_to_apicid[NR_CPUS] = { [0 ... NR_CPUS-1] = -1 };
 +static int boot_cpu_apic_id = -1;

Do we need all these variables ? I think we should reuse cpu_logical_map
data structures for that, it looks suspiciously familiar.

  #define BAD_MADT_ENTRY(entry, end) ( \
   (!entry) || (unsigned long)entry + sizeof(*entry) > end ||  \
   ((struct acpi_subtable_header *)entry)->length < sizeof(*entry))
 @@ -132,6 +140,55 @@ static int __init acpi_parse_madt(struct 
 acpi_table_header *table)
   * Please refer to chapter5.2.12.14/15 of ACPI 5.0
   */
  
 +/**
 + * acpi_register_gic_cpu_interface - register a gic cpu interface and
 + * generates a logic cpu number
 + * @id: gic cpu interface id to register
 + * @enabled: this cpu is enabled or not
 + *
 + * Returns the logic cpu number which maps to the gic cpu interface
 + */
 +static int acpi_register_gic_cpu_interface(int id, u8 enabled)
 +{
 + int cpu;
 +
 + if (id >= MAX_GIC_CPU_INTERFACE) {
 + pr_info(PREFIX "skipped apicid that is too big\n");
 + return -EINVAL;
 + }
 +
 + total_cpus++;
 + if (!enabled)
 + return -EINVAL;
 +
 + if (available_cpus >= NR_CPUS) {
 + pr_warn(PREFIX "NR_CPUS limit of %d reached,"
 + " Processor %d/0x%x ignored.\n", NR_CPUS, total_cpus, id);
 + return -EINVAL;
 + }

Hmm...what if you are missing the boot CPU ? It is a worthy check.
Have a look at smp_init_cpus(), it does not bail out on cpu >= NR_CPUS
because you do want to make sure that the DT contains the boot CPU
node. Same logic applies.
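
A rough userspace model of that logic, to make the point concrete. It is
not the smp_init_cpus() code; struct cpu_enum_result, enumerate_cpus() and
the NR_CPUS value of 4 are invented for the example. The key property is
that entries beyond the limit are skipped rather than ending the scan, so a
boot CPU listed late is still found.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 4		/* illustrative limit */
#define INVALID_HWID UINT64_MAX

struct cpu_enum_result {
	uint64_t logical_map[NR_CPUS];	/* logical cpu -> hardware id */
	unsigned int nr_mapped;		/* entries in use, boot CPU included */
	bool bootcpu_valid;		/* firmware listed the boot CPU */
};

/*
 * Walk every firmware entry, even past NR_CPUS, so that a boot CPU
 * listed late is still detected. boot_hwid is the id the boot CPU
 * reads from its own hardware registers.
 */
static void enumerate_cpus(const uint64_t *hwids, size_t n, uint64_t boot_hwid,
			   struct cpu_enum_result *res)
{
	unsigned int cpu = 1;	/* logical cpu 0 is reserved for the boot CPU */
	size_t i;

	assert(hwids && res);

	res->logical_map[0] = boot_hwid;
	for (i = 1; i < NR_CPUS; i++)
		res->logical_map[i] = INVALID_HWID;
	res->bootcpu_valid = false;

	for (i = 0; i < n; i++) {
		if (hwids[i] == boot_hwid) {
			res->bootcpu_valid = true;
			continue;
		}
		/* Over the limit: skip the entry but keep scanning. */
		if (cpu >= NR_CPUS)
			continue;
		res->logical_map[cpu++] = hwids[i];
	}
	res->nr_mapped = cpu;
}
```

A caller would treat bootcpu_valid == false as a firmware error, whatever
the number of secondary CPUs found.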

 +
 + available_cpus++;
 +

Is available_cpus != num_possible_cpus() ? It does not look like it, hence
available_cpus can go.

 + /* allocate a logic cpu id for the new comer */
 + if (boot_cpu_apic_id == id) {
 + /*
 +  * boot_cpu_init() already hold bit 0 in cpu_present_mask
 +  * for BSP, no need to allocte again.
 +  */
 + cpu = 0;
 + } else {
 + cpu = cpumask_next_zero(-1, cpu_present_mask);
 + }
 +
 + /* map the logic cpu id to APIC id */
 + arm_cpu_to_apicid[cpu] = id;
 +
 + set_cpu_present(cpu, true);
 + set_cpu_possible(cpu, true);

This is getting nasty. Before adding this patch and previous ones we
need to put in place a method for the kernel to make a definite choice between
ACPI and DT and stick to that. We can't initialize the logical map twice
(which will happen if your DT has valid cpu nodes and a chosen node pointing
to proper ACPI tables) or even having some entries initialized from DT and
others by ACPI. It is a big fat no-no, please update the series accordingly.

 +
 + return cpu;
 +}
 +
  static int __init
  acpi_parse_gic(struct acpi_subtable_header *header, const unsigned long end)
  {
 @@ -144,6 +201,16 @@ acpi_parse_gic(struct acpi_subtable_header *header, 
 const unsigned long end)
  
   acpi_table_print_madt_entry(header);
  
 + /*
 +  * We need to register disabled CPU as well to permit
 +  * counting disabled CPUs. This allows us to size
 +  * cpus_possible_map more accurately, to permit
 +  * to not preallocating memory for all NR_CPUS
 +  * when we use CPU hotplug.
 +  */
 + acpi_register_gic_cpu_interface(processor->gic_id,
 + processor->flags & ACPI_MADT_ENABLED);
 +
   return 0;
  }
  
 @@ -186,6 +253,19 @@ static int __init acpi_parse_madt_gic_entries(void)
   return count;
   }
  
 +#ifdef CONFIG_SMP
 + if (available_cpus == 0) {
 + pr_info(PREFIX "Found 0 CPUs; assuming 1\n");
 + arm_cpu_to_apicid[available_cpus] =
 + read_cpuid_mpidr() & MPIDR_HWID_BITMASK;
 + available_cpus = 1; /* We've got at least one of these */
 + }

I'd rather 

Re: [Linaro-acpi] [PATCH 04/20] ARM64 / ACPI: Introduce arm_core.c and its related head file

2014-01-24 Thread Lorenzo Pieralisi
Hi Hanjun,

On Fri, Jan 24, 2014 at 09:09:40AM +, Hanjun Guo wrote:
 On 2014-01-23 23:56, Tomasz Nowicki wrote:
  Hi Lorenzo,
 
  On 22.01.2014 12:54, Lorenzo Pieralisi wrote:
  On Fri, Jan 17, 2014 at 12:24:58PM +, Hanjun Guo wrote:
 
  [...]
 
  diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
  index bd9bbd0..2210353 100644
  --- a/arch/arm64/kernel/setup.c
  +++ b/arch/arm64/kernel/setup.c
  @@ -41,6 +41,7 @@
  #include <linux/memblock.h>
  #include <linux/of_fdt.h>
  #include <linux/of_platform.h>
  +#include <linux/acpi.h>
 
  #include <asm/cputype.h>
  #include <asm/elf.h>
  @@ -225,6 +226,11 @@ void __init setup_arch(char **cmdline_p)
 
  arm64_memblock_init();
 
  + /* Parse the ACPI tables for possible boot-time configuration */
  + acpi_boot_table_init();
  + early_acpi_boot_init();
  + acpi_boot_init();
  +
  paging_init();
 
  Can I ask you please why we need to parse ACPI tables before
  paging_init() ?
  This is for future usage and because of couple of reasons. Mainly SRAT 
  table parsing should be done (before paging_init()) for proper NUMA 
  initialization and then paging_init().

Thank you for the explanation. I still have some questions:

1) What are the other reasons ?
2) NUMA is not supported at the moment and I reckon SRAT needs updating
   since the only way to associate a CPU to a memory node is through
   a local APIC id that is non-existent on ARM and at least deserves a
   new entry.

I am still not sure that providing a hook for parsing the ACPI tables before
paging_init() should be the main focus at the moment. It is probably best, as
we've mentioned many times, to make sure that the infrastructure to detect
ACPI vs DT at run-time is in place in the kernel and allows us to boot
either with ACPI or DT, in a mutually exclusive way (same binary kernel
supporting both, runtime detection/decision on what data to use, ACPI tables
vs DT nodes, detection made once and for all and NOT on a per-property basis).
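
As a minimal sketch of "detection made once", assuming nothing about how
the kernel will actually implement it: the enum, the variable and both
helpers below are hypothetical names, and the policy (ACPI wins when valid
tables are present) is only an assumption for the example.

```c
#include <assert.h>
#include <stdbool.h>

enum fw_source { FW_UNDECIDED, FW_DT, FW_ACPI };

static enum fw_source fw_source = FW_UNDECIDED;

/*
 * Decide once which description the kernel boots with. Later calls
 * return the first decision, whatever their argument.
 */
static enum fw_source fw_source_select(bool acpi_tables_valid)
{
	if (fw_source == FW_UNDECIDED)
		fw_source = acpi_tables_valid ? FW_ACPI : FW_DT;
	return fw_source;
}

/* Consumers ask which source is in use; they never probe both. */
static bool fw_use_acpi(void)
{
	assert(fw_source != FW_UNDECIDED);
	return fw_source == FW_ACPI;
}
```

With this shape, code that fills in the logical map checks fw_use_acpi()
and takes exactly one path, so entries can never come from both sources.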

I will have a look at SRAT and how it is used on x86, and get back to you on
this.

[...]

  + * acpi_boot_table_init() and acpi_boot_init()
  + * called from setup_arch(), always.
  + * 1. checksums all tables
  + * 2. enumerates lapics
  + * 3. enumerates io-apics
  + *
  + * acpi_table_init() is separated to allow reading SRAT without
  + * other side effects.
  + */
  +void __init acpi_boot_table_init(void)
  +{
  + /*
  + * If acpi_disabled, bail out
  + */
  + if (acpi_disabled)
  + return;
  +
  + /*
  + * Initialize the ACPI boot-time table parser.
  + */
  + if (acpi_table_init()) {
  + disable_acpi();
  + return;
  + }
  +}
  +
  +int __init early_acpi_boot_init(void)
  +{
  + /*
  + * If acpi_disabled, bail out
  + */
  + if (acpi_disabled)
  + return -ENODEV;
  +
  + /*
  + * Process the Multiple APIC Description Table (MADT), if present
  + */
  + early_acpi_process_madt();
  +
  + return 0;
  +}
  +
  +int __init acpi_boot_init(void)
  +{
  + /*
  + * If acpi_disabled, bail out
  + */
  + if (acpi_disabled)
  + return -ENODEV;
  +
  + acpi_table_parse(ACPI_SIG_FADT, acpi_parse_fadt);
  +
  + /*
  + * Process the Multiple APIC Description Table (MADT), if present
  + */
  + acpi_process_madt();
  +
  + return 0;
  +}
 
  Well, apart from having three init calls, one returning void and two
  returning proper values, do not understand why, and do not understand
  why we need three calls in the first place...why should we process MADT
  twice in two separate calls ? What is supposed to change in between that
  prevents you from merging the two together ?
 
 Thanks for pointing this out. I can merge acpi_boot_table_init() and
 early_acpi_boot_init() together, but can not merge early_acpi_boot_init()
 and acpi_boot_init() together.
 
 early_acpi_boot_init() and acpi_boot_init() were separated intentionally for
 memory hotplug reasons. Memory allocated at this stage cannot be migrated
 and causes memory hot-remove to fail. In order to keep memory allocated
 on the base node (generally NUMA node 0 in the system) at boot stage, we should
 parse the SRAT first, before CPUs are enumerated. Does this make sense to you?

Are you parsing the SRAT in this series to get memory info, or is memory
still initialized by DT even when the system is supposed to be booted with ACPI
(ie there is a valid ACPI root table) ?

I have a hunch the latter is what's happening (and that's wrong, because
memory information when kernel is booted through ACPI must be retrieved
from UEFI - at least that's what has been defined), so I still see an early
hook to initialize ACPI tables before paging_init() that has no use as the
current patchset stands, please correct me if I am wrong.

Thank you,
Lorenzo



Re: [PATCH 10/20] ARM64 / ACPI: Enumerate possible/present CPU set and map logical cpu id to APIC id

2014-01-24 Thread Lorenzo Pieralisi
On Fri, Jan 24, 2014 at 02:37:28PM +, Hanjun Guo wrote:
 Hi Lorenzo,
 
 On 2014-01-22 23:53, Lorenzo Pieralisi wrote:
  On Fri, Jan 17, 2014 at 12:25:04PM +, Hanjun Guo wrote:
 
  [...]
 
  +/* map logic cpu id to physical GIC id */
  +extern int arm_cpu_to_apicid[NR_CPUS];
  +#define cpu_physical_id(cpu) arm_cpu_to_apicid[cpu]
  Sudeep already commented on this, please update it accordingly.
 
 Actually after some careful review of the ACPI code, I can't
 update it as MPIDR here.
 
 MPIDR can be the ACPI uid and NOT the GIC id, the mapping
 of them are something like this in ACPI driver now:
 
 logic cpu id <---> APIC Id (GIC ID) <---> ACPI uid (MPIDR on ARM)
 but not logic cpu id <---> ACPI uid directly, you can refer to
 the code of processor_core.c
 
 So here I can only map GIC id to logical cpu id.

On ARM platforms GIC CPU IF id is probeable, you do not need to parse
it (ie it is not information that you will find in DT). Please have a look
at drivers/irqchip/irq-gic.c.

We have to understand what's really required and when in ACPI, or to put it
differently, why cpu_physical_id(cpu) is required and at what time at
boot, I will have a look on my side too.

 
  +
#else/* !CONFIG_ACPI */
#define acpi_disabled 1  /* ACPI sometimes enabled on ARM */
#define acpi_noirq 1 /* ACPI sometimes enabled on ARM */
  diff --git a/drivers/acpi/plat/arm-core.c b/drivers/acpi/plat/arm-core.c
  index 8ba3e6f..1d9b789 100644
  --- a/drivers/acpi/plat/arm-core.c
  +++ b/drivers/acpi/plat/arm-core.c
  @@ -31,6 +31,7 @@
#include <linux/smp.h>

#include <asm/pgtable.h>
  +#include <asm/cputype.h>

/*
 * We never plan to use RSDT on arm/arm64 as its deprecated in spec but 
  this
  @@ -52,6 +53,13 @@ EXPORT_SYMBOL(acpi_pci_disabled);
 */
static u64 acpi_lapic_addr __initdata;
  Is this variable actually needed ?
 
 Yes, needed for GIC initialization.
 
 

  +/* available_cpus here means enabled cpu in MADT */
  +static int available_cpus;
  Ditto.
 
  +
  +/* Map logic cpu id to physical GIC id (physical CPU id). */
  +int arm_cpu_to_apicid[NR_CPUS] = { [0 ... NR_CPUS-1] = -1 };
  +static int boot_cpu_apic_id = -1;
  Do we need all these variables ? I think we should reuse cpu_logical_map
  data structures for that, it looks suspiciously familiar.
 
 MPIDR is the different part. if we use MPIDR as GIC id, i think
 we can reuse cpu_logical_map, but Sudeep suggested not
 use MPIDR as GIC id.

It is not about *reusing* cpu_logical_map, it is about setting it up
properly. cpu_logical_map must be initialized by ACPI for the spin table
method to work properly (and PSCI too).

And yes, cpu_physical_id(cpu) is expected to be the GIC CPU IF id on
ARM, at least it looks like it, I had a look too. But this does not change
anything as far as cpu_logical_map is concerned: it must contain a list
of MPIDRs in the system and must be retrieved via ACPI, not DT CPU nodes,
when ACPI is used for booting.
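
In other words the two ids can live side by side, both indexed by logical
CPU. A userspace sketch of that arrangement follows; cpu_gic_id[],
cpu_map_set() and cpu_from_mpidr() are hypothetical names, NR_CPUS is set
to 4 for the example, and the mask value is the arm64 MPIDR affinity mask.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4
#define INVALID_HWID UINT64_MAX
/* Affinity fields of the arm64 MPIDR_EL1 (Aff3, Aff2, Aff1, Aff0). */
#define MPIDR_HWID_BITMASK 0xff00ffffffULL

/* logical cpu -> MPIDR affinity bits, as needed by spin table and PSCI */
static uint64_t cpu_logical_map[NR_CPUS] = {
	INVALID_HWID, INVALID_HWID, INVALID_HWID, INVALID_HWID
};
/* logical cpu -> GIC CPU interface id, kept separately (hypothetical) */
static int cpu_gic_id[NR_CPUS] = { -1, -1, -1, -1 };

static void cpu_map_set(unsigned int cpu, uint64_t mpidr, int gic_id)
{
	assert(cpu < NR_CPUS);
	cpu_logical_map[cpu] = mpidr & MPIDR_HWID_BITMASK;
	cpu_gic_id[cpu] = gic_id;
}

/* Reverse lookup: logical cpu for a given MPIDR, or -1 if unknown. */
static int cpu_from_mpidr(uint64_t mpidr)
{
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_logical_map[cpu] == (mpidr & MPIDR_HWID_BITMASK))
			return (int)cpu;
	return -1;
}
```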

I will have a further look, since this discrepancy is annoying.

[...]

  +
  +  available_cpus++;
  +
  Is available_cpus != num_possible_cpus() ? It does not look like hence
  available_cpus can go.
 
 No, possible cpus include available cpus and disabled cpus
 this is useful for ACPI based CPU hot-plug features.
 
 
  +  /* allocate a logic cpu id for the new comer */
  +  if (boot_cpu_apic_id == id) {
  +  /*
  +   * boot_cpu_init() already hold bit 0 in cpu_present_mask
  +   * for BSP, no need to allocte again.
  +   */
  +  cpu = 0;
  +  } else {
  +  cpu = cpumask_next_zero(-1, cpu_present_mask);
  +  }
  +
  +  /* map the logic cpu id to APIC id */
  +  arm_cpu_to_apicid[cpu] = id;
  +
  +  set_cpu_present(cpu, true);
  +  set_cpu_possible(cpu, true);
  This is getting nasty. Before adding this patch and previous ones we
  need to put in place a method for the kernel to make a definite choice 
  between
  ACPI and DT and stick to that. We can't initialize the logical map twice
  (which will happen if your DT has valid cpu nodes and a chosen node pointing
  to proper ACPI tables) or even having some entries initialized from DT and
  others by ACPI. It is a big fat no-no, please update the series accordingly.
 
 really good catch here :)
 so the problem here is that should we use both ACPI and DT in one system?
 
 
 
  +
  +  return cpu;
  +}
  +
static int __init
acpi_parse_gic(struct acpi_subtable_header *header, const unsigned long 
  end)
{
  @@ -144,6 +201,16 @@ acpi_parse_gic(struct acpi_subtable_header *header, 
  const unsigned long end)

 acpi_table_print_madt_entry(header);

  +  /*
  +   * We need to register disabled CPU as well to permit
  +   * counting disabled CPUs. This allows us to size
  +   * cpus_possible_map more accurately, to permit
  +   * to not preallocating memory for all NR_CPUS
  +   * when we use CPU hotplug.
  +   */
  +  acpi_register_gic_cpu_interface(processor->gic_id

Re: [PATCH v4 4/6] devicetree: bindings: Document Krait L1/L2 EDAC

2014-01-07 Thread Lorenzo Pieralisi
On Mon, Dec 30, 2013 at 08:14:15PM +, Stephen Boyd wrote:
 The Krait L1/L2 error reporting device is made up of two
 interrupts, one per-CPU interrupt for the L1 caches and one
 interrupt for the L2 cache.
 
 Cc: Lorenzo Pieralisi lorenzo.pieral...@arm.com
 Cc: Mark Rutland mark.rutl...@arm.com
 Cc: Kumar Gala ga...@codeaurora.org
 Cc: devicet...@vger.kernel.org
 Signed-off-by: Stephen Boyd sb...@codeaurora.org
 ---
  Documentation/devicetree/bindings/arm/cpus.txt | 72 
 ++
  1 file changed, 72 insertions(+)
 
 diff --git a/Documentation/devicetree/bindings/arm/cpus.txt 
 b/Documentation/devicetree/bindings/arm/cpus.txt
 index 9130435..54de94b 100644
 --- a/Documentation/devicetree/bindings/arm/cpus.txt
 +++ b/Documentation/devicetree/bindings/arm/cpus.txt
 @@ -191,6 +191,35 @@ nodes to be present and contain the properties described 
 below.
 property identifying a 64-bit zero-initialised
 memory location.
  
 + - interrupts
 + Usage: required for cpus with compatible string qcom,krait.
 + Value type: prop-encoded-array
 + Definition: L1/CPU error interrupt
 +
 + - next-level-cache
 + Usage: optional
 + Value type: phandle
 + Definition: phandle pointing to the next level cache
 +
 +- cache node

Not sure this binding (cache node) belongs in cpus.txt

I am working on defining cache bindings for ARM within the C-state
standardization effort:

http://lists.infradead.org/pipermail/linux-arm-kernel/2013-December/215543.html

 +
 + Description: Describes a cache in an ARM based system
 +
 + - compatible
 + Usage: required
 + Value type: string
 + Definition: shall contain at least cache

It is a bit vague, can't we just follow the ePAPR compatible definition ?
See posting above.

 +
 + - cache-level
 + Usage: required
 + Value type: u32
 + Definition: level in the cache heirachy

hierarchy. I have a problem with the cache level definition, and in
particular the numbering, ie what the level number represents. If we
mean the cache level seen through the CLIDR and co., it is hard to use
it for shared caches since the level seen by different CPUs can actually
be different or, to put it differently, the level number might not be unique
for a shared cache. I need to think about a proper way to sort this out.

Lorenzo



Re: [PATCH v4 4/6] devicetree: bindings: Document Krait L1/L2 EDAC

2014-01-08 Thread Lorenzo Pieralisi
On Tue, Jan 07, 2014 at 08:12:39PM +, Stephen Boyd wrote:
 On 01/07, Lorenzo Pieralisi wrote:
  
  Not sure this binding (cache node) belongs in cpus.txt
  
  I am working on defining cache bindings for ARM within the C-state
  standardization effort:
  
  http://lists.infradead.org/pipermail/linux-arm-kernel/2013-December/215543.html
 
 Thanks I'll take a look.
 
  
   +
   + Description: Describes a cache in an ARM based system
   +
   + - compatible
   + Usage: required
   + Value type: string
   + Definition: shall contain at least cache
  
  It is a bit vague, can't we just follow the ePAPR compatible definition ?
  See posting above.
 
 Hm.. I thought this did follow the ePAPR spec. I see 'compatible,
 required, string, A standard property. The value shall include
 the string "cache".' Looks the same?

Sorry, my bad, you are right.

 And I see 'cache-level, required, u32, Specifies the level in the
 cache hierarchy. For example, a level 2 cache has a value of
 2.'

We need to define it properly for ARM, I am not sure we can use level
as defined in CLIDR, I need to think more about this.

  
   +
   + - cache-level
   + Usage: required
   + Value type: u32
   + Definition: level in the cache heirachy
  
  hierarchy.
 
 Thanks.
 
  I have a problem with the cache level definition, and in
  particular the numbering, ie what the level number represents. If we
  mean the cache level seen through the CLIDR and co., it is hard to use
  it for shared caches since the level seen by different CPUs can actually
  be different, or put it differently the level number might not be unique for
  a shared cache. I need to think about a proper way to sort this out.
  
 
 Ok. I don't even use this property in my driver. All I really
 need is the phandle from cpus pointing to the L2 and the
 interrupts property in the L2 node.
 
 How do you want to proceed here? If your cache binding goes
 through I would just need to add the interrupts part. Or you
 could even add that part in the same patch, you could have my
 signed-off-by for that.

Ok, I will try to update the bindings with the interrupt part and copy
you in, even though the level definition worries me a bit. It is an
important property for power management and I need to find a proper
solution before the bindings can get accepted. Basically the problem is:
if different CPUs can see a cache at different levels as defined in the
CLIDR, we cannot describe a cache with a single cache level or, to put it
differently, level cannot represent the value in the CLIDR, hence we
need to describe it differently.
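
The problem can be stated as a small check. The sketch below is a toy
model, not a proposed binding: struct cache, its per-CPU level array and
the NR_CPUS value are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

/* Toy model: the level at which each CPU sees a cache, 0 if not reachable. */
struct cache {
	const char *name;
	unsigned int level_seen_by[NR_CPUS];
};

/*
 * A single cache-level value can describe the cache only if every
 * CPU that reaches it sees it at the same level.
 */
static bool cache_level_is_unique(const struct cache *c, unsigned int *level)
{
	unsigned int seen = 0;
	size_t i;

	assert(c && level);

	for (i = 0; i < NR_CPUS; i++) {
		unsigned int l = c->level_seen_by[i];

		if (!l)
			continue;
		if (seen && l != seen)
			return false;
		seen = l;
	}
	*level = seen;
	return seen != 0;
}
```

A cache seen as level 2 by two CPUs and as level 3 by two others fails the
check, which is the case a single cache-level property cannot express.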

Lorenzo



Re: [PATCH v5 2/4] devicetree: bindings: Document Krait CPU/L1 EDAC

2014-01-15 Thread Lorenzo Pieralisi
On Tue, Jan 14, 2014 at 09:30:32PM +, Stephen Boyd wrote:
 The Krait CPU/L1 error reporting device is made up a per-CPU
 interrupt. While we're here, document the next-level-cache
 property that's used by the Krait EDAC driver.
 
 Cc: Lorenzo Pieralisi lorenzo.pieral...@arm.com
 Cc: Mark Rutland mark.rutl...@arm.com
 Cc: Kumar Gala ga...@codeaurora.org
 Cc: devicet...@vger.kernel.org
 Signed-off-by: Stephen Boyd sb...@codeaurora.org
 ---
  Documentation/devicetree/bindings/arm/cpus.txt | 52 
 ++
  1 file changed, 52 insertions(+)
 
 diff --git a/Documentation/devicetree/bindings/arm/cpus.txt 
 b/Documentation/devicetree/bindings/arm/cpus.txt
 index 91304353eea4..c332b5168456 100644
 --- a/Documentation/devicetree/bindings/arm/cpus.txt
 +++ b/Documentation/devicetree/bindings/arm/cpus.txt
 @@ -191,6 +191,16 @@ nodes to be present and contain the properties described 
 below.
 property identifying a 64-bit zero-initialised
 memory location.
  
 + - interrupts
 + Usage: required for cpus with compatible string qcom,krait.
 + Value type: prop-encoded-array
 + Definition: L1/CPU error interrupt

I reckon you want this property to belong in the cpus node (example below),
not in cpu nodes, right ?

Are you relying on a platform device to be created for /cpus node in
order for this series to work ? I guess that's why you want the
interrupts property to be defined in /cpus so that it becomes a platform
device resource (and you also add a compatible property in /cpus that is
missing in these bindings).

 +
 + - next-level-cache
 + Usage: optional
 + Value type: phandle
 + Definition: phandle pointing to the next level cache
 +
  Example 1 (dual-cluster big.LITTLE system 32-bit):
  
   cpus {
 @@ -382,3 +392,45 @@ cpus {
   cpu-release-addr = 0 0x2000;
   };
  };
 +
 +
 +Example 5 (Krait 32-bit system):
 +
 +cpus {
 + #address-cells = <1>;
 + #size-cells = <0>;
 + interrupts = <1 9 0xf04>;

In patch 4 you also add a compatible property here, and that's not documented,
and honestly I do not think that's acceptable either. I guess you want a
compatible property here to match the platform driver, right ?

Thank you,
Lorenzo

 +
 + cpu@0 {
 + device_type = "cpu";
 + compatible = "qcom,krait";
 + reg = <0>;
 + next-level-cache = <&L2>;
 + };
 +
 + cpu@1 {
 + device_type = "cpu";
 + compatible = "qcom,krait";
 + reg = <1>;
 + next-level-cache = <&L2>;
 + };
 +
 + cpu@2 {
 + device_type = "cpu";
 + compatible = "qcom,krait";
 + reg = <2>;
 + next-level-cache = <&L2>;
 + };
 +
 + cpu@3 {
 + device_type = "cpu";
 + compatible = "qcom,krait";
 + reg = <3>;
 + next-level-cache = <&L2>;
 + };
 +
 + L2: l2-cache {
 + compatible = "cache";
 + interrupts = <0 2 0x4>;
 + };
 +};
 -- 
 The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
 hosted by The Linux Foundation
 
 



Re: [PATCH v5 2/4] devicetree: bindings: Document Krait CPU/L1 EDAC

2014-01-16 Thread Lorenzo Pieralisi
On Thu, Jan 16, 2014 at 01:38:40AM +, Stephen Boyd wrote:
 On 01/15, Stephen Boyd wrote:
  
  Ah sorry, I forgot to put the compatible property here like in
  the dts change. I'll do that in the next revision. Yes we need a
  compatible property here to match the platform driver.
  
 
 This is the replacement patch
 
 -8--
 From: Stephen Boyd sb...@codeaurora.org
 Subject: [PATCH v9] devicetree: bindings: Document Krait CPU/L1 EDAC
 
 The Krait CPU/L1 error reporting device is made up a per-CPU
 interrupt. While we're here, document the next-level-cache
 property that's used by the Krait EDAC driver.
 
 Cc: Lorenzo Pieralisi lorenzo.pieral...@arm.com
 Cc: Mark Rutland mark.rutl...@arm.com
 Cc: Kumar Gala ga...@codeaurora.org
 Cc: devicet...@vger.kernel.org
 Signed-off-by: Stephen Boyd sb...@codeaurora.org
 ---
  Documentation/devicetree/bindings/arm/cpus.txt | 58 
 ++
  1 file changed, 58 insertions(+)
 
 diff --git a/Documentation/devicetree/bindings/arm/cpus.txt 
 b/Documentation/devicetree/bindings/arm/cpus.txt
 index 91304353eea4..03a529e791c4 100644
 --- a/Documentation/devicetree/bindings/arm/cpus.txt
 +++ b/Documentation/devicetree/bindings/arm/cpus.txt
 @@ -62,6 +62,20 @@ nodes to be present and contain the properties described 
 below.
   Value type: u32
   Definition: must be set to 0
  
 + - compatible
 + Usage: optional
 + Value type: string
 + Definition: should be one of the compatible strings listed
 + in the cpu node compatible property. This property
 + shall only be present if all the cpu nodes have the
 + same compatible property.

Do we really want to do that ? I am not sure. A cpus node is supposed to
be a container node, we should not define this binding just because we
know the kernel creates a platform device for it then.

interrupts is a cpu node property and I think it should be kept as such.

I know it will be duplicated and I know you can't rely on a platform
device for probing (since if I am not mistaken, removing a compatible
string from cpus prevents its platform device creation), but that's an issue
related to how the kernel works, you should not define DT bindings to solve
that IMHO.

Lorenzo

 +
 + - interrupts
 + Usage: required when node contains cpus with compatible
 +string qcom,krait.
 + Value type: prop-encoded-array
 + Definition: L1/CPU error interrupt
 +
  - cpu node
  
   Description: Describes a CPU in an ARM based system
 @@ -191,6 +205,11 @@ nodes to be present and contain the properties described 
 below.
 property identifying a 64-bit zero-initialised
 memory location.
  
 + - next-level-cache
 + Usage: optional
 + Value type: phandle
 + Definition: phandle pointing to the next level cache
 +
  Example 1 (dual-cluster big.LITTLE system 32-bit):
  
   cpus {
 @@ -382,3 +401,42 @@ cpus {
   cpu-release-addr = 0 0x2000;
   };
  };
 +
 +
 +Example 5 (Krait 32-bit system):
 +
 +cpus {
 + #address-cells = <1>;
 + #size-cells = <0>;
 + interrupts = <1 9 0xf04>;
 + compatible = "qcom,krait";
 +
 + cpu@0 {
 + device_type = "cpu";
 + reg = <0>;
 + next-level-cache = <&L2>;
 + };
 +
 + cpu@1 {
 + device_type = "cpu";
 + reg = <1>;
 + next-level-cache = <&L2>;
 + };
 +
 + cpu@2 {
 + device_type = "cpu";
 + reg = <2>;
 + next-level-cache = <&L2>;
 + };
 +
 + cpu@3 {
 + device_type = "cpu";
 + reg = <3>;
 + next-level-cache = <&L2>;
 + };
 +
 + L2: l2-cache {
 + compatible = "cache";
 + interrupts = <0 2 0x4>;
 + };
 +};
 -- 
 Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
 hosted by The Linux Foundation
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC 04/10] base: power: Add generic OF-based power domain look-up

2014-01-16 Thread Lorenzo Pieralisi
Hi Tomasz,

thank you for posting this series. I would like to use the DT bindings
for power domains in the bindings for C-states on ARM:

http://comments.gmane.org/gmane.linux.power-management.general/41012

and in particular link a given C-state to a given power domain so that the
kernel will have a way to actually check what devices are lost upon C-state
entry (and by devices I also mean CPU peripherals like PMUs, GIC CPU IF,
caches and possibly cpus, all of them already represented with DT nodes).

I have a remark:

-  Can we group device nodes under a single power-domain-parent so that
   all devices defined under that parent won't have to re-define a
   power-domain property (a property like interrupt-parent, so to speak) ?

What do you think ?

Thanks,
Lorenzo

On Sat, Jan 11, 2014 at 07:42:46PM +, Tomasz Figa wrote:
 This patch introduces generic code to perform power domain look-up using
 device tree and automatically bind devices to their power domains.
 Generic device tree binding is introduced to specify power domains of
 devices in their device tree nodes.
 
 Backwards compatibility with legacy Samsung-specific power domain
 bindings is provided, but for now the new code is not compiled when
 CONFIG_ARCH_EXYNOS is selected to avoid collision with legacy code. This
 will change as soon as Exynos power domain code gets converted to use
 the generic framework in further patch.
 
 Signed-off-by: Tomasz Figa tomasz.f...@gmail.com
 ---
  .../devicetree/bindings/power/power_domain.txt |  51 
  drivers/base/power/domain.c| 339 
 +
  include/linux/pm_domain.h  |  34 +++
  kernel/power/Kconfig   |   4 +
  4 files changed, 428 insertions(+)
  create mode 100644 Documentation/devicetree/bindings/power/power_domain.txt
 
 diff --git a/Documentation/devicetree/bindings/power/power_domain.txt 
 b/Documentation/devicetree/bindings/power/power_domain.txt
 new file mode 100644
 index 000..93be5d9
 --- /dev/null
 +++ b/Documentation/devicetree/bindings/power/power_domain.txt
 @@ -0,0 +1,51 @@
 +* Generic power domains
 +
 +System on chip designs are often divided into multiple power domains that
 +can be used for power gating of selected IP blocks for power saving by
 +reduced leakage current.
 +
 +This device tree binding can be used to bind power domain consumer devices
 +with their power domains provided by power domain providers. A power domain
 +provider can be represented by any node in the device tree and can provide
 +one or more power domains. A consumer node can refer to the provider by
 +a phandle and a set of phandle arguments (so called power domain specifier)
 +of length specified by #power-domain-cells property in the power domain
 +provider node.
 +
 +==Power domain providers==
 +
 +Required properties:
 + - #power-domain-cells : Number of cells in a power domain specifier;
 +   Typically 0 for nodes representing a single power domain and 1 for nodes
 +   providing multiple power domains (e.g. power controllers), but can be
 +   any value as specified by device tree binding documentation of particular
 +   provider.
 +
 +Example:
 +
 +   power: power-controller@1234 {
 +   compatible = "foo,power-controller";
 +   reg = <0x1234 0x1000>;
 +   #power-domain-cells = <1>;
 +   };
 +
 +The node above defines a power controller that is a power domain provider
 +and expects one cell as its phandle argument.
 +
 +==Power domain consumers==
 +
 +Required properties:
 + - power-domain : A phandle and power domain specifier as defined by bindings
 +  of power controller specified by phandle.
 +
 +Example:
 +
 +   leaky-device@1235 {
 +   compatible = "foo,i-leak-current";
 +   reg = <0x1235 0x1000>;
 +   power-domain = <&power 0>;
 +   };
 +
 +The node above defines a typical power domain consumer device, which is 
 located
 +inside power domain with index 0 of power controller represented by node with
 +label power.
 diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
 index bfb8955..6d47498 100644
 --- a/drivers/base/power/domain.c
 +++ b/drivers/base/power/domain.c
 @@ -3,12 +3,16 @@
   *
   * Copyright (C) 2011 Rafael J. Wysocki r...@sisk.pl, Renesas Electronics 
 Corp.
   *
 + * Support for Device Tree based power domain providers:
 + * Copyright (C) 2014 Tomasz Figa tomasz.f...@gmail.com
 + *
   * This file is released under the GPLv2.
   */
 
  #include <linux/init.h>
  #include <linux/kernel.h>
  #include <linux/io.h>
 +#include <linux/platform_device.h>
  #include <linux/pm_runtime.h>
  #include <linux/pm_domain.h>
  #include <linux/pm_qos.h>
 @@ -2177,3 +2181,338 @@ void pm_genpd_init(struct generic_pm_domain *genpd,
 list_add(&genpd->gpd_list_node, &gpd_list);
 mutex_unlock(&gpd_list_lock);
  }
 +
 +#ifdef CONFIG_PM_GENERIC_DOMAINS_OF
 +/*
 + * DEVICE TREE BASED 

Re: [PATCH v5 2/4] devicetree: bindings: Document Krait CPU/L1 EDAC

2014-01-16 Thread Lorenzo Pieralisi
On Thu, Jan 16, 2014 at 06:05:05PM +, Stephen Boyd wrote:
 On 01/16, Lorenzo Pieralisi wrote:
  On Thu, Jan 16, 2014 at 01:38:40AM +, Stephen Boyd wrote:
   On 01/15, Stephen Boyd wrote:

Ah sorry, I forgot to put the compatible property here like in
the dts change. I'll do that in the next revision. Yes we need a
compatible property here to match the platform driver.

   
   This is the replacement patch
   
   -8--
   From: Stephen Boyd sb...@codeaurora.org
   Subject: [PATCH v9] devicetree: bindings: Document Krait CPU/L1 EDAC
   
   The Krait CPU/L1 error reporting device is made up of a per-CPU
   interrupt. While we're here, document the next-level-cache
   property that's used by the Krait EDAC driver.
   
   Cc: Lorenzo Pieralisi lorenzo.pieral...@arm.com
   Cc: Mark Rutland mark.rutl...@arm.com
   Cc: Kumar Gala ga...@codeaurora.org
   Cc: devicet...@vger.kernel.org
   Signed-off-by: Stephen Boyd sb...@codeaurora.org
   ---
Documentation/devicetree/bindings/arm/cpus.txt | 58 
   ++
1 file changed, 58 insertions(+)
   
   diff --git a/Documentation/devicetree/bindings/arm/cpus.txt 
   b/Documentation/devicetree/bindings/arm/cpus.txt
   index 91304353eea4..03a529e791c4 100644
   --- a/Documentation/devicetree/bindings/arm/cpus.txt
   +++ b/Documentation/devicetree/bindings/arm/cpus.txt
   @@ -62,6 +62,20 @@ nodes to be present and contain the properties 
   described below.
 Value type: u32
 Definition: must be set to 0

   + - compatible
   + Usage: optional
   + Value type: string
   + Definition: should be one of the compatible strings listed
   + in the cpu node compatible property. This property
   + shall only be present if all the cpu nodes have the
   + same compatible property.
  
  Do we really want to do that ? I am not sure. A cpus node is supposed to
  be a container node, we should not define this binding just because we
  know the kernel creates a platform device for it then.
 
 This is just copying more of the ePAPR spec into this document.
 It just so happens that having a compatible field here allows a
 platform device to be created. I don't see why that's a problem.

I do not see why you cannot define a node like pmu or arch-timer and stick
a compatible property in there. cpus node does not represent a device, and
must not be created as a platform device, that's my opinion.

What would you do for big.LITTLE systems ? We are going to create two
cpus node because we need two platform devices ? I really think there
must be a better way to implement this, but I will let DT maintainers
make a decision.

  interrupts is a cpu node property and I think it should be kept as such.
  
  I know it will be duplicated and I know you can't rely on a platform
  device for probing (since if I am not mistaken, removing a compatible
  string from cpus prevents its platform device creation), but that's an issue
  related to how the kernel works, you should not define DT bindings to solve
  that IMHO.
 
 The interrupts property is also common for all cpus so it seems
 fine to collapse the value down into a PPI specifier indicating
 that all CPUs get the interrupt, similar to how we compress the
 information about the compatible string.

I think it is nicer to create a device node (as I said, like a pmu or an
arch-timer) and define interrupts there along with a proper compatible
property. This would serve the same purpose without adding properties in
the cpus node.

cpu-edac {
	compatible = "qcom,cpu-edac";
	interrupts = <...>;
};

Thanks,
Lorenzo



Re: [PATCH v5 2/4] devicetree: bindings: Document Krait CPU/L1 EDAC

2014-01-17 Thread Lorenzo Pieralisi
On Thu, Jan 16, 2014 at 07:26:17PM +, Stephen Boyd wrote:
 On 01/16, Lorenzo Pieralisi wrote:
  On Thu, Jan 16, 2014 at 06:05:05PM +, Stephen Boyd wrote:
   On 01/16, Lorenzo Pieralisi wrote:
Do we really want to do that ? I am not sure. A cpus node is supposed to
be a container node, we should not define this binding just because we
know the kernel creates a platform device for it then.
   
   This is just copying more of the ePAPR spec into this document.
   It just so happens that having a compatible field here allows a
   platform device to be created. I don't see why that's a problem.
  
  I do not see why you cannot define a node like pmu or arch-timer and stick
  a compatible property in there. cpus node does not represent a device, and
  must not be created as a platform device, that's my opinion.
  
 
 I had what you're suggesting before in the original revision of
 this patch. Please take a look at the original patch series[1]. I
 suppose it could be tweaked slightly to still have a cache node
 for the L2 interrupt and the next-level-cache pointer from the
 CPUs.

Ok, sorry, we are running around in circles here, basically you moved
the node to cpus according to reviews. I still think that treating cpus
as a device is not a great idea, even though I am in the same
position with C-states and probably will add C-state tables in the cpus
node.

http://comments.gmane.org/gmane.linux.power-management.general/41012

I just would like to see under cpus nodes and properties that apply to
all ARM systems, and avoid defining properties (eg interrupts) that
have different meanings for different ARM cores.

The question related to why the kernel should create a platform device
out of cpus is still open. I really do not want to block your series
for these simple issues but we have to make a decision and stick to that,
I am fine either way if we have a plan.

  What would you do for big.LITTLE systems ? We are going to create two
  cpus node because we need two platform devices ? I really think there
  must be a better way to implement this, but I will let DT maintainers
  make a decision.
 
 There is no such thing as big.LITTLE for Krait, so this is not a
 concern.

Yes, but if a core supporting big.LITTLE requires the same concept you
need here we have to cater for that because after all cpus is a node that
exists on ALL ARM systems, we should not rush and find a solution du
jour without thinking about all foreseeable use cases.

Thank you,
Lorenzo



Re: [RFC PATCH 3/3] idle: store the idle state index in the struct rq

2014-01-30 Thread Lorenzo Pieralisi
On Thu, Jan 30, 2014 at 05:25:27PM +, Daniel Lezcano wrote:
 On 01/30/2014 05:35 PM, Peter Zijlstra wrote:
  On Thu, Jan 30, 2014 at 05:27:54PM +0100, Daniel Lezcano wrote:
  struct cpuidle_state *state = &drv->states[rq->index];
 
  And from the state, we have the following informations:
 
  struct cpuidle_state {
 
 [ ... ]
 
   unsigned int exit_latency; /* in US */
   int power_usage; /* in mW */
   unsigned int target_residency; /* in US */
   bool disabled; /* disabled on all CPUs */
 
 [ ... ]
  };
 
  Right, but can we say that a higher index will save more power and have
  a higher exit latency? Or is a driver free to have a random mapping from
  idle_index to state?
 
 If the driver does its own random mapping that will break the governor 
 logic. So yes, the states are ordered, the higher the index is, the more 
 you save power and the higher the exit latency is.
 
  Also, we should probably create a pretty function to get that state,
  just like you did in patch 1.
 
 Yes, right.
 
  IIRC, Alex Shi sent a patchset to improve the choosing of the idlest cpu 
  and
  the exit_latency was needed.
 
  Right. However if we have a 'natural' order in the state array the index
  itself might often be sufficient to find the least idle state, in this
  specific case the absolute exit latency doesn't matter, all we want is
  the lowest one.
 
 Indeed. It could be simple as that. I feel we may need more informations 
 in the future but comparing the indexes could be a nice simple and 
 efficient solution.

As long as we take into account that some states might require multiple
CPUs to be idle in order to be entered, fine by me. But we should
certainly avoid waking up a CPU in a cluster that is in eg C2 (all CPUs in
C2, so cluster in C2) when there are CPUs in C3 in other clusters with
some CPUs running in those clusters, because there C3 means CPU in C3, not
cluster in C3. Overall what I am saying is that what you are doing
makes perfect sense but we have to take the above into account.

Some states have CPU and cluster (or we can call it package) components,
and that's true on ARM and other architectures too, to the best of my
knowledge.

Lorenzo



Re: [RFC PATCH 3/3] idle: store the idle state index in the struct rq

2014-01-31 Thread Lorenzo Pieralisi
On Thu, Jan 30, 2014 at 09:02:15PM +, Nicolas Pitre wrote:
 On Thu, 30 Jan 2014, Lorenzo Pieralisi wrote:
 
  On Thu, Jan 30, 2014 at 05:25:27PM +, Daniel Lezcano wrote:
   On 01/30/2014 05:35 PM, Peter Zijlstra wrote:
On Thu, Jan 30, 2014 at 05:27:54PM +0100, Daniel Lezcano wrote:
IIRC, Alex Shi sent a patchset to improve the choosing of the idlest 
cpu and
the exit_latency was needed.
   
Right. However if we have a 'natural' order in the state array the index
itself might often be sufficient to find the least idle state, in this
specific case the absolute exit latency doesn't matter, all we want is
the lowest one.
   
   Indeed. It could be simple as that. I feel we may need more informations 
   in the future but comparing the indexes could be a nice simple and 
   efficient solution.
  
  As long as we take into account that some states might require multiple
  CPUs to be idle in order to be entered, fine by me. But we should
  certainly avoid waking up a CPU in a cluster that is in eg C2 (all CPUs in
  C2, so cluster in C2) when there are CPUs in C3 in other clusters with
  some CPUs running in those clusters, because there C3 means CPU in C3, not
  cluster in C3. Overall what I am saying is that what you are doing
  makes perfect sense but we have to take the above into account.
  
  Some states have CPU and cluster (or we can call it package) components,
  and that's true on ARM and other architectures too, to the best of my
  knowledge.
 
 The notion of cluster or package maps pretty naturally onto scheduling 
 domains.  And the search for an idle CPU to wake up should avoid a 
 scheduling domain with a load of zero (which is obviously a prerequisite 
 for a power save mode to be applied to the cluster level) if there exist 
 idle CPUs in another domain already which load is not zero (all other 
 considerations being equal).  Hence your concern would be addressed 
 without any particular issue even if the individual CPU idle state index 
 is not exactly in sync with reality because of other hardware related 
 constraints.

Yes, just wanted to mention that relying solely on the C-state index
is not enough and can lead to surprises, as long as we take that into
account, it is ok. Highest index does not mean idlest CPU in power
consumption terms, because of cluster/package shared components.

It is probably worth noting that parameters like eg exit_latency
have a meaning that necessarily depends on other CPUs in a cluster/package
for a given C-state; the same logic as my reasoning above applies.
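
To make the scenario above concrete, here is a minimal user-space sketch of a
wake-up choice that does not rely on the C-state index alone. It is not
scheduler code: struct idle_cpu, cluster_has_running_cpu() and
pick_cpu_to_wake() are hypothetical names, and the policy (prefer an idle CPU
whose cluster already has a running CPU, then the shallowest index) is only
one assumed way to express the concern.

```c
#include <stddef.h>

/* Hypothetical per-CPU snapshot, not a kernel structure. */
struct idle_cpu {
	int cluster;	/* cluster/package the CPU belongs to */
	int idle_index;	/* C-state index, -1 if the CPU is running */
};

/* A cluster with a running CPU cannot be in a cluster-level state. */
static int cluster_has_running_cpu(const struct idle_cpu *cpus, size_t n,
				   int cluster)
{
	for (size_t i = 0; i < n; i++)
		if (cpus[i].cluster == cluster && cpus[i].idle_index < 0)
			return 1;
	return 0;
}

/* Returns the position of the CPU to wake, or -1 if no CPU is idle. */
static int pick_cpu_to_wake(const struct idle_cpu *cpus, size_t n)
{
	int best = -1, best_active = 0;

	for (size_t i = 0; i < n; i++) {
		int active;

		if (cpus[i].idle_index < 0)
			continue;	/* running, nothing to wake */
		active = cluster_has_running_cpu(cpus, n, cpus[i].cluster);
		/* Active cluster first, shallowest index as tie-break. */
		if (best < 0 || active > best_active ||
		    (active == best_active &&
		     cpus[i].idle_index < cpus[best].idle_index)) {
			best = (int)i;
			best_active = active;
		}
	}
	return best;
}
```

With two CPUs in C2 in a fully idle cluster and one CPU in C3 next to a
running CPU in another cluster, an index-only comparison would wake a C2 CPU
(and the whole cluster with it), while this sketch picks the C3 one.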

 The other solution consists in making the index dynamic.  That means 
 letting backend idle drivers change it i.e. when the last man in a 
 cluster goes idle it could update the index for all the other CPUs in 
 the cluster.  There is no locking needed as the scheduler is only 
 consuming this info, and the scheduler getting it wrong on rare 
 occasions is not a big deal either.  But that looks pretty ugly as at 
 least 2 levels of abstractions would be breached in this case.

Yes, that's ugly, and that's the reason why tracing cluster C-states
requires external tools (or HW probing) like the one Daniel wrote to
analyze the time spent in cluster states.

Overall, it is not a concern, it is something we should take into account,
that's why I mentioned that.

Thanks,
Lorenzo



Re: [PATCH v6 3/5] devicetree: bindings: Document Krait cache error interrupts

2014-04-29 Thread Lorenzo Pieralisi
On Tue, Apr 08, 2014 at 04:39:25PM +0100, Borislav Petkov wrote:
 On Fri, Apr 04, 2014 at 12:57:28PM -0700, Stephen Boyd wrote:
  The Krait L1/L2 error reporting hardware is made up of a per-CPU
  interrupt for the L1 cache and a SPI interrupt for the L2.
  
  Cc: Lorenzo Pieralisi lorenzo.pieral...@arm.com
  Cc: Mark Rutland mark.rutl...@arm.com
  Cc: Kumar Gala ga...@codeaurora.org
  Cc: devicet...@vger.kernel.org
  Signed-off-by: Stephen Boyd sb...@codeaurora.org
  ---
   Documentation/devicetree/bindings/arm/cache.txt | 48 
  -
   1 file changed, 47 insertions(+), 1 deletion(-)
  
  diff --git a/Documentation/devicetree/bindings/arm/cache.txt 
  b/Documentation/devicetree/bindings/arm/cache.txt
  index b90fcc7c53cf..d7357e777399 100644
  --- a/Documentation/devicetree/bindings/arm/cache.txt
  +++ b/Documentation/devicetree/bindings/arm/cache.txt
 
 Right, that's http://www.spinics.net/lists/arm-kernel/msg308540.html
 
 So whoever picks those patches up, Lorenzo's doc needs to be in his tree
 first too.

Sorry for the delay in replying. Those cache bindings need an ACK to get
merged, and were introduced so that idle states can retrieve power domain
information for caches. I am going to revive the idle bindings thread
to see what we can/should merge of these bindings as things stand, I
really hope this won't block the series any further, otherwise we can
rework the patches so that this series can get in first, or simplify my
series to allow both to get merged as soon as possible without compromising
future requirements.

Thanks,
Lorenzo



Re: [PATCHv2 2/2] arm64: topology: add MPIDR-based detection

2014-04-25 Thread Lorenzo Pieralisi
On Fri, Apr 25, 2014 at 04:18:42AM +0100, Zi Shen Lim wrote:
 Create cpu topology based on MPIDR. When hardware sets MPIDR to sane
 values, this method will always work. Therefore it should also work well
 as the fallback method. [1]

It has to be implemented as fallback, so you have to rebase this patch
on top of Mark's series.

 When we have multiple processing elements in the system, we create
 the cpu topology by mapping each affinity level (from lowest to highest)
 to threads (if they exist), cores, and clusters.
 
 We combine data from all higher affinity levels into cluster_id
 so we don't lose any information from MPIDR. [2]
 
 [1] http://www.spinics.net/lists/arm-kernel/msg317445.html
 [2] https://lkml.org/lkml/2014/4/23/703
 
 Signed-off-by: Zi Shen Lim z...@broadcom.com
 ---
 v1-v2: Addressed comments from Mark Brown.
  - Reduce noise. Use pr_debug instead of pr_info.
  - Don't ignore higher affinity levels.
 
  arch/arm64/include/asm/cputype.h |  2 ++
  arch/arm64/kernel/topology.c | 34 ++
  2 files changed, 36 insertions(+)
 
 diff --git a/arch/arm64/include/asm/cputype.h 
 b/arch/arm64/include/asm/cputype.h
 index c404fb0..7639e8b 100644
 --- a/arch/arm64/include/asm/cputype.h
 +++ b/arch/arm64/include/asm/cputype.h
 @@ -18,6 +18,8 @@
  
  #define INVALID_HWID ULONG_MAX
  
 +#define MPIDR_UP_BITMASK (0x1 << 30)
 +#define MPIDR_MT_BITMASK (0x1 << 24)
  #define MPIDR_HWID_BITMASK   0xff00ff
  
  #define MPIDR_LEVEL_BITS_SHIFT   3
 diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
 index 3e06b0b..7dbf981 100644
 --- a/arch/arm64/kernel/topology.c
 +++ b/arch/arm64/kernel/topology.c
 @@ -19,6 +19,8 @@
  #include <linux/nodemask.h>
  #include <linux/sched.h>
  
 +#include <asm/cputype.h>
 +#include <asm/smp_plat.h>
  #include <asm/topology.h>
  
  /*
 @@ -71,6 +73,38 @@ static void update_siblings_masks(unsigned int cpuid)
  
  void store_cpu_topology(unsigned int cpuid)
  {
 + struct cpu_topology *cpuid_topo = &cpu_topology[cpuid];
 + u64 mpidr;
 +
 + mpidr = read_cpuid_mpidr();
 +
 + /* Create cpu topology mapping based on MPIDR. */
 + if (mpidr & MPIDR_UP_BITMASK) {
 + /* Uniprocessor system */
 + cpuid_topo->thread_id  = -1;
 + cpuid_topo->core_id    = MPIDR_AFFINITY_LEVEL(mpidr, 0);
 + cpuid_topo->cluster_id = -1;
 + } else if (mpidr & MPIDR_MT_BITMASK) {
 + /* Multiprocessor system : Multi-threads per core */
 + cpuid_topo->thread_id  = MPIDR_AFFINITY_LEVEL(mpidr, 0);
 + cpuid_topo->core_id    = MPIDR_AFFINITY_LEVEL(mpidr, 1);
 + cpuid_topo->cluster_id =
 + MPIDR_AFFINITY_LEVEL(mpidr, 2) |
 + MPIDR_AFFINITY_LEVEL(mpidr, 3) << mpidr_hash.shift_aff[3];

That's probably not what you want, even though you still end up with a
unique cluster identifier (but insanely large) if you get lucky and it
does not overflow an int. The shift is the number of bits the level must be
shifted _right_ to create the hash value.

I am wondering whether it is time for me to add those as macros.
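
For illustration only, a standalone sketch of packing the higher affinity
levels into a cluster identifier with fixed 8-bit left shifts rather than the
hash shift values. This is plain user-space C, not a proposed patch:
aff_level() and pack_cluster_id() are hypothetical helpers, and the field
offsets assume the MPIDR_EL1 layout (Aff0..Aff2 in bits [23:0], Aff3 in bits
[39:32]).

```c
#include <stdint.h>

#define AFF_BITS	8
#define AFF_MASK	0xffu

/* Bit offset of each affinity field; Aff3 is not contiguous with Aff2. */
static const unsigned int aff_shift[4] = { 0, 8, 16, 32 };

/* Extract one 8-bit affinity level from an MPIDR value. */
static unsigned int aff_level(uint64_t mpidr, unsigned int level)
{
	return (unsigned int)(mpidr >> aff_shift[level]) & AFF_MASK;
}

/*
 * Single-thread-per-core case: Aff0 is the core, Aff1..Aff3 are folded
 * into one identifier that needs at most 24 bits, so it fits in an int.
 */
static unsigned int pack_cluster_id(uint64_t mpidr)
{
	return aff_level(mpidr, 1) |
	       aff_level(mpidr, 2) << AFF_BITS |
	       aff_level(mpidr, 3) << (2 * AFF_BITS);
}
```

The point of the fixed shifts is that the result stays bounded regardless of
what the hash computation decided for shift_aff[].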

 + } else {
 + /* Multiprocessor system : Single-thread per core */
 + cpuid_topo->thread_id  = -1;
 + cpuid_topo->core_id    = MPIDR_AFFINITY_LEVEL(mpidr, 0);
 + cpuid_topo->cluster_id =
 + MPIDR_AFFINITY_LEVEL(mpidr, 1) |
 + MPIDR_AFFINITY_LEVEL(mpidr, 2) << mpidr_hash.shift_aff[2] |
 + MPIDR_AFFINITY_LEVEL(mpidr, 3) << mpidr_hash.shift_aff[3];

Ditto.

 + }
 +
 + pr_debug("CPU%u: cluster %d core %d thread %d mpidr %llx\n",
 +  cpuid, cpuid_topo->cluster_id, cpuid_topo->core_id,
 +  cpuid_topo->thread_id, mpidr);
 +
   update_siblings_masks(cpuid);

That's why I object. With this implementation MPIDR_EL1 takes over DT,
and we do not want that. It has to work the other way around.

What you should do is check, in update_siblings_masks(), whether the topology
has been reset (ie it is not set up), and parse the MPIDR only if that's the case.
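
A rough sketch of that ordering, as standalone C rather than kernel code.
struct cpu_topo, topo_reset(), topo_is_unset() and topo_store() are
hypothetical stand-ins, and treating cluster_id == -1 as the "not set up"
marker is an assumption about what the reset path leaves behind.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU topology entry. */
struct cpu_topo {
	int thread_id;
	int core_id;
	int cluster_id;
};

/* Assumed reset state: nothing has described this CPU yet. */
static void topo_reset(struct cpu_topo *t)
{
	t->thread_id = -1;
	t->core_id = 0;
	t->cluster_id = -1;
}

static int topo_is_unset(const struct cpu_topo *t)
{
	return t->cluster_id == -1;
}

/* MPIDR is only a fallback: an entry already filled from DT is kept. */
static void topo_store(struct cpu_topo *t, uint64_t mpidr)
{
	if (!topo_is_unset(t))
		return;

	t->thread_id = -1;
	t->core_id = (int)(mpidr & 0xff);		/* Aff0 */
	t->cluster_id = (int)((mpidr >> 8) & 0xff);	/* Aff1 */
}
```

The MT and higher affinity level handling is left out on purpose; only the
"DT first, MPIDR second" control flow is shown.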

Thanks,
Lorenzo



Re: [Patch v2 4/4] mcpm: exynos: populate suspend and powered_up callbacks

2014-04-23 Thread Lorenzo Pieralisi
[added Nico in CC]

On Wed, Apr 23, 2014 at 10:25:54AM +0100, Chander Kashyap wrote:
 In order to support cpuidle through mcpm, suspend and powered-up
 callbacks are required in mcpm platform code.
 Hence populate the same callbacks.
 
 Signed-off-by: Chander Kashyap chander.kash...@linaro.org
 Signed-off-by: Chander Kashyap k.chan...@samsung.com
 ---
 changes in v2:
   1. Fixed typo: enynos_pmu_cpunr to exynos_pmu_cpunr
 
  arch/arm/mach-exynos/mcpm-exynos.c |   53 
 
  1 file changed, 53 insertions(+)
 
 diff --git a/arch/arm/mach-exynos/mcpm-exynos.c 
 b/arch/arm/mach-exynos/mcpm-exynos.c
 index 6c74c82..d53f597 100644
 --- a/arch/arm/mach-exynos/mcpm-exynos.c
 +++ b/arch/arm/mach-exynos/mcpm-exynos.c
 @@ -272,10 +272,63 @@ static int exynos_power_down_finish(unsigned int cpu, 
 unsigned int cluster)
   return 0; /* success: the CPU is halted */
  }
  
 +static void enable_coherency(void)
 +{
 + unsigned long v, u;
 +
 + asm volatile(
 + "mrc p15, 0, %0, c1, c0, 1\n"
 + "orr %0, %0, %2\n"
 + "ldr %1, [%3]\n"
 + "and %1, %1, #0\n"
 + "orr %0, %0, %1\n"
 + "mcr p15, 0, %0, c1, c0, 1\n"
 + : "=&r" (v), "=&r" (u)
 + : "Ir" (0x40), "Ir" (S5P_INFORM0)
 + : "cc");
 +}
 +
 +void exynos_powered_up(void)
 +{
 + unsigned int mpidr, cpu, cluster;
 +
 + mpidr = read_cpuid_mpidr();
 + cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
 + cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
 +
 + arch_spin_lock(&exynos_mcpm_lock);
 + if (cpu_use_count[cpu][cluster] == 0)
 + cpu_use_count[cpu][cluster] = 1;
 + arch_spin_unlock(&exynos_mcpm_lock);
 +}
 +
 +static void exynos_suspend(u64 residency)
 +{
 + unsigned int mpidr, cpunr;
 +
 + mpidr = read_cpuid_mpidr();
 + cpunr = exynos_pmu_cpunr(mpidr);
 +
 + __raw_writel(virt_to_phys(mcpm_entry_point), REG_ENTRY_ADDR);
 +
 + exynos_power_down();
 +
 + /*
 +  * Execution reaches here only if cpu did not power down.
 +  * Hence roll back the changes done in exynos_power_down function.
 + */
 + __raw_writel(EXYNOS_CORE_LOCAL_PWR_EN,
 + EXYNOS_ARM_CORE_CONFIGURATION(cpunr));
 + set_cr(get_cr() | CR_C);
 + enable_coherency();

This is wrong:

1) MCPM would eventually reboot the CPU in question if the suspend call
   returns (and restore SCTLR and ACTLR in cpu_resume), so there is 0 point
   in doing that here.
2) The core would have executed out of coherency for a while so the
   tlbs could be stale and you do not invalidate them. But given (1), (2)
   becomes just informational. The register write must be executed
   though (I guess...). Now, on restoring the SMP bit in cpu_resume
   (errata 799270) I need to verify this is safe and get back to you.

Cheers,
Lorenzo



Re: [Patch v2 2/4] driver: cpuidle: cpuidle-big-little: init driver for Exynos5420

2014-04-23 Thread Lorenzo Pieralisi
On Wed, Apr 23, 2014 at 10:25:52AM +0100, Chander Kashyap wrote:
 Add samsung,exynos5420 compatible string to initialize generic
 big-little cpuidle driver for Exynos5420.
 
 Signed-off-by: Chander Kashyap chander.kash...@linaro.org
 Signed-off-by: Chander Kashyap k.chan...@samsung.com
 Acked-by: Daniel Lezcano daniel.lezc...@linaro.org
 ---
  drivers/cpuidle/cpuidle-big_little.c |3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)
 
 diff --git a/drivers/cpuidle/cpuidle-big_little.c 
 b/drivers/cpuidle/cpuidle-big_little.c
 index b45fc62..d0fac53 100644
 --- a/drivers/cpuidle/cpuidle-big_little.c
 +++ b/drivers/cpuidle/cpuidle-big_little.c
 @@ -170,7 +170,8 @@ static int __init bl_idle_init(void)
   /*
* Initialize the driver just for a compliant set of machines
*/
 - if (!of_machine_is_compatible("arm,vexpress,v2p-ca15_a7"))
 + if (!of_machine_is_compatible("arm,vexpress,v2p-ca15_a7") &&
 + (!of_machine_is_compatible("samsung,exynos5420")))
   return -ENODEV;

We should handle the string matching differently, we can't keep adding
comparisons.
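
As a sketch of what I mean, the supported machines could live in a table that
is walked once, so that adding a platform is a one-line change. The code below
is standalone C for illustration: bl_idle_compat[] and machine_matches() are
hypothetical names, and in the kernel this would more likely be an
of_device_id style match table than a plain string array.

```c
#include <stddef.h>
#include <string.h>

/* Machines the driver is allowed to initialize on (illustrative list). */
static const char *const bl_idle_compat[] = {
	"arm,vexpress,v2p-ca15_a7",
	"samsung,exynos5420",
};

/* Returns 1 if the machine compatible string is in the table, else 0. */
static int machine_matches(const char *machine_compat)
{
	size_t n = sizeof(bl_idle_compat) / sizeof(bl_idle_compat[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(machine_compat, bl_idle_compat[i]) == 0)
			return 1;
	return 0;
}
```

A table entry would also be a natural place to hang per-platform idle state
data, which ties into the question below.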

Daniel raised the point already: what about the idle tables (data and
number of states) ? TC2 has just one cluster state, with specific
latencies that are highly unlikely to be correct for this platform.
Lorenzo


