Re: arch/powerpc/math-emu/mtfsf.c - incorrect mask?

2014-02-09 Thread Stephen N Chivers
James Yang james.y...@freescale.com wrote on 02/08/2014 07:49:40 AM:

 From: James Yang james.y...@freescale.com
 To: Gabriel Paubert paub...@iram.es
 Cc: Stephen N Chivers schiv...@csc.com.au, Chris Proctor 
 cproc...@csc.com.au, linuxppc-dev@lists.ozlabs.org
 Date: 02/08/2014 07:49 AM
 Subject: Re: arch/powerpc/math-emu/mtfsf.c - incorrect mask?
 
 On Fri, 7 Feb 2014, Gabriel Paubert wrote:
 
 Hi Stephen,
  
  On Fri, Feb 07, 2014 at 11:27:57AM +1000, Stephen N Chivers wrote:
   Gabriel Paubert paub...@iram.es wrote on 02/06/2014 07:26:37 PM:
   
From: Gabriel Paubert paub...@iram.es
To: Stephen N Chivers schiv...@csc.com.au
Cc: linuxppc-dev@lists.ozlabs.org, Chris Proctor 
cproc...@csc.com.au
Date: 02/06/2014 07:26 PM
Subject: Re: arch/powerpc/math-emu/mtfsf.c - incorrect mask?

On Thu, Feb 06, 2014 at 12:09:00PM +1000, Stephen N Chivers wrote:
 
 
 mask = 0;
 if (FM & (1 << 0))
         mask |= 0x0000000f;
 if (FM & (1 << 1))
         mask |= 0x000000f0;
 if (FM & (1 << 2))
         mask |= 0x00000f00;
 if (FM & (1 << 3))
         mask |= 0x0000f000;
 if (FM & (1 << 4))
         mask |= 0x000f0000;
 if (FM & (1 << 5))
         mask |= 0x00f00000;
 if (FM & (1 << 6))
         mask |= 0x0f000000;
 if (FM & (1 << 7))
         mask |= 0x90000000;
 
 With the above mask computation I get consistent results for 
 both the MPC8548 and MPC7410 boards.
 
 Am I missing something subtle?

No, I think you are correct. That said, this code can probably be optimized
to eliminate a lot of the conditional branches. I think that:
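For illustration, here is one branch-free way to build such a mask -- a
sketch only (the helper name is hypothetical, and this is not necessarily
the optimization Gabriel had in mind in the elided text):

    static unsigned int mtfsf_mask(unsigned int FM)
    {
            /* Nibble masks per FM bit; bit 7 uses 0x9 rather than 0xf
             * because the FEX and VX bits cannot be set explicitly.
             */
            static const unsigned int nibble[8] = {
                    0x0000000f, 0x000000f0, 0x00000f00, 0x0000f000,
                    0x000f0000, 0x00f00000, 0x0f000000, 0x90000000,
            };
            unsigned int mask = 0;
            int i;

            /* ((FM >> i) & 1) is 0 or 1, so the multiply replaces a branch */
            for (i = 0; i < 8; i++)
                    mask |= ((FM >> i) & 1) * nibble[i];
            return mask;
    }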
 
 
 If the compiler is allowed to generate isel instructions, it will not
 use a conditional branch for this code. (Ignore the andi values;
 this is from an old compile.)
 
From limited research, the 440GP is a processor that implements neither
the isel instruction nor floating point.

The kernel emulates isel, so using that instruction on the 440GP would
incur a double trap penalty.

Correct me if I am wrong, but the isel instruction first appears in
PowerPC ISA v2.04, around mid-2007.

Stephen Chivers,
CSC Australia Pty. Ltd. 



Re: [RFC PATCH 2/3] topology: support node_numa_mem() for determining the fallback node

2014-02-09 Thread Joonsoo Kim
On Sat, Feb 08, 2014 at 01:57:39AM -0800, David Rientjes wrote:
 On Fri, 7 Feb 2014, Joonsoo Kim wrote:
 
  It seems like a better approach would be to do this when a node is
  brought online and determine the fallback node based not on the
  zonelists as you do here but rather on locality (such as through a
  SLIT if provided, see node_distance()).
  
  Hmm...
  I guess that the zonelist is based on locality. The zonelist is generated
  using node_distance(), so I think that it reflects locality. But I'm not an
  expert on NUMA, so please let me know what I am missing here :)
  
 
 The zonelist is, yes, but I'm talking about memoryless and cpuless nodes.  
 If your solution is going to become the generic kernel API that determines 
 what node has local memory for a particular node, then it will have to 
 support all definitions of node.  That includes nodes that consist solely 
 of I/O, chipsets, networking, or storage devices.  These nodes may not 
 have memory or cpus, so doing it as part of onlining cpus isn't going to 
 be generic enough.  You want a node_to_mem_node() API for all possible 
 node types (the possible node types listed above are straight from the 
 ACPI spec).  For 99% of people, node_to_mem_node(X) is always going to be 
 X and we can optimize for that, but any solution that relies on cpu online 
 is probably shortsighted right now.
 
 I think it would be much better to do this as a part of setting a node to 
 be online.

Okay. I got your point.
I will change it to rely on node online if this patch is really needed.
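
For reference, a minimal sketch of what such a node-online-time computation
might look like (hypothetical code: _node_numa_mem[] and the helper names
are made up for illustration; node_distance() and the N_MEMORY node state
are existing kernel facilities):

    static int _node_numa_mem[MAX_NUMNODES];

    /* On node online: cache the nearest node that actually has memory. */
    static void update_node_numa_mem(int node)
    {
            int n, best_node = node, best_dist = INT_MAX;

            for_each_node_state(n, N_MEMORY) {
                    int dist = node_distance(node, n);

                    if (dist < best_dist) {
                            best_dist = dist;
                            best_node = n;
                    }
            }
            _node_numa_mem[node] = best_node;
    }

    static inline int node_to_mem_node(int node)
    {
            /* For 99% of systems this is simply 'node' itself. */
            return _node_numa_mem[node];
    }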

Thanks!



Re: [RFC PATCH 2/3] topology: support node_numa_mem() for determining the fallback node

2014-02-09 Thread Joonsoo Kim
On Fri, Feb 07, 2014 at 01:38:55PM -0800, Nishanth Aravamudan wrote:
 On 07.02.2014 [12:51:07 -0600], Christoph Lameter wrote:
  Here is a draft of a patch to make this work with memoryless nodes.
 
 Hi Christoph, this should be tested instead of Joonsoo's patch 2 (and 3)?

Hello,

I guess that your system has another problem that makes my patches
ineffective. Maybe it will also affect Christoph's one. Could you check
page_to_nid(), numa_mem_id() and node_present_pages(), although I mostly
suspect page_to_nid()?

Thanks.


Re: [RFC PATCH 3/3] slub: fallback to get_numa_mem() node if we want to allocate on memoryless node

2014-02-09 Thread Joonsoo Kim
On Fri, Feb 07, 2014 at 11:49:57AM -0600, Christoph Lameter wrote:
 On Fri, 7 Feb 2014, Joonsoo Kim wrote:
 
  This check would need to be something that checks for other contingencies
  in the page allocator as well. A simple solution would be to actually run
  a GFP_THISNODE alloc to see if you can grab a page from the proper node.
  If that fails then fall back. See how fallback_alloc() does it in slab.
  
 
  Hello, Christoph.
 
  This !node_present_pages() ensures that allocation on this node cannot
  succeed. So we can directly use numa_mem_id() here.
 
 Yes of course we can use numa_mem_id().
 
 But the check is only for not having any memory at all on a node. There
 are other reasons for allocations to fail on a certain node. The node could
 have memory that cannot be reclaimed, all dirty, beyond certain
 thresholds, not in the current set of allowed nodes etc etc.

Yes. There are many other cases, but I prefer that we consider them separately.
Maybe they need another approach. For now, to solve the memoryless node problem,
my solution is sufficient and safe.
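
For reference, the probe-then-fallback approach Christoph sketches would
look roughly like this (a sketch only, using the GFP_THISNODE flag of that
era):

    /* Try the requested node only; on failure, fall back explicitly. */
    page = alloc_pages_node(node, flags | GFP_THISNODE, order);
    if (!page)
            page = alloc_pages_node(numa_mem_id(), flags, order);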

Thanks.


Re: [PATCH 2/2] clocksource: Make clocksource register functions void

2014-02-09 Thread Yijing Wang
On 2014/2/6 4:40, Thomas Gleixner wrote:
 Yijing,
 
 On Thu, 23 Jan 2014, David Laight wrote:
 
 From: Linuxppc-dev Tony Prisk
 On 23/01/14 20:12, Yijing Wang wrote:
 Currently, the clocksource_register() and __clocksource_register_scale()
 functions always return 0, which is pointless, so make the functions void.
 Also remove the dead code that checks the clocksource_register_hz()
 return value.
 ..
 -static inline int clocksource_register_hz(struct clocksource *cs, u32 hz)
 +static inline void clocksource_register_hz(struct clocksource *cs, u32 hz)
   {
return __clocksource_register_scale(cs, 1, hz);
   }

 This doesn't make sense - you are still returning a value on a function
 declared void, and the return is now from a function that doesn't return
 anything either ?!?!
 Doesn't this throw a compile-time warning??

 It depends on the compiler.
 Recent gcc allows it.
 I don't know if it is actually valid C though.

 There is no excuse for it on lines like the above though.
 
 Can you please resend with that fixed against 3.14-rc1 ?
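
That is, the cleaned-up inline would simply drop the return keyword; a
sketch of the fixed form:

    static inline void clocksource_register_hz(struct clocksource *cs, u32 hz)
    {
            __clocksource_register_scale(cs, 1, hz);
    }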

OK, I will resend later.

Thanks!
Yijing.


 
 


-- 
Thanks!
Yijing



Re: [RFC PATCH 2/3] topology: support node_numa_mem() for determining the fallback node

2014-02-09 Thread Joonsoo Kim
On Fri, Feb 07, 2014 at 12:51:07PM -0600, Christoph Lameter wrote:
 Here is a draft of a patch to make this work with memoryless nodes.
 
 The first thing is that we modify node_match to also match if we hit an
 empty node. In that case we simply take the current slab if its there.

Why not inspect whether we can get a page on the best node, such as the
numa_mem_id() node?

 
 If there is no current slab then a regular allocation occurs with the
 memoryless node. The page allocator will fallback to a possible node and
 that will become the current slab. Next alloc from a memoryless node
 will then use that slab.
 
 For that we also add some tracking of allocations on nodes that were not
 satisfied using the empty_node[] array. A successful alloc on a node
 clears that flag.
 
 I would rather avoid the empty_node[] array since it's global and there may
 be thread specific allocation restrictions but it would be expensive to do
 an allocation attempt via the page allocator to make sure that there is
 really no page available from the page allocator.
 
 Index: linux/mm/slub.c
===================================================================
 --- linux.orig/mm/slub.c  2014-02-03 13:19:22.896853227 -0600
 +++ linux/mm/slub.c   2014-02-07 12:44:49.311494806 -0600
 @@ -132,6 +132,8 @@ static inline bool kmem_cache_has_cpu_pa
  #endif
  }
 
 +static int empty_node[MAX_NUMNODES];
 +
  /*
   * Issues still to be resolved:
   *
 @@ -1405,16 +1407,22 @@ static struct page *new_slab(struct kmem
   void *last;
   void *p;
   int order;
 + int alloc_node;
 
   BUG_ON(flags & GFP_SLAB_BUG_MASK);
 
   page = allocate_slab(s,
   flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);
 - if (!page)
 + if (!page) {
 + if (node != NUMA_NO_NODE)
 + empty_node[node] = 1;
   goto out;
 + }

empty_node cannot be set on a memoryless node, since page allocation would
succeed on a different node.
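
One way to catch the fallback case Joonsoo describes would be to also
compare the node of the page actually returned (a sketch against the draft
above, not merged code):

    page = allocate_slab(s,
            flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);
    if (!page) {
            if (node != NUMA_NO_NODE)
                    empty_node[node] = 1;
            goto out;
    }
    /* The page came from another node, so the requested node is
     * effectively empty for slab purposes.
     */
    if (node != NUMA_NO_NODE && page_to_nid(page) != node)
            empty_node[node] = 1;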

Thanks.


[RESEND PATCH 0/3] powerpc: Free up an IPI message slot for tick broadcast IPIs

2014-02-09 Thread Preeti U Murthy
This patchset is a precursor for enabling deep idle states on powerpc,
when the local CPU timers stop. The tick broadcast framework in
the Linux Kernel today handles wakeup of such CPUs at their next timer event
by using an external clock device. At the expiry of this clock device, IPIs
are sent to the CPUs in deep idle states so that they wake up to handle their
respective timers. This patchset frees up one of the IPI slots on powerpc
so as to be used to handle the tick broadcast IPI.

On certain implementations of powerpc, such an external clock device is absent.
The support in the tick broadcast framework to handle wakeup of CPUs from
deep idle states on such implementations is currently in the tip tree.
https://lkml.org/lkml/2014/2/7/906
https://lkml.org/lkml/2014/2/7/876
https://lkml.org/lkml/2014/2/7/608

With the above support in place, this patchset is next in line to enable deep
idle states on powerpc.

The patchset carries a RESEND tag since nothing has changed from the
previous post except for an added config condition around tick_broadcast(),
which handles sending broadcast IPIs, and an update to the cover letter.
---

Preeti U Murthy (1):
  cpuidle/ppc: Split timer_interrupt() into timer handling and interrupt 
handling routines

Srivatsa S. Bhat (2):
  powerpc: Free up the slot of PPC_MSG_CALL_FUNC_SINGLE IPI message
  powerpc: Implement tick broadcast IPI as a fixed IPI message


 arch/powerpc/include/asm/smp.h  |2 -
 arch/powerpc/include/asm/time.h |1 
 arch/powerpc/kernel/smp.c   |   25 ++---
 arch/powerpc/kernel/time.c  |   86 ++-
 arch/powerpc/platforms/cell/interrupt.c |2 -
 arch/powerpc/platforms/ps3/smp.c|2 -
 6 files changed, 73 insertions(+), 45 deletions(-)

-- 



[RESEND PATCH 2/3] powerpc: Implement tick broadcast IPI as a fixed IPI message

2014-02-09 Thread Preeti U Murthy
From: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com

For scalability and performance reasons, we want the tick broadcast IPIs
to be handled as efficiently as possible. Fixed IPI messages
are one of the most efficient mechanisms available - they are faster than
the smp_call_function mechanism because the IPI handlers are fixed and hence
they don't involve costly operations such as adding IPI handlers to the target
CPU's function queue, acquiring locks for synchronization etc.

Luckily we have an unused IPI message slot, so use that to implement
tick broadcast IPIs efficiently.

Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
[Functions renamed to tick_broadcast* and Changelog modified by
 Preeti U. Murthy pre...@linux.vnet.ibm.com]
Signed-off-by: Preeti U. Murthy pre...@linux.vnet.ibm.com
Acked-by: Geoff Levand ge...@infradead.org [For the PS3 part]
---

 arch/powerpc/include/asm/smp.h  |2 +-
 arch/powerpc/include/asm/time.h |1 +
 arch/powerpc/kernel/smp.c   |   21 +
 arch/powerpc/kernel/time.c  |5 +
 arch/powerpc/platforms/cell/interrupt.c |2 +-
 arch/powerpc/platforms/ps3/smp.c|2 +-
 6 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 9f7356b..ff51046 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -120,7 +120,7 @@ extern int cpu_to_core_id(int cpu);
  * in /proc/interrupts will be wrong!!! --Troy */
 #define PPC_MSG_CALL_FUNCTION   0
 #define PPC_MSG_RESCHEDULE  1
-#define PPC_MSG_UNUSED 2
+#define PPC_MSG_TICK_BROADCAST 2
 #define PPC_MSG_DEBUGGER_BREAK  3
 
 /* for irq controllers that have dedicated ipis per message (4) */
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index c1f2676..1d428e6 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -28,6 +28,7 @@ extern struct clock_event_device decrementer_clockevent;
 struct rtc_time;
 extern void to_tm(int tim, struct rtc_time * tm);
 extern void GregorianDay(struct rtc_time *tm);
+extern void tick_broadcast_ipi_handler(void);
 
 extern void generic_calibrate_decr(void);
 
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ee7d76b..e2a4232 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -35,6 +35,7 @@
 #include <asm/ptrace.h>
 #include <linux/atomic.h>
 #include <asm/irq.h>
+#include <asm/hw_irq.h>
 #include <asm/page.h>
 #include <asm/pgtable.h>
 #include <asm/prom.h>
@@ -145,9 +146,9 @@ static irqreturn_t reschedule_action(int irq, void *data)
return IRQ_HANDLED;
 }
 
-static irqreturn_t unused_action(int irq, void *data)
+static irqreturn_t tick_broadcast_ipi_action(int irq, void *data)
 {
-   /* This slot is unused and hence available for use, if needed */
+   tick_broadcast_ipi_handler();
return IRQ_HANDLED;
 }
 
@@ -168,14 +169,14 @@ static irqreturn_t debug_ipi_action(int irq, void *data)
 static irq_handler_t smp_ipi_action[] = {
[PPC_MSG_CALL_FUNCTION] =  call_function_action,
[PPC_MSG_RESCHEDULE] = reschedule_action,
-   [PPC_MSG_UNUSED] = unused_action,
+   [PPC_MSG_TICK_BROADCAST] = tick_broadcast_ipi_action,
[PPC_MSG_DEBUGGER_BREAK] = debug_ipi_action,
 };
 
 const char *smp_ipi_name[] = {
[PPC_MSG_CALL_FUNCTION] =  "ipi call function",
[PPC_MSG_RESCHEDULE] = "ipi reschedule",
-   [PPC_MSG_UNUSED] = "ipi unused",
+   [PPC_MSG_TICK_BROADCAST] = "ipi tick-broadcast",
[PPC_MSG_DEBUGGER_BREAK] = "ipi debugger",
 };
 
@@ -251,6 +252,8 @@ irqreturn_t smp_ipi_demux(void)
generic_smp_call_function_interrupt();
if (all & IPI_MESSAGE(PPC_MSG_RESCHEDULE))
scheduler_ipi();
+   if (all & IPI_MESSAGE(PPC_MSG_TICK_BROADCAST))
+   tick_broadcast_ipi_handler();
if (all & IPI_MESSAGE(PPC_MSG_DEBUGGER_BREAK))
debug_ipi_action(0, NULL);
} while (info->messages);
@@ -289,6 +292,16 @@ void arch_send_call_function_ipi_mask(const struct cpumask *mask)
do_message_pass(cpu, PPC_MSG_CALL_FUNCTION);
 }
 
+#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
+void tick_broadcast(const struct cpumask *mask)
+{
+   unsigned int cpu;
+
+   for_each_cpu(cpu, mask)
+   do_message_pass(cpu, PPC_MSG_TICK_BROADCAST);
+}
+#endif
+
 #if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC)
 void smp_send_debugger_break(void)
 {
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index b3dab20..3ff97db 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -825,6 +825,11 @@ static void decrementer_set_mode(enum clock_event_mode mode,
decrementer_set_next_event(DECREMENTER_MAX, dev);
 }
 
+/* Interrupt handler for the timer broadcast IPI */
+void tick_broadcast_ipi_handler(void)
+{
+}

[RESEND PATCH 1/3] powerpc: Free up the slot of PPC_MSG_CALL_FUNC_SINGLE IPI message

2014-02-09 Thread Preeti U Murthy
From: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com

The IPI handlers for both PPC_MSG_CALL_FUNCTION and PPC_MSG_CALL_FUNC_SINGLE map
to a common implementation - generic_smp_call_function_single_interrupt(). So,
we can consolidate them and save one of the IPI message slots (which are
precious on powerpc, since only 4 of those slots are available).

So, implement the functionality of PPC_MSG_CALL_FUNC_SINGLE using
PPC_MSG_CALL_FUNCTION itself and release its IPI message slot, so that it can be
used for something else in the future, if desired.

Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
Signed-off-by: Preeti U. Murthy pre...@linux.vnet.ibm.com
Acked-by: Geoff Levand ge...@infradead.org [For the PS3 part]
---

 arch/powerpc/include/asm/smp.h  |2 +-
 arch/powerpc/kernel/smp.c   |   12 +---
 arch/powerpc/platforms/cell/interrupt.c |2 +-
 arch/powerpc/platforms/ps3/smp.c|2 +-
 4 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 084e080..9f7356b 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -120,7 +120,7 @@ extern int cpu_to_core_id(int cpu);
  * in /proc/interrupts will be wrong!!! --Troy */
 #define PPC_MSG_CALL_FUNCTION   0
 #define PPC_MSG_RESCHEDULE  1
-#define PPC_MSG_CALL_FUNC_SINGLE   2
+#define PPC_MSG_UNUSED 2
 #define PPC_MSG_DEBUGGER_BREAK  3
 
 /* for irq controllers that have dedicated ipis per message (4) */
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ac2621a..ee7d76b 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -145,9 +145,9 @@ static irqreturn_t reschedule_action(int irq, void *data)
return IRQ_HANDLED;
 }
 
-static irqreturn_t call_function_single_action(int irq, void *data)
+static irqreturn_t unused_action(int irq, void *data)
 {
-   generic_smp_call_function_single_interrupt();
+   /* This slot is unused and hence available for use, if needed */
return IRQ_HANDLED;
 }
 
@@ -168,14 +168,14 @@ static irqreturn_t debug_ipi_action(int irq, void *data)
 static irq_handler_t smp_ipi_action[] = {
[PPC_MSG_CALL_FUNCTION] =  call_function_action,
[PPC_MSG_RESCHEDULE] = reschedule_action,
-   [PPC_MSG_CALL_FUNC_SINGLE] = call_function_single_action,
+   [PPC_MSG_UNUSED] = unused_action,
[PPC_MSG_DEBUGGER_BREAK] = debug_ipi_action,
 };
 
 const char *smp_ipi_name[] = {
[PPC_MSG_CALL_FUNCTION] =  "ipi call function",
[PPC_MSG_RESCHEDULE] = "ipi reschedule",
-   [PPC_MSG_CALL_FUNC_SINGLE] = "ipi call function single",
+   [PPC_MSG_UNUSED] = "ipi unused",
[PPC_MSG_DEBUGGER_BREAK] = "ipi debugger",
 };
 
@@ -251,8 +251,6 @@ irqreturn_t smp_ipi_demux(void)
generic_smp_call_function_interrupt();
if (all & IPI_MESSAGE(PPC_MSG_RESCHEDULE))
scheduler_ipi();
-   if (all & IPI_MESSAGE(PPC_MSG_CALL_FUNC_SINGLE))
-   generic_smp_call_function_single_interrupt();
if (all & IPI_MESSAGE(PPC_MSG_DEBUGGER_BREAK))
debug_ipi_action(0, NULL);
} while (info->messages);
@@ -280,7 +278,7 @@ EXPORT_SYMBOL_GPL(smp_send_reschedule);
 
 void arch_send_call_function_single_ipi(int cpu)
 {
-   do_message_pass(cpu, PPC_MSG_CALL_FUNC_SINGLE);
+   do_message_pass(cpu, PPC_MSG_CALL_FUNCTION);
 }
 
 void arch_send_call_function_ipi_mask(const struct cpumask *mask)
diff --git a/arch/powerpc/platforms/cell/interrupt.c b/arch/powerpc/platforms/cell/interrupt.c
index 2d42f3b..adf3726 100644
--- a/arch/powerpc/platforms/cell/interrupt.c
+++ b/arch/powerpc/platforms/cell/interrupt.c
@@ -215,7 +215,7 @@ void iic_request_IPIs(void)
 {
iic_request_ipi(PPC_MSG_CALL_FUNCTION);
iic_request_ipi(PPC_MSG_RESCHEDULE);
-   iic_request_ipi(PPC_MSG_CALL_FUNC_SINGLE);
+   iic_request_ipi(PPC_MSG_UNUSED);
iic_request_ipi(PPC_MSG_DEBUGGER_BREAK);
 }
 
diff --git a/arch/powerpc/platforms/ps3/smp.c b/arch/powerpc/platforms/ps3/smp.c
index 4b35166..00d1a7c 100644
--- a/arch/powerpc/platforms/ps3/smp.c
+++ b/arch/powerpc/platforms/ps3/smp.c
@@ -76,7 +76,7 @@ static int __init ps3_smp_probe(void)
 
BUILD_BUG_ON(PPC_MSG_CALL_FUNCTION    != 0);
BUILD_BUG_ON(PPC_MSG_RESCHEDULE       != 1);
-   BUILD_BUG_ON(PPC_MSG_CALL_FUNC_SINGLE != 2);
+   BUILD_BUG_ON(PPC_MSG_UNUSED           != 2);
BUILD_BUG_ON(PPC_MSG_DEBUGGER_BREAK   != 3);
 
for (i = 0; i < MSG_COUNT; i++) {



[RESEND PATCH 3/3] cpuidle/ppc: Split timer_interrupt() into timer handling and interrupt handling routines

2014-02-09 Thread Preeti U Murthy
From: Preeti U Murthy pre...@linux.vnet.ibm.com

Split timer_interrupt(), which is the local timer interrupt handler on ppc,
into routines called during regular interrupt handling and __timer_interrupt(),
which takes care of running local timers and collecting time related stats.

This will enable callers interested only in running expired local timers to
directly call into __timer_interrupt(). One of the use cases of this is the
tick broadcast IPI handling in which the sleeping CPUs need to handle the local
timers that have expired.

Signed-off-by: Preeti U Murthy pre...@linux.vnet.ibm.com
---

 arch/powerpc/kernel/time.c |   81 +---
 1 file changed, 46 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 3ff97db..df2989b 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -478,6 +478,47 @@ void arch_irq_work_raise(void)
 
 #endif /* CONFIG_IRQ_WORK */
 
+void __timer_interrupt(void)
+{
+   struct pt_regs *regs = get_irq_regs();
+   u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
+   struct clock_event_device *evt = &__get_cpu_var(decrementers);
+   u64 now;
+
+   trace_timer_interrupt_entry(regs);
+
+   if (test_irq_work_pending()) {
+   clear_irq_work_pending();
+   irq_work_run();
+   }
+
+   now = get_tb_or_rtc();
+   if (now >= *next_tb) {
+   *next_tb = ~(u64)0;
+   if (evt->event_handler)
+   evt->event_handler(evt);
+   __get_cpu_var(irq_stat).timer_irqs_event++;
+   } else {
+   now = *next_tb - now;
+   if (now <= DECREMENTER_MAX)
+   set_dec((int)now);
+   /* We may have raced with new irq work */
+   if (test_irq_work_pending())
+   set_dec(1);
+   __get_cpu_var(irq_stat).timer_irqs_others++;
+   }
+
+#ifdef CONFIG_PPC64
+   /* collect purr register values often, for accurate calculations */
+   if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
+   struct cpu_usage *cu = &__get_cpu_var(cpu_usage_array);
+   cu->current_tb = mfspr(SPRN_PURR);
+   }
+#endif
+
+   trace_timer_interrupt_exit(regs);
+}
+
 /*
  * timer_interrupt - gets called when the decrementer overflows,
  * with interrupts disabled.
@@ -486,8 +527,6 @@ void timer_interrupt(struct pt_regs * regs)
 {
struct pt_regs *old_regs;
u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
-   struct clock_event_device *evt = &__get_cpu_var(decrementers);
-   u64 now;
 
/* Ensure a positive value is written to the decrementer, or else
 * some CPUs will continue to take decrementer exceptions.
@@ -519,39 +558,7 @@ void timer_interrupt(struct pt_regs * regs)
old_regs = set_irq_regs(regs);
irq_enter();
 
-   trace_timer_interrupt_entry(regs);
-
-   if (test_irq_work_pending()) {
-   clear_irq_work_pending();
-   irq_work_run();
-   }
-
-   now = get_tb_or_rtc();
-   if (now >= *next_tb) {
-   *next_tb = ~(u64)0;
-   if (evt->event_handler)
-   evt->event_handler(evt);
-   __get_cpu_var(irq_stat).timer_irqs_event++;
-   } else {
-   now = *next_tb - now;
-   if (now <= DECREMENTER_MAX)
-   set_dec((int)now);
-   /* We may have raced with new irq work */
-   if (test_irq_work_pending())
-   set_dec(1);
-   __get_cpu_var(irq_stat).timer_irqs_others++;
-   }
-
-#ifdef CONFIG_PPC64
-   /* collect purr register values often, for accurate calculations */
-   if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
-   struct cpu_usage *cu = &__get_cpu_var(cpu_usage_array);
-   cu->current_tb = mfspr(SPRN_PURR);
-   }
-#endif
-
-   trace_timer_interrupt_exit(regs);
-
+   __timer_interrupt();
irq_exit();
set_irq_regs(old_regs);
 }
@@ -828,6 +835,10 @@ static void decrementer_set_mode(enum clock_event_mode mode,
 /* Interrupt handler for the timer broadcast IPI */
 void tick_broadcast_ipi_handler(void)
 {
+   u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
+
+   *next_tb = get_tb_or_rtc();
+   __timer_interrupt();
 }
 
 static void register_decrementer_clockevent(int cpu)



Re: [PATCH] Convert powerpc simple spinlocks into ticket locks

2014-02-09 Thread Benjamin Herrenschmidt
On Thu, 2014-02-06 at 13:28 -0600, Tom Musta wrote:
 My read is consistent with Torsten's ... this looks like a bad idea.
 
 Look at the RTL for sthcx. on page 692 (Power ISA V2.06) and you will
 see this:
 
 if RESERVE then
   if RESERVE_LENGTH = 2 then
  ...
   else
 undefined_case <- 1
 else
   ...
 
 A legal implementation might never perform the store.

This is an area where we definitely want to check with the implementors
and if the implementations happen to do what we want (they likely do),
get the architecture changed for future chips and use it anyway.

There's a *significant* benefit in avoiding an atomic operation in the
unlock case.

Since the reservation mechanism is based on a granule that is generally a
cache line, I doubt implementations will ever check the actual access
size, but we need to double-check.

Cheers,
Ben.




Re: [PATCH] Convert powerpc simple spinlocks into ticket locks

2014-02-09 Thread Benjamin Herrenschmidt
On Fri, 2014-02-07 at 10:02 +0100, Torsten Duwe wrote:
Can you pair lwarx with sthcx ? I couldn't immediately find the answer
in the PowerISA doc. If so I think you can do better by being able to
atomically load both tickets but only storing the head without affecting
the tail.
 
 Can I simply write the half word, without a reservation, or will the HW caches
 mess up the other half? Will it ruin the cache coherency on some 
 (sub)architectures?

Yes, you can, I *think*

  Plus, sthcx doesn't exist on all PPC chips.
 
 Which ones are lacking it? Do all have at least a simple 16-bit store?

Half-word atomics (and byte atomics) are new; they were added in architecture
2.06, I believe, so it's fairly recent. But it's still worthwhile to
investigate a way to avoid atomics on unlock on recent processors (we can use
instruction patching if necessary, based on CPU features), because there's
definitely a significant cost in doing a larx/stcx. sequence on powerpc, way
higher than our current unlock path of barrier + store.
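
The unlock path being argued for is just a barrier plus a plain half-word
store -- a sketch, assuming the 16-bit ticket layout from Torsten's patch
(not merged code):

    static inline void arch_spin_unlock(arch_spinlock_t *lock)
    {
            /* Order the critical section before the release... */
            smp_mb();
            /* ...then bump head with a plain 16-bit store: no larx/stcx. */
            lock->tickets.head++;
    }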

Cheers,
Ben.




Re: [PATCH] Convert powerpc simple spinlocks into ticket locks

2014-02-09 Thread Benjamin Herrenschmidt
On Fri, 2014-02-07 at 09:51 -0600, Kumar Gala wrote:
 On Feb 7, 2014, at 3:02 AM, Torsten Duwe d...@lst.de wrote:
 
  On Thu, Feb 06, 2014 at 02:19:52PM -0600, Scott Wood wrote:
  On Thu, 2014-02-06 at 18:37 +0100, Torsten Duwe wrote:
  On Thu, Feb 06, 2014 at 05:38:37PM +0100, Peter Zijlstra wrote:
  
  Can you pair lwarx with sthcx ? I couldn't immediately find the answer
  in the PowerISA doc. If so I think you can do better by being able to
  atomically load both tickets but only storing the head without affecting
  the tail.
  
  Can I simply write the half word, without a reservation, or will the HW 
  caches
  mess up the other half? Will it ruin the cache coherency on some 
  (sub)architectures?
 
 The coherency should be fine, I just can’t remember if you’ll lose the 
 reservation by doing this.

Yes you do.

  Plus, sthcx doesn't exist on all PPC chips.
  
  Which ones are lacking it? Do all have at least a simple 16-bit store?
 
 Everything implements a simple 16-bit store, just not everything implements 
 the store conditional of 16-bit data.

Ben.

 - k



Re: [PATCH v2] powerpc ticket locks

2014-02-09 Thread Benjamin Herrenschmidt
On Fri, 2014-02-07 at 17:58 +0100, Torsten Duwe wrote:
  typedef struct {
 -   volatile unsigned int slock;
 -} arch_spinlock_t;
 +   union {
 +   __ticketpair_t head_tail;
 +   struct __raw_tickets {
 +#ifdef __BIG_ENDIAN__  /* The tail part should be in the MSBs */
 +   __ticket_t tail, head;
 +#else
 +   __ticket_t head, tail;
 +#endif
 +   } tickets;
 +   };
 +#if defined(CONFIG_PPC_SPLPAR)
 +   u32 holder;
 +#endif
 +} arch_spinlock_t __aligned(4);

That's still broken with lockref (which we just merged).

We must have the arch_spinlock_t and the ref in the same 64-bit word
otherwise it will break.

We can make it work in theory since the holder doesn't have to be
accessed atomically, but the practicals are a complete mess ...
lockref would essentially have to re-implement the holder handling
of the spinlocks and use lower level ticket stuff.

Unless you can find a sneaky trick ... :-(
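
For context, this is roughly the packing lockref relies on (from
include/linux/lockref.h of that era), which is why the lock and the count
must share one 64-bit word for the cmpxchg fast path:

    struct lockref {
            union {
                    aligned_u64 lock_count;     /* cmpxchg'd as one unit */
                    struct {
                            spinlock_t lock;
                            unsigned int count;
                    };
            };
    };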

Ben.




Re: [PATCH 1/2] PPC: powernv: remove redundant cpuidle_idle_call()

2014-02-09 Thread Preeti U Murthy
Hi Peter,

On 02/07/2014 06:11 PM, Peter Zijlstra wrote:
 On Fri, Feb 07, 2014 at 05:11:26PM +0530, Preeti U Murthy wrote:
 But observe the idle state snooze on powerpc. The power that this idle
 state saves is through the lowering of the thread priority of the CPU.
 After it lowers the thread priority, it is done. It cannot
 wait_for_interrupts. It will exit my_idle(). It is now up to the
 generic idle loop to increase the thread priority if the need_resched
 flag is set. Only an interrupt routine can increase the thread priority;
 else we will need to do it explicitly. And in such states, which have a
 polling nature, the CPU will not receive a reschedule IPI.

 That is why in the snooze_loop() we poll on need_resched. If it is set
 we up the priority of the thread using HMT_MEDIUM() and then exit the
 my_idle() loop. In case of interrupts, the priority gets automatically
 increased.
 
 You can poll without setting TS_POLLING/TIF_POLLING_NRFLAG just fine
 and get the IPI if that is what you want.
 
 Depending on how horribly unprovisioned the thread gets at the lowest
 priority, that might actually be faster than polling and raising the
 prio whenever it does get run.

So I am assuming you mean something like the below:

my_idle()
{
   local_irq_enable();
   /* Remove the setting of the polling flag */
   HMT_low();
   return index;
}

And then exit into the generic idle loop. But the issue I see here is
that TS_POLLING/TIF_POLLING_NRFLAG gets set immediately. So, if on
testing need_resched() immediately after this returns we find that the
TIF_NEED_RESCHED flag is set, the thread will exit at low priority, right?
We could raise the priority of the thread in arch_cpu_idle_exit() soon
after setting the polling flag, but that would mean that in cases where the
TIF_NEED_RESCHED flag is not set we unnecessarily raise the priority of
the thread.
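
For context, the snooze loop described above has roughly this shape (a
simplified sketch of the 2014-era pseries idle driver, not verbatim):

    static int snooze_loop(struct cpuidle_device *dev,
                           struct cpuidle_driver *drv, int index)
    {
            local_irq_enable();
            set_thread_flag(TIF_POLLING_NRFLAG);

            HMT_low();              /* lower SMT thread priority */
            while (!need_resched())
                    barrier();      /* poll; no reschedule IPI arrives */

            HMT_medium();           /* restore priority before exiting idle */
            clear_thread_flag(TIF_POLLING_NRFLAG);
            smp_mb();
            return index;
    }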

Thanks

Regards
Preeti U Murthy

 
