Re: arch/powerpc/math-emu/mtfsf.c - incorrect mask?
James Yang james.y...@freescale.com wrote on 02/08/2014 07:49:40 AM:

From: James Yang james.y...@freescale.com
To: Gabriel Paubert paub...@iram.es
Cc: Stephen N Chivers schiv...@csc.com.au, Chris Proctor cproc...@csc.com.au, linuxppc-dev@lists.ozlabs.org
Date: 02/08/2014 07:49 AM
Subject: Re: arch/powerpc/math-emu/mtfsf.c - incorrect mask?

On Fri, 7 Feb 2014, Gabriel Paubert wrote:

> Hi Stephen,
>
> On Fri, Feb 07, 2014 at 11:27:57AM +1000, Stephen N Chivers wrote:
>> Gabriel Paubert paub...@iram.es wrote on 02/06/2014 07:26:37 PM:
>> From: Gabriel Paubert paub...@iram.es
>> To: Stephen N Chivers schiv...@csc.com.au
>> Cc: linuxppc-dev@lists.ozlabs.org, Chris Proctor cproc...@csc.com.au
>> Date: 02/06/2014 07:26 PM
>> Subject: Re: arch/powerpc/math-emu/mtfsf.c - incorrect mask?
>>
>> On Thu, Feb 06, 2014 at 12:09:00PM +1000, Stephen N Chivers wrote:
>>
>> 	mask = 0;
>> 	if (FM & (1 << 0))
>> 		mask |= 0x0000000f;
>> 	if (FM & (1 << 1))
>> 		mask |= 0x000000f0;
>> 	if (FM & (1 << 2))
>> 		mask |= 0x00000f00;
>> 	if (FM & (1 << 3))
>> 		mask |= 0x0000f000;
>> 	if (FM & (1 << 4))
>> 		mask |= 0x000f0000;
>> 	if (FM & (1 << 5))
>> 		mask |= 0x00f00000;
>> 	if (FM & (1 << 6))
>> 		mask |= 0x0f000000;
>> 	if (FM & (1 << 7))
>> 		mask |= 0x90000000;
>>
>> With the above mask computation I get consistent results for both the
>> MPC8548 and MPC7410 boards. Am I missing something subtle?
>
> No, I think you are correct. This said, this code can probably be
> optimized to eliminate a lot of the conditional branches. I think that:

If the compiler is allowed to generate isel instructions, it would not
use a conditional branch for this code. (Ignore the andi values, this is
an old compile.)

From limited research, the 440GP is a processor that doesn't implement
the isel instruction, and it does not implement floating point either.
The kernel emulates isel, so using that instruction for the 440GP would
incur a double trap penalty.

Correct me if I am wrong, but the isel instruction first appears in
PowerPC ISA v2.04, around mid 2007.

Stephen Chivers, CSC Australia Pty. Ltd.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
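For reference, the corrected mask computation above can be modelled and checked in plain C. The function names (mtfsf_mask_*) are invented for the demonstration, and the branch-free variant is only one possible realization of the optimization Gabriel alludes to, not the form he had in mind:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the corrected nibble-per-FM-bit mask from the thread;
 * the top nibble uses 0x9 because FPSCR bits FEX and VX are not
 * directly settable by mtfsf. */
static uint32_t mtfsf_mask_branchy(uint32_t fm)
{
    uint32_t mask = 0;

    if (fm & (1 << 0)) mask |= 0x0000000f;
    if (fm & (1 << 1)) mask |= 0x000000f0;
    if (fm & (1 << 2)) mask |= 0x00000f00;
    if (fm & (1 << 3)) mask |= 0x0000f000;
    if (fm & (1 << 4)) mask |= 0x000f0000;
    if (fm & (1 << 5)) mask |= 0x00f00000;
    if (fm & (1 << 6)) mask |= 0x0f000000;
    if (fm & (1 << 7)) mask |= 0x90000000;
    return mask;
}

/* Branch-free variant: multiply each FM bit into its nibble. */
static uint32_t mtfsf_mask_branchless(uint32_t fm)
{
    uint32_t mask = 0;

    for (int i = 0; i < 7; i++)
        mask |= ((fm >> i) & 1) * (0xfu << (4 * i));
    mask |= ((fm >> 7) & 1) * 0x90000000u;   /* special top nibble */
    return mask;
}
```

Both versions agree for every 8-bit FM value, which is easy to verify exhaustively.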
Re: [RFC PATCH 2/3] topology: support node_numa_mem() for determining the fallback node
On Sat, Feb 08, 2014 at 01:57:39AM -0800, David Rientjes wrote:
> On Fri, 7 Feb 2014, Joonsoo Kim wrote:
>>> It seems like a better approach would be to do this when a node is
>>> brought online and determine the fallback node based not on the
>>> zonelists as you do here but rather on locality (such as through a
>>> SLIT if provided, see node_distance()).
>>
>> Hmm... I guess that the zonelist is based on locality. The zonelist is
>> generated using node_distance(), so I think that it reflects locality.
>> But I'm not an expert on NUMA, so please let me know what I am missing
>> here. :)
>
> The zonelist is, yes, but I'm talking about memoryless and cpuless
> nodes. If your solution is going to become the generic kernel API that
> determines what node has local memory for a particular node, then it
> will have to support all definitions of node. That includes nodes that
> consist solely of I/O, chipsets, networking, or storage devices. These
> nodes may not have memory or cpus, so doing it as part of onlining cpus
> isn't going to be generic enough. You want a node_to_mem_node() API for
> all possible node types (the possible node types listed above are
> straight from the ACPI spec). For 99% of people, node_to_mem_node(X) is
> always going to be X and we can optimize for that, but any solution
> that relies on cpu online is probably shortsighted right now. I think
> it would be much better to do this as a part of setting a node to be
> online.

Okay. I got your point. I will change it to rely on node online if this
patch is really needed.

Thanks!
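A toy model of the node_to_mem_node() API David describes - map each node to the nearest node that actually has memory, using a SLIT-style distance table - might look like this. The 4-node topology and all names are made up for illustration; only the role of node_distance() comes from the thread:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NR_NODES 4

/* SLIT-style distances: 10 = local, larger = farther away. */
static const int distance[NR_NODES][NR_NODES] = {
    { 10, 20, 20, 40 },
    { 20, 10, 40, 20 },
    { 20, 40, 10, 20 },
    { 40, 20, 20, 10 },
};

/* Nodes 2 and 3 are memoryless (e.g. I/O-only nodes). */
static const bool has_memory[NR_NODES] = { true, true, false, false };

static int node_to_mem_node(int node)
{
    int best = node, best_dist = INT_MAX;

    /* The 99% case: the node has memory and is its own answer. */
    if (has_memory[node])
        return node;

    for (int n = 0; n < NR_NODES; n++) {
        if (has_memory[n] && distance[node][n] < best_dist) {
            best = n;
            best_dist = distance[node][n];
        }
    }
    return best;
}
```

In a real implementation this mapping would be computed once when a node comes online, not on every lookup.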
Re: [RFC PATCH 2/3] topology: support node_numa_mem() for determining the fallback node
On Fri, Feb 07, 2014 at 01:38:55PM -0800, Nishanth Aravamudan wrote:
> On 07.02.2014 [12:51:07 -0600], Christoph Lameter wrote:
>> Here is a draft of a patch to make this work with memoryless nodes.
>
> Hi Christoph, should this be tested instead of Joonsoo's patches 2 (and 3)?

Hello,

I guess that your system has another problem that makes my patches
inactive. Maybe it will also affect Christoph's one. Could you confirm
page_to_nid(), numa_mem_id() and node_present_pages(), although I
doubt mostly page_to_nid()?

Thanks.
Re: [RFC PATCH 3/3] slub: fallback to get_numa_mem() node if we want to allocate on memoryless node
On Fri, Feb 07, 2014 at 11:49:57AM -0600, Christoph Lameter wrote:
> On Fri, 7 Feb 2014, Joonsoo Kim wrote:
>>> This check would need to be something that checks for other
>>> contingencies in the page allocator as well. A simple solution would
>>> be to actually run a GFP_THISNODE alloc to see if you can grab a
>>> page from the proper node. If that fails then fall back. See how
>>> fallback_alloc() does it in slab.
>>
>> Hello, Christoph.
>>
>> This !node_present_pages() check ensures that allocation on this node
>> cannot succeed, so we can directly use numa_mem_id() here.
>
> Yes, of course we can use numa_mem_id(). But the check only covers not
> having any memory at all on a node. There are other reasons for
> allocations to fail on a certain node: the node could have memory that
> cannot be reclaimed, all dirty, beyond certain thresholds, not in the
> current set of allowed nodes, etc.

Yes, there are many other cases, but I prefer that we consider them
separately. Maybe they need another approach. For now, to solve the
memoryless node problem, my solution is enough and safe.

Thanks.
Re: [PATCH 2/2] clocksource: Make clocksource register functions void
On 2014/2/6 4:40, Thomas Gleixner wrote:
> Yijing,
>
> On Thu, 23 Jan 2014, David Laight wrote:
>> From: Linuxppc-dev Tony Prisk
>>> On 23/01/14 20:12, Yijing Wang wrote:
>>>> Currently, clocksource_register() and __clocksource_register_scale()
>>>> always return 0, which is pointless; make the functions void. Also
>>>> remove the dead code that checks the clocksource_register_hz()
>>>> return value.
>>>> ..
>>>> -static inline int clocksource_register_hz(struct clocksource *cs, u32 hz)
>>>> +static inline void clocksource_register_hz(struct clocksource *cs, u32 hz)
>>>>  {
>>>>  	return __clocksource_register_scale(cs, 1, hz);
>>>>  }
>>>
>>> This doesn't make sense - you are still returning a value from a
>>> function declared void, and the return is now from a function that
>>> doesn't return anything either?! Doesn't this throw a compile-time
>>> warning?
>>
>> It depends on the compiler. Recent gcc allows it. I don't know if it
>> is actually valid C though. There is no excuse for it on lines like
>> the above though.
>
> Can you please resend with that fixed against 3.14-rc1?

OK, I will resend later.

Thanks!
Yijing
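The corrected shape of the wrapper - a void function that simply calls through, with no `return expr;` (strict C forbids a return with an expression in a void function, per C99 6.8.6.4, even though g++ and recent gcc accept `return void_expr;`) - can be sketched with stub stand-ins for the kernel types:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Stub versions of the kernel structures/functions, invented here
 * purely to demonstrate the wrapper shape. */
struct clocksource { u32 mult; u32 freq; };

static int calls;

static void __clocksource_register_scale(struct clocksource *cs,
                                         u32 scale, u32 freq)
{
    cs->mult = scale;
    cs->freq = freq;
    calls++;
}

static inline void clocksource_register_hz(struct clocksource *cs, u32 hz)
{
    /* Plain call: no `return` of a value from a void function. */
    __clocksource_register_scale(cs, 1, hz);
}
```

The semantics are identical; only the invalid `return` disappears.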
Re: [RFC PATCH 2/3] topology: support node_numa_mem() for determining the fallback node
On Fri, Feb 07, 2014 at 12:51:07PM -0600, Christoph Lameter wrote:
> Here is a draft of a patch to make this work with memoryless nodes.
>
> The first thing is that we modify node_match to also match if we hit
> an empty node. In that case we simply take the current slab if it's
> there.

Why not inspect whether we can get the page on the best node, such as
the numa_mem_id() node?

> If there is no current slab then a regular allocation occurs with the
> memoryless node. The page allocator will fall back to a possible node
> and that will become the current slab. The next alloc from a
> memoryless node will then use that slab.
>
> For that we also add some tracking of allocations on nodes that were
> not satisfied using the empty_node[] array. A successful alloc on a
> node clears that flag.
>
> I would rather avoid the empty_node[] array since it's global and
> there may be thread specific allocation restrictions, but it would be
> expensive to do an allocation attempt via the page allocator to make
> sure that there is really no page available from the page allocator.
>
> Index: linux/mm/slub.c
> ===
> --- linux.orig/mm/slub.c	2014-02-03 13:19:22.896853227 -0600
> +++ linux/mm/slub.c	2014-02-07 12:44:49.311494806 -0600
> @@ -132,6 +132,8 @@ static inline bool kmem_cache_has_cpu_pa
>  #endif
>  }
>
> +static int empty_node[MAX_NUMNODES];
> +
>  /*
>   * Issues still to be resolved:
>   *
> @@ -1405,16 +1407,22 @@ static struct page *new_slab(struct kmem
>  	void *last;
>  	void *p;
>  	int order;
> +	int alloc_node;
>
>  	BUG_ON(flags & GFP_SLAB_BUG_MASK);
>
>  	page = allocate_slab(s,
>  		flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);
> -	if (!page)
> +	if (!page) {
> +		if (node != NUMA_NO_NODE)
> +			empty_node[node] = 1;
>  		goto out;
> +	}

empty_node cannot be set here for a memoryless node, since the page
allocation would succeed on a different node.

Thanks.
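A miniature model of the empty_node[] tracking in the draft: mark a node when a node-constrained allocation fails, clear the mark on success. The allocator is faked, and the names only loosely follow the patch:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NUMNODES 4
#define NUMA_NO_NODE (-1)

static int empty_node[MAX_NUMNODES];

/* Fake "does this node have free pages" state for the demo. */
static const bool node_has_pages[MAX_NUMNODES] = { true, false, true, true };

/* Fake new_slab(): succeeds iff the requested node has pages
 * (NUMA_NO_NODE means "any node", which always succeeds here). */
static bool new_slab(int node)
{
    bool ok = (node == NUMA_NO_NODE) || node_has_pages[node];

    if (!ok) {
        if (node != NUMA_NO_NODE)
            empty_node[node] = 1;   /* remember the failure */
        return false;
    }
    if (node != NUMA_NO_NODE)
        empty_node[node] = 0;       /* a successful alloc clears the flag */
    return true;
}
```

Joonsoo's objection maps directly onto this model: for a memoryless node the real page allocator would fall back and succeed on another node, so the failure path that sets empty_node[node] would never be taken for the node you actually care about.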
[RESEND PATCH 0/3] powerpc: Free up an IPI message slot for tick broadcast IPIs
This patchset is a precursor for enabling deep idle states on powerpc,
in which the local CPU timers stop. The tick broadcast framework in the
Linux kernel today handles wakeup of such CPUs at their next timer
event by using an external clock device. At the expiry of this clock
device, IPIs are sent to the CPUs in deep idle states so that they wake
up to handle their respective timers. This patchset frees up one of the
IPI slots on powerpc so that it can be used to handle the tick
broadcast IPI.

On certain implementations of powerpc, such an external clock device is
absent. The support in the tick broadcast framework to handle wakeup of
CPUs from deep idle states on such implementations is currently in the
tip tree:

https://lkml.org/lkml/2014/2/7/906
https://lkml.org/lkml/2014/2/7/876
https://lkml.org/lkml/2014/2/7/608

With the above support in place, this patchset is next in line to
enable deep idle states on powerpc.

The patchset carries a RESEND tag since nothing has changed from the
previous post except for an added config condition around
tick_broadcast(), which handles sending broadcast IPIs, and the update
in the cover letter.

---

Preeti U Murthy (1):
      cpuidle/ppc: Split timer_interrupt() into timer handling and
      interrupt handling routines

Srivatsa S. Bhat (2):
      powerpc: Free up the slot of PPC_MSG_CALL_FUNC_SINGLE IPI message
      powerpc: Implement tick broadcast IPI as a fixed IPI message

 arch/powerpc/include/asm/smp.h          |    2 -
 arch/powerpc/include/asm/time.h         |    1
 arch/powerpc/kernel/smp.c               |   25 ++---
 arch/powerpc/kernel/time.c              |   86 ++-
 arch/powerpc/platforms/cell/interrupt.c |    2 -
 arch/powerpc/platforms/ps3/smp.c        |    2 -
 6 files changed, 73 insertions(+), 45 deletions(-)
--
[RESEND PATCH 2/3] powerpc: Implement tick broadcast IPI as a fixed IPI message
From: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com

For scalability and performance reasons, we want the tick broadcast
IPIs to be handled as efficiently as possible. Fixed IPI messages are
one of the most efficient mechanisms available - they are faster than
the smp_call_function mechanism because the IPI handlers are fixed and
hence they don't involve costly operations such as adding IPI handlers
to the target CPU's function queue, acquiring locks for synchronization
etc.

Luckily we have an unused IPI message slot, so use that to implement
tick broadcast IPIs efficiently.

Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
[Functions renamed to tick_broadcast* and Changelog modified by
 Preeti U. Murthy pre...@linux.vnet.ibm.com]
Signed-off-by: Preeti U. Murthy pre...@linux.vnet.ibm.com
Acked-by: Geoff Levand ge...@infradead.org [For the PS3 part]
---
 arch/powerpc/include/asm/smp.h          |    2 +-
 arch/powerpc/include/asm/time.h         |    1 +
 arch/powerpc/kernel/smp.c               |   21 +
 arch/powerpc/kernel/time.c              |    5 +
 arch/powerpc/platforms/cell/interrupt.c |    2 +-
 arch/powerpc/platforms/ps3/smp.c        |    2 +-
 6 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 9f7356b..ff51046 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -120,7 +120,7 @@ extern int cpu_to_core_id(int cpu);
  * in /proc/interrupts will be wrong!!! --Troy */
 #define PPC_MSG_CALL_FUNCTION   0
 #define PPC_MSG_RESCHEDULE      1
-#define PPC_MSG_UNUSED		2
+#define PPC_MSG_TICK_BROADCAST	2
 #define PPC_MSG_DEBUGGER_BREAK  3

 /* for irq controllers that have dedicated ipis per message (4) */
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index c1f2676..1d428e6 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -28,6 +28,7 @@ extern struct clock_event_device decrementer_clockevent;
 struct rtc_time;
 extern void to_tm(int tim, struct rtc_time * tm);
 extern void GregorianDay(struct rtc_time *tm);
+extern void tick_broadcast_ipi_handler(void);

 extern void generic_calibrate_decr(void);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ee7d76b..e2a4232 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -35,6 +35,7 @@
 #include <asm/ptrace.h>
 #include <linux/atomic.h>
 #include <asm/irq.h>
+#include <asm/hw_irq.h>
 #include <asm/page.h>
 #include <asm/pgtable.h>
 #include <asm/prom.h>
@@ -145,9 +146,9 @@ static irqreturn_t reschedule_action(int irq, void *data)
 	return IRQ_HANDLED;
 }

-static irqreturn_t unused_action(int irq, void *data)
+static irqreturn_t tick_broadcast_ipi_action(int irq, void *data)
 {
-	/* This slot is unused and hence available for use, if needed */
+	tick_broadcast_ipi_handler();
 	return IRQ_HANDLED;
 }

@@ -168,14 +169,14 @@ static irqreturn_t debug_ipi_action(int irq, void *data)
 static irq_handler_t smp_ipi_action[] = {
 	[PPC_MSG_CALL_FUNCTION] = call_function_action,
 	[PPC_MSG_RESCHEDULE] = reschedule_action,
-	[PPC_MSG_UNUSED] = unused_action,
+	[PPC_MSG_TICK_BROADCAST] = tick_broadcast_ipi_action,
 	[PPC_MSG_DEBUGGER_BREAK] = debug_ipi_action,
 };

 const char *smp_ipi_name[] = {
 	[PPC_MSG_CALL_FUNCTION] = "ipi call function",
 	[PPC_MSG_RESCHEDULE] = "ipi reschedule",
-	[PPC_MSG_UNUSED] = "ipi unused",
+	[PPC_MSG_TICK_BROADCAST] = "ipi tick-broadcast",
 	[PPC_MSG_DEBUGGER_BREAK] = "ipi debugger",
 };

@@ -251,6 +252,8 @@ irqreturn_t smp_ipi_demux(void)
 			generic_smp_call_function_interrupt();
 		if (all & IPI_MESSAGE(PPC_MSG_RESCHEDULE))
 			scheduler_ipi();
+		if (all & IPI_MESSAGE(PPC_MSG_TICK_BROADCAST))
+			tick_broadcast_ipi_handler();
 		if (all & IPI_MESSAGE(PPC_MSG_DEBUGGER_BREAK))
 			debug_ipi_action(0, NULL);
 	} while (info->messages);
@@ -289,6 +292,16 @@ void arch_send_call_function_ipi_mask(const struct cpumask *mask)
 		do_message_pass(cpu, PPC_MSG_CALL_FUNCTION);
 }

+#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
+void tick_broadcast(const struct cpumask *mask)
+{
+	unsigned int cpu;
+
+	for_each_cpu(cpu, mask)
+		do_message_pass(cpu, PPC_MSG_TICK_BROADCAST);
+}
+#endif
+
 #if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC)
 void smp_send_debugger_break(void)
 {
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index b3dab20..3ff97db 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -825,6 +825,11 @@ static void decrementer_set_mode(enum clock_event_mode mode,
 		decrementer_set_next_event(DECREMENTER_MAX, dev);
 }

+/* Interrupt handler for the timer broadcast IPI */
+void tick_broadcast_ipi_handler(void)
+{
+}
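The demux logic this patch extends can be sketched in ordinary C. This is a simplified model: the real smp_ipi_demux() packs each message into its own byte of a per-cpu word rather than a single bit, the function takes no argument there, and the counters below are invented for the demonstration:

```c
#include <assert.h>

/* The four fixed IPI message slots, as in the patch. */
#define PPC_MSG_CALL_FUNCTION   0
#define PPC_MSG_RESCHEDULE      1
#define PPC_MSG_TICK_BROADCAST  2
#define PPC_MSG_DEBUGGER_BREAK  3

/* Simplified: one bit per message (the kernel uses one byte per
 * message inside an unsigned long). */
#define IPI_MESSAGE(m) (1u << (m))

static int call_func_count, resched_count, broadcast_count;

/* Model of the demux loop body: test each message bit and dispatch. */
static void demux_messages(unsigned int all)
{
    if (all & IPI_MESSAGE(PPC_MSG_CALL_FUNCTION))
        call_func_count++;          /* generic_smp_call_function_interrupt() */
    if (all & IPI_MESSAGE(PPC_MSG_RESCHEDULE))
        resched_count++;            /* scheduler_ipi() */
    if (all & IPI_MESSAGE(PPC_MSG_TICK_BROADCAST))
        broadcast_count++;          /* tick_broadcast_ipi_handler() */
}
```

Several messages can arrive merged into one physical IPI, which is exactly why the handler tests every bit rather than switching on a single value.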
[RESEND PATCH 1/3] powerpc: Free up the slot of PPC_MSG_CALL_FUNC_SINGLE IPI message
From: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com

The IPI handlers for both PPC_MSG_CALL_FUNC and PPC_MSG_CALL_FUNC_SINGLE
map to a common implementation -
generic_smp_call_function_single_interrupt(). So, we can consolidate
them and save one of the IPI message slots (which are precious on
powerpc, since only 4 of those slots are available).

So, implement the functionality of PPC_MSG_CALL_FUNC_SINGLE using
PPC_MSG_CALL_FUNC itself and release its IPI message slot, so that it
can be used for something else in the future, if desired.

Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com
Signed-off-by: Preeti U. Murthy pre...@linux.vnet.ibm.com
Acked-by: Geoff Levand ge...@infradead.org [For the PS3 part]
---
 arch/powerpc/include/asm/smp.h          |    2 +-
 arch/powerpc/kernel/smp.c               |   12 +---
 arch/powerpc/platforms/cell/interrupt.c |    2 +-
 arch/powerpc/platforms/ps3/smp.c        |    2 +-
 4 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 084e080..9f7356b 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -120,7 +120,7 @@ extern int cpu_to_core_id(int cpu);
  * in /proc/interrupts will be wrong!!! --Troy */
 #define PPC_MSG_CALL_FUNCTION   0
 #define PPC_MSG_RESCHEDULE      1
-#define PPC_MSG_CALL_FUNC_SINGLE	2
+#define PPC_MSG_UNUSED		2
 #define PPC_MSG_DEBUGGER_BREAK  3

 /* for irq controllers that have dedicated ipis per message (4) */
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ac2621a..ee7d76b 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -145,9 +145,9 @@ static irqreturn_t reschedule_action(int irq, void *data)
 	return IRQ_HANDLED;
 }

-static irqreturn_t call_function_single_action(int irq, void *data)
+static irqreturn_t unused_action(int irq, void *data)
 {
-	generic_smp_call_function_single_interrupt();
+	/* This slot is unused and hence available for use, if needed */
 	return IRQ_HANDLED;
 }

@@ -168,14 +168,14 @@ static irqreturn_t debug_ipi_action(int irq, void *data)
 static irq_handler_t smp_ipi_action[] = {
 	[PPC_MSG_CALL_FUNCTION] = call_function_action,
 	[PPC_MSG_RESCHEDULE] = reschedule_action,
-	[PPC_MSG_CALL_FUNC_SINGLE] = call_function_single_action,
+	[PPC_MSG_UNUSED] = unused_action,
 	[PPC_MSG_DEBUGGER_BREAK] = debug_ipi_action,
 };

 const char *smp_ipi_name[] = {
 	[PPC_MSG_CALL_FUNCTION] = "ipi call function",
 	[PPC_MSG_RESCHEDULE] = "ipi reschedule",
-	[PPC_MSG_CALL_FUNC_SINGLE] = "ipi call function single",
+	[PPC_MSG_UNUSED] = "ipi unused",
 	[PPC_MSG_DEBUGGER_BREAK] = "ipi debugger",
 };

@@ -251,8 +251,6 @@ irqreturn_t smp_ipi_demux(void)
 			generic_smp_call_function_interrupt();
 		if (all & IPI_MESSAGE(PPC_MSG_RESCHEDULE))
 			scheduler_ipi();
-		if (all & IPI_MESSAGE(PPC_MSG_CALL_FUNC_SINGLE))
-			generic_smp_call_function_single_interrupt();
 		if (all & IPI_MESSAGE(PPC_MSG_DEBUGGER_BREAK))
 			debug_ipi_action(0, NULL);
 	} while (info->messages);
@@ -280,7 +278,7 @@ EXPORT_SYMBOL_GPL(smp_send_reschedule);

 void arch_send_call_function_single_ipi(int cpu)
 {
-	do_message_pass(cpu, PPC_MSG_CALL_FUNC_SINGLE);
+	do_message_pass(cpu, PPC_MSG_CALL_FUNCTION);
 }

 void arch_send_call_function_ipi_mask(const struct cpumask *mask)
diff --git a/arch/powerpc/platforms/cell/interrupt.c b/arch/powerpc/platforms/cell/interrupt.c
index 2d42f3b..adf3726 100644
--- a/arch/powerpc/platforms/cell/interrupt.c
+++ b/arch/powerpc/platforms/cell/interrupt.c
@@ -215,7 +215,7 @@ void iic_request_IPIs(void)
 {
 	iic_request_ipi(PPC_MSG_CALL_FUNCTION);
 	iic_request_ipi(PPC_MSG_RESCHEDULE);
-	iic_request_ipi(PPC_MSG_CALL_FUNC_SINGLE);
+	iic_request_ipi(PPC_MSG_UNUSED);
 	iic_request_ipi(PPC_MSG_DEBUGGER_BREAK);
 }

diff --git a/arch/powerpc/platforms/ps3/smp.c b/arch/powerpc/platforms/ps3/smp.c
index 4b35166..00d1a7c 100644
--- a/arch/powerpc/platforms/ps3/smp.c
+++ b/arch/powerpc/platforms/ps3/smp.c
@@ -76,7 +76,7 @@ static int __init ps3_smp_probe(void)

 	BUILD_BUG_ON(PPC_MSG_CALL_FUNCTION    != 0);
 	BUILD_BUG_ON(PPC_MSG_RESCHEDULE       != 1);
-	BUILD_BUG_ON(PPC_MSG_CALL_FUNC_SINGLE != 2);
+	BUILD_BUG_ON(PPC_MSG_UNUSED           != 2);
 	BUILD_BUG_ON(PPC_MSG_DEBUGGER_BREAK   != 3);

 	for (i = 0; i < MSG_COUNT; i++) {
[RESEND PATCH 3/3] cpuidle/ppc: Split timer_interrupt() into timer handling and interrupt handling routines
From: Preeti U Murthy pre...@linux.vnet.ibm.com

Split timer_interrupt(), which is the local timer interrupt handler on
ppc, into a routine called during regular interrupt handling and
__timer_interrupt(), which takes care of running local timers and
collecting time related stats. This will enable callers interested only
in running expired local timers to directly call into
__timer_interrupt(). One of the use cases of this is the tick broadcast
IPI handling, in which the sleeping CPUs need to handle the local
timers that have expired.

Signed-off-by: Preeti U Murthy pre...@linux.vnet.ibm.com
---
 arch/powerpc/kernel/time.c |   81 +---
 1 file changed, 46 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 3ff97db..df2989b 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -478,6 +478,47 @@ void arch_irq_work_raise(void)

 #endif /* CONFIG_IRQ_WORK */

+void __timer_interrupt(void)
+{
+	struct pt_regs *regs = get_irq_regs();
+	u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
+	struct clock_event_device *evt = &__get_cpu_var(decrementers);
+	u64 now;
+
+	trace_timer_interrupt_entry(regs);
+
+	if (test_irq_work_pending()) {
+		clear_irq_work_pending();
+		irq_work_run();
+	}
+
+	now = get_tb_or_rtc();
+	if (now >= *next_tb) {
+		*next_tb = ~(u64)0;
+		if (evt->event_handler)
+			evt->event_handler(evt);
+		__get_cpu_var(irq_stat).timer_irqs_event++;
+	} else {
+		now = *next_tb - now;
+		if (now <= DECREMENTER_MAX)
+			set_dec((int)now);
+		/* We may have raced with new irq work */
+		if (test_irq_work_pending())
+			set_dec(1);
+		__get_cpu_var(irq_stat).timer_irqs_others++;
+	}
+
+#ifdef CONFIG_PPC64
+	/* collect purr register values often, for accurate calculations */
+	if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
+		struct cpu_usage *cu = &__get_cpu_var(cpu_usage_array);
+		cu->current_tb = mfspr(SPRN_PURR);
+	}
+#endif
+
+	trace_timer_interrupt_exit(regs);
+}
+
 /*
  * timer_interrupt - gets called when the decrementer overflows,
  * with interrupts disabled.
@@ -486,8 +527,6 @@ void timer_interrupt(struct pt_regs * regs)
 {
 	struct pt_regs *old_regs;
 	u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
-	struct clock_event_device *evt = &__get_cpu_var(decrementers);
-	u64 now;

 	/* Ensure a positive value is written to the decrementer, or else
 	 * some CPUs will continue to take decrementer exceptions.
@@ -519,39 +558,7 @@ void timer_interrupt(struct pt_regs * regs)
 	old_regs = set_irq_regs(regs);
 	irq_enter();

-	trace_timer_interrupt_entry(regs);
-
-	if (test_irq_work_pending()) {
-		clear_irq_work_pending();
-		irq_work_run();
-	}
-
-	now = get_tb_or_rtc();
-	if (now >= *next_tb) {
-		*next_tb = ~(u64)0;
-		if (evt->event_handler)
-			evt->event_handler(evt);
-		__get_cpu_var(irq_stat).timer_irqs_event++;
-	} else {
-		now = *next_tb - now;
-		if (now <= DECREMENTER_MAX)
-			set_dec((int)now);
-		/* We may have raced with new irq work */
-		if (test_irq_work_pending())
-			set_dec(1);
-		__get_cpu_var(irq_stat).timer_irqs_others++;
-	}
-
-#ifdef CONFIG_PPC64
-	/* collect purr register values often, for accurate calculations */
-	if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
-		struct cpu_usage *cu = &__get_cpu_var(cpu_usage_array);
-		cu->current_tb = mfspr(SPRN_PURR);
-	}
-#endif
-
-	trace_timer_interrupt_exit(regs);
-
+	__timer_interrupt();
 	irq_exit();
 	set_irq_regs(old_regs);
 }
@@ -828,6 +835,10 @@ static void decrementer_set_mode(enum clock_event_mode mode,

 /* Interrupt handler for the timer broadcast IPI */
 void tick_broadcast_ipi_handler(void)
 {
+	u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
+
+	*next_tb = get_tb_or_rtc();
+	__timer_interrupt();
 }

 static void register_decrementer_clockevent(int cpu)
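The decrementer handling that moves into __timer_interrupt() can be modelled in userspace C. The variables and set_dec() here are faked stand-ins, but the compare/re-arm logic mirrors the code in the patch:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* 31-bit limit of the decrementer, as in the kernel. */
#define DECREMENTER_MAX 0x7fffffffULL

static u64 next_tb;      /* timebase value of the next programmed event */
static int events;       /* stand-in for firing the clock event handler */
static int dec_armed;    /* last value written to the fake decrementer  */

static void set_dec(int val) { dec_armed = val; }

/* Core of __timer_interrupt(): if the timebase has reached next_tb,
 * run the event handler; otherwise re-arm with the remaining ticks. */
static void timer_check(u64 now)
{
    if (now >= next_tb) {
        next_tb = ~(u64)0;       /* no further event programmed */
        events++;                /* evt->event_handler(evt) in the kernel */
    } else {
        u64 left = next_tb - now;
        if (left <= DECREMENTER_MAX)
            set_dec((int)left);  /* re-arm for the remainder */
    }
}
```

The tick_broadcast_ipi_handler() trick in the patch then amounts to forcing `next_tb = now` before calling this logic, so the expired-timer path always runs on the woken CPU.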
Re: [PATCH] Convert powerpc simple spinlocks into ticket locks
On Thu, 2014-02-06 at 13:28 -0600, Tom Musta wrote:
> My read is consistent with Torsten's ... this looks like a bad idea.
>
> Look at the RTL for sthcx. on page 692 (Power ISA V2.06) and you will
> see this:
>
> if RESERVE then
>   if RESERVE_LENGTH = 2 then
>     ...
>   else
>     undefined_case <- 1
> else
>   ...
>
> A legal implementation might never perform the store.

This is an area where we definitely want to check with the
implementors and, if the implementations happen to do what we want
(they likely do), get the architecture changed for future chips and
use it anyway.

There's a *significant* benefit in avoiding an atomic operation in the
unlock case.

The reservation mechanism being based on a granule that is generally a
cache line, I doubt implementations will ever check the actual access
size, but we need to double check.

Cheers,
Ben.
Re: [PATCH] Convert powerpc simple spinlocks into ticket locks
On Fri, 2014-02-07 at 10:02 +0100, Torsten Duwe wrote:
>> Can you pair lwarx with sthcx? I couldn't immediately find the answer
>> in the PowerISA doc. If so, I think you can do better by being able
>> to atomically load both tickets but only storing the head without
>> affecting the tail.
>
> Can I simply write the half word, without a reservation, or will the
> HW caches mess up the other half? Will it ruin the cache coherency on
> some (sub)architectures?

Yes, you can, I *think*.

> Plus, sthcx doesn't exist on all PPC chips.
>
> Which ones are lacking it? Do all have at least a simple 16-bit store?

Half word atomics (and byte atomics) are new; they've been added in
architecture 2.06 I believe, so it's fairly recent. But it's still
worthwhile to investigate a way to avoid atomics on unlock on recent
processors (we can use instruction patching if necessary, based on CPU
features) because there's definitely a significant cost in doing a
larx/stcx. sequence on powerpc, way higher than our current unlock path
of barrier + store.

Cheers,
Ben.
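The idea under discussion - unlocking by storing only the head half of the ticket pair - can be illustrated with a plain C union. This is a host-endianness sketch (the actual patch flips the field order for big-endian), and on powerpc the plain store would be preceded by a barrier:

```c
#include <assert.h>
#include <stdint.h>

/* Head and tail tickets packed into one 32-bit word, so both can be
 * loaded atomically while the head can be stored on its own.
 * Field order shown for a little-endian host running this demo. */
union ticketlock {
    uint32_t head_tail;
    struct {
        uint16_t head;   /* ticket currently being served */
        uint16_t tail;   /* next free ticket              */
    } tickets;
};

static void unlock(union ticketlock *lock)
{
    /* On powerpc this would be lwsync (barrier) + a plain sth:
     * no larx/stcx. needed, and the tail half is untouched. */
    lock->tickets.head++;
}
```

The whole point of the layout is visible here: the 16-bit store to `head` cannot disturb a concurrent taker incrementing `tail`, provided the hardware keeps the two halves coherent (which is what Torsten is asking about).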
Re: [PATCH] Convert powerpc simple spinlocks into ticket locks
On Fri, 2014-02-07 at 09:51 -0600, Kumar Gala wrote:
> On Feb 7, 2014, at 3:02 AM, Torsten Duwe d...@lst.de wrote:
>> On Thu, Feb 06, 2014 at 02:19:52PM -0600, Scott Wood wrote:
>>> On Thu, 2014-02-06 at 18:37 +0100, Torsten Duwe wrote:
>>>> On Thu, Feb 06, 2014 at 05:38:37PM +0100, Peter Zijlstra wrote:
>>>>> Can you pair lwarx with sthcx? I couldn't immediately find the
>>>>> answer in the PowerISA doc. If so I think you can do better by
>>>>> being able to atomically load both tickets but only storing the
>>>>> head without affecting the tail.
>>>>
>>>> Can I simply write the half word, without a reservation, or will
>>>> the HW caches mess up the other half? Will it ruin the cache
>>>> coherency on some (sub)architectures?
>
> The coherency should be fine, I just can't remember if you'll lose
> the reservation by doing this.

Yes, you do.

>> Plus, sthcx doesn't exist on all PPC chips.
>>
>> Which ones are lacking it? Do all have at least a simple 16-bit
>> store?

Everything implements a simple 16-bit store, just not everything
implements the store conditional of 16-bit data.

Ben.

> - k
Re: [PATCH v2] powerpc ticket locks
On Fri, 2014-02-07 at 17:58 +0100, Torsten Duwe wrote:
>  typedef struct {
> -	volatile unsigned int slock;
> -} arch_spinlock_t;
> +	union {
> +		__ticketpair_t head_tail;
> +		struct __raw_tickets {
> +#ifdef __BIG_ENDIAN__	/* The tail part should be in the MSBs */
> +			__ticket_t tail, head;
> +#else
> +			__ticket_t head, tail;
> +#endif
> +		} tickets;
> +	};
> +#if defined(CONFIG_PPC_SPLPAR)
> +	u32 holder;
> +#endif
> +} arch_spinlock_t __aligned(4);

That's still broken with lockref (which we just merged).

We must have the arch_spinlock_t and the ref in the same 64-bit word,
otherwise it will break.

We can make it work in theory since the holder doesn't have to be
accessed atomically, but the practicals are a complete mess ... lockref
would essentially have to re-implement the holder handling of the
spinlocks and use lower level ticket stuff.

Unless you can find a sneaky trick ... :-(

Ben.
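Ben's objection can be made concrete with sizeof arithmetic: lockref cmpxchg-es a single 64-bit word containing both the spinlock and the reference count, so the spinlock half must fit in 32 bits. A 4-byte ticket lock fits; the variant with a `holder` field leaves no room for the count. The type names below are invented, simplified stand-ins:

```c
#include <assert.h>
#include <stdint.h>

/* Ticket lock without holder tracking: 4 bytes. */
typedef struct {
    union {
        uint32_t head_tail;
        struct { uint16_t tail, head; } tickets;
    };
} arch_spinlock_small;

/* Ticket lock with a holder field (the PPC_SPLPAR case in the
 * patch): already 8 bytes on its own. */
typedef struct {
    union {
        uint32_t head_tail;
        struct { uint16_t tail, head; } tickets;
    };
    uint32_t holder;
} arch_spinlock_holder;

/* Model of lockref's requirement: lock + count in one 64-bit word
 * that can be operated on with a single cmpxchg. */
struct lockref_model {
    union {
        uint64_t lock_count;
        struct {
            arch_spinlock_small lock;
            uint32_t count;
        };
    };
};
```

With the holder field the lock alone consumes the whole 64-bit word, which is exactly why lockref would have to reimplement the holder handling at a lower level.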
Re: [PATCH 1/2] PPC: powernv: remove redundant cpuidle_idle_call()
Hi Peter,

On 02/07/2014 06:11 PM, Peter Zijlstra wrote:
> On Fri, Feb 07, 2014 at 05:11:26PM +0530, Preeti U Murthy wrote:
>> But observe the idle state snooze on powerpc. The power that this
>> idle state saves is through the lowering of the thread priority of
>> the CPU. After it lowers the thread priority, it is done. It cannot
>> wait_for_interrupts. It will exit my_idle(). It is now up to the
>> generic idle loop to increase the thread priority if the
>> need_resched flag is set. Only an interrupt routine can increase the
>> thread priority; else we will need to do it explicitly. And in such
>> states, which have a polling nature, the cpu will not receive a
>> reschedule IPI.
>>
>> That is why in snooze_loop() we poll on need_resched. If it is set,
>> we up the priority of the thread using HMT_MEDIUM() and then exit
>> the my_idle() loop. In case of interrupts, the priority gets
>> automatically increased.
>
> You can poll without setting TS_POLLING/TIF_POLLING_NRFLAG just fine
> and get the IPI if that is what you want. Depending on how horribly
> unprovisioned the thread gets at the lowest priority, that might
> actually be faster than polling and raising the prio whenever it does
> get ran.

So I am assuming you mean something like the below:

my_idle()
{
	local_irq_enable();
	/* Remove the setting of the polling flag */
	HMT_low();
	return index;
}

And then exit into the generic idle loop. But the issue I see here is
that the TS_POLLING/TIF_POLLING_NRFLAG gets set immediately. So, if on
testing need_resched() immediately after this returns the
TIF_NEED_RESCHED flag is set, the thread will exit at low priority,
right? We could raise the priority of the thread in
arch_cpu_idle_exit() soon after setting the polling flag, but that
would mean that for cases where the TIF_NEED_RESCHED flag is not set we
unnecessarily raise the priority of the thread.

Thanks

Regards
Preeti U Murthy
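Preeti's description of the snooze loop can be modelled roughly as follows. The `prio` variable stands in for the HMT_low()/HMT_MEDIUM() thread-priority effects, and everything here is an invented single-threaded approximation of the real per-cpu logic:

```c
#include <assert.h>
#include <stdbool.h>

static int prio = 2;               /* 2 = medium priority, 1 = low */
static bool need_resched_flag;

static bool need_resched(void) { return need_resched_flag; }

/* Model of snooze: drop thread priority, poll need_resched (no
 * reschedule IPI arrives in a polling state), restore priority
 * before handing control back to the generic idle loop. */
static void snooze_loop(void)
{
    prio = 1;                      /* HMT_low(): cede execution resources */
    while (!need_resched())
        ;                          /* poll; an interrupt would also raise prio */
    prio = 2;                      /* HMT_MEDIUM(): restore before exit */
}
```

The point of contention in the thread is exactly the last line: if the priority raise is left to the generic idle loop instead, the thread may briefly run the exit path at low priority.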