Re: [Patch 1/1] PPC64-HWBKPT: Implement hw-breakpoints for PPC64
On Wed, Apr 07, 2010 at 06:03:31PM +1000, Benjamin Herrenschmidt wrote: Ok so too many problems with your last patch, I didn't have time to fix them all, so it's not going into -next this week. Please, test with a variety of defconfigs (iseries, cell, g5 for example), and especially with CONFIG_PERF_EVENTS not set. There are issues in the generic header for that (though I'm told some people are working on a fix). Basically, we need something like CONFIG_HW_BREAKPOINTS that is set (maybe optionally, maybe not) with CONFIG_PERF_EVENTS and CONFIG_HAVE_HW_BREAKPOINT. Then you can use that to properly fix the ifdef'ing in include/linux/hw_breakpoints.h, and fix the various other cases in powerpc code where you are testing for the wrong thing. Cheers, Ben. Hi Ben, I've sent a new patch (linuxppc-dev message-id ref: 20100414034340.ga6...@in.ibm.com) that builds against the defconfigs for various architectures pointed out by you (I did see quite a few errors that aren't induced by the patch). The source tree is buildable even without CONFIG_PERF_EVENTS, and is limited to scope using CONFIG_HAVE_HW_BREAKPOINT. At this stage, I didnot find a need for a seperate CONFIG_HW_BREAKPOINTS though. Let me know what you think. Thanks, K.Prasad ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [Patch 1/1] PPC64-HWBKPT: Implement hw-breakpoints for PPC64
Ok so too many problems with your last patch, I didn't have time to fix them all, so it's not going into -next this week. Please, test with a variety of defconfigs (iseries, cell, g5 for example), and especially with CONFIG_PERF_EVENTS not set. There are issues in the generic header for that (though I'm told some people are working on a fix). Basically, we need something like CONFIG_HW_BREAKPOINTS that is set (maybe optionally, maybe not) with CONFIG_PERF_EVENTS and CONFIG_HAVE_HW_BREAKPOINT. Then you can use that to properly fix the ifdef'ing in include/linux/hw_breakpoints.h, and fix the various other cases in powerpc code where you are testing for the wrong thing. Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [Patch 1/1] PPC64-HWBKPT: Implement hw-breakpoints for PPC64
On Tue, Mar 23, 2010 at 04:33:01PM +1100, Paul Mackerras wrote: On Mon, Mar 08, 2010 at 11:44:48PM +0530, K.Prasad wrote: @@ -479,6 +483,7 @@ struct task_struct *__switch_to(struct t old_thread-accum_tb += (current_tb - start_tb); new_thread-start_tb = current_tb; } + flush_ptrace_hw_breakpoint(current); #endif local_irq_save(flags); This line should be in flush_thread(), not __switch_to(). In fact it may not be necessary at all given that flush_ptrace_hw_breakpoint() gets called in do_exit(). Paul. Yes, I did realise it. The unintended movement of flush_ptrace_hw_breakpoint() from flush_thread() is a result of a patching error (while forward porting). A fix for this issue along with those pointed out by BenH will be a part of the next version of the patch, that I intend to send very soon. Thanks for reviewing them. -- K.Prasad ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [Patch 1/1] PPC64-HWBKPT: Implement hw-breakpoints for PPC64
On Mon, Mar 08, 2010 at 11:44:48PM +0530, K.Prasad wrote: @@ -479,6 +483,7 @@ struct task_struct *__switch_to(struct t old_thread-accum_tb += (current_tb - start_tb); new_thread-start_tb = current_tb; } + flush_ptrace_hw_breakpoint(current); #endif local_irq_save(flags); This line should be in flush_thread(), not __switch_to(). In fact it may not be necessary at all given that flush_ptrace_hw_breakpoint() gets called in do_exit(). Paul. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [Patch 1/1] PPC64-HWBKPT: Implement hw-breakpoints for PPC64
On Fri, Mar 12, 2010 at 05:19:36PM +1100, Benjamin Herrenschmidt wrote: Index: linux-2.6.ppc64_test/arch/powerpc/include/asm/hw_breakpoint.h === --- /dev/null +++ linux-2.6.ppc64_test/arch/powerpc/include/asm/hw_breakpoint.h @@ -0,0 +1,54 @@ +#ifndef_PPC64_HW_BREAKPOINT_H +#define_PPC64_HW_BREAKPOINT_H + +#ifdef __KERNEL__ +#define__ARCH_HW_BREAKPOINT_H +#ifdef CONFIG_PPC64 + +struct arch_hw_breakpoint { + u8 len; /* length of the target symbol */ I don't understand the usage of the word symbol above, can you explain ? The word symbol, here, carries a meaning similar to what is derived from its usage in the word kernel data symbols - although it is used to store lengths for both kernel and user-space breakpoints. Since the desired length of the breakpoint is typically determined by the size of the symbol (variable) being monitored (not exceeding 8-Bytes), the comment was such. I shall change it to a more descriptive one, such as length of the target kernel/user data symbol if you prefer that. + int type; + unsigned long address; +}; + +#include linux/kdebug.h +#include asm/reg.h +#include asm/system.h + +/* Total number of available HW breakpoint registers */ +#define HBP_NUM 1 + +struct perf_event; +struct pmu; +struct perf_sample_data; + +#define HW_BREAKPOINT_ALIGN 0x7 +/* Maximum permissible length of any HW Breakpoint */ +#define HW_BREAKPOINT_LEN 0x8 That's a lot of server-only hard wired assumptions... I suppose the DABR emulation of BookE will catch but do you intend to provide proper BookE support at some stage ? Yes, I intend to extend support for Book-E sometime soon during which the above code would have to be either a) enclosed inside #ifdef PPC64 or b) a new header file for BookE debug register definitions are used (guess Shaggy's code would have many of them done already). +static inline void hw_breakpoint_disable(void) +{ + set_dabr(0); +} How much of these set_dabr() I see here are going to interact with ptrace ? Is there some exclusion going on between ptrace and perf event use of the DABR or none at all ? Or are you replacing the ptrace bits ? This set_dabr() in hw_breakpoint_disable() is to be called only once during machine_kexec() [which I realise is missing in the patch...will add that] and will not conflict with ptrace. The other set_dabr() in arch_uninstall_hw_breakpoint() will perform the actual debug register write, while the ptrace_getset_debugreg() requests are routed through them (via the hw-breakpoint interfaces for arbitration as shown below) and are designed to not conflict with each other. ptrace_set_debugreg()--register_user_hw_breakpoint() ... (sched)--perf_event_task_sched_inout()---arch_uninstall_hw_breakpoint() In short, an existing ptrace request will not be overwritten/replaced to accommodate a new kernel/user-space request (which will fail because of DABR unavailability). +/* + * Install a perf counter breakpoint. + * + * We seek a free debug address register and use it for this + * breakpoint. + * + * Atomic: we hold the counter-ctx-lock and we only handle variables + * and registers local to this cpu. + */ +int arch_install_hw_breakpoint(struct perf_event *bp) +{ + struct arch_hw_breakpoint *info = counter_arch_bp(bp); + struct perf_event **slot = __get_cpu_var(bp_per_reg); + + if (!*slot) + *slot = bp; + else { + WARN_ONCE(1, Can't find any breakpoint slot); + return -EBUSY; + } + + set_dabr(info-address | info-type | DABR_TRANSLATION); + return 0; +} Under which circumstances will the upper layer call that more than once ? If it's a legit thing to do, then the WARN_ONCE() is a heavy hammer here. I wouldn't even printk or only pr_debug() if it's really worth it. Or is that something that should just not happen ? I don't really see when this can happen in PPC64 with one DABR. The slot reservation mechanism wouldn't allow this to happen and the code is here since it got inherited from x86. I shall remove the check and hence the WARN_ONCE. I would also use this coding style which is more compact and avoids the horrible (!*slot) : /* Check if the slot is busy */ if (*slot) return -EBUSY; set_dabr(...); +/* + * Uninstall the breakpoint contained in the given counter. + * + * First we search the debug address register it uses and then we disable + * it. + * + * Atomic: we hold the counter-ctx-lock and we only handle variables + * and registers local to this cpu. + */ +void arch_uninstall_hw_breakpoint(struct perf_event *bp) +{ + struct perf_event **slot = __get_cpu_var(bp_per_reg); + + if (*slot == bp) + *slot = NULL; + else { + WARN_ONCE(1,
Re: [Patch 1/1] PPC64-HWBKPT: Implement hw-breakpoints for PPC64
Index: linux-2.6.ppc64_test/arch/powerpc/include/asm/hw_breakpoint.h === --- /dev/null +++ linux-2.6.ppc64_test/arch/powerpc/include/asm/hw_breakpoint.h @@ -0,0 +1,54 @@ +#ifndef _PPC64_HW_BREAKPOINT_H +#define _PPC64_HW_BREAKPOINT_H + +#ifdef __KERNEL__ +#define __ARCH_HW_BREAKPOINT_H +#ifdef CONFIG_PPC64 + +struct arch_hw_breakpoint { + u8 len; /* length of the target symbol */ I don't understand the usage of the word symbol above, can you explain ? + int type; + unsigned long address; +}; + +#include linux/kdebug.h +#include asm/reg.h +#include asm/system.h + +/* Total number of available HW breakpoint registers */ +#define HBP_NUM 1 + +struct perf_event; +struct pmu; +struct perf_sample_data; + +#define HW_BREAKPOINT_ALIGN 0x7 +/* Maximum permissible length of any HW Breakpoint */ +#define HW_BREAKPOINT_LEN 0x8 That's a lot of server-only hard wired assumptions... I suppose the DABR emulation of BookE will catch but do you intend to provide proper BookE support at some stage ? +static inline void hw_breakpoint_disable(void) +{ + set_dabr(0); +} How much of these set_dabr() I see here are going to interact with ptrace ? Is there some exclusion going on between ptrace and perf event use of the DABR or none at all ? Or are you replacing the ptrace bits ? +/* + * Install a perf counter breakpoint. + * + * We seek a free debug address register and use it for this + * breakpoint. + * + * Atomic: we hold the counter-ctx-lock and we only handle variables + * and registers local to this cpu. + */ +int arch_install_hw_breakpoint(struct perf_event *bp) +{ + struct arch_hw_breakpoint *info = counter_arch_bp(bp); + struct perf_event **slot = __get_cpu_var(bp_per_reg); + + if (!*slot) + *slot = bp; + else { + WARN_ONCE(1, Can't find any breakpoint slot); + return -EBUSY; + } + + set_dabr(info-address | info-type | DABR_TRANSLATION); + return 0; +} Under which circumstances will the upper layer call that more than once ? If it's a legit thing to do, then the WARN_ONCE() is a heavy hammer here. I wouldn't even printk or only pr_debug() if it's really worth it. Or is that something that should just not happen ? I would also use this coding style which is more compact and avoids the horrible (!*slot) : /* Check if the slot is busy */ if (*slot) return -EBUSY; set_dabr(...); +/* + * Uninstall the breakpoint contained in the given counter. + * + * First we search the debug address register it uses and then we disable + * it. + * + * Atomic: we hold the counter-ctx-lock and we only handle variables + * and registers local to this cpu. + */ +void arch_uninstall_hw_breakpoint(struct perf_event *bp) +{ + struct perf_event **slot = __get_cpu_var(bp_per_reg); + + if (*slot == bp) + *slot = NULL; + else { + WARN_ONCE(1, Can't find the breakpoint slot); + return; + } + set_dabr(0); +} Similar coding style issues... That one might be worth the warning as I suppose the core should -really- not try to uninstall a bp that hasn't been installed in the first place. +/* + * Validate the arch-specific HW Breakpoint register settings + */ +int arch_validate_hwbkpt_settings(struct perf_event *bp, + struct task_struct *tsk) +{ + int is_kernel, ret = -EINVAL; + struct arch_hw_breakpoint *info = counter_arch_bp(bp); + + if (!bp) + return ret; + + switch (bp-attr.bp_type) { + case HW_BREAKPOINT_R: + info-type = DABR_DATA_READ; + break; + case HW_BREAKPOINT_W: + info-type = DABR_DATA_WRITE; + break; + case HW_BREAKPOINT_R | HW_BREAKPOINT_W: + info-type = (DABR_DATA_READ | DABR_DATA_WRITE); + break; + default: + return ret; + } I'm not -too- fan of the above, I suppose I would have written it a bit differently using if's but that's not a big deal... however: + /* TODO: Check for a valid triggered function */ + /* if (!bp-triggered) + return -EINVAL; */ What is that ? Is the patch incomplete ? Don't leave commented out code in there. If you think there's a worthwhile improvement, then add a comment with maybe a bit more explanations, and make it clear that the patch is still useful without the code, but don't just leave commented out code like that without a good reason. A good reason would be some optional debug stuff for example, but then an ifdef is preferrable to comments. + is_kernel = is_kernel_addr(bp-attr.bp_addr); + if ((tsk is_kernel) || (!tsk !is_kernel)) + return -EINVAL; + + info-address =
Re: [Patch 1/1] PPC64-HWBKPT: Implement hw-breakpoints for PPC64
On Fri, Feb 26, 2010 at 02:58:12AM +0100, Frederic Weisbecker wrote: On Mon, Feb 22, 2010 at 06:47:46PM +0530, K.Prasad wrote: [snip] Oh, why does ptrace use a one-shot behaviour in ppc? Breakpoints only trigger once? Yes, ptrace breakpoints on PPC64 are designed to trigger once and this patch retains that behaviour. It is very convenient to use one-shot behaviour on archs where exceptions are triggered-before-execute. Ah, Why? Because otherwise you have jump through some tricky hoops so that when gdb (or whatever) resumes after the breakpoint, you don't immediately retrap in the same place. I gather x86 has hardware assistance to do this, but powerpc doesn't. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [Patch 1/1] PPC64-HWBKPT: Implement hw-breakpoints for PPC64
On Fri, Feb 26, 2010 at 02:58:12AM +0100, Frederic Weisbecker wrote: On Mon, Feb 22, 2010 at 06:47:46PM +0530, K.Prasad wrote: The 'name' field here is actually a legacy inherited from x86 code. It is part of x86's arch-specific hw-breakpoint structure since: - inspired by the kprobe implementation which accepts symbol name as input. - kallsyms_lookup_name() was 'unexported' and a module could not resolve symbol names externally, so the core-infrastructure had to provide such facilities. - I wasn't sure about the discussions behind 'unexporting' of kallsyms_lookup_name(), so did not venture to export them again (which you rightfully did :-) Having said that, I'll be glad to remove this field (along with that in x86), Cool, I'll integrate the x86 name field removal to the .24 series I've removed the .name field in the latest version of the patch sent (linuxppc-dev message-id: 20100308181232.ga3...@in.ibm.com). +void ptrace_triggered(struct perf_event *bp, int nmi, + struct perf_sample_data *data, struct pt_regs *regs) +{ + struct perf_event_attr attr; + + /* +* Disable the breakpoint request here since ptrace has defined a +* one-shot behaviour for breakpoint exceptions in PPC64. +* The SIGTRAP signal is generated automatically for us in do_dabr(). +* We don't have to do anything about that here +*/ Oh, why does ptrace use a one-shot behaviour in ppc? Breakpoints only trigger once? Yes, ptrace breakpoints on PPC64 are designed to trigger once and this patch retains that behaviour. It is very convenient to use one-shot behaviour on archs where exceptions are triggered-before-execute. Ah, Why? Because we don't have to create the single_step_dabr_instruction() function :-) With one-shot behaviour, the hw_breakpoint_handler() doesn't have to worry about entering into an infinite-loop (since exceptions are triggered before instruction execution, and if breakpoints are still active every attempt to execute the causative instruction raises the breakpoint exception). To circumvent infinite looping, we temporarily disable the breakpoints, enable single-stepping (to single-step over the causative instruction) and re-enable them inside single-step exception handler. For the one-shot type breakpoint, the pain of re-registration is left to the end-user. I've wondered why PowerPC was designed to be 'trigger-before-execute' when handling the exception can grow complex as this. Perhaps it is meant to give absolute control over the data value (stored in the target address) before the instruction gets to see it. For instance, breakpoints can be used to change the data to a 'desirable' value before actually being 'seen' by instructions. Thanks, K.Prasad ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [Patch 1/1] PPC64-HWBKPT: Implement hw-breakpoints for PPC64
On Tue, Feb 23, 2010 at 04:27:15PM +0530, K.Prasad wrote: On Mon, Feb 22, 2010 at 06:47:46PM +0530, K.Prasad wrote: On Sun, Feb 21, 2010 at 02:01:37AM +0100, Frederic Weisbecker wrote: On Mon, Feb 15, 2010 at 11:29:14AM +0530, K.Prasad wrote: [snipped] Also, do you think addr/len/type is enough to abstract out any ppc breakpoints? This looks enough to me to express range breakpoints and simple breakpoints. But what about value comparison? (And still, there may be other trickier implementations I don't know in ppc). The above implementation is for PPC64 architecture that supports only 'simple' breakpoints of fixed length (no range breakpoints, no value comparison). More on that below. Looks like I forgot the 'more on that below' part :-)here are some thoughts... Architectures like PPC Book-E have support for a variety of sophisticated debug features and our generic framework, in its present form, cannot easily port itself to these processors. In order to extend the framework for PPC Book-E, I intend the following to begin with: - Implement support for data breakpoints through DAC registers with all the 'bells and whistles'...support for instruction breakpoints through IAC can come in later (without precluding its use through ptrace). - Embed the flags/variables to store DVC, masked address mode, etc. in 'struct arch_hw_breakpoint', which will be populated by the user of register_breakpoint interface. Agreed. Apart from the above extensions to the framework, changes in the generic code would be required as described in an earlier LKML mail (ref: message-id: 20091127190705.gb18...@in.ibm.com)relevant contents pasted below: I think the register_ interfaces can become wrappers around functions that do the following: - arch_validate(): Validate request by invoking an arch-dependant routine. Proceed if returned valid. - arch-specific debugreg availability: Do something like if (arch_hw_breakpoint_availabile()) bp = perf_event_create_kernel_counter(); This is already what does register_hw_break(), it fails if a slot is not available: perf_event_create_kernel_counter - perf_bp_init() - reserve_bp_slot() Having a: if (arch_hw_breakpoint_availabile()) bp = perf_event_create_kernel_counter(); would be racy. perf_event_create_kernel_counter()---arch_install_hw_breakpoint(); This way, all book-keeping related work (no. of pinned/flexible/per-cpu) will be moved to arch-specific files (will be helpful for PPC Book-E implementation having two types of debug registers). Every new architecture that intends to port to the new hw-breakpoint implementation must define their arch_validate(), arch_hw_breakpoint_available() and an arch_install_hw_breakpoint(), while the hw-breakpoint code will be flexible enough to extend itself to each of these archs. Let me know what you think of the above. We certainly need the slot reservation in arch (a part of it at least). But we also need a kind of new interface for arch predefined attributes, instead of generic attributes. Probably we need a kind of perf_event_create_kernel_counter() that can accept either a perf_event_attr (for perf syscall or ftrace) and an arch structure that can be passed to the breakpoint API, so that we don't need the generic translation. Thanks, K.Prasad ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [Patch 1/1] PPC64-HWBKPT: Implement hw-breakpoints for PPC64
On Mon, Feb 22, 2010 at 06:47:46PM +0530, K.Prasad wrote: The 'name' field here is actually a legacy inherited from x86 code. It is part of x86's arch-specific hw-breakpoint structure since: - inspired by the kprobe implementation which accepts symbol name as input. - kallsyms_lookup_name() was 'unexported' and a module could not resolve symbol names externally, so the core-infrastructure had to provide such facilities. - I wasn't sure about the discussions behind 'unexporting' of kallsyms_lookup_name(), so did not venture to export them again (which you rightfully did :-) Having said that, I'll be glad to remove this field (along with that in x86), Cool, I'll integrate the x86 name field removal to the .24 series provided we know that there's a way for the user to resolve symbol names on its own i.e. routines like kallsyms_lookup_name() will remain exported. Yeah, I guess it's fine to keep kallsyms_lookup_name() exported. Also, do you think addr/len/type is enough to abstract out any ppc breakpoints? This looks enough to me to express range breakpoints and simple breakpoints. But what about value comparison? (And still, there may be other trickier implementations I don't know in ppc). The above implementation is for PPC64 architecture that supports only 'simple' breakpoints of fixed length (no range breakpoints, no value comparison). More on that below. Ok. I was just a bit confused in the middle of the several PPC breakpoint implementations :) + /* + * As a policy, the callback is invoked in a 'trigger-after-execute' + * fashion + */ + (bp-overflow_handler)(bp, 0, NULL, regs); Why are you calling this explicitly instead of using the perf_bp_event() thing? This looks like it won't work with perf as the event won't be recorded by perf. Yes, should have invoked perf_bp_event() for perf to work well (on a side note, it makes me wonder at the amount of 'extra' code that each breakpoint exception would execute if it were not called through perf sys-call...well, the costs of integrating with a generic infrastructure!) It has the benefit of not adding extra checks in the breakpoint handler, like checking the callback. Every breakpoint is treated the same way, which makes the code more simple. +void ptrace_triggered(struct perf_event *bp, int nmi, + struct perf_sample_data *data, struct pt_regs *regs) +{ + struct perf_event_attr attr; + + /* + * Disable the breakpoint request here since ptrace has defined a + * one-shot behaviour for breakpoint exceptions in PPC64. + * The SIGTRAP signal is generated automatically for us in do_dabr(). + * We don't have to do anything about that here + */ Oh, why does ptrace use a one-shot behaviour in ppc? Breakpoints only trigger once? Yes, ptrace breakpoints on PPC64 are designed to trigger once and this patch retains that behaviour. It is very convenient to use one-shot behaviour on archs where exceptions are triggered-before-execute. Ah, Why? This looks fine for basic breakpoints. And this can probably be improved to integrate ranges. But I think we did something wrong with the generic breakpoint interface. We are translating the arch values to generic attributes. Then this all will be translated back to arch values. Having generic attributes is necessary for any perf event use from userspace. But it looks like a waste for ptrace that already gives us arch values. And the problem is the same for x86. So I think we should implement a register_ptrace_breakpoint() that doesn't take perf_event_attr but specific arch informations, so that we don't need to pass through a generic conversion, which: I agree that the layers of conversion from generic to arch-specific breakpoint constants is wasteful. Can't the arch_bp_generic_fields() function be moved to arch/x86/kernel/ptrace.c instead of a new interface? I'll answer in your subsequent mail :) - is wasteful - won't be able to express 100% of any arch capabilities. We can certainly express most arch breakpoints features through the generic interface, but not all of them (given how tricky the data value comparison features can be) I will rework that during the next cycle. Thanks. Thank you for the comments. I will re-send a new version of the patch with the perf_bp_event() substitution. Thanks. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [Patch 1/1] PPC64-HWBKPT: Implement hw-breakpoints for PPC64
On Mon, Feb 22, 2010 at 06:47:46PM +0530, K.Prasad wrote: On Sun, Feb 21, 2010 at 02:01:37AM +0100, Frederic Weisbecker wrote: On Mon, Feb 15, 2010 at 11:29:14AM +0530, K.Prasad wrote: [snipped] Also, do you think addr/len/type is enough to abstract out any ppc breakpoints? This looks enough to me to express range breakpoints and simple breakpoints. But what about value comparison? (And still, there may be other trickier implementations I don't know in ppc). The above implementation is for PPC64 architecture that supports only 'simple' breakpoints of fixed length (no range breakpoints, no value comparison). More on that below. Looks like I forgot the 'more on that below' part :-)here are some thoughts... Architectures like PPC Book-E have support for a variety of sophisticated debug features and our generic framework, in its present form, cannot easily port itself to these processors. In order to extend the framework for PPC Book-E, I intend the following to begin with: - Implement support for data breakpoints through DAC registers with all the 'bells and whistles'...support for instruction breakpoints through IAC can come in later (without precluding its use through ptrace). - Embed the flags/variables to store DVC, masked address mode, etc. in 'struct arch_hw_breakpoint', which will be populated by the user of register_breakpoint interface. Apart from the above extensions to the framework, changes in the generic code would be required as described in an earlier LKML mail (ref: message-id: 20091127190705.gb18...@in.ibm.com)relevant contents pasted below: I think the register_ interfaces can become wrappers around functions that do the following: - arch_validate(): Validate request by invoking an arch-dependant routine. Proceed if returned valid. - arch-specific debugreg availability: Do something like if (arch_hw_breakpoint_availabile()) bp = perf_event_create_kernel_counter(); perf_event_create_kernel_counter()---arch_install_hw_breakpoint(); This way, all book-keeping related work (no. of pinned/flexible/per-cpu) will be moved to arch-specific files (will be helpful for PPC Book-E implementation having two types of debug registers). Every new architecture that intends to port to the new hw-breakpoint implementation must define their arch_validate(), arch_hw_breakpoint_available() and an arch_install_hw_breakpoint(), while the hw-breakpoint code will be flexible enough to extend itself to each of these archs. Let me know what you think of the above. Thanks, K.Prasad ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [Patch 1/1] PPC64-HWBKPT: Implement hw-breakpoints for PPC64
On Sun, Feb 21, 2010 at 02:01:37AM +0100, Frederic Weisbecker wrote: On Mon, Feb 15, 2010 at 11:29:14AM +0530, K.Prasad wrote: +struct arch_hw_breakpoint { + u8 len; /* length of the target symbol */ + int type; + char*name; /* Contains name of the symbol to set bkpt */ + unsigned long address; +}; I don't think it's a good idea to integrate the name of the target. This is something that should be done in a higher level, not in an arch backend. We don't even need to store it anywhere as we can resolve back an address easily. Symbol awareness is not something the hardware breakpoint should care about, neither in the arch nor the generic level. The 'name' field here is actually a legacy inherited from x86 code. It is part of x86's arch-specific hw-breakpoint structure since: - inspired by the kprobe implementation which accepts symbol name as input. - kallsyms_lookup_name() was 'unexported' and a module could not resolve symbol names externally, so the core-infrastructure had to provide such facilities. - I wasn't sure about the discussions behind 'unexporting' of kallsyms_lookup_name(), so did not venture to export them again (which you rightfully did :-) Having said that, I'll be glad to remove this field (along with that in x86), provided we know that there's a way for the user to resolve symbol names on its own i.e. routines like kallsyms_lookup_name() will remain exported. Also, do you think addr/len/type is enough to abstract out any ppc breakpoints? This looks enough to me to express range breakpoints and simple breakpoints. But what about value comparison? (And still, there may be other trickier implementations I don't know in ppc). The above implementation is for PPC64 architecture that supports only 'simple' breakpoints of fixed length (no range breakpoints, no value comparison). More on that below. + +#include linux/kdebug.h +#include asm/reg.h +#include asm/system.h + +/* Total number of available HW breakpoint registers */ +#define HBP_NUM 1 Looking at the G2 PowerPc implementation, DABR and DABR2 can either express two different watchpoints or one range watchpoint. There are also IABR and IABR2 for instruction breakpoints that follow the same above scheme. I'm not sure we can abstract that using a constant max linear number of resources. As stated above, the patch implements breakpoints for PPC64 processors only (http://www.power.org/resources/downloads/PowerISA_V2.06_PUBLIC.pdf), which does not support advanced breakpoint features (which its ppc Book-E counterpart has). +static inline void hw_breakpoint_disable(void) +{ + set_dabr(0); +} So, this is only about data breakpoints? Yes, newer PPC64 processors do not support IABR. + /* +* As a policy, the callback is invoked in a 'trigger-after-execute' +* fashion +*/ + (bp-overflow_handler)(bp, 0, NULL, regs); Why are you calling this explicitly instead of using the perf_bp_event() thing? This looks like it won't work with perf as the event won't be recorded by perf. Yes, should have invoked perf_bp_event() for perf to work well (on a side note, it makes me wonder at the amount of 'extra' code that each breakpoint exception would execute if it were not called through perf sys-call...well, the costs of integrating with a generic infrastructure!) +void ptrace_triggered(struct perf_event *bp, int nmi, + struct perf_sample_data *data, struct pt_regs *regs) +{ + struct perf_event_attr attr; + + /* +* Disable the breakpoint request here since ptrace has defined a +* one-shot behaviour for breakpoint exceptions in PPC64. +* The SIGTRAP signal is generated automatically for us in do_dabr(). +* We don't have to do anything about that here +*/ Oh, why does ptrace use a one-shot behaviour in ppc? Breakpoints only trigger once? Yes, ptrace breakpoints on PPC64 are designed to trigger once and this patch retains that behaviour. It is very convenient to use one-shot behaviour on archs where exceptions are triggered-before-execute. + if (bp) { + attr = bp-attr; + attr.bp_addr = data ~HW_BREAKPOINT_ALIGN; + + switch (data (DABR_DATA_WRITE | DABR_DATA_READ)) { + case DABR_DATA_READ: + attr.bp_type = HW_BREAKPOINT_R; + break; + case DABR_DATA_WRITE: + attr.bp_type = HW_BREAKPOINT_W; + break; + case (DABR_DATA_WRITE | DABR_DATA_READ): + attr.bp_type = HW_BREAKPOINT_R | HW_BREAKPOINT_W; + break; + } + ret = modify_user_hw_breakpoint(bp, attr); + if (ret) + return ret; + thread-ptrace_bps[0] = bp; + thread-dabr = data; +
Re: [Patch 1/1] PPC64-HWBKPT: Implement hw-breakpoints for PPC64
On Mon, Feb 15, 2010 at 11:29:14AM +0530, K.Prasad wrote: +struct arch_hw_breakpoint { + u8 len; /* length of the target symbol */ + int type; + char*name; /* Contains name of the symbol to set bkpt */ + unsigned long address; +}; I don't think it's a good idea to integrate the name of the target. This is something that should be done in a higher level, not in an arch backend. We don't even need to store it anywhere as we can resolve back an address easily. Symbol awareness is not something the hardware breakpoint should care about, neither in the arch nor the generic level. Also, do you think addr/len/type is enough to abstract out any ppc breakpoints? This looks enough to me to express range breakpoints and simple breakpoints. But what about value comparison? (And still, there may be other trickier implementations I don't know in ppc). + +#include linux/kdebug.h +#include asm/reg.h +#include asm/system.h + +/* Total number of available HW breakpoint registers */ +#define HBP_NUM 1 Looking at the G2 PowerPc implementation, DABR and DABR2 can either express two different watchpoints or one range watchpoint. There are also IABR and IABR2 for instruction breakpoints that follow the same above scheme. I'm not sure we can abstract that using a constant max linear number of resources. +static inline void hw_breakpoint_disable(void) +{ + set_dabr(0); +} So, this is only about data breakpoints? + /* + * As a policy, the callback is invoked in a 'trigger-after-execute' + * fashion + */ + (bp-overflow_handler)(bp, 0, NULL, regs); Why are you calling this explicitly instead of using the perf_bp_event() thing? This looks like it won't work with perf as the event won't be recorded by perf. +void ptrace_triggered(struct perf_event *bp, int nmi, + struct perf_sample_data *data, struct pt_regs *regs) +{ + struct perf_event_attr attr; + + /* + * Disable the breakpoint request here since ptrace has defined a + * one-shot behaviour for breakpoint exceptions in PPC64. + * The SIGTRAP signal is generated automatically for us in do_dabr(). + * We don't have to do anything about that here + */ Oh, why does ptrace use a one-shot behaviour in ppc? Breakpoints only trigger once? + if (bp) { + attr = bp-attr; + attr.bp_addr = data ~HW_BREAKPOINT_ALIGN; + + switch (data (DABR_DATA_WRITE | DABR_DATA_READ)) { + case DABR_DATA_READ: + attr.bp_type = HW_BREAKPOINT_R; + break; + case DABR_DATA_WRITE: + attr.bp_type = HW_BREAKPOINT_W; + break; + case (DABR_DATA_WRITE | DABR_DATA_READ): + attr.bp_type = HW_BREAKPOINT_R | HW_BREAKPOINT_W; + break; + } + ret = modify_user_hw_breakpoint(bp, attr); + if (ret) + return ret; + thread-ptrace_bps[0] = bp; + thread-dabr = data; + return 0; + } + + /* Create a new breakpoint request if one doesn't exist already */ + hw_breakpoint_init(attr); + attr.bp_addr = data ~HW_BREAKPOINT_ALIGN; + switch (data (DABR_DATA_WRITE | DABR_DATA_READ)) { + case DABR_DATA_READ: + attr.bp_type = HW_BREAKPOINT_R; + break; + case DABR_DATA_WRITE: + attr.bp_type = HW_BREAKPOINT_W; + break; + case (DABR_DATA_WRITE | DABR_DATA_READ): + attr.bp_type = HW_BREAKPOINT_R | HW_BREAKPOINT_W; + break; + } + thread-ptrace_bps[0] = bp = register_user_hw_breakpoint(attr, + ptrace_triggered, task); + if (IS_ERR(bp)) { + thread-ptrace_bps[0] = NULL; + return PTR_ERR(bp); + } + +#endif /* CONFIG_PPC64 */ This looks fine for basic breakpoints. And this can probably be improved to integrate ranges. But I think we did something wrong with the generic breakpoint interface. We are translating the arch values to generic attributes. Then this all will be translated back to arch values. Having generic attributes is necessary for any perf event use from userspace. But it looks like a waste for ptrace that already gives us arch values. And the problem is the same for x86. So I think we should implement a register_ptrace_breakpoint() that doesn't take perf_event_attr but specific arch informations, so that we don't need to pass through a generic conversion, which: - is wasteful - won't be able to express 100% of any arch capabilities. We can certainly express most arch breakpoints features through the generic interface, but not all of them (given how tricky the data value comparison features can be) I will rework
Re: [Patch 1/1] PPC64-HWBKPT: Implement hw-breakpoints for PPC64
On Tue, Jan 19, 2010 at 02:03:35AM -0800, Roland McGrath wrote: It is also not clear to me if disabling pre-emption for the user-space (albeit for a very tiny time-window) is incorrect and if their side-effects are known. If otherwise, I think we should choose to operate in pre-empt safe mode and avoid all costs associated when done without it. I never really gave much consideration to returning to user mode with preemption disabled. It would not really have occurred to me that was even possible. I can't say it seems to me like it could ever be a very good idea. I find it hard even to start listing the cans of worms that might be opened by that. Perhaps the powerpc maintainers have a clearer picture of it than I do. What does it mean when there is something that prevents it from returning to user mode? i.e., TIF_SIGPENDING or TIF_NEED_RESCHED, or whatever. It could do a lot in the kernel before it gets back to user mode. What if in there somewhere it blocks voluntarily? Similarly, what does it mean if you get to user mode but the single-stepped instruction is a load/store that gets a page fault? What if it blocks in the page fault handler? For that matter, what about a page fault for the kernel-mode case? Perhaps I'm imagining gremlins where there aren't any, but I just cannot really get my head around this disable preemption while running some unknown instruction that normally runs with preemption enabled thing--let alone disable preemption while returning to user mode. Thanks, Roland I posted a patch which re-enables pre-emption after a hw-breakpoint is processed (linuxppc-dev ref: 20100121084640.ga3...@in.ibm.com). It does lead to clumsiness (due to the new variables to track states, prior breakpoints, etc.) but with the reasons you pointed out, it is much better than having uncertain/incorrect code. Thanks for your comments. -- K.Prasad ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [Patch 1/1] PPC64-HWBKPT: Implement hw-breakpoints for PPC64
On Mon, Dec 14, 2009 at 11:26:26AM -0800, Roland McGrath wrote: snipped I understand the reason for using stepping. (I have advised in the past that I thought this magical implicit step logic was too hairy to roll in under the covers and that a low-level facility expressing the different hardware semantics to a kernel API would be OK. I do agree with the motivation of cross-arch uniformity of the semantics. I don't object to making it magically right--I just expressed general skepticism/fear about getting that right so that I didn't want to try writing that magic. Now I'm just responding about the particular details I've noticed about that can of worms. It's certainly great if you can resolve all that. But I'll note that I am still by no means confident that the details I have raised cover all the worms in that can.) What remains less than clear is how preemption relates. For any per-thread hw_breakpoint, there is no high-level reason to care one way or the other. The thread, its HW breakpoints, its register state including state of stepping, are all part of per-thread state and no reason to do any less (or more) preemption than normally happens. I get that reasoning now...I'd been unduly worried about pre-emption and hence the introduction of pre-emption disabled state. But of course, in the existing design, the per-cpu variables would be affected because if pre-emption was to occur. I'll see how that can be factored in, while retaining the abstraction provided by the interfaces. Disabling pre-emption is necessary to ensure that hw-breakpoints are enabled immediately after the causative instruction has finished execution (the control flow may go astray if pre-emption occurs between i1 and i2). I don't understand what go astray means here. The only thing I can think of is the effect on any per-cpu variables you are using in hw_breakpoint implementation. As stated above, I was worried about a pre-emption happening between a return from breakpoint exception handler and the execution of the causative instructionbut as I learn, it seems fine now. It is just that the kernel code needs to be tweaked keeping this in mind. Thanks, K.Prasad ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [Patch 1/1] PPC64-HWBKPT: Implement hw-breakpoints for PPC64
On Sun, Dec 13, 2009 at 04:56:48PM -0800, Roland McGrath wrote: I can't see anything you've done to keep this use of MSR_SE in the user-mode register state from interfering with user_enable_single_step(). It looks to me like you'd swallow the normal step indications. Yes, it does unset MSR_SE bit in single_step_dabr_instruction() irrespective of whether it was previously enabled through user_enable_single_step(). This could be mitigated with the use of a separate flag which can be used to conditionally unset MSR_SE, however given further concerns about pre-emption (as expressed by you below), I'm afraid of substantial revamp of the user-space semantics. Likewise I'm not very clear on the interaction with kprobes, kgdb, or whatnot for kernel-mode cases. But I'll leave those concerns to others, since I know more about the user-mode situations. Kprobes has been tested to work simultaneously with hw-breakpoints. KGDB has not been ported yet to use the hw-breakpoint interfaces (KGDB had issues in it, that prevented it from being tested during our development...though its maintainer has begun showing interest recently). Xmon was (and I believe is still) in a state where data breakpoints did not work. It needs to be ported too, to benefit from the hw-breakpoint interfaces. Back to the user-mode case, is it really reasonable to disable preemption in hw_breakpoint_handler and leave it so across returning to user mode? (Is that even possible? I thought user mode was always preemptible.) That is done very casually with little comment in hw_breakpoint_handler and single_step_dabr_instruction, but it seems like an extremely deep and magical thing that merits more explanation. I guess the need for it has to do with the per_cpu variable you're using, but the whole situation is not very clear on first reading. Even for kernel mode, what does this mean when the stepped instruction does a page fault? I must admit that the issue of pre-emption should have been given more thought. Suppose there's a stream of user-space instructions i1, i2, i3, .in and if 'i2' instruction can cause a hw-breakpoint exception, then there exists a small window between i1 and i2 where pre-emption is disabled (while a schedule operation could have taken place otherwise). Disabling pre-emption is necessary to ensure that hw-breakpoints are enabled immediately after the causative instruction has finished execution (the control flow may go astray if pre-emption occurs between i1 and i2). The root cause of this behaviour is a combination of 'trigger-before-execute' behaviour (for data-exceptions) and a desire for 'continuous' exceptions (as opposed to one-shot behaviour seen in ptrace). The per-cpu variable 'last_hit_bp' just helps identify a single-step exception resulting from a hbp_handler vs other sources. Resorting to one-shot behaviour (which is the easiest workaround available) will break the desired uniformity in behaviour for hw-breakpoint interfaces - say every register_user_ interface must be accompanied by a unregister_ interface, etc. Post perf-events' integration, ensuring a one-shot behaviour might also have its own bunch of undesirable consequences (such as circular locks), that must be overcome. Unless I see a way to re-instate the breakpoints (surviving a pre-emption), I will send out a new patch that resorts to a one-shot behaviour for user-space (kernel-space is fine though). Thank you for the insightful comments! K.Prasad ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [Patch 1/1] PPC64-HWBKPT: Implement hw-breakpoints for PPC64
Yes, it does unset MSR_SE bit in single_step_dabr_instruction() irrespective of whether it was previously enabled through user_enable_single_step(). This could be mitigated with the use of a separate flag which can be used to conditionally unset MSR_SE, however given further concerns about pre-emption (as expressed by you below), I'm afraid of substantial revamp of the user-space semantics. There is already TIF_SINGLESTEP set by user_enable_single_step. So for that aspect, it is probably relatively straightforward to cover that interaction. The code has to be pretty exact and will merit some comments about subtleties, but I suspect the actual new code required will be just a tiny amount. Kprobes has been tested to work simultaneously with hw-breakpoints. KGDB has not been ported yet to use the hw-breakpoint interfaces (KGDB had issues in it, that prevented it from being tested during our development...though its maintainer has begun showing interest recently). Xmon was (and I believe is still) in a state where data breakpoints did not work. It needs to be ported too, to benefit from the hw-breakpoint interfaces. That is not really what I meant at all. That is good stuff to work out. But I just meant the interactions with kprobes/kgdb's use of single-stepping, the direct analogy to the user_enable_single_step issue. I must admit that the issue of pre-emption [...] I understand the reason for using stepping. (I have advised in the past that I thought this magical implicit step logic was too hairy to roll in under the covers and that a low-level facility expressing the different hardware semantics to a kernel API would be OK. I do agree with the motivation of cross-arch uniformity of the semantics. I don't object to making it magically right--I just expressed general skepticism/fear about getting that right so that I didn't want to try writing that magic. Now I'm just responding about the particular details I've noticed about that can of worms. It's certainly great if you can resolve all that. But I'll note that I am still by no means confident that the details I have raised cover all the worms in that can.) What remains less than clear is how preemption relates. For any per-thread hw_breakpoint, there is no high-level reason to care one way or the other. The thread, its HW breakpoints, its register state including state of stepping, are all part of per-thread state and no reason to do any less (or more) preemption than normally happens. Disabling pre-emption is necessary to ensure that hw-breakpoints are enabled immediately after the causative instruction has finished execution (the control flow may go astray if pre-emption occurs between i1 and i2). I don't understand what go astray means here. The only thing I can think of is the effect on any per-cpu variables you are using in hw_breakpoint implementation. Thanks, Roland ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [Patch 1/1] PPC64-HWBKPT: Implement hw-breakpoints for PPC64
I can't see anything you've done to keep this use of MSR_SE in the user-mode register state from interfering with user_enable_single_step(). It looks to me like you'd swallow the normal step indications. Likewise I'm not very clear on the interaction with kprobes, kgdb, or whatnot for kernel-mode cases. But I'll leave those concerns to others, since I know more about the user-mode situations. Back to the user-mode case, is it really reasonable to disable preemption in hw_breakpoint_handler and leave it so across returning to user mode? (Is that even possible? I thought user mode was always preemptible.) That is done very casually with little comment in hw_breakpoint_handler and single_step_dabr_instruction, but it seems like an extremely deep and magical thing that merits more explanation. I guess the need for it has to do with the per_cpu variable you're using, but the whole situation is not very clear on first reading. Even for kernel mode, what does this mean when the stepped instruction does a page fault? Thanks, Roland ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev