Re: [PATCH 5/6] x86: patch all traced function calls using the int3-based framework
On Fri, 18 Oct 2013 16:27:24 +0200
Petr Mladek wrote:

>  void ftrace_replace_code(int enable)
>  {
> 	struct ftrace_rec_iter *iter;
> 	struct dyn_ftrace *rec;
> -	const char *report = "adding breakpoints";
> -	int count = 0;
> -	int ret;
> +	unsigned long *addr = NULL;
> +	void *opcode = NULL, *op = NULL;
> +	const unsigned char *new;
> +	size_t count = 0;
> 
> 	for_ftrace_rec_iter(iter) {
> 		rec = ftrace_rec_iter_record(iter);
> -
> -		ret = add_breakpoints(rec, enable);
> -		if (ret)
> -			goto remove_breakpoints;
> -		count++;
> -	}
> -
> -	run_sync();
> -
> -	report = "updating code";
> -
> -	for_ftrace_rec_iter(iter) {
> -		rec = ftrace_rec_iter_record(iter);
> -
> -		ret = add_update(rec, enable);
> -		if (ret)
> -			goto remove_breakpoints;
> +		new = ftrace_new_code(rec, enable);
> +
> +		/* check next record if there is no new code for this one */
> +		if (!new)
> +			continue;
> +
> +		/* allocate buffers only when there will be a change */
> +		if (unlikely(!addr)) {
> +			if (ftrace_allocate_replace_buffers(&addr, &opcode))
> +				goto out;
> +			op = opcode;
> +			count = 0;
> +		}
> +
> +		/* fill buffers */
> +		addr[count++] = rec->ip;
> +		memcpy(op, new, MCOUNT_INSN_SIZE);
> +		op += MCOUNT_INSN_SIZE;
> +
> +		/* write buffers if they are full */
> +		if (count == FTRACE_MAX_RECS_PATCH) {
> +			text_poke_bp((void **)addr, opcode,
> +				     MCOUNT_INSN_SIZE, count, NULL);
> +			op = opcode;
> +			count = 0;
> +		}
> 	}
> 
> -	run_sync();
> -
> -	report = "removing breakpoints";
> -
> -	for_ftrace_rec_iter(iter) {
> -		rec = ftrace_rec_iter_record(iter);
> +	if (count)
> +		text_poke_bp((void **)addr, opcode,
> +			     MCOUNT_INSN_SIZE, count, NULL);
> 
> -		ret = finish_update(rec, enable);
> -		if (ret)
> -			goto remove_breakpoints;
> -	}
> -
> -	run_sync();
> -
> -	return;
> -
> - remove_breakpoints:
> -	ftrace_bug(ret, rec ? rec->ip : 0);
> -	printk(KERN_WARNING "Failed on %s (%d):\n", report, count);
> -	for_ftrace_rec_iter(iter) {
> -		rec = ftrace_rec_iter_record(iter);
> -		remove_breakpoint(rec);

Also you must keep the ftrace_bug() functionality. I notice there's no
error check for text_poke_bp(). Not only do we need to check the error
status, but if an error does happen, we must know the following:

1) What record it failed on

2) Did it fail on the read, compare, or write?

   If it failed on the read:    -EFAULT is returned
   If it failed on the compare: -EINVAL is returned
   If it failed on the write:   -EPERM is returned

Look at "ftrace_bug()", and then you can also do a google search on
"ftrace faulted" or "ftrace failed to modify", and see that this is an
extremely useful debugging tool, and not something I will allow to be
removed.

-- Steve

> -	}
> +out:
> +	kfree(addr);
> +	kfree(opcode);
>  }

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 5/6] x86: patch all traced function calls using the int3-based framework
On Fri, 18 Oct 2013 10:55:57 -0400
Steven Rostedt wrote:

> On Fri, 18 Oct 2013 16:27:24 +0200
> Petr Mladek wrote:
> 
> > +/*
> > + * We do not want to replace ftrace calls one by one because syncing all CPUs
> > + * is quite expensive.
> > + *
> > + * On the other hand, we do not want to modify all calls at once because
> > + * the buffers for text_poke_bp might be quite huge. Also we do not want
> > + * to count the number of records before the allocation.
> > + *
> > + * Let's modify the call in batches defined by this define.
> > + */
> > +#define FTRACE_MAX_RECS_PATCH 8192
> > 
> > -static int finish_update(struct dyn_ftrace *rec, int enable)
> > +static int ftrace_allocate_replace_buffers(unsigned long **addr, void **opcode)
> 
> I absolutely hate this. Current ftrace conversion does not need to
> allocate at all. I want to keep it that way.
> 

I should clarify that I do not hate the patch set. I actually like what
it's trying to achieve (a lot!). But I absolutely hate having to copy
records to do the bulk conversion. That part I'm giving a NAK to.

Add a text_poke_bp_iterate() or something, that removes the need for
allocating extra buffers, then I'll be very happy :-)

-- Steve
Re: [PATCH 5/6] x86: patch all traced function calls using the int3-based framework
On Fri, 18 Oct 2013 16:27:24 +0200
Petr Mladek wrote:

> +/*
> + * We do not want to replace ftrace calls one by one because syncing all CPUs
> + * is quite expensive.
> + *
> + * On the other hand, we do not want to modify all calls at once because
> + * the buffers for text_poke_bp might be quite huge. Also we do not want
> + * to count the number of records before the allocation.
> + *
> + * Let's modify the call in batches defined by this define.
> + */
> +#define FTRACE_MAX_RECS_PATCH 8192
> 
> -static int finish_update(struct dyn_ftrace *rec, int enable)
> +static int ftrace_allocate_replace_buffers(unsigned long **addr, void **opcode)

I absolutely hate this. Current ftrace conversion does not need to
allocate at all. I want to keep it that way.

Now, what I was thinking is that the text_poke_bp() should have a
variant where you can pass in an iterator. That way we don't need to
copy the ftrace records to an array that text_poke_bp() uses to do the
conversion. Just let the iterator return the next address to use.

I'm all for merging the code (was on my TODO list, so thank you for
this :-) but one reason I didn't get to it yet, was because of not
doing the copy. To me, that's a cop out.

Anyway, this was really bad timing. I'm trying to get some other things
reviewed and into the kernel for 3.12 before it's too late, and next
week I'll be traveling for 10 days (Kernel Summit, followed by RT
summit). You may need to ping me again in a couple of weeks to review
this. Unless I get some time to do it in my travels.

Thanks,

-- Steve

> {
> -	unsigned long ftrace_addr;
> -	int ret;
> -
> -	ret = ftrace_update_record(rec, enable);
> -
> -	ftrace_addr = get_ftrace_addr(rec);
> -
> -	switch (ret) {
> -	case FTRACE_UPDATE_IGNORE:
> -		return 0;
> -
> -	case FTRACE_UPDATE_MODIFY_CALL_REGS:
> -	case FTRACE_UPDATE_MODIFY_CALL:
> -	case FTRACE_UPDATE_MAKE_CALL:
> -		/* converting nop to call */
> -		return finish_update_call(rec, ftrace_addr);
> +	*addr = kmalloc_array(FTRACE_MAX_RECS_PATCH,
> +			      sizeof(*addr), GFP_KERNEL);
> +	if (!addr) {
> +		printk(KERN_ERR
> +		       "ftrace_replace_code: cannot allocate addr buffer\n");
> +		return -ENOMEM;
> +	}
> 
> -	case FTRACE_UPDATE_MAKE_NOP:
> -		/* converting a call to a nop */
> -		return finish_update_nop(rec);
> +	*opcode = kmalloc_array(FTRACE_MAX_RECS_PATCH,
> +				MCOUNT_INSN_SIZE, GFP_KERNEL);
> +	if (!opcode) {
> +		printk(KERN_ERR
> +		       "ftrace_replace_code: cannot allocate codes buffer\n");
> +		return -ENOMEM;
> 	}
> 
> 	return 0;
> }
[PATCH 5/6] x86: patch all traced function calls using the int3-based framework
Let's continue with replacing the existing int3-based framework in
ftrace with the new generic function introduced by the commit
fd4363fff3d9 (x86: Introduce int3 (breakpoint)-based instruction
patching).

This time use it in ftrace_replace_code that modifies all the watched
function calls.

If we used text_poke_bp in ftrace_make_nop, ftrace_make_call, and
ftrace_modify_call, it would be possible to get rid of the x86-specific
ftrace_replace_code implementation and use the generic code. This would
be a really lovely change. Unfortunately, the code would be slow because
syncing all CPUs is a relatively expensive operation.

For example, I tried to modify the ftrace function, then enable and
disable the tracer in a cycle. With 9 different trace functions and
500 cycles, I got these times:

    original code:              text_poke_bp for a single address:

    real    19m51.667s          real    33m29.456s
    user    0m0.004s            user    0m0.008s
    sys     0m27.124s           sys     0m27.620s

Hence, we still need the custom ftrace_replace_code and fill the
buffers for text_poke_bp.

Anyway, it still allows us to remove lots of custom code. It is even
slightly faster. With the test mentioned above, I got:

    original code:              text_poke_bp for batched addresses:
                                (this patch)

    real    20m8.123s           real    18m47.042s
    user    0m0.004s            user    0m0.008s
    sys     0m28.400s           sys     0m42.760s

    real    19m59.777s          real    18m41.668s
    user    0m0.004s            user    0m0.000s
    sys     0m28.344s           sys     0m43.576s

    real    19m57.598s          real    18m38.203s
    user    0m0.004s            user    0m0.004s
    sys     0m28.728s           sys     0m43.104s

It is faster because it checks the modified code only once. It removes
the usage of the relatively complex "ftrace_test_record". Also it
reduces iterations over the list of records. All this should be safe
because other manipulations with the list are guarded by "ftrace_lock".
This also covers adding and removing modules.

In addition, the int3 handler is faster as well. It can use bsearch on
all patched addresses, while ftrace_location uses bsearch only for
functions from the core kernel or the same module.

Another win might be that we modify the calls in batches while keeping
the CPU syncing on a low level. The result should be reduced usage of
the relatively slow int3 handler. It might especially help on busy
systems where the performance is more important. In other words, saving
several int3 handler calls should be worth the few extra sync calls.
Well, this is just speculation because I do not have real numbers here.
They are not easy to get as they depend on the type of system load.

Note that we need to add notrace for all functions that are called by
text_poke_bp and poke_int3_handler. Otherwise, we get either an
infinite loop or too many operations going through the relatively slow
handler.

Signed-off-by: Petr Mladek
---
 arch/x86/kernel/alternative.c |   6 +-
 arch/x86/kernel/ftrace.c      | 461 +-
 arch/x86/kernel/traps.c       |  10 -
 lib/bsearch.c                 |   2 +-
 4 files changed, 97 insertions(+), 382 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 0eb8250..45d7922 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -603,7 +603,7 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
  *
  * Note: Must be called under text_mutex.
  */
-static void *text_poke_part(void *addr, const void *opcode, size_t len)
+static void *notrace text_poke_part(void *addr, const void *opcode, size_t len)
 {
 	/*
 	 * On x86_64, kernel text mappings are mapped read-only with
@@ -649,7 +649,7 @@
 static void **bp_int3_handler, **bp_int3_addr;
 static size_t bp_int3_len;		/* length of the modified instruction */
 static unsigned int bp_int3_count;	/* number of modified instructions */
 
-static int poke_int3_cmp(const void *a, const void *b)
+static int notrace poke_int3_cmp(const void *a, const void *b)
 {
 	unsigned long key = (unsigned long)a;
 	unsigned long addr = *(unsigned long *)b;
@@ -657,7 +657,7 @@ static int poke_int3_cmp(const void *a, const void *b)
 	return key - addr;
 }
 
-int poke_int3_handler(struct pt_regs *regs)
+int notrace poke_int3_handler(struct pt_regs *regs)
 {
 	void *found;
 	unsigned long index;
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index f96dbcc..f3c5f3a 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -17,6 +17,7 @@
 #include <linux/ftrace.h>
 #include <linux/percpu.h>
 #include <linux/sched.h>
+#include <linux/slab.h>
 #include <linux/init.h>
 #include <linux/list.h>
 #include
@@ -127,23 +128,6 @@ static const unsigned char *ftrace_nop_replace(void)
 }
 
 static int