Re: RFC: paravirtualizing perf_clock
(2013/10/30 23:03), David Ahern wrote:
> On 10/29/13 11:59 PM, Masami Hiramatsu wrote:
>> (2013/10/29 11:58), David Ahern wrote:
>>> To back out a bit, my end goal is to be able to create and merge
>>> perf-events from any context on a KVM-based host -- guest userspace,
>>> guest kernel space, host userspace and host kernel space (userspace
>>> events with a perf-clock timestamp is another topic ;-)).
>>
>> That is almost the same as what we (Yoshihiro and I) are trying with
>> integrated tracing; we are doing it on ftrace and trace-cmd (but perhaps
>> it will eventually work on perf-ftrace).
>
> I thought at this point (well, once perf-ftrace gets committed) that you
> can do everything with perf. What feature is missing in perf that you get
> with trace-cmd or using debugfs directly?

The perf tools interface is best for profiling a process or over a short
period. However, what we'd like to do is monitor or trace in the background,
in memory, over a long period -- over the whole system life cycle, as a
flight recorder. This kind of tracing interface is required on
mission-critical systems for troubleshooting. Also, the on-the-fly
configurability of ftrace, such as snapshots, multiple buffers, and adding
or removing events, is very useful, since in the flight-recorder use case we
can't stop tracing even for a moment.

Moreover, our guest/host integrated tracer can pass event buffers from guest
to host with very small overhead, because it uses the ftrace ring buffer and
virtio-serial with splice (so, zero page copying in the guest). Note that we
need the tracing overhead to be as small as possible, because it is always
running in the background. That's why we're using ftrace for our purpose.
But anyway, time synchronization is a common issue. Let's share the
solution :)

> And then for the cherry on top a design that works across architectures
> (e.g., x86 now, but arm later).

I think your proposal is good as the default implementation, since it
doesn't depend on arch-specific features. However, since the physical
timer (clock) interfaces and the virtualization interfaces strongly depend
on the arch, I guess the optimized implementations will end up different on
each arch. For example, maybe we can export the tsc-offset to the guest to
adjust the clock on x86, but not on ARM or other architectures. In that
case, until an optimized implementation exists, we can use the paravirt
perf_clock.

> So this MSR read takes about 1.6 usecs (from 'perf stat kvm live') and
> that is the total time between VMEXIT and VMENTRY. The time it takes to
> run perf_clock in the host should be a very small part of that 1.6 usec.

Yeah, a hypercall is always a heavy operation, so that is not the best
solution; we need an optimized one for each arch.

> I'll take a look at the TSC path to see how it is optimized (suggestions
> appreciated).

At least on a machine which has a stable tsc, we can rely on that. We just
need the tsc-offset to adjust it in the guest. Note that this offset can
change when the guest sleeps/resumes or does a live migration; each time, we
need to refresh the tsc-offset.

> Another thought is to make the use of pv_perf_clock an option -- the user
> can knowingly decide the additional latency/overhead is worth the feature.

Yeah. BTW, have you looked at paravirt_sched_clock (pv_time_ops)? It seems
that such a synchronized clock is already there.

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: RFC: paravirtualizing perf_clock
(2013/10/29 11:58), David Ahern wrote:
> On 10/28/13 7:15 AM, Peter Zijlstra wrote:
>>> Any suggestions on how to do this without impacting performance? I
>>> noticed the MSR path seems to take about twice as long as the current
>>> implementation (which I believe results in rdtsc in the VM for x86 with
>>> stable TSC).
>>
>> So assuming all the TSCs are in fact stable, you could implement this by
>> syncing up the guest TSC to the host TSC on guest boot. I don't think
>> anything _should_ rely on the absolute TSC value. Of course you then also
>> need to make sure the host and guest tsc multipliers (cyc2ns) are
>> identical; you can play games with cyc2ns_offset if you're brave.
>
> This and the method Gleb mentioned are both going to be complex and
> fragile -- based on assumptions about how the perf_clock timestamps are
> generated. For example, 489223e assumes you have the tracepoint enabled at
> VM start with some means of capturing the data (e.g., a perf session
> active). In both cases the end result requires piecing together and
> re-generating the VM's timestamp on the events. For perf this means either
> modifying the tool to take parameters and an algorithm on how to modify
> the timestamp, or a homegrown tool to regenerate the file with updated
> timestamps.
>
> To back out a bit, my end goal is to be able to create and merge
> perf-events from any context on a KVM-based host -- guest userspace, guest
> kernel space, host userspace and host kernel space (userspace events with
> a perf-clock timestamp is another topic ;-)).

That is almost the same as what we (Yoshihiro and I) are trying with
integrated tracing; we are doing it on ftrace and trace-cmd (but perhaps it
will eventually work on perf-ftrace).

> Having the events generated with the proper timestamp is a simpler
> approach than trying to collect various tidbits of data, massage
> timestamps (hoping the clock source hasn't changed), and then merge
> events.

Yeah, if possible, we'd like to use it too.

> And then for the cherry on top a design that works across architectures
> (e.g., x86 now, but arm later).

I think your proposal is good as the default implementation, since it
doesn't depend on arch-specific features. However, since the physical
timer (clock) interfaces and the virtualization interfaces strongly depend
on the arch, I guess the optimized implementations will end up different on
each arch. For example, maybe we can export the tsc-offset to the guest to
adjust the clock on x86, but not on ARM or other architectures. In that
case, until an optimized implementation exists, we can use the paravirt
perf_clock.

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com
Re: trace_printk() support in trace-cmd
(2010/12/16 19:20), Avi Kivity wrote:
> On 12/13/2010 01:20 PM, Masami Hiramatsu wrote:
>> (2010/12/13 2:47), Avi Kivity wrote:
>>> On 12/12/2010 07:43 PM, Arnaldo Carvalho de Melo wrote:
>>>> Em Sun, Dec 12, 2010 at 07:42:06PM +0200, Avi Kivity escreveu:
>>>>> On 12/12/2010 07:36 PM, Arnaldo Carvalho de Melo wrote:
>>>>>> Em Sun, Dec 12, 2010 at 06:35:24PM +0200, Avi Kivity escreveu:
>>>>>>> On 11/23/2010 05:45 PM, Steven Rostedt wrote:
>>>>>>>> Again, the workaround is to replace your trace_printks() with
>>>>>>>> __trace_printk(_THIS_IP_, ...) or just modify the trace_printk()
>>>>>>>> macro in include/linux/kernel.h to always use the __trace_printk()
>>>>>>>> version.
>>>>>>>
>>>>>>> This works; I'm using it for now (I tried to use 'perf probe', but
>>>>>>> I get unpredictable results, like null pointer derefs).
>>>>>>
>>>>>> Can you tell us which functions, environment, etc.?
>>>>>
>>>>> Something around 2.6.27-rc4; example functions are FNAME(fetch) in
>>>>> arch/x86/kvm/paging_tmpl.h; compiled modular (which was Steven's
>>>>> guess as to why it fails). (Note, the failure is with trace-cmd, not
>>>>> /sys/kernel/debug/tracing.)
>>>>
>>>> I mean the "I tried to use 'perf probe'" part.
>>>
>>> Well, same, more or less.
>>>
>>>   perf probe -m kvm --add 'fetch_access=paging64_fetch pt_access=gw->pt_access pte_access=gw->pte_access dirty'
>>>
>>> would return garbage for gw->*, and the log would show the exception
>>> handler called. gw is most certainly valid.
>>
>> Thank you for reporting. Hmm, actually, page faults could happen while
>> fetching variables, but the argument-fetching routines should handle
>> them...
>
> They did handle it (or so I understood from the logs). But they shouldn't
> have occurred in the first place, since gw was dereferenceable (and the
> function dereferences it).

Ah, OK. Sometimes it's hard to find the register/memory location of local
variables (and sometimes it fails).

> So something went wrong while fetching gw itself (do you interpret the
> dwarf tables to find where the variable is stored?)

Hm, yes. You can use eu-readelf to dump the debuginfo, and objdump will
also help you to find the address and assembler code.

>> I'd like to check it; could you tell me the details? For example, that
>> exception log, kprobe-tracer's event definition (you can see it via
>> debugfs tracing/kprobe_events), and the result of
>> `perf probe -L paging64_fetch:0-10`.
>
> I no longer have the logs; I'll try to reproduce it later.

Oh, thank you! :)

-- 
Masami HIRAMATSU
2nd Dept. Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu...@hitachi.com
Re: trace_printk() support in trace-cmd
(2010/12/13 2:47), Avi Kivity wrote:
> On 12/12/2010 07:43 PM, Arnaldo Carvalho de Melo wrote:
>> Em Sun, Dec 12, 2010 at 07:42:06PM +0200, Avi Kivity escreveu:
>>> On 12/12/2010 07:36 PM, Arnaldo Carvalho de Melo wrote:
>>>> Em Sun, Dec 12, 2010 at 06:35:24PM +0200, Avi Kivity escreveu:
>>>>> On 11/23/2010 05:45 PM, Steven Rostedt wrote:
>>>>>> Again, the workaround is to replace your trace_printks() with
>>>>>> __trace_printk(_THIS_IP_, ...) or just modify the trace_printk()
>>>>>> macro in include/linux/kernel.h to always use the __trace_printk()
>>>>>> version.
>>>>>
>>>>> This works; I'm using it for now (I tried to use 'perf probe', but I
>>>>> get unpredictable results, like null pointer derefs).
>>>>
>>>> Can you tell us which functions, environment, etc.?
>>>
>>> Something around 2.6.27-rc4; example functions are FNAME(fetch) in
>>> arch/x86/kvm/paging_tmpl.h; compiled modular (which was Steven's guess
>>> as to why it fails). (Note, the failure is with trace-cmd, not
>>> /sys/kernel/debug/tracing.)
>>
>> I mean the "I tried to use 'perf probe'" part.
>
> Well, same, more or less.
>
>   perf probe -m kvm --add 'fetch_access=paging64_fetch pt_access=gw->pt_access pte_access=gw->pte_access dirty'
>
> would return garbage for gw->*, and the log would show the exception
> handler called. gw is most certainly valid.

Thank you for reporting. Hmm, actually, page faults could happen while
fetching variables, but the argument-fetching routines should handle
them...

I'd like to check it; could you tell me the details? For example, that
exception log, kprobe-tracer's event definition (you can see it via
debugfs tracing/kprobe_events), and the result of
`perf probe -L paging64_fetch:0-10`.

Best regards,

-- 
Masami HIRAMATSU
2nd Dept. Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu...@hitachi.com
Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
Joerg Roedel wrote:
> On Tue, Mar 16, 2010 at 12:25:00PM +0100, Ingo Molnar wrote:
>> Hm, that sounds rather messy if we want to use it to basically expose
>> kernel functionality in a guest/host unified way. Is the qemu process
>> discoverable in some secure way? Can we trust it? Is there some proper
>> tooling available to do it, or do we have to push it through 2-3
>> packages to get such a useful feature done?
>
> Since we want to implement a pmu usable for the guest anyway, why don't
> we just use the guest's perf to get all the information we want? If we
> get a pmu-nmi from the guest, we just re-inject it to the guest, and perf
> in the guest gives us all the information we want, including kernel and
> userspace symbols, stack traces, and so on.

I guess this aims to get information from old environments running on kvm,
for life extension :)

> In the previous thread we discussed a direct trace channel between guest
> and host kernel (which could be used for ftrace events, for example).
> This channel could be used to transport this information to the host
> kernel.

Interesting! I know the people who are trying to do that with systemtap.
See http://vesper.sourceforge.net/

> The only additional feature needed is a way for the host to start a perf
> instance in the guest.

# ssh localguest perf record --host-channel ... ? B-)

Thank you,

> Opinions?
>
> 	Joerg

-- 
Masami Hiramatsu
e-mail: mhira...@redhat.com
Re: [TOOL] c2kpe: C expression to kprobe event format converter
Frederic Weisbecker wrote:
> On Thu, Aug 13, 2009 at 04:59:19PM -0400, Masami Hiramatsu wrote:
>> This program converts a probe point in a C expression to the kprobe
>> event format for the kprobe-based event tracer. This helps to define
>> kprobe events by C source line number or function name, and local
>> variable name. Currently, this supports only x86 (32/64) kernels.
>>
>> Compile
>> Before compilation, please install the libelf and libdwarf development
>> packages. (e.g. elfutils-libelf-devel and libdwarf-devel on Fedora)
>
> This probably needs a specific libdwarf version?
>
> c2kpe.c: In function 'die_get_entrypc':
> c2kpe.c:422: error: 'Dwarf_Ranges' undeclared (first use in this function)
> c2kpe.c:422: error: (Each undeclared identifier is reported only once
> c2kpe.c:422: error: for each function it appears in.)
> c2kpe.c:422: error: 'ranges' undeclared (first use in this function)
> c2kpe.c:447: warning: implicit declaration of function 'dwarf_get_ranges'
> c2kpe.c:451: warning: implicit declaration of function 'dwarf_ranges_dealloc'

Aah, sure, it should be compiled with a libdwarf newer than 20090324. You
can find it at http://reality.sgiweb.org/davea/dwarf.html

BTW, libdwarf and libdw (which is yet another implementation of a dwarf
library) are still under development; e.g. libdwarf doesn't support
gcc-4.4.1 (very new), and only the latest libdw (0.142) can support it. So
perhaps I had better port it to libdw, even though that is less
documented... :(

>> TODO
>> - Fix bugs.
>> - Support multiple probepoints from stdin.
>> - Better kmodule support.
>> - Use elfutils-libdw?
>> - Merge into trace-cmd or perf-tools?
>
> Yeah, definitely, that would be a veeery interesting thing to have.
>
> I've played with kprobe ftrace to debug something this evening. It's
> very cool to be able to put dynamic tracepoints in desired places. But...
> I first needed to put random trace_printk() in some places to observe
> some variables' values. And then I thought about the kprobes tracer and
> realized I could do that without the need to rebuild my kernel. Then
> I've played with it, and indeed it works well and it's useful, but at
> the cost of reading objdump-based assembly code to find the places where
> I could find my variables' values. And after two or three probes in such
> conditions, I became tired of that; then I wanted to try this tool.
> While I cannot yet because of this build error, I can imagine the power
> of such a facility from perf.
>
> We could have a perf probe that creates a kprobe event in debugfs
> (default enable = 0) and which then relies on perf record for the actual
> recording. Then we could analyse it through perf trace. Let's imagine a
> simple example:
>
> int foo(int arg1, int arg2)
> {
>         int var1;
>
>         var1 = arg1;
>         var1 *= arg2;
>         var1 -= arg1;   <-- insert a probe here (file bar.c : line 60)
>         var1 ^= ...
>         return var1;
> }
>
> ./perf kprobe --file bar.c:60 --action arg1=%d,arg2=%d,var1=%d -- ls -R /

I recommend it be separated from record, like below:

  # set new event
  ./perf kprobe --add kprobe:event1 --file bar.c:60 --action arg1=%d,arg2=%d,var1=%d

  # record new event
  ./perf record -e kprobe:event1 -a -R -- ls -R /

This will allow us to focus on one thing -- converting C to the
kprobe-tracer format. And also, it can be listed just like tracepoint
events.

> ./perf trace
> arg1=1 arg2=1 var1=0
> arg1=2 arg2=2 var1=2
> etc..
>
> You may want to sort by field:
>
> ./perf trace -s arg1 --order desc
> arg1=1
>  |
>  --- arg2=1 var=1
>  |
>  --- arg2=2 var=1
> arg1=2
>  |
>  --- arg2=1 var=0
>  |
>  --- [...]
>
> ./perf trace -s arg1,arg2 --order asc
> arg1=1
>  |
>  --- arg2=1
>       |
>       - var1=0
>       |
>       - var1=
>      arg2=...
>  |
>
> Ok, the latter is a bad example because var1 will always have only one
> value for a given arg1 and arg2. But I guess you see the point.
>
> You won't have to care about the perf trace part; it's already
> implemented, and I'll soon handle the sorting part. All we need is the
> perf kprobes part that translates a C-level probing expression into a
> /debug/tracing/kprobe_events compliant thing, and then just calls
> perf record with the newly created event as an argument.

Indeed, that's what I imagine.

Thank you,

-- 
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhira...@redhat.com
[PATCH tracing/kprobes 1/4] x86: Fix x86 instruction decoder selftest to check only .text
Fix the x86 instruction decoder selftest to check only .text, because other
sections (e.g. .notes) will have random bytes which don't need to be
checked.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Avi Kivity a...@redhat.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Frank Ch. Eigler f...@redhat.com
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Jason Baron jba...@redhat.com
Cc: K.Prasad pra...@linux.vnet.ibm.com
Cc: Lai Jiangshan la...@cn.fujitsu.com
Cc: Li Zefan l...@cn.fujitsu.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
Cc: Roland McGrath rol...@redhat.com
Cc: Sam Ravnborg s...@ravnborg.org
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Vegard Nossum vegard.nos...@gmail.com
---
 arch/x86/tools/Makefile |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/tools/Makefile b/arch/x86/tools/Makefile
index 3dd626b..95e9cc4 100644
--- a/arch/x86/tools/Makefile
+++ b/arch/x86/tools/Makefile
@@ -1,6 +1,6 @@
 PHONY += posttest
 quiet_cmd_posttest = TEST    $@
-      cmd_posttest = $(OBJDUMP) -d $(objtree)/vmlinux | awk -f $(srctree)/arch/x86/tools/distill.awk | $(obj)/test_get_len
+      cmd_posttest = $(OBJDUMP) -d -j .text $(objtree)/vmlinux | awk -f $(srctree)/arch/x86/tools/distill.awk | $(obj)/test_get_len
 
 posttest: $(obj)/test_get_len vmlinux
 	$(call cmd,posttest)

-- 
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhira...@redhat.com
[PATCH tracing/kprobes 2/4] x86: Check awk features before generating inat-tables.c
Check some awk features which old mawk doesn't support.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Avi Kivity a...@redhat.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Frank Ch. Eigler f...@redhat.com
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Jason Baron jba...@redhat.com
Cc: K.Prasad pra...@linux.vnet.ibm.com
Cc: Lai Jiangshan la...@cn.fujitsu.com
Cc: Li Zefan l...@cn.fujitsu.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
Cc: Roland McGrath rol...@redhat.com
Cc: Sam Ravnborg s...@ravnborg.org
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Vegard Nossum vegard.nos...@gmail.com
---
 arch/x86/tools/gen-insn-attr-x86.awk |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index 93b62c9..19ba096 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -4,7 +4,25 @@
 #
 # Usage: awk -f gen-insn-attr-x86.awk x86-opcode-map.txt > inat-tables.c
 
+# Awk implementation sanity check
+function check_awk_implement() {
+	if (!match("abc", "[[:lower:]]+"))
+		return "Your awk doesn't support character-class."
+	if (sprintf("%x", 0) != "0")
+		return "Your awk has a printf-format problem."
+	return ""
+}
+
 BEGIN {
+	# Implementation error checking
+	awkchecked = check_awk_implement()
+	if (awkchecked != "") {
+		print "Error: " awkchecked > "/dev/stderr"
+		print "Please try to use gawk." > "/dev/stderr"
+		exit 1
+	}
+
+	# Setup generating tables
 	print "/* x86 opcode map generated from x86-opcode-map.txt */"
 	print "/* Do not change this code. */"
 	ggid = 1
@@ -293,6 +311,8 @@ function convert_operands(opnd,       i,imm,mod)
 }
 
 END {
+	if (awkchecked != "")
+		exit 1
 	# print escape opcode map's array
 	print "/* Escape opcode map array */"
 	print "const insn_attr_t const *inat_escape_tables[INAT_ESC_MAX + 1]" \

-- 
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhira...@redhat.com
[PATCH tracing/kprobes 3/4] tracing/kprobes: Fix format typo in trace_kprobes
Fix a format typo in the kprobe tracer. Currently, it shows a literal
'tsize' in the format file:

$ cat /debug/tracing/events/kprobes/event/format
...
	field: unsigned long ip;	offset:16;tsize:8;
	field: int nargs;	offset:24;tsize:4;
...

This should be '\tsize' (a tab, then "size"):

$ cat /debug/tracing/events/kprobes/event/format
...
	field: unsigned long ip;	offset:16;	size:8;
	field: int nargs;	offset:24;	size:4;
...

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Avi Kivity a...@redhat.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Frank Ch. Eigler f...@redhat.com
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Jason Baron jba...@redhat.com
Cc: K.Prasad pra...@linux.vnet.ibm.com
Cc: Lai Jiangshan la...@cn.fujitsu.com
Cc: Li Zefan l...@cn.fujitsu.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
Cc: Roland McGrath rol...@redhat.com
Cc: Sam Ravnborg s...@ravnborg.org
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Vegard Nossum vegard.nos...@gmail.com
---
 kernel/trace/trace_kprobe.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 7cd726e..22e91c0 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1069,7 +1069,7 @@ static int __probe_event_show_format(struct trace_seq *s,
 #define SHOW_FIELD(type, item, name)				\
 	do {							\
 		ret = trace_seq_printf(s, "\tfield: " #type " %s;\t"\
-				"offset:%u;tsize:%u;\n", name,	\
+				"offset:%u;\tsize:%u;\n", name,	\
 				(unsigned int)offsetof(typeof(field), item),\
 				(unsigned int)sizeof(type));	\
 		if (!ret)					\

-- 
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhira...@redhat.com
[PATCH tracing/kprobes 4/4] tracing/kprobes: Change trace_arg to probe_arg
Change trace_arg_string() and parse_trace_arg() to probe_arg_string() and
parse_probe_arg(), since those are kprobe-tracer local functions.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Avi Kivity a...@redhat.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Frank Ch. Eigler f...@redhat.com
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Jason Baron jba...@redhat.com
Cc: K.Prasad pra...@linux.vnet.ibm.com
Cc: Lai Jiangshan la...@cn.fujitsu.com
Cc: Li Zefan l...@cn.fujitsu.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
Cc: Roland McGrath rol...@redhat.com
Cc: Sam Ravnborg s...@ravnborg.org
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Vegard Nossum vegard.nos...@gmail.com
---
 kernel/trace/trace_kprobe.c |   18 +++++++++---------
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 22e91c0..783d2db 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -220,7 +220,7 @@ static __kprobes void *probe_address(struct trace_probe *tp)
 	return (probe_is_return(tp)) ?
 			tp->rp.kp.addr : tp->kp.addr;
 }
 
-static int trace_arg_string(char *buf, size_t n, struct fetch_func *ff)
+static int probe_arg_string(char *buf, size_t n, struct fetch_func *ff)
 {
 	int ret = -EINVAL;
@@ -250,7 +250,7 @@ static int trace_arg_string(char *buf, size_t n, struct fetch_func *ff)
 		if (ret >= n)
 			goto end;
 		l += ret;
-		ret = trace_arg_string(buf + l, n - l, &id->orig);
+		ret = probe_arg_string(buf + l, n - l, &id->orig);
 		if (ret < 0)
 			goto end;
 		l += ret;
@@ -408,7 +408,7 @@ static int split_symbol_offset(char *symbol, long *offset)
 #define PARAM_MAX_ARGS 16
 #define PARAM_MAX_STACK (THREAD_SIZE / sizeof(unsigned long))
 
-static int parse_trace_arg(char *arg, struct fetch_func *ff, int is_return)
+static int parse_probe_arg(char *arg, struct fetch_func *ff, int is_return)
 {
 	int ret = 0;
 	unsigned long param;
@@ -499,7 +499,7 @@ static int parse_trace_arg(char *arg, struct fetch_func *ff, int is_return)
 		if (!id)
 			return -ENOMEM;
 		id->offset = offset;
-		ret = parse_trace_arg(arg, &id->orig, is_return);
+		ret = parse_probe_arg(arg, &id->orig, is_return);
 		if (ret)
 			kfree(id);
 		else {
@@ -617,7 +617,7 @@ static int create_trace_probe(int argc, char **argv)
 			ret = -ENOSPC;
 			goto error;
 		}
-		ret = parse_trace_arg(argv[i], &tp->args[i], is_return);
+		ret = parse_probe_arg(argv[i], &tp->args[i], is_return);
 		if (ret)
 			goto error;
 	}
@@ -680,7 +680,7 @@ static int probes_seq_show(struct seq_file *m, void *v)
 	seq_printf(m, "0x%p", probe_address(tp));
 
 	for (i = 0; i < tp->nr_args; i++) {
-		ret = trace_arg_string(buf, MAX_ARGSTR_LEN, &tp->args[i]);
+		ret = probe_arg_string(buf, MAX_ARGSTR_LEN, &tp->args[i]);
 		if (ret < 0) {
 			pr_warning("Argument%d decoding error(%d).\n", i, ret);
 			return ret;
@@ -996,7 +996,7 @@ static int kprobe_event_define_fields(struct ftrace_event_call *event_call)
 		sprintf(buf, "arg%d", i);
 		DEFINE_FIELD(unsigned long, args[i], buf, 0);
 		/* Set argument string as an alias field */
-		ret = trace_arg_string(buf, MAX_ARGSTR_LEN, &tp->args[i]);
+		ret = probe_arg_string(buf, MAX_ARGSTR_LEN, &tp->args[i]);
 		if (ret < 0)
 			return ret;
 		DEFINE_FIELD(unsigned long, args[i], buf, 0);
@@ -1023,7 +1023,7 @@ static int kretprobe_event_define_fields(struct ftrace_event_call *event_call)
 		sprintf(buf, "arg%d", i);
 		DEFINE_FIELD(unsigned long, args[i], buf, 0);
 		/* Set argument string as an alias field */
-		ret = trace_arg_string(buf, MAX_ARGSTR_LEN, &tp->args[i]);
+		ret = probe_arg_string(buf, MAX_ARGSTR_LEN, &tp->args[i]);
 		if (ret < 0)
 			return ret;
 		DEFINE_FIELD(unsigned long, args[i], buf, 0);
@@ -1040,7 +1040,7 @@ static int __probe_event_show_format(struct trace_seq *s,
 	/* Show aliases */
 	for (i = 0; i < tp->nr_args; i++) {
-		ret = trace_arg_string(buf, MAX_ARGSTR_LEN, &tp->args[i]);
+		ret = probe_arg_string
Re: [PATCH -tip v14 01/12] x86: instruction decoder API
Frederic Weisbecker wrote:
> On Thu, Aug 13, 2009 at 04:34:13PM -0400, Masami Hiramatsu wrote:
>> Add an x86 instruction decoder to the arch-specific libraries. This
>> decoder can decode the x86 instructions used in the kernel into prefix,
>> opcode, modrm, sib, displacement and immediates. It can also show the
>> length of instructions.
>>
>> This version introduces instruction attributes for decoding
>> instructions. The instruction attribute tables are generated from the
>> opcode map file (x86-opcode-map.txt) by the generator script
>> (gen-insn-attr-x86.awk).
>>
>> Currently, the opcode maps are based on the opcode maps in the Intel(R)
>> 64 and IA-32 Architectures Software Developer's Manual Vol.2,
>> Appendix A, and consist of the two types of opcode tables below.
>>
>> 1-byte/2-byte/3-byte opcode tables, which have 256 elements, are
>> written as below:
>>
>> Table: table-name
>> Referrer: escaped-name
>> opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
>> (or)
>> opcode: escape # escaped-name
>> EndTable
>>
>> Group opcode tables, which have 8 elements, are written as below:
>>
>> GrpTable: GrpXXX
>> reg: mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
>> EndTable
>>
>> These opcode maps include a few SSE and FP opcodes (for setup), because
>> those opcodes are used in the kernel.
>
> I'm getting the following build error on an old K7 box:
>
> arch/x86/lib/inat.c: In function 'inat_get_opcode_attribute':
> arch/x86/lib/inat.c:29: error: 'inat_primary_table' undeclared (first use in this function)
> arch/x86/lib/inat.c:29: error: (Each undeclared identifier is reported only once
> arch/x86/lib/inat.c:29: error: for each function it appears in.)

Thanks for reporting! Hmm, it seems that inat-tables.c is not correctly
generated. Could you tell me which awk you used and send me the
inat-tables.c?

Thank you,

-- 
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhira...@redhat.com
Re: [PATCH -tip v14 01/12] x86: instruction decoder API
Frederic Weisbecker wrote: On Thu, Aug 20, 2009 at 01:42:31AM +0200, Frederic Weisbecker wrote: On Thu, Aug 13, 2009 at 04:34:13PM -0400, Masami Hiramatsu wrote: Add x86 instruction decoder to arch-specific libraries. This decoder can decode x86 instructions used in kernel into prefix, opcode, modrm, sib, displacement and immediates. This can also show the length of instructions. This version introduces instruction attributes for decoding instructions. The instruction attribute tables are generated from the opcode map file (x86-opcode-map.txt) by the generator script(gen-insn-attr-x86.awk). Currently, the opcode maps are based on opcode maps in Intel(R) 64 and IA-32 Architectures Software Developers Manual Vol.2: Appendix.A, and consist of below two types of opcode tables. 1-byte/2-bytes/3-bytes opcodes, which has 256 elements, are written as below; Table: table-name Referrer: escaped-name opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...] (or) opcode: escape # escaped-name EndTable Group opcodes, which has 8 elements, are written as below; GrpTable: GrpXXX reg: mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...] EndTable These opcode maps include a few SSE and FP opcodes (for setup), because those opcodes are used in the kernel. I'm getting the following build error on an old K7 box: arch/x86/lib/inat.c: In function ‘inat_get_opcode_attribute’: arch/x86/lib/inat.c:29: erreur: ‘inat_primary_table’ undeclared (first use in this function) arch/x86/lib/inat.c:29: erreur: (Each undeclared identifier is reported only once arch/x86/lib/inat.c:29: erreur: for each function it appears in.) I've attached my config. I haven't such problem on a dual x86-64 box. Actually I have the same problem in x86-64 The content of my arch/x86/lib/inat-tables.c: /* x86 opcode map generated from x86-opcode-map.txt */ /* Do not change this code. 
*/ /* Table: one byte opcode */ /* Escape opcode map array */ const insn_attr_t const *inat_escape_tables[INAT_ESC_MAX + 1][INAT_LPREFIX_MAX + 1] = { }; /* Group opcode map array */ const insn_attr_t const *inat_group_tables[INAT_GRP_MAX + 1][INAT_LPREFIX_MAX + 1] = { }; I guess there is a problem with the generation of this file. Aah, you may be using mawk on Ubuntu 9.04, right? If so, unfortunately, mawk is still under development. http://invisible-island.net/mawk/CHANGES 20090727 add check/fix to prevent gsub from recurring to modify on a substring of the current line when the regular expression is anchored to the beginning of the line; fixes gawk's anchgsub testcase. add check for implicit concatenation mistaken for exponent; fixes gawk's hex testcase. add character-classes to built-in regular expressions. ^^ Look, this means we can't use char-class expressions like [:lower:] until this version... And I've found another bug in mawk-1.3.3-20090728 (the latest one). It almost works, but:
$ mawk 'BEGIN {printf("0x%x\n", 0)}'
0x1
$ gawk 'BEGIN {printf("0x%x\n", 0)}'
0x0
This bug skips the array element with index 0x0 in inat-tables.c :( So, I recommend installing gawk instead of mawk until mawk supports all posix-awk features, since I don't think it is a good idea to work around all those bugs which depend on the implementation (not the specification). Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com
Re: [PATCH -tip v14 01/12] x86: instruction decoder API
Frederic Weisbecker wrote: On Thu, Aug 20, 2009 at 11:03:40AM -0400, Masami Hiramatsu wrote: Frederic Weisbecker wrote: On Thu, Aug 20, 2009 at 01:42:31AM +0200, Frederic Weisbecker wrote: On Thu, Aug 13, 2009 at 04:34:13PM -0400, Masami Hiramatsu wrote: Add x86 instruction decoder to arch-specific libraries. This decoder can decode x86 instructions used in kernel into prefix, opcode, modrm, sib, displacement and immediates. This can also show the length of instructions. This version introduces instruction attributes for decoding instructions. The instruction attribute tables are generated from the opcode map file (x86-opcode-map.txt) by the generator script(gen-insn-attr-x86.awk). Currently, the opcode maps are based on opcode maps in Intel(R) 64 and IA-32 Architectures Software Developers Manual Vol.2: Appendix.A, and consist of below two types of opcode tables. 1-byte/2-bytes/3-bytes opcodes, which has 256 elements, are written as below; Table: table-name Referrer: escaped-name opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...] (or) opcode: escape # escaped-name EndTable Group opcodes, which has 8 elements, are written as below; GrpTable: GrpXXX reg: mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...] EndTable These opcode maps include a few SSE and FP opcodes (for setup), because those opcodes are used in the kernel. I'm getting the following build error on an old K7 box: arch/x86/lib/inat.c: In function ‘inat_get_opcode_attribute’: arch/x86/lib/inat.c:29: erreur: ‘inat_primary_table’ undeclared (first use in this function) arch/x86/lib/inat.c:29: erreur: (Each undeclared identifier is reported only once arch/x86/lib/inat.c:29: erreur: for each function it appears in.) I've attached my config. I haven't such problem on a dual x86-64 box. 
Actually I have the same problem in x86-64 The content of my arch/x86/lib/inat-tables.c: /* x86 opcode map generated from x86-opcode-map.txt */ /* Do not change this code. */ /* Table: one byte opcode */ /* Escape opcode map array */ const insn_attr_t const *inat_escape_tables[INAT_ESC_MAX + 1][INAT_LPREFIX_MAX + 1] = { }; /* Group opcode map array */ const insn_attr_t const *inat_group_tables[INAT_GRP_MAX + 1][INAT_LPREFIX_MAX + 1] = { }; I guess there is a problem with the generation of this file. Aah, you may be using mawk on Ubuntu 9.04, right? If so, unfortunately, mawk is still under development. http://invisible-island.net/mawk/CHANGES Aargh... 20090727 add check/fix to prevent gsub from recurring to modify on a substring of the current line when the regular expression is anchored to the beginning of the line; fixes gawk's anchgsub testcase. add check for implicit concatenation mistaken for exponent; fixes gawk's hex testcase. add character-classes to built-in regular expressions. ^^ Look, this means we can't use char-class expressions like [:lower:] until this version... And I've found another bug in mawk-1.3.3-20090728 (the latest one). It almost works, but:
$ mawk 'BEGIN {printf("0x%x\n", 0)}'
0x1
Ouch, indeed.
$ gawk 'BEGIN {printf("0x%x\n", 0)}'
0x0
This bug skips the array element with index 0x0 in inat-tables.c :( So, I recommend installing gawk instead of mawk until mawk supports all posix-awk features, since I don't think it is a good idea to work around all those bugs which depend on the implementation (not the specification). Thank you, Yeah, indeed. Maybe add a warning (or build error) in case the user uses mawk? Hmm, it is possible that mawk will fix those bugs and catch up soon, so I think checking for mawk is not a good idea. (And since there will be other awk implementations, it's not fair.) I think all I can do now is report the bugs to the mawk and ubuntu people. :-) Anyway, that works fine now with gawk, thanks! All your patches build well :-) Thank you for testing!
-- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com
Re: [PATCH -tip v14 01/12] x86: instruction decoder API
Frederic Weisbecker wrote: On Thu, Aug 20, 2009 at 12:16:05PM -0400, Masami Hiramatsu wrote: Frederic Weisbecker wrote: On Thu, Aug 20, 2009 at 11:03:40AM -0400, Masami Hiramatsu wrote: Frederic Weisbecker wrote: On Thu, Aug 20, 2009 at 01:42:31AM +0200, Frederic Weisbecker wrote: On Thu, Aug 13, 2009 at 04:34:13PM -0400, Masami Hiramatsu wrote: Add x86 instruction decoder to arch-specific libraries. This decoder can decode x86 instructions used in kernel into prefix, opcode, modrm, sib, displacement and immediates. This can also show the length of instructions. This version introduces instruction attributes for decoding instructions. The instruction attribute tables are generated from the opcode map file (x86-opcode-map.txt) by the generator script(gen-insn-attr-x86.awk). Currently, the opcode maps are based on opcode maps in Intel(R) 64 and IA-32 Architectures Software Developers Manual Vol.2: Appendix.A, and consist of below two types of opcode tables. 1-byte/2-bytes/3-bytes opcodes, which has 256 elements, are written as below; Table: table-name Referrer: escaped-name opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...] (or) opcode: escape # escaped-name EndTable Group opcodes, which has 8 elements, are written as below; GrpTable: GrpXXX reg: mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...] EndTable These opcode maps include a few SSE and FP opcodes (for setup), because those opcodes are used in the kernel. I'm getting the following build error on an old K7 box: arch/x86/lib/inat.c: In function ‘inat_get_opcode_attribute’: arch/x86/lib/inat.c:29: erreur: ‘inat_primary_table’ undeclared (first use in this function) arch/x86/lib/inat.c:29: erreur: (Each undeclared identifier is reported only once arch/x86/lib/inat.c:29: erreur: for each function it appears in.) I've attached my config. I haven't such problem on a dual x86-64 box. 
Actually I have the same problem in x86-64 The content of my arch/x86/lib/inat-tables.c: /* x86 opcode map generated from x86-opcode-map.txt */ /* Do not change this code. */ /* Table: one byte opcode */ /* Escape opcode map array */ const insn_attr_t const *inat_escape_tables[INAT_ESC_MAX + 1][INAT_LPREFIX_MAX + 1] = { }; /* Group opcode map array */ const insn_attr_t const *inat_group_tables[INAT_GRP_MAX + 1][INAT_LPREFIX_MAX + 1] = { }; I guess there is a problem with the generation of this file. Aah, you may be using mawk on Ubuntu 9.04, right? If so, unfortunately, mawk is still under development. http://invisible-island.net/mawk/CHANGES Aargh... 20090727 add check/fix to prevent gsub from recurring to modify on a substring of the current line when the regular expression is anchored to the beginning of the line; fixes gawk's anchgsub testcase. add check for implicit concatenation mistaken for exponent; fixes gawk's hex testcase. add character-classes to built-in regular expressions. ^^ Look, this means we can't use char-class expressions like [:lower:] until this version... And I've found another bug in mawk-1.3.3-20090728 (the latest one). It almost works, but:
$ mawk 'BEGIN {printf("0x%x\n", 0)}'
0x1
Ouch, indeed.
$ gawk 'BEGIN {printf("0x%x\n", 0)}'
0x0
This bug skips the array element with index 0x0 in inat-tables.c :( So, I recommend installing gawk instead of mawk until mawk supports all posix-awk features, since I don't think it is a good idea to work around all those bugs which depend on the implementation (not the specification). Thank you, Yeah, indeed. Maybe add a warning (or build error) in case the user uses mawk? Hmm, it is possible that mawk will fix those bugs and catch up soon, so I think checking for mawk is not a good idea. (And since there will be other awk implementations, it's not fair.) I think all I can do now is report the bugs to the mawk and ubuntu people. :-) Yeah, but without your tip I wouldn't have been able to find the origin for quite some time.
And the kernel couldn't build anyway. At least we should do something with this version of mawk. Hm, indeed. Maybe we can run an additional sanity-check script before using awk, like this;
---
res=`echo a | $AWK '/[[:lower:]]+/ {print "OK"}'`
[ "$res" != OK ] && exit 1
res=`$AWK 'BEGIN {printf("%x", 0)}'`
[ "$res" != 0 ] && exit 1
exit 0
---
Thanks, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com
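The sanity-check idea above can be fleshed out into a small runnable probe. This is only a sketch (the file name and messages are made up); `$AWK` defaults to whatever awk is on PATH, mirroring how the kernel build would invoke it:

```shell
#!/bin/sh
# Probe whether the available awk implementation supports the features
# the instruction-attribute generator relies on. Exits non-zero if the
# awk named in $AWK is unsuitable.
AWK=${AWK:-awk}

# 1) POSIX character classes -- mawk gained [[:lower:]] only in the
#    20090727 snapshot.
res=$(echo a | $AWK '/[[:lower:]]+/ {print "OK"}')
[ "$res" = OK ] || { echo "$AWK: no POSIX character classes" >&2; exit 1; }

# 2) printf "%x" of a zero operand -- mawk-1.3.3-20090728 printed 1 here.
res=$($AWK 'BEGIN {printf("%x", 0)}')
[ "$res" = 0 ] || { echo "$AWK: printf %x is broken" >&2; exit 1; }

echo "$AWK looks usable"
```

Run with e.g. `AWK=mawk sh check-awk.sh` to vet a specific implementation before a build.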
Re: [TOOL] kprobestest : Kprobe stress test tool
Frederic Weisbecker wrote: On Thu, Aug 13, 2009 at 04:57:20PM -0400, Masami Hiramatsu wrote: This script tests kprobes by probing all symbols in the kernel and finds symbols which must be blacklisted. Usage - kprobestest [-s SYMLIST] [-b BLACKLIST] [-w WHITELIST] Run stress test. If a SYMLIST file is specified, use it as the initial symbol list (this is useful for verifying the white list after diagnosing all symbols). kprobestest cleanup Cleanup all lists How it Works --- This tool lists up all symbols in the kernel via /proc/kallsyms and sorts them into groups (each including 64 symbols by default). It then tests each group by using kprobe-tracer. If a kernel crash occurs, that group is moved into the 'failed' dir. If the group passes the test, the script moves it into the 'passed' dir and saves kprobe_profile into 'passed/profiles/'. After testing all groups, all 'failed' groups are merged and sorted into smaller groups (divided by 4, by default), and those are tested again. This loop is repeated until every group has just 1 symbol. Finally, the script sorts all 'passed' symbols into 'tested', 'untested', and 'missed' based on the profiles. Note - This script just gives us some clues to the blacklisted functions. In some cases a combination of probe points will cause a problem even though none of them causes the problem alone. Thank you, This script easily hard-locks my x86-64 dual core on the 1st batch of symbols to test.
I have one sym list in the failed and unset directories: int_very_careful int_signal int_restore_rest stub_clone stub_fork stub_vfork stub_sigaltstack stub_iopl ptregscall_common stub_execve stub_rt_sigreturn irq_entries_start common_interrupt ret_from_intr exit_intr retint_with_reschedule retint_check retint_swapgs retint_restore_args restore_args irq_return retint_careful retint_signal retint_kernel irq_move_cleanup_interrupt reboot_interrupt apic_timer_interrupt generic_interrupt invalidate_interrupt0 invalidate_interrupt1 invalidate_interrupt2 invalidate_interrupt3 invalidate_interrupt4 invalidate_interrupt5 invalidate_interrupt6 invalidate_interrupt7 threshold_interrupt thermal_interrupt mce_self_interrupt call_function_single_interrupt call_function_interrupt reschedule_interrupt error_interrupt spurious_interrupt perf_pending_interrupt divide_error overflow bounds invalid_op device_not_available double_fault coprocessor_segment_overrun invalid_TSS segment_not_present spurious_interrupt_bug coprocessor_error alignment_check simd_coprocessor_error native_load_gs_index gs_change kernel_thread child_rip kernel_execve call_softirq I don't have a crash log because I was running with X. But it also happened with other batch of symbols. Thank you for reporting, here, I also have a result tested on k...@x86-64. 
native_read_tscp native_read_msr_safe native_read_msr_amd_safe native_write_msr_safe vmalloc_fault spurious_fault search_exception_tables notify_die trace_hardirqs_off_caller ident_complete lock_acquire lock_release bad_address secondary_startup_64 stack_start bad_address restore_args irq_return restore trace_hardirqs_off_thunk init_level4_pgt level3_ident_pgt level3_kernel_pgt level2_fixmap_pgt _text startup_64 level1_fixmap_pgt level2_ident_pgt level2_kernel_pgt level2_spare_pgt native_get_debugreg native_set_debugreg native_set_iopl_mask native_load_sp0 debug_show_all_locks debug_check_no_locks_held valid_state mark_lock mark_held_locks lockdep_trace_alloc trace_softirqs_on trace_hardirqs_on_caller __down_write __down_read trace_hardirqs_on_thunk lockdep_sys_exit_thunk Most of them can be fixed just by adding __kprobes. Some of them are already in another section; kprobes should check whether the symbols are in that section. The problem is that I don't have any serial line on this box, so I can't catch any crash log. My K7 testbox also died in my arms this afternoon. But I still have two other testboxes (one P2 and one P3); hopefully I can reproduce the problem on these boxes, to which I can connect a serial line. Thank you for helping me to find it! I've pushed your patches in the following git tree: git://git.kernel.org/pub/scm/linux/kernel/git/fgrederic/random-tracing.git \ tracing/kprobes So you can send patches on top of this one. Great! I've found some other trivial bugs, so I'll fix those on top of it. Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com
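The first grouping stage that kprobestest performs, as described earlier in the thread, can be sketched in shell. This is only an illustration: the file and directory names are invented, a tiny inline sample stands in for /proc/kallsyms, and the group size is shrunk from 64 to 2 so the splitting is visible; the real tool then feeds each group to kprobe-tracer and records survivors:

```shell
#!/bin/sh
# Stage 1 of kprobestest, sketched: take kallsyms-style lines
# ("address type name"), keep only text symbols (type t/T), and split
# them into fixed-size groups so each batch can be probed separately.
GROUPSZ=2          # the real tool defaults to 64
rm -rf groups && mkdir -p groups
cat > sample_kallsyms <<'EOF'
ffffffff81000000 T _text
ffffffff81000100 t do_one_initcall
ffffffff81000200 D jiffies
ffffffff81000300 T schedule
ffffffff81000400 t vprintk
EOF
# Data symbols (like jiffies, type D) are dropped; only code is probed.
awk '$2 ~ /^[tT]$/ {print $3}' sample_kallsyms | sort -u |
        split -l "$GROUPSZ" - groups/batch_
ls groups          # 4 text symbols / 2 per group -> batch_aa batch_ab
```

Each `groups/batch_*` file is then a self-contained work unit: probe it, and on a crash move it to 'failed' for re-splitting, exactly the bisection loop the tool description outlines.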
Re: [TOOL] kprobestest : Kprobe stress test tool
Frederic Weisbecker wrote: Most of them can be fixed just by adding __kprobes. Some of them are already in another section; kprobes should check whether the symbols are in that section. You mean the blacklist? I also fear that putting bad kprobed functions into the kprobe section or into the blacklist may hide some kprobe internal bugs. Doing so is indeed mandatory for functions that trigger tracing recursion or things like that, but what if kprobe has an internal bug that only triggers while probing a certain class of functions? Ie: it would be nice to identify the reason of the crash for each culprit in these lists. That may even help to find the others in advance. Indeed, actually I've found some bugs while making jump-optimization patches by using this stress test. But some of them are obviously cases where we just forgot to add __kprobes, since those functions will be called from kprobes int3 handling functions. And also, a lot of lock-related code has been changed. I think kprobes should use raw_*_lock, or prohibit probing lock-monitoring functions like lockdep's, because that will cause recursive calls. Also kprobes seems to be a very fragile feature (that's what this selftest unearths, at least for me). And it really needs a recursion detection that stops every kprobing when reaching a given threshold of recursion. Something that would dump the stack and the failing kprobe structure. Hmm, kprobes already has recursion detection (kp->nmiss), so maybe we can check it. That would avoid such hard lockups and also help to identify the dangerous symbols to probe. The problem is that I don't have any serial line on this box, so I can't catch any crash log. My K7 testbox also died in my arms this afternoon. But I still have two other testboxes (one P2 and one P3); hopefully I can reproduce the problem on these boxes, to which I can connect a serial line. Thank you for helping me to find it!
I've pushed your patches in the following git tree: git://git.kernel.org/pub/scm/linux/kernel/git/fgrederic/random-tracing.git \ tracing/kprobes So you can send patches on top of this one. Great! I've found some other trivial bugs, so I'll fix those on top of it. Cool :) Btw, here is the result of your stress test on a PIII (attaching the log and the config). Thanks, I'll check that. Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com
Re: [PATCH -tip v14 03/12] kprobes: checks probe address is instruction boundary on x86
Frederic Weisbecker wrote: On Thu, Aug 13, 2009 at 04:34:28PM -0400, Masami Hiramatsu wrote: Ensure the safety of inserting kprobes by checking whether the specified address is at the first byte of an instruction on x86. This is done by decoding the probed function from its head to the probe point. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Acked-by: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Avi Kivity a...@redhat.com Cc: Andi Kleen a...@linux.intel.com Cc: Christoph Hellwig h...@infradead.org Cc: Frank Ch. Eigler f...@redhat.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: H. Peter Anvin h...@zytor.com Cc: Ingo Molnar mi...@elte.hu Cc: Jason Baron jba...@redhat.com Cc: Jim Keniston jkeni...@us.ibm.com Cc: K.Prasad pra...@linux.vnet.ibm.com Cc: Lai Jiangshan la...@cn.fujitsu.com Cc: Li Zefan l...@cn.fujitsu.com Cc: Przemysław Pawełczyk przemys...@pawelczyk.it Cc: Roland McGrath rol...@redhat.com Cc: Sam Ravnborg s...@ravnborg.org Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: Steven Rostedt rost...@goodmis.org Cc: Tom Zanussi tzanu...@gmail.com Cc: Vegard Nossum vegard.nos...@gmail.com --- arch/x86/kernel/kprobes.c | 69 + 1 files changed, 69 insertions(+), 0 deletions(-) diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c index b5b1848..80d493f 100644 --- a/arch/x86/kernel/kprobes.c +++ b/arch/x86/kernel/kprobes.c @@ -48,6 +48,7 @@ #include <linux/preempt.h> #include <linux/module.h> #include <linux/kdebug.h> +#include <linux/kallsyms.h> #include <asm/cacheflush.h> #include <asm/desc.h> @@ -55,6 +56,7 @@ #include <asm/uaccess.h> #include <asm/alternative.h> #include <asm/debugreg.h> +#include <asm/insn.h> void jprobe_return_end(void); @@ -245,6 +247,71 @@ retry: } } +/* Recover the probed instruction at addr for further analysis.
*/ +static int recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr) +{ +struct kprobe *kp; +kp = get_kprobe((void *)addr); +if (!kp) +return -EINVAL; + +/* + * Basically, kp->ainsn.insn has an original instruction. + * However, RIP-relative instruction can not do single-stepping + * at different place, fix_riprel() tweaks the displacement of + * that instruction. In that case, we can't recover the instruction + * from the kp->ainsn.insn. + * + * On the other hand, kp->opcode has a copy of the first byte of + * the probed instruction, which is overwritten by int3. And + * the instruction at kp->addr is not modified by kprobes except + * for the first byte, we can recover the original instruction + * from it and kp->opcode. + */ +memcpy(buf, kp->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t)); +buf[0] = kp->opcode; +return 0; +} + +/* Dummy buffers for kallsyms_lookup */ +static char __dummy_buf[KSYM_NAME_LEN]; + +/* Check if paddr is at an instruction boundary */ +static int __kprobes can_probe(unsigned long paddr) +{ +int ret; +unsigned long addr, offset = 0; +struct insn insn; +kprobe_opcode_t buf[MAX_INSN_SIZE]; + +if (!kallsyms_lookup(paddr, NULL, &offset, NULL, __dummy_buf)) +return 0; + +/* Decode instructions */ +addr = paddr - offset; +while (addr < paddr) { +kernel_insn_init(&insn, (void *)addr); +insn_get_opcode(&insn); + +/* Check if the instruction has been modified. */ +if (insn.opcode.bytes[0] == BREAKPOINT_INSTRUCTION) { +ret = recover_probed_instruction(buf, addr); I'm confused about the reason of this recovering. Is it to remove kprobes behind the current setting one in the current function? No, it recovers just an instruction which is probed by a kprobe, because we need to know the first byte of this instruction for decoding it. Perhaps we'd better have a more generic interface (text_peek?) for it, because another subsystem (e.g. kgdb) may want to insert int3...
Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com
Re: [PATCH -tip v14 03/12] kprobes: checks probe address is instruction boundary on x86
Frederic Weisbecker wrote: On Tue, Aug 18, 2009 at 07:17:39PM -0400, Masami Hiramatsu wrote: Frederic Weisbecker wrote: + while (addr < paddr) { + kernel_insn_init(&insn, (void *)addr); + insn_get_opcode(&insn); + + /* Check if the instruction has been modified. */ + if (insn.opcode.bytes[0] == BREAKPOINT_INSTRUCTION) { + ret = recover_probed_instruction(buf, addr); I'm confused about the reason of this recovering. Is it to remove kprobes behind the current setting one in the current function? No, it recovers just an instruction which is probed by a kprobe, because we need to know the first byte of this instruction for decoding it. Ah, sorry, that was not accurate. The function recovers the instruction into the buffer (buf), not in the real kernel text. :) Perhaps we'd better have a more generic interface (text_peek?) for it, because another subsystem (e.g. kgdb) may want to insert int3... Thank you, Aah, I see now, it's to keep a sane check of the instruction boundaries without int3 artifacts in the middle. But in that case, you should re-arm the breakpoint after your check, right? Or maybe you could do the check without repatching? Yes, it doesn't modify kernel text; it just recovers the original instruction from the kernel text and the backup byte into a buffer. Maybe by doing a copy of insn.opcode.bytes and replacing bytes[0] with what a random kprobe has stolen? Hm, no, this function is protected from other kprobes by kprobe_mutex. Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com
Re: [PATCH -tip v14 07/12] tracing: Introduce TRACE_FIELD_ZERO() macro
Frederic Weisbecker wrote: On Thu, Aug 13, 2009 at 04:35:01PM -0400, Masami Hiramatsu wrote: Use TRACE_FIELD_ZERO(type, item) instead of TRACE_FIELD_ZERO_CHAR(item). This also includes a fix of TRACE_ZERO_CHAR() macro. I can't find what the fix is about (see below) Ah, OK. This patch actually includes two parts. One is introducing TRACE_FIELD_ZERO which is more generic than TRACE_FIELD_ZERO_CHAR, I think. Another is a typo fix of TRACE_ZERO_CHAR. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Avi Kivity a...@redhat.com Cc: Andi Kleen a...@linux.intel.com Cc: Christoph Hellwig h...@infradead.org Cc: Frank Ch. Eigler f...@redhat.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: H. Peter Anvin h...@zytor.com Cc: Ingo Molnar mi...@elte.hu Cc: Jason Baron jba...@redhat.com Cc: Jim Keniston jkeni...@us.ibm.com Cc: K.Prasad pra...@linux.vnet.ibm.com Cc: Lai Jiangshan la...@cn.fujitsu.com Cc: Li Zefan l...@cn.fujitsu.com Cc: Przemysław Pawełczyk przemys...@pawelczyk.it Cc: Roland McGrath rol...@redhat.com Cc: Sam Ravnborg s...@ravnborg.org Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: Steven Rostedt rost...@goodmis.org Cc: Tom Zanussi tzanu...@gmail.com Cc: Vegard Nossum vegard.nos...@gmail.com --- kernel/trace/trace_event_types.h |4 ++-- kernel/trace/trace_export.c | 16 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/kernel/trace/trace_event_types.h b/kernel/trace/trace_event_types.h index 6db005e..e74f090 100644 --- a/kernel/trace/trace_event_types.h +++ b/kernel/trace/trace_event_types.h @@ -109,7 +109,7 @@ TRACE_EVENT_FORMAT(bprint, TRACE_BPRINT, bprint_entry, ignore, TRACE_STRUCT( TRACE_FIELD(unsigned long, ip, ip) TRACE_FIELD(char *, fmt, fmt) -TRACE_FIELD_ZERO_CHAR(buf) +TRACE_FIELD_ZERO(char, buf) ), TP_RAW_FMT("%08lx (%d) fmt:%p %s") ); @@ -117,7 +117,7 @@ TRACE_EVENT_FORMAT(bprint, TRACE_BPRINT, bprint_entry, ignore, TRACE_EVENT_FORMAT(print, TRACE_PRINT, print_entry, ignore,
TRACE_STRUCT( TRACE_FIELD(unsigned long, ip, ip) -TRACE_FIELD_ZERO_CHAR(buf) +TRACE_FIELD_ZERO(char, buf) ), TP_RAW_FMT("%08lx (%d) fmt:%p %s") ); diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c index 71c8d7f..b0ac92c 100644 --- a/kernel/trace/trace_export.c +++ b/kernel/trace/trace_export.c @@ -42,9 +42,9 @@ extern void __bad_type_size(void); if (!ret) \ return 0; -#undef TRACE_FIELD_ZERO_CHAR -#define TRACE_FIELD_ZERO_CHAR(item) \ -ret = trace_seq_printf(s, "\tfield:char " #item ";\t" \ +#undef TRACE_FIELD_ZERO +#define TRACE_FIELD_ZERO(type, item) \ +ret = trace_seq_printf(s, "\tfield:" #type " " #item ";\t" \ "offset:%u;\tsize:0;\n", \ (unsigned int)offsetof(typeof(field), item)); \ if (!ret) \ @@ -92,9 +92,6 @@ ftrace_format_##call(struct ftrace_event_call *unused, \ #include "trace_event_types.h" -#undef TRACE_ZERO_CHAR -#define TRACE_ZERO_CHAR(arg) - #undef TRACE_FIELD #define TRACE_FIELD(type, item, assign)\ entry->item = assign; @@ -107,6 +104,9 @@ ftrace_format_##call(struct ftrace_event_call *unused, \ #define TRACE_FIELD_SIGN(type, item, assign, is_signed) \ TRACE_FIELD(type, item, assign) +#undef TRACE_FIELD_ZERO +#define TRACE_FIELD_ZERO(type, item) + Is it about the above moving? If so, could you just tell so that I can add something about it in the changelog. No, I assume that TRACE_ZERO_CHAR is just a typo of TRACE_FIELD_ZERO_CHAR. (because I couldn't find any other TRACE_ZERO_CHAR) BTW, this patch may not be needed after applying patch 10/12, since it removes the ftrace event definitions of TRACE_KPROBE/KRETPROBE. Perhaps I'd better merge and split those additional patches (and remove this change)? (It also could make the incremental review hard...) Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc.
Software Solutions Division e-mail: mhira...@redhat.com
[PATCH -tip v14 02/12] x86: x86 instruction decoder build-time selftest
Add a user-space selftest of the x86 instruction decoder at kernel build time. When CONFIG_X86_DECODER_SELFTEST=y, Kbuild builds a test harness of the x86 instruction decoder and runs it after building vmlinux. The test compares the results of objdump and the x86 instruction decoder code and checks that there are no differences. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Signed-off-by: Jim Keniston jkeni...@us.ibm.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Avi Kivity a...@redhat.com Cc: Andi Kleen a...@linux.intel.com Cc: Christoph Hellwig h...@infradead.org Cc: Frank Ch. Eigler f...@redhat.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: H. Peter Anvin h...@zytor.com Cc: Ingo Molnar mi...@elte.hu Cc: Jason Baron jba...@redhat.com Cc: K.Prasad pra...@linux.vnet.ibm.com Cc: Lai Jiangshan la...@cn.fujitsu.com Cc: Li Zefan l...@cn.fujitsu.com Cc: Przemysław Pawełczyk przemys...@pawelczyk.it Cc: Roland McGrath rol...@redhat.com Cc: Sam Ravnborg s...@ravnborg.org Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: Steven Rostedt rost...@goodmis.org Cc: Tom Zanussi tzanu...@gmail.com Cc: Vegard Nossum vegard.nos...@gmail.com --- arch/x86/Kconfig.debug|9 +++ arch/x86/Makefile |3 + arch/x86/tools/Makefile | 15 + arch/x86/tools/distill.awk| 42 +++ arch/x86/tools/test_get_len.c | 113 + 5 files changed, 182 insertions(+), 0 deletions(-) create mode 100644 arch/x86/tools/Makefile create mode 100644 arch/x86/tools/distill.awk create mode 100644 arch/x86/tools/test_get_len.c diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug index d105f29..7d0b681 100644 --- a/arch/x86/Kconfig.debug +++ b/arch/x86/Kconfig.debug @@ -186,6 +186,15 @@ config X86_DS_SELFTEST config HAVE_MMIOTRACE_SUPPORT def_bool y +config X86_DECODER_SELFTEST + bool "x86 instruction decoder selftest" + depends on DEBUG_KERNEL + ---help--- +Perform x86 instruction decoder selftests at build time. +This option is useful for checking the sanity of x86 instruction +decoder code. +If unsure, say N.
+ # # IO delay types: # diff --git a/arch/x86/Makefile b/arch/x86/Makefile index 1f3851a..f79580c 100644 --- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -154,6 +154,9 @@ all: bzImage KBUILD_IMAGE := $(boot)/bzImage bzImage: vmlinux +ifeq ($(CONFIG_X86_DECODER_SELFTEST),y) + $(Q)$(MAKE) $(build)=arch/x86/tools posttest +endif $(Q)$(MAKE) $(build)=$(boot) $(KBUILD_IMAGE) $(Q)mkdir -p $(objtree)/arch/$(UTS_MACHINE)/boot $(Q)ln -fsn ../../x86/boot/bzImage $(objtree)/arch/$(UTS_MACHINE)/boot/$@ diff --git a/arch/x86/tools/Makefile b/arch/x86/tools/Makefile new file mode 100644 index 000..3dd626b --- /dev/null +++ b/arch/x86/tools/Makefile @@ -0,0 +1,15 @@ +PHONY += posttest +quiet_cmd_posttest = TEST $@ + cmd_posttest = $(OBJDUMP) -d $(objtree)/vmlinux | awk -f $(srctree)/arch/x86/tools/distill.awk | $(obj)/test_get_len + +posttest: $(obj)/test_get_len vmlinux + $(call cmd,posttest) + +hostprogs-y:= test_get_len + +# -I needed for generated C source and C source which is in the kernel tree. +HOSTCFLAGS_test_get_len.o := -Wall -I$(objtree)/arch/x86/lib/ -I$(srctree)/arch/x86/include/ -I$(srctree)/arch/x86/lib/ + +# Dependencies are also needed. +$(obj)/test_get_len.o: $(srctree)/arch/x86/lib/insn.c $(srctree)/arch/x86/lib/inat.c $(srctree)/arch/x86/include/asm/inat_types.h $(srctree)/arch/x86/include/asm/inat.h $(srctree)/arch/x86/include/asm/insn.h $(objtree)/arch/x86/lib/inat-tables.c + diff --git a/arch/x86/tools/distill.awk b/arch/x86/tools/distill.awk new file mode 100644 index 000..d433619 --- /dev/null +++ b/arch/x86/tools/distill.awk @@ -0,0 +1,42 @@ +#!/bin/awk -f +# Usage: objdump -d a.out | awk -f distill.awk | ./test_get_len +# Distills the disassembly as follows: +# - Removes all lines except the disassembled instructions. +# - For instructions that exceed 1 line (7 bytes), crams all the hex bytes +# into a single line.
+# - Remove bad(or prefix only) instructions + +BEGIN { + prev_addr = "" + prev_hex = "" + prev_mnemonic = "" + bad_expr = "(\\(bad\\)|^rex|^.byte|^rep(z|nz)$|^lock$|^es$|^cs$|^ss$|^ds$|^fs$|^gs$|^data(16|32)$|^addr(16|32|64))" + fwait_expr = "^9b " + fwait_str = "9b\tfwait" +} + +/^ *[0-9a-f]+:/ { + if (split($0, field, "\t") < 3) { + # This is a continuation of the same insn. + prev_hex = prev_hex field[2] + } else { + # Skip bad instructions + if (match(prev_mnemonic, bad_expr)) + prev_addr = "" + # Split fwait from other f* instructions + if (match(prev_hex, fwait_expr) && prev_mnemonic != "fwait") { + printf "%s\t%s\n", prev_addr, fwait_str + sub(fwait_expr, "", prev_hex) + } + if (prev_addr
[PATCH -tip v14 01/12] x86: instruction decoder API
Add an x86 instruction decoder to the arch-specific libraries. This decoder can decode the x86 instructions used in the kernel into prefix, opcode, modrm, sib, displacement and immediate parts, and can also report the length of an instruction. This version introduces instruction attributes for decoding instructions. The instruction attribute tables are generated from the opcode map file (x86-opcode-map.txt) by the generator script (gen-insn-attr-x86.awk). Currently, the opcode maps are based on the opcode maps in the Intel(R) 64 and IA-32 Architectures Software Developer's Manual Vol. 2, Appendix A, and consist of the following two types of opcode tables. 1-byte/2-byte/3-byte opcode tables, each of which has 256 elements, are written as below; Table: table-name Referrer: escaped-name opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...] (or) opcode: escape # escaped-name EndTable Group opcode tables, each of which has 8 elements, are written as below; GrpTable: GrpXXX reg: mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...] EndTable These opcode maps include a few SSE and FP opcodes (for setup), because those opcodes are used in the kernel. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Signed-off-by: Jim Keniston jkeni...@us.ibm.com Acked-by: H. Peter Anvin h...@zytor.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Avi Kivity a...@redhat.com Cc: Andi Kleen a...@linux.intel.com Cc: Christoph Hellwig h...@infradead.org Cc: Frank Ch.
Eigler f...@redhat.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: Ingo Molnar mi...@elte.hu Cc: Jason Baron jba...@redhat.com Cc: K.Prasad pra...@linux.vnet.ibm.com Cc: Lai Jiangshan la...@cn.fujitsu.com Cc: Li Zefan l...@cn.fujitsu.com Cc: Przemysław Pawełczyk przemys...@pawelczyk.it Cc: Roland McGrath rol...@redhat.com Cc: Sam Ravnborg s...@ravnborg.org Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: Steven Rostedt rost...@goodmis.org Cc: Tom Zanussi tzanu...@gmail.com Cc: Vegard Nossum vegard.nos...@gmail.com --- arch/x86/include/asm/inat.h | 188 + arch/x86/include/asm/inat_types.h| 29 + arch/x86/include/asm/insn.h | 143 +++ arch/x86/lib/Makefile| 13 + arch/x86/lib/inat.c | 78 arch/x86/lib/insn.c | 464 ++ arch/x86/lib/x86-opcode-map.txt | 719 ++ arch/x86/tools/gen-insn-attr-x86.awk | 314 +++ 8 files changed, 1948 insertions(+), 0 deletions(-) create mode 100644 arch/x86/include/asm/inat.h create mode 100644 arch/x86/include/asm/inat_types.h create mode 100644 arch/x86/include/asm/insn.h create mode 100644 arch/x86/lib/inat.c create mode 100644 arch/x86/lib/insn.c create mode 100644 arch/x86/lib/x86-opcode-map.txt create mode 100644 arch/x86/tools/gen-insn-attr-x86.awk diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h new file mode 100644 index 000..2866fdd --- /dev/null +++ b/arch/x86/include/asm/inat.h @@ -0,0 +1,188 @@ +#ifndef _ASM_X86_INAT_H +#define _ASM_X86_INAT_H +/* + * x86 instruction attributes + * + * Written by Masami Hiramatsu mhira...@redhat.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + */ +#include <asm/inat_types.h> + +/* + * Internal bits. Don't use bitmasks directly, because these bits are + * unstable. You should use checking functions. + */ + +#define INAT_OPCODE_TABLE_SIZE 256 +#define INAT_GROUP_TABLE_SIZE 8 + +/* Legacy instruction prefixes */ +#define INAT_PFX_OPNDSZ 1 /* 0x66 */ /* LPFX1 */ +#define INAT_PFX_REPNE 2 /* 0xF2 */ /* LPFX2 */ +#define INAT_PFX_REPE 3 /* 0xF3 */ /* LPFX3 */ +#define INAT_PFX_LOCK 4 /* 0xF0 */ +#define INAT_PFX_CS 5 /* 0x2E */ +#define INAT_PFX_DS 6 /* 0x3E */ +#define INAT_PFX_ES 7 /* 0x26 */ +#define INAT_PFX_FS 8 /* 0x64 */ +#define INAT_PFX_GS 9 /* 0x65 */ +#define INAT_PFX_SS 10 /* 0x36 */ +#define INAT_PFX_ADDRSZ 11 /* 0x67 */ + +#define INAT_LPREFIX_MAX 3 + +/* Immediate size */ +#define INAT_IMM_BYTE 1 +#define INAT_IMM_WORD 2 +#define INAT_IMM_DWORD 3 +#define INAT_IMM_QWORD 4 +#define INAT_IMM_PTR 5 +#define INAT_IMM_VWORD32 6
[PATCH -tip v14 00/12] tracing: kprobe-based event tracer and x86 instruction decoder
a new definition to kprobe_events as below. echo p:myprobe do_sys_open a0 a1 a2 a3 > /sys/kernel/debug/tracing/kprobe_events This sets a kprobe on the top of do_sys_open() function with recording 1st to 4th arguments as myprobe event. echo r:myretprobe do_sys_open rv ra > /sys/kernel/debug/tracing/kprobe_events This sets a kretprobe on the return point of do_sys_open() function with recording return value and return address as myretprobe event. You can see the format of these events via /sys/kernel/debug/tracing/events/kprobes/EVENT/format. cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format name: myprobe ID: 23 format: field:unsigned short common_type; offset:0; size:2; field:unsigned char common_flags; offset:2; size:1; field:unsigned char common_preempt_count; offset:3; size:1; field:int common_pid; offset:4; size:4; field:int common_tgid; offset:8; size:4; field: unsigned long ip; offset:16; size:8; field: int nargs; offset:24; size:4; field: unsigned long arg0; offset:32; size:8; field: unsigned long arg1; offset:40; size:8; field: unsigned long arg2; offset:48; size:8; field: unsigned long arg3; offset:56; size:8; alias: a0; original: arg0; alias: a1; original: arg1; alias: a2; original: arg2; alias: a3; original: arg3; print fmt: "%lx: 0x%lx 0x%lx 0x%lx 0x%lx", ip, arg0, arg1, arg2, arg3 You can see that the event has 4 arguments and alias expressions corresponding to it. echo > /sys/kernel/debug/tracing/kprobe_events This clears all probe points. and you can see the traced information via /sys/kernel/debug/tracing/trace.
cat /sys/kernel/debug/tracing/trace # tracer: nop # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | ...-1447 [001] 1038282.286875: do_sys_open+0x0/0xd6: 0x3 0x7fffd1ec4440 0x8000 0x0 ...-1447 [001] 1038282.286878: sys_openat+0xc/0xe <- do_sys_open: 0xfffe 0x81367a3a ...-1447 [001] 1038282.286885: do_sys_open+0x0/0xd6: 0xff9c 0x40413c 0x8000 0x1b6 ...-1447 [001] 1038282.286915: sys_open+0x1b/0x1d <- do_sys_open: 0x3 0x81367a3a ...-1447 [001] 1038282.286969: do_sys_open+0x0/0xd6: 0xff9c 0x4041c6 0x98800 0x10 ...-1447 [001] 1038282.286976: sys_open+0x1b/0x1d <- do_sys_open: 0x3 0x81367a3a Each line shows when the kernel hits a probe, and '<- SYMBOL' means the kernel returns from SYMBOL (e.g. sys_open+0x1b/0x1d <- do_sys_open means the kernel returns from do_sys_open to sys_open+0x1b). Thank you, --- Masami Hiramatsu (12): tracing: Add kprobes event profiling interface tracing: Kprobe tracer assigns new event ids for each event tracing: Generate names for each kprobe event automatically tracing: Kprobe-tracer supports more than 6 arguments tracing: add kprobe-based event tracer tracing: Introduce TRACE_FIELD_ZERO() macro tracing: ftrace dynamic ftrace_event_call support x86: add pt_regs register and stack access APIs kprobes: cleanup fix_riprel() using insn decoder on x86 kprobes: check probe address is instruction boundary on x86 x86: x86 instruction decoder build-time selftest x86: instruction decoder API Documentation/trace/kprobetrace.txt | 148 arch/x86/Kconfig.debug |9 arch/x86/Makefile|3 arch/x86/include/asm/inat.h | 188 + arch/x86/include/asm/inat_types.h| 29 + arch/x86/include/asm/insn.h | 143 arch/x86/include/asm/ptrace.h| 62 ++ arch/x86/kernel/kprobes.c| 197 +++-- arch/x86/kernel/ptrace.c | 112 +++ arch/x86/lib/Makefile| 13 arch/x86/lib/inat.c | 78 ++ arch/x86/lib/insn.c | 464 + arch/x86/lib/x86-opcode-map.txt | 719 arch/x86/tools/Makefile | 15 arch/x86/tools/distill.awk | 42 + arch/x86/tools/gen-insn-attr-x86.awk | 314 + arch/x86/tools/test_get_len.c| 113 +++
include/linux/ftrace_event.h | 14 include/linux/syscalls.h |4 include/trace/ftrace.h | 19 - include/trace/syscall.h |8 kernel/trace/Kconfig | 12 kernel/trace/Makefile|1 kernel/trace/trace.h | 23 + kernel/trace/trace_event_types.h |4 kernel/trace/trace_events.c | 119 ++- kernel/trace/trace_export.c | 39 + kernel/trace/trace_kprobe.c | 1234 ++ kernel/trace/trace_syscalls.c| 16 29 files changed, 3949 insertions(+), 193 deletions(-) create mode 100644
[PATCH -tip v14 04/12] kprobes: cleanup fix_riprel() using insn decoder on x86
Cleanup fix_riprel() in arch/x86/kernel/kprobes.c by using x86 instruction decoder. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Avi Kivity a...@redhat.com Cc: Andi Kleen a...@linux.intel.com Cc: Christoph Hellwig h...@infradead.org Cc: Frank Ch. Eigler f...@redhat.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: H. Peter Anvin h...@zytor.com Cc: Ingo Molnar mi...@elte.hu Cc: Jason Baron jba...@redhat.com Cc: Jim Keniston jkeni...@us.ibm.com Cc: K.Prasad pra...@linux.vnet.ibm.com Cc: Lai Jiangshan la...@cn.fujitsu.com Cc: Li Zefan l...@cn.fujitsu.com Cc: Przemysław Pawełczyk przemys...@pawelczyk.it Cc: Roland McGrath rol...@redhat.com Cc: Sam Ravnborg s...@ravnborg.org Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: Steven Rostedt rost...@goodmis.org Cc: Tom Zanussi tzanu...@gmail.com Cc: Vegard Nossum vegard.nos...@gmail.com --- arch/x86/kernel/kprobes.c | 128 - 1 files changed, 23 insertions(+), 105 deletions(-) diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c index 80d493f..98f48d0 100644 --- a/arch/x86/kernel/kprobes.c +++ b/arch/x86/kernel/kprobes.c @@ -109,50 +109,6 @@ static const u32 twobyte_is_boostable[256 / 32] = { /* --- */ /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ }; -static const u32 onebyte_has_modrm[256 / 32] = { - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ - /* --- */ - W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 00 */ - W(0x10, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 10 */ - W(0x20, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 20 */ - W(0x30, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 30 */ - W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 40 */ - W(0x50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 50 */ - W(0x60, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0) | /* 60 */ - W(0x70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 70 */ - W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 
*/ - W(0x90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 90 */ - W(0xa0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* a0 */ - W(0xb0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* b0 */ - W(0xc0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* c0 */ - W(0xd0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */ - W(0xe0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* e0 */ - W(0xf0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1) /* f0 */ - /* --- */ - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ -}; -static const u32 twobyte_has_modrm[256 / 32] = { - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ - /* --- */ - W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1) | /* 0f */ - W(0x10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0) , /* 1f */ - W(0x20, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1) | /* 2f */ - W(0x30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 3f */ - W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 4f */ - W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 5f */ - W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 6f */ - W(0x70, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1) , /* 7f */ - W(0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 8f */ - W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 9f */ - W(0xa0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1) | /* af */ - W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1) , /* bf */ - W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* cf */ - W(0xd0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* df */ - W(0xe0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* ef */ - W(0xf0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0) /* ff */ - /* --- */ - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ -}; #undef W struct kretprobe_blackpoint kretprobe_blacklist[] = { @@ -345,68 +301,30 @@ static int __kprobes is_IF_modifier(kprobe_opcode_t *insn) static void __kprobes fix_riprel(struct kprobe *p) { 
#ifdef CONFIG_X86_64 - u8 *insn = p->ainsn.insn; - s64 disp; - int need_modrm; - - /* Skip legacy instruction prefixes. */ - while (1) { - switch (*insn) { - case 0x66: - case 0x67
[PATCH -tip v14 06/12] tracing: ftrace dynamic ftrace_event_call support
Add dynamic ftrace_event_call support to ftrace. Trace engines can add new ftrace_event_calls to ftrace on the fly. Each operator function of the call takes an ftrace_event_call data structure as an argument, because these functions may be shared among several ftrace_event_calls. Changes from v13: - Define remove_subsystem_dir() always (revert a2ca5e03), because trace_remove_event_call() uses it. - Modify the syscall tracer because of the ftrace_event_call change. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Acked-by: Frederic Weisbecker fweis...@gmail.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Avi Kivity a...@redhat.com Cc: Andi Kleen a...@linux.intel.com Cc: Christoph Hellwig h...@infradead.org Cc: Frank Ch. Eigler f...@redhat.com Cc: H. Peter Anvin h...@zytor.com Cc: Ingo Molnar mi...@elte.hu Cc: Jason Baron jba...@redhat.com Cc: Jim Keniston jkeni...@us.ibm.com Cc: K.Prasad pra...@linux.vnet.ibm.com Cc: Lai Jiangshan la...@cn.fujitsu.com Cc: Li Zefan l...@cn.fujitsu.com Cc: Przemysław Pawełczyk przemys...@pawelczyk.it Cc: Roland McGrath rol...@redhat.com Cc: Sam Ravnborg s...@ravnborg.org Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: Steven Rostedt rost...@goodmis.org Cc: Tom Zanussi tzanu...@gmail.com Cc: Vegard Nossum vegard.nos...@gmail.com --- include/linux/ftrace_event.h | 14 +++-- include/linux/syscalls.h |4 + include/trace/ftrace.h| 19 +++ include/trace/syscall.h |8 +-- kernel/trace/trace_events.c | 119 + kernel/trace/trace_export.c | 23 kernel/trace/trace_syscalls.c | 16 +++--- 7 files changed, 125 insertions(+), 78 deletions(-) diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h index 189806b..9af68ce 100644 --- a/include/linux/ftrace_event.h +++ b/include/linux/ftrace_event.h @@ -112,13 +112,13 @@ struct ftrace_event_call { struct dentry *dir; struct trace_event *event; int enabled; - int (*regfunc)(void *); - void (*unregfunc)(void *); + int (*regfunc)(struct ftrace_event_call *); + void (*unregfunc)(struct
ftrace_event_call *); int id; - int (*raw_init)(void); - int (*show_format)(struct ftrace_event_call *call, - struct trace_seq *s); - int (*define_fields)(void); + int (*raw_init)(struct ftrace_event_call *); + int (*show_format)(struct ftrace_event_call *, + struct trace_seq *); + int (*define_fields)(struct ftrace_event_call *); struct list_head fields; int filter_active; struct event_filter *filter; @@ -142,6 +142,8 @@ extern int filter_current_check_discard(struct ftrace_event_call *call, extern int trace_define_field(struct ftrace_event_call *call, char *type, char *name, int offset, int size, int is_signed); +extern int trace_add_event_call(struct ftrace_event_call *call); +extern void trace_remove_event_call(struct ftrace_event_call *call); #define is_signed_type(type) (((type)(-1)) < 0) diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 87d06c1..be59d22 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -165,7 +165,7 @@ static void prof_sysexit_disable_##sname(struct ftrace_event_call *event_call) \ struct trace_event enter_syscall_print_##sname = { \ .trace = print_syscall_enter, \ }; \ - static int init_enter_##sname(void) \ + static int init_enter_##sname(struct ftrace_event_call *call) \ { \ int num, id;\ num = syscall_name_to_nr("sys"#sname); \ @@ -201,7 +201,7 @@ static void prof_sysexit_disable_##sname(struct ftrace_event_call *event_call) \ struct trace_event exit_syscall_print_##sname = { \ .trace = print_syscall_exit, \ }; \ - static int init_exit_##sname(void) \ + static int init_exit_##sname(struct ftrace_event_call *call)\ { \ int num, id;\ num = syscall_name_to_nr("sys"#sname); \ diff --git
[PATCH -tip v14 07/12] tracing: Introduce TRACE_FIELD_ZERO() macro
Use TRACE_FIELD_ZERO(type, item) instead of TRACE_FIELD_ZERO_CHAR(item). This also includes a fix of the TRACE_ZERO_CHAR() macro. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Avi Kivity a...@redhat.com Cc: Andi Kleen a...@linux.intel.com Cc: Christoph Hellwig h...@infradead.org Cc: Frank Ch. Eigler f...@redhat.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: H. Peter Anvin h...@zytor.com Cc: Ingo Molnar mi...@elte.hu Cc: Jason Baron jba...@redhat.com Cc: Jim Keniston jkeni...@us.ibm.com Cc: K.Prasad pra...@linux.vnet.ibm.com Cc: Lai Jiangshan la...@cn.fujitsu.com Cc: Li Zefan l...@cn.fujitsu.com Cc: Przemysław Pawełczyk przemys...@pawelczyk.it Cc: Roland McGrath rol...@redhat.com Cc: Sam Ravnborg s...@ravnborg.org Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: Steven Rostedt rost...@goodmis.org Cc: Tom Zanussi tzanu...@gmail.com Cc: Vegard Nossum vegard.nos...@gmail.com --- kernel/trace/trace_event_types.h |4 ++-- kernel/trace/trace_export.c | 16 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/kernel/trace/trace_event_types.h b/kernel/trace/trace_event_types.h index 6db005e..e74f090 100644 --- a/kernel/trace/trace_event_types.h +++ b/kernel/trace/trace_event_types.h @@ -109,7 +109,7 @@ TRACE_EVENT_FORMAT(bprint, TRACE_BPRINT, bprint_entry, ignore, TRACE_STRUCT( TRACE_FIELD(unsigned long, ip, ip) TRACE_FIELD(char *, fmt, fmt) - TRACE_FIELD_ZERO_CHAR(buf) + TRACE_FIELD_ZERO(char, buf) ), TP_RAW_FMT("%08lx (%d) fmt:%p %s") ); @@ -117,7 +117,7 @@ TRACE_EVENT_FORMAT(bprint, TRACE_BPRINT, bprint_entry, ignore, TRACE_EVENT_FORMAT(print, TRACE_PRINT, print_entry, ignore, TRACE_STRUCT( TRACE_FIELD(unsigned long, ip, ip) - TRACE_FIELD_ZERO_CHAR(buf) + TRACE_FIELD_ZERO(char, buf) ), TP_RAW_FMT("%08lx (%d) fmt:%p %s") ); diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c index 71c8d7f..b0ac92c 100644 --- a/kernel/trace/trace_export.c +++ b/kernel/trace/trace_export.c @@ -42,9
@@ extern void __bad_type_size(void); if (!ret) \ return 0; -#undef TRACE_FIELD_ZERO_CHAR -#define TRACE_FIELD_ZERO_CHAR(item)\ - ret = trace_seq_printf(s, "\tfield:char " #item ";\t" \ +#undef TRACE_FIELD_ZERO +#define TRACE_FIELD_ZERO(type, item) \ + ret = trace_seq_printf(s, "\tfield:" #type " " #item ";\t" \ "offset:%u;\tsize:0;\n", \ (unsigned int)offsetof(typeof(field), item)); \ if (!ret) \ @@ -92,9 +92,6 @@ ftrace_format_##call(struct ftrace_event_call *unused, \ #include "trace_event_types.h" -#undef TRACE_ZERO_CHAR -#define TRACE_ZERO_CHAR(arg) - #undef TRACE_FIELD #define TRACE_FIELD(type, item, assign)\ entry->item = assign; @@ -107,6 +104,9 @@ ftrace_format_##call(struct ftrace_event_call *unused, \ #define TRACE_FIELD_SIGN(type, item, assign, is_signed)\ TRACE_FIELD(type, item, assign) +#undef TRACE_FIELD_ZERO +#define TRACE_FIELD_ZERO(type, item) + #undef TP_CMD #define TP_CMD(cmd...) cmd @@ -178,8 +178,8 @@ __attribute__((section("_ftrace_events"))) event_##call = { \ if (ret)\ return ret; -#undef TRACE_FIELD_ZERO_CHAR -#define TRACE_FIELD_ZERO_CHAR(item) +#undef TRACE_FIELD_ZERO +#define TRACE_FIELD_ZERO(type, item) #undef TRACE_EVENT_FORMAT #define TRACE_EVENT_FORMAT(call, proto, args, fmt, tstruct, tpfmt) \ -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH -tip v14 10/12] tracing: Generate names for each kprobe event automatically
Generate names for each kprobe event based on the probe point, and remove generic k*probe event types because there is no user of those types. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Avi Kivity a...@redhat.com Cc: Andi Kleen a...@linux.intel.com Cc: Christoph Hellwig h...@infradead.org Cc: Frank Ch. Eigler f...@redhat.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: H. Peter Anvin h...@zytor.com Cc: Ingo Molnar mi...@elte.hu Cc: Jason Baron jba...@redhat.com Cc: Jim Keniston jkeni...@us.ibm.com Cc: K.Prasad pra...@linux.vnet.ibm.com Cc: Lai Jiangshan la...@cn.fujitsu.com Cc: Li Zefan l...@cn.fujitsu.com Cc: Przemysław Pawełczyk przemys...@pawelczyk.it Cc: Roland McGrath rol...@redhat.com Cc: Sam Ravnborg s...@ravnborg.org Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: Steven Rostedt rost...@goodmis.org Cc: Tom Zanussi tzanu...@gmail.com Cc: Vegard Nossum vegard.nos...@gmail.com --- Documentation/trace/kprobetrace.txt |3 +- kernel/trace/trace_event_types.h| 18 -- kernel/trace/trace_kprobe.c | 64 ++- 3 files changed, 35 insertions(+), 50 deletions(-) diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt index c9c09b4..5e59e85 100644 --- a/Documentation/trace/kprobetrace.txt +++ b/Documentation/trace/kprobetrace.txt @@ -28,7 +28,8 @@ Synopsis of kprobe_events p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS]: Set a probe r[:EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe - EVENT : Event name. + EVENT : Event name. If omitted, the event name is generated + based on SYMBOL+offs or MEMADDR. SYMBOL[+offs|-offs] : Symbol+offset where the probe is inserted. MEMADDR : Address where the probe is inserted. 
diff --git a/kernel/trace/trace_event_types.h b/kernel/trace/trace_event_types.h index 186b598..e74f090 100644 --- a/kernel/trace/trace_event_types.h +++ b/kernel/trace/trace_event_types.h @@ -175,22 +175,4 @@ TRACE_EVENT_FORMAT(kmem_free, TRACE_KMEM_FREE, kmemtrace_free_entry, ignore, TP_RAW_FMT("type:%u call_site:%lx ptr:%p") ); -TRACE_EVENT_FORMAT(kprobe, TRACE_KPROBE, kprobe_trace_entry, ignore, - TRACE_STRUCT( - TRACE_FIELD(unsigned long, ip, ip) - TRACE_FIELD(int, nargs, nargs) - TRACE_FIELD_ZERO(unsigned long, args) - ), - TP_RAW_FMT("%08lx: args:0x%lx ...") -); - -TRACE_EVENT_FORMAT(kretprobe, TRACE_KRETPROBE, kretprobe_trace_entry, ignore, - TRACE_STRUCT( - TRACE_FIELD(unsigned long, func, func) - TRACE_FIELD(unsigned long, ret_ip, ret_ip) - TRACE_FIELD(int, nargs, nargs) - TRACE_FIELD_ZERO(unsigned long, args) - ), - TP_RAW_FMT("%08lx <- %08lx: args:0x%lx ...") -); #undef TRACE_SYSTEM diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index 4704e40..ec137ed 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -34,6 +34,7 @@ #define MAX_TRACE_ARGS 128 #define MAX_ARGSTR_LEN 63 +#define MAX_EVENT_NAME_LEN 64 /* currently, trace_kprobe only supports X86.
*/ @@ -280,11 +281,11 @@ static struct trace_probe *alloc_trace_probe(const char *symbol, if (!tp->symbol) goto error; } - if (event) { - tp->call.name = kstrdup(event, GFP_KERNEL); - if (!tp->call.name) - goto error; - } + if (!event) + goto error; + tp->call.name = kstrdup(event, GFP_KERNEL); + if (!tp->call.name) + goto error; INIT_LIST_HEAD(&tp->list); return tp; @@ -314,7 +315,7 @@ static struct trace_probe *find_probe_event(const char *event) struct trace_probe *tp; list_for_each_entry(tp, &probe_list, list) - if (tp->call.name && !strcmp(tp->call.name, event)) + if (!strcmp(tp->call.name, event)) return tp; return NULL; } @@ -330,8 +331,7 @@ static void __unregister_trace_probe(struct trace_probe *tp) /* Unregister a trace_probe and probe_event: call with locking probe_lock */ static void unregister_trace_probe(struct trace_probe *tp) { - if (tp->call.name) - unregister_probe_event(tp); + unregister_probe_event(tp); __unregister_trace_probe(tp); list_del(&tp->list); } @@ -360,18 +360,16 @@ static int register_trace_probe(struct trace_probe *tp) goto end; } /* register as an event */ - if (tp->call.name) { - old_tp = find_probe_event(tp->call.name); - if (old_tp) { - /* delete old event */ - unregister_trace_probe(old_tp); - free_trace_probe(old_tp
[PATCH -tip v14 12/12] tracing: Add kprobes event profiling interface
Add profiling interfaces for each kprobes event. This interface shows how many times each probe has hit or missed. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Avi Kivity a...@redhat.com Cc: Andi Kleen a...@linux.intel.com Cc: Christoph Hellwig h...@infradead.org Cc: Frank Ch. Eigler f...@redhat.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: H. Peter Anvin h...@zytor.com Cc: Ingo Molnar mi...@elte.hu Cc: Jason Baron jba...@redhat.com Cc: Jim Keniston jkeni...@us.ibm.com Cc: K.Prasad pra...@linux.vnet.ibm.com Cc: Lai Jiangshan la...@cn.fujitsu.com Cc: Li Zefan l...@cn.fujitsu.com Cc: Przemysław Pawełczyk przemys...@pawelczyk.it Cc: Roland McGrath rol...@redhat.com Cc: Sam Ravnborg s...@ravnborg.org Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: Steven Rostedt rost...@goodmis.org Cc: Tom Zanussi tzanu...@gmail.com Cc: Vegard Nossum vegard.nos...@gmail.com --- Documentation/trace/kprobetrace.txt |8 +++ kernel/trace/trace_kprobe.c | 43 +++ 2 files changed, 51 insertions(+), 0 deletions(-) diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt index 5e59e85..3de7517 100644 --- a/Documentation/trace/kprobetrace.txt +++ b/Documentation/trace/kprobetrace.txt @@ -70,6 +70,14 @@ filter: names and field names for describing filters. +Event Profiling +--- + You can check the total number of probe hits and probe miss-hits via +/sys/kernel/debug/tracing/kprobe_profile. + The first column is the event name, the second is the number of probe hits, +the third is the number of probe miss-hits.
+ + Usage examples -- To add a probe as a new event, write a new definition to kprobe_events diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index 0e8498e..0f5d0a6 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -184,6 +184,7 @@ struct trace_probe { struct kprobe kp; struct kretprobe rp; }; + unsigned long nhit; const char *symbol;/* symbol name */ struct ftrace_event_call call; struct trace_event event; @@ -781,6 +782,37 @@ static const struct file_operations kprobe_events_ops = { .write = probes_write, }; +/* Probes profiling interfaces */ +static int probes_profile_seq_show(struct seq_file *m, void *v) +{ + struct trace_probe *tp = v; + + seq_printf(m, "%-44s %15lu %15lu\n", tp->call.name, tp->nhit, + probe_is_return(tp) ? tp->rp.kp.nmissed : tp->kp.nmissed); + + return 0; +} + +static const struct seq_operations profile_seq_op = { + .start = probes_seq_start, + .next = probes_seq_next, + .stop = probes_seq_stop, + .show = probes_profile_seq_show +}; + +static int profile_open(struct inode *inode, struct file *file) +{ + return seq_open(file, &profile_seq_op); +} + +static const struct file_operations kprobe_profile_ops = { + .owner = THIS_MODULE, + .open = profile_open, + .read = seq_read, + .llseek = seq_lseek, + .release= seq_release, +}; + /* Kprobe handler */ static __kprobes int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs) { @@ -791,6 +823,8 @@ static __kprobes int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs) unsigned long irq_flags; struct ftrace_event_call *call = &tp->call; + tp->nhit++; + local_save_flags(irq_flags); pc = preempt_count(); @@ -1143,9 +1177,18 @@ static __init int init_kprobe_trace(void) entry = debugfs_create_file("kprobe_events", 0644, d_tracer, NULL, &kprobe_events_ops); + /* Event list interface */ if (!entry) pr_warning("Could not create debugfs 'kprobe_events' entry\n"); + + /* Profile interface */ + entry = debugfs_create_file("kprobe_profile", 0444, d_tracer, + NULL,
&kprobe_profile_ops); + + if (!entry) + pr_warning("Could not create debugfs " + "'kprobe_profile' entry\n"); return 0; } fs_initcall(init_kprobe_trace); -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com
[PATCH -tip v14 08/12] tracing: add kprobe-based event tracer
Add a kprobes-based event tracer on ftrace. This tracer is similar to the events tracer, which is based on the Tracepoint infrastructure. Instead of Tracepoints, this tracer is based on kprobes (kprobe and kretprobe). It can probe anywhere kprobes can probe (this means, all function bodies except for __kprobes functions). Similar to the events tracer, this tracer doesn't need to be activated via current_tracer; instead, just set probe points via /sys/kernel/debug/tracing/kprobe_events. And you can set filters on each probe event via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter. This tracer supports the following probe arguments for each probe. %REG : Fetch register REG sN: Fetch Nth entry of stack (N >= 0) sa: Fetch stack address. @ADDR : Fetch memory at ADDR (ADDR should be in kernel) @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol) aN: Fetch function argument. (N >= 0) rv: Fetch return value. ra: Fetch return address. +|-offs(FETCHARG) : fetch memory at FETCHARG +|- offs address. See Documentation/trace/kprobetrace.txt for details. Changes from v13: - Support 'sa' for stack address. - Use call->data instead of the container_of() macro. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Acked-by: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Avi Kivity a...@redhat.com Cc: Andi Kleen a...@linux.intel.com Cc: Christoph Hellwig h...@infradead.org Cc: Frank Ch. Eigler f...@redhat.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: H.
Peter Anvin h...@zytor.com Cc: Ingo Molnar mi...@elte.hu Cc: Jason Baron jba...@redhat.com Cc: Jim Keniston jkeni...@us.ibm.com Cc: K.Prasad pra...@linux.vnet.ibm.com Cc: Lai Jiangshan la...@cn.fujitsu.com Cc: Li Zefan l...@cn.fujitsu.com Cc: Przemysław Pawełczyk przemys...@pawelczyk.it Cc: Roland McGrath rol...@redhat.com Cc: Sam Ravnborg s...@ravnborg.org Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: Steven Rostedt rost...@goodmis.org Cc: Tom Zanussi tzanu...@gmail.com Cc: Vegard Nossum vegard.nos...@gmail.com --- Documentation/trace/kprobetrace.txt | 139 kernel/trace/Kconfig| 12 kernel/trace/Makefile |1 kernel/trace/trace.h| 29 + kernel/trace/trace_event_types.h| 18 + kernel/trace/trace_kprobe.c | 1205 +++ 6 files changed, 1404 insertions(+), 0 deletions(-) create mode 100644 Documentation/trace/kprobetrace.txt create mode 100644 kernel/trace/trace_kprobe.c diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt new file mode 100644 index 000..efff6eb --- /dev/null +++ b/Documentation/trace/kprobetrace.txt @@ -0,0 +1,139 @@ + Kprobe-based Event Tracer + = + + Documentation is written by Masami Hiramatsu + + +Overview + +This tracer is similar to the events tracer, which is based on the Tracepoint +infrastructure. Instead of Tracepoints, this tracer is based on kprobes (kprobe +and kretprobe). It can probe anywhere kprobes can probe (this means, all +function bodies except for __kprobes functions). + +Unlike the function tracer, this tracer can probe instructions inside of +kernel functions. It allows you to check which instruction has been executed. + +Unlike the Tracepoint-based events tracer, this tracer can add and remove +probe points on the fly. + +Similar to the events tracer, this tracer doesn't need to be activated via +current_tracer; instead, just set probe points via +/sys/kernel/debug/tracing/kprobe_events. And you can set filters on each +probe event via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter.
+
+
+Synopsis of kprobe_events
+-------------------------
+ p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS] : Set a probe
+ r[:EVENT] SYMBOL[+0] [FETCHARGS]                  : Set a return probe
+
+ EVENT               : Event name.
+ SYMBOL[+offs|-offs] : Symbol+offset where the probe is inserted.
+ MEMADDR             : Address where the probe is inserted.
+
+ FETCHARGS           : Arguments.
+  %REG              : Fetch register REG
+  sN                : Fetch Nth entry of stack (N >= 0)
+  sa                : Fetch stack address.
+  @ADDR             : Fetch memory at ADDR (ADDR should be in kernel)
+  @SYM[+|-offs]     : Fetch memory at SYM +|- offs (SYM should be a data symbol)
+  aN                : Fetch Nth function argument. (N >= 0)(*)
+  rv                : Fetch return value.(**)
+  ra                : Fetch return address.(**)
+  +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(***)
+
+ (*) aN may not be correct on asmlinkage'd functions and in the middle of
+     a function body.
+ (**) only for return probe.
+ (***) this is useful for fetching a field of data structures.
+
+
+Per-Probe Event Filtering
+-------------------------
+ The per-probe event filtering feature allows you to set a different filter on each
+probe and gives you what arguments will be shown in trace
[PATCH -tip v14 09/12] tracing: Kprobe-tracer supports more than 6 arguments
Support up to 128 arguments for each kprobes event. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Avi Kivity a...@redhat.com Cc: Andi Kleen a...@linux.intel.com Cc: Christoph Hellwig h...@infradead.org Cc: Frank Ch. Eigler f...@redhat.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: H. Peter Anvin h...@zytor.com Cc: Ingo Molnar mi...@elte.hu Cc: Jason Baron jba...@redhat.com Cc: Jim Keniston jkeni...@us.ibm.com Cc: K.Prasad pra...@linux.vnet.ibm.com Cc: Lai Jiangshan la...@cn.fujitsu.com Cc: Li Zefan l...@cn.fujitsu.com Cc: Przemysław Pawełczyk przemys...@pawelczyk.it Cc: Roland McGrath rol...@redhat.com Cc: Sam Ravnborg s...@ravnborg.org Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: Steven Rostedt rost...@goodmis.org Cc: Tom Zanussi tzanu...@gmail.com Cc: Vegard Nossum vegard.nos...@gmail.com --- Documentation/trace/kprobetrace.txt |2 +- kernel/trace/trace_kprobe.c | 21 + 2 files changed, 14 insertions(+), 9 deletions(-) diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt index efff6eb..c9c09b4 100644 --- a/Documentation/trace/kprobetrace.txt +++ b/Documentation/trace/kprobetrace.txt @@ -32,7 +32,7 @@ Synopsis of kprobe_events SYMBOL[+offs|-offs] : Symbol+offset where the probe is inserted. MEMADDR : Address where the probe is inserted. - FETCHARGS : Arguments. + FETCHARGS : Arguments. Each probe can have up to 128 args. %REG : Fetch register REG sN : Fetch Nth entry of stack (N = 0) sa : Fetch stack address. diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index d92877a..4704e40 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -32,7 +32,7 @@ #include trace.h #include trace_output.h -#define TRACE_KPROBE_ARGS 6 +#define MAX_TRACE_ARGS 128 #define MAX_ARGSTR_LEN 63 /* currently, trace_kprobe only supports X86. 
*/ @@ -184,11 +184,15 @@ struct trace_probe { struct kretproberp; }; const char *symbol;/* symbol name */ - unsigned intnr_args; - struct fetch_func args[TRACE_KPROBE_ARGS]; struct ftrace_event_callcall; + unsigned intnr_args; + struct fetch_func args[]; }; +#define SIZEOF_TRACE_PROBE(n) \ + (offsetof(struct trace_probe, args) + \ + (sizeof(struct fetch_func) * (n))) + static int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs); static int kretprobe_trace_func(struct kretprobe_instance *ri, struct pt_regs *regs); @@ -263,11 +267,11 @@ static DEFINE_MUTEX(probe_lock); static LIST_HEAD(probe_list); static struct trace_probe *alloc_trace_probe(const char *symbol, -const char *event) +const char *event, int nargs) { struct trace_probe *tp; - tp = kzalloc(sizeof(struct trace_probe), GFP_KERNEL); + tp = kzalloc(SIZEOF_TRACE_PROBE(nargs), GFP_KERNEL); if (!tp) return ERR_PTR(-ENOMEM); @@ -573,9 +577,10 @@ static int create_trace_probe(int argc, char **argv) if (offset is_return) return -EINVAL; } + argc -= 2; argv += 2; /* setup a probe */ - tp = alloc_trace_probe(symbol, event); + tp = alloc_trace_probe(symbol, event, argc); if (IS_ERR(tp)) return PTR_ERR(tp); @@ -594,8 +599,8 @@ static int create_trace_probe(int argc, char **argv) kp-addr = addr; /* parse arguments */ - argc -= 2; argv += 2; ret = 0; - for (i = 0; i argc i TRACE_KPROBE_ARGS; i++) { + ret = 0; + for (i = 0; i argc i MAX_TRACE_ARGS; i++) { if (strlen(argv[i]) MAX_ARGSTR_LEN) { pr_info(Argument%d(%s) is too long.\n, i, argv[i]); ret = -ENOSPC; -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH -tip v14 11/12] tracing: Kprobe tracer assigns new event ids for each event
Assigns new event ids for each kprobes event. This doesn't clear ring_buffer when unregistering each kprobe event. Thus, if you mind 'Unknown event' messages, clear the buffer manually after changing kprobe events. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Avi Kivity a...@redhat.com Cc: Andi Kleen a...@linux.intel.com Cc: Christoph Hellwig h...@infradead.org Cc: Frank Ch. Eigler f...@redhat.com Cc: Frederic Weisbecker fweis...@gmail.com Cc: H. Peter Anvin h...@zytor.com Cc: Ingo Molnar mi...@elte.hu Cc: Jason Baron jba...@redhat.com Cc: Jim Keniston jkeni...@us.ibm.com Cc: K.Prasad pra...@linux.vnet.ibm.com Cc: Lai Jiangshan la...@cn.fujitsu.com Cc: Li Zefan l...@cn.fujitsu.com Cc: Przemysław Pawełczyk przemys...@pawelczyk.it Cc: Roland McGrath rol...@redhat.com Cc: Sam Ravnborg s...@ravnborg.org Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: Steven Rostedt rost...@goodmis.org Cc: Tom Zanussi tzanu...@gmail.com Cc: Vegard Nossum vegard.nos...@gmail.com --- kernel/trace/trace.h|6 - kernel/trace/trace_kprobe.c | 51 +-- 2 files changed, 15 insertions(+), 42 deletions(-) diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 4ce4525..0b78d76 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -43,8 +43,6 @@ enum trace_type { TRACE_POWER, TRACE_BLK, TRACE_KSYM, - TRACE_KPROBE, - TRACE_KRETPROBE, __TRACE_LAST_TYPE, }; @@ -358,10 +356,6 @@ extern void __ftrace_bad_type(void); IF_ASSIGN(var, ent, struct kmemtrace_free_entry,\ TRACE_KMEM_FREE); \ IF_ASSIGN(var, ent, struct ksym_trace_entry, TRACE_KSYM);\ - IF_ASSIGN(var, ent, struct kprobe_trace_entry, \ - TRACE_KPROBE);\ - IF_ASSIGN(var, ent, struct kretprobe_trace_entry, \ - TRACE_KRETPROBE); \ __ftrace_bad_type();\ } while (0) diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index ec137ed..0e8498e 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -186,6 +186,7 @@ struct trace_probe { 
}; const char *symbol;/* symbol name */ struct ftrace_event_callcall; + struct trace_event event; unsigned intnr_args; struct fetch_func args[]; }; @@ -795,7 +796,7 @@ static __kprobes int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs) size = SIZEOF_KPROBE_TRACE_ENTRY(tp-nr_args); - event = trace_current_buffer_lock_reserve(TRACE_KPROBE, size, + event = trace_current_buffer_lock_reserve(call-id, size, irq_flags, pc); if (!event) return 0; @@ -827,7 +828,7 @@ static __kprobes int kretprobe_trace_func(struct kretprobe_instance *ri, size = SIZEOF_KRETPROBE_TRACE_ENTRY(tp-nr_args); - event = trace_current_buffer_lock_reserve(TRACE_KRETPROBE, size, + event = trace_current_buffer_lock_reserve(call-id, size, irq_flags, pc); if (!event) return 0; @@ -853,7 +854,7 @@ print_kprobe_event(struct trace_iterator *iter, int flags) struct trace_seq *s = iter-seq; int i; - trace_assign_type(field, iter-ent); + field = (struct kprobe_trace_entry *)iter-ent; if (!seq_print_ip_sym(s, field-ip, flags | TRACE_ITER_SYM_OFFSET)) goto partial; @@ -880,7 +881,7 @@ print_kretprobe_event(struct trace_iterator *iter, int flags) struct trace_seq *s = iter-seq; int i; - trace_assign_type(field, iter-ent); + field = (struct kretprobe_trace_entry *)iter-ent; if (!seq_print_ip_sym(s, field-ret_ip, flags | TRACE_ITER_SYM_OFFSET)) goto partial; @@ -906,16 +907,6 @@ partial: return TRACE_TYPE_PARTIAL_LINE; } -static struct trace_event kprobe_trace_event = { - .type = TRACE_KPROBE, - .trace = print_kprobe_event, -}; - -static struct trace_event kretprobe_trace_event = { - .type = TRACE_KRETPROBE, - .trace = print_kretprobe_event, -}; - static int probe_event_enable(struct ftrace_event_call *call) { struct trace_probe *tp = (struct trace_probe *)call-data; @@ -1107,35 +1098,35 @@ static int register_probe_event(struct trace_probe *tp) /* Initialize ftrace_event_call */ call-system = kprobes; if (probe_is_return(tp)) { - call-event = kretprobe_trace_event; - call-id = TRACE_KRETPROBE; + 
tp-event.trace = print_kretprobe_event
[PATCH -tip v14 03/12] kprobes: checks probe address is instruction boundary on x86
Ensure the safety of inserting kprobes by checking whether the specified address is at the first byte of an instruction on x86. This is done by decoding the probed function from its head to the probe point.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Acked-by: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Avi Kivity a...@redhat.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Frank Ch. Eigler f...@redhat.com
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Jason Baron jba...@redhat.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: K.Prasad pra...@linux.vnet.ibm.com
Cc: Lai Jiangshan la...@cn.fujitsu.com
Cc: Li Zefan l...@cn.fujitsu.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
Cc: Roland McGrath rol...@redhat.com
Cc: Sam Ravnborg s...@ravnborg.org
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Vegard Nossum vegard.nos...@gmail.com
---
 arch/x86/kernel/kprobes.c |   69 +
 1 files changed, 69 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index b5b1848..80d493f 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -48,6 +48,7 @@
 #include <linux/preempt.h>
 #include <linux/module.h>
 #include <linux/kdebug.h>
+#include <linux/kallsyms.h>

 #include <asm/cacheflush.h>
 #include <asm/desc.h>
@@ -55,6 +56,7 @@
 #include <asm/uaccess.h>
 #include <asm/alternative.h>
 #include <asm/debugreg.h>
+#include <asm/insn.h>

 void jprobe_return_end(void);

@@ -245,6 +247,71 @@ retry:
 	}
 }

+/* Recover the probed instruction at addr for further analysis. */
+static int recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr)
+{
+	struct kprobe *kp;
+
+	kp = get_kprobe((void *)addr);
+	if (!kp)
+		return -EINVAL;
+
+	/*
+	 * Basically, kp->ainsn.insn has an original instruction.
+	 * However, RIP-relative instruction can not do single-stepping
+	 * at different place, fix_riprel() tweaks the displacement of
+	 * that instruction. In that case, we can't recover the instruction
+	 * from the kp->ainsn.insn.
+	 *
+	 * On the other hand, kp->opcode has a copy of the first byte of
+	 * the probed instruction, which is overwritten by int3. And
+	 * the instruction at kp->addr is not modified by kprobes except
+	 * for the first byte, we can recover the original instruction
+	 * from it and kp->opcode.
+	 */
+	memcpy(buf, kp->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
+	buf[0] = kp->opcode;
+	return 0;
+}
+
+/* Dummy buffers for kallsyms_lookup */
+static char __dummy_buf[KSYM_NAME_LEN];
+
+/* Check if paddr is at an instruction boundary */
+static int __kprobes can_probe(unsigned long paddr)
+{
+	int ret;
+	unsigned long addr, offset = 0;
+	struct insn insn;
+	kprobe_opcode_t buf[MAX_INSN_SIZE];
+
+	if (!kallsyms_lookup(paddr, NULL, &offset, NULL, __dummy_buf))
+		return 0;
+
+	/* Decode instructions */
+	addr = paddr - offset;
+	while (addr < paddr) {
+		kernel_insn_init(&insn, (void *)addr);
+		insn_get_opcode(&insn);
+
+		/* Check if the instruction has been modified. */
+		if (insn.opcode.bytes[0] == BREAKPOINT_INSTRUCTION) {
+			ret = recover_probed_instruction(buf, addr);
+			if (ret)
+				/*
+				 * Another debugging subsystem might insert
+				 * this breakpoint. In that case, we can't
+				 * recover it.
+				 */
+				return 0;
+			kernel_insn_init(&insn, buf);
+		}
+		insn_get_length(&insn);
+		addr += insn.length;
+	}
+
+	return (addr == paddr);
+}
+
 /*
  * Returns non-zero if opcode modifies the interrupt flag.
  */
@@ -360,6 +427,8 @@ static void __kprobes arch_copy_kprobe(struct kprobe *p)

 int __kprobes arch_prepare_kprobe(struct kprobe *p)
 {
+	if (!can_probe((unsigned long)p->addr))
+		return -EILSEQ;
 	/* insn: must be on special executable page on x86. */
 	p->ainsn.insn = get_insn_slot();
 	if (!p->ainsn.insn)
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhira...@redhat.com
[TOOL] kprobestest: Kprobe stress test tool
This script tests kprobes by probing all symbols in the kernel and finds symbols which must be blacklisted.

Usage
-----
kprobestest [-s SYMLIST] [-b BLACKLIST] [-w WHITELIST]
	Run the stress test. If a SYMLIST file is specified, use it as the
	initial symbol list (this is useful for verifying a white list
	after diagnosing all symbols).

kprobestest cleanup
	Clean up all lists.

How It Works
------------
This tool lists all symbols in the kernel via /proc/kallsyms and sorts them into groups (each containing 64 symbols by default). It then tests each group by using the kprobe-tracer. If a kernel crash occurs, that group is moved into the 'failed' dir. If the group passes the test, the script moves it into the 'passed' dir and saves kprobe_profile into 'passed/profiles/'. After testing all groups, all 'failed' groups are merged and sorted into smaller groups (divided by 4, by default), and those are tested again. This loop is repeated until every group has just one symbol. Finally, the script sorts all 'passed' symbols into 'tested', 'untested', and 'missed' based on the profiles.

Note
----
This script just gives us some clues to the blacklisted functions. In some cases a combination of probe points will cause a problem, even though none of them causes the problem alone.

Thank you,

--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhira...@redhat.com

#!/bin/bash
#
# kprobestest: Kprobes stress test tool
# Written by Masami Hiramatsu mhira...@redhat.com
#
# Usage:
# $ kprobestest [-s SYMLIST] [-b BLACKLIST] [-w WHITELIST]
#	Run stress test. If SYMLIST file is specified, use it as
#	an initial symbol list (This is useful for verifying white list
#	after diagnosing all symbols).
#
# $ kprobestest cleanup
#	Cleanup all lists

# Configurations
DEBUGFS=/sys/kernel/debug
INITNR=64
DIV=4
SYMFILE=syms.list
FAILFILE=black.list

function do_test () {
	# Do some benchmark
	for i in {1..4} ; do
		sleep 0.5
		echo -n .
done } function usage () { echo Usage: kprobestest [cleanup] [-s SYMLIST] [-b BLACKLIST] [-w WHITELIST] exit 0 } function cleanup_test () { echo Cleanup all files rm -rf $SYMFILE failed passed testing unset exit 0 } # Parse arguments WHITELIST= BLACKLIST= SYMLIST= while [ $1 ]; do case $1 in cleanup) cleanup_test ;; -s) SYMLIST=$2 shift 1 ;; -b) BLACKLIST=$2 shift 1 ;; -w) WHITELIST=$2 shift 1 ;; *) usage ;; esac shift 1 done # Show configurations echo Kprobe stress test starting. [ -f $BLACKLIST ] echo Blacklist: $BLACKLIST || BLACKLIST= [ -f $WHITELIST ] echo Whitelist: $WHITELIST || WHITELIST= [ -f $SYMLIST ] echo Symlist: $SYMLIST || SYMLIST= function make_filter () { local EXP= if [ -z $WHITELIST -a -z $BLACKLIST ]; then echo s/^$//g else for i in `cat $WHITELIST $BLACKLIST` ;do [ -z $EXP ] EXP=^$i\$ || EXP=$EXP\\|^$i\$ done ; EXP=s/$EXP//g echo $EXP fi } function list_allsyms () { local sym local out=1 for sym in `sort /proc/kallsyms | egrep '[0-9a-f]+ [Tt] [^[]*$' | cut -d\ -f 3`;do [ $sym = __kprobes_text_start ] out=0 continue [ $sym = __kprobes_text_end ] out=1 continue [ $sym = _etext ] break [ $out -eq 1 ] echo $sym done } function prep_testing () { local i=0 local n=0 local NR=$1 local fname= echo Grouping symbols: $NR fname=`printf list-%03d.%d $i $NR` cat $SYMFILE | while read ln; do [ -z $ln ] continue echo $ln testing/$fname n=$((n+1)) if [ $n -eq $NR ]; then n=0 i=$((i+1)) fname=`printf list-%03d.%d $i $NR` fi done sync } function init_first () { local EXP EXP=`make_filter` if [ -f $SYMLIST ]; then cat $SYMLIST | sed $EXP $SYMFILE else echo -n Generating symbol list from /proc/kallsyms... list_allsyms | sed $EXP $SYMFILE echo done. `wc -l $SYMFILE | cut -f1 -d\ ` symbols listed. 
fi mkdir -p testing failed unset passed passed/profiles prep_testing $INITNR } function get_max_nr () { wc -l failed/list-* unset/list-* 2/dev/null |\ awk '/^ *[0-9]+ .*list.*$/{ if (nr $1) nr=$1 } BEGIN { nr=0 } END { print nr}' } function init_next () { local NR NR=`get_max_nr` [ $NR -eq 0 ] return 1 [ $NR -eq 1 ] return 2 [ $NR -le $DIV ] NR=1 || NR=`expr $NR / $DIV` cat failed/* unset/* $SYMFILE rm failed/* unset/* prep_testing $NR return 0 } # Initialize symbols if [ ! -d testing ]; then init_first elif [ -z `ls testing/` ]; then init_next fi function set_probes () { local s for s in `cat $1`; do echo p:$s $s $DEBUGFS/tracing/kprobe_events [ $? -ne 0 ] return -1 done return 0 } function clear_probes () { echo $DEBUGFS/tracing/kprobe_events } function save_profile () { cat $DEBUGFS/tracing/kprobe_profile passed/profiles/$1
[TOOL] c2kpe: C expression to kprobe event format converter
This program converts a probe point given as a C expression into the kprobe event format for the kprobe-based event tracer. This helps to define kprobes events by C source line number or function name, plus local variable names. Currently, this supports only x86 (32/64) kernels.

Compile
-------
Before compilation, please install the libelf and libdwarf development packages. (e.g. elfutils-libelf-devel and libdwarf-devel on Fedora)

$ gcc -Wall -lelf -ldwarf c2kpe.c -o c2kpe

Synopsis
--------
$ c2kpe [options] FUNCTION[+OFFS][@SRC] [VAR [VAR ...]]
or
$ c2kpe [options] @SRC:LINE [VAR [VAR ...]]

FUNCTION: Probing function name.
OFFS: Offset in bytes.
SRC: Source file path.
LINE: Line number.
VAR: Local variable name.

options:
	-r KREL		Kernel release version (e.g. 2.6.31-rc5)
	-m DEBUGINFO	Dwarf-format binary file (vmlinux or kmodule)

Example
-------
$ c2kpe sys_read fd buf count
sys_read+0 %di %si %dx

$ c2kpe @mm/filemap.c:339 inode pos
sync_page_range+125 -48(%bp) %r14

Example with kprobe-tracer
--------------------------
Since a C expression may be converted into multiple results, I recommend reading them line by line:

$ c2kpe sys_read fd buf count | while read i; do \
	echo p $i >> $DEBUGFS/tracing/kprobe_events ;\
  done

Note
----
- This requires a kernel compiled with CONFIG_DEBUG_INFO.
- Specifying @SRC speeds up c2kpe, because we can skip CUs which don't include the specified SRC file.
- c2kpe doesn't check whether the offset byte is correctly on an instruction boundary. I recommend using the @SRC:LINE expression for tracing inside a function body.
- This tool doesn't search for kmodule files. You need to specify the kmodule file if you want to probe it.

TODO
----
- Fix bugs.
- Support multiple probepoints from stdin.
- Better kmodule support.
- Use elfutils-libdw?
- Merge into trace-cmd or perf-tools?

--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division e-mail: mhira...@redhat.com /* * c2kpe : C expression to kprobe event converter * * Written by Masami Hiramatsu mhira...@redhat.com * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. * */ #include sys/utsname.h #include sys/types.h #include sys/stat.h #include fcntl.h #include errno.h #include stdio.h #include unistd.h #include getopt.h #include stdlib.h #include string.h #include libdwarf/dwarf.h #include libdwarf/libdwarf.h /* Default vmlinux search paths */ #define NR_SEARCH_PATH 2 const char *default_search_path[NR_SEARCH_PATH] = { /lib/modules/%s/build/vmlinux,/* Custom build kernel */ /usr/lib/debug/lib/modules/%s/vmlinux,/* Red Hat debuginfo */ }; #define _stringify(n) #n #define stringify(n)_stringify(n) #ifdef DEBUG #define debug(fmt ...) \ fprintf(stderr, DBG( __FILE__ : stringify(__LINE__) ): fmt) #else #define debug(fmt ...) 
do {} while (0) #endif #define ERR_IF(cnd) \ do { if (cnd) { \ fprintf(stderr, Error ( __FILE__ : stringify(__LINE__) \ ): stringify(cnd) \n); \ exit(1);\ }} while (0) #define MAX_PATH_LEN 256 /* Dwarf_Die Linkage to parent Die */ struct die_link { struct die_link *parent;/* Parent die */ Dwarf_Die die; /* Current die */ }; #define X86_32_MAX_REGS 8 const char *x86_32_regs_table[X86_32_MAX_REGS] = { %ax, %cx, %dx, %bx, sa, /* Stack address */ %bp, %si, %di, }; #define X86_64_MAX_REGS 16 const char *x86_64_regs_table[X86_64_MAX_REGS] = { %ax, %dx, %cx, %bx, %si, %di, %bp, %sp, %r8, %r9, %r10, %r11, %r12, %r13, %r14, %r15, }; /* TODO: switching by dwarf address size */ #ifdef __x86_64__ #define ARCH_MAX_REGS X86_64_MAX_REGS #define arch_regs_table x86_64_regs_table #else #define ARCH_MAX_REGS X86_32_MAX_REGS #define arch_regs_table x86_32_regs_table #endif /* Return architecture dependent register string */ static inline const char *get_arch_regstr(unsigned int n) { return (n = ARCH_MAX_REGS) ? arch_regs_table[n] : NULL
[PATCH -tip -v13 03/11] kprobes: checks probe address is instruction boundary on x86
Ensure the safety of inserting kprobes by checking whether the specified address is at the first byte of an instruction on x86. This is done by decoding the probed function from its head to the probe point.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Acked-by: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: Ingo Molnar mi...@elte.hu
---
 arch/x86/kernel/kprobes.c |   69 +
 1 files changed, 69 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index b5b1848..80d493f 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -48,6 +48,7 @@
 #include <linux/preempt.h>
 #include <linux/module.h>
 #include <linux/kdebug.h>
+#include <linux/kallsyms.h>

 #include <asm/cacheflush.h>
 #include <asm/desc.h>
@@ -55,6 +56,7 @@
 #include <asm/uaccess.h>
 #include <asm/alternative.h>
 #include <asm/debugreg.h>
+#include <asm/insn.h>

 void jprobe_return_end(void);

@@ -245,6 +247,71 @@ retry:
 	}
 }

+/* Recover the probed instruction at addr for further analysis. */
+static int recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr)
+{
+	struct kprobe *kp;
+
+	kp = get_kprobe((void *)addr);
+	if (!kp)
+		return -EINVAL;
+
+	/*
+	 * Basically, kp->ainsn.insn has an original instruction.
+	 * However, RIP-relative instruction can not do single-stepping
+	 * at different place, fix_riprel() tweaks the displacement of
+	 * that instruction. In that case, we can't recover the instruction
+	 * from the kp->ainsn.insn.
+	 *
+	 * On the other hand, kp->opcode has a copy of the first byte of
+	 * the probed instruction, which is overwritten by int3. And
+	 * the instruction at kp->addr is not modified by kprobes except
+	 * for the first byte, we can recover the original instruction
+	 * from it and kp->opcode.
+	 */
+	memcpy(buf, kp->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
+	buf[0] = kp->opcode;
+	return 0;
+}
+
+/* Dummy buffers for kallsyms_lookup */
+static char __dummy_buf[KSYM_NAME_LEN];
+
+/* Check if paddr is at an instruction boundary */
+static int __kprobes can_probe(unsigned long paddr)
+{
+	int ret;
+	unsigned long addr, offset = 0;
+	struct insn insn;
+	kprobe_opcode_t buf[MAX_INSN_SIZE];
+
+	if (!kallsyms_lookup(paddr, NULL, &offset, NULL, __dummy_buf))
+		return 0;
+
+	/* Decode instructions */
+	addr = paddr - offset;
+	while (addr < paddr) {
+		kernel_insn_init(&insn, (void *)addr);
+		insn_get_opcode(&insn);
+
+		/* Check if the instruction has been modified. */
+		if (insn.opcode.bytes[0] == BREAKPOINT_INSTRUCTION) {
+			ret = recover_probed_instruction(buf, addr);
+			if (ret)
+				/*
+				 * Another debugging subsystem might insert
+				 * this breakpoint. In that case, we can't
+				 * recover it.
+				 */
+				return 0;
+			kernel_insn_init(&insn, buf);
+		}
+		insn_get_length(&insn);
+		addr += insn.length;
+	}
+
+	return (addr == paddr);
+}
+
 /*
  * Returns non-zero if opcode modifies the interrupt flag.
  */
@@ -360,6 +427,8 @@ static void __kprobes arch_copy_kprobe(struct kprobe *p)

 int __kprobes arch_prepare_kprobe(struct kprobe *p)
 {
+	if (!can_probe((unsigned long)p->addr))
+		return -EILSEQ;
 	/* insn: must be on special executable page on x86. */
 	p->ainsn.insn = get_insn_slot();
 	if (!p->ainsn.insn)
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhira...@redhat.com
[PATCH -tip -v13 04/11] kprobes: cleanup fix_riprel() using insn decoder on x86
Cleanup fix_riprel() in arch/x86/kernel/kprobes.c by using x86 instruction decoder. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Jim Keniston jkeni...@us.ibm.com Cc: Ingo Molnar mi...@elte.hu --- arch/x86/kernel/kprobes.c | 128 - 1 files changed, 23 insertions(+), 105 deletions(-) diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c index 80d493f..98f48d0 100644 --- a/arch/x86/kernel/kprobes.c +++ b/arch/x86/kernel/kprobes.c @@ -109,50 +109,6 @@ static const u32 twobyte_is_boostable[256 / 32] = { /* --- */ /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ }; -static const u32 onebyte_has_modrm[256 / 32] = { - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ - /* --- */ - W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 00 */ - W(0x10, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 10 */ - W(0x20, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 20 */ - W(0x30, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 30 */ - W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 40 */ - W(0x50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 50 */ - W(0x60, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0) | /* 60 */ - W(0x70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 70 */ - W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */ - W(0x90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 90 */ - W(0xa0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* a0 */ - W(0xb0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* b0 */ - W(0xc0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* c0 */ - W(0xd0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */ - W(0xe0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* e0 */ - W(0xf0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1) /* f0 */ - /* --- */ - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ -}; -static const u32 twobyte_has_modrm[256 / 32] = { - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ - /* --- */ - W(0x00, 
1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1) | /* 0f */ - W(0x10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0) , /* 1f */ - W(0x20, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1) | /* 2f */ - W(0x30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 3f */ - W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 4f */ - W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 5f */ - W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 6f */ - W(0x70, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1) , /* 7f */ - W(0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 8f */ - W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 9f */ - W(0xa0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1) | /* af */ - W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1) , /* bf */ - W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* cf */ - W(0xd0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* df */ - W(0xe0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* ef */ - W(0xf0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0) /* ff */ - /* --- */ - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ -}; #undef W struct kretprobe_blackpoint kretprobe_blacklist[] = { @@ -345,68 +301,30 @@ static int __kprobes is_IF_modifier(kprobe_opcode_t *insn) static void __kprobes fix_riprel(struct kprobe *p) { #ifdef CONFIG_X86_64 - u8 *insn = p-ainsn.insn; - s64 disp; - int need_modrm; - - /* Skip legacy instruction prefixes. */ - while (1) { - switch (*insn) { - case 0x66: - case 0x67: - case 0x2e: - case 0x3e: - case 0x26: - case 0x64: - case 0x65: - case 0x36: - case 0xf0: - case 0xf3: - case 0xf2: - ++insn; - continue; - } - break; - } + struct insn insn; + kernel_insn_init(insn, p-ainsn.insn); - /* Skip REX instruction prefix. */ - if (is_REX_prefix(insn)) - ++insn; - - if (*insn == 0x0f) { - /* Two-byte opcode. */ - ++insn
[PATCH -tip -v13 07/11] tracing: Introduce TRACE_FIELD_ZERO() macro
Use TRACE_FIELD_ZERO(type, item) instead of TRACE_FIELD_ZERO_CHAR(item). This also includes a fix of TRACE_ZERO_CHAR() macro. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Steven Rostedt rost...@goodmis.org Cc: Ingo Molnar mi...@elte.hu Cc: Tom Zanussi tzanu...@gmail.com Cc: Frederic Weisbecker fweis...@gmail.com --- kernel/trace/trace_event_types.h |4 ++-- kernel/trace/trace_export.c | 16 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/kernel/trace/trace_event_types.h b/kernel/trace/trace_event_types.h index 6db005e..e74f090 100644 --- a/kernel/trace/trace_event_types.h +++ b/kernel/trace/trace_event_types.h @@ -109,7 +109,7 @@ TRACE_EVENT_FORMAT(bprint, TRACE_BPRINT, bprint_entry, ignore, TRACE_STRUCT( TRACE_FIELD(unsigned long, ip, ip) TRACE_FIELD(char *, fmt, fmt) - TRACE_FIELD_ZERO_CHAR(buf) + TRACE_FIELD_ZERO(char, buf) ), TP_RAW_FMT(%08lx (%d) fmt:%p %s) ); @@ -117,7 +117,7 @@ TRACE_EVENT_FORMAT(bprint, TRACE_BPRINT, bprint_entry, ignore, TRACE_EVENT_FORMAT(print, TRACE_PRINT, print_entry, ignore, TRACE_STRUCT( TRACE_FIELD(unsigned long, ip, ip) - TRACE_FIELD_ZERO_CHAR(buf) + TRACE_FIELD_ZERO(char, buf) ), TP_RAW_FMT(%08lx (%d) fmt:%p %s) ); diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c index 7cee79d..23125b5 100644 --- a/kernel/trace/trace_export.c +++ b/kernel/trace/trace_export.c @@ -42,9 +42,9 @@ extern void __bad_type_size(void); if (!ret) \ return 0; -#undef TRACE_FIELD_ZERO_CHAR -#define TRACE_FIELD_ZERO_CHAR(item)\ - ret = trace_seq_printf(s, \tfield:char #item ;\t \ +#undef TRACE_FIELD_ZERO +#define TRACE_FIELD_ZERO(type, item) \ + ret = trace_seq_printf(s, \tfield: #type #item ;\t \ offset:%u;\tsize:0;\n, \ (unsigned int)offsetof(typeof(field), item)); \ if (!ret) \ @@ -90,9 +90,6 @@ ftrace_format_##call(struct ftrace_event_call *dummy, struct trace_seq *s)\ #include trace_event_types.h -#undef TRACE_ZERO_CHAR -#define TRACE_ZERO_CHAR(arg) - #undef TRACE_FIELD #define TRACE_FIELD(type, item, 
assign)\ entry-item = assign; @@ -105,6 +102,9 @@ ftrace_format_##call(struct ftrace_event_call *dummy, struct trace_seq *s)\ #define TRACE_FIELD_SIGN(type, item, assign, is_signed)\ TRACE_FIELD(type, item, assign) +#undef TRACE_FIELD_ZERO +#define TRACE_FIELD_ZERO(type, item) + #undef TP_CMD #define TP_CMD(cmd...) cmd @@ -176,8 +176,8 @@ __attribute__((section(_ftrace_events))) event_##call = { \ if (ret)\ return ret; -#undef TRACE_FIELD_ZERO_CHAR -#define TRACE_FIELD_ZERO_CHAR(item) +#undef TRACE_FIELD_ZERO +#define TRACE_FIELD_ZERO(type, item) #undef TRACE_EVENT_FORMAT #define TRACE_EVENT_FORMAT(call, proto, args, fmt, tstruct, tpfmt) \ -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH -tip -v13 06/11] tracing: ftrace dynamic ftrace_event_call support
Add dynamic ftrace_event_call support to ftrace. Trace engines can adds new ftrace_event_call to ftrace on the fly. Each operator functions of the call takes a ftrace_event_call data structure as an argument, because these functions may be shared among several ftrace_event_calls. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Acked-by: Frederic Weisbecker fweis...@gmail.com Cc: Steven Rostedt rost...@goodmis.org Cc: Ingo Molnar mi...@elte.hu Cc: Tom Zanussi tzanu...@gmail.com --- include/linux/ftrace_event.h | 13 +--- include/trace/ftrace.h | 22 ++--- kernel/trace/trace_events.c | 72 -- kernel/trace/trace_export.c | 27 4 files changed, 86 insertions(+), 48 deletions(-) diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h index 5c093ff..f7733b6 100644 --- a/include/linux/ftrace_event.h +++ b/include/linux/ftrace_event.h @@ -108,12 +108,13 @@ struct ftrace_event_call { struct dentry *dir; struct trace_event *event; int enabled; - int (*regfunc)(void); - void(*unregfunc)(void); + int (*regfunc)(struct ftrace_event_call *); + void(*unregfunc)(struct ftrace_event_call *); int id; - int (*raw_init)(void); - int (*show_format)(struct trace_seq *s); - int (*define_fields)(void); + int (*raw_init)(struct ftrace_event_call *); + int (*show_format)(struct ftrace_event_call *, + struct trace_seq *); + int (*define_fields)(struct ftrace_event_call *); struct list_headfields; int filter_active; void*filter; @@ -138,6 +139,8 @@ extern int filter_current_check_discard(struct ftrace_event_call *call, extern int trace_define_field(struct ftrace_event_call *call, char *type, char *name, int offset, int size, int is_signed); +extern int trace_add_event_call(struct ftrace_event_call *call); +extern void trace_remove_event_call(struct ftrace_event_call *call); #define is_signed_type(type) (((type)(-1)) 0) diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h index 1867553..d696580 100644 --- a/include/trace/ftrace.h +++ b/include/trace/ftrace.h @@ -147,7 
+147,8 @@ #undef TRACE_EVENT #define TRACE_EVENT(call, proto, args, tstruct, func, print) \ static int \ -ftrace_format_##call(struct trace_seq *s) \ +ftrace_format_##call(struct ftrace_event_call *event_call, \ +struct trace_seq *s) \ { \ struct ftrace_raw_##call field __attribute__((unused)); \ int ret = 0;\ @@ -289,10 +290,9 @@ ftrace_raw_output_##call(struct trace_iterator *iter, int flags) \ #undef TRACE_EVENT #define TRACE_EVENT(call, proto, args, tstruct, func, print) \ int\ -ftrace_define_fields_##call(void) \ +ftrace_define_fields_##call(struct ftrace_event_call *event_call) \ { \ struct ftrace_raw_##call field; \ - struct ftrace_event_call *event_call = event_##call; \ int ret;\ \ __common_field(int, type, 1); \ @@ -355,7 +355,7 @@ static inline int ftrace_get_offsets_##call( \ * event_trace_printk(_RET_IP_, call: fmt); * } * - * static int ftrace_reg_event_call(void) + * static int ftrace_reg_event_call(struct ftrace_event_call *unused) * { * int ret; * @@ -366,7 +366,7 @@ static inline int ftrace_get_offsets_##call( \ * return ret; * } * - * static void ftrace_unreg_event_call(void) + * static void ftrace_unreg_event_call(struct ftrace_event_call *unused) * { * unregister_trace_call(ftrace_event_call); * } @@ -399,7 +399,7 @@ static inline int ftrace_get_offsets_##call( \ * trace_current_buffer_unlock_commit(event, irq_flags, pc); * } * - * static int ftrace_raw_reg_event_call(void) + * static int ftrace_raw_reg_event_call(struct ftrace_event_call *unused) * { * int
[PATCH -tip -v13 05/11] x86: add pt_regs register and stack access APIs
Add following APIs for accessing registers and stack entries from pt_regs. These APIs are required by kprobes-based event tracer on ftrace. Some other debugging tools might be able to use it too. - regs_query_register_offset(const char *name) Query the offset of name register. - regs_query_register_name(unsigned int offset) Query the name of register by its offset. - regs_get_register(struct pt_regs *regs, unsigned int offset) Get the value of a register by its offset. - regs_within_kernel_stack(struct pt_regs *regs, unsigned long addr) Check the address is in the kernel stack. - regs_get_kernel_stack_nth(struct pt_regs *reg, unsigned int nth) Get Nth entry of the kernel stack. (N = 0) - regs_get_argument_nth(struct pt_regs *reg, unsigned int nth) Get Nth argument at function call. (N = 0) Signed-off-by: Masami Hiramatsu mhira...@redhat.com Reviewed-by: Frederic Weisbecker fweis...@gmail.com Cc: Andi Kleen a...@firstfloor.org Cc: Christoph Hellwig h...@infradead.org Cc: Steven Rostedt rost...@goodmis.org Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Roland McGrath rol...@redhat.com Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: linux-a...@vger.kernel.org --- arch/x86/include/asm/ptrace.h | 62 +++ arch/x86/kernel/ptrace.c | 112 + 2 files changed, 174 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h index 0f0d908..a3d49dd 100644 --- a/arch/x86/include/asm/ptrace.h +++ b/arch/x86/include/asm/ptrace.h @@ -7,6 +7,7 @@ #ifdef __KERNEL__ #include asm/segment.h +#include asm/page_types.h #endif #ifndef __ASSEMBLY__ @@ -216,6 +217,67 @@ static inline unsigned long user_stack_pointer(struct pt_regs *regs) return regs-sp; } +/* Query offset/name of register from its name/offset */ +extern int regs_query_register_offset(const char *name); +extern const char *regs_query_register_name(unsigned int offset); +#define MAX_REG_OFFSET 
(offsetof(struct pt_regs, ss)) + +/** + * regs_get_register() - get register value from its offset + * @regs: pt_regs from which register value is gotten. + * @offset:offset number of the register. + * + * regs_get_register returns the value of a register whose offset from @regs + * is @offset. The @offset is the offset of the register in struct pt_regs. + * If @offset is bigger than MAX_REG_OFFSET, this returns 0. + */ +static inline unsigned long regs_get_register(struct pt_regs *regs, + unsigned int offset) +{ + if (unlikely(offset MAX_REG_OFFSET)) + return 0; + return *(unsigned long *)((unsigned long)regs + offset); +} + +/** + * regs_within_kernel_stack() - check the address in the stack + * @regs: pt_regs which contains kernel stack pointer. + * @addr: address which is checked. + * + * regs_within_kenel_stack() checks @addr is within the kernel stack page(s). + * If @addr is within the kernel stack, it returns true. If not, returns false. + */ +static inline int regs_within_kernel_stack(struct pt_regs *regs, + unsigned long addr) +{ + return ((addr ~(THREAD_SIZE - 1)) == + (kernel_stack_pointer(regs) ~(THREAD_SIZE - 1))); +} + +/** + * regs_get_kernel_stack_nth() - get Nth entry of the stack + * @regs: pt_regs which contains kernel stack pointer. + * @n: stack entry number. + * + * regs_get_kernel_stack_nth() returns @n th entry of the kernel stack which + * is specifined by @regs. If the @n th entry is NOT in the kernel stack, + * this returns 0. + */ +static inline unsigned long regs_get_kernel_stack_nth(struct pt_regs *regs, + unsigned int n) +{ + unsigned long *addr = (unsigned long *)kernel_stack_pointer(regs); + addr += n; + if (regs_within_kernel_stack(regs, (unsigned long)addr)) + return *addr; + else + return 0; +} + +/* Get Nth argument at function call */ +extern unsigned long regs_get_argument_nth(struct pt_regs *regs, + unsigned int n); + /* * These are defined as per linux/ptrace.h, which see. 
*/ diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c index cabdabc..32729ec 100644 --- a/arch/x86/kernel/ptrace.c +++ b/arch/x86/kernel/ptrace.c @@ -49,6 +49,118 @@ enum x86_regset { REGSET_IOPERM32, }; +struct pt_regs_offset { + const char *name; + int offset; +}; + +#define REG_OFFSET_NAME(r) {.name = #r, .offset = offsetof(struct pt_regs, r)} +#define REG_OFFSET_END {.name = NULL, .offset = 0} + +static const struct pt_regs_offset regoffset_table[] = { +#ifdef CONFIG_X86_64 + REG_OFFSET_NAME(r15), + REG_OFFSET_NAME(r14), + REG_OFFSET_NAME(r13
[PATCH -tip -v13 09/11] tracing: Kprobe-tracer supports more than 6 arguments
Support up to 128 arguments for each kprobes event. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Christoph Hellwig h...@infradead.org Cc: Steven Rostedt rost...@goodmis.org Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Tom Zanussi tzanu...@gmail.com --- Documentation/trace/kprobetrace.txt |2 +- kernel/trace/trace_kprobe.c | 21 + 2 files changed, 14 insertions(+), 9 deletions(-) diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt index 9ad907c..b29a54b 100644 --- a/Documentation/trace/kprobetrace.txt +++ b/Documentation/trace/kprobetrace.txt @@ -32,7 +32,7 @@ Synopsis of kprobe_events SYMBOL[+offs|-offs] : Symbol+offset where the probe is inserted. MEMADDR : Address where the probe is inserted. - FETCHARGS : Arguments. + FETCHARGS : Arguments. Each probe can have up to 128 args. %REG : Fetch register REG sN : Fetch Nth entry of stack (N = 0) @ADDR: Fetch memory at ADDR (ADDR should be in kernel) diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index 39491f0..e78c4ea 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -32,7 +32,7 @@ #include trace.h #include trace_output.h -#define TRACE_KPROBE_ARGS 6 +#define MAX_TRACE_ARGS 128 #define MAX_ARGSTR_LEN 63 /* currently, trace_kprobe only supports X86. 
*/ @@ -178,11 +178,15 @@ struct trace_probe { struct kretproberp; }; const char *symbol;/* symbol name */ - unsigned intnr_args; - struct fetch_func args[TRACE_KPROBE_ARGS]; struct ftrace_event_callcall; + unsigned intnr_args; + struct fetch_func args[]; }; +#define SIZEOF_TRACE_PROBE(n) \ + (offsetof(struct trace_probe, args) + \ + (sizeof(struct fetch_func) * (n))) + static int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs); static int kretprobe_trace_func(struct kretprobe_instance *ri, struct pt_regs *regs); @@ -255,11 +259,11 @@ static DEFINE_MUTEX(probe_lock); static LIST_HEAD(probe_list); static struct trace_probe *alloc_trace_probe(const char *symbol, -const char *event) +const char *event, int nargs) { struct trace_probe *tp; - tp = kzalloc(sizeof(struct trace_probe), GFP_KERNEL); + tp = kzalloc(SIZEOF_TRACE_PROBE(nargs), GFP_KERNEL); if (!tp) return ERR_PTR(-ENOMEM); @@ -559,9 +563,10 @@ static int create_trace_probe(int argc, char **argv) if (offset is_return) return -EINVAL; } + argc -= 2; argv += 2; /* setup a probe */ - tp = alloc_trace_probe(symbol, event); + tp = alloc_trace_probe(symbol, event, argc); if (IS_ERR(tp)) return PTR_ERR(tp); @@ -580,8 +585,8 @@ static int create_trace_probe(int argc, char **argv) kp-addr = addr; /* parse arguments */ - argc -= 2; argv += 2; ret = 0; - for (i = 0; i argc i TRACE_KPROBE_ARGS; i++) { + ret = 0; + for (i = 0; i argc i MAX_TRACE_ARGS; i++) { if (strlen(argv[i]) MAX_ARGSTR_LEN) { pr_info(Argument%d(%s) is too long.\n, i, argv[i]); ret = -ENOSPC; -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH -tip -v13 08/11] tracing: add kprobe-based event tracer
Add kprobes-based event tracer on ftrace. This tracer is similar to the events tracer which is based on Tracepoint infrastructure. Instead of Tracepoint, this tracer is based on kprobes (kprobe and kretprobe). It probes anywhere where kprobes can probe(this means, all functions body except for __kprobes functions). Similar to the events tracer, this tracer doesn't need to be activated via current_tracer, instead of that, just set probe points via /sys/kernel/debug/tracing/kprobe_events. And you can set filters on each probe events via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter. This tracer supports following probe arguments for each probe. %REG : Fetch register REG sN: Fetch Nth entry of stack (N = 0) @ADDR : Fetch memory at ADDR (ADDR should be in kernel) @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol) aN: Fetch function argument. (N = 0) rv: Fetch return value. ra: Fetch return address. +|-offs(FETCHARG) : fetch memory at FETCHARG +|- offs address. See Documentation/trace/kprobetrace.txt for details. Changes from v12: - Check O_TRUNC for cleanup events, instead of !O_APPEND. 
Add a kprobes-based event tracer for ftrace. This tracer is similar to the events tracer, which is based on the Tracepoint infrastructure. Instead of Tracepoints, this tracer is based on kprobes (kprobe and kretprobe). It can probe anywhere kprobes can probe (this means all function bodies except for __kprobes functions). Like the events tracer, this tracer doesn't need to be activated via current_tracer; instead, just set probe points via /sys/kernel/debug/tracing/kprobe_events. You can also set filters on each probe event via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter. This tracer supports the following probe arguments for each probe:

  %REG              : Fetch register REG
  sN                : Fetch Nth entry of stack (N >= 0)
  @ADDR             : Fetch memory at ADDR (ADDR should be in kernel)
  @SYM[+|-offs]     : Fetch memory at SYM +|- offs (SYM should be a data symbol)
  aN                : Fetch function argument. (N >= 0)
  rv                : Fetch return value.
  ra                : Fetch return address.
  +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.

See Documentation/trace/kprobetrace.txt for details.

Changes from v12:
- Check O_TRUNC for cleaning up events, instead of !O_APPEND.
Signed-off-by: Masami Hiramatsu mhira...@redhat.com Acked-by: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Christoph Hellwig h...@infradead.org Cc: Steven Rostedt rost...@goodmis.org Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Tom Zanussi tzanu...@gmail.com Cc: Li Zefan l...@cn.fujitsu.com --- Documentation/trace/kprobetrace.txt | 138 kernel/trace/Kconfig| 12 kernel/trace/Makefile |1 kernel/trace/trace.h| 29 + kernel/trace/trace_event_types.h| 18 + kernel/trace/trace_kprobe.c | 1193 +++ 6 files changed, 1391 insertions(+), 0 deletions(-) create mode 100644 Documentation/trace/kprobetrace.txt create mode 100644 kernel/trace/trace_kprobe.c diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt new file mode 100644 index 000..9ad907c --- /dev/null +++ b/Documentation/trace/kprobetrace.txt @@ -0,0 +1,138 @@ + Kprobe-based Event Tracer + = + + Documentation is written by Masami Hiramatsu + + +Overview + +This tracer is similar to the events tracer which is based on Tracepoint +infrastructure. Instead of Tracepoint, this tracer is based on kprobes(kprobe +and kretprobe). It probes anywhere where kprobes can probe(this means, all +functions body except for __kprobes functions). + +Unlike the function tracer, this tracer can probe instructions inside of +kernel functions. It allows you to check which instruction has been executed. + +Unlike the Tracepoint based events tracer, this tracer can add and remove +probe points on the fly. + +Similar to the events tracer, this tracer doesn't need to be activated via +current_tracer, instead of that, just set probe points via +/sys/kernel/debug/tracing/kprobe_events. And you can set filters on each +probe events via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter. + + +Synopsis of kprobe_events +- + p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS]: Set a probe + r[:EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe + + EVENT : Event name. 
+ SYMBOL[+offs|-offs] : Symbol+offset where the probe is inserted. + MEMADDR : Address where the probe is inserted. + + FETCHARGS : Arguments. + %REG : Fetch register REG + sN : Fetch Nth entry of stack (N = 0) + @ADDR: Fetch memory at ADDR (ADDR should be in kernel) + @SYM[+|-offs]: Fetch memory at SYM +|- offs (SYM should be a data symbol) + aN : Fetch function argument. (N = 0)(*) + rv : Fetch return value.(**) + ra : Fetch return address.(**) + +|-offs(FETCHARG) : fetch memory at FETCHARG +|- offs address.(***) + + (*) aN may not correct on asmlinkaged functions and at the middle of + function body. + (**) only for return probe. + (***) this is useful for fetching a field of data structures. + + +Per-Probe Event Filtering +- + Per-probe event filtering feature allows you to set different filter on each +probe and gives you what arguments will be shown in trace buffer. If an event +name is specified right after 'p:' or 'r:' in kprobe_events, the tracer adds +an event under tracing/events/kprobes/EVENT, at the directory you can see +'id', 'enabled', 'format' and 'filter'. + +enabled: + You can enable/disable the probe by writing 1 or 0 on it. + +format: + It shows the format of this probe event. It also shows aliases of arguments + which you specified to kprobe_events. + +filter: + You can write filtering rules of this event. And you can use both of aliase + names and field names for describing filters. + + +Usage examples
[PATCH -tip -v13 10/11] tracing: Generate names for each kprobe event automatically
Generate names for each kprobe event based on the probe point, and remove generic k*probe event types because there is no user of those types. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Christoph Hellwig h...@infradead.org Cc: Steven Rostedt rost...@goodmis.org Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Tom Zanussi tzanu...@gmail.com --- Documentation/trace/kprobetrace.txt |3 +- kernel/trace/trace_event_types.h| 18 -- kernel/trace/trace_kprobe.c | 64 ++- 3 files changed, 35 insertions(+), 50 deletions(-) diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt index b29a54b..437ad49 100644 --- a/Documentation/trace/kprobetrace.txt +++ b/Documentation/trace/kprobetrace.txt @@ -28,7 +28,8 @@ Synopsis of kprobe_events p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS]: Set a probe r[:EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe - EVENT : Event name. + EVENT : Event name. If omitted, the event name is generated + based on SYMBOL+offs or MEMADDR. SYMBOL[+offs|-offs] : Symbol+offset where the probe is inserted. MEMADDR : Address where the probe is inserted. diff --git a/kernel/trace/trace_event_types.h b/kernel/trace/trace_event_types.h index 186b598..e74f090 100644 --- a/kernel/trace/trace_event_types.h +++ b/kernel/trace/trace_event_types.h @@ -175,22 +175,4 @@ TRACE_EVENT_FORMAT(kmem_free, TRACE_KMEM_FREE, kmemtrace_free_entry, ignore, TP_RAW_FMT(type:%u call_site:%lx ptr:%p) ); -TRACE_EVENT_FORMAT(kprobe, TRACE_KPROBE, kprobe_trace_entry, ignore, - TRACE_STRUCT( - TRACE_FIELD(unsigned long, ip, ip) - TRACE_FIELD(int, nargs, nargs) - TRACE_FIELD_ZERO(unsigned long, args) - ), - TP_RAW_FMT(%08lx: args:0x%lx ...) 
-); - -TRACE_EVENT_FORMAT(kretprobe, TRACE_KRETPROBE, kretprobe_trace_entry, ignore, - TRACE_STRUCT( - TRACE_FIELD(unsigned long, func, func) - TRACE_FIELD(unsigned long, ret_ip, ret_ip) - TRACE_FIELD(int, nargs, nargs) - TRACE_FIELD_ZERO(unsigned long, args) - ), - TP_RAW_FMT(%08lx - %08lx: args:0x%lx ...) -); #undef TRACE_SYSTEM diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index e78c4ea..9f9f161 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -34,6 +34,7 @@ #define MAX_TRACE_ARGS 128 #define MAX_ARGSTR_LEN 63 +#define MAX_EVENT_NAME_LEN 64 /* currently, trace_kprobe only supports X86. */ @@ -272,11 +273,11 @@ static struct trace_probe *alloc_trace_probe(const char *symbol, if (!tp-symbol) goto error; } - if (event) { - tp-call.name = kstrdup(event, GFP_KERNEL); - if (!tp-call.name) - goto error; - } + if (!event) + goto error; + tp-call.name = kstrdup(event, GFP_KERNEL); + if (!tp-call.name) + goto error; INIT_LIST_HEAD(tp-list); return tp; @@ -306,7 +307,7 @@ static struct trace_probe *find_probe_event(const char *event) struct trace_probe *tp; list_for_each_entry(tp, probe_list, list) - if (tp-call.name !strcmp(tp-call.name, event)) + if (!strcmp(tp-call.name, event)) return tp; return NULL; } @@ -322,8 +323,7 @@ static void __unregister_trace_probe(struct trace_probe *tp) /* Unregister a trace_probe and probe_event: call with locking probe_lock */ static void unregister_trace_probe(struct trace_probe *tp) { - if (tp-call.name) - unregister_probe_event(tp); + unregister_probe_event(tp); __unregister_trace_probe(tp); list_del(tp-list); } @@ -352,18 +352,16 @@ static int register_trace_probe(struct trace_probe *tp) goto end; } /* register as an event */ - if (tp-call.name) { - old_tp = find_probe_event(tp-call.name); - if (old_tp) { - /* delete old event */ - unregister_trace_probe(old_tp); - free_trace_probe(old_tp); - } - ret = register_probe_event(tp); - if (ret) { - pr_warning(Faild to register probe 
event(%d)\n, ret); - __unregister_trace_probe(tp); - } + old_tp = find_probe_event(tp-call.name); + if (old_tp) { + /* delete old event */ + unregister_trace_probe(old_tp); + free_trace_probe(old_tp); + } + ret = register_probe_event(tp); + if (ret) { + pr_warning(Faild
[PATCH -tip -v13 00/11] tracing: kprobe-based event tracer and x86 instruction decoder
/kprobe_events This sets a kretprobe on the return point of do_sys_open() function with recording return value and return address as myretprobe event. You can see the format of these events via /sys/kernel/debug/tracing/events/kprobes/EVENT/format. cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format name: myprobe ID: 23 format: field:unsigned short common_type; offset:0; size:2; field:unsigned char common_flags; offset:2; size:1; field:unsigned char common_preempt_count; offset:3; size:1; field:int common_pid; offset:4; size:4; field:int common_tgid; offset:8; size:4; field: unsigned long ip;offset:16;tsize:8; field: int nargs; offset:24;tsize:4; field: unsigned long arg0; offset:32;tsize:8; field: unsigned long arg1; offset:40;tsize:8; field: unsigned long arg2; offset:48;tsize:8; field: unsigned long arg3; offset:56;tsize:8; alias: a0; original: arg0; alias: a1; original: arg1; alias: a2; original: arg2; alias: a3; original: arg3; print fmt: %lx: 0x%lx 0x%lx 0x%lx 0x%lx, ip, arg0, arg1, arg2, arg3 You can see that the event has 4 arguments and alias expressions corresponding to it. echo /sys/kernel/debug/tracing/kprobe_events This clears all probe points. and you can see the traced information via /sys/kernel/debug/tracing/trace. cat /sys/kernel/debug/tracing/trace # tracer: nop # # TASK-PIDCPU#TIMESTAMP FUNCTION # | | | | | ...-1447 [001] 1038282.286875: do_sys_open+0x0/0xd6: 0x3 0x7fffd1ec4440 0x8000 0x0 ...-1447 [001] 1038282.286878: sys_openat+0xc/0xe - do_sys_open: 0xfffe 0x81367a3a ...-1447 [001] 1038282.286885: do_sys_open+0x0/0xd6: 0xff9c 0x40413c 0x8000 0x1b6 ...-1447 [001] 1038282.286915: sys_open+0x1b/0x1d - do_sys_open: 0x3 0x81367a3a ...-1447 [001] 1038282.286969: do_sys_open+0x0/0xd6: 0xff9c 0x4041c6 0x98800 0x10 ...-1447 [001] 1038282.286976: sys_open+0x1b/0x1d - do_sys_open: 0x3 0x81367a3a Each line shows when the kernel hits a probe, and - SYMBOL means kernel returns from SYMBOL(e.g. 
sys_open+0x1b/0x1d - do_sys_open means kernel returns from do_sys_open to sys_open+0x1b). Thank you, --- Masami Hiramatsu (11): tracing: Add kprobes event profiling interface tracing: Generate names for each kprobe event automatically tracing: Kprobe-tracer supports more than 6 arguments tracing: add kprobe-based event tracer tracing: Introduce TRACE_FIELD_ZERO() macro tracing: ftrace dynamic ftrace_event_call support x86: add pt_regs register and stack access APIs kprobes: cleanup fix_riprel() using insn decoder on x86 kprobes: checks probe address is instruction boudary on x86 x86: x86 instruction decoder build-time selftest x86: instruction decoder API Documentation/trace/kprobetrace.txt | 147 arch/x86/Kconfig.debug |9 arch/x86/Makefile|3 arch/x86/include/asm/inat.h | 188 + arch/x86/include/asm/inat_types.h| 29 + arch/x86/include/asm/insn.h | 143 arch/x86/include/asm/ptrace.h| 62 ++ arch/x86/kernel/kprobes.c| 197 +++-- arch/x86/kernel/ptrace.c | 112 +++ arch/x86/lib/Makefile| 13 arch/x86/lib/inat.c | 78 ++ arch/x86/lib/insn.c | 464 + arch/x86/lib/x86-opcode-map.txt | 719 arch/x86/tools/Makefile | 15 arch/x86/tools/distill.awk | 42 + arch/x86/tools/gen-insn-attr-x86.awk | 314 + arch/x86/tools/test_get_len.c| 113 +++ include/linux/ftrace_event.h | 13 include/trace/ftrace.h | 22 - kernel/trace/Kconfig | 12 kernel/trace/Makefile|1 kernel/trace/trace.h | 29 + kernel/trace/trace_event_types.h |4 kernel/trace/trace_events.c | 72 +- kernel/trace/trace_export.c | 43 + kernel/trace/trace_kprobe.c | 1243 ++ 26 files changed, 3924 insertions(+), 163 deletions(-) create mode 100644 Documentation/trace/kprobetrace.txt create mode 100644 arch/x86/include/asm/inat.h create mode 100644 arch/x86/include/asm/inat_types.h create mode 100644 arch/x86/include/asm/insn.h create mode 100644 arch/x86/lib/inat.c create mode 100644 arch/x86/lib/insn.c create mode 100644 arch/x86/lib/x86-opcode-map.txt create mode 100644 arch/x86/tools/Makefile create mode 100644 arch/x86/tools/distill.awk 
create mode 100644 arch/x86/tools/gen-insn-attr-x86.awk create mode 100644 arch/x86/tools/test_get_len.c
[PATCH -tip -v13 11/11] tracing: Add kprobes event profiling interface
Add profiling interaces for each kprobes event. This interface provides how many times each probe hit or missed. Changes from v12: - Reformat profile data. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Christoph Hellwig h...@infradead.org Cc: Steven Rostedt rost...@goodmis.org Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Tom Zanussi tzanu...@gmail.com Cc: Li Zefan l...@cn.fujitsu.com --- Documentation/trace/kprobetrace.txt |8 +++ kernel/trace/trace_kprobe.c | 43 +++ 2 files changed, 51 insertions(+), 0 deletions(-) diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt index 437ad49..9c6be05 100644 --- a/Documentation/trace/kprobetrace.txt +++ b/Documentation/trace/kprobetrace.txt @@ -69,6 +69,14 @@ filter: names and field names for describing filters. +Event Profiling +--- + You can check the total number of probe hits and probe miss-hits via +/sys/kernel/debug/tracing/kprobe_profile. + The first column is event name, the second is the number of probe hits, +the third is the number of probe miss-hits. + + Usage examples -- To add a probe as a new event, write a new definition to kprobe_events diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index 9f9f161..aedf25a 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -178,6 +178,7 @@ struct trace_probe { struct kprobe kp; struct kretproberp; }; + unsigned long nhit; const char *symbol;/* symbol name */ struct ftrace_event_callcall; unsigned intnr_args; @@ -766,6 +767,37 @@ static const struct file_operations kprobe_events_ops = { .write = probes_write, }; +/* Probes profiling interfaces */ +static int probes_profile_seq_show(struct seq_file *m, void *v) +{ + struct trace_probe *tp = v; + + seq_printf(m, %-44s %15lu %15lu\n, tp-call.name, tp-nhit, + probe_is_return(tp) ? 
tp-rp.kp.nmissed : tp-kp.nmissed); + + return 0; +} + +static const struct seq_operations profile_seq_op = { + .start = probes_seq_start, + .next = probes_seq_next, + .stop = probes_seq_stop, + .show = probes_profile_seq_show +}; + +static int profile_open(struct inode *inode, struct file *file) +{ + return seq_open(file, profile_seq_op); +} + +static const struct file_operations kprobe_profile_ops = { + .owner = THIS_MODULE, + .open = profile_open, + .read = seq_read, + .llseek = seq_lseek, + .release= seq_release, +}; + /* Kprobe handler */ static __kprobes int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs) { @@ -776,6 +808,8 @@ static __kprobes int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs) unsigned long irq_flags; struct ftrace_event_call *call = tp-call; + tp-nhit++; + local_save_flags(irq_flags); pc = preempt_count(); @@ -1152,9 +1186,18 @@ static __init int init_kprobe_trace(void) entry = debugfs_create_file(kprobe_events, 0644, d_tracer, NULL, kprobe_events_ops); + /* Event list interface */ if (!entry) pr_warning(Could not create debugfs 'kprobe_events' entry\n); + + /* Profile interface */ + entry = debugfs_create_file(kprobe_profile, 0444, d_tracer, + NULL, kprobe_profile_ops); + + if (!entry) + pr_warning(Could not create debugfs + 'kprobe_profile' entry\n); return 0; } fs_initcall(init_kprobe_trace); -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH -tip -v13 01/11] x86: instruction decoder API
Add x86 instruction decoder to arch-specific libraries. This decoder can decode x86 instructions used in kernel into prefix, opcode, modrm, sib, displacement and immediates. This can also show the length of instructions. This version introduces instruction attributes for decoding instructions. The instruction attribute tables are generated from the opcode map file (x86-opcode-map.txt) by the generator script(gen-insn-attr-x86.awk). Currently, the opcode maps are based on opcode maps in Intel(R) 64 and IA-32 Architectures Software Developers Manual Vol.2: Appendix.A, and consist of below two types of opcode tables. 1-byte/2-bytes/3-bytes opcodes, which has 256 elements, are written as below; Table: table-name Referrer: escaped-name opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...] (or) opcode: escape # escaped-name EndTable Group opcodes, which has 8 elements, are written as below; GrpTable: GrpXXX reg: mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...] EndTable These opcode maps include a few SSE and FP opcodes (for setup), because those opcodes are used in the kernel. Changes from v12: - Use arch/x86/tools dir instead of arch/x86/scripts. - Remove all EXPORT_SYMBOL_GPL() and linux/module.h. - Replace all types defined in linux/types.h. - Use inline functions instead of macros. - Add VIA's RNG/ACE instructions. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Signed-off-by: Jim Keniston jkeni...@us.ibm.com Acked-by: H. 
Peter Anvin h...@zytor.com Cc: Sam Ravnborg s...@ravnborg.org Cc: Steven Rostedt rost...@goodmis.org Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Andi Kleen a...@linux.intel.com Cc: Vegard Nossum vegard.nos...@gmail.com Cc: Avi Kivity a...@redhat.com Cc: Przemysław Pawełczyk przemys...@pawelczyk.it --- arch/x86/include/asm/inat.h | 188 + arch/x86/include/asm/inat_types.h| 29 + arch/x86/include/asm/insn.h | 143 +++ arch/x86/lib/Makefile| 13 + arch/x86/lib/inat.c | 78 arch/x86/lib/insn.c | 464 ++ arch/x86/lib/x86-opcode-map.txt | 719 ++ arch/x86/tools/gen-insn-attr-x86.awk | 314 +++ 8 files changed, 1948 insertions(+), 0 deletions(-) create mode 100644 arch/x86/include/asm/inat.h create mode 100644 arch/x86/include/asm/inat_types.h create mode 100644 arch/x86/include/asm/insn.h create mode 100644 arch/x86/lib/inat.c create mode 100644 arch/x86/lib/insn.c create mode 100644 arch/x86/lib/x86-opcode-map.txt create mode 100644 arch/x86/tools/gen-insn-attr-x86.awk diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h new file mode 100644 index 000..2866fdd --- /dev/null +++ b/arch/x86/include/asm/inat.h @@ -0,0 +1,188 @@ +#ifndef _ASM_X86_INAT_H +#define _ASM_X86_INAT_H +/* + * x86 instruction attributes + * + * Written by Masami Hiramatsu mhira...@redhat.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + */ +#include asm/inat_types.h + +/* + * Internal bits. Don't use bitmasks directly, because these bits are + * unstable. You should use checking functions. + */ + +#define INAT_OPCODE_TABLE_SIZE 256 +#define INAT_GROUP_TABLE_SIZE 8 + +/* Legacy instruction prefixes */ +#define INAT_PFX_OPNDSZ1 /* 0x66 */ /* LPFX1 */ +#define INAT_PFX_REPNE 2 /* 0xF2 */ /* LPFX2 */ +#define INAT_PFX_REPE 3 /* 0xF3 */ /* LPFX3 */ +#define INAT_PFX_LOCK 4 /* 0xF0 */ +#define INAT_PFX_CS5 /* 0x2E */ +#define INAT_PFX_DS6 /* 0x3E */ +#define INAT_PFX_ES7 /* 0x26 */ +#define INAT_PFX_FS8 /* 0x64 */ +#define INAT_PFX_GS9 /* 0x65 */ +#define INAT_PFX_SS10 /* 0x36 */ +#define INAT_PFX_ADDRSZ11 /* 0x67 */ + +#define INAT_LPREFIX_MAX 3 + +/* Immediate size */ +#define INAT_IMM_BYTE 1 +#define INAT_IMM_WORD 2 +#define INAT_IMM_DWORD 3 +#define INAT_IMM_QWORD 4 +#define INAT_IMM_PTR 5 +#define INAT_IMM_VWORD32 6 +#define INAT_IMM_VWORD 7 + +/* Legacy
[PATCH -tip -v13 02/11] x86: x86 instruction decoder build-time selftest
Add a user-space selftest of the x86 instruction decoder at kernel build time. When CONFIG_X86_DECODER_SELFTEST=y, Kbuild builds a test harness of the x86 instruction decoder and runs it after building vmlinux. The test compares the output of objdump with that of the x86 instruction decoder code and checks that there are no differences. Changes from v12: - Remove user_include.h. - Use $(OBJDUMP) instead of native objdump. - Use hostprogs-y and include insn.c and inat.c directly from test_get_len.c. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Signed-off-by: Jim Keniston jkeni...@us.ibm.com Cc: Sam Ravnborg s...@ravnborg.org Cc: H. Peter Anvin h...@zytor.com Cc: Steven Rostedt rost...@goodmis.org Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Andi Kleen a...@linux.intel.com Cc: Vegard Nossum vegard.nos...@gmail.com Cc: Avi Kivity a...@redhat.com Cc: Przemysław Pawełczyk przemys...@pawelczyk.it --- arch/x86/Kconfig.debug|9 +++ arch/x86/Makefile |3 + arch/x86/tools/Makefile | 15 + arch/x86/tools/distill.awk| 42 +++ arch/x86/tools/test_get_len.c | 113 + 5 files changed, 182 insertions(+), 0 deletions(-) create mode 100644 arch/x86/tools/Makefile create mode 100644 arch/x86/tools/distill.awk create mode 100644 arch/x86/tools/test_get_len.c diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug index d105f29..7d0b681 100644 --- a/arch/x86/Kconfig.debug +++ b/arch/x86/Kconfig.debug @@ -186,6 +186,15 @@ config X86_DS_SELFTEST config HAVE_MMIOTRACE_SUPPORT def_bool y +config X86_DECODER_SELFTEST + bool "x86 instruction decoder selftest" + depends on DEBUG_KERNEL + ---help--- +Perform x86 instruction decoder selftests at build time. +This option is useful for checking the sanity of x86 instruction +decoder code. +If unsure, say N. 
+ # # IO delay types: # diff --git a/arch/x86/Makefile b/arch/x86/Makefile index 1b68659..5fe16bf 100644 --- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -154,6 +154,9 @@ all: bzImage KBUILD_IMAGE := $(boot)/bzImage bzImage: vmlinux +ifeq ($(CONFIG_X86_DECODER_SELFTEST),y) + $(Q)$(MAKE) $(build)=arch/x86/tools posttest +endif $(Q)$(MAKE) $(build)=$(boot) $(KBUILD_IMAGE) $(Q)mkdir -p $(objtree)/arch/$(UTS_MACHINE)/boot $(Q)ln -fsn ../../x86/boot/bzImage $(objtree)/arch/$(UTS_MACHINE)/boot/$@ diff --git a/arch/x86/tools/Makefile b/arch/x86/tools/Makefile new file mode 100644 index 000..3dd626b --- /dev/null +++ b/arch/x86/tools/Makefile @@ -0,0 +1,15 @@ +PHONY += posttest +quiet_cmd_posttest = TEST$@ + cmd_posttest = $(OBJDUMP) -d $(objtree)/vmlinux | awk -f $(srctree)/arch/x86/tools/distill.awk | $(obj)/test_get_len + +posttest: $(obj)/test_get_len vmlinux + $(call cmd,posttest) + +hostprogs-y:= test_get_len + +# -I needed for generated C source and C source which is in the kernel tree. +HOSTCFLAGS_test_get_len.o := -Wall -I$(objtree)/arch/x86/lib/ -I$(srctree)/arch/x86/include/ -I$(srctree)/arch/x86/lib/ + +# Dependencies are also needed. +$(obj)/test_get_len.o: $(srctree)/arch/x86/lib/insn.c $(srctree)/arch/x86/lib/inat.c $(srctree)/arch/x86/include/asm/inat_types.h $(srctree)/arch/x86/include/asm/inat.h $(srctree)/arch/x86/include/asm/insn.h $(objtree)/arch/x86/lib/inat-tables.c + diff --git a/arch/x86/tools/distill.awk b/arch/x86/tools/distill.awk new file mode 100644 index 000..d433619 --- /dev/null +++ b/arch/x86/tools/distill.awk @@ -0,0 +1,42 @@ +#!/bin/awk -f +# Usage: objdump -d a.out | awk -f distill.awk | ./test_get_len +# Distills the disassembly as follows: +# - Removes all lines except the disassembled instructions. +# - For instructions that exceed 1 line (7 bytes), crams all the hex bytes +# into a single line. 
+# - Remove bad(or prefix only) instructions
+
+BEGIN {
+	prev_addr = ""
+	prev_hex = ""
+	prev_mnemonic = ""
+	bad_expr = "(\\(bad\\)|^rex|^.byte|^rep(z|nz)$|^lock$|^es$|^cs$|^ss$|^ds$|^fs$|^gs$|^data(16|32)$|^addr(16|32|64))"
+	fwait_expr = "^9b "
+	fwait_str="9b\tfwait"
+}
+
+/^ *[0-9a-f]+:/ {
+	if (split($0, field, "\t") < 3) {
+		# This is a continuation of the same insn.
+		prev_hex = prev_hex field[2]
+	} else {
+		# Skip bad instructions
+		if (match(prev_mnemonic, bad_expr))
+			prev_addr = ""
+		# Split fwait from other f* instructions
+		if (match(prev_hex, fwait_expr) && prev_mnemonic != "fwait") {
+			printf "%s\t%s\n", prev_addr, fwait_str
+			sub(fwait_expr, "", prev_hex)
+		}
+		if (prev_addr != "")
+			printf "%s\t%s\t%s\n", prev_addr, prev_hex, prev_mnemonic
+		prev_addr = field[1
[PATCH -tip -v12 04/11] kprobes: cleanup fix_riprel() using insn decoder on x86
Cleanup fix_riprel() in arch/x86/kernel/kprobes.c by using x86 instruction decoder. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Jim Keniston jkeni...@us.ibm.com Cc: Ingo Molnar mi...@elte.hu --- arch/x86/kernel/kprobes.c | 128 - 1 files changed, 23 insertions(+), 105 deletions(-) diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c index 5341842..b77e050 100644 --- a/arch/x86/kernel/kprobes.c +++ b/arch/x86/kernel/kprobes.c @@ -109,50 +109,6 @@ static const u32 twobyte_is_boostable[256 / 32] = { /* --- */ /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ }; -static const u32 onebyte_has_modrm[256 / 32] = { - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ - /* --- */ - W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 00 */ - W(0x10, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 10 */ - W(0x20, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 20 */ - W(0x30, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 30 */ - W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 40 */ - W(0x50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 50 */ - W(0x60, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0) | /* 60 */ - W(0x70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 70 */ - W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */ - W(0x90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 90 */ - W(0xa0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* a0 */ - W(0xb0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* b0 */ - W(0xc0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* c0 */ - W(0xd0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */ - W(0xe0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* e0 */ - W(0xf0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1) /* f0 */ - /* --- */ - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ -}; -static const u32 twobyte_has_modrm[256 / 32] = { - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ - /* --- */ - W(0x00, 
1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1) | /* 0f */ - W(0x10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0) , /* 1f */ - W(0x20, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1) | /* 2f */ - W(0x30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 3f */ - W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 4f */ - W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 5f */ - W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 6f */ - W(0x70, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1) , /* 7f */ - W(0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 8f */ - W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 9f */ - W(0xa0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1) | /* af */ - W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1) , /* bf */ - W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* cf */ - W(0xd0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* df */ - W(0xe0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* ef */ - W(0xf0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0) /* ff */ - /* --- */ - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ -}; #undef W struct kretprobe_blackpoint kretprobe_blacklist[] = { @@ -345,68 +301,30 @@ static int __kprobes is_IF_modifier(kprobe_opcode_t *insn) static void __kprobes fix_riprel(struct kprobe *p) { #ifdef CONFIG_X86_64 - u8 *insn = p-ainsn.insn; - s64 disp; - int need_modrm; - - /* Skip legacy instruction prefixes. */ - while (1) { - switch (*insn) { - case 0x66: - case 0x67: - case 0x2e: - case 0x3e: - case 0x26: - case 0x64: - case 0x65: - case 0x36: - case 0xf0: - case 0xf3: - case 0xf2: - ++insn; - continue; - } - break; - } + struct insn insn; + kernel_insn_init(insn, p-ainsn.insn); - /* Skip REX instruction prefix. */ - if (is_REX_prefix(insn)) - ++insn; - - if (*insn == 0x0f) { - /* Two-byte opcode. */ - ++insn
[PATCH -tip -v12 06/11] tracing: ftrace dynamic ftrace_event_call support
Add dynamic ftrace_event_call support to ftrace. Trace engines can add new ftrace_event_calls to ftrace on the fly. Each operator function of the call takes a ftrace_event_call data structure as an argument, because these functions may be shared among several ftrace_event_calls. Changes from v11: - Call remove_subsystem_dir() when unregistering an event call. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Acked-by: Frederic Weisbecker fweis...@gmail.com Cc: Steven Rostedt rost...@goodmis.org Cc: Ingo Molnar mi...@elte.hu Cc: Tom Zanussi tzanu...@gmail.com --- include/linux/ftrace_event.h | 13 +--- include/trace/ftrace.h | 22 ++--- kernel/trace/trace_events.c | 72 -- kernel/trace/trace_export.c | 27 4 files changed, 86 insertions(+), 48 deletions(-) diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h index 5c093ff..f7733b6 100644 --- a/include/linux/ftrace_event.h +++ b/include/linux/ftrace_event.h @@ -108,12 +108,13 @@ struct ftrace_event_call { struct dentry *dir; struct trace_event *event; int enabled; - int (*regfunc)(void); - void(*unregfunc)(void); + int (*regfunc)(struct ftrace_event_call *); + void(*unregfunc)(struct ftrace_event_call *); int id; - int (*raw_init)(void); - int (*show_format)(struct trace_seq *s); - int (*define_fields)(void); + int (*raw_init)(struct ftrace_event_call *); + int (*show_format)(struct ftrace_event_call *, + struct trace_seq *); + int (*define_fields)(struct ftrace_event_call *); struct list_headfields; int filter_active; void*filter; @@ -138,6 +139,8 @@ extern int filter_current_check_discard(struct ftrace_event_call *call, extern int trace_define_field(struct ftrace_event_call *call, char *type, char *name, int offset, int size, int is_signed); +extern int trace_add_event_call(struct ftrace_event_call *call); +extern void trace_remove_event_call(struct ftrace_event_call *call); #define is_signed_type(type) (((type)(-1)) 0) diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h index 
1867553..d696580 100644 --- a/include/trace/ftrace.h +++ b/include/trace/ftrace.h @@ -147,7 +147,8 @@ #undef TRACE_EVENT #define TRACE_EVENT(call, proto, args, tstruct, func, print) \ static int \ -ftrace_format_##call(struct trace_seq *s) \ +ftrace_format_##call(struct ftrace_event_call *event_call, \ +struct trace_seq *s) \ { \ struct ftrace_raw_##call field __attribute__((unused)); \ int ret = 0;\ @@ -289,10 +290,9 @@ ftrace_raw_output_##call(struct trace_iterator *iter, int flags) \ #undef TRACE_EVENT #define TRACE_EVENT(call, proto, args, tstruct, func, print) \ int\ -ftrace_define_fields_##call(void) \ +ftrace_define_fields_##call(struct ftrace_event_call *event_call) \ { \ struct ftrace_raw_##call field; \ - struct ftrace_event_call *event_call = event_##call; \ int ret;\ \ __common_field(int, type, 1); \ @@ -355,7 +355,7 @@ static inline int ftrace_get_offsets_##call( \ * event_trace_printk(_RET_IP_, call: fmt); * } * - * static int ftrace_reg_event_call(void) + * static int ftrace_reg_event_call(struct ftrace_event_call *unused) * { * int ret; * @@ -366,7 +366,7 @@ static inline int ftrace_get_offsets_##call( \ * return ret; * } * - * static void ftrace_unreg_event_call(void) + * static void ftrace_unreg_event_call(struct ftrace_event_call *unused) * { * unregister_trace_call(ftrace_event_call); * } @@ -399,7 +399,7 @@ static inline int ftrace_get_offsets_##call( \ * trace_current_buffer_unlock_commit(event, irq_flags, pc); * } * - * static int ftrace_raw_reg_event_call(void) + * static
[PATCH -tip -v12 07/11] tracing: Introduce TRACE_FIELD_ZERO() macro
Use TRACE_FIELD_ZERO(type, item) instead of TRACE_FIELD_ZERO_CHAR(item). This also includes a fix of TRACE_ZERO_CHAR() macro. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Steven Rostedt rost...@goodmis.org Cc: Ingo Molnar mi...@elte.hu Cc: Tom Zanussi tzanu...@gmail.com Cc: Frederic Weisbecker fweis...@gmail.com --- kernel/trace/trace_event_types.h |4 ++-- kernel/trace/trace_export.c | 16 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/kernel/trace/trace_event_types.h b/kernel/trace/trace_event_types.h index 6db005e..e74f090 100644 --- a/kernel/trace/trace_event_types.h +++ b/kernel/trace/trace_event_types.h @@ -109,7 +109,7 @@ TRACE_EVENT_FORMAT(bprint, TRACE_BPRINT, bprint_entry, ignore, TRACE_STRUCT( TRACE_FIELD(unsigned long, ip, ip) TRACE_FIELD(char *, fmt, fmt) - TRACE_FIELD_ZERO_CHAR(buf) + TRACE_FIELD_ZERO(char, buf) ), TP_RAW_FMT(%08lx (%d) fmt:%p %s) ); @@ -117,7 +117,7 @@ TRACE_EVENT_FORMAT(bprint, TRACE_BPRINT, bprint_entry, ignore, TRACE_EVENT_FORMAT(print, TRACE_PRINT, print_entry, ignore, TRACE_STRUCT( TRACE_FIELD(unsigned long, ip, ip) - TRACE_FIELD_ZERO_CHAR(buf) + TRACE_FIELD_ZERO(char, buf) ), TP_RAW_FMT(%08lx (%d) fmt:%p %s) ); diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c index 7cee79d..23125b5 100644 --- a/kernel/trace/trace_export.c +++ b/kernel/trace/trace_export.c @@ -42,9 +42,9 @@ extern void __bad_type_size(void); if (!ret) \ return 0; -#undef TRACE_FIELD_ZERO_CHAR -#define TRACE_FIELD_ZERO_CHAR(item)\ - ret = trace_seq_printf(s, \tfield:char #item ;\t \ +#undef TRACE_FIELD_ZERO +#define TRACE_FIELD_ZERO(type, item) \ + ret = trace_seq_printf(s, \tfield: #type #item ;\t \ offset:%u;\tsize:0;\n, \ (unsigned int)offsetof(typeof(field), item)); \ if (!ret) \ @@ -90,9 +90,6 @@ ftrace_format_##call(struct ftrace_event_call *dummy, struct trace_seq *s)\ #include trace_event_types.h -#undef TRACE_ZERO_CHAR -#define TRACE_ZERO_CHAR(arg) - #undef TRACE_FIELD #define TRACE_FIELD(type, item, 
assign)\ entry-item = assign; @@ -105,6 +102,9 @@ ftrace_format_##call(struct ftrace_event_call *dummy, struct trace_seq *s)\ #define TRACE_FIELD_SIGN(type, item, assign, is_signed)\ TRACE_FIELD(type, item, assign) +#undef TRACE_FIELD_ZERO +#define TRACE_FIELD_ZERO(type, item) + #undef TP_CMD #define TP_CMD(cmd...) cmd @@ -176,8 +176,8 @@ __attribute__((section(_ftrace_events))) event_##call = { \ if (ret)\ return ret; -#undef TRACE_FIELD_ZERO_CHAR -#define TRACE_FIELD_ZERO_CHAR(item) +#undef TRACE_FIELD_ZERO +#define TRACE_FIELD_ZERO(type, item) #undef TRACE_EVENT_FORMAT #define TRACE_EVENT_FORMAT(call, proto, args, fmt, tstruct, tpfmt) \ -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH -tip -v12 05/11] x86: add pt_regs register and stack access APIs
Add following APIs for accessing registers and stack entries from pt_regs. These APIs are required by kprobes-based event tracer on ftrace. Some other debugging tools might be able to use it too. - regs_query_register_offset(const char *name) Query the offset of name register. - regs_query_register_name(unsigned int offset) Query the name of register by its offset. - regs_get_register(struct pt_regs *regs, unsigned int offset) Get the value of a register by its offset. - regs_within_kernel_stack(struct pt_regs *regs, unsigned long addr) Check the address is in the kernel stack. - regs_get_kernel_stack_nth(struct pt_regs *reg, unsigned int nth) Get Nth entry of the kernel stack. (N = 0) - regs_get_argument_nth(struct pt_regs *reg, unsigned int nth) Get Nth argument at function call. (N = 0) Changes from v10: - Use an offsetof table in regs_get_argument_nth(). - Use unsigned int instead of unsigned. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Reviewed-by: Frederic Weisbecker fweis...@gmail.com Cc: Andi Kleen a...@firstfloor.org Cc: Christoph Hellwig h...@infradead.org Cc: Steven Rostedt rost...@goodmis.org Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Roland McGrath rol...@redhat.com Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: linux-a...@vger.kernel.org --- arch/x86/include/asm/ptrace.h | 62 +++ arch/x86/kernel/ptrace.c | 112 + 2 files changed, 174 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h index 0f0d908..a3d49dd 100644 --- a/arch/x86/include/asm/ptrace.h +++ b/arch/x86/include/asm/ptrace.h @@ -7,6 +7,7 @@ #ifdef __KERNEL__ #include asm/segment.h +#include asm/page_types.h #endif #ifndef __ASSEMBLY__ @@ -216,6 +217,67 @@ static inline unsigned long user_stack_pointer(struct pt_regs *regs) return regs-sp; } +/* Query offset/name of register from its name/offset */ +extern int regs_query_register_offset(const char 
*name); +extern const char *regs_query_register_name(unsigned int offset); +#define MAX_REG_OFFSET (offsetof(struct pt_regs, ss)) + +/** + * regs_get_register() - get register value from its offset + * @regs: pt_regs from which register value is gotten. + * @offset:offset number of the register. + * + * regs_get_register returns the value of a register whose offset from @regs + * is @offset. The @offset is the offset of the register in struct pt_regs. + * If @offset is bigger than MAX_REG_OFFSET, this returns 0. + */ +static inline unsigned long regs_get_register(struct pt_regs *regs, + unsigned int offset) +{ + if (unlikely(offset MAX_REG_OFFSET)) + return 0; + return *(unsigned long *)((unsigned long)regs + offset); +} + +/** + * regs_within_kernel_stack() - check the address in the stack + * @regs: pt_regs which contains kernel stack pointer. + * @addr: address which is checked. + * + * regs_within_kenel_stack() checks @addr is within the kernel stack page(s). + * If @addr is within the kernel stack, it returns true. If not, returns false. + */ +static inline int regs_within_kernel_stack(struct pt_regs *regs, + unsigned long addr) +{ + return ((addr ~(THREAD_SIZE - 1)) == + (kernel_stack_pointer(regs) ~(THREAD_SIZE - 1))); +} + +/** + * regs_get_kernel_stack_nth() - get Nth entry of the stack + * @regs: pt_regs which contains kernel stack pointer. + * @n: stack entry number. + * + * regs_get_kernel_stack_nth() returns @n th entry of the kernel stack which + * is specifined by @regs. If the @n th entry is NOT in the kernel stack, + * this returns 0. 
+ */ +static inline unsigned long regs_get_kernel_stack_nth(struct pt_regs *regs, + unsigned int n) +{ + unsigned long *addr = (unsigned long *)kernel_stack_pointer(regs); + addr += n; + if (regs_within_kernel_stack(regs, (unsigned long)addr)) + return *addr; + else + return 0; +} + +/* Get Nth argument at function call */ +extern unsigned long regs_get_argument_nth(struct pt_regs *regs, + unsigned int n); + /* * These are defined as per linux/ptrace.h, which see. */ diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c index cabdabc..32729ec 100644 --- a/arch/x86/kernel/ptrace.c +++ b/arch/x86/kernel/ptrace.c @@ -49,6 +49,118 @@ enum x86_regset { REGSET_IOPERM32, }; +struct pt_regs_offset { + const char *name; + int offset; +}; + +#define REG_OFFSET_NAME(r) {.name = #r, .offset = offsetof(struct pt_regs, r)} +#define REG_OFFSET_END {.name = NULL, .offset = 0} + +static const struct pt_regs_offset regoffset_table[] = { +#ifdef
[PATCH -tip -v12 09/11] tracing: Kprobe-tracer supports more than 6 arguments
Support up to 128 arguments for each kprobes event. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Christoph Hellwig h...@infradead.org Cc: Steven Rostedt rost...@goodmis.org Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Tom Zanussi tzanu...@gmail.com --- Documentation/trace/kprobetrace.txt |2 +- kernel/trace/trace_kprobe.c | 21 + 2 files changed, 14 insertions(+), 9 deletions(-) diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt index 9ad907c..b29a54b 100644 --- a/Documentation/trace/kprobetrace.txt +++ b/Documentation/trace/kprobetrace.txt @@ -32,7 +32,7 @@ Synopsis of kprobe_events SYMBOL[+offs|-offs] : Symbol+offset where the probe is inserted. MEMADDR : Address where the probe is inserted. - FETCHARGS : Arguments. + FETCHARGS : Arguments. Each probe can have up to 128 args. %REG : Fetch register REG sN : Fetch Nth entry of stack (N = 0) @ADDR: Fetch memory at ADDR (ADDR should be in kernel) diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index ad33073..67c33e1 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -32,7 +32,7 @@ #include trace.h #include trace_output.h -#define TRACE_KPROBE_ARGS 6 +#define MAX_TRACE_ARGS 128 #define MAX_ARGSTR_LEN 63 /* currently, trace_kprobe only supports X86. 
*/ @@ -178,11 +178,15 @@ struct trace_probe { struct kretproberp; }; const char *symbol;/* symbol name */ - unsigned intnr_args; - struct fetch_func args[TRACE_KPROBE_ARGS]; struct ftrace_event_callcall; + unsigned intnr_args; + struct fetch_func args[]; }; +#define SIZEOF_TRACE_PROBE(n) \ + (offsetof(struct trace_probe, args) + \ + (sizeof(struct fetch_func) * (n))) + static int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs); static int kretprobe_trace_func(struct kretprobe_instance *ri, struct pt_regs *regs); @@ -255,11 +259,11 @@ static DEFINE_MUTEX(probe_lock); static LIST_HEAD(probe_list); static struct trace_probe *alloc_trace_probe(const char *symbol, -const char *event) +const char *event, int nargs) { struct trace_probe *tp; - tp = kzalloc(sizeof(struct trace_probe), GFP_KERNEL); + tp = kzalloc(SIZEOF_TRACE_PROBE(nargs), GFP_KERNEL); if (!tp) return ERR_PTR(-ENOMEM); @@ -559,9 +563,10 @@ static int create_trace_probe(int argc, char **argv) if (offset is_return) return -EINVAL; } + argc -= 2; argv += 2; /* setup a probe */ - tp = alloc_trace_probe(symbol, event); + tp = alloc_trace_probe(symbol, event, argc); if (IS_ERR(tp)) return PTR_ERR(tp); @@ -580,8 +585,8 @@ static int create_trace_probe(int argc, char **argv) kp-addr = addr; /* parse arguments */ - argc -= 2; argv += 2; ret = 0; - for (i = 0; i argc i TRACE_KPROBE_ARGS; i++) { + ret = 0; + for (i = 0; i argc i MAX_TRACE_ARGS; i++) { if (strlen(argv[i]) MAX_ARGSTR_LEN) { pr_info(Argument%d(%s) is too long.\n, i, argv[i]); ret = -ENOSPC; -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com
[PATCH -tip -v12 10/11] tracing: Generate names for each kprobe event automatically
Generate names for each kprobe event based on the probe point, and remove generic k*probe event types because there is no user of those types. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Christoph Hellwig h...@infradead.org Cc: Steven Rostedt rost...@goodmis.org Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Tom Zanussi tzanu...@gmail.com --- Documentation/trace/kprobetrace.txt |3 +- kernel/trace/trace_event_types.h| 18 -- kernel/trace/trace_kprobe.c | 64 ++- 3 files changed, 35 insertions(+), 50 deletions(-) diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt index b29a54b..437ad49 100644 --- a/Documentation/trace/kprobetrace.txt +++ b/Documentation/trace/kprobetrace.txt @@ -28,7 +28,8 @@ Synopsis of kprobe_events p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS]: Set a probe r[:EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe - EVENT : Event name. + EVENT : Event name. If omitted, the event name is generated + based on SYMBOL+offs or MEMADDR. SYMBOL[+offs|-offs] : Symbol+offset where the probe is inserted. MEMADDR : Address where the probe is inserted. diff --git a/kernel/trace/trace_event_types.h b/kernel/trace/trace_event_types.h index 186b598..e74f090 100644 --- a/kernel/trace/trace_event_types.h +++ b/kernel/trace/trace_event_types.h @@ -175,22 +175,4 @@ TRACE_EVENT_FORMAT(kmem_free, TRACE_KMEM_FREE, kmemtrace_free_entry, ignore, TP_RAW_FMT(type:%u call_site:%lx ptr:%p) ); -TRACE_EVENT_FORMAT(kprobe, TRACE_KPROBE, kprobe_trace_entry, ignore, - TRACE_STRUCT( - TRACE_FIELD(unsigned long, ip, ip) - TRACE_FIELD(int, nargs, nargs) - TRACE_FIELD_ZERO(unsigned long, args) - ), - TP_RAW_FMT(%08lx: args:0x%lx ...) 
-); - -TRACE_EVENT_FORMAT(kretprobe, TRACE_KRETPROBE, kretprobe_trace_entry, ignore, - TRACE_STRUCT( - TRACE_FIELD(unsigned long, func, func) - TRACE_FIELD(unsigned long, ret_ip, ret_ip) - TRACE_FIELD(int, nargs, nargs) - TRACE_FIELD_ZERO(unsigned long, args) - ), - TP_RAW_FMT(%08lx - %08lx: args:0x%lx ...) -); #undef TRACE_SYSTEM diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index 67c33e1..3444d1d 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -34,6 +34,7 @@ #define MAX_TRACE_ARGS 128 #define MAX_ARGSTR_LEN 63 +#define MAX_EVENT_NAME_LEN 64 /* currently, trace_kprobe only supports X86. */ @@ -272,11 +273,11 @@ static struct trace_probe *alloc_trace_probe(const char *symbol, if (!tp-symbol) goto error; } - if (event) { - tp-call.name = kstrdup(event, GFP_KERNEL); - if (!tp-call.name) - goto error; - } + if (!event) + goto error; + tp-call.name = kstrdup(event, GFP_KERNEL); + if (!tp-call.name) + goto error; INIT_LIST_HEAD(tp-list); return tp; @@ -306,7 +307,7 @@ static struct trace_probe *find_probe_event(const char *event) struct trace_probe *tp; list_for_each_entry(tp, probe_list, list) - if (tp-call.name !strcmp(tp-call.name, event)) + if (!strcmp(tp-call.name, event)) return tp; return NULL; } @@ -322,8 +323,7 @@ static void __unregister_trace_probe(struct trace_probe *tp) /* Unregister a trace_probe and probe_event: call with locking probe_lock */ static void unregister_trace_probe(struct trace_probe *tp) { - if (tp-call.name) - unregister_probe_event(tp); + unregister_probe_event(tp); __unregister_trace_probe(tp); list_del(tp-list); } @@ -352,18 +352,16 @@ static int register_trace_probe(struct trace_probe *tp) goto end; } /* register as an event */ - if (tp-call.name) { - old_tp = find_probe_event(tp-call.name); - if (old_tp) { - /* delete old event */ - unregister_trace_probe(old_tp); - free_trace_probe(old_tp); - } - ret = register_probe_event(tp); - if (ret) { - pr_warning(Faild to register probe 
event(%d)\n, ret); - __unregister_trace_probe(tp); - } + old_tp = find_probe_event(tp-call.name); + if (old_tp) { + /* delete old event */ + unregister_trace_probe(old_tp); + free_trace_probe(old_tp); + } + ret = register_probe_event(tp); + if (ret) { + pr_warning(Faild
[PATCH -tip -v12 00/11] tracing: kprobe-based event tracer and x86 instruction decoder
r:myretprobe do_sys_open rv ra /sys/kernel/debug/tracing/kprobe_events This sets a kretprobe on the return point of do_sys_open() function with recording return value and return address as myretprobe event. You can see the format of these events via /sys/kernel/debug/tracing/events/kprobes/EVENT/format. cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format name: myprobe ID: 23 format: field:unsigned short common_type; offset:0; size:2; field:unsigned char common_flags; offset:2; size:1; field:unsigned char common_preempt_count; offset:3; size:1; field:int common_pid; offset:4; size:4; field:int common_tgid; offset:8; size:4; field: unsigned long ip;offset:16;tsize:8; field: int nargs; offset:24;tsize:4; field: unsigned long arg0; offset:32;tsize:8; field: unsigned long arg1; offset:40;tsize:8; field: unsigned long arg2; offset:48;tsize:8; field: unsigned long arg3; offset:56;tsize:8; alias: a0; original: arg0; alias: a1; original: arg1; alias: a2; original: arg2; alias: a3; original: arg3; print fmt: %lx: 0x%lx 0x%lx 0x%lx 0x%lx, ip, arg0, arg1, arg2, arg3 You can see that the event has 4 arguments and alias expressions corresponding to it. echo /sys/kernel/debug/tracing/kprobe_events This clears all probe points. and you can see the traced information via /sys/kernel/debug/tracing/trace. cat /sys/kernel/debug/tracing/trace # tracer: nop # # TASK-PIDCPU#TIMESTAMP FUNCTION # | | | | | ...-1447 [001] 1038282.286875: do_sys_open+0x0/0xd6: 0x3 0x7fffd1ec4440 0x8000 0x0 ...-1447 [001] 1038282.286878: sys_openat+0xc/0xe - do_sys_open: 0xfffe 0x81367a3a ...-1447 [001] 1038282.286885: do_sys_open+0x0/0xd6: 0xff9c 0x40413c 0x8000 0x1b6 ...-1447 [001] 1038282.286915: sys_open+0x1b/0x1d - do_sys_open: 0x3 0x81367a3a ...-1447 [001] 1038282.286969: do_sys_open+0x0/0xd6: 0xff9c 0x4041c6 0x98800 0x10 ...-1447 [001] 1038282.286976: sys_open+0x1b/0x1d - do_sys_open: 0x3 0x81367a3a Each line shows when the kernel hits a probe, and - SYMBOL means kernel returns from SYMBOL(e.g. 
sys_open+0x1b/0x1d - do_sys_open means kernel returns from do_sys_open to sys_open+0x1b). Thank you, --- Masami Hiramatsu (11): tracing: Add kprobes event profiling interface tracing: Generate names for each kprobe event automatically tracing: Kprobe-tracer supports more than 6 arguments tracing: add kprobe-based event tracer tracing: Introduce TRACE_FIELD_ZERO() macro tracing: ftrace dynamic ftrace_event_call support x86: add pt_regs register and stack access APIs kprobes: cleanup fix_riprel() using insn decoder on x86 kprobes: checks probe address is instruction boudary on x86 x86: x86 instruction decoder build-time selftest x86: instruction decoder API Documentation/trace/kprobetrace.txt| 147 arch/x86/Kconfig.debug |9 arch/x86/Makefile |3 arch/x86/include/asm/inat.h| 127 +++ arch/x86/include/asm/insn.h| 136 +++ arch/x86/include/asm/ptrace.h | 62 ++ arch/x86/kernel/kprobes.c | 197 ++--- arch/x86/kernel/ptrace.c | 112 +++ arch/x86/lib/Makefile | 13 arch/x86/lib/inat.c| 82 ++ arch/x86/lib/insn.c| 473 arch/x86/lib/x86-opcode-map.txt| 711 ++ arch/x86/scripts/Makefile | 19 arch/x86/scripts/distill.awk | 42 + arch/x86/scripts/gen-insn-attr-x86.awk | 314 arch/x86/scripts/test_get_len.c| 99 +++ arch/x86/scripts/user_include.h| 49 + include/linux/ftrace_event.h | 13 include/trace/ftrace.h | 22 - kernel/trace/Kconfig | 12 kernel/trace/Makefile |1 kernel/trace/trace.h | 29 + kernel/trace/trace_event_types.h |4 kernel/trace/trace_events.c| 72 +- kernel/trace/trace_export.c| 43 + kernel/trace/trace_kprobe.c| 1245 26 files changed, 3873 insertions(+), 163 deletions(-) create mode 100644 Documentation/trace/kprobetrace.txt create mode 100644 arch/x86/include/asm/inat.h create mode 100644 arch/x86/include/asm/insn.h create mode 100644 arch/x86/lib/inat.c create mode 100644 arch/x86/lib/insn.c create mode 100644 arch/x86/lib/x86-opcode-map.txt create mode 100644 arch/x86/scripts/Makefile create mode 100644 arch/x86/scripts/distill.awk create mode 100644 
arch/x86/scripts/gen-insn-attr-x86.awk
[PATCH -tip -v12 11/11] tracing: Add kprobes event profiling interface
Add profiling interfaces for each kprobes event. Changes from v11: - Fix a typo and remove redundant check. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Christoph Hellwig h...@infradead.org Cc: Steven Rostedt rost...@goodmis.org Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Tom Zanussi tzanu...@gmail.com Cc: Li Zefan l...@cn.fujitsu.com --- Documentation/trace/kprobetrace.txt |8 ++ kernel/trace/trace_kprobe.c | 45 +++ 2 files changed, 53 insertions(+), 0 deletions(-) diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt index 437ad49..9c6be05 100644 --- a/Documentation/trace/kprobetrace.txt +++ b/Documentation/trace/kprobetrace.txt @@ -69,6 +69,14 @@ filter: names and field names for describing filters. +Event Profiling +--- + You can check the total number of probe hits and probe miss-hits via +/sys/kernel/debug/tracing/kprobe_profile. + The first column is event name, the second is the number of probe hits, +the third is the number of probe miss-hits. + + Usage examples -- To add a probe as a new event, write a new definition to kprobe_events diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index 3444d1d..21e619f 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -178,6 +178,7 @@ struct trace_probe { struct kprobe kp; struct kretproberp; }; + unsigned long nhits; const char *symbol;/* symbol name */ struct ftrace_event_callcall; unsigned intnr_args; @@ -766,6 +767,39 @@ static const struct file_operations kprobe_events_ops = { .write = probes_write, }; +/* Probes profiling interfaces */ +static int profile_seq_show(struct seq_file *m, void *v) +{ + struct trace_probe *tp = v; + + seq_printf(m, %s, tp-call.name); + + seq_printf(m, \t%8lu %8lu\n, tp-nhits, + probe_is_return(tp) ? 
tp-rp.kp.nmissed : tp-kp.nmissed); + + return 0; +} + +static const struct seq_operations profile_seq_op = { + .start = probes_seq_start, + .next = probes_seq_next, + .stop = probes_seq_stop, + .show = profile_seq_show +}; + +static int profile_open(struct inode *inode, struct file *file) +{ + return seq_open(file, profile_seq_op); +} + +static const struct file_operations kprobe_profile_ops = { + .owner = THIS_MODULE, + .open = profile_open, + .read = seq_read, + .llseek = seq_lseek, + .release= seq_release, +}; + /* Kprobe handler */ static __kprobes int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs) { @@ -776,6 +810,8 @@ static __kprobes int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs) unsigned long irq_flags; struct ftrace_event_call *call = tp-call; + tp-nhits++; + local_save_flags(irq_flags); pc = preempt_count(); @@ -1152,9 +1188,18 @@ static __init int init_kprobe_trace(void) entry = debugfs_create_file(kprobe_events, 0644, d_tracer, NULL, kprobe_events_ops); + /* Event list interface */ if (!entry) pr_warning(Could not create debugfs 'kprobe_events' entry\n); + + /* Profile interface */ + entry = debugfs_create_file(kprobe_profile, 0444, d_tracer, + NULL, kprobe_profile_ops); + + if (!entry) + pr_warning(Could not create debugfs + 'kprobe_profile' entry\n); return 0; } fs_initcall(init_kprobe_trace); -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH -tip -v12 02/11] x86: x86 instruction decoder build-time selftest
Add a user-space selftest of x86 instruction decoder at kernel build time. When CONFIG_X86_DECODER_SELFTEST=y, Kbuild builds a test harness of x86 instruction decoder and performs it after building vmlinux. The test compares the results of objdump and x86 instruction decoder code and check there are no differences. Changes from v10: - Use unsigned int instead of unsigned. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Signed-off-by: Jim Keniston jkeni...@us.ibm.com Cc: H. Peter Anvin h...@zytor.com Cc: Steven Rostedt rost...@goodmis.org Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Andi Kleen a...@linux.intel.com Cc: Vegard Nossum vegard.nos...@gmail.com Cc: Avi Kivity a...@redhat.com Cc: Przemysław Pawełczyk przemys...@pawelczyk.it Cc: Sam Ravnborg s...@ravnborg.org --- arch/x86/Kconfig.debug |9 arch/x86/Makefile |3 + arch/x86/include/asm/inat.h |2 + arch/x86/include/asm/insn.h |2 + arch/x86/lib/inat.c |2 + arch/x86/lib/insn.c |2 + arch/x86/scripts/Makefile | 19 +++ arch/x86/scripts/distill.awk| 42 + arch/x86/scripts/test_get_len.c | 99 +++ arch/x86/scripts/user_include.h | 49 +++ 10 files changed, 229 insertions(+), 0 deletions(-) create mode 100644 arch/x86/scripts/Makefile create mode 100644 arch/x86/scripts/distill.awk create mode 100644 arch/x86/scripts/test_get_len.c create mode 100644 arch/x86/scripts/user_include.h diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug index d105f29..7d0b681 100644 --- a/arch/x86/Kconfig.debug +++ b/arch/x86/Kconfig.debug @@ -186,6 +186,15 @@ config X86_DS_SELFTEST config HAVE_MMIOTRACE_SUPPORT def_bool y +config X86_DECODER_SELFTEST + bool x86 instruction decoder selftest + depends on DEBUG_KERNEL + ---help--- +Perform x86 instruction decoder selftests at build time. +This option is useful for checking the sanity of x86 instruction +decoder code. +If unsure, say N. 
+ # # IO delay types: # diff --git a/arch/x86/Makefile b/arch/x86/Makefile index 1b68659..7046556 100644 --- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -154,6 +154,9 @@ all: bzImage KBUILD_IMAGE := $(boot)/bzImage bzImage: vmlinux +ifeq ($(CONFIG_X86_DECODER_SELFTEST),y) + $(Q)$(MAKE) $(build)=arch/x86/scripts posttest +endif $(Q)$(MAKE) $(build)=$(boot) $(KBUILD_IMAGE) $(Q)mkdir -p $(objtree)/arch/$(UTS_MACHINE)/boot $(Q)ln -fsn ../../x86/boot/bzImage $(objtree)/arch/$(UTS_MACHINE)/boot/$@ diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h index 01e079a..9090665 100644 --- a/arch/x86/include/asm/inat.h +++ b/arch/x86/include/asm/inat.h @@ -20,7 +20,9 @@ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. * */ +#ifdef __KERNEL__ #include linux/types.h +#endif /* Instruction attributes */ typedef u32 insn_attr_t; diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h index 5b50fa3..5736404 100644 --- a/arch/x86/include/asm/insn.h +++ b/arch/x86/include/asm/insn.h @@ -20,7 +20,9 @@ * Copyright (C) IBM Corporation, 2009 */ +#ifdef __KERNEL__ #include linux/types.h +#endif /* insn_attr_t is defined in inat.h */ #include asm/inat.h diff --git a/arch/x86/lib/inat.c b/arch/x86/lib/inat.c index d6a34be..564ecbd 100644 --- a/arch/x86/lib/inat.c +++ b/arch/x86/lib/inat.c @@ -18,7 +18,9 @@ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
* */ +#ifdef __KERNEL__ #include linux/module.h +#endif #include asm/insn.h /* Attribute tables are generated from opcode map */ diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c index 254c848..3b9451a 100644 --- a/arch/x86/lib/insn.c +++ b/arch/x86/lib/insn.c @@ -18,8 +18,10 @@ * Copyright (C) IBM Corporation, 2002, 2004, 2009 */ +#ifdef __KERNEL__ #include linux/string.h #include linux/module.h +#endif #include asm/inat.h #include asm/insn.h diff --git a/arch/x86/scripts/Makefile b/arch/x86/scripts/Makefile new file mode 100644 index 000..f08859e --- /dev/null +++ b/arch/x86/scripts/Makefile @@ -0,0 +1,19 @@ +PHONY += posttest +quiet_cmd_posttest = TEST$@ + cmd_posttest = objdump -d $(objtree)/vmlinux | awk -f $(srctree)/arch/x86/scripts/distill.awk | $(obj)/test_get_len + +posttest: $(obj)/test_get_len vmlinux + $(call cmd,posttest) + +test_get_len_SRC = $(srctree)/arch/x86/scripts/test_get_len.c $(srctree)/arch/x86/lib/insn.c $(srctree)/arch/x86/lib/inat.c +test_get_len_INC = $(srctree)/arch/x86/include/asm/inat.h $(srctree)/arch/x86/include/asm/insn.h $(objtree)/arch/x86/lib/inat-tables.c + +quiet_cmd_test_get_len = CC $@ + cmd_test_get_len = $(CC) -Wall $(test_get_len_SRC) -I
[PATCH -tip -v12 08/11] tracing: add kprobe-based event tracer
Add a kprobes-based event tracer on ftrace.

This tracer is similar to the events tracer, which is based on the Tracepoint
infrastructure. Instead of Tracepoints, this tracer is based on kprobes
(kprobe and kretprobe). It can probe anywhere kprobes can probe (this means
all function bodies except for __kprobes functions).

Similar to the events tracer, this tracer doesn't need to be activated via
current_tracer; instead, just set probe points via
/sys/kernel/debug/tracing/kprobe_events. And you can set filters on each
probe event via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter.

This tracer supports the following probe arguments for each probe:

 %REG              : Fetch register REG
 sN                : Fetch Nth entry of stack (N >= 0)
 @ADDR             : Fetch memory at ADDR (ADDR should be in kernel)
 @SYM[+|-offs]     : Fetch memory at SYM +|- offs (SYM should be a data symbol)
 aN                : Fetch function argument. (N >= 0)
 rv                : Fetch return value.
 ra                : Fetch return address.
 +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.

See Documentation/trace/kprobetrace.txt for details.

Changes from v11:
 - Put a line after local variable definitions.
 - Fix indirect memory access string bug in trace_arg_string().
 - Remove redundant checks.
 - Fix buffer overflow in probes_write().
 - Fix probes_write() to support inputs ended without a new-line.
Signed-off-by: Masami Hiramatsu mhira...@redhat.com Acked-by: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Christoph Hellwig h...@infradead.org Cc: Steven Rostedt rost...@goodmis.org Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Tom Zanussi tzanu...@gmail.com Cc: Li Zefan l...@cn.fujitsu.com --- Documentation/trace/kprobetrace.txt | 138 kernel/trace/Kconfig| 12 kernel/trace/Makefile |1 kernel/trace/trace.h| 29 + kernel/trace/trace_event_types.h| 18 + kernel/trace/trace_kprobe.c | 1193 +++ 6 files changed, 1391 insertions(+), 0 deletions(-) create mode 100644 Documentation/trace/kprobetrace.txt create mode 100644 kernel/trace/trace_kprobe.c diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt new file mode 100644 index 000..9ad907c --- /dev/null +++ b/Documentation/trace/kprobetrace.txt @@ -0,0 +1,138 @@ + Kprobe-based Event Tracer + = + + Documentation is written by Masami Hiramatsu + + +Overview + +This tracer is similar to the events tracer which is based on Tracepoint +infrastructure. Instead of Tracepoint, this tracer is based on kprobes(kprobe +and kretprobe). It probes anywhere where kprobes can probe(this means, all +functions body except for __kprobes functions). + +Unlike the function tracer, this tracer can probe instructions inside of +kernel functions. It allows you to check which instruction has been executed. + +Unlike the Tracepoint based events tracer, this tracer can add and remove +probe points on the fly. + +Similar to the events tracer, this tracer doesn't need to be activated via +current_tracer, instead of that, just set probe points via +/sys/kernel/debug/tracing/kprobe_events. And you can set filters on each +probe events via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter. + + +Synopsis of kprobe_events +- + p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS]: Set a probe + r[:EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe + + EVENT : Event name. 
+ SYMBOL[+offs|-offs] : Symbol+offset where the probe is inserted. + MEMADDR : Address where the probe is inserted. + + FETCHARGS : Arguments. + %REG : Fetch register REG + sN : Fetch Nth entry of stack (N = 0) + @ADDR: Fetch memory at ADDR (ADDR should be in kernel) + @SYM[+|-offs]: Fetch memory at SYM +|- offs (SYM should be a data symbol) + aN : Fetch function argument. (N = 0)(*) + rv : Fetch return value.(**) + ra : Fetch return address.(**) + +|-offs(FETCHARG) : fetch memory at FETCHARG +|- offs address.(***) + + (*) aN may not correct on asmlinkaged functions and at the middle of + function body. + (**) only for return probe. + (***) this is useful for fetching a field of data structures. + + +Per-Probe Event Filtering +- + Per-probe event filtering feature allows you to set different filter on each +probe and gives you what arguments will be shown in trace buffer. If an event +name is specified right after 'p:' or 'r:' in kprobe_events, the tracer adds +an event under tracing/events/kprobes/EVENT, at the directory you can see +'id', 'enabled', 'format' and 'filter'. + +enabled: + You can enable/disable the probe by writing 1 or 0 on it. + +format: + It shows the format of this probe event. It also shows aliases of arguments + which you specified
[PATCH -tip -v12 03/11] kprobes: check probe address is instruction boundary on x86
Ensure safeness of inserting kprobes by checking whether the specified address
is at the first byte of an instruction on x86. This is done by decoding the
probed function from its head to the probe point.

Signed-off-by: Masami Hiramatsu <mhira...@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ana...@in.ibm.com>
Cc: Jim Keniston <jkeni...@us.ibm.com>
Cc: Ingo Molnar <mi...@elte.hu>
---
 arch/x86/kernel/kprobes.c |   69 +
 1 files changed, 69 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index b5b1848..5341842 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -48,6 +48,7 @@
 #include <linux/preempt.h>
 #include <linux/module.h>
 #include <linux/kdebug.h>
+#include <linux/kallsyms.h>
 
 #include <asm/cacheflush.h>
 #include <asm/desc.h>
@@ -55,6 +56,7 @@
 #include <asm/uaccess.h>
 #include <asm/alternative.h>
 #include <asm/debugreg.h>
+#include <asm/insn.h>
 
 void jprobe_return_end(void);
 
@@ -245,6 +247,71 @@ retry:
 	}
 }
 
+/* Recover the probed instruction at addr for further analysis. */
+static int recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr)
+{
+	struct kprobe *kp;
+
+	kp = get_kprobe((void *)addr);
+	if (!kp)
+		return -EINVAL;
+
+	/*
+	 * Basically, kp->ainsn.insn has an original instruction.
+	 * However, RIP-relative instruction can not do single-stepping
+	 * at different place, fix_riprel() tweaks the displacement of
+	 * that instruction. In that case, we can't recover the instruction
+	 * from the kp->ainsn.insn.
+	 *
+	 * On the other hand, kp->opcode has a copy of the first byte of
+	 * the probed instruction, which is overwritten by int3. And
+	 * the instruction at kp->addr is not modified by kprobes except
+	 * for the first byte, we can recover the original instruction
+	 * from it and kp->opcode.
+	 */
+	memcpy(buf, kp->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
+	buf[0] = kp->opcode;
+	return 0;
+}
+
+/* Dummy buffers for kallsyms_lookup */
+static char __dummy_buf[KSYM_NAME_LEN];
+
+/* Check if paddr is at an instruction boundary */
+static int __kprobes can_probe(unsigned long paddr)
+{
+	int ret;
+	unsigned long addr, offset = 0;
+	struct insn insn;
+	kprobe_opcode_t buf[MAX_INSN_SIZE];
+
+	if (!kallsyms_lookup(paddr, NULL, &offset, NULL, __dummy_buf))
+		return 0;
+
+	/* Decode instructions */
+	addr = paddr - offset;
+	while (addr < paddr) {
+		kernel_insn_init(&insn, (void *)addr);
+		insn_get_opcode(&insn);
+
+		/* Check if the instruction has been modified. */
+		if (OPCODE1(&insn) == BREAKPOINT_INSTRUCTION) {
+			ret = recover_probed_instruction(buf, addr);
+			if (ret)
+				/*
+				 * Another debugging subsystem might insert
+				 * this breakpoint. In that case, we can't
+				 * recover it.
+				 */
+				return 0;
+			kernel_insn_init(&insn, buf);
+		}
+		insn_get_length(&insn);
+		addr += insn.length;
+	}
+
+	return (addr == paddr);
+}
+
 /*
  * Returns non-zero if opcode modifies the interrupt flag.
  */
@@ -360,6 +427,8 @@ static void __kprobes arch_copy_kprobe(struct kprobe *p)
 
 int __kprobes arch_prepare_kprobe(struct kprobe *p)
 {
+	if (!can_probe((unsigned long)p->addr))
+		return -EILSEQ;
 	/* insn: must be on special executable page on x86. */
 	p->ainsn.insn = get_insn_slot();
 	if (!p->ainsn.insn)

-- 
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhira...@redhat.com
Re: [PATCH -tip -v12 01/11] x86: instruction decoder API
On 2009/07/16 12:19, H. Peter Anvin wrote:
>> These opcode maps do NOT include most of SSE and FP opcodes, because
>> those opcodes are not used in the kernel.
>
> That is not true.

Ah, these opcode maps do include some SSE/FP setup opcodes which are used
in the kernel. I found those opcodes while running the selftest of the
decoder, so I checked the asm() code and added them to the maps.

Thank you,

-- 
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhira...@redhat.com
Re: [PATCH -tip -v12 01/11] x86: instruction decoder API
Sam Ravnborg wrote:
>> diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
>> new file mode 100644
>> index 000..01e079a
>> --- /dev/null
>> +++ b/arch/x86/include/asm/inat.h
>> @@ -0,0 +1,125 @@
>> +#ifndef _ASM_INAT_INAT_H
>> +#define _ASM_INAT_INAT_H
>
> [With reference to comment on patch 2/12...]
> You create inat.h here. Could you investigate what is needed to factor
> out the stuff needed from userspace so we can avoid the ugly hack where
> you redefine types.h?

Sorry, I'm a bit confused. Do you mean that I should break down
user_include.h and add those redefined types to inat.h?

Maybe create an inat_types.h + inat.h as we do in other cases? And
inat_types.h has two parts, one for kernel and one for userspace (which
is moved from user_include.h) — is that right?

Thank you,

> Same for the other files that required the types.h hack.
>
> 	Sam

-- 
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhira...@redhat.com
Re: [PATCH -tip -v12 02/11] x86: x86 instruction decoder build-time selftest
Sam Ravnborg wrote:
> On Thu, Jul 16, 2009 at 11:57:06AM -0400, Masami Hiramatsu wrote:
>> Add a user-space selftest of the x86 instruction decoder at kernel
>> build time. When CONFIG_X86_DECODER_SELFTEST=y, Kbuild builds a test
>> harness of the x86 instruction decoder and runs it after building
>> vmlinux. The test compares the results of objdump and the x86
>> instruction decoder code and checks that there are no differences.
>
> Long overdue review from my side...
>
>>  arch/x86/scripts/Makefile       |   19 +++
>>  arch/x86/scripts/distill.awk    |   42 +
>>  arch/x86/scripts/test_get_len.c |   99 +++
>>  arch/x86/scripts/user_include.h |   49 +++
>
> Hmmm, we have two architectures that use scripts/ and three that use
> tools/. I prefer the latter name, as what we have here is beyond what I
> generally recognize as a script.
> We have scripts/ in the top level and we do not rename it as we have it
> hardcoded too many places - but no reason to use the wrong name here.
>
>> diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
>> index 01e079a..9090665 100644
>> --- a/arch/x86/include/asm/inat.h
>> +++ b/arch/x86/include/asm/inat.h
>> @@ -20,7 +20,9 @@
>>   * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
>>   *
>>   */
>> +#ifdef __KERNEL__
>>  #include <linux/types.h>
>> +#endif
>>
>>  /* Instruction attributes */
>>  typedef u32 insn_attr_t;
>
> Why this? If you need this to use this file from userspace then could we
> do some other trick to make this OK? I see it repeated several times
> below. [If this has already been discussed I have missed it - sorry].
>
>> diff --git a/arch/x86/scripts/Makefile b/arch/x86/scripts/Makefile
>> new file mode 100644
>> index 000..f08859e
>> --- /dev/null
>> +++ b/arch/x86/scripts/Makefile
>> @@ -0,0 +1,19 @@
>> +PHONY += posttest
>> +quiet_cmd_posttest = TEST    $@
>> +      cmd_posttest = objdump -d $(objtree)/vmlinux | awk -f $(srctree)/arch/x86/scripts/distill.awk | $(obj)/test_get_len
>> +
>
> You are using the native objdump here.
> But I assume this fails miserably when you build x86 on a powerpc host.
> In other words - you broke an allyesconfig build for -next...
> We have $(OBJDUMP) for this.

Ah, I see... Would you know the actual name of the x86 objdump on
powerpc (or any other cross-compiling host)? Is just setting
OBJDUMP=objdump OK? I'm not so sure about cross-compiling the kernel...

>> +posttest: $(obj)/test_get_len vmlinux
>> +	$(call cmd,posttest)
>> +
>> +test_get_len_SRC = $(srctree)/arch/x86/scripts/test_get_len.c $(srctree)/arch/x86/lib/insn.c $(srctree)/arch/x86/lib/inat.c
>> +test_get_len_INC = $(srctree)/arch/x86/include/asm/inat.h $(srctree)/arch/x86/include/asm/insn.h $(objtree)/arch/x86/lib/inat-tables.c
>> +
>> +quiet_cmd_test_get_len = CC      $@
>> +      cmd_test_get_len = $(CC) -Wall $(test_get_len_SRC) -I$(objtree)/arch/x86/lib/ -I$(srctree)/arch/x86/include -include $(srctree)/arch/x86/scripts/user_include.h -o $@
>
> Is there a specific reason why you cannot use the standard hostprogs-y
> for this? It will take care of dependency tracking etc.
> What you have above is a hopelessly incomplete list of dependencies.
> You need to use HOST_EXTRACFLAGS to set additional -I options and the
> -include.

Thank you, I'll try to use hostprogs-y.

>> +static void usage()
>> +{
>> +	fprintf(stderr, "usage: %s < distilled_disassembly\n", prog);
>> +	exit(1);
>> +}
>
> It would be nice to tell the user what the program is supposed to do.
> I know this is a bit unusual but no reason to copy bad practice.

Sure, maybe copying the usage line in distill.awk is more helpful for
the user...

Thank you,

-- 
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhira...@redhat.com
Re: [PATCH -tip -v12 02/11] x86: x86 instruction decoder build-time selftest
Masami Hiramatsu wrote:
>>> You are using the native objdump here.
>>> But I assume this fails miserably when you build x86 on a powerpc host.
>>> In other words - you broke an allyesconfig build for -next...
>>> We have $(OBJDUMP) for this.
>>
>> Ah, I see... Would you know the actual name of the x86 objdump on
>> powerpc (or any other cross-compiling host)? Is just setting
>> OBJDUMP=objdump OK? I'm not so sure about cross-compiling the kernel...

Oops, we already have it. Yes, I'll use $(OBJDUMP).

-- 
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhira...@redhat.com
Re: [PATCH -tip -v12 02/11] x86: x86 instruction decoder build-time selftest
Sam Ravnborg wrote:
>>>> +      cmd_posttest = objdump -d $(objtree)/vmlinux | awk -f $(srctree)/arch/x86/scripts/distill.awk | $(obj)/test_get_len
>>>> +
>>> You are using the native objdump here.
>>> But I assume this fails miserably when you build x86 on a powerpc host.
>>> In other words - you broke an allyesconfig build for -next...
>>> We have $(OBJDUMP) for this.
>>
>> Ah, I see... Would you know the actual name of the x86 objdump on
>> powerpc (or any other cross-compiling host)? Is just setting
>> OBJDUMP=objdump OK? I'm not so sure about cross-compiling the kernel...
>
> Replacing objdump with $(OBJDUMP) will do the trick.
> We set OBJDUMP to the correct value in the top-level makefile.
>
> Are there any parts of your user-space program that rely on the host
> being little-endian? If so, it would fail on a PowerPC target despite
> using the correct objdump.

Hmm, as far as I can see, the result of the get_next() macro with
multi-byte types (s16, s32, ...) might be affected. But it doesn't
affect the get_insn_len test because those values are ignored.

Thank you,

-- 
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhira...@redhat.com
Re: [PATCH -tip -v12 01/11] x86: instruction decoder API
Sam Ravnborg wrote:
> On Thu, Jul 16, 2009 at 01:28:54PM -0400, Masami Hiramatsu wrote:
>> Sam Ravnborg wrote:
>>>> diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
>>>> new file mode 100644
>>>> index 000..01e079a
>>>> --- /dev/null
>>>> +++ b/arch/x86/include/asm/inat.h
>>>> @@ -0,0 +1,125 @@
>>>> +#ifndef _ASM_INAT_INAT_H
>>>> +#define _ASM_INAT_INAT_H
>>>
>>> [With reference to comment on patch 2/12...]
>>> You create inat.h here. Could you investigate what is needed to factor
>>> out the stuff needed from userspace so we can avoid the ugly hack
>>> where you redefine types.h?
>>
>> Sorry, I'm a bit confused. Do you mean that I should break down
>> user_include.h and add those redefined types to inat.h?
>
> No - try to factor out what is needed for your program so you can avoid
> user_include.h entirely.
>
>> Maybe create an inat_types.h + inat.h as we do in other cases? And
>> inat_types.h has two parts, one for kernel and one for userspace (which
>> is moved from user_include.h) — is that right?
>
> More like inat_types.h includes pure definitions and inat.h defines all
> the macros (that would be much nicer if expressed as static inlines).

OK, some macros still need to be macros, because they will be used for
defining static tables.

> The real thing to consider is what is needed from your userspace program
> and is also required by the kernel. I did not even remotely try to find
> out - as I guess you know it.
> So try to isolate these bits somehow and you have then nicely dropped a
> lot of dependencies on the remaining headers and can thus hopefully get
> rid of the ugly user_include.h hack.

OK, I'll try to remove the user_include.h hack.

Thank you so much!

-- 
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhira...@redhat.com
Re: [PATCH -tip -v11 11/11] tracing: Add kprobes event profiling interface
Hi,

Li Zefan wrote:
>> +Event Profiling
>> +---------------
>> + You can check the total number of probe hits and probe miss-hits via
>> +/sys/kernel/debug/tracing/kprobe_profile.
>> + The fist column is event name, the second is the number of probe hits,
>
> s/fist/first

Oops, fixed.

>> +the third is the number of probe miss-hits.
>> +
>
> ...
>
>> +/* Probes profiling interfaces */
>> +static int profile_seq_show(struct seq_file *m, void *v)
>> +{
>> +	struct trace_probe *tp = v;
>> +
>> +	if (tp == NULL)
>> +		return 0;
>> +
>
> tp will never be NULL, which is guaranteed by seq_file

OK, fixed.

>> +	seq_printf(m, "%s", tp->call.name);
>> +
>> +	seq_printf(m, "\t%8lu %8lu\n", tp->nhits,
>> +		   probe_is_return(tp) ? tp->rp.kp.nmissed : tp->kp.nmissed);
>> +
>> +	return 0;
>> +}

Thank you for the review!

-- 
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhira...@redhat.com
Re: [PATCH -tip -v11 08/11] tracing: add kprobe-based event tracer
Thank you for reviewing my patch!

-- 
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhira...@redhat.com
[PATCH -tip -v11 03/11] kprobes: check probe address is instruction boundary on x86
Ensure safeness of inserting kprobes by checking whether the specified address
is at the first byte of an instruction on x86. This is done by decoding the
probed function from its head to the probe point.

Signed-off-by: Masami Hiramatsu <mhira...@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ana...@in.ibm.com>
Cc: Jim Keniston <jkeni...@us.ibm.com>
Cc: Ingo Molnar <mi...@elte.hu>
---
 arch/x86/kernel/kprobes.c |   69 +
 1 files changed, 69 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index b5b1848..5341842 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -48,6 +48,7 @@
 #include <linux/preempt.h>
 #include <linux/module.h>
 #include <linux/kdebug.h>
+#include <linux/kallsyms.h>
 
 #include <asm/cacheflush.h>
 #include <asm/desc.h>
@@ -55,6 +56,7 @@
 #include <asm/uaccess.h>
 #include <asm/alternative.h>
 #include <asm/debugreg.h>
+#include <asm/insn.h>
 
 void jprobe_return_end(void);
 
@@ -245,6 +247,71 @@ retry:
 	}
 }
 
+/* Recover the probed instruction at addr for further analysis. */
+static int recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr)
+{
+	struct kprobe *kp;
+
+	kp = get_kprobe((void *)addr);
+	if (!kp)
+		return -EINVAL;
+
+	/*
+	 * Basically, kp->ainsn.insn has an original instruction.
+	 * However, RIP-relative instruction can not do single-stepping
+	 * at different place, fix_riprel() tweaks the displacement of
+	 * that instruction. In that case, we can't recover the instruction
+	 * from the kp->ainsn.insn.
+	 *
+	 * On the other hand, kp->opcode has a copy of the first byte of
+	 * the probed instruction, which is overwritten by int3. And
+	 * the instruction at kp->addr is not modified by kprobes except
+	 * for the first byte, we can recover the original instruction
+	 * from it and kp->opcode.
+	 */
+	memcpy(buf, kp->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
+	buf[0] = kp->opcode;
+	return 0;
+}
+
+/* Dummy buffers for kallsyms_lookup */
+static char __dummy_buf[KSYM_NAME_LEN];
+
+/* Check if paddr is at an instruction boundary */
+static int __kprobes can_probe(unsigned long paddr)
+{
+	int ret;
+	unsigned long addr, offset = 0;
+	struct insn insn;
+	kprobe_opcode_t buf[MAX_INSN_SIZE];
+
+	if (!kallsyms_lookup(paddr, NULL, &offset, NULL, __dummy_buf))
+		return 0;
+
+	/* Decode instructions */
+	addr = paddr - offset;
+	while (addr < paddr) {
+		kernel_insn_init(&insn, (void *)addr);
+		insn_get_opcode(&insn);
+
+		/* Check if the instruction has been modified. */
+		if (OPCODE1(&insn) == BREAKPOINT_INSTRUCTION) {
+			ret = recover_probed_instruction(buf, addr);
+			if (ret)
+				/*
+				 * Another debugging subsystem might insert
+				 * this breakpoint. In that case, we can't
+				 * recover it.
+				 */
+				return 0;
+			kernel_insn_init(&insn, buf);
+		}
+		insn_get_length(&insn);
+		addr += insn.length;
+	}
+
+	return (addr == paddr);
+}
+
 /*
  * Returns non-zero if opcode modifies the interrupt flag.
  */
@@ -360,6 +427,8 @@ static void __kprobes arch_copy_kprobe(struct kprobe *p)
 
 int __kprobes arch_prepare_kprobe(struct kprobe *p)
 {
+	if (!can_probe((unsigned long)p->addr))
+		return -EILSEQ;
	/* insn: must be on special executable page on x86. */
 	p->ainsn.insn = get_insn_slot();
 	if (!p->ainsn.insn)

-- 
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhira...@redhat.com
[PATCH -tip -v11 06/11] tracing: ftrace dynamic ftrace_event_call support
Add dynamic ftrace_event_call support to ftrace. Trace engines can adds new ftrace_event_call to ftrace on the fly. Each operator functions of the call takes a ftrace_event_call data structure as an argument, because these functions may be shared among several ftrace_event_calls. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Acked-by: Frederic Weisbecker fweis...@gmail.com Cc: Steven Rostedt rost...@goodmis.org Cc: Ingo Molnar mi...@elte.hu Cc: Tom Zanussi tzanu...@gmail.com --- include/linux/ftrace_event.h | 13 +--- include/trace/ftrace.h | 22 +++-- kernel/trace/trace_events.c | 70 -- kernel/trace/trace_export.c | 27 4 files changed, 85 insertions(+), 47 deletions(-) diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h index 5c093ff..f7733b6 100644 --- a/include/linux/ftrace_event.h +++ b/include/linux/ftrace_event.h @@ -108,12 +108,13 @@ struct ftrace_event_call { struct dentry *dir; struct trace_event *event; int enabled; - int (*regfunc)(void); - void(*unregfunc)(void); + int (*regfunc)(struct ftrace_event_call *); + void(*unregfunc)(struct ftrace_event_call *); int id; - int (*raw_init)(void); - int (*show_format)(struct trace_seq *s); - int (*define_fields)(void); + int (*raw_init)(struct ftrace_event_call *); + int (*show_format)(struct ftrace_event_call *, + struct trace_seq *); + int (*define_fields)(struct ftrace_event_call *); struct list_headfields; int filter_active; void*filter; @@ -138,6 +139,8 @@ extern int filter_current_check_discard(struct ftrace_event_call *call, extern int trace_define_field(struct ftrace_event_call *call, char *type, char *name, int offset, int size, int is_signed); +extern int trace_add_event_call(struct ftrace_event_call *call); +extern void trace_remove_event_call(struct ftrace_event_call *call); #define is_signed_type(type) (((type)(-1)) 0) diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h index 1867553..d696580 100644 --- a/include/trace/ftrace.h +++ b/include/trace/ftrace.h @@ -147,7 
+147,8 @@ #undef TRACE_EVENT #define TRACE_EVENT(call, proto, args, tstruct, func, print) \ static int \ -ftrace_format_##call(struct trace_seq *s) \ +ftrace_format_##call(struct ftrace_event_call *event_call, \ +struct trace_seq *s) \ { \ struct ftrace_raw_##call field __attribute__((unused)); \ int ret = 0;\ @@ -289,10 +290,9 @@ ftrace_raw_output_##call(struct trace_iterator *iter, int flags) \ #undef TRACE_EVENT #define TRACE_EVENT(call, proto, args, tstruct, func, print) \ int\ -ftrace_define_fields_##call(void) \ +ftrace_define_fields_##call(struct ftrace_event_call *event_call) \ { \ struct ftrace_raw_##call field; \ - struct ftrace_event_call *event_call = event_##call; \ int ret;\ \ __common_field(int, type, 1); \ @@ -355,7 +355,7 @@ static inline int ftrace_get_offsets_##call( \ * event_trace_printk(_RET_IP_, call: fmt); * } * - * static int ftrace_reg_event_call(void) + * static int ftrace_reg_event_call(struct ftrace_event_call *unused) * { * int ret; * @@ -366,7 +366,7 @@ static inline int ftrace_get_offsets_##call( \ * return ret; * } * - * static void ftrace_unreg_event_call(void) + * static void ftrace_unreg_event_call(struct ftrace_event_call *unused) * { * unregister_trace_call(ftrace_event_call); * } @@ -399,7 +399,7 @@ static inline int ftrace_get_offsets_##call( \ * trace_current_buffer_unlock_commit(event, irq_flags, pc); * } * - * static int ftrace_raw_reg_event_call(void) + * static int ftrace_raw_reg_event_call(struct ftrace_event_call *unused) * { * int
[PATCH -tip -v11 04/11] kprobes: cleanup fix_riprel() using insn decoder on x86
Cleanup fix_riprel() in arch/x86/kernel/kprobes.c by using x86 instruction decoder. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Jim Keniston jkeni...@us.ibm.com Cc: Ingo Molnar mi...@elte.hu --- arch/x86/kernel/kprobes.c | 128 - 1 files changed, 23 insertions(+), 105 deletions(-) diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c index 5341842..b77e050 100644 --- a/arch/x86/kernel/kprobes.c +++ b/arch/x86/kernel/kprobes.c @@ -109,50 +109,6 @@ static const u32 twobyte_is_boostable[256 / 32] = { /* --- */ /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ }; -static const u32 onebyte_has_modrm[256 / 32] = { - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ - /* --- */ - W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 00 */ - W(0x10, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 10 */ - W(0x20, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 20 */ - W(0x30, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 30 */ - W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 40 */ - W(0x50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 50 */ - W(0x60, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0) | /* 60 */ - W(0x70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 70 */ - W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */ - W(0x90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 90 */ - W(0xa0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* a0 */ - W(0xb0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* b0 */ - W(0xc0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* c0 */ - W(0xd0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */ - W(0xe0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* e0 */ - W(0xf0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1) /* f0 */ - /* --- */ - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ -}; -static const u32 twobyte_has_modrm[256 / 32] = { - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ - /* --- */ - W(0x00, 
1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1) | /* 0f */ - W(0x10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0) , /* 1f */ - W(0x20, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1) | /* 2f */ - W(0x30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 3f */ - W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 4f */ - W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 5f */ - W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 6f */ - W(0x70, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1) , /* 7f */ - W(0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 8f */ - W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 9f */ - W(0xa0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1) | /* af */ - W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1) , /* bf */ - W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* cf */ - W(0xd0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* df */ - W(0xe0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* ef */ - W(0xf0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0) /* ff */ - /* --- */ - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ -}; #undef W struct kretprobe_blackpoint kretprobe_blacklist[] = { @@ -345,68 +301,30 @@ static int __kprobes is_IF_modifier(kprobe_opcode_t *insn) static void __kprobes fix_riprel(struct kprobe *p) { #ifdef CONFIG_X86_64 - u8 *insn = p-ainsn.insn; - s64 disp; - int need_modrm; - - /* Skip legacy instruction prefixes. */ - while (1) { - switch (*insn) { - case 0x66: - case 0x67: - case 0x2e: - case 0x3e: - case 0x26: - case 0x64: - case 0x65: - case 0x36: - case 0xf0: - case 0xf3: - case 0xf2: - ++insn; - continue; - } - break; - } + struct insn insn; + kernel_insn_init(insn, p-ainsn.insn); - /* Skip REX instruction prefix. */ - if (is_REX_prefix(insn)) - ++insn; - - if (*insn == 0x0f) { - /* Two-byte opcode. */ - ++insn
[PATCH -tip -v11 01/11] x86: instruction decoder API
Add x86 instruction decoder to arch-specific libraries. This decoder can decode x86 instructions used in kernel into prefix, opcode, modrm, sib, displacement and immediates. This can also show the length of instructions. This version introduces instruction attributes for decoding instructions. The instruction attribute tables are generated from the opcode map file (x86-opcode-map.txt) by the generator script(gen-insn-attr-x86.awk). Currently, the opcode maps are based on opcode maps in Intel(R) 64 and IA-32 Architectures Software Developers Manual Vol.2: Appendix.A, and consist of below two types of opcode tables. 1-byte/2-bytes/3-bytes opcodes, which has 256 elements, are written as below; Table: table-name Referrer: escaped-name opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...] (or) opcode: escape # escaped-name EndTable Group opcodes, which has 8 elements, are written as below; GrpTable: GrpXXX reg: mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...] EndTable These opcode maps do NOT include most of SSE and FP opcodes, because those opcodes are not used in the kernel. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Signed-off-by: Jim Keniston jkeni...@us.ibm.com Acked-by: H. 
Peter Anvin h...@zytor.com Cc: Steven Rostedt rost...@goodmis.org Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Andi Kleen a...@linux.intel.com Cc: Vegard Nossum vegard.nos...@gmail.com Cc: Avi Kivity a...@redhat.com Cc: Przemysław Pawełczyk przemys...@pawelczyk.it --- arch/x86/include/asm/inat.h| 125 ++ arch/x86/include/asm/insn.h| 134 ++ arch/x86/lib/Makefile | 13 + arch/x86/lib/inat.c| 80 arch/x86/lib/insn.c| 471 + arch/x86/lib/x86-opcode-map.txt| 711 arch/x86/scripts/gen-insn-attr-x86.awk | 314 ++ 7 files changed, 1848 insertions(+), 0 deletions(-) create mode 100644 arch/x86/include/asm/inat.h create mode 100644 arch/x86/include/asm/insn.h create mode 100644 arch/x86/lib/inat.c create mode 100644 arch/x86/lib/insn.c create mode 100644 arch/x86/lib/x86-opcode-map.txt create mode 100644 arch/x86/scripts/gen-insn-attr-x86.awk diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h new file mode 100644 index 000..01e079a --- /dev/null +++ b/arch/x86/include/asm/inat.h @@ -0,0 +1,125 @@ +#ifndef _ASM_INAT_INAT_H +#define _ASM_INAT_INAT_H +/* + * x86 instruction attributes + * + * Written by Masami Hiramatsu mhira...@redhat.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + */ +#include linux/types.h + +/* Instruction attributes */ +typedef u32 insn_attr_t; + +/* + * Internal bits. Don't use bitmasks directly, because these bits are + * unstable. You should add checking macros and use that macro in + * your code. + */ + +#define INAT_OPCODE_TABLE_SIZE 256 +#define INAT_GROUP_TABLE_SIZE 8 + +/* Legacy instruction prefixes */ +#define INAT_PFX_OPNDSZ1 /* 0x66 */ /* LPFX1 */ +#define INAT_PFX_REPNE 2 /* 0xF2 */ /* LPFX2 */ +#define INAT_PFX_REPE 3 /* 0xF3 */ /* LPFX3 */ +#define INAT_PFX_LOCK 4 /* 0xF0 */ +#define INAT_PFX_CS5 /* 0x2E */ +#define INAT_PFX_DS6 /* 0x3E */ +#define INAT_PFX_ES7 /* 0x26 */ +#define INAT_PFX_FS8 /* 0x64 */ +#define INAT_PFX_GS9 /* 0x65 */ +#define INAT_PFX_SS10 /* 0x36 */ +#define INAT_PFX_ADDRSZ11 /* 0x67 */ + +#define INAT_LPREFIX_MAX 3 + +/* Immediate size */ +#define INAT_IMM_BYTE 1 +#define INAT_IMM_WORD 2 +#define INAT_IMM_DWORD 3 +#define INAT_IMM_QWORD 4 +#define INAT_IMM_PTR 5 +#define INAT_IMM_VWORD32 6 +#define INAT_IMM_VWORD 7 + +/* Legacy prefix */ +#define INAT_PFX_OFFS 0 +#define INAT_PFX_BITS 4 +#define INAT_PFX_MAX((1 INAT_PFX_BITS) - 1) +#define INAT_PFX_MASK (INAT_PFX_MAX INAT_PFX_OFFS) +/* Escape opcodes */ +#define INAT_ESC_OFFS (INAT_PFX_OFFS + INAT_PFX_BITS) +#define INAT_ESC_BITS 2 +#define INAT_ESC_MAX
[PATCH -tip -v11 05/11] x86: add pt_regs register and stack access APIs
Add following APIs for accessing registers and stack entries from pt_regs. These APIs are required by kprobes-based event tracer on ftrace. Some other debugging tools might be able to use it too. - regs_query_register_offset(const char *name) Query the offset of name register. - regs_query_register_name(unsigned int offset) Query the name of register by its offset. - regs_get_register(struct pt_regs *regs, unsigned int offset) Get the value of a register by its offset. - regs_within_kernel_stack(struct pt_regs *regs, unsigned long addr) Check the address is in the kernel stack. - regs_get_kernel_stack_nth(struct pt_regs *reg, unsigned int nth) Get Nth entry of the kernel stack. (N = 0) - regs_get_argument_nth(struct pt_regs *reg, unsigned int nth) Get Nth argument at function call. (N = 0) Changes from v10: - Use an offsetof table in regs_get_argument_nth(). - Use unsigned int instead of unsigned. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Reviewed-by: Frederic Weisbecker fweis...@gmail.com Cc: Andi Kleen a...@firstfloor.org Cc: Christoph Hellwig h...@infradead.org Cc: Steven Rostedt rost...@goodmis.org Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Roland McGrath rol...@redhat.com Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: linux-a...@vger.kernel.org --- arch/x86/include/asm/ptrace.h | 62 +++ arch/x86/kernel/ptrace.c | 112 + 2 files changed, 174 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h index 0f0d908..a3d49dd 100644 --- a/arch/x86/include/asm/ptrace.h +++ b/arch/x86/include/asm/ptrace.h @@ -7,6 +7,7 @@ #ifdef __KERNEL__ #include asm/segment.h +#include asm/page_types.h #endif #ifndef __ASSEMBLY__ @@ -216,6 +217,67 @@ static inline unsigned long user_stack_pointer(struct pt_regs *regs) return regs-sp; } +/* Query offset/name of register from its name/offset */ +extern int regs_query_register_offset(const char 
*name); +extern const char *regs_query_register_name(unsigned int offset); +#define MAX_REG_OFFSET (offsetof(struct pt_regs, ss)) + +/** + * regs_get_register() - get register value from its offset + * @regs: pt_regs from which register value is gotten. + * @offset:offset number of the register. + * + * regs_get_register returns the value of a register whose offset from @regs + * is @offset. The @offset is the offset of the register in struct pt_regs. + * If @offset is bigger than MAX_REG_OFFSET, this returns 0. + */ +static inline unsigned long regs_get_register(struct pt_regs *regs, + unsigned int offset) +{ + if (unlikely(offset MAX_REG_OFFSET)) + return 0; + return *(unsigned long *)((unsigned long)regs + offset); +} + +/** + * regs_within_kernel_stack() - check the address in the stack + * @regs: pt_regs which contains kernel stack pointer. + * @addr: address which is checked. + * + * regs_within_kenel_stack() checks @addr is within the kernel stack page(s). + * If @addr is within the kernel stack, it returns true. If not, returns false. + */ +static inline int regs_within_kernel_stack(struct pt_regs *regs, + unsigned long addr) +{ + return ((addr ~(THREAD_SIZE - 1)) == + (kernel_stack_pointer(regs) ~(THREAD_SIZE - 1))); +} + +/** + * regs_get_kernel_stack_nth() - get Nth entry of the stack + * @regs: pt_regs which contains kernel stack pointer. + * @n: stack entry number. + * + * regs_get_kernel_stack_nth() returns @n th entry of the kernel stack which + * is specifined by @regs. If the @n th entry is NOT in the kernel stack, + * this returns 0. 
+ */ +static inline unsigned long regs_get_kernel_stack_nth(struct pt_regs *regs, + unsigned int n) +{ + unsigned long *addr = (unsigned long *)kernel_stack_pointer(regs); + addr += n; + if (regs_within_kernel_stack(regs, (unsigned long)addr)) + return *addr; + else + return 0; +} + +/* Get Nth argument at function call */ +extern unsigned long regs_get_argument_nth(struct pt_regs *regs, + unsigned int n); + /* * These are defined as per linux/ptrace.h, which see. */ diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c index cabdabc..32729ec 100644 --- a/arch/x86/kernel/ptrace.c +++ b/arch/x86/kernel/ptrace.c @@ -49,6 +49,118 @@ enum x86_regset { REGSET_IOPERM32, }; +struct pt_regs_offset { + const char *name; + int offset; +}; + +#define REG_OFFSET_NAME(r) {.name = #r, .offset = offsetof(struct pt_regs, r)} +#define REG_OFFSET_END {.name = NULL, .offset = 0} + +static const struct pt_regs_offset regoffset_table[] = { +#ifdef
[PATCH -tip -v11 07/11] tracing: Introduce TRACE_FIELD_ZERO() macro
Use TRACE_FIELD_ZERO(type, item) instead of TRACE_FIELD_ZERO_CHAR(item). This also includes a fix of TRACE_ZERO_CHAR() macro. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Steven Rostedt rost...@goodmis.org Cc: Ingo Molnar mi...@elte.hu Cc: Tom Zanussi tzanu...@gmail.com Cc: Frederic Weisbecker fweis...@gmail.com --- kernel/trace/trace_event_types.h |4 ++-- kernel/trace/trace_export.c | 16 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/kernel/trace/trace_event_types.h b/kernel/trace/trace_event_types.h index 6db005e..e74f090 100644 --- a/kernel/trace/trace_event_types.h +++ b/kernel/trace/trace_event_types.h @@ -109,7 +109,7 @@ TRACE_EVENT_FORMAT(bprint, TRACE_BPRINT, bprint_entry, ignore, TRACE_STRUCT( TRACE_FIELD(unsigned long, ip, ip) TRACE_FIELD(char *, fmt, fmt) - TRACE_FIELD_ZERO_CHAR(buf) + TRACE_FIELD_ZERO(char, buf) ), TP_RAW_FMT(%08lx (%d) fmt:%p %s) ); @@ -117,7 +117,7 @@ TRACE_EVENT_FORMAT(bprint, TRACE_BPRINT, bprint_entry, ignore, TRACE_EVENT_FORMAT(print, TRACE_PRINT, print_entry, ignore, TRACE_STRUCT( TRACE_FIELD(unsigned long, ip, ip) - TRACE_FIELD_ZERO_CHAR(buf) + TRACE_FIELD_ZERO(char, buf) ), TP_RAW_FMT(%08lx (%d) fmt:%p %s) ); diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c index 7cee79d..23125b5 100644 --- a/kernel/trace/trace_export.c +++ b/kernel/trace/trace_export.c @@ -42,9 +42,9 @@ extern void __bad_type_size(void); if (!ret) \ return 0; -#undef TRACE_FIELD_ZERO_CHAR -#define TRACE_FIELD_ZERO_CHAR(item)\ - ret = trace_seq_printf(s, \tfield:char #item ;\t \ +#undef TRACE_FIELD_ZERO +#define TRACE_FIELD_ZERO(type, item) \ + ret = trace_seq_printf(s, \tfield: #type #item ;\t \ offset:%u;\tsize:0;\n, \ (unsigned int)offsetof(typeof(field), item)); \ if (!ret) \ @@ -90,9 +90,6 @@ ftrace_format_##call(struct ftrace_event_call *dummy, struct trace_seq *s)\ #include trace_event_types.h -#undef TRACE_ZERO_CHAR -#define TRACE_ZERO_CHAR(arg) - #undef TRACE_FIELD #define TRACE_FIELD(type, item, 
assign)\ entry-item = assign; @@ -105,6 +102,9 @@ ftrace_format_##call(struct ftrace_event_call *dummy, struct trace_seq *s)\ #define TRACE_FIELD_SIGN(type, item, assign, is_signed)\ TRACE_FIELD(type, item, assign) +#undef TRACE_FIELD_ZERO +#define TRACE_FIELD_ZERO(type, item) + #undef TP_CMD #define TP_CMD(cmd...) cmd @@ -176,8 +176,8 @@ __attribute__((section(_ftrace_events))) event_##call = { \ if (ret)\ return ret; -#undef TRACE_FIELD_ZERO_CHAR -#define TRACE_FIELD_ZERO_CHAR(item) +#undef TRACE_FIELD_ZERO +#define TRACE_FIELD_ZERO(type, item) #undef TRACE_EVENT_FORMAT #define TRACE_EVENT_FORMAT(call, proto, args, fmt, tstruct, tpfmt) \ -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH -tip -v11 02/11] x86: x86 instruction decoder build-time selftest
Add a user-space selftest of x86 instruction decoder at kernel build time. When CONFIG_X86_DECODER_SELFTEST=y, Kbuild builds a test harness of x86 instruction decoder and performs it after building vmlinux. The test compares the results of objdump and x86 instruction decoder code and check there are no differences. Changes from v10: - Use unsigned int instead of unsigned. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Signed-off-by: Jim Keniston jkeni...@us.ibm.com Cc: H. Peter Anvin h...@zytor.com Cc: Steven Rostedt rost...@goodmis.org Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Andi Kleen a...@linux.intel.com Cc: Vegard Nossum vegard.nos...@gmail.com Cc: Avi Kivity a...@redhat.com Cc: Przemysław Pawełczyk przemys...@pawelczyk.it Cc: Sam Ravnborg s...@ravnborg.org --- arch/x86/Kconfig.debug |9 arch/x86/Makefile |3 + arch/x86/include/asm/inat.h |2 + arch/x86/include/asm/insn.h |2 + arch/x86/lib/inat.c |2 + arch/x86/lib/insn.c |2 + arch/x86/scripts/Makefile | 19 +++ arch/x86/scripts/distill.awk| 42 + arch/x86/scripts/test_get_len.c | 99 +++ arch/x86/scripts/user_include.h | 49 +++ 10 files changed, 229 insertions(+), 0 deletions(-) create mode 100644 arch/x86/scripts/Makefile create mode 100644 arch/x86/scripts/distill.awk create mode 100644 arch/x86/scripts/test_get_len.c create mode 100644 arch/x86/scripts/user_include.h diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug index d105f29..7d0b681 100644 --- a/arch/x86/Kconfig.debug +++ b/arch/x86/Kconfig.debug @@ -186,6 +186,15 @@ config X86_DS_SELFTEST config HAVE_MMIOTRACE_SUPPORT def_bool y +config X86_DECODER_SELFTEST + bool x86 instruction decoder selftest + depends on DEBUG_KERNEL + ---help--- +Perform x86 instruction decoder selftests at build time. +This option is useful for checking the sanity of x86 instruction +decoder code. +If unsure, say N. 
+ # # IO delay types: # diff --git a/arch/x86/Makefile b/arch/x86/Makefile index 1b68659..7046556 100644 --- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -154,6 +154,9 @@ all: bzImage KBUILD_IMAGE := $(boot)/bzImage bzImage: vmlinux +ifeq ($(CONFIG_X86_DECODER_SELFTEST),y) + $(Q)$(MAKE) $(build)=arch/x86/scripts posttest +endif $(Q)$(MAKE) $(build)=$(boot) $(KBUILD_IMAGE) $(Q)mkdir -p $(objtree)/arch/$(UTS_MACHINE)/boot $(Q)ln -fsn ../../x86/boot/bzImage $(objtree)/arch/$(UTS_MACHINE)/boot/$@ diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h index 01e079a..9090665 100644 --- a/arch/x86/include/asm/inat.h +++ b/arch/x86/include/asm/inat.h @@ -20,7 +20,9 @@ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. * */ +#ifdef __KERNEL__ #include linux/types.h +#endif /* Instruction attributes */ typedef u32 insn_attr_t; diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h index 5b50fa3..5736404 100644 --- a/arch/x86/include/asm/insn.h +++ b/arch/x86/include/asm/insn.h @@ -20,7 +20,9 @@ * Copyright (C) IBM Corporation, 2009 */ +#ifdef __KERNEL__ #include linux/types.h +#endif /* insn_attr_t is defined in inat.h */ #include asm/inat.h diff --git a/arch/x86/lib/inat.c b/arch/x86/lib/inat.c index d6a34be..564ecbd 100644 --- a/arch/x86/lib/inat.c +++ b/arch/x86/lib/inat.c @@ -18,7 +18,9 @@ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
* */ +#ifdef __KERNEL__ #include linux/module.h +#endif #include asm/insn.h /* Attribute tables are generated from opcode map */ diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c index 254c848..3b9451a 100644 --- a/arch/x86/lib/insn.c +++ b/arch/x86/lib/insn.c @@ -18,8 +18,10 @@ * Copyright (C) IBM Corporation, 2002, 2004, 2009 */ +#ifdef __KERNEL__ #include linux/string.h #include linux/module.h +#endif #include asm/inat.h #include asm/insn.h diff --git a/arch/x86/scripts/Makefile b/arch/x86/scripts/Makefile new file mode 100644 index 000..f08859e --- /dev/null +++ b/arch/x86/scripts/Makefile @@ -0,0 +1,19 @@ +PHONY += posttest +quiet_cmd_posttest = TEST$@ + cmd_posttest = objdump -d $(objtree)/vmlinux | awk -f $(srctree)/arch/x86/scripts/distill.awk | $(obj)/test_get_len + +posttest: $(obj)/test_get_len vmlinux + $(call cmd,posttest) + +test_get_len_SRC = $(srctree)/arch/x86/scripts/test_get_len.c $(srctree)/arch/x86/lib/insn.c $(srctree)/arch/x86/lib/inat.c +test_get_len_INC = $(srctree)/arch/x86/include/asm/inat.h $(srctree)/arch/x86/include/asm/insn.h $(objtree)/arch/x86/lib/inat-tables.c + +quiet_cmd_test_get_len = CC $@ + cmd_test_get_len = $(CC) -Wall $(test_get_len_SRC) -I
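The `posttest` rule above boils down to a three-stage pipe. Run by hand from a built x86 kernel tree it looks like the sketch below, guarded so it merely prints the pipeline when no built tree is present (paths assume `CONFIG_X86_DECODER_SELFTEST=y` was used for the build):

```shell
run_posttest() {
	# objdump disassembles vmlinux, distill.awk normalizes the listing,
	# and test_get_len re-decodes each instruction and compares lengths.
	if [ -f vmlinux ] && [ -x arch/x86/scripts/test_get_len ]; then
		objdump -d vmlinux \
			| awk -f arch/x86/scripts/distill.awk \
			| arch/x86/scripts/test_get_len
	else
		echo "not in a built kernel tree; the posttest rule runs:"
		echo "objdump -d vmlinux | awk -f distill.awk | test_get_len"
	fi
}
run_posttest
```

Comparing against objdump gives the decoder an independent oracle over every instruction actually emitted into the kernel image, which is a much larger corpus than any hand-written test set.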
[PATCH -tip -v11 10/11] tracing: Generate names for each kprobe event automatically
Generate names for each kprobe event based on the probe point, and remove generic k*probe event types because there is no user of those types. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Christoph Hellwig h...@infradead.org Cc: Steven Rostedt rost...@goodmis.org Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Tom Zanussi tzanu...@gmail.com --- Documentation/trace/kprobetrace.txt |3 +- kernel/trace/trace_event_types.h| 18 -- kernel/trace/trace_kprobe.c | 62 +++ 3 files changed, 35 insertions(+), 48 deletions(-) diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt index b29a54b..437ad49 100644 --- a/Documentation/trace/kprobetrace.txt +++ b/Documentation/trace/kprobetrace.txt @@ -28,7 +28,8 @@ Synopsis of kprobe_events p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS]: Set a probe r[:EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe - EVENT : Event name. + EVENT : Event name. If omitted, the event name is generated + based on SYMBOL+offs or MEMADDR. SYMBOL[+offs|-offs] : Symbol+offset where the probe is inserted. MEMADDR : Address where the probe is inserted. diff --git a/kernel/trace/trace_event_types.h b/kernel/trace/trace_event_types.h index 186b598..e74f090 100644 --- a/kernel/trace/trace_event_types.h +++ b/kernel/trace/trace_event_types.h @@ -175,22 +175,4 @@ TRACE_EVENT_FORMAT(kmem_free, TRACE_KMEM_FREE, kmemtrace_free_entry, ignore, TP_RAW_FMT(type:%u call_site:%lx ptr:%p) ); -TRACE_EVENT_FORMAT(kprobe, TRACE_KPROBE, kprobe_trace_entry, ignore, - TRACE_STRUCT( - TRACE_FIELD(unsigned long, ip, ip) - TRACE_FIELD(int, nargs, nargs) - TRACE_FIELD_ZERO(unsigned long, args) - ), - TP_RAW_FMT(%08lx: args:0x%lx ...) 
-); - -TRACE_EVENT_FORMAT(kretprobe, TRACE_KRETPROBE, kretprobe_trace_entry, ignore, - TRACE_STRUCT( - TRACE_FIELD(unsigned long, func, func) - TRACE_FIELD(unsigned long, ret_ip, ret_ip) - TRACE_FIELD(int, nargs, nargs) - TRACE_FIELD_ZERO(unsigned long, args) - ), - TP_RAW_FMT(%08lx - %08lx: args:0x%lx ...) -); #undef TRACE_SYSTEM diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index 8754c7e..9c6ffcc 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -34,6 +34,7 @@ #define MAX_TRACE_ARGS 128 #define MAX_ARGSTR_LEN 63 +#define MAX_EVENT_NAME_LEN 64 /* currently, trace_kprobe only supports X86. */ @@ -265,11 +266,11 @@ static struct trace_probe *alloc_trace_probe(const char *symbol, if (!tp-symbol) goto error; } - if (event) { - tp-call.name = kstrdup(event, GFP_KERNEL); - if (!tp-call.name) - goto error; - } + if (!event) + goto error; + tp-call.name = kstrdup(event, GFP_KERNEL); + if (!tp-call.name) + goto error; INIT_LIST_HEAD(tp-list); return tp; @@ -297,7 +298,7 @@ static struct trace_probe *find_probe_event(const char *event) { struct trace_probe *tp; list_for_each_entry(tp, probe_list, list) - if (tp-call.name !strcmp(tp-call.name, event)) + if (!strcmp(tp-call.name, event)) return tp; return NULL; } @@ -313,8 +314,7 @@ static void __unregister_trace_probe(struct trace_probe *tp) /* Unregister a trace_probe and probe_event: call with locking probe_lock */ static void unregister_trace_probe(struct trace_probe *tp) { - if (tp-call.name) - unregister_probe_event(tp); + unregister_probe_event(tp); __unregister_trace_probe(tp); list_del(tp-list); } @@ -343,18 +343,16 @@ static int register_trace_probe(struct trace_probe *tp) goto end; } /* register as an event */ - if (tp-call.name) { - old_tp = find_probe_event(tp-call.name); - if (old_tp) { - /* delete old event */ - unregister_trace_probe(old_tp); - free_trace_probe(old_tp); - } - ret = register_probe_event(tp); - if (ret) { - pr_warning(Faild to register probe 
event(%d)\n, ret); - __unregister_trace_probe(tp); - } + old_tp = find_probe_event(tp-call.name); + if (old_tp) { + /* delete old event */ + unregister_trace_probe(old_tp); + free_trace_probe(old_tp); + } + ret = register_probe_event(tp); + if (ret) { + pr_warning(Faild
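With this patch the `EVENT` token becomes optional, so defining a probe can be as short as the bare symbol. A guarded shell illustration (it falls back to printing the definition when the tracing debugfs files are not available):

```shell
TRACING=/sys/kernel/debug/tracing

add_unnamed_probe() {
	# No ':EVENT' given -- the tracer derives a name from SYMBOL+offs.
	if [ -w "$TRACING/kprobe_events" ]; then
		echo 'p do_sys_open' > "$TRACING/kprobe_events"
		ls "$TRACING/events/kprobes/"
	else
		echo "tracing debugfs unavailable; would write: p do_sys_open"
	fi
}
add_unnamed_probe
```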
[PATCH -tip -v11 08/11] tracing: add kprobe-based event tracer
Add a kprobes-based event tracer on ftrace. This tracer is similar to the events tracer, which is based on the Tracepoint infrastructure; instead of Tracepoints, it is based on kprobes (kprobe and kretprobe). It can probe anywhere kprobes can probe (that means all function bodies except for __kprobes functions). Like the events tracer, this tracer doesn't need to be activated via current_tracer; instead, just set probe points via /sys/kernel/debug/tracing/kprobe_events, and you can set filters on each probe event via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter. This tracer supports the following probe arguments for each probe. %REG : Fetch register REG sN : Fetch Nth entry of stack (N >= 0) @ADDR : Fetch memory at ADDR (ADDR should be in kernel) @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol) aN : Fetch function argument. (N >= 0) rv : Fetch return value. ra : Fetch return address. +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address. See Documentation/trace/kprobetrace.txt for details. Changes from v10: - Use unsigned int instead of unsigned. - Make kprobe_trace_entry and kretprobe_trace_entry variable arrays. - Use TRACE_FIELD_ZERO() - Rename the document to kprobetrace.txt.
Signed-off-by: Masami Hiramatsu mhira...@redhat.com Acked-by: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Christoph Hellwig h...@infradead.org Cc: Steven Rostedt rost...@goodmis.org Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Tom Zanussi tzanu...@gmail.com --- Documentation/trace/kprobetrace.txt | 138 kernel/trace/Kconfig| 12 kernel/trace/Makefile |1 kernel/trace/trace.h| 29 + kernel/trace/trace_event_types.h| 18 + kernel/trace/trace_kprobe.c | 1183 +++ 6 files changed, 1381 insertions(+), 0 deletions(-) create mode 100644 Documentation/trace/kprobetrace.txt create mode 100644 kernel/trace/trace_kprobe.c diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt new file mode 100644 index 000..9ad907c --- /dev/null +++ b/Documentation/trace/kprobetrace.txt @@ -0,0 +1,138 @@ + Kprobe-based Event Tracer + = + + Documentation is written by Masami Hiramatsu + + +Overview + +This tracer is similar to the events tracer which is based on Tracepoint +infrastructure. Instead of Tracepoint, this tracer is based on kprobes(kprobe +and kretprobe). It probes anywhere where kprobes can probe(this means, all +functions body except for __kprobes functions). + +Unlike the function tracer, this tracer can probe instructions inside of +kernel functions. It allows you to check which instruction has been executed. + +Unlike the Tracepoint based events tracer, this tracer can add and remove +probe points on the fly. + +Similar to the events tracer, this tracer doesn't need to be activated via +current_tracer, instead of that, just set probe points via +/sys/kernel/debug/tracing/kprobe_events. And you can set filters on each +probe events via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter. + + +Synopsis of kprobe_events +- + p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS]: Set a probe + r[:EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe + + EVENT : Event name. 
+ SYMBOL[+offs|-offs] : Symbol+offset where the probe is inserted. + MEMADDR : Address where the probe is inserted. + + FETCHARGS : Arguments. + %REG : Fetch register REG + sN : Fetch Nth entry of stack (N = 0) + @ADDR: Fetch memory at ADDR (ADDR should be in kernel) + @SYM[+|-offs]: Fetch memory at SYM +|- offs (SYM should be a data symbol) + aN : Fetch function argument. (N = 0)(*) + rv : Fetch return value.(**) + ra : Fetch return address.(**) + +|-offs(FETCHARG) : fetch memory at FETCHARG +|- offs address.(***) + + (*) aN may not correct on asmlinkaged functions and at the middle of + function body. + (**) only for return probe. + (***) this is useful for fetching a field of data structures. + + +Per-Probe Event Filtering +- + Per-probe event filtering feature allows you to set different filter on each +probe and gives you what arguments will be shown in trace buffer. If an event +name is specified right after 'p:' or 'r:' in kprobe_events, the tracer adds +an event under tracing/events/kprobes/EVENT, at the directory you can see +'id', 'enabled', 'format' and 'filter'. + +enabled: + You can enable/disable the probe by writing 1 or 0 on it. + +format: + It shows the format of this probe event. It also shows aliases of arguments + which you specified to kprobe_events. + +filter: + You can write filtering rules of this event. And you can use both
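Putting the synopsis together, setting an entry probe with argument fetching plus a matching return probe looks like the sketch below. The event names (`myopen`, `myopen_ret`) are examples, and the commands are guarded for systems where the tracing debugfs files are not writable:

```shell
TRACING=/sys/kernel/debug/tracing

set_probes() {
	# a0/a1 fetch the first two function arguments at entry; rv
	# fetches the return value (valid on return probes only).
	for spec in 'p:myopen do_sys_open a0 a1' \
		    'r:myopen_ret do_sys_open rv'; do
		if [ -w "$TRACING/kprobe_events" ]; then
			echo "$spec" >> "$TRACING/kprobe_events"
			echo "added: $spec"
		else
			echo "would add: $spec"
		fi
	done
}
set_probes
```

Once set, each event appears under tracing/events/kprobes/EVENT/ with its own enabled, format, and filter files, exactly as for Tracepoint-based events.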
[PATCH -tip -v11 11/11] tracing: Add kprobes event profiling interface
Add profiling interfaces for each kprobes event. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Christoph Hellwig h...@infradead.org Cc: Steven Rostedt rost...@goodmis.org Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Tom Zanussi tzanu...@gmail.com --- Documentation/trace/kprobetrace.txt |8 ++ kernel/trace/trace_kprobe.c | 48 +++ 2 files changed, 56 insertions(+), 0 deletions(-) diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt index 437ad49..d386d96 100644 --- a/Documentation/trace/kprobetrace.txt +++ b/Documentation/trace/kprobetrace.txt @@ -69,6 +69,14 @@ filter: names and field names for describing filters. +Event Profiling +--- + You can check the total number of probe hits and probe miss-hits via +/sys/kernel/debug/tracing/kprobe_profile. + The first column is the event name, the second is the number of probe hits, +the third is the number of probe miss-hits. + + Usage examples -- To add a probe as a new event, write a new definition to kprobe_events diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index 9c6ffcc..cbff9d5 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -174,6 +174,7 @@ struct trace_probe { struct kprobe kp; struct kretproberp; }; + unsigned long nhits; const char *symbol;/* symbol name */ struct ftrace_event_callcall; unsigned intnr_args; @@ -762,6 +763,42 @@ static const struct file_operations kprobe_events_ops = { .write = probes_write, }; +/* Probes profiling interfaces */ +static int profile_seq_show(struct seq_file *m, void *v) +{ + struct trace_probe *tp = v; + + if (tp == NULL) + return 0; + + seq_printf(m, %s, tp-call.name); + + seq_printf(m, \t%8lu %8lu\n, tp-nhits, + probe_is_return(tp) ?
tp-rp.kp.nmissed : tp-kp.nmissed); + + return 0; +} + +static const struct seq_operations profile_seq_op = { + .start = probes_seq_start, + .next = probes_seq_next, + .stop = probes_seq_stop, + .show = profile_seq_show +}; + +static int profile_open(struct inode *inode, struct file *file) +{ + return seq_open(file, profile_seq_op); +} + +static const struct file_operations kprobe_profile_ops = { + .owner = THIS_MODULE, + .open = profile_open, + .read = seq_read, + .llseek = seq_lseek, + .release= seq_release, +}; + /* Kprobe handler */ static __kprobes int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs) { @@ -772,6 +809,8 @@ static __kprobes int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs) unsigned long irq_flags; struct ftrace_event_call *call = tp-call; + tp-nhits++; + local_save_flags(irq_flags); pc = preempt_count(); @@ -1145,9 +1184,18 @@ static __init int init_kprobe_trace(void) entry = debugfs_create_file(kprobe_events, 0644, d_tracer, NULL, kprobe_events_ops); + /* Event list interface */ if (!entry) pr_warning(Could not create debugfs 'kprobe_events' entry\n); + + /* Profile interface */ + entry = debugfs_create_file(kprobe_profile, 0444, d_tracer, + NULL, kprobe_profile_ops); + + if (!entry) + pr_warning(Could not create debugfs + 'kprobe_profile' entry\n); return 0; } fs_initcall(init_kprobe_trace); -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com
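Reading the new profiling file is then a one-liner; a guarded sketch for systems where the file may not exist:

```shell
TRACING=/sys/kernel/debug/tracing

show_kprobe_profile() {
	# Columns: event name, number of probe hits, number of miss-hits.
	if [ -r "$TRACING/kprobe_profile" ]; then
		cat "$TRACING/kprobe_profile"
	else
		echo "kprobe_profile unavailable; columns are: EVENT NHITS NMISSED"
	fi
}
show_kprobe_profile
```

A persistently growing miss-hit count (third column) is the signal to look for: it means the probe fired while kprobes could not run the handler, so events were silently lost.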
[PATCH -tip -v11 00/11] tracing: kprobe-based event tracer and x86 instruction decoder
/tracing/kprobe_events This sets a kretprobe on the return point of the do_sys_open() function, recording the return value and return address as the myretprobe event. You can see the format of these events via /sys/kernel/debug/tracing/events/kprobes/EVENT/format. cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format name: myprobe ID: 23 format: field:unsigned short common_type; offset:0; size:2; field:unsigned char common_flags; offset:2; size:1; field:unsigned char common_preempt_count; offset:3; size:1; field:int common_pid; offset:4; size:4; field:int common_tgid; offset:8; size:4; field: unsigned long ip; offset:16; size:8; field: int nargs; offset:24; size:4; field: unsigned long arg0; offset:32; size:8; field: unsigned long arg1; offset:40; size:8; field: unsigned long arg2; offset:48; size:8; field: unsigned long arg3; offset:56; size:8; alias: a0; original: arg0; alias: a1; original: arg1; alias: a2; original: arg2; alias: a3; original: arg3; print fmt: "%lx: 0x%lx 0x%lx 0x%lx 0x%lx", ip, arg0, arg1, arg2, arg3 You can see that the event has 4 arguments and the alias expressions corresponding to them. echo > /sys/kernel/debug/tracing/kprobe_events This clears all probe points. And you can see the traced information via /sys/kernel/debug/tracing/trace. cat /sys/kernel/debug/tracing/trace # tracer: nop # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | ...-1447 [001] 1038282.286875: do_sys_open+0x0/0xd6: 0x3 0x7fffd1ec4440 0x8000 0x0 ...-1447 [001] 1038282.286878: sys_openat+0xc/0xe <- do_sys_open: 0xfffe 0x81367a3a ...-1447 [001] 1038282.286885: do_sys_open+0x0/0xd6: 0xff9c 0x40413c 0x8000 0x1b6 ...-1447 [001] 1038282.286915: sys_open+0x1b/0x1d <- do_sys_open: 0x3 0x81367a3a ...-1447 [001] 1038282.286969: do_sys_open+0x0/0xd6: 0xff9c 0x4041c6 0x98800 0x10 ...-1447 [001] 1038282.286976: sys_open+0x1b/0x1d <- do_sys_open: 0x3 0x81367a3a Each line shows when the kernel hits a probe, and <- SYMBOL means the kernel returns from SYMBOL (e.g.
sys_open+0x1b/0x1d <- do_sys_open means the kernel returns from do_sys_open to sys_open+0x1b). Thank you, --- Masami Hiramatsu (11): tracing: Add kprobes event profiling interface tracing: Generate names for each kprobe event automatically tracing: Kprobe-tracer supports more than 6 arguments tracing: add kprobe-based event tracer tracing: Introduce TRACE_FIELD_ZERO() macro tracing: ftrace dynamic ftrace_event_call support x86: add pt_regs register and stack access APIs kprobes: cleanup fix_riprel() using insn decoder on x86 kprobes: checks probe address is instruction boundary on x86 x86: x86 instruction decoder build-time selftest x86: instruction decoder API Documentation/trace/kprobetrace.txt | 147 arch/x86/Kconfig.debug | 9 arch/x86/Makefile | 3 arch/x86/include/asm/inat.h | 127 +++ arch/x86/include/asm/insn.h | 136 arch/x86/include/asm/ptrace.h | 62 ++ arch/x86/kernel/kprobes.c | 197 ++--- arch/x86/kernel/ptrace.c | 112 +++ arch/x86/lib/Makefile | 13 arch/x86/lib/inat.c | 82 ++ arch/x86/lib/insn.c | 473 arch/x86/lib/x86-opcode-map.txt | 711 ++ arch/x86/scripts/Makefile | 19 arch/x86/scripts/distill.awk | 42 + arch/x86/scripts/gen-insn-attr-x86.awk | 314 arch/x86/scripts/test_get_len.c | 99 +++ arch/x86/scripts/user_include.h | 49 + include/linux/ftrace_event.h | 13 include/trace/ftrace.h | 22 - kernel/trace/Kconfig | 12 kernel/trace/Makefile | 1 kernel/trace/trace.h | 29 + kernel/trace/trace_event_types.h | 4 kernel/trace/trace_events.c | 70 +- kernel/trace/trace_export.c | 43 + kernel/trace/trace_kprobe.c | 1240 26 files changed, 3867 insertions(+), 162 deletions(-) create mode 100644 Documentation/trace/kprobetrace.txt create mode 100644 arch/x86/include/asm/inat.h create mode 100644 arch/x86/include/asm/insn.h create mode 100644 arch/x86/lib/inat.c create mode 100644 arch/x86/lib/insn.c create mode 100644 arch/x86/lib/x86-opcode-map.txt create mode 100644 arch/x86/scripts/Makefile create mode 100644 arch/x86/scripts/distill.awk create mode 100644
arch/x86/scripts/gen-insn-attr-x86.awk create mode 100644 arch/x86/scripts
[PATCH -tip -v11 09/11] tracing: Kprobe-tracer supports more than 6 arguments
Support up to 128 arguments for each kprobes event. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Christoph Hellwig h...@infradead.org Cc: Steven Rostedt rost...@goodmis.org Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Tom Zanussi tzanu...@gmail.com --- Documentation/trace/kprobetrace.txt | 2 +- kernel/trace/trace_kprobe.c | 21 + 2 files changed, 14 insertions(+), 9 deletions(-) diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt index 9ad907c..b29a54b 100644 --- a/Documentation/trace/kprobetrace.txt +++ b/Documentation/trace/kprobetrace.txt @@ -32,7 +32,7 @@ Synopsis of kprobe_events SYMBOL[+offs|-offs] : Symbol+offset where the probe is inserted. MEMADDR : Address where the probe is inserted. - FETCHARGS : Arguments. + FETCHARGS : Arguments. Each probe can have up to 128 args. %REG : Fetch register REG sN : Fetch Nth entry of stack (N >= 0) @ADDR : Fetch memory at ADDR (ADDR should be in kernel) diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index 57bf521..8754c7e 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -32,7 +32,7 @@ #include "trace.h" #include "trace_output.h" -#define TRACE_KPROBE_ARGS 6 +#define MAX_TRACE_ARGS 128 #define MAX_ARGSTR_LEN 63 /* currently, trace_kprobe only supports X86.
*/ @@ -174,11 +174,15 @@ struct trace_probe { struct kretprobe rp; }; const char *symbol; /* symbol name */ - unsigned int nr_args; - struct fetch_func args[TRACE_KPROBE_ARGS]; struct ftrace_event_call call; + unsigned int nr_args; + struct fetch_func args[]; }; +#define SIZEOF_TRACE_PROBE(n) \ + (offsetof(struct trace_probe, args) + \ + (sizeof(struct fetch_func) * (n))) + static int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs); static int kretprobe_trace_func(struct kretprobe_instance *ri, struct pt_regs *regs); @@ -248,11 +252,11 @@ static DEFINE_MUTEX(probe_lock); static LIST_HEAD(probe_list); static struct trace_probe *alloc_trace_probe(const char *symbol, - const char *event) + const char *event, int nargs) { struct trace_probe *tp; - tp = kzalloc(sizeof(struct trace_probe), GFP_KERNEL); + tp = kzalloc(SIZEOF_TRACE_PROBE(nargs), GFP_KERNEL); if (!tp) return ERR_PTR(-ENOMEM); @@ -550,9 +554,10 @@ static int create_trace_probe(int argc, char **argv) if (offset && is_return) return -EINVAL; } + argc -= 2; argv += 2; /* setup a probe */ - tp = alloc_trace_probe(symbol, event); + tp = alloc_trace_probe(symbol, event, argc); if (IS_ERR(tp)) return PTR_ERR(tp); @@ -571,8 +576,8 @@ static int create_trace_probe(int argc, char **argv) kp->addr = addr; /* parse arguments */ - argc -= 2; argv += 2; ret = 0; - for (i = 0; i < argc && i < TRACE_KPROBE_ARGS; i++) { + ret = 0; + for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) { if (strlen(argv[i]) > MAX_ARGSTR_LEN) { pr_info("Argument%d(%s) is too long.\n", i, argv[i]); ret = -ENOSPC; -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com
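The `args[]`/`SIZEOF_TRACE_PROBE()` change in this patch is the standard C flexible-array-member sizing idiom: the tail array is sized at allocation time as header size plus n element slots. A stand-alone sketch of the same pattern, where the `*_demo` types are illustrative stand-ins for the kernel structures:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative stand-ins for the kernel's fetch_func / trace_probe. */
struct fetch_func_demo {
	unsigned long (*func)(void *regs, void *data);
	void *data;
};

struct trace_probe_demo {
	const char *symbol;
	unsigned int nr_args;
	struct fetch_func_demo args[];	/* flexible array member */
};

/* Same shape as the patch's SIZEOF_TRACE_PROBE(n): header + n tail slots. */
#define SIZEOF_TRACE_PROBE_DEMO(n) \
	(offsetof(struct trace_probe_demo, args) + \
	 sizeof(struct fetch_func_demo) * (n))

static struct trace_probe_demo *alloc_probe_demo(unsigned int nargs)
{
	/* One allocation holds both the header and all argument slots,
	 * mirroring the patch's kzalloc(SIZEOF_TRACE_PROBE(nargs), ...). */
	struct trace_probe_demo *tp = calloc(1, SIZEOF_TRACE_PROBE_DEMO(nargs));

	if (tp)
		tp->nr_args = nargs;
	return tp;
}
```

This is why the patch moves `nr_args` and `args` to the end of the struct: a flexible array member must be the last field.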
Re: [PATCH -tip -v10 7/7] tracing: add kprobe-based event tracer
Frederic Weisbecker wrote: diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 206cb7d..65945eb 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -45,6 +45,8 @@ enum trace_type { TRACE_POWER, TRACE_BLK, TRACE_KSYM, +TRACE_KPROBE, +TRACE_KRETPROBE, __TRACE_LAST_TYPE, }; @@ -227,6 +229,22 @@ struct trace_ksym { charksym_name[KSYM_NAME_LEN]; charp_name[TASK_COMM_LEN]; }; +#define TRACE_KPROBE_ARGS 6 + +struct kprobe_trace_entry { +struct trace_entry ent; +unsigned long ip; +int nargs; +unsigned long args[TRACE_KPROBE_ARGS]; I see that you actually make use of arg as a dynamic sizeable array. For clarity, args[TRACE_KPROBE_ARGS] could be args[0]. It's just a neat and wouldn't affect the code nor the data but would be clearer for readers of that code. Hmm. In that case, I think we'll need a new macro for field definition, like TRACE_FIELD_ZERO(type, item). +}; + +struct kretprobe_trace_entry { +struct trace_entry ent; +unsigned long func; +unsigned long ret_ip; +int nargs; +unsigned long args[TRACE_KPROBE_ARGS]; +}; ditto /* * trace_flag_type is an enumeration that holds different @@ -344,6 +362,10 @@ extern void __ftrace_bad_type(void); IF_ASSIGN(var, ent, struct syscall_trace_exit, \ TRACE_SYSCALL_EXIT); \ IF_ASSIGN(var, ent, struct trace_ksym, TRACE_KSYM); \ +IF_ASSIGN(var, ent, struct kprobe_trace_entry, \ + TRACE_KPROBE);\ +IF_ASSIGN(var, ent, struct kretprobe_trace_entry, \ + TRACE_KRETPROBE); \ __ftrace_bad_type();\ } while (0) diff --git a/kernel/trace/trace_event_types.h b/kernel/trace/trace_event_types.h index 6db005e..ec2e6f3 100644 --- a/kernel/trace/trace_event_types.h +++ b/kernel/trace/trace_event_types.h @@ -175,4 +175,24 @@ TRACE_EVENT_FORMAT(kmem_free, TRACE_KMEM_FREE, kmemtrace_free_entry, ignore, TP_RAW_FMT(type:%u call_site:%lx ptr:%p) ); +TRACE_EVENT_FORMAT(kprobe, TRACE_KPROBE, kprobe_trace_entry, ignore, +TRACE_STRUCT( +TRACE_FIELD(unsigned long, ip, ip) +TRACE_FIELD(int, nargs, nargs) +TRACE_FIELD_SPECIAL(unsigned long 
args[TRACE_KPROBE_ARGS], +args, TRACE_KPROBE_ARGS, args) +), +TP_RAW_FMT(%08lx: args:0x%lx ...) +); + +TRACE_EVENT_FORMAT(kretprobe, TRACE_KRETPROBE, kretprobe_trace_entry, ignore, +TRACE_STRUCT( +TRACE_FIELD(unsigned long, func, func) +TRACE_FIELD(unsigned long, ret_ip, ret_ip) +TRACE_FIELD(int, nargs, nargs) +TRACE_FIELD_SPECIAL(unsigned long args[TRACE_KPROBE_ARGS], +args, TRACE_KPROBE_ARGS, args) +), +TP_RAW_FMT(%08lx - %08lx: args:0x%lx ...) +); #undef TRACE_SYSTEM diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c new file mode 100644 index 000..0951512 --- /dev/null +++ b/kernel/trace/trace_kprobe.c @@ -0,0 +1,1183 @@ +/* + * kprobe based kernel tracer + * + * Created by Masami Hiramatsu mhira...@redhat.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + */ + +#include linux/module.h +#include linux/uaccess.h +#include linux/kprobes.h +#include linux/seq_file.h +#include linux/slab.h +#include linux/smp.h +#include linux/debugfs.h +#include linux/types.h +#include linux/string.h +#include linux/ctype.h +#include linux/ptrace.h + +#include trace.h +#include trace_output.h + +#define MAX_ARGSTR_LEN 63 + +/* currently, trace_kprobe only supports X86. 
*/ + +struct fetch_func { +unsigned long (*func)(struct pt_regs *, void *); +void *data; +}; + +static __kprobes unsigned long call_fetch(struct fetch_func *f, + struct pt_regs *regs) +{ +return f-func(regs, f-data); +} + +/* fetch handlers */ +static __kprobes unsigned long
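The `fetch_func`/`call_fetch` pair quoted above is a closure-like dispatch: a handler chosen at probe-definition time plus opaque per-argument data. A user-space sketch of the same pattern (the `mock_regs` struct and all `*_demo` names are assumptions for illustration, since `struct pt_regs` is kernel-only):

```c
#include <assert.h>
#include <stddef.h>

/* Mock register file standing in for the kernel's struct pt_regs. */
struct mock_regs {
	unsigned long ax, bx, cx;
};

/* Same dispatch shape as the tracer's fetch_func: handler + opaque data. */
struct fetch_func_demo {
	unsigned long (*func)(struct mock_regs *regs, void *data);
	void *data;
};

static unsigned long call_fetch_demo(struct fetch_func_demo *f,
				     struct mock_regs *regs)
{
	return f->func(regs, f->data);
}

/* Handler: fetch a register whose byte offset is carried in @data. */
static unsigned long fetch_register_demo(struct mock_regs *regs, void *data)
{
	return *(unsigned long *)((char *)regs + (size_t)data);
}

/* Handler: return a constant carried in @data, like fetching an immediate. */
static unsigned long fetch_constant_demo(struct mock_regs *regs, void *data)
{
	(void)regs;
	return (unsigned long)(size_t)data;
}
```

Parsing an argument specification once into a `{handler, data}` pair keeps the per-hit cost to a single indirect call, which matters on the probe fast path.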
Re: [PATCH -tip -v10 7/7] tracing: add kprobe-based event tracer
Frederic Weisbecker wrote: On Tue, Jul 07, 2009 at 04:42:32PM -0400, Masami Hiramatsu wrote: Frederic Weisbecker wrote: On Tue, Jul 07, 2009 at 03:55:28PM -0400, Masami Hiramatsu wrote: Frederic Weisbecker wrote: diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 206cb7d..65945eb 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -45,6 +45,8 @@ enum trace_type { TRACE_POWER, TRACE_BLK, TRACE_KSYM, +TRACE_KPROBE, +TRACE_KRETPROBE, __TRACE_LAST_TYPE, }; @@ -227,6 +229,22 @@ struct trace_ksym { charksym_name[KSYM_NAME_LEN]; charp_name[TASK_COMM_LEN]; }; +#define TRACE_KPROBE_ARGS 6 + +struct kprobe_trace_entry { +struct trace_entry ent; +unsigned long ip; +int nargs; +unsigned long args[TRACE_KPROBE_ARGS]; I see that you actually make use of arg as a dynamic sizeable array. For clarity, args[TRACE_KPROBE_ARGS] could be args[0]. It's just a neat and wouldn't affect the code nor the data but would be clearer for readers of that code. Hmm. In that case, I think we'll need a new macro for field definition, like TRACE_FIELD_ZERO(type, item). You mean that for trace_define_field() to describe fields of events? Actually the fields should be defined dynamically depending on how is built the kprobe event (which arguments are requested, how many, etc..). Yeah, if you specified a probe point with its event name, the tracer will make a corresponding event dynamically. There are also anonymous probes which don't have corresponding events. For those anonymous probes, I need to define two generic event types(kprobe and kretprobe). Thank you, Ok. Btw, why do you need to define those two anonymous events? Actually your event types are always dynamically created. Those you defined through TRACE_FORMAT_EVENT are only ghost events, they only stand there as a abstract pattern, right? Not always created. Below command will create an event event1; p probe_point:event1 a1 a2 a3 ... /debug/tracing/kprobe_events But next command doesn't create. 
p probe_point a1 a2 a3 ... > /debug/tracing/kprobe_events This just inserts a kprobe at probe_point. The advantage of this simple command is that you are never annoyed by having to make up a different name for each new event :-) Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com
Re: [PATCH -tip -v10 7/7] tracing: add kprobe-based event tracer
Frederic Weisbecker wrote: On Tue, Jul 07, 2009 at 05:31:25PM -0400, Masami Hiramatsu wrote: Frederic Weisbecker wrote: On Tue, Jul 07, 2009 at 04:42:32PM -0400, Masami Hiramatsu wrote: Frederic Weisbecker wrote: On Tue, Jul 07, 2009 at 03:55:28PM -0400, Masami Hiramatsu wrote: Frederic Weisbecker wrote: diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 206cb7d..65945eb 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -45,6 +45,8 @@ enum trace_type { TRACE_POWER, TRACE_BLK, TRACE_KSYM, + TRACE_KPROBE, + TRACE_KRETPROBE, __TRACE_LAST_TYPE, }; @@ -227,6 +229,22 @@ struct trace_ksym { charksym_name[KSYM_NAME_LEN]; charp_name[TASK_COMM_LEN]; }; +#define TRACE_KPROBE_ARGS 6 + +struct kprobe_trace_entry { + struct trace_entry ent; + unsigned long ip; + int nargs; + unsigned long args[TRACE_KPROBE_ARGS]; I see that you actually make use of arg as a dynamic sizeable array. For clarity, args[TRACE_KPROBE_ARGS] could be args[0]. It's just a neat and wouldn't affect the code nor the data but would be clearer for readers of that code. Hmm. In that case, I think we'll need a new macro for field definition, like TRACE_FIELD_ZERO(type, item). You mean that for trace_define_field() to describe fields of events? Actually the fields should be defined dynamically depending on how is built the kprobe event (which arguments are requested, how many, etc..). Yeah, if you specified a probe point with its event name, the tracer will make a corresponding event dynamically. There are also anonymous probes which don't have corresponding events. For those anonymous probes, I need to define two generic event types(kprobe and kretprobe). Thank you, Ok. Btw, why do you need to define those two anonymous events? Actually your event types are always dynamically created. Those you defined through TRACE_FORMAT_EVENT are only ghost events, they only stand there as a abstract pattern, right? Not always created. 
Below command will create an event event1; p probe_point:event1 a1 a2 a3 ... /debug/tracing/kprobe_events But next command doesn't create. p probe_point a1 a2 a3 ... /debug/tracing/kprobe_events Aah, ok. This just inserts a kprobe to probe_point. the advantage of this simple command is that you never be annoyed by making different name for new events :-) Indeed. But speaking about that, may be you could dynamically create a name following this simple model: func+offset Unless we can set several kprobes on the exact same address? Actually, we can... I thought that someone might want to insert events in the same address for retrieving more than 6 arguments. Thanks, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -tip -v10 7/7] tracing: add kprobe-based event tracer
Masami Hiramatsu wrote: Frederic Weisbecker wrote: On Tue, Jul 07, 2009 at 05:31:25PM -0400, Masami Hiramatsu wrote: Frederic Weisbecker wrote: On Tue, Jul 07, 2009 at 04:42:32PM -0400, Masami Hiramatsu wrote: Frederic Weisbecker wrote: On Tue, Jul 07, 2009 at 03:55:28PM -0400, Masami Hiramatsu wrote: Frederic Weisbecker wrote: diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 206cb7d..65945eb 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -45,6 +45,8 @@ enum trace_type { TRACE_POWER, TRACE_BLK, TRACE_KSYM, + TRACE_KPROBE, + TRACE_KRETPROBE, __TRACE_LAST_TYPE, }; @@ -227,6 +229,22 @@ struct trace_ksym { charksym_name[KSYM_NAME_LEN]; charp_name[TASK_COMM_LEN]; }; +#define TRACE_KPROBE_ARGS 6 + +struct kprobe_trace_entry { + struct trace_entry ent; + unsigned long ip; + int nargs; + unsigned long args[TRACE_KPROBE_ARGS]; I see that you actually make use of arg as a dynamic sizeable array. For clarity, args[TRACE_KPROBE_ARGS] could be args[0]. It's just a neat and wouldn't affect the code nor the data but would be clearer for readers of that code. Hmm. In that case, I think we'll need a new macro for field definition, like TRACE_FIELD_ZERO(type, item). You mean that for trace_define_field() to describe fields of events? Actually the fields should be defined dynamically depending on how is built the kprobe event (which arguments are requested, how many, etc..). Yeah, if you specified a probe point with its event name, the tracer will make a corresponding event dynamically. There are also anonymous probes which don't have corresponding events. For those anonymous probes, I need to define two generic event types(kprobe and kretprobe). Thank you, Ok. Btw, why do you need to define those two anonymous events? Actually your event types are always dynamically created. Those you defined through TRACE_FORMAT_EVENT are only ghost events, they only stand there as a abstract pattern, right? Not always created. 
Below command will create an event event1; p probe_point:event1 a1 a2 a3 ... /debug/tracing/kprobe_events But next command doesn't create. p probe_point a1 a2 a3 ... /debug/tracing/kprobe_events Aah, ok. This just inserts a kprobe to probe_point. the advantage of this simple command is that you never be annoyed by making different name for new events :-) Indeed. But speaking about that, may be you could dynamically create a name following this simple model: func+offset hmm, and we have two probe types, p(robe) and r(et probe). so, event name should be t...@func+offset or t...@address. Unless we can set several kprobes on the exact same address? Actually, we can... I thought that someone might want to insert events in the same address for retrieving more than 6 arguments. Anyway, I can improve the interface according to user's voice. If you have good idea, I'm happy to hear that:-) Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -tip -v10 5/7] x86: add pt_regs register and stack access APIs
Andi Kleen wrote: Masami Hiramatsu mhira...@redhat.com writes: Add following APIs for accessing registers and stack entries from pt_regs. You forgot to state who calls these functions/why they are added? Who only has strings for registers? Oh, yes. This patch is needed for the kprobes-based event tracer on ftrace. Some other debugging tools might be able to use it. I can see the point of having a function for the nth argument though, that's useful. +static inline unsigned long regs_get_argument_nth(struct pt_regs *regs, + unsigned n) +{ + if (n < NR_REGPARMS) { + switch (n) { + case 0: + return regs->ax; + case 1: + return regs->dx; + case 2: + return regs->cx; [...] That could be done shorter with an offsetof table. + if (n < NR_REGPARMS) { + switch (n) { + case 0: + return regs->di; + case 1: + return regs->si; + case 2: + return regs->dx; + case 3: + return regs->cx; + case 4: + return regs->r8; + case 5: + return regs->r9; and that too. I'm not so sure about your idea. Do you mean the code below? int offs_table[NR_REGPARMS] = { [0] = offsetof(struct pt_regs, di), ... }; if (n < NR_REGPARMS) return *((unsigned long *)regs + offs_table[n]); Thank you, -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com
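The offsetof-table approach Andi suggests can be sketched in user space with a mock register struct. One detail worth noting: `offsetof()` yields a byte offset, so the pointer arithmetic has to go through a `char *` rather than an `unsigned long *` (the latter would scale the offset by `sizeof(long)`). All `demo_*` names below are illustrative, not the kernel's:

```c
#include <assert.h>
#include <stddef.h>

/* Mock register file standing in for struct pt_regs. */
struct demo_regs {
	unsigned long ax, dx, cx;
};

/* Byte offsets of the argument registers, in argument order. */
static const size_t demo_arg_offs[] = {
	offsetof(struct demo_regs, ax),
	offsetof(struct demo_regs, dx),
	offsetof(struct demo_regs, cx),
};

#define DEMO_NR_REGPARMS (sizeof(demo_arg_offs) / sizeof(demo_arg_offs[0]))

static unsigned long demo_get_argument_nth(struct demo_regs *regs, unsigned n)
{
	if (n >= DEMO_NR_REGPARMS)
		return 0;
	/* demo_arg_offs[n] is a BYTE offset, so step through a char *;
	 * adding it to an unsigned long * would over-shoot by 8x/4x. */
	return *(unsigned long *)((char *)regs + demo_arg_offs[n]);
}
```

The table replaces the per-register switch with one bounds check and one indexed load, which is the shortening Andi has in mind.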
[PATCH -tip -v10 5/7] x86: add pt_regs register and stack access APIs
Andi Kleen wrote: On Mon, Jul 06, 2009 at 03:28:02PM -0400, Masami Hiramatsu wrote: I'm not so sure about your idea. Do you mean the code below? int offs_table[NR_REGPARMS] = { (not REGPARMS of course) [0] = offsetof(struct pt_regs, di), ... }; if (n < NR_REGPARMS) return *((unsigned long *)regs + offs_table[n]); Yes. OK, here, I updated my patch. Thank you, x86: add pt_regs register and stack access APIs From: Masami Hiramatsu mhira...@redhat.com Add following APIs for accessing registers and stack entries from pt_regs. These APIs are required by the kprobes-based event tracer on ftrace. Some other debugging tools might be able to use them too. - regs_query_register_offset(const char *name) Query the offset of "name" register. - regs_query_register_name(unsigned offset) Query the name of register by its offset. - regs_get_register(struct pt_regs *regs, unsigned offset) Get the value of a register by its offset. - regs_within_kernel_stack(struct pt_regs *regs, unsigned long addr) Check the address is in the kernel stack. - regs_get_kernel_stack_nth(struct pt_regs *reg, unsigned nth) Get Nth entry of the kernel stack. (N >= 0) - regs_get_argument_nth(struct pt_regs *reg, unsigned nth) Get Nth argument at function call. (N >= 0) Changes from v10: - Use an offsetof table in regs_get_argument_nth().
Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Andi Kleen a...@firstfloor.org Cc: Christoph Hellwig h...@infradead.org Cc: Steven Rostedt rost...@goodmis.org Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Roland McGrath rol...@redhat.com Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: linux-a...@vger.kernel.org --- arch/x86/include/asm/ptrace.h | 61 ++ arch/x86/kernel/ptrace.c | 112 + 2 files changed, 173 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h index 0f0d908..a9b7e2d 100644 --- a/arch/x86/include/asm/ptrace.h +++ b/arch/x86/include/asm/ptrace.h @@ -7,6 +7,7 @@ #ifdef __KERNEL__ #include <asm/segment.h> +#include <asm/page_types.h> #endif #ifndef __ASSEMBLY__ @@ -216,6 +217,66 @@ static inline unsigned long user_stack_pointer(struct pt_regs *regs) return regs->sp; } +/* Query offset/name of register from its name/offset */ +extern int regs_query_register_offset(const char *name); +extern const char *regs_query_register_name(unsigned offset); +#define MAX_REG_OFFSET (offsetof(struct pt_regs, ss)) + +/** + * regs_get_register() - get register value from its offset + * @regs: pt_regs from which register value is gotten. + * @offset: offset number of the register. + * + * regs_get_register returns the value of a register whose offset from @regs + * is @offset. The @offset is the offset of the register in struct pt_regs. + * If @offset is bigger than MAX_REG_OFFSET, this returns 0. + */ +static inline unsigned long regs_get_register(struct pt_regs *regs, + unsigned offset) +{ + if (unlikely(offset > MAX_REG_OFFSET)) + return 0; + return *(unsigned long *)((unsigned long)regs + offset); +} + +/** + * regs_within_kernel_stack() - check the address in the stack + * @regs: pt_regs which contains kernel stack pointer. + * @addr: address which is checked.
+ * + * regs_within_kernel_stack() checks @addr is within the kernel stack page(s). + * If @addr is within the kernel stack, it returns true. If not, returns false. + */ +static inline int regs_within_kernel_stack(struct pt_regs *regs, + unsigned long addr) +{ + return ((addr & ~(THREAD_SIZE - 1)) == + (kernel_stack_pointer(regs) & ~(THREAD_SIZE - 1))); +} + +/** + * regs_get_kernel_stack_nth() - get Nth entry of the stack + * @regs: pt_regs which contains kernel stack pointer. + * @n: stack entry number. + * + * regs_get_kernel_stack_nth() returns @n th entry of the kernel stack which + * is specified by @regs. If the @n th entry is NOT in the kernel stack, + * this returns 0. + */ +static inline unsigned long regs_get_kernel_stack_nth(struct pt_regs *regs, + unsigned n) +{ + unsigned long *addr = (unsigned long *)kernel_stack_pointer(regs); + addr += n; + if (regs_within_kernel_stack(regs, (unsigned long)addr)) + return *addr; + else + return 0; +} + +/* Get Nth argument at function call */ +extern unsigned long regs_get_argument_nth(struct pt_regs *regs, unsigned n); + /* * These are defined as per linux/ptrace.h, which see. */ diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c index cabdabc..4f9b513 100644 --- a/arch/x86/kernel/ptrace.c +++ b/arch/x86/kernel/ptrace.c @@ -49,6 +49,118 @@ enum x86_regset
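The masking in regs_within_kernel_stack() works because the kernel stack occupies a THREAD_SIZE-aligned block, so any two addresses on the same stack agree on every bit above log2(THREAD_SIZE). A stand-alone sketch under that assumption (`DEMO_THREAD_SIZE` is an assumed 8 KB value; the real THREAD_SIZE is arch- and config-dependent):

```c
#include <assert.h>

/* Assumed 8 KB THREAD_SIZE-like value for illustration. */
#define DEMO_THREAD_SIZE 8192UL

/* Two addresses lie in the same DEMO_THREAD_SIZE-aligned block iff they
 * agree on all bits above log2(DEMO_THREAD_SIZE), i.e. after masking off
 * the low (DEMO_THREAD_SIZE - 1) bits. */
static int demo_within_same_stack(unsigned long sp, unsigned long addr)
{
	return (addr & ~(DEMO_THREAD_SIZE - 1)) ==
	       (sp & ~(DEMO_THREAD_SIZE - 1));
}
```

This is why regs_get_kernel_stack_nth() can simply bounds-check `sp + n * sizeof(long)` with the same predicate before dereferencing it.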
[PATCH -tip -v10 2/7] x86: x86 instruction decoder build-time selftest
Add a user-space selftest of x86 instruction decoder at kernel build time. When CONFIG_X86_DECODER_SELFTEST=y, Kbuild builds a test harness of x86 instruction decoder and performs it after building vmlinux. The test compares the results of objdump and x86 instruction decoder code and check there are no differences. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Signed-off-by: Jim Keniston jkeni...@us.ibm.com Cc: H. Peter Anvin h...@zytor.com Cc: Steven Rostedt rost...@goodmis.org Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Andi Kleen a...@linux.intel.com Cc: Vegard Nossum vegard.nos...@gmail.com Cc: Avi Kivity a...@redhat.com Cc: Przemysław Pawełczyk przemys...@pawelczyk.it Cc: Sam Ravnborg s...@ravnborg.org --- arch/x86/Kconfig.debug |9 arch/x86/Makefile |3 + arch/x86/include/asm/inat.h |2 + arch/x86/include/asm/insn.h |2 + arch/x86/lib/inat.c |2 + arch/x86/lib/insn.c |2 + arch/x86/scripts/Makefile | 19 +++ arch/x86/scripts/distill.awk| 42 + arch/x86/scripts/test_get_len.c | 99 +++ arch/x86/scripts/user_include.h | 49 +++ 10 files changed, 229 insertions(+), 0 deletions(-) create mode 100644 arch/x86/scripts/Makefile create mode 100644 arch/x86/scripts/distill.awk create mode 100644 arch/x86/scripts/test_get_len.c create mode 100644 arch/x86/scripts/user_include.h diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug index d105f29..7d0b681 100644 --- a/arch/x86/Kconfig.debug +++ b/arch/x86/Kconfig.debug @@ -186,6 +186,15 @@ config X86_DS_SELFTEST config HAVE_MMIOTRACE_SUPPORT def_bool y +config X86_DECODER_SELFTEST + bool x86 instruction decoder selftest + depends on DEBUG_KERNEL + ---help--- +Perform x86 instruction decoder selftests at build time. +This option is useful for checking the sanity of x86 instruction +decoder code. +If unsure, say N. 
+ # # IO delay types: # diff --git a/arch/x86/Makefile b/arch/x86/Makefile index 1b68659..7046556 100644 --- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -154,6 +154,9 @@ all: bzImage KBUILD_IMAGE := $(boot)/bzImage bzImage: vmlinux +ifeq ($(CONFIG_X86_DECODER_SELFTEST),y) + $(Q)$(MAKE) $(build)=arch/x86/scripts posttest +endif $(Q)$(MAKE) $(build)=$(boot) $(KBUILD_IMAGE) $(Q)mkdir -p $(objtree)/arch/$(UTS_MACHINE)/boot $(Q)ln -fsn ../../x86/boot/bzImage $(objtree)/arch/$(UTS_MACHINE)/boot/$@ diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h index 01e079a..9090665 100644 --- a/arch/x86/include/asm/inat.h +++ b/arch/x86/include/asm/inat.h @@ -20,7 +20,9 @@ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. * */ +#ifdef __KERNEL__ #include linux/types.h +#endif /* Instruction attributes */ typedef u32 insn_attr_t; diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h index 5b50fa3..5736404 100644 --- a/arch/x86/include/asm/insn.h +++ b/arch/x86/include/asm/insn.h @@ -20,7 +20,9 @@ * Copyright (C) IBM Corporation, 2009 */ +#ifdef __KERNEL__ #include linux/types.h +#endif /* insn_attr_t is defined in inat.h */ #include asm/inat.h diff --git a/arch/x86/lib/inat.c b/arch/x86/lib/inat.c index d6a34be..564ecbd 100644 --- a/arch/x86/lib/inat.c +++ b/arch/x86/lib/inat.c @@ -18,7 +18,9 @@ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
* */ +#ifdef __KERNEL__ #include linux/module.h +#endif #include asm/insn.h /* Attribute tables are generated from opcode map */ diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c index 254c848..3b9451a 100644 --- a/arch/x86/lib/insn.c +++ b/arch/x86/lib/insn.c @@ -18,8 +18,10 @@ * Copyright (C) IBM Corporation, 2002, 2004, 2009 */ +#ifdef __KERNEL__ #include linux/string.h #include linux/module.h +#endif #include asm/inat.h #include asm/insn.h diff --git a/arch/x86/scripts/Makefile b/arch/x86/scripts/Makefile new file mode 100644 index 000..f08859e --- /dev/null +++ b/arch/x86/scripts/Makefile @@ -0,0 +1,19 @@ +PHONY += posttest +quiet_cmd_posttest = TEST$@ + cmd_posttest = objdump -d $(objtree)/vmlinux | awk -f $(srctree)/arch/x86/scripts/distill.awk | $(obj)/test_get_len + +posttest: $(obj)/test_get_len vmlinux + $(call cmd,posttest) + +test_get_len_SRC = $(srctree)/arch/x86/scripts/test_get_len.c $(srctree)/arch/x86/lib/insn.c $(srctree)/arch/x86/lib/inat.c +test_get_len_INC = $(srctree)/arch/x86/include/asm/inat.h $(srctree)/arch/x86/include/asm/insn.h $(objtree)/arch/x86/lib/inat-tables.c + +quiet_cmd_test_get_len = CC $@ + cmd_test_get_len = $(CC) -Wall $(test_get_len_SRC) -I$(objtree)/arch/x86/lib/ -I$(srctree)/arch/x86/include -include
[PATCH -tip -v10 5/7] x86: add pt_regs register and stack access APIs
Add following APIs for accessing registers and stack entries from pt_regs. - regs_query_register_offset(const char *name) Query the offset of name register. - regs_query_register_name(unsigned offset) Query the name of register by its offset. - regs_get_register(struct pt_regs *regs, unsigned offset) Get the value of a register by its offset. - regs_within_kernel_stack(struct pt_regs *regs, unsigned long addr) Check the address is in the kernel stack. - regs_get_kernel_stack_nth(struct pt_regs *reg, unsigned nth) Get Nth entry of the kernel stack. (N = 0) - regs_get_argument_nth(struct pt_regs *reg, unsigned nth) Get Nth argument at function call. (N = 0) Changes from v9: -Fix a typo in a comment. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Christoph Hellwig h...@infradead.org Cc: Steven Rostedt rost...@goodmis.org Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Roland McGrath rol...@redhat.com Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: linux-a...@vger.kernel.org --- arch/x86/include/asm/ptrace.h | 122 + arch/x86/kernel/ptrace.c | 73 + 2 files changed, 195 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h index 0f0d908..d5e3b3b 100644 --- a/arch/x86/include/asm/ptrace.h +++ b/arch/x86/include/asm/ptrace.h @@ -7,6 +7,7 @@ #ifdef __KERNEL__ #include asm/segment.h +#include asm/page_types.h #endif #ifndef __ASSEMBLY__ @@ -216,6 +217,127 @@ static inline unsigned long user_stack_pointer(struct pt_regs *regs) return regs-sp; } +/* Query offset/name of register from its name/offset */ +extern int regs_query_register_offset(const char *name); +extern const char *regs_query_register_name(unsigned offset); +#define MAX_REG_OFFSET (offsetof(struct pt_regs, ss)) + +/** + * regs_get_register() - get register value from its offset + * @regs: pt_regs from which register value is gotten. 
+ * @offset: offset number of the register. + * + * regs_get_register returns the value of a register whose offset from @regs + * is @offset. The @offset is the offset of the register in struct pt_regs. + * If @offset is bigger than MAX_REG_OFFSET, this returns 0. + */ +static inline unsigned long regs_get_register(struct pt_regs *regs, + unsigned offset) +{ + if (unlikely(offset > MAX_REG_OFFSET)) + return 0; + return *(unsigned long *)((unsigned long)regs + offset); +} + +/** + * regs_within_kernel_stack() - check the address in the stack + * @regs: pt_regs which contains kernel stack pointer. + * @addr: address which is checked. + * + * regs_within_kernel_stack() checks whether @addr is within the kernel stack page(s). + * If @addr is within the kernel stack, it returns true. If not, returns false. + */ +static inline int regs_within_kernel_stack(struct pt_regs *regs, + unsigned long addr) +{ + return ((addr & ~(THREAD_SIZE - 1)) == + (kernel_stack_pointer(regs) & ~(THREAD_SIZE - 1))); +} + +/** + * regs_get_kernel_stack_nth() - get Nth entry of the stack + * @regs: pt_regs which contains kernel stack pointer. + * @n: stack entry number. + * + * regs_get_kernel_stack_nth() returns the @n th entry of the kernel stack which + * is specified by @regs. If the @n th entry is NOT in the kernel stack, + * this returns 0. + */ +static inline unsigned long regs_get_kernel_stack_nth(struct pt_regs *regs, + unsigned n) +{ + unsigned long *addr = (unsigned long *)kernel_stack_pointer(regs); + addr += n; + if (regs_within_kernel_stack(regs, (unsigned long)addr)) + return *addr; + else + return 0; +} + +/** + * regs_get_argument_nth() - get Nth argument at function call + * @regs: pt_regs which contains registers at function entry. + * @n: argument number. + * + * regs_get_argument_nth() returns the @n th argument of a function call. + * Since usually the kernel stack will be changed right after function entry, + * you must use this at function entry.
If the @n th entry is NOT in the + * kernel stack or pt_regs, this returns 0. + */ +#ifdef CONFIG_X86_32 +#define NR_REGPARMS 3 +static inline unsigned long regs_get_argument_nth(struct pt_regs *regs, + unsigned n) +{ + if (n < NR_REGPARMS) { + switch (n) { + case 0: + return regs->ax; + case 1: + return regs->dx; + case 2: + return regs->cx; + } + return 0; + } else { + /* +* The typical case: arg n is on the stack
[PATCH -tip -v10 6/7] tracing: ftrace dynamic ftrace_event_call support
Add dynamic ftrace_event_call support to ftrace. Trace engines can add new ftrace_event_call entries to ftrace on the fly. Each operation function of the call takes a ftrace_event_call data structure as an argument, because these functions may be shared among several ftrace_event_calls. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Steven Rostedt rost...@goodmis.org Cc: Ingo Molnar mi...@elte.hu Cc: Tom Zanussi tzanu...@gmail.com Cc: Frederic Weisbecker fweis...@gmail.com --- include/linux/ftrace_event.h | 13 +--- include/trace/ftrace.h | 22 +++-- kernel/trace/trace_events.c | 70 -- kernel/trace/trace_export.c | 27 4 files changed, 85 insertions(+), 47 deletions(-) diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h index 5c093ff..f7733b6 100644 --- a/include/linux/ftrace_event.h +++ b/include/linux/ftrace_event.h @@ -108,12 +108,13 @@ struct ftrace_event_call { struct dentry *dir; struct trace_event *event; int enabled; - int (*regfunc)(void); - void (*unregfunc)(void); + int (*regfunc)(struct ftrace_event_call *); + void (*unregfunc)(struct ftrace_event_call *); int id; - int (*raw_init)(void); - int (*show_format)(struct trace_seq *s); - int (*define_fields)(void); + int (*raw_init)(struct ftrace_event_call *); + int (*show_format)(struct ftrace_event_call *, + struct trace_seq *); + int (*define_fields)(struct ftrace_event_call *); struct list_head fields; int filter_active; void *filter; @@ -138,6 +139,8 @@ extern int filter_current_check_discard(struct ftrace_event_call *call, extern int trace_define_field(struct ftrace_event_call *call, char *type, char *name, int offset, int size, int is_signed); +extern int trace_add_event_call(struct ftrace_event_call *call); +extern void trace_remove_event_call(struct ftrace_event_call *call); #define is_signed_type(type) (((type)(-1)) < 0) diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h index 1867553..d696580 100644 --- a/include/trace/ftrace.h +++ b/include/trace/ftrace.h @@ -147,7 +147,8
@@ #undef TRACE_EVENT #define TRACE_EVENT(call, proto, args, tstruct, func, print) \ static int \ -ftrace_format_##call(struct trace_seq *s) \ +ftrace_format_##call(struct ftrace_event_call *event_call, \ +struct trace_seq *s) \ { \ struct ftrace_raw_##call field __attribute__((unused)); \ int ret = 0;\ @@ -289,10 +290,9 @@ ftrace_raw_output_##call(struct trace_iterator *iter, int flags) \ #undef TRACE_EVENT #define TRACE_EVENT(call, proto, args, tstruct, func, print) \ int\ -ftrace_define_fields_##call(void) \ +ftrace_define_fields_##call(struct ftrace_event_call *event_call) \ { \ struct ftrace_raw_##call field; \ - struct ftrace_event_call *event_call = event_##call; \ int ret;\ \ __common_field(int, type, 1); \ @@ -355,7 +355,7 @@ static inline int ftrace_get_offsets_##call( \ * event_trace_printk(_RET_IP_, call: fmt); * } * - * static int ftrace_reg_event_call(void) + * static int ftrace_reg_event_call(struct ftrace_event_call *unused) * { * int ret; * @@ -366,7 +366,7 @@ static inline int ftrace_get_offsets_##call( \ * return ret; * } * - * static void ftrace_unreg_event_call(void) + * static void ftrace_unreg_event_call(struct ftrace_event_call *unused) * { * unregister_trace_call(ftrace_event_call); * } @@ -399,7 +399,7 @@ static inline int ftrace_get_offsets_##call( \ * trace_current_buffer_unlock_commit(event, irq_flags, pc); * } * - * static int ftrace_raw_reg_event_call(void) + * static int ftrace_raw_reg_event_call(struct ftrace_event_call *unused) * { * int ret
[PATCH -tip -v10 3/7] kprobes: check probe address is at an instruction boundary on x86
Ensure safeness of inserting kprobes by checking whether the specified address is at the first byte of an instruction on x86. This is done by decoding the probed function from its head to the probe point. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Acked-by: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Jim Keniston jkeni...@us.ibm.com Cc: Ingo Molnar mi...@elte.hu --- arch/x86/kernel/kprobes.c | 69 + 1 files changed, 69 insertions(+), 0 deletions(-) diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c index b5b1848..5341842 100644 --- a/arch/x86/kernel/kprobes.c +++ b/arch/x86/kernel/kprobes.c @@ -48,6 +48,7 @@ #include <linux/preempt.h> #include <linux/module.h> #include <linux/kdebug.h> +#include <linux/kallsyms.h> #include <asm/cacheflush.h> #include <asm/desc.h> @@ -55,6 +56,7 @@ #include <asm/uaccess.h> #include <asm/alternative.h> #include <asm/debugreg.h> +#include <asm/insn.h> void jprobe_return_end(void); @@ -245,6 +247,71 @@ retry: } } +/* Recover the probed instruction at addr for further analysis. */ +static int recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr) +{ + struct kprobe *kp; + kp = get_kprobe((void *)addr); + if (!kp) + return -EINVAL; + + /* +* Basically, kp->ainsn.insn has an original instruction. +* However, a RIP-relative instruction can not do single-stepping +* at a different place, so fix_riprel() tweaks the displacement of +* that instruction. In that case, we can't recover the instruction +* from kp->ainsn.insn. +* +* On the other hand, kp->opcode has a copy of the first byte of +* the probed instruction, which is overwritten by int3. And since +* the instruction at kp->addr is not modified by kprobes except +* for the first byte, we can recover the original instruction +* from it and kp->opcode. +*/ + memcpy(buf, kp->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t)); + buf[0] = kp->opcode; + return 0; +} + +/* Dummy buffers for kallsyms_lookup */ +static char __dummy_buf[KSYM_NAME_LEN]; + +/* Check if paddr is at an instruction boundary */ +static int __kprobes can_probe(unsigned long paddr) +{ + int ret; + unsigned long addr, offset = 0; + struct insn insn; + kprobe_opcode_t buf[MAX_INSN_SIZE]; + + if (!kallsyms_lookup(paddr, NULL, &offset, NULL, __dummy_buf)) + return 0; + + /* Decode instructions */ + addr = paddr - offset; + while (addr < paddr) { + kernel_insn_init(&insn, (void *)addr); + insn_get_opcode(&insn); + + /* Check if the instruction has been modified. */ + if (OPCODE1(&insn) == BREAKPOINT_INSTRUCTION) { + ret = recover_probed_instruction(buf, addr); + if (ret) + /* +* Another debugging subsystem might insert +* this breakpoint. In that case, we can't +* recover it. +*/ + return 0; + kernel_insn_init(&insn, buf); + } + insn_get_length(&insn); + addr += insn.length; + } + + return (addr == paddr); +} + /* * Returns non-zero if opcode modifies the interrupt flag. */ @@ -360,6 +427,8 @@ static void __kprobes arch_copy_kprobe(struct kprobe *p) int __kprobes arch_prepare_kprobe(struct kprobe *p) { + if (!can_probe((unsigned long)p->addr)) + return -EILSEQ; /* insn: must be on special executable page on x86. */ p->ainsn.insn = get_insn_slot(); if (!p->ainsn.insn) -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. Software Solutions Division e-mail: mhira...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH -tip -v10 4/7] kprobes: cleanup fix_riprel() using insn decoder on x86
Cleanup fix_riprel() in arch/x86/kernel/kprobes.c by using x86 instruction decoder. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Jim Keniston jkeni...@us.ibm.com Cc: Ingo Molnar mi...@elte.hu --- arch/x86/kernel/kprobes.c | 128 - 1 files changed, 23 insertions(+), 105 deletions(-) diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c index 5341842..b77e050 100644 --- a/arch/x86/kernel/kprobes.c +++ b/arch/x86/kernel/kprobes.c @@ -109,50 +109,6 @@ static const u32 twobyte_is_boostable[256 / 32] = { /* --- */ /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ }; -static const u32 onebyte_has_modrm[256 / 32] = { - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ - /* --- */ - W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 00 */ - W(0x10, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 10 */ - W(0x20, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 20 */ - W(0x30, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 30 */ - W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 40 */ - W(0x50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 50 */ - W(0x60, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0) | /* 60 */ - W(0x70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 70 */ - W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */ - W(0x90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 90 */ - W(0xa0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* a0 */ - W(0xb0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* b0 */ - W(0xc0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* c0 */ - W(0xd0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */ - W(0xe0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* e0 */ - W(0xf0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1) /* f0 */ - /* --- */ - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ -}; -static const u32 twobyte_has_modrm[256 / 32] = { - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ - /* --- */ - W(0x00, 
1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1) | /* 0f */ - W(0x10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0) , /* 1f */ - W(0x20, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1) | /* 2f */ - W(0x30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 3f */ - W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 4f */ - W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 5f */ - W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 6f */ - W(0x70, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1) , /* 7f */ - W(0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 8f */ - W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 9f */ - W(0xa0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1) | /* af */ - W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1) , /* bf */ - W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* cf */ - W(0xd0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* df */ - W(0xe0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* ef */ - W(0xf0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0) /* ff */ - /* --- */ - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ -}; #undef W struct kretprobe_blackpoint kretprobe_blacklist[] = { @@ -345,68 +301,30 @@ static int __kprobes is_IF_modifier(kprobe_opcode_t *insn) static void __kprobes fix_riprel(struct kprobe *p) { #ifdef CONFIG_X86_64 - u8 *insn = p-ainsn.insn; - s64 disp; - int need_modrm; - - /* Skip legacy instruction prefixes. */ - while (1) { - switch (*insn) { - case 0x66: - case 0x67: - case 0x2e: - case 0x3e: - case 0x26: - case 0x64: - case 0x65: - case 0x36: - case 0xf0: - case 0xf3: - case 0xf2: - ++insn; - continue; - } - break; - } + struct insn insn; + kernel_insn_init(insn, p-ainsn.insn); - /* Skip REX instruction prefix. */ - if (is_REX_prefix(insn)) - ++insn; - - if (*insn == 0x0f) { - /* Two-byte opcode. */ - ++insn
[PATCH -tip -v10 0/7] tracing: kprobe-based event tracer and x86 instruction decoder
. cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format name: myprobe ID: 23 format: field:unsigned short common_type; offset:0; size:2; field:unsigned char common_flags; offset:2; size:1; field:unsigned char common_preempt_count; offset:3; size:1; field:int common_pid; offset:4; size:4; field:int common_tgid; offset:8; size:4; field: unsigned long ip;offset:16;tsize:8; field: int nargs; offset:24;tsize:4; field: unsigned long arg0; offset:32;tsize:8; field: unsigned long arg1; offset:40;tsize:8; field: unsigned long arg2; offset:48;tsize:8; field: unsigned long arg3; offset:56;tsize:8; alias: a0; original: arg0; alias: a1; original: arg1; alias: a2; original: arg2; alias: a3; original: arg3; print fmt: %lx: 0x%lx 0x%lx 0x%lx 0x%lx, ip, arg0, arg1, arg2, arg3 You can see that the event has 4 arguments and alias expressions corresponding to it. echo /sys/kernel/debug/tracing/kprobe_events This clears all probe points. and you can see the traced information via /sys/kernel/debug/tracing/trace. cat /sys/kernel/debug/tracing/trace # tracer: nop # # TASK-PIDCPU#TIMESTAMP FUNCTION # | | | | | ...-1447 [001] 1038282.286875: do_sys_open+0x0/0xd6: 0x3 0x7fffd1ec4440 0x8000 0x0 ...-1447 [001] 1038282.286878: sys_openat+0xc/0xe - do_sys_open: 0xfffe 0x81367a3a ...-1447 [001] 1038282.286885: do_sys_open+0x0/0xd6: 0xff9c 0x40413c 0x8000 0x1b6 ...-1447 [001] 1038282.286915: sys_open+0x1b/0x1d - do_sys_open: 0x3 0x81367a3a ...-1447 [001] 1038282.286969: do_sys_open+0x0/0xd6: 0xff9c 0x4041c6 0x98800 0x10 ...-1447 [001] 1038282.286976: sys_open+0x1b/0x1d - do_sys_open: 0x3 0x81367a3a Each line shows when the kernel hits a probe, and - SYMBOL means kernel returns from SYMBOL(e.g. sys_open+0x1b/0x1d - do_sys_open means kernel returns from do_sys_open to sys_open+0x1b). 
Thank you, --- Masami Hiramatsu (7): tracing: add kprobe-based event tracer tracing: ftrace dynamic ftrace_event_call support x86: add pt_regs register and stack access APIs kprobes: cleanup fix_riprel() using insn decoder on x86 kprobes: checks probe address is instruction boudary on x86 x86: x86 instruction decoder build-time selftest x86: instruction decoder API Documentation/trace/kprobes.txt| 138 arch/x86/Kconfig.debug |9 arch/x86/Makefile |3 arch/x86/include/asm/inat.h| 127 +++ arch/x86/include/asm/insn.h| 136 arch/x86/include/asm/ptrace.h | 122 +++ arch/x86/kernel/kprobes.c | 197 ++--- arch/x86/kernel/ptrace.c | 73 ++ arch/x86/lib/Makefile | 13 arch/x86/lib/inat.c| 82 ++ arch/x86/lib/insn.c| 473 + arch/x86/lib/x86-opcode-map.txt| 711 +++ arch/x86/scripts/Makefile | 19 + arch/x86/scripts/distill.awk | 42 + arch/x86/scripts/gen-insn-attr-x86.awk | 314 arch/x86/scripts/test_get_len.c| 99 +++ arch/x86/scripts/user_include.h| 49 + include/linux/ftrace_event.h | 13 include/trace/ftrace.h | 22 - kernel/trace/Kconfig | 12 kernel/trace/Makefile |1 kernel/trace/trace.h | 22 + kernel/trace/trace_event_types.h | 20 + kernel/trace/trace_events.c| 70 +- kernel/trace/trace_export.c| 27 - kernel/trace/trace_kprobe.c| 1183 26 files changed, 3825 insertions(+), 152 deletions(-) create mode 100644 Documentation/trace/kprobes.txt create mode 100644 arch/x86/include/asm/inat.h create mode 100644 arch/x86/include/asm/insn.h create mode 100644 arch/x86/lib/inat.c create mode 100644 arch/x86/lib/insn.c create mode 100644 arch/x86/lib/x86-opcode-map.txt create mode 100644 arch/x86/scripts/Makefile create mode 100644 arch/x86/scripts/distill.awk create mode 100644 arch/x86/scripts/gen-insn-attr-x86.awk create mode 100644 arch/x86/scripts/test_get_len.c create mode 100644 arch/x86/scripts/user_include.h create mode 100644 kernel/trace/trace_kprobe.c -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. 
Software Solutions Division e-mail: mhira...@redhat.com
[PATCH -tip -v10 1/7] x86: instruction decoder API
Add x86 instruction decoder to arch-specific libraries. This decoder can decode x86 instructions used in kernel into prefix, opcode, modrm, sib, displacement and immediates. This can also show the length of instructions. This version introduces instruction attributes for decoding instructions. The instruction attribute tables are generated from the opcode map file (x86-opcode-map.txt) by the generator script(gen-insn-attr-x86.awk). Currently, the opcode maps are based on opcode maps in Intel(R) 64 and IA-32 Architectures Software Developers Manual Vol.2: Appendix.A, and consist of below two types of opcode tables. 1-byte/2-bytes/3-bytes opcodes, which has 256 elements, are written as below; Table: table-name Referrer: escaped-name opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...] (or) opcode: escape # escaped-name EndTable Group opcodes, which has 8 elements, are written as below; GrpTable: GrpXXX reg: mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...] EndTable These opcode maps do NOT include most of SSE and FP opcodes, because those opcodes are not used in the kernel. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Signed-off-by: Jim Keniston jkeni...@us.ibm.com Acked-by: H. 
Peter Anvin h...@zytor.com Cc: Steven Rostedt rost...@goodmis.org Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Andi Kleen a...@linux.intel.com Cc: Vegard Nossum vegard.nos...@gmail.com Cc: Avi Kivity a...@redhat.com Cc: Przemysław Pawełczyk przemys...@pawelczyk.it --- arch/x86/include/asm/inat.h| 125 ++ arch/x86/include/asm/insn.h| 134 ++ arch/x86/lib/Makefile | 13 + arch/x86/lib/inat.c| 80 arch/x86/lib/insn.c| 471 + arch/x86/lib/x86-opcode-map.txt| 711 arch/x86/scripts/gen-insn-attr-x86.awk | 314 ++ 7 files changed, 1848 insertions(+), 0 deletions(-) create mode 100644 arch/x86/include/asm/inat.h create mode 100644 arch/x86/include/asm/insn.h create mode 100644 arch/x86/lib/inat.c create mode 100644 arch/x86/lib/insn.c create mode 100644 arch/x86/lib/x86-opcode-map.txt create mode 100644 arch/x86/scripts/gen-insn-attr-x86.awk diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h new file mode 100644 index 000..01e079a --- /dev/null +++ b/arch/x86/include/asm/inat.h @@ -0,0 +1,125 @@ +#ifndef _ASM_INAT_INAT_H +#define _ASM_INAT_INAT_H +/* + * x86 instruction attributes + * + * Written by Masami Hiramatsu mhira...@redhat.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + */ +#include <linux/types.h> + +/* Instruction attributes */ +typedef u32 insn_attr_t; + +/* + * Internal bits. Don't use bitmasks directly, because these bits are + * unstable. You should add checking macros and use that macro in + * your code. + */ + +#define INAT_OPCODE_TABLE_SIZE 256 +#define INAT_GROUP_TABLE_SIZE 8 + +/* Legacy instruction prefixes */ +#define INAT_PFX_OPNDSZ 1 /* 0x66 */ /* LPFX1 */ +#define INAT_PFX_REPNE 2 /* 0xF2 */ /* LPFX2 */ +#define INAT_PFX_REPE 3 /* 0xF3 */ /* LPFX3 */ +#define INAT_PFX_LOCK 4 /* 0xF0 */ +#define INAT_PFX_CS 5 /* 0x2E */ +#define INAT_PFX_DS 6 /* 0x3E */ +#define INAT_PFX_ES 7 /* 0x26 */ +#define INAT_PFX_FS 8 /* 0x64 */ +#define INAT_PFX_GS 9 /* 0x65 */ +#define INAT_PFX_SS 10 /* 0x36 */ +#define INAT_PFX_ADDRSZ 11 /* 0x67 */ + +#define INAT_LPREFIX_MAX 3 + +/* Immediate size */ +#define INAT_IMM_BYTE 1 +#define INAT_IMM_WORD 2 +#define INAT_IMM_DWORD 3 +#define INAT_IMM_QWORD 4 +#define INAT_IMM_PTR 5 +#define INAT_IMM_VWORD32 6 +#define INAT_IMM_VWORD 7 + +/* Legacy prefix */ +#define INAT_PFX_OFFS 0 +#define INAT_PFX_BITS 4 +#define INAT_PFX_MAX ((1 << INAT_PFX_BITS) - 1) +#define INAT_PFX_MASK (INAT_PFX_MAX << INAT_PFX_OFFS) +/* Escape opcodes */ +#define INAT_ESC_OFFS (INAT_PFX_OFFS + INAT_PFX_BITS) +#define INAT_ESC_BITS 2 +#define INAT_ESC_MAX
[PATCH -tip -v10 7/7] tracing: add kprobe-based event tracer
Add a kprobes-based event tracer on ftrace. This tracer is similar to the events tracer, which is based on the Tracepoint infrastructure. Instead of Tracepoints, this tracer is based on kprobes (kprobe and kretprobe). It can probe anywhere kprobes can probe (this means, all function bodies except for __kprobes functions). Similar to the events tracer, this tracer doesn't need to be activated via current_tracer; instead, just set probe points via /sys/kernel/debug/tracing/kprobe_events. And you can set filters on each probe event via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter. This tracer supports the following probe arguments for each probe. %REG : Fetch register REG sN : Fetch Nth entry of stack (N >= 0) @ADDR : Fetch memory at ADDR (ADDR should be in kernel) @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol) aN : Fetch function argument. (N >= 0) rv : Fetch return value. ra : Fetch return address. +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address. See Documentation/trace/kprobes.txt for details. Changes from v9: - Select CONFIG_GENERIC_TRACER when CONFIG_KPROBE_TRACER=y.
Signed-off-by: Masami Hiramatsu mhira...@redhat.com Acked-by: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Christoph Hellwig h...@infradead.org Cc: Steven Rostedt rost...@goodmis.org Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Tom Zanussi tzanu...@gmail.com --- Documentation/trace/kprobes.txt | 138 kernel/trace/Kconfig | 12 kernel/trace/Makefile|1 kernel/trace/trace.h | 22 + kernel/trace/trace_event_types.h | 20 + kernel/trace/trace_kprobe.c | 1183 ++ 6 files changed, 1376 insertions(+), 0 deletions(-) create mode 100644 Documentation/trace/kprobes.txt create mode 100644 kernel/trace/trace_kprobe.c diff --git a/Documentation/trace/kprobes.txt b/Documentation/trace/kprobes.txt new file mode 100644 index 000..3a90ebb --- /dev/null +++ b/Documentation/trace/kprobes.txt @@ -0,0 +1,138 @@ + Kprobe-based Event Tracer + = + + Documentation is written by Masami Hiramatsu + + +Overview + +This tracer is similar to the events tracer which is based on Tracepoint +infrastructure. Instead of Tracepoint, this tracer is based on kprobes(kprobe +and kretprobe). It probes anywhere where kprobes can probe(this means, all +functions body except for __kprobes functions). + +Unlike the function tracer, this tracer can probe instructions inside of +kernel functions. It allows you to check which instruction has been executed. + +Unlike the Tracepoint based events tracer, this tracer can add and remove +probe points on the fly. + +Similar to the events tracer, this tracer doesn't need to be activated via +current_tracer, instead of that, just set probe points via +/sys/kernel/debug/tracing/kprobe_events. And you can set filters on each +probe events via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter. 
+ + +Synopsis of kprobe_events +- + p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS] : set a probe + r[:EVENT] SYMBOL[+0] [FETCHARGS] : set a return probe + + EVENT : Event name + SYMBOL[+offs|-offs] : Symbol+offset where the probe is inserted + MEMADDR : Address where the probe is inserted + + FETCHARGS : Arguments + %REG : Fetch register REG + sN : Fetch Nth entry of stack (N >= 0) + @ADDR : Fetch memory at ADDR (ADDR should be in kernel) + @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol) + aN : Fetch function argument. (N >= 0)(*) + rv : Fetch return value.(**) + ra : Fetch return address.(**) + +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(***) + + (*) aN may not be correct on asmlinkaged functions and in the middle of + a function body. + (**) only for return probe. + (***) this is useful for fetching a field of data structures. + + +Per-Probe Event Filtering +- + The per-probe event filtering feature allows you to set a different filter on each +probe and to choose which arguments will be shown in the trace buffer. If an event +name is specified right after 'p:' or 'r:' in kprobe_events, the tracer adds +an event under tracing/events/kprobes/EVENT, in which directory you can see +'id', 'enabled', 'format' and 'filter'. + +enabled: + You can enable/disable the probe by writing 1 or 0 to it. + +format: + It shows the format of this probe event. It also shows aliases of arguments + which you specified in kprobe_events. + +filter: + You can write filtering rules for this event. You can use both alias + names and field names for describing filters. + + +Usage examples +-- +To add a probe as a new event, write a new definition
[RESEND][ PATCH -tip -v9 1/7] x86: instruction decoder API
Add x86 instruction decoder to arch-specific libraries. This decoder can decode x86 instructions used in kernel into prefix, opcode, modrm, sib, displacement and immediates. This can also show the length of instructions. This version introduces instruction attributes for decoding instructions. The instruction attribute tables are generated from the opcode map file (x86-opcode-map.txt) by the generator script(gen-insn-attr-x86.awk). Currently, the opcode maps are based on opcode maps in Intel(R) 64 and IA-32 Architectures Software Developers Manual Vol.2: Appendix.A, and consist of below two types of opcode tables. 1-byte/2-bytes/3-bytes opcodes, which has 256 elements, are written as below; Table: table-name Referrer: escaped-name opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...] (or) opcode: escape # escaped-name EndTable Group opcodes, which has 8 elements, are written as below; GrpTable: GrpXXX reg: mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...] EndTable These opcode maps do NOT include most of SSE and FP opcodes, because those opcodes are not used in the kernel. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Signed-off-by: Jim Keniston jkeni...@us.ibm.com Cc: H. 
Peter Anvin h...@zytor.com Cc: Steven Rostedt rost...@goodmis.org Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Andi Kleen a...@linux.intel.com Cc: Vegard Nossum vegard.nos...@gmail.com Cc: Avi Kivity a...@redhat.com Cc: Przemysław Pawełczyk przemys...@pawelczyk.it --- arch/x86/include/asm/inat.h| 125 ++ arch/x86/include/asm/insn.h| 134 ++ arch/x86/lib/Makefile | 13 + arch/x86/lib/inat.c| 80 arch/x86/lib/insn.c| 471 + arch/x86/lib/x86-opcode-map.txt| 711 arch/x86/scripts/gen-insn-attr-x86.awk | 314 ++ 7 files changed, 1848 insertions(+), 0 deletions(-) create mode 100644 arch/x86/include/asm/inat.h create mode 100644 arch/x86/include/asm/insn.h create mode 100644 arch/x86/lib/inat.c create mode 100644 arch/x86/lib/insn.c create mode 100644 arch/x86/lib/x86-opcode-map.txt create mode 100644 arch/x86/scripts/gen-insn-attr-x86.awk diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h new file mode 100644 index 000..01e079a --- /dev/null +++ b/arch/x86/include/asm/inat.h @@ -0,0 +1,125 @@ +#ifndef _ASM_INAT_INAT_H +#define _ASM_INAT_INAT_H +/* + * x86 instruction attributes + * + * Written by Masami Hiramatsu mhira...@redhat.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + */ +#include <linux/types.h> + +/* Instruction attributes */ +typedef u32 insn_attr_t; + +/* + * Internal bits. Don't use bitmasks directly, because these bits are + * unstable. You should add checking macros and use those macros in + * your code. + */ + +#define INAT_OPCODE_TABLE_SIZE 256 +#define INAT_GROUP_TABLE_SIZE 8 + +/* Legacy instruction prefixes */ +#define INAT_PFX_OPNDSZ 1 /* 0x66 */ /* LPFX1 */ +#define INAT_PFX_REPNE 2 /* 0xF2 */ /* LPFX2 */ +#define INAT_PFX_REPE 3 /* 0xF3 */ /* LPFX3 */ +#define INAT_PFX_LOCK 4 /* 0xF0 */ +#define INAT_PFX_CS 5 /* 0x2E */ +#define INAT_PFX_DS 6 /* 0x3E */ +#define INAT_PFX_ES 7 /* 0x26 */ +#define INAT_PFX_FS 8 /* 0x64 */ +#define INAT_PFX_GS 9 /* 0x65 */ +#define INAT_PFX_SS 10 /* 0x36 */ +#define INAT_PFX_ADDRSZ 11 /* 0x67 */ + +#define INAT_LPREFIX_MAX 3 + +/* Immediate size */ +#define INAT_IMM_BYTE 1 +#define INAT_IMM_WORD 2 +#define INAT_IMM_DWORD 3 +#define INAT_IMM_QWORD 4 +#define INAT_IMM_PTR 5 +#define INAT_IMM_VWORD32 6 +#define INAT_IMM_VWORD 7 + +/* Legacy prefix */ +#define INAT_PFX_OFFS 0 +#define INAT_PFX_BITS 4 +#define INAT_PFX_MAX ((1 << INAT_PFX_BITS) - 1) +#define INAT_PFX_MASK (INAT_PFX_MAX << INAT_PFX_OFFS) +/* Escape opcodes */ +#define INAT_ESC_OFFS (INAT_PFX_OFFS + INAT_PFX_BITS) +#define INAT_ESC_BITS 2 +#define INAT_ESC_MAX ((1
[RESEND][PATCH -tip -v9 5/7] x86: add pt_regs register and stack access APIs
Add the following APIs for accessing registers and stack entries from pt_regs. - regs_query_register_offset(const char *name) Query the offset of the named register. - regs_query_register_name(unsigned offset) Query the name of a register by its offset. - regs_get_register(struct pt_regs *regs, unsigned offset) Get the value of a register by its offset. - regs_within_kernel_stack(struct pt_regs *regs, unsigned long addr) Check whether the address is within the kernel stack. - regs_get_kernel_stack_nth(struct pt_regs *reg, unsigned nth) Get the Nth entry of the kernel stack. (N >= 0) - regs_get_argument_nth(struct pt_regs *reg, unsigned nth) Get the Nth argument at function call. (N >= 0) Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Christoph Hellwig h...@infradead.org Cc: Steven Rostedt rost...@goodmis.org Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Roland McGrath rol...@redhat.com Cc: linux-a...@vger.kernel.org --- arch/x86/include/asm/ptrace.h | 122 + arch/x86/kernel/ptrace.c | 73 + 2 files changed, 195 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h index 0f0d908..2fd3ea3 100644 --- a/arch/x86/include/asm/ptrace.h +++ b/arch/x86/include/asm/ptrace.h @@ -7,6 +7,7 @@ #ifdef __KERNEL__ #include <asm/segment.h> +#include <asm/page_types.h> #endif #ifndef __ASSEMBLY__ @@ -216,6 +217,127 @@ static inline unsigned long user_stack_pointer(struct pt_regs *regs) return regs->sp; } +/* Query offset/name of register from its name/offset */ +extern int regs_query_register_offset(const char *name); +extern const char *regs_query_register_name(unsigned offset); +#define MAX_REG_OFFSET (offsetof(struct pt_regs, ss)) + +/** + * regs_get_register() - get register value from its offset + * @regs: pt_regs from which register value is gotten. + * @offset: offset number of the register.
+ * + * regs_get_register() returns the value of a register whose offset from @regs + * is @offset. The @offset is the offset of the register in struct pt_regs. + * If @offset is bigger than MAX_REG_OFFSET, this returns 0. + */ +static inline unsigned long regs_get_register(struct pt_regs *regs, + unsigned offset) +{ + if (unlikely(offset > MAX_REG_OFFSET)) + return 0; + return *(unsigned long *)((unsigned long)regs + offset); +} + +/** + * regs_within_kernel_stack() - check the address in the stack + * @regs: pt_regs which contains kernel stack pointer. + * @addr: address which is checked. + * + * regs_within_kernel_stack() checks @addr is within the kernel stack page(s). + * If @addr is within the kernel stack, it returns true. If not, returns false. + */ +static inline int regs_within_kernel_stack(struct pt_regs *regs, + unsigned long addr) +{ + return ((addr & ~(THREAD_SIZE - 1)) == + (kernel_stack_pointer(regs) & ~(THREAD_SIZE - 1))); +} + +/** + * regs_get_kernel_stack_nth() - get Nth entry of the stack + * @regs: pt_regs which contains kernel stack pointer. + * @n: stack entry number. + * + * regs_get_kernel_stack_nth() returns the @n-th entry of the kernel stack which + * is specified by @regs. If the @n-th entry is NOT in the kernel stack, + * this returns 0. + */ +static inline unsigned long regs_get_kernel_stack_nth(struct pt_regs *regs, + unsigned n) +{ + unsigned long *addr = (unsigned long *)kernel_stack_pointer(regs); + addr += n; + if (regs_within_kernel_stack(regs, (unsigned long)addr)) + return *addr; + else + return 0; +} + +/** + * regs_get_argument_nth() - get Nth argument at function call + * @regs: pt_regs which contains registers at function entry. + * @n: argument number. + * + * regs_get_argument_nth() returns the @n-th argument of a function call. + * Since the kernel stack is usually changed right after function entry, + * you must use this at function entry. If the @n-th entry is NOT in the + * kernel stack or pt_regs, this returns 0.
+ */ +#ifdef CONFIG_X86_32 +#define NR_REGPARMS 3 +static inline unsigned long regs_get_argument_nth(struct pt_regs *regs, + unsigned n) +{ + if (n < NR_REGPARMS) { + switch (n) { + case 0: + return regs->ax; + case 1: + return regs->dx; + case 2: + return regs->cx; + } + return 0; + } else { + /* +* The typical case: arg n is on the stack. +* (Note: stack[0] = return address, so skip it) +*/ + return
[RESEND][PATCH -tip -v9 7/7] tracing: add kprobe-based event tracer
Add a kprobe-based event tracer on ftrace. This tracer is similar to the events tracer, which is based on the Tracepoint infrastructure. Instead of Tracepoints, this tracer is based on kprobes (kprobe and kretprobe). It probes anywhere kprobes can probe (this means all function bodies except for __kprobes functions). Similar to the events tracer, this tracer doesn't need to be activated via current_tracer; instead, just set probe points via /sys/kernel/debug/tracing/kprobe_events. And you can set filters on each probe event via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter. This tracer supports the following probe arguments for each probe. %REG : Fetch register REG sN : Fetch the Nth entry of the stack (N >= 0) @ADDR : Fetch memory at ADDR (ADDR should be in kernel) @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol) aN : Fetch function argument. (N >= 0) rv : Fetch return value. ra : Fetch return address. +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address. See Documentation/trace/kprobes.txt for details.
Signed-off-by: Masami Hiramatsu mhira...@redhat.com Acked-by: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Christoph Hellwig h...@infradead.org Cc: Steven Rostedt rost...@goodmis.org Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Tom Zanussi tzanu...@gmail.com --- Documentation/trace/kprobes.txt | 138 kernel/trace/Kconfig | 11 kernel/trace/Makefile | 1 kernel/trace/trace.h | 22 + kernel/trace/trace_event_types.h | 20 + kernel/trace/trace_kprobe.c | 1183 ++ 6 files changed, 1375 insertions(+), 0 deletions(-) create mode 100644 Documentation/trace/kprobes.txt create mode 100644 kernel/trace/trace_kprobe.c diff --git a/Documentation/trace/kprobes.txt b/Documentation/trace/kprobes.txt new file mode 100644 index 000..3a90ebb --- /dev/null +++ b/Documentation/trace/kprobes.txt @@ -0,0 +1,138 @@ + Kprobe-based Event Tracer + = + + Documentation is written by Masami Hiramatsu + + +Overview + +This tracer is similar to the events tracer, which is based on the Tracepoint +infrastructure. Instead of Tracepoints, this tracer is based on kprobes (kprobe +and kretprobe). It probes anywhere kprobes can probe (this means all function +bodies except for __kprobes functions). + +Unlike the function tracer, this tracer can probe instructions inside +kernel functions. It allows you to check which instruction has been executed. + +Unlike the Tracepoint-based events tracer, this tracer can add and remove +probe points on the fly. + +Similar to the events tracer, this tracer doesn't need to be activated via +current_tracer; instead, just set probe points via +/sys/kernel/debug/tracing/kprobe_events. And you can set filters on each +probe event via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter.
+ + +Synopsis of kprobe_events +- + p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS] : set a probe + r[:EVENT] SYMBOL[+0] [FETCHARGS] : set a return probe + + EVENT : Event name + SYMBOL[+offs|-offs] : Symbol+offset where the probe is inserted + MEMADDR : Address where the probe is inserted + + FETCHARGS : Arguments + %REG : Fetch register REG + sN : Fetch the Nth entry of the stack (N >= 0) + @ADDR : Fetch memory at ADDR (ADDR should be in kernel) + @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol) + aN : Fetch function argument. (N >= 0)(*) + rv : Fetch return value.(**) + ra : Fetch return address.(**) + +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(***) + + (*) aN may not be correct for asmlinkage functions or in the middle of a + function body. + (**) only for return probe. + (***) this is useful for fetching a field of data structures. + + +Per-Probe Event Filtering +- + The per-probe event filtering feature allows you to set a different filter on each +probe and lets you choose which arguments will be shown in the trace buffer. If an event +name is specified right after 'p:' or 'r:' in kprobe_events, the tracer adds +an event under tracing/events/kprobes/EVENT; in that directory you can see +'id', 'enabled', 'format' and 'filter'. + +enabled: + You can enable/disable the probe by writing 1 or 0 to it. + +format: + It shows the format of this probe event. It also shows aliases of arguments + which you specified to kprobe_events. + +filter: + You can write filtering rules for this event. You can use both alias + names and field names for describing filters. + + +Usage examples +-- +To add a probe as a new event, write a new definition to kprobe_events +as below. + + echo p:myprobe do_sys_open a0 a1 a2 a3 > /sys/kernel/debug
[RESEND][PATCH -tip -v9 0/7] tracing: kprobe-based event tracer and x86 instruction decoder
; field:unsigned char common_flags; offset:2; size:1; field:unsigned char common_preempt_count; offset:3; size:1; field:int common_pid; offset:4; size:4; field:int common_tgid; offset:8; size:4; field: unsigned long ip; offset:16; size:8; field: int nargs; offset:24; size:4; field: unsigned long arg0; offset:32; size:8; field: unsigned long arg1; offset:40; size:8; field: unsigned long arg2; offset:48; size:8; field: unsigned long arg3; offset:56; size:8; alias: a0; original: arg0; alias: a1; original: arg1; alias: a2; original: arg2; alias: a3; original: arg3; print fmt: "%lx: 0x%lx 0x%lx 0x%lx 0x%lx", ip, arg0, arg1, arg2, arg3 You can see that the event has 4 arguments and alias expressions corresponding to them. echo > /sys/kernel/debug/tracing/kprobe_events This clears all probe points. And you can see the traced information via /sys/kernel/debug/tracing/trace. cat /sys/kernel/debug/tracing/trace # tracer: nop # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | ...-1447 [001] 1038282.286875: do_sys_open+0x0/0xd6: 0x3 0x7fffd1ec4440 0x8000 0x0 ...-1447 [001] 1038282.286878: sys_openat+0xc/0xe <- do_sys_open: 0xfffe 0x81367a3a ...-1447 [001] 1038282.286885: do_sys_open+0x0/0xd6: 0xff9c 0x40413c 0x8000 0x1b6 ...-1447 [001] 1038282.286915: sys_open+0x1b/0x1d <- do_sys_open: 0x3 0x81367a3a ...-1447 [001] 1038282.286969: do_sys_open+0x0/0xd6: 0xff9c 0x4041c6 0x98800 0x10 ...-1447 [001] 1038282.286976: sys_open+0x1b/0x1d <- do_sys_open: 0x3 0x81367a3a Each line shows when the kernel hits a probe, and <- SYMBOL means the kernel returns from SYMBOL (e.g. sys_open+0x1b/0x1d <- do_sys_open means the kernel returns from do_sys_open to sys_open+0x1b).
Thank you, --- Masami Hiramatsu (7): tracing: add kprobe-based event tracer tracing: ftrace dynamic ftrace_event_call support x86: add pt_regs register and stack access APIs kprobes: cleanup fix_riprel() using insn decoder on x86 kprobes: checks probe address is instruction boudary on x86 x86: x86 instruction decoder build-time selftest x86: instruction decoder API Documentation/trace/kprobes.txt| 138 arch/x86/Kconfig.debug |9 arch/x86/Makefile |3 arch/x86/include/asm/inat.h| 127 +++ arch/x86/include/asm/insn.h| 136 arch/x86/include/asm/ptrace.h | 122 +++ arch/x86/kernel/kprobes.c | 197 ++--- arch/x86/kernel/ptrace.c | 73 ++ arch/x86/lib/Makefile | 13 arch/x86/lib/inat.c| 82 ++ arch/x86/lib/insn.c| 473 + arch/x86/lib/x86-opcode-map.txt| 711 +++ arch/x86/scripts/Makefile | 19 + arch/x86/scripts/distill.awk | 42 + arch/x86/scripts/gen-insn-attr-x86.awk | 314 arch/x86/scripts/test_get_len.c| 99 +++ arch/x86/scripts/user_include.h| 49 + include/linux/ftrace_event.h | 13 include/trace/ftrace.h | 22 - kernel/trace/Kconfig | 11 kernel/trace/Makefile |1 kernel/trace/trace.h | 22 + kernel/trace/trace_event_types.h | 20 + kernel/trace/trace_events.c| 70 +- kernel/trace/trace_export.c| 27 - kernel/trace/trace_kprobe.c| 1183 26 files changed, 3824 insertions(+), 152 deletions(-) create mode 100644 Documentation/trace/kprobes.txt create mode 100644 arch/x86/include/asm/inat.h create mode 100644 arch/x86/include/asm/insn.h create mode 100644 arch/x86/lib/inat.c create mode 100644 arch/x86/lib/insn.c create mode 100644 arch/x86/lib/x86-opcode-map.txt create mode 100644 arch/x86/scripts/Makefile create mode 100644 arch/x86/scripts/distill.awk create mode 100644 arch/x86/scripts/gen-insn-attr-x86.awk create mode 100644 arch/x86/scripts/test_get_len.c create mode 100644 arch/x86/scripts/user_include.h create mode 100644 kernel/trace/trace_kprobe.c -- Masami Hiramatsu Software Engineer Hitachi Computer Products (America), Inc. 
Software Solutions Division e-mail: mhira...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RESEND][PATCH -tip -v9 6/7] tracing: ftrace dynamic ftrace_event_call support
Add dynamic ftrace_event_call support to ftrace. Trace engines can add new ftrace_event_calls to ftrace on the fly. Each operator function of the call takes an ftrace_event_call data structure as an argument, because these functions may be shared among several ftrace_event_calls. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Steven Rostedt rost...@goodmis.org Cc: Ingo Molnar mi...@elte.hu Cc: Tom Zanussi tzanu...@gmail.com Cc: Frederic Weisbecker fweis...@gmail.com --- include/linux/ftrace_event.h | 13 +--- include/trace/ftrace.h | 22 +++-- kernel/trace/trace_events.c | 70 -- kernel/trace/trace_export.c | 27 4 files changed, 85 insertions(+), 47 deletions(-) diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h index 5c093ff..f7733b6 100644 --- a/include/linux/ftrace_event.h +++ b/include/linux/ftrace_event.h @@ -108,12 +108,13 @@ struct ftrace_event_call { struct dentry *dir; struct trace_event *event; int enabled; - int (*regfunc)(void); - void (*unregfunc)(void); + int (*regfunc)(struct ftrace_event_call *); + void (*unregfunc)(struct ftrace_event_call *); int id; - int (*raw_init)(void); - int (*show_format)(struct trace_seq *s); - int (*define_fields)(void); + int (*raw_init)(struct ftrace_event_call *); + int (*show_format)(struct ftrace_event_call *, + struct trace_seq *); + int (*define_fields)(struct ftrace_event_call *); struct list_head fields; int filter_active; void *filter; @@ -138,6 +139,8 @@ extern int filter_current_check_discard(struct ftrace_event_call *call, extern int trace_define_field(struct ftrace_event_call *call, char *type, char *name, int offset, int size, int is_signed); +extern int trace_add_event_call(struct ftrace_event_call *call); +extern void trace_remove_event_call(struct ftrace_event_call *call); #define is_signed_type(type) (((type)(-1)) < 0) diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h index 1867553..d696580 100644 @@ -147,7 +147,8
@@ #undef TRACE_EVENT #define TRACE_EVENT(call, proto, args, tstruct, func, print) \ static int \ -ftrace_format_##call(struct trace_seq *s) \ +ftrace_format_##call(struct ftrace_event_call *event_call, \ +struct trace_seq *s) \ { \ struct ftrace_raw_##call field __attribute__((unused)); \ int ret = 0;\ @@ -289,10 +290,9 @@ ftrace_raw_output_##call(struct trace_iterator *iter, int flags) \ #undef TRACE_EVENT #define TRACE_EVENT(call, proto, args, tstruct, func, print) \ int\ -ftrace_define_fields_##call(void) \ +ftrace_define_fields_##call(struct ftrace_event_call *event_call) \ { \ struct ftrace_raw_##call field; \ - struct ftrace_event_call *event_call = event_##call; \ int ret;\ \ __common_field(int, type, 1); \ @@ -355,7 +355,7 @@ static inline int ftrace_get_offsets_##call( \ * event_trace_printk(_RET_IP_, call: fmt); * } * - * static int ftrace_reg_event_call(void) + * static int ftrace_reg_event_call(struct ftrace_event_call *unused) * { * int ret; * @@ -366,7 +366,7 @@ static inline int ftrace_get_offsets_##call( \ * return ret; * } * - * static void ftrace_unreg_event_call(void) + * static void ftrace_unreg_event_call(struct ftrace_event_call *unused) * { * unregister_trace_call(ftrace_event_call); * } @@ -399,7 +399,7 @@ static inline int ftrace_get_offsets_##call( \ * trace_current_buffer_unlock_commit(event, irq_flags, pc); * } * - * static int ftrace_raw_reg_event_call(void) + * static int ftrace_raw_reg_event_call(struct ftrace_event_call *unused) * { * int ret
[RESEND][PATCH -tip -v9 4/7] kprobes: cleanup fix_riprel() using insn decoder on x86
Cleanup fix_riprel() in arch/x86/kernel/kprobes.c by using x86 instruction decoder. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Jim Keniston jkeni...@us.ibm.com Cc: Ingo Molnar mi...@elte.hu --- arch/x86/kernel/kprobes.c | 128 - 1 files changed, 23 insertions(+), 105 deletions(-) diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c index 5341842..b77e050 100644 --- a/arch/x86/kernel/kprobes.c +++ b/arch/x86/kernel/kprobes.c @@ -109,50 +109,6 @@ static const u32 twobyte_is_boostable[256 / 32] = { /* --- */ /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ }; -static const u32 onebyte_has_modrm[256 / 32] = { - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ - /* --- */ - W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 00 */ - W(0x10, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 10 */ - W(0x20, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 20 */ - W(0x30, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 30 */ - W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 40 */ - W(0x50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 50 */ - W(0x60, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0) | /* 60 */ - W(0x70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 70 */ - W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */ - W(0x90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 90 */ - W(0xa0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* a0 */ - W(0xb0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* b0 */ - W(0xc0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* c0 */ - W(0xd0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */ - W(0xe0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* e0 */ - W(0xf0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1) /* f0 */ - /* --- */ - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ -}; -static const u32 twobyte_has_modrm[256 / 32] = { - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ - /* --- */ - W(0x00, 
1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1) | /* 0f */ - W(0x10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0) , /* 1f */ - W(0x20, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1) | /* 2f */ - W(0x30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 3f */ - W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 4f */ - W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 5f */ - W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 6f */ - W(0x70, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1) , /* 7f */ - W(0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 8f */ - W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 9f */ - W(0xa0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1) | /* af */ - W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1) , /* bf */ - W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* cf */ - W(0xd0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* df */ - W(0xe0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* ef */ - W(0xf0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0) /* ff */ - /* --- */ - /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ -}; #undef W struct kretprobe_blackpoint kretprobe_blacklist[] = { @@ -345,68 +301,30 @@ static int __kprobes is_IF_modifier(kprobe_opcode_t *insn) static void __kprobes fix_riprel(struct kprobe *p) { #ifdef CONFIG_X86_64 - u8 *insn = p-ainsn.insn; - s64 disp; - int need_modrm; - - /* Skip legacy instruction prefixes. */ - while (1) { - switch (*insn) { - case 0x66: - case 0x67: - case 0x2e: - case 0x3e: - case 0x26: - case 0x64: - case 0x65: - case 0x36: - case 0xf0: - case 0xf3: - case 0xf2: - ++insn; - continue; - } - break; - } + struct insn insn; + kernel_insn_init(insn, p-ainsn.insn); - /* Skip REX instruction prefix. */ - if (is_REX_prefix(insn)) - ++insn; - - if (*insn == 0x0f) { - /* Two-byte opcode. */ - ++insn
[PATCH -tip v9 1/7] x86: instruction decoder API
Add an x86 instruction decoder to the arch-specific libraries. This decoder can decode the x86 instructions used in the kernel into prefix, opcode, modrm, sib, displacement and immediates. It can also report the length of instructions. This version introduces instruction attributes for decoding instructions. The instruction attribute tables are generated from the opcode map file (x86-opcode-map.txt) by the generator script (gen-insn-attr-x86.awk). Currently, the opcode maps are based on the opcode maps in the Intel(R) 64 and IA-32 Architectures Software Developer's Manual Vol.2: Appendix A, and consist of the following two types of opcode tables. 1-byte/2-byte/3-byte opcode tables, which have 256 elements, are written as below; Table: table-name Referrer: escaped-name opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...] (or) opcode: escape # escaped-name EndTable Group opcode tables, which have 8 elements, are written as below; GrpTable: GrpXXX reg: mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...] EndTable These opcode maps do NOT include most SSE and FP opcodes, because those opcodes are not used in the kernel. Changes from v6.1: - fix patch title. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Signed-off-by: Jim Keniston jkeni...@us.ibm.com Cc: H.
Peter Anvin h...@zytor.com Cc: Steven Rostedt rost...@goodmis.org Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com Cc: Ingo Molnar mi...@elte.hu Cc: Frederic Weisbecker fweis...@gmail.com Cc: Andi Kleen a...@linux.intel.com Cc: Vegard Nossum vegard.nos...@gmail.com Cc: Avi Kivity a...@redhat.com Cc: Przemysław Pawełczyk przemys...@pawelczyk.it --- arch/x86/include/asm/inat.h| 125 ++ arch/x86/include/asm/insn.h| 134 ++ arch/x86/lib/Makefile | 13 + arch/x86/lib/inat.c| 80 arch/x86/lib/insn.c| 471 + arch/x86/lib/x86-opcode-map.txt| 711 arch/x86/scripts/gen-insn-attr-x86.awk | 314 ++ 7 files changed, 1848 insertions(+), 0 deletions(-) create mode 100644 arch/x86/include/asm/inat.h create mode 100644 arch/x86/include/asm/insn.h create mode 100644 arch/x86/lib/inat.c create mode 100644 arch/x86/lib/insn.c create mode 100644 arch/x86/lib/x86-opcode-map.txt create mode 100644 arch/x86/scripts/gen-insn-attr-x86.awk diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h new file mode 100644 index 000..01e079a --- /dev/null +++ b/arch/x86/include/asm/inat.h @@ -0,0 +1,125 @@ +#ifndef _ASM_INAT_INAT_H +#define _ASM_INAT_INAT_H +/* + * x86 instruction attributes + * + * Written by Masami Hiramatsu mhira...@redhat.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
+ * + */ +#include <linux/types.h> + +/* Instruction attributes */ +typedef u32 insn_attr_t; + +/* + * Internal bits. Don't use bitmasks directly, because these bits are + * unstable. You should add checking macros and use those macros in + * your code. + */ + +#define INAT_OPCODE_TABLE_SIZE 256 +#define INAT_GROUP_TABLE_SIZE 8 + +/* Legacy instruction prefixes */ +#define INAT_PFX_OPNDSZ 1 /* 0x66 */ /* LPFX1 */ +#define INAT_PFX_REPNE 2 /* 0xF2 */ /* LPFX2 */ +#define INAT_PFX_REPE 3 /* 0xF3 */ /* LPFX3 */ +#define INAT_PFX_LOCK 4 /* 0xF0 */ +#define INAT_PFX_CS 5 /* 0x2E */ +#define INAT_PFX_DS 6 /* 0x3E */ +#define INAT_PFX_ES 7 /* 0x26 */ +#define INAT_PFX_FS 8 /* 0x64 */ +#define INAT_PFX_GS 9 /* 0x65 */ +#define INAT_PFX_SS 10 /* 0x36 */ +#define INAT_PFX_ADDRSZ 11 /* 0x67 */ + +#define INAT_LPREFIX_MAX 3 + +/* Immediate size */ +#define INAT_IMM_BYTE 1 +#define INAT_IMM_WORD 2 +#define INAT_IMM_DWORD 3 +#define INAT_IMM_QWORD 4 +#define INAT_IMM_PTR 5 +#define INAT_IMM_VWORD32 6 +#define INAT_IMM_VWORD 7 + +/* Legacy prefix */ +#define INAT_PFX_OFFS 0 +#define INAT_PFX_BITS 4 +#define INAT_PFX_MAX ((1 << INAT_PFX_BITS) - 1) +#define INAT_PFX_MASK (INAT_PFX_MAX << INAT_PFX_OFFS) +/* Escape opcodes */ +#define INAT_ESC_OFFS (INAT_PFX_OFFS + INAT_PFX_BITS) +#define INAT_ESC_BITS 2 +#define INAT_ESC_MAX ((1
[PATCH -tip v9 6/7] tracing: ftrace dynamic ftrace_event_call support
Add dynamic ftrace_event_call support to ftrace. Trace engines can add new ftrace_event_calls to ftrace on the fly. Each operator function of the call takes an ftrace_event_call data structure as an argument, because these functions may be shared among several ftrace_event_calls. Changes from v8: - Lock event_mutex in trace_add/remove_event_call(). - Add __trace_add/remove_event_call() for internal use. - Rename dummy variables to unused. Signed-off-by: Masami Hiramatsu mhira...@redhat.com Cc: Steven Rostedt rost...@goodmis.org Cc: Ingo Molnar mi...@elte.hu Cc: Tom Zanussi tzanu...@gmail.com Cc: Frederic Weisbecker fweis...@gmail.com --- include/linux/ftrace_event.h | 13 +--- include/trace/ftrace.h | 22 +++-- kernel/trace/trace_events.c | 70 -- kernel/trace/trace_export.c | 27 4 files changed, 85 insertions(+), 47 deletions(-) diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h index bbf40f6..e25f3a4 100644 --- a/include/linux/ftrace_event.h +++ b/include/linux/ftrace_event.h @@ -108,12 +108,13 @@ struct ftrace_event_call { struct dentry *dir; struct trace_event *event; int enabled; - int (*regfunc)(void); - void (*unregfunc)(void); + int (*regfunc)(struct ftrace_event_call *); + void (*unregfunc)(struct ftrace_event_call *); int id; - int (*raw_init)(void); - int (*show_format)(struct trace_seq *s); - int (*define_fields)(void); + int (*raw_init)(struct ftrace_event_call *); + int (*show_format)(struct ftrace_event_call *, + struct trace_seq *); + int (*define_fields)(struct ftrace_event_call *); struct list_head fields; int filter_active; void *filter; @@ -138,6 +139,8 @@ extern int filter_current_check_discard(struct ftrace_event_call *call, extern int trace_define_field(struct ftrace_event_call *call, char *type, char *name, int offset, int size, int is_signed); +extern int trace_add_event_call(struct ftrace_event_call *call); +extern void trace_remove_event_call(struct ftrace_event_call *call); #define is_signed_type(type) (((type)(-1)) < 0)
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h index b4ec83a..e163e4b 100644 --- a/include/trace/ftrace.h +++ b/include/trace/ftrace.h @@ -229,7 +229,8 @@ ftrace_raw_output_##call(struct trace_iterator *iter, int flags)\ #undef TRACE_EVENT #define TRACE_EVENT(call, proto, args, tstruct, func, print) \ static int \ -ftrace_format_##call(struct trace_seq *s) \ +ftrace_format_##call(struct ftrace_event_call *event_call, \ +struct trace_seq *s) \ { \ struct ftrace_raw_##call field __attribute__((unused)); \ int ret = 0;\ @@ -269,10 +270,9 @@ ftrace_format_##call(struct trace_seq *s) \ #undef TRACE_EVENT #define TRACE_EVENT(call, proto, args, tstruct, func, print) \ int\ -ftrace_define_fields_##call(void) \ +ftrace_define_fields_##call(struct ftrace_event_call *event_call) \ { \ struct ftrace_raw_##call field; \ - struct ftrace_event_call *event_call = event_##call; \ int ret;\ \ __common_field(int, type, 1); \ @@ -298,7 +298,7 @@ ftrace_define_fields_##call(void) \ * event_trace_printk(_RET_IP_, call: fmt); * } * - * static int ftrace_reg_event_call(void) + * static int ftrace_reg_event_call(struct ftrace_event_call *unused) * { * int ret; * @@ -309,7 +309,7 @@ ftrace_define_fields_##call(void) \ * return ret; * } * - * static void ftrace_unreg_event_call(void) + * static void ftrace_unreg_event_call(struct ftrace_event_call *unused) * { * unregister_trace_call(ftrace_event_call); * } @@ -342,7 +342,7 @@ ftrace_define_fields_##call(void