Re: RFC: paravirtualizing perf_clock

2013-10-31 Thread Masami Hiramatsu
(2013/10/30 23:03), David Ahern wrote:
 On 10/29/13 11:59 PM, Masami Hiramatsu wrote:
 (2013/10/29 11:58), David Ahern wrote:
 To back out a bit, my end goal is to be able to create and merge
 perf-events from any context on a KVM-based host -- guest userspace,
 guest kernel space, host userspace and host kernel space (userspace
 events with a perf-clock timestamp is another topic ;-)).

 That is almost the same as what we (Yoshihiro and I) are trying to do with
 integrated tracing. We are doing it with ftrace and trace-cmd (but perhaps
 it will eventually work with perf-ftrace).
 
 I thought at this point (well, once perf-ftrace gets committed) that you 
 can do everything with perf. What feature is missing in perf that you 
 get with trace-cmd or using debugfs directly?

The perf tools interface is best for profiling a single process or a short
period.
However, what we'd like to do is monitor or trace in the background for a
long period, in memory, over the whole system life cycle, as a flight recorder.
This kind of tracing interface is required for troubleshooting on
mission-critical systems.

Also, the on-the-fly configurability of ftrace (snapshots, multiple buffers,
adding and removing events) is very useful, since in the flight-recorder
use case we can't stop tracing even for a moment.
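
To illustrate the flight-recorder idea, here is a minimal userspace sketch of an
overwrite-mode ring buffer with a snapshot operation. It is not how the ftrace
ring buffer is implemented; the names flight_recorder, fr_record and
fr_snapshot, and the tiny capacity, are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FR_CAPACITY 4  /* tiny on purpose; a real recorder would be far larger */

struct fr_event {
    uint64_t timestamp;
    uint32_t id;
};

/* Hypothetical flight-recorder buffer: newest events overwrite the oldest. */
struct flight_recorder {
    struct fr_event slots[FR_CAPACITY];
    size_t head;   /* index of the next slot to write */
    size_t count;  /* number of valid events, at most FR_CAPACITY */
};

/* Record one event; never blocks and never fails, old data is overwritten. */
static void fr_record(struct flight_recorder *fr, uint64_t ts, uint32_t id)
{
    assert(fr != NULL);
    fr->slots[fr->head].timestamp = ts;
    fr->slots[fr->head].id = id;
    fr->head = (fr->head + 1) % FR_CAPACITY;
    if (fr->count < FR_CAPACITY)
        fr->count++;
}

/*
 * Copy the retained events, oldest first, into out[] (FR_CAPACITY entries).
 * The recorder itself is left untouched, so recording can simply continue.
 * Returns the number of events copied.
 */
static size_t fr_snapshot(const struct flight_recorder *fr, struct fr_event *out)
{
    assert(fr != NULL && out != NULL);
    size_t start = (fr->head + FR_CAPACITY - fr->count) % FR_CAPACITY;
    for (size_t i = 0; i < fr->count; i++)
        out[i] = fr->slots[(start + i) % FR_CAPACITY];
    return fr->count;
}
```

After recording six events into this four-slot buffer, a snapshot would hold
only the last four, oldest first, while recording carries on uninterrupted.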

Moreover, our guest/host integrated tracer can pass event buffers from
guest to host with very small overhead, because it uses ftrace ringbuffer
and virtio-serial with splice (so, zero page copying in the guest).
Note that we need the tracing overhead to be as small as possible because
it is always running in the background.

That's why we're using ftrace for our purpose. But anyway, time
synchronization is a common issue. Let's share the solution :)


 And then for the cherry on top a design that works across architectures
 (e.g., x86 now, but arm later).

 I think your proposal is good for the default implementation, since it doesn't
 depend on arch-specific features. However, since the physical timer (clock)
 interfaces and virtualization interfaces depend strongly on the arch,
 I guess the optimized implementations will differ on each arch.
 For example, maybe we can export the tsc-offset to the guest to adjust the
 clock on x86, but not on ARM or other devices. In that case, until an
 optimized one is implemented, we can use the paravirt perf_clock.
 
 So this MSR read takes about 1.6usecs (from 'perf stat kvm live') and 
 that is total time between VMEXIT and VMENTRY. The time it takes to run 
 perf_clock in the host should be a very small part of that 1.6 usec. 

Yeah, a hypercall is always a heavy operation. So that is not the best
solution; we need an optimized one for each arch.

 I'll take a look at the TSC path to see how it is optimized (suggestions 
 appreciated).

At least on machines which have a stable TSC, we can rely on that.
We just need the tsc-offset to adjust it in the guest. Note that this
offset can change if the guest sleeps/resumes or is live-migrated.
Each time that happens, we need to refresh the tsc-offset.

 Another thought is to make the use of pv_perf_clock an option -- user 
 can knowingly decide the additional latency/overhead is worth the feature.

Yeah. BTW, have you looked at paravirt_sched_clock (pv_time_ops)?
It seems that such a synchronized clock is already there.

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Re: RFC: paravirtualizing perf_clock

2013-10-30 Thread Masami Hiramatsu
(2013/10/29 11:58), David Ahern wrote:
 On 10/28/13 7:15 AM, Peter Zijlstra wrote:
 Any suggestions on how to do this and without impacting performance. I
 noticed the MSR path seems to take about twice as long as the current
 implementation (which I believe results in rdtsc in the VM for x86 with
 stable TSC).

 So assuming all the TSCs are in fact stable; you could implement this by
 syncing up the guest TSC to the host TSC on guest boot. I don't think
 anything _should_ rely on the absolute TSC value.

 Of course you then also need to make sure the host and guest tsc
 multipliers (cyc2ns) are identical, you can play games with
 cyc2ns_offset if you're brave.

 
 This and the method Gleb mentioned both are going to be complex and 
 fragile -- based on assumptions about how the perf_clock timestamps are 
 generated. For example, 489223e assumes you have the tracepoint enabled 
 at VM start with some means of capturing the data (e.g., a perf-session 
 active). In both cases the end result requires piecing together and 
 re-generating the VM's timestamp on the events. For perf this means 
 either modifying the tool to take parameters and an algorithm on how to 
 modify the timestamp or a homegrown tool to regenerate the file with 
 updated timestamps.
 
 To back out a bit, my end goal is to be able to create and merge 
 perf-events from any context on a KVM-based host -- guest userspace, 
 guest kernel space, host userspace and host kernel space (userspace 
 events with a perf-clock timestamp is another topic ;-)).

That is almost the same as what we (Yoshihiro and I) are trying to do with
integrated tracing. We are doing it with ftrace and trace-cmd (but perhaps
it will eventually work with perf-ftrace).

 Having the 
 events generated with the proper timestamp is the simpler approach than 
 trying to collect various tidbits of data, massage timestamps (and 
 hoping the clock source hasn't changed) and then merge events.

Yeah, if possible, we'd like to use it too.
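
Once host and guest events carry timestamps from one shared clock, merging them
is an ordinary two-way merge of sorted streams. A minimal sketch, assuming each
stream is already sorted by timestamp; struct ev and merge_events are
hypothetical names and not part of perf or trace-cmd:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum ev_source { EV_HOST, EV_GUEST };

struct ev {
    uint64_t ts;          /* timestamp on the shared clock, in ns */
    enum ev_source src;   /* which side produced the event */
};

/*
 * Merge two timestamp-sorted arrays into out[], which must have room for
 * na + nb entries. On equal timestamps the first array wins (an arbitrary
 * choice for this sketch). Returns the number of events written.
 */
static size_t merge_events(const struct ev *a, size_t na,
                           const struct ev *b, size_t nb,
                           struct ev *out)
{
    size_t i = 0, j = 0, k = 0;

    assert(out != NULL);
    while (i < na && j < nb)
        out[k++] = (a[i].ts <= b[j].ts) ? a[i++] : b[j++];
    while (i < na)
        out[k++] = a[i++];
    while (j < nb)
        out[k++] = b[j++];
    return k;
}
```

No timestamp massaging is needed in this case, which is the simplification a
common clock would buy.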

 
 And then for the cherry on top a design that works across architectures 
 (e.g., x86 now, but arm later).

I think your proposal is good for the default implementation, since it doesn't
depend on arch-specific features. However, since the physical timer (clock)
interfaces and virtualization interfaces depend strongly on the arch,
I guess the optimized implementations will differ on each arch.
For example, maybe we can export the tsc-offset to the guest to adjust the
clock on x86, but not on ARM or other devices. In that case, until an
optimized one is implemented, we can use the paravirt perf_clock.

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com




Re: trace_printk() support in trace-cmd

2010-12-16 Thread Masami Hiramatsu
(2010/12/16 19:20), Avi Kivity wrote:
 On 12/13/2010 01:20 PM, Masami Hiramatsu wrote:
 (2010/12/13 2:47), Avi Kivity wrote:
   On 12/12/2010 07:43 PM, Arnaldo Carvalho de Melo wrote:
   On Sun, Dec 12, 2010 at 07:42:06PM +0200, Avi Kivity wrote:
  On 12/12/2010 07:36 PM, Arnaldo Carvalho de Melo wrote:
  On Sun, Dec 12, 2010 at 06:35:24PM +0200, Avi Kivity wrote:
  On 11/23/2010 05:45 PM, Steven Rostedt wrote:
  Again, the work around is to replace your trace_printks() with
  __trace_printk(_THIS_IP_, ...) or just modify the 
  trace_printk() macro
  in include/linux/kernel.h to always use the __trace_printk() 
  version.
  
  This works; I'm using it for now (I tried to use 'perf probe', 
  but I
  get unpredictable results, like null pointer derefs).
  
  Can you tell us which functions, environment, etc?
   
  Something around 2.6.27-rc4; example functions are FNAME(fetch) in
  arch/x86/kvm/paging_tmpl.h; compiled modular (which was Steven's
  guess as to why it fails).
   
  (note, the failure is with trace-cmd, not /sys/kernel/debug/tracing).
 
   I mean the I tried to use 'perf probe' part.
 
   Well, same, more or less.
 
 perf probe -m kvm --add 'fetch_access=paging64_fetch pt_access=gw->pt_access pte_access=gw->pte_access dirty'
 
   would return garbage for gw->*, and the log would show the exception 
  handler called.  gw is most certainly valid.
 

 Thank you for reporting.
 Hmm, actually, pagefaults could happen on fetching variables. But
 fetching argument routines should handle it...
 
 They did handle it (or so I understood from the logs).  But they shouldn't
 have occurred in the first place, since gw was dereferenceable (and the
 function dereferences it).

Ah, OK. Sometimes, it's hard to find the register/memory location of
local variables. (and sometimes it fails)

  So something went wrong while fetching gw itself (do you interpret the
 dwarf tables to find where the variable is stored?)

Hm, yes, you can use eu-readelf to dump debuginfo, and also objdump will help
you to find the address and assembler code.

 
 I'd like to check it; could you tell me the details? For example, the
 exception log, the kprobe-tracer's event definition (you can see it via
 debugfs/tracing/kprobe_events), and the result of
 `perf probe -L paging64_fetch:0-10`.
 
 I no longer have the logs, I'll try to reproduce it later.

Oh, Thank you! :)


-- 
Masami HIRAMATSU
2nd Dept. Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu...@hitachi.com


Re: trace_printk() support in trace-cmd

2010-12-13 Thread Masami Hiramatsu
(2010/12/13 2:47), Avi Kivity wrote:
 On 12/12/2010 07:43 PM, Arnaldo Carvalho de Melo wrote:
 On Sun, Dec 12, 2010 at 07:42:06PM +0200, Avi Kivity wrote:
   On 12/12/2010 07:36 PM, Arnaldo Carvalho de Melo wrote:
   On Sun, Dec 12, 2010 at 06:35:24PM +0200, Avi Kivity wrote:
  On 11/23/2010 05:45 PM, Steven Rostedt wrote:
  Again, the work around is to replace your trace_printks() with
  __trace_printk(_THIS_IP_, ...) or just modify the trace_printk() 
  macro
  in include/linux/kernel.h to always use the __trace_printk() 
  version.
   
  This works; I'm using it for now (I tried to use 'perf probe', but I
  get unpredictable results, like null pointer derefs).
   
   Can you tell us which functions, environment, etc?
 
   Something around 2.6.27-rc4; example functions are FNAME(fetch) in
   arch/x86/kvm/paging_tmpl.h; compiled modular (which was Steven's
   guess as to why it fails).
 
   (note, the failure is with trace-cmd, not /sys/kernel/debug/tracing).

 I mean the I tried to use 'perf probe' part.
 
 Well, same, more or less.
 
   perf probe -m kvm --add 'fetch_access=paging64_fetch pt_access=gw->pt_access pte_access=gw->pte_access dirty'
 
 would return garbage for gw->*, and the log would show the exception handler 
 called.  gw is most certainly valid.
 

Thank you for reporting.
Hmm, actually, page faults could happen while fetching variables. But the
argument-fetching routines should handle that...
I'd like to check it; could you tell me the details? For example, the
exception log, the kprobe-tracer's event definition (you can see it via
debugfs/tracing/kprobe_events), and the result of
`perf probe -L paging64_fetch:0-10`.

Best regards,

-- 
Masami HIRAMATSU
2nd Dept. Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
E-mail: masami.hiramatsu...@hitachi.com


Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

2010-03-16 Thread Masami Hiramatsu
Joerg Roedel wrote:
 On Tue, Mar 16, 2010 at 12:25:00PM +0100, Ingo Molnar wrote:
 Hm, that sounds rather messy if we want to use it to basically expose kernel
 functionality in a guest/host unified way. Is the qemu process discoverable
 in some secure way? Can we trust it? Is there some proper tooling available
 to do it, or do we have to push it through 2-3 packages to get such a useful
 feature done?
 
 Since we want to implement a pmu usable for the guest anyway why we
 don't just use a guests perf to get all information we want? If we get a
 pmu-nmi from the guest we just re-inject it to the guest and perf in the
 guest gives us all information we want including kernel and userspace
 symbols, stack traces, and so on.

I guess this aims to get information from old environments running on
kvm for life extension :)

 In the previous thread we discussed about a direct trace channel between
 guest and host kernel (which can be used for ftrace events for example).
 This channel could be used to transport this information to the host
 kernel.

Interesting! I know the people who are trying to do that with systemtap.
See, http://vesper.sourceforge.net/

 
 The only additional feature needed is a way for the host to start a perf
 instance in the guest.

# ssh localguest perf record --host-chanel ... ? B-)

Thank you,

 
 Opinions?
 
 
   Joerg
 

-- 
Masami Hiramatsu
e-mail: mhira...@redhat.com


Re: [TOOL] c2kpe: C expression to kprobe event format converter

2009-08-30 Thread Masami Hiramatsu
Frederic Weisbecker wrote:
 On Thu, Aug 13, 2009 at 04:59:19PM -0400, Masami Hiramatsu wrote:
 This program converts probe point in C expression to kprobe event
 format for kprobe-based event tracer. This helps to define kprobes
 events by C source line number or function name, and local variable
 name. Currently, this supports only x86(32/64) kernels.


 Compile
 
 Before compilation, please install libelf and libdwarf development
 packages.
 (e.g. elfutils-libelf-devel and libdwarf-devel on Fedora)
 
 
 This may probably need a specific libdwarf version?
 
 c2kpe.c: In function ‘die_get_entrypc’:
 c2kpe.c:422: error: ‘Dwarf_Ranges’ undeclared (first use in this function)
 c2kpe.c:422: error: (Each undeclared identifier is reported only once
 c2kpe.c:422: error: for each function it appears in.)
 c2kpe.c:422: error: ‘ranges’ undeclared (first use in this function)
 c2kpe.c:447: warning: implicit declaration of function ‘dwarf_get_ranges’
 c2kpe.c:451: warning: implicit declaration of function ‘dwarf_ranges_dealloc’

Ah, sure, it should be compiled with a libdwarf newer than 20090324.
You can find it at http://reality.sgiweb.org/davea/dwarf.html

BTW, libdwarf and libdw (which is yet another implementation of a
DWARF library) are still under development; e.g. libdwarf doesn't
support gcc-4.4.1 (very new) and only the latest libdw (0.142)
supports it. So perhaps I had better port it to libdw, even though that is
less documented...:(

 TODO
 
  - Fix bugs.
  - Support multiple probepoints from stdin.
  - Better kmodule support.
  - Use elfutils-libdw?
  - Merge into trace-cmd or perf-tools?
 
 
 Yeah definitely, that would be a veeery interesting thing to have.
 I've played with kprobe ftrace to debug something this evening.
 
 It's very cool to be able to put dynamic tracepoints in desired places.
 
 But...
 I firstly needed to put random trace_printk() in some places to
 observe some variables values. And then I thought about the kprobes
 tracer and realized I could do that without the need of rebuilding
 my kernel. Then I've played with it and indeed it works well and
 it's useful, but at the cost of reading objdump based assembly
 code to find the places where I could find my variables values.
 And after two or three probes in such conditions, I've become
 tired of that, then I wanted to try this tool.
 
 
 While I cannot yet because of this build error, I can imagine
 the power of such facility from perf.
 
 We could have a perf probe that creates a kprobe event in debugfs
 (default enable = 0) and which then rely on perf record for the actual
 recording.
 
 Then we could analyse it through perf trace.
 Let's imagine a simple example:
 
 int foo(int arg1, int arg2)
 {
   int var1;
 
   var1 = arg1;
   var1 *= arg2;
   var1 -= arg1;
 
 -- insert a probe here (file bar.c : line 60)
 
   var1 ^= ...
 
   return var1;
 }
 
 ./perf kprobe --file bar.c:60 --action arg1=%d,arg2=%d,var1=%d -- ls -R /

I recommend it should be separated from record, like below:

# set new event
./perf kprobe --add kprobe:event1 --file bar.c:60 --action arg1=%d,arg2=%d,var1=%d
# record new event
./perf record -e kprobe:event1 -a -R -- ls -R /

This will allow the tool to focus on one thing: converting C expressions to
kprobe-tracer events. The new event can also be listed just like tracepoint
events.

 ./perf trace
   arg1=1 arg2=1 var1=0
   arg1=2 arg2=2 var1=2
   etc..
 
 You may want to sort by field:
 
 ./perf trace -s arg1 --order desc
 arg1=1
   |
   --- arg2=1 var=1
   |
   --- arg2=2 var=1
 
 arg1=2
   |
   --- arg2=1 var=0
   |
   --- [...]
 
 ./perf trace -s arg1,arg2 --order asc
 arg1=1
   |
   --- arg2=1
 |
 - var1=0
 |
 - var1=
   arg2=...
 |
 
 Ok the latter is a bad example because var1 will always have only one
 value for a given arg1 and arg2. But I guess you see the point.
 
 You won't have to care about the perf trace part, it's already
 implemented and I'll soon handle the sorting part.
 
 All we need is the perf kprobes that translate a C level
 probing expression to a /debug/tracing/kprobe_events compliant
 thing. And then just call perf record with the new created
 event as an argument.

Indeed, that's what I imagine.

Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



[PATCH tracing/kprobes 1/4] x86: Fix x86 instruction decoder selftest to check only .text

2009-08-21 Thread Masami Hiramatsu
Fix x86 instruction decoder selftest to check only .text because other
sections (e.g. .notes) will have random bytes which don't need to be checked.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Avi Kivity a...@redhat.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Frank Ch. Eigler f...@redhat.com
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Jason Baron jba...@redhat.com
Cc: K.Prasad pra...@linux.vnet.ibm.com
Cc: Lai Jiangshan la...@cn.fujitsu.com
Cc: Li Zefan l...@cn.fujitsu.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
Cc: Roland McGrath rol...@redhat.com
Cc: Sam Ravnborg s...@ravnborg.org
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Vegard Nossum vegard.nos...@gmail.com
---

 arch/x86/tools/Makefile |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/tools/Makefile b/arch/x86/tools/Makefile
index 3dd626b..95e9cc4 100644
--- a/arch/x86/tools/Makefile
+++ b/arch/x86/tools/Makefile
@@ -1,6 +1,6 @@
 PHONY += posttest
 quiet_cmd_posttest = TEST$@
-      cmd_posttest = $(OBJDUMP) -d $(objtree)/vmlinux | awk -f $(srctree)/arch/x86/tools/distill.awk | $(obj)/test_get_len
+      cmd_posttest = $(OBJDUMP) -d -j .text $(objtree)/vmlinux | awk -f $(srctree)/arch/x86/tools/distill.awk | $(obj)/test_get_len
 
 posttest: $(obj)/test_get_len vmlinux
 	$(call cmd,posttest)


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com


[PATCH tracing/kprobes 2/4] x86: Check awk features before generating inat-tables.c

2009-08-21 Thread Masami Hiramatsu
Check some awk features which old mawk doesn't support.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Avi Kivity a...@redhat.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Frank Ch. Eigler f...@redhat.com
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Jason Baron jba...@redhat.com
Cc: K.Prasad pra...@linux.vnet.ibm.com
Cc: Lai Jiangshan la...@cn.fujitsu.com
Cc: Li Zefan l...@cn.fujitsu.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
Cc: Roland McGrath rol...@redhat.com
Cc: Sam Ravnborg s...@ravnborg.org
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Vegard Nossum vegard.nos...@gmail.com
---

 arch/x86/tools/gen-insn-attr-x86.awk |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index 93b62c9..19ba096 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -4,7 +4,25 @@
 #
 # Usage: awk -f gen-insn-attr-x86.awk x86-opcode-map.txt > inat-tables.c
 
+# Awk implementation sanity check
+function check_awk_implement() {
+	if (!match("abc", "[[:lower:]]+"))
+		return "Your awk doesn't support charactor-class."
+	if (sprintf("%x", 0) != "0")
+		return "Your awk has a printf-format problem."
+	return ""
+}
+
 BEGIN {
+	# Implementation error checking
+	awkchecked = check_awk_implement()
+	if (awkchecked != "") {
+		print "Error: " awkchecked > "/dev/stderr"
+		print "Please try to use gawk." > "/dev/stderr"
+		exit 1
+	}
+
+	# Setup generating tables
 	print "/* x86 opcode map generated from x86-opcode-map.txt */"
 	print "/* Do not change this code. */"
 	ggid = 1
@@ -293,6 +311,8 @@ function convert_operands(opnd,   i,imm,mod)
 }
 
 END {
+	if (awkchecked != "")
+		exit 1
 	# print escape opcode map's array
 	print "/* Escape opcode map array */"
 	print "const insn_attr_t const *inat_escape_tables[INAT_ESC_MAX + 1]" \


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com


[PATCH tracing/kprobes 3/4] tracing/kprobes: Fix format typo in trace_kprobes

2009-08-21 Thread Masami Hiramatsu
Fix a format typo in kprobe-tracer.

Currently, it shows 'tsize' in format;

$ cat /debug/tracing/events/kprobes/event/format 
...
field: unsigned long ip;offset:16;tsize:8;
field: int nargs;   offset:24;tsize:4;
...

This should be '\tsize';

$ cat /debug/tracing/events/kprobes/event/format 
...
field: unsigned long ip;offset:16;  size:8;
field: int nargs;   offset:24;  size:4;
...

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Avi Kivity a...@redhat.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Frank Ch. Eigler f...@redhat.com
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Jason Baron jba...@redhat.com
Cc: K.Prasad pra...@linux.vnet.ibm.com
Cc: Lai Jiangshan la...@cn.fujitsu.com
Cc: Li Zefan l...@cn.fujitsu.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
Cc: Roland McGrath rol...@redhat.com
Cc: Sam Ravnborg s...@ravnborg.org
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Vegard Nossum vegard.nos...@gmail.com
---

 kernel/trace/trace_kprobe.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 7cd726e..22e91c0 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1069,7 +1069,7 @@ static int __probe_event_show_format(struct trace_seq *s,
 #define SHOW_FIELD(type, item, name)					\
 	do {								\
 		ret = trace_seq_printf(s, "\tfield: " #type " %s;\t"	\
-				"offset:%u;tsize:%u;\n", name,		\
+				"offset:%u;\tsize:%u;\n", name,		\
 				(unsigned int)offsetof(typeof(field), item),\
 				(unsigned int)sizeof(type));		\
 		if (!ret)						\


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com


[PATCH tracing/kprobes 4/4] tracing/kprobes: Change trace_arg to probe_arg

2009-08-21 Thread Masami Hiramatsu
Change trace_arg_string() and parse_trace_arg() to probe_arg_string()
and parse_probe_arg(), since those are kprobe-tracer local functions.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Avi Kivity a...@redhat.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Frank Ch. Eigler f...@redhat.com
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Jason Baron jba...@redhat.com
Cc: K.Prasad pra...@linux.vnet.ibm.com
Cc: Lai Jiangshan la...@cn.fujitsu.com
Cc: Li Zefan l...@cn.fujitsu.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
Cc: Roland McGrath rol...@redhat.com
Cc: Sam Ravnborg s...@ravnborg.org
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Vegard Nossum vegard.nos...@gmail.com
---

 kernel/trace/trace_kprobe.c |   18 +-
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 22e91c0..783d2db 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -220,7 +220,7 @@ static __kprobes void *probe_address(struct trace_probe *tp)
 	return (probe_is_return(tp)) ? tp->rp.kp.addr : tp->kp.addr;
 }
 
-static int trace_arg_string(char *buf, size_t n, struct fetch_func *ff)
+static int probe_arg_string(char *buf, size_t n, struct fetch_func *ff)
 {
 	int ret = -EINVAL;
 
@@ -250,7 +250,7 @@ static int trace_arg_string(char *buf, size_t n, struct fetch_func *ff)
 		if (ret >= n)
 			goto end;
 		l += ret;
-		ret = trace_arg_string(buf + l, n - l, &id->orig);
+		ret = probe_arg_string(buf + l, n - l, &id->orig);
 		if (ret < 0)
 			goto end;
 		l += ret;
@@ -408,7 +408,7 @@ static int split_symbol_offset(char *symbol, long *offset)
 #define PARAM_MAX_ARGS 16
 #define PARAM_MAX_STACK (THREAD_SIZE / sizeof(unsigned long))
 
-static int parse_trace_arg(char *arg, struct fetch_func *ff, int is_return)
+static int parse_probe_arg(char *arg, struct fetch_func *ff, int is_return)
 {
 	int ret = 0;
 	unsigned long param;
@@ -499,7 +499,7 @@ static int parse_trace_arg(char *arg, struct fetch_func *ff, int is_return)
 			if (!id)
 				return -ENOMEM;
 			id->offset = offset;
-			ret = parse_trace_arg(arg, &id->orig, is_return);
+			ret = parse_probe_arg(arg, &id->orig, is_return);
 			if (ret)
 				kfree(id);
 			else {
@@ -617,7 +617,7 @@ static int create_trace_probe(int argc, char **argv)
 			ret = -ENOSPC;
 			goto error;
 		}
-		ret = parse_trace_arg(argv[i], &tp->args[i], is_return);
+		ret = parse_probe_arg(argv[i], &tp->args[i], is_return);
 		if (ret)
 			goto error;
 	}
@@ -680,7 +680,7 @@ static int probes_seq_show(struct seq_file *m, void *v)
 		seq_printf(m, " 0x%p", probe_address(tp));
 
 	for (i = 0; i < tp->nr_args; i++) {
-		ret = trace_arg_string(buf, MAX_ARGSTR_LEN, &tp->args[i]);
+		ret = probe_arg_string(buf, MAX_ARGSTR_LEN, &tp->args[i]);
 		if (ret < 0) {
 			pr_warning("Argument%d decoding error(%d).\n", i, ret);
 			return ret;
@@ -996,7 +996,7 @@ static int kprobe_event_define_fields(struct ftrace_event_call *event_call)
 		sprintf(buf, "arg%d", i);
 		DEFINE_FIELD(unsigned long, args[i], buf, 0);
 		/* Set argument string as an alias field */
-		ret = trace_arg_string(buf, MAX_ARGSTR_LEN, &tp->args[i]);
+		ret = probe_arg_string(buf, MAX_ARGSTR_LEN, &tp->args[i]);
 		if (ret < 0)
 			return ret;
 		DEFINE_FIELD(unsigned long, args[i], buf, 0);
@@ -1023,7 +1023,7 @@ static int kretprobe_event_define_fields(struct ftrace_event_call *event_call)
 		sprintf(buf, "arg%d", i);
 		DEFINE_FIELD(unsigned long, args[i], buf, 0);
 		/* Set argument string as an alias field */
-		ret = trace_arg_string(buf, MAX_ARGSTR_LEN, &tp->args[i]);
+		ret = probe_arg_string(buf, MAX_ARGSTR_LEN, &tp->args[i]);
 		if (ret < 0)
 			return ret;
 		DEFINE_FIELD(unsigned long, args[i], buf, 0);
@@ -1040,7 +1040,7 @@ static int __probe_event_show_format(struct trace_seq *s,
 
 	/* Show aliases */
 	for (i = 0; i < tp->nr_args; i++) {
-		ret = trace_arg_string(buf, MAX_ARGSTR_LEN, &tp->args[i]);
+   ret = probe_arg_string

Re: [PATCH -tip v14 01/12] x86: instruction decoder API

2009-08-20 Thread Masami Hiramatsu

Frederic Weisbecker wrote:

On Thu, Aug 13, 2009 at 04:34:13PM -0400, Masami Hiramatsu wrote:

Add x86 instruction decoder to arch-specific libraries. This decoder
can decode x86 instructions used in kernel into prefix, opcode, modrm,
sib, displacement and immediates. This can also show the length of
instructions.

This version introduces instruction attributes for decoding instructions.
The instruction attribute tables are generated from the opcode map file
(x86-opcode-map.txt) by the generator script(gen-insn-attr-x86.awk).

Currently, the opcode maps are based on opcode maps in Intel(R) 64 and
IA-32 Architectures Software Developers Manual Vol.2: Appendix.A,
and consist of below two types of opcode tables.

1-byte/2-bytes/3-bytes opcodes, which has 256 elements, are
written as below;

  Table: table-name
  Referrer: escaped-name
  opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
   (or)
  opcode: escape # escaped-name
  EndTable

Group opcodes, which has 8 elements, are written as below;

  GrpTable: GrpXXX
  reg:  mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
  EndTable

These opcode maps include a few SSE and FP opcodes (for setup), because
those opcodes are used in the kernel.
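
As background for the 8-element group tables: a group opcode is selected by the
3-bit reg field of the ModRM byte (bits 5..3), which is why each GrpTable has
eight entries. The following is a minimal userspace sketch of that lookup; the
names split_modrm and group_lookup are assumptions for this example and are not
the decoder API added by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The three fields packed into an x86 ModRM byte. */
struct modrm_fields {
    uint8_t mod;  /* bits 7..6: addressing mode */
    uint8_t reg;  /* bits 5..3: register, or opcode extension for groups */
    uint8_t rm;   /* bits 2..0: register or memory operand */
};

/* Split a ModRM byte into its mod, reg and rm fields. */
static struct modrm_fields split_modrm(uint8_t modrm)
{
    struct modrm_fields f;

    f.mod = (uint8_t)(modrm >> 6);
    f.reg = (uint8_t)((modrm >> 3) & 0x7);
    f.rm  = (uint8_t)(modrm & 0x7);
    return f;
}

/* Pick the mnemonic from an 8-entry group table using ModRM.reg. */
static const char *group_lookup(const char *const table[8], uint8_t modrm)
{
    assert(table != NULL);
    return table[split_modrm(modrm).reg];
}
```

For example, with the Grp1 mnemonics (ADD, OR, ADC, SBB, AND, SUB, XOR, CMP) as
the table, a ModRM byte of 0xF8 has reg = 7 and would select CMP.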




I'm getting the following build error on an old K7 box:

arch/x86/lib/inat.c: In function ‘inat_get_opcode_attribute’:
arch/x86/lib/inat.c:29: error: ‘inat_primary_table’ undeclared (first use in this function)
arch/x86/lib/inat.c:29: error: (Each undeclared identifier is reported only once
arch/x86/lib/inat.c:29: error: for each function it appears in.)


Thanks for reporting!
Hmm, it seems that inat-tables.c is not correctly generated.
Could you tell me which awk you used and send the inat-tables.c?

Thank you,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: [PATCH -tip v14 01/12] x86: instruction decoder API

2009-08-20 Thread Masami Hiramatsu

Frederic Weisbecker wrote:

On Thu, Aug 20, 2009 at 01:42:31AM +0200, Frederic Weisbecker wrote:

On Thu, Aug 13, 2009 at 04:34:13PM -0400, Masami Hiramatsu wrote:

Add x86 instruction decoder to arch-specific libraries. This decoder
can decode x86 instructions used in kernel into prefix, opcode, modrm,
sib, displacement and immediates. This can also show the length of
instructions.

This version introduces instruction attributes for decoding instructions.
The instruction attribute tables are generated from the opcode map file
(x86-opcode-map.txt) by the generator script(gen-insn-attr-x86.awk).

Currently, the opcode maps are based on opcode maps in Intel(R) 64 and
IA-32 Architectures Software Developers Manual Vol.2: Appendix.A,
and consist of below two types of opcode tables.

1-byte/2-bytes/3-bytes opcodes, which has 256 elements, are
written as below;

  Table: table-name
  Referrer: escaped-name
  opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
   (or)
  opcode: escape # escaped-name
  EndTable

Group opcodes, which has 8 elements, are written as below;

  GrpTable: GrpXXX
  reg:  mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
  EndTable

These opcode maps include a few SSE and FP opcodes (for setup), because
those opcodes are used in the kernel.




I'm getting the following build error on an old K7 box:

arch/x86/lib/inat.c: In function ‘inat_get_opcode_attribute’:
arch/x86/lib/inat.c:29: erreur: ‘inat_primary_table’ undeclared (first use in 
this function)
arch/x86/lib/inat.c:29: erreur: (Each undeclared identifier is reported only 
once
arch/x86/lib/inat.c:29: erreur: for each function it appears in.)


I've attached my config. I haven't such problem on a dual x86-64 box.



Actually I have the same problem in x86-64
The content of my arch/x86/lib/inat-tables.c:

/* x86 opcode map generated from x86-opcode-map.txt */
/* Do not change this code. */
/* Table: one byte opcode */
/* Escape opcode map array */
const insn_attr_t const *inat_escape_tables[INAT_ESC_MAX + 1][INAT_LPREFIX_MAX 
+ 1] = {
};

/* Group opcode map array */
const insn_attr_t const *inat_group_tables[INAT_GRP_MAX + 1][INAT_LPREFIX_MAX + 
1] = {
};


I guess there is a problem with the generation of this file.


Aah, you are probably using mawk on Ubuntu 9.04, right?
If so, unfortunately, mawk is still under development.

http://invisible-island.net/mawk/CHANGES


20090727
add check/fix to prevent gsub from recurring to modify on a substring
of the current line when the regular expression is anchored to the
beginning of the line; fixes gawk's anchgsub testcase.

add check for implicit concatenation mistaken for exponent; fixes
gawk's hex testcase.

add character-classes to built-in regular expressions.

^^
Look, this means we can't use char-class expressions like
[:lower:] until this version...

I've also found another bug in mawk-1.3.3-20090728 (the latest one).
It almost works, but:

$ mawk 'BEGIN {printf("0x%x\n", 0)}'
0x1
$ gawk 'BEGIN {printf("0x%x\n", 0)}'
0x0

This bug skips the array element at index 0x0 in inat-tables.c :(

So I recommend installing gawk instead of mawk until mawk
supports all POSIX awk features, since I don't think it is a
good idea to work around all those bugs, which depend on the
implementation (not the specification).


Thank you,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: [PATCH -tip v14 01/12] x86: instruction decoder API

2009-08-20 Thread Masami Hiramatsu

Frederic Weisbecker wrote:

On Thu, Aug 20, 2009 at 11:03:40AM -0400, Masami Hiramatsu wrote:

Frederic Weisbecker wrote:

On Thu, Aug 20, 2009 at 01:42:31AM +0200, Frederic Weisbecker wrote:

On Thu, Aug 13, 2009 at 04:34:13PM -0400, Masami Hiramatsu wrote:

Add x86 instruction decoder to arch-specific libraries. This decoder
can decode x86 instructions used in kernel into prefix, opcode, modrm,
sib, displacement and immediates. This can also show the length of
instructions.

This version introduces instruction attributes for decoding instructions.
The instruction attribute tables are generated from the opcode map file
(x86-opcode-map.txt) by the generator script(gen-insn-attr-x86.awk).

Currently, the opcode maps are based on opcode maps in Intel(R) 64 and
IA-32 Architectures Software Developers Manual Vol.2: Appendix.A,
and consist of below two types of opcode tables.

1-byte/2-bytes/3-bytes opcodes, which has 256 elements, are
written as below;

   Table: table-name
   Referrer: escaped-name
   opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
(or)
   opcode: escape # escaped-name
   EndTable

Group opcodes, which has 8 elements, are written as below;

   GrpTable: GrpXXX
   reg:  mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
   EndTable

These opcode maps include a few SSE and FP opcodes (for setup), because
those opcodes are used in the kernel.




I'm getting the following build error on an old K7 box:

arch/x86/lib/inat.c: In function ‘inat_get_opcode_attribute’:
arch/x86/lib/inat.c:29: erreur: ‘inat_primary_table’ undeclared (first use in 
this function)
arch/x86/lib/inat.c:29: erreur: (Each undeclared identifier is reported only 
once
arch/x86/lib/inat.c:29: erreur: for each function it appears in.)


I've attached my config. I haven't such problem on a dual x86-64 box.



Actually I have the same problem in x86-64
The content of my arch/x86/lib/inat-tables.c:

/* x86 opcode map generated from x86-opcode-map.txt */
/* Do not change this code. */
/* Table: one byte opcode */
/* Escape opcode map array */
const insn_attr_t const *inat_escape_tables[INAT_ESC_MAX + 1][INAT_LPREFIX_MAX 
+ 1] = {
};

/* Group opcode map array */
const insn_attr_t const *inat_group_tables[INAT_GRP_MAX + 1][INAT_LPREFIX_MAX + 
1] = {
};


I guess there is a problem with the generation of this file.


Aah, you may use mawk on Ubuntu 9.04, right?
If so, unfortunately, mawk is still under development.

http://invisible-island.net/mawk/CHANGES




Aargh...



20090727
add check/fix to prevent gsub from recurring to modify on a substring
of the current line when the regular expression is anchored to the
beginning of the line; fixes gawk's anchgsub testcase.

add check for implicit concatenation mistaken for exponent; fixes
gawk's hex testcase.

add character-classes to built-in regular expressions.

 ^^
Look, this means we can't use char-class expressions like
[:lower:] until this version...

And I've found another bug in mawk-1.3.3-20090728(the latest one).
it almost works, but;

$ mawk 'BEGIN {printf("0x%x\n", 0)}'
0x1



Ouch, indeed.




$ gawk 'BEGIN {printf("0x%x\n", 0)}'
0x0

This bug skips an array element index 0x0 in inat-tables.c :(

So, I recommend you to install gawk instead mawk until that
supports all posix-awk features, since I don't think it is
good idea to avoid all those bugs which depends on
implementation (not specification).


Thank you,




Yeah, indeed. May be add a warning (or build error) in case the user uses
mawk?


Hmm, it is possible that mawk will fix those bugs and catch up soon,
so I think checking for mawk is not a good idea.
(And since there are other awk implementations, it would not be fair.)

I think all I can do now is report the bugs to the
mawk and Ubuntu people. :-)



Anyway that works fine now with gawk, thanks!
All your patches build well :-)


Thank you for testing!

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: [PATCH -tip v14 01/12] x86: instruction decoder API

2009-08-20 Thread Masami Hiramatsu

Frederic Weisbecker wrote:

On Thu, Aug 20, 2009 at 12:16:05PM -0400, Masami Hiramatsu wrote:

Frederic Weisbecker wrote:

On Thu, Aug 20, 2009 at 11:03:40AM -0400, Masami Hiramatsu wrote:

Frederic Weisbecker wrote:

On Thu, Aug 20, 2009 at 01:42:31AM +0200, Frederic Weisbecker wrote:

On Thu, Aug 13, 2009 at 04:34:13PM -0400, Masami Hiramatsu wrote:

Add x86 instruction decoder to arch-specific libraries. This decoder
can decode x86 instructions used in kernel into prefix, opcode, modrm,
sib, displacement and immediates. This can also show the length of
instructions.

This version introduces instruction attributes for decoding instructions.
The instruction attribute tables are generated from the opcode map file
(x86-opcode-map.txt) by the generator script(gen-insn-attr-x86.awk).

Currently, the opcode maps are based on opcode maps in Intel(R) 64 and
IA-32 Architectures Software Developers Manual Vol.2: Appendix.A,
and consist of below two types of opcode tables.

1-byte/2-bytes/3-bytes opcodes, which has 256 elements, are
written as below;

Table: table-name
Referrer: escaped-name
opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
 (or)
opcode: escape # escaped-name
EndTable

Group opcodes, which has 8 elements, are written as below;

GrpTable: GrpXXX
reg:  mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
EndTable

These opcode maps include a few SSE and FP opcodes (for setup), because
those opcodes are used in the kernel.




I'm getting the following build error on an old K7 box:

arch/x86/lib/inat.c: In function ‘inat_get_opcode_attribute’:
arch/x86/lib/inat.c:29: erreur: ‘inat_primary_table’ undeclared (first use in 
this function)
arch/x86/lib/inat.c:29: erreur: (Each undeclared identifier is reported only 
once
arch/x86/lib/inat.c:29: erreur: for each function it appears in.)


I've attached my config. I haven't such problem on a dual x86-64 box.



Actually I have the same problem in x86-64
The content of my arch/x86/lib/inat-tables.c:

/* x86 opcode map generated from x86-opcode-map.txt */
/* Do not change this code. */
/* Table: one byte opcode */
/* Escape opcode map array */
const insn_attr_t const *inat_escape_tables[INAT_ESC_MAX + 1][INAT_LPREFIX_MAX 
+ 1] = {
};

/* Group opcode map array */
const insn_attr_t const *inat_group_tables[INAT_GRP_MAX + 1][INAT_LPREFIX_MAX + 
1] = {
};


I guess there is a problem with the generation of this file.


Aah, you may use mawk on Ubuntu 9.04, right?
If so, unfortunately, mawk is still under development.

http://invisible-island.net/mawk/CHANGES




Aargh...



20090727
add check/fix to prevent gsub from recurring to modify on a substring
of the current line when the regular expression is anchored to the
beginning of the line; fixes gawk's anchgsub testcase.

add check for implicit concatenation mistaken for exponent; fixes
gawk's hex testcase.

add character-classes to built-in regular expressions.

  ^^
Look, this means we can't use char-class expressions like
[:lower:] until this version...

And I've found another bug in mawk-1.3.3-20090728(the latest one).
it almost works, but;

$ mawk 'BEGIN {printf("0x%x\n", 0)}'
0x1



Ouch, indeed.




$ gawk 'BEGIN {printf("0x%x\n", 0)}'
0x0

This bug skips an array element index 0x0 in inat-tables.c :(

So, I recommend you to install gawk instead mawk until that
supports all posix-awk features, since I don't think it is
good idea to avoid all those bugs which depends on
implementation (not specification).


Thank you,




Yeah, indeed. May be add a warning (or build error) in case the user uses
mawk?


Hmm, it is possible that mawk will fix those bugs and catch up soon,
so, I think checking mawk is not a good idea.
(and since there will be other awk implementations, it's not fair.)

I think what all I can do now is reporting bugs to
mawk and ubuntu people.:-)




Yeah, but without your tip it would have taken me a while to find
the cause.
And the kernel couldn't build anyway.

At least we should do something about this version of mawk.


Hm, indeed.
Maybe we can run an additional sanity-check script before using
awk, like this:

---
res=`echo a | $AWK '/[[:lower:]]+/{print "OK"}'`
[ "$res" != "OK" ] && exit 1

res=`$AWK 'BEGIN {printf("%x", 0)}'`
[ "$res" != "0" ] && exit 1

exit 0
---

Thanks,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: [TOOL] kprobestest : Kprobe stress test tool

2009-08-20 Thread Masami Hiramatsu

Frederic Weisbecker wrote:

On Thu, Aug 13, 2009 at 04:57:20PM -0400, Masami Hiramatsu wrote:

This script tests kprobes to probe on all symbols in the kernel and finds
symbols which must be blacklisted.


Usage
-----
   kprobestest [-s SYMLIST] [-b BLACKLIST] [-w WHITELIST]
  Run stress test. If SYMLIST file is specified, use it as
  an initial symbol list (This is useful for verifying white list
  after diagnosing all symbols).

   kprobestest cleanup
  Cleanup all lists


How to Work
-----------
This tool lists all symbols in the kernel via /proc/kallsyms and sorts
them into groups (64 symbols each by default). It then tests each
group using kprobe-tracer. If the kernel crashes, that group is
moved into the 'failed' dir. If the group passes the test, the
script moves it into the 'passed' dir and saves kprobe_profile into
'passed/profiles/'.
After all groups have been tested, all 'failed' groups are merged and
sorted into smaller groups (divided by 4 by default), and those are
tested again. This loop repeats until every group has just 1 symbol.

Finally, the script sorts all 'passed' symbols into 'tested', 'untested',
and 'missed' based on the profiles.


Note

  - This script just gives us some clues to the blacklisted functions.
In some cases, a combination of probe points will cause a problem, but
each of them doesn't cause the problem alone.

Thank you,




This script easily makes my x86-64 dual-core box lock up hard
on the first batch of symbols to test.
I have one symbol list in the failed and unset directories:

int_very_careful
int_signal
int_restore_rest
stub_clone
stub_fork
stub_vfork
stub_sigaltstack
stub_iopl
ptregscall_common
stub_execve
stub_rt_sigreturn
irq_entries_start
common_interrupt
ret_from_intr
exit_intr
retint_with_reschedule
retint_check
retint_swapgs
retint_restore_args
restore_args
irq_return
retint_careful
retint_signal
retint_kernel
irq_move_cleanup_interrupt
reboot_interrupt
apic_timer_interrupt
generic_interrupt
invalidate_interrupt0
invalidate_interrupt1
invalidate_interrupt2
invalidate_interrupt3
invalidate_interrupt4
invalidate_interrupt5
invalidate_interrupt6
invalidate_interrupt7
threshold_interrupt
thermal_interrupt
mce_self_interrupt
call_function_single_interrupt
call_function_interrupt
reschedule_interrupt
error_interrupt
spurious_interrupt
perf_pending_interrupt
divide_error
overflow
bounds
invalid_op
device_not_available
double_fault
coprocessor_segment_overrun
invalid_TSS
segment_not_present
spurious_interrupt_bug
coprocessor_error
alignment_check
simd_coprocessor_error
native_load_gs_index
gs_change
kernel_thread
child_rip
kernel_execve
call_softirq


I don't have a crash log because I was running with X.
But it also happened with other batch of symbols.


Thank you for reporting, here, I also have a result
tested on k...@x86-64.

native_read_tscp
native_read_msr_safe
native_read_msr_amd_safe
native_write_msr_safe
vmalloc_fault
spurious_fault
search_exception_tables
notify_die
trace_hardirqs_off_caller
ident_complete
lock_acquire
lock_release
bad_address
secondary_startup_64
stack_start
bad_address
restore_args
irq_return
restore
trace_hardirqs_off_thunk
init_level4_pgt
level3_ident_pgt
level3_kernel_pgt
level2_fixmap_pgt
_text
startup_64
level1_fixmap_pgt
level2_ident_pgt
level2_kernel_pgt
level2_spare_pgt
native_get_debugreg
native_set_debugreg
native_set_iopl_mask
native_load_sp0
debug_show_all_locks
debug_check_no_locks_held
valid_state
mark_lock
mark_held_locks
lockdep_trace_alloc
trace_softirqs_on
trace_hardirqs_on_caller
__down_write
__down_read
trace_hardirqs_on_thunk
lockdep_sys_exit_thunk

Most of them can be fixed just by adding __kprobes.
Some of them are already in another section, so kprobes
should check whether a symbol is in that section.


The problem is that I don't have any serial line in this
box then I can't catch any crash log.
My K7 testbox also died in my arms this afternoon.

But I still have two other testboxes (one P2 and one P3),
hopefully I could reproduce the problem in these boxes
in which I can connect a serial line.


Thank you for helping me to find it!


I've pushed your patches in the following git tree:

git://git.kernel.org/pub/scm/linux/kernel/git/fgrederic/random-tracing.git \
tracing/kprobes

So you can send patches on top of this one.


Great! I've found some other trivial bugs, so I'll fix those on top of it.

Thank you,


--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: [TOOL] kprobestest : Kprobe stress test tool

2009-08-20 Thread Masami Hiramatsu

Frederic Weisbecker wrote:

Most of them can be fixed just by adding __kprobes.
Some of them which are already in the another section, kprobes
should check the symbols are in the section.



You mean the blacklist?

I also fear that putting bad kprobed functions into the kprobe
section or into the blacklist may hide some kprobe internal bugs.

Doing so is indeed mandatory for functions that trigger tracing
recursion of things like that, but what if kprobe has an internal
bug that only triggers while probe a certain class of function.

Ie: it would be nice to identify the reason of the crash for
each culprit in these lists.


 That may even help to find the others in advance.

Indeed, I actually found some bugs with this stress test while making
the jump-optimization patches.
But some of them are obviously functions where we just forgot to add
__kprobes, since they are called from the kprobes int3 handling functions.

Also, a lot of lock-related code has changed. I think kprobes should
either use raw_*_lock or prohibit probing lock-monitoring functions
such as lockdep, because probing them causes recursive calls.



Also kprobes seems to be a very fragile feature (that's what
this selftest unearthes at least for me).
And it really needs a recursion detection that stops every kprobing
while reaching a given threshold of recursion. Something
that would dump the stack and the falling kprobe structure.


Hmm, kprobes already has recursion detection (kp->nmiss), so
maybe we can check that.



That would avoid such hard lockups and also help to identify
the dangerous symbols to probe.




The problem is that I don't have any serial line in this
box then I can't catch any crash log.
My K7 testbox also died in my arms this afternoon.

But I still have two other testboxes (one P2 and one P3),
hopefully I could reproduce the problem in these boxes
in which I can connect a serial line.


Thank you for helping me to find it!


I've pushed your patches in the following git tree:

git://git.kernel.org/pub/scm/linux/kernel/git/fgrederic/random-tracing.git \
tracing/kprobes

So you can send patches on top of this one.


Great! I've found another trivial bugs, so I'll fix those on it.


Cool :)

Btw, here is the result of your stress test in a PIII (attaching the log
and the config).


Thanks, I'll check that.

Thank you,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: [PATCH -tip v14 03/12] kprobes: checks probe address is instruction boudary on x86

2009-08-18 Thread Masami Hiramatsu
Frederic Weisbecker wrote:
 On Thu, Aug 13, 2009 at 04:34:28PM -0400, Masami Hiramatsu wrote:
 Ensure safeness of inserting kprobes by checking whether the specified
 address is at the first byte of a instruction on x86.
 This is done by decoding probed function from its head to the probe point.

 Signed-off-by: Masami Hiramatsu mhira...@redhat.com
 Acked-by: Ananth N Mavinakayanahalli ana...@in.ibm.com
 Cc: Avi Kivity a...@redhat.com
 Cc: Andi Kleen a...@linux.intel.com
 Cc: Christoph Hellwig h...@infradead.org
 Cc: Frank Ch. Eigler f...@redhat.com
 Cc: Frederic Weisbecker fweis...@gmail.com
 Cc: H. Peter Anvin h...@zytor.com
 Cc: Ingo Molnar mi...@elte.hu
 Cc: Jason Baron jba...@redhat.com
 Cc: Jim Keniston jkeni...@us.ibm.com
 Cc: K.Prasad pra...@linux.vnet.ibm.com
 Cc: Lai Jiangshan la...@cn.fujitsu.com
 Cc: Li Zefan l...@cn.fujitsu.com
 Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
 Cc: Roland McGrath rol...@redhat.com
 Cc: Sam Ravnborg s...@ravnborg.org
 Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
 Cc: Steven Rostedt rost...@goodmis.org
 Cc: Tom Zanussi tzanu...@gmail.com
 Cc: Vegard Nossum vegard.nos...@gmail.com
 ---

  arch/x86/kernel/kprobes.c |   69 
 +
  1 files changed, 69 insertions(+), 0 deletions(-)

 diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
 index b5b1848..80d493f 100644
 --- a/arch/x86/kernel/kprobes.c
 +++ b/arch/x86/kernel/kprobes.c
 @@ -48,6 +48,7 @@
  #include <linux/preempt.h>
  #include <linux/module.h>
  #include <linux/kdebug.h>
 +#include <linux/kallsyms.h>
  
  #include <asm/cacheflush.h>
  #include <asm/desc.h>
 @@ -55,6 +56,7 @@
  #include <asm/uaccess.h>
  #include <asm/alternative.h>
  #include <asm/debugreg.h>
 +#include <asm/insn.h>
  
  void jprobe_return_end(void);
  
 @@ -245,6 +247,71 @@ retry:
  }
  }
  
 +/* Recover the probed instruction at addr for further analysis. */
 +static int recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr)
 +{
 +struct kprobe *kp;
 +kp = get_kprobe((void *)addr);
 +if (!kp)
 +return -EINVAL;
 +
 +/*
 + *  Basically, kp->ainsn.insn has an original instruction.
 + *  However, RIP-relative instruction can not do single-stepping
 + *  at different place, fix_riprel() tweaks the displacement of
 + *  that instruction. In that case, we can't recover the instruction
 + *  from the kp->ainsn.insn.
 + *
 + *  On the other hand, kp->opcode has a copy of the first byte of
 + *  the probed instruction, which is overwritten by int3. And
 + *  the instruction at kp->addr is not modified by kprobes except
 + *  for the first byte, we can recover the original instruction
 + *  from it and kp->opcode.
 + */
 +memcpy(buf, kp->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
 +buf[0] = kp->opcode;
 +return 0;
 +}
 +
 +/* Dummy buffers for kallsyms_lookup */
 +static char __dummy_buf[KSYM_NAME_LEN];
 +
 +/* Check if paddr is at an instruction boundary */
 +static int __kprobes can_probe(unsigned long paddr)
 +{
 +int ret;
 +unsigned long addr, offset = 0;
 +struct insn insn;
 +kprobe_opcode_t buf[MAX_INSN_SIZE];
 +
 +if (!kallsyms_lookup(paddr, NULL, &offset, NULL, __dummy_buf))
 +return 0;
 +
 +/* Decode instructions */
 +addr = paddr - offset;
 +while (addr < paddr) {
 +kernel_insn_init(&insn, (void *)addr);
 +insn_get_opcode(&insn);
 +
 +/* Check if the instruction has been modified. */
 +if (insn.opcode.bytes[0] == BREAKPOINT_INSTRUCTION) {
 +ret = recover_probed_instruction(buf, addr);
 
 
 
 I'm confused about the reason of this recovering. Is it to remove
 kprobes behind the current setting one in the current function?

No, it just recovers the instruction that is probed by a kprobe,
because we need to know the first byte of this instruction to
decode it.

Perhaps we had better have a more generic interface (text_peek?)
for this, because another subsystem (e.g. kgdb) may also want to insert int3...

Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: [PATCH -tip v14 03/12] kprobes: checks probe address is instruction boudary on x86

2009-08-18 Thread Masami Hiramatsu
Frederic Weisbecker wrote:
 On Tue, Aug 18, 2009 at 07:17:39PM -0400, Masami Hiramatsu wrote:
 Frederic Weisbecker wrote:
 +  while (addr < paddr) {
 +  kernel_insn_init(&insn, (void *)addr);
 +  insn_get_opcode(&insn);
 +
 +  /* Check if the instruction has been modified. */
 +  if (insn.opcode.bytes[0] == BREAKPOINT_INSTRUCTION) {
 +  ret = recover_probed_instruction(buf, addr);



 I'm confused about the reason of this recovering. Is it to remove
 kprobes behind the current setting one in the current function?

 No, it recovers just an instruction which is probed by a kprobe,
 because we need to know the first byte of this instruction for
 decoding it.

Ah, sorry, that was not accurate. The function recovers the instruction
in the buffer (buf), not in the real kernel text. :)


 Perhaps we'd better to have more generic interface (text_peek?)
 for it because another subsystem (e.g. kgdb) may want to insert int3...

 Thank you,
 
 
 Aah, I see now, it's to keep a sane check of the instructions
 boundaries without int 3 artifacts in the middle.
 
 But in that case, you should re-arm the breakpoint after your
 check, right?
 
 Or may be you could do the check without repatching?

Yes, it doesn't modify the kernel text; it just reconstructs the original
instruction in a buffer from the kernel text and the backup byte.

 May be by doing a copy of insn.opcode.bytes and replacing bytes[0]
 with what a random kprobe has stolen?

Hm, no, this function is protected from other kprobes by kprobe_mutex.

Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: [PATCH -tip v14 07/12] tracing: Introduce TRACE_FIELD_ZERO() macro

2009-08-18 Thread Masami Hiramatsu
Frederic Weisbecker wrote:
 On Thu, Aug 13, 2009 at 04:35:01PM -0400, Masami Hiramatsu wrote:
 Use TRACE_FIELD_ZERO(type, item) instead of TRACE_FIELD_ZERO_CHAR(item).
 This also includes a fix of TRACE_ZERO_CHAR() macro.
 
 
 I can't find what the fix is about (see below)

Ah, OK. This patch actually includes two parts.

One is introducing TRACE_FIELD_ZERO which is more generic than
TRACE_FIELD_ZERO_CHAR, I think.

Another is a typo fix of TRACE_ZERO_CHAR.


 Signed-off-by: Masami Hiramatsu mhira...@redhat.com
 Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
 Cc: Avi Kivity a...@redhat.com
 Cc: Andi Kleen a...@linux.intel.com
 Cc: Christoph Hellwig h...@infradead.org
 Cc: Frank Ch. Eigler f...@redhat.com
 Cc: Frederic Weisbecker fweis...@gmail.com
 Cc: H. Peter Anvin h...@zytor.com
 Cc: Ingo Molnar mi...@elte.hu
 Cc: Jason Baron jba...@redhat.com
 Cc: Jim Keniston jkeni...@us.ibm.com
 Cc: K.Prasad pra...@linux.vnet.ibm.com
 Cc: Lai Jiangshan la...@cn.fujitsu.com
 Cc: Li Zefan l...@cn.fujitsu.com
 Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
 Cc: Roland McGrath rol...@redhat.com
 Cc: Sam Ravnborg s...@ravnborg.org
 Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
 Cc: Steven Rostedt rost...@goodmis.org
 Cc: Tom Zanussi tzanu...@gmail.com
 Cc: Vegard Nossum vegard.nos...@gmail.com
 ---

  kernel/trace/trace_event_types.h |4 ++--
  kernel/trace/trace_export.c  |   16 
  2 files changed, 10 insertions(+), 10 deletions(-)

 diff --git a/kernel/trace/trace_event_types.h 
 b/kernel/trace/trace_event_types.h
 index 6db005e..e74f090 100644
 --- a/kernel/trace/trace_event_types.h
 +++ b/kernel/trace/trace_event_types.h
 @@ -109,7 +109,7 @@ TRACE_EVENT_FORMAT(bprint, TRACE_BPRINT, bprint_entry, 
 ignore,
  TRACE_STRUCT(
  TRACE_FIELD(unsigned long, ip, ip)
  TRACE_FIELD(char *, fmt, fmt)
 -TRACE_FIELD_ZERO_CHAR(buf)
 +TRACE_FIELD_ZERO(char, buf)
  ),
  TP_RAW_FMT("%08lx (%d) fmt:%p %s")
  );
 @@ -117,7 +117,7 @@ TRACE_EVENT_FORMAT(bprint, TRACE_BPRINT, bprint_entry, 
 ignore,
  TRACE_EVENT_FORMAT(print, TRACE_PRINT, print_entry, ignore,
  TRACE_STRUCT(
  TRACE_FIELD(unsigned long, ip, ip)
 -TRACE_FIELD_ZERO_CHAR(buf)
 +TRACE_FIELD_ZERO(char, buf)
  ),
  TP_RAW_FMT("%08lx (%d) fmt:%p %s")
  );
 diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
 index 71c8d7f..b0ac92c 100644
 --- a/kernel/trace/trace_export.c
 +++ b/kernel/trace/trace_export.c
 @@ -42,9 +42,9 @@ extern void __bad_type_size(void);
  if (!ret)   \
  return 0;
  
 -#undef TRACE_FIELD_ZERO_CHAR
 -#define TRACE_FIELD_ZERO_CHAR(item) \
 -ret = trace_seq_printf(s, "\tfield:char " #item ";\t"   \
 +#undef TRACE_FIELD_ZERO
 +#define TRACE_FIELD_ZERO(type, item) \
 +ret = trace_seq_printf(s, "\tfield:" #type " " #item ";\t"  \
  "offset:%u;\tsize:0;\n", \
  (unsigned int)offsetof(typeof(field), item)); \
  if (!ret)   \
 @@ -92,9 +92,6 @@ ftrace_format_##call(struct ftrace_event_call *unused, 
 \
  
  #include "trace_event_types.h"
  
 -#undef TRACE_ZERO_CHAR
 -#define TRACE_ZERO_CHAR(arg)
 -
  #undef TRACE_FIELD
  #define TRACE_FIELD(type, item, assign)\
  entry->item = assign;
 @@ -107,6 +104,9 @@ ftrace_format_##call(struct ftrace_event_call *unused,   
 \
  #define TRACE_FIELD_SIGN(type, item, assign, is_signed) \
  TRACE_FIELD(type, item, assign)
  
 +#undef TRACE_FIELD_ZERO
 +#define TRACE_FIELD_ZERO(type, item)
 +
 
 
 
 Is it about the above moving?
 If so, could you just tell so that I can add something about
 it in the changelog.

No, I assume that TRACE_ZERO_CHAR is just a typo of TRACE_FIELD_ZERO_CHAR.
(because I couldn't find any other TRACE_ZERO_CHAR)

BTW, this patch may not be needed after applying patch 10/12, since
it removes ftrace event definitions of TRACE_KPROBE/KRETPROBE.

Perhaps I had better merge and re-split those additional patches (and
drop this change)?
(That could also make the incremental review harder, though...)

Thank you,


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



[PATCH -tip v14 02/12] x86: x86 instruction decoder build-time selftest

2009-08-13 Thread Masami Hiramatsu
Add a user-space selftest of the x86 instruction decoder at kernel build time.
When CONFIG_X86_DECODER_SELFTEST=y, Kbuild builds a test harness for the x86
instruction decoder and runs it after building vmlinux.
The test compares the results of objdump with those of the x86 instruction
decoder code and checks that there are no differences.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Signed-off-by: Jim Keniston jkeni...@us.ibm.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Avi Kivity a...@redhat.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Frank Ch. Eigler f...@redhat.com
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Jason Baron jba...@redhat.com
Cc: K.Prasad pra...@linux.vnet.ibm.com
Cc: Lai Jiangshan la...@cn.fujitsu.com
Cc: Li Zefan l...@cn.fujitsu.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
Cc: Roland McGrath rol...@redhat.com
Cc: Sam Ravnborg s...@ravnborg.org
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Vegard Nossum vegard.nos...@gmail.com
---

 arch/x86/Kconfig.debug|9 +++
 arch/x86/Makefile |3 +
 arch/x86/tools/Makefile   |   15 +
 arch/x86/tools/distill.awk|   42 +++
 arch/x86/tools/test_get_len.c |  113 +
 5 files changed, 182 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/tools/Makefile
 create mode 100644 arch/x86/tools/distill.awk
 create mode 100644 arch/x86/tools/test_get_len.c

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index d105f29..7d0b681 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -186,6 +186,15 @@ config X86_DS_SELFTEST
 config HAVE_MMIOTRACE_SUPPORT
def_bool y
 
+config X86_DECODER_SELFTEST
+     bool "x86 instruction decoder selftest"
+     depends on DEBUG_KERNEL
+	---help---
+	 Perform x86 instruction decoder selftests at build time.
+	 This option is useful for checking the sanity of x86 instruction
+	 decoder code.
+	 If unsure, say N.
+
 #
 # IO delay types:
 #
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 1f3851a..f79580c 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -154,6 +154,9 @@ all: bzImage
 KBUILD_IMAGE := $(boot)/bzImage
 
 bzImage: vmlinux
+ifeq ($(CONFIG_X86_DECODER_SELFTEST),y)
+	$(Q)$(MAKE) $(build)=arch/x86/tools posttest
+endif
 	$(Q)$(MAKE) $(build)=$(boot) $(KBUILD_IMAGE)
 	$(Q)mkdir -p $(objtree)/arch/$(UTS_MACHINE)/boot
 	$(Q)ln -fsn ../../x86/boot/bzImage $(objtree)/arch/$(UTS_MACHINE)/boot/$@
diff --git a/arch/x86/tools/Makefile b/arch/x86/tools/Makefile
new file mode 100644
index 000..3dd626b
--- /dev/null
+++ b/arch/x86/tools/Makefile
@@ -0,0 +1,15 @@
+PHONY += posttest
+quiet_cmd_posttest = TEST    $@
+      cmd_posttest = $(OBJDUMP) -d $(objtree)/vmlinux | awk -f $(srctree)/arch/x86/tools/distill.awk | $(obj)/test_get_len
+
+posttest: $(obj)/test_get_len vmlinux
+	$(call cmd,posttest)
+
+hostprogs-y	:= test_get_len
+
+# -I needed for generated C source and C source which in the kernel tree.
+HOSTCFLAGS_test_get_len.o := -Wall -I$(objtree)/arch/x86/lib/ -I$(srctree)/arch/x86/include/ -I$(srctree)/arch/x86/lib/
+
+# Dependancies are also needed.
+$(obj)/test_get_len.o: $(srctree)/arch/x86/lib/insn.c $(srctree)/arch/x86/lib/inat.c $(srctree)/arch/x86/include/asm/inat_types.h $(srctree)/arch/x86/include/asm/inat.h $(srctree)/arch/x86/include/asm/insn.h $(objtree)/arch/x86/lib/inat-tables.c
+
diff --git a/arch/x86/tools/distill.awk b/arch/x86/tools/distill.awk
new file mode 100644
index 000..d433619
--- /dev/null
+++ b/arch/x86/tools/distill.awk
@@ -0,0 +1,42 @@
+#!/bin/awk -f
+# Usage: objdump -d a.out | awk -f distill.awk | ./test_get_len
+# Distills the disassembly as follows:
+# - Removes all lines except the disassembled instructions.
+# - For instructions that exceed 1 line (7 bytes), crams all the hex bytes
+# into a single line.
+# - Remove bad(or prefix only) instructions
+
+BEGIN {
+	prev_addr = ""
+	prev_hex = ""
+	prev_mnemonic = ""
+	bad_expr = "(\\(bad\\)|^rex|^.byte|^rep(z|nz)$|^lock$|^es$|^cs$|^ss$|^ds$|^fs$|^gs$|^data(16|32)$|^addr(16|32|64))"
+	fwait_expr = "^9b "
+	fwait_str="9b\tfwait"
+}
+
+/^ *[0-9a-f]+:/ {
+	if (split($0, field, "\t") < 3) {
+		# This is a continuation of the same insn.
+		prev_hex = prev_hex field[2]
+	} else {
+		# Skip bad instructions
+		if (match(prev_mnemonic, bad_expr))
+			prev_addr = ""
+		# Split fwait from other f* instructions
+		if (match(prev_hex, fwait_expr) && prev_mnemonic != "fwait") {
+			printf "%s\t%s\n", prev_addr, fwait_str
+			sub(fwait_expr, "", prev_hex)
+		}
+   if (prev_addr

[PATCH -tip v14 01/12] x86: instruction decoder API

2009-08-13 Thread Masami Hiramatsu
Add an x86 instruction decoder to the arch-specific libraries. This decoder
can decode the x86 instructions used in the kernel into prefixes, opcode,
modrm, sib, displacement, and immediates. It can also report the length of
each instruction.

This version introduces instruction attributes for decoding instructions.
The instruction attribute tables are generated from the opcode map file
(x86-opcode-map.txt) by the generator script (gen-insn-attr-x86.awk).

Currently, the opcode maps are based on the opcode maps in Intel(R) 64 and
IA-32 Architectures Software Developer's Manual Vol.2: Appendix A,
and consist of the following two types of opcode tables.

Tables for 1-byte, 2-byte, and 3-byte opcodes, which have 256 elements each,
are written as follows:

 Table: table-name
 Referrer: escaped-name
 opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
  (or)
 opcode: escape # escaped-name
 EndTable

Group opcode tables, which have 8 elements each, are written as follows:

 GrpTable: GrpXXX
 reg:  mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
 EndTable

These opcode maps include a few SSE and FP opcodes (for setup), because
those opcodes are used in the kernel.
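
To make the two table shapes concrete, here is a minimal sketch of how such
tables could be indexed: a 256-entry table by the opcode byte, and an 8-entry
group table by the reg field of the ModRM byte. The type insn_attr_sketch_t
and both helper names are hypothetical; only the two table sizes come from
this patch (INAT_OPCODE_TABLE_SIZE and INAT_GROUP_TABLE_SIZE in inat.h).

```c
#include <stdint.h>

#define INAT_OPCODE_TABLE_SIZE 256
#define INAT_GROUP_TABLE_SIZE 8

/* Hypothetical attribute type; the real attribute encoding is not shown here. */
typedef uint32_t insn_attr_sketch_t;

/* A 256-element opcode table is indexed directly by the opcode byte. */
static insn_attr_sketch_t lookup_opcode(const insn_attr_sketch_t *table,
					uint8_t opcode)
{
	return table[opcode];
}

/* A group table is indexed by the 3-bit reg field, bits 5:3 of ModRM. */
static unsigned int group_index(uint8_t modrm)
{
	return (modrm >> 3) & (INAT_GROUP_TABLE_SIZE - 1);
}
```

For example, a ModRM byte of 0xd8 (binary 11 011 000) selects entry 3 of the
group table.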

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Signed-off-by: Jim Keniston jkeni...@us.ibm.com
Acked-by: H. Peter Anvin h...@zytor.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Avi Kivity a...@redhat.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Frank Ch. Eigler f...@redhat.com
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Jason Baron jba...@redhat.com
Cc: K.Prasad pra...@linux.vnet.ibm.com
Cc: Lai Jiangshan la...@cn.fujitsu.com
Cc: Li Zefan l...@cn.fujitsu.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
Cc: Roland McGrath rol...@redhat.com
Cc: Sam Ravnborg s...@ravnborg.org
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Vegard Nossum vegard.nos...@gmail.com
---

 arch/x86/include/asm/inat.h  |  188 +
 arch/x86/include/asm/inat_types.h|   29 +
 arch/x86/include/asm/insn.h  |  143 +++
 arch/x86/lib/Makefile|   13 +
 arch/x86/lib/inat.c  |   78 
 arch/x86/lib/insn.c  |  464 ++
 arch/x86/lib/x86-opcode-map.txt  |  719 ++
 arch/x86/tools/gen-insn-attr-x86.awk |  314 +++
 8 files changed, 1948 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/inat.h
 create mode 100644 arch/x86/include/asm/inat_types.h
 create mode 100644 arch/x86/include/asm/insn.h
 create mode 100644 arch/x86/lib/inat.c
 create mode 100644 arch/x86/lib/insn.c
 create mode 100644 arch/x86/lib/x86-opcode-map.txt
 create mode 100644 arch/x86/tools/gen-insn-attr-x86.awk

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
new file mode 100644
index 000..2866fdd
--- /dev/null
+++ b/arch/x86/include/asm/inat.h
@@ -0,0 +1,188 @@
+#ifndef _ASM_X86_INAT_H
+#define _ASM_X86_INAT_H
+/*
+ * x86 instruction attributes
+ *
+ * Written by Masami Hiramatsu mhira...@redhat.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ */
+#include <asm/inat_types.h>
+
+/*
+ * Internal bits. Don't use bitmasks directly, because these bits are
+ * unstable. You should use checking functions.
+ */
+
+#define INAT_OPCODE_TABLE_SIZE 256
+#define INAT_GROUP_TABLE_SIZE 8
+
+/* Legacy instruction prefixes */
+#define INAT_PFX_OPNDSZ	1	/* 0x66 */ /* LPFX1 */
+#define INAT_PFX_REPNE	2	/* 0xF2 */ /* LPFX2 */
+#define INAT_PFX_REPE	3	/* 0xF3 */ /* LPFX3 */
+#define INAT_PFX_LOCK	4	/* 0xF0 */
+#define INAT_PFX_CS	5	/* 0x2E */
+#define INAT_PFX_DS	6	/* 0x3E */
+#define INAT_PFX_ES	7	/* 0x26 */
+#define INAT_PFX_FS	8	/* 0x64 */
+#define INAT_PFX_GS	9	/* 0x65 */
+#define INAT_PFX_SS	10	/* 0x36 */
+#define INAT_PFX_ADDRSZ	11	/* 0x67 */
+
+#define INAT_LPREFIX_MAX   3
+
+/* Immediate size */
+#define INAT_IMM_BYTE  1
+#define INAT_IMM_WORD  2
+#define INAT_IMM_DWORD 3
+#define INAT_IMM_QWORD 4
+#define INAT_IMM_PTR   5
+#define INAT_IMM_VWORD32   6

[PATCH -tip v14 00/12] tracing: kprobe-based event tracer and x86 instruction decoder

2009-08-13 Thread Masami Hiramatsu
 a new definition to kprobe_events
as below.

  echo p:myprobe do_sys_open a0 a1 a2 a3 > /sys/kernel/debug/tracing/kprobe_events

 This sets a kprobe at the entry of the do_sys_open() function and records
its 1st to 4th arguments as the myprobe event.

  echo r:myretprobe do_sys_open rv ra >> /sys/kernel/debug/tracing/kprobe_events

 This sets a kretprobe at the return point of the do_sys_open() function and
records the return value and return address as the myretprobe event.
 You can see the format of these events via
/sys/kernel/debug/tracing/events/kprobes/EVENT/format.

  cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format
name: myprobe
ID: 23
format:
field:unsigned short common_type;   offset:0;   size:2;
field:unsigned char common_flags;   offset:2;   size:1;
field:unsigned char common_preempt_count;   offset:3;   size:1;
field:int common_pid;   offset:4;   size:4;
field:int common_tgid;  offset:8;   size:4;

field: unsigned long ip;offset:16;tsize:8;
field: int nargs;   offset:24;tsize:4;
field: unsigned long arg0;  offset:32;tsize:8;
field: unsigned long arg1;  offset:40;tsize:8;
field: unsigned long arg2;  offset:48;tsize:8;
field: unsigned long arg3;  offset:56;tsize:8;

alias: a0;  original: arg0;
alias: a1;  original: arg1;
alias: a2;  original: arg2;
alias: a3;  original: arg3;

print fmt: "%lx: 0x%lx 0x%lx 0x%lx 0x%lx", ip, arg0, arg1, arg2, arg3


 You can see that the event has 4 arguments and the alias expressions
corresponding to them.

  echo > /sys/kernel/debug/tracing/kprobe_events

 This clears all probe points. You can see the traced information via
/sys/kernel/debug/tracing/trace.

  cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
           <...>-1447  [001] 1038282.286875: do_sys_open+0x0/0xd6: 0x3 0x7fffd1ec4440 0x8000 0x0
           <...>-1447  [001] 1038282.286878: sys_openat+0xc/0xe <- do_sys_open: 0xfffe 0x81367a3a
           <...>-1447  [001] 1038282.286885: do_sys_open+0x0/0xd6: 0xff9c 0x40413c 0x8000 0x1b6
           <...>-1447  [001] 1038282.286915: sys_open+0x1b/0x1d <- do_sys_open: 0x3 0x81367a3a
           <...>-1447  [001] 1038282.286969: do_sys_open+0x0/0xd6: 0xff9c 0x4041c6 0x98800 0x10
           <...>-1447  [001] 1038282.286976: sys_open+0x1b/0x1d <- do_sys_open: 0x3 0x81367a3a


 Each line shows when the kernel hits a probe, and "<- SYMBOL" means the
kernel returns from SYMBOL (e.g. "sys_open+0x1b/0x1d <- do_sys_open" means
the kernel returns from do_sys_open to sys_open+0x1b).


Thank you,

---

Masami Hiramatsu (12):
  tracing: Add kprobes event profiling interface
  tracing: Kprobe tracer assigns new event ids for each event
  tracing: Generate names for each kprobe event automatically
  tracing: Kprobe-tracer supports more than 6 arguments
  tracing: add kprobe-based event tracer
  tracing: Introduce TRACE_FIELD_ZERO() macro
  tracing: ftrace dynamic ftrace_event_call support
  x86: add pt_regs register and stack access APIs
  kprobes: cleanup fix_riprel() using insn decoder on x86
  kprobes: checks probe address is instruction boudary on x86
  x86: x86 instruction decoder build-time selftest
  x86: instruction decoder API


 Documentation/trace/kprobetrace.txt  |  148 
 arch/x86/Kconfig.debug               |    9 
 arch/x86/Makefile                    |    3 
 arch/x86/include/asm/inat.h          |  188 +
 arch/x86/include/asm/inat_types.h    |   29 +
 arch/x86/include/asm/insn.h          |  143 
 arch/x86/include/asm/ptrace.h        |   62 ++
 arch/x86/kernel/kprobes.c            |  197 +++--
 arch/x86/kernel/ptrace.c             |  112 +++
 arch/x86/lib/Makefile                |   13 
 arch/x86/lib/inat.c                  |   78 ++
 arch/x86/lib/insn.c                  |  464 +
 arch/x86/lib/x86-opcode-map.txt      |  719 
 arch/x86/tools/Makefile              |   15 
 arch/x86/tools/distill.awk           |   42 +
 arch/x86/tools/gen-insn-attr-x86.awk |  314 +
 arch/x86/tools/test_get_len.c        |  113 +++
 include/linux/ftrace_event.h         |   14 
 include/linux/syscalls.h             |    4 
 include/trace/ftrace.h               |   19 -
 include/trace/syscall.h              |    8 
 kernel/trace/Kconfig                 |   12 
 kernel/trace/Makefile                |    1 
 kernel/trace/trace.h                 |   23 +
 kernel/trace/trace_event_types.h     |    4 
 kernel/trace/trace_events.c          |  119 ++-
 kernel/trace/trace_export.c          |   39 +
 kernel/trace/trace_kprobe.c          | 1234 ++
 kernel/trace/trace_syscalls.c        |   16 
 29 files changed, 3949 insertions(+), 193 deletions(-)
 create mode 100644

[PATCH -tip v14 04/12] kprobes: cleanup fix_riprel() using insn decoder on x86

2009-08-13 Thread Masami Hiramatsu
Clean up fix_riprel() in arch/x86/kernel/kprobes.c by using the x86
instruction decoder.
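
For background, the sketch below shows the two pieces of arithmetic that a
RIP-relative fixup relies on: recognizing RIP-relative addressing from the
ModRM byte, and recomputing the displacement when an instruction is copied
to another address. Both helper names are hypothetical; this is not the
fix_riprel() code itself, which is only partly visible in this patch.

```c
#include <stdint.h>

/*
 * In 64-bit mode, a ModRM byte with mod == 00 and r/m == 101 selects
 * RIP-relative addressing with a 32-bit displacement.
 */
static int modrm_is_rip_relative(uint8_t modrm)
{
	return (modrm & 0xc7) == 0x05;
}

/*
 * Hypothetical helper: recompute the displacement so that an instruction
 * copied from orig_addr to copy_addr still refers to the same target.
 * The instruction length is the same at both addresses, so it cancels out.
 */
static int64_t adjust_rip_disp(uint64_t orig_addr, uint64_t copy_addr,
			       int32_t disp)
{
	return (int64_t)(orig_addr - copy_addr) + disp;
}
```

A caller would still have to check that the result fits in a signed 32-bit
displacement before writing it back into the copied instruction.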

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Avi Kivity a...@redhat.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Frank Ch. Eigler f...@redhat.com
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Jason Baron jba...@redhat.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: K.Prasad pra...@linux.vnet.ibm.com
Cc: Lai Jiangshan la...@cn.fujitsu.com
Cc: Li Zefan l...@cn.fujitsu.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
Cc: Roland McGrath rol...@redhat.com
Cc: Sam Ravnborg s...@ravnborg.org
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Vegard Nossum vegard.nos...@gmail.com
---

 arch/x86/kernel/kprobes.c |  128 -
 1 files changed, 23 insertions(+), 105 deletions(-)

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index 80d493f..98f48d0 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -109,50 +109,6 @@ static const u32 twobyte_is_boostable[256 / 32] = {
/*  --- */
/*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
 };
-static const u32 onebyte_has_modrm[256 / 32] = {
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-   /*  --- */
-   W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 00 */
-   W(0x10, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 10 */
-   W(0x20, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 20 */
-   W(0x30, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 30 */
-   W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 40 */
-   W(0x50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 50 */
-   W(0x60, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0) | /* 60 */
-   W(0x70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 70 */
-   W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */
-   W(0x90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 90 */
-   W(0xa0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* a0 */
-   W(0xb0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* b0 */
-   W(0xc0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* c0 */
-   W(0xd0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */
-   W(0xe0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* e0 */
-   W(0xf0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1)   /* f0 */
-   /*  --- */
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-};
-static const u32 twobyte_has_modrm[256 / 32] = {
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-   /*  --- */
-   W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1) | /* 0f */
-   W(0x10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0) , /* 1f */
-   W(0x20, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1) | /* 2f */
-   W(0x30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 3f */
-   W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 4f */
-   W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 5f */
-   W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 6f */
-   W(0x70, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1) , /* 7f */
-   W(0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 8f */
-   W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 9f */
-   W(0xa0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1) | /* af */
-   W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1) , /* bf */
-   W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* cf */
-   W(0xd0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* df */
-   W(0xe0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* ef */
-   W(0xf0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0)   /* ff */
-   /*  --- */
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-};
 #undef W
 
 struct kretprobe_blackpoint kretprobe_blacklist[] = {
@@ -345,68 +301,30 @@ static int __kprobes is_IF_modifier(kprobe_opcode_t *insn)
 static void __kprobes fix_riprel(struct kprobe *p)
 {
 #ifdef CONFIG_X86_64
-	u8 *insn = p->ainsn.insn;
-   s64 disp;
-   int need_modrm;
-
-   /* Skip legacy instruction prefixes.  */
-   while (1) {
-   switch (*insn) {
-   case 0x66:
-   case 0x67

[PATCH -tip v14 06/12] tracing: ftrace dynamic ftrace_event_call support

2009-08-13 Thread Masami Hiramatsu
Add dynamic ftrace_event_call support to ftrace. Trace engines can add new
ftrace_event_calls to ftrace on the fly. Each operator function of a call
takes a ftrace_event_call data structure as an argument, because these
functions may be shared among several ftrace_event_calls.
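
The reason for passing the call to its own operators can be sketched with a
trimmed-down, hypothetical stand-in for struct ftrace_event_call. The
structure and function names below are assumptions for illustration only.

```c
/* Hypothetical, trimmed-down stand-in for struct ftrace_event_call. */
struct event_call_sketch {
	const char *name;
	int enabled;
	int (*regfunc)(struct event_call_sketch *call);
	void (*unregfunc)(struct event_call_sketch *call);
};

/* One callback pair serves every event: the argument says which one. */
static int shared_reg(struct event_call_sketch *call)
{
	call->enabled = 1;
	return 0;
}

static void shared_unreg(struct event_call_sketch *call)
{
	call->enabled = 0;
}

/* Enable an event through its own function pointer. */
static int enable_event(struct event_call_sketch *call)
{
	return call->regfunc(call);
}
```

With a void argument, as in the old prototypes, one function could not tell
which of several dynamically created events it was being called for.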

Changes from v13:
 - Define remove_subsystem_dir() always (revert a2ca5e03), because
   trace_remove_event_call() uses it.
 - Modify syscall tracer because of ftrace_event_call change.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Acked-by: Frederic Weisbecker fweis...@gmail.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Avi Kivity a...@redhat.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Frank Ch. Eigler f...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Jason Baron jba...@redhat.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: K.Prasad pra...@linux.vnet.ibm.com
Cc: Lai Jiangshan la...@cn.fujitsu.com
Cc: Li Zefan l...@cn.fujitsu.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
Cc: Roland McGrath rol...@redhat.com
Cc: Sam Ravnborg s...@ravnborg.org
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Vegard Nossum vegard.nos...@gmail.com
---

 include/linux/ftrace_event.h  |   14 +++--
 include/linux/syscalls.h  |4 +
 include/trace/ftrace.h|   19 +++
 include/trace/syscall.h   |8 +--
 kernel/trace/trace_events.c   |  119 +
 kernel/trace/trace_export.c   |   23 
 kernel/trace/trace_syscalls.c |   16 +++---
 7 files changed, 125 insertions(+), 78 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 189806b..9af68ce 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -112,13 +112,13 @@ struct ftrace_event_call {
 	struct dentry		*dir;
 	struct trace_event	*event;
 	int			enabled;
-	int			(*regfunc)(void *);
-	void			(*unregfunc)(void *);
+	int			(*regfunc)(struct ftrace_event_call *);
+	void			(*unregfunc)(struct ftrace_event_call *);
 	int			id;
-	int			(*raw_init)(void);
-	int			(*show_format)(struct ftrace_event_call *call,
-					       struct trace_seq *s);
-	int			(*define_fields)(void);
+	int			(*raw_init)(struct ftrace_event_call *);
+	int			(*show_format)(struct ftrace_event_call *,
+					       struct trace_seq *);
+	int			(*define_fields)(struct ftrace_event_call *);
 	struct list_head	fields;
 	int			filter_active;
 	struct event_filter	*filter;
@@ -142,6 +142,8 @@ extern int filter_current_check_discard(struct ftrace_event_call *call,
 
 extern int trace_define_field(struct ftrace_event_call *call, char *type,
 			      char *name, int offset, int size, int is_signed);
+extern int trace_add_event_call(struct ftrace_event_call *call);
+extern void trace_remove_event_call(struct ftrace_event_call *call);
 
 #define is_signed_type(type)	(((type)(-1)) < 0)
 
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 87d06c1..be59d22 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -165,7 +165,7 @@ static void prof_sysexit_disable_##sname(struct ftrace_event_call *event_call) \
 	struct trace_event enter_syscall_print_##sname = {		\
 		.trace			= print_syscall_enter,		\
 	};								\
-	static int init_enter_##sname(void)				\
+	static int init_enter_##sname(struct ftrace_event_call *call)	\
 	{								\
 		int num, id;						\
 		num = syscall_name_to_nr("sys"#sname);			\
@@ -201,7 +201,7 @@ static void prof_sysexit_disable_##sname(struct ftrace_event_call *event_call) \
 	struct trace_event exit_syscall_print_##sname = {		\
 		.trace			= print_syscall_exit,		\
 	};								\
-	static int init_exit_##sname(void)				\
+	static int init_exit_##sname(struct ftrace_event_call *call)	\
 	{								\
 		int num, id;						\
 		num = syscall_name_to_nr("sys"#sname);			\
diff --git

[PATCH -tip v14 07/12] tracing: Introduce TRACE_FIELD_ZERO() macro

2009-08-13 Thread Masami Hiramatsu
Use TRACE_FIELD_ZERO(type, item) instead of TRACE_FIELD_ZERO_CHAR(item).
This also includes a fix for the TRACE_ZERO_CHAR() macro.
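
The effect of adding the type parameter can be sketched in user space with
preprocessor stringification. The structure print_entry_sketch and the macro
FIELD_ZERO_DESC() are hypothetical stand-ins; snprintf() replaces the
kernel's trace_seq_printf() so the example can run outside the kernel.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical entry layout with a zero-length tail field. */
struct print_entry_sketch {
	unsigned long ip;
	char buf[];
};

/*
 * Format the description of a zero-size field. #type and #item turn the
 * macro arguments into string literals, so the type is no longer fixed
 * to char.
 */
#define FIELD_ZERO_DESC(out, len, stype, type, item)			\
	snprintf(out, len,						\
		 "\tfield:" #type " " #item ";\toffset:%u;\tsize:0;\n",	\
		 (unsigned int)offsetof(stype, item))
```

Calling FIELD_ZERO_DESC(out, sizeof(out), struct print_entry_sketch, char, buf)
produces a line containing "field:char buf;", and a type such as
unsigned long would be stringified the same way.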

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Avi Kivity a...@redhat.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Frank Ch. Eigler f...@redhat.com
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Jason Baron jba...@redhat.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: K.Prasad pra...@linux.vnet.ibm.com
Cc: Lai Jiangshan la...@cn.fujitsu.com
Cc: Li Zefan l...@cn.fujitsu.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
Cc: Roland McGrath rol...@redhat.com
Cc: Sam Ravnborg s...@ravnborg.org
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Vegard Nossum vegard.nos...@gmail.com
---

 kernel/trace/trace_event_types.h |    4 ++--
 kernel/trace/trace_export.c      |   16 
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/trace/trace_event_types.h b/kernel/trace/trace_event_types.h
index 6db005e..e74f090 100644
--- a/kernel/trace/trace_event_types.h
+++ b/kernel/trace/trace_event_types.h
@@ -109,7 +109,7 @@ TRACE_EVENT_FORMAT(bprint, TRACE_BPRINT, bprint_entry, ignore,
 	TRACE_STRUCT(
 		TRACE_FIELD(unsigned long, ip, ip)
 		TRACE_FIELD(char *, fmt, fmt)
-		TRACE_FIELD_ZERO_CHAR(buf)
+		TRACE_FIELD_ZERO(char, buf)
 	),
 	TP_RAW_FMT("%08lx (%d) fmt:%p %s")
 );
@@ -117,7 +117,7 @@ TRACE_EVENT_FORMAT(bprint, TRACE_BPRINT, bprint_entry, ignore,
 TRACE_EVENT_FORMAT(print, TRACE_PRINT, print_entry, ignore,
 	TRACE_STRUCT(
 		TRACE_FIELD(unsigned long, ip, ip)
-		TRACE_FIELD_ZERO_CHAR(buf)
+		TRACE_FIELD_ZERO(char, buf)
 	),
 	TP_RAW_FMT("%08lx (%d) fmt:%p %s")
 );
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index 71c8d7f..b0ac92c 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -42,9 +42,9 @@ extern void __bad_type_size(void);
 	if (!ret)							\
 		return 0;
 
-#undef TRACE_FIELD_ZERO_CHAR
-#define TRACE_FIELD_ZERO_CHAR(item)					\
-	ret = trace_seq_printf(s, "\tfield:char " #item ";\t"		\
+#undef TRACE_FIELD_ZERO
+#define TRACE_FIELD_ZERO(type, item)					\
+	ret = trace_seq_printf(s, "\tfield:" #type " " #item ";\t"	\
 			       "offset:%u;\tsize:0;\n",			\
 			       (unsigned int)offsetof(typeof(field), item)); \
 	if (!ret)							\
@@ -92,9 +92,6 @@ ftrace_format_##call(struct ftrace_event_call *unused,			\
 
 #include "trace_event_types.h"
 
-#undef TRACE_ZERO_CHAR
-#define TRACE_ZERO_CHAR(arg)
-
 #undef TRACE_FIELD
 #define TRACE_FIELD(type, item, assign)\
 	entry->item = assign;
@@ -107,6 +104,9 @@ ftrace_format_##call(struct ftrace_event_call *unused,			\
 #define TRACE_FIELD_SIGN(type, item, assign, is_signed)	\
 	TRACE_FIELD(type, item, assign)
 
+#undef TRACE_FIELD_ZERO
+#define TRACE_FIELD_ZERO(type, item)
+
 #undef TP_CMD
 #define TP_CMD(cmd...)	cmd
 
@@ -178,8 +178,8 @@ __attribute__((section("_ftrace_events"))) event_##call = {		\
 	if (ret)							\
 		return ret;
 
-#undef TRACE_FIELD_ZERO_CHAR
-#define TRACE_FIELD_ZERO_CHAR(item)
+#undef TRACE_FIELD_ZERO
+#define TRACE_FIELD_ZERO(type, item)
 
 #undef TRACE_EVENT_FORMAT
 #define TRACE_EVENT_FORMAT(call, proto, args, fmt, tstruct, tpfmt)	\


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com


[PATCH -tip v14 10/12] tracing: Generate names for each kprobe event automatically

2009-08-13 Thread Masami Hiramatsu
Generate names for each kprobe event based on the probe point,
and remove generic k*probe event types because there is no user
of those types.
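
As an illustration of deriving a name from the probe point, the sketch below
builds a name from the probe type, symbol, and offset. The function name
gen_event_name() and the "%c@%s%+ld" format are assumptions for
illustration, since the hunk that generates the names is not shown in full
here; only MAX_EVENT_NAME_LEN comes from this patch.

```c
#include <stdio.h>

#define MAX_EVENT_NAME_LEN 64

/*
 * Hypothetical name generator: 'p' or 'r' for the probe type, then the
 * symbol and a signed offset. The format string is an assumption.
 */
static int gen_event_name(char *buf, int is_return, const char *symbol,
			  long offset)
{
	return snprintf(buf, MAX_EVENT_NAME_LEN, "%c@%s%+ld",
			is_return ? 'r' : 'p', symbol, offset);
}
```

Under these assumptions, a probe at the entry of do_sys_open would be named
"p@do_sys_open+0" and the matching return probe "r@do_sys_open+0".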

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Avi Kivity a...@redhat.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Frank Ch. Eigler f...@redhat.com
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Jason Baron jba...@redhat.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: K.Prasad pra...@linux.vnet.ibm.com
Cc: Lai Jiangshan la...@cn.fujitsu.com
Cc: Li Zefan l...@cn.fujitsu.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
Cc: Roland McGrath rol...@redhat.com
Cc: Sam Ravnborg s...@ravnborg.org
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Vegard Nossum vegard.nos...@gmail.com
---

 Documentation/trace/kprobetrace.txt |    3 +-
 kernel/trace/trace_event_types.h    |   18 --
 kernel/trace/trace_kprobe.c         |   64 ++-
 3 files changed, 35 insertions(+), 50 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
index c9c09b4..5e59e85 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -28,7 +28,8 @@ Synopsis of kprobe_events
   p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS]: Set a probe
   r[:EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe
 
- EVENT : Event name.
+ EVENT : Event name. If omitted, the event name is generated
+ based on SYMBOL+offs or MEMADDR.
  SYMBOL[+offs|-offs]   : Symbol+offset where the probe is inserted.
  MEMADDR   : Address where the probe is inserted.
 
diff --git a/kernel/trace/trace_event_types.h b/kernel/trace/trace_event_types.h
index 186b598..e74f090 100644
--- a/kernel/trace/trace_event_types.h
+++ b/kernel/trace/trace_event_types.h
@@ -175,22 +175,4 @@ TRACE_EVENT_FORMAT(kmem_free, TRACE_KMEM_FREE, kmemtrace_free_entry, ignore,
 	TP_RAW_FMT("type:%u call_site:%lx ptr:%p")
 );
 
-TRACE_EVENT_FORMAT(kprobe, TRACE_KPROBE, kprobe_trace_entry, ignore,
-	TRACE_STRUCT(
-		TRACE_FIELD(unsigned long, ip, ip)
-		TRACE_FIELD(int, nargs, nargs)
-		TRACE_FIELD_ZERO(unsigned long, args)
-	),
-	TP_RAW_FMT("%08lx: args:0x%lx ...")
-);
-
-TRACE_EVENT_FORMAT(kretprobe, TRACE_KRETPROBE, kretprobe_trace_entry, ignore,
-	TRACE_STRUCT(
-		TRACE_FIELD(unsigned long, func, func)
-		TRACE_FIELD(unsigned long, ret_ip, ret_ip)
-		TRACE_FIELD(int, nargs, nargs)
-		TRACE_FIELD_ZERO(unsigned long, args)
-	),
-	TP_RAW_FMT("%08lx <- %08lx: args:0x%lx ...")
-);
 #undef TRACE_SYSTEM
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 4704e40..ec137ed 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -34,6 +34,7 @@
 
 #define MAX_TRACE_ARGS 128
 #define MAX_ARGSTR_LEN 63
+#define MAX_EVENT_NAME_LEN 64
 
 /* currently, trace_kprobe only supports X86. */
 
@@ -280,11 +281,11 @@ static struct trace_probe *alloc_trace_probe(const char *symbol,
 		if (!tp->symbol)
 			goto error;
 	}
-	if (event) {
-		tp->call.name = kstrdup(event, GFP_KERNEL);
-		if (!tp->call.name)
-			goto error;
-	}
+	if (!event)
+		goto error;
+	tp->call.name = kstrdup(event, GFP_KERNEL);
+	if (!tp->call.name)
+		goto error;
 
 	INIT_LIST_HEAD(&tp->list);
 	return tp;
@@ -314,7 +315,7 @@ static struct trace_probe *find_probe_event(const char *event)
 	struct trace_probe *tp;
 
 	list_for_each_entry(tp, &probe_list, list)
-		if (tp->call.name && !strcmp(tp->call.name, event))
+		if (!strcmp(tp->call.name, event))
 			return tp;
 	return NULL;
 }
@@ -330,8 +331,7 @@ static void __unregister_trace_probe(struct trace_probe *tp)
 /* Unregister a trace_probe and probe_event: call with locking probe_lock */
 static void unregister_trace_probe(struct trace_probe *tp)
 {
-	if (tp->call.name)
-		unregister_probe_event(tp);
+	unregister_probe_event(tp);
 	__unregister_trace_probe(tp);
 	list_del(&tp->list);
 }
@@ -360,18 +360,16 @@ static int register_trace_probe(struct trace_probe *tp)
 		goto end;
 	}
 	/* register as an event */
-	if (tp->call.name) {
-		old_tp = find_probe_event(tp->call.name);
-		if (old_tp) {
-			/* delete old event */
-			unregister_trace_probe(old_tp);
-   free_trace_probe(old_tp

[PATCH -tip v14 12/12] tracing: Add kprobes event profiling interface

2009-08-13 Thread Masami Hiramatsu
Add profiling interfaces for each kprobes event. This interface reports
how many times each probe was hit or missed.
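
For a feel of what one line of the profile looks like, here is a small
user-space sketch that formats an event name, a hit count, and a miss count
with the same column widths as the seq_printf() call in this patch. The
function name format_profile_line() is hypothetical, and snprintf() stands
in for seq_printf().

```c
#include <stdio.h>

/*
 * Format one profile line: event name, number of hits, number of
 * missed hits. The name is left-aligned in 44 columns and each counter
 * is right-aligned in 15 columns.
 */
static int format_profile_line(char *buf, size_t len, const char *name,
			       unsigned long nhit, unsigned long nmissed)
{
	return snprintf(buf, len, "  %-44s %15lu %15lu\n", name, nhit, nmissed);
}
```

Because the columns have fixed widths, a line is 79 characters long
(including the newline) whenever the event name fits in 44 characters.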

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Avi Kivity a...@redhat.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Frank Ch. Eigler f...@redhat.com
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Jason Baron jba...@redhat.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: K.Prasad pra...@linux.vnet.ibm.com
Cc: Lai Jiangshan la...@cn.fujitsu.com
Cc: Li Zefan l...@cn.fujitsu.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
Cc: Roland McGrath rol...@redhat.com
Cc: Sam Ravnborg s...@ravnborg.org
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Vegard Nossum vegard.nos...@gmail.com
---

 Documentation/trace/kprobetrace.txt |    8 +++
 kernel/trace/trace_kprobe.c         |   43 +++
 2 files changed, 51 insertions(+), 0 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
index 5e59e85..3de7517 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -70,6 +70,14 @@ filter:
  names and field names for describing filters.
 
 
+Event Profiling
+---------------
+ You can check the total number of probe hits and probe miss-hits via
+/sys/kernel/debug/tracing/kprobe_profile.
+ The first column is event name, the second is the number of probe hits,
+the third is the number of probe miss-hits.
+
+
 Usage examples
 --------------
 To add a probe as a new event, write a new definition to kprobe_events
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 0e8498e..0f5d0a6 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -184,6 +184,7 @@ struct trace_probe {
struct kprobe   kp;
struct kretproberp;
};
+   unsigned long   nhit;
const char  *symbol;/* symbol name */
struct ftrace_event_callcall;
struct trace_event  event;
@@ -781,6 +782,37 @@ static const struct file_operations kprobe_events_ops = {
.write  = probes_write,
 };
 
+/* Probes profiling interfaces */
+static int probes_profile_seq_show(struct seq_file *m, void *v)
+{
+   struct trace_probe *tp = v;
+
+   seq_printf(m,   %-44s %15lu %15lu\n, tp-call.name, tp-nhit,
+  probe_is_return(tp) ? tp-rp.kp.nmissed : tp-kp.nmissed);
+
+   return 0;
+}
+
+static const struct seq_operations profile_seq_op = {
+   .start  = probes_seq_start,
+   .next   = probes_seq_next,
+   .stop   = probes_seq_stop,
+   .show   = probes_profile_seq_show
+};
+
+static int profile_open(struct inode *inode, struct file *file)
+{
+   return seq_open(file, profile_seq_op);
+}
+
+static const struct file_operations kprobe_profile_ops = {
+   .owner  = THIS_MODULE,
+   .open   = profile_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= seq_release,
+};
+
 /* Kprobe handler */
 static __kprobes int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs)
 {
@@ -791,6 +823,8 @@ static __kprobes int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs)
    unsigned long irq_flags;
    struct ftrace_event_call *call = &tp->call;
 
+   tp->nhit++;
+
local_save_flags(irq_flags);
pc = preempt_count();
 
@@ -1143,9 +1177,18 @@ static __init int init_kprobe_trace(void)
    entry = debugfs_create_file("kprobe_events", 0644, d_tracer,
                    NULL, &kprobe_events_ops);
 
+   /* Event list interface */
    if (!entry)
        pr_warning("Could not create debugfs "
               "'kprobe_events' entry\n");
+
+   /* Profile interface */
+   entry = debugfs_create_file("kprobe_profile", 0444, d_tracer,
+                   NULL, &kprobe_profile_ops);
+
+   if (!entry)
+       pr_warning("Could not create debugfs "
+              "'kprobe_profile' entry\n");
return 0;
 }
 fs_initcall(init_kprobe_trace);


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH -tip v14 08/12] tracing: add kprobe-based event tracer

2009-08-13 Thread Masami Hiramatsu
Add a kprobes-based event tracer on ftrace.

This tracer is similar to the events tracer, which is based on the Tracepoint
infrastructure. Instead of Tracepoint, this tracer is based on kprobes
(kprobe and kretprobe). It can probe anywhere kprobes can probe (that is,
anywhere in a function body, except in functions marked __kprobes).

Like the events tracer, this tracer doesn't need to be activated via
current_tracer; instead, just set probe points via
/sys/kernel/debug/tracing/kprobe_events. You can also set filters on each
probe event via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter.

This tracer supports the following probe arguments for each probe.

  %REG  : Fetch register REG
  sN    : Fetch Nth entry of stack (N >= 0)
  sa    : Fetch stack address.
  @ADDR : Fetch memory at ADDR (ADDR should be in kernel)
  @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)
  aN    : Fetch function argument. (N >= 0)
  rv    : Fetch return value.
  ra    : Fetch return address.
  +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.

See Documentation/trace/kprobetrace.txt for details.
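
As an illustration of the definition syntax above, here is a minimal
userspace sketch that builds one line to be written to kprobe_events. The
helper name format_probe_def() and its parameters are hypothetical and are
not part of this patch; the only rules it encodes are the p/r prefix, the
optional :EVENT name, SYMBOL+offs, and that a return probe takes offset 0.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build one kprobe_events definition line:
 *   p[:EVENT] SYMBOL[+offs|-offs] [FETCHARGS]   (probe)
 *   r[:EVENT] SYMBOL[+0] [FETCHARGS]            (return probe)
 * Returns the length written, or -1 on invalid input or truncation.
 * format_probe_def() is a hypothetical helper, not part of the patch.
 */
static int format_probe_def(char *buf, size_t size, int is_return,
			    const char *event, const char *symbol,
			    long offset, const char *fetchargs)
{
	int n;

	assert(buf && symbol);
	/* A return probe is only accepted at the function entry (offset 0). */
	if (is_return && offset != 0)
		return -1;

	/* "%+ld" always prints the sign, so offset 0 becomes "+0". */
	n = snprintf(buf, size, "%c%s%s %s%+ld%s%s",
		     is_return ? 'r' : 'p',
		     event ? ":" : "", event ? event : "",
		     symbol, offset,
		     fetchargs ? " " : "", fetchargs ? fetchargs : "");
	if (n < 0 || (size_t)n >= size)
		return -1;
	return n;
}
```

The resulting string is what a caller would append to
/sys/kernel/debug/tracing/kprobe_events; the event name "myread" used when
trying this out is only an example.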

Changes from v13:
 - Support 'sa' for stack address.
 - Use call->data instead of container_of() macro.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Acked-by: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Avi Kivity a...@redhat.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Frank Ch. Eigler f...@redhat.com
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Jason Baron jba...@redhat.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: K.Prasad pra...@linux.vnet.ibm.com
Cc: Lai Jiangshan la...@cn.fujitsu.com
Cc: Li Zefan l...@cn.fujitsu.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
Cc: Roland McGrath rol...@redhat.com
Cc: Sam Ravnborg s...@ravnborg.org
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Vegard Nossum vegard.nos...@gmail.com
---

 Documentation/trace/kprobetrace.txt |  139 
 kernel/trace/Kconfig|   12 
 kernel/trace/Makefile   |1 
 kernel/trace/trace.h|   29 +
 kernel/trace/trace_event_types.h|   18 +
 kernel/trace/trace_kprobe.c | 1205 +++
 6 files changed, 1404 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/trace/kprobetrace.txt
 create mode 100644 kernel/trace/trace_kprobe.c

diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
new file mode 100644
index 000..efff6eb
--- /dev/null
+++ b/Documentation/trace/kprobetrace.txt
@@ -0,0 +1,139 @@
+                     Kprobe-based Event Tracer
+                     =========================
+
+             Documentation is written by Masami Hiramatsu
+
+
+Overview
+--------
+This tracer is similar to the events tracer, which is based on the Tracepoint
+infrastructure. Instead of Tracepoint, this tracer is based on kprobes (kprobe
+and kretprobe). It can probe anywhere kprobes can probe (that is, anywhere in
+a function body, except in functions marked __kprobes).
+
+Unlike the function tracer, this tracer can probe instructions inside
+kernel functions. It allows you to check which instruction has been executed.
+
+Unlike the Tracepoint-based events tracer, this tracer can add and remove
+probe points on the fly.
+
+Like the events tracer, this tracer doesn't need to be activated via
+current_tracer; instead, just set probe points via
+/sys/kernel/debug/tracing/kprobe_events. You can also set filters on each
+probe event via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter.
+
+
+Synopsis of kprobe_events
+-------------------------
+  p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS]    : Set a probe
+  r[:EVENT] SYMBOL[+0] [FETCHARGS]                     : Set a return probe
+
+ EVENT                 : Event name.
+ SYMBOL[+offs|-offs]   : Symbol+offset where the probe is inserted.
+ MEMADDR               : Address where the probe is inserted.
+
+ FETCHARGS             : Arguments.
+  %REG                 : Fetch register REG
+  sN                   : Fetch Nth entry of stack (N >= 0)
+  sa                   : Fetch stack address.
+  @ADDR                : Fetch memory at ADDR (ADDR should be in kernel)
+  @SYM[+|-offs]        : Fetch memory at SYM +|- offs (SYM should be a data symbol)
+  aN                   : Fetch function argument. (N >= 0)(*)
+  rv                   : Fetch return value.(**)
+  ra                   : Fetch return address.(**)
+  +|-offs(FETCHARG)    : Fetch memory at FETCHARG +|- offs address.(***)
+
+  (*) aN may not be correct for asmlinkage functions or in the middle of a
+      function body.
+  (**) only for return probes.
+  (***) this is useful for fetching a field of a data structure.
+
+
+Per-Probe Event Filtering
+-------------------------
+ The per-probe event filtering feature allows you to set a different filter on
+each probe and gives you what arguments will be shown in trace

[PATCH -tip v14 09/12] tracing: Kprobe-tracer supports more than 6 arguments

2009-08-13 Thread Masami Hiramatsu
Support up to 128 arguments for each kprobes event.
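
The change replaces a fixed-size argument array with a flexible array member
whose size is chosen at allocation time. A minimal userspace sketch of that
technique follows. The types struct fetch_arg and struct probe, the macro
SIZEOF_PROBE() and the helper alloc_probe() are simplified stand-ins invented
for this illustration, and calloc() stands in for kzalloc(). Unlike the patch,
whose parsing loop simply stops at MAX_TRACE_ARGS, this sketch rejects larger
counts up front.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel types; field names are illustrative. */
struct fetch_arg {
	unsigned long data;
};

struct probe {
	const char *symbol;
	unsigned int nr_args;
	struct fetch_arg args[];	/* flexible array member, sized at allocation */
};

/* Bytes needed for a probe carrying n arguments. */
#define SIZEOF_PROBE(n) \
	(offsetof(struct probe, args) + sizeof(struct fetch_arg) * (n))

#define MAX_ARGS 128

/* Allocate a zeroed probe with room for exactly nargs arguments. */
static struct probe *alloc_probe(const char *symbol, unsigned int nargs)
{
	struct probe *p;

	assert(symbol != NULL);
	if (nargs > MAX_ARGS)
		return NULL;
	p = calloc(1, SIZEOF_PROBE(nargs));
	if (!p)
		return NULL;
	p->symbol = symbol;
	p->nr_args = nargs;
	return p;
}
```

Because the array is the last member, a probe with two arguments costs only
two struct fetch_arg slots instead of the maximum.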

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Avi Kivity a...@redhat.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Frank Ch. Eigler f...@redhat.com
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Jason Baron jba...@redhat.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: K.Prasad pra...@linux.vnet.ibm.com
Cc: Lai Jiangshan la...@cn.fujitsu.com
Cc: Li Zefan l...@cn.fujitsu.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
Cc: Roland McGrath rol...@redhat.com
Cc: Sam Ravnborg s...@ravnborg.org
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Vegard Nossum vegard.nos...@gmail.com
---

 Documentation/trace/kprobetrace.txt |2 +-
 kernel/trace/trace_kprobe.c |   21 +
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
index efff6eb..c9c09b4 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -32,7 +32,7 @@ Synopsis of kprobe_events
  SYMBOL[+offs|-offs]   : Symbol+offset where the probe is inserted.
  MEMADDR   : Address where the probe is inserted.
 
- FETCHARGS : Arguments.
+ FETCHARGS : Arguments. Each probe can have up to 128 args.
   %REG : Fetch register REG
   sN   : Fetch Nth entry of stack (N >= 0)
   sa   : Fetch stack address.
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index d92877a..4704e40 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -32,7 +32,7 @@
 #include "trace.h"
 #include "trace_output.h"
 
-#define TRACE_KPROBE_ARGS 6
+#define MAX_TRACE_ARGS 128
 #define MAX_ARGSTR_LEN 63
 
 /* currently, trace_kprobe only supports X86. */
@@ -184,11 +184,15 @@ struct trace_probe {
        struct kretprobe    rp;
    };
    const char      *symbol;    /* symbol name */
-   unsigned int        nr_args;
-   struct fetch_func   args[TRACE_KPROBE_ARGS];
    struct ftrace_event_call    call;
+   unsigned int        nr_args;
+   struct fetch_func   args[];
 };
 
+#define SIZEOF_TRACE_PROBE(n)  \
+   (offsetof(struct trace_probe, args) +   \
+   (sizeof(struct fetch_func) * (n)))
+
 static int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs);
 static int kretprobe_trace_func(struct kretprobe_instance *ri,
struct pt_regs *regs);
@@ -263,11 +267,11 @@ static DEFINE_MUTEX(probe_lock);
 static LIST_HEAD(probe_list);
 
 static struct trace_probe *alloc_trace_probe(const char *symbol,
-const char *event)
+const char *event, int nargs)
 {
struct trace_probe *tp;
 
-   tp = kzalloc(sizeof(struct trace_probe), GFP_KERNEL);
+   tp = kzalloc(SIZEOF_TRACE_PROBE(nargs), GFP_KERNEL);
if (!tp)
return ERR_PTR(-ENOMEM);
 
@@ -573,9 +577,10 @@ static int create_trace_probe(int argc, char **argv)
        if (offset && is_return)
return -EINVAL;
}
+   argc -= 2; argv += 2;
 
/* setup a probe */
-   tp = alloc_trace_probe(symbol, event);
+   tp = alloc_trace_probe(symbol, event, argc);
if (IS_ERR(tp))
return PTR_ERR(tp);
 
@@ -594,8 +599,8 @@ static int create_trace_probe(int argc, char **argv)
    kp->addr = addr;
 
    /* parse arguments */
-   argc -= 2; argv += 2; ret = 0;
-   for (i = 0; i < argc && i < TRACE_KPROBE_ARGS; i++) {
+   ret = 0;
+   for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
        if (strlen(argv[i]) > MAX_ARGSTR_LEN) {
            pr_info("Argument%d(%s) is too long.\n", i, argv[i]);
ret = -ENOSPC;


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com


[PATCH -tip v14 11/12] tracing: Kprobe tracer assigns new event ids for each event

2009-08-13 Thread Masami Hiramatsu
Assign a new event id to each kprobes event. This doesn't clear the
ring_buffer when a kprobe event is unregistered. Thus, if you mind
'Unknown event' messages, clear the buffer manually after changing kprobe
events.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Avi Kivity a...@redhat.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Frank Ch. Eigler f...@redhat.com
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Jason Baron jba...@redhat.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: K.Prasad pra...@linux.vnet.ibm.com
Cc: Lai Jiangshan la...@cn.fujitsu.com
Cc: Li Zefan l...@cn.fujitsu.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
Cc: Roland McGrath rol...@redhat.com
Cc: Sam Ravnborg s...@ravnborg.org
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Vegard Nossum vegard.nos...@gmail.com
---

 kernel/trace/trace.h|6 -
 kernel/trace/trace_kprobe.c |   51 +--
 2 files changed, 15 insertions(+), 42 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 4ce4525..0b78d76 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -43,8 +43,6 @@ enum trace_type {
TRACE_POWER,
TRACE_BLK,
TRACE_KSYM,
-   TRACE_KPROBE,
-   TRACE_KRETPROBE,
 
__TRACE_LAST_TYPE,
 };
@@ -358,10 +356,6 @@ extern void __ftrace_bad_type(void);
IF_ASSIGN(var, ent, struct kmemtrace_free_entry,\
  TRACE_KMEM_FREE); \
IF_ASSIGN(var, ent, struct ksym_trace_entry, TRACE_KSYM);\
-   IF_ASSIGN(var, ent, struct kprobe_trace_entry,  \
- TRACE_KPROBE);\
-   IF_ASSIGN(var, ent, struct kretprobe_trace_entry,   \
- TRACE_KRETPROBE); \
__ftrace_bad_type();\
} while (0)
 
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index ec137ed..0e8498e 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -186,6 +186,7 @@ struct trace_probe {
};
    const char      *symbol;    /* symbol name */
    struct ftrace_event_call    call;
+   struct trace_event      event;
    unsigned int        nr_args;
    struct fetch_func   args[];
 };
@@ -795,7 +796,7 @@ static __kprobes int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs)
 
    size = SIZEOF_KPROBE_TRACE_ENTRY(tp->nr_args);
 
-   event = trace_current_buffer_lock_reserve(TRACE_KPROBE, size,
+   event = trace_current_buffer_lock_reserve(call->id, size,
                          irq_flags, pc);
if (!event)
return 0;
@@ -827,7 +828,7 @@ static __kprobes int kretprobe_trace_func(struct kretprobe_instance *ri,
 
    size = SIZEOF_KRETPROBE_TRACE_ENTRY(tp->nr_args);
 
-   event = trace_current_buffer_lock_reserve(TRACE_KRETPROBE, size,
+   event = trace_current_buffer_lock_reserve(call->id, size,
                          irq_flags, pc);
if (!event)
return 0;
@@ -853,7 +854,7 @@ print_kprobe_event(struct trace_iterator *iter, int flags)
    struct trace_seq *s = &iter->seq;
    int i;
 
-   trace_assign_type(field, iter->ent);
+   field = (struct kprobe_trace_entry *)iter->ent;
 
    if (!seq_print_ip_sym(s, field->ip, flags | TRACE_ITER_SYM_OFFSET))
goto partial;
@@ -880,7 +881,7 @@ print_kretprobe_event(struct trace_iterator *iter, int flags)
    struct trace_seq *s = &iter->seq;
    int i;
 
-   trace_assign_type(field, iter->ent);
+   field = (struct kretprobe_trace_entry *)iter->ent;
 
    if (!seq_print_ip_sym(s, field->ret_ip, flags | TRACE_ITER_SYM_OFFSET))
goto partial;
@@ -906,16 +907,6 @@ partial:
return TRACE_TYPE_PARTIAL_LINE;
 }
 
-static struct trace_event kprobe_trace_event = {
-   .type   = TRACE_KPROBE,
-   .trace  = print_kprobe_event,
-};
-
-static struct trace_event kretprobe_trace_event = {
-   .type   = TRACE_KRETPROBE,
-   .trace  = print_kretprobe_event,
-};
-
 static int probe_event_enable(struct ftrace_event_call *call)
 {
    struct trace_probe *tp = (struct trace_probe *)call->data;
@@ -1107,35 +1098,35 @@ static int register_probe_event(struct trace_probe *tp)
    /* Initialize ftrace_event_call */
    call->system = "kprobes";
    if (probe_is_return(tp)) {
-       call->event = &kretprobe_trace_event;
-       call->id = TRACE_KRETPROBE;
+       tp->event.trace = print_kretprobe_event

[PATCH -tip v14 03/12] kprobes: checks probe address is instruction boundary on x86

2009-08-13 Thread Masami Hiramatsu
Ensure that kprobes are inserted safely by checking whether the specified
address is at the first byte of an instruction on x86.
This is done by decoding the probed function from its head to the probe point.
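
The check described above can be sketched in userspace as a walk over
instruction lengths from the function start. This is only an illustration:
the callback type insn_len_fn and the toy encoding in toy_insn_len() are
invented for the sketch and stand in for the real x86 instruction decoder,
and the breakpoint-recovery step of the patch is left out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical decoder callback: returns the length in bytes of the
 * instruction starting at code[pos], or 0 if it cannot be decoded.
 */
typedef size_t (*insn_len_fn)(const unsigned char *code, size_t pos);

/*
 * Toy encoding used only for this sketch: the low two bits of the first
 * byte, plus one, give the instruction length (1 to 4 bytes).
 */
static size_t toy_insn_len(const unsigned char *code, size_t pos)
{
	return (size_t)(code[pos] & 0x3) + 1;
}

/*
 * Return 1 if offset is the first byte of an instruction in a function
 * body of func_size bytes, 0 otherwise.
 */
static int on_insn_boundary(const unsigned char *code, size_t func_size,
			    size_t offset, insn_len_fn insn_len)
{
	size_t pos = 0;

	assert(code && insn_len);
	if (offset >= func_size)
		return 0;

	/* Decode from the function head until we reach or pass the offset. */
	while (pos < offset) {
		size_t len = insn_len(code, pos);

		if (len == 0)
			return 0;	/* undecodable: refuse to probe */
		pos += len;
	}
	/* Landing exactly on offset means it starts an instruction. */
	return pos == offset;
}
```

With the toy encoding, the bytes 01 AA 00 02 BB CC decode as instructions of
2, 1 and 3 bytes, so only offsets 0, 2 and 3 are accepted.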

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Acked-by: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Avi Kivity a...@redhat.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Frank Ch. Eigler f...@redhat.com
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Jason Baron jba...@redhat.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: K.Prasad pra...@linux.vnet.ibm.com
Cc: Lai Jiangshan la...@cn.fujitsu.com
Cc: Li Zefan l...@cn.fujitsu.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
Cc: Roland McGrath rol...@redhat.com
Cc: Sam Ravnborg s...@ravnborg.org
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Vegard Nossum vegard.nos...@gmail.com
---

 arch/x86/kernel/kprobes.c |   69 +
 1 files changed, 69 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index b5b1848..80d493f 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -48,6 +48,7 @@
 #include <linux/preempt.h>
 #include <linux/module.h>
 #include <linux/kdebug.h>
+#include <linux/kallsyms.h>
 
 #include <asm/cacheflush.h>
 #include <asm/desc.h>
@@ -55,6 +56,7 @@
 #include <asm/uaccess.h>
 #include <asm/alternative.h>
 #include <asm/debugreg.h>
+#include <asm/insn.h>
 
 void jprobe_return_end(void);
 
@@ -245,6 +247,71 @@ retry:
}
 }
 
+/* Recover the probed instruction at addr for further analysis. */
+static int recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr)
+{
+   struct kprobe *kp;
+   kp = get_kprobe((void *)addr);
+   if (!kp)
+       return -EINVAL;
+
+   /*
+    *  Basically, kp->ainsn.insn has an original instruction.
+    *  However, a RIP-relative instruction cannot be single-stepped
+    *  at a different place, so fix_riprel() tweaks the displacement of
+    *  that instruction. In that case, we can't recover the instruction
+    *  from kp->ainsn.insn.
+    *
+    *  On the other hand, kp->opcode has a copy of the first byte of
+    *  the probed instruction, which is overwritten by int3. And since
+    *  the instruction at kp->addr is not modified by kprobes except
+    *  for the first byte, we can recover the original instruction
+    *  from it and kp->opcode.
+    */
+   memcpy(buf, kp->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
+   buf[0] = kp->opcode;
+   return 0;
+}
+
+/* Dummy buffers for kallsyms_lookup */
+static char __dummy_buf[KSYM_NAME_LEN];
+
+/* Check if paddr is at an instruction boundary */
+static int __kprobes can_probe(unsigned long paddr)
+{
+   int ret;
+   unsigned long addr, offset = 0;
+   struct insn insn;
+   kprobe_opcode_t buf[MAX_INSN_SIZE];
+
+   if (!kallsyms_lookup(paddr, NULL, &offset, NULL, __dummy_buf))
+       return 0;
+
+   /* Decode instructions */
+   addr = paddr - offset;
+   while (addr < paddr) {
+       kernel_insn_init(&insn, (void *)addr);
+       insn_get_opcode(&insn);
+
+       /* Check if the instruction has been modified. */
+       if (insn.opcode.bytes[0] == BREAKPOINT_INSTRUCTION) {
+           ret = recover_probed_instruction(buf, addr);
+           if (ret)
+               /*
+                * Another debugging subsystem might insert
+                * this breakpoint. In that case, we can't
+                * recover it.
+                */
+               return 0;
+           kernel_insn_init(&insn, buf);
+       }
+       insn_get_length(&insn);
+       addr += insn.length;
+   }
+
+   return (addr == paddr);
+}
+
 /*
  * Returns non-zero if opcode modifies the interrupt flag.
  */
@@ -360,6 +427,8 @@ static void __kprobes arch_copy_kprobe(struct kprobe *p)
 
 int __kprobes arch_prepare_kprobe(struct kprobe *p)
 {
+   if (!can_probe((unsigned long)p->addr))
+       return -EILSEQ;
    /* insn: must be on special executable page on x86. */
    p->ainsn.insn = get_insn_slot();
    if (!p->ainsn.insn)


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com


[TOOL] kprobestest : Kprobe stress test tool

2009-08-13 Thread Masami Hiramatsu

This script tests kprobes by probing all symbols in the kernel, and finds
symbols which must be blacklisted.


Usage
-----
  kprobestest [-s SYMLIST] [-b BLACKLIST] [-w WHITELIST]
     Run the stress test. If a SYMLIST file is specified, use it as
     the initial symbol list (this is useful for verifying a white list
     after diagnosing all symbols).

  kprobestest cleanup
     Clean up all lists


How It Works
------------
This tool lists all symbols in the kernel via /proc/kallsyms and sorts
them into groups (each containing 64 symbols by default). It then
tests each group using the kprobe-tracer. If a kernel crash occurs,
that group is moved into the 'failed' dir. If the group passes the test, the
script moves it into the 'passed' dir and saves kprobe_profile into
'passed/profiles/'.
After all groups have been tested, all 'failed' groups are merged and sorted
into smaller groups (the group size is divided by 4 by default), and those
are tested again.
This loop is repeated until every group has just 1 symbol.
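
The shrinking of group sizes between rounds can be sketched as follows. The
function names next_group_size() and rounds_to_single() are hypothetical and
exist only for this illustration; the rule itself (stop when the largest
failed group has 0 or 1 symbols, drop to 1 when it is no larger than the
divisor, otherwise divide) mirrors init_next() in the script below. The round
count assumes the worst case, where the largest failed group stays full.

```c
#include <assert.h>

/*
 * Given the size of the largest failed group and the divisor, return the
 * group size for the next round, or 0 when no further round is needed
 * (nothing failed, or the failed groups are already single symbols).
 */
static unsigned int next_group_size(unsigned int largest_failed,
				    unsigned int div)
{
	assert(div > 1);
	if (largest_failed <= 1)
		return 0;
	if (largest_failed <= div)
		return 1;
	return largest_failed / div;
}

/* Count the rounds needed to go from an initial group size down to 1. */
static unsigned int rounds_to_single(unsigned int initial, unsigned int div)
{
	unsigned int rounds = 0;
	unsigned int nr = initial;

	while ((nr = next_group_size(nr, div)) != 0)
		rounds++;
	return rounds;
}
```

With the defaults (64 symbols per group, divisor 4), the sizes go 64, 16, 4, 1,
so a crashing symbol is isolated after at most three re-test rounds.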

Finally, the script sorts all 'passed' symbols into 'tested', 'untested',
and 'missed' based on the saved profiles.
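
Each saved profile is a copy of kprobe_profile, which has three columns:
event name, number of probe hits, and number of probe miss-hits. A minimal
sketch of reading one such line is shown here. The struct profile_entry, the
helper parse_profile_line() and the 64-byte name buffer are assumptions made
for this illustration; the script itself does this work with shell tools.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct profile_entry {
	char event[64];		/* event name, first column */
	unsigned long nhit;	/* number of probe hits, second column */
	unsigned long nmissed;	/* number of probe miss-hits, third column */
};

/* Parse one line of kprobe_profile; returns 0 on success, -1 otherwise. */
static int parse_profile_line(const char *line, struct profile_entry *e)
{
	assert(line && e);
	memset(e, 0, sizeof(*e));
	/* %63s leaves room for the terminating NUL in the 64-byte buffer. */
	if (sscanf(line, "%63s %lu %lu", e->event, &e->nhit, &e->nmissed) != 3)
		return -1;
	return 0;
}
```

A caller could then compare nhit and nmissed to decide which list a symbol
belongs to; the exact rule the script uses is not shown in this mail.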


Note
----
 - This script only gives us some clues about which functions should be
   blacklisted. In some cases, a combination of probe points causes a
   problem even though none of them causes it alone.

Thank you,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com

#!/bin/bash
#
#  kprobestest: Kprobes stress test tool
#  Written by Masami Hiramatsu mhira...@redhat.com
#
#  Usage:
# $ kprobestest [-s SYMLIST] [-b BLACKLIST] [-w WHITELIST]
#Run stress test. If SYMLIST file is specified, use it as 
#an initial symbol list (This is useful for verifying white list
#after diagnosing all symbols).
#
# $ kprobestest cleanup
#Cleanup all lists

# Configurations 
DEBUGFS=/sys/kernel/debug
INITNR=64
DIV=4
SYMFILE=syms.list
FAILFILE=black.list

function do_test () {
  # Do some benchmark
  for i in {1..4} ; do
  sleep 0.5
  echo -n .
  done
}

function usage () {
  echo "Usage: kprobestest [cleanup] [-s SYMLIST] [-b BLACKLIST] [-w WHITELIST]"
  exit 0
}

function cleanup_test () {
  echo "Cleanup all files"
  rm -rf $SYMFILE failed passed testing unset
  exit 0
}


# Parse arguments
WHITELIST=
BLACKLIST=
SYMLIST=

while [ "$1" ]; do
  case "$1" in
cleanup)
  cleanup_test
  ;;
-s)
  SYMLIST=$2
  shift 1
  ;;
-b)
  BLACKLIST=$2
  shift 1
  ;;
-w)
  WHITELIST=$2
  shift 1
  ;;
*)
  usage
  ;;
  esac
  shift 1
done

# Show configurations
echo "Kprobe stress test starting."
[ -f "$BLACKLIST" ] && echo "Blacklist: $BLACKLIST" || BLACKLIST=
[ -f "$WHITELIST" ] && echo "Whitelist: $WHITELIST" || WHITELIST=
[ -f "$SYMLIST" ] && echo "Symlist: $SYMLIST" || SYMLIST=

function make_filter () {
  local EXP=
  if [ -z "$WHITELIST" -a -z "$BLACKLIST" ]; then
    echo "s/^$//g"
  else
    # Build one sed expression that blanks out every listed symbol
    for i in `cat $WHITELIST $BLACKLIST` ;do
      [ -z "$EXP" ] && EXP="^$i\$" || EXP="$EXP\\|^$i\$"
    done ; EXP="s/$EXP//g"
    echo $EXP
  fi
}

function list_allsyms () {
  local sym
  local out=1
  # List text symbols, skipping those inside the __kprobes section
  for sym in `sort /proc/kallsyms | egrep '[0-9a-f]+ [Tt] [^[]*$' | cut -d\  -f 3`;do
    [ "$sym" = __kprobes_text_start ] && out=0 && continue
    [ "$sym" = __kprobes_text_end ] && out=1 && continue
    [ "$sym" = _etext ] && break
    [ $out -eq 1 ] && echo $sym
  done
}

function prep_testing () {
  local i=0
  local n=0
  local NR=$1
  local fname=

  echo "Grouping symbols: $NR"

  fname=`printf "list-%03d.%d" $i $NR`
  cat $SYMFILE | while read ln; do
    [ -z "$ln" ] && continue
    echo "$ln" >> testing/$fname
n=$((n+1))
if [ $n -eq $NR ]; then
  n=0
  i=$((i+1))
  fname=`printf list-%03d.%d $i $NR`
fi
  done
  sync
}

function init_first () {
  local EXP
  EXP=`make_filter`
  if [ -f "$SYMLIST" ]; then
    cat $SYMLIST | sed "$EXP" > $SYMFILE
  else
    echo -n "Generating symbol list from /proc/kallsyms..."
    list_allsyms | sed "$EXP" > $SYMFILE
    echo "done. " `wc -l $SYMFILE | cut -f1 -d\  ` "symbols listed."
  fi
  mkdir -p testing failed unset passed passed/profiles
  prep_testing $INITNR
}

function get_max_nr () {
  wc -l failed/list-* unset/list-* 2>/dev/null |\
  awk '/^ *[0-9]+ .*list.*$/{ if (nr < $1) nr=$1 } BEGIN { nr=0 } END { print nr}'
}

function init_next () {
  local NR
  NR=`get_max_nr`
  [ $NR -eq 0 ] && return 1
  [ $NR -eq 1 ] && return 2
  [ $NR -le $DIV ] && NR=1 || NR=`expr $NR / $DIV`

  cat failed/* unset/* > $SYMFILE
  rm failed/* unset/*

  prep_testing $NR
  return 0
}


# Initialize symbols
if [ ! -d testing ]; then
  init_first
elif [ -z "`ls testing/`" ]; then
  init_next
fi

function set_probes () {
  local s
  for s in `cat $1`; do
    echo "p:$s $s" >> $DEBUGFS/tracing/kprobe_events
    [ $? -ne 0 ] && return -1
  done
  return 0
}

function clear_probes () {
  echo > $DEBUGFS/tracing/kprobe_events
}

function save_profile () {
  cat $DEBUGFS/tracing/kprobe_profile > passed/profiles/$1

[TOOL] c2kpe: C expression to kprobe event format converter

2009-08-13 Thread Masami Hiramatsu

This program converts a probe point given as a C expression into the kprobe
event format for the kprobe-based event tracer. This helps to define kprobes
events by C source line number or function name, and by local variable
name. Currently, it supports only x86 (32/64) kernels.


Compile
-------
Before compiling, please install the libelf and libdwarf development
packages.
(e.g. elfutils-libelf-devel and libdwarf-devel on Fedora)

 $ gcc -Wall c2kpe.c -o c2kpe -ldwarf -lelf


Synopsis
--------
 $ c2kpe [options] FUNCTION[+OFFS][@SRC] [VAR [VAR ...]]
 or
 $ c2kpe [options] @SRC:LINE [VAR [VAR ...]]

  FUNCTION: Probing function name.
  OFFS:     Offset in bytes.
  SRC:      Source file path.
  LINE:     Line number
  VAR:      Local variable name.
  options:
  -r KREL       Kernel release version (e.g. 2.6.31-rc5)
  -m DEBUGINFO  Dwarf-format binary file (vmlinux or kmodule)


Example
-------
 $ c2kpe sys_read fd buf count
 sys_read+0 %di %si %dx

 $ c2kpe @mm/filemap.c:339 inode pos
 sync_page_range+125 -48(%bp) %r14


Example with kprobe-tracer
--------------------------
Since a C expression may be converted into multiple results, I recommend
reading the output line by line.

 $ c2kpe sys_read fd buf count | while read i; do \
   echo "p $i" >> $DEBUGFS/tracing/kprobe_events ;\
   done


Note
----
 - This requires a kernel compiled with CONFIG_DEBUG_INFO.
 - Specifying @SRC speeds up c2kpe, because it can skip CUs which don't
   include the specified SRC file.
 - c2kpe doesn't check whether the byte offset falls on an
   instruction boundary. I recommend using the @SRC:LINE expression for
   tracing inside a function body.
 - This tool doesn't search for kmodule files. You need to specify the
   kmodule file if you want to probe it.


TODO
----
 - Fix bugs.
 - Support multiple probepoints from stdin.
 - Better kmodule support.
 - Use elfutils-libdw?
 - Merge into trace-cmd or perf-tools?

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com

/*
 * c2kpe : C expression to kprobe event converter
 *
 * Written by Masami Hiramatsu mhira...@redhat.com
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 *
 */

#include <sys/utsname.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <getopt.h>
#include <stdlib.h>
#include <string.h>
#include <libdwarf/dwarf.h>
#include <libdwarf/libdwarf.h>

/* Default vmlinux search paths */
#define NR_SEARCH_PATH 2
const char *default_search_path[NR_SEARCH_PATH] = {
	"/lib/modules/%s/build/vmlinux",		/* Custom build kernel */
	"/usr/lib/debug/lib/modules/%s/vmlinux",	/* Red Hat debuginfo */
};

#define _stringify(n)	#n
#define stringify(n)	_stringify(n)

#ifdef DEBUG
#define debug(fmt ...)	\
	fprintf(stderr, "DBG(" __FILE__ ":" stringify(__LINE__) "): " fmt)
#else
#define debug(fmt ...)	do {} while (0)
#endif

#define ERR_IF(cnd)	\
	do { if (cnd) {	\
		fprintf(stderr, "Error (" __FILE__ ":" stringify(__LINE__) \
			"): " stringify(cnd) "\n");	\
		exit(1);	\
	}} while (0)

#define MAX_PATH_LEN 256

/* Dwarf_Die Linkage to parent Die */
struct die_link {
	struct die_link *parent;	/* Parent die */
	Dwarf_Die die;			/* Current die */
};

#define X86_32_MAX_REGS 8
const char *x86_32_regs_table[X86_32_MAX_REGS] = {
	"%ax",
	"%cx",
	"%dx",
	"%bx",
	"sa",	/* Stack address */
	"%bp",
	"%si",
	"%di",
};

#define X86_64_MAX_REGS 16
const char *x86_64_regs_table[X86_64_MAX_REGS] = {
	"%ax",
	"%dx",
	"%cx",
	"%bx",
	"%si",
	"%di",
	"%bp",
	"%sp",
	"%r8",
	"%r9",
	"%r10",
	"%r11",
	"%r12",
	"%r13",
	"%r14",
	"%r15",
};

/* TODO: switching by dwarf address size */
#ifdef __x86_64__
#define ARCH_MAX_REGS X86_64_MAX_REGS
#define arch_regs_table x86_64_regs_table
#else
#define ARCH_MAX_REGS X86_32_MAX_REGS
#define arch_regs_table x86_32_regs_table
#endif

/* Return architecture dependent register string */
static inline const char *get_arch_regstr(unsigned int n)
{
	return (n < ARCH_MAX_REGS) ? arch_regs_table[n] : NULL

[PATCH -tip -v13 03/11] kprobes: checks probe address is instruction boundary on x86

2009-07-24 Thread Masami Hiramatsu
Ensure that kprobes are inserted safely by checking whether the specified
address is at the first byte of an instruction on x86.
This is done by decoding the probed function from its head to the probe point.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Acked-by: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: Ingo Molnar mi...@elte.hu
---

 arch/x86/kernel/kprobes.c |   69 +
 1 files changed, 69 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index b5b1848..80d493f 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -48,6 +48,7 @@
 #include <linux/preempt.h>
 #include <linux/module.h>
 #include <linux/kdebug.h>
+#include <linux/kallsyms.h>
 
 #include <asm/cacheflush.h>
 #include <asm/desc.h>
@@ -55,6 +56,7 @@
 #include <asm/uaccess.h>
 #include <asm/alternative.h>
 #include <asm/debugreg.h>
+#include <asm/insn.h>
 
 void jprobe_return_end(void);
 
@@ -245,6 +247,71 @@ retry:
}
 }
 
+/* Recover the probed instruction at addr for further analysis. */
+static int recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr)
+{
+   struct kprobe *kp;
+   kp = get_kprobe((void *)addr);
+   if (!kp)
+       return -EINVAL;
+
+   /*
+    *  Basically, kp->ainsn.insn has an original instruction.
+    *  However, a RIP-relative instruction cannot be single-stepped
+    *  at a different place, so fix_riprel() tweaks the displacement of
+    *  that instruction. In that case, we can't recover the instruction
+    *  from kp->ainsn.insn.
+    *
+    *  On the other hand, kp->opcode has a copy of the first byte of
+    *  the probed instruction, which is overwritten by int3. And since
+    *  the instruction at kp->addr is not modified by kprobes except
+    *  for the first byte, we can recover the original instruction
+    *  from it and kp->opcode.
+    */
+   memcpy(buf, kp->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
+   buf[0] = kp->opcode;
+   return 0;
+}
+
+/* Dummy buffers for kallsyms_lookup */
+static char __dummy_buf[KSYM_NAME_LEN];
+
+/* Check if paddr is at an instruction boundary */
+static int __kprobes can_probe(unsigned long paddr)
+{
+   int ret;
+   unsigned long addr, offset = 0;
+   struct insn insn;
+   kprobe_opcode_t buf[MAX_INSN_SIZE];
+
+   if (!kallsyms_lookup(paddr, NULL, &offset, NULL, __dummy_buf))
+       return 0;
+
+   /* Decode instructions */
+   addr = paddr - offset;
+   while (addr < paddr) {
+       kernel_insn_init(&insn, (void *)addr);
+       insn_get_opcode(&insn);
+
+       /* Check if the instruction has been modified. */
+       if (insn.opcode.bytes[0] == BREAKPOINT_INSTRUCTION) {
+           ret = recover_probed_instruction(buf, addr);
+           if (ret)
+               /*
+                * Another debugging subsystem might insert
+                * this breakpoint. In that case, we can't
+                * recover it.
+                */
+               return 0;
+           kernel_insn_init(&insn, buf);
+       }
+       insn_get_length(&insn);
+       addr += insn.length;
+   }
+
+   return (addr == paddr);
+}
+
 /*
  * Returns non-zero if opcode modifies the interrupt flag.
  */
@@ -360,6 +427,8 @@ static void __kprobes arch_copy_kprobe(struct kprobe *p)
 
 int __kprobes arch_prepare_kprobe(struct kprobe *p)
 {
+   if (!can_probe((unsigned long)p->addr))
+       return -EILSEQ;
    /* insn: must be on special executable page on x86. */
    p->ainsn.insn = get_insn_slot();
    if (!p->ainsn.insn)


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com


[PATCH -tip -v13 04/11] kprobes: cleanup fix_riprel() using insn decoder on x86

2009-07-24 Thread Masami Hiramatsu
Clean up fix_riprel() in arch/x86/kernel/kprobes.c by using the x86
instruction decoder.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: Ingo Molnar mi...@elte.hu
---

 arch/x86/kernel/kprobes.c |  128 -
 1 files changed, 23 insertions(+), 105 deletions(-)

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index 80d493f..98f48d0 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -109,50 +109,6 @@ static const u32 twobyte_is_boostable[256 / 32] = {
/*  --- */
/*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
 };
-static const u32 onebyte_has_modrm[256 / 32] = {
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-   /*  --- */
-   W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 00 */
-   W(0x10, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 10 */
-   W(0x20, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 20 */
-   W(0x30, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 30 */
-   W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 40 */
-   W(0x50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 50 */
-   W(0x60, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0) | /* 60 */
-   W(0x70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 70 */
-   W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */
-   W(0x90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 90 */
-   W(0xa0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* a0 */
-   W(0xb0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* b0 */
-   W(0xc0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* c0 */
-   W(0xd0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */
-   W(0xe0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* e0 */
-   W(0xf0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1)   /* f0 */
-   /*  --- */
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-};
-static const u32 twobyte_has_modrm[256 / 32] = {
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-   /*  --- */
-   W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1) | /* 0f */
-   W(0x10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0) , /* 1f */
-   W(0x20, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1) | /* 2f */
-   W(0x30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 3f */
-   W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 4f */
-   W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 5f */
-   W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 6f */
-   W(0x70, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1) , /* 7f */
-   W(0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 8f */
-   W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 9f */
-   W(0xa0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1) | /* af */
-   W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1) , /* bf */
-   W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* cf */
-   W(0xd0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* df */
-   W(0xe0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* ef */
-   W(0xf0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0)   /* ff */
-   /*  --- */
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-};
 #undef W
 
 struct kretprobe_blackpoint kretprobe_blacklist[] = {
@@ -345,68 +301,30 @@ static int __kprobes is_IF_modifier(kprobe_opcode_t *insn)
 static void __kprobes fix_riprel(struct kprobe *p)
 {
 #ifdef CONFIG_X86_64
-	u8 *insn = p->ainsn.insn;
-   s64 disp;
-   int need_modrm;
-
-   /* Skip legacy instruction prefixes.  */
-   while (1) {
-   switch (*insn) {
-   case 0x66:
-   case 0x67:
-   case 0x2e:
-   case 0x3e:
-   case 0x26:
-   case 0x64:
-   case 0x65:
-   case 0x36:
-   case 0xf0:
-   case 0xf3:
-   case 0xf2:
-   ++insn;
-   continue;
-   }
-   break;
-   }
+	struct insn insn;
+	kernel_insn_init(&insn, p->ainsn.insn);
 
-   /* Skip REX instruction prefix.  */
-   if (is_REX_prefix(insn))
-   ++insn;
-
-   if (*insn == 0x0f) {
-   /* Two-byte opcode.  */
-   ++insn

[PATCH -tip -v13 07/11] tracing: Introduce TRACE_FIELD_ZERO() macro

2009-07-24 Thread Masami Hiramatsu
Use TRACE_FIELD_ZERO(type, item) instead of TRACE_FIELD_ZERO_CHAR(item).
This also includes a fix of TRACE_ZERO_CHAR() macro.
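
The point of passing the type to the macro is that the element type of the
zero-length field is stringified instead of being hard-coded as "char". The
user-space sketch below shows the same preprocessor technique. DEMO_FIELD_ZERO
and struct demo_entry are hypothetical names, and snprintf() stands in for
trace_seq_printf().

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record ending in a zero-length (flexible) payload. */
struct demo_entry {
	unsigned long ip;
	char buf[];
};

/*
 * Describe a zero-size trailing field. #type and #item stringify the macro
 * arguments, and adjacent string literals are concatenated by the compiler.
 */
#define DEMO_FIELD_ZERO(out, len, stype, type, item)			\
	snprintf(out, len,						\
		 "\tfield:" #type " " #item ";\toffset:%u;\tsize:0;\n",	\
		 (unsigned int)offsetof(stype, item))
```

Calling DEMO_FIELD_ZERO(out, sizeof(out), struct demo_entry, char, buf) writes
a line that starts with "field:char buf;" and ends with "size:0;".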

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ingo Molnar mi...@elte.hu
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Frederic Weisbecker fweis...@gmail.com
---

 kernel/trace/trace_event_types.h |4 ++--
 kernel/trace/trace_export.c  |   16 
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/trace/trace_event_types.h b/kernel/trace/trace_event_types.h
index 6db005e..e74f090 100644
--- a/kernel/trace/trace_event_types.h
+++ b/kernel/trace/trace_event_types.h
@@ -109,7 +109,7 @@ TRACE_EVENT_FORMAT(bprint, TRACE_BPRINT, bprint_entry, ignore,
TRACE_STRUCT(
TRACE_FIELD(unsigned long, ip, ip)
TRACE_FIELD(char *, fmt, fmt)
-   TRACE_FIELD_ZERO_CHAR(buf)
+   TRACE_FIELD_ZERO(char, buf)
),
 	TP_RAW_FMT("%08lx (%d) fmt:%p %s")
 );
@@ -117,7 +117,7 @@ TRACE_EVENT_FORMAT(bprint, TRACE_BPRINT, bprint_entry, ignore,
 TRACE_EVENT_FORMAT(print, TRACE_PRINT, print_entry, ignore,
TRACE_STRUCT(
TRACE_FIELD(unsigned long, ip, ip)
-   TRACE_FIELD_ZERO_CHAR(buf)
+   TRACE_FIELD_ZERO(char, buf)
),
 	TP_RAW_FMT("%08lx (%d) fmt:%p %s")
 );
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index 7cee79d..23125b5 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -42,9 +42,9 @@ extern void __bad_type_size(void);
if (!ret)   \
return 0;
 
-#undef TRACE_FIELD_ZERO_CHAR
-#define TRACE_FIELD_ZERO_CHAR(item)					\
-	ret = trace_seq_printf(s, "\tfield:char " #item ";\t"		\
+#undef TRACE_FIELD_ZERO
+#define TRACE_FIELD_ZERO(type, item)					\
+	ret = trace_seq_printf(s, "\tfield:" #type " " #item ";\t"	\
 			       "offset:%u;\tsize:0;\n",			\
 			       (unsigned int)offsetof(typeof(field), item)); \
 	if (!ret)							\
@@ -90,9 +90,6 @@ ftrace_format_##call(struct ftrace_event_call *dummy, struct trace_seq *s)\
 
 #include "trace_event_types.h"
 
-#undef TRACE_ZERO_CHAR
-#define TRACE_ZERO_CHAR(arg)
-
 #undef TRACE_FIELD
 #define TRACE_FIELD(type, item, assign)\
 	entry->item = assign;
@@ -105,6 +102,9 @@ ftrace_format_##call(struct ftrace_event_call *dummy, struct trace_seq *s)\
 #define TRACE_FIELD_SIGN(type, item, assign, is_signed)\
TRACE_FIELD(type, item, assign)
 
+#undef TRACE_FIELD_ZERO
+#define TRACE_FIELD_ZERO(type, item)
+
 #undef TP_CMD
 #define TP_CMD(cmd...) cmd
 
@@ -176,8 +176,8 @@ __attribute__((section("_ftrace_events"))) event_##call = { \
if (ret)\
return ret;
 
-#undef TRACE_FIELD_ZERO_CHAR
-#define TRACE_FIELD_ZERO_CHAR(item)
+#undef TRACE_FIELD_ZERO
+#define TRACE_FIELD_ZERO(type, item)
 
 #undef TRACE_EVENT_FORMAT
 #define TRACE_EVENT_FORMAT(call, proto, args, fmt, tstruct, tpfmt) \


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com


[PATCH -tip -v13 06/11] tracing: ftrace dynamic ftrace_event_call support

2009-07-24 Thread Masami Hiramatsu
Add dynamic ftrace_event_call support to ftrace. Trace engines can add new
ftrace_event_calls to ftrace on the fly. Each operator function of a call
takes an ftrace_event_call data structure as an argument, because these
functions may be shared among several ftrace_event_calls.
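
The reason for the extra argument can be sketched in plain C: once a callback
receives the call it is operating on, one set of handlers can serve any number
of events created at run time. struct demo_event_call and the demo_*()
functions below are hypothetical, simplified stand-ins and not the kernel
definitions.

```c
/* Hypothetical, simplified stand-in for struct ftrace_event_call. */
struct demo_event_call {
	const char *name;
	int enabled;
	int (*regfunc)(struct demo_event_call *call);
	void (*unregfunc)(struct demo_event_call *call);
};

/* One handler pair serves every dynamically created event. */
static int demo_reg(struct demo_event_call *call)
{
	call->enabled = 1;
	return 0;
}

static void demo_unreg(struct demo_event_call *call)
{
	call->enabled = 0;
}

/* Fill in a new event at run time; all events share the same callbacks. */
static void demo_init_call(struct demo_event_call *call, const char *name)
{
	call->name = name;
	call->enabled = 0;
	call->regfunc = demo_reg;
	call->unregfunc = demo_unreg;
}
```

With the old (void) signatures each event needed its own generated functions,
because a callback had no way to tell which event it was called for.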

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Acked-by: Frederic Weisbecker fweis...@gmail.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ingo Molnar mi...@elte.hu
Cc: Tom Zanussi tzanu...@gmail.com
---

 include/linux/ftrace_event.h |   13 +---
 include/trace/ftrace.h   |   22 ++---
 kernel/trace/trace_events.c  |   72 --
 kernel/trace/trace_export.c  |   27 
 4 files changed, 86 insertions(+), 48 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 5c093ff..f7733b6 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -108,12 +108,13 @@ struct ftrace_event_call {
struct dentry   *dir;
struct trace_event  *event;
int enabled;
-   int (*regfunc)(void);
-   void(*unregfunc)(void);
+   int (*regfunc)(struct ftrace_event_call *);
+   void(*unregfunc)(struct ftrace_event_call *);
int id;
-   int (*raw_init)(void);
-   int (*show_format)(struct trace_seq *s);
-   int (*define_fields)(void);
+   int (*raw_init)(struct ftrace_event_call *);
+   int (*show_format)(struct ftrace_event_call *,
+  struct trace_seq *);
+   int (*define_fields)(struct ftrace_event_call *);
struct list_headfields;
int filter_active;
void*filter;
@@ -138,6 +139,8 @@ extern int filter_current_check_discard(struct ftrace_event_call *call,
 
 extern int trace_define_field(struct ftrace_event_call *call, char *type,
  char *name, int offset, int size, int is_signed);
+extern int trace_add_event_call(struct ftrace_event_call *call);
+extern void trace_remove_event_call(struct ftrace_event_call *call);
 
 #define is_signed_type(type)	(((type)(-1)) < 0)
 
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 1867553..d696580 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -147,7 +147,8 @@
 #undef TRACE_EVENT
 #define TRACE_EVENT(call, proto, args, tstruct, func, print)   \
 static int \
-ftrace_format_##call(struct trace_seq *s)  \
+ftrace_format_##call(struct ftrace_event_call *event_call, \
+struct trace_seq *s)   \
 {  \
struct ftrace_raw_##call field __attribute__((unused)); \
int ret = 0;\
@@ -289,10 +290,9 @@ ftrace_raw_output_##call(struct trace_iterator *iter, int flags)	\
 #undef TRACE_EVENT
 #define TRACE_EVENT(call, proto, args, tstruct, func, print)   \
 int\
-ftrace_define_fields_##call(void)  \
+ftrace_define_fields_##call(struct ftrace_event_call *event_call)  \
 {  \
struct ftrace_raw_##call field; \
-	struct ftrace_event_call *event_call = &event_##call;		\
int ret;\
\
__common_field(int, type, 1);   \
@@ -355,7 +355,7 @@ static inline int ftrace_get_offsets_##call(
\
  * event_trace_printk(_RET_IP_, call:  fmt);
  * }
  *
- * static int ftrace_reg_event_call(void)
+ * static int ftrace_reg_event_call(struct ftrace_event_call *unused)
  * {
  * int ret;
  *
@@ -366,7 +366,7 @@ static inline int ftrace_get_offsets_##call(
\
  * return ret;
  * }
  *
- * static void ftrace_unreg_event_call(void)
+ * static void ftrace_unreg_event_call(struct ftrace_event_call *unused)
  * {
  * unregister_trace_call(ftrace_event_call);
  * }
@@ -399,7 +399,7 @@ static inline int ftrace_get_offsets_##call(
\
  * trace_current_buffer_unlock_commit(event, irq_flags, pc);
  * }
  *
- * static int ftrace_raw_reg_event_call(void)
+ * static int ftrace_raw_reg_event_call(struct ftrace_event_call *unused)
  * {
  * int

[PATCH -tip -v13 05/11] x86: add pt_regs register and stack access APIs

2009-07-24 Thread Masami Hiramatsu
Add the following APIs for accessing registers and stack entries from pt_regs.
These APIs are required by the kprobes-based event tracer on ftrace.
Other debugging tools might be able to use them too.

- regs_query_register_offset(const char *name)
   Query the offset of the register called "name".

- regs_query_register_name(unsigned int offset)
   Query the name of a register by its offset.

- regs_get_register(struct pt_regs *regs, unsigned int offset)
   Get the value of a register by its offset.

- regs_within_kernel_stack(struct pt_regs *regs, unsigned long addr)
   Check whether the address is in the kernel stack.

- regs_get_kernel_stack_nth(struct pt_regs *reg, unsigned int nth)
   Get the Nth entry of the kernel stack. (N >= 0)

- regs_get_argument_nth(struct pt_regs *reg, unsigned int nth)
   Get the Nth argument at function call. (N >= 0)
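
The name/offset lookup and the offset-based read can be sketched in user
space as follows. struct demo_regs is a hypothetical three-register stand-in
for struct pt_regs, and the demo_*() names are illustrative. Returning -1 for
an unknown name is this sketch's choice, not something stated in the patch
description.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical register file; the real one is struct pt_regs. */
struct demo_regs {
	unsigned long ax;
	unsigned long bx;
	unsigned long sp;
};

struct demo_regs_offset {
	const char *name;
	int offset;
};

#define DEMO_REG(r) { .name = #r, .offset = offsetof(struct demo_regs, r) }
#define DEMO_MAX_REG_OFFSET (offsetof(struct demo_regs, sp))

/* Table terminated by a NULL name, built with offsetof() per register. */
static const struct demo_regs_offset demo_table[] = {
	DEMO_REG(ax), DEMO_REG(bx), DEMO_REG(sp),
	{ .name = NULL, .offset = 0 },
};

/* Name -> offset; returns -1 when the name is unknown (sketch's choice). */
static int demo_query_register_offset(const char *name)
{
	const struct demo_regs_offset *r;

	for (r = demo_table; r->name != NULL; r++)
		if (strcmp(r->name, name) == 0)
			return r->offset;
	return -1;
}

/* Offset -> value; offsets past the last register read as 0. */
static unsigned long demo_get_register(const struct demo_regs *regs,
				       unsigned int offset)
{
	if (offset > DEMO_MAX_REG_OFFSET)
		return 0;
	return *(const unsigned long *)((const char *)regs + offset);
}
```

A caller resolves the name once, when the probe is defined, and then uses the
cheap offset-based read every time the probe fires.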


Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Reviewed-by: Frederic Weisbecker fweis...@gmail.com
Cc: Andi Kleen a...@firstfloor.org
Cc: Christoph Hellwig h...@infradead.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Roland McGrath rol...@redhat.com
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: linux-a...@vger.kernel.org
---

 arch/x86/include/asm/ptrace.h |   62 +++
 arch/x86/kernel/ptrace.c  |  112 +
 2 files changed, 174 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 0f0d908..a3d49dd 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -7,6 +7,7 @@
 
 #ifdef __KERNEL__
 #include <asm/segment.h>
+#include <asm/page_types.h>
 #endif
 
 #ifndef __ASSEMBLY__
@@ -216,6 +217,67 @@ static inline unsigned long user_stack_pointer(struct pt_regs *regs)
 	return regs->sp;
 }
 
+/* Query offset/name of register from its name/offset */
+extern int regs_query_register_offset(const char *name);
+extern const char *regs_query_register_name(unsigned int offset);
+#define MAX_REG_OFFSET (offsetof(struct pt_regs, ss))
+
+/**
+ * regs_get_register() - get register value from its offset
+ * @regs:  pt_regs from which register value is gotten.
+ * @offset:offset number of the register.
+ *
+ * regs_get_register returns the value of a register whose offset from @regs
+ * is @offset. The @offset is the offset of the register in struct pt_regs.
+ * If @offset is bigger than MAX_REG_OFFSET, this returns 0.
+ */
+static inline unsigned long regs_get_register(struct pt_regs *regs,
+ unsigned int offset)
+{
+	if (unlikely(offset > MAX_REG_OFFSET))
+   return 0;
+   return *(unsigned long *)((unsigned long)regs + offset);
+}
+
+/**
+ * regs_within_kernel_stack() - check the address in the stack
+ * @regs:  pt_regs which contains kernel stack pointer.
+ * @addr:  address which is checked.
+ *
+ * regs_within_kenel_stack() checks @addr is within the kernel stack page(s).
+ * If @addr is within the kernel stack, it returns true. If not, returns false.
+ */
+static inline int regs_within_kernel_stack(struct pt_regs *regs,
+  unsigned long addr)
+{
+	return ((addr & ~(THREAD_SIZE - 1))  ==
+		(kernel_stack_pointer(regs) & ~(THREAD_SIZE - 1)));
+}
+
+/**
+ * regs_get_kernel_stack_nth() - get Nth entry of the stack
+ * @regs:  pt_regs which contains kernel stack pointer.
+ * @n: stack entry number.
+ *
+ * regs_get_kernel_stack_nth() returns @n th entry of the kernel stack which
+ * is specifined by @regs. If the @n th entry is NOT in the kernel stack,
+ * this returns 0.
+ */
+static inline unsigned long regs_get_kernel_stack_nth(struct pt_regs *regs,
+ unsigned int n)
+{
+   unsigned long *addr = (unsigned long *)kernel_stack_pointer(regs);
+   addr += n;
+   if (regs_within_kernel_stack(regs, (unsigned long)addr))
+   return *addr;
+   else
+   return 0;
+}
+
+/* Get Nth argument at function call */
+extern unsigned long regs_get_argument_nth(struct pt_regs *regs,
+  unsigned int n);
+
 /*
  * These are defined as per linux/ptrace.h, which see.
  */
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index cabdabc..32729ec 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -49,6 +49,118 @@ enum x86_regset {
REGSET_IOPERM32,
 };
 
+struct pt_regs_offset {
+   const char *name;
+   int offset;
+};
+
+#define REG_OFFSET_NAME(r) {.name = #r, .offset = offsetof(struct pt_regs, r)}
+#define REG_OFFSET_END {.name = NULL, .offset = 0}
+
+static const struct pt_regs_offset regoffset_table[] = {
+#ifdef CONFIG_X86_64
+   REG_OFFSET_NAME(r15),
+   REG_OFFSET_NAME(r14),
+   REG_OFFSET_NAME(r13

[PATCH -tip -v13 09/11] tracing: Kprobe-tracer supports more than 6 arguments

2009-07-24 Thread Masami Hiramatsu
Support up to 128 arguments for each kprobes event.
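
The patch gets there by turning the fixed-size args[] array into a flexible
array member and sizing each allocation with offsetof(). A user-space sketch
of the same pattern follows. The demo_* names are hypothetical, calloc()
stands in for kzalloc(), and recording nr_args at allocation time is a
simplification made by this sketch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct fetch_func and struct trace_probe. */
struct demo_fetch {
	unsigned long (*func)(void *regs, void *data);
	void *data;
};

struct demo_probe {
	unsigned int nr_args;
	struct demo_fetch args[];	/* flexible array member, must be last */
};

#define DEMO_MAX_ARGS 128
#define DEMO_SIZEOF_PROBE(n) \
	(offsetof(struct demo_probe, args) + sizeof(struct demo_fetch) * (n))

/* Allocate a zeroed probe with room for exactly nargs argument slots. */
static struct demo_probe *demo_alloc_probe(unsigned int nargs)
{
	struct demo_probe *tp;

	if (nargs > DEMO_MAX_ARGS)
		return NULL;
	tp = calloc(1, DEMO_SIZEOF_PROBE(nargs));
	if (tp)
		tp->nr_args = nargs;
	return tp;
}
```

A probe with two arguments then pays for two slots instead of the maximum,
which is what makes a limit as high as 128 affordable.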

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Tom Zanussi tzanu...@gmail.com
---

 Documentation/trace/kprobetrace.txt |2 +-
 kernel/trace/trace_kprobe.c |   21 +
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
index 9ad907c..b29a54b 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -32,7 +32,7 @@ Synopsis of kprobe_events
  SYMBOL[+offs|-offs]   : Symbol+offset where the probe is inserted.
  MEMADDR   : Address where the probe is inserted.
 
- FETCHARGS : Arguments.
+ FETCHARGS : Arguments. Each probe can have up to 128 args.
   %REG : Fetch register REG
   sN   : Fetch Nth entry of stack (N >= 0)
   @ADDR: Fetch memory at ADDR (ADDR should be in kernel)
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 39491f0..e78c4ea 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -32,7 +32,7 @@
 #include "trace.h"
 #include "trace_output.h"
 
-#define TRACE_KPROBE_ARGS 6
+#define MAX_TRACE_ARGS 128
 #define MAX_ARGSTR_LEN 63
 
 /* currently, trace_kprobe only supports X86. */
@@ -178,11 +178,15 @@ struct trace_probe {
struct kretproberp;
};
const char  *symbol;/* symbol name */
-   unsigned intnr_args;
-   struct fetch_func   args[TRACE_KPROBE_ARGS];
struct ftrace_event_callcall;
+   unsigned intnr_args;
+   struct fetch_func   args[];
 };
 
+#define SIZEOF_TRACE_PROBE(n)  \
+   (offsetof(struct trace_probe, args) +   \
+   (sizeof(struct fetch_func) * (n)))
+
 static int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs);
 static int kretprobe_trace_func(struct kretprobe_instance *ri,
struct pt_regs *regs);
@@ -255,11 +259,11 @@ static DEFINE_MUTEX(probe_lock);
 static LIST_HEAD(probe_list);
 
 static struct trace_probe *alloc_trace_probe(const char *symbol,
-const char *event)
+const char *event, int nargs)
 {
struct trace_probe *tp;
 
-   tp = kzalloc(sizeof(struct trace_probe), GFP_KERNEL);
+   tp = kzalloc(SIZEOF_TRACE_PROBE(nargs), GFP_KERNEL);
if (!tp)
return ERR_PTR(-ENOMEM);
 
@@ -559,9 +563,10 @@ static int create_trace_probe(int argc, char **argv)
 		if (offset && is_return)
return -EINVAL;
}
+   argc -= 2; argv += 2;
 
/* setup a probe */
-   tp = alloc_trace_probe(symbol, event);
+   tp = alloc_trace_probe(symbol, event, argc);
if (IS_ERR(tp))
return PTR_ERR(tp);
 
@@ -580,8 +585,8 @@ static int create_trace_probe(int argc, char **argv)
 	kp->addr = addr;
 
 	/* parse arguments */
-	argc -= 2; argv += 2; ret = 0;
-	for (i = 0; i < argc && i < TRACE_KPROBE_ARGS; i++) {
+	ret = 0;
+	for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
 		if (strlen(argv[i]) > MAX_ARGSTR_LEN) {
 			pr_info("Argument%d(%s) is too long.\n", i, argv[i]);
 			ret = -ENOSPC;


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com


[PATCH -tip -v13 08/11] tracing: add kprobe-based event tracer

2009-07-24 Thread Masami Hiramatsu
Add kprobes-based event tracer on ftrace.

This tracer is similar to the events tracer, which is based on the Tracepoint
infrastructure. Instead of Tracepoint, this tracer is based on kprobes
(kprobe and kretprobe). It can probe anywhere kprobes can probe (that is,
any function body except for __kprobes functions).

Like the events tracer, this tracer doesn't need to be activated via
current_tracer. Instead, just set probe points via
/sys/kernel/debug/tracing/kprobe_events. You can also set filters on each
probe event via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter.

This tracer supports the following probe arguments for each probe.

  %REG  : Fetch register REG
  sN    : Fetch Nth entry of stack (N >= 0)
  @ADDR : Fetch memory at ADDR (ADDR should be in kernel)
  @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)
  aN    : Fetch function argument. (N >= 0)
  rv    : Fetch return value.
  ra    : Fetch return address.
  +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.

See Documentation/trace/kprobetrace.txt for details.
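
As an illustration of the argument syntax listed above, the sketch below
classifies a single FETCHARG string by its leading characters. It is not the
parser from this patch: the enum and function names are hypothetical, and it
only inspects the prefix without validating the rest of the string.

```c
#include <ctype.h>

enum demo_fetch_kind {
	DEMO_FETCH_REG,		/* %REG */
	DEMO_FETCH_STACK,	/* sN */
	DEMO_FETCH_MEM,		/* @ADDR or @SYM[+|-offs] */
	DEMO_FETCH_ARG,		/* aN */
	DEMO_FETCH_RETVAL,	/* rv */
	DEMO_FETCH_RETADDR,	/* ra */
	DEMO_FETCH_DEREF,	/* +|-offs(FETCHARG) */
	DEMO_FETCH_INVALID,
};

/* Decide which kind of fetch argument 'arg' is from its first characters. */
static enum demo_fetch_kind demo_classify_fetcharg(const char *arg)
{
	switch (arg[0]) {
	case '%':
		return arg[1] ? DEMO_FETCH_REG : DEMO_FETCH_INVALID;
	case '@':
		return arg[1] ? DEMO_FETCH_MEM : DEMO_FETCH_INVALID;
	case '+':
	case '-':
		return DEMO_FETCH_DEREF;
	case 's':
		return isdigit((unsigned char)arg[1]) ?
			DEMO_FETCH_STACK : DEMO_FETCH_INVALID;
	case 'a':
		return isdigit((unsigned char)arg[1]) ?
			DEMO_FETCH_ARG : DEMO_FETCH_INVALID;
	case 'r':
		if (arg[1] == 'v' && arg[2] == '\0')
			return DEMO_FETCH_RETVAL;
		if (arg[1] == 'a' && arg[2] == '\0')
			return DEMO_FETCH_RETADDR;
		return DEMO_FETCH_INVALID;
	default:
		return DEMO_FETCH_INVALID;
	}
}
```

A real parser would go on to parse the register name, number, or address, and
would recurse into the parenthesized FETCHARG of the dereference form.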

Changes from v12:
 - Check O_TRUNC for cleanup events, instead of !O_APPEND.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Acked-by: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Li Zefan l...@cn.fujitsu.com
---

 Documentation/trace/kprobetrace.txt |  138 
 kernel/trace/Kconfig|   12 
 kernel/trace/Makefile   |1 
 kernel/trace/trace.h|   29 +
 kernel/trace/trace_event_types.h|   18 +
 kernel/trace/trace_kprobe.c | 1193 +++
 6 files changed, 1391 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/trace/kprobetrace.txt
 create mode 100644 kernel/trace/trace_kprobe.c

diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
new file mode 100644
index 000..9ad907c
--- /dev/null
+++ b/Documentation/trace/kprobetrace.txt
@@ -0,0 +1,138 @@
+ Kprobe-based Event Tracer
+ =
+
+ Documentation is written by Masami Hiramatsu
+
+
+Overview
+
+This tracer is similar to the events tracer which is based on Tracepoint
+infrastructure. Instead of Tracepoint, this tracer is based on kprobes(kprobe
+and kretprobe). It probes anywhere where kprobes can probe(this means, all
+functions body except for __kprobes functions).
+
+Unlike the function tracer, this tracer can probe instructions inside of
+kernel functions. It allows you to check which instruction has been executed.
+
+Unlike the Tracepoint based events tracer, this tracer can add and remove
+probe points on the fly.
+
+Similar to the events tracer, this tracer doesn't need to be activated via
+current_tracer, instead of that, just set probe points via
+/sys/kernel/debug/tracing/kprobe_events. And you can set filters on each
+probe events via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter.
+
+
+Synopsis of kprobe_events
+-
+  p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS]: Set a probe
+  r[:EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe
+
+ EVENT : Event name.
+ SYMBOL[+offs|-offs]   : Symbol+offset where the probe is inserted.
+ MEMADDR   : Address where the probe is inserted.
+
+ FETCHARGS : Arguments.
+  %REG : Fetch register REG
+  sN   : Fetch Nth entry of stack (N >= 0)
+  @ADDR: Fetch memory at ADDR (ADDR should be in kernel)
+  @SYM[+|-offs]: Fetch memory at SYM +|- offs (SYM should be a data symbol)
+  aN   : Fetch function argument. (N >= 0)(*)
+  rv   : Fetch return value.(**)
+  ra   : Fetch return address.(**)
+  +|-offs(FETCHARG) : fetch memory at FETCHARG +|- offs address.(***)
+
+  (*) aN may not correct on asmlinkaged functions and at the middle of
+  function body.
+  (**) only for return probe.
+  (***) this is useful for fetching a field of data structures.
+
+
+Per-Probe Event Filtering
+-
+ Per-probe event filtering feature allows you to set different filter on each
+probe and gives you what arguments will be shown in trace buffer. If an event
+name is specified right after 'p:' or 'r:' in kprobe_events, the tracer adds
+an event under tracing/events/kprobes/EVENT, at the directory you can see
+'id', 'enabled', 'format' and 'filter'.
+
+enabled:
+  You can enable/disable the probe by writing 1 or 0 on it.
+
+format:
+  It shows the format of this probe event. It also shows aliases of arguments
+ which you specified to kprobe_events.
+
+filter:
+  You can write filtering rules of this event. And you can use both of aliase
+ names and field names for describing filters.
+
+
+Usage examples

[PATCH -tip -v13 10/11] tracing: Generate names for each kprobe event automatically

2009-07-24 Thread Masami Hiramatsu
Generate names for each kprobe event based on the probe point,
and remove generic k*probe event types because there is no user
of those types.
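
The naming code itself is not visible in this excerpt, so the following is
only a sketch of how a name could be derived from the probe point. The
"p@SYMBOL+offs" and "p@0xADDR" layouts and the demo_gen_event_name() helper
are assumptions made for illustration.

```c
#include <stdio.h>

#define DEMO_EVENT_NAME_LEN 64

/*
 * Build an event name from the probe point. 'p' marks a normal probe and
 * 'r' a return probe; the name uses SYMBOL+offs when a symbol is given and
 * the raw address otherwise. Returns what snprintf() returns.
 */
static int demo_gen_event_name(char *buf, size_t len, int is_return,
			       const char *symbol, long offset,
			       unsigned long addr)
{
	char type = is_return ? 'r' : 'p';

	if (symbol)
		return snprintf(buf, len, "%c@%s%+ld", type, symbol, offset);
	return snprintf(buf, len, "%c@0x%lx", type, addr);
}
```

Because every probe then has a name, the code paths that used to test
tp->call.name for NULL can be simplified, which is what the trace_kprobe.c
hunks in this patch do.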

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Tom Zanussi tzanu...@gmail.com
---

 Documentation/trace/kprobetrace.txt |3 +-
 kernel/trace/trace_event_types.h|   18 --
 kernel/trace/trace_kprobe.c |   64 ++-
 3 files changed, 35 insertions(+), 50 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
index b29a54b..437ad49 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -28,7 +28,8 @@ Synopsis of kprobe_events
   p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS]: Set a probe
   r[:EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe
 
- EVENT : Event name.
+ EVENT : Event name. If omitted, the event name is generated
+ based on SYMBOL+offs or MEMADDR.
  SYMBOL[+offs|-offs]   : Symbol+offset where the probe is inserted.
  MEMADDR   : Address where the probe is inserted.
 
diff --git a/kernel/trace/trace_event_types.h b/kernel/trace/trace_event_types.h
index 186b598..e74f090 100644
--- a/kernel/trace/trace_event_types.h
+++ b/kernel/trace/trace_event_types.h
@@ -175,22 +175,4 @@ TRACE_EVENT_FORMAT(kmem_free, TRACE_KMEM_FREE, kmemtrace_free_entry, ignore,
 	TP_RAW_FMT("type:%u call_site:%lx ptr:%p")
 );
 
-TRACE_EVENT_FORMAT(kprobe, TRACE_KPROBE, kprobe_trace_entry, ignore,
-   TRACE_STRUCT(
-   TRACE_FIELD(unsigned long, ip, ip)
-   TRACE_FIELD(int, nargs, nargs)
-   TRACE_FIELD_ZERO(unsigned long, args)
-   ),
-   TP_RAW_FMT(%08lx: args:0x%lx ...)
-);
-
-TRACE_EVENT_FORMAT(kretprobe, TRACE_KRETPROBE, kretprobe_trace_entry, ignore,
-   TRACE_STRUCT(
-   TRACE_FIELD(unsigned long, func, func)
-   TRACE_FIELD(unsigned long, ret_ip, ret_ip)
-   TRACE_FIELD(int, nargs, nargs)
-   TRACE_FIELD_ZERO(unsigned long, args)
-   ),
-   TP_RAW_FMT(%08lx - %08lx: args:0x%lx ...)
-);
 #undef TRACE_SYSTEM
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index e78c4ea..9f9f161 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -34,6 +34,7 @@
 
 #define MAX_TRACE_ARGS 128
 #define MAX_ARGSTR_LEN 63
+#define MAX_EVENT_NAME_LEN 64
 
 /* currently, trace_kprobe only supports X86. */
 
@@ -272,11 +273,11 @@ static struct trace_probe *alloc_trace_probe(const char *symbol,
 		if (!tp->symbol)
 			goto error;
 	}
-	if (event) {
-		tp->call.name = kstrdup(event, GFP_KERNEL);
-		if (!tp->call.name)
-			goto error;
-	}
+	if (!event)
+		goto error;
+	tp->call.name = kstrdup(event, GFP_KERNEL);
+	if (!tp->call.name)
+		goto error;
 
 	INIT_LIST_HEAD(&tp->list);
 	return tp;
@@ -306,7 +307,7 @@ static struct trace_probe *find_probe_event(const char *event)
 	struct trace_probe *tp;
 
 	list_for_each_entry(tp, &probe_list, list)
-		if (tp->call.name && !strcmp(tp->call.name, event))
+		if (!strcmp(tp->call.name, event))
 			return tp;
return NULL;
 }
@@ -322,8 +323,7 @@ static void __unregister_trace_probe(struct trace_probe *tp)
 /* Unregister a trace_probe and probe_event: call with locking probe_lock */
 static void unregister_trace_probe(struct trace_probe *tp)
 {
-	if (tp->call.name)
-		unregister_probe_event(tp);
+	unregister_probe_event(tp);
 	__unregister_trace_probe(tp);
 	list_del(&tp->list);
 }
@@ -352,18 +352,16 @@ static int register_trace_probe(struct trace_probe *tp)
goto end;
}
/* register as an event */
-	if (tp->call.name) {
-		old_tp = find_probe_event(tp->call.name);
-		if (old_tp) {
-			/* delete old event */
-			unregister_trace_probe(old_tp);
-			free_trace_probe(old_tp);
-		}
-		ret = register_probe_event(tp);
-		if (ret) {
-			pr_warning("Faild to register probe event(%d)\n", ret);
-			__unregister_trace_probe(tp);
-		}
+	old_tp = find_probe_event(tp->call.name);
+	if (old_tp) {
+		/* delete old event */
+		unregister_trace_probe(old_tp);
+		free_trace_probe(old_tp);
+	}
+	ret = register_probe_event(tp);
+	if (ret) {
+   pr_warning(Faild

[PATCH -tip -v13 00/11] tracing: kprobe-based event tracer and x86 instruction decoder

2009-07-24 Thread Masami Hiramatsu
/kprobe_events

 This sets a kretprobe on the return point of the do_sys_open() function,
recording the return value and return address as the "myretprobe" event.
 You can see the format of these events via
/sys/kernel/debug/tracing/events/kprobes/EVENT/format.

  cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format
name: myprobe
ID: 23
format:
field:unsigned short common_type;   offset:0;   size:2;
field:unsigned char common_flags;   offset:2;   size:1;
field:unsigned char common_preempt_count;   offset:3;   size:1;
field:int common_pid;   offset:4;   size:4;
field:int common_tgid;  offset:8;   size:4;

field: unsigned long ip;offset:16;tsize:8;
field: int nargs;   offset:24;tsize:4;
field: unsigned long arg0;  offset:32;tsize:8;
field: unsigned long arg1;  offset:40;tsize:8;
field: unsigned long arg2;  offset:48;tsize:8;
field: unsigned long arg3;  offset:56;tsize:8;

alias: a0;  original: arg0;
alias: a1;  original: arg1;
alias: a2;  original: arg2;
alias: a3;  original: arg3;

print fmt: %lx: 0x%lx 0x%lx 0x%lx 0x%lx, ip, arg0, arg1, arg2, arg3


 You can see that the event has 4 arguments and an alias expression
corresponding to each of them.

  echo > /sys/kernel/debug/tracing/kprobe_events

 This clears all probe points. You can see the traced information via
/sys/kernel/debug/tracing/trace.

  cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
           <...>-1447  [001] 1038282.286875: do_sys_open+0x0/0xd6: 0x3 0x7fffd1ec4440 0x8000 0x0
           <...>-1447  [001] 1038282.286878: sys_openat+0xc/0xe <- do_sys_open: 0xfffe 0x81367a3a
           <...>-1447  [001] 1038282.286885: do_sys_open+0x0/0xd6: 0xff9c 0x40413c 0x8000 0x1b6
           <...>-1447  [001] 1038282.286915: sys_open+0x1b/0x1d <- do_sys_open: 0x3 0x81367a3a
           <...>-1447  [001] 1038282.286969: do_sys_open+0x0/0xd6: 0xff9c 0x4041c6 0x98800 0x10
           <...>-1447  [001] 1038282.286976: sys_open+0x1b/0x1d <- do_sys_open: 0x3 0x81367a3a


 Each line shows when the kernel hits a probe, and "<- SYMBOL" means the kernel
returns from SYMBOL (e.g. "sys_open+0x1b/0x1d <- do_sys_open" means the kernel
returns from do_sys_open to sys_open+0x1b).


Thank you,

---

Masami Hiramatsu (11):
  tracing: Add kprobes event profiling interface
  tracing: Generate names for each kprobe event automatically
  tracing: Kprobe-tracer supports more than 6 arguments
  tracing: add kprobe-based event tracer
  tracing: Introduce TRACE_FIELD_ZERO() macro
  tracing: ftrace dynamic ftrace_event_call support
  x86: add pt_regs register and stack access APIs
  kprobes: cleanup fix_riprel() using insn decoder on x86
  kprobes: checks probe address is instruction boudary on x86
  x86: x86 instruction decoder build-time selftest
  x86: instruction decoder API


 Documentation/trace/kprobetrace.txt  |  147 
 arch/x86/Kconfig.debug   |9 
 arch/x86/Makefile|3 
 arch/x86/include/asm/inat.h  |  188 +
 arch/x86/include/asm/inat_types.h|   29 +
 arch/x86/include/asm/insn.h  |  143 
 arch/x86/include/asm/ptrace.h|   62 ++
 arch/x86/kernel/kprobes.c|  197 +++--
 arch/x86/kernel/ptrace.c |  112 +++
 arch/x86/lib/Makefile|   13 
 arch/x86/lib/inat.c  |   78 ++
 arch/x86/lib/insn.c  |  464 +
 arch/x86/lib/x86-opcode-map.txt  |  719 
 arch/x86/tools/Makefile  |   15 
 arch/x86/tools/distill.awk   |   42 +
 arch/x86/tools/gen-insn-attr-x86.awk |  314 +
 arch/x86/tools/test_get_len.c|  113 +++
 include/linux/ftrace_event.h |   13 
 include/trace/ftrace.h   |   22 -
 kernel/trace/Kconfig |   12 
 kernel/trace/Makefile|1 
 kernel/trace/trace.h |   29 +
 kernel/trace/trace_event_types.h |4 
 kernel/trace/trace_events.c  |   72 +-
 kernel/trace/trace_export.c  |   43 +
 kernel/trace/trace_kprobe.c  | 1243 ++
 26 files changed, 3924 insertions(+), 163 deletions(-)
 create mode 100644 Documentation/trace/kprobetrace.txt
 create mode 100644 arch/x86/include/asm/inat.h
 create mode 100644 arch/x86/include/asm/inat_types.h
 create mode 100644 arch/x86/include/asm/insn.h
 create mode 100644 arch/x86/lib/inat.c
 create mode 100644 arch/x86/lib/insn.c
 create mode 100644 arch/x86/lib/x86-opcode-map.txt
 create mode 100644 arch/x86/tools/Makefile
 create mode 100644 arch/x86/tools/distill.awk
 create mode 100644 arch/x86/tools/gen-insn-attr-x86.awk
 create mode 100644 arch/x86/tools/test_get_len.c

[PATCH -tip -v13 11/11] tracing: Add kprobes event profiling interface

2009-07-24 Thread Masami Hiramatsu
Add profiling interfaces for each kprobes event. This interface reports
how many times each probe was hit or missed.

Changes from v12:
 - Reformat profile data.
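
A reader of kprobe_profile sees three whitespace-separated columns per event: the event name, the hit count and the miss-hit count. The following is a minimal user-space sketch of parsing one such line. The struct and function names are hypothetical and not part of this patch; it only assumes the column layout described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One row of kprobe_profile: event name, probe hits, probe miss-hits. */
struct profile_row {
    char name[64];
    unsigned long nhit;
    unsigned long nmissed;
};

/*
 * Parse one whitespace-separated line such as "  myevent   12   0".
 * Returns 0 on success, -1 if the line does not hold all three columns.
 */
int parse_profile_line(const char *line, struct profile_row *row)
{
    assert(line != NULL && row != NULL);
    memset(row, 0, sizeof(*row));
    if (sscanf(line, " %63s %lu %lu",
               row->name, &row->nhit, &row->nmissed) != 3)
        return -1;
    return 0;
}
```

A caller would read /sys/kernel/debug/tracing/kprobe_profile line by line (for example with fgets) and pass each line to parse_profile_line().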

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Li Zefan l...@cn.fujitsu.com
---

 Documentation/trace/kprobetrace.txt |8 +++
 kernel/trace/trace_kprobe.c |   43 +++
 2 files changed, 51 insertions(+), 0 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
index 437ad49..9c6be05 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -69,6 +69,14 @@ filter:
  names and field names for describing filters.
 
 
+Event Profiling
+---
+ You can check the total number of probe hits and probe miss-hits via
+/sys/kernel/debug/tracing/kprobe_profile.
+ The first column is event name, the second is the number of probe hits,
+the third is the number of probe miss-hits.
+
+
 Usage examples
 --
 To add a probe as a new event, write a new definition to kprobe_events
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 9f9f161..aedf25a 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -178,6 +178,7 @@ struct trace_probe {
                struct kprobe           kp;
                struct kretprobe        rp;
        };
+       unsigned long           nhit;
        const char              *symbol;        /* symbol name */
        struct ftrace_event_call        call;
        unsigned int            nr_args;
@@ -766,6 +767,37 @@ static const struct file_operations kprobe_events_ops = {
.write  = probes_write,
 };
 
+/* Probes profiling interfaces */
+static int probes_profile_seq_show(struct seq_file *m, void *v)
+{
+   struct trace_probe *tp = v;
+
+       seq_printf(m, "  %-44s %15lu %15lu\n", tp->call.name, tp->nhit,
+                  probe_is_return(tp) ? tp->rp.kp.nmissed : tp->kp.nmissed);
+
+   return 0;
+}
+
+static const struct seq_operations profile_seq_op = {
+   .start  = probes_seq_start,
+   .next   = probes_seq_next,
+   .stop   = probes_seq_stop,
+   .show   = probes_profile_seq_show
+};
+
+static int profile_open(struct inode *inode, struct file *file)
+{
+       return seq_open(file, &profile_seq_op);
+}
+
+static const struct file_operations kprobe_profile_ops = {
+   .owner  = THIS_MODULE,
+   .open   = profile_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= seq_release,
+};
+
 /* Kprobe handler */
 static __kprobes int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs)
 {
@@ -776,6 +808,8 @@ static __kprobes int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs)
        unsigned long irq_flags;
        struct ftrace_event_call *call = &tp->call;
 
+       tp->nhit++;
+
local_save_flags(irq_flags);
pc = preempt_count();
 
@@ -1152,9 +1186,18 @@ static __init int init_kprobe_trace(void)
        entry = debugfs_create_file("kprobe_events", 0644, d_tracer,
                                    NULL, &kprobe_events_ops);
 
+       /* Event list interface */
        if (!entry)
                pr_warning("Could not create debugfs "
                           "'kprobe_events' entry\n");
+
+       /* Profile interface */
+       entry = debugfs_create_file("kprobe_profile", 0444, d_tracer,
+                                   NULL, &kprobe_profile_ops);
+
+       if (!entry)
+               pr_warning("Could not create debugfs "
+                          "'kprobe_profile' entry\n");
return 0;
 }
 fs_initcall(init_kprobe_trace);


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH -tip -v13 01/11] x86: instruction decoder API

2009-07-24 Thread Masami Hiramatsu
Add an x86 instruction decoder to the arch-specific libraries. This decoder
can decode x86 instructions used in the kernel into prefix, opcode, modrm,
sib, displacement and immediates. It can also report the length of
instructions.
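
To illustrate the kind of fields the decoder separates, here is a small stand-alone sketch that splits ModRM and SIB bytes into their bit fields. The bit layout is the one defined by the x86 architecture; the struct and function names are hypothetical and are not the API added by this patch.

```c
#include <assert.h>
#include <stdint.h>

/* ModRM byte: mod (bits 7-6), reg (bits 5-3), rm (bits 2-0). */
struct modrm_fields { uint8_t mod, reg, rm; };
/* SIB byte: scale (bits 7-6), index (bits 5-3), base (bits 2-0). */
struct sib_fields { uint8_t scale, index, base; };

struct modrm_fields split_modrm(uint8_t byte)
{
    struct modrm_fields f = {
        .mod = (uint8_t)(byte >> 6),
        .reg = (uint8_t)((byte >> 3) & 7),
        .rm  = (uint8_t)(byte & 7),
    };
    return f;
}

struct sib_fields split_sib(uint8_t byte)
{
    struct sib_fields f = {
        .scale = (uint8_t)(byte >> 6),
        .index = (uint8_t)((byte >> 3) & 7),
        .base  = (uint8_t)(byte & 7),
    };
    return f;
}

/* With 32/64-bit addressing, a SIB byte follows when mod != 3 and rm == 4. */
int modrm_has_sib(struct modrm_fields f)
{
    return f.mod != 3 && f.rm == 4;
}

void modrm_example(void)
{
    /* 0x44 = 01 000 100b: mod 1, reg 0, rm 4, so a SIB byte follows. */
    struct modrm_fields f = split_modrm(0x44);
    assert(f.mod == 1 && f.reg == 0 && f.rm == 4);
    assert(modrm_has_sib(f));
}
```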

This version introduces instruction attributes for decoding instructions.
The instruction attribute tables are generated from the opcode map file
(x86-opcode-map.txt) by the generator script (gen-insn-attr-x86.awk).

Currently, the opcode maps are based on the opcode maps in the Intel(R) 64 and
IA-32 Architectures Software Developer's Manual Vol.2: Appendix A,
and consist of the following two types of opcode tables.

1-byte/2-byte/3-byte opcode tables, each of which has 256 elements, are
written as below:

 Table: table-name
 Referrer: escaped-name
 opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
  (or)
 opcode: escape # escaped-name
 EndTable

Group opcode tables, each of which has 8 elements, are written as below:

 GrpTable: GrpXXX
 reg:  mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
 EndTable
 EndTable
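
The two table shapes suggest a two-level lookup: a 256-entry table indexed by the opcode byte and, for group opcodes, an 8-entry table indexed by the 3-bit reg field of ModRM. The sketch below illustrates that idea only. The attribute encoding (ATTR_GROUP_SHIFT and friends), the table contents and the function name are assumptions made for illustration, not the generated tables or the lookup code of this patch.

```c
#include <assert.h>
#include <stdint.h>

#define OPCODE_TABLE_SIZE 256   /* one entry per opcode byte */
#define GROUP_TABLE_SIZE  8     /* one entry per ModRM.reg value */
#define NR_GROUPS         2     /* hypothetical; group id 0 means "no group" */

typedef uint32_t attr_t;        /* hypothetical attribute word */

#define ATTR_GROUP_SHIFT    8   /* hypothetical: group id lives above bit 8 */
#define ATTR_MAKE_GROUP(id) ((attr_t)(id) << ATTR_GROUP_SHIFT)
#define ATTR_GROUP_ID(a)    ((a) >> ATTR_GROUP_SHIFT)

static attr_t opcode_table[OPCODE_TABLE_SIZE];
static attr_t group_tables[NR_GROUPS][GROUP_TABLE_SIZE];

/* Return the attribute for an opcode, refining group opcodes by ModRM.reg. */
attr_t lookup_attr(uint8_t opcode, uint8_t modrm)
{
    attr_t attr = opcode_table[opcode];
    unsigned int grp = ATTR_GROUP_ID(attr);

    if (grp == 0 || grp >= NR_GROUPS)
        return attr;
    return group_tables[grp][(modrm >> 3) & 7];
}

void lookup_example(void)
{
    opcode_table[0x90] = 1;                   /* plain opcode */
    opcode_table[0xf7] = ATTR_MAKE_GROUP(1);  /* opcode that refers to a group */
    group_tables[1][2] = 42;                  /* entry for reg == 2 */

    assert(lookup_attr(0x90, 0x00) == 1);
    /* 0xd0 = 11 010 000b, so reg == 2 selects the third group entry. */
    assert(lookup_attr(0xf7, 0xd0) == 42);
}
```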

These opcode maps include a few SSE and FP opcodes (for setup), because
those opcodes are used in the kernel.

Changes from v12:
 - Use arch/x86/tools dir instead of arch/x86/scripts.
 - Remove all EXPORT_SYMBOL_GPL() and linux/module.h.
 - Replace all types defined in linux/types.h.
 - Use inline functions instead of macros.
 - Add VIA's RNG/ACE instructions.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Signed-off-by: Jim Keniston jkeni...@us.ibm.com
Acked-by: H. Peter Anvin h...@zytor.com
Cc: Sam Ravnborg s...@ravnborg.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Vegard Nossum vegard.nos...@gmail.com
Cc: Avi Kivity a...@redhat.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
---

 arch/x86/include/asm/inat.h  |  188 +
 arch/x86/include/asm/inat_types.h|   29 +
 arch/x86/include/asm/insn.h  |  143 +++
 arch/x86/lib/Makefile|   13 +
 arch/x86/lib/inat.c  |   78 
 arch/x86/lib/insn.c  |  464 ++
 arch/x86/lib/x86-opcode-map.txt  |  719 ++
 arch/x86/tools/gen-insn-attr-x86.awk |  314 +++
 8 files changed, 1948 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/inat.h
 create mode 100644 arch/x86/include/asm/inat_types.h
 create mode 100644 arch/x86/include/asm/insn.h
 create mode 100644 arch/x86/lib/inat.c
 create mode 100644 arch/x86/lib/insn.c
 create mode 100644 arch/x86/lib/x86-opcode-map.txt
 create mode 100644 arch/x86/tools/gen-insn-attr-x86.awk

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
new file mode 100644
index 000..2866fdd
--- /dev/null
+++ b/arch/x86/include/asm/inat.h
@@ -0,0 +1,188 @@
+#ifndef _ASM_X86_INAT_H
+#define _ASM_X86_INAT_H
+/*
+ * x86 instruction attributes
+ *
+ * Written by Masami Hiramatsu mhira...@redhat.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ */
+#include <asm/inat_types.h>
+
+/*
+ * Internal bits. Don't use bitmasks directly, because these bits are
+ * unstable. You should use checking functions.
+ */
+
+#define INAT_OPCODE_TABLE_SIZE 256
+#define INAT_GROUP_TABLE_SIZE 8
+
+/* Legacy instruction prefixes */
+#define INAT_PFX_OPNDSZ        1       /* 0x66 */ /* LPFX1 */
+#define INAT_PFX_REPNE         2       /* 0xF2 */ /* LPFX2 */
+#define INAT_PFX_REPE          3       /* 0xF3 */ /* LPFX3 */
+#define INAT_PFX_LOCK          4       /* 0xF0 */
+#define INAT_PFX_CS            5       /* 0x2E */
+#define INAT_PFX_DS            6       /* 0x3E */
+#define INAT_PFX_ES            7       /* 0x26 */
+#define INAT_PFX_FS            8       /* 0x64 */
+#define INAT_PFX_GS            9       /* 0x65 */
+#define INAT_PFX_SS            10      /* 0x36 */
+#define INAT_PFX_ADDRSZ        11      /* 0x67 */
+
+#define INAT_LPREFIX_MAX   3
+
+/* Immediate size */
+#define INAT_IMM_BYTE  1
+#define INAT_IMM_WORD  2
+#define INAT_IMM_DWORD 3
+#define INAT_IMM_QWORD 4
+#define INAT_IMM_PTR   5
+#define INAT_IMM_VWORD32   6
+#define INAT_IMM_VWORD 7
+
+/* Legacy

[PATCH -tip -v13 02/11] x86: x86 instruction decoder build-time selftest

2009-07-24 Thread Masami Hiramatsu
Add a user-space selftest of the x86 instruction decoder at kernel build time.
When CONFIG_X86_DECODER_SELFTEST=y, Kbuild builds a test harness for the x86
instruction decoder and runs it after building vmlinux.
The test compares the results of objdump with those of the x86 instruction
decoder code and checks that there are no differences.
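
The comparison comes down to this: for each disassembled instruction, the number of hex bytes objdump printed must equal the length the decoder reports. A minimal sketch of that check follows, assuming the hex bytes arrive as a space-separated string such as "48 89 e5". The function names are hypothetical and this is not the test_get_len.c code itself.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Count the instruction bytes in an objdump-style hex field such as
 * "48 89 e5". Returns -1 if a token is not exactly two hex digits.
 */
int count_insn_bytes(const char *hex)
{
    int n = 0;

    assert(hex != NULL);
    while (*hex) {
        if (*hex == ' ') {
            hex++;
            continue;
        }
        if (!isxdigit((unsigned char)hex[0]) ||
            !isxdigit((unsigned char)hex[1]))
            return -1;
        if (hex[2] != '\0' && hex[2] != ' ')
            return -1;
        n++;
        hex += 2;
    }
    return n;
}

/* Nonzero when the decoder's length agrees with the objdump byte count. */
int length_matches(const char *hex, int decoded_len)
{
    return count_insn_bytes(hex) == decoded_len;
}
```

A harness built on this would report every instruction for which length_matches() returns zero and fail the build step if any were found.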

Changes from v12:
 - Remove user_include.h.
 - Use $(OBJDUMP) instead of native objdump.
 - Use hostprogs-y and include insn.c and inat.c directly from test_gen_insn.c.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Signed-off-by: Jim Keniston jkeni...@us.ibm.com
Cc: Sam Ravnborg s...@ravnborg.org
Cc: H. Peter Anvin h...@zytor.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Vegard Nossum vegard.nos...@gmail.com
Cc: Avi Kivity a...@redhat.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
---

 arch/x86/Kconfig.debug|9 +++
 arch/x86/Makefile |3 +
 arch/x86/tools/Makefile   |   15 +
 arch/x86/tools/distill.awk|   42 +++
 arch/x86/tools/test_get_len.c |  113 +
 5 files changed, 182 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/tools/Makefile
 create mode 100644 arch/x86/tools/distill.awk
 create mode 100644 arch/x86/tools/test_get_len.c

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index d105f29..7d0b681 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -186,6 +186,15 @@ config X86_DS_SELFTEST
 config HAVE_MMIOTRACE_SUPPORT
def_bool y
 
+config X86_DECODER_SELFTEST
+ bool "x86 instruction decoder selftest"
+ depends on DEBUG_KERNEL
+   ---help---
+Perform x86 instruction decoder selftests at build time.
+This option is useful for checking the sanity of x86 instruction
+decoder code.
+If unsure, say N.
+
 #
 # IO delay types:
 #
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 1b68659..5fe16bf 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -154,6 +154,9 @@ all: bzImage
 KBUILD_IMAGE := $(boot)/bzImage
 
 bzImage: vmlinux
+ifeq ($(CONFIG_X86_DECODER_SELFTEST),y)
+   $(Q)$(MAKE) $(build)=arch/x86/tools posttest
+endif
$(Q)$(MAKE) $(build)=$(boot) $(KBUILD_IMAGE)
$(Q)mkdir -p $(objtree)/arch/$(UTS_MACHINE)/boot
$(Q)ln -fsn ../../x86/boot/bzImage $(objtree)/arch/$(UTS_MACHINE)/boot/$@
diff --git a/arch/x86/tools/Makefile b/arch/x86/tools/Makefile
new file mode 100644
index 000..3dd626b
--- /dev/null
+++ b/arch/x86/tools/Makefile
@@ -0,0 +1,15 @@
+PHONY += posttest
+quiet_cmd_posttest = TEST    $@
+      cmd_posttest = $(OBJDUMP) -d $(objtree)/vmlinux | awk -f $(srctree)/arch/x86/tools/distill.awk | $(obj)/test_get_len
+
+posttest: $(obj)/test_get_len vmlinux
+   $(call cmd,posttest)
+
+hostprogs-y:= test_get_len
+
+# -I needed for generated C source and C source which in the kernel tree.
+HOSTCFLAGS_test_get_len.o := -Wall -I$(objtree)/arch/x86/lib/ -I$(srctree)/arch/x86/include/ -I$(srctree)/arch/x86/lib/
+
+# Dependancies are also needed.
+$(obj)/test_get_len.o: $(srctree)/arch/x86/lib/insn.c $(srctree)/arch/x86/lib/inat.c $(srctree)/arch/x86/include/asm/inat_types.h $(srctree)/arch/x86/include/asm/inat.h $(srctree)/arch/x86/include/asm/insn.h $(objtree)/arch/x86/lib/inat-tables.c
+
diff --git a/arch/x86/tools/distill.awk b/arch/x86/tools/distill.awk
new file mode 100644
index 000..d433619
--- /dev/null
+++ b/arch/x86/tools/distill.awk
@@ -0,0 +1,42 @@
+#!/bin/awk -f
+# Usage: objdump -d a.out | awk -f distill.awk | ./test_get_len
+# Distills the disassembly as follows:
+# - Removes all lines except the disassembled instructions.
+# - For instructions that exceed 1 line (7 bytes), crams all the hex bytes
+# into a single line.
+# - Remove bad(or prefix only) instructions
+
+BEGIN {
+       prev_addr = ""
+       prev_hex = ""
+       prev_mnemonic = ""
+       bad_expr = "(\\(bad\\)|^rex|^.byte|^rep(z|nz)$|^lock$|^es$|^cs$|^ss$|^ds$|^fs$|^gs$|^data(16|32)$|^addr(16|32|64))"
+       fwait_expr = "^9b "
+       fwait_str="9b\tfwait"
+}
+
+/^ *[0-9a-f]+:/ {
+       if (split($0, field, "\t") < 3) {
+               # This is a continuation of the same insn.
+               prev_hex = prev_hex field[2]
+       } else {
+               # Skip bad instructions
+               if (match(prev_mnemonic, bad_expr))
+                       prev_addr = ""
+               # Split fwait from other f* instructions
+               if (match(prev_hex, fwait_expr) && prev_mnemonic != "fwait") {
+                       printf "%s\t%s\n", prev_addr, fwait_str
+                       sub(fwait_expr, "", prev_hex)
+               }
+               if (prev_addr != "")
+                       printf "%s\t%s\t%s\n", prev_addr, prev_hex, prev_mnemonic
+   prev_addr = field[1

[PATCH -tip -v12 04/11] kprobes: cleanup fix_riprel() using insn decoder on x86

2009-07-16 Thread Masami Hiramatsu
Clean up fix_riprel() in arch/x86/kernel/kprobes.c by using the x86 instruction
decoder.
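
For background, fix_riprel() exists because a RIP-relative instruction that is copied to another address needs its 32-bit displacement adjusted to keep pointing at the same target. The sketch below shows the arithmetic in user space. In 64-bit mode a ModRM byte with mod == 0 and rm == 5 selects RIP-relative addressing. The function names are hypothetical and this is not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* In 64-bit mode, ModRM with mod == 0 and rm == 5 means disp32(%rip). */
int modrm_is_rip_relative(uint8_t modrm)
{
    return (modrm & 0xc7) == 0x05;
}

/*
 * Displacement for the copy so that it still refers to the original target:
 *   orig_addr + len + disp == copy_addr + len + new_disp
 * The instruction length cancels, leaving orig_addr - copy_addr + disp.
 */
int64_t adjust_rip_disp(uint64_t orig_addr, uint64_t copy_addr, int32_t disp)
{
    return (int64_t)(orig_addr - copy_addr) + disp;
}

/* The adjusted value is only usable if it still fits in a signed 32 bits. */
int disp_fits_s32(int64_t disp)
{
    return disp >= INT32_MIN && disp <= INT32_MAX;
}

void riprel_example(void)
{
    /* Copy moved 0x1000 bytes up, so the displacement shrinks by 0x1000. */
    int64_t d = adjust_rip_disp(0x1000, 0x2000, 0x10);

    assert(d == -0xff0);
    assert(0x1000 + 0x10 == 0x2000 + d);   /* same target either way */
    assert(disp_fits_s32(d));
}
```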

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: Ingo Molnar mi...@elte.hu
---

 arch/x86/kernel/kprobes.c |  128 -
 1 files changed, 23 insertions(+), 105 deletions(-)

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index 5341842..b77e050 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -109,50 +109,6 @@ static const u32 twobyte_is_boostable[256 / 32] = {
/*  --- */
/*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
 };
-static const u32 onebyte_has_modrm[256 / 32] = {
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-   /*  --- */
-   W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 00 */
-   W(0x10, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 10 */
-   W(0x20, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 20 */
-   W(0x30, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 30 */
-   W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 40 */
-   W(0x50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 50 */
-   W(0x60, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0) | /* 60 */
-   W(0x70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 70 */
-   W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */
-   W(0x90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 90 */
-   W(0xa0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* a0 */
-   W(0xb0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* b0 */
-   W(0xc0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* c0 */
-   W(0xd0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */
-   W(0xe0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* e0 */
-   W(0xf0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1)   /* f0 */
-   /*  --- */
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-};
-static const u32 twobyte_has_modrm[256 / 32] = {
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-   /*  --- */
-   W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1) | /* 0f */
-   W(0x10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0) , /* 1f */
-   W(0x20, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1) | /* 2f */
-   W(0x30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 3f */
-   W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 4f */
-   W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 5f */
-   W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 6f */
-   W(0x70, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1) , /* 7f */
-   W(0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 8f */
-   W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 9f */
-   W(0xa0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1) | /* af */
-   W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1) , /* bf */
-   W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* cf */
-   W(0xd0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* df */
-   W(0xe0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* ef */
-   W(0xf0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0)   /* ff */
-   /*  --- */
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-};
 #undef W
 
 struct kretprobe_blackpoint kretprobe_blacklist[] = {
@@ -345,68 +301,30 @@ static int __kprobes is_IF_modifier(kprobe_opcode_t *insn)
 static void __kprobes fix_riprel(struct kprobe *p)
 {
 #ifdef CONFIG_X86_64
-       u8 *insn = p->ainsn.insn;
-   s64 disp;
-   int need_modrm;
-
-   /* Skip legacy instruction prefixes.  */
-   while (1) {
-   switch (*insn) {
-   case 0x66:
-   case 0x67:
-   case 0x2e:
-   case 0x3e:
-   case 0x26:
-   case 0x64:
-   case 0x65:
-   case 0x36:
-   case 0xf0:
-   case 0xf3:
-   case 0xf2:
-   ++insn;
-   continue;
-   }
-   break;
-   }
+       struct insn insn;
+       kernel_insn_init(&insn, p->ainsn.insn);
 
-   /* Skip REX instruction prefix.  */
-   if (is_REX_prefix(insn))
-   ++insn;
-
-   if (*insn == 0x0f) {
-   /* Two-byte opcode.  */
-   ++insn

[PATCH -tip -v12 06/11] tracing: ftrace dynamic ftrace_event_call support

2009-07-16 Thread Masami Hiramatsu
Add dynamic ftrace_event_call support to ftrace. Trace engines can add a new
ftrace_event_call to ftrace on the fly. Each operator function of the call
takes an ftrace_event_call data structure as an argument, because these
functions may be shared among several ftrace_event_calls.
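
The reason for passing the call to its own callbacks can be seen in a small user-space sketch: once the callbacks receive the object they operate on, one pair of functions can serve any number of dynamically created calls. The struct below is a simplified, hypothetical stand-in and not the real struct ftrace_event_call.

```c
#include <assert.h>
#include <stddef.h>

struct event_call {                     /* hypothetical stand-in */
    const char *name;
    int enabled;
    int  (*regfunc)(struct event_call *);
    void (*unregfunc)(struct event_call *);
};

/* One pair of callbacks is shared by every dynamically created call. */
static int shared_reg(struct event_call *call)
{
    call->enabled = 1;
    return 0;
}

static void shared_unreg(struct event_call *call)
{
    call->enabled = 0;
}

void init_event_call(struct event_call *call, const char *name)
{
    assert(call != NULL && name != NULL);
    call->name = name;
    call->enabled = 0;
    call->regfunc = shared_reg;
    call->unregfunc = shared_unreg;
}
```

With void-argument callbacks, as before this patch, each call would have needed its own generated function to know which event it belongs to.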

Changes from v11:
 - Call remove_subsystem_dir() when unregistering an event call.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Acked-by: Frederic Weisbecker fweis...@gmail.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ingo Molnar mi...@elte.hu
Cc: Tom Zanussi tzanu...@gmail.com
---

 include/linux/ftrace_event.h |   13 +---
 include/trace/ftrace.h   |   22 ++---
 kernel/trace/trace_events.c  |   72 --
 kernel/trace/trace_export.c  |   27 
 4 files changed, 86 insertions(+), 48 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 5c093ff..f7733b6 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -108,12 +108,13 @@ struct ftrace_event_call {
        struct dentry           *dir;
        struct trace_event      *event;
        int                     enabled;
-       int                     (*regfunc)(void);
-       void                    (*unregfunc)(void);
+       int                     (*regfunc)(struct ftrace_event_call *);
+       void                    (*unregfunc)(struct ftrace_event_call *);
        int                     id;
-       int                     (*raw_init)(void);
-       int                     (*show_format)(struct trace_seq *s);
-       int                     (*define_fields)(void);
+       int                     (*raw_init)(struct ftrace_event_call *);
+       int                     (*show_format)(struct ftrace_event_call *,
+                                              struct trace_seq *);
+       int                     (*define_fields)(struct ftrace_event_call *);
        struct list_head        fields;
        int                     filter_active;
        void                    *filter;
@@ -138,6 +139,8 @@ extern int filter_current_check_discard(struct ftrace_event_call *call,
 
 extern int trace_define_field(struct ftrace_event_call *call, char *type,
  char *name, int offset, int size, int is_signed);
+extern int trace_add_event_call(struct ftrace_event_call *call);
+extern void trace_remove_event_call(struct ftrace_event_call *call);
 
 #define is_signed_type(type)   (((type)(-1)) < 0)
 
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 1867553..d696580 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -147,7 +147,8 @@
 #undef TRACE_EVENT
 #define TRACE_EVENT(call, proto, args, tstruct, func, print)   \
 static int \
-ftrace_format_##call(struct trace_seq *s)  \
+ftrace_format_##call(struct ftrace_event_call *event_call, \
+struct trace_seq *s)   \
 {  \
struct ftrace_raw_##call field __attribute__((unused)); \
int ret = 0;\
@@ -289,10 +290,9 @@ ftrace_raw_output_##call(struct trace_iterator *iter, int 
flags)   \
 #undef TRACE_EVENT
 #define TRACE_EVENT(call, proto, args, tstruct, func, print)   \
 int\
-ftrace_define_fields_##call(void)  \
+ftrace_define_fields_##call(struct ftrace_event_call *event_call)  \
 {  \
struct ftrace_raw_##call field; \
-       struct ftrace_event_call *event_call = &event_##call;           \
int ret;\
\
__common_field(int, type, 1);   \
@@ -355,7 +355,7 @@ static inline int ftrace_get_offsets_##call(
\
  * event_trace_printk(_RET_IP_, call:  fmt);
  * }
  *
- * static int ftrace_reg_event_call(void)
+ * static int ftrace_reg_event_call(struct ftrace_event_call *unused)
  * {
  * int ret;
  *
@@ -366,7 +366,7 @@ static inline int ftrace_get_offsets_##call(
\
  * return ret;
  * }
  *
- * static void ftrace_unreg_event_call(void)
+ * static void ftrace_unreg_event_call(struct ftrace_event_call *unused)
  * {
  * unregister_trace_call(ftrace_event_call);
  * }
@@ -399,7 +399,7 @@ static inline int ftrace_get_offsets_##call(
\
  * trace_current_buffer_unlock_commit(event, irq_flags, pc);
  * }
  *
- * static int ftrace_raw_reg_event_call(void)
+ * static

[PATCH -tip -v12 07/11] tracing: Introduce TRACE_FIELD_ZERO() macro

2009-07-16 Thread Masami Hiramatsu
Use TRACE_FIELD_ZERO(type, item) instead of TRACE_FIELD_ZERO_CHAR(item).
This also includes a fix of TRACE_ZERO_CHAR() macro.
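
The generalized macro relies on preprocessor stringification: #type and #item turn the macro arguments into string literals that are concatenated into the format description. The following stand-alone sketch shows the same idea with snprintf(). The struct, macro and function names are hypothetical and stand in for the trace_seq_printf()-based macro in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct print_entry {            /* hypothetical trace entry */
    unsigned long ip;
    char buf[];                 /* zero-size payload at the end */
};

/* Describe a zero-size field; #type and #item become string literals. */
#define FIELD_ZERO_DESC(out, len, strct, type, item)                    \
    snprintf(out, len, "\tfield:" #type " " #item ";\t"                 \
             "offset:%u;\tsize:0;\n",                                   \
             (unsigned int)offsetof(strct, item))

/* Format the description of print_entry.buf into out. */
int format_buf_field(char *out, size_t len)
{
    assert(out != NULL && len > 0);
    return FIELD_ZERO_DESC(out, len, struct print_entry, char, buf);
}
```

Passing the type as a parameter is what lets the same macro describe zero-size fields other than char arrays.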

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ingo Molnar mi...@elte.hu
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Frederic Weisbecker fweis...@gmail.com
---

 kernel/trace/trace_event_types.h |4 ++--
 kernel/trace/trace_export.c  |   16 
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/trace/trace_event_types.h b/kernel/trace/trace_event_types.h
index 6db005e..e74f090 100644
--- a/kernel/trace/trace_event_types.h
+++ b/kernel/trace/trace_event_types.h
@@ -109,7 +109,7 @@ TRACE_EVENT_FORMAT(bprint, TRACE_BPRINT, bprint_entry, 
ignore,
TRACE_STRUCT(
TRACE_FIELD(unsigned long, ip, ip)
TRACE_FIELD(char *, fmt, fmt)
-   TRACE_FIELD_ZERO_CHAR(buf)
+   TRACE_FIELD_ZERO(char, buf)
),
        TP_RAW_FMT("%08lx (%d) fmt:%p %s")
 );
@@ -117,7 +117,7 @@ TRACE_EVENT_FORMAT(bprint, TRACE_BPRINT, bprint_entry, 
ignore,
 TRACE_EVENT_FORMAT(print, TRACE_PRINT, print_entry, ignore,
TRACE_STRUCT(
TRACE_FIELD(unsigned long, ip, ip)
-   TRACE_FIELD_ZERO_CHAR(buf)
+   TRACE_FIELD_ZERO(char, buf)
),
        TP_RAW_FMT("%08lx (%d) fmt:%p %s")
 );
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index 7cee79d..23125b5 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -42,9 +42,9 @@ extern void __bad_type_size(void);
if (!ret)   \
return 0;
 
-#undef TRACE_FIELD_ZERO_CHAR
-#define TRACE_FIELD_ZERO_CHAR(item)                                    \
-       ret = trace_seq_printf(s, "\tfield:char " #item ";\t"           \
+#undef TRACE_FIELD_ZERO
+#define TRACE_FIELD_ZERO(type, item)                                   \
+       ret = trace_seq_printf(s, "\tfield:" #type " " #item ";\t"      \
                               "offset:%u;\tsize:0;\n",                 \
                               (unsigned int)offsetof(typeof(field), item)); \
        if (!ret)                                                       \
@@ -90,9 +90,6 @@ ftrace_format_##call(struct ftrace_event_call *dummy, struct 
trace_seq *s)\
 
 #include "trace_event_types.h"
 
-#undef TRACE_ZERO_CHAR
-#define TRACE_ZERO_CHAR(arg)
-
 #undef TRACE_FIELD
 #define TRACE_FIELD(type, item, assign)\
        entry->item = assign;
@@ -105,6 +102,9 @@ ftrace_format_##call(struct ftrace_event_call *dummy, 
struct trace_seq *s)\
 #define TRACE_FIELD_SIGN(type, item, assign, is_signed)\
TRACE_FIELD(type, item, assign)
 
+#undef TRACE_FIELD_ZERO
+#define TRACE_FIELD_ZERO(type, item)
+
 #undef TP_CMD
 #define TP_CMD(cmd...) cmd
 
@@ -176,8 +176,8 @@ __attribute__((section(_ftrace_events))) event_##call = { 
\
if (ret)\
return ret;
 
-#undef TRACE_FIELD_ZERO_CHAR
-#define TRACE_FIELD_ZERO_CHAR(item)
+#undef TRACE_FIELD_ZERO
+#define TRACE_FIELD_ZERO(type, item)
 
 #undef TRACE_EVENT_FORMAT
 #define TRACE_EVENT_FORMAT(call, proto, args, fmt, tstruct, tpfmt) \




[PATCH -tip -v12 05/11] x86: add pt_regs register and stack access APIs

2009-07-16 Thread Masami Hiramatsu
Add the following APIs for accessing registers and stack entries from pt_regs.
These APIs are required by the kprobes-based event tracer on ftrace.
Some other debugging tools might be able to use them too.

- regs_query_register_offset(const char *name)
   Query the offset of the register called "name".

- regs_query_register_name(unsigned int offset)
   Query the name of a register by its offset.

- regs_get_register(struct pt_regs *regs, unsigned int offset)
   Get the value of a register by its offset.

- regs_within_kernel_stack(struct pt_regs *regs, unsigned long addr)
   Check whether the address is in the kernel stack.

- regs_get_kernel_stack_nth(struct pt_regs *reg, unsigned int nth)
   Get the Nth entry of the kernel stack. (N >= 0)

- regs_get_argument_nth(struct pt_regs *reg, unsigned int nth)
   Get the Nth argument at function call. (N >= 0)
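
The name/offset queries can be pictured as lookups in a table built with offsetof(). The sketch below reproduces that pattern in user space on a hypothetical five-register struct mini_regs; it is not the x86 pt_regs layout, and the function names are shortened stand-ins for the APIs listed above. The -1 error return is likewise an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct mini_regs { unsigned long bx, cx, dx, ip, sp; };   /* hypothetical */

struct reg_offset {
    const char *name;
    int offset;
};

#define REG_OFFSET_NAME(r) \
    { .name = #r, .offset = (int)offsetof(struct mini_regs, r) }
#define REG_OFFSET_END { .name = NULL, .offset = 0 }
#define MAX_REG_OFFSET (offsetof(struct mini_regs, sp))

static const struct reg_offset regoffset_table[] = {
    REG_OFFSET_NAME(bx),
    REG_OFFSET_NAME(cx),
    REG_OFFSET_NAME(dx),
    REG_OFFSET_NAME(ip),
    REG_OFFSET_NAME(sp),
    REG_OFFSET_END,
};

/* Offset of the named register, or -1 if the name is unknown. */
int query_register_offset(const char *name)
{
    const struct reg_offset *r;

    assert(name != NULL);
    for (r = regoffset_table; r->name != NULL; r++)
        if (strcmp(r->name, name) == 0)
            return r->offset;
    return -1;
}

/* Name of the register at the given offset, or NULL if there is none. */
const char *query_register_name(unsigned int offset)
{
    const struct reg_offset *r;

    for (r = regoffset_table; r->name != NULL; r++)
        if ((unsigned int)r->offset == offset)
            return r->name;
    return NULL;
}

/* Value at a table-provided offset; 0 if the offset is out of range. */
unsigned long get_register(const struct mini_regs *regs, unsigned int offset)
{
    if (offset > MAX_REG_OFFSET)
        return 0;
    return *(const unsigned long *)((const char *)regs + offset);
}
```

A tracer can resolve a register name to an offset once, when a probe is defined, and then use only the cheap offset-based read on every hit.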

Changes from v10:
 - Use an offsetof table in regs_get_argument_nth().
 - Use unsigned int instead of unsigned.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Reviewed-by: Frederic Weisbecker fweis...@gmail.com
Cc: Andi Kleen a...@firstfloor.org
Cc: Christoph Hellwig h...@infradead.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Roland McGrath rol...@redhat.com
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: linux-a...@vger.kernel.org
---

 arch/x86/include/asm/ptrace.h |   62 +++
 arch/x86/kernel/ptrace.c  |  112 +
 2 files changed, 174 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 0f0d908..a3d49dd 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -7,6 +7,7 @@
 
 #ifdef __KERNEL__
 #include <asm/segment.h>
+#include <asm/page_types.h>
 #endif
 
 #ifndef __ASSEMBLY__
@@ -216,6 +217,67 @@ static inline unsigned long user_stack_pointer(struct pt_regs *regs)
        return regs->sp;
 }
 
+/* Query offset/name of register from its name/offset */
+extern int regs_query_register_offset(const char *name);
+extern const char *regs_query_register_name(unsigned int offset);
+#define MAX_REG_OFFSET (offsetof(struct pt_regs, ss))
+
+/**
+ * regs_get_register() - get register value from its offset
+ * @regs:  pt_regs from which register value is gotten.
+ * @offset:offset number of the register.
+ *
+ * regs_get_register returns the value of a register whose offset from @regs
+ * is @offset. The @offset is the offset of the register in struct pt_regs.
+ * If @offset is bigger than MAX_REG_OFFSET, this returns 0.
+ */
+static inline unsigned long regs_get_register(struct pt_regs *regs,
+ unsigned int offset)
+{
+       if (unlikely(offset > MAX_REG_OFFSET))
+   return 0;
+   return *(unsigned long *)((unsigned long)regs + offset);
+}
+
+/**
+ * regs_within_kernel_stack() - check the address in the stack
+ * @regs:  pt_regs which contains kernel stack pointer.
+ * @addr:  address which is checked.
+ *
+ * regs_within_kenel_stack() checks @addr is within the kernel stack page(s).
+ * If @addr is within the kernel stack, it returns true. If not, returns false.
+ */
+static inline int regs_within_kernel_stack(struct pt_regs *regs,
+  unsigned long addr)
+{
+       return ((addr & ~(THREAD_SIZE - 1))  ==
+               (kernel_stack_pointer(regs) & ~(THREAD_SIZE - 1)));
+}
+
+/**
+ * regs_get_kernel_stack_nth() - get Nth entry of the stack
+ * @regs:  pt_regs which contains kernel stack pointer.
+ * @n: stack entry number.
+ *
+ * regs_get_kernel_stack_nth() returns @n th entry of the kernel stack which
+ * is specifined by @regs. If the @n th entry is NOT in the kernel stack,
+ * this returns 0.
+ */
+static inline unsigned long regs_get_kernel_stack_nth(struct pt_regs *regs,
+ unsigned int n)
+{
+   unsigned long *addr = (unsigned long *)kernel_stack_pointer(regs);
+   addr += n;
+   if (regs_within_kernel_stack(regs, (unsigned long)addr))
+   return *addr;
+   else
+   return 0;
+}
+
+/* Get Nth argument at function call */
+extern unsigned long regs_get_argument_nth(struct pt_regs *regs,
+  unsigned int n);
+
 /*
  * These are defined as per linux/ptrace.h, which see.
  */
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index cabdabc..32729ec 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -49,6 +49,118 @@ enum x86_regset {
REGSET_IOPERM32,
 };
 
+struct pt_regs_offset {
+   const char *name;
+   int offset;
+};
+
+#define REG_OFFSET_NAME(r) {.name = #r, .offset = offsetof(struct pt_regs, r)}
+#define REG_OFFSET_END {.name = NULL, .offset = 0}
+
+static const struct pt_regs_offset regoffset_table[] = {
+#ifdef

[PATCH -tip -v12 09/11] tracing: Kprobe-tracer supports more than 6 arguments

2009-07-16 Thread Masami Hiramatsu
Support up to 128 arguments for each kprobes event.
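
The larger limit is practical because the argument array becomes a flexible array member, so each probe is allocated with exactly as many slots as it needs. A user-space sketch of that sizing technique follows, with calloc() standing in for kzalloc(). The struct layouts and names are simplified, hypothetical stand-ins for struct trace_probe and struct fetch_func.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_TRACE_ARGS 128

struct fetch_func {             /* hypothetical layout */
    void *func;
    void *data;
};

struct probe {                  /* simplified stand-in */
    const char *symbol;
    unsigned int nr_args;
    struct fetch_func args[];   /* sized at allocation time */
};

/* Bytes needed for a probe that carries n argument slots. */
#define SIZEOF_PROBE(n) \
    (offsetof(struct probe, args) + sizeof(struct fetch_func) * (n))

/* Allocate a zeroed probe with room for nargs arguments, or return NULL. */
struct probe *alloc_probe(const char *symbol, unsigned int nargs)
{
    struct probe *p;

    if (nargs > MAX_TRACE_ARGS)
        return NULL;
    p = calloc(1, SIZEOF_PROBE(nargs));
    if (p == NULL)
        return NULL;
    p->symbol = symbol;         /* nr_args stays 0 until arguments are parsed */
    return p;
}
```

Compared with a fixed args[128] array, this keeps probes with few arguments small while still allowing the full 128.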

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Tom Zanussi tzanu...@gmail.com
---

 Documentation/trace/kprobetrace.txt |2 +-
 kernel/trace/trace_kprobe.c |   21 +
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
index 9ad907c..b29a54b 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -32,7 +32,7 @@ Synopsis of kprobe_events
  SYMBOL[+offs|-offs]   : Symbol+offset where the probe is inserted.
  MEMADDR   : Address where the probe is inserted.
 
- FETCHARGS : Arguments.
+ FETCHARGS : Arguments. Each probe can have up to 128 args.
   %REG : Fetch register REG
   sN   : Fetch Nth entry of stack (N >= 0)
   @ADDR: Fetch memory at ADDR (ADDR should be in kernel)
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index ad33073..67c33e1 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -32,7 +32,7 @@
 #include "trace.h"
 #include "trace_output.h"
 
-#define TRACE_KPROBE_ARGS 6
+#define MAX_TRACE_ARGS 128
 #define MAX_ARGSTR_LEN 63
 
 /* currently, trace_kprobe only supports X86. */
@@ -178,11 +178,15 @@ struct trace_probe {
                struct kretprobe        rp;
        };
        const char              *symbol;        /* symbol name */
-       unsigned int            nr_args;
-       struct fetch_func       args[TRACE_KPROBE_ARGS];
        struct ftrace_event_call        call;
+       unsigned int            nr_args;
+       struct fetch_func       args[];
 };
 
+#define SIZEOF_TRACE_PROBE(n)  \
+   (offsetof(struct trace_probe, args) +   \
+   (sizeof(struct fetch_func) * (n)))
+
 static int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs);
 static int kretprobe_trace_func(struct kretprobe_instance *ri,
struct pt_regs *regs);
@@ -255,11 +259,11 @@ static DEFINE_MUTEX(probe_lock);
 static LIST_HEAD(probe_list);
 
 static struct trace_probe *alloc_trace_probe(const char *symbol,
-const char *event)
+const char *event, int nargs)
 {
struct trace_probe *tp;
 
-   tp = kzalloc(sizeof(struct trace_probe), GFP_KERNEL);
+   tp = kzalloc(SIZEOF_TRACE_PROBE(nargs), GFP_KERNEL);
if (!tp)
return ERR_PTR(-ENOMEM);
 
@@ -559,9 +563,10 @@ static int create_trace_probe(int argc, char **argv)
                if (offset && is_return)
return -EINVAL;
}
+   argc -= 2; argv += 2;
 
/* setup a probe */
-   tp = alloc_trace_probe(symbol, event);
+   tp = alloc_trace_probe(symbol, event, argc);
if (IS_ERR(tp))
return PTR_ERR(tp);
 
@@ -580,8 +585,8 @@ static int create_trace_probe(int argc, char **argv)
kp-addr = addr;
 
/* parse arguments */
-   argc -= 2; argv += 2; ret = 0;
-   for (i = 0; i  argc  i  TRACE_KPROBE_ARGS; i++) {
+   ret = 0;
+   for (i = 0; i  argc  i  MAX_TRACE_ARGS; i++) {
if (strlen(argv[i])  MAX_ARGSTR_LEN) {
pr_info(Argument%d(%s) is too long.\n, i, argv[i]);
ret = -ENOSPC;
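
The patch above replaces a fixed-size args[] array with a flexible array member and sizes the allocation with offsetof(). As a rough userspace illustration of that pattern (not the kernel code itself), the sketch below uses stand-in types: struct fetch_arg, struct probe, SIZEOF_PROBE() and alloc_probe() are hypothetical names, and calloc() stands in for kzalloc(). Unlike the patch, the sketch stores nr_args at allocation time to keep the example short.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct fetch_func. */
struct fetch_arg {
	unsigned long (*func)(void *regs, void *data);
	void *data;
};

/* Simplified stand-in for struct trace_probe. */
struct probe {
	unsigned int nr_args;
	struct fetch_arg args[];	/* flexible array member, must be last */
};

/* Bytes needed for a probe that carries n arguments. */
#define SIZEOF_PROBE(n) \
	(offsetof(struct probe, args) + sizeof(struct fetch_arg) * (n))

/* Userspace analogue of kzalloc(SIZEOF_TRACE_PROBE(nargs), GFP_KERNEL). */
static struct probe *alloc_probe(unsigned int nargs)
{
	struct probe *p = calloc(1, SIZEOF_PROBE(nargs));

	if (p)
		p->nr_args = nargs;
	return p;
}
```

With this layout a probe with few arguments costs only the header plus the slots it actually uses, which is what makes raising the limit from 6 to 128 cheap.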


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH -tip -v12 10/11] tracing: Generate names for each kprobe event automatically

2009-07-16 Thread Masami Hiramatsu
Generate a name for each kprobe event based on its probe point,
and remove the generic k*probe event types because nothing uses
those types.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Tom Zanussi tzanu...@gmail.com
---

 Documentation/trace/kprobetrace.txt |3 +-
 kernel/trace/trace_event_types.h|   18 --
 kernel/trace/trace_kprobe.c |   64 ++-
 3 files changed, 35 insertions(+), 50 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt 
b/Documentation/trace/kprobetrace.txt
index b29a54b..437ad49 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -28,7 +28,8 @@ Synopsis of kprobe_events
   p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS]: Set a probe
   r[:EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe
 
- EVENT : Event name.
+ EVENT : Event name. If omitted, the event name is generated
+ based on SYMBOL+offs or MEMADDR.
  SYMBOL[+offs|-offs]   : Symbol+offset where the probe is inserted.
  MEMADDR   : Address where the probe is inserted.
 
diff --git a/kernel/trace/trace_event_types.h b/kernel/trace/trace_event_types.h
index 186b598..e74f090 100644
--- a/kernel/trace/trace_event_types.h
+++ b/kernel/trace/trace_event_types.h
@@ -175,22 +175,4 @@ TRACE_EVENT_FORMAT(kmem_free, TRACE_KMEM_FREE, 
kmemtrace_free_entry, ignore,
TP_RAW_FMT(type:%u call_site:%lx ptr:%p)
 );
 
-TRACE_EVENT_FORMAT(kprobe, TRACE_KPROBE, kprobe_trace_entry, ignore,
-   TRACE_STRUCT(
-   TRACE_FIELD(unsigned long, ip, ip)
-   TRACE_FIELD(int, nargs, nargs)
-   TRACE_FIELD_ZERO(unsigned long, args)
-   ),
-   TP_RAW_FMT(%08lx: args:0x%lx ...)
-);
-
-TRACE_EVENT_FORMAT(kretprobe, TRACE_KRETPROBE, kretprobe_trace_entry, ignore,
-   TRACE_STRUCT(
-   TRACE_FIELD(unsigned long, func, func)
-   TRACE_FIELD(unsigned long, ret_ip, ret_ip)
-   TRACE_FIELD(int, nargs, nargs)
-   TRACE_FIELD_ZERO(unsigned long, args)
-   ),
-   TP_RAW_FMT(%08lx - %08lx: args:0x%lx ...)
-);
 #undef TRACE_SYSTEM
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 67c33e1..3444d1d 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -34,6 +34,7 @@
 
 #define MAX_TRACE_ARGS 128
 #define MAX_ARGSTR_LEN 63
+#define MAX_EVENT_NAME_LEN 64
 
 /* currently, trace_kprobe only supports X86. */
 
@@ -272,11 +273,11 @@ static struct trace_probe *alloc_trace_probe(const char 
*symbol,
if (!tp-symbol)
goto error;
}
-   if (event) {
-   tp-call.name = kstrdup(event, GFP_KERNEL);
-   if (!tp-call.name)
-   goto error;
-   }
+   if (!event)
+   goto error;
+   tp-call.name = kstrdup(event, GFP_KERNEL);
+   if (!tp-call.name)
+   goto error;
 
INIT_LIST_HEAD(tp-list);
return tp;
@@ -306,7 +307,7 @@ static struct trace_probe *find_probe_event(const char 
*event)
struct trace_probe *tp;
 
list_for_each_entry(tp, probe_list, list)
-   if (tp-call.name  !strcmp(tp-call.name, event))
+   if (!strcmp(tp-call.name, event))
return tp;
return NULL;
 }
@@ -322,8 +323,7 @@ static void __unregister_trace_probe(struct trace_probe *tp)
 /* Unregister a trace_probe and probe_event: call with locking probe_lock */
 static void unregister_trace_probe(struct trace_probe *tp)
 {
-   if (tp-call.name)
-   unregister_probe_event(tp);
+   unregister_probe_event(tp);
__unregister_trace_probe(tp);
list_del(tp-list);
 }
@@ -352,18 +352,16 @@ static int register_trace_probe(struct trace_probe *tp)
goto end;
}
/* register as an event */
-   if (tp-call.name) {
-   old_tp = find_probe_event(tp-call.name);
-   if (old_tp) {
-   /* delete old event */
-   unregister_trace_probe(old_tp);
-   free_trace_probe(old_tp);
-   }
-   ret = register_probe_event(tp);
-   if (ret) {
-   pr_warning(Faild to register probe event(%d)\n, ret);
-   __unregister_trace_probe(tp);
-   }
+   old_tp = find_probe_event(tp-call.name);
+   if (old_tp) {
+   /* delete old event */
+   unregister_trace_probe(old_tp);
+   free_trace_probe(old_tp);
+   }
+   ret = register_probe_event(tp);
+   if (ret) {
+   pr_warning(Faild
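
The diff above is cut off before the code that builds the default event name, so the following is only an illustrative sketch of how a name could be derived from SYMBOL+offs or MEMADDR within MAX_EVENT_NAME_LEN. The helper gen_event_name() and the naming scheme (a p or r prefix joined with underscores) are assumptions made for this example, not the format used by the patch.

```c
#include <stdio.h>

#define MAX_EVENT_NAME_LEN 64

/*
 * Hypothetical helper: build a default event name from the probe point.
 * Returns the snprintf() result, i.e. the length the full name would have.
 */
static int gen_event_name(char *buf, size_t len, int is_return,
			  const char *symbol, long offset, unsigned long addr)
{
	char type = is_return ? 'r' : 'p';

	if (symbol)
		return snprintf(buf, len, "%c_%s_%ld", type, symbol, offset);
	/* No symbol given: fall back to the raw probe address. */
	return snprintf(buf, len, "%c_0x%lx", type, addr);
}
```

A caller would compare the return value against the buffer size to detect truncation before registering the event.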

[PATCH -tip -v12 00/11] tracing: kprobe-based event tracer and x86 instruction decoder

2009-07-16 Thread Masami Hiramatsu
 r:myretprobe do_sys_open rv ra  /sys/kernel/debug/tracing/kprobe_events

 This sets a kretprobe on the return point of the do_sys_open() function and
records the return value and return address as the myretprobe event.
 You can see the format of these events via
/sys/kernel/debug/tracing/events/kprobes/EVENT/format.

  cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format
name: myprobe
ID: 23
format:
field:unsigned short common_type;   offset:0;   size:2;
field:unsigned char common_flags;   offset:2;   size:1;
field:unsigned char common_preempt_count;   offset:3;   size:1;
field:int common_pid;   offset:4;   size:4;
field:int common_tgid;  offset:8;   size:4;

field: unsigned long ip;offset:16;tsize:8;
field: int nargs;   offset:24;tsize:4;
field: unsigned long arg0;  offset:32;tsize:8;
field: unsigned long arg1;  offset:40;tsize:8;
field: unsigned long arg2;  offset:48;tsize:8;
field: unsigned long arg3;  offset:56;tsize:8;

alias: a0;  original: arg0;
alias: a1;  original: arg1;
alias: a2;  original: arg2;
alias: a3;  original: arg3;

print fmt: %lx: 0x%lx 0x%lx 0x%lx 0x%lx, ip, arg0, arg1, arg2, arg3


 You can see that the event has 4 arguments, each with a corresponding
alias expression.

  echo > /sys/kernel/debug/tracing/kprobe_events

 This clears all probe points. You can see the traced information via
/sys/kernel/debug/tracing/trace.

  cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
           <...>-1447  [001] 1038282.286875: do_sys_open+0x0/0xd6: 0x3 0x7fffd1ec4440 0x8000 0x0
           <...>-1447  [001] 1038282.286878: sys_openat+0xc/0xe <- do_sys_open: 0xfffe 0x81367a3a
           <...>-1447  [001] 1038282.286885: do_sys_open+0x0/0xd6: 0xff9c 0x40413c 0x8000 0x1b6
           <...>-1447  [001] 1038282.286915: sys_open+0x1b/0x1d <- do_sys_open: 0x3 0x81367a3a
           <...>-1447  [001] 1038282.286969: do_sys_open+0x0/0xd6: 0xff9c 0x4041c6 0x98800 0x10
           <...>-1447  [001] 1038282.286976: sys_open+0x1b/0x1d <- do_sys_open: 0x3 0x81367a3a


 Each line shows when the kernel hits a probe, and "<- SYMBOL" means the kernel
returns from SYMBOL (e.g. "sys_open+0x1b/0x1d <- do_sys_open" means the kernel
returns from do_sys_open to sys_open+0x1b).
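
The usage above is driven entirely by writing probe definitions to kprobe_events. For tools that want to do this from C rather than from a shell, a minimal sketch could look like the following. The helper append_probe() is a hypothetical name introduced here; it assumes that a definition is one line of text, that it should be appended rather than overwrite existing probes, and that the caller has permission to write the file.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Append one probe definition (plus a trailing newline) to a
 * kprobe_events-style file in a single write().
 * Returns 0 on success, -1 on failure.
 */
static int append_probe(const char *path, const char *definition)
{
	char line[256];
	int n, fd, ret;

	n = snprintf(line, sizeof(line), "%s\n", definition);
	if (n < 0 || (size_t)n >= sizeof(line))
		return -1;	/* definition too long for this sketch */

	fd = open(path, O_WRONLY | O_APPEND);
	if (fd < 0)
		return -1;

	ret = (write(fd, line, (size_t)n) == n) ? 0 : -1;
	close(fd);
	return ret;
}
```

For example, append_probe("/sys/kernel/debug/tracing/kprobe_events", "r:myretprobe do_sys_open rv ra") would correspond to the return-probe definition shown earlier in this message.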


Thank you,

---

Masami Hiramatsu (11):
  tracing: Add kprobes event profiling interface
  tracing: Generate names for each kprobe event automatically
  tracing: Kprobe-tracer supports more than 6 arguments
  tracing: add kprobe-based event tracer
  tracing: Introduce TRACE_FIELD_ZERO() macro
  tracing: ftrace dynamic ftrace_event_call support
  x86: add pt_regs register and stack access APIs
  kprobes: cleanup fix_riprel() using insn decoder on x86
  kprobes: checks probe address is instruction boudary on x86
  x86: x86 instruction decoder build-time selftest
  x86: instruction decoder API


 Documentation/trace/kprobetrace.txt|  147 
 arch/x86/Kconfig.debug |9 
 arch/x86/Makefile  |3 
 arch/x86/include/asm/inat.h|  127 +++
 arch/x86/include/asm/insn.h|  136 +++
 arch/x86/include/asm/ptrace.h  |   62 ++
 arch/x86/kernel/kprobes.c  |  197 ++---
 arch/x86/kernel/ptrace.c   |  112 +++
 arch/x86/lib/Makefile  |   13 
 arch/x86/lib/inat.c|   82 ++
 arch/x86/lib/insn.c|  473 
 arch/x86/lib/x86-opcode-map.txt|  711 ++
 arch/x86/scripts/Makefile  |   19 
 arch/x86/scripts/distill.awk   |   42 +
 arch/x86/scripts/gen-insn-attr-x86.awk |  314 
 arch/x86/scripts/test_get_len.c|   99 +++
 arch/x86/scripts/user_include.h|   49 +
 include/linux/ftrace_event.h   |   13 
 include/trace/ftrace.h |   22 -
 kernel/trace/Kconfig   |   12 
 kernel/trace/Makefile  |1 
 kernel/trace/trace.h   |   29 +
 kernel/trace/trace_event_types.h   |4 
 kernel/trace/trace_events.c|   72 +-
 kernel/trace/trace_export.c|   43 +
 kernel/trace/trace_kprobe.c| 1245 
 26 files changed, 3873 insertions(+), 163 deletions(-)
 create mode 100644 Documentation/trace/kprobetrace.txt
 create mode 100644 arch/x86/include/asm/inat.h
 create mode 100644 arch/x86/include/asm/insn.h
 create mode 100644 arch/x86/lib/inat.c
 create mode 100644 arch/x86/lib/insn.c
 create mode 100644 arch/x86/lib/x86-opcode-map.txt
 create mode 100644 arch/x86/scripts/Makefile
 create mode 100644 arch/x86/scripts/distill.awk
 create mode 100644 arch/x86/scripts/gen-insn-attr-x86.awk

[PATCH -tip -v12 11/11] tracing: Add kprobes event profiling interface

2009-07-16 Thread Masami Hiramatsu
Add profiling interfaces for each kprobes event.

Changes from v11:
 - Fix a typo and remove redundant check.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Li Zefan l...@cn.fujitsu.com
---

 Documentation/trace/kprobetrace.txt |8 ++
 kernel/trace/trace_kprobe.c |   45 +++
 2 files changed, 53 insertions(+), 0 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt 
b/Documentation/trace/kprobetrace.txt
index 437ad49..9c6be05 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -69,6 +69,14 @@ filter:
  names and field names for describing filters.
 
 
+Event Profiling
+---------------
+ You can check the total number of probe hits and probe miss-hits via
+/sys/kernel/debug/tracing/kprobe_profile.
+ The first column is event name, the second is the number of probe hits,
+the third is the number of probe miss-hits.
+
+
 Usage examples
 --------------
 To add a probe as a new event, write a new definition to kprobe_events
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 3444d1d..21e619f 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -178,6 +178,7 @@ struct trace_probe {
struct kprobe   kp;
struct kretproberp;
};
+   unsigned long   nhits;
const char  *symbol;/* symbol name */
struct ftrace_event_callcall;
unsigned intnr_args;
@@ -766,6 +767,39 @@ static const struct file_operations kprobe_events_ops = {
.write  = probes_write,
 };
 
+/* Probes profiling interfaces */
+static int profile_seq_show(struct seq_file *m, void *v)
+{
+	struct trace_probe *tp = v;
+
+	seq_printf(m, "%s", tp->call.name);
+
+	seq_printf(m, "\t%8lu %8lu\n", tp->nhits,
+		   probe_is_return(tp) ? tp->rp.kp.nmissed : tp->kp.nmissed);
+
+	return 0;
+}
+
+static const struct seq_operations profile_seq_op = {
+	.start  = probes_seq_start,
+	.next   = probes_seq_next,
+	.stop   = probes_seq_stop,
+	.show   = profile_seq_show
+};
+
+static int profile_open(struct inode *inode, struct file *file)
+{
+	return seq_open(file, &profile_seq_op);
+}
+
+static const struct file_operations kprobe_profile_ops = {
+	.owner          = THIS_MODULE,
+	.open           = profile_open,
+	.read           = seq_read,
+	.llseek         = seq_lseek,
+	.release        = seq_release,
+};
+
 /* Kprobe handler */
 static __kprobes int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs)
 {
@@ -776,6 +810,8 @@ static __kprobes int kprobe_trace_func(struct kprobe *kp, 
struct pt_regs *regs)
unsigned long irq_flags;
struct ftrace_event_call *call = tp-call;
 
+   tp-nhits++;
+
local_save_flags(irq_flags);
pc = preempt_count();
 
@@ -1152,9 +1188,18 @@ static __init int init_kprobe_trace(void)
 	entry = debugfs_create_file("kprobe_events", 0644, d_tracer,
 				    NULL, &kprobe_events_ops);
 
+	/* Event list interface */
 	if (!entry)
 		pr_warning("Could not create debugfs "
 			   "'kprobe_events' entry\n");
+
+	/* Profile interface */
+	entry = debugfs_create_file("kprobe_profile", 0444, d_tracer,
+				    NULL, &kprobe_profile_ops);
+
+	if (!entry)
+		pr_warning("Could not create debugfs "
+			   "'kprobe_profile' entry\n");
 	return 0;
 }
 fs_initcall(init_kprobe_trace);
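
Going by the documentation added in this patch, each line of kprobe_profile carries three columns: the event name, the number of hits, and the number of miss-hits. A userspace reader could parse one such line roughly as sketched below. The struct profile_entry type and parse_profile_line() helper are hypothetical names for this example; the 64-byte name buffer mirrors MAX_EVENT_NAME_LEN from the previous patch, and the parser assumes whitespace-separated columns.

```c
#include <stdio.h>

/* One parsed line of kprobe_profile (hypothetical userspace type). */
struct profile_entry {
	char name[64];		/* event name */
	unsigned long hits;	/* number of probe hits */
	unsigned long missed;	/* number of probe miss-hits */
};

/* Parse "NAME <hits> <missed>"; returns 0 on success, -1 otherwise. */
static int parse_profile_line(const char *line, struct profile_entry *e)
{
	int n = sscanf(line, "%63s %lu %lu", e->name, &e->hits, &e->missed);

	return n == 3 ? 0 : -1;
}
```

A monitoring script could call this once per line read from the file and flag any event whose miss-hit count is growing.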


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com


[PATCH -tip -v12 02/11] x86: x86 instruction decoder build-time selftest

2009-07-16 Thread Masami Hiramatsu
Add a user-space selftest of the x86 instruction decoder at kernel build time.
When CONFIG_X86_DECODER_SELFTEST=y, Kbuild builds a test harness for the x86
instruction decoder and runs it after building vmlinux.
The test compares the results of objdump with those of the x86 instruction
decoder code and checks that there are no differences.

Changes from v10:
 - Use unsigned int instead of unsigned.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Signed-off-by: Jim Keniston jkeni...@us.ibm.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Vegard Nossum vegard.nos...@gmail.com
Cc: Avi Kivity a...@redhat.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
Cc: Sam Ravnborg s...@ravnborg.org
---

 arch/x86/Kconfig.debug  |9 
 arch/x86/Makefile   |3 +
 arch/x86/include/asm/inat.h |2 +
 arch/x86/include/asm/insn.h |2 +
 arch/x86/lib/inat.c |2 +
 arch/x86/lib/insn.c |2 +
 arch/x86/scripts/Makefile   |   19 +++
 arch/x86/scripts/distill.awk|   42 +
 arch/x86/scripts/test_get_len.c |   99 +++
 arch/x86/scripts/user_include.h |   49 +++
 10 files changed, 229 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/scripts/Makefile
 create mode 100644 arch/x86/scripts/distill.awk
 create mode 100644 arch/x86/scripts/test_get_len.c
 create mode 100644 arch/x86/scripts/user_include.h

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index d105f29..7d0b681 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -186,6 +186,15 @@ config X86_DS_SELFTEST
 config HAVE_MMIOTRACE_SUPPORT
def_bool y
 
+config X86_DECODER_SELFTEST
+ bool x86 instruction decoder selftest
+ depends on DEBUG_KERNEL
+   ---help---
+Perform x86 instruction decoder selftests at build time.
+This option is useful for checking the sanity of x86 instruction
+decoder code.
+If unsure, say N.
+
 #
 # IO delay types:
 #
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 1b68659..7046556 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -154,6 +154,9 @@ all: bzImage
 KBUILD_IMAGE := $(boot)/bzImage
 
 bzImage: vmlinux
+ifeq ($(CONFIG_X86_DECODER_SELFTEST),y)
+   $(Q)$(MAKE) $(build)=arch/x86/scripts posttest
+endif
$(Q)$(MAKE) $(build)=$(boot) $(KBUILD_IMAGE)
$(Q)mkdir -p $(objtree)/arch/$(UTS_MACHINE)/boot
$(Q)ln -fsn ../../x86/boot/bzImage 
$(objtree)/arch/$(UTS_MACHINE)/boot/$@
diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index 01e079a..9090665 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -20,7 +20,9 @@
  * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
  *
  */
+#ifdef __KERNEL__
 #include <linux/types.h>
+#endif
 
 /* Instruction attributes */
 typedef u32 insn_attr_t;
diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
index 5b50fa3..5736404 100644
--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -20,7 +20,9 @@
  * Copyright (C) IBM Corporation, 2009
  */
 
+#ifdef __KERNEL__
 #include <linux/types.h>
+#endif
 /* insn_attr_t is defined in inat.h */
 #include <asm/inat.h>
 
diff --git a/arch/x86/lib/inat.c b/arch/x86/lib/inat.c
index d6a34be..564ecbd 100644
--- a/arch/x86/lib/inat.c
+++ b/arch/x86/lib/inat.c
@@ -18,7 +18,9 @@
  * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
  *
  */
+#ifdef __KERNEL__
 #include <linux/module.h>
+#endif
 #include <asm/insn.h>
 
 /* Attribute tables are generated from opcode map */
diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c
index 254c848..3b9451a 100644
--- a/arch/x86/lib/insn.c
+++ b/arch/x86/lib/insn.c
@@ -18,8 +18,10 @@
  * Copyright (C) IBM Corporation, 2002, 2004, 2009
  */
 
+#ifdef __KERNEL__
 #include <linux/string.h>
 #include <linux/module.h>
+#endif
 #include <asm/inat.h>
 #include <asm/insn.h>
 
diff --git a/arch/x86/scripts/Makefile b/arch/x86/scripts/Makefile
new file mode 100644
index 000..f08859e
--- /dev/null
+++ b/arch/x86/scripts/Makefile
@@ -0,0 +1,19 @@
+PHONY += posttest
+quiet_cmd_posttest = TEST$@
+  cmd_posttest = objdump -d $(objtree)/vmlinux | awk -f 
$(srctree)/arch/x86/scripts/distill.awk | $(obj)/test_get_len
+
+posttest: $(obj)/test_get_len vmlinux
+   $(call cmd,posttest)
+
+test_get_len_SRC = $(srctree)/arch/x86/scripts/test_get_len.c 
$(srctree)/arch/x86/lib/insn.c $(srctree)/arch/x86/lib/inat.c
+test_get_len_INC = $(srctree)/arch/x86/include/asm/inat.h 
$(srctree)/arch/x86/include/asm/insn.h $(objtree)/arch/x86/lib/inat-tables.c
+
+quiet_cmd_test_get_len = CC  $@
+  cmd_test_get_len = $(CC) -Wall $(test_get_len_SRC) 
-I

[PATCH -tip -v12 08/11] tracing: add kprobe-based event tracer

2009-07-16 Thread Masami Hiramatsu
Add a kprobes-based event tracer on ftrace.

This tracer is similar to the events tracer, which is based on the Tracepoint
infrastructure. Instead of Tracepoint, this tracer is based on kprobes
(kprobe and kretprobe). It can probe anywhere kprobes can probe (that is,
all function bodies except for __kprobes functions).

Similar to the events tracer, this tracer doesn't need to be activated via
current_tracer; instead, just set probe points via
/sys/kernel/debug/tracing/kprobe_events. You can also set filters on each
probe event via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter.

This tracer supports the following probe arguments for each probe.

  %REG              : Fetch register REG
  sN                : Fetch Nth entry of stack (N >= 0)
  @ADDR             : Fetch memory at ADDR (ADDR should be in kernel)
  @SYM[+|-offs]     : Fetch memory at SYM +|- offs (SYM should be a data symbol)
  aN                : Fetch function argument. (N >= 0)
  rv                : Fetch return value.
  ra                : Fetch return address.
  +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.

See Documentation/trace/kprobetrace.txt for details.

Changes from v11:
 - Put a line after local variable definitions.
 - Fix indirect memory access string bug in trace_arg_string().
 - Remove redundant checks.
 - Fix buffer overflow in probes_write().
 - Fix probes_write() to support inputs ended without a new-line.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Acked-by: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Li Zefan l...@cn.fujitsu.com
---

 Documentation/trace/kprobetrace.txt |  138 
 kernel/trace/Kconfig|   12 
 kernel/trace/Makefile   |1 
 kernel/trace/trace.h|   29 +
 kernel/trace/trace_event_types.h|   18 +
 kernel/trace/trace_kprobe.c | 1193 +++
 6 files changed, 1391 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/trace/kprobetrace.txt
 create mode 100644 kernel/trace/trace_kprobe.c

diff --git a/Documentation/trace/kprobetrace.txt 
b/Documentation/trace/kprobetrace.txt
new file mode 100644
index 000..9ad907c
--- /dev/null
+++ b/Documentation/trace/kprobetrace.txt
@@ -0,0 +1,138 @@
+                         Kprobe-based Event Tracer
+                         =========================
+
+                 Documentation is written by Masami Hiramatsu
+
+
+Overview
+--------
+This tracer is similar to the events tracer which is based on Tracepoint
+infrastructure. Instead of Tracepoint, this tracer is based on kprobes(kprobe
+and kretprobe). It probes anywhere where kprobes can probe(this means, all
+functions body except for __kprobes functions).
+
+Unlike the function tracer, this tracer can probe instructions inside of
+kernel functions. It allows you to check which instruction has been executed.
+
+Unlike the Tracepoint based events tracer, this tracer can add and remove
+probe points on the fly.
+
+Similar to the events tracer, this tracer doesn't need to be activated via
+current_tracer, instead of that, just set probe points via
+/sys/kernel/debug/tracing/kprobe_events. And you can set filters on each
+probe events via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter.
+
+
+Synopsis of kprobe_events
+-------------------------
+  p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS]: Set a probe
+  r[:EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe
+
+ EVENT : Event name.
+ SYMBOL[+offs|-offs]   : Symbol+offset where the probe is inserted.
+ MEMADDR   : Address where the probe is inserted.
+
+ FETCHARGS : Arguments.
+  %REG          : Fetch register REG
+  sN            : Fetch Nth entry of stack (N >= 0)
+  @ADDR         : Fetch memory at ADDR (ADDR should be in kernel)
+  @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)
+  aN            : Fetch function argument. (N >= 0)(*)
+  rv            : Fetch return value.(**)
+  ra            : Fetch return address.(**)
+  +|-offs(FETCHARG) : fetch memory at FETCHARG +|- offs address.(***)
+
+  (*) aN may not correct on asmlinkaged functions and at the middle of
+  function body.
+  (**) only for return probe.
+  (***) this is useful for fetching a field of data structures.
+
+
+Per-Probe Event Filtering
+-------------------------
+ Per-probe event filtering feature allows you to set different filter on each
+probe and gives you what arguments will be shown in trace buffer. If an event
+name is specified right after 'p:' or 'r:' in kprobe_events, the tracer adds
+an event under tracing/events/kprobes/EVENT, at the directory you can see
+'id', 'enabled', 'format' and 'filter'.
+
+enabled:
+  You can enable/disable the probe by writing 1 or 0 on it.
+
+format:
+  It shows the format of this probe event. It also shows aliases of arguments
+ which you specified

[PATCH -tip -v12 03/11] kprobes: checks probe address is instruction boudary on x86

2009-07-16 Thread Masami Hiramatsu
Ensure safeness of inserting kprobes by checking whether the specified
address is at the first byte of a instruction on x86.
This is done by decoding probed function from its head to the probe point.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Acked-by: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: Ingo Molnar mi...@elte.hu
---

 arch/x86/kernel/kprobes.c |   69 +
 1 files changed, 69 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index b5b1848..5341842 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -48,6 +48,7 @@
 #include <linux/preempt.h>
 #include <linux/module.h>
 #include <linux/kdebug.h>
+#include <linux/kallsyms.h>
 
 #include <asm/cacheflush.h>
 #include <asm/desc.h>
@@ -55,6 +56,7 @@
 #include <asm/uaccess.h>
 #include <asm/alternative.h>
 #include <asm/debugreg.h>
+#include <asm/insn.h>
 
 void jprobe_return_end(void);
 
@@ -245,6 +247,71 @@ retry:
}
 }
 
+/* Recover the probed instruction at addr for further analysis. */
+static int recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr)
+{
+	struct kprobe *kp;
+	kp = get_kprobe((void *)addr);
+	if (!kp)
+		return -EINVAL;
+
+	/*
+	 *  Basically, kp->ainsn.insn has an original instruction.
+	 *  However, RIP-relative instruction can not do single-stepping
+	 *  at different place, fix_riprel() tweaks the displacement of
+	 *  that instruction. In that case, we can't recover the instruction
+	 *  from the kp->ainsn.insn.
+	 *
+	 *  On the other hand, kp->opcode has a copy of the first byte of
+	 *  the probed instruction, which is overwritten by int3. And
+	 *  the instruction at kp->addr is not modified by kprobes except
+	 *  for the first byte, we can recover the original instruction
+	 *  from it and kp->opcode.
+	 */
+	memcpy(buf, kp->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
+	buf[0] = kp->opcode;
+	return 0;
+}
+
+/* Dummy buffers for kallsyms_lookup */
+static char __dummy_buf[KSYM_NAME_LEN];
+
+/* Check if paddr is at an instruction boundary */
+static int __kprobes can_probe(unsigned long paddr)
+{
+	int ret;
+	unsigned long addr, offset = 0;
+	struct insn insn;
+	kprobe_opcode_t buf[MAX_INSN_SIZE];
+
+	if (!kallsyms_lookup(paddr, NULL, &offset, NULL, __dummy_buf))
+		return 0;
+
+	/* Decode instructions */
+	addr = paddr - offset;
+	while (addr < paddr) {
+		kernel_insn_init(&insn, (void *)addr);
+		insn_get_opcode(&insn);
+
+		/* Check if the instruction has been modified. */
+		if (OPCODE1(&insn) == BREAKPOINT_INSTRUCTION) {
+			ret = recover_probed_instruction(buf, addr);
+			if (ret)
+				/*
+				 * Another debugging subsystem might insert
+				 * this breakpoint. In that case, we can't
+				 * recover it.
+				 */
+				return 0;
+			kernel_insn_init(&insn, buf);
+		}
+		insn_get_length(&insn);
+		addr += insn.length;
+	}
+
+	return (addr == paddr);
+}
+
 /*
  * Returns non-zero if opcode modifies the interrupt flag.
  */
@@ -360,6 +427,8 @@ static void __kprobes arch_copy_kprobe(struct kprobe *p)
 
 int __kprobes arch_prepare_kprobe(struct kprobe *p)
 {
+	if (!can_probe((unsigned long)p->addr))
+		return -EILSEQ;
 	/* insn: must be on special executable page on x86. */
 	p->ainsn.insn = get_insn_slot();
 	if (!p->ainsn.insn)
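
The boundary check in this patch walks the function from its first byte, adding up instruction lengths until it reaches or passes the probe address. The sketch below shows the same walk in userspace over a made-up instruction encoding, purely to illustrate the control flow. toy_insn_length() and is_insn_boundary() are hypothetical names, the "low two bits give the length" encoding is invented for the example, and the int3 recovery step of the real code is left out.

```c
#include <stddef.h>

/*
 * Hypothetical toy encoding: the low two bits of the first byte give
 * the instruction length minus one (so lengths are 1..4 bytes).
 * The real code uses the x86 decoder (insn_get_length()) instead.
 */
static size_t toy_insn_length(const unsigned char *insn)
{
	return (size_t)(insn[0] & 0x3) + 1;
}

/*
 * Return 1 if 'target' (an offset into 'func') is the first byte of an
 * instruction, 0 otherwise. Decoding always starts at the function head.
 */
static int is_insn_boundary(const unsigned char *func, size_t func_len,
			    size_t target)
{
	size_t off = 0;

	if (target >= func_len)
		return 0;

	/* Step instruction by instruction until we reach or pass target. */
	while (off < target)
		off += toy_insn_length(func + off);

	return off == target;
}
```

If the walk lands exactly on the target, the address is a boundary; if it overshoots, the target sits in the middle of an instruction, which is the case the patch rejects with -EILSEQ.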


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com


Re: [PATCH -tip -v12 01/11] x86: instruction decoder API

2009-07-16 Thread Masami Hiramatsu
On 2009-07-16 12:19, H. Peter Anvin wrote:
 Masami Hiramatsu wrote:

 These opcode maps do NOT include most of SSE and FP opcodes, because
 those opcodes are not used in the kernel.

 
 That is not true.

Ah, right: these opcode maps do include some SSE/FP setup opcodes which
are used in the kernel.

I found those opcodes while running the decoder selftest,
so I checked the asm() code and added them to the maps.

Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: [PATCH -tip -v12 01/11] x86: instruction decoder API

2009-07-16 Thread Masami Hiramatsu
Sam Ravnborg wrote:
 diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
 new file mode 100644
 index 000..01e079a
 --- /dev/null
 +++ b/arch/x86/include/asm/inat.h
 @@ -0,0 +1,125 @@
 +#ifndef _ASM_INAT_INAT_H
 +#define _ASM_INAT_INAT_H
 
 [With reference to comment on patch 2/12...]
 You create inat.h here.
 Could you investigate what is needed to factor out the stuff
 needed from userspace so we can avoid the ugly hack where
 you redefine types.h?

Sorry, I'm a bit confused.
Do you mean that I should break down user_include.h and
add those redefined types to inat.h?

 Maybe create a inat_types.h + inat.h as we do in other cases?

And inat_types.h would have two parts, one for the kernel and one for
userspace (moved from user_include.h), is that right?
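
One way to read the suggestion being discussed here is a small types header with a kernel branch and a userspace branch. The sketch below is only a guess at what such an inat_types.h might contain: the include guard name and the exact set of typedefs are assumptions, and only the insn_attr_t typedef is taken from the inat.h hunk quoted in the patch.

```c
/* Hypothetical inat_types.h: one header usable from kernel and userspace. */
#ifndef _ASM_INAT_TYPES_H
#define _ASM_INAT_TYPES_H

#ifdef __KERNEL__
#include <linux/types.h>
#else
/* Userspace build (e.g. the selftest): map kernel types onto stdint.h. */
#include <stdint.h>
typedef uint8_t  u8;
typedef int8_t   s8;
typedef uint16_t u16;
typedef int16_t  s16;
typedef uint32_t u32;
typedef int32_t  s32;
typedef uint64_t u64;
typedef int64_t  s64;
#endif

/* Instruction attributes (declared in inat.h in the patch) */
typedef u32 insn_attr_t;

#endif /* _ASM_INAT_TYPES_H */
```

With a header like this, inat.h and insn.h could include it unconditionally, and the per-file #ifdef __KERNEL__ blocks would no longer be needed.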

Thank you,

 
 Same for the other files that required the types.h hack.
 
   Sam
 

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: [PATCH -tip -v12 02/11] x86: x86 instruction decoder build-time selftest

2009-07-16 Thread Masami Hiramatsu
Sam Ravnborg wrote:
 On Thu, Jul 16, 2009 at 11:57:06AM -0400, Masami Hiramatsu wrote:
 Add a user-space selftest of x86 instruction decoder at kernel build time.
 When CONFIG_X86_DECODER_SELFTEST=y, Kbuild builds a test harness of x86
 instruction decoder and performs it after building vmlinux.
 The test compares the results of objdump and x86 instruction decoder
 code and check there are no differences.
 
 Long overdue review from my side...
 
  arch/x86/scripts/Makefile   |   19 +++
  arch/x86/scripts/distill.awk|   42 +
  arch/x86/scripts/test_get_len.c |   99 
 +++
  arch/x86/scripts/user_include.h |   49 +++
 
 Hmmm, we have two architectures that use scripts/ and three that
 use tools/.
 I prefer the latter name as what we have here is beyond what
 I generally recognize as a script.
 
 We have scripts/ at the top level and we do not rename it,
 as we have it hardcoded in too many places - but there is no reason to
 use the wrong name here.
 
 diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
 index 01e079a..9090665 100644
 --- a/arch/x86/include/asm/inat.h
 +++ b/arch/x86/include/asm/inat.h
 @@ -20,7 +20,9 @@
   * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, 
 USA.
   *
   */
 +#ifdef __KERNEL__
  #include linux/types.h
 +#endif
  
  /* Instruction attributes */
  typedef u32 insn_attr_t;
 
 Why this?
 If you need this to use this file from userspace then could we do some
 other trick to make this OK?



 
 I see it repeated several times below.
 [If this has already been discussed I have missed it - sorry].
 
 
 diff --git a/arch/x86/scripts/Makefile b/arch/x86/scripts/Makefile
 new file mode 100644
 index 000..f08859e
 --- /dev/null
 +++ b/arch/x86/scripts/Makefile
 @@ -0,0 +1,19 @@
 +PHONY += posttest
 +quiet_cmd_posttest = TEST$@
 +  cmd_posttest = objdump -d $(objtree)/vmlinux | awk -f 
 $(srctree)/arch/x86/scripts/distill.awk | $(obj)/test_get_len
 +
 
 You are using the native objdump here.
 But I assume this fails miserably when you build x86 on a powerpc host.
 In other words - you broke an allyesconfig build for -next...
 We have $(OBJDUMP) for this.

Ah, I see... Do you know the actual name of the x86 objdump on powerpc
(or any other cross-compiling host)? Is it OK to just set OBJDUMP=objdump?
I'm not so familiar with cross-compiling the kernel...

 +posttest: $(obj)/test_get_len vmlinux
 +$(call cmd,posttest)
 +
 +test_get_len_SRC = $(srctree)/arch/x86/scripts/test_get_len.c 
 $(srctree)/arch/x86/lib/insn.c $(srctree)/arch/x86/lib/inat.c
 +test_get_len_INC = $(srctree)/arch/x86/include/asm/inat.h 
 $(srctree)/arch/x86/include/asm/insn.h $(objtree)/arch/x86/lib/inat-tables.c
 +
 +quiet_cmd_test_get_len = CC  $@
 +  cmd_test_get_len = $(CC) -Wall $(test_get_len_SRC) 
 -I$(objtree)/arch/x86/lib/ -I$(srctree)/arch/x86/include -include 
 $(srctree)/arch/x86/scripts/user_include.h -o $@
 
 Is there a specific reason why you cannot use the standard hostprogs-y for 
 this?
 It will take care of dependency tracking etc.
 What you have above is a hopelessly incomplete list of dependencies.
 
 You need to use HOST_EXTRACFLAGS to set additional -I options and the 
 -include.

Thank you, I'll try to use hostprogs-y.

 +
 +static void usage()
 +{
 +fprintf(stderr, "usage: %s < distilled_disassembly\n", prog);
 +exit(1);
 +}
 
 It would be nice to tell the user what the program is supposed to do.
 I know this is a bit unusual but no reason to copy bad practice.
 

Sure, copying the usage line from distill.awk would probably be more helpful for users...

Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: [PATCH -tip -v12 02/11] x86: x86 instruction decoder build-time selftest

2009-07-16 Thread Masami Hiramatsu
Masami Hiramatsu wrote:
 You are using the native objdump here.
 But I assume this fails miserably when you build x86 on a powerpc host.
 In other words - you broke an allyesconfig build for -next...
 We have $(OBJDUMP) for this.
 
 Ah, I see... Would you know actual name of x86-objdump on the powerpc
 (or any other crosscompiling host)? I just set OBJDUMP=objdump is OK?
 I'm not so sure about cross-compiling kernel...

Oops, we already have it. Yes, I'll use $(OBJDUMP).


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: [PATCH -tip -v12 02/11] x86: x86 instruction decoder build-time selftest

2009-07-16 Thread Masami Hiramatsu
Sam Ravnborg wrote:
 +  cmd_posttest = objdump -d $(objtree)/vmlinux | awk -f 
 $(srctree)/arch/x86/scripts/distill.awk | $(obj)/test_get_len
 +
 You are using the native objdump here.
 But I assume this fails miserably when you build x86 on a powerpc host.
 In other words - you broke an allyesconfig build for -next...
 We have $(OBJDUMP) for this.
 Ah, I see... Would you know actual name of x86-objdump on the powerpc
 (or any other crosscompiling host)? I just set OBJDUMP=objdump is OK?
 I'm not so sure about cross-compiling kernel...
 
 Replacing objdump with $(OBJDUMP) will do the trick.
 We set OBJDUMP to the correct value in the top-level makefile.
 
 Are there any parts of your user-space program that rely
 on the host being little-endian?
 If so, it would fail on a power-pc target despite using the
 correct objdump.

Hmm, as far as I can see, the result of the get_next() macro for types
of two bytes or more (s16, s32, ...) might be affected.
But that doesn't affect the get_insn_len test, because those values are ignored.
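
To make the endianness concern concrete, here is a minimal user-space sketch. It is not the decoder's get_next() macro; read_s32_native() and read_s32_le() are hypothetical names used only for illustration. Copying the raw bytes into an integer gives a host-dependent value, while assembling the value byte by byte gives the x86 (little-endian) value on any host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Host-dependent: the bytes are reinterpreted in the host's byte order. */
int32_t read_s32_native(const unsigned char *p)
{
	int32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Host-independent: assemble the value as little-endian, as x86 encodes it. */
int32_t read_s32_le(const unsigned char *p)
{
	uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
		     ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);

	return (int32_t)v;
}
```

On a little-endian host both helpers agree; on a big-endian host only read_s32_le() returns the value that the x86 instruction stream encodes. Since the length test ignores those values, either variant would pass it.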

Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: [PATCH -tip -v12 01/11] x86: instruction decoder API

2009-07-16 Thread Masami Hiramatsu
Sam Ravnborg wrote:
 On Thu, Jul 16, 2009 at 01:28:54PM -0400, Masami Hiramatsu wrote:
 Sam Ravnborg wrote:
 diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
 new file mode 100644
 index 000..01e079a
 --- /dev/null
 +++ b/arch/x86/include/asm/inat.h
 @@ -0,0 +1,125 @@
 +#ifndef _ASM_INAT_INAT_H
 +#define _ASM_INAT_INAT_H
 [With reference to comment on patch 2/12...]
 You create inat.h here.
 Could you investigate what is needed to factor out the stuff
 needed from userspace so we can avoid the ugly hack where
 you redefine types.h?
 Sorry, I'm a bit confused.
 Do you mean that I should break down user_include.h and
 add those redefined types to inat.h?
 No - try to factor out what is needed for your program
 so you can avoid user_include.h entirely.
 Maybe create an inat_types.h + inat.h as we do in other cases?
 And inat_types.h has two parts, one for kernel, and one for
 userspace (which is moved from user_include.h), is that right?
 More like inat_types.h includes pure definitions and inat.h
 defines all the macros (that would be much nicer if expressed
 as static inlines).

OK, some macros still need to be macros, because they are used
for defining static tables.
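
A minimal sketch of that constraint, using toy names rather than the real inat.h macros: in C, the initializer of a static table must be a constant expression, so a macro can appear there while a call to a static inline function cannot.

```c
#include <assert.h>

#define TOY_PFX_BITS 4
/* Macro form: expands to a constant expression. */
#define TOY_MAKE_PREFIX(pfx) ((pfx) & ((1u << TOY_PFX_BITS) - 1))

/* Inline form: fine at run time, but not allowed in a static initializer. */
static inline unsigned int toy_make_prefix(unsigned int pfx)
{
	return pfx & ((1u << TOY_PFX_BITS) - 1);
}

/* Only the macro can be used here; toy_make_prefix(1) would not compile. */
const unsigned int toy_attr_table[256] = {
	[0x66] = TOY_MAKE_PREFIX(1),
	[0xf2] = TOY_MAKE_PREFIX(2),
};

unsigned int toy_lookup_prefix(unsigned char opcode)
{
	return toy_attr_table[opcode];
}
```

Checking macros that are only evaluated at run time could still become static inlines, as suggested.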

 The real thing to consider is what is needed from your userspace
 program and is also required by the kernel.
 I did not even remotely try to find out - as I guess you know it.
 So try to isolate these bits somehow and you have then nicely dropped
 a lot of dependencies on the remaining headers and can thus
 hopefully get rid of the ugly user_include.h hack.

OK, I'll try to remove the user_include.h hack.

Thank you so much!


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: [PATCH -tip -v11 11/11] tracing: Add kprobes event profiling interface

2009-07-10 Thread Masami Hiramatsu
Hi,

Li Zefan wrote:
 +Event Profiling
 +---
 + You can check the total number of probe hits and probe miss-hits via
 +/sys/kernel/debug/tracing/kprobe_profile.
 + The fist column is event name, the second is the number of probe hits,
 
 s/fist/first

Oops, fixed.

 
 +the third is the number of probe miss-hits.
 +
 +
 ...
 +/* Probes profiling interfaces */
 +static int profile_seq_show(struct seq_file *m, void *v)
 +{
 +struct trace_probe *tp = v;
 +
 +if (tp == NULL)
 +return 0;
 +
 
 tp will never be NULL, which is guaranteed by seq_file

OK, fixed.

 +seq_printf(m, "%s", tp->call.name);
 +
 +seq_printf(m, "\t%8lu %8lu\n", tp->nhits,
 +   probe_is_return(tp) ? tp->rp.kp.nmissed : tp->kp.nmissed);
 +
 +return 0;
 +}

Thank you for the review!

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: [PATCH -tip -v11 08/11] tracing: add kprobe-based event tracer

2009-07-10 Thread Masami Hiramatsu
 for review my patch!

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



[PATCH -tip -v11 03/11] kprobes: checks probe address is instruction boudary on x86

2009-07-09 Thread Masami Hiramatsu
Ensure the safety of inserting kprobes by checking whether the specified
address is at the first byte of an instruction on x86.
This is done by decoding the probed function from its head to the probe point.
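
The idea can be sketched in user space as follows. This is an illustration under stated assumptions, not the patch's code: toy_insn_length() is a made-up decoder in which the low nibble of the first byte gives the instruction length, standing in for the real x86 decoder's insn_get_length().

```c
#include <assert.h>
#include <stddef.h>

/* Toy length decoder (assumption): low nibble of the first byte is the length. */
size_t toy_insn_length(const unsigned char *p)
{
	size_t len = p[0] & 0x0f;

	return len ? len : 1;
}

/* Return 1 if offset is the first byte of an instruction, 0 otherwise. */
int toy_can_probe(const unsigned char *func, size_t func_size, size_t offset)
{
	size_t pos = 0;

	if (offset >= func_size)
		return 0;
	/* Walk instruction by instruction from the function head. */
	while (pos < offset)
		pos += toy_insn_length(func + pos);
	/* Landing exactly on offset means it is an instruction boundary. */
	return pos == offset;
}
```

For the byte sequence 02 aa 03 bb cc 01, the toy instructions start at offsets 0, 2 and 5, so only those offsets are accepted.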

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Acked-by: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: Ingo Molnar mi...@elte.hu
---

 arch/x86/kernel/kprobes.c |   69 +
 1 files changed, 69 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index b5b1848..5341842 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -48,6 +48,7 @@
 #include <linux/preempt.h>
 #include <linux/module.h>
 #include <linux/kdebug.h>
+#include <linux/kallsyms.h>
 
 #include <asm/cacheflush.h>
 #include <asm/desc.h>
@@ -55,6 +56,7 @@
 #include <asm/uaccess.h>
 #include <asm/alternative.h>
 #include <asm/debugreg.h>
+#include <asm/insn.h>
 
 void jprobe_return_end(void);
 
@@ -245,6 +247,71 @@ retry:
}
 }
 
+/* Recover the probed instruction at addr for further analysis. */
+static int recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr)
+{
+   struct kprobe *kp;
+   kp = get_kprobe((void *)addr);
+   if (!kp)
+   return -EINVAL;
+
+   /*
+*  Basically, kp->ainsn.insn has an original instruction.
+*  However, RIP-relative instruction can not do single-stepping
+*  at different place, fix_riprel() tweaks the displacement of
+*  that instruction. In that case, we can't recover the instruction
+*  from the kp->ainsn.insn.
+*
+*  On the other hand, kp->opcode has a copy of the first byte of
+*  the probed instruction, which is overwritten by int3. And
+*  the instruction at kp->addr is not modified by kprobes except
+*  for the first byte, we can recover the original instruction
+*  from it and kp->opcode.
+*/
+   memcpy(buf, kp->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
+   buf[0] = kp->opcode;
+   return 0;
+}
+
+/* Dummy buffers for kallsyms_lookup */
+static char __dummy_buf[KSYM_NAME_LEN];
+
+/* Check if paddr is at an instruction boundary */
+static int __kprobes can_probe(unsigned long paddr)
+{
+   int ret;
+   unsigned long addr, offset = 0;
+   struct insn insn;
+   kprobe_opcode_t buf[MAX_INSN_SIZE];
+
+   if (!kallsyms_lookup(paddr, NULL, &offset, NULL, __dummy_buf))
+   return 0;
+
+   /* Decode instructions */
+   addr = paddr - offset;
+   while (addr < paddr) {
+   kernel_insn_init(&insn, (void *)addr);
+   insn_get_opcode(&insn);
+
+   /* Check if the instruction has been modified. */
+   if (OPCODE1(&insn) == BREAKPOINT_INSTRUCTION) {
+   ret = recover_probed_instruction(buf, addr);
+   if (ret)
+   /*
+* Another debugging subsystem might insert
+* this breakpoint. In that case, we can't
+* recover it.
+*/
+   return 0;
+   kernel_insn_init(&insn, buf);
+   }
+   insn_get_length(&insn);
+   addr += insn.length;
+   }
+
+   return (addr == paddr);
+}
+
 /*
  * Returns non-zero if opcode modifies the interrupt flag.
  */
@@ -360,6 +427,8 @@ static void __kprobes arch_copy_kprobe(struct kprobe *p)
 
 int __kprobes arch_prepare_kprobe(struct kprobe *p)
 {
+   if (!can_probe((unsigned long)p->addr))
+   return -EILSEQ;
/* insn: must be on special executable page on x86. */
p->ainsn.insn = get_insn_slot();
if (!p->ainsn.insn)


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com


[PATCH -tip -v11 06/11] tracing: ftrace dynamic ftrace_event_call support

2009-07-09 Thread Masami Hiramatsu
Add dynamic ftrace_event_call support to ftrace. Trace engines can add new
ftrace_event_calls to ftrace on the fly. Each operator function of a call
takes an ftrace_event_call data structure as an argument, because these
functions may be shared among several ftrace_event_calls.
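
A minimal user-space sketch of why the callbacks need the call as an argument. The struct and function names below are toy stand-ins, not the real ftrace_event_call definition: once one pair of callbacks serves several events, the callback can only tell which event it is acting on through the pointer it receives.

```c
#include <assert.h>

/* Toy stand-in for struct ftrace_event_call (assumption, heavily reduced). */
struct toy_event_call {
	const char *name;
	int enabled;
	int (*regfunc)(struct toy_event_call *call);
	void (*unregfunc)(struct toy_event_call *call);
};

/* One pair of callbacks shared by every dynamically created event. */
int toy_shared_reg(struct toy_event_call *call)
{
	call->enabled = 1;
	return 0;
}

void toy_shared_unreg(struct toy_event_call *call)
{
	call->enabled = 0;
}
```

With the old (void) signatures, each event would have needed its own generated pair of functions, which is what the TRACE_EVENT macros do for static events.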

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Acked-by: Frederic Weisbecker fweis...@gmail.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ingo Molnar mi...@elte.hu
Cc: Tom Zanussi tzanu...@gmail.com
---

 include/linux/ftrace_event.h |   13 +---
 include/trace/ftrace.h   |   22 +++--
 kernel/trace/trace_events.c  |   70 --
 kernel/trace/trace_export.c  |   27 
 4 files changed, 85 insertions(+), 47 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 5c093ff..f7733b6 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -108,12 +108,13 @@ struct ftrace_event_call {
struct dentry   *dir;
struct trace_event  *event;
int enabled;
-   int (*regfunc)(void);
-   void(*unregfunc)(void);
+   int (*regfunc)(struct ftrace_event_call *);
+   void(*unregfunc)(struct ftrace_event_call *);
int id;
-   int (*raw_init)(void);
-   int (*show_format)(struct trace_seq *s);
-   int (*define_fields)(void);
+   int (*raw_init)(struct ftrace_event_call *);
+   int (*show_format)(struct ftrace_event_call *,
+  struct trace_seq *);
+   int (*define_fields)(struct ftrace_event_call *);
struct list_headfields;
int filter_active;
void*filter;
@@ -138,6 +139,8 @@ extern int filter_current_check_discard(struct 
ftrace_event_call *call,
 
 extern int trace_define_field(struct ftrace_event_call *call, char *type,
  char *name, int offset, int size, int is_signed);
+extern int trace_add_event_call(struct ftrace_event_call *call);
+extern void trace_remove_event_call(struct ftrace_event_call *call);
 
 #define is_signed_type(type)   (((type)(-1)) < 0)
 
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 1867553..d696580 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -147,7 +147,8 @@
 #undef TRACE_EVENT
 #define TRACE_EVENT(call, proto, args, tstruct, func, print)   \
 static int \
-ftrace_format_##call(struct trace_seq *s)  \
+ftrace_format_##call(struct ftrace_event_call *event_call, \
+struct trace_seq *s)   \
 {  \
struct ftrace_raw_##call field __attribute__((unused)); \
int ret = 0;\
@@ -289,10 +290,9 @@ ftrace_raw_output_##call(struct trace_iterator *iter, int 
flags)   \
 #undef TRACE_EVENT
 #define TRACE_EVENT(call, proto, args, tstruct, func, print)   \
 int\
-ftrace_define_fields_##call(void)  \
+ftrace_define_fields_##call(struct ftrace_event_call *event_call)  \
 {  \
struct ftrace_raw_##call field; \
-   struct ftrace_event_call *event_call = &event_##call;   \
int ret;\
\
__common_field(int, type, 1);   \
@@ -355,7 +355,7 @@ static inline int ftrace_get_offsets_##call(
\
  * event_trace_printk(_RET_IP_, call:  fmt);
  * }
  *
- * static int ftrace_reg_event_call(void)
+ * static int ftrace_reg_event_call(struct ftrace_event_call *unused)
  * {
  * int ret;
  *
@@ -366,7 +366,7 @@ static inline int ftrace_get_offsets_##call(
\
  * return ret;
  * }
  *
- * static void ftrace_unreg_event_call(void)
+ * static void ftrace_unreg_event_call(struct ftrace_event_call *unused)
  * {
  * unregister_trace_call(ftrace_event_call);
  * }
@@ -399,7 +399,7 @@ static inline int ftrace_get_offsets_##call(
\
  * trace_current_buffer_unlock_commit(event, irq_flags, pc);
  * }
  *
- * static int ftrace_raw_reg_event_call(void)
+ * static int ftrace_raw_reg_event_call(struct ftrace_event_call *unused)
  * {
  * int

[PATCH -tip -v11 04/11] kprobes: cleanup fix_riprel() using insn decoder on x86

2009-07-09 Thread Masami Hiramatsu
Clean up fix_riprel() in arch/x86/kernel/kprobes.c by using the x86 instruction
decoder.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: Ingo Molnar mi...@elte.hu
---

 arch/x86/kernel/kprobes.c |  128 -
 1 files changed, 23 insertions(+), 105 deletions(-)

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index 5341842..b77e050 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -109,50 +109,6 @@ static const u32 twobyte_is_boostable[256 / 32] = {
/*  --- */
/*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
 };
-static const u32 onebyte_has_modrm[256 / 32] = {
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-   /*  --- */
-   W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 00 */
-   W(0x10, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 10 */
-   W(0x20, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 20 */
-   W(0x30, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 30 */
-   W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 40 */
-   W(0x50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 50 */
-   W(0x60, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0) | /* 60 */
-   W(0x70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 70 */
-   W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */
-   W(0x90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 90 */
-   W(0xa0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* a0 */
-   W(0xb0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* b0 */
-   W(0xc0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* c0 */
-   W(0xd0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */
-   W(0xe0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* e0 */
-   W(0xf0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1)   /* f0 */
-   /*  --- */
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-};
-static const u32 twobyte_has_modrm[256 / 32] = {
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-   /*  --- */
-   W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1) | /* 0f */
-   W(0x10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0) , /* 1f */
-   W(0x20, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1) | /* 2f */
-   W(0x30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 3f */
-   W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 4f */
-   W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 5f */
-   W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 6f */
-   W(0x70, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1) , /* 7f */
-   W(0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 8f */
-   W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 9f */
-   W(0xa0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1) | /* af */
-   W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1) , /* bf */
-   W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* cf */
-   W(0xd0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* df */
-   W(0xe0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* ef */
-   W(0xf0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0)   /* ff */
-   /*  --- */
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-};
 #undef W
 
 struct kretprobe_blackpoint kretprobe_blacklist[] = {
@@ -345,68 +301,30 @@ static int __kprobes is_IF_modifier(kprobe_opcode_t *insn)
 static void __kprobes fix_riprel(struct kprobe *p)
 {
 #ifdef CONFIG_X86_64
-   u8 *insn = p->ainsn.insn;
-   s64 disp;
-   int need_modrm;
-
-   /* Skip legacy instruction prefixes.  */
-   while (1) {
-   switch (*insn) {
-   case 0x66:
-   case 0x67:
-   case 0x2e:
-   case 0x3e:
-   case 0x26:
-   case 0x64:
-   case 0x65:
-   case 0x36:
-   case 0xf0:
-   case 0xf3:
-   case 0xf2:
-   ++insn;
-   continue;
-   }
-   break;
-   }
+   struct insn insn;
+   kernel_insn_init(&insn, p->ainsn.insn);
 
-   /* Skip REX instruction prefix.  */
-   if (is_REX_prefix(insn))
-   ++insn;
-
-   if (*insn == 0x0f) {
-   /* Two-byte opcode.  */
-   ++insn

[PATCH -tip -v11 01/11] x86: instruction decoder API

2009-07-09 Thread Masami Hiramatsu
Add an x86 instruction decoder to the arch-specific libraries. This decoder
can decode the x86 instructions used in the kernel into prefix, opcode, modrm,
sib, displacement and immediates. It can also report the length of
instructions.

This version introduces instruction attributes for decoding instructions.
The instruction attribute tables are generated from the opcode map file
(x86-opcode-map.txt) by the generator script (gen-insn-attr-x86.awk).

Currently, the opcode maps are based on the opcode maps in Intel(R) 64 and
IA-32 Architectures Software Developer's Manual Vol.2: Appendix A,
and consist of the following two types of opcode tables.

1-byte/2-byte/3-byte opcode tables, which have 256 elements each, are
written as below:

 Table: table-name
 Referrer: escaped-name
 opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
  (or)
 opcode: escape # escaped-name
 EndTable

Group opcode tables, which have 8 elements each, are written as below:

 GrpTable: GrpXXX
 reg:  mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
 EndTable

These opcode maps do NOT include most SSE and FP opcodes, because
those opcodes are not used in the kernel.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Signed-off-by: Jim Keniston jkeni...@us.ibm.com
Acked-by: H. Peter Anvin h...@zytor.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Vegard Nossum vegard.nos...@gmail.com
Cc: Avi Kivity a...@redhat.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
---

 arch/x86/include/asm/inat.h|  125 ++
 arch/x86/include/asm/insn.h|  134 ++
 arch/x86/lib/Makefile  |   13 +
 arch/x86/lib/inat.c|   80 
 arch/x86/lib/insn.c|  471 +
 arch/x86/lib/x86-opcode-map.txt|  711 
 arch/x86/scripts/gen-insn-attr-x86.awk |  314 ++
 7 files changed, 1848 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/inat.h
 create mode 100644 arch/x86/include/asm/insn.h
 create mode 100644 arch/x86/lib/inat.c
 create mode 100644 arch/x86/lib/insn.c
 create mode 100644 arch/x86/lib/x86-opcode-map.txt
 create mode 100644 arch/x86/scripts/gen-insn-attr-x86.awk

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
new file mode 100644
index 000..01e079a
--- /dev/null
+++ b/arch/x86/include/asm/inat.h
@@ -0,0 +1,125 @@
+#ifndef _ASM_INAT_INAT_H
+#define _ASM_INAT_INAT_H
+/*
+ * x86 instruction attributes
+ *
+ * Written by Masami Hiramatsu mhira...@redhat.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ */
+#include <linux/types.h>
+
+/* Instruction attributes */
+typedef u32 insn_attr_t;
+
+/*
+ * Internal bits. Don't use bitmasks directly, because these bits are
+ * unstable. You should add checking macros and use that macro in
+ * your code.
+ */
+
+#define INAT_OPCODE_TABLE_SIZE 256
+#define INAT_GROUP_TABLE_SIZE 8
+
+/* Legacy instruction prefixes */
+#define INAT_PFX_OPNDSZ    1   /* 0x66 */ /* LPFX1 */
+#define INAT_PFX_REPNE     2   /* 0xF2 */ /* LPFX2 */
+#define INAT_PFX_REPE      3   /* 0xF3 */ /* LPFX3 */
+#define INAT_PFX_LOCK      4   /* 0xF0 */
+#define INAT_PFX_CS        5   /* 0x2E */
+#define INAT_PFX_DS        6   /* 0x3E */
+#define INAT_PFX_ES        7   /* 0x26 */
+#define INAT_PFX_FS        8   /* 0x64 */
+#define INAT_PFX_GS        9   /* 0x65 */
+#define INAT_PFX_SS        10  /* 0x36 */
+#define INAT_PFX_ADDRSZ    11  /* 0x67 */
+
+#define INAT_LPREFIX_MAX   3
+
+/* Immediate size */
+#define INAT_IMM_BYTE  1
+#define INAT_IMM_WORD  2
+#define INAT_IMM_DWORD 3
+#define INAT_IMM_QWORD 4
+#define INAT_IMM_PTR   5
+#define INAT_IMM_VWORD32   6
+#define INAT_IMM_VWORD 7
+
+/* Legacy prefix */
+#define INAT_PFX_OFFS  0
+#define INAT_PFX_BITS  4
+#define INAT_PFX_MAX   ((1 << INAT_PFX_BITS) - 1)
+#define INAT_PFX_MASK  (INAT_PFX_MAX << INAT_PFX_OFFS)
+/* Escape opcodes */
+#define INAT_ESC_OFFS  (INAT_PFX_OFFS + INAT_PFX_BITS)
+#define INAT_ESC_BITS  2
+#define INAT_ESC_MAX

[PATCH -tip -v11 05/11] x86: add pt_regs register and stack access APIs

2009-07-09 Thread Masami Hiramatsu
Add the following APIs for accessing registers and stack entries from pt_regs.
These APIs are required by the kprobes-based event tracer on ftrace.
Some other debugging tools might be able to use them too.

- regs_query_register_offset(const char *name)
   Query the offset of the register called "name".

- regs_query_register_name(unsigned int offset)
   Query the name of a register by its offset.

- regs_get_register(struct pt_regs *regs, unsigned int offset)
   Get the value of a register by its offset.

- regs_within_kernel_stack(struct pt_regs *regs, unsigned long addr)
   Check whether the address is within the kernel stack.

- regs_get_kernel_stack_nth(struct pt_regs *reg, unsigned int nth)
   Get the Nth entry of the kernel stack. (N >= 0)

- regs_get_argument_nth(struct pt_regs *reg, unsigned int nth)
   Get the Nth argument at function call. (N >= 0)
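
The name/offset lookup can be sketched in user space with a toy register structure. struct toy_regs and the toy_* helpers are illustrative assumptions, the real code works on struct pt_regs, and the -1 error value is chosen here only for the sketch. The table is built with offsetof(), in the same style as the REG_OFFSET_NAME() macro in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy register file; the real struct pt_regs has many more members. */
struct toy_regs {
	unsigned long ax;
	unsigned long bx;
	unsigned long sp;
};

struct toy_regs_offset {
	const char *name;
	int offset;
};

#define TOY_REG_OFFSET_NAME(r) {.name = #r, .offset = offsetof(struct toy_regs, r)}
#define TOY_REG_OFFSET_END {.name = NULL, .offset = 0}

static const struct toy_regs_offset toy_regoffset_table[] = {
	TOY_REG_OFFSET_NAME(ax),
	TOY_REG_OFFSET_NAME(bx),
	TOY_REG_OFFSET_NAME(sp),
	TOY_REG_OFFSET_END,
};

/* Return the byte offset of the named register, or -1 if it is unknown. */
int toy_query_register_offset(const char *name)
{
	const struct toy_regs_offset *roff;

	for (roff = toy_regoffset_table; roff->name != NULL; roff++)
		if (!strcmp(roff->name, name))
			return roff->offset;
	return -1;
}

/* Read a register by offset; offsets past the last member yield 0. */
unsigned long toy_get_register(const struct toy_regs *regs, unsigned int offset)
{
	if (offset > offsetof(struct toy_regs, sp))
		return 0;
	return *(const unsigned long *)((const char *)regs + offset);
}
```

A tracer can then resolve a register name once, when the probe is defined, and use the cheap offset-based read on every hit.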

Changes from v10:
 - Use an offsetof table in regs_get_argument_nth().
 - Use unsigned int instead of unsigned.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Reviewed-by: Frederic Weisbecker fweis...@gmail.com
Cc: Andi Kleen a...@firstfloor.org
Cc: Christoph Hellwig h...@infradead.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Roland McGrath rol...@redhat.com
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: linux-a...@vger.kernel.org
---

 arch/x86/include/asm/ptrace.h |   62 +++
 arch/x86/kernel/ptrace.c  |  112 +
 2 files changed, 174 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 0f0d908..a3d49dd 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -7,6 +7,7 @@
 
 #ifdef __KERNEL__
 #include <asm/segment.h>
+#include <asm/page_types.h>
 #endif
 
 #ifndef __ASSEMBLY__
@@ -216,6 +217,67 @@ static inline unsigned long user_stack_pointer(struct 
pt_regs *regs)
return regs->sp;
 }
 
+/* Query offset/name of register from its name/offset */
+extern int regs_query_register_offset(const char *name);
+extern const char *regs_query_register_name(unsigned int offset);
+#define MAX_REG_OFFSET (offsetof(struct pt_regs, ss))
+
+/**
+ * regs_get_register() - get register value from its offset
+ * @regs:  pt_regs from which register value is gotten.
+ * @offset:offset number of the register.
+ *
+ * regs_get_register returns the value of a register whose offset from @regs
+ * is @offset. The @offset is the offset of the register in struct pt_regs.
+ * If @offset is bigger than MAX_REG_OFFSET, this returns 0.
+ */
+static inline unsigned long regs_get_register(struct pt_regs *regs,
+ unsigned int offset)
+{
+   if (unlikely(offset > MAX_REG_OFFSET))
+   return 0;
+   return *(unsigned long *)((unsigned long)regs + offset);
+}
+
+/**
+ * regs_within_kernel_stack() - check the address in the stack
+ * @regs:  pt_regs which contains kernel stack pointer.
+ * @addr:  address which is checked.
+ *
+ * regs_within_kernel_stack() checks @addr is within the kernel stack page(s).
+ * If @addr is within the kernel stack, it returns true. If not, returns false.
+ */
+static inline int regs_within_kernel_stack(struct pt_regs *regs,
+  unsigned long addr)
+{
+   return ((addr & ~(THREAD_SIZE - 1))  ==
+   (kernel_stack_pointer(regs) & ~(THREAD_SIZE - 1)));
+}
+
+/**
+ * regs_get_kernel_stack_nth() - get Nth entry of the stack
+ * @regs:  pt_regs which contains kernel stack pointer.
+ * @n: stack entry number.
+ *
+ * regs_get_kernel_stack_nth() returns @n th entry of the kernel stack which
+ * is specified by @regs. If the @n th entry is NOT in the kernel stack,
+ * this returns 0.
+ */
+static inline unsigned long regs_get_kernel_stack_nth(struct pt_regs *regs,
+ unsigned int n)
+{
+   unsigned long *addr = (unsigned long *)kernel_stack_pointer(regs);
+   addr += n;
+   if (regs_within_kernel_stack(regs, (unsigned long)addr))
+   return *addr;
+   else
+   return 0;
+}
+
+/* Get Nth argument at function call */
+extern unsigned long regs_get_argument_nth(struct pt_regs *regs,
+  unsigned int n);
+
 /*
  * These are defined as per linux/ptrace.h, which see.
  */
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index cabdabc..32729ec 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -49,6 +49,118 @@ enum x86_regset {
REGSET_IOPERM32,
 };
 
+struct pt_regs_offset {
+   const char *name;
+   int offset;
+};
+
+#define REG_OFFSET_NAME(r) {.name = #r, .offset = offsetof(struct pt_regs, r)}
+#define REG_OFFSET_END {.name = NULL, .offset = 0}
+
+static const struct pt_regs_offset regoffset_table[] = {
+#ifdef

[PATCH -tip -v11 07/11] tracing: Introduce TRACE_FIELD_ZERO() macro

2009-07-09 Thread Masami Hiramatsu
Use TRACE_FIELD_ZERO(type, item) instead of TRACE_FIELD_ZERO_CHAR(item).
This also includes a fix of TRACE_ZERO_CHAR() macro.
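
As a rough user-space illustration of the change (TOY_FIELD_ZERO() is a hypothetical stand-in, not the kernel macro), passing the type as a macro argument lets the format line be built by stringification instead of hard-coding "char":

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the format-printing variant of the macro. */
#define TOY_FIELD_ZERO(out, len, type, item) \
	snprintf(out, len, "field:" #type " " #item ";size:0;")
```

Here TOY_FIELD_ZERO(out, sizeof(out), char, buf) writes "field:char buf;size:0;" into out, and any other element type can be used the same way.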

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ingo Molnar mi...@elte.hu
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Frederic Weisbecker fweis...@gmail.com
---

 kernel/trace/trace_event_types.h |4 ++--
 kernel/trace/trace_export.c  |   16 
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/trace/trace_event_types.h b/kernel/trace/trace_event_types.h
index 6db005e..e74f090 100644
--- a/kernel/trace/trace_event_types.h
+++ b/kernel/trace/trace_event_types.h
@@ -109,7 +109,7 @@ TRACE_EVENT_FORMAT(bprint, TRACE_BPRINT, bprint_entry, 
ignore,
TRACE_STRUCT(
TRACE_FIELD(unsigned long, ip, ip)
TRACE_FIELD(char *, fmt, fmt)
-   TRACE_FIELD_ZERO_CHAR(buf)
+   TRACE_FIELD_ZERO(char, buf)
),
TP_RAW_FMT("%08lx (%d) fmt:%p %s")
 );
@@ -117,7 +117,7 @@ TRACE_EVENT_FORMAT(bprint, TRACE_BPRINT, bprint_entry, 
ignore,
 TRACE_EVENT_FORMAT(print, TRACE_PRINT, print_entry, ignore,
TRACE_STRUCT(
TRACE_FIELD(unsigned long, ip, ip)
-   TRACE_FIELD_ZERO_CHAR(buf)
+   TRACE_FIELD_ZERO(char, buf)
),
TP_RAW_FMT("%08lx (%d) fmt:%p %s")
 );
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index 7cee79d..23125b5 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -42,9 +42,9 @@ extern void __bad_type_size(void);
if (!ret)   \
return 0;
 
-#undef TRACE_FIELD_ZERO_CHAR
-#define TRACE_FIELD_ZERO_CHAR(item)\
-   ret = trace_seq_printf(s, "\tfield:char " #item ";\t"   \
+#undef TRACE_FIELD_ZERO
+#define TRACE_FIELD_ZERO(type, item)   \
+   ret = trace_seq_printf(s, "\tfield:" #type " " #item ";\t"  \
   "offset:%u;\tsize:0;\n", \
   (unsigned int)offsetof(typeof(field), item)); \
if (!ret)   \
@@ -90,9 +90,6 @@ ftrace_format_##call(struct ftrace_event_call *dummy, struct 
trace_seq *s)\
 
 #include "trace_event_types.h"
 
-#undef TRACE_ZERO_CHAR
-#define TRACE_ZERO_CHAR(arg)
-
 #undef TRACE_FIELD
 #define TRACE_FIELD(type, item, assign)\
entry->item = assign;
@@ -105,6 +102,9 @@ ftrace_format_##call(struct ftrace_event_call *dummy, 
struct trace_seq *s)\
 #define TRACE_FIELD_SIGN(type, item, assign, is_signed)\
TRACE_FIELD(type, item, assign)
 
+#undef TRACE_FIELD_ZERO
+#define TRACE_FIELD_ZERO(type, item)
+
 #undef TP_CMD
 #define TP_CMD(cmd...) cmd
 
@@ -176,8 +176,8 @@ __attribute__((section(_ftrace_events))) event_##call = { 
\
if (ret)\
return ret;
 
-#undef TRACE_FIELD_ZERO_CHAR
-#define TRACE_FIELD_ZERO_CHAR(item)
+#undef TRACE_FIELD_ZERO
+#define TRACE_FIELD_ZERO(type, item)
 
 #undef TRACE_EVENT_FORMAT
 #define TRACE_EVENT_FORMAT(call, proto, args, fmt, tstruct, tpfmt) \


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com


[PATCH -tip -v11 02/11] x86: x86 instruction decoder build-time selftest

2009-07-09 Thread Masami Hiramatsu
Add a user-space selftest of the x86 instruction decoder at kernel build time.
When CONFIG_X86_DECODER_SELFTEST=y, Kbuild builds a test harness for the x86
instruction decoder and runs it after building vmlinux.
The test compares the results of objdump with those of the x86 instruction
decoder code and checks that there are no differences.
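
One building block of such a comparison can be sketched as below. The exact layout of the distilled objdump output is an assumption here (space-separated two-digit hex bytes, as in "48 89 e5"), and toy_count_hex_bytes() is a hypothetical helper, not part of test_get_len.c. It counts the instruction bytes that objdump printed, which a harness could then compare against the length reported by the decoder.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Count the two-digit hex byte tokens at the start of a string such as
 * "48 89 e5", stopping at the first token that is not a hex byte.
 */
size_t toy_count_hex_bytes(const char *s)
{
	size_t n = 0;

	while (isxdigit((unsigned char)s[0]) && isxdigit((unsigned char)s[1]) &&
	       (s[2] == ' ' || s[2] == '\t' || s[2] == '\0')) {
		n++;
		s += 2;
		/* Skip the spaces separating byte tokens. */
		while (*s == ' ')
			s++;
	}
	return n;
}
```

A mismatch between this count and the decoded length for any instruction in vmlinux would indicate a bug in the decoder or its opcode map.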

Changes from v10:
 - Use unsigned int instead of unsigned.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Signed-off-by: Jim Keniston jkeni...@us.ibm.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Vegard Nossum vegard.nos...@gmail.com
Cc: Avi Kivity a...@redhat.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
Cc: Sam Ravnborg s...@ravnborg.org
---

 arch/x86/Kconfig.debug  |9 
 arch/x86/Makefile   |3 +
 arch/x86/include/asm/inat.h |2 +
 arch/x86/include/asm/insn.h |2 +
 arch/x86/lib/inat.c |2 +
 arch/x86/lib/insn.c |2 +
 arch/x86/scripts/Makefile   |   19 +++
 arch/x86/scripts/distill.awk|   42 +
 arch/x86/scripts/test_get_len.c |   99 +++
 arch/x86/scripts/user_include.h |   49 +++
 10 files changed, 229 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/scripts/Makefile
 create mode 100644 arch/x86/scripts/distill.awk
 create mode 100644 arch/x86/scripts/test_get_len.c
 create mode 100644 arch/x86/scripts/user_include.h

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index d105f29..7d0b681 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -186,6 +186,15 @@ config X86_DS_SELFTEST
 config HAVE_MMIOTRACE_SUPPORT
def_bool y
 
+config X86_DECODER_SELFTEST
+	bool "x86 instruction decoder selftest"
+	depends on DEBUG_KERNEL
+	---help---
+	 Perform x86 instruction decoder selftests at build time.
+	 This option is useful for checking the sanity of x86 instruction
+	 decoder code.
+	 If unsure, say N.
+
 #
 # IO delay types:
 #
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 1b68659..7046556 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -154,6 +154,9 @@ all: bzImage
 KBUILD_IMAGE := $(boot)/bzImage
 
 bzImage: vmlinux
+ifeq ($(CONFIG_X86_DECODER_SELFTEST),y)
+   $(Q)$(MAKE) $(build)=arch/x86/scripts posttest
+endif
$(Q)$(MAKE) $(build)=$(boot) $(KBUILD_IMAGE)
$(Q)mkdir -p $(objtree)/arch/$(UTS_MACHINE)/boot
$(Q)ln -fsn ../../x86/boot/bzImage 
$(objtree)/arch/$(UTS_MACHINE)/boot/$@
diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index 01e079a..9090665 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -20,7 +20,9 @@
  * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
  *
  */
+#ifdef __KERNEL__
 #include <linux/types.h>
+#endif
 
 /* Instruction attributes */
 typedef u32 insn_attr_t;
diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
index 5b50fa3..5736404 100644
--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -20,7 +20,9 @@
  * Copyright (C) IBM Corporation, 2009
  */
 
+#ifdef __KERNEL__
 #include <linux/types.h>
+#endif
 /* insn_attr_t is defined in inat.h */
 #include <asm/inat.h>
 
diff --git a/arch/x86/lib/inat.c b/arch/x86/lib/inat.c
index d6a34be..564ecbd 100644
--- a/arch/x86/lib/inat.c
+++ b/arch/x86/lib/inat.c
@@ -18,7 +18,9 @@
  * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
  *
  */
+#ifdef __KERNEL__
 #include <linux/module.h>
+#endif
 #include <asm/insn.h>
 
 /* Attribute tables are generated from opcode map */
diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c
index 254c848..3b9451a 100644
--- a/arch/x86/lib/insn.c
+++ b/arch/x86/lib/insn.c
@@ -18,8 +18,10 @@
  * Copyright (C) IBM Corporation, 2002, 2004, 2009
  */
 
+#ifdef __KERNEL__
 #include <linux/string.h>
 #include <linux/module.h>
+#endif
 #include <asm/inat.h>
 #include <asm/insn.h>
 
diff --git a/arch/x86/scripts/Makefile b/arch/x86/scripts/Makefile
new file mode 100644
index 000..f08859e
--- /dev/null
+++ b/arch/x86/scripts/Makefile
@@ -0,0 +1,19 @@
+PHONY += posttest
+quiet_cmd_posttest = TEST    $@
+      cmd_posttest = objdump -d $(objtree)/vmlinux | awk -f $(srctree)/arch/x86/scripts/distill.awk | $(obj)/test_get_len
+
+posttest: $(obj)/test_get_len vmlinux
+	$(call cmd,posttest)
+
+test_get_len_SRC = $(srctree)/arch/x86/scripts/test_get_len.c $(srctree)/arch/x86/lib/insn.c $(srctree)/arch/x86/lib/inat.c
+test_get_len_INC = $(srctree)/arch/x86/include/asm/inat.h $(srctree)/arch/x86/include/asm/insn.h $(objtree)/arch/x86/lib/inat-tables.c
+
+quiet_cmd_test_get_len = CC  $@
+  cmd_test_get_len = $(CC) -Wall $(test_get_len_SRC) 
-I

[PATCH -tip -v11 10/11] tracing: Generate names for each kprobe event automatically

2009-07-09 Thread Masami Hiramatsu
Generate a name for each kprobe event from its probe point, and remove the
generic k*probe event types, since nothing uses them any more.
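
To make the idea concrete, here is a minimal sketch of deriving a name from
the probe point. The "<p|r>@SYMBOL+offs" format and the gen_event_name()
helper are assumptions made for illustration only; MAX_EVENT_NAME_LEN is the
only name taken from this patch.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_EVENT_NAME_LEN 64	/* same limit as in the patch */

/*
 * Hypothetical naming scheme: 'p' or 'r' for the probe type, then '@',
 * the symbol, and a signed offset. Returns what snprintf() returns, so a
 * result >= len means the name was truncated.
 */
static int gen_event_name(char *buf, size_t len, int is_return,
			  const char *symbol, long offset)
{
	assert(buf != NULL && symbol != NULL);
	return snprintf(buf, len, "%c@%s%+ld",
			is_return ? 'r' : 'p', symbol, offset);
}
```

For example, a plain probe on do_sys_open with offset 0 would be named
"p@do_sys_open+0" under this assumed scheme, so two probes on different
points never collide unless the user asks for the same point twice.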

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Tom Zanussi tzanu...@gmail.com
---

 Documentation/trace/kprobetrace.txt |3 +-
 kernel/trace/trace_event_types.h|   18 --
 kernel/trace/trace_kprobe.c |   62 +++
 3 files changed, 35 insertions(+), 48 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt 
b/Documentation/trace/kprobetrace.txt
index b29a54b..437ad49 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -28,7 +28,8 @@ Synopsis of kprobe_events
   p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS]: Set a probe
   r[:EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe
 
- EVENT : Event name.
+ EVENT : Event name. If omitted, the event name is generated
+ based on SYMBOL+offs or MEMADDR.
  SYMBOL[+offs|-offs]   : Symbol+offset where the probe is inserted.
  MEMADDR   : Address where the probe is inserted.
 
diff --git a/kernel/trace/trace_event_types.h b/kernel/trace/trace_event_types.h
index 186b598..e74f090 100644
--- a/kernel/trace/trace_event_types.h
+++ b/kernel/trace/trace_event_types.h
@@ -175,22 +175,4 @@ TRACE_EVENT_FORMAT(kmem_free, TRACE_KMEM_FREE, kmemtrace_free_entry, ignore,
 	TP_RAW_FMT("type:%u call_site:%lx ptr:%p")
 );
 
-TRACE_EVENT_FORMAT(kprobe, TRACE_KPROBE, kprobe_trace_entry, ignore,
-	TRACE_STRUCT(
-		TRACE_FIELD(unsigned long, ip, ip)
-		TRACE_FIELD(int, nargs, nargs)
-		TRACE_FIELD_ZERO(unsigned long, args)
-	),
-	TP_RAW_FMT("%08lx: args:0x%lx ...")
-);
-
-TRACE_EVENT_FORMAT(kretprobe, TRACE_KRETPROBE, kretprobe_trace_entry, ignore,
-	TRACE_STRUCT(
-		TRACE_FIELD(unsigned long, func, func)
-		TRACE_FIELD(unsigned long, ret_ip, ret_ip)
-		TRACE_FIELD(int, nargs, nargs)
-		TRACE_FIELD_ZERO(unsigned long, args)
-	),
-	TP_RAW_FMT("%08lx <- %08lx: args:0x%lx ...")
-);
 #undef TRACE_SYSTEM
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 8754c7e..9c6ffcc 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -34,6 +34,7 @@
 
 #define MAX_TRACE_ARGS 128
 #define MAX_ARGSTR_LEN 63
+#define MAX_EVENT_NAME_LEN 64
 
 /* currently, trace_kprobe only supports X86. */
 
@@ -265,11 +266,11 @@ static struct trace_probe *alloc_trace_probe(const char *symbol,
 		if (!tp->symbol)
 			goto error;
 	}
-	if (event) {
-		tp->call.name = kstrdup(event, GFP_KERNEL);
-		if (!tp->call.name)
-			goto error;
-	}
+	if (!event)
+		goto error;
+	tp->call.name = kstrdup(event, GFP_KERNEL);
+	if (!tp->call.name)
+		goto error;
 
 	INIT_LIST_HEAD(&tp->list);
 	return tp;
@@ -297,7 +298,7 @@ static struct trace_probe *find_probe_event(const char *event)
 {
 	struct trace_probe *tp;
 	list_for_each_entry(tp, &probe_list, list)
-		if (tp->call.name && !strcmp(tp->call.name, event))
+		if (!strcmp(tp->call.name, event))
 			return tp;
 	return NULL;
 }
@@ -313,8 +314,7 @@ static void __unregister_trace_probe(struct trace_probe *tp)
 /* Unregister a trace_probe and probe_event: call with locking probe_lock */
 static void unregister_trace_probe(struct trace_probe *tp)
 {
-	if (tp->call.name)
-		unregister_probe_event(tp);
+	unregister_probe_event(tp);
 	__unregister_trace_probe(tp);
 	list_del(&tp->list);
 }
@@ -343,18 +343,16 @@ static int register_trace_probe(struct trace_probe *tp)
goto end;
}
/* register as an event */
-	if (tp->call.name) {
-		old_tp = find_probe_event(tp->call.name);
-		if (old_tp) {
-			/* delete old event */
-			unregister_trace_probe(old_tp);
-			free_trace_probe(old_tp);
-		}
-		ret = register_probe_event(tp);
-		if (ret) {
-			pr_warning("Faild to register probe event(%d)\n", ret);
-			__unregister_trace_probe(tp);
-		}
+	old_tp = find_probe_event(tp->call.name);
+	if (old_tp) {
+		/* delete old event */
+		unregister_trace_probe(old_tp);
+		free_trace_probe(old_tp);
+	}
+	ret = register_probe_event(tp);
+	if (ret) {
+   pr_warning(Faild

[PATCH -tip -v11 08/11] tracing: add kprobe-based event tracer

2009-07-09 Thread Masami Hiramatsu
Add kprobes-based event tracer on ftrace.

This tracer is similar to the events tracer, which is based on the Tracepoint
infrastructure. Instead of Tracepoint, this tracer is based on kprobes
(kprobe and kretprobe). It can probe anywhere kprobes can probe (that is,
the body of any function except __kprobes functions).

Like the events tracer, this tracer does not need to be activated via
current_tracer. Instead, just set probe points via
/sys/kernel/debug/tracing/kprobe_events. You can also set a filter on each
probe event via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter.

This tracer supports the following fetch arguments for each probe.

  %REG              : Fetch register REG
  sN                : Fetch Nth entry of stack (N >= 0)
  @ADDR             : Fetch memory at ADDR (ADDR should be in kernel)
  @SYM[+|-offs]     : Fetch memory at SYM +|- offs (SYM should be a data symbol)
  aN                : Fetch function argument. (N >= 0)
  rv                : Fetch return value.
  ra                : Fetch return address.
  +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.

See Documentation/trace/kprobetrace.txt for details.
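
For readers who want to see how the argument syntax above could be told
apart, here is a minimal user-space sketch. It is not the parser in
trace_kprobe.c; the enum values and classify_fetch_arg() are hypothetical,
and the rule "a digit after '@' means an address, anything else a symbol"
is a simplifying assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical classification of the fetch-argument forms listed above. */
enum fetch_kind {
	FETCH_INVALID, FETCH_REG, FETCH_STACK, FETCH_MEM, FETCH_SYMBOL,
	FETCH_ARG, FETCH_RETVAL, FETCH_RETADDR, FETCH_DEREF
};

/* True when s is a non-empty run of decimal digits. */
static int all_digits(const char *s)
{
	if (*s == '\0')
		return 0;
	for (; *s; s++)
		if (!isdigit((unsigned char)*s))
			return 0;
	return 1;
}

static enum fetch_kind classify_fetch_arg(const char *arg)
{
	assert(arg != NULL);
	if (!strcmp(arg, "rv"))
		return FETCH_RETVAL;
	if (!strcmp(arg, "ra"))
		return FETCH_RETADDR;

	switch (arg[0]) {
	case '%':	/* %REG */
		return arg[1] ? FETCH_REG : FETCH_INVALID;
	case 's':	/* sN */
		return all_digits(arg + 1) ? FETCH_STACK : FETCH_INVALID;
	case 'a':	/* aN */
		return all_digits(arg + 1) ? FETCH_ARG : FETCH_INVALID;
	case '@':	/* @ADDR or @SYM[+|-offs] */
		if (arg[1] == '\0')
			return FETCH_INVALID;
		return isdigit((unsigned char)arg[1]) ? FETCH_MEM : FETCH_SYMBOL;
	case '+':
	case '-':	/* +|-offs(FETCHARG): needs a parenthesised inner arg */
		return (strchr(arg, '(') && arg[strlen(arg) - 1] == ')')
			? FETCH_DEREF : FETCH_INVALID;
	default:
		return FETCH_INVALID;
	}
}
```

A real parser would go on to recurse into the inner FETCHARG of the
+|-offs(...) form; the sketch only identifies which form was written.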

Changes from v10:
 - Use unsigned int instead of unsigned.
 - Make kprobe_trace_entry and kretprobe_trace_entry variable array.
 - Use TRACE_FIELD_ZERO()
 - Rename the document to kprobetrace.txt.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Acked-by: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Tom Zanussi tzanu...@gmail.com
---

 Documentation/trace/kprobetrace.txt |  138 
 kernel/trace/Kconfig|   12 
 kernel/trace/Makefile   |1 
 kernel/trace/trace.h|   29 +
 kernel/trace/trace_event_types.h|   18 +
 kernel/trace/trace_kprobe.c | 1183 +++
 6 files changed, 1381 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/trace/kprobetrace.txt
 create mode 100644 kernel/trace/trace_kprobe.c

diff --git a/Documentation/trace/kprobetrace.txt 
b/Documentation/trace/kprobetrace.txt
new file mode 100644
index 000..9ad907c
--- /dev/null
+++ b/Documentation/trace/kprobetrace.txt
@@ -0,0 +1,138 @@
+ Kprobe-based Event Tracer
+ =========================
+
+ Documentation is written by Masami Hiramatsu
+
+
+Overview
+--------
+This tracer is similar to the events tracer, which is based on the Tracepoint
+infrastructure. Instead of Tracepoint, this tracer is based on kprobes (kprobe
+and kretprobe). It can probe anywhere kprobes can probe (that is, the body of
+any function except __kprobes functions).
+
+Unlike the function tracer, this tracer can probe instructions inside
+kernel functions. It allows you to check which instruction has been executed.
+
+Unlike the Tracepoint-based events tracer, this tracer can add and remove
+probe points on the fly.
+
+Like the events tracer, this tracer does not need to be activated via
+current_tracer. Instead, just set probe points via
+/sys/kernel/debug/tracing/kprobe_events. You can also set a filter on each
+probe event via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter.
+
+
+Synopsis of kprobe_events
+-------------------------
+  p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS]: Set a probe
+  r[:EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe
+
+ EVENT : Event name.
+ SYMBOL[+offs|-offs]   : Symbol+offset where the probe is inserted.
+ MEMADDR   : Address where the probe is inserted.
+
+ FETCHARGS         : Arguments.
+  %REG             : Fetch register REG
+  sN               : Fetch Nth entry of stack (N >= 0)
+  @ADDR            : Fetch memory at ADDR (ADDR should be in kernel)
+  @SYM[+|-offs]    : Fetch memory at SYM +|- offs (SYM should be a data symbol)
+  aN               : Fetch function argument. (N >= 0)(*)
+  rv               : Fetch return value.(**)
+  ra               : Fetch return address.(**)
+  +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(***)
+
+  (*) aN may not be correct on asmlinkage functions or in the middle of a
+      function body.
+  (**) only for return probes.
+  (***) this is useful for fetching a field of a data structure.
+
+
+Per-Probe Event Filtering
+-------------------------
+ The per-probe event filtering feature allows you to set a different filter on
+each probe and to choose which arguments are shown in the trace buffer. If an
+event name is specified right after 'p:' or 'r:' in kprobe_events, the tracer
+adds an event under tracing/events/kprobes/EVENT. In that directory you can see
+'id', 'enabled', 'format' and 'filter'.
+
+enabled:
+  You can enable/disable the probe by writing 1 or 0 to it.
+
+format:
+  It shows the format of this probe event. It also shows aliases of the
+ arguments which you specified in kprobe_events.
+
+filter:
+  You can write filtering rules of this event. And you can use both

[PATCH -tip -v11 11/11] tracing: Add kprobes event profiling interface

2009-07-09 Thread Masami Hiramatsu
Add profiling interfaces for each kprobes event.
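
As a small illustration of what one line of kprobe_profile looks like, the
sketch below formats an event name, hit count and miss count with the same
"%s" plus "\t%8lu %8lu\n" layout that profile_seq_show() uses in this patch.
The struct probe_stats type and format_profile_line() are hypothetical
user-space stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the counters kept per probe. */
struct probe_stats {
	const char *name;	/* event name */
	unsigned long nhits;	/* times the probe handler ran */
	unsigned long nmissed;	/* times kprobes reported a miss */
};

/*
 * Format one profile line: name, a tab, then two right-aligned columns of
 * width 8. Returns what snprintf() returns.
 */
static int format_profile_line(char *buf, size_t len,
			       const struct probe_stats *ps)
{
	assert(buf != NULL && ps != NULL);
	return snprintf(buf, len, "%s\t%8lu %8lu\n",
			ps->name, ps->nhits, ps->nmissed);
}
```

The fixed column width keeps the counters aligned when several events are
listed, as long as the counts stay below eight digits.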

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Tom Zanussi tzanu...@gmail.com
---

 Documentation/trace/kprobetrace.txt |8 ++
 kernel/trace/trace_kprobe.c |   48 +++
 2 files changed, 56 insertions(+), 0 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt 
b/Documentation/trace/kprobetrace.txt
index 437ad49..d386d96 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -69,6 +69,14 @@ filter:
  names and field names for describing filters.
 
 
+Event Profiling
+---------------
+ You can check the total number of probe hits and probe miss-hits via
+/sys/kernel/debug/tracing/kprobe_profile.
+ The first column is the event name, the second is the number of probe hits,
+and the third is the number of probe miss-hits.
+
+
 Usage examples
 --
 To add a probe as a new event, write a new definition to kprobe_events
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 9c6ffcc..cbff9d5 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -174,6 +174,7 @@ struct trace_probe {
 		struct kprobe		kp;
 		struct kretprobe	rp;
 	};
+	unsigned long		nhits;
 	const char		*symbol;	/* symbol name */
 	struct ftrace_event_call	call;
 	unsigned int		nr_args;
@@ -762,6 +763,42 @@ static const struct file_operations kprobe_events_ops = {
.write  = probes_write,
 };
 
+/* Probes profiling interfaces */
+static int profile_seq_show(struct seq_file *m, void *v)
+{
+   struct trace_probe *tp = v;
+
+   if (tp == NULL)
+   return 0;
+
+	seq_printf(m, "%s", tp->call.name);
+
+	seq_printf(m, "\t%8lu %8lu\n", tp->nhits,
+		   probe_is_return(tp) ? tp->rp.kp.nmissed : tp->kp.nmissed);
+
+   return 0;
+}
+
+static const struct seq_operations profile_seq_op = {
+   .start  = probes_seq_start,
+   .next   = probes_seq_next,
+   .stop   = probes_seq_stop,
+   .show   = profile_seq_show
+};
+
+static int profile_open(struct inode *inode, struct file *file)
+{
+	return seq_open(file, &profile_seq_op);
+}
+
+static const struct file_operations kprobe_profile_ops = {
+   .owner  = THIS_MODULE,
+   .open   = profile_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= seq_release,
+};
+
 /* Kprobe handler */
 static __kprobes int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs)
 {
@@ -772,6 +809,8 @@ static __kprobes int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs)
 	unsigned long irq_flags;
 	struct ftrace_event_call *call = &tp->call;
 
+	tp->nhits++;
+
local_save_flags(irq_flags);
pc = preempt_count();
 
@@ -1145,9 +1184,18 @@ static __init int init_kprobe_trace(void)
 	entry = debugfs_create_file("kprobe_events", 0644, d_tracer,
 				    NULL, &kprobe_events_ops);
 
+	/* Event list interface */
 	if (!entry)
 		pr_warning("Could not create debugfs "
 			   "'kprobe_events' entry\n");
+
+	/* Profile interface */
+	entry = debugfs_create_file("kprobe_profile", 0444, d_tracer,
+				    NULL, &kprobe_profile_ops);
+
+	if (!entry)
+		pr_warning("Could not create debugfs "
+			   "'kprobe_profile' entry\n");
 	return 0;
 }
 fs_initcall(init_kprobe_trace);


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com


[PATCH -tip -v11 00/11] tracing: kprobe-based event tracer and x86 instruction decoder

2009-07-09 Thread Masami Hiramatsu
/tracing/kprobe_events

 This sets a kretprobe on the return point of the do_sys_open() function and
records the return value and return address as the myretprobe event.
 You can see the format of these events via
/sys/kernel/debug/tracing/events/kprobes/EVENT/format.

  cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format
name: myprobe
ID: 23
format:
field:unsigned short common_type;   offset:0;   size:2;
field:unsigned char common_flags;   offset:2;   size:1;
field:unsigned char common_preempt_count;   offset:3;   size:1;
field:int common_pid;   offset:4;   size:4;
field:int common_tgid;  offset:8;   size:4;

field: unsigned long ip;offset:16;tsize:8;
field: int nargs;   offset:24;tsize:4;
field: unsigned long arg0;  offset:32;tsize:8;
field: unsigned long arg1;  offset:40;tsize:8;
field: unsigned long arg2;  offset:48;tsize:8;
field: unsigned long arg3;  offset:56;tsize:8;

alias: a0;  original: arg0;
alias: a1;  original: arg1;
alias: a2;  original: arg2;
alias: a3;  original: arg3;

print fmt: %lx: 0x%lx 0x%lx 0x%lx 0x%lx, ip, arg0, arg1, arg2, arg3


 You can see that the event has 4 arguments and alias expressions
corresponding to it.

  echo > /sys/kernel/debug/tracing/kprobe_events

 This clears all probe points. You can see the traced information via
/sys/kernel/debug/tracing/trace.

  cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
           <...>-1447  [001] 1038282.286875: do_sys_open+0x0/0xd6: 0x3 0x7fffd1ec4440 0x8000 0x0
           <...>-1447  [001] 1038282.286878: sys_openat+0xc/0xe <- do_sys_open: 0xfffe 0x81367a3a
           <...>-1447  [001] 1038282.286885: do_sys_open+0x0/0xd6: 0xff9c 0x40413c 0x8000 0x1b6
           <...>-1447  [001] 1038282.286915: sys_open+0x1b/0x1d <- do_sys_open: 0x3 0x81367a3a
           <...>-1447  [001] 1038282.286969: do_sys_open+0x0/0xd6: 0xff9c 0x4041c6 0x98800 0x10
           <...>-1447  [001] 1038282.286976: sys_open+0x1b/0x1d <- do_sys_open: 0x3 0x81367a3a


 Each line shows when the kernel hits a probe, and "<- SYMBOL" means the kernel
returns from SYMBOL (e.g. "sys_open+0x1b/0x1d <- do_sys_open" means the kernel
returns from do_sys_open to sys_open+0x1b).


Thank you,

---

Masami Hiramatsu (11):
  tracing: Add kprobes event profiling interface
  tracing: Generate names for each kprobe event automatically
  tracing: Kprobe-tracer supports more than 6 arguments
  tracing: add kprobe-based event tracer
  tracing: Introduce TRACE_FIELD_ZERO() macro
  tracing: ftrace dynamic ftrace_event_call support
  x86: add pt_regs register and stack access APIs
  kprobes: cleanup fix_riprel() using insn decoder on x86
      kprobes: checks probe address is instruction boundary on x86
  x86: x86 instruction decoder build-time selftest
  x86: instruction decoder API


 Documentation/trace/kprobetrace.txt|  147 
 arch/x86/Kconfig.debug |9 
 arch/x86/Makefile  |3 
 arch/x86/include/asm/inat.h|  127 +++
 arch/x86/include/asm/insn.h|  136 
 arch/x86/include/asm/ptrace.h  |   62 ++
 arch/x86/kernel/kprobes.c  |  197 ++---
 arch/x86/kernel/ptrace.c   |  112 +++
 arch/x86/lib/Makefile  |   13 
 arch/x86/lib/inat.c|   82 ++
 arch/x86/lib/insn.c|  473 
 arch/x86/lib/x86-opcode-map.txt|  711 ++
 arch/x86/scripts/Makefile  |   19 
 arch/x86/scripts/distill.awk   |   42 +
 arch/x86/scripts/gen-insn-attr-x86.awk |  314 
 arch/x86/scripts/test_get_len.c|   99 +++
 arch/x86/scripts/user_include.h|   49 +
 include/linux/ftrace_event.h   |   13 
 include/trace/ftrace.h |   22 -
 kernel/trace/Kconfig   |   12 
 kernel/trace/Makefile  |1 
 kernel/trace/trace.h   |   29 +
 kernel/trace/trace_event_types.h   |4 
 kernel/trace/trace_events.c|   70 +-
 kernel/trace/trace_export.c|   43 +
 kernel/trace/trace_kprobe.c| 1240 
 26 files changed, 3867 insertions(+), 162 deletions(-)
 create mode 100644 Documentation/trace/kprobetrace.txt
 create mode 100644 arch/x86/include/asm/inat.h
 create mode 100644 arch/x86/include/asm/insn.h
 create mode 100644 arch/x86/lib/inat.c
 create mode 100644 arch/x86/lib/insn.c
 create mode 100644 arch/x86/lib/x86-opcode-map.txt
 create mode 100644 arch/x86/scripts/Makefile
 create mode 100644 arch/x86/scripts/distill.awk
 create mode 100644 arch/x86/scripts/gen-insn-attr-x86.awk
 create mode 100644 arch/x86/scripts

[PATCH -tip -v11 09/11] tracing: Kprobe-tracer supports more than 6 arguments

2009-07-09 Thread Masami Hiramatsu
Support up to 128 arguments for each kprobes event.
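
The patch gets there by turning the fixed args[] array into a flexible array
member and sizing the allocation with offsetof(). A minimal user-space sketch
of the same technique follows; struct probe, SIZEOF_PROBE(), alloc_probe()
and probe_arg() are simplified, hypothetical stand-ins for struct
trace_probe, SIZEOF_TRACE_PROBE() and alloc_trace_probe(), and calloc()
stands in for kzalloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_TRACE_ARGS 128	/* same limit as in the patch */

struct fetch_func {
	unsigned long (*func)(void *regs, void *data);
	void *data;
};

/* Simplified stand-in for struct trace_probe. */
struct probe {
	unsigned int nr_args;
	struct fetch_func args[];	/* flexible array member */
};

/* Bytes needed for a probe carrying n fetch functions. */
#define SIZEOF_PROBE(n) \
	(offsetof(struct probe, args) + sizeof(struct fetch_func) * (n))

/* Allocate a zeroed probe with room for exactly nargs entries. */
static struct probe *alloc_probe(unsigned int nargs)
{
	struct probe *p;

	if (nargs > MAX_TRACE_ARGS)
		return NULL;
	p = calloc(1, SIZEOF_PROBE(nargs));
	if (p)
		p->nr_args = nargs;
	return p;
}

/* Bounds-checked access to one argument slot. */
static struct fetch_func *probe_arg(struct probe *p, unsigned int i)
{
	assert(p != NULL && i < p->nr_args);
	return &p->args[i];
}
```

A probe with two arguments then costs only SIZEOF_PROBE(2) bytes instead of
reserving space for the maximum, which is what makes raising the limit from
6 to 128 cheap.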

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Tom Zanussi tzanu...@gmail.com
---

 Documentation/trace/kprobetrace.txt |2 +-
 kernel/trace/trace_kprobe.c |   21 +
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt 
b/Documentation/trace/kprobetrace.txt
index 9ad907c..b29a54b 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -32,7 +32,7 @@ Synopsis of kprobe_events
  SYMBOL[+offs|-offs]   : Symbol+offset where the probe is inserted.
  MEMADDR   : Address where the probe is inserted.
 
- FETCHARGS : Arguments.
+ FETCHARGS : Arguments. Each probe can have up to 128 args.
   %REG : Fetch register REG
   sN   : Fetch Nth entry of stack (N >= 0)
   @ADDR: Fetch memory at ADDR (ADDR should be in kernel)
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 57bf521..8754c7e 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -32,7 +32,7 @@
 #include "trace.h"
 #include "trace_output.h"
 
-#define TRACE_KPROBE_ARGS 6
+#define MAX_TRACE_ARGS 128
 #define MAX_ARGSTR_LEN 63
 
 /* currently, trace_kprobe only supports X86. */
@@ -174,11 +174,15 @@ struct trace_probe {
 		struct kretprobe	rp;
 	};
 	const char		*symbol;	/* symbol name */
-	unsigned int		nr_args;
-	struct fetch_func	args[TRACE_KPROBE_ARGS];
 	struct ftrace_event_call	call;
+	unsigned int		nr_args;
+	struct fetch_func	args[];
 };
 
+#define SIZEOF_TRACE_PROBE(n)  \
+   (offsetof(struct trace_probe, args) +   \
+   (sizeof(struct fetch_func) * (n)))
+
 static int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs);
 static int kretprobe_trace_func(struct kretprobe_instance *ri,
struct pt_regs *regs);
@@ -248,11 +252,11 @@ static DEFINE_MUTEX(probe_lock);
 static LIST_HEAD(probe_list);
 
 static struct trace_probe *alloc_trace_probe(const char *symbol,
-const char *event)
+const char *event, int nargs)
 {
struct trace_probe *tp;
 
-   tp = kzalloc(sizeof(struct trace_probe), GFP_KERNEL);
+   tp = kzalloc(SIZEOF_TRACE_PROBE(nargs), GFP_KERNEL);
if (!tp)
return ERR_PTR(-ENOMEM);
 
@@ -550,9 +554,10 @@ static int create_trace_probe(int argc, char **argv)
 		if (offset && is_return)
 			return -EINVAL;
 	}
+	argc -= 2; argv += 2;
 
 	/* setup a probe */
-	tp = alloc_trace_probe(symbol, event);
+	tp = alloc_trace_probe(symbol, event, argc);
 	if (IS_ERR(tp))
 		return PTR_ERR(tp);
 
@@ -571,8 +576,8 @@ static int create_trace_probe(int argc, char **argv)
 		kp->addr = addr;
 
 	/* parse arguments */
-	argc -= 2; argv += 2; ret = 0;
-	for (i = 0; i < argc && i < TRACE_KPROBE_ARGS; i++) {
+	ret = 0;
+	for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
 		if (strlen(argv[i]) > MAX_ARGSTR_LEN) {
 			pr_info("Argument%d(%s) is too long.\n", i, argv[i]);
 			ret = -ENOSPC;


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com


Re: [PATCH -tip -v10 7/7] tracing: add kprobe-based event tracer

2009-07-07 Thread Masami Hiramatsu
Frederic Weisbecker wrote:
 diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
 index 206cb7d..65945eb 100644
 --- a/kernel/trace/trace.h
 +++ b/kernel/trace/trace.h
 @@ -45,6 +45,8 @@ enum trace_type {
  TRACE_POWER,
  TRACE_BLK,
  TRACE_KSYM,
 +TRACE_KPROBE,
 +TRACE_KRETPROBE,
  
  __TRACE_LAST_TYPE,
  };
 @@ -227,6 +229,22 @@ struct trace_ksym {
  charksym_name[KSYM_NAME_LEN];
  charp_name[TASK_COMM_LEN];
  };
 +#define TRACE_KPROBE_ARGS 6
 +
 +struct kprobe_trace_entry {
 +struct trace_entry  ent;
 +unsigned long   ip;
 +int nargs;
 +unsigned long   args[TRACE_KPROBE_ARGS];
 
 
 
 I see that you actually make use of arg as a dynamic sizeable
 array.
 For clarity, args[TRACE_KPROBE_ARGS] could be args[0].
 
 It's just a neat and wouldn't affect the code nor the data
 but would be clearer for readers of that code.

Hmm. In that case, I think we'll need a new macro for field
definition, like TRACE_FIELD_ZERO(type, item).

 +};
 +
 +struct kretprobe_trace_entry {
 +struct trace_entry  ent;
 +unsigned long   func;
 +unsigned long   ret_ip;
 +int nargs;
 +unsigned long   args[TRACE_KPROBE_ARGS];
 +};
 
 
 ditto
 
 
   
  /*
   * trace_flag_type is an enumeration that holds different
 @@ -344,6 +362,10 @@ extern void __ftrace_bad_type(void);
  IF_ASSIGN(var, ent, struct syscall_trace_exit,  \
TRACE_SYSCALL_EXIT);  \
  IF_ASSIGN(var, ent, struct trace_ksym, TRACE_KSYM); \
 +IF_ASSIGN(var, ent, struct kprobe_trace_entry,  \
 +  TRACE_KPROBE);\
 +IF_ASSIGN(var, ent, struct kretprobe_trace_entry,   \
 +  TRACE_KRETPROBE); \
  __ftrace_bad_type();\
  } while (0)
  
 diff --git a/kernel/trace/trace_event_types.h 
 b/kernel/trace/trace_event_types.h
 index 6db005e..ec2e6f3 100644
 --- a/kernel/trace/trace_event_types.h
 +++ b/kernel/trace/trace_event_types.h
 @@ -175,4 +175,24 @@ TRACE_EVENT_FORMAT(kmem_free, TRACE_KMEM_FREE, 
 kmemtrace_free_entry, ignore,
  TP_RAW_FMT(type:%u call_site:%lx ptr:%p)
  );
  
 +TRACE_EVENT_FORMAT(kprobe, TRACE_KPROBE, kprobe_trace_entry, ignore,
 +TRACE_STRUCT(
 +TRACE_FIELD(unsigned long, ip, ip)
 +TRACE_FIELD(int, nargs, nargs)
 +TRACE_FIELD_SPECIAL(unsigned long args[TRACE_KPROBE_ARGS],
 +args, TRACE_KPROBE_ARGS, args)
 +),
 +TP_RAW_FMT(%08lx: args:0x%lx ...)
 +);
 +
 +TRACE_EVENT_FORMAT(kretprobe, TRACE_KRETPROBE, kretprobe_trace_entry, 
 ignore,
 +TRACE_STRUCT(
 +TRACE_FIELD(unsigned long, func, func)
 +TRACE_FIELD(unsigned long, ret_ip, ret_ip)
 +TRACE_FIELD(int, nargs, nargs)
 +TRACE_FIELD_SPECIAL(unsigned long args[TRACE_KPROBE_ARGS],
 +args, TRACE_KPROBE_ARGS, args)
 +),
 +TP_RAW_FMT(%08lx - %08lx: args:0x%lx ...)
 +);
  #undef TRACE_SYSTEM
 diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
 new file mode 100644
 index 000..0951512
 --- /dev/null
 +++ b/kernel/trace/trace_kprobe.c
 @@ -0,0 +1,1183 @@
 +/*
 + * kprobe based kernel tracer
 + *
 + * Created by Masami Hiramatsu mhira...@redhat.com
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License version 2 as
 + * published by the Free Software Foundation.
 + *
 + * This program is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 + * GNU General Public License for more details.
 + *
 + * You should have received a copy of the GNU General Public License
 + * along with this program; if not, write to the Free Software
 + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 + */
 +
 +#include linux/module.h
 +#include linux/uaccess.h
 +#include linux/kprobes.h
 +#include linux/seq_file.h
 +#include linux/slab.h
 +#include linux/smp.h
 +#include linux/debugfs.h
 +#include linux/types.h
 +#include linux/string.h
 +#include linux/ctype.h
 +#include linux/ptrace.h
 +
 +#include trace.h
 +#include trace_output.h
 +
 +#define MAX_ARGSTR_LEN 63
 +
 +/* currently, trace_kprobe only supports X86. */
 +
 +struct fetch_func {
 +unsigned long (*func)(struct pt_regs *, void *);
 +void *data;
 +};
 +
 +static __kprobes unsigned long call_fetch(struct fetch_func *f,
 +  struct pt_regs *regs)
 +{
 +return f-func(regs, f-data);
 +}
 +
 +/* fetch handlers */
 +static __kprobes unsigned long

Re: [PATCH -tip -v10 7/7] tracing: add kprobe-based event tracer

2009-07-07 Thread Masami Hiramatsu
Frederic Weisbecker wrote:
 On Tue, Jul 07, 2009 at 04:42:32PM -0400, Masami Hiramatsu wrote:
 Frederic Weisbecker wrote:
 On Tue, Jul 07, 2009 at 03:55:28PM -0400, Masami Hiramatsu wrote:
 Frederic Weisbecker wrote:
 diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
 index 206cb7d..65945eb 100644
 --- a/kernel/trace/trace.h
 +++ b/kernel/trace/trace.h
 @@ -45,6 +45,8 @@ enum trace_type {
  TRACE_POWER,
  TRACE_BLK,
  TRACE_KSYM,
 +TRACE_KPROBE,
 +TRACE_KRETPROBE,
  
  __TRACE_LAST_TYPE,
  };
 @@ -227,6 +229,22 @@ struct trace_ksym {
  charksym_name[KSYM_NAME_LEN];
  charp_name[TASK_COMM_LEN];
  };
 +#define TRACE_KPROBE_ARGS 6
 +
 +struct kprobe_trace_entry {
 +struct trace_entry  ent;
 +unsigned long   ip;
 +int nargs;
 +unsigned long   args[TRACE_KPROBE_ARGS];

 I see that you actually make use of arg as a dynamic sizeable
 array.
 For clarity, args[TRACE_KPROBE_ARGS] could be args[0].

 It's just a neat and wouldn't affect the code nor the data
 but would be clearer for readers of that code.
 Hmm. In that case, I think we'll need a new macro for field
 definition, like TRACE_FIELD_ZERO(type, item).


 You mean that for trace_define_field() to describe fields of events?
 Actually the fields should be defined dynamically depending on how
 is built the kprobe event (which arguments are requested, how many,
 etc..).
 Yeah, if you specified a probe point with its event name, the tracer
 will make a corresponding event dynamically. There are also anonymous
 probes which don't have corresponding events. For those anonymous
 probes, I need to define two generic event types(kprobe and kretprobe).

 Thank you,
 
 
 Ok. Btw, why do you need to define those two anonymous events?
 Actually your event types are always dynamically created.
 Those you defined through TRACE_FORMAT_EVENT are only ghost events,
 they only stand there as a abstract pattern, right?
 

Not always created.

The command below creates an event named event1:
p probe_point:event1 a1 a2 a3 ... > /debug/tracing/kprobe_events

But the next command does not create one:
p probe_point a1 a2 a3 ... > /debug/tracing/kprobe_events

This just inserts a kprobe at probe_point. The advantage of this
simple command is that you are never bothered with inventing a different
name for each new event :-)

Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: [PATCH -tip -v10 7/7] tracing: add kprobe-based event tracer

2009-07-07 Thread Masami Hiramatsu
Frederic Weisbecker wrote:
 On Tue, Jul 07, 2009 at 05:31:25PM -0400, Masami Hiramatsu wrote:
 Frederic Weisbecker wrote:
 On Tue, Jul 07, 2009 at 04:42:32PM -0400, Masami Hiramatsu wrote:
 Frederic Weisbecker wrote:
 On Tue, Jul 07, 2009 at 03:55:28PM -0400, Masami Hiramatsu wrote:
 Frederic Weisbecker wrote:
 diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
 index 206cb7d..65945eb 100644
 --- a/kernel/trace/trace.h
 +++ b/kernel/trace/trace.h
 @@ -45,6 +45,8 @@ enum trace_type {
TRACE_POWER,
TRACE_BLK,
TRACE_KSYM,
 +  TRACE_KPROBE,
 +  TRACE_KRETPROBE,
  
__TRACE_LAST_TYPE,
  };
 @@ -227,6 +229,22 @@ struct trace_ksym {
char ksym_name[KSYM_NAME_LEN];
char p_name[TASK_COMM_LEN];
  };
 +#define TRACE_KPROBE_ARGS 6
 +
 +struct kprobe_trace_entry {
 +  struct trace_entry  ent;
 +  unsigned long   ip;
 +  int nargs;
 +  unsigned long   args[TRACE_KPROBE_ARGS];
 I see that you actually make use of arg as a dynamic sizeable
 array.
 For clarity, args[TRACE_KPROBE_ARGS] could be args[0].

 It's just a neat and wouldn't affect the code nor the data
 but would be clearer for readers of that code.
 Hmm. In that case, I think we'll need a new macro for field
 definition, like TRACE_FIELD_ZERO(type, item).

 You mean that for trace_define_field() to describe fields of events?
 Actually the fields should be defined dynamically depending on how
 is built the kprobe event (which arguments are requested, how many,
 etc..).
 Yeah, if you specified a probe point with its event name, the tracer
 will make a corresponding event dynamically. There are also anonymous
 probes which don't have corresponding events. For those anonymous
 probes, I need to define two generic event types(kprobe and kretprobe).

 Thank you,

 Ok. Btw, why do you need to define those two anonymous events?
 Actually your event types are always dynamically created.
 Those you defined through TRACE_FORMAT_EVENT are only ghost events,
 they only stand there as a abstract pattern, right?

 Not always created.

 Below command will create an event event1;
 p probe_point:event1 a1 a2 a3 ... > /debug/tracing/kprobe_events

 But next command doesn't create.
 p probe_point a1 a2 a3 ... > /debug/tracing/kprobe_events
 
 
 Aah, ok.
 
 
 This just inserts a kprobe to probe_point. the advantage of this
 simple command is that you never be annoyed by making different
 name for new events :-)
 
 
 Indeed.
 But speaking about that, may be you could dynamically create a name
 following this simple model: func+offset
 Unless we can set several kprobes on the exact same address?

Actually, we can...
I thought that someone might want to insert several events at the same
address to retrieve more than 6 arguments.

Thanks,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: [PATCH -tip -v10 7/7] tracing: add kprobe-based event tracer

2009-07-07 Thread Masami Hiramatsu
Masami Hiramatsu wrote:
 Frederic Weisbecker wrote:
 On Tue, Jul 07, 2009 at 05:31:25PM -0400, Masami Hiramatsu wrote:
 Frederic Weisbecker wrote:
 On Tue, Jul 07, 2009 at 04:42:32PM -0400, Masami Hiramatsu wrote:
 Frederic Weisbecker wrote:
 On Tue, Jul 07, 2009 at 03:55:28PM -0400, Masami Hiramatsu wrote:
 Frederic Weisbecker wrote:
 diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
 index 206cb7d..65945eb 100644
 --- a/kernel/trace/trace.h
 +++ b/kernel/trace/trace.h
 @@ -45,6 +45,8 @@ enum trace_type {
   TRACE_POWER,
   TRACE_BLK,
   TRACE_KSYM,
 + TRACE_KPROBE,
 + TRACE_KRETPROBE,
  
   __TRACE_LAST_TYPE,
  };
 @@ -227,6 +229,22 @@ struct trace_ksym {
   char ksym_name[KSYM_NAME_LEN];
   char p_name[TASK_COMM_LEN];
  };
 +#define TRACE_KPROBE_ARGS 6
 +
 +struct kprobe_trace_entry {
 + struct trace_entry  ent;
 + unsigned long   ip;
 + int nargs;
 + unsigned long   args[TRACE_KPROBE_ARGS];
 I see that you actually make use of arg as a dynamic sizeable
 array.
 For clarity, args[TRACE_KPROBE_ARGS] could be args[0].

 It's just a neat and wouldn't affect the code nor the data
 but would be clearer for readers of that code.
 Hmm. In that case, I think we'll need a new macro for field
 definition, like TRACE_FIELD_ZERO(type, item).
 You mean that for trace_define_field() to describe fields of events?
 Actually the fields should be defined dynamically depending on how
 is built the kprobe event (which arguments are requested, how many,
 etc..).
 Yeah, if you specified a probe point with its event name, the tracer
 will make a corresponding event dynamically. There are also anonymous
 probes which don't have corresponding events. For those anonymous
 probes, I need to define two generic event types(kprobe and kretprobe).

 Thank you,
 Ok. Btw, why do you need to define those two anonymous events?
 Actually your event types are always dynamically created.
 Those you defined through TRACE_FORMAT_EVENT are only ghost events,
 they only stand there as a abstract pattern, right?

 Not always created.

 Below command will create an event event1;
 p probe_point:event1 a1 a2 a3 ... > /debug/tracing/kprobe_events

 But next command doesn't create.
 p probe_point a1 a2 a3 ... > /debug/tracing/kprobe_events

 Aah, ok.


 This just inserts a kprobe to probe_point. the advantage of this
 simple command is that you never be annoyed by making different
 name for new events :-)

 Indeed.
 But speaking about that, may be you could dynamically create a name
 following this simple model: func+offset

hmm, and we have two probe types, p(robe) and r(et probe).
so, event name should be t...@func+offset or t...@address.

 Unless we can set several kprobes on the exact same address?
 
 Actually, we can...
 I thought that someone might want to insert events in the same
 address for retrieving more than 6 arguments.

Anyway, I can improve the interface based on user feedback.
If you have a good idea, I'm happy to hear it :-)

Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



Re: [PATCH -tip -v10 5/7] x86: add pt_regs register and stack access APIs

2009-07-06 Thread Masami Hiramatsu
Andi Kleen wrote:
 Masami Hiramatsu mhira...@redhat.com writes:
 
 Add following APIs for accessing registers and stack entries from pt_regs.
 
 You forgot to state who calls these functions/why are they added?
 Who only has strings for registers?

Oh, yes. This patch is needed for the kprobes-based event tracer on ftrace.
Some other debugging tools might be able to use it, too.

 I can see the point of having a function for nth argument though,
 that's useful.
 
 +static inline unsigned long regs_get_argument_nth(struct pt_regs *regs,
 +						  unsigned n)
 +{
 +	if (n < NR_REGPARMS) {
 +		switch (n) {
 +		case 0:
 +			return regs->ax;
 +		case 1:
 +			return regs->dx;
 +		case 2:
 +			return regs->cx;
 
 
 []
 
 That could be done shorter with a offsetof table.
 
 +	if (n < NR_REGPARMS) {
 +		switch (n) {
 +		case 0:
 +			return regs->di;
 +		case 1:
 +			return regs->si;
 +		case 2:
 +			return regs->dx;
 +		case 3:
 +			return regs->cx;
 +		case 4:
 +			return regs->r8;
 +		case 5:
 +			return regs->r9;
 
 and that too.

I'm not sure I follow your idea.
Do you mean the code below?

int offs_table[NR_REGPARMS] = {
	[0] = offsetof(struct pt_regs, di),
	...
};
if (n < NR_REGPARMS)
	return *((unsigned long *)regs + offs_table[n]);
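
For illustration, here is a minimal user-space sketch of the offsetof-table
idea. It assumes a cut-down stand-in for struct pt_regs; mock_regs,
arg_offs_table and mock_get_argument_nth are hypothetical names, not part of
the patch. Since offsetof() yields a byte offset, the sketch adds it to a byte
pointer; adding it to an unsigned long pointer, as in the snippet above, would
scale the offset by sizeof(unsigned long).

```c
#include <stddef.h>

/* Hypothetical stand-in for x86-64 struct pt_regs: only the fields used here. */
struct mock_regs {
	unsigned long di, si, dx, cx, r8, r9;
};

/* Byte offsets of the argument registers, in argument order. */
static const size_t arg_offs_table[] = {
	offsetof(struct mock_regs, di),
	offsetof(struct mock_regs, si),
	offsetof(struct mock_regs, dx),
	offsetof(struct mock_regs, cx),
	offsetof(struct mock_regs, r8),
	offsetof(struct mock_regs, r9),
};

#define NR_ARG_REGS (sizeof(arg_offs_table) / sizeof(arg_offs_table[0]))

/* Return the n-th register argument, or 0 if n is out of range. */
static unsigned long mock_get_argument_nth(const struct mock_regs *regs,
					   unsigned int n)
{
	if (n >= NR_ARG_REGS)
		return 0;
	/* The table holds byte offsets, so index from a byte pointer. */
	return *(const unsigned long *)((const char *)regs + arg_offs_table[n]);
}
```

The table replaces the switch statement, so supporting another calling
convention would only mean swapping the table contents.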


Thank you,

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com



[PATCH -tip -v10 5/7] x86: add pt_regs register and stack access APIs

2009-07-06 Thread Masami Hiramatsu
Andi Kleen wrote:
 On Mon, Jul 06, 2009 at 03:28:02PM -0400, Masami Hiramatsu wrote:
 I'm not so sure about your idea.
 Would you mean below code?

 int offs_table[NR_REGPARMS] = {
 
 not REGPARMS of course
 
  [0] = offsetof(struct pt_regs, di),
  ...
 };
 if (n < NR_REGPARMS)
  return *((unsigned long *)regs + offs_table[n]);
 
 Yes.

OK, here, I updated my patch.

Thank you,


x86: add pt_regs register and stack access APIs

From: Masami Hiramatsu mhira...@redhat.com

Add following APIs for accessing registers and stack entries from pt_regs.
These APIs are required by kprobes-based event tracer on ftrace.
Some other debugging tools might be able to use it too.

- regs_query_register_offset(const char *name)
   Query the offset of the register called "name".

- regs_query_register_name(unsigned offset)
   Query the name of a register by its offset.

- regs_get_register(struct pt_regs *regs, unsigned offset)
   Get the value of a register by its offset.

- regs_within_kernel_stack(struct pt_regs *regs, unsigned long addr)
   Check whether the address is within the kernel stack.

- regs_get_kernel_stack_nth(struct pt_regs *reg, unsigned nth)
   Get the Nth entry of the kernel stack. (N >= 0)

- regs_get_argument_nth(struct pt_regs *reg, unsigned nth)
   Get the Nth argument at function call. (N >= 0)
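
The stack helpers can be sketched in user space as follows. This is only an
illustration under stated assumptions: MOCK_THREAD_SIZE, mock_within_stack and
mock_stack_nth are hypothetical names, and the stack is assumed to be one
power-of-two sized, naturally aligned block, which is what the masking with
THREAD_SIZE - 1 relies on.

```c
#include <stdint.h>

#define MOCK_THREAD_SIZE 8192UL	/* assumed stack size; must be a power of two */

/* True if addr lies in the same MOCK_THREAD_SIZE-aligned block as sp. */
static int mock_within_stack(uintptr_t sp, uintptr_t addr)
{
	return (addr & ~(MOCK_THREAD_SIZE - 1)) ==
	       (sp & ~(MOCK_THREAD_SIZE - 1));
}

/* Return the n-th word above sp, or 0 if it falls outside the stack block. */
static unsigned long mock_stack_nth(const unsigned long *sp, unsigned int n)
{
	const unsigned long *addr = sp + n;

	if (mock_within_stack((uintptr_t)sp, (uintptr_t)addr))
		return *addr;
	return 0;
}
```

Returning 0 for an out-of-range entry keeps the caller simple, at the cost of
not being able to tell a missing entry from a stored zero.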

Changes from v10:
 - Use an offsetof table in regs_get_argument_nth().

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Andi Kleen a...@firstfloor.org
Cc: Christoph Hellwig h...@infradead.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Roland McGrath rol...@redhat.com
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: linux-a...@vger.kernel.org
---

 arch/x86/include/asm/ptrace.h |   61 ++
 arch/x86/kernel/ptrace.c  |  112 +
 2 files changed, 173 insertions(+), 0 deletions(-)


diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 0f0d908..a9b7e2d 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -7,6 +7,7 @@

 #ifdef __KERNEL__
 #include <asm/segment.h>
+#include <asm/page_types.h>
 #endif

 #ifndef __ASSEMBLY__
@@ -216,6 +217,66 @@ static inline unsigned long user_stack_pointer(struct pt_regs *regs)
 	return regs->sp;
 }

+/* Query offset/name of register from its name/offset */
+extern int regs_query_register_offset(const char *name);
+extern const char *regs_query_register_name(unsigned offset);
+#define MAX_REG_OFFSET (offsetof(struct pt_regs, ss))
+
+/**
+ * regs_get_register() - get register value from its offset
+ * @regs:  pt_regs from which register value is gotten.
+ * @offset:offset number of the register.
+ *
+ * regs_get_register returns the value of a register whose offset from @regs
+ * is @offset. The @offset is the offset of the register in struct pt_regs.
+ * If @offset is bigger than MAX_REG_OFFSET, this returns 0.
+ */
+static inline unsigned long regs_get_register(struct pt_regs *regs,
+ unsigned offset)
+{
+	if (unlikely(offset > MAX_REG_OFFSET))
+		return 0;
+	return *(unsigned long *)((unsigned long)regs + offset);
+}
+
+/**
+ * regs_within_kernel_stack() - check the address in the stack
+ * @regs:  pt_regs which contains kernel stack pointer.
+ * @addr:  address which is checked.
+ *
+ * regs_within_kernel_stack() checks @addr is within the kernel stack page(s).
+ * If @addr is within the kernel stack, it returns true. If not, returns false.
+ */
+static inline int regs_within_kernel_stack(struct pt_regs *regs,
+  unsigned long addr)
+{
+	return ((addr & ~(THREAD_SIZE - 1))  ==
+		(kernel_stack_pointer(regs) & ~(THREAD_SIZE - 1)));
+}
+
+/**
+ * regs_get_kernel_stack_nth() - get Nth entry of the stack
+ * @regs:  pt_regs which contains kernel stack pointer.
+ * @n: stack entry number.
+ *
+ * regs_get_kernel_stack_nth() returns @n th entry of the kernel stack which
+ * is specified by @regs. If the @n th entry is NOT in the kernel stack,
+ * this returns 0.
+ */
+static inline unsigned long regs_get_kernel_stack_nth(struct pt_regs *regs,
+ unsigned n)
+{
+	unsigned long *addr = (unsigned long *)kernel_stack_pointer(regs);
+	addr += n;
+	if (regs_within_kernel_stack(regs, (unsigned long)addr))
+		return *addr;
+	else
+		return 0;
+}
+
+/* Get Nth argument at function call */
+extern unsigned long regs_get_argument_nth(struct pt_regs *regs, unsigned n);
+
 /*
  * These are defined as per linux/ptrace.h, which see.
  */
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index cabdabc..4f9b513 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -49,6 +49,118 @@ enum x86_regset

[PATCH -tip -v10 2/7] x86: x86 instruction decoder build-time selftest

2009-06-30 Thread Masami Hiramatsu
Add a user-space selftest of the x86 instruction decoder at kernel build time.
When CONFIG_X86_DECODER_SELFTEST=y, Kbuild builds a test harness for the x86
instruction decoder and runs it after building vmlinux.
The test compares the results of objdump with those of the x86 instruction
decoder code and checks that there are no differences.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Signed-off-by: Jim Keniston jkeni...@us.ibm.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Vegard Nossum vegard.nos...@gmail.com
Cc: Avi Kivity a...@redhat.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
Cc: Sam Ravnborg s...@ravnborg.org
---

 arch/x86/Kconfig.debug  |9 
 arch/x86/Makefile   |3 +
 arch/x86/include/asm/inat.h |2 +
 arch/x86/include/asm/insn.h |2 +
 arch/x86/lib/inat.c |2 +
 arch/x86/lib/insn.c |2 +
 arch/x86/scripts/Makefile   |   19 +++
 arch/x86/scripts/distill.awk|   42 +
 arch/x86/scripts/test_get_len.c |   99 +++
 arch/x86/scripts/user_include.h |   49 +++
 10 files changed, 229 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/scripts/Makefile
 create mode 100644 arch/x86/scripts/distill.awk
 create mode 100644 arch/x86/scripts/test_get_len.c
 create mode 100644 arch/x86/scripts/user_include.h

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index d105f29..7d0b681 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -186,6 +186,15 @@ config X86_DS_SELFTEST
 config HAVE_MMIOTRACE_SUPPORT
def_bool y
 
+config X86_DECODER_SELFTEST
+	bool "x86 instruction decoder selftest"
+	depends on DEBUG_KERNEL
+	---help---
+	 Perform x86 instruction decoder selftests at build time.
+	 This option is useful for checking the sanity of x86 instruction
+	 decoder code.
+	 If unsure, say N.
+
 #
 # IO delay types:
 #
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 1b68659..7046556 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -154,6 +154,9 @@ all: bzImage
 KBUILD_IMAGE := $(boot)/bzImage
 
 bzImage: vmlinux
+ifeq ($(CONFIG_X86_DECODER_SELFTEST),y)
+   $(Q)$(MAKE) $(build)=arch/x86/scripts posttest
+endif
$(Q)$(MAKE) $(build)=$(boot) $(KBUILD_IMAGE)
$(Q)mkdir -p $(objtree)/arch/$(UTS_MACHINE)/boot
$(Q)ln -fsn ../../x86/boot/bzImage $(objtree)/arch/$(UTS_MACHINE)/boot/$@
diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index 01e079a..9090665 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -20,7 +20,9 @@
  * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
  *
  */
+#ifdef __KERNEL__
 #include <linux/types.h>
+#endif
 
 /* Instruction attributes */
 typedef u32 insn_attr_t;
diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
index 5b50fa3..5736404 100644
--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -20,7 +20,9 @@
  * Copyright (C) IBM Corporation, 2009
  */
 
+#ifdef __KERNEL__
 #include <linux/types.h>
+#endif
 /* insn_attr_t is defined in inat.h */
 #include <asm/inat.h>
 
diff --git a/arch/x86/lib/inat.c b/arch/x86/lib/inat.c
index d6a34be..564ecbd 100644
--- a/arch/x86/lib/inat.c
+++ b/arch/x86/lib/inat.c
@@ -18,7 +18,9 @@
  * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
  *
  */
+#ifdef __KERNEL__
 #include <linux/module.h>
+#endif
 #include <asm/insn.h>
 
 /* Attribute tables are generated from opcode map */
diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c
index 254c848..3b9451a 100644
--- a/arch/x86/lib/insn.c
+++ b/arch/x86/lib/insn.c
@@ -18,8 +18,10 @@
  * Copyright (C) IBM Corporation, 2002, 2004, 2009
  */
 
+#ifdef __KERNEL__
 #include <linux/string.h>
 #include <linux/module.h>
+#endif
 #include <asm/inat.h>
 #include <asm/insn.h>
 
diff --git a/arch/x86/scripts/Makefile b/arch/x86/scripts/Makefile
new file mode 100644
index 000..f08859e
--- /dev/null
+++ b/arch/x86/scripts/Makefile
@@ -0,0 +1,19 @@
+PHONY += posttest
+quiet_cmd_posttest = TEST$@
+  cmd_posttest = objdump -d $(objtree)/vmlinux | awk -f $(srctree)/arch/x86/scripts/distill.awk | $(obj)/test_get_len
+
+posttest: $(obj)/test_get_len vmlinux
+   $(call cmd,posttest)
+
+test_get_len_SRC = $(srctree)/arch/x86/scripts/test_get_len.c $(srctree)/arch/x86/lib/insn.c $(srctree)/arch/x86/lib/inat.c
+test_get_len_INC = $(srctree)/arch/x86/include/asm/inat.h $(srctree)/arch/x86/include/asm/insn.h $(objtree)/arch/x86/lib/inat-tables.c
+
+quiet_cmd_test_get_len = CC  $@
+  cmd_test_get_len = $(CC) -Wall $(test_get_len_SRC) 
-I$(objtree)/arch/x86/lib/ -I$(srctree)/arch/x86/include -include

[PATCH -tip -v10 5/7] x86: add pt_regs register and stack access APIs

2009-06-30 Thread Masami Hiramatsu
Add following APIs for accessing registers and stack entries from pt_regs.

- regs_query_register_offset(const char *name)
   Query the offset of the register called "name".

- regs_query_register_name(unsigned offset)
   Query the name of a register by its offset.

- regs_get_register(struct pt_regs *regs, unsigned offset)
   Get the value of a register by its offset.

- regs_within_kernel_stack(struct pt_regs *regs, unsigned long addr)
   Check whether the address is within the kernel stack.

- regs_get_kernel_stack_nth(struct pt_regs *reg, unsigned nth)
   Get the Nth entry of the kernel stack. (N >= 0)

- regs_get_argument_nth(struct pt_regs *reg, unsigned nth)
   Get the Nth argument at function call. (N >= 0)
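
The name/offset query pair could be backed by a single table, roughly as in
the sketch below. This is not the code from the patch: mock_regs,
mock_reg_table and the mock_query_* helpers are hypothetical, the register set
is cut down, and the -1 / NULL failure values are assumptions made for the
example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down register frame. */
struct mock_regs {
	unsigned long bx, cx, dx, si, di, ax;
};

struct mock_reg_entry {
	const char *name;
	int offset;
};

/* Build one table entry from a field name of struct mock_regs. */
#define MOCK_REG(r) { #r, (int)offsetof(struct mock_regs, r) }

static const struct mock_reg_entry mock_reg_table[] = {
	MOCK_REG(bx), MOCK_REG(cx), MOCK_REG(dx),
	MOCK_REG(si), MOCK_REG(di), MOCK_REG(ax),
};

#define MOCK_NR_REGS (sizeof(mock_reg_table) / sizeof(mock_reg_table[0]))

/* Return the byte offset of the named register, or -1 if unknown. */
static int mock_query_register_offset(const char *name)
{
	size_t i;

	for (i = 0; i < MOCK_NR_REGS; i++)
		if (strcmp(mock_reg_table[i].name, name) == 0)
			return mock_reg_table[i].offset;
	return -1;
}

/* Return the name of the register at the given offset, or NULL. */
static const char *mock_query_register_name(unsigned int offset)
{
	size_t i;

	for (i = 0; i < MOCK_NR_REGS; i++)
		if ((unsigned int)mock_reg_table[i].offset == offset)
			return mock_reg_table[i].name;
	return NULL;
}
```

Keeping both directions on one table means a register only has to be listed
once for both lookups to know about it.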

Changes from v9:
 -Fix a typo in a comment.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Roland McGrath rol...@redhat.com
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: linux-a...@vger.kernel.org
---

 arch/x86/include/asm/ptrace.h |  122 +
 arch/x86/kernel/ptrace.c  |   73 +
 2 files changed, 195 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 0f0d908..d5e3b3b 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -7,6 +7,7 @@
 
 #ifdef __KERNEL__
 #include <asm/segment.h>
+#include <asm/page_types.h>
 #endif
 
 #ifndef __ASSEMBLY__
@@ -216,6 +217,127 @@ static inline unsigned long user_stack_pointer(struct pt_regs *regs)
 	return regs->sp;
 }
 
+/* Query offset/name of register from its name/offset */
+extern int regs_query_register_offset(const char *name);
+extern const char *regs_query_register_name(unsigned offset);
+#define MAX_REG_OFFSET (offsetof(struct pt_regs, ss))
+
+/**
+ * regs_get_register() - get register value from its offset
+ * @regs:  pt_regs from which register value is gotten.
+ * @offset:offset number of the register.
+ *
+ * regs_get_register returns the value of a register whose offset from @regs
+ * is @offset. The @offset is the offset of the register in struct pt_regs.
+ * If @offset is bigger than MAX_REG_OFFSET, this returns 0.
+ */
+static inline unsigned long regs_get_register(struct pt_regs *regs,
+ unsigned offset)
+{
+	if (unlikely(offset > MAX_REG_OFFSET))
+		return 0;
+	return *(unsigned long *)((unsigned long)regs + offset);
+}
+
+/**
+ * regs_within_kernel_stack() - check the address in the stack
+ * @regs:  pt_regs which contains kernel stack pointer.
+ * @addr:  address which is checked.
+ *
+ * regs_within_kernel_stack() checks @addr is within the kernel stack page(s).
+ * If @addr is within the kernel stack, it returns true. If not, returns false.
+ */
+static inline int regs_within_kernel_stack(struct pt_regs *regs,
+  unsigned long addr)
+{
+	return ((addr & ~(THREAD_SIZE - 1))  ==
+		(kernel_stack_pointer(regs) & ~(THREAD_SIZE - 1)));
+}
+
+/**
+ * regs_get_kernel_stack_nth() - get Nth entry of the stack
+ * @regs:  pt_regs which contains kernel stack pointer.
+ * @n: stack entry number.
+ *
+ * regs_get_kernel_stack_nth() returns @n th entry of the kernel stack which
+ * is specified by @regs. If the @n th entry is NOT in the kernel stack,
+ * this returns 0.
+ */
+static inline unsigned long regs_get_kernel_stack_nth(struct pt_regs *regs,
+ unsigned n)
+{
+	unsigned long *addr = (unsigned long *)kernel_stack_pointer(regs);
+	addr += n;
+	if (regs_within_kernel_stack(regs, (unsigned long)addr))
+		return *addr;
+	else
+		return 0;
+}
+
+/**
+ * regs_get_argument_nth() - get Nth argument at function call
+ * @regs:  pt_regs which contains registers at function entry.
+ * @n: argument number.
+ *
+ * regs_get_argument_nth() returns @n th argument of a function call.
+ * Since usually the kernel stack will be changed right after function entry,
+ * you must use this at function entry. If the @n th entry is NOT in the
+ * kernel stack or pt_regs, this returns 0.
+ */
+#ifdef CONFIG_X86_32
+#define NR_REGPARMS 3
+static inline unsigned long regs_get_argument_nth(struct pt_regs *regs,
+ unsigned n)
+{
+	if (n < NR_REGPARMS) {
+		switch (n) {
+		case 0:
+			return regs->ax;
+		case 1:
+			return regs->dx;
+		case 2:
+			return regs->cx;
+		}
+		return 0;
+	} else {
+		/*
+		 * The typical case: arg n is on the stack

[PATCH -tip -v10 6/7] tracing: ftrace dynamic ftrace_event_call support

2009-06-30 Thread Masami Hiramatsu
Add dynamic ftrace_event_call support to ftrace. Trace engines can add new
ftrace_event_calls to ftrace on the fly. Each operator function of a call
takes an ftrace_event_call data structure as an argument, because these
functions may be shared among several ftrace_event_calls.
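
The reason for passing the call into its own operators can be shown with a
small user-space sketch. The struct below is a hypothetical, cut-down stand-in
for struct ftrace_event_call, and the mock_* names are made up for the
example; only the idea of one operator pair serving several events comes from
the description above.

```c
/* Hypothetical, cut-down stand-in for struct ftrace_event_call. */
struct mock_event_call {
	const char *name;
	int enabled;
	int (*regfunc)(struct mock_event_call *call);
	void (*unregfunc)(struct mock_event_call *call);
};

/* One pair of operators shared by every dynamically created event. */
static int mock_shared_reg(struct mock_event_call *call)
{
	call->enabled = 1;	/* acts only on the event it was handed */
	return 0;
}

static void mock_shared_unreg(struct mock_event_call *call)
{
	call->enabled = 0;
}

/* Fill in a new event at run time; all events point at the same operators. */
static void mock_event_init(struct mock_event_call *call, const char *name)
{
	call->name = name;
	call->enabled = 0;
	call->regfunc = mock_shared_reg;
	call->unregfunc = mock_shared_unreg;
}
```

With a (void) signature, each event would need its own generated function to
know which event it belongs to, which is not possible for events created at
run time.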

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ingo Molnar mi...@elte.hu
Cc: Tom Zanussi tzanu...@gmail.com
Cc: Frederic Weisbecker fweis...@gmail.com
---

 include/linux/ftrace_event.h |   13 +---
 include/trace/ftrace.h   |   22 +++--
 kernel/trace/trace_events.c  |   70 --
 kernel/trace/trace_export.c  |   27 
 4 files changed, 85 insertions(+), 47 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 5c093ff..f7733b6 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -108,12 +108,13 @@ struct ftrace_event_call {
struct dentry   *dir;
struct trace_event  *event;
int enabled;
-   int (*regfunc)(void);
-   void(*unregfunc)(void);
+   int (*regfunc)(struct ftrace_event_call *);
+   void(*unregfunc)(struct ftrace_event_call *);
int id;
-   int (*raw_init)(void);
-   int (*show_format)(struct trace_seq *s);
-   int (*define_fields)(void);
+   int (*raw_init)(struct ftrace_event_call *);
+   int (*show_format)(struct ftrace_event_call *,
+  struct trace_seq *);
+   int (*define_fields)(struct ftrace_event_call *);
struct list_headfields;
int filter_active;
void*filter;
@@ -138,6 +139,8 @@ extern int filter_current_check_discard(struct ftrace_event_call *call,
 
 extern int trace_define_field(struct ftrace_event_call *call, char *type,
  char *name, int offset, int size, int is_signed);
+extern int trace_add_event_call(struct ftrace_event_call *call);
+extern void trace_remove_event_call(struct ftrace_event_call *call);
 
 #define is_signed_type(type)	(((type)(-1)) < 0)
 
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 1867553..d696580 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -147,7 +147,8 @@
 #undef TRACE_EVENT
 #define TRACE_EVENT(call, proto, args, tstruct, func, print)   \
 static int \
-ftrace_format_##call(struct trace_seq *s)  \
+ftrace_format_##call(struct ftrace_event_call *event_call, \
+struct trace_seq *s)   \
 {  \
struct ftrace_raw_##call field __attribute__((unused)); \
int ret = 0;\
@@ -289,10 +290,9 @@ ftrace_raw_output_##call(struct trace_iterator *iter, int flags)	\
 #undef TRACE_EVENT
 #define TRACE_EVENT(call, proto, args, tstruct, func, print)   \
 int\
-ftrace_define_fields_##call(void)  \
+ftrace_define_fields_##call(struct ftrace_event_call *event_call)  \
 {  \
struct ftrace_raw_##call field; \
-	struct ftrace_event_call *event_call = &event_##call;		\
int ret;\
\
__common_field(int, type, 1);   \
@@ -355,7 +355,7 @@ static inline int ftrace_get_offsets_##call(		\
  *	event_trace_printk(_RET_IP_, "<call>: " fmt);
  * }
  *
- * static int ftrace_reg_event_<call>(void)
+ * static int ftrace_reg_event_<call>(struct ftrace_event_call *unused)
  * {
  *	int ret;
  *
@@ -366,7 +366,7 @@ static inline int ftrace_get_offsets_##call(		\
  *	return ret;
  * }
  *
- * static void ftrace_unreg_event_<call>(void)
+ * static void ftrace_unreg_event_<call>(struct ftrace_event_call *unused)
  * {
  *	unregister_trace_<call>(ftrace_event_<call>);
  * }
@@ -399,7 +399,7 @@ static inline int ftrace_get_offsets_##call(		\
  *	trace_current_buffer_unlock_commit(event, irq_flags, pc);
  * }
  *
- * static int ftrace_raw_reg_event_<call>(void)
+ * static int ftrace_raw_reg_event_<call>(struct ftrace_event_call *unused)
  * {
  *	int ret

[PATCH -tip -v10 3/7] kprobes: checks probe address is instruction boudary on x86

2009-06-30 Thread Masami Hiramatsu
Ensure that kprobes are inserted safely by checking whether the specified
address is the first byte of an instruction on x86.
This is done by decoding the probed function from its head to the probe point.
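
The boundary check can be illustrated with a toy decoder. The sketch below
does not use the x86 instruction decoder: it assumes a made-up encoding in
which the low two bits of an instruction's first byte give its length minus
one, and toy_insn_length and toy_can_probe are hypothetical names. What it
shares with the patch is the walk from the function head, one instruction at a
time, until the probe offset is reached or stepped over.

```c
#include <stddef.h>

/*
 * Toy encoding (an assumption, not x86): the low two bits of the first
 * byte hold the instruction length minus one, so lengths are 1..4 bytes.
 */
static size_t toy_insn_length(const unsigned char *p)
{
	return (size_t)(p[0] & 0x3) + 1;
}

/* Return 1 if offset is the first byte of an instruction in func[0..size). */
static int toy_can_probe(const unsigned char *func, size_t size, size_t offset)
{
	size_t pos = 0;

	if (offset >= size)
		return 0;
	/* Decode from the function head; stop at or beyond the probe point. */
	while (pos < offset)
		pos += toy_insn_length(func + pos);
	/* Landing exactly on offset means it is an instruction boundary. */
	return pos == offset;
}
```

Starting from the function head matters because a variable-length encoding
cannot be decoded reliably from an arbitrary byte in the middle.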

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Acked-by: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: Ingo Molnar mi...@elte.hu
---

 arch/x86/kernel/kprobes.c |   69 +
 1 files changed, 69 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index b5b1848..5341842 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -48,6 +48,7 @@
 #include <linux/preempt.h>
 #include <linux/module.h>
 #include <linux/kdebug.h>
+#include <linux/kallsyms.h>
 
 #include <asm/cacheflush.h>
 #include <asm/desc.h>
@@ -55,6 +56,7 @@
 #include <asm/uaccess.h>
 #include <asm/alternative.h>
 #include <asm/debugreg.h>
+#include <asm/insn.h>
 
 void jprobe_return_end(void);
 
@@ -245,6 +247,71 @@ retry:
}
 }
 
+/* Recover the probed instruction at addr for further analysis. */
+static int recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr)
+{
+	struct kprobe *kp;
+	kp = get_kprobe((void *)addr);
+	if (!kp)
+		return -EINVAL;
+
+	/*
+	 *  Basically, kp->ainsn.insn has an original instruction.
+	 *  However, RIP-relative instruction can not do single-stepping
+	 *  at different place, fix_riprel() tweaks the displacement of
+	 *  that instruction. In that case, we can't recover the instruction
+	 *  from the kp->ainsn.insn.
+	 *
+	 *  On the other hand, kp->opcode has a copy of the first byte of
+	 *  the probed instruction, which is overwritten by int3. And
+	 *  the instruction at kp->addr is not modified by kprobes except
+	 *  for the first byte, we can recover the original instruction
+	 *  from it and kp->opcode.
+	 */
+	memcpy(buf, kp->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
+	buf[0] = kp->opcode;
+	return 0;
+}
+
+/* Dummy buffers for kallsyms_lookup */
+static char __dummy_buf[KSYM_NAME_LEN];
+
+/* Check if paddr is at an instruction boundary */
+static int __kprobes can_probe(unsigned long paddr)
+{
+	int ret;
+	unsigned long addr, offset = 0;
+	struct insn insn;
+	kprobe_opcode_t buf[MAX_INSN_SIZE];
+
+	if (!kallsyms_lookup(paddr, NULL, &offset, NULL, __dummy_buf))
+		return 0;
+
+	/* Decode instructions */
+	addr = paddr - offset;
+	while (addr < paddr) {
+		kernel_insn_init(&insn, (void *)addr);
+		insn_get_opcode(&insn);
+
+		/* Check if the instruction has been modified. */
+		if (OPCODE1(&insn) == BREAKPOINT_INSTRUCTION) {
+			ret = recover_probed_instruction(buf, addr);
+			if (ret)
+				/*
+				 * Another debugging subsystem might insert
+				 * this breakpoint. In that case, we can't
+				 * recover it.
+				 */
+				return 0;
+			kernel_insn_init(&insn, buf);
+		}
+		insn_get_length(&insn);
+		addr += insn.length;
+	}
+
+	return (addr == paddr);
+}
+
 /*
  * Returns non-zero if opcode modifies the interrupt flag.
  */
@@ -360,6 +427,8 @@ static void __kprobes arch_copy_kprobe(struct kprobe *p)
 
 int __kprobes arch_prepare_kprobe(struct kprobe *p)
 {
+	if (!can_probe((unsigned long)p->addr))
+		return -EILSEQ;
 	/* insn: must be on special executable page on x86. */
 	p->ainsn.insn = get_insn_slot();
 	if (!p->ainsn.insn)


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com


[PATCH -tip -v10 4/7] kprobes: cleanup fix_riprel() using insn decoder on x86

2009-06-30 Thread Masami Hiramatsu
Clean up fix_riprel() in arch/x86/kernel/kprobes.c by using the x86
instruction decoder.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: Ingo Molnar mi...@elte.hu
---

 arch/x86/kernel/kprobes.c |  128 -
 1 files changed, 23 insertions(+), 105 deletions(-)

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index 5341842..b77e050 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -109,50 +109,6 @@ static const u32 twobyte_is_boostable[256 / 32] = {
/*  --- */
/*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
 };
-static const u32 onebyte_has_modrm[256 / 32] = {
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-   /*  --- */
-   W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 00 */
-   W(0x10, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 10 */
-   W(0x20, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 20 */
-   W(0x30, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 30 */
-   W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 40 */
-   W(0x50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 50 */
-   W(0x60, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0) | /* 60 */
-   W(0x70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 70 */
-   W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */
-   W(0x90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 90 */
-   W(0xa0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* a0 */
-   W(0xb0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* b0 */
-   W(0xc0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* c0 */
-   W(0xd0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */
-   W(0xe0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* e0 */
-   W(0xf0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1)   /* f0 */
-   /*  --- */
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-};
-static const u32 twobyte_has_modrm[256 / 32] = {
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-   /*  --- */
-   W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1) | /* 0f */
-   W(0x10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0) , /* 1f */
-   W(0x20, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1) | /* 2f */
-   W(0x30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 3f */
-   W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 4f */
-   W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 5f */
-   W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 6f */
-   W(0x70, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1) , /* 7f */
-   W(0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 8f */
-   W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 9f */
-   W(0xa0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1) | /* af */
-   W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1) , /* bf */
-   W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* cf */
-   W(0xd0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* df */
-   W(0xe0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* ef */
-   W(0xf0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0)   /* ff */
-   /*  --- */
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-};
 #undef W
 
 struct kretprobe_blackpoint kretprobe_blacklist[] = {
@@ -345,68 +301,30 @@ static int __kprobes is_IF_modifier(kprobe_opcode_t *insn)
 static void __kprobes fix_riprel(struct kprobe *p)
 {
 #ifdef CONFIG_X86_64
-   u8 *insn = p->ainsn.insn;
-   s64 disp;
-   int need_modrm;
-
-   /* Skip legacy instruction prefixes.  */
-   while (1) {
-   switch (*insn) {
-   case 0x66:
-   case 0x67:
-   case 0x2e:
-   case 0x3e:
-   case 0x26:
-   case 0x64:
-   case 0x65:
-   case 0x36:
-   case 0xf0:
-   case 0xf3:
-   case 0xf2:
-   ++insn;
-   continue;
-   }
-   break;
-   }
+   struct insn insn;
+   kernel_insn_init(&insn, p->ainsn.insn);
 
-   /* Skip REX instruction prefix.  */
-   if (is_REX_prefix(insn))
-   ++insn;
-
-   if (*insn == 0x0f) {
-   /* Two-byte opcode.  */
-   ++insn

[PATCH -tip -v10 0/7] tracing: kprobe-based event tracer and x86 instruction decoder

2009-06-30 Thread Masami Hiramatsu
.

  cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format
name: myprobe
ID: 23
format:
field:unsigned short common_type;   offset:0;   size:2;
field:unsigned char common_flags;   offset:2;   size:1;
field:unsigned char common_preempt_count;   offset:3;   size:1;
field:int common_pid;   offset:4;   size:4;
field:int common_tgid;  offset:8;   size:4;

field: unsigned long ip;offset:16;tsize:8;
field: int nargs;   offset:24;tsize:4;
field: unsigned long arg0;  offset:32;tsize:8;
field: unsigned long arg1;  offset:40;tsize:8;
field: unsigned long arg2;  offset:48;tsize:8;
field: unsigned long arg3;  offset:56;tsize:8;

alias: a0;  original: arg0;
alias: a1;  original: arg1;
alias: a2;  original: arg2;
alias: a3;  original: arg3;

print fmt: %lx: 0x%lx 0x%lx 0x%lx 0x%lx, ip, arg0, arg1, arg2, arg3


 You can see that the event has 4 arguments and alias expressions
corresponding to it.

  echo > /sys/kernel/debug/tracing/kprobe_events

 This clears all probe points. You can see the traced information via
/sys/kernel/debug/tracing/trace.

  cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
           <...>-1447  [001] 1038282.286875: do_sys_open+0x0/0xd6: 0x3 0x7fffd1ec4440 0x8000 0x0
           <...>-1447  [001] 1038282.286878: sys_openat+0xc/0xe <- do_sys_open: 0xfffe 0x81367a3a
           <...>-1447  [001] 1038282.286885: do_sys_open+0x0/0xd6: 0xff9c 0x40413c 0x8000 0x1b6
           <...>-1447  [001] 1038282.286915: sys_open+0x1b/0x1d <- do_sys_open: 0x3 0x81367a3a
           <...>-1447  [001] 1038282.286969: do_sys_open+0x0/0xd6: 0xff9c 0x4041c6 0x98800 0x10
           <...>-1447  [001] 1038282.286976: sys_open+0x1b/0x1d <- do_sys_open: 0x3 0x81367a3a


 Each line shows when the kernel hits a probe, and "<- SYMBOL" means the kernel
returns from SYMBOL (e.g. "sys_open+0x1b/0x1d <- do_sys_open" means the kernel
returns from do_sys_open to sys_open+0x1b).
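
As a rough illustration (not part of this patch series), a userspace tool
could pick the return marker out of one trace line as sketched below. It
assumes the marker is written "<- SYMBOL:" as in ftrace output; the helper
name parse_return_marker() and its buffer handling are made up for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: if `line` contains "<- SYMBOL", copy SYMBOL into
 * `sym` and return 1. Return 0 for entry-probe lines (no marker) or when
 * the symbol does not fit into `sym`.
 */
static int parse_return_marker(const char *line, char *sym, size_t symlen)
{
	const char *p = strstr(line, "<- ");
	size_t len;

	if (!p)
		return 0;
	p += 3;					/* skip "<- " */
	len = strcspn(p, ": \t\n");		/* symbol ends at ':' or space */
	if (len == 0 || len >= symlen)
		return 0;
	memcpy(sym, p, len);
	sym[len] = '\0';
	return 1;
}
```

Calling it on "sys_open+0x1b/0x1d <- do_sys_open: 0x3 0x81367a3a" fills sym
with "do_sys_open"; an entry line such as "do_sys_open+0x0/0xd6: 0x3" makes
it return 0.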


Thank you,

---

Masami Hiramatsu (7):
  tracing: add kprobe-based event tracer
  tracing: ftrace dynamic ftrace_event_call support
  x86: add pt_regs register and stack access APIs
  kprobes: cleanup fix_riprel() using insn decoder on x86
  kprobes: checks probe address is instruction boudary on x86
  x86: x86 instruction decoder build-time selftest
  x86: instruction decoder API


 Documentation/trace/kprobes.txt|  138 
 arch/x86/Kconfig.debug |9 
 arch/x86/Makefile  |3 
 arch/x86/include/asm/inat.h|  127 +++
 arch/x86/include/asm/insn.h|  136 
 arch/x86/include/asm/ptrace.h  |  122 +++
 arch/x86/kernel/kprobes.c  |  197 ++---
 arch/x86/kernel/ptrace.c   |   73 ++
 arch/x86/lib/Makefile  |   13 
 arch/x86/lib/inat.c|   82 ++
 arch/x86/lib/insn.c|  473 +
 arch/x86/lib/x86-opcode-map.txt|  711 +++
 arch/x86/scripts/Makefile  |   19 +
 arch/x86/scripts/distill.awk   |   42 +
 arch/x86/scripts/gen-insn-attr-x86.awk |  314 
 arch/x86/scripts/test_get_len.c|   99 +++
 arch/x86/scripts/user_include.h|   49 +
 include/linux/ftrace_event.h   |   13 
 include/trace/ftrace.h |   22 -
 kernel/trace/Kconfig   |   12 
 kernel/trace/Makefile  |1 
 kernel/trace/trace.h   |   22 +
 kernel/trace/trace_event_types.h   |   20 +
 kernel/trace/trace_events.c|   70 +-
 kernel/trace/trace_export.c|   27 -
 kernel/trace/trace_kprobe.c| 1183 
 26 files changed, 3825 insertions(+), 152 deletions(-)
 create mode 100644 Documentation/trace/kprobes.txt
 create mode 100644 arch/x86/include/asm/inat.h
 create mode 100644 arch/x86/include/asm/insn.h
 create mode 100644 arch/x86/lib/inat.c
 create mode 100644 arch/x86/lib/insn.c
 create mode 100644 arch/x86/lib/x86-opcode-map.txt
 create mode 100644 arch/x86/scripts/Makefile
 create mode 100644 arch/x86/scripts/distill.awk
 create mode 100644 arch/x86/scripts/gen-insn-attr-x86.awk
 create mode 100644 arch/x86/scripts/test_get_len.c
 create mode 100644 arch/x86/scripts/user_include.h
 create mode 100644 kernel/trace/trace_kprobe.c

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH -tip -v10 1/7] x86: instruction decoder API

2009-06-30 Thread Masami Hiramatsu
Add an x86 instruction decoder to the arch-specific libraries. This decoder
can decode x86 instructions used in the kernel into prefix, opcode, modrm,
sib, displacement and immediates. It can also report the length of
instructions.

This version introduces instruction attributes for decoding instructions.
The instruction attribute tables are generated from the opcode map file
(x86-opcode-map.txt) by the generator script (gen-insn-attr-x86.awk).

Currently, the opcode maps are based on the opcode maps in the Intel(R) 64 and
IA-32 Architectures Software Developer's Manual Vol.2: Appendix A,
and consist of the following two types of opcode tables.

1-byte/2-byte/3-byte opcode tables, each of which has 256 elements, are
written as below:

 Table: table-name
 Referrer: escaped-name
 opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
  (or)
 opcode: escape # escaped-name
 EndTable

Group opcode tables, each of which has 8 elements, are written as below:

 GrpTable: GrpXXX
 reg:  mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
 EndTable

These opcode maps do NOT include most of the SSE and FP opcodes, because
those opcodes are not used in the kernel.
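
To make the table syntax a bit more concrete, here is a minimal userspace
sketch of how one "opcode:" line could be split into its opcode number and
first mnemonic. It is not the awk generator from this patch; the struct, the
enum and parse_opcode_line() are names invented for this example, and only
the leading "hex: name" part of a line is handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

enum entry_kind { ENTRY_MNEMONIC, ENTRY_GROUP, ENTRY_ESCAPE };

struct map_entry {
	unsigned int opcode;	/* index into the 256-element table */
	enum entry_kind kind;
	char name[32];		/* mnemonic, GrpXXX, or "escape" */
};

/* Parse "<hex>: <name> ..."; return 0 on success, -1 on any other line. */
static int parse_opcode_line(const char *line, struct map_entry *e)
{
	char *end;
	unsigned long op = strtoul(line, &end, 16);
	size_t len;

	/* "Table:", "EndTable" etc. do not look like "<hex>:" and are rejected */
	if (end == line || *end != ':' || op > 0xff)
		return -1;
	end++;
	while (isspace((unsigned char)*end))
		end++;
	len = strcspn(end, " \t\n|");	/* first token after the colon */
	if (len == 0 || len >= sizeof(e->name))
		return -1;
	memcpy(e->name, end, len);
	e->name[len] = '\0';
	e->opcode = (unsigned int)op;
	if (strcmp(e->name, "escape") == 0)
		e->kind = ENTRY_ESCAPE;
	else if (strncmp(e->name, "Grp", 3) == 0)
		e->kind = ENTRY_GROUP;
	else
		e->kind = ENTRY_MNEMONIC;
	return 0;
}
```

Operands, (extra) annotations and "| 2nd-mnemonic" alternatives are left
unparsed here; a real generator has to handle them to build the attributes.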

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Signed-off-by: Jim Keniston jkeni...@us.ibm.com
Acked-by: H. Peter Anvin h...@zytor.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Vegard Nossum vegard.nos...@gmail.com
Cc: Avi Kivity a...@redhat.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
---

 arch/x86/include/asm/inat.h|  125 ++
 arch/x86/include/asm/insn.h|  134 ++
 arch/x86/lib/Makefile  |   13 +
 arch/x86/lib/inat.c|   80 
 arch/x86/lib/insn.c|  471 +
 arch/x86/lib/x86-opcode-map.txt|  711 
 arch/x86/scripts/gen-insn-attr-x86.awk |  314 ++
 7 files changed, 1848 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/inat.h
 create mode 100644 arch/x86/include/asm/insn.h
 create mode 100644 arch/x86/lib/inat.c
 create mode 100644 arch/x86/lib/insn.c
 create mode 100644 arch/x86/lib/x86-opcode-map.txt
 create mode 100644 arch/x86/scripts/gen-insn-attr-x86.awk

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
new file mode 100644
index 000..01e079a
--- /dev/null
+++ b/arch/x86/include/asm/inat.h
@@ -0,0 +1,125 @@
+#ifndef _ASM_INAT_INAT_H
+#define _ASM_INAT_INAT_H
+/*
+ * x86 instruction attributes
+ *
+ * Written by Masami Hiramatsu mhira...@redhat.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ */
+#include <linux/types.h>
+
+/* Instruction attributes */
+typedef u32 insn_attr_t;
+
+/*
+ * Internal bits. Don't use bitmasks directly, because these bits are
+ * unstable. You should add checking macros and use that macro in
+ * your code.
+ */
+
+#define INAT_OPCODE_TABLE_SIZE 256
+#define INAT_GROUP_TABLE_SIZE 8
+
+/* Legacy instruction prefixes */
+#define INAT_PFX_OPNDSZ    1   /* 0x66 */ /* LPFX1 */
+#define INAT_PFX_REPNE     2   /* 0xF2 */ /* LPFX2 */
+#define INAT_PFX_REPE      3   /* 0xF3 */ /* LPFX3 */
+#define INAT_PFX_LOCK      4   /* 0xF0 */
+#define INAT_PFX_CS        5   /* 0x2E */
+#define INAT_PFX_DS        6   /* 0x3E */
+#define INAT_PFX_ES        7   /* 0x26 */
+#define INAT_PFX_FS        8   /* 0x64 */
+#define INAT_PFX_GS        9   /* 0x65 */
+#define INAT_PFX_SS        10  /* 0x36 */
+#define INAT_PFX_ADDRSZ    11  /* 0x67 */
+
+#define INAT_LPREFIX_MAX   3
+
+/* Immediate size */
+#define INAT_IMM_BYTE  1
+#define INAT_IMM_WORD  2
+#define INAT_IMM_DWORD 3
+#define INAT_IMM_QWORD 4
+#define INAT_IMM_PTR   5
+#define INAT_IMM_VWORD32   6
+#define INAT_IMM_VWORD 7
+
+/* Legacy prefix */
+#define INAT_PFX_OFFS  0
+#define INAT_PFX_BITS  4
+#define INAT_PFX_MAX   ((1 << INAT_PFX_BITS) - 1)
+#define INAT_PFX_MASK  (INAT_PFX_MAX << INAT_PFX_OFFS)
+/* Escape opcodes */
+#define INAT_ESC_OFFS  (INAT_PFX_OFFS + INAT_PFX_BITS)
+#define INAT_ESC_BITS  2
+#define INAT_ESC_MAX

[PATCH -tip -v10 7/7] tracing: add kprobe-based event tracer

2009-06-30 Thread Masami Hiramatsu
Add kprobes-based event tracer on ftrace.

This tracer is similar to the events tracer, which is based on the Tracepoint
infrastructure. Instead of Tracepoint, this tracer is based on kprobes (kprobe
and kretprobe), so it can probe wherever kprobes can probe (that is, anywhere
in function bodies except for __kprobes functions).

Similar to the events tracer, this tracer doesn't need to be activated via
current_tracer; instead, just set probe points via
/sys/kernel/debug/tracing/kprobe_events. You can also set filters on each
probe event via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter.

This tracer supports the following probe arguments for each probe.

  %REG              : Fetch register REG
  sN                : Fetch Nth entry of stack (N >= 0)
  @ADDR             : Fetch memory at ADDR (ADDR should be in kernel)
  @SYM[+|-offs]     : Fetch memory at SYM +|- offs (SYM should be a data symbol)
  aN                : Fetch function argument. (N >= 0)
  rv                : Fetch return value.
  ra                : Fetch return address.
  +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.

See Documentation/trace/kprobes.txt for details.
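
For readers who want a feel for the argument syntax, the list above can be
sketched as a small classifier. This is only an illustration and not the
parser in trace_kprobe.c; the enum, classify_fetch_arg() and the rule that
an @ADDR starts with a digit while an @SYM does not are assumptions made for
this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum fetch_kind {
	FETCH_INVALID, FETCH_REG, FETCH_STACK, FETCH_MEM_ADDR, FETCH_MEM_SYM,
	FETCH_ARG, FETCH_RETVAL, FETCH_RETADDR, FETCH_DEREF
};

/* Return 1 if s is a non-empty string of decimal digits. */
static int all_digits(const char *s)
{
	if (!*s)
		return 0;
	for (; *s; s++)
		if (!isdigit((unsigned char)*s))
			return 0;
	return 1;
}

/* Classify one FETCHARG token by its leading characters. */
static enum fetch_kind classify_fetch_arg(const char *arg)
{
	const char *open;
	size_t len;

	switch (arg[0]) {
	case '%':				/* %REG */
		return arg[1] ? FETCH_REG : FETCH_INVALID;
	case '@':				/* @ADDR or @SYM[+|-offs] */
		if (!arg[1])
			return FETCH_INVALID;
		return isdigit((unsigned char)arg[1]) ?
			FETCH_MEM_ADDR : FETCH_MEM_SYM;
	case '+':
	case '-':				/* +|-offs(FETCHARG) */
		open = strchr(arg, '(');
		len = strlen(arg);
		if (open && open > arg + 1 && arg[len - 1] == ')')
			return FETCH_DEREF;
		return FETCH_INVALID;
	case 's':				/* sN */
		return all_digits(arg + 1) ? FETCH_STACK : FETCH_INVALID;
	case 'a':				/* aN */
		return all_digits(arg + 1) ? FETCH_ARG : FETCH_INVALID;
	case 'r':				/* rv or ra */
		if (strcmp(arg, "rv") == 0)
			return FETCH_RETVAL;
		if (strcmp(arg, "ra") == 0)
			return FETCH_RETADDR;
		return FETCH_INVALID;
	default:
		return FETCH_INVALID;
	}
}
```

For example, "+8(a0)" is classified as a dereference whose inner FETCHARG
("a0") would then be classified recursively by a fuller parser.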

Changes from v9:
 - Select CONFIG_GENERIC_TRACER when CONFIG_KPROBE_TRACER=y.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Acked-by: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Tom Zanussi tzanu...@gmail.com
---

 Documentation/trace/kprobes.txt  |  138 
 kernel/trace/Kconfig |   12 
 kernel/trace/Makefile|1 
 kernel/trace/trace.h |   22 +
 kernel/trace/trace_event_types.h |   20 +
 kernel/trace/trace_kprobe.c  | 1183 ++
 6 files changed, 1376 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/trace/kprobes.txt
 create mode 100644 kernel/trace/trace_kprobe.c

diff --git a/Documentation/trace/kprobes.txt b/Documentation/trace/kprobes.txt
new file mode 100644
index 000..3a90ebb
--- /dev/null
+++ b/Documentation/trace/kprobes.txt
@@ -0,0 +1,138 @@
+ Kprobe-based Event Tracer
+ =========================
+
+ Documentation is written by Masami Hiramatsu
+
+
+Overview
+--------
+This tracer is similar to the events tracer, which is based on the Tracepoint
+infrastructure. Instead of Tracepoint, this tracer is based on kprobes (kprobe
+and kretprobe), so it can probe wherever kprobes can probe (that is, anywhere
+in function bodies except for __kprobes functions).
+
+Unlike the function tracer, this tracer can probe instructions inside of
+kernel functions. It allows you to check which instruction has been executed.
+
+Unlike the Tracepoint based events tracer, this tracer can add and remove
+probe points on the fly.
+
+Similar to the events tracer, this tracer doesn't need to be activated via
+current_tracer; instead, just set probe points via
+/sys/kernel/debug/tracing/kprobe_events. You can also set filters on each
+probe event via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter.
+
+
+Synopsis of kprobe_events
+-------------------------
+  p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS]  : set a probe
+  r[:EVENT] SYMBOL[+0] [FETCHARGS]                   : set a return probe
+
+ EVENT                  : Event name
+ SYMBOL[+offs|-offs]    : Symbol+offset where the probe is inserted
+ MEMADDR                : Address where the probe is inserted
+
+ FETCHARGS              : Arguments
+  %REG                  : Fetch register REG
+  sN                    : Fetch Nth entry of stack (N >= 0)
+  @ADDR                 : Fetch memory at ADDR (ADDR should be in kernel)
+  @SYM[+|-offs]         : Fetch memory at SYM +|- offs (SYM should be a data symbol)
+  aN                    : Fetch function argument. (N >= 0)(*)
+  rv                    : Fetch return value.(**)
+  ra                    : Fetch return address.(**)
+  +|-offs(FETCHARG)     : Fetch memory at FETCHARG +|- offs address.(***)
+
+  (*) aN may not be correct on asmlinkage functions or in the middle of a
+      function body.
+  (**) only for return probe.
+  (***) this is useful for fetching a field of data structures.
+
+
+Per-Probe Event Filtering
+-------------------------
+ The per-probe event filtering feature allows you to set a different filter on
+each probe and to choose which arguments are shown in the trace buffer. If an
+event name is specified right after 'p:' or 'r:' in kprobe_events, the tracer
+adds an event under tracing/events/kprobes/EVENT; in that directory you can see
+'id', 'enabled', 'format' and 'filter'.
+
+enabled:
+  You can enable/disable the probe by writing 1 or 0 to it.
+
+format:
+  It shows the format of this probe event. It also shows the aliases of the
+ arguments which you specified in kprobe_events.
+
+filter:
+  You can write filtering rules for this event. You can use both alias
+ names and field names in filters.
+
+
+Usage examples
+--------------
+To add a probe as a new event, write a new definition

[RESEND][ PATCH -tip -v9 3/7] kprobes: checks probe address is instruction boudary on x86

2009-06-12 Thread Masami Hiramatsu
Ensure that kprobes are inserted safely by checking whether the specified
address is at the first byte of an instruction on x86.
This is done by decoding the probed function from its head to the probe point.
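
The idea can be sketched in a few lines of userspace C. The example below is
only an illustration of walking from the function head to the probe point;
it uses a made-up toy encoding (toy_insn_length()) in place of the real x86
decoder, and at_insn_boundary() is a hypothetical name, not a function from
this patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical toy encoding standing in for the x86 decoder: the low two
 * bits of the first byte, plus one, give the instruction length (1..4).
 */
static size_t toy_insn_length(const unsigned char *p)
{
	return (size_t)(p[0] & 0x3) + 1;
}

/*
 * Return 1 if `offset` is the first byte of an instruction in
 * func[0..size), decoding from the head of the function; 0 otherwise.
 */
static int at_insn_boundary(const unsigned char *func, size_t size,
			    size_t offset)
{
	size_t pos = 0;

	if (offset >= size)
		return 0;
	while (pos < offset) {
		size_t len = toy_insn_length(func + pos);

		if (len > size - pos)	/* instruction runs past the end */
			return 0;
		pos += len;
	}
	/* We either landed exactly on offset or stepped over it. */
	return pos == offset;
}
```

With variable-length instructions there is no way to validate an address by
looking at it in isolation, which is why the walk has to start at a known
boundary such as the function entry.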

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Acked-by: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Jim Keniston jkeni...@us.ibm.com
Cc: Ingo Molnar mi...@elte.hu
---

 arch/x86/kernel/kprobes.c |   69 +
 1 files changed, 69 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index b5b1848..5341842 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -48,6 +48,7 @@
 #include <linux/preempt.h>
 #include <linux/module.h>
 #include <linux/kdebug.h>
+#include <linux/kallsyms.h>
 
 #include <asm/cacheflush.h>
 #include <asm/desc.h>
@@ -55,6 +56,7 @@
 #include <asm/uaccess.h>
 #include <asm/alternative.h>
 #include <asm/debugreg.h>
+#include <asm/insn.h>
 
 void jprobe_return_end(void);
 
@@ -245,6 +247,71 @@ retry:
}
 }
 
+/* Recover the probed instruction at addr for further analysis. */
+static int recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr)
+{
+	struct kprobe *kp;
+	kp = get_kprobe((void *)addr);
+	if (!kp)
+		return -EINVAL;
+
+	/*
+	 *  Basically, kp->ainsn.insn has an original instruction.
+	 *  However, RIP-relative instruction can not do single-stepping
+	 *  at different place, fix_riprel() tweaks the displacement of
+	 *  that instruction. In that case, we can't recover the instruction
+	 *  from the kp->ainsn.insn.
+	 *
+	 *  On the other hand, kp->opcode has a copy of the first byte of
+	 *  the probed instruction, which is overwritten by int3. And
+	 *  the instruction at kp->addr is not modified by kprobes except
+	 *  for the first byte, we can recover the original instruction
+	 *  from it and kp->opcode.
+	 */
+	memcpy(buf, kp->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
+	buf[0] = kp->opcode;
+	return 0;
+}
+
+/* Dummy buffers for kallsyms_lookup */
+static char __dummy_buf[KSYM_NAME_LEN];
+
+/* Check if paddr is at an instruction boundary */
+static int __kprobes can_probe(unsigned long paddr)
+{
+	int ret;
+	unsigned long addr, offset = 0;
+	struct insn insn;
+	kprobe_opcode_t buf[MAX_INSN_SIZE];
+
+	if (!kallsyms_lookup(paddr, NULL, &offset, NULL, __dummy_buf))
+		return 0;
+
+	/* Decode instructions */
+	addr = paddr - offset;
+	while (addr < paddr) {
+		kernel_insn_init(&insn, (void *)addr);
+		insn_get_opcode(&insn);
+
+		/* Check if the instruction has been modified. */
+		if (OPCODE1(&insn) == BREAKPOINT_INSTRUCTION) {
+			ret = recover_probed_instruction(buf, addr);
+			if (ret)
+				/*
+				 * Another debugging subsystem might insert
+				 * this breakpoint. In that case, we can't
+				 * recover it.
+				 */
+				return 0;
+			kernel_insn_init(&insn, buf);
+		}
+		insn_get_length(&insn);
+		addr += insn.length;
+	}
+
+	return (addr == paddr);
+}
+
 /*
  * Returns non-zero if opcode modifies the interrupt flag.
  */
@@ -360,6 +427,8 @@ static void __kprobes arch_copy_kprobe(struct kprobe *p)
 
 int __kprobes arch_prepare_kprobe(struct kprobe *p)
 {
+	if (!can_probe((unsigned long)p->addr))
+		return -EILSEQ;
 	/* insn: must be on special executable page on x86. */
 	p->ainsn.insn = get_insn_slot();
 	if (!p->ainsn.insn)


-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com


[RESEND][ PATCH -tip -v9 1/7] x86: instruction decoder API

2009-06-12 Thread Masami Hiramatsu
Add an x86 instruction decoder to the arch-specific libraries. This decoder
can decode x86 instructions used in the kernel into prefix, opcode, modrm,
sib, displacement and immediates. It can also report the length of
instructions.

This version introduces instruction attributes for decoding instructions.
The instruction attribute tables are generated from the opcode map file
(x86-opcode-map.txt) by the generator script (gen-insn-attr-x86.awk).

Currently, the opcode maps are based on the opcode maps in the Intel(R) 64 and
IA-32 Architectures Software Developer's Manual Vol.2: Appendix A,
and consist of the following two types of opcode tables.

1-byte/2-byte/3-byte opcode tables, each of which has 256 elements, are
written as below:

 Table: table-name
 Referrer: escaped-name
 opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
  (or)
 opcode: escape # escaped-name
 EndTable

Group opcode tables, each of which has 8 elements, are written as below:

 GrpTable: GrpXXX
 reg:  mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
 EndTable

These opcode maps do NOT include most of the SSE and FP opcodes, because
those opcodes are not used in the kernel.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Signed-off-by: Jim Keniston jkeni...@us.ibm.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Srikar Dronamraju sri...@linux.vnet.ibm.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Andi Kleen a...@linux.intel.com
Cc: Vegard Nossum vegard.nos...@gmail.com
Cc: Avi Kivity a...@redhat.com
Cc: Przemysław Pawełczyk przemys...@pawelczyk.it
---

 arch/x86/include/asm/inat.h|  125 ++
 arch/x86/include/asm/insn.h|  134 ++
 arch/x86/lib/Makefile  |   13 +
 arch/x86/lib/inat.c|   80 
 arch/x86/lib/insn.c|  471 +
 arch/x86/lib/x86-opcode-map.txt|  711 
 arch/x86/scripts/gen-insn-attr-x86.awk |  314 ++
 7 files changed, 1848 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/inat.h
 create mode 100644 arch/x86/include/asm/insn.h
 create mode 100644 arch/x86/lib/inat.c
 create mode 100644 arch/x86/lib/insn.c
 create mode 100644 arch/x86/lib/x86-opcode-map.txt
 create mode 100644 arch/x86/scripts/gen-insn-attr-x86.awk

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
new file mode 100644
index 000..01e079a
--- /dev/null
+++ b/arch/x86/include/asm/inat.h
@@ -0,0 +1,125 @@
+#ifndef _ASM_INAT_INAT_H
+#define _ASM_INAT_INAT_H
+/*
+ * x86 instruction attributes
+ *
+ * Written by Masami Hiramatsu mhira...@redhat.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ */
+#include <linux/types.h>
+
+/* Instruction attributes */
+typedef u32 insn_attr_t;
+
+/*
+ * Internal bits. Don't use bitmasks directly, because these bits are
+ * unstable. You should add checking macros and use that macro in
+ * your code.
+ */
+
+#define INAT_OPCODE_TABLE_SIZE 256
+#define INAT_GROUP_TABLE_SIZE 8
+
+/* Legacy instruction prefixes */
+#define INAT_PFX_OPNDSZ    1   /* 0x66 */ /* LPFX1 */
+#define INAT_PFX_REPNE     2   /* 0xF2 */ /* LPFX2 */
+#define INAT_PFX_REPE      3   /* 0xF3 */ /* LPFX3 */
+#define INAT_PFX_LOCK      4   /* 0xF0 */
+#define INAT_PFX_CS        5   /* 0x2E */
+#define INAT_PFX_DS        6   /* 0x3E */
+#define INAT_PFX_ES        7   /* 0x26 */
+#define INAT_PFX_FS        8   /* 0x64 */
+#define INAT_PFX_GS        9   /* 0x65 */
+#define INAT_PFX_SS        10  /* 0x36 */
+#define INAT_PFX_ADDRSZ    11  /* 0x67 */
+
+#define INAT_LPREFIX_MAX   3
+
+/* Immediate size */
+#define INAT_IMM_BYTE  1
+#define INAT_IMM_WORD  2
+#define INAT_IMM_DWORD 3
+#define INAT_IMM_QWORD 4
+#define INAT_IMM_PTR   5
+#define INAT_IMM_VWORD32   6
+#define INAT_IMM_VWORD 7
+
+/* Legacy prefix */
+#define INAT_PFX_OFFS  0
+#define INAT_PFX_BITS  4
+#define INAT_PFX_MAX   ((1 << INAT_PFX_BITS) - 1)
+#define INAT_PFX_MASK  (INAT_PFX_MAX << INAT_PFX_OFFS)
+/* Escape opcodes */
+#define INAT_ESC_OFFS  (INAT_PFX_OFFS + INAT_PFX_BITS)
+#define INAT_ESC_BITS  2
+#define INAT_ESC_MAX   ((1

[RESEND][ PATCH -tip -v9 5/7] x86: add pt_regs register and stack access APIs

2009-06-12 Thread Masami Hiramatsu
Add the following APIs for accessing registers and stack entries from pt_regs.

- regs_query_register_offset(const char *name)
   Query the offset of the register called "name".

- regs_query_register_name(unsigned offset)
   Query the name of a register by its offset.

- regs_get_register(struct pt_regs *regs, unsigned offset)
   Get the value of a register by its offset.

- regs_within_kernel_stack(struct pt_regs *regs, unsigned long addr)
   Check whether the address is in the kernel stack.

- regs_get_kernel_stack_nth(struct pt_regs *reg, unsigned nth)
   Get Nth entry of the kernel stack. (N >= 0)

- regs_get_argument_nth(struct pt_regs *reg, unsigned nth)
   Get Nth argument at function call. (N >= 0)
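
The stack-range check behind regs_within_kernel_stack() and
regs_get_kernel_stack_nth() can be tried out in userspace with plain
integers. The sketch below is an illustration under assumptions: the 8 KiB
SKETCH_THREAD_SIZE, within_same_stack() and stack_nth_addr() are made up for
this example, and it returns the address of the Nth entry instead of
dereferencing it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stack size for the sketch; the kernel uses THREAD_SIZE. */
#define SKETCH_THREAD_SIZE 8192UL

/* Two addresses are on the same stack if they share the aligned block. */
static int within_same_stack(uintptr_t sp, uintptr_t addr)
{
	return (addr & ~(SKETCH_THREAD_SIZE - 1)) ==
	       (sp & ~(SKETCH_THREAD_SIZE - 1));
}

/*
 * Address of the Nth word above the stack pointer, or 0 if that word
 * would lie outside the stack block that contains sp.
 */
static uintptr_t stack_nth_addr(uintptr_t sp, unsigned int n)
{
	uintptr_t addr = sp + (uintptr_t)n * sizeof(unsigned long);

	return within_same_stack(sp, addr) ? addr : 0;
}
```

Because the check only masks off the low bits, it relies on the stack being
aligned to its own size, which is why a single compare is enough.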


Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Roland McGrath rol...@redhat.com
Cc: linux-a...@vger.kernel.org
---

 arch/x86/include/asm/ptrace.h |  122 +
 arch/x86/kernel/ptrace.c  |   73 +
 2 files changed, 195 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 0f0d908..2fd3ea3 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -7,6 +7,7 @@
 
 #ifdef __KERNEL__
 #include <asm/segment.h>
+#include <asm/page_types.h>
 #endif
 
 #ifndef __ASSEMBLY__
@@ -216,6 +217,127 @@ static inline unsigned long user_stack_pointer(struct pt_regs *regs)
 	return regs->sp;
 }
 
+/* Query offset/name of register from its name/offset */
+extern int regs_query_register_offset(const char *name);
+extern const char *regs_query_register_name(unsigned offset);
+#define MAX_REG_OFFSET (offsetof(struct pt_regs, ss))
+
+/**
+ * regs_get_register() - get register value from its offset
+ * @regs:  pt_regs from which register value is gotten.
+ * @offset:offset number of the register.
+ *
+ * regs_get_register returns the value of a register whose offset from @regs
+ * is @offset. The @offset is the offset of the register in struct pt_regs.
+ * If @offset is bigger than MAX_REG_OFFSET, this returns 0.
+ */
+static inline unsigned long regs_get_register(struct pt_regs *regs,
+					      unsigned offset)
+{
+	if (unlikely(offset > MAX_REG_OFFSET))
+		return 0;
+	return *(unsigned long *)((unsigned long)regs + offset);
+}
+
+/**
+ * regs_within_kernel_stack() - check the address in the stack
+ * @regs:  pt_regs which contains kernel stack pointer.
+ * @addr:  address which is checked.
+ *
+ * regs_within_kernel_stack() checks @addr is within the kernel stack page(s).
+ * If @addr is within the kernel stack, it returns true. If not, returns false.
+ */
+static inline int regs_within_kernel_stack(struct pt_regs *regs,
+					   unsigned long addr)
+{
+	return ((addr & ~(THREAD_SIZE - 1))  ==
+		(kernel_stack_pointer(regs) & ~(THREAD_SIZE - 1)));
+}
+
+/**
+ * regs_get_kernel_stack_nth() - get Nth entry of the stack
+ * @regs:  pt_regs which contains kernel stack pointer.
+ * @n: stack entry number.
+ *
+ * regs_get_kernel_stack_nth() returns @n th entry of the kernel stack which
+ * is specified by @regs. If the @n th entry is NOT in the kernel stack,
+ * this returns 0.
+ */
+static inline unsigned long regs_get_kernel_stack_nth(struct pt_regs *regs,
+						      unsigned n)
+{
+	unsigned long *addr = (unsigned long *)kernel_stack_pointer(regs);
+	addr += n;
+	if (regs_within_kernel_stack(regs, (unsigned long)addr))
+		return *addr;
+	else
+		return 0;
+}
+
+/**
+ * regs_get_argument_nth() - get Nth argument at function call
+ * @regs:  pt_regs which contains registers at function entry.
+ * @n: argument number.
+ *
+ * regs_get_argument_nth() returns @n th argument of a function call.
+ * Since usually the kernel stack will be changed right after function entry,
+ * you must use this at function entry. If the @n th entry is NOT in the
+ * kernel stack or pt_regs, this returns 0.
+ */
+#ifdef CONFIG_X86_32
+#define NR_REGPARMS 3
+static inline unsigned long regs_get_argument_nth(struct pt_regs *regs,
+						  unsigned n)
+{
+	if (n < NR_REGPARMS) {
+		switch (n) {
+		case 0:
+			return regs->ax;
+		case 1:
+			return regs->dx;
+		case 2:
+			return regs->cx;
+		}
+		return 0;
+	} else {
+		/*
+		 * The typical case: arg n is on the stack.
+		 * (Note: stack[0] = return address, so skip it)
+		 */
+		return

[RESEND][ PATCH -tip -v9 7/7] tracing: add kprobe-based event tracer

2009-06-12 Thread Masami Hiramatsu
Add kprobes-based event tracer on ftrace.

This tracer is similar to the events tracer, which is based on the Tracepoint
infrastructure. Instead of Tracepoint, this tracer is based on kprobes (kprobe
and kretprobe), so it can probe wherever kprobes can probe (that is, anywhere
in function bodies except for __kprobes functions).

Similar to the events tracer, this tracer doesn't need to be activated via
current_tracer; instead, just set probe points via
/sys/kernel/debug/tracing/kprobe_events. You can also set filters on each
probe event via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter.

This tracer supports the following probe arguments for each probe.

  %REG              : Fetch register REG
  sN                : Fetch Nth entry of stack (N >= 0)
  @ADDR             : Fetch memory at ADDR (ADDR should be in kernel)
  @SYM[+|-offs]     : Fetch memory at SYM +|- offs (SYM should be a data symbol)
  aN                : Fetch function argument. (N >= 0)
  rv                : Fetch return value.
  ra                : Fetch return address.
  +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.

See Documentation/trace/kprobes.txt for details.

Signed-off-by: Masami Hiramatsu mhira...@redhat.com
Acked-by: Ananth N Mavinakayanahalli ana...@in.ibm.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Steven Rostedt rost...@goodmis.org
Cc: Ingo Molnar mi...@elte.hu
Cc: Frederic Weisbecker fweis...@gmail.com
Cc: Tom Zanussi tzanu...@gmail.com
---

 Documentation/trace/kprobes.txt  |  138 
 kernel/trace/Kconfig |   11 
 kernel/trace/Makefile|1 
 kernel/trace/trace.h |   22 +
 kernel/trace/trace_event_types.h |   20 +
 kernel/trace/trace_kprobe.c  | 1183 ++
 6 files changed, 1375 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/trace/kprobes.txt
 create mode 100644 kernel/trace/trace_kprobe.c

diff --git a/Documentation/trace/kprobes.txt b/Documentation/trace/kprobes.txt
new file mode 100644
index 000..3a90ebb
--- /dev/null
+++ b/Documentation/trace/kprobes.txt
@@ -0,0 +1,138 @@
+ Kprobe-based Event Tracer
+ =========================
+
+ Documentation is written by Masami Hiramatsu
+
+
+Overview
+--------
+This tracer is similar to the events tracer, which is based on the Tracepoint
+infrastructure. Instead of Tracepoint, this tracer is based on kprobes (kprobe
+and kretprobe), so it can probe wherever kprobes can probe (that is, anywhere
+in function bodies except for __kprobes functions).
+
+Unlike the function tracer, this tracer can probe instructions inside of
+kernel functions. It allows you to check which instruction has been executed.
+
+Unlike the Tracepoint based events tracer, this tracer can add and remove
+probe points on the fly.
+
+Similar to the events tracer, this tracer doesn't need to be activated via
+current_tracer; instead, just set probe points via
+/sys/kernel/debug/tracing/kprobe_events. You can also set filters on each
+probe event via /sys/kernel/debug/tracing/events/kprobes/EVENT/filter.
+
+
+Synopsis of kprobe_events
+-
+  p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS]: set a probe
+  r[:EVENT] SYMBOL[+0] [FETCHARGS] : set a return probe
+
+ EVENT : Event name
+ SYMBOL[+offs|-offs]   : Symbol+offset where the probe is inserted
+ MEMADDR   : Address where the probe is inserted
+
+ FETCHARGS : Arguments
+  %REG : Fetch register REG
+  sN   : Fetch Nth entry of stack (N = 0)
+  @ADDR: Fetch memory at ADDR (ADDR should be in kernel)
+  @SYM[+|-offs]: Fetch memory at SYM +|- offs (SYM should be a data 
symbol)
+  aN   : Fetch function argument. (N = 0)(*)
+  rv   : Fetch return value.(**)
+  ra   : Fetch return address.(**)
+  +|-offs(FETCHARG) : fetch memory at FETCHARG +|- offs address.(***)
+
+  (*) aN may not be correct on asmlinkage functions or in the middle of a
+  function body.
+  (**) only for return probe.
+  (***) this is useful for fetching a field of data structures.
+
+
+Per-Probe Event Filtering
+-------------------------
+ The per-probe event filtering feature allows you to set a different filter on
+each probe and shows which arguments will appear in the trace buffer. If an
+event name is specified right after 'p:' or 'r:' in kprobe_events, the tracer
+adds an event under tracing/events/kprobes/EVENT; in that directory you can see
+'id', 'enabled', 'format' and 'filter'.
+
+enabled:
+  You can enable/disable the probe by writing 1 or 0 to it.
+
+format:
+  It shows the format of this probe event. It also shows the aliases of the
+ arguments which you specified in kprobe_events.
+
+filter:
+  You can write filtering rules for this event. You can use both alias
+ names and field names to describe filters.
+
+
+Usage examples
+--------------
+To add a probe as a new event, write a new definition to kprobe_events
+as below.
+
+  echo p:myprobe do_sys_open a0 a1 a2 a3 > /sys/kernel/debug

[RESEND][ PATCH -tip -v9 0/7] tracing: kprobe-based event tracer and x86 instruction decoder

2009-06-12 Thread Masami Hiramatsu
;
field:unsigned char common_flags;   offset:2;   size:1;
field:unsigned char common_preempt_count;   offset:3;   size:1;
field:int common_pid;   offset:4;   size:4;
field:int common_tgid;  offset:8;   size:4;

field: unsigned long ip;offset:16;tsize:8;
field: int nargs;   offset:24;tsize:4;
field: unsigned long arg0;  offset:32;tsize:8;
field: unsigned long arg1;  offset:40;tsize:8;
field: unsigned long arg2;  offset:48;tsize:8;
field: unsigned long arg3;  offset:56;tsize:8;

alias: a0;  original: arg0;
alias: a1;  original: arg1;
alias: a2;  original: arg2;
alias: a3;  original: arg3;

print fmt: "%lx: 0x%lx 0x%lx 0x%lx 0x%lx", ip, arg0, arg1, arg2, arg3


 You can see that the event has 4 arguments and alias expressions
corresponding to it.

  echo > /sys/kernel/debug/tracing/kprobe_events

 This clears all probe points. You can see the traced information via
/sys/kernel/debug/tracing/trace.

  cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
           <...>-1447  [001] 1038282.286875: do_sys_open+0x0/0xd6: 0x3 0x7fffd1ec4440 0x8000 0x0
           <...>-1447  [001] 1038282.286878: sys_openat+0xc/0xe <- do_sys_open: 0xfffe 0x81367a3a
           <...>-1447  [001] 1038282.286885: do_sys_open+0x0/0xd6: 0xff9c 0x40413c 0x8000 0x1b6
           <...>-1447  [001] 1038282.286915: sys_open+0x1b/0x1d <- do_sys_open: 0x3 0x81367a3a
           <...>-1447  [001] 1038282.286969: do_sys_open+0x0/0xd6: 0xff9c 0x4041c6 0x98800 0x10
           <...>-1447  [001] 1038282.286976: sys_open+0x1b/0x1d <- do_sys_open: 0x3 0x81367a3a


 Each line shows when the kernel hits a probe, and "<- SYMBOL" means the kernel
returns from SYMBOL (e.g. "sys_open+0x1b/0x1d <- do_sys_open" means the kernel
returns from do_sys_open to sys_open+0x1b).


Thank you,

---

Masami Hiramatsu (7):
  tracing: add kprobe-based event tracer
  tracing: ftrace dynamic ftrace_event_call support
  x86: add pt_regs register and stack access APIs
  kprobes: cleanup fix_riprel() using insn decoder on x86
  kprobes: checks probe address is instruction boudary on x86
  x86: x86 instruction decoder build-time selftest
  x86: instruction decoder API


 Documentation/trace/kprobes.txt|  138 
 arch/x86/Kconfig.debug |9 
 arch/x86/Makefile  |3 
 arch/x86/include/asm/inat.h|  127 +++
 arch/x86/include/asm/insn.h|  136 
 arch/x86/include/asm/ptrace.h  |  122 +++
 arch/x86/kernel/kprobes.c  |  197 ++---
 arch/x86/kernel/ptrace.c   |   73 ++
 arch/x86/lib/Makefile  |   13 
 arch/x86/lib/inat.c|   82 ++
 arch/x86/lib/insn.c|  473 +
 arch/x86/lib/x86-opcode-map.txt|  711 +++
 arch/x86/scripts/Makefile  |   19 +
 arch/x86/scripts/distill.awk   |   42 +
 arch/x86/scripts/gen-insn-attr-x86.awk |  314 
 arch/x86/scripts/test_get_len.c|   99 +++
 arch/x86/scripts/user_include.h|   49 +
 include/linux/ftrace_event.h   |   13 
 include/trace/ftrace.h |   22 -
 kernel/trace/Kconfig   |   11 
 kernel/trace/Makefile  |1 
 kernel/trace/trace.h   |   22 +
 kernel/trace/trace_event_types.h   |   20 +
 kernel/trace/trace_events.c|   70 +-
 kernel/trace/trace_export.c|   27 -
 kernel/trace/trace_kprobe.c| 1183 
 26 files changed, 3824 insertions(+), 152 deletions(-)
 create mode 100644 Documentation/trace/kprobes.txt
 create mode 100644 arch/x86/include/asm/inat.h
 create mode 100644 arch/x86/include/asm/insn.h
 create mode 100644 arch/x86/lib/inat.c
 create mode 100644 arch/x86/lib/insn.c
 create mode 100644 arch/x86/lib/x86-opcode-map.txt
 create mode 100644 arch/x86/scripts/Makefile
 create mode 100644 arch/x86/scripts/distill.awk
 create mode 100644 arch/x86/scripts/gen-insn-attr-x86.awk
 create mode 100644 arch/x86/scripts/test_get_len.c
 create mode 100644 arch/x86/scripts/user_include.h
 create mode 100644 kernel/trace/trace_kprobe.c

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RESEND][ PATCH -tip -v9 6/7] tracing: ftrace dynamic ftrace_event_call support

2009-06-12 Thread Masami Hiramatsu
Add dynamic ftrace_event_call support to ftrace. Trace engines can add new
ftrace_event_calls to ftrace on the fly. Each operator function of a call
takes an ftrace_event_call data structure as an argument, because these
functions may be shared among several ftrace_event_calls.
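
As a rough userspace illustration of why the callbacks take the call as an
argument (this is not code from the patch; struct event_call, shared_reg(),
shared_unreg() and init_event() are hypothetical names invented for the
sketch), one pair of functions can serve any number of events created at
run time, because each invocation is told which event it is acting on:

```c
/* Hypothetical, simplified stand-in for struct ftrace_event_call. */
struct event_call {
	const char *name;
	int enabled;
	int (*regfunc)(struct event_call *call);
	void (*unregfunc)(struct event_call *call);
};

/* One register callback shared by every dynamically created event. */
static int shared_reg(struct event_call *call)
{
	call->enabled = 1;
	return 0;
}

/* One unregister callback shared by every dynamically created event. */
static void shared_unreg(struct event_call *call)
{
	call->enabled = 0;
}

/* Fill in a new event at run time; both callbacks are the shared ones. */
static void init_event(struct event_call *call, const char *name)
{
	call->name = name;
	call->enabled = 0;
	call->regfunc = shared_reg;
	call->unregfunc = shared_unreg;
}
```

With the old (void) signatures, each event would need its own generated
function to know which event it belongs to, which is workable for
compile-time TRACE_EVENT() definitions but not for events added on the fly.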

Signed-off-by: Masami Hiramatsu <mhira...@redhat.com>
Cc: Steven Rostedt <rost...@goodmis.org>
Cc: Ingo Molnar <mi...@elte.hu>
Cc: Tom Zanussi <tzanu...@gmail.com>
Cc: Frederic Weisbecker <fweis...@gmail.com>
---

 include/linux/ftrace_event.h |   13 +---
 include/trace/ftrace.h   |   22 +++--
 kernel/trace/trace_events.c  |   70 --
 kernel/trace/trace_export.c  |   27 
 4 files changed, 85 insertions(+), 47 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 5c093ff..f7733b6 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -108,12 +108,13 @@ struct ftrace_event_call {
    struct dentry           *dir;
    struct trace_event      *event;
    int                     enabled;
-   int                     (*regfunc)(void);
-   void                    (*unregfunc)(void);
+   int                     (*regfunc)(struct ftrace_event_call *);
+   void                    (*unregfunc)(struct ftrace_event_call *);
    int                     id;
-   int                     (*raw_init)(void);
-   int                     (*show_format)(struct trace_seq *s);
-   int                     (*define_fields)(void);
+   int                     (*raw_init)(struct ftrace_event_call *);
+   int                     (*show_format)(struct ftrace_event_call *,
+                                          struct trace_seq *);
+   int                     (*define_fields)(struct ftrace_event_call *);
    struct list_head        fields;
    int                     filter_active;
    void                    *filter;
@@ -138,6 +139,8 @@ extern int filter_current_check_discard(struct ftrace_event_call *call,
 
 extern int trace_define_field(struct ftrace_event_call *call, char *type,
  char *name, int offset, int size, int is_signed);
+extern int trace_add_event_call(struct ftrace_event_call *call);
+extern void trace_remove_event_call(struct ftrace_event_call *call);
 
 #define is_signed_type(type)   (((type)(-1)) < 0)
 
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 1867553..d696580 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -147,7 +147,8 @@
 #undef TRACE_EVENT
 #define TRACE_EVENT(call, proto, args, tstruct, func, print)   \
 static int \
-ftrace_format_##call(struct trace_seq *s)  \
+ftrace_format_##call(struct ftrace_event_call *event_call, \
+struct trace_seq *s)   \
 {  \
struct ftrace_raw_##call field __attribute__((unused)); \
int ret = 0;\
@@ -289,10 +290,9 @@ ftrace_raw_output_##call(struct trace_iterator *iter, int flags)   \
 #undef TRACE_EVENT
 #define TRACE_EVENT(call, proto, args, tstruct, func, print)   \
 int\
-ftrace_define_fields_##call(void)  \
+ftrace_define_fields_##call(struct ftrace_event_call *event_call)  \
 {  \
struct ftrace_raw_##call field; \
-   struct ftrace_event_call *event_call = &event_##call;   \
int ret;\
\
__common_field(int, type, 1);   \
@@ -355,7 +355,7 @@ static inline int ftrace_get_offsets_##call(    \
  * event_trace_printk(_RET_IP_, call:  fmt);
  * }
  *
- * static int ftrace_reg_event_call(void)
+ * static int ftrace_reg_event_call(struct ftrace_event_call *unused)
  * {
  * int ret;
  *
@@ -366,7 +366,7 @@ static inline int ftrace_get_offsets_##call(    \
  * return ret;
  * }
  *
- * static void ftrace_unreg_event_call(void)
+ * static void ftrace_unreg_event_call(struct ftrace_event_call *unused)
  * {
  * unregister_trace_call(ftrace_event_call);
  * }
@@ -399,7 +399,7 @@ static inline int ftrace_get_offsets_##call(    \
  * trace_current_buffer_unlock_commit(event, irq_flags, pc);
  * }
  *
- * static int ftrace_raw_reg_event_call(void)
+ * static int ftrace_raw_reg_event_call(struct ftrace_event_call *unused)
  * {
  * int ret

[RESEND][ PATCH -tip -v9 4/7] kprobes: cleanup fix_riprel() using insn decoder on x86

2009-06-12 Thread Masami Hiramatsu
Clean up fix_riprel() in arch/x86/kernel/kprobes.c by using the x86 instruction
decoder.
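
For readers unfamiliar with the problem fix_riprel() addresses, here is a
minimal userspace sketch, not taken from the patch. It assumes the usual
x86-64 encoding rule that a ModRM byte with mod == 00 and r/m == 101 selects
disp32(%rip) addressing, and it shows how a displacement could be recomputed
when an instruction is copied to another address. The helper names
modrm_is_rip_relative() and adjust_disp() are hypothetical.

```c
#include <stdint.h>

/* In 64-bit mode, ModRM with mod == 00 and r/m == 101 means disp32(%rip). */
static int modrm_is_rip_relative(uint8_t modrm)
{
	return (modrm & 0xc7) == 0x05;
}

/*
 * Hypothetical helper: recompute a displacement so that an instruction
 * copied from orig_addr to copy_addr still refers to the same absolute
 * target. The instruction length cancels out, so only the distance
 * between the two locations matters.
 */
static int64_t adjust_disp(int64_t disp, uint64_t orig_addr, uint64_t copy_addr)
{
	return (int64_t)(orig_addr - copy_addr) + disp;
}
```

A real caller would also have to check that the adjusted value still fits in
a signed 32-bit displacement before writing it back into the copied
instruction.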

Signed-off-by: Masami Hiramatsu <mhira...@redhat.com>
Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com>
Cc: Jim Keniston <jkeni...@us.ibm.com>
Cc: Ingo Molnar <mi...@elte.hu>
---

 arch/x86/kernel/kprobes.c |  128 -
 1 files changed, 23 insertions(+), 105 deletions(-)

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index 5341842..b77e050 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -109,50 +109,6 @@ static const u32 twobyte_is_boostable[256 / 32] = {
/*  --- */
/*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
 };
-static const u32 onebyte_has_modrm[256 / 32] = {
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-   /*  --- */
-   W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 00 */
-   W(0x10, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 10 */
-   W(0x20, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 20 */
-   W(0x30, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) , /* 30 */
-   W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 40 */
-   W(0x50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 50 */
-   W(0x60, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0) | /* 60 */
-   W(0x70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 70 */
-   W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */
-   W(0x90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 90 */
-   W(0xa0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* a0 */
-   W(0xb0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* b0 */
-   W(0xc0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* c0 */
-   W(0xd0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */
-   W(0xe0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* e0 */
-   W(0xf0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1)   /* f0 */
-   /*  --- */
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-};
-static const u32 twobyte_has_modrm[256 / 32] = {
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-   /*  --- */
-   W(0x00, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1) | /* 0f */
-   W(0x10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0) , /* 1f */
-   W(0x20, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1) | /* 2f */
-   W(0x30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 3f */
-   W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 4f */
-   W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 5f */
-   W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 6f */
-   W(0x70, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1) , /* 7f */
-   W(0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 8f */
-   W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 9f */
-   W(0xa0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1) | /* af */
-   W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1) , /* bf */
-   W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0) | /* cf */
-   W(0xd0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* df */
-   W(0xe0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* ef */
-   W(0xf0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0)   /* ff */
-   /*  --- */
-   /*  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  */
-};
 #undef W
 
 struct kretprobe_blackpoint kretprobe_blacklist[] = {
@@ -345,68 +301,30 @@ static int __kprobes is_IF_modifier(kprobe_opcode_t *insn)
 static void __kprobes fix_riprel(struct kprobe *p)
 {
 #ifdef CONFIG_X86_64
-   u8 *insn = p->ainsn.insn;
-   s64 disp;
-   int need_modrm;
-
-   /* Skip legacy instruction prefixes.  */
-   while (1) {
-   switch (*insn) {
-   case 0x66:
-   case 0x67:
-   case 0x2e:
-   case 0x3e:
-   case 0x26:
-   case 0x64:
-   case 0x65:
-   case 0x36:
-   case 0xf0:
-   case 0xf3:
-   case 0xf2:
-   ++insn;
-   continue;
-   }
-   break;
-   }
+   struct insn insn;
+   kernel_insn_init(&insn, p->ainsn.insn);
 
-   /* Skip REX instruction prefix.  */
-   if (is_REX_prefix(insn))
-   ++insn;
-
-   if (*insn == 0x0f) {
-   /* Two-byte opcode.  */
-   ++insn

[PATCH -tip v9 1/7] x86: instruction decoder API

2009-06-01 Thread Masami Hiramatsu
Add x86 instruction decoder to arch-specific libraries. This decoder
can decode x86 instructions used in kernel into prefix, opcode, modrm,
sib, displacement and immediates. This can also show the length of
instructions.
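
To make the first decoding step concrete, here is a small userspace sketch of
legacy-prefix scanning. It is not the decoder from this patch: the function
names are hypothetical, and the byte values are the ones given in the
"Legacy instruction prefixes" comments of inat.h further down in this
message.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if b is one of the legacy prefix bytes listed in inat.h. */
static int is_legacy_prefix(uint8_t b)
{
	switch (b) {
	case 0x66: case 0x67:			/* operand size, address size */
	case 0x2e: case 0x3e: case 0x26:	/* CS, DS, ES overrides */
	case 0x64: case 0x65: case 0x36:	/* FS, GS, SS overrides */
	case 0xf0: case 0xf2: case 0xf3:	/* LOCK, REPNE, REPE */
		return 1;
	}
	return 0;
}

/* Count how many leading bytes of buf[0..len) are legacy prefixes. */
static size_t count_legacy_prefixes(const uint8_t *buf, size_t len)
{
	size_t n = 0;

	while (n < len && is_legacy_prefix(buf[n]))
		n++;
	return n;
}
```

For the byte sequence 66 f0 0f b1 0a this reports two prefix bytes; a full
decoder would then go on to REX, opcode, ModRM, SIB, displacement and
immediate in turn.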

This version introduces instruction attributes for decoding instructions.
The instruction attribute tables are generated from the opcode map file
(x86-opcode-map.txt) by the generator script (gen-insn-attr-x86.awk).

Currently, the opcode maps are based on the opcode maps in Intel(R) 64 and
IA-32 Architectures Software Developer's Manual Vol.2: Appendix A,
and consist of the following two types of opcode tables.

1-byte/2-byte/3-byte opcode tables, which have 256 elements each, are
written as below:

 Table: table-name
 Referrer: escaped-name
 opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
  (or)
 opcode: escape # escaped-name
 EndTable

Group opcode tables, which have 8 elements each, are written as below:

 GrpTable: GrpXXX
 reg:  mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
 EndTable
 EndTable

These opcode maps do NOT include most of SSE and FP opcodes, because
those opcodes are not used in the kernel.

Changes from v6.1:
- fix patch title.

Signed-off-by: Masami Hiramatsu <mhira...@redhat.com>
Signed-off-by: Jim Keniston <jkeni...@us.ibm.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Steven Rostedt <rost...@goodmis.org>
Cc: Ananth N Mavinakayanahalli <ana...@in.ibm.com>
Cc: Ingo Molnar <mi...@elte.hu>
Cc: Frederic Weisbecker <fweis...@gmail.com>
Cc: Andi Kleen <a...@linux.intel.com>
Cc: Vegard Nossum <vegard.nos...@gmail.com>
Cc: Avi Kivity <a...@redhat.com>
Cc: Przemysław Pawełczyk <przemys...@pawelczyk.it>
---

 arch/x86/include/asm/inat.h|  125 ++
 arch/x86/include/asm/insn.h|  134 ++
 arch/x86/lib/Makefile  |   13 +
 arch/x86/lib/inat.c|   80 
 arch/x86/lib/insn.c|  471 +
 arch/x86/lib/x86-opcode-map.txt|  711 
 arch/x86/scripts/gen-insn-attr-x86.awk |  314 ++
 7 files changed, 1848 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/inat.h
 create mode 100644 arch/x86/include/asm/insn.h
 create mode 100644 arch/x86/lib/inat.c
 create mode 100644 arch/x86/lib/insn.c
 create mode 100644 arch/x86/lib/x86-opcode-map.txt
 create mode 100644 arch/x86/scripts/gen-insn-attr-x86.awk

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
new file mode 100644
index 000..01e079a
--- /dev/null
+++ b/arch/x86/include/asm/inat.h
@@ -0,0 +1,125 @@
+#ifndef _ASM_INAT_INAT_H
+#define _ASM_INAT_INAT_H
+/*
+ * x86 instruction attributes
+ *
+ * Written by Masami Hiramatsu <mhira...@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ */
+#include <linux/types.h>
+
+/* Instruction attributes */
+typedef u32 insn_attr_t;
+
+/*
+ * Internal bits. Don't use bitmasks directly, because these bits are
+ * unstable. You should add checking macros and use that macro in
+ * your code.
+ */
+
+#define INAT_OPCODE_TABLE_SIZE 256
+#define INAT_GROUP_TABLE_SIZE 8
+
+/* Legacy instruction prefixes */
+#define INAT_PFX_OPNDSZ    1   /* 0x66 */ /* LPFX1 */
+#define INAT_PFX_REPNE     2   /* 0xF2 */ /* LPFX2 */
+#define INAT_PFX_REPE      3   /* 0xF3 */ /* LPFX3 */
+#define INAT_PFX_LOCK      4   /* 0xF0 */
+#define INAT_PFX_CS        5   /* 0x2E */
+#define INAT_PFX_DS        6   /* 0x3E */
+#define INAT_PFX_ES        7   /* 0x26 */
+#define INAT_PFX_FS        8   /* 0x64 */
+#define INAT_PFX_GS        9   /* 0x65 */
+#define INAT_PFX_SS        10  /* 0x36 */
+#define INAT_PFX_ADDRSZ    11  /* 0x67 */
+
+#define INAT_LPREFIX_MAX   3
+
+/* Immediate size */
+#define INAT_IMM_BYTE      1
+#define INAT_IMM_WORD      2
+#define INAT_IMM_DWORD     3
+#define INAT_IMM_QWORD     4
+#define INAT_IMM_PTR       5
+#define INAT_IMM_VWORD32   6
+#define INAT_IMM_VWORD     7
+
+/* Legacy prefix */
+#define INAT_PFX_OFFS  0
+#define INAT_PFX_BITS  4
+#define INAT_PFX_MAX   ((1 << INAT_PFX_BITS) - 1)
+#define INAT_PFX_MASK  (INAT_PFX_MAX << INAT_PFX_OFFS)
+/* Escape opcodes */
+#define INAT_ESC_OFFS  (INAT_PFX_OFFS + INAT_PFX_BITS)
+#define INAT_ESC_BITS  2
+#define INAT_ESC_MAX   ((1

[PATCH -tip v9 0/7] tracing: kprobe-based event tracer and x86 instruction decoder

2009-06-01 Thread Masami Hiramatsu
;
field:unsigned char common_preempt_count;   offset:3;   size:1;
field:int common_pid;   offset:4;   size:4;
field:int common_tgid;  offset:8;   size:4;

field: unsigned long ip;offset:16;tsize:8;
field: int nargs;   offset:24;tsize:4;
field: unsigned long arg0;  offset:32;tsize:8;
field: unsigned long arg1;  offset:40;tsize:8;
field: unsigned long arg2;  offset:48;tsize:8;
field: unsigned long arg3;  offset:56;tsize:8;

alias: a0;  original: arg0;
alias: a1;  original: arg1;
alias: a2;  original: arg2;
alias: a3;  original: arg3;

print fmt: "%lx: 0x%lx 0x%lx 0x%lx 0x%lx", ip, arg0, arg1, arg2, arg3


 You can see that the event has 4 arguments and alias expressions
corresponding to it.

  echo > /sys/kernel/debug/tracing/kprobe_events

 This clears all probe points. You can see the traced information via
/sys/kernel/debug/tracing/trace.

  cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
           <...>-1447  [001] 1038282.286875: do_sys_open+0x0/0xd6: 0x3 0x7fffd1ec4440 0x8000 0x0
           <...>-1447  [001] 1038282.286878: sys_openat+0xc/0xe <- do_sys_open: 0xfffe 0x81367a3a
           <...>-1447  [001] 1038282.286885: do_sys_open+0x0/0xd6: 0xff9c 0x40413c 0x8000 0x1b6
           <...>-1447  [001] 1038282.286915: sys_open+0x1b/0x1d <- do_sys_open: 0x3 0x81367a3a
           <...>-1447  [001] 1038282.286969: do_sys_open+0x0/0xd6: 0xff9c 0x4041c6 0x98800 0x10
           <...>-1447  [001] 1038282.286976: sys_open+0x1b/0x1d <- do_sys_open: 0x3 0x81367a3a


 Each line shows when the kernel hits a probe, and "<- SYMBOL" means the kernel
returns from SYMBOL (e.g. "sys_open+0x1b/0x1d <- do_sys_open" means the kernel
returns from do_sys_open to sys_open+0x1b).


Thank you,

---

Masami Hiramatsu (7):
  tracing: add kprobe-based event tracer
  tracing: ftrace dynamic ftrace_event_call support
  x86: add pt_regs register and stack access APIs
  kprobes: cleanup fix_riprel() using insn decoder on x86
  kprobes: checks probe address is instruction boudary on x86
  x86: x86 instruction decoder build-time selftest
  x86: instruction decoder API


 Documentation/trace/kprobes.txt|  138 
 arch/x86/Kconfig.debug |9 
 arch/x86/Makefile  |3 
 arch/x86/include/asm/inat.h|  127 +++
 arch/x86/include/asm/insn.h|  136 
 arch/x86/include/asm/ptrace.h  |  122 +++
 arch/x86/kernel/kprobes.c  |  197 ++---
 arch/x86/kernel/ptrace.c   |   73 ++
 arch/x86/lib/Makefile  |   13 
 arch/x86/lib/inat.c|   82 ++
 arch/x86/lib/insn.c|  473 +
 arch/x86/lib/x86-opcode-map.txt|  711 +++
 arch/x86/scripts/Makefile  |   19 +
 arch/x86/scripts/distill.awk   |   42 +
 arch/x86/scripts/gen-insn-attr-x86.awk |  314 
 arch/x86/scripts/test_get_len.c|   99 +++
 arch/x86/scripts/user_include.h|   49 +
 include/linux/ftrace_event.h   |   13 
 include/trace/ftrace.h |   22 -
 kernel/trace/Kconfig   |   11 
 kernel/trace/Makefile  |1 
 kernel/trace/trace.h   |   22 +
 kernel/trace/trace_event_types.h   |   20 +
 kernel/trace/trace_events.c|   70 +-
 kernel/trace/trace_export.c|   27 -
 kernel/trace/trace_kprobe.c| 1183 
 26 files changed, 3824 insertions(+), 152 deletions(-)
 create mode 100644 Documentation/trace/kprobes.txt
 create mode 100644 arch/x86/include/asm/inat.h
 create mode 100644 arch/x86/include/asm/insn.h
 create mode 100644 arch/x86/lib/inat.c
 create mode 100644 arch/x86/lib/insn.c
 create mode 100644 arch/x86/lib/x86-opcode-map.txt
 create mode 100644 arch/x86/scripts/Makefile
 create mode 100644 arch/x86/scripts/distill.awk
 create mode 100644 arch/x86/scripts/gen-insn-attr-x86.awk
 create mode 100644 arch/x86/scripts/test_get_len.c
 create mode 100644 arch/x86/scripts/user_include.h
 create mode 100644 kernel/trace/trace_kprobe.c

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhira...@redhat.com


[PATCH -tip v9 6/7] tracing: ftrace dynamic ftrace_event_call support

2009-06-01 Thread Masami Hiramatsu
Add dynamic ftrace_event_call support to ftrace. Trace engines can add new
ftrace_event_calls to ftrace on the fly. Each operator function of a call
takes an ftrace_event_call data structure as an argument, because these
functions may be shared among several ftrace_event_calls.

Changes from v8:
 - Lock event_mutex in trace_add/remove_event_call().
 - Add __trace_add/remove_event_call() for internal use.
 - Rename dummy variables to unused.

Signed-off-by: Masami Hiramatsu <mhira...@redhat.com>
Cc: Steven Rostedt <rost...@goodmis.org>
Cc: Ingo Molnar <mi...@elte.hu>
Cc: Tom Zanussi <tzanu...@gmail.com>
Cc: Frederic Weisbecker <fweis...@gmail.com>
---

 include/linux/ftrace_event.h |   13 +---
 include/trace/ftrace.h   |   22 +++--
 kernel/trace/trace_events.c  |   70 --
 kernel/trace/trace_export.c  |   27 
 4 files changed, 85 insertions(+), 47 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index bbf40f6..e25f3a4 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -108,12 +108,13 @@ struct ftrace_event_call {
    struct dentry           *dir;
    struct trace_event      *event;
    int                     enabled;
-   int                     (*regfunc)(void);
-   void                    (*unregfunc)(void);
+   int                     (*regfunc)(struct ftrace_event_call *);
+   void                    (*unregfunc)(struct ftrace_event_call *);
    int                     id;
-   int                     (*raw_init)(void);
-   int                     (*show_format)(struct trace_seq *s);
-   int                     (*define_fields)(void);
+   int                     (*raw_init)(struct ftrace_event_call *);
+   int                     (*show_format)(struct ftrace_event_call *,
+                                          struct trace_seq *);
+   int                     (*define_fields)(struct ftrace_event_call *);
    struct list_head        fields;
    int                     filter_active;
    void                    *filter;
@@ -138,6 +139,8 @@ extern int filter_current_check_discard(struct ftrace_event_call *call,
 
 extern int trace_define_field(struct ftrace_event_call *call, char *type,
  char *name, int offset, int size, int is_signed);
+extern int trace_add_event_call(struct ftrace_event_call *call);
+extern void trace_remove_event_call(struct ftrace_event_call *call);
 
 #define is_signed_type(type)   (((type)(-1)) < 0)
 
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index b4ec83a..e163e4b 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -229,7 +229,8 @@ ftrace_raw_output_##call(struct trace_iterator *iter, int flags)    \
 #undef TRACE_EVENT
 #define TRACE_EVENT(call, proto, args, tstruct, func, print)   \
 static int \
-ftrace_format_##call(struct trace_seq *s)  \
+ftrace_format_##call(struct ftrace_event_call *event_call, \
+struct trace_seq *s)   \
 {  \
struct ftrace_raw_##call field __attribute__((unused)); \
int ret = 0;\
@@ -269,10 +270,9 @@ ftrace_format_##call(struct trace_seq *s)  \
 #undef TRACE_EVENT
 #define TRACE_EVENT(call, proto, args, tstruct, func, print)   \
 int\
-ftrace_define_fields_##call(void)  \
+ftrace_define_fields_##call(struct ftrace_event_call *event_call)  \
 {  \
struct ftrace_raw_##call field; \
-   struct ftrace_event_call *event_call = &event_##call;   \
int ret;\
\
__common_field(int, type, 1);   \
@@ -298,7 +298,7 @@ ftrace_define_fields_##call(void)   \
  * event_trace_printk(_RET_IP_, call:  fmt);
  * }
  *
- * static int ftrace_reg_event_call(void)
+ * static int ftrace_reg_event_call(struct ftrace_event_call *unused)
  * {
  * int ret;
  *
@@ -309,7 +309,7 @@ ftrace_define_fields_##call(void)   \
  * return ret;
  * }
  *
- * static void ftrace_unreg_event_call(void)
+ * static void ftrace_unreg_event_call(struct ftrace_event_call *unused)
  * {
  * unregister_trace_call(ftrace_event_call);
  * }
@@ -342,7 +342,7 @@ ftrace_define_fields_##call(void
