[tip:perf/urgent] perf, nmi: Fix unknown NMI warning

2014-02-21 Thread tip-bot for Markus Metzger
Commit-ID:  a3ef2229c94ff70998724cb64b9cb4c77db9e950
Gitweb: http://git.kernel.org/tip/a3ef2229c94ff70998724cb64b9cb4c77db9e950
Author: Markus Metzger 
AuthorDate: Fri, 14 Feb 2014 16:44:08 -0800
Committer:  Thomas Gleixner 
CommitDate: Fri, 21 Feb 2014 22:09:01 +0100

perf, nmi: Fix unknown NMI warning

When using BTS on Core i7-4*, I get the below kernel warning.

$ perf record -c 1 -e branches:u ls
Message from syslogd@labpc1501 at Nov 11 15:49:25 ...
 kernel:[  438.317893] Uhhuh. NMI received for unknown reason 31 on CPU 2.

Message from syslogd@labpc1501 at Nov 11 15:49:25 ...
 kernel:[  438.317920] Do you have a strange power saving mode enabled?

Message from syslogd@labpc1501 at Nov 11 15:49:25 ...
 kernel:[  438.317945] Dazed and confused, but trying to continue

Make intel_pmu_handle_irq() take the full exit path when returning early.

Cc: eran...@google.com
Cc: pet...@infradead.org
Cc: mi...@kernel.org
Signed-off-by: Markus Metzger 
Signed-off-by: Andi Kleen 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1392425048-5309-1-git-send-email-a...@firstfloor.org
Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/cpu/perf_event_intel.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 0fa4f24..698ae77 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1361,10 +1361,8 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 	intel_pmu_disable_all();
 	handled = intel_pmu_drain_bts_buffer();
 	status = intel_pmu_get_status();
-	if (!status) {
-		intel_pmu_enable_all(0);
-		return handled;
-	}
+	if (!status)
+		goto done;
 
 	loops = 0;
 again:
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
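The fix applies the usual kernel idiom of routing every early exit through a single `done:` label so that the epilogue cannot be skipped. The patch context does not show what the `done:` path does beyond the removed `intel_pmu_enable_all(0)` and `return`, so the following is only a user-space sketch of the idiom. `pmu_disable_all()`, `pmu_enable_all()` and `extra_exit_step()` are hypothetical stand-ins; `extra_exit_step()` represents any epilogue work that the old early `return` bypassed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for PMU state; this is not kernel code. */
static bool pmu_enabled = true;
static int exit_steps;	/* how often the shared epilogue ran */

static void pmu_disable_all(void) { pmu_enabled = false; }
static void pmu_enable_all(void)  { pmu_enabled = true; }
/* Any extra epilogue work that an early "return" would bypass. */
static void extra_exit_step(void) { exit_steps++; }

/* Every path leaves through "done", so the epilogue always runs. */
static int handle_irq(unsigned long status, int drained)
{
	int handled = drained;

	pmu_disable_all();
	if (!status)
		goto done;	/* early exit, but still via the full epilogue */

	handled++;		/* pretend one overflow was processed */
done:
	pmu_enable_all();
	extra_exit_step();
	return handled;
}
```

With the early `return` variant, a call with `status == 0` would re-enable the PMU but never reach `extra_exit_step()`; with the `goto`, both paths run the same epilogue.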


[patch 2/2] x86, ptrace, man: corresponding man pages

2008-02-18 Thread Markus Metzger
Man pages describing the user API of the ptrace BTS extensions.

Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: man/man2/ptrace.2
===
--- man.orig/man2/ptrace.2  2008-02-13 09:35:47.%N +0100
+++ man/man2/ptrace.2   2008-02-13 09:41:31.%N +0100
@@ -40,6 +40,9 @@
 .\"PTRACE_SETSIGINFO, PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
 .\"(Thanks to Blaisorblade, Daniel Jacobowitz and others who helped.)
 .\"
+.\" Modified Feb 2008, Markus Metzger <[EMAIL PROTECTED]>
+.\" Added PTRACE_BTS_* commands
+.\"
 .TH PTRACE 2 2007-11-15 "Linux" "Linux Programmer's Manual"
 .SH NAME
 ptrace \- process trace
@@ -378,6 +381,135 @@
 detached in this way regardless of which method was used to initiate
 tracing.
 (\fIaddr\fP is ignored.)
+.LP
+The following ptrace commands provide access to the hardware's last
+branch recording. They may not be available on all architectures.
+.LP
+Last branch recording stores an execution trace of the traced
+process. For every (conditional) control flow change, the source and
+destination addresses are stored. On some architectures, control flow
+changes inside the kernel are recorded as well; on later
+architectures, these are automatically filtered out.
+.LP
+The buffer (called Branch Trace Store) can be configured to be either
+circular, or to send a signal to the traced task when it is about to
+overflow. Not all methods may be available on all architectures.
+.LP
+The buffer can be accessed in two ways matching the above
+configurations: either as an array of BTS records from newest
+record to older records, one record at a time; or all records at once,
+from oldest to newest.
+.LP
+The former is mostly used for circular buffers to capture a tail of
+the execution trace (e.g. for debugging); the latter is mostly used to
+collect a continuous trace (e.g. for profiling) where the user drains
+the hardware buffer into a larger private buffer or into a file.
+.LP
+In addition to branches, timestamps (in jiffies) may optionally be
+recorded when the traced process arrives (is scheduled in) and when it
+departs (is scheduled out). This information can be used to obtain a
+qualitative execution order if more than one process is traced.
+.LP
+A BTS record is defined as:
+.LP
+.nf
+enum ptrace_bts_qualifier {
+   PTRACE_BTS_INVALID = 0,
+   PTRACE_BTS_BRANCH,
+   PTRACE_BTS_TASK_ARRIVES,
+   PTRACE_BTS_TASK_DEPARTS
+};
+.sp
+struct ptrace_bts_record {
+   u64 qualifier;
+   union {
+       /* PTRACE_BTS_BRANCH */
+       struct {
+           u64 from_ip;
+           u64 to_ip;
+       } lbr;
+       /* PTRACE_BTS_TASK_ARRIVES or
+          PTRACE_BTS_TASK_DEPARTS */
+       u64 timestamp;
+   } variant;
+};
+.fi
+.LP
+For configuring last branch recording and for querying its status, the
+following struct is used:
+.LP
+.nf
+struct ptrace_bts_config {
+   u32 size;
+   u32 flags;
+   u32 signal;
+   u32 bts_size;
+};
+.fi
+.LP
+\fISize\fP is either the requested or the actual size of the kernel
+BTS buffer in bytes.
+\fIFlags\fP is a bitmask of options, which are specified by the
+following flags:
+.RS
+.TP
+.BR PTRACE_BTS_O_TRACE
+Collect branch trace records.
+.TP
+.BR PTRACE_BTS_O_SCHED
+Collect scheduling timing information.
+.TP
+.BR PTRACE_BTS_O_SIGNAL
+Send \fIsignal\fP to the traced task in case of a buffer overflow. May
+only be used together with PTRACE_BTS_O_ALLOC for configuration purposes.
+.TP
+.BR PTRACE_BTS_O_ALLOC
+Allocate a BTS buffer of size \fIsize\fP. Frees any previously
+allocated buffer for the same task.
+.RE
+\fISignal\fP is the signal to send to the traced task in case of a
+buffer overflow.
+\fIBts_size\fP is the actual size of the \fIptrace_bts_record\fP
+structure in bytes. It is ignored for configuration purposes.
+.TP
+.BR PTRACE_BTS_CONFIG
+Configure last branch recording. \fIaddr\fP points to a
+\fIptrace_bts_config\fP structure (see above); \fIdata\fP specifies
+the size of that structure.
+Returns the number of bytes read.
+.TP
+.BR PTRACE_BTS_STATUS
+Writes the actual configuration into a \fIptrace_bts_config\fP
+structure pointed to by \fIaddr\fP. The caller is responsible for
+allocating memory at \fIaddr\fP to hold a \fIptrace_bts_config\fP
+structure. \fIData\fP specifies the size of that structure.
+Returns the number of bytes written.
+.TP
+.BR PTRACE_BTS_SIZE
+Returns the number of BTS records available for draining. For a
+circular buffer, this number is meaningless.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+.BR PTRACE_BTS_GET
+Reads a single BTS record at index \fIdata\fP into \fIaddr\fP. The
+caller is responsible for allocating memory at \fIaddr\fP to hold one
+\fIptrace_bts_record\fP structure.
+The bigger the index, the older the record; the latest record can
+always be found at index 0.
+Returns the number of bytes written.
+.TP
+.BR PTRACE_BTS_CLEAR
+Clears the BTS buffer
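As an illustration of how a tracer might interpret the record layout described in this page, here is a sketch that formats one record according to its qualifier. It copies the enum and struct from the man page text, substituting `uint64_t` for the kernel's `u64`; `format_bts_record()` is a hypothetical helper, and no ptrace call is made.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local copies of the layout from the man page, with <stdint.h> types. */
enum ptrace_bts_qualifier {
	PTRACE_BTS_INVALID = 0,
	PTRACE_BTS_BRANCH,
	PTRACE_BTS_TASK_ARRIVES,
	PTRACE_BTS_TASK_DEPARTS
};

struct ptrace_bts_record {
	uint64_t qualifier;
	union {
		struct {	/* PTRACE_BTS_BRANCH */
			uint64_t from_ip;
			uint64_t to_ip;
		} lbr;
		uint64_t timestamp;	/* TASK_ARRIVES or TASK_DEPARTS */
	} variant;
};

/* Format one record into buf. Returns snprintf's result, or -1 for an
 * invalid or unknown qualifier (only the union member that matches the
 * qualifier is read). */
static int format_bts_record(const struct ptrace_bts_record *rec,
			     char *buf, size_t len)
{
	switch (rec->qualifier) {
	case PTRACE_BTS_BRANCH:
		return snprintf(buf, len, "branch %#llx -> %#llx",
				(unsigned long long)rec->variant.lbr.from_ip,
				(unsigned long long)rec->variant.lbr.to_ip);
	case PTRACE_BTS_TASK_ARRIVES:
		return snprintf(buf, len, "arrives at %llu",
				(unsigned long long)rec->variant.timestamp);
	case PTRACE_BTS_TASK_DEPARTS:
		return snprintf(buf, len, "departs at %llu",
				(unsigned long long)rec->variant.timestamp);
	default:
		return -1;
	}
}
```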

[patch 2/2] x86, ptrace, man: corresponding man pages

2008-02-13 Thread Markus Metzger
Man pages describing the user API of the ptrace BTS extensions.

Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: man/man2/ptrace.2
===
--- man.orig/man2/ptrace.2  2008-02-13 09:35:47.%N +0100
+++ man/man2/ptrace.2   2008-02-13 09:41:31.%N +0100
@@ -40,6 +40,9 @@
 .\"PTRACE_SETSIGINFO, PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
 .\"(Thanks to Blaisorblade, Daniel Jacobowitz and others who helped.)
 .\"
+.\" Modified Feb 2008, Markus Metzger <[EMAIL PROTECTED]>
+.\" Added PTRACE_BTS_* commands
+.\"
 .TH PTRACE 2 2007-11-15 "Linux" "Linux Programmer's Manual"
 .SH NAME
 ptrace \- process trace
@@ -378,6 +381,135 @@
 detached in this way regardless of which method was used to initiate
 tracing.
 (\fIaddr\fP is ignored.)
+.LP
+The following ptrace commands provide access to the hardware's last
+branch recording. They may not be available on all architectures.
+.LP
+Last branch recording stores an execution trace of the traced
+process. For every (conditional) control flow change, the source and
+destination addresses are stored. On some architectures, control flow
+changes inside the kernel are recorded as well; on later
+architectures, these are automatically filtered out.
+.LP
+The buffer (called Branch Trace Store) can be configured to be either
+circular, or to send a signal to the traced task when it is about to
+overflow. Not all methods may be available on all architectures.
+.LP
+The buffer can be accessed in two ways matching the above
+configurations: either as an array of BTS records from newest
+record to older records, one record at a time; or all records at once,
+from oldest to newest.
+.LP
+The former is mostly used for circular buffers to capture a tail of
+the execution trace (e.g. for debugging); the latter is mostly used to
+collect a continuous trace (e.g. for profiling) where the user drains
+the hardware buffer into a larger private buffer or into a file.
+.LP
+In addition to branches, timestamps (in jiffies) may optionally be
+recorded when the traced process arrives (is scheduled in) and when it
+departs (is scheduled out). This information can be used to obtain a
+qualitative execution order if more than one process is traced.
+.LP
+A BTS record is defined as:
+.LP
+.nf
+enum ptrace_bts_qualifier {
+   PTRACE_BTS_INVALID = 0,
+   PTRACE_BTS_BRANCH,
+   PTRACE_BTS_TASK_ARRIVES,
+   PTRACE_BTS_TASK_DEPARTS
+};
+.sp
+struct ptrace_bts_record {
+   u64 qualifier;
+   union {
+       /* PTRACE_BTS_BRANCH */
+       struct {
+           u64 from_ip;
+           u64 to_ip;
+       } lbr;
+       /* PTRACE_BTS_TASK_ARRIVES or
+          PTRACE_BTS_TASK_DEPARTS */
+       u64 timestamp;
+   } variant;
+};
+.fi
+.LP
+For configuring last branch recording and for querying its status, the
+following struct is used:
+.LP
+.nf
+struct ptrace_bts_config {
+   u32 size;
+   u32 flags;
+   u32 signal;
+   u32 bts_size;
+};
+.fi
+.LP
+\fISize\fP is either the requested or the actual size of the kernel
+BTS buffer in bytes.
+\fIFlags\fP is a bitmask of options, which are specified by the
+following flags:
+.RS
+.TP
+.BR PTRACE_BTS_O_TRACE
+Collect branch trace records.
+.TP
+.BR PTRACE_BTS_O_SCHED
+Collect scheduling timing information.
+.TP
+.BR PTRACE_BTS_O_SIGNAL
+Send \fIsignal\fP to the traced task in case of a buffer overflow. May
+only be used together with PTRACE_BTS_O_ALLOC for configuration purposes.
+.TP
+.BR PTRACE_BTS_O_ALLOC
+Allocate a BTS buffer of size \fIsize\fP. Frees any previously
+allocated buffer for the same task.
+.RE
+\fISignal\fP is the signal to send to the traced task in case of a
+buffer overflow.
+\fIBts_size\fP is the actual size of the \fIptrace_bts_record\fP
+structure in bytes. It is ignored for configuration purposes.
+.TP
+.BR PTRACE_BTS_CONFIG
+Configure last branch recording. \fIaddr\fP points to a
+\fIptrace_bts_config\fP structure (see above); \fIdata\fP specifies
+the size of that structure.
+Returns the number of bytes read.
+.TP
+.BR PTRACE_BTS_STATUS
+Writes the actual configuration into a \fIptrace_bts_config\fP
+structure pointed to by \fIaddr\fP. The caller is responsible for
+allocating memory at \fIaddr\fP to hold a \fIptrace_bts_config\fP
+structure. \fIData\fP specifies the size of that structure.
+Returns the number of bytes written.
+.TP
+.BR PTRACE_BTS_SIZE
+Returns the number of BTS records available for draining. For a
+circular buffer, this number is meaningless.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+.BR PTRACE_BTS_GET
+Reads a single BTS record at index \fIdata\fP into \fIaddr\fP. The
+caller is responsible for allocating memory at \fIaddr\fP to hold one
+\fIptrace_bts_record\fP structure.
+The bigger the index, the older the record; the latest record can
+always be found at index 0.
+Returns the number of bytes written.
+.TP
+.BR PTRACE_BTS_CLEAR
+Clears the BTS buffer
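The only constraint this version states on the configuration flags is that PTRACE_BTS_O_SIGNAL may be used for configuration only together with PTRACE_BTS_O_ALLOC. A sketch of building and checking such a request, assuming illustrative bit values under hypothetical names (`BTS_O_*`) rather than the real values from the kernel's ptrace ABI header; `bts_config_valid()` and `bts_signal_config()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copy of the configuration structure from the man page. */
struct ptrace_bts_config {
	uint32_t size;
	uint32_t flags;
	uint32_t signal;
	uint32_t bts_size;
};

/* Illustrative bit values only; not taken from the kernel header. */
enum {
	BTS_O_TRACE  = 1u << 0,
	BTS_O_SCHED  = 1u << 1,
	BTS_O_SIGNAL = 1u << 2,
	BTS_O_ALLOC  = 1u << 3,
};

/* For configuration, O_SIGNAL is only valid together with O_ALLOC. */
static bool bts_config_valid(const struct ptrace_bts_config *cfg)
{
	if ((cfg->flags & BTS_O_SIGNAL) && !(cfg->flags & BTS_O_ALLOC))
		return false;
	return true;
}

/* Request a signalling (non-circular) buffer of `size` bytes. */
static struct ptrace_bts_config bts_signal_config(uint32_t size, uint32_t sig)
{
	struct ptrace_bts_config cfg = {
		.size = size,
		.flags = BTS_O_TRACE | BTS_O_ALLOC | BTS_O_SIGNAL,
		.signal = sig,
		.bts_size = 0,	/* ignored when configuring */
	};
	return cfg;
}
```

Because all four fields are 32-bit, the structure has no padding, which keeps the `data` size argument the same on 32- and 64-bit tracers.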

[patch 2/2] x86, ptrace, man: corresponding man pages

2008-01-08 Thread Markus Metzger
Man pages for the ptrace bts API.


Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: man/man2/ptrace.2
===
--- man.orig/man2/ptrace.2  2008-01-08 11:21:38.%N +0100
+++ man/man2/ptrace.2   2008-01-08 11:22:38.%N +0100
@@ -40,6 +40,9 @@
 .\"PTRACE_SETSIGINFO, PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
 .\"(Thanks to Blaisorblade, Daniel Jacobowitz and others who helped.)
 .\"
+.\" Modified Nov 2007, Markus Metzger <[EMAIL PROTECTED]>
+.\" Added PTRACE_BTS_* commands
+.\"
 .TH PTRACE 2 2007-11-15 "Linux" "Linux Programmer's Manual"
 .SH NAME
 ptrace \- process trace
@@ -378,6 +381,134 @@
 detached in this way regardless of which method was used to initiate
 tracing.
 (\fIaddr\fP is ignored.)
+.LP
+The following ptrace commands provide access to the hardware's last
+branch recording. They may not be available on all architectures.
+.LP
+Last branch recording stores an execution trace of the traced
+process. For every (conditional) control flow change, the source and
+destination addresses are stored. On some architectures, control flow
+changes inside the kernel are recorded as well; on later
+architectures, these are automatically filtered out.
+.LP
+The buffer (called Branch Trace Store) can be configured to be either
+circular, or to send a signal to the traced task when it is about to
+overflow. Not all methods may be available on all architectures.
+.LP
+The buffer can be accessed in two ways matching the above
+configurations: either as an array of BTS records from newest
+record to older records, one record at a time; or all records at once,
+from oldest to newest.
+.LP
+The former is mostly used for circular buffers to capture a tail of
+the execution trace (e.g. for debugging); the latter is mostly used to
+collect a continuous trace (e.g. for profiling) where the user drains
+the hardware buffer into a larger private buffer or into a file.
+.LP
+In addition to branches, timestamps (in jiffies) may optionally be
+recorded when the traced process arrives (is scheduled in) and when it
+departs (is scheduled out). This information can be used to obtain a
+qualitative execution order if more than one process is traced.
+.LP
+A BTS record is defined as:
+.LP
+.nf
+enum ptrace_bts_qualifier {
+   PTRACE_BTS_INVALID = 0,
+   PTRACE_BTS_BRANCH,
+   PTRACE_BTS_TASK_ARRIVES,
+   PTRACE_BTS_TASK_DEPARTS
+};
+.sp
+struct ptrace_bts_record {
+   u64 qualifier;
+   union {
+       /* PTRACE_BTS_BRANCH */
+       struct {
+           u64 from_ip;
+           u64 to_ip;
+       } lbr;
+       /* PTRACE_BTS_TASK_ARRIVES or
+          PTRACE_BTS_TASK_DEPARTS */
+       u64 timestamp;
+   } variant;
+};
+.fi
+.LP
+For configuring last branch recording and for querying its status, the
+following struct is used:
+.LP
+.nf
+struct ptrace_bts_config {
+   u32 size;
+   u32 flags;
+   u32 signal;
+   u32 bts_size;
+};
+.fi
+.LP
+\fISize\fP is either the requested or the actual size of the kernel
+BTS buffer in bytes.
+\fIFlags\fP is a bitmask of options, which are specified by the
+following flags:
+.RS
+.TP
+.BR PTRACE_BTS_O_TRACE
+Collect branch trace records.
+.TP
+.BR PTRACE_BTS_O_SCHED
+Collect scheduling timing information.
+.TP
+.BR PTRACE_BTS_O_SIGNAL
+Send \fIsignal\fP to the traced task in case of a buffer overflow.
+.TP
+.BR PTRACE_BTS_O_CUT_SIZE
+Reduce the requested buffer size if it is bigger than the available
+buffer size.
+.RE
+\fISignal\fP is the signal to send to the traced task in case of a
+buffer overflow.
+\fIBts_size\fP is the actual size of the \fIptrace_bts_record\fP
+structure in bytes. It is ignored for configuration purposes.
+.TP
+.BR PTRACE_BTS_CONFIG
+Configure last branch recording. \fIaddr\fP points to a
+\fIptrace_bts_config\fP structure (see above); \fIdata\fP specifies
+the size of that structure.
+Returns the number of bytes read.
+.TP
+.BR PTRACE_BTS_STATUS
+Writes the actual configuration into a \fIptrace_bts_config\fP
+structure pointed to by \fIaddr\fP. The caller is responsible for
+allocating memory at \fIaddr\fP to hold a \fIptrace_bts_config\fP
+structure. \fIData\fP specifies the size of that structure.
+Returns the number of bytes written.
+.TP
+.BR PTRACE_BTS_SIZE
+Returns the number of BTS records available for draining. For a
+circular buffer, this number is meaningless.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+.BR PTRACE_BTS_GET
+Reads a single BTS record at index \fIdata\fP into \fIaddr\fP. The
+caller is responsible for allocating memory at \fIaddr\fP to hold one
+\fIptrace_bts_record\fP structure.
+The bigger the index, the older the record; the latest record can
+always be found at index 0.
+Returns the number of bytes written.
+.TP
+.BR PTRACE_BTS_CLEAR
+Clears the BTS buffer. This command can be used after a manual
+drain
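The two access orders described in this page (PTRACE_BTS_GET with index 0 as the newest record and larger indices older, versus draining all records oldest first) can be modelled with a small ring buffer. This is a toy sketch with hypothetical names (`struct ring`, `ring_get()`, `ring_drain()`) and a fixed capacity of four; it models only the indexing, not the hardware Branch Trace Store.

```c
#include <assert.h>
#include <stddef.h>

#define RING_CAP 4

/* Toy circular buffer of branch sources. */
struct ring {
	unsigned long rec[RING_CAP];
	size_t next;	/* slot the next record will be written to */
	size_t count;	/* number of valid records, at most RING_CAP */
};

/* Append a record, overwriting the oldest one when full. */
static void ring_push(struct ring *r, unsigned long v)
{
	r->rec[r->next] = v;
	r->next = (r->next + 1) % RING_CAP;
	if (r->count < RING_CAP)
		r->count++;
}

/* GET-style access: index 0 is the newest record, larger is older.
 * Returns 0 on success, -1 if the index is out of range. */
static int ring_get(const struct ring *r, size_t index, unsigned long *out)
{
	if (index >= r->count)
		return -1;
	*out = r->rec[(r->next + RING_CAP - 1 - index) % RING_CAP];
	return 0;
}

/* Drain-style access: copy all records oldest first, then empty. */
static size_t ring_drain(struct ring *r, unsigned long *out)
{
	size_t n = r->count;

	for (size_t i = 0; i < n; i++)
		out[i] = r->rec[(r->next + RING_CAP - n + i) % RING_CAP];
	r->count = 0;
	return n;
}
```

After pushing 1 through 6 into this four-slot ring, index 0 yields 6, index 3 yields 3, and a drain returns 3, 4, 5, 6.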

[patch 1/2] x86, ptrace: add bts_struct size to status command

2008-01-08 Thread Markus Metzger
Return the size of bts_struct in the PTRACE_BTS_STATUS command.
Change types to u32.


Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: linux-2.6-x86/arch/x86/kernel/ptrace.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/ptrace.c 2008-01-07 12:42:39.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/ptrace.c  2008-01-08 11:18:35.%N +0100
@@ -787,6 +787,8 @@
 		cfg.flags |= PTRACE_BTS_O_SCHED;
 	}
 
+	cfg.bts_size = sizeof(struct bts_struct);
+
 	if (copy_to_user(ucfg, &cfg, sizeof(cfg)))
 		return -EFAULT;
 
Index: linux-2.6-x86/include/asm-x86/ptrace-abi.h
===
--- linux-2.6-x86.orig/include/asm-x86/ptrace-abi.h 2008-01-07 12:42:39.%N +0100
+++ linux-2.6-x86/include/asm-x86/ptrace-abi.h  2008-01-08 11:42:55.%N +0100
@@ -81,16 +81,21 @@
 #define PTRACE_SINGLEBLOCK 33  /* resume execution until next branch */
 
 #ifndef __ASSEMBLY__
+
+#include <asm/types.h>
+
 /* configuration/status structure used in PTRACE_BTS_CONFIG and
    PTRACE_BTS_STATUS commands.
 */
 struct ptrace_bts_config {
 	/* requested or actual size of BTS buffer in bytes */
-	unsigned int size;
+	u32 size;
 	/* bitmask of below flags */
-	unsigned int flags;
+	u32 flags;
 	/* buffer overflow signal */
-	unsigned int signal;
+	u32 signal;
+	/* actual size of bts_struct in bytes */
+	u32 bts_size;
 };
 #endif
 
-
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
Registergericht: Muenchen HRB 47456 Ust.-IdNr.
VAT Registration No.: DE129385895
Citibank Frankfurt (BLZ 502 109 00) 600119052

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
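Reporting `bts_size` lets user space step through raw record data using the record size the kernel actually uses, instead of a size compiled into the tracer. A sketch of such a stride-based walk, assuming (as the record layout in the companion man page patch suggests) that each record starts with a 64-bit qualifier; `count_qualifier()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count records whose qualifier equals `qualifier` in a raw buffer of
 * `nbytes` bytes, stepping by the kernel-reported bts_size rather than
 * by a compile-time sizeof. A trailing partial record is ignored. */
static size_t count_qualifier(const unsigned char *buf, size_t nbytes,
			      uint32_t bts_size, uint64_t qualifier)
{
	size_t n = 0;

	if (bts_size < sizeof(uint64_t))
		return 0;	/* a record cannot even hold the qualifier */

	for (size_t off = 0; off + bts_size <= nbytes; off += bts_size) {
		uint64_t q;

		memcpy(&q, buf + off, sizeof(q));	/* no alignment assumption */
		if (q == qualifier)
			n++;
	}
	return n;
}
```

In the test below, a stride of 32 bytes stands in for a kernel whose records are larger than the 24 bytes the man page layout implies; the tracer still finds every record.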



[patch 1/2] x86, ptrace: add bts_struct size to status command

2008-01-08 Thread Markus Metzger
Return the size of bts_struct in the PTRACE_BTS_STATUS command.
Change types to u32.


Signed-off-by: Markus Metzger [EMAIL PROTECTED]
---

Index: linux-2.6-x86/arch/x86/kernel/ptrace.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/ptrace.c 2008-01-07 12:42:39.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/ptrace.c  2008-01-08 11:18:35.%N +0100
@@ -787,6 +787,8 @@
cfg.flags |= PTRACE_BTS_O_SCHED;
}
 
+   cfg.bts_size = sizeof(struct bts_struct);
+
if (copy_to_user(ucfg, cfg, sizeof(cfg)))
return -EFAULT;
 
Index: linux-2.6-x86/include/asm-x86/ptrace-abi.h
===
--- linux-2.6-x86.orig/include/asm-x86/ptrace-abi.h 2008-01-07 12:42:39.%N 
+0100
+++ linux-2.6-x86/include/asm-x86/ptrace-abi.h  2008-01-08 11:42:55.%N +0100
@@ -81,16 +81,21 @@
 #define PTRACE_SINGLEBLOCK 33  /* resume execution until next branch */
 
 #ifndef __ASSEMBLY__
+
+#include asm/types.h
+
 /* configuration/status structure used in PTRACE_BTS_CONFIG and
PTRACE_BTS_STATUS commands.
 */
 struct ptrace_bts_config {
/* requested or actual size of BTS buffer in bytes */
-   unsigned int size;
+   u32 size;
/* bitmask of below flags */
-   unsigned int flags;
+   u32 flags;
/* buffer overflow signal */
-   unsigned int signal;
+   u32 signal;
+   /* actual size of bts_struct in bytes */
+   u32 bts_size;
 };
 #endif
 
-
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
Registergericht: Muenchen HRB 47456 Ust.-IdNr.
VAT Registration No.: DE129385895
Citibank Frankfurt (BLZ 502 109 00) 600119052



[patch 2/2] x86, ptrace, man: corresponding man pages

2008-01-08 Thread Markus Metzger
Man pages for the ptrace bts API.


Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: man/man2/ptrace.2
===
--- man.orig/man2/ptrace.2  2008-01-08 11:21:38.%N +0100
+++ man/man2/ptrace.2   2008-01-08 11:22:38.%N +0100
@@ -40,6 +40,9 @@
 .\"PTRACE_SETSIGINFO, PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
 .\"(Thanks to Blaisorblade, Daniel Jacobowitz and others who helped.)
 .\"
+.\" Modified Nov 2007, Markus Metzger <[EMAIL PROTECTED]>
+.\" Added PTRACE_BTS_* commands
+.\"
 .TH PTRACE 2 2007-11-15 "Linux" "Linux Programmer's Manual"
 .SH NAME
 ptrace \- process trace
@@ -378,6 +381,134 @@
 detached in this way regardless of which method was used to initiate
 tracing.
 (\fIaddr\fP is ignored.)
+.LP
+The following ptrace commands provide access to the hardware's last
+branch recording. They may not be available on all architectures.
+.LP
+Last branch recording stores an execution trace of the traced
+process. For every (conditional) control flow change, the source and
+destination addresses are stored. On some architectures, control flow
+changes inside the kernel are recorded, as well. On later
+architectures, these are automatically filtered out.
+.LP
+The buffer (called Branch Trace Store) can be configured to be either
+circular, or to send a signal to the traced task when it is about to
+overflow. Not all methods may be available on all architectures.
+.LP
+The buffer can be accessed in two ways matching the above
+configurations: either as an array of BTS records from newest
+record to older records, one record at a time; or all records at once,
+from oldest to newest.
+.LP
+The former is mostly used for circular buffers to capture a tail of
+the execution trace (e.g. for debugging); the latter is mostly used to
+collect a continuous trace (e.g. for profiling) where the user drains
+the hardware buffer into a larger private buffer or into a file.
+.LP
+In addition to branches, timestamps (in jiffies) may optionally be
+recorded when the traced process arrives and departs,
+respectively. This information can be used to obtain a qualitative
+execution order, if more than one process is traced.
+.LP
+A BTS record is defined as:
+.LP
+.nf
+enum ptrace_bts_qualifier {
+   PTRACE_BTS_INVALID = 0,
+   PTRACE_BTS_BRANCH,
+   PTRACE_BTS_TASK_ARRIVES,
+   PTRACE_BTS_TASK_DEPARTS
+};
+.sp
+struct ptrace_bts_record {
+   u64 qualifier;
+   union {
+   /* PTRACE_BTS_BRANCH */
+   struct {
+   u64 from_ip;
+   u64 to_ip;
+   } lbr;
+   /* PTRACE_BTS_TASK_ARRIVES or
+  PTRACE_BTS_TASK_DEPARTS */
+   u64 timestamp;
+   } variant;
+};
+.fi
+.LP
+For configuring last branch recording and for querying its status, the
+following struct is used:
+.LP
+.nf
+struct ptrace_bts_config {
+   u32 size;
+   u32 flags;
+   u32 signal;
+   u32 bts_size;
+};
+.fi
+.LP
+\fISize\fP is either the requested or the actual size of the kernel
+BTS buffer in bytes.
+\fIFlags\fP is a bitmask of options, which are specified by the
+following flags:
+.RS
+.TP
+.BR PTRACE_BTS_O_TRACE
+Collect branch trace records
+.TP
+.BR PTRACE_BTS_O_SCHED
+Collect scheduling timing information
+.TP
+.BR PTRACE_BTS_O_SIGNAL
+Send \fIsignal\fP to the traced task in case of a buffer overflow
+.TP
+.BR PTRACE_BTS_O_CUT_SIZE
+Reduce the requested buffer size if it is bigger than the available
+buffer size.
+.RE
+\fISignal\fP is the signal to send to the traced task in case of a
+buffer overflow.
+\fIBts_size\fP is the actual size of the \fIptrace_bts_record\fP
+structure in bytes. It is ignored for configuration purposes.
+.TP
+.BR PTRACE_BTS_CONFIG
+Configure last branch recording. \fIaddr\fP points to a
+\fIptrace_bts_config\fP structure (see above); \fIdata\fP specifies
+the size of that structure.
+Returns the number of bytes read.
+.TP
+.BR PTRACE_BTS_STATUS
+Writes the actual configuration into a \fIptrace_bts_config\fP
+structure pointed to by \fIaddr\fP. The caller is responsible for
+allocating memory at \fIaddr\fP to hold a \fIptrace_bts_config\fP
+structure. \fIData\fP specifies the size of that structure.
+Returns the number of bytes written.
+.TP
+.BR PTRACE_BTS_SIZE
+Returns the number of BTS records available for draining. For a
+circular buffer, this number is meaningless.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+.BR PTRACE_BTS_GET
+Reads a single BTS record at index \fIdata\fP into \fIaddr\fP. The
+caller is responsible for allocating memory at \fIaddr\fP to hold one
+\fIptrace_bts_record\fP structure.
+The bigger the index, the older the record; the latest record can
+always be found at index 0.
+Returns the number of bytes written.
+.TP
+.BR PTRACE_BTS_CLEAR
+Clears the BTS buffer. This command can be used after a manual
+draining using PTRACE_BTS_GET commands.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
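
The two access orders described in the man page text (PTRACE_BTS_GET counting from the newest record, draining from oldest to newest) can be related with a small sketch. It copies the enum ptrace_bts_qualifier and struct ptrace_bts_record definitions from the proposed text, using <stdint.h> types for u64, and assumes n records are available as reported by PTRACE_BTS_SIZE. The helpers get_index_to_drain_pos() and count_branches() are hypothetical and not part of the proposed API.

```c
#include <stddef.h>
#include <stdint.h>

enum ptrace_bts_qualifier {
	PTRACE_BTS_INVALID = 0,
	PTRACE_BTS_BRANCH,
	PTRACE_BTS_TASK_ARRIVES,
	PTRACE_BTS_TASK_DEPARTS
};

struct ptrace_bts_record {
	uint64_t qualifier;
	union {
		/* PTRACE_BTS_BRANCH */
		struct {
			uint64_t from_ip;
			uint64_t to_ip;
		} lbr;
		/* PTRACE_BTS_TASK_ARRIVES or PTRACE_BTS_TASK_DEPARTS */
		uint64_t timestamp;
	} variant;
};

/* PTRACE_BTS_GET index 0 is the newest record; a drained array runs from
 * oldest to newest. For n records and 0 <= get_index < n, return the
 * matching position in the drained array. */
static size_t get_index_to_drain_pos(size_t n, size_t get_index)
{
	return n - 1 - get_index;
}

/* Count branch records, skipping scheduling timestamps (arrive/depart)
 * and invalid entries. */
static size_t count_branches(const struct ptrace_bts_record *recs, size_t n)
{
	size_t i, branches = 0;

	for (i = 0; i < n; i++)
		if (recs[i].qualifier == PTRACE_BTS_BRANCH)
			branches++;
	return branches;
}
```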

[patch 2/2] x86, ptrace, man: corresponding man pages

2008-01-07 Thread Markus Metzger
Man pages for the bts ptrace extension.

Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: man/man2/ptrace.2
===
--- man.orig/man2/ptrace.2  2008-01-07 13:12:08.%N +0100
+++ man/man2/ptrace.2   2008-01-07 13:15:42.%N +0100
@@ -40,6 +40,9 @@
 .\"PTRACE_SETSIGINFO, PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
 .\"(Thanks to Blaisorblade, Daniel Jacobowitz and others who helped.)
 .\"
+.\" Modified Nov 2007, Markus Metzger <[EMAIL PROTECTED]>
+.\" Added PTRACE_BTS_* commands
+.\"
 .TH PTRACE 2 2007-11-15 "Linux" "Linux Programmer's Manual"
 .SH NAME
 ptrace \- process trace
@@ -378,6 +381,137 @@
 detached in this way regardless of which method was used to initiate
 tracing.
 (\fIaddr\fP is ignored.)
+.LP
+The following ptrace commands provide access to the hardware's last
+branch recording. They may not be available on all architectures.
+.LP
+Last branch recording stores an execution trace of the traced
+process. For every (conditional) control flow change, the source and
+destination addresses are stored. On some architectures, control flow
+changes inside the kernel are recorded, as well. On later
+architectures, these are automatically filtered out.
+.LP
+The buffer (called Branch Trace Store) can be configured to be either
+circular, or to send a signal to the traced task when it is about to
+overflow. Not all methods may be available on all architectures.
+.LP
+The buffer can be accessed in two ways matching the above
+configurations: either as an array of BTS records from newest
+record to older records, one record at a time; or all records at once,
+from oldest to newest.
+.LP
+The former is mostly used for circular buffers to capture a tail of
+the execution trace (e.g. for debugging); the latter is mostly used to
+collect a continuous trace (e.g. for profiling) where the user drains
+the hardware buffer into a larger private buffer or into a file.
+.LP
+In addition to branches, timestamps (in jiffies) may optionally be
+recorded when the traced process arrives and departs,
+respectively. This information can be used to obtain a qualitative
+execution order, if more than one process is traced.
+.LP
+A BTS record is defined as:
+.LP
+.nf
+enum ptrace_bts_qualifier {
+   PTRACE_BTS_INVALID = 0,
+   PTRACE_BTS_BRANCH,
+   PTRACE_BTS_TASK_ARRIVES,
+   PTRACE_BTS_TASK_DEPARTS
+};
+.sp
+struct ptrace_bts_record {
+   u64 qualifier;
+   union {
+   /* PTRACE_BTS_BRANCH */
+   struct {
+   u64 from_ip;
+   u64 to_ip;
+   } lbr;
+   /* PTRACE_BTS_TASK_ARRIVES or
+  PTRACE_BTS_TASK_DEPARTS */
+   u64 timestamp;
+   } variant;
+};
+.fi
+.LP
+For configuring last branch recording and for querying its status, the
+following struct is used:
+.LP
+.nf
+struct ptrace_bts_config {
+   unsigned int size;
+   unsigned int flags;
+   unsigned int signal;
+   unsigned short bts_size;
+   unsigned short version;
+};
+.fi
+.LP
+\fISize\fP is either the requested or the actual size of the kernel
+BTS buffer in bytes.
+\fIFlags\fP is a bitmask of options, which are specified by the
+following flags:
+.RS
+.TP
+.BR PTRACE_BTS_O_TRACE
+Collect branch trace records
+.TP
+.BR PTRACE_BTS_O_SCHED
+Collect scheduling timing information
+.TP
+.BR PTRACE_BTS_O_SIGNAL
+Send \fIsignal\fP to the traced task in case of a buffer overflow
+.TP
+.BR PTRACE_BTS_O_CUT_SIZE
+Reduce the requested buffer size if it is bigger than the available
+buffer size.
+.RE
+\fISignal\fP is the signal to send to the traced task in case of a
+buffer overflow.
+\fIBts_size\fP is the actual size of the \fIptrace_bts_record\fP
+structure in bytes. It is ignored for configuration purposes.
+\fIVersion\fP is the version of the last branch recording API. It is
+ignored for configuration purposes.
+.TP
+.BR PTRACE_BTS_CONFIG
+Configure last branch recording. \fIaddr\fP points to a
+\fIptrace_bts_config\fP structure (see above); \fIdata\fP specifies
+the size of that structure.
+Returns the number of bytes read.
+.TP
+.BR PTRACE_BTS_STATUS
+Writes the actual configuration into a \fIptrace_bts_config\fP
+structure pointed to by \fIaddr\fP. The caller is responsible for
+allocating memory at \fIaddr\fP to hold a \fIptrace_bts_config\fP
+structure. \fIData\fP specifies the size of that structure.
+Returns the number of bytes written.
+.TP
+.BR PTRACE_BTS_SIZE
+Returns the number of BTS records available for draining. For a
+circular buffer, this number is meaningless.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+.BR PTRACE_BTS_GET
+Reads a single BTS record at index \fIdata\fP into \fIaddr\fP. The
+caller is responsible for allocating memory at \fIaddr\fP to hold one
+\fIptrace_bts_record\fP structure.
+The bigger the index, the older the record; the late

[patch 1/2] x86, ptrace: add version and last remaining size to status command

2008-01-07 Thread Markus Metzger
Return the API version and the size of a bts_struct in the PTRACE_BTS_STATUS 
command. This might be handy in case other archs want to use and extend the 
interface. It allows users to program against one version and continue to work 
with newer versions (they have to discard everything they don't understand, of 
course).
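
A sketch of how a tool might apply the "program against one version" rule to a PTRACE_BTS_STATUS result. The struct mirrors the one extended by this patch; MY_BTS_VERSION, struct my_bts_record and bts_status_usable() are hypothetical names for the tool's own assumptions, not part of the proposed interface.

```c
#include <stdint.h>

/* User-space mirror of struct ptrace_bts_config as extended by this patch. */
struct ptrace_bts_config {
	unsigned int size;
	unsigned int flags;
	unsigned int signal;
	unsigned short bts_size; /* actual size of bts_struct in bytes */
	unsigned short version;  /* interface version */
};

/* Hypothetical: the interface version and record layout the tool was
 * written against. */
#define MY_BTS_VERSION 1
struct my_bts_record {
	uint64_t qualifier;
	uint64_t from_ip;
	uint64_t to_ip;
};

/* Accept the status if the kernel reports at least the version the tool
 * knows and each record is at least as large as the known layout. The
 * caller is then expected to step through records in bts_size increments
 * and ignore any trailing bytes it does not understand. */
static int bts_status_usable(const struct ptrace_bts_config *cfg)
{
	if (cfg->version < MY_BTS_VERSION)
		return 0;
	return cfg->bts_size >= sizeof(struct my_bts_record);
}
```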

Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: linux-2.6-x86/arch/x86/kernel/ptrace.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/ptrace.c 2008-01-07 12:42:39.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/ptrace.c  2008-01-07 13:09:07.%N +0100
@@ -787,6 +787,9 @@
cfg.flags |= PTRACE_BTS_O_SCHED;
}
 
+   cfg.bts_size = sizeof(struct bts_struct);
+   cfg.version  = PTRACE_BTS_VERSION;
+
if (copy_to_user(ucfg, &cfg, sizeof(cfg)))
return -EFAULT;
 
Index: linux-2.6-x86/include/asm-x86/ptrace-abi.h
===
--- linux-2.6-x86.orig/include/asm-x86/ptrace-abi.h 2008-01-07 12:42:39.%N +0100
+++ linux-2.6-x86/include/asm-x86/ptrace-abi.h  2008-01-07 13:08:39.%N +0100
@@ -91,6 +91,10 @@
unsigned int flags;
/* buffer overflow signal */
unsigned int signal;
+   /* actual size of bts_struct in bytes */
+   unsigned short bts_size;
+   /* interface version */
+   unsigned short version;
 };
 #endif
 
Index: linux-2.6-x86/include/asm-x86/ptrace.h
===
--- linux-2.6-x86.orig/include/asm-x86/ptrace.h 2008-01-07 12:42:39.%N +0100
+++ linux-2.6-x86/include/asm-x86/ptrace.h  2008-01-07 13:08:39.%N +0100
@@ -9,6 +9,8 @@
 
 #ifdef __KERNEL__
 
+#define PTRACE_BTS_VERSION 1
+
 /* the DS BTS struct is used for ptrace as well */
 #include <asm/ds.h>
 


[patch 5/5] x86, ptrace, man: man pages for ptrace BTS extensions

2007-12-20 Thread Markus Metzger
Document changes for this patch set.

Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: man/man2/ptrace.2
===
--- man.orig/man2/ptrace.2  2007-12-14 17:45:33.%N +0100
+++ man/man2/ptrace.2   2007-12-20 13:20:07.%N +0100
@@ -40,6 +40,9 @@
 .\"PTRACE_SETSIGINFO, PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
 .\"(Thanks to Blaisorblade, Daniel Jacobowitz and others who helped.)
 .\"
+.\" Modified Nov 2007, Markus Metzger <[EMAIL PROTECTED]>
+.\" Added PTRACE_BTS_* commands
+.\"
 .TH PTRACE 2 2007-11-15 "Linux" "Linux Programmer's Manual"
 .SH NAME
 ptrace \- process trace
@@ -378,6 +381,131 @@
 detached in this way regardless of which method was used to initiate
 tracing.
 (\fIaddr\fP is ignored.)
+.LP
+The following ptrace commands provide access to the hardware's last
+branch recording. They may not be available on all architectures.
+.LP
+Last branch recording stores an execution trace of the traced
+process. For every (conditional) control flow change, the source and
+destination addresses are stored. On some architectures, control flow
+changes inside the kernel are recorded, as well. On later
+architectures, these are automatically filtered out.
+.LP
+The buffer (called Branch Trace Store) can be configured to be either
+circular, or to send a signal to the traced task when it is about to
+overflow. Not all methods may be available on all architectures.
+.LP
+The buffer can be accessed in two ways matching the above
+configurations: either as an array of BTS records from newest
+record to older records, one record at a time; or all records at once,
+from oldest to newest.
+.LP
+The former is mostly used for circular buffers to capture a tail of
+the execution trace (e.g. for debugging); the latter is mostly used to
+collect a continuous trace (e.g. for profiling) where the user drains
+the hardware buffer into a larger private buffer or into a file.
+.LP
+In addition to branches, timestamps (in jiffies) may optionally be
+recorded when the traced process arrives and departs,
+respectively. This information can be used to obtain a qualitative
+execution order, if more than one process is traced.
+.LP
+A BTS record is defined as:
+.LP
+.nf
+enum ptrace_bts_qualifier {
+   PTRACE_BTS_INVALID = 0,
+   PTRACE_BTS_BRANCH,
+   PTRACE_BTS_TASK_ARRIVES,
+   PTRACE_BTS_TASK_DEPARTS
+};
+.sp
+struct ptrace_bts_record {
+   u64 qualifier;
+   union {
+   /* PTRACE_BTS_BRANCH */
+   struct {
+   u64 from_ip;
+   u64 to_ip;
+   } lbr;
+   /* PTRACE_BTS_TASK_ARRIVES or
+  PTRACE_BTS_TASK_DEPARTS */
+   u64 timestamp;
+   } variant;
+};
+.fi
+.LP
+For configuring last branch recording and for querying its status, the
+following struct is used:
+.LP
+.nf
+struct ptrace_bts_config {
+   unsigned int size;
+   unsigned int flags;
+   unsigned int signal;
+};
+.fi
+.LP
+\fISize\fP is either the requested or the actual size of the kernel
+BTS buffer in bytes.
+\fIFlags\fP is a bitmask of options, which are specified by the
+following flags:
+.RS
+.TP
+.BR PTRACE_BTS_O_TRACE
+Collect branch trace records
+.TP
+.BR PTRACE_BTS_O_SCHED
+Collect scheduling timing information
+.TP
+.BR PTRACE_BTS_O_SIGNAL
+Send \fIsignal\fP to the traced task in case of a buffer overflow
+.TP
+.BR PTRACE_BTS_O_CUT_SIZE
+Reduce the requested buffer size if it is bigger than the available
+buffer size.
+.RE
+\fISignal\fP is the signal to send to the traced task in case of a
+buffer overflow.
+.TP
+.BR PTRACE_BTS_CONFIG
+Configure last branch recording. \fIaddr\fP points to a
+\fIptrace_bts_config\fP structure (see above); \fIdata\fP specifies
+the size of that structure.
+Returns the number of bytes read.
+.TP
+.BR PTRACE_BTS_STATUS
+Writes the actual configuration into a \fIptrace_bts_config\fP
+structure pointed to by \fIaddr\fP. The caller is responsible for
+allocating memory at \fIaddr\fP to hold a \fIptrace_bts_config\fP
+structure. \fIData\fP specifies the size of that structure.
+Returns the number of bytes written.
+.TP
+.BR PTRACE_BTS_SIZE
+Returns the number of BTS records available for draining. For a
+circular buffer, this number is meaningless.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+.BR PTRACE_BTS_GET
+Reads a single BTS record at index \fIdata\fP into \fIaddr\fP. The
+caller is responsible for allocating memory at \fIaddr\fP to hold one
+\fIptrace_bts_record\fP structure.
+The bigger the index, the older the record; the latest record can
+always be found at index 0.
+Returns the number of bytes written.
+.TP
+.BR PTRACE_BTS_CLEAR
+Clears the BTS buffer. This command can be used after a manual
+draining using PTRACE_BTS_GET commands.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+.BR PTRACE_BTS_DRAIN
+Reads all available BT

[patch 4/5] x86, ptrace: overflow signal API

2007-12-20 Thread Markus Metzger
Establish the user API for sending a user-defined signal to the traced task on 
a BTS buffer overflow.

This should complete the user API for the BTS ptrace extension.
The patches so far implement wrap-around overflow handling, as needed for 
debugging.

The remaining open item is the other overflow handling mechanism, which sends a 
signal to the traced task on a buffer overflow.
This will take some more time on my side.

Since, from a user perspective, that remaining work happens behind the scenes, 
the patch set should already be useful. More features may/will be added on top 
of it (overflow signal, pageable back-up buffers, kernel tracing, core file 
support, profiling, ...).
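
For illustration, a caller of the proposed API might fill in the new signal field as below. The flag values are copied from ptrace-abi.h in this patch; bts_signal_config() and bts_wraps_around() are hypothetical helpers. As the description says, the kernel side of signal-on-overflow was not yet implemented at this point in the series, so this only shows how the configuration is expressed.

```c
#include <signal.h>

#define PTRACE_BTS_O_TRACE    0x1 /* branch trace */
#define PTRACE_BTS_O_SCHED    0x2 /* scheduling events w/ jiffies */
#define PTRACE_BTS_O_SIGNAL   0x4 /* send signal on buffer overflow */
#define PTRACE_BTS_O_CUT_SIZE 0x8 /* cut requested size to max available */

struct ptrace_bts_config {
	unsigned int size;   /* requested size of BTS buffer in bytes */
	unsigned int flags;  /* bitmask of the flags above */
	unsigned int signal; /* buffer overflow signal */
};

/* Request branch tracing that signals 'sig' on overflow instead of
 * wrapping around; CUT_SIZE asks for a smaller buffer rather than
 * failure if 'size' is too large. Example: bts_signal_config(65536, SIGUSR1). */
static struct ptrace_bts_config bts_signal_config(unsigned int size, int sig)
{
	struct ptrace_bts_config cfg;

	cfg.size = size;
	cfg.flags = PTRACE_BTS_O_TRACE | PTRACE_BTS_O_SIGNAL |
		    PTRACE_BTS_O_CUT_SIZE;
	cfg.signal = (unsigned int)sig;
	return cfg;
}

/* Nonzero if the configuration wraps around on overflow (no signal). */
static int bts_wraps_around(const struct ptrace_bts_config *cfg)
{
	return !(cfg->flags & PTRACE_BTS_O_SIGNAL);
}
```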


Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: linux-2.6-x86/include/asm-x86/ptrace-abi.h
===
--- linux-2.6-x86.orig/include/asm-x86/ptrace-abi.h 2007-12-20 13:52:09.%N +0100
+++ linux-2.6-x86/include/asm-x86/ptrace-abi.h  2007-12-20 13:52:14.%N +0100
@@ -88,11 +88,13 @@
unsigned int size;
/* bitmask of below flags */
unsigned int flags;
+   /* buffer overflow signal */
+   unsigned int signal;
 };
 
 #define PTRACE_BTS_O_TRACE 0x1 /* branch trace */
 #define PTRACE_BTS_O_SCHED 0x2 /* scheduling events w/ jiffies */
-#define PTRACE_BTS_O_SIGNAL 0x4 /* send SIG? on buffer overflow
+#define PTRACE_BTS_O_SIGNAL 0x4 /* send SIG on buffer overflow
   instead of wrapping around */
 #define PTRACE_BTS_O_CUT_SIZE  0x8 /* cut requested size to max available
   instead of failing */


[patch 3/5] x86, ptrace: add buffer size checks

2007-12-20 Thread Markus Metzger
Pass the buffer size for (most) ptrace commands that pass user-allocated 
buffers and check that size before accessing the buffer. Unfortunately, 
PTRACE_BTS_GET already uses all 4 parameters, so there is no room to pass a 
size for it.
Commands that access user buffers return the number of bytes or records read or 
written.
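
The size check added to ptrace_bts_drain() can be modeled in user space as below. drain_size_check() is a hypothetical restatement, not the patch's code: it takes the record size as a parameter and rejects a negative size explicitly. That extra test matters in C because comparing a signed long with a size_t converts the long to an unsigned type on the usual ILP32 and LP64 ABIs, so a negative size would otherwise compare as a very large value.

```c
#include <errno.h>
#include <stddef.h>

/* Return 0 if a user buffer of 'size' bytes can hold 'end' records of
 * 'rec_size' bytes each, -EIO otherwise. 'end' is assumed positive, as in
 * ptrace_bts_drain(), which returns early when end <= 0. */
static int drain_size_check(long size, int end, size_t rec_size)
{
	/* Reject negative sizes before mixing signed and unsigned types. */
	if (size < 0)
		return -EIO;
	if ((size_t)size < (size_t)end * rec_size)
		return -EIO;
	return 0;
}
```

The same pattern would apply to the cfg_size checks in ptrace_bts_config() and ptrace_bts_status().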


Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: linux-2.6-x86/arch/x86/kernel/ptrace.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/ptrace.c 2007-12-20 13:52:01.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/ptrace.c  2007-12-20 13:52:09.%N +0100
@@ -591,6 +591,7 @@
 }
 
 static int ptrace_bts_drain(struct task_struct *child,
+   long size,
struct bts_struct __user *out)
 {
int end, i;
@@ -603,6 +604,9 @@
if (end <= 0)
return end;
 
+   if (size < (end * sizeof(struct bts_struct)))
+   return -EIO;
+
for (i = 0; i < end; i++, out++) {
struct bts_struct ret;
int retval;
@@ -617,7 +621,7 @@
 
ds_clear(ds);
 
-   return i;
+   return end;
 }
 
 static int ptrace_bts_realloc(struct task_struct *child,
@@ -690,15 +694,22 @@
 }
 
 static int ptrace_bts_config(struct task_struct *child,
+long cfg_size,
 const struct ptrace_bts_config __user *ucfg)
 {
struct ptrace_bts_config cfg;
int bts_size, ret = 0;
void *ds;
 
+   if (cfg_size < sizeof(cfg))
+   return -EIO;
+
if (copy_from_user(&cfg, ucfg, sizeof(cfg)))
return -EFAULT;
 
+   if ((int)cfg.size < 0)
+   return -EINVAL;
+
bts_size = 0;
ds = (void *)child->thread.ds_area_msr;
if (ds) {
@@ -734,6 +745,8 @@
else
clear_tsk_thread_flag(child, TIF_BTS_TRACE_TS);
 
+   ret = sizeof(cfg);
+
 out:
if (child->thread.debugctlmsr)
set_tsk_thread_flag(child, TIF_DEBUGCTLMSR);
@@ -749,11 +762,15 @@
 }
 
 static int ptrace_bts_status(struct task_struct *child,
+long cfg_size,
 struct ptrace_bts_config __user *ucfg)
 {
void *ds = (void *)child->thread.ds_area_msr;
struct ptrace_bts_config cfg;
 
+   if (cfg_size < sizeof(cfg))
+   return -EIO;
+
memset(&cfg, 0, sizeof(cfg));
 
if (ds) {
@@ -935,12 +952,12 @@
 
case PTRACE_BTS_CONFIG:
ret = ptrace_bts_config
-   (child, (struct ptrace_bts_config __user *)addr);
+   (child, data, (struct ptrace_bts_config __user *)addr);
break;
 
case PTRACE_BTS_STATUS:
ret = ptrace_bts_status
-   (child, (struct ptrace_bts_config __user *)addr);
+   (child, data, (struct ptrace_bts_config __user *)addr);
break;
 
case PTRACE_BTS_SIZE:
@@ -958,7 +975,7 @@
 
case PTRACE_BTS_DRAIN:
ret = ptrace_bts_drain
-   (child, (struct bts_struct __user *) addr);
+   (child, data, (struct bts_struct __user *) addr);
break;
 
default:
Index: linux-2.6-x86/include/asm-x86/ptrace-abi.h
===
--- linux-2.6-x86.orig/include/asm-x86/ptrace-abi.h 2007-12-20 13:52:01.%N +0100
+++ linux-2.6-x86/include/asm-x86/ptrace-abi.h  2007-12-20 13:52:09.%N +0100
@@ -99,13 +99,15 @@
 
 #define PTRACE_BTS_CONFIG  40
 /* Configure branch trace recording.
-   DATA is ignored, ADDR points to a struct ptrace_bts_config.
+   ADDR points to a struct ptrace_bts_config.
+   DATA gives the size of that buffer.
A new buffer is allocated, iff the size changes.
+   Returns the number of bytes read.
 */
 #define PTRACE_BTS_STATUS  41
-/* Return the current configuration.
-   DATA is ignored, ADDR points to a struct ptrace_bts_config
-   that will contain the result.
+/* Return the current configuration in a struct ptrace_bts_config
+   pointed to by ADDR; DATA gives the size of that buffer.
+   Returns the number of bytes written.
 */
 #define PTRACE_BTS_SIZE42
 /* Return the number of available BTS records.
@@ -123,8 +125,8 @@
 */
 #define PTRACE_BTS_DRAIN   45
 /* Read all available BTS records and clear the buffer.
-   DATA is ignored. ADDR points to an array of struct bts_struct of
-   suitable size.
+   ADDR points to an array of struct bts_struct.
+   DATA gives the size of that buffer.
BTS records are read from oldest to newest.
Returns number of BTS records drained.
 */

[patch 2/5] x86, ptrace: support 32bit-cross-64bit BTS recording

2007-12-20 Thread Markus Metzger
Support BTS recording of 32bit and 64bit tasks from 32bit or 64bit tasks.
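
The patch below replaces void * with unsigned long in the DS accessors, so that buffer addresses are handled as plain integers reached through per-field offsets. A user-space sketch of that offset-based access pattern follows. It is generalized with a hypothetical field_desc that also records each field's width, so one accessor pair can serve 32-bit and 64-bit layouts; that width field is an assumption of this sketch, not a claim about ds.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor: byte offset and width (4 or 8) of one field
 * in a DS save area whose layout is chosen at run time. */
struct field_desc {
	size_t offset;
	size_t size;
};

/* Read a 4- or 8-byte field as an integer. memcpy sidesteps alignment
 * and strict-aliasing concerns that a direct pointer cast can raise. */
static uint64_t get_field(const unsigned char *base, struct field_desc d)
{
	if (d.size == 4) {
		uint32_t v;
		memcpy(&v, base + d.offset, sizeof(v));
		return v;
	} else {
		uint64_t v;
		memcpy(&v, base + d.offset, sizeof(v));
		return v;
	}
}

/* Store 'value' into a 4- or 8-byte field; 4-byte fields keep only the
 * low 32 bits. */
static void set_field(unsigned char *base, struct field_desc d, uint64_t value)
{
	if (d.size == 4) {
		uint32_t v = (uint32_t)value;
		memcpy(base + d.offset, &v, sizeof(v));
	} else {
		memcpy(base + d.offset, &value, sizeof(value));
	}
}
```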


Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: linux-2.6-x86/arch/x86/kernel/ds.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/ds.c 2007-12-20 13:51:20.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/ds.c  2007-12-20 13:52:01.%N +0100
@@ -111,53 +111,53 @@
  * Accessor functions for some DS and BTS fields using the above
  * global ptrace_bts_cfg.
  */
-static inline void *get_bts_buffer_base(char *base)
+static inline unsigned long get_bts_buffer_base(char *base)
 {
-   return *(void **)(base + ds_cfg.bts_buffer_base.offset);
+   return *(unsigned long *)(base + ds_cfg.bts_buffer_base.offset);
 }
-static inline void set_bts_buffer_base(char *base, void *value)
+static inline void set_bts_buffer_base(char *base, unsigned long value)
 {
-   (*(void **)(base + ds_cfg.bts_buffer_base.offset)) = value;
+   (*(unsigned long *)(base + ds_cfg.bts_buffer_base.offset)) = value;
 }
-static inline void *get_bts_index(char *base)
+static inline unsigned long get_bts_index(char *base)
 {
-   return *(void **)(base + ds_cfg.bts_index.offset);
+   return *(unsigned long *)(base + ds_cfg.bts_index.offset);
 }
-static inline void set_bts_index(char *base, void *value)
+static inline void set_bts_index(char *base, unsigned long value)
 {
-   (*(void **)(base + ds_cfg.bts_index.offset)) = value;
+   (*(unsigned long *)(base + ds_cfg.bts_index.offset)) = value;
 }
-static inline void *get_bts_absolute_maximum(char *base)
+static inline unsigned long get_bts_absolute_maximum(char *base)
 {
-   return *(void **)(base + ds_cfg.bts_absolute_maximum.offset);
+   return *(unsigned long *)(base + ds_cfg.bts_absolute_maximum.offset);
 }
-static inline void set_bts_absolute_maximum(char *base, void *value)
+static inline void set_bts_absolute_maximum(char *base, unsigned long value)
 {
-   (*(void **)(base + ds_cfg.bts_absolute_maximum.offset)) = value;
+   (*(unsigned long *)(base + ds_cfg.bts_absolute_maximum.offset)) = value;
 }
-static inline void *get_bts_interrupt_threshold(char *base)
+static inline unsigned long get_bts_interrupt_threshold(char *base)
 {
-   return *(void **)(base + ds_cfg.bts_interrupt_threshold.offset);
+   return *(unsigned long *)(base + ds_cfg.bts_interrupt_threshold.offset);
 }
-static inline void set_bts_interrupt_threshold(char *base, void *value)
+static inline void set_bts_interrupt_threshold(char *base, unsigned long value)
 {
-   (*(void **)(base + ds_cfg.bts_interrupt_threshold.offset)) = value;
+   (*(unsigned long *)(base + ds_cfg.bts_interrupt_threshold.offset)) = value;
 }
-static inline long get_from_ip(char *base)
+static inline unsigned long get_from_ip(char *base)
 {
-   return *(long *)(base + ds_cfg.from_ip.offset);
+   return *(unsigned long *)(base + ds_cfg.from_ip.offset);
 }
-static inline void set_from_ip(char *base, long value)
+static inline void set_from_ip(char *base, unsigned long value)
 {
-   (*(long *)(base + ds_cfg.from_ip.offset)) = value;
+   (*(unsigned long *)(base + ds_cfg.from_ip.offset)) = value;
 }
-static inline long get_to_ip(char *base)
+static inline unsigned long get_to_ip(char *base)
 {
-   return *(long *)(base + ds_cfg.to_ip.offset);
+   return *(unsigned long *)(base + ds_cfg.to_ip.offset);
 }
-static inline void set_to_ip(char *base, long value)
+static inline void set_to_ip(char *base, unsigned long value)
 {
-   (*(long *)(base + ds_cfg.to_ip.offset)) = value;
+   (*(unsigned long *)(base + ds_cfg.to_ip.offset)) = value;
 }
 static inline unsigned char get_info_type(char *base)
 {
@@ -180,7 +180,7 @@
 int ds_allocate(void **dsp, size_t bts_size_in_bytes)
 {
size_t bts_size_in_records;
-   void *bts;
+   unsigned long bts;
void *ds;
 
if (!ds_cfg.sizeof_ds || !ds_cfg.sizeof_bts)
@@ -197,7 +197,7 @@
if (bts_size_in_bytes <= 0)
return -EINVAL;
 
-   bts = kzalloc(bts_size_in_bytes, GFP_KERNEL);
+   bts = (unsigned long)kzalloc(bts_size_in_bytes, GFP_KERNEL);
 
if (!bts)
return -ENOMEM;
@@ -205,7 +205,7 @@
ds = kzalloc(ds_cfg.sizeof_ds, GFP_KERNEL);
 
if (!ds) {
-   kfree(bts);
+   kfree((void *)bts);
return -ENOMEM;
}
 
@@ -221,7 +221,7 @@
 int ds_free(void **dsp)
 {
if (*dsp)
-   kfree(get_bts_buffer_base(*dsp));
+   kfree((void *)get_bts_buffer_base(*dsp));
kfree(*dsp);
*dsp = 0;
 
@@ -230,7 +230,7 @@
 
 int ds_get_bts_size(void *ds)
 {
-   size_t size_in_bytes;
+   int size_in_bytes;
 
if (!ds_cfg.sizeof_ds || !ds_cfg.sizeof_bts)
return -EOPNOTSUPP;
@@ -246,7 +246,7 @@
 
 int ds_get_bts_end(void *ds)
 {
-   size_t size_in_bytes = ds_get_bts_size(ds);
+   int size

[patch 1/5] x86, ptrace: rlimit BTS buffer allocation

2007-12-20 Thread Markus Metzger
Check the rlimit of the tracing task for total and locked memory when 
allocating the BTS buffer.
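
The accounting in ptrace_bts_realloc() below works in pages: ptrace_bts_config() page-aligns the requested size, ptrace_bts_realloc() shifts it by PAGE_SHIFT and checks it against RLIMIT_AS and RLIMIT_MEMLOCK, optionally shrinking the request when PTRACE_BTS_O_CUT_SIZE is set. A simplified, single-limit user-space sketch of that decision follows. It assumes 4 KiB pages, and bts_pages_allowed() is a hypothetical helper, not kernel code.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Return how many pages a BTS buffer of 'req_bytes' may use, given a limit
 * of 'rlim_bytes' and 'used_pages' already accounted. Without 'reduce',
 * a request that does not fit fails (-1); with 'reduce', it is cut to
 * whatever still fits, or fails if nothing does. */
static long bts_pages_allowed(unsigned long req_bytes, unsigned long rlim_bytes,
			      unsigned long used_pages, bool reduce)
{
	unsigned long page = 1UL << SKETCH_PAGE_SHIFT;
	/* Round up to whole pages, like PAGE_ALIGN() followed by >> PAGE_SHIFT. */
	unsigned long want = (req_bytes + page - 1) >> SKETCH_PAGE_SHIFT;
	unsigned long rlim = rlim_bytes >> SKETCH_PAGE_SHIFT;

	if (want == 0)
		return 0; /* size 0 means: free the buffer, nothing to check */
	if (used_pages + want <= rlim)
		return (long)want;
	if (!reduce || used_pages >= rlim)
		return -1;
	return (long)(rlim - used_pages);
}
```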

Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: linux-2.6-x86/arch/x86/kernel/ptrace.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/ptrace.c 2007-12-20 13:51:21.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/ptrace.c  2007-12-20 13:51:45.%N +0100
@@ -620,12 +620,80 @@
return i;
 }
 
+static int ptrace_bts_realloc(struct task_struct *child,
+ int size, int reduce_size)
+{
+   unsigned long rlim, vm;
+   int ret, old_size;
+
+   if (size < 0)
+   return -EINVAL;
+
+   old_size = ds_get_bts_size((void *)child->thread.ds_area_msr);
+   if (old_size < 0)
+   return old_size;
+
+   ret = ds_free((void **)&child->thread.ds_area_msr);
+   if (ret < 0)
+   goto out;
+
+   size >>= PAGE_SHIFT;
+   old_size >>= PAGE_SHIFT;
+
+   current->mm->total_vm  -= old_size;
+   current->mm->locked_vm -= old_size;
+
+   if (size == 0)
+   goto out;
+
+   rlim = current->signal->rlim[RLIMIT_AS].rlim_cur >> PAGE_SHIFT;
+   vm = current->mm->total_vm  + size;
+   if (rlim < vm) {
+   ret = -ENOMEM;
+
+   if (!reduce_size)
+   goto out;
+
+   size = rlim - current->mm->total_vm;
+   if (size <= 0)
+   goto out;
+   }
+
+   rlim = current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur >> PAGE_SHIFT;
+   vm = current->mm->locked_vm  + size;
+   if (rlim < vm) {
+   ret = -ENOMEM;
+
+   if (!reduce_size)
+   goto out;
+
+   size = rlim - current->mm->locked_vm;
+   if (size <= 0)
+   goto out;
+   }
+
+   ret = ds_allocate((void **)&child->thread.ds_area_msr,
+ size << PAGE_SHIFT);
+   if (ret < 0)
+   goto out;
+
+   current->mm->total_vm  += size;
+   current->mm->locked_vm += size;
+
+out:
+   if (child->thread.ds_area_msr)
+   set_tsk_thread_flag(child, TIF_DS_AREA_MSR);
+   else
+   clear_tsk_thread_flag(child, TIF_DS_AREA_MSR);
+
+   return ret;
+}
+
 static int ptrace_bts_config(struct task_struct *child,
 const struct ptrace_bts_config __user *ucfg)
 {
struct ptrace_bts_config cfg;
-   unsigned long debugctl_mask;
-   int bts_size, ret;
+   int bts_size, ret = 0;
void *ds;
 
if (copy_from_user(&cfg, ucfg, sizeof(cfg)))
@@ -638,59 +706,46 @@
if (bts_size < 0)
return bts_size;
}
+   cfg.size = PAGE_ALIGN(cfg.size);
 
if (bts_size != cfg.size) {
-   ret = ds_free((void **)&child->thread.ds_area_msr);
+   ret = ptrace_bts_realloc(child, cfg.size,
+cfg.flags & PTRACE_BTS_O_CUT_SIZE);
if (ret < 0)
-   return ret;
+   goto errout;
 
-   if (cfg.size > 0)
-   ret = ds_allocate((void **)&child->thread.ds_area_msr,
- cfg.size);
ds = (void *)child->thread.ds_area_msr;
-   if (ds)
-   set_tsk_thread_flag(child, TIF_DS_AREA_MSR);
-   else
-   clear_tsk_thread_flag(child, TIF_DS_AREA_MSR);
-
-   if (ret < 0)
-   return ret;
-
-   bts_size = ds_get_bts_size(ds);
-   if (bts_size <= 0)
-   return bts_size;
}
 
-   if (ds) {
-   if (cfg.flags & PTRACE_BTS_O_SIGNAL) {
-   ret = ds_set_overflow(ds, DS_O_SIGNAL);
-   } else {
-   ret = ds_set_overflow(ds, DS_O_WRAP);
-   }
-   if (ret < 0)
-   return ret;
-   }
-
-   debugctl_mask = ds_debugctl_mask();
-   if (ds && (cfg.flags & PTRACE_BTS_O_TRACE)) {
-   child->thread.debugctlmsr |= debugctl_mask;
-   set_tsk_thread_flag(child, TIF_DEBUGCTLMSR);
-   } else {
-   /* there is no way for us to check whether we 'own'
-* the respective bits in the DEBUGCTL MSR, we're
-* about to clear */
-   child->thread.debugctlmsr &= ~debugctl_mask;
+   if (cfg.flags & PTRACE_BTS_O_SIGNAL)
+   ret = ds_set_overflow(ds, DS_O_SIGNAL);
+   else
+   ret = ds_set_overflow(ds, DS_O_WRAP);
+   if (ret < 0)
+   goto errout;
 
-   if (!child->thread.debugctlmsr)
-   clear_tsk_thread_flag(child, TIF_DEBUGCTLMSR);
-   }
+   if (cfg.flags & PTRACE_BTS_O_TRACE)
+   child->thread.debugctlmsr |= ds_debugctl_mask();
+   else

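To make the limit arithmetic in ptrace_bts_realloc() easier to follow, here is a minimal user-space sketch of the same decision, assuming all sizes have already been converted to pages. The helper names bts_fit_pages() and bts_grant_pages() are hypothetical and not part of the patch. As in the patch, the buffer is charged to the tracing task: the RLIMIT_AS budget (total_vm) is checked first, then the RLIMIT_MEMLOCK budget (locked_vm), and the request is only shrunk when PTRACE_BTS_O_CUT_SIZE is set.

```c
#include <errno.h>

/*
 * Hypothetical helper: decide how many pages of a request fit under a
 * limit, given the pages already in use.  Returns the granted page
 * count, or a negative errno value.
 */
static long bts_fit_pages(unsigned long limit, unsigned long used,
			  long request, int cut_size)
{
	if (request < 0)
		return -EINVAL;
	if (used + (unsigned long)request <= limit)
		return request;			/* fits as requested */
	if (!cut_size)
		return -ENOMEM;			/* PTRACE_BTS_O_CUT_SIZE not set */
	if (limit <= used)
		return -ENOMEM;			/* nothing left to hand out */
	return (long)(limit - used);		/* shrink to the remainder */
}

/* Check the address-space budget first, then the locked-memory budget. */
static long bts_grant_pages(unsigned long as_limit, unsigned long total_vm,
			    unsigned long lock_limit, unsigned long locked_vm,
			    long request, int cut_size)
{
	long pages = bts_fit_pages(as_limit, total_vm, request, cut_size);

	if (pages <= 0)
		return pages;
	return bts_fit_pages(lock_limit, locked_vm, pages, cut_size);
}
```

One deliberate difference: the sketch tests `limit <= used` before subtracting, whereas the patch computes `rlim - total_vm` on unsigned longs and relies on the conversion to int yielding a non-positive size when usage already exceeds the limit.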
[patch 2/5] x86, ptrace: support 32bit-cross-64bit BTS recording

2007-12-20 Thread Markus Metzger
Support BTS recording of 32bit and 64bit tasks from 32bit or 64bit tasks.


Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: linux-2.6-x86/arch/x86/kernel/ds.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/ds.c 2007-12-20 13:51:20.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/ds.c  2007-12-20 13:52:01.%N +0100
@@ -111,53 +111,53 @@
  * Accessor functions for some DS and BTS fields using the above
  * global ptrace_bts_cfg.
  */
-static inline void *get_bts_buffer_base(char *base)
+static inline unsigned long get_bts_buffer_base(char *base)
 {
-   return *(void **)(base + ds_cfg.bts_buffer_base.offset);
+   return *(unsigned long *)(base + ds_cfg.bts_buffer_base.offset);
 }
-static inline void set_bts_buffer_base(char *base, void *value)
+static inline void set_bts_buffer_base(char *base, unsigned long value)
 {
-   (*(void **)(base + ds_cfg.bts_buffer_base.offset)) = value;
+   (*(unsigned long *)(base + ds_cfg.bts_buffer_base.offset)) = value;
 }
-static inline void *get_bts_index(char *base)
+static inline unsigned long get_bts_index(char *base)
 {
-   return *(void **)(base + ds_cfg.bts_index.offset);
+   return *(unsigned long *)(base + ds_cfg.bts_index.offset);
 }
-static inline void set_bts_index(char *base, void *value)
+static inline void set_bts_index(char *base, unsigned long value)
 {
-   (*(void **)(base + ds_cfg.bts_index.offset)) = value;
+   (*(unsigned long *)(base + ds_cfg.bts_index.offset)) = value;
 }
-static inline void *get_bts_absolute_maximum(char *base)
+static inline unsigned long get_bts_absolute_maximum(char *base)
 {
-   return *(void **)(base + ds_cfg.bts_absolute_maximum.offset);
+   return *(unsigned long *)(base + ds_cfg.bts_absolute_maximum.offset);
 }
-static inline void set_bts_absolute_maximum(char *base, void *value)
+static inline void set_bts_absolute_maximum(char *base, unsigned long value)
 {
-   (*(void **)(base + ds_cfg.bts_absolute_maximum.offset)) = value;
+   (*(unsigned long *)(base + ds_cfg.bts_absolute_maximum.offset)) = value;
 }
-static inline void *get_bts_interrupt_threshold(char *base)
+static inline unsigned long get_bts_interrupt_threshold(char *base)
 {
-   return *(void **)(base + ds_cfg.bts_interrupt_threshold.offset);
+   return *(unsigned long *)(base + ds_cfg.bts_interrupt_threshold.offset);
 }
-static inline void set_bts_interrupt_threshold(char *base, void *value)
+static inline void set_bts_interrupt_threshold(char *base, unsigned long value)
 {
-   (*(void **)(base + ds_cfg.bts_interrupt_threshold.offset)) = value;
+   (*(unsigned long *)(base + ds_cfg.bts_interrupt_threshold.offset)) = value;
 }
-static inline long get_from_ip(char *base)
+static inline unsigned long get_from_ip(char *base)
 {
-   return *(long *)(base + ds_cfg.from_ip.offset);
+   return *(unsigned long *)(base + ds_cfg.from_ip.offset);
 }
-static inline void set_from_ip(char *base, long value)
+static inline void set_from_ip(char *base, unsigned long value)
 {
-   (*(long *)(base + ds_cfg.from_ip.offset)) = value;
+   (*(unsigned long *)(base + ds_cfg.from_ip.offset)) = value;
 }
-static inline long get_to_ip(char *base)
+static inline unsigned long get_to_ip(char *base)
 {
-   return *(long *)(base + ds_cfg.to_ip.offset);
+   return *(unsigned long *)(base + ds_cfg.to_ip.offset);
 }
-static inline void set_to_ip(char *base, long value)
+static inline void set_to_ip(char *base, unsigned long value)
 {
-   (*(long *)(base + ds_cfg.to_ip.offset)) = value;
+   (*(unsigned long *)(base + ds_cfg.to_ip.offset)) = value;
 }
 static inline unsigned char get_info_type(char *base)
 {
@@ -180,7 +180,7 @@
 int ds_allocate(void **dsp, size_t bts_size_in_bytes)
 {
size_t bts_size_in_records;
-   void *bts;
+   unsigned long bts;
void *ds;
 
if (!ds_cfg.sizeof_ds || !ds_cfg.sizeof_bts)
@@ -197,7 +197,7 @@
if (bts_size_in_bytes <= 0)
return -EINVAL;
 
-   bts = kzalloc(bts_size_in_bytes, GFP_KERNEL);
+   bts = (unsigned long)kzalloc(bts_size_in_bytes, GFP_KERNEL);
 
if (!bts)
return -ENOMEM;
@@ -205,7 +205,7 @@
ds = kzalloc(ds_cfg.sizeof_ds, GFP_KERNEL);
 
if (!ds) {
-   kfree(bts);
+   kfree((void *)bts);
return -ENOMEM;
}
 
@@ -221,7 +221,7 @@
 int ds_free(void **dsp)
 {
if (*dsp)
-   kfree(get_bts_buffer_base(*dsp));
+   kfree((void *)get_bts_buffer_base(*dsp));
kfree(*dsp);
*dsp = 0;
 
@@ -230,7 +230,7 @@
 
 int ds_get_bts_size(void *ds)
 {
-   size_t size_in_bytes;
+   int size_in_bytes;
 
if (!ds_cfg.sizeof_ds || !ds_cfg.sizeof_bts)
return -EOPNOTSUPP;
@@ -246,7 +246,7 @@
 
 int ds_get_bts_end(void *ds)
 {
-   size_t size_in_bytes = ds_get_bts_size(ds);
+   int size_in_bytes

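The accessors above now treat every DS field as an unsigned long. A minimal sketch of the same offset-based access pattern follows, assuming a hypothetical struct bts_layout in place of ds_cfg and a hypothetical fixed-width record for the tracer. It uses memcpy() instead of the patch's pointer cast to avoid alignment and strict-aliasing assumptions; that substitution is not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-field offsets kept in ds_cfg. */
struct bts_layout {
	size_t from_ip_offset;
	size_t to_ip_offset;
};

/* Hypothetical fixed-width branch record as a tracer might receive it. */
struct bts_branch64 {
	uint64_t from_ip;
	uint64_t to_ip;
};

/* Read an address-sized field at base + offset as unsigned long. */
static unsigned long bts_get_field(const char *base, size_t offset)
{
	unsigned long value;

	memcpy(&value, base + offset, sizeof(value));
	return value;
}

/* Widen the native fields into the fixed-width record (zero-extension). */
static struct bts_branch64 bts_read_branch(const char *rec,
					   const struct bts_layout *layout)
{
	struct bts_branch64 out;

	out.from_ip = bts_get_field(rec, layout->from_ip_offset);
	out.to_ip = bts_get_field(rec, layout->to_ip_offset);
	return out;
}
```

Reading the field as unsigned long means that a 32-bit address with the top bit set, such as 0x80001000, is zero-extended when widened to 64 bits; reading it as a signed long would sign-extend it to 0xffffffff80001000.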
[patch 3/5] x86, ptrace: add buffer size checks

2007-12-20 Thread Markus Metzger
Pass the buffer size to (most) ptrace commands that take user-allocated
buffers, and check that size before accessing the buffer. PTRACE_BTS_GET
is the exception: it already uses all 4 parameters, so there is no room
for a size argument.
Commands that access user buffers return the number of bytes or records
read or written.


Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: linux-2.6-x86/arch/x86/kernel/ptrace.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/ptrace.c 2007-12-20 13:52:01.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/ptrace.c  2007-12-20 13:52:09.%N +0100
@@ -591,6 +591,7 @@
 }
 
 static int ptrace_bts_drain(struct task_struct *child,
+   long size,
struct bts_struct __user *out)
 {
int end, i;
@@ -603,6 +604,9 @@
if (end <= 0)
return end;
 
+   if (size < (end * sizeof(struct bts_struct)))
+   return -EIO;
+
for (i = 0; i < end; i++, out++) {
struct bts_struct ret;
int retval;
@@ -617,7 +621,7 @@
 
ds_clear(ds);
 
-   return i;
+   return end;
 }
 
 static int ptrace_bts_realloc(struct task_struct *child,
@@ -690,15 +694,22 @@
 }
 
 static int ptrace_bts_config(struct task_struct *child,
+long cfg_size,
 const struct ptrace_bts_config __user *ucfg)
 {
struct ptrace_bts_config cfg;
int bts_size, ret = 0;
void *ds;
 
+   if (cfg_size < sizeof(cfg))
+   return -EIO;
+
if (copy_from_user(&cfg, ucfg, sizeof(cfg)))
return -EFAULT;
 
+   if ((int)cfg.size < 0)
+   return -EINVAL;
+
bts_size = 0;
ds = (void *)child->thread.ds_area_msr;
if (ds) {
@@ -734,6 +745,8 @@
else
clear_tsk_thread_flag(child, TIF_BTS_TRACE_TS);
 
+   ret = sizeof(cfg);
+
 out:
if (child->thread.debugctlmsr)
set_tsk_thread_flag(child, TIF_DEBUGCTLMSR);
@@ -749,11 +762,15 @@
 }
 
 static int ptrace_bts_status(struct task_struct *child,
+long cfg_size,
 struct ptrace_bts_config __user *ucfg)
 {
void *ds = (void *)child->thread.ds_area_msr;
struct ptrace_bts_config cfg;
 
+   if (cfg_size < sizeof(cfg))
+   return -EIO;
+
memset(&cfg, 0, sizeof(cfg));
 
if (ds) {
@@ -935,12 +952,12 @@
 
case PTRACE_BTS_CONFIG:
ret = ptrace_bts_config
-   (child, (struct ptrace_bts_config __user *)addr);
+   (child, data, (struct ptrace_bts_config __user *)addr);
break;
 
case PTRACE_BTS_STATUS:
ret = ptrace_bts_status
-   (child, (struct ptrace_bts_config __user *)addr);
+   (child, data, (struct ptrace_bts_config __user *)addr);
break;
 
case PTRACE_BTS_SIZE:
@@ -958,7 +975,7 @@
 
case PTRACE_BTS_DRAIN:
ret = ptrace_bts_drain
-   (child, (struct bts_struct __user *) addr);
+   (child, data, (struct bts_struct __user *) addr);
break;
 
default:
Index: linux-2.6-x86/include/asm-x86/ptrace-abi.h
===
--- linux-2.6-x86.orig/include/asm-x86/ptrace-abi.h 2007-12-20 13:52:01.%N +0100
+++ linux-2.6-x86/include/asm-x86/ptrace-abi.h  2007-12-20 13:52:09.%N +0100
@@ -99,13 +99,15 @@
 
 #define PTRACE_BTS_CONFIG  40
 /* Configure branch trace recording.
-   DATA is ignored, ADDR points to a struct ptrace_bts_config.
+   ADDR points to a struct ptrace_bts_config.
+   DATA gives the size of that buffer.
A new buffer is allocated, iff the size changes.
+   Returns the number of bytes read.
 */
 #define PTRACE_BTS_STATUS  41
-/* Return the current configuration.
-   DATA is ignored, ADDR points to a struct ptrace_bts_config
-   that will contain the result.
+/* Return the current configuration in a struct ptrace_bts_config
+   pointed to by ADDR; DATA gives the size of that buffer.
+   Returns the number of bytes written.
 */
#define PTRACE_BTS_SIZE    42
 /* Return the number of available BTS records.
@@ -123,8 +125,8 @@
 */
 #define PTRACE_BTS_DRAIN   45
 /* Read all available BTS records and clear the buffer.
-   DATA is ignored. ADDR points to an array of struct bts_struct of
-   suitable size.
+   ADDR points to an array of struct bts_struct.
+   DATA gives the size of that buffer.
BTS records are read from oldest to newest.
Returns number of BTS records drained.
 */
-
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer

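A minimal sketch of the drain-side size check from patch 3/5 above, assuming a hypothetical record type and an in-memory source in place of the DS area and copy_to_user(). It returns the record count on success and -EIO when the caller's buffer is too small, matching the return convention described for PTRACE_BTS_DRAIN.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct bts_struct. */
struct bts_rec {
	unsigned long from;
	unsigned long to;
};

/*
 * Copy 'end' records from 'src' into 'out', which the caller says is
 * 'out_size' bytes long.  Returns the number of records copied, a
 * non-positive 'end' unchanged, or -EIO if 'out' is too small.
 */
static long bts_drain_checked(const struct bts_rec *src, int end,
			      struct bts_rec *out, long out_size)
{
	size_t need;

	if (end <= 0)
		return end;		/* nothing to drain, or an error code */

	need = (size_t)end * sizeof(*src);
	/* Reject negative sizes explicitly before the unsigned comparison. */
	if (out_size < 0 || (size_t)out_size < need)
		return -EIO;

	memcpy(out, src, need);
	return end;			/* records, not bytes */
}
```

The explicit `out_size < 0` test is a deliberate addition: in the patch, `size < (end * sizeof(struct bts_struct))` compares a long with a size_t, so a negative size is converted to a large unsigned value and passes the check.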
[patch 4/5] x86, ptrace: overflow signal API

2007-12-20 Thread Markus Metzger
Establish the user API for sending a user-defined signal to the traced task on 
a BTS buffer overflow.

This should complete the user API for the BTS ptrace extension.
The patches so far implement the wrap-around overflow handling needed for
debugging.

The remaining open item is the other overflow handling mechanism, which sends
a signal to the traced task on a buffer overflow.
This will take me some more time.

Since this happens behind the scenes from the user's perspective, the patch
set should already be useful. More features may be added on top of it
(overflow signal, pageable back-up buffers, kernel tracing, core file support,
profiling, ...).


Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: linux-2.6-x86/include/asm-x86/ptrace-abi.h
===
--- linux-2.6-x86.orig/include/asm-x86/ptrace-abi.h 2007-12-20 13:52:09.%N +0100
+++ linux-2.6-x86/include/asm-x86/ptrace-abi.h  2007-12-20 13:52:14.%N +0100
@@ -88,11 +88,13 @@
unsigned int size;
/* bitmask of below flags */
unsigned int flags;
+   /* buffer overflow signal */
+   unsigned int signal;
 };
 
 #define PTRACE_BTS_O_TRACE 0x1 /* branch trace */
 #define PTRACE_BTS_O_SCHED 0x2 /* scheduling events w/ jiffies */
-#define PTRACE_BTS_O_SIGNAL 0x4 /* send SIG? on buffer overflow
+#define PTRACE_BTS_O_SIGNAL 0x4 /* send SIG<signal> on buffer overflow
   instead of wrapping around */
 #define PTRACE_BTS_O_CUT_SIZE  0x8 /* cut requested size to max available
   instead of failing */

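From the tracer's side, the new signal field would be filled in alongside the flags. A minimal sketch, assuming the flag values and struct layout quoted from this patch set's ptrace-abi.h (they are not provided by current kernel or libc headers, so they are reproduced locally) and a hypothetical helper bts_config_prepare(). Per the PTRACE_BTS_CONFIG comment in patch 3/5, the tracer would then pass &cfg as ADDR and sizeof(cfg) as DATA.

```c
#include <signal.h>   /* signal numbers such as SIGUSR1 for callers */
#include <string.h>

/* Option flags as proposed in this patch set's ptrace-abi.h. */
#define PTRACE_BTS_O_TRACE    0x1
#define PTRACE_BTS_O_SCHED    0x2
#define PTRACE_BTS_O_SIGNAL   0x4
#define PTRACE_BTS_O_CUT_SIZE 0x8

/* Layout as proposed in this patch (size, flags, signal). */
struct ptrace_bts_config {
	unsigned int size;
	unsigned int flags;
	unsigned int signal;
};

/*
 * Hypothetical helper: request branch tracing into a buffer of 'size'
 * bytes.  signo == 0 selects wrap-around; signo > 0 asks for that
 * signal on buffer overflow instead.
 */
static void bts_config_prepare(struct ptrace_bts_config *cfg,
			       unsigned int size, int signo)
{
	memset(cfg, 0, sizeof(*cfg));
	cfg->size = size;
	/* Shrink an oversized request instead of failing (patch 1/5). */
	cfg->flags = PTRACE_BTS_O_TRACE | PTRACE_BTS_O_CUT_SIZE;
	if (signo > 0) {
		cfg->flags |= PTRACE_BTS_O_SIGNAL;
		cfg->signal = (unsigned int)signo;
	}
}
```

With the ds.c from the 2007-12-14 posting, ds_set_overflow() still returns -EOPNOTSUPP for DS_O_SIGNAL, so a request with PTRACE_BTS_O_SIGNAL set would be expected to fail until the signal mechanism is implemented.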

[patch 5/5] x86, ptrace, man: man pages for ptrace BTS extensions

2007-12-20 Thread Markus Metzger
Document changes for this patch set.

Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: man/man2/ptrace.2
===
--- man.orig/man2/ptrace.2  2007-12-14 17:45:33.%N +0100
+++ man/man2/ptrace.2   2007-12-20 13:20:07.%N +0100
@@ -40,6 +40,9 @@
 .\"PTRACE_SETSIGINFO, PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
 .\"(Thanks to Blaisorblade, Daniel Jacobowitz and others who helped.)
 .\"
+.\" Modified Nov 2007, Markus Metzger <[EMAIL PROTECTED]>
+.\" Added PTRACE_BTS_* commands
+.\"
 .TH PTRACE 2 2007-11-15 "Linux" "Linux Programmer's Manual"
 .SH NAME
 ptrace \- process trace
@@ -378,6 +381,131 @@
 detached in this way regardless of which method was used to initiate
 tracing.
 (\fIaddr\fP is ignored.)
+.LP
+The following ptrace commands provide access to the hardware's last
+branch recording. They may not be available on all architectures.
+.LP
+Last branch recording stores an execution trace of the traced
+process. For every (conditional) control flow change, the source and
+destination addresses are stored. On some architectures, control flow
+changes inside the kernel are recorded, as well. On later
+architectures, these are automatically filtered out.
+.LP
+The buffer (called Branch Trace Store) can be configured either to be
+circular or to send a signal to the traced task when it is about to
+overflow. Not all methods may be available on all architectures.
+.LP
+The buffer can be accessed in two ways matching the above
+configurations: either as an array of BTS records from newest
+record to older records, one record at a time; or all records at once,
+from oldest to newest.
+.LP
+The former is mostly used for circular buffers to capture a tail of
+the execution trace (e.g. for debugging); the latter is mostly used to
+collect a continuous trace (e.g. for profiling) where the user drains
+the hardware buffer into a larger private buffer or into a file.
+.LP
+In addition to branches, timestamps (in jiffies) may optionally be
+recorded when the traced process is scheduled in (arrives) and
+scheduled out (departs). This information can be used to obtain a
+qualitative execution order if more than one process is traced.
+.LP
+A BTS record is defined as:
+.LP
+.nf
+enum ptrace_bts_qualifier {
+   PTRACE_BTS_INVALID = 0,
+   PTRACE_BTS_BRANCH,
+   PTRACE_BTS_TASK_ARRIVES,
+   PTRACE_BTS_TASK_DEPARTS
+};
+.sp
+struct ptrace_bts_record {
+   u64 qualifier;
+   union {
+   /* PTRACE_BTS_BRANCH */
+   struct {
+   u64 from_ip;
+   u64 to_ip;
+   } lbr;
+   /* PTRACE_BTS_TASK_ARRIVES or
+  PTRACE_BTS_TASK_DEPARTS */
+   u64 timestamp;
+   } variant;
+};
+.fi
+.LP
+For configuring last branch recording and for querying its status, the
+following struct is used:
+.LP
+.nf
+struct ptrace_bts_config {
+   unsigned int size;
+   unsigned int flags;
+   unsigned int signal;
+};
+.fi
+.LP
+\fISize\fP is either the requested or the actual size of the kernel
+BTS buffer in bytes.
+\fIFlags\fP is a bitmask of options, which are specified by the
+following flags:
+.RS
+.TP
+.BR PTRACE_BTS_O_TRACE
+Collect branch trace records
+.TP
+.BR PTRACE_BTS_O_SCHED
+Collect scheduling timing information
+.TP
+.BR PTRACE_BTS_O_SIGNAL
+Send \fIsignal\fP to the traced task in case of a buffer overflow
+.TP
+.BR PTRACE_BTS_O_CUT_SIZE
+Reduce the requested buffer size if it is bigger than the available
+buffer size.
+.RE
+\fISignal\fP is the signal to send to the traced task in case of a
+buffer overflow.
+.TP
+.BR PTRACE_BTS_CONFIG
+Configure last branch recording. \fIaddr\fP points to a
+\fIptrace_bts_config\fP structure (see above); \fIdata\fP specifies
+the size of that structure.
+Returns the number of bytes read.
+.TP
+.BR PTRACE_BTS_STATUS
+Writes the actual configuration into a \fIptrace_bts_config\fP
+structure pointed to by \fIaddr\fP. The caller is responsible for
+allocating memory at \fIaddr\fP to hold a \fIptrace_bts_config\fP
+structure. \fIData\fP specifies the size of that structure.
+Returns the number of bytes written.
+.TP
+.BR PTRACE_BTS_SIZE
+Returns the number of BTS records available for draining. For a
+circular buffer, this number is meaningless.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+.BR PTRACE_BTS_GET
+Reads a single BTS record at index \fIdata\fP into \fIaddr\fP. The
+caller is responsible for allocating memory at \fIaddr\fP to hold one
+\fIptrace_bts_record\fP structure.
+The bigger the index, the older the record; the latest record can
+always be found at index 0.
+Returns the number of bytes written.
+.TP
+.BR PTRACE_BTS_CLEAR
+Clears the BTS buffer. This command can be used after a manual
+draining using PTRACE_BTS_GET commands.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+.BR PTRACE_BTS_DRAIN
+Reads all available BTS records into the buffer pointed to by
+\fIaddr\fP and clears the buffer

[patch 5/5] x86, ptrace, man: updated man pages for the ptrace API changes

2007-12-14 Thread Markus Metzger
Describe the ptrace user API changes for this patch set.

Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: man/man2/ptrace.2
===
--- man.orig/man2/ptrace.2  2007-12-14 13:22:17.%N +0100
+++ man/man2/ptrace.2   2007-12-14 14:35:10.%N +0100
@@ -40,6 +40,9 @@
 .\"PTRACE_SETSIGINFO, PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
 .\"(Thanks to Blaisorblade, Daniel Jacobowitz and others who helped.)
 .\"
+.\" Modified Nov 2007, Markus Metzger <[EMAIL PROTECTED]>
+.\" Added PTRACE_BTS_* commands
+.\"
 .TH PTRACE 2 2007-11-15 "Linux" "Linux Programmer's Manual"
 .SH NAME
 ptrace \- process trace
@@ -378,6 +381,126 @@
 detached in this way regardless of which method was used to initiate
 tracing.
 (\fIaddr\fP is ignored.)
+.LP
+The following ptrace commands provide access to the hardware's last
+branch recording. They may not be available on all architectures.
+.LP
+Last branch recording stores an execution trace of the traced
+process. For every (conditional) control flow change, the source and
+destination addresses are stored. On some architectures, control flow
+changes inside the kernel are recorded, as well. On later
+architectures, these are automatically filtered out.
+.LP
+The buffer (called Branch Trace Store) can be configured either to be
+circular or to send a signal to the traced task when it is about to
+overflow. Not all methods may be available on all architectures.
+.LP
+The buffer can be accessed in two ways matching the above
+configurations: either as an array of BTS records from newest
+record to older records, one record at a time; or all records at once,
+from oldest to newest.
+.LP
+The former is mostly used for circular buffers to capture a tail of
+the execution trace (e.g. for debugging); the latter is mostly used to
+collect a continuous trace (e.g. for profiling) where the user drains
+the hardware buffer into a larger private buffer or into a file.
+.LP
+In addition to branches, timestamps (in jiffies) may optionally be
+recorded when the traced process is scheduled in (arrives) and
+scheduled out (departs). This information can be used to obtain a
+qualitative execution order if more than one process is traced.
+.LP
+A BTS record is defined as:
+.LP
+.nf
+enum ptrace_bts_qualifier {
+   PTRACE_BTS_INVALID = 0,
+   PTRACE_BTS_BRANCH,
+   PTRACE_BTS_TASK_ARRIVES,
+   PTRACE_BTS_TASK_DEPARTS
+};
+.sp
+struct ptrace_bts_record {
+   enum ptrace_bts_qualifier qualifier;
+   union {
+   /* PTRACE_BTS_BRANCH */
+   struct {
+   void *from_ip;
+   void *to_ip;
+   } lbr;
+   /* PTRACE_BTS_TASK_ARRIVES or
+  PTRACE_BTS_TASK_DEPARTS */
+   unsigned long timestamp;
+   } variant;
+};
+.fi
+.LP
+For configuring last branch recording and for querying its status, the
+following struct is used:
+.LP
+.nf
+struct ptrace_bts_config {
+   unsigned long size;
+   unsigned long flags;
+};
+.fi
+.LP
+\fISize\fP is either the requested or the actual size of the kernel
+BTS buffer in bytes.
+\fIFlags\fP is a bitmask of options, which are specified by the
+following flags:
+.RS
+.TP
+.BR PTRACE_BTS_O_TRACE
+Collect branch trace records
+.TP
+.BR PTRACE_BTS_O_SCHED
+Collect scheduling timing information
+.TP
+.BR PTRACE_BTS_O_SIGNAL
+Send \fBSIG?\fP to the traced task in case of a buffer overflow
+.TP
+.BR PTRACE_BTS_O_CUT_SIZE
+Reduce the requested buffer size if it is bigger than the available
+buffer size.
+.RE
+.TP
+.BR PTRACE_BTS_CONFIG
+Configure last branch recording. \fIaddr\fP points to a
+\fIptrace_bts_config\fP structure (see above).
+(\fIdata\fP is ignored.)
+.TP
+.BR PTRACE_BTS_STATUS
+Writes the actual configuration into a \fIptrace_bts_config\fP
+structure pointed to by \fIaddr\fP. The caller is responsible for
+allocating memory at \fIaddr\fP to hold a \fIptrace_bts_config\fP
+structure.
+(\fIdata\fP is ignored.)
+.TP
+.BR PTRACE_BTS_SIZE
+Returns the number of BTS records available for draining. For a
+circular buffer, this number is meaningless.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+.BR PTRACE_BTS_GET
+Reads a single BTS record at index \fIdata\fP into \fIaddr\fP. The
+caller is responsible for allocating memory at \fIaddr\fP to hold one
+\fIptrace_bts_record\fP structure.
+The bigger the index, the older the record; the latest record can
+always be found at index 0.
+.TP
+.BR PTRACE_BTS_CLEAR
+Clears the BTS buffer. This command can be used after a manual
+draining using PTRACE_BTS_GET commands.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+.BR PTRACE_BTS_DRAIN
+Reads all available BTS records into the buffer pointed to by
+\fIaddr\fP and clears the buffer. Records are read from oldest to
+newest. Returns the number of BTS records drained. The caller
+is responsible for allocating enough memor

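A tracer consuming PTRACE_BTS_DRAIN output would walk the records from oldest to newest and dispatch on the qualifier. A minimal sketch, assuming the fixed-width record layout from the later 2007-12-20 revision of this man page (u64 fields, written here as uint64_t) rather than the void * / unsigned long layout above; struct bts_summary and bts_summarize() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum ptrace_bts_qualifier {
	PTRACE_BTS_INVALID = 0,
	PTRACE_BTS_BRANCH,
	PTRACE_BTS_TASK_ARRIVES,
	PTRACE_BTS_TASK_DEPARTS
};

/* Record layout as documented in the 2007-12-20 man page revision. */
struct ptrace_bts_record {
	uint64_t qualifier;
	union {
		struct {
			uint64_t from_ip;
			uint64_t to_ip;
		} lbr;			/* PTRACE_BTS_BRANCH */
		uint64_t timestamp;	/* TASK_ARRIVES / TASK_DEPARTS */
	} variant;
};

/* Hypothetical summary of a drained buffer. */
struct bts_summary {
	size_t branches;
	size_t arrivals;
	size_t departures;
	uint64_t last_to_ip;	/* destination of the newest branch seen */
};

/* Walk drained records (oldest to newest) and count each kind. */
static struct bts_summary bts_summarize(const struct ptrace_bts_record *rec,
					size_t n)
{
	struct bts_summary s = { 0, 0, 0, 0 };

	for (size_t i = 0; i < n; i++) {
		switch (rec[i].qualifier) {
		case PTRACE_BTS_BRANCH:
			s.branches++;
			s.last_to_ip = rec[i].variant.lbr.to_ip;
			break;
		case PTRACE_BTS_TASK_ARRIVES:
			s.arrivals++;
			break;
		case PTRACE_BTS_TASK_DEPARTS:
			s.departures++;
			break;
		default:		/* PTRACE_BTS_INVALID: empty slot */
			break;
		}
	}
	return s;
}
```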
[patch 4/5] x86, ptrace: new ptrace BTS API

2007-12-14 Thread Markus Metzger
Here's the new ptrace BTS API. It supports two overflow handling
mechanisms (wrap-around and buffer-full signal) for two different use
cases (debugging and profiling).

It also combines buffer allocation and configuration.


Open issues:
- memory rlimit
- overflow signal

What would be the right signal to use?


Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: linux-2.6-x86/arch/x86/kernel/ds.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/ds.c 2007-12-14 15:31:48.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/ds.c  2007-12-14 15:31:48.%N +0100
@@ -177,18 +177,20 @@
 }
 
 
-int ds_allocate(void **dsp, size_t bts_size_in_records)
+int ds_allocate(void **dsp, size_t bts_size_in_bytes)
 {
-   size_t bts_size_in_bytes = 0;
-   void *bts = 0;
-   void *ds = 0;
+   size_t bts_size_in_records;
+   void *bts;
+   void *ds;
 
if (!ds_cfg.sizeof_ds || !ds_cfg.sizeof_bts)
return -EOPNOTSUPP;
 
-   if (bts_size_in_records < 0)
+   if (bts_size_in_bytes < 0)
return -EINVAL;
 
+   bts_size_in_records =
+   bts_size_in_bytes / ds_cfg.sizeof_bts;
bts_size_in_bytes =
bts_size_in_records * ds_cfg.sizeof_bts;
 
@@ -233,9 +235,21 @@
if (!ds_cfg.sizeof_ds || !ds_cfg.sizeof_bts)
return -EOPNOTSUPP;
 
+   if (!ds)
+   return 0;
+
size_in_bytes =
get_bts_absolute_maximum(ds) -
get_bts_buffer_base(ds);
+   return size_in_bytes;
+}
+
+int ds_get_bts_end(void *ds)
+{
+   size_t size_in_bytes = ds_get_bts_size(ds);
+
+   if (size_in_bytes <= 0)
+   return size_in_bytes;
 
return size_in_bytes / ds_cfg.sizeof_bts;
 }
@@ -254,6 +268,38 @@
return index_offset_in_bytes / ds_cfg.sizeof_bts;
 }
 
+int ds_set_overflow(void *ds, int method)
+{
+   switch (method) {
+   case DS_O_SIGNAL:
+   return -EOPNOTSUPP;
+   case DS_O_WRAP:
+   return 0;
+   default:
+   return -EINVAL;
+   }
+}
+
+int ds_get_overflow(void *ds)
+{
+   return DS_O_WRAP;
+}
+
+int ds_clear(void *ds)
+{
+   int bts_size = ds_get_bts_size(ds);
+   void *bts_base;
+
+   if (bts_size <= 0)
+   return bts_size;
+
+   bts_base = get_bts_buffer_base(ds);
+   memset(bts_base, 0, bts_size);
+
+   set_bts_index(ds, bts_base);
+   return 0;
+}
+
 int ds_read_bts(void *ds, size_t index, struct bts_struct *out)
 {
void *bts;
Index: linux-2.6-x86/arch/x86/kernel/ptrace.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/ptrace.c 2007-12-14 15:31:48.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/ptrace.c  2007-12-14 17:32:40.%N +0100
@@ -33,12 +33,6 @@
 
 
 /*
- * The maximal size of a BTS buffer per traced task in number of BTS
- * records.
- */
-#define PTRACE_BTS_BUFFER_MAX 4000
-
-/*
  * does not yet catch signals sent when the child dies.
  * in exit.c or in signal.c.
  */
@@ -466,17 +460,12 @@
return 0;
 }
 
-static int ptrace_bts_max_buffer_size(void)
-{
-   return PTRACE_BTS_BUFFER_MAX;
-}
-
-static int ptrace_bts_get_buffer_size(struct task_struct *child)
+static int ptrace_bts_get_size(struct task_struct *child)
 {
if (!child->thread.ds_area_msr)
return -ENXIO;
 
-   return ds_get_bts_size((void *)child->thread.ds_area_msr);
+   return ds_get_bts_index((void *)child->thread.ds_area_msr);
 }
 
 static int ptrace_bts_read_record(struct task_struct *child,
@@ -485,7 +474,7 @@
 {
struct bts_struct ret;
int retval;
-   int bts_size;
+   int bts_end;
int bts_index;
 
if (!child->thread.ds_area_msr)
@@ -494,15 +483,15 @@
if (index < 0)
return -EINVAL;
 
-   bts_size = ds_get_bts_size((void *)child->thread.ds_area_msr);
-   if (bts_size <= index)
+   bts_end = ds_get_bts_end((void *)child->thread.ds_area_msr);
+   if (bts_end <= index)
return -EINVAL;
 
/* translate the ptrace bts index into the ds bts index */
bts_index = ds_get_bts_index((void *)child->thread.ds_area_msr);
bts_index -= (index + 1);
if (bts_index < 0)
-   bts_index += bts_size;
+   bts_index += bts_end;
 
retval = ds_read_bts((void *)child->thread.ds_area_msr,
 bts_index, &ret);
@@ -530,19 +519,97 @@
return sizeof(*in);
 }
 
-static int ptrace_bts_config(struct task_struct *child,
-unsigned long options)
+static int ptrace_bts_clear(struct task_struct *child)
 {
-   unsigned long debugctl_mask = ds_debugctl_mask();
-   int retval;
+   if (!child->thread.ds_area_msr)
+   return -E

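The index translation in ptrace_bts_read_record() maps a ptrace index, where 0 is the newest record, onto the circular DS buffer. A minimal stand-alone sketch of that arithmetic, assuming `write_index` is the slot the hardware will fill next (the value obtained from ds_get_bts_index()) and `nslots` is the buffer size in records; the function name is hypothetical.

```c
#include <errno.h>

/*
 * Hypothetical helper: map a ptrace BTS index (0 = newest record) to a
 * slot in a circular buffer of nslots records, where write_index is the
 * slot the hardware will fill next.  Mirrors the arithmetic in
 * ptrace_bts_read_record(); returns -EINVAL for out-of-range indices.
 */
static int bts_index_to_slot(int write_index, int nslots, long index)
{
	int slot;

	if (index < 0 || index >= nslots)
		return -EINVAL;

	/* The newest record sits just before the write position... */
	slot = write_index - (int)(index + 1);
	/* ...and older records wrap around to the end of the buffer. */
	if (slot < 0)
		slot += nslots;
	return slot;
}
```

With eight slots and the write index at 3, index 0 maps to slot 2 and index 7 maps to slot 3, the oldest record once the buffer has wrapped.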
[patch 3/5] x86, ptrace: change BTS GET ptrace interface

2007-12-14 Thread Markus Metzger
Change the ptrace interface to mimic an array ordered from newest to oldest.

Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: linux-2.6-x86/arch/x86/kernel/ptrace.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/ptrace.c 2007-12-10 11:14:26.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/ptrace.c  2007-12-10 11:34:20.%N +0100
@@ -479,26 +479,33 @@
return ds_get_bts_size((void *)child->thread.ds_area_msr);
 }
 
-static int ptrace_bts_get_index(struct task_struct *child)
-{
-   if (!child->thread.ds_area_msr)
-   return -ENXIO;
-
-   return ds_get_bts_index((void *)child->thread.ds_area_msr);
-}
-
 static int ptrace_bts_read_record(struct task_struct *child,
  long index,
  struct bts_struct __user *out)
 {
struct bts_struct ret;
int retval;
+   int bts_size;
+   int bts_index;
 
if (!child->thread.ds_area_msr)
return -ENXIO;
 
+   if (index < 0)
+   return -EINVAL;
+
+   bts_size = ds_get_bts_size((void *)child->thread.ds_area_msr);
+   if (bts_size <= index)
+   return -EINVAL;
+
+   /* translate the ptrace bts index into the ds bts index */
+   bts_index = ds_get_bts_index((void *)child->thread.ds_area_msr);
+   bts_index -= (index + 1);
+   if (bts_index < 0)
+   bts_index += bts_size;
+
retval = ds_read_bts((void *)child->thread.ds_area_msr,
-index, &ret);
+bts_index, &ret);
if (retval)
return retval;
 
@@ -813,10 +820,6 @@
ret = ptrace_bts_get_buffer_size(child);
break;
 
-   case PTRACE_BTS_GET_INDEX:
-   ret = ptrace_bts_get_index(child);
-   break;
-
case PTRACE_BTS_READ_RECORD:
ret = ptrace_bts_read_record
(child, data,
@@ -1017,7 +1020,6 @@
case PTRACE_BTS_MAX_BUFFER_SIZE:
case PTRACE_BTS_ALLOCATE_BUFFER:
case PTRACE_BTS_GET_BUFFER_SIZE:
-   case PTRACE_BTS_GET_INDEX:
case PTRACE_BTS_READ_RECORD:
case PTRACE_BTS_CONFIG:
case PTRACE_BTS_STATUS:
Index: linux-2.6-x86/include/asm-x86/ptrace-abi.h
===
--- linux-2.6-x86.orig/include/asm-x86/ptrace-abi.h 2007-12-10 11:14:26.%N +0100
+++ linux-2.6-x86/include/asm-x86/ptrace-abi.h  2007-12-10 11:16:27.%N +0100
@@ -99,32 +99,27 @@
ENXIO........no buffer allocated */
 #define PTRACE_BTS_GET_BUFFER_SIZE 42
 
-/* Return the index of the next bts record to be written,
-   if successful; -1, otherwise.
-   EOPNOTSUPP...processor does not support bts tracing
-   ENXIO........no buffer allocated
-   After the first warp-around, this is the start of the circular bts buffer. */
-#define PTRACE_BTS_GET_INDEX 43
-
-/* Read the DATA'th bts record into a ptrace_bts_record buffer provided in ADDR.
+/* Read the DATA'th bts record into a ptrace_bts_record buffer
+   provided in ADDR.
+   Records are ordered from newest to oldest.
Return 0, if successful; -1, otherwise
EOPNOTSUPP...processor does not support bts tracing
ENXIO........no buffer allocated
EINVAL...invalid index */
-#define PTRACE_BTS_READ_RECORD 44
+#define PTRACE_BTS_READ_RECORD 43
 
 /* Configure last branch trace; the configuration is given as a bit-mask of
PTRACE_BTS_O_* options in DATA; parameter ADDR is ignored.
Return 0, if successful; -1, otherwise
EOPNOTSUPP...processor does not support bts tracing
ENXIO........no buffer allocated */
-#define PTRACE_BTS_CONFIG 45
+#define PTRACE_BTS_CONFIG 44
 
 /* Return the configuration as bit-mask of PTRACE_BTS_O_* options
if successful; -1, otherwise.
EOPNOTSUPP...processor does not support bts tracing
ENXIO........no buffer allocated */
-#define PTRACE_BTS_STATUS 46
+#define PTRACE_BTS_STATUS 45
 
 /* Trace configuration options */
 /* Collect last branch trace */
-
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
Registergericht: Muenchen HRB 47456 Ust.-IdNr.
VAT Registration No.: DE129385895
Citibank Frankfurt (BLZ 502 109 00) 600119052

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch 2/5] x86, ptrace: use jiffies for BTS timestamps

2007-12-14 Thread Markus Metzger
Replace sched_clock() with jiffies for BTS timestamps.

Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: linux-2.6-x86/arch/x86/kernel/ptrace.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/ptrace.c 2007-12-10 09:47:57.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/ptrace.c  2007-12-10 10:11:37.%N +0100
@@ -616,7 +616,7 @@
 {
struct bts_struct rec = {
.qualifier = qualifier,
-   .variant.timestamp = sched_clock()
+   .variant.jiffies = jiffies
};
 
if (ptrace_bts_get_buffer_size(tsk) <= 0)
Index: linux-2.6-x86/include/asm-x86/ds.h
===
--- linux-2.6-x86.orig/include/asm-x86/ds.h 2007-12-10 09:47:57.%N +0100
+++ linux-2.6-x86/include/asm-x86/ds.h  2007-12-10 10:11:37.%N +0100
@@ -48,7 +48,7 @@
} lbr;
/* BTS_TASK_ARRIVES or
   BTS_TASK_DEPARTS */
-   unsigned long long timestamp;
+   unsigned long jiffies;
} variant;
 };
 
Index: linux-2.6-x86/arch/x86/kernel/ds.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/ds.c 2007-12-10 09:13:55.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/ds.c  2007-12-10 10:22:43.%N +0100
@@ -167,23 +167,13 @@
 {
(*(unsigned char *)(base + ds_cfg.info_type.offset)) = value;
 }
-/*
- * The info data might overlap with the info type on some architectures.
- * We therefore read and write the exact number of bytes.
- */
-static inline unsigned long long get_info_data(char *base)
-{
-   unsigned long long value = 0;
-   memcpy(&value,
-  base + ds_cfg.info_data.offset,
-  ds_cfg.info_data.size);
-   return value;
-}
-static inline void set_info_data(char *base, unsigned long long value)
-{
-   memcpy(base + ds_cfg.info_data.offset,
-  &value,
-  ds_cfg.info_data.size);
+static inline unsigned long get_info_data(char *base)
+{
+   return *(unsigned long *)(base + ds_cfg.info_data.offset);
+}
+static inline void set_info_data(char *base, unsigned long value)
+{
+   (*(unsigned long *)(base + ds_cfg.info_data.offset)) = value;
 }
 
 
@@ -282,8 +272,8 @@
 
memset(out, 0, sizeof(*out));
if (get_from_ip(bts) == BTS_ESCAPE_ADDRESS) {
-   out->qualifier = get_info_type(bts);
-   out->variant.timestamp = get_info_data(bts);
+   out->qualifier   = get_info_type(bts);
+   out->variant.jiffies = get_info_data(bts);
} else {
out->qualifier = BTS_BRANCH;
out->variant.lbr.from_ip = get_from_ip(bts);
@@ -319,7 +309,7 @@
case BTS_TASK_DEPARTS:
set_from_ip(bts, BTS_ESCAPE_ADDRESS);
set_info_type(bts, in->qualifier);
-   set_info_data(bts, in->variant.timestamp);
+   set_info_data(bts, in->variant.jiffies);
break;
 
default:
@@ -350,7 +340,7 @@
.from_ip = { 0, 4 },
.to_ip = { 4, 4 },
.info_type = { 4, 1 },
-   .info_data = { 5, 7 },
+   .info_data = { 8, 4 },
.debugctl_mask = (1<<2)|(1<<3)
 };
 
@@ -364,7 +354,7 @@
.from_ip = { 0, 4 },
.to_ip = { 4, 4 },
.info_type = { 4, 1 },
-   .info_data = { 5, 7 },
+   .info_data = { 8, 4 },
.debugctl_mask = (1<<6)|(1<<7)
 };
 #endif /* _i386_ */
@@ -379,7 +369,7 @@
.from_ip = { 0, 8 },
.to_ip = { 8, 8 },
.info_type = { 8, 1 },
-   .info_data = { 9, 7 },
+   .info_data = { 16, 8 },
.debugctl_mask = (1<<6)|(1<<7)|(1<<9)
 };
 


[patch 1/5] x86, ptrace: remove bad comment

2007-12-14 Thread Markus Metzger
Remove no longer correct comment.

Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: linux-2.6-x86/arch/x86/kernel/process_64.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/process_64.c 2007-12-14 15:31:37.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/process_64.c  2007-12-14 15:31:42.%N +0100
@@ -597,10 +597,6 @@
memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
}
 
-   /*
-* Last branch recording recofiguration of trace hardware and
-* disentangling of trace data per task.
-*/
if (test_tsk_thread_flag(prev_p, TIF_BTS_TRACE_TS))
ptrace_bts_take_timestamp(prev_p, BTS_TASK_DEPARTS);
 


[patch 5/5] x86, ptrace, man: updated man pages for the ptrace API changes

2007-12-14 Thread Markus Metzger
Describe the ptrace user API changes for this patch set.

Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: man/man2/ptrace.2
===
--- man.orig/man2/ptrace.2  2007-12-14 13:22:17.%N +0100
+++ man/man2/ptrace.2   2007-12-14 14:35:10.%N +0100
@@ -40,6 +40,9 @@
 .\"PTRACE_SETSIGINFO, PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
 .\"(Thanks to Blaisorblade, Daniel Jacobowitz and others who helped.)
 .\"
+.\" Modified Nov 2007, Markus Metzger <[EMAIL PROTECTED]>
+.\" Added PTRACE_BTS_* commands
+.\"
 .TH PTRACE 2 2007-11-15 "Linux" "Linux Programmer's Manual"
 .SH NAME
 ptrace \- process trace
@@ -378,6 +381,126 @@
 detached in this way regardless of which method was used to initiate
 tracing.
 (\fIaddr\fP is ignored.)
+.LP
+The following ptrace commands provide access to the hardware's last
+branch recording. They may not be available on all architectures.
+.LP
+Last branch recording stores an execution trace of the traced
+process. For every (conditional) control flow change, the source and
+destination addresses are stored. On some architectures, control flow
+changes inside the kernel are recorded, as well. On later
+architectures, these are automatically filtered out.
+.LP
+The buffer (called Branch Trace Store) can be configured to be either
+circular, or to send a signal to the traced task when it is about to
+overflow. Not all methods may be available on all architectures.
+.LP
+The buffer can be accessed in two ways matching the above
+configurations: either as an array of BTS records from newest
+record to older records, one record at a time; or all records at once,
+from oldest to newest.
+.LP
+The former is mostly used for circular buffers to capture a tail of
+the execution trace (e.g. for debugging); the latter is mostly used to
+collect a continuous trace (e.g. for profiling) where the user drains
+the hardware buffer into a larger private buffer or into a file.
+.LP
+In addition to branches, timestamps (in jiffies) may optionally be
+recorded when the traced process arrives and departs,
+respectively. This information can be used to obtain a qualitative
+execution order, if more than one process is traced.
+.LP
+A BTS record is defined as:
+.LP
+.nf
+enum ptrace_bts_qualifier {
+   PTRACE_BTS_INVALID = 0,
+   PTRACE_BTS_BRANCH,
+   PTRACE_BTS_TASK_ARRIVES,
+   PTRACE_BTS_TASK_DEPARTS
+};
+.sp
+struct ptrace_bts_record {
+   enum ptrace_bts_qualifier qualifier;
+   union {
+   /* PTRACE_BTS_BRANCH */
+   struct {
+   void *from_ip;
+   void *to_ip;
+   } lbr;
+   /* PTRACE_BTS_TASK_ARRIVES or
+  PTRACE_BTS_TASK_DEPARTS */
+   unsigned long timestamp;
+   } variant;
+};
+.fi
+.LP
+For configuring last branch recording and for querying its status, the
+following struct is used:
+.LP
+.nf
+struct ptrace_bts_config {
+   unsigned long size;
+   unsigned long flags;
+};
+.fi
+.LP
+\fISize\fP is either the requested or the actual size of the kernel
+BTS buffer in bytes.
+\fIFlags\fP is a bitmask of options, which are specified by the
+following flags:
+.RS
+.TP
+.BR PTRACE_BTS_O_TRACE
+Collect branch trace records
+.TP
+.BR PTRACE_BTS_O_SCHED
+Collect scheduling timing information
+.TP
+.BR PTRACE_BTS_O_SIGNAL
+Send \fBSIG?\fP to the traced task in case of a buffer overflow
+.TP
+.BR PTRACE_BTS_O_CUT_SIZE
+Reduce the requested buffer size if it is bigger than the available
+buffer size.
+.RE
+.TP
+.BR PTRACE_BTS_CONFIG
+Configure last branch recording. \fIaddr\fP points to a
+\fIptrace_bts_config\fP structure (see above).
+(\fIdata\fP is ignored.)
+.TP
+.BR PTRACE_BTS_STATUS
+Writes the actual configuration into a \fIptrace_bts_config\fP
+structure pointed to by \fIaddr\fP. The caller is responsible for
+allocating memory at \fIaddr\fP to hold a \fIptrace_bts_config\fP
+structure.
+(\fIdata\fP is ignored.)
+.TP
+.BR PTRACE_BTS_SIZE
+Returns the number of BTS records available for draining. For a
+circular buffer, this number is meaningless.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+.BR PTRACE_BTS_GET
+Reads a single BTS record at index \fIdata\fP into \fIaddr\fP. The
+caller is responsible for allocating memory at \fIaddr\fP to hold one
+\fIptrace_bts_record\fP structure.
+The bigger the index, the older the record; the latest record can
+always be found at index 0.
+.TP
+.BR PTRACE_BTS_CLEAR
+Clears the BTS buffer. This command can be used after a manual
+draining using PTRACE_BTS_GET commands.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+.BR PTRACE_BTS_DRAIN
+Reads all available BTS records into the buffer pointed to by
+\fIaddr\fP and clears the buffer. Records are read from oldest to
+newest. Returns the number of BTS records drained. The caller
+is responsible for allocating enough memory to hold an array of
+PTRACE_BTS_SIZE \fIptrace_bts_record\fP structures

[patch 4/4] man: updated man pages

2007-12-10 Thread Markus Metzger
Update ptrace man pages to reflect the interface changes from the last two 
patches in the series.

Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: man/man2/ptrace.2
===
--- man.orig/man2/ptrace.2  2007-12-10 11:22:19.%N +0100
+++ man/man2/ptrace.2   2007-12-10 11:28:56.%N +0100
@@ -40,6 +40,9 @@
 .\"PTRACE_SETSIGINFO, PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
 .\"(Thanks to Blaisorblade, Daniel Jacobowitz and others who helped.)
 .\"
+.\" Modified Nov 2007, Markus Metzger <[EMAIL PROTECTED]>
+.\" Added PTRACE_BTS_* commands
+.\"
 .TH PTRACE 2 2007-11-15 "Linux" "Linux Programmer's Manual"
 .SH NAME
 ptrace \- process trace
@@ -378,6 +381,94 @@
 detached in this way regardless of which method was used to initiate
 tracing.
 (\fIaddr\fP is ignored.)
+.LP
+The following ptrace commands provide access to the hardware's last
+branch recording. They may not be available on all architectures.
+.LP
+Last branch recording stores an execution trace of the traced process
+in a circular buffer (called Branch Trace Store). For every
+(conditional) control flow change, the source and destination addresses
+are stored. On some architectures, control flow changes inside the
+kernel may be recorded, as well. On later architectures, these are
+automatically filtered out.
+.LP
+The buffer can be accessed as an array of BTS records from newest
+record to older records. 
+.LP
+In addition to branches, timestamps (in jiffies) may optionally be
+recorded when the traced process arrives and departs,
+respectively. This information can be used to obtain a qualitative
+execution order, if more than one process is traced.
+.LP
+.nf
+enum ptrace_bts_qualifier {
+   PTRACE_BTS_INVALID = 0,
+   PTRACE_BTS_BRANCH,
+   PTRACE_BTS_TASK_ARRIVES,
+   PTRACE_BTS_TASK_DEPARTS
+};
+.sp
+struct ptrace_bts_record {
+   enum ptrace_bts_qualifier qualifier;
+   union {
+   /* PTRACE_BTS_BRANCH */
+   struct {
+   void *from_ip;
+   void *to_ip;
+   } lbr;
+   /* PTRACE_BTS_TASK_ARRIVES or
+  PTRACE_BTS_TASK_DEPARTS */
+   unsigned long timestamp;
+   } variant;
+};
+.fi
+.LP
+.TP
+PTRACE_BTS_MAX_BUFFER_SIZE
+Returns the maximal BTS buffer size.
+.TP
+PTRACE_BTS_ALLOCATE_BUFFER
+Allocate a new BTS buffer big enough to hold \fIdata\fP \fBstruct
+ptrace_bts_record\fP entries.
+\fIData\fP must be in the range of 0..PTRACE_BTS_MAX_BUFFER_SIZE.
+If a buffer is already allocated, that buffer is freed after the new
+buffer was successfully allocated. The new buffer initially contains
+invalid entries.
+Typically, a buffer is allocated once when tracing starts. It is
+automatically deallocated when the parent detaches from the child.
+(\fIaddr\fP is ignored.)
+.TP
+PTRACE_BTS_GET_BUFFER_SIZE
+Returns the actual BTS buffer size in number of BTS records. The
+command fails, if no buffer has been allocated.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+PTRACE_BTS_READ_RECORD
+Reads the BTS record at index \fIdata\fP into \fIaddr\fP. The caller
+is responsible for allocating memory at \fIaddr\fP of at least
+\fB sizeof(struct ptrace_bts_record)\fP bytes. The index \fIdata\fP
+must be in the range 0..PTRACE_BTS_GET_BUFFER_SIZE - 1. The bigger the
+index, the older the record; the latest record can always be found at
+index 0.
+.TP
+PTRACE_BTS_CONFIG
+Configures last branch recording from \fIdata\fP in the parent.
+(\fIaddr\fP is ignored.)
+\fIdata\fP is interpreted
+as a bitmask of options, which are specified by the following flags:
+.RS
+.TP
+PTRACE_BTS_O_TRACE_TASK
+Record last branch records for control flow changes.
+.TP
+PTRACE_BTS_O_TIMESTAMPS
+Record timestamps when child arrives and departs, respectively.
+.RE
+.TP
+PTRACE_BTS_STATUS
+Returns the current BTS configuration as a bitmask of the above
+options.
+(\fIaddr\fP and \fIdata\fP are ignored.)
 .SH "RETURN VALUE"
 On success,
 .B PTRACE_PEEK*
@@ -432,6 +523,16 @@
 .B ESRCH
 The specified process does not exist, or is not currently being traced
 by the caller, or is not stopped (for requests that require that).
+.TP
+.B EOPNOTSUPP
+The operation is not supported on this architecture.
+.TP
+.B ENOMEM
+Not enough memory to allocate the BTS buffer.
+.TP
+.B ENXIO
+An attempt to access BTS information has been made without allocating
+a BTS buffer first.
 .SH "CONFORMING TO"
 SVr4, 4.3BSD
 .SH NOTES
-
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
Registergericht: Muenchen HRB 47456 Ust.-IdNr.
VAT Registration No.: DE129385895
Citibank Frankfurt (BLZ 502 109 00) 600119052


[patch 3/4] x86, ptrace: change interface to newest-to-oldest array

2007-12-10 Thread Markus Metzger
Change the ptrace user interface to present BTS records as an array
ordered from newest to oldest.
This eliminates the need for the GET_INDEX command.

I could further imagine combining ALLOCATE_BUFFER with CONFIG and
GET_BUFFER_SIZE with STATUS. This would bring the interface down to three
commands (config, status, read), but it would require a struct to be passed
for config and status.

Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: linux-2.6-x86/arch/x86/kernel/ptrace.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/ptrace.c 2007-12-10 11:14:26.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/ptrace.c  2007-12-10 11:34:20.%N +0100
@@ -479,26 +479,33 @@
return ds_get_bts_size((void *)child->thread.ds_area_msr);
 }
 
-static int ptrace_bts_get_index(struct task_struct *child)
-{
-   if (!child->thread.ds_area_msr)
-   return -ENXIO;
-
-   return ds_get_bts_index((void *)child->thread.ds_area_msr);
-}
-
 static int ptrace_bts_read_record(struct task_struct *child,
  long index,
  struct bts_struct __user *out)
 {
struct bts_struct ret;
int retval;
+   int bts_size;
+   int bts_index;
 
if (!child->thread.ds_area_msr)
return -ENXIO;
 
+   if (index < 0)
+   return -EINVAL;
+
+   bts_size = ds_get_bts_size((void *)child->thread.ds_area_msr);
+   if (bts_size <= index)
+   return -EINVAL;
+
+   /* translate the ptrace bts index into the ds bts index */
+   bts_index = ds_get_bts_index((void *)child->thread.ds_area_msr);
+   bts_index -= (index + 1);
+   if (bts_index < 0)
+   bts_index += bts_size;
+
retval = ds_read_bts((void *)child->thread.ds_area_msr,
-index, &ret);
+bts_index, &ret);
if (retval)
return retval;
 
@@ -813,10 +820,6 @@
ret = ptrace_bts_get_buffer_size(child);
break;
 
-   case PTRACE_BTS_GET_INDEX:
-   ret = ptrace_bts_get_index(child);
-   break;
-
case PTRACE_BTS_READ_RECORD:
ret = ptrace_bts_read_record
(child, data,
@@ -1017,7 +1020,6 @@
case PTRACE_BTS_MAX_BUFFER_SIZE:
case PTRACE_BTS_ALLOCATE_BUFFER:
case PTRACE_BTS_GET_BUFFER_SIZE:
-   case PTRACE_BTS_GET_INDEX:
case PTRACE_BTS_READ_RECORD:
case PTRACE_BTS_CONFIG:
case PTRACE_BTS_STATUS:
Index: linux-2.6-x86/include/asm-x86/ptrace-abi.h
===
--- linux-2.6-x86.orig/include/asm-x86/ptrace-abi.h 2007-12-10 11:14:26.%N +0100
+++ linux-2.6-x86/include/asm-x86/ptrace-abi.h  2007-12-10 11:16:27.%N +0100
@@ -99,32 +99,27 @@
ENXIO...no buffer allocated */
 #define PTRACE_BTS_GET_BUFFER_SIZE 42
 
-/* Return the index of the next bts record to be written,
-   if successful; -1, otherwise.
-   EOPNOTSUPP...processor does not support bts tracing
-   ENXIO...no buffer allocated
-   After the first warp-around, this is the start of the circular bts buffer. */
-#define PTRACE_BTS_GET_INDEX 43
-
-/* Read the DATA'th bts record into a ptrace_bts_record buffer provided in ADDR.
+/* Read the DATA'th bts record into a ptrace_bts_record buffer
+   provided in ADDR.
+   Records are ordered from newest to oldest.
Return 0, if successful; -1, otherwise
EOPNOTSUPP...processor does not support bts tracing
ENXIO...no buffer allocated
EINVAL...invalid index */
-#define PTRACE_BTS_READ_RECORD 44
+#define PTRACE_BTS_READ_RECORD 43
 
 /* Configure last branch trace; the configuration is given as a bit-mask of
PTRACE_BTS_O_* options in DATA; parameter ADDR is ignored.
Return 0, if successful; -1, otherwise
EOPNOTSUPP...processor does not support bts tracing
ENXIO...no buffer allocated */
-#define PTRACE_BTS_CONFIG 45
+#define PTRACE_BTS_CONFIG 44
 
 /* Return the configuration as bit-mask of PTRACE_BTS_O_* options
if successful; -1, otherwise.
EOPNOTSUPP...processor does not support bts tracing
ENXIO...no buffer allocated */
-#define PTRACE_BTS_STATUS 46
+#define PTRACE_BTS_STATUS 45
 
 /* Trace configuration options */
 /* Collect last branch trace */

[patch 1/4] x86: remove bad comment

2007-12-10 Thread Markus Metzger
Remove a comment that is no longer correct. The reconfiguration is done 
directly in __switch_to_xtra.

Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: linux-2.6-x86/arch/x86/kernel/process_64.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/process_64.c 2007-12-10 09:35:26.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/process_64.c  2007-12-10 09:36:38.%N +0100
@@ -598,10 +598,6 @@
memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
}
 
-   /*
-* Last branch recording recofiguration of trace hardware and
-* disentangling of trace data per task.
-*/
if (test_tsk_thread_flag(prev_p, TIF_BTS_TRACE_TS))
ptrace_bts_take_timestamp(prev_p, BTS_TASK_DEPARTS);
 


[patch 2/4] x86, ptrace: use jiffies for bts timestamps

2007-12-10 Thread Markus Metzger
Use jiffies for timestamps in last branch recording.


Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
---

Index: linux-2.6-x86/arch/x86/kernel/ptrace.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/ptrace.c 2007-12-10 09:47:57.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/ptrace.c  2007-12-10 10:11:37.%N +0100
@@ -616,7 +616,7 @@
 {
struct bts_struct rec = {
.qualifier = qualifier,
-   .variant.timestamp = sched_clock()
+   .variant.jiffies = jiffies
};
 
if (ptrace_bts_get_buffer_size(tsk) <= 0)
Index: linux-2.6-x86/include/asm-x86/ds.h
===
--- linux-2.6-x86.orig/include/asm-x86/ds.h 2007-12-10 09:47:57.%N +0100
+++ linux-2.6-x86/include/asm-x86/ds.h  2007-12-10 10:11:37.%N +0100
@@ -48,7 +48,7 @@
} lbr;
/* BTS_TASK_ARRIVES or
   BTS_TASK_DEPARTS */
-   unsigned long long timestamp;
+   unsigned long jiffies;
} variant;
 };
 
Index: linux-2.6-x86/arch/x86/kernel/ds.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/ds.c 2007-12-10 09:13:55.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/ds.c  2007-12-10 10:22:43.%N +0100
@@ -167,23 +167,13 @@
 {
(*(unsigned char *)(base + ds_cfg.info_type.offset)) = value;
 }
-/*
- * The info data might overlap with the info type on some architectures.
- * We therefore read and write the exact number of bytes.
- */
-static inline unsigned long long get_info_data(char *base)
-{
-   unsigned long long value = 0;
-   memcpy(&value,
-  base + ds_cfg.info_data.offset,
-  ds_cfg.info_data.size);
-   return value;
-}
-static inline void set_info_data(char *base, unsigned long long value)
-{
-   memcpy(base + ds_cfg.info_data.offset,
-  &value,
-  ds_cfg.info_data.size);
+static inline unsigned long get_info_data(char *base)
+{
+   return *(unsigned long *)(base + ds_cfg.info_data.offset);
+}
+static inline void set_info_data(char *base, unsigned long value)
+{
+   (*(unsigned long *)(base + ds_cfg.info_data.offset)) = value;
 }
 
 
@@ -282,8 +272,8 @@
 
memset(out, 0, sizeof(*out));
if (get_from_ip(bts) == BTS_ESCAPE_ADDRESS) {
-   out->qualifier = get_info_type(bts);
-   out->variant.timestamp = get_info_data(bts);
+   out->qualifier   = get_info_type(bts);
+   out->variant.jiffies = get_info_data(bts);
} else {
out->qualifier = BTS_BRANCH;
out->variant.lbr.from_ip = get_from_ip(bts);
@@ -319,7 +309,7 @@
case BTS_TASK_DEPARTS:
set_from_ip(bts, BTS_ESCAPE_ADDRESS);
set_info_type(bts, in->qualifier);
-   set_info_data(bts, in->variant.timestamp);
+   set_info_data(bts, in->variant.jiffies);
break;
 
default:
@@ -350,7 +340,7 @@
.from_ip = { 0, 4 },
.to_ip = { 4, 4 },
.info_type = { 4, 1 },
-   .info_data = { 5, 7 },
+   .info_data = { 8, 4 },
.debugctl_mask = (1<<2)|(1<<3)
 };
 
@@ -364,7 +354,7 @@
.from_ip = { 0, 4 },
.to_ip = { 4, 4 },
.info_type = { 4, 1 },
-   .info_data = { 5, 7 },
+   .info_data = { 8, 4 },
.debugctl_mask = (1<<6)|(1<<7)
 };
 #endif /* _i386_ */
@@ -379,7 +369,7 @@
.from_ip = { 0, 8 },
.to_ip = { 8, 8 },
.info_type = { 8, 1 },
-   .info_data = { 9, 7 },
+   .info_data = { 16, 8 },
.debugctl_mask = (1<<6)|(1<<7)|(1<<9)
 };
 


[patch 2/2] man: man pages for ptrace BTS extension

2007-12-05 Thread Markus Metzger
Resend using different email client


Changes to the last version:
- ported to man-pages 2.68

Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]>
---

Index: man-pages-2.68/man2/ptrace.2
===
--- man-pages-2.68.orig/man2/ptrace.2   2007-11-30 17:22:59.%N +0100
+++ man-pages-2.68/man2/ptrace.2 2007-11-30 17:26:48.%N +0100
@@ -40,6 +40,9 @@
 .\"PTRACE_SETSIGINFO, PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
 .\"(Thanks to Blaisorblade, Daniel Jacobowitz and others who helped.)
 .\"
+.\" Modified Nov 2007, Markus Metzger <[EMAIL PROTECTED]>
+.\" Added PTRACE_BTS_* commands
+.\"
 .TH PTRACE 2 2007-11-15 "Linux" "Linux Programmer's Manual"
 .SH NAME
 ptrace \- process trace
@@ -378,6 +381,95 @@
 detached in this way regardless of which method was used to initiate
 tracing.
 (\fIaddr\fP is ignored.)
+.LP
+The following ptrace commands provide access to the hardware's last
+branch recording. They may not be available on all architectures.
+.LP
+Last branch recording stores an execution trace of the traced process
+in a circular buffer (called the Branch Trace Store). For every
+(conditional) control flow change, the source and destination addresses
+are stored. On some processor generations, control flow changes inside the
+kernel may be recorded as well; on later ones, these are
+automatically filtered out.
+.LP
+In addition to branches, timestamps may optionally be recorded whenever
+the traced process is scheduled in (arrives) or out (departs). If more
+than one process is traced, this information can be used to reconstruct
+a coarse relative execution order.
+.LP
+.nf
+enum ptrace_bts_qualifier {
+   PTRACE_BTS_INVALID = 0,
+   PTRACE_BTS_BRANCH,
+   PTRACE_BTS_TASK_ARRIVES,
+   PTRACE_BTS_TASK_DEPARTS
+};
+.sp
+struct ptrace_bts_record {
+   enum ptrace_bts_qualifier qualifier;
+   union {
+      /* PTRACE_BTS_BRANCH */
+      struct {
+         void *from_ip;
+         void *to_ip;
+      } lbr;
+      /* PTRACE_BTS_TASK_ARRIVES or
+         PTRACE_BTS_TASK_DEPARTS */
+      unsigned long long timestamp;
+   } variant;
+};
+.fi
+.LP
+.TP
+PTRACE_BTS_MAX_BUFFER_SIZE
+Returns the maximum BTS buffer size in number of BTS records.
+.TP
+PTRACE_BTS_ALLOCATE_BUFFER
+Allocates a new BTS buffer big enough to hold \fIdata\fP \fBstruct
+ptrace_bts_record\fP entries.
+\fIData\fP must be in the range 0 to the value returned by PTRACE_BTS_MAX_BUFFER_SIZE.
+If a buffer is already allocated, it is freed after the new
+buffer has been successfully allocated. The new buffer initially contains
+invalid entries.
+Typically, a buffer is allocated once when tracing starts. It is
+automatically deallocated when the parent detaches from the child.
+(\fIaddr\fP is ignored.)
+.TP
+PTRACE_BTS_GET_BUFFER_SIZE
+Returns the current BTS buffer size in number of BTS records. The
+command fails if no buffer has been allocated.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+PTRACE_BTS_GET_INDEX
+Returns the index of the next entry to be (over)written by the tracing
+hardware. This can be used to determine the end of the current
+execution trace.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+PTRACE_BTS_READ_RECORD
+Reads the BTS record at index \fIdata\fP into the buffer at \fIaddr\fP.
+The caller must provide at least
+\fBsizeof(struct ptrace_bts_record)\fP bytes at \fIaddr\fP. The index \fIdata\fP
+must be in the range 0 to \fIsize\fP\-1, where \fIsize\fP is the value returned by PTRACE_BTS_GET_BUFFER_SIZE.
+.TP
+PTRACE_BTS_CONFIG
+Configures last branch recording for the child according to \fIdata\fP.
+(\fIaddr\fP is ignored.)
+\fIdata\fP is interpreted
+as a bitmask of options, which are specified by the following flags:
+.RS
+.TP
+PTRACE_BTS_O_TRACE_TASK
+Record the source and destination of every control flow change.
+.TP
+PTRACE_BTS_O_TIMESTAMPS
+Record a timestamp whenever the child is scheduled in (arrives) or out (departs).
+.RE
+.TP
+PTRACE_BTS_STATUS
+Returns the current BTS configuration as a bitmask of the above
+options.
+(\fIaddr\fP and \fIdata\fP are ignored.)
 .SH "RETURN VALUE"
 On success,
 .B PTRACE_PEEK*
@@ -432,6 +524,16 @@
 .B ESRCH
 The specified process does not exist, or is not currently being traced
 by the caller, or is not stopped (for requests that require that).
+.TP
+.B EOPNOTSUPP
+The operation is not supported on this architecture.
+.TP
+.B ENOMEM
+Not enough memory to allocate the BTS buffer.
+.TP
+.B ENXIO
+An attempt to access BTS information has been made without allocating
+a BTS buffer first.
 .SH "CONFORMING TO"
 SVr4, 4.3BSD
 .SH NOTES

[patch 1/2] x86, ptrace: support for branch trace store(BTS)

2007-12-05 Thread Markus Metzger

Resend using different mail client


Changes to the last version:
- split implementation into two layers: ds/bts and ptrace
- renamed TIF flags
- save/restore ds save area msr in __switch_to_xtra()
- make block-stepping only look at BTF bit


Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]>
---

Index: linux-2.6-x86/arch/x86/kernel/process_32.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/process_32.c 2007-12-04 16:44:47.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/process_32.c  2007-12-04 18:12:42.%N +0100
@@ -594,11 +594,21 @@
 struct tss_struct *tss)
 {
struct thread_struct *prev, *next;
+   unsigned long debugctl;
 
prev = &prev_p->thread;
next = &next_p->thread;
 
-   if (next->debugctlmsr != prev->debugctlmsr)
+   debugctl = prev->debugctlmsr;
+   if (next->ds_area_msr != prev->ds_area_msr) {
+   /* we clear debugctl to make sure DS
+* is not in use when we change it */
+   debugctl = 0;
+   wrmsrl(MSR_IA32_DEBUGCTLMSR, 0);
+   wrmsr(MSR_IA32_DS_AREA, next->ds_area_msr, 0);
+   }
+
+   if (next->debugctlmsr != debugctl)
wrmsr(MSR_IA32_DEBUGCTLMSR, next->debugctlmsr, 0);
 
if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
@@ -622,6 +632,13 @@
}
 #endif
 
+   if (test_tsk_thread_flag(prev_p, TIF_BTS_TRACE_TS))
+   ptrace_bts_take_timestamp(prev_p, BTS_TASK_DEPARTS);
+
+   if (test_tsk_thread_flag(next_p, TIF_BTS_TRACE_TS))
+   ptrace_bts_take_timestamp(next_p, BTS_TASK_ARRIVES);
+
+
if (!test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
/*
 * Disable the bitmap via an invalid offset. We still cache
Index: linux-2.6-x86/include/asm-x86/ptrace-abi.h
===
--- linux-2.6-x86.orig/include/asm-x86/ptrace-abi.h 2007-12-04 16:44:47.%N +0100
+++ linux-2.6-x86/include/asm-x86/ptrace-abi.h  2007-12-04 17:21:28.%N +0100
@@ -80,4 +80,56 @@
 
 #define PTRACE_SINGLEBLOCK 33  /* resume execution until next branch */
 
+/* Return maximal BTS buffer size in number of records,
+   if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing */
+#define PTRACE_BTS_MAX_BUFFER_SIZE 40
+
+/* Allocate new bts buffer (free old one, if exists) of size DATA bts records;
+   parameter ADDR is ignored.
+   Return 0, if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing
+   EINVAL...invalid size in records
+   ENOMEM...out of memory */
+#define PTRACE_BTS_ALLOCATE_BUFFER 41
+
+/* Return the size of the bts buffer in number of bts records,
+   if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIOno buffer allocated */
+#define PTRACE_BTS_GET_BUFFER_SIZE 42
+
+/* Return the index of the next bts record to be written,
+   if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIOno buffer allocated
+   After the first warp-around, this is the start of the circular bts buffer. 
*/
+#define PTRACE_BTS_GET_INDEX 43
+
+/* Read the DATA'th bts record into a ptrace_bts_record buffer provided in 
ADDR.
+   Return 0, if successful; -1, otherwise
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIOno buffer allocated
+   EINVAL...invalid index */
+#define PTRACE_BTS_READ_RECORD 44
+
+/* Configure last branch trace; the configuration is given as a bit-mask of
+   PTRACE_BTS_O_* options in DATA; parameter ADDR is ignored.
+   Return 0, if successful; -1, otherwise
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIOno buffer allocated */
+#define PTRACE_BTS_CONFIG 45
+
+/* Return the configuration as bit-mask of PTRACE_BTS_O_* options
+   if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIOno buffer allocated */
+#define PTRACE_BTS_STATUS 46
+
+/* Trace configuration options */
+/* Collect last branch trace */
+#define PTRACE_BTS_O_TRACE_TASK 0x1
+/* Take timestamps when the task arrives and departs */
+#define PTRACE_BTS_O_TIMESTAMPS 0x2
+
 #endif
Index: linux-2.6-x86/include/asm-x86/ptrace.h
===
--- linux-2.6-x86.orig/include/asm-x86/ptrace.h 2007-12-04 16:44:47.%N +0100
+++ linux-2.6-x86/include/asm-x86/ptrace.h  2007-12-04 18:17:52.%N +0100
@@ -4,8 +4,19 @@
 #include <linux/compiler.h>	/* For __user */
 #include <asm/ptrace-abi.h>
 
+
 #ifndef __ASSEMBLY__
 
+#ifdef __KERNEL__
+
+#include <asm/ds.h>
+
+struct task_struct;
+extern void ptrace_bts_take_timestamp(struct task_struct *, enum bts_qualifier);
+
+#endif /* __KERNEL__ */
+
+
 #ifdef __i386__
 /* this struct defines the way the registers are stored on the
stack dur

[patch 2/2] man: man pages for ptrace BTS extension

2007-12-05 Thread Markus Metzger
Resend using different email client


Changes to the last version:
- ported to man-pages 2.68

Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]>
---

Index: man-pages-2.68/man2/ptrace.2
===
--- man-pages-2.68.orig/man2/ptrace.2	2007-11-30 17:22:59.%N +0100
+++ man-pages-2.68/man2/ptrace.2	2007-11-30 17:26:48.%N +0100
@@ -40,6 +40,9 @@
 .\"PTRACE_SETSIGINFO, PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
 .\"(Thanks to Blaisorblade, Daniel Jacobowitz and others who helped.)
 .\"
+.\" Modified Nov 2007, Markus Metzger <[EMAIL PROTECTED]>
+.\" Added PTRACE_BTS_* commands
+.\"
 .TH PTRACE 2 2007-11-15 "Linux" "Linux Programmer's Manual"
 .SH NAME
 ptrace \- process trace
@@ -378,6 +381,95 @@
 detached in this way regardless of which method was used to initiate
 tracing.
 (\fIaddr\fP is ignored.)
+.LP
+The following ptrace requests provide access to the hardware's last
+branch recording. They may not be available on all architectures.
+.LP
+Last branch recording stores an execution trace of the traced process
+in a circular buffer (called the Branch Trace Store). For every
+(conditional) control flow change, the source and destination addresses
+are stored. On some processors, control flow changes inside the
+kernel may be recorded as well; on later processors, these are
+automatically filtered out.
+.LP
+In addition to branches, timestamps may optionally be recorded when
+the traced process is scheduled in (arrives) and out (departs). This
+information can be used to establish a qualitative execution order if
+more than one process is traced.
+.LP
+.nf
+enum ptrace_bts_qualifier {
+	PTRACE_BTS_INVALID = 0,
+	PTRACE_BTS_BRANCH,
+	PTRACE_BTS_TASK_ARRIVES,
+	PTRACE_BTS_TASK_DEPARTS
+};
+.sp
+struct ptrace_bts_record {
+	enum ptrace_bts_qualifier qualifier;
+	union {
+		/* PTRACE_BTS_BRANCH */
+		struct {
+			void *from_ip;
+			void *to_ip;
+		} lbr;
+		/* PTRACE_BTS_TASK_ARRIVES or
+		   PTRACE_BTS_TASK_DEPARTS */
+		unsigned long long timestamp;
+	} variant;
+};
+.fi
+.LP
+.TP
+PTRACE_BTS_MAX_BUFFER_SIZE
+Returns the maximum BTS buffer size in number of BTS records.
+.TP
+PTRACE_BTS_ALLOCATE_BUFFER
+Allocates a new BTS buffer big enough to hold \fIdata\fP \fBstruct
+ptrace_bts_record\fP entries.
+\fIData\fP must be in the range 0..PTRACE_BTS_MAX_BUFFER_SIZE.
+If a buffer is already allocated, it is freed after the new
+buffer has been allocated successfully. The new buffer initially contains
+invalid entries.
+Typically, a buffer is allocated once when tracing starts. It is
+automatically deallocated when the parent detaches from the child.
+(\fIaddr\fP is ignored.)
+.TP
+PTRACE_BTS_GET_BUFFER_SIZE
+Returns the actual BTS buffer size in number of BTS records. The
+request fails if no buffer has been allocated.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+PTRACE_BTS_GET_INDEX
+Returns the index of the next entry to be (over)written by the tracing
+hardware. This marks the end of the current execution trace and, once
+the buffer has wrapped around, the position of its oldest entry.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+PTRACE_BTS_READ_RECORD
+Reads the BTS record at index \fIdata\fP into \fIaddr\fP. The caller
+is responsible for allocating memory at \fIaddr\fP of at least
+\fBsizeof(struct ptrace_bts_record)\fP bytes. The index \fIdata\fP
+must be less than the buffer size returned by PTRACE_BTS_GET_BUFFER_SIZE.
+.TP
+PTRACE_BTS_CONFIG
+Configures last branch recording for the child according to \fIdata\fP.
+(\fIaddr\fP is ignored.)
+\fIdata\fP is interpreted
+as a bitmask of options, specified by the following flags:
+.RS
+.TP
+PTRACE_BTS_O_TRACE_TASK
+Collect a last branch trace of the child's control flow changes.
+.TP
+PTRACE_BTS_O_TIMESTAMPS
+Record timestamps when the child arrives and departs, respectively.
+.RE
+.TP
+PTRACE_BTS_STATUS
+Returns the current BTS configuration as a bitmask of the above
+options.
+(\fIaddr\fP and \fIdata\fP are ignored.)
 .SH "RETURN VALUE"
 On success,
 .B PTRACE_PEEK*
@@ -432,6 +524,16 @@
 .B ESRCH
 The specified process does not exist, or is not currently being traced
 by the caller, or is not stopped (for requests that require that).
+.TP
+.B EOPNOTSUPP
+The operation is not supported on this architecture.
+.TP
+.B ENOMEM
+Not enough memory to allocate the BTS buffer.
+.TP
+.B ENXIO
+An attempt to access BTS information has been made without allocating
+a BTS buffer first.
 .SH "CONFORMING TO"
 SVr4, 4.3BSD
 .SH NOTES


[patch 2/2] man: man pages for ptrace BTS extensions

2007-12-04 Thread Markus Metzger
Changes to the last version:
- ported to man-pages 2.68

Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]>
---

Index: man-pages-2.68/man2/ptrace.2
===
--- man-pages-2.68.orig/man2/ptrace.2	2007-11-30 17:22:59.%N +0100
+++ man-pages-2.68/man2/ptrace.2	2007-11-30 17:26:48.%N +0100
@@ -40,6 +40,9 @@
 .\"PTRACE_SETSIGINFO, PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
 .\"(Thanks to Blaisorblade, Daniel Jacobowitz and others who helped.)
 .\"
+.\" Modified Nov 2007, Markus Metzger <[EMAIL PROTECTED]>
+.\" Added PTRACE_BTS_* commands
+.\"
 .TH PTRACE 2 2007-11-15 "Linux" "Linux Programmer's Manual"
 .SH NAME
 ptrace \- process trace
@@ -378,6 +381,95 @@
 detached in this way regardless of which method was used to initiate
 tracing.
 (\fIaddr\fP is ignored.)
+.LP
+The following ptrace requests provide access to the hardware's last
+branch recording. They may not be available on all architectures.
+.LP
+Last branch recording stores an execution trace of the traced process
+in a circular buffer (called the Branch Trace Store). For every
+(conditional) control flow change, the source and destination addresses
+are stored. On some processors, control flow changes inside the
+kernel may be recorded as well; on later processors, these are
+automatically filtered out.
+.LP
+In addition to branches, timestamps may optionally be recorded when
+the traced process is scheduled in (arrives) and out (departs). This
+information can be used to establish a qualitative execution order if
+more than one process is traced.
+.LP
+.nf
+enum ptrace_bts_qualifier {
+	PTRACE_BTS_INVALID = 0,
+	PTRACE_BTS_BRANCH,
+	PTRACE_BTS_TASK_ARRIVES,
+	PTRACE_BTS_TASK_DEPARTS
+};
+.sp
+struct ptrace_bts_record {
+	enum ptrace_bts_qualifier qualifier;
+	union {
+		/* PTRACE_BTS_BRANCH */
+		struct {
+			void *from_ip;
+			void *to_ip;
+		} lbr;
+		/* PTRACE_BTS_TASK_ARRIVES or
+		   PTRACE_BTS_TASK_DEPARTS */
+		unsigned long long timestamp;
+	} variant;
+};
+.fi
+.LP
+.TP
+PTRACE_BTS_MAX_BUFFER_SIZE
+Returns the maximum BTS buffer size in number of BTS records.
+.TP
+PTRACE_BTS_ALLOCATE_BUFFER
+Allocates a new BTS buffer big enough to hold \fIdata\fP \fBstruct
+ptrace_bts_record\fP entries.
+\fIData\fP must be in the range 0..PTRACE_BTS_MAX_BUFFER_SIZE.
+If a buffer is already allocated, it is freed after the new
+buffer has been allocated successfully. The new buffer initially contains
+invalid entries.
+Typically, a buffer is allocated once when tracing starts. It is
+automatically deallocated when the parent detaches from the child.
+(\fIaddr\fP is ignored.)
+.TP
+PTRACE_BTS_GET_BUFFER_SIZE
+Returns the actual BTS buffer size in number of BTS records. The
+request fails if no buffer has been allocated.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+PTRACE_BTS_GET_INDEX
+Returns the index of the next entry to be (over)written by the tracing
+hardware. This marks the end of the current execution trace and, once
+the buffer has wrapped around, the position of its oldest entry.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+PTRACE_BTS_READ_RECORD
+Reads the BTS record at index \fIdata\fP into \fIaddr\fP. The caller
+is responsible for allocating memory at \fIaddr\fP of at least
+\fBsizeof(struct ptrace_bts_record)\fP bytes. The index \fIdata\fP
+must be less than the buffer size returned by PTRACE_BTS_GET_BUFFER_SIZE.
+.TP
+PTRACE_BTS_CONFIG
+Configures last branch recording for the child according to \fIdata\fP.
+(\fIaddr\fP is ignored.)
+\fIdata\fP is interpreted
+as a bitmask of options, specified by the following flags:
+.RS
+.TP
+PTRACE_BTS_O_TRACE_TASK
+Collect a last branch trace of the child's control flow changes.
+.TP
+PTRACE_BTS_O_TIMESTAMPS
+Record timestamps when the child arrives and departs, respectively.
+.RE
+.TP
+PTRACE_BTS_STATUS
+Returns the current BTS configuration as a bitmask of the above
+options.
+(\fIaddr\fP and \fIdata\fP are ignored.)
 .SH "RETURN VALUE"
 On success,
 .B PTRACE_PEEK*
@@ -432,6 +524,16 @@
 .B ESRCH
 The specified process does not exist, or is not currently being traced
 by the caller, or is not stopped (for requests that require that).
+.TP
+.B EOPNOTSUPP
+The operation is not supported on this architecture.
+.TP
+.B ENOMEM
+Not enough memory to allocate the BTS buffer.
+.TP
+.B ENXIO
+An attempt to access BTS information has been made without allocating
+a BTS buffer first.
 .SH "CONFORMING TO"
 SVr4, 4.3BSD
 .SH NOTES
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch 1/2] x86, ptrace: support for branch trace store(BTS)

2007-12-04 Thread Markus Metzger
Changes to the last version:
- split implementation into two layers: ds/bts and ptrace
- renamed TIFs
- save/restore ds save area msr in __switch_to_xtra()
- make block-stepping only look at BTF bit


Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]>
---

Index: linux-2.6-x86/arch/x86/kernel/process_32.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/process_32.c	2007-12-04 16:44:47.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/process_32.c	2007-12-04 18:12:42.%N +0100
@@ -594,11 +594,21 @@
 		 struct tss_struct *tss)
 {
 	struct thread_struct *prev, *next;
+	unsigned long debugctl;
 
 	prev = &prev_p->thread;
 	next = &next_p->thread;
 
-	if (next->debugctlmsr != prev->debugctlmsr)
+	debugctl = prev->debugctlmsr;
+	if (next->ds_area_msr != prev->ds_area_msr) {
+		/* we clear debugctl to make sure DS
+		 * is not in use when we change it */
+		debugctl = 0;
+		wrmsrl(MSR_IA32_DEBUGCTLMSR, 0);
+		wrmsr(MSR_IA32_DS_AREA, next->ds_area_msr, 0);
+	}
+
+	if (next->debugctlmsr != debugctl)
 		wrmsr(MSR_IA32_DEBUGCTLMSR, next->debugctlmsr, 0);
 
 	if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
@@ -622,6 +632,13 @@
 	}
 #endif
 
+	if (test_tsk_thread_flag(prev_p, TIF_BTS_TRACE_TS))
+		ptrace_bts_take_timestamp(prev_p, BTS_TASK_DEPARTS);
+
+	if (test_tsk_thread_flag(next_p, TIF_BTS_TRACE_TS))
+		ptrace_bts_take_timestamp(next_p, BTS_TASK_ARRIVES);
+
+
 	if (!test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
 		/*
 		 * Disable the bitmap via an invalid offset. We still cache
Index: linux-2.6-x86/include/asm-x86/ptrace-abi.h
===
--- linux-2.6-x86.orig/include/asm-x86/ptrace-abi.h	2007-12-04 16:44:47.%N +0100
+++ linux-2.6-x86/include/asm-x86/ptrace-abi.h	2007-12-04 17:21:28.%N +0100
@@ -80,4 +80,56 @@

 #define PTRACE_SINGLEBLOCK 33  /* resume execution until next branch */

+/* Return maximum BTS buffer size in number of records,
+   if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing */
+#define PTRACE_BTS_MAX_BUFFER_SIZE 40
+
+/* Allocate new bts buffer (free old one, if exists) of size DATA bts records;
+   parameter ADDR is ignored.
+   Return 0, if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing
+   EINVAL...invalid size in records
+   ENOMEM...out of memory */
+#define PTRACE_BTS_ALLOCATE_BUFFER 41
+
+/* Return the size of the bts buffer in number of bts records,
+   if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIO...no buffer allocated */
+#define PTRACE_BTS_GET_BUFFER_SIZE 42
+
+/* Return the index of the next bts record to be written,
+   if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIO...no buffer allocated
+   After the first wrap-around, this is the start of the circular bts buffer. */
+#define PTRACE_BTS_GET_INDEX 43
+
+/* Read the DATA'th bts record into a ptrace_bts_record buffer provided in ADDR.
+   Return 0, if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIO...no buffer allocated
+   EINVAL...invalid index */
+#define PTRACE_BTS_READ_RECORD 44
+
+/* Configure last branch trace; the configuration is given as a bit-mask of
+   PTRACE_BTS_O_* options in DATA; parameter ADDR is ignored.
+   Return 0, if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIO...no buffer allocated */
+#define PTRACE_BTS_CONFIG 45
+
+/* Return the configuration as bit-mask of PTRACE_BTS_O_* options
+   if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIO...no buffer allocated */
+#define PTRACE_BTS_STATUS 46
+
+/* Trace configuration options */
+/* Collect last branch trace */
+#define PTRACE_BTS_O_TRACE_TASK 0x1
+/* Take timestamps when the task arrives and departs */
+#define PTRACE_BTS_O_TIMESTAMPS 0x2
+
 #endif
Index: linux-2.6-x86/include/asm-x86/ptrace.h
===
--- linux-2.6-x86.orig/include/asm-x86/ptrace.h 2007-12-04 16:44:47.%N +0100
+++ linux-2.6-x86/include/asm-x86/ptrace.h  2007-12-04 18:17:52.%N +0100
@@ -4,8 +4,19 @@
 #include <linux/compiler.h>	/* For __user */
 #include <asm/ptrace-abi.h>
 
+
 #ifndef __ASSEMBLY__
 
+#ifdef __KERNEL__
+
+#include <asm/ds.h>
+
+struct task_struct;
+extern void ptrace_bts_take_timestamp(struct task_struct *, enum bts_qualifier);
+
+#endif /* __KERNEL__ */
+
+
 #ifdef __i386__
 /* this struct defines the way the registers are stored on the
    stack during a system call. */
Index:

[patch 0/2] x86, ptrace: support for branch trace store(BTS)

2007-12-04 Thread Markus Metzger
This series adds support for Intel's last branch recording to ptrace. This
gives debuggers access to this hardware feature and allows them to show an
execution trace of the debugged application.

Last branch recording (see section 18.5 in the Intel 64 and IA-32
Architectures Software Developer's Manual) allows taking an execution
trace of the running application without instrumentation. When a branch
is executed, the hardware logs the source and destination addresses in a
cyclic buffer given to it by the OS.

This can be a great debugging aid. It shows exactly how you got to
where you currently are, without requiring lots of single-stepping
and rerunning.

This patch manages the various buffers, configures the trace
hardware, disentangles the trace, and provides a user interface via
ptrace. The high-level design is as follows:
- there is one optional trace buffer per thread_struct
- upon a context switch, the trace hardware is reconfigured to either
  disable tracing or to use the appropriate buffer for the new task.
  - tracing induces ~20% overhead as branch records are sent out on
    the bus.
  - the hardware collects the trace per processor. To disentangle the
    traces for different tasks, we use separate buffers and reconfigure
    the trace hardware.
- the low-level data layout is configured at cpu initialization time
  - different processors use different branch record formats
- the implementation is done in two layers
  - the lower layer implements the DS/BTS access
  - the higher layer implements a ptrace interface

Per-CPU tracing can be implemented on top of the lower layer.
A per-CPU array of DS pointers needs to be ds_allocate()'d, and the
MSR_IA32_DS_AREA and MSR_IA32_DEBUGCTLMSR MSRs need to be configured
accordingly. Care must be taken not to interfere with the ptrace
use of these MSRs.


patch 1/2 contains the kernel changes
patch 2/2 contains changes to the ptrace man pages


regards,
markus.


[patch 0/2] x86, ptrace: support for branch trace store(BTS)

2007-12-04 Thread Markus Metzger
Support for Intel's last branch recording to ptrace. This gives debuggers
access to this hardware feature and allows them to show an execution trace
of the debugged application.

Last branch recording (see section 18.5 in the Intel 64 and IA-32
Architectures Software Developer's Manual) allows taking an execution
trace of the running application without instrumentation. When a branch
is executed, the hardware logs the source and destination address in a
cyclic buffer given to it by the OS.

This can be a great debugging aid. It shows you how exactly you got
where you currently are without requiring you to do lots of single
stepping and rerunning.

This patch manages the various buffers, configures the trace
hardware, disentangles the trace, and provides a user interface via
ptrace. On the high-level design:
- there is one optional trace buffer per thread_struct
- upon a context switch, the trace hardware is reconfigured to either
  disable tracing or to use the appropriate buffer for the new task.
  - tracing induces ~20% overhead as branch records are sent out on
the bus.
  - the hardware collects trace per processor. To disentangle the
traces for different tasks, we use separate buffers and reconfigure
the trace hardware.
- the low-level data layout is configured at cpu initialization time
  - different processors use different branch record formats
- the implementation is done in two layers
  - the lower layer implements the DS/BTS access
  - the higher layer implements a ptrace interface

Per-CPU tracing can be implemented on top of the lower layer.
A per-cpu array of DS pointers needs to be ds_allocate()'d and the
MSR_IA32_DS_AREA and MSR_IA32_DEBUGCTLMSR MSR's need to be properly
configured. Care needs to be taken to not interfere with the ptrace
use of the above MSR's.


patch 1/2 contains the kernel changes
patch 2/2 contains changes to the ptrace man pages


regards,
markus.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch 1/2] x86, ptrace: support for branch trace store(BTS)

2007-12-04 Thread Markus Metzger
Changes to the last version:
- split implementation into two layers: ds/bts and ptrace
- renamed TIF's
- save/restore ds save area msr in __switch_to_xtra()
- make block-stepping only look at BTF bit


Signed-off-by: Markus Metzger [EMAIL PROTECTED]
Signed-off-by: Suresh Siddha [EMAIL PROTECTED]
---

Index: linux-2.6-x86/arch/x86/kernel/process_32.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/process_32.c 2007-12-04 16:44:47.%N 
+0100
+++ linux-2.6-x86/arch/x86/kernel/process_32.c  2007-12-04 18:12:42.%N +0100
@@ -594,11 +594,21 @@
 struct tss_struct *tss)
 {
struct thread_struct *prev, *next;
+   unsigned long debugctl;

prev = prev_p-thread;
next = next_p-thread;

-   if (next-debugctlmsr != prev-debugctlmsr)
+   debugctl = prev-debugctlmsr;
+   if (next-ds_area_msr != prev-ds_area_msr) {
+   /* we clear debugctl to make sure DS
+* is not in use when we change it */
+   debugctl = 0;
+   wrmsrl(MSR_IA32_DEBUGCTLMSR, 0);
+   wrmsr(MSR_IA32_DS_AREA, next-ds_area_msr, 0);
+   }
+
+   if (next-debugctlmsr != debugctl)
wrmsr(MSR_IA32_DEBUGCTLMSR, next-debugctlmsr, 0);

if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
@@ -622,6 +632,13 @@
}
 #endif

+   if (test_tsk_thread_flag(prev_p, TIF_BTS_TRACE_TS))
+   ptrace_bts_take_timestamp(prev_p, BTS_TASK_DEPARTS);
+
+   if (test_tsk_thread_flag(next_p, TIF_BTS_TRACE_TS))
+   ptrace_bts_take_timestamp(next_p, BTS_TASK_ARRIVES);
+
+
if (!test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
/*
 * Disable the bitmap via an invalid offset. We still cache
Index: linux-2.6-x86/include/asm-x86/ptrace-abi.h
===
--- linux-2.6-x86.orig/include/asm-x86/ptrace-abi.h 2007-12-04 16:44:47.%N 
+0100
+++ linux-2.6-x86/include/asm-x86/ptrace-abi.h  2007-12-04 17:21:28.%N +0100
@@ -80,4 +80,56 @@

 #define PTRACE_SINGLEBLOCK 33  /* resume execution until next branch */

+/* Return maximal BTS buffer size in number of records,
+   if successuf; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing */
+#define PTRACE_BTS_MAX_BUFFER_SIZE 40
+
+/* Allocate new bts buffer (free old one, if exists) of size DATA bts records;
+   parameter ADDR is ignored.
+   Return 0, if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing
+   EINVAL...invalid size in records
+   ENOMEM...out of memory */
+#define PTRACE_BTS_ALLOCATE_BUFFER 41
+
+/* Return the size of the bts buffer in number of bts records,
+   if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIOno buffer allocated */
+#define PTRACE_BTS_GET_BUFFER_SIZE 42
+
+/* Return the index of the next bts record to be written,
+   if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIOno buffer allocated
+   After the first warp-around, this is the start of the circular bts
buffer. */
+#define PTRACE_BTS_GET_INDEX 43
+
+/* Read the DATA'th bts record into a ptrace_bts_record buffer
provided in ADDR.
+   Return 0, if successful; -1, otherwise
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIOno buffer allocated
+   EINVAL...invalid index */
+#define PTRACE_BTS_READ_RECORD 44
+
+/* Configure last branch trace; the configuration is given as a bit-mask of
+   PTRACE_BTS_O_* options in DATA; parameter ADDR is ignored.
+   Return 0, if successful; -1, otherwise
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIOno buffer allocated */
+#define PTRACE_BTS_CONFIG 45
+
+/* Return the configuration as bit-mask of PTRACE_BTS_O_* options
+   if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIOno buffer allocated */
+#define PTRACE_BTS_STATUS 46
+
+/* Trace configuration options */
+/* Collect last branch trace */
+#define PTRACE_BTS_O_TRACE_TASK 0x1
+/* Take timestamps when the task arrives and departs */
+#define PTRACE_BTS_O_TIMESTAMPS 0x2
+
 #endif
Index: linux-2.6-x86/include/asm-x86/ptrace.h
===
--- linux-2.6-x86.orig/include/asm-x86/ptrace.h 2007-12-04 16:44:47.%N +0100
+++ linux-2.6-x86/include/asm-x86/ptrace.h  2007-12-04 18:17:52.%N +0100
@@ -4,8 +4,19 @@
 #include linux/compiler.h/* For __user */
 #include asm/ptrace-abi.h

+
 #ifndef __ASSEMBLY__

+#ifdef __KERNEL__
+
+#include asm/ds.h
+
+struct task_struct;
+extern void ptrace_bts_take_timestamp(struct task_struct *, enum
bts_qualifier);
+
+#endif /* __KERNEL__ */
+
+
 #ifdef __i386__
 /* this struct defines the way the registers are stored on the
stack during a system call. */
Index: linux-2.6-x86/arch/x86/kernel

[patch 2/2] man: man pages for ptrace BTS extensions

2007-12-04 Thread Markus Metzger
Changes to the last version:
- ported to v 2.68

Signed-off-by: Markus Metzger [EMAIL PROTECTED]
Signed-off-by: Suresh Siddha [EMAIL PROTECTED]
---

Index: man-pages-2.68/man2/ptrace.2
===
--- man-pages-2.68.orig/man2/ptrace.2   2007-11-30 17:22:59.%N +0100
+++ man-pages-2.68/man2/ptrace.22007-11-30 17:26:48.%N +0100
@@ -40,6 +40,9 @@
 .\PTRACE_SETSIGINFO, PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
 .\(Thanks to Blaisorblade, Daniel Jacobowitz and others who helped.)
 .\
+.\ Modified Nov 2007, Markus Metzger [EMAIL PROTECTED]
+.\ Added PTRACE_BTS_* commands
+.\
 .TH PTRACE 2 2007-11-15 Linux Linux Programmer's Manual
 .SH NAME
 ptrace \- process trace
@@ -378,6 +381,95 @@
 detached in this way regardless of which method was used to initiate
 tracing.
 (\fIaddr\fP is ignored.)
+.LP
+The following ptrace commands provide access to the hardware's last
+branch recording. They may not be available on all architectures.
+.LP
+Last branch recording stores an execution trace of the traced process
+in a circular buffer (called Branch Trace Store). For every
+(conditional) control flow change, the source and destination address
+are stored. On some architectures, control flow changes inside the
+kernel may be recorded, as well. On later architectures, these are
+automatically filtered out.
+.LP
+In addition to branches, timestamps may optionally be recorded when
+the traced process arrives and departs, respectively. This information
+can be used to obtain a qualitative execution order, if more than one
+process is traced.
+.LP
+.nf
+enum ptrace_bts_qualifier {
+   PTRACE_BTS_INVALID = 0,
+   PTRACE_BTS_BRANCH,
+   PTRACE_BTS_TASK_ARRIVES,
+   PTRACE_BTS_TASK_DEPARTS
+};
+.sp
+struct ptrace_bts_record {
+   enum ptrace_bts_qualifier qualifier;
+   union {
+   /* PTRACE_BTS_BRANCH */
+   struct {
+   void *from_ip;
+   void *to_ip;
+   } lbr;
+   /* PTRACE_BTS_TASK_ARRIVES or
+  PTRACE_BTS_TASK_DEPARTS */
+   unsigned long long timestamp;
+   } variant;
+};
+.fi
+.LP
+.TP
+PTRACE_BTS_MAX_BUFFER_SIZE
+Returns the maximum BTS buffer size in number of BTS records.
+.TP
+PTRACE_BTS_ALLOCATE_BUFFER
+Allocates a new BTS buffer big enough to hold \fIdata\fP \fBstruct
+ptrace_bts_record\fP entries.
+\fIData\fP must be in the range 0 to the value returned by
+PTRACE_BTS_MAX_BUFFER_SIZE.
+If a buffer is already allocated, that buffer is freed after the new
+buffer has been successfully allocated. The new buffer initially
+contains only invalid entries.
+Typically, a buffer is allocated once when tracing starts. It is
+automatically deallocated when the parent detaches from the child.
+(\fIaddr\fP is ignored.)
+.TP
+PTRACE_BTS_GET_BUFFER_SIZE
+Returns the actual BTS buffer size in number of BTS records. The
+command fails if no buffer has been allocated.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+PTRACE_BTS_GET_INDEX
+Returns the index of the next entry to be (over)written by the tracing
+hardware. This can be used to determine the end of the current
+execution trace.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+PTRACE_BTS_READ_RECORD
+Reads the BTS record at index \fIdata\fP into \fIaddr\fP. The caller
+is responsible for allocating memory at \fIaddr\fP of at least
+\fBsizeof(struct ptrace_bts_record)\fP bytes. The index \fIdata\fP
+must be in the range 0..PTRACE_BTS_GET_BUFFER_SIZE - 1.
+.TP
+PTRACE_BTS_CONFIG
+Configures last branch recording for the child according to \fIdata\fP.
+(\fIaddr\fP is ignored.)
+\fIdata\fP is interpreted
+as a bitmask of options, which are specified by the following flags:
+.RS
+.TP
+PTRACE_BTS_O_TRACE_TASK
+Record a branch record for every control flow change.
+.TP
+PTRACE_BTS_O_TIMESTAMPS
+Record timestamps when the child arrives and departs.
+.RE
+.TP
+PTRACE_BTS_STATUS
+Returns the current BTS configuration as a bitmask of the above
+options.
+(\fIaddr\fP and \fIdata\fP are ignored.)
 .SH "RETURN VALUE"
 On success,
 .B PTRACE_PEEK*
@@ -432,6 +524,16 @@
 .B ESRCH
 The specified process does not exist, or is not currently being traced
 by the caller, or is not stopped (for requests that require that).
+.TP
+.B EOPNOTSUPP
+The operation is not supported on this architecture.
+.TP
+.B ENOMEM
+Not enough memory to allocate the BTS buffer.
+.TP
+.B ENXIO
+An attempt to access BTS information has been made without allocating
+a BTS buffer first.
 .SH "CONFORMING TO"
 SVr4, 4.3BSD
 .SH NOTES
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
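
For illustration, this is how a debugger might put the records of such a buffer into chronological order. It is a minimal sketch, not part of the patch: it assumes the records have already been fetched one by one with PTRACE_BTS_READ_RECORD into a local array, that the PTRACE_BTS_GET_INDEX result marks the oldest record once the buffer has wrapped around (as the ptrace-abi.h comment in the kernel patch states), and that slots not yet written still read as PTRACE_BTS_INVALID. The helper bts_oldest_first() is hypothetical.

```c
#include <stddef.h>

/* Declarations mirroring the ones proposed in the man page. */
enum ptrace_bts_qualifier {
	PTRACE_BTS_INVALID = 0,
	PTRACE_BTS_BRANCH,
	PTRACE_BTS_TASK_ARRIVES,
	PTRACE_BTS_TASK_DEPARTS
};

struct ptrace_bts_record {
	enum ptrace_bts_qualifier qualifier;
	union {
		struct {
			void *from_ip;
			void *to_ip;
		} lbr;
		unsigned long long timestamp;
	} variant;
};

/*
 * Hypothetical helper: copy the valid records of a circular BTS buffer
 * into OUT, oldest first.  SIZE is the PTRACE_BTS_GET_BUFFER_SIZE result,
 * NEXT the PTRACE_BTS_GET_INDEX result.  Returns the number copied.
 */
size_t bts_oldest_first(const struct ptrace_bts_record *buf, size_t size,
			size_t next, struct ptrace_bts_record *out)
{
	size_t n = 0;

	/*
	 * Start at the slot the hardware writes next.  After a wrap-around
	 * it holds the oldest record; before one, it and all later slots
	 * are still invalid and are skipped.
	 */
	for (size_t i = 0; i < size; i++) {
		const struct ptrace_bts_record *rec = &buf[(next + i) % size];

		if (rec->qualifier != PTRACE_BTS_INVALID)
			out[n++] = *rec;
	}
	return n;
}
```

Walking the result backwards from the last entry then shows the most recent branches leading to the current stop.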


Re: [patch 0/2] x86, ptrace: support for branch trace store(BTS)

2007-12-03 Thread Markus Metzger
> Cool.  It's been on my list to look into exposing those features
> somehow. I hadn't planned on doing it until after the utrace stuff
> settles and there is a more coherent interface context in which to do
> it.

I'm very much looking forward to utrace. From what I have read so far,
it is a much nicer interface.
I would expect that this feature, together with all other ptrace
extensions, would need to be adapted to utrace once that is in.


> If they are tackling the MSR hacking and context switch and so forth,
> I'd like to see them start out by just adding block-step
> (debugctlmsr.btf) with the PTRACE_SINGLEBLOCK interface as ia64 has.
> That should lay some of the same groundwork needed here, but is much
> simpler.

There seems to be support for block stepping in arch/x86/kernel/step.c,
which is used by kernel/ptrace.c.

This is now another user of the DEBUGCTL MSR; access to it needs to be
synchronized. I'll look into it.


> I am not really in favor of this new ptrace interface.  I think they
> should look around across arch's and think about sane general-purpose
> interfaces for features of this kind that might be built with some
> commonality across machines.

I looked at the include/asm-*/ptrace.h files and some arch/*/kernel/ptrace.c
files. Most arches support a few variants of GET<whatever>REGS.
Most implementations simply copy_to_user the kernel structures for the
requested registers.

Sparc64 needs to convert pointer sizes and defines the returned struct
directly in the implementation.
Xtensa provides access to an array of FP regs of varying size. It provides
a ptrace command to query the size, but otherwise also copies the entire
array with copy_to_user.

I have not found any arch that does anything fancier than returning a
single integer value or an array of registers.
In all cases, the command carries enough information to interpret the result.

In our case, the array we're querying can be rather big, and typically
only some of the information is interesting. The data we return is
inhomogeneous.

The former may be true for register arrays as well, but they are
typically small enough. The latter would compare to a general GETREGS
command that returns all registers in a self-describing format (that
might be an interesting extension, if one got tired of yet another
GET<new-type-of>REGS command).

Instead of providing the entire array in one command, we introduced
commands to handle that array.
Instead of carrying the information on how to interpret the result in
the command itself, we provide that information directly in the result.
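
To make the point about self-describing results concrete: a consumer only needs the qualifier to know which union member of struct ptrace_bts_record is valid. This is a minimal sketch under that assumption; bts_format() is a hypothetical helper, and the declarations mirror the ones proposed in the man page patch.

```c
#include <stdio.h>

enum ptrace_bts_qualifier {
	PTRACE_BTS_INVALID = 0,
	PTRACE_BTS_BRANCH,
	PTRACE_BTS_TASK_ARRIVES,
	PTRACE_BTS_TASK_DEPARTS
};

struct ptrace_bts_record {
	enum ptrace_bts_qualifier qualifier;
	union {
		struct {
			void *from_ip;
			void *to_ip;
		} lbr;
		unsigned long long timestamp;
	} variant;
};

/*
 * Hypothetical consumer: the qualifier stored in each record says which
 * union member is valid, so no request-specific knowledge is needed.
 * Returns the snprintf() result for the formatted line.
 */
int bts_format(const struct ptrace_bts_record *rec, char *buf, size_t len)
{
	switch (rec->qualifier) {
	case PTRACE_BTS_BRANCH:
		return snprintf(buf, len, "branch %p -> %p",
				rec->variant.lbr.from_ip,
				rec->variant.lbr.to_ip);
	case PTRACE_BTS_TASK_ARRIVES:
		return snprintf(buf, len, "arrives at %llu",
				rec->variant.timestamp);
	case PTRACE_BTS_TASK_DEPARTS:
		return snprintf(buf, len, "departs at %llu",
				rec->variant.timestamp);
	default:
		/* PTRACE_BTS_INVALID: slot not written yet */
		return snprintf(buf, len, "invalid");
	}
}
```

Adding a new record kind would then mean adding a qualifier value and a case, without a new ptrace command.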

I would argue that this interface may be directly (re)used and
extended by other arches.


Do you have specific concerns regarding the interface?


> Also do it in a layered way from
> low-level, with something usable for kernel-mode too.

Disabling cpl0 filtering should be fairly easy; we would simply clear
the cpl bit in the debugctl_mask. This way, you could also trace the
kernel part of the application's execution, but you would still be
debugging the application.
You could call the ptrace_bts_ functions directly, or we might add a new
set of interface functions that simply forward the request (or the
other way round).
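
On processors that support CPL-qualified BTS, the kernel filtering discussed here corresponds to the BTS_OFF_OS bit of IA32_DEBUGCTL as described in the Intel SDM (TR is bit 6, BTS bit 7, BTS_OFF_OS bit 9). This is a minimal sketch assuming those bit positions; the patch keeps its own per-processor debugctl_mask, so bts_debugctl() and the macro names below are hypothetical.

```c
#include <stdint.h>

/* IA32_DEBUGCTL bits as given in the Intel SDM (assumed here). */
#define DEBUGCTL_TR		(UINT64_C(1) << 6) /* send branch trace messages */
#define DEBUGCTL_BTS		(UINT64_C(1) << 7) /* store them in the BTS buffer */
#define DEBUGCTL_BTS_OFF_OS	(UINT64_C(1) << 9) /* skip branches at CPL 0 */

/*
 * Hypothetical helper: DEBUGCTL value to load when switching to a
 * traced task.  TRACE_KERNEL selects whether kernel branches executed
 * on behalf of the task are recorded, too.
 */
uint64_t bts_debugctl(uint64_t current, int trace_kernel)
{
	uint64_t val = current | DEBUGCTL_TR | DEBUGCTL_BTS;

	if (trace_kernel)
		val &= ~DEBUGCTL_BTS_OFF_OS;	/* no CPL 0 filtering */
	else
		val |= DEBUGCTL_BTS_OFF_OS;	/* user-space branches only */
	return val;
}
```

Older processors without the BTS_OFF_OS bit cannot filter in hardware, which matches the man page's note that kernel branches may be recorded on some architectures.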

Providing a per-cpu trace instead of a per-thread trace would be a
completely new feature that only has the configuration part in common
with our patch.

What did you have in mind when you asked for kernel-mode support?

thanks and regards,
markus.


[patch 2/2] man: man pages for ptrace BTS extensions

2007-11-30 Thread Markus Metzger
Changes to previous version(s):
- added PTRACE_BTS_MAX_BUFFER_SIZE command

Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]>
---

Index: man/man2/ptrace.2
===
--- man.orig/man2/ptrace.2  2007-11-22 20:25:21.%N +0100
+++ man/man2/ptrace.2   2007-11-30 15:30:44.%N +0100
@@ -40,7 +40,10 @@
 .\"PTRACE_SETSIGINFO, PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
 .\"(Thanks to Blaisorblade, Daniel Jacobowitz and others who helped.)
 .\"
-.TH PTRACE 2 2006-03-24 "Linux 2.6.16" "Linux Programmer's Manual"
+.\" Modified Nov 2007, Markus Metzger <[EMAIL PROTECTED]>
+.\" Added PTRACE_BTS_* commands
+.\"
+.TH PTRACE 2 2007-11 "Linux 2.6.16" "Linux Programmer's Manual"
 .SH NAME
 ptrace \- process trace
 .SH SYNOPSIS
@@ -312,6 +315,95 @@
 detached in this way regardless of which method was used to initiate
 tracing.
 (\fIaddr\fP is ignored.)
+.LP
+The following ptrace commands provide access to the hardware's last
+branch recording. They may not be available on all architectures.
+.LP
+Last branch recording stores an execution trace of the traced process
+in a circular buffer (called Branch Trace Store). For every
+(conditional) control flow change, the source and destination addresses
+are stored. On some architectures, control flow changes inside the
+kernel may be recorded as well; on later architectures, these are
+filtered out automatically.
+.LP
+In addition to branches, timestamps may optionally be recorded when
+the traced process arrives (is switched in) and departs (is switched
+out). This information can be used to obtain a qualitative execution
+order if more than one process is traced.
+.LP
+.nf
+enum ptrace_bts_qualifier {
+   PTRACE_BTS_INVALID = 0,
+   PTRACE_BTS_BRANCH,
+   PTRACE_BTS_TASK_ARRIVES,
+   PTRACE_BTS_TASK_DEPARTS
+};
+.sp
+struct ptrace_bts_record {
+   enum ptrace_bts_qualifier qualifier;
+   union {
+   /* PTRACE_BTS_BRANCH */
+   struct {
+   void *from_ip;
+   void *to_ip;
+   } lbr;
+   /* PTRACE_BTS_TASK_ARRIVES or
+  PTRACE_BTS_TASK_DEPARTS */
+   unsigned long long timestamp;
+   } variant;
+};
+.fi
+.LP
+.TP
+PTRACE_BTS_MAX_BUFFER_SIZE
+Returns the maximum BTS buffer size in number of BTS records.
+.TP
+PTRACE_BTS_ALLOCATE_BUFFER
+Allocates a new BTS buffer big enough to hold \fIdata\fP \fBstruct
+ptrace_bts_record\fP entries.
+\fIData\fP must be in the range 0 to the value returned by
+PTRACE_BTS_MAX_BUFFER_SIZE.
+If a buffer is already allocated, that buffer is freed after the new
+buffer has been successfully allocated. The new buffer initially
+contains only invalid entries.
+Typically, a buffer is allocated once when tracing starts. It is
+automatically deallocated when the parent detaches from the child.
+(\fIaddr\fP is ignored.)
+.TP
+PTRACE_BTS_GET_BUFFER_SIZE
+Returns the actual BTS buffer size in number of BTS records. The
+command fails if no buffer has been allocated.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+PTRACE_BTS_GET_INDEX
+Returns the index of the next entry to be (over)written by the tracing
+hardware. This can be used to determine the end of the current
+execution trace.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+PTRACE_BTS_READ_RECORD
+Reads the BTS record at index \fIdata\fP into \fIaddr\fP. The caller
+is responsible for allocating memory at \fIaddr\fP of at least
+\fBsizeof(struct ptrace_bts_record)\fP bytes. The index \fIdata\fP
+must be in the range 0..PTRACE_BTS_GET_BUFFER_SIZE - 1.
+.TP
+PTRACE_BTS_CONFIG
+Configures last branch recording for the child according to \fIdata\fP.
+(\fIaddr\fP is ignored.)
+\fIdata\fP is interpreted
+as a bitmask of options, which are specified by the following flags:
+.RS
+.TP
+PTRACE_BTS_O_TRACE_TASK
+Record a branch record for every control flow change.
+.TP
+PTRACE_BTS_O_TIMESTAMPS
+Record timestamps when the child arrives and departs.
+.RE
+.TP
+PTRACE_BTS_STATUS
+Returns the current BTS configuration as a bitmask of the above
+options.
+(\fIaddr\fP and \fIdata\fP are ignored.)
 .SH NOTES
 Although arguments to
 .BR ptrace ()
@@ -409,6 +501,16 @@
 .B ESRCH
 The specified process does not exist, or is not currently being traced
 by the caller, or is not stopped (for requests that require that).
+.TP
+.B EOPNOTSUPP
+The operation is not supported on this architecture.
+.TP
+.B ENOMEM
+Not enough memory to allocate the BTS buffer.
+.TP
+.B ENXIO
+An attempt to access BTS information has been made without allocating
+a BTS buffer first.
 .SH "CONFORMING TO"
 SVr4, 4.3BSD
 .SH "SEE ALSO"

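A debugger can check its arguments against the ranges documented above before issuing the requests, and build the PTRACE_BTS_CONFIG argument from the PTRACE_BTS_O_* flags. This is a minimal sketch: the helper names are hypothetical, max_size is assumed to come from PTRACE_BTS_MAX_BUFFER_SIZE and buffer_size from PTRACE_BTS_GET_BUFFER_SIZE, and the flag values are the ones proposed in the accompanying ptrace-abi.h change.

```c
#include <errno.h>

/* Option bits as proposed in ptrace-abi.h. */
#define PTRACE_BTS_O_TRACE_TASK 0x1
#define PTRACE_BTS_O_TIMESTAMPS 0x2

/* Hypothetical check for PTRACE_BTS_ALLOCATE_BUFFER: DATA in 0..MAX_SIZE. */
int bts_check_alloc_size(long data, long max_size)
{
	return (data >= 0 && data <= max_size) ? 0 : -EINVAL;
}

/* Hypothetical check for PTRACE_BTS_READ_RECORD: DATA in 0..SIZE - 1. */
int bts_check_read_index(long data, long buffer_size)
{
	return (data >= 0 && data < buffer_size) ? 0 : -EINVAL;
}

/* Hypothetical builder for the DATA argument of PTRACE_BTS_CONFIG. */
unsigned long bts_config_data(int branches, int timestamps)
{
	unsigned long data = 0;

	if (branches)
		data |= PTRACE_BTS_O_TRACE_TASK;
	if (timestamps)
		data |= PTRACE_BTS_O_TIMESTAMPS;
	return data;
}
```

The same bitmask is what PTRACE_BTS_STATUS is documented to return, so a debugger can compare the two after configuring.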

[patch 1/2] x86, ptrace: support for branch trace store(BTS)

2007-11-30 Thread Markus Metzger
Changes to previous version(s):
- moved task arrives/departs notifications to __switch_to_xtra()
- added _TIF_BTS_TRACE and _TIF_BTS_TRACE_TS to _TIF_WORK_CTXSW_*
- split _TIF_WORK_CTXSW into ~_PREV and ~_NEXT for x86_64
- ptrace_bts_init_intel() function called from init_intel()
- removed PTRACE_BTS_INIT ptrace command
- cache DEBUGCTRL MSR
- replace struct declarations and operations struct with
  configuration struct defining offset/size pairs and
  generic operations
- added PTRACE_BTS_MAX_BUFFER_SIZE command


Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]>
---

Index: linux-2.6-x86/arch/x86/kernel/process_32.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/process_32.c 2007-11-30 14:03:41.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/process_32.c  2007-11-30 14:03:50.%N +0100
@@ -622,6 +622,19 @@
}
 #endif

+   /*
+* Last branch recording reconfiguration of trace hardware and
+* disentangling of trace data per task.
+*/
+   if (test_tsk_thread_flag(prev_p, TIF_BTS_TRACE) ||
+   test_tsk_thread_flag(prev_p, TIF_BTS_TRACE_TS))
+   ptrace_bts_task_departs(prev_p);
+
+   if (test_tsk_thread_flag(next_p, TIF_BTS_TRACE) ||
+   test_tsk_thread_flag(next_p, TIF_BTS_TRACE_TS))
+   ptrace_bts_task_arrives(next_p);
+
+
if (!test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
/*
 * Disable the bitmap via an invalid offset. We still cache
Index: linux-2.6-x86/include/asm-x86/ptrace-abi.h
===
--- linux-2.6-x86.orig/include/asm-x86/ptrace-abi.h 2007-11-30 14:03:41.%N +0100
+++ linux-2.6-x86/include/asm-x86/ptrace-abi.h  2007-11-30 15:07:08.%N +0100
@@ -80,4 +80,56 @@

 #define PTRACE_SINGLEBLOCK 33  /* resume execution until next branch */

+/* Return maximal BTS buffer size in number of records,
+   if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing */
+#define PTRACE_BTS_MAX_BUFFER_SIZE 40
+
+/* Allocate new bts buffer (free old one, if exists) of size DATA bts records;
+   parameter ADDR is ignored.
+   Return 0, if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing
+   EINVAL...invalid size in records
+   ENOMEM...out of memory */
+#define PTRACE_BTS_ALLOCATE_BUFFER 41
+
+/* Return the size of the bts buffer in number of bts records,
+   if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIO...no buffer allocated */
+#define PTRACE_BTS_GET_BUFFER_SIZE 42
+
+/* Return the index of the next bts record to be written,
+   if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIO...no buffer allocated
+   After the first wrap-around, this is the start of the circular bts buffer. */
+#define PTRACE_BTS_GET_INDEX 43
+
+/* Read the DATA'th bts record into a ptrace_bts_record buffer provided in ADDR.
+   Return 0, if successful; -1, otherwise
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIO...no buffer allocated
+   EINVAL...invalid index */
+#define PTRACE_BTS_READ_RECORD 44
+
+/* Configure last branch trace; the configuration is given as a bit-mask of
+   PTRACE_BTS_O_* options in DATA; parameter ADDR is ignored.
+   Return 0, if successful; -1, otherwise
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIO...no buffer allocated */
+#define PTRACE_BTS_CONFIG 45
+
+/* Return the configuration as bit-mask of PTRACE_BTS_O_* options
+   if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIO...no buffer allocated */
+#define PTRACE_BTS_STATUS 46
+
+/* Trace configuration options */
+/* Collect last branch trace */
+#define PTRACE_BTS_O_TRACE_TASK 0x1
+/* Take timestamps when the task arrives and departs */
+#define PTRACE_BTS_O_TIMESTAMPS 0x2
+
 #endif
Index: linux-2.6-x86/include/asm-x86/ptrace.h
===
--- linux-2.6-x86.orig/include/asm-x86/ptrace.h 2007-11-30 14:03:41.%N +0100
+++ linux-2.6-x86/include/asm-x86/ptrace.h  2007-11-30 14:03:50.%N +0100
@@ -4,8 +4,48 @@
 #include /* For __user */
 #include 

+
 #ifndef __ASSEMBLY__

+/* a branch trace record entry
+ *
+ * In order to unify the interface between various processor versions,
+ * we use the below data structure for all processors.
+ */
+enum ptrace_bts_qualifier {
+   PTRACE_BTS_INVALID = 0,
+   PTRACE_BTS_BRANCH,
+   PTRACE_BTS_TASK_ARRIVES,
+   PTRACE_BTS_TASK_DEPARTS
+};
+
+struct ptrace_bts_record {
+   enum ptrace_bts_qualifier qualifier;
+   union {
+   /* PTRACE_BTS_BRANCH */
+   struct {
+   long from_ip;
+   long to_ip;
+

[patch 0/2] x86, ptrace: support for branch trace store(BTS)

2007-11-30 Thread Markus Metzger
This patch adds support for Intel's last branch recording to ptrace.
This gives debuggers access to this hardware feature and allows them to
show an execution trace of the debugged application.

Last branch recording (see section 18.5 in the Intel 64 and IA-32
Architectures Software Developer's Manual) allows taking an execution
trace of the running application without instrumentation. When a branch
is executed, the hardware logs the source and destination address in a
cyclic buffer given to it by the OS.

This can be a great debugging aid. It shows you exactly how you got to
where you currently are without requiring you to do lots of single
stepping and rerunning.

This patch manages the various buffers, configures the trace
hardware, disentangles the trace, and provides a user interface via
ptrace. The high-level design is as follows:
- there is one optional trace buffer per thread_struct
- upon a context switch, the trace hardware is reconfigured to either
  disable tracing or to use the appropriate buffer for the new task.
  - tracing induces ~20% overhead as branch records are sent out on
    the bus.
  - the hardware collects the trace per processor. To disentangle the
    traces for different tasks, we use separate buffers and reconfigure
    the trace hardware.
- the low-level data layout is configured at cpu initialization time
  - different processors use different branch record formats
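
The context-switch part of this design can be modelled in user space. This is a minimal sketch, not the kernel code: struct sim_task and the sim_* functions are hypothetical stand-ins for the per-thread buffer and the TIF_BTS_TRACE/TIF_BTS_TRACE_TS flags, and a global pointer plays the role of the per-processor trace hardware that is retargeted on every switch.

```c
#include <stddef.h>

#define SIM_BUF_SIZE 8

/* One simplified trace entry: 'B' = branch, 'A' = arrives, 'D' = departs. */
struct sim_entry {
	char kind;
	unsigned long a, b;	/* branch from/to, or timestamp in a */
};

/* Hypothetical per-task state standing in for the thread_struct fields. */
struct sim_task {
	int trace;		/* collect branches (cf. TIF_BTS_TRACE) */
	int timestamps;		/* log arrive/depart (cf. TIF_BTS_TRACE_TS) */
	struct sim_entry buf[SIM_BUF_SIZE];
	size_t next;		/* circular write index */
};

/* Per-CPU "hardware": the buffer branches currently go to, or NULL. */
static struct sim_task *sim_current_buffer;

static void sim_log(struct sim_task *t, char kind, unsigned long a,
		    unsigned long b)
{
	t->buf[t->next] = (struct sim_entry){ kind, a, b };
	t->next = (t->next + 1) % SIM_BUF_SIZE;	/* wrap around */
}

/* Context switch: note the departure, then retarget the hardware. */
void sim_switch(struct sim_task *prev, struct sim_task *next,
		unsigned long now)
{
	if (prev && prev->timestamps)
		sim_log(prev, 'D', now, 0);
	/* Either disable tracing or point it at the new task's buffer. */
	sim_current_buffer = (next && next->trace) ? next : NULL;
	if (next && next->timestamps)
		sim_log(next, 'A', now, 0);
}

/* A branch retired on this CPU: only the running task's buffer sees it. */
void sim_branch(unsigned long from, unsigned long to)
{
	if (sim_current_buffer)
		sim_log(sim_current_buffer, 'B', from, to);
}
```

Because each task owns its buffer, branches taken while an untraced task runs never end up in a traced task's trace.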

Opens:
- kernel interface

patch 1/2 contains the kernel changes
patch 2/2 contains changes to the ptrace man pages


regards,
markus.


[patch 2/2] man: man pages for ptrace BTS extensions

2007-11-30 Thread Markus Metzger
resend using a different mailer


Describe extensions to ptrace interface for branch trace store

Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]>
---

Index: man/man2/ptrace.2
===
--- man.orig/man2/ptrace.2  2007-11-22 20:25:21.%N +0100
+++ man/man2/ptrace.2   2007-11-22 20:25:33.%N +0100
@@ -40,7 +40,10 @@
 .\"PTRACE_SETSIGINFO, PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
 .\"(Thanks to Blaisorblade, Daniel Jacobowitz and others who helped.)
 .\"
-.TH PTRACE 2 2006-03-24 "Linux 2.6.16" "Linux Programmer's Manual"
+.\" Modified Nov 2007, Markus Metzger <[EMAIL PROTECTED]>
+.\" Added PTRACE_BTS_* commands
+.\"
+.TH PTRACE 2 2007-11 "Linux 2.6.16" "Linux Programmer's Manual"
 .SH NAME
 ptrace \- process trace
 .SH SYNOPSIS
@@ -312,6 +315,96 @@
 detached in this way regardless of which method was used to initiate
 tracing.
 (\fIaddr\fP is ignored.)
+.LP
+The following ptrace commands provide access to the hardware's last
+branch recording. They may not be available on all architectures.
+.LP
+Last branch recording stores an execution trace of the traced process
+in a circular buffer (called Branch Trace Store). For every
+(conditional) control flow change, the source and destination addresses
+are stored. On some architectures, control flow changes inside the
+kernel may be recorded as well; on later architectures, these are
+filtered out automatically.
+.LP
+In addition to branches, timestamps may optionally be recorded when
+the traced process arrives (is switched in) and departs (is switched
+out). This information can be used to obtain a qualitative execution
+order if more than one process is traced.
+.LP
+.nf
+enum ptrace_bts_qualifier {
+   PTRACE_BTS_INVALID = 0,
+   PTRACE_BTS_BRANCH,
+   PTRACE_BTS_TASK_ARRIVES,
+   PTRACE_BTS_TASK_DEPARTS
+};
+.sp
+struct ptrace_bts_record {
+   enum ptrace_bts_qualifier qualifier;
+   union {
+   /* PTRACE_BTS_BRANCH */
+   struct {
+   void *from_ip;
+   void *to_ip;
+   } lbr;
+   /* PTRACE_BTS_TASK_ARRIVES or
+  PTRACE_BTS_TASK_DEPARTS */
+   unsigned long long timestamp;
+   } variant;
+};
+.fi
+.LP
+.TP
+PTRACE_BTS_MAX_BUFFER
+This is not a ptrace command but a macro that defines the maximum size
+of the BTS buffer in number of BTS records.
+.TP
+PTRACE_BTS_ALLOCATE_BUFFER
+Allocates a new BTS buffer big enough to hold \fIdata\fP \fBstruct
+ptrace_bts_record\fP entries.
+\fIData\fP must be in the range 0..PTRACE_BTS_MAX_BUFFER.
+If a buffer is already allocated, that buffer is freed after the new
+buffer has been successfully allocated. The new buffer initially
+contains only invalid entries.
+Typically, a buffer is allocated once when tracing starts. It is
+automatically deallocated when the parent detaches from the child.
+(\fIaddr\fP is ignored.)
+.TP
+PTRACE_BTS_GET_BUFFER_SIZE
+Returns the actual BTS buffer size in number of BTS records. The
+command fails if no buffer has been allocated.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+PTRACE_BTS_GET_INDEX
+Returns the index of the next entry to be (over)written by the tracing
+hardware. This can be used to determine the end of the current
+execution trace.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+PTRACE_BTS_READ_RECORD
+Reads the BTS record at index \fIdata\fP into \fIaddr\fP. The caller
+is responsible for allocating memory at \fIaddr\fP of at least
+\fBsizeof(struct ptrace_bts_record)\fP bytes. The index \fIdata\fP
+must be in the range 0..PTRACE_BTS_GET_BUFFER_SIZE - 1.
+.TP
+PTRACE_BTS_CONFIG
+Configures last branch recording for the child according to \fIdata\fP.
+(\fIaddr\fP is ignored.)
+\fIdata\fP is interpreted
+as a bitmask of options, which are specified by the following flags:
+.RS
+.TP
+PTRACE_BTS_O_TRACE_TASK
+Record a branch record for every control flow change.
+.TP
+PTRACE_BTS_O_TIMESTAMPS
+Record timestamps when the child arrives and departs.
+.RE
+.TP
+PTRACE_BTS_STATUS
+Returns the current BTS configuration as a bitmask of the above
+options.
+(\fIaddr\fP and \fIdata\fP are ignored.)
 .SH NOTES
 Although arguments to
 .BR ptrace ()
@@ -409,6 +502,16 @@
 .B ESRCH
 The specified process does not exist, or is not currently being traced
 by the caller, or is not stopped (for requests that require that).
+.TP
+.B EOPNOTSUPP
+The operation is not supported on this architecture.
+.TP
+.B ENOMEM
+Not enough memory to allocate the BTS buffer.
+.TP
+.B ENXIO
+An attempt to access BTS information has been made without allocating
+a BTS buffer first.
 .SH "CONFORMING TO"
 SVr4, 4.3BSD
 .SH "SEE ALSO"

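With timestamps enabled for several traced processes, their arrival records can be merged into one run order. This is a minimal sketch, assuming the debugger has already collected each task's PTRACE_BTS_TASK_ARRIVES timestamps and that the timestamps of all traced tasks are comparable with each other; struct arrival and bts_run_order() are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical: one PTRACE_BTS_TASK_ARRIVES timestamp, tagged with the
 * traced task it was read from. */
struct arrival {
	int task;
	unsigned long long timestamp;
};

static int by_timestamp(const void *x, const void *y)
{
	const struct arrival *a = x, *b = y;

	/* Compare without subtracting to avoid overflow on 64-bit stamps. */
	return (a->timestamp > b->timestamp) - (a->timestamp < b->timestamp);
}

/* Sort the arrivals of all traced tasks into a run order. */
void bts_run_order(struct arrival *ev, size_t n)
{
	qsort(ev, n, sizeof(*ev), by_timestamp);
}
```

Only the relative order of the entries is used, which matches the man page's description of the result as a qualitative execution order.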

[patch 1/2] x86, ptrace: support for branch trace store(BTS)

2007-11-30 Thread Markus Metzger
resend using a different mailer


Changes to previous version(s):
- moved task arrives/departs notifications to __switch_to_xtra()
- added _TIF_BTS_TRACE and _TIF_BTS_TRACE_TS to _TIF_WORK_CTXSW_*
- split _TIF_WORK_CTXSW into ~_PREV and ~_NEXT for x86_64
- ptrace_bts_init_intel() function called from init_intel()
- removed PTRACE_BTS_INIT ptrace command
- cache DEBUGCTRL MSR
- replace struct declarations and operations struct with
  configuration struct defining offset/size pairs and
  generic operations
- added a patch for the ptrace.2 man page for discussing the API
  in this forum


Signed-off-by: Markus Metzger <[EMAIL PROTECTED]>
Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]>
---

Index: linux-2.6/arch/x86/kernel/process_32.c
===
--- linux-2.6.orig/arch/x86/kernel/process_32.c 2007-11-22 11:03:44.%N +0100
+++ linux-2.6/arch/x86/kernel/process_32.c  2007-11-22 11:36:39.%N +0100
@@ -623,6 +623,19 @@
}
 #endif

+   /*
+* Last branch recording reconfiguration of trace hardware and
+* disentangling of trace data per task.
+*/
+   if (test_tsk_thread_flag(prev_p, TIF_BTS_TRACE) ||
+   test_tsk_thread_flag(prev_p, TIF_BTS_TRACE_TS))
+   ptrace_bts_task_departs(prev_p);
+
+   if (test_tsk_thread_flag(next_p, TIF_BTS_TRACE) ||
+   test_tsk_thread_flag(next_p, TIF_BTS_TRACE_TS))
+   ptrace_bts_task_arrives(next_p);
+
+
if (!test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
/*
 * Disable the bitmap via an invalid offset. We still cache
Index: linux-2.6/arch/x86/kernel/ptrace_32.c
===
--- linux-2.6.orig/arch/x86/kernel/ptrace_32.c  2007-11-22 11:03:44.%N +0100
+++ linux-2.6/arch/x86/kernel/ptrace_32.c   2007-11-22 11:36:39.%N +0100
@@ -24,6 +24,7 @@
 #include <asm/debugreg.h>
 #include <asm/ldt.h>
 #include <asm/desc.h>
+#include <asm/ptrace_bts.h>

 /*
  * does not yet catch signals sent when the child dies.
@@ -274,6 +275,7 @@
 {
clear_singlestep(child);
clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
+   ptrace_bts_task_detached(child);
 }

 /*
@@ -610,6 +612,32 @@
(struct user_desc __user *) data);
break;

+   case PTRACE_BTS_ALLOCATE_BUFFER:
+   ret = ptrace_bts_allocate_bts(child, data);
+   break;
+
+   case PTRACE_BTS_GET_BUFFER_SIZE:
+   ret = ptrace_bts_get_buffer_size(child);
+   break;
+
+   case PTRACE_BTS_GET_INDEX:
+   ret = ptrace_bts_get_index(child);
+   break;
+
+   case PTRACE_BTS_READ_RECORD:
+   ret = ptrace_bts_read_record
+   (child, data,
+(struct ptrace_bts_record __user *) addr);
+   break;
+
+   case PTRACE_BTS_CONFIG:
+   ret = ptrace_bts_config(child, data);
+   break;
+
+   case PTRACE_BTS_STATUS:
+   ret = ptrace_bts_status(child);
+   break;
+
default:
ret = ptrace_request(child, request, addr, data);
break;
Index: linux-2.6/include/asm-x86/ptrace-abi.h
===
--- linux-2.6.orig/include/asm-x86/ptrace-abi.h 2007-11-22 11:03:45.%N +0100
+++ linux-2.6/include/asm-x86/ptrace-abi.h  2007-11-22 13:28:22.%N +0100
@@ -78,4 +78,49 @@
 # define PTRACE_SYSEMU_SINGLESTEP 32
 #endif

+/*
+ * Maximal BTS buffer size in number of records
+ * This is a macro, not a ptrace command.
+ */
+#define PTRACE_BTS_MAX_BTS_SIZE 4000
+
+
+/* Allocate new bts buffer (free old one, if exists) of size DATA bts records;
+   parameter ADDR is ignored.
+   Return 0, if successful; -1, otherwise.
+   ENXIO...ptrace bts not initialized
+   EINVAL...invalid size in records
+   ENOMEM...out of memory */
+#define PTRACE_BTS_ALLOCATE_BUFFER 41
+
+/* Return the size of the bts buffer in number of bts records,
+   if successful; -1, otherwise.
+   ENXIO...ptrace bts not initialized or no buffer allocated */
+#define PTRACE_BTS_GET_BUFFER_SIZE 42
+
+/* Return the index of the next bts record to be written,
+   if successful; -1, otherwise.
+   After the first wrap-around, this is the start of the circular bts buffer.
+   ENXIO...ptrace bts not initialized or no buffer allocated */
+#define PTRACE_BTS_GET_INDEX 43
+
+/* Read the DATA'th bts record into a ptrace_bts_record buffer provided in ADDR.
+   Return 0, if successful; -1, otherwise
+   ENXIO...ptrace bts not initialized or no buffer allocated
+   EINVAL...invalid index */
+#define PTRACE_BTS_READ_RECORD 44
+
+/* Configure last branch trace; the configuration is given as a bit-mask of
+   PTRACE_BTS_O_* options in DATA; parameter ADDR is ignored. */
+#define PTRACE_BTS_CONFIG 45
+
+/* Return the configuration as bit-mask of PTRACE_BTS_O_* options. */
+#define PTRACE_B

[patch 1/2] x86, ptrace: support for branch trace store(BTS)

2007-11-30 Thread Markus Metzger
resend using a different mailer


Changes to previous version(s):
- moved task arrives/departs notifications to __switch_to_xtra()
- added _TIF_BTS_TRACE and _TIF_BTS_TRACE_TS to _TIF_WORK_CTXSW_*
- split _TIF_WORK_CTXSW into ~_PREV and ~_NEXT for x86_64
- ptrace_bts_init_intel() function called from init_intel()
- removed PTRACE_BTS_INIT ptrace command
- cache DEBUGCTRL MSR
- replace struct declarations and operations struct with
  configuration struct defining offset/size pairs and
  generic operations
- added a patch for the ptrace.2 man page for discussing the API
  in this forum


Signed-off-by: Markus Metzger [EMAIL PROTECTED]
Signed-off-by: Suresh Siddha [EMAIL PROTECTED]
---

Index: linux-2.6/arch/x86/kernel/process_32.c
===
--- linux-2.6.orig/arch/x86/kernel/process_32.c 2007-11-22 11:03:44.%N +0100
+++ linux-2.6/arch/x86/kernel/process_32.c  2007-11-22 11:36:39.%N +0100
@@ -623,6 +623,19 @@
}
 #endif

+   /*
+* Last branch recording recofiguration of trace hardware and
+* disentangling of trace data per task.
+*/
+   if (test_tsk_thread_flag(prev_p, TIF_BTS_TRACE) ||
+   test_tsk_thread_flag(prev_p, TIF_BTS_TRACE_TS))
+   ptrace_bts_task_departs(prev_p);
+
+   if (test_tsk_thread_flag(next_p, TIF_BTS_TRACE) ||
+   test_tsk_thread_flag(next_p, TIF_BTS_TRACE_TS))
+   ptrace_bts_task_arrives(next_p);
+
+
if (!test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
/*
 * Disable the bitmap via an invalid offset. We still cache
Index: linux-2.6/arch/x86/kernel/ptrace_32.c
===
--- linux-2.6.orig/arch/x86/kernel/ptrace_32.c  2007-11-22 11:03:44.%N +0100
+++ linux-2.6/arch/x86/kernel/ptrace_32.c   2007-11-22 11:36:39.%N +0100
@@ -24,6 +24,7 @@
 #include <asm/debugreg.h>
 #include <asm/ldt.h>
 #include <asm/desc.h>
+#include <asm/ptrace_bts.h>

 /*
  * does not yet catch signals sent when the child dies.
@@ -274,6 +275,7 @@
 {
clear_singlestep(child);
clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
+   ptrace_bts_task_detached(child);
 }

 /*
@@ -610,6 +612,32 @@
(struct user_desc __user *) data);
break;

+   case PTRACE_BTS_ALLOCATE_BUFFER:
+   ret = ptrace_bts_allocate_bts(child, data);
+   break;
+
+   case PTRACE_BTS_GET_BUFFER_SIZE:
+   ret = ptrace_bts_get_buffer_size(child);
+   break;
+
+   case PTRACE_BTS_GET_INDEX:
+   ret = ptrace_bts_get_index(child);
+   break;
+
+   case PTRACE_BTS_READ_RECORD:
+   ret = ptrace_bts_read_record
+   (child, data,
+(struct ptrace_bts_record __user *) addr);
+   break;
+
+   case PTRACE_BTS_CONFIG:
+   ret = ptrace_bts_config(child, data);
+   break;
+
+   case PTRACE_BTS_STATUS:
+   ret = ptrace_bts_status(child);
+   break;
+
default:
ret = ptrace_request(child, request, addr, data);
break;
Index: linux-2.6/include/asm-x86/ptrace-abi.h
===
--- linux-2.6.orig/include/asm-x86/ptrace-abi.h 2007-11-22 11:03:45.%N +0100
+++ linux-2.6/include/asm-x86/ptrace-abi.h  2007-11-22 13:28:22.%N +0100
@@ -78,4 +78,49 @@
 # define PTRACE_SYSEMU_SINGLESTEP 32
 #endif

+/*
+ * Maximum BTS buffer size in number of records.
+ * This is a macro, not a ptrace command.
+ */
+#define PTRACE_BTS_MAX_BTS_SIZE 4000
+
+
+/* Allocate new bts buffer (free old one, if exists) of size DATA bts records;
+   parameter ADDR is ignored.
+   Return 0, if successful; -1, otherwise.
+   ENXIO....ptrace bts not initialized
+   EINVAL...invalid size in records
+   ENOMEM...out of memory */
+#define PTRACE_BTS_ALLOCATE_BUFFER 41
+
+/* Return the size of the bts buffer in number of bts records,
+   if successful; -1, otherwise.
+   ENXIO....ptrace bts not initialized or no buffer allocated */
+#define PTRACE_BTS_GET_BUFFER_SIZE 42
+
+/* Return the index of the next bts record to be written,
+   if successful; -1, otherwise.
+   After the first wrap-around, this is the start of the circular bts buffer.
+   ENXIO....ptrace bts not initialized or no buffer allocated */
+#define PTRACE_BTS_GET_INDEX 43
+
+/* Read the DATA'th bts record into a ptrace_bts_record buffer
+   provided in ADDR.
+   Return 0, if successful; -1, otherwise
+   ENXIO....ptrace bts not initialized or no buffer allocated
+   EINVAL...invalid index */
+#define PTRACE_BTS_READ_RECORD 44
+
+/* Configure last branch trace; the configuration is given as a bit-mask of
+   PTRACE_BTS_O_* options in DATA; parameter ADDR is ignored. */
+#define PTRACE_BTS_CONFIG 45
+
+/* Return the configuration as bit-mask

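PTRACE_BTS_CONFIG takes a bitmask of PTRACE_BTS_O_* options, PTRACE_BTS_STATUS returns one, and the context-switch hunk calls the BTS hooks only when TIF_BTS_TRACE or TIF_BTS_TRACE_TS is set. A minimal sketch of how the two sides might map onto each other follows, assuming PTRACE_BTS_O_TRACE_TASK corresponds to TIF_BTS_TRACE and PTRACE_BTS_O_TIMESTAMPS to TIF_BTS_TRACE_TS. The option values are taken from the later ptrace-abi.h hunk in this thread; the sketch_* names, the plain int fields and the -EINVAL for unknown bits are hypothetical, and the ENXIO (no buffer allocated) check documented there is left out.

```c
#include <errno.h>

/* Option bits as defined in the later ptrace-abi.h hunk of this thread. */
#define PTRACE_BTS_O_TRACE_TASK 0x1
#define PTRACE_BTS_O_TIMESTAMPS 0x2

/* Hypothetical stand-in for a child's two BTS thread flags. */
struct sketch_task {
	int bts_trace;		/* models TIF_BTS_TRACE */
	int bts_trace_ts;	/* models TIF_BTS_TRACE_TS */
};

/* PTRACE_BTS_CONFIG: translate the DATA bitmask into per-task flags. */
static int sketch_bts_config(struct sketch_task *child, unsigned long data)
{
	const unsigned long known =
		PTRACE_BTS_O_TRACE_TASK | PTRACE_BTS_O_TIMESTAMPS;

	/* Assumption: unknown option bits are rejected. */
	if (data & ~known)
		return -EINVAL;

	child->bts_trace = (data & PTRACE_BTS_O_TRACE_TASK) != 0;
	child->bts_trace_ts = (data & PTRACE_BTS_O_TIMESTAMPS) != 0;
	return 0;
}

/* PTRACE_BTS_STATUS: rebuild the bitmask from the per-task flags. */
static int sketch_bts_status(const struct sketch_task *child)
{
	int status = 0;

	if (child->bts_trace)
		status |= PTRACE_BTS_O_TRACE_TASK;
	if (child->bts_trace_ts)
		status |= PTRACE_BTS_O_TIMESTAMPS;
	return status;
}

/* Same test as the context-switch hunk: either flag needs the BTS hook. */
static int sketch_needs_switch_hook(const struct sketch_task *t)
{
	return t->bts_trace || t->bts_trace_ts;
}
```

Testing both flags in the switch path matters because a task configured for timestamps only still needs its arrive/depart notifications, even though no branches are recorded for it.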
[patch 2/2] man: man pages for ptrace BTS extensions

2007-11-30 Thread Markus Metzger
Changes to previous version(s):
- added PTRACE_BTS_MAX_BUFFER_SIZE command

Signed-off-by: Markus Metzger [EMAIL PROTECTED]
Signed-off-by: Suresh Siddha [EMAIL PROTECTED]
---

Index: man/man2/ptrace.2
===
--- man.orig/man2/ptrace.2  2007-11-22 20:25:21.%N +0100
+++ man/man2/ptrace.2   2007-11-30 15:30:44.%N +0100
@@ -40,7 +40,10 @@
 .\"	PTRACE_SETSIGINFO, PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
 .\"	(Thanks to Blaisorblade, Daniel Jacobowitz and others who helped.)
 .\"
-.TH PTRACE 2 2006-03-24 "Linux 2.6.16" "Linux Programmer's Manual"
+.\" Modified Nov 2007, Markus Metzger [EMAIL PROTECTED]
+.\" Added PTRACE_BTS_* commands
+.\"
+.TH PTRACE 2 2007-11 "Linux 2.6.16" "Linux Programmer's Manual"
 .SH NAME
 ptrace \- process trace
 .SH SYNOPSIS
@@ -312,6 +315,95 @@
 detached in this way regardless of which method was used to initiate
 tracing.
 (\fIaddr\fP is ignored.)
+.LP
+The following ptrace commands provide access to the hardware's last
+branch recording. They may not be available on all architectures.
+.LP
+Last branch recording stores an execution trace of the traced process
+in a circular buffer (called Branch Trace Store). For every
+(conditional) control flow change, the source and destination addresses
+are stored. On some architectures, control flow changes inside the
+kernel may be recorded as well; on later architectures, these are
+filtered out automatically.
+.LP
+In addition to branches, timestamps may optionally be recorded whenever
+the traced process arrives on or departs from a processor. This
+information can be used to obtain a qualitative execution order if more
+than one process is traced.
+.LP
+.nf
+enum ptrace_bts_qualifier {
+   PTRACE_BTS_INVALID = 0,
+   PTRACE_BTS_BRANCH,
+   PTRACE_BTS_TASK_ARRIVES,
+   PTRACE_BTS_TASK_DEPARTS
+};
+.sp
+struct ptrace_bts_record {
+   enum ptrace_bts_qualifier qualifier;
+   union {
+   /* PTRACE_BTS_BRANCH */
+   struct {
+   void *from_ip;
+   void *to_ip;
+   } lbr;
+   /* PTRACE_BTS_TASK_ARRIVES or
+  PTRACE_BTS_TASK_DEPARTS */
+   unsigned long long timestamp;
+   } variant;
+};
+.fi
+.LP
+.TP
+PTRACE_BTS_MAX_BUFFER_SIZE
+Returns the maximum BTS buffer size in number of BTS records.
+.TP
+PTRACE_BTS_ALLOCATE_BUFFER
+Allocate a new BTS buffer big enough to hold \fIdata\fP \fBstruct
+ptrace_bts_record\fP entries.
+\fIData\fP must be in the range from 0 to the value returned by
+PTRACE_BTS_MAX_BUFFER_SIZE.
+If a buffer is already allocated, that buffer is freed after the new
+buffer has been successfully allocated. The new buffer initially contains
+invalid entries.
+Typically, a buffer is allocated once when tracing starts. It is
+automatically deallocated when the parent detaches from the child.
+(\fIaddr\fP is ignored.)
+.TP
+PTRACE_BTS_GET_BUFFER_SIZE
+Returns the actual BTS buffer size in number of BTS records. The
+command fails if no buffer has been allocated.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+PTRACE_BTS_GET_INDEX
+Returns the index of the next entry to be (over)written by the tracing
+hardware. This can be used to determine the end of the current
+execution trace.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+PTRACE_BTS_READ_RECORD
+Reads the BTS record at index \fIdata\fP into the buffer at \fIaddr\fP.
+The caller is responsible for allocating at least
+\fBsizeof(struct ptrace_bts_record)\fP bytes at \fIaddr\fP.
+The index \fIdata\fP must be in the range from 0 to one less than the
+buffer size returned by PTRACE_BTS_GET_BUFFER_SIZE.
+.TP
+PTRACE_BTS_CONFIG
+Configures last branch recording for the child according to \fIdata\fP.
+(\fIaddr\fP is ignored.)
+\fIdata\fP is interpreted
+as a bitmask of options, which are specified by the following flags:
+.RS
+.TP
+PTRACE_BTS_O_TRACE_TASK
+Record control flow changes in the BTS buffer.
+.TP
+PTRACE_BTS_O_TIMESTAMPS
+Record timestamps when the child arrives on and departs from a processor.
+.RE
+.TP
+PTRACE_BTS_STATUS
+Returns the current BTS configuration as a bitmask of the above
+options.
+(\fIaddr\fP and \fIdata\fP are ignored.)
 .SH NOTES
 Although arguments to
 .BR ptrace ()
@@ -409,6 +501,16 @@
 .B ESRCH
 The specified process does not exist, or is not currently being traced
 by the caller, or is not stopped (for requests that require that).
+.TP
+.B EOPNOTSUPP
+The operation is not supported on this architecture.
+.TP
+.B ENOMEM
+Not enough memory to allocate the BTS buffer.
+.TP
+.B ENXIO
+An attempt was made to access BTS information without first allocating
+a BTS buffer.
 .SH CONFORMING TO
 SVr4, 4.3BSD
 .SH SEE ALSO
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

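Read together, PTRACE_BTS_GET_BUFFER_SIZE, PTRACE_BTS_GET_INDEX and PTRACE_BTS_READ_RECORD are enough to walk the trace oldest-first: start at the index of the next entry to be overwritten, step through size entries modulo size, and skip entries still marked PTRACE_BTS_INVALID (the man text says a new buffer starts out with invalid entries). The sketch below does this over an in-memory array, assuming a debugger would fetch each buf[...] element with one PTRACE_BTS_READ_RECORD call; sketch_bts_in_order is a hypothetical helper, not part of the proposed interface.

```c
#include <stddef.h>

/* Record layout as proposed in the ptrace.2 text of this series. */
enum ptrace_bts_qualifier {
	PTRACE_BTS_INVALID = 0,
	PTRACE_BTS_BRANCH,
	PTRACE_BTS_TASK_ARRIVES,
	PTRACE_BTS_TASK_DEPARTS
};

struct ptrace_bts_record {
	enum ptrace_bts_qualifier qualifier;
	union {
		struct {
			void *from_ip;
			void *to_ip;
		} lbr;
		unsigned long long timestamp;
	} variant;
};

/*
 * Copy the valid records of a circular BTS buffer into out, oldest first.
 * size and index stand for the values PTRACE_BTS_GET_BUFFER_SIZE and
 * PTRACE_BTS_GET_INDEX would return; out must hold size records.
 * Returns the number of records copied.
 */
static size_t sketch_bts_in_order(const struct ptrace_bts_record *buf,
				  size_t size, size_t index,
				  struct ptrace_bts_record *out)
{
	size_t n = 0;

	for (size_t i = 0; i < size; i++) {
		/* Start at the slot that will be overwritten next. */
		const struct ptrace_bts_record *r = &buf[(index + i) % size];

		/* Slots not written since allocation are still invalid. */
		if (r->qualifier == PTRACE_BTS_INVALID)
			continue;
		out[n++] = *r;
	}
	return n;
}
```

The same loop covers both cases: before the first wrap-around, the slots from the index to the end are still invalid and are skipped; afterwards, the index is the oldest record and every slot is returned.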

[patch 0/2] x86, ptrace: support for branch trace store(BTS)

2007-11-30 Thread Markus Metzger
This patch series adds support for Intel's last branch recording to
ptrace. It gives debuggers access to this hardware feature and allows
them to show an execution trace of the debugged application.

Last branch recording (see section 18.5 in the Intel 64 and IA-32
Architectures Software Developer's Manual) allows taking an execution
trace of the running application without instrumentation. When a branch
is executed, the hardware logs the source and destination addresses in a
cyclic buffer given to it by the OS.

This can be a great debugging aid. It shows you exactly how you got to
where you currently are, without requiring lots of single stepping and
rerunning.

This patch manages the various buffers, configures the trace
hardware, disentangles the trace, and provides a user interface via
ptrace. The high-level design is as follows:
- there is one optional trace buffer per thread_struct
- upon a context switch, the trace hardware is reconfigured to either
  disable tracing or to use the appropriate buffer for the new task
  - tracing induces ~20% overhead as branch records are sent out on
    the bus
  - the hardware collects the trace per processor. To disentangle the
    traces for different tasks, we use separate buffers and reconfigure
    the trace hardware.
- the low-level data layout is configured at cpu initialization time
  - different processors use different branch record formats

Opens:
- kernel interface

patch 1/2 contains the kernel changes
patch 2/2 contains changes to the ptrace man pages


regards,
markus.

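The key design point in the cover letter is that the hardware traces per processor, so the kernel keeps trace data per task by switching buffers on every context switch. A minimal simulation of that idea follows, assuming one fixed-size buffer per task and a per-CPU pointer that stands in for the hardware configuration; all sketch_* names are hypothetical, and real branch records, overhead and record formats are ignored.

```c
#include <stddef.h>

#define SKETCH_BUF_SIZE 8

/* Hypothetical task: one optional trace buffer, as with thread_struct. */
struct sketch_task {
	int traced;				/* tracing enabled for this task */
	unsigned long from[SKETCH_BUF_SIZE];	/* branch source addresses */
	size_t index;				/* next slot to write */
};

/* Hypothetical per-CPU trace configuration. */
struct sketch_cpu {
	struct sketch_task *target;	/* buffer in use; NULL: tracing off */
};

/* Context switch: point the trace "hardware" at next's buffer, or disable it. */
static void sketch_switch_to(struct sketch_cpu *cpu, struct sketch_task *next)
{
	cpu->target = next->traced ? next : NULL;
}

/* The "hardware" logs into whichever buffer is configured on this CPU. */
static void sketch_branch(struct sketch_cpu *cpu, unsigned long from)
{
	struct sketch_task *t = cpu->target;

	if (!t)
		return;
	t->from[t->index] = from;
	t->index = (t->index + 1) % SKETCH_BUF_SIZE;
}
```

Although the CPU executes the branches of A, B, an untraced task and A again, each traced task ends up with only its own branches in its buffer, and nothing is recorded while the untraced task runs.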

[patch 2/2] man: man pages for ptrace BTS extensions

2007-11-30 Thread Markus Metzger
resend using a different mailer


Describe extensions to ptrace interface for branch trace store

Signed-off-by: Markus Metzger [EMAIL PROTECTED]
Signed-off-by: Suresh Siddha [EMAIL PROTECTED]
---

Index: man/man2/ptrace.2
===
--- man.orig/man2/ptrace.2  2007-11-22 20:25:21.%N +0100
+++ man/man2/ptrace.2   2007-11-22 20:25:33.%N +0100
@@ -40,7 +40,10 @@
 .\"	PTRACE_SETSIGINFO, PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP
 .\"	(Thanks to Blaisorblade, Daniel Jacobowitz and others who helped.)
 .\"
-.TH PTRACE 2 2006-03-24 "Linux 2.6.16" "Linux Programmer's Manual"
+.\" Modified Nov 2007, Markus Metzger [EMAIL PROTECTED]
+.\" Added PTRACE_BTS_* commands
+.\"
+.TH PTRACE 2 2007-11 "Linux 2.6.16" "Linux Programmer's Manual"
 .SH NAME
 ptrace \- process trace
 .SH SYNOPSIS
@@ -312,6 +315,96 @@
 detached in this way regardless of which method was used to initiate
 tracing.
 (\fIaddr\fP is ignored.)
+.LP
+The following ptrace commands provide access to the hardware's last
+branch recording. They may not be available on all architectures.
+.LP
+Last branch recording stores an execution trace of the traced process
+in a circular buffer (called Branch Trace Store). For every
+(conditional) control flow change, the source and destination addresses
+are stored. On some architectures, control flow changes inside the
+kernel may be recorded as well; on later architectures, these are
+filtered out automatically.
+.LP
+In addition to branches, timestamps may optionally be recorded whenever
+the traced process arrives on or departs from a processor. This
+information can be used to obtain a qualitative execution order if more
+than one process is traced.
+.LP
+.nf
+enum ptrace_bts_qualifier {
+   PTRACE_BTS_INVALID = 0,
+   PTRACE_BTS_BRANCH,
+   PTRACE_BTS_TASK_ARRIVES,
+   PTRACE_BTS_TASK_DEPARTS
+};
+.sp
+struct ptrace_bts_record {
+   enum ptrace_bts_qualifier qualifier;
+   union {
+   /* PTRACE_BTS_BRANCH */
+   struct {
+   void *from_ip;
+   void *to_ip;
+   } lbr;
+   /* PTRACE_BTS_TASK_ARRIVES or
+  PTRACE_BTS_TASK_DEPARTS */
+   unsigned long long timestamp;
+   } variant;
+};
+.fi
+.LP
+.TP
+PTRACE_BTS_MAX_BUFFER
+This is not a ptrace command but a macro that defines the maximum size
+of the BTS buffer in number of BTS records.
+.TP
+PTRACE_BTS_ALLOCATE_BUFFER
+Allocate a new BTS buffer big enough to hold \fIdata\fP \fBstruct
+ptrace_bts_record\fP entries.
+\fIData\fP must be in the range of 0..PTRACE_BTS_MAX_BUFFER.
+If a buffer is already allocated, that buffer is freed after the new
+buffer has been successfully allocated. The new buffer initially contains
+invalid entries.
+Typically, a buffer is allocated once when tracing starts. It is
+automatically deallocated when the parent detaches from the child.
+(\fIaddr\fP is ignored.)
+.TP
+PTRACE_BTS_GET_BUFFER_SIZE
+Returns the actual BTS buffer size in number of BTS records. The
+command fails if no buffer has been allocated.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+PTRACE_BTS_GET_INDEX
+Returns the index of the next entry to be (over)written by the tracing
+hardware. This can be used to determine the end of the current
+execution trace.
+(\fIaddr\fP and \fIdata\fP are ignored.)
+.TP
+PTRACE_BTS_READ_RECORD
+Reads the BTS record at index \fIdata\fP into the buffer at \fIaddr\fP.
+The caller is responsible for allocating at least
+\fBsizeof(struct ptrace_bts_record)\fP bytes at \fIaddr\fP.
+The index \fIdata\fP must be in the range from 0 to one less than the
+buffer size returned by PTRACE_BTS_GET_BUFFER_SIZE.
+.TP
+PTRACE_BTS_CONFIG
+Configures last branch recording for the child according to \fIdata\fP.
+(\fIaddr\fP is ignored.)
+\fIdata\fP is interpreted
+as a bitmask of options, which are specified by the following flags:
+.RS
+.TP
+PTRACE_BTS_O_TRACE_TASK
+Record control flow changes in the BTS buffer.
+.TP
+PTRACE_BTS_O_TIMESTAMPS
+Record timestamps when the child arrives on and departs from a processor.
+.RE
+.TP
+PTRACE_BTS_STATUS
+Returns the current BTS configuration as a bitmask of the above
+options.
+(\fIaddr\fP and \fIdata\fP are ignored.)
 .SH NOTES
 Although arguments to
 .BR ptrace ()
@@ -409,6 +502,16 @@
 .B ESRCH
 The specified process does not exist, or is not currently being traced
 by the caller, or is not stopped (for requests that require that).
+.TP
+.B EOPNOTSUPP
+The operation is not supported on this architecture.
+.TP
+.B ENOMEM
+Not enough memory to allocate the BTS buffer.
+.TP
+.B ENXIO
+An attempt was made to access BTS information without first allocating
+a BTS buffer.
 .SH CONFORMING TO
 SVr4, 4.3BSD
 .SH SEE ALSO

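The man page notes that arrival and departure timestamps give a qualitative execution order when several processes are traced. One way to obtain that order, sketched below under the assumption that all timestamps come from a clock that is comparable across the traced tasks, is to collect each task's PTRACE_BTS_TASK_ARRIVES records and sort them by timestamp. The sketch_* types and helpers are hypothetical; only the record layout comes from the man text.

```c
#include <stddef.h>
#include <stdlib.h>

/* Record layout as proposed in the ptrace.2 text of this series. */
enum ptrace_bts_qualifier {
	PTRACE_BTS_INVALID = 0,
	PTRACE_BTS_BRANCH,
	PTRACE_BTS_TASK_ARRIVES,
	PTRACE_BTS_TASK_DEPARTS
};

struct ptrace_bts_record {
	enum ptrace_bts_qualifier qualifier;
	union {
		struct {
			void *from_ip;
			void *to_ip;
		} lbr;
		unsigned long long timestamp;
	} variant;
};

/* Hypothetical: one time slice of a traced task, keyed by its arrival. */
struct sketch_slice {
	int task;			/* caller-chosen task identifier */
	unsigned long long arrived;	/* PTRACE_BTS_TASK_ARRIVES timestamp */
};

static int sketch_cmp_slice(const void *a, const void *b)
{
	const struct sketch_slice *x = a, *y = b;

	return (x->arrived > y->arrived) - (x->arrived < y->arrived);
}

/*
 * Append one slice per arrival record in recs[0..n) to out, starting at
 * position count; the caller must size out for all arrivals.
 * Returns the new count.
 */
static size_t sketch_collect_arrivals(const struct ptrace_bts_record *recs,
				      size_t n, int task,
				      struct sketch_slice *out, size_t count)
{
	for (size_t i = 0; i < n; i++) {
		if (recs[i].qualifier != PTRACE_BTS_TASK_ARRIVES)
			continue;
		out[count].task = task;
		out[count].arrived = recs[i].variant.timestamp;
		count++;
	}
	return count;
}

/* Order all slices by arrival time: a qualitative interleaving of the tasks. */
static void sketch_order_slices(struct sketch_slice *slices, size_t count)
{
	qsort(slices, count, sizeof(*slices), sketch_cmp_slice);
}
```

With task 1 arriving at 10 and 40 and task 2 arriving at 25, the sorted slices read task 1, task 2, task 1. The order is only qualitative: branches inside a slice carry no timestamps of their own.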

[patch 1/2] x86, ptrace: support for branch trace store(BTS)

2007-11-30 Thread Markus Metzger
Changes to previous version(s):
- moved task arrives/departs notifications to __switch_to_xtra()
- added _TIF_BTS_TRACE and _TIF_BTS_TRACE_TS to _TIF_WORK_CTXSW_*
- split _TIF_WORK_CTXSW into ~_PREV and ~_NEXT for x86_64
- ptrace_bts_init_intel() function called from init_intel()
- removed PTRACE_BTS_INIT ptrace command
- cache DEBUGCTRL MSR
- replace struct declarations and operations struct with
  configuration struct defining offset/size pairs and
  generic operations
- added PTRACE_BTS_MAX_BUFFER_SIZE command


Signed-off-by: Markus Metzger [EMAIL PROTECTED]
Signed-off-by: Suresh Siddha [EMAIL PROTECTED]
---

Index: linux-2.6-x86/arch/x86/kernel/process_32.c
===
--- linux-2.6-x86.orig/arch/x86/kernel/process_32.c 2007-11-30 14:03:41.%N +0100
+++ linux-2.6-x86/arch/x86/kernel/process_32.c  2007-11-30 14:03:50.%N +0100
@@ -622,6 +622,19 @@
}
 #endif

+   /*
+    * Last branch recording reconfiguration of trace hardware and
+    * disentangling of trace data per task.
+    */
+   if (test_tsk_thread_flag(prev_p, TIF_BTS_TRACE) ||
+   test_tsk_thread_flag(prev_p, TIF_BTS_TRACE_TS))
+   ptrace_bts_task_departs(prev_p);
+
+   if (test_tsk_thread_flag(next_p, TIF_BTS_TRACE) ||
+   test_tsk_thread_flag(next_p, TIF_BTS_TRACE_TS))
+   ptrace_bts_task_arrives(next_p);
+
+
if (!test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
/*
 * Disable the bitmap via an invalid offset. We still cache
Index: linux-2.6-x86/include/asm-x86/ptrace-abi.h
===
--- linux-2.6-x86.orig/include/asm-x86/ptrace-abi.h 2007-11-30 14:03:41.%N +0100
+++ linux-2.6-x86/include/asm-x86/ptrace-abi.h  2007-11-30 15:07:08.%N +0100
@@ -80,4 +80,56 @@

 #define PTRACE_SINGLEBLOCK 33  /* resume execution until next branch */

+/* Return the maximum BTS buffer size in number of records,
+   if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing */
+#define PTRACE_BTS_MAX_BUFFER_SIZE 40
+
+/* Allocate new bts buffer (free old one, if exists) of size DATA bts records;
+   parameter ADDR is ignored.
+   Return 0, if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing
+   EINVAL...invalid size in records
+   ENOMEM...out of memory */
+#define PTRACE_BTS_ALLOCATE_BUFFER 41
+
+/* Return the size of the bts buffer in number of bts records,
+   if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIO....no buffer allocated */
+#define PTRACE_BTS_GET_BUFFER_SIZE 42
+
+/* Return the index of the next bts record to be written,
+   if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIO....no buffer allocated
+   After the first wrap-around, this is the start of the circular bts
+   buffer. */
+#define PTRACE_BTS_GET_INDEX 43
+
+/* Read the DATA'th bts record into a ptrace_bts_record buffer
+   provided in ADDR.
+   Return 0, if successful; -1, otherwise
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIO....no buffer allocated
+   EINVAL...invalid index */
+#define PTRACE_BTS_READ_RECORD 44
+
+/* Configure last branch trace; the configuration is given as a bit-mask of
+   PTRACE_BTS_O_* options in DATA; parameter ADDR is ignored.
+   Return 0, if successful; -1, otherwise
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIO....no buffer allocated */
+#define PTRACE_BTS_CONFIG 45
+
+/* Return the configuration as bit-mask of PTRACE_BTS_O_* options
+   if successful; -1, otherwise.
+   EOPNOTSUPP...processor does not support bts tracing
+   ENXIO....no buffer allocated */
+#define PTRACE_BTS_STATUS 46
+
+/* Trace configuration options */
+/* Collect last branch trace */
+#define PTRACE_BTS_O_TRACE_TASK 0x1
+/* Take timestamps when the task arrives and departs */
+#define PTRACE_BTS_O_TIMESTAMPS 0x2
+
 #endif
Index: linux-2.6-x86/include/asm-x86/ptrace.h
===
--- linux-2.6-x86.orig/include/asm-x86/ptrace.h 2007-11-30 14:03:41.%N +0100
+++ linux-2.6-x86/include/asm-x86/ptrace.h  2007-11-30 14:03:50.%N +0100
@@ -4,8 +4,48 @@
 #include <linux/compiler.h>	/* For __user */
 #include <asm/ptrace-abi.h>

+
 #ifndef __ASSEMBLY__

+/* a branch trace record entry
+ *
+ * In order to unify the interface between various processor versions,
+ * we use the below data structure for all processors.
+ */
+enum ptrace_bts_qualifier {
+   PTRACE_BTS_INVALID = 0,
+   PTRACE_BTS_BRANCH,
+   PTRACE_BTS_TASK_ARRIVES,
+   PTRACE_BTS_TASK_DEPARTS
+};
+
+struct ptrace_bts_record {
+   enum ptrace_bts_qualifier qualifier;
+   union {
+   /* PTRACE_BTS_BRANCH */
+   struct {
+   long from_ip;
+   long to_ip