Re: [patch-early-RFC 00/10] LTTng architecture dependent instrumentation

2007-12-09 Thread Mathieu Desnoyers
* Mathieu Desnoyers ([EMAIL PROTECTED]) wrote:
> * Ingo Molnar ([EMAIL PROTECTED]) wrote:
> > 
> > hi Mathieu,
> > 
> > * Mathieu Desnoyers <[EMAIL PROTECTED]> wrote:
> > 
> > > Hi,
> > > 
> > > Here is the architecture dependent instrumentation for LTTng. [...]
> > 
> > A fundamental observation about markers, and i raised this point many 
> > many months ago already, so it might sound repetitive, but i'm unsure 
> > whether it's addressed. Documentation/markers.txt still says:
> > 
> > | * Purpose of markers
> > |
> > | A marker placed in code provides a hook to call a function (probe) 
> > | that you can provide at runtime. A marker can be "on" (a probe is 
> > | connected to it) or "off" (no probe is attached). When a marker is 
> > | "off" it has no effect, except for adding a tiny time penalty 
> > | (checking a condition for a branch) and space penalty (adding a few 
> > | bytes for the function call at the end of the instrumented function 
> > | and adds a data structure in a separate section).
> > 
> > could you please eliminate the checking of the flag, and insert a pure 
> > NOP sequence by default (no extra branches), which is then patched in 
> > with a function call instruction sequence, when the trace point is 
> > turned on? (on architectures that have code patching infrastructure - 
> > such as x86)
> > 
> 
> Hi Ingo,
> 
[...] 
> * No marker at all
> 
> 240300 cycles total
> 12.02 cycles per loop
> 
[...]
> * With my marker implementation (load immediate 0, branch predicted) :
> 
> between 200355 and 200580 cycles total (avg 200400 cycles)
> 10.02 cycles per loop (yes, adding the marker increases performance)
> 
[...]
> * With NOPs :
> 
> avg around 410000 cycles total
> 20.5 cycles/loop (slowdown of 2)
> 
>
[...]
> Therefore, because of the cost of stack setup, the load immediate and
> conditional branch seem to be _much_ faster than the NOP alternative.
> 

I wanted to know what clever things the dtrace guys have done, so I just
dug into the dtrace code today, and it isn't pretty for x86.

For the kernel sdt (static dtrace), the closest match to markers, they :

1 - Use the linker to turn the calls to an undefined symbol into 
"0x90 0x90 0x90 0x90 0x90" (5 nops)
(note that they still suffer from the stack setup cost even when
disabled. Therefore, performance-wise, I think the markers are already
faster)

But let's dig deeper..

2 - When what they call a "provider" is activated, the first byte of the
"instruction" (actually, it would be the second NOP) is changed to a 0xf0
lock prefix :

"0x90 0xf0 0x90 0x90 0x90"

3 - When this site is hit, the 0xf0 0x90 instruction will produce an
illegal op fault. In the handler, they emulate a trap by incrementing
EIP by the size of the illegal op. They look up the faulting EIP in a hash
table to know which site caused it and then they call the dtrace_probe
function to call the consumers from there.
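
For illustration only, here is a small user-space C model of steps 2 and 3
as described above. It is not dtrace code: sdt_enable(), sdt_invalid_op(),
the table size and the hash function are all made up for this sketch, and
last_fired merely stands in for the call to dtrace_probe.

```c
#include <stdint.h>

#define SITE_LEN    5    /* five single-byte NOPs left by the linker */
#define ILLEGAL_LEN 2    /* 0xf0 0x90: lock prefix followed by a NOP */
#define TABLE_SIZE  64   /* hypothetical hash table size */

struct sdt_site {
    uint32_t fault_eip;  /* address of the 0xf0 byte */
    int probe_id;
    int used;
};

static struct sdt_site site_table[TABLE_SIZE];
static int last_fired = -1;  /* stand-in for calling dtrace_probe() */

/* Multiplicative hash keeping the top 6 bits, so the result is 0..63. */
static unsigned hash_eip(uint32_t eip)
{
    return (uint32_t)(eip * 2654435761u) >> 26;
}

/* Step 2: record the site, then turn the second NOP into a lock prefix.
 * Linear probing; assumes the table never fills up. */
static void sdt_enable(uint8_t *site, uint32_t site_addr, int probe_id)
{
    uint32_t fault_eip = site_addr + 1;
    unsigned i = hash_eip(fault_eip);

    while (site_table[i].used)
        i = (i + 1) % TABLE_SIZE;
    site_table[i] = (struct sdt_site){ fault_eip, probe_id, 1 };
    site[1] = 0xf0;
}

/* Step 3: invalid-opcode handler. Returns 1 if the fault came from a
 * known site, in which case the probe fires and EIP skips the illegal op. */
static int sdt_invalid_op(uint32_t *eip)
{
    unsigned i = hash_eip(*eip);
    unsigned n;

    for (n = 0; n < TABLE_SIZE && site_table[i].used; n++) {
        if (site_table[i].fault_eip == *eip) {
            last_fired = site_table[i].probe_id;
            *eip += ILLEGAL_LEN;
            return 1;
        }
        i = (i + 1) % TABLE_SIZE;
    }
    return 0;
}
```

Even in this toy form, the fault entry, the hash and the table walk all sit
on the path of every enabled probe hit.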

So, if I have not missed anything, they will have the performance cost
of a fault and a hash table lookup on the critical path, which is kind
of dumb. Just the fault adds a few thousand cycles (assuming it will
perform like an int3 breakpoint).

Compared to this, my approach of load immediate + branch when disabled
and the added function call when enabled is _much_ more lightweight.

I guess the dtrace approach is good enough on sparc (except for stack
setup cost when disabled), where they patch the 4-byte nop into a 4-byte
function call and manage to get good performance, but the hack they
are doing on x86 seems to be just too slow.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch-early-RFC 00/10] LTTng architecture dependent instrumentation

2007-12-08 Thread Mathieu Desnoyers
* Ingo Molnar ([EMAIL PROTECTED]) wrote:
> 
> hi Mathieu,
> 
> * Mathieu Desnoyers <[EMAIL PROTECTED]> wrote:
> 
> > Hi,
> > 
> > Here is the architecture dependent instrumentation for LTTng. [...]
> 
> A fundamental observation about markers, and i raised this point many 
> many months ago already, so it might sound repetitive, but i'm unsure 
> whether it's addressed. Documentation/markers.txt still says:
> 
> | * Purpose of markers
> |
> | A marker placed in code provides a hook to call a function (probe) 
> | that you can provide at runtime. A marker can be "on" (a probe is 
> | connected to it) or "off" (no probe is attached). When a marker is 
> | "off" it has no effect, except for adding a tiny time penalty 
> | (checking a condition for a branch) and space penalty (adding a few 
> | bytes for the function call at the end of the instrumented function 
> | and adds a data structure in a separate section).
> 
> could you please eliminate the checking of the flag, and insert a pure 
> NOP sequence by default (no extra branches), which is then patched in 
> with a function call instruction sequence, when the trace point is 
> turned on? (on architectures that have code patching infrastructure - 
> such as x86)
> 

Hi Ingo,

Here are the results of a test I made, hacking a binary to put nops
instead of a function call.

The test runs 20000 loops, with interrupts disabled, calling a function
that contains a marker. It is performed on an x86_32, Pentium 4 3GHz.

__my_trace_mark(0, kernel_debug_test, NULL, "%d %d %ld %ld", 2, current->pid,
  arg, arg2);

The numbers here include the function call (present in all cases), the
counter increment/tests and the marker.

* No marker at all

240300 cycles total
12.02 cycles per loop

void test(unsigned long arg, unsigned long arg2)
{
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
asm volatile ("");
}
   3:   5d                      pop    %ebp
   4:   c3                      ret


* With my marker implementation (load immediate 0, branch predicted) :

between 200355 and 200580 cycles total (avg 200400 cycles)
10.02 cycles per loop (yes, adding the marker increases performance)


void test(unsigned long arg, unsigned long arg2)
{
  4d:   55                      push   %ebp
  4e:   89 e5                   mov    %esp,%ebp
  50:   83 ec 1c                sub    $0x1c,%esp
  53:   89 c1                   mov    %eax,%ecx
__my_trace_mark(0, kernel_debug_test, NULL, "%d %d %ld %ld", 2, current->pid,
  arg, arg2);
  55:   b0 00                   mov    $0x0,%al
  57:   84 c0                   test   %al,%al
  59:   75 02                   jne    5d <test+0x10>
}
  5b:   c9                      leave
  5c:   c3                      ret


* With NOPs :

avg around 410000 cycles total
20.5 cycles/loop (slowdown of 2)

void test(unsigned long arg, unsigned long arg2)
{
  4d:   55                      push   %ebp
  4e:   89 e5                   mov    %esp,%ebp
  50:   83 ec 1c                sub    $0x1c,%esp
struct task_struct;

DECLARE_PER_CPU(struct task_struct *, current_task);
static __always_inline struct task_struct *get_current(void)
{
return x86_read_percpu(current_task);
  53:   64 8b 0d 00 00 00 00    mov    %fs:0x0,%ecx
__my_trace_mark(0, kernel_debug_test, NULL, "%d %d %ld %ld", 2, current->pid,
  arg, arg2);
  5a:   89 54 24 18             mov    %edx,0x18(%esp)
  5e:   89 44 24 14             mov    %eax,0x14(%esp)
  62:   8b 81 c4 00 00 00       mov    0xc4(%ecx),%eax
  68:   89 44 24 10             mov    %eax,0x10(%esp)
  6c:   c7 44 24 0c 02 00 00    movl   $0x2,0xc(%esp)
  73:   00
  74:   c7 44 24 08 0e 00 00    movl   $0xe,0x8(%esp)
  7b:   00
  7c:   c7 44 24 04 00 00 00    movl   $0x0,0x4(%esp)
  83:   00
  84:   c7 04 24 00 00 00 00    movl   $0x0,(%esp)
  8b:   90                      nop
  8c:   90                      nop
  8d:   90                      nop
  8e:   90                      nop
  8f:   90                      nop
}
  90:   c9                      leave
  91:   c3                      ret


Therefore, because of the cost of stack setup, the load immediate and
conditional branch seem to be _much_ faster than the NOP alternative.
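
The difference can be sketched in plain user-space C. This is only an
analogue, under the assumption that a volatile flag may stand in for the
immediate value that the real marker patches in the instruction stream;
my_trace_mark(), probe(), current_pid() and the counters are hypothetical
names for this example, not the kernel macro.

```c
#include <stdarg.h>
#include <stdio.h>

static volatile unsigned char marker_enabled; /* stand-in for the immediate */
static int probe_calls;  /* how many times the probe ran */
static int arg_evals;    /* how many times an argument was computed */

/* The probe: a single variadic function, only reached when enabled. */
static void probe(const char *fmt, ...)
{
    char buf[64];
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);
    probe_calls++;
}

/* Hypothetical stand-in for the current->pid dereference. */
static int current_pid(void)
{
    arg_evals++;
    return 42;
}

/* Argument evaluation and stack setup live inside the unlikely branch,
 * so a disabled marker costs a load, a test and a not-taken jump. */
#define my_trace_mark(fmt, ...)                       \
    do {                                              \
        if (__builtin_expect(marker_enabled, 0))      \
            probe(fmt, __VA_ARGS__);                  \
    } while (0)

static void test_fn(unsigned long arg, unsigned long arg2)
{
    my_trace_mark("%d %d %lu %lu", 2, current_pid(), arg, arg2);
}
```

With marker_enabled left at 0, test_fn() never calls current_pid(), whereas
a call site whose call instruction is merely replaced by NOPs still performs
all of the argument setup before reaching them.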

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


Re: [patch-early-RFC 00/10] LTTng architecture dependent instrumentation

2007-12-06 Thread Mathieu Desnoyers
* Ingo Molnar ([EMAIL PROTECTED]) wrote:
> 
> hi Mathieu,
> 
> * Mathieu Desnoyers <[EMAIL PROTECTED]> wrote:
> 
> > Hi,
> > 
> > Here is the architecture dependent instrumentation for LTTng. [...]
> 
> A fundamental observation about markers, and i raised this point many 
> many months ago already, so it might sound repetitive, but i'm unsure 
> whether it's addressed. Documentation/markers.txt still says:
> 
> | * Purpose of markers
> |
> | A marker placed in code provides a hook to call a function (probe) 
> | that you can provide at runtime. A marker can be "on" (a probe is 
> | connected to it) or "off" (no probe is attached). When a marker is 
> | "off" it has no effect, except for adding a tiny time penalty 
> | (checking a condition for a branch) and space penalty (adding a few 
> | bytes for the function call at the end of the instrumented function 
> | and adds a data structure in a separate section).
> 
> could you please eliminate the checking of the flag, and insert a pure 
> NOP sequence by default (no extra branches), which is then patched in 
> with a function call instruction sequence, when the trace point is 
> turned on? (on architectures that have code patching infrastructure - 
> such as x86)
> 
>   Ingo

Hi Ingo,

Do you propose that we NOP out the entire function call stack setup and
other related inline functions and pointer dereferences that would be
needed by the call ?

I don't see how we can do this on optimized code without having
side-effects. So there, I think markers could have even fewer side-effects
than the dtrace NOPs, because I can jump over all the function call
preparation, which they can't. And branch prediction logic is cheap
nowadays, especially since this is a likely branch. However,
benchmarking the "real" impact of this becomes kind of crazy, because it
may depend on workloads and memory pressure, so it leaves us mostly
with microbenchmarks.

I also tried to use an unconditional jump to skip the function call, but
the problem here, as was discussed about a year ago, is gcc : it does not
allow jumping between two different inline assembly statements. And since
I don't want to create a number of macros equivalent to the powerset
of the number of arguments/types we want to support (this is why I use
var args), and I don't see how we could declare var args in inline
assembly portably, I guess the best solution left was what I have done :
loading an immediate value and letting gcc handle the conditional jump.
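
To show why var args avoid the macro explosion, here is a minimal
user-space sketch: one variadic probe handles any number and type of
arguments, driven only by the format string. probe_printf() and last_event
are hypothetical names for this example and not the marker API.

```c
#include <stdarg.h>
#include <stdio.h>

static char last_event[128];  /* last formatted event, for inspection */

/* One probe for every call site: the format string describes the
 * arguments, so no per-arity or per-type macro is needed. */
static void probe_printf(const char *name, const char *fmt, ...)
{
    va_list ap;
    int n = snprintf(last_event, sizeof(last_event), "%s: ", name);

    if (n < 0 || (size_t)n >= sizeof(last_event))
        return;  /* name alone does not fit, nothing more to format */

    va_start(ap, fmt);
    vsnprintf(last_event + n, sizeof(last_event) - (size_t)n, fmt, ap);
    va_end(ap);
}
```

A call such as probe_printf("kernel_debug_test", "%d %d %ld %ld", 2, 42,
7L, 9L) and a call with a completely different argument list both go
through the same function.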

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


Re: [patch-early-RFC 00/10] LTTng architecture dependent instrumentation

2007-12-06 Thread Ingo Molnar

hi Mathieu,

* Mathieu Desnoyers <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> Here is the architecture dependent instrumentation for LTTng. [...]

A fundamental observation about markers, and i raised this point many 
many months ago already, so it might sound repetitive, but i'm unsure 
whether it's addressed. Documentation/markers.txt still says:

| * Purpose of markers
|
| A marker placed in code provides a hook to call a function (probe) 
| that you can provide at runtime. A marker can be "on" (a probe is 
| connected to it) or "off" (no probe is attached). When a marker is 
| "off" it has no effect, except for adding a tiny time penalty 
| (checking a condition for a branch) and space penalty (adding a few 
| bytes for the function call at the end of the instrumented function 
| and adds a data structure in a separate section).

could you please eliminate the checking of the flag, and insert a pure 
NOP sequence by default (no extra branches), which is then patched in 
with a function call instruction sequence, when the trace point is 
turned on? (on architectures that have code patching infrastructure - 
such as x86)
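
To make the request concrete, here is a minimal sketch of the byte-level
patching on 32-bit x86: a 5-byte site holds single-byte NOPs (0x90) while
the trace point is off and a "call rel32" (opcode 0xe8) once it is turned
on. patch_nops(), patch_call() and the addresses are hypothetical; the
sketch only shows the encoding and ignores how a live kernel would modify
code safely across CPUs.

```c
#include <stdint.h>
#include <string.h>

#define SITE_LEN 5  /* size of the NOP padding and of "call rel32" */

/* Disabled state: five single-byte NOPs. */
static void patch_nops(uint8_t site[SITE_LEN])
{
    memset(site, 0x90, SITE_LEN);
}

/* Enabled state: "call rel32". The displacement is relative to the
 * address of the instruction that follows the call. */
static void patch_call(uint8_t site[SITE_LEN], uint32_t site_addr,
                       uint32_t target_addr)
{
    uint32_t rel = target_addr - (site_addr + SITE_LEN);

    site[0] = 0xe8;
    site[1] = (uint8_t)(rel & 0xff);          /* little-endian rel32 */
    site[2] = (uint8_t)((rel >> 8) & 0xff);
    site[3] = (uint8_t)((rel >> 16) & 0xff);
    site[4] = (uint8_t)((rel >> 24) & 0xff);
}
```

For a site at 0x1000 and a probe at 0x2000 the displacement is
0x2000 - 0x1005 = 0xffb, so the site becomes e8 fb 0f 00 00.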

Ingo


[patch-early-RFC 00/10] LTTng architecture dependent instrumentation

2007-12-05 Thread Mathieu Desnoyers
Hi,

Here is the architecture dependent instrumentation for LTTng. Not all
architectures are supported, and some of them have missing instrumentation
points.

The most complete should be :
x86_32, x86_64, powerpc, mips and arm.

It depends on the kernel trace thread flag patchset.

It instruments :
- traps/faults
- system calls
- kernel thread creation
- IPC calls

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

