Re: [PATCH tip/master v3] kprobes: extable: Identify kprobes' insn-slots as kernel text area

2017-01-08 Thread Masami Hiramatsu
Hello,

On Sun, 8 Jan 2017 13:22:42 +0900
Masami Hiramatsu  wrote:
> > > Would you have any good benchmark to measure it?
> > 
> > Not trivially so; what I did was cobble together a debugfs file that
> > measures the average of the PMI time in perf_sample_event_took(), and a
> > module that has a 10 deep callchain around a while(1) loop. Then perf
> > record with callchains for a few seconds.
> > 
> > Generating the callchain does the unwinder thing and ends up calling
> > is_kernel_address() lots.
> > 
> > The case I worked on was 0 modules vs 100+ modules in a distro build,
> > which was fairly obviously painful back then, since
> > is_module_text_address() used a linear lookup.
> > 
> > I'm not sure I still have all those bits, but I can dig around a bit if
> > you're interested.
> 
> Hmm, I tried a similar thing (a test module which has a loop with a
> 10-deep recursive call and saves a stack trace) on KVM, but got very
> unstable results.
> Maybe it needs to run on a bare-metal machine.

On a bare-metal machine, I used the attached module to measure the
actual performance degradation.
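
The attachment itself is not preserved in this archive. For reference, here
is a minimal sketch of what such a module could look like, reconstructed
from the dmesg lines below; the constants and function names are
assumptions, not the actual attachment:

/*
 * unwind_bench.c: hypothetical reconstruction of the benchmark module.
 * Times NR_LOOPS stack-trace captures taken from the bottom of a
 * 10-deep recursive call chain.
 */
#include <linux/module.h>
#include <linux/stacktrace.h>
#include <linux/timex.h>

#define NR_LOOPS	1000
#define NR_ENTRIES	32
#define CALL_DEPTH	10

static unsigned long entries[NR_ENTRIES];

static noinline void bench_leaf(void)
{
	struct stack_trace trace = {
		.max_entries	= NR_ENTRIES,
		.entries	= entries,
	};

	/* Walks the stack; ends up calling kernel_text_address() a lot. */
	save_stack_trace(&trace);
}

/* The "+ 1" keeps gcc from turning the recursion into a tail call. */
static noinline int bench_recurse(int depth)
{
	if (depth)
		return bench_recurse(depth - 1) + 1;
	bench_leaf();
	return 0;
}

static int __init unwind_bench_init(void)
{
	cycles_t start, end;
	int i;

	pr_info("Start benchmarking.\n");
	start = get_cycles();
	for (i = 0; i < NR_LOOPS; i++)
		bench_recurse(CALL_DEPTH);
	end = get_cycles();
	pr_info("End benchmarking. Elapsed: %llu cycles, loop count:%d\n",
		(unsigned long long)(end - start), NR_LOOPS);

	return 0;
}

static void __exit unwind_bench_exit(void)
{
}

module_init(unwind_bench_init);
module_exit(unwind_bench_exit);
MODULE_LICENSE("GPL");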

Benchmarks with no kprobes:
$ lsmod | wc -l
123

$ for i in {0..10}; do sudo insmod ./unwind_bench.ko && sudo rmmod unwind_bench; done

[  245.639150] Start benchmarking.
[  245.639353] End benchmarking. Elapsed: 620579 cycles, loop count:1000
[  277.481621] Start benchmarking.
[  277.481821] End benchmarking. Elapsed: 609523 cycles, loop count:1000
[  302.210924] Start benchmarking.
[  302.211077] End benchmarking. Elapsed: 441272 cycles, loop count:1000
[  302.242931] Start benchmarking.
[  302.243070] End benchmarking. Elapsed: 401676 cycles, loop count:1000
[  302.270682] Start benchmarking.
[  302.270762] End benchmarking. Elapsed: 242776 cycles, loop count:1000
[  302.294656] Start benchmarking.
[  302.294737] End benchmarking. Elapsed: 247352 cycles, loop count:1000
[  302.318517] Start benchmarking.
[  302.318600] End benchmarking. Elapsed: 255048 cycles, loop count:1000
[  302.344519] Start benchmarking.
[  302.344599] End benchmarking. Elapsed: 242392 cycles, loop count:1000
[  302.369450] Start benchmarking.
[  302.369530] End benchmarking. Elapsed: 242628 cycles, loop count:1000
[  302.399517] Start benchmarking.
[  302.399605] End benchmarking. Elapsed: 268736 cycles, loop count:1000
[  302.423508] Start benchmarking.
[  302.423587] End benchmarking. Elapsed: 242068 cycles, loop count:1000
[  302.451282] Start benchmarking.
[  302.451361] End benchmarking. Elapsed: 242272 cycles, loop count:1000
[  302.480656] Start benchmarking.
[  302.480736] End benchmarking. Elapsed: 243552 cycles, loop count:1000

Benchmarks with 106 kprobes enabled (only 89 probes are optimized):

# grep "[tT] audit_*" /proc/kallsyms | while read a b c d; do echo p $c+5 >> 
kprobe_events ;done 
# cat ../kprobes/list  | wc -l
106
# echo 1 > events/enable 
# grep OPTIMIZED ../kprobes/list  | wc -l
89

[ 1037.065151] Start benchmarking.
[ 1037.065500] End benchmarking. Elapsed: 1067196 cycles, loop count:1000
[ 1037.113893] Start benchmarking.
[ 1037.114766] End benchmarking. Elapsed: 2667646 cycles, loop count:1000
[ 1037.149251] Start benchmarking.
[ 1037.149382] End benchmarking. Elapsed: 402500 cycles, loop count:1000
[ 1037.180287] Start benchmarking.
[ 1037.180368] End benchmarking. Elapsed: 246000 cycles, loop count:1000
[ 1037.208416] Start benchmarking.
[ 1037.208503] End benchmarking. Elapsed: 266768 cycles, loop count:1000
[ 1037.236328] Start benchmarking.
[ 1037.236407] End benchmarking. Elapsed: 242396 cycles, loop count:1000
[ 1037.270207] Start benchmarking.
[ 1037.270288] End benchmarking. Elapsed: 246384 cycles, loop count:1000
[ 1037.302086] Start benchmarking.
[ 1037.302174] End benchmarking. Elapsed: 266796 cycles, loop count:1000
[ 1037.330165] Start benchmarking.
[ 1037.330245] End benchmarking. Elapsed: 242888 cycles, loop count:1000
[ 1037.358207] Start benchmarking.
[ 1037.358287] End benchmarking. Elapsed: 243288 cycles, loop count:1000
[ 1037.388134] Start benchmarking.
[ 1037.388213] End benchmarking. Elapsed: 240936 cycles, loop count:1000

FYI, I also tried the same benchmark without the patch, but the results
were very unstable. Benchmarks without the patch:

[ 4583.832368] Start benchmarking.
[ 4583.832943] End benchmarking. Elapsed: 1758512 cycles, loop count:1000
[ 4712.636096] Start benchmarking.
[ 4712.636712] End benchmarking. Elapsed: 1881112 cycles, loop count:1000
[ 4712.668672] Start benchmarking.
[ 4712.669254] End benchmarking. Elapsed: 1777584 cycles, loop count:1000
[ 4712.696150] Start benchmarking.
[ 4712.696925] End benchmarking. Elapsed: 2324292 cycles, loop count:1000
[ 4712.727850] Start benchmarking.
[ 4712.727988] End benchmarking. Elapsed: 422768 cycles, loop count:1000
[ 4712.756108] Start benchmarking.
[ 4712.756684] End benchmarking. Elapsed: 1755676 cycles, loop count:1000
[ 4712.780775] Start benchmarking.
[ 4712.781347] End benchmarking. Elapsed: 1749872 cycles, loop count:1000

Re: [PATCH tip/master v3] kprobes: extable: Identify kprobes' insn-slots as kernel text area

2017-01-07 Thread Masami Hiramatsu
On Wed, 4 Jan 2017 11:01:02 +0100
Peter Zijlstra  wrote:

> On Wed, Jan 04, 2017 at 02:06:04PM +0900, Masami Hiramatsu wrote:
> > On Tue, 3 Jan 2017 11:54:02 +0100
> > Peter Zijlstra  wrote:
> 
> > > How many entries should one expect on that list? I spent quite a bit of
> > > time reducing the cost of is_module_text_address() a while back and see
> > > that both ftrace (which actually needs this to be fast) and now
> > > kprobes have linear list walks in here.
> > 
> > It depends on how many probes are used and optimized. However, in most
> > cases, there should be one entry (unless the user defines more than 32
> > optimized probes on x86, which in my experience is a very rare case :)).
> 
> OK, that's good :-)

OK, I'll add the above comment to the patch.

> 
> > > I'm assuming the ftrace thing to be mostly empty, since I never saw it
> > > on my benchmarks back then, but it is something Steve should look at I
> > > suppose.
> > > 
> > > Similarly, the changelog here should include some talk about worst case
> > > costs.
> > 
> > Would you have any good benchmark to measure it?
> 
> Not trivially so; what I did was cobble together a debugfs file that
> measures the average of the PMI time in perf_sample_event_took(), and a
> module that has a 10 deep callchain around a while(1) loop. Then perf
> record with callchains for a few seconds.
> 
> Generating the callchain does the unwinder thing and ends up calling
> is_kernel_address() lots.
> 
> The case I worked on was 0 modules vs 100+ modules in a distro build,
> which was fairly obviously painful back then, since
> is_module_text_address() used a linear lookup.
> 
> I'm not sure I still have all those bits, but I can dig around a bit if
> you're interested.

Hmm, I tried a similar thing (a test module which has a loop with a
10-deep recursive call and saves a stack trace) on KVM, but got very
unstable results.
Maybe it needs to run on a bare-metal machine.

Thanks,

-- 
Masami Hiramatsu 


Re: [PATCH tip/master v3] kprobes: extable: Identify kprobes' insn-slots as kernel text area

2017-01-04 Thread Steven Rostedt
On Tue, 3 Jan 2017 11:54:02 +0100
Peter Zijlstra  wrote:


> How many entries should one expect on that list? I spent quite a bit of
> time reducing the cost of is_module_text_address() a while back and see
> that both ftrace (which actually needs this to be fast) and now
> kprobes have linear list walks in here.
> 
> I'm assuming the ftrace thing to be mostly empty, since I never saw it
> on my benchmarks back then, but it is something Steve should look at I
> suppose.

Yeah, that do_for_each_ftrace_op() loop iterates over all the users that
have connected to function tracing, which in your case is probably at
most 2 (one for ftrace and one for perf). You could get more by
creating instances and enabling function tracing there too. Oh, and the
stack tracer could add its own.

Hmm, with the addition of live kernel patching, this list could get
larger, as you can have one per updated function. But heh, that's just
part of the overhead for live patching ;-)

-- Steve


Re: [PATCH tip/master v3] kprobes: extable: Identify kprobes' insn-slots as kernel text area

2017-01-04 Thread Peter Zijlstra
On Wed, Jan 04, 2017 at 02:06:04PM +0900, Masami Hiramatsu wrote:
> On Tue, 3 Jan 2017 11:54:02 +0100
> Peter Zijlstra  wrote:

> > How many entries should one expect on that list? I spent quite a bit of
> > time reducing the cost of is_module_text_address() a while back and see
> > that both ftrace (which actually needs this to be fast) and now
> > kprobes have linear list walks in here.
> 
> It depends on how many probes are used and optimized. However, in most
> cases, there should be one entry (unless the user defines more than 32
> optimized probes on x86, which in my experience is a very rare case :)).

OK, that's good :-)

> > I'm assuming the ftrace thing to be mostly empty, since I never saw it
> > on my benchmarks back then, but it is something Steve should look at I
> > suppose.
> > 
> > Similarly, the changelog here should include some talk about worst case
> > costs.
> 
> Would you have any good benchmark to measure it?

Not trivially so; what I did was cobble together a debugfs file that
measures the average of the PMI time in perf_sample_event_took(), and a
module that has a 10 deep callchain around a while(1) loop. Then perf
record with callchains for a few seconds.

Generating the callchain does the unwinder thing and ends up calling
is_kernel_address() lots.

The case I worked on was 0 modules vs 100+ modules in a distro build,
which was fairly obviously painful back then, since
is_module_text_address() used a linear lookup.

I'm not sure I still have all those bits, but I can dig around a bit if
you're interested.
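
(For reference, a minimal sketch of the debugfs half of that setup. It
assumes a one-line hook that calls pmi_bench_account() with the same
duration that is passed to perf_sample_event_took(); pmi_bench_account
and pmi_avg_ns are illustrative names, not the actual bits described
above.)

/*
 * Hypothetical sketch: accumulate PMI sample durations and expose the
 * running average via /sys/kernel/debug/pmi_avg_ns. The counters are
 * deliberately racy; good enough for a rough average.
 */
#include <linux/debugfs.h>
#include <linux/fs.h>
#include <linux/math64.h>
#include <linux/module.h>

static u64 pmi_time_sum;	/* total ns spent handling PMIs */
static u64 pmi_time_cnt;	/* number of accounted samples */

/* Call this from the PMI path with the duration of one sample, in ns. */
void pmi_bench_account(u64 sample_len_ns)
{
	pmi_time_sum += sample_len_ns;
	pmi_time_cnt++;
}

static int pmi_avg_get(void *data, u64 *val)
{
	*val = pmi_time_cnt ? div64_u64(pmi_time_sum, pmi_time_cnt) : 0;
	return 0;
}
DEFINE_SIMPLE_ATTRIBUTE(pmi_avg_fops, pmi_avg_get, NULL, "%llu\n");

static int __init pmi_bench_init(void)
{
	debugfs_create_file("pmi_avg_ns", 0444, NULL, NULL, &pmi_avg_fops);
	return 0;
}
late_initcall(pmi_bench_init);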


Re: [PATCH tip/master v3] kprobes: extable: Identify kprobes' insn-slots as kernel text area

2017-01-03 Thread Masami Hiramatsu
On Tue, 3 Jan 2017 11:54:02 +0100
Peter Zijlstra  wrote:

> On Tue, Dec 27, 2016 at 03:14:10PM +0900, Masami Hiramatsu wrote:
> 
> > diff --git a/kernel/extable.c b/kernel/extable.c
> > index e820cce..81c9633 100644
> > --- a/kernel/extable.c
> > +++ b/kernel/extable.c
> 
> > @@ -123,7 +126,11 @@ int kernel_text_address(unsigned long addr)
> > return 1;
> > if (is_module_text_address(addr))
> > return 1;
> > -   return is_ftrace_trampoline(addr);
> > +   if (is_ftrace_trampoline(addr))
> > +   return 1;
> > +   if (is_kprobe_optinsn_slot(addr) || is_kprobe_insn_slot(addr))
> > +   return 1;
> > +   return 0;
> >  }
> 
> > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > index d630954..be41f6d 100644
> > --- a/kernel/kprobes.c
> > +++ b/kernel/kprobes.c
> 
> > +/*
> > + * Check given address is on the page of kprobe instruction slots.
> > + * This will be used for checking whether the address on a stack
> > + * is on a text area or not.
> > + */
> > +bool __is_insn_slot_addr(struct kprobe_insn_cache *c, unsigned long addr)
> > +{
> > +   struct kprobe_insn_page *kip;
> > +   bool ret = false;
> > +
> > +   rcu_read_lock();
> > +   list_for_each_entry_rcu(kip, &c->pages, list) {
> > +   if (addr >= (unsigned long)kip->insns &&
> > +   addr < (unsigned long)kip->insns + PAGE_SIZE) {
> > +   ret = true;
> > +   break;
> > +   }
> > +   }
> > +   rcu_read_unlock();
> > +
> > +   return ret;
> > +}
> 
> How many entries should one expect on that list? I spent quite a bit of
> time reducing the cost of is_module_text_address() a while back and see
> that both ftrace (which actually needs this to be fast) and now
> kprobes have linear list walks in here.

It depends on how many probes are used and optimized. However, in most
cases, there should be one entry (unless the user defines more than 32
optimized probes on x86, which in my experience is a very rare case :)).

> I'm assuming the ftrace thing to be mostly empty, since I never saw it
> on my benchmarks back then, but it is something Steve should look at I
> suppose.
> 
> Similarly, the changelog here should include some talk about worst case
> costs.

Would you have any good benchmark to measure it?

Thanks,

> 
> FWIW, see commit: 93c2e105f6bc ("module: Optimize __module_address() using a latched RB-tree")


-- 
Masami Hiramatsu 


Re: [PATCH tip/master v3] kprobes: extable: Identify kprobes' insn-slots as kernel text area

2017-01-03 Thread Peter Zijlstra
On Tue, Dec 27, 2016 at 03:14:10PM +0900, Masami Hiramatsu wrote:

> diff --git a/kernel/extable.c b/kernel/extable.c
> index e820cce..81c9633 100644
> --- a/kernel/extable.c
> +++ b/kernel/extable.c

> @@ -123,7 +126,11 @@ int kernel_text_address(unsigned long addr)
>   return 1;
>   if (is_module_text_address(addr))
>   return 1;
> - return is_ftrace_trampoline(addr);
> + if (is_ftrace_trampoline(addr))
> + return 1;
> + if (is_kprobe_optinsn_slot(addr) || is_kprobe_insn_slot(addr))
> + return 1;
> + return 0;
>  }

> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index d630954..be41f6d 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c

> +/*
> + * Check given address is on the page of kprobe instruction slots.
> + * This will be used for checking whether the address on a stack
> + * is on a text area or not.
> + */
> +bool __is_insn_slot_addr(struct kprobe_insn_cache *c, unsigned long addr)
> +{
> + struct kprobe_insn_page *kip;
> + bool ret = false;
> +
> + rcu_read_lock();
> + list_for_each_entry_rcu(kip, &c->pages, list) {
> + if (addr >= (unsigned long)kip->insns &&
> + addr < (unsigned long)kip->insns + PAGE_SIZE) {
> + ret = true;
> + break;
> + }
> + }
> + rcu_read_unlock();
> +
> + return ret;
> +}

How many entries should one expect on that list? I spent quite a bit of
time reducing the cost of is_module_text_address() a while back and see
that both ftrace (which actually needs this to be fast) and now
kprobes have linear list walks in here.

I'm assuming the ftrace thing to be mostly empty, since I never saw it
on my benchmarks back then, but it is something Steve should look at I
suppose.

Similarly, the changelog here should include some talk about worst case
costs.

FWIW, see commit: 93c2e105f6bc ("module: Optimize __module_address() using a latched RB-tree")
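
(For reference, a sketch of how that latched-RB-tree lookup could be
applied to the kprobe insn pages discussed here; kip_node, kip_tree and
friends are illustrative names, and the actual commit applies this to
module addresses, not kprobes.)

/*
 * Hypothetical sketch: replace the linear page-list walk with the
 * latched RB-tree from include/linux/rbtree_latch.h, as commit
 * 93c2e105f6bc did for __module_address(). Lookups are NMI-safe and
 * O(log n); writers serialize among themselves with a mutex.
 */
#include <linux/rbtree_latch.h>
#include <linux/kprobes.h>

struct kip_node {			/* stands in for kprobe_insn_page */
	struct latch_tree_node tnode;
	kprobe_opcode_t *insns;		/* one page of instruction slots */
};

static __always_inline struct kip_node *to_kip(struct latch_tree_node *n)
{
	return container_of(n, struct kip_node, tnode);
}

static __always_inline bool kip_less(struct latch_tree_node *a,
				     struct latch_tree_node *b)
{
	return to_kip(a)->insns < to_kip(b)->insns;
}

static __always_inline int kip_comp(void *key, struct latch_tree_node *n)
{
	unsigned long addr = (unsigned long)key;
	unsigned long page = (unsigned long)to_kip(n)->insns;

	if (addr < page)
		return -1;
	if (addr >= page + PAGE_SIZE)
		return 1;
	return 0;			/* addr falls inside this page */
}

static const struct latch_tree_ops kip_tree_ops = {
	.less = kip_less,
	.comp = kip_comp,
};

static struct latch_tree_root kip_tree;

/* Lockless reader side; safe from NMI/stack-unwind context. */
bool is_kprobe_insn_addr_fast(unsigned long addr)
{
	return latch_tree_find((void *)addr, &kip_tree, &kip_tree_ops) != NULL;
}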


[PATCH tip/master v3] kprobes: extable: Identify kprobes' insn-slots as kernel text area

2016-12-26 Thread Masami Hiramatsu
Make __kernel_text_address()/kernel_text_address() return
true if the given address is on a kprobe's instruction slot,
which is generated by kprobes as trampoline code.
This can help stacktraces determine whether an address is on a
text area or not.

To implement this without any sleep in is_kprobe_*_slot(),
this also makes the insn_cache page list an RCU list. This may
increase the processing delay (not the processing time) of garbage
slot collection, because it must wait for an additional RCU
grace period when freeing a page from the list.
However, since this is not a hot path, the delay should be acceptable.

Signed-off-by: Masami Hiramatsu 

---
 V3:
  - Fix build error on archs which don't need insn_slot (e.g. sh, sparc).
  - Fix a missed rcu_read_unlock() in fast path of __get_insn_slot.
---
 include/linux/kprobes.h |   30 
 kernel/extable.c        |    9 +-
 kernel/kprobes.c        |   69 +++
 3 files changed, 89 insertions(+), 19 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 8f68490..16ddfb8 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -278,9 +278,13 @@ struct kprobe_insn_cache {
int nr_garbage;
 };
 
+#ifdef __ARCH_WANT_KPROBES_INSN_SLOT
 extern kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c);
 extern void __free_insn_slot(struct kprobe_insn_cache *c,
 kprobe_opcode_t *slot, int dirty);
+/* sleep-less address checking routine  */
+extern bool __is_insn_slot_addr(struct kprobe_insn_cache *c,
+   unsigned long addr);
 
 #define DEFINE_INSN_CACHE_OPS(__name)  \
 extern struct kprobe_insn_cache kprobe_##__name##_slots;   \
@@ -294,6 +298,18 @@ static inline void free_##__name##_slot(kprobe_opcode_t *slot, int dirty)\
 {  \
	__free_insn_slot(&kprobe_##__name##_slots, slot, dirty);\
 }  \
+   \
+static inline bool is_kprobe_##__name##_slot(unsigned long addr)   \
+{  \
+   return __is_insn_slot_addr(&kprobe_##__name##_slots, addr); \
+}
+#else /* __ARCH_WANT_KPROBES_INSN_SLOT */
+#define DEFINE_INSN_CACHE_OPS(__name)  \
+static inline bool is_kprobe_##__name##_slot(unsigned long addr)   \
+{  \
+   return 0;   \
+}
+#endif
 
 DEFINE_INSN_CACHE_OPS(insn);
 
@@ -330,7 +346,6 @@ extern int proc_kprobes_optimization_handler(struct ctl_table *table,
 int write, void __user *buffer,
 size_t *length, loff_t *ppos);
 #endif
-
 #endif /* CONFIG_OPTPROBES */
 #ifdef CONFIG_KPROBES_ON_FTRACE
 extern void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
@@ -481,6 +496,19 @@ static inline int enable_jprobe(struct jprobe *jp)
	return enable_kprobe(&jp->kp);
 }
 
+#ifndef CONFIG_KPROBES
+static inline bool is_kprobe_insn_slot(unsigned long addr)
+{
+   return false;
+}
+#endif
+#ifndef CONFIG_OPTPROBES
+static inline bool is_kprobe_optinsn_slot(unsigned long addr)
+{
+   return false;
+}
+#endif
+
 #ifdef CONFIG_KPROBES
 /*
  * Blacklist ganerating macro. Specify functions which is not probed
diff --git a/kernel/extable.c b/kernel/extable.c
index e820cce..81c9633 100644
--- a/kernel/extable.c
+++ b/kernel/extable.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include <linux/kprobes.h>
 
 #include 
 #include 
@@ -104,6 +105,8 @@ int __kernel_text_address(unsigned long addr)
return 1;
if (is_ftrace_trampoline(addr))
return 1;
+   if (is_kprobe_optinsn_slot(addr) || is_kprobe_insn_slot(addr))
+   return 1;
/*
 * There might be init symbols in saved stacktraces.
 * Give those symbols a chance to be printed in
@@ -123,7 +126,11 @@ int kernel_text_address(unsigned long addr)
return 1;
if (is_module_text_address(addr))
return 1;
-   return is_ftrace_trampoline(addr);
+   if (is_ftrace_trampoline(addr))
+   return 1;
+   if (is_kprobe_optinsn_slot(addr) || is_kprobe_insn_slot(addr))
+   return 1;
+   return 0;
 }
 
 /*
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index d630954..be41f6d 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -149,9 +149,11 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c)
struct kprobe_insn_page *kip;
kprobe_opcode_t *slot = NULL;
 
+   /* Since the slot array is not protected by rcu, we need a mutex */
