Re: [PATCH RFC] perf: Go back to by-hand proc mmap parsing and kill timeout
Em Wed, Oct 31, 2018 at 12:19:10PM -0400, Liang, Kan escreveu: > > > On 10/31/2018 9:21 AM, Arnaldo Carvalho de Melo wrote: > > Em Tue, Oct 30, 2018 at 10:54:16PM -0700, David Miller escreveu: > > > > > > This goes back to by-hand parsing of the proc mmap file, and removes > > > the timeout. > > > > > > In my measurements this makes the parsing about twice as fast. > > > > > > Profiling thread synthesizing shows that most of the time is spent in > > > the sscanf() call. > > > > > I did some performance test to evaluate the time cost on proc mmap parsing > with different options/patches. > > The benchmark I used is case-small-allocs from vm-scalability. > > I did the test on two machines. > M1: Skylake server (224 CPUs) > M2: Knights Landing (272 CPUs) > > With three different options for perf top. > P1: baseline test. No David's patch. "perf top" > P2: No David's patch. Set a big timeout. The proc map processing will not be > truncated. "perf top --proc-map-timeout 1" > P3: With David's patch. "perf top" > > The elapsed time is the time cost for synthesize_threads. > > On M1 > Elapsed time (Seconds) > P1~6 > P28-24* > P36-11* > *The time cost is not stable. I did several tests. Here lists a range of the > results. > > On M2 > Elapsed time (Seconds) > P141 > P2282 > P3176 > I only did test one time. > > The conclusion, > - David's patch significantly improve the time cost of proc mmap parsing. > - On some platform, especially Knights Landing, the time cost is still huge > even with David's patch. > - I think timeout is still needed. We have to give user warning/hints if the > processing time is too long. For example, with P3 on M2, I can only see a > black screen during the ~3 mins processing time. That is why I mentioned that: > I'm going to split this patch into two, one that makes it go back to > by-hand proc mmap parsing, and the other killing the mmap-timeout. I.e. the improvement is welcome, the removal of the timeout is a separate issue :-) - Arnaldo > Thanks, > Kan > > > > Processing samples is critical for perf top and perf record -a startup > > > performance, and perf's ability to keep up with high sample rates. > > > > I'm going to split this patch into two, one that makes it go back to > > by-hand proc mmap parsing, and the other killing the mmap-timeout. > > > > Thanks, > > > > - Arnaldo > > > Signed-off-by: David S. Miller > > > > > > diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c > > > index 2b1ef704169f..d4b5177c3c22 100644 > > > --- a/tools/perf/builtin-kvm.c > > > +++ b/tools/perf/builtin-kvm.c > > > @@ -1364,8 +1364,6 @@ static int kvm_events_live(struct perf_kvm_stat > > > *kvm, > > > "show events other than" > > > " HLT (x86 only) or Wait state (s390 only)" > > > " that take longer than duration usecs"), > > > - OPT_UINTEGER(0, "proc-map-timeout", &kvm->opts.proc_map_timeout, > > > - "per thread proc mmap processing timeout in > > > ms"), > > > OPT_END() > > > }; > > > const char * const live_usage[] = { > > > @@ -1394,7 +1392,6 @@ static int kvm_events_live(struct perf_kvm_stat > > > *kvm, > > > kvm->opts.target.uses_mmap = false; > > > kvm->opts.target.uid_str = NULL; > > > kvm->opts.target.uid = UINT_MAX; > > > - kvm->opts.proc_map_timeout = 500; > > > symbol__init(NULL); > > > disable_buildid_cache(); > > > @@ -1453,8 +1450,7 @@ static int kvm_events_live(struct perf_kvm_stat > > > *kvm, > > > perf_session__set_id_hdr_size(kvm->session); > > > > > > ordered_events__set_copy_on_queue(&kvm->session->ordered_events, true); > > > machine__synthesize_threads(&kvm->session->machines.host, > > > &kvm->opts.target, > > > - kvm->evlist->threads, false, > > > - kvm->opts.proc_map_timeout, 1); > > > + kvm->evlist->threads, false, 1); > > > err = kvm_live_open_events(kvm); > > > if (err) > > > goto out; > > > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c > > > index 0980dfe3396b..4259ea42a8ef 100644 > > > --- a/tools/perf/builtin-record.c > > > +++ b/tools/perf/builtin-record.c > > > @@ -633,8 +633,7 @@ static int record__synthesize_workload(struct record > > > *rec, bool tail) > > > err = perf_event__synthesize_thread_map(&rec->tool, thread_map, > > > > > > process_synthesized_event, > > > > > > &rec->session->machines.host, > > > - rec->opts.sample_address, > > > - rec->opts.proc_map_timeout); > > > +
Re: [PATCH RFC] perf: Go back to by-hand proc mmap parsing and kill timeout
On 10/31/2018 9:21 AM, Arnaldo Carvalho de Melo wrote: Em Tue, Oct 30, 2018 at 10:54:16PM -0700, David Miller escreveu: This goes back to by-hand parsing of the proc mmap file, and removes the timeout. In my measurements this makes the parsing about twice as fast. Profiling thread synthesizing shows that most of the time is spent in the sscanf() call. I did some performance test to evaluate the time cost on proc mmap parsing with different options/patches. The benchmark I used is case-small-allocs from vm-scalability. I did the test on two machines. M1: Skylake server (224 CPUs) M2: Knights Landing (272 CPUs) With three different options for perf top. P1: baseline test. No David's patch. "perf top" P2: No David's patch. Set a big timeout. The proc map processing will not be truncated. "perf top --proc-map-timeout 1" P3: With David's patch. "perf top" The elapsed time is the time cost for synthesize_threads. On M1 Elapsed time (Seconds) P1 ~6 P2 8-24* P3 6-11* *The time cost is not stable. I did several tests. Here lists a range of the results. On M2 Elapsed time (Seconds) P1 41 P2 282 P3 176 I only did test one time. The conclusion, - David's patch significantly improve the time cost of proc mmap parsing. - On some platform, especially Knights Landing, the time cost is still huge even with David's patch. - I think timeout is still needed. We have to give user warning/hints if the processing time is too long. For example, with P3 on M2, I can only see a black screen during the ~3 mins processing time. Thanks, Kan Processing samples is critical for perf top and perf record -a startup performance, and perf's ability to keep up with high sample rates. I'm going to split this patch into two, one that makes it go back to by-hand proc mmap parsing, and the other killing the mmap-timeout. Thanks, - Arnaldo Signed-off-by: David S. Miller diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c index 2b1ef704169f..d4b5177c3c22 100644 --- a/tools/perf/builtin-kvm.c +++ b/tools/perf/builtin-kvm.c @@ -1364,8 +1364,6 @@ static int kvm_events_live(struct perf_kvm_stat *kvm, "show events other than" " HLT (x86 only) or Wait state (s390 only)" " that take longer than duration usecs"), - OPT_UINTEGER(0, "proc-map-timeout", &kvm->opts.proc_map_timeout, - "per thread proc mmap processing timeout in ms"), OPT_END() }; const char * const live_usage[] = { @@ -1394,7 +1392,6 @@ static int kvm_events_live(struct perf_kvm_stat *kvm, kvm->opts.target.uses_mmap = false; kvm->opts.target.uid_str = NULL; kvm->opts.target.uid = UINT_MAX; - kvm->opts.proc_map_timeout = 500; symbol__init(NULL); disable_buildid_cache(); @@ -1453,8 +1450,7 @@ static int kvm_events_live(struct perf_kvm_stat *kvm, perf_session__set_id_hdr_size(kvm->session); ordered_events__set_copy_on_queue(&kvm->session->ordered_events, true); machine__synthesize_threads(&kvm->session->machines.host, &kvm->opts.target, - kvm->evlist->threads, false, - kvm->opts.proc_map_timeout, 1); + kvm->evlist->threads, false, 1); err = kvm_live_open_events(kvm); if (err) goto out; diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index 0980dfe3396b..4259ea42a8ef 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -633,8 +633,7 @@ static int record__synthesize_workload(struct record *rec, bool tail) err = perf_event__synthesize_thread_map(&rec->tool, thread_map, process_synthesized_event, &rec->session->machines.host, -rec->opts.sample_address, -rec->opts.proc_map_timeout); +rec->opts.sample_address); thread_map__put(thread_map); return err; } @@ -848,8 +847,7 @@ static int record__synthesize(struct record *rec, bool tail) } err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->threads, - process_synthesized_event, opts->sample_address, - opts->proc_map_timeout, 1); + process_synthesized_event, opts->sample_address, 1); out: return err; } @@ -1521,7 +1519,6 @@ static struct record record = { .uses_mmap = true, .default_per_cpu = true, }, - .proc_map_timeout = 500, },
Re: [PATCH RFC] perf: Go back to by-hand proc mmap parsing and kill timeout
Em Tue, Oct 30, 2018 at 10:54:16PM -0700, David Miller escreveu: > > This goes back to by-hand parsing of the proc mmap file, and removes > the timeout. > > In my measurements this makes the parsing about twice as fast. > > Profiling thread synthesizing shows that most of the time is spent in > the sscanf() call. > > Processing samples is critical for perf top and perf record -a startup > performance, and perf's ability to keep up with high sample rates. I'm going to split this patch into two, one that makes it go back to by-hand proc mmap parsing, and the other killing the mmap-timeout. Thanks, - Arnaldo > Signed-off-by: David S. Miller > > diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c > index 2b1ef704169f..d4b5177c3c22 100644 > --- a/tools/perf/builtin-kvm.c > +++ b/tools/perf/builtin-kvm.c > @@ -1364,8 +1364,6 @@ static int kvm_events_live(struct perf_kvm_stat *kvm, > "show events other than" > " HLT (x86 only) or Wait state (s390 only)" > " that take longer than duration usecs"), > - OPT_UINTEGER(0, "proc-map-timeout", &kvm->opts.proc_map_timeout, > - "per thread proc mmap processing timeout in > ms"), > OPT_END() > }; > const char * const live_usage[] = { > @@ -1394,7 +1392,6 @@ static int kvm_events_live(struct perf_kvm_stat *kvm, > kvm->opts.target.uses_mmap = false; > kvm->opts.target.uid_str = NULL; > kvm->opts.target.uid = UINT_MAX; > - kvm->opts.proc_map_timeout = 500; > > symbol__init(NULL); > disable_buildid_cache(); > @@ -1453,8 +1450,7 @@ static int kvm_events_live(struct perf_kvm_stat *kvm, > perf_session__set_id_hdr_size(kvm->session); > ordered_events__set_copy_on_queue(&kvm->session->ordered_events, true); > machine__synthesize_threads(&kvm->session->machines.host, > &kvm->opts.target, > - kvm->evlist->threads, false, > - kvm->opts.proc_map_timeout, 1); > + kvm->evlist->threads, false, 1); > err = kvm_live_open_events(kvm); > if (err) > goto out; > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c > index 0980dfe3396b..4259ea42a8ef 100644 > --- a/tools/perf/builtin-record.c > +++ b/tools/perf/builtin-record.c > @@ -633,8 +633,7 @@ static int record__synthesize_workload(struct record > *rec, bool tail) > err = perf_event__synthesize_thread_map(&rec->tool, thread_map, >process_synthesized_event, >&rec->session->machines.host, > - rec->opts.sample_address, > - rec->opts.proc_map_timeout); > + rec->opts.sample_address); > thread_map__put(thread_map); > return err; > } > @@ -848,8 +847,7 @@ static int record__synthesize(struct record *rec, bool > tail) > } > > err = __machine__synthesize_threads(machine, tool, &opts->target, > rec->evlist->threads, > - process_synthesized_event, > opts->sample_address, > - opts->proc_map_timeout, 1); > + process_synthesized_event, > opts->sample_address, 1); > out: > return err; > } > @@ -1521,7 +1519,6 @@ static struct record record = { > .uses_mmap = true, > .default_per_cpu = true, > }, > - .proc_map_timeout = 500, > }, > .tool = { > .sample = process_sample_event, > @@ -1651,8 +1648,6 @@ static struct option __record_options[] = { > parse_clockid), > OPT_STRING_OPTARG('S', "snapshot", &record.opts.auxtrace_snapshot_opts, > "opts", "AUX area tracing Snapshot Mode", ""), > - OPT_UINTEGER(0, "proc-map-timeout", &record.opts.proc_map_timeout, > - "per thread proc mmap processing timeout in ms"), > OPT_BOOLEAN(0, "namespaces", &record.opts.record_namespaces, > "Record namespaces events"), > OPT_BOOLEAN(0, "switch-events", &record.opts.record_switch_events, > diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c > index d21d8751e749..4eec44d815bd 100644 > --- a/tools/perf/builtin-top.c > +++ b/tools/perf/builtin-top.c > @@ -1096,7 +1096,6 @@ static int __cmd_top(struct perf_top *top) > > machine__synthesize_threads(&top->session->machines.host, &opts->target, > top->evlist->threads, false, > - opts->proc_map_timeout, > top->nr_threads_synthesize); > > if (top->nr_threads_synthesize > 1) > @@ -1256
[PATCH RFC] perf: Go back to by-hand proc mmap parsing and kill timeout
This goes back to by-hand parsing of the proc mmap file, and removes the timeout. In my measurements this makes the parsing about twice as fast. Profiling thread synthesizing shows that most of the time is spent in the sscanf() call. Processing samples is critical for perf top and perf record -a startup performance, and perf's ability to keep up with high sample rates. Signed-off-by: David S. Miller diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c index 2b1ef704169f..d4b5177c3c22 100644 --- a/tools/perf/builtin-kvm.c +++ b/tools/perf/builtin-kvm.c @@ -1364,8 +1364,6 @@ static int kvm_events_live(struct perf_kvm_stat *kvm, "show events other than" " HLT (x86 only) or Wait state (s390 only)" " that take longer than duration usecs"), - OPT_UINTEGER(0, "proc-map-timeout", &kvm->opts.proc_map_timeout, - "per thread proc mmap processing timeout in ms"), OPT_END() }; const char * const live_usage[] = { @@ -1394,7 +1392,6 @@ static int kvm_events_live(struct perf_kvm_stat *kvm, kvm->opts.target.uses_mmap = false; kvm->opts.target.uid_str = NULL; kvm->opts.target.uid = UINT_MAX; - kvm->opts.proc_map_timeout = 500; symbol__init(NULL); disable_buildid_cache(); @@ -1453,8 +1450,7 @@ static int kvm_events_live(struct perf_kvm_stat *kvm, perf_session__set_id_hdr_size(kvm->session); ordered_events__set_copy_on_queue(&kvm->session->ordered_events, true); machine__synthesize_threads(&kvm->session->machines.host, &kvm->opts.target, - kvm->evlist->threads, false, - kvm->opts.proc_map_timeout, 1); + kvm->evlist->threads, false, 1); err = kvm_live_open_events(kvm); if (err) goto out; diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index 0980dfe3396b..4259ea42a8ef 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -633,8 +633,7 @@ static int record__synthesize_workload(struct record *rec, bool tail) err = perf_event__synthesize_thread_map(&rec->tool, thread_map, process_synthesized_event, &rec->session->machines.host, -rec->opts.sample_address, -rec->opts.proc_map_timeout); +rec->opts.sample_address); thread_map__put(thread_map); return err; } @@ -848,8 +847,7 @@ static int record__synthesize(struct record *rec, bool tail) } err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->threads, - process_synthesized_event, opts->sample_address, - opts->proc_map_timeout, 1); + process_synthesized_event, opts->sample_address, 1); out: return err; } @@ -1521,7 +1519,6 @@ static struct record record = { .uses_mmap = true, .default_per_cpu = true, }, - .proc_map_timeout = 500, }, .tool = { .sample = process_sample_event, @@ -1651,8 +1648,6 @@ static struct option __record_options[] = { parse_clockid), OPT_STRING_OPTARG('S', "snapshot", &record.opts.auxtrace_snapshot_opts, "opts", "AUX area tracing Snapshot Mode", ""), - OPT_UINTEGER(0, "proc-map-timeout", &record.opts.proc_map_timeout, - "per thread proc mmap processing timeout in ms"), OPT_BOOLEAN(0, "namespaces", &record.opts.record_namespaces, "Record namespaces events"), OPT_BOOLEAN(0, "switch-events", &record.opts.record_switch_events, diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c index d21d8751e749..4eec44d815bd 100644 --- a/tools/perf/builtin-top.c +++ b/tools/perf/builtin-top.c @@ -1096,7 +1096,6 @@ static int __cmd_top(struct perf_top *top) machine__synthesize_threads(&top->session->machines.host, &opts->target, top->evlist->threads, false, - opts->proc_map_timeout, top->nr_threads_synthesize); if (top->nr_threads_synthesize > 1) @@ -1256,7 +1255,6 @@ int cmd_top(int argc, const char **argv) .target = { .uses_mmap = true, }, - .proc_map_timeout= 500, .overwrite = 1, }, .max_sta