Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode
From: "Liang, Kan" Date: Mon, 29 Oct 2018 10:33:06 -0400 > The problem is that users have to wait tens of minutes to see perf > top results on the screen in KNL. Before that, there is nothing but > a black screen. I'm actually curious why so I started digging last night. First, I made perf top exit when it is done synthesizing existing threads and profiled this during a full workload. It takes about 3 seconds on this 128 cpu sparc64 system. Some curious things in there, first of all we spend a lot of time looking for the hugetlbfs mount point. And sure enough the FS ABI code keeps reparsing the entire /proc/mounts file every time hugetlbfs__mountpoint() is called. It's logic is that if the mountpoint wasn't found on a previous call it rechecks everything. For perf's usage, this is unnecessary overhead. Simply mounting hugetlbfs gave me half a second back in startup time, so I was down to 2.5 seconds from 3 seconds already. Next I found that perf execution time during thread synthesis is dominated by sscanf() processing. So I went into the history books and found that back in the 3.x days we parsed the file by hand, so I brought that code back and extended it for what mmap2 events need. That patch is at the end of this email, ignore the XXX bits as I was too lazy to remove all of the mmap timeout code but I am absolutely certain that is the right thing to do. This gave me another half second or so back, and startup to this end of thread synthesization is now less than 2 seconds for this workload. Next, I let the perf top startup continue up until right before the display thread is started. This takes a minute or more total, and is dominated by: time seconds secondscalls s/call s/call name 28.30 12.7012.70 1927341 0.00 0.00 __hists__add_entry 12.77 18.43 5.73 27844359 0.00 0.00 sort__dso_cmp 10.21 23.01 4.58 23074107 0.00 0.00 sort__sym_cmp 8.29 26.73 3.72 1050281 0.00 0.00 dso__find_symbol 4.95 28.95 2.22 106607184 0.00 0.00 perf_hpp__is_dynamic_entry The histogram code is _insanely_ expensive and the algorithmic complexity is quite high. It does rbtree walks, doing a dso comparison and then a symbol comparison at each and every step in the walk. The symbol comparison is a full blown strcmp() and the histogram table in this situation is huge. This is where the real problems are, not in the simple mmap processing and other places where we've put timouts and other hacks that really don't try to address the fundamental problems perf has in these situations. Thanks. diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c index bc64618..70597fd 100644 --- a/tools/perf/util/event.c +++ b/tools/perf/util/event.c @@ -317,6 +318,30 @@ static int perf_event__synthesize_fork(struct perf_tool *tool, return 0; } +static int dec(char ch) +{ + if ((ch >= '0') && (ch <= '9')) + return ch - '0'; + return -1; +} + +static int dec2u64(const char *ptr, u64 *long_val) +{ + const char *p = ptr; + + *long_val = 0; + while (*p) { + const int dec_val = dec(*p); + + if (dec_val < 0) + break; + + *long_val = (*long_val * 10) + dec_val; + p++; + } + return p - ptr; +} + int perf_event__synthesize_mmap_events(struct perf_tool *tool, union perf_event *event, pid_t pid, pid_t tgid, @@ -327,13 +352,12 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool, { char filename[PATH_MAX]; FILE *fp; - unsigned long long t; - bool truncation = false; - unsigned long long timeout = proc_map_timeout * 100ULL; int rc = 0; const char *hugetlbfs_mnt = hugetlbfs__mountpoint(); int hugetlbfs_mnt_len = hugetlbfs_mnt ? strlen(hugetlbfs_mnt) : 0; + (void) proc_map_timeout; /* XXX */ + if (machine__is_default_guest(machine)) return 0; @@ -350,87 +374,99 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool, } event->header.type = PERF_RECORD_MMAP2; - t = rdclock(); while (1) { - char bf[BUFSIZ]; - char prot[5]; - char execname[PATH_MAX]; + char bf[BUFSIZ], *pbf = bf; char anonstr[] = "//anon"; - unsigned int ino; + char *execname; size_t size; ssize_t n; + u64 tmp; if (fgets(bf, sizeof(bf), fp) == NULL) break; - if ((rdclock() - t) > timeout) { - pr_warning("Reading %s time out. " - "You may want to increase " - "the time limit by --proc-map-timeout\n", -
Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode
On 10/29/2018 6:42 PM, David Miller wrote: From: "Liang, Kan" Date: Mon, 29 Oct 2018 18:32:40 -0400 - struct annotation_options *annotation_options __maybe_unused) + struct annotation_options *annotation_options __maybe_unused, + atomic_t *nr_rb_read __maybe_unused) { What is going on with the indentations of this patch? Sorry, my editor auto wraps the line. The patch has been sent in a separate email. Thanks, Kan
Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode
From: "Liang, Kan" Date: Mon, 29 Oct 2018 18:32:40 -0400 > - struct annotation_options *annotation_options > __maybe_unused) > + struct annotation_options *annotation_options __maybe_unused, > + atomic_t *nr_rb_read __maybe_unused) > { What is going on with the indentations of this patch?
Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode
There is no problem with the message, the problem is the thread where the message is being displayed, just signal the display thread to display the warning, not doing that in the event processing thread. How about this patch (just did some simple test)? It moves the warning to display thread. I will find a KNL and do more test tomorrow. From 78e471c5c153c97f352dca8956ed03af81cb80ea Mon Sep 17 00:00:00 2001 From: Kan Liang Date: Mon, 29 Oct 2018 15:14:27 -0700 Subject: [PATCH] perf top: Move the timeout warning from event processing thread to display thread The main event processing thread may hang if the ring buffer event processing timeouts. Analysis from David Miller: "It hangs the event thread, because the ui call waits for a keypress but the display thread will eat them up and the event thread thus hangs in select()." The timeout warning is moved to display thread. The nr_rb_read is introduced to track the times of perf_top__mmap_read(), which is the main function of event processing. If the nr_rb_read doesn't increase during the refresh time, the display thread may output stale data. The timeout warning will be triggered. The timeout warning can only be triggered one time to avoid the annoying and duplicated warning message. The first perf_top__mmap_read() is moved to after display thread create. Because the perf_top__mmap_read() could cost long time. For example, the function may cost tens of minutes on Knights Landing platform with parallel kernel build. There will be nothing displayed on the screent. The display thread has to be created before perf_top__mmap_read(). But at that time, the data is not ready. Sleep the refresh time in display thread. Fix: 8cc42de736b6 ("perf top: Check the latency of perf_top__mmap_read()") Reported-by: David Miller Signed-off-by: Kan Liang --- tools/perf/builtin-c2c.c | 4 +-- tools/perf/builtin-report.c| 3 ++- tools/perf/builtin-top.c | 39 +++- tools/perf/ui/browsers/hists.c | 58 ++ tools/perf/ui/browsers/hists.h | 2 +- tools/perf/util/hist.h | 6 +++-- tools/perf/util/top.h | 1 + 7 files changed, 85 insertions(+), 28 deletions(-) diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c index f3aa9d0..1e77515 100644 --- a/tools/perf/builtin-c2c.c +++ b/tools/perf/builtin-c2c.c @@ -2371,7 +2371,7 @@ static int perf_c2c__browse_cacheline(struct hist_entry *he) c2c_browser__update_nr_entries(browser); while (1) { - key = hist_browser__run(browser, "? - help", true); + key = hist_browser__run(browser, "? - help", true, NULL); switch (key) { case 's': @@ -2440,7 +2440,7 @@ static int perf_c2c__hists_browse(struct hists *hists) c2c_browser__update_nr_entries(browser); while (1) { - key = hist_browser__run(browser, "? - help", true); + key = hist_browser__run(browser, "? - help", true, NULL); switch (key) { case 'q': diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 76e12bc..ed908e6 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -562,7 +562,8 @@ static int report__browse_hists(struct report *rep) ret = perf_evlist__tui_browse_hists(evlist, help, NULL, rep->min_percent, &session->header.env, - true, &rep->annotation_opts); + true, &rep->annotation_opts, + NULL); /* * Usually "ret" is the last pressed key, and we only * care if the key notifies us to switch data file. diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c index d21d875..95409de 100644 --- a/tools/perf/builtin-top.c +++ b/tools/perf/builtin-top.c @@ -584,6 +584,8 @@ static void *display_thread_tui(void *arg) .refresh= top->delay_secs, }; + sleep(top->delay_secs); + /* In order to read symbols from other namespaces perf to needs to call * setns(2). This isn't permitted if the struct_fs has multiple users. * unshare(2) the fs so that we may continue to setns into namespaces @@ -607,7 +609,8 @@ static void *display_thread_tui(void *arg) top->min_percent, &top->session->header.env, !top->record_opts.overwrite, - &top->annotation_opts); + &top->annotation_opts, + &top->nr_rb_read); done = 1; return NULL; @@ -633,6 +636,11 @@ static void
Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode
From: "Liang, Kan" Date: Mon, 29 Oct 2018 14:20:15 -0400 > You didn't see any warning before the patch. I think it is just > because perf top hides the problem. Quite honestly, the last time I played around with this: 1) The new ring buffer mode didn't exist 2) perf started up much more quickly and was much more responsive than it is these days It used to handle a 128-cpu system doing a parallel kernel build with no problems, no dropped events, nothing. Something has changed to make perf more bloated and slow, and one by one I'm trying to identify and deal with these issues rather than just make perf abort when it can't keep up which is the approach that has seem to have taken over. That's to me is just wrong. One point I want to make clear, dropping things like mmap events will make perf run more slowly not more fast. I keep trying to explain this over and over. If you drop map events, you have no range into which to fit samples. Therefore samples create a unique histogram entries, and the histogram table grows to be quite huge. And this slows down perf significantly because every new sample and histogram decay walks this table, sorting it, killing off old entries, finding a place for new ones, etc. I really think dropping events causes more harm than good in this case, therefore. This also happens when perf cannot find a symbol, for example in a binary or library with no symbols. I see this as an area for significant improvement.
Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode
Em Mon, Oct 29, 2018 at 02:20:15PM -0400, Liang, Kan escreveu: > > > On 10/29/2018 1:48 PM, David Miller wrote: > > From: "Liang, Kan" > > Date: Mon, 29 Oct 2018 13:42:56 -0400 > > > > > > > > > > > On 10/29/2018 1:40 PM, David Miller wrote: > > > > From: "Liang, Kan" > > > > Date: Mon, 29 Oct 2018 10:33:06 -0400 > > > > > > > > > I just realized that the problem in KNL will be back if we switch > > > > > back to non-overwrite mode. > > > > What is KNL? > > > > > > > Intel Xeon Phi Processor, Knights Landing. > > > > I don't understand how a specific piece of hardware directly leads to > > ring buffer processing timeouts, or multi-minute thread map processing > > times... > > Perf top processes all samples in a serial way. With the number of CPU > increasing under the heavy load, the number of samples increase > dramatically. The processing time also increase significantly. > When the processing time is longer than display refresh time, only the stale > data is shown. > > I use KNL as an example. Because the problem is even worse on KNL. There is > nothing output with perf top. > > In theory, it's a problem for all large scale platforms. > > > > > You'll have to explain all of the details of your test scenerio, and > > the exact problems triggers, which > > My test was the same as yours, just running a parallel kernel build on KNL. > > > caused you to write these patches > > which causes serious regressions for what I consider a core simple use > > case of perf top. > > I agree that the warning message is annoying. I will try to find another way > to deliver the message. But I think we do need the warning message. There is no problem with the message, the problem is the thread where the message is being displayed, just signal the display thread to display the warning, not doing that in the event processing thread. > You didn't see any warning before the patch. I think it is just because perf > top hides the problem. > > Thanks, > Kan > > > > And that's running perf top during a parallel kernel build. > > > >
Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode
On 10/29/2018 1:48 PM, David Miller wrote: From: "Liang, Kan" Date: Mon, 29 Oct 2018 13:42:56 -0400 On 10/29/2018 1:40 PM, David Miller wrote: From: "Liang, Kan" Date: Mon, 29 Oct 2018 10:33:06 -0400 I just realized that the problem in KNL will be back if we switch back to non-overwrite mode. What is KNL? Intel Xeon Phi Processor, Knights Landing. I don't understand how a specific piece of hardware directly leads to ring buffer processing timeouts, or multi-minute thread map processing times... Perf top processes all samples in a serial way. With the number of CPU increasing under the heavy load, the number of samples increase dramatically. The processing time also increase significantly. When the processing time is longer than display refresh time, only the stale data is shown. I use KNL as an example. Because the problem is even worse on KNL. There is nothing output with perf top. In theory, it's a problem for all large scale platforms. You'll have to explain all of the details of your test scenerio, and the exact problems triggers, which My test was the same as yours, just running a parallel kernel build on KNL. caused you to write these patches which causes serious regressions for what I consider a core simple use case of perf top. I agree that the warning message is annoying. I will try to find another way to deliver the message. But I think we do need the warning message. You didn't see any warning before the patch. I think it is just because perf top hides the problem. Thanks, Kan And that's running perf top during a parallel kernel build.
Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode
Em Mon, Oct 29, 2018 at 10:43:21AM -0700, David Miller escreveu: > From: "Liang, Kan" > Date: Mon, 29 Oct 2018 11:11:25 -0400 > > > The processing time for each perf_top__mmap_read_idx() should not that > > long. We may check it after each perf_top__mmap_read_idx(). Also > > change the ui_warning to one-time warning. The patch as below can do > > that (not test). > > You cannot make ui__*() calls from the event processing thread. > > I tried to make this point strongly over the weekend. > > It hangs the event thread, because the ui call waits for a keypress > but the display thread will eat them up and the event thread thus > hangs in select(). > > So, once more, all ui calls must be avoided in event processing code. > > Said another way, it is not legal to call UI interfaces from any > code that could be called by the event thread. Right, that global in fact needs to be set in the event processing and checked in the display thread, start it with 0, switch to 1 and then to 2 to say it was processed just once, etc. - Arnaldo
Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode
Em Mon, Oct 29, 2018 at 10:40:08AM -0700, David Miller escreveu: > From: "Liang, Kan" > Date: Mon, 29 Oct 2018 10:33:06 -0400 > > > I just realized that the problem in KNL will be back if we switch > > back to non-overwrite mode. > >> What is KNL? I guess its Knights Landing. https://en.wikipedia.org/wiki/Xeon_Phi - Arnaldo
Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode
From: "Liang, Kan" Date: Mon, 29 Oct 2018 13:42:56 -0400 > > > On 10/29/2018 1:40 PM, David Miller wrote: >> From: "Liang, Kan" >> Date: Mon, 29 Oct 2018 10:33:06 -0400 >> >>> I just realized that the problem in KNL will be back if we switch >>> back to non-overwrite mode. >> What is KNL? >> > Intel Xeon Phi Processor, Knights Landing. I don't understand how a specific piece of hardware directly leads to ring buffer processing timeouts, or multi-minute thread map processing times... You'll have to explain all of the details of your test scenerio, and the exact problems triggers, which caused you to write these patches which causes serious regressions for what I consider a core simple use case of perf top. And that's running perf top during a parallel kernel build.
Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode
From: "Liang, Kan" Date: Mon, 29 Oct 2018 11:11:25 -0400 > The processing time for each perf_top__mmap_read_idx() should not that > long. We may check it after each perf_top__mmap_read_idx(). Also > change the ui_warning to one-time warning. The patch as below can do > that (not test). You cannot make ui__*() calls from the event processing thread. I tried to make this point strongly over the weekend. It hangs the event thread, because the ui call waits for a keypress but the display thread will eat them up and the event thread thus hangs in select(). So, once more, all ui calls must be avoided in event processing code. Said another way, it is not legal to call UI interfaces from any code that could be called by the event thread.
Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode
On 10/29/2018 1:40 PM, David Miller wrote: From: "Liang, Kan" Date: Mon, 29 Oct 2018 10:33:06 -0400 I just realized that the problem in KNL will be back if we switch back to non-overwrite mode. What is KNL? Intel Xeon Phi Processor, Knights Landing. Thanks, Kan
Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode
From: "Liang, Kan" Date: Mon, 29 Oct 2018 10:33:06 -0400 > I just realized that the problem in KNL will be back if we switch > back to non-overwrite mode. What is KNL?
Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode
On 10/29/2018 10:35 AM, Arnaldo Carvalho de Melo wrote: Em Mon, Oct 29, 2018 at 10:33:06AM -0400, Liang, Kan escreveu: On 10/29/2018 9:03 AM, Arnaldo Carvalho de Melo wrote: Em Fri, Oct 26, 2018 at 04:11:51PM -0400, Liang, Kan escreveu: On 10/26/2018 3:24 PM, Arnaldo Carvalho de Melo wrote: Em Fri, Oct 26, 2018 at 03:16:29PM -0400, Liang, Kan escreveu: It is mainly for performance reason to switch to overwrite mode. The impact was very small when I did my test. But now the effect is easily noticeable in other tests. Yes, I agree. We may change it back to non-overwrite mode until the issue is addressed. So, I have these two patches in my perf/core branch, with Fixes tags that will make them get to the stable kernels, ok? I just realized that the problem in KNL will be back if we switch back to non-overwrite mode. The problem is that users have to wait tens of minutes to see perf top results on the screen in KNL. Before that, there is nothing but a black screen. Sorry I didn't notice it last Friday. Because I thought the ui_warning in perf_top__mmap_read() can give user a hint. So the user can switch to overwrite mode manually. But unfortunately, the ui_warning doesn't work. Because it is called after perf_top__mmap_read(). The processing time of perf_top__mmap_read() could be tens of minutes. So we need a way to notice that we're in a machine like that and warn the user before the wait takes place, ideas on how to do that? The processing time for each perf_top__mmap_read_idx() should not that long. We may check it after each perf_top__mmap_read_idx(). Also change the ui_warning to one-time warning. The patch as below can do that (not test). diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c index d21d875..5e532e0 100644 --- a/tools/perf/builtin-top.c +++ b/tools/perf/builtin-top.c @@ -877,31 +877,40 @@ static void perf_top__mmap_read_idx(struct perf_top *top, int idx) perf_mmap__read_done(md); } +static bool check_processing_time = true; + static void perf_top__mmap_read(struct perf_top *top) { bool overwrite = top->record_opts.overwrite; struct perf_evlist *evlist = top->evlist; - unsigned long long start, end; + unsigned long long start, end, tolerance; int i; - start = rdclock(); if (overwrite) perf_evlist__toggle_bkw_mmap(evlist, BKW_MMAP_DATA_PENDING); - for (i = 0; i < top->evlist->nr_mmaps; i++) + tolerance = (unsigned long long)top->delay_secs * NSEC_PER_SEC / top->evlist->nr_mmaps; + start = rdclock(); + for (i = 0; i < top->evlist->nr_mmaps; i++) { perf_top__mmap_read_idx(top, i); + if (check_processing_time) { + end = rdclock(); + + if ((end - start) > tolerance) { + ui__warning("Too slow to read ring buffer.\n" + "Please try increasing the period (-c) or\n" + "decreasing the freq (-F) or\n" + "limiting the number of CPUs (-C)\n"); + check_processing_time = false; + } + start = end; + } + } if (overwrite) { perf_evlist__toggle_bkw_mmap(evlist, BKW_MMAP_EMPTY); perf_evlist__toggle_bkw_mmap(evlist, BKW_MMAP_RUNNING); } - end = rdclock(); - - if ((end - start) > (unsigned long long)top->delay_secs * NSEC_PER_SEC) - ui__warning("Too slow to read ring buffer.\n" - "Please try increasing the period (-c) or\n" - "decreasing the freq (-F) or\n" - "limiting the number of CPUs (-C)\n"); } /*
Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode
Em Mon, Oct 29, 2018 at 10:33:06AM -0400, Liang, Kan escreveu: > On 10/29/2018 9:03 AM, Arnaldo Carvalho de Melo wrote: > > Em Fri, Oct 26, 2018 at 04:11:51PM -0400, Liang, Kan escreveu: > > > On 10/26/2018 3:24 PM, Arnaldo Carvalho de Melo wrote: > > > > Em Fri, Oct 26, 2018 at 03:16:29PM -0400, Liang, Kan escreveu: > > > It is mainly for performance reason to switch to overwrite mode. The > > > impact > > > was very small when I did my test. But now the effect is easily noticeable > > > in other tests. Yes, I agree. We may change it back to non-overwrite mode > > > until the issue is addressed. > > So, I have these two patches in my perf/core branch, with Fixes tags > > that will make them get to the stable kernels, ok? > I just realized that the problem in KNL will be back if we switch back to > non-overwrite mode. > The problem is that users have to wait tens of minutes to see perf top > results on the screen in KNL. Before that, there is nothing but a black > screen. > Sorry I didn't notice it last Friday. Because I thought the ui_warning in > perf_top__mmap_read() can give user a hint. So the user can switch to > overwrite mode manually. > But unfortunately, the ui_warning doesn't work. Because it is called after > perf_top__mmap_read(). The processing time of perf_top__mmap_read() could be > tens of minutes. So we need a way to notice that we're in a machine like that and warn the user before the wait takes place, ideas on how to do that? - Arnaldo
Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode
On 10/29/2018 9:03 AM, Arnaldo Carvalho de Melo wrote: Em Fri, Oct 26, 2018 at 04:11:51PM -0400, Liang, Kan escreveu: On 10/26/2018 3:24 PM, Arnaldo Carvalho de Melo wrote: Em Fri, Oct 26, 2018 at 03:16:29PM -0400, Liang, Kan escreveu: On 10/26/2018 3:12 PM, Arnaldo Carvalho de Melo wrote: Em Fri, Oct 26, 2018 at 03:07:40PM -0400, Liang, Kan escreveu: On 10/26/2018 3:02 PM, Arnaldo Carvalho de Melo wrote: I checked and both have the same result. But I still think there is value in having the shorter form, ok? Sure. Ok. I think that we should default back to --no-overwrite till we get this sorted out, as the effect is easily noticeable, as David reported and I reproduced, when doing kernel builds. It is mainly for performance reason to switch to overwrite mode. The impact was very small when I did my test. But now the effect is easily noticeable in other tests. Yes, I agree. We may change it back to non-overwrite mode until the issue is addressed. So, I have these two patches in my perf/core branch, with Fixes tags that will make them get to the stable kernels, ok? I just realized that the problem in KNL will be back if we switch back to non-overwrite mode. The problem is that users have to wait tens of minutes to see perf top results on the screen in KNL. Before that, there is nothing but a black screen. Sorry I didn't notice it last Friday. Because I thought the ui_warning in perf_top__mmap_read() can give user a hint. So the user can switch to overwrite mode manually. But unfortunately, the ui_warning doesn't work. Because it is called after perf_top__mmap_read(). The processing time of perf_top__mmap_read() could be tens of minutes. Thanks, Kan From f54ef0e7342efb77205b2faaacbcb81cdd31f064 Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Fri, 26 Oct 2018 15:55:23 -0300 Subject: [PATCH 1/2] perf top: Allow disabling the overwrite mode In ebebbf082357 ("perf top: Switch default mode to overwrite mode") we forgot to leave a way to disable that new default, add a --overwrite option that can be disabled using --no-overwrite, since the code already in such a way that we can readily disable this mode. This is useful when investigating bugs with this mode like the recent report from David Miller where lots of unknown symbols appear due to disabling the events while processing them which disables all record types, not just PERF_RECORD_SAMPLE, which makes it impossible to resolve maps when we lose PERF_RECORD_MMAP records. This can be easily seen while building a kernel, when there are lots of short lived processes. Reported-by: David Miller Acked-by: Kan Liang Cc: Adrian Hunter Cc: Andi Kleen Cc: David Ahern Cc: Jin Yao Cc: Jiri Olsa Cc: Namhyung Kim Cc: Peter Zijlstra Cc: Wang Nan Fixes: ebebbf082357 ("perf top: Switch default mode to overwrite mode") Link: https://lkml.kernel.org/n/tip-oqgsz2bq4kgrnnajrafcd...@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/Documentation/perf-top.txt | 5 + tools/perf/builtin-top.c | 2 ++ 2 files changed, 7 insertions(+) diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt index 114fda12aa49..d4be6061fe1c 100644 --- a/tools/perf/Documentation/perf-top.txt +++ b/tools/perf/Documentation/perf-top.txt @@ -242,6 +242,11 @@ Default is to monitor all CPUS. --hierarchy:: Enable hierarchy output. +--overwrite:: + This is the default, but for investigating problems with it or any other strange + behaviour like lots of unknown samples, we may want to disable this mode by using + --no-overwrite. + --force:: Don't do ownership validation. diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c index d21d8751e749..214fad747b04 100644 --- a/tools/perf/builtin-top.c +++ b/tools/perf/builtin-top.c @@ -1372,6 +1372,8 @@ int cmd_top(int argc, const char **argv) "Show raw trace event output (do not use print fmt or plugins)"), OPT_BOOLEAN(0, "hierarchy", &symbol_conf.report_hierarchy, "Show entries in a hierarchy"), + OPT_BOOLEAN(0, "overwrite", &top.record_opts.overwrite, + "Use a backward ring buffer, default: yes"), OPT_BOOLEAN(0, "force", &symbol_conf.force, "don't complain, do it"), OPT_UINTEGER(0, "num-thread-synthesize", &top.nr_threads_synthesize, "number of thread to run event synthesize"),
[PATCHES/RFC] Re: A concern about overflow ring buffer mode
Em Fri, Oct 26, 2018 at 04:11:51PM -0400, Liang, Kan escreveu: > On 10/26/2018 3:24 PM, Arnaldo Carvalho de Melo wrote: > > Em Fri, Oct 26, 2018 at 03:16:29PM -0400, Liang, Kan escreveu: > > > On 10/26/2018 3:12 PM, Arnaldo Carvalho de Melo wrote: > > > > Em Fri, Oct 26, 2018 at 03:07:40PM -0400, Liang, Kan escreveu: > > > > > On 10/26/2018 3:02 PM, Arnaldo Carvalho de Melo wrote: > > > > I checked and both have the same result. But I still think there is > > > > value in having the shorter form, ok? > > > Sure. > > Ok. > > I think that we should default back to --no-overwrite till we get this > > sorted out, as the effect is easily noticeable, as David reported and I > > reproduced, when doing kernel builds. > It is mainly for performance reason to switch to overwrite mode. The impact > was very small when I did my test. But now the effect is easily noticeable > in other tests. Yes, I agree. We may change it back to non-overwrite mode > until the issue is addressed. So, I have these two patches in my perf/core branch, with Fixes tags that will make them get to the stable kernels, ok? >From f54ef0e7342efb77205b2faaacbcb81cdd31f064 Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Fri, 26 Oct 2018 15:55:23 -0300 Subject: [PATCH 1/2] perf top: Allow disabling the overwrite mode In ebebbf082357 ("perf top: Switch default mode to overwrite mode") we forgot to leave a way to disable that new default, add a --overwrite option that can be disabled using --no-overwrite, since the code already in such a way that we can readily disable this mode. This is useful when investigating bugs with this mode like the recent report from David Miller where lots of unknown symbols appear due to disabling the events while processing them which disables all record types, not just PERF_RECORD_SAMPLE, which makes it impossible to resolve maps when we lose PERF_RECORD_MMAP records. This can be easily seen while building a kernel, when there are lots of short lived processes. Reported-by: David Miller Acked-by: Kan Liang Cc: Adrian Hunter Cc: Andi Kleen Cc: David Ahern Cc: Jin Yao Cc: Jiri Olsa Cc: Namhyung Kim Cc: Peter Zijlstra Cc: Wang Nan Fixes: ebebbf082357 ("perf top: Switch default mode to overwrite mode") Link: https://lkml.kernel.org/n/tip-oqgsz2bq4kgrnnajrafcd...@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/Documentation/perf-top.txt | 5 + tools/perf/builtin-top.c | 2 ++ 2 files changed, 7 insertions(+) diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt index 114fda12aa49..d4be6061fe1c 100644 --- a/tools/perf/Documentation/perf-top.txt +++ b/tools/perf/Documentation/perf-top.txt @@ -242,6 +242,11 @@ Default is to monitor all CPUS. --hierarchy:: Enable hierarchy output. +--overwrite:: + This is the default, but for investigating problems with it or any other strange + behaviour like lots of unknown samples, we may want to disable this mode by using + --no-overwrite. + --force:: Don't do ownership validation. diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c index d21d8751e749..214fad747b04 100644 --- a/tools/perf/builtin-top.c +++ b/tools/perf/builtin-top.c @@ -1372,6 +1372,8 @@ int cmd_top(int argc, const char **argv) "Show raw trace event output (do not use print fmt or plugins)"), OPT_BOOLEAN(0, "hierarchy", &symbol_conf.report_hierarchy, "Show entries in a hierarchy"), + OPT_BOOLEAN(0, "overwrite", &top.record_opts.overwrite, + "Use a backward ring buffer, default: yes"), OPT_BOOLEAN(0, "force", &symbol_conf.force, "don't complain, do it"), OPT_UINTEGER(0, "num-thread-synthesize", &top.nr_threads_synthesize, "number of thread to run event synthesize"), -- 2.14.4 >From 5af190cac4ec10c53f1a934e7bbd30da7e616b22 Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Mon, 29 Oct 2018 09:47:00 -0300 Subject: [PATCH 2/2] perf top: Do not use overwrite mode by default Enabling --overwrite mode allows us to to use just the most recent records, which helps in high core count machines such as Knights Landing/Mill, but right now is being disabled by default as the pausing used in this technique is leading to loss of metadata events such as PERF_RECORD_MMAP which makes 'perf top' unable to resolve samples, leading to lots of unknown samples appearing on the UI. Enabling this may be useful if you are in such machines and profiling a workload that doesn't creates short lived threads and/or doesn't uses many executable mmap operations. Work is being planed to solve this situation, till then, this will remain disabled by default. Reported-by: David Miller Acked-by: Kan Liang Link: https://lkml.kernel.org/r/4f84468f-37d9-cf1b-12c1-514ef74b6...@linux.intel.com Cc: Adrian Hunter Cc: David Ahern Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wan