Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode

2018-10-30 Thread David Miller
From: "Liang, Kan" 
Date: Mon, 29 Oct 2018 10:33:06 -0400

> The problem is that users have to wait tens of minutes to see perf
> top results on the screen in KNL. Before that, there is nothing but
> a black screen.

I'm actually curious why so I started digging last night.

First, I made perf top exit when it is done synthesizing existing
threads and profiled this during a full workload.

It takes about 3 seconds on this 128 cpu sparc64 system.

Some curious things in there, first of all we spend a lot of time
looking for the hugetlbfs mount point.

And sure enough the FS ABI code keeps reparsing the entire
/proc/mounts file every time hugetlbfs__mountpoint() is called.  It's
logic is that if the mountpoint wasn't found on a previous call it
rechecks everything.  For perf's usage, this is unnecessary overhead.

Simply mounting hugetlbfs gave me half a second back in startup time,
so I was down to 2.5 seconds from 3 seconds already.

Next I found that perf execution time during thread synthesis is
dominated by sscanf() processing.  So I went into the history books
and found that back in the 3.x days we parsed the file by hand, so I
brought that code back and extended it for what mmap2 events need.

That patch is at the end of this email, ignore the XXX bits as I was
too lazy to remove all of the mmap timeout code but I am absolutely
certain that is the right thing to do.

This gave me another half second or so back, and startup to this end
of thread synthesization is now less than 2 seconds for this workload.

Next, I let the perf top startup continue up until right before the
display thread is started.  This takes a minute or more total, and is
dominated by:

 time   seconds   secondscalls   s/call   s/call  name
 28.30 12.7012.70  1927341 0.00 0.00  __hists__add_entry
 12.77 18.43 5.73 27844359 0.00 0.00  sort__dso_cmp
 10.21 23.01 4.58 23074107 0.00 0.00  sort__sym_cmp
  8.29 26.73 3.72  1050281 0.00 0.00  dso__find_symbol
  4.95 28.95 2.22 106607184 0.00 0.00  
perf_hpp__is_dynamic_entry

The histogram code is _insanely_ expensive and the algorithmic
complexity is quite high.  It does rbtree walks, doing a dso
comparison and then a symbol comparison at each and every step in the
walk.  The symbol comparison is a full blown strcmp() and the
histogram table in this situation is huge.

This is where the real problems are, not in the simple mmap processing
and other places where we've put timouts and other hacks that really
don't try to address the fundamental problems perf has in these
situations.

Thanks.

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index bc64618..70597fd 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -317,6 +318,30 @@ static int perf_event__synthesize_fork(struct perf_tool 
*tool,
return 0;
 }
 
+static int dec(char ch)
+{
+   if ((ch >= '0') && (ch <= '9'))
+   return ch - '0';
+   return -1;
+}
+
+static int dec2u64(const char *ptr, u64 *long_val)
+{
+   const char *p = ptr;
+
+   *long_val = 0;
+   while (*p) {
+   const int dec_val = dec(*p);
+
+   if (dec_val < 0)
+   break;
+
+   *long_val = (*long_val * 10) + dec_val;
+   p++;
+   }
+   return p - ptr;
+}
+
 int perf_event__synthesize_mmap_events(struct perf_tool *tool,
   union perf_event *event,
   pid_t pid, pid_t tgid,
@@ -327,13 +352,12 @@ int perf_event__synthesize_mmap_events(struct perf_tool 
*tool,
 {
char filename[PATH_MAX];
FILE *fp;
-   unsigned long long t;
-   bool truncation = false;
-   unsigned long long timeout = proc_map_timeout * 100ULL;
int rc = 0;
const char *hugetlbfs_mnt = hugetlbfs__mountpoint();
int hugetlbfs_mnt_len = hugetlbfs_mnt ? strlen(hugetlbfs_mnt) : 0;
 
+   (void) proc_map_timeout; /* XXX */
+
if (machine__is_default_guest(machine))
return 0;
 
@@ -350,87 +374,99 @@ int perf_event__synthesize_mmap_events(struct perf_tool 
*tool,
}
 
event->header.type = PERF_RECORD_MMAP2;
-   t = rdclock();
 
while (1) {
-   char bf[BUFSIZ];
-   char prot[5];
-   char execname[PATH_MAX];
+   char bf[BUFSIZ], *pbf = bf;
char anonstr[] = "//anon";
-   unsigned int ino;
+   char *execname;
size_t size;
ssize_t n;
+   u64 tmp;
 
if (fgets(bf, sizeof(bf), fp) == NULL)
break;
 
-   if ((rdclock() - t) > timeout) {
-   pr_warning("Reading %s time out. "
-  "You may want to increase "
-  "the time limit by --proc-map-timeout\n",
-

Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode

2018-10-29 Thread Liang, Kan




On 10/29/2018 6:42 PM, David Miller wrote:

From: "Liang, Kan" 
Date: Mon, 29 Oct 2018 18:32:40 -0400


- struct annotation_options *annotation_options 
__maybe_unused)
+ struct annotation_options *annotation_options __maybe_unused,
+ atomic_t *nr_rb_read __maybe_unused)
  {


What is going on with the indentations of this patch?



Sorry, my editor auto wraps the line.
The patch has been sent in a separate email.

Thanks,
Kan


Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode

2018-10-29 Thread David Miller
From: "Liang, Kan" 
Date: Mon, 29 Oct 2018 18:32:40 -0400

> -   struct annotation_options *annotation_options 
> __maybe_unused)
> + struct annotation_options *annotation_options __maybe_unused,
> +   atomic_t *nr_rb_read __maybe_unused)
>  {

What is going on with the indentations of this patch?


Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode

2018-10-29 Thread Liang, Kan




There is no problem with the message, the problem is the thread where
the message is being displayed, just signal the display thread to
display the warning, not doing that in the event processing thread.
  


How about this patch (just did some simple test)? It moves the warning 
to display thread.

I will find a KNL and do more test tomorrow.

From 78e471c5c153c97f352dca8956ed03af81cb80ea Mon Sep 17 00:00:00 2001
From: Kan Liang 
Date: Mon, 29 Oct 2018 15:14:27 -0700
Subject: [PATCH] perf top: Move the timeout warning from event 
processing thread to display thread


The main event processing thread may hang if the ring buffer event
processing timeouts.

Analysis from David Miller:
"It hangs the event thread, because the ui call waits for a keypress
but the display thread will eat them up and the event thread thus
hangs in select()."

The timeout warning is moved to display thread.

The nr_rb_read is introduced to track the times of
perf_top__mmap_read(), which is the main function of event processing.
If the nr_rb_read doesn't increase during the refresh time, the display
thread may output stale data. The timeout warning will be triggered.

The timeout warning can only be triggered one time to avoid the annoying
and duplicated warning message.

The first perf_top__mmap_read() is moved to after display thread create.
Because the perf_top__mmap_read() could cost long time. For example, the
function may cost tens of minutes on Knights Landing platform with
parallel kernel build. There will be nothing displayed on the screent.
The display thread has to be created before perf_top__mmap_read(). But
at that time, the data is not ready. Sleep the refresh time in display
thread.

Fix: 8cc42de736b6 ("perf top: Check the latency of
perf_top__mmap_read()")
Reported-by: David Miller 
Signed-off-by: Kan Liang 
---
 tools/perf/builtin-c2c.c   |  4 +--
 tools/perf/builtin-report.c|  3 ++-
 tools/perf/builtin-top.c   | 39 +++-
 tools/perf/ui/browsers/hists.c | 58 
++

 tools/perf/ui/browsers/hists.h |  2 +-
 tools/perf/util/hist.h |  6 +++--
 tools/perf/util/top.h  |  1 +
 7 files changed, 85 insertions(+), 28 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index f3aa9d0..1e77515 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2371,7 +2371,7 @@ static int perf_c2c__browse_cacheline(struct 
hist_entry *he)

c2c_browser__update_nr_entries(browser);

while (1) {
-   key = hist_browser__run(browser, "? - help", true);
+   key = hist_browser__run(browser, "? - help", true, NULL);

switch (key) {
case 's':
@@ -2440,7 +2440,7 @@ static int perf_c2c__hists_browse(struct hists *hists)
c2c_browser__update_nr_entries(browser);

while (1) {
-   key = hist_browser__run(browser, "? - help", true);
+   key = hist_browser__run(browser, "? - help", true, NULL);

switch (key) {
case 'q':
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 76e12bc..ed908e6 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -562,7 +562,8 @@ static int report__browse_hists(struct report *rep)
ret = perf_evlist__tui_browse_hists(evlist, help, NULL,
rep->min_percent,
&session->header.env,
-   true, 
&rep->annotation_opts);
+   true, &rep->annotation_opts,
+   NULL);
/*
 * Usually "ret" is the last pressed key, and we only
 * care if the key notifies us to switch data file.
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index d21d875..95409de 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -584,6 +584,8 @@ static void *display_thread_tui(void *arg)
.refresh= top->delay_secs,
};

+   sleep(top->delay_secs);
+
/* In order to read symbols from other namespaces perf to  needs to call
 * setns(2).  This isn't permitted if the struct_fs has multiple users.
 * unshare(2) the fs so that we may continue to setns into namespaces
@@ -607,7 +609,8 @@ static void *display_thread_tui(void *arg)
  top->min_percent,
  &top->session->header.env,
  !top->record_opts.overwrite,
- &top->annotation_opts);
+ &top->annotation_opts,
+ &top->nr_rb_read);

done = 1;
return NULL;
@@ -633,6 +636,11 @@ static void 

Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode

2018-10-29 Thread David Miller
From: "Liang, Kan" 
Date: Mon, 29 Oct 2018 14:20:15 -0400

> You didn't see any warning before the patch. I think it is just
> because perf top hides the problem.

Quite honestly, the last time I played around with this:

1) The new ring buffer mode didn't exist

2) perf started up much more quickly and was much more responsive
   than it is these days

It used to handle a 128-cpu system doing a parallel kernel build
with no problems, no dropped events, nothing.

Something has changed to make perf more bloated and slow, and one by
one I'm trying to identify and deal with these issues rather than
just make perf abort when it can't keep up which is the approach
that has seem to have taken over.  That's to me is just wrong.

One point I want to make clear, dropping things like mmap events will
make perf run more slowly not more fast.

I keep trying to explain this over and over.

If you drop map events, you have no range into which to fit samples.

Therefore samples create a unique histogram entries, and the histogram
table grows to be quite huge.  And this slows down perf significantly
because every new sample and histogram decay walks this table, sorting
it, killing off old entries, finding a place for new ones, etc.

I really think dropping events causes more harm than good in this
case, therefore.

This also happens when perf cannot find a symbol, for example in a
binary or library with no symbols.  I see this as an area for
significant improvement.




Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode

2018-10-29 Thread Arnaldo Carvalho de Melo
Em Mon, Oct 29, 2018 at 02:20:15PM -0400, Liang, Kan escreveu:
> 
> 
> On 10/29/2018 1:48 PM, David Miller wrote:
> > From: "Liang, Kan" 
> > Date: Mon, 29 Oct 2018 13:42:56 -0400
> > 
> > > 
> > > 
> > > On 10/29/2018 1:40 PM, David Miller wrote:
> > > > From: "Liang, Kan" 
> > > > Date: Mon, 29 Oct 2018 10:33:06 -0400
> > > > 
> > > > > I just realized that the problem in KNL will be back if we switch
> > > > > back to non-overwrite mode.
> > > > What is KNL?
> > > > 
> > > Intel Xeon Phi Processor, Knights Landing.
> > 
> > I don't understand how a specific piece of hardware directly leads to
> > ring buffer processing timeouts, or multi-minute thread map processing
> > times...
> 
> Perf top processes all samples in a serial way. With the number of CPU
> increasing under the heavy load, the number of samples increase
> dramatically. The processing time also increase significantly.
> When the processing time is longer than display refresh time, only the stale
> data is shown.
> 
> I use KNL as an example. Because the problem is even worse on KNL. There is
> nothing output with perf top.
> 
> In theory, it's a problem for all large scale platforms.
> 
> > 
> > You'll have to explain all of the details of your test scenerio, and
> > the exact problems triggers, which
> 
> My test was the same as yours, just running a parallel kernel build on KNL.
> 
> > caused you to write these patches
> > which causes serious regressions for what I consider a core simple use
> > case of perf top.
> 
> I agree that the warning message is annoying. I will try to find another way
> to deliver the message. But I think we do need the warning message.

There is no problem with the message, the problem is the thread where
the message is being displayed, just signal the display thread to
display the warning, not doing that in the event processing thread.
 
> You didn't see any warning before the patch. I think it is just because perf
> top hides the problem.
> 
> Thanks,
> Kan
> > 
> > And that's running perf top during a parallel kernel build.
> > 
> > 


Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode

2018-10-29 Thread Liang, Kan




On 10/29/2018 1:48 PM, David Miller wrote:

From: "Liang, Kan" 
Date: Mon, 29 Oct 2018 13:42:56 -0400




On 10/29/2018 1:40 PM, David Miller wrote:

From: "Liang, Kan" 
Date: Mon, 29 Oct 2018 10:33:06 -0400


I just realized that the problem in KNL will be back if we switch
back to non-overwrite mode.

What is KNL?


Intel Xeon Phi Processor, Knights Landing.


I don't understand how a specific piece of hardware directly leads to
ring buffer processing timeouts, or multi-minute thread map processing
times...


Perf top processes all samples in a serial way. With the number of CPU 
increasing under the heavy load, the number of samples increase 
dramatically. The processing time also increase significantly.
When the processing time is longer than display refresh time, only the 
stale data is shown.


I use KNL as an example. Because the problem is even worse on KNL. There 
is nothing output with perf top.


In theory, it's a problem for all large scale platforms.



You'll have to explain all of the details of your test scenerio, and
the exact problems triggers, which


My test was the same as yours, just running a parallel kernel build on KNL.


caused you to write these patches
which causes serious regressions for what I consider a core simple use
case of perf top.


I agree that the warning message is annoying. I will try to find another 
way to deliver the message. But I think we do need the warning message.


You didn't see any warning before the patch. I think it is just because 
perf top hides the problem.


Thanks,
Kan


And that's running perf top during a parallel kernel build.




Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode

2018-10-29 Thread Arnaldo Carvalho de Melo
Em Mon, Oct 29, 2018 at 10:43:21AM -0700, David Miller escreveu:
> From: "Liang, Kan" 
> Date: Mon, 29 Oct 2018 11:11:25 -0400
> 
> > The processing time for each perf_top__mmap_read_idx() should not that
> > long. We may check it after each perf_top__mmap_read_idx(). Also
> > change the ui_warning to one-time warning. The patch as below can do
> > that (not test).
> 
> You cannot make ui__*() calls from the event processing thread.
> 
> I tried to make this point strongly over the weekend.
> 
> It hangs the event thread, because the ui call waits for a keypress
> but the display thread will eat them up and the event thread thus
> hangs in select().
> 
> So, once more, all ui calls must be avoided in event processing code.
> 
> Said another way, it is not legal to call UI interfaces from any
> code that could be called by the event thread.

Right, that global in fact needs to be set in the event processing and
checked in the display thread, start it with 0, switch to 1 and then to
2 to say it was processed just once, etc.

- Arnaldo


Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode

2018-10-29 Thread Arnaldo Carvalho de Melo
Em Mon, Oct 29, 2018 at 10:40:08AM -0700, David Miller escreveu:
> From: "Liang, Kan" 
> Date: Mon, 29 Oct 2018 10:33:06 -0400
> 
> > I just realized that the problem in KNL will be back if we switch
> > back to non-overwrite mode.
> 
>> What is KNL?

I guess its Knights Landing.

https://en.wikipedia.org/wiki/Xeon_Phi

- Arnaldo


Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode

2018-10-29 Thread David Miller
From: "Liang, Kan" 
Date: Mon, 29 Oct 2018 13:42:56 -0400

> 
> 
> On 10/29/2018 1:40 PM, David Miller wrote:
>> From: "Liang, Kan" 
>> Date: Mon, 29 Oct 2018 10:33:06 -0400
>> 
>>> I just realized that the problem in KNL will be back if we switch
>>> back to non-overwrite mode.
>> What is KNL?
>>
> Intel Xeon Phi Processor, Knights Landing.

I don't understand how a specific piece of hardware directly leads to
ring buffer processing timeouts, or multi-minute thread map processing
times...

You'll have to explain all of the details of your test scenerio, and
the exact problems triggers, which caused you to write these patches
which causes serious regressions for what I consider a core simple use
case of perf top.

And that's running perf top during a parallel kernel build.




Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode

2018-10-29 Thread David Miller
From: "Liang, Kan" 
Date: Mon, 29 Oct 2018 11:11:25 -0400

> The processing time for each perf_top__mmap_read_idx() should not that
> long. We may check it after each perf_top__mmap_read_idx(). Also
> change the ui_warning to one-time warning. The patch as below can do
> that (not test).

You cannot make ui__*() calls from the event processing thread.

I tried to make this point strongly over the weekend.

It hangs the event thread, because the ui call waits for a keypress
but the display thread will eat them up and the event thread thus
hangs in select().

So, once more, all ui calls must be avoided in event processing code.

Said another way, it is not legal to call UI interfaces from any
code that could be called by the event thread.


Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode

2018-10-29 Thread Liang, Kan




On 10/29/2018 1:40 PM, David Miller wrote:

From: "Liang, Kan" 
Date: Mon, 29 Oct 2018 10:33:06 -0400


I just realized that the problem in KNL will be back if we switch
back to non-overwrite mode.


What is KNL?


Intel Xeon Phi Processor, Knights Landing.

Thanks,
Kan



Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode

2018-10-29 Thread David Miller
From: "Liang, Kan" 
Date: Mon, 29 Oct 2018 10:33:06 -0400

> I just realized that the problem in KNL will be back if we switch
> back to non-overwrite mode.

What is KNL?


Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode

2018-10-29 Thread Liang, Kan




On 10/29/2018 10:35 AM, Arnaldo Carvalho de Melo wrote:

Em Mon, Oct 29, 2018 at 10:33:06AM -0400, Liang, Kan escreveu:

On 10/29/2018 9:03 AM, Arnaldo Carvalho de Melo wrote:

Em Fri, Oct 26, 2018 at 04:11:51PM -0400, Liang, Kan escreveu:

On 10/26/2018 3:24 PM, Arnaldo Carvalho de Melo wrote:

Em Fri, Oct 26, 2018 at 03:16:29PM -0400, Liang, Kan escreveu:

It is mainly for performance reason to switch to overwrite mode. The impact
was very small when I did my test. But now the effect is easily noticeable
in other tests. Yes, I agree. We may change it back to non-overwrite mode
until the issue is addressed.



So, I have these two patches in my perf/core branch, with Fixes tags
that will make them get to the stable kernels, ok?
  

I just realized that the problem in KNL will be back if we switch back to
non-overwrite mode.
The problem is that users have to wait tens of minutes to see perf top
results on the screen in KNL. Before that, there is nothing but a black
screen.
  

Sorry I didn't notice it last Friday. Because I thought the ui_warning in
perf_top__mmap_read() can give user a hint. So the user can switch to
overwrite mode manually.
But unfortunately, the ui_warning doesn't work. Because it is called after
perf_top__mmap_read(). The processing time of perf_top__mmap_read() could be
tens of minutes.


So we need a way to notice that we're in a machine like that and warn
the user before the wait takes place, ideas on how to do that?

The processing time for each perf_top__mmap_read_idx() should not that 
long. We may check it after each perf_top__mmap_read_idx(). Also change 
the ui_warning to one-time warning. The patch as below can do that (not 
test).



diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index d21d875..5e532e0 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -877,31 +877,40 @@ static void perf_top__mmap_read_idx(struct 
perf_top *top, int idx)

perf_mmap__read_done(md);
 }

+static bool check_processing_time = true;
+
 static void perf_top__mmap_read(struct perf_top *top)
 {
bool overwrite = top->record_opts.overwrite;
struct perf_evlist *evlist = top->evlist;
-   unsigned long long start, end;
+   unsigned long long start, end, tolerance;
int i;

-   start = rdclock();
if (overwrite)
perf_evlist__toggle_bkw_mmap(evlist, BKW_MMAP_DATA_PENDING);

-   for (i = 0; i < top->evlist->nr_mmaps; i++)
+	tolerance = (unsigned long long)top->delay_secs * NSEC_PER_SEC / 
top->evlist->nr_mmaps;

+   start = rdclock();
+   for (i = 0; i < top->evlist->nr_mmaps; i++) {
perf_top__mmap_read_idx(top, i);
+   if (check_processing_time) {
+   end = rdclock();
+
+   if ((end - start) > tolerance) {
+   ui__warning("Too slow to read ring buffer.\n"
+   "Please try increasing the period (-c) 
or\n"
+   "decreasing the freq (-F) or\n"
+   "limiting the number of CPUs 
(-C)\n");
+   check_processing_time = false;
+   }
+   start = end;
+   }
+   }

if (overwrite) {
perf_evlist__toggle_bkw_mmap(evlist, BKW_MMAP_EMPTY);
perf_evlist__toggle_bkw_mmap(evlist, BKW_MMAP_RUNNING);
}
-   end = rdclock();
-
-   if ((end - start) > (unsigned long long)top->delay_secs * NSEC_PER_SEC)
-   ui__warning("Too slow to read ring buffer.\n"
-   "Please try increasing the period (-c) or\n"
-   "decreasing the freq (-F) or\n"
-   "limiting the number of CPUs (-C)\n");
 }

 /*


Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode

2018-10-29 Thread Arnaldo Carvalho de Melo
Em Mon, Oct 29, 2018 at 10:33:06AM -0400, Liang, Kan escreveu:
> On 10/29/2018 9:03 AM, Arnaldo Carvalho de Melo wrote:
> > Em Fri, Oct 26, 2018 at 04:11:51PM -0400, Liang, Kan escreveu:
> > > On 10/26/2018 3:24 PM, Arnaldo Carvalho de Melo wrote:
> > > > Em Fri, Oct 26, 2018 at 03:16:29PM -0400, Liang, Kan escreveu:
> > > It is mainly for performance reason to switch to overwrite mode. The 
> > > impact
> > > was very small when I did my test. But now the effect is easily noticeable
> > > in other tests. Yes, I agree. We may change it back to non-overwrite mode
> > > until the issue is addressed.

> > So, I have these two patches in my perf/core branch, with Fixes tags
> > that will make them get to the stable kernels, ok?
 
> I just realized that the problem in KNL will be back if we switch back to
> non-overwrite mode.
> The problem is that users have to wait tens of minutes to see perf top
> results on the screen in KNL. Before that, there is nothing but a black
> screen.
 
> Sorry I didn't notice it last Friday. Because I thought the ui_warning in
> perf_top__mmap_read() can give user a hint. So the user can switch to
> overwrite mode manually.
> But unfortunately, the ui_warning doesn't work. Because it is called after
> perf_top__mmap_read(). The processing time of perf_top__mmap_read() could be
> tens of minutes.

So we need a way to notice that we're in a machine like that and warn
the user before the wait takes place, ideas on how to do that?

- Arnaldo


Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode

2018-10-29 Thread Liang, Kan




On 10/29/2018 9:03 AM, Arnaldo Carvalho de Melo wrote:

Em Fri, Oct 26, 2018 at 04:11:51PM -0400, Liang, Kan escreveu:

On 10/26/2018 3:24 PM, Arnaldo Carvalho de Melo wrote:

Em Fri, Oct 26, 2018 at 03:16:29PM -0400, Liang, Kan escreveu:

On 10/26/2018 3:12 PM, Arnaldo Carvalho de Melo wrote:

Em Fri, Oct 26, 2018 at 03:07:40PM -0400, Liang, Kan escreveu:

On 10/26/2018 3:02 PM, Arnaldo Carvalho de Melo wrote:

I checked and both have the same result. But I still think there is
value in having the shorter form, ok?



Sure.



Ok.



I think that we should default back to --no-overwrite till we get this
sorted out, as the effect is easily noticeable, as David reported and I
reproduced, when doing kernel builds.
  

It is mainly for performance reason to switch to overwrite mode. The impact
was very small when I did my test. But now the effect is easily noticeable
in other tests. Yes, I agree. We may change it back to non-overwrite mode
until the issue is addressed.


So, I have these two patches in my perf/core branch, with Fixes tags
that will make them get to the stable kernels, ok?


I just realized that the problem in KNL will be back if we switch back 
to non-overwrite mode.
The problem is that users have to wait tens of minutes to see perf top 
results on the screen in KNL. Before that, there is nothing but a black 
screen.


Sorry I didn't notice it last Friday. Because I thought the ui_warning 
in perf_top__mmap_read() can give user a hint. So the user can switch to 
overwrite mode manually.
But unfortunately, the ui_warning doesn't work. Because it is called 
after perf_top__mmap_read(). The processing time of 
perf_top__mmap_read() could be tens of minutes.


Thanks,
Kan




 From f54ef0e7342efb77205b2faaacbcb81cdd31f064 Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo 
Date: Fri, 26 Oct 2018 15:55:23 -0300
Subject: [PATCH 1/2] perf top: Allow disabling the overwrite mode

In ebebbf082357 ("perf top: Switch default mode to overwrite mode") we
forgot to leave a way to disable that new default, add a --overwrite
option that can be disabled using --no-overwrite, since the code already
in such a way that we can readily disable this mode.

This is useful when investigating bugs with this mode like the recent
report from David Miller where lots of unknown symbols appear due to
disabling the events while processing them which disables all record
types, not just PERF_RECORD_SAMPLE, which makes it impossible to resolve
maps when we lose PERF_RECORD_MMAP records.

This can be easily seen while building a kernel, when there are lots of
short lived processes.

Reported-by: David Miller 
Acked-by: Kan Liang 
Cc: Adrian Hunter 
Cc: Andi Kleen 
Cc: David Ahern 
Cc: Jin Yao 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Wang Nan 
Fixes: ebebbf082357 ("perf top: Switch default mode to overwrite mode")
Link: https://lkml.kernel.org/n/tip-oqgsz2bq4kgrnnajrafcd...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
  tools/perf/Documentation/perf-top.txt | 5 +
  tools/perf/builtin-top.c  | 2 ++
  2 files changed, 7 insertions(+)

diff --git a/tools/perf/Documentation/perf-top.txt 
b/tools/perf/Documentation/perf-top.txt
index 114fda12aa49..d4be6061fe1c 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -242,6 +242,11 @@ Default is to monitor all CPUS.
  --hierarchy::
Enable hierarchy output.
  
+--overwrite::

+   This is the default, but for investigating problems with it or any 
other strange
+   behaviour like lots of unknown samples, we may want to disable this 
mode by using
+   --no-overwrite.
+
  --force::
Don't do ownership validation.
  
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c

index d21d8751e749..214fad747b04 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1372,6 +1372,8 @@ int cmd_top(int argc, const char **argv)
"Show raw trace event output (do not use print fmt or 
plugins)"),
OPT_BOOLEAN(0, "hierarchy", &symbol_conf.report_hierarchy,
"Show entries in a hierarchy"),
+   OPT_BOOLEAN(0, "overwrite", &top.record_opts.overwrite,
+   "Use a backward ring buffer, default: yes"),
OPT_BOOLEAN(0, "force", &symbol_conf.force, "don't complain, do it"),
OPT_UINTEGER(0, "num-thread-synthesize", &top.nr_threads_synthesize,
"number of thread to run event synthesize"),



[PATCHES/RFC] Re: A concern about overflow ring buffer mode

2018-10-29 Thread Arnaldo Carvalho de Melo
Em Fri, Oct 26, 2018 at 04:11:51PM -0400, Liang, Kan escreveu:
> On 10/26/2018 3:24 PM, Arnaldo Carvalho de Melo wrote:
> > Em Fri, Oct 26, 2018 at 03:16:29PM -0400, Liang, Kan escreveu:
> > > On 10/26/2018 3:12 PM, Arnaldo Carvalho de Melo wrote:
> > > > Em Fri, Oct 26, 2018 at 03:07:40PM -0400, Liang, Kan escreveu:
> > > > > On 10/26/2018 3:02 PM, Arnaldo Carvalho de Melo wrote:
> > > > I checked and both have the same result. But I still think there is
> > > > value in having the shorter form, ok?

> > > Sure.

> > Ok.

> > I think that we should default back to --no-overwrite till we get this
> > sorted out, as the effect is easily noticeable, as David reported and I
> > reproduced, when doing kernel builds.
 
> It is mainly for performance reason to switch to overwrite mode. The impact
> was very small when I did my test. But now the effect is easily noticeable
> in other tests. Yes, I agree. We may change it back to non-overwrite mode
> until the issue is addressed.

So, I have these two patches in my perf/core branch, with Fixes tags
that will make them get to the stable kernels, ok?


>From f54ef0e7342efb77205b2faaacbcb81cdd31f064 Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo 
Date: Fri, 26 Oct 2018 15:55:23 -0300
Subject: [PATCH 1/2] perf top: Allow disabling the overwrite mode

In ebebbf082357 ("perf top: Switch default mode to overwrite mode") we
forgot to leave a way to disable that new default, add a --overwrite
option that can be disabled using --no-overwrite, since the code already
in such a way that we can readily disable this mode.

This is useful when investigating bugs with this mode like the recent
report from David Miller where lots of unknown symbols appear due to
disabling the events while processing them which disables all record
types, not just PERF_RECORD_SAMPLE, which makes it impossible to resolve
maps when we lose PERF_RECORD_MMAP records.

This can be easily seen while building a kernel, when there are lots of
short lived processes.

Reported-by: David Miller 
Acked-by: Kan Liang 
Cc: Adrian Hunter 
Cc: Andi Kleen 
Cc: David Ahern 
Cc: Jin Yao 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Wang Nan 
Fixes: ebebbf082357 ("perf top: Switch default mode to overwrite mode")
Link: https://lkml.kernel.org/n/tip-oqgsz2bq4kgrnnajrafcd...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/Documentation/perf-top.txt | 5 +
 tools/perf/builtin-top.c  | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/tools/perf/Documentation/perf-top.txt 
b/tools/perf/Documentation/perf-top.txt
index 114fda12aa49..d4be6061fe1c 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -242,6 +242,11 @@ Default is to monitor all CPUS.
 --hierarchy::
Enable hierarchy output.
 
+--overwrite::
+   This is the default, but for investigating problems with it or any 
other strange
+   behaviour like lots of unknown samples, we may want to disable this 
mode by using
+   --no-overwrite.
+
 --force::
Don't do ownership validation.
 
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index d21d8751e749..214fad747b04 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1372,6 +1372,8 @@ int cmd_top(int argc, const char **argv)
"Show raw trace event output (do not use print fmt or 
plugins)"),
OPT_BOOLEAN(0, "hierarchy", &symbol_conf.report_hierarchy,
"Show entries in a hierarchy"),
+   OPT_BOOLEAN(0, "overwrite", &top.record_opts.overwrite,
+   "Use a backward ring buffer, default: yes"),
OPT_BOOLEAN(0, "force", &symbol_conf.force, "don't complain, do it"),
OPT_UINTEGER(0, "num-thread-synthesize", &top.nr_threads_synthesize,
"number of thread to run event synthesize"),
-- 
2.14.4

>From 5af190cac4ec10c53f1a934e7bbd30da7e616b22 Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo 
Date: Mon, 29 Oct 2018 09:47:00 -0300
Subject: [PATCH 2/2] perf top: Do not use overwrite mode by default

Enabling --overwrite mode allows us to to use just the most recent
records, which helps in high core count machines such as Knights
Landing/Mill, but right now is being disabled by default as the pausing
used in this technique is leading to loss of metadata events such as
PERF_RECORD_MMAP which makes 'perf top' unable to resolve samples,
leading to lots of unknown samples appearing on the UI.

Enabling this may be useful if you are in such machines and profiling a
workload that doesn't creates short lived threads and/or doesn't uses
many executable mmap operations.

Work is being planed to solve this situation, till then, this will
remain disabled by default.

Reported-by: David Miller 
Acked-by: Kan Liang 
Link: 
https://lkml.kernel.org/r/4f84468f-37d9-cf1b-12c1-514ef74b6...@linux.intel.com
Cc: Adrian Hunter 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Wan