Re: [PATCH] perf/core: clear sibling list of detached events (was "Re: [PATCH] perf: Fix sibling iteration")

2018-03-16 Thread Peter Zijlstra
On Fri, Mar 16, 2018 at 01:17:41PM +, Mark Rutland wrote:
> I've given this 50 boots with the 0day scripts, and no explosions so far
> (with 5 boots where a leader had an empty group_node).

I've been running this for almost 2 hours now, and no splats either.
I'll let it run for another few hours, just in case.

Thanks Mark!


[PATCH] perf/core: clear sibling list of detached events (was "Re: [PATCH] perf: Fix sibling iteration")

2018-03-16 Thread Mark Rutland
On Fri, Mar 16, 2018 at 11:50:17AM +0100, Peter Zijlstra wrote:
> On Fri, Mar 16, 2018 at 11:39:46AM +0100, Jiri Olsa wrote:
> > On Fri, Mar 16, 2018 at 11:31:29AM +0100, Peter Zijlstra wrote:

> > > There is at least one more known issue with that patch, but neither Mark
> > > nor me could reproduce so far, so we don't know if we're right about the
> > > cause.
> > 
> > is there more info about that issue? I could try it
> 
> Find below, 0day report didn't go out to lkml. We think moving the
> list_del_init() out from the !RB_NODE_EMPTY() test might fix, but since
> we can't repro so far, its all guesses.

In testing, I can see this always fires after we perf_group_detach() a
leader whose group_node is empty.

With the list_del_init() pulled out of that check, I see that we still
hit the leaders with an empty group_node (with a hacked-in WARN_ON), but
no longer blow up in a subsequent perf_group_detach().

I've given this 50 boots with the 0day scripts, and no explosions so far
(with 5 boots where a leader had an empty group_node).

Thanks,
Mark.

>8
>From 136ebe0f3756d4cf1a37f6d00b7ec1b902980b83 Mon Sep 17 00:00:00 2001
From: Mark Rutland 
Date: Fri, 16 Mar 2018 12:51:40 +
Subject: [PATCH] perf/core: clear sibling list of detached events

When perf_group_dettach() is called on a group leader, it updates each
sibling's group_leader field to point to that sibling, effectively
upgrading each siblnig to a group leader. After perf_group_detach has
completed, the caller may free the leader event.

We only remove siblings from the group leader's sibling_list when the
leader has a non-empty group_node. This was fine prior to commit:

  8343aae66167df67 ("perf/core: Remove perf_event::group_entry")

... as the sibling's sibling_list would be empty. However, now that we
use the sibling_list field as both the list head and the list entry,
this leaves each sibling with a non-empty sibling list, including the
stale leader event.

If perf_group_detach() is subsequently called on a sibling, it will
appear to be a group leader, and we'll walk the sibling_list,
potentially dereferencing these stale events. In 0day testing, this has
been observed to result in kernel panics.

Let's avoid this by always removing siblings from the sibling list when
we promote them to leaders.

Fixes: 8343aae66167df67 ("perf/core: Remove perf_event::group_entry")
Signed-off-by: Mark Rutland 
Cc: Peter Zijlstra 
---
 kernel/events/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9a07bbe66451..627814e1820d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1917,12 +1917,12 @@ static void perf_group_detach(struct perf_event *event)
list_for_each_entry_safe(sibling, tmp, &event->sibling_list, 
sibling_list) {
 
sibling->group_leader = sibling;
+   list_del_init(&sibling->sibling_list);
 
/* Inherit group flags from the previous leader */
sibling->group_caps = event->group_caps;
 
if (!RB_EMPTY_NODE(&event->group_node)) {
-   list_del_init(&sibling->sibling_list);
add_event_to_groups(sibling, event->ctx);
}
 
-- 
2.11.0



Re: [PATCH] perf: Fix sibling iteration

2018-03-16 Thread Mark Rutland
On Fri, Mar 16, 2018 at 11:50:17AM +0100, Peter Zijlstra wrote:
> On Fri, Mar 16, 2018 at 11:39:46AM +0100, Jiri Olsa wrote:
> > On Fri, Mar 16, 2018 at 11:31:29AM +0100, Peter Zijlstra wrote:

> > > There is at least one more known issue with that patch, but neither Mark
> > > nor me could reproduce so far, so we don't know if we're right about the
> > > cause.
> > 
> > is there more info about that issue? I could try it
> 
> Find below, 0day report didn't go out to lkml. We think moving the
> list_del_init() out from the !RB_NODE_EMPTY() test might fix, but since
> we can't repro so far, its all guesses.

I've managed to reproduce this using the 0day scripts. From the 0day
logs, it looks like it's possible to hit it ~6% of the time.

I added a WARN_ON(RB_NODE_EMPTY(...)), and I see that fire in the exit
path:

[   76.287197]  perf_remove_from_context+0x9a/0xc0
[   76.287552]  perf_event_release_kernel+0x214/0x3e0
[   76.287928]  ? _raw_spin_unlock+0x8/0x10
[   76.288237]  ? locks_remove_file+0x219/0x230
[   76.288572]  perf_release+0xe/0x20
[   76.288842]  __fput+0x1c9/0x340
[   76.289089]  fput+0x8/0x10
[   76.289332]  task_work_run+0x9a/0xd0
[   76.289613]  do_exit+0x6cc/0x1220
[   76.289877]  ? __might_sleep+0xcb/0x150
[   76.290183]  do_group_exit+0x96/0x110
[   76.290473]  get_signal+0x8c3/0xb60
[   76.290750]  ? __perf_event_task_sched_in+0x20d/0x250
[   76.291143]  do_signal+0x19/0x950
[   76.291408]  ? finish_task_switch+0x212/0x480
[   76.291750]  ? __switch_to+0x414/0x730
[   76.292051]  ? _raw_spin_unlock_irqrestore+0x45/0x60
[   76.292439]  ? trace_hardirqs_on+0x63/0x100
[   76.292769]  ? prepare_to_wait_event+0x23a/0x250
[   76.293132]  ? do_int80_syscall_32+0x271/0x290
[   76.293478]  exit_to_usermode_loop+0xb9/0x190
[   76.293819]  do_int80_syscall_32+0x271/0x290
[   76.294161]  entry_INT80_32+0x36/0x36

... then we subsequently hit the initial splat, which looks promisingly
like our initial theory.

However, I don't currently understand how we can see a group leader with
an empty RB node in this path. I can only see that being the case for
siblings that we promoted to being leaders, and those should have an
empty list at the point we promote them...

Thanks,
Mark.


Re: [PATCH] perf: Fix sibling iteration

2018-03-16 Thread Alexey Budankov
On 16.03.2018 13:31, Peter Zijlstra wrote:
> On Fri, Mar 16, 2018 at 12:59:34PM +0300, Alexey Budankov wrote:
>> Hi,
>> On 15.03.2018 20:01, Peter Zijlstra wrote:
>>> Subject: perf: Fix sibling iteration
>>> From: Peter Zijlstra 
>>> Date: Thu Mar 15 17:36:56 CET 2018
>>>
>>> Mark noticed that the change to sibling_list changed some iteration
>>> semantics; because previously we used group_list as list entry,
>>> sibling events would always have an empty sibling_list.
>>>
>>> But because we now use sibling_list for both list head and list entry,
>>> siblings will report as having siblings.
>>>
>>> Fix this with a custom for_each_sibling_event() iterator.
>>>
>>> Suggested-by: Mark Rutland 
>>> Reported-by: Mark Rutland 
>>> Fixes: 8343aae66167 ("perf/core: Remove perf_event::group_entry")
>>> Signed-off-by: Peter Zijlstra (Intel) 
>>> ---
>>
>> Applied to tip repo and run testing on Fedora 27/x86_64 (client skylake 8 
>> cores):
> 
> There is at least one more known issue with that patch, but neither Mark
> nor me could reproduce so far, so we don't know if we're right about the
> cause.

Please share more information regarding the issue. I am ready to help.

Thanks,
Alexey



Re: [PATCH] perf: Fix sibling iteration

2018-03-16 Thread Jiri Olsa
On Fri, Mar 16, 2018 at 11:31:29AM +0100, Peter Zijlstra wrote:
> On Fri, Mar 16, 2018 at 12:59:34PM +0300, Alexey Budankov wrote:
> > Hi,
> > On 15.03.2018 20:01, Peter Zijlstra wrote:
> > > Subject: perf: Fix sibling iteration
> > > From: Peter Zijlstra 
> > > Date: Thu Mar 15 17:36:56 CET 2018
> > > 
> > > Mark noticed that the change to sibling_list changed some iteration
> > > semantics; because previously we used group_list as list entry,
> > > sibling events would always have an empty sibling_list.
> > > 
> > > But because we now use sibling_list for both list head and list entry,
> > > siblings will report as having siblings.
> > > 
> > > Fix this with a custom for_each_sibling_event() iterator.
> > > 
> > > Suggested-by: Mark Rutland 
> > > Reported-by: Mark Rutland 
> > > Fixes: 8343aae66167 ("perf/core: Remove perf_event::group_entry")
> > > Signed-off-by: Peter Zijlstra (Intel) 
> > > ---
> > 
> > Applied to tip repo and run testing on Fedora 27/x86_64 (client skylake 8 
> > cores):
> 
> There is at least one more known issue with that patch, but neither Mark
> nor me could reproduce so far, so we don't know if we're right about the
> cause.

is there more info about that issue? I could try it

thanks,
jirka


Re: [PATCH] perf: Fix sibling iteration

2018-03-16 Thread Peter Zijlstra
On Fri, Mar 16, 2018 at 12:59:34PM +0300, Alexey Budankov wrote:
> Hi,
> On 15.03.2018 20:01, Peter Zijlstra wrote:
> > Subject: perf: Fix sibling iteration
> > From: Peter Zijlstra 
> > Date: Thu Mar 15 17:36:56 CET 2018
> > 
> > Mark noticed that the change to sibling_list changed some iteration
> > semantics; because previously we used group_list as list entry,
> > sibling events would always have an empty sibling_list.
> > 
> > But because we now use sibling_list for both list head and list entry,
> > siblings will report as having siblings.
> > 
> > Fix this with a custom for_each_sibling_event() iterator.
> > 
> > Suggested-by: Mark Rutland 
> > Reported-by: Mark Rutland 
> > Fixes: 8343aae66167 ("perf/core: Remove perf_event::group_entry")
> > Signed-off-by: Peter Zijlstra (Intel) 
> > ---
> 
> Applied to tip repo and run testing on Fedora 27/x86_64 (client skylake 8 
> cores):

There is at least one more known issue with that patch, but neither Mark
nor me could reproduce so far, so we don't know if we're right about the
cause.




Re: [PATCH] perf: Fix sibling iteration

2018-03-16 Thread Alexey Budankov
Hi,
On 15.03.2018 20:01, Peter Zijlstra wrote:
> Subject: perf: Fix sibling iteration
> From: Peter Zijlstra 
> Date: Thu Mar 15 17:36:56 CET 2018
> 
> Mark noticed that the change to sibling_list changed some iteration
> semantics; because previously we used group_list as list entry,
> sibling events would always have an empty sibling_list.
> 
> But because we now use sibling_list for both list head and list entry,
> siblings will report as having siblings.
> 
> Fix this with a custom for_each_sibling_event() iterator.
> 
> Suggested-by: Mark Rutland 
> Reported-by: Mark Rutland 
> Fixes: 8343aae66167 ("perf/core: Remove perf_event::group_entry")
> Signed-off-by: Peter Zijlstra (Intel) 
> ---

Applied to tip repo and run testing on Fedora 27/x86_64 (client skylake 8 
cores):

make -C tools/perf/ build-test
make: Entering directory '/root/abudanko/tip/tools/perf'
- tarpkg: ./tests/perf-targz-src-pkg .
- /root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP: cd . && make 
FEATURE_DUMP_COPY=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP  
feature-dump
cd . && make 
FEATURE_DUMP_COPY=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP 
feature-dump
  make_with_babeltrace_O: cd . && make LIBBABELTRACE=1 
FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP  
O=/tmp/tmp.icWjm9G83i DESTDIR=/tmp/tmp.k3gS9nZWn1
   make_no_libperl_O: cd . && make NO_LIBPERL=1 
FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP  
O=/tmp/tmp.cFFDwS0GI0 DESTDIR=/tmp/tmp.ZSHnATR3nx
  make_no_gtk2_O: cd . && make NO_GTK2=1 
FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP  
O=/tmp/tmp.RnSxaWFXYe DESTDIR=/tmp/tmp.fQ1csVAL81
  make_no_newt_O: cd . && make NO_NEWT=1 
FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP  
O=/tmp/tmp.IVFn6QEq17 DESTDIR=/tmp/tmp.6eBvM9A4kX
 make_no_backtrace_O: cd . && make NO_BACKTRACE=1 
FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP  
O=/tmp/tmp.EM0Q3e692P DESTDIR=/tmp/tmp.ZwfkJkPCHP
make_no_libdw_dwarf_unwind_O: cd . && make NO_LIBDW_DWARF_UNWIND=1 
FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP  
O=/tmp/tmp.3Vo6EX89gQ DESTDIR=/tmp/tmp.AYf0P4xIMu
   make_no_scripts_O: cd . && make NO_LIBPYTHON=1 NO_LIBPERL=1 
FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP  
O=/tmp/tmp.wKJnsI2J6Y DESTDIR=/tmp/tmp.dbkavZubVA
  make_no_auxtrace_O: cd . && make NO_AUXTRACE=1 
FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP  
O=/tmp/tmp.JC2HxLAtWy DESTDIR=/tmp/tmp.m3kuS2SjoJ
 make_tags_O: cd . && make tags 
FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP  
O=/tmp/tmp.yNJ1JYIHRe DESTDIR=/tmp/tmp.DfpYbrOHyN
  make_doc_O: cd . && make doc 
FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP  
O=/tmp/tmp.4oRV5wwXRG DESTDIR=/tmp/tmp.hoHvRcWGMz
make_clean_all_O: cd . && make clean all 
FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP  
O=/tmp/tmp.VVZwGQwAsl DESTDIR=/tmp/tmp.otyKC854jC
  make_no_demangle_O: cd . && make NO_DEMANGLE=1 
FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP  
O=/tmp/tmp.Pqn88IDBJe DESTDIR=/tmp/tmp.26CemSq8rl
  make_install_bin_O: cd . && make install-bin 
FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP  
O=/tmp/tmp.CJ5gfJRJ88 DESTDIR=/tmp/tmp.hS0o8yXeK7
  make_install_O: cd . && make install 
FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP  
O=/tmp/tmp.RPRcY6uDq8 DESTDIR=/tmp/tmp.IW01wtATTd
 make_install_prefix_slash_O: cd . && make install prefix=/tmp/krava/ 
FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP  
O=/tmp/tmp.gvOLjRABv2 DESTDIR=/tmp/tmp.6OMgy187rT
 make_no_slang_O: cd . && make NO_SLANG=1 
FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP  
O=/tmp/tmp.5IfNr3OiDn DESTDIR=/tmp/tmp.28yzBF5bDE
   make_perf_o_O: cd . && make perf.o 
FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP  
O=/tmp/tmp.CTmyGT5ZNU DESTDIR=/tmp/tmp.ohiadqvLzw
   make_with_clangllvm_O: cd . && make LIBCLANGLLVM=1 
FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP  
O=/tmp/tmp.bDoP5XgtvV DESTDIR=/tmp/tmp.Zfn944JFmO
make_no_libelf_O: cd . && make NO_LIBELF=1 
FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP  
O=/tmp/tmp.mzs7boZhfA DESTDIR=/tmp/tmp.P9AgDYEA3j
 make_pure_O: cd . && make 
FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP  
O=/tmp/tmp.pJGEhkbl9n DESTDIR=/tmp/tmp.msv5mDxu4F
   make_install_prefix_O: cd . && make install prefix=/tmp/krava 
FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP  
O=/tmp/tmp.QoERhLwZDr DESTDIR=/tmp/tmp.ZEmUFt1UnF
 make_no_libpython_O: cd . && make NO_LIBPYTHON=1 
FEATURES_DUMP=/root/abudanko/tip/tools/per

[PATCH] perf: Fix sibling iteration

2018-03-15 Thread Peter Zijlstra
Subject: perf: Fix sibling iteration
From: Peter Zijlstra 
Date: Thu Mar 15 17:36:56 CET 2018

Mark noticed that the change to sibling_list changed some iteration
semantics; because previously we used group_list as list entry,
sibling events would always have an empty sibling_list.

But because we now use sibling_list for both list head and list entry,
siblings will report as having siblings.

Fix this with a custom for_each_sibling_event() iterator.

Suggested-by: Mark Rutland 
Reported-by: Mark Rutland 
Fixes: 8343aae66167 ("perf/core: Remove perf_event::group_entry")
Signed-off-by: Peter Zijlstra (Intel) 
---
--- a/arch/alpha/kernel/perf_event.c
+++ b/arch/alpha/kernel/perf_event.c
@@ -351,7 +351,7 @@ static int collect_events(struct perf_ev
evtype[n] = group->hw.event_base;
current_idx[n++] = PMC_NO_INDEX;
}
-   list_for_each_entry(pe, &group->sibling_list, sibling_list) {
+   for_each_sibling_event(pe, group) {
if (!is_software_event(pe) && pe->state != 
PERF_EVENT_STATE_OFF) {
if (n >= max_count)
return -1;
--- a/arch/arm/mach-imx/mmdc.c
+++ b/arch/arm/mach-imx/mmdc.c
@@ -269,7 +269,7 @@ static bool mmdc_pmu_group_is_valid(stru
return false;
}
 
-   list_for_each_entry(sibling, &leader->sibling_list, sibling_list) {
+   for_each_sibling_event(sibling, leader) {
if (!mmdc_pmu_group_event_is_valid(sibling, pmu, &counter_mask))
return false;
}
--- a/arch/arm/mm/cache-l2x0-pmu.c
+++ b/arch/arm/mm/cache-l2x0-pmu.c
@@ -293,7 +293,7 @@ static bool l2x0_pmu_group_is_valid(stru
else if (!is_software_event(leader))
return false;
 
-   list_for_each_entry(sibling, &leader->sibling_list, sibling_list) {
+   for_each_sibling_event(sibling, leader) {
if (sibling->pmu == pmu)
num_hw++;
else if (!is_software_event(sibling))
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -711,7 +711,7 @@ static int validate_group(struct perf_ev
if (mipsxx_pmu_alloc_counter(&fake_cpuc, &leader->hw) < 0)
return -EINVAL;
 
-   list_for_each_entry(sibling, &leader->sibling_list, sibling_list) {
+   for_each_sibling_event(sibling, leader) {
if (mipsxx_pmu_alloc_counter(&fake_cpuc, &sibling->hw) < 0)
return -EINVAL;
}
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1426,7 +1426,7 @@ static int collect_events(struct perf_ev
flags[n] = group->hw.event_base;
events[n++] = group->hw.config;
}
-   list_for_each_entry(event, &group->sibling_list, sibling_list) {
+   for_each_sibling_event(event, group) {
if (event->pmu->task_ctx_nr == perf_hw_context &&
event->state != PERF_EVENT_STATE_OFF) {
if (n >= max_count)
--- a/arch/powerpc/perf/core-fsl-emb.c
+++ b/arch/powerpc/perf/core-fsl-emb.c
@@ -277,7 +277,7 @@ static int collect_events(struct perf_ev
ctrs[n] = group;
n++;
}
-   list_for_each_entry(event, &group->sibling_list, sibling_list) {
+   for_each_sibling_event(event, group) {
if (!is_software_event(event) &&
event->state != PERF_EVENT_STATE_OFF) {
if (n >= max_count)
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -1342,7 +1342,7 @@ static int collect_events(struct perf_ev
events[n] = group->hw.event_base;
current_idx[n++] = PIC_NO_INDEX;
}
-   list_for_each_entry(event, &group->sibling_list, sibling_list) {
+   for_each_sibling_event(event, group) {
if (!is_software_event(event) &&
event->state != PERF_EVENT_STATE_OFF) {
if (n >= max_count)
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -990,7 +990,7 @@ static int collect_events(struct cpu_hw_
if (!dogrp)
return n;
 
-   list_for_each_entry(event, &leader->sibling_list, sibling_list) {
+   for_each_sibling_event(event, leader) {
if (!is_x86_event(event) ||
event->state <= PERF_EVENT_STATE_OFF)
continue;
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -354,7 +354,7 @@ uncore_collect_events(struct intel_uncor
if (!dogrp)
return n;
 
-   list_for_each_entry(event, &leader->sibling_list, sibling_list) {
+   for_each_sibling_event(event, leader) {
if (!is_box_event(box, event) ||
event->state <= PERF_EVENT_STATE_OFF)
continue;
--- a/drivers/bus/arm-cci.c
+++ b/drivers/bus/arm-cci.c
@