Re: [PATCH] perf/core: clear sibling list of detached events (was "Re: [PATCH] perf: Fix sibling iteration")
On Fri, Mar 16, 2018 at 01:17:41PM +, Mark Rutland wrote: > I've given this 50 boots with the 0day scripts, and no explosions so far > (with 5 boots where a leader had an empty group_node). I've been running this for almost 2 hours now, and no splats either. I'll let it run for another few hours, just in case. Thanks Mark!
[PATCH] perf/core: clear sibling list of detached events (was "Re: [PATCH] perf: Fix sibling iteration")
On Fri, Mar 16, 2018 at 11:50:17AM +0100, Peter Zijlstra wrote: > On Fri, Mar 16, 2018 at 11:39:46AM +0100, Jiri Olsa wrote: > > On Fri, Mar 16, 2018 at 11:31:29AM +0100, Peter Zijlstra wrote: > > > There is at least one more known issue with that patch, but neither Mark > > > nor me could reproduce so far, so we don't know if we're right about the > > > cause. > > > > is there more info about that issue? I could try it > > Find below, 0day report didn't go out to lkml. We think moving the > list_del_init() out from the !RB_NODE_EMPTY() test might fix, but since > we can't repro so far, its all guesses. In testing, I can see this always fires after we perf_group_detach() a leader whose group_node is empty. With the list_del_init() pulled out of that check, I see that we still hit the leaders with an empty group_node (with a hacked-in WARN_ON), but no longer blow up in a subsequent perf_group_detach(). I've given this 50 boots with the 0day scripts, and no explosions so far (with 5 boots where a leader had an empty group_node). Thanks, Mark. >8 >From 136ebe0f3756d4cf1a37f6d00b7ec1b902980b83 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Fri, 16 Mar 2018 12:51:40 + Subject: [PATCH] perf/core: clear sibling list of detached events When perf_group_dettach() is called on a group leader, it updates each sibling's group_leader field to point to that sibling, effectively upgrading each siblnig to a group leader. After perf_group_detach has completed, the caller may free the leader event. We only remove siblings from the group leader's sibling_list when the leader has a non-empty group_node. This was fine prior to commit: 8343aae66167df67 ("perf/core: Remove perf_event::group_entry") ... as the sibling's sibling_list would be empty. However, now that we use the sibling_list field as both the list head and the list entry, this leaves each sibling with a non-empty sibling list, including the stale leader event. If perf_group_detach() is subsequently called on a sibling, it will appear to be a group leader, and we'll walk the sibling_list, potentially dereferencing these stale events. In 0day testing, this has been observed to result in kernel panics. Let's avoid this by always removing siblings from the sibling list when we promote them to leaders. Fixes: 8343aae66167df67 ("perf/core: Remove perf_event::group_entry") Signed-off-by: Mark Rutland Cc: Peter Zijlstra --- kernel/events/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index 9a07bbe66451..627814e1820d 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -1917,12 +1917,12 @@ static void perf_group_detach(struct perf_event *event) list_for_each_entry_safe(sibling, tmp, &event->sibling_list, sibling_list) { sibling->group_leader = sibling; + list_del_init(&sibling->sibling_list); /* Inherit group flags from the previous leader */ sibling->group_caps = event->group_caps; if (!RB_EMPTY_NODE(&event->group_node)) { - list_del_init(&sibling->sibling_list); add_event_to_groups(sibling, event->ctx); } -- 2.11.0
Re: [PATCH] perf: Fix sibling iteration
On Fri, Mar 16, 2018 at 11:50:17AM +0100, Peter Zijlstra wrote: > On Fri, Mar 16, 2018 at 11:39:46AM +0100, Jiri Olsa wrote: > > On Fri, Mar 16, 2018 at 11:31:29AM +0100, Peter Zijlstra wrote: > > > There is at least one more known issue with that patch, but neither Mark > > > nor me could reproduce so far, so we don't know if we're right about the > > > cause. > > > > is there more info about that issue? I could try it > > Find below, 0day report didn't go out to lkml. We think moving the > list_del_init() out from the !RB_NODE_EMPTY() test might fix, but since > we can't repro so far, its all guesses. I've managed to reproduce this using the 0day scripts. From the 0day logs, it looks like it's possible to hit it ~6% of the time. I added a WARN_ON(RB_NODE_EMPTY(...)), and I see that fire in the exit path: [ 76.287197] perf_remove_from_context+0x9a/0xc0 [ 76.287552] perf_event_release_kernel+0x214/0x3e0 [ 76.287928] ? _raw_spin_unlock+0x8/0x10 [ 76.288237] ? locks_remove_file+0x219/0x230 [ 76.288572] perf_release+0xe/0x20 [ 76.288842] __fput+0x1c9/0x340 [ 76.289089] fput+0x8/0x10 [ 76.289332] task_work_run+0x9a/0xd0 [ 76.289613] do_exit+0x6cc/0x1220 [ 76.289877] ? __might_sleep+0xcb/0x150 [ 76.290183] do_group_exit+0x96/0x110 [ 76.290473] get_signal+0x8c3/0xb60 [ 76.290750] ? __perf_event_task_sched_in+0x20d/0x250 [ 76.291143] do_signal+0x19/0x950 [ 76.291408] ? finish_task_switch+0x212/0x480 [ 76.291750] ? __switch_to+0x414/0x730 [ 76.292051] ? _raw_spin_unlock_irqrestore+0x45/0x60 [ 76.292439] ? trace_hardirqs_on+0x63/0x100 [ 76.292769] ? prepare_to_wait_event+0x23a/0x250 [ 76.293132] ? do_int80_syscall_32+0x271/0x290 [ 76.293478] exit_to_usermode_loop+0xb9/0x190 [ 76.293819] do_int80_syscall_32+0x271/0x290 [ 76.294161] entry_INT80_32+0x36/0x36 ... then we subsequently hit the initial splat, which looks promisingly like our initial theory. However, I don't currently understand how we can see a group leader with an empty RB node in this path. I can only see that being the case for siblings that we promoted to being leaders, and those should have an empty list at the point we promote them... Thanks, Mark.
Re: [PATCH] perf: Fix sibling iteration
On 16.03.2018 13:31, Peter Zijlstra wrote: > On Fri, Mar 16, 2018 at 12:59:34PM +0300, Alexey Budankov wrote: >> Hi, >> On 15.03.2018 20:01, Peter Zijlstra wrote: >>> Subject: perf: Fix sibling iteration >>> From: Peter Zijlstra >>> Date: Thu Mar 15 17:36:56 CET 2018 >>> >>> Mark noticed that the change to sibling_list changed some iteration >>> semantics; because previously we used group_list as list entry, >>> sibling events would always have an empty sibling_list. >>> >>> But because we now use sibling_list for both list head and list entry, >>> siblings will report as having siblings. >>> >>> Fix this with a custom for_each_sibling_event() iterator. >>> >>> Suggested-by: Mark Rutland >>> Reported-by: Mark Rutland >>> Fixes: 8343aae66167 ("perf/core: Remove perf_event::group_entry") >>> Signed-off-by: Peter Zijlstra (Intel) >>> --- >> >> Applied to tip repo and run testing on Fedora 27/x86_64 (client skylake 8 >> cores): > > There is at least one more known issue with that patch, but neither Mark > nor me could reproduce so far, so we don't know if we're right about the > cause. Please share more information regarding the issue. I am ready to help. Thanks, Alexey
Re: [PATCH] perf: Fix sibling iteration
On Fri, Mar 16, 2018 at 11:31:29AM +0100, Peter Zijlstra wrote: > On Fri, Mar 16, 2018 at 12:59:34PM +0300, Alexey Budankov wrote: > > Hi, > > On 15.03.2018 20:01, Peter Zijlstra wrote: > > > Subject: perf: Fix sibling iteration > > > From: Peter Zijlstra > > > Date: Thu Mar 15 17:36:56 CET 2018 > > > > > > Mark noticed that the change to sibling_list changed some iteration > > > semantics; because previously we used group_list as list entry, > > > sibling events would always have an empty sibling_list. > > > > > > But because we now use sibling_list for both list head and list entry, > > > siblings will report as having siblings. > > > > > > Fix this with a custom for_each_sibling_event() iterator. > > > > > > Suggested-by: Mark Rutland > > > Reported-by: Mark Rutland > > > Fixes: 8343aae66167 ("perf/core: Remove perf_event::group_entry") > > > Signed-off-by: Peter Zijlstra (Intel) > > > --- > > > > Applied to tip repo and run testing on Fedora 27/x86_64 (client skylake 8 > > cores): > > There is at least one more known issue with that patch, but neither Mark > nor me could reproduce so far, so we don't know if we're right about the > cause. is there more info about that issue? I could try it thanks, jirka
Re: [PATCH] perf: Fix sibling iteration
On Fri, Mar 16, 2018 at 12:59:34PM +0300, Alexey Budankov wrote: > Hi, > On 15.03.2018 20:01, Peter Zijlstra wrote: > > Subject: perf: Fix sibling iteration > > From: Peter Zijlstra > > Date: Thu Mar 15 17:36:56 CET 2018 > > > > Mark noticed that the change to sibling_list changed some iteration > > semantics; because previously we used group_list as list entry, > > sibling events would always have an empty sibling_list. > > > > But because we now use sibling_list for both list head and list entry, > > siblings will report as having siblings. > > > > Fix this with a custom for_each_sibling_event() iterator. > > > > Suggested-by: Mark Rutland > > Reported-by: Mark Rutland > > Fixes: 8343aae66167 ("perf/core: Remove perf_event::group_entry") > > Signed-off-by: Peter Zijlstra (Intel) > > --- > > Applied to tip repo and run testing on Fedora 27/x86_64 (client skylake 8 > cores): There is at least one more known issue with that patch, but neither Mark nor me could reproduce so far, so we don't know if we're right about the cause.
Re: [PATCH] perf: Fix sibling iteration
Hi, On 15.03.2018 20:01, Peter Zijlstra wrote: > Subject: perf: Fix sibling iteration > From: Peter Zijlstra > Date: Thu Mar 15 17:36:56 CET 2018 > > Mark noticed that the change to sibling_list changed some iteration > semantics; because previously we used group_list as list entry, > sibling events would always have an empty sibling_list. > > But because we now use sibling_list for both list head and list entry, > siblings will report as having siblings. > > Fix this with a custom for_each_sibling_event() iterator. > > Suggested-by: Mark Rutland > Reported-by: Mark Rutland > Fixes: 8343aae66167 ("perf/core: Remove perf_event::group_entry") > Signed-off-by: Peter Zijlstra (Intel) > --- Applied to tip repo and run testing on Fedora 27/x86_64 (client skylake 8 cores): make -C tools/perf/ build-test make: Entering directory '/root/abudanko/tip/tools/perf' - tarpkg: ./tests/perf-targz-src-pkg . - /root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP: cd . && make FEATURE_DUMP_COPY=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP feature-dump cd . && make FEATURE_DUMP_COPY=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP feature-dump make_with_babeltrace_O: cd . && make LIBBABELTRACE=1 FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP O=/tmp/tmp.icWjm9G83i DESTDIR=/tmp/tmp.k3gS9nZWn1 make_no_libperl_O: cd . && make NO_LIBPERL=1 FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP O=/tmp/tmp.cFFDwS0GI0 DESTDIR=/tmp/tmp.ZSHnATR3nx make_no_gtk2_O: cd . && make NO_GTK2=1 FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP O=/tmp/tmp.RnSxaWFXYe DESTDIR=/tmp/tmp.fQ1csVAL81 make_no_newt_O: cd . && make NO_NEWT=1 FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP O=/tmp/tmp.IVFn6QEq17 DESTDIR=/tmp/tmp.6eBvM9A4kX make_no_backtrace_O: cd . && make NO_BACKTRACE=1 FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP O=/tmp/tmp.EM0Q3e692P DESTDIR=/tmp/tmp.ZwfkJkPCHP make_no_libdw_dwarf_unwind_O: cd . && make NO_LIBDW_DWARF_UNWIND=1 FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP O=/tmp/tmp.3Vo6EX89gQ DESTDIR=/tmp/tmp.AYf0P4xIMu make_no_scripts_O: cd . && make NO_LIBPYTHON=1 NO_LIBPERL=1 FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP O=/tmp/tmp.wKJnsI2J6Y DESTDIR=/tmp/tmp.dbkavZubVA make_no_auxtrace_O: cd . && make NO_AUXTRACE=1 FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP O=/tmp/tmp.JC2HxLAtWy DESTDIR=/tmp/tmp.m3kuS2SjoJ make_tags_O: cd . && make tags FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP O=/tmp/tmp.yNJ1JYIHRe DESTDIR=/tmp/tmp.DfpYbrOHyN make_doc_O: cd . && make doc FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP O=/tmp/tmp.4oRV5wwXRG DESTDIR=/tmp/tmp.hoHvRcWGMz make_clean_all_O: cd . && make clean all FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP O=/tmp/tmp.VVZwGQwAsl DESTDIR=/tmp/tmp.otyKC854jC make_no_demangle_O: cd . && make NO_DEMANGLE=1 FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP O=/tmp/tmp.Pqn88IDBJe DESTDIR=/tmp/tmp.26CemSq8rl make_install_bin_O: cd . && make install-bin FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP O=/tmp/tmp.CJ5gfJRJ88 DESTDIR=/tmp/tmp.hS0o8yXeK7 make_install_O: cd . && make install FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP O=/tmp/tmp.RPRcY6uDq8 DESTDIR=/tmp/tmp.IW01wtATTd make_install_prefix_slash_O: cd . && make install prefix=/tmp/krava/ FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP O=/tmp/tmp.gvOLjRABv2 DESTDIR=/tmp/tmp.6OMgy187rT make_no_slang_O: cd . && make NO_SLANG=1 FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP O=/tmp/tmp.5IfNr3OiDn DESTDIR=/tmp/tmp.28yzBF5bDE make_perf_o_O: cd . && make perf.o FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP O=/tmp/tmp.CTmyGT5ZNU DESTDIR=/tmp/tmp.ohiadqvLzw make_with_clangllvm_O: cd . && make LIBCLANGLLVM=1 FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP O=/tmp/tmp.bDoP5XgtvV DESTDIR=/tmp/tmp.Zfn944JFmO make_no_libelf_O: cd . && make NO_LIBELF=1 FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP O=/tmp/tmp.mzs7boZhfA DESTDIR=/tmp/tmp.P9AgDYEA3j make_pure_O: cd . && make FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP O=/tmp/tmp.pJGEhkbl9n DESTDIR=/tmp/tmp.msv5mDxu4F make_install_prefix_O: cd . && make install prefix=/tmp/krava FEATURES_DUMP=/root/abudanko/tip/tools/perf/BUILD_TEST_FEATURE_DUMP O=/tmp/tmp.QoERhLwZDr DESTDIR=/tmp/tmp.ZEmUFt1UnF make_no_libpython_O: cd . && make NO_LIBPYTHON=1 FEATURES_DUMP=/root/abudanko/tip/tools/per
[PATCH] perf: Fix sibling iteration
Subject: perf: Fix sibling iteration From: Peter Zijlstra Date: Thu Mar 15 17:36:56 CET 2018 Mark noticed that the change to sibling_list changed some iteration semantics; because previously we used group_list as list entry, sibling events would always have an empty sibling_list. But because we now use sibling_list for both list head and list entry, siblings will report as having siblings. Fix this with a custom for_each_sibling_event() iterator. Suggested-by: Mark Rutland Reported-by: Mark Rutland Fixes: 8343aae66167 ("perf/core: Remove perf_event::group_entry") Signed-off-by: Peter Zijlstra (Intel) --- --- a/arch/alpha/kernel/perf_event.c +++ b/arch/alpha/kernel/perf_event.c @@ -351,7 +351,7 @@ static int collect_events(struct perf_ev evtype[n] = group->hw.event_base; current_idx[n++] = PMC_NO_INDEX; } - list_for_each_entry(pe, &group->sibling_list, sibling_list) { + for_each_sibling_event(pe, group) { if (!is_software_event(pe) && pe->state != PERF_EVENT_STATE_OFF) { if (n >= max_count) return -1; --- a/arch/arm/mach-imx/mmdc.c +++ b/arch/arm/mach-imx/mmdc.c @@ -269,7 +269,7 @@ static bool mmdc_pmu_group_is_valid(stru return false; } - list_for_each_entry(sibling, &leader->sibling_list, sibling_list) { + for_each_sibling_event(sibling, leader) { if (!mmdc_pmu_group_event_is_valid(sibling, pmu, &counter_mask)) return false; } --- a/arch/arm/mm/cache-l2x0-pmu.c +++ b/arch/arm/mm/cache-l2x0-pmu.c @@ -293,7 +293,7 @@ static bool l2x0_pmu_group_is_valid(stru else if (!is_software_event(leader)) return false; - list_for_each_entry(sibling, &leader->sibling_list, sibling_list) { + for_each_sibling_event(sibling, leader) { if (sibling->pmu == pmu) num_hw++; else if (!is_software_event(sibling)) --- a/arch/mips/kernel/perf_event_mipsxx.c +++ b/arch/mips/kernel/perf_event_mipsxx.c @@ -711,7 +711,7 @@ static int validate_group(struct perf_ev if (mipsxx_pmu_alloc_counter(&fake_cpuc, &leader->hw) < 0) return -EINVAL; - list_for_each_entry(sibling, &leader->sibling_list, sibling_list) { + for_each_sibling_event(sibling, leader) { if (mipsxx_pmu_alloc_counter(&fake_cpuc, &sibling->hw) < 0) return -EINVAL; } --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -1426,7 +1426,7 @@ static int collect_events(struct perf_ev flags[n] = group->hw.event_base; events[n++] = group->hw.config; } - list_for_each_entry(event, &group->sibling_list, sibling_list) { + for_each_sibling_event(event, group) { if (event->pmu->task_ctx_nr == perf_hw_context && event->state != PERF_EVENT_STATE_OFF) { if (n >= max_count) --- a/arch/powerpc/perf/core-fsl-emb.c +++ b/arch/powerpc/perf/core-fsl-emb.c @@ -277,7 +277,7 @@ static int collect_events(struct perf_ev ctrs[n] = group; n++; } - list_for_each_entry(event, &group->sibling_list, sibling_list) { + for_each_sibling_event(event, group) { if (!is_software_event(event) && event->state != PERF_EVENT_STATE_OFF) { if (n >= max_count) --- a/arch/sparc/kernel/perf_event.c +++ b/arch/sparc/kernel/perf_event.c @@ -1342,7 +1342,7 @@ static int collect_events(struct perf_ev events[n] = group->hw.event_base; current_idx[n++] = PIC_NO_INDEX; } - list_for_each_entry(event, &group->sibling_list, sibling_list) { + for_each_sibling_event(event, group) { if (!is_software_event(event) && event->state != PERF_EVENT_STATE_OFF) { if (n >= max_count) --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -990,7 +990,7 @@ static int collect_events(struct cpu_hw_ if (!dogrp) return n; - list_for_each_entry(event, &leader->sibling_list, sibling_list) { + for_each_sibling_event(event, leader) { if (!is_x86_event(event) || event->state <= PERF_EVENT_STATE_OFF) continue; --- a/arch/x86/events/intel/uncore.c +++ b/arch/x86/events/intel/uncore.c @@ -354,7 +354,7 @@ uncore_collect_events(struct intel_uncor if (!dogrp) return n; - list_for_each_entry(event, &leader->sibling_list, sibling_list) { + for_each_sibling_event(event, leader) { if (!is_box_event(box, event) || event->state <= PERF_EVENT_STATE_OFF) continue; --- a/drivers/bus/arm-cci.c +++ b/drivers/bus/arm-cci.c @