Re: [perf metricgroup] fcc9c5243c: perf-sanity-tests.Parse_and_process_metrics.fail
On 11/3/20 10:24 PM, John Garry wrote:
> On 03/11/2020 16:05, Ian Rogers wrote:
>> On Tue, Nov 3, 2020 at 6:43 AM John Garry wrote:
>>> On 20/10/2020 17:53, Ian Rogers wrote:
>> Thanks for taking a look John. If you want help you can send the
>> output of "perf test 67 -vvv" to me. It is possible Broadwell has
>> similar glitches in the json to Skylake. I tested the original test on
>> server parts as I can access them as cloud machines.
>>
>>> I will have a look, but I was hoping that Ian would have a proper fix
>>> for this on top of ("perf metricgroup: Fix uncore metric expressions"),
>>> which now looks to be merged.
>> I still have these changes to look at in my inbox but I'm assuming
>> they're good:-) Sorry for not getting to them, but it's good they are
>> merged.
> Hi Ian,
> Checked in upstream kernel with your fix patch, in powerpc also test
> case 67 is passing.
> But I am getting issue in test 10 for powerpc
>
> [command]# ./perf test 10
> 10: PMU events :
> 10.1: PMU event table sanity : Ok
> 10.2: PMU event map aliases : Ok
> 10.3: Parsing of PMU event table metrics :
> Skip (some metrics failed)
> 10.4: Parsing of PMU event table metrics with fake PMUs :
> FAILED!
>
> Was debugging it, issue is with commit e1c92a7fbbc5 perf tests: Add
> another metric parsing test.
>
> So, there we are passing different runtime parameter value in
> "expr__find_other and expr__parse"
> in function `metric_parse_fake`. I believe we need to send same value.
> I will send fix patch for the same.
>>> Just wondering, was a patch ever submitted for this? Something still
>>> broken? I can't see any recent relevant changes to tests/pmu-events.c
>> The test itself shouldn't have changed, but the json files parsed by
>> jevents and turned into C code that the test exercises should have
>> changed. Jin Yao has sent two patch sets fixing a metric issue on SKL
>> (Skylake non-server) that should hopefully fix the issue there - I'll
>> check the status on these. Are you testing on Skylake?
>
> So I have re-read this thread, and it seems that 2x different things are
> being discussed:
> a. some breakage for test #10 on skylake
> b. test #67 being broken
>
> It seems that a. has been addressed. That's what I was asking about just now.

Hi Ian/John,
The breakage for test #10 which I mentioned is for power9 machine, if that you were asking.
I still need to send fix patch out. I will send it soon.

Thanks,
Kajol Jain

>
> So about b., which I thought may be broken for some other reason apart from
> my hacky patch. But it seems not the case, and a proper patch is needed there.
>
> Ian, have you had a chance to consider this issue in b.? That is, we have
> breakage for metrics using uncore alias expressions for when multiple uncore
> PMUs associated exist in the system? As before, looks broken by ded80bda8bc9
> ("perf expr: Migrate expr ids table to a hashmap")
>
> Thanks,
> John
>
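
The runtime-parameter mismatch Kajol describes can be illustrated with a toy model. The sketch below is Python, not perf's C code: `find_ids`, `evaluate`, `substitute` and the metric string are hypothetical names made up for illustration. It relies only on what the thread states (that `metric_parse_fake` passes a runtime parameter to both `expr__find_other` and `expr__parse`), plus the assumption that a `?` in a metric expression is replaced by that parameter before event names are looked up.

```python
import re

# Toy model only (not perf's implementation). Assumption: a '?' in an
# event name is replaced by the runtime parameter before lookup.
ID_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def substitute(expr, runtime):
    """Replace every '?' placeholder with the runtime parameter."""
    return expr.replace("?", str(runtime))


def find_ids(expr, runtime):
    """Phase 1: collect the event names the expression refers to."""
    return set(ID_RE.findall(substitute(expr, runtime)))


def evaluate(expr, values, runtime):
    """Phase 2: evaluate the expression using the values from phase 1."""
    resolved = substitute(expr, runtime)
    missing = set(ID_RE.findall(resolved)) - set(values)
    if missing:
        raise KeyError("no value for %s" % sorted(missing))
    # eval() is tolerable here only because the input is a fixed toy string.
    return eval(resolved, {"__builtins__": {}}, dict(values))


if __name__ == "__main__":
    metric = "pm_read_? / (pm_read_? + pm_write_?)"  # hypothetical metric
    values = {name: 1.0 for name in find_ids(metric, 0)}
    print(evaluate(metric, values, 0))   # same runtime value in both phases
    try:
        evaluate(metric, values, 1)      # different runtime value: lookup fails
    except KeyError as err:
        print("mismatch:", err)
```

With the same runtime value in both phases the lookup succeeds; with different values the second phase asks for names the first phase never collected, which is the kind of failure that passing the same value to both calls would avoid.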
Re: [perf metricgroup] fcc9c5243c: perf-sanity-tests.Parse_and_process_metrics.fail
On 03/11/2020 16:05, Ian Rogers wrote:
On Tue, Nov 3, 2020 at 6:43 AM John Garry wrote:
On 20/10/2020 17:53, Ian Rogers wrote:

Thanks for taking a look John. If you want help you can send the output of "perf test 67 -vvv" to me. It is possible Broadwell has similar glitches in the json to Skylake. I tested the original test on server parts as I can access them as cloud machines.

I will have a look, but I was hoping that Ian would have a proper fix for this on top of ("perf metricgroup: Fix uncore metric expressions"), which now looks to be merged.

I still have these changes to look at in my inbox but I'm assuming they're good:-) Sorry for not getting to them, but it's good they are merged.

Hi Ian,
Checked in upstream kernel with your fix patch, in powerpc also test case 67 is passing.
But I am getting issue in test 10 for powerpc

[command]# ./perf test 10
10: PMU events :
10.1: PMU event table sanity: Ok
10.2: PMU event map aliases : Ok
10.3: Parsing of PMU event table metrics: Skip (some metrics failed)
10.4: Parsing of PMU event table metrics with fake PMUs : FAILED!

Was debugging it, issue is with commit e1c92a7fbbc5 perf tests: Add another metric parsing test.

So, there we are passing different runtime parameter value in "expr__find_other and expr__parse" in function `metric_parse_fake`. I believe we need to send same value. I will send fix patch for the same.

Just wondering, was a patch ever submitted for this? Something still broken? I can't see any recent relevant changes to tests/pmu-events.c

The test itself shouldn't have changed, but the json files parsed by jevents and turned into C code that the test exercises should have changed. Jin Yao has sent two patch sets fixing a metric issue on SKL (Skylake non-server) that should hopefully fix the issue there - I'll check the status on these. Are you testing on Skylake?

So I have re-read this thread, and it seems that 2x different things are being discussed:
a. some breakage for test #10 on skylake
b. test #67 being broken

It seems that a. has been addressed. That's what I was asking about just now.

So about b., which I thought may be broken for some other reason apart from my hacky patch. But it seems not the case, and a proper patch is needed there.

Ian, have you had a chance to consider this issue in b.? That is, we have breakage for metrics using uncore alias expressions for when multiple uncore PMUs associated exist in the system? As before, looks broken by ded80bda8bc9 ("perf expr: Migrate expr ids table to a hashmap")

Thanks,
John
Re: [perf metricgroup] fcc9c5243c: perf-sanity-tests.Parse_and_process_metrics.fail
On Tue, Nov 3, 2020 at 6:43 AM John Garry wrote:
>
> On 20/10/2020 17:53, Ian Rogers wrote:
> >>> Thanks for taking a look John. If you want help you can send the
> >>> output of "perf test 67 -vvv" to me. It is possible Broadwell has
> >>> similar glitches in the json to Skylake. I tested the original test on
> >>> server parts as I can access them as cloud machines.
> >>>
> I will have a look, but I was hoping that Ian would have a proper fix
> for this on top of ("perf metricgroup: Fix uncore metric expressions"),
> which now looks to be merged.
> >>> I still have these changes to look at in my inbox but I'm assuming
> >>> they're good:-) Sorry for not getting to them, but it's good they are
> >>> merged.
> >> Hi Ian,
> >> Checked in upstream kernel with your fix patch, in powerpc also test
> >> case 67 is passing.
> >> But I am getting issue in test 10 for powerpc
> >>
> >> [command]# ./perf test 10
> >> 10: PMU events :
> >> 10.1: PMU event table sanity: Ok
> >> 10.2: PMU event map aliases : Ok
> >> 10.3: Parsing of PMU event table metrics: Skip
> >> (some metrics failed)
> >> 10.4: Parsing of PMU event table metrics with fake PMUs :
> >> FAILED!
> >>
> >> Was debugging it, issue is with commit e1c92a7fbbc5 perf tests: Add
> >> another metric parsing test.
> >>
> >> So, there we are passing different runtime parameter value in
> >> "expr__find_other and expr__parse"
> >> in function `metric_parse_fake`. I believe we need to send same value.
> >> I will send fix patch for the same.
>
> Just wondering, was a patch ever submitted for this? Something still
> broken? I can't see any recent relevant changes to tests/pmu-events.c

The test itself shouldn't have changed, but the json files parsed by jevents and turned into C code that the test exercises should have changed. Jin Yao has sent two patch sets fixing a metric issue on SKL (Skylake non-server) that should hopefully fix the issue there - I'll check the status on these. Are you testing on Skylake?

Thanks,
Ian

> Thanks,
> John
Re: [perf metricgroup] fcc9c5243c: perf-sanity-tests.Parse_and_process_metrics.fail
On 20/10/2020 17:53, Ian Rogers wrote:

Thanks for taking a look John. If you want help you can send the output of "perf test 67 -vvv" to me. It is possible Broadwell has similar glitches in the json to Skylake. I tested the original test on server parts as I can access them as cloud machines.

I will have a look, but I was hoping that Ian would have a proper fix for this on top of ("perf metricgroup: Fix uncore metric expressions"), which now looks to be merged.

I still have these changes to look at in my inbox but I'm assuming they're good:-) Sorry for not getting to them, but it's good they are merged.

Hi Ian,
Checked in upstream kernel with your fix patch, in powerpc also test case 67 is passing.
But I am getting issue in test 10 for powerpc

[command]# ./perf test 10
10: PMU events :
10.1: PMU event table sanity: Ok
10.2: PMU event map aliases : Ok
10.3: Parsing of PMU event table metrics: Skip (some metrics failed)
10.4: Parsing of PMU event table metrics with fake PMUs : FAILED!

Was debugging it, issue is with commit e1c92a7fbbc5 perf tests: Add another metric parsing test.

So, there we are passing different runtime parameter value in "expr__find_other and expr__parse" in function `metric_parse_fake`. I believe we need to send same value. I will send fix patch for the same.

Just wondering, was a patch ever submitted for this? Something still broken? I can't see any recent relevant changes to tests/pmu-events.c

Thanks,
John
Re: [perf metricgroup] fcc9c5243c: perf-sanity-tests.Parse_and_process_metrics.fail
On Tue, Oct 20, 2020 at 1:56 AM kajoljain wrote:
>
>
>
> On 10/19/20 9:50 PM, Ian Rogers wrote:
> > On Mon, Oct 19, 2020 at 2:51 AM John Garry wrote:
> >>
> >> On 19/10/2020 00:30, Ian Rogers wrote:
> >>> On Sun, Oct 18, 2020 at 1:51 AM kernel test robot
> >>> wrote:
>
> Greeting,
>
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: fcc9c5243c478f104014daf4d23db86098d2aef0 ("perf metricgroup:
> Hack a fix for aliases when covering multiple PMUs")
> url:
> https://github.com/0day-ci/linux/commits/John-Garry/perf-pmu-events-Support-event-aliasing-for-system-PMUs/20201008-182049
>
>
> in testcase: perf-sanity-tests
> version: perf-x86_64-c85fb28b6f99-1_20201008
> with following parameters:
>
> perf_compiler: gcc
> ucode: 0xdc
>
>
>
> on test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz with
> 32G memory
>
> caused below changes (please refer to attached dmesg/kmsg for entire
> log/backtrace):
> >>>
> >>> I believe this is a Skylake and there is a known bug in the Skylake
> >>> metric DRAM_Parallel_Reads as described here:
> >>> https://lore.kernel.org/lkml/CAP-5=fxejvaqa9qfw66cy77qb962+jbe8tt5bslooocfmod...@mail.gmail.com/
> >>> Fixing the bug needs more knowledge than what is available in manuals.
> >>> Hopefully Intel can take a look.
> >>>
> >>> Thanks,
> >>> Ian
> >>
> >> So this named patch ("perf metricgroup: Hack a fix for aliases...") is
> >> breaking test #67 on my machine also, which is a broadwell.
> >
> > Thanks for taking a look John. If you want help you can send the
> > output of "perf test 67 -vvv" to me. It is possible Broadwell has
> > similar glitches in the json to Skylake. I tested the original test on
> > server parts as I can access them as cloud machines.
> >
> >> I will have a look, but I was hoping that Ian would have a proper fix
> >> for this on top of ("perf metricgroup: Fix uncore metric expressions"),
> >> which now looks to be merged.
> >
> > I still have these changes to look at in my inbox but I'm assuming
> > they're good :-) Sorry for not getting to them, but it's good they are
> > merged.
>
> Hi Ian,
> Checked in upstream kernel with your fix patch, in powerpc also test case
> 67 is passing.
> But I am getting issue in test 10 for powerpc
>
> [command]# ./perf test 10
> 10: PMU events :
> 10.1: PMU event table sanity: Ok
> 10.2: PMU event map aliases : Ok
> 10.3: Parsing of PMU event table metrics: Skip
> (some metrics failed)
> 10.4: Parsing of PMU event table metrics with fake PMUs : FAILED!
>
> Was debugging it, issue is with commit e1c92a7fbbc5 perf tests: Add another
> metric parsing test.
>
> So, there we are passing different runtime parameter value in
> "expr__find_other and expr__parse"
> in function `metric_parse_fake`. I believe we need to send same value.
> I will send fix patch for the same.
>
> Thanks,
> Kajol Jain

Thanks, the fake support was done by Jiri. I do try to test on Power 8. The awesome thing, aside from the testing nit fixes, is that the metrics will actually work once the test is passing :-). They may of course report junk.

Thanks,
Ian

> >
> > Thanks,
> > Ian
> >
> >> Thanks!
> >>
> >>>
>
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot
>
>
> 2020-10-16 19:31:52 sudo
> /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf
> test 67
> 67: Parse and process metrics : FAILED!
> 2020-10-16 19:31:52 sudo
> /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf
> test 68
> 68: x86 rdpmc : Ok
> 2020-10-16 19:31:52 sudo
> /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf
> test 69
> 69: Convert perf time to TSC : Ok
> 2020-10-16 19:31:52 sudo
> /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf
> test 70
> 70: DWARF unwind : Ok
> 2020-10-16 19:31:52 sudo
> /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf
> test 71
> 71: x86 instruction decoder - new instructions: Ok
> 2020-10-16 19:31:52 sudo
> /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf
> test 72
> 72: Intel PT packet decoder : Ok
> 2020-10-16 19:31:52 sudo
>
Re: [perf metricgroup] fcc9c5243c: perf-sanity-tests.Parse_and_process_metrics.fail
On 10/19/20 9:50 PM, Ian Rogers wrote:
> On Mon, Oct 19, 2020 at 2:51 AM John Garry wrote:
>>
>> On 19/10/2020 00:30, Ian Rogers wrote:
>>> On Sun, Oct 18, 2020 at 1:51 AM kernel test robot
>>> wrote:

Greeting,

FYI, we noticed the following commit (built with gcc-9):

commit: fcc9c5243c478f104014daf4d23db86098d2aef0 ("perf metricgroup: Hack a fix for aliases when covering multiple PMUs")
url: https://github.com/0day-ci/linux/commits/John-Garry/perf-pmu-events-Support-event-aliasing-for-system-PMUs/20201008-182049

in testcase: perf-sanity-tests
version: perf-x86_64-c85fb28b6f99-1_20201008
with following parameters:

perf_compiler: gcc
ucode: 0xdc

on test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz with 32G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):

>>>
>>> I believe this is a Skylake and there is a known bug in the Skylake
>>> metric DRAM_Parallel_Reads as described here:
>>> https://lore.kernel.org/lkml/CAP-5=fxejvaqa9qfw66cy77qb962+jbe8tt5bslooocfmod...@mail.gmail.com/
>>> Fixing the bug needs more knowledge than what is available in manuals.
>>> Hopefully Intel can take a look.
>>>
>>> Thanks,
>>> Ian
>>
>> So this named patch ("perf metricgroup: Hack a fix for aliases...") is
>> breaking test #67 on my machine also, which is a broadwell.
>
> Thanks for taking a look John. If you want help you can send the
> output of "perf test 67 -vvv" to me. It is possible Broadwell has
> similar glitches in the json to Skylake. I tested the original test on
> server parts as I can access them as cloud machines.
>
>> I will have a look, but I was hoping that Ian would have a proper fix
>> for this on top of ("perf metricgroup: Fix uncore metric expressions"),
>> which now looks to be merged.
>
> I still have these changes to look at in my inbox but I'm assuming
> they're good :-) Sorry for not getting to them, but it's good they are
> merged.

Hi Ian,
Checked in upstream kernel with your fix patch, in powerpc also test case 67 is passing.
But I am getting issue in test 10 for powerpc

[command]# ./perf test 10
10: PMU events :
10.1: PMU event table sanity: Ok
10.2: PMU event map aliases : Ok
10.3: Parsing of PMU event table metrics: Skip (some metrics failed)
10.4: Parsing of PMU event table metrics with fake PMUs : FAILED!

Was debugging it, issue is with commit e1c92a7fbbc5 perf tests: Add another metric parsing test.

So, there we are passing different runtime parameter value in "expr__find_other and expr__parse" in function `metric_parse_fake`. I believe we need to send same value.
I will send fix patch for the same.

Thanks,
Kajol Jain

>
> Thanks,
> Ian
>
>> Thanks!
>>
>>>

If you fix the issue, kindly add following tag
Reported-by: kernel test robot

2020-10-16 19:31:52 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf test 67
67: Parse and process metrics : FAILED!
2020-10-16 19:31:52 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf test 68
68: x86 rdpmc : Ok
2020-10-16 19:31:52 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf test 69
69: Convert perf time to TSC : Ok
2020-10-16 19:31:52 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf test 70
70: DWARF unwind : Ok
2020-10-16 19:31:52 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf test 71
71: x86 instruction decoder - new instructions: Ok
2020-10-16 19:31:52 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf test 72
72: Intel PT packet decoder : Ok
2020-10-16 19:31:52 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf test 73
73: x86 bp modify : Ok
2020-10-16 19:31:53 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf test 74
74: probe libc's inet_pton & backtrace it with ping : Ok
2020-10-16 19:31:54 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf test 75
Re: [perf metricgroup] fcc9c5243c: perf-sanity-tests.Parse_and_process_metrics.fail
On 19/10/2020 17:20, Ian Rogers wrote:

So this named patch ("perf metricgroup: Hack a fix for aliases...") is breaking test #67 on my machine also, which is a broadwell.

Thanks for taking a look John. If you want help you can send the output of "perf test 67 -vvv" to me. It is possible Broadwell has similar glitches in the json to Skylake. I tested the original test on server parts as I can access them as cloud machines.

Here it is:

john@localhost:~/kernel-dev7/tools/perf> ./perf test -vv 67
Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc
67: Parse and process metrics :
--- start ---
test child forked, pid 24433
metric expr inst_retired.any / cpu_clk_unhalted.thread for IPC
parsing metric: inst_retired.any / cpu_clk_unhalted.thread
found event inst_retired.any
found event cpu_clk_unhalted.thread
adding {inst_retired.any,cpu_clk_unhalted.thread}:W
Attempting to add event pmu 'inst_retired.any' with '' that may result in non-fatal errors
Attempting to add event pmu 'cpu_clk_unhalted.thread' with '' that may result in non-fatal errors
parsing metric: inst_retired.any / cpu_clk_unhalted.thread
lookup: is_ref 0, counted 0, val 300.00: inst_retired.any
lookup: is_ref 0, counted 101, val 200.00: cpu_clk_unhalted.thread
metric expr idq_uops_not_delivered.core / (4 * (( ( cpu_clk_unhalted.thread / 2 ) * ( 1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk ) ))) for Frontend_Bound_SMT
parsing metric: idq_uops_not_delivered.core / (4 * (( ( cpu_clk_unhalted.thread / 2 ) * ( 1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk ) )))
found event cpu_clk_unhalted.one_thread_active
found event cpu_clk_unhalted.ref_xclk
found event idq_uops_not_delivered.core
found event cpu_clk_unhalted.thread
adding {cpu_clk_unhalted.one_thread_active,cpu_clk_unhalted.ref_xclk,idq_uops_not_delivered.core,cpu_clk_unhalted.thread}:W
Attempting to add event pmu 'cpu_clk_unhalted.one_thread_active' with '' that may result in non-fatal errors
Attempting to add event pmu 'cpu_clk_unhalted.ref_xclk' with '' that may result in non-fatal errors
Attempting to add event pmu 'idq_uops_not_delivered.core' with '' that may result in non-fatal errors
Attempting to add event pmu 'cpu_clk_unhalted.thread' with '' that may result in non-fatal errors
parsing metric: idq_uops_not_delivered.core / (4 * (( ( cpu_clk_unhalted.thread / 2 ) * ( 1 + cpu_clk_unhalted.one_thread_active / cpu_clk_unhalted.ref_xclk ) )))
lookup: is_ref 0, counted 46, val 300.00: idq_uops_not_delivered.core
lookup: is_ref 0, counted 0, val 200.00: cpu_clk_unhalted.thread
lookup: is_ref 0, counted 216, val 400.00: cpu_clk_unhalted.one_thread_active
lookup: is_ref 0, counted 46, val 600.00: cpu_clk_unhalted.ref_xclk
metric expr (dcache_miss_cpi + icache_miss_cycles) for cache_miss_cycles
parsing metric: (dcache_miss_cpi + icache_miss_cycles)
metric expr l1d\-loads\-misses / inst_retired.any for dcache_miss_cpi
parsing metric: l1d\-loads\-misses / inst_retired.any
metric expr l1i\-loads\-misses / inst_retired.any for icache_miss_cycles
parsing metric: l1i\-loads\-misses / inst_retired.any
found event inst_retired.any
found event l1i-loads-misses
found event l1d-loads-misses
adding {inst_retired.any,l1i-loads-misses,l1d-loads-misses}:W
Attempting to add event pmu 'inst_retired.any' with '' that may result in non-fatal errors
adding ref metric icache_miss_cycles: l1i\-loads\-misses / inst_retired.any
adding ref metric dcache_miss_cpi: l1d\-loads\-misses / inst_retired.any
parsing metric: (dcache_miss_cpi + icache_miss_cycles)
lookup: is_ref 1, counted 0, val 0.00: dcache_miss_cpi
processing metric: dcache_miss_cpi ENTRY
parsing metric: l1d\-loads\-misses / inst_retired.any
lookup: is_ref 0, counted 105, val 300.00: l1d-loads-misses
lookup: is_ref 0, counted 46, val 400.00: inst_retired.any
processing metric: dcache_miss_cpi EXIT: 0.75
lookup: is_ref 1, counted 0, val 0.00: icache_miss_cycles
processing metric: icache_miss_cycles ENTRY
parsing metric: l1i\-loads\-misses / inst_retired.any
lookup: is_ref 0, counted 216, val 200.00: l1i-loads-misses
lookup: is_ref 0, counted 46, val 400.00: inst_retired.any
processing metric: icache_miss_cycles EXIT: 0.50
metric expr d_ratio(dcache_l2_all_hits, dcache_l2_all) for DCache_L2_Hits
parsing metric: d_ratio(dcache_l2_all_hits, dcache_l2_all)
metric expr l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit for DCache_L2_All_Hits
parsing metric: l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit
metric expr dcache_l2_all_hits + dcache_l2_all_miss for DCache_L2_All
parsing metric: dcache_l2_all_hits + dcache_l2_all_miss
metric expr l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit for DCache_L2_All_Hits
parsing metric: l2_rqsts.demand_data_rd_hit + l2_rqsts.pf_hit + l2_rqsts.rfo_hit
metric expr max(l2_rqsts.all_demand_data_rd -
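
The metric values in this log can be checked by hand from the `val` fields on the `lookup:` lines. The Python sketch below recomputes three of the metric expressions shown; the function names are ours and are not part of perf, and the inputs are simply the test values printed in the log.

```python
def ipc(inst_retired_any, cpu_clk_unhalted_thread):
    """IPC = inst_retired.any / cpu_clk_unhalted.thread"""
    return inst_retired_any / cpu_clk_unhalted_thread


def frontend_bound_smt(idq_uops_not_delivered_core, thread,
                       one_thread_active, ref_xclk):
    """Frontend_Bound_SMT, transcribed from the 'metric expr' line."""
    return idq_uops_not_delivered_core / (
        4 * ((thread / 2) * (1 + one_thread_active / ref_xclk)))


def cache_miss_cycles(l1d_loads_misses, l1i_loads_misses, inst_retired_any):
    """cache_miss_cycles = dcache_miss_cpi + icache_miss_cycles"""
    dcache_miss_cpi = l1d_loads_misses / inst_retired_any
    icache_miss_cycles = l1i_loads_misses / inst_retired_any
    return dcache_miss_cpi + icache_miss_cycles


if __name__ == "__main__":
    print(ipc(300.0, 200.0))                              # 300 / 200
    print(frontend_bound_smt(300.0, 200.0, 400.0, 600.0))
    print(cache_miss_cycles(300.0, 200.0, 400.0))         # 0.75 + 0.50
```

The two sub-metrics of `cache_miss_cycles` come out as 0.75 and 0.50, matching the `EXIT` lines in the log, so the expression evaluation itself looks sound up to the point where the output is cut off.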
Re: [perf metricgroup] fcc9c5243c: perf-sanity-tests.Parse_and_process_metrics.fail
On Mon, Oct 19, 2020 at 2:51 AM John Garry wrote:
>
> On 19/10/2020 00:30, Ian Rogers wrote:
> > On Sun, Oct 18, 2020 at 1:51 AM kernel test robot
> > wrote:
> >>
> >> Greeting,
> >>
> >> FYI, we noticed the following commit (built with gcc-9):
> >>
> >> commit: fcc9c5243c478f104014daf4d23db86098d2aef0 ("perf metricgroup: Hack
> >> a fix for aliases when covering multiple PMUs")
> >> url:
> >> https://github.com/0day-ci/linux/commits/John-Garry/perf-pmu-events-Support-event-aliasing-for-system-PMUs/20201008-182049
> >>
> >>
> >> in testcase: perf-sanity-tests
> >> version: perf-x86_64-c85fb28b6f99-1_20201008
> >> with following parameters:
> >>
> >> perf_compiler: gcc
> >> ucode: 0xdc
> >>
> >>
> >>
> >> on test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz with
> >> 32G memory
> >>
> >> caused below changes (please refer to attached dmesg/kmsg for entire
> >> log/backtrace):
> >
> > I believe this is a Skylake and there is a known bug in the Skylake
> > metric DRAM_Parallel_Reads as described here:
> > https://lore.kernel.org/lkml/CAP-5=fxejvaqa9qfw66cy77qb962+jbe8tt5bslooocfmod...@mail.gmail.com/
> > Fixing the bug needs more knowledge than what is available in manuals.
> > Hopefully Intel can take a look.
> >
> > Thanks,
> > Ian
>
> So this named patch ("perf metricgroup: Hack a fix for aliases...") is
> breaking test #67 on my machine also, which is a broadwell.

Thanks for taking a look John. If you want help you can send the output of "perf test 67 -vvv" to me. It is possible Broadwell has similar glitches in the json to Skylake. I tested the original test on server parts as I can access them as cloud machines.

> I will have a look, but I was hoping that Ian would have a proper fix
> for this on top of ("perf metricgroup: Fix uncore metric expressions"),
> which now looks to be merged.

I still have these changes to look at in my inbox but I'm assuming they're good :-) Sorry for not getting to them, but it's good they are merged.

Thanks,
Ian

> Thanks!
>
> >
> >>
> >>
> >> If you fix the issue, kindly add following tag
> >> Reported-by: kernel test robot
> >>
> >>
> >> 2020-10-16 19:31:52 sudo
> >> /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf
> >> test 67
> >> 67: Parse and process metrics : FAILED!
> >> 2020-10-16 19:31:52 sudo
> >> /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf
> >> test 68
> >> 68: x86 rdpmc : Ok
> >> 2020-10-16 19:31:52 sudo
> >> /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf
> >> test 69
> >> 69: Convert perf time to TSC : Ok
> >> 2020-10-16 19:31:52 sudo
> >> /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf
> >> test 70
> >> 70: DWARF unwind : Ok
> >> 2020-10-16 19:31:52 sudo
> >> /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf
> >> test 71
> >> 71: x86 instruction decoder - new instructions: Ok
> >> 2020-10-16 19:31:52 sudo
> >> /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf
> >> test 72
> >> 72: Intel PT packet decoder : Ok
> >> 2020-10-16 19:31:52 sudo
> >> /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf
> >> test 73
> >> 73: x86 bp modify : Ok
> >> 2020-10-16 19:31:53 sudo
> >> /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf
> >> test 74
> >> 74: probe libc's inet_pton & backtrace it with ping : Ok
> >> 2020-10-16 19:31:54 sudo
> >> /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf
> >> test 75
> >> 75: Zstd perf.data compression/decompression : Ok
> >>
> >>
> >>
> >> To reproduce:
> >>
> >> git clone https://github.com/intel/lkp-tests.git
> >> cd lkp-tests
> >> bin/lkp install job.yaml # job file is attached in this email
> >> bin/lkp run job.yaml
> >>
> >>
> >>
> >> Thanks,
> >> Rong Chen
> >>
> > .
>
>
Re: [perf metricgroup] fcc9c5243c: perf-sanity-tests.Parse_and_process_metrics.fail
Hi Garry, Hi Ian,

On 10/19/2020 5:48 PM, John Garry wrote:
On 19/10/2020 00:30, Ian Rogers wrote:
On Sun, Oct 18, 2020 at 1:51 AM kernel test robot wrote:

Greeting,

FYI, we noticed the following commit (built with gcc-9):

commit: fcc9c5243c478f104014daf4d23db86098d2aef0 ("perf metricgroup: Hack a fix for aliases when covering multiple PMUs")
url: https://github.com/0day-ci/linux/commits/John-Garry/perf-pmu-events-Support-event-aliasing-for-system-PMUs/20201008-182049

in testcase: perf-sanity-tests
version: perf-x86_64-c85fb28b6f99-1_20201008
with following parameters:

perf_compiler: gcc
ucode: 0xdc

on test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz with 32G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):

I believe this is a Skylake and there is a known bug in the Skylake metric DRAM_Parallel_Reads as described here:
https://lore.kernel.org/lkml/CAP-5=fxejvaqa9qfw66cy77qb962+jbe8tt5bslooocfmod...@mail.gmail.com/
Fixing the bug needs more knowledge than what is available in manuals. Hopefully Intel can take a look.

Thanks,
Ian

So this named patch ("perf metricgroup: Hack a fix for aliases...") is breaking test #67 on my machine also, which is a broadwell.

I will have a look, but I was hoping that Ian would have a proper fix for this on top of ("perf metricgroup: Fix uncore metric expressions"), which now looks to be merged.

Thanks!

I just think they are different issues. On my KBL client, the perf test #67 is passed. But DRAM_Parallel_Reads does have issue.

root@kbl-ppc:~# perf stat -M DRAM_Parallel_Reads -- sleep 1
event syntax error: '{arb/event=0x80,umask=0x2/,arb/event=0x80,umask=0x2,thresh=1/}:W'
\___ unknown term 'thresh' for pmu 'uncore_arb'

valid terms: event,edge,inv,umask,cmask,config,config1,config2,name,period,percore

Initial error:
event syntax error: '..umask=0x2/,arb/event=0x80,umask=0x2,thresh=1/}:W'
\___ Cannot find PMU `arb'. Missing kernel support?

Usage: perf stat [] []

-M, --metrics monitor specified metrics or metric groups (separated by ,)

I have a patch to fix DRAM_Parallel_Reads.

After:

root@kbl-ppc:~# perf stat -M MEM_Parallel_Reads -- sleep 1

Performance counter stats for 'system wide':

3,043,952 arb/event=0x80,umask=0x2/ # 1.00 MEM_Parallel_Reads

1.000879932 seconds time elapsed

I will post the patch later.

Thanks
Jin Yao

If you fix the issue, kindly add following tag
Reported-by: kernel test robot

2020-10-16 19:31:52 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf test 67
67: Parse and process metrics : FAILED!
2020-10-16 19:31:52 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf test 68
68: x86 rdpmc : Ok
2020-10-16 19:31:52 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf test 69
69: Convert perf time to TSC : Ok
2020-10-16 19:31:52 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf test 70
70: DWARF unwind : Ok
2020-10-16 19:31:52 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf test 71
71: x86 instruction decoder - new instructions : Ok
2020-10-16 19:31:52 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf test 72
72: Intel PT packet decoder : Ok
2020-10-16 19:31:52 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf test 73
73: x86 bp modify : Ok
2020-10-16 19:31:53 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf test 74
74: probe libc's inet_pton & backtrace it with ping : Ok
2020-10-16 19:31:54 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf test 75
75: Zstd perf.data compression/decompression : Ok

To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

Thanks,
Rong Chen
.
Re: [perf metricgroup] fcc9c5243c: perf-sanity-tests.Parse_and_process_metrics.fail
On 19/10/2020 00:30, Ian Rogers wrote: On Sun, Oct 18, 2020 at 1:51 AM kernel test robot wrote: Greeting, FYI, we noticed the following commit (built with gcc-9): commit: fcc9c5243c478f104014daf4d23db86098d2aef0 ("perf metricgroup: Hack a fix for aliases when covering multiple PMUs") url: https://github.com/0day-ci/linux/commits/John-Garry/perf-pmu-events-Support-event-aliasing-for-system-PMUs/20201008-182049 in testcase: perf-sanity-tests version: perf-x86_64-c85fb28b6f99-1_20201008 with following parameters: perf_compiler: gcc ucode: 0xdc on test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz with 32G memory caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace): I believe this is a Skylake and there is a known bug in the Skylake metric DRAM_Parallel_Reads as described here: https://lore.kernel.org/lkml/CAP-5=fxejvaqa9qfw66cy77qb962+jbe8tt5bslooocfmod...@mail.gmail.com/ Fixing the bug needs more knowledge than what is available in manuals. Hopefully Intel can take a look. Thanks, Ian So this named patch ("perf metricgroup: Hack a fix for aliases...") is breaking test #67 on my machine also, which is a broadwell. I will have a look, but I was hoping that Ian would have a proper fix for this on top of ("perf metricgroup: Fix uncore metric expressions"), which now looks to be merged. Thanks! If you fix the issue, kindly add following tag Reported-by: kernel test robot 2020-10-16 19:31:52 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf test 67 67: Parse and process metrics : FAILED! 
Re: [perf metricgroup] fcc9c5243c: perf-sanity-tests.Parse_and_process_metrics.fail
On 10/19/2020 9:52 AM, Andi Kleen wrote:
>> I believe this is a Skylake and there is a known bug in the Skylake
>> metric DRAM_Parallel_Reads as described here:
>> https://lore.kernel.org/lkml/CAP-5=fxejvaqa9qfw66cy77qb962+jbe8tt5bslooocfmod...@mail.gmail.com/
>> Fixing the bug needs more knowledge than what is available in manuals.
>> Hopefully Intel can take a look.
>
> Oh I missed the original mail for some reason. Yes it should be cmask
> instead of thresh for client. I think thresh is used on the server
> uncore only, not on the client.
>
> Jin Yao, can you send a patch please?
>
> -Andi

Yes, DRAM_Parallel_Reads works on server but fails on client. I will post a patch to fix that.

Thanks
Jin Yao
Re: [perf metricgroup] fcc9c5243c: perf-sanity-tests.Parse_and_process_metrics.fail
> I believe this is a Skylake and there is a known bug in the Skylake
> metric DRAM_Parallel_Reads as described here:
> https://lore.kernel.org/lkml/CAP-5=fxejvaqa9qfw66cy77qb962+jbe8tt5bslooocfmod...@mail.gmail.com/
> Fixing the bug needs more knowledge than what is available in manuals.
> Hopefully Intel can take a look.

Oh I missed the original mail for some reason. Yes it should be cmask instead of thresh for client. I think thresh is used on the server uncore only, not on the client.

Jin Yao, can you send a patch please?

-Andi
Re: [perf metricgroup] fcc9c5243c: perf-sanity-tests.Parse_and_process_metrics.fail
On Sun, Oct 18, 2020 at 1:51 AM kernel test robot wrote:
>
> Greeting,
>
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: fcc9c5243c478f104014daf4d23db86098d2aef0 ("perf metricgroup: Hack a
> fix for aliases when covering multiple PMUs")
> url:
> https://github.com/0day-ci/linux/commits/John-Garry/perf-pmu-events-Support-event-aliasing-for-system-PMUs/20201008-182049
>
>
> in testcase: perf-sanity-tests
> version: perf-x86_64-c85fb28b6f99-1_20201008
> with following parameters:
>
>         perf_compiler: gcc
>         ucode: 0xdc
>
>
>
> on test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz with 32G
> memory
>
> caused below changes (please refer to attached dmesg/kmsg for entire
> log/backtrace):

I believe this is a Skylake and there is a known bug in the Skylake
metric DRAM_Parallel_Reads as described here:
https://lore.kernel.org/lkml/CAP-5=fxejvaqa9qfw66cy77qb962+jbe8tt5bslooocfmod...@mail.gmail.com/
Fixing the bug needs more knowledge than what is available in manuals.
Hopefully Intel can take a look.

Thanks,
Ian

>
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot
>
>
> 2020-10-16 19:31:52 sudo
> /usr/src/perf_selftests-x86_64-rhel-8.3-fcc9c5243c478f104014daf4d23db86098d2aef0/tools/perf/perf
> test 67
> 67: Parse and process metrics : FAILED!