Re: [PATCH v4 3/4] mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check
On Wed 03-06-20 17:48:04, Feng Tang wrote:
> On Tue, Jun 02, 2020 at 12:02:22AM -0400, Qian Cai wrote:
> >
> > > On Jun 1, 2020, at 11:37 PM, Feng Tang wrote:
> > >
> > > I re-run the same benchmark with v5.7 and 5.7+remove_warning kernels,
> > > the overall performance change is trivial (which is expected)
> > >
> > >    1330147    +0.1%    1331032    will-it-scale.72.processes
> > >
> > > But the perf stats of "self" shows big change for __vm_enough_memory()
> > >
> > >    0.27    -0.3    0.00    pp.self.__vm_enough_memory
> > >
> > > I post the full compare result in the end.
> >
> > I don’t really see what that means exactly, but I suppose the warning is
> > there for so long and no one seems notice much trouble (or benefit) because
> > of it, so I think you will probably need to come up with a proper
> > justification to explain why it is a trouble now, and how your patchset
> > suddenly start to trigger the warning as well as why it is no better way
> > but to suffer this debuggability regression (probably tiny but still).
>
> Thanks for the suggestion, and I updated the commit log.
>
>
> From 1633da8228bd3d0dcbbd8df982977ad4594962a1 Mon Sep 17 00:00:00 2001
> From: Feng Tang
> Date: Fri, 29 May 2020 08:48:48 +0800
> Subject: [PATCH] mm/util.c: remove the VM_WARN_ONCE for vm_committed_as
>  underflow check
>
> This check was added by 82f71ae4a2b8 ("mm: catch memory commitment underflow")
> in 2014 to have a safety check for issues which have been fixed.
> And there has been few report caught by it, as described in its
> commit log:
>
> : This shouldn't happen any more - the previous two patches fixed
> : the committed_as underflow issues.
>
> But it was really found by Qian Cai when he used the LTP memory
> stress suite to test a RFC patchset, which tries to improve scalability
> of per-cpu counter 'vm_committed_as', by choosing a bigger 'batch' number
> for loose overcommit policies (OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS),
> while keeping current number for OVERCOMMIT_NEVER.
>
> With that patchset, when system firstly uses a loose policy, the
> 'vm_committed_as' count could be a big negative value, as its big 'batch'
> number allows a big deviation, then when the policy is changed to
> OVERCOMMIT_NEVER, the 'batch' will be decreased to a much smaller value,
> thus hits this WARN check.
>
> To mitigate this, one proposed solution is to queue work on all online
> CPUs to do a local sync for 'vm_committed_as' when changing policy to
> OVERCOMMIT_NEVER, plus some global syncing to guarantee the case won't
> be hit.
>
> But this solution is costly and slow, given this check hasn't shown real
> trouble or benefit, simply drop it from one hot path of MM. And perf
> stats does show some tiny saving for removing it.
>
> Reported-by: Qian Cai
> Signed-off-by: Feng Tang
> Cc: Konstantin Khlebnikov
> Cc: Michal Hocko
> Cc: Andi Kleen

Acked-by: Michal Hocko

> ---
>  mm/util.c | 8 --------
>  1 file changed, 8 deletions(-)
>
> diff --git a/mm/util.c b/mm/util.c
> index 9b3be03..c63c8e4 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -814,14 +814,6 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
>  {
>  	long allowed;
>
> -	/*
> -	 * A transient decrease in the value is unlikely, so no need
> -	 * READ_ONCE() for vm_committed_as.count.
> -	 */
> -	VM_WARN_ONCE(data_race(percpu_counter_read(&vm_committed_as) <
> -			-(s64)vm_committed_as_batch * num_online_cpus()),
> -			"memory commitment underflow");
> -
>  	vm_acct_memory(pages);
>
>  	/*
> --
> 2.7.4

--
Michal Hocko
SUSE Labs

Re: [PATCH v4 3/4] mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check
> On Jun 3, 2020, at 5:48 AM, Feng Tang wrote:
>
> This check was added by 82f71ae4a2b8 ("mm: catch memory commitment underflow")
> in 2014 to have a safety check for issues which have been fixed.
> And there has been few report caught by it, as described in its
> commit log:
>
> : This shouldn't happen any more - the previous two patches fixed
> : the committed_as underflow issues.
>
> But it was really found by Qian Cai when he used the LTP memory
> stress suite to test a RFC patchset, which tries to improve scalability
> of per-cpu counter 'vm_committed_as', by choosing a bigger 'batch' number
> for loose overcommit policies (OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS),
> while keeping current number for OVERCOMMIT_NEVER.
>
> With that patchset, when system firstly uses a loose policy, the
> 'vm_committed_as' count could be a big negative value, as its big 'batch'
> number allows a big deviation, then when the policy is changed to
> OVERCOMMIT_NEVER, the 'batch' will be decreased to a much smaller value,
> thus hits this WARN check.
>
> To mitigate this, one proposed solution is to queue work on all online
> CPUs to do a local sync for 'vm_committed_as' when changing policy to
> OVERCOMMIT_NEVER, plus some global syncing to guarantee the case won't
> be hit.
>
> But this solution is costly and slow, given this check hasn't shown real
> trouble or benefit, simply drop it from one hot path of MM. And perf
> stats does show some tiny saving for removing it.

The text looks more reasonable than the previous one.

Reviewed-by: Qian Cai

Re: [PATCH v4 3/4] mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check
On Tue, Jun 02, 2020 at 12:02:22AM -0400, Qian Cai wrote:
>
> > On Jun 1, 2020, at 11:37 PM, Feng Tang wrote:
> >
> > I re-run the same benchmark with v5.7 and 5.7+remove_warning kernels,
> > the overall performance change is trivial (which is expected)
> >
> >    1330147    +0.1%    1331032    will-it-scale.72.processes
> >
> > But the perf stats of "self" shows big change for __vm_enough_memory()
> >
> >    0.27    -0.3    0.00    pp.self.__vm_enough_memory
> >
> > I post the full compare result in the end.
>
> I don’t really see what that means exactly, but I suppose the warning is
> there for so long and no one seems notice much trouble (or benefit) because
> of it, so I think you will probably need to come up with a proper
> justification to explain why it is a trouble now, and how your patchset
> suddenly start to trigger the warning as well as why it is no better way but
> to suffer this debuggability regression (probably tiny but still).

Thanks for the suggestion, and I updated the commit log.


From 1633da8228bd3d0dcbbd8df982977ad4594962a1 Mon Sep 17 00:00:00 2001
From: Feng Tang
Date: Fri, 29 May 2020 08:48:48 +0800
Subject: [PATCH] mm/util.c: remove the VM_WARN_ONCE for vm_committed_as
 underflow check

This check was added by 82f71ae4a2b8 ("mm: catch memory commitment underflow")
in 2014 to have a safety check for issues which have been fixed.
And there has been few report caught by it, as described in its
commit log:

: This shouldn't happen any more - the previous two patches fixed
: the committed_as underflow issues.

But it was really found by Qian Cai when he used the LTP memory
stress suite to test a RFC patchset, which tries to improve scalability
of per-cpu counter 'vm_committed_as', by choosing a bigger 'batch' number
for loose overcommit policies (OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS),
while keeping current number for OVERCOMMIT_NEVER.

With that patchset, when system firstly uses a loose policy, the
'vm_committed_as' count could be a big negative value, as its big 'batch'
number allows a big deviation, then when the policy is changed to
OVERCOMMIT_NEVER, the 'batch' will be decreased to a much smaller value,
thus hits this WARN check.

To mitigate this, one proposed solution is to queue work on all online
CPUs to do a local sync for 'vm_committed_as' when changing policy to
OVERCOMMIT_NEVER, plus some global syncing to guarantee the case won't
be hit.

But this solution is costly and slow, given this check hasn't shown real
trouble or benefit, simply drop it from one hot path of MM. And perf
stats does show some tiny saving for removing it.

Reported-by: Qian Cai
Signed-off-by: Feng Tang
Cc: Konstantin Khlebnikov
Cc: Michal Hocko
Cc: Andi Kleen
---
 mm/util.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/mm/util.c b/mm/util.c
index 9b3be03..c63c8e4 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -814,14 +814,6 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 {
 	long allowed;

-	/*
-	 * A transient decrease in the value is unlikely, so no need
-	 * READ_ONCE() for vm_committed_as.count.
-	 */
-	VM_WARN_ONCE(data_race(percpu_counter_read(&vm_committed_as) <
-			-(s64)vm_committed_as_batch * num_online_cpus()),
-			"memory commitment underflow");
-
 	vm_acct_memory(pages);

 	/*
--
2.7.4

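The batch-shrink scenario in the commit log above can be sketched with a little arithmetic. This is only a user-space illustration, not kernel code: `underflow_check_fires()` mirrors the condition of the removed VM_WARN_ONCE as a pure function, and `worst_case_lag()` assumes a simplified model in which each CPU may hold up to `batch - 1` of positive delta that has not yet been folded into the shared count. Both helper names and the batch values used below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Condition of the removed check, as a pure function:
 * the approximate (shared) count is below -(batch * number of CPUs).
 */
static bool underflow_check_fires(int64_t approx_count, int32_t batch, int ncpus)
{
	assert(batch > 0 && ncpus > 0);
	return approx_count < -(int64_t)batch * ncpus;
}

/*
 * Simplified model (an assumption, not taken from the kernel source):
 * with a true total of zero, every CPU may still hold up to (batch - 1)
 * of unfolded positive delta, so the shared count can lag this far
 * below zero.
 */
static int64_t worst_case_lag(int32_t batch, int ncpus)
{
	assert(batch > 0 && ncpus > 0);
	return -(int64_t)(batch - 1) * ncpus;
}
```

With a hypothetical large batch of 1024 on 72 CPUs, `worst_case_lag(1024, 72)` is -73656. That value stays above the threshold for the large batch (-73728), so the check is quiet, but it is far below the threshold for a hypothetical small batch of 64 (-4608), so the same shared count would trip the warning right after the batch is reduced.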
Re: [PATCH v4 3/4] mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check
> On Jun 1, 2020, at 11:37 PM, Feng Tang wrote:
>
> I re-run the same benchmark with v5.7 and 5.7+remove_warning kernels,
> the overall performance change is trivial (which is expected)
>
>    1330147    +0.1%    1331032    will-it-scale.72.processes
>
> But the perf stats of "self" shows big change for __vm_enough_memory()
>
>    0.27    -0.3    0.00    pp.self.__vm_enough_memory
>
> I post the full compare result in the end.

I don’t really see what that means exactly, but I suppose the warning is
there for so long and no one seems notice much trouble (or benefit) because
of it, so I think you will probably need to come up with a proper
justification to explain why it is a trouble now, and how your patchset
suddenly start to trigger the warning as well as why it is no better way but
to suffer this debuggability regression (probably tiny but still).

Re: [PATCH v4 3/4] mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check
Hi Qian,

On Thu, May 28, 2020 at 10:49:28PM -0400, Qian Cai wrote:
> On Fri, May 29, 2020 at 09:06:09AM +0800, Feng Tang wrote:
> > As is explained by Michal Hocko:
> >
> > : Looking at the history, this has been added by 82f71ae4a2b8
> > : ("mm: catch memory commitment underflow") to have a safety check
> > : for issues which have been fixed. There doesn't seem to be any bug
> > : reports mentioning this splat since then so it is likely just
> > : spending cycles for a hot path (yes many people run with DEBUG_VM)
> > : without a strong reason.
>
> Hmm, it looks like the warning is still useful to catch issues in,
>
> https://lore.kernel.org/linux-mm/20140624201606.18273.44270.stgit@zurg
> https://lore.kernel.org/linux-mm/54bb9a32.7080...@oracle.com/
>
> After read the whole discussion in that thread, I actually disagree with
> Michal. In order to get rid of this existing warning, it is rather
> someone needs a strong reason that could prove the performance hit is
> noticeable with some data.

I re-run the same benchmark with v5.7 and 5.7+remove_warning kernels,
the overall performance change is trivial (which is expected)

   1330147    +0.1%    1331032    will-it-scale.72.processes

But the perf stats of "self" shows big change for __vm_enough_memory()

   0.27    -0.3    0.00    pp.self.__vm_enough_memory

I post the full compare result in the end.

Thanks,
Feng

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/nr_task/mode/test/cpufreq_governor/ucode:
  lkp-skl-2sp7/will-it-scale/debian-x86_64-20191114.cgz/x86_64-rhel-7.6-vm-debug/gcc-7/100%/process/mmap2/performance/0x265

commit:
  v5.7
  af3eca72dc43078e1ee4a38b0ecc0225b659f345

            v5.7 af3eca72dc43078e1ee4a38b0ec
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
        850:3       -12130%         486:2     dmesg.timestamp:last
          2:3          -67%            :2     kmsg.Firmware_Bug]:the_BIOS_has_corrupted_hw-PMU_resources(MSR#is#bb)
           :3           33%           1:2     kmsg.Firmware_Bug]:the_BIOS_has_corrupted_hw-PMU_resources(MSR#is#e08)
          5:3         -177%            :2     kmsg.timestamp:Firmware_Bug]:the_BIOS_has_corrupted_hw-PMU_resources(MSR#is#bb)
           :3           88%           2:2     kmsg.timestamp:Firmware_Bug]:the_BIOS_has_corrupted_hw-PMU_resources(MSR#is#e08)
        398:3            -%         265:2     kmsg.timestamp:last
         %stddev     %change         %stddev
             \          |                \
   1330147            +0.1%    1331032          will-it-scale.72.processes
      0.02            +0.0%       0.02          will-it-scale.72.processes_idle
     18474            +0.1%      18486          will-it-scale.per_process_ops
    301.18            -0.0%     301.16          will-it-scale.time.elapsed_time
    301.18            -0.0%     301.16          will-it-scale.time.elapsed_time.max
      1.00 ± 81%    +100.0%       2.00          will-it-scale.time.involuntary_context_switches
      9452            +0.0%       9452          will-it-scale.time.maximum_resident_set_size
      5925            +0.1%       5932          will-it-scale.time.minor_page_faults
      4096            +0.0%       4096          will-it-scale.time.page_size
      0.01 ± 35%     +12.5%       0.01 ± 33%    will-it-scale.time.system_time
      0.03 ± 14%      +5.0%       0.04 ± 14%    will-it-scale.time.user_time
     83.33            +0.2%      83.50          will-it-scale.time.voluntary_context_switches
   1330147            +0.1%    1331032          will-it-scale.workload
      0.45 ± 29%      +0.0        0.50 ± 28%    mpstat.cpu.all.idle%
     98.41            -0.1       98.34          mpstat.cpu.all.sys%
      1.14            +0.0        1.16          mpstat.cpu.all.usr%
    200395 ± 18%     +11.9%     224282 ± 14%    cpuidle.C1.time
      4008 ± 38%      -2.1%       3924 ± 15%    cpuidle.C1.usage
 1.222e+08 ± 19%     -29.2%   86444161          cpuidle.C1E.time
    254203 ± 19%     -23.2%     195198 ±  4%    cpuidle.C1E.usage
   8145747 ± 31%    +339.9%   35830338 ± 72%    cpuidle.C6.time
     22878 ±  9%    +288.2%      88823 ± 70%    cpuidle.C6.usage
      8891 ±  7%      -7.4%       8229          cpuidle.POLL.time
      3111 ± 18%     -11.1%       2766          cpuidle.POLL.usage
      0.00          -100.0%       0.00          numa-numastat.node0.interleave_hit
    314399 ±  2%      -1.0%     311244 ±  3%    numa-numastat.node0.local_node
    322209            +2.4%     329909          numa-numastat.node0.numa_hit
      7814 ± 73%    +138.9%      18670 ± 24%    numa-numastat.node0.other_node
      0.00          -100.0%       0.00          numa-numastat.node1.interleave_hit
    343026 ±  2%      -0.3%     341980          numa-numastat.node1.local_node
    358632            -3.3%     346708          numa-numastat.node1.numa_hit
     15613 ±

Re: [PATCH v4 3/4] mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check
On Thu, May 28, 2020 at 10:49:28PM -0400, Qian Cai wrote:
> On Fri, May 29, 2020 at 09:06:09AM +0800, Feng Tang wrote:
> > As is explained by Michal Hocko:
> >
> > : Looking at the history, this has been added by 82f71ae4a2b8
> > : ("mm: catch memory commitment underflow") to have a safety check
> > : for issues which have been fixed. There doesn't seem to be any bug
> > : reports mentioning this splat since then so it is likely just
> > : spending cycles for a hot path (yes many people run with DEBUG_VM)
> > : without a strong reason.
>
> Hmm, it looks like the warning is still useful to catch issues in,
>
> https://lore.kernel.org/linux-mm/20140624201606.18273.44270.stgit@zurg
> https://lore.kernel.org/linux-mm/54bb9a32.7080...@oracle.com/
>
> After read the whole discussion in that thread, I actually disagree with
> Michal. In order to get rid of this existing warning, it is rather
> someone needs a strong reason that could prove the performance hit is
> noticeable with some data.

One problem with current check is percpu_counter_read(&vm_committed_as)
is not accurate, and percpu_counter_sum() is way too heavy.

Thanks,
Feng

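As a rough illustration of the accuracy-versus-cost point above, here is a small user-space model of a batched per-CPU counter. It is a sketch under stated assumptions, not the kernel's `struct percpu_counter`: the type `pcpu_counter_model`, the functions `model_add()`, `model_read()` and `model_sum()`, and the constant `NR_CPUS_MODEL` are all hypothetical names, and no locking or real per-CPU storage is modelled.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4	/* hypothetical CPU count for this toy model */

/* Toy model of a batched per-CPU counter. */
struct pcpu_counter_model {
	int64_t count;			/* shared, approximate value */
	int32_t local[NR_CPUS_MODEL];	/* per-CPU deltas not yet folded in */
};

/*
 * Add 'amount' on 'cpu'. The local delta is folded into the shared
 * count only once its magnitude reaches 'batch'.
 */
static void model_add(struct pcpu_counter_model *c, int cpu,
		      int64_t amount, int32_t batch)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);

	int64_t sum = c->local[cpu] + amount;

	if (sum >= batch || sum <= -batch) {
		c->count += sum;
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = (int32_t)sum;
	}
}

/* Cheap read: looks at the shared count only, so it may be stale. */
static int64_t model_read(const struct pcpu_counter_model *c)
{
	return c->count;
}

/* Exact read: walks every CPU's local delta, so cost grows with CPU count. */
static int64_t model_sum(const struct pcpu_counter_model *c)
{
	int64_t total = c->count;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		total += c->local[cpu];
	return total;
}
```

In this model, adding 10 on CPU 0 and -31 on CPU 1 with a batch of 32 leaves `model_read()` at 0 while `model_sum()` is -21. The cheap read can be off by almost one batch per CPU, and the exact sum has to visit every CPU, which is the trade-off described in the message.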
Re: [PATCH v4 3/4] mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check
On Fri, May 29, 2020 at 09:06:09AM +0800, Feng Tang wrote:
> As is explained by Michal Hocko:
>
> : Looking at the history, this has been added by 82f71ae4a2b8
> : ("mm: catch memory commitment underflow") to have a safety check
> : for issues which have been fixed. There doesn't seem to be any bug
> : reports mentioning this splat since then so it is likely just
> : spending cycles for a hot path (yes many people run with DEBUG_VM)
> : without a strong reason.

Hmm, it looks like the warning is still useful to catch issues in,

https://lore.kernel.org/linux-mm/20140624201606.18273.44270.stgit@zurg
https://lore.kernel.org/linux-mm/54bb9a32.7080...@oracle.com/

After read the whole discussion in that thread, I actually disagree with
Michal. In order to get rid of this existing warning, it is rather
someone needs a strong reason that could prove the performance hit is
noticeable with some data.

>
> Signed-off-by: Feng Tang
> Cc: Konstantin Khlebnikov
> Cc: Qian Cai
> Cc: Michal Hocko
> Cc: Andi Kleen
> ---
>  mm/util.c | 8 --------
>  1 file changed, 8 deletions(-)
>
> diff --git a/mm/util.c b/mm/util.c
> index 3c7a08c..fe63271 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -814,14 +814,6 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
>  {
>  	long allowed;
>
> -	/*
> -	 * A transient decrease in the value is unlikely, so no need
> -	 * READ_ONCE() for vm_committed_as.count.
> -	 */
> -	VM_WARN_ONCE(data_race(percpu_counter_read(&vm_committed_as) <
> -			-(s64)vm_committed_as_batch * num_online_cpus()),
> -			"memory commitment underflow");
> -
>  	vm_acct_memory(pages);
>
>  	/*
> --
> 2.7.4
>

[PATCH v4 3/4] mm/util.c: remove the VM_WARN_ONCE for vm_committed_as underflow check
As is explained by Michal Hocko:

: Looking at the history, this has been added by 82f71ae4a2b8
: ("mm: catch memory commitment underflow") to have a safety check
: for issues which have been fixed. There doesn't seem to be any bug
: reports mentioning this splat since then so it is likely just
: spending cycles for a hot path (yes many people run with DEBUG_VM)
: without a strong reason.

Signed-off-by: Feng Tang
Cc: Konstantin Khlebnikov
Cc: Qian Cai
Cc: Michal Hocko
Cc: Andi Kleen
---
 mm/util.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/mm/util.c b/mm/util.c
index 3c7a08c..fe63271 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -814,14 +814,6 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 {
 	long allowed;

-	/*
-	 * A transient decrease in the value is unlikely, so no need
-	 * READ_ONCE() for vm_committed_as.count.
-	 */
-	VM_WARN_ONCE(data_race(percpu_counter_read(&vm_committed_as) <
-			-(s64)vm_committed_as_batch * num_online_cpus()),
-			"memory commitment underflow");
-
 	vm_acct_memory(pages);

 	/*
--
2.7.4