Re: [Intel-gfx] [linus:master] [file] 0ede61d858: will-it-scale.per_thread_ops -2.9% regression
On Mon, Nov 27, 2023 at 09:10:54AM -0800, Linus Torvalds wrote:
> On Mon, 27 Nov 2023 at 02:27, Christian Brauner wrote:
> >
> > So I've picked up your patch (vfs.misc). It's clever alright so thanks
> > for the comments in there otherwise I would've stared at this for far
> > too long.
>
> Note that I should probably have commented on one other thing: that
> whole "just load from fd[0] is always safe, because the fd[] array
> always exists". I added a comment to that effect in the code.
>
> IOW, that whole "load and mask" thing only works when you know the
> array exists at all.
>
> Doing that "just mask the index" wouldn't be valid if "size = 0" is an
> option and might mean that we don't have an array at all (ie if "->fd"
> itself could be NULL).
>
> But we never have a completely empty file descriptor array, and
> fdp->fd is never NULL. At a minimum 'max_fds' is NR_OPEN_DEFAULT.
>
> (The whole 'tsk->files' could be NULL, but only for kernel threads or
> when exiting, so fget_task() will check for *that*, but it's a
> separate thing)

Yep.

> So that's why it's safe to *entirely* remove the whole
>
>         if (unlikely(fd >= fdt->max_fds))
>
> test, and do it *all* with just "mask the index, and mask the
> resulting load".

Yep.
Re: [Intel-gfx] [linus:master] [file] 0ede61d858: will-it-scale.per_thread_ops -2.9% regression
On Mon, 27 Nov 2023 at 02:27, Christian Brauner wrote:
>
> So I've picked up your patch (vfs.misc). It's clever alright so thanks
> for the comments in there otherwise I would've stared at this for far
> too long.

Note that I should probably have commented on one other thing: that
whole "just load from fd[0] is always safe, because the fd[] array
always exists".

IOW, that whole "load and mask" thing only works when you know the
array exists at all.

Doing that "just mask the index" wouldn't be valid if "size = 0" is an
option and might mean that we don't have an array at all (ie if "->fd"
itself could be NULL).

But we never have a completely empty file descriptor array, and
fdp->fd is never NULL. At a minimum 'max_fds' is NR_OPEN_DEFAULT.

(The whole 'tsk->files' could be NULL, but only for kernel threads or
when exiting, so fget_task() will check for *that*, but it's a
separate thing)

So that's why it's safe to *entirely* remove the whole

        if (unlikely(fd >= fdt->max_fds))

test, and do it *all* with just "mask the index, and mask the resulting
load". Because we can *always* do that load at "fdt->fd[0]", and we
want to check the result for NULL anyway, so the "mask at the end and
check for NULL" is both natural and generates very good code.

Anyway, not a big deal, but it might be worth noting before somebody
tries the same trick on some other array that *could* be zero-sized and
with a NULL base pointer, and where that 'array[0]' access isn't
necessarily guaranteed to be ok.

> It's a little unpleasant because of the cast-orama going on before we
> check the file pointer but I don't see that it's in any way wrong.

In my cleanup phase - which was a bit messy - I did wonder if I should
have some helper for it, since it shows up in both __fget_files_rcu()
and in files_lookup_fd_raw().

So I *could* have tried to add something like a
"masked_rcu_dereference()" that took the base pointer, the index, and
the mask, and did that whole dance.
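The "mask the index, and mask the resulting load" pattern described above can be sketched in plain userspace C. All names here are mine, and the branch-free mask helper is only an illustration (the kernel uses array_index_mask_nospec(), which on x86 is a cmp/sbb asm sequence); the precondition is the same one Linus stresses: the table always has at least one slot, so reading slot 0 is always safe.

```c
#include <stddef.h>
#include <stdint.h>

/* 0UL when idx >= size, ~0UL when idx < size (illustrative helper) */
static unsigned long index_mask(unsigned long idx, unsigned long size)
{
	return 0UL - (unsigned long)(idx < size);
}

static void *masked_lookup(void **table, unsigned long size, unsigned long idx)
{
	unsigned long mask = index_mask(idx, size);

	/*
	 * The load is unconditional: an invalid idx is masked down to
	 * slot 0, which always exists because the table is never empty
	 * (size >= 1 is a hard precondition, like NR_OPEN_DEFAULT).
	 */
	void *entry = table[idx & mask];

	/*
	 * Mask the *result* too, so an out-of-range idx yields NULL even
	 * when slot 0 holds a live pointer.
	 */
	return (void *)(mask & (uintptr_t)entry);
}
```

The caller then only has one conditional left, the NULL check it wanted to do anyway.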
Or I could have had just a "mask_pointer()" function, which we do
occasionally do in other places too (ie we hide data in low bits, and
then we mask them away when the pointer is used as a pointer).

But with only two users, it seemed to add more conceptual complexity
than it's worth, and I was not convinced that we'd want to expose that
pattern and have others use it. So having a helper might clarify
things, but it might also encourage wrong users. I dunno.

I suspect the only real use for this ends up being this very special
"access the fdt->fd[] array using a file descriptor".

Anyway, that's why I largely just did it with comments, and commented
both places - and just kept the cast there in the open.

                Linus
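The "hide data in low bits, mask them away before use" pattern mentioned above looks roughly like the following sketch. These helper names are hypothetical, not kernel API; the trick only works when the pointed-to objects are aligned enough that the low address bits are guaranteed zero.

```c
#include <stdint.h>

/* Objects aligned to >= 4 bytes leave the two low address bits free. */
#define PTR_FLAG_MASK 3UL

/* Tag a sufficiently aligned pointer with up to two flag bits. */
static void *ptr_add_flags(void *p, unsigned long flags)
{
	return (void *)((uintptr_t)p | (flags & PTR_FLAG_MASK));
}

/* Read the hidden flag bits back out. */
static unsigned long ptr_flags(void *p)
{
	return (uintptr_t)p & PTR_FLAG_MASK;
}

/* The "mask_pointer()" step: strip the flag bits before dereferencing. */
static void *mask_pointer(void *p)
{
	return (void *)((uintptr_t)p & ~PTR_FLAG_MASK);
}
```

Forgetting the mask_pointer() step before a dereference is exactly the kind of "wrong user" a general helper might encourage.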
Re: [Intel-gfx] [linus:master] [file] 0ede61d858: will-it-scale.per_thread_ops -2.9% regression
On Mon, Nov 20, 2023 at 03:11:31PM +0800, kernel test robot wrote:
>
> Hello,
>
> kernel test robot noticed a -2.9% regression of will-it-scale.per_thread_ops
> on:
>
> commit: 0ede61d8589cc2d93aa78230d74ac58b5b8d0244 ("file: convert to
> SLAB_TYPESAFE_BY_RCU")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> 93faf426e3cc000c 0ede61d8589cc2d93aa78230d74
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
[snip]
>      30.90 ±  4%     -20.6       10.35 ±  2%  perf-profile.self.cycles-pp.__fget_light
>       0.00           +26.5       26.48        perf-profile.self.cycles-pp.__get_file_rcu
[snip]

So __fget_light now got a func call. I don't know if this is worth
patching (and benchmarking after), but if sorting this out is of
interest, the triviality below is probably the easiest way out:

diff --git a/fs/file.c b/fs/file.c
index 5fb0b146e79e..d8d3e18800c4 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -856,14 +856,14 @@ void do_close_on_exec(struct files_struct *files)
 	spin_unlock(&files->file_lock);
 }
 
-static struct file *__get_file_rcu(struct file __rcu **f)
+static __always_inline struct file *__get_file_rcu(struct file __rcu **f)
 {
 	struct file __rcu *file;
 	struct file __rcu *file_reloaded;
 	struct file __rcu *file_reloaded_cmp;
 
 	file = rcu_dereference_raw(*f);
-	if (!file)
+	if (unlikely(!file))
 		return NULL;
 
 	if (unlikely(!atomic_long_inc_not_zero(&file->f_count)))
@@ -891,7 +891,7 @@ static struct file *__get_file_rcu(struct file __rcu **f)
 	 * If the pointers don't match the file has been reallocated by
 	 * SLAB_TYPESAFE_BY_RCU.
 	 */
-	if (file == file_reloaded_cmp)
+	if (likely(file == file_reloaded_cmp))
 		return file_reloaded;
 
 	fput(file);
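The three annotations the diff above leans on - `__always_inline` and the `likely()`/`unlikely()` branch hints - can be approximated in userspace with their common GCC/Clang definitions. This is an illustrative sketch, not the kernel's compiler headers:

```c
/* Common GCC/Clang forms of the kernel annotations used above. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#define __always_inline inline __attribute__((__always_inline__))

/* Hypothetical hot-path helper: negative entries are expected to be rare,
 * so the hint keeps the increment on the straight-line fast path. */
static __always_inline long count_valid(const long *v, int n)
{
	long valid = 0;

	for (int i = 0; i < n; i++) {
		if (unlikely(v[i] < 0))
			continue;
		valid++;
	}
	return valid;
}
```

The hints don't change behavior, only block layout: the compiler places the expected path fall-through and the unexpected one out of line, which is the whole point of forcing the inline and annotating the NULL check in the patch.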
Re: [Intel-gfx] [linus:master] [file] 0ede61d858: will-it-scale.per_thread_ops -2.9% regression
> So that nobody else would waste any time on this, attached is a new
> attempt. This time actually tested *after* the changes.

So I've picked up your patch (vfs.misc). It's clever alright so thanks
for the comments in there otherwise I would've stared at this for far
too long.

It's a little unpleasant because of the cast-orama going on before we
check the file pointer but I don't see that it's in any way wrong. And
given how focussed people are with __fget_* performance I think it
might even be the right thing to do.

But the cleverness means we have the same logic slightly differently
twice. Not too bad ofc but not too nice either, especially because that
rcu lookup is pretty complicated already. A few days ago I did just
write a long explanatory off-list email to someone who had questions
about this and who is fairly experienced, so we're not making it easy
on people. But performance or simplicity; one can't necessarily always
have both.
Re: [Intel-gfx] [linus:master] [file] 0ede61d858: will-it-scale.per_thread_ops -2.9% regression
> I took a look at the code generation, and honestly, I think we're
> better off just making __fget_files_rcu() have special logic for this
> all, and not use __get_file_rcu().

My initial massaging of the patch did that btw. Then I sat there
wondering whether it would matter if we just made it possible to reuse
that code and I went through a bunch of iterations. Oh well, it seems
to matter.

> Comments?

Concept looks sane to me.

> I also looked at that odd OPTIMIZER_HIDE_VAR() that
> __get_file_rcu() does, and I don't get it. Both things come from
> volatile accesses, I don't see the point of those games, but I also
> didn't care, since it's no longer in a critical code path.
>
> Christian?

Puts his completely imagined "I understand RCU" head on.

SLAB_TYPESAFE_BY_RCU makes the RCU consume memory ordering - which the
compiler doesn't officially support (afaik) - a bit wonky. So the
thinking was that we could have code patterns where you could free the
object and reallocate it while legitimately passing the pointer
recheck. In that case there is no memory ordering between the
allocation and the pointer recheck because the last (re)allocation
could have been after the rcu_dereference(). To combat that, all
future loads were made to have a dependency on the first load using
the hidevar trick.

I guess that might only be theoretically possible but not in practice?
But then I liked that we explicitly commented on it as a reminder.
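For reference, the hidevar trick discussed above boils down to an empty asm that the compiler must treat as possibly rewriting the variable, so later loads through the pointer can't be collapsed into earlier ones. The macro below is the common kernel-style definition; the surrounding usage is my own illustration, not the fs/file.c code:

```c
/*
 * Kernel-style OPTIMIZER_HIDE_VAR(): the empty asm takes 'var' as both
 * input and output, so the compiler must assume it changed and cannot
 * CSE or reorder accesses through it across this point.
 */
#define OPTIMIZER_HIDE_VAR(var) \
	__asm__ volatile("" : "=r"(var) : "0"(var))

/* Illustrative only: force two genuinely distinct loads from *p. */
static long load_twice(long *p)
{
	long first = *p;

	/* After this the compiler treats 'p' as a fresh value, so the
	 * second load below really happens instead of reusing 'first'. */
	OPTIMIZER_HIDE_VAR(p);
	return first + *p;
}
```

That "fresh value" is what creates the artificial dependency of later loads on the first one that Christian describes.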
Re: [Intel-gfx] [linus:master] [file] 0ede61d858: will-it-scale.per_thread_ops -2.9% regression
hi, Linus,

On Sun, Nov 26, 2023 at 03:20:58PM -0800, Linus Torvalds wrote:
> On Sun, 26 Nov 2023 at 12:23, Linus Torvalds wrote:
> >
> > IOW, I might have messed up some "trivial cleanup" when prepping for
> > sending it out...
>
> Bah. Famous last words. One of the "trivial cleanups" made the code
> more "obvious" by renaming the nospec mask as just "mask".
>
> And that trivial rename broke that patch *entirely*, because now that
> name shadowed the "fmode_t" mask argument.
>
> Don't even ask how long it took me to go from "I *tested* this,
> dammit, now it doesn't work at all" to "Oh God, I'm so stupid".
>
> So that nobody else would waste any time on this, attached is a new
> attempt. This time actually tested *after* the changes.

we applied the new patch upon 0ede61d858, and confirmed regression is
gone, even 3.4% better than 93faf426e3 now.

Tested-by: kernel test robot

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/thread/16/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/poll2/will-it-scale

commit:
  93faf426e3 ("vfs: shave work on failed file open")
  0ede61d858 ("file: convert to SLAB_TYPESAFE_BY_RCU")
  c712b4365b ("Improve __fget_files_rcu() code generation (and thus __fget_light())")

93faf426e3cc000c 0ede61d8589cc2d93aa78230d74 c712b4365b5b4dbe1d1380edd37
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    228481 ±  4%      -4.6%     217900 ±  6%     -11.7%     201857 ±  5%  meminfo.DirectMap4k
     89056            -2.0%      87309            -1.6%      87606        proc-vmstat.nr_slab_unreclaimable
     16.28            -0.7%      16.16            -1.0%      16.12        turbostat.RAMWatt
      0.01 ±  9%  +58125.6%       4.17 ±175%  +23253.5%       1.67 ±222%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    781.67 ± 10%      +6.5%     832.50 ± 19%     -14.3%     670.17 ±  4%  perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     97958 ±  7%      -9.7%      88449 ±  4%      -0.6%      97399 ±  4%  sched_debug.cpu.avg_idle.stddev
      0.00 ± 12%     +24.2%       0.00 ± 17%      -5.2%       0.00 ±  7%  sched_debug.cpu.next_balance.stddev
   6391048            -2.9%    6208584            +3.4%    6605584        will-it-scale.16.threads
    399440            -2.9%     388036            +3.4%     412848        will-it-scale.per_thread_ops
   6391048            -2.9%    6208584            +3.4%    6605584        will-it-scale.workload
     19.99 ±  4%      -2.2       17.74            +1.2       21.18 ±  2%  perf-profile.calltrace.cycles-pp.fput.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
      1.27 ±  5%      +0.8        2.11 ±  3%     +31.1       32.36 ±  2%  perf-profile.calltrace.cycles-pp.__fdget.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
     32.69 ±  4%      +5.0       37.70           -32.7        0.00        perf-profile.calltrace.cycles-pp.__fget_light.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
      0.00           +27.9       27.85            +0.0        0.00        perf-profile.calltrace.cycles-pp.__get_file_rcu.__fget_light.do_poll.do_sys_poll.__x64_sys_poll
     20.00 ±  4%      -2.3       17.75            +0.4       20.43 ±  2%  perf-profile.children.cycles-pp.fput
      0.24 ± 10%      -0.1        0.18 ±  2%      -0.1        0.18 ± 10%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.48 ±  5%      +0.5        1.98 ±  3%     +30.8       32.32 ±  2%  perf-profile.children.cycles-pp.__fdget
     31.85 ±  4%      +6.0       37.86           -31.8        0.00        perf-profile.children.cycles-pp.__fget_light
      0.00           +27.7       27.67            +0.0        0.00        perf-profile.children.cycles-pp.__get_file_rcu
     30.90 ±  4%     -20.6       10.35 ±  2%     -30.9        0.00        perf-profile.self.cycles-pp.__fget_light
     19.94 ±  4%      -2.4       17.53            -0.3       19.62 ±  2%  perf-profile.self.cycles-pp.fput
      9.81 ±  4%      -2.4        7.42 ±  2%      +1.7       11.51 ±  4%  perf-profile.self.cycles-pp.do_poll
      0.23 ± 11%      -0.1        0.17 ±  4%      -0.1        0.18 ± 11%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.44 ±  7%      +0.0        0.45 ±  5%      +0.1        0.52 ±  4%  perf-profile.self.cycles-pp.__poll
      0.85 ±  4%      +0.1        0.92 ±  3%     +30.3       31.17 ±  2%  perf-profile.self.cycles-pp.__fdget
      0.00           +26.5       26.48            +0.0        0.00        perf-profile.self.cycles-pp.__get_file_rcu
 2.146e+10 ±  2%      +8.5%  2.329e+10 ±  2%      -2.1%  2.101e+10        perf-stat.i.branch-instructions
Re: [Intel-gfx] [linus:master] [file] 0ede61d858: will-it-scale.per_thread_ops -2.9% regression
On Sun, 26 Nov 2023 at 12:23, Linus Torvalds wrote:
>
> IOW, I might have messed up some "trivial cleanup" when prepping for
> sending it out...

Bah. Famous last words. One of the "trivial cleanups" made the code
more "obvious" by renaming the nospec mask as just "mask".

And that trivial rename broke that patch *entirely*, because now that
name shadowed the "fmode_t" mask argument.

Don't even ask how long it took me to go from "I *tested* this,
dammit, now it doesn't work at all" to "Oh God, I'm so stupid".

So that nobody else would waste any time on this, attached is a new
attempt. This time actually tested *after* the changes.

                Linus

From 45f70b5413a654d20ead410c533ec1d604bdb1e2 Mon Sep 17 00:00:00 2001
From: Linus Torvalds
Date: Sun, 26 Nov 2023 12:24:38 -0800
Subject: [PATCH] Improve __fget_files_rcu() code generation (and thus
 __fget_light())

Commit 0ede61d8589c ("file: convert to SLAB_TYPESAFE_BY_RCU") caused a
performance regression as reported by the kernel test robot.

The __fget_light() function is one of those critical ones for some
loads, and the code generation was unnecessarily impacted. Let's just
write that function better.
Reported-by: kernel test robot
Cc: Christian Brauner
Cc: Jann Horn
Cc: Mateusz Guzik
Closes: https://lore.kernel.org/oe-lkp/202311201406.2022ca3f-oliver.s...@intel.com
Signed-off-by: Linus Torvalds
---
 fs/file.c               | 48 +
 include/linux/fdtable.h | 15 -
 2 files changed, 40 insertions(+), 23 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 5fb0b146e79e..608b4802c214 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -959,31 +959,42 @@ static inline struct file *__fget_files_rcu(struct files_struct *files,
 		struct file *file;
 		struct fdtable *fdt = rcu_dereference_raw(files->fdt);
 		struct file __rcu **fdentry;
+		unsigned long nospec;
 
-		if (unlikely(fd >= fdt->max_fds))
+		/* Mask is a 0 for invalid fd's, ~0 for valid ones */
+		nospec = array_index_mask_nospec(fd, fdt->max_fds);
+
+		/* fdentry points to the 'fd' offset, or fdt->fd[0] */
+		fdentry = fdt->fd + (fd & nospec);
+
+		/* Do the load, then mask any invalid result */
+		file = rcu_dereference_raw(*fdentry);
+		file = (void *)(nospec & (unsigned long)file);
+
+		if (unlikely(!file))
 			return NULL;
-		fdentry = fdt->fd + array_index_nospec(fd, fdt->max_fds);
+		/*
+		 * Ok, we have a file pointer that was valid at
+		 * some point, but it might have become stale since.
+		 *
+		 * We need to confirm it by incrementing the refcount
+		 * and then check the lookup again.
+		 *
+		 * atomic_long_inc_not_zero() gives us a full memory
+		 * barrier. We only really need an 'acquire' one to
+		 * protect the loads below, but we don't have that.
+		 */
+		if (unlikely(!atomic_long_inc_not_zero(&file->f_count)))
+			continue;
 
 		/*
-		 * Ok, we have a file pointer. However, because we do
-		 * this all locklessly under RCU, we may be racing with
-		 * that file being closed.
-		 *
 		 * Such a race can take two forms:
 		 *
 		 *  (a) the file ref already went down to zero and the
 		 *      file hasn't been reused yet or the file count
 		 *      isn't zero but the file has already been reused.
-		 */
-		file = __get_file_rcu(fdentry);
-		if (unlikely(!file))
-			return NULL;
-
-		if (unlikely(IS_ERR(file)))
-			continue;
-
-		/*
+		 *
 		 *  (b) the file table entry has changed under us.
 		 *       Note that we don't need to re-check the 'fdt->fd'
 		 *       pointer having changed, because it always goes
@@ -991,7 +1002,8 @@ static inline struct file *__fget_files_rcu(struct files_struct *files,
 		 *
 		 * If so, we need to put our ref and try again.
 		 */
-		if (unlikely(rcu_dereference_raw(files->fdt) != fdt)) {
+		if (unlikely(file != rcu_dereference_raw(*fdentry)) ||
+		    unlikely(rcu_dereference_raw(files->fdt) != fdt)) {
 			fput(file);
 			continue;
 		}
@@ -1128,13 +1140,13 @@ static unsigned long __fget_light(unsigned int fd, fmode_t mask)
 	 * atomic_read_acquire() pairs with atomic_dec_and_test() in
 	 * put_files_struct().
 	 */
-	if (atomic_read_acquire(&files->count) == 1) {
+	if (likely(atomic_read_acquire(&files->count) == 1)) {
 		file = files_lookup_fd_raw(files, fd);
 		if (!file || unlikely(file->f_mode & mask))
 			return 0;
 		return (unsigned long)file;
 	} else {
-		file = __fget(fd, mask);
+		file = __fget_files(files, fd, mask);
 		if (!file)
 			return 0;
 		return FDPUT_FPUT | (unsigned long)file;
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index bc4c3287a65e..80bd7789bab1 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -83,12 +83,17 @@ struct dentry;
 static inline struct file *files_lookup_fd_raw(struct files_struct *files, unsigned int fd)
 {
 	struct fdtable *fdt = rcu_dereference_raw(files->fdt);
+	unsigned long mask = array_index_mask_nospec(fd, fdt->max_fds);
+	struct file *needs_masking;
 
-	if (fd < fdt->max_fds) {
-		fd = array_index_nospec(fd,
Re: [Intel-gfx] [linus:master] [file] 0ede61d858: will-it-scale.per_thread_ops -2.9% regression
On Sun, 19 Nov 2023 at 23:11, kernel test robot wrote:
>
> kernel test robot noticed a -2.9% regression of will-it-scale.per_thread_ops
> on:
>
> commit: 0ede61d8589cc2d93aa78230d74ac58b5b8d0244 ("file: convert to
> SLAB_TYPESAFE_BY_RCU")

Ok, so __fget_light() is one of our more performance-critical things,
and that commit ends up making it call a rather more expensive version
in __get_file_rcu(), so we have:

>      30.90 ±  4%     -20.6       10.35 ±  2%  perf-profile.self.cycles-pp.__fget_light
>       0.00           +26.5       26.48        perf-profile.self.cycles-pp.__get_file_rcu

and that "20% decrease balanced by 26% increase elsewhere" then
directly causes the ~3% regression.

I took a look at the code generation, and honestly, I think we're
better off just making __fget_files_rcu() have special logic for this
all, and not use __get_file_rcu(). The 'fd' case really is special
because we need to do that non-speculative pointer access.

Because it turns out that we already have to use array_index_nospec()
to safely generate that pointer to the fd entry, and once you have to
do that "use non-speculative accesses to generate a safe pointer", you
might as well just go whole hog.
So this takes a different approach, and uses the
array_index_mask_nospec() thing that we have exactly to generate that
non-speculative mask for these things:

	/* Mask is a 0 for invalid fd's, ~0 for valid ones */
	mask = array_index_mask_nospec(fd, fdt->max_fds);

and then it does something you can consider either horribly clever, or
horribly ugly (this first part is basically just array_index_nospec()):

	/* fdentry points to the 'fd' offset, or fdt->fd[0] */
	fdentry = fdt->fd + (fd & mask);

and then we can just *unconditionally* do the load - but since we
might be loading fd[0] for an invalid case, we need to mask the result
too:

	/* Do the load, then mask any invalid result */
	file = rcu_dereference_raw(*fdentry);
	file = (void *)(mask & (unsigned long)file);

but now we have done everything without any conditionals, and the only
conditional is now "did we load NULL" - which includes that "we masked
the bad value".

Then we just do that atomic_long_inc_not_zero() on the file, and
re-check the pointer chain we used. I made files_lookup_fd_raw() do
the same thing.

The end result is much nicer code generation at least from my quick
check. And I assume the regression will be gone, and hopefully even
turned into an improvement since this is so hot.

Comments? I also looked at that odd OPTIMIZER_HIDE_VAR() that
__get_file_rcu() does, and I don't get it. Both things come from
volatile accesses, I don't see the point of those games, but I also
didn't care, since it's no longer in a critical code path.

Christian?

NOTE! This patch is not well tested. I verified an earlier version of
this, but have been playing with it since, so caveat emptor. IOW, I
might have messed up some "trivial cleanup" when prepping for sending
it out...
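For readers unfamiliar with array_index_mask_nospec(): the kernel's generic C fallback is roughly the branch-free computation below (x86 overrides it with a cmp/sbb asm sequence). The point is that the mask is produced by arithmetic rather than a conditional branch, so even a mis-speculating CPU ends up loading through a zeroed pointer for an out-of-range index. This is a userspace sketch of that fallback, not a drop-in replacement:

```c
#include <limits.h>

#define BITS_PER_LONG ((int)(sizeof(long) * CHAR_BIT))

/*
 * ~0UL when index < size, 0UL otherwise, with no conditional branch.
 * Relies on arithmetic right shift of negative longs (GCC/Clang
 * behavior) and on size <= LONG_MAX.
 */
static unsigned long array_index_mask_nospec(unsigned long index,
					     unsigned long size)
{
	/*
	 * For index < size, both index and (size - 1 - index) have the
	 * sign bit clear, so the complemented value is negative and the
	 * arithmetic shift smears the sign bit into ~0UL.  Any
	 * out-of-range index makes the OR's sign bit set, giving 0.
	 */
	return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1);
}
```

With that in hand, the two lines quoted above (`fd & mask` for the index, `mask & file` for the loaded pointer) are just applications of the same mask on both sides of the load.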
                Linus

 fs/file.c               | 48 ++--
 include/linux/fdtable.h | 15 ++-
 2 files changed, 40 insertions(+), 23 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 5fb0b146e79e..c74a6e8687d9 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -959,31 +959,42 @@ static inline struct file *__fget_files_rcu(struct files_struct *files,
 		struct file *file;
 		struct fdtable *fdt = rcu_dereference_raw(files->fdt);
 		struct file __rcu **fdentry;
+		unsigned long mask;
 
-		if (unlikely(fd >= fdt->max_fds))
+		/* Mask is a 0 for invalid fd's, ~0 for valid ones */
+		mask = array_index_mask_nospec(fd, fdt->max_fds);
+
+		/* fdentry points to the 'fd' offset, or fdt->fd[0] */
+		fdentry = fdt->fd + (fd & mask);
+
+		/* Do the load, then mask any invalid result */
+		file = rcu_dereference_raw(*fdentry);
+		file = (void *)(mask & (unsigned long)file);
+
+		if (unlikely(!file))
 			return NULL;
-		fdentry = fdt->fd + array_index_nospec(fd, fdt->max_fds);
+		/*
+		 * Ok, we have a file pointer that was valid at
+		 * some point, but it might have become stale since.
+		 *
+		 * We need to confirm it by incrementing the refcount
+		 * and then check the lookup again.
+		 *
+		 * atomic_long_inc_not_zero() gives us a full memory
+		 * barrier. We only really need an 'acquire' one to
+		 * protect the loads below, but we don't have that.
+		 */
+		if (unlikely(!atomic_long_inc_not_zero(&file->f_count)))
+			continue;
 
 		/*
-		 * Ok, we have a file pointer. However, because we do
-		 * this all locklessly under RCU, we may be racing with
-		 * that file being closed.
-		 *
 		 * Such a race can take two forms:
 		 *
 		 *  (a) the file ref already went down to zero and the
 		 *      file hasn't been reused yet or the file count
 		 *      isn't zero but the file has already been reused.
-		 */
-		file = __get_file_rcu(fdentry);
-		if (unlikely(!file))
-			return NULL;
-
-		if (unlikely(IS_ERR(file)))
-
[Intel-gfx] [linus:master] [file] 0ede61d858: will-it-scale.per_thread_ops -2.9% regression
Hello,

kernel test robot noticed a -2.9% regression of will-it-scale.per_thread_ops on:

commit: 0ede61d8589cc2d93aa78230d74ac58b5b8d0244 ("file: convert to SLAB_TYPESAFE_BY_RCU")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: will-it-scale
test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
parameters:

	nr_task: 16
	mode: thread
	test: poll2
	cpufreq_governor: performance

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot
| Closes: https://lore.kernel.org/oe-lkp/202311201406.2022ca3f-oliver.s...@intel.com

Details are as below:
-->

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231120/202311201406.2022ca3f-oliver.s...@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/thread/16/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/poll2/will-it-scale

commit:
  93faf426e3 ("vfs: shave work on failed file open")
  0ede61d858 ("file: convert to SLAB_TYPESAFE_BY_RCU")

93faf426e3cc000c 0ede61d8589cc2d93aa78230d74
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
      0.01 ±  9%  +58125.6%       4.17 ±175%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     89056            -2.0%      87309        proc-vmstat.nr_slab_unreclaimable
     97958 ±  7%      -9.7%      88449 ±  4%  sched_debug.cpu.avg_idle.stddev
      0.00 ± 12%     +24.2%       0.00 ± 17%  sched_debug.cpu.next_balance.stddev
   6391048            -2.9%    6208584        will-it-scale.16.threads
    399440            -2.9%     388036        will-it-scale.per_thread_ops
   6391048            -2.9%    6208584        will-it-scale.workload
     19.99 ±  4%      -2.2       17.74        perf-profile.calltrace.cycles-pp.fput.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
      1.27 ±  5%      +0.8        2.11 ±  3%  perf-profile.calltrace.cycles-pp.__fdget.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
     32.69 ±  4%      +5.0       37.70        perf-profile.calltrace.cycles-pp.__fget_light.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
      0.00           +27.9       27.85        perf-profile.calltrace.cycles-pp.__get_file_rcu.__fget_light.do_poll.do_sys_poll.__x64_sys_poll
     20.00 ±  4%      -2.3       17.75        perf-profile.children.cycles-pp.fput
      0.24 ± 10%      -0.1        0.18 ±  2%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.48 ±  5%      +0.5        1.98 ±  3%  perf-profile.children.cycles-pp.__fdget
     31.85 ±  4%      +6.0       37.86        perf-profile.children.cycles-pp.__fget_light
      0.00           +27.7       27.67        perf-profile.children.cycles-pp.__get_file_rcu
     30.90 ±  4%     -20.6       10.35 ±  2%  perf-profile.self.cycles-pp.__fget_light
     19.94 ±  4%      -2.4       17.53        perf-profile.self.cycles-pp.fput
      9.81 ±  4%      -2.4        7.42 ±  2%  perf-profile.self.cycles-pp.do_poll
      0.23 ± 11%      -0.1        0.17 ±  4%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.00           +26.5       26.48        perf-profile.self.cycles-pp.__get_file_rcu
 2.146e+10 ±  2%      +8.5%  2.329e+10 ±  2%  perf-stat.i.branch-instructions
      0.22 ± 14%      -0.0        0.19 ± 14%  perf-stat.i.branch-miss-rate%
 1.404e+10 ±  2%      +8.7%  1.526e+10 ±  2%  perf-stat.i.dTLB-stores
     70.87            -2.3       68.59        perf-stat.i.iTLB-load-miss-rate%
   5267608            -5.5%    4979133 ±  2%  perf-stat.i.iTLB-load-misses
   2102507            +5.4%    2215725        perf-stat.i.iTLB-loads
     18791 ±  3%     +10.5%      20757 ±  2%  perf-stat.i.instructions-per-iTLB-miss
    266.67 ±  2%      +6.8%     284.75 ±  2%  perf-stat.i.metric.M/sec
      0.01 ± 10%     -10.5%       0.01 ±  5%  perf-stat.overall.MPKI
      0.19            -0.0        0.17        perf-stat.overall.branch-miss-rate%
      0.65            -3.1%       0.63        perf-stat.overall.cpi
      0.00 ±  4%      -0.0        0.00 ±  4%  perf-stat.overall.dTLB-store-miss-rate%
     71.48            -2.3       69.21        perf-stat.overall.iTLB-load-miss-rate%
     18757           +10.0%      20629        perf-stat.overall.instructions-per-iTLB-miss
      1.54            +3.2%       1.59        perf-stat.overall.ipc
   4795147            +6.4%    5100406        perf-stat.overall.path-length
  2.14e+10 ±  2%      +8.5%  2.322e+10 ±  2%  perf-stat.ps.branch-instructions
 1.4e+10