Re: [lkp] [x86/build] b2c51106c75: -18.1% will-it-scale.per_process_ops

2015-09-01 Thread Huang Ying
On Wed, 2015-08-05 at 10:38 +0200, Ingo Molnar wrote:
> * kernel test robot  wrote:
> 
> > FYI, we noticed the below changes on
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/asm
> > commit b2c51106c7581866c37ffc77c5d739f3d4b7cbc9 ("x86/build: Fix detection 
> > of GCC -mpreferred-stack-boundary support")
> 
> Does the performance regression go away reproducibly if you do:
> 
>    git revert b2c51106c7581866c37ffc77c5d739f3d4b7cbc9
> 
> ?

Sorry for replying so late!

Reverting the commit restores part of the performance, as shown below.
parent commit: f2a50f8b7da45ff2de93a71393e715a2ab9f3b68
the commit:    b2c51106c7581866c37ffc77c5d739f3d4b7cbc9
revert commit: 987d12601a4a82cc2f2151b1be704723eb84cb9d
(The three result columns in the table below correspond to these commits,
in this order.)
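
For reference, the revert commit can be reproduced with a plain git revert
on top of the tested commit (a locally created revert will of course get a
different hash than 987d12601a4a):

    # check out the tested commit, then revert it on top
    git checkout b2c51106c7581866c37ffc77c5d739f3d4b7cbc9
    git revert --no-edit b2c51106c7581866c37ffc77c5d739f3d4b7cbc9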

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/cpufreq_governor/test:
  wsm/will-it-scale/debian-x86_64-2015-02-07.cgz/x86_64-rhel/gcc-4.9/performance/readseek2

commit:
  f2a50f8b7da45ff2de93a71393e715a2ab9f3b68
  b2c51106c7581866c37ffc77c5d739f3d4b7cbc9
  987d12601a4a82cc2f2151b1be704723eb84cb9d

f2a50f8b7da45ff2 b2c51106c7581866c37ffc77c5 987d12601a4a82cc2f2151b1be
---------------- -------------------------- --------------------------
         %stddev      %change       %stddev      %change       %stddev
             \            |             \            |             \
    879002 ±  0%      -18.1%     720270 ±  7%       -3.6%     847011 ±  2%  will-it-scale.per_process_ops
      0.02 ±  0%      +34.5%       0.02 ±  7%       +5.6%       0.02 ±  2%  will-it-scale.scalability
     11144 ±  0%       +0.1%      11156 ±  0%      +10.6%      12320 ±  0%  will-it-scale.time.minor_page_faults
    769.30 ±  0%       -0.9%     762.15 ±  0%       +1.1%     777.42 ±  0%  will-it-scale.time.system_time
  26153173 ±  0%       +7.0%   27977076 ±  0%       +3.5%   27078124 ±  0%  will-it-scale.time.voluntary_context_switches
      2964 ±  2%       +1.4%       3004 ±  1%      -51.9%       1426 ±  2%  proc-vmstat.pgactivate
      0.06 ± 27%     +154.5%       0.14 ± 44%     +122.7%       0.12 ± 24%  turbostat.CPU%c3
    370683 ±  0%       +6.2%     393491 ±  0%       +2.4%     379575 ±  0%  vmstat.system.cs
     11144 ±  0%       +0.1%      11156 ±  0%      +10.6%      12320 ±  0%  time.minor_page_faults
     15.70 ±  2%      +14.5%      17.98 ±  0%       +1.5%      15.94 ±  1%  time.user_time
    830343 ± 56%      -54.0%     382128 ± 39%      -22.3%     645308 ± 65%  cpuidle.C1E-NHM.time
    788.25 ± 14%      -21.7%     617.25 ± 16%      -12.3%     691.00 ±  3%  cpuidle.C1E-NHM.usage
   2489132 ± 20%      +79.3%    4464147 ± 33%      +78.4%    4440574 ± 21%  cpuidle.C3-NHM.time
   1082762 ±162%     -100.0%       0.00 ± -1%     +189.3%    3132030 ±110%  latency_stats.avg.nfs_wait_on_request.nfs_updatepage.nfs_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
    102189 ±  2%       -2.1%     100087 ±  5%      -32.9%      68568 ±  2%  latency_stats.hits.pipe_wait.pipe_read.__vfs_read.vfs_read.SyS_read.entry_SYSCALL_64_fastpath
   1082762 ±162%     -100.0%       0.00 ± -1%     +289.6%    4217977 ±109%  latency_stats.max.nfs_wait_on_request.nfs_updatepage.nfs_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
   1082762 ±162%     -100.0%       0.00 ± -1%     +478.5%    6264061 ±110%  latency_stats.sum.nfs_wait_on_request.nfs_updatepage.nfs_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
      5.10 ±  2%       -8.0%       4.69 ±  1%      +13.0%       5.76 ±  1%  perf-profile.cpu-cycles.__kernel_text_address.print_context_stack.dump_trace.save_stack_trace_tsk.__account_scheduler_latency
      2.58 ±  8%      +19.5%       3.09 ±  3%       -1.8%       2.54 ± 11%  perf-profile.cpu-cycles._raw_spin_lock_irqsave.finish_wait.__wait_on_bit_lock.__lock_page.find_lock_entry
      7.02 ±  3%       +9.2%       7.67 ±  2%       +7.1%       7.52 ±  3%  perf-profile.cpu-cycles._raw_spin_lock_irqsave.prepare_to_wait_exclusive.__wait_on_bit_lock.__lock_page.find_lock_entry
      3.07 ±  2%      +14.8%       3.53 ±  3%       -1.4%       3.03 ±  5%  perf-profile.cpu-cycles.finish_wait.__wait_on_bit_lock.__lock_page.find_lock_entry.shmem_getpage_gfp
      3.05 ±  5%       -8.4%       2.79 ±  4%       -5.2%       2.90 ±  5%  perf-profile.cpu-cycles.hrtimer_start_range_ns.tick_nohz_stop_sched_tick.__tick_nohz_idle_enter.tick_nohz_idle_enter.cpu_startup_entry
      0.89 ±  5%       -7.6%       0.82 ±  3%      +16.3%       1.03 ±  5%  perf-profile.cpu-cycles.is_ftrace_trampoline.__kernel_text_address.print_context_stack.dump_trace.save_stack_trace_tsk
      0.98 ±  3%      -25.1%       0.74 ±  7%      -16.8%       0.82 ±  2%

Re: [lkp] [x86/build] b2c51106c75: -18.1% will-it-scale.per_process_ops

2015-08-10 Thread Andy Lutomirski
On Wed, Aug 5, 2015 at 1:38 AM, Ingo Molnar  wrote:
>
> * kernel test robot  wrote:
>
>> FYI, we noticed the below changes on
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/asm
>> commit b2c51106c7581866c37ffc77c5d739f3d4b7cbc9 ("x86/build: Fix detection 
>> of GCC -mpreferred-stack-boundary support")
>
> Does the performance regression go away reproducibly if you do:
>
>    git revert b2c51106c7581866c37ffc77c5d739f3d4b7cbc9
>
> ?

FWIW, I spot-checked the generated code.  All I saw was the deletion
of some dummy subtractions from rsp that existed only to align the
stack for callees, changes of stack-frame offsets, and a couple of
instances in which instructions got reordered.
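
A minimal way to repeat that spot-check, sketched under the assumption
of a GCC that, like the 4.9 used here, accepts
-mpreferred-stack-boundary=3 on x86-64 only together with -mno-sse
(demo.c and the .s file names are just placeholders):

    # demo.c: a function whose first call is not a tail call, so the
    # default 16-byte preferred stack boundary needs alignment padding
    echo 'void child(void);'                        >  demo.c
    echo 'void parent(void) { child(); child(); }'  >> demo.c

    # default 16-byte boundary: gcc emits a dummy "subq $8, %rsp"
    # before the first call purely to realign the stack for the callee
    gcc -O2 -S -o align16.s demo.c

    # 8-byte boundary, as the fixed kernel build now requests:
    # the dummy subtraction disappears
    gcc -O2 -S -mno-sse -mpreferred-stack-boundary=3 -o align8.s demo.c

    diff -u align16.s align8.s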

--Andy


Re: [lkp] [x86/build] b2c51106c75: -18.1% will-it-scale.per_process_ops

2015-08-05 Thread Ingo Molnar

* kernel test robot  wrote:

> FYI, we noticed the below changes on
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/asm
> commit b2c51106c7581866c37ffc77c5d739f3d4b7cbc9 ("x86/build: Fix detection of 
> GCC -mpreferred-stack-boundary support")

Does the performance regression go away reproducibly if you do:

   git revert b2c51106c7581866c37ffc77c5d739f3d4b7cbc9

?

Thanks,

Ingo


[lkp] [x86/build] b2c51106c75: -18.1% will-it-scale.per_process_ops

2015-07-31 Thread kernel test robot
FYI, we noticed the below changes on

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/asm
commit b2c51106c7581866c37ffc77c5d739f3d4b7cbc9 ("x86/build: Fix detection of 
GCC -mpreferred-stack-boundary support")
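
For context: x86-64 GCC accepts -mpreferred-stack-boundary=3 only when
-mno-sse is also passed, so the old probe, which lacked -mno-sse, always
failed and the build silently stayed at 16-byte stack alignment; with
this fix the intended 8-byte alignment takes effect.  A shell equivalent
of the probe, paraphrasing the kbuild cc-option test rather than quoting
the exact Makefile lines:

    # exits 0 on a compiler that supports the 8-byte stack boundary;
    # dropping -mno-sse makes the same probe fail on x86-64 GCC
    gcc -Werror -mno-sse -mpreferred-stack-boundary=3 \
        -c -x c /dev/null -o /dev/null && echo supported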


=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/cpufreq_governor/test:
  wsm/will-it-scale/debian-x86_64-2015-02-07.cgz/x86_64-rhel/gcc-4.9/performance/readseek2

commit:
  f2a50f8b7da45ff2de93a71393e715a2ab9f3b68
  b2c51106c7581866c37ffc77c5d739f3d4b7cbc9

f2a50f8b7da45ff2 b2c51106c7581866c37ffc77c5
---------------- --------------------------
         %stddev      %change       %stddev
             \            |             \
    879002 ±  0%      -18.1%     720270 ±  7%  will-it-scale.per_process_ops
      0.02 ±  0%      +34.5%       0.02 ±  7%  will-it-scale.scalability
  26153173 ±  0%       +7.0%   27977076 ±  0%  will-it-scale.time.voluntary_context_switches
     15.70 ±  2%      +14.5%      17.98 ±  0%  time.user_time
    370683 ±  0%       +6.2%     393491 ±  0%  vmstat.system.cs
    830343 ± 56%      -54.0%     382128 ± 39%  cpuidle.C1E-NHM.time
    788.25 ± 14%      -21.7%     617.25 ± 16%  cpuidle.C1E-NHM.usage
      3947 ±  2%      +10.6%       4363 ±  3%  slabinfo.kmalloc-192.active_objs
      3947 ±  2%      +10.6%       4363 ±  3%  slabinfo.kmalloc-192.num_objs
   1082762 ±162%     -100.0%       0.00 ± -1%  latency_stats.avg.nfs_wait_on_request.nfs_updatepage.nfs_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
   1082762 ±162%     -100.0%       0.00 ± -1%  latency_stats.max.nfs_wait_on_request.nfs_updatepage.nfs_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
   1082762 ±162%     -100.0%       0.00 ± -1%  latency_stats.sum.nfs_wait_on_request.nfs_updatepage.nfs_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
      2.58 ±  8%      +19.5%       3.09 ±  3%  perf-profile.cpu-cycles._raw_spin_lock_irqsave.finish_wait.__wait_on_bit_lock.__lock_page.find_lock_entry
      7.02 ±  3%       +9.2%       7.67 ±  2%  perf-profile.cpu-cycles._raw_spin_lock_irqsave.prepare_to_wait_exclusive.__wait_on_bit_lock.__lock_page.find_lock_entry
      3.07 ±  2%      +14.8%       3.53 ±  3%  perf-profile.cpu-cycles.finish_wait.__wait_on_bit_lock.__lock_page.find_lock_entry.shmem_getpage_gfp
      3.05 ±  5%       -8.4%       2.79 ±  4%  perf-profile.cpu-cycles.hrtimer_start_range_ns.tick_nohz_stop_sched_tick.__tick_nohz_idle_enter.tick_nohz_idle_enter.cpu_startup_entry
      0.98 ±  3%      -25.1%       0.74 ±  7%  perf-profile.cpu-cycles.is_ftrace_trampoline.print_context_stack.dump_trace.save_stack_trace_tsk.__account_scheduler_latency
      1.82 ± 18%      +46.6%       2.67 ±  3%  perf-profile.cpu-cycles.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.finish_wait.__wait_on_bit_lock.__lock_page
      8.05 ±  3%       +9.5%       8.82 ±  3%  perf-profile.cpu-cycles.prepare_to_wait_exclusive.__wait_on_bit_lock.__lock_page.find_lock_entry.shmem_getpage_gfp
      7.75 ± 34%      -64.5%       2.75 ± 64%  sched_debug.cfs_rq[2]:/.nr_spread_over
      1135 ± 20%      -43.6%     640.75 ± 49%  sched_debug.cfs_rq[3]:/.blocked_load_avg
      1215 ± 21%      -43.1%     691.50 ± 46%  sched_debug.cfs_rq[3]:/.tg_load_contrib
     38.50 ± 21%     +129.9%      88.50 ± 36%  sched_debug.cfs_rq[4]:/.load
     26.00 ± 20%      +98.1%      51.50 ± 46%  sched_debug.cfs_rq[4]:/.runnable_load_avg
    128.25 ± 18%     +227.5%     420.00 ± 43%  sched_debug.cfs_rq[4]:/.utilization_load_avg
      1015 ± 78%     +101.1%       2042 ± 25%  sched_debug.cfs_rq[6]:/.blocked_load_avg
      1069 ± 72%     +100.2%       2140 ± 23%  sched_debug.cfs_rq[6]:/.tg_load_contrib
     88.75 ± 14%      -47.3%      46.75 ± 36%  sched_debug.cfs_rq[9]:/.load
     59.25 ± 23%      -41.4%      34.75 ± 34%  sched_debug.cfs_rq[9]:/.runnable_load_avg
    315.50 ± 45%      -64.6%     111.67 ±  1%  sched_debug.cfs_rq[9]:/.utilization_load_avg
   2246758 ±  7%      +87.6%    4213925 ± 65%  sched_debug.cpu#0.nr_switches
   2249376 ±  7%      +87.4%    4215969 ± 65%  sched_debug.cpu#0.sched_count
   1121438 ±  7%      +81.0%    2030313 ± 61%  sched_debug.cpu#0.sched_goidle
   1151160 ±  7%      +86.5%    2146608 ± 64%  sched_debug.cpu#0.ttwu_count
     33.75 ± 15%      -22.2%      26.25 ±  6%  sched_debug.cpu#1.cpu_load[3]
     33.25 ± 10%      -18.0%      27.25 ±  7%  sched_debug.cpu#1.cpu_load[4]
     40.00 ± 18%      +24.4%      49.75 ± 18%  sched_debug.cpu#10.cpu_load[2]
     39.25 ± 14%      +22.3%      48.00 ± 10%  sched_debug.cpu#10.cpu_load[3]
     39.50 ± 15%      +20.3%      47.50 ±  6%  sched_debug.cpu#10.cpu_load[4]
   5269004 ±  1%      +27.8%
