Re: [LKP] [lkp-robot] [iversion] c0cef30e4f: aim7.jobs-per-min -18.0% regression

2018-03-15 Thread Linus Torvalds
On Thu, Mar 15, 2018 at 12:33 AM, kemi  wrote:
> Hi, Jeff
>    Today, I deleted the previous kernel images for commits
> 3da90b159b146672f830bcd2489dd3a1f4e9e089 and
> c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e, then re-ran the same aim7
> jobs three times for each commit. The aim7 scores for the two commits
> show no obvious difference.
>
> Perhaps something weird happened when compiling the kernel. Please ignore
> this report; apologies for the bother.

Ok, I'm not entirely happy with how random this was, but that commit
never looked like a remotely likely culprit, so I think we'll just
chalk it up to bad luck and unknown effects.

Linus


Re: [LKP] [lkp-robot] [iversion] c0cef30e4f: aim7.jobs-per-min -18.0% regression

2018-03-15 Thread kemi
Hi, Jeff
   Today, I deleted the previous kernel images for commits
3da90b159b146672f830bcd2489dd3a1f4e9e089 and
c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e, then re-ran the same aim7
jobs three times for each commit. The aim7 scores for the two commits
show no obvious difference.

Perhaps something weird happened when compiling the kernel. Please ignore
this report; apologies for the bother.


On 2018-02-25 at 23:41, Jeff Layton wrote:
> On Sun, 2018-02-25 at 23:05 +0800, kernel test robot wrote:
>> Greeting,
>>
>> FYI, we noticed a -18.0% regression of aim7.jobs-per-min due to commit:
>>
>>
>> commit: c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e ("iversion: make 
>> inode_cmp_iversion{+raw} return bool instead of s64")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>
>> in testcase: aim7
>> on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 
>> 384G memory
>> with following parameters:
>>
>>  disk: 4BRD_12G
>>  md: RAID0
>>  fs: xfs
>>  test: disk_src
>>  load: 3000
>>  cpufreq_governor: performance
>>
>> test-description: AIM7 is a traditional UNIX system-level benchmark suite 
>> used to test and measure the performance of a multiuser system.
>> test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/
>>
>>
> 
> I'm a bit suspicious of this result.
> 
> This patch only changes inode_cmp_iversion{+raw} (since renamed to
> inode_eq_iversion{+raw}), neither of which should ever be called from
> xfs. The patch is fairly trivial too, and I wouldn't expect a big
> performance hit.
> 
> Is IMA involved here at all? I didn't see any evidence of it, but the
> kernel config did have it enabled.
> 
> 
>>
>> Details are as below:
>> -->
>>
>>
>> To reproduce:
>>
>> git clone https://github.com/intel/lkp-tests.git
>> cd lkp-tests
>> bin/lkp install job.yaml  # job file is attached in this email
>> bin/lkp run job.yaml
>>
>> =
>> compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase:
>>   
>> gcc-7/performance/4BRD_12G/xfs/x86_64-rhel-7.2/3000/RAID0/debian-x86_64-2016-08-31.cgz/lkp-ivb-ep01/disk_src/aim7
>>
>> commit: 
>>   3da90b159b (" f2fs-for-4.16-rc1")
>>   c0cef30e4f ("iversion: make inode_cmp_iversion{+raw} return bool instead 
>> of s64")
>>
>> 3da90b159b146672 c0cef30e4ff0dc025f4a1660b8
>> ---------------- --------------------------
>>          %stddev     %change         %stddev
>>              \          |                \
>>      40183           -18.0%      32964        aim7.jobs-per-min
>>     448.60           +21.9%     546.68        aim7.time.elapsed_time
>>     448.60           +21.9%     546.68        aim7.time.elapsed_time.max
>>       5615 ±  5%     +33.4%       7489 ±  4%  aim7.time.involuntary_context_switches
>>       3086           +14.0%       3518        aim7.time.system_time
>>   19439782            -5.6%   18359474        aim7.time.voluntary_context_switches
>>     199333           +14.3%     227794 ±  2%  interrupts.CAL:Function_call_interrupts
>>       0.59            -0.1         0.50       mpstat.cpu.usr%
>>    2839401           +16.0%    3293688        softirqs.SCHED
>>    7600068           +15.1%    8747820        softirqs.TIMER
>>     118.00 ± 43%     +98.7%     234.50 ± 15%  vmstat.io.bo
>>      87840           -22.4%      68154        vmstat.system.cs
>>     552798 ±  6%     +15.8%     640107 ±  4%  numa-numastat.node0.local_node
>>     557345 ±  6%     +15.7%     644666 ±  4%  numa-numastat.node0.numa_hit
>>     528341 ±  7%     +21.7%     642933 ±  4%  numa-numastat.node1.local_node
>>     531604 ±  7%     +21.6%     646209 ±  4%  numa-numastat.node1.numa_hit
>>  2.147e+09           -12.4%   1.88e+09        cpuidle.C1.time
>>   13702041           -14.7%   11683737        cpuidle.C1.usage
>>  2.082e+08 ±  4%     +28.1%  2.667e+08 ±  5%  cpuidle.C1E.time
>>  4.719e+08 ±  2%     +23.1%  5.807e+08 ±  4%  cpuidle.C3.time
>>  1.141e+10           +31.0%  1.496e+10        cpuidle.C6.time
>>   15672622           +27.8%   20031028        cpuidle.C6.usage
>>   13520572 ±  3%     +29.5%   17514398 ±  9%  cpuidle.POLL.time
>>     278.25 ±  5%     -46.0%     150.25 ± 73%  numa-vmstat.node0.nr_dirtied
>>       3200 ± 14%     -20.6%       2542 ± 19%  numa-vmstat.node0.nr_mapped
>>     277.75 ±  5%     -46.2%     149.50 ± 73%  numa-vmstat.node0.nr_written
>>      28.50 ± 52%    +448.2%     156.25 ± 70%  numa-vmstat.node1.nr_dirtied
>>       2577 ± 19%     +26.3%       3255 ± 15%  numa-vmstat.node1.nr_mapped
>>     634338 ±  4%      +7.8%     683959 ±  4%  numa-vmstat.node1.numa_hit
>>     457411 ±  6%     +10.8%     506800 ±  5%  numa-vmstat.node1.numa_local
>>       3734 ±  8%     -11.5%       3306 ±  6%  
>> 

Re: [LKP] [lkp-robot] [iversion] c0cef30e4f: aim7.jobs-per-min -18.0% regression

2018-03-01 Thread kemi


On 2018-02-28 at 01:04, Linus Torvalds wrote:
> On Tue, Feb 27, 2018 at 5:43 AM, David Howells  wrote:
>> Is it possible there's a stall between the load of RCX and the subsequent
>> instructions because they all have to wait for RCX to become available?
> 
> No. Modern Intel big-core CPU's simply aren't that fragile. All these
> instructions should do OoO fine for trivial sequences like this, and
> as far as I can tell, the new code sequence should be better.
> 
> And even if it were worse for some odd reason, it would be worse by a cycle.
> 
> This kind of 18% change is something else, it is definitely not about
> instruction scheduling.
> 
> Now, if the change to inode_cmp_iversion() causes some actual
> _behavioral_ changes, and we get more IO, that's more like it. But the
> code really does seem to be equivalent. In both cases it is simply
> comparing 63 bits: the high 63 bits of 0x150(%rbp) - inode->i_version
> - with the low 63 bits of 0x20(%rax) - iint->version.
> 
> The only issue would be if the high bit of 0x20(%rax) was somehow set.
> The new code doesn't shift that bit away any more, but it should never
> be set since it comes from
> 
> i_version = inode_query_iversion(inode);
> ...
> iint->version = i_version;
> 
> and that inode_query_iversion() will have done the version shift.
> 
>> The interleaving between operating on RSI and RCX in the older code might
>> alleviate that.
>>
> In addition, the load of the 0x20(%rax) value is now done in the CMP 
>> instruction
>> rather than earlier, so it might not get speculatively loaded in time, 
>> whereas
>> the earlier code explicitly loads it up front.
> 
> No again, OoO cores will generally hide details like that.
> 
> You can see effects of it, but it's hard, and it can go both ways.
> 
> Anyway, I think the _real_ change has nothing to do with instruction
> scheduling, and everything to do with this:
> 
> 107.62 ± 37%   +139.1%     257.38 ± 16%  vmstat.io.bo
>  48740 ± 36%   +191.4%     142047 ± 16%  proc-vmstat.pgpgout
> 
> (There's fairly big variation in those numbers, but the changes are
> even bigger) or this:
> 
> 258.12   -100.0%      0.00  turbostat.Avg_MHz
>  21.48    -21.5       0.00  turbostat.Busy%
> 

This is caused by a limitation in the current turbostat parsing script of lkp.
It treats a string containing a wildcard character (e.g. 30.**) in the output
of the turbostat monitor as an error and sets all the stats values to 0.

The turbostat monitor itself runs successfully during these tests.
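
As an illustration of the failure mode only (the real lkp parser is a
separate script; this is just a hypothetical C sketch): a per-field parse
can skip a token such as "30.**" instead of turning every statistic in the
sample into 0.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical example, not the lkp code: reject only the field that does
 * not parse cleanly and keep the rest of the sample. */
static int parse_stat(const char *tok, double *out)
{
	char *end;
	double v = strtod(tok, &end);

	if (end == tok || *end != '\0')
		return -1;	/* e.g. "30.**": skip just this field */
	*out = v;
	return 0;
}

int main(void)
{
	const char *samples[] = { "258.12", "30.**", "21.48" };
	double v;

	for (int i = 0; i < 3; i++) {
		if (parse_stat(samples[i], &v) == 0)
			printf("%-8s -> %.2f\n", samples[i], v);
		else
			printf("%-8s -> skipped\n", samples[i]);
	}
	return 0;
}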

> or this:
> 
>  27397 ±194%  +43598.3%   11972338 ±139%
> latency_stats.max.io_schedule.nfs_lock_and_join_requests.nfs_updatepage.nfs_write_end.generic_perform_write.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
>  27942 ±189%  +96489.5%   26989044 ±139%
> latency_stats.sum.io_schedule.nfs_lock_and_join_requests.nfs_updatepage.nfs_write_end.generic_perform_write.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
> 
> but those all sound like something changed in the setup, not in the kernel.
> 
> Odd.
> 
> Linus
> 


Re: [LKP] [lkp-robot] [iversion] c0cef30e4f: aim7.jobs-per-min -18.0% regression

2018-02-27 Thread Linus Torvalds
On Tue, Feb 27, 2018 at 5:43 AM, David Howells  wrote:
> Is it possible there's a stall between the load of RCX and the subsequent
> instructions because they all have to wait for RCX to become available?

No. Modern Intel big-core CPU's simply aren't that fragile. All these
instructions should do OoO fine for trivial sequences like this, and
as far as I can tell, the new code sequence should be better.

And even if it were worse for some odd reason, it would be worse by a cycle.

This kind of 18% change is something else, it is definitely not about
instruction scheduling.

Now, if the change to inode_cmp_iversion() causes some actual
_behavioral_ changes, and we get more IO, that's more like it. But the
code really does seem to be equivalent. In both cases it is simply
comparing 63 bits: the high 63 bits of 0x150(%rbp) - inode->i_version
- with the low 63 bits of 0x20(%rax) - iint->version.

The only issue would be if the high bit of 0x20(%rax) was somehow set.
The new code doesn't shift that bit away any more, but it should never
be set since it comes from

i_version = inode_query_iversion(inode);
...
iint->version = i_version;

and that inode_query_iversion() will have done the version shift.
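
Spelled out as C, this is roughly what the two versions of the helper boil
down to (a simplified sketch, not the verbatim <linux/iversion.h> code;
I_VERSION_QUERIED is the low "queried" flag bit, I_VERSION_QUERIED_SHIFT is 1):

/*
 * inode->i_version keeps the counter in bits 63:1 and the "queried"
 * flag in bit 0.
 */

/* Old: an s64 "difference".  The flag bit is masked off on one side and
 * the top bit of 'old' is shifted away on the other, so only 63 bits of
 * each value take part in the comparison. */
static inline s64 inode_cmp_iversion_old(const struct inode *inode, u64 old)
{
	return (s64)(inode_peek_iversion_raw(inode) & ~I_VERSION_QUERIED) -
	       (s64)(old << I_VERSION_QUERIED_SHIFT);
}

/* New: a plain compare, true when the versions differ.
 * inode_peek_iversion() shifts the flag bit out of i_version; 'old' is
 * used as-is, so its top bit now participates -- which is fine as long
 * as it is never set, and it never should be when it comes from
 * inode_query_iversion(). */
static inline bool inode_cmp_iversion_new(const struct inode *inode, u64 old)
{
	return inode_peek_iversion(inode) != old;
}

Either way it is a couple of loads, a shift or mask, and a compare.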

> The interleaving between operating on RSI and RCX in the older code might
> alleviate that.
>
In addition, the load of the 0x20(%rax) value is now done in the CMP instruction
> rather than earlier, so it might not get speculatively loaded in time, whereas
> the earlier code explicitly loads it up front.

No again, OoO cores will generally hide details like that.

You can see effects of it, but it's hard, and it can go both ways.

Anyway, I think the _real_ change has nothing to do with instruction
scheduling, and everything to do with this:

107.62 ± 37%   +139.1%     257.38 ± 16%  vmstat.io.bo
 48740 ± 36%   +191.4%     142047 ± 16%  proc-vmstat.pgpgout

(There's fairly big variation in those numbers, but the changes are
even bigger) or this:

258.12   -100.0%      0.00  turbostat.Avg_MHz
 21.48    -21.5       0.00  turbostat.Busy%

or this:

 27397 ±194%  +43598.3%   11972338 ±139%
latency_stats.max.io_schedule.nfs_lock_and_join_requests.nfs_updatepage.nfs_write_end.generic_perform_write.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
 27942 ±189%  +96489.5%   26989044 ±139%
latency_stats.sum.io_schedule.nfs_lock_and_join_requests.nfs_updatepage.nfs_write_end.generic_perform_write.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath

but those all sound like something changed in the setup, not in the kernel.

Odd.

Linus


Re: [LKP] [lkp-robot] [iversion] c0cef30e4f: aim7.jobs-per-min -18.0% regression

2018-02-27 Thread Jeff Layton
On Tue, 2018-02-27 at 13:43 +, David Howells wrote:
> Jeff Layton  wrote:
> 
> >0x813ae828 <+136>:   je 0x813ae83a 
> > 
> >0x813ae82a <+138>:   mov    0x150(%rbp),%rcx
> >0x813ae831 <+145>:   shr    %rcx
> >0x813ae834 <+148>:   cmp    %rcx,0x20(%rax)
> >0x813ae838 <+152>:   je 0x813ae862 
> > 
> 
> Is it possible there's a stall between the load of RCX and the subsequent
> instructions because they all have to wait for RCX to become available?
> 
> The interleaving between operating on RSI and RCX in the older code might
> alleviate that.
> 
> In addition, the load of the 0x20(%rax) value is now done in the CMP instruction
> rather than earlier, so it might not get speculatively loaded in time, whereas
> the earlier code explicitly loads it up front.
> 

Thanks David, that makes sense.

At this point, I think we ought to wait and see what the results look
like without IMA compiled in at all.

It's possible we're misunderstanding this completely. At most, we'll be
hitting this once on every close of a file. It doesn't seem like that
ought to be causing something this noticeable though.
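
For reference, the path in question looks roughly like this (a simplified
sketch of the IMA hook, condensed from security/integrity/ima/, not the
verbatim source):

/* On the last-writer close, decide whether the file must be remeasured. */
static void ima_check_last_writer(struct integrity_iint_cache *iint,
				  struct inode *inode, struct file *file)
{
	if (!(file->f_mode & FMODE_WRITE))
		return;

	mutex_lock(&iint->mutex);
	if (atomic_read(&inode->i_writecount) == 1 &&
	    inode_cmp_iversion(inode, iint->version)) {
		/* i_version moved since the last measurement: clear the
		 * "done" flags so the file is remeasured on next access */
		iint->flags &= ~IMA_DONE_MASK;
	}
	mutex_unlock(&iint->mutex);
}

/* Called from __fput() for every final release of a struct file. */
void ima_file_free(struct file *file)
{
	struct inode *inode = file_inode(file);
	struct integrity_iint_cache *iint;

	if (!S_ISREG(inode->i_mode))
		return;

	iint = integrity_iint_find(inode);
	if (iint)
		ima_check_last_writer(iint, inode, file);
}

So inode_cmp_iversion() is only reached when we were the last writer of a
regular file that had been opened for write.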
-- 
Jeff Layton 


Re: [LKP] [lkp-robot] [iversion] c0cef30e4f: aim7.jobs-per-min -18.0% regression

2018-02-27 Thread David Howells
Jeff Layton  wrote:

>0x813ae828 <+136>: je 0x813ae83a 
>0x813ae82a <+138>: mov    0x150(%rbp),%rcx
>0x813ae831 <+145>: shr    %rcx
>0x813ae834 <+148>: cmp    %rcx,0x20(%rax)
>0x813ae838 <+152>: je 0x813ae862 

Is it possible there's a stall between the load of RCX and the subsequent
instructions because they all have to wait for RCX to become available?

The interleaving between operating on RSI and RCX in the older code might
alleviate that.

In addition, the load of the 0x20(%rax) value is now done in the CMP instruction
rather than earlier, so it might not get speculatively loaded in time, whereas
the earlier code explicitly loads it up front.

David


Re: [LKP] [lkp-robot] [iversion] c0cef30e4f: aim7.jobs-per-min -18.0% regression

2018-02-27 Thread Jeff Layton
On Tue, 2018-02-27 at 15:42 +0800, kemi wrote:
> 
On 2018-02-26 at 20:33, Jeff Layton wrote:
> > On Mon, 2018-02-26 at 06:43 -0500, Jeff Layton wrote:
> > > On Mon, 2018-02-26 at 16:38 +0800, Ye Xiaolong wrote:
> > > > On 02/25, Jeff Layton wrote:
> > > > > On Sun, 2018-02-25 at 23:05 +0800, kernel test robot wrote:
> > > > > > Greeting,
> > > > > > 
> > > > > > FYI, we noticed a -18.0% regression of aim7.jobs-per-min due to 
> > > > > > commit:
> > > > > > 
> > > > > > 
> > > > > > commit: c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e ("iversion: make 
> > > > > > inode_cmp_iversion{+raw} return bool instead of s64")
> > > > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 
> > > > > > master
> > > > > > 
> > > > > > in testcase: aim7
> > > > > > on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 
> > > > > > 3.00GHz with 384G memory
> > > > > > with following parameters:
> > > > > > 
> > > > > > disk: 4BRD_12G
> > > > > > md: RAID0
> > > > > > fs: xfs
> > > > > > test: disk_src
> > > > > > load: 3000
> > > > > > cpufreq_governor: performance
> > > > > > 
> > > > > > test-description: AIM7 is a traditional UNIX system-level benchmark 
> > > > > > suite used to test and measure the performance of a 
> > > > > > multiuser system.
> > > > > > test-url: 
> > > > > > https://sourceforge.net/projects/aimbench/files/aim-suite7/
> > > > > > 
> > > > > > 
> > > > > 
> > > > > I'm a bit suspicious of this result.
> > > > > 
> > > > > This patch only changes inode_cmp_iversion{+raw} (since renamed to
> > > > > inode_eq_iversion{+raw}), neither of which should ever be called from
> > > > > xfs. The patch is fairly trivial too, and I wouldn't expect a big
> > > > > performance hit.
> > > > 
> > > > I tried to queue 4 more times test for both commit c0cef30e4f and its 
> > > > parent,
> > > > the result seems quite stable.
> > > > 
> > > > c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e:
> > > >  "aim7.jobs-per-min": [
> > > > 32964.01,
> > > > 32938.68,
> > > > 33068.18,
> > > > 32886.32,
> > > > 32843.72,
> > > > 32798.83,
> > > > 32898.34,
> > > > 32952.55
> > > >   ],
> > > > 
> > > > 3da90b159b146672f830bcd2489dd3a1f4e9e089:
> > > >   "aim7.jobs-per-min": [
> > > > 40239.65,
> > > > 40163.33,
> > > > 40353.32,
> > > > 39976.9,
> > > > 40185.75,
> > > > 40411.3,
> > > > 40213.58,
> > > > 39900.69
> > > >   ],
> > > > 
> > > > Any other test data you may need?
> > > > 
> > > > > 
> > > > > Is IMA involved here at all? I didn't see any evidence of it, but the
> > > > > kernel config did have it enabled.
> > > > > 
> > > > 
> > > > Sorry, not quite familiar with IMA, could you tell more about how to 
> > > > check it?
> > > > 
> > > 
> > > Thanks for retesting it, but I'm at a loss for why we're seeing this:
> > > 
> > > IMA is the integrity management subsystem. It will use the iversion
> > > field to determine whether files need to be remeasured.  It
> > > looks like the kernel config has it enabled, but it doesn't look like
> > > it's in use, based on the info in the initial report.
> > > 
> > > This patch only affects two inlined functions inode_cmp_iversion and
> > > inode_cmp_iversion_raw. The patch is pretty trivial (as Linus points
> > > out). These functions are only called from IMA and fs-specific code
> > > (usually in readdir implementations to detect directory changes).
> > > 
> > > XFS does not call either of these functions however, so I'm a little
> > > unclear on how this patch could slow anything down on this test. The
> > > only thing I can think to do here would be to profile this and see what
> > > stands out.
> > > 
> > > Note that we do need to keep this in perspective too. This 18%
> > > regression on this test follows a ~230% improvement that occurred
> > > when we merged the bulk of these patches. It should still be quite a
> > > bit faster than v4.15 in this regard.
> > > 
> > > Still, it'd be good to understand what's going on here.
> > > 
> > > 
> > 
> > Could we see the dmesg from this boot? It'd be good to confirm that IMA
> > is not involved here, as that's the only place that I can see that would
> > call into this code at all here.
> > 
> 
> See attachment for info on dmesg/perf-profile/compare_result.
> Feel free to let Xiaolong or me know if anything else you would like to check.
> 

Many thanks,

Only one caller of the functions touched by this patch shows up in the
profiles: ima_file_free. That calls ima_check_last_writer, which calls
inode_cmp_iversion. The lines from the profiles show:

3da90b159b146672f830bcd2489dd3a1f4e9e089:
 0.00% 0.00%  [kernel.kallsyms]   [k] ima_file_free

c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e:
 0.01% 0.01%  [kernel.kallsyms]   [k] ima_file_free

Seems pretty insignificant, but perhaps that is somehow accounting for
the difference. This is called when a file is freed so there could be an
effect I 

Re: [LKP] [lkp-robot] [iversion] c0cef30e4f: aim7.jobs-per-min -18.0% regression

2018-02-26 Thread kemi


On 2018-02-26 at 20:33, Jeff Layton wrote:
> On Mon, 2018-02-26 at 06:43 -0500, Jeff Layton wrote:
>> On Mon, 2018-02-26 at 16:38 +0800, Ye Xiaolong wrote:
>>> On 02/25, Jeff Layton wrote:
>>>> On Sun, 2018-02-25 at 23:05 +0800, kernel test robot wrote:
>>>>> Greeting,
>>>>>
>>>>> FYI, we noticed a -18.0% regression of aim7.jobs-per-min due to commit:
>>>>>
>>>>>
>>>>> commit: c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e ("iversion: make 
>>>>> inode_cmp_iversion{+raw} return bool instead of s64")
>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>>>>
>>>>> in testcase: aim7
>>>>> on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz 
>>>>> with 384G memory
>>>>> with following parameters:
>>>>>
>>>>>   disk: 4BRD_12G
>>>>>   md: RAID0
>>>>>   fs: xfs
>>>>>   test: disk_src
>>>>>   load: 3000
>>>>>   cpufreq_governor: performance
>>>>>
>>>>> test-description: AIM7 is a traditional UNIX system-level benchmark suite 
>>>>> used to test and measure the performance of a multiuser system.
>>>>> test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/
>>>>>
>>>>>
>>>>
>>>> I'm a bit suspicious of this result.
>>>>
>>>> This patch only changes inode_cmp_iversion{+raw} (since renamed to
>>>> inode_eq_iversion{+raw}), neither of which should ever be called from
>>>> xfs. The patch is fairly trivial too, and I wouldn't expect a big
>>>> performance hit.
>>>
>>> I tried to queue 4 more times test for both commit c0cef30e4f and its 
>>> parent,
>>> the result seems quite stable.
>>>
>>> c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e:
>>>  "aim7.jobs-per-min": [
>>> 32964.01,
>>> 32938.68,
>>> 33068.18,
>>> 32886.32,
>>> 32843.72,
>>> 32798.83,
>>> 32898.34,
>>> 32952.55
>>>   ],
>>>
>>> 3da90b159b146672f830bcd2489dd3a1f4e9e089:
>>>   "aim7.jobs-per-min": [
>>> 40239.65,
>>> 40163.33,
>>> 40353.32,
>>> 39976.9,
>>> 40185.75,
>>> 40411.3,
>>> 40213.58,
>>> 39900.69
>>>   ],
>>>
>>> Any other test data you may need?
>>>

>>>> Is IMA involved here at all? I didn't see any evidence of it, but the
>>>> kernel config did have it enabled.
>>>>
>>>
>>> Sorry, not quite familiar with IMA, could you tell more about how to check 
>>> it?
>>>
>>
>> Thanks for retesting it, but I'm at a loss for why we're seeing this:
>>
>> IMA is the integrity management subsystem. It will use the iversion
>> field to determine whether files need to be remeasured.  It
>> looks like the kernel config has it enabled, but it doesn't look like
>> it's in use, based on the info in the initial report.
>>
>> This patch only affects two inlined functions inode_cmp_iversion and
>> inode_cmp_iversion_raw. The patch is pretty trivial (as Linus points
>> out). These functions are only called from IMA and fs-specific code
>> (usually in readdir implementations to detect directory changes).
>>
>> XFS does not call either of these functions however, so I'm a little
>> unclear on how this patch could slow anything down on this test. The
>> only thing I can think to do here would be to profile this and see what
>> stands out.
>>
>> Note that we do need to keep this in perspective too. This 18%
>> regression on this test follows a ~230% improvement that occurred
>> when we merged the bulk of these patches. It should still be quite a
>> bit faster than v4.15 in this regard.
>>
>> Still, it'd be good to understand what's going on here.
>>
>>
> 
> Could we see the dmesg from this boot? It'd be good to confirm that IMA
> is not involved here, as that's the only place that I can see that would
> call into this code at all here.
> 

See attachment for info on dmesg/perf-profile/compare_result.
Feel free to let Xiaolong or me know if anything else you would like to check.

> Thanks,
> Jeff
> 
> 
>>> Thanks,
>>> Xiaolong

>>>>>
>>>>> Details are as below:
>>>>> -->
>>>>>
>>>>>
>>>>> To reproduce:
>>>>>
>>>>> git clone https://github.com/intel/lkp-tests.git
>>>>> cd lkp-tests
>>>>> bin/lkp install job.yaml  # job file is attached in this email
>>>>> bin/lkp run job.yaml
>>>>>
>>>>> =
>>>>> compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase:
>>>>>   gcc-7/performance/4BRD_12G/xfs/x86_64-rhel-7.2/3000/RAID0/debian-x86_64-2016-08-31.cgz/lkp-ivb-ep01/disk_src/aim7
>>>>>
>>>>> commit: 
>>>>>   3da90b159b (" f2fs-for-4.16-rc1")
>>>>>   c0cef30e4f ("iversion: make inode_cmp_iversion{+raw} return bool instead of s64")
>>>>>
>>>>> 3da90b159b146672 c0cef30e4ff0dc025f4a1660b8
>>>>> ---------------- --------------------------
>>>>>          %stddev     %change         %stddev
>>>>>              \          |                \
>>>>>      40183           -18.0%      32964  
