Re: [RFC PATCH] fs: Move @f_count to different cacheline with @f_mode

2020-05-17 Thread Shaokun Zhang
Hi maintainers,

A gentle ping.

Thanks,
Shaokun




[RFC PATCH] fs: Move @f_count to different cacheline with @f_mode

2020-04-29 Thread Shaokun Zhang
From: Yuqi Jin 

__fget_files() checks @f_mode against a mask and then performs atomic
operations on @f_count, and both members sit on the same cacheline.
When many CPU cores access files concurrently, the atomic updates to
@f_count cause heavy contention on that cacheline, which also penalizes
the reads of @f_mode. Placing the two members on different cachelines
should relieve this contention.

We tested this on ARM64 and x86; the results are as follows.

UnixBench System Call Overhead on a Huawei Kunpeng 920 with this patch:
24 x System Call Overhead  1

System Call Overhead                  3160841.4 lps   (10.0 s, 1 samples)

System Benchmarks Partial Index      BASELINE       RESULT    INDEX
System Call Overhead                  15000.0    3160841.4   2107.2

System Benchmarks Index Score (Partial Only)                 2107.2

Without this patch:
24 x System Call Overhead  1

System Call Overhead                  456.0 lps   (10.0 s, 1 samples)

System Benchmarks Partial Index      BASELINE       RESULT    INDEX
System Call Overhead                  15000.0        456.0   1481.6

System Benchmarks Index Score (Partial Only)                 1481.6

And on Intel 6248 platform with this patch:
40 CPUs in system; running 24 parallel copies of tests

System Call Overhead                  4288509.1 lps   (10.0 s, 1 samples)

System Benchmarks Partial Index      BASELINE       RESULT    INDEX
System Call Overhead                  15000.0    4288509.1   2859.0

System Benchmarks Index Score (Partial Only)                 2859.0

Without this patch:
40 CPUs in system; running 24 parallel copies of tests

System Call Overhead                  3666313.0 lps   (10.0 s, 1 samples)

System Benchmarks Partial Index      BASELINE       RESULT    INDEX
System Call Overhead                  15000.0    3666313.0   2444.2

System Benchmarks Index Score (Partial Only)                 2444.2

Cc: Alexander Viro 
Signed-off-by: Yuqi Jin 
Signed-off-by: Shaokun Zhang 
---
 include/linux/fs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4f6f59b4f22a..90e76283f0fd 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -953,7 +953,6 @@ struct file {
 	 */
 	spinlock_t		f_lock;
 	enum rw_hint		f_write_hint;
-	atomic_long_t		f_count;
 	unsigned int		f_flags;
 	fmode_t			f_mode;
 	struct mutex		f_pos_lock;
@@ -976,6 +975,7 @@ struct file {
 #endif /* #ifdef CONFIG_EPOLL */
 	struct address_space	*f_mapping;
 	errseq_t		f_wb_err;
+	atomic_long_t		f_count;
 } __randomize_layout
   __attribute__((aligned(4))); /* lest something weird decides that 2 is OK */
 
-- 
2.7.4