Re: [PATCH 12/18] arm64: ilp32: add sys_ilp32.c and a separate table (in entry.S) to use it

2016-09-02 Thread Bamvor Jian Zhang
Hi, Yury

On 08/17/2016 07:46 PM, Yury Norov wrote:
> From: Andrew Pinski <apin...@cavium.com>
> 
[...]
> diff --git a/arch/arm64/kernel/sys_ilp32.c b/arch/arm64/kernel/sys_ilp32.c
> new file mode 100644
> index 000..10fc0ca
> --- /dev/null
> +++ b/arch/arm64/kernel/sys_ilp32.c
> @@ -0,0 +1,86 @@
> +/*
> + * AArch64- ILP32 specific system calls implementation
> + *
> + * Copyright (C) 2016 Cavium Inc.
> + * Author: Andrew Pinski <apin...@cavium.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#define __SYSCALL_COMPAT
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/*
> + * Using aarch32 syscall handlerss where off_t is passed.
> + */
> +#define compat_sys_fadvise64_64  compat_sys_fadvise64_64_wrapper
> +#define compat_sys_fallocate compat_sys_fallocate_wrapper
> +#define compat_sys_fcntl64   sys_fcntl
> +#define compat_sys_ftruncate64   compat_sys_ftruncate64_wrapper
> +#define compat_sys_pread64   compat_sys_pread64_wrapper
> +#define compat_sys_pwrite64  compat_sys_pwrite64_wrapper
> +#define compat_sys_readahead compat_sys_readahead_wrapper
> +#define compat_sys_shmat sys_shmat
> +#define compat_sys_sync_file_range   compat_sys_sync_file_range2_wrapper
> +#define compat_sys_truncate64    compat_sys_truncate64_wrapper
When we tested sync_file_range with our own test case, we found that the
parameter order is wrong: the kernel side expects the sync_file_range2 order,
but glibc thinks the kernel uses sync_file_range. So the patch below forces
sync_file_range2 on both sides.

Does this make sense to you?

Regards

Bamvor

From e730e4db23bca4dd0ff6bcca0bc4c04e5c13b5c7 Mon Sep 17 00:00:00 2001
From: Bamvor Jian Zhang <bamvor.zhangj...@huawei.com>
Date: Sat, 27 Aug 2016 12:26:31 +0800
Subject: [PATCH] arm64:ilp32: force sync_file_range2

Define __ARCH_WANT_SYNC_FILE_RANGE2 so that glibc and the kernel select the
same sync_file_range parameter order.

Tested-by: Jianguo Chen <chenjiang...@huawei.com>
Signed-off-by: Bamvor Jian Zhang <bamvor.zhangj...@huawei.com>
---
 arch/arm64/include/uapi/asm/unistd.h | 5 +++++
 arch/arm64/kernel/sys_ilp32.c        | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/uapi/asm/unistd.h b/arch/arm64/include/uapi/asm/unistd.h
index 043d17a..78bea1d 100644
--- a/arch/arm64/include/uapi/asm/unistd.h
+++ b/arch/arm64/include/uapi/asm/unistd.h
@@ -16,4 +16,9 @@

 #define __ARCH_WANT_RENAMEAT

+/* We need to make sure it works for both userspace and kernel(sys_ilp32.c) */
+#if defined(__ILP32__) || defined(__SYSCALL_COMPAT)
+#define __ARCH_WANT_SYNC_FILE_RANGE2
+#endif
+
 #include 
diff --git a/arch/arm64/kernel/sys_ilp32.c b/arch/arm64/kernel/sys_ilp32.c
index 10fc0ca..13c9c9d 100644
--- a/arch/arm64/kernel/sys_ilp32.c
+++ b/arch/arm64/kernel/sys_ilp32.c
@@ -42,7 +42,7 @@
 #define compat_sys_pwrite64    compat_sys_pwrite64_wrapper
 #define compat_sys_readahead   compat_sys_readahead_wrapper
 #define compat_sys_shmat   sys_shmat
-#define compat_sys_sync_file_range compat_sys_sync_file_range2_wrapper
+#define compat_sys_sync_file_range2    compat_sys_sync_file_range2_wrapper
 #define compat_sys_truncate64  compat_sys_truncate64_wrapper
 #define sys_mmap2  compat_sys_mmap2_wrapper
 #define sys_ptrace compat_sys_ptrace
-- 
1.8.4.5

> +#define sys_mmap2    compat_sys_mmap2_wrapper
> +#define sys_ptrace   compat_sys_ptrace
> +
> +/*
> + * Use non-compat syscall handlers where rlimit, stat and statfs
> + * structure pointers are passed, as their layout is identical to LP64.
> + */
> +#define compat_sys_fstatfs64 sys_fstatfs
> +#define compat_sys_statfs64  sys_statfs
> +#define sys_fstat64  sys_newfstat
> +#define sys_fstatat64    sys_newfstatat
> +#define compat_sys_getrlimit sys_getrlimit
> +#define compat_sys_setrlimit sys_setrlimit
> +
> +asmlinkage long compat_sys_fadvise64_64_wra

Re: [RFC2 nowrap: PATCH v7 00/18] ILP32 for ARM64

2016-09-02 Thread Bamvor Jian Zhang
Based on the off-list discussion, the community cares about possible
performance regressions in aarch64 LP64 and aarch32 after ILP32
is merged.

Given that there are no big open issues in the kernel part of ILP32, I am
trying to address this concern. It is reasonable that we should run many
test suites (such as LKP) to ensure there is no performance regression.
I am not an expert in this, so I started by running lmbench on
aarch64 LP64 and comparing the results between a kernel with ILP32 enabled
and a kernel without the ILP32 patches.

The branch I used is ilp32-4.8 on [1]. I compared the results between
two commits: "d3746f1 arm64:ilp32: add ARM64_ILP32 to Kconfig" (defconfig
with CONFIG_ARM64_ILP32) and "3054de8 fiz set_personality by Catalin"
(defconfig).

The results show no big difference. Most differences are below 5%.
Only two exceed 10%:
1.  Context switching 2p/16K: 13.16% (ILP32 is bigger than No_ILP32;
    smaller is better).
2.  *Local* Communication bandwidths, TCP: -10.77% (ILP32 is smaller than
    No_ILP32; bigger is better).


If this makes sense to the community, I can continue with more testing.

Thanks

Bamvor

[1] https://github.com/norov/linux.git
[2] The full results, computed as (ILP32 - No_ILP32)/No_ILP32:

                 L M B E N C H  3 . 0   S U M M A R Y
                 ------------------------------------
                 (Alpha software, do not distribute)

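Basic system parameters
------------------------------------------------------------------------------
Host      OS            Description             Mhz  tlb   cache mem   scal
                                                     pages line  par   load
                                                           bytes
--------- ------------- ----------------------- ---- ----- ----- ----- ----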
buildroot Linux 4.8.0-r A64_ILP32_diff_No_ILP32 102432   128 0.23% 1

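Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host      OS            Mhz    null   null          open   slct   sig    sig    fork   exec   sh
                               call   I/O    stat   clos   TCP    inst   hndl   proc   proc   proc
--------- ------------- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
buildroot Linux 4.8.0-r 0.00%  0.00%  0.00%  -3.03% -0.42% -1.96% 0.00%  -0.67% 2.29%  -6.34% 0.85%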

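Basic integer operations - times in nanoseconds - smaller is better
-------------------------------------------------------------------
Host      OS            intgr  intgr  intgr  intgr  intgr
                        bit    add    mul    div    mod
--------- ------------- ------ ------ ------ ------ ------
buildroot Linux 4.8.0-r 0.00%  0.00%  0.00%  0.00%  0.00%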

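Basic uint64 operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host      OS            int64  int64  int64  int64  int64
                        bit    add    mul    div    mod
--------- ------------- ------ ------ ------ ------ ------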
buildroot Linux 4.8.0-r  0.00%0.00%  0.00%  0.00%

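Basic float operations - times in nanoseconds - smaller is better
-----------------------------------------------------------------
Host      OS            float  float  float  float
                        add    mul    div    bogo
--------- ------------- ------ ------ ------ ------
buildroot Linux 4.8.0-r 0.00%  0.00%  0.04%  0.00%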

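Basic double operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host      OS            double double double double
                        add    mul    div    bogo
--------- ------------- ------ ------ ------ ------
buildroot Linux 4.8.0-r 0.00%  0.00%  0.00%  0.00%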

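Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------------
Host      OS            2p/0K   2p/16K  2p/64K  8p/16K  8p/64K  16p/16K 16p/64K
                        ctxsw   ctxsw   ctxsw   ctxsw   ctxsw   ctxsw   ctxsw
--------- ------------- ------- ------- ------- ------- ------- ------- -------
buildroot Linux 4.8.0-r -6.00%  13.16%  -1.83%  3.80%   9.94%   -6.17%  2.72%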

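*Local* Communication latencies in microseconds - smaller is better
------------------------------------------------------------------------------
Host      OS            2p/0K  Pipe   AF     UDP    RPC/   TCP    RPC/   TCP
                        ctxsw         UNIX          UDP           TCP    conn
--------- ------------- ------ ------ ------ ------ ------ ------ ------ ------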
buildroot Linux 4.8.0-r -6.00% -4.08% 1.95% -5.02%4.87% 0.00%


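File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host      OS            0K File       10K File      Mmap    Prot   Page   100fd
                        Create Delete Create Delete Latency Fault  Fault  selct
--------- ------------- ------ ------ ------ ------ ------- ------ ------ ------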
buildroot