Re: [RFC2 nowrap: PATCH v7 00/18] ILP32 for ARM64
Based on the off-list discussion, the community cares about possible
performance regressions for aarch64 LP64 and aarch32 after ILP32 is
merged. Given that there is no big open issue in the kernel part of
ILP32, I am trying to address this concern.

It is reasonable that we should run a lot of test suites (such as LKP)
to ensure there is no performance regression. But I am not an expert in
this, so I started by running lmbench for aarch64 LP64 and comparing the
difference between a kernel with ILP32 enabled and one without the ILP32
patches. The branch I used is ilp32-4.8 on [1]. I compared the results
between two commits: "d3746f1 arm64:ilp32: add ARM64_ILP32 to Kconfig"
(defconfig with CONFIG_ARM64_ILP32) and "3054de8 fiz set_personality by
Catalin" (defconfig).

The results show there is no big difference. Most of the differences
are less than 5%. Only two differences are more than 10%:
1. Context switching 2p/16K: 13.16% (ILP32 is bigger than No_ILP32;
   smaller is better).
2. *Local* Communication bandwidths, TCP: -10.77% (ILP32 is smaller
   than No_ILP32; bigger is better).

If this makes sense to the community, I could continue to do more of
this.

Thanks

Bamvor

[1] https://github.com/norov/linux.git
[2] The full result: (ILP32 - No_ILP32)/No_ILP32

L M B E N C H  3 . 0   S U M M A R Y
(Alpha software, do not distribute)

Basic system parameters
Host OS Description Mhz tlb pages cache line bytes mem par scal load
buildroot Linux 4.8.0-r A64_ILP32_diff_No_ILP32 102432 128 0.23% 1

Processor, Processes - times in microseconds - smaller is better
Host      OS             Mhz     null    null    stat    open    slct    sig     sig     fork    exec    sh
                                 call    I/O             clos    TCP     inst    hndl    proc    proc    proc
buildroot Linux 4.8.0-r  0.00%   0.00%   0.00%   -3.03%  -0.42%  -1.96%  0.00%   -0.67%  2.29%   -6.34%  0.85%

Basic integer operations - times in nanoseconds - smaller is better
Host      OS             intgr   intgr   intgr   intgr   intgr
                         bit     add     mul     div     mod
buildroot Linux 4.8.0-r  0.00%   0.00%   0.00%   0.00%   0.00%

Basic uint64 operations - times in nanoseconds - smaller is better
Host OS int64 bit, int64 add, int64 mul, int64 div, int64 mod
buildroot Linux 4.8.0-r 0.00% 0.00% 0.00% 0.00%

Basic float operations - times in nanoseconds - smaller is better
Host      OS             float   float   float   float
                         add     mul     div     bogo
buildroot Linux 4.8.0-r  0.00%   0.00%   0.04%   0.00%

Basic double operations - times in nanoseconds - smaller is better
Host      OS             double  double  double  double
                         add     mul     div     bogo
buildroot Linux 4.8.0-r  0.00%   0.00%   0.00%   0.00%

Context switching - times in microseconds - smaller is better
Host      OS             2p/0K   2p/16K  2p/64K  8p/16K  8p/64K  16p/16K 16p/64K
                         ctxsw   ctxsw   ctxsw   ctxsw   ctxsw   ctxsw   ctxsw
buildroot Linux 4.8.0-r  -6.00%  13.16%  -1.83%  3.80%   9.94%   -6.17%  2.72%

*Local* Communication latencies in microseconds - smaller is better
Host OS 2p/0K ctxsw, Pipe, AF UNIX, UDP, RPC/UDP, TCP, RPC/TCP, TCP conn
buildroot Linux 4.8.0-r -6.00% -4.08% 1.95% -5.02% 4.87% 0.00%

File & VM system latencies in microseconds - smaller is better
Host OS 0K File Create, 0K File Delete, 10K File Create, 10K File Delete, Mmap Latency, Prot Fault, Page Fault, 100fd selct
buildroot
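
For readers who want to reproduce the percentages, the formula in [2] can be sketched in C as below. This is only an illustration: the function names and the 10% threshold argument are hypothetical, chosen to mirror the "more than 10%" cut used in the message, and are not part of lmbench or of the patch series.

```c
#include <stdbool.h>

/* Relative change of an ILP32-enabled measurement against the baseline,
 * in percent: (ILP32 - No_ILP32) / No_ILP32 * 100. */
static double rel_diff_percent(double ilp32, double no_ilp32)
{
    return (ilp32 - no_ilp32) / no_ilp32 * 100.0;
}

/* Decide whether a relative difference is a regression beyond a threshold.
 * For latency-style tables smaller is better, so a positive difference is
 * bad; for bandwidth-style tables bigger is better, so a negative one is. */
static bool is_regression(double diff_percent, bool smaller_is_better,
                          double threshold_percent)
{
    if (smaller_is_better)
        return diff_percent > threshold_percent;
    return diff_percent < -threshold_percent;
}
```

With this reading, the 2p/16K context-switch figure (13.16%, smaller is better) and the local TCP bandwidth figure (-10.77%, bigger is better) are the only two that cross a 10% threshold.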
Re: [RFC2 nowrap: PATCH v7 00/18] ILP32 for ARM64
On Wed, Aug 17, 2016 at 04:26:42PM +0100, Catalin Marinas wrote:
> On Wed, Aug 17, 2016 at 04:32:23PM +0200, Dr. Philipp Tomsich wrote:
> > On 17 Aug 2016, at 16:29, Catalin Marinas wrote:
> > > On Wed, Aug 17, 2016 at 02:54:59PM +0200, Dr. Philipp Tomsich wrote:
> > >> On 17 Aug 2016, at 14:48, Yury Norov wrote:
> > >>> On Wed, Aug 17, 2016 at 02:28:50PM +0200, Alexander Graf wrote:
> > >>>> On 17 Aug 2016, at 13:46, Yury Norov wrote:
> > >>>>> This series enables aarch64 with ilp32 mode, and as supporting work,
> > >>>>> introduces ARCH_32BIT_OFF_T configuration option that is enabled for
> > >>>>> existing 32-bit architectures but disabled for new arches (so 64-bit
> > >>>>> off_t is used by new userspace).
> > >>>>>
> > >>>>> This version is based on kernel v4.8-rc2.
> > >>>>> It works with glibc-2.23, and tested with LTP.
> > >>>>>
> > >>>>> This is RFC because there is still no solid understanding what type of
> > >>>>> registers top-halves delousing we prefer. In this patchset, w0-w7 are
> > >>>>> cleared for each syscall in assembler entry. The alternative approach
> > >>>>> is in introducing compat wrappers which is little faster for natively
> > >>>>> routed syscalls (~2.6% for syscall with no payload) but much more
> > >>>>> complicated.
> > >>>>
> > >>>> So you’re saying there are 2 options:
> > >>>>
> > >>>>   1) easy to get right, slightly slower, same ABI to user space as 2
> > >>>>   2) harder to get right, minor performance benefit
> > >>>
> > >>> No, ABI is little different. If 1) we pass off_t in a pair to syscalls,
> > >>> if 2) - in a single register. So if 1, we'd take some wrappers from
> > >>> aarch32.
> > >>> See patch 12 here.
> > >>
> > >> From our experience with ILP32, I’d prefer to have off_t (and similar)
> > >> in a single register whenever possible (i.e. option #2). It feels
> > >> more natural to use the full 64bit registers whenever possible, as
> > >> ILP32 on ARMv8 should really be understood as a 64bit ABI with a 32bit
> > >> memory model.
> > >
> > > I think we are well past the point where we considered ILP32 a 64-bit
> > > ABI. It would have been nice but we decided that breaking POSIX
> > > compatibility is a bad idea, so we went back (again) to a 32-bit ABI for
> > > ILP32. While there are 64-bit arguments that, at a first look, would
> > > make sense to be passed in 64-bit registers, the kernel maintenance cost
> > > is significant with changes to generic files.
> > >
> > > Allowing 64-bit wide registers at the ILP32 syscall interface means that
> > > the kernel would have to zero/sign-extend the upper half of the 32-bit
> > > arguments for the cases where they are passed directly to a native
> > > syscall that expects a 64-bit argument. This (a) adds a significant
> > > number of wrappers to the generic code together with additional
> > > annotations to the generic unistd.h and (b) it adds a small overhead to
> > > the AArch32 (compat) ABI since it doesn't need such generic wrapping
> > > (the upper half of 64-bit registers is guaranteed to be zero/preserved
> > > by the architecture when coming from the AArch32 mode).
> >
> > Yes, I remember the discussions and just wanted to put option #2 in
> > context again.
>
> I don't particularly like splitting 64-bit arguments in two 32-bit
> values either but I don't see a better alternative. To keep this
> mostly in the arch code we would need an additional table of syscall
> wrappers where the majority just use the default zero-extend everything
> with a few specific wrappers where we pass 64-bit arguments. Or we could
> set an extra bit in the syscall number for those syscalls that need
> special wrapping and avoid zero-extending. But neither of these look any
> nicer (well, maybe only from the user-space perspective).
>

This is the discussion started by David Miller:
https://patchwork.kernel.org/patch/9132521/

After it we switched to the current version.

> > Everything points to just going with the pair-of-registers and getting
> > this merged quickly then, I suppose.
>
> I will refrain from commenting on how quickly we merge this ;) (it may
> be seen as binding by some).
>
> --
> Catalin
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC2 nowrap: PATCH v7 00/18] ILP32 for ARM64
> On 17 Aug 2016, at 16:29, Catalin Marinas wrote:
>
> On Wed, Aug 17, 2016 at 02:54:59PM +0200, Dr. Philipp Tomsich wrote:
>> On 17 Aug 2016, at 14:48, Yury Norov wrote:
>>> On Wed, Aug 17, 2016 at 02:28:50PM +0200, Alexander Graf wrote:
>>>> On 17 Aug 2016, at 13:46, Yury Norov wrote:
>>>>> This series enables aarch64 with ilp32 mode, and as supporting work,
>>>>> introduces ARCH_32BIT_OFF_T configuration option that is enabled for
>>>>> existing 32-bit architectures but disabled for new arches (so 64-bit
>>>>> off_t is used by new userspace).
>>>>>
>>>>> This version is based on kernel v4.8-rc2.
>>>>> It works with glibc-2.23, and tested with LTP.
>>>>>
>>>>> This is RFC because there is still no solid understanding what type of
>>>>> registers top-halves delousing we prefer. In this patchset, w0-w7 are
>>>>> cleared for each syscall in assembler entry. The alternative approach
>>>>> is in introducing compat wrappers which is little faster for natively
>>>>> routed syscalls (~2.6% for syscall with no payload) but much more
>>>>> complicated.
>>>>
>>>> So you’re saying there are 2 options:
>>>>
>>>>   1) easy to get right, slightly slower, same ABI to user space as 2
>>>>   2) harder to get right, minor performance benefit
>>>
>>> No, ABI is little different. If 1) we pass off_t in a pair to syscalls,
>>> if 2) - in a single register. So if 1, we'd take some wrappers from
>>> aarch32.
>>> See patch 12 here.
>>
>> From our experience with ILP32, I’d prefer to have off_t (and similar)
>> in a single register whenever possible (i.e. option #2). It feels
>> more natural to use the full 64bit registers whenever possible, as
>> ILP32 on ARMv8 should really be understood as a 64bit ABI with a 32bit
>> memory model.
>
> I think we are well past the point where we considered ILP32 a 64-bit
> ABI. It would have been nice but we decided that breaking POSIX
> compatibility is a bad idea, so we went back (again) to a 32-bit ABI for
> ILP32. While there are 64-bit arguments that, at a first look, would
> make sense to be passed in 64-bit registers, the kernel maintenance cost
> is significant with changes to generic files.
>
> Allowing 64-bit wide registers at the ILP32 syscall interface means that
> the kernel would have to zero/sign-extend the upper half of the 32-bit
> arguments for the cases where they are passed directly to a native
> syscall that expects a 64-bit argument. This (a) adds a significant
> number of wrappers to the generic code together with additional
> annotations to the generic unistd.h and (b) it adds a small overhead to
> the AArch32 (compat) ABI since it doesn't need such generic wrapping
> (the upper half of 64-bit registers is guaranteed to be zero/preserved
> by the architecture when coming from the AArch32 mode).

Yes, I remember the discussions and just wanted to put option #2 in
context again.

Everything points to just going with the pair-of-registers and getting
this merged quickly then, I suppose.

Cheers,
Philipp.
Re: [RFC2 nowrap: PATCH v7 00/18] ILP32 for ARM64
On Wed, Aug 17, 2016 at 02:54:59PM +0200, Dr. Philipp Tomsich wrote:
> On 17 Aug 2016, at 14:48, Yury Norov wrote:
> > On Wed, Aug 17, 2016 at 02:28:50PM +0200, Alexander Graf wrote:
> >> On 17 Aug 2016, at 13:46, Yury Norov wrote:
> >>> This series enables aarch64 with ilp32 mode, and as supporting work,
> >>> introduces ARCH_32BIT_OFF_T configuration option that is enabled for
> >>> existing 32-bit architectures but disabled for new arches (so 64-bit
> >>> off_t is used by new userspace).
> >>>
> >>> This version is based on kernel v4.8-rc2.
> >>> It works with glibc-2.23, and tested with LTP.
> >>>
> >>> This is RFC because there is still no solid understanding what type of
> >>> registers top-halves delousing we prefer. In this patchset, w0-w7 are
> >>> cleared for each syscall in assembler entry. The alternative approach
> >>> is in introducing compat wrappers which is little faster for natively
> >>> routed syscalls (~2.6% for syscall with no payload) but much more
> >>> complicated.
> >>
> >> So you’re saying there are 2 options:
> >>
> >>   1) easy to get right, slightly slower, same ABI to user space as 2
> >>   2) harder to get right, minor performance benefit
> >
> > No, ABI is little different. If 1) we pass off_t in a pair to syscalls,
> > if 2) - in a single register. So if 1, we'd take some wrappers from
> > aarch32.
> > See patch 12 here.
>
> From our experience with ILP32, I’d prefer to have off_t (and similar)
> in a single register whenever possible (i.e. option #2). It feels
> more natural to use the full 64bit registers whenever possible, as
> ILP32 on ARMv8 should really be understood as a 64bit ABI with a 32bit
> memory model.

I think we are well past the point where we considered ILP32 a 64-bit
ABI. It would have been nice but we decided that breaking POSIX
compatibility is a bad idea, so we went back (again) to a 32-bit ABI for
ILP32. While there are 64-bit arguments that, at a first look, would
make sense to be passed in 64-bit registers, the kernel maintenance cost
is significant with changes to generic files.

Allowing 64-bit wide registers at the ILP32 syscall interface means that
the kernel would have to zero/sign-extend the upper half of the 32-bit
arguments for the cases where they are passed directly to a native
syscall that expects a 64-bit argument. This (a) adds a significant
number of wrappers to the generic code together with additional
annotations to the generic unistd.h and (b) it adds a small overhead to
the AArch32 (compat) ABI since it doesn't need such generic wrapping
(the upper half of 64-bit registers is guaranteed to be zero/preserved
by the architecture when coming from the AArch32 mode).

--
Catalin
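
The zero/sign-extension Catalin describes can be illustrated with a small user-space C sketch. It only models what happens to one 64-bit register value whose upper half may hold garbage; the helper names are hypothetical and this is not code from the patch series, where the clearing is done in the assembler entry path.

```c
#include <stdint.h>

/* Zero-extend: keep the low 32 bits of the register and clear the upper
 * half. This is what is wanted for unsigned 32-bit arguments and for
 * ILP32 pointers before they reach a handler expecting 64-bit values. */
static uint64_t zero_extend32(uint64_t reg)
{
    return (uint32_t)reg;
}

/* Sign-extend: reinterpret the low 32 bits as a signed int and widen it.
 * Needed when a signed 32-bit argument feeds a 64-bit signed parameter.
 * The uint32_t -> int32_t conversion of out-of-range values is
 * implementation-defined in C; common compilers wrap as assumed here. */
static int64_t sign_extend32(uint64_t reg)
{
    return (int32_t)(uint32_t)reg;
}
```

The point of the sketch is that a per-argument choice between the two helpers is what forces per-syscall wrappers, whereas unconditionally zeroing the top halves at entry needs no knowledge of argument types.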
Re: [RFC2 nowrap: PATCH v7 00/18] ILP32 for ARM64
> On 17 Aug 2016, at 14:48, Yury Norov wrote:
>
> On Wed, Aug 17, 2016 at 02:28:50PM +0200, Alexander Graf wrote:
>>
>>> On 17 Aug 2016, at 13:46, Yury Norov wrote:
>>>
>>> This series enables aarch64 with ilp32 mode, and as supporting work,
>>> introduces ARCH_32BIT_OFF_T configuration option that is enabled for
>>> existing 32-bit architectures but disabled for new arches (so 64-bit
>>> off_t is used by new userspace).
>>>
>>> This version is based on kernel v4.8-rc2.
>>> It works with glibc-2.23, and tested with LTP.
>>>
>>> This is RFC because there is still no solid understanding what type of
>>> registers top-halves delousing we prefer. In this patchset, w0-w7 are
>>> cleared for each syscall in assembler entry. The alternative approach
>>> is in introducing compat wrappers which is little faster for natively
>>> routed syscalls (~2.6% for syscall with no payload) but much more
>>> complicated.
>>
>> So you’re saying there are 2 options:
>>
>>   1) easy to get right, slightly slower, same ABI to user space as 2
>>   2) harder to get right, minor performance benefit
>
> No, ABI is little different. If 1) we pass off_t in a pair to syscalls,
> if 2) - in a single register. So if 1, we'd take some wrappers from
> aarch32.
> See patch 12 here.

From our experience with ILP32, I’d prefer to have off_t (and similar)
in a single register whenever possible (i.e. option #2). It feels more
natural to use the full 64bit registers whenever possible, as ILP32 on
ARMv8 should really be understood as a 64bit ABI with a 32bit memory
model.

Cheers,
Philipp.
Re: [RFC2 nowrap: PATCH v7 00/18] ILP32 for ARM64
On Wed, Aug 17, 2016 at 02:28:50PM +0200, Alexander Graf wrote:
>
> > On 17 Aug 2016, at 13:46, Yury Norov wrote:
> >
> > This series enables aarch64 with ilp32 mode, and as supporting work,
> > introduces ARCH_32BIT_OFF_T configuration option that is enabled for
> > existing 32-bit architectures but disabled for new arches (so 64-bit
> > off_t is used by new userspace).
> >
> > This version is based on kernel v4.8-rc2.
> > It works with glibc-2.23, and tested with LTP.
> >
> > This is RFC because there is still no solid understanding what type of
> > registers top-halves delousing we prefer. In this patchset, w0-w7 are
> > cleared for each syscall in assembler entry. The alternative approach
> > is in introducing compat wrappers which is little faster for natively
> > routed syscalls (~2.6% for syscall with no payload) but much more
> > complicated.
>
> So you’re saying there are 2 options:
>
>   1) easy to get right, slightly slower, same ABI to user space as 2
>   2) harder to get right, minor performance benefit

No, ABI is little different. If 1) we pass off_t in a pair to syscalls,
if 2) - in a single register. So if 1, we'd take some wrappers from
aarch32.
See patch 12 here.

> That’s an obvious pick, no? Mark it non-RFC and stay with the clearing in
> assembler entry. If anyone cares about those last few percent, they can still
> push the harder path upstream later if they want to, but at least we’ll have
> the ABI stable, so that you can start using and developing for ilp32 on
> aarch64.
>
>
> Alex
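
The ABI difference Yury points out can be sketched in C as follows. With option 1 a 64-bit off_t reaches the kernel as two 32-bit halves that a wrapper must glue together; with option 2 it arrives whole in one 64-bit register. The function names are hypothetical, and the low-half-first argument order is an assumption made for illustration, not something stated in the thread.

```c
#include <stdint.h>

/* Option 1 (pair of registers): the wrapper receives the offset as two
 * 32-bit halves and rebuilds the 64-bit value before calling the native
 * handler. The order (low half, then high half) is assumed here. */
static int64_t off_from_pair(uint32_t lo, uint32_t hi)
{
    return (int64_t)(((uint64_t)hi << 32) | lo);
}

/* Option 2 (single register): the full 64-bit register already holds the
 * offset, so no reassembly is needed. */
static int64_t off_from_single(uint64_t reg)
{
    return (int64_t)reg;
}
```

Both helpers yield the same offset for the same logical value; what differs is the user-visible calling convention, which is why the choice has to be settled before the ABI is declared stable.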
Re: [RFC2 nowrap: PATCH v7 00/18] ILP32 for ARM64
> On 17 Aug 2016, at 13:46, Yury Norov wrote:
>
> This series enables aarch64 with ilp32 mode, and as supporting work,
> introduces ARCH_32BIT_OFF_T configuration option that is enabled for
> existing 32-bit architectures but disabled for new arches (so 64-bit
> off_t is used by new userspace).
>
> This version is based on kernel v4.8-rc2.
> It works with glibc-2.23, and tested with LTP.
>
> This is RFC because there is still no solid understanding what type of
> registers top-halves delousing we prefer. In this patchset, w0-w7 are
> cleared for each syscall in assembler entry. The alternative approach
> is in introducing compat wrappers which is little faster for natively
> routed syscalls (~2.6% for syscall with no payload) but much more
> complicated.

So you’re saying there are 2 options:

  1) easy to get right, slightly slower, same ABI to user space as 2
  2) harder to get right, minor performance benefit

That’s an obvious pick, no? Mark it non-RFC and stay with the clearing in
assembler entry. If anyone cares about those last few percent, they can still
push the harder path upstream later if they want to, but at least we’ll have
the ABI stable, so that you can start using and developing for ilp32 on
aarch64.


Alex
[RFC2 nowrap: PATCH v7 00/18] ILP32 for ARM64
This series enables aarch64 with ilp32 mode, and as supporting work,
introduces ARCH_32BIT_OFF_T configuration option that is enabled for
existing 32-bit architectures but disabled for new arches (so 64-bit
off_t is used by new userspace).

This version is based on kernel v4.8-rc2.
It works with glibc-2.23, and tested with LTP.

This is RFC because there is still no solid understanding what type of
registers top-halves delousing we prefer. In this patchset, w0-w7 are
cleared for each syscall in assembler entry. The alternative approach
is in introducing compat wrappers which is little faster for natively
routed syscalls (~2.6% for syscall with no payload) but much more
complicated.

There are no major changes here compared to the previous submission,
mostly the rebase to current master. All changes are listed in detail
below. No additional regressions have been observed since the previous
submission.

Patch 1 may be applied separately from the other patches of the series.

v3: https://lkml.org/lkml/2014/9/3/704
v4: https://lkml.org/lkml/2015/4/13/691
v5: https://lkml.org/lkml/2015/9/29/911
v6: https://lkml.org/lkml/2016/5/23/661
v7: RFC nowrap: https://lkml.org/lkml/2016/6/17/990
v7: RFC2 nowrap:
 - rebased on kernel 4.8-rc2;
 - setrlimit(), getrlimit() are handled by non-compat handlers to follow
   switching rlim_t to 64-bit in glibc, as pointed out by Andreas Schwab;
 - fixed {GET,SET}SIGMASK handling in ptrace(), as pointed out by
   Zhou Chengming;
 - removed put_sig{set,get}_t duplication;
 - patches 1 and 2 from previous submission are joined, missed chunk
   restored, found by Andreas Schwab.

Links:
Kernel: https://github.com/norov/linux/commits/ilp32-4.8
glibc: https://github.com/norov/glibc/commits/ilp32-2.24-dev

Andrew Pinski (6):
  arm64: ensure the kernel is compiled for LP64
  arm64: rename COMPAT to AARCH32_EL0 in Kconfig
  arm64:uapi: set __BITS_PER_LONG correctly for ILP32 and LP64
  arm64: ilp32: add sys_ilp32.c and a separate table (in entry.S) to use it
  arm64: ilp32: introduce ilp32-specific handlers for sigframe and ucontext
  arm64:ilp32: add ARM64_ILP32 to Kconfig

Philipp Tomsich (1):
  arm64:ilp32: add vdso-ilp32 and use for signal return

Yury Norov (11):
  32-bit ABI: introduce ARCH_32BIT_OFF_T config option
  arm64: ilp32: add documentation on the ILP32 ABI for ARM64
  thread: move thread bits accessors to separated file
  arm64: introduce is_a32_task and is_a32_thread (for AArch32 compat)
  arm64: ilp32: add is_ilp32_compat_{task,thread} and TIF_32BIT_AARCH64
  arm64: introduce binfmt_elf32.c
  arm64: ilp32: introduce binfmt_ilp32.c
  arm64: ilp32: share aarch32 syscall handlers
  arm64: signal: share lp64 signal routines to ilp32
  arm64: signal32: move ilp32 and aarch32 common code to separated file
  arm64: ptrace: handle ptrace_request differently for aarch32 and ilp32

 Documentation/arm64/ilp32.txt             | 54
 arch/Kconfig                              |  4 +
 arch/arc/Kconfig                          |  1 +
 arch/arm/Kconfig                          |  1 +
 arch/arm64/Kconfig                        | 19 ++-
 arch/arm64/Makefile                       |  5 +
 arch/arm64/include/asm/compat.h           | 19 +--
 arch/arm64/include/asm/elf.h              | 29 +++--
 arch/arm64/include/asm/fpsimd.h           |  2 +-
 arch/arm64/include/asm/ftrace.h           |  2 +-
 arch/arm64/include/asm/hwcap.h            |  6 +-
 arch/arm64/include/asm/is_compat.h        | 90 ++
 arch/arm64/include/asm/memory.h           |  5 +-
 arch/arm64/include/asm/processor.h        | 11 +-
 arch/arm64/include/asm/ptrace.h           |  2 +-
 arch/arm64/include/asm/signal32.h         |  9 +-
 arch/arm64/include/asm/signal32_common.h  | 28 +
 arch/arm64/include/asm/signal_common.h    | 33 +
 arch/arm64/include/asm/signal_ilp32.h     | 38 ++
 arch/arm64/include/asm/syscall.h          |  2 +-
 arch/arm64/include/asm/thread_info.h      |  4 +-
 arch/arm64/include/asm/unistd.h           |  6 +-
 arch/arm64/include/asm/unistd32.h         |  2 +-
 arch/arm64/include/asm/vdso.h             |  6 +
 arch/arm64/include/uapi/asm/bitsperlong.h |  9 +-
 arch/arm64/kernel/Makefile                | 18 ++-
 arch/arm64/kernel/asm-offsets.c           |  9 +-
 arch/arm64/kernel/binfmt_elf32.c          | 31 +
 arch/arm64/kernel/binfmt_ilp32.c          | 96 +++
 arch/arm64/kernel/cpufeature.c            |  8 +-
 arch/arm64/kernel/cpuinfo.c               | 20 +--
 arch/arm64/kernel/entry.S                 | 34 -
 arch/arm64/kernel/entry32.S               | 65 --
 arch/arm64/kernel/entry32_common.S        | 93 ++
 arch/arm64/kernel/entry_ilp32.S           | 23
 arch/arm64/kernel/head.S                  |  2 +-
 arch/arm64/kernel/hw_breakpoint.c         | 10 +-