Re: [PATCH v7 00/20] ILP32 for ARM64
> On Feb 12, 2017, at 4:07 PM, Andrew Pinski wrote:
>
> On Mon, Jan 9, 2017 at 3:29 AM, Yury Norov wrote:
>> This series enables aarch64 with ilp32 mode.
>> ...
>
> For folks concerned about performance, here is what we get for SPEC
> CPU 2006 on ThunderX 2 CN99xx.
> Positive means ILP32 is faster than LP64. This core does not have
> AARCH32 so I can't compare that.
> Also my LP64 scores don't change with and without the patches.
>
> Options:
> -Ofast -flto=32 -mcpu=native -fno-aggressive-loop-optimizations
> -funroll-loops -fprefetch-loop-arrays
> GCC 7.0.1 r245361 with ilp32 multi-arch patch applied.
> 4.10rc2 Plus ILP32 patches
>
> SPEC CPU 2006 INT ILP32/LP64
> 400.perlbench      5.23%
> 401.bzip2          7.83%
> 403.gcc            6.22%
> 429.mcf           14.25%
> 445.gobmk         -1.33%
> 456.hmmer         -0.61%
> 458.sjeng          0.00%
> 462.libquantum    -7.38%
> 464.h264ref       10.86%
> 471.omnetpp       13.53%
> 473.astar          1.38%
> 483.xalancbmk      3.73%
> Score              4.29%
>
> Rate (32):
> 400.perlbench      6.10%
> 401.bzip2          7.10%
> 403.gcc            6.71%
> 429.mcf           57.29%
> 445.gobmk         -0.87%
> 456.hmmer         -0.19%
> 458.sjeng          0.22%
> 462.libquantum     0.00%
> 464.h264ref       11.19%
> 471.omnetpp       11.80%
> 473.astar         -0.29%
> 483.xalancbmk      8.87%
> Score              8.12%

These are good numbers and show that ILP32 has a performance advantage
over LP64.

SPEC CPU2006 is a user-land benchmark and spends almost no time in the
kernel (by design). Similar results for a kernel-focused benchmark would
be highly interesting too, and kernel reviewers have asked for these a
couple of times.

Do you plan to run kernel benchmarks on the hardware you have?

Thanks,

--
Maxim Kuvyrkov
www.linaro.org
Re: [PATCH v7 00/20] ILP32 for ARM64
On Mon, Jan 9, 2017 at 3:29 AM, Yury Norov wrote:
> This series enables aarch64 with ilp32 mode.
>
> As supporting work, it introduces ARCH_32BIT_OFF_T configuration
> option that is enabled for existing 32-bit architectures but disabled
> for new arches (so 64-bit off_t is used by new userspace). Also it
> deprecates getrlimit and setrlimit syscalls in favor of prlimit64.
>
> This version is based on linux-next from 2017-01-09. It works with
> glibc-2.25, and is tested with LTP, glibc testsuite, trinity, lmbench,
> CPUSpec.
>
> This is not RFC anymore. I believe that all ABI and implementation
> issues are resolved now. The way the kernel clears the top halves of
> registers is probably the last question, and because there has been no
> objection to the current approach for more than 6 months, I think the
> community agrees with it.
>
> Patches 1, 2, 3 and 8 are general, and may be applied separately.
>
> The current version does not introduce ABI changes compared to RFC3.
> Kernel and GLIBC trees:
> https://github.com/norov/linux/tree/ilp32-2017-01-09
> https://github.com/norov/glibc/tree/dev9
>
> (GLIBC patches are managed by Steve Ellcey, so my tree is only for
> reference.)

For folks concerned about performance, here is what we get for SPEC
CPU 2006 on ThunderX 2 CN99xx.
Positive means ILP32 is faster than LP64. This core does not have
AARCH32 so I can't compare that.
Also my LP64 scores don't change with and without the patches.

Options:
-Ofast -flto=32 -mcpu=native -fno-aggressive-loop-optimizations
-funroll-loops -fprefetch-loop-arrays
GCC 7.0.1 r245361 with ilp32 multi-arch patch applied.
4.10rc2 Plus ILP32 patches

SPEC CPU 2006 INT ILP32/LP64
400.perlbench      5.23%
401.bzip2          7.83%
403.gcc            6.22%
429.mcf           14.25%
445.gobmk         -1.33%
456.hmmer         -0.61%
458.sjeng          0.00%
462.libquantum    -7.38%
464.h264ref       10.86%
471.omnetpp       13.53%
473.astar          1.38%
483.xalancbmk      3.73%
Score              4.29%

Rate (32):
400.perlbench      6.10%
401.bzip2          7.10%
403.gcc            6.71%
429.mcf           57.29%
445.gobmk         -0.87%
456.hmmer         -0.19%
458.sjeng          0.22%
462.libquantum     0.00%
464.h264ref       11.19%
471.omnetpp       11.80%
473.astar         -0.29%
483.xalancbmk      8.87%
Score              8.12%

Thanks,
Andrew Pinski

>
> Changes:
> v3: https://lkml.org/lkml/2014/9/3/704
> v4: https://lkml.org/lkml/2015/4/13/691
> v5: https://lkml.org/lkml/2015/9/29/911
> v6: https://lkml.org/lkml/2016/5/23/661
> v7: RFC nowrap: https://lkml.org/lkml/2016/6/17/990
> v7: RFC2 nowrap: https://lkml.org/lkml/2016/8/17/245
> v7: RFC3 nowrap: https://lkml.org/lkml/2016/10/21/883
> v7: - 32-bit off_t deprecation is split for compat
>       and native 32-bit arches, as it was initially
>       done (patches 1, 2);
>     - getrlimit() and setrlimit() syscalls deprecated for
>       aarch64/ilp32 and all new architectures;
>     - documentation is cleaned up (patch 4);
>     - compat-related definitions moved from
>       aarch64/include/elf.h to binfmt_elf32.c (patch 11)
>     - for ptrace, execution mode detection is performed
>       at runtime, as it was in v4 (patch 18)
>
> Andrew Pinski (6):
>   arm64: rename COMPAT to AARCH32_EL0 in Kconfig
>   arm64: ensure the kernel is compiled for LP64
>   arm64:uapi: set __BITS_PER_LONG correctly for ILP32 and LP64
>   arm64: ilp32: add sys_ilp32.c and a separate table (in entry.S) to use
>     it
>   arm64: ilp32: introduce ilp32-specific handlers for sigframe and
>     ucontext
>   arm64:ilp32: add ARM64_ILP32 to Kconfig
>
> Philipp Tomsich (1):
>   arm64:ilp32: add vdso-ilp32 and use for signal return
>
> Yury Norov (13):
>   compat ABI: use non-compat openat and open_by_handle_at variants
>   32-bit ABI: introduce ARCH_32BIT_OFF_T config option
>   asm-generic: Drop getrlimit and setrlimit syscalls from default list
>   arm64: ilp32: add documentation on the ILP32 ABI for ARM64
>   thread: move thread bits accessors to separated file
>   arm64: introduce is_a32_task and is_a32_thread (for AArch32 compat)
>   arm64: ilp32: add is_ilp32_compat_{task,thread} and TIF_32BIT_AARCH64
>   arm64: introduce binfmt_elf32.c
>   arm64: ilp32: introduce binfmt_ilp32.c
>   arm64: ilp32: share aarch32 syscall handlers
>   arm64: signal: share lp64 signal routines to ilp32
>   arm64: signal32: move ilp32 and aarch32 common code to separated file
>   arm64: ptrace: handle ptrace_request differently for aarch32 and ilp32
>
>  Documentation/arm64/ilp32.txt      | 45 +++
>  arch/Kconfig                       |  4 +
>  arch/arc/Kconfig                   |  1 +
>  arch/arc/include/uapi/asm/unistd.h |  1 +
>  arch/arm/Kconfig
[PATCH v7 00/20] ILP32 for ARM64
This series enables aarch64 with ilp32 mode.

As supporting work, it introduces ARCH_32BIT_OFF_T configuration
option that is enabled for existing 32-bit architectures but disabled
for new arches (so 64-bit off_t is used by new userspace). Also it
deprecates getrlimit and setrlimit syscalls in favor of prlimit64.

This version is based on linux-next from 2017-01-09. It works with
glibc-2.25, and is tested with LTP, glibc testsuite, trinity, lmbench,
CPUSpec.

This is not RFC anymore. I believe that all ABI and implementation
issues are resolved now. The way the kernel clears the top halves of
registers is probably the last question, and because there has been no
objection to the current approach for more than 6 months, I think the
community agrees with it.

Patches 1, 2, 3 and 8 are general, and may be applied separately.

The current version does not introduce ABI changes compared to RFC3.
Kernel and GLIBC trees:
https://github.com/norov/linux/tree/ilp32-2017-01-09
https://github.com/norov/glibc/tree/dev9

(GLIBC patches are managed by Steve Ellcey, so my tree is only for
reference.)
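To make the data model concrete, here is a small sketch that reports the
type sizes distinguishing ILP32 from LP64, together with off_t. The
helper names data_model and print_abi_sizes are invented for this
example. It assumes a GCC AArch64 toolchain, where -mabi=ilp32 and
-mabi=lp64 select the ABI, and an ILP32 build additionally needs a
matching C library such as the glibc tree referenced above.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: name the data model from the sizes the compiler
 * chose for long and for pointers. */
const char *data_model(void)
{
    if (sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";
    if (sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";
    return "other";
}

/* Hypothetical helper: print the sizes relevant to this series.
 * off_t is listed separately because the series keeps it 64-bit for
 * new 32-bit ABIs (ARCH_32BIT_OFF_T disabled). */
void print_abi_sizes(void)
{
    printf("model=%s int=%zu long=%zu ptr=%zu off_t=%zu\n",
           data_model(), sizeof(int), sizeof(long),
           sizeof(void *), sizeof(off_t));
}
```

By definition an ILP32 build reports 4 bytes for int, long and pointers,
while an LP64 build reports 8 bytes for long and pointers. The off_t
value is the interesting one to check on an ILP32 build of this series.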
Changes:
v3: https://lkml.org/lkml/2014/9/3/704
v4: https://lkml.org/lkml/2015/4/13/691
v5: https://lkml.org/lkml/2015/9/29/911
v6: https://lkml.org/lkml/2016/5/23/661
v7: RFC nowrap: https://lkml.org/lkml/2016/6/17/990
v7: RFC2 nowrap: https://lkml.org/lkml/2016/8/17/245
v7: RFC3 nowrap: https://lkml.org/lkml/2016/10/21/883
v7: - 32-bit off_t deprecation is split for compat
      and native 32-bit arches, as it was initially
      done (patches 1, 2);
    - getrlimit() and setrlimit() syscalls deprecated for
      aarch64/ilp32 and all new architectures;
    - documentation is cleaned up (patch 4);
    - compat-related definitions moved from
      aarch64/include/elf.h to binfmt_elf32.c (patch 11)
    - for ptrace, execution mode detection is performed
      at runtime, as it was in v4 (patch 18)

Andrew Pinski (6):
  arm64: rename COMPAT to AARCH32_EL0 in Kconfig
  arm64: ensure the kernel is compiled for LP64
  arm64:uapi: set __BITS_PER_LONG correctly for ILP32 and LP64
  arm64: ilp32: add sys_ilp32.c and a separate table (in entry.S) to use
    it
  arm64: ilp32: introduce ilp32-specific handlers for sigframe and
    ucontext
  arm64:ilp32: add ARM64_ILP32 to Kconfig

Philipp Tomsich (1):
  arm64:ilp32: add vdso-ilp32 and use for signal return

Yury Norov (13):
  compat ABI: use non-compat openat and open_by_handle_at variants
  32-bit ABI: introduce ARCH_32BIT_OFF_T config option
  asm-generic: Drop getrlimit and setrlimit syscalls from default list
  arm64: ilp32: add documentation on the ILP32 ABI for ARM64
  thread: move thread bits accessors to separated file
  arm64: introduce is_a32_task and is_a32_thread (for AArch32 compat)
  arm64: ilp32: add is_ilp32_compat_{task,thread} and TIF_32BIT_AARCH64
  arm64: introduce binfmt_elf32.c
  arm64: ilp32: introduce binfmt_ilp32.c
  arm64: ilp32: share aarch32 syscall handlers
  arm64: signal: share lp64 signal routines to ilp32
  arm64: signal32: move ilp32 and aarch32 common code to separated file
  arm64: ptrace: handle ptrace_request differently for aarch32 and ilp32

 Documentation/arm64/ilp32.txt             | 45 +++
 arch/Kconfig                              |  4 +
 arch/arc/Kconfig                          |  1 +
 arch/arc/include/uapi/asm/unistd.h        |  1 +
 arch/arm/Kconfig                          |  1 +
 arch/arm64/Kconfig                        | 19 ++-
 arch/arm64/Makefile                       |  5 +
 arch/arm64/include/asm/compat.h           | 19 +--
 arch/arm64/include/asm/elf.h              | 32 ++---
 arch/arm64/include/asm/fpsimd.h           |  2 +-
 arch/arm64/include/asm/ftrace.h           |  2 +-
 arch/arm64/include/asm/hwcap.h            |  6 +-
 arch/arm64/include/asm/is_compat.h        | 90 ++
 arch/arm64/include/asm/memory.h           |  5 +-
 arch/arm64/include/asm/processor.h        | 11 +-
 arch/arm64/include/asm/ptrace.h           |  2 +-
 arch/arm64/include/asm/seccomp.h          |  2 +-
 arch/arm64/include/asm/signal32.h         |  9 +-
 arch/arm64/include/asm/signal32_common.h  | 27
 arch/arm64/include/asm/signal_common.h    | 33 +
 arch/arm64/include/asm/signal_ilp32.h     | 38 ++
 arch/arm64/include/asm/syscall.h          |  2 +-
 arch/arm64/include/asm/thread_info.h      |  4 +-
 arch/arm64/include/asm/unistd.h           |  8 +-
 arch/arm64/include/asm/vdso.h             |  6 +
 arch/arm64/include/uapi/asm/bitsperlong.h |  9 +-
 arch/arm64/include/uapi/asm/unistd.h      | 13 ++
 arch/arm64/kernel/Makefile                | 18 ++-
 arch/arm64/kernel/asm-offsets.c           |  9 +-
 arch/arm64/kernel/binfmt_elf32.c          | 32 +
 arch/arm64/kernel/binfmt_ilp32.c          | 98 +++
 arch/arm64/kernel/cpufeature.c            |  8 +-
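The changelog above notes that getrlimit() and setrlimit() are dropped
as syscalls for aarch64/ilp32 and new architectures, which leaves
prlimit64. A minimal sketch of how both operations can be expressed on
top of that one syscall follows. The names krlimit64, get_limit64 and
set_limit64 are invented for this example, and it assumes a Linux target
whose headers define SYS_prlimit64. In practice the C library is
expected to do this translation inside its own getrlimit() and
setrlimit() wrappers, so applications should not need code like this.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>   /* RLIMIT_* constants */
#include <sys/syscall.h>
#include <unistd.h>

/* Mirrors the kernel's struct rlimit64: two 64-bit fields regardless of
 * the userspace data model. The local name is chosen for this example. */
struct krlimit64 {
    uint64_t cur;   /* soft limit */
    uint64_t max;   /* hard limit */
};

/* Hypothetical getrlimit() stand-in: pid 0 means the calling process,
 * and a NULL new limit makes prlimit64 a pure query. */
int get_limit64(int resource, struct krlimit64 *out)
{
    return (int)syscall(SYS_prlimit64, (long)0, (long)resource, NULL, out);
}

/* Hypothetical setrlimit() stand-in: a NULL old limit means the previous
 * value is not reported back. */
int set_limit64(int resource, const struct krlimit64 *in)
{
    return (int)syscall(SYS_prlimit64, (long)0, (long)resource, in, NULL);
}
```

For example, calling get_limit64(RLIMIT_NOFILE, &lim) fills lim with the
current soft and hard limits on open file descriptors. Both helpers
return 0 on success and -1 with errno set on failure, like syscall().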