Re: [PATCH v7 00/20] ILP32 for ARM64

2017-02-16 Thread Maxim Kuvyrkov
> On Feb 12, 2017, at 4:07 PM, Andrew Pinski wrote:
> 
> On Mon, Jan 9, 2017 at 3:29 AM, Yury Norov wrote:
>> This series enables aarch64 with ilp32 mode.
>> 
...
> 
> For folks concerned about performance, here is what we get for SPEC
> CPU 2006 on ThunderX 2 CN99xx.
> Positive means ILP32 is faster than LP64.  This core does not have
> AArch32, so I can't compare against that.
> Also, my LP64 scores don't change with and without the patches.
> 
> Options:
> -Ofast -flto=32 -mcpu=native -fno-aggressive-loop-optimizations
> -funroll-loops -fprefetch-loop-arrays
> GCC 7.0.1 r245361 with ilp32 multi-arch patch applied.
> Linux 4.10-rc2 plus ILP32 patches.
> 
> SPEC CPU 2006 INT ILP32/LP64
> 400.perlbench    5.23%
> 401.bzip2        7.83%
> 403.gcc          6.22%
> 429.mcf         14.25%
> 445.gobmk       -1.33%
> 456.hmmer       -0.61%
> 458.sjeng        0.00%
> 462.libquantum  -7.38%
> 464.h264ref     10.86%
> 471.omnetpp     13.53%
> 473.astar        1.38%
> 483.xalancbmk    3.73%
> Score            4.29%
> 
> Rate (32):
> 400.perlbench    6.10%
> 401.bzip2        7.10%
> 403.gcc          6.71%
> 429.mcf         57.29%
> 445.gobmk       -0.87%
> 456.hmmer       -0.19%
> 458.sjeng        0.22%
> 462.libquantum   0.00%
> 464.h264ref     11.19%
> 471.omnetpp     11.80%
> 473.astar       -0.29%
> 483.xalancbmk    8.87%
> Score            8.12%

These are good numbers and show that ILP32 has a performance advantage over LP64.

SPEC CPU2006 is a user-land benchmark and spends almost no time in the kernel 
(by design).  Similar results for a kernel-focused benchmark would be highly 
interesting too, and kernel reviewers have asked for these a couple of times.  
Do you plan to run kernel benchmarks on the hardware you have?

Thanks,

--
Maxim Kuvyrkov
www.linaro.org



Re: [PATCH v7 00/20] ILP32 for ARM64

2017-02-12 Thread Andrew Pinski
On Mon, Jan 9, 2017 at 3:29 AM, Yury Norov wrote:
> This series enables aarch64 with ilp32 mode.
>
> As supporting work, it introduces the ARCH_32BIT_OFF_T configuration
> option, which is enabled for existing 32-bit architectures but disabled
> for new arches (so 64-bit off_t is used by new userspace). It also
> deprecates the getrlimit and setrlimit syscalls in favor of prlimit64.
>
> This version is based on linux-next from 2017-01-09. It works with
> glibc-2.25 and has been tested with LTP, the glibc testsuite, trinity,
> lmbench and SPEC CPU.
>
> This is not an RFC anymore. I believe that all ABI and implementation
> issues are resolved now. The way the kernel clears the top halves of
> registers is probably the last open question, and because there has
> been no objection to the current approach for more than 6 months, I
> think the community agrees with it.
>
> Patches 1, 2, 3 and 8 are general, and may be applied separately.
>
> The current version does not introduce ABI changes compared to RFC3.
> Kernel and GLIBC trees:
> https://github.com/norov/linux/tree/ilp32-2017-01-09
> https://github.com/norov/glibc/tree/dev9
>
> (GLIBC patches are managed by Steve Ellcey, so my tree is only for
> reference.)

For folks concerned about performance, here is what we get for SPEC
CPU 2006 on ThunderX 2 CN99xx.
Positive means ILP32 is faster than LP64.  This core does not have
AArch32, so I can't compare against that.
Also, my LP64 scores don't change with and without the patches.

Options:
-Ofast -flto=32 -mcpu=native -fno-aggressive-loop-optimizations
-funroll-loops -fprefetch-loop-arrays
GCC 7.0.1 r245361 with ilp32 multi-arch patch applied.
Linux 4.10-rc2 plus ILP32 patches.

SPEC CPU 2006 INT ILP32/LP64
400.perlbench    5.23%
401.bzip2        7.83%
403.gcc          6.22%
429.mcf         14.25%
445.gobmk       -1.33%
456.hmmer       -0.61%
458.sjeng        0.00%
462.libquantum  -7.38%
464.h264ref     10.86%
471.omnetpp     13.53%
473.astar        1.38%
483.xalancbmk    3.73%
Score            4.29%

Rate (32):
400.perlbench    6.10%
401.bzip2        7.10%
403.gcc          6.71%
429.mcf         57.29%
445.gobmk       -0.87%
456.hmmer       -0.19%
458.sjeng        0.22%
462.libquantum   0.00%
464.h264ref     11.19%
471.omnetpp     11.80%
473.astar       -0.29%
483.xalancbmk    8.87%
Score            8.12%
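
As a cross-check on the Score rows: SPEC composes its overall metric as a
geometric mean of the per-benchmark ratios, so the same should hold for the
ILP32/LP64 deltas. A minimal Python sketch, assuming each percentage above is
(ILP32 score / LP64 score - 1); the function and variable names are made up
for illustration and are not part of any SPEC tooling:

```python
import math

# Per-benchmark ILP32-vs-LP64 deltas in percent, copied from the tables above.
SPEED_DELTAS = {
    "400.perlbench": 5.23, "401.bzip2": 7.83, "403.gcc": 6.22,
    "429.mcf": 14.25, "445.gobmk": -1.33, "456.hmmer": -0.61,
    "458.sjeng": 0.00, "462.libquantum": -7.38, "464.h264ref": 10.86,
    "471.omnetpp": 13.53, "473.astar": 1.38, "483.xalancbmk": 3.73,
}
RATE_DELTAS = {
    "400.perlbench": 6.10, "401.bzip2": 7.10, "403.gcc": 6.71,
    "429.mcf": 57.29, "445.gobmk": -0.87, "456.hmmer": -0.19,
    "458.sjeng": 0.22, "462.libquantum": 0.00, "464.h264ref": 11.19,
    "471.omnetpp": 11.80, "473.astar": -0.29, "483.xalancbmk": 8.87,
}


def overall_delta(deltas_percent):
    """Geometric mean of the per-benchmark ratios, returned as a percent delta."""
    ratios = [1.0 + d / 100.0 for d in deltas_percent]
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return (math.exp(log_mean) - 1.0) * 100.0


print(f"speed:     {overall_delta(SPEED_DELTAS.values()):.2f}%")
print(f"rate (32): {overall_delta(RATE_DELTAS.values()):.2f}%")
```

With the rounded inputs from the tables this gives about 4.29% for the speed
run and about 8.13% for the rate run, so the Score rows agree to within
rounding.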

Thanks,
Andrew Pinski

>
> Changes:
> v3: https://lkml.org/lkml/2014/9/3/704
> v4: https://lkml.org/lkml/2015/4/13/691
> v5: https://lkml.org/lkml/2015/9/29/911
> v6: https://lkml.org/lkml/2016/5/23/661
> v7: RFC nowrap:  https://lkml.org/lkml/2016/6/17/990
> v7: RFC2 nowrap: https://lkml.org/lkml/2016/8/17/245
> v7: RFC3 nowrap: https://lkml.org/lkml/2016/10/21/883
> v7: - 32-bit off_t deprecation is split for compat
>       and native 32-bit arches, as it was initially
>       done (patches 1, 2);
>     - getrlimit() and setrlimit() syscalls are deprecated for
>       aarch64/ilp32 and all new architectures;
>     - documentation is cleaned up (patch 4);
>     - compat-related definitions are moved from
>       aarch64/include/elf.h to binfmt_elf32.c (patch 11);
>     - for ptrace, execution mode detection is performed
>       at runtime, as it was in v4 (patch 18).
>
> Andrew Pinski (6):
>   arm64: rename COMPAT to AARCH32_EL0 in Kconfig
>   arm64: ensure the kernel is compiled for LP64
>   arm64:uapi: set __BITS_PER_LONG correctly for ILP32 and LP64
>   arm64: ilp32: add sys_ilp32.c and a separate table (in entry.S) to use
>     it
>   arm64: ilp32: introduce ilp32-specific handlers for sigframe and
>     ucontext
>   arm64:ilp32: add ARM64_ILP32 to Kconfig
>
> Philipp Tomsich (1):
>   arm64:ilp32: add vdso-ilp32 and use for signal return
>
> Yury Norov (13):
>   compat ABI: use non-compat openat and open_by_handle_at variants
>   32-bit ABI: introduce ARCH_32BIT_OFF_T config option
>   asm-generic: Drop getrlimit and setrlimit syscalls from default list
>   arm64: ilp32: add documentation on the ILP32 ABI for ARM64
>   thread: move thread bits accessors to separated file
>   arm64: introduce is_a32_task and is_a32_thread (for AArch32 compat)
>   arm64: ilp32: add is_ilp32_compat_{task,thread} and TIF_32BIT_AARCH64
>   arm64: introduce binfmt_elf32.c
>   arm64: ilp32: introduce binfmt_ilp32.c
>   arm64: ilp32: share aarch32 syscall handlers
>   arm64: signal: share lp64 signal routines to ilp32
>   arm64: signal32: move ilp32 and aarch32 common code to separated file
>   arm64: ptrace: handle ptrace_request differently for aarch32 and ilp32
>
>  Documentation/arm64/ilp32.txt      | 45 +++
>  arch/Kconfig                       |  4 +
>  arch/arc/Kconfig                   |  1 +
>  arch/arc/include/uapi/asm/unistd.h |  1 +
>  arch/arm/Kconfig  

[PATCH v7 00/20] ILP32 for ARM64

2017-01-09 Thread Yury Norov
This series enables aarch64 with ilp32 mode.
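
For readers less familiar with the naming: ILP32 means int, long and pointers
are all 32 bits wide, while LP64 keeps int at 32 bits and makes long and
pointers 64 bits. A minimal Python sketch of what that does to a
pointer-heavy structure; the node layout (two pointers plus one long) is
hypothetical and not taken from the series:

```python
import struct

# "<" selects struct's standard sizes with no padding, so the result
# does not depend on the machine running the script.
LP64_NODE = "<QQq"   # next pointer, prev pointer (64-bit each) + 64-bit long
ILP32_NODE = "<IIi"  # next pointer, prev pointer (32-bit each) + 32-bit long


def node_size(fmt):
    """Size in bytes of one hypothetical list node for the given layout."""
    return struct.calcsize(fmt)


print("LP64 node: ", node_size(LP64_NODE), "bytes")
print("ILP32 node:", node_size(ILP32_NODE), "bytes")
```

The ILP32 node is half the size of the LP64 one, which is the kind of
footprint reduction that can matter for pointer-heavy workloads.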

As supporting work, it introduces the ARCH_32BIT_OFF_T configuration
option, which is enabled for existing 32-bit architectures but disabled
for new arches (so 64-bit off_t is used by new userspace). It also
deprecates the getrlimit and setrlimit syscalls in favor of prlimit64.
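
To illustrate why dropping getrlimit and setrlimit loses nothing: a single
prlimit-style call can both read and write a limit. A minimal userspace
sketch in Python, assuming Linux and the standard resource module (whose
prlimit() wraps the C library call of the same name); the helper names are
hypothetical and this is not code from the series:

```python
import resource


def get_limit(which, pid=0):
    """Read a limit via prlimit(); pid 0 means the calling process."""
    return resource.prlimit(pid, which)


def set_limit(which, soft, hard, pid=0):
    """Set a limit via prlimit() and return the previous (soft, hard) pair."""
    return resource.prlimit(pid, which, (soft, hard))


soft, hard = get_limit(resource.RLIMIT_NOFILE)
print("RLIMIT_NOFILE:", soft, hard)

# Writing the same values back is a no-op that needs no privileges.
previous = set_limit(resource.RLIMIT_NOFILE, soft, hard)
print("previous:", previous)
```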

This version is based on linux-next from 2017-01-09. It works with
glibc-2.25 and has been tested with LTP, the glibc testsuite, trinity,
lmbench and SPEC CPU.

This is not an RFC anymore. I believe that all ABI and implementation
issues are resolved now. The way the kernel clears the top halves of
registers is probably the last open question, and because there has
been no objection to the current approach for more than 6 months, I
think the community agrees with it.
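
For context on the register question: an ILP32 task only uses the low 32
bits of each 64-bit register for longs and pointers, so the upper 32 bits
may hold stale data when a value reaches the kernel. A minimal Python sketch
of the masking idea only; the helper name and the example value are
hypothetical, and where and how the kernel actually does this is defined by
the patches, not by this snippet:

```python
MASK32 = 0xFFFFFFFF


def clear_top_halves(regs):
    """Return a copy of 64-bit register values with bits 63..32 zeroed."""
    return [value & MASK32 for value in regs]


# A 32-bit user pointer in the low half, stale data left in the high half.
x0 = 0xDEADBEEF_0040_1000
print(hex(clear_top_halves([x0])[0]))  # 0x401000
```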

Patches 1, 2, 3 and 8 are general, and may be applied separately.

The current version does not introduce ABI changes compared to RFC3.
Kernel and GLIBC trees:
https://github.com/norov/linux/tree/ilp32-2017-01-09
https://github.com/norov/glibc/tree/dev9

(GLIBC patches are managed by Steve Ellcey, so my tree is only for
reference.)

Changes:
v3: https://lkml.org/lkml/2014/9/3/704
v4: https://lkml.org/lkml/2015/4/13/691
v5: https://lkml.org/lkml/2015/9/29/911
v6: https://lkml.org/lkml/2016/5/23/661
v7: RFC nowrap:  https://lkml.org/lkml/2016/6/17/990
v7: RFC2 nowrap: https://lkml.org/lkml/2016/8/17/245
v7: RFC3 nowrap: https://lkml.org/lkml/2016/10/21/883
v7: - 32-bit off_t deprecation is split for compat
      and native 32-bit arches, as it was initially
      done (patches 1, 2);
    - getrlimit() and setrlimit() syscalls are deprecated for
      aarch64/ilp32 and all new architectures;
    - documentation is cleaned up (patch 4);
    - compat-related definitions are moved from
      aarch64/include/elf.h to binfmt_elf32.c (patch 11);
    - for ptrace, execution mode detection is performed
      at runtime, as it was in v4 (patch 18).

Andrew Pinski (6):
  arm64: rename COMPAT to AARCH32_EL0 in Kconfig
  arm64: ensure the kernel is compiled for LP64
  arm64:uapi: set __BITS_PER_LONG correctly for ILP32 and LP64
  arm64: ilp32: add sys_ilp32.c and a separate table (in entry.S) to use
    it
  arm64: ilp32: introduce ilp32-specific handlers for sigframe and
    ucontext
  arm64:ilp32: add ARM64_ILP32 to Kconfig

Philipp Tomsich (1):
  arm64:ilp32: add vdso-ilp32 and use for signal return

Yury Norov (13):
  compat ABI: use non-compat openat and open_by_handle_at variants
  32-bit ABI: introduce ARCH_32BIT_OFF_T config option
  asm-generic: Drop getrlimit and setrlimit syscalls from default list
  arm64: ilp32: add documentation on the ILP32 ABI for ARM64
  thread: move thread bits accessors to separated file
  arm64: introduce is_a32_task and is_a32_thread (for AArch32 compat)
  arm64: ilp32: add is_ilp32_compat_{task,thread} and TIF_32BIT_AARCH64
  arm64: introduce binfmt_elf32.c
  arm64: ilp32: introduce binfmt_ilp32.c
  arm64: ilp32: share aarch32 syscall handlers
  arm64: signal: share lp64 signal routines to ilp32
  arm64: signal32: move ilp32 and aarch32 common code to separated file
  arm64: ptrace: handle ptrace_request differently for aarch32 and ilp32

 Documentation/arm64/ilp32.txt             | 45 +++
 arch/Kconfig                              |  4 +
 arch/arc/Kconfig                          |  1 +
 arch/arc/include/uapi/asm/unistd.h        |  1 +
 arch/arm/Kconfig                          |  1 +
 arch/arm64/Kconfig                        | 19 ++-
 arch/arm64/Makefile                       |  5 +
 arch/arm64/include/asm/compat.h           | 19 +--
 arch/arm64/include/asm/elf.h              | 32 ++---
 arch/arm64/include/asm/fpsimd.h           |  2 +-
 arch/arm64/include/asm/ftrace.h           |  2 +-
 arch/arm64/include/asm/hwcap.h            |  6 +-
 arch/arm64/include/asm/is_compat.h        | 90 ++
 arch/arm64/include/asm/memory.h           |  5 +-
 arch/arm64/include/asm/processor.h        | 11 +-
 arch/arm64/include/asm/ptrace.h           |  2 +-
 arch/arm64/include/asm/seccomp.h          |  2 +-
 arch/arm64/include/asm/signal32.h         |  9 +-
 arch/arm64/include/asm/signal32_common.h  | 27
 arch/arm64/include/asm/signal_common.h    | 33 +
 arch/arm64/include/asm/signal_ilp32.h     | 38 ++
 arch/arm64/include/asm/syscall.h          |  2 +-
 arch/arm64/include/asm/thread_info.h      |  4 +-
 arch/arm64/include/asm/unistd.h           |  8 +-
 arch/arm64/include/asm/vdso.h             |  6 +
 arch/arm64/include/uapi/asm/bitsperlong.h |  9 +-
 arch/arm64/include/uapi/asm/unistd.h      | 13 ++
 arch/arm64/kernel/Makefile                | 18 ++-
 arch/arm64/kernel/asm-offsets.c           |  9 +-
 arch/arm64/kernel/binfmt_elf32.c          | 32 +
 arch/arm64/kernel/binfmt_ilp32.c          | 98 +++
 arch/arm64/kernel/cpufeature.c            |  8 +-