Re: [PATCH v5 05/10] powerpc/smp: Dont assume l2-cache to be superset of sibling

2020-09-12 Thread Michael Ellerman
Srikar Dronamraju  writes:
> * Michael Ellerman  [2020-09-11 21:55:23]:
>
>> Srikar Dronamraju  writes:
>> > Current code assumes that cpumask of cpus sharing a l2-cache mask will
>> > always be a superset of cpu_sibling_mask.
>> >
>> > Lets stop that assumption. cpu_l2_cache_mask is a superset of
>> > cpu_sibling_mask if and only if shared_caches is set.
>> 
>> I'm seeing oopses with this:
>> 
>> [0.117392][T1] smp: Bringing up secondary CPUs ...
>> [0.156515][T1] smp: Brought up 2 nodes, 2 CPUs
>> [0.158265][T1] numa: Node 0 CPUs: 0
>> [0.158520][T1] numa: Node 1 CPUs: 1
>> [0.167453][T1] BUG: Unable to handle kernel data access on read at 
>> 0x800041228298
>> [0.167992][T1] Faulting instruction address: 0xc018c128
>> [0.168817][T1] Oops: Kernel access of bad area, sig: 11 [#1]
>> [0.168964][T1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA 
>> pSeries
>> [0.169417][T1] Modules linked in:
>> [0.170047][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
>> 5.9.0-rc2-00095-g7430ad5aa700 #209
>> [0.170305][T1] NIP:  c018c128 LR: c018c0cc CTR: 
>> c004dce0
>> [0.170498][T1] REGS: c0007e343880 TRAP: 0380   Not tainted  
>> (5.9.0-rc2-00095-g7430ad5aa700)
>> [0.170602][T1] MSR:  82009033   
>> CR: 4400  XER: 
>> [0.170985][T1] CFAR: c018c288 IRQMASK: 0
>> [0.170985][T1] GPR00:  c0007e343b10 
>> c173e400 4000
>> [0.170985][T1] GPR04:  0800 
>> 0800 
>> [0.170985][T1] GPR08:  c122c298 
>> c0003fffc000 c0007fd05ce8
>> [0.170985][T1] GPR12: c0007e0119f8 c193 
>> 8ade 
>> [0.170985][T1] GPR16: c0007e3c0640 0917 
>> c0007e3c0658 0008
>> [0.170985][T1] GPR20: c15d0bb8 8ade 
>> c0f57400 c1817c28
>> [0.170985][T1] GPR24: c176dc80 c0007e3c0890 
>> c0007e3cfe00 
>> [0.170985][T1] GPR28: c1772310 c0007e011900 
>> c0007e3c0800 0001
>> [0.172750][T1] NIP [c018c128] 
>> build_sched_domains+0x808/0x14b0
>> [0.172900][T1] LR [c018c0cc] build_sched_domains+0x7ac/0x14b0
>> [0.173186][T1] Call Trace:
>> [0.173484][T1] [c0007e343b10] [c018bfe8] 
>> build_sched_domains+0x6c8/0x14b0 (unreliable)
>> [0.173821][T1] [c0007e343c50] [c018dcdc] 
>> sched_init_domains+0xec/0x130
>> [0.174037][T1] [c0007e343ca0] [c10d59d8] 
>> sched_init_smp+0x50/0xc4
>> [0.174207][T1] [c0007e343cd0] [c10b45c4] 
>> kernel_init_freeable+0x1b4/0x378
>> [0.174378][T1] [c0007e343db0] [c00129fc] 
>> kernel_init+0x24/0x158
>> [0.174740][T1] [c0007e343e20] [c000d9d0] 
>> ret_from_kernel_thread+0x5c/0x6c
>> [0.175050][T1] Instruction dump:
>> [0.175626][T1] 554905ee 71480040 7d2907b4 4182016c 2c29 3920006e 
>> 913e002c 41820034
>> [0.175841][T1] 7c6307b4 e9300020 78631f24 7d58182a <7d2a482a> 
>> f93e0080 7d404828 314a0001
>> [0.178340][T1] ---[ end trace 6876b88dd1d4b3bb ]---
>> [0.178512][T1]
>> [1.180458][T1] Kernel panic - not syncing: Attempted to kill init! 
>> exitcode=0x000b
>> 
>> That's qemu:
>> 
>> qemu-system-ppc64 -nographic -vga none -M pseries -cpu POWER8 \
>>   -kernel build~/vmlinux \
>>   -m 2G,slots=2,maxmem=4G \
>>   -object memory-backend-ram,size=1G,id=m0 \
>>   -object memory-backend-ram,size=1G,id=m1 \
>>   -numa node,nodeid=0,memdev=m0 \
>>   -numa node,nodeid=1,memdev=m1 \
>>   -smp 2,sockets=2,maxcpus=2  \
>> 
>
> Thanks Michael for the report and also for identifying the patch and also
> giving an easy reproducer. That made my task easy. (My only problem was all
> my PowerKVM hosts had a old compiler that refuse to compile never kernels.)
>
> So in this setup, CPU doesn't have a l2-cache. And in that scenario, we
> miss updating the l2-cache domain. Actually the initial patch had this
> exact code. However it was my mistake. I should have reassessed it before
> making changes suggested by Gautham.
>
> Patch below. Do let me know if you want me to send the patch separately.

>> On mambo I get:
>> 
>> [0.005069][T1] smp: Bringing up secondary CPUs ...
>> [0.011656][T1] smp: Brought up 2 nodes, 8 CPUs
>> [0.011682][T1] numa: Node 0 CPUs: 0-3
>> [0.011709][T1] numa: Node 1 CPUs: 4-7
>> [0.012015][T1] BUG: arch topology borken
>> [0.012040][T1]  the SMT domain not a subset of the CACHE domain
>> [0.012107][T1] BUG: Unable to handle kernel data access on read at 
>> 0x8001012e7398
>> [0.012142][T1] Faulting instruction address: 

Re: [PATCH v5 20/23] powerpc/book3s64/hash/kuap: Enable kuap on hash

2020-09-12 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/mm/book3s64/pkeys.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/mm/book3s64/pkeys.c 
> b/arch/powerpc/mm/book3s64/pkeys.c
> index 391230f93da2..16ea0b2f0ea5 100644
> --- a/arch/powerpc/mm/book3s64/pkeys.c
> +++ b/arch/powerpc/mm/book3s64/pkeys.c
> @@ -257,7 +257,12 @@ void __init setup_kuep(bool disabled)
>  #ifdef CONFIG_PPC_KUAP
>  void __init setup_kuap(bool disabled)
>  {
> - if (disabled || !early_radix_enabled())
> + if (disabled)
> + return;
> + /*
> +  * On hash if PKEY feature is not enabled, disable KUAP too.
> +  */
> + if (!early_radix_enabled() && !early_mmu_has_feature(MMU_FTR_PKEY))
>   return;
>  
>   if (smp_processor_id() == boot_cpuid) {

I'm seeing userspace crashes, bisect points at this commit. But
obviously this is just the commit that enables the code.

It seems to be TM related, a lot of the TM selftests are failing with
seemingly random segfaults.

eg:

  ./tm-tmspr
  test: tm_tmspr
  tags: git_version:v5.9-rc2-146-g928c664a7f4e
  !! child died by signal 11
  failure: tm_tmspr

And this warning fires:

  [  308.825455] [ cut here ]
  [  308.825485] WARNING: CPU: 5 PID: 105478 at 
arch/powerpc/include/asm/book3s/64/kup.h:287 syscall_exit_prepare+0x350/0x390
  [  308.825495] Modules linked in: iptable_mangle xt_MASQUERADE iptable_nat 
nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT 
nf_reject_ipv4 xt_tcpudp tun bridge stp llc ip6table_filter ip6_tables 
iptable_filter fuse kvm_hv kvm binfmt_misc squashfs mlx4_ib ib_uverbs 
dm_multipath scsi_dh_rdac scsi_dh_alua ib_core mlx4_en bnx2x lpfc mlx4_core 
crc_t10dif sr_mod crct10dif_generic cdrom vmx_crypto scsi_transport_fc sg mdio 
gf128mul crct10dif_vpmsum crct10dif_common powernv_rng crc32c_vpmsum rng_core 
leds_powernv powernv_op_panel led_class sunrpc ip_tables x_tables autofs4
  [  308.825596] CPU: 5 PID: 105478 Comm: tm-tmspr Tainted: P  
5.9.0-rc2-00146-g928c664a7f4e #1
  [  308.825607] NIP:  c0038640 LR: c000ddcc CTR: 
c03c32d0
  [  308.825616] REGS: c01e41877b00 TRAP: 0700   Tainted: P 
  (5.9.0-rc2-00146-g928c664a7f4e)
  [  308.825625] MSR:  90010282b033 
  CR: 44004484  XER: 2000
  [  308.825649] CFAR: c0038358 IRQMASK: 0 
 GPR00: c000ddcc c01e41877da0 c143f000 
 
 GPR04: c01e41877e80  7ffec6ed 
0008 
 GPR08: 0002 3cff fcff 
f5638000 
 GPR12: 4400 c007a680  
77ff 
 GPR16: 77f54410 77f50320 7ffec6eff240 
77f54420 
 GPR20: 04ba 000116c0 000100034840 
 
 GPR24:  7ffec6eff8f0  
7ffec6efea80 
 GPR28: 738f c01e2e05d600 7ffec6eff180 
c01e41877e80 
  [  308.825724] NIP [c0038640] syscall_exit_prepare+0x350/0x390
  [  308.825734] LR [c000ddcc] system_call_common+0xfc/0x27c
  [  308.825741] Call Trace:
  [  308.825751] [c01e41877da0] [c01e41877e10] 0xc01e41877e10 
(unreliable)
  [  308.825763] [c01e41877e10] [c000ddcc] 
system_call_common+0xfc/0x27c
  [  308.825771] Instruction dump:
  [  308.825778] 713c0800 4082004c f87f0018 395d0080 39001800 7ce050a8 7ce74078 
7ce051ad 
  [  308.825794] 40c2fff4 4bfffd54 6000 6000 <0fe0> 4bfffd18 
6000 6000 
  [  308.825811] ---[ end trace 57dae589ab37c54c ]---


That's on power8, both bare metal and LPAR.

cheers


[PATCH 15/15] selftests/seccomp: Use __NR_mknodat instead of __NR_mknod

2020-09-12 Thread Kees Cook
The __NR_mknod syscall doesn't exist on arm64 (only __NR_mknodat).
Switch to the modern syscall.

Fixes: ad5682184a81 ("selftests/seccomp: Check for EPOLLHUP for user_notif")
Signed-off-by: Kees Cook 
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c 
b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 26c712c6a575..b34ede28f314 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -3773,7 +3773,7 @@ TEST(user_notification_filter_empty)
if (pid == 0) {
int listener;
 
-   listener = user_notif_syscall(__NR_mknod, 
SECCOMP_FILTER_FLAG_NEW_LISTENER);
+   listener = user_notif_syscall(__NR_mknodat, 
SECCOMP_FILTER_FLAG_NEW_LISTENER);
if (listener < 0)
_exit(EXIT_FAILURE);
 
-- 
2.25.1



[PATCH 14/15] selftests/clone3: Avoid OS-defined clone_args

2020-09-12 Thread Kees Cook
As the UAPI headers start to appear in distros, we need to avoid
outdated versions of struct clone_args to be able to test modern
features. Additionally pull in the syscall numbers correctly.

Signed-off-by: Kees Cook 
---
I needed to fix this to get MIPS to build the seccomp selftests.
---
 .../testing/selftests/clone3/clone3_selftests.h  | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/clone3/clone3_selftests.h 
b/tools/testing/selftests/clone3/clone3_selftests.h
index 91c1a78ddb39..bc0f34e37ae1 100644
--- a/tools/testing/selftests/clone3/clone3_selftests.h
+++ b/tools/testing/selftests/clone3/clone3_selftests.h
@@ -4,11 +4,19 @@
 #define _CLONE3_SELFTESTS_H
 
 #define _GNU_SOURCE
+
+/* Pull in syscall numbers. */
+#include 
+#include 
+
+/* Avoid old OS versions of "struct clone_args". */
+#define clone_args old_clone_args
 #include 
 #include 
+#undef clone_args
+
 #include 
 #include 
-#include 
 #include 
 
 #include "../kselftest.h"
@@ -25,6 +33,7 @@
 
 #ifndef __NR_clone3
 #define __NR_clone3 -1
+#endif
 struct clone_args {
__aligned_u64 flags;
__aligned_u64 pidfd;
@@ -34,13 +43,16 @@ struct clone_args {
__aligned_u64 stack;
__aligned_u64 stack_size;
__aligned_u64 tls;
+#ifndef CLONE_ARGS_SIZE_VER1
 #define CLONE_ARGS_SIZE_VER1 80
+#endif
__aligned_u64 set_tid;
__aligned_u64 set_tid_size;
+#ifndef CLONE_ARGS_SIZE_VER2
 #define CLONE_ARGS_SIZE_VER2 88
+#endif
__aligned_u64 cgroup;
 };
-#endif /* __NR_clone3 */
 
 static pid_t sys_clone3(struct clone_args *args, size_t size)
 {
-- 
2.25.1



[PATCH 11/15] selftests/seccomp: Remove SYSCALL_NUM_RET_SHARE_REG in favor of SYSCALL_RET_SET

2020-09-12 Thread Kees Cook
Instead of special-casing the specific case of shared registers, create
a default SYSCALL_RET_SET() macro (mirroring SYSCALL_NUM_SET()), that
writes to the SYSCALL_RET register. For architectures that can't set the
return value (for whatever reason), they can define SYSCALL_RET_SET()
without an associated SYSCALL_RET() macro. This also paves the way for
architectures that need to do special things to set the return value
(e.g. powerpc).

Signed-off-by: Kees Cook 
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 33 +--
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c 
b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 2790d9cd50f4..623953a53032 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1753,8 +1753,8 @@ TEST_F(TRACE_poke, getpid_runs_normally)
 #elif defined(__s390__)
 # define ARCH_REGS s390_regs
 # define SYSCALL_NUM(_regs)(_regs).gprs[2]
-# define SYSCALL_RET(_regs)(_regs).gprs[2]
-# define SYSCALL_NUM_RET_SHARE_REG
+# define SYSCALL_RET_SET(_regs, _val)  \
+   TH_LOG("Can't modify syscall return on this architecture")
 #elif defined(__mips__)
 # include 
 # include 
@@ -1776,8 +1776,8 @@ TEST_F(TRACE_poke, getpid_runs_normally)
else\
(_regs).regs[2] = _nr;  \
} while (0)
-# define SYSCALL_RET(_regs)(_regs).regs[2]
-# define SYSCALL_NUM_RET_SHARE_REG
+# define SYSCALL_RET_SET(_regs, _val)  \
+   TH_LOG("Can't modify syscall return on this architecture")
 #elif defined(__xtensa__)
 # define ARCH_REGS struct user_pt_regs
 # define SYSCALL_NUM(_regs)(_regs).syscall
@@ -1804,9 +1804,26 @@ TEST_F(TRACE_poke, getpid_runs_normally)
SYSCALL_NUM(_regs) = (_nr); \
} while (0)
 #endif
+/*
+ * Most architectures can change the syscall return value by just
+ * writing to the SYSCALL_RET register. This is the default if not
+ * defined above. If an architecture cannot set the return value
+ * (for example when the syscall and return value register is
+ * shared), report it with TH_LOG() in an arch-specific definition
+ * of SYSCALL_RET_SET() above, and leave SYSCALL_RET undefined.
+ */
+#if !defined(SYSCALL_RET) && !defined(SYSCALL_RET_SET)
+# error "One of SYSCALL_RET or SYSCALL_RET_SET is needed for this arch"
+#endif
+#ifndef SYSCALL_RET_SET
+# define SYSCALL_RET_SET(_regs, _val)  \
+   do {\
+   SYSCALL_RET(_regs) = (_val);\
+   } while (0)
+#endif
 
 /* When the syscall return can't be changed, stub out the tests for it. */
-#ifdef SYSCALL_NUM_RET_SHARE_REG
+#ifndef SYSCALL_RET
 # define EXPECT_SYSCALL_RETURN(val, action)EXPECT_EQ(-1, action)
 #else
 # define EXPECT_SYSCALL_RETURN(val, action)\
@@ -1870,11 +1887,7 @@ void change_syscall(struct __test_metadata *_metadata,
 
/* If syscall is skipped, change return value. */
if (syscall == -1)
-#ifdef SYSCALL_NUM_RET_SHARE_REG
-   TH_LOG("Can't modify syscall return on this architecture");
-#else
-   SYSCALL_RET(regs) = result;
-#endif
+   SYSCALL_RET_SET(regs, result);
 
/* Flush any register changes made. */
if (memcmp(, , sizeof(orig)) != 0)
-- 
2.25.1



[PATCH 13/15] selftests/seccomp: powerpc: Set syscall return during ptrace syscall exit

2020-09-12 Thread Kees Cook
Some archs (like ppc) only support changing the return code during
syscall exit when ptrace is used. As the syscall number might not
be available anymore during syscall exit, it needs to be saved
during syscall enter. Adjust the ptrace tests to do this.

Reported-by: Thadeu Lima de Souza Cascardo 
Suggested-by: Thadeu Lima de Souza Cascardo 
Link: 
https://lore.kernel.org/linux-kselftest/20200911181012.171027-1-casca...@canonical.com/
Fixes: 58d0a862f573 ("seccomp: add tests for ptrace hole")
Signed-off-by: Kees Cook 
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 34 +++
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c 
b/tools/testing/selftests/seccomp/seccomp_bpf.c
index bbab2420d708..26c712c6a575 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1949,12 +1949,19 @@ void tracer_seccomp(struct __test_metadata *_metadata, 
pid_t tracee,
 
 }
 
+FIXTURE(TRACE_syscall) {
+   struct sock_fprog prog;
+   pid_t tracer, mytid, mypid, parent;
+   long syscall_nr;
+};
+
 void tracer_ptrace(struct __test_metadata *_metadata, pid_t tracee,
   int status, void *args)
 {
-   int ret, nr;
+   int ret;
unsigned long msg;
static bool entry;
+   FIXTURE_DATA(TRACE_syscall) *self = args;
 
/*
 * The traditional way to tell PTRACE_SYSCALL entry/exit
@@ -1968,24 +1975,23 @@ void tracer_ptrace(struct __test_metadata *_metadata, 
pid_t tracee,
EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY
: PTRACE_EVENTMSG_SYSCALL_EXIT, msg);
 
-   if (!entry)
-   return;
-
-   nr = get_syscall(_metadata, tracee);
+   /*
+* Some architectures only support setting return values during
+* syscall exit under ptrace, and on exit the syscall number may
+* no longer be available. Therefore, save it here, and call
+* "change syscall and set return values" on both entry and exit.
+*/
+   if (entry)
+   self->syscall_nr = get_syscall(_metadata, tracee);
 
-   if (nr == __NR_getpid)
+   if (self->syscall_nr == __NR_getpid)
change_syscall(_metadata, tracee, __NR_getppid, 0);
-   if (nr == __NR_gettid)
+   if (self->syscall_nr == __NR_gettid)
change_syscall(_metadata, tracee, -1, 45000);
-   if (nr == __NR_openat)
+   if (self->syscall_nr == __NR_openat)
change_syscall(_metadata, tracee, -1, -ESRCH);
 }
 
-FIXTURE(TRACE_syscall) {
-   struct sock_fprog prog;
-   pid_t tracer, mytid, mypid, parent;
-};
-
 FIXTURE_VARIANT(TRACE_syscall) {
/*
 * All of the SECCOMP_RET_TRACE behaviors can be tested with either
@@ -2044,7 +2050,7 @@ FIXTURE_SETUP(TRACE_syscall)
self->tracer = setup_trace_fixture(_metadata,
   variant->use_ptrace ? tracer_ptrace
   : tracer_seccomp,
-  NULL, variant->use_ptrace);
+  self, variant->use_ptrace);
 
ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
ASSERT_EQ(0, ret);
-- 
2.25.1



[PATCH 12/15] selftests/seccomp: powerpc: Fix seccomp return value testing

2020-09-12 Thread Kees Cook
On powerpc, the errno is not inverted, and depends on ccr.so being
set. Add this to a powerpc definition of SYSCALL_RET_SET().

Co-developed-by: Thadeu Lima de Souza Cascardo 
Signed-off-by: Thadeu Lima de Souza Cascardo 
Link: 
https://lore.kernel.org/linux-kselftest/20200911181012.171027-1-casca...@canonical.com/
Fixes: 5d83c2b37d43 ("selftests/seccomp: Add powerpc support")
Signed-off-by: Kees Cook 
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c 
b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 623953a53032..bbab2420d708 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1750,6 +1750,21 @@ TEST_F(TRACE_poke, getpid_runs_normally)
 # define ARCH_REGS struct pt_regs
 # define SYSCALL_NUM(_regs)(_regs).gpr[0]
 # define SYSCALL_RET(_regs)(_regs).gpr[3]
+# define SYSCALL_RET_SET(_regs, _val)  \
+   do {\
+   typeof(_val) _result = (_val);  \
+   /*  \
+* A syscall error is signaled by CR0 SO bit\
+* and the code is stored as a positive value.  \
+*/ \
+   if (_result < 0) {  \
+   SYSCALL_RET(_regs) = -result;   \
+   (_regs).ccr |= 0x1000;  \
+   } else {\
+   SYSCALL_RET(_regs) = result;\
+   (_regs).ccr &= ~0x1000; \
+   }   \
+   } while (0)
 #elif defined(__s390__)
 # define ARCH_REGS s390_regs
 # define SYSCALL_NUM(_regs)(_regs).gprs[2]
-- 
2.25.1



[PATCH 10/15] selftests/seccomp: Avoid redundant register flushes

2020-09-12 Thread Kees Cook
When none of the registers have changed, don't flush them back. This can
happen if the architecture uses a non-register way to change the syscall
(e.g. arm64) , and a return value hasn't been written.

Signed-off-by: Kees Cook 
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c 
b/tools/testing/selftests/seccomp/seccomp_bpf.c
index d9346121b89b..2790d9cd50f4 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1859,11 +1859,12 @@ int get_syscall(struct __test_metadata *_metadata, 
pid_t tracee)
 void change_syscall(struct __test_metadata *_metadata,
pid_t tracee, int syscall, int result)
 {
-   ARCH_REGS regs;
+   ARCH_REGS orig, regs;
 
EXPECT_EQ(0, ARCH_GETREGS(regs)) {
return;
}
+   orig = regs;
 
SYSCALL_NUM_SET(regs, syscall);
 
@@ -1876,7 +1877,8 @@ void change_syscall(struct __test_metadata *_metadata,
 #endif
 
/* Flush any register changes made. */
-   EXPECT_EQ(0, ARCH_SETREGS(regs));
+   if (memcmp(, , sizeof(orig)) != 0)
+   EXPECT_EQ(0, ARCH_SETREGS(regs));
 }
 
 void tracer_seccomp(struct __test_metadata *_metadata, pid_t tracee,
-- 
2.25.1



[PATCH 09/15] selftests/seccomp: Convert REGSET calls into ARCH_GETREG/ARCH_SETREG

2020-09-12 Thread Kees Cook
Consolidate the REGSET logic into the new ARCH_GETREG() and
ARCH_SETREG() macros, avoiding more #ifdef code in function bodies.

Signed-off-by: Kees Cook 
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 42 +++
 1 file changed, 15 insertions(+), 27 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c 
b/tools/testing/selftests/seccomp/seccomp_bpf.c
index a986f2332327..d9346121b89b 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1828,26 +1828,29 @@ TEST_F(TRACE_poke, getpid_runs_normally)
 #if defined(__x86_64__) || defined(__i386__) || defined(__mips__)
 # define ARCH_GETREGS(_regs)   ptrace(PTRACE_GETREGS, tracee, 0, &(_regs))
 # define ARCH_SETREGS(_regs)   ptrace(PTRACE_SETREGS, tracee, 0, &(_regs))
+#else
+# define ARCH_GETREGS(_regs)   ({  \
+   struct iovec __v;   \
+   __v.iov_base = &(_regs);\
+   __v.iov_len = sizeof(_regs);\
+   ptrace(PTRACE_GETREGSET, tracee, NT_PRSTATUS, &__v);\
+   })
+# define ARCH_SETREGS(_regs)   ({  \
+   struct iovec __v;   \
+   __v.iov_base = &(_regs);\
+   __v.iov_len = sizeof(_regs);\
+   ptrace(PTRACE_SETREGSET, tracee, NT_PRSTATUS, &__v);\
+   })
 #endif
 
 /* Architecture-specific syscall fetching routine. */
 int get_syscall(struct __test_metadata *_metadata, pid_t tracee)
 {
ARCH_REGS regs;
-#ifdef ARCH_GETREGS
-   EXPECT_EQ(0, ARCH_GETREGS(regs)) {
-   return -1;
-   }
-#else
-   struct iovec iov;
 
-   iov.iov_base = 
-   iov.iov_len = sizeof(regs);
-   EXPECT_EQ(0, ptrace(PTRACE_GETREGSET, tracee, NT_PRSTATUS, )) {
-   TH_LOG("PTRACE_GETREGSET failed");
+   EXPECT_EQ(0, ARCH_GETREGS(regs)) {
return -1;
}
-#endif
 
return SYSCALL_NUM(regs);
 }
@@ -1857,18 +1860,10 @@ void change_syscall(struct __test_metadata *_metadata,
pid_t tracee, int syscall, int result)
 {
ARCH_REGS regs;
-#ifdef ARCH_GETREGS
+
EXPECT_EQ(0, ARCH_GETREGS(regs)) {
return;
}
-#else
-   int ret;
-   struct iovec iov;
-   iov.iov_base = 
-   iov.iov_len = sizeof(regs);
-   ret = ptrace(PTRACE_GETREGSET, tracee, NT_PRSTATUS, );
-   EXPECT_EQ(0, ret);
-#endif
 
SYSCALL_NUM_SET(regs, syscall);
 
@@ -1881,14 +1876,7 @@ void change_syscall(struct __test_metadata *_metadata,
 #endif
 
/* Flush any register changes made. */
-#ifdef ARCH_SETREGS
EXPECT_EQ(0, ARCH_SETREGS(regs));
-#else
-   iov.iov_base = 
-   iov.iov_len = sizeof(regs);
-   ret = ptrace(PTRACE_SETREGSET, tracee, NT_PRSTATUS, );
-   EXPECT_EQ(0, ret);
-#endif
 }
 
 void tracer_seccomp(struct __test_metadata *_metadata, pid_t tracee,
-- 
2.25.1



[PATCH 08/15] selftests/seccomp: Convert HAVE_GETREG into ARCH_GETREG/ARCH_SETREG

2020-09-12 Thread Kees Cook
Instead of special-casing the get/set-registers routines, move the
HAVE_GETREG logic into the new ARCH_GETREG() and ARCH_SETREG() macros.

Signed-off-by: Kees Cook 
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 27 ++-
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c 
b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 3b77bdbe7125..a986f2332327 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1821,20 +1821,21 @@ TEST_F(TRACE_poke, getpid_runs_normally)
} while (0)
 #endif
 
-/* Use PTRACE_GETREGS and PTRACE_SETREGS when available. This is useful for
+/*
+ * Use PTRACE_GETREGS and PTRACE_SETREGS when available. This is useful for
  * architectures without HAVE_ARCH_TRACEHOOK (e.g. User-mode Linux).
  */
 #if defined(__x86_64__) || defined(__i386__) || defined(__mips__)
-#define HAVE_GETREGS
+# define ARCH_GETREGS(_regs)   ptrace(PTRACE_GETREGS, tracee, 0, &(_regs))
+# define ARCH_SETREGS(_regs)   ptrace(PTRACE_SETREGS, tracee, 0, &(_regs))
 #endif
 
 /* Architecture-specific syscall fetching routine. */
 int get_syscall(struct __test_metadata *_metadata, pid_t tracee)
 {
ARCH_REGS regs;
-#ifdef HAVE_GETREGS
-   EXPECT_EQ(0, ptrace(PTRACE_GETREGS, tracee, 0, )) {
-   TH_LOG("PTRACE_GETREGS failed");
+#ifdef ARCH_GETREGS
+   EXPECT_EQ(0, ARCH_GETREGS(regs)) {
return -1;
}
 #else
@@ -1855,17 +1856,19 @@ int get_syscall(struct __test_metadata *_metadata, 
pid_t tracee)
 void change_syscall(struct __test_metadata *_metadata,
pid_t tracee, int syscall, int result)
 {
-   int ret;
ARCH_REGS regs;
-#ifdef HAVE_GETREGS
-   ret = ptrace(PTRACE_GETREGS, tracee, 0, );
+#ifdef ARCH_GETREGS
+   EXPECT_EQ(0, ARCH_GETREGS(regs)) {
+   return;
+   }
 #else
+   int ret;
struct iovec iov;
iov.iov_base = 
iov.iov_len = sizeof(regs);
ret = ptrace(PTRACE_GETREGSET, tracee, NT_PRSTATUS, );
-#endif
EXPECT_EQ(0, ret);
+#endif
 
SYSCALL_NUM_SET(regs, syscall);
 
@@ -1878,14 +1881,14 @@ void change_syscall(struct __test_metadata *_metadata,
 #endif
 
/* Flush any register changes made. */
-#ifdef HAVE_GETREGS
-   ret = ptrace(PTRACE_SETREGS, tracee, 0, );
+#ifdef ARCH_SETREGS
+   EXPECT_EQ(0, ARCH_SETREGS(regs));
 #else
iov.iov_base = 
iov.iov_len = sizeof(regs);
ret = ptrace(PTRACE_SETREGSET, tracee, NT_PRSTATUS, );
-#endif
EXPECT_EQ(0, ret);
+#endif
 }
 
 void tracer_seccomp(struct __test_metadata *_metadata, pid_t tracee,
-- 
2.25.1



[PATCH 07/15] selftests/seccomp: Remove syscall setting #ifdefs

2020-09-12 Thread Kees Cook
With all architectures now using the common SYSCALL_NUM_SET() macro, the
arch-specific #ifdef can be removed from change_syscall() itself.

Signed-off-by: Kees Cook 
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 16 +++-
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c 
b/tools/testing/selftests/seccomp/seccomp_bpf.c
index aa1c224371d1..3b77bdbe7125 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1865,20 +1865,9 @@ void change_syscall(struct __test_metadata *_metadata,
iov.iov_len = sizeof(regs);
ret = ptrace(PTRACE_GETREGSET, tracee, NT_PRSTATUS, );
 #endif
-   EXPECT_EQ(0, ret) {}
+   EXPECT_EQ(0, ret);
 
-#if defined(__x86_64__) || defined(__i386__) || defined(__powerpc__) || \
-   defined(__s390__) || defined(__hppa__) || defined(__riscv) || \
-   defined(__xtensa__) || defined(__csky__) || defined(__sh__) || \
-   defined(__mips__) || defined(__arm__) || defined(__aarch64__)
-   {
-   SYSCALL_NUM_SET(regs, syscall);
-   }
-#else
-   ASSERT_EQ(1, 0) {
-   TH_LOG("How is the syscall changed on this architecture?");
-   }
-#endif
+   SYSCALL_NUM_SET(regs, syscall);
 
/* If syscall is skipped, change return value. */
if (syscall == -1)
@@ -1888,6 +1877,7 @@ void change_syscall(struct __test_metadata *_metadata,
SYSCALL_RET(regs) = result;
 #endif
 
+   /* Flush any register changes made. */
 #ifdef HAVE_GETREGS
ret = ptrace(PTRACE_SETREGS, tracee, 0, );
 #else
-- 
2.25.1



[PATCH 06/15] selftests/seccomp: mips: Remove O32-specific macro

2020-09-12 Thread Kees Cook
Instead of having the mips O32 macro special-cased, pull the logic into
the SYSCALL_NUM() macro. Additionally include the ABI headers, since
these appear to have been missing, leaving __NR_O32_Linux undefined.

Signed-off-by: Kees Cook 
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c 
b/tools/testing/selftests/seccomp/seccomp_bpf.c
index cfa606d96086..aa1c224371d1 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1756,9 +1756,19 @@ TEST_F(TRACE_poke, getpid_runs_normally)
 # define SYSCALL_RET(_regs)(_regs).gprs[2]
 # define SYSCALL_NUM_RET_SHARE_REG
 #elif defined(__mips__)
+# include 
+# include 
+# include 
 # define ARCH_REGS struct pt_regs
-# define SYSCALL_NUM(_regs)(_regs).regs[2]
-# define SYSCALL_SYSCALL_NUM   regs[4]
+# define SYSCALL_NUM(_regs)\
+   ({  \
+   typeof((_regs).regs[2]) _nr;\
+   if ((_regs).regs[2] == __NR_O32_Linux)  \
+   _nr = (_regs).regs[4];  \
+   else\
+   _nr = (_regs).regs[2];  \
+   _nr;\
+   })
 # define SYSCALL_NUM_SET(_regs, _nr)   \
do {\
if ((_regs).regs[2] == __NR_O32_Linux)  \
@@ -1838,10 +1848,6 @@ int get_syscall(struct __test_metadata *_metadata, pid_t 
tracee)
}
 #endif
 
-#if defined(__mips__)
-   if (SYSCALL_NUM(regs) == __NR_O32_Linux)
-   return regs.SYSCALL_SYSCALL_NUM;
-#endif
return SYSCALL_NUM(regs);
 }
 
-- 
2.25.1



[PATCH 05/15] selftests/seccomp: arm64: Define SYSCALL_NUM_SET macro

2020-09-12 Thread Kees Cook
Remove the arm64 special-case in change_syscall().

Signed-off-by: Kees Cook 
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c 
b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 610fc036e374..cfa606d96086 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1717,6 +1717,18 @@ TEST_F(TRACE_poke, getpid_runs_normally)
 #elif defined(__aarch64__)
 # define ARCH_REGS struct user_pt_regs
 # define SYSCALL_NUM(_regs)(_regs).regs[8]
+# ifndef NT_ARM_SYSTEM_CALL
+#  define NT_ARM_SYSTEM_CALL 0x404
+# endif
+# define SYSCALL_NUM_SET(_regs, _nr)   \
+   do {\
+   struct iovec __v;   \
+   typeof(_nr) __nr = (_nr);   \
+   __v.iov_base = &__nr;   \
+   __v.iov_len = sizeof(__nr); \
+   EXPECT_EQ(0, ptrace(PTRACE_SETREGSET, tracee,   \
+   NT_ARM_SYSTEM_CALL, &__v)); \
+   } while (0)
 # define SYSCALL_RET(_regs)(_regs).regs[0]
 #elif defined(__riscv) && __riscv_xlen == 64
 # define ARCH_REGS struct user_regs_struct
@@ -1852,23 +1864,10 @@ void change_syscall(struct __test_metadata *_metadata,
 #if defined(__x86_64__) || defined(__i386__) || defined(__powerpc__) || \
defined(__s390__) || defined(__hppa__) || defined(__riscv) || \
defined(__xtensa__) || defined(__csky__) || defined(__sh__) || \
-   defined(__mips__) || defined(__arm__)
+   defined(__mips__) || defined(__arm__) || defined(__aarch64__)
{
SYSCALL_NUM_SET(regs, syscall);
}
-
-#elif defined(__aarch64__)
-# ifndef NT_ARM_SYSTEM_CALL
-#  define NT_ARM_SYSTEM_CALL 0x404
-# endif
-   {
-   iov.iov_base = 
-   iov.iov_len = sizeof(syscall);
-   ret = ptrace(PTRACE_SETREGSET, tracee, NT_ARM_SYSTEM_CALL,
-);
-   EXPECT_EQ(0, ret);
-   }
-
 #else
ASSERT_EQ(1, 0) {
TH_LOG("How is the syscall changed on this architecture?");
-- 
2.25.1



[PATCH 04/15] selftests/seccomp: arm: Define SYSCALL_NUM_SET macro

2020-09-12 Thread Kees Cook
Remove the arm special-case in change_syscall().

Signed-off-by: Kees Cook 
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c 
b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 02a9a6599746..610fc036e374 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1708,6 +1708,11 @@ TEST_F(TRACE_poke, getpid_runs_normally)
 #elif defined(__arm__)
 # define ARCH_REGS struct pt_regs
 # define SYSCALL_NUM(_regs)(_regs).ARM_r7
+# ifndef PTRACE_SET_SYSCALL
+#  define PTRACE_SET_SYSCALL   23
+# endif
+# define SYSCALL_NUM_SET(_regs, _nr)   \
+   EXPECT_EQ(0, ptrace(PTRACE_SET_SYSCALL, tracee, NULL, _nr))
 # define SYSCALL_RET(_regs)(_regs).ARM_r0
 #elif defined(__aarch64__)
 # define ARCH_REGS struct user_pt_regs
@@ -1847,20 +1852,11 @@ void change_syscall(struct __test_metadata *_metadata,
 #if defined(__x86_64__) || defined(__i386__) || defined(__powerpc__) || \
defined(__s390__) || defined(__hppa__) || defined(__riscv) || \
defined(__xtensa__) || defined(__csky__) || defined(__sh__) || \
-   defined(__mips__)
+   defined(__mips__) || defined(__arm__)
{
SYSCALL_NUM_SET(regs, syscall);
}
 
-#elif defined(__arm__)
-# ifndef PTRACE_SET_SYSCALL
-#  define PTRACE_SET_SYSCALL   23
-# endif
-   {
-   ret = ptrace(PTRACE_SET_SYSCALL, tracee, NULL, syscall);
-   EXPECT_EQ(0, ret);
-   }
-
 #elif defined(__aarch64__)
 # ifndef NT_ARM_SYSTEM_CALL
 #  define NT_ARM_SYSTEM_CALL 0x404
-- 
2.25.1



[PATCH 02/15] selftests/seccomp: Provide generic syscall setting macro

2020-09-12 Thread Kees Cook
In order to avoid "#ifdef"s in the main function bodies, create a new
macro, SYSCALL_NUM_SET(), where arch-specific logic can live.

Signed-off-by: Kees Cook 
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c 
b/tools/testing/selftests/seccomp/seccomp_bpf.c
index fef15080b575..1c83e743bfb1 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1760,6 +1760,17 @@ TEST_F(TRACE_poke, getpid_runs_normally)
 # error "Do not know how to find your architecture's registers and syscalls"
 #endif
 
+/*
+ * Most architectures can change the syscall by just updating the
+ * associated register. This is the default if not defined above.
+ */
+#ifndef SYSCALL_NUM_SET
+# define SYSCALL_NUM_SET(_regs, _nr)   \
+   do {\
+   SYSCALL_NUM(_regs) = (_nr); \
+   } while (0)
+#endif
+
 /* When the syscall return can't be changed, stub out the tests for it. */
 #ifdef SYSCALL_NUM_RET_SHARE_REG
 # define EXPECT_SYSCALL_RETURN(val, action)EXPECT_EQ(-1, action)
@@ -1830,14 +1841,14 @@ void change_syscall(struct __test_metadata *_metadata,
defined(__s390__) || defined(__hppa__) || defined(__riscv) || \
defined(__xtensa__) || defined(__csky__) || defined(__sh__)
{
-   SYSCALL_NUM(regs) = syscall;
+   SYSCALL_NUM_SET(regs, syscall);
}
 #elif defined(__mips__)
{
if (SYSCALL_NUM(regs) == __NR_O32_Linux)
regs.SYSCALL_SYSCALL_NUM = syscall;
else
-   SYSCALL_NUM(regs) = syscall;
+   SYSCALL_NUM_SET(regs, syscall);
}
 
 #elif defined(__arm__)
-- 
2.25.1



[PATCH 00/15] selftests/seccomp: Refactor change_syscall()

2020-09-12 Thread Kees Cook
Hi,

This refactors the seccomp selftest macros used in change_syscall(),
in an effort to remove special cases for mips, arm, arm64, and xtensa,
which paves the way for powerpc fixes.

I'm not entirely done testing, but all-arch build tests and x86_64
selftests pass. I'll be doing arm, arm64, and i386 selftests shortly,
but I currently don't have an easy way to check xtensa, mips, nor
powerpc. Any help there would be appreciated!

(FWIW, I expect to take these via the seccomp tree.)

Thanks,

-Kees


Kees Cook (15):
  selftests/seccomp: Refactor arch register macros to avoid xtensa
special case
  selftests/seccomp: Provide generic syscall setting macro
  selftests/seccomp: mips: Define SYSCALL_NUM_SET macro
  selftests/seccomp: arm: Define SYSCALL_NUM_SET macro
  selftests/seccomp: arm64: Define SYSCALL_NUM_SET macro
  selftests/seccomp: mips: Remove O32-specific macro
  selftests/seccomp: Remove syscall setting #ifdefs
  selftests/seccomp: Convert HAVE_GETREG into ARCH_GETREG/ARCH_SETREG
  selftests/seccomp: Convert REGSET calls into ARCH_GETREG/ARCH_SETREG
  selftests/seccomp: Avoid redundant register flushes
  selftests/seccomp: Remove SYSCALL_NUM_RET_SHARE_REG in favor of
SYSCALL_RET_SET
  selftests/seccomp: powerpc: Fix seccomp return value testing
  selftests/seccomp: powerpc: Set syscall return during ptrace syscall
exit
  selftests/clone3: Avoid OS-defined clone_args
  selftests/seccomp: Use __NR_mknodat instead of __NR_mknod

 .../selftests/clone3/clone3_selftests.h   |  16 +-
 tools/testing/selftests/seccomp/seccomp_bpf.c | 313 ++
 2 files changed, 184 insertions(+), 145 deletions(-)

-- 
2.25.1



[PATCH 01/15] selftests/seccomp: Refactor arch register macros to avoid xtensa special case

2020-09-12 Thread Kees Cook
To avoid an xtensa special-case, refactor all arch register macros to
take the register variable instead of depending on the macro expanding
as a struct member name.

Signed-off-by: Kees Cook 
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 97 +--
 1 file changed, 47 insertions(+), 50 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c 
b/tools/testing/selftests/seccomp/seccomp_bpf.c
index c5002fc25b00..fef15080b575 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1698,64 +1698,64 @@ TEST_F(TRACE_poke, getpid_runs_normally)
 }
 
 #if defined(__x86_64__)
-# define ARCH_REGS struct user_regs_struct
-# define SYSCALL_NUM   orig_rax
-# define SYSCALL_RET   rax
+# define ARCH_REGS struct user_regs_struct
+# define SYSCALL_NUM(_regs)(_regs).orig_rax
+# define SYSCALL_RET(_regs)(_regs).rax
 #elif defined(__i386__)
-# define ARCH_REGS struct user_regs_struct
-# define SYSCALL_NUM   orig_eax
-# define SYSCALL_RET   eax
+# define ARCH_REGS struct user_regs_struct
+# define SYSCALL_NUM(_regs)(_regs).orig_eax
+# define SYSCALL_RET(_regs)(_regs).eax
 #elif defined(__arm__)
-# define ARCH_REGS struct pt_regs
-# define SYSCALL_NUM   ARM_r7
-# define SYSCALL_RET   ARM_r0
+# define ARCH_REGS struct pt_regs
+# define SYSCALL_NUM(_regs)(_regs).ARM_r7
+# define SYSCALL_RET(_regs)(_regs).ARM_r0
 #elif defined(__aarch64__)
-# define ARCH_REGS struct user_pt_regs
-# define SYSCALL_NUM   regs[8]
-# define SYSCALL_RET   regs[0]
+# define ARCH_REGS struct user_pt_regs
+# define SYSCALL_NUM(_regs)(_regs).regs[8]
+# define SYSCALL_RET(_regs)(_regs).regs[0]
 #elif defined(__riscv) && __riscv_xlen == 64
-# define ARCH_REGS struct user_regs_struct
-# define SYSCALL_NUM   a7
-# define SYSCALL_RET   a0
+# define ARCH_REGS struct user_regs_struct
+# define SYSCALL_NUM(_regs)(_regs).a7
+# define SYSCALL_RET(_regs)(_regs).a0
 #elif defined(__csky__)
-# define ARCH_REGS struct pt_regs
-#if defined(__CSKYABIV2__)
-# define SYSCALL_NUM   regs[3]
-#else
-# define SYSCALL_NUM   regs[9]
-#endif
-# define SYSCALL_RET   a0
+# define ARCH_REGS struct pt_regs
+#  if defined(__CSKYABIV2__)
+#   define SYSCALL_NUM(_regs)  (_regs).regs[3]
+#  else
+#   define SYSCALL_NUM(_regs)  (_regs).regs[9]
+#  endif
+# define SYSCALL_RET(_regs)(_regs).a0
 #elif defined(__hppa__)
-# define ARCH_REGS struct user_regs_struct
-# define SYSCALL_NUM   gr[20]
-# define SYSCALL_RET   gr[28]
+# define ARCH_REGS struct user_regs_struct
+# define SYSCALL_NUM(_regs)(_regs).gr[20]
+# define SYSCALL_RET(_regs)(_regs).gr[28]
 #elif defined(__powerpc__)
-# define ARCH_REGS struct pt_regs
-# define SYSCALL_NUM   gpr[0]
-# define SYSCALL_RET   gpr[3]
+# define ARCH_REGS struct pt_regs
+# define SYSCALL_NUM(_regs)(_regs).gpr[0]
+# define SYSCALL_RET(_regs)(_regs).gpr[3]
 #elif defined(__s390__)
-# define ARCH_REGS s390_regs
-# define SYSCALL_NUM   gprs[2]
-# define SYSCALL_RET   gprs[2]
+# define ARCH_REGS s390_regs
+# define SYSCALL_NUM(_regs)(_regs).gprs[2]
+# define SYSCALL_RET(_regs)(_regs).gprs[2]
 # define SYSCALL_NUM_RET_SHARE_REG
 #elif defined(__mips__)
-# define ARCH_REGS struct pt_regs
-# define SYSCALL_NUM   regs[2]
-# define SYSCALL_SYSCALL_NUM regs[4]
-# define SYSCALL_RET   regs[2]
+# define ARCH_REGS struct pt_regs
+# define SYSCALL_NUM(_regs)(_regs).regs[2]
+# define SYSCALL_SYSCALL_NUM   regs[4]
+# define SYSCALL_RET(_regs)(_regs).regs[2]
 # define SYSCALL_NUM_RET_SHARE_REG
 #elif defined(__xtensa__)
-# define ARCH_REGS struct user_pt_regs
-# define SYSCALL_NUM   syscall
+# define ARCH_REGS struct user_pt_regs
+# define SYSCALL_NUM(_regs)(_regs).syscall
 /*
  * On xtensa syscall return value is in the register
  * a2 of the current window which is not fixed.
  */
-#define SYSCALL_RET(reg) a[(reg).windowbase * 4 + 2]
+#define SYSCALL_RET(_regs) (_regs).a[(_regs).windowbase * 4 + 2]
 #elif defined(__sh__)
-# define ARCH_REGS struct pt_regs
-# define SYSCALL_NUM   gpr[3]
-# define SYSCALL_RET   gpr[0]
+# define ARCH_REGS struct pt_regs
+# define SYSCALL_NUM(_regs)(_regs).gpr[3]
+# define SYSCALL_RET(_regs)(_regs).gpr[0]
 #else
 # error "Do not know how to find your architecture's registers and syscalls"
 #endif
@@ -1804,10 +1804,10 @@ int get_syscall(struct __test_metadata *_metadata, 
pid_t tracee)
 #endif
 
 #if defined(__mips__)
-   if (regs.SYSCALL_NUM == __NR_O32_Linux)
+   if (SYSCALL_NUM(regs) == __NR_O32_Linux)
return regs.SYSCALL_SYSCALL_NUM;
 #endif
-   return regs.SYSCALL_NUM;
+   return SYSCALL_NUM(regs);
 }
 
 /* Architecture-specific syscall changing routine. */
@@ -1830,14 +1830,14 @@ void change_syscall(struct __test_metadata *_metadata,

[PATCH 03/15] selftests/seccomp: mips: Define SYSCALL_NUM_SET macro

2020-09-12 Thread Kees Cook
Remove the mips special-case in change_syscall().

Signed-off-by: Kees Cook 
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c 
b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 1c83e743bfb1..02a9a6599746 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1742,6 +1742,13 @@ TEST_F(TRACE_poke, getpid_runs_normally)
 # define ARCH_REGS struct pt_regs
 # define SYSCALL_NUM(_regs)(_regs).regs[2]
 # define SYSCALL_SYSCALL_NUM   regs[4]
+# define SYSCALL_NUM_SET(_regs, _nr)   \
+   do {\
+   if ((_regs).regs[2] == __NR_O32_Linux)  \
+   (_regs).regs[4] = _nr;  \
+   else\
+   (_regs).regs[2] = _nr;  \
+   } while (0)
 # define SYSCALL_RET(_regs)(_regs).regs[2]
 # define SYSCALL_NUM_RET_SHARE_REG
 #elif defined(__xtensa__)
@@ -1839,17 +1846,11 @@ void change_syscall(struct __test_metadata *_metadata,
 
 #if defined(__x86_64__) || defined(__i386__) || defined(__powerpc__) || \
defined(__s390__) || defined(__hppa__) || defined(__riscv) || \
-   defined(__xtensa__) || defined(__csky__) || defined(__sh__)
+   defined(__xtensa__) || defined(__csky__) || defined(__sh__) || \
+   defined(__mips__)
{
SYSCALL_NUM_SET(regs, syscall);
}
-#elif defined(__mips__)
-   {
-   if (SYSCALL_NUM(regs) == __NR_O32_Linux)
-   regs.SYSCALL_SYSCALL_NUM = syscall;
-   else
-   SYSCALL_NUM_SET(regs, syscall);
-   }
 
 #elif defined(__arm__)
 # ifndef PTRACE_SET_SYSCALL
-- 
2.25.1



Re: [PATCH] powerpc/papr_scm: Fix warning triggered by perf_stats_show()

2020-09-12 Thread Vaibhav Jain
Ira Weiny  writes:

> On Sat, Sep 12, 2020 at 01:36:46AM +0530, Vaibhav Jain wrote:
>> Thanks for reviewing this patch Ira,
>> 
>> 
>> Ira Weiny  writes:
>> 
>> > On Thu, Sep 10, 2020 at 02:52:12PM +0530, Vaibhav Jain wrote:
>> >> A warning is reported by the kernel in case perf_stats_show() returns
>> >> an error code. The warning is of the form below:
>> >> 
>> >>  papr_scm ibm,persistent-memory:ibm,pmemory@4411:
>> >> Failed to query performance stats, Err:-10
>> >>  dev_attr_show: perf_stats_show+0x0/0x1c0 [papr_scm] returned bad count
>> >>  fill_read_buffer: dev_attr_show+0x0/0xb0 returned bad count
>> >> 
>> >> On investigation it looks like that the compiler is silently truncating 
>> >> the
>> >> return value of drc_pmem_query_stats() from 'long' to 'int', since the
>> >> variable used to store the return code 'rc' is an 'int'. This
>> >> truncated value is then returned back as a 'ssize_t' back from
>> >> perf_stats_show() to 'dev_attr_show()' which thinks of it as a large
>> >> unsigned number and triggers this warning..
>> >> 
>> >> To fix this we update the type of variable 'rc' from 'int' to
>> >> 'ssize_t' that prevents the compiler from truncating the return value
>> >> of drc_pmem_query_stats() and returning correct signed value back from
>> >> perf_stats_show().
>> >> 
>> >> Fixes: 2d02bf835e573 ('powerpc/papr_scm: Fetch nvdimm performance
>> >>stats from PHYP')
>> >> Signed-off-by: Vaibhav Jain 
>> >> ---
>> >>  arch/powerpc/platforms/pseries/papr_scm.c | 3 ++-
>> >>  1 file changed, 2 insertions(+), 1 deletion(-)
>> >> 
>> >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
>> >> b/arch/powerpc/platforms/pseries/papr_scm.c
>> >> index a88a707a608aa..9f00b61676ab9 100644
>> >> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> >> @@ -785,7 +785,8 @@ static int papr_scm_ndctl(struct 
>> >> nvdimm_bus_descriptor *nd_desc,
>> >>  static ssize_t perf_stats_show(struct device *dev,
>> >>  struct device_attribute *attr, char *buf)
>> >>  {
>> >> - int index, rc;
>> >> + int index;
>> >> + ssize_t rc;
>> >
>> > I'm not sure this is really fixing everything here.
>> 
>> The issue is with the statement in perf_stats_show():
>> 
>> 'return rc ? rc : seq_buf_used();'
>> 
>> The function seq_buf_used() returns an 'unsigned int' and with 'rc'
>> typed as 'int', forces a promotion of the expression to 'unsigned int'
>> which causes a loss of signedness of 'rc' and compiler silently
>> assigns this unsigned value to the function return typed as 'signed
>> long'.
>> 
>> Making 'rc', a 'signed long' forces a promotion of the expresion to
>> 'signed long' which preserves the signedness of 'rc' and will also be
>> compatible with the function return type.
>
> Ok, ok, I read this all wrong.
>
> FWIW I would also cast seq_buf_used() to ssize_t to show you know what you are
> doing there.
>
>> 
>> >
>> > drc_pmem_query_stats() can return negative errno's.  Why are those not 
>> > checked
>> > somewhere in perf_stats_show()?
>> >
>> For the specific invocation 'drc_pmem_query_stats(p, stats, 0)' we only
>> expect return value 'rc <=0' with '0' indicating a successful fetch of
>> nvdimm performance stats from hypervisor. Hence there are no explicit
>> checks for negative error codes in the functions as all return values
>> !=0 indicate an error.
>> 
>> 
>> > It seems like all this fix is handling is a > 0 return value: 'ret[0]' from
>> > line 289 in papr_scm.c...  Or something?
>> No, in case the arg 'num_stats' is '0' and 'buff_stats != NULL' the
>> variable 'size' is assigned a non-zero value hence that specific branch
>> you mentioned  is never taken. Instead in case of success
>> drc_pmem_query_stats() return '0' and in case of an error a negative
>> error code is returned.
>> 
>> >
>> > Worse yet drc_pmem_query_stats() is returning ssize_t which is a signed 
>> > value.
>> > Therefore, it should not be returning -errno.  I'm surprised the static
>> > checkers did not catch that.
>> Didnt quite get the assertion here. The function is marked to return
>> 'ssize_t' because we can return both +ve for success and -ve values to
>> indicate errors.
>
> Sorry I was reading this as size_t and meant to say unsigned...  I was looking
> at this too quickly.
>
>> 
>> >
>> > I believe I caught similar errors with a patch series before which did not 
>> > pay
>> > attention to variable types.
>> >
>> > Please audit this code for these types of errors and ensure you are really
>> > doing the correct thing when using the sysfs interface.  I'm pretty sure 
>> > bad
>> > things will eventually happen (if they are not already) if you return some
>> > really big number to the sysfs core from *_show().
>> I think this problem is different compared to what you had previously pointed
>> to. The values returned from drc_pmem_query_stats() can be stored in an
>> 'int' variable too, however it was the silent promotion of a signed 

[PATCH v2] powerpc/papr_scm: Fix warning triggered by perf_stats_show()

2020-09-12 Thread Vaibhav Jain
A warning is reported by the kernel in case perf_stats_show() returns
an error code. The warning is of the form below:

 papr_scm ibm,persistent-memory:ibm,pmemory@4411:
  Failed to query performance stats, Err:-10
 dev_attr_show: perf_stats_show+0x0/0x1c0 [papr_scm] returned bad count
 fill_read_buffer: dev_attr_show+0x0/0xb0 returned bad count

On investigation it looks like that the compiler is silently truncating the
return value of drc_pmem_query_stats() from 'long' to 'int', since the
variable used to store the return code 'rc' is an 'int'. This
truncated value is then returned back as a 'ssize_t' back from
perf_stats_show() to 'dev_attr_show()' which thinks of it as a large
unsigned number and triggers this warning..

To fix this we update the type of variable 'rc' from 'int' to
'ssize_t' that prevents the compiler from truncating the return value
of drc_pmem_query_stats() and returning correct signed value back from
perf_stats_show().

Fixes: 2d02bf835e573 ('powerpc/papr_scm: Fetch nvdimm performance
   stats from PHYP')
Signed-off-by: Vaibhav Jain 
---
Changelog:

v2: Added an explicit cast to the expression calling 'seq_buf_used()'
and triggering this issue. [ Ira ]
---
 arch/powerpc/platforms/pseries/papr_scm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index a88a707a608aa..5493bc847bd08 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -785,7 +785,8 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor 
*nd_desc,
 static ssize_t perf_stats_show(struct device *dev,
   struct device_attribute *attr, char *buf)
 {
-   int index, rc;
+   int index;
+   ssize_t rc;
struct seq_buf s;
struct papr_scm_perf_stat *stat;
struct papr_scm_perf_stats *stats;
@@ -820,7 +821,7 @@ static ssize_t perf_stats_show(struct device *dev,
 
 free_stats:
kfree(stats);
-   return rc ? rc : seq_buf_used();
+   return rc ? rc : (ssize_t)seq_buf_used();
 }
 DEVICE_ATTR_ADMIN_RO(perf_stats);
 
-- 
2.26.2