[GIT PULL] gcc-plugin updates for v4.20-rc6

2018-12-07 Thread Kees Cook
Hi Linus,

Please pull these gcc-plugin fixes for v4.20-rc6.

Thanks!

-Kees

The following changes since commit ef1a8409348966f0b25ff97a170d6d0367710ea9:

  stackleak: Disable function tracing and kprobes for stackleak_erase() 
(2018-11-30 09:05:07 -0800)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git 
tags/gcc-plugins-v4.20-rc6

for you to fetch changes up to 8fb2dfb228df785bbeb4d055a74402ef4b07fc25:

  stackleak: Register the 'stackleak_cleanup' pass before the '*free_cfg' pass 
(2018-12-06 09:10:23 -0800)


Fixes for stackleak

- Remove tracing for inserted stack depth marking function (Anders Roxell)
- Move gcc-plugin pass location to avoid objtool warnings (Alexander Popov)


Alexander Popov (1):
  stackleak: Register the 'stackleak_cleanup' pass before the '*free_cfg' 
pass

Anders Roxell (1):
  stackleak: Mark stackleak_track_stack() as notrace

 kernel/stackleak.c | 2 +-
 scripts/gcc-plugins/stackleak_plugin.c | 8 +---
 2 files changed, 6 insertions(+), 4 deletions(-)

-- 
Kees Cook


[PATCH] selftests/seccomp: Remove SIGSTOP si_pid check

2018-12-06 Thread Kees Cook
Commit f149b3155744 ("signal: Never allocate siginfo for SIGKILL or SIGSTOP")
means that the seccomp selftest cannot check si_pid under SIGSTOP anymore.
Since it's believed[1] there are no other userspace things depending on the
old behavior, this removes the behavioral check in the selftest, since it's
more a "extra" sanity check (which turns out, maybe, not to have been
useful to test).

[1] 
https://lkml.kernel.org/r/cagxu5jjazaozp1qfz66tyrtbuywqb+un2soa1vlhpccoiyv...@mail.gmail.com

Reported-by: Tycho Andersen 
Suggested-by: Eric W. Biederman 
Signed-off-by: Kees Cook 
---
Shuah, can you make sure that Linus gets this before v4.20 is released? Thanks!
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c 
b/tools/testing/selftests/seccomp/seccomp_bpf.c
index e1473234968d..c9a2abf8be1b 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -2731,9 +2731,14 @@ TEST(syscall_restart)
ASSERT_EQ(child_pid, waitpid(child_pid, , 0));
ASSERT_EQ(true, WIFSTOPPED(status));
ASSERT_EQ(SIGSTOP, WSTOPSIG(status));
-   /* Verify signal delivery came from parent now. */
ASSERT_EQ(0, ptrace(PTRACE_GETSIGINFO, child_pid, NULL, ));
-   EXPECT_EQ(getpid(), info.si_pid);
+   /*
+* There is no siginfo on SIGSTOP any more, so we can't verify
+* signal delivery came from parent now (getpid() == info.si_pid).
+* 
https://lkml.kernel.org/r/cagxu5jjazaozp1qfz66tyrtbuywqb+un2soa1vlhpccoiyv...@mail.gmail.com
+* At least verify the SIGSTOP via PTRACE_GETSIGINFO.
+*/
+   EXPECT_EQ(SIGSTOP, info.si_signo);
 
/* Restart nanosleep with SIGCONT, which triggers restart_syscall. */
ASSERT_EQ(0, kill(child_pid, SIGCONT));
-- 
2.17.1


-- 
Kees Cook


Re: siginfo pid not populated from ptrace?

2018-12-06 Thread Kees Cook
On Thu, Dec 6, 2018 at 2:43 PM Eric W. Biederman  wrote:
>
> Kees Cook  writes:
> > What should we do for v4.20? I need to have the selftests actually
> > passing. :)
>
> For v4.20 we need to do one of two things.
> 1) Present a plausible case that someone will could care about,
>we document it in the commit we can perform my earlier partial revert.

If SIGSTOP si_pid can't be used to determine who sent the signal
reliably even before, then I'm guessing we'll never see a real-world
case where this matters.

> 2) Remove the sanity check seccomp_bpf.c
>
> I really just want to ensure we have clear reasoning here.

I'll remove it for now and add a link to this conversation, in case
anyone else goes looking.

-Kees

-- 
Kees Cook


Re: siginfo pid not populated from ptrace?

2018-12-06 Thread Kees Cook
On Thu, Dec 6, 2018 at 1:11 PM Eric W. Biederman  wrote:
>
> Tycho Andersen  writes:
>
> > On Thu, Dec 06, 2018 at 10:48:39AM -0800, Linus Torvalds wrote:
> >> On Thu, Dec 6, 2018 at 6:40 AM Eric W. Biederman  
> >> wrote:
> >> >
> >> > We have in the past had ptrace users that weren't just about debugging
> >> > so I don't know that it is fair to just dismiss it as debugging
> >> > infrastructure.
> >>
> >> Absolutely.
> >>
> >> Some uses are more than just debug. People occasionally use ptrace
> >> because it's the only way to do what they want, so you'll find people
> >> who do it for sandboxing, for example. It's not necessarily designed
> >> for that, or particularly fast or well-suited for it, but I've
> >> definitely seen it used that way.
> >>
> >> So I don't think the behavioral test breakage like this is necessarily
> >> a huge deal, and until some "real use" actually shows that it cares it
> >> might be something we dismiss as "just test", but it very much has the
> >> potential to hit real uses.
> >>
> >> The fact that a behavioral test broke is definitely interesting.
> >>
> >> And maybe some of the siginfo allocations could depend on whether the
> >> signal is actually ever caught or not.
> >>
> >> For example, a terminal signal (or one that is ignored) might not need
> >> siginfo. But if the process is ptraced, maybe that terminal signal
> >> isn't actually terminal? So we might have situations where we want to
> >> simply check "is the signal target being ptraced"..
> >
> > Yes, something like this, I suppose? It works for me.
>
> The challenge is that we could be delivering this to a zombie signal
> group leader.  At which point we won't deliver it to the target task.
>
> Sigh it is probably time that I dig in and figure out how to avoid that
> case which we need to fix anyway because we can get the permission
> checks wrong for multi-threaded processes that call setuid and friends.
>
> Once that is sorted your small change will at least be safe.
>
> Eric
>
> > From 3bcaadd56ebb532ab4d481556fcc0826d65efc43 Mon Sep 17 00:00:00 2001
> > From: Tycho Andersen 
> > Date: Thu, 6 Dec 2018 12:15:22 -0700
> > Subject: [PATCH] signal: allocate siginfo when a traced task gets SIGSTOP
> >
> > Tracers can view SIGSTOP:
> >
> > https://lore.kernel.org/lkml/87zhtthkuy@xmission.com/T/#u
> >
> > so let's allocate a siginfo for SIGSTOP when a task is traced.
> >
> > Signed-off-by: Tycho Andersen 
> > ---
> >  kernel/signal.c | 9 ++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/signal.c b/kernel/signal.c
> > index 9a32bc2088c9..ab4ba00119f4 100644
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -1056,11 +1056,14 @@ static int __send_signal(int sig, struct 
> > kernel_siginfo *info, struct task_struc
> >   goto ret;
> >
> >   result = TRACE_SIGNAL_DELIVERED;
> > +
> >   /*
> > -  * Skip useless siginfo allocation for SIGKILL SIGSTOP,
> > -  * and kernel threads.
> > +  * Skip useless siginfo allocation for SIGKILL and kernel threads.
> > +  * SIGSTOP is visible to tracers, so only skip allocation when the 
> > task
> > +  * is not traced.
> >*/
> > - if (sig_kernel_only(sig) || (t->flags & PF_KTHREAD))
> > + if ((sig == SIGKILL) || (!task_is_traced(t) && sig == SIGSTOP) ||
> > + (t->flags & PF_KTHREAD))
> >   goto out_set;
> >
> >   /*

What should we do for v4.20? I need to have the selftests actually passing. :)

-- 
Kees Cook


Re: [PATCH] proc/sysctl: fix return error for proc_doulongvec_minmax

2018-12-06 Thread Kees Cook
On Thu, Dec 6, 2018 at 12:52 AM Luis Chamberlain  wrote:
>
> On Thu, Dec 06, 2018 at 03:36:15PM +0800, Cheng Lin wrote:
> > If the number of input parameters is less than the total
> > parameters, an EINVAL error will be returned.
> >
> > e.g.
> > We use proc_doulongvec_minmax to pass up to two parameters
> > with kern_table.
> >
> > {
> >   .procname   = "monitor_signals",
> >   .data   = _sigs,
> >   .maxlen = 2*sizeof(unsigned long),
> >   .mode   = 0644,
> >   .proc_handler   = proc_doulongvec_minmax,
> > },
> >
> > Reproduce:
> > When passing two parameters, it's work normal. But passing
> > only one parameter, an error "Invalid argument"(EINVAL) is
> > returned.
> >
> > [root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals
> > [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
> > 1   2
> > [root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals
> > -bash: echo: write error: Invalid argument
> > [root@cl150 ~]# echo $?
> > 1
> > [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
> > 3   2
> > [root@cl150 ~]#
> >
> > The following is the result after apply this patch. No error
> > is returned when the number of input parameters is less than
> > the total parameters.
> >
> > [root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals
> > [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
> > 1   2
> > [root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals
> > [root@cl150 ~]# echo $?
> > 0
> > [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
> > 3   2
> > [root@cl150 ~]#
> >
> > There are three processing functions dealing with digital parameters,
> > __do_proc_dointvec/__do_proc_douintvec/__do_proc_doulongvec_minmax.
> >
> > This patch deals with __do_proc_doulongvec_minmax, just as
> > __do_proc_dointvec does, adding a check for parameters 'left'. In
> > __do_proc_douintvec, its code implementation explicitly does not
> > support multiple inputs.
> >
> > static int __do_proc_douintvec(...){
> >  ...
> >  /*
> >   * Arrays are not supported, keep this simple. *Do not* add
> >   * support for them.
> >   */
> >  if (vleft != 1) {
> >  *lenp = 0;
> >  return -EINVAL;
> >  }
> >  ...
> > }
> >
> > So, just __do_proc_doulongvec_minmax has the problem. And most use of
> > proc_doulongvec_minmax/proc_doulongvec_ms_jiffies_minmax just have one
> > parameter.
> >
> > Signed-off-by: Cheng Lin 
>
> Thanks for fixing up the commit log.
>
> Acked-by: Luis Chamberlain 

Reviewed-by: Kees Cook 

-Kees

>
> I think we can live with this outside of stable. So stable is not
> needed. But I would not be surprised if autosel algorithm will end
> up picking it up. And if so.. well, it cannot hurt.
>
>   Luis



-- 
Kees Cook


Re: [PATCH 1/2] riscv: add support for SECCOMP incl. filters

2018-12-06 Thread Kees Cook
On Thu, Dec 6, 2018 at 10:26 AM David Abdurachmanov
 wrote:
>
> On Thu, Dec 6, 2018 at 5:52 PM Kees Cook  wrote:
> >
> > On Thu, Dec 6, 2018 at 7:02 AM David Abdurachmanov
> >  wrote:
> > > The patch adds support for SECCOMP and SECCOMP_FILTER (BPF).
> >
> > Can you add support to tools/testing/selftests/seccomp/seccomp_bpf.c
> > as well? That selftest finds a lot of weird corner-cases...
>
> I hate it locally and will include in v2.

I hate it too. ;) But seriously (reading through the "have" typo),
that's awesome. Thanks!

> The results see fine (tested in QEMU).

Great!

-- 
Kees Cook


Re: [PATCH v4] signal: add taskfd_send_signal() syscall

2018-12-06 Thread Kees Cook
On Thu, Dec 6, 2018 at 9:41 AM Christian Brauner  wrote:
> I feel changing the name around by a single persons preferences is not
> really a nice thing to do community-wise. So I'd like to hear other
> people chime in first before I make that change.

I don't think the name is hugely critical (but it's always the hardest
to settle on). My preference order would be:

taskfd_send_signal()
pidfd_send_signal()
procfd_send_signal()
fd_send_signal()

But, agreed, I think fdkill() should not be used.

-- 
Kees Cook


Re: [PATCH 1/2] riscv: add support for SECCOMP incl. filters

2018-12-06 Thread Kees Cook
On Thu, Dec 6, 2018 at 7:02 AM David Abdurachmanov
 wrote:
> The patch adds support for SECCOMP and SECCOMP_FILTER (BPF).

I built this against linux-next but it's missing seccomp.h. Was that
accidentally left out of the commit?


  CC  arch/riscv/kernel/asm-offsets.s
In file included from ./include/linux/sched.h:21:0,
 from arch/riscv/kernel/asm-offsets.c:18:
./include/linux/seccomp.h:14:10: fatal error: asm/seccomp.h: No such
file or directory
 #include 
  ^~~

-- 
Kees Cook


Re: [PATCH 1/2] riscv: add support for SECCOMP incl. filters

2018-12-06 Thread Kees Cook
On Thu, Dec 6, 2018 at 7:02 AM David Abdurachmanov
 wrote:
> The patch adds support for SECCOMP and SECCOMP_FILTER (BPF).

Can you add support to tools/testing/selftests/seccomp/seccomp_bpf.c
as well? That selftest finds a lot of weird corner-cases...

> diff --git a/arch/riscv/include/asm/thread_info.h 
> b/arch/riscv/include/asm/thread_info.h
> index 1c9cc8389928..1fd6e4130cab 100644
> --- a/arch/riscv/include/asm/thread_info.h
> +++ b/arch/riscv/include/asm/thread_info.h
> @@ -81,6 +81,7 @@ struct thread_info {
>  #define TIF_MEMDIE 5   /* is terminating due to OOM killer */
>  #define TIF_SYSCALL_TRACEPOINT  6   /* syscall tracepoint 
> instrumentation */
>  #define TIF_SYSCALL_AUDIT  7   /* syscall auditing */
> +#define TIF_SECCOMP8   /* syscall secure computing */

Nit: extra tab needs to be removed.

-- 
Kees Cook


Re: [PATCH 2/2] riscv: fix syscall_{get,set}_arguments

2018-12-06 Thread Kees Cook
On Thu, Dec 6, 2018 at 7:02 AM David Abdurachmanov
 wrote:
>
> Testing with libseccomp master branch revealed that testcases with
> filters on syscall arguments were failing due to wrong values. Seccomp
> uses syscall_get_argumentsi() to copy syscall arguments, and there is a
> bug in pointer arithmetics in memcpy() call.
>
> Two alternative implementation were tested: the one in this patch and
> another one based on while-break loop. Both delivered the same results.
>
> This implementation is also used in arm, arm64 and nds32 arches.

Minor nit: can you make this the first patch? That way seccomp works
correctly from the point of introduction. :)

-Kees

>
> Signed-off-by: David Abdurachmanov 
> ---
>  arch/riscv/include/asm/syscall.h | 42 
>  1 file changed, 32 insertions(+), 10 deletions(-)
>
> diff --git a/arch/riscv/include/asm/syscall.h 
> b/arch/riscv/include/asm/syscall.h
> index bba3da6ef157..26ceb434a433 100644
> --- a/arch/riscv/include/asm/syscall.h
> +++ b/arch/riscv/include/asm/syscall.h
> @@ -70,19 +70,32 @@ static inline void syscall_set_return_value(struct 
> task_struct *task,
> regs->a0 = (long) error ?: val;
>  }
>
> +#define SYSCALL_MAX_ARGS 6
> +
>  static inline void syscall_get_arguments(struct task_struct *task,
>  struct pt_regs *regs,
>  unsigned int i, unsigned int n,
>  unsigned long *args)
>  {
> -   BUG_ON(i + n > 6);
> +   if (n == 0)
> +   return;
> +
> +   if (i + n > SYSCALL_MAX_ARGS) {
> +   unsigned long *args_bad = args + SYSCALL_MAX_ARGS - i;
> +   unsigned int n_bad = n + i - SYSCALL_MAX_ARGS;
> +   pr_warning("%s called with max args %d, handling only %d\n",
> +   __func__, i + n, SYSCALL_MAX_ARGS);
> +   memset(args_bad, 0, n_bad * sizeof(args[0]));
> +   }
> +
> if (i == 0) {
> args[0] = regs->orig_a0;
> args++;
> i++;
> n--;
> }
> -   memcpy(args, >a1 + i * sizeof(regs->a1), n * sizeof(args[0]));
> +
> +   memcpy(args, >a0 + i, n * sizeof(args[0]));
>  }
>
>  static inline void syscall_set_arguments(struct task_struct *task,
> @@ -90,14 +103,23 @@ static inline void syscall_set_arguments(struct 
> task_struct *task,
>  unsigned int i, unsigned int n,
>  const unsigned long *args)
>  {
> -   BUG_ON(i + n > 6);
> -if (i == 0) {
> -regs->orig_a0 = args[0];
> -args++;
> -i++;
> -n--;
> -}
> -   memcpy(>a1 + i * sizeof(regs->a1), args, n * sizeof(regs->a0));
> +   if (n == 0)
> +   return;
> +
> +   if (i + n > SYSCALL_MAX_ARGS) {
> +   pr_warning("%s called with max args %d, handling only %d\n",
> +   __func__, i + n, SYSCALL_MAX_ARGS);
> +   n = SYSCALL_MAX_ARGS - i;
> +   }
> +
> +   if (i == 0) {
> +   regs->orig_a0 = args[0];
> +   args++;
> +   i++;
> +   n--;
> +   }
> +
> +   memcpy(>a0 + i, args, n * sizeof(args[0]));
>  }
>
>  static inline int syscall_get_arch(void)
> --
> 2.19.2
>


-- 
Kees Cook


Re: [PATCH 1/2] riscv: add support for SECCOMP incl. filters

2018-12-06 Thread Kees Cook
; +* Seccomp already set return value for the current task pt_regs.
> +* (If it was configured with SECCOMP_RET_ERRNO/TRACE)
> +*/
> +ret_from_syscall_rejected:
> /* Trace syscalls, but only if requested by the user. */
> REG_L t0, TASK_TI_FLAGS(tp)
> andi t0, t0, _TIF_SYSCALL_WORK
> diff --git a/arch/riscv/kernel/ptrace.c b/arch/riscv/kernel/ptrace.c
> index c1b51539c3e2..598e48b8ca2b 100644
> --- a/arch/riscv/kernel/ptrace.c
> +++ b/arch/riscv/kernel/ptrace.c
> @@ -160,6 +160,14 @@ void do_syscall_trace_enter(struct pt_regs *regs)
> if (tracehook_report_syscall_entry(regs))
> syscall_set_nr(current, regs, -1);
>
> +   /*
> +* Do the secure computing after ptrace; failures should be fast.
> +* If this fails we might have return value in a0 from seccomp
> +    * (via SECCOMP_RET_ERRNO/TRACE).
> +*/
> +   if (secure_computing(NULL) == -1)
> +   syscall_set_nr(current, regs, -1);

On a -1 return, this should return immediately -- it should not
continue to process trace_sys_enter(), etc.

-Kees

> +
>  #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
> if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
> trace_sys_enter(regs, syscall_get_nr(current, regs));
> --
> 2.19.2
>


-- 
Kees Cook


Re: [PATCH 1/3] stackleak: mark stackleak_track_stack() as notrace

2018-12-05 Thread Kees Cook
On Wed, Dec 5, 2018 at 7:43 PM Steven Rostedt  wrote:
>
> On Wed, 5 Dec 2018 19:29:11 -0800
> Kees Cook  wrote:
>
> > On Wed, Dec 5, 2018 at 6:29 PM Steven Rostedt  wrote:
> > >
> > > On Wed, 5 Dec 2018 21:26:51 -0500
> > > Steven Rostedt  wrote:
> > >
> > > > On Wed, 5 Dec 2018 17:08:34 -0800
> > > > Kees Cook  wrote:
> > > >
> > >
> > > > I'll Ack the Makefile
> > > > change in the tracing directory, but the rest belongs to others.
> >
> > Okay, I wasn't sure. Anders's patch was marked "1/3" so I thought it
> > was directed at you. :)
> >
> > I'll grab this one in the gcc-plugins tree.
>
> Should I just take patch 2 then? I'm thinking it's independent too.
>
> I'm collecting patches for the next merge window right now so it wont
> really be an issue if I do.

Sure! I'm not sure what Anders's intention was, but yeah. :)

-- 
Kees Cook


Re: [PATCH 1/3] stackleak: mark stackleak_track_stack() as notrace

2018-12-05 Thread Kees Cook
On Wed, Dec 5, 2018 at 6:29 PM Steven Rostedt  wrote:
>
> On Wed, 5 Dec 2018 21:26:51 -0500
> Steven Rostedt  wrote:
>
> > On Wed, 5 Dec 2018 17:08:34 -0800
> > Kees Cook  wrote:
> >
>
> > I'll Ack the Makefile
> > change in the tracing directory, but the rest belongs to others.

Okay, I wasn't sure. Anders's patch was marked "1/3" so I thought it
was directed at you. :)

I'll grab this one in the gcc-plugins tree.

-Kees

-- 
Kees Cook


Re: [PATCH v3] signal: add procfd_send_signal() syscall

2018-12-05 Thread Kees Cook
On Wed, Dec 5, 2018 at 7:08 PM Christian Brauner  wrote:
> As a sidenote I'm switching the name from procfd_send_signal() to
> taskfd_send_signal(). It seems to me the best way to handle Eric's
> request to reflect that we can eventually both signal tgids and tids.

Cool; sounds fine to me.

-- 
Kees Cook


Re: [PATCH 1/3] stackleak: mark stackleak_track_stack() as notrace

2018-12-05 Thread Kees Cook
On Fri, Nov 30, 2018 at 7:09 AM Anders Roxell  wrote:
>
> Function graph tracing recurses into itself when stackleak is enabled,
> causing the ftrace graph selftest to run for up to 90 seconds and
> trigger the softlockup watchdog.
>
> Breakpoint 2, ftrace_graph_caller () at 
> ../arch/arm64/kernel/entry-ftrace.S:200
> 200 mcount_get_lr_addrx0// pointer to function's 
> saved lr
> (gdb) bt
> \#0  ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:200
> \#1  0xff80081d5280 in ftrace_caller () at 
> ../arch/arm64/kernel/entry-ftrace.S:153
> \#2  0xff8008555484 in stackleak_track_stack () at 
> ../kernel/stackleak.c:106
> \#3  0xff8008421ff8 in ftrace_ops_test (ops=0xff8009eaa840 
> , ip=18446743524091297036, regs=) at 
> ../kernel/trace/ftrace.c:1507
> \#4  0xff8008428770 in __ftrace_ops_list_func (regs=, 
> ignored=, parent_ip=, ip=) at 
> ../kernel/trace/ftrace.c:6286
> \#5  ftrace_ops_no_ops (ip=18446743524091297036, 
> parent_ip=18446743524091242824) at ../kernel/trace/ftrace.c:6321
> \#6  0xff80081d5280 in ftrace_caller () at 
> ../arch/arm64/kernel/entry-ftrace.S:153
> \#7  0xff800832fd10 in irq_find_mapping (domain=0xffc03fc4bc80, 
> hwirq=27) at ../kernel/irq/irqdomain.c:876
> \#8  0xff800832294c in __handle_domain_irq (domain=0xffc03fc4bc80, 
> hwirq=27, lookup=true, regs=0xff800814b840) at ../kernel/irq/irqdesc.c:650
> \#9  0xff80081d52b4 in ftrace_graph_caller () at 
> ../arch/arm64/kernel/entry-ftrace.S:205
>
> Rework so we mark stackleak_track_stack as notrace
>
> Co-developed-by: Arnd Bergmann 
> Signed-off-by: Arnd Bergmann 
> Signed-off-by: Anders Roxell 
> ---
>  kernel/stackleak.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/stackleak.c b/kernel/stackleak.c
> index e42892926244..5de3bf596dd7 100644
> --- a/kernel/stackleak.c
> +++ b/kernel/stackleak.c
> @@ -102,7 +102,7 @@ asmlinkage void stackleak_erase(void)
> current->lowest_stack = current_top_of_stack() - THREAD_SIZE/64;
>  }
>
> -void __used stackleak_track_stack(void)
> +void __used notrace stackleak_track_stack(void)
>  {
> /*
>  * N.B. stackleak_erase() fills the kernel stack with the poison 
> value,
> --
> 2.19.2
>

Acked-by: Kees Cook 

Steven, I assume this series going via your tree?

-- 
Kees Cook


Re: siginfo pid not populated from ptrace?

2018-12-05 Thread Kees Cook
On Sat, Dec 1, 2018 at 7:04 AM Eric W. Biederman  wrote:
>
> Kees Cook  writes:
>
> > On Tue, Nov 27, 2018 at 8:44 PM Eric W. Biederman  
> > wrote:
> >>
> >> Kees Cook  writes:
> >>
> >> > On Tue, Nov 27, 2018 at 4:38 PM, Kees Cook  wrote:
> >> >> On Tue, Nov 27, 2018 at 3:21 PM, Tycho Andersen  wrote:
> >> >>> On Mon, Nov 12, 2018 at 12:24:43PM -0700, Tycho Andersen wrote:
> >> >>>> On Mon, Nov 12, 2018 at 11:55:38AM -0700, Tycho Andersen wrote:
> >> >>>> > I haven't manage to reproduce it on stock v4.20-rc2, unfortunately.
> >> >>>>
> >> >>>> Ok, now I have,
> >> >>>>
> >> >>>> seccomp_bpf.c:2736:global.syscall_restart:Expected getpid() (1493) == 
> >> >>>> info._sifields._kill.si_pid (0)
> >> >>>> global.syscall_restart: Test failed at step #22
> >> >>>
> >> >>> Seems like this is still happening on v4.20-rc4,
> >> >>>
> >> >>> [ RUN  ] global.syscall_restart
> >> >>> seccomp_bpf.c:2736:global.syscall_restart:Expected getpid() (1901) == 
> >> >>> info._sifields._kill.si_pid (0)
> >> >>> global.syscall_restart: Test failed at step #22
> >> >>
> >> >> This fails every time for me -- is it still racey for you?
> >> >>
> >> >> I'm attempting a bisect, hoping it doesn't _become_ racey for me. ;)
> >> >
> >> > This bisects to here for me:
> >> >
> >> > commit f149b31557446aff9ca96d4be7e39cc266f6e7cc
> >> > Author: Eric W. Biederman 
> >> > Date:   Mon Sep 3 09:50:36 2018 +0200
> >> >
> >> > signal: Never allocate siginfo for SIGKILL or SIGSTOP
> >> >
> >> > The SIGKILL and SIGSTOP signals are never delivered to userspace so
> >> > queued siginfo for these signals can never be observed.  Therefore
> >> > remove the chance of failure by never even attempting to allocate
> >> > siginfo in those cases.
> >> >
> >> > Reviewed-by: Thomas Gleixner 
> >> > Signed-off-by: "Eric W. Biederman" 
> >> >
> >> > They are certainly visible via seccomp ;)
> >>
> >> Well SIGSTOP is visible via PTRACE_GETSIGINFO.
> >>
> >> I see what is happening now.  Since we don't have queued siginfo
> >> we generate some as:
> >> /*
> >>  * Ok, it wasn't in the queue.  This must be
> >>  * a fast-pathed signal or we must have been
> >>  * out of queue space.  So zero out the info.
> >>  */
> >> clear_siginfo(info);
> >> info->si_signo = sig;
> >> info->si_errno = 0;
> >> info->si_code = SI_USER;
> >> info->si_pid = 0;
> >> info->si_uid = 0;
> >>
> >> Which allows last_signfo to be set,
> >> so despite not really having any siginfo PTRACE_GET_SIGINFO
> >> has something to return so does not return -EINVAL.
> >>
> >> Reconstructing my context that was part of removing SEND_SIG_FORCED
> >> so this looks like it will take a little more than a revert to fix
> >> this.
> >>
> >> This is definitely a change that is visible to user space.  The logic in
> >> my patch was definitely wrong with respect to SIGSTOP and
> >> PTRACE_GETSIGINFO.  Is there something in userspace that actually cares?
> >> AKA is the idiom that the test seccomp_bpf.c is using something that
> >> non-test code does?
> >
> > I think this would be needed by any ptracer that handled multiple
> > threads. It needs to figure out which pid stopped. I think it's worth
> > fixing, yes.
> >
> >> The change below should restore the old behavior.  I am just wondering
> >> if this is something we want to do.  siginfo is allocated with
> >> GFP_ATOMIC so if your machine is under memory pressure there is a real
> >> chance the allocation can fail.  Which would cause whatever is breaking
> >> now to break less deterministically then.
> >
> > I think memory pressure that would block a 128 byte GFP_ATOMIC
> > allocation would mean the system was about to seriously fall over.
> > Given the user-facing behavior change and that 

Re: Re: [PATCH] proc/sysctl: fix return error for proc_doulongvec_minmax

2018-12-05 Thread Kees Cook
On Mon, Dec 3, 2018 at 12:14 PM Luis Chamberlain  wrote:
> Since this worked before I do agree that we need to keep it working now,
> and I can't think of an issue with returning 0 now. Since this is about
> semantics though I'd like a bit more review from at last one more
> person.
>
> Kees, Eric, Andrew?

This is a weird one: it would return an error _AND_ still perform the
write. :( I think this patch is right, and I struggle to imagine a
case where removing the failure is a problem.

A quick question, though, do we want to instead do the reverse? (Not
update, and keep the error?) Are there any examples of doing partial
writes like this in real software?

The proposed change is the safest change, though...

-- 
Kees Cook


Re: [PATCH v3] signal: add procfd_send_signal() syscall

2018-12-05 Thread Kees Cook
On Wed, Dec 5, 2018 at 12:53 PM Christian Brauner  wrote:
> On Wed, Dec 05, 2018 at 12:20:43PM -0600, Eric W. Biederman wrote:
> > Christian Brauner  writes:
> > > [1]:  https://lkml.org/lkml/2018/11/18/130
> > > [2]:  
> > > https://lore.kernel.org/lkml/874lbtjvtd@oldenburg2.str.redhat.com/
> > > [3]:  
> > > https://lore.kernel.org/lkml/20181204132604.aspfupwjgjx6f...@brauner.io/
> > > [4]:  
> > > https://lore.kernel.org/lkml/20181203180224.fkvw4kajtbvru...@brauner.io/
> > > [5]:  https://lore.kernel.org/lkml/20181121213946.ga10...@mail.hallyn.com/
> > > [6]:  
> > > https://lore.kernel.org/lkml/20181120103111.etlqp7zop34v6...@brauner.io/
> > > [7]:  
> > > https://lore.kernel.org/lkml/36323361-90bd-41af-ab5b-ee0d7ba02...@amacapital.net/
> > > [8]:  https://lore.kernel.org/lkml/87tvjxp8pc@xmission.com/
> > > [9]:  https://asciinema.org/a/X1J8eGhe3vCfBE2b9TXtTaSJ7
> > > [10]: 
> > > https://lore.kernel.org/lkml/20181203180224.fkvw4kajtbvru...@brauner.io/
> > > [11]: 
> > > https://lore.kernel.org/lkml/f53d6d38-3521-4c20-9034-5af447df6...@amacapital.net/

I nominate this for 2018's most-well-documented syscall commit log award. ;)

> > > +   /*
> > > +* Give userspace a way to detect whether /proc//task/ fds
> > > +* are supported.
> > > +*/
> > > +   ret = -EOPNOTSUPP;
> > > +   if (proc_is_tid_procfd(f.file))
> > > +   goto err;
> >
> >   -EBADF is the proper error code.
>
> This is done so that userspace has a way of figuring out that tid fds
> are not yet supported. This has been discussed with Florian (see commit
> message).

Right, we should keep this -EOPNOTSUPP.

> > > +   /* Is this a procfd? */
> > > +   ret = -EINVAL;
> > > +   if (!proc_is_tgid_procfd(f.file))
> > > +   goto err;
> >
> >   -EBADF is the proper error code.

Yeah, EINVAL tends to be used for bad flags... this is more about an
improper fd.

> >
> > > +   /* Without CONFIG_PROC_FS proc_pid() returns NULL. */
> > > +   pid = proc_pid(file_inode(f.file));
> > > +   if (!pid)
> > > +   goto err;
> >
> > Perhaps you want to fold the proc_pid into the proc_is_tgid_procfd
> > call.  That way proc_pid can stay private to proc.
>
> Hm, I guess we can do that for now. My intention was to have reuseable
> helpers but I guess it would be fine for now.
>
> >
> > > +   if (!may_signal_procfd(pid))
> > > +   goto err;
> > > +

Does the ns parent checking in may_signal_procfd need any locking or
RCU? I know pid and current namespaces are "pinned", but I don't know
how parent ns works here. I'm assuming the parents are stuck until all
children go away?

> > > +   ret = kill_pid_info(sig, , pid);

Just double-checking for myself: this does not bypass
security_task_kill(), so no problem there AFAIK.

Reviewed-by: Kees Cook 

-- 
Kees Cook


Re: [PATCH AUTOSEL 4.19 104/123] pstore/ram: Correctly calculate usable PRZ bytes

2018-12-05 Thread Kees Cook
On Wed, Dec 5, 2018 at 1:41 AM Sasha Levin  wrote:
>
> From: Kees Cook 
>
> [ Upstream commit 89d328f637b9904b6d4c9af73c8a608b8dd4d6f8 ]
>
> The actual number of bytes stored in a PRZ is smaller than the
> bytes requested by platform data, since there is a header on each
> PRZ. Additionally, if ECC is enabled, there are trailing bytes used
> as well. Normally this mismatch doesn't matter since PRZs are circular
> buffers and the leading "overflow" bytes are just thrown away. However, in
> the case of a compressed record, this rather badly corrupts the results.
>
> This corruption was visible with "ramoops.mem_size=204800 ramoops.ecc=1".
> Any stored crashes would not be uncompressable (producing a pstorefs
> "dmesg-*.enc.z" file), and triggering errors at boot:
>
>   [2.790759] pstore: crypto_comp_decompress failed, ret = -22!
>
> Backporting this depends on commit 70ad35db3321 ("pstore: Convert console
> write to use ->write_buf")

Please note the above. If this gets backported, this one is needed too.

-Kees

>
> Reported-by: Joel Fernandes 
> Fixes: b0aad7a99c1d ("pstore: Add compression support to pstore")
> Signed-off-by: Kees Cook 
> Reviewed-by: Joel Fernandes (Google) 
> Signed-off-by: Sasha Levin 
> ---
>  fs/pstore/ram.c| 15 ++-
>  include/linux/pstore.h |  5 -
>  2 files changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
> index f4fd2e72add4..03cd59375abe 100644
> --- a/fs/pstore/ram.c
> +++ b/fs/pstore/ram.c
> @@ -806,17 +806,14 @@ static int ramoops_probe(struct platform_device *pdev)
>
> cxt->pstore.data = cxt;
> /*
> -* Console can handle any buffer size, so prefer LOG_LINE_MAX. If we
> -* have to handle dumps, we must have at least record_size buffer. And
> -* for ftrace, bufsize is irrelevant (if bufsize is 0, buf will be
> -* ZERO_SIZE_PTR).
> +* Since bufsize is only used for dmesg crash dumps, it
> +* must match the size of the dprz record (after PRZ header
> +* and ECC bytes have been accounted for).
>  */
> -   if (cxt->console_size)
> -   cxt->pstore.bufsize = 1024; /* LOG_LINE_MAX */
> -   cxt->pstore.bufsize = max(cxt->record_size, cxt->pstore.bufsize);
> -   cxt->pstore.buf = kmalloc(cxt->pstore.bufsize, GFP_KERNEL);
> +   cxt->pstore.bufsize = cxt->dprzs[0]->buffer_size;
> +   cxt->pstore.buf = kzalloc(cxt->pstore.bufsize, GFP_KERNEL);
> if (!cxt->pstore.buf) {
> -   pr_err("cannot allocate pstore buffer\n");
> +   pr_err("cannot allocate pstore crash dump buffer\n");
> err = -ENOMEM;
> goto fail_clear;
> }
> diff --git a/include/linux/pstore.h b/include/linux/pstore.h
> index a15bc4d48752..30fcec375a3a 100644
> --- a/include/linux/pstore.h
> +++ b/include/linux/pstore.h
> @@ -90,7 +90,10 @@ struct pstore_record {
>   *
>   * @buf_lock:  spinlock to serialize access to @buf
>   * @buf:   preallocated crash dump buffer
> - * @bufsize:   size of @buf available for crash dump writes
> + * @bufsize:   size of @buf available for crash dump bytes (must match
> + * smallest number of bytes available for writing to a
> + * backend entry, since compressed bytes don't take kindly
> + * to being truncated)
>   *
>   * @read_mutex:serializes @open, @read, @close, and @erase callbacks
>   * @flags: bitfield of frontends the backend can accept writes for
> --
> 2.17.1
>


-- 
Kees Cook


Re: [PATCH 00/16] v6 kernel core pieces refcount conversions

2018-12-04 Thread Kees Cook
On Wed, Nov 15, 2017 at 6:07 AM Elena Reshetova
 wrote:
> Changes in v6:
>  * memory ordering differences are outlined in each patch
>together with potential problematic areas.
>   Note: I didn't include any statements in individual patches
>   on why I think the memory ordering changes do not matter
>   in that particular case since ultimately these are only
>   known by maintainers (unless explicitly documented) and
>   very hard to figure out reliably from the code.
>   Therefore maintainers are expected to double check the
>   specific pointed functions and make the end decision.
>  * rebase on top of today's linux-next/master

*thread resurrection*

Was there a v7 for this series? I'd like to finish off any of the
known outstanding refcount_t conversions.

Thanks!

-- 
Kees Cook


[PATCH] fanotify: Make sure to check event_len when copying

2018-12-04 Thread Kees Cook
As a precaution, make sure we check event_len when copying to userspace.
Based on old feedback: https://lkml.kernel.org/r/542d9fe5.3010...@gmx.de

Signed-off-by: Kees Cook 
---
 fs/notify/fanotify/fanotify_user.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c 
b/fs/notify/fanotify/fanotify_user.c
index e03be5071362..d9484a0ac6b3 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -206,7 +206,7 @@ static int process_access_response(struct fsnotify_group 
*group,
 
 static ssize_t copy_event_to_user(struct fsnotify_group *group,
  struct fsnotify_event *event,
- char __user *buf)
+ char __user *buf, size_t count)
 {
struct fanotify_event_metadata fanotify_event_metadata;
struct file *f;
@@ -220,6 +220,12 @@ static ssize_t copy_event_to_user(struct fsnotify_group 
*group,
 
fd = fanotify_event_metadata.fd;
ret = -EFAULT;
+   /*
+* Sanity check copy size in case get_one_event() and
+* fill_event_metadata() event_len sizes ever get out of sync.
+*/
+   if (WARN_ON_ONCE(fanotify_event_metadata.event_len > count))
+   goto out_close_fd;
if (copy_to_user(buf, _event_metadata,
 fanotify_event_metadata.event_len))
goto out_close_fd;
@@ -295,7 +301,7 @@ static ssize_t fanotify_read(struct file *file, char __user 
*buf,
continue;
}
 
-   ret = copy_event_to_user(group, kevent, buf);
+   ret = copy_event_to_user(group, kevent, buf, count);
if (unlikely(ret == -EOPENSTALE)) {
/*
 * We cannot report events with stale fd so drop it.
-- 
2.17.1


-- 
Kees Cook


Re: [PATCH] pstore: Convert buf_lock to semaphore

2018-12-04 Thread Kees Cook
On Tue, Dec 4, 2018 at 7:41 AM Sebastian Andrzej Siewior
 wrote:
>
> On 2018-11-30 14:47:36 [-0800], Kees Cook wrote:
> > diff --git a/drivers/firmware/efi/efi-pstore.c 
> > b/drivers/firmware/efi/efi-pstore.c
> > index cfe87b465819..0f7d97917197 100644
> > --- a/drivers/firmware/efi/efi-pstore.c
> > +++ b/drivers/firmware/efi/efi-pstore.c
> > @@ -259,8 +259,7 @@ static int efi_pstore_write(struct pstore_record 
> > *record)
> >   efi_name[i] = name[i];
> >
> >   ret = efivar_entry_set_safe(efi_name, vendor, PSTORE_EFI_ATTRIBUTES,
> > -   !pstore_cannot_block_path(record->reason),
> > -   record->size, record->psi->buf);
> > +   preemptible(), record->size, record->psi->buf);
>
> Well. Better I think.
> might_sleep() / preempt_count_equals() checks for preemptible() + 
> rcu_preempt_depth().
> kmsg_dump() starts with rcu_read_lock() which means with this patch applied I
> got:

Okay, so, if kmsg_dump() uses rcu_read_lock(), that means efi-pstore
can _never_ sleep, and it's nothing to do with pstore internals. :( I
guess we just hard-code it, then? And efi-pstore should probably only
attach to pstore if it has a nonblock implementation (and warn if one
isn't available).

-Kees

-- 
Kees Cook


Re: [PATCH v3] panic: Avoid the extra noise dmesg

2018-12-04 Thread Kees Cook
On Mon, Dec 3, 2018 at 9:42 PM Feng Tang  wrote:
>
> When kernel panic happens, it will first print the panic call stack,
> then the ending msg like:
>
> [   35.743249] ---[ end Kernel panic - not syncing: Fatal exception
> [   35.749975] [ cut here ]
>
> The above message are very useful for debugging.
>
> But if system is configured to not reboot on panic, say the "panic_timeout"
> parameter equals 0, it will likely print out many noisy message like
> WARN() call stack for each and every CPU except the panic one, messages
> like below:
>
> WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 
> set_task_cpu+0x183/0x190
> Call Trace:
> 
> try_to_wake_up
> default_wake_function
> autoremove_wake_function
> __wake_up_common
> __wake_up_common_lock
> __wake_up
> wake_up_klogd_work_func
> irq_work_run_list
> irq_work_tick
> update_process_times
> tick_sched_timer
> __hrtimer_run_queues
> hrtimer_interrupt
> smp_apic_timer_interrupt
> apic_timer_interrupt
>
> For people working in console mode, the screen will first show the panic
> call stack, but immediately overridded by these noisy extra messages, which
> makes debugging much more difficult, as the original context gets lost on
> screen.
>
> Also these noisy messages will confuse some users, as I have seen many bug
> reporters posted the noisy message into bugzilla, instead of the real panic
> call stack and context.
>
> Removing the "local_irq_enable" will avoid the noisy message.
>
> The justification for the removing is: when code runs to this point, it
> means user has chosed to not reboot, or do any special handling by using
> the panic notifier method, no much point in re-enabling the interrupt.
>
> Signed-off-by: Feng Tang 
> Cc: Thomas Gleixner 
> Cc: Kees Cook 
> Cc: Borislav Petkov 
> Cc: sta...@kernel.org

Acked-by: Kees Cook 

-Kees

> ---
>  kernel/panic.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/kernel/panic.c b/kernel/panic.c
> index f6d549a..a616e55 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -295,7 +295,6 @@ void panic(const char *fmt, ...)
> }
>  #endif
> pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
> -   local_irq_enable();
> for (i = 0; ; i += PANIC_TIMER_STEP) {
> touch_softlockup_watchdog();
> if (i >= i_next) {
> --
> 2.7.4
>


-- 
Kees Cook


[PATCH] pstore/ram: Avoid NULL deref in ftrace merging failure path

2018-12-03 Thread Kees Cook
Given corruption in the ftrace records, it might be possible to allocate
tmp_prz without assigning prz to it, but still marking it as needing to
be freed, which would cause at least a NULL dereference.

smatch warnings:
fs/pstore/ram.c:340 ramoops_pstore_read() error: we previously assumed 'prz' 
could be null (see line 255)

https://lists.01.org/pipermail/kbuild-all/2018-December/055528.html

Reported-by: Dan Carpenter 
Fixes: 2fbea82bbb89 ("pstore: Merge per-CPU ftrace records into one")
Cc: "Joel Fernandes (Google)" 
Signed-off-by: Kees Cook 
---
 fs/pstore/ram.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index e6d9560ea455..96f7d32cd184 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -291,6 +291,7 @@ static ssize_t ramoops_pstore_read(struct pstore_record 
*record)
  GFP_KERNEL);
if (!tmp_prz)
return -ENOMEM;
+   prz = tmp_prz;
free_prz = true;
 
while (cxt->ftrace_read_cnt < cxt->max_ftrace_cnt) {
@@ -309,7 +310,6 @@ static ssize_t ramoops_pstore_read(struct pstore_record 
*record)
goto out;
}
record->id = 0;
-   prz = tmp_prz;
    }
    }
 
-- 
2.17.1


-- 
Kees Cook


Re: [PATCH] prctl: add PR_{GET,SET}_KILL_DESCENDANTS_ON_EXIT

2018-11-30 Thread Kees Cook
On Fri, Nov 30, 2018 at 2:33 AM Oleg Nesterov  wrote:
>
> On 11/29, Jürg Billeter wrote:
> >
> > On Thu, 2018-11-29 at 13:34 +0100, Oleg Nesterov wrote:
> > > So I think the patch is mostly fine, the only problem I can see is that
> > > PR_SET_KILL_DESCENDANTS_ON_EXIT can race with PR_SET_CHILD_SUBREAPER, 
> > > they both
> > > need to update the bits in the same word.
> >
> > Good point. I'll make it a regular bool instead of a bitfield for v2,
>
> Agreed,

Is it worth doing something for singal_struct like we did for
task_struct with the TASK_PFA_* and atomic_flags?

-- 
Kees Cook


Fwd: [PATCH RFC 00/15] Zero ****s, hugload of hugs <3

2018-11-30 Thread Kees Cook
[resend with CCs dropped so it'll actually get delivered to lkml...]

On Fri, Nov 30, 2018 at 11:27 AM Jarkko Sakkinen
 wrote:
>
> In order to comply with the CoC, replace  with a hug.

Heh. I support the replacement of the stronger language, but I find
"hug", "hugged", and "hugging" to be very weird replacements. Can we
bikeshed this to "heck", "hecked", and "hecking" (or "heckin" to
follow true Doggo meme style).

"This API is hugged" doesn't make any sense to me. "This API is
hecked" is better, or at least funnier (to me). "Hug this interface"
similarly makes no sense, but "Heck this interface" seems better.
"Don't touch my hecking code", "What the heck were they thinking?"
etc... "hug" is odd.

Better yet, since it's only 17 files, how about doing context-specific
changes? "This API is terrible", "Hateful interface", "Don't touch my
freakin' code", "What in the world were they thinking?" etc?

-Kees

>
> Jarkko Sakkinen (15):
>   MIPS: replace  with a hug
>   Documentation: replace  with a hug
>   drm/nouveau: replace  with a hug
>   m68k: replace  with a hug
>   parisc: replace  with a hug
>   cpufreq: replace  with a hug
>   ide: replace  with a hug
>   media: replace  with a hug
>   mtd: replace  with a hug
>   net/sunhme: replace  with a hug
>   scsi: replace  with a hug
>   inotify: replace  with a hug
>   irq: replace  with a hug
>   lib: replace  with a hug
>   net: replace  with a hug
>
>  Documentation/kernel-hacking/locking.rst  |  2 +-
>  arch/m68k/include/asm/sun3ints.h  |  2 +-
>  arch/mips/pci/ops-bridge.c| 24 +--
>  arch/mips/sgi-ip22/ip22-setup.c   |  2 +-
>  arch/parisc/kernel/sys_parisc.c   |  2 +-
>  drivers/cpufreq/powernow-k7.c |  2 +-
>  .../gpu/drm/nouveau/nvkm/subdev/bios/init.c   |  2 +-
>  .../nouveau/nvkm/subdev/pmu/fuc/macros.fuc|  2 +-
>  drivers/ide/cmd640.c  |  2 +-
>  drivers/media/i2c/bt819.c |  8 ---
>  drivers/mtd/mtd_blkdevs.c |  2 +-
>  drivers/net/ethernet/sun/sunhme.c |  4 ++--
>  drivers/scsi/qlogicpti.h  |  2 +-
>  fs/notify/inotify/inotify_user.c  |  2 +-
>  kernel/irq/timings.c  |  2 +-
>  lib/vsprintf.c|  2 +-
>  net/core/skbuff.c |  2 +-
>  17 files changed, 33 insertions(+), 31 deletions(-)
>
> --
> 2.19.1
>

-- 
Kees Cook


[GIT PULL] gcc-plugins fix for v4.20-rc5

2018-11-30 Thread Kees Cook
Hi Linus,

Please pull this gcc-plugin fix for v4.20-rc5.

Thanks!

-Kees

The following changes since commit ccda4af0f4b92f7b4c308d3acc262f4a7e3affad:

  Linux 4.20-rc2 (2018-11-11 17:12:31 -0600)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git 
tags/gcc-plugins-v4.20-rc5

for you to fetch changes up to ef1a8409348966f0b25ff97a170d6d0367710ea9:

  stackleak: Disable function tracing and kprobes for stackleak_erase() 
(2018-11-30 09:05:07 -0800)


stackleak plugin fix

- Fix crash by not allowing kprobing of stackleak_erase() (Alexander Popov)


Alexander Popov (1):
  stackleak: Disable function tracing and kprobes for stackleak_erase()

 kernel/stackleak.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

-- 
Kees Cook


Re: [PATCH 09/14] fs/pstore: Use %pS printk format for direct addresses

2018-11-29 Thread Kees Cook
On Thu, Nov 29, 2018 at 3:49 PM Luck, Tony  wrote:
>
> On Thu, Nov 29, 2018 at 03:26:51PM -0800, Kees Cook wrote:
> > On Wed, Sep 6, 2017 at 1:28 PM Helge Deller  wrote:
> > >
> > > Use the %pS instead of the %pF printk format specifier for printing 
> > > symbols
> > > from direct addresses. This is needed for the ia64, ppc64 and parisc64
> > > architectures.
> > >
> > > Signed-off-by: Helge Deller 
> > > Cc: Kees Cook 
> > > Cc: Tony Luck 
> >
> > I missed this email from some time ago. I've now applied it, thanks!
>
> Though it isn't really needed any more. See
>
> 04b8eb7a4ccd ("symbol lookup: introduce dereference_symbol_descriptor()")
>
> which is part of the plan to deprecate %pF

Ah-ha! Okay then, nevermind. :) Thanks!

-- 
Kees Cook


Re: [PATCH v5 0/5] pstore: ramoops: support multiple pmsg instances

2018-11-29 Thread Kees Cook
On Sun, Feb 26, 2017 at 5:54 PM Nobuhiro Iwamatsu
 wrote:
> The following series implements multiple pmsg. This feature allows userspace
> program to control individual content aging or priority.

I'd like to take a look at this series again, if you're still
interested in upstreaming it. :) There were a lot of things that
needed refactoring in pstore to sanely deal with this, and I think
we're there now. Is this still desired?

Thanks,

-Kees

>
> If a pstore backend module(e.g. ramoops) requires the multiple pmsg instances
> when registering itself to pstore, multiple /dev/pmsg[ID] are created. Writes
> to each /dev/pmsg[ID] are isolated each other. After reboot, the contents are
> available in /sys/fs/pstore/pmsg-[backend]-[ID].
>
> In addition, we add multiple pmsg support for ramoops. We can specify multiple
> pmsg area size by its module parameter as follows.
>
>  pmsg_size=0x1000,0x2000,...
>
> I did check the operation of this feature on CycloneV (socfpga) Helio board.
>
> v5:
>   Add commit: "pstore: Change parameter of ramoops_free_przs()"
> - I forgot addition to previous patch series.
>   Update commit: "pstore: support multiple pmsg instances"
> - Fix comment.
> - Fix initialization of num_pmsg.
>
> v4:
>   Rebase to 4.10-rc5
>   The following patches have been removed from this series as similar 
> functions
>   were modified by other commit.
>  - pstore: Replace four kzalloc() calls by kcalloc() in 
> ramoops_init_przs()
>  - pstore: Change parameter of ramoops_free_przs()
>  - pstore: Rename 'przs' to 'dprzs' in struct ramoops_context
>  - ramoops: Rename ramoops_init_prz() to ramoops_init_dprzs()
>
> v3:
>   Rebase to v4.8.
>   Split patch.
>   merged device_create().
>   Remove Blank lines.
>   Update documentiation of DT binding.
>   Update parsing function of ramoops_pmsg_size, add NULL termination.
>   Update module parameters for pmsg_size list.
>
> Hiraku Toyooka (2):
>   pstore: support multiple pmsg instances
>   selftests/pstore: add testcases for multiple pmsg instances
>
> Nobuhiro Iwamatsu (3):
>   pstore: Change parameter of ramoops_free_przs()
>   ramoops: Add __ramoops_init_prz() as generic function
>   ramoops: support multiple pmsg instances
>
>  Documentation/admin-guide/ramoops.rst  |  22 ++
>  .../bindings/reserved-memory/ramoops.txt   |   6 +-
>  fs/pstore/pmsg.c   |  23 +-
>  fs/pstore/ram.c| 317 
> -
>  include/linux/pstore.h |   1 +
>  include/linux/pstore_ram.h |   8 +-
>  tools/testing/selftests/pstore/common_tests|  21 +-
>  .../selftests/pstore/pstore_post_reboot_tests  |  27 +-
>  tools/testing/selftests/pstore/pstore_tests|  14 +-
>  9 files changed, 342 insertions(+), 97 deletions(-)
>
> --
> 2.11.0
>


-- 
Kees Cook


Re: [PATCH 09/14] fs/pstore: Use %pS printk format for direct addresses

2018-11-29 Thread Kees Cook
On Wed, Sep 6, 2017 at 1:28 PM Helge Deller  wrote:
>
> Use the %pS instead of the %pF printk format specifier for printing symbols
> from direct addresses. This is needed for the ia64, ppc64 and parisc64
> architectures.
>
> Signed-off-by: Helge Deller 
> Cc: Kees Cook 
> Cc: Tony Luck 

I missed this email from some time ago. I've now applied it, thanks!

-Kees

> ---
>  fs/pstore/inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
> index fefd226..59f65d7 100644
> --- a/fs/pstore/inode.c
> +++ b/fs/pstore/inode.c
> @@ -116,7 +116,7 @@ static int pstore_ftrace_seq_show(struct seq_file *s, 
> void *v)
>
> rec = (struct pstore_ftrace_record *)(ps->record->buf + data->off);
>
> -   seq_printf(s, "CPU:%d ts:%llu %08lx  %08lx  %pf <- %pF\n",
> +   seq_printf(s, "CPU:%d ts:%llu %08lx  %08lx  %ps <- %pS\n",
>pstore_ftrace_decode_cpu(rec),
>pstore_ftrace_read_timestamp(rec),
>rec->ip, rec->parent_ip, (void *)rec->ip,
> --
> 2.1.0
>


-- 
Kees Cook


Re: [PATCH] pstore: Fix bool initialization/comparison

2018-11-29 Thread Kees Cook
On Sat, Oct 7, 2017 at 7:24 AM Thomas Meyer  wrote:
> Bool initializations should use true and false. Bool tests don't need
> comparisons.
>
> Signed-off-by: Thomas Meyer 

I totally missed this email (now over a year ago!)

However, since it's still correct and applies, I've taken it for -next
now. :) Thanks!

-Kees

> ---
>
> diff -u -p a/fs/pstore/ftrace.c b/fs/pstore/ftrace.c
> --- a/fs/pstore/ftrace.c
> +++ b/fs/pstore/ftrace.c
> @@ -148,7 +148,7 @@ void pstore_unregister_ftrace(void)
> mutex_lock(_ftrace_lock);
> if (pstore_ftrace_enabled) {
> unregister_ftrace_function(_ftrace_ops);
> -   pstore_ftrace_enabled = 0;
> +   pstore_ftrace_enabled = false;
> }
>     mutex_unlock(_ftrace_lock);
>


-- 
Kees Cook


[GIT PULL] pstore fix for v4.20-rc5

2018-11-29 Thread Kees Cook
Hi Linus,

Please pull this pstore fix for v4.20-rc5.

Thanks!

-Kees

The following changes since commit 1227daa43bce1318ff6fb54e6cd862b4f60245c7:

  pstore/ram: Clarify resource reservation labels (2018-10-22 07:11:58 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git 
tags/pstore-v4.20-rc5

for you to fetch changes up to 89d328f637b9904b6d4c9af73c8a608b8dd4d6f8:

  pstore/ram: Correctly calculate usable PRZ bytes (2018-11-29 13:46:43 -0800)


pstore fix:

- Fix corrupted compression due to unlucky size choice with ECC


Kees Cook (1):
  pstore/ram: Correctly calculate usable PRZ bytes

 fs/pstore/ram.c| 15 ++-
 include/linux/pstore.h |  5 -
 2 files changed, 10 insertions(+), 10 deletions(-)

-- 
Kees Cook


[PATCH -next v2 04/12] pstore: Avoid duplicate call of persistent_ram_zap()

2018-11-29 Thread Kees Cook
From: Peng Wang 

When initialing a prz, if invalid data is found (no PERSISTENT_RAM_SIG),
the function call path looks like this:

ramoops_init_prz ->
persistent_ram_new -> persistent_ram_post_init -> persistent_ram_zap
persistent_ram_zap

As we can see, persistent_ram_zap() is called twice.
We can avoid this by adding an option to persistent_ram_new(), and
only call persistent_ram_zap() when it is needed.

Signed-off-by: Peng Wang 
[kees: minor tweak to exit path and commit log]
Signed-off-by: Kees Cook 
---
 fs/pstore/ram.c|  4 +---
 fs/pstore/ram_core.c   | 15 +--
 include/linux/pstore_ram.h |  1 +
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index e02a9039b5ea..768759841491 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -640,7 +640,7 @@ static int ramoops_init_prz(const char *name,
 
label = kasprintf(GFP_KERNEL, "ramoops:%s", name);
*prz = persistent_ram_new(*paddr, sz, sig, >ecc_info,
- cxt->memtype, 0, label);
+ cxt->memtype, PRZ_FLAG_ZAP_OLD, label);
if (IS_ERR(*prz)) {
int err = PTR_ERR(*prz);
 
@@ -649,8 +649,6 @@ static int ramoops_init_prz(const char *name,
return err;
}
 
-   persistent_ram_zap(*prz);
-
*paddr += sz;
 
return 0;
diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
index 12e21f789194..23ca6f2c98a0 100644
--- a/fs/pstore/ram_core.c
+++ b/fs/pstore/ram_core.c
@@ -489,6 +489,7 @@ static int persistent_ram_post_init(struct 
persistent_ram_zone *prz, u32 sig,
struct persistent_ram_ecc_info *ecc_info)
 {
int ret;
+   bool zap = !!(prz->flags & PRZ_FLAG_ZAP_OLD);
 
ret = persistent_ram_init_ecc(prz, ecc_info);
if (ret)
@@ -498,23 +499,25 @@ static int persistent_ram_post_init(struct 
persistent_ram_zone *prz, u32 sig,
 
if (prz->buffer->sig == sig) {
if (buffer_size(prz) > prz->buffer_size ||
-   buffer_start(prz) > buffer_size(prz))
+   buffer_start(prz) > buffer_size(prz)) {
pr_info("found existing invalid buffer, size %zu, start 
%zu\n",
buffer_size(prz), buffer_start(prz));
-   else {
+   zap = true;
+   } else {
pr_debug("found existing buffer, size %zu, start %zu\n",
 buffer_size(prz), buffer_start(prz));
persistent_ram_save_old(prz);
-   return 0;
}
} else {
pr_debug("no valid data in buffer (sig = 0x%08x)\n",
 prz->buffer->sig);
+   prz->buffer->sig = sig;
+   zap = true;
}
 
-   /* Rewind missing or invalid memory area. */
-   prz->buffer->sig = sig;
-   persistent_ram_zap(prz);
+   /* Reset missing, invalid, or single-use memory area. */
+   if (zap)
+   persistent_ram_zap(prz);
 
return 0;
 }
diff --git a/include/linux/pstore_ram.h b/include/linux/pstore_ram.h
index 602d64725222..6e94980357d2 100644
--- a/include/linux/pstore_ram.h
+++ b/include/linux/pstore_ram.h
@@ -30,6 +30,7 @@
  * PRZ_FLAG_NO_LOCK is used. For all other cases, locking is required.
  */
 #define PRZ_FLAG_NO_LOCK   BIT(0)
+#define PRZ_FLAG_ZAP_OLD   BIT(1)
 
 struct persistent_ram_buffer;
 struct rs_control;
-- 
2.17.1



[PATCH -next v2 01/12] pstore/ram: Correctly calculate usable PRZ bytes

2018-11-29 Thread Kees Cook
The actual number of bytes stored in a PRZ is smaller than the
bytes requested by platform data, since there is a header on each
PRZ. Additionally, if ECC is enabled, there are trailing bytes used
as well. Normally this mismatch doesn't matter since PRZs are circular
buffers and the leading "overflow" bytes are just thrown away. However, in
the case of a compressed record, this rather badly corrupts the results.

This corruption was visible with "ramoops.mem_size=204800 ramoops.ecc=1".
Any stored crashes would not be uncompressable (producing a pstorefs
"dmesg-*.enc.z" file), and triggering errors at boot:

  [2.790759] pstore: crypto_comp_decompress failed, ret = -22!

Backporting this depends on commit 70ad35db3321 ("pstore: Convert console
write to use ->write_buf")

Reported-by: Joel Fernandes 
Fixes: b0aad7a99c1d ("pstore: Add compression support to pstore")
Signed-off-by: Kees Cook 
Reviewed-by: Joel Fernandes (Google) 
---
 fs/pstore/ram.c| 15 ++-
 include/linux/pstore.h |  5 -
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index ffcff6516e89..e02a9039b5ea 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -816,17 +816,14 @@ static int ramoops_probe(struct platform_device *pdev)
 
cxt->pstore.data = cxt;
/*
-* Console can handle any buffer size, so prefer LOG_LINE_MAX. If we
-* have to handle dumps, we must have at least record_size buffer. And
-* for ftrace, bufsize is irrelevant (if bufsize is 0, buf will be
-* ZERO_SIZE_PTR).
+* Since bufsize is only used for dmesg crash dumps, it
+* must match the size of the dprz record (after PRZ header
+* and ECC bytes have been accounted for).
 */
-   if (cxt->console_size)
-   cxt->pstore.bufsize = 1024; /* LOG_LINE_MAX */
-   cxt->pstore.bufsize = max(cxt->record_size, cxt->pstore.bufsize);
-   cxt->pstore.buf = kmalloc(cxt->pstore.bufsize, GFP_KERNEL);
+   cxt->pstore.bufsize = cxt->dprzs[0]->buffer_size;
+   cxt->pstore.buf = kzalloc(cxt->pstore.bufsize, GFP_KERNEL);
if (!cxt->pstore.buf) {
-   pr_err("cannot allocate pstore buffer\n");
+   pr_err("cannot allocate pstore crash dump buffer\n");
err = -ENOMEM;
goto fail_clear;
}
diff --git a/include/linux/pstore.h b/include/linux/pstore.h
index a15bc4d48752..30fcec375a3a 100644
--- a/include/linux/pstore.h
+++ b/include/linux/pstore.h
@@ -90,7 +90,10 @@ struct pstore_record {
  *
  * @buf_lock:  spinlock to serialize access to @buf
  * @buf:   preallocated crash dump buffer
- * @bufsize:   size of @buf available for crash dump writes
+ * @bufsize:   size of @buf available for crash dump bytes (must match
+ * smallest number of bytes available for writing to a
+ * backend entry, since compressed bytes don't take kindly
+ * to being truncated)
  *
  * @read_mutex:serializes @open, @read, @close, and @erase callbacks
  * @flags: bitfield of frontends the backend can accept writes for
-- 
2.17.1



[PATCH -next v2 07/12] pstore/ram: Add kern-doc for struct persistent_ram_zone

2018-11-29 Thread Kees Cook
The struct persistent_ram_zone wasn't well documented. This adds kern-doc
for it.

Signed-off-by: Kees Cook 
---
 fs/pstore/ram_core.c   | 10 +
 include/linux/pstore_ram.h | 46 +++---
 2 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
index 62830734deee..3e9e3ba4fb07 100644
--- a/fs/pstore/ram_core.c
+++ b/fs/pstore/ram_core.c
@@ -29,6 +29,16 @@
 #include 
 #include 
 
+/**
+ * struct persistent_ram_buffer - persistent circular RAM buffer
+ *
+ * @sig:
+ * signature to indicate header (PERSISTENT_RAM_SIG xor PRZ-type value)
+ * @start:
+ * offset into @data where the beginning of the stored bytes begin
+ * @size:
+ * number of valid bytes stored in @data
+ */
 struct persistent_ram_buffer {
uint32_tsig;
atomic_tstart;
diff --git a/include/linux/pstore_ram.h b/include/linux/pstore_ram.h
index 6e94980357d2..5d10ad51c1c4 100644
--- a/include/linux/pstore_ram.h
+++ b/include/linux/pstore_ram.h
@@ -30,6 +30,10 @@
  * PRZ_FLAG_NO_LOCK is used. For all other cases, locking is required.
  */
 #define PRZ_FLAG_NO_LOCK   BIT(0)
+/*
+ * If a PRZ should only have a single-boot lifetime, this marks it as
+ * getting wiped after its contents get copied out after boot.
+ */
 #define PRZ_FLAG_ZAP_OLD   BIT(1)
 
 struct persistent_ram_buffer;
@@ -43,17 +47,53 @@ struct persistent_ram_ecc_info {
uint16_t *par;
 };
 
+/**
+ * struct persistent_ram_zone - Details of a persistent RAM zone (PRZ)
+ *  used as a pstore backend
+ *
+ * @paddr: physical address of the mapped RAM area
+ * @size:  size of mapping
+ * @label: unique name of this PRZ
+ * @flags: holds PRZ_FLAGS_* bits
+ *
+ * @buffer_lock:
+ * locks access to @buffer "size" bytes and "start" offset
+ * @buffer:
+ * pointer to actual RAM area managed by this PRZ
+ * @buffer_size:
+ * bytes in @buffer->data (not including any trailing ECC bytes)
+ *
+ * @par_buffer:
+ * pointer into @buffer->data containing ECC bytes for @buffer->data
+ * @par_header:
+ * pointer into @buffer->data containing ECC bytes for @buffer header
+ * (i.e. all fields up to @data)
+ * @rs_decoder:
+ * RSLIB instance for doing ECC calculations
+ * @corrected_bytes:
+ * ECC corrected bytes accounting since boot
+ * @bad_blocks:
+ * ECC uncorrectable bytes accounting since boot
+ * @ecc_info:
+ * ECC configuration details
+ *
+ * @old_log:
+ * saved copy of @buffer->data prior to most recent wipe
+ * @old_log_size:
+ * bytes contained in @old_log
+ *
+ */
 struct persistent_ram_zone {
phys_addr_t paddr;
size_t size;
void *vaddr;
char *label;
-   struct persistent_ram_buffer *buffer;
-   size_t buffer_size;
u32 flags;
+
raw_spinlock_t buffer_lock;
+   struct persistent_ram_buffer *buffer;
+   size_t buffer_size;
 
-   /* ECC correction */
char *par_buffer;
char *par_header;
struct rs_control *rs_decoder;
-- 
2.17.1



[PATCH -next v2 05/12] pstore/ram: Standardize module name in ramoops

2018-11-29 Thread Kees Cook
With both ram.c and ram_core.c built into ramoops.ko, it doesn't make
sense to have differing pr_fmt prefixes. This fixes ram_core.c to use
the module name (as ram.c already does). Additionally improves region
reservation error to include the region name.

Signed-off-by: Kees Cook 
---
 fs/pstore/ram_core.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
index 23ca6f2c98a0..f5d0173901aa 100644
--- a/fs/pstore/ram_core.c
+++ b/fs/pstore/ram_core.c
@@ -12,7 +12,7 @@
  *
  */
 
-#define pr_fmt(fmt) "persistent_ram: " fmt
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include 
 #include 
@@ -443,7 +443,8 @@ static void *persistent_ram_iomap(phys_addr_t start, size_t 
size,
void *va;
 
if (!request_mem_region(start, size, label ?: "ramoops")) {
-   pr_err("request mem region (0x%llx@0x%llx) failed\n",
+   pr_err("request mem region (%s 0x%llx@0x%llx) failed\n",
+   label ?: "ramoops",
(unsigned long long)size, (unsigned long long)start);
return NULL;
}
-- 
2.17.1



[PATCH -next v2 08/12] pstore: Improve and update some comments and status output

2018-11-29 Thread Kees Cook
This improves and updates some comments:
 - dump handler comment out of sync from calling convention
 - fix kern-doc typo

and improves status output:
 - reminder that only kernel crash dumps are compressed
 - do not be silent about ECC infrastructure failures

Signed-off-by: Kees Cook 
---
 fs/pstore/platform.c   | 7 +++
 fs/pstore/ram_core.c   | 4 +++-
 include/linux/pstore.h | 2 +-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index a75756c48e10..32340e7dd6a5 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -304,7 +304,7 @@ static void allocate_buf_for_compression(void)
big_oops_buf_sz = size;
big_oops_buf = buf;
 
-   pr_info("Using compression: %s\n", zbackend->name);
+   pr_info("Using crash dump compression: %s\n", zbackend->name);
 }
 
 static void free_buf_for_compression(void)
@@ -354,9 +354,8 @@ void pstore_record_init(struct pstore_record *record,
 }
 
 /*
- * callback from kmsg_dump. (s2,l2) has the most recently
- * written bytes, older bytes are in (s1,l1). Save as much
- * as we can from the end of the buffer.
+ * callback from kmsg_dump. Save as much as we can (up to kmsg_bytes) from the
+ * end of the buffer.
  */
 static void pstore_dump(struct kmsg_dumper *dumper,
enum kmsg_dump_reason reason)
diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
index 3e9e3ba4fb07..e6375439c5ac 100644
--- a/fs/pstore/ram_core.c
+++ b/fs/pstore/ram_core.c
@@ -503,8 +503,10 @@ static int persistent_ram_post_init(struct 
persistent_ram_zone *prz, u32 sig,
bool zap = !!(prz->flags & PRZ_FLAG_ZAP_OLD);
 
ret = persistent_ram_init_ecc(prz, ecc_info);
-   if (ret)
+   if (ret) {
+   pr_warn("ECC failed %s\n", prz->label);
return ret;
+   }
 
sig ^= PERSISTENT_RAM_SIG;
 
diff --git a/include/linux/pstore.h b/include/linux/pstore.h
index 30fcec375a3a..81669aa80027 100644
--- a/include/linux/pstore.h
+++ b/include/linux/pstore.h
@@ -85,7 +85,7 @@ struct pstore_record {
 /**
  * struct pstore_info - backend pstore driver structure
  *
- * @owner: module which is repsonsible for this backend driver
+ * @owner: module which is responsible for this backend driver
  * @name:  name of the backend driver
  *
  * @buf_lock:  spinlock to serialize access to @buf
-- 
2.17.1



[PATCH -next v2 06/12] pstore/ram: Report backend assignments with finer granularity

2018-11-29 Thread Kees Cook
In order to more easily perform automated regression testing, this
adds pr_debug() calls to report each prz allocation which can then be
verified against persistent storage. Specifically, seeing the dividing
line between header, data, any ECC bytes. (And the general assignment
output is updated to remove the bogus ECC blocksize which isn't actually
recorded outside the prz instance.)

Signed-off-by: Kees Cook 
---
 fs/pstore/ram.c  | 4 ++--
 fs/pstore/ram_core.c | 6 ++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index 768759841491..10ac4d23c423 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -853,9 +853,9 @@ static int ramoops_probe(struct platform_device *pdev)
ramoops_pmsg_size = pdata->pmsg_size;
ramoops_ftrace_size = pdata->ftrace_size;
 
-   pr_info("attached 0x%lx@0x%llx, ecc: %d/%d\n",
+   pr_info("using 0x%lx@0x%llx, ecc: %d\n",
cxt->size, (unsigned long long)cxt->phys_addr,
-   cxt->ecc_info.ecc_size, cxt->ecc_info.block_size);
+   cxt->ecc_info.ecc_size);
 
return 0;
 
diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
index f5d0173901aa..62830734deee 100644
--- a/fs/pstore/ram_core.c
+++ b/fs/pstore/ram_core.c
@@ -576,6 +576,12 @@ struct persistent_ram_zone *persistent_ram_new(phys_addr_t 
start, size_t size,
if (ret)
goto err;
 
+   pr_debug("attached %s 0x%zx@0x%llx: %zu header, %zu data, %zu ecc 
(%d/%d)\n",
+   prz->label, prz->size, (unsigned long long)prz->paddr,
+   sizeof(*prz->buffer), prz->buffer_size,
+   prz->size - sizeof(*prz->buffer) - prz->buffer_size,
+   prz->ecc_info.ecc_size, prz->ecc_info.block_size);
+
return prz;
 err:
persistent_ram_free(prz);
-- 
2.17.1



[PATCH -next v2 10/12] pstore: Map PSTORE_TYPE_* to strings

2018-11-29 Thread Kees Cook
From: "Joel Fernandes (Google)" 

In later patches we will need to map types to names, so create a
constant table for that which can also be used in different parts of
old and new code. This saves the type in the PRZ which will be useful
in later patches.

Instead of having an explicit PSTORE_TYPE_UNKNOWN, just use ..._MAX.

This includes removing the now redundant filename templates which can use
a single format string. Also, there's no reason to limit the "is it still
compressed?" test to only PSTORE_TYPE_DMESG when building the pstorefs
filename. Records are zero-initialized, so a backend would need to have
explicitly set compressed=1.

Signed-off-by: Joel Fernandes (Google) 
Co-developed-by: Kees Cook 
Signed-off-by: Kees Cook 
---
 drivers/acpi/apei/erst.c   |  2 +-
 fs/pstore/inode.c  | 51 +++---
 fs/pstore/platform.c   | 37 +++
 fs/pstore/ram.c|  4 ++-
 include/linux/pstore.h | 17 ++---
 include/linux/pstore_ram.h |  3 +++
 6 files changed, 62 insertions(+), 52 deletions(-)

diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c
index 3c5ea7cb693e..a5e1d963208e 100644
--- a/drivers/acpi/apei/erst.c
+++ b/drivers/acpi/apei/erst.c
@@ -1035,7 +1035,7 @@ static ssize_t erst_reader(struct pstore_record *record)
 CPER_SECTION_TYPE_MCE) == 0)
record->type = PSTORE_TYPE_MCE;
else
-   record->type = PSTORE_TYPE_UNKNOWN;
+   record->type = PSTORE_TYPE_MAX;
 
if (rcd->hdr.validation_bits & CPER_VALID_TIMESTAMP)
record->time.tv_sec = rcd->hdr.timestamp;
diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
index 8cf2218b46a7..c60ee46f3e39 100644
--- a/fs/pstore/inode.c
+++ b/fs/pstore/inode.c
@@ -335,53 +335,10 @@ int pstore_mkfile(struct dentry *root, struct 
pstore_record *record)
goto fail_alloc;
private->record = record;
 
-   switch (record->type) {
-   case PSTORE_TYPE_DMESG:
-   scnprintf(name, sizeof(name), "dmesg-%s-%llu%s",
- record->psi->name, record->id,
- record->compressed ? ".enc.z" : "");
-   break;
-   case PSTORE_TYPE_CONSOLE:
-   scnprintf(name, sizeof(name), "console-%s-%llu",
- record->psi->name, record->id);
-   break;
-   case PSTORE_TYPE_FTRACE:
-   scnprintf(name, sizeof(name), "ftrace-%s-%llu",
- record->psi->name, record->id);
-   break;
-   case PSTORE_TYPE_MCE:
-   scnprintf(name, sizeof(name), "mce-%s-%llu",
- record->psi->name, record->id);
-   break;
-   case PSTORE_TYPE_PPC_RTAS:
-   scnprintf(name, sizeof(name), "rtas-%s-%llu",
- record->psi->name, record->id);
-   break;
-   case PSTORE_TYPE_PPC_OF:
-   scnprintf(name, sizeof(name), "powerpc-ofw-%s-%llu",
- record->psi->name, record->id);
-   break;
-   case PSTORE_TYPE_PPC_COMMON:
-   scnprintf(name, sizeof(name), "powerpc-common-%s-%llu",
- record->psi->name, record->id);
-   break;
-   case PSTORE_TYPE_PMSG:
-   scnprintf(name, sizeof(name), "pmsg-%s-%llu",
- record->psi->name, record->id);
-   break;
-   case PSTORE_TYPE_PPC_OPAL:
-   scnprintf(name, sizeof(name), "powerpc-opal-%s-%llu",
- record->psi->name, record->id);
-   break;
-   case PSTORE_TYPE_UNKNOWN:
-   scnprintf(name, sizeof(name), "unknown-%s-%llu",
- record->psi->name, record->id);
-   break;
-   default:
-   scnprintf(name, sizeof(name), "type%d-%s-%llu",
- record->type, record->psi->name, record->id);
-   break;
-   }
+   scnprintf(name, sizeof(name), "%s-%s-%llu%s",
+   pstore_type_to_name(record->type),
+   record->psi->name, record->id,
+   record->compressed ? ".enc.z" : "");
 
dentry = d_alloc_name(root, name);
if (!dentry)
diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index 32340e7dd6a5..2387cb74f729 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -59,6 +59,19 @@ MODULE_PARM_DESC(update_ms, "milliseconds before pstore 
updates its content "
 "enabling this option is not

[PATCH -next v2 03/12] pstore: Remove needless lock during console writes

2018-11-29 Thread Kees Cook
Since the console writer does not use the preallocated crash dump buffer
any more, there is no reason to perform locking around it.

Fixes: 70ad35db3321 ("pstore: Convert console write to use ->write_buf")
Signed-off-by: Kees Cook 
Reviewed-by: Joel Fernandes (Google) 
---
 fs/pstore/platform.c | 29 ++---
 1 file changed, 6 insertions(+), 23 deletions(-)

diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index 8b6028948cf3..a75756c48e10 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -462,31 +462,14 @@ static void pstore_unregister_kmsg(void)
 #ifdef CONFIG_PSTORE_CONSOLE
 static void pstore_console_write(struct console *con, const char *s, unsigned 
c)
 {
-   const char *e = s + c;
+   struct pstore_record record;
 
-   while (s < e) {
-   struct pstore_record record;
-   unsigned long flags;
-
-   pstore_record_init(, psinfo);
-   record.type = PSTORE_TYPE_CONSOLE;
-
-   if (c > psinfo->bufsize)
-   c = psinfo->bufsize;
+   pstore_record_init(, psinfo);
+   record.type = PSTORE_TYPE_CONSOLE;
 
-   if (oops_in_progress) {
-   if (!spin_trylock_irqsave(>buf_lock, flags))
-   break;
-   } else {
-   spin_lock_irqsave(>buf_lock, flags);
-   }
-   record.buf = (char *)s;
-   record.size = c;
-   psinfo->write();
-   spin_unlock_irqrestore(>buf_lock, flags);
-   s += c;
-   c = e - s;
-   }
+   record.buf = (char *)s;
+   record.size = c;
+   psinfo->write();
 }
 
 static struct console pstore_console = {
-- 
2.17.1



[PATCH -next v2 02/12] pstore: Do not use crash buffer for decompression

2018-11-29 Thread Kees Cook
The pre-allocated compression buffer used for crash dumping was also
being used for decompression. This isn't technically safe, since it's
possible the kernel may attempt a crashdump while pstore is populating the
pstore filesystem (and performing decompression). Instead, just allocate
a separate buffer for decompression. Correctness is preferred over
performance here.

Signed-off-by: Kees Cook 
---
 fs/pstore/platform.c | 56 
 1 file changed, 25 insertions(+), 31 deletions(-)

diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index b821054ca3ed..8b6028948cf3 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -258,20 +258,6 @@ static int pstore_compress(const void *in, void *out,
return outlen;
 }
 
-static int pstore_decompress(void *in, void *out,
-unsigned int inlen, unsigned int outlen)
-{
-   int ret;
-
-   ret = crypto_comp_decompress(tfm, in, inlen, out, );
-   if (ret) {
-   pr_err("crypto_comp_decompress failed, ret = %d!\n", ret);
-   return ret;
-   }
-
-   return outlen;
-}
-
 static void allocate_buf_for_compression(void)
 {
struct crypto_comp *ctx;
@@ -656,8 +642,9 @@ EXPORT_SYMBOL_GPL(pstore_unregister);
 
 static void decompress_record(struct pstore_record *record)
 {
+   int ret;
int unzipped_len;
-   char *decompressed;
+   char *unzipped, *workspace;
 
if (!record->compressed)
return;
@@ -668,35 +655,42 @@ static void decompress_record(struct pstore_record 
*record)
return;
}
 
-   /* No compression method has created the common buffer. */
+   /* Missing compression buffer means compression was not initialized. */
if (!big_oops_buf) {
-   pr_warn("no decompression buffer allocated\n");
+   pr_warn("no decompression method initialized!\n");
return;
}
 
-   unzipped_len = pstore_decompress(record->buf, big_oops_buf,
-record->size, big_oops_buf_sz);
-   if (unzipped_len <= 0) {
-   pr_err("decompression failed: %d\n", unzipped_len);
+   /* Allocate enough space to hold max decompression and ECC. */
+   unzipped_len = big_oops_buf_sz;
+   workspace = kmalloc(unzipped_len + record->ecc_notice_size,
+   GFP_KERNEL);
+   if (!workspace)
return;
-   }
 
-   /* Build new buffer for decompressed contents. */
-   decompressed = kmalloc(unzipped_len + record->ecc_notice_size,
-  GFP_KERNEL);
-   if (!decompressed) {
-   pr_err("decompression ran out of memory\n");
+   /* After decompression "unzipped_len" is almost certainly smaller. */
+   ret = crypto_comp_decompress(tfm, record->buf, record->size,
+ workspace, _len);
+   if (ret) {
+   pr_err("crypto_comp_decompress failed, ret = %d!\n", ret);
+   kfree(workspace);
return;
}
-   memcpy(decompressed, big_oops_buf, unzipped_len);
 
/* Append ECC notice to decompressed buffer. */
-   memcpy(decompressed + unzipped_len, record->buf + record->size,
+   memcpy(workspace + unzipped_len, record->buf + record->size,
   record->ecc_notice_size);
 
-   /* Swap out compresed contents with decompressed contents. */
+   /* Copy decompressed contents into an minimum-sized allocation. */
+   unzipped = kmemdup(workspace, unzipped_len + record->ecc_notice_size,
+  GFP_KERNEL);
+   kfree(workspace);
+   if (!unzipped)
+   return;
+
+   /* Swap out compressed contents with decompressed contents. */
kfree(record->buf);
-   record->buf = decompressed;
+   record->buf = unzipped;
record->size = unzipped_len;
record->compressed = false;
 }
-- 
2.17.1



[PATCH -next v2 11/12] pstore/ram: Simplify ramoops_get_next_prz() arguments

2018-11-29 Thread Kees Cook
From: "Joel Fernandes (Google)" 

(1) remove type argument from ramoops_get_next_prz()

Since we store the type of the prz when we initialize it, we no longer
need to pass it again in ramoops_get_next_prz() since we can just use
that to setup the pstore record. So lets remove it from the argument list.

(2) remove max argument from ramoops_get_next_prz()

Looking at the code flow, the 'max' checks are already being done on
the prz passed to ramoops_get_next_prz(). Lets remove it to simplify
this function and reduce its arguments.

(3) further reduce ramoops_get_next_prz() arguments by passing record

Both the id and type fields of a pstore_record are set by
ramoops_get_next_prz(). So we can just pass a pointer to the pstore_record
instead of passing individual elements. This results in cleaner more
readable code and fewer lines.

In addition lets also remove the 'update' argument since we can detect
that. Changes are squashed into a single patch to reduce fixup conflicts.

Signed-off-by: Joel Fernandes (Google) 
Signed-off-by: Kees Cook 
---
 fs/pstore/ram.c | 48 ++--
 1 file changed, 18 insertions(+), 30 deletions(-)

diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index b174d0fc009f..202eaa82bcc6 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -124,19 +124,17 @@ static int ramoops_pstore_open(struct pstore_info *psi)
 }
 
 static struct persistent_ram_zone *
-ramoops_get_next_prz(struct persistent_ram_zone *przs[], uint *c, uint max,
-u64 *id,
-enum pstore_type_id *typep, enum pstore_type_id type,
-bool update)
+ramoops_get_next_prz(struct persistent_ram_zone *przs[], int id,
+struct pstore_record *record)
 {
struct persistent_ram_zone *prz;
-   int i = (*c)++;
+   bool update = (record->type == PSTORE_TYPE_DMESG);
 
/* Give up if we never existed or have hit the end. */
-   if (!przs || i >= max)
+   if (!przs)
return NULL;
 
-   prz = przs[i];
+   prz = przs[id];
if (!prz)
return NULL;
 
@@ -147,8 +145,8 @@ ramoops_get_next_prz(struct persistent_ram_zone *przs[], 
uint *c, uint max,
if (!persistent_ram_old_size(prz))
return NULL;
 
-   *typep = type;
-   *id = i;
+   record->type = prz->type;
+   record->id = id;
 
return prz;
 }
@@ -255,10 +253,8 @@ static ssize_t ramoops_pstore_read(struct pstore_record 
*record)
 
/* Find the next valid persistent_ram_zone for DMESG */
while (cxt->dump_read_cnt < cxt->max_dump_cnt && !prz) {
-   prz = ramoops_get_next_prz(cxt->dprzs, >dump_read_cnt,
-  cxt->max_dump_cnt, >id,
-  >type,
-  PSTORE_TYPE_DMESG, 1);
+   prz = ramoops_get_next_prz(cxt->dprzs, cxt->dump_read_cnt++,
+  record);
if (!prz_ok(prz))
continue;
header_length = ramoops_read_kmsg_hdr(persistent_ram_old(prz),
@@ -272,22 +268,18 @@ static ssize_t ramoops_pstore_read(struct pstore_record 
*record)
}
}
 
-   if (!prz_ok(prz))
-   prz = ramoops_get_next_prz(>cprz, >console_read_cnt,
-  1, >id, >type,
-  PSTORE_TYPE_CONSOLE, 0);
+   if (!prz_ok(prz) && !cxt->console_read_cnt++)
+   prz = ramoops_get_next_prz(>cprz, 0 /* single */, record);
 
-   if (!prz_ok(prz))
-   prz = ramoops_get_next_prz(>mprz, >pmsg_read_cnt,
-  1, >id, >type,
-  PSTORE_TYPE_PMSG, 0);
+   if (!prz_ok(prz) && !cxt->pmsg_read_cnt++)
+   prz = ramoops_get_next_prz(>mprz, 0 /* single */, record);
 
/* ftrace is last since it may want to dynamically allocate memory. */
if (!prz_ok(prz)) {
-   if (!(cxt->flags & RAMOOPS_FLAG_FTRACE_PER_CPU)) {
-   prz = ramoops_get_next_prz(cxt->fprzs,
-   >ftrace_read_cnt, 1, >id,
-   >type, PSTORE_TYPE_FTRACE, 0);
+   if (!(cxt->flags & RAMOOPS_FLAG_FTRACE_PER_CPU) &&
+   !cxt->ftrace_read_cnt++) {
+   prz = ramoops_get_next_prz(cxt->fprzs, 0 /* single */,
+  record);
} else {
/*
 * Build a new dummy record which combines all the
@@ -303,11 +295,7 @@ static ssize_t ramoops_pstore_read(struct pstore_record 
*record

[PATCH -next v2 12/12] pstore/ram: Do not treat empty buffers as valid

2018-11-29 Thread Kees Cook
From: "Joel Fernandes (Google)" 

The ramoops backend currently calls persistent_ram_save_old() even
if a buffer is empty. While this appears to work, it is does not seem
like the right thing to do and could lead to future bugs so lets avoid
that. It also prevents misleading prints in the logs which claim the
buffer is valid.

I got something like:

found existing buffer, size 0, start 0

When I was expecting:

no valid data in buffer (sig = ...)

This bails out early (and reports with pr_debug()), since it's an
acceptable state.

Signed-off-by: Joel Fernandes (Google) 
Co-developed-by: Kees Cook 
Signed-off-by: Kees Cook 
---
 fs/pstore/ram_core.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
index e6375439c5ac..c11711c2cc83 100644
--- a/fs/pstore/ram_core.c
+++ b/fs/pstore/ram_core.c
@@ -511,6 +511,11 @@ static int persistent_ram_post_init(struct 
persistent_ram_zone *prz, u32 sig,
sig ^= PERSISTENT_RAM_SIG;
 
if (prz->buffer->sig == sig) {
+   if (buffer_size(prz) == 0) {
+   pr_debug("found existing empty buffer\n");
+   return 0;
+   }
+
if (buffer_size(prz) > prz->buffer_size ||
buffer_start(prz) > buffer_size(prz)) {
pr_info("found existing invalid buffer, size %zu, start 
%zu\n",
-- 
2.17.1



[PATCH -next v2 09/12] pstore: Replace open-coded << with BIT()

2018-11-29 Thread Kees Cook
Minor clean-up to use BIT() (as already done in pstore_ram.h).

Signed-off-by: Kees Cook 
---
 include/linux/pstore.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/pstore.h b/include/linux/pstore.h
index 81669aa80027..f46e5df76b58 100644
--- a/include/linux/pstore.h
+++ b/include/linux/pstore.h
@@ -192,10 +192,10 @@ struct pstore_info {
 };
 
 /* Supported frontends */
-#define PSTORE_FLAGS_DMESG (1 << 0)
-#define PSTORE_FLAGS_CONSOLE   (1 << 1)
-#define PSTORE_FLAGS_FTRACE(1 << 2)
-#define PSTORE_FLAGS_PMSG  (1 << 3)
+#define PSTORE_FLAGS_DMESG BIT(0)
+#define PSTORE_FLAGS_CONSOLE   BIT(1)
+#define PSTORE_FLAGS_FTRACEBIT(2)
+#define PSTORE_FLAGS_PMSG  BIT(3)
 
 extern int pstore_register(struct pstore_info *);
 extern void pstore_unregister(struct pstore_info *);
-- 
2.17.1



[PATCH -next v2 00/12] pstore: various clean-ups

2018-11-29 Thread Kees Cook
This is a collection of clean-ups from Joel Fernandes, Peng Wang, and
myself. I wanted to put the entire series up for review since they've
changed a bit from their earlier incarnations. I intend to push them to
next shortly...

Thanks!

-Kees


Joel Fernandes (Google) (3):
  pstore: Map PSTORE_TYPE_* to strings
  pstore/ram: Simplify ramoops_get_next_prz() arguments
  pstore/ram: Do not treat empty buffers as valid

Kees Cook (8):
  pstore/ram: Correctly calculate usable PRZ bytes
  pstore: Do not use crash buffer for decompression
  pstore: Remove needless lock during console writes
  pstore/ram: Standardize module name in ramoops
  pstore/ram: Report backend assignments with finer granularity
  pstore/ram: Add kern-doc for struct persistent_ram_zone
  pstore: Improve and update some comments and status output
  pstore: Replace open-coded << with BIT()

Peng Wang (1):
  pstore: Avoid duplicate call of persistent_ram_zap()

 drivers/acpi/apei/erst.c   |   2 +-
 fs/pstore/inode.c  |  51 ++-
 fs/pstore/platform.c   | 129 -
 fs/pstore/ram.c|  75 +
 fs/pstore/ram_core.c   |  45 ++---
 include/linux/pstore.h |  32 ++---
 include/linux/pstore_ram.h |  50 +-
 7 files changed, 212 insertions(+), 172 deletions(-)

-- 
2.17.1



Re: [PATCH 2/8] pstore: Do not use crash buffer for decompression

2018-11-29 Thread Kees Cook
On Tue, Nov 13, 2018 at 11:56 PM Kees Cook  wrote:
> On Fri, Nov 2, 2018 at 1:24 PM, Joel Fernandes  wrote:
> > On Thu, Nov 01, 2018 at 04:51:54PM -0700, Kees Cook wrote:
> >> + workspace = kmalloc(unzipped_len + record->ecc_notice_size,
> >
> > Should tihs be unzipped_len + record->ecc_notice_size + 1. The extra byte
> > being for the NULL character of the ecc notice?
> >
> > This occurred to me when I saw the + 1 in ram.c. It could be better to just
> > abstract the size as a macro.
>
> Ooh, yes, good catch. I'll get this fixed.

I spent more time looking at this, and it seems that only the initial
creation of this string needs the +1, since all other operations are
byte-based not NUL-terminated string based. It's a big odd, and I
might try to clean it up differently, but as it stands, this is okay.
(See inode.c which doesn't include the trailing NUL byte.)

-Kees

-- 
Kees Cook


Re: [PATCH] x86/mm/dump_pagetables: Change to use DEFINE_SHOW_ATTRIBUTE macro

2018-11-29 Thread Kees Cook
On Wed, Nov 28, 2018 at 10:45 AM Dave Hansen  wrote:
>
> On 11/27/18 2:50 PM, Kees Cook wrote:
> > On Mon, Nov 19, 2018 at 9:06 AM, Dave Hansen  wrote:
> >> On 11/19/18 7:43 AM, Yangtao Li wrote:
> >>> -static const struct file_operations ptdump_curusr_fops = {
> >>> - .owner  = THIS_MODULE,
> >>> - .open   = ptdump_open_curusr,
> >>> - .read   = seq_read,
> >>> - .llseek = seq_lseek,
> >>> - .release= single_release,
> >>> -};
> >>> +DEFINE_SHOW_ATTRIBUTE(ptdump_curusr);
> >>
> >> FWIW, I rather dislike this conversion and the DEFINE_SHOW_ATTRIBUTE()
> >> approach in general.  It makes it basically impossible to go from
> >> ptdump_curusr to ptdump_open_curusr without opening up the macro and
> >> reverse-engineering it.
> >>
> >> My test is that for these macros to be sane, I need to be able to find
> >> "ptdump_open_curusr" by grepping for "ptdump_curusr".  This fails the test.
> >
> > Er, "ptdump_curusr" matches the generated name "ptdump_curusr_show",
> > is that what you mean?
>
> Ahh, I also missed some of the renames to make this OK, like:
>
> > -static int ptdump_show_efi(struct seq_file *m, void *v)
> > +static int ptdump_efi_show(struct seq_file *m, void *v)
>
> I thought there was some macro magic going on that screwed the names up.
>
> This looks fine to me.
>
> Reviewed-by: Dave Hansen 

Cool, thanks!

Reviewed-by: Kees Cook 

Ingo, can you toss this in -tip please?

-- 
Kees Cook


Re: [PATCH 1/1] sched/headers: fix thread_info. is overwritten by STACK_END_MAGIC

2018-11-29 Thread Kees Cook
On Tue, Nov 27, 2018 at 8:38 PM Wang, Dongsheng
 wrote:
>
> Hello Kees,
>
> On 2018/11/28 6:38, Kees Cook wrote:
> > On Thu, Nov 22, 2018 at 11:54 PM, Wang Dongsheng
> >  wrote:
> >> When select ARCH_TASK_STRUCT_ON_STACK the first of thread_info variable
> >> is overwritten by STACK_END_MAGIC. In fact, the ARCH_TASK_STRUCT_ON_STACK
> >> is not a real task on stack, it's only init_task on init_stack.
> >>
> >> Commit 0500871f21b2 ("Construct init thread stack in the linker script
> >> rather than by union") added this macro and put task_strcut into
> >> thread_union. This brings us the following possibilities:
> >> TASK_ON_STACKTHREAD_INFO_IN_TASKSTACK
> >> - <-- thread_info & stack
> >> NN | | --- <-- task
> >>| ||   |
> >> -  ---
> >>
> >> - <-- stack
> >> NY | | --- <-- 
> >> task(Including thread_info)
> >>| ||   |
> >> -  ---
> >>
> >> - <-- stack & task & 
> >> thread_info
> >> YN | |
> >>| |
> >> -
> >>
> >> - <-- stack & 
> >> task(Including thread_info)
> >> YY | |
> >>| |
> >> -
> >> The kernel has handled the first two cases correctly.
> >>
> >> For the third case:
> >> TASK_ON_STACK: Y. THREAD_INFO_IN_TASK: N. this case
> >> should never happen, because the task and thread_info will overlap. So
> >> when TASK_ON_STACK is selected, THREAD_INFO_IN_TASK must be selected too.
> >>
> >> For the fourth case:
> >> When task on stack, the end of stack should add a sizeof(task_struct) 
> >> offset.
> >>
> >> This patch handled with the third and fourth case.
> >>
> >> Fixes: 0500871f21b2 ("Construct init thread stack in the linker ...")
> >>
> >> Signed-off-by: Wang Dongsheng 
> >> Signed-off-by: Shunyong Yang 
> >> ---
> >>  arch/Kconfig | 1 +
> >>  include/linux/sched/task_stack.h | 5 -
> >>  2 files changed, 5 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/Kconfig b/arch/Kconfig
> >> index e1e540ffa979..0a2c73e73195 100644
> >> --- a/arch/Kconfig
> >> +++ b/arch/Kconfig
> >> @@ -251,6 +251,7 @@ config ARCH_HAS_SET_MEMORY
> >>  # Select if arch init_task must go in the __init_task_data section
> >>  config ARCH_TASK_STRUCT_ON_STACK
> >> bool
> >> +   depends on THREAD_INFO_IN_TASK || IA64
> > The "IA64" part shouldn't be needed since IA64 already selects it.
> >
> > Since it's selected, it also can't have a depends, IIUC.
>
> Since the IA64 thread_info including task_struct, it doesn't need to
> select THREAD_INFO_IN_TASK.
> So we need to allow IA64 select ARCH_TASK_STRUCT_ON_STACK without
> THREAD_INFO.

Okay.

> >>  # Select if arch has its private alloc_task_struct() function
> >>  config ARCH_TASK_STRUCT_ALLOCATOR
> >> diff --git a/include/linux/sched/task_stack.h 
> >> b/include/linux/sched/task_stack.h
> >> index 6a841929073f..624c48defb9e 100644
> >> --- a/include/linux/sched/task_stack.h
> >> +++ b/include/linux/sched/task_stack.h
> >> @@ -7,6 +7,7 @@
> >>   */
> >>
> >>  #include 
> >> +#include 
> >>  #include 
> >>
> >>  #ifdef CONFIG_THREAD_INFO_IN_TASK
> >> @@ -25,7 +26,9 @@ static inline void *task_stack_page(const struct 
> >> task_struct *task)
> >>
> >>  static inline unsigned long *end_of_stack(const struct task_struct *task)
> >>  {
> >> -   return task->stack;
> >> +   if (!IS_ENABLED(CONFIG_ARCH_TASK_STRUCT_ON_STACK) || task != 
> >> _task)
> >> +   return task->stac

Re: siginfo pid not populated from ptrace?

2018-11-29 Thread Kees Cook
On Tue, Nov 27, 2018 at 8:44 PM Eric W. Biederman  wrote:
>
> Kees Cook  writes:
>
> > On Tue, Nov 27, 2018 at 4:38 PM, Kees Cook  wrote:
> >> On Tue, Nov 27, 2018 at 3:21 PM, Tycho Andersen  wrote:
> >>> On Mon, Nov 12, 2018 at 12:24:43PM -0700, Tycho Andersen wrote:
> >>>> On Mon, Nov 12, 2018 at 11:55:38AM -0700, Tycho Andersen wrote:
> >>>> > I haven't manage to reproduce it on stock v4.20-rc2, unfortunately.
> >>>>
> >>>> Ok, now I have,
> >>>>
> >>>> seccomp_bpf.c:2736:global.syscall_restart:Expected getpid() (1493) == 
> >>>> info._sifields._kill.si_pid (0)
> >>>> global.syscall_restart: Test failed at step #22
> >>>
> >>> Seems like this is still happening on v4.20-rc4,
> >>>
> >>> [ RUN  ] global.syscall_restart
> >>> seccomp_bpf.c:2736:global.syscall_restart:Expected getpid() (1901) == 
> >>> info._sifields._kill.si_pid (0)
> >>> global.syscall_restart: Test failed at step #22
> >>
> >> This fails every time for me -- is it still racey for you?
> >>
> >> I'm attempting a bisect, hoping it doesn't _become_ racey for me. ;)
> >
> > This bisect to here for me:
> >
> > commit f149b31557446aff9ca96d4be7e39cc266f6e7cc
> > Author: Eric W. Biederman 
> > Date:   Mon Sep 3 09:50:36 2018 +0200
> >
> > signal: Never allocate siginfo for SIGKILL or SIGSTOP
> >
> > The SIGKILL and SIGSTOP signals are never delivered to userspace so
> > queued siginfo for these signals can never be observed.  Therefore
> > remove the chance of failure by never even attempting to allocate
> > siginfo in those cases.
> >
> > Reviewed-by: Thomas Gleixner 
> > Signed-off-by: "Eric W. Biederman" 
> >
> > They are certainly visible via seccomp ;)
>
> Well SIGSTOP is visible via PTRACE_GETSIGINFO.
>
> I see what is happening now.  Since we don't have queued siginfo
> we generate some as:
> /*
>  * Ok, it wasn't in the queue.  This must be
>  * a fast-pathed signal or we must have been
>  * out of queue space.  So zero out the info.
>  */
> clear_siginfo(info);
> info->si_signo = sig;
> info->si_errno = 0;
> info->si_code = SI_USER;
> info->si_pid = 0;
> info->si_uid = 0;
>
> Which allows last_signfo to be set,
> so despite not really having any siginfo PTRACE_GET_SIGINFO
> has something to return so does not return -EINVAL.
>
> Reconstructing my context that was part of removing SEND_SIG_FORCED
> so this looks like it will take a little more than a revert to fix
> this.
>
> This is definitely a change that is visible to user space.  The logic in
> my patch was definitely wrong with respect to SIGSTOP and
> PTRACE_GETSIGINFO.  Is there something in userspace that actually cares?
> AKA is the idiom that the test seccomp_bpf.c is using something that
> non-test code does?

I think this would be needed by any ptracer that handled multiple
threads. It needs to figure out which pid stopped. I think it's worth
fixing, yes.

> The change below should restore the old behavior.  I am just wondering
> if this is something we want to do.  siginfo is allocated with
> GFP_ATOMIC so if your machine is under memory pressure there is a real
> chance the allocation can fail.  Which would cause whatever is breaking
> now to break less deterministically then.

I think memory pressure that would block a 128 byte GFP_ATOMIC
allocation would mean the system was about to seriously fall over.
Given the user-facing behavior change and that an existing test was
already checking for this means we need to fix it.

> If we need to fix this do we need to make siginfo allocation more
> reliable?

I don't think so -- we'd already get a WARN() if allocation failed.

> Eric
>
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 4fd431ce4f91..5c34c55bfea4 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1057,10 +1057,10 @@ static int __send_signal(int sig, struct 
> kernel_siginfo *info, struct task_struc
>
> result = TRACE_SIGNAL_DELIVERED;
> /*
> -* Skip useless siginfo allocation for SIGKILL SIGSTOP,
> +* Skip useless siginfo allocation for SIGKILL,
>  * and kernel threads.
>  */
> -   if (sig_kernel_only(sig) || (t->flags & PF_KTHREAD))
> +   if ((sig == SIGKILL) || (t->flags & PF_KTHREAD))
> goto out_set;
>
> /*
>

This fixes it for me!

Reported-by: Tycho Andersen 
Tested-by: Kees Cook 
Fixes: f149b3155744 ("signal: Never allocate siginfo for SIGKILL or SIGSTOP")

Thanks!

-- 
Kees Cook


Re: siginfo pid not populated from ptrace?

2018-11-27 Thread Kees Cook
On Tue, Nov 27, 2018 at 4:38 PM, Kees Cook  wrote:
> On Tue, Nov 27, 2018 at 3:21 PM, Tycho Andersen  wrote:
>> On Mon, Nov 12, 2018 at 12:24:43PM -0700, Tycho Andersen wrote:
>>> On Mon, Nov 12, 2018 at 11:55:38AM -0700, Tycho Andersen wrote:
>>> > I haven't manage to reproduce it on stock v4.20-rc2, unfortunately.
>>>
>>> Ok, now I have,
>>>
>>> seccomp_bpf.c:2736:global.syscall_restart:Expected getpid() (1493) == 
>>> info._sifields._kill.si_pid (0)
>>> global.syscall_restart: Test failed at step #22
>>
>> Seems like this is still happening on v4.20-rc4,
>>
>> [ RUN  ] global.syscall_restart
>> seccomp_bpf.c:2736:global.syscall_restart:Expected getpid() (1901) == 
>> info._sifields._kill.si_pid (0)
>> global.syscall_restart: Test failed at step #22
>
> This fails every time for me -- is it still racey for you?
>
> I'm attempting a bisect, hoping it doesn't _become_ racey for me. ;)

This bisect to here for me:

commit f149b31557446aff9ca96d4be7e39cc266f6e7cc
Author: Eric W. Biederman 
Date:   Mon Sep 3 09:50:36 2018 +0200

signal: Never allocate siginfo for SIGKILL or SIGSTOP

The SIGKILL and SIGSTOP signals are never delivered to userspace so
queued siginfo for these signals can never be observed.  Therefore
remove the chance of failure by never even attempting to allocate
siginfo in those cases.

Reviewed-by: Thomas Gleixner 
Signed-off-by: "Eric W. Biederman" 

They are certainly visible via seccomp ;)


-- 
Kees Cook


Re: siginfo pid not populated from ptrace?

2018-11-27 Thread Kees Cook
On Tue, Nov 27, 2018 at 3:21 PM, Tycho Andersen  wrote:
> On Mon, Nov 12, 2018 at 12:24:43PM -0700, Tycho Andersen wrote:
>> On Mon, Nov 12, 2018 at 11:55:38AM -0700, Tycho Andersen wrote:
>> > I haven't manage to reproduce it on stock v4.20-rc2, unfortunately.
>>
>> Ok, now I have,
>>
>> seccomp_bpf.c:2736:global.syscall_restart:Expected getpid() (1493) == 
>> info._sifields._kill.si_pid (0)
>> global.syscall_restart: Test failed at step #22
>
> Seems like this is still happening on v4.20-rc4,
>
> [ RUN  ] global.syscall_restart
> seccomp_bpf.c:2736:global.syscall_restart:Expected getpid() (1901) == 
> info._sifields._kill.si_pid (0)
> global.syscall_restart: Test failed at step #22

This fails every time for me -- is it still racey for you?

I'm attempting a bisect, hoping it doesn't _become_ racey for me. ;)

-- 
Kees Cook


Re: [PATCH] x86/mm/dump_pagetables: Change to use DEFINE_SHOW_ATTRIBUTE macro

2018-11-27 Thread Kees Cook
On Mon, Nov 19, 2018 at 9:06 AM, Dave Hansen  wrote:
> On 11/19/18 7:43 AM, Yangtao Li wrote:
>> -static const struct file_operations ptdump_curusr_fops = {
>> - .owner  = THIS_MODULE,
>> - .open   = ptdump_open_curusr,
>> - .read   = seq_read,
>> - .llseek = seq_lseek,
>> - .release= single_release,
>> -};
>> +DEFINE_SHOW_ATTRIBUTE(ptdump_curusr);
>
> FWIW, I rather dislike this conversion and the DEFINE_SHOW_ATTRIBUTE()
> approach in general.  It makes it basically impossible to go from
> ptdump_curusr to ptdump_open_curusr without opening up the macro and
> reverse-engineering it.
>
> My test is that for these macros to be sane, I need to be able to find
> "ptdump_open_curusr" by grepping for "ptdump_curusr".  This fails the test.

Er, "ptdump_curusr" matches the generated name "ptdump_curusr_show",
is that what you mean?

> I don't think saving a few lines of code is worth the obfuscation.

This is the standard boilerplate for attributes, though. I'd be nice
to drop all the copy/pasted code...

-- 
Kees Cook


Re: [PATCH 1/1] sched/headers: fix thread_info. is overwritten by STACK_END_MAGIC

2018-11-27 Thread Kees Cook
On Thu, Nov 22, 2018 at 11:54 PM, Wang Dongsheng
 wrote:
> When select ARCH_TASK_STRUCT_ON_STACK the first of thread_info variable
> is overwritten by STACK_END_MAGIC. In fact, the ARCH_TASK_STRUCT_ON_STACK
> is not a real task on stack, it's only init_task on init_stack.
>
> Commit 0500871f21b2 ("Construct init thread stack in the linker script
> rather than by union") added this macro and put task_strcut into
> thread_union. This brings us the following possibilities:
> TASK_ON_STACKTHREAD_INFO_IN_TASKSTACK
> - <-- thread_info & stack
> NN | | --- <-- task
>| ||   |
> -  ---
>
> - <-- stack
> NY | | --- <-- task(Including 
> thread_info)
>| ||   |
> -  ---
>
> - <-- stack & task & 
> thread_info
> YN | |
>| |
> -
>
> - <-- stack & task(Including 
> thread_info)
> YY | |
>| |
> -
> The kernel has handled the first two cases correctly.
>
> For the third case:
> TASK_ON_STACK: Y. THREAD_INFO_IN_TASK: N. this case
> should never happen, because the task and thread_info will overlap. So
> when TASK_ON_STACK is selected, THREAD_INFO_IN_TASK must be selected too.
>
> For the fourth case:
> When task on stack, the end of stack should add a sizeof(task_struct) offset.
>
> This patch handled with the third and fourth case.
>
> Fixes: 0500871f21b2 ("Construct init thread stack in the linker ...")
>
> Signed-off-by: Wang Dongsheng 
> Signed-off-by: Shunyong Yang 
> ---
>  arch/Kconfig | 1 +
>  include/linux/sched/task_stack.h | 5 -
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index e1e540ffa979..0a2c73e73195 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -251,6 +251,7 @@ config ARCH_HAS_SET_MEMORY
>  # Select if arch init_task must go in the __init_task_data section
>  config ARCH_TASK_STRUCT_ON_STACK
> bool
> +   depends on THREAD_INFO_IN_TASK || IA64

The "IA64" part shouldn't be needed since IA64 already selects it.

Since it's selected, it also can't have a depends, IIUC.

>
>  # Select if arch has its private alloc_task_struct() function
>  config ARCH_TASK_STRUCT_ALLOCATOR
> diff --git a/include/linux/sched/task_stack.h 
> b/include/linux/sched/task_stack.h
> index 6a841929073f..624c48defb9e 100644
> --- a/include/linux/sched/task_stack.h
> +++ b/include/linux/sched/task_stack.h
> @@ -7,6 +7,7 @@
>   */
>
>  #include 
> +#include 
>  #include 
>
>  #ifdef CONFIG_THREAD_INFO_IN_TASK
> @@ -25,7 +26,9 @@ static inline void *task_stack_page(const struct 
> task_struct *task)
>
>  static inline unsigned long *end_of_stack(const struct task_struct *task)
>  {
> -   return task->stack;
> +   if (!IS_ENABLED(CONFIG_ARCH_TASK_STRUCT_ON_STACK) || task != 
> _task)
> +   return task->stack;
> +   return (unsigned long *)(task + 1);
>  }

This seems like a strange place for the change. It feels more like
init_task has been defined incorrectly.

-Kees

>
>  #elif !defined(__HAVE_THREAD_FUNCTIONS)
> --
> 2.19.1
>



-- 
Kees Cook


Re: [RFC PATCH v2] ptrace: add PTRACE_GET_SYSCALL_INFO request

2018-11-27 Thread Kees Cook
On Sat, Nov 24, 2018 at 8:10 PM, Dmitry V. Levin  wrote:
> On Fri, Nov 23, 2018 at 07:01:39AM +0300, Dmitry V. Levin wrote:
>> On Thu, Nov 22, 2018 at 04:19:10PM -0800, Andy Lutomirski wrote:
>> > On Thu, Nov 22, 2018 at 11:15 AM Dmitry V. Levin wrote:
>> > > On Thu, Nov 22, 2018 at 06:55:29AM -0800, Andy Lutomirski wrote:
>> > > > On Wed, Nov 21, 2018 at 3:56 PM Dmitry V. Levin wrote:
>> > > > > On Wed, Nov 21, 2018 at 02:56:57PM -0800, Andy Lutomirski wrote:
>> > > > > > Please cc linux-...@vger.kernel.org for future versions.
>> > > > > >
>> > > > > > On Wed, Nov 21, 2018 at 7:58 AM Elvira Khabirova wrote:
>> > > > > > >
>> > > > > > > struct ptrace_syscall_info {
>> > > > > > > __u8 op; /* 0 for entry, 1 for exit */
>> > > > > >
>> > > > > > Can you add proper defines, like:
>> > > > > >
>> > > > > > #define PTRACE_SYSCALL_ENTRY 0
>> > > > > > #define PTRACE_SYSCALL_EXIT 1
>> > > > > > #define PTRACE_SYSCALL_SECCOMP 2
>> > > > > >
>> > > > > > and make seccomp work from the start?  I'd rather we don't merge an
>> > > > > > implementation that doesn't work for seccomp and then have to 
>> > > > > > rework
>> > > > > > it later.

Yes, please.

>> > > > >
>> > > > > What's the difference between PTRACE_EVENT_SECCOMP and 
>> > > > > syscall-entry-stop
>> > > > > with regards to PTRACE_GET_SYSCALL_INFO request?  At least they have 
>> > > > > the
>> > > > > same entry_info to return.
>> > > >
>> > > > I'm not sure there's any material difference.
>> > >
>> > > In that case we don't really need PTRACE_SYSCALL_SECCOMP: op field
>> > > describes the structure inside the union to use, not the ptrace stop.
>> >
>> > Unless we think the structures might diverge in the future.

Yes, I want to make sure we have a way to expand this, especially for
seccomp: we've come close a few times to adding new fields to struct
seccomp_data, for example.

>>
>> If these structures ever diverge, then a seccomp structure will be added
>> to the union, and a portable userspace code will likely look this way:
>>
>> #include 
>> ...
>> struct ptrace_syscall_info info;
>> long rc = ptrace(PTRACE_GET_SYSCALL_INFO, pid, (void *) sizeof(info), );
>> ...
>> switch (info.op) {
>>   case PTRACE_SYSCALL_INFO_ENTRY:
>>   /* handle info.entry */
>>   case PTRACE_SYSCALL_INFO_EXIT:
>>   /* handle info.exit */
>> #ifdef PTRACE_SYSCALL_INFO_SECCOMP
>>   case PTRACE_SYSCALL_INFO_SECCOMP:
>>   /* handle info.seccomp */
>> #endif
>>   default:
>>   /* handle unknown info.op */
>> }
>>
>> In other words, it would be better if PTRACE_SYSCALL_INFO_* selector
>> constants were introduced along with corresponding structures in the
>> union.
>
> However, the approach I suggested doesn't provide forward compatibility:
> if userspace is compiled with kernel headers that don't define
> PTRACE_SYSCALL_INFO_SECCOMP, it will break when the kernel
> starts to use PTRACE_SYSCALL_INFO_SECCOMP instead of
> PTRACE_SYSCALL_INFO_ENTRY for PTRACE_EVENT_SECCOMP support
> in PTRACE_GET_SYSCALL_INFO.
>
> The solution is to introduce PTRACE_SYSCALL_INFO_SECCOMP and struct
> ptrace_syscall_info.seccomp along with PTRACE_EVENT_SECCOMP support
> in PTRACE_GET_SYSCALL_INFO.  The initial revision of the seccomp
> structure could be made the same as the entry structure, or it can
> diverge from the beginning, e.g., by adding ret_data field containing
> SECCOMP_RET_DATA return value stored in ptrace_message, this would save
> ptracers an extra PTRACE_GETEVENTMSG call currently required to obtain it.

Yup, that'd be a nice addition.

-- 
Kees Cook


Re: [PATCH 2/2] kbuild: descend into scripts/gcc-plugins/ via scripts/Makefile

2018-11-27 Thread Kees Cook
On Thu, Nov 22, 2018 at 8:51 PM, Masahiro Yamada
 wrote:
> Now that 'prepare0' depends on 'scripts', building GCC plugins can
> go into scripts/Makefile, which is a more standard way.
>
> Signed-off-by: Masahiro Yamada 

Reviewed-by: Kees Cook 

-Kees

> ---
>
>  Makefile | 2 +-
>  scripts/Makefile | 3 ++-
>  scripts/Makefile.gcc-plugins | 8 
>  3 files changed, 3 insertions(+), 10 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index cee4cec..a8bbe68 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1114,7 +1114,7 @@ macroprepare: prepare1 archmacros
>
>  archprepare: archheaders archscripts macroprepare scripts_basic
>
> -prepare0: scripts archprepare gcc-plugins
> +prepare0: scripts archprepare
> $(Q)$(MAKE) $(build)=scripts/mod
> $(Q)$(MAKE) $(build)=.
>
> diff --git a/scripts/Makefile b/scripts/Makefile
> index b48259d..feb1f71 100644
> --- a/scripts/Makefile
> +++ b/scripts/Makefile
> @@ -36,9 +36,10 @@ PHONY += build_unifdef
>  build_unifdef: $(obj)/unifdef
> @:
>
> +subdir-$(CONFIG_GCC_PLUGINS) += gcc-plugins
>  subdir-$(CONFIG_MODVERSIONS) += genksyms
>  subdir-$(CONFIG_SECURITY_SELINUX) += selinux
>  subdir-$(CONFIG_GDB_SCRIPTS) += gdb
>
>  # Let clean descend into subdirs
> -subdir-+= basic dtc kconfig mod package gcc-plugins
> +subdir-+= basic dtc kconfig mod package
> diff --git a/scripts/Makefile.gcc-plugins b/scripts/Makefile.gcc-plugins
> index 46c5c68..c36f199 100644
> --- a/scripts/Makefile.gcc-plugins
> +++ b/scripts/Makefile.gcc-plugins
> @@ -49,11 +49,3 @@ KBUILD_CFLAGS += $(GCC_PLUGINS_CFLAGS)
>  # All enabled GCC plugins are collected here for building below.
>  GCC_PLUGIN := $(gcc-plugin-y)
>  export GCC_PLUGIN
> -
> -# Actually do the build, if requested.
> -PHONY += gcc-plugins
> -gcc-plugins: scripts_basic
> -ifdef CONFIG_GCC_PLUGINS
> -   $(Q)$(MAKE) $(build)=scripts/gcc-plugins
> -endif
> -   @:
> --
> 2.7.4
>



-- 
Kees Cook


Re: [PATCH 2/2] selftests: firmware: add CONFIG_FW_LOADER_USER_HELPER_FALLBACK to config

2018-11-27 Thread Kees Cook
On Mon, Nov 26, 2018 at 7:12 PM, Dan Rue  wrote:
> CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y is required for fw_fallback.sh.
> Without it, fw_fallback.sh fails with 'usermode helper disabled so
> ignoring test'. Enable the config in selftest so that it gets built by
> default.
>
> Signed-off-by: Dan Rue 

Acked-by: Kees Cook 

-Kees

> ---
>  tools/testing/selftests/firmware/config | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/tools/testing/selftests/firmware/config 
> b/tools/testing/selftests/firmware/config
> index bf634dda0720..913a25a4a32b 100644
> --- a/tools/testing/selftests/firmware/config
> +++ b/tools/testing/selftests/firmware/config
> @@ -1,5 +1,6 @@
>  CONFIG_TEST_FIRMWARE=y
>  CONFIG_FW_LOADER=y
>  CONFIG_FW_LOADER_USER_HELPER=y
> +CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
>  CONFIG_IKCONFIG=y
>  CONFIG_IKCONFIG_PROC=y
> --
> 2.19.1
>



-- 
Kees Cook


Re: [PATCH 1/2] selftests: firmware: remove use of non-standard diff -Z option

2018-11-27 Thread Kees Cook
On Mon, Nov 26, 2018 at 7:12 PM, Dan Rue  wrote:
> diff -Z is used to trim the trailing whitespace when comparing the
> loaded firmware file with the source firmware file. However, per the
> comment in the source code, -Z should not be necessary. In testing, the
> input and output files are identical.
>
> Additionally, -Z is not a standard option and is not available in
> environments such as busybox. When -Z is not supported, diff fails with
> a usage error, which is suppressed, but then causes read_firmwares() to
> exit with a false failure message.
>
> Signed-off-by: Dan Rue 

Acked-by: Kees Cook 

-Kees

> ---
>  tools/testing/selftests/firmware/fw_filesystem.sh | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/tools/testing/selftests/firmware/fw_filesystem.sh 
> b/tools/testing/selftests/firmware/fw_filesystem.sh
> index a4320c4b44dc..466cf2f91ba0 100755
> --- a/tools/testing/selftests/firmware/fw_filesystem.sh
> +++ b/tools/testing/selftests/firmware/fw_filesystem.sh
> @@ -155,11 +155,8 @@ read_firmwares()
>  {
> for i in $(seq 0 3); do
> config_set_read_fw_idx $i
> -   # Verify the contents are what we expect.
> -   # -Z required for now -- check for yourself, md5sum
> -   # on $FW and DIR/read_firmware will yield the same. Even
> -   # cmp agrees, so something is off.
> -   if ! diff -q -Z "$FW" $DIR/read_firmware 2>/dev/null ; then
> +   # Verify the contents match
> +   if ! diff -q "$FW" $DIR/read_firmware 2>/dev/null ; then
> echo "request #$i: firmware was not loaded" >&2
> exit 1
> fi
> @@ -171,7 +168,7 @@ read_firmwares_expect_nofile()
> for i in $(seq 0 3); do
> config_set_read_fw_idx $i
> # Ensures contents differ
> -   if diff -q -Z "$FW" $DIR/read_firmware 2>/dev/null ; then
> +   if diff -q "$FW" $DIR/read_firmware 2>/dev/null ; then
> echo "request $i: file was not expected to match" >&2
> exit 1
> fi
> --
> 2.19.1
>



-- 
Kees Cook


Re: [PATCH v2] panic: Add options to print system info when panic happens

2018-11-27 Thread Kees Cook
On Mon, Nov 26, 2018 at 11:15 PM, Feng Tang  wrote:
> Kernel panic issues are always painful to debug, partially
> because it's not easy to get enough information of the
> context when panic happens.
>
> And we have ramoops and kdump for that, while this commit
> tries to provide a easier way to show the system info by adding
> a cmdline parameter, referring some idea from sysrq handler.
>
> Signed-off-by: Feng Tang 
> Cc: Thomas Gleixner 
> Cc: John Stultz 
> Cc: Ingo Molnar 
> Cc: Peter Zijlstra 
> Cc: Steven Rostedt 

Reviewed-by: Kees Cook 

-Kees


> ---
> Changelog:
>  v2:
> - change text "dump/DUMP" to "print/PRINT" which
>   is more accurate, suggested by Andrew Morton
> - add code to print ftrace buffer
>
>  Documentation/admin-guide/kernel-parameters.txt |  8 +++
>  kernel/panic.c  | 28 
> +
>  2 files changed, 36 insertions(+)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index 19f4423..80c819a 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3081,6 +3081,14 @@
> timeout < 0: reboot immediately
> Format: 
>
> +   panic_print=Bitmask for printing system info when panic happens.
> +   User can chose combination of the following bits:
> +   bit 0: print all tasks info
> +   bit 1: print system memory info
> +   bit 2: print timer info
> +   bit 3: print locks info if CONFIG_LOCKDEP is on
> +   bit 4: print ftrace buffer
> +
> panic_on_warn   panic() instead of WARN().  Useful to cause kdump
> on a WARN().
>
> diff --git a/kernel/panic.c b/kernel/panic.c
> index f6d549a..fb6ccd1 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -45,6 +45,13 @@ int panic_on_warn __read_mostly;
>  int panic_timeout = CONFIG_PANIC_TIMEOUT;
>  EXPORT_SYMBOL_GPL(panic_timeout);
>
> +#define PANIC_PRINT_TASK_INFO  0x0001
> +#define PANIC_PRINT_MEM_INFO   0x0002
> +#define PANIC_PRINT_TIMER_INFO 0x0004
> +#define PANIC_PRINT_LOCK_INFO  0x0008
> +#define PANIC_PRINT_FTRACE_INFO0x0010
> +static unsigned long panic_print;
> +
>  ATOMIC_NOTIFIER_HEAD(panic_notifier_list);
>
>  EXPORT_SYMBOL(panic_notifier_list);
> @@ -124,6 +131,24 @@ void nmi_panic(struct pt_regs *regs, const char *msg)
>  }
>  EXPORT_SYMBOL(nmi_panic);
>
> +static void panic_print_sys_info(void)
> +{
> +   if (panic_print & PANIC_PRINT_TASK_INFO)
> +   show_state();
> +
> +   if (panic_print & PANIC_PRINT_MEM_INFO)
> +   show_mem(0, NULL);
> +
> +   if (panic_print & PANIC_PRINT_TIMER_INFO)
> +   sysrq_timer_list_show();
> +
> +   if (panic_print & PANIC_PRINT_LOCK_INFO)
> +   debug_show_all_locks();
> +
> +   if (panic_print & PANIC_PRINT_FTRACE_INFO)
> +   ftrace_dump(DUMP_ALL);
> +}
> +
>  /**
>   * panic - halt the system
>   * @fmt: The text string to print
> @@ -250,6 +275,8 @@ void panic(const char *fmt, ...)
> debug_locks_off();
> console_flush_on_panic();
>
> +   panic_print_sys_info();
> +
> if (!panic_blink)
> panic_blink = no_blink;
>
> @@ -654,6 +681,7 @@ void refcount_error_report(struct pt_regs *regs, const 
> char *err)
>  #endif
>
>  core_param(panic, panic_timeout, int, 0644);
> +core_param(panic_print, panic_print, ulong, 0644);
>  core_param(pause_on_oops, pause_on_oops, int, 0644);
>  core_param(panic_on_warn, panic_on_warn, int, 0644);
>  core_param(crash_kexec_post_notifiers, crash_kexec_post_notifiers, bool, 
> 0644);
> --
> 2.7.4
>



-- 
Kees Cook


Re: linux-next: build warnings after merge of the cifs tree

2018-11-25 Thread Kees Cook
On Sun, Nov 25, 2018 at 4:52 PM Stephen Rothwell  wrote:
>
> Hi Steve,
>
> On Sun, 25 Nov 2018 18:31:40 -0600 Steve French  wrote:
> >
> > Both of those cases are intentional fallthroughs and there are
> > existing comments in the code noting the reasons for them to
> > fallthrough
> >
> > (also can see the reasoning for these in the commits which introduced
> > them from Sachin c369c9a4a7c82) and dde2356c84662)
>
> I am not questioning that :-)
>
> The gcc warning can be turned off by adding a /* fall through */
> comment at the point the fall through happens.  Kees and others are
> working on the several hundred other places that need annotating.

Right. The goal is to avoid adding any _new_ cases of this. :)

> This one just popped up.

It's already working! :) Thanks Stephen!

-Kees

-- 
Kees Cook


Re: [PATCH 1/2] x86/pkeys: copy pkey state at fork()

2018-11-20 Thread Kees Cook
On Fri, Oct 26, 2018 at 12:59 PM, Dave Hansen  wrote:
> On 10/26/18 12:51 PM, Dave Hansen wrote:
> ...
>> The result is that, after a fork(), the child's pkey state ends up
>> looking like it does after an execve(), which is totally wrong.  pkeys
>> that are already allocated can be allocated again, for instance.
>
> One thing I omitted.  This was very nicely discovered and reported by
> danielmi...@gmail.com.  Thanks, Daniel!

Thread ping. Is there a v2 of this, or can this go in as-is? Looks good to me:

Reviewed-by: Kees Cook 

-Kees

-- 
Kees Cook


Re: [PATCH 1/2] build_bug.h: remove negative-array fallback for BUILD_BUG_ON()

2018-11-16 Thread Kees Cook
On Fri, Nov 16, 2018 at 12:19 AM, Masahiro Yamada
 wrote:
> The kernel can only be compiled with an optimization option (-O2, -Os,
> or the currently proposed -Og). Hence, __OPTIMIZE__ is always defined
> in the kernel source.
>
> A fallback for -O0 case is just hypothetical and pointless. Moreover,
> commit 0bb95f80a38f ("Makefile: Globally enable VLA warning") enabled
> -Wvla warning. The use of variable length arrays is banned.
>
> Signed-off-by: Masahiro Yamada 

Acked-by: Kees Cook 

-Kees

> ---
>
>  include/linux/build_bug.h | 14 --
>  1 file changed, 14 deletions(-)
>
> diff --git a/include/linux/build_bug.h b/include/linux/build_bug.h
> index 43d1fd5..d415c64 100644
> --- a/include/linux/build_bug.h
> +++ b/include/linux/build_bug.h
> @@ -51,23 +51,9 @@
>   * If you have some code which relies on certain constants being equal, or
>   * some other compile-time-evaluated condition, you should use BUILD_BUG_ON 
> to
>   * detect if someone changes it.
> - *
> - * The implementation uses gcc's reluctance to create a negative array, but 
> gcc
> - * (as of 4.4) only emits that error for obvious cases (e.g. not arguments to
> - * inline functions).  Luckily, in 4.3 they added the "error" function
> - * attribute just for this type of case.  Thus, we use a negative sized array
> - * (should always create an error on gcc versions older than 4.4) and then 
> call
> - * an undefined function with the error attribute (should always create an
> - * error on gcc 4.3 and later).  If for some reason, neither creates a
> - * compile-time error, we'll still have a link-time error, which is harder to
> - * track down.
>   */
> -#ifndef __OPTIMIZE__
> -#define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
> -#else
>  #define BUILD_BUG_ON(condition) \
> BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
> -#endif
>
>  /**
>   * BUILD_BUG - break compile if used.
> --
> 2.7.4
>



-- 
Kees Cook


Re: [PATCH v2 2/2] build_bug.h: remove all dummy BUILD_BUG_ON stubs for sparse

2018-11-16 Thread Kees Cook
On Fri, Nov 16, 2018 at 12:27 AM, Masahiro Yamada
 wrote:
> The introduction of these dummy BUILD_BUG_ON stubs dates back to
> commit 903c0c7cdc21 ("sparse: define dummy BUILD_BUG_ON definition
> for sparse"). At that time, BUILD_BUG_ON() was implemented with the
> negative array trick, which Sparse complains about even if the
> condition can be optimized and evaluated to 0 at compile-time.
>
> With the previous commit, the leftover negative array trick is gone.
> Sparse is happy with the current BUILD_BUG_ON(), which is implemented
> by using the 'error' attribute.
>
> There might be a little room for argument about BUILD_BUG_ON_ZERO().
> Sparse reports 'invalid bitfield width, -1' for non-zero value,
> and 'bad integer constant expression' for non-constant value.
> This is the same criteria as GCC uses. So, if those Sparse errors
> occurred, they would cause errors for GCC as well. (Hence, such
> errors would have been detected by the normal compile test process.)
>
> Signed-off-by: Masahiro Yamada 

Acked-by: Kees Cook 

-Kees

> ---
>
> Changes in v2:
>  - Fix a coding style error (two consecutive blank lines)
>
>  include/linux/build_bug.h | 12 
>  1 file changed, 12 deletions(-)
>
> diff --git a/include/linux/build_bug.h b/include/linux/build_bug.h
> index d415c64..6625c88 100644
> --- a/include/linux/build_bug.h
> +++ b/include/linux/build_bug.h
> @@ -4,16 +4,6 @@
>
>  #include 
>
> -#ifdef __CHECKER__
> -#define __BUILD_BUG_ON_NOT_POWER_OF_2(n) (0)
> -#define BUILD_BUG_ON_NOT_POWER_OF_2(n) (0)
> -#define BUILD_BUG_ON_ZERO(e) (0)
> -#define BUILD_BUG_ON_INVALID(e) (0)
> -#define BUILD_BUG_ON_MSG(cond, msg) (0)
> -#define BUILD_BUG_ON(condition) (0)
> -#define BUILD_BUG() (0)
> -#else /* __CHECKER__ */
> -
>  /* Force a compilation error if a constant expression is not a power of 2 */
>  #define __BUILD_BUG_ON_NOT_POWER_OF_2(n)   \
> BUILD_BUG_ON(((n) & ((n) - 1)) != 0)
> @@ -64,6 +54,4 @@
>   */
>  #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
>
> -#endif /* __CHECKER__ */
> -
>  #endif /* _LINUX_BUILD_BUG_H */
> --
> 2.7.4
>



-- 
Kees Cook


Re: [PATCH] Revert "HID: uhid: use strlcpy() instead of strncpy()"

2018-11-15 Thread Kees Cook
On Thu, Nov 15, 2018 at 5:55 AM, David Herrmann  wrote:
> Hi
>
> On Thu, Nov 15, 2018 at 12:09 AM Kees Cook  wrote:
>> On Wed, Nov 14, 2018 at 9:40 AM, Laura Abbott  wrote:
> [...]
>> > Can we switch to strscpy instead? This will quiet gcc and avoid the
>> > issues with strlcpy.
>>
>> Yes please: it looks like these strings are expected to be NUL
>> terminated, so strscpy() without the "- 1" and min() logic would be
>> the correct solution here.
>
> "the correct solution"? To my knowledge the original code was correct
> as well. Am I missing something?

So, yes, no one should use strlcpy():
https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy

And while I think nothing was technically wrong with the strncpy()
usage in the original version, I think strncpy() should only be used
for __nonstring cases:
https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings

>
>>If @hid is already zero, then this would
>> just be:
>>
>>strscpy(hid->name, ev->u.create2.name, sizeof(hid->name));
>>strscpy(hid->phys, ev->u.create2.phys, sizeof(hid->phys));
>>strscpy(hid->uniq, ev->u.create2.uniq, sizeof(hid->uniq));
>>
>> If they are NOT NUL terminated, then keep using strncpy() but mark the
>> fields in the struct with the __nonstring attribute.
>
> They are supposed to be NUL terminated, but for compatibility reasons
> we allow them to be not. So I don't think your proposal is safe.

I was originally thinking only about the the hid->* strings, so I was
confused by this answer (they appear to always be NUL-terminated).
Then I thought you meant that ev->u.create2.* strings may not be
terminated, but I stayed confused. :)

The original code was:

len = min(sizeof(hid->name), sizeof(ev->u.create2.name)) - 1;
strncpy(hid->name, ev->u.create2.name, len);

If sizeof(hid->name) is smaller than sizeof(ev->u.create2.name), it
made sure than hid->name kept a trailing NUL.

If sizeof(ev->u.create2.name) is smaller than sizeof(hid->name), it
made sure than the last byte of ev->u.create2.name was ignored, and by
definition, hid->name would be NUL-terminated.

So you're converting from a potentially unterminated string into a
terminated string... (ev->u.create2.name maybe needs to be marked
__nonstring?)

The most you can write is sizeof(dest) - 1 but you must not read more
than sizeof(source). So I see that if the destination is smaller than
the source, you cannot represent these conditions correctly to
strscpy(). (And, I would argue, you can't with strncpy() either.)

However, they're all exactly the same size, so none of this matters,
and I think strscpy() would be the most sensible. And maybe you could
enforce the size checking:

BUILD_BUG_ON(sizeof(hid->name) != sizeof(ev->u.create2.name));
strscpy(hid->name, ev->u.create2.name, sizeof(hid->name));

etc...

-- 
Kees Cook


Re: [PATCH] firmware: raspberrypi: Fix firmware calls with large buffers

2018-11-15 Thread Kees Cook
On Thu, Nov 15, 2018 at 10:05 AM, Stefan Wahren  wrote:
> Hi James,
>
> please look at
> https://www.kernel.org/doc/html/v4.19/process/submitting-patches.html,
> because there are several issues with this patch. Most critical one is
> that i received it not as plain text. Please make sure that send your
> patch with git send-email.

The irony is I had to fight Gmail over your multipart HTML+plain
email. Yay email clients! ;)

> Am 15.11.18 um 14:18 schrieb James Hughes:
> > A previous change (5bfdc1097654) moved away from VLA's
>
> Please use the commit format mentioned in the link above.

And actually, this SHA isn't the upstream SHA. This should be:

a1547e0bca51 ("firmware: raspberrypi: Remove VLA usage")

> > to a fixed maximum size for mailbox data.
> > However, some mailbox calls use larger data buffers
> > than the maximum allowed in that change. This fix therefor

Which ones did this? In the initial change I couldn't find anything
that exceeded 32 bytes. (I'm just curious if I missed something or if
something new appeared.)

> > [...]
> > + /* Some mailboxes can use over 1k bytes. Rather than checking

Up to Eric, but I think the preferred comment style is:

/*
 * lines go here
 */

Otherwise, this seems fine to me. Thanks for getting it fixed!

-- 
Kees Cook


Re: [PATCH] mm/usercopy: Use memory range to be accessed for wraparound check

2018-11-14 Thread Kees Cook
On Tue, Nov 13, 2018 at 6:51 PM, Isaac J. Manjarres
 wrote:
> Currently, when checking to see if accessing n bytes starting at
> address "ptr" will cause a wraparound in the memory addresses,
> the check in check_bogus_address() adds an extra byte, which is
> incorrect, as the range of addresses that will be accessed is
> [ptr, ptr + (n - 1)].
>
> This can lead to incorrectly detecting a wraparound in the
> memory address, when trying to read 4 KB from memory that is
> mapped to the the last possible page in the virtual address
> space, when in fact, accessing that range of memory would not
> cause a wraparound to occur.

I'm kind of surprised anything is using the -4K memory range -- this
is ERR_PTR() area and I'd expect there to be an explicit unallocated
memory hole here.

>
> Use the memory range that will actually be accessed when
> considering if accessing a certain amount of bytes will cause
> the memory address to wrap around.
>
> Change-Id: I2563a5988e41122727ede17180f365e999b953e6
> Fixes: f5509cc18daa ("mm: Hardened usercopy")
> Co-Developed-by: Prasad Sodagudi 
> Signed-off-by: Prasad Sodagudi 
> Signed-off-by: Isaac J. Manjarres 
> Cc: sta...@vger.kernel.org

Regardless, I'll take it in my tree if akpm doesn't grab it first. :)

Acked-by: Kees Cook 

-Kees

> ---
>  mm/usercopy.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/usercopy.c b/mm/usercopy.c
> index 852eb4e..0293645 100644
> --- a/mm/usercopy.c
> +++ b/mm/usercopy.c
> @@ -151,7 +151,7 @@ static inline void check_bogus_address(const unsigned 
> long ptr, unsigned long n,
>bool to_user)
>  {
> /* Reject if object wraps past end of memory. */
> -   if (ptr + n < ptr)
> +   if (ptr + (n - 1) < ptr)
> usercopy_abort("wrapped address", NULL, to_user, 0, ptr + n);
>
> /* Reject if NULL or ZERO-allocation. */
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
>



-- 
Kees Cook


Re: [PATCH] mm/usercopy: Use memory range to be accessed for wraparound check

2018-11-14 Thread Kees Cook
On Wed, Nov 14, 2018 at 4:35 AM, William Kucharski
 wrote:
>
>
>> On Nov 13, 2018, at 5:51 PM, Isaac J. Manjarres  
>> wrote:
>>
>> diff --git a/mm/usercopy.c b/mm/usercopy.c
>> index 852eb4e..0293645 100644
>> --- a/mm/usercopy.c
>> +++ b/mm/usercopy.c
>> @@ -151,7 +151,7 @@ static inline void check_bogus_address(const unsigned 
>> long ptr, unsigned long n,
>>  bool to_user)
>> {
>>   /* Reject if object wraps past end of memory. */
>> - if (ptr + n < ptr)
>> + if (ptr + (n - 1) < ptr)
>>   usercopy_abort("wrapped address", NULL, to_user, 0, ptr + n);
>
> I'm being paranoid, but is it possible this routine could ever be passed "n" 
> set to zero?

It's a single-use inline, and zero is tested just before getting called:

/* Skip all tests if size is zero. */
if (!n)
return;

/* Check for invalid addresses. */
check_bogus_address((const unsigned long)ptr, n, to_user);


>
> If so, it will erroneously abort indicating a wrapped address as (n - 1) 
> wraps to ULONG_MAX.
>
> Easily fixed via:
>
> if ((n != 0) && (ptr + (n - 1) < ptr))

Agreed. Thanks for noticing this!

-Kees

-- 
Kees Cook


Re: [PATCH] Revert "HID: uhid: use strlcpy() instead of strncpy()"

2018-11-14 Thread Kees Cook
On Wed, Nov 14, 2018 at 9:40 AM, Laura Abbott  wrote:
> On 11/14/18 5:16 AM, David Herrmann wrote:
>>
>> This reverts commit 336fd4f5f25157e9e8bd50e898a1bbcd99eaea46.
>>
>> Please note that `strlcpy()` does *NOT* do what you think it does.
>> strlcpy() *ALWAYS* reads the full input string, regardless of the
>> 'length' parameter. That is, if the input is not zero-terminated,
>> strlcpy() will *READ* beyond input boundaries. It does this, because it
>> always returns the size it *would* copy if the target was big enough,
>> not the truncated size it actually copied.
>>
>> The original code was perfectly fine. The hid device is
>> zero-initialized and the strncpy() functions copied up to n-1
>> characters. The result is always zero-terminated this way.
>>
>> This is the third time someone tried to replace strncpy with strlcpy in
>> this function, and gets it wrong. I now added a comment that should at
>> least make people reconsider.
>>
>
> Can we switch to strscpy instead? This will quiet gcc and avoid the
> issues with strlcpy.

Yes please: it looks like these strings are expected to be NUL
terminated, so strscpy() without the "- 1" and min() logic would be
the correct solution here. If @hid is already zero, then this would
just be:

   strscpy(hid->name, ev->u.create2.name, sizeof(hid->name));
   strscpy(hid->phys, ev->u.create2.phys, sizeof(hid->phys));
   strscpy(hid->uniq, ev->u.create2.uniq, sizeof(hid->uniq));

If they are NOT NUL terminated, then keep using strncpy() but mark the
fields in the struct with the __nonstring attribute.

-Kees

>
>
>> Signed-off-by: David Herrmann 
>> ---
>>   drivers/hid/uhid.c | 13 +++--
>>   1 file changed, 7 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/hid/uhid.c b/drivers/hid/uhid.c
>> index fefedc0b4dc6..0dfdd0ac7120 100644
>> --- a/drivers/hid/uhid.c
>> +++ b/drivers/hid/uhid.c
>> @@ -496,12 +496,13 @@ static int uhid_dev_create2(struct uhid_device
>> *uhid,
>> goto err_free;
>> }
>>   - len = min(sizeof(hid->name), sizeof(ev->u.create2.name));
>> -   strlcpy(hid->name, ev->u.create2.name, len);
>> -   len = min(sizeof(hid->phys), sizeof(ev->u.create2.phys));
>> -   strlcpy(hid->phys, ev->u.create2.phys, len);
>> -   len = min(sizeof(hid->uniq), sizeof(ev->u.create2.uniq));
>> -   strlcpy(hid->uniq, ev->u.create2.uniq, len);
>> +   /* @hid is zero-initialized, strncpy() is correct, strlcpy() not
>> */
>> +   len = min(sizeof(hid->name), sizeof(ev->u.create2.name)) - 1;
>> +   strncpy(hid->name, ev->u.create2.name, len);
>> +   len = min(sizeof(hid->phys), sizeof(ev->u.create2.phys)) - 1;
>> +   strncpy(hid->phys, ev->u.create2.phys, len);
>> +   len = min(sizeof(hid->uniq), sizeof(ev->u.create2.uniq)) - 1;
>> +   strncpy(hid->uniq, ev->u.create2.uniq, len);
>> hid->ll_driver = _hid_driver;
>> hid->bus = ev->u.create2.bus;
>>
>



-- 
Kees Cook


Re: [PATCH 2/8] pstore: Do not use crash buffer for decompression

2018-11-13 Thread Kees Cook
On Fri, Nov 2, 2018 at 1:24 PM, Joel Fernandes  wrote:
> On Thu, Nov 01, 2018 at 04:51:54PM -0700, Kees Cook wrote:
>>  static void decompress_record(struct pstore_record *record)
>>  {
>> + int ret;
>>   int unzipped_len;
>
> nit: We could get rid of the unzipped_len variable now I think.

I didn't follow this -- it gets used quite a bit. I don't see a clean
way to remove it?

>> + workspace = kmalloc(unzipped_len + record->ecc_notice_size,
>
> Should tihs be unzipped_len + record->ecc_notice_size + 1. The extra byte
> being for the NULL character of the ecc notice?
>
> This occurred to me when I saw the + 1 in ram.c. It could be better to just
> abstract the size as a macro.

Ooh, yes, good catch. I'll get this fixed.

Thanks for the review!

-- 
Kees Cook


Re: [PATCH RFC v2 0/3] cleanups for pstore and ramoops

2018-11-13 Thread Kees Cook
On Sat, Nov 3, 2018 at 6:38 PM, Joel Fernandes (Google)
 wrote:
> Here are some simple cleanups and fixes for ramoops in pstore. Let me know
> what you think, thanks.

I took these and slightly tweaked code locations for the first one.
I'll send out the series for review when I'm back from Plumber's.

-Kees

>
> Joel Fernandes (Google) (3):
> pstore: map pstore types to names
> pstore: simplify ramoops_get_next_prz arguments
> pstore: donot treat empty buffers as valid
>
> fs/pstore/inode.c  | 53 +-
> fs/pstore/ram.c| 52 +++--
> fs/pstore/ram_core.c   |  2 +-
> include/linux/pstore.h | 37 ++
> include/linux/pstore_ram.h |  2 ++
> 5 files changed, 67 insertions(+), 79 deletions(-)
>
> --
> 2.19.1.930.g4563a0d9d0-goog
>



-- 
Kees Cook


Re: [PATCH] exec: separate MM_ANONPAGES and RLIMIT_STACK accounting

2018-11-13 Thread Kees Cook
On Mon, Nov 12, 2018 at 10:09 AM, Oleg Nesterov  wrote:
> get_arg_page() checks bprm->rlim_stack.rlim_cur and re-calculates the
> "extra" size for argv/envp pointers every time, this is a bit ugly and
> even not strictly correct: acct_arg_size() must not account this size.
>
> Remove all the rlimit code in get_arg_page(). Instead, add bprm->argmin
> calculated once at the start of __do_execve_file() and change copy_strings
> to check bprm->p >= bprm->argmin.
>
> The patch adds the new helper, prepare_arg_pages() which initializes
> bprm->argc/envc and bprm->argmin.
>
> Signed-off-by: Oleg Nesterov 

Acked-by: Kees Cook 

Thanks for nailing this all down. :)

-Kees

> ---
>  fs/exec.c   | 103 
> +++-
>  include/linux/binfmts.h |   1 +
>  2 files changed, 51 insertions(+), 53 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index fc281b7..61a5460 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -218,55 +218,10 @@ static struct page *get_arg_page(struct linux_binprm 
> *bprm, unsigned long pos,
> if (ret <= 0)
> return NULL;
>
> -   if (write) {
> -   unsigned long size = bprm->vma->vm_end - bprm->vma->vm_start;
> -   unsigned long ptr_size, limit;
> -
> -   /*
> -* Since the stack will hold pointers to the strings, we
> -* must account for them as well.
> -*
> -* The size calculation is the entire vma while each arg page 
> is
> -* built, so each time we get here it's calculating how far it
> -* is currently (rather than each call being just the newly
> -* added size from the arg page).  As a result, we need to
> -* always add the entire size of the pointers, so that on the
> -* last call to get_arg_page() we'll actually have the entire
> -* correct size.
> -*/
> -   ptr_size = (bprm->argc + bprm->envc) * sizeof(void *);
> -   if (ptr_size > ULONG_MAX - size)
> -   goto fail;
> -   size += ptr_size;
> -
> -   acct_arg_size(bprm, size / PAGE_SIZE);
> -
> -   /*
> -* We've historically supported up to 32 pages (ARG_MAX)
> -* of argument strings even with small stacks
> -*/
> -   if (size <= ARG_MAX)
> -   return page;
> -
> -   /*
> -* Limit to 1/4 of the max stack size or 3/4 of _STK_LIM
> -* (whichever is smaller) for the argv+env strings.
> -* This ensures that:
> -*  - the remaining binfmt code will not run out of stack 
> space,
> -*  - the program will have a reasonable amount of stack left
> -*to work from.
> -*/
> -   limit = _STK_LIM / 4 * 3;
> -   limit = min(limit, bprm->rlim_stack.rlim_cur / 4);
> -   if (size > limit)
> -   goto fail;
> -   }
> +   if (write)
> +   acct_arg_size(bprm, vma_pages(bprm->vma));
>
> return page;
> -
> -fail:
> -   put_page(page);
> -   return NULL;
>  }
>
>  static void put_arg_page(struct page *page)
> @@ -492,6 +447,50 @@ static int count(struct user_arg_ptr argv, int max)
> return i;
>  }
>
> +static int prepare_arg_pages(struct linux_binprm *bprm,
> +   struct user_arg_ptr argv, struct user_arg_ptr envp)
> +{
> +   unsigned long limit, ptr_size;
> +
> +   bprm->argc = count(argv, MAX_ARG_STRINGS);
> +   if (bprm->argc < 0)
> +   return bprm->argc;
> +
> +   bprm->envc = count(envp, MAX_ARG_STRINGS);
> +   if (bprm->envc < 0)
> +   return bprm->envc;
> +
> +   /*
> +* Limit to 1/4 of the max stack size or 3/4 of _STK_LIM
> +* (whichever is smaller) for the argv+env strings.
> +* This ensures that:
> +*  - the remaining binfmt code will not run out of stack space,
> +*  - the program will have a reasonable amount of stack left
> +*to work from.
> +*/
> +   limit = _STK_LIM / 4 * 3;
> +   limit = min(limit, bprm->rlim_stack.rlim_cur / 4);
> +   /*
> +* We've historically supported up to 32 pages (ARG_MAX)
> +* of argument strings even with small stacks
> +*/
> +   l

Re: [PATCH 1/2] exec: load_script: don't blindly truncate shebang string

2018-11-13 Thread Kees Cook
On Mon, Nov 12, 2018 at 10:09 AM, Oleg Nesterov  wrote:
> load_script() simply truncates bprm->buf and this is very wrong if the
> length of shebang string exceeds BINPRM_BUF_SIZE-2. This can silently
> truncate i_arg or (worse) we can execute the wrong binary if buf[2:126]
> happens to be the valid executable path.
>
> Change load_script() to return ENOEXEC if it can't find '\n' or zero in
> bprm->buf. Note that '\0' can come from either prepare_binprm()->memset()
> or from kernel_read(), we do not care.
>
> Signed-off-by: Oleg Nesterov 

Acked-by: Kees Cook 

-Kees

> ---
>  fs/binfmt_script.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
> index 7cde3f4..d0078cb 100644
> --- a/fs/binfmt_script.c
> +++ b/fs/binfmt_script.c
> @@ -42,10 +42,14 @@ static int load_script(struct linux_binprm *bprm)
> fput(bprm->file);
> bprm->file = NULL;
>
> -   bprm->buf[BINPRM_BUF_SIZE - 1] = '\0';
> -   if ((cp = strchr(bprm->buf, '\n')) == NULL)
> -   cp = bprm->buf+BINPRM_BUF_SIZE-1;
> +   for (cp = bprm->buf+2;; cp++) {
> +   if (cp >= bprm->buf + BINPRM_BUF_SIZE)
> +   return -ENOEXEC;
> +   if (!*cp || (*cp == '\n'))
> +   break;
> +   }
> *cp = '\0';
> +
> while (cp > bprm->buf) {
> cp--;
> if ((*cp == ' ') || (*cp == '\t'))
> --
> 2.5.0
>
>



-- 
Kees Cook


Re: [PATCH 1/1] stackleak: Disable function tracing and kprobes for stackleak_erase()

2018-11-13 Thread Kees Cook
On Mon, Nov 12, 2018 at 3:08 PM, Alexander Popov  wrote:
> The stackleak_erase() function is called on the trampoline stack at the end
> of syscall. This stack is not big enough for ftrace and kprobes operations,
> e.g. it can be exhausted if we use kprobe_events for stackleak_erase().
>
> So let's disable function tracing and kprobes for stackleak_erase().
>
> Reported-by: kernel test robot 
> Signed-off-by: Alexander Popov 

Thanks! I'll get this into my tree.

-Kees

> ---
>  kernel/stackleak.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/stackleak.c b/kernel/stackleak.c
> index e428929..08cb57e 100644
> --- a/kernel/stackleak.c
> +++ b/kernel/stackleak.c
> @@ -11,6 +11,7 @@
>   */
>
>  #include 
> +#include 
>
>  #ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE
>  #include 
> @@ -47,7 +48,7 @@ int stack_erasing_sysctl(struct ctl_table *table, int write,
>  #define skip_erasing() false
>  #endif /* CONFIG_STACKLEAK_RUNTIME_DISABLE */
>
> -asmlinkage void stackleak_erase(void)
> +asmlinkage void notrace stackleak_erase(void)
>  {
> /* It would be nice not to have 'kstack_ptr' and 'boundary' on stack 
> */
> unsigned long kstack_ptr = current->lowest_stack;
> @@ -101,6 +102,7 @@ asmlinkage void stackleak_erase(void)
> /* Reset the 'lowest_stack' value for the next syscall */
> current->lowest_stack = current_top_of_stack() - THREAD_SIZE/64;
>  }
> +NOKPROBE_SYMBOL(stackleak_erase);
>
>  void __used stackleak_track_stack(void)
>  {
> --
> 2.7.4
>



-- 
Kees Cook


Re: [PATCH 2/2] exec: increase BINPRM_BUF_SIZE to 256

2018-11-12 Thread Kees Cook
On Mon, Nov 12, 2018 at 5:52 PM, Andrew Morton
 wrote:
> On Mon, 12 Nov 2018 17:09:56 +0100 Oleg Nesterov  wrote:
>
>> Large enterprise clients often times run applications out of networked
>> file systems where the IT mandated layout of project volumes can end up
>> leading to paths that are longer than 128 characters. Bumping this up to
>> the next order of two solves this problem in all but the most egregious
>> case while still fitting into a 512b slab.
>>
>> ...
>>
>> --- a/include/uapi/linux/binfmts.h
>> +++ b/include/uapi/linux/binfmts.h
>> @@ -16,6 +16,6 @@ struct pt_regs;
>>  #define MAX_ARG_STRINGS 0x7FFF
>>
>>  /* sizeof(linux_binprm->buf) */
>> -#define BINPRM_BUF_SIZE 128
>> +#define BINPRM_BUF_SIZE 256

This comment needs updating too:

fs/exec.c: * Check permissions, then read the first 128 (BINPRM_BUF_SIZE) bytes

>>  #endif /* _UAPI_LINUX_BINFMTS_H */
>
> It does seem a rather silly restriction, and it's tempting to suggest
> reworking the code so that linux_binprm.buf is dynamically sized to
> accommodate even ludicrously large strings.
>
> But obviously 128 bytes has been enough for all this time, so that's
> going too far.  However it would be basically cost-free to increase
> BINPRM_BUF_SIZE up to the point where sizeof(struct linux_binprm) ==
> PAGE_SIZE?

Yeah, and this might be a useful detail included in a comment above
the #define...

Regardless:

Acked-by: Kees Cook 


-- 
Kees Cook


Re: [PATCH] x86: remove gcc-x86_*-has-stack-protector.sh checks

2018-11-12 Thread Kees Cook
On Mon, Nov 12, 2018 at 7:54 AM, Masahiro Yamada
 wrote:
> On Mon, Nov 12, 2018 at 5:29 PM Kees Cook  wrote:
>>
>> On Sun, Nov 11, 2018 at 9:06 PM, Masahiro Yamada
>>  wrote:
>> > gcc-x86_64-has-stack-protector.sh was introduced by commit 4f7fd4d7a791
>> > ("[PATCH] Add the -fstack-protector option to the CFLAGS") in 2006
>> > to work around buggy compilers.
>> >
>> > gcc-x86_32-has-stack-protector.sh was introduced by commit 60a5317ff0f4
>> > ("x86: implement x86_32 stack protector"), which did not clearly state
>> > whether compilers were still producing broken code at that time.
>> >
>> > Now, the minimum reuquired GCC version is 4.6, which was released in
>> > 2011. Probably, we can dump these old compiler checks.
>>
>> NAK. We need to keep this because we've seen recent regressions with
>> stack protection (e.g. gcc briefly used global instead of tls for the
>> canary, which silently broke the use of stack protectors). Since the
>> gcc/kernel "API" for the canary is so fragile we need to keep these
>> tests to make sure things end up where they're expected.
>
> Thanks for your feedback.
>
> I did not know this is still fragile even after ten years time.
>
> One more curious thing is, x86 is the only arch ever
> that has had this kind of script check.

Presently, yes -- x86 is the only arch with non-global canaries,
though something may be coming soon for arm64. However, that case may
be more detectable with cc-option. The trouble with gcc was that the
default switched at one point. :(


-- 
Kees Cook


Re: [PATCH] x86: remove gcc-x86_*-has-stack-protector.sh checks

2018-11-12 Thread Kees Cook
On Sun, Nov 11, 2018 at 9:06 PM, Masahiro Yamada
 wrote:
> gcc-x86_64-has-stack-protector.sh was introduced by commit 4f7fd4d7a791
> ("[PATCH] Add the -fstack-protector option to the CFLAGS") in 2006
> to work around buggy compilers.
>
> gcc-x86_32-has-stack-protector.sh was introduced by commit 60a5317ff0f4
> ("x86: implement x86_32 stack protector"), which did not clearly state
> whether compilers were still producing broken code at that time.
>
> Now, the minimum reuquired GCC version is 4.6, which was released in
> 2011. Probably, we can dump these old compiler checks.

NAK. We need to keep this because we've seen recent regressions with
stack protection (e.g. gcc briefly used global instead of tls for the
canary, which silently broke the use of stack protectors). Since the
gcc/kernel "API" for the canary is so fragile we need to keep these
tests to make sure things end up where they're expected.

-Kees

>
> Signed-off-by: Masahiro Yamada 
> ---
>
>  arch/x86/Kconfig  | 10 +-
>  scripts/gcc-x86_32-has-stack-protector.sh |  4 
>  scripts/gcc-x86_64-has-stack-protector.sh |  4 
>  3 files changed, 1 insertion(+), 17 deletions(-)
>  delete mode 100755 scripts/gcc-x86_32-has-stack-protector.sh
>  delete mode 100755 scripts/gcc-x86_64-has-stack-protector.sh
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 9d734f3..7240d50 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -187,7 +187,7 @@ config X86
> select HAVE_REGS_AND_STACK_ACCESS_API
> select HAVE_RELIABLE_STACKTRACE if X86_64 && 
> (UNWINDER_FRAME_POINTER || UNWINDER_ORC) && STACK_VALIDATION
> select HAVE_FUNCTION_ARG_ACCESS_API
> -   select HAVE_STACKPROTECTOR  if CC_HAS_SANE_STACKPROTECTOR
> +   select HAVE_STACKPROTECTOR
> select HAVE_STACK_VALIDATIONif X86_64
> select HAVE_RSEQ
> select HAVE_SYSCALL_TRACEPOINTS
> @@ -352,14 +352,6 @@ config PGTABLE_LEVELS
> default 3 if X86_PAE
> default 2
>
> -config CC_HAS_SANE_STACKPROTECTOR
> -   bool
> -   default 
> $(success,$(srctree)/scripts/gcc-x86_64-has-stack-protector.sh $(CC)) if 64BIT
> -   default 
> $(success,$(srctree)/scripts/gcc-x86_32-has-stack-protector.sh $(CC))
> -   help
> -  We have to make sure stack protector is unconditionally disabled if
> -  the compiler produces broken code.
> -
>  menu "Processor type and features"
>
>  config ZONE_DMA
> diff --git a/scripts/gcc-x86_32-has-stack-protector.sh 
> b/scripts/gcc-x86_32-has-stack-protector.sh
> deleted file mode 100755
> index f5c1194..000
> --- a/scripts/gcc-x86_32-has-stack-protector.sh
> +++ /dev/null
> @@ -1,4 +0,0 @@
> -#!/bin/sh
> -# SPDX-License-Identifier: GPL-2.0
> -
> -echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -m32 -O0 
> -fstack-protector - -o - 2> /dev/null | grep -q "%gs"
> diff --git a/scripts/gcc-x86_64-has-stack-protector.sh 
> b/scripts/gcc-x86_64-has-stack-protector.sh
> deleted file mode 100755
> index 75e4e22..000
> --- a/scripts/gcc-x86_64-has-stack-protector.sh
> +++ /dev/null
> @@ -1,4 +0,0 @@
> -#!/bin/sh
> -# SPDX-License-Identifier: GPL-2.0
> -
> -echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -m64 -O0 
> -mcmodel=kernel -fno-PIE -fstack-protector - -o - 2> /dev/null | grep -q "%gs"
> --
> 2.7.4
>



-- 
Kees Cook


Re: afaef01c00 ("x86/entry: Add STACKLEAK erasing the kernel stack .."): double fault: 0000 [#1]

2018-11-09 Thread Kees Cook
On Fri, Nov 9, 2018 at 5:09 PM, Alexander Popov  wrote:
>
> On 09.11.2018 23:46, Andy Lutomirski wrote:
>>> On Nov 9, 2018, at 12:06 PM, Jann Horn  wrote:
>>>
>>> +Andy, Thomas, Ingo
>>>
>>>> On Fri, Nov 9, 2018 at 2:24 PM kernel test robot  wrote:
>>>> 0day kernel testing robot got the below dmesg and the first bad commit is
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>>>>
>>>> commit afaef01c001537fa97a25092d7f54d764dc7d8c1
>>>> Author: Alexander Popov 
>>>> AuthorDate: Fri Aug 17 01:16:58 2018 +0300
>>>> Commit: Kees Cook 
>>>> CommitDate: Tue Sep 4 10:35:47 2018 -0700
>>>>
>>>>x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls
>>> [...]
>>>> [  127.808225] double fault:  [#1]
>>>> [  127.808695] CPU: 0 PID: 414 Comm: trinity-main Tainted: G   
>>>>  T 4.19.0-rc2-1-gafaef01 #1
>>>> [  127.809799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>>>> 1.10.2-1 04/01/2014
>>>> [  127.810760] RIP: 0010:ftrace_ops_test+0x27/0xa0
>>>> [  127.811289] Code: eb 9a 90 41 54 55 49 89 f4 53 48 89 d3 48 89 fd 48 81 
>>>> ec b0 00 00 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 a8 00 00 00 31 c0 
>>>>  54 df ff ff 48 85 db 74 57 e8 4a df ff ff 48 8b 85 d0 00 00 00
>>>> [  127.813385] RSP: 0018:fe001fb8 EFLAGS: 00010046
>>> [...]
>>>> [  127.819762] CR2: fe001fa8 CR3: 1579a000 CR4: 
>>>> 06b0
>>> [...]
>>>> [  127.822234] Call Trace:
>>>> [  127.822530]  
>>>> [  127.822914]  ? __ia32_sys_rseq+0x2f0/0x2f0
>>>> [  127.823395]  ftrace_ops_list_func+0xa5/0x1b0
>>>> [  127.823922]  ftrace_call+0x5/0x34
>>>> [  127.824318]  ? stackleak_erase+0x5/0xf0
>>>> [  127.824789]  ? stackleak_erase+0x43/0xf0
>>>> [  127.825260]  stackleak_erase+0x5/0xf0
>>>> [  127.825699]  syscall_return_via_sysret+0x61/0x81
>>>> [  127.826238] WARNING: stack recursion on stack type 4
>>>> [  127.826243] WARNING: can't dereference registers at (ptrval) 
>>>> for ip syscall_return_via_sysret+0x61/0x81
>>>> [  127.826246]  
>>>> [  127.828342] ---[ end trace e9f96d3f45575499 ]---
>>>> [  127.828911] RIP: 0010:ftrace_ops_test+0x27/0xa0
>>>
>>> CR2: fe001fa8, RSP: 0018:fe001fb8; this is a pagefault
>>> on the stack. fe00 is CPU_ENTRY_AREA_RO_IDT;
>>> fe001000 is CPU_ENTRY_AREA_PER_CPU; so fe002000 is the
>>> page with the entry stack for cpu 0, and you overflowed from that into
>>> the readonly gdt at fe001000, which doubles as a guard page
>>> for the entry stack:
>>>
>>> struct cpu_entry_area {
>>>char gdt[PAGE_SIZE];
>>>
>>>/*
>>> * The GDT is just below entry_stack and thus serves (on x86_64) as
>>> * a a read-only guard page.
>>> */
>>>struct entry_stack_page entry_stack_page;
>>> [...]
>>> };
>>>
>>> In other words: You're calling C code on the entry trampoline stack;
>>> this C code can call into ftrace; and the entry trampoline stack isn't
>>> big enough for ftrace shenanigans. I think you probably shouldn't be
>>> calling C code on the entry stack, but maybe one of the X86 folks has
>>> a different opinion?
>>
>> My opinion was that, on x86_32, the entry stack ought to be fairly large so
>> that NMIs could execute on the entry stack.  I don’t remember what the code
>> actually does, though.
>>
>> But stackleak_erase should probably not run on the entry stack. That seems
>> like it’s just asking for trouble.
>
> Hello Jann and Andy,
>
>
> The stackleak_erase() function is called on the trampoline stack at the end of
> syscall, it erases the used part of the kernel thread stack after the syscall 
> is
> handled.
>
>
> I've reproduced such a double fault with function tracing for 
> stackleak_erase():
>
>   # mount -t tracefs nodev /sys/kernel/tracing
>   # echo 'p:myprobe stackleak_erase' > /sys/kernel/debug/tracing/kprobe_events
>   # echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
>
>
> I think we should simply not allow function tracing for stackleak_*() 
> functions:
>
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 

Re: [PATCH] ARM: mm: dump: Change to use DEFINE_SHOW_ATTRIBUTE macro

2018-11-05 Thread Kees Cook
On Mon, Nov 5, 2018 at 9:01 AM, Laura Abbott  wrote:
> On 11/5/18 6:32 AM, Yangtao Li wrote:
>>
>> Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
>>
>
> Acked-by: Laura Abbott 
>
>
>> Signed-off-by: Yangtao Li 
>> ---
>>   arch/arm/mm/ptdump_debugfs.c | 12 +---
>>   1 file changed, 1 insertion(+), 11 deletions(-)
>>
>> diff --git a/arch/arm/mm/ptdump_debugfs.c b/arch/arm/mm/ptdump_debugfs.c
>> index be8d87be4b93..201cd467a739 100644
>> --- a/arch/arm/mm/ptdump_debugfs.c
>> +++ b/arch/arm/mm/ptdump_debugfs.c
>> @@ -12,17 +12,7 @@ static int ptdump_show(struct seq_file *m, void *v)
>> return 0;
>>   }
>>   -static int ptdump_open(struct inode *inode, struct file *file)
>> -{
>> -   return single_open(file, ptdump_show, inode->i_private);
>> -}
>> -
>> -static const struct file_operations ptdump_fops = {
>> -   .open   = ptdump_open,
>> -   .read   = seq_read,
>> -   .llseek = seq_lseek,
>> -   .release= single_release,
>> -};
>> +DEFINE_SHOW_ATTRIBUTE(ptdump);

Nice! Can you do the same for arch/x86/mm/debug_pagetables.c too?

Reviewed-by: Kees Cook 

-Kees

>> int ptdump_debugfs_register(struct ptdump_info *info, const char
>> *name)
>>   {
>>
>



-- 
Kees Cook


Re: [PATCH 8/8] pstore/ram: Correctly calculate usable PRZ bytes

2018-11-05 Thread Kees Cook
On Sun, Nov 4, 2018 at 8:42 PM, Joel Fernandes  wrote:
> Dumping the magic bytes of the non decompressable .enc.z files, I get this
> which shows a valid zlib compressed header:
>
> Something like:
> 48 89 85 54 4d 6f 1a 31
>
> The 0b1000 in the first byte means it is "deflate". The file tool indeed
> successfully shows "zlib compressed data" and I did the math for the header
> and it is indeed valid. So I don't think the data is insane. The buffer has
> enough room because even the very small dumps are not decompressable.

Interesting. So the kernel wouldn't decompress it even though it's the
right algo and untruncated? That seems worth fixing.

> At this point we can park this issue I guess, but a scenario that is still
> broken is:
> Say someone crashes the system on compress algo X and then recompiles with
> compress algo Y, then the decompress would fail no?
>
> One way to fix that is to store the comrpession method in buffer as well,
> then initialize all algorithms at boot and choose the right one in the
> buffer ideally. Otherwise atleast we should print a message saying "buffer is
> encoded with algo X but compression selected is Y" or something. But I agree
> its a very low priority "doctor it hurts if I do this" kind of issue :)

Right, this is fine: if algos change across a kernel version, I'm fine
with it failing. pstore isn't expected to work sanely outside of a
pretty narrow set of use-cases.

-Kees

-- 
Kees Cook


Re: [RFC PATCH 0/7] runtime format string checking

2018-11-02 Thread Kees Cook
On Fri, Nov 2, 2018 at 1:09 PM, Rasmus Villemoes
 wrote:
> That's a bit too naive. At the very least, you must exclude static
> stuff, i.e. restrict to actual auto variables. Otherwise you're making
> things worse (a "static const char []" just occupies some space in
> .rodata, a "static const char * const" occupies the same space for the
> anonymous literal, plus space for a pointer). Furthermore, you must
> ensure that nobody does sizeof() on VAR. With a trivial extension of
> your script to exclude the "static const char" places, I get

Yes, thank you. That's the part I was forgetting and why I was doing
[] over * back then. There are certainly uses of sizeof() on these
strings. So, it seems better to get sizeof() right that the const-ness
right.

>>>> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=kspp/format-security=b7dcfc8f48caaafcc423e5793f7ef61b9bb5c458
>>>> This one covers cases where the pointer is pointing to a const string,
>>>> so really there's no sense in injecting the "%s", but I was collecting
>>>> them to make real ones stand out.
>>>
>>> I don't agree. [...]
>>
>> Okay, then I'll forward this to akpm maybe?
>
> Yes, if all they do is replace f(..., s) by f(..., "%s", s) that should
> never hurt. Maybe check if there's a ..._puts() variant that can be used
> instead, e.g. seq_puts().

Alright, I'll see about bringing that series forward in time...

-Kees

-- 
Kees Cook


Re: [PATCH 7/8] pstore: Remove needless lock during console writes

2018-11-02 Thread Kees Cook
On Fri, Nov 2, 2018 at 11:32 AM, Joel Fernandes  wrote:
> On Thu, Nov 01, 2018 at 04:51:59PM -0700, Kees Cook wrote:
>> Since commit 70ad35db3321 ("pstore: Convert console write to use
>> ->write_buf"), the console writer does not use the preallocated crash
>> dump buffer any more, so there is no reason to perform locking around it.
>
> Out of curiosity, what was the reason for having this preallocated crash
> buffer in the first place? I thought the 'console' type only did regular
> kernel console logging, not crash dumps.

The primary reason is that the dumper needs to write to somewhere and
we don't know the state of the system (memory allocation may not work
for example).

The other frontends tend to run at "sane" locations in the kernel. The
dumper, however, is quite fragile.

> I looked at all the patches and had some minor nits, with the nits addressed
> (if you agree with them), feel free to add my Reviewed-by on future respins:
>
> Reviewed-by: Joel Fernandes (Google) 

Thanks!

> Also I wonder if Namhyung is still poking around that virtio pstore driver he
> mentioned in the commit mentioned above. :)

Did that never land? I thought it mostly had to happen at the qemu end?

With nvdimm emulation, we can just use ramoops. :)

-Kees

-- 
Kees Cook


Re: [PATCH 8/8] pstore/ram: Correctly calculate usable PRZ bytes

2018-11-02 Thread Kees Cook
On Fri, Nov 2, 2018 at 11:01 AM, Joel Fernandes  wrote:
> On Thu, Nov 01, 2018 at 04:52:00PM -0700, Kees Cook wrote:
>> The actual number of bytes stored in a PRZ is smaller than the
>> bytes requested by platform data, since there is a header on each
>> PRZ. Additionally, if ECC is enabled, there are trailing bytes used
>> as well. Normally this mismatch doesn't matter since PRZs are circular
>> buffers and the leading "overflow" bytes are just thrown away. However, in
>> the case of a compressed record, this rather badly corrupts the results.
>
> Actually this would also mean some data loss for non-compressed records were
> also there before, but is now fixed?

No, it's what I mentioned in the commit log: only the "tail" of any
data was getting stored, which is consistent with the configuration
given. The main problem is that ECC bytes weren't part of the
calculation the ram backend provided to pstore for pstore to pick the
correct amount of bytes to compress.

>> This corruption was visible with "ramoops.mem_size=204800 ramoops.ecc=1".
>> Any stored crashes would not be uncompressable (producing a pstorefs
>> "dmesg-*.enc.z" file), and triggering errors at boot:
>>
>>   [2.790759] pstore: crypto_comp_decompress failed, ret = -22!
>>
>> Reported-by: Joel Fernandes 
>> Fixes: b0aad7a99c1d ("pstore: Add compression support to pstore")
>> Signed-off-by: Kees Cook 
>
> Thanks!
> Reviewed-by: Joel Fernandes (Google) 

Thanks!

> Also should this be fixed for other backends or are those good? AFAIR, I saw
> this for EFI too.

It seemed like the other backends were doing it correctly (e.g. erst
removes the header from calculation, etc). I did see that EFI
allocates more memory than needed?

efi_pstore_info.buf = kmalloc(4096, GFP_KERNEL);
if (!efi_pstore_info.buf)
return -ENOMEM;

efi_pstore_info.bufsize = 1024;

efi_pstore_write() does:

ret = efivar_entry_set_safe(efi_name, vendor, PSTORE_EFI_ATTRIBUTES,
  !pstore_cannot_block_path(record->reason),
  record->size, record->psi->buf);

and efivar_entry_set_safe() says:

 * Returns 0 on success, -ENOSPC if the firmware does not have enough
 * space for set_variable() to succeed, or a converted EFI status code
 * if set_variable() fails.

So I don't see how this could get truncated. (I'm not saying it
didn't... just that I can't see it in an obvious place.)

-Kees

-- 
Kees Cook


[PATCH linux-next 1/8] pstore/ram: Standardize module name in ramoops

2018-11-01 Thread Kees Cook
With both ram.c and ram_core.c built into ramoops.ko, it doesn't make
sense to have differing pr_fmt prefixes. This fixes ram_core.c to use
the module name (as ram.c already does). Additionally improves region
reservation error to include the region name.

Signed-off-by: Kees Cook 
---
 fs/pstore/ram_core.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
index 23ca6f2c98a0..f5d0173901aa 100644
--- a/fs/pstore/ram_core.c
+++ b/fs/pstore/ram_core.c
@@ -12,7 +12,7 @@
  *
  */
 
-#define pr_fmt(fmt) "persistent_ram: " fmt
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include 
 #include 
@@ -443,7 +443,8 @@ static void *persistent_ram_iomap(phys_addr_t start, size_t 
size,
void *va;
 
if (!request_mem_region(start, size, label ?: "ramoops")) {
-   pr_err("request mem region (0x%llx@0x%llx) failed\n",
+   pr_err("request mem region (%s 0x%llx@0x%llx) failed\n",
+   label ?: "ramoops",
(unsigned long long)size, (unsigned long long)start);
return NULL;
}
-- 
2.17.1



[PATCH 6/8] pstore: Replace open-coded << with BIT()

2018-11-01 Thread Kees Cook
Minor clean-up to use BIT() (as already done in pstore_ram.h).

Signed-off-by: Kees Cook 
---
 include/linux/pstore.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/pstore.h b/include/linux/pstore.h
index 877ed81de346..3549f2ba865c 100644
--- a/include/linux/pstore.h
+++ b/include/linux/pstore.h
@@ -189,10 +189,10 @@ struct pstore_info {
 };
 
 /* Supported frontends */
-#define PSTORE_FLAGS_DMESG (1 << 0)
-#define PSTORE_FLAGS_CONSOLE   (1 << 1)
-#define PSTORE_FLAGS_FTRACE(1 << 2)
-#define PSTORE_FLAGS_PMSG  (1 << 3)
+#define PSTORE_FLAGS_DMESG BIT(0)
+#define PSTORE_FLAGS_CONSOLE   BIT(1)
+#define PSTORE_FLAGS_FTRACEBIT(2)
+#define PSTORE_FLAGS_PMSG  BIT(3)
 
 extern int pstore_register(struct pstore_info *);
 extern void pstore_unregister(struct pstore_info *);
-- 
2.17.1



[PATCH 2/8] pstore: Do not use crash buffer for decompression

2018-11-01 Thread Kees Cook
The pre-allocated compression buffer used for crash dumping was also
being used for decompression. This isn't technically safe, since it's
possible the kernel may attempt a crashdump while pstore is populating the
pstore filesystem (and performing decompression). Instead, just allocate
a separate buffer for decompression. Correctness is preferred over
performance here.

Signed-off-by: Kees Cook 
---
 fs/pstore/platform.c | 56 
 1 file changed, 25 insertions(+), 31 deletions(-)

diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index b821054ca3ed..8b6028948cf3 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -258,20 +258,6 @@ static int pstore_compress(const void *in, void *out,
return outlen;
 }
 
-static int pstore_decompress(void *in, void *out,
-unsigned int inlen, unsigned int outlen)
-{
-   int ret;
-
-   ret = crypto_comp_decompress(tfm, in, inlen, out, );
-   if (ret) {
-   pr_err("crypto_comp_decompress failed, ret = %d!\n", ret);
-   return ret;
-   }
-
-   return outlen;
-}
-
 static void allocate_buf_for_compression(void)
 {
struct crypto_comp *ctx;
@@ -656,8 +642,9 @@ EXPORT_SYMBOL_GPL(pstore_unregister);
 
 static void decompress_record(struct pstore_record *record)
 {
+   int ret;
int unzipped_len;
-   char *decompressed;
+   char *unzipped, *workspace;
 
if (!record->compressed)
return;
@@ -668,35 +655,42 @@ static void decompress_record(struct pstore_record 
*record)
return;
}
 
-   /* No compression method has created the common buffer. */
+   /* Missing compression buffer means compression was not initialized. */
if (!big_oops_buf) {
-   pr_warn("no decompression buffer allocated\n");
+   pr_warn("no decompression method initialized!\n");
return;
}
 
-   unzipped_len = pstore_decompress(record->buf, big_oops_buf,
-record->size, big_oops_buf_sz);
-   if (unzipped_len <= 0) {
-   pr_err("decompression failed: %d\n", unzipped_len);
+   /* Allocate enough space to hold max decompression and ECC. */
+   unzipped_len = big_oops_buf_sz;
+   workspace = kmalloc(unzipped_len + record->ecc_notice_size,
+   GFP_KERNEL);
+   if (!workspace)
return;
-   }
 
-   /* Build new buffer for decompressed contents. */
-   decompressed = kmalloc(unzipped_len + record->ecc_notice_size,
-  GFP_KERNEL);
-   if (!decompressed) {
-   pr_err("decompression ran out of memory\n");
+   /* After decompression "unzipped_len" is almost certainly smaller. */
+   ret = crypto_comp_decompress(tfm, record->buf, record->size,
+ workspace, _len);
+   if (ret) {
+   pr_err("crypto_comp_decompress failed, ret = %d!\n", ret);
+   kfree(workspace);
return;
}
-   memcpy(decompressed, big_oops_buf, unzipped_len);
 
/* Append ECC notice to decompressed buffer. */
-   memcpy(decompressed + unzipped_len, record->buf + record->size,
+   memcpy(workspace + unzipped_len, record->buf + record->size,
   record->ecc_notice_size);
 
-   /* Swap out compresed contents with decompressed contents. */
+   /* Copy decompressed contents into an minimum-sized allocation. */
+   unzipped = kmemdup(workspace, unzipped_len + record->ecc_notice_size,
+  GFP_KERNEL);
+   kfree(workspace);
+   if (!unzipped)
+   return;
+
+   /* Swap out compressed contents with decompressed contents. */
kfree(record->buf);
-   record->buf = decompressed;
+   record->buf = unzipped;
record->size = unzipped_len;
record->compressed = false;
 }
-- 
2.17.1



[PATCH 7/8] pstore: Remove needless lock during console writes

2018-11-01 Thread Kees Cook
Since commit 70ad35db3321 ("pstore: Convert console write to use
->write_buf"), the console writer does not use the preallocated crash
dump buffer any more, so there is no reason to perform locking around it.

Signed-off-by: Kees Cook 
---
 fs/pstore/platform.c | 29 ++---
 1 file changed, 6 insertions(+), 23 deletions(-)

diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index a956c7bc3f67..32340e7dd6a5 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -461,31 +461,14 @@ static void pstore_unregister_kmsg(void)
 #ifdef CONFIG_PSTORE_CONSOLE
 static void pstore_console_write(struct console *con, const char *s, unsigned 
c)
 {
-   const char *e = s + c;
+   struct pstore_record record;
 
-   while (s < e) {
-   struct pstore_record record;
-   unsigned long flags;
-
-   pstore_record_init(, psinfo);
-   record.type = PSTORE_TYPE_CONSOLE;
-
-   if (c > psinfo->bufsize)
-   c = psinfo->bufsize;
+   pstore_record_init(, psinfo);
+   record.type = PSTORE_TYPE_CONSOLE;
 
-   if (oops_in_progress) {
-   if (!spin_trylock_irqsave(>buf_lock, flags))
-   break;
-   } else {
-   spin_lock_irqsave(>buf_lock, flags);
-   }
-   record.buf = (char *)s;
-   record.size = c;
-   psinfo->write();
-   spin_unlock_irqrestore(>buf_lock, flags);
-   s += c;
-   c = e - s;
-   }
+   record.buf = (char *)s;
+   record.size = c;
+   psinfo->write();
 }
 
 static struct console pstore_console = {
-- 
2.17.1



[PATCH 3/8] pstore/ram: Report backend assignments with finer granularity

2018-11-01 Thread Kees Cook
In order to more easily perform automated regression testing, this
adds pr_debug() calls to report each prz allocation which can then be
verified against persistent storage. Specifically, seeing the dividing
line between header, data, any ECC bytes. (And the general assignment
output is updated to remove the bogus ECC blocksize which isn't actually
recorded outside the prz instance.)

Signed-off-by: Kees Cook 
---
 fs/pstore/ram.c  | 4 ++--
 fs/pstore/ram_core.c | 6 ++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index b51901f97dc2..25bede911809 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -856,9 +856,9 @@ static int ramoops_probe(struct platform_device *pdev)
ramoops_pmsg_size = pdata->pmsg_size;
ramoops_ftrace_size = pdata->ftrace_size;
 
-   pr_info("attached 0x%lx@0x%llx, ecc: %d/%d\n",
+   pr_info("using 0x%lx@0x%llx, ecc: %d\n",
cxt->size, (unsigned long long)cxt->phys_addr,
-   cxt->ecc_info.ecc_size, cxt->ecc_info.block_size);
+   cxt->ecc_info.ecc_size);
 
return 0;
 
diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
index f5d0173901aa..d5bf9be82545 100644
--- a/fs/pstore/ram_core.c
+++ b/fs/pstore/ram_core.c
@@ -576,6 +576,12 @@ struct persistent_ram_zone *persistent_ram_new(phys_addr_t 
start, size_t size,
if (ret)
goto err;
 
+   pr_debug("attached %s 0x%lx@0x%llx: %lu header, %lu data, %lu ecc 
(%d/%d)\n",
+   prz->label, prz->size, (unsigned long long)prz->paddr,
+   sizeof(*prz->buffer), prz->buffer_size,
+   prz->size - sizeof(*prz->buffer) - prz->buffer_size,
+   prz->ecc_info.ecc_size, prz->ecc_info.block_size);
+
return prz;
 err:
persistent_ram_free(prz);
-- 
2.17.1



[PATCH 4/8] pstore/ram: Add kern-doc for struct persistent_ram_zone

2018-11-01 Thread Kees Cook
The struct persistent_ram_zone wasn't well documented. This adds kern-doc
for it.

Signed-off-by: Kees Cook 
---
 fs/pstore/ram_core.c   | 10 +
 include/linux/pstore_ram.h | 46 +++---
 2 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
index d5bf9be82545..1cda5922b4b4 100644
--- a/fs/pstore/ram_core.c
+++ b/fs/pstore/ram_core.c
@@ -29,6 +29,16 @@
 #include 
 #include 
 
+/**
+ * struct persistent_ram_buffer - persistent circular RAM buffer
+ *
+ * @sig:
+ * signature to indicate header (PERSISTENT_RAM_SIG xor PRZ-type value)
+ * @start:
+ * offset into @data where the beginning of the stored bytes begin
+ * @size:
+ * number of valid bytes stored in @data
+ */
 struct persistent_ram_buffer {
uint32_tsig;
atomic_tstart;
diff --git a/include/linux/pstore_ram.h b/include/linux/pstore_ram.h
index 6e94980357d2..5d10ad51c1c4 100644
--- a/include/linux/pstore_ram.h
+++ b/include/linux/pstore_ram.h
@@ -30,6 +30,10 @@
  * PRZ_FLAG_NO_LOCK is used. For all other cases, locking is required.
  */
 #define PRZ_FLAG_NO_LOCK   BIT(0)
+/*
+ * If a PRZ should only have a single-boot lifetime, this marks it as
+ * getting wiped after its contents get copied out after boot.
+ */
 #define PRZ_FLAG_ZAP_OLD   BIT(1)
 
 struct persistent_ram_buffer;
@@ -43,17 +47,53 @@ struct persistent_ram_ecc_info {
uint16_t *par;
 };
 
+/**
+ * struct persistent_ram_zone - Details of a persistent RAM zone (PRZ)
+ *  used as a pstore backend
+ *
+ * @paddr: physical address of the mapped RAM area
+ * @size:  size of mapping
+ * @label: unique name of this PRZ
+ * @flags: holds PRZ_FLAGS_* bits
+ *
+ * @buffer_lock:
+ * locks access to @buffer "size" bytes and "start" offset
+ * @buffer:
+ * pointer to actual RAM area managed by this PRZ
+ * @buffer_size:
+ * bytes in @buffer->data (not including any trailing ECC bytes)
+ *
+ * @par_buffer:
+ * pointer into @buffer->data containing ECC bytes for @buffer->data
+ * @par_header:
+ * pointer into @buffer->data containing ECC bytes for @buffer header
+ * (i.e. all fields up to @data)
+ * @rs_decoder:
+ * RSLIB instance for doing ECC calculations
+ * @corrected_bytes:
+ * ECC corrected bytes accounting since boot
+ * @bad_blocks:
+ * ECC uncorrectable bytes accounting since boot
+ * @ecc_info:
+ * ECC configuration details
+ *
+ * @old_log:
+ * saved copy of @buffer->data prior to most recent wipe
+ * @old_log_size:
+ * bytes contained in @old_log
+ *
+ */
 struct persistent_ram_zone {
phys_addr_t paddr;
size_t size;
void *vaddr;
char *label;
-   struct persistent_ram_buffer *buffer;
-   size_t buffer_size;
u32 flags;
+
raw_spinlock_t buffer_lock;
+   struct persistent_ram_buffer *buffer;
+   size_t buffer_size;
 
-   /* ECC correction */
char *par_buffer;
char *par_header;
struct rs_control *rs_decoder;
-- 
2.17.1



[PATCH 0/8] pstore improvements (pstore-next)

2018-11-01 Thread Kees Cook
This is a posting of several patches I've been working on to improve
pstore. Most of it is better comments, output, and naming, but one
bug fix stands out to fix head-truncationg of compressed records.
Details in the individual patches. Review appreciated! :)

-Kees

Kees Cook (8):
  pstore/ram: Standardize module name in ramoops
  pstore: Do not use crash buffer for decompression
  pstore/ram: Report backend assignments with finer granularity
  pstore/ram: Add kern-doc for struct persistent_ram_zone
  pstore: Improve and update some comments and status output
  pstore: Replace open-coded << with BIT()
  pstore: Remove needless lock during console writes
  pstore/ram: Correctly calculate usable PRZ bytes

 fs/pstore/platform.c   | 92 ++
 fs/pstore/ram.c| 19 
 fs/pstore/ram_core.c   | 25 +--
 include/linux/pstore.h | 15 ---
 include/linux/pstore_ram.h | 46 +--
 5 files changed, 116 insertions(+), 81 deletions(-)

-- 
2.17.1



[PATCH 8/8] pstore/ram: Correctly calculate usable PRZ bytes

2018-11-01 Thread Kees Cook
The actual number of bytes stored in a PRZ is smaller than the
bytes requested by platform data, since there is a header on each
PRZ. Additionally, if ECC is enabled, there are trailing bytes used
as well. Normally this mismatch doesn't matter since PRZs are circular
buffers and the leading "overflow" bytes are just thrown away. However, in
the case of a compressed record, this rather badly corrupts the results.

This corruption was visible with "ramoops.mem_size=204800 ramoops.ecc=1".
Any stored crashes would not be uncompressable (producing a pstorefs
"dmesg-*.enc.z" file), and triggering errors at boot:

  [2.790759] pstore: crypto_comp_decompress failed, ret = -22!

Reported-by: Joel Fernandes 
Fixes: b0aad7a99c1d ("pstore: Add compression support to pstore")
Signed-off-by: Kees Cook 
---
 fs/pstore/ram.c| 15 ++-
 include/linux/pstore.h |  5 -
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index 25bede911809..10ac4d23c423 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -814,17 +814,14 @@ static int ramoops_probe(struct platform_device *pdev)
 
cxt->pstore.data = cxt;
/*
-* Console can handle any buffer size, so prefer LOG_LINE_MAX. If we
-* have to handle dumps, we must have at least record_size buffer. And
-* for ftrace, bufsize is irrelevant (if bufsize is 0, buf will be
-* ZERO_SIZE_PTR).
+* Since bufsize is only used for dmesg crash dumps, it
+* must match the size of the dprz record (after PRZ header
+* and ECC bytes have been accounted for).
 */
-   if (cxt->console_size)
-   cxt->pstore.bufsize = 1024; /* LOG_LINE_MAX */
-   cxt->pstore.bufsize = max(cxt->record_size, cxt->pstore.bufsize);
-   cxt->pstore.buf = kmalloc(cxt->pstore.bufsize, GFP_KERNEL);
+   cxt->pstore.bufsize = cxt->dprzs[0]->buffer_size;
+   cxt->pstore.buf = kzalloc(cxt->pstore.bufsize, GFP_KERNEL);
if (!cxt->pstore.buf) {
-   pr_err("cannot allocate pstore buffer\n");
+   pr_err("cannot allocate pstore crash dump buffer\n");
err = -ENOMEM;
goto fail_clear;
}
diff --git a/include/linux/pstore.h b/include/linux/pstore.h
index 3549f2ba865c..f46e5df76b58 100644
--- a/include/linux/pstore.h
+++ b/include/linux/pstore.h
@@ -90,7 +90,10 @@ struct pstore_record {
  *
  * @buf_lock:  spinlock to serialize access to @buf
  * @buf:   preallocated crash dump buffer
- * @bufsize:   size of @buf available for crash dump writes
+ * @bufsize:   size of @buf available for crash dump bytes (must match
+ * smallest number of bytes available for writing to a
+ * backend entry, since compressed bytes don't take kindly
+ * to being truncated)
  *
  * @read_mutex:serializes @open, @read, @close, and @erase callbacks
  * @flags: bitfield of frontends the backend can accept writes for
-- 
2.17.1



[PATCH 5/8] pstore: Improve and update some comments and status output

2018-11-01 Thread Kees Cook
This improves and updates some comments:
 - dump handler comment out of sync from calling convention
 - fix kern-doc typo

and improves status output:
 - reminder that only kernel crash dumps are compressed
 - do not be silent about ECC infrastructure failures

Signed-off-by: Kees Cook 
---
 fs/pstore/platform.c   | 7 +++
 fs/pstore/ram_core.c   | 4 +++-
 include/linux/pstore.h | 2 +-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index 8b6028948cf3..a956c7bc3f67 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -304,7 +304,7 @@ static void allocate_buf_for_compression(void)
big_oops_buf_sz = size;
big_oops_buf = buf;
 
-   pr_info("Using compression: %s\n", zbackend->name);
+   pr_info("Using crash dump compression: %s\n", zbackend->name);
 }
 
 static void free_buf_for_compression(void)
@@ -354,9 +354,8 @@ void pstore_record_init(struct pstore_record *record,
 }
 
 /*
- * callback from kmsg_dump. (s2,l2) has the most recently
- * written bytes, older bytes are in (s1,l1). Save as much
- * as we can from the end of the buffer.
+ * callback from kmsg_dump. Save as much as we can (up to kmsg_bytes) from the
+ * end of the buffer.
  */
 static void pstore_dump(struct kmsg_dumper *dumper,
enum kmsg_dump_reason reason)
diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
index 1cda5922b4b4..e859e02f67a8 100644
--- a/fs/pstore/ram_core.c
+++ b/fs/pstore/ram_core.c
@@ -503,8 +503,10 @@ static int persistent_ram_post_init(struct 
persistent_ram_zone *prz, u32 sig,
bool zap = !!(prz->flags & PRZ_FLAG_ZAP_OLD);
 
ret = persistent_ram_init_ecc(prz, ecc_info);
-   if (ret)
+   if (ret) {
+   pr_warn("ECC failed %s\n", prz->label);
return ret;
+   }
 
sig ^= PERSISTENT_RAM_SIG;
 
diff --git a/include/linux/pstore.h b/include/linux/pstore.h
index a15bc4d48752..877ed81de346 100644
--- a/include/linux/pstore.h
+++ b/include/linux/pstore.h
@@ -85,7 +85,7 @@ struct pstore_record {
 /**
  * struct pstore_info - backend pstore driver structure
  *
- * @owner: module which is repsonsible for this backend driver
+ * @owner: module which is responsible for this backend driver
  * @name:  name of the backend driver
  *
  * @buf_lock:  spinlock to serialize access to @buf
-- 
2.17.1



Re: [PATCH] ARM: kprobes: Fix false positive with FORTIFY_SOURCE

2018-11-01 Thread Kees Cook
On Thu, Nov 1, 2018 at 7:41 AM, Masami Hiramatsu  wrote:
> On Tue, 30 Oct 2018 13:40:27 -0400
> William Cohen  wrote:
>
>> On 10/22/18 5:30 AM, Kees Cook wrote:
>> > The arm compiler internally interprets an inline assembly label
>> > as an unsigned long value, not a pointer. As a result, under
>> > CONFIG_FORTIFY_SOURCE, the size of the array pointed to by an address
>> > of a label is 4 bytes, which was tripping the runtime checks. Instead,
>> > we can just cast the label (as done with the size calculations earlier)
>> > to avoid the problem.
>> >
>> > Reported-by: William Cohen 
>> > Fixes: 6974f0c4555e ("include/linux/string.h: add the option of fortified 
>> > string.h functions")
>> > Cc: sta...@vger.kernel.org
>> > Signed-off-by: Kees Cook 
>> > ---
>> >  arch/arm/probes/kprobes/opt-arm.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/arch/arm/probes/kprobes/opt-arm.c 
>> > b/arch/arm/probes/kprobes/opt-arm.c
>> > index b2aa9b32bff2..2c118a6ab358 100644
>> > --- a/arch/arm/probes/kprobes/opt-arm.c
>> > +++ b/arch/arm/probes/kprobes/opt-arm.c
>> > @@ -247,7 +247,7 @@ int arch_prepare_optimized_kprobe(struct 
>> > optimized_kprobe *op, struct kprobe *or
>> > }
>> >
>> > /* Copy arch-dep-instance from template. */
>> > -   memcpy(code, _template_entry,
>> > +   memcpy(code, (unsigned char *)optprobe_template_entry,
>> > TMPL_END_IDX * sizeof(kprobe_opcode_t));
>> >
>> > /* Adjust buffer according to instruction. */
>> >
>>
>> The patch fixes the issue for kretprobes.  It looks good to me.
>
> This looks good to me too. Thank you for finding and fixing it :)
>
> Acked-by: Masami Hiramatsu 
>
> Thanks!

Thanks for acks and testing. I've tossed this at the arm patch tracker:
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8806/1

-- 
Kees Cook


Re: [PATCH 16/17] prmem: pratomic-long

2018-10-31 Thread Kees Cook
On Wed, Oct 31, 2018 at 2:10 AM, Peter Zijlstra  wrote:
> On Tue, Oct 30, 2018 at 04:28:16PM +, Will Deacon wrote:
>> On Tue, Oct 30, 2018 at 04:58:41PM +0100, Peter Zijlstra wrote:
>> > Like mentioned elsewhere; if you do write_enable() + write_disable()
>> > thingies, it all becomes:
>> >
>> > write_enable();
>> > atomic_foo();
>> > write_disable();
>> >
>> > No magic gunk infested duplication at all. Of course, ideally you'd then
>> > teach objtool about this (or a GCC plugin I suppose) to ensure any
>> > enable reached a disable.
>>
>> Isn't the issue here that we don't want to change the page tables for the
>> mapping of , but instead want to create a temporary writable alias
>> at a random virtual address?
>>
>> So you'd want:
>>
>>   wbar = write_enable();
>>   atomic_foo(wbar);
>>   write_disable(wbar);
>>
>> which is probably better expressed as a map/unmap API. I suspect this
>> would also be the only way to do things for cmpxchg() loops, where you
>> want to create the mapping outside of the loop to minimise your time in
>> the critical section.
>
> Ah, so I was thikning that the altnerative mm would have stuff in the
> same location, just RW instead of RO.

I was hoping for the same location too. That allows use to use a gcc
plugin to mark, say, function pointer tables, as read-only, and
annotate their rare updates with write_rare() without any
recalculation.

-Kees

-- 
Kees Cook


Re: [patch 0/9] time: Add SPDX identifiers and cleanup boilerplates

2018-10-31 Thread Kees Cook
On Wed, Oct 31, 2018 at 11:21 AM, Thomas Gleixner  wrote:
> Add SPDX identifiers to all files in kernel/time and remove the license
> boiler plates.
>
> Aside of that use the chance to get rid of (stale) file references and tidy
> up the top of file comments as they are touched anyway by this work.
>
> This work is based on a script and data from Philippe Ombredanne, Kate
> Stewart and myself. The data has been created with two independent license
> scanners and manual inspection.

Looks good! I need to do this for some of my trees too... too much
boilerplate in the tree! :)

Acked-by: Kees Cook 

-- 
Kees Cook


Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-10-30 Thread Kees Cook
On Tue, Oct 30, 2018 at 5:29 PM, Tycho Andersen  wrote:
> On Tue, Oct 30, 2018 at 03:34:54PM -0700, Kees Cook wrote:
>> On Tue, Oct 30, 2018 at 3:32 PM, Tycho Andersen  wrote:
>> > On Tue, Oct 30, 2018 at 03:00:17PM -0700, Kees Cook wrote:
>> >> On Tue, Oct 30, 2018 at 2:54 PM, Tycho Andersen  wrote:
>> >> > On Tue, Oct 30, 2018 at 02:49:21PM -0700, Kees Cook wrote:
>> >> >> On Mon, Oct 29, 2018 at 3:40 PM, Tycho Andersen  wrote:
>> >> >> > * switch to a flags based future-proofing mechanism for struct
>> >> >> >   seccomp_notif and seccomp_notif_resp, thus avoiding version 
>> >> >> > issues
>> >> >> >   with structure length (Kees)
>> >> >> [...]
>> >> >> >
>> >> >> > +struct seccomp_notif {
>> >> >> > +   __u64 id;
>> >> >> > +   __u32 pid;
>> >> >> > +   __u32 flags;
>> >> >> > +   struct seccomp_data data;
>> >> >> > +};
>> >> >> > +
>> >> >> > +struct seccomp_notif_resp {
>> >> >> > +   __u64 id;
>> >> >> > +   __s64 val;
>> >> >> > +   __s32 error;
>> >> >> > +   __u32 flags;
>> >> >> > +};
>> >> >>
>> >> >> Hrm, so, what's the plan for when struct seccomp_data changes size?
>> >> >
>> >> > I guess my plan was don't ever change the size again, just use flags
>> >> > and have extra state available via ioctl().
>> >> >
>> >> >> I'm realizing that it might be "too late" for userspace to discover
>> >> >> it's running on a newer kernel. i.e. it gets a user notification, and
>> >> >> discovers flags it doesn't know how to handle. Do we actually need
>> >> >> both flags AND a length? Designing UAPI is frustrating! :)
>> >> >
>> >> > :). I don't see this as such a big problem -- in fact it's better than
>> >> > the length mode, where you don't know what you don't know, because it
>> >> > only copied as much info as you could handle. Older userspace would
>> >> > simply not use information it didn't know how to use.
>> >> >
>> >> >> Do we need another ioctl to discover the seccomp_data size maybe?
>> >> >
>> >> > That could be an option as well, assuming we agree that size would
>> >> > work, which I thought we didn't?
>> >>
>> >> Size alone wasn't able to determine the layout of the seccomp_notif
>> >> structure since it had holes (in the prior version). seccomp_data
>> >> doesn't have holes and is likely to change in size (see the recent
>> >> thread on adding the MPK register to it...)
>> >
>> > Oh, sorry, I misread this as seccomp_notif, not seccomp_data.
>> >
>> >> I'm trying to imagine the right API for this. A portable user of
>> >> seccomp_notif expects the id/pid/flags/data to always be in the same
>> >> place, but it's the size of seccomp_data that may change. So it wants
>> >> to allocate space for seccomp_notif header and "everything else", of
>> >> which is may only understand the start of seccomp_data (and ignore any
>> >> new trailing fields).
>> >>
>> >> So... perhaps the "how big are things?" ioctl would report the header
>> >> size and the seccomp_data size. Then both are flexible. And flags
>> >> would be left as a way to "version" the header?
>> >>
>> >> Any Linux API list members want to chime in here?
>> >
>> > So:
>> >
>> > struct seccomp_notify_sizes {
>> > u16 seccomp_notify;
>> > u16 seccomp_data;
>> > };
>> >
>> > ioctl(fd, SECCOMP_IOCTL_GET_SIZE, );
>> >
>> > This would be only one extra syscall over the lifetime of the listener
>> > process, which doesn't seem too bad. One thing that's slightly
>> > annoying is that you can't do it until you actually get an event, so
>> > maybe it could be a command on the seccomp syscall instead:
>> >
>> > seccomp(SECCOMP_GET_NOTIF_SIZES, 0, );
>>
>> Yeah, top-level makes more sense. u16 seems fine too.
>
> So one problem is this is that the third argument of the seccomp
> syscall is declared as const char, so I get:
>
> kernel/seccomp.c: In function ‘seccomp_get_notif_sizes’:
> kernel/seccomp.c:1401:19: warning: passing argument 1 of ‘copy_to_user’ 
> discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
>   if (copy_to_user(usizes, , sizeof(sizes)))
>^~
> In file included from ./include/linux/compat.h:19:0,
>  from kernel/seccomp.c:19:
> ./include/linux/uaccess.h:152:1: note: expected ‘void *’ but argument is of 
> type ‘const char *’
>  copy_to_user(void __user *to, const void *from, unsigned long n)
>  ^~~~
>
> If I drop the const it doesn't complain, but I'm not sure what the protocol is
> for changing the types of syscall declarations. In principle it doesn't really
> mean anything, but...

I think this should be fine. It's documented as "void *"...

-- 
Kees Cook


Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-10-30 Thread Kees Cook
On Tue, Oct 30, 2018 at 3:32 PM, Tycho Andersen  wrote:
> On Tue, Oct 30, 2018 at 03:00:17PM -0700, Kees Cook wrote:
>> On Tue, Oct 30, 2018 at 2:54 PM, Tycho Andersen  wrote:
>> > On Tue, Oct 30, 2018 at 02:49:21PM -0700, Kees Cook wrote:
>> >> On Mon, Oct 29, 2018 at 3:40 PM, Tycho Andersen  wrote:
>> >> > * switch to a flags based future-proofing mechanism for struct
>> >> >   seccomp_notif and seccomp_notif_resp, thus avoiding version issues
>> >> >   with structure length (Kees)
>> >> [...]
>> >> >
>> >> > +struct seccomp_notif {
>> >> > +   __u64 id;
>> >> > +   __u32 pid;
>> >> > +   __u32 flags;
>> >> > +   struct seccomp_data data;
>> >> > +};
>> >> > +
>> >> > +struct seccomp_notif_resp {
>> >> > +   __u64 id;
>> >> > +   __s64 val;
>> >> > +   __s32 error;
>> >> > +   __u32 flags;
>> >> > +};
>> >>
>> >> Hrm, so, what's the plan for when struct seccomp_data changes size?
>> >
>> > I guess my plan was don't ever change the size again, just use flags
>> > and have extra state available via ioctl().
>> >
>> >> I'm realizing that it might be "too late" for userspace to discover
>> >> it's running on a newer kernel. i.e. it gets a user notification, and
>> >> discovers flags it doesn't know how to handle. Do we actually need
>> >> both flags AND a length? Designing UAPI is frustrating! :)
>> >
>> > :). I don't see this as such a big problem -- in fact it's better than
>> > the length mode, where you don't know what you don't know, because it
>> > only copied as much info as you could handle. Older userspace would
>> > simply not use information it didn't know how to use.
>> >
>> >> Do we need another ioctl to discover the seccomp_data size maybe?
>> >
>> > That could be an option as well, assuming we agree that size would
>> > work, which I thought we didn't?
>>
>> Size alone wasn't able to determine the layout of the seccomp_notif
>> structure since it had holes (in the prior version). seccomp_data
>> doesn't have holes and is likely to change in size (see the recent
>> thread on adding the MPK register to it...)
>
> Oh, sorry, I misread this as seccomp_notif, not seccomp_data.
>
>> I'm trying to imagine the right API for this. A portable user of
>> seccomp_notif expects the id/pid/flags/data to always be in the same
>> place, but it's the size of seccomp_data that may change. So it wants
>> to allocate space for seccomp_notif header and "everything else", of
>> which is may only understand the start of seccomp_data (and ignore any
>> new trailing fields).
>>
>> So... perhaps the "how big are things?" ioctl would report the header
>> size and the seccomp_data size. Then both are flexible. And flags
>> would be left as a way to "version" the header?
>>
>> Any Linux API list members want to chime in here?
>
> So:
>
> struct seccomp_notify_sizes {
> u16 seccomp_notify;
> u16 seccomp_data;
> };
>
> ioctl(fd, SECCOMP_IOCTL_GET_SIZE, );
>
> This would be only one extra syscall over the lifetime of the listener
> process, which doesn't seem too bad. One thing that's slightly
> annoying is that you can't do it until you actually get an event, so
> maybe it could be a command on the seccomp syscall instead:
>
> seccomp(SECCOMP_GET_NOTIF_SIZES, 0, );

Yeah, top-level makes more sense. u16 seems fine too.

-- 
Kees Cook


Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-10-30 Thread Kees Cook
On Tue, Oct 30, 2018 at 2:54 PM, Tycho Andersen  wrote:
> On Tue, Oct 30, 2018 at 02:49:21PM -0700, Kees Cook wrote:
>> On Mon, Oct 29, 2018 at 3:40 PM, Tycho Andersen  wrote:
>> > * switch to a flags based future-proofing mechanism for struct
>> >   seccomp_notif and seccomp_notif_resp, thus avoiding version issues
>> >   with structure length (Kees)
>> [...]
>> >
>> > +struct seccomp_notif {
>> > +   __u64 id;
>> > +   __u32 pid;
>> > +   __u32 flags;
>> > +   struct seccomp_data data;
>> > +};
>> > +
>> > +struct seccomp_notif_resp {
>> > +   __u64 id;
>> > +   __s64 val;
>> > +   __s32 error;
>> > +   __u32 flags;
>> > +};
>>
>> Hrm, so, what's the plan for when struct seccomp_data changes size?
>
> I guess my plan was don't ever change the size again, just use flags
> and have extra state available via ioctl().
>
>> I'm realizing that it might be "too late" for userspace to discover
>> it's running on a newer kernel. i.e. it gets a user notification, and
>> discovers flags it doesn't know how to handle. Do we actually need
>> both flags AND a length? Designing UAPI is frustrating! :)
>
> :). I don't see this as such a big problem -- in fact it's better than
> the length mode, where you don't know what you don't know, because it
> only copied as much info as you could handle. Older userspace would
> simply not use information it didn't know how to use.
>
>> Do we need another ioctl to discover the seccomp_data size maybe?
>
> That could be an option as well, assuming we agree that size would
> work, which I thought we didn't?

Size alone wasn't able to determine the layout of the seccomp_notif
structure since it had holes (in the prior version). seccomp_data
doesn't have holes and is likely to change in size (see the recent
thread on adding the MPK register to it...)

I'm trying to imagine the right API for this. A portable user of
seccomp_notif expects the id/pid/flags/data to always be in the same
place, but it's the size of seccomp_data that may change. So it wants
to allocate space for seccomp_notif header and "everything else", of
which is may only understand the start of seccomp_data (and ignore any
new trailing fields).

So... perhaps the "how big are things?" ioctl would report the header
size and the seccomp_data size. Then both are flexible. And flags
would be left as a way to "version" the header?

Any Linux API list members want to chime in here?

-- 
Kees Cook


Re: [PATCH v4] pstore: Avoid duplicate call of persistent_ram_zap()

2018-10-30 Thread Kees Cook
On Tue, Oct 30, 2018 at 2:38 PM, Joel Fernandes  wrote:
> On Tue, Oct 30, 2018 at 03:52:34PM +0800, Peng Wang wrote:
>> When initialing prz with invalid data in buffer(no PERSISTENT_RAM_SIG),
>> function call path is like this:
>>
>> ramoops_init_prz ->
>> |
>> |-> persistent_ram_new -> persistent_ram_post_init -> persistent_ram_zap
>> |
>> |-> persistent_ram_zap
>>
>> As we can see, persistent_ram_zap() is called twice.
>> We can avoid this by adding an option to persistent_ram_new(), and
>> only call persistent_ram_zap() when it is needed.
>>
>> Signed-off-by: Peng Wang 
>> ---
>>  fs/pstore/ram.c| 4 +---
>>  fs/pstore/ram_core.c   | 5 +++--
>>  include/linux/pstore_ram.h | 1 +
>>  3 files changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
>> index ffcff6516e89..b51901f97dc2 100644
>> --- a/fs/pstore/ram.c
>> +++ b/fs/pstore/ram.c
>> @@ -640,7 +640,7 @@ static int ramoops_init_prz(const char *name,
>>
>>   label = kasprintf(GFP_KERNEL, "ramoops:%s", name);
>>   *prz = persistent_ram_new(*paddr, sz, sig, >ecc_info,
>> -   cxt->memtype, 0, label);
>> +   cxt->memtype, PRZ_FLAG_ZAP_OLD, label);
>>   if (IS_ERR(*prz)) {
>>   int err = PTR_ERR(*prz);
>
> Looks good to me except the minor comment below:
>
>>
>> @@ -649,8 +649,6 @@ static int ramoops_init_prz(const char *name,
>>   return err;
>>   }
>>
>> - persistent_ram_zap(*prz);
>> -
>>   *paddr += sz;
>>
>>   return 0;
>> diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
>> index 12e21f789194..2ededd1ea1c2 100644
>> --- a/fs/pstore/ram_core.c
>> +++ b/fs/pstore/ram_core.c
>> @@ -505,15 +505,16 @@ static int persistent_ram_post_init(struct 
>> persistent_ram_zone *prz, u32 sig,
>>   pr_debug("found existing buffer, size %zu, start 
>> %zu\n",
>>buffer_size(prz), buffer_start(prz));
>>   persistent_ram_save_old(prz);
>> - return 0;
>> + if (!(prz->flags & PRZ_FLAG_ZAP_OLD))
>> + return 0;
>
> This could be written differently.
>
> We could just do:
>
> if (prz->flags & PRZ_FLAG_ZAP_OLD)
> persistent_ram_zap(prz);
>
> And remove the zap from below below.

I actually rearranged things a little to avoid additional round-trips
on the mailing list. :)

> Since Kees already took this patch, I can just patch this in my series if
> Kees and you are Ok with this suggestion.

I've put it up here:
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=pstore/devel=ac564e023248e3f4d87917b91d12376ddfca5000

> Sorry for the delay in my RFC series, I just got back from paternity leave
> and I'm catching up with email.

No worries! It's many weeks until the next merge window. :)

-- 
Kees Cook


Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-10-30 Thread Kees Cook
On Mon, Oct 29, 2018 at 3:40 PM, Tycho Andersen  wrote:
> * switch to a flags based future-proofing mechanism for struct
>   seccomp_notif and seccomp_notif_resp, thus avoiding version issues
>   with structure length (Kees)
[...]
>
> +struct seccomp_notif {
> +   __u64 id;
> +   __u32 pid;
> +   __u32 flags;
> +   struct seccomp_data data;
> +};
> +
> +struct seccomp_notif_resp {
> +   __u64 id;
> +   __s64 val;
> +   __s32 error;
> +   __u32 flags;
> +};

Hrm, so, what's the plan for when struct seccomp_data changes size?
I'm realizing that it might be "too late" for userspace to discover
it's running on a newer kernel. i.e. it gets a user notification, and
discovers flags it doesn't know how to handle. Do we actually need
both flags AND a length? Designing UAPI is frustrating! :)

Do we need another ioctl to discover the seccomp_data size maybe?

-- 
Kees Cook


Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-10-30 Thread Kees Cook
On Tue, Oct 30, 2018 at 8:54 AM, Tycho Andersen  wrote:
> On Tue, Oct 30, 2018 at 04:02:54PM +0100, Oleg Nesterov wrote:
>> On 10/29, Tycho Andersen wrote:
>> >
>> > +static long seccomp_notify_recv(struct seccomp_filter *filter,
>> > +   void __user *buf)
>> > +{
>> > +   struct seccomp_knotif *knotif = NULL, *cur;
>> > +   struct seccomp_notif unotif;
>> > +   ssize_t ret;
>> > +
>> > +   memset(, 0, sizeof(unotif));
>> > +
>> > +   ret = down_interruptible(>notif->request);
>> > +   if (ret < 0)
>> > +   return ret;
>> > +
>> > +   mutex_lock(>notify_lock);
>> > +   list_for_each_entry(cur, >notif->notifications, list) {
>> > +   if (cur->state == SECCOMP_NOTIFY_INIT) {
>> > +   knotif = cur;
>> > +   break;
>> > +   }
>> > +   }
>> > +
>> > +   /*
>> > +* If we didn't find a notification, it could be that the task was
>> > +* interrupted by a fatal signal between the time we were woken and
>> > +* when we were able to acquire the rw lock.
>> > +*
>> > +* This is the place where we handle the extra high semaphore count
>> > +* mentioned in seccomp_do_user_notification().
>> > +*/
>> > +   if (!knotif) {
>> > +   ret = -ENOENT;
>> > +   goto out;
>> > +   }
>> > +
>> > +   unotif.id = knotif->id;
>> > +   unotif.pid = task_pid_vnr(knotif->task);
>> > +   if (knotif->signaled)
>> > +   unotif.flags |= SECCOMP_NOTIF_FLAG_SIGNALED;
>> > +   unotif.data = *(knotif->data);
>>
>> Tycho, I forgot everything about seccomp, most probably I am wrong but let me
>> ask anyway.
>>
>> __seccomp_filter(SECCOMP_RET_TRACE) does
>>
>>   /*
>>* Recheck the syscall, since it may have changed. This
>>* intentionally uses a NULL struct seccomp_data to force
>>* a reload of all registers. This does not goto skip since
>>* a skip would have already been reported.
>>*/
>>   if (__seccomp_filter(this_syscall, NULL, true))
>>   return -1;
>>
>> and the next seccomp_run_filters() can return SECCOMP_RET_USER_NOTIF, right?
>> seccomp_do_user_notification() doesn't check recheck_after_trace and it 
>> simply
>> does n.data = sd.
>>
>> Doesn't this mean that "unotif.data = *(knotif->data)" can hit NULL ?
>>
>> seccomp_run_filters() does populate_seccomp_data() in this case, but this
>> won't affect "seccomp_data *sd" passed to seccomp_do_user_notification().

Woo, yeah, good catch. :)

> Oof, yes, you're right. Seems like there are no other users of sd in
> __seccomp_filter(). Seems to me like we can just do the
> populate_seccomp_data() one level higher in __seccomp_filter()?

Agreed.

>
> Tycho
>
>
> From 9e0f75ea51a2c328567910df3122a236ebeccab0 Mon Sep 17 00:00:00 2001
> From: Tycho Andersen 
> Date: Tue, 30 Oct 2018 09:51:14 -0600
> Subject: [PATCH] seccomp: hoist struct seccomp_data recalculation higher
>
> Signed-off-by: Tycho Andersen 
> ---
>  kernel/seccomp.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 4c5fb6ced4cd..1525cb753ad2 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -257,7 +257,6 @@ static int seccomp_check_filter(struct sock_filter 
> *filter, unsigned int flen)
>  static u32 seccomp_run_filters(const struct seccomp_data *sd,
>struct seccomp_filter **match)
>  {
> -   struct seccomp_data sd_local;
> u32 ret = SECCOMP_RET_ALLOW;
> /* Make sure cross-thread synced filter points somewhere sane. */
> struct seccomp_filter *f =
> @@ -267,11 +266,6 @@ static u32 seccomp_run_filters(const struct seccomp_data 
> *sd,
> if (unlikely(WARN_ON(f == NULL)))
> return SECCOMP_RET_KILL_PROCESS;
>
> -   if (!sd) {
> -   populate_seccomp_data(_local);
> -   sd = _local;
> -   }
> -
> /*
>  * All filters in the list are evaluated and the lowest BPF return
>  * value always takes priority (ignoring the DATA).
> @@ -821,6 +815,7 @@ static int __seccomp_filter(int this_syscall, const 
> struct seccomp_data *sd,
> u32 filter_ret, action;
> struct seccomp_filter *match = NULL;
> int data;
> +   struct seccomp_data sd_local;
>
> /*
>  * Make sure that any changes to mode from another thread have
> @@ -828,6 +823,11 @@ static int __seccomp_filter(int this_syscall, const 
> struct seccomp_data *sd,
>  */
> rmb();
>
> +   if (!sd) {
> +   populate_seccomp_data(_local);
> +   sd = _local;
> +   }
> +
> filter_ret = seccomp_run_filters(sd, );
> data = filter_ret & SECCOMP_RET_DATA;
> action = filter_ret & SECCOMP_RET_ACTION_FULL;
> --
> 2.17.1
>

Looks good to me, yes.

-- 
Kees Cook


Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-10-30 Thread Kees Cook
On Tue, Oct 30, 2018 at 10:21 AM, Tycho Andersen  wrote:
> On Tue, Oct 30, 2018 at 05:39:26PM +0100, Oleg Nesterov wrote:
>> On 10/30, Oleg Nesterov wrote:
>> >
>> > On 10/30, Tycho Andersen wrote:
>> > >
>> > > @@ -828,6 +823,11 @@ static int __seccomp_filter(int this_syscall, const 
>> > > struct seccomp_data *sd,
>> > >*/
>> > >   rmb();
>> > >
>> > > + if (!sd) {
>> > > + populate_seccomp_data(_local);
>> > > + sd = _local;
>> > > + }
>> > > +
>> >
>> > To me it would be more clean to remove the "if (!sd)" check, 
>> > case(SECCOMP_RET_TRACE)
>> > in __seccomp_filter() can simply do populate_seccomp_data(_local) 
>> > unconditionally
>> > and pass _local to __seccomp_filter().
>>
>> Ah, please ignore, emulate_vsyscall() does secure_computing(NULL).

Right.

>>
>> Btw. why __seccomp_filter() doesn't return a boolean?

Because it was wrapped by __secure_computing(). *shrug* The common
method in the kernel is to use int and 0=ok.

>> Or at least, why can't case(SECCOMP_RET_TRACE) simply do
>>
>>   return __seccomp_filter(this_syscall, NULL, true);
>>
>> ?
>
> Yeah, at least the second one definitely makes sense. I can add that
> as a patch in the next version of this series unless Kees does it
> before.

I'd like to avoid changing the return value of __secure_computing() to
just avoid having to touch all the callers. And I'd prefer not to
change __seccomp_filter() to a bool, since I'd like the return values
to be consistent through the call chain.

I find the existing code more readable than a single-line return, just
because it's very explicit. I don't want to have to think any harder
when reading seccomp. ;)

-Kees

-- 
Kees Cook


  1   2   3   4   5   6   7   8   9   10   >