Re: [PATCH] mm: Allow userland to request that the kernel clear memory on release

2019-04-25 Thread Matthew Garrett
On Thu, Apr 25, 2019 at 8:32 AM Christopher Lameter wrote:
>
> On Wed, 24 Apr 2019, Matthew Garrett wrote:
>
> > Applications that hold secrets and wish to avoid them leaking can use
> > mlock() to prevent the page from being pushed out to swap and
> > MADV_DONTDUMP to prevent it from being included in core dumps. Applications
> > can also use atexit() handlers to overwrite secrets on application exit.
> > However, if an attacker can reboot the system into another OS, they can
> > dump the contents of RAM and extract secrets. We can avoid this by setting
>
> Well, nothing in this patchset deals with that issue. That hole still
> exists afterwards. So is it worth having this functionality?

On UEFI systems we can set the MOR bit and the firmware will overwrite
RAM on reboot. However, this can take a long time, which makes it
difficult to justify doing by default. We want userland to be able to
assert that secrets have been cleared from RAM and then clear the MOR
flag, but we can't do that if applications can terminate in a way that
prevents them from clearing their secrets.

> > Unfortunately, if an application exits uncleanly, its secrets may still be
> > present in RAM. This can't be easily fixed in userland (eg, if the OOM
> > killer decides to kill a process holding secrets, we're not going to be able
> > to avoid that), so this patch adds a new flag to madvise() to allow userland
> > to request that the kernel clear the covered pages whenever the page
> > reference count hits zero. Since vm_flags is already full on 32-bit, it
> > will only work on 64-bit systems.
>
> But then the pages are cleared anyways when reallocated to another
> process. This just clears it sooner before reuse. So it will reduce the
> time that a page contains the secret sauce in case the program is
> aborted and cannot run its exit handling.

On a mostly idle system there's a real chance that nothing will end up
re-using the page before a reboot happens.

> Is that really worth extending system calls and adding kernel handling for
> this? Maybe the answer is yes given our current concern about anything
> related to "security".

If I didn't think it was worth it, I wouldn't be proposing it :)


Re: [PATCH] mm: Allow userland to request that the kernel clear memory on release

2019-04-25 Thread Christopher Lameter
On Wed, 24 Apr 2019, Matthew Garrett wrote:

> Applications that hold secrets and wish to avoid them leaking can use
> mlock() to prevent the page from being pushed out to swap and
> MADV_DONTDUMP to prevent it from being included in core dumps. Applications
> can also use atexit() handlers to overwrite secrets on application exit.
> However, if an attacker can reboot the system into another OS, they can
> dump the contents of RAM and extract secrets. We can avoid this by setting

Well, nothing in this patchset deals with that issue. That hole still
exists afterwards. So is it worth having this functionality?

> Unfortunately, if an application exits uncleanly, its secrets may still be
> present in RAM. This can't be easily fixed in userland (eg, if the OOM
> killer decides to kill a process holding secrets, we're not going to be able
> to avoid that), so this patch adds a new flag to madvise() to allow userland
> to request that the kernel clear the covered pages whenever the page
> reference count hits zero. Since vm_flags is already full on 32-bit, it
> will only work on 64-bit systems.

But then the pages are cleared anyways when reallocated to another
process. This just clears it sooner before reuse. So it will reduce the
time that a page contains the secret sauce in case the program is
aborted and cannot run its exit handling.

Is that really worth extending system calls and adding kernel handling for
this? Maybe the answer is yes given our current concern about anything
related to "security".



Re: [PATCH] mm: Allow userland to request that the kernel clear memory on release

2019-04-24 Thread Matthew Garrett
On Wed, Apr 24, 2019 at 1:20 PM Matthew Wilcox wrote:
> It depends on the semantics you want.  There's no legacy code to
> worry about here.  I was seeing this as the equivalent of an atexit()
> handler; userspace is saying "When this page is unmapped, zero it".
> So it doesn't matter that somebody else might be able to reference it --
> userspace could have zeroed it themselves.

Mm, OK, that seems reasonable. I'll rework with that in mind.


Re: [PATCH] mm: Allow userland to request that the kernel clear memory on release

2019-04-24 Thread Matthew Wilcox
On Wed, Apr 24, 2019 at 12:33:11PM -0700, Matthew Garrett wrote:
> On Wed, Apr 24, 2019 at 12:28 PM Matthew Wilcox wrote:
> > But you can't have a new PageFlag.  Can you instead zero the memory in
> > unmap_single_vma() where we call uprobe_munmap() and untrack_pfn() today?
> 
> Is there any way the page could be referenced by something other than
> a VMA at this point? If so we probably don't want to zero it here, but
> we do want to zero it when the page is finally released (which is why
> I went with a page flag)

It could be the target/source of direct I/O, or userspace could have
registered it with an RDMA device, or ...

It depends on the semantics you want.  There's no legacy code to
worry about here.  I was seeing this as the equivalent of an atexit()
handler; userspace is saying "When this page is unmapped, zero it".
So it doesn't matter that somebody else might be able to reference it --
userspace could have zeroed it themselves.


Re: [PATCH] mm: Allow userland to request that the kernel clear memory on release

2019-04-24 Thread Matthew Garrett
On Wed, Apr 24, 2019 at 12:28 PM Matthew Wilcox wrote:
>
> On Wed, Apr 24, 2019 at 12:14:40PM -0700, Matthew Garrett wrote:
> > Unfortunately, if an application exits uncleanly, its secrets may still be
> > present in RAM. This can't be easily fixed in userland (eg, if the OOM
> > killer decides to kill a process holding secrets, we're not going to be able
> > to avoid that), so this patch adds a new flag to madvise() to allow userland
> > to request that the kernel clear the covered pages whenever the page
> > reference count hits zero. Since vm_flags is already full on 32-bit, it
> > will only work on 64-bit systems.
>
> Your request seems reasonable to me.
>
> > +++ b/include/linux/page-flags.h
> > @@ -118,6 +118,7 @@ enum pageflags {
> >   PG_reclaim, /* To be reclaimed asap */
> >   PG_swapbacked,  /* Page is backed by RAM/swap */
> >   PG_unevictable, /* Page is "unevictable"  */
> > + PG_wipeonrelease,
>
> But you can't have a new PageFlag.  Can you instead zero the memory in
> unmap_single_vma() where we call uprobe_munmap() and untrack_pfn() today?

Is there any way the page could be referenced by something other than
a VMA at this point? If so we probably don't want to zero it here, but
we do want to zero it when the page is finally released (which is why
I went with a page flag)


Re: [PATCH] mm: Allow userland to request that the kernel clear memory on release

2019-04-24 Thread Matthew Wilcox
On Wed, Apr 24, 2019 at 12:14:40PM -0700, Matthew Garrett wrote:
> Unfortunately, if an application exits uncleanly, its secrets may still be
> present in RAM. This can't be easily fixed in userland (eg, if the OOM
> killer decides to kill a process holding secrets, we're not going to be able
> to avoid that), so this patch adds a new flag to madvise() to allow userland
> to request that the kernel clear the covered pages whenever the page
> reference count hits zero. Since vm_flags is already full on 32-bit, it
> will only work on 64-bit systems.

Your request seems reasonable to me.

> +++ b/include/linux/page-flags.h
> @@ -118,6 +118,7 @@ enum pageflags {
>   PG_reclaim, /* To be reclaimed asap */
>   PG_swapbacked,  /* Page is backed by RAM/swap */
>   PG_unevictable, /* Page is "unevictable"  */
> + PG_wipeonrelease,

But you can't have a new PageFlag.  Can you instead zero the memory in
unmap_single_vma() where we call uprobe_munmap() and untrack_pfn() today?



[PATCH] mm: Allow userland to request that the kernel clear memory on release

2019-04-24 Thread Matthew Garrett
From: Matthew Garrett 

Applications that hold secrets and wish to avoid them leaking can use
mlock() to prevent the page from being pushed out to swap and
MADV_DONTDUMP to prevent it from being included in core dumps. Applications
can also use atexit() handlers to overwrite secrets on application exit.
However, if an attacker can reboot the system into another OS, they can
dump the contents of RAM and extract secrets. We can avoid this by setting
CONFIG_RESET_ATTACK_MITIGATION on UEFI systems in order to request that the
firmware wipe the contents of RAM before booting another OS, but this means
rebooting takes a *long* time - the expected behaviour is for a clean
shutdown to remove the request after scrubbing secrets from RAM in order to
avoid this.

Unfortunately, if an application exits uncleanly, its secrets may still be
present in RAM. This can't be easily fixed in userland (eg, if the OOM
killer decides to kill a process holding secrets, we're not going to be able
to avoid that), so this patch adds a new flag to madvise() to allow userland
to request that the kernel clear the covered pages whenever the page
reference count hits zero. Since vm_flags is already full on 32-bit, it
will only work on 64-bit systems.

Signed-off-by: Matthew Garrett 
---

I know nothing about mm, so this is doubtless broken in any number of
ways - please let me know how!
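
For reference, the existing userland mitigations named in the changelog
(mlock(), MADV_DONTDUMP and an atexit() handler) might be combined roughly
as follows. This is only a minimal sketch, assuming Linux with glibc 2.25
or later for explicit_bzero(). The helper names secret_alloc() and
secret_wipe() are made up for illustration and are not part of this patch,
and the sketch tracks a single buffer only.

```c
#define _DEFAULT_SOURCE /* MAP_ANONYMOUS, MADV_DONTDUMP, explicit_bzero() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical bookkeeping for one secret buffer (illustration only). */
static void *secret_buf;
static size_t secret_len;

/* atexit() handler: overwrite the secret when the process exits cleanly. */
static void secret_wipe(void)
{
	if (secret_buf)
		explicit_bzero(secret_buf, secret_len);
}

/*
 * Allocate a buffer for secrets that is kept out of swap and out of
 * core dumps. Returns the buffer, or NULL if any step fails.
 */
void *secret_alloc(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;

	if (mlock(p, len) != 0 ||                  /* never pushed out to swap */
	    madvise(p, len, MADV_DONTDUMP) != 0 || /* excluded from core dumps */
	    atexit(secret_wipe) != 0) {            /* wiped on clean exit */
		munmap(p, len);
		return NULL;
	}

	secret_buf = p;
	secret_len = len;
	return p;
}
```

The atexit() handler only runs on a normal exit. If the process is killed
with SIGKILL or by the OOM killer, secret_wipe() is never called and the
pages keep their contents until something reuses them, which is the gap
described above.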

 include/linux/mm.h                     |  6
 include/linux/page-flags.h             |  2 ++
 include/trace/events/mmflags.h         |  4 +--
 include/uapi/asm-generic/mman-common.h |  2 ++
 mm/hugetlb.c                           |  2 ++
 mm/madvise.c                           | 39 ++
 mm/mempolicy.c                         |  2 ++
 mm/page_alloc.c                        |  6
 8 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6b10c21630f5..64bdab679275 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -257,6 +257,8 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HIGH_ARCH_2 BIT(VM_HIGH_ARCH_BIT_2)
 #define VM_HIGH_ARCH_3 BIT(VM_HIGH_ARCH_BIT_3)
 #define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4)
+
+#define VM_WIPEONRELEASE BIT(37)   /* Clear pages when releasing them */
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
@@ -298,6 +300,10 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_GROWSUP VM_NONE
 #endif
 
+#ifndef VM_WIPEONRELEASE
+# define VM_WIPEONRELEASE VM_NONE
+#endif
+
 /* Bits set in the VMA until the stack is in its final location */
 #define VM_STACK_INCOMPLETE_SETUP  (VM_RAND_READ | VM_SEQ_READ)
 
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 9f8712a4b1a5..c52ea8a89c5d 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -118,6 +118,7 @@ enum pageflags {
 	PG_reclaim,		/* To be reclaimed asap */
 	PG_swapbacked,		/* Page is backed by RAM/swap */
 	PG_unevictable,		/* Page is "unevictable"  */
+	PG_wipeonrelease,
 #ifdef CONFIG_MMU
 	PG_mlocked,		/* Page is vma mlocked */
 #endif
@@ -316,6 +317,7 @@ PAGEFLAG(Referenced, referenced, PF_HEAD)
 PAGEFLAG(Dirty, dirty, PF_HEAD) TESTSCFLAG(Dirty, dirty, PF_HEAD)
 	__CLEARPAGEFLAG(Dirty, dirty, PF_HEAD)
 PAGEFLAG(LRU, lru, PF_HEAD) __CLEARPAGEFLAG(LRU, lru, PF_HEAD)
+PAGEFLAG(WipeOnRelease, wipeonrelease, PF_HEAD) __CLEARPAGEFLAG(WipeOnRelease, wipeonrelease, PF_HEAD)
 PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
 	TESTCLEARFLAG(Active, active, PF_HEAD)
 PAGEFLAG(Workingset, workingset, PF_HEAD)
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index a1675d43777e..4e5116a95b82 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -100,13 +100,13 @@
 	{1UL << PG_mappedtodisk,	"mappedtodisk"	},		\
 	{1UL << PG_reclaim,		"reclaim"	},		\
 	{1UL << PG_swapbacked,		"swapbacked"	},		\
-	{1UL << PG_unevictable,		"unevictable"	}		\
+	{1UL << PG_unevictable,		"unevictable"	},		\
+	{1UL << PG_wipeonrelease,	"wipeonrelease"	}		\
 IF_HAVE_PG_MLOCK(PG_mlocked,   "mlocked"   )   \
 IF_HAVE_PG_UNCACHED(PG_uncached,   "uncached"  )   \
 IF_HAVE_PG_HWPOISON(PG_hwpoison,   "hwpoison"  )   \
 IF_HAVE_PG_IDLE(PG_young,  "young" )   \
 IF_HAVE_PG_IDLE(PG_idle,   "idle"  )
-
 #define show_page_flags(flags) \
 	(flags) ? __print_flags(flags, "|",				\
 	__def_pageflag_names						\
diff --git a/include/uapi/asm-generic/mman-common.h