Re: [PATCH 1/1] vsock: fix the race conditions in multi-transport support

2021-02-01 Thread Alexander Popov
On 01.02.2021 11:26, Stefano Garzarella wrote:
> On Sun, Jan 31, 2021 at 01:59:14PM +0300, Alexander Popov wrote:
>> There are multiple similar bugs implicitly introduced by the
>> commit c0cfa2d8a788fcf4 ("vsock: add multi-transports support") and
>> commit 6a2c0962105ae8ce ("vsock: prevent transport modules unloading").
>>
>> The bug pattern:
>> [1] vsock_sock.transport pointer is copied to a local variable,
>> [2] lock_sock() is called,
>> [3] the local variable is used.
>> VSOCK multi-transport support introduced the race condition:
>> vsock_sock.transport value may change between [1] and [2].
>>
>> Let's copy vsock_sock.transport pointer to local variables after
>> the lock_sock() call.
> 
> We can add:
> 
> Fixes: c0cfa2d8a788 ("vsock: add multi-transports support")
> 
>>
>> Signed-off-by: Alexander Popov 
>> ---
>> net/vmw_vsock/af_vsock.c | 17 ++++++++++++-----
>> 1 file changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> index d10916ab4526..28edac1f9aa6 100644
>> --- a/net/vmw_vsock/af_vsock.c
>> +++ b/net/vmw_vsock/af_vsock.c
>> @@ -997,9 +997,12 @@ static __poll_t vsock_poll(struct file *file, struct 
>> socket *sock,
>>  mask |= EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND;
>>
>>  } else if (sock->type == SOCK_STREAM) {
>> -const struct vsock_transport *transport = vsk->transport;
>> +const struct vsock_transport *transport = NULL;
> 
> I think we can avoid initializing to NULL since we assign it shortly 
> after.
> 
>> +
>>  lock_sock(sk);
>>
>> +transport = vsk->transport;
>> +
>>  /* Listening sockets that have connections in their accept
>>   * queue can be read.
>>   */
>> @@ -1082,10 +1085,11 @@ static int vsock_dgram_sendmsg(struct socket *sock, 
>> struct msghdr *msg,
>>  err = 0;
>>  sk = sock->sk;
>>  vsk = vsock_sk(sk);
>> -transport = vsk->transport;
>>
>>  lock_sock(sk);
>>
>> +transport = vsk->transport;
>> +
>>  err = vsock_auto_bind(vsk);
>>  if (err)
>>  goto out;
>> @@ -1544,10 +1548,11 @@ static int vsock_stream_setsockopt(struct 
>> socket *sock,
>>  err = 0;
>>  sk = sock->sk;
>>  vsk = vsock_sk(sk);
>> -transport = vsk->transport;
>>
>>  lock_sock(sk);
>>
>> +transport = vsk->transport;
>> +
>>  switch (optname) {
>>  case SO_VM_SOCKETS_BUFFER_SIZE:
>>  COPY_IN(val);
>> @@ -1680,7 +1685,6 @@ static int vsock_stream_sendmsg(struct socket *sock, 
>> struct msghdr *msg,
>>
>>  sk = sock->sk;
>>  vsk = vsock_sk(sk);
>> -transport = vsk->transport;
>>  total_written = 0;
>>  err = 0;
>>
>> @@ -1689,6 +1693,8 @@ static int vsock_stream_sendmsg(struct socket *sock, 
>> struct msghdr *msg,
>>
>>  lock_sock(sk);
>>
>> +transport = vsk->transport;
>> +
>>  /* Callers should not provide a destination with stream sockets. */
>>  if (msg->msg_namelen) {
>>  err = sk->sk_state == TCP_ESTABLISHED ? -EISCONN : -EOPNOTSUPP;
>> @@ -1823,11 +1829,12 @@ vsock_stream_recvmsg(struct socket *sock, struct 
>> msghdr *msg, size_t len,
>>
>>  sk = sock->sk;
>>  vsk = vsock_sk(sk);
>> -transport = vsk->transport;
>>  err = 0;
>>
>>  lock_sock(sk);
>>
>> +transport = vsk->transport;
>> +
>>  if (!transport || sk->sk_state != TCP_ESTABLISHED) {
>>  /* Recvmsg is supposed to return 0 if a peer performs an
>>   * orderly shutdown. Differentiate between that case and when a
>> -- 
>> 2.26.2
>>
> 
> Thanks for fixing these issues. With the small changes applied:
> 
> Reviewed-by: Stefano Garzarella 

Hello Stefano,

Thanks for the review.

I've just sent the v2.

Best regards,
Alexander


[PATCH v2 1/1] vsock: fix the race conditions in multi-transport support

2021-02-01 Thread Alexander Popov
There are multiple similar bugs implicitly introduced by the
commit c0cfa2d8a788fcf4 ("vsock: add multi-transports support") and
commit 6a2c0962105ae8ce ("vsock: prevent transport modules unloading").

The bug pattern:
 [1] vsock_sock.transport pointer is copied to a local variable,
 [2] lock_sock() is called,
 [3] the local variable is used.
VSOCK multi-transport support introduced the race condition:
vsock_sock.transport value may change between [1] and [2].

Let's copy vsock_sock.transport pointer to local variables after
the lock_sock() call.

Fixes: c0cfa2d8a788fcf4 ("vsock: add multi-transports support")

Reviewed-by: Stefano Garzarella 
Signed-off-by: Alexander Popov 
---
 net/vmw_vsock/af_vsock.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index d10916ab4526..f64e681493a5 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -997,9 +997,12 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
mask |= EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND;
 
} else if (sock->type == SOCK_STREAM) {
-   const struct vsock_transport *transport = vsk->transport;
+   const struct vsock_transport *transport;
+
lock_sock(sk);
 
+   transport = vsk->transport;
+
/* Listening sockets that have connections in their accept
 * queue can be read.
 */
@@ -1082,10 +1085,11 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
err = 0;
sk = sock->sk;
vsk = vsock_sk(sk);
-   transport = vsk->transport;
 
lock_sock(sk);
 
+   transport = vsk->transport;
+
err = vsock_auto_bind(vsk);
if (err)
goto out;
@@ -1544,10 +1548,11 @@ static int vsock_stream_setsockopt(struct socket *sock,
err = 0;
sk = sock->sk;
vsk = vsock_sk(sk);
-   transport = vsk->transport;
 
lock_sock(sk);
 
+   transport = vsk->transport;
+
switch (optname) {
case SO_VM_SOCKETS_BUFFER_SIZE:
COPY_IN(val);
@@ -1680,7 +1685,6 @@ static int vsock_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 
sk = sock->sk;
vsk = vsock_sk(sk);
-   transport = vsk->transport;
total_written = 0;
err = 0;
 
@@ -1689,6 +1693,8 @@ static int vsock_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 
lock_sock(sk);
 
+   transport = vsk->transport;
+
/* Callers should not provide a destination with stream sockets. */
if (msg->msg_namelen) {
err = sk->sk_state == TCP_ESTABLISHED ? -EISCONN : -EOPNOTSUPP;
@@ -1823,11 +1829,12 @@ vsock_stream_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 
sk = sock->sk;
vsk = vsock_sk(sk);
-   transport = vsk->transport;
err = 0;
 
lock_sock(sk);
 
+   transport = vsk->transport;
+
if (!transport || sk->sk_state != TCP_ESTABLISHED) {
/* Recvmsg is supposed to return 0 if a peer performs an
 * orderly shutdown. Differentiate between that case and when a
-- 
2.26.2



[PATCH 1/1] vsock: fix the race conditions in multi-transport support

2021-01-31 Thread Alexander Popov
There are multiple similar bugs implicitly introduced by the
commit c0cfa2d8a788fcf4 ("vsock: add multi-transports support") and
commit 6a2c0962105ae8ce ("vsock: prevent transport modules unloading").

The bug pattern:
 [1] vsock_sock.transport pointer is copied to a local variable,
 [2] lock_sock() is called,
 [3] the local variable is used.
VSOCK multi-transport support introduced the race condition:
vsock_sock.transport value may change between [1] and [2].

Let's copy vsock_sock.transport pointer to local variables after
the lock_sock() call.

Signed-off-by: Alexander Popov 
---
 net/vmw_vsock/af_vsock.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index d10916ab4526..28edac1f9aa6 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -997,9 +997,12 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
mask |= EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND;
 
} else if (sock->type == SOCK_STREAM) {
-   const struct vsock_transport *transport = vsk->transport;
+   const struct vsock_transport *transport = NULL;
+
lock_sock(sk);
 
+   transport = vsk->transport;
+
/* Listening sockets that have connections in their accept
 * queue can be read.
 */
@@ -1082,10 +1085,11 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
err = 0;
sk = sock->sk;
vsk = vsock_sk(sk);
-   transport = vsk->transport;
 
lock_sock(sk);
 
+   transport = vsk->transport;
+
err = vsock_auto_bind(vsk);
if (err)
goto out;
@@ -1544,10 +1548,11 @@ static int vsock_stream_setsockopt(struct socket *sock,
err = 0;
sk = sock->sk;
vsk = vsock_sk(sk);
-   transport = vsk->transport;
 
lock_sock(sk);
 
+   transport = vsk->transport;
+
switch (optname) {
case SO_VM_SOCKETS_BUFFER_SIZE:
COPY_IN(val);
@@ -1680,7 +1685,6 @@ static int vsock_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 
sk = sock->sk;
vsk = vsock_sk(sk);
-   transport = vsk->transport;
total_written = 0;
err = 0;
 
@@ -1689,6 +1693,8 @@ static int vsock_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 
lock_sock(sk);
 
+   transport = vsk->transport;
+
/* Callers should not provide a destination with stream sockets. */
if (msg->msg_namelen) {
err = sk->sk_state == TCP_ESTABLISHED ? -EISCONN : -EOPNOTSUPP;
@@ -1823,11 +1829,12 @@ vsock_stream_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 
sk = sock->sk;
vsk = vsock_sk(sk);
-   transport = vsk->transport;
err = 0;
 
lock_sock(sk);
 
+   transport = vsk->transport;
+
if (!transport || sk->sk_state != TCP_ESTABLISHED) {
/* Recvmsg is supposed to return 0 if a peer performs an
 * orderly shutdown. Differentiate between that case and when a
-- 
2.26.2



[PATCH] mm/slab: Perform init_on_free earlier

2020-12-10 Thread Alexander Popov
Currently in CONFIG_SLAB init_on_free happens too late, and heap
objects go to the heap quarantine without being erased.

Let's move the init_on_free clearing before the kasan_slab_free()
call. In that case the heap quarantine will store erased objects,
similarly to the CONFIG_SLUB=y behavior.

Signed-off-by: Alexander Popov 
Reviewed-by: Alexander Potapenko 
---
 mm/slab.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index b1113561b98b..344a101e37e0 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3416,6 +3416,9 @@ static void cache_flusharray(struct kmem_cache *cachep, struct array_cache *ac)
 static __always_inline void __cache_free(struct kmem_cache *cachep, void *objp,
 unsigned long caller)
 {
+   if (unlikely(slab_want_init_on_free(cachep)))
+   memset(objp, 0, cachep->object_size);
+
/* Put the object into the quarantine, don't touch it for now. */
if (kasan_slab_free(cachep, objp, _RET_IP_))
return;
@@ -3434,8 +3437,6 @@ void ___cache_free(struct kmem_cache *cachep, void *objp,
struct array_cache *ac = cpu_cache_get(cachep);
 
check_irq_off();
-   if (unlikely(slab_want_init_on_free(cachep)))
-   memset(objp, 0, cachep->object_size);
kmemleak_free_recursive(objp, cachep->flags);
objp = cache_free_debugcheck(cachep, objp, caller);
	memcg_slab_free_hook(cachep, &objp, 1);
-- 
2.26.2



Re: [PATCH RFC v2 2/6] mm/slab: Perform init_on_free earlier

2020-12-04 Thread Alexander Popov
On 03.12.2020 23:49, Andrew Morton wrote:
> On Thu, 3 Dec 2020 22:50:27 +0300 Alexander Popov  
> wrote:
> 
>> On 30.09.2020 15:50, Alexander Potapenko wrote:
>>> On Tue, Sep 29, 2020 at 8:35 PM Alexander Popov  
>>> wrote:
>>>>
>>>> Currently in CONFIG_SLAB init_on_free happens too late, and heap
>>>> objects go to the heap quarantine being dirty. Lets move memory
>>>> clearing before calling kasan_slab_free() to fix that.
>>>>
>>>> Signed-off-by: Alexander Popov 
>>> Reviewed-by: Alexander Potapenko 
>>
>> Hello!
>>
>> Can this particular patch be considered for the mainline kernel?
> 
> All patches are considered ;) And merged if they're reviewed, tested,
> judged useful, etc.
> 
> If you think this particular patch should be fast-tracked then please
> send it as a non-RFC, standalone patch.  Please also enhance the
> changelog so that it actually explains what goes wrong.  Presumably
> "objects go to the heap quarantine being dirty" causes some
> user-visible problem?  What is that problem?

Ok, thanks!
I'll improve the commit message and send the patch separately.

Best regards,
Alexander


Re: [PATCH RFC v2 2/6] mm/slab: Perform init_on_free earlier

2020-12-03 Thread Alexander Popov
On 30.09.2020 15:50, Alexander Potapenko wrote:
> On Tue, Sep 29, 2020 at 8:35 PM Alexander Popov  wrote:
>>
>> Currently in CONFIG_SLAB init_on_free happens too late, and heap
>> objects go to the heap quarantine being dirty. Lets move memory
>> clearing before calling kasan_slab_free() to fix that.
>>
>> Signed-off-by: Alexander Popov 
> Reviewed-by: Alexander Potapenko 

Hello!

Can this particular patch be considered for the mainline kernel?


Note: I summarized the results of the experiment with the Linux kernel heap
quarantine in a short article, for future reference:
https://a13xp0p0v.github.io/2020/11/30/slab-quarantine.html

Best regards,
Alexander

>> ---
>>  mm/slab.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/slab.c b/mm/slab.c
>> index 3160dff6fd76..5140203c5b76 100644
>> --- a/mm/slab.c
>> +++ b/mm/slab.c
>> @@ -3414,6 +3414,9 @@ static void cache_flusharray(struct kmem_cache 
>> *cachep, struct array_cache *ac)
>>  static __always_inline void __cache_free(struct kmem_cache *cachep, void 
>> *objp,
>>  unsigned long caller)
>>  {
>> +   if (unlikely(slab_want_init_on_free(cachep)))
>> +   memset(objp, 0, cachep->object_size);
>> +
>> /* Put the object into the quarantine, don't touch it for now. */
>> if (kasan_slab_free(cachep, objp, _RET_IP_))
>> return;
>> @@ -3432,8 +3435,6 @@ void ___cache_free(struct kmem_cache *cachep, void 
>> *objp,
>> struct array_cache *ac = cpu_cache_get(cachep);
>>
>> check_irq_off();
>> -   if (unlikely(slab_want_init_on_free(cachep)))
>> -   memset(objp, 0, cachep->object_size);
>> kmemleak_free_recursive(objp, cachep->flags);
>> objp = cache_free_debugcheck(cachep, objp, caller);
>> memcg_slab_free_hook(cachep, virt_to_head_page(objp), objp);
>> --
>> 2.26.2
>>
> 
> 



Re: [PATCH RFC v2 0/6] Break heap spraying needed for exploiting use-after-free

2020-10-06 Thread Alexander Popov
On 06.10.2020 21:37, Jann Horn wrote:
> On Tue, Oct 6, 2020 at 7:56 PM Alexander Popov  wrote:
>>
>> On 06.10.2020 01:56, Jann Horn wrote:
>>> On Thu, Oct 1, 2020 at 9:43 PM Alexander Popov  wrote:
>>>> On 29.09.2020 21:35, Alexander Popov wrote:
>>>>> This is the second version of the heap quarantine prototype for the Linux
>>>>> kernel. I performed a deeper evaluation of its security properties and
>>>>> developed new features like quarantine randomization and integration with
>>>>> init_on_free. That is fun! See below for more details.
>>>>>
>>>>>
>>>>> Rationale
>>>>> =
>>>>>
>>>>> Use-after-free vulnerabilities in the Linux kernel are very popular for
>>>>> exploitation. There are many examples, some of them:
>>>>>  
>>>>> https://googleprojectzero.blogspot.com/2018/09/a-cache-invalidation-bug-in-linux.html
>>
>> Hello Jann, thanks for your reply.
>>
>>> I don't think your proposed mitigation would work with much
>>> reliability against this bug; the attacker has full control over the
>>> timing of the original use and the following use, so an attacker
>>> should be able to trigger the kmem_cache_free(), then spam enough new
>>> VMAs and delete them to flush out the quarantine, and then do heap
>>> spraying as normal, or something like that.
>>
>> The randomized quarantine will release the vulnerable object at an 
>> unpredictable
>> moment (patch 4/6).
>>
>> So I think the control over the time of the use-after-free access doesn't 
>> help
>> attackers, if they don't have an "infinite spray" -- unlimited ability to 
>> store
>> controlled data in the kernelspace objects of the needed size without 
>> freeing them.
>>
>> "Unlimited", because the quarantine size is 1/32 of whole memory.
>> "Without freeing", because freed objects are erased by init_on_free before 
>> going
>> to randomized heap quarantine (patch 3/6).
>>
>> Would you agree?
> 
> But you have a single quarantine (per CPU) for all objects, right? So
> for a UAF on slab A, the attacker can just spam allocations and
> deallocations on slab B to almost deterministically flush everything
> in slab A back to the SLUB freelists?

Aaaahh! Nice shot Jann, I see.

Another slab cache can be used to flush the randomized quarantine, so eventually
the vulnerable object returns to the allocator freelist of its own cache, and
the original heap spraying can be used again.

For now I think the idea of a global quarantine for all slab objects is dead.

Thank you.

Best regards,
Alexander


Re: [PATCH RFC v2 0/6] Break heap spraying needed for exploiting use-after-free

2020-10-06 Thread Alexander Popov
On 06.10.2020 01:56, Jann Horn wrote:
> On Thu, Oct 1, 2020 at 9:43 PM Alexander Popov  wrote:
>> On 29.09.2020 21:35, Alexander Popov wrote:
>>> This is the second version of the heap quarantine prototype for the Linux
>>> kernel. I performed a deeper evaluation of its security properties and
>>> developed new features like quarantine randomization and integration with
>>> init_on_free. That is fun! See below for more details.
>>>
>>>
>>> Rationale
>>> =
>>>
>>> Use-after-free vulnerabilities in the Linux kernel are very popular for
>>> exploitation. There are many examples, some of them:
>>>  
>>> https://googleprojectzero.blogspot.com/2018/09/a-cache-invalidation-bug-in-linux.html

Hello Jann, thanks for your reply.

> I don't think your proposed mitigation would work with much
> reliability against this bug; the attacker has full control over the
> timing of the original use and the following use, so an attacker
> should be able to trigger the kmem_cache_free(), then spam enough new
> VMAs and delete them to flush out the quarantine, and then do heap
> spraying as normal, or something like that.

The randomized quarantine will release the vulnerable object at an unpredictable
moment (patch 4/6).

So I think the control over the time of the use-after-free access doesn't help
attackers, if they don't have an "infinite spray" -- unlimited ability to store
controlled data in the kernelspace objects of the needed size without freeing 
them.

"Unlimited", because the quarantine size is 1/32 of the whole memory.
"Without freeing", because freed objects are erased by init_on_free before going
to randomized heap quarantine (patch 3/6).

Would you agree?

> Also, note that here, if the reallocation fails, the kernel still
> wouldn't crash because the dangling object is not accessed further if
> the address range stored in it doesn't match the fault address. So an
> attacker could potentially try multiple times, and if the object
> happens to be on the quarantine the first time, that wouldn't really
> be a showstopper, you'd just try again.

Freed objects are filled with zeros before going to the quarantine (patch 3/6).
Would that cause a null pointer dereference on an unsuccessful try?

>>>  
>>> https://googleprojectzero.blogspot.com/2019/11/bad-binder-android-in-wild-exploit.html?m=1
> 
> I think that here, again, the free() and the dangling pointer use were
> caused by separate syscalls, meaning the attacker had control over
> that timing?

As I wrote above, I think attacker's control over this timing is required for a
successful attack, but is not enough for bypassing randomized quarantine.

>>>  https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
> 
> Haven't looked at that one in detail.
> 
>>> Use-after-free exploits usually employ heap spraying technique.
>>> Generally it aims to put controlled bytes at a predetermined memory
>>> location on the heap.
> 
> Well, not necessarily "predetermined". Depending on the circumstances,
> you don't necessarily need to know which address you're writing to;
> and you might not even need to overwrite a specific object, but
> instead just have to overwrite one out of a bunch of objects, no
> matter which.

Yes, of course, I didn't mean a "predetermined memory address".
Maybe "definite memory location" is a better phrase for that.

>>> Heap spraying for exploiting use-after-free in the Linux kernel relies on
>>> the fact that on kmalloc(), the slab allocator returns the address of
>>> the memory that was recently freed.
> 
> Yeah; and that behavior is pretty critical for performance. The longer
> it's been since a newly allocated object was freed, the higher the
> chance that you'll end up having to go further down the memory cache
> hierarchy.

Yes. That behaviour is fast, but it is also very convenient for
use-after-free exploitation...

>>> So allocating a kernel object with
>>> the same size and controlled contents allows overwriting the vulnerable
>>> freed object.
> 
> The vmacache exploit you linked to doesn't do that, it frees the
> object all the way back to the page allocator and then sprays 4MiB of
> memory from the page allocator. (Because VMAs use their own
> kmem_cache, and the kmem_cache wasn't merged with any interesting
> ones, and I saw no good way to exploit the bug by reallocating another
> VMA over the old VMA back then. Although of course that doesn't mean
> that there is no such way.)

Sorry, my mistake.
Exploit examples with heap spraying that fit my description:
 - CVE-2017-6074 https://www.openwall.com/lists/oss-security/2017/02/26/2

Re: [PATCH RFC v2 2/6] mm/slab: Perform init_on_free earlier

2020-10-01 Thread Alexander Popov
On 30.09.2020 15:50, Alexander Potapenko wrote:
> On Tue, Sep 29, 2020 at 8:35 PM Alexander Popov  wrote:
>>
>> Currently in CONFIG_SLAB init_on_free happens too late, and heap
>> objects go to the heap quarantine being dirty. Lets move memory
>> clearing before calling kasan_slab_free() to fix that.
>>
>> Signed-off-by: Alexander Popov 
> Reviewed-by: Alexander Potapenko 

Thanks for the review, Alexander!

Do you have any idea how this patch series relates to the Memory Tagging
support that is currently being developed?

Best regards,
Alexander


Re: [PATCH RFC v2 0/6] Break heap spraying needed for exploiting use-after-free

2020-10-01 Thread Alexander Popov
Hello! I have some performance numbers. Please see below.

On 29.09.2020 21:35, Alexander Popov wrote:
> Hello everyone! Requesting for your comments.
> 
> This is the second version of the heap quarantine prototype for the Linux
> kernel. I performed a deeper evaluation of its security properties and
> developed new features like quarantine randomization and integration with
> init_on_free. That is fun! See below for more details.
> 
> 
> Rationale
> =
> 
> Use-after-free vulnerabilities in the Linux kernel are very popular for
> exploitation. There are many examples, some of them:
>  
> https://googleprojectzero.blogspot.com/2018/09/a-cache-invalidation-bug-in-linux.html
>  
> https://googleprojectzero.blogspot.com/2019/11/bad-binder-android-in-wild-exploit.html?m=1
>  https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
> 
> Use-after-free exploits usually employ heap spraying technique.
> Generally it aims to put controlled bytes at a predetermined memory
> location on the heap.
> 
> Heap spraying for exploiting use-after-free in the Linux kernel relies on
> the fact that on kmalloc(), the slab allocator returns the address of
> the memory that was recently freed. So allocating a kernel object with
> the same size and controlled contents allows overwriting the vulnerable
> freed object.
> 
> I've found an easy way to break the heap spraying for use-after-free
> exploitation. I extracted slab freelist quarantine from KASAN functionality
> and called it CONFIG_SLAB_QUARANTINE. Please see patch 1/6.
> 
> If this feature is enabled, freed allocations are stored in the quarantine
> queue where they wait for actual freeing. So they can't be instantly
> reallocated and overwritten by use-after-free exploits.
> 
> N.B. Heap spraying for out-of-bounds exploitation is another technique,
> heap quarantine doesn't break it.
> 
> 
> Security properties
> ===
> 
> For researching security properties of the heap quarantine I developed 2 lkdtm
> tests (see the patch 5/6).
> 
> The first test is called lkdtm_HEAP_SPRAY. It allocates and frees an object
> from a separate kmem_cache and then allocates 40 similar objects.
> I.e. this test performs an original heap spraying technique for use-after-free
> exploitation.
> 
> If CONFIG_SLAB_QUARANTINE is disabled, the freed object is instantly
> reallocated and overwritten:
>   # echo HEAP_SPRAY > /sys/kernel/debug/provoke-crash/DIRECT
>lkdtm: Performing direct entry HEAP_SPRAY
>lkdtm: Allocated and freed spray_cache object 2b5b3ad4 of size 333
>lkdtm: Original heap spraying: allocate 40 objects of size 333...
>lkdtm: FAIL: attempt 0: freed object is reallocated
> 
> If CONFIG_SLAB_QUARANTINE is enabled, 40 new allocations don't overwrite
> the freed object:
>   # echo HEAP_SPRAY > /sys/kernel/debug/provoke-crash/DIRECT
>lkdtm: Performing direct entry HEAP_SPRAY
>lkdtm: Allocated and freed spray_cache object 9909e777 of size 333
>lkdtm: Original heap spraying: allocate 40 objects of size 333...
>lkdtm: OK: original heap spraying hasn't succeed
> 
> That happens because pushing an object through the quarantine requires _both_
> allocating and freeing memory. Objects are released from the quarantine on
> new memory allocations, but only when the quarantine size is over the limit.
> And the quarantine size grows on new memory freeing.
> 
> That's why I created the second test called lkdtm_PUSH_THROUGH_QUARANTINE.
> It allocates and frees an object from a separate kmem_cache and then performs
> kmem_cache_alloc()+kmem_cache_free() for that cache 40 times.
> This test effectively pushes the object through the heap quarantine and
> reallocates it after it returns back to the allocator freelist:
>   # echo PUSH_THROUGH_QUARANTINE > /sys/kernel/debug/provoke-crash/
>lkdtm: Performing direct entry PUSH_THROUGH_QUARANTINE
>lkdtm: Allocated and freed spray_cache object 8fdb15c3 of size 333
>lkdtm: Push through quarantine: allocate and free 40 objects of size 
> 333...
>lkdtm: Target object is reallocated at attempt 182994
>   # echo PUSH_THROUGH_QUARANTINE > /sys/kernel/debug/provoke-crash/
>lkdtm: Performing direct entry PUSH_THROUGH_QUARANTINE
>lkdtm: Allocated and freed spray_cache object 4e223cbe of size 333
>lkdtm: Push through quarantine: allocate and free 40 objects of size 
> 333...
>lkdtm: Target object is reallocated at attempt 186830
>   # echo PUSH_THROUGH_QUARANTINE > /sys/kernel/debug/provoke-crash/
>lkdtm: Performing direct entry PUSH_THROUGH_QUARANTINE
>lkdtm: Allocated and freed spray_cache object 7663a058 of

[PATCH RFC v2 6/6] mm: Add heap quarantine verbose debugging (not for merge)

2020-09-29 Thread Alexander Popov
Add verbose debugging for deeper understanding of the heap quarantine
inner workings (this patch is not for merge).

Signed-off-by: Alexander Popov 
---
 mm/kasan/quarantine.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index 4ce100605086..98cd6e963755 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -203,6 +203,12 @@ void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache)
qlist_move_all(q, &temp);
 
raw_spin_lock(&quarantine_lock);
+
+   pr_info("quarantine: PUT %zu to tail batch %d, whole sz %zu, batch sz %lu\n",
+   temp.bytes, quarantine_tail,
+   READ_ONCE(quarantine_size),
+   READ_ONCE(quarantine_batch_size));
+
WRITE_ONCE(quarantine_size, quarantine_size + temp.bytes);
qlist_move_all(&temp, &global_quarantine[quarantine_tail]);
if (global_quarantine[quarantine_tail].bytes >=
@@ -313,7 +319,22 @@ void quarantine_reduce(void)
quarantine_head = get_random_int() % QUARANTINE_BATCHES;
} while (quarantine_head == quarantine_tail);
qlist_move_random(&global_quarantine[quarantine_head], &to_free);
+   pr_info("quarantine: whole sz exceed max by %lu, REDUCE head batch %d by %zu, leave %zu\n",
+   quarantine_size - quarantine_max_size,
+   quarantine_head, to_free.bytes,
+   global_quarantine[quarantine_head].bytes);
WRITE_ONCE(quarantine_size, quarantine_size - to_free.bytes);
+
+   if (quarantine_head == 0) {
+   unsigned long i;
+
+   pr_info("quarantine: data level in batches:");
+   for (i = 0; i < QUARANTINE_BATCHES; i++) {
+   pr_info("  %lu - %lu%%\n",
+   i, global_quarantine[i].bytes *
+   100 / quarantine_batch_size);
+   }
+   }
}
 #endif
 
-- 
2.26.2



[PATCH RFC v2 4/6] mm: Implement slab quarantine randomization

2020-09-29 Thread Alexander Popov
The randomization is very important for the security properties of
the slab quarantine. Without it, the number of kmalloc()+kfree()
calls needed for overwriting the vulnerable object is almost the same
on every run. That would enable stable use-after-free exploitation,
and we should not allow that.

This commit contains very compact and hackish changes that introduce
the quarantine randomization. At first all quarantine batches are filled
by objects. Then during the quarantine reducing we randomly choose and
free 1/2 of objects from a randomly chosen batch. Now the randomized
quarantine releases the freed object at an unpredictable moment, which
is harmful for the heap spraying technique employed by use-after-free
exploits.

Signed-off-by: Alexander Popov 
---
 mm/kasan/quarantine.c | 79 +--
 1 file changed, 69 insertions(+), 10 deletions(-)

diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index 61666263c53e..4ce100605086 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include <linux/random.h>
 
 #include "../slab.h"
 #include "kasan.h"
@@ -89,8 +90,13 @@ static void qlist_move_all(struct qlist_head *from, struct qlist_head *to)
 }
 
 #define QUARANTINE_PERCPU_SIZE (1 << 20)
+
+#ifdef CONFIG_KASAN
 #define QUARANTINE_BATCHES \
(1024 > 4 * CONFIG_NR_CPUS ? 1024 : 4 * CONFIG_NR_CPUS)
+#else
+#define QUARANTINE_BATCHES 128
+#endif
 
 /*
  * The object quarantine consists of per-cpu queues and a global queue,
@@ -110,10 +116,7 @@ DEFINE_STATIC_SRCU(remove_cache_srcu);
 /* Maximum size of the global queue. */
 static unsigned long quarantine_max_size;
 
-/*
- * Target size of a batch in global_quarantine.
- * Usually equal to QUARANTINE_PERCPU_SIZE unless we have too much RAM.
- */
+/* Target size of a batch in global_quarantine. */
 static unsigned long quarantine_batch_size;
 
 /*
@@ -191,7 +194,12 @@ void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache)
 
q = this_cpu_ptr(&cpu_quarantine);
qlist_put(q, &info->quarantine_link, cache->size);
+#ifdef CONFIG_KASAN
if (unlikely(q->bytes > QUARANTINE_PERCPU_SIZE)) {
+#else
+   if (unlikely(q->bytes > min_t(size_t, QUARANTINE_PERCPU_SIZE,
+   READ_ONCE(quarantine_batch_size)))) {
+#endif
qlist_move_all(q, &temp);
 
raw_spin_lock(&quarantine_lock);
@@ -204,7 +212,7 @@ void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache)
new_tail = quarantine_tail + 1;
if (new_tail == QUARANTINE_BATCHES)
new_tail = 0;
-   if (new_tail != quarantine_head)
+   if (new_tail != quarantine_head || !IS_ENABLED(CONFIG_KASAN))
quarantine_tail = new_tail;
}
raw_spin_unlock(&quarantine_lock);
@@ -213,12 +221,43 @@ void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache)
local_irq_restore(flags);
 }
 
+static void qlist_move_random(struct qlist_head *from, struct qlist_head *to)
+{
+   struct qlist_node *curr;
+
+   if (unlikely(qlist_empty(from)))
+   return;
+
+   curr = from->head;
+   qlist_init(from);
+   while (curr) {
+   struct qlist_node *next = curr->next;
+   struct kmem_cache *obj_cache = qlink_to_cache(curr);
+   int rnd = get_random_int();
+
+   /*
+* Hackish quarantine randomization, part 2:
+* move only 1/2 of objects to the destination list.
+* TODO: use random bits sparingly for better performance.
+*/
+   if (rnd % 2 == 0)
+   qlist_put(to, curr, obj_cache->size);
+   else
+   qlist_put(from, curr, obj_cache->size);
+
+   curr = next;
+   }
+}
+
 void quarantine_reduce(void)
 {
-   size_t total_size, new_quarantine_size, percpu_quarantines;
+   size_t total_size;
unsigned long flags;
int srcu_idx;
struct qlist_head to_free = QLIST_INIT;
+#ifdef CONFIG_KASAN
+   size_t new_quarantine_size, percpu_quarantines;
+#endif
 
if (likely(READ_ONCE(quarantine_size) <=
   READ_ONCE(quarantine_max_size)))
@@ -236,12 +275,12 @@ void quarantine_reduce(void)
	srcu_idx = srcu_read_lock(&remove_cache_srcu);
	raw_spin_lock_irqsave(&quarantine_lock, flags);
 
-   /*
-* Update quarantine size in case of hotplug. Allocate a fraction of
-* the installed memory to quarantine minus per-cpu queue limits.
-*/
+   /* Update quarantine size in case of hotplug */
total_size = (totalram_pages() << PAGE_SHIFT) /
QUARANTINE_FRACTION;
+
+#ifdef CONFIG_KASAN
+   /* Subtract per-cpu q

[PATCH RFC v2 0/6] Break heap spraying needed for exploiting use-after-free

2020-09-29 Thread Alexander Popov
ts. Then during
the quarantine reducing I randomly choose and free 1/2 of objects from a
randomly chosen batch. Now the randomized quarantine releases the freed object
at an unpredictable moment:
   lkdtm: Target object is reallocated at attempt 107884
   lkdtm: Target object is reallocated at attempt 265641
   lkdtm: Target object is reallocated at attempt 100030
   lkdtm: Target object is NOT reallocated in 40 attempts
   lkdtm: Target object is reallocated at attempt 204731
   lkdtm: Target object is reallocated at attempt 359333
   lkdtm: Target object is reallocated at attempt 289349
   lkdtm: Target object is reallocated at attempt 119893
   lkdtm: Target object is reallocated at attempt 225202
   lkdtm: Target object is reallocated at attempt 87343

However, this randomization alone would not disturb the attacker, because
the quarantine stores the attacker's data (the payload) in the sprayed objects.
I.e. the reallocated and overwritten vulnerable object contains the payload
until the next reallocation (very bad).

Hence heap objects should be erased before going to the heap quarantine.
Moreover, filling them by zeros gives a chance to detect use-after-free
accesses to non-zero data while an object stays in the quarantine (nice!).
That functionality already exists in the kernel, it's called init_on_free.
I integrated it with CONFIG_SLAB_QUARANTINE in the patch 3/6.

During that work I found a bug: in CONFIG_SLAB init_on_free happens too
late, and heap objects go to the KASAN quarantine being dirty. See the fix
in the patch 2/6.

For deeper understanding of the heap quarantine inner workings, I attach
the patch 6/6, which contains verbose debugging (not for merge).
It's very helpful, see the output example:
   quarantine: PUT 508992 to tail batch 123, whole sz 65118872, batch sz 508854
   quarantine: whole sz exceed max by 494552, REDUCE head batch 0 by 415392, leave 396304
   quarantine: data level in batches:
 0 - 77%
 1 - 108%
 2 - 83%
 3 - 21%
   ...
 125 - 75%
 126 - 12%
 127 - 108%
   quarantine: whole sz exceed max by 79160, REDUCE head batch 12 by 14160, leave 17608
   quarantine: whole sz exceed max by 65000, REDUCE head batch 75 by 218328, leave 195232
   quarantine: PUT 508992 to tail batch 124, whole sz 64979984, batch sz 508854
   ...


Changes in v2
=

 - Added heap quarantine randomization (the patch 4/6).

 - Integrated CONFIG_SLAB_QUARANTINE with init_on_free (the patch 3/6).

 - Fixed late init_on_free in CONFIG_SLAB (the patch 2/6).

 - Added lkdtm_PUSH_THROUGH_QUARANTINE test.

 - Added the quarantine verbose debugging (the patch 6/6, not for merge).

 - Improved the descriptions according to the feedback from Kees Cook
   and Matthew Wilcox.

 - Made fixes recommended by Kees Cook:

   * Avoided BUG_ON() in kasan_cache_create() by handling the error and
 reporting with WARN_ON().

   * Created a separate kmem_cache for new lkdtm tests.

   * Fixed kasan_track.pid type to pid_t.


TODO for the next prototypes


1. Performance evaluation and optimization.
   I would really appreciate your ideas about performance testing of a
   kernel with the heap quarantine. The first prototype was tested with
   hackbench and kernel build timing (which showed very different numbers).
   Earlier the developers similarly tested init_on_free functionality.
   However, Brad Spengler says on Twitter that such a testing method
   is poor.

2. Complete separation of CONFIG_SLAB_QUARANTINE from KASAN (feedback
   from Andrey Konovalov).

3. Adding a kernel boot parameter for enabling/disabling the heap quarantine
   (feedback from Kees Cook).

4. Testing the heap quarantine in near-OOM situations (feedback from
   Pavel Machek).

5. Does this work somehow help or disturb the integration of the
   Memory Tagging for the Linux kernel?

6. After rebasing the series onto v5.9.0-rc6, CONFIG_SLAB kernel started to
   show warnings about a few slab caches that have no space for additional
   metadata. It needs more investigation. I believe it affects KASAN bug
   detection abilities as well. Warning example:
  WARNING: CPU: 0 PID: 0 at mm/kasan/slab_quarantine.c:38 kasan_cache_create+0x37/0x50
 Modules linked in:
 CPU: 0 PID: 0 Comm: swapper Not tainted 5.9.0-rc6+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
 RIP: 0010:kasan_cache_create+0x37/0x50
 ...
 Call Trace:
  __kmem_cache_create+0x74/0x250
  create_boot_cache+0x6d/0x91
  create_kmalloc_cache+0x57/0x93
  new_kmalloc_cache+0x39/0x47
  create_kmalloc_caches+0x33/0xd9
  start_kernel+0x25b/0x532
  secondary_startup_64+0xb6/0xc0

Thanks in advance for your feedback.
Best regards,
Alexander


Alexander Popov (6):
  mm: Extract SLAB_QUARANTINE from KASAN
  mm/slab: Perform init_on_free earlier
  mm: Integrate SLAB_QUARANTINE with init_on_free
  mm: Implement slab quarantine randomizati

[PATCH RFC v2 3/6] mm: Integrate SLAB_QUARANTINE with init_on_free

2020-09-29 Thread Alexander Popov
Having slab quarantine without memory erasing is harmful.
If the quarantined objects are not cleaned and contain data, then:
  1. they will be useful for use-after-free exploitation,
  2. there is no chance to detect use-after-free access.
So we want the quarantined objects to be erased.
Enable init_on_free that cleans objects before placing them into
the quarantine. CONFIG_PAGE_POISONING should be disabled since it
cuts off init_on_free.

Signed-off-by: Alexander Popov 
---
 init/Kconfig|  3 ++-
 mm/page_alloc.c | 22 ++
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index 358c8ce818f4..cd4cee71fd4e 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1933,7 +1933,8 @@ config SLAB_FREELIST_HARDENED
 
 config SLAB_QUARANTINE
bool "Enable slab freelist quarantine"
-   depends on !KASAN && (SLAB || SLUB)
+   depends on !KASAN && (SLAB || SLUB) && !PAGE_POISONING
+   select INIT_ON_FREE_DEFAULT_ON
help
  Enable slab freelist quarantine to delay reusing of freed slab
  objects. If this feature is enabled, freed objects are stored
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fab5e97dc9ca..f67118e88500 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -168,6 +168,27 @@ static int __init early_init_on_alloc(char *buf)
 }
 early_param("init_on_alloc", early_init_on_alloc);
 
+#ifdef CONFIG_SLAB_QUARANTINE
+static int __init early_init_on_free(char *buf)
+{
+   /*
+* Having slab quarantine without memory erasing is harmful.
+* If the quarantined objects are not cleaned and contain data, then:
+*  1. they will be useful for use-after-free exploitation,
+*  2. use-after-free access may not be detected.
+* So we want the quarantined objects to be erased.
+*
+* Enable init_on_free that cleans objects before placing them into
+* the quarantine. CONFIG_PAGE_POISONING should be disabled since it
+* cuts off init_on_free.
+*/
+   BUILD_BUG_ON(!IS_ENABLED(CONFIG_INIT_ON_FREE_DEFAULT_ON));
+   BUILD_BUG_ON(IS_ENABLED(CONFIG_PAGE_POISONING));
+   pr_info("mem auto-init: init_on_free is on for CONFIG_SLAB_QUARANTINE\n");
+
+   return 0;
+}
+#else /* CONFIG_SLAB_QUARANTINE */
 static int __init early_init_on_free(char *buf)
 {
int ret;
@@ -184,6 +205,7 @@ static int __init early_init_on_free(char *buf)
	static_branch_disable(&init_on_free);
return ret;
 }
+#endif /* CONFIG_SLAB_QUARANTINE */
 early_param("init_on_free", early_init_on_free);
 
 /*
-- 
2.26.2



[PATCH RFC v2 5/6] lkdtm: Add heap quarantine tests

2020-09-29 Thread Alexander Popov
Add tests for CONFIG_SLAB_QUARANTINE.

The HEAP_SPRAY test aims to reallocate a recently freed heap object.
It allocates and frees an object from a separate kmem_cache and then
allocates 40 similar objects from it. I.e. this test performs an
original heap spraying technique for use-after-free exploitation.
If CONFIG_SLAB_QUARANTINE is disabled, the freed object is instantly
reallocated and overwritten, which is required for a successful attack.

The PUSH_THROUGH_QUARANTINE test allocates and frees an object from a
separate kmem_cache and then performs kmem_cache_alloc()+kmem_cache_free()
40 times. This test pushes the object through the heap quarantine and
reallocates it after it returns back to the allocator freelist.
If CONFIG_SLAB_QUARANTINE is enabled, this test should show that the
randomized quarantine will release the freed object at an unpredictable
moment, which makes use-after-free exploitation much harder.

Signed-off-by: Alexander Popov 
---
 drivers/misc/lkdtm/core.c  |   2 +
 drivers/misc/lkdtm/heap.c  | 110 +
 drivers/misc/lkdtm/lkdtm.h |   2 +
 3 files changed, 114 insertions(+)

diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
index a5e344df9166..6be5ca49ae6b 100644
--- a/drivers/misc/lkdtm/core.c
+++ b/drivers/misc/lkdtm/core.c
@@ -126,6 +126,8 @@ static const struct crashtype crashtypes[] = {
CRASHTYPE(SLAB_FREE_DOUBLE),
CRASHTYPE(SLAB_FREE_CROSS),
CRASHTYPE(SLAB_FREE_PAGE),
+   CRASHTYPE(HEAP_SPRAY),
+   CRASHTYPE(PUSH_THROUGH_QUARANTINE),
CRASHTYPE(SOFTLOCKUP),
CRASHTYPE(HARDLOCKUP),
CRASHTYPE(SPINLOCKUP),
diff --git a/drivers/misc/lkdtm/heap.c b/drivers/misc/lkdtm/heap.c
index 1323bc16f113..f666a08d9462 100644
--- a/drivers/misc/lkdtm/heap.c
+++ b/drivers/misc/lkdtm/heap.c
@@ -10,6 +10,7 @@
 static struct kmem_cache *double_free_cache;
 static struct kmem_cache *a_cache;
 static struct kmem_cache *b_cache;
+static struct kmem_cache *spray_cache;
 
 /*
  * This tries to stay within the next largest power-of-2 kmalloc cache
@@ -204,6 +205,112 @@ static void ctor_a(void *region)
 { }
 static void ctor_b(void *region)
 { }
+static void ctor_spray(void *region)
+{ }
+
+#define SPRAY_LENGTH 40
+#define SPRAY_ITEM_SIZE 333
+
+void lkdtm_HEAP_SPRAY(void)
+{
+   int *addr;
+   int **spray_addrs = NULL;
+   unsigned long i = 0;
+
+   addr = kmem_cache_alloc(spray_cache, GFP_KERNEL);
+   if (!addr) {
+   pr_info("Can't allocate memory in spray_cache cache\n");
+   return;
+   }
+
+   memset(addr, 0xA5, SPRAY_ITEM_SIZE);
+   kmem_cache_free(spray_cache, addr);
+   pr_info("Allocated and freed spray_cache object %p of size %d\n",
+   addr, SPRAY_ITEM_SIZE);
+
+   spray_addrs = kcalloc(SPRAY_LENGTH, sizeof(int *), GFP_KERNEL);
+   if (!spray_addrs) {
+   pr_info("Unable to allocate memory for spray_addrs\n");
+   return;
+   }
+
+   pr_info("Original heap spraying: allocate %d objects of size %d...\n",
+   SPRAY_LENGTH, SPRAY_ITEM_SIZE);
+   for (i = 0; i < SPRAY_LENGTH; i++) {
+   spray_addrs[i] = kmem_cache_alloc(spray_cache, GFP_KERNEL);
+   if (!spray_addrs[i]) {
+   pr_info("Can't allocate memory in spray_cache cache\n");
+   break;
+   }
+
+   memset(spray_addrs[i], 0x42, SPRAY_ITEM_SIZE);
+
+   if (spray_addrs[i] == addr) {
+   pr_info("FAIL: attempt %lu: freed object is reallocated\n", i);
+   break;
+   }
+   }
+
+   if (i == SPRAY_LENGTH)
+   pr_info("OK: original heap spraying hasn't succeeded\n");
+
+   for (i = 0; i < SPRAY_LENGTH; i++) {
+   if (spray_addrs[i])
+   kmem_cache_free(spray_cache, spray_addrs[i]);
+   }
+
+   kfree(spray_addrs);
+}
+
+/*
+ * Pushing an object through the quarantine requires both allocating and
+ * freeing memory. Objects are released from the quarantine on new memory
+ * allocations, but only when the quarantine size is over the limit.
+ * And the quarantine size grows on new memory freeing.
+ *
+ * This test should show that the randomized quarantine will release the
+ * freed object at an unpredictable moment.
+ */
+void lkdtm_PUSH_THROUGH_QUARANTINE(void)
+{
+   int *addr;
+   int *push_addr;
+   unsigned long i;
+
+   addr = kmem_cache_alloc(spray_cache, GFP_KERNEL);
+   if (!addr) {
+   pr_info("Can't allocate memory in spray_cache cache\n");
+   return;
+   }
+
+   memset(addr, 0xA5, SPRAY_ITEM_SIZE);
+   kmem_cache_free(spray_cache, addr);
+   pr_info("Allocated and freed spray_c

[PATCH RFC v2 1/6] mm: Extract SLAB_QUARANTINE from KASAN

2020-09-29 Thread Alexander Popov
Heap spraying is an exploitation technique that aims to put controlled
bytes at a predetermined memory location on the heap. Heap spraying for
exploiting use-after-free in the Linux kernel relies on the fact that on
kmalloc(), the slab allocator returns the address of the memory that was
recently freed. Allocating a kernel object with the same size and
controlled contents allows overwriting the vulnerable freed object.

Let's extract slab freelist quarantine from KASAN functionality and
call it CONFIG_SLAB_QUARANTINE. This feature breaks widespread heap
spraying technique for exploiting use-after-free vulnerabilities
in the kernel code.

If this feature is enabled, freed allocations are stored in the quarantine
queue where they wait for actual freeing. So they can't be instantly
reallocated and overwritten by use-after-free exploits.

N.B. Heap spraying for out-of-bounds exploitation is another technique,
heap quarantine doesn't break it.

Signed-off-by: Alexander Popov 
---
 include/linux/kasan.h  | 107 -
 include/linux/slab_def.h   |   2 +-
 include/linux/slub_def.h   |   2 +-
 init/Kconfig   |  13 +
 mm/Makefile|   3 +-
 mm/kasan/Makefile  |   2 +
 mm/kasan/kasan.h   |  75 +-
 mm/kasan/quarantine.c  |   2 +
 mm/kasan/slab_quarantine.c | 106 
 mm/slub.c  |   2 +-
 10 files changed, 225 insertions(+), 89 deletions(-)
 create mode 100644 mm/kasan/slab_quarantine.c

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 087fba34b209..b837216f760c 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -42,32 +42,14 @@ void kasan_unpoison_task_stack(struct task_struct *task);
 void kasan_alloc_pages(struct page *page, unsigned int order);
 void kasan_free_pages(struct page *page, unsigned int order);
 
-void kasan_cache_create(struct kmem_cache *cache, unsigned int *size,
-   slab_flags_t *flags);
-
 void kasan_poison_slab(struct page *page);
 void kasan_unpoison_object_data(struct kmem_cache *cache, void *object);
 void kasan_poison_object_data(struct kmem_cache *cache, void *object);
 void * __must_check kasan_init_slab_obj(struct kmem_cache *cache,
const void *object);
 
-void * __must_check kasan_kmalloc_large(const void *ptr, size_t size,
-   gfp_t flags);
 void kasan_kfree_large(void *ptr, unsigned long ip);
 void kasan_poison_kfree(void *ptr, unsigned long ip);
-void * __must_check kasan_kmalloc(struct kmem_cache *s, const void *object,
-   size_t size, gfp_t flags);
-void * __must_check kasan_krealloc(const void *object, size_t new_size,
-   gfp_t flags);
-
-void * __must_check kasan_slab_alloc(struct kmem_cache *s, void *object,
-   gfp_t flags);
-bool kasan_slab_free(struct kmem_cache *s, void *object, unsigned long ip);
-
-struct kasan_cache {
-   int alloc_meta_offset;
-   int free_meta_offset;
-};
 
 /*
  * These functions provide a special case to support backing module
@@ -107,10 +89,6 @@ static inline void kasan_disable_current(void) {}
 static inline void kasan_alloc_pages(struct page *page, unsigned int order) {}
 static inline void kasan_free_pages(struct page *page, unsigned int order) {}
 
-static inline void kasan_cache_create(struct kmem_cache *cache,
- unsigned int *size,
- slab_flags_t *flags) {}
-
 static inline void kasan_poison_slab(struct page *page) {}
 static inline void kasan_unpoison_object_data(struct kmem_cache *cache,
void *object) {}
@@ -122,17 +100,65 @@ static inline void *kasan_init_slab_obj(struct kmem_cache 
*cache,
return (void *)object;
 }
 
+static inline void kasan_kfree_large(void *ptr, unsigned long ip) {}
+static inline void kasan_poison_kfree(void *ptr, unsigned long ip) {}
+static inline void kasan_free_shadow(const struct vm_struct *vm) {}
+static inline void kasan_remove_zero_shadow(void *start, unsigned long size) {}
+static inline void kasan_unpoison_slab(const void *ptr) {}
+
+static inline int kasan_module_alloc(void *addr, size_t size)
+{
+   return 0;
+}
+
+static inline int kasan_add_zero_shadow(void *start, unsigned long size)
+{
+   return 0;
+}
+
+static inline size_t kasan_metadata_size(struct kmem_cache *cache)
+{
+   return 0;
+}
+
+#endif /* CONFIG_KASAN */
+
+struct kasan_cache {
+   int alloc_meta_offset;
+   int free_meta_offset;
+};
+
+#if defined(CONFIG_KASAN) || defined(CONFIG_SLAB_QUARANTINE)
+
+void kasan_cache_create(struct kmem_cache *cache, unsigned int *size,
+   slab_flags_t *flags);
+void * __must_check kasan_kmalloc_large(const void *ptr, size_t size

[PATCH RFC v2 2/6] mm/slab: Perform init_on_free earlier

2020-09-29 Thread Alexander Popov
Currently in CONFIG_SLAB init_on_free happens too late, and heap
objects go to the heap quarantine being dirty. Let's move memory
clearing before calling kasan_slab_free() to fix that.

Signed-off-by: Alexander Popov 
---
 mm/slab.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 3160dff6fd76..5140203c5b76 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3414,6 +3414,9 @@ static void cache_flusharray(struct kmem_cache *cachep, 
struct array_cache *ac)
 static __always_inline void __cache_free(struct kmem_cache *cachep, void *objp,
 unsigned long caller)
 {
+   if (unlikely(slab_want_init_on_free(cachep)))
+   memset(objp, 0, cachep->object_size);
+
/* Put the object into the quarantine, don't touch it for now. */
if (kasan_slab_free(cachep, objp, _RET_IP_))
return;
@@ -3432,8 +3435,6 @@ void ___cache_free(struct kmem_cache *cachep, void *objp,
struct array_cache *ac = cpu_cache_get(cachep);
 
check_irq_off();
-   if (unlikely(slab_want_init_on_free(cachep)))
-   memset(objp, 0, cachep->object_size);
kmemleak_free_recursive(objp, cachep->flags);
objp = cache_free_debugcheck(cachep, objp, caller);
memcg_slab_free_hook(cachep, virt_to_head_page(objp), objp);
-- 
2.26.2



Re: [External] Re: [PATCH v2] stackleak: Fix a race between stack erasing sysctl handlers

2020-09-28 Thread Alexander Popov
On 22.09.2020 08:59, Muchun Song wrote:
> On Mon, Sep 14, 2020 at 9:56 PM Alexander Popov  wrote:
>>
>> On 07.09.2020 16:53, Muchun Song wrote:
>>> On Mon, Sep 7, 2020 at 7:24 PM Alexander Popov  wrote:
>>>>
>>>> On 07.09.2020 05:54, Muchun Song wrote:
>>>>> Hi all,
>>>>>
>>>>> Any comments or suggestions? Thanks.
>>>>>
>>>>> On Fri, Aug 28, 2020 at 11:19 AM Muchun Song  
>>>>> wrote:
>>>>>>
>>>>>> There is a race between the assignment of `table->data` and write value
>>>>>> to the pointer of `table->data` in the __do_proc_doulongvec_minmax() on
>>>>>> the other thread.
>>>>>>
>>>>>> CPU0:                             CPU1:
>>>>>>                                   proc_sys_write
>>>>>>   stack_erasing_sysctl              proc_sys_call_handler
>>>>>>     table->data = &state;             stack_erasing_sysctl
>>>>>>                                         table->data = &state;
>>>>>>     proc_doulongvec_minmax
>>>>>>       do_proc_doulongvec_minmax       sysctl_head_finish
>>>>>>         __do_proc_doulongvec_minmax     unuse_table
>>>>>>           i = table->data;
>>>>>>           *i = val;  // corrupt CPU1's stack
>>>>
>>>> Hello everyone!
>>>>
>>>> As I remember, I implemented stack_erasing_sysctl() very similar to other 
>>>> sysctl
>>>> handlers. Is that issue relevant for other handlers as well?
>>>
>>> Yeah, it's very similar. But the difference is that others use a
>>> global variable as the
>>> `table->data`, but here we use a local variable as the `table->data`.
>>> The local variable
>>> is allocated from the stack. So other thread could corrupt the stack
>>> like the diagram
>>> above.
>>
>> Hi Muchun,
>>
>> I don't think that the proposed copying of struct ctl_table to local 
>> variable is
>> a good fix of that issue. There might be other bugs caused by concurrent
>> execution of stack_erasing_sysctl().
> 
> Hi Alexander,
> 
> Yeah, we can fix this issue on a higher level in kernel/sysctl.c. But
> we will rework some kernel/sysctl.c base code. Because the commit:
> 
> 964c9dff0091 ("stackleak: Allow runtime disabling of kernel stack 
> erasing")
> 
> is introduced from linux-4.20. So we should backport this fix patch to the 
> other
> stable tree. Be the safe side, we can apply this patch to only fix the
> stack_erasing_sysctl. In this case, the impact of backport is minimal.
> 
> In the feature, we can fix the issue(another patch) like this on a higher
> level in kernel/sysctl.c and only apply it in the later kernel version. Is
> this OK?

Muchun, I would recommend:
  1) fixing the reason of the issue in kernel/sysctl.c
or
  2) use some locking in stack_erasing_sysctl() to fix the issue locally.

Honestly, I don't like this "dup_table" approach in the patch below. It doesn't
remove the data race.

Thank you!
Alexander

>> I would recommend using some locking instead.
>>
>> But you say there are other similar issues. Should it be fixed on higher 
>> level
>> in kernel/sysctl.c?
>>
>> [Adding more knowing people to CC]
>>
>> Thanks!
>>
>>>> Muchun, could you elaborate how CPU1's stack is corrupted and how you 
>>>> detected
>>>> that? Thanks!
>>>
>>> Why did I find this problem? Because I solve another problem which is
>>> very similar to
>>> this issue. You can reference the following fix patch. Thanks.
>>>
>>>   https://lkml.org/lkml/2020/8/22/105
>>>>
>>>>>> Fix this by duplicating the `table`, and only update the duplicate of
>>>>>> it.
>>>>>>
>>>>>> Fixes: 964c9dff0091 ("stackleak: Allow runtime disabling of kernel stack 
>>>>>> erasing")
>>>>>> Signed-off-by: Muchun Song 
>>>>>> ---
>>>>>> changelogs in v2:
>>>>>>  1. Add more details about how the race happened to the commit message.
>>>>>>
>>>>>>  kernel/stackleak.c | 11 ---
>>>>>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/kernel/stackleak.c b/kernel/stackleak.c
>>>>>> index a8fc9ae1d03d..fd95b87478ff 100644
>>>>>> --- a/kernel/stackleak.c
>>>>>> +++ b/kernel/stackleak.c
>>>>>> @@ -25,10 +25,15 @@ int stack_erasing_sysctl(struct ctl_table *table, int write,
>>>>>> int ret = 0;
>>>>>> int state = !static_branch_unlikely(&stack_erasing_bypass);
>>>>>> int prev_state = state;
>>>>>> +   struct ctl_table dup_table = *table;
>>>>>>
>>>>>> -   table->data = &state;
>>>>>> -   table->maxlen = sizeof(int);
>>>>>> -   ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
>>>>>> +   /*
>>>>>> +* In order to avoid races with __do_proc_doulongvec_minmax(), we
>>>>>> +* can duplicate the @table and alter the duplicate of it.
>>>>>> +*/
>>>>>> +   dup_table.data = &state;
>>>>>> +   dup_table.maxlen = sizeof(int);
>>>>>> +   ret = proc_dointvec_minmax(&dup_table, write, buffer, lenp, ppos);
>>>>>> state = !!state;
>>>>>> if (ret || !write || state == prev_state)
>>>>>> return ret;
>>>>>> --
>>>>>> 2.11.0
> 
> 
> 



Re: [External] Re: [PATCH v2] stackleak: Fix a race between stack erasing sysctl handlers

2020-09-14 Thread Alexander Popov
On 07.09.2020 16:53, Muchun Song wrote:
> On Mon, Sep 7, 2020 at 7:24 PM Alexander Popov  wrote:
>>
>> On 07.09.2020 05:54, Muchun Song wrote:
>>> Hi all,
>>>
>>> Any comments or suggestions? Thanks.
>>>
>>> On Fri, Aug 28, 2020 at 11:19 AM Muchun Song  
>>> wrote:
>>>>
>>>> There is a race between the assignment of `table->data` and write value
>>>> to the pointer of `table->data` in the __do_proc_doulongvec_minmax() on
>>>> the other thread.
>>>>
>>>> CPU0:                             CPU1:
>>>>                                   proc_sys_write
>>>>   stack_erasing_sysctl              proc_sys_call_handler
>>>>     table->data = &state;             stack_erasing_sysctl
>>>>                                         table->data = &state;
>>>>     proc_doulongvec_minmax
>>>>       do_proc_doulongvec_minmax       sysctl_head_finish
>>>>         __do_proc_doulongvec_minmax     unuse_table
>>>>           i = table->data;
>>>>           *i = val;  // corrupt CPU1's stack
>>
>> Hello everyone!
>>
>> As I remember, I implemented stack_erasing_sysctl() very similar to other 
>> sysctl
>> handlers. Is that issue relevant for other handlers as well?
> 
> Yeah, it's very similar. But the difference is that others use a
> global variable as the
> `table->data`, but here we use a local variable as the `table->data`.
> The local variable
> is allocated from the stack. So other thread could corrupt the stack
> like the diagram
> above.

Hi Muchun,

I don't think that the proposed copying of struct ctl_table to local variable is
a good fix of that issue. There might be other bugs caused by concurrent
execution of stack_erasing_sysctl().

I would recommend using some locking instead.

But you say there are other similar issues. Should it be fixed on higher level
in kernel/sysctl.c?

[Adding more knowing people to CC]

Thanks!

>> Muchun, could you elaborate how CPU1's stack is corrupted and how you 
>> detected
>> that? Thanks!
> 
> Why did I find this problem? Because I solve another problem which is
> very similar to
> this issue. You can reference the following fix patch. Thanks.
> 
>   https://lkml.org/lkml/2020/8/22/105
>>
>>>> Fix this by duplicating the `table`, and only update the duplicate of
>>>> it.
>>>>
>>>> Fixes: 964c9dff0091 ("stackleak: Allow runtime disabling of kernel stack 
>>>> erasing")
>>>> Signed-off-by: Muchun Song 
>>>> ---
>>>> changelogs in v2:
>>>>  1. Add more details about how the race happened to the commit message.
>>>>
>>>>  kernel/stackleak.c | 11 ---
>>>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/kernel/stackleak.c b/kernel/stackleak.c
>>>> index a8fc9ae1d03d..fd95b87478ff 100644
>>>> --- a/kernel/stackleak.c
>>>> +++ b/kernel/stackleak.c
>>>> @@ -25,10 +25,15 @@ int stack_erasing_sysctl(struct ctl_table *table, int write,
>>>> int ret = 0;
>>>> int state = !static_branch_unlikely(&stack_erasing_bypass);
>>>> int prev_state = state;
>>>> +   struct ctl_table dup_table = *table;
>>>>
>>>> -   table->data = &state;
>>>> -   table->maxlen = sizeof(int);
>>>> -   ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
>>>> +   /*
>>>> +* In order to avoid races with __do_proc_doulongvec_minmax(), we
>>>> +* can duplicate the @table and alter the duplicate of it.
>>>> +*/
>>>> +   dup_table.data = &state;
>>>> +   dup_table.maxlen = sizeof(int);
>>>> +   ret = proc_dointvec_minmax(&dup_table, write, buffer, lenp, ppos);
>>>> state = !!state;
>>>> if (ret || !write || state == prev_state)
>>>> return ret;
>>>> --
>>>> 2.11.0


Re: [PATCH v2] stackleak: Fix a race between stack erasing sysctl handlers

2020-09-07 Thread Alexander Popov
On 07.09.2020 05:54, Muchun Song wrote:
> Hi all,
> 
> Any comments or suggestions? Thanks.
> 
> On Fri, Aug 28, 2020 at 11:19 AM Muchun Song  wrote:
>>
>> There is a race between the assignment of `table->data` and write value
>> to the pointer of `table->data` in the __do_proc_doulongvec_minmax() on
>> the other thread.
>>
>> CPU0:                             CPU1:
>>                                   proc_sys_write
>>   stack_erasing_sysctl              proc_sys_call_handler
>>     table->data = &state;             stack_erasing_sysctl
>>                                         table->data = &state;
>>     proc_doulongvec_minmax
>>       do_proc_doulongvec_minmax       sysctl_head_finish
>>         __do_proc_doulongvec_minmax     unuse_table
>>           i = table->data;
>>           *i = val;  // corrupt CPU1's stack

Hello everyone!

As I remember, I implemented stack_erasing_sysctl() very similar to other sysctl
handlers. Is that issue relevant for other handlers as well?

Muchun, could you elaborate how CPU1's stack is corrupted and how you detected
that? Thanks!

Best regards,
Alexander

>> Fix this by duplicating the `table`, and only update the duplicate of
>> it.
>>
>> Fixes: 964c9dff0091 ("stackleak: Allow runtime disabling of kernel stack 
>> erasing")
>> Signed-off-by: Muchun Song 
>> ---
>> changelogs in v2:
>>  1. Add more details about how the race happened to the commit message.
>>
>>  kernel/stackleak.c | 11 ---
>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/stackleak.c b/kernel/stackleak.c
>> index a8fc9ae1d03d..fd95b87478ff 100644
>> --- a/kernel/stackleak.c
>> +++ b/kernel/stackleak.c
>> @@ -25,10 +25,15 @@ int stack_erasing_sysctl(struct ctl_table *table, int write,
>> int ret = 0;
>> int state = !static_branch_unlikely(&stack_erasing_bypass);
>> int prev_state = state;
>> +   struct ctl_table dup_table = *table;
>>
>> -   table->data = &state;
>> -   table->maxlen = sizeof(int);
>> -   ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
>> +   /*
>> +* In order to avoid races with __do_proc_doulongvec_minmax(), we
>> +* can duplicate the @table and alter the duplicate of it.
>> +*/
>> +   dup_table.data = &state;
>> +   dup_table.maxlen = sizeof(int);
>> +   ret = proc_dointvec_minmax(&dup_table, write, buffer, lenp, ppos);
>> state = !!state;
>> if (ret || !write || state == prev_state)
>> return ret;
>> --
>> 2.11.0
>>
> 
> 



Re: [PATCH RFC 1/2] mm: Extract SLAB_QUARANTINE from KASAN

2020-08-18 Thread Alexander Popov
On 18.08.2020 18:45, Andrey Konovalov wrote:
> On Mon, Aug 17, 2020 at 7:32 PM Alexander Popov  wrote:
>>
>> On 15.08.2020 19:52, Kees Cook wrote:
>>> On Thu, Aug 13, 2020 at 06:19:21PM +0300, Alexander Popov wrote:
>>>> Heap spraying is an exploitation technique that aims to put controlled
>>>> bytes at a predetermined memory location on the heap. Heap spraying for
>>>> exploiting use-after-free in the Linux kernel relies on the fact that on
>>>> kmalloc(), the slab allocator returns the address of the memory that was
>>>> recently freed. Allocating a kernel object with the same size and
>>>> controlled contents allows overwriting the vulnerable freed object.
>>>>
>>>> Let's extract slab freelist quarantine from KASAN functionality and
>>>> call it CONFIG_SLAB_QUARANTINE. This feature breaks widespread heap
>>>> spraying technique used for exploiting use-after-free vulnerabilities
>>>> in the kernel code.
>>>>
>>>> If this feature is enabled, freed allocations are stored in the quarantine
>>>> and can't be instantly reallocated and overwritten by the exploit
>>>> performing heap spraying.
>>>
>>> It may be worth clarifying that this is specifically only direct UAF and
>>> doesn't help with spray-and-overflow-into-a-neighboring-object attacks
>>> (i.e. both tend to use sprays, but the former doesn't depend on a write
>>> overflow).
>>
>> Andrey Konovalov wrote:
>>> If quarantine is to be used without the rest of KASAN, I'd prefer for
>>> it to be separated from KASAN completely: move to e.g. mm/quarantine.c
>>> and don't mention KASAN in function/config names.
>>
>> Hmm, making quarantine completely separate from KASAN would bring troubles.
>>
>> Currently, in many special places the allocator calls KASAN handlers:
>>   kasan_cache_create()
>>   kasan_slab_free()
>>   kasan_kmalloc_large()
>>   kasan_krealloc()
>>   kasan_slab_alloc()
>>   kasan_kmalloc()
>>   kasan_cache_shrink()
>>   kasan_cache_shutdown()
>>   and some others.
>> These functions do a lot of interesting things and also work with the 
>> quarantine
>> using these helpers:
>>   quarantine_put()
>>   quarantine_reduce()
>>   quarantine_remove_cache()
>>
>> Making quarantine completely separate from KASAN would require to move some
>> internal logic of these KASAN handlers to allocator code.
> 
> It doesn't look like there's quite a lot of KASAN-specific logic there.
> 
> All those quarantine_*() calls are either at the beginning or at the
> end of some kasan annotations, so it should be quite easy to move
> those out. E.g. quarantine_reduce() can be moved together with the
> gfpflags_allow_blocking(flags) check and put before kasan_kmalloc()
> calls (or maybe also into some other places?), quarantine_put() can be
> put after kasan_slab_free(), etc.
> 
>> In this patch I used another approach, that doesn't require changing the API
>> between allocators and KASAN. I added linux/mm/kasan/slab_quarantine.c with 
>> slim
>> KASAN handlers that implement the minimal functionality needed for 
>> quarantine.
>>
>> Do you think that it's a bad solution?
> 
> This solution doesn't look clean. Here you provide a second KASAN
> runtime implementation, parallel to the original one, which only does
> quarantine. It seems much cleaner to put quarantine logic into a
> separate module, which can be either used independently, or together
> with KASAN built on top of it.

That sounds reasonable, I agree. Thanks, Andrey.
Added to TODO list.

First I'm going to focus on exploring the security properties of the quarantine.
Then I'll do the refactoring that you and Kees propose.

Best regards,
Alexander


Re: [PATCH RFC 0/2] Break heap spraying needed for exploiting use-after-free

2020-08-18 Thread Alexander Popov
On 15.08.2020 19:39, Kees Cook wrote:
> On Thu, Aug 13, 2020 at 06:19:20PM +0300, Alexander Popov wrote:
>> I've found an easy way to break heap spraying for use-after-free
>> exploitation. I simply extracted slab freelist quarantine from KASAN
>> functionality and called it CONFIG_SLAB_QUARANTINE. Please see patch 1.
> 
> Ah yeah, good idea. :)
> 
>> [...]
>> I did a brief performance evaluation of this feature.
>>
>> 1. Memory consumption. KASAN quarantine uses 1/32 of the memory.
>> CONFIG_SLAB_QUARANTINE disabled:
>>   # free -m
>>                total        used        free      shared  buff/cache   available
>>   Mem:          1987          39        1862          10          86        1907
>>   Swap:            0           0           0
>> CONFIG_SLAB_QUARANTINE enabled:
>>   # free -m
>>                total        used        free      shared  buff/cache   available
>>   Mem:          1987         140        1760          10          87        1805
>>   Swap:            0           0           0
> 
> 1/32 of memory doesn't seem too bad for someone interested in this defense.

This can be configured. Quote from linux/mm/kasan/quarantine.c:
/*
 * The fraction of physical memory the quarantine is allowed to occupy.
 * Quarantine doesn't support memory shrinker with SLAB allocator, so we keep
 * the ratio low to avoid OOM.
 */
#define QUARANTINE_FRACTION 32

>> 2. Performance penalty. I used `hackbench -s 256 -l 200 -g 15 -f 25 -P`.
>> CONFIG_SLAB_QUARANTINE disabled (x86_64, CONFIG_SLUB):
>>   Times: 3.088, 3.103, 3.068, 3.103, 3.107
>>   Mean: 3.0938
>>   Standard deviation: 0.0144
>> CONFIG_SLAB_QUARANTINE enabled (x86_64, CONFIG_SLUB):
>>   Times: 3.303, 3.329, 3.356, 3.314, 3.292
>>   Mean: 3.3188 (+7.3%)
>>   Standard deviation: 0.0223
> 
> That's rather painful, but hackbench can produce some big deltas given
> it can be an unrealistic workload for most systems. I'd be curious to
> see the "building a kernel" timings, which tends to be much more
> realistic for "busy system" without hammering one particular subsystem
> (though it's a bit VFS heavy, obviously).

I have new results.

CPU: Intel Core i7-6500U CPU @ 2.50GHz

Test: time make O=../build_out/defconfig/ -j2

CONFIG_SLAB_QUARANTINE disabled:
  Times: 10m52.978s 10m50.161s 10m45.601s
  Mean: 649.58s
  Standard deviation: 3.04

CONFIG_SLAB_QUARANTINE enabled:
  Times: 10m56.256s 10m51.919s 10m47.903s
  Mean: 652.026s (+0.38%)
  Standard deviation: 3.41

This test shows a much lower performance penalty.

Any other ideas for tests?

Best regards,
Alexander


Re: [PATCH RFC 1/2] mm: Extract SLAB_QUARANTINE from KASAN

2020-08-17 Thread Alexander Popov
On 16.08.2020 22:59, Pavel Machek wrote:
> On Sat 2020-08-15 19:54:55, Matthew Wilcox wrote:
>> On Thu, Aug 13, 2020 at 06:19:21PM +0300, Alexander Popov wrote:
>>> +config SLAB_QUARANTINE
>>> +   bool "Enable slab freelist quarantine"
>>> +   depends on !KASAN && (SLAB || SLUB)
>>> +   help
>>> + Enable slab freelist quarantine to break heap spraying technique
>>> + used for exploiting use-after-free vulnerabilities in the kernel
>>> + code. If this feature is enabled, freed allocations are stored
>>> + in the quarantine and can't be instantly reallocated and
>>> + overwritten by the exploit performing heap spraying.
>>> + This feature is a part of KASAN functionality.
>>
>> After this patch, it isn't part of KASAN any more ;-)
>>
>> The way this is written is a bit too low level.  Let's write it in terms
>> that people who don't know the guts of the slab allocator or security
>> terminology can understand:
>>
>>Delay reuse of freed slab objects.  This makes some security
>>exploits harder to execute.  It reduces performance slightly
>>as objects will be cache cold by the time they are reallocated,
>>and it costs a small amount of memory.
> 
> Written this way, it invites questions:
> 
> Does it introduce any new deadlocks in near out-of-memory situations?

The Linux kernel with KASAN enabled is heavily tested by syzbot.
I think Dmitry and Andrey can give good answers to your question.

Some time ago I was doing Linux kernel fuzzing with syzkaller on low-memory
virtual machines (with KASAN and LOCKUP_DETECTOR enabled). I gave less than 1G
of RAM to each Debian Stretch VM. I didn't hit any deadlocks caused by OOM.

Best regards,
Alexander


Re: [PATCH RFC 1/2] mm: Extract SLAB_QUARANTINE from KASAN

2020-08-17 Thread Alexander Popov
On 15.08.2020 21:54, Matthew Wilcox wrote:
> On Thu, Aug 13, 2020 at 06:19:21PM +0300, Alexander Popov wrote:
>> +config SLAB_QUARANTINE
>> +bool "Enable slab freelist quarantine"
>> +depends on !KASAN && (SLAB || SLUB)
>> +help
>> +  Enable slab freelist quarantine to break heap spraying technique
>> +  used for exploiting use-after-free vulnerabilities in the kernel
>> +  code. If this feature is enabled, freed allocations are stored
>> +  in the quarantine and can't be instantly reallocated and
>> +  overwritten by the exploit performing heap spraying.
>> +  This feature is a part of KASAN functionality.
> 
> After this patch, it isn't part of KASAN any more ;-)

Ok, I'll change that to "this feature is used by KASAN" :)

> The way this is written is a bit too low level.  Let's write it in terms
> that people who don't know the guts of the slab allocator or security
> terminology can understand:
> 
> Delay reuse of freed slab objects.  This makes some security
> exploits harder to execute.  It reduces performance slightly
> as objects will be cache cold by the time they are reallocated,
> and it costs a small amount of memory.
> 
> (feel free to edit this)

Ok, I see.
I'll start with a high-level description and add the low-level details at the end.

>> +struct qlist_node {
>> +struct qlist_node *next;
>> +};
> 
> I appreciate this isn't new, but why do we have a new singly-linked-list
> abstraction being defined in this code?

I don't know for sure.
I suppose it is caused by SLAB/SLUB freelist implementation details (qlist_node
in kasan_free_meta is also used for the allocator freelist).

Best regards,
Alexander


Re: [PATCH RFC 2/2] lkdtm: Add heap spraying test

2020-08-17 Thread Alexander Popov
On 15.08.2020 19:59, Kees Cook wrote:
> On Thu, Aug 13, 2020 at 06:19:22PM +0300, Alexander Popov wrote:
>> Add a simple test for CONFIG_SLAB_QUARANTINE.
>>
>> It performs heap spraying that aims to reallocate the recently freed heap
>> object. This technique is used for exploiting use-after-free
>> vulnerabilities in the kernel code.
>>
>> This test shows that CONFIG_SLAB_QUARANTINE breaks heap spraying
>> exploitation technique.
> 
> Yay tests!

Yes :)
I'm going to improve it to demonstrate the quarantine security properties.

>> Signed-off-by: Alexander Popov 
>> ---
>>  drivers/misc/lkdtm/core.c  |  1 +
>>  drivers/misc/lkdtm/heap.c  | 40 ++
>>  drivers/misc/lkdtm/lkdtm.h |  1 +
>>  3 files changed, 42 insertions(+)
>>
>> diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
>> index a5e344df9166..78b7669c35eb 100644
>> --- a/drivers/misc/lkdtm/core.c
>> +++ b/drivers/misc/lkdtm/core.c
>> @@ -126,6 +126,7 @@ static const struct crashtype crashtypes[] = {
>>  CRASHTYPE(SLAB_FREE_DOUBLE),
>>  CRASHTYPE(SLAB_FREE_CROSS),
>>  CRASHTYPE(SLAB_FREE_PAGE),
>> +CRASHTYPE(HEAP_SPRAY),
>>  CRASHTYPE(SOFTLOCKUP),
>>  CRASHTYPE(HARDLOCKUP),
>>  CRASHTYPE(SPINLOCKUP),
>> diff --git a/drivers/misc/lkdtm/heap.c b/drivers/misc/lkdtm/heap.c
>> index 1323bc16f113..a72a241e314a 100644
>> --- a/drivers/misc/lkdtm/heap.c
>> +++ b/drivers/misc/lkdtm/heap.c
>> @@ -205,6 +205,46 @@ static void ctor_a(void *region)
>>  static void ctor_b(void *region)
>>  { }
>>  
>> +#define HEAP_SPRAY_SIZE 128
>> +
>> +void lkdtm_HEAP_SPRAY(void)
>> +{
>> +int *addr;
>> +int *spray_addrs[HEAP_SPRAY_SIZE] = { 0 };
> 
> (the 0 isn't needed -- and it was left there, it should be NULL)

It is used in tear-down below.
I'll change it to { NULL }.

>> +unsigned long i = 0;
>> +
>> +addr = kmem_cache_alloc(a_cache, GFP_KERNEL);
> 
> I would prefer this test add its own cache (e.g. spray_cache), to avoid
> misbehaviors between tests. (e.g. the a and b caches already run the
> risk of getting corrupted weirdly.)

Ok, I'll do that.

>> +if (!addr) {
>> +pr_info("Unable to allocate memory in lkdtm-heap-a cache\n");
>> +return;
>> +}
>> +
>> +*addr = 0x31337;
>> +kmem_cache_free(a_cache, addr);
>> +
>> +pr_info("Performing heap spraying...\n");
>> +for (i = 0; i < HEAP_SPRAY_SIZE; i++) {
>> +spray_addrs[i] = kmem_cache_alloc(a_cache, GFP_KERNEL);
>> +*spray_addrs[i] = 0x31337;
>> +pr_info("attempt %lu: spray alloc addr %p vs freed addr %p\n",
>> +i, spray_addrs[i], addr);
> 
> That's 128 lines spewed into dmesg... I would leave this out.

Ok.

>> +if (spray_addrs[i] == addr) {
>> +pr_info("freed addr is reallocated!\n");
>> +break;
>> +}
>> +}
>> +
>> +if (i < HEAP_SPRAY_SIZE)
>> +pr_info("FAIL! Heap spraying succeed :(\n");
> 
> I'd move this into the "if (spray_addrs[i] == addr)" test instead of the
> pr_info() that is there.
> 
>> +else
>> +pr_info("OK! Heap spraying hasn't succeed :)\n");
> 
> And then make this an "if (i == HEAP_SPRAY_SIZE)" test

Do you mean that I need to avoid the additional line in the test output,
printing only the final result?

>> +
>> +for (i = 0; i < HEAP_SPRAY_SIZE; i++) {
>> +if (spray_addrs[i])
>> +kmem_cache_free(a_cache, spray_addrs[i]);
>> +}
>> +}
>> +
>>  void __init lkdtm_heap_init(void)
>>  {
>>  double_free_cache = kmem_cache_create("lkdtm-heap-double_free",
>> diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
>> index 8878538b2c13..dfafb4ae6f3a 100644
>> --- a/drivers/misc/lkdtm/lkdtm.h
>> +++ b/drivers/misc/lkdtm/lkdtm.h
>> @@ -45,6 +45,7 @@ void lkdtm_READ_BUDDY_AFTER_FREE(void);
>>  void lkdtm_SLAB_FREE_DOUBLE(void);
>>  void lkdtm_SLAB_FREE_CROSS(void);
>>  void lkdtm_SLAB_FREE_PAGE(void);
>> +void lkdtm_HEAP_SPRAY(void);
>>  
>>  /* lkdtm_perms.c */
>>  void __init lkdtm_perms_init(void);
>> -- 
>> 2.26.2
>>
> 
> I assume enabling the quarantine defense also ends up being seen in the
> SLAB_FREE_DOUBLE LKDTM test too, yes?

I'll experiment with that.

Thank you!

Best regards,
Alexander


Re: [PATCH RFC 1/2] mm: Extract SLAB_QUARANTINE from KASAN

2020-08-17 Thread Alexander Popov
On 15.08.2020 19:52, Kees Cook wrote:
> On Thu, Aug 13, 2020 at 06:19:21PM +0300, Alexander Popov wrote:
>> Heap spraying is an exploitation technique that aims to put controlled
>> bytes at a predetermined memory location on the heap. Heap spraying for
>> exploiting use-after-free in the Linux kernel relies on the fact that on
>> kmalloc(), the slab allocator returns the address of the memory that was
>> recently freed. Allocating a kernel object with the same size and
>> controlled contents allows overwriting the vulnerable freed object.
>>
>> Let's extract slab freelist quarantine from KASAN functionality and
>> call it CONFIG_SLAB_QUARANTINE. This feature breaks widespread heap
>> spraying technique used for exploiting use-after-free vulnerabilities
>> in the kernel code.
>>
>> If this feature is enabled, freed allocations are stored in the quarantine
>> and can't be instantly reallocated and overwritten by the exploit
>> performing heap spraying.
> 
> It may be worth clarifying that this is specifically only direct UAF and
> doesn't help with spray-and-overflow-into-a-neighboring-object attacks
> (i.e. both tend to use sprays, but the former doesn't depend on a write
> overflow).

Right, thank you.

>> Signed-off-by: Alexander Popov 
>> ---
>>  include/linux/kasan.h  | 107 -
>>  include/linux/slab_def.h   |   2 +-
>>  include/linux/slub_def.h   |   2 +-
>>  init/Kconfig   |  11 
>>  mm/Makefile|   3 +-
>>  mm/kasan/Makefile  |   2 +
>>  mm/kasan/kasan.h   |  75 +-
>>  mm/kasan/quarantine.c  |   2 +
>>  mm/kasan/slab_quarantine.c |  99 ++
>>  mm/slub.c  |   2 +-
>>  10 files changed, 216 insertions(+), 89 deletions(-)
>>  create mode 100644 mm/kasan/slab_quarantine.c
>>
>> diff --git a/include/linux/kasan.h b/include/linux/kasan.h
>> index 087fba34b209..b837216f760c 100644
>> --- a/include/linux/kasan.h
>> +++ b/include/linux/kasan.h

[...]

>>  #else /* CONFIG_KASAN_GENERIC */
>> +static inline void kasan_record_aux_stack(void *ptr) {}
>> +#endif /* CONFIG_KASAN_GENERIC */
>>  
>> +#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_SLAB_QUARANTINE)
>> +void kasan_cache_shrink(struct kmem_cache *cache);
>> +void kasan_cache_shutdown(struct kmem_cache *cache);
>> +#else /* CONFIG_KASAN_GENERIC || CONFIG_SLAB_QUARANTINE */
>>  static inline void kasan_cache_shrink(struct kmem_cache *cache) {}
>>  static inline void kasan_cache_shutdown(struct kmem_cache *cache) {}
>> -static inline void kasan_record_aux_stack(void *ptr) {}
>> -
>> -#endif /* CONFIG_KASAN_GENERIC */
>> +#endif /* CONFIG_KASAN_GENERIC || CONFIG_SLAB_QUARANTINE */
> 
> In doing this extraction, I wonder if function naming should be changed?
> If it's going to live a new life outside of KASAN proper, maybe call
> these functions quarantine_cache_*()? But perhaps that's too much
> churn...

These functions are KASAN handlers that are called by the allocator.
I.e. the allocator calls the KASAN handlers, and then the KASAN handlers call
quarantine_put(), quarantine_reduce() and quarantine_remove_cache() among other
things.

Andrey Konovalov wrote:
> If quarantine is to be used without the rest of KASAN, I'd prefer for
> it to be separated from KASAN completely: move to e.g. mm/quarantine.c
> and don't mention KASAN in function/config names.

Hmm, making quarantine completely separate from KASAN would bring troubles.

Currently, in many special places the allocator calls KASAN handlers:
  kasan_cache_create()
  kasan_slab_free()
  kasan_kmalloc_large()
  kasan_krealloc()
  kasan_slab_alloc()
  kasan_kmalloc()
  kasan_cache_shrink()
  kasan_cache_shutdown()
  and some others.
These functions do a lot of interesting things and also work with the quarantine
using these helpers:
  quarantine_put()
  quarantine_reduce()
  quarantine_remove_cache()

Making quarantine completely separate from KASAN would require moving some
internal logic of these KASAN handlers to the allocator code.

In this patch I used another approach that doesn't require changing the API
between allocators and KASAN. I added linux/mm/kasan/slab_quarantine.c with
slim KASAN handlers that implement the minimal functionality needed for the
quarantine.

Do you think that it's a bad solution?

>>  #ifdef CONFIG_KASAN_SW_TAGS
>>  
>> diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
>> index 9eb430c163c2..fc7548f27512 100644
>> --- a/include/linux/slab_def.h
>> +++ b/include/linux/slab_def.h
>> @@ -72,7 +72,7 @@ struct kmem_cache {
>>  int obj_offse

Re: [PATCH RFC 0/2] Break heap spraying needed for exploiting use-after-free

2020-08-14 Thread Alexander Popov
On 13.08.2020 18:19, Alexander Popov wrote:
> Hello everyone! Requesting for your comments.
> 
> Use-after-free vulnerabilities in the Linux kernel are very popular for
> exploitation. A few examples:
>  
> https://googleprojectzero.blogspot.com/2018/09/a-cache-invalidation-bug-in-linux.html
>  
> https://googleprojectzero.blogspot.com/2019/11/bad-binder-android-in-wild-exploit.html?m=1
>  https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
> 
> Use-after-free exploits usually employ heap spraying technique.
> Generally it aims to put controlled bytes at a predetermined memory
> location on the heap. Heap spraying for exploiting use-after-free in
> the Linux kernel relies on the fact that on kmalloc(), the slab allocator
> returns the address of the memory that was recently freed. So allocating
> a kernel object with the same size and controlled contents allows
> overwriting the vulnerable freed object.
> 
> I've found an easy way to break heap spraying for use-after-free
> exploitation. I simply extracted slab freelist quarantine from KASAN
> functionality and called it CONFIG_SLAB_QUARANTINE. Please see patch 1.
> 
> If this feature is enabled, freed allocations are stored in the quarantine
> and can't be instantly reallocated and overwritten by the exploit
> performing heap spraying.
> 
> In patch 2 you can see the lkdtm test showing how CONFIG_SLAB_QUARANTINE
> prevents immediate reallocation of a freed heap object.
> 
> I tested this patch series both for CONFIG_SLUB and CONFIG_SLAB.
> 
> CONFIG_SLAB_QUARANTINE disabled:
>   # echo HEAP_SPRAY > /sys/kernel/debug/provoke-crash/DIRECT
>   lkdtm: Performing direct entry HEAP_SPRAY
>   lkdtm: Performing heap spraying...
>   lkdtm: attempt 0: spray alloc addr f8699c7d vs freed addr f8699c7d
>   lkdtm: freed addr is reallocated!
>   lkdtm: FAIL! Heap spraying succeed :(
> 
> CONFIG_SLAB_QUARANTINE enabled:
>   # echo HEAP_SPRAY > /sys/kernel/debug/provoke-crash/DIRECT
>   lkdtm: Performing direct entry HEAP_SPRAY
>   lkdtm: Performing heap spraying...
>   lkdtm: attempt 0: spray alloc addr 9cafb63f vs freed addr 173cce94
>   lkdtm: attempt 1: spray alloc addr 3096911f vs freed addr 173cce94
>   lkdtm: attempt 2: spray alloc addr da60d755 vs freed addr 173cce94
>   lkdtm: attempt 3: spray alloc addr 0b415070 vs freed addr 173cce94
>   ...
>   lkdtm: attempt 126: spray alloc addr e80ef807 vs freed addr 173cce94
>   lkdtm: attempt 127: spray alloc addr 398fe535 vs freed addr 173cce94
>   lkdtm: OK! Heap spraying hasn't succeed :)
> 
> I did a brief performance evaluation of this feature.
> 
> 1. Memory consumption. KASAN quarantine uses 1/32 of the memory.
> CONFIG_SLAB_QUARANTINE disabled:
>   # free -m
>                total        used        free      shared  buff/cache   available
>   Mem:          1987          39        1862          10          86        1907
>   Swap:            0           0           0
> CONFIG_SLAB_QUARANTINE enabled:
>   # free -m
>                total        used        free      shared  buff/cache   available
>   Mem:          1987         140        1760          10          87        1805
>   Swap:            0           0           0
> 
> 2. Performance penalty. I used `hackbench -s 256 -l 200 -g 15 -f 25 -P`.
> CONFIG_SLAB_QUARANTINE disabled (x86_64, CONFIG_SLUB):
>   Times: 3.088, 3.103, 3.068, 3.103, 3.107
>   Mean: 3.0938
>   Standard deviation: 0.0144
> CONFIG_SLAB_QUARANTINE enabled (x86_64, CONFIG_SLUB):
>   Times: 3.303, 3.329, 3.356, 3.314, 3.292
>   Mean: 3.3188 (+7.3%)
>   Standard deviation: 0.0223
> 
> I would appreciate your feedback!

While waiting for the feedback on these RFC patches, I compiled a list of topics
for further research:

 - Possible ways to overwrite a quarantined heap object by making a large number
of allocations (with/without freeing them)

 - How init_on_free=1 affects heap spraying on a system with the heap quarantine

 - How releasing batches of quarantine objects right away affects heap spraying
reliability

 - Heap spraying on multi-core systems with the heap quarantine

 - More precise performance evaluation

 - Possible ways to improve the security properties and performance results
(KASAN quarantine has some interesting settings)

Best regards,
Alexander

> Alexander Popov (2):
>   mm: Extract SLAB_QUARANTINE from KASAN
>   lkdtm: Add heap spraying test
> 
>  drivers/misc/lkdtm/core.c  |   1 +
>  drivers/misc/lkdtm/heap.c  |  40 ++
>  drivers/misc/lkdtm/lkdtm.h |   1 +
>  include/linux/kasan.h  | 107 ++

[PATCH RFC 2/2] lkdtm: Add heap spraying test

2020-08-13 Thread Alexander Popov
Add a simple test for CONFIG_SLAB_QUARANTINE.

It performs heap spraying that aims to reallocate the recently freed heap
object. This technique is used for exploiting use-after-free
vulnerabilities in the kernel code.

This test shows that CONFIG_SLAB_QUARANTINE breaks heap spraying
exploitation technique.

Signed-off-by: Alexander Popov 
---
 drivers/misc/lkdtm/core.c  |  1 +
 drivers/misc/lkdtm/heap.c  | 40 ++
 drivers/misc/lkdtm/lkdtm.h |  1 +
 3 files changed, 42 insertions(+)

diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
index a5e344df9166..78b7669c35eb 100644
--- a/drivers/misc/lkdtm/core.c
+++ b/drivers/misc/lkdtm/core.c
@@ -126,6 +126,7 @@ static const struct crashtype crashtypes[] = {
CRASHTYPE(SLAB_FREE_DOUBLE),
CRASHTYPE(SLAB_FREE_CROSS),
CRASHTYPE(SLAB_FREE_PAGE),
+   CRASHTYPE(HEAP_SPRAY),
CRASHTYPE(SOFTLOCKUP),
CRASHTYPE(HARDLOCKUP),
CRASHTYPE(SPINLOCKUP),
diff --git a/drivers/misc/lkdtm/heap.c b/drivers/misc/lkdtm/heap.c
index 1323bc16f113..a72a241e314a 100644
--- a/drivers/misc/lkdtm/heap.c
+++ b/drivers/misc/lkdtm/heap.c
@@ -205,6 +205,46 @@ static void ctor_a(void *region)
 static void ctor_b(void *region)
 { }
 
+#define HEAP_SPRAY_SIZE 128
+
+void lkdtm_HEAP_SPRAY(void)
+{
+   int *addr;
+   int *spray_addrs[HEAP_SPRAY_SIZE] = { 0 };
+   unsigned long i = 0;
+
+   addr = kmem_cache_alloc(a_cache, GFP_KERNEL);
+   if (!addr) {
+   pr_info("Unable to allocate memory in lkdtm-heap-a cache\n");
+   return;
+   }
+
+   *addr = 0x31337;
+   kmem_cache_free(a_cache, addr);
+
+   pr_info("Performing heap spraying...\n");
+   for (i = 0; i < HEAP_SPRAY_SIZE; i++) {
+   spray_addrs[i] = kmem_cache_alloc(a_cache, GFP_KERNEL);
+   *spray_addrs[i] = 0x31337;
+   pr_info("attempt %lu: spray alloc addr %p vs freed addr %p\n",
+   i, spray_addrs[i], addr);
+   if (spray_addrs[i] == addr) {
+   pr_info("freed addr is reallocated!\n");
+   break;
+   }
+   }
+
+   if (i < HEAP_SPRAY_SIZE)
+   pr_info("FAIL! Heap spraying succeed :(\n");
+   else
+   pr_info("OK! Heap spraying hasn't succeed :)\n");
+
+   for (i = 0; i < HEAP_SPRAY_SIZE; i++) {
+   if (spray_addrs[i])
+   kmem_cache_free(a_cache, spray_addrs[i]);
+   }
+}
+
 void __init lkdtm_heap_init(void)
 {
double_free_cache = kmem_cache_create("lkdtm-heap-double_free",
diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
index 8878538b2c13..dfafb4ae6f3a 100644
--- a/drivers/misc/lkdtm/lkdtm.h
+++ b/drivers/misc/lkdtm/lkdtm.h
@@ -45,6 +45,7 @@ void lkdtm_READ_BUDDY_AFTER_FREE(void);
 void lkdtm_SLAB_FREE_DOUBLE(void);
 void lkdtm_SLAB_FREE_CROSS(void);
 void lkdtm_SLAB_FREE_PAGE(void);
+void lkdtm_HEAP_SPRAY(void);
 
 /* lkdtm_perms.c */
 void __init lkdtm_perms_init(void);
-- 
2.26.2



[PATCH RFC 1/2] mm: Extract SLAB_QUARANTINE from KASAN

2020-08-13 Thread Alexander Popov
Heap spraying is an exploitation technique that aims to put controlled
bytes at a predetermined memory location on the heap. Heap spraying for
exploiting use-after-free in the Linux kernel relies on the fact that on
kmalloc(), the slab allocator returns the address of the memory that was
recently freed. Allocating a kernel object with the same size and
controlled contents allows overwriting the vulnerable freed object.

Let's extract slab freelist quarantine from KASAN functionality and
call it CONFIG_SLAB_QUARANTINE. This feature breaks widespread heap
spraying technique used for exploiting use-after-free vulnerabilities
in the kernel code.

If this feature is enabled, freed allocations are stored in the quarantine
and can't be instantly reallocated and overwritten by the exploit
performing heap spraying.

Signed-off-by: Alexander Popov 
---
 include/linux/kasan.h  | 107 -
 include/linux/slab_def.h   |   2 +-
 include/linux/slub_def.h   |   2 +-
 init/Kconfig   |  11 
 mm/Makefile|   3 +-
 mm/kasan/Makefile  |   2 +
 mm/kasan/kasan.h   |  75 +-
 mm/kasan/quarantine.c  |   2 +
 mm/kasan/slab_quarantine.c |  99 ++
 mm/slub.c  |   2 +-
 10 files changed, 216 insertions(+), 89 deletions(-)
 create mode 100644 mm/kasan/slab_quarantine.c

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 087fba34b209..b837216f760c 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -42,32 +42,14 @@ void kasan_unpoison_task_stack(struct task_struct *task);
 void kasan_alloc_pages(struct page *page, unsigned int order);
 void kasan_free_pages(struct page *page, unsigned int order);
 
-void kasan_cache_create(struct kmem_cache *cache, unsigned int *size,
-   slab_flags_t *flags);
-
 void kasan_poison_slab(struct page *page);
 void kasan_unpoison_object_data(struct kmem_cache *cache, void *object);
 void kasan_poison_object_data(struct kmem_cache *cache, void *object);
 void * __must_check kasan_init_slab_obj(struct kmem_cache *cache,
const void *object);
 
-void * __must_check kasan_kmalloc_large(const void *ptr, size_t size,
-   gfp_t flags);
 void kasan_kfree_large(void *ptr, unsigned long ip);
 void kasan_poison_kfree(void *ptr, unsigned long ip);
-void * __must_check kasan_kmalloc(struct kmem_cache *s, const void *object,
-   size_t size, gfp_t flags);
-void * __must_check kasan_krealloc(const void *object, size_t new_size,
-   gfp_t flags);
-
-void * __must_check kasan_slab_alloc(struct kmem_cache *s, void *object,
-   gfp_t flags);
-bool kasan_slab_free(struct kmem_cache *s, void *object, unsigned long ip);
-
-struct kasan_cache {
-   int alloc_meta_offset;
-   int free_meta_offset;
-};
 
 /*
  * These functions provide a special case to support backing module
@@ -107,10 +89,6 @@ static inline void kasan_disable_current(void) {}
 static inline void kasan_alloc_pages(struct page *page, unsigned int order) {}
 static inline void kasan_free_pages(struct page *page, unsigned int order) {}
 
-static inline void kasan_cache_create(struct kmem_cache *cache,
- unsigned int *size,
- slab_flags_t *flags) {}
-
 static inline void kasan_poison_slab(struct page *page) {}
 static inline void kasan_unpoison_object_data(struct kmem_cache *cache,
void *object) {}
@@ -122,17 +100,65 @@ static inline void *kasan_init_slab_obj(struct kmem_cache 
*cache,
return (void *)object;
 }
 
+static inline void kasan_kfree_large(void *ptr, unsigned long ip) {}
+static inline void kasan_poison_kfree(void *ptr, unsigned long ip) {}
+static inline void kasan_free_shadow(const struct vm_struct *vm) {}
+static inline void kasan_remove_zero_shadow(void *start, unsigned long size) {}
+static inline void kasan_unpoison_slab(const void *ptr) {}
+
+static inline int kasan_module_alloc(void *addr, size_t size)
+{
+   return 0;
+}
+
+static inline int kasan_add_zero_shadow(void *start, unsigned long size)
+{
+   return 0;
+}
+
+static inline size_t kasan_metadata_size(struct kmem_cache *cache)
+{
+   return 0;
+}
+
+#endif /* CONFIG_KASAN */
+
+struct kasan_cache {
+   int alloc_meta_offset;
+   int free_meta_offset;
+};
+
+#if defined(CONFIG_KASAN) || defined(CONFIG_SLAB_QUARANTINE)
+
+void kasan_cache_create(struct kmem_cache *cache, unsigned int *size,
+   slab_flags_t *flags);
+void * __must_check kasan_kmalloc_large(const void *ptr, size_t size,
+   gfp_t flags);
+void * __must_check kasan_kmalloc(struct kmem_cache *s, const void *object

[PATCH RFC 0/2] Break heap spraying needed for exploiting use-after-free

2020-08-13 Thread Alexander Popov
Hello everyone! Requesting for your comments.

Use-after-free vulnerabilities in the Linux kernel are very popular for
exploitation. A few examples:
 
https://googleprojectzero.blogspot.com/2018/09/a-cache-invalidation-bug-in-linux.html
 
https://googleprojectzero.blogspot.com/2019/11/bad-binder-android-in-wild-exploit.html?m=1
 https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html

Use-after-free exploits usually employ heap spraying technique.
Generally it aims to put controlled bytes at a predetermined memory
location on the heap. Heap spraying for exploiting use-after-free in
the Linux kernel relies on the fact that on kmalloc(), the slab allocator
returns the address of the memory that was recently freed. So allocating
a kernel object with the same size and controlled contents allows
overwriting the vulnerable freed object.

I've found an easy way to break heap spraying for use-after-free
exploitation. I simply extracted slab freelist quarantine from KASAN
functionality and called it CONFIG_SLAB_QUARANTINE. Please see patch 1.

If this feature is enabled, freed allocations are stored in the quarantine
and can't be instantly reallocated and overwritten by the exploit
performing heap spraying.

In patch 2 you can see the lkdtm test showing how CONFIG_SLAB_QUARANTINE
prevents immediate reallocation of a freed heap object.

I tested this patch series both for CONFIG_SLUB and CONFIG_SLAB.

CONFIG_SLAB_QUARANTINE disabled:
  # echo HEAP_SPRAY > /sys/kernel/debug/provoke-crash/DIRECT
  lkdtm: Performing direct entry HEAP_SPRAY
  lkdtm: Performing heap spraying...
  lkdtm: attempt 0: spray alloc addr f8699c7d vs freed addr f8699c7d
  lkdtm: freed addr is reallocated!
  lkdtm: FAIL! Heap spraying succeed :(

CONFIG_SLAB_QUARANTINE enabled:
  # echo HEAP_SPRAY > /sys/kernel/debug/provoke-crash/DIRECT
  lkdtm: Performing direct entry HEAP_SPRAY
  lkdtm: Performing heap spraying...
  lkdtm: attempt 0: spray alloc addr 9cafb63f vs freed addr 173cce94
  lkdtm: attempt 1: spray alloc addr 3096911f vs freed addr 173cce94
  lkdtm: attempt 2: spray alloc addr da60d755 vs freed addr 173cce94
  lkdtm: attempt 3: spray alloc addr 0b415070 vs freed addr 173cce94
  ...
  lkdtm: attempt 126: spray alloc addr e80ef807 vs freed addr 173cce94
  lkdtm: attempt 127: spray alloc addr 398fe535 vs freed addr 173cce94
  lkdtm: OK! Heap spraying hasn't succeed :)

I did a brief performance evaluation of this feature.

1. Memory consumption. KASAN quarantine uses 1/32 of the memory.
CONFIG_SLAB_QUARANTINE disabled:
  # free -m
               total        used        free      shared  buff/cache   available
  Mem:          1987          39        1862          10          86        1907
  Swap:            0           0           0
CONFIG_SLAB_QUARANTINE enabled:
  # free -m
               total        used        free      shared  buff/cache   available
  Mem:          1987         140        1760          10          87        1805
  Swap:            0           0           0

2. Performance penalty. I used `hackbench -s 256 -l 200 -g 15 -f 25 -P`.
CONFIG_SLAB_QUARANTINE disabled (x86_64, CONFIG_SLUB):
  Times: 3.088, 3.103, 3.068, 3.103, 3.107
  Mean: 3.0938
  Standard deviation: 0.0144
CONFIG_SLAB_QUARANTINE enabled (x86_64, CONFIG_SLUB):
  Times: 3.303, 3.329, 3.356, 3.314, 3.292
  Mean: 3.3188 (+7.3%)
  Standard deviation: 0.0223

I would appreciate your feedback!

Best regards,
Alexander

Alexander Popov (2):
  mm: Extract SLAB_QUARANTINE from KASAN
  lkdtm: Add heap spraying test

 drivers/misc/lkdtm/core.c  |   1 +
 drivers/misc/lkdtm/heap.c  |  40 ++
 drivers/misc/lkdtm/lkdtm.h |   1 +
 include/linux/kasan.h  | 107 -
 include/linux/slab_def.h   |   2 +-
 include/linux/slub_def.h   |   2 +-
 init/Kconfig   |  11 
 mm/Makefile|   3 +-
 mm/kasan/Makefile  |   2 +
 mm/kasan/kasan.h   |  75 +-
 mm/kasan/quarantine.c  |   2 +
 mm/kasan/slab_quarantine.c |  99 ++
 mm/slub.c  |   2 +-
 13 files changed, 258 insertions(+), 89 deletions(-)
 create mode 100644 mm/kasan/slab_quarantine.c

-- 
2.26.2



Re: [PATCH v2 5/5] gcc-plugins/stackleak: Add 'verbose' plugin parameter

2020-06-24 Thread Alexander Popov
On 24.06.2020 15:53, Luis Chamberlain wrote:
> On Wed, Jun 24, 2020 at 03:33:30PM +0300, Alexander Popov wrote:
>> Add 'verbose' plugin parameter for stackleak gcc plugin.
>> It can be used for printing additional info about the kernel code
>> instrumentation.
>>
>> For using it add the following to scripts/Makefile.gcc-plugins:
>>   gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_STACKLEAK) \
>> += -fplugin-arg-stackleak_plugin-verbose
> 
> Would be nice if we instead could pass an argument to make which lets
> us enable this.

This feature is useful only for debugging the stackleak gcc plugin.

The cflag that enables it is similar to -fplugin-arg-structleak_plugin-verbose,
which is used for debugging the structleak plugin.

This debugging feature clutters the kernel build output, so I don't think that
many people will use it. IMO creating a separate argument for make is not
really needed.

Thanks!

Best regards,
Alexander


Re: [PATCH v2 2/5] ARM: vdso: Don't use gcc plugins for building vgettimeofday.c

2020-06-24 Thread Alexander Popov
On 24.06.2020 15:52, Luis Chamberlain wrote:
> On Wed, Jun 24, 2020 at 03:33:27PM +0300, Alexander Popov wrote:
>> Don't use gcc plugins for building arch/arm/vdso/vgettimeofday.c to
>> avoid unneeded instrumentation.
>>
>> Signed-off-by: Alexander Popov 
> 
> But why is skipping it safe?

Hello Luis,

Kees and Will discussed that in detail in v1 of the series:
https://lore.kernel.org/lkml/20200610073046.GA15939@willie-the-truck/

Best regards,
Alexander


[PATCH v2 5/5] gcc-plugins/stackleak: Add 'verbose' plugin parameter

2020-06-24 Thread Alexander Popov
Add 'verbose' plugin parameter for stackleak gcc plugin.
It can be used for printing additional info about the kernel code
instrumentation.

For using it add the following to scripts/Makefile.gcc-plugins:
  gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_STACKLEAK) \
+= -fplugin-arg-stackleak_plugin-verbose

Signed-off-by: Alexander Popov 
---
 scripts/gcc-plugins/stackleak_plugin.c | 47 +++---
 1 file changed, 42 insertions(+), 5 deletions(-)

diff --git a/scripts/gcc-plugins/stackleak_plugin.c b/scripts/gcc-plugins/stackleak_plugin.c
index a18b0d4af456..48e141e07956 100644
--- a/scripts/gcc-plugins/stackleak_plugin.c
+++ b/scripts/gcc-plugins/stackleak_plugin.c
@@ -34,6 +34,8 @@ __visible int plugin_is_GPL_compatible;
 static int track_frame_size = -1;
 static bool build_for_x86 = false;
 static const char track_function[] = "stackleak_track_stack";
+static bool disable = false;
+static bool verbose = false;
 
 /*
  * Mark these global variables (roots) for gcc garbage collector since
@@ -46,6 +48,7 @@ static struct plugin_info stackleak_plugin_info = {
	.help = "track-min-size=nn\ttrack stack for functions with a stack frame size >= nn bytes\n"
"arch=target_arch\tspecify target build arch\n"
"disable\t\tdo not activate the plugin\n"
+   "verbose\t\tprint info about the instrumentation\n"
 };
 
 static void add_stack_tracking_gcall(gimple_stmt_iterator *gsi, bool after)
@@ -102,6 +105,10 @@ static tree get_current_stack_pointer_decl(void)
return var;
}
 
+   if (verbose) {
+		fprintf(stderr, "stackleak: missing current_stack_pointer in %s()\n",
+			DECL_NAME_POINTER(current_function_decl));
+   }
return NULL_TREE;
 }
 
@@ -195,6 +202,11 @@ static unsigned int stackleak_instrument_execute(void)
if (!is_alloca(stmt))
continue;
 
+   if (verbose) {
+			fprintf(stderr, "stackleak: be careful, alloca() in %s()\n",
+				DECL_NAME_POINTER(current_function_decl));
+   }
+
/* Insert stackleak_track_stack() call after alloca() */
		add_stack_tracking(&gsi, true);
if (bb == entry_bb)
@@ -384,13 +396,31 @@ static bool remove_stack_tracking_gasm(void)
  */
 static unsigned int stackleak_cleanup_execute(void)
 {
+   const char *fn = DECL_NAME_POINTER(current_function_decl);
bool removed = false;
 
-   if (cfun->calls_alloca)
+   /*
+* Leave stack tracking in functions that call alloca().
+* Additional case:
+*   gcc before version 7 called allocate_dynamic_stack_space() from
+*   expand_stack_vars() for runtime alignment of constant-sized stack
+*   variables. That caused cfun->calls_alloca to be set for functions
+*   that in fact don't use alloca().
+*   For more info see gcc commit 7072df0aae0c59ae437e.
+*   Let's leave such functions instrumented as well.
+*/
+   if (cfun->calls_alloca) {
+   if (verbose)
+			fprintf(stderr, "stackleak: instrument %s(): calls_alloca\n", fn);
return 0;
+   }
 
-   if (large_stack_frame())
+   /* Leave stack tracking in functions with large stack frame */
+   if (large_stack_frame()) {
+   if (verbose)
+   fprintf(stderr, "stackleak: instrument %s()\n", fn);
return 0;
+   }
 
if (lookup_attribute_spec(get_identifier("no_caller_saved_registers")))
removed = remove_stack_tracking_gasm();
@@ -516,9 +546,6 @@ __visible int plugin_init(struct plugin_name_args *plugin_info,
 
/* Parse the plugin arguments */
for (i = 0; i < argc; i++) {
-   if (!strcmp(argv[i].key, "disable"))
-   return 0;
-
if (!strcmp(argv[i].key, "track-min-size")) {
if (!argv[i].value) {
				error(G_("no value supplied for option '-fplugin-arg-%s-%s'"),
@@ -541,6 +568,10 @@ __visible int plugin_init(struct plugin_name_args *plugin_info,
 
if (!strcmp(argv[i].value, "x86"))
build_for_x86 = true;
+   } else if (!strcmp(argv[i].key, "disable")) {
+   disable = true;
+   } else if (!strcmp(argv[i].key, "verbose")) {
+   verbose = true;
} else {
error(G_("unknown option '-fplugin-arg-%s-%s'"),
plugin_n

[PATCH v2 4/5] gcc-plugins/stackleak: Use asm instrumentation to avoid useless register saving

2020-06-24 Thread Alexander Popov
The kernel code instrumentation in stackleak gcc plugin works in two stages.
At first, stack tracking is added to GIMPLE representation of every function
(except some special cases). And later, when stack frame size info is
available, stack tracking is removed from the RTL representation of the
functions with small stack frame. There is an unwanted side-effect for these
functions: some of them do useless work with caller-saved registers.

As an example of such a case, proc_sys_write() without instrumentation:
55  push   %rbp
41 b8 01 00 00 00   mov$0x1,%r8d
48 89 e5mov%rsp,%rbp
e8 11 ff ff ff  callq  81284610 
5d  pop%rbp
c3  retq
0f 1f 44 00 00  nopl   0x0(%rax,%rax,1)
66 2e 0f 1f 84 00 00nopw   %cs:0x0(%rax,%rax,1)
00 00 00

proc_sys_write() with instrumentation:
55  push   %rbp
48 89 e5mov%rsp,%rbp
41 56   push   %r14
41 55   push   %r13
41 54   push   %r12
53  push   %rbx
49 89 f4mov%rsi,%r12
48 89 fbmov%rdi,%rbx
49 89 d5mov%rdx,%r13
49 89 cemov%rcx,%r14
4c 89 f1mov%r14,%rcx
4c 89 eamov%r13,%rdx
4c 89 e6mov%r12,%rsi
48 89 dfmov%rbx,%rdi
41 b8 01 00 00 00   mov$0x1,%r8d
e8 f2 fe ff ff  callq  81298e80 
5b  pop%rbx
41 5c   pop%r12
41 5d   pop%r13
41 5e   pop%r14
5d  pop%rbp
c3  retq
66 0f 1f 84 00 00 00nopw   0x0(%rax,%rax,1)
00 00

Let's improve the instrumentation to avoid this:

1. Make stackleak_track_stack() save all registers that it works with.
Use the no_caller_saved_registers attribute for that function. This
attribute is available for x86_64 and i386 starting from gcc-7.

2. Insert the stackleak_track_stack() call using inline asm:
  asm volatile("call stackleak_track_stack" :: "r" (current_stack_pointer))
Here we use the ASM_CALL_CONSTRAINT trick from arch/x86/include/asm/asm.h.
The input constraint is taken into account during the gcc shrink-wrapping
optimization. It ensures that the stackleak_track_stack() call is inserted
after the prologue of the containing function, when the stack frame is
prepared.

This work is a deep reengineering of the idea described on the grsecurity blog:
  https://grsecurity.net/resolving_an_unfortunate_stackleak_interaction

Signed-off-by: Alexander Popov 
Acked-by: Miguel Ojeda 
---
 include/linux/compiler_attributes.h|  13 ++
 kernel/stackleak.c |  16 +-
 scripts/Makefile.gcc-plugins   |   2 +
 scripts/gcc-plugins/stackleak_plugin.c | 205 +
 4 files changed, 196 insertions(+), 40 deletions(-)

diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h
index cdf016596659..551ea8cb70b1 100644
--- a/include/linux/compiler_attributes.h
+++ b/include/linux/compiler_attributes.h
@@ -37,6 +37,7 @@
 # define __GCC4_has_attribute___copy__0
 # define __GCC4_has_attribute___designated_init__ 0
 # define __GCC4_has_attribute___externally_visible__  1
+# define __GCC4_has_attribute___no_caller_saved_registers__ 0
 # define __GCC4_has_attribute___noclone__ 1
 # define __GCC4_has_attribute___nonstring__   0
 # define __GCC4_has_attribute___no_sanitize_address__ (__GNUC_MINOR__ >= 8)
@@ -175,6 +176,18 @@
  */
 #define __mode(x)   __attribute__((__mode__(x)))
 
+/*
+ * Optional: only supported since gcc >= 7
+ *
+ *   gcc: https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#index-no_005fcaller_005fsaved_005fregisters-function-attribute_002c-x86
+ * clang: https://clang.llvm.org/docs/AttributeReference.html#no-caller-saved-registers
+ */
+#if __has_attribute(__no_caller_saved_registers__)
+# define __no_caller_saved_registers	__attribute__((__no_caller_saved_registers__))
+#else
+# define __no_caller_saved_registers
+#endif
+
 /*
  * Optional: not supported by clang
  *
diff --git a/kernel/stackleak.c b/kernel/stackleak.c
index b193a59fc05b..a8fc9ae1d03d 100644
--- a/kernel/stackleak.c
+++ b/kernel/stackleak.c
@@ -104,19 +104,9 @@ asmlinkage void notrace stackleak_erase(void)
 }
 NOKPROBE_SYMBOL(stackleak_erase);
 
-void __used notrace stackleak_track_stack(void)
+void __used __no_caller_saved_registers notrace stackleak_track_stack(void)
 {
-   /*
-* N.B. stackleak_erase() fills the kernel stack with the poison value,
-* which has the register width. That code assumes that the value
-* of 'lowest_stack' is aligned on t

[PATCH v2 3/5] arm64: vdso: Don't use gcc plugins for building vgettimeofday.c

2020-06-24 Thread Alexander Popov
Don't use gcc plugins for building arch/arm64/kernel/vdso/vgettimeofday.c
to avoid unneeded instrumentation.

Signed-off-by: Alexander Popov 
---
 arch/arm64/kernel/vdso/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
index 556d424c6f52..0f1ad63b3326 100644
--- a/arch/arm64/kernel/vdso/Makefile
+++ b/arch/arm64/kernel/vdso/Makefile
@@ -29,7 +29,7 @@ ldflags-y := -shared -nostdlib -soname=linux-vdso.so.1 --hash-style=sysv \
 ccflags-y := -fno-common -fno-builtin -fno-stack-protector -ffixed-x18
 ccflags-y += -DDISABLE_BRANCH_PROFILING
 
-CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS)
+CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS) $(GCC_PLUGINS_CFLAGS)
 KBUILD_CFLAGS  += $(DISABLE_LTO)
 KASAN_SANITIZE := n
 UBSAN_SANITIZE := n
-- 
2.25.4



[PATCH v2 2/5] ARM: vdso: Don't use gcc plugins for building vgettimeofday.c

2020-06-24 Thread Alexander Popov
Don't use gcc plugins for building arch/arm/vdso/vgettimeofday.c to
avoid unneeded instrumentation.

Signed-off-by: Alexander Popov 
---
 arch/arm/vdso/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile
index d3c9f03e7e79..a54f70731d9f 100644
--- a/arch/arm/vdso/Makefile
+++ b/arch/arm/vdso/Makefile
@@ -29,7 +29,7 @@ CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
 CFLAGS_REMOVE_vdso.o = -pg
 
 # Force -O2 to avoid libgcc dependencies
-CFLAGS_REMOVE_vgettimeofday.o = -pg -Os
+CFLAGS_REMOVE_vgettimeofday.o = -pg -Os $(GCC_PLUGINS_CFLAGS)
 ifeq ($(c-gettimeofday-y),)
 CFLAGS_vgettimeofday.o = -O2
 else
-- 
2.25.4



[PATCH v2 1/5] gcc-plugins/stackleak: Don't instrument itself

2020-06-24 Thread Alexander Popov
There is no need to try instrumenting functions in kernel/stackleak.c.
Otherwise that can cause issues if the cleanup pass of stackleak gcc plugin
is disabled.

Signed-off-by: Alexander Popov 
Acked-by: Kees Cook 
---
 kernel/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/Makefile b/kernel/Makefile
index f3218bc5ec69..155b5380500a 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -125,6 +125,7 @@ obj-$(CONFIG_WATCH_QUEUE) += watch_queue.o
 
 obj-$(CONFIG_SYSCTL_KUNIT_TEST) += sysctl-test.o
 
+CFLAGS_stackleak.o += $(DISABLE_STACKLEAK_PLUGIN)
 obj-$(CONFIG_GCC_PLUGIN_STACKLEAK) += stackleak.o
 KASAN_SANITIZE_stackleak.o := n
 KCSAN_SANITIZE_stackleak.o := n
-- 
2.25.4



[PATCH v2 0/5] Improvements of the stackleak gcc plugin

2020-06-24 Thread Alexander Popov
This is the v2 of the patch series with various improvements of the
stackleak gcc plugin.

The first three patches disable unneeded gcc plugin instrumentation for
some files.

The fourth patch is the main improvement. It eliminates an unwanted
side-effect of kernel code instrumentation performed by stackleak gcc
plugin. This patch is a deep reengineering of the idea described on
grsecurity blog:
  https://grsecurity.net/resolving_an_unfortunate_stackleak_interaction

The final patch adds 'verbose' stackleak parameter for printing additional
info about the kernel code instrumentation during kernel building.

I would like to thank Alexander Monakov  for his advice on gcc internals.

This patch series was tested for gcc version 4.8, 5, 6, 7, 8, 9, and 10
on x86_64, i386 and arm64.
That was done using the project 'kernel-build-containers':
  https://github.com/a13xp0p0v/kernel-build-containers

Changes from v1:
 - rebase onto 5.8.0-rc2;
 - don't exclude alloca() from the instrumentation logic, because it
   will be used in kernel stack offset randomization;
 - reorder patches in the series;
 - don't use gcc plugins for building vgettimeofday.c in arm and
   arm64 vDSO;
 - follow alphabetic order in include/linux/compiler_attributes.h.

Link to v1:
 https://lore.kernel.org/lkml/20200604134957.505389-1-alex.po...@linux.com/


Alexander Popov (5):
  gcc-plugins/stackleak: Don't instrument itself
  ARM: vdso: Don't use gcc plugins for building vgettimeofday.c
  arm64: vdso: Don't use gcc plugins for building vgettimeofday.c
  gcc-plugins/stackleak: Use asm instrumentation to avoid useless
register saving
  gcc-plugins/stackleak: Add 'verbose' plugin parameter

 arch/arm/vdso/Makefile |   2 +-
 arch/arm64/kernel/vdso/Makefile|   2 +-
 include/linux/compiler_attributes.h|  13 ++
 kernel/Makefile|   1 +
 kernel/stackleak.c |  16 +-
 scripts/Makefile.gcc-plugins   |   2 +
 scripts/gcc-plugins/stackleak_plugin.c | 248 +
 7 files changed, 239 insertions(+), 45 deletions(-)

-- 
2.25.4



Re: [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall

2020-06-23 Thread Alexander Popov
On 22.06.2020 22:31, Kees Cook wrote:
> As Linux kernel stack protections have been constantly improving
> (vmap-based stack allocation with guard pages, removal of thread_info,
> STACKLEAK), attackers have had to find new ways for their exploits
> to work. They have done so, continuing to rely on the kernel's stack
> determinism, in situations where VMAP_STACK and THREAD_INFO_IN_TASK_STRUCT
> were not relevant. For example, the following recent attacks would have
> been hampered if the stack offset was non-deterministic between syscalls:
> 
> https://repositorio-aberto.up.pt/bitstream/10216/125357/2/374717.pdf
> (page 70: targeting the pt_regs copy with linear stack overflow)
> 
> https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
> (leaked stack address from one syscall as a target during next syscall)
> 
> The main idea is that since the stack offset is randomized on each system
> call, it is harder for an attack to reliably land in any particular place
> on the thread stack, even with address exposures, as the stack base will
> change on the next syscall. Also, since randomization is performed after
> placing pt_regs, the ptrace-based approach[1] to discover the randomized
> offset during a long-running syscall should not be possible.

Hello Kees!

I would recommend to disable CONFIG_STACKLEAK_METRICS if kernel stack offset
randomization is enabled. It is a debugging feature that provides information
about kernel stack usage. That info can be useful for calculating the random 
offset.

I would also recommend to check: there might be other kernel features for
debugging or getting statistics that can be used to disclose the random stack
offset.

Best regards,
Alexander


Re: [PATCH 5/5] gcc-plugins/stackleak: Don't instrument vgettimeofday.c in arm64 VDSO

2020-06-23 Thread Alexander Popov
On 10.06.2020 10:30, Will Deacon wrote:
> On Tue, Jun 09, 2020 at 12:09:27PM -0700, Kees Cook wrote:
>> arm_ssp_per_task_plugin.c
>>  32-bit ARM only (but likely needs disabling for 32-bit ARM vDSO?)

I tested it: on 32-bit ARM the vDSO is built with the plugin flags. I will
filter them out in a separate patch in v2 of the series.

> On arm64, the 32-bit toolchain is picked up via CC_COMPAT -- does that still
> get the plugins?
I tested it with this command:
  make ARCH=arm64 CROSS_COMPILE_COMPAT=arm-linux-gnueabi- CROSS_COMPILE=aarch64-linux-gnu- V=1

I see that COMPAT_VDSO is built without plugin flags. So it's ok.

Best regards,
Alexander


Re: [PATCH 2/5] gcc-plugins/stackleak: Use asm instrumentation to avoid useless register saving

2020-06-11 Thread Alexander Popov
On 10.06.2020 23:03, Kees Cook wrote:
> On Wed, Jun 10, 2020 at 06:47:14PM +0300, Alexander Popov wrote:
>> On 09.06.2020 21:46, Kees Cook wrote:
>> The inline asm statement that is used for instrumentation is arch-specific.
>> Trying to add
>>   asm volatile("call stackleak_track_stack")
>> in gcc plugin on aarch64 makes gcc break spectacularly.
> 
> Ah! Thank you, that eluded my eyes. :)
> 
>> I pass the target arch name to the plugin and check it explicitly to avoid 
>> that.
>>
>> Moreover, I'm going to create a gcc enhancement request for supporting
>> no_caller_saved_registers attribute on aarch64.
> 
> For arm64 right now it looks like the plugin will just remain
> "inefficient" in these cleanup, as before, yes?

Yes, for arm64 the instrumentation didn't change in this patch series.
I checked the disasm and see the similar issue with useless register saving.

I'm going to add asm instrumentation for arm64 when (I hope) the
no_caller_saved_registers attribute becomes available for that platform.

Best regards,
Alexander


Re: [PATCH 3/5] gcc-plugins/stackleak: Add 'verbose' plugin parameter

2020-06-10 Thread Alexander Popov
On 09.06.2020 21:47, Kees Cook wrote:
> On Thu, Jun 04, 2020 at 04:49:55PM +0300, Alexander Popov wrote:
>> Add 'verbose' plugin parameter for stackleak gcc plugin.
>> It can be used for printing additional info about the kernel code
>> instrumentation.
>>
>> For using it add the following to scripts/Makefile.gcc-plugins:
>>   gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_STACKLEAK) \
>> += -fplugin-arg-stackleak_plugin-verbose
>>
>> Signed-off-by: Alexander Popov 
> 
> Acked-by: Kees Cook 

I see that I will need to change this patch, since alloca() support is staying.
I'm going to add debug printing about functions that call alloca().
I have to omit your 'acked-by' for the changed patch, right?

Best regards,
Alexander


Re: [PATCH 2/5] gcc-plugins/stackleak: Use asm instrumentation to avoid useless register saving

2020-06-10 Thread Alexander Popov
On 09.06.2020 21:46, Kees Cook wrote:
> On Thu, Jun 04, 2020 at 04:49:54PM +0300, Alexander Popov wrote:
>> Let's improve the instrumentation to avoid this:
>>
>> 1. Make stackleak_track_stack() save all register that it works with.
>> Use no_caller_saved_registers attribute for that function. This attribute
>> is available for x86_64 and i386 starting from gcc-7.
>>
>> 2. Insert calling stackleak_track_stack() in asm:
>>   asm volatile("call stackleak_track_stack" :: "r" (current_stack_pointer))
>> Here we use ASM_CALL_CONSTRAINT trick from arch/x86/include/asm/asm.h.
>> The input constraint is taken into account during gcc shrink-wrapping
>> optimization. It is needed to be sure that stackleak_track_stack() call is
>> inserted after the prologue of the containing function, when the stack
>> frame is prepared.
> 
> Very cool; nice work!
> 
>> +static void add_stack_tracking(gimple_stmt_iterator *gsi)
>> +{
>> +/*
>> + * The 'no_caller_saved_registers' attribute is used for
>> + * stackleak_track_stack(). If the compiler supports this attribute for
>> + * the target arch, we can add calling stackleak_track_stack() in asm.
>> + * That improves performance: we avoid useless operations with the
>> + * caller-saved registers in the functions from which we will remove
>> + * stackleak_track_stack() call during the stackleak_cleanup pass.
>> + */
>> +if (lookup_attribute_spec(get_identifier("no_caller_saved_registers")))
>> +add_stack_tracking_gasm(gsi);
>> +else
>> +add_stack_tracking_gcall(gsi);
>> +}
> 
> The build_for_x86 flag is only ever used as an assert() test against
> no_caller_saved_registers, but we're able to test for that separately.
> Why does the architecture need to be tested? (i.e. when this flag
> becomes supported o other architectures, why must it still be x86-only?)

The inline asm statement that is used for instrumentation is arch-specific.
Trying to add
  asm volatile("call stackleak_track_stack")
in gcc plugin on aarch64 makes gcc break spectacularly.
I pass the target arch name to the plugin and check it explicitly to avoid that.

Moreover, I'm going to create a gcc enhancement request for supporting
no_caller_saved_registers attribute on aarch64.

Best regards,
Alexander


Re: [PATCH 1/5] gcc-plugins/stackleak: Exclude alloca() from the instrumentation logic

2020-06-10 Thread Alexander Popov
On 09.06.2020 21:39, Kees Cook wrote:
> On Thu, Jun 04, 2020 at 06:23:38PM +0300, Alexander Popov wrote:
>> On 04.06.2020 17:01, Jann Horn wrote:
>>> On Thu, Jun 4, 2020 at 3:51 PM Alexander Popov  wrote:
>>>> Some time ago Variable Length Arrays (VLA) were removed from the kernel.
>>>> The kernel is built with '-Wvla'. Let's exclude alloca() from the
>>>> instrumentation logic and make it simpler. The build-time assertion
>>>> against alloca() is added instead.
>>> [...]
>>>> +   /* Variable Length Arrays are forbidden in the 
>>>> kernel */
>>>> +   gcc_assert(!is_alloca(stmt));
>>>
>>> There is a patch series from Elena and Kees on the kernel-hardening
>>> list that deliberately uses __builtin_alloca() in the syscall entry
>>> path to randomize the stack pointer per-syscall - see
>>> <https://lore.kernel.org/kernel-hardening/20200406231606.37619-4-keesc...@chromium.org/>.
>>
>> Thanks, Jann.
>>
>> At first glance, leaving alloca() handling in stackleak instrumentation logic
>> would allow to integrate stackleak and this version of random_kstack_offset.
> 
> Right, it seems there would be a need for this coverage to remain,
> otherwise the depth of stack erasure might be incorrect.
> 
> It doesn't seem like the other patches strictly depend on alloca()
> support being removed, though?

Ok, I will leave alloca() support, reorganize the patch series and send v2.

>> Kees, Elena, did you try random_kstack_offset with upstream stackleak?
> 
> I didn't try that combination yet, no. It seemed there would likely
> still be further discussion about the offset series first (though the
> thread has been silent -- I'll rebase and resend it after rc2).

Ok, please add me to CC list.

Best regards,
Alexander

>> It looks to me that without stackleak erasing random_kstack_offset can be
>> weaker. I mean, if next syscall has a bigger stack randomization gap, the 
>> data
>> on thread stack from the previous syscall is not overwritten and can be 
>> used. Am
>> I right?
> 
> That's correct. I think the combination is needed, but I don't think
> they need to be strictly tied together.
> 
>> Another aspect: CONFIG_STACKLEAK_METRICS can be used for guessing kernel 
>> stack
>> offset, which is bad. It should be disabled if random_kstack_offset is on.
> 
> Agreed.



Re: [PATCH 5/5] gcc-plugins/stackleak: Don't instrument vgettimeofday.c in arm64 VDSO

2020-06-10 Thread Alexander Popov
On 10.06.2020 10:30, Will Deacon wrote:
> On Tue, Jun 09, 2020 at 12:09:27PM -0700, Kees Cook wrote:
>> On Thu, Jun 04, 2020 at 02:58:06PM +0100, Will Deacon wrote:
>>> On Thu, Jun 04, 2020 at 04:49:57PM +0300, Alexander Popov wrote:
>>>> Don't try instrumenting functions in 
>>>> arch/arm64/kernel/vdso/vgettimeofday.c.
>>>> Otherwise that can cause issues if the cleanup pass of stackleak gcc plugin
>>>> is disabled.
>>>>
>>>> Signed-off-by: Alexander Popov 
>>>> ---
>>>>  arch/arm64/kernel/vdso/Makefile | 3 ++-
>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm64/kernel/vdso/Makefile 
>>>> b/arch/arm64/kernel/vdso/Makefile
>>>> index 3862cad2410c..9b84cafbd2da 100644
>>>> --- a/arch/arm64/kernel/vdso/Makefile
>>>> +++ b/arch/arm64/kernel/vdso/Makefile
>>>> @@ -32,7 +32,8 @@ UBSAN_SANITIZE   := n
>>>>  OBJECT_FILES_NON_STANDARD := y
>>>>  KCOV_INSTRUMENT   := n
>>>>  
>>>> -CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables
>>>> +CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables \
>>>> +  $(DISABLE_STACKLEAK_PLUGIN)
>>>
>>> I can pick this one up via arm64, thanks. Are there any other plugins we
>>> should be wary of? It looks like x86 filters out $(GCC_PLUGINS_CFLAGS)
>>> when building the vDSO.
>>
>> I didn't realize/remember that arm64 retained the kernel build flags for
>> vDSO builds. (I'm used to x86 throwing all its flags away for its vDSO.)
>>
>> How does 32-bit ARM do its vDSO?
>>
>> My quick run-through on plugins:
>>
>> arm_ssp_per_task_plugin.c
>>  32-bit ARM only (but likely needs disabling for 32-bit ARM vDSO?)
> 
> On arm64, the 32-bit toolchain is picked up via CC_COMPAT -- does that still
> get the plugins?
> 
>> cyc_complexity_plugin.c
>>  compile-time reporting only
>>
>> latent_entropy_plugin.c
>>  this shouldn't get triggered for the vDSO (no __latent_entropy
>>  nor __init attributes in vDSO), but perhaps explicitly disabling
>>  it would be a sensible thing to do, just for robustness?
>>
>> randomize_layout_plugin.c
>>  this shouldn't get triggered (again, lacking attributes), but
>>  should likely be disabled too.
>>
>> sancov_plugin.c
>>  This should be tracking the KCOV directly (see
>>  scripts/Makefile.kcov), which is already disabled here.
>>
>> structleak_plugin.c
>>  This should be fine in the vDSO, but there's not security
>>  boundary here, so it wouldn't be important to KEEP it enabled.
> 
> Thanks for going through these. In general though, it seems like an
> opt-in strategy would make more sense, as it doesn't make an awful lot
> of sense to me for the plugins to be used to build the vDSO.
> 
> So I would prefer that this patch filters out $(GCC_PLUGINS_CFLAGS).

Ok, I will do that in the v2 of the patch series.

Best regards,
Alexander


Re: [PATCH 0/5] Improvements of the stackleak gcc plugin

2020-06-10 Thread Alexander Popov
On 09.06.2020 22:15, Kees Cook wrote:
> On Thu, Jun 04, 2020 at 04:49:52PM +0300, Alexander Popov wrote:
>> In this patch series I collected various improvements of the stackleak
>> gcc plugin.
> 
> Thanks!
> 
>> Alexander Popov (5):
>>   gcc-plugins/stackleak: Exclude alloca() from the instrumentation logic
>>   gcc-plugins/stackleak: Use asm instrumentation to avoid useless
>> register saving
> 
> These look like they might need tweaks (noted in their separate
> replies).

Thanks for the review, Kees.

>>   gcc-plugins/stackleak: Add 'verbose' plugin parameter
>>   gcc-plugins/stackleak: Don't instrument itself
> 
> If you wanted to reorder the series and move these first, I could take
> these into my tree right away (they're logically separate from the other
> fixes).

Ok, I will put "don't instrument itself" at the beginning of v2.

The patch adding 'verbose' plugin parameter depends on the previous patches, so
I will not move it.

>>   gcc-plugins/stackleak: Don't instrument vgettimeofday.c in arm64 VDSO
> 
> This seems good -- though I'm curious about 32-bit ARM and the other
> HAVE_GCC_PLUGINS architectures with vDSOs (which appears to be all of
> them except um).

(going to reply in a separate email)

Best regards,
Alexander



Re: [PATCH 1/5] gcc-plugins/stackleak: Exclude alloca() from the instrumentation logic

2020-06-04 Thread Alexander Popov
On 04.06.2020 17:01, Jann Horn wrote:
> On Thu, Jun 4, 2020 at 3:51 PM Alexander Popov  wrote:
>> Some time ago Variable Length Arrays (VLA) were removed from the kernel.
>> The kernel is built with '-Wvla'. Let's exclude alloca() from the
>> instrumentation logic and make it simpler. The build-time assertion
>> against alloca() is added instead.
> [...]
>> +   /* Variable Length Arrays are forbidden in the 
>> kernel */
>> +   gcc_assert(!is_alloca(stmt));
> 
> There is a patch series from Elena and Kees on the kernel-hardening
> list that deliberately uses __builtin_alloca() in the syscall entry
> path to randomize the stack pointer per-syscall - see
> <https://lore.kernel.org/kernel-hardening/20200406231606.37619-4-keesc...@chromium.org/>.

Thanks, Jann.

At first glance, leaving alloca() handling in stackleak instrumentation logic
would allow to integrate stackleak and this version of random_kstack_offset.

Kees, Elena, did you try random_kstack_offset with upstream stackleak?

It looks to me that without stackleak erasing, random_kstack_offset can be
weaker. I mean, if the next syscall has a bigger stack randomization gap, the
data left on the thread stack by the previous syscall is not overwritten and
can be used. Am I right?

Another aspect: CONFIG_STACKLEAK_METRICS can be used for guessing kernel stack
offset, which is bad. It should be disabled if random_kstack_offset is on.

Best regards,
Alexander


Re: [PATCH 5/5] gcc-plugins/stackleak: Don't instrument vgettimeofday.c in arm64 VDSO

2020-06-04 Thread Alexander Popov
On 04.06.2020 17:25, Jann Horn wrote:
> On Thu, Jun 4, 2020 at 4:21 PM Alexander Popov  wrote:
>> On 04.06.2020 17:14, Jann Horn wrote:
>>> Maybe at some point we should replace exclusions based on
>>> GCC_PLUGINS_CFLAGS and KASAN_SANITIZE and UBSAN_SANITIZE and
>>> OBJECT_FILES_NON_STANDARD and so on with something more generic...
>>> something that says "this file will not be built into the normal
>>> kernel, it contains code that runs in realmode / userspace / some
>>> similarly weird context, and none of our instrumentation
>>> infrastructure is available there"...
>>
>> Good idea. I would also add 'notrace' to that list.
> 
> Hm? notrace code should definitely still be subject to sanitizer
> instrumentation.

I mean ftrace is sometimes disabled for functions that are executed in those
weird contexts. As well as kcov instrumentation.

It would be nice if that generic mechanism could help with choosing which kernel
code instrumentation technologies should be disabled in the given context.

Best regards,
Alexander


Re: [PATCH 5/5] gcc-plugins/stackleak: Don't instrument vgettimeofday.c in arm64 VDSO

2020-06-04 Thread Alexander Popov
On 04.06.2020 17:14, Jann Horn wrote:
> On Thu, Jun 4, 2020 at 3:58 PM Will Deacon  wrote:
>> On Thu, Jun 04, 2020 at 04:49:57PM +0300, Alexander Popov wrote:
>>> Don't try instrumenting functions in arch/arm64/kernel/vdso/vgettimeofday.c.
>>> Otherwise that can cause issues if the cleanup pass of stackleak gcc plugin
>>> is disabled.
>>>
>>> Signed-off-by: Alexander Popov 
>>> ---
>>>  arch/arm64/kernel/vdso/Makefile | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm64/kernel/vdso/Makefile 
>>> b/arch/arm64/kernel/vdso/Makefile
>>> index 3862cad2410c..9b84cafbd2da 100644
>>> --- a/arch/arm64/kernel/vdso/Makefile
>>> +++ b/arch/arm64/kernel/vdso/Makefile
>>> @@ -32,7 +32,8 @@ UBSAN_SANITIZE  := n
>>>  OBJECT_FILES_NON_STANDARD:= y
>>>  KCOV_INSTRUMENT  := n
>>>
>>> -CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables
>>> +CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables \
>>> + $(DISABLE_STACKLEAK_PLUGIN)
>>
>> I can pick this one up via arm64, thanks. Are there any other plugins we
>> should be wary of? 

I can't tell exactly. I'm sure Kees has the whole picture.

>> It looks like x86 filters out $(GCC_PLUGINS_CFLAGS)
>> when building the vDSO.

Yes, that's why building x86 vDSO doesn't need $(DISABLE_STACKLEAK_PLUGIN).

> Maybe at some point we should replace exclusions based on
> GCC_PLUGINS_CFLAGS and KASAN_SANITIZE and UBSAN_SANITIZE and
> OBJECT_FILES_NON_STANDARD and so on with something more generic...
> something that says "this file will not be built into the normal
> kernel, it contains code that runs in realmode / userspace / some
> similarly weird context, and none of our instrumentation
> infrastructure is available there"...

Good idea. I would also add 'notrace' to that list.

Best regards,
Alexander


[PATCH 1/5] gcc-plugins/stackleak: Exclude alloca() from the instrumentation logic

2020-06-04 Thread Alexander Popov
Some time ago Variable Length Arrays (VLA) were removed from the kernel.
The kernel is built with '-Wvla'. Let's exclude alloca() from the
instrumentation logic and make it simpler. The build-time assertion
against alloca() is added instead.

Unfortunately, for that assertion we can't simply check cfun->calls_alloca
during the RTL phase. It turned out that gcc before version 7 called
allocate_dynamic_stack_space() from expand_stack_vars() for runtime
alignment of constant-sized stack variables. That caused cfun->calls_alloca
to be set for functions that don't use alloca().

Signed-off-by: Alexander Popov 
---
 scripts/gcc-plugins/stackleak_plugin.c | 51 +++---
 1 file changed, 21 insertions(+), 30 deletions(-)

diff --git a/scripts/gcc-plugins/stackleak_plugin.c 
b/scripts/gcc-plugins/stackleak_plugin.c
index cc75eeba0be1..1ecfe50d0bf5 100644
--- a/scripts/gcc-plugins/stackleak_plugin.c
+++ b/scripts/gcc-plugins/stackleak_plugin.c
@@ -9,10 +9,9 @@
  * any of the gcc libraries
  *
  * This gcc plugin is needed for tracking the lowest border of the kernel 
stack.
- * It instruments the kernel code inserting stackleak_track_stack() calls:
- *  - after alloca();
- *  - for the functions with a stack frame size greater than or equal
- * to the "track-min-size" plugin parameter.
+ * It instruments the kernel code inserting stackleak_track_stack() calls
+ * for the functions with a stack frame size greater than or equal to
+ * the "track-min-size" plugin parameter.
  *
  * This plugin is ported from grsecurity/PaX. For more information see:
  *   https://grsecurity.net/
@@ -46,7 +45,7 @@ static struct plugin_info stackleak_plugin_info = {
"disable\t\tdo not activate the plugin\n"
 };
 
-static void stackleak_add_track_stack(gimple_stmt_iterator *gsi, bool after)
+static void stackleak_add_track_stack(gimple_stmt_iterator *gsi)
 {
gimple stmt;
gcall *stackleak_track_stack;
@@ -56,12 +55,7 @@ static void stackleak_add_track_stack(gimple_stmt_iterator 
*gsi, bool after)
/* Insert call to void stackleak_track_stack(void) */
stmt = gimple_build_call(track_function_decl, 0);
stackleak_track_stack = as_a_gcall(stmt);
-   if (after) {
-   gsi_insert_after(gsi, stackleak_track_stack,
-   GSI_CONTINUE_LINKING);
-   } else {
-   gsi_insert_before(gsi, stackleak_track_stack, GSI_SAME_STMT);
-   }
+   gsi_insert_before(gsi, stackleak_track_stack, GSI_SAME_STMT);
 
/* Update the cgraph */
bb = gimple_bb(stackleak_track_stack);
@@ -87,14 +81,13 @@ static bool is_alloca(gimple stmt)
 
 /*
  * Work with the GIMPLE representation of the code. Insert the
- * stackleak_track_stack() call after alloca() and into the beginning
- * of the function if it is not instrumented.
+ * stackleak_track_stack() call into the beginning of the function.
  */
 static unsigned int stackleak_instrument_execute(void)
 {
basic_block bb, entry_bb;
-   bool prologue_instrumented = false, is_leaf = true;
-   gimple_stmt_iterator gsi;
+   bool is_leaf = true;
+   gimple_stmt_iterator gsi = { 0 };
 
/*
 * ENTRY_BLOCK_PTR is a basic block which represents possible entry
@@ -111,27 +104,17 @@ static unsigned int stackleak_instrument_execute(void)
 */
FOR_EACH_BB_FN(bb, cfun) {
for (gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) {
-   gimple stmt;
-
-   stmt = gsi_stmt(gsi);
+   gimple stmt = gsi_stmt(gsi);
 
/* Leaf function is a function which makes no calls */
if (is_gimple_call(stmt))
is_leaf = false;
 
-   if (!is_alloca(stmt))
-   continue;
-
-   /* Insert stackleak_track_stack() call after alloca() */
-   stackleak_add_track_stack(&gsi, true);
-   if (bb == entry_bb)
-   prologue_instrumented = true;
+   /* Variable Length Arrays are forbidden in the kernel */
+   gcc_assert(!is_alloca(stmt));
}
}
 
-   if (prologue_instrumented)
-   return 0;
-
/*
 * Special cases to skip the instrumentation.
 *
@@ -168,7 +151,7 @@ static unsigned int stackleak_instrument_execute(void)
bb = single_succ(ENTRY_BLOCK_PTR_FOR_FN(cfun));
}
gsi = gsi_after_labels(bb);
-   stackleak_add_track_stack(&gsi, false);
+   stackleak_add_track_stack(&gsi);
 
return 0;
 }
@@ -185,12 +168,20 @@ static bool large_stack_frame(void)
 /*
  * Work with the RTL representation of the code.
  * Remove the unneeded stackleak_track_stack() calls from the functions
- * which don't call alloca() and d

[PATCH 5/5] gcc-plugins/stackleak: Don't instrument vgettimeofday.c in arm64 VDSO

2020-06-04 Thread Alexander Popov
Don't try instrumenting functions in arch/arm64/kernel/vdso/vgettimeofday.c.
Otherwise that can cause issues if the cleanup pass of the stackleak gcc
plugin is disabled.

Signed-off-by: Alexander Popov 
---
 arch/arm64/kernel/vdso/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
index 3862cad2410c..9b84cafbd2da 100644
--- a/arch/arm64/kernel/vdso/Makefile
+++ b/arch/arm64/kernel/vdso/Makefile
@@ -32,7 +32,8 @@ UBSAN_SANITIZE:= n
 OBJECT_FILES_NON_STANDARD  := y
 KCOV_INSTRUMENT:= n
 
-CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables
+CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables \
+   $(DISABLE_STACKLEAK_PLUGIN)
 
 ifneq ($(c-gettimeofday-y),)
   CFLAGS_vgettimeofday.o += -include $(c-gettimeofday-y)
-- 
2.25.2



[PATCH 3/5] gcc-plugins/stackleak: Add 'verbose' plugin parameter

2020-06-04 Thread Alexander Popov
Add a 'verbose' plugin parameter for the stackleak gcc plugin.
It can be used to print additional info about the kernel code
instrumentation.

To use it, add the following to scripts/Makefile.gcc-plugins:
  gcc-plugin-cflags-$(CONFIG_GCC_PLUGIN_STACKLEAK) \
+= -fplugin-arg-stackleak_plugin-verbose

Signed-off-by: Alexander Popov 
---
 scripts/gcc-plugins/stackleak_plugin.c | 31 +-
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/scripts/gcc-plugins/stackleak_plugin.c 
b/scripts/gcc-plugins/stackleak_plugin.c
index 0769c5b9156d..19358712d4ed 100644
--- a/scripts/gcc-plugins/stackleak_plugin.c
+++ b/scripts/gcc-plugins/stackleak_plugin.c
@@ -33,6 +33,8 @@ __visible int plugin_is_GPL_compatible;
 static int track_frame_size = -1;
 static bool build_for_x86 = false;
 static const char track_function[] = "stackleak_track_stack";
+static bool disable = false;
+static bool verbose = false;
 
 /*
  * Mark these global variables (roots) for gcc garbage collector since
@@ -45,6 +47,7 @@ static struct plugin_info stackleak_plugin_info = {
.help = "track-min-size=nn\ttrack stack for functions with a stack 
frame size >= nn bytes\n"
"arch=target_arch\tspecify target build arch\n"
"disable\t\tdo not activate the plugin\n"
+   "verbose\t\tprint info about the instrumentation\n"
 };
 
 static void add_stack_tracking_gcall(gimple_stmt_iterator *gsi)
@@ -98,6 +101,10 @@ static tree get_current_stack_pointer_decl(void)
return var;
}
 
+   if (verbose) {
+   fprintf(stderr, "stackleak: missing current_stack_pointer in 
%s()\n",
+   DECL_NAME_POINTER(current_function_decl));
+   }
return NULL_TREE;
 }
 
@@ -366,6 +373,7 @@ static bool remove_stack_tracking_gasm(void)
  */
 static unsigned int stackleak_cleanup_execute(void)
 {
+   const char *fn = DECL_NAME_POINTER(current_function_decl);
bool removed = false;
 
/*
@@ -376,11 +384,17 @@ static unsigned int stackleak_cleanup_execute(void)
 * For more info see gcc commit 7072df0aae0c59ae437e.
 * Let's leave such functions instrumented.
 */
-   if (cfun->calls_alloca)
+   if (cfun->calls_alloca) {
+   if (verbose)
+   fprintf(stderr, "stackleak: instrument %s() old\n", fn);
return 0;
+   }
 
-   if (large_stack_frame())
+   if (large_stack_frame()) {
+   if (verbose)
+   fprintf(stderr, "stackleak: instrument %s()\n", fn);
return 0;
+   }
 
if (lookup_attribute_spec(get_identifier("no_caller_saved_registers")))
removed = remove_stack_tracking_gasm();
@@ -506,9 +520,6 @@ __visible int plugin_init(struct plugin_name_args 
*plugin_info,
 
/* Parse the plugin arguments */
for (i = 0; i < argc; i++) {
-   if (!strcmp(argv[i].key, "disable"))
-   return 0;
-
if (!strcmp(argv[i].key, "track-min-size")) {
if (!argv[i].value) {
error(G_("no value supplied for option 
'-fplugin-arg-%s-%s'"),
@@ -531,6 +542,10 @@ __visible int plugin_init(struct plugin_name_args 
*plugin_info,
 
if (!strcmp(argv[i].value, "x86"))
build_for_x86 = true;
+   } else if (!strcmp(argv[i].key, "disable")) {
+   disable = true;
+   } else if (!strcmp(argv[i].key, "verbose")) {
+   verbose = true;
} else {
error(G_("unknown option '-fplugin-arg-%s-%s'"),
plugin_name, argv[i].key);
@@ -538,6 +553,12 @@ __visible int plugin_init(struct plugin_name_args 
*plugin_info,
}
}
 
+   if (disable) {
+   if (verbose)
+   fprintf(stderr, "stackleak: disabled for this 
translation unit\n");
+   return 0;
+   }
+
/* Give the information about the plugin */
register_callback(plugin_name, PLUGIN_INFO, NULL,
  &stackleak_plugin_info);
-- 
2.25.2



[PATCH 2/5] gcc-plugins/stackleak: Use asm instrumentation to avoid useless register saving

2020-06-04 Thread Alexander Popov
The kernel code instrumentation in the stackleak gcc plugin works in two
stages. At first, stack tracking is added to the GIMPLE representation of
every function (except some special cases). Later, when the stack frame
size info is available, stack tracking is removed from the RTL
representation of the functions with a small stack frame. There is an
unwanted side-effect for these functions: some of them do useless work
with caller-saved registers.

As an example of such a case, here is proc_sys_write() without instrumentation:
55  push   %rbp
41 b8 01 00 00 00   mov$0x1,%r8d
48 89 e5mov%rsp,%rbp
e8 11 ff ff ff  callq  81284610 
5d  pop%rbp
c3  retq
0f 1f 44 00 00  nopl   0x0(%rax,%rax,1)
66 2e 0f 1f 84 00 00nopw   %cs:0x0(%rax,%rax,1)
00 00 00

proc_sys_write() with instrumentation:
55  push   %rbp
48 89 e5mov%rsp,%rbp
41 56   push   %r14
41 55   push   %r13
41 54   push   %r12
53  push   %rbx
49 89 f4mov%rsi,%r12
48 89 fbmov%rdi,%rbx
49 89 d5mov%rdx,%r13
49 89 cemov%rcx,%r14
4c 89 f1mov%r14,%rcx
4c 89 eamov%r13,%rdx
4c 89 e6mov%r12,%rsi
48 89 dfmov%rbx,%rdi
41 b8 01 00 00 00   mov$0x1,%r8d
e8 f2 fe ff ff  callq  81298e80 
5b  pop%rbx
41 5c   pop%r12
41 5d   pop%r13
41 5e   pop%r14
5d  pop%rbp
c3  retq
66 0f 1f 84 00 00 00nopw   0x0(%rax,%rax,1)
00 00

Let's improve the instrumentation to avoid this:

1. Make stackleak_track_stack() save all registers that it works with.
Use the no_caller_saved_registers attribute for that function. This attribute
is available for x86_64 and i386 starting from gcc-7.

2. Insert the stackleak_track_stack() call in asm:
  asm volatile("call stackleak_track_stack" :: "r" (current_stack_pointer))
Here we use the ASM_CALL_CONSTRAINT trick from arch/x86/include/asm/asm.h.
The input constraint is taken into account during the gcc shrink-wrapping
optimization. It ensures that the stackleak_track_stack() call is
inserted after the prologue of the containing function, when the stack
frame is prepared.

This work is a deep reengineering of the idea described on the grsecurity blog:
  https://grsecurity.net/resolving_an_unfortunate_stackleak_interaction

Signed-off-by: Alexander Popov 
---
 include/linux/compiler_attributes.h|  13 ++
 kernel/stackleak.c |  16 +-
 scripts/Makefile.gcc-plugins   |   2 +
 scripts/gcc-plugins/stackleak_plugin.c | 206 +
 4 files changed, 196 insertions(+), 41 deletions(-)

diff --git a/include/linux/compiler_attributes.h 
b/include/linux/compiler_attributes.h
index cdf016596659..522d57ae8532 100644
--- a/include/linux/compiler_attributes.h
+++ b/include/linux/compiler_attributes.h
@@ -41,6 +41,7 @@
 # define __GCC4_has_attribute___nonstring__   0
 # define __GCC4_has_attribute___no_sanitize_address__ (__GNUC_MINOR__ >= 8)
 # define __GCC4_has_attribute___fallthrough__ 0
+# define __GCC4_has_attribute___no_caller_saved_registers__ 0
 #endif
 
 /*
@@ -175,6 +176,18 @@
  */
 #define __mode(x)   __attribute__((__mode__(x)))
 
+/*
+ * Optional: only supported since gcc >= 7
+ *
+ *   gcc: 
https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#index-no_005fcaller_005fsaved_005fregisters-function-attribute_002c-x86
+ * clang: 
https://clang.llvm.org/docs/AttributeReference.html#no-caller-saved-registers
+ */
+#if __has_attribute(__no_caller_saved_registers__)
+# define __no_caller_saved_registers   
__attribute__((__no_caller_saved_registers__))
+#else
+# define __no_caller_saved_registers
+#endif
+
 /*
  * Optional: not supported by clang
  *
diff --git a/kernel/stackleak.c b/kernel/stackleak.c
index b193a59fc05b..a8fc9ae1d03d 100644
--- a/kernel/stackleak.c
+++ b/kernel/stackleak.c
@@ -104,19 +104,9 @@ asmlinkage void notrace stackleak_erase(void)
 }
 NOKPROBE_SYMBOL(stackleak_erase);
 
-void __used notrace stackleak_track_stack(void)
+void __used __no_caller_saved_registers notrace stackleak_track_stack(void)
 {
-   /*
-* N.B. stackleak_erase() fills the kernel stack with the poison value,
-* which has the register width. That code assumes that the value
-* of 'lowest_stack' is aligned on the register width boundary.
-*
-* That is true for x86 and x86_64 because of the kernel stack
-* alignment on these platforms (for details, see 'cc_stack_align' in
- 

[PATCH 4/5] gcc-plugins/stackleak: Don't instrument itself

2020-06-04 Thread Alexander Popov
There is no need to try instrumenting functions in kernel/stackleak.c.
Otherwise that can cause issues if the cleanup pass of the stackleak gcc
plugin is disabled.

Signed-off-by: Alexander Popov 
---
 kernel/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/Makefile b/kernel/Makefile
index 4cb4130ced32..d372134ac9ec 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -118,6 +118,7 @@ obj-$(CONFIG_RSEQ) += rseq.o
 
 obj-$(CONFIG_SYSCTL_KUNIT_TEST) += sysctl-test.o
 
+CFLAGS_stackleak.o += $(DISABLE_STACKLEAK_PLUGIN)
 obj-$(CONFIG_GCC_PLUGIN_STACKLEAK) += stackleak.o
 KASAN_SANITIZE_stackleak.o := n
 KCOV_INSTRUMENT_stackleak.o := n
-- 
2.25.2



[PATCH 0/5] Improvements of the stackleak gcc plugin

2020-06-04 Thread Alexander Popov
In this patch series I collected various improvements of the stackleak
gcc plugin.

The first patch excludes alloca() from the stackleak instrumentation logic
to make it simpler.

The second patch is the main improvement. It eliminates an unwanted
side-effect of kernel code instrumentation. This patch is a deep
reengineering of the idea described on the grsecurity blog:
  https://grsecurity.net/resolving_an_unfortunate_stackleak_interaction

The third patch adds a 'verbose' plugin parameter for printing additional
info about the kernel code instrumentation.

Two other patches disable unneeded stackleak instrumentation for some
files.

I would like to thank Alexander Monakov for his
advice on gcc internals.

This patch series was tested for gcc versions 4.8, 5, 6, 7, 8, 9, and 10
on x86_64, i386 and arm64.
That was done using the project 'kernel-build-containers':
  https://github.com/a13xp0p0v/kernel-build-containers


Alexander Popov (5):
  gcc-plugins/stackleak: Exclude alloca() from the instrumentation logic
  gcc-plugins/stackleak: Use asm instrumentation to avoid useless
register saving
  gcc-plugins/stackleak: Add 'verbose' plugin parameter
  gcc-plugins/stackleak: Don't instrument itself
  gcc-plugins/stackleak: Don't instrument vgettimeofday.c in arm64 VDSO

 arch/arm64/kernel/vdso/Makefile|   3 +-
 include/linux/compiler_attributes.h|  13 ++
 kernel/Makefile|   1 +
 kernel/stackleak.c |  16 +-
 scripts/Makefile.gcc-plugins   |   2 +
 scripts/gcc-plugins/stackleak_plugin.c | 260 -
 6 files changed, 232 insertions(+), 63 deletions(-)

-- 
2.25.2



Re: [PATCH] Documentation patch (gcc-plugins kernel)

2019-08-14 Thread Alexander Popov
Hello Kees and Robin,

On 01.08.2019 23:14, Kees Cook wrote:
> On Thu, Aug 01, 2019 at 09:30:58AM +0200, Robin Lindner wrote:
>> Cleaned documentation comment up. I removed the "TODO" because it was very 
>> old.
>> ---
>>  scripts/gcc-plugins/stackleak_plugin.c | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/scripts/gcc-plugins/stackleak_plugin.c 
>> b/scripts/gcc-plugins/stackleak_plugin.c
>> index dbd37460c573e..d8ba12c3bb238 100644
>> --- a/scripts/gcc-plugins/stackleak_plugin.c
>> +++ b/scripts/gcc-plugins/stackleak_plugin.c
>> @@ -144,8 +144,6 @@ static unsigned int stackleak_instrument_execute(void)
>>   *
>>   * Case in point: native_save_fl on amd64 when optimized for size
>>   * clobbers rdx if it were instrumented here.
>> - *
>> - * TODO: any more special cases?
>>   */
>>  if (is_leaf &&
>>  !TREE_PUBLIC(current_function_decl) &&
> 
> As to the content of the patch, let's also CC Alexander...
> 
> Are there no more special cases?

I don't know of other special cases when we should avoid the instrumentation.

And I can't imagine a method of finding such cases other than runtime testing.

Robin, I would rather keep this comment and only remove the "TODO" part, if it matters.

Best regards,
Alexander


[v3 1/1] kconfig: Terminate menu blocks with a comment in the generated config

2019-05-17 Thread Alexander Popov
Currently menu blocks start with a pretty header but end with nothing in
the generated config. So the next config options stick together with the
options from the menu block.

Let's terminate menu blocks in the generated config with a comment and
a newline if needed. Example:

...
CONFIG_BPF_STREAM_PARSER=y
CONFIG_NET_FLOW_LIMIT=y

#
# Network testing
#
CONFIG_NET_PKTGEN=y
CONFIG_NET_DROP_MONITOR=y
# end of Network testing
# end of Networking options

CONFIG_HAMRADIO=y
...

Signed-off-by: Alexander Popov 
---

v3 changes:
 - rebase onto the recent rc;
 - don't print the end comment for the rootmenu to avoid breaking
   'make testconfig' (thanks to Masahiro Yamada).

---
 scripts/kconfig/confdata.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/scripts/kconfig/confdata.c b/scripts/kconfig/confdata.c
index 492ac34..6006154 100644
--- a/scripts/kconfig/confdata.c
+++ b/scripts/kconfig/confdata.c
@@ -867,6 +867,7 @@ int conf_write(const char *name)
const char *str;
char tmpname[PATH_MAX + 1], oldname[PATH_MAX + 1];
char *env;
+   bool need_newline = false;
 
if (!name)
name = conf_get_configname();
@@ -912,12 +913,16 @@ int conf_write(const char *name)
 "#\n"
 "# %s\n"
 "#\n", str);
+   need_newline = false;
} else if (!(sym->flags & SYMBOL_CHOICE)) {
sym_calc_value(sym);
if (!(sym->flags & SYMBOL_WRITE))
goto next;
+   if (need_newline) {
+   fprintf(out, "\n");
+   need_newline = false;
+   }
sym->flags &= ~SYMBOL_WRITE;
-
conf_write_symbol(out, sym, &kconfig_printer_cb, NULL);
}
 
@@ -929,6 +934,12 @@ int conf_write(const char *name)
if (menu->next)
menu = menu->next;
else while ((menu = menu->parent)) {
+   if (!menu->sym && menu_is_visible(menu) &&
+   menu != &rootmenu) {
+   str = menu_get_prompt(menu);
+   fprintf(out, "# end of %s\n", str);
+   need_newline = true;
+   }
if (menu->next) {
menu = menu->next;
break;
-- 
2.7.4



Re: [v2 1/1] kconfig: Terminate menu blocks with a comment in the generated config

2019-05-07 Thread Alexander Popov



7 May 2019 15:29:26 GMT+03:00, Masahiro Yamada  
wrote:
>This patch breaks "make testconfig".
>
>Please get rid of the "endof" for the rootmenu
>at the end of the .config file.

Thanks a lot for having a look! 
I'll fix this issue, test it and return with the next version. 

Best regards, 
Alexander 


Re: [v2 1/1] kconfig: Terminate menu blocks with a comment in the generated config

2019-05-01 Thread Alexander Popov
On 24.04.2019 15:02, Alexander Popov wrote:
> Currently menu blocks start with a pretty header but end with nothing in
> the generated config. So next config options stick together with the
> options from the menu block.
> 
> Let's terminate menu blocks in the generated config with a comment and
> a newline if needed. Example:
> 
> ...
> CONFIG_BPF_STREAM_PARSER=y
> CONFIG_NET_FLOW_LIMIT=y
> 
> #
> # Network testing
> #
> CONFIG_NET_PKTGEN=y
> CONFIG_NET_DROP_MONITOR=y
> # end of Network testing
> # end of Networking options
> 
> CONFIG_HAMRADIO=y
> ...
> 
> Signed-off-by: Alexander Popov 

Hello!

Just a friendly ping.

Would you like to take this patch?
Can I improve anything?

Best regards,
Alexander


Re: [PATCH v3 2/3] security: Move stackleak config to Kconfig.hardening

2019-04-24 Thread Alexander Popov
On 23.04.2019 22:49, Kees Cook wrote:
> This moves the stackleak plugin options to Kconfig.hardening's memory
> initialization menu.
> 
> Signed-off-by: Kees Cook 

Hello Kees,

I see the changes in STACKLEAK help, looks good to me.
For this patch -
  Reviewed-by: Alexander Popov 


By the way, for your information, GCC_PLUGIN_STRUCTLEAK help is now unreachable
from 'make menuconfig'.

Best regards,
Alexander


> ---
>  scripts/gcc-plugins/Kconfig | 51 -
>  security/Kconfig.hardening  | 57 +
>  2 files changed, 57 insertions(+), 51 deletions(-)
> 
> diff --git a/scripts/gcc-plugins/Kconfig b/scripts/gcc-plugins/Kconfig
> index 352f03878a1e..80220ed26a35 100644
> --- a/scripts/gcc-plugins/Kconfig
> +++ b/scripts/gcc-plugins/Kconfig
> @@ -108,57 +108,6 @@ config GCC_PLUGIN_RANDSTRUCT_PERFORMANCE
> in structures.  This reduces the performance hit of RANDSTRUCT
> at the cost of weakened randomization.
>  
> -config GCC_PLUGIN_STACKLEAK
> - bool "Erase the kernel stack before returning from syscalls"
> - depends on GCC_PLUGINS
> - depends on HAVE_ARCH_STACKLEAK
> - help
> -   This option makes the kernel erase the kernel stack before
> -   returning from system calls. That reduces the information which
> -   kernel stack leak bugs can reveal and blocks some uninitialized
> -   stack variable attacks.
> -
> -   The tradeoff is the performance impact: on a single CPU system kernel
> -   compilation sees a 1% slowdown, other systems and workloads may vary
> -   and you are advised to test this feature on your expected workload
> -   before deploying it.
> -
> -   This plugin was ported from grsecurity/PaX. More information at:
> -* https://grsecurity.net/
> -* https://pax.grsecurity.net/
> -
> -config STACKLEAK_TRACK_MIN_SIZE
> - int "Minimum stack frame size of functions tracked by STACKLEAK"
> - default 100
> - range 0 4096
> - depends on GCC_PLUGIN_STACKLEAK
> - help
> -   The STACKLEAK gcc plugin instruments the kernel code for tracking
> -   the lowest border of the kernel stack (and for some other purposes).
> -   It inserts the stackleak_track_stack() call for the functions with
> -   a stack frame size greater than or equal to this parameter.
> -   If unsure, leave the default value 100.
> -
> -config STACKLEAK_METRICS
> - bool "Show STACKLEAK metrics in the /proc file system"
> - depends on GCC_PLUGIN_STACKLEAK
> - depends on PROC_FS
> - help
> -   If this is set, STACKLEAK metrics for every task are available in
> -   the /proc file system. In particular, /proc//stack_depth
> -   shows the maximum kernel stack consumption for the current and
> -   previous syscalls. Although this information is not precise, it
> -   can be useful for estimating the STACKLEAK performance impact for
> -   your workloads.
> -
> -config STACKLEAK_RUNTIME_DISABLE
> - bool "Allow runtime disabling of kernel stack erasing"
> - depends on GCC_PLUGIN_STACKLEAK
> - help
> -   This option provides 'stack_erasing' sysctl, which can be used in
> -   runtime to control kernel stack erasing for kernels built with
> -   CONFIG_GCC_PLUGIN_STACKLEAK.
> -
>  config GCC_PLUGIN_ARM_SSP_PER_TASK
>   bool
>   depends on GCC_PLUGINS && ARM
> diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
> index 19881341f1c2..a96d4a43ca65 100644
> --- a/security/Kconfig.hardening
> +++ b/security/Kconfig.hardening
> @@ -88,6 +88,63 @@ config GCC_PLUGIN_STRUCTLEAK_VERBOSE
> initialized. Since not all existing initializers are detected
> by the plugin, this can produce false positive warnings.
>  
> +config GCC_PLUGIN_STACKLEAK
> + bool "Poison kernel stack before returning from syscalls"
> + depends on GCC_PLUGINS
> + depends on HAVE_ARCH_STACKLEAK
> + help
> +   This option makes the kernel erase the kernel stack before
> +   returning from system calls. This has the effect of leaving
> +   the stack initialized to the poison value, which both reduces
> +   the lifetime of any sensitive stack contents and reduces
> +   potential for uninitialized stack variable exploits or information
> +   exposures (it does not cover functions reaching the same stack
> +   depth as prior functions during the same syscall). This blocks
> +   most uninitialized stack variable attacks, with the performance
> +   impact being driven by the depth of the stack usage, rather than
> +   the funct

Re: [PATCH 1/1] kconfig: Terminate menu blocks with a newline in the generated config

2019-04-24 Thread Alexander Popov
On 24.04.2019 13:09, Masahiro Yamada wrote:
> On Sat, Apr 20, 2019 at 4:12 AM Alexander Popov  wrote:
>>
>> Currently menu blocks start with a pretty header but end with nothing in
>> the generated config. So next config options stick together with the
>> options from the menu block.
>>
>> Let's terminate menu blocks with a newline in the generated config.
>>
>> Signed-off-by: Alexander Popov 
> 
> The readability of the .config could be improved somehow,
> but this patch is not the right one.

Masahiro, thanks for having a look.
I've just sent v2 which does this task much better.

Best regards,
Alexander


[v2 1/1] kconfig: Terminate menu blocks with a comment in the generated config

2019-04-24 Thread Alexander Popov
Currently menu blocks start with a pretty header but end with nothing in
the generated config. So the next config options stick together with the
options from the menu block.

Let's terminate menu blocks in the generated config with a comment and
a newline if needed. Example:

...
CONFIG_BPF_STREAM_PARSER=y
CONFIG_NET_FLOW_LIMIT=y

#
# Network testing
#
CONFIG_NET_PKTGEN=y
CONFIG_NET_DROP_MONITOR=y
# end of Network testing
# end of Networking options

CONFIG_HAMRADIO=y
...

Signed-off-by: Alexander Popov 
---
 scripts/kconfig/confdata.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/scripts/kconfig/confdata.c b/scripts/kconfig/confdata.c
index 08ba146..486b4c7 100644
--- a/scripts/kconfig/confdata.c
+++ b/scripts/kconfig/confdata.c
@@ -819,6 +819,7 @@ int conf_write(const char *name)
struct menu *menu;
const char *basename;
const char *str;
+   bool need_newline = false;
char dirname[PATH_MAX+1], tmpname[PATH_MAX+22], newname[PATH_MAX+8];
char *env;
 
@@ -871,12 +872,16 @@ int conf_write(const char *name)
 "#\n"
 "# %s\n"
 "#\n", str);
+   need_newline = false;
} else if (!(sym->flags & SYMBOL_CHOICE)) {
sym_calc_value(sym);
if (!(sym->flags & SYMBOL_WRITE))
goto next;
+   if (need_newline) {
+   fprintf(out, "\n");
+   need_newline = false;
+   }
sym->flags &= ~SYMBOL_WRITE;
-
conf_write_symbol(out, sym, &kconfig_printer_cb, NULL);
}
 
@@ -888,6 +893,11 @@ int conf_write(const char *name)
if (menu->next)
menu = menu->next;
else while ((menu = menu->parent)) {
+   if (!menu->sym && menu_is_visible(menu)) {
+   str = menu_get_prompt(menu);
+   fprintf(out, "# end of %s\n", str);
+   need_newline = true;
+   }
if (menu->next) {
menu = menu->next;
break;
-- 
2.7.4



Re: [PATCH v2 1/3] security: Create "kernel hardening" config area

2019-04-19 Thread Alexander Popov
On 16.04.2019 16:56, Kees Cook wrote:
> On Tue, Apr 16, 2019 at 8:55 AM Alexander Popov  wrote:
>>
>> On 16.04.2019 7:02, Kees Cook wrote:
>>> On Mon, Apr 15, 2019 at 11:44 AM Alexander Popov  
>>> wrote:
>>>>
>>>> What do you think about some separator between memory initialization 
>>>> options and
>>>> CONFIG_CRYPTO?
>>>
>>> This was true before too
>>
>> Hm, yes, it's a generic behavior - there is no any separator at 'endmenu' and
>> config options stick together.
>>
>> I've created a patch to fix that. What do you think about it?
>> I can send it to LKML separately.
>>
>>
>> From 50bf59d30fafcdebb3393fb742e1bd51e7d2f2da Mon Sep 17 00:00:00 2001
>> From: Alexander Popov 
>> Date: Tue, 16 Apr 2019 16:09:40 +0300
>> Subject: [PATCH 1/1] kconfig: Terminate menu blocks with a newline in the
>>  generated config
>>
>> Currently menu blocks start with a pretty header but end with nothing in
>> the generated config. So next config options stick together with the
>> options from the menu block.
>>
>> Let's terminate menu blocks with a newline in the generated config.
>>
>> Signed-off-by: Alexander Popov 
>> ---
>>  scripts/kconfig/confdata.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/scripts/kconfig/confdata.c b/scripts/kconfig/confdata.c
>> index 08ba146..1459153 100644
>> --- a/scripts/kconfig/confdata.c
>> +++ b/scripts/kconfig/confdata.c
>> @@ -888,6 +888,8 @@ int conf_write(const char *name)
>> if (menu->next)
>> menu = menu->next;
>> else while ((menu = menu->parent)) {
>> +   if (!menu->sym && menu_is_visible(menu))
>> +   fprintf(out, "\n");
>> if (menu->next) {
>> menu = menu->next;
>> break;
> 
> Seems fine to me. I defer to Masahiro, though. :)

Hi! I've sent this patch separately to LKML:

https://lore.kernel.org/lkml/1555669773-9766-1-git-send-email-alex.po...@linux.com/T/#u

Best regards,
Alexander


[PATCH 1/1] kconfig: Terminate menu blocks with a newline in the generated config

2019-04-19 Thread Alexander Popov
Currently menu blocks start with a pretty header but end with nothing in
the generated config. So next config options stick together with the
options from the menu block.

Let's terminate menu blocks with a newline in the generated config.

Signed-off-by: Alexander Popov 
---
 scripts/kconfig/confdata.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/scripts/kconfig/confdata.c b/scripts/kconfig/confdata.c
index 08ba146..1459153 100644
--- a/scripts/kconfig/confdata.c
+++ b/scripts/kconfig/confdata.c
@@ -888,6 +888,8 @@ int conf_write(const char *name)
if (menu->next)
menu = menu->next;
else while ((menu = menu->parent)) {
+   if (!menu->sym && menu_is_visible(menu))
+   fprintf(out, "\n");
if (menu->next) {
menu = menu->next;
break;
-- 
2.7.4



Re: [PATCH v2 1/3] security: Create "kernel hardening" config area

2019-04-16 Thread Alexander Popov
On 16.04.2019 7:02, Kees Cook wrote:
> On Mon, Apr 15, 2019 at 11:44 AM Alexander Popov  wrote:
>>
>> What do you think about some separator between memory initialization options 
>> and
>> CONFIG_CRYPTO?
> 
> This was true before too

Hm, yes, it's generic behavior - there is no separator at 'endmenu' and
config options stick together.

I've created a patch to fix that. What do you think about it?
I can send it to LKML separately.


>From 50bf59d30fafcdebb3393fb742e1bd51e7d2f2da Mon Sep 17 00:00:00 2001
From: Alexander Popov 
Date: Tue, 16 Apr 2019 16:09:40 +0300
Subject: [PATCH 1/1] kconfig: Terminate menu blocks with a newline in the
 generated config

Currently menu blocks start with a pretty header but end with nothing in
the generated config. So the next config options stick together with the
options from the menu block.

Let's terminate menu blocks with a newline in the generated config.

Signed-off-by: Alexander Popov 
---
 scripts/kconfig/confdata.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/scripts/kconfig/confdata.c b/scripts/kconfig/confdata.c
index 08ba146..1459153 100644
--- a/scripts/kconfig/confdata.c
+++ b/scripts/kconfig/confdata.c
@@ -888,6 +888,8 @@ int conf_write(const char *name)
if (menu->next)
menu = menu->next;
else while ((menu = menu->parent)) {
+   if (!menu->sym && menu_is_visible(menu))
+   fprintf(out, "\n");
if (menu->next) {
menu = menu->next;
break;
-- 
2.7.4





Re: [PATCH v2 1/3] security: Create "kernel hardening" config area

2019-04-15 Thread Alexander Popov
On 11.04.2019 21:01, Kees Cook wrote:
> Right now kernel hardening options are scattered around various Kconfig
> files. This can be a central place to collect these kinds of options
> going forward. This is initially populated with the memory initialization
> options from the gcc-plugins.
> 
> Signed-off-by: Kees Cook 

Hello Kees, hello everyone!

After applying this series the kernel config looks like that:

...
...
CONFIG_LSM="yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"

#
# Kernel hardening options
#

#
# Memory initialization
#
CONFIG_INIT_STACK_NONE=y
# CONFIG_GCC_PLUGIN_STRUCTLEAK_USER is not set
# CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF is not set
# CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL is not set
# CONFIG_GCC_PLUGIN_STACKLEAK is not set
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=y
...
...

What do you think about adding some separator between the memory initialization
options and CONFIG_CRYPTO?

Best regards,
Alexander


Re: [PATCH 1/2] gcc-plugins: structleak: Generalize to all variable types

2019-03-11 Thread Alexander Popov
Hello Kees, hello everyone,

On 12.02.2019 21:04, Kees Cook wrote:
> Building with CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL should give the
> kernel complete initialization coverage of all stack variables passed
> by reference, including padding (see lib/test_stackinit.c).

I would like to note that the new STRUCTLEAK_BYREF_ALL initializes *fewer* stack
variables than STACKINIT, which was introduced earlier:
https://www.openwall.com/lists/kernel-hardening/2019/01/23/3

Citing the patches:
- the STACKINIT plugin "attempts to perform unconditional initialization of all
stack variables";
- the STRUCTLEAK_BYREF_ALL feature "gives the kernel complete initialization
coverage of all stack variables passed by reference".

I.e. stack variables not passed by reference are not initialized by
STRUCTLEAK_BYREF_ALL.

Kees, what do you think about adding such cases to your lib/test_stackinit.c?
This simple example demonstrates the idea:


diff --git a/lib/test_stackinit.c b/lib/test_stackinit.c
index 13115b6..f9ef313 100644
--- a/lib/test_stackinit.c
+++ b/lib/test_stackinit.c
@@ -320,9 +320,18 @@ static noinline __init int leaf_switch_2_none(unsigned 
long sp, bool fill,
 DEFINE_TEST_DRIVER(switch_1_none, uint64_t, SCALAR);
 DEFINE_TEST_DRIVER(switch_2_none, uint64_t, SCALAR);

+struct x {
+   int x1;
+   int x2;
+   int x3;
+};
+
 static int __init test_stackinit_init(void)
 {
unsigned int failures = 0;
+   struct x _x;
+
+   printk("uninitialized struct fields sum: %d\n", _x.x1 + _x.x2 + _x.x3);

 #define test_scalars(init) do {\
failures += test_u8_ ## init ();\


Kernel output:
  root@vm:~# insmod /lib/modules/5.0.0+/kernel/lib/test_stackinit.ko
  [   40.534622] uninitialized struct fields sum: -727800841
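For contrast, here is a minimal userspace sketch (my own illustration, not part of lib/test_stackinit.c) showing that explicit zero-initialization makes the same sum deterministic, which is the effect an unconditional-initialization plugin aims to guarantee for such variables:

```c
#include <assert.h>

/* Same layout as the struct in the diff above. */
struct x {
	int x1;
	int x2;
	int x3;
};

/* With an explicit initializer all fields are zeroed, so the sum is
 * deterministically 0, unlike the garbage value in the kernel output. */
static int sum_zero_initialized(void)
{
	struct x _x = { 0 };

	return _x.x1 + _x.x2 + _x.x3;
}
```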

Best regards,
Alexander


Re: [PATCH 0/3] gcc-plugins: Introduce stackinit plugin

2019-01-28 Thread Alexander Popov
On 23.01.2019 14:03, Kees Cook wrote:
> This adds a new plugin "stackinit" that attempts to perform unconditional
> initialization of all stack variables

Hello Kees! Hello everyone!

I was curious about the performance impact of the initialization of all stack
variables. So I did a very brief test with this plugin on top of 4.20.5.

hackbench on Intel Core i7-4770 showed ~0.7% slowdown.
hackbench on Kirin 620 (ARM Cortex-A53 Octa-core 1.2GHz) showed ~1.3% slowdown.

This test involves the kernel scheduler and allocator. I can't say whether they
use the stack aggressively. Maybe performance tests of other subsystems (e.g.
the network subsystem) can show different numbers. Did you try?

I've heard a hypothesis that the initialization of all stack variables would
pollute CPU caches, which is critical for some types of computations. Maybe some
micro-benchmarks can confirm or disprove that?

Thanks!
Best regards,
Alexander


Re: [PATCH v2] KVM: x86: Fix single-step debugging

2019-01-24 Thread Alexander Popov
On 21.01.2019 15:48, Alexander Popov wrote:
> The single-step debugging of KVM guests on x86 is broken: if we run
> gdb 'stepi' command at the breakpoint when the guest interrupts are
> enabled, RIP always jumps to native_apic_mem_write(). Then other
> nasty effects follow.
> 
> Long investigation showed that on Jun 7, 2017 the
> commit c8401dda2f0a00cd25c0 ("KVM: x86: fix singlestepping over syscall")
> introduced the kvm_run.debug corruption: kvm_vcpu_do_singlestep() can
> be called without X86_EFLAGS_TF set.
> 
> Let's fix it. Please consider that for -stable.

Hello everyone!

Just a friendly ping about this fix.
Looking forward to the feedback.

Best regards,
Alexander


[PATCH v2] KVM: x86: Fix single-step debugging

2019-01-21 Thread Alexander Popov
The single-step debugging of KVM guests on x86 is broken: if we run
gdb 'stepi' command at the breakpoint when the guest interrupts are
enabled, RIP always jumps to native_apic_mem_write(). Then other
nasty effects follow.

Long investigation showed that on Jun 7, 2017 the
commit c8401dda2f0a00cd25c0 ("KVM: x86: fix singlestepping over syscall")
introduced the kvm_run.debug corruption: kvm_vcpu_do_singlestep() can
be called without X86_EFLAGS_TF set.

Let's fix it. Please consider that for -stable.

Signed-off-by: Alexander Popov 
---
 arch/x86/kvm/x86.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f049ecf..9686068 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6407,8 +6407,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
toggle_interruptibility(vcpu, ctxt->interruptibility);
vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
kvm_rip_write(vcpu, ctxt->eip);
-   if (r == EMULATE_DONE &&
-   (ctxt->tf || (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)))
+   if (r == EMULATE_DONE && ctxt->tf)
kvm_vcpu_do_singlestep(vcpu, );
if (!ctxt->have_exception ||
exception_type(ctxt->exception.vector) == EXCPT_TRAP)
-- 
2.7.4
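To make the bug pattern concrete, here is a toy model of the condition being changed (hypothetical helper names, not the real KVM code): the old check fires even when the emulated context has the trap flag clear, as long as userspace requested KVM_GUESTDBG_SINGLESTEP, which is exactly the kvm_run.debug corruption described above.

```c
#include <assert.h>
#include <stdbool.h>

#define EMULATE_DONE 1

/* Old condition: single-step when TF is set *or* when userspace asked
 * for KVM_GUESTDBG_SINGLESTEP -- even with TF clear. */
static bool old_should_singlestep(int r, bool tf, bool guestdbg_singlestep)
{
	return r == EMULATE_DONE && (tf || guestdbg_singlestep);
}

/* New condition: only the emulated trap flag matters. */
static bool new_should_singlestep(int r, bool tf)
{
	return r == EMULATE_DONE && tf;
}
```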



Re: [PATCH] KVM: x86: Fix single-step debugging

2019-01-21 Thread Alexander Popov
Oops. I've forgotten about checkpatch.pl.
I've fixed the commit message and will resend the patch shortly.

--
Alexander


[PATCH] KVM: x86: Fix single-step debugging

2019-01-21 Thread Alexander Popov
The single-step debugging of KVM guests on x86 is broken: if we run
gdb 'stepi' command at the breakpoint when the guest interrupts are
enabled, RIP always jumps to native_apic_mem_write(). Then other
nasty effects follow.

Long investigation showed that the commit c8401dda2f0a00cd25c0 (Jun 7 2017)
introduced the kvm_run.debug corruption: kvm_vcpu_do_singlestep() can
be called without X86_EFLAGS_TF set.

Let's fix it. Please consider that for -stable.
---
 arch/x86/kvm/x86.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f049ecf..9686068 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6407,8 +6407,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
toggle_interruptibility(vcpu, ctxt->interruptibility);
vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
kvm_rip_write(vcpu, ctxt->eip);
-   if (r == EMULATE_DONE &&
-   (ctxt->tf || (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)))
+   if (r == EMULATE_DONE && ctxt->tf)
kvm_vcpu_do_singlestep(vcpu, );
if (!ctxt->have_exception ||
exception_type(ctxt->exception.vector) == EXCPT_TRAP)
-- 
2.7.4



Re: 10e9ae9fab ("gcc-plugins: Add STACKLEAK plugin for tracking .."): WARNING: can't dereference registers at (null) for ip entry_SYSCALL_64_after_hwframe

2018-12-20 Thread Alexander Popov
Hello everyone,

I've carefully worked with this report, let me share the results.

On 18.12.2018 8:15, kernel test robot wrote:
> Greetings,
> 
> 0day kernel testing robot got the below dmesg and the first bad commit is
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> 
> commit 10e9ae9fabaf96c8e5227c1cd4827d58b3aa406d
> gcc-plugins: Add STACKLEAK plugin for tracking the kernel stack

This bot has been running trinity at the following points:

> afaef01c00  x86/entry: Add STACKLEAK erasing the kernel stack at the end of 
> syscalls
> 10e9ae9fab  gcc-plugins: Add STACKLEAK plugin for tracking the kernel stack

just after stackleak was merged into 4.20-rc1

> 1a9430db28  ima: cleanup the match_token policy code

near 4.20-rc7 (Dec 17)

> 6648e120dd  Add linux-next specific files for 20181217

rc + next (Dec 17)

> |                | afaef01c00 | 10e9ae9fab | 1a9430db28 | next-20181217 |
> | boot_successes | 386        | 141        | 134        | 135           |
> | boot_failures  | 68         | 9          | 16         | 8             |

The following oopses happened on 4.20-rc1 and disappeared on 4.20-rc7:

> | RIP:trace                                             | 37 |    |    |   |
> | WARNING:stack_recursion                               | 36 |    |    |   |
> | WARNING:at(ptrval)for_ip_syscall_return_via_sysret/0x | 37 |    |    |   |
> | Kernel_panic-not_syncing:Machine_halted               | 37 |    |    |   |
> | PANIC:double_fault                                    | 27 |    |    |   |

They are caused by stackleak issues with ftrace and kprobes, which are fixed
in these commits:
e9c7d656610e
ef1a84093489

I've double-checked that now stackleak works properly with function tracing,
function_graph tracing and kprobes.

> | Mem-Info                       | 2 | 0 | 1 |   |
> | invoked_oom-killer:gfp_mask=0x | 1 | 0 | 1 |   |
> | RIP:__put_user_4               | 1 |   |   |   |

These 3 lines are not meaningful to me.

> | BUG:KASAN:stack-out-of-bounds_in_u | 25 | 8 | 12 | 7 |

This is interesting. How does KASAN work with stackleak? I tested it using
test_kasan.ko -- it works properly both for KASAN outline and inline
instrumentation.

However, I noticed that the stackleak lkdtm test sometimes reports that the
kernel stack is not properly erased in the case of KASAN outline
instrumentation. I think it happens because KASAN increases kernel stack usage,
so CONFIG_STACKLEAK_TRACK_MIN_SIZE should be adjusted. I will investigate
that later.

> | RIP:__x86_indirect_thunk_rdx                             | 26 | 9 | 12 | 7 |
> | INFO:rcu_preempt_detected_stalls_on_CPUs/tasks           | 3  | 0 | 3  |   |
> | RIP:arch_local_irq_enable                                | 1  |   |    |   |
> | RIP:mntput_no_expire                                     | 1  |   |    |   |
> | RIP:arch_local_irq_restore                               | 1  |   |    |   |
> | RIP:compound_head                                        | 1  |   |    |   |
> | RIP:rcu_read_lock                                        | 1  |   |    |   |
> | RIP:check_kill_permission                                | 1  |   |    |   |
> | RIP:radix_tree_load_root                                 | 1  |   |    |   |
> | WARNING:at(null)for_ip_entry_SYSCALL_64_after_hwframe/0x | 0  | 7 | 11 | 7 |
> | WARNING:at(null)for_ip_async_page_fault/0x               | 0  | 1 | 1  |   |
> | WARNING:at_kernel/locking/lockdep.c:#lock_downgrade      | 0  | 0 | 2  |   |
> | RIP:lock_downgrade

[PATCH v2 1/1] stackleak: Register the 'stackleak_cleanup' pass before the '*free_cfg' pass

2018-12-06 Thread Alexander Popov
Currently the 'stackleak_cleanup' pass deleting a CALL insn is executed
after the 'reload' pass. That allows gcc to do some weird optimization in
function prologues and epilogues, which are generated later [1].

Let's avoid that by registering the 'stackleak_cleanup' pass before
the '*free_cfg' pass. It's the moment when the stack frame size is
already final, function prologues and epilogues are generated, and the
machine-dependent code transformations are not done.

[1] https://www.openwall.com/lists/kernel-hardening/2018/11/23/2

Reported-by: kbuild test robot 
Signed-off-by: Alexander Popov 
---
 scripts/gcc-plugins/stackleak_plugin.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/scripts/gcc-plugins/stackleak_plugin.c 
b/scripts/gcc-plugins/stackleak_plugin.c
index 2f48da9..dbd3746 100644
--- a/scripts/gcc-plugins/stackleak_plugin.c
+++ b/scripts/gcc-plugins/stackleak_plugin.c
@@ -363,10 +363,12 @@ __visible int plugin_init(struct plugin_name_args 
*plugin_info,
PASS_POS_INSERT_BEFORE);
 
/*
-* The stackleak_cleanup pass should be executed after the
-* "reload" pass, when the stack frame size is final.
+* The stackleak_cleanup pass should be executed before the "*free_cfg"
+* pass. It's the moment when the stack frame size is already final,
+* function prologues and epilogues are generated, and the
+* machine-dependent code transformations are not done.
 */
-   PASS_INFO(stackleak_cleanup, "reload", 1, PASS_POS_INSERT_AFTER);
+   PASS_INFO(stackleak_cleanup, "*free_cfg", 1, PASS_POS_INSERT_BEFORE);
 
if (!plugin_default_version_check(version, _version)) {
error(G_("incompatible gcc/plugin versions"));
-- 
2.7.4



Re: [PATCH 1/1] stackleak: Register the 'stackleak_cleanup' pass before the 'mach' pass

2018-12-06 Thread Alexander Popov
On 03.12.2018 21:25, Alexander Popov wrote:
> But I think it's better to register the 'stackleak_cleanup' pass just one pass
> earlier -- before the '*free_cfg' pass. I'll double check it for different
> versions of gcc on all supported architectures and return with a new patch.

I've tested this idea for gcc-5,6,7,8 on x86_64, x86_32, and arm64.
I'll send the patch soon.

Best regards,
Alexander


[PATCH 1/1] stackleak: Disable function tracing and kprobes for stackleak_erase()

2018-11-12 Thread Alexander Popov
The stackleak_erase() function is called on the trampoline stack at the end
of syscall. This stack is not big enough for ftrace and kprobes operations,
e.g. it can be exhausted if we use kprobe_events for stackleak_erase().

So let's disable function tracing and kprobes for stackleak_erase().

Reported-by: kernel test robot 
Signed-off-by: Alexander Popov 
---
 kernel/stackleak.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/stackleak.c b/kernel/stackleak.c
index e428929..08cb57e 100644
--- a/kernel/stackleak.c
+++ b/kernel/stackleak.c
@@ -11,6 +11,7 @@
  */
 
 #include 
+#include 
 
 #ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE
 #include 
@@ -47,7 +48,7 @@ int stack_erasing_sysctl(struct ctl_table *table, int write,
 #define skip_erasing() false
 #endif /* CONFIG_STACKLEAK_RUNTIME_DISABLE */
 
-asmlinkage void stackleak_erase(void)
+asmlinkage void notrace stackleak_erase(void)
 {
/* It would be nice not to have 'kstack_ptr' and 'boundary' on stack */
unsigned long kstack_ptr = current->lowest_stack;
@@ -101,6 +102,7 @@ asmlinkage void stackleak_erase(void)
/* Reset the 'lowest_stack' value for the next syscall */
current->lowest_stack = current_top_of_stack() - THREAD_SIZE/64;
 }
+NOKPROBE_SYMBOL(stackleak_erase);
 
 void __used stackleak_track_stack(void)
 {
-- 
2.7.4



Re: [PATCH 1/1] stackleak: Disable ftrace for stackleak.c

2018-11-12 Thread Alexander Popov
Hello Steven and Masami,

Thanks for your comments.

On 12.11.2018 5:50, Masami Hiramatsu wrote:
> Hi Alexander and Steve,
> 
> On Sun, 11 Nov 2018 20:53:51 -0500
> Steven Rostedt  wrote:
> 
>> On Sun, 11 Nov 2018 13:19:45 +0300
>> Alexander Popov  wrote:
>>
>>> On 11.11.2018 2:30, Steven Rostedt wrote:
>>>> On Sun, 11 Nov 2018 01:05:30 +0300
>>>> Alexander Popov  wrote:
>>>>   
>>>>> The stackleak_erase() function is called on the trampoline stack at the
>>>>> end of syscall. This stack is not big enough for ftrace operations,
>>>>> e.g. it can be overflowed if we enable kprobe_events for 
>>>>> stackleak_erase().  
>>>>
>>>> Is the issue with kprobes or with function tracing? Because this stops
>>>> function tracing which I only want disabled if function tracing itself
>>>> is an issue, not for other things that may use the function tracing
>>>> infrastructure.  
>>>
>>> Hello Steven,
>>>
>>> I believe that stackleak erasing is not compatible with function tracing 
>>> itself.
>>> That's what the kernel testing robot has hit:
>>> https://www.openwall.com/lists/kernel-hardening/2018/11/09/1
>>>
>>> I used kprobe_events just to reproduce the problem:
>>> https://www.openwall.com/lists/kernel-hardening/2018/11/09/4
>>
>> Have you tried adding a "notrace" to stackleak_erase()?
>>
>> Not tracing the entire file is a bit of overkill. There's no reason
>> ftrace can't trace stack_erasing_sysctl() or perhaps even
>> stackleak_track_stack() as that may be very interesting to trace.

Yes, thank you. It's much better.

> I think it is not enough for stopping kprobes. If you want to stop the kprobes
> (int3 version) on stackleak_erase(), you should use 
> NOKPROBE_SYMBOL(stackleak_erase),
> since kprobes can work without ftrace. 

Thanks!

I learned how to use kprobes without ftrace and managed to reproduce the problem
as well (I modified kprobe_example.c and kretprobe_example.c). NOKPROBE_SYMBOL()
allowed me to avoid it.

I'll send the patch soon.

By the way, are there any other tracing/instrumentation mechanisms that should
be disabled?

Best regards,
Alexander


Re: [PATCH 1/1] stackleak: Disable ftrace for stackleak.c

2018-11-11 Thread Alexander Popov
On 11.11.2018 2:30, Steven Rostedt wrote:
> On Sun, 11 Nov 2018 01:05:30 +0300
> Alexander Popov  wrote:
> 
>> The stackleak_erase() function is called on the trampoline stack at the
>> end of syscall. This stack is not big enough for ftrace operations,
>> e.g. it can be overflowed if we enable kprobe_events for stackleak_erase().
> 
> Is the issue with kprobes or with function tracing? Because this stops
> function tracing which I only want disabled if function tracing itself
> is an issue, not for other things that may use the function tracing
> infrastructure.

Hello Steven,

I believe that stackleak erasing is not compatible with function tracing itself.
That's what the kernel testing robot has hit:
https://www.openwall.com/lists/kernel-hardening/2018/11/09/1

I used kprobe_events just to reproduce the problem:
https://www.openwall.com/lists/kernel-hardening/2018/11/09/4

Best regards,
Alexander


[PATCH 1/1] stackleak: Disable ftrace for stackleak.c

2018-11-10 Thread Alexander Popov
The stackleak_erase() function is called on the trampoline stack at the
end of syscall. This stack is not big enough for ftrace operations,
e.g. it can be overflowed if we enable kprobe_events for stackleak_erase().

Let's disable ftrace for stackleak.c to avoid such situations.

Reported-by: kernel test robot 
Signed-off-by: Alexander Popov 
Reviewed-by: Kees Cook 
---
 kernel/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/Makefile b/kernel/Makefile
index 7343b3a..0906f6d 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -18,6 +18,7 @@ obj-$(CONFIG_MULTIUSER) += groups.o
 ifdef CONFIG_FUNCTION_TRACER
 # Do not trace internal ftrace files
 CFLAGS_REMOVE_irq_work.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_stackleak.o = $(CC_FLAGS_FTRACE)
 endif
 
 # Prevents flicker of uninteresting __do_softirq()/__local_bh_disable_ip()
-- 
2.7.4



Re: drivers/acpi/acpi_processor.o: warning: objtool: acpi_duplicate_processor_id()+0x49: stack state mismatch: cfa1=7+8 cfa2=6+16

2018-11-08 Thread Alexander Popov
On 06.11.2018 18:50, kbuild test robot wrote:
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
> master
> head:   163c8d54a997153ee1a1e07fcac087492ad85b37
> commit: 10e9ae9fabaf96c8e5227c1cd4827d58b3aa406d gcc-plugins: Add STACKLEAK 
> plugin for tracking the kernel stack
> date:   9 weeks ago
> config: x86_64-randconfig-u0-11062149 (attached as .config)
> compiler: gcc-5 (Debian 5.5.0-3) 5.4.1 20171010
> reproduce:
> git checkout 10e9ae9fabaf96c8e5227c1cd4827d58b3aa406d
> # save the attached .config to linux build tree
> make ARCH=x86_64 
>
>>> drivers/acpi/acpi_processor.o: warning: objtool: 
>>> acpi_duplicate_processor_id()+0x49: stack state mismatch: cfa1=7+8 cfa2=6+16

Hello!

I have reproduced this problem. It looks like stackleak_cleanup_execute() does
something wrong with the RTL generated by gcc-5.

I'm working on this.

Best regards,
Alexander



Re: [GIT PULL] gcc-plugin updates for v4.19-rc1

2018-08-15 Thread Alexander Popov
Hello Linus,

On 15.08.2018 22:04, Linus Torvalds wrote:
> On Wed, Aug 15, 2018 at 11:35 AM Kees Cook  wrote:
>>
>> I swear I'm doing my best. Are you speaking of
>> stackleak_check_alloca() or stackleak_erase()? These were both
>> discussed on the list, and we weren't able to come up with
>> alternatives: in both cases we're off the stack, and recovery is
>> seemingly impossible.
> 
> Why do you even *test* that thing? Why don't you just allocate stack
> and clear it.
> 
> Dammit, the whole f*cking point of this patch-set is to clear the
> stack used. It is *not* supposed to do anything else. If the process
> runs out of stack, that's caught by the vmalloc'ed stack.
> 
> And if you don't  have vmalloc'ed stack, then clearly you don't care.
> 
> I refuse to take this kind of code that does stupid things, and then
> *because* it does those initial stupid things it does even more stupid
> things to correct for it.

Could you please have a look at the commit messages (or at the code)? You are
really arguing against the wrong things! Let me correct Kees and give you the details.
Please don't be angry.

Again, this plugin provides two features: kernel stack erasing and blocking
Stack Clash (ability to jump over the guard page provided by VMAP_STACK).

So:

1. stackleak_erase() erases the stack. It has a BUG_ON() to detect
'task_struct.lowest_stack' corruption. It's not a security-violation BUG(),
which you hate. We just don't want to erase the wrong memory. We have discussed
that with Ingo and others.

2. stackleak_check_alloca() detects 'Stack Clash', and it does things absolutely
similar to VMAP_STACK and SCHED_STACK_END_CHECK. Having VMAP_STACK + STACKLEAK
+ THREAD_INFO_IN_TASK together protects us from all known stack depth overflows.

Yes, one day we will remove all VLAs from the mainline kernel. But the STACKLEAK
plugin protects un-upstreamed code as well.


I've put so much effort (1.5 years) into polishing it and making you, Ingo and
others satisfied!

Best regards,
Alexander


Re: [PATCHv3 2/2] arm64: Add support for STACKLEAK gcc plugin

2018-07-24 Thread Alexander Popov
On 21.07.2018 00:41, Laura Abbott wrote:
> This adds support for the STACKLEAK gcc plugin to arm64 by implementing
> stackleak_check_alloca(), based heavily on the x86 version, and adding the
> two helpers used by the stackleak common code: current_top_of_stack() and
> on_thread_stack(). The stack erasure calls are made at syscall returns.
> Additionally, this disables the plugin in hypervisor and EFI stub code,
> which are out of scope for the protection.
> 
> Reviewed-by: Mark Rutland 
> Reviewed-by: Kees Cook 
> Signed-off-by: Laura Abbott 
> ---
> v3: Actual commit text courtesy of Kees. A comment explaining why we
> panic
> ---
>  arch/arm64/Kconfig|  1 +
>  arch/arm64/include/asm/processor.h| 15 +++
>  arch/arm64/kernel/entry.S |  7 +++
>  arch/arm64/kernel/process.c   | 22 ++
>  arch/arm64/kvm/hyp/Makefile   |  3 ++-
>  drivers/firmware/efi/libstub/Makefile |  3 ++-
>  6 files changed, 49 insertions(+), 2 deletions(-)

Laura, thanks for your work!

I've reviewed and tested this patch on my LeMaker HiKey board (HiSilicon Kirin
620 SoC). The lkdtm tests for STACKLEAK work fine.

Acked-by: Alexander Popov 

For testing I applied your patches above Kees' for-next/kspp:
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=for-next/kspp

I've hit one problem building CONFIG_STACKLEAK_RUNTIME_DISABLE on arm64.
Kees, could you please fold this into the 7th patch of the series?

 >8 

diff --git a/kernel/stackleak.c b/kernel/stackleak.c
index f731c9a..03031f7a 100644
--- a/kernel/stackleak.c
+++ b/kernel/stackleak.c
@@ -16,6 +16,7 @@

 #ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE
 #include 
+#include 

 static DEFINE_STATIC_KEY_FALSE(stack_erasing_bypass);



Re: [PATCH 2/2] arm64: Clear the stack

2018-07-19 Thread Alexander Popov
Hello Laura,

Thanks again for your work.
Please see some comments below.

On 19.07.2018 00:10, Laura Abbott wrote:
> Implementation of stackleak based heavily on the x86 version
> 
> Signed-off-by: Laura Abbott 
> ---
> Since last time: Minor style cleanups. Re-wrote check_alloca to
> correctly handle all stack types. While doing that, I also realized
> current_top_of_stack was incorrect so I fixed that as well.
> ---
>  arch/arm64/Kconfig|  1 +
>  arch/arm64/include/asm/processor.h| 17 ++
>  arch/arm64/kernel/entry.S |  7 ++
>  arch/arm64/kernel/process.c   | 32 +++
>  arch/arm64/kvm/hyp/Makefile   |  3 ++-
>  drivers/firmware/efi/libstub/Makefile |  3 ++-
>  include/linux/stackleak.h |  1 +
>  7 files changed, 62 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 42c090cf0292..216d36a49ab5 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -96,6 +96,7 @@ config ARM64
>   select HAVE_ARCH_MMAP_RND_BITS
>   select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
>   select HAVE_ARCH_SECCOMP_FILTER
> + select HAVE_ARCH_STACKLEAK
>   select HAVE_ARCH_THREAD_STRUCT_WHITELIST
>   select HAVE_ARCH_TRACEHOOK
>   select HAVE_ARCH_TRANSPARENT_HUGEPAGE
> diff --git a/arch/arm64/include/asm/processor.h 
> b/arch/arm64/include/asm/processor.h
> index a73ae1e49200..4f3062ee22c6 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -266,5 +266,22 @@ extern void __init minsigstksz_setup(void);
>  #define SVE_SET_VL(arg)  sve_set_current_vl(arg)
>  #define SVE_GET_VL() sve_get_current_vl()
>  
> +/*
> + * For CONFIG_STACKLEAK

Our config option is called CONFIG_GCC_PLUGIN_STACKLEAK.

> + *
> + * These need to be macros because otherwise we get stuck in a nightmare
> + * of header definitions for the use of task_stack_page.
> + */
> +
> +#define current_top_of_stack()   \
> +({   \
> + unsigned long _low = 0; \
> + unsigned long _high = 0;\
> + \
> + current_stack_type(current, current_stack_pointer, &_low, &_high); \
> + _high;  \
> +})

Do you really need _low here? Ah, I see this in the previous patch:
+   if (stack_low && stack_high) {
+   *stack_low = low;
+   *stack_high = high;
+   }

How about checking them against NULL separately? That would allow
+   current_stack_type(current, current_stack_pointer, NULL, &_high);
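The separate-NULL-check idea can be sketched in plain C. This is an illustrative user-space model, not the real arm64 current_stack_type() (which also classifies the stack type); the helper name and bounds are made up:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: report each stack bound only if the caller asked for it.
 * Checking the two output pointers separately (instead of requiring
 * both) lets a caller pass NULL for the bound it does not need. */
static void report_stack_bounds(unsigned long low, unsigned long high,
                                unsigned long *stack_low,
                                unsigned long *stack_high)
{
	if (stack_low)
		*stack_low = low;
	if (stack_high)
		*stack_high = high;
}
```

With this shape, current_top_of_stack() could request only the high bound and drop its unused _low local.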

Also a minor comment - how about aligning backslashes?

> +#define on_thread_stack()(on_task_stack(current, current_stack_pointer, 
> NULL, NULL))
> +
>  #endif /* __ASSEMBLY__ */
>  #endif /* __ASM_PROCESSOR_H */
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index 28ad8799406f..67d12016063d 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -431,6 +431,11 @@ tsk  .reqx28 // current thread_info
>  
>   .text
>  
> + .macro  stackleak_erase
> +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
> + bl  stackleak_erase
> +#endif
> + .endm
>  /*
>   * Exception vectors.
>   */
> @@ -910,6 +915,7 @@ ret_fast_syscall:
>   and x2, x1, #_TIF_WORK_MASK
>   cbnzx2, work_pending
>   enable_step_tsk x1, x2
> + stackleak_erase
>   kernel_exit 0
>  ret_fast_syscall_trace:
>   enable_daif
> @@ -936,6 +942,7 @@ ret_to_user:
>   cbnzx2, work_pending
>  finish_ret_to_user:
>   enable_step_tsk x1, x2
> + stackleak_erase
>   kernel_exit 0
>  ENDPROC(ret_to_user)
>  
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index e10bc363f533..904defa36689 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -493,3 +493,35 @@ void arch_setup_new_exec(void)
>  {
>   current->mm->context.flags = is_compat_task() ? MMCF_AARCH32 : 0;
>  }
> +
> +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
> +void __used stackleak_check_alloca(unsigned long size)
> +{
> + unsigned long stack_left;
> + enum stack_type type;
> + unsigned long current_sp = current_stack_pointer;
> + unsigned long low, high;
> +
> + type = current_stack_type(current, current_sp, &low, &high);
> + BUG_ON(type == STACK_TYPE_UNKNOWN);
> +
> + stack_left = current_sp - low;
> +
> + if (size >= stack_left) {
> + /*
> +  * Kernel stack depth overflow is detected, let's report that.
> +  * If CONFIG_VMAP_STACK is enabled, we can safely use BUG().
> +  * If CONFIG_VMAP_STACK is disabled, BUG() handling can corrupt
> +  * the neighbour memory. CONFIG_SCHED_STACK_END_CHECK calls
> +  * panic() in a similar situation, so let's do the same if that
> +  * option is on. Otherwise just use BUG() and hope for the best.
> +
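The overflow condition in the quoted check can be modeled in user space. This is a sketch with made-up addresses, not kernel code: given the stack bounds and the current stack pointer, an alloca overflows when its size reaches the space left below the stack pointer:

```c
#include <assert.h>

/* Model of the quoted check: with stack bounds [low, high] and
 * stack pointer sp, an alloca of `size` bytes overflows when
 * size >= sp - low. All addresses here are illustrative. */
static int alloca_overflows(unsigned long sp, unsigned long low,
                            unsigned long size)
{
	unsigned long stack_left = sp - low;	/* bytes free below sp */

	return size >= stack_left;
}
```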

Re: [PATCH] arm64: Clear the stack

2018-07-03 Thread Alexander Popov
On 03.07.2018 18:03, Catalin Marinas wrote:
> On Tue, Jul 03, 2018 at 01:14:41PM +0100, Mark Rutland wrote:
>> On Mon, Jul 02, 2018 at 11:48:05AM -0700, Laura Abbott wrote:
>>> On 07/02/2018 06:02 AM, Alexander Popov wrote:
>>>> Could you rename the macro to STACKLEAK_ERASE for similarity with x86?
>>>
>>> Mark Rutland had previously asked for this to be lowercase.
>>> I really don't care one way or the other so I'll defer to
>>> someone else to have the final word.
>>
>> Will, Catalin, could you chime in either way?
>>
>> I'd previously asked for lower-case for consistency with our other
>> assembly macros.
> 
> I'd keep it lowercase as the other arm64 macros in this file.

Ok, thanks, I'm fine with it.

Best regards,
Alexander


Re: [PATCH] arm64: Clear the stack

2018-07-02 Thread Alexander Popov
Hello Laura,

Thanks for your work!
Please see my comments below.

On 29.06.2018 22:05, Laura Abbott wrote:
> Implementation of stackleak based heavily on the x86 version
> 
> Signed-off-by: Laura Abbott 
> ---
> Changes since last time:
> - Minor name change in entry.S
> - Converted to use the generic interfaces so there's minimal additions.
> - Added the fast syscall path.
> - Addition of on_thread_stack and current_top_of_stack
> - Disable stackleak on hyp per suggestion
> - Added a define for check_alloca. I'm still not sure about keeping it
>   since the x86 version got reworked?
> 
> I've mostly kept this as one patch with a minimal commit text. I can
> split it up and elaborate more before final merging.
> ---
>  arch/arm64/Kconfig|  1 +
>  arch/arm64/include/asm/processor.h|  9 +
>  arch/arm64/kernel/entry.S |  7 +++
>  arch/arm64/kernel/process.c   | 16 
>  arch/arm64/kvm/hyp/Makefile   |  3 ++-
>  drivers/firmware/efi/libstub/Makefile |  3 ++-
>  include/linux/stackleak.h |  1 +
>  scripts/Makefile.gcc-plugins  |  5 -
>  8 files changed, 42 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index eb2cf4938f6d..b0221db95dc9 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -92,6 +92,7 @@ config ARM64
>   select HAVE_ARCH_MMAP_RND_BITS
>   select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
>   select HAVE_ARCH_SECCOMP_FILTER
> + select HAVE_ARCH_STACKLEAK
>   select HAVE_ARCH_THREAD_STRUCT_WHITELIST
>   select HAVE_ARCH_TRACEHOOK
>   select HAVE_ARCH_TRANSPARENT_HUGEPAGE
> diff --git a/arch/arm64/include/asm/processor.h 
> b/arch/arm64/include/asm/processor.h
> index 767598932549..73914fd7e142 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -248,5 +248,14 @@ void cpu_clear_disr(const struct arm64_cpu_capabilities 
> *__unused);
>  #define SVE_SET_VL(arg)  sve_set_current_vl(arg)
>  #define SVE_GET_VL() sve_get_current_vl()
>  
> +/*
> + * For stackleak

May I ask you to call it STACKLEAK here and in other places for easier grep?

> + *
> + * These need to be macros because otherwise we get stuck in a nightmare
> + * of header definitions for the use of task_stack_page.
> + */
> +#define current_top_of_stack()   (task_stack_page(current) + THREAD_SIZE)
> +#define on_thread_stack()(on_task_stack(current, current_stack_pointer))
> +
>  #endif /* __ASSEMBLY__ */
>  #endif /* __ASM_PROCESSOR_H */
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index ec2ee720e33e..31c9da7d401e 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -401,6 +401,11 @@ tsk  .reqx28 // current thread_info
>  
>   .text
>  
> + .macro  stackleak_erase

Could you rename the macro to STACKLEAK_ERASE for similarity with x86?

> +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
> + bl  stackleak_erase_kstack
> +#endif
> + .endm
>  /*
>   * Exception vectors.
>   */
> @@ -880,6 +885,7 @@ ret_fast_syscall:
>   and x2, x1, #_TIF_WORK_MASK
>   cbnzx2, work_pending
>   enable_step_tsk x1, x2
> + stackleak_erase
>   kernel_exit 0
>  ret_fast_syscall_trace:
>   enable_daif
> @@ -906,6 +912,7 @@ ret_to_user:
>   cbnzx2, work_pending
>  finish_ret_to_user:
>   enable_step_tsk x1, x2
> + stackleak_erase
>   kernel_exit 0
>  ENDPROC(ret_to_user)
>  
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index f08a2ed9db0d..9f0f135f8b66 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -493,3 +493,19 @@ void arch_setup_new_exec(void)
>  {
>   current->mm->context.flags = is_compat_task() ? MMCF_AARCH32 : 0;
>  }
> +
> +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
> +#define MIN_STACK_LEFT   256
> +
> +void __used stackleak_check_alloca(unsigned long size)
> +{
> + unsigned long sp, stack_left;
> +
> + sp = current_stack_pointer;
> +
> + stack_left = sp & (THREAD_SIZE - 1);
> + BUG_ON(stack_left < MIN_STACK_LEFT ||
> + size >= stack_left - MIN_STACK_LEFT);
> +}
> +EXPORT_SYMBOL(stackleak_check_alloca);
> +#endif

This code should be updated.
You may remember the troubles I had with MIN_STACK_LEFT:
http://openwall.com/lists/kernel-hardening/2018/05/11/12
Please see that thread where Mark Rutland and I worked out the solution.

By the way, different stacks on x86_64 have different sizes. Is that not the
case on arm64?
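For reference, the `sp & (THREAD_SIZE - 1)` computation from the quoted check_alloca() can be modeled in user space. This sketch assumes what the kernel guarantees for thread stacks: THREAD_SIZE is a power of two and the stack is THREAD_SIZE-aligned; the THREAD_SIZE value below is illustrative:

```c
#include <assert.h>

/* For a THREAD_SIZE-aligned stack with power-of-two THREAD_SIZE,
 * masking sp with (THREAD_SIZE - 1) yields sp's offset above the
 * stack base, i.e. the bytes still left on the stack. This is
 * exactly why the trick breaks on stacks of other sizes. */
#define THREAD_SIZE 16384UL

static unsigned long stack_left_of(unsigned long sp)
{
	return sp & (THREAD_SIZE - 1);
}
```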

> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index 4313f7475333..2fabc2dc1966 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -3,7 +3,8 @@
>  # Makefile for Kernel-based Virtual Machine module, HYP part
>  #
>  
> -ccflags-y += -fno-stack-protector -DDISABLE_BRANCH_PROFILING
> +ccflags-y += -fno-stack-protector 

Re: [PATCH 2/2] arm64: Clear the stack

2018-05-14 Thread Alexander Popov
On 14.05.2018 13:06, Mark Rutland wrote:
> On Mon, May 14, 2018 at 12:35:25PM +0300, Alexander Popov wrote:
>> On 14.05.2018 08:15, Mark Rutland wrote:
>>> On Sun, May 13, 2018 at 11:40:07AM +0300, Alexander Popov wrote:
>>>> So what would you think if I do the following in check_alloca():
>>>>
>>>>if (size >= stack_left) {
>>>> #if !defined(CONFIG_VMAP_STACK) && defined(CONFIG_SCHED_STACK_END_CHECK)
>>>>panic("alloca over the kernel stack boundary\n");
>>>> #else
>>>>BUG();
>>>> #endif
>>>
>>> Given this is already out-of-line, how about we always use panic(), 
>>> regardless
>>> of VMAP_STACK and SCHED_STACK_END_CHECK? i.e. just
>>>
>>> if (unlikely(size >= stack_left))
>>> panic("alloca over the kernel stack boundary");
>>>
>>> If we have VMAP_STACK selected, and overflow during the panic, it's the 
>>> same as
>>> if we overflowed during the BUG(). It's likely that panic() will use less 
>>> stack
>>> space than BUG(), and the compiler can put the call in a slow path that
>>> shouldn't affect most calls, so in all cases it's likely preferable.
>>
>> I'm sure that maintainers and Linus will strongly dislike my patch if I 
>> always
>> use panic() here. panic() kills the whole kernel and we shouldn't use it 
>> when we
>> can safely continue to work.
>>
>> Let me describe my logic. So let's have size >= stack_left on a thread stack.
>>
>> 1. If CONFIG_VMAP_STACK is enabled, we can safely use BUG(). Even if BUG()
>> handling overflows the thread stack into the guard page, 
>> handle_stack_overflow()
>> is called and the neighbour memory is not corrupted. The kernel can proceed 
>> to live.
> 
> On arm64 with CONFIG_VMAP_STACK, a stack overflow will result in a
> panic(). My understanding was that the same is true on x86.

No, x86 CONFIG_VMAP_STACK only kills the offending process. I see it on my deep
recursion test, the kernel continues to live. handle_stack_overflow() in
arch/x86/kernel/traps.c calls die().

>> 2. If CONFIG_VMAP_STACK is disabled, BUG() handling can corrupt the neighbour
>> kernel memory and cause the undefined behaviour of the whole kernel. I see 
>> it on
>> my lkdtm test. That is a cogent reason for panic().
> 
> In this case, panic() can also corrupt the neighbour stack, and could
> also fail.
> 
> When CONFIG_VMAP_STACK is not selected, a stack overflow simply cannot
> be handled reliably -- while panic() may be more likely to succeed, it
> is not guaranteed to.
>
>> 2.a. If CONFIG_SCHED_STACK_END_CHECK is enabled, the kernel already does 
>> panic()
>> when STACK_END_MAGIC is corrupted. So we will _not_ break the safety policy 
>> if
>> we do panic() in a similar situation in check_alloca().
> 
> Sure, I'm certainly happy with panic() here.

Ok!

>> 2.b. If CONFIG_SCHED_STACK_END_CHECK is disabled, the user has some real 
>> reasons
>> not to do panic() when the kernel stack is corrupted. 
> 
> I believe that CONFIG_SCHED_STACK_END_CHECK is seen as a debug feature,
> and hence people don't select it. 

I see CONFIG_SCHED_STACK_END_CHECK enabled by default in Ubuntu config...

> I strongly doubt that people have
> reasons to disable it other than not wanting the overhead associated
> with debug features.

I think it's not a question of performance here. There are cases when a system
must live as long as possible (even partially corrupted) and must not die
entirely. Oops is ok for those systems, but panic (full DoS) is not.

> I think it is reasonable to panic() here even with CONFIG_VMAP_STACK
> selected.

It's too tough for CONFIG_VMAP_STACK on x86 - the system can proceed to live.
Anyway, the check_alloca() code will not be shared between x86 and arm64, I've
described the reasons in this thread. So I can have BUG() for CONFIG_VMAP_STACK
on x86 and Laura can consistently use panic() on arm64.

>> So we should not do it in check_alloca() as well, just use BUG() and
>> hope for the best.
> 
> Regardless of whether we BUG() or panic(), we're hoping for the best.
> 
> Consistently using panic() here will keep things simpler, so any failure
> reported will be easier to reason about, and easier to debug.

Let me keep BUG() for !CONFIG_SCHED_STACK_END_CHECK. I'm wary of using panic()
by default; let the distro/user decide this. I remember very well how I was
shouted at when this one was merged:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce6fa91b93630396ca220c33dd38ffc62686d499


Mark, I'm really grateful to you for such a nice code review!
Alexander


Re: [PATCH 2/2] arm64: Clear the stack

2018-05-14 Thread Alexander Popov
On 14.05.2018 08:15, Mark Rutland wrote:
> On Sun, May 13, 2018 at 11:40:07AM +0300, Alexander Popov wrote:
>> It seems that previously I was very "lucky" to accidentally have those 
>> MIN_STACK_LEFT,
>> call trace depth and oops=panic together to experience a hang on stack 
>> overflow
>> during BUG().
>>
>>
>> When I run my test in a loop _without_ VMAP_STACK, I manage to corrupt the 
>> neighbour
>> processes with BUG() handling overstepping the stack boundary. It's a pity, 
>> but
>> I have an idea.
> 
> I think that in the absence of VMAP_STACK, there will always be cases where we
> *could* corrupt a neighbouring stack, but I agree that trying to minimize that
> possibility would be good.

Ok!

>> In kernel/sched/core.c we already have:
>>
>> #ifdef CONFIG_SCHED_STACK_END_CHECK
>>  if (task_stack_end_corrupted(prev))
>>  panic("corrupted stack end detected inside scheduler\n");
>> #endif
>>
>> So what would you think if I do the following in check_alloca():
>>
>>  if (size >= stack_left) {
>> #if !defined(CONFIG_VMAP_STACK) && defined(CONFIG_SCHED_STACK_END_CHECK)
>>  panic("alloca over the kernel stack boundary\n");
>> #else
>>  BUG();
>> #endif
> 
> Given this is already out-of-line, how about we always use panic(), regardless
> of VMAP_STACK and SCHED_STACK_END_CHECK? i.e. just
> 
>   if (unlikely(size >= stack_left))
>   panic("alloca over the kernel stack boundary");
> 
> If we have VMAP_STACK selected, and overflow during the panic, it's the
> same as if we overflowed during the BUG(). It's likely that panic() will
> use less stack space than BUG(), and the compiler can put the call in a
> slow path that shouldn't affect most calls, so in all cases it's likely
> preferable.

I'm sure that maintainers and Linus will strongly dislike my patch if I always
use panic() here. panic() kills the whole kernel and we shouldn't use it when we
can safely continue to work.

Let me describe my logic. So let's have size >= stack_left on a thread stack.

1. If CONFIG_VMAP_STACK is enabled, we can safely use BUG(). Even if BUG()
handling overflows the thread stack into the guard page, handle_stack_overflow()
is called and the neighbour memory is not corrupted. The kernel can continue
running.

2. If CONFIG_VMAP_STACK is disabled, BUG() handling can corrupt the neighbour
kernel memory and cause undefined behaviour of the whole kernel. I see it in
my lkdtm test. That is a cogent reason for panic().

2.a. If CONFIG_SCHED_STACK_END_CHECK is enabled, the kernel already does panic()
when STACK_END_MAGIC is corrupted. So we will _not_ break the safety policy if
we do panic() in a similar situation in check_alloca().

2.b. If CONFIG_SCHED_STACK_END_CHECK is disabled, the user has some real reasons
not to do panic() when the kernel stack is corrupted. So we should not do it in
check_alloca() as well, just use BUG() and hope for the best.

That logic can be expressed this way:

if (size >= stack_left) {
#if !defined(CONFIG_VMAP_STACK) && defined(CONFIG_SCHED_STACK_END_CHECK)
	panic("alloca over the kernel stack boundary\n");
#else
	BUG();
#endif
}

I think I should add a proper comment to describe it.

Thank you.

Best regards,
Alexander


[PATCH 2/2] arm64: Clear the stack

2018-05-13 Thread Alexander Popov
Hello Mark,

Thanks a lot for your reply!

On 11.05.2018 19:13, Mark Rutland wrote:
> On Fri, May 11, 2018 at 06:50:09PM +0300, Alexander Popov wrote:
>> On 06.05.2018 11:22, Alexander Popov wrote:
>>> On 04.05.2018 14:09, Mark Rutland wrote:
>>>>>>> +   stack_left = sp & (THREAD_SIZE - 1);
>>>>>>> +   BUG_ON(stack_left < 256 || size >= stack_left - 256);
>>>>>>
>>>>>> Is this arbitrary, or is there something special about 256?
>>>>>>
>>>>>> Even if this is arbitrary, can we give it some mnemonic?
>>>>>
>>>>> It's just a reasonable number. We can introduce a macro for it.
>>>>
>>>> I'm just not sure I see the point in the offset, given things like
>>>> VMAP_STACK exist. BUG_ON() handling will likely require *more* than 256
>>>> bytes of stack, so it seems superfluous, as we'd be relying on stack
>>>> overflow detection at that point.
>>>>
>>>> I can see that we should take the CONFIG_SCHED_STACK_END_CHECK offset
>>>> into account, though.
>>>
>>> Mark, thank you for such an important remark!
>>>
>>> In Kconfig STACKLEAK implies but doesn't depend on VMAP_STACK. In fact
>>> x86_32 doesn't have VMAP_STACK at all but can have STACKLEAK.
>>>
>>> [Adding Andy Lutomirski]
>>>
>>> I've made some additional experiments: I exhaust the thread stack to
>>> have only (MIN_STACK_LEFT - 1) bytes left and then force alloca. If
>>> VMAP_STACK is disabled, BUG_ON() handling causes stack depth overflow,
>>> which is detected by SCHED_STACK_END_CHECK. If VMAP_STACK is enabled,
>>> the kernel hangs on BUG_ON() handling! Enabling CONFIG_PROVE_LOCKING
>>> gives the needed report from VMAP_STACK:
> 
> I can't see why CONFIG_VMAP_STACK would only work in conjunction with
> CONFIG_PROVE_LOCKING.
> 
> On arm64 at least, if we overflow the stack while handling a BUG(), we
> *should* trigger the overflow handler as usual, and that should work,
> unless I'm missing something.
> 
> Maybe it gets part-way into panic(), sets up some state,
> stack-overflows, and we get wedged because we're already in a panic?
> Perhaps CONFIG_PROVE_LOCKING causes more stack to be used, so it dies a
> little earlier in panic(), before setting up some state that causes
> wedging.

That seems likely. I later noticed that I had the oops=panic kernel parameter set.

> ... which sounds like something best fixed in those code paths, and not
> here.
> 
>> [...]
>>
>>> I can't say why VMAP_STACK report hangs during BUG_ON() handling on
>>> defconfig. Andy, can you give a clue?
>>>
>>> I see that MIN_STACK_LEFT = 2048 is enough for BUG_ON() handling on
>>> both x86_64 and x86_32. So I'm going to:
>>>  - set MIN_STACK_LEFT to 2048;
>>>  - improve the lkdtm test to cover this case.
>>>
>>> Mark, Kees, Laura, does it sound good?
>>>
>>> Mark, Kees, Laura, does it sound good?
>>
>>
>> Could you have a look at the following changes in check_alloca() before
>> I send the next version?
>>
>> If VMAP_STACK is enabled and alloca causes stack depth overflow, I write
>> to guard page below the thread stack to cause double fault and VMAP_STACK
>> report.
> 
> On arm64 at least, writing to the guard page will not itself trigger a
> stack overflow, but will trigger a data abort. I suspect similar is true
> on x86, if the stack pointer is sufficiently far above the guard page.

Yes, you are right, my mistake.

The comment about CONFIG_VMAP_STACK in arch/x86/kernel/traps.c says:
"If we overflow the stack into a guard page, the CPU will fail to deliver #PF
and will send #DF instead."

>> If VMAP_STACK is disabled, I use MIN_STACK_LEFT = 2048, which seems to
>> be enough for BUG_ON() handling both on x86_32 and x86_64. Unfortunately,
>> I can't guarantee that it is always enough.
> 
> I don't think that we can choose something that's guaranteed to be
> sufficient for BUG() handling and also not wasting a tonne of space
> under normal operation.
> 
> Let's figure out what's going wrong on x86 in the case that you mention,
> and try to solve that.
> 
> Here I don't think we should reserve space at all -- it's completely
> arbitrary, and as above we can't guarantee that it's sufficient anyway.
> 
>>  #ifdef CONFIG_GCC_PLUGIN_STACKLEAK
>> -#define MIN_STACK_LEFT 256
>> +#define MIN_STACK_LEFT 2048
>>
>>  void __used check_alloca(unsigned long size)
>&
