Re: [PATCH 4.4-stable 6/6] bpf: prevent out-of-bounds speculation

2018-01-12 Thread Eric Dumazet
On Fri, 2018-01-12 at 17:17 +0100, Jiri Slaby wrote:
> From: Alexei Starovoitov 
> 
> commit b2157399cc9898260d6031c5bfe45fe137c1fbe7 upstream.
> 
> Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
> memory accesses under a bounds check may be speculated even if the
> bounds check fails, providing a primitive for building a side channel.
> 

Make sure to also backport

https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=bbeb6e4323dad9b5e0ee9f60c223dd532e2403b1
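
That one fixes an overflow/undefined behavior in the hunk touched
below: for an attr->max_entries with the upper bit set,
roundup_pow_of_two() ends up evaluating 1UL << 32, and on the unpriv
path 'index_mask + 1' then wraps to 0, leaving a near-empty allocation
guarded by an all-ones mask. A rough sketch with illustrative values:

	u32 max_entries = 0x80000001;
	/* roundup_pow_of_two() evaluates 1UL << 32 here: undefined
	 * behavior where unsigned long is 32 bits
	 */
	u32 index_mask = roundup_pow_of_two(max_entries) - 1; /* 0xffffffff */
	/* unpriv: padded size wraps to 0, so the array is allocated
	 * with no elements while the mask still permits any index
	 */
	max_entries = index_mask + 1;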



[PATCH 4.4-stable 6/6] bpf: prevent out-of-bounds speculation

2018-01-12 Thread Jiri Slaby
From: Alexei Starovoitov 

commit b2157399cc9898260d6031c5bfe45fe137c1fbe7 upstream.

Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.
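
As an illustration (this snippet is not from the patch, just the
generic shape of such a gadget against an array map):

	/* the CPU may predict this branch as taken and issue the load
	 * before the comparison resolves; the load's cache footprint
	 * then depends on out-of-bounds memory
	 */
	if (index < array->map.max_entries)
		val = *(u64 *)(array->value + array->elem_size * index);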

To avoid leaking kernel data, round up array-based maps and mask the
index after the bounds check, so a speculated load with an out-of-bounds
index will load either a valid value from the array or zero from the
padded area.
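
For example (illustrative numbers, not from the patch): a map created
with max_entries == 5 gets 8 zero-initialized slots, and a lookup
becomes

	index_mask = roundup_pow_of_two(5) - 1;	/* 8 - 1 = 7 */
	/* in or out of bounds, the masked index stays inside the
	 * 8 padded slots, so nothing outside the array can be read
	 */
	elem = array->value + array->elem_size * (index & index_mask);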

Unconditionally mask the index for all array types, even when
max_entries is not rounded to a power of 2 for the root user.
When the map is created by an unprivileged user, generate a sequence of
bpf insns that includes an AND operation to make sure that the JITed
code includes the same 'index & index_mask' operation (see the insn
sketch after the tail_call example below).

If a prog_array map is created by an unprivileged user, replace
  bpf_tail_call(ctx, map, index);
with
  if (index < max_entries) {
          index &= map->index_mask;
          bpf_tail_call(ctx, map, index);
  }
(along with roundup to power of 2) to prevent out-of-bounds speculation.
There is a secondary, redundant 'if (index >= max_entries)' check in
the interpreter and in all JITs, but it can be optimized away later if
necessary.
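
Concretely, the rewrite the verifier emits looks roughly like this (a
sketch modeled on the upstream fixup_bpf_calls() change; R3 holds the
index argument of bpf_tail_call):

	/* if (index >= max_entries) skip the masked tail call */
	insn_buf[0] = BPF_JMP_IMM(BPF_JGE, BPF_REG_3,
				  map_ptr->max_entries, 2);
	/* index &= index_mask, so even a mispredicted JGE cannot
	 * speculate past the padded area
	 */
	insn_buf[1] = BPF_ALU32_IMM(BPF_AND, BPF_REG_3,
				    container_of(map_ptr, struct bpf_array,
						 map)->index_mask);
	insn_buf[2] = *insn;	/* the original tail call */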

Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
cannot be used by unpriv, so no changes there.

That fixes the bpf side of "Variant 1: bounds check bypass
(CVE-2017-5753)" on all architectures, with and without JIT.

v2->v3:
Daniel noticed that the attack can potentially be crafted via syscall
commands without loading a program, so add masking to those paths as
well.

[js] backport -- no percpu arrays etc.; idx in check_call; map_ptr in struct
 bpf_insn_aux_data

Signed-off-by: Alexei Starovoitov 
Acked-by: John Fastabend 
Signed-off-by: Daniel Borkmann 
Signed-off-by: Jiri Slaby 
---
 include/linux/bpf.h   |  2 ++
 kernel/bpf/arraymap.c | 24 +++-
 kernel/bpf/verifier.c | 46 ++
 3 files changed, 63 insertions(+), 9 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4f6d29c8e3d8..f2157159b26f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -37,6 +37,7 @@ struct bpf_map {
u32 value_size;
u32 max_entries;
u32 pages;
+   bool unpriv_array;
struct user_struct *user;
const struct bpf_map_ops *ops;
struct work_struct work;
@@ -141,6 +142,7 @@ struct bpf_prog_aux {
 struct bpf_array {
struct bpf_map map;
u32 elem_size;
+   u32 index_mask;
/* 'ownership' of prog_array is claimed by the first program that
 * is going to use this map or by the first program which FD is stored
 * in the map to make sure that all callers and callees have the same
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index b0799bced518..56f8a8306a49 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -20,8 +20,9 @@
 /* Called from syscall */
 static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 {
+   u32 elem_size, array_size, index_mask, max_entries;
+   bool unpriv = !capable(CAP_SYS_ADMIN);
struct bpf_array *array;
-   u32 elem_size, array_size;
 
/* check sanity of attributes */
if (attr->max_entries == 0 || attr->key_size != 4 ||
@@ -36,12 +37,21 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 
elem_size = round_up(attr->value_size, 8);
 
+   max_entries = attr->max_entries;
+   index_mask = roundup_pow_of_two(max_entries) - 1;
+
+   if (unpriv)
+   /* round up array size to nearest power of 2,
+* since cpu will speculate within index_mask limits
+*/
+   max_entries = index_mask + 1;
+
/* check round_up into zero and u32 overflow */
if (elem_size == 0 ||
-   attr->max_entries > (U32_MAX - PAGE_SIZE - sizeof(*array)) / elem_size)
+   max_entries > (U32_MAX - PAGE_SIZE - sizeof(*array)) / elem_size)
return ERR_PTR(-ENOMEM);
 
-   array_size = sizeof(*array) + attr->max_entries * elem_size;
+   array_size = sizeof(*array) + max_entries * elem_size;
 
/* allocate all map elements and zero-initialize them */
array = kzalloc(array_size, GFP_USER | __GFP_NOWARN);
@@ -50,6 +60,8 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
if (!array)
return ERR_PTR(-ENOMEM);
}
+   array->index_mask = index_mask;
+   array->map.unpriv_array = unpriv;
 
/* copy mandatory map attributes */
array->map.key_size = attr->key_size;
@@ -70,7 +82,7 @@ static void *array_map_lookup_elem(struct bpf_map *map, void *key)
if (index >= array->map.max_entries)
return NULL;
 
-   return array->value + array->elem_size * index;
+   return array->value + array->elem_size * (index & array->index_mask);
 }
 
 /* Called from sysc