Re: [PATCH bpf] bpf: avoid false sharing of map refcount with max_entries

2018-01-09 Thread Alexei Starovoitov
On Tue, Jan 09, 2018 at 04:23:08PM +, Edward Cree wrote:
> >
> > Quoting from Goolge's Project Zero blog [1]:
> typo "Goolge".

Applied with typo fixed, thanks Daniel!



Re: [PATCH bpf] bpf: avoid false sharing of map refcount with max_entries

2018-01-09 Thread Daniel Borkmann
On 01/09/2018 05:23 PM, Edward Cree wrote:
> On 09/01/18 12:17, Daniel Borkmann wrote:
>> In addition to commit b2157399cc98 ("bpf: prevent out-of-bounds
>> speculation") also change the layout of struct bpf_map such that
>> false sharing of fast-path members like max_entries is avoided
>> when the map's reference counter is altered. Therefore enforce
>> them to be placed into separate cachelines.
>>
>> pahole dump after change:
>>
>>   struct bpf_map {
>>         const struct bpf_map_ops  * ops;                 /*     0     8 */
>>         struct bpf_map *           inner_map_meta;       /*     8     8 */
>>         void *                     security;             /*    16     8 */
>>         enum bpf_map_type          map_type;             /*    24     4 */
>>         u32                        key_size;             /*    28     4 */
>>         u32                        value_size;           /*    32     4 */
>>         u32                        max_entries;          /*    36     4 */
>>         u32                        map_flags;            /*    40     4 */
>>         u32                        pages;                /*    44     4 */
>>         u32                        id;                   /*    48     4 */
>>         int                        numa_node;            /*    52     4 */
>>         bool                       unpriv_array;         /*    56     1 */
>>
>>         /* XXX 7 bytes hole, try to pack */
>>
>>         /* --- cacheline 1 boundary (64 bytes) --- */
>>         struct user_struct *       user;                 /*    64     8 */
>>         atomic_t                   refcnt;               /*    72     4 */
>>         atomic_t                   usercnt;              /*    76     4 */
>>         struct work_struct         work;                 /*    80    32 */
>>         char                       name[16];             /*   112    16 */
>>         /* --- cacheline 2 boundary (128 bytes) --- */
>>
>>         /* size: 128, cachelines: 2, members: 17 */
>>         /* sum members: 121, holes: 1, sum holes: 7 */
>>   };
>>
>> Now all entries in the first cacheline are read-only throughout
>> the lifetime of the map, set up once during map creation. Overall
>> struct size and number of cachelines doesn't change from the
>> reordering. struct bpf_map is usually first member and embedded
>> in map structs in specific map implementations, so also avoid those
>> members to sit at the end where it could potentially share the
>> cacheline with first map values e.g. in the array since remote
>> CPUs could trigger map updates just as well for those (easily
>> dirtying members like max_entries intentionally as well) while
>> having subsequent values in cache.
>>
>> Quoting from Goolge's Project Zero blog [1]:
> typo "Goolge".

Sigh, thanks for catching! Alexei, let me know if you need a resend or
would just amend the message & fix up the typo.


Re: [PATCH bpf] bpf: avoid false sharing of map refcount with max_entries

2018-01-09 Thread Edward Cree
On 09/01/18 12:17, Daniel Borkmann wrote:
> In addition to commit b2157399cc98 ("bpf: prevent out-of-bounds
> speculation") also change the layout of struct bpf_map such that
> false sharing of fast-path members like max_entries is avoided
> when the map's reference counter is altered. Therefore enforce
> them to be placed into separate cachelines.
>
> pahole dump after change:
>
>   struct bpf_map {
>         const struct bpf_map_ops  * ops;                 /*     0     8 */
>         struct bpf_map *           inner_map_meta;       /*     8     8 */
>         void *                     security;             /*    16     8 */
>         enum bpf_map_type          map_type;             /*    24     4 */
>         u32                        key_size;             /*    28     4 */
>         u32                        value_size;           /*    32     4 */
>         u32                        max_entries;          /*    36     4 */
>         u32                        map_flags;            /*    40     4 */
>         u32                        pages;                /*    44     4 */
>         u32                        id;                   /*    48     4 */
>         int                        numa_node;            /*    52     4 */
>         bool                       unpriv_array;         /*    56     1 */
>
>         /* XXX 7 bytes hole, try to pack */
>
>         /* --- cacheline 1 boundary (64 bytes) --- */
>         struct user_struct *       user;                 /*    64     8 */
>         atomic_t                   refcnt;               /*    72     4 */
>         atomic_t                   usercnt;              /*    76     4 */
>         struct work_struct         work;                 /*    80    32 */
>         char                       name[16];             /*   112    16 */
>         /* --- cacheline 2 boundary (128 bytes) --- */
>
>         /* size: 128, cachelines: 2, members: 17 */
>         /* sum members: 121, holes: 1, sum holes: 7 */
>   };
>
> Now all entries in the first cacheline are read-only throughout
> the lifetime of the map, set up once during map creation. Overall
> struct size and number of cachelines doesn't change from the
> reordering. struct bpf_map is usually first member and embedded
> in map structs in specific map implementations, so also avoid those
> members to sit at the end where it could potentially share the
> cacheline with first map values e.g. in the array since remote
> CPUs could trigger map updates just as well for those (easily
> dirtying members like max_entries intentionally as well) while
> having subsequent values in cache.
>
> Quoting from Goolge's Project Zero blog [1]:
typo "Goolge".
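
For readers following the layout argument in the quoted patch description, here is a minimal userspace sketch of the same idea: keep the read-mostly fields in the first cacheline and force the frequently written counters onto the next one. It is not the patch itself. The struct name demo_map, the helper cacheline_of(), and the DEMO_CACHELINE value of 64 are assumptions made for illustration, and C11 alignas stands in for the kernel's ____cacheline_aligned annotation.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cacheline size; matches the 64-byte boundary in the pahole dump. */
#define DEMO_CACHELINE 64

/* Hypothetical, cut-down stand-in for struct bpf_map. */
struct demo_map {
	/* Read-mostly fields: set once at creation, then only read. */
	const void *ops;
	uint32_t key_size;
	uint32_t value_size;
	uint32_t max_entries;
	uint32_t map_flags;

	/* Frequently written fields: forced onto the next cacheline so that
	 * refcount updates do not dirty the line holding max_entries.
	 */
	alignas(DEMO_CACHELINE) atomic_int refcnt;
	atomic_int usercnt;
};

/* Index of the cacheline that a given byte offset falls into. */
static inline size_t cacheline_of(size_t offset)
{
	return offset / DEMO_CACHELINE;
}

/* Compile-time check that the two groups do not share a cacheline. */
static_assert(offsetof(struct demo_map, max_entries) / DEMO_CACHELINE !=
	      offsetof(struct demo_map, refcnt) / DEMO_CACHELINE,
	      "max_entries and refcnt must not share a cacheline");
```

The static_assert turns the layout requirement into a build failure if a later field reshuffle puts the counters back next to max_entries, which is roughly what the pahole dump is used to confirm by eye.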