RE: [PATCH net-next v2] net: core: change bool members of struct net_device to bitfield members

2018-10-10 Thread David Laight
From: Eric Dumazet
> Sent: 09 October 2018 21:52
> 
> On 10/09/2018 01:24 PM, Heiner Kallweit wrote:
> 
> > Reordering the struct members to fill the holes could be a little tricky
> > and could have side effects because it may make a performance difference
> > whether certain members are in one cacheline or not.
> > And whether it's worth to spend this effort (incl. the related risks)
> > just to save a few bytes (also considering that typically we have quite
> > few instances of struct net_device)?
> 
> Not really.
> 
> In fact we probably should spend time reordering fields for performance,
> since some new fields were added a bit randomly, breaking the goal of data 
> locality.
> 
> Some fields are used in the control path only and could be moved out of the
> cache lines needed in the data path (fast path).

Interesting thought.
The memory allocator rounds sizes up to a power of 2 and gives out memory
aligned to that value.
This means that the cache lines just above powers of 2 are used far
more frequently than those below one.
This will be made worse because the commonly used fields are normally at
the start of a structure.
This ought to be measurable?

Has anyone tried randomly splitting the padding between the start
and end of the allocation (while maintaining cache alignment)?
(Not sure how this would affect kfree().)

Or splitting pages (or small groups of pages) into non-power of 2
size blocks?
For instance a 4k page holds either three 1344 (21*64) byte blocks or
five 768 (12*64) byte blocks.
These could give a significant footprint reduction as well as
balancing out cache line usage. 

I also wonder whether it is right to add a lot of padding to cache-line
align structure members on systems with large cache lines.
The intention is probably to get a few fields into the same cache line,
not to add padding that may be larger than the aggregate size of the fields.

Oh - and it is somewhat pointless because kmalloc() isn't guaranteed
to give out cache-line aligned buffers.

David



Re: [PATCH net-next v2] net: core: change bool members of struct net_device to bitfield members

2018-10-09 Thread Eric Dumazet



On 10/09/2018 01:24 PM, Heiner Kallweit wrote:

> Reordering the struct members to fill the holes could be a little tricky
> and could have side effects because it may make a performance difference
> whether certain members are in one cacheline or not.
> And whether it's worth to spend this effort (incl. the related risks)
> just to save a few bytes (also considering that typically we have quite
> few instances of struct net_device)?

Not really.

In fact we probably should spend time reordering fields for performance,
since some new fields were added a bit randomly, breaking the goal of data 
locality.

Some fields are used in the control path only and could be moved out of the
cache lines needed in the data path (fast path).



Re: [PATCH net-next v2] net: core: change bool members of struct net_device to bitfield members

2018-10-09 Thread David Ahern
On 10/9/18 2:24 PM, Heiner Kallweit wrote:
> Reordering the struct members to fill the holes could be a little tricky
> and could have side effects because it may make a performance difference
> whether certain members are in one cacheline or not.
> And whether it's worth to spend this effort (incl. the related risks)
> just to save a few bytes (also considering that typically we have quite
> few instances of struct net_device)?
> 

It would be good to get net_device below 2048 without affecting
performance. Anything else is just moving elements around for the same
end allocation (rounds up to 4096).


Re: [PATCH net-next v2] net: core: change bool members of struct net_device to bitfield members

2018-10-09 Thread Heiner Kallweit
On 09.10.2018 17:20, David Ahern wrote:
> On 10/8/18 2:17 PM, Heiner Kallweit wrote:
>> bool is good as parameter type or function return type, but if used
>> for struct members it consumes more memory than needed.
>> Changing the bool members of struct net_device to bitfield members
>> allows to decrease the memory footprint of this struct.
> 
> What does pahole show for the size of the struct before and after? I
> suspect you have not really changed the size and certainly not the
> actual memory allocated.
> 
> 
Thanks for the hint to use pahole. Indeed we gain nothing,
so there's no justification for this patch.

before:
/* size: 2496, cachelines: 39, members: 116 */
/* sum members: 2396, holes: 8, sum holes: 80 */
/* padding: 20 */
/* paddings: 4, sum paddings: 19 */
/* bit_padding: 31 bits */

after:  
/* size: 2496, cachelines: 39, members: 116 */
/* sum members: 2394, holes: 8, sum holes: 82 */
/* bit holes: 1, sum bit holes: 8 bits */
/* padding: 20 */
/* paddings: 4, sum paddings: 19 */
/* bit_padding: 27 bits */

The biggest hole is here, because _tx is annotated to be cacheline-aligned.

struct hlist_node          index_hlist;          /*   888    16 */

/* XXX 56 bytes hole, try to pack */

/* --- cacheline 15 boundary (960 bytes) --- */
struct netdev_queue *      _tx;                  /*   960     8 */

Reordering the struct members to fill the holes could be a little tricky
and could have side effects because it may make a performance difference
whether certain members are in one cacheline or not.
And whether it's worth to spend this effort (incl. the related risks)
just to save a few bytes (also considering that typically we have quite
few instances of struct net_device)?


Re: [PATCH net-next v2] net: core: change bool members of struct net_device to bitfield members

2018-10-09 Thread David Ahern
On 10/8/18 2:17 PM, Heiner Kallweit wrote:
> bool is good as parameter type or function return type, but if used
> for struct members it consumes more memory than needed.
> Changing the bool members of struct net_device to bitfield members
> allows to decrease the memory footprint of this struct.

What does pahole show for the size of the struct before and after? I
suspect you have not really changed the size and certainly not the
actual memory allocated.