Re: [RFC PATCH 0/3] sk_buff: add skb extension infrastructure

2018-11-27 Thread David Miller
From: Eric Dumazet 
Date: Mon, 26 Nov 2018 15:05:02 -0800

> One thing that could be done without too much impact is to provide a cbext[]
> only for TCP packets in write/rtx queue, that is not in sk_buff but
> on the struct sk_buff_fclones
> 
> This extra space would not be 0-initialized at alloc_skb()
> and would not be copied at skb_clone()
> 
> I mentioned this idea a while back already.
> 
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 
> 73902acf2b71c8800d81b744a936a7420f33b459..c4bfc2fd98eb9723c0219d5cd8bf5b28afaf5398
>  100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1018,6 +1018,8 @@ struct sk_buff_fclones {
> struct sk_buff  skb2;
>  
> refcount_t  fclone_ref;
> +
> +   charcbext[128] __aligned(8);
>  };

So, potentially at least, MP-TCP could use this.

The down side is that it doesn't allow for the br_netfilter, skb->sp,
etc. consoldation down to one test.


Re: [RFC PATCH 0/3] sk_buff: add skb extension infrastructure

2018-11-26 Thread Eric Dumazet



On 11/26/2018 03:38 AM, Florian Westphal wrote:
> The (out-of-tree) Multipath-TCP implementation needs a significant amount
> of extra space in the skb control buffer.
> 
> Increasing skb->cb[] size in mainline is a non-starter for memory and
> and performance reasons (f.e. increase in cb size also moves several
> frequently-accessed fields to other cache lines).
>

One thing that could be done without too much impact is to provide a cbext[]
only for TCP packets in write/rtx queue, that is not in sk_buff but
on the struct sk_buff_fclones

This extra space would not be 0-initialized at alloc_skb()
and would not be copied at skb_clone()

I mentioned this idea a while back already.


diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 
73902acf2b71c8800d81b744a936a7420f33b459..c4bfc2fd98eb9723c0219d5cd8bf5b28afaf5398
 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1018,6 +1018,8 @@ struct sk_buff_fclones {
struct sk_buff  skb2;
 
refcount_t  fclone_ref;
+
+   charcbext[128] __aligned(8);
 };
 
 /**




Re: [RFC PATCH 0/3] sk_buff: add skb extension infrastructure

2018-11-26 Thread Florian Westphal
David Miller  wrote:
> From: Florian Westphal 
> Date: Mon, 26 Nov 2018 22:19:33 +0100
> 
> >> In the future please document what is so enormous and absolutely
> >> required that they must put it all into the SKB control block.
> > 
> > Ok, will do.
> 
> Why don't we establish the details about what MP-TCP needs in the CB
> before we discuss this further.
> 
> Could you do that for us?

Sure.

There is also an MPTCP conf call this week and I'll present this
patchset there as well.

I'll post the requirements and what possible alternatives have
been considered here as a follow-up.

Thanks,
Florian


Re: [RFC PATCH 0/3] sk_buff: add skb extension infrastructure

2018-11-26 Thread David Miller
From: Florian Westphal 
Date: Mon, 26 Nov 2018 22:19:33 +0100

>> In the future please document what is so enormous and absolutely
>> required that they must put it all into the SKB control block.
> 
> Ok, will do.

Why don't we establish the details about what MP-TCP needs in the CB
before we discuss this further.

Could you do that for us?

I'm open minded about your approach still, now that I've taken the
xfrm array and nf_bridge aspects into consideration.  So please keep
pursuing this.

Thanks Florian.


Re: [RFC PATCH 0/3] sk_buff: add skb extension infrastructure

2018-11-26 Thread Florian Westphal
David Miller  wrote:
> > This adds an extension infrastructure for sk_buff instead:
> > 1. extension memory is released when the sk_buff is free'd.
> > 2. data is shared after cloning an skb.
> > 3. adding extension to an skb will COW the extension
> >buffer if needed.
> 
> So MP-TCP, when enabled for a connection, will have a new atomic
> operation for every packet?

Yes, at least for every kfree_skb call.

> And new tests all in the fast paths of the networking to facilitate
> this feature, a cost paid by everyone.

No, right now everyone has two non-atomic tests (skb->sp + skb->nf_bridge),
with this proposal everyone has one (skb->active_extensions), assuming that
both br_nf and xfrm are converted to use the extension system.

Test(s) occur both on copy/clone and kfree_skb, just like in current
kernels.

atomic test(s) are done in case skb->{sp,nf_bridge} are set, with
this patch its done if skb->active_exensions is != 0.

So from that angle current status is kept.

Main motivation was to find a solution that does not add more costs
for normal cases.

I did a quick hack to also convert skb->sp, it seems possible to do so.

In that case skbuff size is reduced by 8 bytes as sp/nf_bridge get
replaced by single 'extension pointer', and slightly less code
provided kernel is built with both XFRM and bridge netfilter support.

> Sorry, that doesn't seem like a good idea to me.
>
> Can't they just encode whatever huge amount of crap they want to
> put into the CB by deriving the information from skb->sk and some
> tiny value like an index or something to resolve the path?

Perhaps, if thats the only way I'm afraid thats what will need to be
used.  I did try such a scheme once in the past to get
rid of skb->nf_bridge and things became very very fugly due to
kfree_skb() not being aware of such 'external storage', i.e. no
way to easily clean the external storage when an skbuff gets tossed.

Might be possibe to use destructor to take care of this in mptcp case.
I can have a look if this is the only possible way.

> In the future please document what is so enormous and absolutely
> required that they must put it all into the SKB control block.

Ok, will do.

> Like Eric, I am concerned about the slow creep of overhead.  Lots of
> small "not that bad" additions of extra cycles here and there over
> time adds up to impossible to fix performance regressions.

I have the same concern, which is why i am proposing the conversion
of xfrm and nf_bridge to use this instead of the current
nf_bridge/secpath maintanance.

Although MPTCP is the main motivation here, it was intended as a
standalone series, i.e., these 3 patches and a few followup changes
to convert xfrm.

> I'm sorry if this is a major disappointment for the MP-TCP folks but a
> better way needs to be found to integrate what they want to do with
> real zero cost for the rest of the world which won't be using MP-TCP
> and therefore should not be paying for it's added overhead at all.

Agreed.


Re: [RFC PATCH 0/3] sk_buff: add skb extension infrastructure

2018-11-26 Thread David Miller
From: Florian Westphal 
Date: Mon, 26 Nov 2018 12:38:54 +0100

> This adds an extension infrastructure for sk_buff instead:
> 1. extension memory is released when the sk_buff is free'd.
> 2. data is shared after cloning an skb.
> 3. adding extension to an skb will COW the extension
>buffer if needed.

So MP-TCP, when enabled for a connection, will have a new atomic
operation for every packet?

And new tests all in the fast paths of the networking to facilitate
this feature, a cost paid by everyone.

Sorry, that doesn't seem like a good idea to me.

Can't they just encode whatever huge amount of crap they want to
put into the CB by deriving the information from skb->sk and some
tiny value like an index or something to resolve the path?

In the future please document what is so enormous and absolutely
required that they must put it all into the SKB control block.

Like Eric, I am concerned about the slow creep of overhead.  Lots of
small "not that bad" additions of extra cycles here and there over
time adds up to impossible to fix performance regressions.

I'm sorry if this is a major disappointment for the MP-TCP folks but a
better way needs to be found to integrate what they want to do with
real zero cost for the rest of the world which won't be using MP-TCP
and therefore should not be paying for it's added overhead at all.