Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-09-27 Thread Jiri Benc
On Tue, 27 Sep 2016 10:38:41 -0600, David Ahern wrote:
> On 9/27/16 1:45 AM, Jiri Benc wrote:
> > On Mon, 26 Sep 2016 20:04:06 -0600, David Ahern wrote:
> >> you know this code better than me, but key_extract pulls the eth
> >> header and then sets network header. If MPLS labels are present then
> >> it is the labels that the network_header now points to. How did you come
> >> to the conclusion it is after the labels?
> > 
> > Look ~100 lines below that, to "if (eth_p_mpls(key->eth.type))".
> > There's a while loop advancing network header.
> 
> got it, thanks. so that block can drop the while loop and just set 
> mpls.top_lse

I think we still need to traverse the loop to set inner_network_header.

 Jiri
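
For readers following along, the traversal mentioned above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the OVS code: toy_find_inner_offset(), toy_read_lse() and the TOY_* constants are hypothetical names, and the packet is a flat byte array rather than an skb. It walks the label stack entries until it finds the one with the bottom-of-stack bit set, records the top entry, and reports where the inner network header would begin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MPLS_HLEN 4u              /* one label stack entry is 4 bytes */
#define TOY_MPLS_BOS_MASK 0x00000100u /* S bit of an entry, host byte order */

/* Read one big-endian 32-bit label stack entry from the packet. */
static uint32_t toy_read_lse(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Walk the label stack starting at mpls_off. Returns the offset of the
 * first byte after the entry that has the bottom-of-stack bit set, which
 * is where the inner network header would start. Returns -1 if the buffer
 * ends before such an entry is found. *top_lse receives the first entry.
 */
static long toy_find_inner_offset(const uint8_t *pkt, size_t len,
				  size_t mpls_off, uint32_t *top_lse)
{
	size_t off = mpls_off;
	int first = 1;

	assert(pkt != NULL && top_lse != NULL);
	while (off + TOY_MPLS_HLEN <= len) {
		uint32_t lse = toy_read_lse(pkt + off);

		if (first) {
			*top_lse = lse;
			first = 0;
		}
		off += TOY_MPLS_HLEN;
		if (lse & TOY_MPLS_BOS_MASK)
			return (long)off;
	}
	return -1;
}
```

The point of the sketch is that recording only the top entry is not enough to know where the inner header starts: the number of entries is not known until the bottom-of-stack bit is seen, so the walk is still needed.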


Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-09-27 Thread David Ahern
On 9/27/16 1:45 AM, Jiri Benc wrote:
> On Mon, 26 Sep 2016 20:04:06 -0600, David Ahern wrote:
>> you know this code better than me, but key_extract pulls the eth
>> header and then sets network header. If MPLS labels are present then
>> it is the labels that the network_header now points to. How did you come
>> to the conclusion it is after the labels?
> 
> Look ~100 lines below that, to "if (eth_p_mpls(key->eth.type))".
> There's a while loop advancing network header.

got it, thanks. so that block can drop the while loop and just set mpls.top_lse


Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-09-27 Thread Jiri Benc
On Mon, 26 Sep 2016 20:04:06 -0600, David Ahern wrote:
> you know this code better than me, but key_extract pulls the eth
> header and then sets network header. If MPLS labels are present then
> it is the labels that the network_header now points to. How did you come
> to the conclusion it is after the labels?

Look ~100 lines below that, to "if (eth_p_mpls(key->eth.type))".
There's a while loop advancing network header.

 Jiri


Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-09-26 Thread David Ahern
On 9/26/16 11:02 AM, Jiri Benc wrote:
> On Mon, 26 Sep 2016 17:56:22 +0200, Jiri Benc wrote:
>> After push_mpls, network_header points to the start of MPLS headers.
>> Which I understand was the point of this patch. However, push_mpls also
>> calls invalidate_flow_key. Meaning that, depending on actions, we may
>> end up calling key_extract soon after. And key_extract sets the network
>> header *after* the MPLS headers.

you know this code better than me, but key_extract pulls the eth header and 
then sets network header. If MPLS labels are present then it is the labels that 
the network_header now points to. How did you come to the conclusion it is after 
the labels?

>>
>> That means that on output, for otherwise identical packet,
>> network_header can point before or after MPLS headers based on what
>> actions happened to be executed (recirculation, mainly).
>>
>> If I'm not misreading the code or missing something, this can't be
>> right.
>>
>> mpls_gso_segment does not care, it resets the network_header anyway.
>> What about drivers? What is the correct behavior?
> 
> Answering to myself: it breaks skb_mac_gso_segment. Seems we need to
> fix key_extract to set network_header to the beginning of MPLS headers.
> I'll prepare a patch.
> 
>  Jiri
> 



Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-09-26 Thread Jiri Benc
On Mon, 26 Sep 2016 17:56:22 +0200, Jiri Benc wrote:
> After push_mpls, network_header points to the start of MPLS headers.
> Which I understand was the point of this patch. However, push_mpls also
> calls invalidate_flow_key. Meaning that, depending on actions, we may
> end up calling key_extract soon after. And key_extract sets the network
> header *after* the MPLS headers.
> 
> That means that on output, for otherwise identical packet,
> network_header can point before or after MPLS headers based on what
> actions happened to be executed (recirculation, mainly).
> 
> If I'm not misreading the code or missing something, this can't be
> right.
> 
> mpls_gso_segment does not care, it resets the network_header anyway.
> What about drivers? What is the correct behavior?

Answering myself: it breaks skb_mac_gso_segment. It seems we need to
fix key_extract to set network_header to the beginning of the MPLS headers.
I'll prepare a patch.

 Jiri


Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-09-26 Thread Jiri Benc
On Wed, 24 Aug 2016 10:37:51 -0600, David Ahern wrote:
> Something like this should be able to handle multiple labels. The
> inner network header is set once and the outer one pointing to MPLS
> is adjusted each time a label is pushed:
> 
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 1ecbd7715f6d..0f37b17e3a73 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -162,10 +162,16 @@ static int push_mpls(struct sk_buff *skb, struct
> sw_flow_key *key,
> if (skb_cow_head(skb, MPLS_HLEN) < 0)
> return -ENOMEM;
> 
> +   if (!skb->inner_protocol) {
> +   skb_set_inner_network_header(skb, skb->mac_len);
> +   skb_set_inner_protocol(skb, skb->protocol);
> +   }
> +
> skb_push(skb, MPLS_HLEN);
> memmove(skb_mac_header(skb) - MPLS_HLEN, skb_mac_header(skb),
> skb->mac_len);
> skb_reset_mac_header(skb);
> +   skb_set_network_header(skb, skb->mac_len);

Sorry for chiming in after a month. The code above got in
(48d2ab609b6bb), I'm currently looking at this and it looks very
suspicious to me.

After push_mpls, network_header points to the start of MPLS headers.
Which I understand was the point of this patch. However, push_mpls also
calls invalidate_flow_key. Meaning that, depending on actions, we may
end up calling key_extract soon after. And key_extract sets the network
header *after* the MPLS headers.

That means that on output, for otherwise identical packet,
network_header can point before or after MPLS headers based on what
actions happened to be executed (recirculation, mainly).

If I'm not misreading the code or missing something, this can't be
right.

mpls_gso_segment does not care, it resets the network_header anyway.
What about drivers? What is the correct behavior?

 Jiri
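
The inconsistency described above can be reproduced with a toy userspace model. This is a sketch under stated assumptions and not the kernel code: struct toy_skb and the toy_* functions are hypothetical stand-ins, toy_push_mpls() mimics only the offset handling of the quoted patch, and toy_reparse_mpls() mimics a key_extract-style loop that advances the network header past the labels.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_HLEN 14u
#define TOY_MPLS_HLEN 4u
#define TOY_MPLS_BOS_MASK 0x00000100u /* S bit, host byte order */

/* Toy stand-in for struct sk_buff; all offsets are relative to buf. */
struct toy_skb {
	uint8_t buf[64];
	unsigned int data;           /* start of packet data */
	unsigned int len;            /* bytes of packet data */
	unsigned int mac_len;
	unsigned int network_header;
};

/* Ethernet header plus 4 payload bytes, with headroom for labels. */
static void toy_init(struct toy_skb *skb)
{
	memset(skb, 0, sizeof(*skb));
	skb->data = 32;
	skb->len = TOY_ETH_HLEN + 4;
	skb->mac_len = TOY_ETH_HLEN;
	skb->network_header = skb->data + skb->mac_len;
}

/* Push one entry after the MAC header; network header -> start of MPLS. */
static void toy_push_mpls(struct toy_skb *skb, uint32_t lse)
{
	uint8_t *p;

	assert(skb->data >= TOY_MPLS_HLEN);
	memmove(skb->buf + skb->data - TOY_MPLS_HLEN, skb->buf + skb->data,
		skb->mac_len);
	skb->data -= TOY_MPLS_HLEN;
	skb->len += TOY_MPLS_HLEN;
	p = skb->buf + skb->data + skb->mac_len;
	p[0] = (uint8_t)(lse >> 24);
	p[1] = (uint8_t)(lse >> 16);
	p[2] = (uint8_t)(lse >> 8);
	p[3] = (uint8_t)lse;
	skb->network_header = skb->data + skb->mac_len;
}

/* Re-parse: advance the network header past every label stack entry. */
static void toy_reparse_mpls(struct toy_skb *skb)
{
	unsigned int off = skb->data + skb->mac_len;
	unsigned int end = skb->data + skb->len;

	while (off + TOY_MPLS_HLEN <= end) {
		const uint8_t *p = skb->buf + off;
		uint32_t lse = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
			       ((uint32_t)p[2] << 8) | (uint32_t)p[3];

		off += TOY_MPLS_HLEN;
		if (lse & TOY_MPLS_BOS_MASK)
			break;
	}
	skb->network_header = off;
}

static unsigned int toy_network_offset(const struct toy_skb *skb)
{
	return skb->network_header - skb->data;
}
```

In this model the same packet reports a network offset of mac_len right after the push and mac_len plus the label stack length after a re-parse, which is the "before or after MPLS headers" ambiguity raised above.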


Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-08-24 Thread David Ahern
On 8/24/16 12:53 PM, David Ahern wrote:
> What change is needed in pop_mpls? It already resets the mac_header and if 
> MPLS labels are removed there is no need to set network_header. I take it you 
> mean if the protocol is still MPLS and there are still labels then the 
> network header needs to be set and that means finding the bottom label. Does 
> OVS set the bottom of stack bit? From what I can tell OVS is not parsing the 
> MPLS label so no requirement that BOS is set. Without that there is no way to 
> tell when the labels are done short of guessing.

I was confusing the inner network layer with the mpls network header. Just sent 
a v4. Can you verify it works for single and multiple labels with OVS?


Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-08-24 Thread pravin shelar
On Wed, Aug 24, 2016 at 11:53 AM, David Ahern wrote:
> On 8/24/16 11:41 AM, pravin shelar wrote:
>> You also need to change pop_mpls().
>
> What change is needed in pop_mpls? It already resets the mac_header and if 
> MPLS labels are removed there is no need to set network_header. I take it you 
> mean if the protocol is still MPLS and there are still labels then the 
> network header needs to be set and that means finding the bottom label. Does 
> OVS set the bottom of stack bit? From what I can tell OVS is not parsing the 
> MPLS label so no requirement that BOS is set. Without that there is no way to 
> tell when the labels are done short of guessing.
>

The OVS MPLS push and pop actions work on the outermost MPLS label. So,
according to the new MPLS offset tracking scheme, you need to adjust the
skb network offset on the mpls_pop action.


Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-08-24 Thread David Ahern
On 8/24/16 11:41 AM, pravin shelar wrote:
> You also need to change pop_mpls().

What change is needed in pop_mpls? It already resets the mac_header and if MPLS 
labels are removed there is no need to set network_header. I take it you mean 
if the protocol is still MPLS and there are still labels then the network 
header needs to be set and that means finding the bottom label. Does OVS set 
the bottom of stack bit? From what I can tell OVS is not parsing the MPLS label 
so no requirement that BOS is set. Without that there is no way to tell when 
the labels are done short of guessing.

> 
> Anyways I was thinking about the neigh output functions skb pull
> issue, where it is using network-header offset. Can we use mac_len?
> this way we would not use any inner offsets for MPLS skb and current
> scheme used by OVS datapath works.

neigh_resolve_output and neigh_connected_output both do an __skb_pull to the 
network offset. When these functions are called there may or may not be a mac 
header set in the skb making the mac_header unreliable for how you want to use 
it. e.g. I tried this:

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 2ae929f9bd06..9f20a0b8e6be 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1292,12 +1292,16 @@ int neigh_resolve_output(struct neighbour *neigh, 
struct sk_buff *skb)
int err;
struct net_device *dev = neigh->dev;
unsigned int seq;
+   unsigned int offset = skb_network_offset(skb);
+
+   if (unlikely(skb_mac_header_was_set(skb)))
+   offset = skb_mac_header(skb) - skb->data;

if (dev->header_ops->cache && !neigh->hh.hh_len)
neigh_hh_init(neigh);

do {
-   __skb_pull(skb, skb_network_offset(skb));
+   __skb_pull(skb, offset);
seq = read_seqbegin(&neigh->ha_lock);
err = dev_hard_header(skb, dev, ntohs(skb->protocol),
  neigh->ha, NULL, skb->len);


It does not work. The MPLS packet goes down the stack fine, but when the packet 
is forwarded from one namespace to another you can get a panic since it hits 
neigh_resolve_output with a mac header and the pull above will do the wrong 
thing.

[   18.254133] BUG: unable to handle kernel paging request at 88023860404a
[   18.255566] IP: [] eth_header+0x40/0xaf
[   18.256649] PGD 1c40067 PUD 0
[   18.257277] Oops: 0002 [#1] SMP
[   18.257872] Modules linked in: veth 8021q garp mrp stp llc vrf
[   18.259168] CPU: 2 PID: 868 Comm: ping Not tainted 4.8.0-rc2+ #81
[   18.260308] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.7.5-20140531_083030-gandalf 04/01/2014
[   18.262184] task: 88013ab61040 task.stack: 88013509
[   18.263285] RIP: 0010:[]  [] 
eth_header+0x40/0xaf
[   18.264762] RSP: 0018:88013fd03c80  EFLAGS: 00010216
[   18.265791] RAX: 88023860403e RBX: 0008 RCX: 88013a5c18a0
[   18.267040] RDX: 88023860403e RSI: 000e RDI: 88013ab0a200
[   18.268307] RBP: 88013fd03ca8 R08:  R09: 0058
[   18.269556] R10: 88023860403e R11:  R12: 88013a5c18a0
[   18.270807] R13: 880135b0b000 R14: 880135b0b000 R15: 88013a5c1828
[   18.272064] FS:  7fbc44b66700() GS:88013fd0() 
knlGS:
[   18.273477] CS:  0010 DS:  ES:  CR0: 80050033
[   18.274492] CR2: 88023860404a CR3: 0001350c8000 CR4: 000406e0
[   18.275746] Stack:
[   18.276125]   00580246 88013ab0a200 
0002
[   18.277519]  88013a5c1800 88013fd03cb8 813d5912 
88013fd03d00
[   18.278904]  813d73ea 88013a5c18a0 fffc01000246 
88013a5c1838
[   18.280295] Call Trace:
[   18.280712]  
[   18.281049]  [] dev_hard_header.constprop.42+0x26/0x28
[   18.282204]  [] neigh_resolve_output+0x1b9/0x270
[   18.283228]  [] neigh_update+0x372/0x497
[   18.284160]  [] arp_process+0x520/0x572
[   18.285061]  [] arp_rcv+0x10e/0x17d
[   18.285909]  [] __netif_receive_skb_core+0x3ea/0x4b8
[   18.286995]  [] __netif_receive_skb+0x16/0x66
[   18.287993]  [] process_backlog+0xa4/0x132
[   18.288935]  [] net_rx_action+0xd1/0x242
[   18.289854]  [] __do_softirq+0x100/0x26d
[   18.290764]  [] do_softirq_own_stack+0x1c/0x30
[   18.291775]  
[   18.292100]  [] do_softirq+0x30/0x3b
[   18.292968]  [] __local_bh_enable_ip+0x69/0x73
[   18.293919]  [] local_bh_enable+0x15/0x17
[   18.294798]  [] neigh_xmit+0x93/0xe3
[   18.295626]  [] mpls_xmit+0x379/0x3c0
[   18.296464]  [] lwtunnel_xmit+0x48/0x63



Generically though this approach just feels wrong. You want to lump the MPLS 
labels with the ethernet header but not formally, just by playing games with 
skb markers. The core networking stack is resisting this approach.



 


Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-08-24 Thread pravin shelar
On Wed, Aug 24, 2016 at 9:37 AM, David Ahern wrote:
> On 8/24/16 10:28 AM, pravin shelar wrote:
>>> How do you feel about implementing the do_output() idea I suggested above?
>>> I'm happy to provide testing and review.
>>
>> I am not sure about changing do_output(). why not just use same scheme
>> to track mpls header in OVS datapath as done in mpls device?
>>
>
> was just replying with the same.
>
> Something like this should be able to handle multiple labels. The inner 
> network header is set once and the outer one pointing to MPLS is adjusted 
> each time a label is pushed:
>
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 1ecbd7715f6d..0f37b17e3a73 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -162,10 +162,16 @@ static int push_mpls(struct sk_buff *skb, struct 
> sw_flow_key *key,
> if (skb_cow_head(skb, MPLS_HLEN) < 0)
> return -ENOMEM;
>
> +   if (!skb->inner_protocol) {
> +   skb_set_inner_network_header(skb, skb->mac_len);
> +   skb_set_inner_protocol(skb, skb->protocol);
> +   }
> +
> skb_push(skb, MPLS_HLEN);
> memmove(skb_mac_header(skb) - MPLS_HLEN, skb_mac_header(skb),
> skb->mac_len);
> skb_reset_mac_header(skb);
> +   skb_set_network_header(skb, skb->mac_len);
>
> new_mpls_lse = (__be32 *)skb_mpls_header(skb);
> *new_mpls_lse = mpls->mpls_lse;
> @@ -173,8 +179,7 @@ static int push_mpls(struct sk_buff *skb, struct 
> sw_flow_key *key,
> skb_postpush_rcsum(skb, new_mpls_lse, MPLS_HLEN);
>
> update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype);
> -   if (!skb->inner_protocol)
> -   skb_set_inner_protocol(skb, skb->protocol);
> +
> skb->protocol = mpls->mpls_ethertype;
>
> invalidate_flow_key(key);
>
>
>
>
> If it does, what else needs to be changed in OVS to handle the network layer 
> now pointing to the MPLS labels?
>
You also need to change pop_mpls().

Anyways I was thinking about the neigh output functions skb pull
issue, where it is using network-header offset. Can we use mac_len?
this way we would not use any inner offsets for MPLS skb and current
scheme used by OVS datapath works.


Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-08-24 Thread David Ahern
On 8/24/16 10:28 AM, pravin shelar wrote:
>> How do you feel about implementing the do_output() idea I suggested above?
>> I'm happy to provide testing and review.
> 
> I am not sure about changing do_output(). why not just use same scheme
> to track mpls header in OVS datapath as done in mpls device?
> 

was just replying with the same. 

Something like this should be able to handle multiple labels. The inner network 
header is set once and the outer one pointing to MPLS is adjusted each time a 
label is pushed:

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 1ecbd7715f6d..0f37b17e3a73 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -162,10 +162,16 @@ static int push_mpls(struct sk_buff *skb, struct 
sw_flow_key *key,
if (skb_cow_head(skb, MPLS_HLEN) < 0)
return -ENOMEM;

+   if (!skb->inner_protocol) {
+   skb_set_inner_network_header(skb, skb->mac_len);
+   skb_set_inner_protocol(skb, skb->protocol);
+   }
+
skb_push(skb, MPLS_HLEN);
memmove(skb_mac_header(skb) - MPLS_HLEN, skb_mac_header(skb),
skb->mac_len);
skb_reset_mac_header(skb);
+   skb_set_network_header(skb, skb->mac_len);

new_mpls_lse = (__be32 *)skb_mpls_header(skb);
*new_mpls_lse = mpls->mpls_lse;
@@ -173,8 +179,7 @@ static int push_mpls(struct sk_buff *skb, struct 
sw_flow_key *key,
skb_postpush_rcsum(skb, new_mpls_lse, MPLS_HLEN);

update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype);
-   if (!skb->inner_protocol)
-   skb_set_inner_protocol(skb, skb->protocol);
+
skb->protocol = mpls->mpls_ethertype;

invalidate_flow_key(key);




If it does, what else needs to be changed in OVS to handle the network layer 
now pointing to the MPLS labels?
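
The "set once, adjust on every push" idea in the diff above can be checked with a small offset-only model. This is a hedged sketch and not kernel code: struct toy_skb, toy_push_mpls() and toy_mpls_hlen() are hypothetical names, and only the offset and protocol bookkeeping is modelled (no packet bytes are moved).

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MPLS_HLEN 4u

/* Offsets are absolute (relative to the buffer head), as in an skb. */
struct toy_skb {
	unsigned int data;
	unsigned int mac_len;
	unsigned int network_header;
	unsigned int inner_network_header;
	uint16_t protocol;
	uint16_t inner_protocol;
};

static void toy_push_mpls(struct toy_skb *skb, uint16_t mpls_ethertype)
{
	assert(skb->data >= TOY_MPLS_HLEN);

	/* First push only: remember where the encapsulated header starts. */
	if (!skb->inner_protocol) {
		skb->inner_network_header = skb->data + skb->mac_len;
		skb->inner_protocol = skb->protocol;
	}

	/* Make room in front; the MAC header moves down by one entry. */
	skb->data -= TOY_MPLS_HLEN;

	/* Every push: network header points at the new top of the stack. */
	skb->network_header = skb->data + skb->mac_len;
	skb->protocol = mpls_ethertype;
}

/* Size of the label stack as the GSO code would derive it. */
static unsigned int toy_mpls_hlen(const struct toy_skb *skb)
{
	return skb->inner_network_header - skb->network_header;
}
```

With this bookkeeping the difference between the two markers grows by one entry per push, so in the model the derived label stack size is 4, 8, 12 bytes for one, two, three labels.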



Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-08-24 Thread pravin shelar
On Wed, Aug 24, 2016 at 12:20 AM, Simon Horman wrote:
> Hi David,
>
> On Tue, Aug 23, 2016 at 01:24:51PM -0600, David Ahern wrote:
>> On 8/22/16 8:51 AM, Simon Horman wrote:
>> >
>> > The scheme that OvS uses so far is that mac_len denotes the number of bytes
>> > from the start of the MAC header until its end. In the absence of MPLS that
>> > will be the beginning of the network header. And in the presence of MPLS it
>> > will be the beginning of the MPLS label stack. The network header is... the
>> > network header. This allows the MAC header, MPLS label stack and network
>> > header to be tracked.
>>
>> The neigh output functions do '__skb_pull(skb, skb_network_offset(skb))' so 
>> if mpls_xmit does not reset the network header the labels get dropped. To me 
>> this says MPLS labels can not be lumped with the mac header which leaves the 
>> only option as the outer network header.
>>
>> >
>> > Pravin (CCed) may have different ideas but I wonder if the above scheme can
>> > be preserved while also meeting the needs of your new MPLS GSO scheme if
>> > you set skb_set_network_header() and skb_set_inner_network_header() in
>> > net/openvswitch/actions.c:do_output().
>> >
>> > It may also be possible to teach OvS to use skb_set_network_header to
>> > denote the beginning of the MPLS LSE and skb_set_inner_network_header to
>> > denote the network header in the presence of MPLS. Which is my current
>> > understanding of what you are trying to achieve. But I think its likely
>> > that I misunderstand things as it seems strange to me to pretend that an
>> > MPLS LSE is a network header and the outer most network header is an inner
>> > network header
>> >
>>
>> This is the only option I can see working, but open to patches showing an
>> alternative.
>
> On reflection I came to a similar conclusion.
>
>> I would like to get it resolved this week so I can move on to gso in the
>> mpls forward case.
>
> How do you feel about implementing the do_output() idea I suggested above?
> I'm happy to provide testing and review.

I am not sure about changing do_output(). why not just use same scheme
to track mpls header in OVS datapath as done in mpls device?


Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-08-24 Thread Simon Horman
Hi David,

On Tue, Aug 23, 2016 at 01:24:51PM -0600, David Ahern wrote:
> On 8/22/16 8:51 AM, Simon Horman wrote:
> > 
> > The scheme that OvS uses so far is that mac_len denotes the number of bytes
> > from the start of the MAC header until its end. In the absence of MPLS that
> > will be the beginning of the network header. And in the presence of MPLS it
> > will be the beginning of the MPLS label stack. The network header is... the
> > network header. This allows the MAC header, MPLS label stack and network
> > header to be tracked.
> 
> The neigh output functions do '__skb_pull(skb, skb_network_offset(skb))' so 
> if mpls_xmit does not reset the network header the labels get dropped. To me 
> this says MPLS labels can not be lumped with the mac header which leaves the 
> only option as the outer network header.
> 
> > 
> > Pravin (CCed) may have different ideas but I wonder if the above scheme can
> > be preserved while also meeting the needs of your new MPLS GSO scheme if
> > you set skb_set_network_header() and skb_set_inner_network_header() in
> > net/openvswitch/actions.c:do_output().
> > 
> > It may also be possible to teach OvS to use skb_set_network_header to
> > denote the beginning of the MPLS LSE and skb_set_inner_network_header to
> > denote the network header in the presence of MPLS. Which is my current
> > understanding of what you are trying to achieve. But I think its likely
> > that I misunderstand things as it seems strange to me to pretend that an
> > MPLS LSE is a network header and the outer most network header is an inner
> > network header
> > 
> 
> This is the only option I can see working, but open to patches showing an
> alternative.

On reflection I came to a similar conclusion.

> I would like to get it resolved this week so I can move on to gso in the
> mpls forward case.

How do you feel about implementing the do_output() idea I suggested above?
I'm happy to provide testing and review.


Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-08-23 Thread David Ahern
On 8/22/16 8:51 AM, Simon Horman wrote:
> 
> The scheme that OvS uses so far is that mac_len denotes the number of bytes
> from the start of the MAC header until its end. In the absence of MPLS that
> will be the beginning of the network header. And in the presence of MPLS it
> will be the beginning of the MPLS label stack. The network header is... the
> network header. This allows the MAC header, MPLS label stack and network
> header to be tracked.

The neigh output functions do '__skb_pull(skb, skb_network_offset(skb))' so if 
mpls_xmit does not reset the network header the labels get dropped. To me this 
says MPLS labels can not be lumped with the mac header which leaves the only 
option as the outer network header.

> 
> Pravin (CCed) may have different ideas but I wonder if the above scheme can
> be preserved while also meeting the needs of your new MPLS GSO scheme if
> you set skb_set_network_header() and skb_set_inner_network_header() in
> net/openvswitch/actions.c:do_output().
> 
> It may also be possible to teach OvS to use skb_set_network_header to
> denote the beginning of the MPLS LSE and skb_set_inner_network_header to
> denote the network header in the presence of MPLS. Which is my current
> understanding of what you are trying to achieve. But I think its likely
> that I misunderstand things as it seems strange to me to pretend that an
> MPLS LSE is a network header and the outer most network header is an inner
> network header
> 

This is the only option I can see working, but open to patches showing an 
alternative.

I would like to get it resolved this week so I can move on to gso in the mpls 
forward case.
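
The pull issue described above can be illustrated with a toy model. This is a sketch under assumptions, not the neighbour code: struct toy_skb, toy_pull() and toy_neigh_output_len() are hypothetical names, and only the data/len/network_header bookkeeping of a pull by the network offset is modelled.

```c
#include <assert.h>

/* Offsets are absolute (relative to the buffer head). */
struct toy_skb {
	unsigned int data;           /* start of packet data */
	unsigned int len;            /* bytes of packet data */
	unsigned int network_header;
};

static unsigned int toy_network_offset(const struct toy_skb *skb)
{
	return skb->network_header - skb->data;
}

/* Drop n bytes from the front of the packet data. */
static void toy_pull(struct toy_skb *skb, unsigned int n)
{
	assert(n <= skb->len);
	skb->data += n;
	skb->len -= n;
}

/*
 * Mimic the first step of the neigh output path: pull up to the network
 * header, then report how many bytes are left to send after the
 * link-layer header that would be prepended next.
 */
static unsigned int toy_neigh_output_len(struct toy_skb *skb)
{
	toy_pull(skb, toy_network_offset(skb));
	return skb->len;
}
```

For a packet with 8 bytes of labels in front of a 20-byte inner header, the model keeps all 28 bytes when the network header points at the labels and only 20 bytes when it still points at the inner header, so in the second case the labels are lost.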


Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-08-22 Thread Simon Horman
On Mon, Aug 22, 2016 at 07:11:27AM -0600, David Ahern wrote:
> On 8/22/16 6:21 AM, Simon Horman wrote:
> >> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> >> index 1ecbd7715f6d..6d78f162a88b 100644
> >> --- a/net/openvswitch/actions.c
> >> +++ b/net/openvswitch/actions.c
> >> @@ -167,6 +167,12 @@ static int push_mpls(struct sk_buff *skb, struct 
> >> sw_flow_key *key,
> >>skb->mac_len);
> >>skb_reset_mac_header(skb);
> >>  
> >> +  /* for GSO: set MPLS as network header and encapsulated protocol
> >> +   * header as inner network header
> >> +   */
> >> +  skb_set_network_header(skb, skb->mac_len);
> >> +  skb_set_inner_network_header(skb, skb->mac_len + MPLS_HLEN);
> >> +
> >>new_mpls_lse = (__be32 *)skb_mpls_header(skb);
> >>*new_mpls_lse = mpls->mpls_lse;
> > 
> > Is the above calculation correct if push_mpls() is called multiple times?
> > 
> 
> No. Does OVS support more than 1? I really need someone who is familiar with 
> the OVS code to make sure it works for all use cases. e.g., set 
> skb_set_inner_network_header() before pushing a series of MPLS labels.

Yes that is supported.

The scheme that OvS uses so far is that mac_len denotes the number of bytes
from the start of the MAC header until its end. In the absence of MPLS that
will be the beginning of the network header. And in the presence of MPLS it
will be the beginning of the MPLS label stack. The network header is... the
network header. This allows the MAC header, MPLS label stack and network
header to be tracked.

Pravin (CCed) may have different ideas but I wonder if the above scheme can
be preserved while also meeting the needs of your new MPLS GSO scheme if
you set skb_set_network_header() and skb_set_inner_network_header() in
net/openvswitch/actions.c:do_output().

It may also be possible to teach OvS to use skb_set_network_header to
denote the beginning of the MPLS LSE and skb_set_inner_network_header to
denote the network header in the presence of MPLS. Which is my current
understanding of what you are trying to achieve. But I think it's likely
that I misunderstand things as it seems strange to me to pretend that an
MPLS LSE is a network header and the outermost network header is an inner
network header.


Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-08-22 Thread David Ahern
On 8/22/16 6:21 AM, Simon Horman wrote:
>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>> index 1ecbd7715f6d..6d78f162a88b 100644
>> --- a/net/openvswitch/actions.c
>> +++ b/net/openvswitch/actions.c
>> @@ -167,6 +167,12 @@ static int push_mpls(struct sk_buff *skb, struct 
>> sw_flow_key *key,
>>  skb->mac_len);
>>  skb_reset_mac_header(skb);
>>  
>> +/* for GSO: set MPLS as network header and encapsulated protocol
>> + * header as inner network header
>> + */
>> +skb_set_network_header(skb, skb->mac_len);
>> +skb_set_inner_network_header(skb, skb->mac_len + MPLS_HLEN);
>> +
>>  new_mpls_lse = (__be32 *)skb_mpls_header(skb);
>>  *new_mpls_lse = mpls->mpls_lse;
> 
> Is the above calculation correct if push_mpls() is called multiple times?
> 

No. Does OVS support more than 1? I really need someone who is familiar with 
the OVS code to make sure it works for all use cases. e.g., set 
skb_set_inner_network_header() before pushing a series of MPLS labels.


Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-08-22 Thread Simon Horman
On Fri, Aug 19, 2016 at 10:09:01AM -0700, David Ahern wrote:
> As reported by Lennert the MPLS GSO code is failing to properly segment
> large packets. There are a couple of problems:
> 
> 1. the inner protocol is not set so the gso segment functions for inner
>protocol layers are not getting run, and
> 
> 2  MPLS labels for packets that use the "native" (non-OVS) MPLS code
>are not properly accounted for in mpls_gso_segment.
> 
> The MPLS GSO code was added for OVS. It is re-using skb_mac_gso_segment
> to call the gso segment functions for the higher layer protocols. That
> means skb_mac_gso_segment is called twice -- once with the network
> protocol set to MPLS and again with the network protocol set to the
> inner protocol.
> 
> This patch sets the inner skb protocol addressing item 1 above and sets
> the network_header and inner_network_header to mark where the MPLS labels
> start and end. The MPLS code in OVS is also updated to set the two
> network markers.
> 
> From there the MPLS GSO code uses the difference between the network
> header and the inner network header to know the size of the MPLS header
> that was pushed. It then pulls the MPLS header, resets the mac_len and
> protocol for the inner protocol and then calls skb_mac_gso_segment
> to segment the skb.
> 
> Afterward the inner protocol segmentation is done the skb protocol
> is set to mpls for each segment and the network and mac headers
> restored.
> 
> Reported-by: Lennert Buytenhek 
> Signed-off-by: David Ahern 
> ---
>  net/mpls/mpls_gso.c   | 38 +++---
>  net/mpls/mpls_iptunnel.c  |  4 
>  net/openvswitch/actions.c |  6 ++
>  3 files changed, 37 insertions(+), 11 deletions(-)
> 
> diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
> index 2055e57ed1c3..2aa4beaa0e4f 100644

...

> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 1ecbd7715f6d..6d78f162a88b 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -167,6 +167,12 @@ static int push_mpls(struct sk_buff *skb, struct 
> sw_flow_key *key,
>   skb->mac_len);
>   skb_reset_mac_header(skb);
>  
> + /* for GSO: set MPLS as network header and encapsulated protocol
> +  * header as inner network header
> +  */
> + skb_set_network_header(skb, skb->mac_len);
> + skb_set_inner_network_header(skb, skb->mac_len + MPLS_HLEN);
> +
>   new_mpls_lse = (__be32 *)skb_mpls_header(skb);
>   *new_mpls_lse = mpls->mpls_lse;

Is the above calculation correct if push_mpls() is called multiple times?
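
To make the question concrete, here is a toy offset model of the calculation in the quoted hunk. It is a sketch under assumptions, not the OVS code: struct toy_skb and the toy_* functions are hypothetical names, and nr_labels exists only so the model can compare the derived size with the real one.

```c
#include <assert.h>

#define TOY_MPLS_HLEN 4u

/* Offsets are absolute (relative to the buffer head). */
struct toy_skb {
	unsigned int data;
	unsigned int mac_len;
	unsigned int network_header;
	unsigned int inner_network_header;
	unsigned int nr_labels;      /* bookkeeping for the model only */
};

/* Sets both markers on every push, as in the hunk quoted above. */
static void toy_push_mpls_fixed(struct toy_skb *skb)
{
	assert(skb->data >= TOY_MPLS_HLEN);
	skb->data -= TOY_MPLS_HLEN;
	skb->nr_labels++;
	skb->network_header = skb->data + skb->mac_len;
	skb->inner_network_header = skb->data + skb->mac_len + TOY_MPLS_HLEN;
}

/* Label stack size derived from the two markers. */
static unsigned int toy_derived_mpls_hlen(const struct toy_skb *skb)
{
	return skb->inner_network_header - skb->network_header;
}

/* Label stack size that is really in the packet. */
static unsigned int toy_actual_mpls_hlen(const struct toy_skb *skb)
{
	return skb->nr_labels * TOY_MPLS_HLEN;
}
```

In this model the derived size stays at 4 bytes no matter how many labels are pushed, because the inner marker is rewritten to sit one entry past the newest label each time; after two pushes it disagrees with the real 8-byte stack.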


Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-08-19 Thread Alexander Duyck
On Fri, Aug 19, 2016 at 10:09 AM, David Ahern wrote:
> As reported by Lennert the MPLS GSO code is failing to properly segment
> large packets. There are a couple of problems:
>
> 1. the inner protocol is not set so the gso segment functions for inner
>protocol layers are not getting run, and
>
> 2  MPLS labels for packets that use the "native" (non-OVS) MPLS code
>are not properly accounted for in mpls_gso_segment.
>
> The MPLS GSO code was added for OVS. It is re-using skb_mac_gso_segment
> to call the gso segment functions for the higher layer protocols. That
> means skb_mac_gso_segment is called twice -- once with the network
> protocol set to MPLS and again with the network protocol set to the
> inner protocol.
>
> This patch sets the inner skb protocol addressing item 1 above and sets
> the network_header and inner_network_header to mark where the MPLS labels
> start and end. The MPLS code in OVS is also updated to set the two
> network markers.
>
> From there the MPLS GSO code uses the difference between the network
> header and the inner network header to know the size of the MPLS header
> that was pushed. It then pulls the MPLS header, resets the mac_len and
> protocol for the inner protocol and then calls skb_mac_gso_segment
> to segment the skb.
>
> Afterward the inner protocol segmentation is done the skb protocol
> is set to mpls for each segment and the network and mac headers
> restored.
>
> Reported-by: Lennert Buytenhek 
> Signed-off-by: David Ahern 
> ---
>  net/mpls/mpls_gso.c   | 38 +++---
>  net/mpls/mpls_iptunnel.c  |  4 
>  net/openvswitch/actions.c |  6 ++
>  3 files changed, 37 insertions(+), 11 deletions(-)
>
> diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
> index 2055e57ed1c3..2aa4beaa0e4f 100644
> --- a/net/mpls/mpls_gso.c
> +++ b/net/mpls/mpls_gso.c
> @@ -23,32 +23,48 @@ static struct sk_buff *mpls_gso_segment(struct sk_buff 
> *skb,
>netdev_features_t features)
>  {
> struct sk_buff *segs = ERR_PTR(-EINVAL);
> +   u16 mac_offset = skb->mac_header;
> netdev_features_t mpls_features;
> +   u16 mac_len = skb->mac_len;
> __be16 mpls_protocol;
> +   int mpls_hlen;
> +
> +   skb_reset_network_header(skb);
> +   mpls_hlen = skb_inner_network_header(skb) - skb_network_header(skb);
> +   if (unlikely(!pskb_may_pull(skb, mpls_hlen)))
> +   goto out;
>
> /* Setup inner SKB. */
> mpls_protocol = skb->protocol;
> skb->protocol = skb->inner_protocol;
>
> -   /* Push back the mac header that skb_mac_gso_segment() has pulled.
> -* It will be re-pulled by the call to skb_mac_gso_segment() below
> -*/
> -   __skb_push(skb, skb->mac_len);
> +   __skb_pull(skb, mpls_hlen);
> +
> +   skb->mac_len = 0;
> +   skb_reset_mac_header(skb);
> +   skb_set_network_header(skb, skb_inner_network_offset(skb));

No need to set the network header.  Both IPv4 and IPv6 GSO paths will
reset the network header just like you did at the start.

> /* Segment inner packet. */
> mpls_features = skb->dev->mpls_features & features;
> segs = skb_mac_gso_segment(skb, mpls_features);
> +   if (IS_ERR_OR_NULL(segs)) {
> +   skb_gso_error_unwind(skb, mpls_protocol, mpls_hlen, 
> mac_offset,
> +mac_len);
> +   goto out;
> +   }
>
> +   skb = segs;

You could probably pull your math for mpls_hlen + mac_len out of the
loop below and just take care of adding mac_len to mpls_hlen up here
and store it off in mpls_hlen since it isn't used anywhere else.

> +   do {
> +   skb->mac_len = mac_len;
> +   skb->protocol = mpls_protocol;
>
> -   /* Restore outer protocol. */
> -   skb->protocol = mpls_protocol;
> +   __skb_push(skb, mpls_hlen + mac_len);
>
> -   /* Re-pull the mac header that the call to skb_mac_gso_segment()
> -* above pulled.  It will be re-pushed after returning
> -* skb_mac_gso_segment(), an indirect caller of this function.
> -*/
> -   __skb_pull(skb, skb->data - skb_mac_header(skb));

You need to store off the inner network header before you overwrite it
in the lines below.  Either skb_reset_inner_network_header before the
push, or skb_reset_inner_headers before you call the two lines below.
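
The ordering described above can be sketched in userspace as follows.
This is only an illustration under assumptions: struct toy_skb and
toy_push_outer_headers() are hypothetical, and offsets are integers
from the buffer head rather than real sk_buff pointers. It shows the
first option, recording the inner network header before the push.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the sk_buff header offsets. */
struct toy_skb {
	size_t data;
	size_t mac_header;
	size_t network_header;
	size_t inner_network_header;
};

void toy_push_outer_headers(struct toy_skb *skb, size_t mac_len,
			    size_t mpls_hlen)
{
	assert(skb->data >= mac_len + mpls_hlen);

	/* data still points at the inner IP header here, so record it
	 * first (models skb_reset_inner_network_header()).
	 */
	skb->inner_network_header = skb->data;

	skb->data -= mpls_hlen + mac_len;	/* models __skb_push() */
	skb->mac_header = skb->data;		/* models skb_reset_mac_header() */
	skb->network_header = skb->data + mac_len; /* now marks the MPLS labels */
}
```

After the call, inner_network_header - network_header again equals
mpls_hlen, which is the invariant the GSO code relies on at entry.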

> +   skb_reset_mac_header(skb);
> +   skb_set_network_header(skb, mac_len);
> +   } while ((skb = skb->next));
>
> +out:
> return segs;
>  }
>
> diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
> index aed872cc05a6..cf52cf30ac4b 100644
> --- a/net/mpls/mpls_iptunnel.c
> +++ b/net/mpls/mpls_iptunnel.c
> @@ -90,7 +90,11 @@ static int mpls_xmit(struct sk_buff *skb)
> if (skb_cow(skb, hh_len + 

[PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-08-19 Thread David Ahern
As reported by Lennert the MPLS GSO code is failing to properly segment
large packets. There are a couple of problems:

1. the inner protocol is not set so the gso segment functions for inner
   protocol layers are not getting run, and

2. MPLS labels for packets that use the "native" (non-OVS) MPLS code
   are not properly accounted for in mpls_gso_segment.

The MPLS GSO code was added for OVS. It is re-using skb_mac_gso_segment
to call the gso segment functions for the higher layer protocols. That
means skb_mac_gso_segment is called twice -- once with the network
protocol set to MPLS and again with the network protocol set to the
inner protocol.

This patch sets the inner skb protocol addressing item 1 above and sets
the network_header and inner_network_header to mark where the MPLS labels
start and end. The MPLS code in OVS is also updated to set the two
network markers.

From there the MPLS GSO code uses the difference between the network
header and the inner network header to know the size of the MPLS header
that was pushed. It then pulls the MPLS header, resets the mac_len and
protocol for the inner protocol and then calls skb_mac_gso_segment
to segment the skb.

After the inner protocol segmentation is done, the skb protocol is set
to MPLS for each segment and the network and mac headers are
restored.
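
As a rough illustration of the pull / segment / restore flow described
above, here is a userspace toy. It is not how skb_mac_gso_segment() or
the sk_buff headroom actually work, and it ignores the IP and TCP header
fixups real GSO performs; toy_segment() and its parameters are
hypothetical. It only shows the idea that the outer (mac + MPLS) header
is set aside, the inner payload is split, and each segment gets the
outer header back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split payload into chunks of at most mss bytes and write each chunk
 * into out, prefixed with a copy of the hdr_len-byte outer header.
 * Returns the number of segments written, or 0 on bad arguments or if
 * out is too small.
 */
size_t toy_segment(const unsigned char *hdr, size_t hdr_len,
		   const unsigned char *payload, size_t payload_len,
		   size_t mss, unsigned char *out, size_t out_len)
{
	size_t nsegs = 0, off = 0, used = 0;

	assert(hdr && payload && out);

	if (mss == 0)
		return 0;

	while (off < payload_len) {
		size_t chunk = payload_len - off;

		if (chunk > mss)
			chunk = mss;
		if (out_len - used < hdr_len + chunk)
			return 0;

		memcpy(out + used, hdr, hdr_len);	/* restore outer header */
		memcpy(out + used + hdr_len, payload + off, chunk);

		used += hdr_len + chunk;
		off += chunk;
		nsegs++;
	}

	return nsegs;
}
```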

Reported-by: Lennert Buytenhek 
Signed-off-by: David Ahern 
---
 net/mpls/mpls_gso.c   | 38 +++---
 net/mpls/mpls_iptunnel.c  |  4 
 net/openvswitch/actions.c |  6 ++
 3 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
index 2055e57ed1c3..2aa4beaa0e4f 100644
--- a/net/mpls/mpls_gso.c
+++ b/net/mpls/mpls_gso.c
@@ -23,32 +23,48 @@ static struct sk_buff *mpls_gso_segment(struct sk_buff *skb,
   netdev_features_t features)
 {
struct sk_buff *segs = ERR_PTR(-EINVAL);
+   u16 mac_offset = skb->mac_header;
netdev_features_t mpls_features;
+   u16 mac_len = skb->mac_len;
__be16 mpls_protocol;
+   int mpls_hlen;
+
+   skb_reset_network_header(skb);
+   mpls_hlen = skb_inner_network_header(skb) - skb_network_header(skb);
+   if (unlikely(!pskb_may_pull(skb, mpls_hlen)))
+   goto out;
 
/* Setup inner SKB. */
mpls_protocol = skb->protocol;
skb->protocol = skb->inner_protocol;
 
-   /* Push back the mac header that skb_mac_gso_segment() has pulled.
-* It will be re-pulled by the call to skb_mac_gso_segment() below
-*/
-   __skb_push(skb, skb->mac_len);
+   __skb_pull(skb, mpls_hlen);
+
+   skb->mac_len = 0;
+   skb_reset_mac_header(skb);
+   skb_set_network_header(skb, skb_inner_network_offset(skb));
 
/* Segment inner packet. */
mpls_features = skb->dev->mpls_features & features;
segs = skb_mac_gso_segment(skb, mpls_features);
+   if (IS_ERR_OR_NULL(segs)) {
+   skb_gso_error_unwind(skb, mpls_protocol, mpls_hlen, mac_offset,
+mac_len);
+   goto out;
+   }
 
+   skb = segs;
+   do {
+   skb->mac_len = mac_len;
+   skb->protocol = mpls_protocol;
 
-   /* Restore outer protocol. */
-   skb->protocol = mpls_protocol;
+   __skb_push(skb, mpls_hlen + mac_len);
 
-   /* Re-pull the mac header that the call to skb_mac_gso_segment()
-* above pulled.  It will be re-pushed after returning
-* skb_mac_gso_segment(), an indirect caller of this function.
-*/
-   __skb_pull(skb, skb->data - skb_mac_header(skb));
+   skb_reset_mac_header(skb);
+   skb_set_network_header(skb, mac_len);
+   } while ((skb = skb->next));
 
+out:
return segs;
 }
 
diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
index aed872cc05a6..cf52cf30ac4b 100644
--- a/net/mpls/mpls_iptunnel.c
+++ b/net/mpls/mpls_iptunnel.c
@@ -90,7 +90,11 @@ static int mpls_xmit(struct sk_buff *skb)
if (skb_cow(skb, hh_len + new_header_size))
goto drop;
 
+   skb_set_inner_protocol(skb, skb->protocol);
+   skb_reset_inner_network_header(skb);
+
skb_push(skb, new_header_size);
+
skb_reset_network_header(skb);
 
skb->dev = out_dev;
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 1ecbd7715f6d..6d78f162a88b 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -167,6 +167,12 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
skb->mac_len);
skb_reset_mac_header(skb);
 
+   /* for GSO: set MPLS as network header and encapsulated protocol
+* header as inner network header
+*/
+   skb_set_network_header(skb, skb->mac_len);
+   skb_set_inner_network_header(skb, skb->mac_len + MPLS_HLEN);

Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-08-18 Thread David Ahern
On 8/18/16 8:37 AM, Alexander Duyck wrote:
> Thought I would go through and do a second pass since it sounds like
> the inner_mac_header idea isn't going to fly.  If we can't push this
> as an L2 encapsulation there are a few tweaks we probably need in order
> to make this work as an L3.  I have included comments inline below.
> 
> Also I haven't worked with MPLS much before.  Is there a simple way to
> set up an MPLS tunnel between two hosts connected back to back so that
> I could try testing a few things related to this patch?

Here are the commands that I use for VMs; copy and paste. They are an
adaptation of Lennert's namespace script. VM ids are local to my host.
Network addresses are 10.100.1.x/24 and 2100:1::x/120 on eth1 of the
respective node. The setup includes MPLS encap, IP-IP encap, and no encap
so that performance can be compared.

VM2
===
modprobe mpls_router
modprobe mpls_gso
modprobe mpls_iptunnel

sysctl -w net.mpls.platform_labels=1000
ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.3
ip -6 route add 3000:1::1/128 encap mpls 101 via inet6 2100:1::3

ip tunnel add tun0 mode ipip remote 10.100.1.3
ip link set dev tun0 up
ip route add 10.10.10.11/32 dev tun0

ip route add 10.10.10.12/32 via inet 10.100.1.3
ip -6 route add 3000:1::3/128 via inet6 2100:1::3


VM3
===
modprobe mpls_router
modprobe mpls_gso
modprobe mpls_iptunnel

sysctl -w net.mpls.conf.eth1.input=1
sysctl -w net.mpls.platform_labels=1000
ip -f mpls route add 100 via inet 10.100.2.4
ip -f mpls route add 101 via inet6 2100:2::4

ip tunnel add tun0 mode ipip remote 10.100.1.2
ip link set dev tun0 up
ip ro add 10.10.10.11/32 via 10.100.2.4

ip ro add 10.10.10.12/32 via 10.100.2.4
ip -6 route add 3000:1::3/128 via inet6 2100:2::4


VM4
===
ip addr add 10.10.10.10/32 dev lo
ip addr add 10.10.10.11/32 dev lo
ip addr add 10.10.10.12/32 dev lo

ip -6 addr add 3000:1::1/128 dev lo
ip -6 addr add 3000:1::2/128 dev lo
ip -6 addr add 3000:1::3/128 dev lo

netserver


Go back to VM2:

ping -c 1 10.10.10.10
ping -c 1 10.10.10.11
ping -c 1 10.10.10.12

netperf -c -C -H 10.10.10.10  -l 10 -t TCP_STREAM
netperf -c -C -H 10.10.10.11  -l 10 -t TCP_STREAM
netperf -c -C -H 10.10.10.12  -l 10 -t TCP_STREAM


I'll take a look at your other comments today.



Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-08-18 Thread Alexander Duyck
Thought I would go through and do a second pass since it sounds like
the inner_mac_header idea isn't going to fly.  If we can't push this
as an L2 encapsulation there are a few tweaks we probably need in order
to make this work as an L3.  I have included comments inline below.

Also I haven't worked with MPLS much before.  Is there a simple way to
set up an MPLS tunnel between two hosts connected back to back so that
I could try testing a few things related to this patch?

Thanks.

- Alex


On Wed, Aug 17, 2016 at 2:49 PM, David Ahern  wrote:
> As reported by Lennert the MPLS GSO code is failing to properly segment
> large packets. There are a couple of problems:
>
> 1. the inner protocol is not set so the gso segment functions for inner
>    protocol layers are not getting run, and
>
> 2. MPLS labels for packets that use the "native" (non-OVS) MPLS code
>    are not properly accounted for in mpls_gso_segment.
>
> The MPLS GSO code was added for OVS. It is re-using skb_mac_gso_segment
> to call the gso segment functions for the higher layer protocols. That
> means skb_mac_gso_segment is called twice -- once with the network
> protocol set to MPLS and again with the network protocol set to the
> inner protocol.
>
> This patch sets the inner skb protocol addressing item 1 above and sets
> the network_header and inner_network_header to mark where the MPLS labels
> start and end. The MPLS code in OVS is also updated to set the two
> network markers.
>
> From there the MPLS GSO code uses the difference between the network
> header and the inner network header to know the size of the MPLS header
> that was pushed. It then pulls the MPLS header, resets the mac_len and
> protocol for the inner protocol and then calls skb_mac_gso_segment
> to segment the skb. Afterwards the skb protocol is set to mpls for
> each segment as suggested by Simon.
>
> Reported-by: Lennert Buytenhek 
> Signed-off-by: David Ahern 
> ---
>  net/mpls/mpls_gso.c   | 24 +---
>  net/mpls/mpls_iptunnel.c  |  5 +
>  net/openvswitch/actions.c |  6 ++
>  3 files changed, 24 insertions(+), 11 deletions(-)
>
> diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
> index 2055e57ed1c3..fa6899f02cc8 100644
> --- a/net/mpls/mpls_gso.c
> +++ b/net/mpls/mpls_gso.c
> @@ -22,33 +22,35 @@
>  static struct sk_buff *mpls_gso_segment(struct sk_buff *skb,
>netdev_features_t features)
>  {
> +   int mpls_hlen = skb_inner_network_header(skb) - skb_network_header(skb);
> struct sk_buff *segs = ERR_PTR(-EINVAL);
> +   u16 mac_offset = skb->mac_header;
> netdev_features_t mpls_features;
> __be16 mpls_protocol;
> +   u16 mac_len = skb->mac_len;

So one thing you may want to do here is defer the skb_network_header()
call until after being able to call skb_reset_network_header().  For
reference you might look at how we handle inet_gso_segment.  That way
if at some point in the future we end up having to support MPLS
encapsulated in an IP tunnel it should be able to play the same as
IP-in-IP.
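
A minimal userspace sketch of that ordering, under stated assumptions:
struct toy_skb, toy_reset_network_header() and toy_mpls_hlen() are
hypothetical stand-ins, offsets are integers from the buffer head, and
the length check is only a crude analogue of a "may pull" test. The
point is that the MPLS header length is computed only after the network
header has been reset to the current data position, so a stale
network_header left by an outer tunnel layer does not skew the result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the sk_buff fields involved. */
struct toy_skb {
	size_t data;			/* current data offset */
	size_t network_header;		/* may be stale on entry */
	size_t inner_network_header;	/* start of the encapsulated header */
	size_t len;			/* bytes available starting at data */
};

/* Models skb_reset_network_header(): the header is wherever data is. */
void toy_reset_network_header(struct toy_skb *skb)
{
	skb->network_header = skb->data;
}

/* Returns the MPLS header length, or -1 if it cannot be pulled. */
int toy_mpls_hlen(struct toy_skb *skb)
{
	size_t hlen;

	assert(skb != NULL);

	/* Reset first, then measure. */
	toy_reset_network_header(skb);
	if (skb->inner_network_header < skb->network_header)
		return -1;

	hlen = skb->inner_network_header - skb->network_header;
	if (hlen > skb->len)
		return -1;

	return (int)hlen;
}
```

For example, with data at 34 (Ethernet plus an outer IPv4 header), a
stale network_header of 14 and inner_network_header at 42, the result
is 8, i.e. two labels, rather than 28.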

>
> /* Setup inner SKB. */
> mpls_protocol = skb->protocol;
> skb->protocol = skb->inner_protocol;
>
> -   /* Push back the mac header that skb_mac_gso_segment() has pulled.
> -* It will be re-pulled by the call to skb_mac_gso_segment() below
> -*/
> -   __skb_push(skb, skb->mac_len);
> +   __skb_pull(skb, mpls_hlen);
> +   skb->mac_len = skb_inner_network_offset(skb);

So I am not sure that setting the skb->mac_len here really does
anything.  If I am not mistaken I think the value should always come
out 0 since you already pulled mpls_hlen, and skb->data should be
equal to skb_network_header().  So you might save yourself a few
cycles and just set skb->mac_len = 0.
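
The arithmetic behind that can be checked with a small userspace model.
The names here (struct toy_skb, toy_pull(), toy_inner_network_offset())
are made up for illustration; the one assumption carried over from the
kernel is that the inner network offset is measured relative to the
current data pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in; offsets are counted from the buffer head. */
struct toy_skb {
	size_t data;			/* current data offset */
	size_t network_header;		/* start of the MPLS label stack */
	size_t inner_network_header;	/* start of the encapsulated IP header */
};

/* Models skb_inner_network_offset(): inner header relative to data. */
size_t toy_inner_network_offset(const struct toy_skb *skb)
{
	assert(skb->inner_network_header >= skb->data);
	return skb->inner_network_header - skb->data;
}

/* Models __skb_pull(): advance data past len bytes of header. */
void toy_pull(struct toy_skb *skb, size_t len)
{
	skb->data += len;
}
```

Starting with data and network_header at 14 and inner_network_header at
22, mpls_hlen is 8; after pulling 8 bytes data is at 22 and the inner
network offset is 22 - 22 = 0.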

Also you may need to call skb_reset_mac_header() so that you don't
have the skb_mac_gso_segment call pushing your MPLS header and the
headers below it back on before you can capture those offsets back in
your frame.

> /* Segment inner packet. */
> mpls_features = skb->dev->mpls_features & features;
> segs = skb_mac_gso_segment(skb, mpls_features);
> -
> +   if (IS_ERR_OR_NULL(segs)) {
> +   skb_gso_error_unwind(skb, mpls_protocol, mpls_hlen, mac_offset,
> +mac_len);
> +   goto out;
> +   }
>
> /* Restore outer protocol. */
> skb->protocol = mpls_protocol;
> +   for (skb = segs; skb; skb = skb->next)
> +   skb->protocol = mpls_protocol;

At this point you should probably be pushing back on your MPLS header
and resetting the inner network header, network header, and mac
header.  Otherwise either the inner IPv4 or IPv6 header will be set as
the network_header after you have segmented the frame.  This is one of
the reasons why I thought my original ideal 

Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-08-17 Thread David Ahern
On 8/17/16 7:06 PM, Alexander Duyck wrote:
> On Wed, Aug 17, 2016 at 4:23 PM, David Ahern  wrote:
>> On 8/17/16 5:16 PM, Alexander Duyck wrote:
>>>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>>>> index 1ecbd7715f6d..6d78f162a88b 100644
>>>> --- a/net/openvswitch/actions.c
>>>> +++ b/net/openvswitch/actions.c
>>>> @@ -167,6 +167,12 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
>>>> skb->mac_len);
>>>> skb_reset_mac_header(skb);
>>>>
>>>> +   /* for GSO: set MPLS as network header and encapsulated protocol
>>>> +* header as inner network header
>>>> +*/
>>>> +   skb_set_network_header(skb, skb->mac_len);
>>>> +   skb_set_inner_network_header(skb, skb->mac_len + MPLS_HLEN);
>>>> +
>>>> new_mpls_lse = (__be32 *)skb_mpls_header(skb);
>>>> *new_mpls_lse = mpls->mpls_lse;
>>>>
>>>
>>> So the one question I would have about this is how attached are you to
>>> using the network_header to record the offset for the MPLS header?  I
>>> ask because I think from a hardware offloading perspective it would
>>> make it much easier if instead you used the inner_mac_header to
>>> represent the offset for the MPLS header.  This way device drivers
>>> could just skip over it like a VLAN and just use network and transport
>>> header values like they would otherwise.
>>>
>>
>> Where does the network_header relate to if I change the marker to 
>> inner_mac_header? Would it be skipped?
> 
> No, the network header would still be the network header.

If core MPLS code (i.e., non-OVS) does not do skb_reset_network_header(skb)
after adding the MPLS label, nothing works. Not even ping with small packets.
tcpdump shows a completely mangled packet. Right now resetting the
network_header to mpls is required.



Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-08-17 Thread Alexander Duyck
On Wed, Aug 17, 2016 at 4:23 PM, David Ahern  wrote:
> On 8/17/16 5:16 PM, Alexander Duyck wrote:
>>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>>> index 1ecbd7715f6d..6d78f162a88b 100644
>>> --- a/net/openvswitch/actions.c
>>> +++ b/net/openvswitch/actions.c
>>> @@ -167,6 +167,12 @@ static int push_mpls(struct sk_buff *skb, struct 
>>> sw_flow_key *key,
>>> skb->mac_len);
>>> skb_reset_mac_header(skb);
>>>
>>> +   /* for GSO: set MPLS as network header and encapsulated protocol
>>> +* header as inner network header
>>> +*/
>>> +   skb_set_network_header(skb, skb->mac_len);
>>> +   skb_set_inner_network_header(skb, skb->mac_len + MPLS_HLEN);
>>> +
>>> new_mpls_lse = (__be32 *)skb_mpls_header(skb);
>>> *new_mpls_lse = mpls->mpls_lse;
>>>
>>
>> So the one question I would have about this is how attached are you to
>> using the network_header to record the offset for the MPLS header?  I
>> ask because I think from a hardware offloading perspective it would
>> make it much easier if instead you used the inner_mac_header to
>> represent the offset for the MPLS header.  This way device drivers
>> could just skip over it like a VLAN and just use network and transport
>> header values like they would otherwise.
>>
>
> Where does the network_header relate to if I change the marker to 
> inner_mac_header? Would it be skipped?

No, the network header would still be the network header.

> skb->protocol is set to MPLS.
> mac_header points to ethernet address
> network_header points to ???

The network_header would point to the IP header like it would be for a
non-MPLS frame.

> inner protocol is set to what is encapsulated (e.g., ipv4 or ipv6)

I am okay with this, but wonder if we actually need it.  Do you know
of any protocols other than IPv4 or IPv6 that can be carried over MPLS
and would expect to be offloaded?  If not we may be able to just get
away with recording the network header offset and then using the first
nibble of the network header to determine the IP version since the
value should be 4 or 6 for the two types we are offloading.
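
The nibble check mentioned above could look roughly like this. It is a
sketch only: toy_ip_version_from_nibble() is a hypothetical helper, not
an existing kernel function, and it relies on nothing more than the
fact that both IPv4 and IPv6 carry the version in the high four bits of
the first header byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Guess the IP version from the first byte of the header that follows
 * the MPLS label stack.  Returns 4 or 6, or 0 if there is no data or
 * the nibble is neither value.
 */
int toy_ip_version_from_nibble(const uint8_t *hdr, size_t len)
{
	uint8_t version;

	if (len < 1)
		return 0;
	assert(hdr != NULL);

	version = hdr[0] >> 4;	/* version field is the high nibble */
	if (version == 4 || version == 6)
		return version;

	return 0;
}
```

A first byte of 0x45 (IPv4, 20-byte header) yields 4 and 0x60 yields 6;
anything else is reported as 0, so a caller could decline to offload it.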

> inner_mac_header points to start of mpls label.

So this is what I would expect.

> inner_network points to start of network header.

The problem is that using inner_network_header to point to the network
header will require me to fork the path pretty significantly for most
of the Intel devices that would want to do MPLS GSO.  The assumption
most drivers make is that if we are offloading things then
network_header and inner_network_header will point to either IPv4 or
IPv6 headers.  Introducing MPLS as the network_header with IPv4 or
IPv6 as the inner_network_header throws a kink in the works because we
currently ignore inner_network_header for the devices that are doing
UDP or GRE tunnel GSO via GSO_PARTIAL with TSO_MANGLEID.

> Is that sufficient for h/w drivers?

I think of this as working like how we handle it for IP over IP
tunnels.  In that case we are at L3 so the inner_network_header field
is populated, but the transport header stays the same.  MPLS isn't
really L3; it is more of an L2.5, so my preference would be to treat
it like an L2 tunnel or VLAN and just overwrite the inner_mac_header
with the MPLS header offset, and leave the network and transport
headers untouched.
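
To make the two marker schemes concrete, here is a small userspace
sketch that computes the offsets for an Ethernet frame carrying
num_labels MPLS labels. struct toy_offsets and the toy_layout_*()
helpers are hypothetical; the sizes assumed are a 14-byte Ethernet
header without VLAN tags and 4 bytes per label stack entry. Fields a
scheme does not use are simply left at 0 here.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ETH_HLEN	14	/* Ethernet header, no VLAN tag */
#define TOY_MPLS_HLEN	4	/* one MPLS label stack entry */

struct toy_offsets {
	size_t mac_header;
	size_t network_header;
	size_t inner_mac_header;	/* used only by the L2-style scheme */
	size_t inner_network_header;	/* used only by the patch's scheme */
};

/* Patch as posted: network_header marks MPLS, inner_network_header IP. */
struct toy_offsets toy_layout_l3(size_t num_labels)
{
	struct toy_offsets o = { 0, 0, 0, 0 };

	assert(num_labels >= 1);
	o.network_header = TOY_ETH_HLEN;
	o.inner_network_header = TOY_ETH_HLEN + num_labels * TOY_MPLS_HLEN;
	return o;
}

/* L2-style alternative: inner_mac_header marks MPLS, network_header IP. */
struct toy_offsets toy_layout_l2(size_t num_labels)
{
	struct toy_offsets o = { 0, 0, 0, 0 };

	assert(num_labels >= 1);
	o.inner_mac_header = TOY_ETH_HLEN;
	o.network_header = TOY_ETH_HLEN + num_labels * TOY_MPLS_HLEN;
	return o;
}
```

With two labels, the first scheme gives network_header = 14 and
inner_network_header = 22, while the second gives inner_mac_header = 14
and network_header = 22, so a driver reading network_header finds the
IP header just as it would for a non-MPLS frame.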

One other bonus that also occurred to me is that you might be able to
get away with doing MPLS offloads for MPLS over IP or GRE tunnels.  I
hadn't realized that MPLS inside of these tunnels was a thing, I had
just noticed it while looking over how the IP-in-IP tunnels are all
being handled.  However if you move the header tracking to
inner_mac_header, and can avoid using skb->inner_protocol by instead
using the first nibble of the network_header value then you could
probably support segmenting those types of tunnels in hardware.

- Alex


Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-08-17 Thread David Ahern
On 8/17/16 5:16 PM, Alexander Duyck wrote:
>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>> index 1ecbd7715f6d..6d78f162a88b 100644
>> --- a/net/openvswitch/actions.c
>> +++ b/net/openvswitch/actions.c
>> @@ -167,6 +167,12 @@ static int push_mpls(struct sk_buff *skb, struct 
>> sw_flow_key *key,
>> skb->mac_len);
>> skb_reset_mac_header(skb);
>>
>> +   /* for GSO: set MPLS as network header and encapsulated protocol
>> +* header as inner network header
>> +*/
>> +   skb_set_network_header(skb, skb->mac_len);
>> +   skb_set_inner_network_header(skb, skb->mac_len + MPLS_HLEN);
>> +
>> new_mpls_lse = (__be32 *)skb_mpls_header(skb);
>> *new_mpls_lse = mpls->mpls_lse;
>>
> 
> So the one question I would have about this is how attached are you to
> using the network_header to record the offset for the MPLS header?  I
> ask because I think from a hardware offloading perspective it would
> make it much easier if instead you used the inner_mac_header to
> represent the offset for the MPLS header.  This way device drivers
> could just skip over it like a VLAN and just use network and transport
> header values like they would otherwise.
> 

Where does the network_header relate to if I change the marker to 
inner_mac_header? Would it be skipped?

skb->protocol is set to MPLS.
mac_header points to ethernet address
network_header points to ???

inner protocol is set to what is encapsulated (e.g., ipv4 or ipv6)
inner_mac_header points to start of mpls label.
inner_network points to start of network header.

Is that sufficient for h/w drivers?


Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-08-17 Thread Alexander Duyck
On Wed, Aug 17, 2016 at 2:49 PM, David Ahern  wrote:
> As reported by Lennert the MPLS GSO code is failing to properly segment
> large packets. There are a couple of problems:
>
> 1. the inner protocol is not set so the gso segment functions for inner
>    protocol layers are not getting run, and
>
> 2. MPLS labels for packets that use the "native" (non-OVS) MPLS code
>    are not properly accounted for in mpls_gso_segment.
>
> The MPLS GSO code was added for OVS. It is re-using skb_mac_gso_segment
> to call the gso segment functions for the higher layer protocols. That
> means skb_mac_gso_segment is called twice -- once with the network
> protocol set to MPLS and again with the network protocol set to the
> inner protocol.
>
> This patch sets the inner skb protocol addressing item 1 above and sets
> the network_header and inner_network_header to mark where the MPLS labels
> start and end. The MPLS code in OVS is also updated to set the two
> network markers.
>
> From there the MPLS GSO code uses the difference between the network
> header and the inner network header to know the size of the MPLS header
> that was pushed. It then pulls the MPLS header, resets the mac_len and
> protocol for the inner protocol and then calls skb_mac_gso_segment
> to segment the skb. Afterwards the skb protocol is set to mpls for
> each segment as suggested by Simon.
>
> Reported-by: Lennert Buytenhek 
> Signed-off-by: David Ahern 
> ---
>  net/mpls/mpls_gso.c   | 24 +---
>  net/mpls/mpls_iptunnel.c  |  5 +
>  net/openvswitch/actions.c |  6 ++
>  3 files changed, 24 insertions(+), 11 deletions(-)
>



> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 1ecbd7715f6d..6d78f162a88b 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -167,6 +167,12 @@ static int push_mpls(struct sk_buff *skb, struct 
> sw_flow_key *key,
> skb->mac_len);
> skb_reset_mac_header(skb);
>
> +   /* for GSO: set MPLS as network header and encapsulated protocol
> +* header as inner network header
> +*/
> +   skb_set_network_header(skb, skb->mac_len);
> +   skb_set_inner_network_header(skb, skb->mac_len + MPLS_HLEN);
> +
> new_mpls_lse = (__be32 *)skb_mpls_header(skb);
> *new_mpls_lse = mpls->mpls_lse;
>

So the one question I would have about this is how attached are you to
using the network_header to record the offset for the MPLS header?  I
ask because I think from a hardware offloading perspective it would
make it much easier if instead you used the inner_mac_header to
represent the offset for the MPLS header.  This way device drivers
could just skip over it like a VLAN and just use network and transport
header values like they would otherwise.

- Alex


[PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-08-17 Thread David Ahern
As reported by Lennert the MPLS GSO code is failing to properly segment
large packets. There are a couple of problems:

1. the inner protocol is not set so the gso segment functions for inner
   protocol layers are not getting run, and

2. MPLS labels for packets that use the "native" (non-OVS) MPLS code
   are not properly accounted for in mpls_gso_segment.

The MPLS GSO code was added for OVS. It is re-using skb_mac_gso_segment
to call the gso segment functions for the higher layer protocols. That
means skb_mac_gso_segment is called twice -- once with the network
protocol set to MPLS and again with the network protocol set to the
inner protocol.

This patch sets the inner skb protocol addressing item 1 above and sets
the network_header and inner_network_header to mark where the MPLS labels
start and end. The MPLS code in OVS is also updated to set the two
network markers.

From there the MPLS GSO code uses the difference between the network
header and the inner network header to know the size of the MPLS header
that was pushed. It then pulls the MPLS header, resets the mac_len and
protocol for the inner protocol and then calls skb_mac_gso_segment
to segment the skb. Afterwards the skb protocol is set to mpls for
each segment as suggested by Simon.

Reported-by: Lennert Buytenhek 
Signed-off-by: David Ahern 
---
 net/mpls/mpls_gso.c   | 24 +---
 net/mpls/mpls_iptunnel.c  |  5 +
 net/openvswitch/actions.c |  6 ++
 3 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
index 2055e57ed1c3..fa6899f02cc8 100644
--- a/net/mpls/mpls_gso.c
+++ b/net/mpls/mpls_gso.c
@@ -22,33 +22,35 @@
 static struct sk_buff *mpls_gso_segment(struct sk_buff *skb,
   netdev_features_t features)
 {
+   int mpls_hlen = skb_inner_network_header(skb) - skb_network_header(skb);
struct sk_buff *segs = ERR_PTR(-EINVAL);
+   u16 mac_offset = skb->mac_header;
netdev_features_t mpls_features;
__be16 mpls_protocol;
+   u16 mac_len = skb->mac_len;
 
/* Setup inner SKB. */
mpls_protocol = skb->protocol;
skb->protocol = skb->inner_protocol;
 
-   /* Push back the mac header that skb_mac_gso_segment() has pulled.
-* It will be re-pulled by the call to skb_mac_gso_segment() below
-*/
-   __skb_push(skb, skb->mac_len);
+   __skb_pull(skb, mpls_hlen);
+   skb->mac_len = skb_inner_network_offset(skb);
 
/* Segment inner packet. */
mpls_features = skb->dev->mpls_features & features;
segs = skb_mac_gso_segment(skb, mpls_features);
-
+   if (IS_ERR_OR_NULL(segs)) {
+   skb_gso_error_unwind(skb, mpls_protocol, mpls_hlen, mac_offset,
+mac_len);
+   goto out;
+   }
 
/* Restore outer protocol. */
skb->protocol = mpls_protocol;
+   for (skb = segs; skb; skb = skb->next)
+   skb->protocol = mpls_protocol;
 
-   /* Re-pull the mac header that the call to skb_mac_gso_segment()
-* above pulled.  It will be re-pushed after returning
-* skb_mac_gso_segment(), an indirect caller of this function.
-*/
-   __skb_pull(skb, skb->data - skb_mac_header(skb));
-
+out:
return segs;
 }
 
diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
index aed872cc05a6..55c5ab907563 100644
--- a/net/mpls/mpls_iptunnel.c
+++ b/net/mpls/mpls_iptunnel.c
@@ -90,7 +90,12 @@ static int mpls_xmit(struct sk_buff *skb)
if (skb_cow(skb, hh_len + new_header_size))
goto drop;
 
+   skb_set_inner_protocol(skb, skb->protocol);
+   skb_reset_inner_network_header(skb);
+   skb->encapsulation = 1;
+
skb_push(skb, new_header_size);
+
skb_reset_network_header(skb);
 
skb->dev = out_dev;
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 1ecbd7715f6d..6d78f162a88b 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -167,6 +167,12 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
skb->mac_len);
skb_reset_mac_header(skb);
 
+   /* for GSO: set MPLS as network header and encapsulated protocol
+* header as inner network header
+*/
+   skb_set_network_header(skb, skb->mac_len);
+   skb_set_inner_network_header(skb, skb->mac_len + MPLS_HLEN);
+
new_mpls_lse = (__be32 *)skb_mpls_header(skb);
*new_mpls_lse = mpls->mpls_lse;
 
-- 
2.1.4