Re: [RFC PATCH v2 02/10] udp: implement GRO for plain UDP sockets.

2018-10-22 Thread Willem de Bruijn
> >
> > > +static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
> > > +  struct sk_buff *skb)
> > > +{
> > > +   struct udphdr *uh = udp_hdr(skb);
> > > +   struct sk_buff *pp = NULL;
> > > +   struct udphdr *uh2;
> > > +   struct sk_buff *p;
> > > +
> > > +   /* requires non zero csum, for simmetry with GSO */
> > > +   if (!uh->check) {
> > > +   NAPI_GRO_CB(skb)->flush = 1;
> > > +   return NULL;
> > > +   }
> >
> > Why is the requirement of checksums different than in
> > udp_gro_receive? It's not that I care much about UDP
> > packets without a checksum, but you would not need
> > to implement your own loop if the requirement could
> > be the same as in udp_gro_receive.

It would be nice if we could deduplicate the loops, but even without
the checksum difference they look to me a bit too different for it to be
practical, also with the constraints on segment length and max aggregation.

> uhm
> AFAIU, we need to generate aggregated packets that UDP GSO is able to
> process/segment. I was unable to get a nocsum packet segmented (possibly
> PEBKAC), so I enforced that condition on the rx path.
>
> @Willem: did I see a ghost here? Is UDP_SEGMENT fine with no-checksum
> segments?

udp_send_skb fails with EIO if ip_summed is anything but CHECKSUM_PARTIAL,
but that is not in the forwarding path. Still, __udp_gso_segment as is
depends on that invariant and will not handle packets with a zero
checksum correctly: it unconditionally adjusts uh->check. That could
be changed, of course.
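
To illustrate the point about unconditionally adjusting uh->check: in UDP over IPv4 a zero checksum field means "no checksum was computed" (RFC 768), so an incremental update applied to it would produce a bogus non-zero value. The following is only a user-space sketch under that assumption, not the kernel's __udp_gso_segment code. The function names are hypothetical, the arithmetic follows RFC 1624 (eqn. 3), and values are kept in host order for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Full ones'-complement checksum over 16-bit words (host order, illustration only). */
uint16_t csum_full(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += words[i];
    while (sum >> 16)                       /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m'), for one 16-bit field changing. */
uint16_t csum_replace16(uint16_t check, uint16_t old_val, uint16_t new_val)
{
    uint32_t sum = (uint16_t)~check;

    sum += (uint16_t)~old_val;
    sum += new_val;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical helper: adjust a UDP checksum after the length field changed,
 * leaving a zero ("no checksum") field untouched.
 */
uint16_t udp_adjust_check_for_len(uint16_t check, uint16_t old_len,
                                  uint16_t new_len)
{
    uint16_t c;

    if (check == 0)                         /* sender did not checksum */
        return 0;
    c = csum_replace16(check, old_len, new_len);
    return c ? c : 0xffff;                  /* computed zero goes out as all ones */
}
```

Without the check == 0 guard, the adjustment would turn "no checksum" into a value that the receiver then tries to verify.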


Re: [RFC PATCH v2 02/10] udp: implement GRO for plain UDP sockets.

2018-10-22 Thread Willem de Bruijn
On Mon, Oct 22, 2018 at 6:13 AM Paolo Abeni  wrote:
>
> On Sun, 2018-10-21 at 16:06 -0400, Willem de Bruijn wrote:
> > On Fri, Oct 19, 2018 at 10:30 AM Paolo Abeni  wrote:
> > >
> > > This is the RX counterpart of commit bec1f6f69736 ("udp: generate gso
> > > with UDP_SEGMENT"). When UDP_GRO is enabled, such socket is also
> > > eligible for GRO in the rx path: UDP segments directed to such socket
> > > are assembled into a larger GSO_UDP_L4 packet.
> > >
> > > The core UDP GRO support is enabled with setsockopt(UDP_GRO).
> > >
> > > Initial benchmark numbers:
> > >
> > > Before:
> > > udp rx:   1079 MB/s   769065 calls/s
> > >
> > > After:
> > > udp rx:   1466 MB/s    24877 calls/s
> > >
> > >
> > > This change introduces a side effect in respect to UDP tunnels:
> > > after a UDP tunnel creation, now the kernel performs a lookup per ingress
> > > UDP packet, while before such lookup happened only if the ingress packet
> > > carried a valid internal header csum.
> > >
> > > v1 -> v2:
> > >  - use a new option to enable UDP GRO
> > >  - use static keys to protect the UDP GRO socket lookup
> > >
> > > Signed-off-by: Paolo Abeni 
> > > ---
> > >  include/linux/udp.h  |   3 +-
> > >  include/uapi/linux/udp.h |   1 +
> > >  net/ipv4/udp.c   |   7 +++
> > >  net/ipv4/udp_offload.c   | 109 +++
> > >  net/ipv6/udp_offload.c   |   6 +--
> > >  5 files changed, 98 insertions(+), 28 deletions(-)
> > >
> > > diff --git a/include/linux/udp.h b/include/linux/udp.h
> > > index a4dafff407fb..f613b329852e 100644
> > > --- a/include/linux/udp.h
> > > +++ b/include/linux/udp.h
> > > @@ -50,11 +50,12 @@ struct udp_sock {
> > >         __u8             encap_type;    /* Is this an Encapsulation socket? */
> > >         unsigned char    no_check6_tx:1,/* Send zero UDP6 checksums on TX? */
> > >                          no_check6_rx:1,/* Allow zero UDP6 checksums on RX? */
> > > -                        encap_enabled:1; /* This socket enabled encap
> > > +                        encap_enabled:1, /* This socket enabled encap
> > >                                            * processing; UDP tunnels and
> > >                                            * different encapsulation layer set
> > >                                            * this
> > >                                            */
> > > +                        gro_enabled:1; /* Can accept GRO packets */
> > >
> > > /*
> > >  * Following member retains the information to create a UDP header
> > >  * when the socket is uncorked.
> > > diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
> > > index 09502de447f5..30baccb6c9c4 100644
> > > --- a/include/uapi/linux/udp.h
> > > +++ b/include/uapi/linux/udp.h
> > > @@ -33,6 +33,7 @@ struct udphdr {
> > >  #define UDP_NO_CHECK6_TX 101   /* Disable sending checksum for UDP6X */
> > >  #define UDP_NO_CHECK6_RX 102   /* Disable accpeting checksum for UDP6 */
> > >  #define UDP_SEGMENT      103   /* Set GSO segmentation size */
> > > +#define UDP_GRO          104   /* This socket can receive UDP GRO packets */
> > >
> > >  /* UDP encapsulation types */
> > >  #define UDP_ENCAP_ESPINUDP_NON_IKE 1 /* draft-ietf-ipsec-nat-t-ike-00/01 */
> > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > > index 9fcb5374e166..3c277378814f 100644
> > > --- a/net/ipv4/udp.c
> > > +++ b/net/ipv4/udp.c
> > > @@ -115,6 +115,7 @@
> > >  #include "udp_impl.h"
> > >  #include 
> > >  #include 
> > > +#include 
> > >
> > >  struct udp_table udp_table __read_mostly;
> > >  EXPORT_SYMBOL(udp_table);
> > > @@ -2459,6 +2460,12 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
> > > up->gso_size = val;
> > > break;
> > >
> > > +   case UDP_GRO:
> > > +   if (valbool)
> > > +   udp_tunnel_encap_enable(sk->sk_socket);
> > > +   up->gro_enabled = valbool;
> >
> > The socket lock is not held here, so multiple updates to
> > up->gro_enabled and the up->encap_enabled and the static branch can
> > race. Syzkaller is adept at generating those.
>
> Good catch. I was fooled by the existing code. I think there
> are potentially similar issues for UDP_ENCAP, UDPLITE_SEND_CSCOV, ...
>
> Since the rx path doesn't take it anymore and we don't risk starving, I
> think we could/should always acquire the socket lock on setsockopt,
> wdyt?

Agreed. We had to add a lot of those in packet_setsockopt for the same reason.
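
As a rough user-space illustration of why the lock matters: the flag bits share a storage unit, and enabling GRO also involves a check-then-set on encap_enabled plus a global side effect. The sketch below is an analogy only, with hypothetical names; a pthread mutex stands in for the socket lock and a plain counter stands in for the static branch. It is not the kernel change itself.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the flag bits in struct udp_sock. */
struct fake_udp_sock {
    pthread_mutex_t lock;               /* plays the role of the socket lock */
    unsigned char encap_enabled:1,      /* shares a byte with gro_enabled */
                  gro_enabled:1;
};

/* Stand-in for the static branch: counts how often encap was switched on. */
int fake_encap_users;

/* Update both bits and the global counter as one critical section, so two
 * concurrent callers cannot both observe encap_enabled == 0 and bump the
 * counter twice, nor lose each other's bitfield writes.
 */
void fake_set_gro(struct fake_udp_sock *up, bool on)
{
    pthread_mutex_lock(&up->lock);
    if (on && !up->encap_enabled) {
        up->encap_enabled = 1;
        fake_encap_users++;
    }
    up->gro_enabled = on;
    pthread_mutex_unlock(&up->lock);
}
```

In the kernel the equivalent would presumably be a lock_sock()/release_sock() pair around the option handling, which is what the discussion above converges on.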


Re: [RFC PATCH v2 02/10] udp: implement GRO for plain UDP sockets.

2018-10-22 Thread Paolo Abeni
On Mon, 2018-10-22 at 13:24 +0200, Steffen Klassert wrote:
> On Fri, Oct 19, 2018 at 04:25:12PM +0200, Paolo Abeni wrote:
> >  
> > +#define UDO_GRO_CNT_MAX 64
> 
> Maybe better UDP_GRO_CNT_MAX?

Oops, typo. Yes, sure, will address in the next iteration.

> Btw. do we really need this explicit limit?
> We should not get more than 64 packets during
> one napi poll cycle.

With HZ >= 1000, gro_flush happens at most once per jiffy: we can
have many more than 64 packets per segment, with an appropriate pkt len.

> 
> > +static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
> > +  struct sk_buff *skb)
> > +{
> > +   struct udphdr *uh = udp_hdr(skb);
> > +   struct sk_buff *pp = NULL;
> > +   struct udphdr *uh2;
> > +   struct sk_buff *p;
> > +
> > +   /* requires non zero csum, for simmetry with GSO */
> > +   if (!uh->check) {
> > +   NAPI_GRO_CB(skb)->flush = 1;
> > +   return NULL;
> > +   }
> 
> Why is the requirement of checksums different than in 
> udp_gro_receive? It's not that I care much about UDP
> packets without a checksum, but you would not need
> to implement your own loop if the requirement could
> be the same as in udp_gro_receive.

uhm
AFAIU, we need to generate aggregated packets that UDP GSO is able to
process/segment. I was unable to get a nocsum packet segmented (possibly
PEBKAC), so I enforced that condition on the rx path.

@Willem: did I see a ghost here? Is UDP_SEGMENT fine with no-checksum
segments?

> > +
> > +   /* pull encapsulating udp header */
> > +   skb_gro_pull(skb, sizeof(struct udphdr));
> > +   skb_gro_postpull_rcsum(skb, uh, sizeof(struct udphdr));
> > +
> > +   list_for_each_entry(p, head, list) {
> > +   if (!NAPI_GRO_CB(p)->same_flow)
> > +   continue;
> > +
> > +   uh2 = udp_hdr(p);
> > +
> > +   /* Match ports only, as csum is always non zero */
> > +   if ((*(u32 *)&uh->source != *(u32 *)&uh2->source)) {
> > +   NAPI_GRO_CB(p)->same_flow = 0;
> > +   continue;
> > +   }
> > +
> > +   /* Terminate the flow on len mismatch or if it grow "too much".
> > +* Under small packet flood GRO count could elsewhere grow a lot
> > +* leading to execessive truesize values
> > +*/
> > +   if (!skb_gro_receive(p, skb) &&
> > +   NAPI_GRO_CB(p)->count > UDO_GRO_CNT_MAX)
> 
> This allows merging UDO_GRO_CNT_MAX + 1 packets.

Thanks, will address in the next iteration.

Cheers,

Paolo
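
The off-by-one noted above can be seen with a small stand-alone model. This is a sketch under two assumptions: the held skb starts with a GRO count of 1, and each successful skb_gro_receive() increments it before the limit is tested. The function name is hypothetical.

```c
/* Model of the flush test: returns how many packets end up merged in one
 * aggregate before the limit check asks for a flush.
 * inclusive == 0 models "count > max" (as in the patch),
 * inclusive != 0 models "count >= max".
 */
unsigned int packets_merged_before_flush(unsigned int max, int inclusive)
{
    unsigned int count = 1;     /* the held skb itself */

    for (;;) {
        count++;                /* one more segment merged */
        if (inclusive ? count >= max : count > max)
            return count;
    }
}
```

With max = 64, the "count > max" form stops at 65 merged packets, while "count >= max" stops at 64.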



Re: [RFC PATCH v2 02/10] udp: implement GRO for plain UDP sockets.

2018-10-22 Thread Steffen Klassert
On Fri, Oct 19, 2018 at 04:25:12PM +0200, Paolo Abeni wrote:
>  
> +#define UDO_GRO_CNT_MAX 64

Maybe better UDP_GRO_CNT_MAX?

Btw. do we really need this explicit limit?
We should not get more than 64 packets during
one napi poll cycle.

> +static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
> +struct sk_buff *skb)
> +{
> + struct udphdr *uh = udp_hdr(skb);
> + struct sk_buff *pp = NULL;
> + struct udphdr *uh2;
> + struct sk_buff *p;
> +
> + /* requires non zero csum, for simmetry with GSO */
> + if (!uh->check) {
> + NAPI_GRO_CB(skb)->flush = 1;
> + return NULL;
> + }

Why is the requirement of checksums different than in 
udp_gro_receive? It's not that I care much about UDP
packets without a checksum, but you would not need
to implement your own loop if the requirement could
be the same as in udp_gro_receive.

> +
> + /* pull encapsulating udp header */
> + skb_gro_pull(skb, sizeof(struct udphdr));
> + skb_gro_postpull_rcsum(skb, uh, sizeof(struct udphdr));
> +
> + list_for_each_entry(p, head, list) {
> + if (!NAPI_GRO_CB(p)->same_flow)
> + continue;
> +
> + uh2 = udp_hdr(p);
> +
> + /* Match ports only, as csum is always non zero */
> + if ((*(u32 *)&uh->source != *(u32 *)&uh2->source)) {
> + NAPI_GRO_CB(p)->same_flow = 0;
> + continue;
> + }
> +
> + /* Terminate the flow on len mismatch or if it grow "too much".
> +  * Under small packet flood GRO count could elsewhere grow a lot
> +  * leading to execessive truesize values
> +  */
> + if (!skb_gro_receive(p, skb) &&
> + NAPI_GRO_CB(p)->count > UDO_GRO_CNT_MAX)

This allows merging UDO_GRO_CNT_MAX + 1 packets.

> + pp = p;
> + else if (uh->len != uh2->len)
> + pp = p;
> +
> + return pp;
> + }
> +
> + /* mismatch, but we never need to flush */
> + return NULL;
> +}



Re: [RFC PATCH v2 02/10] udp: implement GRO for plain UDP sockets.

2018-10-22 Thread Paolo Abeni
On Sun, 2018-10-21 at 16:06 -0400, Willem de Bruijn wrote:
> On Fri, Oct 19, 2018 at 10:30 AM Paolo Abeni  wrote:
> > 
> > This is the RX counterpart of commit bec1f6f69736 ("udp: generate gso
> > with UDP_SEGMENT"). When UDP_GRO is enabled, such socket is also
> > eligible for GRO in the rx path: UDP segments directed to such socket
> > are assembled into a larger GSO_UDP_L4 packet.
> > 
> > The core UDP GRO support is enabled with setsockopt(UDP_GRO).
> > 
> > Initial benchmark numbers:
> > 
> > Before:
> > udp rx:   1079 MB/s   769065 calls/s
> > 
> > After:
> > udp rx:   1466 MB/s    24877 calls/s
> > 
> > 
> > This change introduces a side effect in respect to UDP tunnels:
> > after a UDP tunnel creation, now the kernel performs a lookup per ingress
> > UDP packet, while before such lookup happened only if the ingress packet
> > carried a valid internal header csum.
> > 
> > v1 -> v2:
> >  - use a new option to enable UDP GRO
> >  - use static keys to protect the UDP GRO socket lookup
> > 
> > Signed-off-by: Paolo Abeni 
> > ---
> >  include/linux/udp.h  |   3 +-
> >  include/uapi/linux/udp.h |   1 +
> >  net/ipv4/udp.c   |   7 +++
> >  net/ipv4/udp_offload.c   | 109 +++
> >  net/ipv6/udp_offload.c   |   6 +--
> >  5 files changed, 98 insertions(+), 28 deletions(-)
> > 
> > diff --git a/include/linux/udp.h b/include/linux/udp.h
> > index a4dafff407fb..f613b329852e 100644
> > --- a/include/linux/udp.h
> > +++ b/include/linux/udp.h
> > @@ -50,11 +50,12 @@ struct udp_sock {
> >         __u8             encap_type;    /* Is this an Encapsulation socket? */
> >         unsigned char    no_check6_tx:1,/* Send zero UDP6 checksums on TX? */
> >                          no_check6_rx:1,/* Allow zero UDP6 checksums on RX? */
> > -                        encap_enabled:1; /* This socket enabled encap
> > +                        encap_enabled:1, /* This socket enabled encap
> >                                            * processing; UDP tunnels and
> >                                            * different encapsulation layer set
> >                                            * this
> >                                            */
> > +                        gro_enabled:1; /* Can accept GRO packets */
> > 
> > /*
> >  * Following member retains the information to create a UDP header
> >  * when the socket is uncorked.
> > diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
> > index 09502de447f5..30baccb6c9c4 100644
> > --- a/include/uapi/linux/udp.h
> > +++ b/include/uapi/linux/udp.h
> > @@ -33,6 +33,7 @@ struct udphdr {
> >  #define UDP_NO_CHECK6_TX 101   /* Disable sending checksum for UDP6X */
> >  #define UDP_NO_CHECK6_RX 102   /* Disable accpeting checksum for UDP6 */
> >  #define UDP_SEGMENT      103   /* Set GSO segmentation size */
> > +#define UDP_GRO          104   /* This socket can receive UDP GRO packets */
> > 
> >  /* UDP encapsulation types */
> >  #define UDP_ENCAP_ESPINUDP_NON_IKE 1 /* draft-ietf-ipsec-nat-t-ike-00/01 */
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index 9fcb5374e166..3c277378814f 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -115,6 +115,7 @@
> >  #include "udp_impl.h"
> >  #include 
> >  #include 
> > +#include 
> > 
> >  struct udp_table udp_table __read_mostly;
> >  EXPORT_SYMBOL(udp_table);
> > @@ -2459,6 +2460,12 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
> > up->gso_size = val;
> > break;
> > 
> > +   case UDP_GRO:
> > +   if (valbool)
> > +   udp_tunnel_encap_enable(sk->sk_socket);
> > +   up->gro_enabled = valbool;
> 
> The socket lock is not held here, so multiple updates to
> up->gro_enabled and the up->encap_enabled and the static branch can
> race. Syzkaller is adept at generating those.

Good catch. I was fooled by the existing code. I think there
are potentially similar issues for UDP_ENCAP, UDPLITE_SEND_CSCOV, ...

Since the rx path doesn't take it anymore and we don't risk starving, I
think we could/should always acquire the socket lock on setsockopt,
wdyt?

> > +#define UDO_GRO_CNT_MAX 64
> > +static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
> > +  struct sk_buff *skb)
> > +{
> > +   struct udphdr *uh = udp_hdr(skb);
> > +   struct sk_buff *pp = NULL;
> > +   struct udphdr *uh2;
> > +   struct sk_buff *p;
> > +
> > +   /* requires non zero csum, for simmetry with GSO */
> 
> symmetry

Thanks ;)

Paolo
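
For readers following along from user space, enabling the option described in the patch would look roughly like the sketch below. It assumes the option number 104 proposed in this series and that the option lives at the IPPROTO_UDP level, like UDP_SEGMENT. The helper name is hypothetical, and a kernel without the patch is expected to reject the call.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef UDP_GRO
#define UDP_GRO 104     /* value proposed in this patch; an assumption here */
#endif

/* Hypothetical helper: ask the kernel to deliver GRO-aggregated UDP packets
 * on this socket. Returns 0 on success or a negative errno value.
 */
int udp_socket_enable_gro(int fd)
{
    int one = 1;

    if (setsockopt(fd, IPPROTO_UDP, UDP_GRO, &one, sizeof(one)) < 0)
        return -errno;
    return 0;
}
```

A caller would create a SOCK_DGRAM socket, call the helper once, and fall back to regular per-datagram receives if it returns an error.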



Re: [RFC PATCH v2 02/10] udp: implement GRO for plain UDP sockets.

2018-10-21 Thread Willem de Bruijn
On Fri, Oct 19, 2018 at 10:30 AM Paolo Abeni  wrote:
>
> This is the RX counterpart of commit bec1f6f69736 ("udp: generate gso
> with UDP_SEGMENT"). When UDP_GRO is enabled, such socket is also
> eligible for GRO in the rx path: UDP segments directed to such socket
> are assembled into a larger GSO_UDP_L4 packet.
>
> The core UDP GRO support is enabled with setsockopt(UDP_GRO).
>
> Initial benchmark numbers:
>
> Before:
> udp rx:   1079 MB/s   769065 calls/s
>
> After:
> udp rx:   1466 MB/s    24877 calls/s
>
>
> This change introduces a side effect in respect to UDP tunnels:
> after a UDP tunnel creation, now the kernel performs a lookup per ingress
> UDP packet, while before such lookup happened only if the ingress packet
> carried a valid internal header csum.
>
> v1 -> v2:
>  - use a new option to enable UDP GRO
>  - use static keys to protect the UDP GRO socket lookup
>
> Signed-off-by: Paolo Abeni 
> ---
>  include/linux/udp.h  |   3 +-
>  include/uapi/linux/udp.h |   1 +
>  net/ipv4/udp.c   |   7 +++
>  net/ipv4/udp_offload.c   | 109 +++
>  net/ipv6/udp_offload.c   |   6 +--
>  5 files changed, 98 insertions(+), 28 deletions(-)
>
> diff --git a/include/linux/udp.h b/include/linux/udp.h
> index a4dafff407fb..f613b329852e 100644
> --- a/include/linux/udp.h
> +++ b/include/linux/udp.h
> @@ -50,11 +50,12 @@ struct udp_sock {
>         __u8             encap_type;    /* Is this an Encapsulation socket? */
>         unsigned char    no_check6_tx:1,/* Send zero UDP6 checksums on TX? */
>                          no_check6_rx:1,/* Allow zero UDP6 checksums on RX? */
> -                        encap_enabled:1; /* This socket enabled encap
> +                        encap_enabled:1, /* This socket enabled encap
>                                            * processing; UDP tunnels and
>                                            * different encapsulation layer set
>                                            * this
>                                            */
> +                        gro_enabled:1; /* Can accept GRO packets */
>
> /*
>  * Following member retains the information to create a UDP header
>  * when the socket is uncorked.
> diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
> index 09502de447f5..30baccb6c9c4 100644
> --- a/include/uapi/linux/udp.h
> +++ b/include/uapi/linux/udp.h
> @@ -33,6 +33,7 @@ struct udphdr {
>  #define UDP_NO_CHECK6_TX 101   /* Disable sending checksum for UDP6X */
>  #define UDP_NO_CHECK6_RX 102   /* Disable accpeting checksum for UDP6 */
>  #define UDP_SEGMENT      103   /* Set GSO segmentation size */
> +#define UDP_GRO          104   /* This socket can receive UDP GRO packets */
>
>  /* UDP encapsulation types */
>  #define UDP_ENCAP_ESPINUDP_NON_IKE 1 /* draft-ietf-ipsec-nat-t-ike-00/01 */
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 9fcb5374e166..3c277378814f 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -115,6 +115,7 @@
>  #include "udp_impl.h"
>  #include 
>  #include 
> +#include 
>
>  struct udp_table udp_table __read_mostly;
>  EXPORT_SYMBOL(udp_table);
> @@ -2459,6 +2460,12 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
> up->gso_size = val;
> break;
>
> +   case UDP_GRO:
> +   if (valbool)
> +   udp_tunnel_encap_enable(sk->sk_socket);
> +   up->gro_enabled = valbool;

The socket lock is not held here, so multiple updates to
up->gro_enabled and the up->encap_enabled and the static branch can
race. Syzkaller is adept at generating those.

> +#define UDO_GRO_CNT_MAX 64
> +static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
> +  struct sk_buff *skb)
> +{
> +   struct udphdr *uh = udp_hdr(skb);
> +   struct sk_buff *pp = NULL;
> +   struct udphdr *uh2;
> +   struct sk_buff *p;
> +
> +   /* requires non zero csum, for simmetry with GSO */

symmetry


[RFC PATCH v2 02/10] udp: implement GRO for plain UDP sockets.

2018-10-19 Thread Paolo Abeni
This is the RX counterpart of commit bec1f6f69736 ("udp: generate gso
with UDP_SEGMENT"). When UDP_GRO is enabled, such socket is also
eligible for GRO in the rx path: UDP segments directed to such socket
are assembled into a larger GSO_UDP_L4 packet.

The core UDP GRO support is enabled with setsockopt(UDP_GRO).

Initial benchmark numbers:

Before:
udp rx:   1079 MB/s   769065 calls/s

After:
udp rx:   1466 MB/s    24877 calls/s

This change introduces a side effect in respect to UDP tunnels:
after a UDP tunnel creation, now the kernel performs a lookup per ingress
UDP packet, while before such lookup happened only if the ingress packet
carried a valid internal header csum.

v1 -> v2:
 - use a new option to enable UDP GRO
 - use static keys to protect the UDP GRO socket lookup

Signed-off-by: Paolo Abeni 
---
 include/linux/udp.h  |   3 +-
 include/uapi/linux/udp.h |   1 +
 net/ipv4/udp.c   |   7 +++
 net/ipv4/udp_offload.c   | 109 +++
 net/ipv6/udp_offload.c   |   6 +--
 5 files changed, 98 insertions(+), 28 deletions(-)

diff --git a/include/linux/udp.h b/include/linux/udp.h
index a4dafff407fb..f613b329852e 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -50,11 +50,12 @@ struct udp_sock {
        __u8             encap_type;    /* Is this an Encapsulation socket? */
        unsigned char    no_check6_tx:1,/* Send zero UDP6 checksums on TX? */
                         no_check6_rx:1,/* Allow zero UDP6 checksums on RX? */
-                        encap_enabled:1; /* This socket enabled encap
+                        encap_enabled:1, /* This socket enabled encap
                                           * processing; UDP tunnels and
                                           * different encapsulation layer set
                                           * this
                                           */
+                        gro_enabled:1; /* Can accept GRO packets */
/*
 * Following member retains the information to create a UDP header
 * when the socket is uncorked.
diff --git a/include/uapi/linux/udp.h b/include/uapi/linux/udp.h
index 09502de447f5..30baccb6c9c4 100644
--- a/include/uapi/linux/udp.h
+++ b/include/uapi/linux/udp.h
@@ -33,6 +33,7 @@ struct udphdr {
 #define UDP_NO_CHECK6_TX 101   /* Disable sending checksum for UDP6X */
 #define UDP_NO_CHECK6_RX 102   /* Disable accpeting checksum for UDP6 */
 #define UDP_SEGMENT      103   /* Set GSO segmentation size */
+#define UDP_GRO          104   /* This socket can receive UDP GRO packets */
 
 /* UDP encapsulation types */
 #define UDP_ENCAP_ESPINUDP_NON_IKE 1 /* draft-ietf-ipsec-nat-t-ike-00/01 */
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9fcb5374e166..3c277378814f 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -115,6 +115,7 @@
 #include "udp_impl.h"
 #include 
 #include 
+#include 
 
 struct udp_table udp_table __read_mostly;
 EXPORT_SYMBOL(udp_table);
@@ -2459,6 +2460,12 @@ int udp_lib_setsockopt(struct sock *sk, int level, int optname,
up->gso_size = val;
break;
 
+   case UDP_GRO:
+   if (valbool)
+   udp_tunnel_encap_enable(sk->sk_socket);
+   up->gro_enabled = valbool;
+   break;
+
/*
 *  UDP-Lite's partial checksum coverage (RFC 3828).
 */
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 802f2bc00d69..d93c1e8097ba 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -343,6 +343,54 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
return segs;
 }
 
+#define UDO_GRO_CNT_MAX 64
+static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
+  struct sk_buff *skb)
+{
+   struct udphdr *uh = udp_hdr(skb);
+   struct sk_buff *pp = NULL;
+   struct udphdr *uh2;
+   struct sk_buff *p;
+
+   /* requires non zero csum, for simmetry with GSO */
+   if (!uh->check) {
+   NAPI_GRO_CB(skb)->flush = 1;
+   return NULL;
+   }
+
+   /* pull encapsulating udp header */
+   skb_gro_pull(skb, sizeof(struct udphdr));
+   skb_gro_postpull_rcsum(skb, uh, sizeof(struct udphdr));
+
+   list_for_each_entry(p, head, list) {
+   if (!NAPI_GRO_CB(p)->same_flow)
+   continue;
+
+   uh2 = udp_hdr(p);
+
+   /* Match ports only, as csum is always non zero */
+   if ((*(u32 *)&uh->source != *(u32 *)&uh2->source)) {
+   NAPI_GRO_CB(p)->same_flow = 0;
+   continue;
+   }
+
+   /* Terminate the flow on len mismatch or if it grow "too much".
+* Under small packet flood GRO count could elsewhere grow a lot
+* leading to execessive truesize values
+*/
+   if (!skb_gro_receive(p, skb