Re: [ovs-dev] [PATCH v2] MAINTAINERS.rst: Move several people to emeritus status

2023-05-27 Thread Pravin Shelar
On Fri, May 19, 2023 at 7:53 AM Russell Bryant  wrote:
>
> The following document discusses emeritus committer status:
>
> https://docs.openvswitch.org/en/latest/internals/committer-emeritus-status/
>
> There are several people who I would guess consider themselves
> emeritus committers but have not formally declared it. Those moved to
> emeritus status in this commit have either explicitly communicated
> their desire to move or have both not been active in the last year and
> have not yet replied to this patch.
>
> It is easy to re-add people in the future should any emeritus
> committer desire to become active again.
>
> Per our policies, a vote of the majority of current committers (or
> the list of maintainers prior to this change) is required to move a
> committer to emeritus status.
>
> Signed-off-by: Russell Bryant 

Acked-by: Pravin B Shelar 


Re: [ovs-dev] [PATCH net-next v8] net: openvswitch: IPv6: Add IPv6 extension header support

2021-12-09 Thread Pravin Shelar

On Mon, Dec 6, 2021 at 3:00 PM Cpp Code  wrote:
>
> On Thu, Dec 2, 2021 at 9:28 PM Pravin Shelar  wrote:
> >
> > On Thu, Dec 2, 2021 at 12:20 PM Cpp Code  wrote:
> > >
> > > On Wed, Dec 1, 2021 at 11:34 PM Pravin Shelar  
> > > wrote:
> > > >
> > > > On Wed, Nov 24, 2021 at 11:33 AM Toms Atteka  
> > > > wrote:
> > > > >
> > > > > This change adds a new OpenFlow field OFPXMT_OFB_IPV6_EXTHDR and
> > > > > packets can be filtered using ipv6_ext flag.
> > > > >
> > > > > Signed-off-by: Toms Atteka 
> > > > > ---
> > > > >  include/uapi/linux/openvswitch.h |   6 ++
> > > > >  net/openvswitch/flow.c   | 140 
> > > > > +++
> > > > >  net/openvswitch/flow.h   |  14 
> > > > >  net/openvswitch/flow_netlink.c   |  26 +-
> > > > >  4 files changed, 184 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/include/uapi/linux/openvswitch.h 
> > > > > b/include/uapi/linux/openvswitch.h
> > > > > index a87b44cd5590..43790f07e4a2 100644
> > > > > --- a/include/uapi/linux/openvswitch.h
> > > > > +++ b/include/uapi/linux/openvswitch.h
> > > > > @@ -342,6 +342,7 @@ enum ovs_key_attr {
> > > > > OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4,   /* struct 
> > > > > ovs_key_ct_tuple_ipv4 */
> > > > > OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6,   /* struct 
> > > > > ovs_key_ct_tuple_ipv6 */
> > > > > OVS_KEY_ATTR_NSH,   /* Nested set of ovs_nsh_key_* */
> > > > > +   OVS_KEY_ATTR_IPV6_EXTHDRS,  /* struct ovs_key_ipv6_exthdr */
> > > > >
> > > > >  #ifdef __KERNEL__
> > > > > OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
> > > > > @@ -421,6 +422,11 @@ struct ovs_key_ipv6 {
> > > > > __u8   ipv6_frag;   /* One of OVS_FRAG_TYPE_*. */
> > > > >  };
> > > > >
> > > > > +/* separate structure to support backward compatibility with older 
> > > > > user space */
> > > > > +struct ovs_key_ipv6_exthdrs {
> > > > > +   __u16  hdrs;
> > > > > +};
> > > > > +
> > > > >  struct ovs_key_tcp {
> > > > > __be16 tcp_src;
> > > > > __be16 tcp_dst;
> > > > > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> > > > > index 9d375e74b607..28acb40437ca 100644
> > > > > --- a/net/openvswitch/flow.c
> > > > > +++ b/net/openvswitch/flow.c
> > > > > @@ -239,6 +239,144 @@ static bool icmphdr_ok(struct sk_buff *skb)
> > > > >   sizeof(struct icmphdr));
> > > > >  }
> > > > >
> > > > > +/**
> > > > > + * get_ipv6_ext_hdrs() - Parses packet and sets IPv6 extension 
> > > > > header flags.
> > > > > + *
> > > > > + * @skb: buffer where extension header data starts in packet
> > > > > + * @nh: ipv6 header
> > > > > + * @ext_hdrs: flags are stored here
> > > > > + *
> > > > > + * OFPIEH12_UNREP is set if more than one of a given IPv6 extension 
> > > > > header
> > > > > + * is unexpectedly encountered. (Two destination options headers may 
> > > > > be
> > > > > + * expected and would not cause this bit to be set.)
> > > > > + *
> > > > > + * OFPIEH12_UNSEQ is set if IPv6 extension headers were not in the 
> > > > > order
> > > > > + * preferred (but not required) by RFC 2460:
> > > > > + *
> > > > > + * When more than one extension header is used in the same packet, 
> > > > > it is
> > > > > + * recommended that those headers appear in the following order:
> > > > > + *  IPv6 header
> > > > > + *  Hop-by-Hop Options header
> > > > > + *  Destination Options header
> > > > > + *  Routing header
> > > > > + *  Fragment header
> > > > > + *  Authentication header
> > > > > + *  Encapsulating Security Payload header
> > > > > + *  Destination Options header
> > > > > + *  upper-layer header
> > > >

Re: [ovs-dev] [PATCH net-next v8] net: openvswitch: IPv6: Add IPv6 extension header support

2021-12-02 Thread Pravin Shelar
On Thu, Dec 2, 2021 at 12:20 PM Cpp Code  wrote:
>
> On Wed, Dec 1, 2021 at 11:34 PM Pravin Shelar  wrote:
> >
> > On Wed, Nov 24, 2021 at 11:33 AM Toms Atteka  wrote:
> > >
> > > This change adds a new OpenFlow field OFPXMT_OFB_IPV6_EXTHDR and
> > > packets can be filtered using ipv6_ext flag.
> > >
> > > Signed-off-by: Toms Atteka 
> > > ---
> > >  include/uapi/linux/openvswitch.h |   6 ++
> > >  net/openvswitch/flow.c   | 140 +++
> > >  net/openvswitch/flow.h   |  14 
> > >  net/openvswitch/flow_netlink.c   |  26 +-
> > >  4 files changed, 184 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/uapi/linux/openvswitch.h 
> > > b/include/uapi/linux/openvswitch.h
> > > index a87b44cd5590..43790f07e4a2 100644
> > > --- a/include/uapi/linux/openvswitch.h
> > > +++ b/include/uapi/linux/openvswitch.h
> > > @@ -342,6 +342,7 @@ enum ovs_key_attr {
> > > OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4,   /* struct 
> > > ovs_key_ct_tuple_ipv4 */
> > > OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6,   /* struct 
> > > ovs_key_ct_tuple_ipv6 */
> > > OVS_KEY_ATTR_NSH,   /* Nested set of ovs_nsh_key_* */
> > > +   OVS_KEY_ATTR_IPV6_EXTHDRS,  /* struct ovs_key_ipv6_exthdr */
> > >
> > >  #ifdef __KERNEL__
> > > OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
> > > @@ -421,6 +422,11 @@ struct ovs_key_ipv6 {
> > > __u8   ipv6_frag;   /* One of OVS_FRAG_TYPE_*. */
> > >  };
> > >
> > > +/* separate structure to support backward compatibility with older user 
> > > space */
> > > +struct ovs_key_ipv6_exthdrs {
> > > +   __u16  hdrs;
> > > +};
> > > +
> > >  struct ovs_key_tcp {
> > > __be16 tcp_src;
> > > __be16 tcp_dst;
> > > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> > > index 9d375e74b607..28acb40437ca 100644
> > > --- a/net/openvswitch/flow.c
> > > +++ b/net/openvswitch/flow.c
> > > @@ -239,6 +239,144 @@ static bool icmphdr_ok(struct sk_buff *skb)
> > >   sizeof(struct icmphdr));
> > >  }
> > >
> > > +/**
> > > + * get_ipv6_ext_hdrs() - Parses packet and sets IPv6 extension header 
> > > flags.
> > > + *
> > > + * @skb: buffer where extension header data starts in packet
> > > + * @nh: ipv6 header
> > > + * @ext_hdrs: flags are stored here
> > > + *
> > > + * OFPIEH12_UNREP is set if more than one of a given IPv6 extension 
> > > header
> > > + * is unexpectedly encountered. (Two destination options headers may be
> > > + * expected and would not cause this bit to be set.)
> > > + *
> > > + * OFPIEH12_UNSEQ is set if IPv6 extension headers were not in the order
> > > + * preferred (but not required) by RFC 2460:
> > > + *
> > > + * When more than one extension header is used in the same packet, it is
> > > + * recommended that those headers appear in the following order:
> > > + *  IPv6 header
> > > + *  Hop-by-Hop Options header
> > > + *  Destination Options header
> > > + *  Routing header
> > > + *  Fragment header
> > > + *  Authentication header
> > > + *  Encapsulating Security Payload header
> > > + *  Destination Options header
> > > + *  upper-layer header
> > > + */
> > > +static void get_ipv6_ext_hdrs(struct sk_buff *skb, struct ipv6hdr *nh,
> > > + u16 *ext_hdrs)
> > > +{
> > > +   u8 next_type = nh->nexthdr;
> > > +   unsigned int start = skb_network_offset(skb) + sizeof(struct 
> > > ipv6hdr);
> > > +   int dest_options_header_count = 0;
> > > +
> > > +   *ext_hdrs = 0;
> > > +
> > > +   while (ipv6_ext_hdr(next_type)) {
> > > +   struct ipv6_opt_hdr _hdr, *hp;
> > > +
> > > +   switch (next_type) {
> > > +   case IPPROTO_NONE:
> > > +   *ext_hdrs |= OFPIEH12_NONEXT;
> > > +   /* stop parsing */
> > > +   return;
> > > +
> > > +   case IPPROTO_ESP:
> > > +   if (*ext_hdrs & OFPIEH12_ESP)
> > > +   *ext_hdrs |= OF

Re: [ovs-dev] [PATCH net-next v8] net: openvswitch: IPv6: Add IPv6 extension header support

2021-12-01 Thread Pravin Shelar
On Wed, Nov 24, 2021 at 11:33 AM Toms Atteka  wrote:
>
> This change adds a new OpenFlow field OFPXMT_OFB_IPV6_EXTHDR and
> packets can be filtered using ipv6_ext flag.
>
> Signed-off-by: Toms Atteka 
> ---
>  include/uapi/linux/openvswitch.h |   6 ++
>  net/openvswitch/flow.c   | 140 +++
>  net/openvswitch/flow.h   |  14 
>  net/openvswitch/flow_netlink.c   |  26 +-
>  4 files changed, 184 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index a87b44cd5590..43790f07e4a2 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -342,6 +342,7 @@ enum ovs_key_attr {
> OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4,   /* struct ovs_key_ct_tuple_ipv4 */
> OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6,   /* struct ovs_key_ct_tuple_ipv6 */
> OVS_KEY_ATTR_NSH,   /* Nested set of ovs_nsh_key_* */
> +   OVS_KEY_ATTR_IPV6_EXTHDRS,  /* struct ovs_key_ipv6_exthdr */
>
>  #ifdef __KERNEL__
> OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
> @@ -421,6 +422,11 @@ struct ovs_key_ipv6 {
> __u8   ipv6_frag;   /* One of OVS_FRAG_TYPE_*. */
>  };
>
> +/* separate structure to support backward compatibility with older user 
> space */
> +struct ovs_key_ipv6_exthdrs {
> +   __u16  hdrs;
> +};
> +
>  struct ovs_key_tcp {
> __be16 tcp_src;
> __be16 tcp_dst;
> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> index 9d375e74b607..28acb40437ca 100644
> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
> @@ -239,6 +239,144 @@ static bool icmphdr_ok(struct sk_buff *skb)
>   sizeof(struct icmphdr));
>  }
>
> +/**
> + * get_ipv6_ext_hdrs() - Parses packet and sets IPv6 extension header flags.
> + *
> + * @skb: buffer where extension header data starts in packet
> + * @nh: ipv6 header
> + * @ext_hdrs: flags are stored here
> + *
> + * OFPIEH12_UNREP is set if more than one of a given IPv6 extension header
> + * is unexpectedly encountered. (Two destination options headers may be
> + * expected and would not cause this bit to be set.)
> + *
> + * OFPIEH12_UNSEQ is set if IPv6 extension headers were not in the order
> + * preferred (but not required) by RFC 2460:
> + *
> + * When more than one extension header is used in the same packet, it is
> + * recommended that those headers appear in the following order:
> + *  IPv6 header
> + *  Hop-by-Hop Options header
> + *  Destination Options header
> + *  Routing header
> + *  Fragment header
> + *  Authentication header
> + *  Encapsulating Security Payload header
> + *  Destination Options header
> + *  upper-layer header
> + */
> +static void get_ipv6_ext_hdrs(struct sk_buff *skb, struct ipv6hdr *nh,
> + u16 *ext_hdrs)
> +{
> +   u8 next_type = nh->nexthdr;
> +   unsigned int start = skb_network_offset(skb) + sizeof(struct ipv6hdr);
> +   int dest_options_header_count = 0;
> +
> +   *ext_hdrs = 0;
> +
> +   while (ipv6_ext_hdr(next_type)) {
> +   struct ipv6_opt_hdr _hdr, *hp;
> +
> +   switch (next_type) {
> +   case IPPROTO_NONE:
> +   *ext_hdrs |= OFPIEH12_NONEXT;
> +   /* stop parsing */
> +   return;
> +
> +   case IPPROTO_ESP:
> +   if (*ext_hdrs & OFPIEH12_ESP)
> +   *ext_hdrs |= OFPIEH12_UNREP;
> +   if ((*ext_hdrs & ~(OFPIEH12_HOP | OFPIEH12_DEST |
> +  OFPIEH12_ROUTER | IPPROTO_FRAGMENT 
> |
> +  OFPIEH12_AUTH | OFPIEH12_UNREP)) ||
> +   dest_options_header_count >= 2) {
> +   *ext_hdrs |= OFPIEH12_UNSEQ;
> +   }
> +   *ext_hdrs |= OFPIEH12_ESP;
> +   break;
You need to call check_header() before looking into each extension header.
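
For context, a minimal sketch of what that could look like, assuming the existing check_header() helper from net/openvswitch/flow.c is reachable in this scope; the per-type flag handling is elided and the length computation is simplified (the Authentication header, for example, uses a different length formula):

	/* Sketch only: validate each extension header before reading it. */
	while (ipv6_ext_hdr(next_type)) {
		struct ipv6_opt_hdr _hdr, *hp;

		/* Ensure the generic option header is present and linear
		 * before dereferencing nexthdr/hdrlen (this is the
		 * check_header() call being asked for). */
		if (check_header(skb, start + sizeof(struct ipv6_opt_hdr)))
			return;

		hp = skb_header_pointer(skb, start, sizeof(_hdr), &_hdr);
		if (!hp)
			return;

		/* ... per-type OFPIEH12_* flag handling as in the patch ... */

		next_type = hp->nexthdr;
		start += ipv6_optlen(hp);	/* AH needs its own length formula */
	}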


Re: [ovs-dev] [PATCH 2/2] openvswitch: Fix condition check in output_userspace() by using nla_ok()

2021-09-21 Thread Pravin Shelar
On Fri, Sep 17, 2021 at 1:25 AM Jiasheng Jiang  wrote:
>
> Just using 'rem > 0' might be unsafe, so it's better
> to use the nla_ok() instead.
> Because we can see from the nla_next() that
> '*remaining' might be smaller than 'totlen'. And nla_ok()
> will avoid it happening.
> For example, ovs_dp_process_packet() -> ovs_execute_actions()
> -> do_execute_actions() -> output_userspace(), and attr comes
> from OVS_CB(skb)->input_vport,which restores the received packet
> from the user space.
>
> Fixes: ccb1352e76cff0524e7ccb2074826a092dd13016
> ('net: Add Open vSwitch kernel components.')
> Signed-off-by: Jiasheng Jiang 

> ---
>  net/openvswitch/actions.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index c23537f..e8236dd 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -915,8 +915,7 @@ static int output_userspace(struct datapath *dp, struct 
> sk_buff *skb,
> upcall.cmd = OVS_PACKET_CMD_ACTION;
> upcall.mru = OVS_CB(skb)->mru;
>
> -   for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
> -a = nla_next(a, )) {
> +   nla_for_each_nested(a, attr, rem) {
> switch (nla_type(a)) {
> case OVS_USERSPACE_ATTR_USERDATA:
> upcall.userdata = a;

These nl-attributes are built and verified at the time of OVS flow
install, so the rest of the checks in nla_ok() are not required.
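
For readers following along, nla_ok() from include/net/netlink.h (quoted from memory, so treat it as a paraphrase) shows which extra checks are being discussed: beyond the remaining-length test, it also validates the attribute's own nla_len field, which is the part already guaranteed by flow-install-time validation:

	static inline int nla_ok(const struct nlattr *nla, int remaining)
	{
		return remaining >= (int) sizeof(*nla) &&
		       nla->nla_len >= sizeof(*nla) &&
		       nla->nla_len <= remaining;
	}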


Re: [ovs-dev] [PATCH net-next v2] openvswitch: Introduce per-cpu upcall dispatch

2021-07-15 Thread Pravin Shelar
On Thu, Jul 15, 2021 at 5:28 AM Mark Gray  wrote:
>
> The Open vSwitch kernel module uses the upcall mechanism to send
> packets from kernel space to user space when it misses in the kernel
> space flow table. The upcall sends packets via a Netlink socket.
> Currently, a Netlink socket is created for every vport. In this way,
> there is a 1:1 mapping between a vport and a Netlink socket.
> When a packet is received by a vport, if it needs to be sent to
> user space, it is sent via the corresponding Netlink socket.
>
> This mechanism, with various iterations of the corresponding user
> space code, has seen some limitations and issues:
>
> * On systems with a large number of vports, there is a correspondingly
> large number of Netlink sockets which can limit scaling.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
> * Packet reordering on upcalls.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
> * A thundering herd issue.
> (https://bugzilla.redhat.com/show_bug.cgi?id=183)
>
> This patch introduces an alternative, feature-negotiated, upcall
> mode using a per-cpu dispatch rather than a per-vport dispatch.
>
> In this mode, the Netlink socket to be used for the upcall is
> selected based on the CPU of the thread that is executing the upcall.
> In this way, it resolves the issues above as:
>
> a) The number of Netlink sockets scales with the number of CPUs
> rather than the number of vports.
> b) Ordering per-flow is maintained as packets are distributed to
> CPUs based on mechanisms such as RSS and flows are distributed
> to a single user space thread.
> c) Packets from a flow can only wake up one user space thread.
>
> The corresponding user space code can be found at:
> https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385139.html
>
> Bugzilla: https://bugzilla.redhat.com/1844576
> Signed-off-by: Mark Gray 
> Acked-by: Flavio Leitner 

Acked-by: Pravin B Shelar 

Thanks,
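
As an illustration of the dispatch mode described in the commit message, here is a minimal sketch of the per-CPU portid lookup; the structure and field names (dp_nlsk_pids, upcall_portids) are assumptions for illustration and may differ from the merged code:

	/* Sketch: pick the upcall Netlink portid from a per-datapath array
	 * indexed by the CPU executing the upcall (called under RCU). */
	static u32 dp_upcall_portid_for_cpu(const struct datapath *dp, u32 cpu)
	{
		const struct dp_nlsk_pids *pids = rcu_dereference(dp->upcall_portids);

		if (!pids || !pids->n_pids)
			return 0;	/* 0 means "do not send the upcall". */

		/* Wrap around if userspace registered fewer sockets than CPUs. */
		return pids->pids[cpu % pids->n_pids];
	}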


Re: [ovs-dev] [PATCH net-next] openvswitch: Introduce per-cpu upcall dispatch

2021-07-14 Thread Pravin Shelar
On Wed, Jun 30, 2021 at 2:53 AM Mark Gray  wrote:
>
> The Open vSwitch kernel module uses the upcall mechanism to send
> packets from kernel space to user space when it misses in the kernel
> space flow table. The upcall sends packets via a Netlink socket.
> Currently, a Netlink socket is created for every vport. In this way,
> there is a 1:1 mapping between a vport and a Netlink socket.
> When a packet is received by a vport, if it needs to be sent to
> user space, it is sent via the corresponding Netlink socket.
>
> This mechanism, with various iterations of the corresponding user
> space code, has seen some limitations and issues:
>
> * On systems with a large number of vports, there is a correspondingly
> large number of Netlink sockets which can limit scaling.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
> * Packet reordering on upcalls.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
> * A thundering herd issue.
> (https://bugzilla.redhat.com/show_bug.cgi?id=183)
>
> This patch introduces an alternative, feature-negotiated, upcall
> mode using a per-cpu dispatch rather than a per-vport dispatch.
>
> In this mode, the Netlink socket to be used for the upcall is
> selected based on the CPU of the thread that is executing the upcall.
> In this way, it resolves the issues above as:
>
> a) The number of Netlink sockets scales with the number of CPUs
> rather than the number of vports.
> b) Ordering per-flow is maintained as packets are distributed to
> CPUs based on mechanisms such as RSS and flows are distributed
> to a single user space thread.
> c) Packets from a flow can only wake up one user space thread.
>
> The corresponding user space code can be found at:
> https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/382618.html
>
> Bugzilla: https://bugzilla.redhat.com/1844576
> Signed-off-by: Mark Gray 
> ---
>
> Notes:
> v1 - Reworked based on Flavio's comments:
>  * Fixed handling of userspace action case
>  * Renamed 'struct dp_portids'
>  * Fixed handling of return from kmalloc()
>  * Removed check for dispatch type from ovs_dp_get_upcall_portid()
>- Reworked based on Dan's comments:
>  * Fixed handling of return from kmalloc()
>- Reworked based on Pravin's comments:
>  * Fixed handling of userspace action case
>- Added kfree() in destroy_dp_rcu() to cleanup netlink port ids
>
Patch looks good to me. I have the following minor comments.

>  include/uapi/linux/openvswitch.h |  8 
>  net/openvswitch/actions.c|  6 ++-
>  net/openvswitch/datapath.c   | 70 +++-
>  net/openvswitch/datapath.h   | 20 +
>  4 files changed, 101 insertions(+), 3 deletions(-)
>
> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index 8d16744edc31..6571b57b2268 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -70,6 +70,8 @@ enum ovs_datapath_cmd {
>   * set on the datapath port (for OVS_ACTION_ATTR_MISS).  Only valid on
>   * %OVS_DP_CMD_NEW requests. A value of zero indicates that upcalls should
>   * not be sent.
> + * OVS_DP_ATTR_PER_CPU_PIDS: Per-cpu array of PIDs for upcalls when
> + * OVS_DP_F_DISPATCH_UPCALL_PER_CPU feature is set.
>   * @OVS_DP_ATTR_STATS: Statistics about packets that have passed through the
>   * datapath.  Always present in notifications.
>   * @OVS_DP_ATTR_MEGAFLOW_STATS: Statistics about mega flow masks usage for 
> the
> @@ -87,6 +89,9 @@ enum ovs_datapath_attr {
> OVS_DP_ATTR_USER_FEATURES,  /* OVS_DP_F_*  */
> OVS_DP_ATTR_PAD,
> OVS_DP_ATTR_MASKS_CACHE_SIZE,
> +   OVS_DP_ATTR_PER_CPU_PIDS,   /* Netlink PIDS to receive upcalls in 
> per-cpu
> +* dispatch mode
> +*/
> __OVS_DP_ATTR_MAX
>  };
>
> @@ -127,6 +132,9 @@ struct ovs_vport_stats {
>  /* Allow tc offload recirc sharing */
>  #define OVS_DP_F_TC_RECIRC_SHARING (1 << 2)
>
> +/* Allow per-cpu dispatch of upcalls */
> +#define OVS_DP_F_DISPATCH_UPCALL_PER_CPU   (1 << 3)
> +
>  /* Fixed logical ports. */
>  #define OVSP_LOCAL  ((__u32)0)
>
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index ef15d9eb4774..f79679746c62 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -924,7 +924,11 @@ static int output_userspace(struct datapath *dp, struct 
> sk_buff *skb,
> break;
>
> case OVS_USERSPACE_ATTR_PID:
> -   upcall.portid = nla_get_u32(a);
> +   if (dp->user_features & 
> OVS_DP_F_DISPATCH_UPCALL_PER_CPU)
> +   upcall.portid =
> +  ovs_dp_get_upcall_portid(dp, 
> smp_processor_id());
> +   else
> +   upcall.portid = nla_get_u32(a);
>

Re: [ovs-dev] [RFC net-next] openvswitch: Introduce per-cpu upcall dispatch

2021-05-19 Thread Pravin Shelar
On Fri, Apr 30, 2021 at 8:33 AM Mark Gray  wrote:
>
> The Open vSwitch kernel module uses the upcall mechanism to send
> packets from kernel space to user space when it misses in the kernel
> space flow table. The upcall sends packets via a Netlink socket.
> Currently, a Netlink socket is created for every vport. In this way,
> there is a 1:1 mapping between a vport and a Netlink socket.
> When a packet is received by a vport, if it needs to be sent to
> user space, it is sent via the corresponding Netlink socket.
>
> This mechanism, with various iterations of the corresponding user
> space code, has seen some limitations and issues:
>
> * On systems with a large number of vports, there is a correspondingly
> large number of Netlink sockets which can limit scaling.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
> * Packet reordering on upcalls.
> (https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
> * A thundering herd issue.
> (https://bugzilla.redhat.com/show_bug.cgi?id=183)
>
> This patch introduces an alternative, feature-negotiated, upcall
> mode using a per-cpu dispatch rather than a per-vport dispatch.
>
> In this mode, the Netlink socket to be used for the upcall is
> selected based on the CPU of the thread that is executing the upcall.
> In this way, it resolves the issues above as:
>
> a) The number of Netlink sockets scales with the number of CPUs
> rather than the number of vports.
> b) Ordering per-flow is maintained as packets are distributed to
> CPUs based on mechanisms such as RSS and flows are distributed
> to a single user space thread.
> c) Packets from a flow can only wake up one user space thread.
>
> The corresponding user space code can be found at:
> https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/382618.html
>
> Bugzilla: https://bugzilla.redhat.com/1844576
> Signed-off-by: Mark Gray 
> ---
>  include/uapi/linux/openvswitch.h |  8 
>  net/openvswitch/datapath.c   | 70 +++-
>  net/openvswitch/datapath.h   | 18 
>  net/openvswitch/flow_netlink.c   |  4 --
>  4 files changed, 94 insertions(+), 6 deletions(-)
>
> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index 8d16744edc31..6571b57b2268 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -70,6 +70,8 @@ enum ovs_datapath_cmd {
>   * set on the datapath port (for OVS_ACTION_ATTR_MISS).  Only valid on
>   * %OVS_DP_CMD_NEW requests. A value of zero indicates that upcalls should
>   * not be sent.
> + * OVS_DP_ATTR_PER_CPU_PIDS: Per-cpu array of PIDs for upcalls when
> + * OVS_DP_F_DISPATCH_UPCALL_PER_CPU feature is set.
>   * @OVS_DP_ATTR_STATS: Statistics about packets that have passed through the
>   * datapath.  Always present in notifications.
>   * @OVS_DP_ATTR_MEGAFLOW_STATS: Statistics about mega flow masks usage for 
> the
> @@ -87,6 +89,9 @@ enum ovs_datapath_attr {
> OVS_DP_ATTR_USER_FEATURES,  /* OVS_DP_F_*  */
> OVS_DP_ATTR_PAD,
> OVS_DP_ATTR_MASKS_CACHE_SIZE,
> +   OVS_DP_ATTR_PER_CPU_PIDS,   /* Netlink PIDS to receive upcalls in 
> per-cpu
> +* dispatch mode
> +*/
> __OVS_DP_ATTR_MAX
>  };
>
> @@ -127,6 +132,9 @@ struct ovs_vport_stats {
>  /* Allow tc offload recirc sharing */
>  #define OVS_DP_F_TC_RECIRC_SHARING (1 << 2)
>
> +/* Allow per-cpu dispatch of upcalls */
> +#define OVS_DP_F_DISPATCH_UPCALL_PER_CPU   (1 << 3)
> +
>  /* Fixed logical ports. */
>  #define OVSP_LOCAL  ((__u32)0)
>
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 9d6ef6cb9b26..98d54f41fdaa 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -121,6 +121,8 @@ int lockdep_ovsl_is_held(void)
>  #endif
>
>  static struct vport *new_vport(const struct vport_parms *);
> +static u32 ovs_dp_get_upcall_portid(const struct datapath *, uint32_t);
> +static int ovs_dp_set_upcall_portids(struct datapath *, const struct nlattr 
> *);
>  static int queue_gso_packets(struct datapath *dp, struct sk_buff *,
>  const struct sw_flow_key *,
>  const struct dp_upcall_info *,
> @@ -238,7 +240,12 @@ void ovs_dp_process_packet(struct sk_buff *skb, struct 
> sw_flow_key *key)
>
> memset(, 0, sizeof(upcall));
> upcall.cmd = OVS_PACKET_CMD_MISS;
> -   upcall.portid = ovs_vport_find_upcall_portid(p, skb);
> +
> +   if (dp->user_features & OVS_DP_F_DISPATCH_UPCALL_PER_CPU)
> +   upcall.portid = ovs_dp_get_upcall_portid(dp, 
> smp_processor_id());
> +   else
> +   upcall.portid = ovs_vport_find_upcall_portid(p, skb);
> +
> upcall.mru = OVS_CB(skb)->mru;
> error = ovs_dp_upcall(dp, skb, key, , 0);
> if (unlikely(error))
> @@ -1590,16 

Re: [ovs-dev] [PATCH net] net: openvswitch: fix TTL decrement action netlink message format

2020-11-20 Thread Pravin Shelar
On Thu, Nov 19, 2020 at 1:04 AM Eelco Chaudron  wrote:
>
> Currently, the openvswitch module is not accepting the correctly formated
> netlink message for the TTL decrement action. For both setting and getting
> the dec_ttl action, the actions should be nested in the
> OVS_DEC_TTL_ATTR_ACTION attribute as mentioned in the openvswitch.h uapi.
>
> Fixes: 744676e77720 ("openvswitch: add TTL decrement action")
> Signed-off-by: Eelco Chaudron 
Thanks for working on this. Can you share an OVS kmod unit test for this action?


Re: [ovs-dev] [PATCH v2] datapath: Add a new action dec_ttl

2020-11-16 Thread Pravin Shelar
On Fri, Nov 13, 2020 at 4:06 AM Ilya Maximets  wrote:
>
> On 11/13/20 11:04 AM, Eelco Chaudron wrote:
> > Add support for the dec_ttl action. Instead of programming the datapath with
> > a flow that matches the packet TTL and an IP set, use a single dec_ttl 
> > action.
> >
> > The old behavior is kept if the new action is not supported by the datapath.
> >
> >   # ovs-ofctl dump-flows br0
> >cookie=0x0, duration=12.538s, table=0, n_packets=4, n_bytes=392, ip 
> > actions=dec_ttl,NORMAL
> >cookie=0x0, duration=12.536s, table=0, n_packets=4, n_bytes=168, 
> > actions=NORMAL
> >
> >   # ping -c1 -t 20 192.168.0.2
> >   PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data.
> >   IP (tos 0x0, ttl 19, id 45336, offset 0, flags [DF], proto ICMP (1), 
> > length 84)
> >   192.168.0.1 > 192.168.0.2: ICMP echo request, id 8865, seq 1, length 
> > 64
> >
> > Linux netlink datapath support depends on upstream Linux commit:
> >   744676e77720 ("openvswitch: add TTL decrement action")
> >
> >
> > Note that in the Linux kernel tree the OVS_ACTION_ATTR_ADD_MPLS has been
> > defined, and to make sure the IDs are in sync, it had to be added to the
> > OVS source tree. This required some additional case statements, which
> > should be revisited once the OVS implementation is added.
> >
> >
> > Co-developed-by: Matteo Croce 
> > Co-developed-by: Bindiya Kurle 
> > Signed-off-by: Eelco Chaudron 
> >
> > ---
> > v2: - Used definition instead of numeric value in format_dec_ttl_action()
> > - Changed format from "dec_ttl(ttl<=1()) to
> >   "dec_ttl(le_1())" to be more in line with the check_pkt_len 
> > action.
> > - Fixed parsing of "dec_ttl()" action for adding a dp flow.
> > - Cleaned up format_dec_ttl_action()
> >
> >  datapath/linux/compat/include/linux/openvswitch.h |8 ++
> >  lib/dpif-netdev.c |4 +
> >  lib/dpif.c|4 +
> >  lib/odp-execute.c |  102 
> > -
> >  lib/odp-execute.h |2
> >  lib/odp-util.c|   42 +
> >  lib/packets.h |   13 ++-
> >  ofproto/ofproto-dpif-ipfix.c  |2
> >  ofproto/ofproto-dpif-sflow.c  |2
> >  ofproto/ofproto-dpif-xlate.c  |   54 +--
> >  ofproto/ofproto-dpif.c|   37 
> >  ofproto/ofproto-dpif.h|6 +
> >  12 files changed, 253 insertions(+), 23 deletions(-)
> >
>
> 
>
> > +static void
> > +format_dec_ttl_action(struct ds *ds,const struct nlattr *attr,
> > +  const struct hmap *portno_names)
> > +{
> > +const struct nlattr *nla_acts = nl_attr_get(attr);
> > +int len = nl_attr_get_size(attr);
> > +
> > +ds_put_cstr(ds,"dec_ttl(le_1(");
> > +
> > +if (len > 4 && nla_acts->nla_type == OVS_DEC_TTL_ATTR_ACTION) {
> > +/* Linux kernel add an additional envelope we should strip. */
> > +len -= nl_attr_len_pad(nla_acts, len);
> > +nla_acts = nl_attr_next(nla_acts);
>
> CC: Pravin
>
> I looked at the kernel and I agree that there is a clear bug in kernel's
> implementaion of this action.  It receives messages on format:
>   OVS_ACTION_ATTR_DEC_TTL(),
> but reports back in format:
>   OVS_ACTION_ATTR_DEC_TTL(OVS_DEC_TTL_ATTR_ACTION()).
>
> Since 'OVS_DEC_TTL_ATTR_ACTION' exists, it's clear that original design
> was to have it, i.e. the correct format should be the form that
> kernel reports back to userspace.  I'd guess that there was a plan to
> add more features to OVS_ACTION_ATTR_DEC_TTL in the future, e.g. set
> actions execution threshold to something different than 1, so it make
> some sense.
>
> Anyway, the bug is in the kernel part of parsing the netlink message and
> it should be fixed.  What I'm suggesting is to send a bugfix to kernel
> to accept only format with nested OVS_DEC_TTL_ATTR_ACTION.  Since this
> feature was never implemented in userspace OVS, this change will not
> break anything.  On the OVS side we should always format netlink messages
> in a correct way.  We have a code that checks feature existence in kernel
> and it should fail if kernel is broken (as it is right now).  In this case
> OVS will continue to use old implementation with setting the ttl field.
>
> Since the patch for a kernel is a bug fix, it should be likely backported
> to stable versions, and distributions will pick it up too.
>
> Thoughts?
>

Yes, I agree. Let's fix the interface to use the correct nested netlink
attribute to pass the actions.

Thanks.
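
To make the agreed-upon format concrete, a hedged sketch of how userspace could emit OVS_ACTION_ATTR_DEC_TTL with the nested OVS_DEC_TTL_ATTR_ACTION envelope, assuming OVS's nl_msg_start_nested()/nl_msg_end_nested()/nl_msg_put() helpers; the wrapper function itself is illustrative, not the actual odp-util.c code:

    /* Sketch: emit OVS_ACTION_ATTR_DEC_TTL(OVS_DEC_TTL_ATTR_ACTION(actions)). */
    static void
    put_dec_ttl_action(struct ofpbuf *odp_actions,
                       const struct nlattr *inner, size_t inner_len)
    {
        size_t dec_ttl_ofs = nl_msg_start_nested(odp_actions,
                                                 OVS_ACTION_ATTR_DEC_TTL);
        size_t actions_ofs = nl_msg_start_nested(odp_actions,
                                                 OVS_DEC_TTL_ATTR_ACTION);

        nl_msg_put(odp_actions, inner, inner_len); /* actions to run on TTL > 1 */

        nl_msg_end_nested(odp_actions, actions_ofs);
        nl_msg_end_nested(odp_actions, dec_ttl_ofs);
    }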


Re: [ovs-dev] [PATCH net-next v3 3/3] net: openvswitch: remove unnused keep_flows

2020-08-26 Thread Pravin Shelar
On Mon, Aug 24, 2020 at 10:08 PM  wrote:
>
> From: Tonghao Zhang 
>
> keep_flows was introduced by [1], which used as flag to delete flows or not.
> When rehashing or expanding the table instance, we will not flush the flows.
> Now don't use it anymore, remove it.
>
> [1] - 
> https://github.com/openvswitch/ovs/commit/acd051f1761569205827dc9b037e15568a8d59f8
> Cc: Pravin B Shelar 
> Signed-off-by: Tonghao Zhang 

This patch looks good. But it's fixing a memory leak; I guess the patch
that removed the dependency on keep_flows is in net-next, so we are good.

Acked-by: Pravin B Shelar 


Re: [ovs-dev] [PATCH net-next v3 2/3] net: openvswitch: refactor flow free function

2020-08-26 Thread Pravin Shelar
On Mon, Aug 24, 2020 at 10:08 PM  wrote:
>
> From: Tonghao Zhang 
>
> Decrease table->count and ufid_count unconditionally,
> because we only don't use count or ufid_count to count
> when flushing the flows. To simplify the codes, we
> remove the "count" argument of table_instance_flow_free.
>
> To avoid a bug when deleting flows in the future, add
> WARN_ON in flush flows function.
>
> Cc: Pravin B Shelar 
> Signed-off-by: Tonghao Zhang 
Looks good.

Acked-by: Pravin B Shelar 


Re: [ovs-dev] [PATCH net-next v2 1/3] net: openvswitch: improve coding style

2020-08-26 Thread Pravin Shelar
On Mon, Aug 24, 2020 at 12:37 AM  wrote:
>
> From: Tonghao Zhang 
>
> Not change the logic, just improve coding style.
>
> Cc: Pravin B Shelar 
> Signed-off-by: Tonghao Zhang 

Acked-by: Pravin B Shelar 


Re: [ovs-dev] [PATCH net-next v3 0/3] net: openvswitch: improve codes

2020-08-26 Thread Pravin Shelar
On Tue, Aug 25, 2020 at 9:37 AM David Miller  wrote:
>
> From: xiangxia.m@gmail.com
> Date: Tue, 25 Aug 2020 13:06:33 +0800
>
> > From: Tonghao Zhang 
> >
> > This series patches are not bug fix, just improve codes.
>
> Pravin, please review this patch series.
>
Sorry for the delay. I will have a look tomorrow morning PST.


Re: [ovs-dev] [PATCH net] openvswitch: take into account de-fragmentation in execute_check_pkt_len

2020-06-22 Thread Pravin Shelar
On Mon, Jun 22, 2020 at 5:02 AM Lorenzo Bianconi  wrote:
>
> > On Fri, Jun 19, 2020 at 4:48 AM Lorenzo Bianconi  wrote:
> > >
> > > ovs connection tracking module performs de-fragmentation on incoming
> > > fragmented traffic. Take info account if traffic has been de-fragmented
> > > in execute_check_pkt_len action otherwise we will perform the wrong
> > > nested action considering the original packet size. This issue typically
> > > occurs if ovs-vswitchd adds a rule in the pipeline that requires 
> > > connection
> > > tracking (e.g. OVN stateful ACLs) before execute_check_pkt_len action.
> > >
> > > Fixes: 4d5ec89fc8d14 ("net: openvswitch: Add a new action check_pkt_len")
> > > Signed-off-by: Lorenzo Bianconi 
> > > ---
> > >  net/openvswitch/actions.c | 6 --
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> > > index fc0efd8833c8..9f4dd64e53bb 100644
> > > --- a/net/openvswitch/actions.c
> > > +++ b/net/openvswitch/actions.c
> > > @@ -1169,9 +1169,10 @@ static int execute_check_pkt_len(struct datapath 
> > > *dp, struct sk_buff *skb,
> > >  struct sw_flow_key *key,
> > >  const struct nlattr *attr, bool last)
> > >  {
> > > +   struct ovs_skb_cb *ovs_cb = OVS_CB(skb);
> > > const struct nlattr *actions, *cpl_arg;
> > > const struct check_pkt_len_arg *arg;
> > > -   int rem = nla_len(attr);
> > > +   int len, rem = nla_len(attr);
> > > bool clone_flow_key;
> > >
> > > /* The first netlink attribute in 'attr' is always
> > > @@ -1180,7 +1181,8 @@ static int execute_check_pkt_len(struct datapath 
> > > *dp, struct sk_buff *skb,
> > > cpl_arg = nla_data(attr);
> > > arg = nla_data(cpl_arg);
> > >
> > > -   if (skb->len <= arg->pkt_len) {
> > > +   len = ovs_cb->mru ? ovs_cb->mru : skb->len;
> > > +   if (len <= arg->pkt_len) {
> >
> > We could also check for a segmented packet and use the segment length
> > for this check.
>
> Hi Pravin,
>
> thx for review.
> By 'segmented packet' and 'segment length', do you mean 'fragment' and
> 'fragment length'?

I am actually talking about GSO packets.

Thanks.

> If so I guess we can't retrieve the original fragment length if we exec
> OVS_ACTION_ATTR_CT before OVS_ACTION_ATTR_CHECK_PKT_LEN (e.g if we have a
> stateful ACL in the ingress pipeline) since handle_fragments() will 
> reconstruct
> the whole IP datagram and it will store frag_max_size in OVS_CB(skb)->mru.
> Am I missing something?
>
> Regards,
> Lorenzo
>


Re: [ovs-dev] [PATCH net] openvswitch: take into account de-fragmentation in execute_check_pkt_len

2020-06-20 Thread Pravin Shelar
On Fri, Jun 19, 2020 at 4:48 AM Lorenzo Bianconi  wrote:
>
> ovs connection tracking module performs de-fragmentation on incoming
> fragmented traffic. Take info account if traffic has been de-fragmented
> in execute_check_pkt_len action otherwise we will perform the wrong
> nested action considering the original packet size. This issue typically
> occurs if ovs-vswitchd adds a rule in the pipeline that requires connection
> tracking (e.g. OVN stateful ACLs) before execute_check_pkt_len action.
>
> Fixes: 4d5ec89fc8d14 ("net: openvswitch: Add a new action check_pkt_len")
> Signed-off-by: Lorenzo Bianconi 
> ---
>  net/openvswitch/actions.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index fc0efd8833c8..9f4dd64e53bb 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -1169,9 +1169,10 @@ static int execute_check_pkt_len(struct datapath *dp, 
> struct sk_buff *skb,
>  struct sw_flow_key *key,
>  const struct nlattr *attr, bool last)
>  {
> +   struct ovs_skb_cb *ovs_cb = OVS_CB(skb);
> const struct nlattr *actions, *cpl_arg;
> const struct check_pkt_len_arg *arg;
> -   int rem = nla_len(attr);
> +   int len, rem = nla_len(attr);
> bool clone_flow_key;
>
> /* The first netlink attribute in 'attr' is always
> @@ -1180,7 +1181,8 @@ static int execute_check_pkt_len(struct datapath *dp, 
> struct sk_buff *skb,
> cpl_arg = nla_data(attr);
> arg = nla_data(cpl_arg);
>
> -   if (skb->len <= arg->pkt_len) {
> +   len = ovs_cb->mru ? ovs_cb->mru : skb->len;
> +   if (len <= arg->pkt_len) {

We could also check for a segmented packet and use the segment length
for this check.
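
A minimal sketch of that suggestion, purely for illustration (gso_size covers only the L4 payload of each segment, so a real check would still need to account for header lengths); this is not the fix that was eventually merged:

	/* Sketch: pick the length check_pkt_len should compare against. */
	static u32 check_pkt_len_effective_len(struct sk_buff *skb)
	{
		struct ovs_skb_cb *cb = OVS_CB(skb);

		if (cb->mru)			/* reassembled by conntrack */
			return cb->mru;
		if (skb_is_gso(skb))		/* will be re-segmented on output */
			return skb_shinfo(skb)->gso_size;
		return skb->len;
	}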


Re: [ovs-dev] [PATCH 1/1] openvswitch: fix infoleak in conntrack

2020-06-16 Thread Pravin Shelar
On Mon, Jun 15, 2020 at 7:13 PM Xidong Wang  wrote:
>
> From: xidongwang 
>
> The stack object “zone_limit” has 3 members. In function
> ovs_ct_limit_get_default_limit(), the member "count" is
> not initialized and sent out via “nla_put_nohdr”.
>
> Signed-off-by: xidongwang 

Looks good.
Acked-by: Pravin B Shelar 
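
For archive readers, a hedged sketch of the kind of fix the commit message implies; the field and constant names are taken from the uapi header and the surrounding discussion, not copied from the applied patch:

	/* Sketch: designated initializers zero every remaining member of the
	 * on-stack object, including 'count', so nla_put_nohdr() can no
	 * longer copy uninitialized stack bytes to userspace. */
	struct ovs_zone_limit zone_limit = {
		.zone_id = OVS_ZONE_LIMIT_DEFAULT_ZONE,
		.limit   = info->default_limit,
	};

	err = nla_put_nohdr(reply, sizeof(zone_limit), &zone_limit);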


Re: [ovs-dev] Re: Re: [PATCH] netdev-linux: remove sum of vport stats and kernel netdev stats

2020-04-29 Thread Pravin Shelar
On Wed, Apr 29, 2020 at 11:07 AM William Tu  wrote:
>
> On Tue, Apr 28, 2020 at 03:29:10AM +, 姜立东 wrote:
> > Hi William,
> >
> > > -/* Use kernel netdev's packet and byte counts since vport's 
> > > counters
> > > - * do not reflect packet counts on the wire when GSO, TSO or GRO 
> > > are
> > > - * enabled. */
> >
> > Actually I think it should be moved to netdev_stats_from_ovs_vport_stats 
> > :), that explains what netdev_stats_from_ovs_vport_stats is doing.
> > In fact, I think better solution is copying all physical abnormal counters 
> > in netdev_stats_from_ovs_vport_stats ,
> > and remove this copy from netdev_linux_get_stats.
> > but netdev_stats_from_ovs_vport_stats is in kernel module that may not be 
> > upgraded in some application scenarios.
> > So I moved to remove unnecessary copies, such as RX/TX bytes, packets, 
> > drops.
> >
> > Regards,
> > Lidong
>
> oh, I see your point. Then this looks good to me.
> Add Pravin to see if he has some comments.
>

The patch looks fine to me.
Thanks,
Pravin.


Re: [ovs-dev] [PATCH] Bareudp Tunnel Support

2020-04-24 Thread Pravin Shelar
On Sun, Apr 19, 2020 at 8:11 PM Martin Varghese
 wrote:
>
> From: Martin Varghese 
>
> UDP tunnel encapsulation module for tunnelling different protocols like
> MPLS, IP, NSH etc
>
> The Bareudp tunnel module provides a generic UDP L3 encapsulation
> tunnelling module for tunnelling different protocols like MPLS,IP,NSH etc.
> inside a UDP tunnel.
>
> Signed-off-by: Martin Varghese 
> ---
>  Documentation/automake.mk  |   1 +
>  Documentation/faq/bareudp.rst  |  62 ++
>  Documentation/faq/index.rst|   1 +
>  Documentation/faq/releases.rst |   1 +
>  NEWS   |   3 +-
>  datapath/Modules.mk|   4 +-
>  datapath/linux/Modules.mk  |   2 +
>  datapath/linux/compat/bareudp.c| 820 
> +
>  datapath/linux/compat/include/linux/if_link.h  |  11 +
>  datapath/linux/compat/include/linux/openvswitch.h  |  11 +
>  datapath/linux/compat/include/net/bareudp.h|  59 ++
>  datapath/linux/compat/include/net/ip6_tunnel.h |   9 +
>  datapath/linux/compat/include/net/ip_tunnels.h |   7 +
>  datapath/linux/compat/ip6_tunnel.c |  60 ++
>  datapath/linux/compat/ip_tunnel.c  |  47 ++
>  datapath/vport-bareudp.c   | 202 +
>  datapath/vport.c   |  11 +-
>  lib/dpif-netlink-rtnl.c|  53 ++
>  lib/dpif-netlink.c |  10 +
>  lib/netdev-vport.c |  25 +-
>  lib/netdev.h   |   1 +
>  ofproto/ofproto-dpif-xlate.c   |   1 +
>  rhel/openvswitch-kmod-fedora.spec.in   |   2 +-
>  ...sr_share_openvswitch_scripts_ovs-kmod-manage.sh |   2 +-
>  tests/automake.mk  |   2 +-
>  tests/system-layer3-tunnels.at |  47 ++
>  utilities/ovs-dev.py   |   1 +
>  27 files changed, 1447 insertions(+), 8 deletions(-)
>  create mode 100644 Documentation/faq/bareudp.rst
>  create mode 100644 datapath/linux/compat/bareudp.c
>  create mode 100644 datapath/linux/compat/include/net/bareudp.h
>  create mode 100644 datapath/vport-bareudp.c
>
I do not see a need for a vport-bareudp module. We can directly use the
bareudp device from the upstream kernel or from the OVS compat module. The
current vport modules are there for legacy reasons. All new tunnel
implementations should follow the new design, in which all tunnel devices
are netdevices.

Thanks.


Re: [ovs-dev] [PATCH net-next v3 4/5] net: openvswitch: make EINVAL return value more obvious

2020-04-22 Thread Pravin Shelar
On Wed, Apr 22, 2020 at 10:10 AM  wrote:
>
> From: Tonghao Zhang 
>
> Cc: Pravin B Shelar 
> Cc: Andy Zhou 
> Signed-off-by: Tonghao Zhang 
> ---
>  net/openvswitch/meter.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
Acked-by: Pravin B Shelar 

Thanks.


Re: [ovs-dev] [PATCH net-next v3 5/5] net: openvswitch: use u64 for meter bucket

2020-04-22 Thread Pravin Shelar
On Wed, Apr 22, 2020 at 10:10 AM  wrote:
>
> From: Tonghao Zhang 
>
> When setting the meter rate to 4+Gbps, there is an
> overflow, the meters don't work as expected.
>
> Cc: Pravin B Shelar 
> Cc: Andy Zhou 
> Signed-off-by: Tonghao Zhang 
> ---
>  net/openvswitch/meter.c | 2 +-
>  net/openvswitch/meter.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
Acked-by: Pravin B Shelar 

Thanks.


Re: [ovs-dev] [PATCH net-next v3 2/5] net: openvswitch: set max limitation to meters

2020-04-22 Thread Pravin Shelar
On Wed, Apr 22, 2020 at 10:10 AM  wrote:
>
> From: Tonghao Zhang 
>
> Don't allow user to create meter unlimitedly, which may cause
> to consume a large amount of kernel memory. The max number
> supported is decided by physical memory and 20K meters as default.
>
> Cc: Pravin B Shelar 
> Cc: Andy Zhou 
> Signed-off-by: Tonghao Zhang 
> ---
>  net/openvswitch/meter.c | 57 +
>  net/openvswitch/meter.h |  2 ++
>  2 files changed, 49 insertions(+), 10 deletions(-)
>
Acked-by: Pravin B Shelar 

Thanks.


Re: [ovs-dev] [PATCH net-next v3 1/5] net: openvswitch: expand the meters supported number

2020-04-22 Thread Pravin Shelar
On Wed, Apr 22, 2020 at 10:10 AM  wrote:
>
> From: Tonghao Zhang 
>
> In kernel datapath of Open vSwitch, there are only 1024
> buckets of meter in one datapath. If installing more than
> 1024 (e.g. 8192) meters, it may lead to the performance drop.
> But in some case, for example, Open vSwitch used as edge
> gateway, there should be 20K at least, where meters used for
> IP address bandwidth limitation.
>
> [Open vSwitch userspace datapath has this issue too.]
>
> For more scalable meter, this patch use meter array instead of
> hash tables, and expand/shrink the array when necessary. So we
> can install more meters than before in the datapath.
> Introducing the struct *dp_meter_instance, it's easy to
> expand meter though changing the *ti point in the struct
> *dp_meter_table.
>
> Cc: Pravin B Shelar 
> Cc: Andy Zhou 
> Signed-off-by: Tonghao Zhang 
> ---
>  net/openvswitch/datapath.h |   2 +-
>  net/openvswitch/meter.c| 240 -
>  net/openvswitch/meter.h|  16 ++-
>  3 files changed, 195 insertions(+), 63 deletions(-)
>
Acked-by: Pravin B Shelar 

Thanks.


Re: [ovs-dev] [PATCH net-next v3 3/5] net: openvswitch: remove the unnecessary check

2020-04-22 Thread Pravin Shelar
On Wed, Apr 22, 2020 at 10:10 AM  wrote:
>
> From: Tonghao Zhang 
>
> Before invoking the ovs_meter_cmd_reply_stats, "meter"
> was checked, so don't check it agin in that function.
>
> Cc: Pravin B Shelar 
> Cc: Andy Zhou 
> Signed-off-by: Tonghao Zhang 
> ---
>  net/openvswitch/meter.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
Acked-by: Pravin B Shelar 

Thanks.


Re: [ovs-dev] [PATCH net-next v2 2/5] net: openvswitch: set max limitation to meters

2020-04-20 Thread Pravin Shelar
On Sun, Apr 19, 2020 at 5:28 PM Tonghao Zhang  wrote:
>
> On Mon, Apr 20, 2020 at 1:31 AM Pravin Shelar  wrote:
> >
> > On Sat, Apr 18, 2020 at 10:25 AM  wrote:
> > >
> > > From: Tonghao Zhang 
> > >
> > > Don't allow user to create meter unlimitedly,
> > > which may cause to consume a large amount of kernel memory.
> > > The 200,000 meters may be fine in general case.
> > >
> > > Cc: Pravin B Shelar 
> > > Cc: Andy Zhou 
> > > Signed-off-by: Tonghao Zhang 
> > > ---
> > >  net/openvswitch/meter.c | 21 +++--
> > >  net/openvswitch/meter.h |  1 +
> > >  2 files changed, 16 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/net/openvswitch/meter.c b/net/openvswitch/meter.c
> > > index 494a0014ecd8..1b6776f9c109 100644
> > > --- a/net/openvswitch/meter.c
> > > +++ b/net/openvswitch/meter.c
> > > @@ -137,6 +137,7 @@ static int attach_meter(struct dp_meter_table *tbl, 
> > > struct dp_meter *meter)
> > >  {
> > > struct dp_meter_instance *ti = rcu_dereference_ovsl(tbl->ti);
> > > u32 hash = meter_hash(ti, meter->id);
> > > +   int err;
> > >
> > > /*
> > >  * In generally, slot selected should be empty, because
> > > @@ -148,16 +149,24 @@ static int attach_meter(struct dp_meter_table *tbl, 
> > > struct dp_meter *meter)
> > > dp_meter_instance_insert(ti, meter);
> > >
> > > /* That function is thread-safe. */
> > > -   if (++tbl->count >= ti->n_meters)
> > > -   if (dp_meter_instance_realloc(tbl, ti->n_meters * 2))
> > > -   goto expand_err;
> > > +   tbl->count++;
> > > +   if (tbl->count > DP_METER_NUM_MAX) {
> > > +   err = -EFBIG;
> > > +   goto attach_err;
> > > +   }
> > > +
> > > +   if (tbl->count >= ti->n_meters &&
> > > +   dp_meter_instance_realloc(tbl, ti->n_meters * 2)) {
> > > +   err = -ENOMEM;
> > > +   goto attach_err;
> > > +   }
> > >
> > > return 0;
> > >
> > > -expand_err:
> > > +attach_err:
> > > dp_meter_instance_remove(ti, meter);
> > > tbl->count--;
> > > -   return -ENOMEM;
> > > +   return err;
> > >  }
> > >
> > >  static void detach_meter(struct dp_meter_table *tbl, struct dp_meter 
> > > *meter)
> > > @@ -264,7 +273,7 @@ static int ovs_meter_cmd_features(struct sk_buff 
> > > *skb, struct genl_info *info)
> > > if (IS_ERR(reply))
> > > return PTR_ERR(reply);
> > >
> > > -   if (nla_put_u32(reply, OVS_METER_ATTR_MAX_METERS, U32_MAX) ||
> > > +   if (nla_put_u32(reply, OVS_METER_ATTR_MAX_METERS, 
> > > DP_METER_NUM_MAX) ||
> > > nla_put_u32(reply, OVS_METER_ATTR_MAX_BANDS, DP_MAX_BANDS))
> > > goto nla_put_failure;
> > >
> > > diff --git a/net/openvswitch/meter.h b/net/openvswitch/meter.h
> > > index d91940383bbe..cdfc6b9dbd42 100644
> > > --- a/net/openvswitch/meter.h
> > > +++ b/net/openvswitch/meter.h
> > > @@ -19,6 +19,7 @@ struct datapath;
> > >
> > >  #define DP_MAX_BANDS   1
> > >  #define DP_METER_ARRAY_SIZE_MIN(1ULL << 10)
> > > +#define DP_METER_NUM_MAX   (20ULL)
> > >
> > Let's make it configurable, and the default could be 200k, to allow
> > customization for different memory configurations.
> Great, set different limit depend on current system memory size like tcp ?

Yes, that could be useful.
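
To sketch what a memory-scaled default could look like (in the spirit of how TCP sizes its hash tables); the ratio, floor, and helper name here are assumptions, not the values that were eventually merged:

	/* Sketch: derive the default meter limit from physical memory. */
	static u32 ovs_meter_default_max(void)
	{
		u64 mem_bytes = (u64)totalram_pages() * PAGE_SIZE;

		/* Roughly one meter per 100 KiB of RAM, clamped to [1024, 200000]. */
		return clamp_t(u64, mem_bytes / (100 * 1024), 1024, 200000);
	}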


Re: [ovs-dev] [PATCH net-next v2 1/5] net: openvswitch: expand the meters supported number

2020-04-20 Thread Pravin Shelar
On Sun, Apr 19, 2020 at 5:23 PM Tonghao Zhang  wrote:
>
> On Mon, Apr 20, 2020 at 1:29 AM Pravin Shelar  wrote:
> >
> > On Sat, Apr 18, 2020 at 10:25 AM  wrote:
> > >
> > > From: Tonghao Zhang 
> > >
> > > In kernel datapath of Open vSwitch, there are only 1024
> > > buckets of meter in one dp. If installing more than 1024
> > > (e.g. 8192) meters, it may lead to the performance drop.
> > > But in some case, for example, Open vSwitch used as edge
> > > gateway, there should be 200,000+ at least, meters used for
> > > IP address bandwidth limitation.
> > >
> > > [Open vSwitch userspace datapath has this issue too.]
> > >
> > > For more scalable meter, this patch expands the buckets
> > > when necessary, so we can install more meters in the datapath.
> > > Introducing the struct *dp_meter_instance*, it's easy to
> > > expand meter though changing the *ti* point in the struct
> > > *dp_meter_table*.
> > >
> > > Cc: Pravin B Shelar 
> > > Cc: Andy Zhou 
> > > Signed-off-by: Tonghao Zhang 
> > > ---
> > >  net/openvswitch/datapath.h |   2 +-
> > >  net/openvswitch/meter.c| 200 +
> > >  net/openvswitch/meter.h|  15 ++-
> > >  3 files changed, 169 insertions(+), 48 deletions(-)
> > >
> > > diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
> > > index e239a46c2f94..785105578448 100644
> > > --- a/net/openvswitch/datapath.h
> > > +++ b/net/openvswitch/datapath.h
> > > @@ -82,7 +82,7 @@ struct datapath {
> > > u32 max_headroom;
> > >
> > > /* Switch meters. */
> > > -   struct hlist_head *meters;
> > > +   struct dp_meter_table *meters;
> > Let's define it as part of this struct to avoid indirection.
> >
> > >  };
> > >
> > >  /**
> > > diff --git a/net/openvswitch/meter.c b/net/openvswitch/meter.c
> > > index 5010d1ddd4bd..494a0014ecd8 100644
> > > --- a/net/openvswitch/meter.c
> > > +++ b/net/openvswitch/meter.c
> > > @@ -19,8 +19,6 @@
> > >  #include "datapath.h"
> > >  #include "meter.h"
> > >
> > > -#define METER_HASH_BUCKETS 1024
> > > -
> > >  static const struct nla_policy meter_policy[OVS_METER_ATTR_MAX + 1] = {
> > > [OVS_METER_ATTR_ID] = { .type = NLA_U32, },
> > > [OVS_METER_ATTR_KBPS] = { .type = NLA_FLAG },
> > > @@ -39,6 +37,11 @@ static const struct nla_policy 
> > > band_policy[OVS_BAND_ATTR_MAX + 1] = {
> > > [OVS_BAND_ATTR_STATS] = { .len = sizeof(struct ovs_flow_stats) },
> > >  };
> > >
> > > +static u32 meter_hash(struct dp_meter_instance *ti, u32 id)
> > > +{
> > > +   return id % ti->n_meters;
> > > +}
> > > +
> > >  static void ovs_meter_free(struct dp_meter *meter)
> > >  {
> > > if (!meter)
> > > @@ -47,40 +50,141 @@ static void ovs_meter_free(struct dp_meter *meter)
> > > kfree_rcu(meter, rcu);
> > >  }
> > >
> > > -static struct hlist_head *meter_hash_bucket(const struct datapath *dp,
> > > -   u32 meter_id)
> > > -{
> > > -   return >meters[meter_id & (METER_HASH_BUCKETS - 1)];
> > > -}
> > > -
> > >  /* Call with ovs_mutex or RCU read lock. */
> > > -static struct dp_meter *lookup_meter(const struct datapath *dp,
> > > +static struct dp_meter *lookup_meter(const struct dp_meter_table *tbl,
> > >  u32 meter_id)
> > >  {
> > > +   struct dp_meter_instance *ti = rcu_dereference_ovsl(tbl->ti);
> > > +   u32 hash = meter_hash(ti, meter_id);
> > > struct dp_meter *meter;
> > > -   struct hlist_head *head;
> > >
> > > -   head = meter_hash_bucket(dp, meter_id);
> > > -   hlist_for_each_entry_rcu(meter, head, dp_hash_node,
> > > -   lockdep_ovsl_is_held()) {
> > > -   if (meter->id == meter_id)
> > > -   return meter;
> > > -   }
> > > +   meter = rcu_dereference_ovsl(ti->dp_meters[hash]);
> > > +   if (meter && likely(meter->id == meter_id))
> > > +   return meter;
> > > +
> > > return NULL;
> > >  }
> > >
>

Re: [ovs-dev] [PATCH] net: openvswitch: ovs_ct_exit to be done under ovs_lock

2020-04-19 Thread Pravin Shelar
On Sun, Apr 19, 2020 at 1:44 AM  wrote:
>
> From: Tonghao Zhang 
>
> syzbot wrote:
> | =
> | WARNING: suspicious RCU usage
> | 5.7.0-rc1+ #45 Not tainted
> | -
> | net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!!
> |
> | other info that might help us debug this:
> | rcu_scheduler_active = 2, debug_locks = 1
> | ...
> |
> | stack backtrace:
> | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
> | Workqueue: netns cleanup_net
> | Call Trace:
> | ...
> | ovs_ct_exit
> | ovs_exit_net
> | ops_exit_list.isra.7
> | cleanup_net
> | process_one_work
> | worker_thread
>
> To avoid that warning, invoke the ovs_ct_exit under ovs_lock and add
> lockdep_ovsl_is_held as optional lockdep expression.
>
> Link: https://lore.kernel.org/lkml/e642a905a0cbe...@google.com
> Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit")
> Cc: Pravin B Shelar 
> Cc: Yi-Hung Wei 
> Reported-by: syzbot+7ef50afd3a211f879...@syzkaller.appspotmail.com
> Signed-off-by: Tonghao Zhang 

Acked-by: Pravin B Shelar 

Thanks.
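
A hedged sketch of the change being acked, abbreviated for illustration (the real ovs_exit_net() also tears down the datapaths inside the same locked region, and the RCU list walk in conntrack.c gains lockdep_ovsl_is_held() as its optional lockdep expression):

	/* Sketch: run ovs_ct_exit() under ovs_mutex in the netns exit path so
	 * the hlist_for_each_entry_rcu() walk it performs is lockdep-clean. */
	static void __net_exit ovs_exit_net(struct net *net)
	{
		ovs_lock();
		ovs_ct_exit(net);
		/* ... destroy this netns's datapaths, as before ... */
		ovs_unlock();

		/* ... remainder of teardown unchanged ... */
	}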


Re: [ovs-dev] [PATCH net-next v2 2/5] net: openvswitch: set max limitation to meters

2020-04-19 Thread Pravin Shelar
On Sat, Apr 18, 2020 at 10:25 AM  wrote:
>
> From: Tonghao Zhang 
>
> Don't allow user to create meter unlimitedly,
> which may cause to consume a large amount of kernel memory.
> The 200,000 meters may be fine in general case.
>
> Cc: Pravin B Shelar 
> Cc: Andy Zhou 
> Signed-off-by: Tonghao Zhang 
> ---
>  net/openvswitch/meter.c | 21 +++--
>  net/openvswitch/meter.h |  1 +
>  2 files changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/net/openvswitch/meter.c b/net/openvswitch/meter.c
> index 494a0014ecd8..1b6776f9c109 100644
> --- a/net/openvswitch/meter.c
> +++ b/net/openvswitch/meter.c
> @@ -137,6 +137,7 @@ static int attach_meter(struct dp_meter_table *tbl, 
> struct dp_meter *meter)
>  {
> struct dp_meter_instance *ti = rcu_dereference_ovsl(tbl->ti);
> u32 hash = meter_hash(ti, meter->id);
> +   int err;
>
> /*
>  * In generally, slot selected should be empty, because
> @@ -148,16 +149,24 @@ static int attach_meter(struct dp_meter_table *tbl, 
> struct dp_meter *meter)
> dp_meter_instance_insert(ti, meter);
>
> /* That function is thread-safe. */
> -   if (++tbl->count >= ti->n_meters)
> -   if (dp_meter_instance_realloc(tbl, ti->n_meters * 2))
> -   goto expand_err;
> +   tbl->count++;
> +   if (tbl->count > DP_METER_NUM_MAX) {
> +   err = -EFBIG;
> +   goto attach_err;
> +   }
> +
> +   if (tbl->count >= ti->n_meters &&
> +   dp_meter_instance_realloc(tbl, ti->n_meters * 2)) {
> +   err = -ENOMEM;
> +   goto attach_err;
> +   }
>
> return 0;
>
> -expand_err:
> +attach_err:
> dp_meter_instance_remove(ti, meter);
> tbl->count--;
> -   return -ENOMEM;
> +   return err;
>  }
>
>  static void detach_meter(struct dp_meter_table *tbl, struct dp_meter *meter)
> @@ -264,7 +273,7 @@ static int ovs_meter_cmd_features(struct sk_buff *skb, 
> struct genl_info *info)
> if (IS_ERR(reply))
> return PTR_ERR(reply);
>
> -   if (nla_put_u32(reply, OVS_METER_ATTR_MAX_METERS, U32_MAX) ||
> +   if (nla_put_u32(reply, OVS_METER_ATTR_MAX_METERS, DP_METER_NUM_MAX) ||
> nla_put_u32(reply, OVS_METER_ATTR_MAX_BANDS, DP_MAX_BANDS))
> goto nla_put_failure;
>
> diff --git a/net/openvswitch/meter.h b/net/openvswitch/meter.h
> index d91940383bbe..cdfc6b9dbd42 100644
> --- a/net/openvswitch/meter.h
> +++ b/net/openvswitch/meter.h
> @@ -19,6 +19,7 @@ struct datapath;
>
>  #define DP_MAX_BANDS   1
>  #define DP_METER_ARRAY_SIZE_MIN(1ULL << 10)
> +#define DP_METER_NUM_MAX   (20ULL)
>
Let's make it configurable, and the default could be 200k, to allow
customization for different memory configurations.


>  struct dp_meter_band {
> u32 type;
> --
> 2.23.0
>


Re: [ovs-dev] [PATCH net-next v2 1/5] net: openvswitch: expand the meters supported number

2020-04-19 Thread Pravin Shelar
On Sat, Apr 18, 2020 at 10:25 AM  wrote:
>
> From: Tonghao Zhang 
>
> In kernel datapath of Open vSwitch, there are only 1024
> buckets of meter in one dp. If installing more than 1024
> (e.g. 8192) meters, it may lead to the performance drop.
> But in some case, for example, Open vSwitch used as edge
> gateway, there should be 200,000+ at least, meters used for
> IP address bandwidth limitation.
>
> [Open vSwitch userspace datapath has this issue too.]
>
> For more scalable meter, this patch expands the buckets
> when necessary, so we can install more meters in the datapath.
> Introducing the struct *dp_meter_instance*, it's easy to
> expand meter though changing the *ti* point in the struct
> *dp_meter_table*.
>
> Cc: Pravin B Shelar 
> Cc: Andy Zhou 
> Signed-off-by: Tonghao Zhang 
> ---
>  net/openvswitch/datapath.h |   2 +-
>  net/openvswitch/meter.c| 200 +
>  net/openvswitch/meter.h|  15 ++-
>  3 files changed, 169 insertions(+), 48 deletions(-)
>
> diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
> index e239a46c2f94..785105578448 100644
> --- a/net/openvswitch/datapath.h
> +++ b/net/openvswitch/datapath.h
> @@ -82,7 +82,7 @@ struct datapath {
> u32 max_headroom;
>
> /* Switch meters. */
> -   struct hlist_head *meters;
> +   struct dp_meter_table *meters;
Let's define it as part of this struct to avoid the extra indirection.
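Something like this, as a sketch only:

    struct datapath {
            /* ... existing fields ... */

            /* Switch meters, embedded to avoid the extra pointer chase. */
            struct dp_meter_table meters;
    };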

>  };
>
>  /**
> diff --git a/net/openvswitch/meter.c b/net/openvswitch/meter.c
> index 5010d1ddd4bd..494a0014ecd8 100644
> --- a/net/openvswitch/meter.c
> +++ b/net/openvswitch/meter.c
> @@ -19,8 +19,6 @@
>  #include "datapath.h"
>  #include "meter.h"
>
> -#define METER_HASH_BUCKETS 1024
> -
>  static const struct nla_policy meter_policy[OVS_METER_ATTR_MAX + 1] = {
> [OVS_METER_ATTR_ID] = { .type = NLA_U32, },
> [OVS_METER_ATTR_KBPS] = { .type = NLA_FLAG },
> @@ -39,6 +37,11 @@ static const struct nla_policy 
> band_policy[OVS_BAND_ATTR_MAX + 1] = {
> [OVS_BAND_ATTR_STATS] = { .len = sizeof(struct ovs_flow_stats) },
>  };
>
> +static u32 meter_hash(struct dp_meter_instance *ti, u32 id)
> +{
> +   return id % ti->n_meters;
> +}
> +
>  static void ovs_meter_free(struct dp_meter *meter)
>  {
> if (!meter)
> @@ -47,40 +50,141 @@ static void ovs_meter_free(struct dp_meter *meter)
> kfree_rcu(meter, rcu);
>  }
>
> -static struct hlist_head *meter_hash_bucket(const struct datapath *dp,
> -   u32 meter_id)
> -{
> > > > > > > > > > -   return &dp->meters[meter_id & (METER_HASH_BUCKETS - 1)];
> -}
> -
>  /* Call with ovs_mutex or RCU read lock. */
> -static struct dp_meter *lookup_meter(const struct datapath *dp,
> +static struct dp_meter *lookup_meter(const struct dp_meter_table *tbl,
>  u32 meter_id)
>  {
> +   struct dp_meter_instance *ti = rcu_dereference_ovsl(tbl->ti);
> +   u32 hash = meter_hash(ti, meter_id);
> struct dp_meter *meter;
> -   struct hlist_head *head;
>
> -   head = meter_hash_bucket(dp, meter_id);
> -   hlist_for_each_entry_rcu(meter, head, dp_hash_node,
> -   lockdep_ovsl_is_held()) {
> -   if (meter->id == meter_id)
> -   return meter;
> -   }
> +   meter = rcu_dereference_ovsl(ti->dp_meters[hash]);
> +   if (meter && likely(meter->id == meter_id))
> +   return meter;
> +
> return NULL;
>  }
>
> -static void attach_meter(struct datapath *dp, struct dp_meter *meter)
> +static struct dp_meter_instance *dp_meter_instance_alloc(const u32 size)
> +{
> +   struct dp_meter_instance *ti;
> +
> +   ti = kvzalloc(sizeof(*ti) +
> + sizeof(struct dp_meter *) * size,
> + GFP_KERNEL);
> +   if (!ti)
> +   return NULL;
Given this is a kernel-space array, we need to have a hard limit in place.

> +
> +   ti->n_meters = size;
> +
> +   return ti;
> +}
> +
> +static void dp_meter_instance_free(struct dp_meter_instance *ti)
> +{
> +   kvfree(ti);
> +}
> +
> +static void dp_meter_instance_free_rcu(struct rcu_head *rcu)
> +{
> +   struct dp_meter_instance *ti;
> +
> +   ti = container_of(rcu, struct dp_meter_instance, rcu);
> +   kvfree(ti);
> +}
> +
> +static int
> +dp_meter_instance_realloc(struct dp_meter_table *tbl, u32 size)
> +{
> +   struct dp_meter_instance *ti = rcu_dereference_ovsl(tbl->ti);
> +   int n_meters = min(size, ti->n_meters);
> +   struct dp_meter_instance *new_ti;
> +   int i;
> +
> +   new_ti = dp_meter_instance_alloc(size);
> +   if (!new_ti)
> +   return -ENOMEM;
> +
> +   for (i = 0; i < n_meters; i++)
> +   new_ti->dp_meters[i] =
> +   rcu_dereference_ovsl(ti->dp_meters[i]);
> +
> +   rcu_assign_pointer(tbl->ti, new_ti);
> +   call_rcu(&ti->rcu, dp_meter_instance_free_rcu);
> +
> +   return 0;
> +}
> 

Re: [ovs-dev] [PATCH net-next v1 1/3] net: openvswitch: expand the meters number supported

2020-04-11 Thread Pravin Shelar
Given that we already use the id-pool, we can significantly reduce the
probability of a negative meter lookup. Therefore I do not see a need
for a hash table in the datapath; a directly indexed array is enough.
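A sketch of the direct-indexed lookup this enables (untested; the
dp_meters/n_meters fields on the meter instance are assumed names):

    /* Call with ovs_mutex or RCU read lock. */
    static struct dp_meter *lookup_meter(const struct dp_meter_table *tbl,
                                         u32 meter_id)
    {
            struct dp_meter_instance *ti = rcu_dereference_ovsl(tbl->ti);
            struct dp_meter *meter;

            meter = rcu_dereference_ovsl(ti->dp_meters[meter_id % ti->n_meters]);
            if (meter && likely(meter->id == meter_id))
                    return meter;

            return NULL;
    }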

On Thu, Apr 9, 2020 at 4:29 PM Tonghao Zhang  wrote:
>
> On Fri, Apr 10, 2020 at 5:41 AM William Tu  wrote:
> >
> > On Wed, Apr 08, 2020 at 11:59:25PM +0800, Tonghao Zhang wrote:
> > > On Wed, Apr 8, 2020 at 11:09 PM William Tu  wrote:
> > > >
> > > > On Wed, Apr 01, 2020 at 06:50:09PM +0800, Tonghao Zhang wrote:
> > > > > On Tue, Mar 31, 2020 at 11:57 AM Pravin Shelar  
> > > > > wrote:
> > > > > >
> > > > > > On Sun, Mar 29, 2020 at 5:35 PM Tonghao Zhang 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Mon, Mar 30, 2020 at 12:46 AM Pravin Shelar  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Sat, Mar 28, 2020 at 8:46 AM  
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > From: Tonghao Zhang 
> > > > > > > > >
> > > > > > > > > In kernel datapath of Open vSwitch, there are only 1024
> > > > > > > > > buckets of meter in one dp. If installing more than 1024
> > > > > > > > > (e.g. 8192) meters, it may lead to the performance drop.
> > > > > > > > > But in some case, for example, Open vSwitch used as edge
> > > > > > > > > gateway, there should be 200,000+ at least, meters used for
> > > > > > > > > IP address bandwidth limitation.
> > > > > > > > >
> > > > > > > > > [Open vSwitch userspace datapath has this issue too.]
> > > > > > > > >
> > > > > > > > > For more scalable meter, this patch expands the buckets
> > > > > > > > > when necessary, so we can install more meters in the datapath.
> > > > > > > > >
> > > > > > > > > * Introducing the struct *dp_meter_instance*, it's easy to
> > > > > > > > >   expand meter though change the *ti* point in the struct
> > > > > > > > >   *dp_meter_table*.
> > > > > > > > > * Using kvmalloc_array instead of kmalloc_array.
> > > > > > > > >
> > > > > > > > Thanks for working on this, I have couple of comments.
> > > > > > > >
> > > > > > > > > Cc: Pravin B Shelar 
> > > > > > > > > Cc: Andy Zhou 
> > > > > > > > > Signed-off-by: Tonghao Zhang 
> > > > > > > > > ---
> > > > > > > > >  net/openvswitch/datapath.h |   2 +-
> > > > > > > > >  net/openvswitch/meter.c| 168 
> > > > > > > > > ++---
> > > > > > > > >  net/openvswitch/meter.h|  17 +++-
> > > > > > > > >  3 files changed, 153 insertions(+), 34 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/net/openvswitch/datapath.h 
> > > > > > > > > b/net/openvswitch/datapath.h
> > > > > > > > > index e239a46c2f94..785105578448 100644
> > > > > > > > > --- a/net/openvswitch/datapath.h
> > > > > > > > > +++ b/net/openvswitch/datapath.h
> > > > > > > > > @@ -82,7 +82,7 @@ struct datapath {
> > > > > > > > > u32 max_headroom;
> > > > > > > > >
> > > > > > > > > /* Switch meters. */
> > > > > > > > > -   struct hlist_head *meters;
> > > > > > > > > +   struct dp_meter_table *meters;
> > > > > > > > >  };
> > > > > > > > >
> > > > > > > > >  /**
> > > > > > > > > diff --git a/net/openvswitch/meter.c b/net/openvswitch/meter.c
> > > > > > > > > index 5010d1ddd4bd..98003b201b45 100644
> > > > > > > > > --- a/net/openvswitch/meter.c
> > > > > > > > > +++ b/net/openvswitch/meter.c
> > > > > > > > > @@ -47,40 +47,136 @@ static void ovs_meter_free(struct 

Re: [ovs-dev] [PATCH net-next v1 1/3] net: openvswitch: expand the meters number supported

2020-04-01 Thread Pravin Shelar
Ok, thanks.

On Wed, Apr 1, 2020 at 3:50 AM Tonghao Zhang  wrote:
>
> On Tue, Mar 31, 2020 at 11:57 AM Pravin Shelar  wrote:
> >
> > On Sun, Mar 29, 2020 at 5:35 PM Tonghao Zhang  
> > wrote:
> > >
> > > On Mon, Mar 30, 2020 at 12:46 AM Pravin Shelar  wrote:
> > > >
> > > > On Sat, Mar 28, 2020 at 8:46 AM  wrote:
> > > > >
> > > > > From: Tonghao Zhang 
> > > > >
> > > > > In kernel datapath of Open vSwitch, there are only 1024
> > > > > buckets of meter in one dp. If installing more than 1024
> > > > > (e.g. 8192) meters, it may lead to the performance drop.
> > > > > But in some case, for example, Open vSwitch used as edge
> > > > > gateway, there should be 200,000+ at least, meters used for
> > > > > IP address bandwidth limitation.
> > > > >
> > > > > [Open vSwitch userspace datapath has this issue too.]
> > > > >
> > > > > For more scalable meter, this patch expands the buckets
> > > > > when necessary, so we can install more meters in the datapath.
> > > > >
> > > > > * Introducing the struct *dp_meter_instance*, it's easy to
> > > > >   expand meter though change the *ti* point in the struct
> > > > >   *dp_meter_table*.
> > > > > * Using kvmalloc_array instead of kmalloc_array.
> > > > >
> > > > Thanks for working on this, I have couple of comments.
> > > >
> > > > > Cc: Pravin B Shelar 
> > > > > Cc: Andy Zhou 
> > > > > Signed-off-by: Tonghao Zhang 
> > > > > ---
> > > > >  net/openvswitch/datapath.h |   2 +-
> > > > >  net/openvswitch/meter.c| 168 
> > > > > ++---
> > > > >  net/openvswitch/meter.h|  17 +++-
> > > > >  3 files changed, 153 insertions(+), 34 deletions(-)
> > > > >
> > > > > diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
> > > > > index e239a46c2f94..785105578448 100644
> > > > > --- a/net/openvswitch/datapath.h
> > > > > +++ b/net/openvswitch/datapath.h
> > > > > @@ -82,7 +82,7 @@ struct datapath {
> > > > > u32 max_headroom;
> > > > >
> > > > > /* Switch meters. */
> > > > > -   struct hlist_head *meters;
> > > > > +   struct dp_meter_table *meters;
> > > > >  };
> > > > >
> > > > >  /**
> > > > > diff --git a/net/openvswitch/meter.c b/net/openvswitch/meter.c
> > > > > index 5010d1ddd4bd..98003b201b45 100644
> > > > > --- a/net/openvswitch/meter.c
> > > > > +++ b/net/openvswitch/meter.c
> > > > > @@ -47,40 +47,136 @@ static void ovs_meter_free(struct dp_meter 
> > > > > *meter)
> > > > > kfree_rcu(meter, rcu);
> > > > >  }
> > > > >
> > > > > -static struct hlist_head *meter_hash_bucket(const struct datapath 
> > > > > *dp,
> > > > > +static struct hlist_head *meter_hash_bucket(struct dp_meter_instance 
> > > > > *ti,
> > > > > u32 meter_id)
> > > > >  {
> > > > > > -   return &dp->meters[meter_id & (METER_HASH_BUCKETS - 1)];
> > > > > +   u32 hash = jhash_1word(meter_id, ti->hash_seed);
> > > > > +
> > > > I do not see any need to hash meter-id, can you explain it.
> > > >
> > > > > +   return &ti->buckets[hash & (ti->n_buckets - 1)];
> > > > >  }
> > > > >
> > > > >  /* Call with ovs_mutex or RCU read lock. */
> > > > > -static struct dp_meter *lookup_meter(const struct datapath *dp,
> > > > > +static struct dp_meter *lookup_meter(const struct dp_meter_table 
> > > > > *tbl,
> > > > >  u32 meter_id)
> > > > >  {
> > > > > +   struct dp_meter_instance *ti = rcu_dereference_ovsl(tbl->ti);
> > > > > struct dp_meter *meter;
> > > > > struct hlist_head *head;
> > > > >
> > > > > -   head = meter_hash_bucket(dp, meter_id);
> > > > > -   hlist_for_each_entry_rcu(meter, head, dp_hash_node,
> > > > > -

Re: [ovs-dev] [PATCH net-next v1 1/3] net: openvswitch: expand the meters number supported

2020-03-30 Thread Pravin Shelar
On Sun, Mar 29, 2020 at 5:35 PM Tonghao Zhang  wrote:
>
> On Mon, Mar 30, 2020 at 12:46 AM Pravin Shelar  wrote:
> >
> > On Sat, Mar 28, 2020 at 8:46 AM  wrote:
> > >
> > > From: Tonghao Zhang 
> > >
> > > In kernel datapath of Open vSwitch, there are only 1024
> > > buckets of meter in one dp. If installing more than 1024
> > > (e.g. 8192) meters, it may lead to the performance drop.
> > > But in some case, for example, Open vSwitch used as edge
> > > gateway, there should be 200,000+ at least, meters used for
> > > IP address bandwidth limitation.
> > >
> > > [Open vSwitch userspace datapath has this issue too.]
> > >
> > > For more scalable meter, this patch expands the buckets
> > > when necessary, so we can install more meters in the datapath.
> > >
> > > * Introducing the struct *dp_meter_instance*, it's easy to
> > >   expand meter though change the *ti* point in the struct
> > >   *dp_meter_table*.
> > > * Using kvmalloc_array instead of kmalloc_array.
> > >
> > Thanks for working on this, I have couple of comments.
> >
> > > Cc: Pravin B Shelar 
> > > Cc: Andy Zhou 
> > > Signed-off-by: Tonghao Zhang 
> > > ---
> > >  net/openvswitch/datapath.h |   2 +-
> > >  net/openvswitch/meter.c| 168 ++---
> > >  net/openvswitch/meter.h|  17 +++-
> > >  3 files changed, 153 insertions(+), 34 deletions(-)
> > >
> > > diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
> > > index e239a46c2f94..785105578448 100644
> > > --- a/net/openvswitch/datapath.h
> > > +++ b/net/openvswitch/datapath.h
> > > @@ -82,7 +82,7 @@ struct datapath {
> > > u32 max_headroom;
> > >
> > > /* Switch meters. */
> > > -   struct hlist_head *meters;
> > > +   struct dp_meter_table *meters;
> > >  };
> > >
> > >  /**
> > > diff --git a/net/openvswitch/meter.c b/net/openvswitch/meter.c
> > > index 5010d1ddd4bd..98003b201b45 100644
> > > --- a/net/openvswitch/meter.c
> > > +++ b/net/openvswitch/meter.c
> > > @@ -47,40 +47,136 @@ static void ovs_meter_free(struct dp_meter *meter)
> > > kfree_rcu(meter, rcu);
> > >  }
> > >
> > > -static struct hlist_head *meter_hash_bucket(const struct datapath *dp,
> > > +static struct hlist_head *meter_hash_bucket(struct dp_meter_instance *ti,
> > > u32 meter_id)
> > >  {
> > > -   return &dp->meters[meter_id & (METER_HASH_BUCKETS - 1)];
> > > +   u32 hash = jhash_1word(meter_id, ti->hash_seed);
> > > +
> > I do not see any need to hash meter-id, can you explain it.
> >
> > > +   return &ti->buckets[hash & (ti->n_buckets - 1)];
> > >  }
> > >
> > >  /* Call with ovs_mutex or RCU read lock. */
> > > -static struct dp_meter *lookup_meter(const struct datapath *dp,
> > > +static struct dp_meter *lookup_meter(const struct dp_meter_table *tbl,
> > >  u32 meter_id)
> > >  {
> > > +   struct dp_meter_instance *ti = rcu_dereference_ovsl(tbl->ti);
> > > struct dp_meter *meter;
> > > struct hlist_head *head;
> > >
> > > -   head = meter_hash_bucket(dp, meter_id);
> > > -   hlist_for_each_entry_rcu(meter, head, dp_hash_node,
> > > -   lockdep_ovsl_is_held()) {
> > > +   head = meter_hash_bucket(ti, meter_id);
> > > +   hlist_for_each_entry_rcu(meter, head, hash_node[ti->node_ver],
> > > +lockdep_ovsl_is_held()) {
> > > if (meter->id == meter_id)
> > > return meter;
> > > }
> > > +
> > This patch is expanding meter table linearly with number meters added
> > to datapath. so I do not see need to have hash table. it can be a
> > simple array. This would also improve lookup efficiency.
> > For hash collision we could find next free slot in array. let me know
> > what do you think about this approach.
> Hi Pravin
> If we use the simple array, when inserting the meter, for hash collision, we 
> can
> find next free slot, but one case, when there are many meters in the array.
> we may find many slot for the free slot.
> And when we lookup the meter, for hash collision, we may fi

Re: [ovs-dev] [PATCH net-next v1 1/3] net: openvswitch: expand the meters number supported

2020-03-29 Thread Pravin Shelar
On Sat, Mar 28, 2020 at 8:46 AM  wrote:
>
> From: Tonghao Zhang 
>
> In kernel datapath of Open vSwitch, there are only 1024
> buckets of meter in one dp. If installing more than 1024
> (e.g. 8192) meters, it may lead to the performance drop.
> But in some case, for example, Open vSwitch used as edge
> gateway, there should be 200,000+ at least, meters used for
> IP address bandwidth limitation.
>
> [Open vSwitch userspace datapath has this issue too.]
>
> For more scalable meter, this patch expands the buckets
> when necessary, so we can install more meters in the datapath.
>
> * Introducing the struct *dp_meter_instance*, it's easy to
>   expand meter though change the *ti* point in the struct
>   *dp_meter_table*.
> * Using kvmalloc_array instead of kmalloc_array.
>
Thanks for working on this, I have couple of comments.

> Cc: Pravin B Shelar 
> Cc: Andy Zhou 
> Signed-off-by: Tonghao Zhang 
> ---
>  net/openvswitch/datapath.h |   2 +-
>  net/openvswitch/meter.c| 168 ++---
>  net/openvswitch/meter.h|  17 +++-
>  3 files changed, 153 insertions(+), 34 deletions(-)
>
> diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
> index e239a46c2f94..785105578448 100644
> --- a/net/openvswitch/datapath.h
> +++ b/net/openvswitch/datapath.h
> @@ -82,7 +82,7 @@ struct datapath {
> u32 max_headroom;
>
> /* Switch meters. */
> -   struct hlist_head *meters;
> +   struct dp_meter_table *meters;
>  };
>
>  /**
> diff --git a/net/openvswitch/meter.c b/net/openvswitch/meter.c
> index 5010d1ddd4bd..98003b201b45 100644
> --- a/net/openvswitch/meter.c
> +++ b/net/openvswitch/meter.c
> @@ -47,40 +47,136 @@ static void ovs_meter_free(struct dp_meter *meter)
> kfree_rcu(meter, rcu);
>  }
>
> -static struct hlist_head *meter_hash_bucket(const struct datapath *dp,
> +static struct hlist_head *meter_hash_bucket(struct dp_meter_instance *ti,
> u32 meter_id)
>  {
> -   return &dp->meters[meter_id & (METER_HASH_BUCKETS - 1)];
> +   u32 hash = jhash_1word(meter_id, ti->hash_seed);
> +
I do not see any need to hash the meter-id; can you explain it?


> +   return &ti->buckets[hash & (ti->n_buckets - 1)];
>  }
>
>  /* Call with ovs_mutex or RCU read lock. */
> -static struct dp_meter *lookup_meter(const struct datapath *dp,
> +static struct dp_meter *lookup_meter(const struct dp_meter_table *tbl,
>  u32 meter_id)
>  {
> +   struct dp_meter_instance *ti = rcu_dereference_ovsl(tbl->ti);
> struct dp_meter *meter;
> struct hlist_head *head;
>
> -   head = meter_hash_bucket(dp, meter_id);
> -   hlist_for_each_entry_rcu(meter, head, dp_hash_node,
> -   lockdep_ovsl_is_held()) {
> +   head = meter_hash_bucket(ti, meter_id);
> +   hlist_for_each_entry_rcu(meter, head, hash_node[ti->node_ver],
> +lockdep_ovsl_is_held()) {
> if (meter->id == meter_id)
> return meter;
> }
> +
This patch is expanding the meter table linearly with the number of
meters added to the datapath, so I do not see the need for a hash
table; it can be a simple array. This would also improve lookup
efficiency. For a hash collision we could find the next free slot in
the array (see the sketch below). Let me know what you think about
this approach.
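A rough sketch of the open-addressing variant (untested; the flat
dp_meters array is an assumed layout, not what the current patch has):

    static struct dp_meter *array_lookup(struct dp_meter_instance *ti,
                                         u32 meter_id)
    {
            u32 start = meter_id % ti->n_meters;
            u32 i;

            for (i = 0; i < ti->n_meters; i++) {
                    u32 slot = (start + i) % ti->n_meters;
                    struct dp_meter *meter;

                    meter = rcu_dereference_ovsl(ti->dp_meters[slot]);
                    if (!meter)
                            return NULL;    /* an empty slot ends the probe run */
                    if (meter->id == meter_id)
                            return meter;
            }

            return NULL;
    }

Deletion would need tombstones, or re-insertion of the probe run, which
is the main cost of this scheme.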


> return NULL;
>  }
>
> -static void attach_meter(struct datapath *dp, struct dp_meter *meter)
> +static struct dp_meter_instance *dp_meter_instance_alloc(const int size)
> +{
> +   struct dp_meter_instance *ti;
> +   int i;
> +
> +   ti = kmalloc(sizeof(*ti), GFP_KERNEL);
> +   if (!ti)
> +   return NULL;
> +
> +   ti->buckets = kvmalloc_array(size, sizeof(struct hlist_head),
> +GFP_KERNEL);
> +   if (!ti->buckets) {
> +   kfree(ti);
> +   return NULL;
> +   }
> +
> +   for (i = 0; i < size; i++)
> +   INIT_HLIST_HEAD(&ti->buckets[i]);
> +
> +   ti->n_buckets = size;
> +   ti->node_ver = 0;
> +   get_random_bytes(&ti->hash_seed, sizeof(u32));
> +
> +   return ti;
> +}
> +
> +static void dp_meter_instance_free_rcu(struct rcu_head *rcu)
>  {
> -   struct hlist_head *head = meter_hash_bucket(dp, meter->id);
> +   struct dp_meter_instance *ti;
>
> -   hlist_add_head_rcu(&meter->dp_hash_node, head);
> +   ti = container_of(rcu, struct dp_meter_instance, rcu);
> +   kvfree(ti->buckets);
> +   kfree(ti);
>  }
>
> -static void detach_meter(struct dp_meter *meter)
> +static void dp_meter_instance_insert(struct dp_meter_instance *ti,
> +struct dp_meter *meter)
> +{
> +   struct hlist_head *head = meter_hash_bucket(ti, meter->id);
> +
> +   hlist_add_head_rcu(&meter->hash_node[ti->node_ver], head);
> +}
> +
> +static void dp_meter_instance_remove(struct dp_meter_instance *ti,
> +

Re: [ovs-dev] [PATCH 00/25] netdev datapath vxlan offload

2020-03-05 Thread Pravin Shelar
On Sun, Mar 1, 2020 at 8:25 PM Sriharsha Basavapatna via dev
 wrote:
>
> On Tue, Feb 18, 2020 at 3:30 PM Eli Britstein  wrote:
> >
> >
> > On 2/10/2020 11:16 PM, Hemal Shah wrote:
> > > Eli,
> > >
> > > There are some fundamental architecture issues (multi HW tables vs.
> > > single HW table, use of rte_flow JUMP action for mapping ovs sw
> > > tnl_pop action, etc.) that we need to discuss before we can get into
> > > the details of the patchset.
> > > I'm inlining my comments on the discussion between you and Harsha below.
> > >
> > > Hemal
> > >
> > >
> > > On Wed, Feb 5, 2020 at 6:31 AM Eli Britstein  > > > wrote:
> > >
> > >
> > > On 2/3/2020 7:05 PM, Sriharsha Basavapatna wrote:
> > > > Hi Eli,
> > > >
> > > > Thanks for sending this patchset. I have some questions about the
> > > > design, please see my comments below.
> > > >
> > > > Thanks,
> > > > -Harsha
> > > >
> > > > On Mon, Jan 20, 2020 at 8:39 PM Eli Britstein
> > > mailto:el...@mellanox.com>> wrote:
> > > >> In the netdev datapath, packets arriving over a tunnel are
> > > processed by
> > > >> more than one flow. For example a packet that should be
> > > decapsulated and
> > > >> forwarded to another port is processed by two flows:
> > > >> 1. in_port=, outer-header matches, dst-port=4789 (VXLAN),
> > > >> actions:tnl_pop(vxlan_sys_4789).
> > > >> 2. in_port=vxlan_sys_4789, tunnel matches (VNI for example),
> > > >> inner-header matches, actions: vm1.
> > > >>
> > > >> In order to offload such a multi-flow processing path, a
> > > multi-table HW
> > > >> model is used.
> > > > Did you explore other ways to avoid this multi-flow processing
> > > model ?
> > > > For example, may be support some additional logic to merge the two
> > > > datapath flow-rules for decap into a single flow-rule in the offload
> > > > layer ?
> > > Yes, we did explore this approach and our research lead us to
> > > conclude
> > > that following the OVS multi-table SW model in HW proves to be a more
> > > robust architecture.
> > >
> > > [Hemal] It will be good and prudent for you to share details of that
> > > research to help the community understand how you arrive to the
> > > conclusion above. Can you share it?
> > The HW offload scheme should follow the SW model. It is not only about
> > current VXLAN patch-set, but rather an infrastructure for offloading
> > more actions. Once HW offloading does not follow the SW model it becomes
> > difficult or not practical to support them. Squashing multi-table flows
> > into single rules might work but only for specific use cases. VXLAN with
> > fixed sized outer header is one of them, but not with other use cases.
> > There are actions, like DP_HASH, that perform calculations on the
> > packets, to be used in the next recirc.
> > For example:
> > recirc_id(0),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no),
> > packets:26, bytes:1924, used:0.105s, flags:S,
> > actions:hash(sym_l4(0)),recirc(0x5)
> >
> > recirc_id(0x5),dp_hash(0x315c2571/0x3),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no),
> > packets:0, bytes:0, used:never, actions:,3
> > recirc_id(0x5),dp_hash(0x315ca562/0x3),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no),
> > packets:0, bytes:0, used:never, actions:,2
> > ..
> > 4 rules
> >
> > 100 TCP/UDP sessions will not require more data plane flows.
> > In single-table, every separated 5-tuple should be matched, so 100
> > sessions will create 100 data plane flows.
>
> [Harsha] I understand your concerns and the requirement from different
> use cases (like dp-hash) that might need multitable flows. But for
> VXLAN decap offload, we are questioning the need for multitable. We
> are looking for design options that you considered to support VXLAN
> use case with single-table. Can you please provide specific details on
> the challenges ?
>
I think we need to align offload with the software datapath. So if you
want to explore single-table tunnel offload processing, you need to
show how it would work in software. That would eliminate the mapping of
two software tables to a single hardware table.

Thanks.


Re: [ovs-dev] [PATCH net-next v5] openvswitch: add TTL decrement action

2020-02-16 Thread Pravin Shelar
On Sat, Feb 15, 2020 at 5:21 AM Matteo Croce  wrote:
>
> New action to decrement TTL instead of setting it to a fixed value.
> This action will decrement the TTL and, in case of expired TTL, drop it
> or execute an action passed via a nested attribute.
> The default TTL expired action is to drop the packet.
>
> Supports both IPv4 and IPv6 via the ttl and hop_limit fields, respectively.
>
> Tested with a corresponding change in the userspace:
>
> # ovs-dpctl dump-flows
> in_port(2),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, 
> actions:dec_ttl{ttl<=1 action:(drop)},1
> in_port(1),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, 
> actions:dec_ttl{ttl<=1 action:(drop)},2
> in_port(1),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, 
> actions:2
> in_port(2),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, 
> actions:1
>
> # ping -c1 192.168.0.2 -t 42
> IP (tos 0x0, ttl 41, id 61647, offset 0, flags [DF], proto ICMP (1), 
> length 84)
> 192.168.0.1 > 192.168.0.2: ICMP echo request, id 386, seq 1, length 64
> # ping -c1 192.168.0.2 -t 120
> IP (tos 0x0, ttl 119, id 62070, offset 0, flags [DF], proto ICMP (1), 
> length 84)
> 192.168.0.1 > 192.168.0.2: ICMP echo request, id 388, seq 1, length 64
> # ping -c1 192.168.0.2 -t 1
> #
>
> Co-developed-by: Bindiya Kurle 
> Signed-off-by: Bindiya Kurle 
> Signed-off-by: Matteo Croce 
> ---

Thanks!

Acked-by: Pravin B Shelar 


Re: [ovs-dev] Question Regarding ovs_packet_cmd_execute in Kernel Datapath

2020-02-09 Thread Pravin Shelar
On Fri, Feb 7, 2020 at 1:18 AM Dincer Beken  wrote:
>
> Just repeating last mail with fixed indentations (mail was in wrong format):
>
>
> Hello Ben, Pravin,
>
>
> Thank you for your consideration.
>
>
> >> It is also simpler to fix the stats issue using this approach.
>
>
> >There's no stats issue.  Userspace just counts the number of packets it
> >sent and adds them in.
>
>
>
> Regarding the stats issue. In case of LTE Broadcast, I have tight 
> synchronization periods (from 80ms down to 10ms in 5G) in which I need to set 
> the elapsed #octets and the packet number, as well as the timestamp into the 
> header. Since I do not wan't to make
>  an upcall with each packet, I am using the stats of the kernel flows. Since 
> OVS_PACKET_CMD_EXECUTE does remove the flow directly after use, I am missing 
> the stats of the first packet. Therefore I wanted to know, if there is a 
> specific reason why OVS_PACKET_CMD_EXECUTE
>  has to always use temporary kernel flows.
>
>
> >> >
> >> > On Thu, Feb 06, 2020 at 11:36:19AM -0800, Pravin Shelar wrote:
> >> > > Another option would be to add new command that install and execute
> >> > > packet in same netlink msg. That would save us a netlink msg to handle
> >> > > a miss-call.  what do you think about it?
> >> >
> >> > When I experimented with that in the past, I found that it didn't have a
> >> > noticeable impact on performance.
> >> >
> >> Reduced number of msgs sent here would help in certain situations.
>
>
> >That is plausible, but it didn't help when I measured it previously.
>
>
> If we add a distinct message for packet execution with checking a flow, ex.: 
> OVS_PACKET_CMD_EXECUTE_2, if there is is no flow found, should the packet be 
> dropped? I assume that you probably would like the userplane (altough we are 
> talking here about the a dpif-netlink
>  adapter), to be loosely coupled from the kernel datapath state (such that 
> the userplane does not always has to %100 know if a kernel flow has been 
> purged or not). So  this would be an unnecessary risk. Well if we add the 
> temporary flow creation, would not
>  OVS_PACKET_CMD_EXECUTE be redundant?
>
I am not in favor of checking for the flow in OVS_PACKET_CMD_EXECUTE,
for the same issue that you have mentioned. It involves a flow lookup,
the flow might not exist in the datapath, and we would need to handle
that case. On the other hand it is much more straightforward in
OVS_FLOW_CMD_NEW. As Ben has mentioned, an OVS_FLOW_ATTR_PACKET
attribute can be used for passing the packet data. After executing the
actions for the packet we can update the stats of the newly added flow.
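Roughly like this (sketch only; OVS_FLOW_ATTR_PACKET does not exist
yet, and build_packet_from_attr() is a hypothetical helper that turns
the attribute back into an skb):

    /* In ovs_flow_cmd_new(), after the new flow has been inserted: */
    if (a[OVS_FLOW_ATTR_PACKET]) {
            struct sk_buff *packet;

            packet = build_packet_from_attr(a[OVS_FLOW_ATTR_PACKET]);
            if (!IS_ERR_OR_NULL(packet)) {
                    struct sw_flow_actions *sf_acts;

                    /* Count the first packet on the flow we just added. */
                    ovs_flow_stats_update(new_flow, new_flow->key.tp.flags, packet);
                    sf_acts = rcu_dereference_ovsl(new_flow->sf_acts);
                    ovs_execute_actions(dp, packet, sf_acts, &new_flow->key);
            }
    }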

>
> > If it did help, then I don't think we'd need a new command, we could
> > just add a OVS_FLOW_ATTR_PACKET to attach a packet to the existing
> > OVS_FLOW_CMD_NEW or OVS_FLOW_CMD_SET commands.
>
>
> I guess that we only need to check in handle_upcalls, if the 
> should_install_flow is positive, install the flow and append the packet, and 
> only if not, create an DPIF_OP_EXECUTE netlink message. This looks helpful. I 
> did not work with OVS_FLOW_CMD_SET yet, but
>  this should be likewise I assume.
>
> Regards,
>
> Dincer
>
>
>
> ________
>
> Von: Ben Pfaff 
>
> Gesendet: Donnerstag, 6. Februar 2020 22:55
>
> An: Pravin Shelar 
>
> Cc: Dincer Beken ; ovs-dev@openvswitch.org 
> ; Andreas Eberlein 
>
> Betreff: Re: [ovs-dev] Question Regarding ovs_packet_cmd_execute in Kernel 
> Datapath
>
>
>
> On Thu, Feb 06, 2020 at 01:32:12PM -0800, Pravin Shelar wrote:
>
> > On Thu, Feb 6, 2020 at 12:18 PM Ben Pfaff  wrote:
>
> > >
>
> > > On Thu, Feb 06, 2020 at 11:36:19AM -0800, Pravin Shelar wrote:
>
> > > > Another option would be to add new command that install and execute
>
> > > > packet in same netlink msg. That would save us a netlink msg to handle
>
> > > > a miss-call.  what do you think about it?
>
> > >
>
> > > When I experimented with that in the past, I found that it didn't have a
>
> > > noticeable impact on performance.
>
> > >
>
> > Reduced number of msgs sent here would help in certain situations.
>
>
>
> That is plausible, but it didn't help when I measured it previously.
>
>
>
> > It is also simpler to fix the stats issue using this approach.
>
>
>
> There's no stats issue.  Userspace just counts the number of packets it
>
> sent and adds them in.
>


Re: [ovs-dev] Question Regarding ovs_packet_cmd_execute in Kernel Datapath

2020-02-06 Thread Pravin Shelar
On Thu, Feb 6, 2020 at 12:18 PM Ben Pfaff  wrote:
>
> On Thu, Feb 06, 2020 at 11:36:19AM -0800, Pravin Shelar wrote:
> > Another option would be to add new command that install and execute
> > packet in same netlink msg. That would save us a netlink msg to handle
> > a miss-call.  what do you think about it?
>
> When I experimented with that in the past, I found that it didn't have a
> noticeable impact on performance.
>
A reduced number of msgs sent here would help in certain situations. It
is also simpler to fix the stats issue using this approach.

> If it did help, then I don't think we'd need a new command, we could
> just add a OVS_FLOW_ATTR_PACKET to attach a packet to the existing
> OVS_FLOW_CMD_NEW or OVS_FLOW_CMD_SET commands.

Sounds good.

Thanks.


Re: [ovs-dev] Question Regarding ovs_packet_cmd_execute in Kernel Datapath

2020-02-06 Thread Pravin Shelar
On Wed, Feb 5, 2020 at 10:11 AM Dincer Beken  wrote:
>
> Hello all,
>
> I am not entirely new to Open vSwitch and am implementing new actions to 
> realize the TS25.446 SYNC algorithm from 3GPP to create an BM-SC component 
> for LTE-Multicast (eMBMS). I am lacking the experience to judge over the 
> following issue and would kindly ask for your opinion. Excuse me in advance, 
> if the e-Mail does not follow the principles of the mailing list. Here we go..
>
> When a MISS_UPCALL is triggered by the kernel, this causes 2 netlink 
> operations. First, the establishment of a new flow in the datapath via 
> OVS_FLOW_CMD_NEW and then the execution of the missed packet via 
> OVS_PACKET_CMD_EXECUTE.
>
> From the code, it seems to me, that both of these operations are in this 
> order. Considering this, why is the "ovs_packet_cmd_execute" message not 
> checking, and if it exists, directly using the new flow, which was 
> established with OVS_FLOW_CMD_NEW in the first netlink message, but creates a 
> temporary new flow, all the time. One issue this causes is that the generated 
> flow lacks the statistics of the first packet (#packet and #elapsed_octets).
>
> By adding the UFID into the OVS_FLOW_CMD_NEW, it is possible to reuse the 
> first flow, and effectively to update the flow statistics.
>
Another option would be to add a new command that installs the flow and
executes the packet in the same netlink msg. That would save us a
netlink msg when handling a flow miss. What do you think about it?


> So as an example, I added the following code, with which I could reuse the 
> flow generated via OVS_FLOW_CMD_NEW. Nothing new here.
>
> if(a[OVS_FLOW_ATTR_UFID]){
> /* Extract flow identifier. */
> struct sw_flow_id sw_flow_id;
> struct sw_flow_key sw_flow_key;
>
> int error = ovs_nla_get_identifier(&sw_flow_id, a[OVS_FLOW_ATTR_UFID],
> &sw_flow_key, log);
> if (error) {
> goto err_kfree_skb;
> }
> rcu_read_lock();
> dp = get_dp_rcu(net, ovs_header->dp_ifindex);
> err = -ENODEV;
> if (!dp)
> goto err_unlock;
> /* Check if this is a duplicate flow */
> if (ovs_identifier_is_ufid(&sw_flow_id)){
> flow = ovs_flow_tbl_lookup_ufid(&dp->table, &sw_flow_id);
> }
> if(flow){
> struct sw_flow_actions *sf_acts;
> struct dp_stats_percpu *stats;
> u64 *stats_counter;
> u32 n_mask_hit;
> struct sw_flow_key flow_key_dummy = {{0}};
> err = ovs_flow_key_extract_userspace(net, a[OVS_PACKET_ATTR_KEY],
> packet, &flow_key_dummy, log);
> ovs_flow_stats_update(flow, flow->key.tp.flags, packet);
> sf_acts = rcu_dereference(flow->sf_acts);
> ovs_execute_actions(dp, packet, sf_acts, &flow->key);
> stats_counter = &stats->n_hit;
> /* Update datapath statistics. */
> u64_stats_update_begin(&stats->syncp);
> (*stats_counter)++;
> stats->n_mask_hit += n_mask_hit;
> u64_stats_update_end(&stats->syncp);
> pr_err("Successfully handled the packet via the existing flow of the 
> UFID.");
>  }
>  rcu_read_unlock();
>  return error;
> }
>
> I think, that the ovs_packet_cmd_execute creates those temporary flows 
> because of a reason, because it is kinda an obvious issue. So, what I 
> basically would like to know is, if there are performance related or other 
> aspects (like adding the 128 bit UFID as a nlAttr), why we always create a 
> new flow.
>
> Regards,
> Dincer Beken


Re: [ovs-dev] [PATCH net-next v3] openvswitch: add TTL decrement action

2020-01-15 Thread Pravin Shelar
On Wed, Jan 15, 2020 at 8:40 AM Matteo Croce  wrote:
>
> New action to decrement TTL instead of setting it to a fixed value.
> This action will decrement the TTL and, in case of expired TTL, drop it
> or execute an action passed via a nested attribute.
> The default TTL expired action is to drop the packet.
>
> Supports both IPv4 and IPv6 via the ttl and hop_limit fields, respectively.
>
> Tested with a corresponding change in the userspace:
>
> # ovs-dpctl dump-flows
> in_port(2),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, 
> actions:dec_ttl{ttl<=1 action:(drop)},1
> in_port(1),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, 
> actions:dec_ttl{ttl<=1 action:(drop)},2
> in_port(1),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, 
> actions:2
> in_port(2),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, 
> actions:1
>
> # ping -c1 192.168.0.2 -t 42
> IP (tos 0x0, ttl 41, id 61647, offset 0, flags [DF], proto ICMP (1), 
> length 84)
> 192.168.0.1 > 192.168.0.2: ICMP echo request, id 386, seq 1, length 64
> # ping -c1 192.168.0.2 -t 120
> IP (tos 0x0, ttl 119, id 62070, offset 0, flags [DF], proto ICMP (1), 
> length 84)
> 192.168.0.1 > 192.168.0.2: ICMP echo request, id 388, seq 1, length 64
> # ping -c1 192.168.0.2 -t 1
> #
>
Thanks for the patch.

> Co-developed-by: Bindiya Kurle 
> Signed-off-by: Bindiya Kurle 
> Signed-off-by: Matteo Croce 
> ---
>  include/uapi/linux/openvswitch.h |  2 +
>  net/openvswitch/actions.c| 67 ++
>  net/openvswitch/flow_netlink.c   | 71 
>  3 files changed, 140 insertions(+)
>
> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index ae2bff14e7e1..9d3f040847af 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -958,6 +958,7 @@ enum ovs_action_attr {
> OVS_ACTION_ATTR_CLONE,/* Nested OVS_CLONE_ATTR_*.  */
> OVS_ACTION_ATTR_CHECK_PKT_LEN, /* Nested OVS_CHECK_PKT_LEN_ATTR_*. */
> OVS_ACTION_ATTR_ADD_MPLS, /* struct ovs_action_add_mpls. */
> +   OVS_ACTION_ATTR_DEC_TTL,  /* Nested OVS_DEC_TTL_ATTR_*. */
>
> __OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
>* from userspace. */
> @@ -1050,4 +1051,5 @@ struct ovs_zone_limit {
> __u32 count;
>  };
>
> +#define OVS_DEC_TTL_ATTR_EXEC  0

I am not sure if we need this. But if you want the nested attribute,
then let's define an enum with this single attribute and have the
actions as part of its data. This would be an optional argument, so
userspace can skip it, and in that case the datapath can drop the
packet.
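Something along these lines (sketch of the suggested UAPI; the exact
names are up for discussion):

    enum ovs_dec_ttl_attr {
            OVS_DEC_TTL_ATTR_UNSPEC,
            OVS_DEC_TTL_ATTR_ACTION,        /* Optional nested actions to run
                                             * when the TTL expires; if absent,
                                             * the datapath drops the packet. */
            __OVS_DEC_TTL_ATTR_MAX
    };
    #define OVS_DEC_TTL_ATTR_MAX (__OVS_DEC_TTL_ATTR_MAX - 1)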

>  #endif /* _LINUX_OPENVSWITCH_H */
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 7fbfe2adfffa..1b0afc9bf1ad 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -964,6 +964,26 @@ static int output_userspace(struct datapath *dp, struct 
> sk_buff *skb,
> return ovs_dp_upcall(dp, skb, key, &upcall, cutlen);
>  }
>
> +static int dec_ttl(struct datapath *dp, struct sk_buff *skb,
> +  struct sw_flow_key *key,
> +  const struct nlattr *attr, bool last)
Can you give it a better name, for example dec_ttl_exception_handler()?

> +{
> +   /* The first action is always 'OVS_DEC_TTL_ATTR_ARG'. */
> +   struct nlattr *dec_ttl_arg = nla_data(attr);
> +   u32 nested = nla_get_u32(dec_ttl_arg);
> +   int rem = nla_len(attr);
> +
> +   if (nested) {
> +   struct nlattr *actions = nla_next(dec_ttl_arg, &rem);
> +
> +   if (actions)
> +   return clone_execute(dp, skb, key, 0, actions, rem,
> +last, false);
> +   }
> +   consume_skb(skb);
> +   return 0;
> +}
> +
>  /* When 'last' is true, sample() should always consume the 'skb'.
>   * Otherwise, sample() should keep 'skb' intact regardless what
>   * actions are executed within sample().
> @@ -1180,6 +1200,45 @@ static int execute_check_pkt_len(struct datapath *dp, 
> struct sk_buff *skb,
>  nla_len(actions), last, clone_flow_key);
>  }
>
> +static int execute_dec_ttl(struct sk_buff *skb, struct sw_flow_key *key)
> +{
> +   int err;
> +
> +   if (skb->protocol == htons(ETH_P_IPV6)) {
> +   struct ipv6hdr *nh;
> +
> +   err = skb_ensure_writable(skb, skb_network_offset(skb) +
> + sizeof(*nh));
> +   if (unlikely(err))
> +   return err;
> +
> +   nh = ipv6_hdr(skb);
> +
> +   if (nh->hop_limit <= 1)
> +   return -EHOSTUNREACH;
> +
> +   key->ip.ttl = --nh->hop_limit;
> +   } else {
> +   struct iphdr *nh;
> +   u8 old_ttl;
> +
> +

Re: [ovs-dev] [PATCH net-next v2] openvswitch: add TTL decrement action

2019-12-19 Thread Pravin Shelar
On Thu, Dec 19, 2019 at 8:36 AM Matteo Croce  wrote:
>
> On Wed, Dec 18, 2019 at 4:06 AM Pravin Shelar  wrote:
> >
> > On Tue, Dec 17, 2019 at 7:51 AM Matteo Croce  wrote:
> > >
> > > New action to decrement TTL instead of setting it to a fixed value.
> > > This action will decrement the TTL and, in case of expired TTL, drop it
> > > or execute an action passed via a nested attribute.
> > > The default TTL expired action is to drop the packet.
> > >
> > > Supports both IPv4 and IPv6 via the ttl and hop_limit fields, 
> > > respectively.
> > >
> > > Tested with a corresponding change in the userspace:
> > >
> > > # ovs-dpctl dump-flows
> > > in_port(2),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, 
> > > actions:dec_ttl{ttl<=1 action:(drop)},1,1
> > > in_port(1),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, 
> > > actions:dec_ttl{ttl<=1 action:(drop)},1,2
> > > in_port(1),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, 
> > > actions:2
> > > in_port(2),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, 
> > > actions:1
> > >
> > > # ping -c1 192.168.0.2 -t 42
> > > IP (tos 0x0, ttl 41, id 61647, offset 0, flags [DF], proto ICMP (1), 
> > > length 84)
> > > 192.168.0.1 > 192.168.0.2: ICMP echo request, id 386, seq 1, 
> > > length 64
> > > # ping -c1 192.168.0.2 -t 120
> > > IP (tos 0x0, ttl 119, id 62070, offset 0, flags [DF], proto ICMP (1), 
> > > length 84)
> > > 192.168.0.1 > 192.168.0.2: ICMP echo request, id 388, seq 1, 
> > > length 64
> > > # ping -c1 192.168.0.2 -t 1
> > > #
> > >
> > > Co-authored-by: Bindiya Kurle 
> > > Signed-off-by: Bindiya Kurle 
> > > Signed-off-by: Matteo Croce 
> > > ---
> > >  include/uapi/linux/openvswitch.h |  22 +++
> > >  net/openvswitch/actions.c|  71 +
> > >  net/openvswitch/flow_netlink.c   | 105 +++
> > >  3 files changed, 198 insertions(+)
> > >
> > > diff --git a/include/uapi/linux/openvswitch.h 
> > > b/include/uapi/linux/openvswitch.h
> > > index a87b44cd5590..b6684bc04883 100644
> > > --- a/include/uapi/linux/openvswitch.h
> > > +++ b/include/uapi/linux/openvswitch.h
> > > @@ -927,6 +927,7 @@ enum ovs_action_attr {
> > > OVS_ACTION_ATTR_METER,/* u32 meter ID. */
> > > OVS_ACTION_ATTR_CLONE,/* Nested OVS_CLONE_ATTR_*.  */
> > > OVS_ACTION_ATTR_CHECK_PKT_LEN, /* Nested 
> > > OVS_CHECK_PKT_LEN_ATTR_*. */
> > > +   OVS_ACTION_ATTR_DEC_TTL,   /* Nested OVS_DEC_TTL_ATTR_*. */
> > >
> > > __OVS_ACTION_ATTR_MAX,/* Nothing past this will be 
> > > accepted
> > >* from userspace. */
> > > @@ -939,6 +940,23 @@ enum ovs_action_attr {
> > >  };
> > >
> > >  #define OVS_ACTION_ATTR_MAX (__OVS_ACTION_ATTR_MAX - 1)
> > > +enum ovs_dec_ttl_attr {
> > > +   OVS_DEC_TTL_ATTR_UNSPEC,
> > > +   OVS_DEC_TTL_ATTR_ACTION_TYPE,/* Action Type u32 */
> > > +   OVS_DEC_TTL_ATTR_ACTION, /* nested action */
> > > +   __OVS_DEC_TTL_ATTR_MAX,
> > > +#ifdef __KERNEL__
> > > +   OVS_DEC_TTL_ATTR_ARG  /* struct sample_arg  */
> > > +#endif
> > > +};
> > > +
> >
> > I do not see need for type or OVS_DEC_TTL_ACTION_DROP, if there are no
> > nested action the datapath can drop the packet.
> >
> > > +#ifdef __KERNEL__
> > > +struct dec_ttl_arg {
> > > +   u32 action_type;/* dec_ttl action type.*/
> > > +};
> > > +#endif
> > > +
> > > +#define OVS_DEC_TTL_ATTR_MAX (__OVS_DEC_TTL_ATTR_MAX - 1)
> > >
> > >  /* Meters. */
> > >  #define OVS_METER_FAMILY  "ovs_meter"
> > > @@ -1009,6 +1027,10 @@ enum ovs_ct_limit_attr {
> > > __OVS_CT_LIMIT_ATTR_MAX
> > >  };
> > >
> > > +enum ovs_dec_ttl_action {/*Actions supported by dec_ttl */
> > > +   OVS_DEC_TTL_ACTION_DROP,
> > > +   OVS_DEC_TTL_ACTION_USER_SPACE
> > > +};
> > >  #define OVS_CT_LIMIT_ATTR_MAX (__OVS_CT_LIMIT_ATTR_MAX - 1)
> > >
> > >  #define OVS_ZONE_LIMIT_DEFAULT_ZONE -1
> > > diff --git a/net/openvswitch/actio

Re: [ovs-dev] [PATCH net-next v2] openvswitch: add TTL decrement action

2019-12-17 Thread Pravin Shelar
On Tue, Dec 17, 2019 at 7:51 AM Matteo Croce  wrote:
>
> New action to decrement TTL instead of setting it to a fixed value.
> This action will decrement the TTL and, in case of expired TTL, drop it
> or execute an action passed via a nested attribute.
> The default TTL expired action is to drop the packet.
>
> Supports both IPv4 and IPv6 via the ttl and hop_limit fields, respectively.
>
> Tested with a corresponding change in the userspace:
>
> # ovs-dpctl dump-flows
> in_port(2),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, 
> actions:dec_ttl{ttl<=1 action:(drop)},1,1
> in_port(1),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, 
> actions:dec_ttl{ttl<=1 action:(drop)},1,2
> in_port(1),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, 
> actions:2
> in_port(2),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, 
> actions:1
>
> # ping -c1 192.168.0.2 -t 42
> IP (tos 0x0, ttl 41, id 61647, offset 0, flags [DF], proto ICMP (1), 
> length 84)
> 192.168.0.1 > 192.168.0.2: ICMP echo request, id 386, seq 1, length 64
> # ping -c1 192.168.0.2 -t 120
> IP (tos 0x0, ttl 119, id 62070, offset 0, flags [DF], proto ICMP (1), 
> length 84)
> 192.168.0.1 > 192.168.0.2: ICMP echo request, id 388, seq 1, length 64
> # ping -c1 192.168.0.2 -t 1
> #
>
> Co-authored-by: Bindiya Kurle 
> Signed-off-by: Bindiya Kurle 
> Signed-off-by: Matteo Croce 
> ---
>  include/uapi/linux/openvswitch.h |  22 +++
>  net/openvswitch/actions.c|  71 +
>  net/openvswitch/flow_netlink.c   | 105 +++
>  3 files changed, 198 insertions(+)
>
> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index a87b44cd5590..b6684bc04883 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -927,6 +927,7 @@ enum ovs_action_attr {
> OVS_ACTION_ATTR_METER,/* u32 meter ID. */
> OVS_ACTION_ATTR_CLONE,/* Nested OVS_CLONE_ATTR_*.  */
> OVS_ACTION_ATTR_CHECK_PKT_LEN, /* Nested OVS_CHECK_PKT_LEN_ATTR_*. */
> +   OVS_ACTION_ATTR_DEC_TTL,   /* Nested OVS_DEC_TTL_ATTR_*. */
>
> __OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
>* from userspace. */
> @@ -939,6 +940,23 @@ enum ovs_action_attr {
>  };
>
>  #define OVS_ACTION_ATTR_MAX (__OVS_ACTION_ATTR_MAX - 1)
> +enum ovs_dec_ttl_attr {
> +   OVS_DEC_TTL_ATTR_UNSPEC,
> +   OVS_DEC_TTL_ATTR_ACTION_TYPE,/* Action Type u32 */
> +   OVS_DEC_TTL_ATTR_ACTION, /* nested action */
> +   __OVS_DEC_TTL_ATTR_MAX,
> +#ifdef __KERNEL__
> +   OVS_DEC_TTL_ATTR_ARG  /* struct sample_arg  */
> +#endif
> +};
> +

I do not see a need for the type or OVS_DEC_TTL_ACTION_DROP; if there
are no nested actions, the datapath can simply drop the packet.
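On the datapath side that could look roughly like this (sketch,
untested; the function name is only illustrative and clone_execute()
is used as in the patch):

    static int dec_ttl_exception_handler(struct datapath *dp, struct sk_buff *skb,
                                         struct sw_flow_key *key,
                                         const struct nlattr *attr, bool last)
    {
            struct nlattr *actions = nla_data(attr);
            int rem = nla_len(attr);

            /* No nested expiry actions from userspace: just drop the packet. */
            if (!rem) {
                    consume_skb(skb);
                    return 0;
            }

            return clone_execute(dp, skb, key, 0, actions, rem, last, false);
    }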

> +#ifdef __KERNEL__
> +struct dec_ttl_arg {
> +   u32 action_type;/* dec_ttl action type.*/
> +};
> +#endif
> +
> +#define OVS_DEC_TTL_ATTR_MAX (__OVS_DEC_TTL_ATTR_MAX - 1)
>
>  /* Meters. */
>  #define OVS_METER_FAMILY  "ovs_meter"
> @@ -1009,6 +1027,10 @@ enum ovs_ct_limit_attr {
> __OVS_CT_LIMIT_ATTR_MAX
>  };
>
> +enum ovs_dec_ttl_action {/*Actions supported by dec_ttl */
> +   OVS_DEC_TTL_ACTION_DROP,
> +   OVS_DEC_TTL_ACTION_USER_SPACE
> +};
>  #define OVS_CT_LIMIT_ATTR_MAX (__OVS_CT_LIMIT_ATTR_MAX - 1)
>
>  #define OVS_ZONE_LIMIT_DEFAULT_ZONE -1
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 4c8395462303..5329668732b1 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -960,6 +960,31 @@ static int output_userspace(struct datapath *dp, struct 
> sk_buff *skb,
> return ovs_dp_upcall(dp, skb, key, &upcall, cutlen);
>  }
>
> +static int dec_ttl(struct datapath *dp, struct sk_buff *skb,
> +  struct sw_flow_key *fk, const struct nlattr *attr, bool 
> last)
> +{
> +   struct nlattr *actions;
> +   struct nlattr *dec_ttl_arg;
> +   int rem = nla_len(attr);
> +   const struct dec_ttl_arg *arg;
> +
> +   /* The first action is always OVS_DEC_TTL_ATTR_ARG. */
> +   dec_ttl_arg = nla_data(attr);
> +   arg = nla_data(dec_ttl_arg);
> +   actions = nla_next(dec_ttl_arg, &rem);
> +
> +   switch (arg->action_type) {
> +   case OVS_DEC_TTL_ACTION_DROP:
> +   consume_skb(skb);
> +   break;
> +
> +   case OVS_DEC_TTL_ACTION_USER_SPACE:
> +   return clone_execute(dp, skb, fk, 0, actions, rem, last, 
> false);
> +   }
> +
> +   return 0;
> +}
> +
>  /* When 'last' is true, sample() should always consume the 'skb'.
>   * Otherwise, sample() should keep 'skb' intact regardless what
>   * actions are executed within sample().
> @@ -1176,6 +1201,44 @@ static int execute_check_pkt_len(struct datapath *dp, 
> struct sk_buff *skb,
>  

Re: [ovs-dev] [PATCH net 1/2] openvswitch: support asymmetric conntrack

2019-11-25 Thread Pravin Shelar
Downloading from patchwork is working for me. It's strange that other
patches in my mailbox do not have this issue.

Thanks.

On Mon, Nov 25, 2019 at 7:39 AM Aaron Conole  wrote:
>
> Aaron Conole  writes:
>
> > Pravin Shelar  writes:
> >
> >> On Fri, Nov 8, 2019 at 1:07 PM Aaron Conole  wrote:
> >>>
> >>> The openvswitch module shares a common conntrack and NAT infrastructure
> >>> exposed via netfilter.  It's possible that a packet needs both SNAT and
> >>> DNAT manipulation, due to e.g. tuple collision.  Netfilter can support
> >>> this because it runs through the NAT table twice - once on ingress and
> >>> again after egress.  The openvswitch module doesn't have such capability.
> >>>
> >>> Like netfilter hook infrastructure, we should run through NAT twice to
> >>> keep the symmetry.
> >>>
> >>> Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
> >>> Signed-off-by: Aaron Conole 
> >>
> >> The patch looks ok. But I am not able apply it. can you fix the encoding.
> >
> > Hrrm.  I didn't make any special changes (just used git send-email).  I
> > will look at spinning a second patch.
>
> Pravin,
>
> I tried the following:
>
>   10:36:59 aconole@dhcp-25 {(312434617cb1...)} ~/git/linux$ curl 
> http://patchwork.ozlabs.org/patch/1192219/mbox/ > test.patch
> % Total% Received % Xferd  Average Speed   TimeTime Time  
> Current
>Dload  Upload   Total   SpentLeft  
> Speed
>   100  4827  100  48270 0   8824  0 --:--:-- --:--:-- --:--:--  
> 8808
>   10:37:21 aconole@dhcp-25 {(312434617cb1...)} ~/git/linux$ git am test.patch
>   Applying: openvswitch: support asymmetric conntrack
>   10:37:24 aconole@dhcp-25 {(f759cc2b7323...)} ~/git/linux$
>
>
> Can you check your mailer settings?  The patchwork mbox worked fine, and
> I was able to apply from my own mbox as well.
>
> -Aaron
>


Re: [ovs-dev] [PATCH net-next v4] net: openvswitch: add hash info to upcall

2019-11-13 Thread Pravin Shelar
On Wed, Nov 13, 2019 at 7:05 AM  wrote:
>
> From: Tonghao Zhang 
>
> When using the kernel datapath, the upcall don't
> include skb hash info relatived. That will introduce
> some problem, because the hash of skb is important
> in kernel stack. For example, VXLAN module uses
> it to select UDP src port. The tx queue selection
> may also use the hash in stack.
>
> Hash is computed in different ways. Hash is random
> for a TCP socket, and hash may be computed in hardware,
> or software stack. Recalculation hash is not easy.
>
> Hash of TCP socket is computed:
> tcp_v4_connect
> -> sk_set_txhash (is random)
>
> __tcp_transmit_skb
> -> skb_set_hash_from_sk
>
> There will be one upcall, without information of skb
> hash, to ovs-vswitchd, for the first packet of a TCP
> session. The rest packets will be processed in Open vSwitch
> modules, hash kept. If this tcp session is forward to
> VXLAN module, then the UDP src port of first tcp packet
> is different from rest packets.
>
> TCP packets may come from the host or dockers, to Open vSwitch.
> To fix it, we store the hash info to upcall, and restore hash
> when packets sent back.
>
> +---+  +-+
> |   Docker/VMs  |  | ovs-vswitchd|
> ++--+  +-++--+
>  |   ^|
>  |   ||
>  |   |  upcallv restore packet hash (not 
> recalculate)
>  | +-++--+
>  |  tap netdev | |   vxlan module
>  +---> +-->  Open vSwitch ko +-->
>or internal type| |
>+-+
>
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html
> Signed-off-by: Tonghao Zhang 

Acked-by: Pravin B Shelar 

Thanks.


Re: [ovs-dev] [PATCH net-next] openvswitch: add TTL decrement action

2019-11-12 Thread Pravin Shelar
On Tue, Nov 12, 2019 at 2:25 AM Matteo Croce  wrote:
>
> New action to decrement TTL instead of setting it to a fixed value.
> This action will decrement the TTL and, in case of expired TTL, send the
> packet to userspace via output_userspace() to take care of it.
>
> Supports both IPv4 and IPv6 via the ttl and hop_limit fields, respectively.
>
> Tested with a corresponding change in the userspace:
>
> # ovs-dpctl dump-flows
> in_port(2),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, 
> actions:dec_ttl,1
> in_port(1),eth(),eth_type(0x0800), packets:0, bytes:0, used:never, 
> actions:dec_ttl,2
> in_port(1),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, 
> actions:2
> in_port(2),eth(),eth_type(0x0806), packets:0, bytes:0, used:never, 
> actions:1
>
> # ping -c1 192.168.0.2 -t 42
> IP (tos 0x0, ttl 41, id 61647, offset 0, flags [DF], proto ICMP (1), 
> length 84)
> 192.168.0.1 > 192.168.0.2: ICMP echo request, id 386, seq 1, length 64
> # ping -c1 192.168.0.2 -t 120
> IP (tos 0x0, ttl 119, id 62070, offset 0, flags [DF], proto ICMP (1), 
> length 84)
> 192.168.0.1 > 192.168.0.2: ICMP echo request, id 388, seq 1, length 64
> # ping -c1 192.168.0.2 -t 1
> #
>
> Co-authored-by: Bindiya Kurle 
> Signed-off-by: Bindiya Kurle 
> Signed-off-by: Matteo Croce 
> ---
>  include/uapi/linux/openvswitch.h |  2 ++
>  net/openvswitch/actions.c| 46 
>  net/openvswitch/flow_netlink.c   |  6 +
>  3 files changed, 54 insertions(+)
>
> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index 1887a451c388..a3bdb1ecd1e7 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -890,6 +890,7 @@ struct check_pkt_len_arg {
>   * @OVS_ACTION_ATTR_CHECK_PKT_LEN: Check the packet length and execute a set
>   * of actions if greater than the specified packet length, else execute
>   * another set of actions.
> + * @OVS_ACTION_ATTR_DEC_TTL: Decrement the IP TTL.
>   *
>   * Only a single header can be set with a single %OVS_ACTION_ATTR_SET.  Not 
> all
>   * fields within a header are modifiable, e.g. the IPv4 protocol and fragment
> @@ -925,6 +926,7 @@ enum ovs_action_attr {
> OVS_ACTION_ATTR_METER,/* u32 meter ID. */
> OVS_ACTION_ATTR_CLONE,/* Nested OVS_CLONE_ATTR_*.  */
> OVS_ACTION_ATTR_CHECK_PKT_LEN, /* Nested OVS_CHECK_PKT_LEN_ATTR_*. */
> +   OVS_ACTION_ATTR_DEC_TTL,  /* Decrement ttl action */
>
> __OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
>* from userspace. */
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 12936c151cc0..077b7f309c93 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -1174,6 +1174,43 @@ static int execute_check_pkt_len(struct datapath *dp, 
> struct sk_buff *skb,
>  nla_len(actions), last, clone_flow_key);
>  }
>
> +static int execute_dec_ttl(struct sk_buff *skb, struct sw_flow_key *key)
> +{
> +   int err;
> +
> +   if (skb->protocol == htons(ETH_P_IPV6)) {
> +   struct ipv6hdr *nh = ipv6_hdr(skb);
> +
> +   err = skb_ensure_writable(skb, skb_network_offset(skb) +
> + sizeof(*nh));
> +   if (unlikely(err))
> +   return err;
> +
> +   if (nh->hop_limit <= 1)
> +   return -EHOSTUNREACH;
> +
> +   key->ip.ttl = --nh->hop_limit;
> +   } else {
> +   struct iphdr *nh = ip_hdr(skb);
> +   u8 old_ttl;
> +
> +   err = skb_ensure_writable(skb, skb_network_offset(skb) +
> + sizeof(*nh));
> +   if (unlikely(err))
> +   return err;
> +
> +   if (nh->ttl <= 1)
> +   return -EHOSTUNREACH;
> +
> +   old_ttl = nh->ttl--;
> +   csum_replace2(&nh->check, htons(old_ttl << 8),
> + htons(nh->ttl << 8));
> +   key->ip.ttl = nh->ttl;
> +   }
> +
> +   return 0;
> +}
> +
>  /* Execute a list of actions against 'skb'. */
>  static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
>   struct sw_flow_key *key,
> @@ -1345,6 +1382,15 @@ static int do_execute_actions(struct datapath *dp, 
> struct sk_buff *skb,
>
> break;
> }
> +
> +   case OVS_ACTION_ATTR_DEC_TTL:
> +   err = execute_dec_ttl(skb, key);
> +   if (err == -EHOSTUNREACH) {
> +   output_userspace(dp, skb, key, a, attr,
> +len, OVS_CB(skb)->cutlen);
> +   OVS_CB(skb)->cutlen = 0;
> +

Re: [ovs-dev] [PATCH net-next v3] net: openvswitch: add hash info to upcall

2019-11-12 Thread Pravin Shelar
On Tue, Nov 12, 2019 at 7:09 AM  wrote:
>
> From: Tonghao Zhang 
>
> When using the kernel datapath, the upcall don't
> include skb hash info relatived. That will introduce
> some problem, because the hash of skb is important
> in kernel stack. For example, VXLAN module uses
> it to select UDP src port. The tx queue selection
> may also use the hash in stack.
>
> Hash is computed in different ways. Hash is random
> for a TCP socket, and hash may be computed in hardware,
> or software stack. Recalculation hash is not easy.
>
> Hash of TCP socket is computed:
>   tcp_v4_connect
> -> sk_set_txhash (is random)
>
>   __tcp_transmit_skb
> -> skb_set_hash_from_sk
>
> There will be one upcall, without information of skb
> hash, to ovs-vswitchd, for the first packet of a TCP
> session. The rest packets will be processed in Open vSwitch
> modules, hash kept. If this tcp session is forward to
> VXLAN module, then the UDP src port of first tcp packet
> is different from rest packets.
>
> TCP packets may come from the host or dockers, to Open vSwitch.
> To fix it, we store the hash info to upcall, and restore hash
> when packets sent back.
>
> +---+  +-+
> |   Docker/VMs  |  | ovs-vswitchd|
> ++--+  +-++--+
>  |   ^|
>  |   ||
>  |   |  upcallv restore packet hash (not 
> recalculate)
>  | +-++--+
>  |  tap netdev | |   vxlan module
>  +---> +-->  Open vSwitch ko +-->
>or internal type| |
>+-+
>
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html
> Signed-off-by: Tonghao Zhang 
> ---
> v3:
> * add enum ovs_pkt_hash_types
> * avoid duplicate call the skb_get_hash_raw.
> * explain why we should fix this problem.
> ---
>  include/uapi/linux/openvswitch.h |  2 ++
>  net/openvswitch/datapath.c   | 30 +-
>  net/openvswitch/datapath.h   | 12 
>  3 files changed, 43 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index 1887a451c388..e65407c1f366 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -170,6 +170,7 @@ enum ovs_packet_cmd {
>   * output port is actually a tunnel port. Contains the output tunnel key
>   * extracted from the packet as nested %OVS_TUNNEL_KEY_ATTR_* attributes.
>   * @OVS_PACKET_ATTR_MRU: Present for an %OVS_PACKET_CMD_ACTION and
> + * @OVS_PACKET_ATTR_HASH: Packet hash info (e.g. hash, sw_hash and l4_hash 
> in skb).
>   * @OVS_PACKET_ATTR_LEN: Packet size before truncation.
>   * %OVS_PACKET_ATTR_USERSPACE action specify the Maximum received fragment
>   * size.
> @@ -190,6 +191,7 @@ enum ovs_packet_attr {
> OVS_PACKET_ATTR_PROBE,  /* Packet operation is a feature probe,
>error logging should be suppressed. */
> OVS_PACKET_ATTR_MRU,/* Maximum received IP fragment size. */
> +   OVS_PACKET_ATTR_HASH,   /* Packet hash. */
> OVS_PACKET_ATTR_LEN,/* Packet size before truncation. */
> __OVS_PACKET_ATTR_MAX
>  };
I agree with Greg: the values of existing enums cannot be changed in the UAPI.

> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 2088619c03f0..b556cf62b77c 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -350,7 +350,8 @@ static size_t upcall_msg_size(const struct dp_upcall_info 
> *upcall_info,
> size_t size = NLMSG_ALIGN(sizeof(struct ovs_header))
> + nla_total_size(hdrlen) /* OVS_PACKET_ATTR_PACKET */
> + nla_total_size(ovs_key_attr_size()) /* OVS_PACKET_ATTR_KEY 
> */
> -   + nla_total_size(sizeof(unsigned int)); /* 
> OVS_PACKET_ATTR_LEN */
> +   + nla_total_size(sizeof(unsigned int)) /* OVS_PACKET_ATTR_LEN 
> */
> +   + nla_total_size(sizeof(u64)); /* OVS_PACKET_ATTR_HASH */
>
> /* OVS_PACKET_ATTR_USERDATA */
> if (upcall_info->userdata)
> @@ -393,6 +394,7 @@ static int queue_userspace_packet(struct datapath *dp, 
> struct sk_buff *skb,
> size_t len;
> unsigned int hlen;
> int err, dp_ifindex;
> +   u64 hash;
>
> dp_ifindex = get_dpifindex(dp);
> if (!dp_ifindex)
> @@ -504,6 +506,23 @@ static int queue_userspace_packet(struct datapath *dp, 
> struct sk_buff *skb,
> pad_packet(dp, user_skb);
> }
>
> +   hash = skb_get_hash_raw(skb);
> +   if (hash) {
A zero hash is a valid skb hash; due to this check, packets with a zero
hash would not get the same vxlan source port.
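
For reference, the packing scheme under discussion fits in a few lines. This is a hedged sketch based on the quoted hunks: the flag names come from the patch, but the exact bit positions (bits 32 and 33 of the u64) are an assumption rather than something stated in this thread.

#define OVS_PACKET_HASH_SW      (1ULL << 32)    /* bit positions are an assumption */
#define OVS_PACKET_HASH_L4      (1ULL << 33)    /* only the low 32 bits are the hash */

static u64 ovs_pack_skb_hash(const struct sk_buff *skb)
{
        u64 hash = skb_get_hash_raw(skb);       /* raw skb hash in the low 32 bits */

        if (skb->sw_hash)
                hash |= OVS_PACKET_HASH_SW;
        if (skb->l4_hash)
                hash |= OVS_PACKET_HASH_L4;

        return hash;    /* value carried in the u64 OVS_PACKET_ATTR_HASH attribute */
}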

Re: [ovs-dev] [PATCH net-next] net: openvswitch: add hash info to upcall

2019-11-11 Thread Pravin Shelar
On Sun, Nov 10, 2019 at 3:44 AM  wrote:
>
> From: Tonghao Zhang 
>
> When using the kernel datapath, the upcall don't
> add skb hash info relatived. That will introduce
> some problem, because the hash of skb is very
> important (e.g. vxlan module uses it for udp src port,
> tx queue selection on tx path.).
>
> For example, there will be one upcall, without information
> skb hash, to ovs-vswitchd, for the first packet of one tcp
> session. When kernel sents the tcp packets, the hash is
> random for a tcp socket:
>
> tcp_v4_connect
>   -> sk_set_txhash (is random)
>
> __tcp_transmit_skb
>   -> skb_set_hash_from_sk
>
> Then the udp src port of first tcp packet is different
> from rest packets. The topo is shown.
>
> $ ovs-vsctl add-br br-int
> $ ovs-vsctl add-port br-int vxl0 -- \
> set Interface vxl0 type=vxlan options:key=100 
> options:remote_ip=1.1.1.200
>
> $ __tap is internal type on host
> $ or tap net device for VM/Dockers
> $ ovs-vsctl add-port br-int __tap
>
> +---+  +-+
> |   Docker/VMs  |  | ovs-vswitchd|
> ++--+  +-+
>  |   ^|
>  |   ||
>  |   |  upcallv recalculate packet hash
>  | +-++--+
>  |  tap netdev | |   vxlan modules
>  +---> +-->  Open vSwitch ko   --+--->
>internal type   | |
>+-+
>
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html
> Signed-off-by: Tonghao Zhang 
> ---
>  include/uapi/linux/openvswitch.h |  2 ++
>  net/openvswitch/datapath.c   | 31 ++-
>  net/openvswitch/datapath.h   |  3 +++
>  3 files changed, 35 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index 1887a451c388..1c58e019438e 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -170,6 +170,7 @@ enum ovs_packet_cmd {
>   * output port is actually a tunnel port. Contains the output tunnel key
>   * extracted from the packet as nested %OVS_TUNNEL_KEY_ATTR_* attributes.
>   * @OVS_PACKET_ATTR_MRU: Present for an %OVS_PACKET_CMD_ACTION and
> + * @OVS_PACKET_ATTR_HASH: Packet hash info (e.g. hash, sw_hash and l4_hash 
> in skb)
>   * @OVS_PACKET_ATTR_LEN: Packet size before truncation.
>   * %OVS_PACKET_ATTR_USERSPACE action specify the Maximum received fragment
>   * size.
> @@ -190,6 +191,7 @@ enum ovs_packet_attr {
> OVS_PACKET_ATTR_PROBE,  /* Packet operation is a feature probe,
>error logging should be suppressed. */
> OVS_PACKET_ATTR_MRU,/* Maximum received IP fragment size. */
> +   OVS_PACKET_ATTR_HASH,   /* Packet hash. */
> OVS_PACKET_ATTR_LEN,/* Packet size before truncation. */
> __OVS_PACKET_ATTR_MAX
>  };
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 2088619c03f0..f938c43e3085 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -350,7 +350,8 @@ static size_t upcall_msg_size(const struct dp_upcall_info 
> *upcall_info,
> size_t size = NLMSG_ALIGN(sizeof(struct ovs_header))
> + nla_total_size(hdrlen) /* OVS_PACKET_ATTR_PACKET */
> + nla_total_size(ovs_key_attr_size()) /* OVS_PACKET_ATTR_KEY 
> */
> -   + nla_total_size(sizeof(unsigned int)); /* 
> OVS_PACKET_ATTR_LEN */
> +   + nla_total_size(sizeof(unsigned int)) /* OVS_PACKET_ATTR_LEN 
> */
> +   + nla_total_size(sizeof(u64)); /* OVS_PACKET_ATTR_HASH */
>
> /* OVS_PACKET_ATTR_USERDATA */
> if (upcall_info->userdata)
> @@ -393,6 +394,7 @@ static int queue_userspace_packet(struct datapath *dp, 
> struct sk_buff *skb,
> size_t len;
> unsigned int hlen;
> int err, dp_ifindex;
> +   u64 hash;
>
> dp_ifindex = get_dpifindex(dp);
> if (!dp_ifindex)
> @@ -504,6 +506,24 @@ static int queue_userspace_packet(struct datapath *dp, 
> struct sk_buff *skb,
> pad_packet(dp, user_skb);
> }
>
> +   if (skb_get_hash_raw(skb)) {
skb_get_hash_raw() never fails to return a hash, so I do not see the point
of checking the hash value.

> +   hash = skb_get_hash_raw(skb);
> +
> +   if (skb->sw_hash)
> +   hash |= OVS_PACKET_HASH_SW;
> +
> +   if (skb->l4_hash)
> +   hash |= OVS_PACKET_HASH_L4;
> +
> +   if (nla_put(user_skb, OVS_PACKET_ATTR_HASH,
> +   sizeof (u64), &hash)) {
> +   err = -ENOBUFS;
> +   goto 
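
On the way back from userspace, the same attribute is meant to restore the hash before the actions run, so e.g. vxlan keeps picking the same UDP source port (the cover letter calls this "restore packet hash, not recalculate"). A rough sketch, reusing the flag names from the quoted patch and leaving out the netlink parsing:

static void ovs_restore_skb_hash(struct sk_buff *skb, u64 hash)
{
        if (!hash)
                return;         /* attribute absent (or zero): leave the skb as-is */

        __skb_set_hash(skb, (u32)hash,
                       !!(hash & OVS_PACKET_HASH_SW),
                       !!(hash & OVS_PACKET_HASH_L4));
}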

Re: [ovs-dev] [PATCH net 1/2] openvswitch: support asymmetric conntrack

2019-11-09 Thread Pravin Shelar
On Fri, Nov 8, 2019 at 1:07 PM Aaron Conole  wrote:
>
> The openvswitch module shares a common conntrack and NAT infrastructure
> exposed via netfilter.  It's possible that a packet needs both SNAT and
> DNAT manipulation, due to e.g. tuple collision.  Netfilter can support
> this because it runs through the NAT table twice - once on ingress and
> again after egress.  The openvswitch module doesn't have such capability.
>
> Like netfilter hook infrastructure, we should run through NAT twice to
> keep the symmetry.
>
> Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
> Signed-off-by: Aaron Conole 

The patch looks OK, but I am not able to apply it. Can you fix the encoding?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2] net: openvswitch: select vport upcall portid directly

2019-11-06 Thread Pravin Shelar
On Wed, Nov 6, 2019 at 8:34 AM  wrote:
>
> From: Tonghao Zhang 
>
> The commit 69c51582ff786 ("dpif-netlink: don't allocate per
> thread netlink sockets"), in Open vSwitch ovs-vswitchd, has
> changed the number of allocated sockets to just one per port
> by moving the socket array from a per handler structure to
> a per datapath one. In the kernel datapath, a vport will have
> only one socket in most case, if so select it directly in
> fast-path.
>
> Signed-off-by: Tonghao Zhang 

Acked-by: Pravin B Shelar 

Thanks.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
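
The fast path this patch adds is easy to picture: when a vport has a single upcall socket, its portid is returned directly, and the hash-based spreading only runs for the multi-socket case. A simplified sketch; the struct and field names follow the existing vport_portids code, but treat the whole thing as illustrative rather than the exact upstream change.

static u32 vport_pick_upcall_portid(const struct vport_portids *ids,
                                    const struct sk_buff *skb)
{
        if (ids->n_ids == 1)
                return ids->ids[0];     /* common case: one socket per vport */

        /* multiple sockets: spread upcalls by packet hash, as before */
        return ids->ids[reciprocal_scale(skb_get_hash(skb), ids->n_ids)];
}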


Re: [ovs-dev] [PATCH net-next v6 05/10] net: openvswitch: optimize flow-mask looking up

2019-11-04 Thread Pravin Shelar
On Mon, Nov 4, 2019 at 6:00 AM William Tu  wrote:
>
> On Sat, Nov 2, 2019 at 11:50 PM Pravin Shelar  wrote:
> >
> > On Fri, Nov 1, 2019 at 7:24 AM  wrote:
> > >
> > > From: Tonghao Zhang 
> > >
> > > The full looking up on flow table traverses all mask array.
> > > If mask-array is too large, the number of invalid flow-mask
> > > increase, performance will be drop.
> > >
> > > One bad case, for example: M means flow-mask is valid and NULL
> > > of flow-mask means deleted.
> > >
> > > +---+
> > > | M | NULL | ...  | NULL | M|
> > > +---+
> > >
> > > In that case, without this patch, openvswitch will traverses all
> > > mask array, because there will be one flow-mask in the tail. This
> > > patch changes the way of flow-mask inserting and deleting, and the
> > > mask array will be keep as below: there is not a NULL hole. In the
> > > fast path, we can "break" "for" (not "continue") in flow_lookup
> > > when we get a NULL flow-mask.
> > >
> > >  "break"
> > > v
> > > +---+
> > > | M | M |  NULL |...   | NULL | NULL|
> > > +---+
> > >
> > > This patch don't optimize slow or control path, still using ma->max
> > > to traverse. Slow path:
> > > * tbl_mask_array_realloc
> > > * ovs_flow_tbl_lookup_exact
> > > * flow_mask_find
> > >
> > > Signed-off-by: Tonghao Zhang 
> > > Tested-by: Greg Rose 
> > > ---
> > Acked-by: Pravin B Shelar 
>
> Nack to this patch.
>
> It makes the mask cache invalid when moving the flow mask
> to fill another hole.
> And the penalty for miss the mask cache is larger than the
> benefit of this patch (avoiding the NULL flow-mask).
>
Hi William,

The cache invalidation would occur in case of changes to the flow-mask
list. I think in general datapath processing there should not be too
many changes to the mask list, since a mask is typically shared between
multiple megaflows. Flow-table changes do not always result in mask-list
updates. In the last round Tonghao posted numbers to support this patch.
If you are worried about some other use case, can you provide
performance numbers? That way we can take this discussion forward.

Thanks,
Pravin.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
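
The point being debated is visible directly in the lookup loop: once deleted masks are compacted out, the first NULL slot means there is nothing valid further on, so the scan can stop instead of skipping holes. A simplified sketch of that loop (masked_flow_lookup() and rcu_dereference_ovsl() are existing helpers in flow_table.c; the wrapper function itself is illustrative):

static struct sw_flow *flow_lookup_masks(struct table_instance *ti,
                                         struct mask_array *ma,
                                         const struct sw_flow_key *key,
                                         u32 *n_mask_hit, u32 *index)
{
        int i;

        for (i = 0; i < ma->max; i++) {
                struct sw_flow_mask *mask = rcu_dereference_ovsl(ma->masks[i]);
                struct sw_flow *flow;

                if (!mask)
                        break;          /* array is dense: nothing valid past here */

                flow = masked_flow_lookup(ti, key, mask, n_mask_hit);
                if (flow) {
                        *index = i;     /* remember which mask matched */
                        return flow;
                }
        }

        return NULL;
}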


Re: [ovs-dev] [PATCH net-next v6 10/10] net: openvswitch: simplify the ovs_dp_cmd_new

2019-11-03 Thread Pravin Shelar
On Fri, Nov 1, 2019 at 7:24 AM  wrote:
>
> From: Tonghao Zhang 
>
> use the specified functions to init resource.
>
> Signed-off-by: Tonghao Zhang 
> Tested-by: Greg Rose 
Acked-by: Pravin B Shelar 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v6 09/10] net: openvswitch: don't unlock mutex when changing the user_features fails

2019-11-03 Thread Pravin Shelar
On Fri, Nov 1, 2019 at 7:24 AM  wrote:
>
> From: Tonghao Zhang 
>
> Unlocking of a not locked mutex is not allowed.
> Other kernel thread may be in critical section while
> we unlock it because of setting user_feature fail.
>
> Fixes: 95a7233c4 ("net: openvswitch: Set OvS recirc_id from tc chain index")
> Cc: Paul Blakey 
> Signed-off-by: Tonghao Zhang 
> Tested-by: Greg Rose 
> Acked-by: William Tu 
> ---
Acked-by: Pravin B Shelar 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v6 08/10] net: openvswitch: fix possible memleak on destroy flow-table

2019-11-03 Thread Pravin Shelar
On Fri, Nov 1, 2019 at 7:24 AM  wrote:
>
> From: Tonghao Zhang 
>
> When we destroy the flow tables which may contain the flow_mask,
> so release the flow mask struct.
>
> Signed-off-by: Tonghao Zhang 
> Tested-by: Greg Rose 
> ---
Acked-by: Pravin B Shelar 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v6 07/10] net: openvswitch: add likely in flow_lookup

2019-11-03 Thread Pravin Shelar
On Fri, Nov 1, 2019 at 7:24 AM  wrote:
>
> From: Tonghao Zhang 
>
> The most case *index < ma->max, and flow-mask is not NULL.
> We add un/likely for performance.
>
> Signed-off-by: Tonghao Zhang 
> Tested-by: Greg Rose 
> Acked-by: William Tu 
> ---
Acked-by: Pravin B Shelar 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v6 05/10] net: openvswitch: optimize flow-mask looking up

2019-11-03 Thread Pravin Shelar
On Fri, Nov 1, 2019 at 7:24 AM  wrote:
>
> From: Tonghao Zhang 
>
> The full looking up on flow table traverses all mask array.
> If mask-array is too large, the number of invalid flow-mask
> increase, performance will be drop.
>
> One bad case, for example: M means flow-mask is valid and NULL
> of flow-mask means deleted.
>
> +---+
> | M | NULL | ...  | NULL | M|
> +---+
>
> In that case, without this patch, openvswitch will traverses all
> mask array, because there will be one flow-mask in the tail. This
> patch changes the way of flow-mask inserting and deleting, and the
> mask array will be keep as below: there is not a NULL hole. In the
> fast path, we can "break" "for" (not "continue") in flow_lookup
> when we get a NULL flow-mask.
>
>  "break"
> v
> +---+
> | M | M |  NULL |...   | NULL | NULL|
> +---+
>
> This patch don't optimize slow or control path, still using ma->max
> to traverse. Slow path:
> * tbl_mask_array_realloc
> * ovs_flow_tbl_lookup_exact
> * flow_mask_find
>
> Signed-off-by: Tonghao Zhang 
> Tested-by: Greg Rose 
> ---
Acked-by: Pravin B Shelar 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v6 06/10] net: openvswitch: simplify the flow_hash

2019-11-03 Thread Pravin Shelar
On Fri, Nov 1, 2019 at 7:24 AM  wrote:
>
> From: Tonghao Zhang 
>
> Simplify the code and remove the unnecessary BUILD_BUG_ON.
>
> Signed-off-by: Tonghao Zhang 
> Tested-by: Greg Rose 
> Acked-by: William Tu 
> ---
Acked-by: Pravin B Shelar 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v6 04/10] net: openvswitch: optimize flow mask cache hash collision

2019-11-03 Thread Pravin Shelar
On Fri, Nov 1, 2019 at 7:24 AM  wrote:
>
> From: Tonghao Zhang 
>
> Port the codes to linux upstream and with little changes.
>
> Pravin B Shelar, says:
> | In case hash collision on mask cache, OVS does extra flow
> | lookup. Following patch avoid it.
>
> Link: 
> https://github.com/openvswitch/ovs/commit/0e6efbe2712da03522532dc5e84806a96f6a0dd1
> Signed-off-by: Tonghao Zhang 
> Tested-by: Greg Rose 
> ---
Signed-off-by: Pravin B Shelar 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v6 03/10] net: openvswitch: shrink the mask array if necessary

2019-11-03 Thread Pravin Shelar
On Fri, Nov 1, 2019 at 7:24 AM  wrote:
>
> From: Tonghao Zhang 
>
> When creating and inserting flow-mask, if there is no available
> flow-mask, we realloc the mask array. When removing flow-mask,
> if necessary, we shrink mask array.
>
> Signed-off-by: Tonghao Zhang 
> Tested-by: Greg Rose 
> Acked-by: William Tu 
> ---
Acked-by: Pravin B Shelar 


>  net/openvswitch/flow_table.c | 33 +++--
>  1 file changed, 23 insertions(+), 10 deletions(-)
>
> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
> index 92efa23..0c0fcd6 100644
> --- a/net/openvswitch/flow_table.c
> +++ b/net/openvswitch/flow_table.c
> @@ -694,6 +694,23 @@ static struct table_instance 
> *table_instance_expand(struct table_instance *ti,
> return table_instance_rehash(ti, ti->n_buckets * 2, ufid);
>  }
>
> +static void tbl_mask_array_delete_mask(struct mask_array *ma,
> +  struct sw_flow_mask *mask)
> +{
> +   int i;
> +
> +   /* Remove the deleted mask pointers from the array */
> +   for (i = 0; i < ma->max; i++) {
> +   if (mask == ovsl_dereference(ma->masks[i])) {
> +   RCU_INIT_POINTER(ma->masks[i], NULL);
> +   ma->count--;
> +   kfree_rcu(mask, rcu);
> +   return;
> +   }
> +   }
> +   BUG();
> +}
> +
>  /* Remove 'mask' from the mask list, if it is not needed any more. */
>  static void flow_mask_remove(struct flow_table *tbl, struct sw_flow_mask 
> *mask)
>  {
> @@ -707,18 +724,14 @@ static void flow_mask_remove(struct flow_table *tbl, 
> struct sw_flow_mask *mask)
>
> if (!mask->ref_count) {
> struct mask_array *ma;
> -   int i;
>
> ma = ovsl_dereference(tbl->mask_array);
> -   for (i = 0; i < ma->max; i++) {
> -   if (mask == ovsl_dereference(ma->masks[i])) {
> -   RCU_INIT_POINTER(ma->masks[i], NULL);
> -   ma->count--;
> -   kfree_rcu(mask, rcu);
> -   return;
> -   }
> -   }
> -   BUG();
> +   tbl_mask_array_delete_mask(ma, mask);
> +
> +   /* Shrink the mask array if necessary. */
> +   if (ma->max >= (MASK_ARRAY_SIZE_MIN * 2) &&
> +   ma->count <= (ma->max / 3))
> +   tbl_mask_array_realloc(tbl, ma->max / 2);
> }
> }
>  }
> --
> 1.8.3.1
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
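
To make the shrink condition above concrete (assuming MASK_ARRAY_SIZE_MIN is 16, as in the existing code): an array grown to max = 64 is halved to 32 once the mask count drops to 21 or fewer (21 <= 64/3), at max = 32 the count has to fall to 10 or fewer before it shrinks to 16, and an array already at the 16-entry minimum is never shrunk further. The two-sided check gives some hysteresis, so a mask being added and removed repeatedly near the boundary does not cause a realloc each time.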


Re: [ovs-dev] [PATCH net-next v6 02/10] net: openvswitch: convert mask list in mask array

2019-11-03 Thread Pravin Shelar
On Fri, Nov 1, 2019 at 7:24 AM  wrote:
>
> From: Tonghao Zhang 
>
> Port the codes to linux upstream and with little changes.
>
> Pravin B Shelar, says:
> | mask caches index of mask in mask_list. On packet recv OVS
> | need to traverse mask-list to get cached mask. Therefore array
> | is better for retrieving cached mask. This also allows better
> | cache replacement algorithm by directly checking mask's existence.
>
> Link: 
> https://github.com/openvswitch/ovs/commit/d49fc3ff53c65e4eca9cabd52ac63396746a7ef5
> Signed-off-by: Tonghao Zhang 
> Tested-by: Greg Rose 
> Acked-by: William Tu 
> ---
Signed-off-by: Pravin B Shelar 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v6 01/10] net: openvswitch: add flow-mask cache for performance

2019-11-03 Thread Pravin Shelar
On Fri, Nov 1, 2019 at 7:24 AM  wrote:
>
> From: Tonghao Zhang 
>
> The idea of this optimization comes from a patch which
> is committed in 2014, openvswitch community. The author
> is Pravin B Shelar. In order to get high performance, I
> implement it again. Later patches will use it.
>
> Pravin B Shelar, says:
> | On every packet OVS needs to lookup flow-table with every
> | mask until it finds a match. The packet flow-key is first
> | masked with mask in the list and then the masked key is
> | looked up in flow-table. Therefore number of masks can
> | affect packet processing performance.
>
> Link: 
> https://github.com/openvswitch/ovs/commit/5604935e4e1cbc16611d2d97f50b717aa31e8ec5
> Signed-off-by: Tonghao Zhang 
> Tested-by: Greg Rose 
> Acked-by: William Tu 
> ---
Signed-off-by: Pravin B Shelar 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
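
The cover text above describes the cache compactly; in code it is roughly a per-CPU array mapping "skb hash -> index of the mask that matched last time", consulted before the full mask scan. A hedged sketch follows: the entry layout matches the description, but the per-CPU array size, the single-mask lookup helper, and the function signatures are illustrative assumptions, not the exact patch.

struct mask_cache_entry {
        u32 skb_hash;
        u32 mask_index;
};

static struct sw_flow *cached_flow_lookup(struct flow_table *tbl,
                                          const struct sw_flow_key *key,
                                          u32 skb_hash, u32 *n_mask_hit)
{
        struct mask_cache_entry *e;
        struct sw_flow *flow;
        u32 index = 0;

        e = &this_cpu_ptr(tbl->mask_cache)[skb_hash & (MC_HASH_ENTRIES - 1)];

        if (e->skb_hash == skb_hash) {
                /* cache hit: try only the mask that matched last time */
                flow = flow_lookup_one_mask(tbl, key, e->mask_index, n_mask_hit);
                if (flow)
                        return flow;
        }

        /* cache miss: full scan over all masks, then remember the winner */
        flow = flow_lookup_masks(rcu_dereference_ovsl(tbl->ti),
                                 rcu_dereference_ovsl(tbl->mask_array),
                                 key, n_mask_hit, &index);
        e->skb_hash = flow ? skb_hash : 0;
        e->mask_index = index;

        return flow;
}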


Re: [ovs-dev] [PATCH net-next v4 08/10] net: openvswitch: fix possible memleak on destroy flow-table

2019-10-29 Thread Pravin Shelar
On Tue, Oct 29, 2019 at 4:31 AM Tonghao Zhang  wrote:
>
> On Tue, Oct 29, 2019 at 3:38 PM Pravin Shelar  wrote:
> >
> > On Sun, Oct 27, 2019 at 11:49 PM Tonghao Zhang  
> > wrote:
> > >
> > > On Thu, Oct 24, 2019 at 3:14 PM Pravin Shelar  wrote:
> > > >
> > > > On Tue, Oct 22, 2019 at 7:35 PM Tonghao Zhang 
> > > >  wrote:
> > > > >
> > > > > On Tue, Oct 22, 2019 at 2:58 PM Pravin Shelar  wrote:
> > > > > >
> > > > ...
> > > >
> > ...
> > > > >  struct sw_flow *ovs_flow_tbl_dump_next(struct table_instance *ti,
> > > > > @@ -400,10 +458,9 @@ static struct table_instance
> > > > > *table_instance_rehash(struct table_instance *ti,
> > > > > return new_ti;
> > > > >  }
> > > > >
> > > > > -int ovs_flow_tbl_flush(struct flow_table *flow_table)
> > > > > +int ovs_flow_tbl_flush(struct flow_table *table)
> > > > >  {
> > > > > -   struct table_instance *old_ti, *new_ti;
> > > > > -   struct table_instance *old_ufid_ti, *new_ufid_ti;
> > > > > +   struct table_instance *new_ti, *new_ufid_ti;
> > > > >
> > > > > new_ti = table_instance_alloc(TBL_MIN_BUCKETS);
> > > > > if (!new_ti)
> > > > > @@ -412,16 +469,12 @@ int ovs_flow_tbl_flush(struct flow_table 
> > > > > *flow_table)
> > > > > if (!new_ufid_ti)
> > > > > goto err_free_ti;
> > > > >
> > > > > -   old_ti = ovsl_dereference(flow_table->ti);
> > > > > -   old_ufid_ti = ovsl_dereference(flow_table->ufid_ti);
> > > > > +   table_instance_destroy(table, true);
> > > > >
> > > > This would destroy running table causing unnecessary flow miss. Lets
> > > > keep current scheme of setting up new table before destroying current
> > > > one.
> > > >
> > > > > -   rcu_assign_pointer(flow_table->ti, new_ti);
> > 
> > ...
> > >  /* Must be called with OVS mutex held. */
> > >  void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow)
> > >  {
> > > @@ -752,17 +794,7 @@ void ovs_flow_tbl_remove(struct flow_table
> > > *table, struct sw_flow *flow)
> > > struct table_instance *ufid_ti = ovsl_dereference(table->ufid_ti);
> > >
> > > BUG_ON(table->count == 0);
> > > -   hlist_del_rcu(&flow->flow_table.node[ti->node_ver]);
> > > -   table->count--;
> > > -   if (ovs_identifier_is_ufid(&flow->id)) {
> > > -   hlist_del_rcu(&flow->ufid_table.node[ufid_ti->node_ver]);
> > > -   table->ufid_count--;
> > > -   }
> > > -
> > > -   /* RCU delete the mask. 'flow->mask' is not NULLed, as it should 
> > > be
> > > -* accessible as long as the RCU read lock is held.
> > > -*/
> > > -   flow_mask_remove(table, flow->mask);
> > > +   table_instance_remove(table, ti, ufid_ti, flow, true);
> > >  }
> > Lets rename table_instance_remove() to imply it is freeing a flow.
> hi Pravin, the function ovs_flow_free will free the flow actually. In
> -ovs_flow_cmd_del
> ovs_flow_tbl_remove
> ...
> ovs_flow_free
>
> In -table_instance_destroy
> table_instance_remove
> ovs_flow_free
>
> But if rename the table_instance_remove, table_instance_flow_free ?
table_instance_flow_free() looks good.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v4 08/10] net: openvswitch: fix possible memleak on destroy flow-table

2019-10-29 Thread Pravin Shelar
On Sun, Oct 27, 2019 at 11:49 PM Tonghao Zhang  wrote:
>
> On Thu, Oct 24, 2019 at 3:14 PM Pravin Shelar  wrote:
> >
> > On Tue, Oct 22, 2019 at 7:35 PM Tonghao Zhang  
> > wrote:
> > >
> > > On Tue, Oct 22, 2019 at 2:58 PM Pravin Shelar  wrote:
> > > >
> > ...
> >
...
> > >  struct sw_flow *ovs_flow_tbl_dump_next(struct table_instance *ti,
> > > @@ -400,10 +458,9 @@ static struct table_instance
> > > *table_instance_rehash(struct table_instance *ti,
> > > return new_ti;
> > >  }
> > >
> > > -int ovs_flow_tbl_flush(struct flow_table *flow_table)
> > > +int ovs_flow_tbl_flush(struct flow_table *table)
> > >  {
> > > -   struct table_instance *old_ti, *new_ti;
> > > -   struct table_instance *old_ufid_ti, *new_ufid_ti;
> > > +   struct table_instance *new_ti, *new_ufid_ti;
> > >
> > > new_ti = table_instance_alloc(TBL_MIN_BUCKETS);
> > > if (!new_ti)
> > > @@ -412,16 +469,12 @@ int ovs_flow_tbl_flush(struct flow_table 
> > > *flow_table)
> > > if (!new_ufid_ti)
> > > goto err_free_ti;
> > >
> > > -   old_ti = ovsl_dereference(flow_table->ti);
> > > -   old_ufid_ti = ovsl_dereference(flow_table->ufid_ti);
> > > +   table_instance_destroy(table, true);
> > >
> > This would destroy running table causing unnecessary flow miss. Lets
> > keep current scheme of setting up new table before destroying current
> > one.
> >
> > > -   rcu_assign_pointer(flow_table->ti, new_ti);

...
>  /* Must be called with OVS mutex held. */
>  void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow)
>  {
> @@ -752,17 +794,7 @@ void ovs_flow_tbl_remove(struct flow_table
> *table, struct sw_flow *flow)
> struct table_instance *ufid_ti = ovsl_dereference(table->ufid_ti);
>
> BUG_ON(table->count == 0);
> -   hlist_del_rcu(&flow->flow_table.node[ti->node_ver]);
> -   table->count--;
> -   if (ovs_identifier_is_ufid(&flow->id)) {
> -   hlist_del_rcu(&flow->ufid_table.node[ufid_ti->node_ver]);
> -   table->ufid_count--;
> -   }
> -
> -   /* RCU delete the mask. 'flow->mask' is not NULLed, as it should be
> -* accessible as long as the RCU read lock is held.
> -*/
> -   flow_mask_remove(table, flow->mask);
> +   table_instance_remove(table, ti, ufid_ti, flow, true);
>  }
Let's rename table_instance_remove() to imply it is freeing a flow.
Otherwise this looks good.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v4 08/10] net: openvswitch: fix possible memleak on destroy flow-table

2019-10-24 Thread Pravin Shelar
On Tue, Oct 22, 2019 at 7:35 PM Tonghao Zhang  wrote:
>
> On Tue, Oct 22, 2019 at 2:58 PM Pravin Shelar  wrote:
> >
...

> > > >
> > Sure, I can review it, Can you send the patch inlined in mail?
> >
> > Thanks.
> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
> index 5df5182..5b20793 100644
> --- a/net/openvswitch/flow_table.c
> +++ b/net/openvswitch/flow_table.c
> @@ -257,10 +257,75 @@ static void flow_tbl_destroy_rcu_cb(struct rcu_head 
> *rcu)
> __table_instance_destroy(ti);
>  }
>
> -static void table_instance_destroy(struct table_instance *ti,
> -  struct table_instance *ufid_ti,
> +static void tbl_mask_array_del_mask(struct flow_table *tbl,
> +   struct sw_flow_mask *mask)
> +{
> +   struct mask_array *ma = ovsl_dereference(tbl->mask_array);
> +   int i, ma_count = READ_ONCE(ma->count);
> +
> +   /* Remove the deleted mask pointers from the array */
> +   for (i = 0; i < ma_count; i++) {
> +   if (mask == ovsl_dereference(ma->masks[i]))
> +   goto found;
> +   }
> +
> +   BUG();
> +   return;
> +
> +found:
> +   WRITE_ONCE(ma->count, ma_count -1);
> +
> +   rcu_assign_pointer(ma->masks[i], ma->masks[ma_count -1]);
> +   RCU_INIT_POINTER(ma->masks[ma_count -1], NULL);
> +
> +   kfree_rcu(mask, rcu);
> +
> +   /* Shrink the mask array if necessary. */
> +   if (ma->max >= (MASK_ARRAY_SIZE_MIN * 2) &&
> +   ma_count <= (ma->max / 3))
> +   tbl_mask_array_realloc(tbl, ma->max / 2);
> +}
> +
> +/* Remove 'mask' from the mask list, if it is not needed any more. */
> +static void flow_mask_remove(struct flow_table *tbl, struct sw_flow_mask 
> *mask)
> +{
> +   if (mask) {
> +   /* ovs-lock is required to protect mask-refcount and
> +* mask list.
> +*/
> +   ASSERT_OVSL();
> +   BUG_ON(!mask->ref_count);
> +   mask->ref_count--;
> +
> +   if (!mask->ref_count)
> +   tbl_mask_array_del_mask(tbl, mask);
> +   }
> +}
> +
> +static void table_instance_remove(struct flow_table *table, struct
> sw_flow *flow)
> +{
> +   struct table_instance *ti = ovsl_dereference(table->ti);
> +   struct table_instance *ufid_ti = ovsl_dereference(table->ufid_ti);
> +
> +   BUG_ON(table->count == 0);
> +   hlist_del_rcu(&flow->flow_table.node[ti->node_ver]);
> +   table->count--;
> +   if (ovs_identifier_is_ufid(&flow->id)) {
> +   hlist_del_rcu(&flow->ufid_table.node[ufid_ti->node_ver]);
> +   table->ufid_count--;
> +   }
> +
> +   /* RCU delete the mask. 'flow->mask' is not NULLed, as it should be
> +* accessible as long as the RCU read lock is held.
> +*/
> +   flow_mask_remove(table, flow->mask);
> +}
> +
> +static void table_instance_destroy(struct flow_table *table,
>bool deferred)
>  {
> +   struct table_instance *ti = ovsl_dereference(table->ti);
> +   struct table_instance *ufid_ti = ovsl_dereference(table->ufid_ti);
> int i;
>
> if (!ti)
> @@ -274,13 +339,9 @@ static void table_instance_destroy(struct
> table_instance *ti,
> struct sw_flow *flow;
> struct hlist_head *head = &ti->buckets[i];
> struct hlist_node *n;
> -   int ver = ti->node_ver;
> -   int ufid_ver = ufid_ti->node_ver;
>
> -   hlist_for_each_entry_safe(flow, n, head, 
> flow_table.node[ver]) {
> -   hlist_del_rcu(&flow->flow_table.node[ver]);
> -   if (ovs_identifier_is_ufid(&flow->id))
> -   hlist_del_rcu(&flow->ufid_table.node[ufid_ver]);
> +   hlist_for_each_entry_safe(flow, n, head,
> flow_table.node[ti->node_ver]) {
> +   table_instance_remove(table, flow);
> ovs_flow_free(flow, deferred);
> }
> }
> @@ -300,12 +361,9 @@ static void table_instance_destroy(struct
> table_instance *ti,
>   */
>  void ovs_flow_tbl_destroy(struct flow_table *table)
>  {
> -   struct table_instance *ti = rcu_dereference_raw(table->ti);
> -   struct table_instance *ufid_ti = rcu_dereference_raw(table->ufid_ti);
> -
> free_percpu(table->mask_cache);
> kfree_rcu(rcu_dereference_raw(table->mask_array

Re: [ovs-dev] [PATCH net-next v4 08/10] net: openvswitch: fix possible memleak on destroy flow-table

2019-10-22 Thread Pravin Shelar
On Sun, Oct 20, 2019 at 10:02 PM Tonghao Zhang  wrote:
>
> On Sat, Oct 19, 2019 at 2:12 AM Pravin Shelar  wrote:
> >
> > On Thu, Oct 17, 2019 at 8:16 PM Tonghao Zhang  
> > wrote:
> > >
> > > On Fri, Oct 18, 2019 at 6:38 AM Pravin Shelar  wrote:
> > > >
> > > > On Wed, Oct 16, 2019 at 5:50 AM  wrote:
> > > > >
> > > > > From: Tonghao Zhang 
> > > > >
> > > > > When we destroy the flow tables which may contain the flow_mask,
> > > > > so release the flow mask struct.
> > > > >
> > > > > Signed-off-by: Tonghao Zhang 
> > > > > Tested-by: Greg Rose 
> > > > > ---
> > > > >  net/openvswitch/flow_table.c | 14 +-
> > > > >  1 file changed, 13 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/net/openvswitch/flow_table.c 
> > > > > b/net/openvswitch/flow_table.c
> > > > > index 5df5182..d5d768e 100644
> > > > > --- a/net/openvswitch/flow_table.c
> > > > > +++ b/net/openvswitch/flow_table.c
> > > > > @@ -295,6 +295,18 @@ static void table_instance_destroy(struct 
> > > > > table_instance *ti,
> > > > > }
> > > > >  }
> > > > >
> > > > > +static void tbl_mask_array_destroy(struct flow_table *tbl)
> > > > > +{
> > > > > +   struct mask_array *ma = ovsl_dereference(tbl->mask_array);
> > > > > +   int i;
> > > > > +
> > > > > +   /* Free the flow-mask and kfree_rcu the NULL is allowed. */
> > > > > +   for (i = 0; i < ma->max; i++)
> > > > > +   kfree_rcu(rcu_dereference_raw(ma->masks[i]), rcu);
> > > > > +
> > > > > +   kfree_rcu(rcu_dereference_raw(tbl->mask_array), rcu);
> > > > > +}
> > > > > +
> > > > >  /* No need for locking this function is called from RCU callback or
> > > > >   * error path.
> > > > >   */
> > > > > @@ -304,7 +316,7 @@ void ovs_flow_tbl_destroy(struct flow_table 
> > > > > *table)
> > > > > struct table_instance *ufid_ti = 
> > > > > rcu_dereference_raw(table->ufid_ti);
> > > > >
> > > > > free_percpu(table->mask_cache);
> > > > > -   kfree_rcu(rcu_dereference_raw(table->mask_array), rcu);
> > > > > +   tbl_mask_array_destroy(table);
> > > > > table_instance_destroy(ti, ufid_ti, false);
> > > > >  }
> > > >
> > > > This should not be required. mask is linked to a flow and gets
> > > > released when flow is removed.
> > > > Does the memory leak occur when OVS module is abruptly unloaded and
> > > > userspace does not cleanup flow table?
> > > When we destroy the ovs datapath or net namespace is destroyed , the
> > > mask memory will be happened. The call tree:
> > > ovs_exit_net/ovs_dp_cmd_del
> > > -->__dp_destroy
> > > -->destroy_dp_rcu
> > > -->ovs_flow_tbl_destroy
> > > -->table_instance_destroy (which don't release the mask memory because
> > > don't call the ovs_flow_tbl_remove /flow_mask_remove directly or
> > > indirectly).
> > >
> > That's what I suggested earlier: we need to call a function similar to
> > ovs_flow_tbl_remove(), and we could refactor the code to reuse it.
> > This is better because introducing tbl_mask_array_destroy() creates a
> > dangling pointer to the mask in the sw_flow object. OVS is anyway
> > iterating the entire flow table to release each sw_flow in
> > table_instance_destroy(), so it is natural to release the mask at that
> > point, after releasing the corresponding sw_flow.
> I got it, thanks. I rewrite the codes, can you help me to review it.
> If fine, I will sent it next version.
> >
> >
Sure, I can review it. Can you send the patch inlined in the mail?

Thanks.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v4 08/10] net: openvswitch: fix possible memleak on destroy flow-table

2019-10-18 Thread Pravin Shelar
On Thu, Oct 17, 2019 at 8:16 PM Tonghao Zhang  wrote:
>
> On Fri, Oct 18, 2019 at 6:38 AM Pravin Shelar  wrote:
> >
> > On Wed, Oct 16, 2019 at 5:50 AM  wrote:
> > >
> > > From: Tonghao Zhang 
> > >
> > > When we destroy the flow tables which may contain the flow_mask,
> > > so release the flow mask struct.
> > >
> > > Signed-off-by: Tonghao Zhang 
> > > Tested-by: Greg Rose 
> > > ---
> > >  net/openvswitch/flow_table.c | 14 +-
> > >  1 file changed, 13 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
> > > index 5df5182..d5d768e 100644
> > > --- a/net/openvswitch/flow_table.c
> > > +++ b/net/openvswitch/flow_table.c
> > > @@ -295,6 +295,18 @@ static void table_instance_destroy(struct 
> > > table_instance *ti,
> > > }
> > >  }
> > >
> > > +static void tbl_mask_array_destroy(struct flow_table *tbl)
> > > +{
> > > +   struct mask_array *ma = ovsl_dereference(tbl->mask_array);
> > > +   int i;
> > > +
> > > +   /* Free the flow-mask and kfree_rcu the NULL is allowed. */
> > > +   for (i = 0; i < ma->max; i++)
> > > +   kfree_rcu(rcu_dereference_raw(ma->masks[i]), rcu);
> > > +
> > > +   kfree_rcu(rcu_dereference_raw(tbl->mask_array), rcu);
> > > +}
> > > +
> > >  /* No need for locking this function is called from RCU callback or
> > >   * error path.
> > >   */
> > > @@ -304,7 +316,7 @@ void ovs_flow_tbl_destroy(struct flow_table *table)
> > > struct table_instance *ufid_ti = 
> > > rcu_dereference_raw(table->ufid_ti);
> > >
> > > free_percpu(table->mask_cache);
> > > -   kfree_rcu(rcu_dereference_raw(table->mask_array), rcu);
> > > +   tbl_mask_array_destroy(table);
> > > table_instance_destroy(ti, ufid_ti, false);
> > >  }
> >
> > This should not be required. mask is linked to a flow and gets
> > released when flow is removed.
> > Does the memory leak occur when OVS module is abruptly unloaded and
> > userspace does not cleanup flow table?
> When we destroy the ovs datapath or net namespace is destroyed , the
> mask memory will be happened. The call tree:
> ovs_exit_net/ovs_dp_cmd_del
> -->__dp_destroy
> -->destroy_dp_rcu
> -->ovs_flow_tbl_destroy
> -->table_instance_destroy (which don't release the mask memory because
> don't call the ovs_flow_tbl_remove /flow_mask_remove directly or
> indirectly).
>
That's what I suggested earlier: we need to call a function similar to
ovs_flow_tbl_remove(), and we could refactor the code to reuse it.
This is better because introducing tbl_mask_array_destroy() creates a
dangling pointer to the mask in the sw_flow object. OVS is anyway
iterating the entire flow table to release each sw_flow in
table_instance_destroy(), so it is natural to release the mask at that
point, after releasing the corresponding sw_flow.



> but one thing, when we flush the flow, we don't flush the mask flow.(
> If necessary, one patch should be sent)
>
> > In that case better fix could be calling ovs_flow_tbl_remove()
> > equivalent from table_instance_destroy when it is iterating flow
> > table.
> I think operation of  the flow mask and flow table should use
> different API, for example:
> for flow mask, we use the:
> -tbl_mask_array_add_mask
> -tbl_mask_array_del_mask
> -tbl_mask_array_alloc
> -tbl_mask_array_realloc
> -tbl_mask_array_destroy(this patch introduce.)
>
> table instance:
> -table_instance_alloc
> -table_instance_destroy
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v4 08/10] net: openvswitch: fix possible memleak on destroy flow-table

2019-10-17 Thread Pravin Shelar
On Wed, Oct 16, 2019 at 5:50 AM  wrote:
>
> From: Tonghao Zhang 
>
> When we destroy the flow tables which may contain the flow_mask,
> so release the flow mask struct.
>
> Signed-off-by: Tonghao Zhang 
> Tested-by: Greg Rose 
> ---
>  net/openvswitch/flow_table.c | 14 +-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
> index 5df5182..d5d768e 100644
> --- a/net/openvswitch/flow_table.c
> +++ b/net/openvswitch/flow_table.c
> @@ -295,6 +295,18 @@ static void table_instance_destroy(struct table_instance 
> *ti,
> }
>  }
>
> +static void tbl_mask_array_destroy(struct flow_table *tbl)
> +{
> +   struct mask_array *ma = ovsl_dereference(tbl->mask_array);
> +   int i;
> +
> +   /* Free the flow-mask and kfree_rcu the NULL is allowed. */
> +   for (i = 0; i < ma->max; i++)
> +   kfree_rcu(rcu_dereference_raw(ma->masks[i]), rcu);
> +
> +   kfree_rcu(rcu_dereference_raw(tbl->mask_array), rcu);
> +}
> +
>  /* No need for locking this function is called from RCU callback or
>   * error path.
>   */
> @@ -304,7 +316,7 @@ void ovs_flow_tbl_destroy(struct flow_table *table)
> struct table_instance *ufid_ti = rcu_dereference_raw(table->ufid_ti);
>
> free_percpu(table->mask_cache);
> -   kfree_rcu(rcu_dereference_raw(table->mask_array), rcu);
> +   tbl_mask_array_destroy(table);
> table_instance_destroy(ti, ufid_ti, false);
>  }

This should not be required. A mask is linked to a flow and gets
released when the flow is removed.
Does the memory leak occur when the OVS module is abruptly unloaded and
userspace does not clean up the flow table?
In that case a better fix could be calling an ovs_flow_tbl_remove()
equivalent from table_instance_destroy() while it is iterating the flow
table.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
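
The direction this thread converges on can be summarized in one helper: do the per-flow unlinking from both hash tables plus the mask refcount drop in a single place, and have table_instance_destroy() call it for every flow, so masks cannot leak when the datapath or its namespace is destroyed. A condensed sketch assembled from the hunks quoted above (the helper ended up being named table_instance_flow_free()):

static void table_instance_flow_free(struct flow_table *table,
                                     struct table_instance *ti,
                                     struct table_instance *ufid_ti,
                                     struct sw_flow *flow)
{
        hlist_del_rcu(&flow->flow_table.node[ti->node_ver]);
        table->count--;

        if (ovs_identifier_is_ufid(&flow->id)) {
                hlist_del_rcu(&flow->ufid_table.node[ufid_ti->node_ver]);
                table->ufid_count--;
        }

        /* Drops the mask once its ref_count reaches zero, so destroying the
         * whole table frees every mask along with its last flow. */
        flow_mask_remove(table, flow->mask);
}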


Re: [ovs-dev] [PATCH] net: openvswitch: free vport unless register_netdevice() succeeds

2019-08-10 Thread Pravin Shelar
On Thu, Aug 8, 2019 at 8:55 PM Hillf Danton  wrote:
>
>
> syzbot found the following crash on:
>
> HEAD commit:1e78030e Merge tag 'mmc-v5.3-rc1' of git://git.kernel.org/..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=148d3d1a60
> kernel config:  https://syzkaller.appspot.com/x/.config?x=30cef20daf3e9977
> dashboard link: https://syzkaller.appspot.com/bug?extid=13210896153522fe1ee5
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=136aa8c460
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=109ba79260
>
> =
> BUG: memory leak
> unreferenced object 0x8881207e4100 (size 128):
>comm "syz-executor032", pid 7014, jiffies 4294944027 (age 13.830s)
>hex dump (first 32 bytes):
>  00 70 16 18 81 88 ff ff 80 af 8c 22 81 88 ff ff  .p."
>  00 b6 23 17 81 88 ff ff 00 00 00 00 00 00 00 00  ..#.
>backtrace:
>  [<0eb78212>] kmemleak_alloc_recursive  
> include/linux/kmemleak.h:43 [inline]
>  [<0eb78212>] slab_post_alloc_hook mm/slab.h:522 [inline]
>  [<0eb78212>] slab_alloc mm/slab.c:3319 [inline]
>  [<0eb78212>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3548
>  [<006ea6c6>] kmalloc include/linux/slab.h:552 [inline]
>  [<006ea6c6>] kzalloc include/linux/slab.h:748 [inline]
>  [<006ea6c6>] ovs_vport_alloc+0x37/0xf0  
> net/openvswitch/vport.c:130
>  [] internal_dev_create+0x24/0x1d0  
> net/openvswitch/vport-internal_dev.c:164
>  [<56ee7c13>] ovs_vport_add+0x81/0x190  
> net/openvswitch/vport.c:199
>  [<5434efc7>] new_vport+0x19/0x80 net/openvswitch/datapath.c:194
>  [] ovs_dp_cmd_new+0x22f/0x410  
> net/openvswitch/datapath.c:1614
>  [] genl_family_rcv_msg+0x2ab/0x5b0  
> net/netlink/genetlink.c:629
>  [] genl_rcv_msg+0x54/0x9c net/netlink/genetlink.c:654
>  [<6694b647>] netlink_rcv_skb+0x61/0x170  
> net/netlink/af_netlink.c:2477
>  [<88381f37>] genl_rcv+0x29/0x40 net/netlink/genetlink.c:665
>  [] netlink_unicast_kernel  
> net/netlink/af_netlink.c:1302 [inline]
>  [] netlink_unicast+0x1ec/0x2d0  
> net/netlink/af_netlink.c:1328
>  [<67e6b079>] netlink_sendmsg+0x270/0x480  
> net/netlink/af_netlink.c:1917
>  [] sock_sendmsg_nosec net/socket.c:637 [inline]
>  [] sock_sendmsg+0x54/0x70 net/socket.c:657
>  [<4cb7c11d>] ___sys_sendmsg+0x393/0x3c0 net/socket.c:2311
>  [] __sys_sendmsg+0x80/0xf0 net/socket.c:2356
>  [] __do_sys_sendmsg net/socket.c:2365 [inline]
>  [] __se_sys_sendmsg net/socket.c:2363 [inline]
>  [] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2363
>
> BUG: memory leak
> unreferenced object 0x88811723b600 (size 64):
>comm "syz-executor032", pid 7014, jiffies 4294944027 (age 13.830s)
>hex dump (first 32 bytes):
>  01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00  
>  00 00 00 00 00 00 00 00 02 00 00 00 05 35 82 c1  .5..
>backtrace:
>  [<352f46d8>] kmemleak_alloc_recursive  
> include/linux/kmemleak.h:43 [inline]
>  [<352f46d8>] slab_post_alloc_hook mm/slab.h:522 [inline]
>  [<352f46d8>] slab_alloc mm/slab.c:3319 [inline]
>  [<352f46d8>] __do_kmalloc mm/slab.c:3653 [inline]
>  [<352f46d8>] __kmalloc+0x169/0x300 mm/slab.c:3664
>  [<8e48f3d1>] kmalloc include/linux/slab.h:557 [inline]
>  [<8e48f3d1>] ovs_vport_set_upcall_portids+0x54/0xd0  
> net/openvswitch/vport.c:343
>  [<541e4f4a>] ovs_vport_alloc+0x7f/0xf0  
> net/openvswitch/vport.c:139
>  [] internal_dev_create+0x24/0x1d0  
> net/openvswitch/vport-internal_dev.c:164
>  [<56ee7c13>] ovs_vport_add+0x81/0x190  
> net/openvswitch/vport.c:199
>  [<5434efc7>] new_vport+0x19/0x80 net/openvswitch/datapath.c:194
>  [] ovs_dp_cmd_new+0x22f/0x410  
> net/openvswitch/datapath.c:1614
>  [] genl_family_rcv_msg+0x2ab/0x5b0  
> net/netlink/genetlink.c:629
>  [] genl_rcv_msg+0x54/0x9c net/netlink/genetlink.c:654
>  [<6694b647>] netlink_rcv_skb+0x61/0x170  
> net/netlink/af_netlink.c:2477
>  [<88381f37>] genl_rcv+0x29/0x40 net/netlink/genetlink.c:665
>  [] netlink_unicast_kernel  
> net/netlink/af_netlink.c:1302 [inline]
>  [] netlink_unicast+0x1ec/0x2d0  
> net/netlink/af_netlink.c:1328
>  [<67e6b079>] netlink_sendmsg+0x270/0x480  
> net/netlink/af_netlink.c:1917
>  

Re: [ovs-dev] memory leak in internal_dev_create

2019-08-07 Thread Pravin Shelar
On Tue, Aug 6, 2019 at 5:00 AM Hillf Danton  wrote:
>
>
> On Tue, 06 Aug 2019 01:58:05 -0700
> > Hello,
> >
> > syzbot found the following crash on:
> >

...
> > BUG: memory leak
> > unreferenced object 0x8881228ca500 (size 128):
> >comm "syz-executor032", pid 7015, jiffies 4294944622 (age 7.880s)
> >hex dump (first 32 bytes):
> >  00 f0 27 18 81 88 ff ff 80 ac 8c 22 81 88 ff ff  ..'"
> >  40 b7 23 17 81 88 ff ff 00 00 00 00 00 00 00 00  @.#.
> >backtrace:
> >  [<0eb78212>] kmemleak_alloc_recursive  
> > include/linux/kmemleak.h:43 [inline]
> >  [<0eb78212>] slab_post_alloc_hook mm/slab.h:522 [inline]
> >  [<0eb78212>] slab_alloc mm/slab.c:3319 [inline]
> >  [<0eb78212>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3548
> >  [<006ea6c6>] kmalloc include/linux/slab.h:552 [inline]
> >  [<006ea6c6>] kzalloc include/linux/slab.h:748 [inline]
> >  [<006ea6c6>] ovs_vport_alloc+0x37/0xf0  
> > net/openvswitch/vport.c:130
> >  [] internal_dev_create+0x24/0x1d0  
> > net/openvswitch/vport-internal_dev.c:164
> >  [<56ee7c13>] ovs_vport_add+0x81/0x190  
> > net/openvswitch/vport.c:199
> >  [<5434efc7>] new_vport+0x19/0x80 net/openvswitch/datapath.c:194
> >  [] ovs_dp_cmd_new+0x22f/0x410  
> > net/openvswitch/datapath.c:1614
> >  [] genl_family_rcv_msg+0x2ab/0x5b0  
> > net/netlink/genetlink.c:629
> >  [] genl_rcv_msg+0x54/0x9c net/netlink/genetlink.c:654
> >  [<6694b647>] netlink_rcv_skb+0x61/0x170  
> > net/netlink/af_netlink.c:2477
> >  [<88381f37>] genl_rcv+0x29/0x40 net/netlink/genetlink.c:665
> >  [] netlink_unicast_kernel  
> > net/netlink/af_netlink.c:1302 [inline]
> >  [] netlink_unicast+0x1ec/0x2d0  
> > net/netlink/af_netlink.c:1328
> >  [<67e6b079>] netlink_sendmsg+0x270/0x480  
> > net/netlink/af_netlink.c:1917
> >  [] sock_sendmsg_nosec net/socket.c:637 [inline]
> >  [] sock_sendmsg+0x54/0x70 net/socket.c:657
> >  [<4cb7c11d>] ___sys_sendmsg+0x393/0x3c0 net/socket.c:2311
> >  [] __sys_sendmsg+0x80/0xf0 net/socket.c:2356
> >  [] __do_sys_sendmsg net/socket.c:2365 [inline]
> >  [] __se_sys_sendmsg net/socket.c:2363 [inline]
> >  [] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2363
>
>
> Always free vport manually unless register_netdevice() succeeds.
>
> --- a/net/openvswitch/vport-internal_dev.c
> +++ b/net/openvswitch/vport-internal_dev.c
> @@ -137,7 +137,7 @@ static void do_setup(struct net_device *
> netdev->priv_flags |= IFF_LIVE_ADDR_CHANGE | IFF_OPENVSWITCH |
>   IFF_NO_QUEUE;
> netdev->needs_free_netdev = true;
> -   netdev->priv_destructor = internal_dev_destructor;
> +   netdev->priv_destructor = NULL;
> netdev->ethtool_ops = &internal_dev_ethtool_ops;
> netdev->rtnl_link_ops = &internal_dev_link_ops;
>
> @@ -159,7 +159,6 @@ static struct vport *internal_dev_create
> struct internal_dev *internal_dev;
> struct net_device *dev;
> int err;
> -   bool free_vport = true;
>
> vport = ovs_vport_alloc(0, _internal_vport_ops, parms);
> if (IS_ERR(vport)) {
> @@ -190,10 +189,9 @@ static struct vport *internal_dev_create
>
> rtnl_lock();
> err = register_netdevice(vport->dev);
> -   if (err) {
> -   free_vport = false;
> +   if (err)
> goto error_unlock;
> -   }
> +   vport->dev->priv_destructor = internal_dev_destructor;
>
I am not sure why you have moved this assignment out of do_setup().

Otherwise the patch looks good to me.

Thanks.
> dev_set_promiscuity(vport->dev, 1);
> rtnl_unlock();
> @@ -207,8 +205,7 @@ error_unlock:
>  error_free_netdev:
> free_netdev(dev);
>  error_free_vport:
> -   if (free_vport)
> -   ovs_vport_free(vport);
> +   ovs_vport_free(vport);
>  error:
> return ERR_PTR(err);
>  }
> --
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
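
The ownership rule the fix settles on is narrow: the vport is freed explicitly on every error path, and the netdev only takes over that job (via priv_destructor) once register_netdevice() has succeeded, so there is neither a leak nor a double free. A condensed sketch of the relevant part of internal_dev_create(), based on the quoted diff (error labels are collapsed here for brevity):

        rtnl_lock();
        err = register_netdevice(vport->dev);
        if (err)
                goto error_unlock;              /* vport still ours: freed below */

        /* only after a successful register does the netdev own the vport */
        vport->dev->priv_destructor = internal_dev_destructor;

        dev_set_promiscuity(vport->dev, 1);
        rtnl_unlock();
        return vport;

error_unlock:
        rtnl_unlock();
        free_netdev(dev);                       /* needs_free_netdev path */
        ovs_vport_free(vport);                  /* reached only before the handover */
        return ERR_PTR(err);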


Re: [ovs-dev] [PATCH net-next] net: openvswitch: do not update max_headroom if new headroom is equal to old headroom

2019-07-15 Thread Pravin Shelar
On Fri, Jul 5, 2019 at 9:08 AM Taehee Yoo  wrote:
>
> When a vport is deleted, the maximum headroom size would be changed.
> If the vport which has the largest headroom is deleted,
> the new max_headroom would be set.
> But, if the new headroom size is equal to the old headroom size,
> updating routine is unnecessary.
>
> Signed-off-by: Taehee Yoo 

Sorry for the late reply. This patch looks good to me too.

I am curious about the reason to avoid adjusting the headroom. Why are you
trying to avoid unnecessary changes to the headroom?

Thanks,
Pravin.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next] net: openvswitch: do not update max_headroom if new headroom is equal to old headroom

2019-07-11 Thread Pravin Shelar
I was a bit busy for the last couple of days. I will finish the review by EOD today.

Thanks,
Pravin.

On Mon, Jul 8, 2019 at 4:22 PM Gregory Rose  wrote:
>
>
>
> On 7/8/2019 4:18 PM, Gregory Rose wrote:
> > On 7/8/2019 4:08 PM, David Miller wrote:
> >> From: Taehee Yoo 
> >> Date: Sat,  6 Jul 2019 01:08:09 +0900
> >>
> >>> When a vport is deleted, the maximum headroom size would be changed.
> >>> If the vport which has the largest headroom is deleted,
> >>> the new max_headroom would be set.
> >>> But, if the new headroom size is equal to the old headroom size,
> >>> updating routine is unnecessary.
> >>>
> >>> Signed-off-by: Taehee Yoo 
> >> I'm not so sure about the logic here and I'd therefore like an OVS
> >> expert
> >> to review this.
> >
> > I'll review and test it and get back.  Pravin may have input as well.
> >
>
> Err, adding Pravin.
>
> - Greg
>
> > Thanks,
> >
> > - Greg
> >
> >> Thanks.
> >> ___
> >> dev mailing list
> >> d...@openvswitch.org
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
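
The optimization under review is essentially an early return: recompute the largest headroom over the remaining vports after a delete, and skip the per-vport update walk when the maximum did not actually change. A sketch with illustrative helper names (not necessarily the ones used in the patch):

static void ovs_update_headroom(struct datapath *dp)
{
        unsigned int new_headroom = ovs_get_max_headroom(dp);  /* scan vports */

        if (new_headroom == dp->max_headroom)
                return;                 /* unchanged: nothing to propagate */

        dp->max_headroom = new_headroom;
        ovs_set_vport_headrooms(dp, new_headroom);  /* netdev_set_rx_headroom() walk */
}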


Re: [ovs-dev] [PATCH net-next v5] openvswitch: Make metadata_dst tunnel work in IP_TUNNEL_INFO_BRIDGE mode

2019-03-29 Thread Pravin Shelar
On Wed, Mar 27, 2019 at 9:43 PM  wrote:
>
> From: wenxu 
>
> There is currently no support for the multicast/broadcast aspects
> of VXLAN in ovs. In the datapath flow the tun_dst must specific.
> But in the IP_TUNNEL_INFO_BRIDGE mode the tun_dst can not be specific.
> And the packet can forward through the fdb table of vxlan devcice. In
> this mode the broadcast/multicast packet can be sent through the
> following ways in ovs.
>
> ovs-vsctl add-port br0 vxlan -- set in vxlan type=vxlan \
> options:key=1000 options:remote_ip=flow
> ovs-ofctl add-flow br0 in_port=LOCAL,dl_dst=ff:ff:ff:ff:ff:ff, \
> action=output:vxlan
>
> bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.1 \
> src_vni 1000 vni 1000 self
> bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.2 \
> src_vni 1000 vni 1000 self
>
> Signed-off-by: wenxu 
> ---
>  include/uapi/linux/openvswitch.h |  1 +
>  net/openvswitch/flow_netlink.c   | 46 
> +++-
>  2 files changed, 37 insertions(+), 10 deletions(-)
>

Looks good.
Acked-by: Pravin B Shelar 

Thanks,
Pravin.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] openvswitch: fix flow actions reallocation

2019-03-28 Thread Pravin Shelar
On Wed, Mar 27, 2019 at 11:36 PM Andrea Righi
 wrote:
>
> The flow action buffer can be resized if it's not big enough to contain
> all the requested flow actions. However, this resize doesn't take into
> account the new requested size, the buffer is only increased by a factor
> of 2x. This might be not enough to contain the new data, causing a
> buffer overflow, for example:
>
> [   42.044472] 
> =
> [   42.045608] BUG kmalloc-96 (Not tainted): Redzone overwritten
> [   42.046415] 
> -
>
> [   42.047715] Disabling lock debugging due to kernel taint
> [   42.047716] INFO: 0x8bf2c4a5-0x720c0928. First byte 0x0 instead of 0xcc
> [   42.048677] INFO: Slab 0xbc6d2040 objects=29 used=18 fp=0xdc07dec4 
> flags=0x2808101
> [   42.049743] INFO: Object 0xd53a3464 @offset=2528 fp=0xccdcdebb
>
> [   42.050747] Redzone 76f1b237: cc cc cc cc cc cc cc cc  
> 
> [   42.051839] Object d53a3464: 6b 6b 6b 6b 6b 6b 6b 6b 0c 00 00 00 6c 00 00 
> 00  l...
> [   42.053015] Object f49a30cc: 6c 00 0c 00 00 00 00 00 00 00 00 03 78 a3 15 
> f6  l...x...
> [   42.054203] Object acfe4220: 20 00 02 00 ff ff ff ff 00 00 00 00 00 00 00 
> 00   ...
> [   42.055370] Object 21024e91: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 00  
> [   42.056541] Object 070e04c3: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 00  
> [   42.057797] Object 948a777a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 00  
> [   42.059061] Redzone 8bf2c4a5: 00 00 00 00  
> 
> [   42.060189] Padding a681b46e: 5a 5a 5a 5a 5a 5a 5a 5a  
> 
>
> Fix by making sure the new buffer is properly resized to contain all the
> requested data.
>
> BugLink: https://bugs.launchpad.net/bugs/1813244
> Signed-off-by: Andrea Righi 
> ---
> Changes in v2:
>  - correctly resize to current_size+req_size (thanks to Pravin)
>
>  net/openvswitch/flow_netlink.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index 691da853bef5..4bdf5e3ac208 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -2306,14 +2306,14 @@ static struct nlattr *reserve_sfa_size(struct 
> sw_flow_actions **sfa,
>
> struct sw_flow_actions *acts;
> int new_acts_size;
> -   int req_size = NLA_ALIGN(attr_len);
> +   size_t req_size = NLA_ALIGN(attr_len);
> int next_offset = offsetof(struct sw_flow_actions, actions) +
> (*sfa)->actions_len;
>
> if (req_size <= (ksize(*sfa) - next_offset))
> goto out;
>
> -   new_acts_size = ksize(*sfa) * 2;
> +   new_acts_size = max(next_offset + req_size, ksize(*sfa) * 2);
>
> if (new_acts_size > MAX_ACTIONS_BUFSIZE) {
> if ((MAX_ACTIONS_BUFSIZE - next_offset) < req_size) {
> --

Looks good.
Acked-by: Pravin B Shelar 

Thanks,
Pravin.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
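
A concrete (illustrative) failure case shows why the max() is needed: suppose ksize(*sfa) is 96 bytes, the actions already stored make next_offset 88, and the new action needs req_size = 120. Doubling alone grows the buffer to 192 bytes, but 88 + 120 = 208 bytes are required, so the copy overruns the allocation, which is exactly the kmalloc-96 redzone overwrite in the report. With new_acts_size = max(next_offset + req_size, ksize(*sfa) * 2) the buffer grows to at least 208 bytes and the write fits.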


Re: [ovs-dev] [PATCH net-next v4] openvswitch: Make metadata_dst tunnel work in IP_TUNNEL_INFO_BRIDGE mode

2019-03-27 Thread Pravin Shelar
On Tue, Mar 26, 2019 at 8:16 PM  wrote:
>
> From: wenxu 
>
> There is currently no support for the multicast/broadcast aspects
> of VXLAN in ovs. In the datapath flow the tun_dst must specific.
> But in the IP_TUNNEL_INFO_BRIDGE mode the tun_dst can not be specific.
> And the packet can forward through the fdb table of vxlan devcice. In
> this mode the broadcast/multicast packet can be sent through the
> following ways in ovs.
>
> ovs-vsctl add-port br0 vxlan -- set in vxlan type=vxlan \
> options:key=1000 options:remote_ip=flow
> ovs-ofctl add-flow br0 in_port=LOCAL,dl_dst=ff:ff:ff:ff:ff:ff, \
> action=output:vxlan
>
> bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.1 \
> src_vni 1000 vni 1000 self
> bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.2 \
> src_vni 1000 vni 1000 self
>
> Signed-off-by: wenxu 
> ---
>  include/uapi/linux/openvswitch.h |  1 +
>  net/openvswitch/flow_netlink.c   | 36 
>  2 files changed, 33 insertions(+), 4 deletions(-)
>
> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index dbe0cbe..b8913b9 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -364,6 +364,7 @@ enum ovs_tunnel_key_attr {
> OVS_TUNNEL_KEY_ATTR_IPV6_DST,   /* struct in6_addr dst IPv6 
> address. */
> OVS_TUNNEL_KEY_ATTR_PAD,
> OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS,/* struct erspan_metadata */
> +   OVS_TUNNEL_KEY_ATTR_IPV4_INFO_BRIDGE,   /* No argument. 
> IPV4_INFO_BRIDGE mode.*/
> __OVS_TUNNEL_KEY_ATTR_MAX
>  };
>
> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index 691da85..edd7b41 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
...

> @@ -912,6 +931,13 @@ static int __ip_tun_to_nlattr(struct sk_buff *skb,
> return -EMSGSIZE;
> }
>
> +   if (tun_proto == AF_INET && !(output->tun_flags & ~TUNNEL_KEY) &&
> +   !output->u.ipv4.src && !output->u.ipv4.dst &&
> +   !output->tp_src && !output->tp_dst &&
> +   !output->ttl && !output->tos &&
> +   nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_IPV4_INFO_BRIDGE))
> +   return -EMSGSIZE;
> +
It would be easier to just check IP_TUNNEL_INFO_BRIDGE in the tunnel-info.
We can also skip serializing all parameters except the VNI in this case.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
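
What the review comment asks for can be sketched in a few lines: key the netlink encoding off the tunnel-info mode flag instead of testing every field for zero, and emit only the VNI alongside the new flag. Treat this as a hedged illustration of the suggestion, not the final upstream code; in particular the explicit 'mode' parameter is an assumption, since the bridge bit lives in struct ip_tunnel_info rather than in the ip_tunnel_key that __ip_tun_to_nlattr() receives.

        if (mode & IP_TUNNEL_INFO_BRIDGE) {
                if (nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_IPV4_INFO_BRIDGE))
                        return -EMSGSIZE;
                /* in bridge mode only the key/VNI is meaningful */
                if ((output->tun_flags & TUNNEL_KEY) &&
                    nla_put_be64(skb, OVS_TUNNEL_KEY_ATTR_ID, output->tun_id,
                                 OVS_TUNNEL_KEY_ATTR_PAD))
                        return -EMSGSIZE;
                return 0;
        }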


Re: [ovs-dev] [PATCH] openvswitch: fix flow actions reallocation

2019-03-27 Thread Pravin Shelar
On Wed, Mar 27, 2019 at 3:11 PM Andrea Righi  wrote:
>
> The flow action buffer can be resized if it's not big enough to contain
> all the requested flow actions. However, this resize doesn't take into
> account the new requested size, the buffer is only increased by a factor
> of 2x. This might be not enough to contain the new data, causing a
> buffer overflow, for example:
>
> [   42.044472] 
> =
> [   42.045608] BUG kmalloc-96 (Not tainted): Redzone overwritten
> [   42.046415] 
> -
>
> [   42.047715] Disabling lock debugging due to kernel taint
> [   42.047716] INFO: 0x8bf2c4a5-0x720c0928. First byte 0x0 instead of 0xcc
> [   42.048677] INFO: Slab 0xbc6d2040 objects=29 used=18 fp=0xdc07dec4 
> flags=0x2808101
> [   42.049743] INFO: Object 0xd53a3464 @offset=2528 fp=0xccdcdebb
>
> [   42.050747] Redzone 76f1b237: cc cc cc cc cc cc cc cc  
> 
> [   42.051839] Object d53a3464: 6b 6b 6b 6b 6b 6b 6b 6b 0c 00 00 00 6c 00 00 
> 00  l...
> [   42.053015] Object f49a30cc: 6c 00 0c 00 00 00 00 00 00 00 00 03 78 a3 15 
> f6  l...x...
> [   42.054203] Object acfe4220: 20 00 02 00 ff ff ff ff 00 00 00 00 00 00 00 
> 00   ...
> [   42.055370] Object 21024e91: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 00  
> [   42.056541] Object 070e04c3: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 00  
> [   42.057797] Object 948a777a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
> 00  
> [   42.059061] Redzone 8bf2c4a5: 00 00 00 00  
> 
> [   42.060189] Padding a681b46e: 5a 5a 5a 5a 5a 5a 5a 5a  
> 
>
> Fix by making sure the new buffer is properly resized to contain all the
> requested data.
>
> BugLink: https://bugs.launchpad.net/bugs/1813244
> Signed-off-by: Andrea Righi 

This must be a rare combination of actions that triggers this case.

> ---
>  net/openvswitch/flow_netlink.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index 691da853bef5..e6f789badaa3 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -2306,14 +2306,14 @@ static struct nlattr *reserve_sfa_size(struct 
> sw_flow_actions **sfa,
>
> struct sw_flow_actions *acts;
> int new_acts_size;
> -   int req_size = NLA_ALIGN(attr_len);
> +   size_t req_size = NLA_ALIGN(attr_len);
> int next_offset = offsetof(struct sw_flow_actions, actions) +
> (*sfa)->actions_len;
>
> if (req_size <= (ksize(*sfa) - next_offset))
> goto out;
>
> -   new_acts_size = ksize(*sfa) * 2;
> +   new_acts_size = max(req_size, ksize(*sfa) * 2);
>
Don't we need to compare current_action_size + req_size with
current_action_size * 2 here?
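
Presumably something along these lines captures the sizing being asked about (a sketch only; the helper name is invented here, and the exact form is up to the patch author):

#include <linux/kernel.h>

/* Sketch: grow the buffer to at least what this attribute needs, while
 * keeping the usual doubling behaviour when that is already enough. */
static size_t sfa_grow_size(size_t cur_ksize, size_t next_offset, size_t req_size)
{
        return max(next_offset + req_size, cur_ksize * 2);
}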

> if (new_acts_size > MAX_ACTIONS_BUFSIZE) {
> if ((MAX_ACTIONS_BUFSIZE - next_offset) < req_size) {
> --
> 2.19.1
>


Re: [ovs-dev] [PATCH v2 net-next] net: openvswitch: Add a new action check_pkt_len

2019-03-26 Thread Pravin Shelar
On Mon, Mar 25, 2019 at 5:44 PM  wrote:
>
> From: Numan Siddique 
>
> This patch adds a new action - 'check_pkt_len' which checks the
> packet length and executes a set of actions if the packet
> length is greater than the specified length or executes
> another set of actions if the packet length is lesser or equal to.
>
> This action takes below nlattrs
>   * OVS_CHECK_PKT_LEN_ATTR_PKT_LEN - 'pkt_len' to check for
>
>   * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER - Nested actions
> to apply if the packet length is greater than the specified 'pkt_len'
>
>   * OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL - Nested
> actions to apply if the packet length is lesser or equal to the
> specified 'pkt_len'.
>
> The main use case for adding this action is to solve the packet
> drops because of MTU mismatch in OVN virtual networking solution.
> When a VM (which belongs to a logical switch of OVN) sends a packet
> destined to go via the gateway router and if the nic which provides
> external connectivity, has a lesser MTU, OVS drops the packet
> if the packet length is greater than this MTU.
>
> With the help of this action, OVN will check the packet length
> and if it is greater than the MTU size, it will generate an
> ICMP packet (type 3, code 4) and includes the next hop mtu in it
> so that the sender can fragment the packets.
>
> Reported-at:
> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-July/047039.html
> Suggested-by: Ben Pfaff 
> Signed-off-by: Numan Siddique 
> CC: Gregory Rose 
> CC: Pravin B Shelar 
> ---
> v1 -> v2
> -
>* Addressed the review comments.
>  - Removed the vlan-tag length when checking the packet length
>  - Reordered the netlink attributes
>  - Changed the comments to use 'attribute' instead of 'action'
>
> Corresponding OVS patch (submitted as RFC for now)  which makes use of this 
> action can be
> found here - https://patchwork.ozlabs.org/patch/1059081/
>
Looks good to me.
Acked-by: Pravin B Shelar 
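
For readers unfamiliar with the new action, here is a rough sketch of the semantics described in the commit message above (this is not the kernel's actual implementation; the helper name is invented, and only the OVS_CHECK_PKT_LEN_ATTR_* attributes come from the patch):

#include <linux/skbuff.h>
#include <net/netlink.h>
#include <linux/openvswitch.h>

/* Pick which nested action list applies based on the packet length. */
static const struct nlattr *
pick_check_pkt_len_actions(const struct sk_buff *skb, const struct nlattr *attr)
{
        const struct nlattr *greater = NULL, *lesser_eq = NULL;
        const struct nlattr *a;
        u32 pkt_len = 0;
        int rem;

        nla_for_each_nested(a, attr, rem) {
                switch (nla_type(a)) {
                case OVS_CHECK_PKT_LEN_ATTR_PKT_LEN:
                        pkt_len = nla_get_u32(a);
                        break;
                case OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_GREATER:
                        greater = a;
                        break;
                case OVS_CHECK_PKT_LEN_ATTR_ACTIONS_IF_LESS_EQUAL:
                        lesser_eq = a;
                        break;
                }
        }

        if (!pkt_len)
                return NULL;    /* malformed action */

        return skb->len > pkt_len ? greater : lesser_eq;
}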


Re: [ovs-dev] [PATCH net-next v2] openvswitch: add seqadj extension when NAT is used.

2019-03-26 Thread Pravin Shelar
On Mon, Mar 25, 2019 at 11:58 AM Flavio Leitner  wrote:
>
> When the conntrack is initialized, there is no helper attached
> yet so the nat info initialization (nf_nat_setup_info) skips
> adding the seqadj ext.
>
> A helper is attached later when the conntrack is not confirmed
> but is going to be committed. In this case, if NAT is needed then
> adds the seqadj ext as well.
>
> Fixes: 16ec3d4fbb96 ("openvswitch: Fix cached ct with helper.")
> Signed-off-by: Flavio Leitner 
> ---
>  net/openvswitch/conntrack.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> Changelog:
> v2 - removed nfct_help(ct) check as it is not necessary.
>
> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index 51080004677e..845b83598e0d 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
> @@ -990,6 +990,12 @@ static int __ovs_ct_lookup(struct net *net, struct 
> sw_flow_key *key,
> GFP_ATOMIC);
> if (err)
> return err;
> +
> +   /* helper installed, add seqadj if NAT is required */
> +   if (info->nat && !nfct_seqadj(ct)) {
> +   if (!nfct_seqadj_ext_add(ct))
> +   return -EINVAL;
> +   }
> }
>
> /* Call the helper only if:

Acked-by: Pravin B Shelar 

Thanks,
Pravin.


Re: [ovs-dev] [PATCH net-next v3] openvswitch: Make metadata_dst tunnel work in IP_TUNNEL_INFO_BRIDGE mode

2019-03-26 Thread Pravin Shelar
On Mon, Mar 25, 2019 at 8:46 PM  wrote:
>
> From: wenxu 
>
> There is currently no support for the multicast/broadcast aspects
> of VXLAN in ovs. In the datapath flow the tun_dst must be specified.
> But in the IP_TUNNEL_INFO_BRIDGE mode the tun_dst cannot be specified,
> and the packet can be forwarded through the fdb table of the vxlan device. In
> this mode the broadcast/multicast packet can be sent in the
> following ways in ovs.
>
> ovs-vsctl add-port br0 vxlan -- set in vxlan type=vxlan \
> options:key=1000 options:remote_ip=flow
> ovs-ofctl add-flow br0 in_port=LOCAL,dl_dst=ff:ff:ff:ff:ff:ff, \
> action=output:vxlan
>
> bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.1 \
> src_vni 1000 vni 1000 self
> bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.2 \
> src_vni 1000 vni 1000 self
>
> Signed-off-by: wenxu 
> ---
>  include/uapi/linux/openvswitch.h |  1 +
>  net/openvswitch/flow_netlink.c   | 29 +
>  2 files changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index dbe0cbe..b8913b9 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -364,6 +364,7 @@ enum ovs_tunnel_key_attr {
> OVS_TUNNEL_KEY_ATTR_IPV6_DST,   /* struct in6_addr dst IPv6 
> address. */
> OVS_TUNNEL_KEY_ATTR_PAD,
> OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS,/* struct erspan_metadata */
> +   OVS_TUNNEL_KEY_ATTR_IPV4_INFO_BRIDGE,   /* No argument. 
> IPV4_INFO_BRIDGE mode.*/
> __OVS_TUNNEL_KEY_ATTR_MAX
>  };
>
> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index 691da85..ed12b3f 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -403,6 +403,7 @@ size_t ovs_key_attr_size(void)
> [OVS_TUNNEL_KEY_ATTR_IPV6_SRC]  = { .len = sizeof(struct 
> in6_addr) },
> [OVS_TUNNEL_KEY_ATTR_IPV6_DST]  = { .len = sizeof(struct 
> in6_addr) },
> [OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS]   = { .len = OVS_ATTR_VARIABLE },
> +   [OVS_TUNNEL_KEY_ATTR_IPV4_INFO_BRIDGE]   = { .len = 0 },
>  };
>
>  static const struct ovs_len_tbl
> @@ -666,6 +667,7 @@ static int ip_tun_from_nlattr(const struct nlattr *attr,
>   bool log)
>  {
> bool ttl = false, ipv4 = false, ipv6 = false;
> +   bool info_bridge_mode = false;
> __be16 tun_flags = 0;
> int opts_type = 0;
> struct nlattr *a;
> @@ -782,6 +784,10 @@ static int ip_tun_from_nlattr(const struct nlattr *attr,
> tun_flags |= TUNNEL_ERSPAN_OPT;
> opts_type = type;
> break;
> +   case OVS_TUNNEL_KEY_ATTR_IPV4_INFO_BRIDGE:
> +   info_bridge_mode = true;
> +   ipv4 = true;
> +   break;
> default:
> OVS_NLERR(log, "Unknown IP tunnel attribute %d",
>   type);
> @@ -812,16 +818,29 @@ static int ip_tun_from_nlattr(const struct nlattr *attr,
> OVS_NLERR(log, "IP tunnel dst address not specified");
> return -EINVAL;
> }
> -   if (ipv4 && !match->key->tun_key.u.ipv4.dst) {
> -   OVS_NLERR(log, "IPv4 tunnel dst address is zero");
> -   return -EINVAL;
> +   if (ipv4) {
> +   if (info_bridge_mode) {
> +   if (match->key->tun_key.u.ipv4.src ||
> +   match->key->tun_key.u.ipv4.dst ||
> +   match->key->tun_key.tp_src ||
> +   match->key->tun_key.tp_dst ||
> +   match->key->tun_key.ttl ||
> +   match->key->tun_key.tos ||
> +   tun_flags & ~TUNNEL_KEY) {
> +   OVS_NLERR(log, "IPv4 tun info is not 
> correct");
> +   return -EINVAL;
> +   }
> +   } else if (!match->key->tun_key.u.ipv4.dst) {
> +   OVS_NLERR(log, "IPv4 tunnel dst address is 
> zero");
> +   return -EINVAL;
> +   }
> }
> if (ipv6 && ipv6_addr_any(>key->tun_key.u.ipv6.dst)) {
> OVS_NLERR(log, "IPv6 tunnel dst address is zero");
> return -EINVAL;
> }
>
> -   if (!ttl) {
> +   if (!ttl && !info_bridge_mode) {
> OVS_NLERR(log, "IP tunnel TTL not specified.");
> return -EINVAL;
> }
ip_tun_to_nlattr() also needs to be updated to handle this mode.

Re: [ovs-dev] [PATCH net-next v2] openvswitch: Make metadata_dst tunnel work in IP_TUNNEL_INFO_BRIDGE mode

2019-03-25 Thread Pravin Shelar
On Sat, Mar 23, 2019 at 4:03 AM  wrote:
>
> From: wenxu 
>
> There is currently no support for the multicast/broadcast aspects
> of VXLAN in ovs. In the datapath flow the tun_dst must be specified.
> But in the IP_TUNNEL_INFO_BRIDGE mode the tun_dst cannot be specified,
> and the packet can be forwarded through the fdb table of the vxlan device. In
> this mode the broadcast/multicast packet can be sent in the
> following ways in ovs.
>
> ovs-vsctl add-port br0 vxlan -- set in vxlan type=vxlan \
> options:key=1000 options:remote_ip=flow
> ovs-ofctl add-flow br0 in_port=LOCAL,dl_dst=ff:ff:ff:ff:ff:ff,\
> action=output:vxlan
>
> bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.1 \
> src_vni 1000 vni 1000 self
> bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.2 \
> src_vni 1000 vni 1000 self
>
> Signed-off-by: wenxu 
> ---
>  include/uapi/linux/openvswitch.h |  1 +
>  net/openvswitch/flow_netlink.c   | 19 ---
>  2 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index dbe0cbe..696a308 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -364,6 +364,7 @@ enum ovs_tunnel_key_attr {
> OVS_TUNNEL_KEY_ATTR_IPV6_DST,   /* struct in6_addr dst IPv6 
> address. */
> OVS_TUNNEL_KEY_ATTR_PAD,
> OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS,/* struct erspan_metadata */
> +   OVS_TUNNEL_KEY_ATTR_NO_IPV4_DST,/* No argument. No dst IP 
> address. */
This should be explicitly named to indicate that it is the IP_TUNNEL_INFO_BRIDGE mode.

> __OVS_TUNNEL_KEY_ATTR_MAX
>  };
>
> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index 691da85..7abea44 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -403,6 +403,7 @@ size_t ovs_key_attr_size(void)
> [OVS_TUNNEL_KEY_ATTR_IPV6_SRC]  = { .len = sizeof(struct 
> in6_addr) },
> [OVS_TUNNEL_KEY_ATTR_IPV6_DST]  = { .len = sizeof(struct 
> in6_addr) },
> [OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS]   = { .len = OVS_ATTR_VARIABLE },
> +   [OVS_TUNNEL_KEY_ATTR_NO_IPV4_DST]   = { .len = 0 },
>  };
>
>  static const struct ovs_len_tbl
> @@ -666,6 +667,7 @@ static int ip_tun_from_nlattr(const struct nlattr *attr,
>   bool log)
>  {
> bool ttl = false, ipv4 = false, ipv6 = false;
> +   bool no_ipv4_dst = false;
> __be16 tun_flags = 0;
> int opts_type = 0;
> struct nlattr *a;
> @@ -782,6 +784,10 @@ static int ip_tun_from_nlattr(const struct nlattr *attr,
> tun_flags |= TUNNEL_ERSPAN_OPT;
> opts_type = type;
> break;
> +   case OVS_TUNNEL_KEY_ATTR_NO_IPV4_DST:
> +   no_ipv4_dst = true;
> +   ipv4 = true;
> +   break;
> default:
> OVS_NLERR(log, "Unknown IP tunnel attribute %d",
>   type);
> @@ -812,9 +818,14 @@ static int ip_tun_from_nlattr(const struct nlattr *attr,
> OVS_NLERR(log, "IP tunnel dst address not specified");
> return -EINVAL;
> }
> -   if (ipv4 && !match->key->tun_key.u.ipv4.dst) {
> -   OVS_NLERR(log, "IPv4 tunnel dst address is zero");
> -   return -EINVAL;
> +   if (ipv4) {
> +   if (no_ipv4_dst && match->key->tun_key.u.ipv4.dst) {
> +   OVS_NLERR(log, "IPv4 tunnel dst address is 
> not zero");
> +   return -EINVAL;
I am not sure why only the dst-ip is validated here; in
IP_TUNNEL_INFO_BRIDGE mode all parameters except the VNI are ignored.
Either we need to check the entire tunnel key or skip the check for dst-ip; I
would prefer checking the entire tun-key.


> +   } else if (!no_ipv4_dst && 
> !match->key->tun_key.u.ipv4.dst) {
> +   OVS_NLERR(log, "IPv4 tunnel dst address is 
> zero");
> +   return -EINVAL;
> +   }
> }
> if (ipv6 && ipv6_addr_any(>key->tun_key.u.ipv6.dst)) {
> OVS_NLERR(log, "IPv6 tunnel dst address is zero");
> @@ -2605,6 +2616,8 @@ static int validate_and_copy_set_tun(const struct 
> nlattr *attr,
> tun_info->mode = IP_TUNNEL_INFO_TX;
> if (key.tun_proto == AF_INET6)
> tun_info->mode |= IP_TUNNEL_INFO_IPV6;
> +   else if (key.tun_proto == AF_INET && key.tun_key.u.ipv4.dst == 0)
> +   tun_info->mode |= IP_TUNNEL_INFO_BRIDGE;
> tun_info->key = key.tun_key;
>
> /* We need to store the options in the action itself since
> --
> 1.8.3.1
>

Re: [ovs-dev] [PATCH net-next] openvswitch: Make metadata_dst tunnel work in IP_TUNNEL_INFO_BRIDGE mode

2019-03-25 Thread Pravin Shelar
On Sun, Mar 24, 2019 at 6:24 PM wenxu  wrote:
>
> On 2019/3/25 上午2:46, Pravin Shelar wrote:
> > On Sun, Mar 24, 2019 at 12:03 AM wenxu  wrote:
> >> On 2019/3/24 上午5:39, Pravin Shelar wrote:
> >>> On Sat, Mar 23, 2019 at 2:18 AM wenxu  wrote:
> >>>> On 2019/3/23 下午3:50, Pravin Shelar wrote:
> >>>>
> >>>> On Thu, Mar 21, 2019 at 3:34 AM  wrote:
> >>>>
> >>>> From: wenxu 
> >>>>
> >>>> There is currently no support for the multicasti/broadcst aspects
> >>>> of VXLAN in ovs. In the datapath flow the tun_dst must specific.
> >>>> But in the IP_TUNNEL_INFO_BRIDGE mode the tun_dst can not be specific.
> >>>> And the packet can forward through the fdb of vxlan devcice. In
> >>>> this mode the broadcast/multicast packet can be sent through the
> >>>> following ways in ovs.
> >>>>
> >>>> ovs-vsctl add-port br0 vxlan -- set in vxlan type=vxlan \
> >>>> options:key=1000 options:remote_ip=flow
> >>>> ovs-ofctl add-flow br0 in_port=LOCAL,dl_dst=ff:ff:ff:ff:ff:ff,\
> >>>> action=output:vxlan
> >>>>
> >>>> bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.1 \
> >>>> src_vni 1000 vni 1000 self
> >>>> bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.2 \
> >>>> src_vni 1000 vni 1000 self
> >>>>
> >>>> This would make datapath bit complicated, can you give example of such 
> >>>> use-case?
> >>>>
> >>>> There is currently no support for the multicast/broadcast aspects
> >>>> of VXLAN in ovs. To get around the lack of multicast support, it is 
> >>>> possible to
> >>>> pre-provision MAC to IP address mappings either manually or from a 
> >>>> controller.
> >>>>
> >>>> With this patch we can achieve this through the fdb of the lower vxlan
> >>>> device.
> >>>>
> >>>> For example. three severs connects with vxlan.
> >>>> server1 IP 10.0.0.1 tunnel IP  172.168.0.1 vni 1000
> >>>> server2 IP 10.0.0.2 tunnel IP  172.168.0.2 vni 1000
> >>>> server3 IP 10.0.0.3 tunnel IP  172.168.0.3 vni 1000
> >>>>
> >>>> All the broadcast arp request from server1, can be send to vxlan_sys_4789
> >>>> in IP_TUNNEL_INFO_BRIDGE mode. Then the broadcast packet can send through
> >>>> the fdb table in the vxlan device as following:
> >>>>
> >>>> bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.1 \
> >>>> src_vni 1000 vni 1000 self
> >>>> bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.2 \
> >>>> src_vni 1000 vni 1000 self
> >>>>
> >>>>
> >>>> Not any for multicast case. This patch make ovs vxlan tunnel using the 
> >>>> fdb
> >>>> table of lower vxlan device.
> >>> Have you tried OVS mac learning?
> >>>
> >> The key point is that it makes ovs vxlan tunnel can make use of the fdb 
> >> table of lower vxlan device.
> >>
> >> The fdb table can be configurable or mac learning from outside.
> >>
> >> For the broadcast example.  In the ovs, it can only achieve this through 
> >> multiple output actions to simulate the broadcast.
> >>
> >> ovs-ofctl add-flow br0 
> >> in_port=server1,dl_dst=ff:ff:ff:ff:ff:ff,actions=set_field:172.168.0.1->tun_dst,output:vxlan,\
> >>
> >> set_field:172.168.0.2->tun_dst,output:vxlan.
> >>
> >> But there are some limits for the number of output actions.
> >>
> > I was referring to mac-learning feature in OVS i.e. using learn
> > action. I wanted to see if there is something that you are not able to
> > do with OVS learn action.
> >
> Ovs mac learn action is only work for the specific vxlan tunnel port( fixed 
> tun_dst, tun_id) like following.
>
> ovs-vsctl set in vxlan options:remote_ip=172.168.0.1 options:key=1000
>
> ( This is the same problem for Linux bridge, It achieve this through 
> IP_TUNNEL_INFO_BRIDGE mode work
>
> with the fdb of lower vxlan device)
>
>
> But it is not work for the flow based tunnel (remote_ip=flow),  There will be 
> huge number of the tunnel peer.
>
> It' hard to manage the tunnel port with the specific mode.
>
>
OK, so it is hard to use the OVS learn action, but it is doable. Given
that IP_TUNNEL_INFO_BRIDGE does not add too much complexity to OVS, I am
fine with adding this feature.

Thanks.


Re: [ovs-dev] [PATCH net-next] openvswitch: Make metadata_dst tunnel work in IP_TUNNEL_INFO_BRIDGE mode

2019-03-24 Thread Pravin Shelar
On Sun, Mar 24, 2019 at 12:03 AM wenxu  wrote:
>
> On 2019/3/24 上午5:39, Pravin Shelar wrote:
> > On Sat, Mar 23, 2019 at 2:18 AM wenxu  wrote:
> >> On 2019/3/23 下午3:50, Pravin Shelar wrote:
> >>
> >> On Thu, Mar 21, 2019 at 3:34 AM  wrote:
> >>
> >> From: wenxu 
> >>
> >> There is currently no support for the multicasti/broadcst aspects
> >> of VXLAN in ovs. In the datapath flow the tun_dst must specific.
> >> But in the IP_TUNNEL_INFO_BRIDGE mode the tun_dst can not be specific.
> >> And the packet can forward through the fdb of vxlan devcice. In
> >> this mode the broadcast/multicast packet can be sent through the
> >> following ways in ovs.
> >>
> >> ovs-vsctl add-port br0 vxlan -- set in vxlan type=vxlan \
> >> options:key=1000 options:remote_ip=flow
> >> ovs-ofctl add-flow br0 in_port=LOCAL,dl_dst=ff:ff:ff:ff:ff:ff,\
> >> action=output:vxlan
> >>
> >> bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.1 \
> >> src_vni 1000 vni 1000 self
> >> bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.2 \
> >> src_vni 1000 vni 1000 self
> >>
> >> This would make datapath bit complicated, can you give example of such 
> >> use-case?
> >>
> >> There is currently no support for the multicast/broadcast aspects
> >> of VXLAN in ovs. To get around the lack of multicast support, it is 
> >> possible to
> >> pre-provision MAC to IP address mappings either manually or from a 
> >> controller.
> >>
> >> With this patch we can achieve this through the fdb of the lower vxlan
> >> device.
> >>
> >> For example. three severs connects with vxlan.
> >> server1 IP 10.0.0.1 tunnel IP  172.168.0.1 vni 1000
> >> server2 IP 10.0.0.2 tunnel IP  172.168.0.2 vni 1000
> >> server3 IP 10.0.0.3 tunnel IP  172.168.0.3 vni 1000
> >>
> >> All the broadcast arp request from server1, can be send to vxlan_sys_4789
> >> in IP_TUNNEL_INFO_BRIDGE mode. Then the broadcast packet can send through
> >> the fdb table in the vxlan device as following:
> >>
> >> bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.1 \
> >> src_vni 1000 vni 1000 self
> >> bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.2 \
> >> src_vni 1000 vni 1000 self
> >>
> >>
> >> Not any for multicast case. This patch make ovs vxlan tunnel using the fdb
> >> table of lower vxlan device.
> > Have you tried OVS mac learning?
> >
> The key point is that it makes ovs vxlan tunnel can make use of the fdb table 
> of lower vxlan device.
>
> The fdb table can be configurable or mac learning from outside.
>
> For the broadcast example.  In the ovs, it can only achieve this through 
> multiple output actions to simulate the broadcast.
>
> ovs-ofctl add-flow br0 
> in_port=server1,dl_dst=ff:ff:ff:ff:ff:ff,actions=set_field:172.168.0.1->tun_dst,output:vxlan,\
>
> set_field:172.168.0.2->tun_dst,output:vxlan.
>
> But there are some limits for the number of output actions.
>
I was referring to mac-learning feature in OVS i.e. using learn
action. I wanted to see if there is something that you are not able to
do with OVS learn action.


Re: [ovs-dev] [PATCH net-next] openvswitch: Make metadata_dst tunnel work in IP_TUNNEL_INFO_BRIDGE mode

2019-03-23 Thread Pravin Shelar
On Sat, Mar 23, 2019 at 2:18 AM wenxu  wrote:
>
> On 2019/3/23 下午3:50, Pravin Shelar wrote:
>
> On Thu, Mar 21, 2019 at 3:34 AM  wrote:
>
> From: wenxu 
>
> There is currently no support for the multicasti/broadcst aspects
> of VXLAN in ovs. In the datapath flow the tun_dst must specific.
> But in the IP_TUNNEL_INFO_BRIDGE mode the tun_dst can not be specific.
> And the packet can forward through the fdb of vxlan devcice. In
> this mode the broadcast/multicast packet can be sent through the
> following ways in ovs.
>
> ovs-vsctl add-port br0 vxlan -- set in vxlan type=vxlan \
> options:key=1000 options:remote_ip=flow
> ovs-ofctl add-flow br0 in_port=LOCAL,dl_dst=ff:ff:ff:ff:ff:ff,\
> action=output:vxlan
>
> bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.1 \
> src_vni 1000 vni 1000 self
> bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.2 \
> src_vni 1000 vni 1000 self
>
> This would make datapath bit complicated, can you give example of such 
> use-case?
>
> There is currently no support for the multicast/broadcast aspects
> of VXLAN in ovs. To get around the lack of multicast support, it is possible 
> to
> pre-provision MAC to IP address mappings either manually or from a controller.
>
> With this patch we can achieve this through the fdb of the lower vxlan
> device.
>
> For example, three servers connected with vxlan:
> server1 IP 10.0.0.1 tunnel IP  172.168.0.1 vni 1000
> server2 IP 10.0.0.2 tunnel IP  172.168.0.2 vni 1000
> server3 IP 10.0.0.3 tunnel IP  172.168.0.3 vni 1000
>
> All the broadcast arp requests from server1 can be sent to vxlan_sys_4789
> in IP_TUNNEL_INFO_BRIDGE mode. Then the broadcast packet can be sent through
> the fdb table in the vxlan device as follows:
>
> bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.1 \
> src_vni 1000 vni 1000 self
> bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.2 \
> src_vni 1000 vni 1000 self
>
>
> Not any for multicast case. This patch make ovs vxlan tunnel using the fdb
> table of lower vxlan device.

Have you tried OVS mac learning?


Re: [ovs-dev] [PATCH net-next] openvswitch: add seqadj extension when NAT is used.

2019-03-23 Thread Pravin Shelar
On Thu, Mar 21, 2019 at 9:52 AM Flavio Leitner  wrote:
>
> When the conntrack is initialized, there is no helper attached
> yet so the nat info initialization (nf_nat_setup_info) skips
> adding the seqadj ext.
>
> A helper is attached later when the conntrack is not confirmed
> but is going to be committed. In this case, if NAT is needed then
> adds the seqadj ext as well.
>
> Fixes: 16ec3d4fbb96 ("openvswitch: Fix cached ct with helper.")
> Signed-off-by: Flavio Leitner 
> ---
>  net/openvswitch/conntrack.c | 5 +
>  1 file changed, 5 insertions(+)
>
I am not able to apply this patch.

> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index 1b6896896fff..a7664515c943 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
> @@ -990,6 +990,11 @@ static int __ovs_ct_lookup(struct net *net, struct 
> sw_flow_key *key,
> GFP_ATOMIC);
> if (err)
> return err;
> +
> +   if (info->nat && nfct_help(ct) && !nfct_seqadj(ct)) {
Given the helper is just assigned, is the nfct_help() check required here?

> +   if (!nfct_seqadj_ext_add(ct))
> +   return -EINVAL;
> +   }
> }
>
> /* Call the helper only if:
> --
> 2.20.1
>
>
>


Re: [ovs-dev] [PATCH net-next] openvswitch: Make metadata_dst tunnel work in IP_TUNNEL_INFO_BRIDGE mode

2019-03-23 Thread Pravin Shelar
On Thu, Mar 21, 2019 at 3:34 AM  wrote:
>
> From: wenxu 
>
> There is currently no support for the multicast/broadcast aspects
> of VXLAN in ovs. In the datapath flow the tun_dst must be specified.
> But in the IP_TUNNEL_INFO_BRIDGE mode the tun_dst cannot be specified,
> and the packet can be forwarded through the fdb of the vxlan device. In
> this mode the broadcast/multicast packet can be sent in the
> following ways in ovs.
>
> ovs-vsctl add-port br0 vxlan -- set in vxlan type=vxlan \
> options:key=1000 options:remote_ip=flow
> ovs-ofctl add-flow br0 in_port=LOCAL,dl_dst=ff:ff:ff:ff:ff:ff,\
> action=output:vxlan
>
> bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.1 \
> src_vni 1000 vni 1000 self
> bridge fdb append ff:ff:ff:ff:ff:ff dev vxlan_sys_4789 dst 172.168.0.2 \
> src_vni 1000 vni 1000 self
>
This would make the datapath a bit more complicated; can you give an example of such a use-case?

> Signed-off-by: wenxu 
> ---
>  include/uapi/linux/openvswitch.h |  1 +
>  net/openvswitch/flow_netlink.c   | 32 ++--
>  2 files changed, 27 insertions(+), 6 deletions(-)
>
> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index dbe0cbe..696a308 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -364,6 +364,7 @@ enum ovs_tunnel_key_attr {
> OVS_TUNNEL_KEY_ATTR_IPV6_DST,   /* struct in6_addr dst IPv6 
> address. */
> OVS_TUNNEL_KEY_ATTR_PAD,
> OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS,/* struct erspan_metadata */
> +   OVS_TUNNEL_KEY_ATTR_NO_IPV4_DST,/* No argument. No dst IP 
> address. */
> __OVS_TUNNEL_KEY_ATTR_MAX
>  };
>
> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index 691da85..033df5c 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -403,6 +403,7 @@ size_t ovs_key_attr_size(void)
> [OVS_TUNNEL_KEY_ATTR_IPV6_SRC]  = { .len = sizeof(struct 
> in6_addr) },
> [OVS_TUNNEL_KEY_ATTR_IPV6_DST]  = { .len = sizeof(struct 
> in6_addr) },
> [OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS]   = { .len = OVS_ATTR_VARIABLE },
> +   [OVS_TUNNEL_KEY_ATTR_NO_IPV4_DST]   = { .len = 0 },
>  };
>
>  static const struct ovs_len_tbl
> @@ -663,7 +664,7 @@ static int erspan_tun_opt_from_nlattr(const struct nlattr 
> *a,
>
>  static int ip_tun_from_nlattr(const struct nlattr *attr,
>   struct sw_flow_match *match, bool is_mask,
> - bool log)
> + bool log, bool *no_ipv4_dst)
>  {
> bool ttl = false, ipv4 = false, ipv6 = false;
> __be16 tun_flags = 0;
> @@ -671,6 +672,9 @@ static int ip_tun_from_nlattr(const struct nlattr *attr,
> struct nlattr *a;
> int rem;
>
> +   if (no_ipv4_dst)
> +   *no_ipv4_dst = false;
> +
> nla_for_each_nested(a, attr, rem) {
> int type = nla_type(a);
> int err;
> @@ -782,6 +786,12 @@ static int ip_tun_from_nlattr(const struct nlattr *attr,
> tun_flags |= TUNNEL_ERSPAN_OPT;
> opts_type = type;
> break;
> +   case OVS_TUNNEL_KEY_ATTR_NO_IPV4_DST:
> +   if (no_ipv4_dst) {
> +   *no_ipv4_dst = true;
> +   ipv4 = true;
> +   }
> +   break;
> default:
> OVS_NLERR(log, "Unknown IP tunnel attribute %d",
>   type);
> @@ -812,9 +822,16 @@ static int ip_tun_from_nlattr(const struct nlattr *attr,
> OVS_NLERR(log, "IP tunnel dst address not specified");
> return -EINVAL;
> }
> -   if (ipv4 && !match->key->tun_key.u.ipv4.dst) {
> -   OVS_NLERR(log, "IPv4 tunnel dst address is zero");
> -   return -EINVAL;
> +   if (ipv4) {
> +   bool no_dst = no_ipv4_dst ? *no_ipv4_dst : false;
> +
> +   if (no_dst && match->key->tun_key.u.ipv4.dst) {
> +   OVS_NLERR(log, "IPv4 tunnel dst address is 
> not zero");
> +   return -EINVAL;
> +   } else if (!no_dst && 
> !match->key->tun_key.u.ipv4.dst) {
> +   OVS_NLERR(log, "IPv4 tunnel dst address is 
> zero");
> +   return -EINVAL;
> +   }
> }
> if (ipv6 && ipv6_addr_any(>key->tun_key.u.ipv6.dst)) {
> OVS_NLERR(log, "IPv6 tunnel dst address is zero");
> @@ -1178,7 +1195,7 @@ static int metadata_from_nlattrs(struct net *net, 
> struct sw_flow_match *match,
> }
> if (*attrs & (1 << 

Re: [ovs-dev] [PATCH net-next V2 1/1] openvswitch: Declare ovs key structures using macros

2019-02-03 Thread Pravin Shelar
On Sun, Feb 3, 2019 at 1:12 AM Eli Britstein  wrote:
>
> Declare ovs key structures using macros as a pre-step towards to
> enable retrieving fields information, as a work done in proposed
> commit in the OVS tree https://patchwork.ozlabs.org/patch/1023406/
> ("odp-util: Do not rewrite fields with the same values as matched"),
> with no functional change.
>
> Signed-off-by: Eli Britstein 
> Reviewed-by: Roi Dayan 
> ---
>  include/uapi/linux/openvswitch.h | 102
++-
>  1 file changed, 69 insertions(+), 33 deletions(-)
>

Thanks.
Acked-by: Pravin B Shelar 


Re: [ovs-dev] [PATCH net-next 1/1] openvswitch: Declare ovs key structures using macros

2019-01-31 Thread Pravin Shelar
Can you send a patch with this information in the commit message?


On Thu, Jan 31, 2019 at 3:32 AM Eli Britstein  wrote:
>
> ping
>
> for the using patch, i put below the v1 of it. here is v2:
>
> https://patchwork.ozlabs.org/patch/1023406/
>
>
> On 1/27/2019 8:37 AM, Eli Britstein wrote:
> >
> > On 1/27/2019 1:04 AM, David Miller wrote:
> >> From: Eli Britstein 
> >> Date: Thu, 24 Jan 2019 11:46:47 +0200
> >>
> >>> Declare ovs key structures using macros to enable retrieving fields
> >>> information, with no functional change.
> >>>
> >>> Signed-off-by: Eli Britstein 
> >>> Reviewed-by: Roi Dayan 
> >> I agree with Pravin, this need a much better commit message.
> >>
> >> Maybe even better to submit this alongside whatever is supposed
> >> to use these new macros.
> >
> > This patch is equivalent to a work done in the OVS tree.
> >
> > https://patchwork.ozlabs.org/patch/1023405/
> >
> > As a standalone it doesn't serve any purpose (as mentioned - no
> > functional change).
> >
> > It serves as a pre-step towards another patch in the OVS:
> >
> > https://patchwork.ozlabs.org/patch/1022794/
> >
> > So, the purpose of doing it in the kernel is just to keep this H file
> > identical. Once it is approved for the kernel, we will be able to
> > proceed with it in the OVS.
> >
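
To make the intent concrete, here is a hedged sketch of how user space could re-expand the field macros into a field table once the patch is in place (the field_info struct and the table name are invented for this illustration, and it assumes the OVS_KEY_*_FIELDS declarations from the patch are in scope):

#include <stddef.h>
#include <linux/openvswitch.h>  /* assumed to carry the OVS_KEY_*_FIELDS macros */

struct field_info {
        const char *name;
        size_t offset;
        size_t size;
};

#define OVS_KEY_FIELD(type, name) \
        { #name, offsetof(struct ovs_key_ipv4, name), sizeof(type) },
#define OVS_KEY_FIELD_ARR(type, name, elements) \
        { #name, offsetof(struct ovs_key_ipv4, name), sizeof(type) * (elements) },

/* One table entry per field of struct ovs_key_ipv4, generated from the
 * same macro list that declares the struct itself. */
static const struct field_info ovs_key_ipv4_fields[] = {
        OVS_KEY_IPV4_FIELDS
};

#undef OVS_KEY_FIELD
#undef OVS_KEY_FIELD_ARR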


Re: [ovs-dev] [PATCH net-next 1/1] openvswitch: Declare ovs key structures using macros

2019-01-25 Thread Pravin Shelar
On Thu, Jan 24, 2019 at 1:47 AM Eli Britstein  wrote:
>
> Declare ovs key structures using macros to enable retrieving fields
> information, with no functional change.
>

I am not sure why this is done. Can you explain what you are trying to solve here?

> Signed-off-by: Eli Britstein 
> Reviewed-by: Roi Dayan 
> ---
>  include/uapi/linux/openvswitch.h | 102 
> ++-
>  1 file changed, 69 insertions(+), 33 deletions(-)
>
> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index dbe0cbe4f1b7..dc8246f871fd 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -387,73 +387,109 @@ enum ovs_frag_type {
>
>  #define OVS_FRAG_TYPE_MAX (__OVS_FRAG_TYPE_MAX - 1)
>
> +#define OVS_KEY_FIELD_ARR(type, name, elements) type name[elements];
> +#define OVS_KEY_FIELD(type, name) type name;
> +
> +#define OVS_KEY_ETHERNET_FIELDS \
> +OVS_KEY_FIELD_ARR(__u8, eth_src, ETH_ALEN) \
> +OVS_KEY_FIELD_ARR(__u8, eth_dst, ETH_ALEN)
> +
>  struct ovs_key_ethernet {
> -   __u8 eth_src[ETH_ALEN];
> -   __u8 eth_dst[ETH_ALEN];
> +OVS_KEY_ETHERNET_FIELDS
>  };
>
>  struct ovs_key_mpls {
> __be32 mpls_lse;
>  };
>
> +#define OVS_KEY_IPV4_FIELDS \
> +OVS_KEY_FIELD(__be32, ipv4_src) \
> +OVS_KEY_FIELD(__be32, ipv4_dst) \
> +OVS_KEY_FIELD(__u8, ipv4_proto) \
> +OVS_KEY_FIELD(__u8, ipv4_tos) \
> +OVS_KEY_FIELD(__u8, ipv4_ttl) \
> +OVS_KEY_FIELD(__u8, ipv4_frag /* One of OVS_FRAG_TYPE_*. */)
> +
>  struct ovs_key_ipv4 {
> -   __be32 ipv4_src;
> -   __be32 ipv4_dst;
> -   __u8   ipv4_proto;
> -   __u8   ipv4_tos;
> -   __u8   ipv4_ttl;
> -   __u8   ipv4_frag;   /* One of OVS_FRAG_TYPE_*. */
> +OVS_KEY_IPV4_FIELDS
>  };
>
> +#define OVS_KEY_IPV6_FIELDS \
> +OVS_KEY_FIELD_ARR(__be32, ipv6_src, 4) \
> +OVS_KEY_FIELD_ARR(__be32, ipv6_dst, 4) \
> +OVS_KEY_FIELD(__be32, ipv6_label /* 20-bits in least-significant bits. 
> */) \
> +OVS_KEY_FIELD(__u8, ipv6_proto) \
> +OVS_KEY_FIELD(__u8, ipv6_tclass) \
> +OVS_KEY_FIELD(__u8, ipv6_hlimit) \
> +OVS_KEY_FIELD(__u8, ipv6_frag /* One of OVS_FRAG_TYPE_*. */)
> +
>  struct ovs_key_ipv6 {
> -   __be32 ipv6_src[4];
> -   __be32 ipv6_dst[4];
> -   __be32 ipv6_label;  /* 20-bits in least-significant bits. */
> -   __u8   ipv6_proto;
> -   __u8   ipv6_tclass;
> -   __u8   ipv6_hlimit;
> -   __u8   ipv6_frag;   /* One of OVS_FRAG_TYPE_*. */
> +OVS_KEY_IPV6_FIELDS
>  };
>
> +#define OVS_KEY_TCP_FIELDS \
> +OVS_KEY_FIELD(__be16, tcp_src) \
> +OVS_KEY_FIELD(__be16, tcp_dst)
> +
>  struct ovs_key_tcp {
> -   __be16 tcp_src;
> -   __be16 tcp_dst;
> +OVS_KEY_TCP_FIELDS
>  };
>
> +#define OVS_KEY_UDP_FIELDS \
> +OVS_KEY_FIELD(__be16, udp_src) \
> +OVS_KEY_FIELD(__be16, udp_dst)
> +
>  struct ovs_key_udp {
> -   __be16 udp_src;
> -   __be16 udp_dst;
> +OVS_KEY_UDP_FIELDS
>  };
>
> +#define OVS_KEY_SCTP_FIELDS \
> +OVS_KEY_FIELD(__be16, sctp_src) \
> +OVS_KEY_FIELD(__be16, sctp_dst)
> +
>  struct ovs_key_sctp {
> -   __be16 sctp_src;
> -   __be16 sctp_dst;
> +OVS_KEY_SCTP_FIELDS
>  };
>
> +#define OVS_KEY_ICMP_FIELDS \
> +OVS_KEY_FIELD(__u8, icmp_type) \
> +OVS_KEY_FIELD(__u8, icmp_code)
> +
>  struct ovs_key_icmp {
> -   __u8 icmp_type;
> -   __u8 icmp_code;
> +OVS_KEY_ICMP_FIELDS
>  };
>
> +#define OVS_KEY_ICMPV6_FIELDS \
> +OVS_KEY_FIELD(__u8, icmpv6_type) \
> +OVS_KEY_FIELD(__u8, icmpv6_code)
> +
>  struct ovs_key_icmpv6 {
> -   __u8 icmpv6_type;
> -   __u8 icmpv6_code;
> +OVS_KEY_ICMPV6_FIELDS
>  };
>
> +#define OVS_KEY_ARP_FIELDS \
> +OVS_KEY_FIELD(__be32, arp_sip) \
> +OVS_KEY_FIELD(__be32, arp_tip) \
> +OVS_KEY_FIELD(__be16, arp_op) \
> +OVS_KEY_FIELD_ARR(__u8, arp_sha, ETH_ALEN) \
> +OVS_KEY_FIELD_ARR(__u8, arp_tha, ETH_ALEN)
> +
>  struct ovs_key_arp {
> -   __be32 arp_sip;
> -   __be32 arp_tip;
> -   __be16 arp_op;
> -   __u8   arp_sha[ETH_ALEN];
> -   __u8   arp_tha[ETH_ALEN];
> +OVS_KEY_ARP_FIELDS
>  };
>
> +#define OVS_KEY_ND_FIELDS \
> +OVS_KEY_FIELD_ARR(__be32, nd_target, 4) \
> +OVS_KEY_FIELD_ARR(__u8, nd_sll, ETH_ALEN) \
> +OVS_KEY_FIELD_ARR(__u8, nd_tll, ETH_ALEN)
> +
>  struct ovs_key_nd {
> -   __be32  nd_target[4];
> -   __u8nd_sll[ETH_ALEN];
> -   __u8nd_tll[ETH_ALEN];
> +OVS_KEY_ND_FIELDS
>  };
>
> +#undef OVS_KEY_FIELD_ARR
> +#undef OVS_KEY_FIELD
> +
>  #define OVS_CT_LABELS_LEN_32   4
>  #define OVS_CT_LABELS_LEN  (OVS_CT_LABELS_LEN_32 * sizeof(__u32))
>  struct ovs_key_ct_labels {
> --
> 2.14.4
>


Re: [ovs-dev] [PATCH] openvswitch: Avoid OOB read when parsing flow nlattrs

2019-01-14 Thread Pravin Shelar
On Mon, Jan 14, 2019 at 1:17 AM Ross Lagerwall
 wrote:
>
> For nested and variable attributes, the expected length of an attribute
> is not known and marked by a negative number.  This results in an OOB
> read when the expected length is later used to check if the attribute is
> all zeros. Fix this by using the actual length of the attribute rather
> than the expected length.
>
> Signed-off-by: Ross Lagerwall 
> ---
>  net/openvswitch/flow_netlink.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index 435a4bdf8f89..691da853bef5 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -500,7 +500,7 @@ static int __parse_flow_nlattrs(const struct nlattr *attr,
> return -EINVAL;
> }
>
> -   if (!nz || !is_all_zero(nla_data(nla), expected_len)) {
> +   if (!nz || !is_all_zero(nla_data(nla), nla_len(nla))) {
> attrs |= 1 << type;
> a[type] = nla;
> }

Good Catch.
Acked-by: Pravin B Shelar 


Re: [ovs-dev] [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in per-port round-robin order

2018-09-28 Thread Pravin Shelar
On Wed, Sep 26, 2018 at 2:58 AM Stefano Brivio  wrote:
>
> Hi Pravin,
>
> On Wed, 15 Aug 2018 00:19:39 -0700
> Pravin Shelar  wrote:
>
> > I understand fairness has cost, but we need to find right balance
> > between performance and fairness. Current fairness scheme is a
> > lockless algorithm without much computational overhead, did you try to
> > improve current algorithm so that it uses less number of ports.
>
> In the end, we tried harder as you suggested, and found out a way to
> avoid the need for per-thread sets of per-vport sockets: with a few
> changes, a process-wide set of per-vport sockets is actually enough to
> maintain the current fairness behaviour.
>
> Further, we can get rid of duplicate fd events if the EPOLLEXCLUSIVE
> epoll() flag is available, which improves performance noticeably. This
> is explained in more detail in the commit message of the ovs-vswitchd
> patch Matteo wrote [1], now merged.
>
> This approach solves the biggest issue (i.e. huge amount of netlink
> sockets) in ovs-vswitchd itself, without the need for kernel changes.
>

Thanks for working on this issue.

> It doesn't address some proposed improvements though, that is, it does
> nothing to improve the current fairness scheme, it won't allow neither
> the configurable fairness criteria proposed by Ben, nor usage of BPF
> maps for extensibility as suggested by William.
>
> I would however say that those topics bear definitely lower priority,
> and perhaps addressing them at all becomes overkill now that we don't
> any longer have a serious issue.
>
> [1] https://patchwork.ozlabs.org/patch/974304/
Nice!

Thanks,
Pravin.


Re: [ovs-dev] [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in per-port round-robin order

2018-08-15 Thread Pravin Shelar
Hi Stefano

On Tue, Aug 7, 2018 at 6:31 AM, Stefano Brivio  wrote:
> Hi Pravin,
>
> On Tue, 31 Jul 2018 16:12:03 -0700
> Pravin Shelar  wrote:
>
>> Rather than reducing number of thread down to 1, we could find better
>> number of FDs per port.
>> How about this simple solution:
>> 1. Allocate (N * P) FDs as long as it is under FD limit.
>> 2. If FD limit (-EMFILE) is hit reduce N value by half and repeat step 1.
>
> I still see a few quantitative issues with this approach, other than
> Ben's observation about design (which, by the way, looks entirely
> reasonable to me).
>
> We're talking about a disproportionate amount of sockets in any case.
> We can have up to 2^16 vports here, with 5k vports being rather common,
> and for any reasonable value of N that manages somehow to perturbate the
> distribution of upcalls per thread, we are talking about something well
> in excess of 100k sockets. I think this doesn't really scale.
>
My argument is not about the proposed fairness algorithm. It is about the cost
of the fairness, and I do not see it addressed in any of the follow-ups.
You seem to be worried about the memory cost and fairness aspects; I
am worried about the CPU cost of the solution.
I think the proposed solution solves the fairness issue, but it also
creates a bottleneck in upcall processing. OVS is known to have slow
upcall processing, and this patch adds even more cost to the upcall
handling. The latency of first-packet handling also goes up with
this approach.

I revisited the original patch; here is what I see in terms of added
cost to the existing upcall processing:
1. One "kzalloc(sizeof(*upcall), GFP_ATOMIC);", which involves allocating
and initializing memory.
2. Copying the flow key, which is more than 1 KB (upcall->key = *key).
3. Acquiring the dp->upcalls.lock with spin_lock_bh, which disables bottom-half
processing on the CPU while waiting for the global lock.
4. Iterating the list of queued upcalls; one of the objectives is to avoid
out-of-order packets, but I do not see the point of ordering packets from
different streams.
5. Signalling the upcall thread after an ovs_dp_upcall_delay() delay, which adds
further to the latency.
6. The upcall is then handed over to a different thread (context switch),
likely on a different CPU.
7. The upcall object is freed on the remote CPU.
8. The single lock essentially means that OVS kernel datapath upcall processing
is single threaded no matter how many cores are in the system.

I would be interested in how we are going to address these issues.

In the example you mentioned, the netlink fd issue on a server with 48
cores: how does this solution work when there are 5K ports, each
triggering upcalls? Can you benchmark your patch? Do you have
performance numbers for TCP_CRR with and without this patch? Also,
please publish latency numbers for this patch, and turn off megaflows to
exercise the upcall handling.

I understand fairness has cost, but we need to find right balance
between performance and fairness. Current fairness scheme is a
lockless algorithm without much computational overhead, did you try to
improve current algorithm so that it uses less number of ports.


> With the current value for N (3/4 * number of threads) this can even get
> close to /proc/sys/fs/file-max on some systems, and there raising the
> number of allowed file descriptors for ovs-vswitchd isn't a solution
> anymore.
>
> I would instead try to address the concerns that you had about the
> original patch adding fairness in the kernel, rather than trying to
> make the issue appear less severe in ovs-vswitchd.
>
> --
> Stefano


Re: [ovs-dev] [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in per-port round-robin order

2018-07-31 Thread Pravin Shelar
On Tue, Jul 31, 2018 at 12:43 PM, Matteo Croce  wrote:
> On Mon, Jul 16, 2018 at 4:54 PM Matteo Croce  wrote:
>>
>> On Tue, Jul 10, 2018 at 6:31 PM Pravin Shelar  wrote:
>> >
>> > On Wed, Jul 4, 2018 at 7:23 AM, Matteo Croce  wrote:
>> > > From: Stefano Brivio 
>> > >
>> > > Open vSwitch sends to userspace all received packets that have
>> > > no associated flow (thus doing an "upcall"). Then the userspace
>> > > program creates a new flow and determines the actions to apply
>> > > based on its configuration.
>> > >
>> > > When a single port generates a high rate of upcalls, it can
>> > > prevent other ports from dispatching their own upcalls. vswitchd
>> > > overcomes this problem by creating many netlink sockets for each
>> > > port, but it quickly exceeds any reasonable maximum number of
>> > > open files when dealing with huge amounts of ports.
>> > >
>> > > This patch queues all the upcalls into a list, ordering them in
>> > > a per-port round-robin fashion, and schedules a deferred work to
>> > > queue them to userspace.
>> > >
>> > > The algorithm to queue upcalls in a round-robin fashion,
>> > > provided by Stefano, is based on these two rules:
>> > >  - upcalls for a given port must be inserted after all the other
>> > >occurrences of upcalls for the same port already in the queue,
>> > >in order to avoid out-of-order upcalls for a given port
>> > >  - insertion happens once the highest upcall count for any given
>> > >port (excluding the one currently at hand) is greater than the
>> > >count for the port we're queuing to -- if this condition is
>> > >never true, upcall is queued at the tail. This results in a
>> > >per-port round-robin order.
>> > >
>> > > In order to implement a fair round-robin behaviour, a variable
>> > > queueing delay is introduced. This will be zero if the upcalls
>> > > rate is below a given threshold, and grows linearly with the
>> > > queue utilisation (i.e. upcalls rate) otherwise.
>> > >
>> > > This ensures fairness among ports under load and with few
>> > > netlink sockets.
>> > >
>> > Thanks for the patch.
>> > This patch is adding following overhead for upcall handling:
>> > 1. kmalloc.
>> > 2. global spin-lock.
>> > 3. context switch to single worker thread.
>> > I think this could become bottle neck on most of multi core systems.
>> > You have mentioned issue with existing fairness mechanism, Can you
>> > elaborate on those, I think we could improve that before implementing
>> > heavy weight fairness in upcall handling.
>>
>> Hi Pravin,
>>
>> vswitchd allocates N * P netlink sockets, where N is the number of
>> online CPU cores, and P the number of ports.
>> With some setups, this number can grow quite fast, also exceeding the
>> system maximum file descriptor limit.
>> I've seen a 48 core server failing with -EMFILE when trying to create
>> more than 65535 netlink sockets needed for handling 1800+ ports.
>>
>> I made a previous attempt to reduce the sockets to one per CPU, but
>> this was discussed and rejected on ovs-dev because it would remove
>> fairness among ports[1].

Rather than reducing the number of threads down to 1, we could find a better
number of FDs per port.
How about this simple solution:
1. Allocate (N * P) FDs as long as it is under the FD limit.
2. If the FD limit (-EMFILE) is hit, reduce the N value by half and repeat step 1, as sketched below.
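
A rough user-space sketch of that back-off loop (illustrative only; the function name and rollback policy are invented for this example, and real code would keep better track of the sockets it creates):

#include <errno.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Try to open n_ports * n_handlers netlink sockets; on EMFILE, halve
 * n_handlers and retry.  Returns the per-port count that fit, or -1. */
static int pick_handlers_per_port(int n_ports, int n_handlers, int *fds)
{
        while (n_handlers >= 1) {
                int i, saved, n = n_ports * n_handlers;

                for (i = 0; i < n; i++) {
                        fds[i] = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
                        if (fds[i] < 0)
                                break;
                }
                if (i == n)
                        return n_handlers;      /* success */

                /* Roll back what we created and retry with half as many. */
                saved = errno;
                while (i-- > 0)
                        close(fds[i]);
                if (saved != EMFILE)
                        return -1;
                n_handlers /= 2;
        }
        return -1;
}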


Thanks,
Pravin.

>> I think that the current approach of opening a huge number of sockets
>> doesn't really work, (it doesn't scale for sure), it still needs some
>> queueing logic (either in kernel or user space) if we really want to
>> be sure that low traffic ports gets their upcalls quota when other
>> ports are doing way more traffic.
>>
>> If you are concerned about the kmalloc or spinlock, we can solve them
>> with kmem_cache or two copies of the list and rcu, I'll happy to
>> discuss the implementation details, as long as we all agree that the
>> current implementation doesn't scale well and has an issue.
>>

>> [1] https://mail.openvswitch.org/pipermail/ovs-dev/2018-February/344279.html
>>
>> --
>> Matteo Croce
>> per aspera ad upstream
>
> Hi all,
>
> any idea on how to solve the file descriptor limit hit by the netlink sockets?
> I see this issue happen very often, and raising the FD limit to 400k
> seems not the right way to solve it.
> Any other suggestion on how to improve the patch, or solve the problem
> in a different way?
>
> Regards,
>
>
>
> --
> Matteo Croce
> per aspera ad upstream


Re: [ovs-dev] [PATCH net 2/2] openvswitch: check for null return for nla_nest_start in datapath

2018-07-19 Thread Pravin Shelar
On Wed, Jul 18, 2018 at 9:12 AM, Stephen Hemminger
 wrote:
> The call to nla_nest_start when forming packet messages can lead to a NULL
> return so it's possible for attr to become NULL and we can potentially
> get a NULL pointer dereference on attr.  Fix this by checking for
> a NULL return.
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200537
> Fixes: 8f0aad6f35f7 ("openvswitch: Extend packet attribute for egress tunnel 
> info")
> Signed-off-by: Stephen Hemminger 
> ---
>  net/openvswitch/datapath.c | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 0f5ce77460d4..93c3eb635827 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -460,6 +460,10 @@ static int queue_userspace_packet(struct datapath *dp, 
> struct sk_buff *skb,
>
> if (upcall_info->egress_tun_info) {
> nla = nla_nest_start(user_skb, 
> OVS_PACKET_ATTR_EGRESS_TUN_KEY);
> +   if (!nla) {
> +   err = -EMSGSIZE;
> +   goto out;
> +   }
> err = ovs_nla_put_tunnel_info(user_skb,
>   upcall_info->egress_tun_info);
> BUG_ON(err);
> @@ -468,6 +472,10 @@ static int queue_userspace_packet(struct datapath *dp, 
> struct sk_buff *skb,
>
> if (upcall_info->actions_len) {
> nla = nla_nest_start(user_skb, OVS_PACKET_ATTR_ACTIONS);
> +   if (!nla) {
> +   err = -EMSGSIZE;
> +   goto out;
> +   }
> err = ovs_nla_put_actions(upcall_info->actions,
>   upcall_info->actions_len,
>   user_skb);

Acked-by: Pravin B Shelar 

Thanks.


Re: [ovs-dev] [PATCH net 1/2] openvswitch: check for null return for nla_nest_start

2018-07-19 Thread Pravin Shelar
On Wed, Jul 18, 2018 at 9:12 AM, Stephen Hemminger
 wrote:
> The call to nla_nest_start in conntrack can lead to a NULL
> return so it's possible for attr to become NULL and we can potentially
> get a NULL pointer dereference on attr.  Fix this by checking for
> a NULL return.
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200533
> Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit")
> Signed-off-by: Stephen Hemminger 
> ---
>  net/openvswitch/conntrack.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index 284aca2a252d..2e316f641df8 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
> @@ -2132,6 +2132,8 @@ static int ovs_ct_limit_cmd_get(struct sk_buff *skb, 
> struct genl_info *info)
> return PTR_ERR(reply);
>
> nla_reply = nla_nest_start(reply, OVS_CT_LIMIT_ATTR_ZONE_LIMIT);
> +   if (!nla_reply)
> +   return PRT_ERR(-EMSGSIZE);
>
> if (a[OVS_CT_LIMIT_ATTR_ZONE_LIMIT]) {
> err = ovs_ct_limit_get_zone_limit(
> --
Acked-by: Pravin B Shelar 

Thanks.


Re: [ovs-dev] [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in per-port round-robin order

2018-07-10 Thread Pravin Shelar
On Wed, Jul 4, 2018 at 7:23 AM, Matteo Croce  wrote:
> From: Stefano Brivio 
>
> Open vSwitch sends to userspace all received packets that have
> no associated flow (thus doing an "upcall"). Then the userspace
> program creates a new flow and determines the actions to apply
> based on its configuration.
>
> When a single port generates a high rate of upcalls, it can
> prevent other ports from dispatching their own upcalls. vswitchd
> overcomes this problem by creating many netlink sockets for each
> port, but it quickly exceeds any reasonable maximum number of
> open files when dealing with huge amounts of ports.
>
> This patch queues all the upcalls into a list, ordering them in
> a per-port round-robin fashion, and schedules a deferred work to
> queue them to userspace.
>
> The algorithm to queue upcalls in a round-robin fashion,
> provided by Stefano, is based on these two rules:
>  - upcalls for a given port must be inserted after all the other
>occurrences of upcalls for the same port already in the queue,
>in order to avoid out-of-order upcalls for a given port
>  - insertion happens once the highest upcall count for any given
>port (excluding the one currently at hand) is greater than the
>count for the port we're queuing to -- if this condition is
>never true, upcall is queued at the tail. This results in a
>per-port round-robin order.
>
> In order to implement a fair round-robin behaviour, a variable
> queueing delay is introduced. This will be zero if the upcalls
> rate is below a given threshold, and grows linearly with the
> queue utilisation (i.e. upcalls rate) otherwise.
>
> This ensures fairness among ports under load and with few
> netlink sockets.
>
Thanks for the patch.
This patch adds the following overhead to upcall handling:
1. A kmalloc.
2. A global spin-lock.
3. A context switch to a single worker thread.
I think this could become a bottleneck on most multi-core systems.
You have mentioned an issue with the existing fairness mechanism; can you
elaborate on it? I think we could improve that before implementing
heavyweight fairness in upcall handling.
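
For reference, the variable queueing delay described in the quoted commit message can be sketched roughly as follows (the threshold and scale constants are invented for this illustration and are not from the patch):

#define UPCALL_QUEUE_THRESHOLD  64      /* made-up numbers, for illustration */
#define UPCALL_QUEUE_CAPACITY   1024
#define UPCALL_MAX_DELAY        10      /* jiffies */

/* Zero delay while the queue is lightly used; otherwise a delay that
 * grows linearly with queue utilisation. */
static unsigned long upcall_delay(unsigned int queued)
{
        if (queued <= UPCALL_QUEUE_THRESHOLD)
                return 0;
        return (queued - UPCALL_QUEUE_THRESHOLD) * UPCALL_MAX_DELAY /
               (UPCALL_QUEUE_CAPACITY - UPCALL_QUEUE_THRESHOLD);
}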


Re: [ovs-dev] [PATCH nf-next v2] openvswitch: use nf_ct_get_tuplepr, invert_tuplepr

2018-06-25 Thread Pravin Shelar
On Mon, Jun 25, 2018 at 8:55 AM, Florian Westphal  wrote:
> These versions deal with the l3proto/l4proto details internally.
> It removes only caller of nf_ct_get_tuple, so make it static.
>
> After this, l3proto->get_l4proto() can be removed in a followup patch.
>
> Signed-off-by: Florian Westphal 
Acked-by: Pravin B Shelar 


Re: [ovs-dev] [PATCH] datapath: stt: linearize in SKIP_ZERO_COPY case

2018-06-25 Thread Pravin Shelar
On Fri, Jun 22, 2018 at 5:20 PM, Gregory Rose  wrote:
> On 6/22/2018 3:18 PM, Neal Shrader via dev wrote:
>>
>> During the investigation of a kernel panic, we encountered a condition
>> that triggered a kernel panic due to a large skb with an unusual
>> geometry.  Inside of the STT codepath, an effort is made to linearize
>> such packets to avoid trouble during both fragment reassembly and
>> segmentation in the linux networking core.
>>
>> As currently implemented, kernels with CONFIG_SLUB defined will skip
>> this process because it does not expect an skb with a frag_list to be
>> present.  This patch removes the assumption, and allows these skb to
>> be linearized as intended.  We confirmed this corrects the panic we
>> encountered.
>>
>> Reported-by: Johannes Erdfelt 
>> Reported-at:
>> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-May/046800.html
>> Requested-by: Pravin Shelar 
>> Signed-off-by: Neal Shrader 
>> ---
>>   datapath/linux/compat/stt.c | 7 ---
>>   1 file changed, 7 deletions(-)
>>
>> diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
>> index ca9f03988..64fc550fe 100644
>> --- a/datapath/linux/compat/stt.c
>> +++ b/datapath/linux/compat/stt.c
>> @@ -559,12 +559,6 @@ static int __try_to_segment(struct sk_buff *skb, bool
>> csum_partial,
>> static int try_to_segment(struct sk_buff *skb)
>>   {
>> -#ifdef SKIP_ZERO_COPY
>> -   /* coalesce_skb() since does not generate frag-list no need to
>> -* linearize it here.
>> -*/
>> -   return 0;
>> -#else
>> struct stthdr *stth = stt_hdr(skb);
>> bool csum_partial = !!(stth->flags & STT_CSUM_PARTIAL);
>> bool ipv4 = !!(stth->flags & STT_PROTO_IPV4);
>> @@ -572,7 +566,6 @@ static int try_to_segment(struct sk_buff *skb)
>> int l4_offset = stth->l4_offset;
>> return __try_to_segment(skb, csum_partial, ipv4, tcp, l4_offset);
>> -#endif
>>   }
>> static int segment_skb(struct sk_buff **headp, bool csum_partial,
>
>
> This seems reasonable to me but it still does seem strange that an skb with
> a frag list occurs when it's not supposed to?
>

I think some driver started generating such skbs, so we need to handle
them in STT.

I applied the patch to master and to branch-2.[6-9].

Thanks.


Re: [ovs-dev] [PATCH] datapath: ensure UFO traffic is actually fragmented

2018-06-15 Thread Pravin Shelar
On Fri, Jun 15, 2018 at 1:21 PM, Gregory Rose  wrote:
> On 6/15/2018 1:05 PM, Ben Pfaff wrote:
>>
>> On Tue, May 29, 2018 at 06:06:19PM +, Neal Shrader via dev wrote:
>>>
>>> While investigating a kernel panic, our team noticed that UDP traffic
>>> recieved by an STT tunnel will always have a gso_type set as SKB_GSO_UDP.
>>> After decap, we also noticed that traffic that had this flag set had its
>>> fragmentation type set as OVS_FRAG_TYPE_FIRST during key extraction.
>>>
>>> When the connection tracker encounters this, it assumes it's already
>>> dealing with fragmented traffic, which might not be the case.  This
>>> patch simply ensures we're dealing with an actual fragment before sending
>>> the skb off to be reassembled.
>>>
>>> Reported-by: Johannes Erdfelt 
>>> Reported-at:
>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-May/046800.html
>>> Signed-off-by: Neal Shrader 
>>
>> Thanks a lot for the patch.
>>
>> Greg, have you taken a look at this?
>
>
> I had it flagged for review but have not yet had a chance to get to it.
> I'll do so now.
>

I do not think this is the right approach to fix the issue.
I have posted my comment on the ovs-discuss mailing list thread:
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-May/046800.html
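
For reference, the check the patch adds amounts to something like the
following; this is only a hedged sketch with an invented helper name, not
the actual hunk:

    #include <linux/ip.h>
    #include <net/ip.h>

    /* Treat the skb as a fragment only when the IPv4 header actually says
     * so, instead of trusting OVS_FRAG_TYPE_FIRST alone (which UFO-marked
     * UDP skbs can end up with after STT decap).
     */
    static bool skb_is_real_ipv4_fragment(const struct sk_buff *skb)
    {
            return ip_is_fragment(ip_hdr(skb));
    }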


Re: [ovs-dev] [PATCH] vxlan: Fix for the packet loop issue in vxlan

2018-06-13 Thread Pravin Shelar
On Wed, Jun 13, 2018 at 5:16 AM, Neelakantam Gaddam
 wrote:
> It should be the Qdisc's busylock in the dev_queue_xmit function. I have
> seen the issue in kernel version 3.10.87 vanilla.
>

I see. This looks like a general problem which exists in the upstream
kernel. You would be able to reproduce it even with a Linux bridge and a
vxlan device.
The proposed patch is a specific solution. You can add another layer of
bridge and the patch would not handle the same issue in that
configuration. Therefore I am a bit hesitant to apply this patch.
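
To make that concrete, the hunk from the patch can be restated as a
standalone predicate (the helper name is made up and this is only a
sketch of its effect):

    #include <linux/netdevice.h>
    #include <net/route.h>

    /* True only when the route egresses through the vxlan device itself
     * or through the OVS vport the packet arrived on.  With another
     * bridge layered in between, rt->dst.dev is that bridge's netdev,
     * neither comparison matches, and the loop is only caught by the
     * generic xmit recursion limit, if at all.
     */
    static bool vxlan_route_loops_back(const struct rtable *rt,
                                       const struct net_device *vxlan_dev,
                                       const struct net_device *input_dev)
    {
            return rt->dst.dev == vxlan_dev || rt->dst.dev == input_dev;
    }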

>
>
> On Wed, Jun 13, 2018 at 12:21 PM, Pravin Shelar  wrote:
>>
>> On Tue, Jun 12, 2018 at 10:58 PM, Neelakantam Gaddam
>>  wrote:
>> > Hi Pravin,
>> >
>> > I have seen the below crash.
>> >
>> > [] show_stack+0x6c/0xf8
>> >
>> > [] do_raw_spin_lock+0x168/0x170
>> >
>> > [] dev_queue_xmit+0x43c/0x470
>> >
>> > [] ip_finish_output+0x250/0x490
>> >
>> > [] rpl_iptunnel_xmit+0x134/0x218 [openvswitch]
>> >
>> > [] rpl_vxlan_xmit+0x430/0x538 [openvswitch]
>> >
>> > [] do_execute_actions+0x18f8/0x19e8 [openvswitch]
>> >
>> > [] ovs_execute_actions+0x90/0x208 [openvswitch]
>> >
>> > [] ovs_dp_process_packet+0xb0/0x1a8 [openvswitch]
>> >
>> > [] ovs_vport_receive+0x78/0x130 [openvswitch]
>> >
>> > [] internal_dev_xmit+0x34/0x98 [openvswitch]
>> >
>> > [] dev_hard_start_xmit+0x2e8/0x4f8
>> >
>> > [] sch_direct_xmit+0xf0/0x238
>> >
>> > [] dev_queue_xmit+0x1d8/0x470
>> >
>> > [] arp_process+0x614/0x628
>> >
>> > [] __netif_receive_skb_core+0x2e8/0x5d8
>> >
>> > [] process_backlog+0xc0/0x1b0
>> >
>> > [] net_rx_action+0x154/0x240
>> >
>> > [] __do_softirq+0x1d0/0x218
>> >
>> > [] do_softirq+0x68/0x70
>> >
>> > [] local_bh_enable+0xa8/0xb0
>> >
>> > [] netif_rx_ni+0x20/0x30
>> >
>> >
>> >
>> >
>> > I have spent some time in investigation and found that crash is because
>> > of
>> > spinlock recursion in dev_queue_xmit function.
>> > The packet path traced is : netif_rx->arp->dev_queue_xmit(internal
>> > port)->vxlan_xmit->dev_queue_xmit(internal port).
>> >
>>
>> Which spin-lock is it? I am surprised to see a lock taken in fast
>> path. Can you also share kernel version?
>>
>> > The macro (XMIT_RECURSION_LIMIT) is defined as 10. This limit wont
>> > prevent
>> > the crash since the recursion is 2 only for my configuration.
>>
>> right, The recursion limit is to avoid stack overflow.
>>
>> >
>> >
>> >
>> > On Wed, Jun 13, 2018 at 4:11 AM, Pravin Shelar  wrote:
>> >>
>> >> On Tue, Jun 12, 2018 at 10:11 AM, Neelakantam Gaddam
>> >>  wrote:
>> >> >
>> >> > Hi Pravin,
>> >> >
>> >> > The below configuration is causing the spinlock recursion issue.
>> >> >
>> >> > I am able to see the issue with below configuration.
>> >> >
>> >> >
>> >> >
>> >> > ovs-vsctl add-br br0
>> >> >
>> >> > ovs-vsctl add-bond br0 bond0 p1p1 p1p2
>> >> >
>> >> > ovs-vsctl set port bond0 lacp=active bond_mode=balance-tcp
>> >> >
>> >> > ifconfig br0 100.0.0.1 up
>> >> >
>> >> > ovs-vsctl add-port br0 veth0
>> >> >
>> >> > ovs-vsctl add-port br0 vx0 -- set interface vx0 type=vxlan
>> >> > options:local_ip=100.0.0.1 options:remote_ip=100.0.0.2
>> >> > option:key=flow
>> >> >
>> >> >
>> >> >
>> >> > ovs-ofctl add-flow br0 "table=0, priority=1, cookie=100, tun_id=100,
>> >> > in_port=4, action=output:3"
>> >> >
>> >> > ovs-ofctl add-flow br0 "table=0, priority=1, cookie=100, in_port=3,
>> >> > actions=set_field:100->tun_id output:4"
>> >> >
>> >> >
>> >> >
>> >> > The same bridge br0 is being used as the local interface for vxlan
>> >> > tunnel. Even though this configuration is invalid, we should not see
>> >> > the
>> >> > kernel crash.
>> >> >
>> >>
>> >> I agree this should not cause a crash.
>> >> Can you post the crash or investigate why it is crashing? I think such a
>> >> configuration would hit the networking stack recursion limit
>> >> (XMIT_RECURSION_LIMIT) and then the packet would be dropped. I am not
>> >> sure which spinlock recursion issue you are referring to.

Re: [ovs-dev] [PATCH] vxlan: Fix for the packet loop issue in vxlan

2018-06-13 Thread Pravin Shelar
On Tue, Jun 12, 2018 at 10:58 PM, Neelakantam Gaddam
 wrote:
> Hi Pravin,
>
> I have seen the below crash.
>
> [] show_stack+0x6c/0xf8
>
> [] do_raw_spin_lock+0x168/0x170
>
> [] dev_queue_xmit+0x43c/0x470
>
> [] ip_finish_output+0x250/0x490
>
> [] rpl_iptunnel_xmit+0x134/0x218 [openvswitch]
>
> [] rpl_vxlan_xmit+0x430/0x538 [openvswitch]
>
> [] do_execute_actions+0x18f8/0x19e8 [openvswitch]
>
> [] ovs_execute_actions+0x90/0x208 [openvswitch]
>
> [] ovs_dp_process_packet+0xb0/0x1a8 [openvswitch]
>
> [] ovs_vport_receive+0x78/0x130 [openvswitch]
>
> [] internal_dev_xmit+0x34/0x98 [openvswitch]
>
> [] dev_hard_start_xmit+0x2e8/0x4f8
>
> [] sch_direct_xmit+0xf0/0x238
>
> [] dev_queue_xmit+0x1d8/0x470
>
> [] arp_process+0x614/0x628
>
> [] __netif_receive_skb_core+0x2e8/0x5d8
>
> [] process_backlog+0xc0/0x1b0
>
> [] net_rx_action+0x154/0x240
>
> [] __do_softirq+0x1d0/0x218
>
> [] do_softirq+0x68/0x70
>
> [] local_bh_enable+0xa8/0xb0
>
> [] netif_rx_ni+0x20/0x30
>
>
>
>
> I have spent some time investigating and found that the crash is because
> of spinlock recursion in the dev_queue_xmit function.
> The packet path traced is : netif_rx->arp->dev_queue_xmit(internal
> port)->vxlan_xmit->dev_queue_xmit(internal port).
>

Which spin-lock is it? I am surprised to see a lock taken in the fast
path. Can you also share the kernel version?

> The macro (XMIT_RECURSION_LIMIT) is defined as 10. This limit won't
> prevent the crash since the recursion depth is only 2 for my configuration.

Right, the recursion limit is there to avoid stack overflow.
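
For reference, that guard in __dev_queue_xmit() has roughly the following
shape for virtual devices; the counter and macro are spelled differently
across kernel versions (RECURSION_LIMIT == 10 in 3.10.x,
XMIT_RECURSION_LIMIT later), so treat this as an illustrative sketch
rather than a verbatim excerpt:

    /* Illustrative sketch of net/core/dev.c internals, not verbatim code. */
    static int xmit_with_recursion_guard(struct sk_buff *skb,
                                         struct net_device *dev,
                                         struct netdev_queue *txq)
    {
            int rc;

            if (__this_cpu_read(xmit_recursion) > RECURSION_LIMIT) {
                    /* Too many nested dev_queue_xmit() calls: drop and
                     * complain.  This bounds stack depth; it does not make
                     * a looping configuration forward traffic correctly.
                     */
                    net_crit_ratelimited("Dead loop on virtual device %s, fix it urgently!\n",
                                         dev->name);
                    kfree_skb(skb);
                    return -ENETDOWN;
            }

            __this_cpu_inc(xmit_recursion);
            rc = dev_hard_start_xmit(skb, dev, txq);    /* nested transmit */
            __this_cpu_dec(xmit_recursion);
            return rc;
    }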

>
>
>
> On Wed, Jun 13, 2018 at 4:11 AM, Pravin Shelar  wrote:
>>
>> On Tue, Jun 12, 2018 at 10:11 AM, Neelakantam Gaddam
>>  wrote:
>> >
>> > Hi Pravin,
>> >
>> > The below configuration is causing the spinlock recursion issue.
>> >
>> > I am able to see the issue with below configuration.
>> >
>> >
>> >
>> > ovs-vsctl add-br br0
>> >
>> > ovs-vsctl add-bond br0 bond0 p1p1 p1p2
>> >
>> > ovs-vsctl set port bond0 lacp=active bond_mode=balance-tcp
>> >
>> > ifconfig br0 100.0.0.1 up
>> >
>> > ovs-vsctl add-port br0 veth0
>> >
>> > ovs-vsctl add-port br0 vx0 -- set interface vx0 type=vxlan
>> > options:local_ip=100.0.0.1 options:remote_ip=100.0.0.2 option:key=flow
>> >
>> >
>> >
>> > ovs-ofctl add-flow br0 "table=0, priority=1, cookie=100, tun_id=100,
>> > in_port=4, action=output:3"
>> >
>> > ovs-ofctl add-flow br0 "table=0, priority=1, cookie=100, in_port=3,
>> > actions=set_field:100->tun_id output:4"
>> >
>> >
>> >
>> > The same bridge br0 is being used as the local interface for vxlan
>> > tunnel. Even though this configuration is invalid, we should not see the
>> > kernel crash.
>> >
>>
>> I agree this should not cause a crash.
>> Can you post the crash or investigate why it is crashing? I think such a
>> configuration would hit the networking stack recursion limit
>> (XMIT_RECURSION_LIMIT) and then the packet would be dropped. I am not
>> sure which spinlock recursion issue you are referring to.
>>
>>
>> >
>> >
>> >
>> >
>> > On Tue, Jun 12, 2018 at 11:55 AM, Pravin Shelar  wrote:
>> >>
>> >>
>> >>
>> >> On Tue, May 22, 2018 at 10:16 PM, Neelakantam Gaddam
>> >>  wrote:
>> >>>
>> >>> This patch fixes the kernel soft lockup issue with vxlan configuration
>> >>> where the tunneled packet is sent on the same bridge where vxlan port
>> >>> is
>> >>> attached to. It detects the loop in vxlan xmit function and drops if
>> >>> loop is
>> >>> detected.
>> >>>
>> >>> Signed-off-by: Neelakantam Gaddam 
>> >>> ---
>> >>>  datapath/linux/compat/vxlan.c | 6 --
>> >>>  1 file changed, 4 insertions(+), 2 deletions(-)
>> >>>
>> >>> diff --git a/datapath/linux/compat/vxlan.c
>> >>> b/datapath/linux/compat/vxlan.c
>> >>> index 287dad2..00562fa 100644
>> >>> --- a/datapath/linux/compat/vxlan.c
>> >>> +++ b/datapath/linux/compat/vxlan.c
>> >>> @@ -1115,7 +1115,8 @@ static void vxlan_xmit_one(struct sk_buff *skb,
>> >>> struct net_device *dev,
>> >>> goto tx_error;
>> >>> }
>> >>>
>> >>> -   if (rt->dst.dev == dev) {
>> >>> +   if ((rt->dst.dev == dev) ||
>> >>> +   (OVS_CB(skb)->input_vport->dev ==
>> >>> rt->dst.dev)) {
>> >>
>> >>
>> >> I am not sure which case the  OVS_CB(skb)->input_vport->dev is not same
>> >> as the dev when there is recursion. Can you explain how to reproduce it?
>> >>
>> >
>> >
>> >
>> > --
>> > Thanks & Regards
>> > Neelakantam Gaddam
>
>
>
>
> --
> Thanks & Regards
> Neelakantam Gaddam


Re: [ovs-dev] [PATCH] vxlan: Fix for the packet loop issue in vxlan

2018-06-12 Thread Pravin Shelar
On Tue, Jun 12, 2018 at 10:11 AM, Neelakantam Gaddam
 wrote:
>
> Hi Pravin,
>
> The below configuration is causing the spinlock recursion issue.
>
> I am able to see the issue with below configuration.
>
>
>
> ovs-vsctl add-br br0
>
> ovs-vsctl add-bond br0 bond0 p1p1 p1p2
>
> ovs-vsctl set port bond0 lacp=active bond_mode=balance-tcp
>
> ifconfig br0 100.0.0.1 up
>
> ovs-vsctl add-port br0 veth0
>
> ovs-vsctl add-port br0 vx0 -- set interface vx0 type=vxlan 
> options:local_ip=100.0.0.1 options:remote_ip=100.0.0.2 option:key=flow
>
>
>
> ovs-ofctl add-flow br0 "table=0, priority=1, cookie=100, tun_id=100, 
> in_port=4, action=output:3"
>
> ovs-ofctl add-flow br0 "table=0, priority=1, cookie=100, in_port=3, 
> actions=set_field:100->tun_id output:4"
>
>
>
> The same bridge br0 is being used as the local interface for vxlan tunnel. 
> Even though this configuration is invalid, we should not see the kernel crash.
>

I agree this should not cause a crash.
Can you post the crash or investigate why it is crashing? I think such a
configuration would hit the networking stack recursion limit
(XMIT_RECURSION_LIMIT) and then the packet would be dropped. I am not
sure which spinlock recursion issue you are referring to.


>
>
>
>
> On Tue, Jun 12, 2018 at 11:55 AM, Pravin Shelar  wrote:
>>
>>
>>
>> On Tue, May 22, 2018 at 10:16 PM, Neelakantam Gaddam  
>> wrote:
>>>
>>> This patch fixes the kernel soft lockup issue with vxlan configuration
>>> where the tunneled packet is sent on the same bridge where vxlan port is
>>> attached to. It detects the loop in vxlan xmit function and drops if loop is
>>> detected.
>>>
>>> Signed-off-by: Neelakantam Gaddam 
>>> ---
>>>  datapath/linux/compat/vxlan.c | 6 --
>>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/datapath/linux/compat/vxlan.c b/datapath/linux/compat/vxlan.c
>>> index 287dad2..00562fa 100644
>>> --- a/datapath/linux/compat/vxlan.c
>>> +++ b/datapath/linux/compat/vxlan.c
>>> @@ -1115,7 +1115,8 @@ static void vxlan_xmit_one(struct sk_buff *skb, 
>>> struct net_device *dev,
>>> goto tx_error;
>>> }
>>>
>>> -   if (rt->dst.dev == dev) {
>>> +   if ((rt->dst.dev == dev) ||
>>> +   (OVS_CB(skb)->input_vport->dev == rt->dst.dev)) {
>>
>>
>> I am not sure which case the  OVS_CB(skb)->input_vport->dev is not same as 
>> the dev when there is recursion. Can you explain how to reproduce it?
>>
>
>
>
> --
> Thanks & Regards
> Neelakantam Gaddam


Re: [ovs-dev] [PATCH] vxlan: Fix for the packet loop issue in vxlan

2018-06-12 Thread Pravin Shelar
On Tue, May 22, 2018 at 10:16 PM, Neelakantam Gaddam 
wrote:

> This patch fixes the kernel soft lockup issue with vxlan configuration
> where the tunneled packet is sent on the same bridge where vxlan port is
> attached to. It detects the loop in vxlan xmit function and drops if loop
> is
> detected.
>
> Signed-off-by: Neelakantam Gaddam 
> ---
>  datapath/linux/compat/vxlan.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/datapath/linux/compat/vxlan.c b/datapath/linux/compat/vxlan.c
> index 287dad2..00562fa 100644
> --- a/datapath/linux/compat/vxlan.c
> +++ b/datapath/linux/compat/vxlan.c
> @@ -1115,7 +1115,8 @@ static void vxlan_xmit_one(struct sk_buff *skb,
> struct net_device *dev,
> goto tx_error;
> }
>
> -   if (rt->dst.dev == dev) {
> +   if ((rt->dst.dev == dev) ||
> +   (OVS_CB(skb)->input_vport->dev == rt->dst.dev)) {
>

I am not sure in which case the OVS_CB(skb)->input_vport->dev is not the
same as the dev when there is recursion. Can you explain how to reproduce
it?


Re: [ovs-dev] [PATCH] compat: Fix upstream 4.4.119 kernel

2018-05-08 Thread Pravin Shelar
On Fri, Apr 20, 2018 at 11:13 AM, Greg Rose  wrote:
> The Linux 4.4.119 kernel (and perhaps others) from kernel.org
> backports some dst_cache code that breaks the openvswitch kernel
> due to a duplicated name "dst_cache_destroy".  For most cases the
> "USE_UPSTREAM_TUNNEL" covers this but in this case the dst_cache
> feature needs to be separated out.
>
> Add the necessary compatibility detection layer in acinclude.m4 and
> then fixup the source files so that if the built-in kernel includes
> dst_cache support then exclude our own compatibility code.
>
> Signed-off-by: Greg Rose 

I pushed it to master and up to branch-2.6.

Thanks.
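
As a rough illustration of the approach described in the quoted commit
message, the compat code ends up wrapped in a guard along these lines;
HAVE_DST_CACHE is an assumed placeholder for whatever symbol the new
acinclude.m4 probe actually defines:

    /* Sketch of the compat guard only -- not the literal OVS source. */
    #ifdef HAVE_DST_CACHE
    /* The running kernel already provides dst_cache (e.g. 4.4.119 with the
     * backported code), so use it and build none of the OVS copy, avoiding
     * the duplicate dst_cache_destroy symbol.
     */
    #include_next <net/dst_cache.h>
    #else
    /* The kernel lacks dst_cache: declare and build the OVS backported
     * implementation, including dst_cache_destroy(), as before.
     */
    #endif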

