from:"roopa"

Re: [PATCH iproute2] stats output

2018-11-30 Thread Roopa Prabhu

On Fri, Nov 30, 2018 at 10:12 AM Stephen Hemminger
 wrote:
>
> On Fri, 30 Nov 2018 14:33:49 +0100
> Alexis Vachette  wrote:
>
> > When using:
> >
> > - ip -s link
> >
> > I think it should be better to print errors stats without adding -s
> > option twice.
> >
> > This option print stats for each network interfaces and we want to see
> > if something is off and especially timers with errors.
> >
> > Man page of ip command is updated accordingly.
> >
> > Here is a patchset:
> >
> > Signed-off-by: Alexis Vachette 
> > ---
> > diff --git a/ip/ipaddress.c b/ip/ipaddress.c
> > index 85f05a2..c70394d 100644
> > --- a/ip/ipaddress.c
> > +++ b/ip/ipaddress.c
> > @@ -323,7 +323,7 @@ int print_linkinfo(const struct sockaddr_nl *who,
> >   fprintf(fp,"\nalias %s",
> >   (const char *) RTA_DATA(tb[IFLA_IFALIAS]));
> >
> > - if (do_link && tb[IFLA_STATS64] && show_stats) {
> > + if (do_link && tb[IFLA_STATS64]) {
> >   struct rtnl_link_stats64 slocal;
> >   struct rtnl_link_stats64 *s = RTA_DATA(tb[IFLA_STATS64]);
> >   if (((unsigned long)s) & (sizeof(unsigned long)-1)) {
> > @@ -343,16 +343,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
> >   if (s->rx_compressed)
> >   fprintf(fp, " %-7llu",
> >   (unsigned long long)s->rx_compressed);
> > - if (show_stats > 1) {
> > - fprintf(fp, "%s", _SL_);
> > - fprintf(fp, "RX errors: length  crc frame   fifomissed%s", 
> > _SL_);
> > - fprintf(fp, "   %-7llu  %-7llu %-7llu %-7llu %-7llu",
> > - (unsigned long long)s->rx_length_errors,
> > - (unsigned long long)s->rx_crc_errors,
> > - (unsigned long long)s->rx_frame_errors,
> > - (unsigned long long)s->rx_fifo_errors,
> > - (unsigned long long)s->rx_missed_errors);
> > - }
> > + fprintf(fp, "%s", _SL_);
> > + fprintf(fp, "RX errors: length  crc frame   fifomissed%s", 
> > _SL_);
> > + fprintf(fp, "   %-7llu  %-7llu %-7llu %-7llu %-7llu",
> > + (unsigned long long)s->rx_length_errors,
> > + (unsigned long long)s->rx_crc_errors,
> > + (unsigned long long)s->rx_frame_errors,
> > + (unsigned long long)s->rx_fifo_errors,
> > + (unsigned long long)s->rx_missed_errors);
> >   fprintf(fp, "%s", _SL_);
> >   fprintf(fp, "TX: bytes  packets  errors  dropped carrier collsns 
> > %s%s",
> >   s->tx_compressed ? "compressed" : "", _SL_);
> > @@ -366,17 +364,15 @@ int print_linkinfo(const struct sockaddr_nl *who,
> >   if (s->tx_compressed)
> >   fprintf(fp, " %-7llu",
> >   (unsigned long long)s->tx_compressed);
> > - if (show_stats > 1) {
> > - fprintf(fp, "%s", _SL_);
> > - fprintf(fp, "TX errors: aborted fifowindow  heartbeat%s", _SL_);
> > - fprintf(fp, "   %-7llu  %-7llu %-7llu %-7llu",
> > - (unsigned long long)s->tx_aborted_errors,
> > - (unsigned long long)s->tx_fifo_errors,
> > - (unsigned long long)s->tx_window_errors,
> > - (unsigned long long)s->tx_heartbeat_errors);
> > - }
> > + fprintf(fp, "%s", _SL_);
> > + fprintf(fp, "TX errors: aborted fifowindow  heartbeat%s", _SL_);
> > + fprintf(fp, "   %-7llu  %-7llu %-7llu %-7llu",
> > + (unsigned long long)s->tx_aborted_errors,
> > + (unsigned long long)s->tx_fifo_errors,
> > + (unsigned long long)s->tx_window_errors,
> > + (unsigned long long)s->tx_heartbeat_errors);
> >   }
> > - if (do_link && !tb[IFLA_STATS64] && tb[IFLA_STATS] && show_stats) {
> > + if (do_link && !tb[IFLA_STATS64] && tb[IFLA_STATS]) {
> >   struct rtnl_link_stats slocal;
> >   struct rtnl_link_stats *s = RTA_DATA(tb[IFLA_STATS]);
> >   if (((unsigned long)s) & (sizeof(unsigned long)-1)) {
> > @@ -393,17 +389,15 @@ int print_linkinfo(const struct sockaddr_nl *who,
> >   );
> >   if (s->rx_compressed)
> >   fprintf(fp, " %-7u", s->rx_compressed);
> > - if (show_stats > 1) {
> > - fprintf(fp, "%s", _SL_);
> > - fprintf(fp, "RX errors: length  crc frame   fifomissed%s", 
> > _SL_);
> > - fprintf(fp, "   %-7u  %-7u %-7u %-7u %-7u",
> > - s->rx_length_errors,
> > - s->rx_crc_errors,
> > - s->rx_frame_errors,
> > - s->rx_fifo_errors,
> > - s->rx_missed_errors
> > - );
> > - }
> > + fprintf(fp, "%s", _SL_);
> > + fprintf(fp, "RX errors: length  crc frame   fifomissed%s", 
> > _SL_);
> > + fprintf(fp, "   %-7u  %-7u %-7u %-7u %-7u",
> > + s->rx_length_errors,
> > + s->rx_crc_errors,
> > + s->rx_frame_errors,
> > + s->rx_fifo_errors,
> > + s->rx_missed_errors
> > + );
> >   fprintf(fp, "%s", _SL_);
> >   fprintf(fp, "TX: bytes  packets  errors  dropped carrier collsns 
> > %s%s",
> >   s->tx_compressed ? "compressed" : "", _SL_);
> > @@ -412,16 +406,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
> >   s->tx_dropped, s->tx_carrier_errors, s->collisions);
> >   if (s->tx_compressed)
> >   fprintf(fp, " %-7u", s->tx_compressed);
> > - if (show_stats > 1) {
> > - fprintf(fp, "%s", _SL_);
> > - fprintf(fp, "TX errors: aborted fifowindow  heartbeat%s", _SL_);
> > - fprintf(fp, "   %-7u  %-7u %-7u %-7u",
> > - s->tx_aborted_errors,
> > -

Re: [PATCH net-next v2 3/3] vxlan: move flag sets to use a helper func

2018-11-29 Thread Roopa Prabhu

On Thu, Nov 29, 2018 at 7:55 AM Sabrina Dubroca  wrote:
>
> 2018-11-29, 07:27:17 -0800, Roopa Prabhu wrote:
> > On Thu, Nov 29, 2018 at 6:19 AM Sabrina Dubroca  
> > wrote:
> > > 2018-11-28, 14:27:59 -0800, Roopa Prabhu wrote:
> > > nit: This patch would have been easier to review if it came first in
> > > the series. Converting:
> >
> > I considered that. But the first patch really removes some checks
> > making it easier for the follow-on patches...and that mostly dictated
> > the order.
>
> ok.
>
> > > if (nla_get_u8(data[IFLA_VXLAN_FOO]))
> > > conf->flags |= VXLAN_F_FOO;
> > >
> > > to this new helper, which in effect does
> > >
> > > if (nla_get_u8(data[IFLA_VXLAN_FOO]))
> > > conf->flags |= VXLAN_F_FOO;
> > > else
> > > conf->flags &= ~VXLAN_F_FOO;
> > >
> > > seems to change behavior, but since you're (unless I missed one) only
> > > applying it to flags that couldn't be changed before patch 1, it looks
> > > fine.
> >
> > It does not change behavior...the earlier code only supported the
> > flags during create time and did not support changing of the flag with
> > changelink.
> > this patch adds changelink support. But if you do see any change in
> > behavior, pls report. thanks.
>
> Yeah, looks correct. I just had to go back to patch 1 and check.
> A note "only flags [list] are changed, and they could not be modified
> after creation of the device prior to patch 1" in the cover and/or in
> this patch's commit message would have spared reviewers a bit of a
> scare :)

ack, sounds good. I am also going to reduce the scope of the series a
bit as i am only interested in support for TTL changelink right now.

Re: [PATCH net-next v2 1/3] vxlan: support changelink for a few more attributes

2018-11-29 Thread Roopa Prabhu

On Thu, Nov 29, 2018 at 5:56 AM Sabrina Dubroca  wrote:
>
> 2018-11-28, 14:27:57 -0800, Roopa Prabhu wrote:
> > From: Roopa Prabhu 
> >
> > We started very conservative when supporting changelink
> > especially because not all attribute changes could be
> > tested. This patch opens up a few more attributes for
> > changelink. The reason for choosing this set of attributes
> > is based on code references for these attributes. I have
> > tested TTL changes and did some changelink api testing
> > to sanity test the others.
> >
> > Signed-off-by: Roopa Prabhu 
> > ---
> >  drivers/net/vxlan.c | 36 
> >  1 file changed, 4 insertions(+), 32 deletions(-)
> >
> > diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> > index 9110662..73caa65 100644
> > --- a/drivers/net/vxlan.c
> > +++ b/drivers/net/vxlan.c
> > @@ -3438,11 +3438,8 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct 
> > nlattr *data[],
> >   if (data[IFLA_VXLAN_TTL])
> >   conf->ttl = nla_get_u8(data[IFLA_VXLAN_TTL]);
> >
> > - if (data[IFLA_VXLAN_TTL_INHERIT]) {
> > - if (changelink)
> > - return -EOPNOTSUPP;
> > + if (data[IFLA_VXLAN_TTL_INHERIT])
> >   conf->flags |= VXLAN_F_TTL_INHERIT;
> > - }
>
> This doesn't give us an option to disable TTL_INHERIT after it was
> enabled once. Same thing with GBP, GPE, REMCSUM_NOPARTIAL.

that is provided by patch3, with the changelink patch. I can squash
them, if thats easier.


>
> > - if (data[IFLA_VXLAN_GBP]) {
> > - if (changelink)
> > - return -EOPNOTSUPP;
> > + if (data[IFLA_VXLAN_GBP])
> >   conf->flags |= VXLAN_F_GBP;
> > - }
> >
> > - if (data[IFLA_VXLAN_GPE]) {
> > - if (changelink)
> > - return -EOPNOTSUPP;
> > + if (data[IFLA_VXLAN_GPE])
> >   conf->flags |= VXLAN_F_GPE;
> > - }
>
> GPE implies running a different setup function (vxlan_raw_setup() vs
> vxlan_ether_setup()), that vxlan_config_apply() only calls for
> !changelink. I think this is incomplete.
>
> I think we'd also end up with mixed tunnel types (GPE/!GPE) on the
> same socket, I'm not sure how that would work. Normally, they each try
> to create a separate socket, and pass the GPE flag on to the
> associated vxlan_sock. I suspect that's also a problem with rx
> offload.

that is good to know. I will drop the change to GPE and also the rx
offload flag and let somebody else using it
open it up for changelink. thanks for the review


> > - if (data[IFLA_VXLAN_REMCSUM_NOPARTIAL]) {
> > - if (changelink)
> > - return -EOPNOTSUPP;
> > + if (data[IFLA_VXLAN_REMCSUM_NOPARTIAL])
> >   conf->flags |= VXLAN_F_REMCSUM_NOPARTIAL;
> > - }
> >
> >   if (tb[IFLA_MTU]) {
> >   if (changelink)
> > --

Re: [PATCH net-next v2 3/3] vxlan: move flag sets to use a helper func

2018-11-29 Thread Roopa Prabhu

On Thu, Nov 29, 2018 at 6:19 AM Sabrina Dubroca  wrote:
>
> 2018-11-28, 14:27:59 -0800, Roopa Prabhu wrote:
> > +/* Set/clear flags based on attribute */
> > +static void vxlan_nl2flag(struct vxlan_config *conf, struct nlattr *tb[],
> > +   int attrtype, unsigned long mask)
> > +{
> > + unsigned long flags;
> > +
> > + if (!tb[attrtype])
> > + return;
> > +
> > + if (nla_get_u8(tb[attrtype]))
> > + flags = conf->flags | mask;
> > + else
> > + flags = conf->flags & ~mask;
> > +
> > + conf->flags = flags;
> > +}
> > +
> >  static int vxlan_nl2conf(struct nlattr *tb[], struct nlattr *data[],
> >struct net_device *dev, struct vxlan_config *conf,
> >bool changelink, struct netlink_ext_ack *extack)
> > @@ -3458,45 +3475,27 @@ static int vxlan_nl2conf(struct nlattr *tb[], 
> > struct nlattr *data[],
> >   if (data[IFLA_VXLAN_TTL])
> >   conf->ttl = nla_get_u8(data[IFLA_VXLAN_TTL]);
> >
> > - if (data[IFLA_VXLAN_TTL_INHERIT])
> > - conf->flags |= VXLAN_F_TTL_INHERIT;
> > + vxlan_nl2flag(conf, data, IFLA_VXLAN_TTL_INHERIT,
> > +   VXLAN_F_TTL_INHERIT);
>
> IFLA_VXLAN_TTL_INHERIT is a NLA_FLAG. It doesn't have any u8 data to
> get. Same for:
>
> [IFLA_VXLAN_GBP]= { .type = NLA_FLAG, },
> [IFLA_VXLAN_GPE]= { .type = NLA_FLAG, },
> [IFLA_VXLAN_REMCSUM_NOPARTIAL]  = { .type = NLA_FLAG },
> [IFLA_VXLAN_TTL_INHERIT]= { .type = NLA_FLAG },
>
>

good catch. i see it, some flags are NLA_U8 and some NLA_FLAG. will fix.


>
> nit: This patch would have been easier to review if it came first in
> the series. Converting:

I considered that. But the first patch really removes some checks
making it easier for the follow-on patches...and that mostly dictated
the order.

>
> if (nla_get_u8(data[IFLA_VXLAN_FOO]))
> conf->flags |= VXLAN_F_FOO;
>
> to this new helper, which in effect does
>
> if (nla_get_u8(data[IFLA_VXLAN_FOO]))
> conf->flags |= VXLAN_F_FOO;
> else
> conf->flags &= ~VXLAN_F_FOO;
>
> seems to change behavior, but since you're (unless I missed one) only
> applying it to flags that couldn't be changed before patch 1, it looks
> fine.

It does not change behavior...the earlier code only supported the
flags during create time and did not support changing of the flag with
changelink.
this patch adds changelink support. But if you do see any change in
behavior, pls report. thanks.

[PATCH net-next v2 0/3] vxlan: a few minor cleanups

2018-11-28 Thread Roopa Prabhu

From: Roopa Prabhu 

Roopa Prabhu (3):
  vxlan: support changelink for a few more attributes
  vxlan: extack support for some changelink cases
  vxlan: move flag sets to use a helper func

 drivers/net/vxlan.c | 199 +++-
 1 file changed, 102 insertions(+), 97 deletions(-)

-- 
2.1.4

[PATCH net-next v2 3/3] vxlan: move flag sets to use a helper func

2018-11-28 Thread Roopa Prabhu

From: Roopa Prabhu 

Signed-off-by: Roopa Prabhu 
---
 drivers/net/vxlan.c | 95 -
 1 file changed, 43 insertions(+), 52 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 4cb6b50..47671fd 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3374,6 +3374,23 @@ static int __vxlan_dev_create(struct net *net, struct 
net_device *dev,
return err;
 }
 
+/* Set/clear flags based on attribute */
+static void vxlan_nl2flag(struct vxlan_config *conf, struct nlattr *tb[],
+ int attrtype, unsigned long mask)
+{
+   unsigned long flags;
+
+   if (!tb[attrtype])
+   return;
+
+   if (nla_get_u8(tb[attrtype]))
+   flags = conf->flags | mask;
+   else
+   flags = conf->flags & ~mask;
+
+   conf->flags = flags;
+}
+
 static int vxlan_nl2conf(struct nlattr *tb[], struct nlattr *data[],
 struct net_device *dev, struct vxlan_config *conf,
 bool changelink, struct netlink_ext_ack *extack)
@@ -3458,45 +3475,27 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct 
nlattr *data[],
if (data[IFLA_VXLAN_TTL])
conf->ttl = nla_get_u8(data[IFLA_VXLAN_TTL]);
 
-   if (data[IFLA_VXLAN_TTL_INHERIT])
-   conf->flags |= VXLAN_F_TTL_INHERIT;
+   vxlan_nl2flag(conf, data, IFLA_VXLAN_TTL_INHERIT,
+ VXLAN_F_TTL_INHERIT);
 
if (data[IFLA_VXLAN_LABEL])
conf->label = nla_get_be32(data[IFLA_VXLAN_LABEL]) &
 IPV6_FLOWLABEL_MASK;
 
-   if (data[IFLA_VXLAN_LEARNING]) {
-   if (nla_get_u8(data[IFLA_VXLAN_LEARNING]))
-   conf->flags |= VXLAN_F_LEARN;
-   else
-   conf->flags &= ~VXLAN_F_LEARN;
-   } else if (!changelink) {
+   if (data[IFLA_VXLAN_LEARNING])
+   vxlan_nl2flag(conf, data, IFLA_VXLAN_LEARNING,
+ VXLAN_F_LEARN);
+   else if (!changelink)
/* default to learn on a new device */
conf->flags |= VXLAN_F_LEARN;
-   }
 
if (data[IFLA_VXLAN_AGEING])
conf->age_interval = nla_get_u32(data[IFLA_VXLAN_AGEING]);
 
-   if (data[IFLA_VXLAN_PROXY]) {
-   if (nla_get_u8(data[IFLA_VXLAN_PROXY]))
-   conf->flags |= VXLAN_F_PROXY;
-   }
-
-   if (data[IFLA_VXLAN_RSC]) {
-   if (nla_get_u8(data[IFLA_VXLAN_RSC]))
-   conf->flags |= VXLAN_F_RSC;
-   }
-
-   if (data[IFLA_VXLAN_L2MISS]) {
-   if (nla_get_u8(data[IFLA_VXLAN_L2MISS]))
-   conf->flags |= VXLAN_F_L2MISS;
-   }
-
-   if (data[IFLA_VXLAN_L3MISS]) {
-   if (nla_get_u8(data[IFLA_VXLAN_L3MISS]))
-   conf->flags |= VXLAN_F_L3MISS;
-   }
+   vxlan_nl2flag(conf, data, IFLA_VXLAN_PROXY, VXLAN_F_PROXY);
+   vxlan_nl2flag(conf, data, IFLA_VXLAN_RSC, VXLAN_F_RSC);
+   vxlan_nl2flag(conf, data, IFLA_VXLAN_L2MISS, VXLAN_F_L2MISS);
+   vxlan_nl2flag(conf, data, IFLA_VXLAN_L3MISS, VXLAN_F_L3MISS);
 
if (data[IFLA_VXLAN_LIMIT]) {
if (changelink) {
@@ -3514,8 +3513,8 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct 
nlattr *data[],
"Cannot change metadata flag");
return -EOPNOTSUPP;
}
-   if (nla_get_u8(data[IFLA_VXLAN_COLLECT_METADATA]))
-   conf->flags |= VXLAN_F_COLLECT_METADATA;
+   vxlan_nl2flag(conf, data, IFLA_VXLAN_COLLECT_METADATA,
+ VXLAN_F_COLLECT_METADATA);
}
 
if (data[IFLA_VXLAN_PORT_RANGE]) {
@@ -3553,34 +3552,26 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct 
nlattr *data[],
conf->flags |= VXLAN_F_UDP_ZERO_CSUM_TX;
}
 
-   if (data[IFLA_VXLAN_UDP_ZERO_CSUM6_TX]) {
-   if (nla_get_u8(data[IFLA_VXLAN_UDP_ZERO_CSUM6_TX]))
-   conf->flags |= VXLAN_F_UDP_ZERO_CSUM6_TX;
-   }
+   vxlan_nl2flag(conf, data, IFLA_VXLAN_UDP_ZERO_CSUM6_TX,
+ VXLAN_F_UDP_ZERO_CSUM6_TX);
 
-   if (data[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]) {
-   if (nla_get_u8(data[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]))
-   conf->flags |= VXLAN_F_UDP_ZERO_CSUM6_RX;
-   }
+   vxlan_nl2flag(conf, data, IFLA_VXLAN_UDP_ZERO_CSUM6_RX,
+ VXLAN_F_UDP_ZERO_CSUM6_RX);
 
-   if (data[IFLA_VXLAN_REMCSUM_TX]) {
-   if (nla_get_u8(data[IFLA_VXLAN_REMCSUM_TX]))
-   conf->flags |= VXLAN_F_REMCSUM_TX;
-   }
+   vxlan_nl2flag(conf, data, IFLA_VXLAN_REMCSUM_TX,
+ VXLAN_F_REMCSUM_TX);

[PATCH net-next v2 2/3] vxlan: extack support for some changelink cases

2018-11-28 Thread Roopa Prabhu

From: Roopa Prabhu 

Signed-off-by: Roopa Prabhu 
---
 drivers/net/vxlan.c | 76 +
 1 file changed, 59 insertions(+), 17 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 73caa65..4cb6b50 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3376,7 +3376,7 @@ static int __vxlan_dev_create(struct net *net, struct 
net_device *dev,
 
 static int vxlan_nl2conf(struct nlattr *tb[], struct nlattr *data[],
 struct net_device *dev, struct vxlan_config *conf,
-bool changelink)
+bool changelink, struct netlink_ext_ack *extack)
 {
struct vxlan_dev *vxlan = netdev_priv(dev);
 
@@ -3389,40 +3389,60 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct 
nlattr *data[],
if (data[IFLA_VXLAN_ID]) {
__be32 vni = cpu_to_be32(nla_get_u32(data[IFLA_VXLAN_ID]));
 
-   if (changelink && (vni != conf->vni))
+   if (changelink && (vni != conf->vni)) {
+   NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_ID],
+   "Cannot change vni");
return -EOPNOTSUPP;
+   }
conf->vni = cpu_to_be32(nla_get_u32(data[IFLA_VXLAN_ID]));
}
 
if (data[IFLA_VXLAN_GROUP]) {
-   if (changelink && (conf->remote_ip.sa.sa_family != AF_INET))
+   if (changelink && (conf->remote_ip.sa.sa_family != AF_INET)) {
+   NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_GROUP],
+   "New group addr family does not 
match old group");
return -EOPNOTSUPP;
-
+   }
conf->remote_ip.sin.sin_addr.s_addr = 
nla_get_in_addr(data[IFLA_VXLAN_GROUP]);
conf->remote_ip.sa.sa_family = AF_INET;
} else if (data[IFLA_VXLAN_GROUP6]) {
-   if (!IS_ENABLED(CONFIG_IPV6))
+   if (!IS_ENABLED(CONFIG_IPV6)) {
+   NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_GROUP6],
+   "IPv6 support not enabled in the 
kernel");
return -EPFNOSUPPORT;
+   }
 
-   if (changelink && (conf->remote_ip.sa.sa_family != AF_INET6))
+   if (changelink && (conf->remote_ip.sa.sa_family != AF_INET6)) {
+   NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_GROUP6],
+   "New group addr family does not 
match old group");
return -EOPNOTSUPP;
+   }
 
conf->remote_ip.sin6.sin6_addr = 
nla_get_in6_addr(data[IFLA_VXLAN_GROUP6]);
conf->remote_ip.sa.sa_family = AF_INET6;
}
 
if (data[IFLA_VXLAN_LOCAL]) {
-   if (changelink && (conf->saddr.sa.sa_family != AF_INET))
+   if (changelink && (conf->saddr.sa.sa_family != AF_INET)) {
+   NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_LOCAL],
+   "New local addr family does not 
match old");
return -EOPNOTSUPP;
+   }
 
conf->saddr.sin.sin_addr.s_addr = 
nla_get_in_addr(data[IFLA_VXLAN_LOCAL]);
conf->saddr.sa.sa_family = AF_INET;
} else if (data[IFLA_VXLAN_LOCAL6]) {
-   if (!IS_ENABLED(CONFIG_IPV6))
+   if (!IS_ENABLED(CONFIG_IPV6)) {
+   NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_LOCAL6],
+   "IPv6 support not enabled in the 
kernel");
return -EPFNOSUPPORT;
+   }
 
-   if (changelink && (conf->saddr.sa.sa_family != AF_INET6))
+   if (changelink && (conf->saddr.sa.sa_family != AF_INET6)) {
+   NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_LOCAL6],
+   "New local6 addr family does not 
match old");
return -EOPNOTSUPP;
+   }
 
/* TODO: respect scope id */
conf->saddr.sin6.sin6_addr = 
nla_get_in6_addr(data[IFLA_VXLAN_LOCAL6]);
@@ -3479,14 +3499,21 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct 
nlattr *data[],
}
 
if (data[IFLA_VXLAN_LIMIT]) {
-   if (changelink)
+   if (changelink) {
+   NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_LIMIT],
+   "Cannot change limit");
return -EOPNOTSUPP;
+   }
conf->addrmax = nla_get_u32(data[IFLA_VXLAN_LIMIT]);

[PATCH net-next v2 1/3] vxlan: support changelink for a few more attributes

2018-11-28 Thread Roopa Prabhu

From: Roopa Prabhu 

We started very conservative when supporting changelink
especially because not all attribute changes could be
tested. This patch opens up a few more attributes for
changelink. The reason for choosing this set of attributes
is based on code references for these attributes. I have
tested TTL changes and did some changelink api testing
to sanity test the others.

Signed-off-by: Roopa Prabhu 
---
 drivers/net/vxlan.c | 36 
 1 file changed, 4 insertions(+), 32 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 9110662..73caa65 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3438,11 +3438,8 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct 
nlattr *data[],
if (data[IFLA_VXLAN_TTL])
conf->ttl = nla_get_u8(data[IFLA_VXLAN_TTL]);
 
-   if (data[IFLA_VXLAN_TTL_INHERIT]) {
-   if (changelink)
-   return -EOPNOTSUPP;
+   if (data[IFLA_VXLAN_TTL_INHERIT])
conf->flags |= VXLAN_F_TTL_INHERIT;
-   }
 
if (data[IFLA_VXLAN_LABEL])
conf->label = nla_get_be32(data[IFLA_VXLAN_LABEL]) &
@@ -3462,29 +3459,21 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct 
nlattr *data[],
conf->age_interval = nla_get_u32(data[IFLA_VXLAN_AGEING]);
 
if (data[IFLA_VXLAN_PROXY]) {
-   if (changelink)
-   return -EOPNOTSUPP;
if (nla_get_u8(data[IFLA_VXLAN_PROXY]))
conf->flags |= VXLAN_F_PROXY;
}
 
if (data[IFLA_VXLAN_RSC]) {
-   if (changelink)
-   return -EOPNOTSUPP;
if (nla_get_u8(data[IFLA_VXLAN_RSC]))
conf->flags |= VXLAN_F_RSC;
}
 
if (data[IFLA_VXLAN_L2MISS]) {
-   if (changelink)
-   return -EOPNOTSUPP;
if (nla_get_u8(data[IFLA_VXLAN_L2MISS]))
conf->flags |= VXLAN_F_L2MISS;
}
 
if (data[IFLA_VXLAN_L3MISS]) {
-   if (changelink)
-   return -EOPNOTSUPP;
if (nla_get_u8(data[IFLA_VXLAN_L3MISS]))
conf->flags |= VXLAN_F_L3MISS;
}
@@ -3527,50 +3516,33 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct 
nlattr *data[],
}
 
if (data[IFLA_VXLAN_UDP_ZERO_CSUM6_TX]) {
-   if (changelink)
-   return -EOPNOTSUPP;
if (nla_get_u8(data[IFLA_VXLAN_UDP_ZERO_CSUM6_TX]))
conf->flags |= VXLAN_F_UDP_ZERO_CSUM6_TX;
}
 
if (data[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]) {
-   if (changelink)
-   return -EOPNOTSUPP;
if (nla_get_u8(data[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]))
conf->flags |= VXLAN_F_UDP_ZERO_CSUM6_RX;
}
 
if (data[IFLA_VXLAN_REMCSUM_TX]) {
-   if (changelink)
-   return -EOPNOTSUPP;
if (nla_get_u8(data[IFLA_VXLAN_REMCSUM_TX]))
conf->flags |= VXLAN_F_REMCSUM_TX;
}
 
if (data[IFLA_VXLAN_REMCSUM_RX]) {
-   if (changelink)
-   return -EOPNOTSUPP;
if (nla_get_u8(data[IFLA_VXLAN_REMCSUM_RX]))
conf->flags |= VXLAN_F_REMCSUM_RX;
}
 
-   if (data[IFLA_VXLAN_GBP]) {
-   if (changelink)
-   return -EOPNOTSUPP;
+   if (data[IFLA_VXLAN_GBP])
conf->flags |= VXLAN_F_GBP;
-   }
 
-   if (data[IFLA_VXLAN_GPE]) {
-   if (changelink)
-   return -EOPNOTSUPP;
+   if (data[IFLA_VXLAN_GPE])
conf->flags |= VXLAN_F_GPE;
-   }
 
-   if (data[IFLA_VXLAN_REMCSUM_NOPARTIAL]) {
-   if (changelink)
-   return -EOPNOTSUPP;
+   if (data[IFLA_VXLAN_REMCSUM_NOPARTIAL])
conf->flags |= VXLAN_F_REMCSUM_NOPARTIAL;
-   }
 
if (tb[IFLA_MTU]) {
if (changelink)
-- 
2.1.4

Re: [PATCH net-next 3/3] vxlan: move flag sets to use a helper func vxlan_nl2conf

2018-11-28 Thread Roopa Prabhu

On Wed, Nov 28, 2018 at 2:10 PM Roopa Prabhu  wrote:
>
> From: Roopa Prabhu 
>
> Signed-off-by: Roopa Prabhu 
> ---

just noticed a typo in the title, spinning v2.


>  drivers/net/vxlan.c | 95 
> -
>  1 file changed, 43 insertions(+), 52 deletions(-)
>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 4cb6b50..47671fd 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -3374,6 +3374,23 @@ static int __vxlan_dev_create(struct net *net, struct 
> net_device *dev,
> return err;
>  }
>
> +/* Set/clear flags based on attribute */
> +static void vxlan_nl2flag(struct vxlan_config *conf, struct nlattr *tb[],
> + int attrtype, unsigned long mask)
> +{
> +   unsigned long flags;
> +
> +   if (!tb[attrtype])
> +   return;
> +
> +   if (nla_get_u8(tb[attrtype]))
> +   flags = conf->flags | mask;
> +   else
> +   flags = conf->flags & ~mask;
> +
> +   conf->flags = flags;
> +}
> +
>  static int vxlan_nl2conf(struct nlattr *tb[], struct nlattr *data[],
>  struct net_device *dev, struct vxlan_config *conf,
>  bool changelink, struct netlink_ext_ack *extack)
> @@ -3458,45 +3475,27 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct 
> nlattr *data[],
> if (data[IFLA_VXLAN_TTL])
> conf->ttl = nla_get_u8(data[IFLA_VXLAN_TTL]);
>
> -   if (data[IFLA_VXLAN_TTL_INHERIT])
> -   conf->flags |= VXLAN_F_TTL_INHERIT;
> +   vxlan_nl2flag(conf, data, IFLA_VXLAN_TTL_INHERIT,
> + VXLAN_F_TTL_INHERIT);
>
> if (data[IFLA_VXLAN_LABEL])
> conf->label = nla_get_be32(data[IFLA_VXLAN_LABEL]) &
>  IPV6_FLOWLABEL_MASK;
>
> -   if (data[IFLA_VXLAN_LEARNING]) {
> -   if (nla_get_u8(data[IFLA_VXLAN_LEARNING]))
> -   conf->flags |= VXLAN_F_LEARN;
> -   else
> -   conf->flags &= ~VXLAN_F_LEARN;
> -   } else if (!changelink) {
> +   if (data[IFLA_VXLAN_LEARNING])
> +   vxlan_nl2flag(conf, data, IFLA_VXLAN_LEARNING,
> + VXLAN_F_LEARN);
> +   else if (!changelink)
> /* default to learn on a new device */
> conf->flags |= VXLAN_F_LEARN;
> -   }
>
> if (data[IFLA_VXLAN_AGEING])
> conf->age_interval = nla_get_u32(data[IFLA_VXLAN_AGEING]);
>
> -   if (data[IFLA_VXLAN_PROXY]) {
> -   if (nla_get_u8(data[IFLA_VXLAN_PROXY]))
> -   conf->flags |= VXLAN_F_PROXY;
> -   }
> -
> -   if (data[IFLA_VXLAN_RSC]) {
> -   if (nla_get_u8(data[IFLA_VXLAN_RSC]))
> -   conf->flags |= VXLAN_F_RSC;
> -   }
> -
> -   if (data[IFLA_VXLAN_L2MISS]) {
> -   if (nla_get_u8(data[IFLA_VXLAN_L2MISS]))
> -   conf->flags |= VXLAN_F_L2MISS;
> -   }
> -
> -   if (data[IFLA_VXLAN_L3MISS]) {
> -   if (nla_get_u8(data[IFLA_VXLAN_L3MISS]))
> -   conf->flags |= VXLAN_F_L3MISS;
> -   }
> +   vxlan_nl2flag(conf, data, IFLA_VXLAN_PROXY, VXLAN_F_PROXY);
> +   vxlan_nl2flag(conf, data, IFLA_VXLAN_RSC, VXLAN_F_RSC);
> +   vxlan_nl2flag(conf, data, IFLA_VXLAN_L2MISS, VXLAN_F_L2MISS);
> +   vxlan_nl2flag(conf, data, IFLA_VXLAN_L3MISS, VXLAN_F_L3MISS);
>
> if (data[IFLA_VXLAN_LIMIT]) {
> if (changelink) {
> @@ -3514,8 +3513,8 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct 
> nlattr *data[],
> "Cannot change metadata flag");
> return -EOPNOTSUPP;
> }
> -   if (nla_get_u8(data[IFLA_VXLAN_COLLECT_METADATA]))
> -   conf->flags |= VXLAN_F_COLLECT_METADATA;
> +   vxlan_nl2flag(conf, data, IFLA_VXLAN_COLLECT_METADATA,
> + VXLAN_F_COLLECT_METADATA);
> }
>
> if (data[IFLA_VXLAN_PORT_RANGE]) {
> @@ -3553,34 +3552,26 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct 
> nlattr *data[],
> conf->flags |= VXLAN_F_UDP_ZERO_CSUM_TX;
> }
>
> -   if (data[IFLA_VXLAN_UDP_ZERO_CSUM6_TX]) {
> -   if (nla_get_u8(data[IFLA_VXLAN_UDP_ZERO_CSUM6_TX]))
> -   conf->flags |= VXLAN_F_UDP_ZERO_CSUM6_TX;
> -   }
> +   vxlan_nl2flag(conf, data,

[PATCH net-next 1/3] vxlan: support changelink for a few more attributes

2018-11-28 Thread Roopa Prabhu

From: Roopa Prabhu 

We started very conservative when supporting changelink
especially because not all attribute changes could be
tested. This patch opens up a few more attributes for
changelink. The reason for choosing this set of attributes
is based on code references for these attributes. I have
tested TTL changes and did some changelink api testing
to sanity test the others.

Signed-off-by: Roopa Prabhu 
---
 drivers/net/vxlan.c | 36 
 1 file changed, 4 insertions(+), 32 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 9110662..73caa65 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3438,11 +3438,8 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct 
nlattr *data[],
if (data[IFLA_VXLAN_TTL])
conf->ttl = nla_get_u8(data[IFLA_VXLAN_TTL]);
 
-   if (data[IFLA_VXLAN_TTL_INHERIT]) {
-   if (changelink)
-   return -EOPNOTSUPP;
+   if (data[IFLA_VXLAN_TTL_INHERIT])
conf->flags |= VXLAN_F_TTL_INHERIT;
-   }
 
if (data[IFLA_VXLAN_LABEL])
conf->label = nla_get_be32(data[IFLA_VXLAN_LABEL]) &
@@ -3462,29 +3459,21 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct 
nlattr *data[],
conf->age_interval = nla_get_u32(data[IFLA_VXLAN_AGEING]);
 
if (data[IFLA_VXLAN_PROXY]) {
-   if (changelink)
-   return -EOPNOTSUPP;
if (nla_get_u8(data[IFLA_VXLAN_PROXY]))
conf->flags |= VXLAN_F_PROXY;
}
 
if (data[IFLA_VXLAN_RSC]) {
-   if (changelink)
-   return -EOPNOTSUPP;
if (nla_get_u8(data[IFLA_VXLAN_RSC]))
conf->flags |= VXLAN_F_RSC;
}
 
if (data[IFLA_VXLAN_L2MISS]) {
-   if (changelink)
-   return -EOPNOTSUPP;
if (nla_get_u8(data[IFLA_VXLAN_L2MISS]))
conf->flags |= VXLAN_F_L2MISS;
}
 
if (data[IFLA_VXLAN_L3MISS]) {
-   if (changelink)
-   return -EOPNOTSUPP;
if (nla_get_u8(data[IFLA_VXLAN_L3MISS]))
conf->flags |= VXLAN_F_L3MISS;
}
@@ -3527,50 +3516,33 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct 
nlattr *data[],
}
 
if (data[IFLA_VXLAN_UDP_ZERO_CSUM6_TX]) {
-   if (changelink)
-   return -EOPNOTSUPP;
if (nla_get_u8(data[IFLA_VXLAN_UDP_ZERO_CSUM6_TX]))
conf->flags |= VXLAN_F_UDP_ZERO_CSUM6_TX;
}
 
if (data[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]) {
-   if (changelink)
-   return -EOPNOTSUPP;
if (nla_get_u8(data[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]))
conf->flags |= VXLAN_F_UDP_ZERO_CSUM6_RX;
}
 
if (data[IFLA_VXLAN_REMCSUM_TX]) {
-   if (changelink)
-   return -EOPNOTSUPP;
if (nla_get_u8(data[IFLA_VXLAN_REMCSUM_TX]))
conf->flags |= VXLAN_F_REMCSUM_TX;
}
 
if (data[IFLA_VXLAN_REMCSUM_RX]) {
-   if (changelink)
-   return -EOPNOTSUPP;
if (nla_get_u8(data[IFLA_VXLAN_REMCSUM_RX]))
conf->flags |= VXLAN_F_REMCSUM_RX;
}
 
-   if (data[IFLA_VXLAN_GBP]) {
-   if (changelink)
-   return -EOPNOTSUPP;
+   if (data[IFLA_VXLAN_GBP])
conf->flags |= VXLAN_F_GBP;
-   }
 
-   if (data[IFLA_VXLAN_GPE]) {
-   if (changelink)
-   return -EOPNOTSUPP;
+   if (data[IFLA_VXLAN_GPE])
conf->flags |= VXLAN_F_GPE;
-   }
 
-   if (data[IFLA_VXLAN_REMCSUM_NOPARTIAL]) {
-   if (changelink)
-   return -EOPNOTSUPP;
+   if (data[IFLA_VXLAN_REMCSUM_NOPARTIAL])
conf->flags |= VXLAN_F_REMCSUM_NOPARTIAL;
-   }
 
if (tb[IFLA_MTU]) {
if (changelink)
-- 
2.1.4

[PATCH net-next 3/3] vxlan: move flag sets to use a helper func vxlan_nl2conf

2018-11-28 Thread Roopa Prabhu

From: Roopa Prabhu 

Signed-off-by: Roopa Prabhu 
---
 drivers/net/vxlan.c | 95 -
 1 file changed, 43 insertions(+), 52 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 4cb6b50..47671fd 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3374,6 +3374,23 @@ static int __vxlan_dev_create(struct net *net, struct 
net_device *dev,
return err;
 }
 
+/* Set/clear flags based on attribute */
+static void vxlan_nl2flag(struct vxlan_config *conf, struct nlattr *tb[],
+ int attrtype, unsigned long mask)
+{
+   unsigned long flags;
+
+   if (!tb[attrtype])
+   return;
+
+   if (nla_get_u8(tb[attrtype]))
+   flags = conf->flags | mask;
+   else
+   flags = conf->flags & ~mask;
+
+   conf->flags = flags;
+}
+
 static int vxlan_nl2conf(struct nlattr *tb[], struct nlattr *data[],
 struct net_device *dev, struct vxlan_config *conf,
 bool changelink, struct netlink_ext_ack *extack)
@@ -3458,45 +3475,27 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct 
nlattr *data[],
if (data[IFLA_VXLAN_TTL])
conf->ttl = nla_get_u8(data[IFLA_VXLAN_TTL]);
 
-   if (data[IFLA_VXLAN_TTL_INHERIT])
-   conf->flags |= VXLAN_F_TTL_INHERIT;
+   vxlan_nl2flag(conf, data, IFLA_VXLAN_TTL_INHERIT,
+ VXLAN_F_TTL_INHERIT);
 
if (data[IFLA_VXLAN_LABEL])
conf->label = nla_get_be32(data[IFLA_VXLAN_LABEL]) &
 IPV6_FLOWLABEL_MASK;
 
-   if (data[IFLA_VXLAN_LEARNING]) {
-   if (nla_get_u8(data[IFLA_VXLAN_LEARNING]))
-   conf->flags |= VXLAN_F_LEARN;
-   else
-   conf->flags &= ~VXLAN_F_LEARN;
-   } else if (!changelink) {
+   if (data[IFLA_VXLAN_LEARNING])
+   vxlan_nl2flag(conf, data, IFLA_VXLAN_LEARNING,
+ VXLAN_F_LEARN);
+   else if (!changelink)
/* default to learn on a new device */
conf->flags |= VXLAN_F_LEARN;
-   }
 
if (data[IFLA_VXLAN_AGEING])
conf->age_interval = nla_get_u32(data[IFLA_VXLAN_AGEING]);
 
-   if (data[IFLA_VXLAN_PROXY]) {
-   if (nla_get_u8(data[IFLA_VXLAN_PROXY]))
-   conf->flags |= VXLAN_F_PROXY;
-   }
-
-   if (data[IFLA_VXLAN_RSC]) {
-   if (nla_get_u8(data[IFLA_VXLAN_RSC]))
-   conf->flags |= VXLAN_F_RSC;
-   }
-
-   if (data[IFLA_VXLAN_L2MISS]) {
-   if (nla_get_u8(data[IFLA_VXLAN_L2MISS]))
-   conf->flags |= VXLAN_F_L2MISS;
-   }
-
-   if (data[IFLA_VXLAN_L3MISS]) {
-   if (nla_get_u8(data[IFLA_VXLAN_L3MISS]))
-   conf->flags |= VXLAN_F_L3MISS;
-   }
+   vxlan_nl2flag(conf, data, IFLA_VXLAN_PROXY, VXLAN_F_PROXY);
+   vxlan_nl2flag(conf, data, IFLA_VXLAN_RSC, VXLAN_F_RSC);
+   vxlan_nl2flag(conf, data, IFLA_VXLAN_L2MISS, VXLAN_F_L2MISS);
+   vxlan_nl2flag(conf, data, IFLA_VXLAN_L3MISS, VXLAN_F_L3MISS);
 
if (data[IFLA_VXLAN_LIMIT]) {
if (changelink) {
@@ -3514,8 +3513,8 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct 
nlattr *data[],
"Cannot change metadata flag");
return -EOPNOTSUPP;
}
-   if (nla_get_u8(data[IFLA_VXLAN_COLLECT_METADATA]))
-   conf->flags |= VXLAN_F_COLLECT_METADATA;
+   vxlan_nl2flag(conf, data, IFLA_VXLAN_COLLECT_METADATA,
+ VXLAN_F_COLLECT_METADATA);
}
 
if (data[IFLA_VXLAN_PORT_RANGE]) {
@@ -3553,34 +3552,26 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct 
nlattr *data[],
conf->flags |= VXLAN_F_UDP_ZERO_CSUM_TX;
}
 
-   if (data[IFLA_VXLAN_UDP_ZERO_CSUM6_TX]) {
-   if (nla_get_u8(data[IFLA_VXLAN_UDP_ZERO_CSUM6_TX]))
-   conf->flags |= VXLAN_F_UDP_ZERO_CSUM6_TX;
-   }
+   vxlan_nl2flag(conf, data, IFLA_VXLAN_UDP_ZERO_CSUM6_TX,
+ VXLAN_F_UDP_ZERO_CSUM6_TX);
 
-   if (data[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]) {
-   if (nla_get_u8(data[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]))
-   conf->flags |= VXLAN_F_UDP_ZERO_CSUM6_RX;
-   }
+   vxlan_nl2flag(conf, data, IFLA_VXLAN_UDP_ZERO_CSUM6_RX,
+ VXLAN_F_UDP_ZERO_CSUM6_RX);
 
-   if (data[IFLA_VXLAN_REMCSUM_TX]) {
-   if (nla_get_u8(data[IFLA_VXLAN_REMCSUM_TX]))
-   conf->flags |= VXLAN_F_REMCSUM_TX;
-   }
+   vxlan_nl2flag(conf, data, IFLA_VXLAN_REMCSUM_TX,
+ VXLAN_F_REMCSUM_TX);

[PATCH net-next 2/3] vxlan: extack support for some changelink cases

2018-11-28 Thread Roopa Prabhu

From: Roopa Prabhu 

Signed-off-by: Roopa Prabhu 
---
 drivers/net/vxlan.c | 76 +
 1 file changed, 59 insertions(+), 17 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 73caa65..4cb6b50 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3376,7 +3376,7 @@ static int __vxlan_dev_create(struct net *net, struct 
net_device *dev,
 
 static int vxlan_nl2conf(struct nlattr *tb[], struct nlattr *data[],
 struct net_device *dev, struct vxlan_config *conf,
-bool changelink)
+bool changelink, struct netlink_ext_ack *extack)
 {
struct vxlan_dev *vxlan = netdev_priv(dev);
 
@@ -3389,40 +3389,60 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct 
nlattr *data[],
if (data[IFLA_VXLAN_ID]) {
__be32 vni = cpu_to_be32(nla_get_u32(data[IFLA_VXLAN_ID]));
 
-   if (changelink && (vni != conf->vni))
+   if (changelink && (vni != conf->vni)) {
+   NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_ID],
+   "Cannot change vni");
return -EOPNOTSUPP;
+   }
conf->vni = cpu_to_be32(nla_get_u32(data[IFLA_VXLAN_ID]));
}
 
if (data[IFLA_VXLAN_GROUP]) {
-   if (changelink && (conf->remote_ip.sa.sa_family != AF_INET))
+   if (changelink && (conf->remote_ip.sa.sa_family != AF_INET)) {
+   NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_GROUP],
+   "New group addr family does not 
match old group");
return -EOPNOTSUPP;
-
+   }
conf->remote_ip.sin.sin_addr.s_addr = 
nla_get_in_addr(data[IFLA_VXLAN_GROUP]);
conf->remote_ip.sa.sa_family = AF_INET;
} else if (data[IFLA_VXLAN_GROUP6]) {
-   if (!IS_ENABLED(CONFIG_IPV6))
+   if (!IS_ENABLED(CONFIG_IPV6)) {
+   NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_GROUP6],
+   "IPv6 support not enabled in the 
kernel");
return -EPFNOSUPPORT;
+   }
 
-   if (changelink && (conf->remote_ip.sa.sa_family != AF_INET6))
+   if (changelink && (conf->remote_ip.sa.sa_family != AF_INET6)) {
+   NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_GROUP6],
+   "New group addr family does not 
match old group");
return -EOPNOTSUPP;
+   }
 
conf->remote_ip.sin6.sin6_addr = 
nla_get_in6_addr(data[IFLA_VXLAN_GROUP6]);
conf->remote_ip.sa.sa_family = AF_INET6;
}
 
if (data[IFLA_VXLAN_LOCAL]) {
-   if (changelink && (conf->saddr.sa.sa_family != AF_INET))
+   if (changelink && (conf->saddr.sa.sa_family != AF_INET)) {
+   NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_LOCAL],
+   "New local addr family does not 
match old");
return -EOPNOTSUPP;
+   }
 
conf->saddr.sin.sin_addr.s_addr = 
nla_get_in_addr(data[IFLA_VXLAN_LOCAL]);
conf->saddr.sa.sa_family = AF_INET;
} else if (data[IFLA_VXLAN_LOCAL6]) {
-   if (!IS_ENABLED(CONFIG_IPV6))
+   if (!IS_ENABLED(CONFIG_IPV6)) {
+   NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_LOCAL6],
+   "IPv6 support not enabled in the 
kernel");
return -EPFNOSUPPORT;
+   }
 
-   if (changelink && (conf->saddr.sa.sa_family != AF_INET6))
+   if (changelink && (conf->saddr.sa.sa_family != AF_INET6)) {
+   NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_LOCAL6],
+   "New local6 addr family does not 
match old");
return -EOPNOTSUPP;
+   }
 
/* TODO: respect scope id */
conf->saddr.sin6.sin6_addr = 
nla_get_in6_addr(data[IFLA_VXLAN_LOCAL6]);
@@ -3479,14 +3499,21 @@ static int vxlan_nl2conf(struct nlattr *tb[], struct 
nlattr *data[],
}
 
if (data[IFLA_VXLAN_LIMIT]) {
-   if (changelink)
+   if (changelink) {
+   NL_SET_ERR_MSG_ATTR(extack, tb[IFLA_VXLAN_LIMIT],
+   "Cannot change limit");
return -EOPNOTSUPP;
+   }
conf->addrmax = nla_get_u32(data[IFLA_VXLAN_LIMIT]);

[PATCH net-next 0/3] vxlan: a few minor cleanups

2018-11-28 Thread Roopa Prabhu

From: Roopa Prabhu 

Roopa Prabhu (3):
  vxlan: support changelink for a few more attributes
  vxlan: extack support for some changelink cases
  vxlan: move flag sets to use a helper func vxlan_nl2conf

 drivers/net/vxlan.c | 199 +++-
 1 file changed, 102 insertions(+), 97 deletions(-)

-- 
2.1.4

[PATCH iproute2] bridge: make -c match -compressvlans first instead of -color

2018-11-27 Thread Roopa Prabhu

From: Roopa Prabhu 

commit c7c1a1ef51ae ("bridge: colorize output and use JSON print library")
broke previous use of -c to represent compressvlans. This restores
previous use of -c to represent compressvlans. Understand the original
motivation to use -c to represent color consistently everywhere but
there are apps and network interface managers out there that are already
using -c to prepresent compressed vlans.

Fixes: c7c1a1ef51ae ("bridge: colorize output and use JSON print library")
Signed-off-by: Roopa Prabhu 
---
 bridge/bridge.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bridge/bridge.c b/bridge/bridge.c
index 389f1bd..a3d8154 100644
--- a/bridge/bridge.c
+++ b/bridge/bridge.c
@@ -171,9 +171,9 @@ main(int argc, char **argv)
NEXT_ARG();
if (netns_switch(argv[1]))
exit(-1);
-   } else if (matches_color(opt, )) {
} else if (matches(opt, "-compressvlans") == 0) {
++compress_vlans;
+   } else if (matches_color(opt, )) {
} else if (matches(opt, "-force") == 0) {
++force;
} else if (matches(opt, "-json") == 0) {
-- 
2.1.4

Re: [PATCH v5 5/6] vxlan: handle underlay VRF changes

2018-11-26 Thread Roopa Prabhu

On Mon, Nov 26, 2018 at 5:04 PM Alexis Bauvin  wrote:
>
> When underlay VRF changes, either because the lower device itself changed,
> or its VRF changed, this patch releases the current socket of the VXLAN
> device and recreates another one in the right VRF. This allows for
> on-the-fly change of the underlay VRF of a VXLAN device.
>
> Signed-off-by: Alexis Bauvin 
> Reviewed-by: Amine Kherbouche 
> Tested-by: Amine Kherbouche 
> ---

re-iterating my comments on the patch this time.

this version still unconditionally calls reopen even if the current
state of the device is closed (eg vxlan_stop).
generally not in favor of the unconditional open/close in the driver.
Lets see if there are other options.
I interpreted one of Davids suggestions to force the change ordering
from user-space by returning an error.
ie Make the user do a down and up of the vxlan device if he wants to
change the vrf of the default remote dev.

This patch needs more thought, the rest are ok to go in if you
separate them out.

>  drivers/net/vxlan.c | 82 +
>  1 file changed, 82 insertions(+)
>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 8ba0a57ff958..131ee80a38f9 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -3720,6 +3720,33 @@ struct net_device *vxlan_dev_create(struct net *net, 
> const char *name,
>  }
>  EXPORT_SYMBOL_GPL(vxlan_dev_create);
>
> +static int vxlan_reopen(struct vxlan_net *vn, struct vxlan_dev *vxlan)
> +{
> +   int ret = 0;
> +
> +   if (vxlan_addr_multicast(>default_dst.remote_ip) &&
> +   !vxlan_group_used(vn, vxlan))
> +   ret = vxlan_igmp_leave(vxlan);
> +   vxlan_sock_release(vxlan);
> +
> +   if (ret < 0)
> +   return ret;
> +
> +   ret = vxlan_sock_add(vxlan);
> +   if (ret < 0)
> +   return ret;
> +
> +   if (vxlan_addr_multicast(>default_dst.remote_ip)) {
> +   ret = vxlan_igmp_join(vxlan);
> +   if (ret == -EADDRINUSE)
> +   ret = 0;
> +   if (ret)
> +   vxlan_sock_release(vxlan);
> +   }
> +
> +   return ret;
> +}
> +
>  static void vxlan_handle_lowerdev_unregister(struct vxlan_net *vn,
>  struct net_device *dev)
>  {
> @@ -3742,6 +3769,55 @@ static void vxlan_handle_lowerdev_unregister(struct 
> vxlan_net *vn,
> unregister_netdevice_many(_kill);
>  }
>
> +static void vxlan_handle_change_upper(struct vxlan_net *vn,
> + struct net_device *dev)
> +{
> +   struct vxlan_dev *vxlan, *next;
> +
> +   list_for_each_entry_safe(vxlan, next, >vxlan_list, next) {
> +   struct net_device *lower;
> +   int err;
> +
> +   lower = __dev_get_by_index(vxlan->net,
> +  vxlan->cfg.remote_ifindex);
> +   if (!netdev_is_upper_master(lower, dev))
> +   continue;
> +
> +   err = vxlan_reopen(vn, vxlan);
> +   if (err < 0)
> +   netdev_err(vxlan->dev, "Failed to reopen socket: 
> %d\n",
> +  err);
> +   }
> +}
> +
> +static void vxlan_handle_change(struct vxlan_net *vn, struct net_device *dev)
> +{
> +   struct vxlan_dev *vxlan = netdev_priv(dev);
> +   struct vxlan_sock *sock;
> +   int l3mdev_index = 0;
> +
> +#if IS_ENABLED(CONFIG_IPV6)
> +   bool metadata = vxlan->cfg.flags & VXLAN_F_COLLECT_METADATA;
> +   bool ipv6 = vxlan->cfg.flags & VXLAN_F_IPV6 || metadata;
> +
> +   sock = ipv6 ? rcu_dereference(vxlan->vn6_sock)
> +   : rcu_dereference(vxlan->vn4_sock);
> +#else
> +   sock = rcu_dereference(vxlan->vn4_sock);
> +#endif
> +
> +   if (vxlan->cfg.remote_ifindex)
> +   l3mdev_index = l3mdev_master_upper_ifindex_by_index(
> +   vxlan->net, vxlan->cfg.remote_ifindex);
> +   if (sock->sock->sk->sk_bound_dev_if != l3mdev_index) {
> +   int ret = vxlan_reopen(vn, vxlan);
> +
> +   if (ret < 0)
> +   netdev_err(vxlan->dev, "Failed to reopen socket: 
> %d\n",
> +  ret);
> +   }
> +}
> +
>  static int vxlan_netdevice_event(struct notifier_block *unused,
>  unsigned long event, void *ptr)
>  {
> @@ -3756,6 +3832,12 @@ static int vxlan_netdevice_event(struct notifier_block 
> *unused,
> } else if (event == NETDEV_UDP_TUNNEL_PUSH_INFO ||
>event == NETDEV_UDP_TUNNEL_DROP_INFO) {
> vxlan_offload_rx_ports(dev, event == 
> NETDEV_UDP_TUNNEL_PUSH_INFO);
> +   } else if (event == NETDEV_CHANGEUPPER) {
> +   vxlan_handle_change_upper(vn, dev);
> +   } else if (event == NETDEV_CHANGE) {
> +   if (dev->rtnl_link_ops &&
> +

Re: [RFC v4 3/5] vxlan: add support for underlay in non-default VRF

2018-11-26 Thread Roopa Prabhu

On Mon, Nov 26, 2018 at 9:54 AM David Ahern  wrote:
>
> On 11/26/18 9:32 AM, Alexis Bauvin wrote:
> > Thanks for the review. I’ll send a v5 if you have no other comment on
> > this version!
>
> A few comments on the test script; see attached which has the changes.
>
> Mainly the cleanup does not need to be called at the end since you setup
> the exit trap. The cleanup calls ip to delete veth-hv-1 and veth-tap but
> those are moved to other namespaces. 'ip netns exec NAME ip ...' is more
> efficiently done as 'ip -netns NAME ...'. The test results should align
> like this:
>
> Checking HV connectivity  [ OK ]
> Check VM connectivity through VXLAN (underlay in the default VRF) [ OK ]
> Check VM connectivity through VXLAN (underlay in a VRF)   [ OK ]
>
> So it is easy for users to see the PASS/FAIL.
>
> It would be good to copy the topology ascii art into the test script as
> well for future users.
>
> Also, add the test as a separate patch at the end and include it in
> tools/testing/selftests/net/Makefile
>
> Finally, I think you should drop the RFC and send it as a 'ready for
> inclusion'.

I cant seem to find patch 5 in my mail box... so commenting here
(Using reference to patch5 from here
https://marc.info/?l=linux-netdev=154284885815549=2)

Still not convinced that the auto reopen is justified here IMO because
it can be done from user-space and there are many cases where this is
already done from user-space. A few questions for alexis on that,
- What is the reason for handling NETDEV_CHANGE on the vxlan device
from the notifier handler. It can be really done in the changelink
handler, correct  ?
- Also, IIUC, patch5 blindly re-opens the vxlan device without
considering if the admin had set it to down last (ie the last state on
it was vxlan_close). is that correct ?

(Don't want to block the entire series for just patch5. Patch5 can be
done incrementally after we converge on it. The rest of the series
looks good as David has already reviewed.  And nice to see the test!).

Re: [RFC v3 3/3] vxlan: handle underlay VRF changes

2018-11-20 Thread Roopa Prabhu

On Tue, Nov 20, 2018 at 7:04 AM David Ahern  wrote:
>
> On 11/20/18 7:23 AM, Alexis Bauvin wrote:
> > When underlay VRF changes, either because the lower device itself changed,
> > or its VRF changed, this patch releases the current socket of the VXLAN
> > device and recreates another one in the right VRF. This allows for
> > on-the-fly change of the underlay VRF of a VXLAN device.
> >
> > Signed-off-by: Alexis Bauvin 
> > Reviewed-by: Amine Kherbouche 
> > Tested-by: Amine Kherbouche 
> > ---
> >  drivers/net/vxlan.c | 94 +
> >  1 file changed, 94 insertions(+)
> >
> > diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> > index a3de08122269..1e6ccad6df6a 100644
> > --- a/drivers/net/vxlan.c
> > +++ b/drivers/net/vxlan.c
> > @@ -208,6 +208,18 @@ static inline struct vxlan_rdst 
> > *first_remote_rtnl(struct vxlan_fdb *fdb)
> >   return list_first_entry(>remotes, struct vxlan_rdst, list);
> >  }
> >
> > +static int vxlan_is_in_l3mdev_chain(struct net_device *chain,
> > + struct net_device *dev)
> > +{
> > + if (!chain)
> > + return 0;
> > +
> > + if (chain->ifindex == dev->ifindex)
> > + return 1;
> > + return vxlan_is_in_l3mdev_chain(netdev_master_upper_dev_get(chain),
> > + dev);
> > +}
>
> This should return bool and true/false.
>
> Also, why l3mdev in the name? None of the checks look at whether it is
> an l3mdev master.
>
> And again here, someone more familiar with the vxlan code should review it.
>


I understand the need for patch 2. But I don't understand the need for
the complexity this patch introduces (especially implicit down and up
of the vxlan device).
Alexis, If your underlay routing changes,  you can down and up the
vxlan device from user-space correct ?. This should be true for any
tunnel device.

Re: [RFC v3 2/3] vxlan: add support for underlay in non-default VRF

2018-11-20 Thread Roopa Prabhu

On Tue, Nov 20, 2018 at 6:23 AM Alexis Bauvin  wrote:
>
> Creating a VXLAN device with is underlay in the non-default VRF makes
> egress route lookup fail or incorrect since it will resolve in the
> default VRF, and ingress fail because the socket listens in the default
> VRF.
>
> This patch binds the underlying UDP tunnel socket to the l3mdev of the
> lower device of the VXLAN device. This will listen in the proper VRF and
> output traffic from said l3mdev, matching l3mdev routing rules and
> looking up the correct routing table.
>
> When the VXLAN device does not have a lower device, or the lower device
> is in the default VRF, the socket will not be bound to any interface,
> keeping the previous behaviour.
>
> The underlay l3mdev is deduced from the VXLAN lower device
> (IFLA_VXLAN_LINK).
>
> The l3mdev_master_upper_ifindex_by_index function has been added to
> l3mdev. Its goal is to fetch the effective l3mdev of an interface which
> is not a direct slave of said l3mdev. It handles the following example,
> properly resolving the l3mdev of eth0 to vrf-blue:
>
> +--+ +-+
> |  | | |
> | vrf-blue | | vrf-red |
> |  | | |
> ++-+ +++
>  ||
>  ||
> ++-+ +++
> |  | | |
> | br-blue  | | br-red  |
> |  | | |
> ++-+ +---+-+---+
>  |   | |
>  | +-+ +-+
>  | | |
> ++-++--++   +++
> |  |  lower device  |   |   | |
> |   eth0   | <- - - - - - - | vxlan-red |   | tap-red | (... more taps)
> |  ||   |   | |
> +--++---+   +-+
>
> Signed-off-by: Alexis Bauvin 
> Reviewed-by: Amine Kherbouche 
> Tested-by: Amine Kherbouche 
> ---
>  drivers/net/vxlan.c  | 32 
>  include/net/l3mdev.h | 22 ++
>  net/l3mdev/l3mdev.c  | 18 ++
>  3 files changed, 64 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 27bd586b94b0..a3de08122269 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -212,7 +212,7 @@ static inline struct vxlan_rdst *first_remote_rtnl(struct 
> vxlan_fdb *fdb)
>   * and enabled unshareable flags.
>   */
>  static struct vxlan_sock *vxlan_find_sock(struct net *net, sa_family_t 
> family,
> - __be16 port, u32 flags)
> + __be16 port, u32 flags, int ifindex)
>  {
> struct vxlan_sock *vs;
>
> @@ -221,7 +221,8 @@ static struct vxlan_sock *vxlan_find_sock(struct net 
> *net, sa_family_t family,
> hlist_for_each_entry_rcu(vs, vs_head(net, port), hlist) {
> if (inet_sk(vs->sock->sk)->inet_sport == port &&
> vxlan_get_sk_family(vs) == family &&
> -   vs->flags == flags)
> +   vs->flags == flags &&
> +   vs->sock->sk->sk_bound_dev_if == ifindex)
> return vs;
> }
> return NULL;
> @@ -261,7 +262,7 @@ static struct vxlan_dev *vxlan_find_vni(struct net *net, 
> int ifindex,
>  {
> struct vxlan_sock *vs;
>
> -   vs = vxlan_find_sock(net, family, port, flags);
> +   vs = vxlan_find_sock(net, family, port, flags, ifindex);
> if (!vs)
> return NULL;
>
> @@ -2172,6 +2173,9 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
> net_device *dev,
> struct rtable *rt;
> __be16 df = 0;
>
> +   if (!ifindex)
> +   ifindex = sock4->sock->sk->sk_bound_dev_if;
> +
> rt = vxlan_get_route(vxlan, dev, sock4, skb, ifindex, tos,
>  dst->sin.sin_addr.s_addr,
>  _ip.sin.sin_addr.s_addr,
> @@ -2210,6 +2214,9 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
> net_device *dev,
> } else {
> struct vxlan_sock *sock6 = rcu_dereference(vxlan->vn6_sock);
>
> +   if (!ifindex)
> +   ifindex = sock6->sock->sk->sk_bound_dev_if;
> +
> ndst = vxlan6_get_route(vxlan, dev, sock6, skb, ifindex, tos,
> label, >sin6.sin6_addr,
> _ip.sin6.sin6_addr,
> @@ -2813,7 +2820,7 @@ static const struct ethtool_ops vxlan_ethtool_ops = {
>  };
>
>  static struct socket *vxlan_create_sock(struct

Re: Should the bridge learn from frames with link local destination MAC address?

2018-11-09 Thread Roopa Prabhu

On Fri, Nov 9, 2018 at 8:00 AM Stephen Hemminger
 wrote:
>
> On Fri, 9 Nov 2018 04:24:43 +0100
> Andrew Lunn  wrote:
>
> > Hi Roopa, Nikolay
> >
> > br_handle_frame() looks out for frames with a destination MAC
> > addresses with is Ethernet link local, those which fit
> > 01-80-C2-00-00-XX. It does not normally forward these, but it will
> > deliver them locally.
> >
> > Should the bridge perform learning on such frames?
> >
> > I've got a setup with two bridges connected together with multiple
> > links between them. STP has done its thing, and blocked one of the
> > ports to solve the loop.
> >
> > host0   host1
> > +-+ +-+
> > | lan0 forwarding |-| lan0 forwarding |
> > | |   | |
> > | lan1 forwarding |-| lan1 blocked|
> > +-+   +-+
> >
> > I have LLDP running on both system, and they are sending out periodic
> > frames on each port.
> >
> > Now, lan0 and lan1 on host1 use the same MAC address.  So i see the
> > MAC address bouncing between ports because of the LLDP packets.
> >
> > # bridge monitor
> > 00:26:55:d2:27:a8 dev lan1 master br0
> > 00:26:55:d2:27:a8 dev lan0 master br0
> > 00:26:55:d2:27:a8 dev lan1 master br0
> > 00:26:55:d2:27:a8 dev lan0 master br0
> > 00:26:55:d2:27:a8 dev lan1 master br0
> >
> > This then results in normal traffic from host0 to host1 being sent to
> > the blocked port for some of the time.
> >
> > LLDP is using 01-80-C2-00-00-0E, a link local MAC address. If the
> > bridge did not learn on such frames, i think this setup would
> > work. The bridge would learn from ARP, IP etc, coming from the
> > forwarding port of host1, and the blocked port would be ignored.
> >
> > I've tried a similar setup with a hardware switch, Marvell 6352. It
> > never seems to learn from such frames.
> >
> > Thanks
> >   Andrew
>
> I agree with your analysis. A properly operating 802 compliant bridge
> should not learn link local addresses.  But changing that in Linux bridge
> would probably break some users. There is already a hack to forward link
> local frames. There are many usages of Linux vswitch where this behavior
> might be a problem:
> 1. a container or VM hub
> 2. bump in the wire filter
> 3. L2 nat etc.
>
> So what ever you decide it has to be optional and unfortunately default
> to off.
>

Andrew, I agree with your analysis also. We have hit this problem too
(and we have an internal bug tracking it).
We have not acted on this so far because of the fear of breaking
existing deployments. I am all for fixing this if there is a
clean way.

[PATCH iproute2] bridge: fdb: remove redundant dev string in show output

2018-11-07 Thread Roopa Prabhu

From: Roopa Prabhu 

After commit 4abb8c723a64 ("bridge: fdb: Fix for missing
keywords in non-JSON output"), I am seeing a double print for dev
in bridge fdb show. eg:
"44:38:39:00:6a:82 dev dev bridge vlan 1 master bridge permanent"

this patch removes the redundant print.

Fixes: 4abb8c723a64 ("bridge: fdb: Fix for missing keywords in non-JSON output")
CC: Phil Sutter 
Signed-off-by: Roopa Prabhu 
---
 bridge/fdb.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/bridge/fdb.c b/bridge/fdb.c
index f82938f..a5abc1b 100644
--- a/bridge/fdb.c
+++ b/bridge/fdb.c
@@ -181,13 +181,10 @@ int print_fdb(struct nlmsghdr *n, void *arg)
   "mac", "%s ", lladdr);
}
 
-   if (!filter_index && r->ndm_ifindex) {
-   if (!is_json_context())
-   fprintf(fp, "dev ");
+   if (!filter_index && r->ndm_ifindex)
print_color_string(PRINT_ANY, COLOR_IFNAME,
   "ifname", "dev %s ",
   ll_index_to_name(r->ndm_ifindex));
-   }
 
if (tb[NDA_DST]) {
int family = AF_INET;
-- 
2.1.4

Re: [PATCH net] bridge: do not add port to router list when receives query with source 0.0.0.0

2018-10-26 Thread Roopa Prabhu

On Thu, Oct 25, 2018 at 7:29 PM Hangbin Liu  wrote:
>
> Based on RFC 4541, 2.1.1.  IGMP Forwarding Rules
>
>   The switch supporting IGMP snooping must maintain a list of
>   multicast routers and the ports on which they are attached.  This
>   list can be constructed in any combination of the following ways:
>
>   a) This list should be built by the snooping switch sending
>  Multicast Router Solicitation messages as described in IGMP
>  Multicast Router Discovery [MRDISC].  It may also snoop
>  Multicast Router Advertisement messages sent by and to other
>  nodes.
>
>   b) The arrival port for IGMP Queries (sent by multicast routers)
>  where the source address is not 0.0.0.0.
>
> We should not add the port to router list when receives query with source
> 0.0.0.0.
>
> Reported-by: Ying Xu 
> Signed-off-by: Hangbin Liu 
> ---

Acked-by: Roopa Prabhu

[PATCH net] Revert "neighbour: force neigh_invalidate when NUD_FAILED update is from admin"

2018-10-20 Thread Roopa Prabhu

From: Roopa Prabhu 

This reverts commit 8e326289e3069dfc9fa9c209924668dd031ab8ef.

This patch results in unnecessary netlink notification when one
tries to delete a neigh entry already in NUD_FAILED state. Found
this with a buggy app that tries to delete a NUD_FAILED entry
repeatedly. While the notification issue can be fixed with more
checks, adding more complexity here seems unnecessary. Also,
recent tests with other changes in the neighbour code have
shown that the INCOMPLETE and PROBE checks are good enough for
the original issue.

Signed-off-by: Roopa Prabhu 
---
Dave, Sorry about the revert so late in the release. The issue
is not serious, but i think its better to revert before
it gets into a released kernel. I am happy to fix the
notification issue but seems unnecessary at this point.
Thanks.

 net/core/neighbour.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 91592fc..4e07824 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1148,8 +1148,7 @@ int neigh_update(struct neighbour *neigh, const u8 
*lladdr, u8 new,
neigh->nud_state = new;
err = 0;
notify = old & NUD_VALID;
-   if (((old & (NUD_INCOMPLETE | NUD_PROBE)) ||
-(flags & NEIGH_UPDATE_F_ADMIN)) &&
+   if ((old & (NUD_INCOMPLETE | NUD_PROBE)) &&
(new & NUD_FAILED)) {
neigh_invalidate(neigh);
notify = 1;
-- 
2.1.4

[PATCH iproute2 net-next] ipneigh: support for NTF_EXT_LEARNED flag on neigh entries

2018-10-11 Thread Roopa Prabhu

From: Roopa Prabhu 

Adds new option extern_learn to set NTF_EXT_LEARNED flag
on neigh entries.

Signed-off-by: Roopa Prabhu 
---
 ip/ipneigh.c| 7 ++-
 man/man8/ip-neighbour.8 | 9 -
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/ip/ipneigh.c b/ip/ipneigh.c
index 165546e..042d01f 100644
--- a/ip/ipneigh.c
+++ b/ip/ipneigh.c
@@ -48,7 +48,7 @@ static void usage(void)
 {
fprintf(stderr, "Usage: ip neigh { add | del | change | replace }\n"
"{ ADDR [ lladdr LLADDR ] [ nud STATE ] 
| proxy ADDR } [ dev DEV ]\n");
-   fprintf(stderr, " [ router ]\n\n");
+   fprintf(stderr, " [ router ] [ 
extern_learn ]\n\n");
fprintf(stderr, "   ip neigh { show | flush } [ proxy ] [ to PREFIX 
] [ dev DEV ] [ nud STATE ]\n");
fprintf(stderr, " [ vrf NAME ]\n\n");
fprintf(stderr, "STATE := { permanent | noarp | stale | reachable | 
none |\n"
@@ -142,6 +142,8 @@ static int ipneigh_modify(int cmd, int flags, int argc, 
char **argv)
req.ndm.ndm_flags |= NTF_PROXY;
} else if (strcmp(*argv, "router") == 0) {
req.ndm.ndm_flags |= NTF_ROUTER;
+   } else if (matches(*argv, "extern_learn") == 0) {
+   req.ndm.ndm_flags |= NTF_EXT_LEARNED;
} else if (strcmp(*argv, "dev") == 0) {
NEXT_ARG();
dev = *argv;
@@ -354,6 +356,9 @@ int print_neigh(const struct sockaddr_nl *who, struct 
nlmsghdr *n, void *arg)
if (r->ndm_flags & NTF_PROXY)
print_null(PRINT_ANY, "proxy", " %s", "proxy");
 
+   if (r->ndm_flags & NTF_EXT_LEARNED)
+   print_null(PRINT_ANY, "extern_learn", " %s ", "extern_learn");
+
if (show_stats) {
if (tb[NDA_CACHEINFO])
print_cacheinfo(RTA_DATA(tb[NDA_CACHEINFO]));
diff --git a/man/man8/ip-neighbour.8 b/man/man8/ip-neighbour.8
index db286d1..4a672bb 100644
--- a/man/man8/ip-neighbour.8
+++ b/man/man8/ip-neighbour.8
@@ -24,7 +24,8 @@ ip-neighbour \- neighbour/arp tables management.
 .IR ADDR " } [ "
 .B  dev
 .IR DEV " ] [ "
-.BR router " ] "
+.BR router " ] [ "
+.BR extern_learn " ]"
 
 .ti -8
 .BR "ip neigh" " { " show " | " flush " } [ " proxy " ] [ " to
@@ -85,6 +86,12 @@ indicates whether we are proxying for this neigbour entry
 indicates whether neigbour is a router
 
 .TP
+.BI extern_learn
+this neigh entry was learned externally. This option can be used to
+indicate to the kernel that this is a controller learnt dynamic entry.
+Kernel will not gc such an entry.
+
+.TP
 .BI lladdr " LLADDRESS"
 the link layer address of the neighbour.
 .I LLADDRESS
-- 
2.1.4

[PATCH net-next v2] vxlan: support NTF_USE refresh of fdb entries

2018-10-11 Thread Roopa Prabhu

From: Roopa Prabhu 

This makes use of NTF_USE in vxlan driver consistent
with bridge driver.

Signed-off-by: Roopa Prabhu 
---
v2: fix patch prefix

 drivers/net/vxlan.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index fb0cdbb..018406c 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -697,6 +697,7 @@ static int vxlan_fdb_update(struct vxlan_dev *vxlan,
__be16 port, __be32 src_vni, __be32 vni,
__u32 ifindex, __u8 ndm_flags)
 {
+   __u8 fdb_flags = (ndm_flags & ~NTF_USE);
struct vxlan_rdst *rd = NULL;
struct vxlan_fdb *f;
int notify = 0;
@@ -714,8 +715,8 @@ static int vxlan_fdb_update(struct vxlan_dev *vxlan,
f->updated = jiffies;
notify = 1;
}
-   if (f->flags != ndm_flags) {
-   f->flags = ndm_flags;
+   if (f->flags != fdb_flags) {
+   f->flags = fdb_flags;
f->updated = jiffies;
notify = 1;
}
@@ -737,6 +738,9 @@ static int vxlan_fdb_update(struct vxlan_dev *vxlan,
return rc;
notify |= rc;
}
+
+   if (ndm_flags & NTF_USE)
+   f->used = jiffies;
} else {
if (!(flags & NLM_F_CREATE))
return -ENOENT;
@@ -748,7 +752,7 @@ static int vxlan_fdb_update(struct vxlan_dev *vxlan,
 
netdev_dbg(vxlan->dev, "add %pM -> %pIS\n", mac, ip);
rc = vxlan_fdb_create(vxlan, mac, ip, state, port, src_vni,
- vni, ifindex, ndm_flags, );
+ vni, ifindex, fdb_flags, );
if (rc < 0)
return rc;
notify = 1;
-- 
2.1.4

Re: [PATCH cumulus-4.18.y] vxlan: support NTF_USE refresh of fdb entries

2018-10-11 Thread Roopa Prabhu

On Thu, Oct 11, 2018 at 12:33 PM Roopa Prabhu  wrote:
>
> From: Roopa Prabhu 
>
> This makes use of NTF_USE in vxlan driver consistent
> with bridge driver.
>
> Signed-off-by: Roopa Prabhu 
> ---

pls ignore. wrong patch prefix :)

[PATCH cumulus-4.18.y] vxlan: support NTF_USE refresh of fdb entries

2018-10-11 Thread Roopa Prabhu

From: Roopa Prabhu 

This makes use of NTF_USE in vxlan driver consistent
with bridge driver.

Signed-off-by: Roopa Prabhu 
---
 drivers/net/vxlan.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index fb0cdbb..018406c 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -697,6 +697,7 @@ static int vxlan_fdb_update(struct vxlan_dev *vxlan,
__be16 port, __be32 src_vni, __be32 vni,
__u32 ifindex, __u8 ndm_flags)
 {
+   __u8 fdb_flags = (ndm_flags & ~NTF_USE);
struct vxlan_rdst *rd = NULL;
struct vxlan_fdb *f;
int notify = 0;
@@ -714,8 +715,8 @@ static int vxlan_fdb_update(struct vxlan_dev *vxlan,
f->updated = jiffies;
notify = 1;
}
-   if (f->flags != ndm_flags) {
-   f->flags = ndm_flags;
+   if (f->flags != fdb_flags) {
+   f->flags = fdb_flags;
f->updated = jiffies;
notify = 1;
}
@@ -737,6 +738,9 @@ static int vxlan_fdb_update(struct vxlan_dev *vxlan,
return rc;
notify |= rc;
}
+
+   if (ndm_flags & NTF_USE)
+   f->used = jiffies;
} else {
if (!(flags & NLM_F_CREATE))
return -ENOENT;
@@ -748,7 +752,7 @@ static int vxlan_fdb_update(struct vxlan_dev *vxlan,
 
netdev_dbg(vxlan->dev, "add %pM -> %pIS\n", mac, ip);
rc = vxlan_fdb_create(vxlan, mac, ip, state, port, src_vni,
- vni, ifindex, ndm_flags, );
+ vni, ifindex, fdb_flags, );
if (rc < 0)
return rc;
notify = 1;
-- 
2.1.4

Re: [PATCH net-next 0/9] net: Kernel side filtering for route dumps

2018-10-11 Thread Roopa Prabhu

On Thu, Oct 11, 2018 at 9:16 AM David Ahern  wrote:
>
> On 10/11/18 10:07 AM, Jamal Hadi Salim wrote:
> > On 2018-10-11 11:46 a.m., Sowmini Varadhan wrote:
> >> On (10/11/18 08:26), Stephen Hemminger wrote:
> >>> You can do the something like this already with BPF socket filters.
> >>> But writing BPF for multi-part messages is hard.
> >>
> >> Indeed. And I was just experimenting with this for ARP just last week.
> >> So to handle the caes of "ip neigh show a.b.c.d" without walking through
> >> the entire arp table and filtering in userspace, you could add a
> >> sk_filter()
> >> hook like this:
> >>
> >
> > If this could be done a lot earlier aka at xxx_fill_info() bpf would
> > be a very good answer.
>
> IMO, bpf at the fill_info stage is not appropriate.
>
>
> > skb->sk (hence attached filter) should be available at that point.
> > Classical bpf per Sowmini's example maybe trickier.
> > Better - why dont we have an ebpf hook at this stage and then
> > we dont have to make changes to the kernel when someone adds
> > one more field to the filter?
> >
> > BTW: useful for events as well - not just dumps (as the name
> > fib_dump_filter suggests)
>
> you mean kernel notifications on internal events?
> 1. there is no user socket when notifications are created and the
> *_fill_info is invoked
>
> 2. notifications are global going to potentially many sockets. For these
> cases the existing sk_filter is appropriate.

3. All networking subsystems already have this type of netlink
attribute filtering that apps rely on. This series
just makes it consistent for route dumps. Apps use such mechanism
already when requesting dumps.
Like everywhere else, BPF hook can be an alternate parallel mechanism.

Re: [PATCH net-next v3 1/2] netlink: ipv4 igmp join notifications

2018-10-01 Thread Roopa Prabhu

On Wed, Sep 26, 2018 at 10:23 AM Roopa Prabhu  wrote:
>
> On Tue, Sep 25, 2018 at 2:34 AM, Patrick Ruddy
>  wrote:
> > On Wed, 2018-09-19 at 21:47 -0700, David Ahern wrote:
> >> On 9/18/18 6:12 AM, Patrick Ruddy wrote:
> >> >
> >> > I've hit a small snag with adding the new groups. The number of defined
> >> > groups currently sits at 31 so I can only add one before hitting the
> >>
> >> I believe you have no more available. RTNLGRP_* has been defined from 0
> >> (RTNLGRP_NONE) to 31 (RTNLGRP_IPV6_MROUTE_R) which covers the u32 range.
> >>
> >> > limit defined by the 32 bit groups bitmask in socakddr_nl. I can use 1
> >> > group for both v4 and v6 notifications which seems like the sensible
> >> > options since the AF is carried separately, but it breaks the precedent
> >> > where there are separate IPV4 and IPV6 groups for IFADDR.
> >> >
> >> > I have the combined group patches ready and can share them if that's
> >> > the preference.
> >> >
> >> > Has there been any previous discussion about extending the number of
> >> > availabel groups?
> >> >
> >>
> >> I have not tried it, but from a prior code review I believe you have you
> >> use setsockopt to add groups > 31.
> >
> > I can certainly join the new groups using setsockopt and
> > NETLINK_ADD_MEMBERSHIP.
> > I can't see any examples of extending the defined group list within the
> > kernel so I assume I just add to the RTNLGRP enum list with a suitable
> > comment to indicate that later groups must be joined with the mechanism
> > above or am I missing some other way of dynamically adding groups?
> >
>
> With a quick look, there are other subsystem specific groups:
> xfrm_nlgroups,  nfnetlink_groups ...which i see apps registering using
> NETLINK_ADD_MEMBERSHIP.
>

scratch that. These groups are for different netlink protocols and the
limit on netlink groups per protocol seems to be 32.
We seem to have hit the max on groups for NETLINK_ROUTE  protocol. we
will have to rework the group handling
to make room for more groups. We do need room for more groups in the
future and not just for this patchset.




> seems like an overkill to add something like this for your case.
>
> yet another option to consider:
> use family: RTNL_FAMILY_IPMR/  RTNL_FAMILY_IP6MR  with RTM_GETADDR/DELADDR
> and use the existing groups: RTNLGRP_IPV4_IFADDR /  RTNLGRP_IPV6_IFADDR
>
> (pls check if this will break any existing users)
>
> precedence is ipmr fib rules.

[PATCH iproute2 net-next] ipneigh: update man page and help for router

2018-09-29 Thread Roopa Prabhu

From: Roopa Prabhu 

While at it also add missing text for proxy in the man page.

Signed-off-by: Roopa Prabhu 
---
 ip/ipneigh.c|  1 +
 man/man8/ip-neighbour.8 | 11 ++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/ip/ipneigh.c b/ip/ipneigh.c
index 5747152..165546e 100644
--- a/ip/ipneigh.c
+++ b/ip/ipneigh.c
@@ -48,6 +48,7 @@ static void usage(void)
 {
fprintf(stderr, "Usage: ip neigh { add | del | change | replace }\n"
"{ ADDR [ lladdr LLADDR ] [ nud STATE ] 
| proxy ADDR } [ dev DEV ]\n");
+   fprintf(stderr, " [ router ]\n\n");
fprintf(stderr, "   ip neigh { show | flush } [ proxy ] [ to PREFIX 
] [ dev DEV ] [ nud STATE ]\n");
fprintf(stderr, " [ vrf NAME ]\n\n");
fprintf(stderr, "STATE := { permanent | noarp | stale | reachable | 
none |\n"
diff --git a/man/man8/ip-neighbour.8 b/man/man8/ip-neighbour.8
index bbfe8e7..db286d1 100644
--- a/man/man8/ip-neighbour.8
+++ b/man/man8/ip-neighbour.8
@@ -23,7 +23,8 @@ ip-neighbour \- neighbour/arp tables management.
 .B proxy
 .IR ADDR " } [ "
 .B  dev
-.IR DEV " ]"
+.IR DEV " ] [ "
+.BR router " ] "
 
 .ti -8
 .BR "ip neigh" " { " show " | " flush " } [ " proxy " ] [ " to
@@ -76,6 +77,14 @@ the protocol address of the neighbour. It is either an IPv4 
or IPv6 address.
 the interface to which this neighbour is attached.
 
 .TP
+.BI proxy
+indicates whether we are proxying for this neigbour entry
+
+.TP
+.BI router
+indicates whether neigbour is a router
+
+.TP
 .BI lladdr " LLADDRESS"
 the link layer address of the neighbour.
 .I LLADDRESS
-- 
2.1.4

Re: [PATCH iproute2 net-next] ipneigh: support setting of NTF_ROUTER on neigh entries

2018-09-29 Thread Roopa Prabhu

On Fri, Sep 28, 2018 at 9:57 AM David Ahern  wrote:
>
> On 9/25/18 3:15 PM, Roopa Prabhu wrote:
> > From: Roopa Prabhu 
> >
> > Signed-off-by: Roopa Prabhu 
> > ---
> >  ip/ipneigh.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
>
> applied to iproute2-next.
>
> And then I noticed you did not update the help or the man page. Please
> send a follow up.

Oops, looks like i missed including it in this patch before sending.
will post it today.

Re: [PATCH] MAINTAINERS: change bridge maintainers

2018-09-27 Thread Roopa Prabhu

On Thu, Sep 27, 2018 at 1:47 AM Stephen Hemminger
 wrote:
>
> I haven't been doing reviews only but not active development on bridge
> code for several years. Roopa and Nikolay have been doing most of
> the new features and have agreed to take over as new co-maintainers.
>
> Signed-off-by: Stephen Hemminger 
> ---

Acked-by: Roopa Prabhu 

Thanks

Re: [PATCH net-next v3 1/2] netlink: ipv4 igmp join notifications

2018-09-26 Thread Roopa Prabhu

On Tue, Sep 25, 2018 at 2:34 AM, Patrick Ruddy
 wrote:
> On Wed, 2018-09-19 at 21:47 -0700, David Ahern wrote:
>> On 9/18/18 6:12 AM, Patrick Ruddy wrote:
>> >
>> > I've hit a small snag with adding the new groups. The number of defined
>> > groups currently sits at 31 so I can only add one before hitting the
>>
>> I believe you have no more available. RTNLGRP_* has been defined from 0
>> (RTNLGRP_NONE) to 31 (RTNLGRP_IPV6_MROUTE_R) which covers the u32 range.
>>
>> > limit defined by the 32 bit groups bitmask in socakddr_nl. I can use 1
>> > group for both v4 and v6 notifications which seems like the sensible
>> > options since the AF is carried separately, but it breaks the precedent
>> > where there are separate IPV4 and IPV6 groups for IFADDR.
>> >
>> > I have the combined group patches ready and can share them if that's
>> > the preference.
>> >
>> > Has there been any previous discussion about extending the number of
>> > availabel groups?
>> >
>>
>> I have not tried it, but from a prior code review I believe you have you
>> use setsockopt to add groups > 31.
>
> I can certainly join the new groups using setsockopt and
> NETLINK_ADD_MEMBERSHIP.
> I can't see any examples of extending the defined group list within the
> kernel so I assume I just add to the RTNLGRP enum list with a suitable
> comment to indicate that later groups must be joined with the mechanism
> above or am I missing some other way of dynamically adding groups?
>

With a quick look, there are other subsystem specific groups:
xfrm_nlgroups,  nfnetlink_groups ...which i see apps registering using
NETLINK_ADD_MEMBERSHIP.

seems like an overkill to add something like this for your case.

yet another option to consider:
use family: RTNL_FAMILY_IPMR/  RTNL_FAMILY_IP6MR  with RTM_GETADDR/DELADDR
and use the existing groups: RTNLGRP_IPV4_IFADDR /  RTNLGRP_IPV6_IFADDR

(pls check if this will break any existing users)

precedence is ipmr fib rules.

[PATCH net-next] bridge: br_arp_nd_proxy: set icmp6_router if neigh has NTF_ROUTER

2018-09-25 Thread Roopa Prabhu

From: Roopa Prabhu 

Fixes: ed842faeb2bd ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports")
Signed-off-by: Roopa Prabhu 
---
 net/bridge/br_arp_nd_proxy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bridge/br_arp_nd_proxy.c b/net/bridge/br_arp_nd_proxy.c
index 2cf7716..d42e390 100644
--- a/net/bridge/br_arp_nd_proxy.c
+++ b/net/bridge/br_arp_nd_proxy.c
@@ -311,7 +311,7 @@ static void br_nd_send(struct net_bridge *br, struct 
net_bridge_port *p,
/* Neighbor Advertisement */
memset(na, 0, sizeof(*na) + na_olen);
na->icmph.icmp6_type = NDISC_NEIGHBOUR_ADVERTISEMENT;
-   na->icmph.icmp6_router = 0; /* XXX: should be 1 ? */
+   na->icmph.icmp6_router = (n->flags & NTF_ROUTER) ? 1 : 0;
na->icmph.icmp6_override = 1;
na->icmph.icmp6_solicited = 1;
na->target = ns->target;
-- 
2.1.4

[PATCH iproute2 net-next] ipneigh: support setting of NTF_ROUTER on neigh entries

2018-09-25 Thread Roopa Prabhu

From: Roopa Prabhu 

Signed-off-by: Roopa Prabhu 
---
 ip/ipneigh.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/ip/ipneigh.c b/ip/ipneigh.c
index a0af705..5747152 100644
--- a/ip/ipneigh.c
+++ b/ip/ipneigh.c
@@ -139,6 +139,8 @@ static int ipneigh_modify(int cmd, int flags, int argc, 
char **argv)
dst_ok = 1;
dev_ok = 1;
req.ndm.ndm_flags |= NTF_PROXY;
+   } else if (strcmp(*argv, "router") == 0) {
+   req.ndm.ndm_flags |= NTF_ROUTER;
} else if (strcmp(*argv, "dev") == 0) {
NEXT_ARG();
dev = *argv;
-- 
2.1.4

[PATCH net-next 2/2] neighbour: send netlink notification if NTF_ROUTER changes

2018-09-22 Thread Roopa Prabhu

From: Roopa Prabhu 

send netlink notification if neigh_update results in NTF_ROUTER
change and if NEIGH_UPDATE_F_ISROUTER is on. Also move the
NTF_ROUTER change function into a helper.

Signed-off-by: Roopa Prabhu 
---
 include/net/neighbour.h | 15 +++
 net/core/neighbour.c|  7 ++-
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 6c1eecd..0874f7fc 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -544,4 +544,19 @@ static inline void neigh_update_ext_learned(struct 
neighbour *neigh, u32 flags,
*notify = 1;
}
 }
+
+static inline void neigh_update_is_router(struct neighbour *neigh, u32 flags,
+ int *notify)
+{
+   u8 ndm_flags = 0;
+
+   ndm_flags |= (flags & NEIGH_UPDATE_F_ISROUTER) ? NTF_ROUTER : 0;
+   if ((neigh->flags ^ ndm_flags) & NTF_ROUTER) {
+   if (ndm_flags & NTF_ROUTER)
+   neigh->flags |= NTF_ROUTER;
+   else
+   neigh->flags &= ~NTF_ROUTER;
+   *notify = 1;
+   }
+}
 #endif
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index ca99456..fb89294 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1277,11 +1277,8 @@ int neigh_update(struct neighbour *neigh, const u8 
*lladdr, u8 new,
neigh->arp_queue_len_bytes = 0;
}
 out:
-   if (update_isrouter) {
-   neigh->flags = (flags & NEIGH_UPDATE_F_ISROUTER) ?
-   (neigh->flags | NTF_ROUTER) :
-   (neigh->flags & ~NTF_ROUTER);
-   }
+   if (update_isrouter)
+   neigh_update_is_router(neigh, flags, );
write_unlock_bh(>lock);
 
if (notify)
-- 
2.1.4

[PATCH net-next 1/2] neighbour: allow admin to set NTF_ROUTER

2018-09-22 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch allows admin setting of NTF_ROUTER flag
on a neighbour entry. This enables external control
plane (like bgp evpn) to manage neigh entries with
NTF_ROUTER flag.

Signed-off-by: Roopa Prabhu 
---
 net/core/neighbour.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index aa19d86..ca99456 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1709,7 +1709,8 @@ static int neigh_delete(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 static int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh,
 struct netlink_ext_ack *extack)
 {
-   int flags = NEIGH_UPDATE_F_ADMIN | NEIGH_UPDATE_F_OVERRIDE;
+   int flags = NEIGH_UPDATE_F_ADMIN | NEIGH_UPDATE_F_OVERRIDE |
+   NEIGH_UPDATE_F_OVERRIDE_ISROUTER;
struct net *net = sock_net(skb->sk);
struct ndmsg *ndm;
struct nlattr *tb[NDA_MAX+1];
@@ -1784,12 +1785,16 @@ static int neigh_add(struct sk_buff *skb, struct 
nlmsghdr *nlh,
}
 
if (!(nlh->nlmsg_flags & NLM_F_REPLACE))
-   flags &= ~NEIGH_UPDATE_F_OVERRIDE;
+   flags &= ~(NEIGH_UPDATE_F_OVERRIDE |
+  NEIGH_UPDATE_F_OVERRIDE_ISROUTER);
}
 
if (ndm->ndm_flags & NTF_EXT_LEARNED)
flags |= NEIGH_UPDATE_F_EXT_LEARNED;
 
+   if (ndm->ndm_flags & NTF_ROUTER)
+   flags |= NEIGH_UPDATE_F_ISROUTER;
+
if (ndm->ndm_flags & NTF_USE) {
neigh_event_send(neigh, NULL);
err = 0;
-- 
2.1.4

[PATCH net-next 0/2] few NTF_ROUTER related updates

2018-09-22 Thread Roopa Prabhu

From: Roopa Prabhu 

This series allows setting of NTF_ROUTER by an external
entity (eg BGP E-VPN control plane). Also fixes missing
netlink notification on neigh NTF_ROUTER flag changes.

Roopa Prabhu (2):
  neighbour: allow admin to set NTF_ROUTER
  neighbour: send netlink notification if NTF_ROUTER changes

 include/net/neighbour.h | 15 +++
 net/core/neighbour.c| 16 +---
 2 files changed, 24 insertions(+), 7 deletions(-)

-- 
2.1.4

Re: [PATCH net-next v3 1/2] netlink: ipv4 igmp join notifications

2018-09-13 Thread Roopa Prabhu

On Thu, Sep 6, 2018 at 8:40 PM, Roopa Prabhu  wrote:
> On Thu, Sep 6, 2018 at 2:10 AM, Patrick Ruddy
>  wrote:
>> Some userspace applications need to know about IGMP joins from the
>> kernel for 2 reasons:
>> 1. To allow the programming of multicast MAC filters in hardware
>> 2. To form a multicast FORUS list for non link-local multicast
>>groups to be sent to the kernel and from there to the interested
>>party.
>> (1) can be fulfilled but simply sending the hardware multicast MAC
>> address to be programmed but (2) requires the L3 address to be sent
>> since this cannot be constructed from the MAC address whereas the
>> reverse translation is a standard library function.
>>
>> This commit provides addition and deletion of multicast addresses
>> using the RTM_NEWMDB and RTM_DELMDB messages with AF_INET. It also
>> provides the RTM_GETMDB extension to allow multicast join state to
>> be read from the kernel.
>>
>> Signed-off-by: Patrick Ruddy 
>> ---
>> v3 rework to use RTM_***MDB messages as per review comments.
>
> Patrick, this version seems to be using RTM_***MDB msgs with the
> RTM_*ADDR format.
> We cant do that...because existing RTM_MDB users will be confused.
>
> My request was to evaluate RTM_***MDB msg format. see
> nlmsg_populate_mdb_fill for details.
>
> If you can wait a day or two I can share some experimental code that
> moves high level RTM_*MDB msg handling into net/core/rtnetlink.c
> similar to RTM_*FDB
>

I was trying to get a default per interface (non bridge) RTM_*MDB
working, but realized that the dev->mc
entries are already getting dumped as part of RTM_*FDB msgs instead of
RTM_*MDB. (see net/core/rtnetlink.c:ndo_dflt_fdb_dump).
This adds another wrench.

so, that puts us back to your use of RTM_NEWADDR.
Instead of using IFA_ADDRESS, you could introduce a new one
IFA_IGMP_MULTICAST  (since IFA_MULTICAST is already taken).


To keep existing users of RTM_NEWADDR unaffected. I think you can use
the IPMR family with RTM_NEWADDR.
We can introduce new notification group. (We can choose to add a new
family too, but that seems unnecessary)

since you only need dumps:
rtnl_register(RTNL_FAMILY_IPMR, RTM_GETADDR, NULL, igmp_rtm_dumpaddrs, 0);

For notifications, since we already have many variants for routes, I
don't see a problem adding similar addr variants
RTNLGRP_IPV4_MCADDR

(Others on the list may have more feedback).

[PATCH net] net: rtnl_configure_link: fix dev flags changes arg to __dev_notify_flags

2018-09-12 Thread Roopa Prabhu

From: Roopa Prabhu 

This fix addresses https://bugzilla.kernel.org/show_bug.cgi?id=201071

Commit 5025f7f7d506 wrongly relied on __dev_change_flags to notify users of
dev flag changes in the case when dev->rtnl_link_state = RTNL_LINK_INITIALIZED.
Fix it by indicating flag changes explicitly to __dev_notify_flags.

Fixes: 5025f7f7d506 ("rtnetlink: add rtnl_link_state check in 
rtnl_configure_link")
Reported-By: Liam mcbirnie 
Signed-off-by: Roopa Prabhu 
---
Dave, if 5025f7f7d506 made it to stable, request you to pls queue this one up 
too. Thanks.

 net/core/rtnetlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 60c9288..63ce2283 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2810,7 +2810,7 @@ int rtnl_configure_link(struct net_device *dev, const 
struct ifinfomsg *ifm)
}
 
if (dev->rtnl_link_state == RTNL_LINK_INITIALIZED) {
-   __dev_notify_flags(dev, old_flags, 0U);
+   __dev_notify_flags(dev, old_flags, (old_flags ^ dev->flags));
} else {
dev->rtnl_link_state = RTNL_LINK_INITIALIZED;
__dev_notify_flags(dev, old_flags, ~0U);
-- 
2.1.4

Re: Fw: [Bug 201071] New: Creating a vxlan in state 'up' does not give proper RTM_NEWLINK message

2018-09-11 Thread Roopa Prabhu

On Tue, Sep 11, 2018 at 3:10 PM, Roopa Prabhu  wrote:
> On Mon, Sep 10, 2018 at 11:55 AM, Stephen Hemminger
>  wrote:
>>
>>
>> Begin forwarded message:
>>
>> Date: Mon, 10 Sep 2018 04:04:37 +
>> From: bugzilla-dae...@bugzilla.kernel.org
>> To: step...@networkplumber.org
>> Subject: [Bug 201071] New: Creating a vxlan in state 'up' does not give 
>> proper RTM_NEWLINK message
>>
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=201071
>>
>> Bug ID: 201071
>>Summary: Creating a vxlan in state 'up' does not give proper
>> RTM_NEWLINK message
>>Product: Networking
>>Version: 2.5
>> Kernel Version: 4.19-rc1
>>   Hardware: All
>> OS: Linux
>>   Tree: Mainline
>> Status: NEW
>>   Severity: normal
>>   Priority: P1
>>  Component: Other
>>   Assignee: step...@networkplumber.org
>>   Reporter: liam.mcbir...@boeing.com
>> Regression: Yes
>>
>> If a vxlan is created with state 'up', the RTM_NEWLINK message shows the 
>> state
>> as down, and there no other netlink messages are sent.
>> As a result, processes listening to netlink are never notified that the vxlan
>> link is up.
>
> thanks for the fwd. looking...
>
>

attached a patch to the bug. Will post it here after some confirmation
of the fix from the reporter.

Re: Fw: [Bug 201071] New: Creating a vxlan in state 'up' does not give proper RTM_NEWLINK message

2018-09-11 Thread Roopa Prabhu

On Mon, Sep 10, 2018 at 11:55 AM, Stephen Hemminger
 wrote:
>
>
> Begin forwarded message:
>
> Date: Mon, 10 Sep 2018 04:04:37 +
> From: bugzilla-dae...@bugzilla.kernel.org
> To: step...@networkplumber.org
> Subject: [Bug 201071] New: Creating a vxlan in state 'up' does not give 
> proper RTM_NEWLINK message
>
>
> https://bugzilla.kernel.org/show_bug.cgi?id=201071
>
> Bug ID: 201071
>Summary: Creating a vxlan in state 'up' does not give proper
> RTM_NEWLINK message
>Product: Networking
>Version: 2.5
> Kernel Version: 4.19-rc1
>   Hardware: All
> OS: Linux
>   Tree: Mainline
> Status: NEW
>   Severity: normal
>   Priority: P1
>  Component: Other
>   Assignee: step...@networkplumber.org
>   Reporter: liam.mcbir...@boeing.com
> Regression: Yes
>
> If a vxlan is created with state 'up', the RTM_NEWLINK message shows the state
> as down, and there no other netlink messages are sent.
> As a result, processes listening to netlink are never notified that the vxlan
> link is up.

thanks for the fwd. looking...



>
> eg.
> # ip link add test up type vxlan id 8 group 224.224.224.224 dev eth0
>
> Output of ip monitor link
> # 4: test:  mtu 1450 qdisc noop state DOWN group default
>   link/ether ee:cd:97:1a:cf:91 brd ff:ff:ff:ff:ff:ff
>
> Output of ip link show (expected from netlink message)
> # 4: test:  mtu 1450 qdisc noqueue state
> UNKNOWN group default qlen 1000
>   link/ether ee:cd:97:1a:cf:91 brd ff:ff:ff:ff:ff:ff
>
> This is a regression introduced by the following patch series.
> https://patchwork.ozlabs.org/patch/947181/
>
> --
> You are receiving this mail because:
> You are the assignee for the bug.

Re: [PATCH net-next v3 1/2] netlink: ipv4 igmp join notifications

2018-09-06 Thread Roopa Prabhu

On Thu, Sep 6, 2018 at 2:10 AM, Patrick Ruddy
 wrote:
> Some userspace applications need to know about IGMP joins from the
> kernel for 2 reasons:
> 1. To allow the programming of multicast MAC filters in hardware
> 2. To form a multicast FORUS list for non link-local multicast
>groups to be sent to the kernel and from there to the interested
>party.
> (1) can be fulfilled but simply sending the hardware multicast MAC
> address to be programmed but (2) requires the L3 address to be sent
> since this cannot be constructed from the MAC address whereas the
> reverse translation is a standard library function.
>
> This commit provides addition and deletion of multicast addresses
> using the RTM_NEWMDB and RTM_DELMDB messages with AF_INET. It also
> provides the RTM_GETMDB extension to allow multicast join state to
> be read from the kernel.
>
> Signed-off-by: Patrick Ruddy 
> ---
> v3 rework to use RTM_***MDB messages as per review comments.

Patrick, this version seems to be using RTM_***MDB msgs with the
RTM_*ADDR format.
We cant do that...because existing RTM_MDB users will be confused.

My request was to evaluate RTM_***MDB msg format. see
nlmsg_populate_mdb_fill for details.

If you can wait a day or two I can share some experimental code that
moves high level RTM_*MDB msg handling into net/core/rtnetlink.c
similar to RTM_*FDB



>
>  net/ipv4/igmp.c | 139 
>  1 file changed, 139 insertions(+)
>
> diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
> index 4da39446da2d..aed819e2ea93 100644
> --- a/net/ipv4/igmp.c
> +++ b/net/ipv4/igmp.c
> @@ -86,6 +86,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1385,6 +1386,91 @@ static void ip_mc_hash_remove(struct in_device *in_dev,
>  }
>
>
> +static int fill_addr(struct sk_buff *skb, struct net_device *dev, __be32 
> addr,
> +int type, unsigned int flags)
> +{
> +   struct nlmsghdr *nlh;
> +   struct ifaddrmsg *ifm;
> +
> +   nlh = nlmsg_put(skb, 0, 0, type, sizeof(*ifm), flags);
> +   if (!nlh)
> +   return -EMSGSIZE;
> +
> +   ifm = nlmsg_data(nlh);
> +   ifm->ifa_family = AF_INET;
> +   ifm->ifa_prefixlen = 32;
> +   ifm->ifa_flags = IFA_F_PERMANENT;
> +   ifm->ifa_scope = RT_SCOPE_LINK;
> +   ifm->ifa_index = dev->ifindex;
> +
> +   if (nla_put_in_addr(skb, IFA_ADDRESS, addr))
> +   goto nla_put_failure;
> +   nlmsg_end(skb, nlh);
> +   return 0;
> +
> +nla_put_failure:
> +   nlmsg_cancel(skb, nlh);
> +   return -EMSGSIZE;
> +}
> +
> +static inline size_t addr_nlmsg_size(void)
> +{
> +   return NLMSG_ALIGN(sizeof(struct ifaddrmsg))
> +   + nla_total_size(sizeof(__be32));
> +}
> +
> +static void ip_mc_addr_notify(struct net_device *dev, __be32 addr, int type)
> +{
> +   struct net *net = dev_net(dev);
> +   struct sk_buff *skb;
> +   int err = -ENOBUFS;
> +
> +   skb = nlmsg_new(addr_nlmsg_size(), GFP_ATOMIC);
> +   if (!skb)
> +   goto errout;
> +
> +   err = fill_addr(skb, dev, addr, type, 0);
> +   if (err < 0) {
> +   WARN_ON(err == -EMSGSIZE);
> +   kfree_skb(skb);
> +   goto errout;
> +   }
> +   rtnl_notify(skb, net, 0, RTNLGRP_MDB, NULL, GFP_ATOMIC);
> +   return;
> +errout:
> +   if (err < 0)
> +   rtnl_set_sk_err(net, RTNLGRP_MDB, err);
> +}
> +
> +int ip_mc_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb,
> + struct net_device *dev)
> +{
> +   int s_idx;
> +   int idx = 0;
> +   struct ip_mc_list *im;
> +   struct in_device *in_dev;
> +
> +   ASSERT_RTNL();
> +
> +   s_idx = cb->args[2];
> +   in_dev = __in_dev_get_rtnl(dev);
> +
> +   for_each_pmc_rtnl(in_dev, im) {
> +   if (idx < s_idx)
> +   continue;
> +   if (fill_addr(skb, dev, im->multiaddr, RTM_NEWMDB,
> + NLM_F_MULTI) < 0)
> +   goto done;
> +   nl_dump_check_consistent(cb, nlmsg_hdr(skb));
> +   idx++;
> +   }
> +
> + done:
> +   cb->args[2] = idx;
> +
> +   return skb->len;
> +}
> +
>  /*
>   * A socket has joined a multicast group on device dev.
>   */
> @@ -1430,6 +1516,8 @@ static void __ip_mc_inc_group(struct in_device *in_dev, 
> __be32 addr,
> igmpv3_del_delrec(in_dev, im);
>  #endif
> igmp_group_added(im);
> +
> +   ip_mc_addr_notify(in_dev->dev, addr, RTM_NEWMDB);
> if (!in_dev->dead)
> ip_rt_multicast_event(in_dev);
>  out:
> @@ -1661,6 +1749,8 @@ void ip_mc_dec_group(struct in_device *in_dev, __be32 
> addr)
> in_dev->mc_count--;
> igmp_group_dropped(i);
> ip_mc_clear_src(i);
> +

Re: [PATCH net-next v2 1/2] netlink: ipv4 igmp join notifications

2018-09-03 Thread Roopa Prabhu

On Sun, Sep 2, 2018 at 4:18 AM, Patrick Ruddy
 wrote:
> Hi Roopa
>
> inline
>
> thx
>
> -pr
>
> On Fri, 2018-08-31 at 09:29 -0700, Roopa Prabhu wrote:
>> On Fri, Aug 31, 2018 at 4:20 AM, Patrick Ruddy
>>  wrote:
>> > Some userspace applications need to know about IGMP joins from the kernel
>> > for 2 reasons
>> > 1. To allow the programming of multicast MAC filters in hardware
>> > 2. To form a multicast FORUS list for non link-local multicast
>> >groups to be sent to the kernel and from there to the interested
>> >party.
>> > (1) can be fulfilled but simply sending the hardware multicast MAC
>> > address to be programmed but (2) requires the L3 address to be sent
>> > since this cannot be constructed from the MAC address whereas the
>> > reverse translation is a standard library function.
>> >
>> > This commit provides addition and deletion of multicast addresses
>> > using the RTM_NEWADDR and RTM_DELADDR messages. It also provides
>> > the RTM_GETADDR extension to allow multicast join state to be read
>> > from the kernel.
>> >
>> > Signed-off-by: Patrick Ruddy 
>> > ---
>> > v2: fix kbuild warnings.
>>
>> I am still going through the series, but AFAICT, user-space caches listening 
>> to
>> RTNLGRP_IPV4_IFADDR will now also get multicast addresses by default ?
>>
>
> Yes that's the crux of this change. It's unfortunate that I could not
> use IFA_MULTICAST to distinguish the SAFI. I suppose the other option
> would be to create a set of new NEW/DEL/GETMULTICAST messages but the
> partial code for RTM_GETMULTICAST in ipv6/mcast.c complicates that
> slightly. Happy to look at it if you think that would be be better.
>

yeah, true. Thinking about this some more, you are adding an interface
for multicast entries learnt via igmp.
There is already a netlink channel for layer2 mc addresses via igmp. I
can't see why that cannot be used.
It is RTM_*MDB msgs. It is currently only available for the bridge.
But, I have a requirement for it to be
available via a vxlan dev...so, I am looking at making it available on
other devices.

Can you check if RTM_*MDB msgs can be made to work for your case ?.

The reason I think it should be possible is because this is similar to
bridge fdb entries.
The bridge fdb api  (RTM_NEWNEIGH with AF_BRIDGE) is overloaded to
notify and dump netdev unicast addresses.
similarly I think the mdb api can be overloaded to notify and dump
netdev multicast addresses (statically added or learnt via igmp)

Re: [PATCH net-next v2 1/2] netlink: ipv4 igmp join notifications

2018-08-31 Thread Roopa Prabhu

On Fri, Aug 31, 2018 at 4:20 AM, Patrick Ruddy
 wrote:
> Some userspace applications need to know about IGMP joins from the kernel
> for 2 reasons
> 1. To allow the programming of multicast MAC filters in hardware
> 2. To form a multicast FORUS list for non link-local multicast
>groups to be sent to the kernel and from there to the interested
>party.
> (1) can be fulfilled but simply sending the hardware multicast MAC
> address to be programmed but (2) requires the L3 address to be sent
> since this cannot be constructed from the MAC address whereas the
> reverse translation is a standard library function.
>
> This commit provides addition and deletion of multicast addresses
> using the RTM_NEWADDR and RTM_DELADDR messages. It also provides
> the RTM_GETADDR extension to allow multicast join state to be read
> from the kernel.
>
> Signed-off-by: Patrick Ruddy 
> ---
> v2: fix kbuild warnings.

I am still going through the series, but AFAICT, user-space caches listening to
RTNLGRP_IPV4_IFADDR will now also get multicast addresses by default ?


>
>  include/linux/igmp.h |  4 ++
>  net/ipv4/devinet.c   | 39 +--
>  net/ipv4/igmp.c  | 90 
>  3 files changed, 122 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/igmp.h b/include/linux/igmp.h
> index 119f53941c12..644a548024ed 100644
> --- a/include/linux/igmp.h
> +++ b/include/linux/igmp.h
> @@ -19,6 +19,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>
>  static inline struct igmphdr *igmp_hdr(const struct sk_buff *skb)
> @@ -130,6 +132,8 @@ extern void ip_mc_unmap(struct in_device *);
>  extern void ip_mc_remap(struct in_device *);
>  extern void ip_mc_dec_group(struct in_device *in_dev, __be32 addr);
>  extern void ip_mc_inc_group(struct in_device *in_dev, __be32 addr);
> +extern int ip_mc_dump_ifaddr(struct sk_buff *skb, struct netlink_callback 
> *cb,
> +struct net_device *dev);
>  int ip_mc_check_igmp(struct sk_buff *skb, struct sk_buff **skb_trimmed);
>
>  #endif
> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
> index ea4bd8a52422..42f7dcc4fb5e 100644
> --- a/net/ipv4/devinet.c
> +++ b/net/ipv4/devinet.c
> @@ -57,6 +57,7 @@
>  #endif
>  #include 
>  #include 
> +#include 
>
>  #include 
>  #include 
> @@ -1651,6 +1652,7 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct 
> netlink_callback *cb)
> int h, s_h;
> int idx, s_idx;
> int ip_idx, s_ip_idx;
> +   int multicast, mcast_idx;
> struct net_device *dev;
> struct in_device *in_dev;
> struct in_ifaddr *ifa;
> @@ -1659,6 +1661,8 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct 
> netlink_callback *cb)
> s_h = cb->args[0];
> s_idx = idx = cb->args[1];
> s_ip_idx = ip_idx = cb->args[2];
> +   multicast = cb->args[3];
> +   mcast_idx = cb->args[4];
>
> for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
> idx = 0;
> @@ -1675,18 +1679,29 @@ static int inet_dump_ifaddr(struct sk_buff *skb, 
> struct netlink_callback *cb)
> if (!in_dev)
> goto cont;
>
> -   for (ifa = in_dev->ifa_list, ip_idx = 0; ifa;
> -ifa = ifa->ifa_next, ip_idx++) {
> -   if (ip_idx < s_ip_idx)
> -   continue;
> -   if (inet_fill_ifaddr(skb, ifa,
> -NETLINK_CB(cb->skb).portid,
> -cb->nlh->nlmsg_seq,
> -RTM_NEWADDR, NLM_F_MULTI) < 0) {
> -   rcu_read_unlock();
> -   goto done;
> +   if (!multicast) {
> +   for (ifa = in_dev->ifa_list, ip_idx = 0; ifa;
> +ifa = ifa->ifa_next, ip_idx++) {
> +   if (ip_idx < s_ip_idx)
> +   continue;
> +   if (inet_fill_ifaddr(skb, ifa,
> +
> NETLINK_CB(cb->skb).portid,
> +
> cb->nlh->nlmsg_seq,
> +RTM_NEWADDR,
> +NLM_F_MULTI) < 
> 0) {
> +   rcu_read_unlock();
> +   goto done;
> +   }
> +   nl_dump_check_consistent(cb,
> +
> nlmsg_hdr(skb));
> }
> -

Re: [PATCH v2 iproute2-next 2/5] bridge: colorize output and use JSON print library

2018-08-28 Thread Roopa Prabhu

On Sat, Jul 14, 2018 at 6:41 PM, Roopa Prabhu  wrote:
> On Tue, Feb 20, 2018 at 11:24 AM, Stephen Hemminger
>  wrote:
>> From: Stephen Hemminger 
>>
>> Use new functions from json_print to simplify code.
>> Provide standard flag for colorizing output.
>>
>> The shortened -c flag is ambiguous it could mean color or
>> compressvlan; it is now changed to mean color for consistency
>> with other iproute2 commands.
>>
>> Signed-off-by: Stephen Hemminger 
>> ---

[snip]

>
> Stephen, this seems to have broken both json and non-json output.
>
> Here is some output before and after the patch (same thing for tunnelshow):
>
> before:
> $bridge vlan show
> portvlan ids
> hostbond41000
>  1001 PVID Egress Untagged
>  1002
>  1003
>  1004
>
> hostbond31000 PVID Egress Untagged
>  1001
>  1002
>  1003
>  1004
>
> bridge   1 PVID Egress Untagged
>  1000
>  1001
>  1002
>  1003
>  1004
>
> vxlan0   1 PVID Egress Untagged
>  1000
>  1001
>  1002
>  1003
>  1004
>
>
> $ bridge -j -c vlan show
> {
> "hostbond4": [{
> "vlan": 1000
> },{
> "vlan": 1001,
> "flags": ["PVID","Egress Untagged"
> ]
> },{
> "vlan": 1002,
> "vlanEnd": 1004
> }
> ],
> "hostbond3": [{
> "vlan": 1000,
> "flags": ["PVID","Egress Untagged"
> ]
> },{
> "vlan": 1001,
> "vlanEnd": 1004
> }
> ],
> "bridge": [{
> "vlan": 1,
> "flags": ["PVID","Egress Untagged"
> ]
> },{
> "vlan": 1000,
> "vlanEnd": 1004
> }
> ],
> "vxlan0": [{
> "vlan": 1,
> "flags": ["PVID","Egress Untagged"
> ]
> },{
> "vlan": 1000,
> "vlanEnd": 1004
> }
> ]
> }
>
>
> after:
> 
>
> $bridge vlan show
> portvlan ids
> hostbond4
>  10001001 PVID untagged  100210031004
> hostbond3
>  1000 PVID untagged  1001100210031004
> bridge
>  1 PVID untagged 10001001100210031004
> vxlan0
>  1 PVID untagged 10001001100210031004
>
> $bridge -j -c vlan show
> ["hostbond4","vlan":[{"vlan":1000},{"vlan":1001,"pvid":null,"untagged":null},{"vlan":1002},{"vlan":1003},{"vlan":1004}],"hostbond3","vlan":[{"vlan":1000,"pvid":null,"untagged":null},{"vlan":1001},{"vlan":1002},{"vlan":1003},{"vlan":1004}],"bridge","vlan":[{"vlan":1,"pvid":null,"untagged":null},{"vlan":1000},{"vlan":1001},{"vlan":1002},{"vlan":1003},{"vlan":1004}],"vxlan0","vlan":[{"vlan":1,"pvid":null,"untagged":null},{"vlan":1000},{"vlan":1001},{"vlan":1002},{"vlan":1003},{"vlan":1004}]]


Stephen, ping again...

I was trying to fix it ...but its not trivial enough for the time I
have right now.
If this cannot be fixed soon, I request you to please revert the patch
as it has broken the json output completely.

Thanks.

Re: [PATCH net 0/4] vxlan: fix default fdb entry user-space notify ordering/race

2018-07-20 Thread Roopa Prabhu

On Fri, Jul 20, 2018 at 1:21 PM, Roopa Prabhu  wrote:
> From: Roopa Prabhu 
>
> Problem:
> In vxlan_newlink, a default fdb entry is added before register_netdev.
> The default fdb creation function notifies user-space of the
> fdb entry on the vxlan device which user-space does not know about yet.
> (RTM_NEWNEIGH goes before RTM_NEWLINK for the same ifindex).
>
> This series fixes the user-space netlink notification ordering issue
> with the following changes:
> - decouple fdb notify from fdb create.
> - Move fdb notify after register_netdev.
> - modify rtnl_configure_link to allow configuring a link early.
> - Call rtnl_configure_link in vxlan newlink handler to notify
> userspace about the newlink before fdb notify and
> hence avoiding the user-space race.
>
> Fixes: afbd8bae9c79 ("vxlan: add implicit fdb entry for default destination")
> Signed-off-by: Roopa Prabhu 

Just noticed that my email editor is greying out/hiding the note i
added for this series. repasting here so that its not missed:

Dave, I am sending this series against net on your request on my
net-next series.
https://marc.info/?l=linux-netdev=153098151929102=2
I did not know how long to wait before i send it to net. So sending it now.
As you also noted it is a very old bug. If you have changed your mind
about stable or
if this is too early for net and you think we should soak it more in net-next,
pls feel free to drop. also, i will be happy to do a stable backport
if needed. thanks.

[PATCH net 2/4] vxlan: add new fdb alloc and create helpers

2018-07-20 Thread Roopa Prabhu

From: Roopa Prabhu 

- Add new vxlan_fdb_alloc helper
- rename existing vxlan_fdb_create into vxlan_fdb_update:
because it really creates or updates an existing
fdb entry
- move new fdb creation into a separate vxlan_fdb_create

Main motivation for this change is to introduce the ability
to decouple vxlan fdb creation and notify, used in a later patch.

Signed-off-by: Roopa Prabhu 
---
 drivers/net/vxlan.c | 91 -
 1 file changed, 62 insertions(+), 29 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index f6bb1d5..c8d5bff 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -636,9 +636,62 @@ static int vxlan_gro_complete(struct sock *sk, struct 
sk_buff *skb, int nhoff)
return eth_gro_complete(skb, nhoff + sizeof(struct vxlanhdr));
 }
 
-/* Add new entry to forwarding table -- assumes lock held */
+static struct vxlan_fdb *vxlan_fdb_alloc(struct vxlan_dev *vxlan,
+const u8 *mac, __u16 state,
+__be32 src_vni, __u8 ndm_flags)
+{
+   struct vxlan_fdb *f;
+
+   f = kmalloc(sizeof(*f), GFP_ATOMIC);
+   if (!f)
+   return NULL;
+   f->state = state;
+   f->flags = ndm_flags;
+   f->updated = f->used = jiffies;
+   f->vni = src_vni;
+   INIT_LIST_HEAD(>remotes);
+   memcpy(f->eth_addr, mac, ETH_ALEN);
+
+   return f;
+}
+
 static int vxlan_fdb_create(struct vxlan_dev *vxlan,
const u8 *mac, union vxlan_addr *ip,
+   __u16 state, __be16 port, __be32 src_vni,
+   __be32 vni, __u32 ifindex, __u8 ndm_flags,
+   struct vxlan_fdb **fdb)
+{
+   struct vxlan_rdst *rd = NULL;
+   struct vxlan_fdb *f;
+   int rc;
+
+   if (vxlan->cfg.addrmax &&
+   vxlan->addrcnt >= vxlan->cfg.addrmax)
+   return -ENOSPC;
+
+   netdev_dbg(vxlan->dev, "add %pM -> %pIS\n", mac, ip);
+   f = vxlan_fdb_alloc(vxlan, mac, state, src_vni, ndm_flags);
+   if (!f)
+   return -ENOMEM;
+
+   rc = vxlan_fdb_append(f, ip, port, vni, ifindex, );
+   if (rc < 0) {
+   kfree(f);
+   return rc;
+   }
+
+   ++vxlan->addrcnt;
+   hlist_add_head_rcu(>hlist,
+  vxlan_fdb_head(vxlan, mac, src_vni));
+
+   *fdb = f;
+
+   return 0;
+}
+
+/* Add new entry to forwarding table -- assumes lock held */
+static int vxlan_fdb_update(struct vxlan_dev *vxlan,
+   const u8 *mac, union vxlan_addr *ip,
__u16 state, __u16 flags,
__be16 port, __be32 src_vni, __be32 vni,
__u32 ifindex, __u8 ndm_flags)
@@ -687,37 +740,17 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan,
if (!(flags & NLM_F_CREATE))
return -ENOENT;
 
-   if (vxlan->cfg.addrmax &&
-   vxlan->addrcnt >= vxlan->cfg.addrmax)
-   return -ENOSPC;
-
/* Disallow replace to add a multicast entry */
if ((flags & NLM_F_REPLACE) &&
(is_multicast_ether_addr(mac) || is_zero_ether_addr(mac)))
return -EOPNOTSUPP;
 
netdev_dbg(vxlan->dev, "add %pM -> %pIS\n", mac, ip);
-   f = kmalloc(sizeof(*f), GFP_ATOMIC);
-   if (!f)
-   return -ENOMEM;
-
-   notify = 1;
-   f->state = state;
-   f->flags = ndm_flags;
-   f->updated = f->used = jiffies;
-   f->vni = src_vni;
-   INIT_LIST_HEAD(>remotes);
-   memcpy(f->eth_addr, mac, ETH_ALEN);
-
-   rc = vxlan_fdb_append(f, ip, port, vni, ifindex, );
-   if (rc < 0) {
-   kfree(f);
+   rc = vxlan_fdb_create(vxlan, mac, ip, state, port, src_vni,
+ vni, ifindex, ndm_flags, );
+   if (rc < 0)
return rc;
-   }
-
-   ++vxlan->addrcnt;
-   hlist_add_head_rcu(>hlist,
-  vxlan_fdb_head(vxlan, mac, src_vni));
+   notify = 1;
}
 
if (notify) {
@@ -863,7 +896,7 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr 
*tb[],
return -EAFNOSUPPORT;
 
spin_lock_bh(>hash_lock);
-   err = vxlan_fdb_create(vxlan, addr, , ndm->ndm_state, flags,
+   err = vxlan_fdb_update(vxlan, addr, , ndm->ndm_state, flags,
   port, src_vni, vni, ifindex, ndm->ndm_flags);
spin_unlock_b

[PATCH net 4/4] vxlan: fix default fdb entry netlink notify ordering during netdev create

2018-07-20 Thread Roopa Prabhu

From: Roopa Prabhu 

Problem:
In vxlan_newlink, a default fdb entry is added before register_netdev.
The default fdb creation function also notifies user-space of the
fdb entry on the vxlan device which user-space does not know about yet.
(RTM_NEWNEIGH goes before RTM_NEWLINK for the same ifindex).

This patch fixes the user-space netlink notification ordering issue
with the following changes:
- decouple fdb notify from fdb create.
- Move fdb notify after register_netdev.
- Call rtnl_configure_link in vxlan newlink handler to notify
userspace about the newlink before fdb notify and
hence avoiding the user-space race.

Fixes: afbd8bae9c79 ("vxlan: add implicit fdb entry for default destination")
Signed-off-by: Roopa Prabhu 
---
 drivers/net/vxlan.c | 29 +
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index a7e9a4d..e857cb3 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3195,6 +3195,7 @@ static int __vxlan_dev_create(struct net *net, struct 
net_device *dev,
 {
struct vxlan_net *vn = net_generic(net, vxlan_net_id);
struct vxlan_dev *vxlan = netdev_priv(dev);
+   struct vxlan_fdb *f = NULL;
int err;
 
err = vxlan_dev_configure(net, dev, conf, false, extack);
@@ -3205,27 +3206,38 @@ static int __vxlan_dev_create(struct net *net, struct 
net_device *dev,
 
/* create an fdb entry for a valid default destination */
if (!vxlan_addr_any(>default_dst.remote_ip)) {
-   err = vxlan_fdb_update(vxlan, all_zeros_mac,
+   err = vxlan_fdb_create(vxlan, all_zeros_mac,
   >default_dst.remote_ip,
   NUD_REACHABLE | NUD_PERMANENT,
-  NLM_F_EXCL | NLM_F_CREATE,
   vxlan->cfg.dst_port,
   vxlan->default_dst.remote_vni,
   vxlan->default_dst.remote_vni,
   vxlan->default_dst.remote_ifindex,
-  NTF_SELF);
+  NTF_SELF, );
if (err)
return err;
}
 
err = register_netdevice(dev);
+   if (err)
+   goto errout;
+
+   err = rtnl_configure_link(dev, NULL);
if (err) {
-   vxlan_fdb_delete_default(vxlan, vxlan->default_dst.remote_vni);
-   return err;
+   unregister_netdevice(dev);
+   goto errout;
}
 
+   /* notify default fdb entry */
+   if (f)
+   vxlan_fdb_notify(vxlan, f, first_remote_rtnl(f), RTM_NEWNEIGH);
+
list_add(>next, >vxlan_list);
return 0;
+errout:
+   if (f)
+   vxlan_fdb_destroy(vxlan, f, false);
+   return err;
 }
 
 static int vxlan_nl2conf(struct nlattr *tb[], struct nlattr *data[],
@@ -3460,6 +3472,7 @@ static int vxlan_changelink(struct net_device *dev, 
struct nlattr *tb[],
struct vxlan_rdst *dst = >default_dst;
struct vxlan_rdst old_dst;
struct vxlan_config conf;
+   struct vxlan_fdb *f = NULL;
int err;
 
err = vxlan_nl2conf(tb, data,
@@ -3485,19 +3498,19 @@ static int vxlan_changelink(struct net_device *dev, 
struct nlattr *tb[],
   old_dst.remote_ifindex, 0);
 
if (!vxlan_addr_any(>remote_ip)) {
-   err = vxlan_fdb_update(vxlan, all_zeros_mac,
+   err = vxlan_fdb_create(vxlan, all_zeros_mac,
   >remote_ip,
   NUD_REACHABLE | NUD_PERMANENT,
-  NLM_F_CREATE | NLM_F_APPEND,
   vxlan->cfg.dst_port,
   dst->remote_vni,
   dst->remote_vni,
   dst->remote_ifindex,
-  NTF_SELF);
+  NTF_SELF, );
if (err) {
spin_unlock_bh(>hash_lock);
return err;
}
+   vxlan_fdb_notify(vxlan, f, first_remote_rtnl(f), 
RTM_NEWNEIGH);
}
spin_unlock_bh(>hash_lock);
}
-- 
2.1.4

[PATCH net 1/4] rtnetlink: add rtnl_link_state check in rtnl_configure_link

2018-07-20 Thread Roopa Prabhu

From: Roopa Prabhu 

rtnl_configure_link sets dev->rtnl_link_state to
RTNL_LINK_INITIALIZED and unconditionally calls
__dev_notify_flags to notify user-space of dev flags.

current call sequence for rtnl_configure_link
rtnetlink_newlink
rtnl_link_ops->newlink
rtnl_configure_link (unconditionally notifies userspace of
 default and new dev flags)

If a newlink handler wants to call rtnl_configure_link
early, we will end up with duplicate notifications to
user-space.

This patch fixes rtnl_configure_link to check rtnl_link_state
and call __dev_notify_flags with gchanges = 0 if already
RTNL_LINK_INITIALIZED.

Later in the series, this patch will help the following sequence
where a driver implementing newlink can call rtnl_configure_link
to initialize the link early.

makes the following call sequence work:
rtnetlink_newlink
rtnl_link_ops->newlink (vxlan) -> rtnl_configure_link (initializes
link and notifies
user-space of default
dev flags)
rtnl_configure_link (updates dev flags if requested by user ifm
 and notifies user-space of new dev flags)

Signed-off-by: Roopa Prabhu 
---
 net/core/rtnetlink.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 5ef6122..e3f743c 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2759,9 +2759,12 @@ int rtnl_configure_link(struct net_device *dev, const 
struct ifinfomsg *ifm)
return err;
}
 
-   dev->rtnl_link_state = RTNL_LINK_INITIALIZED;
-
-   __dev_notify_flags(dev, old_flags, ~0U);
+   if (dev->rtnl_link_state == RTNL_LINK_INITIALIZED) {
+   __dev_notify_flags(dev, old_flags, 0U);
+   } else {
+   dev->rtnl_link_state = RTNL_LINK_INITIALIZED;
+   __dev_notify_flags(dev, old_flags, ~0U);
+   }
return 0;
 }
 EXPORT_SYMBOL(rtnl_configure_link);
-- 
2.1.4

[PATCH net 3/4] vxlan: make netlink notify in vxlan_fdb_destroy optional

2018-07-20 Thread Roopa Prabhu

From: Roopa Prabhu 

Add a new option do_notify to vxlan_fdb_destroy to make
sending netlink notify optional. Used by a later patch.

Signed-off-by: Roopa Prabhu 
---
 drivers/net/vxlan.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index c8d5bff..a7e9a4d 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -774,13 +774,15 @@ static void vxlan_fdb_free(struct rcu_head *head)
kfree(f);
 }
 
-static void vxlan_fdb_destroy(struct vxlan_dev *vxlan, struct vxlan_fdb *f)
+static void vxlan_fdb_destroy(struct vxlan_dev *vxlan, struct vxlan_fdb *f,
+ bool do_notify)
 {
netdev_dbg(vxlan->dev,
"delete %pM\n", f->eth_addr);
 
--vxlan->addrcnt;
-   vxlan_fdb_notify(vxlan, f, first_remote_rtnl(f), RTM_DELNEIGH);
+   if (do_notify)
+   vxlan_fdb_notify(vxlan, f, first_remote_rtnl(f), RTM_DELNEIGH);
 
hlist_del_rcu(>hlist);
call_rcu(>rcu, vxlan_fdb_free);
@@ -930,7 +932,7 @@ static int __vxlan_fdb_delete(struct vxlan_dev *vxlan,
goto out;
}
 
-   vxlan_fdb_destroy(vxlan, f);
+   vxlan_fdb_destroy(vxlan, f, true);
 
 out:
return 0;
@@ -2397,7 +2399,7 @@ static void vxlan_cleanup(struct timer_list *t)
   "garbage collect %pM\n",
   f->eth_addr);
f->state = NUD_STALE;
-   vxlan_fdb_destroy(vxlan, f);
+   vxlan_fdb_destroy(vxlan, f, true);
} else if (time_before(timeout, next_timer))
next_timer = timeout;
}
@@ -2448,7 +2450,7 @@ static void vxlan_fdb_delete_default(struct vxlan_dev 
*vxlan, __be32 vni)
spin_lock_bh(>hash_lock);
f = __vxlan_find_mac(vxlan, all_zeros_mac, vni);
if (f)
-   vxlan_fdb_destroy(vxlan, f);
+   vxlan_fdb_destroy(vxlan, f, true);
spin_unlock_bh(>hash_lock);
 }
 
@@ -2502,7 +2504,7 @@ static void vxlan_flush(struct vxlan_dev *vxlan, bool 
do_all)
continue;
/* the all_zeros_mac entry is deleted at vxlan_uninit */
if (!is_zero_ether_addr(f->eth_addr))
-   vxlan_fdb_destroy(vxlan, f);
+   vxlan_fdb_destroy(vxlan, f, true);
}
}
spin_unlock_bh(>hash_lock);
-- 
2.1.4

[PATCH net 0/4] vxlan: fix default fdb entry user-space notify ordering/race

2018-07-20 Thread Roopa Prabhu

From: Roopa Prabhu 

Problem:
In vxlan_newlink, a default fdb entry is added before register_netdev.
The default fdb creation function notifies user-space of the
fdb entry on the vxlan device which user-space does not know about yet.
(RTM_NEWNEIGH goes before RTM_NEWLINK for the same ifindex).

This series fixes the user-space netlink notification ordering issue
with the following changes:
- decouple fdb notify from fdb create.
- Move fdb notify after register_netdev.
- modify rtnl_configure_link to allow configuring a link early.
- Call rtnl_configure_link in vxlan newlink handler to notify
userspace about the newlink before fdb notify and
hence avoiding the user-space race.

Fixes: afbd8bae9c79 ("vxlan: add implicit fdb entry for default destination")
Signed-off-by: Roopa Prabhu 

Roopa Prabhu (4):
  vxlan: add new fdb alloc and create helpers
  vxlan: make netlink notify in vxlan_fdb_destroy optional
  rtnetlink: add rtnl_link_state check in rtnl_configure_link
  vxlan: fix default fdb netlink notify ordering during netdev create

 drivers/net/vxlan.c  | 130 ++-
 net/core/rtnetlink.c |   9 ++--
 2 files changed, 94 insertions(+), 45 deletions(-)

-- 
Dave, I am sending this series against net on your request on my net-next 
series.
https://marc.info/?l=linux-netdev=153098151929102=2
I did not know how long to wait before i send it to net. So sending it now.
As you also noted it is a very old bug. If you have changed your mind about 
stable or
if this is too early for net and you think we should soak it more in net-next,
pls feel free to drop. also, i will be happy to do a stable backport if needed. 
thanks.

Re: [PATCH net-next,v2] net: rename ndo_setup_tc to ndo_setup_offload

2018-07-20 Thread Roopa Prabhu

On Wed, Jul 18, 2018 at 5:11 PM, Pablo Neira Ayuso  wrote:
> One of the recurring complaints is that we do not have, as a driver
> writer, a central location from which we would be fed offloading rules
> into a NIC. This was brought up again during Netconf'18 in Boston.
>
> This patch just renames ndo_setup_tc to ndo_setup_offload as a very
> early initial work to prepare for follow up patch that discuss unified
> flow representation for the existing offload programming APIs.
>
> Signed-off-by: Pablo Neira Ayuso 
> Acked-by: Jiri Pirko 
> Acked-by: Jakub Kicinski 



ok with a rename,...but this seems to be going from a very specific to
a completely generic name.
maybe ndo_setup_flow_offload or ndo_setup_rule_offload might be better ?.

(or maybe i am missing some context and this is really for setting up
every possible hardware offload ?)




> ---
> v2: Missing function definition update in drivers/net/ethernet/sfc/falcon/tx.c
> apparently I forgot to turn on that driver when doing compile-testing,
> problem spotted by Martin Habets. Keeping Jakub and Jiri Acked-by tags,
> as this is the only change in the v1 patch.
>
>  drivers/net/ethernet/amd/xgbe/xgbe-drv.c|  6 +++---
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c |  6 +++---
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h |  4 ++--
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c|  2 +-
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c   |  6 +++---
>  drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c   |  6 +++---
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |  6 +++---
>  drivers/net/ethernet/freescale/dpaa/dpaa_eth.c  |  6 +++---
>  drivers/net/ethernet/hisilicon/hns3/hns3_enet.c |  6 +++---
>  drivers/net/ethernet/intel/fm10k/fm10k_netdev.c |  6 +++---
>  drivers/net/ethernet/intel/i40e/i40e_main.c |  6 +++---
>  drivers/net/ethernet/intel/i40evf/i40evf_main.c |  8 
>  drivers/net/ethernet/intel/igb/igb_main.c   |  6 +++---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   |  6 +++---
>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |  9 +
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c   |  6 +++---
>  drivers/net/ethernet/mellanox/mlx5/core/en_rep.c|  6 +++---
>  drivers/net/ethernet/mellanox/mlxsw/spectrum.c  |  6 +++---
>  drivers/net/ethernet/netronome/nfp/nfp_net_common.c |  2 +-
>  drivers/net/ethernet/netronome/nfp/nfp_net_repr.c   |  2 +-
>  drivers/net/ethernet/netronome/nfp/nfp_port.c   |  4 ++--
>  drivers/net/ethernet/netronome/nfp/nfp_port.h   |  4 ++--
>  drivers/net/ethernet/sfc/efx.c  |  2 +-
>  drivers/net/ethernet/sfc/efx.h  |  4 ++--
>  drivers/net/ethernet/sfc/falcon/efx.c   |  2 +-
>  drivers/net/ethernet/sfc/falcon/efx.h   |  4 ++--
>  drivers/net/ethernet/sfc/falcon/tx.c|  4 ++--
>  drivers/net/ethernet/sfc/tx.c   |  4 ++--
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c   |  6 +++---
>  drivers/net/ethernet/ti/netcp_core.c|  6 +++---
>  drivers/net/netdevsim/netdev.c  |  5 +++--
>  include/linux/netdevice.h   | 18 +-
>  net/dsa/slave.c |  7 ---
>  net/sched/cls_api.c |  6 +++---
>  net/sched/sch_cbs.c |  8 
>  net/sched/sch_etf.c |  8 
>  net/sched/sch_mq.c  |  8 
>  net/sched/sch_mqprio.c  | 16 
>  net/sched/sch_prio.c| 15 ---
>  net/sched/sch_red.c | 15 ---
>  40 files changed, 131 insertions(+), 126 deletions(-)
>
> diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c 
> b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
> index 24f1053b8785..766864a35648 100644
> --- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
> +++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
> @@ -2246,8 +2246,8 @@ static void xgbe_poll_controller(struct net_device 
> *netdev)
>  }
>  #endif /* End CONFIG_NET_POLL_CONTROLLER */
>
> -static int xgbe_setup_tc(struct net_device *netdev, enum tc_setup_type type,
> -void *type_data)
> +static int xgbe_setup_offload(struct net_device *netdev,
> + enum tc_setup_type type, void *type_data)
>  {
> struct xgbe_prv_data *pdata = netdev_priv(netdev);
> struct tc_mqprio_qopt *mqprio = type_data;
> @@ -2501,7 +2501,7 @@ static const struct net_device_ops xgbe_netdev_ops = {
>  #ifdef CONFIG_NET_POLL_CONTROLLER
> .ndo_poll_controller= xgbe_poll_controller,
>  #endif
> -   .ndo_setup_tc   = xgbe_setup_tc,
> +   .ndo_setup_offload  = xgbe_setup_offload,
> .ndo_fix_features   =

[PATCH iproute2 net-next] ipneigh: exclude NTF_EXT_LEARNED from default filter

2018-07-16 Thread Roopa Prabhu

From: Roopa Prabhu 

NUD_NOARP entries are filtered out by default by iproute2.
We dont want NUD_NOARP with NTF_EXT_LEARNED flag filtered out.
This patch extends the default filter check for ip neigh show
to include the NTF_EXT_LEARNED flag.

Signed-off-by: Roopa Prabhu 
---
 ip/ipneigh.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ip/ipneigh.c b/ip/ipneigh.c
index bd6e5c5..a0af705 100644
--- a/ip/ipneigh.c
+++ b/ip/ipneigh.c
@@ -262,6 +262,7 @@ int print_neigh(const struct sockaddr_nl *who, struct 
nlmsghdr *n, void *arg)
return 0;
if (!(filter.state>ndm_state) &&
!(r->ndm_flags & NTF_PROXY) &&
+   !(r->ndm_flags & NTF_EXT_LEARNED) &&
(r->ndm_state || !(filter.state&0x100)) &&
(r->ndm_family != AF_DECnet))
return 0;
-- 
2.1.4

Re: [PATCH v2 iproute2-next 2/5] bridge: colorize output and use JSON print library

2018-07-14 Thread Roopa Prabhu

On Tue, Feb 20, 2018 at 11:24 AM, Stephen Hemminger
 wrote:
> From: Stephen Hemminger 
>
> Use new functions from json_print to simplify code.
> Provide standard flag for colorizing output.
>
> The shortened -c flag is ambiguous it could mean color or
> compressvlan; it is now changed to mean color for consistency
> with other iproute2 commands.
>
> Signed-off-by: Stephen Hemminger 
> ---
>  bridge/br_common.h |   2 +-
>  bridge/bridge.c|  10 +-
>  bridge/fdb.c   | 281 +++--
>  bridge/mdb.c   | 362 
> ++---
>  bridge/vlan.c  | 276 +++-
>  5 files changed, 363 insertions(+), 568 deletions(-)
>
> diff --git a/bridge/br_common.h b/bridge/br_common.h
> index b25f61e50e05..2f1cb8fd9f3d 100644
> --- a/bridge/br_common.h
> +++ b/bridge/br_common.h
> @@ -6,7 +6,7 @@
>  #define MDB_RTR_RTA(r) \
> ((struct rtattr *)(((char *)(r)) + RTA_ALIGN(sizeof(__u32
>
> -extern void print_vlan_info(FILE *fp, struct rtattr *tb, int ifindex);
> +extern void print_vlan_info(FILE *fp, struct rtattr *tb);
>  extern int print_linkinfo(const struct sockaddr_nl *who,
>   struct nlmsghdr *n,
>   void *arg);
> diff --git a/bridge/bridge.c b/bridge/bridge.c
> index 4b112e3b8da9..e5b4c3c2198f 100644
> --- a/bridge/bridge.c
> +++ b/bridge/bridge.c
> @@ -16,12 +16,15 @@
>  #include "utils.h"
>  #include "br_common.h"
>  #include "namespace.h"
> +#include "color.h"
>
>  struct rtnl_handle rth = { .fd = -1 };
>  int preferred_family = AF_UNSPEC;
>  int oneline;
>  int show_stats;
>  int show_details;
> +int show_pretty;
> +int color;
>  int compress_vlans;
>  int json;
>  int timestamp;
> @@ -39,7 +42,7 @@ static void usage(void)
>  "where OBJECT := { link | fdb | mdb | vlan | monitor }\n"
>  "  OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] |\n"
>  "   -o[neline] | -t[imestamp] | -n[etns] name |\n"
> -"   -c[ompressvlans] -p[retty] -j{son} }\n");
> +"   -c[ompressvlans] -color -p[retty] -j{son} }\n");
> exit(-1);
>  }
>
> @@ -170,6 +173,8 @@ main(int argc, char **argv)
> NEXT_ARG();
> if (netns_switch(argv[1]))
> exit(-1);
> +   } else if (matches(opt, "-color") == 0) {
> +   enable_color();
> } else if (matches(opt, "-compressvlans") == 0) {
> ++compress_vlans;
> } else if (matches(opt, "-force") == 0) {
> @@ -195,6 +200,9 @@ main(int argc, char **argv)
>
> _SL_ = oneline ? "\\" : "\n";
>
> +   if (json)
> +   check_if_color_enabled();
> +
> if (batch_file)
> return batch(batch_file);
>
> diff --git a/bridge/fdb.c b/bridge/fdb.c
> index 93b5b2e694e3..b4f6e8b3a01b 100644
> --- a/bridge/fdb.c
> +++ b/bridge/fdb.c
> @@ -22,9 +22,9 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>
> +#include "json_print.h"
>  #include "libnetlink.h"
>  #include "br_common.h"
>  #include "rt_names.h"
> @@ -32,8 +32,6 @@
>
>  static unsigned int filter_index, filter_vlan, filter_state;
>
> -json_writer_t *jw_global;
> -
>  static void usage(void)
>  {
> fprintf(stderr,
> @@ -83,13 +81,46 @@ static int state_a2n(unsigned int *s, const char *arg)
> return 0;
>  }
>
> -static void start_json_fdb_flags_array(bool *fdb_flags)
> +static void fdb_print_flags(FILE *fp, unsigned int flags)
> +{
> +   open_json_array(PRINT_JSON,
> +   is_json_context() ?  "flags" : "");
> +
> +   if (flags & NTF_SELF)
> +   print_string(PRINT_ANY, NULL, "%s ", "self");
> +
> +   if (flags & NTF_ROUTER)
> +   print_string(PRINT_ANY, NULL, "%s ", "router");
> +
> +   if (flags & NTF_EXT_LEARNED)
> +   print_string(PRINT_ANY, NULL, "%s ", "extern_learn");
> +
> +   if (flags & NTF_OFFLOADED)
> +   print_string(PRINT_ANY, NULL, "%s ", "offload");
> +
> +   if (flags & NTF_MASTER)
> +   print_string(PRINT_ANY, NULL, "%s ", "master");
> +
> +   close_json_array(PRINT_JSON, NULL);
> +}
> +
> +static void fdb_print_stats(FILE *fp, const struct nda_cacheinfo *ci)
>  {
> -   if (*fdb_flags)
> -   return;
> -   jsonw_name(jw_global, "flags");
> -   jsonw_start_array(jw_global);
> -   *fdb_flags = true;
> +   static int hz;
> +
> +   if (!hz)
> +   hz = get_user_hz();
> +
> +   if (is_json_context()) {
> +   print_uint(PRINT_JSON, "used", NULL,
> +ci->ndm_used / hz);
> +   print_uint(PRINT_JSON, "updated", NULL,
> +   ci->ndm_updated / hz);
> +   } else {
> +   fprintf(fp, "used %d/%d ", ci->ndm_used / hz,
> +

Re: [PATCH net-next 0/4] vxlan: fix default fdb entry user-space notify ordering/race

2018-07-07 Thread Roopa Prabhu

On Sat, Jul 7, 2018 at 4:23 AM, David Miller  wrote:
> From: Roopa Prabhu 
> Date: Wed,  4 Jul 2018 16:46:28 -0700
>
>> From: Roopa Prabhu 
>>
>> Problem:
>> In vxlan_newlink, a default fdb entry is added before register_netdev.
>> The default fdb creation function notifies user-space of the
>> fdb entry on the vxlan device which user-space does not know about yet.
>> (RTM_NEWNEIGH goes before RTM_NEWLINK for the same ifindex).
>>
>> This series fixes the user-space netlink notification ordering issue
>> with the following changes:
>> - decouple fdb notify from fdb create.
>> - Move fdb notify after register_netdev.
>> - modify rtnl_configure_link to allow configuring a link early.
>> - Call rtnl_configure_link in vxlan newlink handler to notify
>> userspace about the newlink before fdb notify and
>> hence avoiding the user-space race.
>>
>> Fixes: afbd8bae9c79 ("vxlan: add implicit fdb entry for default destination")
>> Signed-off-by: Roopa Prabhu 
>
> This is quite an old bug (circa v3.11).  Maybe after this cooks for some
> time in net-next you can also submit it for 'net' and we can thus send
> it off to -stable as well?
>
> Applied to net-next, thanks.

sure, sounds good. thanks.

[PATCH net-next 2/4] vxlan: add new fdb alloc and create helpers

2018-07-04 Thread Roopa Prabhu

From: Roopa Prabhu 

- Add new vxlan_fdb_alloc helper
- rename existing vxlan_fdb_create into vxlan_fdb_update:
because it really creates or updates an existing
fdb entry
- move new fdb creation into a separate vxlan_fdb_create

Main motivation for this change is to introduce the ability
to decouple vxlan fdb creation and notify, used in a later patch.

Signed-off-by: Roopa Prabhu 
---
 drivers/net/vxlan.c | 91 -
 1 file changed, 62 insertions(+), 29 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 601ae17..aa88beb 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -637,9 +637,62 @@ static int vxlan_gro_complete(struct sock *sk, struct 
sk_buff *skb, int nhoff)
return eth_gro_complete(skb, nhoff + sizeof(struct vxlanhdr));
 }
 
-/* Add new entry to forwarding table -- assumes lock held */
+static struct vxlan_fdb *vxlan_fdb_alloc(struct vxlan_dev *vxlan,
+const u8 *mac, __u16 state,
+__be32 src_vni, __u8 ndm_flags)
+{
+   struct vxlan_fdb *f;
+
+   f = kmalloc(sizeof(*f), GFP_ATOMIC);
+   if (!f)
+   return NULL;
+   f->state = state;
+   f->flags = ndm_flags;
+   f->updated = f->used = jiffies;
+   f->vni = src_vni;
+   INIT_LIST_HEAD(>remotes);
+   memcpy(f->eth_addr, mac, ETH_ALEN);
+
+   return f;
+}
+
 static int vxlan_fdb_create(struct vxlan_dev *vxlan,
const u8 *mac, union vxlan_addr *ip,
+   __u16 state, __be16 port, __be32 src_vni,
+   __be32 vni, __u32 ifindex, __u8 ndm_flags,
+   struct vxlan_fdb **fdb)
+{
+   struct vxlan_rdst *rd = NULL;
+   struct vxlan_fdb *f;
+   int rc;
+
+   if (vxlan->cfg.addrmax &&
+   vxlan->addrcnt >= vxlan->cfg.addrmax)
+   return -ENOSPC;
+
+   netdev_dbg(vxlan->dev, "add %pM -> %pIS\n", mac, ip);
+   f = vxlan_fdb_alloc(vxlan, mac, state, src_vni, ndm_flags);
+   if (!f)
+   return -ENOMEM;
+
+   rc = vxlan_fdb_append(f, ip, port, vni, ifindex, );
+   if (rc < 0) {
+   kfree(f);
+   return rc;
+   }
+
+   ++vxlan->addrcnt;
+   hlist_add_head_rcu(>hlist,
+  vxlan_fdb_head(vxlan, mac, src_vni));
+
+   *fdb = f;
+
+   return 0;
+}
+
+/* Add new entry to forwarding table -- assumes lock held */
+static int vxlan_fdb_update(struct vxlan_dev *vxlan,
+   const u8 *mac, union vxlan_addr *ip,
__u16 state, __u16 flags,
__be16 port, __be32 src_vni, __be32 vni,
__u32 ifindex, __u8 ndm_flags)
@@ -688,37 +741,17 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan,
if (!(flags & NLM_F_CREATE))
return -ENOENT;
 
-   if (vxlan->cfg.addrmax &&
-   vxlan->addrcnt >= vxlan->cfg.addrmax)
-   return -ENOSPC;
-
/* Disallow replace to add a multicast entry */
if ((flags & NLM_F_REPLACE) &&
(is_multicast_ether_addr(mac) || is_zero_ether_addr(mac)))
return -EOPNOTSUPP;
 
netdev_dbg(vxlan->dev, "add %pM -> %pIS\n", mac, ip);
-   f = kmalloc(sizeof(*f), GFP_ATOMIC);
-   if (!f)
-   return -ENOMEM;
-
-   notify = 1;
-   f->state = state;
-   f->flags = ndm_flags;
-   f->updated = f->used = jiffies;
-   f->vni = src_vni;
-   INIT_LIST_HEAD(>remotes);
-   memcpy(f->eth_addr, mac, ETH_ALEN);
-
-   rc = vxlan_fdb_append(f, ip, port, vni, ifindex, );
-   if (rc < 0) {
-   kfree(f);
+   rc = vxlan_fdb_create(vxlan, mac, ip, state, port, src_vni,
+ vni, ifindex, ndm_flags, );
+   if (rc < 0)
return rc;
-   }
-
-   ++vxlan->addrcnt;
-   hlist_add_head_rcu(>hlist,
-  vxlan_fdb_head(vxlan, mac, src_vni));
+   notify = 1;
}
 
if (notify) {
@@ -864,7 +897,7 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr 
*tb[],
return -EAFNOSUPPORT;
 
spin_lock_bh(>hash_lock);
-   err = vxlan_fdb_create(vxlan, addr, , ndm->ndm_state, flags,
+   err = vxlan_fdb_update(vxlan, addr, , ndm->ndm_state, flags,
   port, src_vni, vni, ifindex, ndm->ndm_flags);
spin_unlock_b

[PATCH net-next 4/4] vxlan: fix default fdb entry netlink notify ordering during netdev create

2018-07-04 Thread Roopa Prabhu

From: Roopa Prabhu 

Problem:
In vxlan_newlink, a default fdb entry is added before register_netdev.
The default fdb creation function also notifies user-space of the
fdb entry on the vxlan device which user-space does not know about yet.
(RTM_NEWNEIGH goes before RTM_NEWLINK for the same ifindex).

This patch fixes the user-space netlink notification ordering issue
with the following changes:
- decouple fdb notify from fdb create.
- Move fdb notify after register_netdev.
- Call rtnl_configure_link in vxlan newlink handler to notify
userspace about the newlink before fdb notify and
hence avoiding the user-space race.

Fixes: afbd8bae9c79 ("vxlan: add implicit fdb entry for default destination")
Signed-off-by: Roopa Prabhu 
---
 drivers/net/vxlan.c | 29 +
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 794a9a7..ababba3 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3197,6 +3197,7 @@ static int __vxlan_dev_create(struct net *net, struct 
net_device *dev,
 {
struct vxlan_net *vn = net_generic(net, vxlan_net_id);
struct vxlan_dev *vxlan = netdev_priv(dev);
+   struct vxlan_fdb *f = NULL;
int err;
 
err = vxlan_dev_configure(net, dev, conf, false, extack);
@@ -3207,27 +3208,38 @@ static int __vxlan_dev_create(struct net *net, struct 
net_device *dev,
 
/* create an fdb entry for a valid default destination */
if (!vxlan_addr_any(>default_dst.remote_ip)) {
-   err = vxlan_fdb_update(vxlan, all_zeros_mac,
+   err = vxlan_fdb_create(vxlan, all_zeros_mac,
   >default_dst.remote_ip,
   NUD_REACHABLE | NUD_PERMANENT,
-  NLM_F_EXCL | NLM_F_CREATE,
   vxlan->cfg.dst_port,
   vxlan->default_dst.remote_vni,
   vxlan->default_dst.remote_vni,
   vxlan->default_dst.remote_ifindex,
-  NTF_SELF);
+  NTF_SELF, );
if (err)
return err;
}
 
err = register_netdevice(dev);
+   if (err)
+   goto errout;
+
+   err = rtnl_configure_link(dev, NULL);
if (err) {
-   vxlan_fdb_delete_default(vxlan, vxlan->default_dst.remote_vni);
-   return err;
+   unregister_netdevice(dev);
+   goto errout;
}
 
+   /* notify default fdb entry */
+   if (f)
+   vxlan_fdb_notify(vxlan, f, first_remote_rtnl(f), RTM_NEWNEIGH);
+
list_add(>next, >vxlan_list);
return 0;
+errout:
+   if (f)
+   vxlan_fdb_destroy(vxlan, f, false);
+   return err;
 }
 
 static int vxlan_nl2conf(struct nlattr *tb[], struct nlattr *data[],
@@ -3462,6 +3474,7 @@ static int vxlan_changelink(struct net_device *dev, 
struct nlattr *tb[],
struct vxlan_rdst *dst = >default_dst;
struct vxlan_rdst old_dst;
struct vxlan_config conf;
+   struct vxlan_fdb *f = NULL;
int err;
 
err = vxlan_nl2conf(tb, data,
@@ -3487,19 +3500,19 @@ static int vxlan_changelink(struct net_device *dev, 
struct nlattr *tb[],
   old_dst.remote_ifindex, 0);
 
if (!vxlan_addr_any(>remote_ip)) {
-   err = vxlan_fdb_update(vxlan, all_zeros_mac,
+   err = vxlan_fdb_create(vxlan, all_zeros_mac,
   >remote_ip,
   NUD_REACHABLE | NUD_PERMANENT,
-  NLM_F_CREATE | NLM_F_APPEND,
   vxlan->cfg.dst_port,
   dst->remote_vni,
   dst->remote_vni,
   dst->remote_ifindex,
-  NTF_SELF);
+  NTF_SELF, );
if (err) {
spin_unlock_bh(>hash_lock);
return err;
}
+   vxlan_fdb_notify(vxlan, f, first_remote_rtnl(f), 
RTM_NEWNEIGH);
}
spin_unlock_bh(>hash_lock);
}
-- 
2.1.4

[PATCH net-next 3/4] vxlan: make netlink notify in vxlan_fdb_destroy optional

2018-07-04 Thread Roopa Prabhu

From: Roopa Prabhu 

Add a new option do_notify to vxlan_fdb_destroy to make
sending netlink notify optional. Used by a later patch.

Signed-off-by: Roopa Prabhu 
---
 drivers/net/vxlan.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index aa88beb..794a9a7 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -775,13 +775,15 @@ static void vxlan_fdb_free(struct rcu_head *head)
kfree(f);
 }
 
-static void vxlan_fdb_destroy(struct vxlan_dev *vxlan, struct vxlan_fdb *f)
+static void vxlan_fdb_destroy(struct vxlan_dev *vxlan, struct vxlan_fdb *f,
+ bool do_notify)
 {
netdev_dbg(vxlan->dev,
"delete %pM\n", f->eth_addr);
 
--vxlan->addrcnt;
-   vxlan_fdb_notify(vxlan, f, first_remote_rtnl(f), RTM_DELNEIGH);
+   if (do_notify)
+   vxlan_fdb_notify(vxlan, f, first_remote_rtnl(f), RTM_DELNEIGH);
 
hlist_del_rcu(>hlist);
call_rcu(>rcu, vxlan_fdb_free);
@@ -931,7 +933,7 @@ static int __vxlan_fdb_delete(struct vxlan_dev *vxlan,
goto out;
}
 
-   vxlan_fdb_destroy(vxlan, f);
+   vxlan_fdb_destroy(vxlan, f, true);
 
 out:
return 0;
@@ -2399,7 +2401,7 @@ static void vxlan_cleanup(struct timer_list *t)
   "garbage collect %pM\n",
   f->eth_addr);
f->state = NUD_STALE;
-   vxlan_fdb_destroy(vxlan, f);
+   vxlan_fdb_destroy(vxlan, f, true);
} else if (time_before(timeout, next_timer))
next_timer = timeout;
}
@@ -2450,7 +2452,7 @@ static void vxlan_fdb_delete_default(struct vxlan_dev 
*vxlan, __be32 vni)
spin_lock_bh(>hash_lock);
f = __vxlan_find_mac(vxlan, all_zeros_mac, vni);
if (f)
-   vxlan_fdb_destroy(vxlan, f);
+   vxlan_fdb_destroy(vxlan, f, true);
spin_unlock_bh(>hash_lock);
 }
 
@@ -2504,7 +2506,7 @@ static void vxlan_flush(struct vxlan_dev *vxlan, bool 
do_all)
continue;
/* the all_zeros_mac entry is deleted at vxlan_uninit */
if (!is_zero_ether_addr(f->eth_addr))
-   vxlan_fdb_destroy(vxlan, f);
+   vxlan_fdb_destroy(vxlan, f, true);
}
}
spin_unlock_bh(>hash_lock);
-- 
2.1.4

[PATCH net-next 1/4] rtnetlink: add rtnl_link_state check in rtnl_configure_link

2018-07-04 Thread Roopa Prabhu

From: Roopa Prabhu 

rtnl_configure_link sets dev->rtnl_link_state to
RTNL_LINK_INITIALIZED and unconditionally calls
__dev_notify_flags to notify user-space of dev flags.

current call sequence for rtnl_configure_link
rtnetlink_newlink
rtnl_link_ops->newlink
rtnl_configure_link (unconditionally notifies userspace of
 default and new dev flags)

If a newlink handler wants to call rtnl_configure_link
early, we will end up with duplicate notifications to
user-space.

This patch fixes rtnl_configure_link to check rtnl_link_state
and call __dev_notify_flags with gchanges = 0 if already
RTNL_LINK_INITIALIZED.

Later in the series, this patch will help the following sequence
where a driver implementing newlink can call rtnl_configure_link
to initialize the link early.

makes the following call sequence work:
rtnetlink_newlink
rtnl_link_ops->newlink (vxlan) -> rtnl_configure_link (initializes
link and notifies
user-space of default
dev flags)
rtnl_configure_link (updates dev flags if requested by user ifm
 and notifies user-space of new dev flags)

Signed-off-by: Roopa Prabhu 
---
 net/core/rtnetlink.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 5ef6122..e3f743c 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2759,9 +2759,12 @@ int rtnl_configure_link(struct net_device *dev, const 
struct ifinfomsg *ifm)
return err;
}
 
-   dev->rtnl_link_state = RTNL_LINK_INITIALIZED;
-
-   __dev_notify_flags(dev, old_flags, ~0U);
+   if (dev->rtnl_link_state == RTNL_LINK_INITIALIZED) {
+   __dev_notify_flags(dev, old_flags, 0U);
+   } else {
+   dev->rtnl_link_state = RTNL_LINK_INITIALIZED;
+   __dev_notify_flags(dev, old_flags, ~0U);
+   }
return 0;
 }
 EXPORT_SYMBOL(rtnl_configure_link);
-- 
2.1.4

[PATCH net-next 0/4] vxlan: fix default fdb entry user-space notify ordering/race

2018-07-04 Thread Roopa Prabhu

From: Roopa Prabhu 

Problem:
In vxlan_newlink, a default fdb entry is added before register_netdev.
The default fdb creation function notifies user-space of the
fdb entry on the vxlan device which user-space does not know about yet.
(RTM_NEWNEIGH goes before RTM_NEWLINK for the same ifindex).

This series fixes the user-space netlink notification ordering issue
with the following changes:
- decouple fdb notify from fdb create.
- Move fdb notify after register_netdev.
- modify rtnl_configure_link to allow configuring a link early.
- Call rtnl_configure_link in vxlan newlink handler to notify
userspace about the newlink before fdb notify and
hence avoiding the user-space race.

Fixes: afbd8bae9c79 ("vxlan: add implicit fdb entry for default destination")
Signed-off-by: Roopa Prabhu 

Roopa Prabhu (4):
  vxlan: add new fdb alloc and create helpers
  vxlan: make netlink notify in vxlan_fdb_destroy optional
  rtnetlink: add rtnl_link_state check in rtnl_configure_link
  vxlan: fix default fdb netlink notify ordering during netdev create

 drivers/net/vxlan.c  | 130 ++-
 net/core/rtnetlink.c |   9 ++--
 2 files changed, 94 insertions(+), 45 deletions(-)

-- 
2.1.4

[PATCH net] net: fib_rules: bring back rule_exists to match rule during add

2018-06-29 Thread Roopa Prabhu

From: Roopa Prabhu 

After commit f9d4b0c1e969 ("fib_rules: move common handling of newrule
delrule msgs into fib_nl2rule"), rule_exists got replaced by rule_find
for existing rule lookup in both the add and del paths. While this
is good for the delete path, it solves a few problems but opens up
a few invalid key matches in the add path.

$ip -4 rule add table main tos 10 fwmark 1
$ip -4 rule add table main tos 10
RTNETLINK answers: File exists

The problem here is rule_find does not check if the key masks in
the new and old rule are the same and hence ends up matching a more
secific rule. Rule key masks cannot be easily compared today without
an elaborate if-else block. Its best to introduce key masks for easier
and accurate rule comparison in the future. Until then, due to fear of
regressions this patch re-introduces older loose rule_exists during add.
Also fixes both rule_exists and rule_find to cover missing attributes.

Fixes: f9d4b0c1e969 ("fib_rules: move common handling of newrule delrule msgs 
into fib_nl2rule")
Signed-off-by: Roopa Prabhu 
---
This does bring back old problems with rule_exists. But at this hour, old 
problems
are better than new problems!. The pref compare in rule_exists can be fixed, 
but its best
done in net-next. I will look at adding more coverage in fib rule selftests. 
Thanks.

 net/core/fib_rules.c | 72 +++-
 1 file changed, 71 insertions(+), 1 deletion(-)

diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index bc8425d..f64aa13 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -444,6 +444,9 @@ static struct fib_rule *rule_find(struct fib_rules_ops *ops,
if (rule->ip_proto && r->ip_proto != rule->ip_proto)
continue;
 
+   if (rule->proto && r->proto != rule->proto)
+   continue;
+
if (fib_rule_port_range_set(>sport_range) &&
!fib_rule_port_range_compare(>sport_range,
 >sport_range))
@@ -653,6 +656,73 @@ static int fib_nl2rule(struct sk_buff *skb, struct 
nlmsghdr *nlh,
return err;
 }
 
+static int rule_exists(struct fib_rules_ops *ops, struct fib_rule_hdr *frh,
+  struct nlattr **tb, struct fib_rule *rule)
+{
+   struct fib_rule *r;
+
+   list_for_each_entry(r, >rules_list, list) {
+   if (r->action != rule->action)
+   continue;
+
+   if (r->table != rule->table)
+   continue;
+
+   if (r->pref != rule->pref)
+   continue;
+
+   if (memcmp(r->iifname, rule->iifname, IFNAMSIZ))
+   continue;
+
+   if (memcmp(r->oifname, rule->oifname, IFNAMSIZ))
+   continue;
+
+   if (r->mark != rule->mark)
+   continue;
+
+   if (r->suppress_ifgroup != rule->suppress_ifgroup)
+   continue;
+
+   if (r->suppress_prefixlen != rule->suppress_prefixlen)
+   continue;
+
+   if (r->mark_mask != rule->mark_mask)
+   continue;
+
+   if (r->tun_id != rule->tun_id)
+   continue;
+
+   if (r->fr_net != rule->fr_net)
+   continue;
+
+   if (r->l3mdev != rule->l3mdev)
+   continue;
+
+   if (!uid_eq(r->uid_range.start, rule->uid_range.start) ||
+   !uid_eq(r->uid_range.end, rule->uid_range.end))
+   continue;
+
+   if (r->ip_proto != rule->ip_proto)
+   continue;
+
+   if (r->proto != rule->proto)
+   continue;
+
+   if (!fib_rule_port_range_compare(>sport_range,
+>sport_range))
+   continue;
+
+   if (!fib_rule_port_range_compare(>dport_range,
+>dport_range))
+   continue;
+
+   if (!ops->compare(r, frh, tb))
+   continue;
+   return 1;
+   }
+   return 0;
+}
+
 int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr *nlh,
   struct netlink_ext_ack *extack)
 {
@@ -687,7 +757,7 @@ int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr 
*nlh,
goto errout;
 
if ((nlh->nlmsg_flags & NLM_F_EXCL) &&
-   rule_find(ops, frh, tb, rule, user_priority)) {
+   rule_exists(ops, frh, tb, rule)) {
err = -EEXIST;
goto errout_free;
}
-- 
2.1.4

Re: [PATCH net] net: fib_rules: add protocol check in rule_find

2018-06-29 Thread Roopa Prabhu

On Thu, Jun 28, 2018 at 9:59 PM, Roopa Prabhu  wrote:
> On Wed, Jun 27, 2018 at 6:27 PM, Roopa Prabhu  
> wrote:
>> From: Roopa Prabhu 
>>
>> After commit f9d4b0c1e969 ("fib_rules: move common handling of newrule
>> delrule msgs into fib_nl2rule"), rule_find is strict about checking
>> for an existing rule. rule_find must check against all
>> user given attributes, else it may match against a subset
>> of attributes and return an existing rule.
>>
>> In the below case, without support for protocol match, rule_find
>> will match only against 'table main' and return an existing rule.
>>
>> $ip -4 rule add table main protocol boot
>> RTNETLINK answers: File exists
>>
>> This patch adds protocol support to rule_find, forcing it to
>> check protocol match if given by the user.
>>
>> Fixes: f9d4b0c1e969 ("fib_rules: move common handling of newrule delrule 
>> msgs into fib_nl2rule")
>> Signed-off-by: Roopa Prabhu 
>> ---
>> I spent some time looking at all match keys today and protocol
>> was the only missing one (protocol is not in a released kernel yet).
>> The only way this could be avoided is to move back to the old loose
>> rule_find. I am worried about this new strict checking surprising users,
>> but going back to the previous loose checking does not seem right either.
>> If there is a reason to believe that users did rely on the previous
>> behaviour, I will be happy to revert. Here is another example of old and
>> new behaviour.
>>
>> old rule_find behaviour:
>> $ip -4 rule add table main protocol boot
>> $ip -4 rule add table main protocol boot
>> $ip -4 rule add table main protocol boot
>> $ip rule show
>> 0:  from all lookup local
>> 32763:  from all lookup main  proto boot
>> 32764:  from all lookup main  proto boot
>> 32765:  from all lookup main  proto boot
>> 32766:  from all lookup main
>> 32767:  from all lookup default
>>
>> new rule_find behaviour (after this patch):
>> $ip -4 rule add table main protocol boot
>> $ip -4 rule add table main protocol boot
>> RTNETLINK answers: File exists
>>
>
> I found the case where the new rule_find breaks for add.
> $ip -4 rule add table main tos 10 fwmark 1
> $ip -4 rule add table main tos 10
> RTNETLINK answers: File exists
>
> The key masks in the new and old rule need to be compared .
> And it cannot be easily compared today without an elaborate if-else block.
> Its best to introduce key masks for easier and accurate rule comparison.
> But this is best done in net-next. I will submit an incremental patch
> tomorrow to
> restore previous rule_exists for the add case and the rest should be good.
>
> The current patch in context is needed regardless.
>
> Thanks (and sorry about the iterations).

as I write the commit msg for the new incremental patch, it seems
better to merge this one with the new one.

Please ignore this patch, I will send an updated patch in a bit. thanks.

Re: [PATCH net] net: fib_rules: add protocol check in rule_find

2018-06-28 Thread Roopa Prabhu

On Wed, Jun 27, 2018 at 6:27 PM, Roopa Prabhu  wrote:
> From: Roopa Prabhu 
>
> After commit f9d4b0c1e969 ("fib_rules: move common handling of newrule
> delrule msgs into fib_nl2rule"), rule_find is strict about checking
> for an existing rule. rule_find must check against all
> user given attributes, else it may match against a subset
> of attributes and return an existing rule.
>
> In the below case, without support for protocol match, rule_find
> will match only against 'table main' and return an existing rule.
>
> $ip -4 rule add table main protocol boot
> RTNETLINK answers: File exists
>
> This patch adds protocol support to rule_find, forcing it to
> check protocol match if given by the user.
>
> Fixes: f9d4b0c1e969 ("fib_rules: move common handling of newrule delrule msgs 
> into fib_nl2rule")
> Signed-off-by: Roopa Prabhu 
> ---
> I spent some time looking at all match keys today and protocol
> was the only missing one (protocol is not in a released kernel yet).
> The only way this could be avoided is to move back to the old loose
> rule_find. I am worried about this new strict checking surprising users,
> but going back to the previous loose checking does not seem right either.
> If there is a reason to believe that users did rely on the previous
> behaviour, I will be happy to revert. Here is another example of old and
> new behaviour.
>
> old rule_find behaviour:
> $ip -4 rule add table main protocol boot
> $ip -4 rule add table main protocol boot
> $ip -4 rule add table main protocol boot
> $ip rule show
> 0:  from all lookup local
> 32763:  from all lookup main  proto boot
> 32764:  from all lookup main  proto boot
> 32765:  from all lookup main  proto boot
> 32766:  from all lookup main
> 32767:  from all lookup default
>
> new rule_find behaviour (after this patch):
> $ip -4 rule add table main protocol boot
> $ip -4 rule add table main protocol boot
> RTNETLINK answers: File exists
>

I found the case where the new rule_find breaks for add.
$ip -4 rule add table main tos 10 fwmark 1
$ip -4 rule add table main tos 10
RTNETLINK answers: File exists

The key masks in the new and old rule need to be compared .
And it cannot be easily compared today without an elaborate if-else block.
Its best to introduce key masks for easier and accurate rule comparison.
But this is best done in net-next. I will submit an incremental patch
tomorrow to
restore previous rule_exists for the add case and the rest should be good.

The current patch in context is needed regardless.

Thanks (and sorry about the iterations).

[PATCH net] net: fib_rules: add protocol check in rule_find

2018-06-27 Thread Roopa Prabhu

From: Roopa Prabhu 

After commit f9d4b0c1e969 ("fib_rules: move common handling of newrule
delrule msgs into fib_nl2rule"), rule_find is strict about checking
for an existing rule. rule_find must check against all
user given attributes, else it may match against a subset
of attributes and return an existing rule.

In the below case, without support for protocol match, rule_find
will match only against 'table main' and return an existing rule.

$ip -4 rule add table main protocol boot
RTNETLINK answers: File exists

This patch adds protocol support to rule_find, forcing it to
check protocol match if given by the user.

Fixes: f9d4b0c1e969 ("fib_rules: move common handling of newrule delrule msgs 
into fib_nl2rule")
Signed-off-by: Roopa Prabhu 
---
I spent some time looking at all match keys today and protocol
was the only missing one (protocol is not in a released kernel yet).
The only way this could be avoided is to move back to the old loose
rule_find. I am worried about this new strict checking surprising users,
but going back to the previous loose checking does not seem right either.
If there is a reason to believe that users did rely on the previous
behaviour, I will be happy to revert. Here is another example of old and
new behaviour.

old rule_find behaviour:
$ip -4 rule add table main protocol boot
$ip -4 rule add table main protocol boot
$ip -4 rule add table main protocol boot
$ip rule show
0:  from all lookup local 
32763:  from all lookup main  proto boot 
32764:  from all lookup main  proto boot 
32765:  from all lookup main  proto boot 
32766:  from all lookup main 
32767:  from all lookup default 

new rule_find behaviour (after this patch):
$ip -4 rule add table main protocol boot
$ip -4 rule add table main protocol boot
RTNETLINK answers: File exists

 net/core/fib_rules.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index bc8425d..5905567 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -444,6 +444,9 @@ static struct fib_rule *rule_find(struct fib_rules_ops *ops,
if (rule->ip_proto && r->ip_proto != rule->ip_proto)
continue;
 
+   if (rule->proto && r->proto != rule->proto)
+   continue;
+
if (fib_rule_port_range_set(>sport_range) &&
!fib_rule_port_range_compare(>sport_range,
 >sport_range))
-- 
2.1.4

Re: [PATCH v2] fib_rules: match rules based on suppress_* properties too

2018-06-26 Thread Roopa Prabhu

On Tue, Jun 26, 2018 at 6:34 PM, David Miller  wrote:
> From: "Jason A. Donenfeld" 
> Date: Tue, 26 Jun 2018 01:39:32 +0200
>
>> Two rules with different values of suppress_prefix or suppress_ifgroup
>> are not the same. This fixes an -EEXIST when running:
>>
>>$ ip -4 rule add table main suppress_prefixlength 0
>>
>> Signed-off-by: Jason A. Donenfeld 
>> Fixes: f9d4b0c1e969 ("fib_rules: move common handling of newrule delrule 
>> msgs into fib_nl2rule")
>
> Roopa, thanks for doing all of that analysis.
>
> I think applying this patch makes the most sense at this point,
> so that it what I have done.


Thanks, will keep an eye out and add some more tests

Re: [PATCH v2] fib_rules: match rules based on suppress_* properties too

2018-06-26 Thread Roopa Prabhu

On Mon, Jun 25, 2018 at 4:39 PM, Jason A. Donenfeld  wrote:
> Two rules with different values of suppress_prefix or suppress_ifgroup
> are not the same. This fixes an -EEXIST when running:
>
>$ ip -4 rule add table main suppress_prefixlength 0
>
> Signed-off-by: Jason A. Donenfeld 
> Fixes: f9d4b0c1e969 ("fib_rules: move common handling of newrule delrule msgs 
> into fib_nl2rule")
> ---
> This adds the new condition you mentioned. I'm not sure what you make of
> DaveM's remark about this not being in the original code, but here is
> nonetheless the requested change.

I just saw DaveM's comment and agree the new rule_find is different
but that was intentional and it merged
the finding of the rule in the newlink and dellink paths. I did port
each of the conditions from previous rule_exists
to new rule_find, but forgot to add the new keys which now became
necessary. I replied with details on your
other bug report thread. Also pasting that response here:

So the previous rule_exists code did not check for attribute matches correctly.
It would ignore a rule at the first non-existent attribute mis-match.
And rule_find will always
be called with a valid key.
eg in your case, it would
return at pref mismatch...and never match an existing rule.

$ip -4 rule add table main suppress_prefixlength 0
$ip -4 rule add table main suppress_prefixlength 0
$ip -4 rule add table main suppress_prefixlength 0

$ip rule show
0:  from all lookup local
32763:  from all lookup main suppress_prefixlength 0
32764:  from all lookup main suppress_prefixlength 0
32765:  from all lookup main suppress_prefixlength 0
32766:  from all lookup main
32767:  from all lookup default

With your patch, you should get proper EXISTS check
$ ip -4 rule add table main suppress_prefixlength 0
$ ip -4 rule add table main suppress_prefixlength 0

RTNETLINK answers: File exists

Dave, pls let me know if this is acceptable. If not
I can easily restore the previous rule_exists func. Will also submit a
patch to cover this in self-tests.

thanks.

>
>  net/core/fib_rules.c | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
> index 126ffc5bc630..bc8425d81022 100644
> --- a/net/core/fib_rules.c
> +++ b/net/core/fib_rules.c
> @@ -416,6 +416,14 @@ static struct fib_rule *rule_find(struct fib_rules_ops 
> *ops,
> if (rule->mark && r->mark != rule->mark)
> continue;
>
> +   if (rule->suppress_ifgroup != -1 &&
> +   r->suppress_ifgroup != rule->suppress_ifgroup)
> +   continue;
> +
> +   if (rule->suppress_prefixlen != -1 &&
> +   r->suppress_prefixlen != rule->suppress_prefixlen)
> +   continue;
> +
> if (rule->mark_mask && r->mark_mask != rule->mark_mask)
> continue;
>
> --

[PATCH net-next] neighbour: force neigh_invalidate when NUD_FAILED update is from admin

2018-06-25 Thread Roopa Prabhu

From: Roopa Prabhu 

In systems where neigh gc thresh holds are set to high values,
admin deleted neigh entries (eg ip neigh flush or ip neigh del) can
linger around in NUD_FAILED state for a long time until periodic gc kicks
in. This patch forces neigh_invalidate when NUD_FAILED neigh_update is
from an admin.

Signed-off-by: Roopa Prabhu 
---
My testing has not shown any problems with this patch. But i
am not sure why historically neigh admin was not considered here:
I am assuming that it is because the problem is not very obvious in
default low gc threshold deployments.
 net/core/neighbour.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 8e3fda9..cbe85d8 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1148,7 +1148,8 @@ int neigh_update(struct neighbour *neigh, const u8 
*lladdr, u8 new,
neigh->nud_state = new;
err = 0;
notify = old & NUD_VALID;
-   if ((old & (NUD_INCOMPLETE | NUD_PROBE)) &&
+   if (((old & (NUD_INCOMPLETE | NUD_PROBE)) ||
+(flags & NEIGH_UPDATE_F_ADMIN)) &&
(new & NUD_FAILED)) {
neigh_invalidate(neigh);
notify = 1;
-- 
2.1.4

Re: [net regression] "fib_rules: move common handling of newrule delrule msgs into fib_nl2rule" breaks suppress_prefixlength

2018-06-25 Thread Roopa Prabhu

On Mon, Jun 25, 2018 at 8:23 AM, Roopa Prabhu  wrote:
> On Sat, Jun 23, 2018 at 8:46 AM, Jason A. Donenfeld  wrote:
>> Hey Roopa,
>>
>> On a kernel with a minimal networking config,
>> CONFIG_IP_MULTIPLE_TABLES appears to be broken for certain rules after
>> f9d4b0c1e9695e3de7af3768205bacc27312320c.
>>
>> Try, for example, running:
>>
>> $ ip -4 rule add table main suppress_prefixlength 0
>>
>> It returns with EEXIST.
>>
>> Perhaps the reason is that the new rule_find function does not match
>> on suppress_prefixlength? However, rule_exist from before didn't do
>> that either. I'll keep playing and see if I can track it down myself,
>> but thought I should let you know first.
>
> I am surprised at that also. I cannot find prior rule_exist looking at
> suppress_prefixlength.
> I will dig deeper also today. But your patch LGTM with a small change
> I commented on it.
>

So the previous rule_exists code did not check for attribute matches correctly.
It would ignore a rule at the first non-existent attribute mis-match.
eg in your case, it would
return at pref mismatch.


$ip -4 rule add table main suppress_prefixlength 0
$ip -4 rule add table main suppress_prefixlength 0
$ip -4 rule add table main suppress_prefixlength 0

$ip rule show
0:  from all lookup local

32763:  from all lookup main suppress_prefixlength 0

32764:  from all lookup main suppress_prefixlength 0

32765:  from all lookup main suppress_prefixlength 0

32766:  from all lookup main

32767:  from all lookup default


With your patch (with my proposed fixes), you should get proper EXISTS check

$ git diff HEAD

diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c

index 126ffc5..de4c171 100644

--- a/net/core/fib_rules.c

+++ b/net/core/fib_rules.c

@@ -428,6 +428,14 @@ static struct fib_rule *rule_find(struct
fib_rules_ops *ops,

if (rule->l3mdev && r->l3mdev != rule->l3mdev)

continue;



+   if (rule->suppress_ifgroup != -1 &&

+   (r->suppress_ifgroup != rule->suppress_ifgroup))

+   continue;

+

+   if (rule->suppress_prefixlen != -1 &&

+   (r->suppress_prefixlen != rule->suppress_prefixlen))

+   continue;

+

if (uid_range_set(>uid_range) &&

(!uid_eq(r->uid_range.start, rule->uid_range.start) ||

!uid_eq(r->uid_range.end, rule->uid_range.end)))



$ ip -4 rule add table main suppress_prefixlength 0

$ ip -4 rule add table main suppress_prefixlength 0

RTNETLINK answers: File exists

Re: [net regression] "fib_rules: move common handling of newrule delrule msgs into fib_nl2rule" breaks suppress_prefixlength

2018-06-25 Thread Roopa Prabhu

On Sat, Jun 23, 2018 at 8:46 AM, Jason A. Donenfeld  wrote:
> Hey Roopa,
>
> On a kernel with a minimal networking config,
> CONFIG_IP_MULTIPLE_TABLES appears to be broken for certain rules after
> f9d4b0c1e9695e3de7af3768205bacc27312320c.
>
> Try, for example, running:
>
> $ ip -4 rule add table main suppress_prefixlength 0
>
> It returns with EEXIST.
>
> Perhaps the reason is that the new rule_find function does not match
> on suppress_prefixlength? However, rule_exist from before didn't do
> that either. I'll keep playing and see if I can track it down myself,
> but thought I should let you know first.

I am surprised at that also. I cannot find prior rule_exist looking at
suppress_prefixlength.
I will dig deeper also today. But your patch LGTM with a small change
I commented on it.

>
> A relevant .config can be found at https://א.cc/iq5HoUY0
>

thanks.

Re: [PATCH] fib_rules: match rules based on suppress_* properties too

2018-06-25 Thread Roopa Prabhu

On Sat, Jun 23, 2018 at 8:59 AM, Jason A. Donenfeld  wrote:
> Two rules with different values of suppress_prefix or suppress_ifgroup
> are not the same. This fixes an -EEXIST when running:
>
>$ ip -4 rule add table main suppress_prefixlength 0
>
> Signed-off-by: Jason A. Donenfeld 
> Fixes: f9d4b0c1e969 ("fib_rules: move common handling of newrule delrule msgs 
> into fib_nl2rule")
> ---
>  net/core/fib_rules.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
> index 126ffc5bc630..665799311b98 100644
> --- a/net/core/fib_rules.c
> +++ b/net/core/fib_rules.c
> @@ -416,6 +416,12 @@ static struct fib_rule *rule_find(struct fib_rules_ops 
> *ops,
> if (rule->mark && r->mark != rule->mark)
> continue;
>
> +   if (r->suppress_ifgroup != rule->suppress_ifgroup)
> +   continue;
> +
> +   if (r->suppress_prefixlen != rule->suppress_prefixlen)
> +   continue;
> +
> if (rule->mark_mask && r->mark_mask != rule->mark_mask)
> continue;
>

Can you please change the check to compare only if the new rule has
the attributes set ?

eg:

if (rule->suppress_ifgroup != -1 && (r->suppress_ifgroup !=
rule->suppress_ifgroup))

same thing for suppress_prefixlen

[PATCH net] neighbour: skip NTF_EXT_LEARNED entries during forced gc

2018-06-12 Thread Roopa Prabhu

From: Roopa Prabhu 

Commit 9ce33e46531d ("neighbour: support for NTF_EXT_LEARNED flag")
added support for NTF_EXT_LEARNED for neighbour entries.
NTF_EXT_LEARNED entries are neigh entries managed by control
plane (eg: Ethernet VPN implementation in FRR routing suite).
Periodic gc already excludes these entries. This patch extends
it to forced gc which the earlier patch missed.

Fixes: 9ce33e46531d ("neighbour: support for NTF_EXT_LEARNED flag")
Signed-off-by: Roopa Prabhu 
---
 net/core/neighbour.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index a7a9c3d..8e3fda9 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -119,13 +119,14 @@ unsigned long neigh_rand_reach_time(unsigned long base)
 EXPORT_SYMBOL(neigh_rand_reach_time);
 
 
-static bool neigh_del(struct neighbour *n, __u8 state,
+static bool neigh_del(struct neighbour *n, __u8 state, __u8 flags,
  struct neighbour __rcu **np, struct neigh_table *tbl)
 {
bool retval = false;
 
write_lock(>lock);
-   if (refcount_read(>refcnt) == 1 && !(n->nud_state & state)) {
+   if (refcount_read(>refcnt) == 1 && !(n->nud_state & state) &&
+   !(n->flags & flags)) {
struct neighbour *neigh;
 
neigh = rcu_dereference_protected(n->next,
@@ -157,7 +158,7 @@ bool neigh_remove_one(struct neighbour *ndel, struct 
neigh_table *tbl)
while ((n = rcu_dereference_protected(*np,
  lockdep_is_held(>lock {
if (n == ndel)
-   return neigh_del(n, 0, np, tbl);
+   return neigh_del(n, 0, 0, np, tbl);
np = >next;
}
return false;
@@ -185,7 +186,8 @@ static int neigh_forced_gc(struct neigh_table *tbl)
 * - nobody refers to it.
 * - it is not permanent
 */
-   if (neigh_del(n, NUD_PERMANENT, np, tbl)) {
+   if (neigh_del(n, NUD_PERMANENT, NTF_EXT_LEARNED, np,
+ tbl)) {
shrunk = 1;
continue;
}
-- 
2.1.4

[PATCH iproute2 net-next] iproute: ip route get support for sport, dport and ipproto match

2018-05-30 Thread Roopa Prabhu

From: Roopa Prabhu 

Signed-off-by: Roopa Prabhu 
---
 ip/iproute.c   | 26 +-
 man/man8/ip-route.8.in | 20 +++-
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/ip/iproute.c b/ip/iproute.c
index 56dd9f2..ef04d27 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -69,7 +69,8 @@ static void usage(void)
"[ from ADDRESS iif STRING ]\n"
"[ oif STRING ] [ tos TOS ]\n"
"[ mark NUMBER ] [ vrf NAME ]\n"
-   "[ uid NUMBER ]\n"
+   "[ uid NUMBER ] [ ipproto PROTOCOL 
]\n"
+   "[ sport NUMBER ] [ dport NUMBER 
]\n"
"   ip route { add | del | change | append | replace } 
ROUTE\n"
"SELECTOR := [ root PREFIX ] [ match PREFIX ] [ exact PREFIX 
]\n"
"[ table TABLE_ID ] [ vrf NAME ] [ proto RTPROTO 
]\n"
@@ -1994,6 +1995,29 @@ static int iproute_get(int argc, char **argv)
req.r.rtm_family = addr.family;
addattr_l(, sizeof(req), RTA_NEWDST,
  , addr.bytelen);
+   } else if (matches(*argv, "sport") == 0) {
+   __be16 sport;
+
+   NEXT_ARG();
+   if (get_be16(, *argv, 0))
+   invarg("invalid sport\n", *argv);
+   addattr16(, sizeof(req), RTA_SPORT, sport);
+   } else if (matches(*argv, "dport") == 0) {
+   __be16 dport;
+
+   NEXT_ARG();
+   if (get_be16(, *argv, 0))
+   invarg("invalid dport\n", *argv);
+   addattr16(, sizeof(req), RTA_DPORT, dport);
+   } else if (matches(*argv, "ipproto") == 0) {
+   int ipproto;
+
+   NEXT_ARG();
+   ipproto = inet_proto_a2n(*argv);
+   if (ipproto < 0)
+   invarg("Invalid \"ipproto\" value\n",
+  *argv);
+   addattr8(, sizeof(req), RTA_IP_PROTO, ipproto);
} else {
inet_prefix addr;
 
diff --git a/man/man8/ip-route.8.in b/man/man8/ip-route.8.in
index b28f3d2..b21a847 100644
--- a/man/man8/ip-route.8.in
+++ b/man/man8/ip-route.8.in
@@ -38,7 +38,13 @@ ip-route \- routing table management
 .B  tos
 .IR TOS " ] [ "
 .B  vrf
-.IR NAME " ] "
+.IR NAME " ] [ "
+.B  ipproto
+.IR PROTOCOL " ] [ "
+.B  sport
+.IR NUMBER " ] [ "
+.B  dport
+.IR NUMBER " ] "
 
 .ti -8
 .BR "ip route" " { " add " | " del " | " change " | " append " | "\
@@ -1045,6 +1051,18 @@ the firewall mark
 force the vrf device on which this packet will be routed.
 
 .TP
+.BI ipproto " PROTOCOL"
+ip protocol as seen by the route lookup
+
+.TP
+.BI sport " NUMBER"
+source port as seen by the route lookup
+
+.TP
+.BI dport " NUMBER"
+destination port as seen by the route lookup
+
+.TP
 .B connected
 if no source address
 .RB "(option " from ")"
-- 
2.1.4

Re: [PATCH net] net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy

2018-05-24 Thread Roopa Prabhu

On Wed, May 23, 2018 at 12:04 PM, David Miller <da...@davemloft.net> wrote:
> From: Roopa Prabhu <ro...@cumulusnetworks.com>
> Date: Tue, 22 May 2018 13:44:51 -0700
>
>> From: Roopa Prabhu <ro...@cumulusnetworks.com>
>>
>> Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com>
>
> Applied and queued up for -stable.
>
> Please provide an appropriate Fixes: tag next time.

yes, will do, thanks.

This seems to date back to 2006 when rtm_ipv4_policy was first introduced:
Fixes: 4e902c57417c ("[IPv4]: FIB configuration using struct fib_config")

[PATCH net-next v5 3/3] selftests: net: initial fib rule tests

2018-05-22 Thread Roopa Prabhu

From: Roopa Prabhu <ro...@cumulusnetworks.com>

This adds a first set of tests for fib rule match/action for
ipv4 and ipv6. Initial tests only cover action lookup table.
can be extended to cover other actions in the future.
Uses ip route get to validate the rule lookup.

Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com>
---
 tools/testing/selftests/net/Makefile  |   2 +-
 tools/testing/selftests/net/fib_rule_tests.sh | 248 ++
 2 files changed, 249 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/net/fib_rule_tests.sh

diff --git a/tools/testing/selftests/net/Makefile 
b/tools/testing/selftests/net/Makefile
index e60dddb..7cb0f49 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -6,7 +6,7 @@ CFLAGS += -I../../../../usr/include/
 
 TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh 
rtnetlink.sh
 TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh udpgso.sh
-TEST_PROGS += udpgso_bench.sh
+TEST_PROGS += udpgso_bench.sh fib_rule_tests.sh
 TEST_PROGS_EXTENDED := in_netns.sh
 TEST_GEN_FILES =  socket
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
diff --git a/tools/testing/selftests/net/fib_rule_tests.sh 
b/tools/testing/selftests/net/fib_rule_tests.sh
new file mode 100755
index 000..d4cfb6a
--- /dev/null
+++ b/tools/testing/selftests/net/fib_rule_tests.sh
@@ -0,0 +1,248 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test is for checking IPv4 and IPv6 FIB rules API
+
+ret=0
+
+PAUSE_ON_FAIL=${PAUSE_ON_FAIL:=no}
+IP="ip -netns testns"
+
+RTABLE=100
+GW_IP4=192.51.100.2
+SRC_IP=192.51.100.3
+GW_IP6=2001:db8:1::2
+SRC_IP6=2001:db8:1::3
+
+DEV_ADDR=192.51.100.1
+DEV=dummy0
+
+log_test()
+{
+   local rc=$1
+   local expected=$2
+   local msg="$3"
+
+   if [ ${rc} -eq ${expected} ]; then
+   nsuccess=$((nsuccess+1))
+   printf "\nTEST: %-50s  [ OK ]\n" "${msg}"
+   else
+   nfail=$((nfail+1))
+   printf "\nTEST: %-50s  [FAIL]\n" "${msg}"
+   if [ "${PAUSE_ON_FAIL}" = "yes" ]; then
+   echo
+   echo "hit enter to continue, 'q' to quit"
+   read a
+   [ "$a" = "q" ] && exit 1
+   fi
+   fi
+}
+
+log_section()
+{
+   echo
+   echo 
"##"
+   echo "TEST SECTION: $*"
+   echo 
"##"
+}
+
+setup()
+{
+   set -e
+   ip netns add testns
+   $IP link set dev lo up
+
+   $IP link add dummy0 type dummy
+   $IP link set dev dummy0 up
+   $IP address add 198.51.100.1/24 dev dummy0
+   $IP -6 address add 2001:db8:1::1/64 dev dummy0
+
+   set +e
+}
+
+cleanup()
+{
+   $IP link del dev dummy0 &> /dev/null
+   ip netns del testns
+}
+
+fib_check_iproute_support()
+{
+   ip rule help 2>&1 | grep -q $1
+   if [ $? -ne 0 ]; then
+   echo "SKIP: iproute2 iprule too old, missing $1 match"
+   return 1
+   fi
+
+   ip route get help 2>&1 | grep -q $2
+   if [ $? -ne 0 ]; then
+   echo "SKIP: iproute2 get route too old, missing $2 match"
+   return 1
+   fi
+
+   return 0
+}
+
+fib_rule6_del()
+{
+   $IP -6 rule del $1
+   log_test $? 0 "rule6 del $1"
+}
+
+fib_rule6_del_by_pref()
+{
+   pref=$($IP -6 rule show | grep "$1 lookup $TABLE" | cut -d ":" -f 1)
+   $IP -6 rule del pref $pref
+}
+
+fib_rule6_test_match_n_redirect()
+{
+   local match="$1"
+   local getmatch="$2"
+
+   $IP -6 rule add $match table $RTABLE
+   $IP -6 route get $GW_IP6 $getmatch | grep -q "table $RTABLE"
+   log_test $? 0 "rule6 check: $1"
+
+   fib_rule6_del_by_pref "$match"
+   log_test $? 0 "rule6 del by pref: $match"
+}
+
+fib_rule6_test()
+{
+   # setup the fib rule redirect route
+   $IP -6 route add table $RTABLE default via $GW_IP6 dev $DEV onlink
+
+   match="oif $DEV"
+   fib_rule6_test_match_n_redirect "$match" "$match" "oif redirect to 
table"
+
+   match="from $SRC_IP6 iif $DEV"
+   fib_rule6_test_match_n_redirect "$match" "$match" "iif redirect to 
table"
+
+   match="tos 0x10"
+   fib_rule6_test_match_n_redirect "$match" "$match" "tos redirect to 
table"
+
+   match="fwmark 0x64"
+   getmatch="mark 0x64"
+   fib_rule6_test_match_n_redir

[PATCH net-next v5 0/3] fib rule selftest

2018-05-22 Thread Roopa Prabhu

From: Roopa Prabhu <ro...@cumulusnetworks.com>

This series adds a new test to test fib rules.
ip route get is used to test fib rule matches.
This series also extends ip route get to match on
sport and dport to test recent support of sport
and dport fib rule match.

v2 - address ido's commemt to make sport dport
ip route get to work correctly for input route
get. I don't support ip route get on ip-proto match yet.
ip route get creates a udp packet and i have left
it at that. We could extend ip route get to support
a few ip proto matches in followup patches.

v3 - Support ip_proto (only tcp and udp) match in getroute.
dropped printing of new match attrs in ip route get, 
because ipv6 does not print it. And ipv6 currrently shares
the dump api with ipv6 notify and its better to not add them
to the notify api. dropped it to keep the api consistent between
ipv4 and ipv6 (though uid is already printed in the ipv4 case).
If we need it, both ipv4 and ipv6 can be enhanced to provide
a separate get api. Moved skb creation for ipv4 to a separate func.

v4 - drop separate skb for netlink and fix concerns around rcu and netlink
 reply (as pointed out by DaveM). I now try to reset the skb after the route
 lookup and before the netlink send (testing shows this is ok. More eyes and
 any feedback here will be helpful)

v5 - dropped RTA_TABLE ipv4_rtm_policy update from this series and posted
 it separately for net (feedback from Eric)

Roopa Prabhu (3):
  ipv4: support sport, dport and ip_proto in RTM_GETROUTE
  ipv6: support sport, dport and ip_proto in RTM_GETROUTE
  selftests: net: initial fib rule tests

 include/uapi/linux/rtnetlink.h|   2 +
 net/ipv4/route.c  | 152 -
 net/ipv6/route.c  |  25 +++
 tools/testing/selftests/net/Makefile  |   2 +-
 tools/testing/selftests/net/fib_rule_tests.sh | 224 ++
 5 files changed, 366 insertions(+), 39 deletions(-)
 create mode 100644 tools/testing/selftests/net/fib_rule_tests.sh

-- 
2.1.4

[PATCH net-next v5 1/3] ipv4: support sport, dport and ip_proto in RTM_GETROUTE

2018-05-22 Thread Roopa Prabhu

From: Roopa Prabhu <ro...@cumulusnetworks.com>

This is a followup to fib rules sport, dport and ipproto
match support. Only supports tcp, udp and icmp for ipproto.
Used by fib rule self tests.

Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com>
---
 include/net/ip.h   |   3 +
 include/uapi/linux/rtnetlink.h |   3 +
 net/ipv4/Makefile  |   2 +-
 net/ipv4/fib_frontend.c|   3 +
 net/ipv4/netlink.c |  23 +++
 net/ipv4/route.c   | 146 ++---
 6 files changed, 140 insertions(+), 40 deletions(-)
 create mode 100644 net/ipv4/netlink.c

diff --git a/include/net/ip.h b/include/net/ip.h
index bada1f1..0d2281b 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -664,4 +664,7 @@ extern int sysctl_icmp_msgs_burst;
 int ip_misc_proc_init(void);
 #endif
 
+int rtm_getroute_parse_ip_proto(struct nlattr *attr, u8 *ip_proto,
+   struct netlink_ext_ack *extack);
+
 #endif /* _IP_H */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 9b15005..cabb210 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -327,6 +327,9 @@ enum rtattr_type_t {
RTA_PAD,
RTA_UID,
RTA_TTL_PROPAGATE,
+   RTA_IP_PROTO,
+   RTA_SPORT,
+   RTA_DPORT,
__RTA_MAX
 };
 
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index b379520..13f2ba9 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -14,7 +14,7 @@ obj-y := route.o inetpeer.o protocol.o \
 udp_offload.o arp.o icmp.o devinet.o af_inet.o igmp.o \
 fib_frontend.o fib_semantics.o fib_trie.o fib_notifier.o \
 inet_fragment.o ping.o ip_tunnel_core.o gre_offload.o \
-metrics.o
+metrics.o netlink.o
 
 obj-$(CONFIG_NET_IP_TUNNEL) += ip_tunnel.o
 obj-$(CONFIG_SYSCTL) += sysctl_net_ipv4.o
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 4d622112..897ae92 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -649,6 +649,9 @@ const struct nla_policy rtm_ipv4_policy[RTA_MAX + 1] = {
[RTA_ENCAP] = { .type = NLA_NESTED },
[RTA_UID]   = { .type = NLA_U32 },
[RTA_MARK]  = { .type = NLA_U32 },
+   [RTA_IP_PROTO]  = { .type = NLA_U8 },
+   [RTA_SPORT] = { .type = NLA_U16 },
+   [RTA_DPORT] = { .type = NLA_U16 },
 };
 
 static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
diff --git a/net/ipv4/netlink.c b/net/ipv4/netlink.c
new file mode 100644
index 000..f86bb4f
--- /dev/null
+++ b/net/ipv4/netlink.c
@@ -0,0 +1,23 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+int rtm_getroute_parse_ip_proto(struct nlattr *attr, u8 *ip_proto,
+   struct netlink_ext_ack *extack)
+{
+   *ip_proto = nla_get_u8(attr);
+
+   switch (*ip_proto) {
+   case IPPROTO_TCP:
+   case IPPROTO_UDP:
+   case IPPROTO_ICMP:
+   return 0;
+   default:
+   NL_SET_ERR_MSG(extack, "Unsupported ip proto");
+   return -EOPNOTSUPP;
+   }
+}
+EXPORT_SYMBOL_GPL(rtm_getroute_parse_ip_proto);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 2cfa1b5..0e401dc 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2574,11 +2574,10 @@ struct rtable *ip_route_output_flow(struct net *net, 
struct flowi4 *flp4,
 EXPORT_SYMBOL_GPL(ip_route_output_flow);
 
 /* called with rcu_read_lock held */
-static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
-   struct flowi4 *fl4, struct sk_buff *skb, u32 portid,
-   u32 seq)
+static int rt_fill_info(struct net *net, __be32 dst, __be32 src,
+   struct rtable *rt, u32 table_id, struct flowi4 *fl4,
+   struct sk_buff *skb, u32 portid, u32 seq)
 {
-   struct rtable *rt = skb_rtable(skb);
struct rtmsg *r;
struct nlmsghdr *nlh;
unsigned long expires = 0;
@@ -2674,7 +2673,7 @@ static int rt_fill_info(struct net *net,  __be32 dst, 
__be32 src, u32 table_id,
}
} else
 #endif
-   if (nla_put_u32(skb, RTA_IIF, skb->dev->ifindex))
+   if (nla_put_u32(skb, RTA_IIF, fl4->flowi4_iif))
goto nla_put_failure;
}
 
@@ -2689,43 +2688,93 @@ static int rt_fill_info(struct net *net,  __be32 dst, 
__be32 src, u32 table_id,
return -EMSGSIZE;
 }
 
+static struct sk_buff *inet_rtm_getroute_build_skb(__be32 src, __be32 dst,
+  u8 ip_proto, __be16 sport,
+  __be16 dport)
+{
+   struct sk_buff *skb;
+   struct iphdr *iph;
+
+   skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
+   i

[PATCH net-next v5 2/3] ipv6: support sport, dport and ip_proto in RTM_GETROUTE

2018-05-22 Thread Roopa Prabhu

From: Roopa Prabhu <ro...@cumulusnetworks.com>

This is a followup to fib6 rules sport, dport and ipproto
match support. Only supports tcp, udp and icmp for ipproto.
Used by fib rule self tests.

Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com>
---
 net/ipv6/route.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index bcb8785..038d661 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -63,6 +63,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -4083,6 +4084,9 @@ static const struct nla_policy rtm_ipv6_policy[RTA_MAX+1] 
= {
[RTA_UID]   = { .type = NLA_U32 },
[RTA_MARK]  = { .type = NLA_U32 },
[RTA_TABLE] = { .type = NLA_U32 },
+   [RTA_IP_PROTO]  = { .type = NLA_U8 },
+   [RTA_SPORT] = { .type = NLA_U16 },
+   [RTA_DPORT] = { .type = NLA_U16 },
 };
 
 static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -4795,6 +4799,19 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, 
struct nlmsghdr *nlh,
else
fl6.flowi6_uid = iif ? INVALID_UID : current_uid();
 
+   if (tb[RTA_SPORT])
+   fl6.fl6_sport = nla_get_be16(tb[RTA_SPORT]);
+
+   if (tb[RTA_DPORT])
+   fl6.fl6_dport = nla_get_be16(tb[RTA_DPORT]);
+
+   if (tb[RTA_IP_PROTO]) {
+   err = rtm_getroute_parse_ip_proto(tb[RTA_IP_PROTO],
+ _proto, extack);
+   if (err)
+   goto errout;
+   }
+
if (iif) {
struct net_device *dev;
int flags = 0;
-- 
2.1.4

[PATCH net] net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy

2018-05-22 Thread Roopa Prabhu

From: Roopa Prabhu <ro...@cumulusnetworks.com>

Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com>
---
 net/ipv4/fib_frontend.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 4d622112..e66172a 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -649,6 +649,7 @@ const struct nla_policy rtm_ipv4_policy[RTA_MAX + 1] = {
[RTA_ENCAP] = { .type = NLA_NESTED },
[RTA_UID]   = { .type = NLA_U32 },
[RTA_MARK]  = { .type = NLA_U32 },
+   [RTA_TABLE] = { .type = NLA_U32 },
 };
 
 static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
-- 
2.1.4

Re: [lkp-robot] [selftests] 3f45449746: kernel_selftests.net.fib_rule_tests.sh.fail

2018-05-22 Thread Roopa Prabhu

On Mon, May 21, 2018 at 10:15 PM, kernel test robot
<xiaolong...@intel.com> wrote:
>
> FYI, we noticed the following commit (built with gcc-7):
>
> commit: 3f454497465bb6b1533772900bb35fab1324e35e ("selftests: net: initial 
> fib rule tests")
> url: 
> https://github.com/0day-ci/linux/commits/Roopa-Prabhu/ipv4-support-sport-dport-and-ip_proto-in-RTM_GETROUTE/20180518-051740
>
>
> in testcase: kernel_selftests
> with following parameters:
>
> group: kselftests-02
>
> test-description: The kernel contains a set of "self tests" under the 
> tools/testing/selftests/ directory. These are intended to be small unit tests 
> to exercise individual code paths in the kernel.
> test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
>
>
> on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 
> 64G memory
>
> caused below changes (please refer to attached dmesg/kmsg for entire 
> log/backtrace):
>
>
> selftests: fib_rule_tests.sh
> 
> selftests: Warning: file fib_rule_tests.sh is not executable, correct this.
> not ok 1..16 selftests: fib_rule_tests.sh [FAIL]
> make: Leaving directory 
> '/usr/src/linux-selftests-x86_64-rhel-7.2-3f454497465bb6b1533772900bb35fab1324e35e/tools/testing/selftests/net'
>
>

Thanks for the report. will fix

Re: [PATCH net-next v4 1/3] ipv4: support sport, dport and ip_proto in RTM_GETROUTE

2018-05-22 Thread Roopa Prabhu

On Tue, May 22, 2018 at 8:04 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
>
>
> On 05/22/2018 07:51 AM, Roopa Prabhu wrote:
>> From: Roopa Prabhu <ro...@cumulusnetworks.com>
>>
>>
>
> ...
>
>> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
>> index 4d622112..cf5cfc5 100644
>> --- a/net/ipv4/fib_frontend.c
>> +++ b/net/ipv4/fib_frontend.c
>> @@ -649,6 +649,10 @@ const struct nla_policy rtm_ipv4_policy[RTA_MAX + 1] = {
>>   [RTA_ENCAP] = { .type = NLA_NESTED },
>>   [RTA_UID]   = { .type = NLA_U32 },
>>   [RTA_MARK]  = { .type = NLA_U32 },
>> + [RTA_TABLE] = { .type = NLA_U32 },
>> + [RTA_IP_PROTO]  = { .type = NLA_U8 },
>> + [RTA_SPORT] = { .type = NLA_U16 },
>> + [RTA_DPORT] = { .type = NLA_U16 },
>>  };
>
> Hi Roopa
>
> RTA_TABLE addition looks like a bug fix for net tree ?
>
> This should be sent as an independent patch IMO.
>
> Thanks.

sure, I can do that. thanks.

[PATCH net-next v4 3/3] selftests: net: initial fib rule tests

2018-05-22 Thread Roopa Prabhu

From: Roopa Prabhu <ro...@cumulusnetworks.com>

This adds a first set of tests for fib rule match/action for
ipv4 and ipv6. Initial tests only cover action lookup table.
can be extended to cover other actions in the future.
Uses ip route get to validate the rule lookup.

Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com>
---
 tools/testing/selftests/net/Makefile  |   2 +-
 tools/testing/selftests/net/fib_rule_tests.sh | 248 ++
 2 files changed, 249 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/net/fib_rule_tests.sh

diff --git a/tools/testing/selftests/net/Makefile 
b/tools/testing/selftests/net/Makefile
index e60dddb..7cb0f49 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -6,7 +6,7 @@ CFLAGS += -I../../../../usr/include/
 
 TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh 
rtnetlink.sh
 TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh udpgso.sh
-TEST_PROGS += udpgso_bench.sh
+TEST_PROGS += udpgso_bench.sh fib_rule_tests.sh
 TEST_PROGS_EXTENDED := in_netns.sh
 TEST_GEN_FILES =  socket
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
diff --git a/tools/testing/selftests/net/fib_rule_tests.sh 
b/tools/testing/selftests/net/fib_rule_tests.sh
new file mode 100644
index 000..d4cfb6a
--- /dev/null
+++ b/tools/testing/selftests/net/fib_rule_tests.sh
@@ -0,0 +1,248 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test is for checking IPv4 and IPv6 FIB rules API
+
+ret=0
+
+PAUSE_ON_FAIL=${PAUSE_ON_FAIL:=no}
+IP="ip -netns testns"
+
+RTABLE=100
+GW_IP4=192.51.100.2
+SRC_IP=192.51.100.3
+GW_IP6=2001:db8:1::2
+SRC_IP6=2001:db8:1::3
+
+DEV_ADDR=192.51.100.1
+DEV=dummy0
+
+log_test()
+{
+   local rc=$1
+   local expected=$2
+   local msg="$3"
+
+   if [ ${rc} -eq ${expected} ]; then
+   nsuccess=$((nsuccess+1))
+   printf "\nTEST: %-50s  [ OK ]\n" "${msg}"
+   else
+   nfail=$((nfail+1))
+   printf "\nTEST: %-50s  [FAIL]\n" "${msg}"
+   if [ "${PAUSE_ON_FAIL}" = "yes" ]; then
+   echo
+   echo "hit enter to continue, 'q' to quit"
+   read a
+   [ "$a" = "q" ] && exit 1
+   fi
+   fi
+}
+
+log_section()
+{
+   echo
+   echo 
"##"
+   echo "TEST SECTION: $*"
+   echo 
"##"
+}
+
+setup()
+{
+   set -e
+   ip netns add testns
+   $IP link set dev lo up
+
+   $IP link add dummy0 type dummy
+   $IP link set dev dummy0 up
+   $IP address add 198.51.100.1/24 dev dummy0
+   $IP -6 address add 2001:db8:1::1/64 dev dummy0
+
+   set +e
+}
+
+cleanup()
+{
+   $IP link del dev dummy0 &> /dev/null
+   ip netns del testns
+}
+
+fib_check_iproute_support()
+{
+   ip rule help 2>&1 | grep -q $1
+   if [ $? -ne 0 ]; then
+   echo "SKIP: iproute2 iprule too old, missing $1 match"
+   return 1
+   fi
+
+   ip route get help 2>&1 | grep -q $2
+   if [ $? -ne 0 ]; then
+   echo "SKIP: iproute2 get route too old, missing $2 match"
+   return 1
+   fi
+
+   return 0
+}
+
+fib_rule6_del()
+{
+   $IP -6 rule del $1
+   log_test $? 0 "rule6 del $1"
+}
+
+fib_rule6_del_by_pref()
+{
+   pref=$($IP -6 rule show | grep "$1 lookup $TABLE" | cut -d ":" -f 1)
+   $IP -6 rule del pref $pref
+}
+
+fib_rule6_test_match_n_redirect()
+{
+   local match="$1"
+   local getmatch="$2"
+
+   $IP -6 rule add $match table $RTABLE
+   $IP -6 route get $GW_IP6 $getmatch | grep -q "table $RTABLE"
+   log_test $? 0 "rule6 check: $1"
+
+   fib_rule6_del_by_pref "$match"
+   log_test $? 0 "rule6 del by pref: $match"
+}
+
+fib_rule6_test()
+{
+   # setup the fib rule redirect route
+   $IP -6 route add table $RTABLE default via $GW_IP6 dev $DEV onlink
+
+   match="oif $DEV"
+   fib_rule6_test_match_n_redirect "$match" "$match" "oif redirect to 
table"
+
+   match="from $SRC_IP6 iif $DEV"
+   fib_rule6_test_match_n_redirect "$match" "$match" "iif redirect to 
table"
+
+   match="tos 0x10"
+   fib_rule6_test_match_n_redirect "$match" "$match" "tos redirect to 
table"
+
+   match="fwmark 0x64"
+   getmatch="mark 0x64"
+   fib_rule6_test_match_n_redir

[PATCH net-next v4 1/3] ipv4: support sport, dport and ip_proto in RTM_GETROUTE

2018-05-22 Thread Roopa Prabhu

From: Roopa Prabhu <ro...@cumulusnetworks.com>

This is a followup to fib rules sport, dport and ipproto
match support. Only supports tcp, udp and icmp for ipproto.
Used by fib rule self tests.

Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com>
---
 include/net/ip.h   |   3 +
 include/uapi/linux/rtnetlink.h |   3 +
 net/ipv4/Makefile  |   2 +-
 net/ipv4/fib_frontend.c|   4 ++
 net/ipv4/netlink.c |  23 +++
 net/ipv4/route.c   | 146 ++---
 6 files changed, 141 insertions(+), 40 deletions(-)
 create mode 100644 net/ipv4/netlink.c

diff --git a/include/net/ip.h b/include/net/ip.h
index bada1f1..0d2281b 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -664,4 +664,7 @@ extern int sysctl_icmp_msgs_burst;
 int ip_misc_proc_init(void);
 #endif
 
+int rtm_getroute_parse_ip_proto(struct nlattr *attr, u8 *ip_proto,
+   struct netlink_ext_ack *extack);
+
 #endif /* _IP_H */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 9b15005..cabb210 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -327,6 +327,9 @@ enum rtattr_type_t {
RTA_PAD,
RTA_UID,
RTA_TTL_PROPAGATE,
+   RTA_IP_PROTO,
+   RTA_SPORT,
+   RTA_DPORT,
__RTA_MAX
 };
 
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index b379520..13f2ba9 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -14,7 +14,7 @@ obj-y := route.o inetpeer.o protocol.o \
 udp_offload.o arp.o icmp.o devinet.o af_inet.o igmp.o \
 fib_frontend.o fib_semantics.o fib_trie.o fib_notifier.o \
 inet_fragment.o ping.o ip_tunnel_core.o gre_offload.o \
-metrics.o
+metrics.o netlink.o
 
 obj-$(CONFIG_NET_IP_TUNNEL) += ip_tunnel.o
 obj-$(CONFIG_SYSCTL) += sysctl_net_ipv4.o
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 4d622112..cf5cfc5 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -649,6 +649,10 @@ const struct nla_policy rtm_ipv4_policy[RTA_MAX + 1] = {
[RTA_ENCAP] = { .type = NLA_NESTED },
[RTA_UID]   = { .type = NLA_U32 },
[RTA_MARK]  = { .type = NLA_U32 },
+   [RTA_TABLE] = { .type = NLA_U32 },
+   [RTA_IP_PROTO]  = { .type = NLA_U8 },
+   [RTA_SPORT] = { .type = NLA_U16 },
+   [RTA_DPORT] = { .type = NLA_U16 },
 };
 
 static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
diff --git a/net/ipv4/netlink.c b/net/ipv4/netlink.c
new file mode 100644
index 000..f86bb4f
--- /dev/null
+++ b/net/ipv4/netlink.c
@@ -0,0 +1,23 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+int rtm_getroute_parse_ip_proto(struct nlattr *attr, u8 *ip_proto,
+   struct netlink_ext_ack *extack)
+{
+   *ip_proto = nla_get_u8(attr);
+
+   switch (*ip_proto) {
+   case IPPROTO_TCP:
+   case IPPROTO_UDP:
+   case IPPROTO_ICMP:
+   return 0;
+   default:
+   NL_SET_ERR_MSG(extack, "Unsupported ip proto");
+   return -EOPNOTSUPP;
+   }
+}
+EXPORT_SYMBOL_GPL(rtm_getroute_parse_ip_proto);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 2cfa1b5..0e401dc 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2574,11 +2574,10 @@ struct rtable *ip_route_output_flow(struct net *net, 
struct flowi4 *flp4,
 EXPORT_SYMBOL_GPL(ip_route_output_flow);
 
 /* called with rcu_read_lock held */
-static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
-   struct flowi4 *fl4, struct sk_buff *skb, u32 portid,
-   u32 seq)
+static int rt_fill_info(struct net *net, __be32 dst, __be32 src,
+   struct rtable *rt, u32 table_id, struct flowi4 *fl4,
+   struct sk_buff *skb, u32 portid, u32 seq)
 {
-   struct rtable *rt = skb_rtable(skb);
struct rtmsg *r;
struct nlmsghdr *nlh;
unsigned long expires = 0;
@@ -2674,7 +2673,7 @@ static int rt_fill_info(struct net *net,  __be32 dst, 
__be32 src, u32 table_id,
}
} else
 #endif
-   if (nla_put_u32(skb, RTA_IIF, skb->dev->ifindex))
+   if (nla_put_u32(skb, RTA_IIF, fl4->flowi4_iif))
goto nla_put_failure;
}
 
@@ -2689,43 +2688,93 @@ static int rt_fill_info(struct net *net,  __be32 dst, 
__be32 src, u32 table_id,
return -EMSGSIZE;
 }
 
+static struct sk_buff *inet_rtm_getroute_build_skb(__be32 src, __be32 dst,
+  u8 ip_proto, __be16 sport,
+  __be16 dport)
+{
+   struct sk_buff *skb;
+   struct iphdr *iph;
+
+

[PATCH net-next v4 2/3] ipv6: support sport, dport and ip_proto in RTM_GETROUTE

2018-05-22 Thread Roopa Prabhu

From: Roopa Prabhu <ro...@cumulusnetworks.com>

This is a followup to fib6 rules sport, dport and ipproto
match support. Only supports tcp, udp and icmp for ipproto.
Used by fib rule self tests.

Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com>
---
 net/ipv6/route.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index cc24ed3..7f1babb 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -63,6 +63,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -4083,6 +4084,9 @@ static const struct nla_policy rtm_ipv6_policy[RTA_MAX+1] 
= {
[RTA_UID]   = { .type = NLA_U32 },
[RTA_MARK]  = { .type = NLA_U32 },
[RTA_TABLE] = { .type = NLA_U32 },
+   [RTA_IP_PROTO]  = { .type = NLA_U8 },
+   [RTA_SPORT] = { .type = NLA_U16 },
+   [RTA_DPORT] = { .type = NLA_U16 },
 };
 
 static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -4794,6 +4798,19 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, 
struct nlmsghdr *nlh,
else
fl6.flowi6_uid = iif ? INVALID_UID : current_uid();
 
+   if (tb[RTA_SPORT])
+   fl6.fl6_sport = nla_get_be16(tb[RTA_SPORT]);
+
+   if (tb[RTA_DPORT])
+   fl6.fl6_dport = nla_get_be16(tb[RTA_DPORT]);
+
+   if (tb[RTA_IP_PROTO]) {
+   err = rtm_getroute_parse_ip_proto(tb[RTA_IP_PROTO],
+ _proto, extack);
+   if (err)
+   goto errout;
+   }
+
if (iif) {
struct net_device *dev;
int flags = 0;
-- 
2.1.4

[PATCH net-next v4 0/3] fib rule selftest

2018-05-22 Thread Roopa Prabhu

From: Roopa Prabhu <ro...@cumulusnetworks.com>

This series adds a new test to test fib rules.
ip route get is used to test fib rule matches.
This series also extends ip route get to match on
sport and dport to test recent support of sport
and dport fib rule match.

v2 - address ido's commemt to make sport dport
ip route get to work correctly for input route
get. I don't support ip route get on ip-proto match yet.
ip route get creates a udp packet and i have left
it at that. We could extend ip route get to support
a few ip proto matches in followup patches.

v3 - Support ip_proto (only tcp and udp) match in getroute.
dropped printing of new match attrs in ip route get, 
because ipv6 does not print it. And ipv6 currrently shares
the dump api with ipv6 notify and its better to not add them
to the notify api. dropped it to keep the api consistent between
ipv4 and ipv6 (though uid is already printed in the ipv4 case).
If we need it, both ipv4 and ipv6 can be enhanced to provide
a separate get api. Moved skb creation for ipv4 to a separate func.

v4 - drop separate skb for netlink and fix concerns around rcu and netlink
 reply (as pointed out by DaveM). I now try to reset the skb after the route
 lookup and before the netlink send (testing shows this is ok. More eyes and
 any feedback here will be helpful)

Roopa Prabhu (3):
  ipv4: support sport, dport and ip_proto in RTM_GETROUTE
  ipv6: support sport, dport and ip_proto in RTM_GETROUTE
  selftests: net: initial fib rule tests

 include/uapi/linux/rtnetlink.h|   2 +
 net/ipv4/route.c  | 152 -
 net/ipv6/route.c  |  25 +++
 tools/testing/selftests/net/Makefile  |   2 +-
 tools/testing/selftests/net/fib_rule_tests.sh | 224 ++
 5 files changed, 366 insertions(+), 39 deletions(-)
 create mode 100644 tools/testing/selftests/net/fib_rule_tests.sh

-- 
2.1.4

Re: [PATCH net-next v3 1/3] ipv4: support sport, dport and ip_proto in RTM_GETROUTE

2018-05-17 Thread Roopa Prabhu

On Wed, May 16, 2018 at 7:36 PM, David Miller <da...@davemloft.net> wrote:
> From: Roopa Prabhu <ro...@cumulusnetworks.com>
> Date: Wed, 16 May 2018 13:30:28 -0700
>
>> yes, but we hold rcu read lock before calling the reply function for
>> fib result.  I did consider allocating the skb before the read
>> lock..but then the refactoring (into a separate netlink reply func)
>> would seem unnecessary.
>>
>> I am fine with pre-allocating and undoing the refactoring if that works 
>> better.
>
> Hmmm... I also notice that with this change we end up doing the
> rtnl_unicast() under the RCU lock which is unnecessary too.

that was unintentional, it seemed like the previous code did that too..
and you are right  it did not.

>
> So yes, please pull the "out_skb" allocation before the
> rcu_read_lock(), and push the rtnl_unicast() after the
> rcu_read_unlock().

agreed, will do.

>
> It really is a shame that sharing the ETH_P_IP skb between the route
> route lookup and the netlink response doesn't work properly.

I did try a few things before giving up on the same skb...since it
also seemed like
keeping the netlink code separate would be a good thing for the future.

>
> I was using RTM_GETROUTE at one point for route/fib lookup performance
> measurements.  It never was great at that, but now that there is going
> to be two SKB allocations instead of one it is going to be even less
> useful for that kind of usage.

oh...did not realize this use of it. It certainly seems like we should
try to retain the
single skb in that case. let me see what i can do.

thanks.

Re: [PATCH v2 net] net/ipv4: Initialize proto and ports in flow struct

2018-05-16 Thread Roopa Prabhu

On Wed, May 16, 2018 at 1:36 PM, David Ahern <dsah...@gmail.com> wrote:
> Updating the FIB tracepoint for the recent change to allow rules using
> the protocol and ports exposed a few places where the entries in the flow
> struct are not initialized.
>
> For __fib_validate_source add the call to fib4_rules_early_flow_dissect
> since it is invoked for the input path. For netfilter, add the memset on
> the flow struct to avoid future problems like this. In ip_route_input_slow
> need to set the fields if the skb dissection does not happen.
>
> Fixes: bfff4862653b ("net: fib_rules: support for match on ip_proto, sport 
> and dport")
> Signed-off-by: David Ahern <dsah...@gmail.com>
> ---

LGTM,
Acked-by: Roopa Prabhu <ro...@cumulusnetworks.com>

Re: [PATCH net-next v3 1/3] ipv4: support sport, dport and ip_proto in RTM_GETROUTE

2018-05-16 Thread Roopa Prabhu

On Wed, May 16, 2018 at 11:37 AM, David Miller <da...@davemloft.net> wrote:
> From: Roopa Prabhu <ro...@cumulusnetworks.com>
> Date: Tue, 15 May 2018 20:55:06 -0700
>
>> +static int inet_rtm_getroute_reply(struct sk_buff *in_skb, struct nlmsghdr 
>> *nlh,
>> +__be32 dst, __be32 src, struct flowi4 *fl4,
>> +struct rtable *rt, struct fib_result *res)
>> +{
>> + struct net *net = sock_net(in_skb->sk);
>> + struct rtmsg *rtm = nlmsg_data(nlh);
>> + u32 table_id = RT_TABLE_MAIN;
>> + struct sk_buff *skb;
>> + int err = 0;
>> +
>> + skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
>> + if (!skb)
>> + return -ENOMEM;
>
> If the caller can use GFP_KERNEL, so can this allocation.

yes, but we hold rcu read lock before calling the reply function for fib result.
I did consider allocating the skb before the read lock..but then the
refactoring (into a separate netlink reply func) would seem
unnecessary.

I am fine with pre-allocating and undoing the refactoring if that works better.

[PATCH net-next v3 0/3] fib rule selftest

2018-05-15 Thread Roopa Prabhu

From: Roopa Prabhu <ro...@cumulusnetworks.com>

This series adds a new test to test fib rules.
ip route get is used to test fib rule matches.
This series also extends ip route get to match on
sport and dport to test recent support of sport
and dport fib rule match.

v2 - address ido's commemt to make sport dport
ip route get to work correctly for input route
get. I don't support ip route get on ip-proto match yet.
ip route get creates a udp packet and i have left
it at that. We could extend ip route get to support
a few ip proto matches in followup patches.

v3 - Support ip_proto (tcp, udp and icmp) match in getroute.
dropped printing of new match attrs in ip route get, 
because ipv6 does not print it. And ipv6 currrently shares
the dump api with ipv6 notify and its better to not add them
to the notify api. dropped it to keep the api consistent between
ipv4 and ipv6 (though uid is already printed in the ipv4 case).

I am signing up for fib_rule_get test next :). Will probably need
some fixes around that area for consistency between ipv4 and ipv6.

Roopa Prabhu (3):
  ipv4: support sport, dport and ip_proto in RTM_GETROUTE
  ipv6: support sport, dport and ip_proto in RTM_GETROUTE
  selftests: net: initial fib rule tests

 include/uapi/linux/rtnetlink.h|   2 +
 net/ipv4/route.c  | 152 -
 net/ipv6/route.c  |  25 +++
 tools/testing/selftests/net/Makefile  |   2 +-
 tools/testing/selftests/net/fib_rule_tests.sh | 224 ++
 5 files changed, 366 insertions(+), 39 deletions(-)
 create mode 100644 tools/testing/selftests/net/fib_rule_tests.sh

-- 
2.1.4

[PATCH net-next v3 1/3] ipv4: support sport, dport and ip_proto in RTM_GETROUTE

2018-05-15 Thread Roopa Prabhu

From: Roopa Prabhu <ro...@cumulusnetworks.com>

This is a followup to fib rules sport, dport and ipproto
match support. Only supports tcp, udp and icmp for ipproto.
Used by fib rule self tests. Before this patch getroute used
same skb to pass through the route lookup and for the netlink
getroute reply msg. This patch allocates separate skb's to keep
flow dissector happy.

Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com>
---
 include/net/ip.h   |   3 +
 include/uapi/linux/rtnetlink.h |   3 +
 net/ipv4/Makefile  |   2 +-
 net/ipv4/fib_frontend.c|   4 +
 net/ipv4/netlink.c |  23 +
 net/ipv4/route.c   | 190 ++---
 6 files changed, 173 insertions(+), 52 deletions(-)
 create mode 100644 net/ipv4/netlink.c

diff --git a/include/net/ip.h b/include/net/ip.h
index bada1f1..0d2281b 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -664,4 +664,7 @@ extern int sysctl_icmp_msgs_burst;
 int ip_misc_proc_init(void);
 #endif
 
+int rtm_getroute_parse_ip_proto(struct nlattr *attr, u8 *ip_proto,
+   struct netlink_ext_ack *extack);
+
 #endif /* _IP_H */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 9b15005..cabb210 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -327,6 +327,9 @@ enum rtattr_type_t {
RTA_PAD,
RTA_UID,
RTA_TTL_PROPAGATE,
+   RTA_IP_PROTO,
+   RTA_SPORT,
+   RTA_DPORT,
__RTA_MAX
 };
 
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index b379520..13f2ba9 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -14,7 +14,7 @@ obj-y := route.o inetpeer.o protocol.o \
 udp_offload.o arp.o icmp.o devinet.o af_inet.o igmp.o \
 fib_frontend.o fib_semantics.o fib_trie.o fib_notifier.o \
 inet_fragment.o ping.o ip_tunnel_core.o gre_offload.o \
-metrics.o
+metrics.o netlink.o
 
 obj-$(CONFIG_NET_IP_TUNNEL) += ip_tunnel.o
 obj-$(CONFIG_SYSCTL) += sysctl_net_ipv4.o
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index f05afaf..d7092c5 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -643,6 +643,10 @@ const struct nla_policy rtm_ipv4_policy[RTA_MAX + 1] = {
[RTA_ENCAP] = { .type = NLA_NESTED },
[RTA_UID]   = { .type = NLA_U32 },
[RTA_MARK]  = { .type = NLA_U32 },
+   [RTA_TABLE] = { .type = NLA_U32 },
+   [RTA_IP_PROTO]  = { .type = NLA_U8 },
+   [RTA_SPORT] = { .type = NLA_U16 },
+   [RTA_DPORT] = { .type = NLA_U16 },
 };
 
 static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
diff --git a/net/ipv4/netlink.c b/net/ipv4/netlink.c
new file mode 100644
index 000..f86bb4f
--- /dev/null
+++ b/net/ipv4/netlink.c
@@ -0,0 +1,23 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+int rtm_getroute_parse_ip_proto(struct nlattr *attr, u8 *ip_proto,
+   struct netlink_ext_ack *extack)
+{
+   *ip_proto = nla_get_u8(attr);
+
+   switch (*ip_proto) {
+   case IPPROTO_TCP:
+   case IPPROTO_UDP:
+   case IPPROTO_ICMP:
+   return 0;
+   default:
+   NL_SET_ERR_MSG(extack, "Unsupported ip proto");
+   return -EOPNOTSUPP;
+   }
+}
+EXPORT_SYMBOL_GPL(rtm_getroute_parse_ip_proto);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 29268ef..073c849 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2569,11 +2569,10 @@ struct rtable *ip_route_output_flow(struct net *net, 
struct flowi4 *flp4,
 EXPORT_SYMBOL_GPL(ip_route_output_flow);
 
 /* called with rcu_read_lock held */
-static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
-   struct flowi4 *fl4, struct sk_buff *skb, u32 portid,
-   u32 seq)
+static int rt_fill_info(struct net *net, __be32 dst, __be32 src,
+   struct rtable *rt, u32 table_id, struct flowi4 *fl4,
+   struct sk_buff *skb, u32 portid, u32 seq)
 {
-   struct rtable *rt = skb_rtable(skb);
struct rtmsg *r;
struct nlmsghdr *nlh;
unsigned long expires = 0;
@@ -2669,7 +2668,7 @@ static int rt_fill_info(struct net *net,  __be32 dst, 
__be32 src, u32 table_id,
}
} else
 #endif
-   if (nla_put_u32(skb, RTA_IIF, skb->dev->ifindex))
+   if (nla_put_u32(skb, RTA_IIF, fl4->flowi4_iif))
goto nla_put_failure;
}
 
@@ -2684,43 +2683,128 @@ static int rt_fill_info(struct net *net,  __be32 dst, 
__be32 src, u32 table_id,
return -EMSGSIZE;
 }
 
+static int inet_rtm_getroute_reply(struct sk_buff *in_skb, struct nlmsghdr 
*nlh,
+

[PATCH net-next v3 2/3] ipv6: support sport, dport and ip_proto in RTM_GETROUTE

2018-05-15 Thread Roopa Prabhu

From: Roopa Prabhu <ro...@cumulusnetworks.com>

This is a followup to fib6 rules sport, dport and ipproto
match support. Only supports tcp, udp and icmp for ipproto.
Used by fib rule self tests.

Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com>
---
 net/ipv6/route.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index af04167..31fb5bc 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -63,6 +63,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -4073,6 +4074,9 @@ static const struct nla_policy rtm_ipv6_policy[RTA_MAX+1] 
= {
[RTA_UID]   = { .type = NLA_U32 },
[RTA_MARK]  = { .type = NLA_U32 },
[RTA_TABLE] = { .type = NLA_U32 },
+   [RTA_IP_PROTO]  = { .type = NLA_U8 },
+   [RTA_SPORT] = { .type = NLA_U16 },
+   [RTA_DPORT] = { .type = NLA_U16 },
 };
 
 static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -4784,6 +4788,19 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, 
struct nlmsghdr *nlh,
else
fl6.flowi6_uid = iif ? INVALID_UID : current_uid();
 
+   if (tb[RTA_SPORT])
+   fl6.fl6_sport = nla_get_be16(tb[RTA_SPORT]);
+
+   if (tb[RTA_DPORT])
+   fl6.fl6_dport = nla_get_be16(tb[RTA_DPORT]);
+
+   if (tb[RTA_IP_PROTO]) {
+   err = rtm_getroute_parse_ip_proto(tb[RTA_IP_PROTO],
+ _proto, extack);
+   if (err)
+   goto errout;
+   }
+
if (iif) {
struct net_device *dev;
int flags = 0;
-- 
2.1.4

[PATCH net-next v3 3/3] selftests: net: initial fib rule tests

2018-05-15 Thread Roopa Prabhu

From: Roopa Prabhu <ro...@cumulusnetworks.com>

This adds a first set of tests for fib rule match/action for
ipv4 and ipv6. Initial tests only cover action lookup table.
can be extended to cover other actions in the future.
Uses ip route get to validate the rule lookup.

Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com>
---
 tools/testing/selftests/net/Makefile  |   2 +-
 tools/testing/selftests/net/fib_rule_tests.sh | 248 ++
 2 files changed, 249 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/net/fib_rule_tests.sh

diff --git a/tools/testing/selftests/net/Makefile 
b/tools/testing/selftests/net/Makefile
index e60dddb..7cb0f49 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -6,7 +6,7 @@ CFLAGS += -I../../../../usr/include/
 
 TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh 
rtnetlink.sh
 TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh udpgso.sh
-TEST_PROGS += udpgso_bench.sh
+TEST_PROGS += udpgso_bench.sh fib_rule_tests.sh
 TEST_PROGS_EXTENDED := in_netns.sh
 TEST_GEN_FILES =  socket
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
diff --git a/tools/testing/selftests/net/fib_rule_tests.sh 
b/tools/testing/selftests/net/fib_rule_tests.sh
new file mode 100644
index 000..d4cfb6a
--- /dev/null
+++ b/tools/testing/selftests/net/fib_rule_tests.sh
@@ -0,0 +1,248 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test is for checking IPv4 and IPv6 FIB rules API
+
+ret=0
+
+PAUSE_ON_FAIL=${PAUSE_ON_FAIL:=no}
+IP="ip -netns testns"
+
+RTABLE=100
+GW_IP4=192.51.100.2
+SRC_IP=192.51.100.3
+GW_IP6=2001:db8:1::2
+SRC_IP6=2001:db8:1::3
+
+DEV_ADDR=192.51.100.1
+DEV=dummy0
+
+log_test()
+{
+   local rc=$1
+   local expected=$2
+   local msg="$3"
+
+   if [ ${rc} -eq ${expected} ]; then
+   nsuccess=$((nsuccess+1))
+   printf "\nTEST: %-50s  [ OK ]\n" "${msg}"
+   else
+   nfail=$((nfail+1))
+   printf "\nTEST: %-50s  [FAIL]\n" "${msg}"
+   if [ "${PAUSE_ON_FAIL}" = "yes" ]; then
+   echo
+   echo "hit enter to continue, 'q' to quit"
+   read a
+   [ "$a" = "q" ] && exit 1
+   fi
+   fi
+}
+
+log_section()
+{
+   echo
+   echo 
"##"
+   echo "TEST SECTION: $*"
+   echo 
"##"
+}
+
+setup()
+{
+   set -e
+   ip netns add testns
+   $IP link set dev lo up
+
+   $IP link add dummy0 type dummy
+   $IP link set dev dummy0 up
+   $IP address add 198.51.100.1/24 dev dummy0
+   $IP -6 address add 2001:db8:1::1/64 dev dummy0
+
+   set +e
+}
+
+cleanup()
+{
+   $IP link del dev dummy0 &> /dev/null
+   ip netns del testns
+}
+
+fib_check_iproute_support()
+{
+   ip rule help 2>&1 | grep -q $1
+   if [ $? -ne 0 ]; then
+   echo "SKIP: iproute2 iprule too old, missing $1 match"
+   return 1
+   fi
+
+   ip route get help 2>&1 | grep -q $2
+   if [ $? -ne 0 ]; then
+   echo "SKIP: iproute2 get route too old, missing $2 match"
+   return 1
+   fi
+
+   return 0
+}
+
+fib_rule6_del()
+{
+   $IP -6 rule del $1
+   log_test $? 0 "rule6 del $1"
+}
+
+fib_rule6_del_by_pref()
+{
+   pref=$($IP -6 rule show | grep "$1 lookup $TABLE" | cut -d ":" -f 1)
+   $IP -6 rule del pref $pref
+}
+
+fib_rule6_test_match_n_redirect()
+{
+   local match="$1"
+   local getmatch="$2"
+
+   $IP -6 rule add $match table $RTABLE
+   $IP -6 route get $GW_IP6 $getmatch | grep -q "table $RTABLE"
+   log_test $? 0 "rule6 check: $1"
+
+   fib_rule6_del_by_pref "$match"
+   log_test $? 0 "rule6 del by pref: $match"
+}
+
+fib_rule6_test()
+{
+   # setup the fib rule redirect route
+   $IP -6 route add table $RTABLE default via $GW_IP6 dev $DEV onlink
+
+   match="oif $DEV"
+   fib_rule6_test_match_n_redirect "$match" "$match" "oif redirect to 
table"
+
+   match="from $SRC_IP6 iif $DEV"
+   fib_rule6_test_match_n_redirect "$match" "$match" "iif redirect to 
table"
+
+   match="tos 0x10"
+   fib_rule6_test_match_n_redirect "$match" "$match" "tos redirect to 
table"
+
+   match="fwmark 0x64"
+   getmatch="mark 0x64"
+   fib_rule6_test_match_n_redir

Re: [PATCH net-next v2 1/3] ipv4: support sport and dport in RTM_GETROUTE

2018-05-07 Thread Roopa Prabhu

On Sun, May 6, 2018 at 6:46 PM, David Ahern <d...@cumulusnetworks.com> wrote:
> On 5/6/18 6:59 PM, Roopa Prabhu wrote:
>> From: Roopa Prabhu <ro...@cumulusnetworks.com>
>>
>> This is a followup to fib rules sport, dport match support.
>> Having them supported in getroute makes it easier to test
>> fib rule lookups. Used by fib rule self tests. Before this patch
>> getroute used same skb to pass through the route lookup and
>> for the netlink getroute reply msg. This patch allocates separate
>> skb's to keep flow dissector happy.
>>
>> Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com>
>> ---
>>  include/uapi/linux/rtnetlink.h |   2 +
>>  net/ipv4/route.c   | 151 
>> ++---
>>  2 files changed, 115 insertions(+), 38 deletions(-)
>>
>> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
>> index 9b15005..630ecf4 100644
>> --- a/include/uapi/linux/rtnetlink.h
>> +++ b/include/uapi/linux/rtnetlink.h
>> @@ -327,6 +327,8 @@ enum rtattr_type_t {
>>   RTA_PAD,
>>   RTA_UID,
>>   RTA_TTL_PROPAGATE,
>> + RTA_SPORT,
>> + RTA_DPORT,
>
> If you are going to add sport and dport because of the potential for FIB
> rules, you need to add ip-proto as well. I realize existing code assumed
> UDP, but the FIB rules cover any IP proto. Yes, I know this makes the
> change much larger to generate tcp, udp as well as iphdr options; the
> joys of new features. ;-)


:) sure..like i mentioned in the cover letter..., i was thinking of
submitting follow up patches for more ip_proto.
since i will be spinning v3, let me see if i can include that too.



>
> I also suggest a comment that these new RTA attributes are used for
> GETROUTE only.


sure

>
> And you need to add the new entries to rtm_ipv4_policy.
>
>
>>   __RTA_MAX
>>  };

ack,

>>
>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>> index 1412a7b..e91ed62 100644
>> --- a/net/ipv4/route.c
>> +++ b/net/ipv4/route.c
>> @@ -2568,11 +2568,10 @@ struct rtable *ip_route_output_flow(struct net *net, 
>> struct flowi4 *flp4,
>>  EXPORT_SYMBOL_GPL(ip_route_output_flow);
>>
>>  /* called with rcu_read_lock held */
>> -static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 
>> table_id,
>> - struct flowi4 *fl4, struct sk_buff *skb, u32 portid,
>> - u32 seq)
>> +static int rt_fill_info(struct net *net, __be32 dst, __be32 src,
>> + struct rtable *rt, u32 table_id, struct flowi4 *fl4,
>> + struct sk_buff *skb, u32 portid, u32 seq)
>>  {
>> - struct rtable *rt = skb_rtable(skb);
>>   struct rtmsg *r;
>>   struct nlmsghdr *nlh;
>>   unsigned long expires = 0;
>> @@ -2651,6 +2650,14 @@ static int rt_fill_info(struct net *net,  __be32 dst, 
>> __be32 src, u32 table_id,
>>   from_kuid_munged(current_user_ns(), fl4->flowi4_uid)))
>>   goto nla_put_failure;
>>
>> + if (fl4->fl4_sport &&
>> + nla_put_be16(skb, RTA_SPORT, fl4->fl4_sport))
>> + goto nla_put_failure;
>> +
>> + if (fl4->fl4_dport &&
>> + nla_put_be16(skb, RTA_DPORT, fl4->fl4_dport))
>> + goto nla_put_failure;
>
> Why return the attributes to the user? I can't see any value in that.
> UID option is not returned either so there is precedence.

hmm..i do see UID returned just 2 lines above. :)

In the least i think it will confirm that the kernel did see your attributes :).


>
>
>> +
>>   error = rt->dst.error;
>>
>>   if (rt_is_input_route(rt)) {
>> @@ -2668,7 +2675,7 @@ static int rt_fill_info(struct net *net,  __be32 dst, 
>> __be32 src, u32 table_id,
>>   }
>>   } else
>>  #endif
>> - if (nla_put_u32(skb, RTA_IIF, skb->dev->ifindex))
>> + if (nla_put_u32(skb, RTA_IIF, fl4->flowi4_iif))
>>   goto nla_put_failure;
>>   }
>>
>> @@ -2683,35 +2690,86 @@ static int rt_fill_info(struct net *net,  __be32 
>> dst, __be32 src, u32 table_id,
>>   return -EMSGSIZE;
>>  }
>>
>> +static int nla_get_port(struct nlattr *attr, __be16 *port)
>> +{
>> + int p = nla_get_be16(attr);
>
> __be16 p;
>
>> +
>> + if (p <= 0 || p >= 0x)
>> + return -EINVAL;
>
> This check i

[PATCH net-next v2 2/3] ipv6: support sport and dport in RTM_GETROUTE

2018-05-06 Thread Roopa Prabhu

From: Roopa Prabhu <ro...@cumulusnetworks.com>

This is a followup to fib6 rules sport and dport
match support. Having them supported in getroute
makes it easier to test fib6 rule lookups. Used by fib6 rule
self tests.

Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com>
---
 net/ipv6/route.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 8ed1b51..bcdc056 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4071,6 +4071,8 @@ static const struct nla_policy rtm_ipv6_policy[RTA_MAX+1] 
= {
[RTA_UID]   = { .type = NLA_U32 },
[RTA_MARK]  = { .type = NLA_U32 },
[RTA_TABLE] = { .type = NLA_U32 },
+   [RTA_SPORT] = { .type = NLA_U16 },
+   [RTA_DPORT] = { .type = NLA_U16 },
 };
 
 static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -4728,6 +4730,17 @@ int rt6_dump_route(struct fib6_info *rt, void *p_arg)
 arg->cb->nlh->nlmsg_seq, NLM_F_MULTI);
 }
 
+static int nla_get_port(struct nlattr *attr, __be16 *port)
+{
+   int p = nla_get_be16(attr);
+
+   if (p <= 0 || p >= 0x)
+   return -EINVAL;
+
+   *port = p;
+   return 0;
+}
+
 static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
  struct netlink_ext_ack *extack)
 {
@@ -4782,6 +4795,18 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, 
struct nlmsghdr *nlh,
else
fl6.flowi6_uid = iif ? INVALID_UID : current_uid();
 
+   if (tb[RTA_SPORT]) {
+   err = nla_get_port(tb[RTA_SPORT], _sport);
+   if (err)
+   goto errout;
+   }
+
+   if (tb[RTA_DPORT]) {
+   err = nla_get_port(tb[RTA_DPORT], _dport);
+   if (err)
+   goto errout;
+   }
+
if (iif) {
struct net_device *dev;
int flags = 0;
-- 
2.1.4

[PATCH net-next v2 3/3] selftests: net: initial fib rule tests

2018-05-06 Thread Roopa Prabhu

From: Roopa Prabhu <ro...@cumulusnetworks.com>

This adds a first set of tests for fib rule match/action for
ipv4 and ipv6. Initial tests only cover action lookup table.
can be extended to cover other actions in the future.
Uses ip route get to validate the rule lookup.

Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com>
---
 tools/testing/selftests/net/Makefile  |   2 +-
 tools/testing/selftests/net/fib_rule_tests.sh | 224 ++
 2 files changed, 225 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/net/fib_rule_tests.sh

diff --git a/tools/testing/selftests/net/Makefile 
b/tools/testing/selftests/net/Makefile
index 902820d..9a8f9b0 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -6,7 +6,7 @@ CFLAGS += -I../../../../usr/include/
 
 TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh 
rtnetlink.sh
 TEST_PROGS += fib_tests.sh fib-onlink-tests.sh in_netns.sh pmtu.sh udpgso.sh
-TEST_PROGS += udpgso_bench.sh
+TEST_PROGS += udpgso_bench.sh fib_rule_tests.sh
 TEST_GEN_PROGS_EXTENDED := in_netns.sh
 TEST_GEN_FILES =  socket
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
diff --git a/tools/testing/selftests/net/fib_rule_tests.sh 
b/tools/testing/selftests/net/fib_rule_tests.sh
new file mode 100644
index 000..01a250f
--- /dev/null
+++ b/tools/testing/selftests/net/fib_rule_tests.sh
@@ -0,0 +1,224 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test is for checking IPv4 and IPv6 FIB rules API
+
+ret=0
+
+PAUSE_ON_FAIL=${PAUSE_ON_FAIL:=no}
+IP="ip -netns testns"
+
+RTABLE=100
+GW_IP4=192.51.100.2
+SRC_IP=192.51.100.3
+GW_IP6=2001:db8:1::2
+SRC_IP6=2001:db8:1::3
+
+DEV_ADDR=192.51.100.1
+DEV=dummy0
+
+log_test()
+{
+   local rc=$1
+   local expected=$2
+   local msg="$3"
+
+   if [ ${rc} -eq ${expected} ]; then
+   nsuccess=$((nsuccess+1))
+   printf "\nTEST: %-50s  [ OK ]\n" "${msg}"
+   else
+   nfail=$((nfail+1))
+   printf "\nTEST: %-50s  [FAIL]\n" "${msg}"
+   if [ "${PAUSE_ON_FAIL}" = "yes" ]; then
+   echo
+   echo "hit enter to continue, 'q' to quit"
+   read a
+   [ "$a" = "q" ] && exit 1
+   fi
+   fi
+}
+
+log_section()
+{
+   echo
+   echo 
"##"
+   echo "TEST SECTION: $*"
+   echo 
"##"
+}
+
+setup()
+{
+   set -e
+   ip netns add testns
+   $IP link set dev lo up
+
+   $IP link add dummy0 type dummy
+   $IP link set dev dummy0 up
+   $IP address add 198.51.100.1/24 dev dummy0
+   $IP -6 address add 2001:db8:1::1/64 dev dummy0
+
+   set +e
+}
+
+cleanup()
+{
+   $IP link del dev dummy0 &> /dev/null
+   ip netns del testns
+}
+
+fib_check_iproute_support()
+{
+   ip rule help 2>&1 | grep -q $1
+   if [ $? -ne 0 ]; then
+   echo "SKIP: iproute2 iprule too old, missing $1 match"
+   return 1
+   fi
+
+   ip route get help 2>&1 | grep -q $2
+   if [ $? -ne 0 ]; then
+   echo "SKIP: iproute2 get route too old, missing $2 match"
+   return 1
+   fi
+
+   return 0
+}
+
+fib_rule6_del()
+{
+   $IP -6 rule del $1
+   log_test $? 0 "rule6 del $1"
+}
+
+fib_rule6_del_by_pref()
+{
+   pref=$($IP -6 rule show | grep "$1 lookup $TABLE" | cut -d ":" -f 1)
+   $IP -6 rule del pref $pref
+}
+
+fib_rule6_test_match_n_redirect()
+{
+   local match="$1"
+   local getmatch="$2"
+
+   $IP -6 rule add $match table $RTABLE
+   $IP -6 route get $GW_IP6 $getmatch | grep -q "table $RTABLE"
+   log_test $? 0 "rule6 check: $1"
+
+   fib_rule6_del_by_pref "$match"
+   log_test $? 0 "rule6 del by pref: $match"
+}
+
+fib_rule6_test()
+{
+   # setup the fib rule redirect route
+   $IP -6 route add table $RTABLE default via $GW_IP6 dev $DEV onlink
+
+   match="oif $DEV"
+   fib_rule6_test_match_n_redirect "$match" "$match" "oif redirect to 
table"
+
+   match="from $SRC_IP6 iif $DEV"
+   fib_rule6_test_match_n_redirect "$match" "$match" "iif redirect to 
table"
+
+   match="tos 0x10"
+   fib_rule6_test_match_n_redirect "$match" "$match" "tos redirect to 
table"
+
+   match="fwmark 0x64"
+   getmatch="mark 0x64&q

[PATCH net-next v2 1/3] ipv4: support sport and dport in RTM_GETROUTE

2018-05-06 Thread Roopa Prabhu

From: Roopa Prabhu <ro...@cumulusnetworks.com>

This is a followup to fib rules sport, dport match support.
Having them supported in getroute makes it easier to test
fib rule lookups. Used by fib rule self tests. Before this patch
getroute used same skb to pass through the route lookup and
for the netlink getroute reply msg. This patch allocates separate
skb's to keep flow dissector happy.

Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com>
---
 include/uapi/linux/rtnetlink.h |   2 +
 net/ipv4/route.c   | 151 ++---
 2 files changed, 115 insertions(+), 38 deletions(-)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 9b15005..630ecf4 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -327,6 +327,8 @@ enum rtattr_type_t {
RTA_PAD,
RTA_UID,
RTA_TTL_PROPAGATE,
+   RTA_SPORT,
+   RTA_DPORT,
__RTA_MAX
 };
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 1412a7b..e91ed62 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2568,11 +2568,10 @@ struct rtable *ip_route_output_flow(struct net *net, 
struct flowi4 *flp4,
 EXPORT_SYMBOL_GPL(ip_route_output_flow);
 
 /* called with rcu_read_lock held */
-static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
-   struct flowi4 *fl4, struct sk_buff *skb, u32 portid,
-   u32 seq)
+static int rt_fill_info(struct net *net, __be32 dst, __be32 src,
+   struct rtable *rt, u32 table_id, struct flowi4 *fl4,
+   struct sk_buff *skb, u32 portid, u32 seq)
 {
-   struct rtable *rt = skb_rtable(skb);
struct rtmsg *r;
struct nlmsghdr *nlh;
unsigned long expires = 0;
@@ -2651,6 +2650,14 @@ static int rt_fill_info(struct net *net,  __be32 dst, 
__be32 src, u32 table_id,
from_kuid_munged(current_user_ns(), fl4->flowi4_uid)))
goto nla_put_failure;
 
+   if (fl4->fl4_sport &&
+   nla_put_be16(skb, RTA_SPORT, fl4->fl4_sport))
+   goto nla_put_failure;
+
+   if (fl4->fl4_dport &&
+   nla_put_be16(skb, RTA_DPORT, fl4->fl4_dport))
+   goto nla_put_failure;
+
error = rt->dst.error;
 
if (rt_is_input_route(rt)) {
@@ -2668,7 +2675,7 @@ static int rt_fill_info(struct net *net,  __be32 dst, 
__be32 src, u32 table_id,
}
} else
 #endif
-   if (nla_put_u32(skb, RTA_IIF, skb->dev->ifindex))
+   if (nla_put_u32(skb, RTA_IIF, fl4->flowi4_iif))
goto nla_put_failure;
}
 
@@ -2683,35 +2690,86 @@ static int rt_fill_info(struct net *net,  __be32 dst, 
__be32 src, u32 table_id,
return -EMSGSIZE;
 }
 
+static int nla_get_port(struct nlattr *attr, __be16 *port)
+{
+   int p = nla_get_be16(attr);
+
+   if (p <= 0 || p >= 0x)
+   return -EINVAL;
+
+   *port = p;
+   return 0;
+}
+
+static int inet_rtm_getroute_reply(struct sk_buff *in_skb, struct nlmsghdr 
*nlh,
+  __be32 dst, __be32 src, struct flowi4 *fl4,
+  struct rtable *rt, struct fib_result *res)
+{
+   struct net *net = sock_net(in_skb->sk);
+   struct rtmsg *rtm = nlmsg_data(nlh);
+   u32 table_id = RT_TABLE_MAIN;
+   struct sk_buff *skb;
+   int err = 0;
+
+   skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
+   if (!skb) {
+   err = -ENOMEM;
+   return err;
+   }
+
+   if (rtm->rtm_flags & RTM_F_LOOKUP_TABLE)
+   table_id = res->table ? res->table->tb_id : 0;
+
+   if (rtm->rtm_flags & RTM_F_FIB_MATCH)
+   err = fib_dump_info(skb, NETLINK_CB(in_skb).portid,
+   nlh->nlmsg_seq, RTM_NEWROUTE, table_id,
+   rt->rt_type, res->prefix, res->prefixlen,
+   fl4->flowi4_tos, res->fi, 0);
+   else
+   err = rt_fill_info(net, dst, src, rt, table_id,
+  fl4, skb, NETLINK_CB(in_skb).portid,
+  nlh->nlmsg_seq);
+   if (err < 0)
+   goto errout;
+
+   return rtnl_unicast(skb, net, NETLINK_CB(in_skb).portid);
+
+errout:
+   kfree_skb(skb);
+   return err;
+}
+
 static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 struct netlink_ext_ack *extack)
 {
struct net *net = sock_net(in_skb->sk);
-   struct rtmsg *rtm;
struct nlattr *tb[RTA_MAX+1];
+   __be16 sport = 0, dport = 0;
struct fib_result res = {};
struct rtable *rt = NULL;
+   struct sk_buff

[PATCH net-next v2 0/3] fib rule selftest

2018-05-06 Thread Roopa Prabhu

From: Roopa Prabhu <ro...@cumulusnetworks.com>

This series adds a new test to test fib rules.
ip route get is used to test fib rule matches.
This series also extends ip route get to match on
sport and dport to test recent support of sport
and dport fib rule match.

v2 - address ido's commemt to make sport dport
ip route get to work correctly for input route
get. I don't support ip route get on ip-proto match yet.
ip route get creates a udp packet and i have left
it at that. We could extend ip route get to support
a few ip proto matches in followup patches.


Roopa Prabhu (3):
  ipv4: support sport and dport in RTM_GETROUTE
  ipv6: support sport and dport in RTM_GETROUTE
  selftests: net: initial fib rule tests

 include/uapi/linux/rtnetlink.h|   2 +
 net/ipv4/route.c  | 152 -
 net/ipv6/route.c  |  25 +++
 tools/testing/selftests/net/Makefile  |   2 +-
 tools/testing/selftests/net/fib_rule_tests.sh | 224 ++
 5 files changed, 366 insertions(+), 39 deletions(-)
 create mode 100644 tools/testing/selftests/net/fib_rule_tests.sh

-- 
2.1.4

Re: [PATCH net-next] neighbour: support for NTF_EXT_LEARNED flag

2018-04-25 Thread Roopa Prabhu

On Wed, Apr 25, 2018 at 10:20 AM, David Miller <da...@davemloft.net> wrote:
> From: Roopa Prabhu <ro...@cumulusnetworks.com>
> Date: Tue, 24 Apr 2018 13:49:34 -0700
>
>> From: Roopa Prabhu <ro...@cumulusnetworks.com>
>>
>> This patch extends NTF_EXT_LEARNED support to the neighbour system.
>> Example use-case: An Ethernet VPN implementation (eg in FRR routing suite)
>> can use this flag to add dynamic reachable external neigh entires
>> learned via control plane. The use of neigh NTF_EXT_LEARNED in this
>> patch is consistent with its use with bridge and vxlan fdb entries.
>>
>> Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com>
>
> No objection to the patch or the facility, so applied, thanks.

Thanks!

>
> What exactly is the name of this VPN technology in the FRR routing
> suite?

Its "Ethernet VPN" with BGP based control plane.

https://github.com/FRRouting/frr/wiki/Frr-3.0-%E2%86%92-4.0

reference RFC's:
https://tools.ietf.org/html/rfc7432  : BGP MPLS-Based Ethernet VPN
https://tools.ietf.org/html/draft-ietf-bess-evpn-overlay-07
(describes how rfc7432 can be used as an Network Virtualization
Overlay (NVO) solution: eg with vxlan).

I also talked about it in my netdev2.2 tutorial:
https://www.netdevconf.org/2.2/slides/prabhu-linuxbridge-tutorial.pdf
(slide 60)

Found this blog by Vincent which describes it well:
https://vincent.bernat.im/en/blog/2017-vxlan-bgp-evpn

For the context of this patch:
Neighbor reachability information is exchanged via BGP. Remote
neighbors learnt via
BGP are installed in the kernel with NTF_EXT_LEARNED to indicate that
they are external neighbor entries.
FRR BGP also installs vxlan and bridge remote fdb entries with the
same flag. Basically replaces flood and learn
with control plane learning via BGP. Remote neighbor entries are also
used for arp/nd proxy.

[PATCH net-next] neighbour: support for NTF_EXT_LEARNED flag

2018-04-24 Thread Roopa Prabhu

From: Roopa Prabhu <ro...@cumulusnetworks.com>

This patch extends NTF_EXT_LEARNED support to the neighbour system.
Example use-case: An Ethernet VPN implementation (eg in FRR routing suite)
can use this flag to add dynamic reachable external neigh entires
learned via control plane. The use of neigh NTF_EXT_LEARNED in this
patch is consistent with its use with bridge and vxlan fdb entries.

Signed-off-by: Roopa Prabhu <ro...@cumulusnetworks.com>
---
 include/net/neighbour.h | 19 ++-
 net/core/neighbour.c|  8 +++-
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index e421f86..6c1eecd 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -246,6 +246,7 @@ static inline void *neighbour_priv(const struct neighbour 
*n)
 #define NEIGH_UPDATE_F_OVERRIDE0x0001
 #define NEIGH_UPDATE_F_WEAK_OVERRIDE   0x0002
 #define NEIGH_UPDATE_F_OVERRIDE_ISROUTER   0x0004
+#define NEIGH_UPDATE_F_EXT_LEARNED 0x2000
 #define NEIGH_UPDATE_F_ISROUTER0x4000
 #define NEIGH_UPDATE_F_ADMIN   0x8000
 
@@ -526,5 +527,21 @@ static inline void neigh_ha_snapshot(char *dst, const 
struct neighbour *n,
} while (read_seqretry(>ha_lock, seq));
 }
 
-
+static inline void neigh_update_ext_learned(struct neighbour *neigh, u32 flags,
+   int *notify)
+{
+   u8 ndm_flags = 0;
+
+   if (!(flags & NEIGH_UPDATE_F_ADMIN))
+   return;
+
+   ndm_flags |= (flags & NEIGH_UPDATE_F_EXT_LEARNED) ? NTF_EXT_LEARNED : 0;
+   if ((neigh->flags ^ ndm_flags) & NTF_EXT_LEARNED) {
+   if (ndm_flags & NTF_EXT_LEARNED)
+   neigh->flags |= NTF_EXT_LEARNED;
+   else
+   neigh->flags &= ~NTF_EXT_LEARNED;
+   *notify = 1;
+   }
+}
 #endif
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index ce51986..5afae29 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -820,7 +820,8 @@ static void neigh_periodic_work(struct work_struct *work)
write_lock(>lock);
 
state = n->nud_state;
-   if (state & (NUD_PERMANENT | NUD_IN_TIMER)) {
+   if ((state & (NUD_PERMANENT | NUD_IN_TIMER)) ||
+   (n->flags & NTF_EXT_LEARNED)) {
write_unlock(>lock);
goto next_elt;
}
@@ -1136,6 +1137,8 @@ int neigh_update(struct neighbour *neigh, const u8 
*lladdr, u8 new,
if (neigh->dead)
goto out;
 
+   neigh_update_ext_learned(neigh, flags, );
+
if (!(new & NUD_VALID)) {
neigh_del_timer(neigh);
if (old & NUD_CONNECTED)
@@ -1781,6 +1784,9 @@ static int neigh_add(struct sk_buff *skb, struct nlmsghdr 
*nlh,
flags &= ~NEIGH_UPDATE_F_OVERRIDE;
}
 
+   if (ndm->ndm_flags & NTF_EXT_LEARNED)
+   flags |= NEIGH_UPDATE_F_EXT_LEARNED;
+
if (ndm->ndm_flags & NTF_USE) {
neigh_event_send(neigh, NULL);
err = 0;
-- 
2.1.4

Re: [net-next regression] kselftest failure in fib_nl_newrule()

2018-04-24 Thread Roopa Prabhu

On Tue, Apr 24, 2018 at 2:46 AM, Anders Roxell  wrote:
> Hi,
>
> fib-onlink-tests.sh (from kselftest) found a regression between
> next-20180424 [1] (worked with tag next-20180423 [2])
>
> here is tree commits that look suspicious specially this patch (sha:
> f9d4b0c1e969)
> rewrites fib_nl_newrule().his patch (sha: f9d4b0c1e969) rewrites
> fib_nl_newrule().
>
> b16fb418b1bf ("net: fib_rules: add extack support")
> f9d4b0c1e969 ("fib_rules: move common handling of newrule delrule msgs
> into fib_nl2rule")
> 8a14e46f1402 ("net/ipv6: Fix missing rcu dereferences on from")
>
> Cheers,
> Anders
> [1] https://lkft.validation.linaro.org/scheduler/job/195181#L3447
> [2] https://lkft.validation.linaro.org/scheduler/job/193410#L3438


Thanks for the report.

It should be fixed by my last commit:
9c20b9372fba "net: fib_rules: fix l3mdev netlink attr processing"

Just ran them again and they pass.

Re: [PATCH net v2] net: ethtool: Add missing kernel doc for FEC parameters

2018-04-23 Thread Roopa Prabhu

On Mon, Apr 23, 2018 at 3:51 PM, Florian Fainelli <f.faine...@gmail.com> wrote:
> While adding support for ethtool::get_fecparam and set_fecparam, kernel
> doc for these functions was missed, add those.
>
> Fixes: 1a5f3da20bd9 ("net: ethtool: add support for forward error correction 
> modes")
> Signed-off-by: Florian Fainelli <f.faine...@gmail.com>

Acked-by: Roopa Prabhu <ro...@cumulusnetworks.com>

Thanks Florian.

1 2 3 4 5 6 7 8 9 >

1 - 100 of 834 matches

Mail list logo