Re: [PATCH] iproute2: show network device dependency tree

2017-02-25 Thread Jiri Pirko
Sat, Feb 25, 2017 at 09:22:22PM CET, zaboj.camp...@post.cz wrote:
>On Sat, 2017-02-25 at 18:39 +0100, Jiri Pirko wrote:
>> > Sat, Feb 25, 2017 at 05:59:00PM CET, zaboj.camp...@post.cz wrote:
>> > Add the argument '-tree' to ip-link to show network devices dependency 
>> > tree.
>> > 
>> > Example:
>> > 
>> > $ ip -tree link
>> > eth0
>> >    bond0
>> > eth1
>> >    bond0
>> > eth2
>> >    bond1
>> > eth3
>> >    bond1
>> 
>> 
>> Hmm, what is this good for? I'm probably missing something...
>
>I consider this kind of output useful when troubleshooting a complex
>configuration with many interfaces. It may show relations among
>interfaces.

Did you see https://github.com/jbenc/plotnetcfg ?


>
>
>> 
>> 
>> 
>> > 
>> > > > Signed-off-by: Zaboj Campula 
>> > ---
>> > include/utils.h |  1 +
>> > ip/ip.c |  5 ++-
>> > ip/ipaddress.c  | 97 
>> > -
>> > 3 files changed, 87 insertions(+), 16 deletions(-)
>> > 
>> > diff --git a/include/utils.h b/include/utils.h
>> > index 22369e0..f1acf4d 100644
>> > --- a/include/utils.h
>> > +++ b/include/utils.h
>> > @@ -20,6 +20,7 @@ extern int show_raw;
>> > extern int resolve_hosts;
>> > extern int oneline;
>> > extern int brief;
>> > +extern int tree;;
>> > extern int timestamp;
>> > extern int timestamp_short;
>> > extern const char * _SL_;
>> > diff --git a/ip/ip.c b/ip/ip.c
>> > index 07050b0..29747a5 100644
>> > --- a/ip/ip.c
>> > +++ b/ip/ip.c
>> > @@ -33,6 +33,7 @@ int show_details;
>> > int resolve_hosts;
>> > int oneline;
>> > int brief;
>> > +int tree;
>> > int timestamp;
>> > const char *_SL_;
>> > int force;
>> > @@ -57,7 +58,7 @@ static void usage(void)
>> > "-h[uman-readable] | -iec |\n"
>> > "-f[amily] { inet | inet6 | ipx | dnet | mpls | bridge 
>> > | link } |\n"
>> > "-4 | -6 | -I | -D | -B | -0 |\n"
>> > -"-l[oops] { maximum-addr-flush-attempts } | -br[ief] 
>> > |\n"
>> > +"-l[oops] { maximum-addr-flush-attempts } | -br[ief] 
>> > | -tr[ee] |\n"
>> > "-o[neline] | -t[imestamp] | -ts[hort] | -b[atch] 
>> > [filename] |\n"
>> > "-rc[vbuf] [size] | -n[etns] name | -a[ll] | 
>> > -c[olor]}\n");
>> >exit(-1);
>> > @@ -257,6 +258,8 @@ int main(int argc, char **argv)
>> > > >batch_file = argv[1];
>> > > >} else if (matches(opt, "-brief") == 0) {
>> > > >++brief;
>> > > > +  } else if (matches(opt, "-tree") == 0) {
>> > > > +  ++tree;
>> > > >} else if (matches(opt, "-rcvbuf") == 0) {
>> > > >unsigned int size;
>> > 
>> > diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>> > index 242c6ea..5ebcb1a 100644
>> > --- a/ip/ipaddress.c
>> > +++ b/ip/ipaddress.c
>> > @@ -1534,6 +1534,69 @@ static int iplink_filter_req(struct nlmsghdr *nlh, 
>> > int reqlen)
>> >return 0;
>> > }
>> > 
>> > +static int has_master(struct nlmsg_chain *linfo, int index)
>> > +{
>> > > > +  struct nlmsg_list *l;
>> > > > +  struct rtattr *tb[IFLA_MAX+1];
>> > > > +  int len;
>> > > > +  for (l = linfo->head; l; l = l->next) {
>> > > > +  struct ifinfomsg *ifi = NLMSG_DATA(>h);
>> > > > +  len = l->h.nlmsg_len;
>> > > > +  len -= NLMSG_LENGTH(sizeof(*ifi));
>> > > > +  parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
>> > > > +  if (tb[IFLA_MASTER] && *(int 
>> > > > *)RTA_DATA(tb[IFLA_MASTER]) == index)
>> > > > +  return 1;
>> > > > +  }
>> > > > +  return 0;
>> > +}
>> > +
>> > +static struct nlmsg_list *get_master(struct nlmsg_chain *linfo, struct 
>> > rtattr **tb)
>> > +{
>> > > > +  struct nlmsg_list *l;
>> > > > +  if (tb[IFLA_MASTER]) {
>> > > > +  int master = *(int *)RTA_DATA(tb[IFLA_MASTER]);
>> > > > +  for (l = linfo->head; l; l = l->next) {
>> > > > +  struct ifinfomsg *ifi = NLMSG_DATA(>h);
>> > > > +  if (ifi->ifi_index == master)
>> > > > +  return l;
>> > > > +  }
>> > > > +  }
>> > > > +  return NULL;
>> > +}
>> > +
>> > +static void print_dev_tree_item(struct nlmsg_chain *linfo, struct 
>> > nlmsg_list *l, int indent) {
>> > > > +  char *name;
>> > > > +  int len;
>> > > > +  struct ifinfomsg *ifi = NLMSG_DATA(>h);
>> > > > +  struct rtattr *tb[IFLA_MAX+1];
>> > > > +  len = l->h.nlmsg_len;
>> > > > +  len -= NLMSG_LENGTH(sizeof(*ifi));
>> > > > +  parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
>> > > > +  name = (char *)(tb[IFLA_IFNAME] ? 
>> > > > rta_getattr_str(tb[IFLA_IFNAME]) : "");
>> > +
>> > > > +  printf("%*s%s\n", indent * 4, "", name);
>> > +
>> > > > +  struct nlmsg_list *master = get_master(linfo, tb);
>> > > > +  if (master) {
>> > > > 

Re: [PATCH v5 4/6] ipv6: addrconf: fix 48 bit 6lowpan autoconfiguration

2017-02-25 Thread Alexander Aring

Hi,

okay now I am finally confused.

On 02/24/2017 01:14 PM, Luiz Augusto von Dentz wrote:
> From: Alexander Aring 
> 
> This patch adds support for 48 bit 6LoWPAN address length
> autoconfiguration which is the case for BTLE 6LoWPAN.
> 
> Signed-off-by: Alexander Aring 
> Signed-off-by: Luiz Augusto von Dentz 
> Reviewed-by: Stefan Schmidt 
> ---
>  net/ipv6/addrconf.c | 17 -
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 3a2025f..7756640 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -2052,12 +2052,19 @@ static void addrconf_leave_anycast(struct 
> inet6_ifaddr *ifp)
>   __ipv6_dev_ac_dec(ifp->idev, );
>  }
>  
> -static int addrconf_ifid_eui64(u8 *eui, struct net_device *dev)
> +static int addrconf_ifid_6lowpan(u8 *eui, struct net_device *dev)
>  {
> - if (dev->addr_len != EUI64_ADDR_LEN)
> + switch (dev->addr_len) {
> + case ETH_ALEN:
> + return addrconf_ifid_eui48(eui, dev);

You need to make the same thing here like what we discusss in antoher
mail.

   In the figure, letter 'b' represents a bit from the
   Bluetooth device address, copied as is without any changes on any
   bit.  This means that no bit in the IID indicates whether the
   underlying Bluetooth device address is public or random.

   |0  1|1  3|3  4|4  6|
   |0  5|6  1|2  7|8  3|
   +++++
   |||1110||
   +++++

The function addrconf_ifid_eui48 will do a u/l bitflip. You need to change
that otherwise transparent 6lowpan adaptiation will not match on your
local address.

Making u/l bit here, but not in IPHC code -> will not work together.

---

btw: That's why I think we should provide some function _once_ to
generate the IID for iphc code and ipv6. Then such mistakes will not
happen.

So we don't want to have some module dependency of 6lowpan in ipv6.
My first idea is to make a callback in 6lowpan private data of
netdevice. :-/

- Alex


Re: [PATCH v5 6/6] 6lowpan: Fix IID format for Bluetooth

2017-02-25 Thread Alexander Aring

Hi,

On 02/26/2017 07:05 AM, Alexander Aring wrote:
> 
> Hi,
> 
> On 02/24/2017 01:14 PM, Luiz Augusto von Dentz wrote:
>> From: Luiz Augusto von Dentz 
>>
>> Accourding to RFC 7668 U/L bit shall not be used:
>>
>> https://wiki.tools.ietf.org/html/rfc7668#section-3.2.2 [Page 10]:
>>
>>In the figure, letter 'b' represents a bit from the
>>Bluetooth device address, copied as is without any changes on any
>>bit.  This means that no bit in the IID indicates whether the
>>underlying Bluetooth device address is public or random.
>>
>>|0  1|1  3|3  4|4  6|
>>|0  5|6  1|2  7|8  3|
>>+++++
>>|||1110||
>>+++++
>>
>> Because of this the code cannot figure out the address type from the IP
>> address anymore thus it makes no sense to use peer_lookup_ba as it needs
>> the peer address type.
>>
> 
> I am still not quite 100% of this and want to leave my opinion about this
> handling which can be interpreted in a different way.
> 
> The RFC says here:
> 
> Following the guidance of [RFC7136], a 64-bit
> Interface Identifier (IID) is formed from the 48-bit Bluetooth device
> address by inserting two octets, with hexadecimal values of 0xFF and
> 0xFE in the middle of the 48-bit Bluetooth device address as shown in
> Figure 6. 

Okay, they said from IID from the 48-bit address. 

And IID is what you need here and what [RFC7136] describes as result,
so I think you are right.

There is no need for special u/l bitflip or link-layer multicast handling.

- Alex


Re: [PATCH v5 6/6] 6lowpan: Fix IID format for Bluetooth

2017-02-25 Thread Alexander Aring

Hi,

On 02/24/2017 01:14 PM, Luiz Augusto von Dentz wrote:
> From: Luiz Augusto von Dentz 
> 
> Accourding to RFC 7668 U/L bit shall not be used:
> 
> https://wiki.tools.ietf.org/html/rfc7668#section-3.2.2 [Page 10]:
> 
>In the figure, letter 'b' represents a bit from the
>Bluetooth device address, copied as is without any changes on any
>bit.  This means that no bit in the IID indicates whether the
>underlying Bluetooth device address is public or random.
> 
>|0  1|1  3|3  4|4  6|
>|0  5|6  1|2  7|8  3|
>+++++
>|||1110||
>+++++
> 
> Because of this the code cannot figure out the address type from the IP
> address anymore thus it makes no sense to use peer_lookup_ba as it needs
> the peer address type.
> 

I am still not quite 100% of this and want to leave my opinion about this
handling which can be interpreted in a different way.

The RFC says here:

Following the guidance of [RFC7136], a 64-bit
Interface Identifier (IID) is formed from the 48-bit Bluetooth device
address by inserting two octets, with hexadecimal values of 0xFF and
0xFE in the middle of the 48-bit Bluetooth device address as shown in
Figure 6.  In the figure, letter 'b' represents a bit from the
Bluetooth device address, copied as is without any changes on any
bit.

You cut the important part "Following the guidance of [RFC7136]".

---

What you describe is "How to see the link layer address from L3 layer"
and then it clearly said "there is no bit which indicate something
special that the address is public/random, whatever".

---

At my point of you the RFC tells here: grab the L2 layer in the way how
you quote above, but then apply [RFC7136] which is L3 handling.

I simple cc now some 6lo ietf stuff and others 6lowpan hackers, maybe we
have luck and somebody can answer this question.

---

And when we already about to start a discussion about that, what is do
to with the multicast bit of L2 address? I know you told it's not an
EUI-48 address. Are you sure?

If it's really EUI-48 then I think [RFC7136] should be applied.

- Alex


Re: [PATCH 0/2] net: sched: make it possible to unhide default qdiscs

2017-02-25 Thread Jiri Kosina
On Sat, 25 Feb 2017, Jiri Kosina wrote:

> This is a followup to a patchset submitted back in October 2016

No idea what happened that my usual patchset sending process broke and 
there is no 'References/In-reply-to' in the actual patches :/ I will 
investigate that. Please let me know in case you'd like a resend just 
because of this.

Sorry for the hassle,

-- 
Jiri Kosina
SUSE Labs



[PATCH 1/2] net: sched: make default fifo qdiscs appear in the dump

2017-02-25 Thread Jiri Kosina
From: Jiri Kosina 

The original reason [1] for having hidden qdiscs (potential scalability 
issues in qdisc_match_from_root() with single linked list in case of large 
amount of qdiscs) has been invalidated by 59cc1f61f0 ("net: sched: convert 
qdisc linked list to hashtable").

This allows us for bringing more clarity and determinism into the dump by 
making default pfifo qdiscs visible.

We're not turning this on by default though, at it was deemed [2] too 
intrusive / unnecessary change of default behavior towards userspace. 
Instead, TCA_DUMP_INVISIBLE netlink attribute is introduced, which allows 
applications to request complete qdisc hierarchy dump, including the ones 
that have always been implicit/invisible.

[1] 
http://lkml.kernel.org/r/1460732328.10638.74.ca...@edumazet-glaptop3.roam.corp.google.com
[2] 
http://lkml.kernel.org/r/20161021.105935.1907696543877061916.da...@davemloft.net

Signed-off-by: Jiri Kosina 
---
 include/net/pkt_sched.h|  2 +-
 include/net/sch_generic.h  |  1 +
 include/uapi/linux/rtnetlink.h |  1 +
 net/sched/sch_api.c| 42 ++
 net/sched/sch_cbq.c|  4 
 net/sched/sch_drr.c|  2 ++
 net/sched/sch_dsmark.c |  1 +
 net/sched/sch_generic.c|  2 +-
 net/sched/sch_hfsc.c   |  2 ++
 net/sched/sch_htb.c|  1 +
 net/sched/sch_mq.c |  2 +-
 net/sched/sch_mqprio.c |  2 +-
 net/sched/sch_multiq.c |  1 +
 net/sched/sch_prio.c   |  4 +++-
 net/sched/sch_qfq.c|  1 +
 net/sched/sch_red.c|  1 +
 net/sched/sch_sfb.c|  1 +
 net/sched/sch_tbf.c|  1 +
 18 files changed, 54 insertions(+), 17 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index cd334c9584e9..0625eac2c601 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -90,7 +90,7 @@ int unregister_qdisc(struct Qdisc_ops *qops);
 void qdisc_get_default(char *id, size_t len);
 int qdisc_set_default(const char *id);
 
-void qdisc_hash_add(struct Qdisc *q);
+void qdisc_hash_add(struct Qdisc *q, bool invisible);
 void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
 struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index e6aa0a249672..e7dca250d115 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -66,6 +66,7 @@ struct Qdisc {
 #define TCQ_F_NOPARENT 0x40 /* root of its hierarchy :
  * qdisc_tree_decrease_qlen() should stop.
  */
+#define TCQ_F_INVISIBLE0x80 /* invisible by default in dump */
u32 limit;
const struct Qdisc_ops  *ops;
struct qdisc_size_table __rcu *stab;
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 262f0379d83a..c7de00e09797 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -542,6 +542,7 @@ enum {
TCA_FCNT,
TCA_STATS2,
TCA_STAB,
+   TCA_DUMP_INVISIBLE,
TCA_PAD,
__TCA_MAX
 };
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 206dc24add3a..8e4e6ab1847a 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -274,7 +274,7 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc 
*root, u32 handle)
return NULL;
 }
 
-void qdisc_hash_add(struct Qdisc *q)
+void qdisc_hash_add(struct Qdisc *q, bool invisible)
 {
if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
struct Qdisc *root = qdisc_dev(q)->qdisc;
@@ -282,6 +282,8 @@ void qdisc_hash_add(struct Qdisc *q)
WARN_ON_ONCE(root == _qdisc);
ASSERT_RTNL();
hash_add_rcu(qdisc_dev(q)->qdisc_hash, >hash, q->handle);
+   if (invisible)
+   q->flags |= TCQ_F_INVISIBLE;
}
 }
 EXPORT_SYMBOL(qdisc_hash_add);
@@ -1004,7 +1006,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
goto err_out4;
}
 
-   qdisc_hash_add(sch);
+   qdisc_hash_add(sch, false);
 
return sch;
}
@@ -1400,9 +1402,14 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct 
Qdisc *q, u32 clid,
return -1;
 }
 
-static bool tc_qdisc_dump_ignore(struct Qdisc *q)
+static bool tc_qdisc_dump_ignore(struct Qdisc *q, bool dump_invisible)
 {
-   return (q->flags & TCQ_F_BUILTIN) ? true : false;
+   if (q->flags & TCQ_F_BUILTIN)
+   return true;
+   if ((q->flags & TCQ_F_INVISIBLE) && !dump_invisible)
+   return true;
+
+   return false;
 }
 
 static int qdisc_notify(struct net *net, struct sk_buff *oskb,
@@ -1416,12 +1423,12 @@ static int 

[PATCH 2/2] iproute2: add support for invisible qdisc dumping

2017-02-25 Thread Jiri Kosina
From: Jiri Kosina 

Support the new TCA_DUMP_INVISIBLE netlink attribute that allows asking 
kernel to perform 'full qdisc dump', as for historical reasons some of the 
default qdiscs are being hidden by the kernel.

The command syntax is being extended by voluntary 'invisible' argument to
'tc qdisc show'.

Signed-off-by: Jiri Kosina 
---
 tc/tc_qdisc.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/tc/tc_qdisc.c b/tc/tc_qdisc.c
index 3a3701c2..29da9269 100644
--- a/tc/tc_qdisc.c
+++ b/tc/tc_qdisc.c
@@ -34,7 +34,7 @@ static int usage(void)
fprintf(stderr, "   [ stab [ help | STAB_OPTIONS] ]\n");
fprintf(stderr, "   [ [ QDISC_KIND ] [ help | OPTIONS ] ]\n");
fprintf(stderr, "\n");
-   fprintf(stderr, "   tc qdisc show [ dev STRING ] [ ingress | clsact 
]\n");
+   fprintf(stderr, "   tc qdisc show [ dev STRING ] [ ingress | clsact 
| invisible ]\n");
fprintf(stderr, "Where:\n");
fprintf(stderr, "QDISC_KIND := { [p|b]fifo | tbf | prio | cbq | red | 
etc. }\n");
fprintf(stderr, "OPTIONS := ... try tc qdisc add  
help\n");
@@ -292,6 +292,7 @@ static int tc_qdisc_list(int argc, char **argv)
 {
struct tcmsg t = { .tcm_family = AF_UNSPEC };
char d[16] = {};
+   bool dump_invisible = false;
 
while (argc > 0) {
if (strcmp(*argv, "dev") == 0) {
@@ -306,6 +307,8 @@ static int tc_qdisc_list(int argc, char **argv)
t.tcm_parent = TC_H_INGRESS;
} else if (matches(*argv, "help") == 0) {
usage();
+   } else if (strcmp(*argv, "invisible") == 0) {
+   dump_invisible = true;
} else {
fprintf(stderr, "What is \"%s\"? Try \"tc qdisc 
help\".\n", *argv);
return -1;
@@ -325,7 +328,25 @@ static int tc_qdisc_list(int argc, char **argv)
filter_ifindex = t.tcm_ifindex;
}
 
-   if (rtnl_dump_request(, RTM_GETQDISC, , sizeof(t)) < 0) {
+   if (dump_invisible) {
+   struct {
+   struct nlmsghdr n;
+   struct tcmsg t;
+   char buf[256];
+   } req = {
+   .n.nlmsg_type = RTM_GETQDISC,
+   .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)),
+   };
+
+   req.t.tcm_family = AF_UNSPEC;
+
+   addattr(, 256, TCA_DUMP_INVISIBLE);
+   if (rtnl_dump_request_n(, ) < 0) {
+   perror("Cannot send dump request");
+   return 1;
+   }
+
+   } else if (rtnl_dump_request(, RTM_GETQDISC, , sizeof(t)) < 0) {
perror("Cannot send dump request");
return 1;
}
-- 
Jiri Kosina
SUSE Labs



[PATCH 0/2] net: sched: make it possible to unhide default qdiscs

2017-02-25 Thread Jiri Kosina
This is a followup to a patchset submitted back in October 2016


http://lkml.kernel.org/r/alpine.lnx.2.00.1610211024400.31...@cbobk.fhfr.pm

that aimed at making the qdisc hierarchy more transparent and unhide the 
default qdiscs in the dump by default. After some discussion, it turned 
out that the most viable way to go would be to introduce a new netlink 
attribute for this, and let the userspace chose whether it wants to see 
the whole hierarchy dump


http://lkml.kernel.org/r/20161021.105935.1907696543877061916.da...@davemloft.net

This is implemented by this patchset. First patch adds the new 
TCA_DUMP_INVISIBLE netlink attribute and its handling in the kernel, the 
second one adds the iproute2 counterpart.

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH] iproute2: show network device dependency tree

2017-02-25 Thread Zaboj Campula
On Sat, 2017-02-25 at 18:39 +0100, Jiri Pirko wrote:
> > Sat, Feb 25, 2017 at 05:59:00PM CET, zaboj.camp...@post.cz wrote:
> > Add the argument '-tree' to ip-link to show network devices dependency tree.
> > 
> > Example:
> > 
> > $ ip -tree link
> > eth0
> >    bond0
> > eth1
> >    bond0
> > eth2
> >    bond1
> > eth3
> >    bond1
> 
> 
> Hmm, what is this good for? I'm probably missing something...

I consider this kind of output useful when troubleshooting a complex
configuration with many interfaces. It may show relations among
interfaces.


> 
> 
> 
> > 
> > > > Signed-off-by: Zaboj Campula 
> > ---
> > include/utils.h |  1 +
> > ip/ip.c |  5 ++-
> > ip/ipaddress.c  | 97 
> > -
> > 3 files changed, 87 insertions(+), 16 deletions(-)
> > 
> > diff --git a/include/utils.h b/include/utils.h
> > index 22369e0..f1acf4d 100644
> > --- a/include/utils.h
> > +++ b/include/utils.h
> > @@ -20,6 +20,7 @@ extern int show_raw;
> > extern int resolve_hosts;
> > extern int oneline;
> > extern int brief;
> > +extern int tree;;
> > extern int timestamp;
> > extern int timestamp_short;
> > extern const char * _SL_;
> > diff --git a/ip/ip.c b/ip/ip.c
> > index 07050b0..29747a5 100644
> > --- a/ip/ip.c
> > +++ b/ip/ip.c
> > @@ -33,6 +33,7 @@ int show_details;
> > int resolve_hosts;
> > int oneline;
> > int brief;
> > +int tree;
> > int timestamp;
> > const char *_SL_;
> > int force;
> > @@ -57,7 +58,7 @@ static void usage(void)
> > "-h[uman-readable] | -iec |\n"
> > "-f[amily] { inet | inet6 | ipx | dnet | mpls | bridge 
> > | link } |\n"
> > "-4 | -6 | -I | -D | -B | -0 |\n"
> > -"-l[oops] { maximum-addr-flush-attempts } | -br[ief] 
> > |\n"
> > +"-l[oops] { maximum-addr-flush-attempts } | -br[ief] | 
> > -tr[ee] |\n"
> > "-o[neline] | -t[imestamp] | -ts[hort] | -b[atch] 
> > [filename] |\n"
> > "-rc[vbuf] [size] | -n[etns] name | -a[ll] | 
> > -c[olor]}\n");
> > exit(-1);
> > @@ -257,6 +258,8 @@ int main(int argc, char **argv)
> > > > batch_file = argv[1];
> > > > } else if (matches(opt, "-brief") == 0) {
> > > > ++brief;
> > > > +   } else if (matches(opt, "-tree") == 0) {
> > > > +   ++tree;
> > > > } else if (matches(opt, "-rcvbuf") == 0) {
> > > > unsigned int size;
> > 
> > diff --git a/ip/ipaddress.c b/ip/ipaddress.c
> > index 242c6ea..5ebcb1a 100644
> > --- a/ip/ipaddress.c
> > +++ b/ip/ipaddress.c
> > @@ -1534,6 +1534,69 @@ static int iplink_filter_req(struct nlmsghdr *nlh, 
> > int reqlen)
> > return 0;
> > }
> > 
> > +static int has_master(struct nlmsg_chain *linfo, int index)
> > +{
> > > > +   struct nlmsg_list *l;
> > > > +   struct rtattr *tb[IFLA_MAX+1];
> > > > +   int len;
> > > > +   for (l = linfo->head; l; l = l->next) {
> > > > +   struct ifinfomsg *ifi = NLMSG_DATA(>h);
> > > > +   len = l->h.nlmsg_len;
> > > > +   len -= NLMSG_LENGTH(sizeof(*ifi));
> > > > +   parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
> > > > +   if (tb[IFLA_MASTER] && *(int 
> > > > *)RTA_DATA(tb[IFLA_MASTER]) == index)
> > > > +   return 1;
> > > > +   }
> > > > +   return 0;
> > +}
> > +
> > +static struct nlmsg_list *get_master(struct nlmsg_chain *linfo, struct 
> > rtattr **tb)
> > +{
> > > > +   struct nlmsg_list *l;
> > > > +   if (tb[IFLA_MASTER]) {
> > > > +   int master = *(int *)RTA_DATA(tb[IFLA_MASTER]);
> > > > +   for (l = linfo->head; l; l = l->next) {
> > > > +   struct ifinfomsg *ifi = NLMSG_DATA(>h);
> > > > +   if (ifi->ifi_index == master)
> > > > +   return l;
> > > > +   }
> > > > +   }
> > > > +   return NULL;
> > +}
> > +
> > +static void print_dev_tree_item(struct nlmsg_chain *linfo, struct 
> > nlmsg_list *l, int indent) {
> > > > +   char *name;
> > > > +   int len;
> > > > +   struct ifinfomsg *ifi = NLMSG_DATA(>h);
> > > > +   struct rtattr *tb[IFLA_MAX+1];
> > > > +   len = l->h.nlmsg_len;
> > > > +   len -= NLMSG_LENGTH(sizeof(*ifi));
> > > > +   parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
> > > > +   name = (char *)(tb[IFLA_IFNAME] ? 
> > > > rta_getattr_str(tb[IFLA_IFNAME]) : "");
> > +
> > > > +   printf("%*s%s\n", indent * 4, "", name);
> > +
> > > > +   struct nlmsg_list *master = get_master(linfo, tb);
> > > > +   if (master) {
> > > > +   if (indent > 8) {
> > > > +   printf("%*s...\n", (indent + 1) * 4, "");
> > > > +   } else {
> > > > +   print_dev_tree_item(linfo, master, indent + 1);
> > 

[PATCH] net: vxge: fix typo argumnet argument

2017-02-25 Thread Corentin Labbe
This commit fix the typo argumnet/argument

Signed-off-by: Corentin Labbe 
---
 drivers/net/ethernet/neterion/vxge/vxge-ethtool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/neterion/vxge/vxge-ethtool.c 
b/drivers/net/ethernet/neterion/vxge/vxge-ethtool.c
index db55e6d..0452848 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-ethtool.c
+++ b/drivers/net/ethernet/neterion/vxge/vxge-ethtool.c
@@ -119,7 +119,7 @@ static void vxge_ethtool_gdrvinfo(struct net_device *dev,
  * @dev: device pointer.
  * @regs: pointer to the structure with parameters given by ethtool for
  * dumping the registers.
- * @reg_space: The input argumnet into which all the registers are dumped.
+ * @reg_space: The input argument into which all the registers are dumped.
  *
  * Dumps the vpath register space of Titan NIC into the user given
  * buffer area.
-- 
2.10.2



[PATCH] net: s2io: fix typo argumnet argument

2017-02-25 Thread Corentin Labbe
This commit fix the typo argumnet/argument

Signed-off-by: Corentin Labbe 
---
 drivers/net/ethernet/neterion/s2io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/neterion/s2io.c 
b/drivers/net/ethernet/neterion/s2io.c
index c5c1d0e..118723e 100644
--- a/drivers/net/ethernet/neterion/s2io.c
+++ b/drivers/net/ethernet/neterion/s2io.c
@@ -5397,7 +5397,7 @@ static void s2io_ethtool_gdrvinfo(struct net_device *dev,
  *  s2io_nic structure.
  *  @regs : pointer to the structure with parameters given by ethtool for
  *  dumping the registers.
- *  @reg_space: The input argumnet into which all the registers are dumped.
+ *  @reg_space: The input argument into which all the registers are dumped.
  *  Description:
  *  Dumps the entire register space of xFrame NIC into the user given
  *  buffer area.
-- 
2.10.2



Re: [PATCH] iproute2: show network device dependency tree

2017-02-25 Thread Jiri Pirko
Sat, Feb 25, 2017 at 05:59:00PM CET, zaboj.camp...@post.cz wrote:
>Add the argument '-tree' to ip-link to show network devices dependency tree.
>
>Example:
>
>$ ip -tree link
>eth0
>bond0
>eth1
>bond0
>eth2
>bond1
>eth3
>bond1


Hmm, what is this good for? I'm probably missing something...



>
>Signed-off-by: Zaboj Campula 
>---
> include/utils.h |  1 +
> ip/ip.c |  5 ++-
> ip/ipaddress.c  | 97 -
> 3 files changed, 87 insertions(+), 16 deletions(-)
>
>diff --git a/include/utils.h b/include/utils.h
>index 22369e0..f1acf4d 100644
>--- a/include/utils.h
>+++ b/include/utils.h
>@@ -20,6 +20,7 @@ extern int show_raw;
> extern int resolve_hosts;
> extern int oneline;
> extern int brief;
>+extern int tree;;
> extern int timestamp;
> extern int timestamp_short;
> extern const char * _SL_;
>diff --git a/ip/ip.c b/ip/ip.c
>index 07050b0..29747a5 100644
>--- a/ip/ip.c
>+++ b/ip/ip.c
>@@ -33,6 +33,7 @@ int show_details;
> int resolve_hosts;
> int oneline;
> int brief;
>+int tree;
> int timestamp;
> const char *_SL_;
> int force;
>@@ -57,7 +58,7 @@ static void usage(void)
> "-h[uman-readable] | -iec |\n"
> "-f[amily] { inet | inet6 | ipx | dnet | mpls | bridge | 
> link } |\n"
> "-4 | -6 | -I | -D | -B | -0 |\n"
>-"-l[oops] { maximum-addr-flush-attempts } | -br[ief] |\n"
>+"-l[oops] { maximum-addr-flush-attempts } | -br[ief] | 
>-tr[ee] |\n"
> "-o[neline] | -t[imestamp] | -ts[hort] | -b[atch] 
> [filename] |\n"
> "-rc[vbuf] [size] | -n[etns] name | -a[ll] | 
> -c[olor]}\n");
>   exit(-1);
>@@ -257,6 +258,8 @@ int main(int argc, char **argv)
>   batch_file = argv[1];
>   } else if (matches(opt, "-brief") == 0) {
>   ++brief;
>+  } else if (matches(opt, "-tree") == 0) {
>+  ++tree;
>   } else if (matches(opt, "-rcvbuf") == 0) {
>   unsigned int size;
> 
>diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>index 242c6ea..5ebcb1a 100644
>--- a/ip/ipaddress.c
>+++ b/ip/ipaddress.c
>@@ -1534,6 +1534,69 @@ static int iplink_filter_req(struct nlmsghdr *nlh, int 
>reqlen)
>   return 0;
> }
> 
>+static int has_master(struct nlmsg_chain *linfo, int index)
>+{
>+  struct nlmsg_list *l;
>+  struct rtattr *tb[IFLA_MAX+1];
>+  int len;
>+  for (l = linfo->head; l; l = l->next) {
>+  struct ifinfomsg *ifi = NLMSG_DATA(>h);
>+  len = l->h.nlmsg_len;
>+  len -= NLMSG_LENGTH(sizeof(*ifi));
>+  parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
>+  if (tb[IFLA_MASTER] && *(int *)RTA_DATA(tb[IFLA_MASTER]) == 
>index)
>+  return 1;
>+  }
>+  return 0;
>+}
>+
>+static struct nlmsg_list *get_master(struct nlmsg_chain *linfo, struct rtattr 
>**tb)
>+{
>+  struct nlmsg_list *l;
>+  if (tb[IFLA_MASTER]) {
>+  int master = *(int *)RTA_DATA(tb[IFLA_MASTER]);
>+  for (l = linfo->head; l; l = l->next) {
>+  struct ifinfomsg *ifi = NLMSG_DATA(>h);
>+  if (ifi->ifi_index == master)
>+  return l;
>+  }
>+  }
>+  return NULL;
>+}
>+
>+static void print_dev_tree_item(struct nlmsg_chain *linfo, struct nlmsg_list 
>*l, int indent) {
>+  char *name;
>+  int len;
>+  struct ifinfomsg *ifi = NLMSG_DATA(>h);
>+  struct rtattr *tb[IFLA_MAX+1];
>+  len = l->h.nlmsg_len;
>+  len -= NLMSG_LENGTH(sizeof(*ifi));
>+  parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
>+  name = (char *)(tb[IFLA_IFNAME] ? rta_getattr_str(tb[IFLA_IFNAME]) : 
>"");
>+
>+  printf("%*s%s\n", indent * 4, "", name);
>+
>+  struct nlmsg_list *master = get_master(linfo, tb);
>+  if (master) {
>+  if (indent > 8) {
>+  printf("%*s...\n", (indent + 1) * 4, "");
>+  } else {
>+  print_dev_tree_item(linfo, master, indent + 1);
>+  }
>+  }
>+}
>+
>+static void print_devtree(struct nlmsg_chain *linfo)
>+{
>+  struct nlmsg_list *l;
>+  for (l = linfo->head; l; l = l->next) {
>+  struct ifinfomsg *ifi = NLMSG_DATA(>h);
>+  if (!has_master(linfo, ifi->ifi_index)) {
>+  print_dev_tree_item(linfo, l, 0);
>+  }
>+  }
>+}
>+
> static int ipaddr_list_flush_or_save(int argc, char **argv, int action)
> {
>   struct nlmsg_chain linfo = { NULL, NULL};
>@@ -1742,23 +1805,27 @@ static int ipaddr_list_flush_or_save(int argc, char 
>**argv, int action)
>   ipaddr_filter(, );
>   }
> 
>-  for (l = linfo.head; l; l = l->next) {
>-  int res = 0;
>-  struct ifinfomsg *ifi = NLMSG_DATA(>h);
>-
>-  if 

Re: [PATCH 1/1] color: use "light" colors for dark background

2017-02-25 Thread Mathias Nyman

On 2017-02-24 11:13+0100, Petr Vorel wrote:

COLORFGBG environment variable is used to detect dark background.

Idea and a bit of code is borrowed from Vim, thanks.

Signed-off-by: Petr Vorel 
---
Colors are nice, but the ones chosen aren't suitable for dark background.


Yea, I admit, the original color palette is kind of horrendous. I
wouldn't say the current colors are suitable for a light background
either.

I propose you just replace the current colors with their bold
variants, and leave the background hue guessing out all together. That
would be an all-round improvement for everyone.

I'll include some comments on this patch-set anyway, as I had a look
at it.


COLORFGBG environment variable is used in some libraries and software (e.g.
ncurses, Vim). COLORFGBG is set by various terminal emulators (e.g. konsole,
rxvt and rxvt-unicode).

Chosen colors are questionable. Best solution would be also allow user to
redefine colors, like ls does with LS_COLORS or grep with GREP_COLORS. But that
is maybe overkill.
---
include/color.h |  1 +
lib/color.c | 43 ++-
2 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/include/color.h b/include/color.h
index c1c29831..43190db4 100644
--- a/include/color.h
+++ b/include/color.h
@@ -12,6 +12,7 @@ enum color_attr {
};

void enable_color(void);
+void set_background(void);
int color_fprintf(FILE *fp, enum color_attr attr, const char *fmt, ...);
enum color_attr ifa_family_color(__u8 ifa_family);
enum color_attr oper_state_color(__u8 state);
diff --git a/lib/color.c b/lib/color.c
index 95596be2..69375b26 100644
--- a/lib/color.c
+++ b/lib/color.c
@@ -1,5 +1,7 @@
#include 
#include 
+#include 
+#include 
#include 
#include 
#include 
@@ -14,6 +16,12 @@ enum color {
C_MAGENTA,
C_CYAN,
C_WHITE,
+   C_LIGHT_RED,
+   C_LIGHT_GREEN,
+   C_LIGHT_YELLOW,
+   C_LIGHT_BLUE,
+   C_LIGHT_MAGENTA,
+   C_LIGHT_CYAN,


You have missed to add C_LIGHT_WHITE here.

The naming is also confusing, because while you say "light", the
colors defined below are actually _bold_. They may have an lightening
effect, but this naming is just confusing in the code.


C_CLEAR
};

@@ -25,25 +33,58 @@ static const char * const color_codes[] = {
"\e[35m",
"\e[36m",
"\e[37m",
+   "\e[1;31m",
+   "\e[1;32m",
+   "\e[1;33m",
+   "\e[1;34m",
+   "\e[1;35m",
+   "\e[1;36m",


You have also missed to add the white color ("\e[1;37m",) here as
well.


"\e[0m",
NULL,
};

static enum color attr_colors[] = {
+   /* light background */
C_CYAN,
C_YELLOW,
C_MAGENTA,
C_BLUE,
C_GREEN,
C_RED,
+   C_CLEAR,
+
+   /* dark background */
+   C_LIGHT_CYAN,
+   C_LIGHT_YELLOW,
+   C_LIGHT_MAGENTA,
+   C_LIGHT_BLUE,
+   C_LIGHT_GREEN,
+   C_LIGHT_RED,
C_CLEAR
};

+static int is_dark_bg;
static int color_is_enabled;

void enable_color(void)
{
color_is_enabled = 1;
+   set_background();
+}
+
+void set_background(void)


The function name is a bit misleading. Only after reading the whole
function did I understand that what you do here is choose the color
palette - not set the background.


+{
+   char *p = getenv("COLORFGBG");
+
+   /*
+* COLORFGBG environment variable usually contains either two or three
+* values separated by semicolons; we want the last value in either 
case.
+* If this value is 0-6 or 8, background is dark.
+*/
+   if (p && (p = (char *)strrchr(p, ';')) != NULL
+   && ((p[1] >= '0' && p[1] <= '6') || p[1] == '8')
+   && p[2] == '\0')
+   is_dark_bg = 1;
}

int color_fprintf(FILE *fp, enum color_attr attr, const char *fmt, ...)
@@ -58,7 +99,7 @@ int color_fprintf(FILE *fp, enum color_attr attr, const char 
*fmt, ...)
goto end;
}

-   ret += fprintf(fp, "%s", color_codes[attr_colors[attr]]);
+   ret += fprintf(fp, "%s", color_codes[attr_colors[is_dark_bg ? attr + 7 
: attr]]);
ret += vfprintf(fp, fmt, args);
ret += fprintf(fp, "%s", color_codes[C_CLEAR]);

--
2.11.0



[PATCH] iproute2: show network device dependency tree

2017-02-25 Thread Zaboj Campula
Add the argument '-tree' to ip-link to show network devices dependency tree.

Example:

$ ip -tree link
eth0
bond0
eth1
bond0
eth2
bond1
eth3
bond1

Signed-off-by: Zaboj Campula 
---
 include/utils.h |  1 +
 ip/ip.c |  5 ++-
 ip/ipaddress.c  | 97 -
 3 files changed, 87 insertions(+), 16 deletions(-)

diff --git a/include/utils.h b/include/utils.h
index 22369e0..f1acf4d 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -20,6 +20,7 @@ extern int show_raw;
 extern int resolve_hosts;
 extern int oneline;
 extern int brief;
+extern int tree;;
 extern int timestamp;
 extern int timestamp_short;
 extern const char * _SL_;
diff --git a/ip/ip.c b/ip/ip.c
index 07050b0..29747a5 100644
--- a/ip/ip.c
+++ b/ip/ip.c
@@ -33,6 +33,7 @@ int show_details;
 int resolve_hosts;
 int oneline;
 int brief;
+int tree;
 int timestamp;
 const char *_SL_;
 int force;
@@ -57,7 +58,7 @@ static void usage(void)
 "-h[uman-readable] | -iec |\n"
 "-f[amily] { inet | inet6 | ipx | dnet | mpls | bridge | 
link } |\n"
 "-4 | -6 | -I | -D | -B | -0 |\n"
-"-l[oops] { maximum-addr-flush-attempts } | -br[ief] |\n"
+"-l[oops] { maximum-addr-flush-attempts } | -br[ief] | 
-tr[ee] |\n"
 "-o[neline] | -t[imestamp] | -ts[hort] | -b[atch] 
[filename] |\n"
 "-rc[vbuf] [size] | -n[etns] name | -a[ll] | -c[olor]}\n");
exit(-1);
@@ -257,6 +258,8 @@ int main(int argc, char **argv)
batch_file = argv[1];
} else if (matches(opt, "-brief") == 0) {
++brief;
+   } else if (matches(opt, "-tree") == 0) {
+   ++tree;
} else if (matches(opt, "-rcvbuf") == 0) {
unsigned int size;
 
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 242c6ea..5ebcb1a 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -1534,6 +1534,69 @@ static int iplink_filter_req(struct nlmsghdr *nlh, int 
reqlen)
return 0;
 }
 
+static int has_master(struct nlmsg_chain *linfo, int index)
+{
+   struct nlmsg_list *l;
+   struct rtattr *tb[IFLA_MAX+1];
+   int len;
+   for (l = linfo->head; l; l = l->next) {
+   struct ifinfomsg *ifi = NLMSG_DATA(>h);
+   len = l->h.nlmsg_len;
+   len -= NLMSG_LENGTH(sizeof(*ifi));
+   parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
+   if (tb[IFLA_MASTER] && *(int *)RTA_DATA(tb[IFLA_MASTER]) == 
index)
+   return 1;
+   }
+   return 0;
+}
+
+static struct nlmsg_list *get_master(struct nlmsg_chain *linfo, struct rtattr 
**tb)
+{
+   struct nlmsg_list *l;
+   if (tb[IFLA_MASTER]) {
+   int master = *(int *)RTA_DATA(tb[IFLA_MASTER]);
+   for (l = linfo->head; l; l = l->next) {
+   struct ifinfomsg *ifi = NLMSG_DATA(>h);
+   if (ifi->ifi_index == master)
+   return l;
+   }
+   }
+   return NULL;
+}
+
+static void print_dev_tree_item(struct nlmsg_chain *linfo, struct nlmsg_list 
*l, int indent) {
+   char *name;
+   int len;
+   struct ifinfomsg *ifi = NLMSG_DATA(>h);
+   struct rtattr *tb[IFLA_MAX+1];
+   len = l->h.nlmsg_len;
+   len -= NLMSG_LENGTH(sizeof(*ifi));
+   parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
+   name = (char *)(tb[IFLA_IFNAME] ? rta_getattr_str(tb[IFLA_IFNAME]) : 
"");
+
+   printf("%*s%s\n", indent * 4, "", name);
+
+   struct nlmsg_list *master = get_master(linfo, tb);
+   if (master) {
+   if (indent > 8) {
+   printf("%*s...\n", (indent + 1) * 4, "");
+   } else {
+   print_dev_tree_item(linfo, master, indent + 1);
+   }
+   }
+}
+
+static void print_devtree(struct nlmsg_chain *linfo)
+{
+   struct nlmsg_list *l;
+   for (l = linfo->head; l; l = l->next) {
+   struct ifinfomsg *ifi = NLMSG_DATA(>h);
+   if (!has_master(linfo, ifi->ifi_index)) {
+   print_dev_tree_item(linfo, l, 0);
+   }
+   }
+}
+
 static int ipaddr_list_flush_or_save(int argc, char **argv, int action)
 {
struct nlmsg_chain linfo = { NULL, NULL};
@@ -1742,23 +1805,27 @@ static int ipaddr_list_flush_or_save(int argc, char 
**argv, int action)
ipaddr_filter(, );
}
 
-   for (l = linfo.head; l; l = l->next) {
-   int res = 0;
-   struct ifinfomsg *ifi = NLMSG_DATA(>h);
-
-   if (brief) {
-   if (print_linkinfo_brief(NULL, >h, stdout) == 0)
+   if (tree) {
+   print_devtree();
+   } else {
+   for (l = linfo.head; l; l = l->next) {
+   

Re: [PATCH net] net/mlx4_en: fix overflow in mlx4_en_init_timestamp()

2017-02-25 Thread Or Gerlitz
On Fri, Feb 24, 2017 at 6:21 PM, David Miller  wrote:
> From: Eric Dumazet 
> Date: Thu, 23 Feb 2017

> Tariq please review.

Dave,

Just to re-iterate what we wrote here couple of time, the IL WW is
Sun-Thu on GMT+2 hours and hence this patch was sent when the
developers/maintainer are into weekend mode.

Or.


[PATCH net] net/mlx4_en: reception NAPI/IRQ race breaker

2017-02-25 Thread Eric Dumazet
From: Eric Dumazet 

While playing with hardware timestamping of RX packets, I found
that some packets were received by TCP stack with a ~200 ms delay...

Since the timestamp was provided by the NIC, and my probe was added
in tcp_v4_rcv() while in BH handler, I was confident it was not
a sender issue, or a drop in the network.

This would happen with a very low probability, but hurting RPC
workloads.

I could track this down to the cx-3 (mlx4 driver) card that was
apparently sending an IRQ before we would arm it.

A NAPI driver normally arms the IRQ after the napi_complete_done(),
after NAPI_STATE_SCHED is cleared, so that the hard irq handler can grab
it.

This patch adds a new rx_irq_miss field that is incremented every time
the hard IRQ handler could not grab NAPI_STATE_SCHED.

Then, mlx4_en_poll_rx_cq() is able to detect that rx_irq was incremented
and attempts to read more packets, if it can re-acquire NAPI_STATE_SCHED

Note that this work around would probably not work if the IRQ is spread
over many cpus, since it assume the hard irq and softirq are handled by
the same cpu. This kind of setup is buggy anyway because of reordering
issues.

Signed-off-by: Eric Dumazet 
Cc: Tariq Toukan 
Cc: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c   |   32 +
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |1 
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 
867292880c07a15124a0cf099d1fcda09926548e..7c262dc6a9971ca99a890bc3cff49289f0ef43fc
 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -1132,10 +1132,14 @@ void mlx4_en_rx_irq(struct mlx4_cq *mcq)
struct mlx4_en_cq *cq = container_of(mcq, struct mlx4_en_cq, mcq);
struct mlx4_en_priv *priv = netdev_priv(cq->dev);
 
-   if (likely(priv->port_up))
-   napi_schedule_irqoff(>napi);
-   else
+   if (likely(priv->port_up)) {
+   if (napi_schedule_prep(>napi))
+   __napi_schedule_irqoff(>napi);
+   else
+   cq->rx_irq_missed++;
+   } else {
mlx4_en_arm_cq(priv, cq);
+   }
 }
 
 /* Rx CQ polling - called by NAPI */
@@ -1144,9 +1148,12 @@ int mlx4_en_poll_rx_cq(struct napi_struct *napi, int 
budget)
struct mlx4_en_cq *cq = container_of(napi, struct mlx4_en_cq, napi);
struct net_device *dev = cq->dev;
struct mlx4_en_priv *priv = netdev_priv(dev);
-   int done;
+   int done = 0;
+   u32 rx_irq_missed;
 
-   done = mlx4_en_process_rx_cq(dev, cq, budget);
+again:
+   rx_irq_missed = READ_ONCE(cq->rx_irq_missed);
+   done += mlx4_en_process_rx_cq(dev, cq, budget - done);
 
/* If we used up all the quota - we're probably not done yet... */
if (done == budget) {
@@ -1171,10 +1178,21 @@ int mlx4_en_poll_rx_cq(struct napi_struct *napi, int 
budget)
 */
if (done)
done--;
+   if (napi_complete_done(napi, done))
+   mlx4_en_arm_cq(priv, cq);
+   return done;
}
-   /* Done for now */
-   if (napi_complete_done(napi, done))
+   if (unlikely(READ_ONCE(cq->rx_irq_missed) != rx_irq_missed))
+   goto again;
+
+   if (napi_complete_done(napi, done)) {
mlx4_en_arm_cq(priv, cq);
+
+   /* We might have received an interrupt too soon */
+   if (unlikely(READ_ONCE(cq->rx_irq_missed) != rx_irq_missed) &&
+   napi_reschedule(napi))
+   goto again;
+   }
return done;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h 
b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 
4941b692e9479bc7800e92f9bfb13baf3ff7d0d4..030f305fcca7fa4b3b9a15be3fe11e572d665003
 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -369,6 +369,7 @@ struct mlx4_en_cq {
int ring;
struct net_device  *dev;
struct napi_struct  napi;
+   u32 rx_irq_missed;
int size;
int buf_size;
int vector;




[PATCH net] xfrm: provide correct dst in xfrm_neigh_lookup

2017-02-25 Thread Julian Anastasov
Fix xfrm_neigh_lookup to provide dst->path to the
neigh_lookup dst_ops method.

When skb is provided, the IP address in packet should already
match the dst->path address family. But for the non-skb case,
we should consider the last tunnel address as nexthop address.

Fixes: f894cbf847c9 ("net: Add optional SKB arg to dst_ops->neigh_lookup().")
Signed-off-by: Julian Anastasov 
---
 net/xfrm/xfrm_policy.c | 29 +
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 5f3e878..0806dcc 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2836,14 +2836,8 @@ static unsigned int xfrm_mtu(const struct dst_entry *dst)
return mtu ? : dst_mtu(dst->path);
 }
 
-static struct neighbour *xfrm_neigh_lookup(const struct dst_entry *dst,
-  struct sk_buff *skb,
-  const void *daddr)
-{
-   return dst->path->ops->neigh_lookup(dst, skb, daddr);
-}
-
-static void xfrm_confirm_neigh(const struct dst_entry *dst, const void *daddr)
+static const void *xfrm_get_dst_nexthop(const struct dst_entry *dst,
+   const void *daddr)
 {
const struct dst_entry *path = dst->path;
 
@@ -2857,6 +2851,25 @@ static void xfrm_confirm_neigh(const struct dst_entry 
*dst, const void *daddr)
else if (!(xfrm->type->flags & XFRM_TYPE_LOCAL_COADDR))
daddr = >id.daddr;
}
+   return daddr;
+}
+
+static struct neighbour *xfrm_neigh_lookup(const struct dst_entry *dst,
+  struct sk_buff *skb,
+  const void *daddr)
+{
+   const struct dst_entry *path = dst->path;
+
+   if (!skb)
+   daddr = xfrm_get_dst_nexthop(dst, daddr);
+   return path->ops->neigh_lookup(path, skb, daddr);
+}
+
+static void xfrm_confirm_neigh(const struct dst_entry *dst, const void *daddr)
+{
+   const struct dst_entry *path = dst->path;
+
+   daddr = xfrm_get_dst_nexthop(dst, daddr);
path->ops->confirm_neigh(path, daddr);
 }
 
-- 
1.9.3



Re: [bug report] rhashtable: Add nested tables

2017-02-25 Thread Herbert Xu
On Thu, Feb 16, 2017 at 03:17:03PM +0300, Dan Carpenter wrote:
> Hello Herbert Xu,
> 
> This is a semi-automatic email about new static checker warnings.
> 
> The patch 40137906c5f5: "rhashtable: Add nested tables" from Feb 11, 
> 2017, leads to the following Smatch complaint:
> 
> lib/rhashtable.c:149 bucket_table_free()
>warn: variable dereferenced before check 'tbl' (see line 146)
> 
> lib/rhashtable.c
>145{
>146if (tbl->nest)
> ^
> New unchecked dereference.
> 
>147nested_bucket_table_free(tbl);
>148
>149if (tbl)
> ^^^
> Old code checked for NULL.
> 
>150kvfree(tbl->locks);
>151

Thanks for catching this.  I checked all the callers and none
of them can provide a NULL pointer so let's just kill the check.

---8<---
Subject: rhashtable: Fix use before NULL check in bucket_table_free

Dan Carpenter reported a use before NULL check bug in the function
bucket_table_free.  In fact we don't need the NULL check at all as
no caller can provide a NULL argument.  So this patch fixes this by
simply removing it.

Reported-by: Dan Carpenter 
Signed-off-by: Herbert Xu 
---

 lib/rhashtable.c |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 172454e..fac1a78 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -146,9 +146,7 @@ static void bucket_table_free(const struct bucket_table 
*tbl)
if (tbl->nest)
nested_bucket_table_free(tbl);
 
-   if (tbl)
-   kvfree(tbl->locks);
-
+   kvfree(tbl->locks);
kvfree(tbl);
 }
 
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [rhashtable] 5d60de5ff1 [ INFO: suspicious RCU usage. ]

2017-02-25 Thread Herbert Xu
On Sat, Feb 18, 2017 at 01:23:31PM +0800, Fengguang Wu wrote:
> Greetings,
> 
> 0day kernel testing robot got the below dmesg and the first bad commit is
> 
> https://github.com/0day-ci/linux 
> Herbert-Xu/rhashtable-Handle-table-allocation-failure-during-insertion/20170212-030221
> 
> commit 5d60de5ff12fb1e966c863bcb41c1e2bdde66c50
> Author: Herbert Xu 
> AuthorDate: Sat Feb 11 19:26:47 2017 +0800
> Commit: 0day robot 
> CommitDate: Sun Feb 12 03:02:26 2017 +0800
> 
>  rhashtable: Add nested tables
>  
>  This patch adds code that handles GFP_ATOMIC kmalloc failure on
>  insertion.  As we cannot use vmalloc, we solve it by making our
>  hash table nested.  That is, we allocate single pages at each level
>  and reach our desired table size by nesting them.
>  
>  When a nested table is created, only a single page is allocated
>  at the top-level.  Lower levels are allocated on demand during
>  insertion.  Therefore for each insertion to succeed, only two
>  (non-consecutive) pages are needed.
>  
>  After a nested table is created, a rehash will be scheduled in
>  order to switch to a vmalloced table as soon as possible.  Also,
>  the rehash code will never rehash into a nested table.  If we
>  detect a nested table during a rehash, the rehash will be aborted
>  and a new rehash will be scheduled.
>  
>  Signed-off-by: Herbert Xu 
> 
> +--+++--+
> |  | 
> eadcee1e3a | 5d60de5ff1 | v4.10-rc8_021716 |
> +--+++--+
> | boot_successes   | 0
>   | 0  | 9|
> | boot_failures| 177  
>   | 48 | 82   |
> | Kernel_panic-not_syncing:softlockup:hung_tasks   | 177  
>   | 48 | 14   |
> | INFO:suspicious_RCU_usage| 0
>   | 39 | 64   |
> | BUG:KASAN:user-memory-access_on_address  | 0
>   | 0  | 16   |
> | BUG:unable_to_handle_kernel  | 0
>   | 0  | 18   |
> | Oops | 0
>   | 0  | 18   |
> | Kernel_panic-not_syncing:Fatal_exception | 0
>   | 0  | 18   |
> | BUG:kernel_hang_in_test_stage| 0
>   | 0  | 19   |
> | WARNING:at_kernel/sched/sched.h:#set_next_entity | 0
>   | 0  | 15   |
> | INFO:possible_circular_locking_dependency_detected   | 0
>   | 0  | 4|
> | WARNING:at_include/linux/cpumask.h:#find_get_context | 0
>   | 0  | 2|
> | invoked_oom-killer:gfp_mask=0x   | 0
>   | 0  | 4|
> | Mem-Info | 0
>   | 0  | 4|
> | Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 0
>   | 0  | 4|
> | Kernel_panic-not_syncing:Attempted_to_kill_init!exitcode=| 0
>   | 0  | 7|
> +--+++--+
> 
> [9.022401] Testing concurrent rhashtable access from 10 threads
> [   15.83] 
> [   15.801145] ===
> [   15.801802] [ INFO: suspicious RCU usage. ]
> [   15.802476] 4.10.0-rc7-00090-g5d60de5 #1 Not tainted
> [   15.803235] ---
> [   15.803866] lib/rhashtable.c:1126 suspicious rcu_dereference_protected() 
> usage!

Sorry, I chose the wrong annotation to fix the previous report.
This patch should fix it completely.

---8<---
Subject: rhashtable: Fix RCU dereference annotation in rht_bucket_nested

The current annotation is wrong as it says that we're only called
under spinlock.  In fact it should be marked as under either
spinlock or RCU read lock.

Fixes: da20420f83ea ("rhashtable: Add nested tables")
Reported-by: Fengguang Wu 
Signed-off-by: Herbert Xu 
---

 lib/rhashtable.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index fac1a78..c5b9b93 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ 

Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE

2017-02-25 Thread Xin Long
On Fri, Feb 24, 2017 at 6:14 PM, David Laight  wrote:
>
> From: Xin Long
> > Sent: 24 February 2017 06:44
> ...
> > > IIRC sctp_packet_can_append_data() is called for the first queued
> > > data chunk in order to decide whether to generate a message that
> > > consists only of data chunks.
> > > If it returns SCTP_XMIT_OK then a message is built collecting the
> > > rest of the queued data chunks (until the window fills).
> > >
> > > So if I send a message with MSG_MORE set (on an idle connection)
> > > SCTP_XMIT_DELAY is returned and a message isn't sent.
> > >
> > > I now send a second small message, this time with MSG_MORE clear.
> > > The message is queued, then the code looks to see if it can send anything.
> > >
> > > sctp_packet_can_append_data() is called for the first queued chunk.
> > > Since it has force_delay set SCTP_XMIT_DELAY is returned and no
> > > message is built.
> > > The second message isn't even looked at.
> > You're right. I can see the problem now.
> >
> > What I expected is it should work like:
> >
> > 1, send 3 small chunks with MSG_MORE set, the queue is:
> >   chk3 [set] -> chk2 [set] -> chk1 [set]
>
> Strange way to write a queue! chk1 points to chk2 :-)
haha, just  a model.

>
> > 2. send 1 more chunk with MSG_MORE clear, the queue is:
> >   chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
>
> I don't think processing the entire queue is a good idea.
> Both from execution time and the effects on the data cache.
> The SCTP code is horrid enough as it is.
you check the codes in last email, it's not processing the entire queue.

1). only when queue has delay chunk inside by checking queue->has_delay
and current chunk has msg_more flag.

2). will break on the first chunk with clear in the queue.

but yes, in 2), extra work has to be done than before, but not much.

>
> > 3. then if user send more small chunks with MSG_MORE set,
> > the queue is like:
> >   chkB[set] -> chkA[set] -> chk4[clear] -> chk3 [clear] -> chk2 [clear] -> 
> > chk1 [clear]
> > so that the new small chunks' flag will not affect the other chunks 
> > bundling.
>
> That isn't really necessary.
> The user can't expect to have absolute control over which chunks get bundled
> together.
> If the above chunks still aren't big enough to fill a frame the code might
> as well wait for the next chunk instead of building a packet that contains
> chk1 through to chkB.
>
> Remember you'll only get a queued chunk with MSG_MORE clear if data can't be 
> sent.
> As soon as data can be sent, if the first chunk has MSG_MORE clear all of the
> queued chunks will be sent.
>
> So immediately after your (3) the application is expected to send a chunk
> with MSG_MORE clear - at that point all the queued chunks can be sent in
> a single packet.
understand this.

what I'm worried about is if the msg_more is saved in assoc:
 chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
then when you send a small chkA with MSG_MORE,
the queue will be like:
 chkA [set] -> chk4[set] -> chk3 [set] -> chk2 [set] -> chk1 [set]
because msg_more is saved in assoc, every chunk can look at it.
chk1 - chk4 are big enough to be packed into a packet, they were
not sent last time because a lot of chunks are in the retransmit
queue.

But now even if retransmit queue is null, chk1-chk4 are still blocked.

can you accept that chkA may block the old chunks ?

>
> So just save the last MSG_MORE on the association as I did.
I will think about it, thanks.

>
> David