Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
On Sat, Jan 23, 2016 at 07:23:09AM -0800, Eric Dumazet wrote: > On Fri, 2016-01-22 at 14:11 -0500, Jarod Wilson wrote: > > > --- > > net/core/dev.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > index 8cba3d8..1354c7b 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -4153,8 +4153,11 @@ ncls: > > else > > ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev); > > } else { > > + if (deliver_exact) > > + goto inactive; /* bond or team inactive slave */ > > drop: > > atomic_long_inc(>dev->rx_dropped); > > +inactive: > > kfree_skb(skb); > > /* Jamal, now you will not able to escape explaining > > * me how you were going to use this. :-) > > Note that if you still have a kfree_skb() instead of consume_skb(), > some tools will still give you a wrong signal (packet dropped ...). > > But then maybe the signal is telling some truth. > > We receive a packet, and decide to drop it because no one was willing to > handle it. > > Maybe someone wants to know a particular slave receives 10,000 such > frames per second and hurts performance with useless work. > > We should at least increment some counter and maybe dump it with > "ethtool -S" or something. I've been digging into ethtool -S a little bit, and am somewhat at a loss as to how I would wire into this. From what I've been able to figure out, it's entirely device-specific-ish counters spit out. On my sfc cards, I get rx_noskb_drops and rx_nodesc_drop_cnt output from ethtool -S, but for the core network stack, these are actually added up and shoved into rx_dropped, and no other network driver has those two individual counters. By itself, rx_dropped isn't output directly anywhere from ethtool, so far as I can see. And ethtool -S bondX shows absolutely nothing. *Should* ethtool -S be dumping all the network core stats? I have to say I was more than a little surprised at this: # ethtool -S bond0 no stats available Particularly given that if I look in /proc/net/dev or /sys/devices/virtual/net/bond0/statistics/*, there are quite a few stats that are being tracked and make their way out to userspace... -- Jarod Wilson ja...@redhat.com
Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
From: Jarod WilsonDate: Tue, 26 Jan 2016 16:14:53 -0500 > # ethtool -S bond0 > no stats available ethtool -S is for device specific stats. Some drivers use this facility to provide per-RX-queue and per-TX-queue versions of the existing core netdev stats.
Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
On Tue, Jan 26, 2016 at 1:14 PM, Jarod Wilsonwrote: > On Sat, Jan 23, 2016 at 07:23:09AM -0800, Eric Dumazet wrote: >> On Fri, 2016-01-22 at 14:11 -0500, Jarod Wilson wrote: >> >> > --- >> > net/core/dev.c | 3 +++ >> > 1 file changed, 3 insertions(+) >> > >> > diff --git a/net/core/dev.c b/net/core/dev.c >> > index 8cba3d8..1354c7b 100644 >> > --- a/net/core/dev.c >> > +++ b/net/core/dev.c >> > @@ -4153,8 +4153,11 @@ ncls: >> > else >> > ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev); >> > } else { >> > + if (deliver_exact) >> > + goto inactive; /* bond or team inactive slave */ >> > drop: >> > atomic_long_inc(>dev->rx_dropped); >> > +inactive: >> > kfree_skb(skb); >> > /* Jamal, now you will not able to escape explaining >> > * me how you were going to use this. :-) >> >> Note that if you still have a kfree_skb() instead of consume_skb(), >> some tools will still give you a wrong signal (packet dropped ...). >> >> But then maybe the signal is telling some truth. >> >> We receive a packet, and decide to drop it because no one was willing to >> handle it. >> >> Maybe someone wants to know a particular slave receives 10,000 such >> frames per second and hurts performance with useless work. >> >> We should at least increment some counter and maybe dump it with >> "ethtool -S" or something. > > I've been digging into ethtool -S a little bit, and am somewhat at a loss > as to how I would wire into this. From what I've been able to figure out, > it's entirely device-specific-ish counters spit out. On my sfc cards, I > get rx_noskb_drops and rx_nodesc_drop_cnt output from ethtool -S, but for > the core network stack, these are actually added up and shoved into > rx_dropped, and no other network driver has those two individual counters. > > By itself, rx_dropped isn't output directly anywhere from ethtool, > so far as I can see. And ethtool -S bondX shows absolutely nothing. > *Should* ethtool -S be dumping all the network core stats? I have to say I > was more than a little surprised at this: # ip -s -s link sh dev eth0 15: eth0: mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000 link/ether 3c:97:0e:be:91:7b brd ff:ff:ff:ff:ff:ff RX: bytes packets errors dropped overrun mcast 0 00 0 0 0 RX errors: length crc frame fifomissed 00 0 0 0 TX: bytes packets errors dropped carrier collsns 0 00 0 0 0 TX errors: aborted fifowindow heartbeat 00 0 0 So start with the following patch : diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index a30b78090594..7c762cf1e4d5 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -68,6 +68,8 @@ struct rtnl_link_stats64 { /* for cslip etc */ __u64 rx_compressed; __u64 tx_compressed; + + __u64 rx_nohandler; /* packet was of no interest */ }; /* The struct should be in sync with struct ifmap */
Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
On Tue, Jan 26, 2016 at 01:24:59PM -0800, Eric Dumazet wrote: > On Tue, Jan 26, 2016 at 1:14 PM, Jarod Wilsonwrote: > > On Sat, Jan 23, 2016 at 07:23:09AM -0800, Eric Dumazet wrote: > >> On Fri, 2016-01-22 at 14:11 -0500, Jarod Wilson wrote: > >> > >> > --- > >> > net/core/dev.c | 3 +++ > >> > 1 file changed, 3 insertions(+) > >> > > >> > diff --git a/net/core/dev.c b/net/core/dev.c > >> > index 8cba3d8..1354c7b 100644 > >> > --- a/net/core/dev.c > >> > +++ b/net/core/dev.c > >> > @@ -4153,8 +4153,11 @@ ncls: > >> > else > >> > ret = pt_prev->func(skb, skb->dev, pt_prev, > >> > orig_dev); > >> > } else { > >> > + if (deliver_exact) > >> > + goto inactive; /* bond or team inactive slave */ > >> > drop: > >> > atomic_long_inc(>dev->rx_dropped); > >> > +inactive: > >> > kfree_skb(skb); > >> > /* Jamal, now you will not able to escape explaining > >> > * me how you were going to use this. :-) > >> > >> Note that if you still have a kfree_skb() instead of consume_skb(), > >> some tools will still give you a wrong signal (packet dropped ...). > >> > >> But then maybe the signal is telling some truth. > >> > >> We receive a packet, and decide to drop it because no one was willing to > >> handle it. > >> > >> Maybe someone wants to know a particular slave receives 10,000 such > >> frames per second and hurts performance with useless work. > >> > >> We should at least increment some counter and maybe dump it with > >> "ethtool -S" or something. > > > > I've been digging into ethtool -S a little bit, and am somewhat at a loss > > as to how I would wire into this. From what I've been able to figure out, > > it's entirely device-specific-ish counters spit out. On my sfc cards, I > > get rx_noskb_drops and rx_nodesc_drop_cnt output from ethtool -S, but for > > the core network stack, these are actually added up and shoved into > > rx_dropped, and no other network driver has those two individual counters. > > > > By itself, rx_dropped isn't output directly anywhere from ethtool, > > so far as I can see. And ethtool -S bondX shows absolutely nothing. > > *Should* ethtool -S be dumping all the network core stats? I have to say I > > was more than a little surprised at this: > > # ip -s -s link sh dev eth0 > 15: eth0: mtu 1500 qdisc > pfifo_fast state DOWN mode DEFAULT group default qlen 1000 > link/ether 3c:97:0e:be:91:7b brd ff:ff:ff:ff:ff:ff > RX: bytes packets errors dropped overrun mcast > 0 00 0 0 0 > RX errors: length crc frame fifomissed >00 0 0 0 > TX: bytes packets errors dropped carrier collsns > 0 00 0 0 0 > TX errors: aborted fifowindow heartbeat >00 0 0 > > So start with the following patch : > > diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h > index a30b78090594..7c762cf1e4d5 100644 > --- a/include/uapi/linux/if_link.h > +++ b/include/uapi/linux/if_link.h > @@ -68,6 +68,8 @@ struct rtnl_link_stats64 { > /* for cslip etc */ > __u64 rx_compressed; > __u64 tx_compressed; > + > + __u64 rx_nohandler; /* packet was of no interest */ > }; > > /* The struct should be in sync with struct ifmap */ I'm already well past that point, using rx_dropped_inactive as the stat name though, based on the prior discussion. I can certainly convert that over to rx_nohandler easily enough. It looks like adding a column to ip's output there would be as simple as fetching the stat over netlink and spitting it out. -- Jarod Wilson ja...@redhat.com
Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
On Tue, Jan 26, 2016 at 01:21:00PM -0800, David Miller wrote: > From: Jarod Wilson> Date: Tue, 26 Jan 2016 16:14:53 -0500 > > > # ethtool -S bond0 > > no stats available > > ethtool -S is for device specific stats. Okay, good, that was what it looked like to me. Glad I'm not completely lost here. :) So this sort of output wouldn't belong there, it should show up in sysfs, procfs, and be available to ip over netlink. > Some drivers use this facility to provide per-RX-queue and per-TX-queue > versions of the existing core netdev stats. -- Jarod Wilson ja...@redhat.com
Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
On Mon, Jan 25, 2016 at 09:27:20AM -0500, Jarod Wilson wrote: > On Sun, Jan 24, 2016 at 10:42:22PM -0800, David Miller wrote: > > From: Jarod Wilson> > Date: Fri, 22 Jan 2016 14:11:22 -0500 > > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > > index 8cba3d8..1354c7b 100644 > > > --- a/net/core/dev.c > > > +++ b/net/core/dev.c > > > @@ -4153,8 +4153,11 @@ ncls: > > > else > > > ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev); > > > } else { > > > + if (deliver_exact) > > > + goto inactive; /* bond or team inactive slave */ > > > drop: > > > atomic_long_inc(>dev->rx_dropped); > > > +inactive: > > > kfree_skb(skb); > > > /* Jamal, now you will not able to escape explaining > > >* me how you were going to use this. :-) > > > > I agree that rx_dropped is not the correct stat to bump here, but > > I'm totally against the event disappearing completely into thin > > air. > > > > You have to replace the rx_dropped bump with _something_. > > > > The only reason this hasn't been "fixed" yet is that everyone is > > too damn lazy to implement that "something". > > Would you want to see all things that shouldn't increment rx_dropped come > in one shot, along with the four or so other counters, as discussed in the > prior thread, or can they be done piecemeal? To date, I'm really only > familiar with this particular case, and could probably get something > together this week. To address the rest, I'd have to poke around a bit > more and see what there is to see and do. Spent a while hacking around today, now have this, p7p1 and p5p2 are the inactive slaves in the bond: [root@dell-per720-06 ~]# cat /proc/net/dev Inter-| Receive | Transmit face |bytespackets errs drop drop_i fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed p6p1: 16024 23800 00 0 0 521 0 0000 0 0 0 p7p1: 1691386 1653700 165680 0 0 488 0 0000 0 0 0 p7p2: 1709438 1671800 00 0 0 561 0 0000 0 0 0 bond0: 6183056 6306500 331510 0 0 13964 24747 193000 0 0 0 p4p1: 0 000 00 0 0 0 0 0000 0 0 0 p4p2: 0 000 00 0 0 0 0 0000 0 0 0 lo:4928 5000 00 0 0 0 4928 50000 0 0 0 p5p1: 2259498 2340100 00 0 0 6740 24747 193000 0 0 0 p5p2: 2232172 2312700 165830 0 0 6736 0 0000 0 0 0 em4: 2347251 1822400 00 0 090 4541 47000 0 0 0 em2: 1590296 1606100 00 0 081 0 0000 0 0 0 em1: 1590180 1606000 00 0 079 0 0000 0 0 0 em3: 2343156 1820900 00 0 094 0 0000 0 0 0 [root@dell-per720-06 ~]# cat /sys/devices/virtual/net/bond0/statistics/rx_dropped_inactive 33181 Haven't yet thrown together anything for ethtool -S output as Eric had suggested, but I'll dig into that tomorrow. -- Jarod Wilson ja...@redhat.com
Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
On Sun, Jan 24, 2016 at 10:42:22PM -0800, David Miller wrote: > From: Jarod Wilson> Date: Fri, 22 Jan 2016 14:11:22 -0500 > > > diff --git a/net/core/dev.c b/net/core/dev.c > > index 8cba3d8..1354c7b 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -4153,8 +4153,11 @@ ncls: > > else > > ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev); > > } else { > > + if (deliver_exact) > > + goto inactive; /* bond or team inactive slave */ > > drop: > > atomic_long_inc(>dev->rx_dropped); > > +inactive: > > kfree_skb(skb); > > /* Jamal, now you will not able to escape explaining > > * me how you were going to use this. :-) > > -- > > 1.8.3.1 > > > > I agree that rx_dropped is not the correct stat to bump here, but > I'm totally against the event disappearing completely into thin > air. > > You have to replace the rx_dropped bump with _something_. > > The only reason this hasn't been "fixed" yet is that everyone is > too damn lazy to implement that "something". Would you want to see all things that shouldn't increment rx_dropped come in one shot, along with the four or so other counters, as discussed in the prior thread, or can they be done piecemeal? To date, I'm really only familiar with this particular case, and could probably get something together this week. To address the rest, I'd have to poke around a bit more and see what there is to see and do. -- Jarod Wilson ja...@redhat.com
Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
From: Jarod WilsonDate: Fri, 22 Jan 2016 14:11:22 -0500 > diff --git a/net/core/dev.c b/net/core/dev.c > index 8cba3d8..1354c7b 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -4153,8 +4153,11 @@ ncls: > else > ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev); > } else { > + if (deliver_exact) > + goto inactive; /* bond or team inactive slave */ > drop: > atomic_long_inc(>dev->rx_dropped); > +inactive: > kfree_skb(skb); > /* Jamal, now you will not able to escape explaining >* me how you were going to use this. :-) > -- > 1.8.3.1 > I agree that rx_dropped is not the correct stat to bump here, but I'm totally against the event disappearing completely into thin air. You have to replace the rx_dropped bump with _something_. The only reason this hasn't been "fixed" yet is that everyone is too damn lazy to implement that "something".
Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
Fri, Jan 22, 2016 at 09:59:12PM CET, jay.vosbu...@canonical.com wrote: >Jarod Wilsonwrote: > >>The network core tries to keep track of dropped packets, but some packets >>you wouldn't really call dropped, so much as intentionally ignored, under >>certain circumstances. One such case is that of bonding and team device >>slaves that are currently inactive. Their respective rx_handler functions >>return RX_HANDLER_EXACT (the only places in the kernel that return that), >>which ends up tracking into the network core's __netif_receive_skb_core() >>function's drop path, with no pt_prev set. On a noisy network, this can >>result in a very rapidly incrementing rx_dropped counter, not only on the >>inactive slave(s), but also on the master device, such as the following: >[...] >>In this scenario, p5p1, p5p2 and p7p1 are all inactive slaves in an >>active-backup bond0, and you can see that all three have high drop counts, >>with the master bond0 showing a tally of all three. >> >>I know that this was previously discussed some here: >> >>http://www.spinics.net/lists/netdev/msg226341.html >> >>It seems additional counters never came to fruition, but honestly, for >>this particular case, I'm not even sure they're warranted, I'd be inclined >>to say just silently drop these packets without incrementing a counter. At >>least, that's probably what would make someone who has complained loudly >>about this issue happy, as they have monitoring tools that are squaking >>loudly at any increments to rx_dropped. In this case, it is delivered with exact delivery according to per-dev registered callback. We just have to avoid it gets to bond. So this case is not "to drop", but rather "to block skb to don't get where it does not belong". > > I don't think the kernel should silently drop packets; there >should be a counter somewhere. If a packet is being thrown away >deliberately, it should not just vanish into the screaming void of >space. Someday someone will try and track down where that packet is >being dropped. > > I've had that same conversation with customers who insist on >accounting for every packet drop (from the "any drop is an error" >mindset), so I understand the issue. > > Thinking about the prior discussion, the rx_drop_inactive is >still a good idea, but I'd actually today get good use from a >"rx_drop_unforwardable" (or an equivalent but shorter name) counter that >counts every time a packet is dropped due to is_skb_forwardable() >returning false. __dev_forward_skb does this (and hits rx_dropped), as >does the bridge (and does not count it). > > -J > >>CC: "David S. Miller" >>CC: Eric Dumazet >>CC: Jiri Pirko >>CC: Daniel Borkmann >>CC: Tom Herbert >>CC: Jay Vosburgh >>CC: Veaceslav Falico >>CC: Andy Gospodarek >>CC: netdev@vger.kernel.org >>Signed-off-by: Jarod Wilson >>--- >> net/core/dev.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >>diff --git a/net/core/dev.c b/net/core/dev.c >>index 8cba3d8..1354c7b 100644 >>--- a/net/core/dev.c >>+++ b/net/core/dev.c >>@@ -4153,8 +4153,11 @@ ncls: >> else >> ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev); >> } else { >>+ if (deliver_exact) >>+ goto inactive; /* bond or team inactive slave */ >> drop: >> atomic_long_inc(>dev->rx_dropped); >>+inactive: >> kfree_skb(skb); >> /* Jamal, now you will not able to escape explaining >> * me how you were going to use this. :-) >>-- >>1.8.3.1 >> > >--- > -Jay Vosburgh, jay.vosbu...@canonical.com
Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
Fri, Jan 22, 2016 at 08:11:22PM CET, ja...@redhat.com wrote: >The network core tries to keep track of dropped packets, but some packets >you wouldn't really call dropped, so much as intentionally ignored, under >certain circumstances. One such case is that of bonding and team device >slaves that are currently inactive. Their respective rx_handler functions >return RX_HANDLER_EXACT (the only places in the kernel that return that), >which ends up tracking into the network core's __netif_receive_skb_core() >function's drop path, with no pt_prev set. On a noisy network, this can >result in a very rapidly incrementing rx_dropped counter, not only on the >inactive slave(s), but also on the master device, such as the following: > >Inter-| Receive| Transmit > face |bytespackets errs drop fifo frame compressed multicast|bytes > packets errs drop fifo colls carrier compressed > p7p1: 14783346 1404300 1404280 0 0 2040 680 > 8000 0 0 0 > p7p2: 14805198 140648000 0 0 20340 >0000 0 0 0 > bond0: 53365248 5327980 4211600 0 0115151 2040 > 24000 0 0 0 >lo:5420 54000 0 0 0 5420 > 54000 0 0 0 > p5p1: 19292195 1961970 1403680 0 0 56564 680 > 8000 0 0 0 > p5p2: 19289707 1961710 1403640 0 0 56547 680 > 8000 0 0 0 > em3: 20996626 158214000 0 0 3830 >0000 0 0 0 > em2: 14065122 138462000 0 0 3100 >0000 0 0 0 > em1: 14063162 138440000 0 0 3080 >0000 0 0 0 > em4: 21050830 158729000 0 0 38571662 > 469000 0 0 0 > ib0: 0 0000 0 0 00 > 0000 0 0 0 > >In this scenario, p5p1, p5p2 and p7p1 are all inactive slaves in an >active-backup bond0, and you can see that all three have high drop counts, >with the master bond0 showing a tally of all three. > >I know that this was previously discussed some here: > >http://www.spinics.net/lists/netdev/msg226341.html > >It seems additional counters never came to fruition, but honestly, for >this particular case, I'm not even sure they're warranted, I'd be inclined >to say just silently drop these packets without incrementing a counter. At >least, that's probably what would make someone who has complained loudly >about this issue happy, as they have monitoring tools that are squaking >loudly at any increments to rx_dropped. > >CC: "David S. Miller">CC: Eric Dumazet >CC: Jiri Pirko >CC: Daniel Borkmann >CC: Tom Herbert >CC: Jay Vosburgh >CC: Veaceslav Falico >CC: Andy Gospodarek >CC: netdev@vger.kernel.org >Signed-off-by: Jarod Wilson Acked-by: Jiri Pirko I think this should be considered as a bug and go to -net.
Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
On Fri, 2016-01-22 at 14:11 -0500, Jarod Wilson wrote: > --- > net/core/dev.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 8cba3d8..1354c7b 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -4153,8 +4153,11 @@ ncls: > else > ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev); > } else { > + if (deliver_exact) > + goto inactive; /* bond or team inactive slave */ > drop: > atomic_long_inc(>dev->rx_dropped); > +inactive: > kfree_skb(skb); > /* Jamal, now you will not able to escape explaining >* me how you were going to use this. :-) Note that if you still have a kfree_skb() instead of consume_skb(), some tools will still give you a wrong signal (packet dropped ...). But then maybe the signal is telling some truth. We receive a packet, and decide to drop it because no one was willing to handle it. Maybe someone wants to know a particular slave receives 10,000 such frames per second and hurts performance with useless work. We should at least increment some counter and maybe dump it with "ethtool -S" or something.
Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
On Fri, Jan 22, 2016 at 02:11:22PM -0500, Jarod Wilson wrote: > The network core tries to keep track of dropped packets, but some packets > you wouldn't really call dropped, so much as intentionally ignored, under > certain circumstances. One such case is that of bonding and team device > slaves that are currently inactive. Their respective rx_handler functions > return RX_HANDLER_EXACT (the only places in the kernel that return that), > which ends up tracking into the network core's __netif_receive_skb_core() > function's drop path, with no pt_prev set. On a noisy network, this can > result in a very rapidly incrementing rx_dropped counter, not only on the > inactive slave(s), but also on the master device, such as the following: > > Inter-| Receive| Transmit > face |bytespackets errs drop fifo frame compressed multicast|bytes > packets errs drop fifo colls carrier compressed > p7p1: 14783346 1404300 1404280 0 0 2040 680 > 8000 0 0 0 > p7p2: 14805198 140648000 0 0 20340 > 0000 0 0 0 > bond0: 53365248 5327980 4211600 0 0115151 2040 > 24000 0 0 0 > lo:5420 54000 0 0 0 5420 > 54000 0 0 0 > p5p1: 19292195 1961970 1403680 0 0 56564 680 > 8000 0 0 0 > p5p2: 19289707 1961710 1403640 0 0 56547 680 > 8000 0 0 0 >em3: 20996626 158214000 0 0 3830 > 0000 0 0 0 >em2: 14065122 138462000 0 0 3100 > 0000 0 0 0 >em1: 14063162 138440000 0 0 3080 > 0000 0 0 0 >em4: 21050830 158729000 0 0 38571662 > 469000 0 0 0 >ib0: 0 0000 0 0 00 >0000 0 0 0 > > In this scenario, p5p1, p5p2 and p7p1 are all inactive slaves in an > active-backup bond0, and you can see that all three have high drop counts, > with the master bond0 showing a tally of all three. > > I know that this was previously discussed some here: > > http://www.spinics.net/lists/netdev/msg226341.html > > It seems additional counters never came to fruition, but honestly, for > this particular case, I'm not even sure they're warranted, I'd be inclined > to say just silently drop these packets without incrementing a counter. At > least, that's probably what would make someone who has complained loudly > about this issue happy, as they have monitoring tools that are squaking > loudly at any increments to rx_dropped. I completely agree. > CC: "David S. Miller"> CC: Eric Dumazet > CC: Jiri Pirko > CC: Daniel Borkmann > CC: Tom Herbert > CC: Jay Vosburgh > CC: Veaceslav Falico > CC: Andy Gospodarek > CC: netdev@vger.kernel.org > Signed-off-by: Jarod Wilson Acked-by: Andy Gospodarek > --- > net/core/dev.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 8cba3d8..1354c7b 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -4153,8 +4153,11 @@ ncls: > else > ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev); > } else { > + if (deliver_exact) > + goto inactive; /* bond or team inactive slave */ > drop: > atomic_long_inc(>dev->rx_dropped); > +inactive: > kfree_skb(skb); > /* Jamal, now you will not able to escape explaining >* me how you were going to use this. :-) > -- > 1.8.3.1 >
Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves
Jarod Wilsonwrote: >The network core tries to keep track of dropped packets, but some packets >you wouldn't really call dropped, so much as intentionally ignored, under >certain circumstances. One such case is that of bonding and team device >slaves that are currently inactive. Their respective rx_handler functions >return RX_HANDLER_EXACT (the only places in the kernel that return that), >which ends up tracking into the network core's __netif_receive_skb_core() >function's drop path, with no pt_prev set. On a noisy network, this can >result in a very rapidly incrementing rx_dropped counter, not only on the >inactive slave(s), but also on the master device, such as the following: [...] >In this scenario, p5p1, p5p2 and p7p1 are all inactive slaves in an >active-backup bond0, and you can see that all three have high drop counts, >with the master bond0 showing a tally of all three. > >I know that this was previously discussed some here: > >http://www.spinics.net/lists/netdev/msg226341.html > >It seems additional counters never came to fruition, but honestly, for >this particular case, I'm not even sure they're warranted, I'd be inclined >to say just silently drop these packets without incrementing a counter. At >least, that's probably what would make someone who has complained loudly >about this issue happy, as they have monitoring tools that are squaking >loudly at any increments to rx_dropped. I don't think the kernel should silently drop packets; there should be a counter somewhere. If a packet is being thrown away deliberately, it should not just vanish into the screaming void of space. Someday someone will try and track down where that packet is being dropped. I've had that same conversation with customers who insist on accounting for every packet drop (from the "any drop is an error" mindset), so I understand the issue. Thinking about the prior discussion, the rx_drop_inactive is still a good idea, but I'd actually today get good use from a "rx_drop_unforwardable" (or an equivalent but shorter name) counter that counts every time a packet is dropped due to is_skb_forwardable() returning false. __dev_forward_skb does this (and hits rx_dropped), as does the bridge (and does not count it). -J >CC: "David S. Miller" >CC: Eric Dumazet >CC: Jiri Pirko >CC: Daniel Borkmann >CC: Tom Herbert >CC: Jay Vosburgh >CC: Veaceslav Falico >CC: Andy Gospodarek >CC: netdev@vger.kernel.org >Signed-off-by: Jarod Wilson >--- > net/core/dev.c | 3 +++ > 1 file changed, 3 insertions(+) > >diff --git a/net/core/dev.c b/net/core/dev.c >index 8cba3d8..1354c7b 100644 >--- a/net/core/dev.c >+++ b/net/core/dev.c >@@ -4153,8 +4153,11 @@ ncls: > else > ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev); > } else { >+ if (deliver_exact) >+ goto inactive; /* bond or team inactive slave */ > drop: > atomic_long_inc(>dev->rx_dropped); >+inactive: > kfree_skb(skb); > /* Jamal, now you will not able to escape explaining >* me how you were going to use this. :-) >-- >1.8.3.1 > --- -Jay Vosburgh, jay.vosbu...@canonical.com