Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves

2016-01-26 Thread Jarod Wilson
On Sat, Jan 23, 2016 at 07:23:09AM -0800, Eric Dumazet wrote:
> On Fri, 2016-01-22 at 14:11 -0500, Jarod Wilson wrote:
> 
> > ---
> >  net/core/dev.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 8cba3d8..1354c7b 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4153,8 +4153,11 @@ ncls:
> > else
> > ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
> > } else {
> > +   if (deliver_exact)
> > +   goto inactive; /* bond or team inactive slave */
> >  drop:
> > atomic_long_inc(>dev->rx_dropped);
> > +inactive:
> > kfree_skb(skb);
> > /* Jamal, now you will not able to escape explaining
> >  * me how you were going to use this. :-)
> 
> Note that if you still have a kfree_skb() instead of consume_skb(),
> some tools will still give you a wrong signal (packet dropped ...).
> 
> But then maybe the signal is telling some truth.
> 
> We receive a packet, and decide to drop it because no one was willing to
> handle it.
> 
> Maybe someone wants to know a particular slave receives 10,000 such
> frames per second and hurts performance with useless work.
> 
> We should at least increment some counter and maybe dump it with
> "ethtool -S" or something.

I've been digging into ethtool -S a little bit, and am somewhat at a loss
as to how I would wire into this. From what I've been able to figure out,
it's entirely device-specific-ish counters spit out. On my sfc cards, I
get rx_noskb_drops and rx_nodesc_drop_cnt output from ethtool -S, but for
the core network stack, these are actually added up and shoved into
rx_dropped, and no other network driver has those two individual counters.

By itself, rx_dropped isn't output directly anywhere from ethtool, 
so far as I can see. And ethtool -S bondX shows absolutely nothing.
*Should* ethtool -S be dumping all the network core stats? I have to say I
was more than a little surprised at this:

# ethtool -S bond0
no stats available

Particularly given that if I look in /proc/net/dev or
/sys/devices/virtual/net/bond0/statistics/*, there are quite a few stats
that are being tracked and make their way out to userspace...

-- 
Jarod Wilson
ja...@redhat.com



Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves

2016-01-26 Thread David Miller
From: Jarod Wilson 
Date: Tue, 26 Jan 2016 16:14:53 -0500

> # ethtool -S bond0
> no stats available

ethtool -S is for device specific stats.

Some drivers use this facility to provide per-RX-queue and per-TX-queue
versions of the existing core netdev stats.


Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves

2016-01-26 Thread Eric Dumazet
On Tue, Jan 26, 2016 at 1:14 PM, Jarod Wilson  wrote:
> On Sat, Jan 23, 2016 at 07:23:09AM -0800, Eric Dumazet wrote:
>> On Fri, 2016-01-22 at 14:11 -0500, Jarod Wilson wrote:
>>
>> > ---
>> >  net/core/dev.c | 3 +++
>> >  1 file changed, 3 insertions(+)
>> >
>> > diff --git a/net/core/dev.c b/net/core/dev.c
>> > index 8cba3d8..1354c7b 100644
>> > --- a/net/core/dev.c
>> > +++ b/net/core/dev.c
>> > @@ -4153,8 +4153,11 @@ ncls:
>> > else
>> > ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
>> > } else {
>> > +   if (deliver_exact)
>> > +   goto inactive; /* bond or team inactive slave */
>> >  drop:
>> > atomic_long_inc(>dev->rx_dropped);
>> > +inactive:
>> > kfree_skb(skb);
>> > /* Jamal, now you will not able to escape explaining
>> >  * me how you were going to use this. :-)
>>
>> Note that if you still have a kfree_skb() instead of consume_skb(),
>> some tools will still give you a wrong signal (packet dropped ...).
>>
>> But then maybe the signal is telling some truth.
>>
>> We receive a packet, and decide to drop it because no one was willing to
>> handle it.
>>
>> Maybe someone wants to know a particular slave receives 10,000 such
>> frames per second and hurts performance with useless work.
>>
>> We should at least increment some counter and maybe dump it with
>> "ethtool -S" or something.
>
> I've been digging into ethtool -S a little bit, and am somewhat at a loss
> as to how I would wire into this. From what I've been able to figure out,
> it's entirely device-specific-ish counters spit out. On my sfc cards, I
> get rx_noskb_drops and rx_nodesc_drop_cnt output from ethtool -S, but for
> the core network stack, these are actually added up and shoved into
> rx_dropped, and no other network driver has those two individual counters.
>
> By itself, rx_dropped isn't output directly anywhere from ethtool,
> so far as I can see. And ethtool -S bondX shows absolutely nothing.
> *Should* ethtool -S be dumping all the network core stats? I have to say I
> was more than a little surprised at this:

# ip -s -s link sh dev eth0
15: eth0:  mtu 1500 qdisc
pfifo_fast state DOWN mode DEFAULT group default qlen 1000
link/ether 3c:97:0e:be:91:7b brd ff:ff:ff:ff:ff:ff
RX: bytes  packets  errors  dropped overrun mcast
0  00   0   0   0
RX errors: length  crc frame   fifomissed
   00   0   0   0
TX: bytes  packets  errors  dropped carrier collsns
0  00   0   0   0
TX errors: aborted fifowindow  heartbeat
   00   0   0

So start with the following patch :

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index a30b78090594..7c762cf1e4d5 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -68,6 +68,8 @@ struct rtnl_link_stats64 {
/* for cslip etc */
__u64   rx_compressed;
__u64   tx_compressed;
+
+   __u64   rx_nohandler;   /* packet was of no interest */
 };

 /* The struct should be in sync with struct ifmap */


Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves

2016-01-26 Thread Jarod Wilson
On Tue, Jan 26, 2016 at 01:24:59PM -0800, Eric Dumazet wrote:
> On Tue, Jan 26, 2016 at 1:14 PM, Jarod Wilson  wrote:
> > On Sat, Jan 23, 2016 at 07:23:09AM -0800, Eric Dumazet wrote:
> >> On Fri, 2016-01-22 at 14:11 -0500, Jarod Wilson wrote:
> >>
> >> > ---
> >> >  net/core/dev.c | 3 +++
> >> >  1 file changed, 3 insertions(+)
> >> >
> >> > diff --git a/net/core/dev.c b/net/core/dev.c
> >> > index 8cba3d8..1354c7b 100644
> >> > --- a/net/core/dev.c
> >> > +++ b/net/core/dev.c
> >> > @@ -4153,8 +4153,11 @@ ncls:
> >> > else
> >> > ret = pt_prev->func(skb, skb->dev, pt_prev, 
> >> > orig_dev);
> >> > } else {
> >> > +   if (deliver_exact)
> >> > +   goto inactive; /* bond or team inactive slave */
> >> >  drop:
> >> > atomic_long_inc(>dev->rx_dropped);
> >> > +inactive:
> >> > kfree_skb(skb);
> >> > /* Jamal, now you will not able to escape explaining
> >> >  * me how you were going to use this. :-)
> >>
> >> Note that if you still have a kfree_skb() instead of consume_skb(),
> >> some tools will still give you a wrong signal (packet dropped ...).
> >>
> >> But then maybe the signal is telling some truth.
> >>
> >> We receive a packet, and decide to drop it because no one was willing to
> >> handle it.
> >>
> >> Maybe someone wants to know a particular slave receives 10,000 such
> >> frames per second and hurts performance with useless work.
> >>
> >> We should at least increment some counter and maybe dump it with
> >> "ethtool -S" or something.
> >
> > I've been digging into ethtool -S a little bit, and am somewhat at a loss
> > as to how I would wire into this. From what I've been able to figure out,
> > it's entirely device-specific-ish counters spit out. On my sfc cards, I
> > get rx_noskb_drops and rx_nodesc_drop_cnt output from ethtool -S, but for
> > the core network stack, these are actually added up and shoved into
> > rx_dropped, and no other network driver has those two individual counters.
> >
> > By itself, rx_dropped isn't output directly anywhere from ethtool,
> > so far as I can see. And ethtool -S bondX shows absolutely nothing.
> > *Should* ethtool -S be dumping all the network core stats? I have to say I
> > was more than a little surprised at this:
> 
> # ip -s -s link sh dev eth0
> 15: eth0:  mtu 1500 qdisc
> pfifo_fast state DOWN mode DEFAULT group default qlen 1000
> link/ether 3c:97:0e:be:91:7b brd ff:ff:ff:ff:ff:ff
> RX: bytes  packets  errors  dropped overrun mcast
> 0  00   0   0   0
> RX errors: length  crc frame   fifomissed
>00   0   0   0
> TX: bytes  packets  errors  dropped carrier collsns
> 0  00   0   0   0
> TX errors: aborted fifowindow  heartbeat
>00   0   0
> 
> So start with the following patch :
> 
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index a30b78090594..7c762cf1e4d5 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -68,6 +68,8 @@ struct rtnl_link_stats64 {
> /* for cslip etc */
> __u64   rx_compressed;
> __u64   tx_compressed;
> +
> +   __u64   rx_nohandler;   /* packet was of no interest */
>  };
> 
>  /* The struct should be in sync with struct ifmap */

I'm already well past that point, using rx_dropped_inactive as the stat
name though, based on the prior discussion. I can certainly convert that
over to rx_nohandler easily enough. It looks like adding a column to ip's
output there would be as simple as fetching the stat over netlink and
spitting it out.

-- 
Jarod Wilson
ja...@redhat.com



Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves

2016-01-26 Thread Jarod Wilson
On Tue, Jan 26, 2016 at 01:21:00PM -0800, David Miller wrote:
> From: Jarod Wilson 
> Date: Tue, 26 Jan 2016 16:14:53 -0500
> 
> > # ethtool -S bond0
> > no stats available
> 
> ethtool -S is for device specific stats.

Okay, good, that was what it looked like to me. Glad I'm not completely
lost here. :)

So this sort of output wouldn't belong there, it should show up in sysfs,
procfs, and be available to ip over netlink.

> Some drivers use this facility to provide per-RX-queue and per-TX-queue
> versions of the existing core netdev stats.

-- 
Jarod Wilson
ja...@redhat.com



Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves

2016-01-25 Thread Jarod Wilson
On Mon, Jan 25, 2016 at 09:27:20AM -0500, Jarod Wilson wrote:
> On Sun, Jan 24, 2016 at 10:42:22PM -0800, David Miller wrote:
> > From: Jarod Wilson 
> > Date: Fri, 22 Jan 2016 14:11:22 -0500
> > 
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index 8cba3d8..1354c7b 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -4153,8 +4153,11 @@ ncls:
> > >   else
> > >   ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
> > >   } else {
> > > + if (deliver_exact)
> > > + goto inactive; /* bond or team inactive slave */
> > >  drop:
> > >   atomic_long_inc(>dev->rx_dropped);
> > > +inactive:
> > >   kfree_skb(skb);
> > >   /* Jamal, now you will not able to escape explaining
> > >* me how you were going to use this. :-)
> > 
> > I agree that rx_dropped is not the correct stat to bump here, but
> > I'm totally against the event disappearing completely into thin
> > air.
> > 
> > You have to replace the rx_dropped bump with _something_.
> > 
> > The only reason this hasn't been "fixed" yet is that everyone is
> > too damn lazy to implement that "something".
> 
> Would you want to see all things that shouldn't increment rx_dropped come
> in one shot, along with the four or so other counters, as discussed in the
> prior thread, or can they be done piecemeal? To date, I'm really only
> familiar with this particular case, and could probably get something
> together this week. To address the rest, I'd have to poke around a bit
> more and see what there is to see and do.

Spent a while hacking around today, now have this, p7p1 and p5p2 are
the inactive slaves in the bond:

[root@dell-per720-06 ~]# cat /proc/net/dev
Inter-|   Receive   |  
Transmit
 face |bytespackets errs drop drop_i fifo frame compressed multicast|bytes  
  packets errs drop fifo colls carrier compressed
  p6p1:   16024 23800  00 0  0   521
0   0000 0   0  0
  p7p1: 1691386   1653700  165680 0  0   488
0   0000 0   0  0
  p7p2: 1709438   1671800  00 0  0   561
0   0000 0   0  0
 bond0: 6183056   6306500  331510 0  0 13964
24747 193000 0   0  0
  p4p1:   0   000  00 0  0 0
0   0000 0   0  0
  p4p2:   0   000  00 0  0 0
0   0000 0   0  0
lo:4928  5000  00 0  0 0 
4928  50000 0   0  0
  p5p1: 2259498   2340100  00 0  0  6740
24747 193000 0   0  0
  p5p2: 2232172   2312700  165830 0  0  6736
0   0000 0   0  0
   em4: 2347251   1822400  00 0  090 
4541  47000 0   0  0
   em2: 1590296   1606100  00 0  081
0   0000 0   0  0
   em1: 1590180   1606000  00 0  079
0   0000 0   0  0
   em3: 2343156   1820900  00 0  094
0   0000 0   0  0
[root@dell-per720-06 ~]# cat 
/sys/devices/virtual/net/bond0/statistics/rx_dropped_inactive
33181

Haven't yet thrown together anything for ethtool -S output as Eric had
suggested, but I'll dig into that tomorrow.

-- 
Jarod Wilson
ja...@redhat.com



Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves

2016-01-25 Thread Jarod Wilson
On Sun, Jan 24, 2016 at 10:42:22PM -0800, David Miller wrote:
> From: Jarod Wilson 
> Date: Fri, 22 Jan 2016 14:11:22 -0500
> 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 8cba3d8..1354c7b 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4153,8 +4153,11 @@ ncls:
> > else
> > ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
> > } else {
> > +   if (deliver_exact)
> > +   goto inactive; /* bond or team inactive slave */
> >  drop:
> > atomic_long_inc(>dev->rx_dropped);
> > +inactive:
> > kfree_skb(skb);
> > /* Jamal, now you will not able to escape explaining
> >  * me how you were going to use this. :-)
> > -- 
> > 1.8.3.1
> > 
> 
> I agree that rx_dropped is not the correct stat to bump here, but
> I'm totally against the event disappearing completely into thin
> air.
> 
> You have to replace the rx_dropped bump with _something_.
> 
> The only reason this hasn't been "fixed" yet is that everyone is
> too damn lazy to implement that "something".

Would you want to see all things that shouldn't increment rx_dropped come
in one shot, along with the four or so other counters, as discussed in the
prior thread, or can they be done piecemeal? To date, I'm really only
familiar with this particular case, and could probably get something
together this week. To address the rest, I'd have to poke around a bit
more and see what there is to see and do.

-- 
Jarod Wilson
ja...@redhat.com



Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves

2016-01-24 Thread David Miller
From: Jarod Wilson 
Date: Fri, 22 Jan 2016 14:11:22 -0500

> diff --git a/net/core/dev.c b/net/core/dev.c
> index 8cba3d8..1354c7b 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4153,8 +4153,11 @@ ncls:
>   else
>   ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
>   } else {
> + if (deliver_exact)
> + goto inactive; /* bond or team inactive slave */
>  drop:
>   atomic_long_inc(>dev->rx_dropped);
> +inactive:
>   kfree_skb(skb);
>   /* Jamal, now you will not able to escape explaining
>* me how you were going to use this. :-)
> -- 
> 1.8.3.1
> 

I agree that rx_dropped is not the correct stat to bump here, but
I'm totally against the event disappearing completely into thin
air.

You have to replace the rx_dropped bump with _something_.

The only reason this hasn't been "fixed" yet is that everyone is
too damn lazy to implement that "something".


Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves

2016-01-23 Thread Jiri Pirko
Fri, Jan 22, 2016 at 09:59:12PM CET, jay.vosbu...@canonical.com wrote:
>Jarod Wilson  wrote:
>
>>The network core tries to keep track of dropped packets, but some packets
>>you wouldn't really call dropped, so much as intentionally ignored, under
>>certain circumstances. One such case is that of bonding and team device
>>slaves that are currently inactive. Their respective rx_handler functions
>>return RX_HANDLER_EXACT (the only places in the kernel that return that),
>>which ends up tracking into the network core's __netif_receive_skb_core()
>>function's drop path, with no pt_prev set. On a noisy network, this can
>>result in a very rapidly incrementing rx_dropped counter, not only on the
>>inactive slave(s), but also on the master device, such as the following:
>[...]
>>In this scenario, p5p1, p5p2 and p7p1 are all inactive slaves in an
>>active-backup bond0, and you can see that all three have high drop counts,
>>with the master bond0 showing a tally of all three.
>>
>>I know that this was previously discussed some here:
>>
>>http://www.spinics.net/lists/netdev/msg226341.html
>>
>>It seems additional counters never came to fruition, but honestly, for
>>this particular case, I'm not even sure they're warranted, I'd be inclined
>>to say just silently drop these packets without incrementing a counter. At
>>least, that's probably what would make someone who has complained loudly
>>about this issue happy, as they have monitoring tools that are squaking
>>loudly at any increments to rx_dropped.

In this case, it is delivered with exact delivery according to per-dev
registered callback. We just have to avoid it gets to bond. So this case
is not "to drop", but rather "to block skb to don't get where it does
not belong".

>
>   I don't think the kernel should silently drop packets; there
>should be a counter somewhere.  If a packet is being thrown away
>deliberately, it should not just vanish into the screaming void of
>space.  Someday someone will try and track down where that packet is
>being dropped.
>
>   I've had that same conversation with customers who insist on
>accounting for every packet drop (from the "any drop is an error"
>mindset), so I understand the issue.
>
>   Thinking about the prior discussion, the rx_drop_inactive is
>still a good idea, but I'd actually today get good use from a
>"rx_drop_unforwardable" (or an equivalent but shorter name) counter that
>counts every time a packet is dropped due to is_skb_forwardable()
>returning false.  __dev_forward_skb does this (and hits rx_dropped), as
>does the bridge (and does not count it).
>
>   -J
>
>>CC: "David S. Miller" 
>>CC: Eric Dumazet 
>>CC: Jiri Pirko 
>>CC: Daniel Borkmann 
>>CC: Tom Herbert 
>>CC: Jay Vosburgh 
>>CC: Veaceslav Falico 
>>CC: Andy Gospodarek 
>>CC: netdev@vger.kernel.org
>>Signed-off-by: Jarod Wilson 
>>---
>> net/core/dev.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>>diff --git a/net/core/dev.c b/net/core/dev.c
>>index 8cba3d8..1354c7b 100644
>>--- a/net/core/dev.c
>>+++ b/net/core/dev.c
>>@@ -4153,8 +4153,11 @@ ncls:
>>  else
>>  ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
>>  } else {
>>+ if (deliver_exact)
>>+ goto inactive; /* bond or team inactive slave */
>> drop:
>>  atomic_long_inc(>dev->rx_dropped);
>>+inactive:
>>  kfree_skb(skb);
>>  /* Jamal, now you will not able to escape explaining
>>   * me how you were going to use this. :-)
>>-- 
>>1.8.3.1
>>
>
>---
>   -Jay Vosburgh, jay.vosbu...@canonical.com


Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves

2016-01-23 Thread Jiri Pirko
Fri, Jan 22, 2016 at 08:11:22PM CET, ja...@redhat.com wrote:
>The network core tries to keep track of dropped packets, but some packets
>you wouldn't really call dropped, so much as intentionally ignored, under
>certain circumstances. One such case is that of bonding and team device
>slaves that are currently inactive. Their respective rx_handler functions
>return RX_HANDLER_EXACT (the only places in the kernel that return that),
>which ends up tracking into the network core's __netif_receive_skb_core()
>function's drop path, with no pt_prev set. On a noisy network, this can
>result in a very rapidly incrementing rx_dropped counter, not only on the
>inactive slave(s), but also on the master device, such as the following:
>
>Inter-|   Receive|  Transmit
> face |bytespackets errs drop fifo frame compressed multicast|bytes
> packets errs drop fifo colls carrier compressed
>  p7p1: 14783346  1404300 1404280 0  0  2040  680  
>  8000 0   0  0
>  p7p2: 14805198  140648000 0  0  20340
>0000 0   0  0
> bond0: 53365248  5327980 4211600 0  0115151 2040  
> 24000 0   0  0
>lo:5420  54000 0  0 0 5420 
>  54000 0   0  0
>  p5p1: 19292195  1961970 1403680 0  0 56564  680  
>  8000 0   0  0
>  p5p2: 19289707  1961710 1403640 0  0 56547  680  
>  8000 0   0  0
>   em3: 20996626  158214000 0  0   3830
>0000 0   0  0
>   em2: 14065122  138462000 0  0   3100
>0000 0   0  0
>   em1: 14063162  138440000 0  0   3080
>0000 0   0  0
>   em4: 21050830  158729000 0  0   38571662
>  469000 0   0  0
>   ib0:   0   0000 0  0 00 
>   0000 0   0  0
>
>In this scenario, p5p1, p5p2 and p7p1 are all inactive slaves in an
>active-backup bond0, and you can see that all three have high drop counts,
>with the master bond0 showing a tally of all three.
>
>I know that this was previously discussed some here:
>
>http://www.spinics.net/lists/netdev/msg226341.html
>
>It seems additional counters never came to fruition, but honestly, for
>this particular case, I'm not even sure they're warranted, I'd be inclined
>to say just silently drop these packets without incrementing a counter. At
>least, that's probably what would make someone who has complained loudly
>about this issue happy, as they have monitoring tools that are squaking
>loudly at any increments to rx_dropped.
>
>CC: "David S. Miller" 
>CC: Eric Dumazet 
>CC: Jiri Pirko 
>CC: Daniel Borkmann 
>CC: Tom Herbert 
>CC: Jay Vosburgh 
>CC: Veaceslav Falico 
>CC: Andy Gospodarek 
>CC: netdev@vger.kernel.org
>Signed-off-by: Jarod Wilson 

Acked-by: Jiri Pirko 

I think this should be considered as a bug and go to -net.


Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves

2016-01-23 Thread Eric Dumazet
On Fri, 2016-01-22 at 14:11 -0500, Jarod Wilson wrote:

> ---
>  net/core/dev.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 8cba3d8..1354c7b 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4153,8 +4153,11 @@ ncls:
>   else
>   ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
>   } else {
> + if (deliver_exact)
> + goto inactive; /* bond or team inactive slave */
>  drop:
>   atomic_long_inc(>dev->rx_dropped);
> +inactive:
>   kfree_skb(skb);
>   /* Jamal, now you will not able to escape explaining
>* me how you were going to use this. :-)

Note that if you still have a kfree_skb() instead of consume_skb(),
some tools will still give you a wrong signal (packet dropped ...).

But then maybe the signal is telling some truth.

We receive a packet, and decide to drop it because no one was willing to
handle it.

Maybe someone wants to know a particular slave receives 10,000 such
frames per second and hurts performance with useless work.

We should at least increment some counter and maybe dump it with
"ethtool -S" or something.





Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves

2016-01-23 Thread Andy Gospodarek
On Fri, Jan 22, 2016 at 02:11:22PM -0500, Jarod Wilson wrote:
> The network core tries to keep track of dropped packets, but some packets
> you wouldn't really call dropped, so much as intentionally ignored, under
> certain circumstances. One such case is that of bonding and team device
> slaves that are currently inactive. Their respective rx_handler functions
> return RX_HANDLER_EXACT (the only places in the kernel that return that),
> which ends up tracking into the network core's __netif_receive_skb_core()
> function's drop path, with no pt_prev set. On a noisy network, this can
> result in a very rapidly incrementing rx_dropped counter, not only on the
> inactive slave(s), but also on the master device, such as the following:
> 
> Inter-|   Receive|  Transmit
>  face |bytespackets errs drop fifo frame compressed multicast|bytes
> packets errs drop fifo colls carrier compressed
>   p7p1: 14783346  1404300 1404280 0  0  2040  680 
>   8000 0   0  0
>   p7p2: 14805198  140648000 0  0  20340   
> 0000 0   0  0
>  bond0: 53365248  5327980 4211600 0  0115151 2040 
>  24000 0   0  0
> lo:5420  54000 0  0 0 5420
>   54000 0   0  0
>   p5p1: 19292195  1961970 1403680 0  0 56564  680 
>   8000 0   0  0
>   p5p2: 19289707  1961710 1403640 0  0 56547  680 
>   8000 0   0  0
>em3: 20996626  158214000 0  0   3830   
> 0000 0   0  0
>em2: 14065122  138462000 0  0   3100   
> 0000 0   0  0
>em1: 14063162  138440000 0  0   3080   
> 0000 0   0  0
>em4: 21050830  158729000 0  0   38571662   
>   469000 0   0  0
>ib0:   0   0000 0  0 00
>0000 0   0  0
> 
> In this scenario, p5p1, p5p2 and p7p1 are all inactive slaves in an
> active-backup bond0, and you can see that all three have high drop counts,
> with the master bond0 showing a tally of all three.
> 
> I know that this was previously discussed some here:
> 
> http://www.spinics.net/lists/netdev/msg226341.html
> 
> It seems additional counters never came to fruition, but honestly, for
> this particular case, I'm not even sure they're warranted, I'd be inclined
> to say just silently drop these packets without incrementing a counter. At
> least, that's probably what would make someone who has complained loudly
> about this issue happy, as they have monitoring tools that are squaking
> loudly at any increments to rx_dropped.

I completely agree.

> CC: "David S. Miller" 
> CC: Eric Dumazet 
> CC: Jiri Pirko 
> CC: Daniel Borkmann 
> CC: Tom Herbert 
> CC: Jay Vosburgh 
> CC: Veaceslav Falico 
> CC: Andy Gospodarek 
> CC: netdev@vger.kernel.org
> Signed-off-by: Jarod Wilson 

Acked-by: Andy Gospodarek 

> ---
>  net/core/dev.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 8cba3d8..1354c7b 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4153,8 +4153,11 @@ ncls:
>   else
>   ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
>   } else {
> + if (deliver_exact)
> + goto inactive; /* bond or team inactive slave */
>  drop:
>   atomic_long_inc(>dev->rx_dropped);
> +inactive:
>   kfree_skb(skb);
>   /* Jamal, now you will not able to escape explaining
>* me how you were going to use this. :-)
> -- 
> 1.8.3.1
> 


Re: [RFC PATCH net] net/core: don't increment rx_dropped on inactive slaves

2016-01-22 Thread Jay Vosburgh
Jarod Wilson  wrote:

>The network core tries to keep track of dropped packets, but some packets
>you wouldn't really call dropped, so much as intentionally ignored, under
>certain circumstances. One such case is that of bonding and team device
>slaves that are currently inactive. Their respective rx_handler functions
>return RX_HANDLER_EXACT (the only places in the kernel that return that),
>which ends up tracking into the network core's __netif_receive_skb_core()
>function's drop path, with no pt_prev set. On a noisy network, this can
>result in a very rapidly incrementing rx_dropped counter, not only on the
>inactive slave(s), but also on the master device, such as the following:
[...]
>In this scenario, p5p1, p5p2 and p7p1 are all inactive slaves in an
>active-backup bond0, and you can see that all three have high drop counts,
>with the master bond0 showing a tally of all three.
>
>I know that this was previously discussed some here:
>
>http://www.spinics.net/lists/netdev/msg226341.html
>
>It seems additional counters never came to fruition, but honestly, for
>this particular case, I'm not even sure they're warranted, I'd be inclined
>to say just silently drop these packets without incrementing a counter. At
>least, that's probably what would make someone who has complained loudly
>about this issue happy, as they have monitoring tools that are squaking
>loudly at any increments to rx_dropped.

I don't think the kernel should silently drop packets; there
should be a counter somewhere.  If a packet is being thrown away
deliberately, it should not just vanish into the screaming void of
space.  Someday someone will try and track down where that packet is
being dropped.

I've had that same conversation with customers who insist on
accounting for every packet drop (from the "any drop is an error"
mindset), so I understand the issue.

Thinking about the prior discussion, the rx_drop_inactive is
still a good idea, but I'd actually today get good use from a
"rx_drop_unforwardable" (or an equivalent but shorter name) counter that
counts every time a packet is dropped due to is_skb_forwardable()
returning false.  __dev_forward_skb does this (and hits rx_dropped), as
does the bridge (and does not count it).

-J

>CC: "David S. Miller" 
>CC: Eric Dumazet 
>CC: Jiri Pirko 
>CC: Daniel Borkmann 
>CC: Tom Herbert 
>CC: Jay Vosburgh 
>CC: Veaceslav Falico 
>CC: Andy Gospodarek 
>CC: netdev@vger.kernel.org
>Signed-off-by: Jarod Wilson 
>---
> net/core/dev.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/net/core/dev.c b/net/core/dev.c
>index 8cba3d8..1354c7b 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -4153,8 +4153,11 @@ ncls:
>   else
>   ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
>   } else {
>+  if (deliver_exact)
>+  goto inactive; /* bond or team inactive slave */
> drop:
>   atomic_long_inc(>dev->rx_dropped);
>+inactive:
>   kfree_skb(skb);
>   /* Jamal, now you will not able to escape explaining
>* me how you were going to use this. :-)
>-- 
>1.8.3.1
>

---
-Jay Vosburgh, jay.vosbu...@canonical.com