date:20150702

Re: [PATCH] ipv6: Make MLD packets to only be processed locally.

2015-07-02 Thread David Miller


Your patch is severely corrupted by your email client, for example
TAB characters have been transformed into sequences of SPACE
characters.

Please read Documentation/email-clients.txt, fix your setup, and
then send a test patch to yourself and only when you can successfully
apply that patch you receive should you try resubmitting it again
here.

Thank you.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PULL] virtio/vhost: cross endian support

2015-07-02 Thread Michael S. Tsirkin

On Wed, Jul 01, 2015 at 12:02:50PM -0700, Linus Torvalds wrote:
 On Wed, Jul 1, 2015 at 2:31 AM, Michael S. Tsirkin m...@redhat.com wrote:
  virtio/vhost: cross endian support
 
 Ugh. Does this really have to be dynamic?
 
 Can't virtio do the sane thing, and just use a _fixed_ endianness?
 
 Doing a unconditional byte swap is faster and simpler than the crazy
 conditionals. That's true regardless of endianness, but gets to be
 even more so if the fixed endianness is little-endian, since BE is
 not-so-slowly fading from the world.
 
Linus

Yea, well - support for legacy BE guests on the new LE hosts is
exactly the motivation for this.

I dislike it too, but there are two redeeming properties that
made me merge this:

1.  It's a trivial amount of code: since we wrap host/guest accesses
anyway, almost all of it is well hidden from drivers.

2.  Sane platforms would never set flags like VHOST_CROSS_ENDIAN_LEGACY -
and when it's clear, there's zero overhead (at some point it was
tested by compiling with and without the patches, got the same
stripped binary).
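To make the trade-off concrete, here is a minimal userspace sketch (not the kernel's code) that contrasts a fixed little-endian accessor with a conditional one that honours a per-device flag. The function and macro names are hypothetical, and the kernel's real virtio accessors differ in detail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host byte order, detected at compile time via GCC/Clang predefined
 * macros; treated as little-endian if the macros are unavailable. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define HOST_IS_LE false
#else
#define HOST_IS_LE true
#endif

static inline uint16_t swap16(uint16_t x)
{
	return (uint16_t)((x << 8) | (x >> 8));
}

/* Fixed little-endian layout: a no-op on an LE host and one
 * unconditional swap on a BE host. No run-time test of device state. */
static inline uint16_t fixed_le16_to_cpu(uint16_t raw)
{
	return HOST_IS_LE ? raw : swap16(raw);
}

/* Dynamic layout (hypothetical): a legacy device may use big-endian
 * fields, so every access branches on a per-device flag. */
static inline uint16_t dyn16_to_cpu(bool dev_is_le, uint16_t raw)
{
	if (dev_is_le)
		return fixed_le16_to_cpu(raw);
	return HOST_IS_LE ? swap16(raw) : raw;
}
```

When a build has no cross-endian support and passes a constant `true` for `dev_is_le`, the compiler can typically fold `dyn16_to_cpu()` into `fixed_le16_to_cpu()` after inlining. That is roughly the "zero overhead when the flag is clear" property described in point 2, although the mechanism in the actual patches may differ.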

Maybe we could create a Kconfig symbol to enforce point (2): prevent
people from enabling it e.g. on x86. I will look into this - but it can
be done by a patch on top, so I think this can be merged as is.

Or do you know of someone using a kernel with all config options enabled
indiscriminately?

Thanks,

-- 
MST

Re: Packet capturing performance

2015-07-02 Thread yzhu1


Hi,

You can use netfilter to mirror the packets to another NIC, and then
capture the cloned packets on that NIC.

It does not affect the performance. Believe me, I have tested it.

Zhu Yanjun

On 05/20/2015 09:13 PM, Deniz Eren wrote:

Hi,

I'm having a problem with packet capturing performance on my Linux server.

I am using an Intel ixgbe 10G NIC with the v3.19.1 driver on a Linux
3.15.9 based system. Normally I can route 3.8Mpps of traffic with
spoofed (random) source addresses.

Whenever I run netsniff-ng in silent mode to capture packets on the
interface, the capturing performance drops at the same time to
~1.2Mpps. I am measuring pps by watching the changes in
/sys/class/net/interface_name/statistics/rx_packets (instead of using
tcpstat etc.), so the measurement itself cannot affect the performance.
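Here is a minimal sketch of the sysfs-based measurement described above. It reads the rx_packets counter twice and divides the delta by the interval. Only the sysfs path comes from the message. The helper names are hypothetical, and because `sleep(1)` is used, the interval is approximate.

```c
#include <stdio.h>
#include <unistd.h>

/* Read one unsigned counter (e.g. statistics/rx_packets) from a file.
 * Returns 0 on success, -1 on failure. */
static int read_counter(const char *path, unsigned long long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%llu", out) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/* Rate between two counter samples taken interval_s seconds apart. */
static double rate_per_sec(unsigned long long before,
			   unsigned long long after, double interval_s)
{
	return interval_s > 0 ? (double)(after - before) / interval_s : 0.0;
}

/* Sample /sys/class/net/<ifname>/statistics/rx_packets roughly one second
 * apart. Returns packets per second, or a negative value on error. */
static double sample_rx_pps(const char *ifname)
{
	char path[256];
	unsigned long long a, b;

	snprintf(path, sizeof(path),
		 "/sys/class/net/%s/statistics/rx_packets", ifname);
	if (read_counter(path, &a) != 0)
		return -1.0;
	sleep(1);
	if (read_counter(path, &b) != 0)
		return -1.0;
	return rate_per_sec(a, b, 1.0);
}
```

Reading the counter file costs one small read per sample, so this stays out of the packet path, which is the point of measuring this way.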

My first theory was that BPF is the cause of this slowdown. When I tried
to analyze this bottleneck, I saw that the BPF filter affects the
slowdown ratio. When I narrow the filter to match 1/16 of the traffic
(for example: src net 16.0.0.0/4), the capturing performance stays at
~3.7Mpps. When I start 16 netsniff-ng processes (each one handling 1/16
of the entire traffic) with different filters, the performance stays at
~3.0Mpps, and the union of the 16 filters equals 0.0.0.0/0
(0.0.0.0/4 + 16.0.0.0/4 + 32.0.0.0/4 + ... + 240.0.0.0/4 = 0.0.0.0/0).
In other words, I think the performance of the network stack slows down
dramatically once too many packets match a single BPF filter.
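The split into 16 filters works because the top four bits of an address determine its /4 prefix. The 16 prefixes 0.0.0.0/4, 16.0.0.0/4, ..., 240.0.0.0/4 therefore partition the whole IPv4 space. Here is a small sketch of that arithmetic. The helper names are hypothetical, and addresses are in host byte order.

```c
#include <stdint.h>
#include <stdio.h>

/* Index (0..15) of the /4 prefix containing an IPv4 address given in
 * host byte order: simply its top four bits. */
static unsigned int slash4_bucket(uint32_t addr)
{
	return addr >> 28;
}

/* Format the capture filter for bucket i, e.g. "src net 16.0.0.0/4" for
 * i == 1. The first octet is always a multiple of 16 (0, 16, ..., 240). */
static void slash4_filter(unsigned int i, char *buf, size_t len)
{
	snprintf(buf, len, "src net %u.0.0.0/4", (i & 0xFu) * 16u);
}
```

With uniformly random spoofed sources, each bucket receives about 1/16 of the packets, which matches the ratio used in the experiment above.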

But after some investigation and some advice from more experienced
people, the problem seems to be the PF_PACKET socket overhead. I don't
know exactly where the bottleneck is, though. Do you have any idea where
it could be?

Since I am using netfilter a lot, kernel bypass is not an option for me.

To solve this problem I have two options for now:

- The first one is experimenting with socket fanout and adapting my tools
to use it.
- The second one is somewhat similar: open more than one (e.g. 16)
MMAP'ed sockets with different filters, each matching a different part
of the traffic, in a single netsniff-ng process. But this one is too
hacky and requires user-space modifications.
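For the first option, each capture process opens its own PF_PACKET socket and joins the same fanout group. The kernel then distributes packets among the members by flow hash, so no single socket has to receive everything. Here is a minimal sketch. The group id and helper names are hypothetical, error handling is trimmed, and the mmap ring setup that netsniff-ng uses is omitted.

```c
#include <arpa/inet.h>       /* htons */
#include <linux/if_ether.h>  /* ETH_P_ALL */
#include <linux/if_packet.h> /* struct sockaddr_ll, PACKET_FANOUT* */
#include <net/if.h>          /* if_nametoindex */
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* PACKET_FANOUT option value: group id in the low 16 bits,
 * fanout mode (and flags) in the high 16 bits. */
static int fanout_arg(unsigned int group_id, unsigned int mode)
{
	return (int)((group_id & 0xffffu) | (mode << 16));
}

/* Open an AF_PACKET socket bound to ifname and join fanout group
 * group_id in hash mode. Every capture process calls this with the same
 * group_id. Needs CAP_NET_RAW. Returns the fd, or -1 on error. */
static int open_fanout_socket(const char *ifname, unsigned int group_id)
{
	struct sockaddr_ll sll;
	int fd, arg;

	fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
	if (fd < 0)
		return -1;

	memset(&sll, 0, sizeof(sll));
	sll.sll_family = AF_PACKET;
	sll.sll_protocol = htons(ETH_P_ALL);
	sll.sll_ifindex = (int)if_nametoindex(ifname);
	if (sll.sll_ifindex == 0 ||
	    bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0)
		goto fail;

	arg = fanout_arg(group_id, PACKET_FANOUT_HASH);
	if (setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg)) < 0)
		goto fail;
	return fd;
fail:
	close(fd);
	return -1;
}
```

PACKET_FANOUT_HASH keeps each flow on one socket. PACKET_FANOUT_CPU or PACKET_FANOUT_LB are alternatives if per-flow ordering does not matter for the capture.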

But I want to ask: is there a better solution to this problem? Am I
missing some network tuning on Linux or on my Ethernet device?

Thanks in advance,

Re: [PATCH] ax88179_178a: add reset functionality in reset_resume

2015-07-02 Thread David Miller


Emails encoded in HTML will not make it to the mailing list, and also
will not be read by me.

You have some serious email posting issues, that you will need to
sort out on your own before contributing changes.  Meanwhile perhaps
you can find a colleague who can submit patches properly to the list.

Re: [PULL] virtio/vhost: cross endian support

2015-07-02 Thread Michael S. Tsirkin

On Wed, Jul 01, 2015 at 12:03:59PM -0700, Linus Torvalds wrote:
 On Wed, Jul 1, 2015 at 12:02 PM, Linus Torvalds
 torva...@linux-foundation.org wrote:
 
  Doing a unconditional byte swap is faster and simpler than the crazy
  conditionals.
 
 Unconditional endianness not only makes for simpler and faster code,
 it also ends up being easier to debug and add things like type
 annotations for sparse.
 
 Linus

At least this last one is well covered by these patches: this uses
separate sparse types so all accesses are statically verified by sparse
to use the correct accessor.
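The idea behind "separate sparse types" can be shown with a self-contained sketch. Under sparse, a `bitwise` typedef becomes a restricted type, so assigning it to or from a plain integer without a `force` cast is reported. Ordinary compilers just see `uint16_t`. The macro, type and function names below are hypothetical stand-ins for the kernel's own annotations and accessors.

```c
#include <stdint.h>
#include <string.h>

/* With sparse (__CHECKER__ defined), "bitwise" makes the typedef a distinct
 * restricted type and "force" marks an intentional cast. */
#ifdef __CHECKER__
#define SK_BITWISE __attribute__((bitwise))
#define SK_FORCE   __attribute__((force))
#else
#define SK_BITWISE
#define SK_FORCE
#endif

typedef uint16_t SK_BITWISE vio16_sketch;

/* The only sanctioned conversions: accessors that cast with SK_FORCE. */
static inline vio16_sketch cpu_to_vio16_sketch(int little_endian, uint16_t v)
{
	unsigned char b[2];
	uint16_t raw;

	b[little_endian ? 0 : 1] = (unsigned char)(v & 0xff);
	b[little_endian ? 1 : 0] = (unsigned char)(v >> 8);
	memcpy(&raw, b, sizeof(raw));
	return (SK_FORCE vio16_sketch)raw;
}

static inline uint16_t vio16_to_cpu_sketch(int little_endian, vio16_sketch x)
{
	unsigned char b[2];
	uint16_t raw = (SK_FORCE uint16_t)x;

	memcpy(b, &raw, sizeof(b));
	return little_endian ? (uint16_t)(b[0] | (b[1] << 8))
			     : (uint16_t)((b[0] << 8) | b[1]);
}
```

If sparse is run on this (for example via `make C=1` in a kernel tree), a line such as `uint16_t y = x;` with `x` of type `vio16_sketch` would be flagged. That is the kind of static check referred to above.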

-- 
MST

pull request: bluetooth 2015-07-02

2015-07-02 Thread Johan Hedberg

Hi Dave,

A couple of regressions crept in because of a patch to use proper list
APIs rather than manually reading & writing the next/prev pointers
(commit 835a6a2f8603237a3e6cded5a6765090ecb06ea5). Turns out this was
masking a few bugs: a missing INIT_LIST_HEAD() call and incorrectly
using list_del() rather than list_del_init(). The two patches in this
set fix these, and it'd be nice if they could still make it to 4.2-rc1 to
avoid new bug reports from users.
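This userspace imitation of a circular doubly linked list shows why the two calls differ. It is not the kernel's implementation, and the names are hypothetical. After a list_del()-style removal, the node's pointers are unusable, so any later emptiness check or second removal is a bug. After list_del_init(), the node is a valid empty list again. A node that was never initialised (the missing INIT_LIST_HEAD() case) holds garbage pointers from the start, so it cannot be demonstrated safely.

```c
#include <stdbool.h>
#include <stddef.h>

struct list_node {
	struct list_node *next, *prev;
};

static void init_list_head(struct list_node *h) { h->next = h; h->prev = h; }

static bool list_is_empty(const struct list_node *h) { return h->next == h; }

/* Insert n right after head. */
static void list_add_after(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void unlink_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Like list_del(): the node's own pointers are invalidated (the kernel
 * poisons them; NULL here), so using the node afterwards is a bug. */
static void del_entry(struct list_node *n)
{
	unlink_node(n);
	n->next = NULL;
	n->prev = NULL;
}

/* Like list_del_init(): the node becomes an empty list, so
 * list_is_empty(n) is true and a second removal is harmless. */
static void del_entry_init(struct list_node *n)
{
	unlink_node(n);
	init_list_head(n);
}
```

Code that later asks "is this entry still on a list?" or may remove it twice needs the second variant. That is presumably what the hidp/l2cap fixes restore.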

Please let me know if there are any issues pulling. Thanks.

Johan

--
The following changes since commit 011cb197a84ed547c2b6b12a86adbeec1be0fdaf:

  Merge branch 'bnx2x' (2015-06-25 06:30:43 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth.git for-upstream

for you to fetch changes up to ab944c83f6690df0c7f67e6bcc29fc0c82ef6021:

  Bluetooth: Reinitialize the list after deletion for session user list (2015-06-30 21:46:19 +0200)


Tedd Ho-Jeong An (2):
  Bluetooth: hidp: Initialize list header of hidp session user
  Bluetooth: Reinitialize the list after deletion for session user list

 net/bluetooth/hidp/core.c  | 1 +
 net/bluetooth/l2cap_core.c | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)



Re: [RFC] virtio_net: Adding tx_timeout function.

2015-07-02 Thread Pankaj Gupta


 On Wed, Jun 24, 2015 at 10:31:09PM -0300, Julio Faracco wrote:
  2015-06-24 3:10 GMT-03:00 Michael S. Tsirkin m...@redhat.com:
  
   On Tue, Jun 23, 2015 at 10:44:29PM -0300, Julio Faracco wrote:
The virtio_net paravirtualized driver does not have a tx_timeout() function
to guarantee that the driver will recover properly after a timeout
during the transmission of a packet. This patch adds this feature and
throws a timeout exception after 5 HZ. Considering some tests, this is
the best time to use here.
   
Signed-off-by: Julio Faracco jcfara...@gmail.com
Cc: Jason Wang jasow...@redhat.com
  
   Looks like a bunch of locks and flushes are missing in this patch.  IMHO
   that's just too painful with current hardware.  IMO the right thing to
   do here is to add ability to reset specific queues to hardware.
  
  
  I agree, Michael. This model is the default one resetting the device
  due to transmission timeout.
  To have a better performance, only some queues must be reset.
 
 It's not a question of performance. You would need to write
 a bunch of code anyway. Why not do it in the hypervisor
 so guest can simply write into a register and reset
 a ring?
 
 
 BTW now that I think about it, this requires Jason's
 patches that introduce the tx interrupt, otherwise
 packet will timeout simply because no packets are sent.

I am trying to understand how the TX interrupt patches will help
here. This function will be called when the driver fails to send
packets. Even before the TX interrupt patches, packets are flowing.

Is my understanding wrong somewhere?
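Whether ndo_tx_timeout() is ever called is decided by the stack's TX watchdog, not by the driver. The sketch below is a simplified model that assumes the watchdog behaves like dev_watchdog() in net/sched/sch_generic.c. In that model, a timeout is reported only for a TX queue that the driver has stopped and whose last transmit start is older than dev->watchdog_timeo. The helper names are hypothetical.

```c
#include <stdbool.h>

/* Wraparound-safe "a is after b" for a free-running tick counter, in the
 * style of the kernel's time_after() macro. */
static bool ticks_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Simplified watchdog rule (assumption): fire only if the queue is
 * stopped AND more than watchdog_timeo ticks have passed since the last
 * transmit start. */
static bool tx_watchdog_fires(bool queue_stopped, unsigned long now,
			      unsigned long trans_start,
			      unsigned long watchdog_timeo)
{
	return queue_stopped && ticks_after(now, trans_start + watchdog_timeo);
}
```

With `watchdog_timeo = 5 * HZ`, as in the patch, the callback would run once a stopped queue has made no progress for more than five seconds.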

 
 
---
 drivers/net/virtio_net.c |   69 +-
 1 file changed, 68 insertions(+), 1 deletion(-)
   
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 63c7810..75ac45c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -135,6 +135,9 @@ struct virtnet_info {
  /* Work struct for config space updates */
  struct work_struct config_work;
   
+ /* Work struct for resetting the virtio-net driver. */
+ struct work_struct reset_task;
+
  /* Does the affinity hint is set for virtqueues? */
  bool affinity_hint_set;
   
@@ -1394,6 +1397,18 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
  return 0;
 }
   
+static void virtnet_tx_timeout(struct net_device *dev)
+{
+ struct virtnet_info *vi = netdev_priv(dev);
+
+ dev_warn(&dev->dev, "TX Timeout exception with latency: %ld\n",
+  jiffies - dev_trans_start(dev));
+
+ schedule_work(&vi->reset_task);
  
   What if after this triggers user does something
   to the device (e.g. attempts to remove it)?
   Or if a packet is transmitted or used?
  
  At some point, this work must be canceled.
  Yes, you are right. Specially, when the driver is being removed.
  
+}
+
+static void virtnet_reset_task(struct work_struct *work);
+
 static const struct net_device_ops virtnet_netdev = {
  .ndo_open= virtnet_open,
  .ndo_stop= virtnet_close,
@@ -1405,6 +1420,7 @@ static const struct net_device_ops virtnet_netdev = {
  .ndo_get_stats64 = virtnet_stats,
  .ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
  .ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
+ .ndo_tx_timeout  = virtnet_tx_timeout,
 #ifdef CONFIG_NET_POLL_CONTROLLER
  .ndo_poll_controller = virtnet_netpoll,
 #endif
@@ -1750,6 +1766,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 dev->netdev_ops = &virtnet_netdev;
 dev->features = NETIF_F_HIGHDMA;
   
+ dev->watchdog_timeo = 5 * HZ;
 dev->ethtool_ops = &virtnet_ethtool_ops;
 SET_NETDEV_DEV(dev, &vdev->dev);
   
@@ -1811,6 +1828,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 }
   
 INIT_WORK(&vi->config_work, virtnet_config_changed_work);
+ INIT_WORK(&vi->reset_task, virtnet_reset_task);
   
  /* If we can receive ANY GSO packets, we must allocate large
  ones. */
  if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
@@ -1891,7 +1909,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 netif_carrier_on(dev);
 }
   
- pr_debug("virtnet: registered device %s with %d RX and TX vq's\n",
+ pr_debug("virtio_net: registered device %s with %d RX and TX vq's\n",
   dev->name, max_queue_pairs);
   
  return 0;
@@ -2001,6 +2019,55 @@ static int virtnet_restore(struct virtio_device *vdev)
 }
 #endif
   
+static void virtnet_reset_task(struct work_struct *work)
+{
+ struct virtnet_info *vi =
+ container_of(work, struct virtnet_info, reset_task);
+ struct net_device *dev

Re: macb: zynq: why is SG disabled?

2015-07-02 Thread Nicolas Ferre

On 02/07/2015 04:33, Punnaiah Choudary Kalluri wrote:
 Hi Nicolae and Cyrille,
 
 The SG feature was not tested on Zynq with the macb driver, but we tested it
 using the emacps driver in the Xilinx tree (this driver was deprecated recently).
 
 We will test and enable this feature in driver for Zynq.

Ok, fine: we will be glad to merge patches that would enable this
feature on Zynq.

Bye,


 -----Original Message-----
 From: Cyrille Pitchen [mailto:cyrille.pitc...@atmel.com]
 Sent: Wednesday, July 01, 2015 10:34 PM
 To: Nicolae Rosia; Michal Simek; Punnaiah Choudary Kalluri;
 netdev@vger.kernel.org; Nicolas Ferre; linux-arm-ker...@lists.infradead.org
 Subject: Re: macb: zynq: why is SG disabled?

 On 01/07/2015 17:14, Nicolae Rosia wrote:
 Hello,

 After reading the GEM part of Zynq7000 Technical Reference Manual [0], I
 think that SG should be supported.
 Is there a reason why SG is disabled in macb for Zynq?

 Best regards,
 Nicolae Rosia

 Hi Nicolae,

 when the scatter-gather patch was introduced, the feature was enabled only
 on tested boards to avoid regressions on other boards.
 So SG is enabled on sama5d4x and sama5d2x SoCs. SG is disabled on purpose
 on sama5d3x.

 For Zynq, I think the feature is still disabled just because it has never
 been tested.

 Best regards,

 Cyrille
 


-- 
Nicolas Ferre

Re: [PATCH] add stealth mode

2015-07-02 Thread Nicolas Dichtel


On 02/07/2015 00:53, Matteo Croce wrote:

Add option to disable any reply not related to a listening socket,
like RST/ACK for TCP and ICMP Dest-Unreach for UDP.
Also disables ICMP replies to echo request and timestamp.
The stealth mode can be enabled selectively for a single interface.
---
  include/linux/inetdevice.h | 1 +
  include/linux/ipv6.h   | 1 +
  include/uapi/linux/ip.h| 1 +
  net/ipv4/devinet.c | 1 +
  net/ipv4/icmp.c| 6 ++
  net/ipv4/tcp_ipv4.c| 3 ++-
  net/ipv4/udp.c | 4 +++-
  net/ipv6/addrconf.c| 7 +++
  net/ipv6/icmp.c| 3 ++-
  net/ipv6/tcp_ipv6.c| 2 +-
  net/ipv6/udp.c | 3 ++-
  11 files changed, 27 insertions(+), 5 deletions(-)


It is recommended to add an explanation of the new sysctl here:
Documentation/networking/ip-sysctl.txt
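With the patch applied, the knob would be exposed per interface under /proc/sys. Below is a small sketch of how a tool might toggle it. It also encodes the IPv4 effective-value rule implied by the IN_DEV_MAXCONF() macro that IN_DEV_STEALTH() is built on: the larger of the "all" and per-device values wins. The helper names are hypothetical, and the paths exist only on a kernel carrying this (unmerged) patch.

```c
#include <stdio.h>

/* Build the path of the proposed knob; family is "ipv4" or "ipv6" and
 * ifname may be an interface, "all" or "default".
 * Returns 0 on success, -1 if the buffer is too small. */
static int stealth_sysctl_path(char *buf, size_t len, const char *family,
			       const char *ifname)
{
	int n = snprintf(buf, len, "/proc/sys/net/%s/conf/%s/stealth",
			 family, ifname);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* IPv4: IN_DEV_MAXCONF() takes max(all, per-device), so setting either
 * conf/all/stealth or conf/<dev>/stealth enables the mode on <dev>. */
static int ipv4_effective_stealth(int conf_all, int conf_dev)
{
	return conf_all > conf_dev ? conf_all : conf_dev;
}

/* Write "1" or "0" to the knob. Returns 0 on success, -1 on failure. */
static int set_stealth(const char *path, int on)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fprintf(f, "%d\n", on ? 1 : 0) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```

The max() rule is worth spelling out in the documentation: under it, setting the per-interface value back to 0 does not disable stealth mode while conf/all/stealth is 1.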

Re: [PATCH net-next] net: bail on sock_wfree, sock_rfree when we have a TCP_TIMEWAIT sk

2015-07-02 Thread Julian Anastasov


Hello,

On Wed, 1 Jul 2015, Alex Gartrell wrote:

 On Wed, Jul 1, 2015 at 4:26 PM, Eric Dumazet eduma...@google.com wrote:
  I think you are mistaken Alex.
 
 Indeed, I was!  Should be unsurprising.
 
 
  socket early demux cannot possibly set skb->destructor to sock_rfree()
 
 Yeah I will admit adding the code to sock_rfree reflexively out of paranoia.
 
  If skb->destructor is set by early demux, it correctly points to
  sock_edemux()
 
  And this one correctly handles all socket variants.
 
 Yes, the problem appears to be in ip_vs_prepare_tunneled_skb
 (ip_vs_xmit.c:859 in 4.0)
 
 if (skb_headroom(skb) < max_headroom || skb_cloned(skb)) {
     new_skb = skb_realloc_headroom(skb, max_headroom);
     if (!new_skb)
         goto error;
     if (skb->sk)
         skb_set_owner_w(new_skb, skb->sk);
     consume_skb(skb);
     skb = new_skb;
 }
 
 skb_set_owner_w sets sock_wfree.
 
 I'll figure out how to ensure that we're using an appropriate destructor here.

Alex, in our discussion in January I thought
we could skip calling skb_orphan in some cases, but as
the input and output paths use different skb->destructor
values, we should call skb_orphan for every method, in every
case when skb->dev != NULL, even when we do not call
LOCAL_OUT, i.e. when NF_ACCEPT is returned for traffic
to a local real server. The only case where we should not
call it is a local socket (skb->dev == NULL).

I think your patch from January is almost
good:

http://archive.linuxvirtualserver.org/html/lvs-devel/2015-01/msg00014.html

Just add the skb->dev check and we should be fine.
And the patch from Eric for IPVS looks good too.
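The rule can be illustrated with a toy model. The structs are hypothetical simplifications, not struct sk_buff or struct sock. The orphaning step has the same shape as the kernel's skb_orphan(): run the destructor once, then detach the socket. It is applied only when skb->dev is set.

```c
#include <stddef.h>

/* Toy stand-ins (hypothetical, heavily simplified). */
struct toy_sock {
	int refs;                            /* what a destructor releases */
};

struct toy_skb {
	struct toy_sock *sk;
	void *dev;                           /* NULL: built by a local socket */
	void (*destructor)(struct toy_skb *);
};

static void toy_release(struct toy_skb *skb)
{
	skb->sk->refs--;
}

/* Same shape as skb_orphan(): call the destructor once, then clear the
 * destructor and the socket so nothing runs it a second time. */
static void toy_orphan(struct toy_skb *skb)
{
	if (skb->destructor) {
		skb->destructor(skb);
		skb->destructor = NULL;
		skb->sk = NULL;
	}
}

/* Rule from the message: orphan whenever skb->dev != NULL, since the
 * destructor may have been installed by a different (e.g. input) path;
 * leave packets from a local socket (skb->dev == NULL) untouched. */
static void toy_prepare_for_xmit(struct toy_skb *skb)
{
	if (skb->dev != NULL)
		toy_orphan(skb);
}
```

After the orphan step the packet carries no owner, so a later skb_set_owner_w() style call can attach a fresh write-side destructor without mixing it up with the input path's one.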

Regards

--
Julian Anastasov j...@ssi.bg

Re: [PULL] virtio/vhost: cross endian support

2015-07-02 Thread Greg Kurz

On Thu, 2 Jul 2015 08:01:28 +0200
Michael S. Tsirkin m...@redhat.com wrote:

 On Wed, Jul 01, 2015 at 12:02:50PM -0700, Linus Torvalds wrote:
  On Wed, Jul 1, 2015 at 2:31 AM, Michael S. Tsirkin m...@redhat.com wrote:
   virtio/vhost: cross endian support
  
  Ugh. Does this really have to be dynamic?
  
  Can't virtio do the sane thing, and just use a _fixed_ endianness?
  
  Doing a unconditional byte swap is faster and simpler than the crazy
  conditionals. That's true regardless of endianness, but gets to be
  even more so if the fixed endianness is little-endian, since BE is
  not-so-slowly fading from the world.
  
 Linus
 
 Yea, well - support for legacy BE guests on the new LE hosts is
 exactly the motivation for this.
 
 I dislike it too, but there are two redeeming properties that
 made me merge this:
 
 1.  It's a trivial amount of code: since we wrap host/guest accesses
 anyway, almost all of it is well hidden from drivers.
 
 2.  Sane platforms would never set flags like VHOST_CROSS_ENDIAN_LEGACY -
 and when it's clear, there's zero overhead (as some point it was
 tested by compiling with and without the patches, got the same
 stripped binary).
 
 Maybe we could create a Kconfig symbol to enforce point (2): prevent
 people from enabling it e.g. on x86. I will look into this - but it can
 be done by a patch on top, so I think this can be merged as is.
 

This cross-endian *oddity* is targeting PowerPC book3s_64 processors... I
am not aware of any other users. Maybe create a symbol that would
only be selected by PPC_BOOK3S_64?


 Or do you know of someone using kernel with all config options enabled
 undiscriminately?
 
 Thanks,
 


Re: [RFC] virtio_net: Adding tx_timeout function.

2015-07-02 Thread Michael S. Tsirkin

On Thu, Jul 02, 2015 at 03:23:56AM -0400, Pankaj Gupta wrote:
 
  On Wed, Jun 24, 2015 at 10:31:09PM -0300, Julio Faracco wrote:
   2015-06-24 3:10 GMT-03:00 Michael S. Tsirkin m...@redhat.com:
   
On Tue, Jun 23, 2015 at 10:44:29PM -0300, Julio Faracco wrote:
 The virtio_net paravirtualized driver does not have a tx_timeout() function
 to guarantee that the driver will recover properly after a timeout
 during the transmission of a packet. This patch adds this feature and
 throws a timeout exception after 5 HZ. Considering some tests, this is
 the best time to use here.

 Signed-off-by: Julio Faracco jcfara...@gmail.com
 Cc: Jason Wang jasow...@redhat.com
   
Looks like a bunch of locks and flushes are missing in this patch.  IMHO
that's just too painful with current hardware.  IMO the right thing to
do here is to add ability to reset specific queues to hardware.
   
   
   I agree, Michael. This model is the default one resetting the device
   due to transmission timeout.
   To have a better performance, only some queues must be reset.
  
  It's not a question of performance. You would need to write
  a bunch of code anyway. Why not do it in the hypervisor
  so guest can simply write into a register and reset
  a ring?
  
  
  BTW now that I think about it, this requires Jason's
  patches that introduce the tx interrupt, otherwise
  packet will timeout simply because no packets are sent.
 
 I am trying to understand how the TX interrupt patches will help
 here. This function will be called when the driver fails to send
 packets. Even before the TX interrupt patches, packets are flowing.
 
 Is my understanding wrong somewhere?

Without tx interrupts, skbs are never freed until you send
more packets. Which might never happen.
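This point can be seen in a toy model. If completed buffers are reclaimed only from inside the transmit path, the last packets sent before traffic stops stay unreclaimed indefinitely, even though the device has finished with them. The names are hypothetical, and this is not virtio_net's vring code.

```c
#include <stdbool.h>

#define TOY_RING_SIZE 4

/* Toy TX ring: counters only, no real buffers. */
struct toy_ring {
	int posted;     /* buffers handed to the device */
	int completed;  /* buffers the device has finished with */
	int freed;      /* buffers the driver has reclaimed */
};

/* Device side: finish everything posted so far. */
static void toy_device_completes(struct toy_ring *r)
{
	r->completed = r->posted;
}

/* Driver side without TX interrupts: reclaim lazily at the start of each
 * transmit, then post one buffer if there is room. Returns false when the
 * ring is full (the queue would be stopped). */
static bool toy_xmit(struct toy_ring *r)
{
	r->freed = r->completed;
	if (r->posted - r->freed >= TOY_RING_SIZE)
		return false;
	r->posted++;
	return true;
}

/* Buffers still held by the driver. */
static int toy_unreclaimed(const struct toy_ring *r)
{
	return r->posted - r->freed;
}
```

A TX completion interrupt would let the driver run the reclaim step right after `toy_device_completes()`, instead of waiting for a transmit that may never come.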


  
  
 ---
  drivers/net/virtio_net.c |   69 +-
  1 file changed, 68 insertions(+), 1 deletion(-)

 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 index 63c7810..75ac45c 100644
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -135,6 +135,9 @@ struct virtnet_info {
   /* Work struct for config space updates */
   struct work_struct config_work;

 + /* Work struct for resetting the virtio-net driver. */
 + struct work_struct reset_task;
 +
   /* Does the affinity hint is set for virtqueues? */
   bool affinity_hint_set;

 @@ -1394,6 +1397,18 @@ static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
   return 0;
  }

 +static void virtnet_tx_timeout(struct net_device *dev)
 +{
 + struct virtnet_info *vi = netdev_priv(dev);
 +
 + dev_warn(&dev->dev, "TX Timeout exception with latency: %ld\n",
 +  jiffies - dev_trans_start(dev));
 +
 + schedule_work(&vi->reset_task);
   
What if after this triggers user does something
to the device (e.g. attempts to remove it)?
Or if a packet is transmitted or used?
   
   At some point, this work must be canceled.
   Yes, you are right. Specially, when the driver is being removed.
   
 +}
 +
 +static void virtnet_reset_task(struct work_struct *work);
 +
  static const struct net_device_ops virtnet_netdev = {
   .ndo_open= virtnet_open,
   .ndo_stop= virtnet_close,
 @@ -1405,6 +1420,7 @@ static const struct net_device_ops virtnet_netdev = {
   .ndo_get_stats64 = virtnet_stats,
   .ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
   .ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
 + .ndo_tx_timeout  = virtnet_tx_timeout,
  #ifdef CONFIG_NET_POLL_CONTROLLER
   .ndo_poll_controller = virtnet_netpoll,
  #endif
 @@ -1750,6 +1766,7 @@ static int virtnet_probe(struct virtio_device *vdev)
   dev->netdev_ops = &virtnet_netdev;
   dev->features = NETIF_F_HIGHDMA;

 + dev->watchdog_timeo = 5 * HZ;
   dev->ethtool_ops = &virtnet_ethtool_ops;
   SET_NETDEV_DEV(dev, &vdev->dev);

 @@ -1811,6 +1828,7 @@ static int virtnet_probe(struct virtio_device *vdev)
   }

   INIT_WORK(&vi->config_work, virtnet_config_changed_work);
 + INIT_WORK(&vi->reset_task, virtnet_reset_task);

   /* If we can receive ANY GSO packets, we must allocate large
   ones. */
   if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
 @@ -1891,7 +1909,7 @@ static int virtnet_probe(struct virtio_device *vdev)
   netif_carrier_on(dev);
   }

 - pr_debug("virtnet: registered device %s with %d RX and TX vq's\n",
 + pr_debug("virtio_net: registered device %s with %d RX and TX vq's\n",
 dev->name, max_queue_pairs);

   return 0;
 @@

Re: [PULL] virtio/vhost: cross endian support

2015-07-02 Thread Michael S. Tsirkin

On Thu, Jul 02, 2015 at 11:12:56AM +0200, Greg Kurz wrote:
 On Thu, 2 Jul 2015 08:01:28 +0200
 Michael S. Tsirkin m...@redhat.com wrote:
 
  On Wed, Jul 01, 2015 at 12:02:50PM -0700, Linus Torvalds wrote:
   On Wed, Jul 1, 2015 at 2:31 AM, Michael S. Tsirkin m...@redhat.com 
   wrote:
virtio/vhost: cross endian support
   
   Ugh. Does this really have to be dynamic?
   
   Can't virtio do the sane thing, and just use a _fixed_ endianness?
   
   Doing a unconditional byte swap is faster and simpler than the crazy
   conditionals. That's true regardless of endianness, but gets to be
   even more so if the fixed endianness is little-endian, since BE is
   not-so-slowly fading from the world.
   
  Linus
  
  Yea, well - support for legacy BE guests on the new LE hosts is
  exactly the motivation for this.
  
  I dislike it too, but there are two redeeming properties that
  made me merge this:
  
  1.  It's a trivial amount of code: since we wrap host/guest accesses
  anyway, almost all of it is well hidden from drivers.
  
  2.  Sane platforms would never set flags like VHOST_CROSS_ENDIAN_LEGACY -
  and when it's clear, there's zero overhead (as some point it was
  tested by compiling with and without the patches, got the same
  stripped binary).
  
  Maybe we could create a Kconfig symbol to enforce point (2): prevent
  people from enabling it e.g. on x86. I will look into this - but it can
  be done by a patch on top, so I think this can be merged as is.
  
 
 This cross-endian *oddity* is targeting PowerPC book3s_64 processors... I
 am not aware of any other users. Maybe create a symbol that would
 be only selected by PPC_BOOK3S_64 ?

I think some ARM systems are trying to support cross-endian
configurations as well.

Besides that, yes, this is more or less what I had in mind.

 
  Or do you know of someone using kernel with all config options enabled
  undiscriminately?
  
  Thanks,
  

Re: [PATCH] add stealth mode

2015-07-02 Thread Matteo Croce

Add option to disable any reply not related to a listening socket,
like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
Also disables ICMP replies to echo request and timestamp.
The stealth mode can be enabled selectively for a single interface.

Signed-off-by: Matteo Croce mat...@openwrt.org
---
 Documentation/networking/ip-sysctl.txt | 12 
 include/linux/inetdevice.h |  1 +
 include/linux/ipv6.h   |  1 +
 include/uapi/linux/ip.h|  1 +
 net/ipv4/devinet.c |  1 +
 net/ipv4/icmp.c|  6 ++
 net/ipv4/tcp_ipv4.c|  3 ++-
 net/ipv4/udp.c |  4 +++-
 net/ipv6/addrconf.c|  7 +++
 net/ipv6/icmp.c|  3 ++-
 net/ipv6/tcp_ipv6.c|  2 +-
 net/ipv6/udp.c |  3 ++-
 12 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 5fae770..9eed021 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1181,6 +1181,12 @@ tag - INTEGER
  Allows you to write a number, which can be used as required.
  Default value is 0.

+stealth - BOOLEAN
+ Disable any reply not related to a listening socket,
+ like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
+ Also disables ICMP replies to echo requests and timestamp.
+ Default value is 0.
+
 Alexey Kuznetsov.
 kuz...@ms2.inr.ac.ru

@@ -1584,6 +1590,12 @@ stable_secret - IPv6 address

  By default the stable secret is unset.

+stealth - BOOLEAN
+ Disable any reply not related to a listening socket,
+ like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
+ Also disables ICMPv6 replies to echo requests.
+ Default value is 0.
+
 icmp/*:
 ratelimit - INTEGER
  Limit the maximal rates for sending ICMPv6 packets.
diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index a4328ce..a64c01e 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -128,6 +128,7 @@ static inline void ipv4_devconf_setall(struct in_device *in_dev)
 #define IN_DEV_ARP_ANNOUNCE(in_dev) IN_DEV_MAXCONF((in_dev), ARP_ANNOUNCE)
 #define IN_DEV_ARP_IGNORE(in_dev) IN_DEV_MAXCONF((in_dev), ARP_IGNORE)
 #define IN_DEV_ARP_NOTIFY(in_dev) IN_DEV_MAXCONF((in_dev), ARP_NOTIFY)
+#define IN_DEV_STEALTH(in_dev) IN_DEV_MAXCONF((in_dev), STEALTH)

 struct in_ifaddr {
  struct hlist_node hash;
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 82806c6..49494ec 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -53,6 +53,7 @@ struct ipv6_devconf {
  __s32   ndisc_notify;
  __s32 suppress_frag_ndisc;
  __s32 accept_ra_mtu;
+ __s32 stealth;
  struct ipv6_stable_secret {
  bool initialized;
  struct in6_addr secret;
diff --git a/include/uapi/linux/ip.h b/include/uapi/linux/ip.h
index 08f894d..4acbf99 100644
--- a/include/uapi/linux/ip.h
+++ b/include/uapi/linux/ip.h
@@ -165,6 +165,7 @@ enum
  IPV4_DEVCONF_IGMPV2_UNSOLICITED_REPORT_INTERVAL,
  IPV4_DEVCONF_IGMPV3_UNSOLICITED_REPORT_INTERVAL,
  IPV4_DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN,
+ IPV4_DEVCONF_STEALTH,
  __IPV4_DEVCONF_MAX
 };

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 7498716..6b9930a 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2178,6 +2178,7 @@ static struct devinet_sysctl_table {
   promote_secondaries),
  DEVINET_SYSCTL_FLUSHING_ENTRY(ROUTE_LOCALNET,
   route_localnet),
+ DEVINET_SYSCTL_RW_ENTRY(STEALTH, stealth),
  },
 };

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index f5203fb..e8e71fb 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -882,6 +882,9 @@ static bool icmp_echo(struct sk_buff *skb)
 {
  struct net *net;

+ if (IN_DEV_STEALTH(skb->dev->ip_ptr))
+ return true;
+
 net = dev_net(skb_dst(skb)->dev);
 if (!net->ipv4.sysctl_icmp_echo_ignore_all) {
  struct icmp_bxm icmp_param;
@@ -915,6 +918,9 @@ static bool icmp_timestamp(struct sk_buff *skb)
 if (skb->len < 4)
  goto out_err;

+ if (IN_DEV_STEALTH(skb->dev->ip_ptr))
+ return true;
+
  /*
  * Fill in the current time as ms since midnight UT:
  */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index d7d4c2b..6f3e6e9 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -77,6 +77,7 @@
 #include <net/busy_poll.h>

 #include <linux/inet.h>
+#include <linux/inetdevice.h>
 #include <linux/ipv6.h>
 #include <linux/stddef.h>
 #include <linux/proc_fs.h>
@@ -1652,7 +1653,7 @@ csum_error:
  TCP_INC_STATS_BH(net, TCP_MIB_CSUMERRORS);
 bad_packet:
  TCP_INC_STATS_BH(net, TCP_MIB_INERRS);
- } else {
+ } else if (!IN_DEV_STEALTH(skb->dev->ip_ptr)) {
  tcp_v4_send_reset(NULL, skb);
  }

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 83aa604..780069d 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -96,6 +96,7 @@
 #include <linux/timer.h>
 #include <linux/mm.h>
 #include <linux/inet.h>
+#include <linux/inetdevice.h>
 #include <linux/netdevice.h>
 #include

Re: [PATCH] add stealth mode

2015-07-02 Thread Nicolas Dichtel


On 02/07/2015 10:38, Matteo Croce wrote:

Add option to disable any reply not related to a listening socket,
like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
Also disables ICMP replies to echo request and timestamp.
The stealth mode can be enabled selectively for a single interface.

Signed-off-by: Matteo Croce mat...@openwrt.org
---
  Documentation/networking/ip-sysctl.txt | 12 
  include/linux/inetdevice.h |  1 +
  include/linux/ipv6.h   |  1 +
  include/uapi/linux/ip.h|  1 +
  net/ipv4/devinet.c |  1 +
  net/ipv4/icmp.c|  6 ++
  net/ipv4/tcp_ipv4.c|  3 ++-
  net/ipv4/udp.c |  4 +++-
  net/ipv6/addrconf.c|  7 +++
  net/ipv6/icmp.c|  3 ++-
  net/ipv6/tcp_ipv6.c|  2 +-
  net/ipv6/udp.c |  3 ++-
  12 files changed, 39 insertions(+), 5 deletions(-)
Please read
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches


The subject of your email should contain v2 and you should describe the change
from v1 after the '---'.
Also, right now, net-next is closed, so new features are not accepted.


Regards,
Nicolas

Re: [PATCH v2] add stealth mode

2015-07-02 Thread Matteo Croce

Add option to disable any reply not related to a listening socket,
like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
Also disables ICMP replies to echo request and timestamp.
The stealth mode can be enabled selectively for a single interface.

Signed-off-by: Matteo Croce mat...@openwrt.org
---
Changes since v1: checked the patch with checkpatch.pl and added documentation to ip-sysctl.txt.

 Documentation/networking/ip-sysctl.txt | 12 
 include/linux/inetdevice.h |  1 +
 include/linux/ipv6.h   |  1 +
 include/uapi/linux/ip.h|  1 +
 net/ipv4/devinet.c |  1 +
 net/ipv4/icmp.c|  6 ++
 net/ipv4/tcp_ipv4.c|  3 ++-
 net/ipv4/udp.c |  4 +++-
 net/ipv6/addrconf.c|  7 +++
 net/ipv6/icmp.c|  3 ++-
 net/ipv6/tcp_ipv6.c|  2 +-
 net/ipv6/udp.c |  3 ++-
 12 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 5fae770..9eed021 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1181,6 +1181,12 @@ tag - INTEGER
  Allows you to write a number, which can be used as required.
  Default value is 0.

+stealth - BOOLEAN
+ Disable any reply not related to a listening socket,
+ like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
+ Also disables ICMP replies to echo requests and timestamp.
+ Default value is 0.
+
 Alexey Kuznetsov.
 kuz...@ms2.inr.ac.ru

@@ -1584,6 +1590,12 @@ stable_secret - IPv6 address

  By default the stable secret is unset.

+stealth - BOOLEAN
+ Disable any reply not related to a listening socket,
+ like RST/ACK for TCP and ICMP Port-Unreachable for UDP.
+ Also disables ICMPv6 replies to echo requests.
+ Default value is 0.
+
 icmp/*:
 ratelimit - INTEGER
  Limit the maximal rates for sending ICMPv6 packets.
diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index a4328ce..a64c01e 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -128,6 +128,7 @@ static inline void ipv4_devconf_setall(struct in_device *in_dev)
 #define IN_DEV_ARP_ANNOUNCE(in_dev) IN_DEV_MAXCONF((in_dev), ARP_ANNOUNCE)
 #define IN_DEV_ARP_IGNORE(in_dev) IN_DEV_MAXCONF((in_dev), ARP_IGNORE)
 #define IN_DEV_ARP_NOTIFY(in_dev) IN_DEV_MAXCONF((in_dev), ARP_NOTIFY)
+#define IN_DEV_STEALTH(in_dev) IN_DEV_MAXCONF((in_dev), STEALTH)

 struct in_ifaddr {
  struct hlist_node hash;
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 82806c6..49494ec 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -53,6 +53,7 @@ struct ipv6_devconf {
  __s32   ndisc_notify;
  __s32 suppress_frag_ndisc;
  __s32 accept_ra_mtu;
+ __s32 stealth;
  struct ipv6_stable_secret {
  bool initialized;
  struct in6_addr secret;
diff --git a/include/uapi/linux/ip.h b/include/uapi/linux/ip.h
index 08f894d..4acbf99 100644
--- a/include/uapi/linux/ip.h
+++ b/include/uapi/linux/ip.h
@@ -165,6 +165,7 @@ enum
  IPV4_DEVCONF_IGMPV2_UNSOLICITED_REPORT_INTERVAL,
  IPV4_DEVCONF_IGMPV3_UNSOLICITED_REPORT_INTERVAL,
  IPV4_DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN,
+ IPV4_DEVCONF_STEALTH,
  __IPV4_DEVCONF_MAX
 };

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 7498716..6b9930a 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2178,6 +2178,7 @@ static struct devinet_sysctl_table {
   promote_secondaries),
  DEVINET_SYSCTL_FLUSHING_ENTRY(ROUTE_LOCALNET,
   route_localnet),
+ DEVINET_SYSCTL_RW_ENTRY(STEALTH, "stealth"),
  },
 };

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index f5203fb..e8e71fb 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -882,6 +882,9 @@ static bool icmp_echo(struct sk_buff *skb)
 {
  struct net *net;

+ if (IN_DEV_STEALTH(skb->dev->ip_ptr))
+ return true;
+
  net = dev_net(skb_dst(skb)->dev);
  if (!net->ipv4.sysctl_icmp_echo_ignore_all) {
  struct icmp_bxm icmp_param;
@@ -915,6 +918,9 @@ static bool icmp_timestamp(struct sk_buff *skb)
  if (skb->len < 4)
  goto out_err;

+ if (IN_DEV_STEALTH(skb->dev->ip_ptr))
+ return true;
+
  /*
  * Fill in the current time as ms since midnight UT:
  */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index d7d4c2b..6f3e6e9 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -77,6 +77,7 @@
 #include <net/busy_poll.h>

 #include <linux/inet.h>
+#include <linux/inetdevice.h>
 #include <linux/ipv6.h>
 #include <linux/stddef.h>
 #include <linux/proc_fs.h>
@@ -1652,7 +1653,7 @@ csum_error:
  TCP_INC_STATS_BH(net, TCP_MIB_CSUMERRORS);
 bad_packet:
  TCP_INC_STATS_BH(net, TCP_MIB_INERRS);
- } else {
+ } else if (!IN_DEV_STEALTH(skb->dev->ip_ptr)) {
  tcp_v4_send_reset(NULL, skb);
  }

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 83aa604..780069d 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -96,6 +96,7 @@
 #include <linux/timer.h>
 #include <linux/mm.h>
 #include
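IN_DEV_STEALTH() is built on the existing IN_DEV_MAXCONF() helper, which, as defined in inetdevice.h, takes the larger of the namespace-wide "all" value and the per-interface value. If that reading is right, stealth mode would apply to an interface when either conf/all/stealth or conf/<dev>/stealth is set. The userspace sketch below models that rule. struct devconf_model, stealth_effective() and should_send_reset() are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one interface's devconf. The kernel
 * stores these values in ipv4_devconf arrays, not in a struct like this. */
struct devconf_model {
	int all_stealth; /* models net.ipv4.conf.all.stealth */
	int dev_stealth; /* models net.ipv4.conf.<dev>.stealth */
};

/* Same rule as IN_DEV_MAXCONF(): the effective value is the larger of the
 * "all" setting and the per-interface setting. */
int stealth_effective(const struct devconf_model *c)
{
	return c->all_stealth > c->dev_stealth ? c->all_stealth : c->dev_stealth;
}

/* Models the tcp_v4_rcv() hunk: a RST for a segment that matched no
 * socket is sent only when stealth is not in effect. */
bool should_send_reset(const struct devconf_model *c)
{
	assert(c != NULL);
	return !stealth_effective(c);
}
```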

Re: [PATCH] bonding: primary_reselect with failure is not working properly

2015-07-02 Thread Eric Dumazet

On Thu, 2015-07-02 at 15:43 +0530, Mazhar Rana wrote:
 When primary_reselect is set to failure, the primary interface should
 not become active as long as the current active slave is up. But if we
 set the first member of the bond device as the primary interface and
 primary_reselect is set to failure, then whenever the primary interface's
 link comes back up, it becomes the active slave even if the current
 active slave is still up.
 
 With this patch, bond_find_best_slave will not traverse members if the
 primary interface is not a candidate for failover/reselection and the
 current active slave is still up.
 
 Signed-off-by: Mazhar Rana mazhar.r...@cyberoam.com
 Reviewed-by: Sanket Shah sanket.s...@cyberoam.com
 ---
  drivers/net/bonding/bond_main.c | 10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)
 
 diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
 index 19eb990..ac71261 100644
 --- a/drivers/net/bonding/bond_main.c
 +++ b/drivers/net/bonding/bond_main.c
 @@ -715,7 +715,7 @@ static bool bond_should_change_active(struct bonding *bond)
   */
  static struct slave *bond_find_best_slave(struct bonding *bond)
  {
 - struct slave *slave, *bestslave = NULL, *primary;
 + struct slave *slave, *bestslave = NULL, *primary, *curr;
   struct list_head *iter;
   int mintime = bond->params.updelay;
  
 @@ -724,6 +724,14 @@ static struct slave *bond_find_best_slave(struct bonding *bond)
   bond_should_change_active(bond))
   return primary;
  
 + /* If we get here, the primary interface is not a candidate for
 +  * reselection/failover. If the current active slave is still up,
 +  * there is no need to traverse the members.
 +  */
 + curr = rtnl_dereference(bond->curr_active_slave);

Here you carefully use rtnl_dereference(bond->curr_active_slave)

(This is good)

 + if (curr && curr->link == BOND_LINK_UP)
 + return bond->curr_active_slave;

But here you return bond->curr_active_slave
instead of curr.

 +
   bond_for_each_slave(bond, slave, iter) {
   if (slave->link == BOND_LINK_UP)
   return slave;



Re: [PATCH] rtnetlink: Actually use the policy for the IFLA_VF_INFO

2015-07-02 Thread Daniel Borkmann


On 07/01/2015 11:36 AM, Daniel Borkmann wrote:

Hi Jason,

On 07/01/2015 12:52 AM, Jason Gunthorpe wrote:

It turns out the policy was defined but never actually checked,
so let's check it.

Fixes: ebc08a6f47ee ("rtnetlink: Add VF config code to rtnetlink")


I would argue that the actual commit would be ...

Fixes: c02db8c6290b ("rtnetlink: make SR-IOV VF interface symmetric")

... since in ebc08a6f47ee, these members were part of ifla_policy[]
which has been validated (if we ignore the fact that it was NLA_BINARY).

So, commit c02db8c6290b moved it into a nested attribute (IFLA_VF_INFO)
where we indeed don't do further validation. Imho, we should pass the
parsed attribute table from nla_parse_nested() down into do_setvfinfo(),
something like the below; I can give it a test run on my ixgbe.


Sorry for the late reply, something like this looks good from my side.

Thanks,
Daniel

[PATCH net] mlx4: TCP/UDP packets have L4 hash

2015-07-02 Thread Eric Dumazet

From: Eric Dumazet eduma...@google.com

The Mellanox driver knows that rxhash is an L4 hash when it receives
a non-fragmented TCP or UDP frame and NETIF_F_RXCSUM is enabled on
the netdev.

ip_summed is CHECKSUM_UNNECESSARY in this case.
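In other words, the hash type reported through skb_set_hash() is upgraded to L4 exactly when the hardware validated the checksum. A minimal userspace sketch of that decision follows; the enums and rx_hash_type() are hypothetical stand-ins defined locally, not the kernel's CHECKSUM_* or PKT_HASH_TYPE_* definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel enums; values are illustrative. */
enum csum_model { MODEL_CHECKSUM_NONE, MODEL_CHECKSUM_UNNECESSARY };
enum hash_type_model { MODEL_HASH_L3, MODEL_HASH_L4 };

/* Mirrors the ternary the patch passes to skb_set_hash(). */
enum hash_type_model rx_hash_type(enum csum_model ip_summed)
{
	return ip_summed == MODEL_CHECKSUM_UNNECESSARY ? MODEL_HASH_L4
						       : MODEL_HASH_L3;
}
```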

Signed-off-by: Eric Dumazet eduma...@google.com
Cc: Amir Vadai am...@mellanox.com
Cc: Ido Shamay i...@mellanox.com
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 7a4f20bb7fcb..12c65e1ad6a9 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -917,7 +917,9 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
if (dev->features & NETIF_F_RXHASH)
skb_set_hash(gro_skb,
 be32_to_cpu(cqe->immed_rss_invalid),
-PKT_HASH_TYPE_L3);
+(ip_summed == CHECKSUM_UNNECESSARY) ?
+   PKT_HASH_TYPE_L4 :
+   PKT_HASH_TYPE_L3);
 
skb_record_rx_queue(gro_skb, cq->ring);
skb_mark_napi_id(gro_skb, &cq->napi);
@@ -963,7 +965,9 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
if (dev->features & NETIF_F_RXHASH)
skb_set_hash(skb,
 be32_to_cpu(cqe->immed_rss_invalid),
-PKT_HASH_TYPE_L3);
+(ip_summed == CHECKSUM_UNNECESSARY) ?
+   PKT_HASH_TYPE_L4 :
+   PKT_HASH_TYPE_L3);
 
if ((be32_to_cpu(cqe->vlan_my_qpn) &
MLX4_CQE_VLAN_PRESENT_MASK) &&



Re: mlx4: failed to allocate default counter port 1

2015-07-02 Thread Sebastian Ott

On Wed, 1 Jul 2015, Or Gerlitz wrote:
 On Wed, Jul 1, 2015 at 5:18 PM, Sebastian Ott seb...@linux.vnet.ibm.com 
 wrote:
  OK, using this patch it worked:
 
 yep, I forgot to recap err to zero.
 
 By it worked you mean the VF is live and kicking, all functionality
 you had before the 4.2 merge window is back again?
 

Yes.


[PATCH] bonding: primary_reselect with failure is not working properly

2015-07-02 Thread Mazhar Rana

When primary_reselect is set to failure, the primary interface should
not become active as long as the current active slave is up. But if we
set the first member of the bond device as the primary interface and
primary_reselect is set to failure, then whenever the primary interface's
link comes back up, it becomes the active slave even if the current
active slave is still up.

With this patch, bond_find_best_slave will not traverse members if the
primary interface is not a candidate for failover/reselection and the
current active slave is still up.

Signed-off-by: Mazhar Rana mazhar.r...@cyberoam.com
Reviewed-by: Sanket Shah sanket.s...@cyberoam.com
---
 drivers/net/bonding/bond_main.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 19eb990..ac71261 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -715,7 +715,7 @@ static bool bond_should_change_active(struct bonding *bond)
  */
 static struct slave *bond_find_best_slave(struct bonding *bond)
 {
-   struct slave *slave, *bestslave = NULL, *primary;
+   struct slave *slave, *bestslave = NULL, *primary, *curr;
struct list_head *iter;
int mintime = bond->params.updelay;
 
@@ -724,6 +724,14 @@ static struct slave *bond_find_best_slave(struct bonding *bond)
bond_should_change_active(bond))
return primary;
 
+   /* If we get here, the primary interface is not a candidate for
+* reselection/failover. If the current active slave is still up,
+* there is no need to traverse the members.
+*/
+   curr = rtnl_dereference(bond->curr_active_slave);
+   if (curr && curr->link == BOND_LINK_UP)
+   return bond->curr_active_slave;
+
bond_for_each_slave(bond, slave, iter) {
if (slave->link == BOND_LINK_UP)
return slave;
-- 
1.9.1


[PATCH v2] bonding: primary_reselect with failure is not working properly

2015-07-02 Thread Mazhar Rana

When primary_reselect is set to failure, the primary interface should
not become active as long as the current active slave is up. But if we
set the first member of the bond device as the primary interface and
primary_reselect is set to failure, then whenever the primary interface's
link comes back up, it becomes the active slave even if the current
active slave is still up.

With this patch, bond_find_best_slave will not traverse members if the
primary interface is not a candidate for failover/reselection and the
current active slave is still up.

Signed-off-by: Mazhar Rana mazhar.r...@cyberoam.com
Reviewed-by: Sanket Shah sanket.s...@cyberoam.com
---
v2: return curr instead of bond->curr_active_slave.

 drivers/net/bonding/bond_main.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 19eb990..ac71261 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -715,7 +715,7 @@ static bool bond_should_change_active(struct bonding *bond)
  */
 static struct slave *bond_find_best_slave(struct bonding *bond)
 {
-   struct slave *slave, *bestslave = NULL, *primary;
+   struct slave *slave, *bestslave = NULL, *primary, *curr;
struct list_head *iter;
int mintime = bond->params.updelay;
 
@@ -724,6 +724,14 @@ static struct slave *bond_find_best_slave(struct bonding *bond)
bond_should_change_active(bond))
return primary;
 
+   /* If we get here, the primary interface is not a candidate for
+* reselection/failover. If the current active slave is still up,
+* there is no need to traverse the members.
+*/
+   curr = rtnl_dereference(bond->curr_active_slave);
+   if (curr && curr->link == BOND_LINK_UP)
+   return curr;
+
bond_for_each_slave(bond, slave, iter) {
if (slave->link == BOND_LINK_UP)
return slave;
-- 
1.9.1
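A simplified userspace model of bond_find_best_slave() with the v2 change may help show why the early return fixes the reported case: the first member is the primary, primary_reselect is failure, and both links are up. The structs, the primary_wins flag (standing in for the "primary is up and bond_should_change_active() allows it" check) and find_best_slave() are hypothetical. The real function's BOND_LINK_BACK/updelay handling is left out.

```c
#include <assert.h>
#include <stddef.h>

enum link_model { LINK_DOWN, LINK_UP };

struct slave_model {
	const char *name;
	enum link_model link;
};

/* Simplified model of bond_find_best_slave() after the v2 patch.
 * Returns NULL if no slave is up. */
const struct slave_model *find_best_slave(const struct slave_model *slaves,
					  size_t n,
					  const struct slave_model *primary,
					  const struct slave_model *curr,
					  int primary_wins)
{
	if (primary && primary->link == LINK_UP && primary_wins)
		return primary;

	/* Added by the patch: keep the current active slave while it is up. */
	if (curr && curr->link == LINK_UP)
		return curr;

	/* Otherwise fall back to the first slave whose link is up. */
	for (size_t i = 0; i < n; i++)
		if (slaves[i].link == LINK_UP)
			return &slaves[i];
	return NULL;
}
```

Without the early return, the loop would pick the first member (the primary) even though the current active slave is still up.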


[PATCH net] bridge: vlan: fix usage of vlan 0 and 4095 again

2015-07-02 Thread Nikolay Aleksandrov

VLAN IDs 0 and 4095 were disallowed by commit:
8adff41c3d25 ("bridge: Don't use VID 0 and 4095 in vlan filtering")
but the check was later removed when VLAN ranges were introduced by:
bdced7ef7838 ("bridge: support for multiple vlans and vlan ranges in setlink and dellink requests")
So reintroduce the VLAN ID range check.
Before the patch:
[root@testvm ~]# bridge vlan add vid 0 dev eth0 master
(succeeds)
After the patch:
[root@testvm ~]# bridge vlan add vid 0 dev eth0 master
RTNETLINK answers: Invalid argument

Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
Fixes: bdced7ef7838 ("bridge: support for multiple vlans and vlan ranges in setlink and dellink requests")
---
 net/bridge/br_netlink.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 6b67ed3831de..364bdc98bd9b 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -457,6 +457,8 @@ static int br_afspec(struct net_bridge *br,
if (nla_len(attr) != sizeof(struct bridge_vlan_info))
return -EINVAL;
vinfo = nla_data(attr);
+   if (!vinfo->vid || vinfo->vid >= VLAN_VID_MASK)
+   return -EINVAL;
if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN) {
if (vinfo_start)
return -EINVAL;
-- 
2.4.3
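The reintroduced check accepts only VIDs 1 through 4094. The sketch below restates it as a standalone C function. MODEL_VLAN_VID_MASK is a local copy of the 12-bit mask (0x0fff, the value of the kernel's VLAN_VID_MASK), and check_vid() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local copy of the 12-bit VID mask; the kernel's VLAN_VID_MASK is 0x0fff. */
#define MODEL_VLAN_VID_MASK 0x0fff

/* Mirrors the check added to br_afspec(): VID 0 and VID 4095 are
 * rejected, so only 1..4094 are accepted. */
int check_vid(uint16_t vid)
{
	if (!vid || vid >= MODEL_VLAN_VID_MASK)
		return -EINVAL;
	return 0;
}
```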


[PATCH net-next 6/6] net_sched: act: remove spinlock in fast path

2015-07-02 Thread Eric Dumazet

Final step for gact RCU operation:

1) Use percpu stats
2) Update lastuse only every clock tick
3) Remove spinlock acquisition, as it is no longer needed.

Since this is the last contended lock in packet RX when tc gact is used,
this gives an impressive gain.

My host with 8 RX queues was handling 5 Mpps before the patch,
and more than 10 Mpps after the patch.

Tested:

On the receiver:
IP=ip
TC=tc
dev=eth0

$TC qdisc del dev $dev ingress 2>/dev/null
$TC qdisc add dev $dev ingress
$TC filter del dev $dev root pref 10 2>/dev/null
$TC filter del dev $dev pref 10 2>/dev/null
tc filter add dev $dev est 1sec 4sec parent : protocol ip prio 1 \
u32 match ip src 7.0.0.0/8 flowid 1:15 action drop

The sender floods packets from the 7.0.0.0/8 network.

Signed-off-by: Eric Dumazet eduma...@google.com
Cc: Alexei Starovoitov a...@plumgrid.com
Cc: Jamal Hadi Salim j...@mojatatu.com
Cc: John Fastabend john.fastab...@gmail.com
---
 net/sched/act_gact.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index 7c7e72e95943..e054e5630aab 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -88,7 +88,7 @@ static int tcf_gact_init(struct net *net, struct nlattr *nla,
 
if (!tcf_hash_check(parm->index, a, bind)) {
ret = tcf_hash_create(parm->index, est, a, sizeof(*gact),
- bind, false);
+ bind, true);
if (ret)
return ret;
ret = ACT_P_CREATED;
@@ -121,9 +121,8 @@ static int tcf_gact(struct sk_buff *skb, const struct tc_action *a,
struct tcf_result *res)
 {
struct tcf_gact *gact = a->priv;
-   int action = gact->tcf_action;
+   int action = READ_ONCE(gact->tcf_action);
 
-   spin_lock(&gact->tcf_lock);
 #ifdef CONFIG_GACT_PROB
{
u32 ptype = READ_ONCE(gact->tcfg_ptype);
@@ -132,12 +131,11 @@ static int tcf_gact(struct sk_buff *skb, const struct tc_action *a,
action = gact_rand[ptype](gact);
}
 #endif
-   gact->tcf_bstats.bytes += qdisc_pkt_len(skb);
-   gact->tcf_bstats.packets++;
+   bstats_cpu_update(this_cpu_ptr(gact->common.cpu_bstats), skb);
if (action == TC_ACT_SHOT)
-   gact->tcf_qstats.drops++;
-   gact->tcf_tm.lastuse = jiffies;
-   spin_unlock(&gact->tcf_lock);
+   qstats_drop_inc(this_cpu_ptr(gact->common.cpu_qstats));
+   if (gact->tcf_tm.lastuse != jiffies)
+   gact->tcf_tm.lastuse = jiffies;
 
return action;
 }
-- 
2.4.3.573.g4eafbef
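The change follows a common lockless-stats pattern. Each CPU updates only its own counters on the fast path, and readers sum the per-CPU values when stats are dumped. The lastuse timestamp is written only when its value changes. Below is a single-threaded userspace sketch of that pattern in which CPUs are simulated by an array index. NR_CPUS_MODEL, the structs and both functions are hypothetical, and the sketch omits the u64_stats_sync protection the kernel uses.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4 /* hypothetical CPU count for the model */

/* Per-CPU basic stats, loosely modeled on gnet_stats_basic_cpu. */
struct bstats_model {
	uint64_t bytes;
	uint32_t packets;
};

struct gact_model {
	struct bstats_model cpu_bstats[NR_CPUS_MODEL];
	uint32_t cpu_drops[NR_CPUS_MODEL];
	unsigned long lastuse;
};

/* Fast path: a CPU writes only its own slot, so no lock is needed.
 * lastuse is written only when it changes, so the shared field is not
 * dirtied on every packet. */
void gact_account(struct gact_model *g, int cpu, uint32_t len, int shot,
		  unsigned long now)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	g->cpu_bstats[cpu].bytes += len;
	g->cpu_bstats[cpu].packets++;
	if (shot)
		g->cpu_drops[cpu]++;
	if (g->lastuse != now)
		g->lastuse = now;
}

/* Slow path (stats dump): fold the per-CPU values into totals. */
void gact_fold(const struct gact_model *g, struct bstats_model *sum,
	       uint32_t *drops)
{
	sum->bytes = 0;
	sum->packets = 0;
	*drops = 0;
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		sum->bytes += g->cpu_bstats[cpu].bytes;
		sum->packets += g->cpu_bstats[cpu].packets;
		*drops += g->cpu_drops[cpu];
	}
}
```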


[PATCH net-next 2/6] net: sched: add percpu stats to actions

2015-07-02 Thread Eric Dumazet

Reuse the existing percpu infrastructure John added for qdisc.

This patch adds a new cpustats parameter to tcf_hash_create() and all
actions pass false, meaning this patch should have no effect yet.

Signed-off-by: Eric Dumazet eduma...@google.com
Cc: Alexei Starovoitov a...@plumgrid.com
Cc: Jamal Hadi Salim j...@mojatatu.com
Cc: John Fastabend john.fastab...@gmail.com
---
 include/net/act_api.h|  4 +++-
 net/sched/act_api.c  | 44 ++--
 net/sched/act_bpf.c  |  2 +-
 net/sched/act_connmark.c |  3 ++-
 net/sched/act_csum.c |  3 ++-
 net/sched/act_gact.c |  3 ++-
 net/sched/act_ipt.c  |  2 +-
 net/sched/act_mirred.c   |  3 ++-
 net/sched/act_nat.c  |  3 ++-
 net/sched/act_pedit.c|  3 ++-
 net/sched/act_simple.c   |  3 ++-
 net/sched/act_skbedit.c  |  3 ++-
 net/sched/act_vlan.c |  3 ++-
 13 files changed, 57 insertions(+), 22 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 3ee4c92afd1b..db2063ffd181 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -21,6 +21,8 @@ struct tcf_common {
struct gnet_stats_rate_est64 tcfc_rate_est;
spinlock_t  tcfc_lock;
struct rcu_head tcfc_rcu;
+   struct gnet_stats_basic_cpu __percpu *cpu_bstats;
+   struct gnet_stats_queue __percpu *cpu_qstats;
 };
 #define tcf_head   common.tcfc_head
 #define tcf_index  common.tcfc_index
@@ -103,7 +105,7 @@ int tcf_hash_release(struct tc_action *a, int bind);
 u32 tcf_hash_new_index(struct tcf_hashinfo *hinfo);
 int tcf_hash_check(u32 index, struct tc_action *a, int bind);
 int tcf_hash_create(u32 index, struct nlattr *est, struct tc_action *a,
-   int size, int bind);
+   int size, int bind, bool cpustats);
 void tcf_hash_cleanup(struct tc_action *a, struct nlattr *est);
 void tcf_hash_insert(struct tc_action *a);
 
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index af427a3dbcba..074a32f466f8 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -27,6 +27,15 @@
 #include <net/act_api.h>
 #include <net/netlink.h>
 
+static void free_tcf(struct rcu_head *head)
+{
+   struct tcf_common *p = container_of(head, struct tcf_common, tcfc_rcu);
+
+   free_percpu(p->cpu_bstats);
+   free_percpu(p->cpu_qstats);
+   kfree(p);
+}
+
 void tcf_hash_destroy(struct tc_action *a)
 {
struct tcf_common *p = a->priv;
@@ -41,7 +50,7 @@ void tcf_hash_destroy(struct tc_action *a)
 * gen_estimator est_timer() might access p->tcfc_lock
 * or bstats, wait a RCU grace period before freeing p
 */
-   kfree_rcu(p, tcfc_rcu);
+   call_rcu(&p->tcfc_rcu, free_tcf);
 }
 EXPORT_SYMBOL(tcf_hash_destroy);
 
@@ -230,15 +239,16 @@ void tcf_hash_cleanup(struct tc_action *a, struct nlattr *est)
if (est)
gen_kill_estimator(&pc->tcfc_bstats,
   &pc->tcfc_rate_est);
-   kfree_rcu(pc, tcfc_rcu);
+   call_rcu(&pc->tcfc_rcu, free_tcf);
 }
 EXPORT_SYMBOL(tcf_hash_cleanup);
 
 int tcf_hash_create(u32 index, struct nlattr *est, struct tc_action *a,
-   int size, int bind)
+   int size, int bind, bool cpustats)
 {
struct tcf_hashinfo *hinfo = a->ops->hinfo;
struct tcf_common *p = kzalloc(size, GFP_KERNEL);
+   int err = -ENOMEM;
 
if (unlikely(!p))
return -ENOMEM;
@@ -246,18 +256,32 @@ int tcf_hash_create(u32 index, struct nlattr *est, struct tc_action *a,
if (bind)
p->tcfc_bindcnt = 1;
 
+   if (cpustats) {
+   p->cpu_bstats = netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu);
+   if (!p->cpu_bstats) {
+err1:
+   kfree(p);
+   return err;
+   }
+   p->cpu_qstats = alloc_percpu(struct gnet_stats_queue);
+   if (!p->cpu_qstats) {
+err2:
+   free_percpu(p->cpu_bstats);
+   goto err1;
+   }
+   }
spin_lock_init(&p->tcfc_lock);
INIT_HLIST_NODE(&p->tcfc_head);
p->tcfc_index = index ? index : tcf_hash_new_index(hinfo);
p->tcfc_tm.install = jiffies;
p->tcfc_tm.lastuse = jiffies;
if (est) {
-   int err = gen_new_estimator(&p->tcfc_bstats, NULL,
-   &p->tcfc_rate_est,
-   &p->tcfc_lock, est);
+   err = gen_new_estimator(&p->tcfc_bstats, p->cpu_bstats,
+   &p->tcfc_rate_est,
+   &p->tcfc_lock, est);
if (err) {
-   kfree(p);
-   return err;
+   free_percpu(p->cpu_qstats);
+   goto err2;
}
}
 
@@ -615,10 +639,10 @@ int tcf_action_copy_stats(struct sk_buff *skb, struct tc_action *a,
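tcf_hash_create() now acquires up to three resources: the object, two per-CPU stats blocks, and optionally a rate estimator. On failure it must release them in reverse order. The patch places its err1/err2 labels inside the if blocks. The sketch below shows the same unwinding with the labels collected at the end of the function, which some readers may find easier to follow. It is a userspace model under stated assumptions: calloc() stands in for the per-CPU allocators, fail_at is a hypothetical failure-injection parameter, and all names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for tcf_common: the two per-CPU
 * blocks are modeled as plain heap allocations. */
struct common_model {
	unsigned long *cpu_bstats;
	unsigned long *cpu_qstats;
};

/* fail_at injects a failure: 1 = bstats alloc, 2 = qstats alloc,
 * 3 = estimator setup, 0 = none. Cleanup runs in reverse order. */
struct common_model *create_model(bool cpustats, int fail_at)
{
	struct common_model *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	if (cpustats) {
		p->cpu_bstats = fail_at == 1 ? NULL : calloc(4, sizeof(unsigned long));
		if (!p->cpu_bstats)
			goto err1;
		p->cpu_qstats = fail_at == 2 ? NULL : calloc(4, sizeof(unsigned long));
		if (!p->cpu_qstats)
			goto err2;
	}
	if (fail_at == 3)
		goto err3; /* models gen_new_estimator() failing */
	return p;

err3:
	free(p->cpu_qstats); /* NULL when cpustats is false: free() is a no-op */
err2:
	free(p->cpu_bstats);
err1:
	free(p);
	return NULL;
}

/* Like free_tcf(): free the per-CPU blocks, then the object itself. */
void destroy_model(struct common_model *p)
{
	free(p->cpu_bstats);
	free(p->cpu_qstats);
	free(p);
}
```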

[PATCH net-next 3/6] net_sched: act: make tcfg_pval non zero

2015-07-02 Thread Eric Dumazet

First step for gact RCU operation:

Instead of testing whether tcfg_pval is zero, just make sure it is at least 1.

No change in behavior, but slightly faster code.

Signed-off-by: Eric Dumazet eduma...@google.com
Cc: Alexei Starovoitov a...@plumgrid.com
Cc: Jamal Hadi Salim j...@mojatatu.com
Cc: John Fastabend john.fastab...@gmail.com
---
 net/sched/act_gact.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index a4f8af29ee30..42284aad77dd 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -28,14 +28,14 @@
 #ifdef CONFIG_GACT_PROB
 static int gact_net_rand(struct tcf_gact *gact)
 {
-   if (!gact->tcfg_pval || prandom_u32() % gact->tcfg_pval)
+   if (prandom_u32() % gact->tcfg_pval)
return gact->tcf_action;
return gact->tcfg_paction;
 }
 
 static int gact_determ(struct tcf_gact *gact)
 {
-   if (!gact->tcfg_pval || gact->tcf_bstats.packets % gact->tcfg_pval)
+   if (gact->tcf_bstats.packets % gact->tcfg_pval)
return gact->tcf_action;
return gact->tcfg_paction;
 }
@@ -105,7 +105,7 @@ static int tcf_gact_init(struct net *net, struct nlattr *nla,
 #ifdef CONFIG_GACT_PROB
if (p_parm) {
gact->tcfg_paction = p_parm->paction;
-   gact->tcfg_pval= p_parm->pval;
+   gact->tcfg_pval= max_t(u16, 1, p_parm->pval);
gact->tcfg_ptype   = p_parm->ptype;
}
 #endif
-- 
2.4.3.573.g4eafbef


[PATCH net-next 4/6] net_sched: act: use a separate packet counters for gact_determ()

2015-07-02 Thread Eric Dumazet

Second step for gact RCU operation:

We want to get rid of the spinlock protecting gact operations.
Stats (packets/bytes) will soon be per cpu.

gact_determ() would not work without a central packet counter,
so let's add one for this mode.

Signed-off-by: Eric Dumazet eduma...@google.com
Cc: Alexei Starovoitov a...@plumgrid.com
Cc: Jamal Hadi Salim j...@mojatatu.com
Cc: John Fastabend john.fastab...@gmail.com
---
 include/net/tc_act/tc_gact.h | 7 ---
 net/sched/act_gact.c | 4 +++-
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/net/tc_act/tc_gact.h b/include/net/tc_act/tc_gact.h
index 9fc9b578908a..592a6bc02b0b 100644
--- a/include/net/tc_act/tc_gact.h
+++ b/include/net/tc_act/tc_gact.h
@@ -6,9 +6,10 @@
 struct tcf_gact {
struct tcf_common   common;
 #ifdef CONFIG_GACT_PROB
-u16 tcfg_ptype;
-u16 tcfg_pval;
-int tcfg_paction;
+   u16 tcfg_ptype;
+   u16 tcfg_pval;
+   int tcfg_paction;
+   atomic_t packets;
 #endif
 };
 #define to_gact(a) \
diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index 42284aad77dd..e968290e8378 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -35,7 +35,9 @@ static int gact_net_rand(struct tcf_gact *gact)
 
 static int gact_determ(struct tcf_gact *gact)
 {
-   if (gact->tcf_bstats.packets % gact->tcfg_pval)
+   u32 pack = atomic_inc_return(&gact->packets);
+
+   if (pack % gact->tcfg_pval)
return gact->tcf_action;
return gact->tcfg_paction;
 }
-- 
2.4.3.573.g4eafbef
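gact_determ() applies paction to every pval-th packet, so it needs one counter shared by all CPUs. The patch uses atomic_inc_return() for that. Below is a userspace sketch using C11 atomics. atomic_fetch_add() returns the previous value, so 1 is added to match atomic_inc_return(), which returns the new one. The struct, gact_determ_step() and the action values are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of the deterministic gact mode with its own counter. */
struct gact_determ_model {
	atomic_uint packets;
	unsigned short pval; /* assumed >= 1, as patch 3/6 ensures */
	int action;          /* default action */
	int paction;         /* action taken on every pval-th packet */
};

int gact_determ_step(struct gact_determ_model *g)
{
	/* +1 turns fetch-then-add into the kernel's inc-then-return. */
	unsigned int pack = atomic_fetch_add(&g->packets, 1) + 1;

	assert(g->pval >= 1);
	if (pack % g->pval)
		return g->action;
	return g->paction;
}
```

With pval = 3, packets 1 and 2 get the default action and packet 3 gets paction.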


[PATCH net-next 5/6] net_sched: act: read tcfg_ptype once

2015-07-02 Thread Eric Dumazet

Third step for gact RCU operation:

The following patch will get rid of the spinlock protection,
so we need to read tcfg_ptype only once.

Signed-off-by: Eric Dumazet eduma...@google.com
Cc: Alexei Starovoitov a...@plumgrid.com
Cc: Jamal Hadi Salim j...@mojatatu.com
Cc: John Fastabend john.fastab...@gmail.com
---
 net/sched/act_gact.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index e968290e8378..7c7e72e95943 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -121,16 +121,16 @@ static int tcf_gact(struct sk_buff *skb, const struct tc_action *a,
struct tcf_result *res)
 {
struct tcf_gact *gact = a->priv;
-   int action = TC_ACT_SHOT;
+   int action = gact->tcf_action;
 
spin_lock(&gact->tcf_lock);
 #ifdef CONFIG_GACT_PROB
-   if (gact->tcfg_ptype)
-   action = gact_rand[gact->tcfg_ptype](gact);
-   else
-   action = gact->tcf_action;
-#else
-   action = gact->tcf_action;
+   {
+   u32 ptype = READ_ONCE(gact->tcfg_ptype);
+
+   if (ptype)
+   action = gact_rand[ptype](gact);
+   }
 #endif
gact->tcf_bstats.bytes += qdisc_pkt_len(skb);
gact->tcf_bstats.packets++;
-- 
2.4.3.573.g4eafbef
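Reading tcfg_ptype once matters because, without the lock, a concurrent configuration change could otherwise make the "is it non-zero" test and the gact_rand[] lookup see different values. In the userspace sketch below, a relaxed C11 atomic load roughly plays the role of READ_ONCE(). The dispatch table, pick_action() and always_alt() are hypothetical, and slot 0 is never called, as in the code above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef int (*rand_fn)(void);

/* Hypothetical handler returning an arbitrary alternate action. */
static int always_alt(void)
{
	return 7;
}

/* Slot 0 means "no probabilistic mode" and is never called. */
static const rand_fn rand_table[] = { NULL, always_alt };

int pick_action(atomic_int *ptype, int default_action)
{
	/* Load once; both the test and the index below use this snapshot. */
	int t = atomic_load_explicit(ptype, memory_order_relaxed);
	int action = default_action;

	assert(t >= 0 && t < (int)(sizeof(rand_table) / sizeof(rand_table[0])));
	if (t)
		action = rand_table[t]();
	return action;
}
```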


Re: 4.1 regression in resizable hashtable tests

2015-07-02 Thread Thomas Graf

On 07/01/15 at 01:21pm, Meelis Roos wrote:
 This is 4.1 on sparc64 - one of my boxes that happens to have most 
 runtime test left on from some debugging effort. In 4.0 it was fine, 4.1 
 gives this in dmesg:
 
 [   31.898697] Running resizable hashtable tests...
 [   31.898915]   Adding 2048 keys
 [   31.952911]   Traversal complete: counted=17, nelems=2048, entries=2048
 [   31.953004] Test failed: Total count mismatch ^^^
 [   32.022676]   Traversal complete: counted=17, nelems=2048, entries=2048
 [   32.022788] Test failed: Total count mismatch ^^^
 [   32.022828]   Deleting 2048 keys

Thanks for the report. I think this is already fixed. Can you try with the
following commit:

commit 246b23a7695bd5a457aa51a36a948cce53d1d477
Author: Thomas Graf tg...@suug.ch
Date:   Thu Apr 30 22:37:44 2015 +

rhashtable-test: Use walker to test bucket statistics

As resizes may continue to run in the background, use walker to
ensure we see all entries. Also print the encountered number
of rehashes queued up while traversing.

This may lead to warnings due to entries being seen multiple
times. We consider them non-fatal.

Signed-off-by: Thomas Graf tg...@suug.ch
Signed-off-by: David S. Miller da...@davemloft.net

[PATCH net-next 0/6] net_sched: act: lockless operation

2015-07-02 Thread Eric Dumazet

As mentioned by Alexei last week in Budapest, it is a bit weird
to take a spinlock in order to drop a packet in a tc filter...

Let's add percpu infrastructure for tc actions and use it for gact.

Before changes, my host with 8 RX queues was handling 5 Mpps,
and more than 10 Mpps after.

Signed-off-by: Eric Dumazet eduma...@google.com
Cc: Alexei Starovoitov a...@plumgrid.com
Cc: Jamal Hadi Salim j...@mojatatu.com
Cc: John Fastabend john.fastab...@gmail.com

Eric Dumazet (6):
  net: sched: extend percpu stats helpers
  net: sched: add percpu stats to actions
  net_sched: act: make tcfg_pval non zero
  net_sched: act: use a separate packet counters for gact_determ()
  net_sched: act: read tcfg_ptype once
  net_sched: act: remove spinlock in fast path

 include/net/act_api.h|  4 +++-
 include/net/sch_generic.h| 27 +--
 include/net/tc_act/tc_gact.h |  7 ---
 net/core/dev.c   |  4 ++--
 net/sched/act_api.c  | 44 ++--
 net/sched/act_bpf.c  |  2 +-
 net/sched/act_connmark.c |  3 ++-
 net/sched/act_csum.c |  3 ++-
 net/sched/act_gact.c | 35 ++-
 net/sched/act_ipt.c  |  2 +-
 net/sched/act_mirred.c   |  3 ++-
 net/sched/act_nat.c  |  3 ++-
 net/sched/act_pedit.c|  3 ++-
 net/sched/act_simple.c   |  3 ++-
 net/sched/act_skbedit.c  |  3 ++-
 net/sched/act_vlan.c |  3 ++-
 16 files changed, 96 insertions(+), 53 deletions(-)

-- 
2.4.3.573.g4eafbef


[PATCH net-next 1/6] net: sched: extend percpu stats helpers

2015-07-02 Thread Eric Dumazet

qdisc_bstats_update_cpu() and other helpers were added to support
percpu stats for qdisc.

We want to add percpu stats for tc actions, so this patch adds common
helpers.

qdisc_bstats_update_cpu() is renamed to qdisc_bstats_cpu_update()
qdisc_qstats_drop_cpu() is renamed to qdisc_qstats_cpu_drop()

Signed-off-by: Eric Dumazet eduma...@google.com
Cc: Alexei Starovoitov a...@plumgrid.com
Cc: Jamal Hadi Salim j...@mojatatu.com
Cc: John Fastabend john.fastab...@gmail.com
---
 include/net/sch_generic.h | 27 +--
 net/core/dev.c|  4 ++--
 2 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 2738f6f87908..0cd49b21b211 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -513,17 +513,21 @@ static inline void bstats_update(struct 
gnet_stats_basic_packed *bstats,
bstats-packets += skb_is_gso(skb) ? skb_shinfo(skb)-gso_segs : 1;
 }
 
-static inline void qdisc_bstats_update_cpu(struct Qdisc *sch,
-  const struct sk_buff *skb)
+static inline void bstats_cpu_update(struct gnet_stats_basic_cpu *bstats,
+const struct sk_buff *skb)
 {
-   struct gnet_stats_basic_cpu *bstats =
-   this_cpu_ptr(sch-cpu_bstats);
-
u64_stats_update_begin(bstats-syncp);
bstats_update(bstats-bstats, skb);
u64_stats_update_end(bstats-syncp);
 }
 
+static inline void qdisc_bstats_cpu_update(struct Qdisc *sch,
+  const struct sk_buff *skb)
+{
+   bstats_cpu_update(this_cpu_ptr(sch->cpu_bstats), skb);
+
+}
+
 static inline void qdisc_bstats_update(struct Qdisc *sch,
   const struct sk_buff *skb)
 {
@@ -547,16 +551,19 @@ static inline void __qdisc_qstats_drop(struct Qdisc *sch, int count)
sch->qstats.drops += count;
 }
 
-static inline void qdisc_qstats_drop(struct Qdisc *sch)
+static inline void qstats_drop_inc(struct gnet_stats_queue *qstats)
 {
-   sch->qstats.drops++;
+   qstats->drops++;
 }
 
-static inline void qdisc_qstats_drop_cpu(struct Qdisc *sch)
+static inline void qdisc_qstats_drop(struct Qdisc *sch)
 {
-   struct gnet_stats_queue *qstats = this_cpu_ptr(sch->cpu_qstats);
+   qstats_drop_inc(&sch->qstats);
+}
 
-   qstats->drops++;
+static inline void qdisc_qstats_cpu_drop(struct Qdisc *sch)
+{
+   qstats_drop_inc(this_cpu_ptr(sch->cpu_qstats));
 }
 
 static inline void qdisc_qstats_overlimit(struct Qdisc *sch)
diff --git a/net/core/dev.c b/net/core/dev.c
index 6778ad52..e0d270143fc7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3646,7 +3646,7 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
 
qdisc_skb_cb(skb)->pkt_len = skb->len;
skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
-   qdisc_bstats_update_cpu(cl->q, skb);
+   qdisc_bstats_cpu_update(cl->q, skb);
 
switch (tc_classify(skb, cl, &cl_res)) {
case TC_ACT_OK:
@@ -3654,7 +3654,7 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
skb->tc_index = TC_H_MIN(cl_res.classid);
break;
case TC_ACT_SHOT:
-   qdisc_qstats_drop_cpu(cl->q);
+   qdisc_qstats_cpu_drop(cl->q);
case TC_ACT_STOLEN:
case TC_ACT_QUEUED:
kfree_skb(skb);
-- 
2.4.3.573.g4eafbef
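The refactoring splits each helper into a generic function that takes the stats structure directly and a thin qdisc wrapper that picks the per-CPU copy, which is what lets tc actions reuse the generic half. A minimal userspace model of that layering, assuming an explicit CPU index in place of this_cpu_ptr() and omitting the u64_stats_update_begin()/end() sequence counter (every `model_` name here is hypothetical, not a kernel API):

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4 /* hypothetical CPU count for this model */

/* Hypothetical stand-ins for gnet_stats_basic_packed and gnet_stats_queue. */
struct model_bstats { uint64_t bytes; uint32_t packets; };
struct model_qstats { uint32_t drops; };

/* Hypothetical stand-in for struct Qdisc: one stats slot per CPU. */
struct model_qdisc {
	struct model_bstats cpu_bstats[MODEL_NR_CPUS];
	struct model_qstats cpu_qstats[MODEL_NR_CPUS];
};

/* Generic helpers: they update whichever stats block they are handed,
 * so an action with its own per-CPU block could call them as well. */
static void model_bstats_cpu_update(struct model_bstats *b, uint32_t len,
				    uint32_t segs)
{
	b->bytes += len;
	b->packets += segs;
}

static void model_qstats_drop_inc(struct model_qstats *q)
{
	q->drops++;
}

/* Qdisc wrappers: select the current CPU's slot (this_cpu_ptr() in the
 * kernel, an explicit index here), then delegate to the generic helper. */
static void model_qdisc_bstats_cpu_update(struct model_qdisc *sch, int cpu,
					  uint32_t len, uint32_t segs)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_bstats_cpu_update(&sch->cpu_bstats[cpu], len, segs);
}

static void model_qdisc_qstats_cpu_drop(struct model_qdisc *sch, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_qstats_drop_inc(&sch->cpu_qstats[cpu]);
}

/* A reader folds the per-CPU slots into a single total. */
static uint64_t model_total_packets(const struct model_qdisc *sch)
{
	uint64_t sum = 0;

	for (int i = 0; i < MODEL_NR_CPUS; i++)
		sum += sch->cpu_bstats[i].packets;
	return sum;
}
```

In the kernel the generic helper receives the sk_buff and derives bytes and GSO segments itself; the model passes them in to stay self-contained.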


[PATCH net] net_sched: gen_estimator: extend pps limit

2015-07-02 Thread Eric Dumazet

From: Eric Dumazet eduma...@google.com

Rate estimators are limited to 4 Mpps, which was fine years ago, but is
too small for the current hardware generation.

Let's use 2^5 scaling instead of 2^10 to get a new limit of 128 Mpps.

On 64-bit arches, use an unsigned long for temporary storage and remove
the limit. (We do not expect 32-bit arches to be able to reach this point.)

Tested:

tc -s -d filter sh dev eth0 parent :

filter protocol ip pref 1 u32 
filter protocol ip pref 1 u32 fh 800: ht divisor 1 
filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15 
  match 0700/ff00 at 12
action order 1: gact action drop
 random type none pass val 0
 index 1 ref 1 bind 1 installed 166 sec
Action statistics:
Sent 39734251496 bytes 863788076 pkt (dropped 863788117, overlimits 0 requeues 0) 
rate 4067Mbit 11053596pps backlog 0b 0p requeues 0 

Signed-off-by: Eric Dumazet eduma...@google.com
---
 net/core/gen_estimator.c |   13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index 9dfb88a933e7..92d886f4adcb 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -66,7 +66,7 @@
 
NOTES.
 
-   * avbps is scaled by 2^5, avpps is scaled by 2^10.
+   * avbps and avpps are scaled by 2^5.
* both values are reported as 32 bit unsigned values. bps can
  overflow for fast links : max speed being 34360Mbit/sec
* Minimal interval is HZ/4=250msec (it is the greatest common divisor
@@ -85,10 +85,10 @@ struct gen_estimator
struct gnet_stats_rate_est64*rate_est;
spinlock_t  *stats_lock;
int ewma_log;
+   u32 last_packets;
+   unsigned long   avpps;
u64 last_bytes;
u64 avbps;
-   u32 last_packets;
-   u32 avpps;
struct rcu_head e_rcu;
struct rb_node  node;
struct gnet_stats_basic_cpu __percpu *cpu_bstats;
@@ -118,8 +118,8 @@ static void est_timer(unsigned long arg)
rcu_read_lock();
list_for_each_entry_rcu(e, &elist[idx].list, list) {
struct gnet_stats_basic_packed b = {0};
+   unsigned long rate;
u64 brate;
-   u32 rate;
 
spin_lock(e->stats_lock);
read_lock(&est_lock);
@@ -133,10 +133,11 @@ static void est_timer(unsigned long arg)
e->avbps += (brate >> e->ewma_log) - (e->avbps >> e->ewma_log);
e->rate_est->bps = (e->avbps+0xF)>>5;
 
-   rate = (b.packets - e->last_packets)<<(12 - idx);
+   rate = b.packets - e->last_packets;
+   rate <<= (7 - idx);
e->last_packets = b.packets;
e->avpps += (rate >> e->ewma_log) - (e->avpps >> e->ewma_log);
-   e->rate_est->pps = (e->avpps+0x1FF)>>10;
+   e->rate_est->pps = (e->avpps + 0xF) >> 5;
 skip:
read_unlock(&est_lock);
spin_unlock(e->stats_lock);

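To make the new limit concrete: a u32 holding pps << 10 tops out at 2^22 - 1 (about 4 Mpps), while pps << 5 leaves room for 2^27 - 1 (about 128 Mpps). The sketch below is a standalone model of the patched packet-rate path only, assuming the estimator's interval encoding ((HZ/4) << idx, i.e. 2^(idx-2) seconds) and using a hypothetical struct est_model instead of struct gen_estimator; locking and the byte-rate path are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Largest pps a u32 can report at each scaling (stored value = pps << shift). */
#define MODEL_PPS_LIMIT_SHIFT10 (UINT32_MAX >> 10) /* 4194303, about 4 Mpps */
#define MODEL_PPS_LIMIT_SHIFT5  (UINT32_MAX >> 5)  /* 134217727, about 128 Mpps */

/* Hypothetical standalone model of the pps fields of the estimator. */
struct est_model {
	int ewma_log;          /* EWMA weight is 2^-ewma_log */
	uint32_t last_packets; /* packet counter at the previous sample */
	unsigned long avpps;   /* smoothed packets/sec, scaled by 2^5 */
};

/* One timer tick: the interval is 2^(idx - 2) seconds, so shifting the
 * packet delta left by (7 - idx) gives packets/sec scaled by 2^5. */
static uint32_t est_pps_sample(struct est_model *e, uint32_t packets, int idx)
{
	unsigned long rate;

	assert(idx >= 0 && idx <= 5);     /* keeps the shift count non-negative */
	rate = packets - e->last_packets; /* u32 subtraction tolerates counter wrap */
	rate <<= (7 - idx);
	e->last_packets = packets;
	e->avpps += (rate >> e->ewma_log) - (e->avpps >> e->ewma_log);
	return (uint32_t)((e->avpps + 0xF) >> 5); /* round, then drop the 2^5 scale */
}
```

On a 64-bit host the scaled average in unsigned long has headroom far beyond the reported u32; on a 32-bit host unsigned long is still 32 bits, so the same ~128 Mpps bound applies there.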


[PATCH] net-ipvs: Delete an unnecessary check before the function call module_put

2015-07-02 Thread SF Markus Elfring

From: Markus Elfring elfr...@users.sourceforge.net
Date: Thu, 2 Jul 2015 17:00:14 +0200

The module_put() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring elfr...@users.sourceforge.net
---
 net/netfilter/ipvs/ip_vs_sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_sched.c b/net/netfilter/ipvs/ip_vs_sched.c
index 199760c..e50221b 100644
--- a/net/netfilter/ipvs/ip_vs_sched.c
+++ b/net/netfilter/ipvs/ip_vs_sched.c
@@ -137,7 +137,7 @@ struct ip_vs_scheduler *ip_vs_scheduler_get(const char *sched_name)
 
 void ip_vs_scheduler_put(struct ip_vs_scheduler *scheduler)
 {
-   if (scheduler && scheduler->module)
+   if (scheduler)
module_put(scheduler->module);
 }
 
-- 
2.4.5
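The cleanup relies on module_put() returning early for a NULL module, so only the scheduler pointer itself still needs a check. A small userspace model of that pattern (all names hypothetical; a plain int stands in for the module reference count):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct module and struct ip_vs_scheduler. */
struct model_module { int refcnt; };
struct model_scheduler { struct model_module *module; };

/* Like module_put(), this release helper treats NULL as "nothing to do". */
static void model_module_put(struct model_module *mod)
{
	if (!mod)
		return;
	assert(mod->refcnt > 0);
	mod->refcnt--;
}

/* After the patch only the scheduler pointer is tested; a NULL ->module
 * (for example a built-in scheduler, where THIS_MODULE is NULL) is
 * handled inside the put helper. */
static void model_scheduler_put(struct model_scheduler *sched)
{
	if (sched)
		model_module_put(sched->module);
}
```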


Re: [PATCH net-next 2/6] net: sched: add percpu stats to actions

2015-07-02 Thread John Fastabend

On 15-07-02 06:07 AM, Eric Dumazet wrote:
 Reuse existing percpu infrastructure John added for qdisc.
 
 This patch adds a new cpustats parameter to tcf_hash_create() and all
 actions pass false, meaning this patch should have no effect yet.
 
 Signed-off-by: Eric Dumazet eduma...@google.com
 Cc: Alexei Starovoitov a...@plumgrid.com
 Cc: Jamal Hadi Salim j...@mojatatu.com
 Cc: John Fastabend john.fastab...@gmail.com
 ---
  include/net/act_api.h|  4 +++-
  net/sched/act_api.c  | 44 ++--
  net/sched/act_bpf.c  |  2 +-
  net/sched/act_connmark.c |  3 ++-
  net/sched/act_csum.c |  3 ++-
  net/sched/act_gact.c |  3 ++-
  net/sched/act_ipt.c  |  2 +-
  net/sched/act_mirred.c   |  3 ++-
  net/sched/act_nat.c  |  3 ++-
  net/sched/act_pedit.c|  3 ++-
  net/sched/act_simple.c   |  3 ++-
  net/sched/act_skbedit.c  |  3 ++-
  net/sched/act_vlan.c |  3 ++-
  13 files changed, 57 insertions(+), 22 deletions(-)
 

Nice.

Acked-by: John Fastabend john.r.fastab...@intel.com

Re: [PATCH net-next 3/6] net_sched: act: make tcfg_pval non zero

2015-07-02 Thread John Fastabend

On 15-07-02 06:07 AM, Eric Dumazet wrote:
 First step for gact RCU operation:
 
 Instead of testing if tcfg_pval is zero or not, just make sure it is at least 1.
 
 No change in behavior, but slightly faster code.
 
 Signed-off-by: Eric Dumazet eduma...@google.com
 Cc: Alexei Starovoitov a...@plumgrid.com
 Cc: Jamal Hadi Salim j...@mojatatu.com
 Cc: John Fastabend john.fastab...@gmail.com
 ---
  net/sched/act_gact.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)
 
 diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
 index a4f8af29ee30..42284aad77dd 100644
 --- a/net/sched/act_gact.c
 +++ b/net/sched/act_gact.c

Acked-by: John Fastabend john.r.fastab...@intel.com

 @@ -28,14 +28,14 @@
  #ifdef CONFIG_GACT_PROB
  static int gact_net_rand(struct tcf_gact *gact)
  {
 - if (!gact->tcfg_pval || prandom_u32() % gact->tcfg_pval)
 + if (prandom_u32() % gact->tcfg_pval)
   return gact->tcf_action;
   return gact->tcfg_paction;
  }
  
  static int gact_determ(struct tcf_gact *gact)
  {
 - if (!gact->tcfg_pval || gact->tcf_bstats.packets % gact->tcfg_pval)
 + if (gact->tcf_bstats.packets % gact->tcfg_pval)
   return gact->tcf_action;
   return gact->tcfg_paction;
  }
 @@ -105,7 +105,7 @@ static int tcf_gact_init(struct net *net, struct nlattr *nla,
  #ifdef CONFIG_GACT_PROB
   if (p_parm) {
   gact->tcfg_paction = p_parm->paction;
 - gact->tcfg_pval = p_parm->pval;
 + gact->tcfg_pval = max_t(u16, 1, p_parm->pval);
   gact->tcfg_ptype   = p_parm->ptype;
   }
  #endif
 

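Clamping at configuration time moves the zero check off the per-packet path: once tcfg_pval is at least 1, `% tcfg_pval` is always defined. A hypothetical userspace model of the deterministic mode under that invariant (struct and constant names are illustrative; they only mimic the tcf_gact fields quoted above):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical return codes standing in for TC_ACT_* values. */
enum { MODEL_ACT_OK = 0, MODEL_ACT_SHOT = 2 };

struct model_gact {
	int action;    /* default action (tcf_action) */
	int paction;   /* alternate action (tcfg_paction) */
	uint16_t pval; /* every pval-th packet takes paction; never 0 */
};

/* Mirrors max_t(u16, 1, pval) at init time: storing at least 1 keeps
 * the divisor valid, so the fast path needs no zero test. */
static void model_gact_init(struct model_gact *g, int action, int paction,
			    uint16_t pval)
{
	g->action = action;
	g->paction = paction;
	g->pval = pval > 1 ? pval : 1;
}

/* Deterministic mode: alternate action when the count is a multiple of pval. */
static int model_gact_determ(const struct model_gact *g, uint64_t packets)
{
	assert(g->pval >= 1); /* guaranteed by model_gact_init() */
	if (packets % g->pval)
		return g->action;
	return g->paction;
}
```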

Re: [PATCH net-next 4/6] net_sched: act: use a separate packet counters for gact_determ()

2015-07-02 Thread John Fastabend

On 15-07-02 06:07 AM, Eric Dumazet wrote:
 Second step for gact RCU operation:
 
 We want to get rid of the spinlock protecting gact operations.
 Stats (packets/bytes) will soon be per cpu.
 
 gact_determ() would not work without a central packet counter,
 so let's add it for this mode.
 
 Signed-off-by: Eric Dumazet eduma...@google.com
 Cc: Alexei Starovoitov a...@plumgrid.com
 Cc: Jamal Hadi Salim j...@mojatatu.com
 Cc: John Fastabend john.fastab...@gmail.com
 ---

Acked-by: John Fastabend john.r.fastab...@intel.com
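
Once packet and byte stats become per-CPU, no single stats field counts every packet, so "every Nth packet" needs its own shared counter. The patch body is not quoted in this reply, so the following is only a sketch of one lock-free way to keep such a counter, using C11 atomics and hypothetical names; it is not necessarily how the patch implements it:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model: one shared counter, independent of per-CPU stats. */
struct model_determ {
	atomic_uint packets; /* central count of packets seen by this action */
	unsigned int pval;   /* >= 1: every pval-th packet takes the alternate action */
};

/* Returns 1 when this packet should take the alternate action.
 * atomic_fetch_add() returns the previous value, so adding 1 gives this
 * packet's sequence number; concurrent callers each get a distinct one
 * without holding a spinlock. */
static int model_determ_hit(struct model_determ *d)
{
	unsigned int pack = atomic_fetch_add(&d->packets, 1u) + 1u;

	assert(d->pval >= 1);
	return pack % d->pval == 0;
}
```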


RE: [v2,5/9] fsl/fman: Add Frame Manager support

2015-07-02 Thread Liberman Igal

Hi Scott,
Thank you for your feedback; please take a look at my comments/questions.

Regards,
Igal Liberman.

 -Original Message-
 From: Wood Scott-B07421
 Sent: Friday, June 26, 2015 6:55 AM
 To: Liberman Igal-B31950
 Cc: netdev@vger.kernel.org; linuxppc-...@lists.ozlabs.org; Bucur Madalin-
 Cristian-B32716; pebo...@tiscali.nl
 Subject: Re: [v2,5/9] fsl/fman: Add Frame Manager support
 
 On Wed, 2015-06-24 at 22:35 +0300, igal.liber...@freescale.com wrote:
  From: Igal Liberman igal.liber...@freescale.com
 
  Add Frame Manager Driver support.
  This patch adds the FMan configuration, initialization and runtime
  control routines.
 
  Signed-off-by: Igal Liberman igal.liber...@freescale.com
  ---
   drivers/net/ethernet/freescale/fman/Kconfig|   35 +
   drivers/net/ethernet/freescale/fman/Makefile   |2 +-
   drivers/net/ethernet/freescale/fman/fm.c   | 1406
   drivers/net/ethernet/freescale/fman/fm.h   |  394 ++
   drivers/net/ethernet/freescale/fman/fm_common.h|  142 ++
   drivers/net/ethernet/freescale/fman/fm_drv.c   |  701 ++
   drivers/net/ethernet/freescale/fman/fm_drv.h   |  116 ++
   drivers/net/ethernet/freescale/fman/inc/enet_ext.h |  199 +++
   drivers/net/ethernet/freescale/fman/inc/fm_ext.h   |  488 +++
   .../net/ethernet/freescale/fman/inc/fsl_fman_drv.h |   99 ++
   drivers/net/ethernet/freescale/fman/inc/service.h  |   55 +
   11 files changed, 3636 insertions(+), 1 deletion(-)
    create mode 100644 drivers/net/ethernet/freescale/fman/fm.c
   create mode 100644 drivers/net/ethernet/freescale/fman/fm.h
   create mode 100644 drivers/net/ethernet/freescale/fman/fm_common.h
   create mode 100644 drivers/net/ethernet/freescale/fman/fm_drv.c
   create mode 100644 drivers/net/ethernet/freescale/fman/fm_drv.h
   create mode 100644 drivers/net/ethernet/freescale/fman/inc/enet_ext.h
   create mode 100644 drivers/net/ethernet/freescale/fman/inc/fm_ext.h
    create mode 100644 drivers/net/ethernet/freescale/fman/inc/fsl_fman_drv.h
   create mode 100644 drivers/net/ethernet/freescale/fman/inc/service.h
 
 Again, please start with something pared down, without extraneous
 features, but *with* enough functionality to actually pass packets around.
 Getting this thing into decent shape is going to be hard enough without
 carrying around the excess baggage.
 
  diff --git a/drivers/net/ethernet/freescale/fman/Kconfig b/drivers/net/ethernet/freescale/fman/Kconfig
  index 825a0d5..12c75bfd 100644
  --- a/drivers/net/ethernet/freescale/fman/Kconfig
  +++ b/drivers/net/ethernet/freescale/fman/Kconfig
  @@ -7,3 +7,38 @@ config FSL_FMAN
Freescale Data-Path Acceleration Architecture Frame Manager
(FMan) support
 
  +if FSL_FMAN
  +
  +config FSL_FM_MAX_FRAME_SIZE
  + int "Maximum L2 frame size"
  + range 64 9600
  + default 1522
  + help
  + Configure this in relation to the maximum possible MTU of your
  + network configuration. In particular, one would need to
  + increase this value in order to use jumbo frames.
  + FSL_FM_MAX_FRAME_SIZE must accommodate the Ethernet FCS
  + (4 bytes) and one ETH+VLAN header (18 bytes), to a total of
  + 22 bytes in excess of the desired L3 MTU.
  +
  + Note that having too large a FSL_FM_MAX_FRAME_SIZE (much larger
  + than the actual MTU) may lead to buffer exhaustion, especially
  + in the case of badly fragmented datagrams on the Rx path.
  + Conversely, having a FSL_FM_MAX_FRAME_SIZE smaller than the
  + actual MTU will lead to frames being dropped.
 
 Scatter gather can't be used for jumbo frames?
 

Scatter/gather is supported; it is introduced in dpaa_eth as a separate patch
on top of the basic support.
dpaa_eth can either work in S/G mode or use large buffers, sized to the maximum
frame size, to reduce S/G overhead (a performance vs. memory usage trade-off).

 Why is this a compile-time option?
 

This is needed for a couple of reasons:
 - FMan resource sizing: we need to know the maximum frame size we plan to use
   in order to determine the Rx FIFO sizes at config time.
 - There are issues when changing the MAC maximum frame size at runtime, hence
   the need to set the maximum allowable size in HW and compensate in SW (drop
   frames above the configured MTU).

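The sizing rule in the FSL_FM_MAX_FRAME_SIZE help text (L3 MTU plus 4 bytes of FCS plus an 18-byte ETH+VLAN header) and the FSL_FM_RX_EXTRA_HEADROOM constraints can be checked with a few lines of arithmetic. The helper names below are hypothetical and not part of the FMan driver; model_fm_should_drop() only illustrates the "drop frames above the set MTU" software check described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Overhead named in the Kconfig help: FCS (4) + one ETH+VLAN header (18). */
#define MODEL_FM_L2_OVERHEAD (4 + 18)

/* Smallest FSL_FM_MAX_FRAME_SIZE that carries a given L3 MTU. */
static int model_fm_frame_size_for_mtu(int mtu)
{
	assert(mtu > 0);
	return mtu + MODEL_FM_L2_OVERHEAD;
}

/* Software-side filter: the hardware accepts frames up to the compile-time
 * maximum, so frames longer than the current MTU allows are dropped here. */
static bool model_fm_should_drop(int frame_len, int mtu)
{
	return frame_len > model_fm_frame_size_for_mtu(mtu);
}

/* FSL_FM_RX_EXTRA_HEADROOM constraints from Kconfig: 16..384, multiple of 16. */
static bool model_fm_headroom_valid(int headroom)
{
	return headroom >= 16 && headroom <= 384 && headroom % 16 == 0;
}
```

With the standard 1500-byte MTU this reproduces the Kconfig default of 1522, and the 9600 upper bound corresponds to an L3 MTU of 9578.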
  +
  +config FSL_FM_RX_EXTRA_HEADROOM
  + int "Add extra headroom at beginning of data buffers"
  + range 16 384
  + default 64
  + help
  + Configure this to tell the Frame Manager to reserve some extra
  + space at the beginning of a data buffer on the receive path,
  + before Internal Context fields are copied. This is in addition
  + to the private data area already reserved for driver internal
  + use. The provided value must be a multiple of 16.
  +
  + This option does not affect in any way the

Re: [PATCH net-next 1/6] net: sched: extend percpu stats helpers

2015-07-02 Thread John Fastabend

On 15-07-02 06:07 AM, Eric Dumazet wrote:
 qdisc_bstats_update_cpu() and other helpers were added to support
 percpu stats for qdisc.
 
 We want to add percpu stats for tc action, so this patch add common
 helpers.
 
 qdisc_bstats_update_cpu() is renamed to qdisc_bstats_cpu_update()
 qdisc_qstats_drop_cpu() is renamed to qdisc_qstats_cpu_drop()
 
 Signed-off-by: Eric Dumazet eduma...@google.com
 Cc: Alexei Starovoitov a...@plumgrid.com
 Cc: Jamal Hadi Salim j...@mojatatu.com
 Cc: John Fastabend john.fastab...@gmail.com
 ---

Acked-by: John Fastabend john.r.fastab...@intel.com

stupid nit below,

  include/net/sch_generic.h | 27 +--
  net/core/dev.c|  4 ++--
  2 files changed, 19 insertions(+), 12 deletions(-)
 
 diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
 index 2738f6f87908..0cd49b21b211 100644
 --- a/include/net/sch_generic.h
 +++ b/include/net/sch_generic.h
 @@ -513,17 +513,21 @@ static inline void bstats_update(struct gnet_stats_basic_packed *bstats,
  bstats->packets += skb_is_gso(skb) ? skb_shinfo(skb)->gso_segs : 1;
  }
  
 -static inline void qdisc_bstats_update_cpu(struct Qdisc *sch,
 -const struct sk_buff *skb)
 +static inline void bstats_cpu_update(struct gnet_stats_basic_cpu *bstats,
 +  const struct sk_buff *skb)
  {
 - struct gnet_stats_basic_cpu *bstats =
 - this_cpu_ptr(sch->cpu_bstats);
 -
  u64_stats_update_begin(&bstats->syncp);
  bstats_update(&bstats->bstats, skb);
  u64_stats_update_end(&bstats->syncp);
  }
  
 +static inline void qdisc_bstats_cpu_update(struct Qdisc *sch,
 +const struct sk_buff *skb)
 +{
 + bstats_cpu_update(this_cpu_ptr(sch->cpu_bstats), skb);
 +

spurious new line.

 +}
 +


Re: [PATCH net] bridge: vlan: fix usage of vlan 0 and 4095 again

2015-07-02 Thread Toshiaki Makita

On 15/07/02 (Thu) 21:48, Nikolay Aleksandrov wrote:
 Vlan ids 0 and 4095 were disallowed by commit:
 8adff41c3d25 ("bridge: Don't use VID 0 and 4095 in vlan filtering")
 but then the check was removed when vlan ranges were introduced by:
 bdced7ef7838 ("bridge: support for multiple vlans and vlan ranges in setlink and dellink requests")
 So reintroduce the vlan range check.
 Before patch:
 [root@testvm ~]# bridge vlan add vid 0 dev eth0 master
 (succeeds)
 After Patch:
 [root@testvm ~]# bridge vlan add vid 0 dev eth0 master
 RTNETLINK answers: Invalid argument
 
 Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
 Fixes: bdced7ef7838 ("bridge: support for multiple vlans and vlan ranges in setlink and dellink requests")

Thank you for fixing this.

Acked-by: Toshiaki Makita toshiaki.maki...@gmail.com
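
The rule being restored is that only VIDs 1 through 4094 are usable. As a minimal sketch (function names are hypothetical, not the bridge code's), a per-VID check and the range form needed once setlink accepts vlan ranges might look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* VIDs are 12 bits: 0 marks a priority-tagged frame with no VLAN and
 * 4095 (0xFFF) is reserved, so only 1..4094 can be filtered on. */
#define MODEL_VID_MIN 1
#define MODEL_VID_MAX 4094

static bool model_vid_valid(uint16_t vid)
{
	return vid >= MODEL_VID_MIN && vid <= MODEL_VID_MAX;
}

/* Hypothetical range check: both ends must be valid and in order. */
static bool model_vid_range_valid(uint16_t begin, uint16_t end)
{
	return model_vid_valid(begin) && model_vid_valid(end) && begin <= end;
}
```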

Re: [PATCH net-next 5/6] net_sched: act: read tcfg_ptype once

2015-07-02 Thread John Fastabend

On 15-07-02 06:07 AM, Eric Dumazet wrote:
 Third step for gact RCU operation :
 
 Following patch will get rid of spinlock protection,
 so we need to read tcfg_ptype once.
 
 Signed-off-by: Eric Dumazet eduma...@google.com
 Cc: Alexei Starovoitov a...@plumgrid.com
 Cc: Jamal Hadi Salim j...@mojatatu.com
 Cc: John Fastabend john.fastab...@gmail.com
 ---
  net/sched/act_gact.c | 14 +++---
  1 file changed, 7 insertions(+), 7 deletions(-)
 
 diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
 index e968290e8378..7c7e72e95943 100644
 --- a/net/sched/act_gact.c
 +++ b/net/sched/act_gact.c
 @@ -121,16 +121,16 @@ static int tcf_gact(struct sk_buff *skb, const struct tc_action *a,
   struct tcf_result *res)
  {
  struct tcf_gact *gact = a->priv;
 - int action = TC_ACT_SHOT;
 + int action = gact->tcf_action;
  
  spin_lock(&gact->tcf_lock);
  #ifdef CONFIG_GACT_PROB
 - if (gact->tcfg_ptype)
 - action = gact_rand[gact->tcfg_ptype](gact);
 - else
 - action = gact->tcf_action;
 -#else
 - action = gact->tcf_action;
 + {
 + u32 ptype = READ_ONCE(gact->tcfg_ptype);
 +
 + if (ptype)
 + action = gact_rand[ptype](gact);
 + }
  #endif
  gact->tcf_bstats.bytes += qdisc_pkt_len(skb);
  gact->tcf_bstats.packets++;
 

Acked-by: John Fastabend john.r.fastab...@intel.com
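
The reason for reading tcfg_ptype once: without the spinlock, a concurrent reconfiguration could change the field between the `if (ptype)` test and the table lookup, so two separate loads might disagree. A userspace model using a C11 relaxed atomic load as a rough analogue of READ_ONCE() (all names hypothetical; slot 0 of the table is deliberately NULL so that a mismatched second read would be fatal):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct model_gact;
typedef int (*model_rand_fn)(struct model_gact *);

/* Hypothetical stand-in for struct tcf_gact; ptype may change concurrently. */
struct model_gact {
	int action;      /* default action */
	int paction;     /* alternate action */
	atomic_int ptype; /* 0 = no random mode, otherwise an index into model_rand[] */
};

static int model_always_alt(struct model_gact *g)
{
	return g->paction;
}

/* Slot 0 is intentionally NULL: it must never be called. */
static model_rand_fn const model_rand[] = { NULL, model_always_alt };

static int model_tcf_gact(struct model_gact *g)
{
	int action = g->action;
	/* Load ptype exactly once, so the value tested is the value used as
	 * the index; a second load could observe 0 and call a NULL slot. */
	int ptype = atomic_load_explicit(&g->ptype, memory_order_relaxed);

	if (ptype)
		action = model_rand[ptype](g);
	return action;
}
```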

[PATCH] net-ipv6: Delete an unnecessary check before the function call free_percpu

2015-07-02 Thread SF Markus Elfring

From: Markus Elfring elfr...@users.sourceforge.net
Date: Thu, 2 Jul 2015 16:30:24 +0200

The free_percpu() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring elfr...@users.sourceforge.net
---
 net/ipv6/route.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 1a1122a..6090969 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -369,10 +369,7 @@ static void ip6_dst_destroy(struct dst_entry *dst)
struct inet6_dev *idev;
 
dst_destroy_metrics_generic(dst);
-
-   if (rt->rt6i_pcpu)
-   free_percpu(rt->rt6i_pcpu);
-
+   free_percpu(rt->rt6i_pcpu);
rt6_uncached_list_del(rt);
 
idev = rt->rt6i_idev;
-- 
2.4.5


RE: amd-xgbe e0700000.xgmac: DMA-API: device driver tries to sync DMA memory it has not allocated

2015-07-02 Thread Lendacky, Thomas

Hi Kim,

Yup, no problem.  I think I know what the issue is.  I should be using 
dma_sync_single_range_for_cpu() instead of dma_sync_single_for_cpu().  The page 
allocations that the driver is doing have crossed a 1MB boundary, causing the 
warning, because I'm using a calculated DMA address rather than the base DMA 
address plus an offset.

I'll try to reproduce it so I can verify, but I believe that is the issue.

Is there any information on what was being run to trigger this?

Thanks,
Tom

-Original Message-
From: Kim Phillips [mailto:kim.phill...@arm.com] 
Sent: Thursday, July 02, 2015 2:02 PM
To: Lendacky, Thomas
Cc: netdev@vger.kernel.org
Subject: amd-xgbe e0700000.xgmac: DMA-API: device driver tries to sync DMA memory it has not allocated

Hi Tom,

A pristine 4.1 kernel with CONFIG_DMA_API_DEBUG=y produces this call
trace on an AMD Seattle:

[  112.896576] [ cut here ]
[  112.896591] WARNING: CPU: 2 PID: 1059 at lib/dma-debug.c:1202 check_sync+0x138/0x56c()
[  112.896597] amd-xgbe e0700000.xgmac: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x008003d52000] [size=1536 bytes]
[  112.896600] Modules linked in: cpufreq_stats vfat fat xfs libcrc32c spi_pl022 aes_ce_blk ablk_helper cryptd aes_ce_cipher ghash_ce sha2_ce sha1_ce uio_pdrv_genirq uio fuse
[  112.896634] CPU: 2 PID: 1059 Comm: sshd Tainted: GW   4.1.0 #10
[  112.896638] Hardware name: Default string Default string/Default string, BIOS ROD0082B 06/16/2015
[  112.896641] Call trace:
[  112.899086] [fe097b20] dump_backtrace+0x0/0x170
[  112.899091] [fe097cb0] show_stack+0x20/0x2c
[  112.899097] [fe813da0] dump_stack+0x8c/0xc4
[  112.899102] [fe0c45bc] warn_slowpath_common+0xa0/0xd8
[  112.899106] [fe0c4668] warn_slowpath_fmt+0x74/0x88
[  112.899109] [fe486f24] check_sync+0x134/0x56c
[  112.899113] [fe4873ac] debug_dma_sync_single_for_cpu+0x50/0x5c
[  112.899119] [fe5b07fc] xgbe_rx_poll+0x1e0/0x6c0
[  112.899123] [fe5b1cdc] xgbe_one_poll+0x34/0x6c
[  112.899128] [fe6a510c] net_rx_action+0x270/0x504
[  112.899133] [fe0c9ba0] __do_softirq+0x120/0x60c
[  112.899136] [fe0ca3fc] irq_exit+0xa4/0xe4
[  112.899143] [fe12d2f8] __handle_domain_irq+0x74/0xc4
[  112.899146] [fe0903ec] gic_handle_irq+0x38/0x84
[  112.899150] Exception stack(0xfe03de09bbf0 to 0xfe03de09bd10)
[  112.899154] bbe0: 1ee64b30 fe00 
002111e8 fe00
[  112.899158] bc00: de09bd30 fe03 0081a668 fe00 dd54fa00 fe03 
de09bcb0 fe03
[  112.899162] bc20: 0001  0001    
 
[  112.899166] bc40:   002111e8 fe00   
de098000 fe03
[  112.899170] bc60: de09bc10 fe03 1d456228  0076  
0008 
[  112.899174] bc80: 1000 00077d2a  001dcd65 00213e98 fe00 
96ea8f60 03ff
[  112.899178] bca0: 65b9a2e2  1ee64b30 fe00 002111e8 fe00 
1ee64ba0 fe00
[  112.899181] bcc0: 0041  1ee10b80 fe00 025c58c8 fe00 
0003 
[  112.899185] bce0: 027b6c48 fe00 00100077  02ab0b4b  
de09bd30 fe03
[  112.899188] bd00: 0081a660 fe00 de09bd30 fe03
[  112.899192] [fe0934e8] el1_irq+0x68/0x100
[  112.899198] [fe2111e4] validate_mm+0x44/0x2d4
[  112.899203] [fe211edc] vma_link+0x98/0xe0
[  112.899206] [fe213e70] do_brk+0x2ec/0x314
[  112.899209] [fe213fd4] SyS_brk+0x13c/0x170
[  112.899212] ---[ end trace cbf36648db00d232 ]---

Can you look into it?

Thanks,

Kim




Re: [PATCH v2] bonding: primary_reselect with failure is not working properly

2015-07-02 Thread Jay Vosburgh


[ added netdev to cc ]

Mazhar Rana ranamazh...@gmail.com wrote:

When primary_reselect is set to failure, the primary interface should
not become active until the current active slave is up. But if we set the first

	I think you mean "until current active slave is down" here, not
"up."

member of the bond device as the primary interface and primary_reselect
is set to failure, then whenever the primary interface's link comes back up,
it becomes the active slave even if the current active slave is still up.

With this patch, bond_find_best_slave will not traverse members if the
primary interface is not a candidate for failover/reselection and the
current active slave is still up.

Signed-off-by: Mazhar Rana mazhar.r...@cyberoam.com
Reviewed-by: Sanket Shah sanket.s...@cyberoam.com
---
v2: return curr instead of bond->curr_active_slave.

 drivers/net/bonding/bond_main.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 19eb990..ac71261 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -715,7 +715,7 @@ static bool bond_should_change_active(struct bonding *bond)
  */
 static struct slave *bond_find_best_slave(struct bonding *bond)
 {
-  struct slave *slave, *bestslave = NULL, *primary;
+  struct slave *slave, *bestslave = NULL, *primary, *curr;
   struct list_head *iter;
  int mintime = bond->params.updelay;
 
@@ -724,6 +724,14 @@ static struct slave *bond_find_best_slave(struct bonding *bond)
   bond_should_change_active(bond))
   return primary;
 
+  /* If we get here, the primary interface is not a candidate for
+   * reselection/failover. If the current active slave is still up,
+   * there is no need to traverse the members.
+   */
+  curr = rtnl_dereference(bond->curr_active_slave);
+  if (curr && curr->link == BOND_LINK_UP)
+  return curr;
+
   bond_for_each_slave(bond, slave, iter) {
  if (slave->link == BOND_LINK_UP)
   return slave;
-- 

I believe the above patch will work, but I also think these
functions are kind of hacky, as bond_should_change_active() doesn't
really give the answer its name implies, so we have to second guess
here.

I think the following, while a bigger change, ends up with
clearer code.  Compile tested only.  Comments?

-J

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 19eb990..8c30f6b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -689,40 +689,54 @@ out:
 
 }
 
-static bool bond_should_change_active(struct bonding *bond)
+static struct slave *bond_choose_primary_or_current(struct bonding *bond)
 {
struct slave *prim = rtnl_dereference(bond->primary_slave);
struct slave *curr = rtnl_dereference(bond->curr_active_slave);
 
-   if (!prim || !curr || curr->link != BOND_LINK_UP)
-   return true;
+   if (!prim || prim->link != BOND_LINK_UP)
+   return curr;
+
if (bond->force_primary) {
bond->force_primary = false;
-   return true;
+   return prim;
+   }
+
+   if (!curr || curr->link != BOND_LINK_UP)
+   return prim;
+
+   /* At this point, prim and curr are both up */
+   switch (bond->params.primary_reselect) {
+   case BOND_PRI_RESELECT_ALWAYS:
+   return prim;
+   case BOND_PRI_RESELECT_BETTER:
+   if (prim->speed < curr->speed)
+   return curr;
+   if (prim->speed == curr->speed && prim->duplex <= curr->duplex)
+   return curr;
+   return prim;
+   case BOND_PRI_RESELECT_FAILURE:
+   return curr;
+   default:
+   netdev_err(bond->dev, "impossible primary_reselect %d\n",
+  bond->params.primary_reselect);
+   return curr;
}
-   if (bond->params.primary_reselect == BOND_PRI_RESELECT_BETTER &&
-   (prim->speed < curr->speed ||
-(prim->speed == curr->speed && prim->duplex <= curr->duplex)))
-   return false;
-   if (bond->params.primary_reselect == BOND_PRI_RESELECT_FAILURE)
-   return false;
-   return true;
 }
 
 /**
- * find_best_interface - select the best available slave to be the active one
+ * bond_find_best_slave - select the best available slave to be the active one
  * @bond: our bonding struct
  */
 static struct slave *bond_find_best_slave(struct bonding *bond)
 {
-   struct slave *slave, *bestslave = NULL, *primary;
+   struct slave *slave, *bestslave = NULL;
struct list_head *iter;
int mintime = bond->params.updelay;
 
-   primary = rtnl_dereference(bond->primary_slave);
-   if (primary && primary->link == BOND_LINK_UP &&
-   bond_should_change_active(bond))
-   return primary;
+   slave =

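The proposed bond_choose_primary_or_current() can be read as a small decision table. A userspace model of the same decision order, assuming simplified stand-ins for struct slave and the primary_reselect values (none of these names are the driver's):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the bonding types used by the proposal. */
enum model_link { MODEL_LINK_UP, MODEL_LINK_DOWN };
enum model_reselect {
	MODEL_RESELECT_ALWAYS,
	MODEL_RESELECT_BETTER,
	MODEL_RESELECT_FAILURE,
};

struct model_slave {
	enum model_link link;
	unsigned int speed;   /* Mb/s */
	unsigned char duplex; /* 0 = half, 1 = full */
};

/* Same decision order as the proposed bond_choose_primary_or_current(). */
static const struct model_slave *
model_choose(const struct model_slave *prim, const struct model_slave *curr,
	     enum model_reselect policy, int *force_primary)
{
	if (!prim || prim->link != MODEL_LINK_UP)
		return curr;        /* primary unusable: keep current */
	if (*force_primary) {
		*force_primary = 0; /* one-shot override */
		return prim;
	}
	if (!curr || curr->link != MODEL_LINK_UP)
		return prim;        /* current is gone: take primary */

	/* Both are up: primary_reselect decides. */
	switch (policy) {
	case MODEL_RESELECT_ALWAYS:
		return prim;
	case MODEL_RESELECT_BETTER:
		if (prim->speed < curr->speed)
			return curr;
		if (prim->speed == curr->speed && prim->duplex <= curr->duplex)
			return curr;
		return prim;
	case MODEL_RESELECT_FAILURE:
	default:
		return curr;
	}
}
```

The first case in the usage below corresponds to the reported bug: with primary_reselect=failure and a healthy current slave, the current slave is kept.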
Re: [PATCH net-next 6/6] net_sched: act: remove spinlock in fast path

2015-07-02 Thread Eric Dumazet

On Thu, 2015-07-02 at 09:35 -0700, John Fastabend wrote:
  +   if (gact->tcf_tm.lastuse != jiffies)
  +   gact->tcf_tm.lastuse = jiffies;
 
 I'm missing the point of the if block. Is that really good enough
 for the 32bit system case? I would have expected some wrapper to
 handle it here something like u64_stats_() maybe _u64_jiffies(). Maybe
 after a coffee I'll make sense of it.
 

The point is to avoid dirtying the cache line for every packet.

With the test, we attempt to dirty it only ~HZ times per second,
which really matters when handling millions of packets per second.

My tests show good enough performance; I am not sure we want a percpu
variable for this lastuse field.

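The argument is about stores, not loads: a store to a line shared by every CPU forces that line to bounce between caches, while repeated loads of an unchanged value can stay cached. The single-threaded model below (hypothetical names; the `writes` counter exists only to make the effect visible) counts how many stores the compare-before-write pattern performs:

```c
#include <assert.h>

/* Hypothetical stand-in for the tcf_tm timestamps. */
struct model_tm {
	unsigned long lastuse;
	unsigned long writes; /* instrumentation only, not in the kernel code */
};

/* Store only when the value changes, so the shared line is dirtied at
 * most once per tick instead of once per packet. */
static void model_touch(struct model_tm *tm, unsigned long now)
{
	if (tm->lastuse != now) {
		tm->lastuse = now;
		tm->writes++;
	}
}
```

One million packets spread over 250 simulated jiffies produce 250 stores rather than one million.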



[PATCH v2] ipv6: Make MLD packets to only be processed locally

2015-07-02 Thread Hermin Anggawijaya

Before commit daad151263cf ("ipv6: Make ipv6_is_mld() inline and use it
from ip6_mc_input().") MLD packets were only processed locally. After the
change, a copy of each MLD packet goes through ip6_mr_input, causing an
MRT6MSG_NOCACHE message to be generated to user space.

Make MLD packets only be processed locally.

Fixes: daad151263cf ("ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input().")

Signed-off-by: Hermin Anggawijaya hermin.anggawij...@alliedtelesis.co.nz
---
 net/ipv6/ip6_input.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index f2e464e..57990c9 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -331,10 +331,10 @@ int ip6_mc_input(struct sk_buff *skb)
if (offset < 0)
goto out;
 
-   if (!ipv6_is_mld(skb, nexthdr, offset))
-   goto out;
+   if (ipv6_is_mld(skb, nexthdr, offset))
+   deliver = true;
 
-   deliver = true;
+   goto out;
}
/* unknown RA - process it normally */
}
-- 
1.9.1
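
The control-flow change is easiest to see side by side. A hypothetical model that reduces the MLD Router Alert branch to the two facts the patch changes, namely whether deliver is set and whether control continues toward ip6_mr_input() (names are illustrative, not from ip6_input.c):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the MLD Router Alert branch of ip6_mc_input(). */
struct model_ra_outcome {
	bool sets_deliver; /* the branch executes "deliver = true" */
	bool reaches_mr;   /* control continues toward ip6_mr_input() */
};

/* Before: non-MLD jumps to out; MLD sets deliver and falls through. */
static struct model_ra_outcome model_ra_before(bool is_mld)
{
	struct model_ra_outcome o = { false, false };

	if (!is_mld)
		return o;        /* goto out */
	o.sets_deliver = true;
	o.reaches_mr = true;     /* falls out of the RA block */
	return o;
}

/* After: deliver is set only for MLD, and both cases jump to out. */
static struct model_ra_outcome model_ra_after(bool is_mld)
{
	struct model_ra_outcome o = { is_mld, false };

	return o;                /* goto out */
}
```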


Re: [PATCH] rtnetlink: Actually use the policy for the IFLA_VF_INFO

2015-07-02 Thread Jason Gunthorpe

On Thu, Jul 02, 2015 at 10:34:54AM +0200, Daniel Borkmann wrote:
 So, commit c02db8c6290b moved it into a nested attribute (IFLA_VF_INFO)
 where we indeed don't do further validation. Imho, we should pass the
 parsed attribute table from nla_parse_nested() down into do_setvfinfo(),
 something like the below; I can give it a test run on my ixgbe.
 
 Sorry for the late reply, something like this looks good from my side.

Okay, since it is your patch, will you send it to DaveM with my
Reported-By?

Cheers,
Jason

[PATCH v2 0/1] Add PTP cross-timestamp to the PTP driver interface

2015-07-02 Thread Christopher Hall

This patch allows the driver to perform the system/device time correlation
(cross-timestamp). Currently, cross-timestamping is performed in the
PTP_SYS_OFFSET ioctl: the PTP clock driver reads gettimeofday() and the
gettime64() callback provided by the driver. The cross-timestamp is best
effort, and the latency between the capture of system time
(getnstimeofday()) and device time (driver callback) may be significant.

This patch adds an additional callback, getsynctime64(), which is called
when the driver is able to perform a more accurate, implementation-specific
cross-timestamp. For example, future network devices that implement
PCIe PTM will be able to precisely correlate the device clock with the system
clock with virtually zero latency between captures. Drivers can use this
callback to expose that functionality.

The callback, getsynctime64(), is called only when it is defined and
n_samples == 1, because the driver returns a single cross-timestamp and
multiple samples cannot be chained together.

This patch also extends the capabilities ioctl (PTP_CLOCK_GETCAPS), allowing
applications to query whether the driver implements the getsynctime64()
callback and therefore provides more precise cross-timestamping.
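The sample layout that the patched PTP_SYS_OFFSET handler produces can be illustrated in user space. This is a minimal sketch, assuming hypothetical stand-ins (struct ts64, struct clock_ops, sys_offset() and the fake clocks) for the kernel types and callbacks; it only mirrors the dispatch rule described above: one precise capture when the callback exists and n_samples == 1, otherwise the interleaved system/device/system reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct timespec64 / struct ptp_clock_time. */
struct ts64 { int64_t sec; int32_t nsec; };

/* Hypothetical subset of the clock operations used by PTP_SYS_OFFSET. */
struct clock_ops {
	void (*gettime)(struct ts64 *dev);                        /* device clock */
	void (*getsynctime)(struct ts64 *dev, struct ts64 *sys);  /* optional */
	void (*sys_now)(struct ts64 *sys);                        /* system clock */
};

/* Fill out[] in the order the patched ioctl uses:
 * sys, dev, sys, dev, ..., sys (2 * n + 1 entries). */
static void sys_offset(const struct clock_ops *ops, unsigned int n,
		       struct ts64 *out)
{
	if (ops->getsynctime && n == 1) {
		struct ts64 dev, sys;

		/* One simultaneous capture; the same system stamp brackets it. */
		ops->getsynctime(&dev, &sys);
		out[0] = sys;
		out[1] = dev;
		out[2] = sys;
		return;
	}
	for (unsigned int i = 0; i < n; i++) {
		ops->sys_now(out++);   /* system time before ... */
		ops->gettime(out++);   /* ... the device read */
	}
	ops->sys_now(out);             /* closing system time */
}

/* Fake clocks for illustration: each read advances a shared tick. */
static int32_t fake_tick;
static void fake_sys(struct ts64 *t) { t->sec = 0;   t->nsec = ++fake_tick * 10; }
static void fake_dev(struct ts64 *t) { t->sec = 100; t->nsec = ++fake_tick * 10; }
static void fake_sync(struct ts64 *d, struct ts64 *s)
{
	d->sec = 100; d->nsec = 5;
	s->sec = 0;   s->nsec = 5;
}
```

With the precise callback, the system-time entries on both sides of the device stamp are identical, so the measured capture latency is zero.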

Christopher Hall (1):
  Added additional callback to ptp_clock_info:

 drivers/ptp/ptp_chardev.c| 29 +
 include/linux/ptp_clock_kernel.h |  8 
 include/uapi/linux/ptp_clock.h   |  4 +++-
 3 files changed, 32 insertions(+), 9 deletions(-)

-- 
1.9.1


[PATCH v2 1/1] Added additional callback to ptp_clock_info:

2015-07-02 Thread Christopher Hall

* getsynctime64()

This callback takes two arguments, which receive the device time and the
system time.

With this callback, drivers may provide both system time and device time
to ensure precise correlation.

Modified the PTP_SYS_OFFSET ioctl in the PTP clock driver to use the above
callback if it is available.

Added a capability (PTP_CLOCK_GETCAPS) for checking whether the driver
supports precise timestamping.

Signed-off-by: Christopher Hall christopher.s.h...@intel.com
---
 drivers/ptp/ptp_chardev.c| 29 +
 include/linux/ptp_clock_kernel.h |  8 
 include/uapi/linux/ptp_clock.h   |  4 +++-
 3 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
index da7bae9..2a83aea 100644
--- a/drivers/ptp/ptp_chardev.c
+++ b/drivers/ptp/ptp_chardev.c
@@ -124,7 +124,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
struct ptp_clock_info *ops = ptp->info;
struct ptp_clock_time *pct;
-   struct timespec64 ts;
+   struct timespec64 ts, systs;
int enable, err = 0;
unsigned int i, pin_index;
 
@@ -138,6 +138,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
caps.n_per_out = ptp->info->n_per_out;
caps.pps = ptp->info->pps;
caps.n_pins = ptp->info->n_pins;
+   caps.precise_timestamping = ptp->info->getsynctime64 != NULL;
if (copy_to_user((void __user *)arg, &caps, sizeof(caps)))
err = -EFAULT;
break;
@@ -196,19 +197,31 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
break;
}
pct = &sysoff->ts[0];
-   for (i = 0; i < sysoff->n_samples; i++) {
-   getnstimeofday64(&ts);
+   if (ptp->info->getsynctime64 && sysoff->n_samples == 1) {
+   ptp->info->getsynctime64(ptp->info, &ts, &systs);
+   pct->sec = systs.tv_sec;
+   pct->nsec = systs.tv_nsec;
+   pct++;
pct->sec = ts.tv_sec;
pct->nsec = ts.tv_nsec;
pct++;
-   ptp->info->gettime64(ptp->info, &ts);
+   pct->sec = systs.tv_sec;
+   pct->nsec = systs.tv_nsec;
+   } else {
+   for (i = 0; i < sysoff->n_samples; i++) {
+   getnstimeofday64(&ts);
+   pct->sec = ts.tv_sec;
+   pct->nsec = ts.tv_nsec;
+   pct++;
+   ptp->info->gettime64(ptp->info, &ts);
+   pct->sec = ts.tv_sec;
+   pct->nsec = ts.tv_nsec;
+   pct++;
+   }
+   getnstimeofday64(&ts);
pct->sec = ts.tv_sec;
pct->nsec = ts.tv_nsec;
-   pct++;
}
-   getnstimeofday64(&ts);
-   pct->sec = ts.tv_sec;
-   pct->nsec = ts.tv_nsec;
if (copy_to_user((void __user *)arg, sysoff, sizeof(*sysoff)))
err = -EFAULT;
break;
diff --git a/include/linux/ptp_clock_kernel.h b/include/linux/ptp_clock_kernel.h
index b8b7306..344f129 100644
--- a/include/linux/ptp_clock_kernel.h
+++ b/include/linux/ptp_clock_kernel.h
@@ -67,6 +67,11 @@ struct ptp_clock_request {
  * @gettime64:  Reads the current time from the hardware clock.
  *  parameter ts: Holds the result.
  *
+ * @getsynctime64:  Reads the current time from the hardware clock and system
+ *  clock simultaneously.
+ *  parameter dev: Holds the device time
+ *  parameter sys: Holds the system time
+ *
  * @settime64:  Set the current time on the hardware clock.
  *  parameter ts: Time value to set.
  *
@@ -105,6 +110,9 @@ struct ptp_clock_info {
int (*adjfreq)(struct ptp_clock_info *ptp, s32 delta);
int (*adjtime)(struct ptp_clock_info *ptp, s64 delta);
int (*gettime64)(struct ptp_clock_info *ptp, struct timespec64 *ts);
+   int (*getsynctime64)
+   (struct ptp_clock_info *ptp, struct timespec64 *dev,
+struct timespec64 *sys);
int (*settime64)(struct ptp_clock_info *p, const struct timespec64 *ts);
int (*enable)(struct ptp_clock_info *ptp,
  struct ptp_clock_request *request, int on);
diff --git a/include/uapi/linux/ptp_clock.h b/include/uapi/linux/ptp_clock.h
index f0b7bfe..421b637 100644
--- a/include/uapi/linux/ptp_clock.h
+++ b/include/uapi/linux/ptp_clock.h
@@ -51,7 +51,9 @@ struct ptp_clock_caps {
int n_per_out; /*

[RFC PATCH net-next] tcp: add NV congestion control

2015-07-02 Thread Lawrence Brakmo

This is a request for comments.

TCP-NV (New Vegas) is a major update to TCP-Vegas. An earlier version of
NV was presented at LPC 2010 (slides). It is a delay-based congestion
avoidance algorithm for the data center. This version has been tested
within a 10G rack where the HW RTTs are 20-50us.

A description of TCP-NV, including implementation and experimental
results, can be found at:
http://www.brakmo.org/networking/tcp-nv/TCPNV.html

The current version includes many module parameters to support
experimentation with the parameters.
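The delay-based idea can be illustrated with the classic Vegas queue estimate combined with the multiplicative decrease that the Kconfig help attributes to NV. This is a minimal sketch, not the algorithm in tcp_nv.c: struct dv_state, queued_pkts(), dv_update(), the alpha/beta thresholds and the 1/4 decrease factor are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-flow state; not the tcp_nv.c layout. */
struct dv_state {
	uint32_t cwnd;        /* congestion window, packets */
	uint32_t min_rtt_us;  /* lowest RTT seen: estimate of the unqueued RTT */
};

/* Packets this flow keeps queued in the network, Vegas-style:
 * expected = cwnd / min_rtt, actual = cwnd / rtt,
 * queued = (expected - actual) * min_rtt = cwnd * (rtt - min_rtt) / rtt. */
static uint32_t queued_pkts(const struct dv_state *s, uint32_t rtt_us)
{
	assert(rtt_us > 0);
	if (rtt_us <= s->min_rtt_us)
		return 0;
	return (uint32_t)((uint64_t)s->cwnd * (rtt_us - s->min_rtt_us) / rtt_us);
}

/* Per-RTT update: grow by one packet while few packets are queued,
 * shrink multiplicatively when too many are (assumed thresholds). */
static void dv_update(struct dv_state *s, uint32_t rtt_us)
{
	const uint32_t alpha = 2, beta = 8;   /* packets queued, assumed */
	uint32_t q;

	if (rtt_us < s->min_rtt_us)
		s->min_rtt_us = rtt_us;
	q = queued_pkts(s, rtt_us);
	if (q < alpha)
		s->cwnd += 1;
	else if (q > beta)
		s->cwnd -= s->cwnd / 4;   /* multiplicative decrease */
	if (s->cwnd < 2)
		s->cwnd = 2;
}
```

For example, with cwnd = 100 and an RTT that has doubled from 20us to 40us, the estimate is 50 queued packets, so the window drops to 75.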

Signed-off-by: Lawrence Brakmo lawre...@brakmo.org
---
 include/linux/skbuff.h |   2 +-
 include/linux/tcp.h|   4 +
 include/net/tcp.h  |   5 +-
 net/ipv4/Kconfig   |  16 ++
 net/ipv4/Makefile  |   1 +
 net/ipv4/sysctl_net_ipv4.c |   9 +
 net/ipv4/tcp_input.c   |   5 +
 net/ipv4/tcp_nv.c  | 477 +
 net/ipv4/tcp_output.c  |   4 +-
 9 files changed, 520 insertions(+), 3 deletions(-)
 create mode 100644 net/ipv4/tcp_nv.c

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index d6cdd6e..96a131d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -547,7 +547,7 @@ struct sk_buff {
 * want to keep them across layers you have to do a skb_clone()
 * first. This is owned by whoever has the skb queued ATM.
 */
-   char cb[48] __aligned(8);
+   char cb[52] __aligned(8);
 
unsigned long   _skb_refdst;
void (*destructor)(struct sk_buff *skb);
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 48c3696..05e0da5 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -254,6 +254,10 @@ struct tcp_sock {
u32 lost_out;   /* Lost packets */
u32 sacked_out; /* SACK'd packets   */
u32 fackets_out;/* FACK'd packets   */
+   u32 ack_in_flight;  /* This field is populated when new acks
+* are received. It contains the number of 
+* bytes in flight when the last packet
+* acked was sent. Used by tcp-nv. */
 
/* from STCP, retrans queue hinting */
struct sk_buff* lost_skb_hint;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 950cfec..3e385c1 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -281,6 +281,7 @@ extern unsigned int sysctl_tcp_notsent_lowat;
 extern int sysctl_tcp_min_tso_segs;
 extern int sysctl_tcp_autocorking;
 extern int sysctl_tcp_invalid_ratelimit;
+extern int sysctl_tcp_nv_enable;
 
 extern atomic_long_t tcp_memory_allocated;
 extern struct percpu_counter tcp_sockets_allocated;
@@ -720,12 +721,14 @@ static inline u32 tcp_skb_timestamp(const struct sk_buff *skb)
 /* This is what the send packet queuing engine uses to pass
  * TCP per-packet control information to the transmission code.
  * We also store the host-order sequence numbers in here too.
- * This is 44 bytes if IPV6 is enabled.
+ * This is 48 bytes if IPV6 is enabled.
  * If this grows please adjust skbuff.h:skbuff->cb[xxx] size appropriately.
  */
 struct tcp_skb_cb {
__u32   seq;/* Starting sequence number */
__u32   end_seq;/* SEQ + FIN + SYN + datalen*/
+   __u32   in_flight;  /* bytes in flight when this packet
+* was sent. */
union {
/* Note : tcp_tw_isn is used in input path only
 *(isn chosen by tcp_timewait_state_process())
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 6fb3c90..c21f85d 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -539,6 +539,22 @@ config TCP_CONG_VEGAS
window. TCP Vegas should provide less packet loss, but it is
not as aggressive as TCP Reno.
 
+config TCP_CONG_NV
+   tristate "TCP NV"
+   default m
+   ---help---
+   TCP NV is a follow up to TCP Vegas. It has been modified to deal with
+   10G networks, measurement noise introduced by LRO, GRO and interrupt
+   coalescence. In addition, it will decrease its cwnd multiplicatively
+   instead of linearly.
+
+   Note that in general congestion avoidance (cwnd decreased when # packets
+   queued grows) cannot coexist with congestion control (cwnd decreased only
+   when there is packet loss) due to fairness issues. One scenario when they
+   can coexist safely is when the CA flows have RTTs  CC flows RTTs.
+
+   For further details see http://www.brakmo.org/networking/tcp-nv/TCPNV.html
+
 config TCP_CONG_SCALABLE
tristate "Scalable TCP"
default n
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index efc43f3..06f335f 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -50,6 +50,7 @@ obj-$(CONFIG_TCP_CONG_HSTCP) +=

Re: [RFC PATCH net-next] tcp: add NV congestion control

2015-07-02 Thread Eric Dumazet

On Thu, 2015-07-02 at 18:21 -0700, Lawrence Brakmo wrote:
 This is a request for comments.
 
 TCP-NV (New Vegas) is a major update to TCP-Vegas. An earlier version of
 NV was presented at 2010's LPC (slides). It is a delayed based
 congestion avoidance for the data center. This version has been tested
 within a 10G rack where the HW RTTs are 20-50us.
 
 A description of TCP-NV, including implementation and experimental
 results, can be found at:
 http://www.brakmo.org/networking/tcp-nv/TCPNV.html
 
 The current version includes many module parameters to support
 experimentation with the parameters.
 
 Signed-off-by: Lawrence Brakmo lawre...@brakmo.org
 ---
  include/linux/skbuff.h |   2 +-
  include/linux/tcp.h|   4 +
  include/net/tcp.h  |   5 +-
  net/ipv4/Kconfig   |  16 ++
  net/ipv4/Makefile  |   1 +
  net/ipv4/sysctl_net_ipv4.c |   9 +
  net/ipv4/tcp_input.c   |   5 +
  net/ipv4/tcp_nv.c  | 477 
 +
  net/ipv4/tcp_output.c  |   4 +-
  9 files changed, 520 insertions(+), 3 deletions(-)
  create mode 100644 net/ipv4/tcp_nv.c
 
 diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
 index d6cdd6e..96a131d 100644
 --- a/include/linux/skbuff.h
 +++ b/include/linux/skbuff.h
 @@ -547,7 +547,7 @@ struct sk_buff {
* want to keep them across layers you have to do a skb_clone()
* first. This is owned by whoever has the skb queued ATM.
*/
 - char cb[48] __aligned(8);
 + char cb[52] __aligned(8);
  

skb bloat alert.

This adds 8 bytes to cb[], and sk_buff, for no reason.

tcp_skb_cb is currently 44 bytes, so even if you add one u32, it should
not exceed cb[].
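The arithmetic can be checked with mock structs. They are hypothetical stand-ins, not the real struct sk_buff or struct tcp_skb_cb, and the sketch assumes GCC/Clang attribute syntax and an LP64 target: 44 + 4 = 48 bytes still fits cb[48], while growing cb[] to 52 pushes the following 8-byte-aligned member out by 8 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Mock of the area around cb[] in sk_buff: only cb[] and the
 * pointer-sized member after it (standing in for _skb_refdst). */
struct skb_cb48 {
	char cb[48] __attribute__((aligned(8)));
	unsigned long refdst;
};

struct skb_cb52 {
	char cb[52] __attribute__((aligned(8)));
	unsigned long refdst;
};

/* Hypothetical 44-byte control block grown by one u32; the real
 * struct tcp_skb_cb has named fields, not an array. */
struct tcp_cb_mock {
	uint32_t existing[11];   /* 11 * 4 = 44 bytes */
	uint32_t in_flight;      /* the field the RFC patch adds */
};

/* 44 + 4 = 48: the grown control block still fits the existing cb[48]. */
_Static_assert(sizeof(struct tcp_cb_mock) == 48, "44 + 4 bytes");
_Static_assert(sizeof(struct tcp_cb_mock) <= sizeof(((struct skb_cb48 *)0)->cb),
	       "fits in cb[48]");
```

On LP64, cb[52] leaves 4 bytes of padding before the 8-byte member, so the mock grows from 56 to 64 bytes.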



[PATCH net] cxgb4: Fix incorrect sequence numbers shown in devlog

2015-07-02 Thread Hariprasad Shenai

Part of commit 49aa284fe64c4c1 ("cxgb4: Add support for devlog")
introduced a real bug: the Device Log Sequence Numbers are no longer
converted from the firmware's big-endian format to the local CPU's
endianness.

This patch moves all of the translation into the devlog_show() routine.
The only endianness code now in devlog_open() is the small loop to find the
earliest (lowest Sequence Number) Device Log entry in the circular buffer.
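The search for the earliest entry can be sketched in user space. The sketch assumes a hypothetical struct devlog_entry in place of struct fw_devlog_e and uses the glibc/BSD <endian.h> helpers be32toh()/htobe32() in place of the kernel's be32_to_cpu(). As in the patched driver, the buffer stays big-endian and only a local copy of the sequence number is converted.

```c
#define _DEFAULT_SOURCE   /* expose be32toh() and friends in glibc */
#include <assert.h>
#include <endian.h>
#include <stdint.h>

/* Hypothetical subset of a firmware log entry, kept big-endian in memory. */
struct devlog_entry {
	uint64_t timestamp_be;   /* 0 means "slot never written" */
	uint32_t seqno_be;
};

/* Return the index of the oldest entry (lowest sequence number) in a
 * circular log, or -1 if every slot is empty. The log is not modified. */
static int devlog_first(const struct devlog_entry *log, int nentries)
{
	uint32_t fseqno = UINT32_MAX;
	int first = -1;

	for (int i = 0; i < nentries; i++) {
		uint32_t seqno;

		if (log[i].timestamp_be == 0)
			continue;
		seqno = be32toh(log[i].seqno_be);
		if (seqno < fseqno) {
			fseqno = seqno;
			first = i;
		}
	}
	return first;
}
```

Display code can then walk the ring starting at that index and convert each field as it prints it.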

Signed-off-by: Hariprasad Shenai haripra...@chelsio.com
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c | 25 +++---
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
index 484eb8c..a11485f 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -952,16 +952,23 @@ static int devlog_show(struct seq_file *seq, void *v)
 * eventually have to put a format interpreter in here ...
 */
seq_printf(seq, "%10d  %15llu  %8s  %8s  ",
-  e->seqno, e->timestamp,
+  be32_to_cpu(e->seqno),
+  be64_to_cpu(e->timestamp),
   (e->level < ARRAY_SIZE(devlog_level_strings)
? devlog_level_strings[e->level]
: "UNKNOWN"),
   (e->facility < ARRAY_SIZE(devlog_facility_strings)
? devlog_facility_strings[e->facility]
: "UNKNOWN"));
-   seq_printf(seq, e->fmt, e->params[0], e->params[1],
-  e->params[2], e->params[3], e->params[4],
-  e->params[5], e->params[6], e->params[7]);
+   seq_printf(seq, e->fmt,
+  be32_to_cpu(e->params[0]),
+  be32_to_cpu(e->params[1]),
+  be32_to_cpu(e->params[2]),
+  be32_to_cpu(e->params[3]),
+  be32_to_cpu(e->params[4]),
+  be32_to_cpu(e->params[5]),
+  be32_to_cpu(e->params[6]),
+  be32_to_cpu(e->params[7]));
}
return 0;
 }
@@ -1043,23 +1050,17 @@ static int devlog_open(struct inode *inode, struct file *file)
return ret;
}
 
-   /* Translate log multi-byte integral elements into host native format
-* and determine where the first entry in the log is.
+   /* Find the earliest (lowest Sequence Number) log entry in the
+* circular Device Log.
 */
for (fseqno = ~((u32)0), index = 0; index < dinfo->nentries; index++) {
struct fw_devlog_e *e = &dinfo->log[index];
-   int i;
__u32 seqno;
 
if (e->timestamp == 0)
continue;
 
-   e->timestamp = (__force __be64)be64_to_cpu(e->timestamp);
seqno = be32_to_cpu(e->seqno);
-   for (i = 0; i < 8; i++)
-   e->params[i] =
-   (__force __be32)be32_to_cpu(e->params[i]);
-
if (seqno < fseqno) {
fseqno = seqno;
dinfo->first = index;
-- 
2.3.4


Re: [PATCH net-next] net: bail on sock_wfree, sock_rfree when we have a TCP_TIMEWAIT sk

2015-07-02 Thread Eric Dumazet

On Thu, 2015-07-02 at 14:18 -0700, Alex Gartrell wrote:
 On Thu, Jul 2, 2015 at 1:44 AM, Julian Anastasov j...@ssi.bg wrote:
  I think, your patch from January is almost
  good:
 
 I'll rebase it, add your other suggestions, test it, and send it in.
 
  And the patch from Eric for IPVS looks good too.
 
 Are we sure that we want to change the semantics of set_owner_w to
 orphan it?  It works for us but that's not the behavior I'd expect
 from that function and might burn someone later?

I do not understand the concern.

skb_set_owner_w() callers are attempting to:

1) Remove the association with a previous socket (skb_orphan()), if
there was one (most skbs at this point are not associated with a
previous socket).

2) Attach the skb to a socket.

My fix makes sure this new socket is not a timewait or request sock.

This could happen when routes are changed in a malicious way,
because in early demux the socket dst cache is no longer valid,
but we keep skb->sk set.

(This could happen without ipvs being in the picture I think)

The bug could happen, for example, if:
A) GRO cooks a GRO packet
B) we find a timewait socket and attach it to skb (and soon we also
might find a syn_recv socket)
C) Route decides to forward packet
D) output interface needs to add some headroom, check for example
net/ipv6/ip6_gre.c around line 699
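The two steps listed above, plus the guard the fix adds, can be modelled in user space. This is a heavily simplified sketch: struct msock, struct mbuf and the mock_* helpers are hypothetical, and it assumes that a non-full (timewait or request) socket should only be reference-counted instead of being charged write memory. It illustrates the idea and is not the actual kernel change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for struct sock / sk_buff. */
struct msock {
	bool fullsock;      /* false for timewait / request sockets */
	int refcnt;
	int wmem_alloc;     /* only meaningful on full sockets */
};

struct mbuf {
	struct msock *sk;
	unsigned int truesize;
	void (*destructor)(struct mbuf *);
};

static void mock_wfree(struct mbuf *b)  { b->sk->wmem_alloc -= (int)b->truesize; }
static void mock_sk_put(struct mbuf *b) { b->sk->refcnt--; }

/* Step 1: drop any previous owner (the skb_orphan() part). */
static void mock_orphan(struct mbuf *b)
{
	if (b->destructor)
		b->destructor(b);
	b->destructor = NULL;
	b->sk = NULL;
}

/* Step 2: attach to the new socket, charging write memory only when it
 * is a full socket; otherwise just hold a reference. */
static void mock_set_owner_w(struct mbuf *b, struct msock *sk)
{
	mock_orphan(b);
	b->sk = sk;
	if (!sk->fullsock) {
		sk->refcnt++;
		b->destructor = mock_sk_put;
		return;
	}
	b->destructor = mock_wfree;
	sk->wmem_alloc += (int)b->truesize;
}
```

Re-owning a buffer first releases the previous owner through its own destructor, so a timewait socket never sees write-memory accounting.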






[PATCH] Logically DeadCode

2015-07-02 Thread Rahul Jain

hello,

From 0c34030166a150d6d9f1ab52e7bb40a5440a68c2 Mon Sep 17 00:00:00 2001
From: Rahul Jain rahul.j...@samsung.com
Date: Fri, 3 Jul 2015 10:19:12 +0530
Subject: [PATCH] Logically DeadCode

Signed-off-by: Rahul Jain rahul.j...@samsung.com
Signed-off-by: Amit Khatri amit.kha...@samsung.com
---
 net/wireless/util.c| 3 ---
 net/wireless/wext-compat.c | 3 ---
 2 files changed, 6 deletions(-)

diff --git a/net/wireless/util.c b/net/wireless/util.c
index baf7218..507e894 100644
--- a/net/wireless/util.c
+++ b/net/wireless/util.c
@@ -1009,9 +1009,6 @@ int cfg80211_change_iface(struct cfg80211_registered_device *rdev,
case NUM_NL80211_IFTYPES:
/* not happening */
break;
-   case NL80211_IFTYPE_P2P_DEVICE:
-   WARN_ON(1);
-   break;
}
}
 
diff --git a/net/wireless/wext-compat.c b/net/wireless/wext-compat.c
index fd68283..8de1b64 100644
--- a/net/wireless/wext-compat.c
+++ b/net/wireless/wext-compat.c
@@ -387,9 +387,6 @@ static int cfg80211_wext_siwretry(struct net_device *dev,
changed |= WIPHY_PARAM_RETRY_SHORT;
}
 
-   if (!changed)
-   return 0;
-
err = rdev_set_wiphy_params(rdev, changed);
if (err) {
wdev->wiphy->retry_short = oshort;
-- 
1.9.1

Re: [PATCH net-next 1/3] gro: Pull headers into skb head for 1st skb in gro list

2015-07-02 Thread Eric Dumazet

On Thu, 2015-07-02 at 14:04 -0700, Tom Herbert wrote:
 When setting up the first skb in a gro list we ensure that all the
 headers up to skb_gro_offset have been pulled into head. In subsequent
 uses of this skb (e.g. determining same_flow) it is assumed that the
 headers can be accessed in the skb head.
 
 Signed-off-by: Tom Herbert t...@herbertland.com
 ---
  net/core/dev.c | 4 
  1 file changed, 4 insertions(+)
 
 diff --git a/net/core/dev.c b/net/core/dev.c
 index 6778a99..05e0e37 100644
 --- a/net/core/dev.c
 +++ b/net/core/dev.c
 @@ -4228,6 +4228,10 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
   } else {
   napi->gro_count++;
   }
 +
 + /* Ensure all headers are pulled into head for 1st skb */
 + skb_gro_header_slow(skb, skb_gro_offset(skb), 0);
 +
   NAPI_GRO_CB(skb)->count = 1;
   NAPI_GRO_CB(skb)->age = jiffies;
   NAPI_GRO_CB(skb)->last = skb;

I do not see how we can reach this point without having headers already
pulled.

For example, tcp_gro_receive() has :

	off = skb_gro_offset(skb);
	hlen = off + sizeof(*th);
	th = skb_gro_header_fast(skb, off);
	if (skb_gro_header_hard(skb, hlen)) {
		th = skb_gro_header_slow(skb, hlen, off);
		if (unlikely(!th))
			goto out;
	}

So by definition, skb is put in gro_list only if it was fully validated as a GRO
candidate.
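The fast/hard/slow pattern in the quoted snippet can be modelled in user space to show why a packet that got past it already has its headers in the linear area. The packet layout and the hdr_hard(), hdr_slow() and get_header() helpers are hypothetical simplifications; the real skb_gro_header_*() helpers work on the GRO frag0 area and pskb_may_pull().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical packet: the first head_len bytes are in the linear head,
 * frag points at the full packet contents. */
struct mpkt {
	unsigned char head[128];
	size_t head_len;
	const unsigned char *frag;
	size_t len;
};

/* Does the header ending at hlen extend past the linear head? */
static int hdr_hard(const struct mpkt *p, size_t hlen)
{
	return hlen > p->head_len;
}

/* Pull bytes into the head so hlen bytes are linear; return a pointer
 * to offset off, or NULL if the packet is too short. Only called when
 * hlen > head_len. */
static unsigned char *hdr_slow(struct mpkt *p, size_t hlen, size_t off)
{
	if (hlen > p->len || hlen > sizeof(p->head))
		return NULL;
	memcpy(p->head + p->head_len, p->frag + p->head_len, hlen - p->head_len);
	p->head_len = hlen;
	return p->head + off;
}

/* Same shape as the tcp_gro_receive() snippet above. */
static unsigned char *get_header(struct mpkt *p, size_t off, size_t size)
{
	size_t hlen = off + size;
	unsigned char *h = p->head + off;   /* fast path */

	if (hdr_hard(p, hlen)) {
		h = hdr_slow(p, hlen, off);
		if (!h)
			return NULL;            /* not a GRO candidate */
	}
	return h;                           /* header is now linear */
}
```

Any packet for which get_header() returns non-NULL has its header bytes in the linear head, which is the invariant the review relies on.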

Given the complexity of your 2nd patch, I am stopping the review right
now.




[PATCH net-next] ipv6: sysctl to restrict candidate source addresses

2015-07-02 Thread Erik Kline

Per RFC 6724, section 4, Candidate Source Addresses:

It is RECOMMENDED that the candidate source addresses be the set
of unicast addresses assigned to the interface that will be used
to send to the destination (the outgoing interface).

Add a sysctl to enable this behaviour.
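The candidate filter this sysctl adds can be sketched as a predicate mirroring the condition in the ipv6_dev_get_saddr() hunk of this patch. struct cand, cand_allowed() and count_allowed() are hypothetical names; the addresses are from the 2001:db8::/32 documentation prefix. An out_ifindex of 0 means no outgoing interface is known, in which case nothing is filtered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical candidate: an address and the ifindex it is assigned to. */
struct cand {
	const char *addr;
	int ifindex;
};

/* Multicast or link-local destinations, or restrict_srcaddr set on the
 * outgoing interface: skip addresses assigned to any other interface. */
static bool cand_allowed(const struct cand *c, int out_ifindex,
			 bool dst_mc_or_linklocal, bool restrict_srcaddr)
{
	if ((dst_mc_or_linklocal || restrict_srcaddr) &&
	    out_ifindex && c->ifindex != out_ifindex)
		return false;
	return true;
}

static size_t count_allowed(const struct cand *c, size_t n, int out_ifindex,
			    bool dst_mc_or_linklocal, bool restrict_srcaddr)
{
	size_t k = 0;

	for (size_t i = 0; i < n; i++)
		k += cand_allowed(&c[i], out_ifindex, dst_mc_or_linklocal,
				  restrict_srcaddr);
	return k;
}
```

With restrict_srcaddr = 0 (the default), a global destination can still use an address configured on any interface, as before.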

Signed-off-by: Erik Kline e...@google.com
---
 Documentation/networking/ip-sysctl.txt | 12 
 include/linux/ipv6.h   |  1 +
 include/uapi/linux/ipv6.h  |  1 +
 net/ipv6/addrconf.c| 30 +-
 4 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 5fae770..d8f3e60 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1435,6 +1435,18 @@ mtu - INTEGER
Default Maximum Transfer Unit
Default: 1280 (IPv6 required minimum)
 
+restrict_srcaddr - INTEGER
+   Restrict candidate source addresses (see RFC 6724, section 4).
+
+   When set to 1, the candidate source addresses for destinations
+   routed via this interface are restricted to the set of addresses
+   configured on this interface.
+
+   Possible values are:
+   0 : no source address restrictions
+   1 : require matching outgoing interface
+   Default:  0
+
 router_probe_interval - INTEGER
Minimum interval (in seconds) between Router Probing described
in RFC4191.
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 82806c6..6867d1f 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -57,6 +57,7 @@ struct ipv6_devconf {
bool initialized;
struct in6_addr secret;
} stable_secret;
+   __s32   restrict_srcaddr;
void*sysctl;
 };
 
diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index 5efa54a..b174758 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -171,6 +171,7 @@ enum {
DEVCONF_USE_OPTIMISTIC,
DEVCONF_ACCEPT_RA_MTU,
DEVCONF_STABLE_SECRET,
+   DEVCONF_RESTRICT_SRCADDR,
DEVCONF_MAX
 };
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 21c2c81..f72c974 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -211,7 +211,8 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
.accept_ra_mtu  = 1,
.stable_secret  = {
.initialized = false,
-   }
+   },
+   .restrict_srcaddr   = 0,
 };
 
 static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
@@ -253,6 +254,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
.stable_secret  = {
.initialized = false,
},
+   .restrict_srcaddr   = 0,
 };
 
 /* Check if a valid qdisc is available */
@@ -1366,7 +1368,8 @@ int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
*score = scores[0], *hiscore = scores[1];
struct ipv6_saddr_dst dst;
struct net_device *dev;
-   int dst_type;
+   struct inet6_dev *idev;
+   int dst_type, restrict_srcaddr = 0;
 
dst_type = __ipv6_addr_type(daddr);
dst.addr = daddr;
@@ -1380,9 +1383,12 @@ int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
 
rcu_read_lock();
 
-   for_each_netdev_rcu(net, dev) {
-   struct inet6_dev *idev;
+   if (dst_dev) {
+   idev = __in6_dev_get(dst_dev);
+   restrict_srcaddr = (idev) ? idev->cnf.restrict_srcaddr : 0;
+   }
 
+   for_each_netdev_rcu(net, dev) {
/* Candidate Source Address (section 4)
 *  - multicast and link-local destination address,
 *the set of candidate source address MUST only
@@ -1394,9 +1400,14 @@ int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
 *include addresses assigned to interfaces
 *belonging to the same site as the outgoing
 *interface.)
+*  - It is RECOMMENDED that the candidate source addresses
+*be the set of unicast addresses assigned to the
+*interface that will be used to send to the destination
+*(the 'outgoing' interface). (RFC 6724)
 */
if (((dst_type & IPV6_ADDR_MULTICAST) ||
-dst.scope <= IPV6_ADDR_SCOPE_LINKLOCAL) &&
+dst.scope <= IPV6_ADDR_SCOPE_LINKLOCAL ||
+restrict_srcaddr) &&
dst.ifindex && dev->ifindex != dst.ifindex)
continue;
 
@@ -4586,6 +4597,7 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
array[DEVCONF_ACCEPT_RA_FROM_LOCAL] = cnf->accept_ra_from_local;

[PATCH v2 v4.2-rc1] printk: make extended printk support conditional on netconsole

2015-07-02 Thread Tejun Heo

Commit 6fe29354befe ("printk: implement support for extended console
drivers") implemented extended printk support for extended netconsole.
The added code was minuscule, but it also added a static 8k buffer
unconditionally, unnecessarily bloating the kernel in configurations
where extended netconsole is not used.

This patch introduces CONFIG_PRINTK_CON_EXTENDED which is selected by
CONFIG_NETCONSOLE.  If the config option is not set, extended printk
support is compiled out along with the static buffer.

Verified 8k reduction in vmlinux bss when !CONFIG_NETCONSOLE.

v2: Added a warning for cases where CON_EXTENDED is requested while
CONFIG_PRINTK_CON_EXTENDED is disabled as suggested by Petr.
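The compile-out pattern can be sketched in plain C. EXT_CONSOLE stands in for CONFIG_PRINTK_CON_EXTENDED, and every name here is hypothetical. One deliberate difference from the patch: instead of declaring a zero-length buffer (a GNU extension), the sketch omits the buffer entirely when the feature is disabled. The call sites stay identical in both builds, and the counter becomes a constant that the compiler can fold away.

```c
#include <assert.h>
#include <stdio.h>

/* Define EXT_CONSOLE to build the feature in (hypothetical stand-in for
 * CONFIG_PRINTK_CON_EXTENDED). */
#ifdef EXT_CONSOLE

#define EXT_BUF_LEN 8192
static char ext_text[EXT_BUF_LEN];   /* the static buffer being saved */
static int nr_ext_consoles;

static void inc_nr_ext_consoles(void) { nr_ext_consoles++; }
static void dec_nr_ext_consoles(void) { nr_ext_consoles--; }

#else /* !EXT_CONSOLE */

#define EXT_BUF_LEN 0
#define nr_ext_consoles 0            /* constant, so tests on it fold away */

static void inc_nr_ext_consoles(void)
{
	/* the patch uses WARN_ONCE(); a plain message stands in here */
	fprintf(stderr, "ext console requested but support is compiled out\n");
}
static void dec_nr_ext_consoles(void) { }

#endif /* EXT_CONSOLE */

/* Call sites are identical in both builds. Returns 1 when this is the
 * first extended console, where the patch prints its notice. */
static int register_ext_console(void)
{
	int first = !nr_ext_consoles;

	inc_nr_ext_consoles();
	return first;
}
```

Built without EXT_CONSOLE, every registration reports "first" and warns, because the count never leaves zero.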

Cc: Petr Mladek pmla...@suse.com
Signed-off-by: Tejun Heo t...@kernel.org
Reported-and-suggested-by: Geert Uytterhoeven ge...@linux-m68k.org
---
 drivers/net/Kconfig|1 +
 init/Kconfig   |3 +++
 kernel/printk/printk.c |   40 
 3 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 019fcef..39587f1 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -195,6 +195,7 @@ config GENEVE
 
 config NETCONSOLE
tristate "Network console logging support"
+   select PRINTK_CON_EXTENDED
---help---
If you want to log kernel messages over the network, enable this.
See file:Documentation/networking/netconsole.txt for details.
diff --git a/init/Kconfig b/init/Kconfig
index bcc41bd..cd281ab 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1438,6 +1438,9 @@ config PRINTK
  very difficult to diagnose system problems, saying N here is
  strongly discouraged.
 
+config PRINTK_CON_EXTENDED
+   bool
+
 config BUG
bool "BUG() support" if EXPERT
default y
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index cf8c242..f719118 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -84,6 +84,10 @@ static struct lockdep_map console_lock_dep_map = {
 };
 #endif
 
+#ifdef CONFIG_PRINTK_CON_EXTENDED
+
+#define CONSOLE_EXT_LOG_BUF_LEN	CONSOLE_EXT_LOG_MAX
+
 /*
  * Number of registered extended console drivers.
  *
@@ -96,6 +100,32 @@ static struct lockdep_map console_lock_dep_map = {
  */
 static int nr_ext_console_drivers;
 
+static void inc_nr_ext_console_drivers(void)
+{
+   nr_ext_console_drivers++;
+}
+
+static void dec_nr_ext_console_drivers(void)
+{
+   nr_ext_console_drivers--;
+}
+
+#else  /* CONFIG_PRINTK_CON_EXTENDED */
+
+#define CONSOLE_EXT_LOG_BUF_LEN	0
+#define nr_ext_console_drivers 0
+
+static void inc_nr_ext_console_drivers(void)
+{
+   WARN_ONCE(true, "printk: CON_EXTENDED requested when !CONFIG_PRINTK_CON_EXTENDED\n");
+}
+
+static void dec_nr_ext_console_drivers(void)
+{
+}
+
+#endif /* CONFIG_PRINTK_CON_EXTENDED */
+
 /*
  * Helper macros to handle lockdep when locking/unlocking console_sem. We use
  * macros instead of functions so that _RET_IP_ contains useful information.
@@ -2224,7 +2254,7 @@ static void console_cont_flush(char *text, size_t size)
  */
 void console_unlock(void)
 {
-   static char ext_text[CONSOLE_EXT_LOG_MAX];
+   static char ext_text[CONSOLE_EXT_LOG_BUF_LEN];
static char text[LOG_LINE_MAX + PREFIX_MAX];
static u64 seen_seq;
unsigned long flags;
@@ -2561,9 +2591,11 @@ void register_console(struct console *newcon)
console_drivers->next = newcon;
}
 
-   if (newcon->flags & CON_EXTENDED)
-   if (!nr_ext_console_drivers++)
+   if (newcon->flags & CON_EXTENDED) {
+   if (!nr_ext_console_drivers)
pr_info("printk: continuation disabled due to ext consoles, expect more fragments in /dev/kmsg\n");
+   inc_nr_ext_console_drivers();
+   }
 
if (newcon->flags & CON_PRINTBUFFER) {
/*
@@ -2638,7 +2670,7 @@ int unregister_console(struct console *console)
}
 
if (!res && (console->flags & CON_EXTENDED))
-   nr_ext_console_drivers--;
+   dec_nr_ext_console_drivers();
 
/*
 * If this isn't the last console and it has CON_CONSDEV set, we

Re: [PATCH net-next 3/6] net_sched: act: make tcfg_pval non zero

2015-07-02 Thread Alexei Starovoitov


On 7/2/15 6:07 AM, Eric Dumazet wrote:

First step for gact RCU operation :

Instead of testing if tcfg_pval is zero or not, just make it 1.

No change in behavior, but slightly faster code.

Signed-off-by: Eric Dumazet eduma...@google.com


Nice trick!
Acked-by: Alexei Starovoitov a...@plumgrid.com


Re: [PATCH] bonding: primary_reselect with failure is not working properly

2015-07-02 Thread David Miller


Emails encoded in HTML will not reach the mailing list, only plain
ASCII text is accepted.

Re: [PATCH net-next 6/6] net_sched: act: remove spinlock in fast path

2015-07-02 Thread Alexei Starovoitov


On 7/2/15 6:07 AM, Eric Dumazet wrote:

Final step for gact RCU operation :

1) Use percpu stats
2) update lastuse only every clock tick
3) Remove spinlock acquisition, as it is no longer needed.

Since this is the last contended lock in packet RX when tc gact is used,
this gives an impressive gain.

My host with 8 RX queues was handling 5 Mpps before the patch,
and more than 10 Mpps after patch.

Signed-off-by: Eric Dumazet eduma...@google.com


Great stuff. Thank you for fixing it!

Acked-by: Alexei Starovoitov a...@plumgrid.com
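Points 1) and 2) of the quoted change can be illustrated with a single-threaded simulation. In the kernel, this would use per-CPU allocations and the real clock tick counter (jiffies). Here, an array indexed by a hypothetical CPU number and an explicit now_jiffies argument stand in for them. stats_update(), stats_read() and touch_lastuse() are illustrative names, not the gact code.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_MOCK 4   /* hypothetical CPU count */

/* One counter block per CPU: each CPU writes only its own slot, so the
 * packet fast path needs no shared lock. */
struct pcpu_stats {
	uint64_t packets;
	uint64_t bytes;
};

static struct pcpu_stats stats[NR_CPUS_MOCK];
static unsigned long lastuse;

static void stats_update(int cpu, uint32_t len)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MOCK);
	stats[cpu].packets++;
	stats[cpu].bytes += len;
}

/* Readers (e.g. a stats dump) sum every slot: slower, but rare. */
static struct pcpu_stats stats_read(void)
{
	struct pcpu_stats sum = { 0, 0 };

	for (int cpu = 0; cpu < NR_CPUS_MOCK; cpu++) {
		sum.packets += stats[cpu].packets;
		sum.bytes += stats[cpu].bytes;
	}
	return sum;
}

/* Store to the shared field at most once per tick, which avoids dirtying
 * its cache line on every packet when many CPUs share it. */
static void touch_lastuse(unsigned long now_jiffies)
{
	if (lastuse != now_jiffies)
		lastuse = now_jiffies;
}
```

The trade-off is that reads become O(number of CPUs), which is acceptable because dumps are rare compared with packets.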


Re: [PATCH net] net_sched: gen_estimator: extend pps limit

2015-07-02 Thread Alexei Starovoitov


On 7/2/15 6:57 AM, Eric Dumazet wrote:

From: Eric Dumazet eduma...@google.com

Rate estimators are limited to 4 Mpps, which was fine years ago, but
is too small for the current hardware generation.

Let's use 2^5 scaling instead of 2^10 to get a new limit of 128 Mpps.

On 64-bit arches, use an unsigned long for temp storage and remove the limit.
(We do not expect 32-bit arches to be able to reach this point.)

Tested:

tc -s -d filter sh dev eth0 parent :

filter protocol ip pref 1 u32
filter protocol ip pref 1 u32 fh 800: ht divisor 1
filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15
   match 0700/ff00 at 12
action order 1: gact action drop
 random type none pass val 0
 index 1 ref 1 bind 1 installed 166 sec
Action statistics:
Sent 39734251496 bytes 863788076 pkt (dropped 863788117, overlimits 0 requeues 0)
rate 4067Mbit 11053596pps backlog 0b 0p requeues 0

Signed-off-by: Eric Dumazet eduma...@google.com


Looks good to me.
Acked-by: Alexei Starovoitov a...@plumgrid.com
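The limits in the commit message follow from fixed-point arithmetic, assuming (as the message implies) that the averaged rate is kept in an unsigned 32-bit field multiplied by 2^scale. The old ceiling is 2^32 / 2^10 = 2^22 = 4,194,304 pps ("4 Mpps"). With 2^5 scaling, it becomes 2^32 / 2^5 = 2^27 = 134,217,728 pps, which is 128 * 2^20 ("128 Mpps"). The helper name max_rate_u32() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Largest rate a 32-bit accumulator can hold when the rate is stored
 * multiplied by 2^scale_log (fixed point with scale_log fraction bits). */
static uint64_t max_rate_u32(unsigned int scale_log)
{
	return ((uint64_t)UINT32_MAX + 1) >> scale_log;
}
```

The 11053596 pps measured in the test output above lies between the two ceilings, so the old estimator could not have reported it.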


Re: [PATCH] rtnetlink: Actually use the policy for the IFLA_VF_INFO

2015-07-02 Thread Jason Gunthorpe

On Wed, Jul 01, 2015 at 11:36:15AM +0200, Daniel Borkmann wrote:
 Hi Jason,
 
 On 07/01/2015 12:52 AM, Jason Gunthorpe wrote:
 It turns out the policy was defined but never actually checked,
 so let's check it.
 
 Fixes: ebc08a6f47ee ("rtnetlink: Add VF config code to rtnetlink")
 
 I would argue that the actual commit would be ...
 
 Fixes: c02db8c6290b ("rtnetlink: make SR-IOV VF interface symmetric")

Yes, agree

 So, commit c02db8c6290b moved it into a nested attribute (IFLA_VF_INFO)
 where we indeed don't do further validation. Imho, we should pass the
 parsed attribute table from nla_parse_nested() down into do_setvfinfo(),
 something like the below; I can give it a test run on my ixgbe.

Yes, that is saner overall

Jason

[PATCH] netlink: Delete an unnecessary check before the function call module_put

2015-07-02 Thread SF Markus Elfring

From: Markus Elfring elfr...@users.sourceforge.net
Date: Thu, 2 Jul 2015 18:38:12 +0200

The module_put() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring elfr...@users.sourceforge.net
---
 net/netlink/af_netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index dea9253..9a0ae71 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -158,7 +158,7 @@ static int __netlink_remove_tap(struct netlink_tap *nt)
 out:
 	spin_unlock(&netlink_tap_lock);
 
-	if (found && nt->module)
+	if (found)
 		module_put(nt->module);
 
 	return found ? 0 : -ENODEV;
-- 
2.4.5
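The argument in this patch and the similar net-RDS one is the usual reasoning for NULL-tolerant release functions. In the minimal userspace sketch below, the hypothetical `toy_module` type and `toy_module_put()` stand in for struct module and module_put(). It shows that removing the caller-side NULL test changes nothing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct module: just a reference count. */
struct toy_module {
	int refcnt;
};

/* Like module_put(), this returns immediately when given NULL. */
static void toy_module_put(struct toy_module *m)
{
	if (!m)
		return;
	m->refcnt--;
}

/* Before the cleanup: the caller repeats the NULL test. */
static void release_before(bool found, struct toy_module *m)
{
	if (found && m)
		toy_module_put(m);
}

/* After the cleanup: the callee's own NULL test is enough. */
static void release_after(bool found, struct toy_module *m)
{
	if (found)
		toy_module_put(m);
}
```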


[PATCH] net-RDS: Delete an unnecessary check before the function call module_put

2015-07-02 Thread SF Markus Elfring

From: Markus Elfring elfr...@users.sourceforge.net
Date: Thu, 2 Jul 2015 17:58:21 +0200

The module_put() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring elfr...@users.sourceforge.net
---
 net/rds/transport.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rds/transport.c b/net/rds/transport.c
index 8b4a6cd..83498e1 100644
--- a/net/rds/transport.c
+++ b/net/rds/transport.c
@@ -73,7 +73,7 @@ EXPORT_SYMBOL_GPL(rds_trans_unregister);
 
 void rds_trans_put(struct rds_transport *trans)
 {
-	if (trans && trans->t_owner)
+	if (trans)
 		module_put(trans->t_owner);
 }
 
-- 
2.4.5


Re: [PATCH net-next 6/6] net_sched: act: remove spinlock in fast path

2015-07-02 Thread John Fastabend

On 15-07-02 06:07 AM, Eric Dumazet wrote:
 Final step for gact RCU operation :
 
 1) Use percpu stats
 2) update lastuse only every clock tick
 3) Remove spinlock acquisition, as it is no longer needed.
 
 Since this is the last contended lock in packet RX when tc gact is used,
 this gives an impressive gain.
 
 My host with 8 RX queues was handling 5 Mpps before the patch,
 and more than 10 Mpps after the patch.
 
 Tested:
 
 On receiver :
 IP=ip
 TC=tc
 dev=eth0
 
 $TC qdisc del dev $dev ingress 2>/dev/null
 $TC qdisc add dev $dev ingress
 $TC filter del dev $dev root pref 10 2>/dev/null
 $TC filter del dev $dev pref 10 2>/dev/null
 tc filter add dev $dev est 1sec 4sec parent ffff: protocol ip prio 1 \
   u32 match ip src 7.0.0.0/8 flowid 1:15 action drop
 
 The sender floods packets from the 7.0.0.0/8 network.
 
 Signed-off-by: Eric Dumazet eduma...@google.com
 Cc: Alexei Starovoitov a...@plumgrid.com
 Cc: Jamal Hadi Salim j...@mojatatu.com
 Cc: John Fastabend john.fastab...@gmail.com
 ---
  net/sched/act_gact.c | 14 ++
  1 file changed, 6 insertions(+), 8 deletions(-)

[...]

 @@ -121,9 +121,8 @@ static int tcf_gact(struct sk_buff *skb, const struct tc_action *a,
   struct tcf_result *res)
  {
   struct tcf_gact *gact = a->priv;
 - int action = gact->tcf_action;
 + int action = READ_ONCE(gact->tcf_action);
  
 - spin_lock(&gact->tcf_lock);
  #ifdef CONFIG_GACT_PROB
   {
   u32 ptype = READ_ONCE(gact->tcfg_ptype);
 @@ -132,12 +131,11 @@ static int tcf_gact(struct sk_buff *skb, const struct tc_action *a,
   action = gact_rand[ptype](gact);
   }
  #endif
 - gact->tcf_bstats.bytes += qdisc_pkt_len(skb);
 - gact->tcf_bstats.packets++;
 + bstats_cpu_update(this_cpu_ptr(gact->common.cpu_bstats), skb);
   if (action == TC_ACT_SHOT)
 - gact->tcf_qstats.drops++;
 - gact->tcf_tm.lastuse = jiffies;
 - spin_unlock(&gact->tcf_lock);
 + qstats_drop_inc(this_cpu_ptr(gact->common.cpu_qstats));
 + if (gact->tcf_tm.lastuse != jiffies)
 + gact->tcf_tm.lastuse = jiffies;

I'm missing the point of the if block. Is that really good enough
for the 32-bit system case? I would have expected some wrapper to
handle it here, something like u64_stats_() or maybe _u64_jiffies(). Maybe
after a coffee I'll make sense of it.

  
   return action;
  }
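Points 1) and 2) of the changelog follow a common pattern. Each CPU updates its own counters without taking a shared lock, and readers fold the per-CPU copies together. A field that changes at most once per tick is stored only when its value differs, so the hot path mostly reads that cache line instead of dirtying it. The single-threaded sketch below models this with a plain array indexed by CPU number. `NR_TOY_CPUS`, the structs and the explicit `cpu` argument are hypothetical stand-ins for this_cpu_ptr() and the kernel's per-cpu allocator. The sketch does not address John's 32-bit question.

```c
#include <stdint.h>

#define NR_TOY_CPUS 4	/* hypothetical CPU count */

struct toy_cpu_stats {
	uint64_t bytes;
	uint64_t packets;
	uint64_t drops;
};

struct toy_action {
	struct toy_cpu_stats stats[NR_TOY_CPUS]; /* one slot per CPU */
	unsigned long lastuse;                   /* shared, tick resolution */
	unsigned long lastuse_writes;            /* counts actual stores */
};

/* Fast path: touch only this CPU's slot; no lock is taken. */
static void toy_action_hit(struct toy_action *a, int cpu,
			   unsigned int len, int drop, unsigned long now)
{
	struct toy_cpu_stats *s = &a->stats[cpu];

	s->bytes += len;
	s->packets++;
	if (drop)
		s->drops++;
	/* Store only when the tick changed, to avoid dirtying the line. */
	if (a->lastuse != now) {
		a->lastuse = now;
		a->lastuse_writes++;
	}
}

/* Slow path (e.g. a stats dump): fold the per-CPU copies together. */
static struct toy_cpu_stats toy_action_sum(const struct toy_action *a)
{
	struct toy_cpu_stats sum = { 0, 0, 0 };

	for (int cpu = 0; cpu < NR_TOY_CPUS; cpu++) {
		sum.bytes += a->stats[cpu].bytes;
		sum.packets += a->stats[cpu].packets;
		sum.drops += a->stats[cpu].drops;
	}
	return sum;
}
```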
 


Re: 4.1 regression in resizable hashtable tests

2015-07-02 Thread Meelis Roos

  [   31.898697] Running resizable hashtable tests...
  [   31.898915]   Adding 2048 keys
  [   31.952911]   Traversal complete: counted=17, nelems=2048, entries=2048
  [   31.953004] Test failed: Total count mismatch ^^^
  [   32.022676]   Traversal complete: counted=17, nelems=2048, entries=2048
  [   32.022788] Test failed: Total count mismatch ^^^
  [   32.022828]   Deleting 2048 keys
 
 Thanks for the report. I think this is already fixed. Can you try with the
 following commit:
 
 commit 246b23a7695bd5a457aa51a36a948cce53d1d477

Tried today's git; it's actually worse:

[0.00] PROMLIB: Sun IEEE Boot Prom 'OBP 3.31.0 2001/07/25 20:36'
[0.00] PROMLIB: Root node compatible: 
[0.00] Linux version 4.1.0-12127-g4da3064 (mroos@u5) (gcc version 4.9.2 (Debian 4.9.2-20) ) #19 Thu Jul 2 21:09:48 EEST 2015
[0.00] bootconsole [earlyprom0] enabled
[0.00] ARCH: SUN4U
[0.00] Ethernet address: 08:00:20:f8:c7:72
[0.00] MM: PAGE_OFFSET is 0xf800 (max_phys_bits == 40)
[0.00] MM: VMALLOC [0x0001 -- 0x0600]
[0.00] MM: VMEMMAP [0x0600 -- 0x0c00]
[0.00] Kernel: Using 10 locked TLB entries for main kernel image.
[0.00] Remapping the kernel... done.
[0.00] kmemleak: Kernel memory leak detector disabled
[0.00] OF stdout device is: /pci@1f,0/pci@1,1/ebus@1/se@14,40:a
[0.00] PROM: Built device tree with 70266 bytes of memory.
[0.00] Top of RAM: 0x1ff2c000, Total RAM: 0x1ff2a000
[0.00] Memory hole size: 0MB
[0.00] Allocated 16384 bytes for kernel page tables.
[0.00] Zone ranges:
[0.00]   Normal   [mem 0x-0x1ff2bfff]
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node   0: [mem 0x-0x1fefdfff]
[0.00]   node   0: [mem 0x1ff0-0x1ff2bfff]
[0.00] Initmem setup node 0 [mem 0x-0x1ff2bfff]
[0.00] On node 0 totalpages: 65429
[0.00]   Normal zone: 512 pages used for memmap
[0.00]   Normal zone: 0 pages reserved
[0.00]   Normal zone: 65429 pages, LIFO batch:15
[0.00] Booting Linux...
[0.00] CPU CAPS: [flush,stbar,swap,muldiv,v9,mul32,div32,v8plus]
[0.00] CPU CAPS: [vis]
[0.00] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[0.00] pcpu-alloc: [0] 0 
[0.00] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 64917
[0.00] Kernel command line: root=/dev/sda1 ro
[0.00] PID hash table entries: 2048 (order: 1, 16384 bytes)
[0.00] Dentry cache hash table entries: 65536 (order: 6, 524288 bytes)
[0.00] Inode-cache hash table entries: 32768 (order: 5, 262144 bytes)
[0.00] Sorting __ex_table...
[0.00] Memory: 475912K/523432K available (5266K kernel code, 516K rwdata, 1672K rodata, 520K init, 30210K bss, 47520K reserved, 0K cma-reserved)
[0.00] Running RCU self tests
[0.00] Testing tracer nop: PASSED
[0.00] NR_IRQS:2048 nr_irqs:2048 1
[   26.997388] clocksource: tick: mask: 0x max_cycles: 0x5306eb473f, max_idle_ns: 440795213232 ns
[   27.101123] clocksource: mult[2c71c72] shift[24]
[   27.140662] clockevent: mult[5c28f5c3] shift[32]
[   27.182668] Console: colour dummy device 80x25
[   27.218768] console [tty0] enabled
[   27.243489] bootconsole [earlyprom0] disabled
[   27.279960] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[   27.280027] ... MAX_LOCKDEP_SUBCLASSES:  8
[   27.280070] ... MAX_LOCK_DEPTH:  48
[   27.280114] ... MAX_LOCKDEP_KEYS:8191
[   27.280159] ... CLASSHASH_SIZE:  4096
[   27.280204] ... MAX_LOCKDEP_ENTRIES: 32768
[   27.280250] ... MAX_LOCKDEP_CHAINS:  65536
[   27.280295] ... CHAINHASH_SIZE:  32768
[   27.280341]  memory used by lock dependency info: 8159 kB
[   27.280392]  per task-struct memory footprint: 1920 bytes
[   27.280443] 
[   27.280480] | Locking API testsuite:
[   27.280518] 

[   27.280584]  | spin |wlock |rlock |mutex | wsem | rsem |
[   27.280650]   --------------------------------------------------------------------------
[   27.280755]  A-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[   27.347473]  A-B-B-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[   27.414742]  A-B-B-C-C-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[   27.482380]  A-B-C-A-B-C deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[   27.550106]  A-B-B-C-C-D-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[   27.618220]  A-B-C-D-B-D-D-A deadlock:  ok  |  ok  |  ok  |  ok  |

[PATCH net-next] hv_netvsc: Add support to set MTU reservation from guest side

2015-07-02 Thread Haiyang Zhang

When packet encapsulation is in use, the MTU needs to be reduced for
headroom reservation.
The existing code takes the updated MTU value only from the host side.
But vSwitch extensions, such as Open vSwitch, require the flexibility
to change the MTU to different values from within a guest during the
lifecycle of a vNIC, when the encapsulation protocol is changed. This
patch supports this kind of MTU change.

Signed-off-by: Haiyang Zhang haiya...@microsoft.com
Reviewed-by: K. Y. Srinivasan k...@microsoft.com
---
 drivers/net/hyperv/netvsc_drv.c   |3 +--
 drivers/net/hyperv/rndis_filter.c |2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 358475e..68e7ece 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -743,8 +743,7 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
 	if (nvdev->nvsp_version >= NVSP_PROTOCOL_VERSION_2)
 		limit = NETVSC_MTU - ETH_HLEN;
 
-	/* Hyper-V hosts don't support MTU < ETH_DATA_LEN (1500) */
-	if (mtu < ETH_DATA_LEN || mtu > limit)
+	if (mtu < 68 || mtu > limit)
 		return -EINVAL;
 
 	nvdev->start_remove = true;
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index 006c1b8..172824e 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -1053,7 +1053,7 @@ int rndis_filter_device_add(struct hv_device *dev,
 	ret = rndis_filter_query_device(rndis_device,
 					RNDIS_OID_GEN_MAXIMUM_FRAME_SIZE,
 					&mtu, &size);
-	if (ret == 0 && size == sizeof(u32))
+	if (ret == 0 && size == sizeof(u32) && mtu < net_device->ndev->mtu)
 		net_device->ndev->mtu = mtu;
 
 	/* Get the mac address */
-- 
1.7.4.1
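The two hunks reduce to a small policy. The guest may pick any MTU from 68 (the minimum IPv4 MTU) up to the host-dependent limit. The frame size the host reports during device setup is applied only when it is smaller than the current MTU. Below is a standalone sketch of that policy. The function names are hypothetical, the limit is passed in as a parameter, and the `size == sizeof(u32)` check is left out.

```c
#include <errno.h>

#define TOY_MIN_MTU 68	/* minimum IPv4 MTU, as in the new check */

/* Models the new bounds test in netvsc_change_mtu(). */
static int toy_check_mtu(int mtu, int limit)
{
	if (mtu < TOY_MIN_MTU || mtu > limit)
		return -EINVAL;
	return 0;
}

/*
 * Models the new rndis_filter_device_add() logic: the host-reported
 * value is used only if the query succeeded and it lowers the MTU.
 */
static int toy_apply_host_mtu(int cur_mtu, int query_ret, int host_mtu)
{
	if (query_ret == 0 && host_mtu < cur_mtu)
		return host_mtu;
	return cur_mtu;
}
```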


amd-xgbe e0700000.xgmac: DMA-API: device driver tries to sync DMA memory it has not allocated

2015-07-02 Thread Kim Phillips

Hi Tom,

A pristine 4.1 kernel with CONFIG_DMA_API_DEBUG=y produces this call
trace on an AMD Seattle:

[  112.896576] [ cut here ]
[  112.896591] WARNING: CPU: 2 PID: 1059 at lib/dma-debug.c:1202 check_sync+0x138/0x56c()
[  112.896597] amd-xgbe e0700000.xgmac: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x008003d52000] [size=1536 bytes]
[  112.896600] Modules linked in: cpufreq_stats vfat fat xfs libcrc32c spi_pl022 aes_ce_blk ablk_helper cryptd aes_ce_cipher ghash_ce sha2_ce sha1_ce uio_pdrv_genirq uio fuse
[  112.896634] CPU: 2 PID: 1059 Comm: sshd Tainted: GW   4.1.0 #10
[  112.896638] Hardware name: Default string Default string/Default string, BIOS ROD0082B 06/16/2015
[  112.896641] Call trace:
[  112.899086] [fe097b20] dump_backtrace+0x0/0x170
[  112.899091] [fe097cb0] show_stack+0x20/0x2c
[  112.899097] [fe813da0] dump_stack+0x8c/0xc4
[  112.899102] [fe0c45bc] warn_slowpath_common+0xa0/0xd8
[  112.899106] [fe0c4668] warn_slowpath_fmt+0x74/0x88
[  112.899109] [fe486f24] check_sync+0x134/0x56c
[  112.899113] [fe4873ac] debug_dma_sync_single_for_cpu+0x50/0x5c
[  112.899119] [fe5b07fc] xgbe_rx_poll+0x1e0/0x6c0
[  112.899123] [fe5b1cdc] xgbe_one_poll+0x34/0x6c
[  112.899128] [fe6a510c] net_rx_action+0x270/0x504
[  112.899133] [fe0c9ba0] __do_softirq+0x120/0x60c
[  112.899136] [fe0ca3fc] irq_exit+0xa4/0xe4
[  112.899143] [fe12d2f8] __handle_domain_irq+0x74/0xc4
[  112.899146] [fe0903ec] gic_handle_irq+0x38/0x84
[  112.899150] Exception stack(0xfe03de09bbf0 to 0xfe03de09bd10)
[  112.899154] bbe0: 1ee64b30 fe00 
002111e8 fe00
[  112.899158] bc00: de09bd30 fe03 0081a668 fe00 dd54fa00 fe03 
de09bcb0 fe03
[  112.899162] bc20: 0001  0001    
 
[  112.899166] bc40:   002111e8 fe00   
de098000 fe03
[  112.899170] bc60: de09bc10 fe03 1d456228  0076  
0008 
[  112.899174] bc80: 1000 00077d2a  001dcd65 00213e98 fe00 
96ea8f60 03ff
[  112.899178] bca0: 65b9a2e2  1ee64b30 fe00 002111e8 fe00 
1ee64ba0 fe00
[  112.899181] bcc0: 0041  1ee10b80 fe00 025c58c8 fe00 
0003 
[  112.899185] bce0: 027b6c48 fe00 00100077  02ab0b4b  
de09bd30 fe03
[  112.899188] bd00: 0081a660 fe00 de09bd30 fe03
[  112.899192] [fe0934e8] el1_irq+0x68/0x100
[  112.899198] [fe2111e4] validate_mm+0x44/0x2d4
[  112.899203] [fe211edc] vma_link+0x98/0xe0
[  112.899206] [fe213e70] do_brk+0x2ec/0x314
[  112.899209] [fe213fd4] SyS_brk+0x13c/0x170
[  112.899212] ---[ end trace cbf36648db00d232 ]---

Can you look into it?

Thanks,

Kim
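For context on the warning: with CONFIG_DMA_API_DEBUG enabled, lib/dma-debug.c keeps a record of every mapping a driver creates and checks each sync call against those records. "Tries to sync DMA memory it has not allocated" is the report printed when no matching mapping is found for the synced device address. The toy model below shows the kind of bookkeeping involved. The fixed-size table and the `toy_` functions are hypothetical, and the real code uses hash buckets and checks more (such as direction). The sketch says nothing about where the xgbe driver goes wrong.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_MAPS 16	/* hypothetical table size */

struct toy_map {
	uint64_t dev_addr;
	size_t size;
	bool used;
};

static struct toy_map toy_maps[TOY_MAX_MAPS];

/* Record a mapping, as a map call would under DMA-API debugging. */
static bool toy_map_add(uint64_t dev_addr, size_t size)
{
	for (int i = 0; i < TOY_MAX_MAPS; i++) {
		if (!toy_maps[i].used) {
			toy_maps[i] = (struct toy_map){ dev_addr, size, true };
			return true;
		}
	}
	return false;
}

/* Forget a mapping, as an unmap call would. */
static void toy_map_del(uint64_t dev_addr)
{
	for (int i = 0; i < TOY_MAX_MAPS; i++)
		if (toy_maps[i].used && toy_maps[i].dev_addr == dev_addr)
			toy_maps[i].used = false;
}

/*
 * Sync check: the synced range must lie inside a recorded mapping.
 * Returns false where the debug layer would print a warning.
 */
static bool toy_sync_ok(uint64_t dev_addr, size_t size)
{
	for (int i = 0; i < TOY_MAX_MAPS; i++) {
		const struct toy_map *m = &toy_maps[i];

		if (m->used && dev_addr >= m->dev_addr &&
		    dev_addr - m->dev_addr + size <= m->size)
			return true;
	}
	return false;
}
```

In this model, syncing after the unmap and syncing an address that was never mapped both trip the check.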




Re: [PATCH] rionet: Don't try to corrupt skbuff assigning data pointer directly

2015-07-02 Thread David Miller

From: Alexander Sverdlin alexander.sverd...@nokia.com
Date: Wed, 1 Jul 2015 15:01:11 +0200

 It's not allowed to assign the data pointer of an skbuff directly: this makes no
 sense if the assigned pointer is the very same as the already existing one, and it breaks
 all the pointer arithmetic in all other cases. We cannot do better than just
 compare them and report BUG() in case of mismatch.

 Signed-off-by: Alexander Sverdlin alexander.sverd...@nokia.com

BUG takes the entire machine out, which is worse than corrupting the
skb->data.

If you really want to assert this condition, do it in a way that
doesn't kill the entire machine.
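Here is one way to follow that advice, sketched in userspace C: verify the invariant, complain once, and drop only the offending packet. The `toy_` names and the one-shot flag are hypothetical. In the kernel, the natural building blocks would be WARN_ON_ONCE() and freeing the skb, but this thread does not say what the final rionet fix looks like.

```c
#include <stdbool.h>
#include <stdio.h>

struct toy_pkt {
	unsigned char *data;	/* current data pointer */
	bool dropped;
};

static bool toy_warned;	/* emulates the "once" in WARN_ON_ONCE() */

/*
 * Instead of reassigning pkt->data (or halting on a mismatch), verify
 * that the receive buffer is where it is expected to be. On mismatch,
 * warn once and drop only this packet. Returns true if the packet may
 * be processed further.
 */
static bool toy_check_rx_data(struct toy_pkt *pkt, unsigned char *expected)
{
	if (pkt->data == expected)
		return true;
	if (!toy_warned) {
		toy_warned = true;
		fprintf(stderr, "rx buffer mismatch: %p != %p, dropping\n",
			(void *)pkt->data, (void *)expected);
	}
	pkt->dropped = true;
	return false;
}
```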

Bpqether broken in 4.1

2015-07-02 Thread Ralf Baechle

Eric's commit 1d5da757da860a6916adbf68b09e868062b4b3b8 ("ax25: Stop using
magic neighbour cache operations.") breaks IP traffic over the AX.25 bpqether
driver.

Here's how to reproduce the issue if you don't have an AX.25 setup.  The
arp command is there to fudge things if you don't have a peer that would
answer ARP requests.

# modprobe bpqether
# ifconfig bpq0 hw ax25 abcdef-7 172.20.4.1/24
# arp -H ax25 -s 172.20.4.2 uvwxyz-9
# ping 172.20.4.2

This results in one "Dead loop on virtual device bpq0, fix it urgently!" message
per ping packet.  With the following little debug patch

diff --git a/net/core/dev.c b/net/core/dev.c
index aa82f9a..5fef868 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3011,6 +3011,7 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
 recursion_alert:
 			net_crit_ratelimited("Dead loop on virtual device %s, fix it urgently!\n",
 					     dev->name);
+			WARN_ON(1);
 		}
 	}
 
I get the following backtrace:

[   33.149171] Dead loop on virtual device bpq0, fix it urgently!
[   33.149718] [ cut here ]
[   33.149754] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3014 __dev_queue_xmit+0x3f6/0x530()
[   33.149769] Modules linked in:
[   33.149789] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.0-00010-g21c6d95-dirty #18
[   33.149799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
[   33.149810]   de52945c8e778a65 88007fc039a8 816d2165
[   33.149823]    88007fc039e8 810634aa
[   33.149833]  88007fc039c8  880078f9 880078f9
[   33.149844] Call Trace:
[   33.149885]  <IRQ>  [816d2165] dump_stack+0x45/0x57
[   33.149927]  [810634aa] warn_slowpath_common+0x8a/0xc0
[   33.149939]  [810635da] warn_slowpath_null+0x1a/0x20
[   33.149949]  [815c7c06] __dev_queue_xmit+0x3f6/0x530
[   33.149967]  [8108cbed] ? ttwu_do_wakeup+0x1d/0xe0
[   33.149978]  [815c7d53] dev_queue_xmit_sk+0x13/0x20
[   33.149994]  [816b9951] ax25_queue_xmit+0x61/0x70
[   33.150005]  [816b9476] ax25_ip_xmit+0xd6/0x2d0
[   33.150022]  [8108fb47] ? wake_up_process+0x27/0x50
[   33.150050]  [814dda35] bpq_xmit+0x1d5/0x200
[   33.150061]  [815c7694] dev_hard_start_xmit+0x264/0x3e0
[   33.150073]  [815c7ccd] __dev_queue_xmit+0x4bd/0x530
[   33.150083]  [815c7d53] dev_queue_xmit_sk+0x13/0x20
[   33.150099]  [815d03c2] neigh_connected_output+0xc2/0x110
[   33.150110]  [815d3483] neigh_update+0x333/0x770
[   33.150117]  [8162d2a7] arp_process.isra.15+0x2f7/0x690
[   33.150117]  [8162d736] arp_rcv+0xe6/0x130
[   33.150117]  [815c5543] __netif_receive_skb_core+0x693/0x830
[   33.150117]  [815c56f8] __netif_receive_skb+0x18/0x60
[   33.150117]  [815c6532] process_backlog+0xb2/0x150
[   33.150117]  [815c5cd2] net_rx_action+0x212/0x340
[   33.150117]  [81067aeb] __do_softirq+0x10b/0x2d0
[   33.150117]  [81067f15] irq_exit+0x145/0x150
[   33.150117]  [816da8a8] do_IRQ+0x58/0xf0
[   33.150117]  [816d896e] common_interrupt+0x6e/0x6e
[   33.150117]  <EOI>  [8104b236] ? native_safe_halt+0x6/0x10
[   33.150117]  [810c4d43] ? rcu_eqs_enter+0xa3/0xb0
[   33.150117]  [8100ddbe] default_idle+0x1e/0xc0
[   33.150117]  [8100e81f] arch_cpu_idle+0xf/0x20
[   33.150117]  [810a6f57] cpu_startup_entry+0x377/0x3f0
[   33.150117]  [816c989c] rest_init+0x7c/0x80
[   33.150117]  [81d32fe4] start_kernel+0x484/0x4a5
[   33.150117]  [81d32120] ? early_idt_handler_array+0x120/0x120
[   33.150117]  [81d32315] x86_64_start_reservations+0x2a/0x2c
[   33.150117]  [81d3245c] x86_64_start_kernel+0x145/0x168
[   33.150117] ---[ end trace ff4df9d904cced48 ]---

  Ralf
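The backtrace shows __dev_queue_xmit() on bpq0 calling bpq_xmit(), which goes through ax25_ip_xmit() and ax25_queue_xmit() back into __dev_queue_xmit() on the same device and CPU. In a simplified reading of the 4.1 code, the "Dead loop" branch is taken when the current CPU already owns the device's transmit lock. That lock is only taken for drivers without NETIF_F_LLTX, which is why setting that flag is one of the options Eric raises in his reply. The single-threaded model below sketches that logic; it is not kernel code, and the struct, the depth limit and all `toy_` names are hypothetical.

```c
#include <stdbool.h>

#define TOY_RECURSION_LIMIT 10	/* hypothetical depth bound */

struct toy_dev {
	bool lltx;	/* driver does its own locking (like NETIF_F_LLTX) */
	int lock_owner;	/* CPU holding the tx lock, -1 if none */
	int delivered;	/* packets that reached the "wire" */
	int dead_loops;	/* "Dead loop on virtual device" events */
};

static int toy_depth;	/* stands in for the per-CPU recursion counter */

static void toy_queue_xmit(struct toy_dev *dev, int cpu, int hops);

/* Driver xmit: re-queue to the same device while hops remain. */
static void toy_driver_xmit(struct toy_dev *dev, int cpu, int hops)
{
	if (hops > 0)
		toy_queue_xmit(dev, cpu, hops - 1);
	else
		dev->delivered++;
}

static void toy_queue_xmit(struct toy_dev *dev, int cpu, int hops)
{
	if (dev->lock_owner == cpu || toy_depth > TOY_RECURSION_LIMIT) {
		dev->dead_loops++;	/* the packet is dropped */
		return;
	}
	if (!dev->lltx)
		dev->lock_owner = cpu;	/* like HARD_TX_LOCK */
	toy_depth++;
	toy_driver_xmit(dev, cpu, hops);
	toy_depth--;
	if (!dev->lltx)
		dev->lock_owner = -1;	/* like HARD_TX_UNLOCK */
}
```

With the transmit lock taken, a single re-queue to the same device is reported as a dead loop. With LLTX-style locking, the nested call goes through, bounded only by the depth limit.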

Re: [PATCH net-next] net: bail on sock_wfree, sock_rfree when we have a TCP_TIMEWAIT sk

2015-07-02 Thread Julian Anastasov


Hello,

On Thu, 2 Jul 2015, Julian Anastasov wrote:

   Alex, in our discussion in January I thought
 we could skip calling skb_orphan in some cases, but as
 the input and output paths use different skb->destructor
 callbacks, we should call skb_orphan for every method, in every
 case when skb->dev != NULL, even when we do not call
 LOCAL_OUT, i.e. when NF_ACCEPT is returned for traffic
 to a local real server. We should not call it only for
 a local socket (skb->dev == NULL).
 
   I think, your patch from January is almost
 good:
 
 http://archive.linuxvirtualserver.org/html/lvs-devel/2015-01/msg00014.html
 
   Just add an skb->dev check and we should be fine.

Sorry, I overlooked the problem. The above is not
correct because we can avoid the skb_orphan call
when 'local' is true. ip_vs_nat_send_or_cont should
call skb_orphan even for local=true, while for TUN
it should be called before ip_vs_prepare_tunneled_skb.
All other methods should avoid skb_orphan if
local=true or skb->dev is NULL.

Regards

--
Julian Anastasov j...@ssi.bg

Re: pull request: bluetooth 2015-07-02

2015-07-02 Thread David Miller

From: Johan Hedberg johan.hedb...@gmail.com
Date: Thu, 2 Jul 2015 10:30:38 +0300

 A couple of regressions crept in because of a patch to use proper list
 APIs rather than manually reading and writing the next/prev pointers
 (commit 835a6a2f8603237a3e6cded5a6765090ecb06ea5). It turns out this was
 masking a few bugs: a missing INIT_LIST_HEAD() call and incorrectly
 using list_del() rather than list_del_init(). The two patches in this
 set fix these, and it'd be nice if they could still make it to 4.2-rc1 to
 avoid new bug reports from users.

 Please let me know if there are any issues pulling. Thanks.

Pulled, thanks Johan.
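Both bugs Johan mentions come down to the semantics of the kernel's circular doubly linked lists. A head must be initialised to point at itself. list_del() leaves the removed entry unusable (the kernel poisons its pointers), while list_del_init() turns the entry into an empty list, so a later list_empty() test or a second removal is harmless. Below is a minimal userspace re-implementation, with hypothetical `toy_` names and NULL in place of the kernel's poison values.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_list {
	struct toy_list *next, *prev;
};

/* Like INIT_LIST_HEAD(): an empty list points at itself. */
static void toy_init_list_head(struct toy_list *h)
{
	h->next = h;
	h->prev = h;
}

static bool toy_list_empty(const struct toy_list *h)
{
	return h->next == h;
}

/* Like list_add(): insert n right after head. */
static void toy_list_add(struct toy_list *n, struct toy_list *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void toy_unlink(struct toy_list *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* Like list_del(): the entry is left unusable afterwards. */
static void toy_list_del(struct toy_list *e)
{
	toy_unlink(e);
	e->next = NULL;
	e->prev = NULL;
}

/* Like list_del_init(): the entry becomes an empty list of its own. */
static void toy_list_del_init(struct toy_list *e)
{
	toy_unlink(e);
	toy_init_list_head(e);
}
```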

Re: [PATCH net] bridge: vlan: fix usage of vlan 0 and 4095 again

2015-07-02 Thread David Miller

From: Nikolay Aleksandrov niko...@cumulusnetworks.com
Date: Thu,  2 Jul 2015 05:48:17 -0700

 Vlan ids 0 and 4095 were disallowed by commit:
 8adff41c3d25 ("bridge: Don't use VID 0 and 4095 in vlan filtering")
 but then the check was removed when vlan ranges were introduced by:
 bdced7ef7838 ("bridge: support for multiple vlans and vlan ranges in setlink and dellink requests")
 So reintroduce the vlan range check.
 Before patch:
 [root@testvm ~]# bridge vlan add vid 0 dev eth0 master
 (succeeds)
 After Patch:
 [root@testvm ~]# bridge vlan add vid 0 dev eth0 master
 RTNETLINK answers: Invalid argument

 Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
 Fixes: bdced7ef7838 ("bridge: support for multiple vlans and vlan ranges in setlink and dellink requests")

Applied.
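The restored check amounts to rejecting the two VLAN IDs that 802.1Q reserves: 0 (no VLAN, used for priority-tagged frames) and 4095. That leaves 1 through 4094 usable. Below is a minimal sketch for a single ID and for an inclusive range. The function names are hypothetical, and the ordering test on ranges is an assumption of this sketch, not something the patch description states.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_VLAN_N_VID 4096	/* 12-bit VID space */

/* VIDs 0 and 4095 are reserved by 802.1Q and must not be configured. */
static bool toy_vid_valid(uint16_t vid)
{
	return vid >= 1 && vid < TOY_VLAN_N_VID - 1;
}

/* A range is accepted only if both ends are valid and ordered. */
static bool toy_vid_range_valid(uint16_t begin, uint16_t end)
{
	return toy_vid_valid(begin) && toy_vid_valid(end) && begin <= end;
}
```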

Re: [v2,5/9] fsl/fman: Add Frame Manager support

2015-07-02 Thread Scott Wood

On Thu, 2015-07-02 at 10:32 -0500, Liberman Igal-B31950 wrote:
 Hi Scott,
 Thank you for your feedback, please take a look at my comments/questions.
 
 Regards,
 Igal Liberman.
 
  -Original Message-
  From: Wood Scott-B07421
  Sent: Friday, June 26, 2015 6:55 AM
  To: Liberman Igal-B31950
  Cc: netdev@vger.kernel.org; linuxppc-...@lists.ozlabs.org; Bucur Madalin-Cristian-B32716; pebo...@tiscali.nl
  Subject: Re: [v2,5/9] fsl/fman: Add Frame Manager support
  
  On Wed, 2015-06-24 at 22:35 +0300, igal.liberman@freescale.com wrote:
   From: Igal Liberman igal.liber...@freescale.com
   
   Add Frame Manager driver support.
   This patch adds the FMan configuration, initialization and runtime
   control routines.
   
   Signed-off-by: Igal Liberman igal.liber...@freescale.com
   ---
drivers/net/ethernet/freescale/fman/Kconfig|   35 +
drivers/net/ethernet/freescale/fman/Makefile   |2 +-
drivers/net/ethernet/freescale/fman/fm.c   | 1406
drivers/net/ethernet/freescale/fman/fm.h   |  394 ++
drivers/net/ethernet/freescale/fman/fm_common.h|  142 ++
drivers/net/ethernet/freescale/fman/fm_drv.c   |  701 ++
drivers/net/ethernet/freescale/fman/fm_drv.h   |  116 ++
drivers/net/ethernet/freescale/fman/inc/enet_ext.h |  199 +++
drivers/net/ethernet/freescale/fman/inc/fm_ext.h   |  488 +++
.../net/ethernet/freescale/fman/inc/fsl_fman_drv.h |   99 ++
drivers/net/ethernet/freescale/fman/inc/service.h  |   55 +
11 files changed, 3636 insertions(+), 1 deletion(-)
create mode 100644 drivers/net/ethernet/freescale/fman/fm.c
create mode 100644 drivers/net/ethernet/freescale/fman/fm.h
create mode 100644 drivers/net/ethernet/freescale/fman/fm_common.h
create mode 100644 drivers/net/ethernet/freescale/fman/fm_drv.c
create mode 100644 drivers/net/ethernet/freescale/fman/fm_drv.h
create mode 100644 drivers/net/ethernet/freescale/fman/inc/enet_ext.h
create mode 100644 drivers/net/ethernet/freescale/fman/inc/fm_ext.h
create mode 100644 drivers/net/ethernet/freescale/fman/inc/fsl_fman_drv.h
create mode 100644 drivers/net/ethernet/freescale/fman/inc/service.h
  
  Again, please start with something pared down, without extraneous
  features, but *with* enough functionality to actually pass packets around.
  Getting this thing into decent shape is going to be hard enough without
  carrying around the excess baggage.
  
   diff --git a/drivers/net/ethernet/freescale/fman/Kconfig
   b/drivers/net/ethernet/freescale/fman/Kconfig
   index 825a0d5..12c75bfd 100644
   --- a/drivers/net/ethernet/freescale/fman/Kconfig
   +++ b/drivers/net/ethernet/freescale/fman/Kconfig
   @@ -7,3 +7,38 @@ config FSL_FMAN
    	  Freescale Data-Path Acceleration Architecture Frame Manager
    	  (FMan) support
   
   +if FSL_FMAN
   +
   +config FSL_FM_MAX_FRAME_SIZE
   +	int "Maximum L2 frame size"
   +	range 64 9600
   +	default 1522
   +	help
   +	  Configure this in relation to the maximum possible MTU of your
   +	  network configuration. In particular, one would need to
   +	  increase this value in order to use jumbo frames.
   +	  FSL_FM_MAX_FRAME_SIZE must accommodate the Ethernet FCS
   +	  (4 bytes) and one ETH+VLAN header (18 bytes), to a total of
   +	  22 bytes in excess of the desired L3 MTU.
   +
   +	  Note that having too large a FSL_FM_MAX_FRAME_SIZE (much larger
   +	  than the actual MTU) may lead to buffer exhaustion, especially
   +	  in the case of badly fragmented datagrams on the Rx path.
   +	  Conversely, having a FSL_FM_MAX_FRAME_SIZE smaller than the
   +	  actual MTU will lead to frames being dropped.
  
  Scatter gather can't be used for jumbo frames?
  
 
 Scatter gather is used; it's introduced in dpaa_eth as a separate patch
 from the basic support.
 The dpaa_eth driver can work in S/G mode or use large buffers sized to the
 max frame size to reduce S/G overhead (a performance vs. memory usage trade-off).

That's not what the help text says: "In particular, one would need to
increase this value in order to use jumbo frames" and "Conversely, having a
FSL_FM_MAX_FRAME_SIZE smaller than the actual MTU will lead to frames being
dropped."
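The arithmetic in the quoted help text can be checked directly. The maximum frame size must cover the L3 MTU plus a 14-byte Ethernet header, a 4-byte VLAN tag and the 4-byte FCS, i.e. 22 bytes. The sketch below, with hypothetical helper names, reproduces the 1522-byte default for a 1500-byte MTU and the largest MTU that the 64..9600 Kconfig range allows.

```c
#include <stdbool.h>

#define TOY_ETH_HLEN   14	/* destination + source MAC + EtherType */
#define TOY_VLAN_HLEN   4	/* one 802.1Q tag */
#define TOY_ETH_FCS     4	/* frame check sequence */
#define TOY_FRAME_OVERHEAD (TOY_ETH_HLEN + TOY_VLAN_HLEN + TOY_ETH_FCS)

#define TOY_MAX_FRAME_MIN    64	/* Kconfig range lower bound */
#define TOY_MAX_FRAME_LIMIT 9600	/* Kconfig range upper bound */

/* Frame size needed to carry a given L3 MTU. */
static int toy_frame_size_for_mtu(int mtu)
{
	return mtu + TOY_FRAME_OVERHEAD;
}

/* Largest L3 MTU that a given max frame size can carry. */
static int toy_mtu_for_frame_size(int max_frame)
{
	return max_frame - TOY_FRAME_OVERHEAD;
}

/* Whether a max frame size lies inside the quoted Kconfig range. */
static bool toy_frame_size_in_range(int max_frame)
{
	return max_frame >= TOY_MAX_FRAME_MIN &&
	       max_frame <= TOY_MAX_FRAME_LIMIT;
}
```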

  Why is this a compile-time option?
  
 
 This is needed for a couple of reasons:
  - FMan resource sizing - we need to know the maximum frame size we plan to 
 use for determining the Rx FIFO sizes at config time

Why can't the FIFO be resized at runtime?

  - There are issues when changing the MAC maximum frame size at runtime,
 hence the need to set the maximum allowable value in HW and compensate in
 SW (drop frames above the set MTU).

What are the issues?

In any case, it could at least be a module parameter (i.e. a kernel command 
line argument when not built as a

[PATCH net-next 3/3] vxlan: GRO support at tunnel layer

2015-07-02 Thread Tom Herbert

Add calls to gro_cells infrastructure to do GRO when receiving on a tunnel.

Testing:

Ran 200 netperf TCP_STREAM instances

- With fix (GRO enabled on VXLAN interface)

  Verified GRO is happening.

  9084 MBps tput
  3.44% CPU utilization

- Without fix (GRO disabled on VXLAN interface)

  Verified no GRO is happening.

  9084 MBps tput
  5.54% CPU utilization

Signed-off-by: Tom Herbert t...@herbertland.com
---
 drivers/net/vxlan.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index d41b482..363f6b1 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -28,6 +28,7 @@
 #include <linux/hash.h>
 #include <linux/ethtool.h>
 #include <net/arp.h>
+#include <net/gro_cells.h>
 #include <net/ndisc.h>
 #include <net/ip.h>
 #include <net/ip_tunnels.h>
@@ -132,6 +133,7 @@ struct vxlan_dev {
 	spinlock_t	  hash_lock;
 	unsigned int	  addrcnt;
 	unsigned int	  addrmax;
+	struct gro_cells  gro_cells;
 
 	struct hlist_head fdb_head[FDB_HASH_SIZE];
 };
@@ -1318,7 +1320,7 @@ static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
 	stats->rx_bytes += skb->len;
 	u64_stats_update_end(&stats->syncp);
 
-	netif_rx(skb);
+	gro_cells_receive(&vxlan->gro_cells, skb);
 
 	return;
 drop:
@@ -2376,6 +2378,8 @@ static void vxlan_setup(struct net_device *dev)
 
 	vxlan->dev = dev;
 
+	gro_cells_init(&vxlan->gro_cells, dev);
+
 	for (h = 0; h < FDB_HASH_SIZE; ++h)
 		INIT_HLIST_HEAD(&vxlan->fdb_head[h]);
 }
@@ -2751,6 +2755,7 @@ static void vxlan_dellink(struct net_device *dev, struct list_head *head)
 	hlist_del_rcu(&vxlan->hlist);
 	spin_unlock(&vn->sock_lock);
 
+	gro_cells_destroy(&vxlan->gro_cells);
 	list_del(&vxlan->next);
 	unregister_netdevice_queue(dev, head);
 }
@@ -2956,8 +2961,10 @@ static void __net_exit vxlan_exit_net(struct net *net)
 		/* If vxlan->dev is in the same netns, it has already been added
 		 * to the list by the previous loop.
 		 */
-		if (!net_eq(dev_net(vxlan->dev), net))
+		if (!net_eq(dev_net(vxlan->dev), net)) {
+			gro_cells_destroy(&vxlan->gro_cells);
 			unregister_netdevice_queue(vxlan->dev, &list);
+		}
 	}
 
 	unregister_netdevice_many(&list);
-- 
1.8.1
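In the test results above, throughput stays at 9084 MBps while CPU utilization drops from 5.54% to 3.44%. That is what GRO aims for: the stack handles fewer, larger packets. The core idea can be shown with a deliberately simplified userspace model that merges back-to-back segments of one flow up to a size cap. The flow key, the 64 KB cap and the `toy_` names are assumptions. Real GRO compares many more header fields, and the sketch does not reflect the gro_cells code itself.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_GRO_MAX 65536u	/* hypothetical cap on a merged packet */

struct toy_seg {
	uint32_t flow;	/* stands in for the full flow key */
	uint32_t seq;	/* sequence number of the first payload byte */
	uint32_t len;	/* payload length */
};

/*
 * Merge runs of back-to-back segments in place. Returns the number of
 * packets left for the upper layers to process.
 */
static size_t toy_gro(struct toy_seg *segs, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		struct toy_seg *last = out ? &segs[out - 1] : NULL;

		if (last && last->flow == segs[i].flow &&
		    last->seq + last->len == segs[i].seq &&
		    last->len + segs[i].len <= TOY_GRO_MAX) {
			last->len += segs[i].len;	/* coalesce */
		} else {
			segs[out++] = segs[i];		/* start a new packet */
		}
	}
	return out;
}
```

For example, three contiguous 1448-byte segments of one flow collapse into a single 4344-byte packet, so the layers above see one packet instead of three.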


[RFC patch] sctp: sctp_generate_fwdtsn: Initialize sctp_fwdtsn_skip array, neatening

2015-07-02 Thread Joe Perches

It's not clear to me that the sctp_fwdtsn_skip array is
always initialized when used.

Is it appropriate to initialize the array to 0?

This patch initializes the array to 0 and moves the
local variables into the blocks where they are used.

It also does some miscellaneous neatening by using
"continue;" and unindenting the following block and
using ARRAY_SIZE rather than 10 to decouple the
array declaration size from a constant.
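The loop being restructured does three things. It skips chunks already covered by the cumulative TSN ack, and it advances Advanced.Peer.Ack.Point across the contiguous run of abandoned TSNs. For each ordered stream, it remembers the stream sequence number of the last skipped chunk, stopping when the fixed-size array is full. Below is a simplified userspace sketch of that logic, with hypothetical types, a capacity of 10 matching ftsn_skip_arr, and serial-number comparison for TSNs. It is not the kernel code and leaves out freeing the acked chunks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_SKIPS 10	/* same capacity as ftsn_skip_arr */

/* Hypothetical, flattened view of an abandoned DATA chunk. */
struct toy_chunk {
	uint32_t tsn;
	uint16_t stream;
	uint16_t ssn;
	bool unordered;
};

struct toy_skip {
	uint16_t stream;
	uint16_t ssn;
};

/* Serial-number "less than or equal" for 32-bit TSNs. */
static bool toy_tsn_lte(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) <= 0;
}

/* Index of an existing entry for this stream, or nskips if none. */
static int toy_skip_pos(const struct toy_skip *arr, int nskips,
			uint16_t stream)
{
	for (int i = 0; i < nskips; i++)
		if (arr[i].stream == stream)
			return i;
	return nskips;
}

/*
 * Walk abandoned chunks in TSN order, advance *adv_ack over the
 * contiguous run after ctsn and fill arr. Returns the entry count.
 */
static int toy_collect_skips(const struct toy_chunk *c, size_t n,
			     uint32_t ctsn, uint32_t *adv_ack,
			     struct toy_skip arr[TOY_MAX_SKIPS])
{
	int nskips = 0;

	for (size_t i = 0; i < n; i++) {
		int pos;

		if (toy_tsn_lte(c[i].tsn, ctsn))
			continue;	/* already acked; would be freed */
		if (!toy_tsn_lte(c[i].tsn, *adv_ack + 1))
			break;		/* gap: stop advancing */
		*adv_ack = c[i].tsn;
		if (c[i].unordered)
			continue;	/* unordered chunks need no entry */
		pos = toy_skip_pos(arr, nskips, c[i].stream);
		arr[pos].stream = c[i].stream;
		arr[pos].ssn = c[i].ssn;
		if (pos == nskips)
			nskips++;
		if (nskips == TOY_MAX_SKIPS)
			break;
	}
	return nskips;
}
```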
---
 net/sctp/outqueue.c | 90 ++---
 1 file changed, 44 insertions(+), 46 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 7e8f0a1..4c80d7b 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -1684,13 +1684,9 @@ static inline int sctp_get_skip_pos(struct sctp_fwdtsn_skip *skiplist,
 static void sctp_generate_fwdtsn(struct sctp_outq *q, __u32 ctsn)
 {
 	struct sctp_association *asoc = q->asoc;
-	struct sctp_chunk *ftsn_chunk = NULL;
-	struct sctp_fwdtsn_skip ftsn_skip_arr[10];
-	int nskips = 0;
-	int skip_pos = 0;
-	__u32 tsn;
-	struct sctp_chunk *chunk;
 	struct list_head *lchunk, *temp;
+	struct sctp_fwdtsn_skip ftsn_skip_arr[10] = {};
+	int nskips = 0;
 
 	if (!asoc->peer.prsctp_capable)
 		return;
@@ -1726,9 +1722,11 @@ static void sctp_generate_fwdtsn(struct sctp_outq *q, __u32 ctsn)
 	 * Advanced.Peer.Ack.Point from 102 to 104 locally.
 	 */
 	list_for_each_safe(lchunk, temp, &q->abandoned) {
-		chunk = list_entry(lchunk, struct sctp_chunk,
-					transmitted_list);
-		tsn = ntohl(chunk->subh.data_hdr->tsn);
+		struct sctp_chunk *chunk = list_entry(lchunk, struct sctp_chunk,
+						      transmitted_list);
+		sctp_datahdr_t *data_hdr = chunk->subh.data_hdr;
+		__u32 tsn = ntohl(data_hdr->tsn);
+		int skip_pos;
 
 		/* Remove any chunks in the abandoned queue that are acked by
 		 * the ctsn.
@@ -1736,52 +1734,52 @@ static void sctp_generate_fwdtsn(struct sctp_outq *q, __u32 ctsn)
 		if (TSN_lte(tsn, ctsn)) {
 			list_del_init(lchunk);
 			sctp_chunk_free(chunk);
-		} else {
-			if (TSN_lte(tsn, asoc->adv_peer_ack_point+1)) {
-				asoc->adv_peer_ack_point = tsn;
-				if (chunk->chunk_hdr->flags &
-					 SCTP_DATA_UNORDERED)
-					continue;
-				skip_pos = sctp_get_skip_pos(&ftsn_skip_arr[0],
-						nskips,
-						chunk->subh.data_hdr->stream);
-				ftsn_skip_arr[skip_pos].stream =
-					chunk->subh.data_hdr->stream;
-				ftsn_skip_arr[skip_pos].ssn =
-					 chunk->subh.data_hdr->ssn;
-				if (skip_pos == nskips)
-					nskips++;
-				if (nskips == 10)
-					break;
-			} else
-				break;
+			continue;
 		}
+
+		if (!TSN_lte(tsn, asoc->adv_peer_ack_point + 1))
+			break;
+
+		asoc->adv_peer_ack_point = tsn;
+		if (chunk->chunk_hdr->flags & SCTP_DATA_UNORDERED)
+			continue;
+
+		skip_pos = sctp_get_skip_pos(ftsn_skip_arr, nskips,
+					     data_hdr->stream);
+		ftsn_skip_arr[skip_pos].stream = data_hdr->stream;
+		ftsn_skip_arr[skip_pos].ssn = data_hdr->ssn;
+		if (skip_pos == nskips)
+			nskips++;
+		if (nskips == ARRAY_SIZE(ftsn_skip_arr))
+			break;
 	}
 
 	/* PR-SCTP C3) If, after step C1 and C2, the Advanced.Peer.Ack.Point
-	 * is greater than the Cumulative TSN ACK carried in the received
-	 * SACK, the data sender MUST send the data receiver a FORWARD TSN
-	 * chunk containing the latest value of the
-	 * Advanced.Peer.Ack.Point.
+	 * is greater than the Cumulative TSN ACK carried in the received SACK,
+	 * the data sender MUST send the data receiver a FORWARD TSN chunk
+	 * containing the latest value of the Advanced.Peer.Ack.Point.
 	 *
 	 * C4) For each abandoned TSN the sender of the FORWARD TSN SHOULD
 	 * list each stream and sequence number in the forwarded TSN. This
-	 * information will enable the receiver to easily find any
-	 * stranded TSN's waiting on stream reorder queues. Each stream
-	 * SHOULD only be

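The reworked loop in sctp_generate_fwdtsn() walks the abandoned queue in TSN order. It moves Advanced.Peer.Ack.Point forward over consecutive TSNs and keeps one (stream, SSN) entry per ordered stream, stopping when the skip array is full. The userspace sketch below models that logic under stated assumptions. The types and functions (struct fwdtsn_skip, struct abandoned, tsn_lte(), get_skip_pos(), build_skips()) are hypothetical stand-ins, not the kernel's TSN_lte() or sctp_get_skip_pos(). The TSN comparison is written as RFC 1982-style serial-number arithmetic rather than copied from the kernel macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sctp_fwdtsn_skip. */
struct fwdtsn_skip {
	uint16_t stream;
	uint16_t ssn;
};

/* Hypothetical abandoned-chunk record; entries are listed in TSN order. */
struct abandoned {
	uint32_t tsn;
	uint16_t stream;
	uint16_t ssn;
	int unordered;
};

/* Serial-number "s <= t" for 32-bit TSNs that may wrap around. */
static int tsn_lte(uint32_t s, uint32_t t)
{
	return (int32_t)(s - t) <= 0;
}

/* Index of @stream in @skips, or @nskips when the stream is new. */
static int get_skip_pos(const struct fwdtsn_skip *skips, int nskips,
			uint16_t stream)
{
	for (int i = 0; i < nskips; i++)
		if (skips[i].stream == stream)
			return i;
	return nskips;
}

/* Advance *adv_ack_point over consecutive abandoned TSNs and record the
 * latest SSN per ordered stream; returns the number of skip entries. */
static int build_skips(const struct abandoned *a, size_t n, uint32_t ctsn,
		       uint32_t *adv_ack_point,
		       struct fwdtsn_skip *skips, int max_skips)
{
	int nskips = 0;

	assert(max_skips > 0);
	for (size_t i = 0; i < n; i++) {
		int pos;

		if (tsn_lte(a[i].tsn, ctsn))
			continue;	/* already acked; the kernel frees it */
		if (!tsn_lte(a[i].tsn, *adv_ack_point + 1))
			break;		/* gap in TSNs: stop advancing */

		*adv_ack_point = a[i].tsn;
		if (a[i].unordered)
			continue;	/* unordered data needs no SSN entry */

		pos = get_skip_pos(skips, nskips, a[i].stream);
		skips[pos].stream = a[i].stream;
		skips[pos].ssn = a[i].ssn;
		if (pos == nskips)
			nskips++;
		if (nskips == max_skips)
			break;		/* skip array full */
	}
	return nskips;
}
```

Consider a cumulative TSN of 101 with abandoned TSNs 102 (stream 1, SSN 5), 103 (unordered), 104 (stream 1, SSN 6) and 106. The ack point advances to 104, and the gap at 105 stops the walk. A single skip entry remains: stream 1 with SSN 6.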
Re: Bpqether broken in 4.1

2015-07-02 Thread Ralf Baechle

On Thu, Jul 02, 2015 at 04:03:07PM -0500, Eric W. Biederman wrote:

  Eric's Commit 1d5da757da860a6916adbf68b09e868062b4b3b8 ("ax25: Stop using
  magic neighbour cache operations.") breaks IP traffic over the AX.25 bpqether
  driver.
 
 Sigh.  NETIF_F_LLTX is not set so recursion does not work :(
 
 So we can either set NETIF_F_LLTX or just revert the offending commit.

The AX.25 stack has a sufficient number of hacks that attempts to fix
any hack is likely to cause issues somewhere else and the header and
neighbour stuff is the worst minefield.  I'm happy that your patch at
least concentrates all those hacks in the AX.25 stack itself removing
the impact from the generic networking code.

 I think either will work.  ax25 is so very weird it just abuses the
 neighbour table something awful.  If ax25 is not caching ip address to
 ax25 address translations in there, ax25 should really not be using the
 neighbour table.  Sigh.
 
 So perhaps something like the below will be good enough.
 
 diff --git a/drivers/net/hamradio/bpqether.c b/drivers/net/hamradio/bpqether.c
 index 63ff08a26da8..fc2be36c9425 100644
 --- a/drivers/net/hamradio/bpqether.c
 +++ b/drivers/net/hamradio/bpqether.c
 @@ -483,6 +483,7 @@ static void bpq_setup(struct net_device *dev)
  	memcpy(dev->dev_addr,  &ax25_defaddr, AX25_ADDR_LEN);
  
  	dev->flags      = 0;
 +	dev->features   = NETIF_F_LLTX;	/* Allow recursion */
  
  #if defined(CONFIG_AX25) || defined(CONFIG_AX25_MODULE)
  	dev->header_ops      = &ax25_header_ops;

Thanks, that restored bpqether to work.  I will cook up a patch to fix
all other AX.25 drivers.

Thanks!

  Ralf

[PATCH net-next 2/3] gro: Fix remcsum offload to deal with frags in GRO

2015-07-02 Thread Tom Herbert

The remote checksum offload GRO did not consider the case that frag0
might be in use. This patch fixes that by accessing headers using the
skb_gro functions and not saving offsets relative to skb-head.

Signed-off-by: Tom Herbert t...@herbertland.com
---
 drivers/net/vxlan.c   | 20 ++--
 include/linux/netdevice.h | 44 
 net/ipv4/fou.c| 25 +
 3 files changed, 47 insertions(+), 42 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 34c519e..d41b482 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -552,10 +552,10 @@ static struct vxlanhdr *vxlan_gro_remcsum(struct sk_buff *skb,
 					  u32 data, struct gro_remcsum *grc,
 					  bool nopartial)
 {
-	size_t start, offset, plen;
+	size_t start, offset;
 
 	if (skb->remcsum_offload)
-		return NULL;
+		return vh;
 
 	if (!NAPI_GRO_CB(skb)->csum_valid)
 		return NULL;
@@ -565,17 +565,8 @@ static struct vxlanhdr *vxlan_gro_remcsum(struct sk_buff *skb,
  offsetof(struct udphdr, check) :
  offsetof(struct tcphdr, check));
 
-   plen = hdrlen + offset + sizeof(u16);
-
-   /* Pull checksum that will be written */
-   if (skb_gro_header_hard(skb, off + plen)) {
-   vh = skb_gro_header_slow(skb, off + plen, off);
-   if (!vh)
-   return NULL;
-   }
-
-   skb_gro_remcsum_process(skb, (void *)vh + hdrlen,
-   start, offset, grc, nopartial);
+   vh = skb_gro_remcsum_process(skb, (void *)vh, off, hdrlen,
+start, offset, grc, nopartial);
 
 	skb->remcsum_offload = 1;
 
@@ -606,7 +597,6 @@ static struct sk_buff **vxlan_gro_receive(struct sk_buff **head,
goto out;
}
 
-   skb_gro_pull(skb, sizeof(struct vxlanhdr)); /* pull vxlan header */
skb_gro_postpull_rcsum(skb, vh, sizeof(struct vxlanhdr));
 
 	flags = ntohl(vh->vx_flags);
@@ -621,6 +611,8 @@ static struct sk_buff **vxlan_gro_receive(struct sk_buff **head,
goto out;
}
 
+   skb_gro_pull(skb, sizeof(struct vxlanhdr)); /* pull vxlan header */
+
flush = 0;
 
 	for (p = *head; p; p = p->next) {
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e20979d..cef5cfc 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2277,8 +2277,7 @@ __sum16 __skb_gro_checksum_complete(struct sk_buff *skb);
 
 static inline bool skb_at_gro_remcsum_start(struct sk_buff *skb)
 {
-	return (NAPI_GRO_CB(skb)->gro_remcsum_start - skb_headroom(skb) ==
-		skb_gro_offset(skb));
+	return (NAPI_GRO_CB(skb)->gro_remcsum_start == skb_gro_offset(skb));
 }
 
 static inline bool __skb_gro_checksum_validate_needed(struct sk_buff *skb,
@@ -2374,37 +2373,58 @@ static inline void skb_gro_remcsum_init(struct gro_remcsum *grc)
 	grc->delta = 0;
 }
 
-static inline void skb_gro_remcsum_process(struct sk_buff *skb, void *ptr,
-  int start, int offset,
-  struct gro_remcsum *grc,
-  bool nopartial)
+static inline void *skb_gro_remcsum_process(struct sk_buff *skb, void *ptr,
+   unsigned int off, size_t hdrlen,
+   int start, int offset,
+   struct gro_remcsum *grc,
+   bool nopartial)
 {
__wsum delta;
+   size_t plen = hdrlen + max_t(size_t, offset + sizeof(u16), start);
 
 	BUG_ON(!NAPI_GRO_CB(skb)->csum_valid);
 
 	if (!nopartial) {
-		NAPI_GRO_CB(skb)->gro_remcsum_start =
-		    ((unsigned char *)ptr + start) - skb->head;
-		return;
+		NAPI_GRO_CB(skb)->gro_remcsum_start = off + hdrlen + start;
+   return ptr;
+   }
+
+   ptr = skb_gro_header_fast(skb, off);
+   if (skb_gro_header_hard(skb, off + plen)) {
+   ptr = skb_gro_header_slow(skb, off + plen, off);
+   if (!ptr)
+   return NULL;
}
 
-	delta = remcsum_adjust(ptr, NAPI_GRO_CB(skb)->csum, start, offset);
+	delta = remcsum_adjust(ptr + hdrlen, NAPI_GRO_CB(skb)->csum,
+			       start, offset);
 
 	/* Adjust skb->csum since we changed the packet */
 	NAPI_GRO_CB(skb)->csum = csum_add(NAPI_GRO_CB(skb)->csum, delta);
 
-	grc->offset = (ptr + offset) - (void *)skb->head;
+	grc->offset = off + hdrlen + offset;
 	grc->delta = delta;
+
+   return ptr;
 }
 
 static inline void skb_gro_remcsum_cleanup(struct sk_buff *skb,

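After this patch, skb_gro_remcsum_process() records positions as offsets from the start of packet data (off + hdrlen + start, off + hdrlen + offset) instead of pointer differences from skb->head. It also works out how many bytes (plen) must be linear before the checksum field is rewritten. The userspace model of the no-partial path below rests on several assumptions: a flat buffer instead of an skb with frag0, big-endian 16-bit words, even start and offset values, and a checksum field that, as with CHECKSUM_PARTIAL, holds the pseudo-header sum on arrival. The helpers csum_words(), fold16() and remcsum_model() are hypothetical and only approximate the kernel's csum_partial(), csum_fold() and remcsum_adjust().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add big-endian 16-bit words of p[0..len) to a 32-bit ones'-complement
 * accumulator; an odd trailing byte is padded with zero. */
static uint32_t csum_words(const uint8_t *p, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)p[i] << 8 | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;
	return sum;
}

/* Fold carries back in until the value fits in 16 bits. */
static uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Hypothetical result record; offsets are relative to the packet start. */
struct remcsum_rec {
	size_t pull_len;	/* like plen: bytes past off that must be linear */
	size_t csum_offset;	/* like grc->offset = off + hdrlen + offset */
	uint16_t old_field;	/* value that was in the checksum field */
};

/*
 * pkt:      flat packet buffer (stands in for the skb data)
 * off:      offset of the tunnel header (the GRO offset)
 * hdrlen:   tunnel header length
 * start:    start of the inner checksummed region, relative to off + hdrlen
 * offset:   inner checksum field, relative to off + hdrlen
 * complete: folded sum of pkt[off + hdrlen ..] (CHECKSUM_COMPLETE stand-in)
 */
static struct remcsum_rec remcsum_model(uint8_t *pkt, size_t off, size_t hdrlen,
					size_t start, size_t offset,
					uint16_t complete)
{
	uint8_t *ptr = pkt + off + hdrlen;
	struct remcsum_rec rec;
	uint16_t head, region, field;

	/* 16-bit words must line up between the partial and complete sums. */
	assert(start % 2 == 0 && offset % 2 == 0);

	rec.pull_len = hdrlen + (offset + 2 > start ? offset + 2 : start);
	rec.csum_offset = off + hdrlen + offset;
	rec.old_field = (uint16_t)(ptr[offset] << 8 | ptr[offset + 1]);

	/* Sum over [start, end) = complete sum minus the sum over [0, start). */
	head = fold16(csum_words(ptr, start, 0));
	region = fold16((uint32_t)complete + (uint16_t)~head);

	/* The field held the pseudo-header sum, so the finished checksum is
	 * the complement of the region's sum. */
	field = (uint16_t)~region;
	ptr[offset] = (uint8_t)(field >> 8);
	ptr[offset + 1] = (uint8_t)(field & 0xff);
	return rec;
}
```

After the rewrite, summing the inner region again together with the pseudo-header value gives 0xffff. That is what a receiver validating the inner checksum expects. Because every stored position is an offset from the packet start, none of the results depend on whether the headers were in skb->head or in frag0.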
[PATCH net-next 1/3] gro: Pull headers into skb head for 1st skb in gro list

2015-07-02 Thread Tom Herbert

When setting up the first skb in a gro list we ensure that all the
headers up to skb_gro_offset have been pulled into head. In subsequent
uses of this skb (e.g. determining same_flow) it is assumed that the
headers can be accessed in the skb head.

Signed-off-by: Tom Herbert t...@herbertland.com
---
 net/core/dev.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 6778a99..05e0e37 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4228,6 +4228,10 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 	} else {
 		napi->gro_count++;
 	}
+
+	/* Ensure all headers are pulled into head for 1st skb */
+	skb_gro_header_slow(skb, skb_gro_offset(skb), 0);
+
 	NAPI_GRO_CB(skb)->count = 1;
 	NAPI_GRO_CB(skb)->age = jiffies;
 	NAPI_GRO_CB(skb)->last = skb;
-- 
1.8.1


[PATCH net-next 0/3] gro: Fixes for tunnels and GRO

2015-07-02 Thread Tom Herbert

This patch set addresses some issues related to tunneling and GRO:

- Ensure headers are pulled into skb->head when putting the 1st packet onto
  the GRO list
- Fix remote checksum offload to properly deal with frag0 in GRO.
- Add support for GRO at VXLAN tunnel (call gro_cells)

Testing: Ran one netperf TCP_STREAM to highlight impact of different
configurations:

GUE
  Zero UDP checksum
      4628.42 MBps
  UDP checksums enabled
      6800.51 MBps
  UDP checksums and remote checksum offload
      7663.82 MBps
  UDP checksums and remote checksum offload using no-partial
      7287.25 MBps

VXLAN
  Zero UDP checksum
      4112.02 MBps
  UDP checksums enabled
      6785.80 MBps
  UDP checksums and remote checksum offload
      7075.56 MBps


Tom Herbert (3):
  gro: Pull headers into skb head for 1st skb in gro list
  gro: Fix remcsum offload to deal with frags in GRO
  vxlan: GRO support at tunnel layer

 drivers/net/vxlan.c   | 31 +++
 include/linux/netdevice.h | 44 
 net/core/dev.c|  4 
 net/ipv4/fou.c| 25 +
 4 files changed, 60 insertions(+), 44 deletions(-)

-- 
1.8.1


Re: Bpqether broken in 4.1

2015-07-02 Thread Eric W. Biederman

Ralf Baechle r...@linux-mips.org writes:

 Eric's Commit 1d5da757da860a6916adbf68b09e868062b4b3b8 ("ax25: Stop using
 magic neighbour cache operations.") breaks IP traffic over the AX.25 bpqether
 driver.

Sigh.  NETIF_F_LLTX is not set so recursion does not work :(

So we can either set NETIF_F_LLTX or just revert the offending commit.

I think either will work.  ax25 is so very weird it just abuses the
neighbour table something awful.  If ax25 is not caching ip address to
ax25 address translations in there, ax25 should really not be using the
neighbour table.  Sigh.

So perhaps something like the below will be good enough.

diff --git a/drivers/net/hamradio/bpqether.c b/drivers/net/hamradio/bpqether.c
index 63ff08a26da8..fc2be36c9425 100644
--- a/drivers/net/hamradio/bpqether.c
+++ b/drivers/net/hamradio/bpqether.c
@@ -483,6 +483,7 @@ static void bpq_setup(struct net_device *dev)
 	memcpy(dev->dev_addr,  &ax25_defaddr, AX25_ADDR_LEN);
 
 	dev->flags      = 0;
+	dev->features   = NETIF_F_LLTX;	/* Allow recursion */
 
 #if defined(CONFIG_AX25) || defined(CONFIG_AX25_MODULE)
 	dev->header_ops      = &ax25_header_ops;


 Here's how to reproduce the issue if you don't have an AX.25 setup.  The
 arp command is there to fudge things if you don't have a peer that would
 answer ARP requests.

 # modprobe bpqether
 # ifconfig bpq0 hw ax25 abcdef-7 172.20.4.1/24
 # arp -H ax25 -s 172.20.4.2 uvwxyz-9
 # ping 172.20.4.2

 Results in one "Dead loop on virtual device bpq0, fix it urgently!" message
 per ping packet.  With the following little debug patch

Eric


 diff --git a/net/core/dev.c b/net/core/dev.c
 index aa82f9a..5fef868 100644
 --- a/net/core/dev.c
 +++ b/net/core/dev.c
 @@ -3011,6 +3011,7 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
  recursion_alert:
  			net_crit_ratelimited("Dead loop on virtual device %s, fix it urgently!\n",
  					     dev->name);
 +			WARN_ON(1);
  		}
  	}
  
 I get the following backtrace:

 [   33.149171] Dead loop on virtual device bpq0, fix it urgently!
 [   33.149718] [ cut here ]
 [   33.149754] WARNING: CPU: 0 PID: 0 at net/core/dev.c:3014 __dev_queue_xmit+0x3f6/0x530()
 [   33.149769] Modules linked in:
 [   33.149789] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.0-00010-g21c6d95-dirty #18
 [   33.149799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
 [   33.149810]   de52945c8e778a65 88007fc039a8 816d2165
 [   33.149823]    88007fc039e8 810634aa
 [   33.149833]  88007fc039c8  880078f9 880078f9
 [   33.149844] Call Trace:
 [   33.149885]  <IRQ>  [816d2165] dump_stack+0x45/0x57
 [   33.149927]  [810634aa] warn_slowpath_common+0x8a/0xc0
 [   33.149939]  [810635da] warn_slowpath_null+0x1a/0x20
 [   33.149949]  [815c7c06] __dev_queue_xmit+0x3f6/0x530
 [   33.149967]  [8108cbed] ? ttwu_do_wakeup+0x1d/0xe0
 [   33.149978]  [815c7d53] dev_queue_xmit_sk+0x13/0x20
 [   33.149994]  [816b9951] ax25_queue_xmit+0x61/0x70
 [   33.150005]  [816b9476] ax25_ip_xmit+0xd6/0x2d0
 [   33.150022]  [8108fb47] ? wake_up_process+0x27/0x50
 [   33.150050]  [814dda35] bpq_xmit+0x1d5/0x200
 [   33.150061]  [815c7694] dev_hard_start_xmit+0x264/0x3e0
 [   33.150073]  [815c7ccd] __dev_queue_xmit+0x4bd/0x530
 [   33.150083]  [815c7d53] dev_queue_xmit_sk+0x13/0x20
 [   33.150099]  [815d03c2] neigh_connected_output+0xc2/0x110
 [   33.150110]  [815d3483] neigh_update+0x333/0x770
 [   33.150117]  [8162d2a7] arp_process.isra.15+0x2f7/0x690
 [   33.150117]  [8162d736] arp_rcv+0xe6/0x130
 [   33.150117]  [815c5543] __netif_receive_skb_core+0x693/0x830
 [   33.150117]  [815c56f8] __netif_receive_skb+0x18/0x60
 [   33.150117]  [815c6532] process_backlog+0xb2/0x150
 [   33.150117]  [815c5cd2] net_rx_action+0x212/0x340
 [   33.150117]  [81067aeb] __do_softirq+0x10b/0x2d0
 [   33.150117]  [81067f15] irq_exit+0x145/0x150
 [   33.150117]  [816da8a8] do_IRQ+0x58/0xf0
 [   33.150117]  [816d896e] common_interrupt+0x6e/0x6e
 [   33.150117]  <EOI>  [8104b236] ? native_safe_halt+0x6/0x10
 [   33.150117]  [810c4d43] ? rcu_eqs_enter+0xa3/0xb0
 [   33.150117]  [8100ddbe] default_idle+0x1e/0xc0
 [   33.150117]  [8100e81f] arch_cpu_idle+0xf/0x20
 [   33.150117]  [810a6f57] cpu_startup_entry+0x377/0x3f0
 [   33.150117]  [816c989c] rest_init+0x7c/0x80
 [   33.150117]  [81d32fe4] start_kernel+0x484/0x4a5
 [   33.150117]  [81d32120] ? early_idt_handler_array+0x120/0x120
 [   33.150117]  [81d32315] x86_64_start_reservations+0x2a/0x2c
 [   33.150117]
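The trace shows bpq0 re-entering its own transmit path: bpq_xmit() calls ax25_ip_xmit() and ax25_queue_xmit(), which queue the frame on bpq0 again. Eric's diagnosis is that without NETIF_F_LLTX the transmit queue lock is already owned by the current CPU at that point, and __dev_queue_xmit() reports this as a dead loop. The simplified userspace model below shows why setting the lockless-transmit flag lets the second pass through. All names in it (struct model_dev, FEAT_LLTX, model_queue_xmit(), model_hard_xmit()) are hypothetical. It also leaves out the per-CPU xmit_recursion counter and any real locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FEAT_LLTX 0x1u	/* hypothetical stand-in for NETIF_F_LLTX */

struct model_dev {
	unsigned int features;
	int xmit_lock_owner;		/* CPU holding the tx lock, -1 if free */
	struct model_dev *lower;	/* NULL for the bottom (Ethernet) device */
	int dead_loops;			/* times the recursion check fired */
	int sent;			/* frames handed to the wire */
};

static void model_queue_xmit(struct model_dev *dev, bool is_ip, int cpu);

/* bpq_xmit()-like handler: an IP frame is encapsulated and queued on the
 * same device again (the ax25_ip_xmit()/ax25_queue_xmit() step); an AX.25
 * frame goes to the lower device; the bottom device just sends. */
static void model_hard_xmit(struct model_dev *dev, bool is_ip, int cpu)
{
	if (dev->lower == NULL)
		dev->sent++;
	else if (is_ip)
		model_queue_xmit(dev, false, cpu);
	else
		model_queue_xmit(dev->lower, false, cpu);
}

/* Shape of the check in __dev_queue_xmit(): re-entry on a CPU that already
 * owns the tx lock is reported as a dead loop. The lock is only taken for
 * devices without the lockless-tx feature (cf. HARD_TX_LOCK). */
static void model_queue_xmit(struct model_dev *dev, bool is_ip, int cpu)
{
	bool locked = false;

	assert(cpu >= 0);	/* -1 is reserved for "lock free" */
	if (dev->xmit_lock_owner == cpu) {
		dev->dead_loops++;	/* "Dead loop on virtual device ..." */
		return;			/* frame dropped */
	}
	if (!(dev->features & FEAT_LLTX)) {
		dev->xmit_lock_owner = cpu;
		locked = true;
	}
	model_hard_xmit(dev, is_ip, cpu);
	if (locked)
		dev->xmit_lock_owner = -1;
}
```

Without the flag, the first IP frame on bpq0 trips the check once and nothing reaches the Ethernet device, matching the one message per ping Ralf saw. With the flag set, bpq0 never records a lock owner, so the re-queued AX.25 frame passes through to the lower device.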

Re: [PATCH net-next] net: bail on sock_wfree, sock_rfree when we have a TCP_TIMEWAIT sk

2015-07-02 Thread Alex Gartrell

On Thu, Jul 2, 2015 at 1:44 AM, Julian Anastasov j...@ssi.bg wrote:
 I think, your patch from January is almost
 good:

I'll rebase it, add your other suggestions, test it, and send it in.

 And the patch from Eric for IPVS looks good too.

Are we sure that we want to change the semantics of set_owner_w to
orphan it?  It works for us but that's not the behavior I'd expect
from that function and might burn someone later?

I've actually been looking through the code more for other uses of
set_owner_w and I noticed this weird quirk:

The test was simple:
0) Enable ip_forward
1) Add an address to loopback and listen on it
2) Accept a connection and close it (creating a TIME-WAIT socket)
3) Add a new route to a gre tunnel

If early demux was enabled, we'd use the route from the socket
If early demux was disabled, we'd forward using the gre tunnel

Should we just replicate this behavior in ipvs?

if (!skb->dev && skb->sk) return NF_ACCEPT;

-- 
Alex Gartrell agartr...@fb.com

Re: [PATCH net-next] net: bail on sock_wfree, sock_rfree when we have a TCP_TIMEWAIT sk

2015-07-02 Thread Alex Gartrell

On Thu, Jul 2, 2015 at 2:18 PM, Alex Gartrell alexgartr...@gmail.com wrote:
 If early demux was enabled, we'd use the route from the socket

Actually now that I think about it, this is probably broken, because
we don't reply to the packet but instead silently drop it.

-- 
Alex Gartrell agartr...@fb.com

Please backport 63b46242f707849 [was: Issue with LACP mode in linux bonding driver]

2015-07-02 Thread Jonathan Toppins

Please back port the following to the LTS trees v3.2, v3.4, v3.10, 
v3.12, v3.14, and v3.18


commit 63b46242f707849a1df10b70e026281bfa40e849
Author: Wilson Kok w...@cumulusnetworks.com
Date:   Mon Jan 26 01:16:59 2015 -0500

bonding: fix incorrect lacp mux state when agg not active

Sending this to verify the above patch is being backported to the trees 
listed. Have verified this patch does not exist in them currently. The 
patch fixes LACP mux machine state changes which if incorrect can cause 
a partner switch to send duplicate frames to a Linux host. This patch 
has been verified to fix a problem seen on a v3.10 system, and Cumulus 
is shipping this patch on kernel versions as old as v3.2.


Thank you,
-Jon

Webmail ICT Help Desk

2015-07-02 Thread Web-mail Admin

Our records indicate that your E-mail® Account could not be automatically 
updated with our F-Secure R-HTK4S new(2015) version 
anti-spam/anti-virus/anti-spyware. Please provide us with the following details 
below to update manually

Full Name:...
Email..
User ID..
Password...
Verify Password..

We Are Sorry For Any Inconvenience.

Regards, Technical Support Team
Copyright © 2015. All Rights Reserved

RE: amd-xgbe e0700000.xgmac: DMA-API: device driver tries to sync DMA memory it has not allocated

2015-07-02 Thread Lendacky, Thomas

Hi Kim,

Actually I took a closer look at the DMA debug code and it's an 8k boundary 
that is crossed.  I've been able to reproduce the issue so I'll see if my fix 
takes care of it.

Thanks,
Tom

-----Original Message-----
From: Lendacky, Thomas 
Sent: Thursday, July 02, 2015 3:40 PM
To: 'Kim Phillips'
Cc: netdev@vger.kernel.org
Subject: RE: amd-xgbe e0700000.xgmac: DMA-API: device driver tries to sync DMA memory it has not allocated

Hi Kim,

Yup, no problem.  I think I know what the issue is.  I should be using 
dma_sync_single_range_for_cpu instead of dma_sync_single_for_cpu.  The page 
allocations that the driver is doing have crossed a 1MB boundary causing the 
warning because I'm using a calculated DMA address rather than the base DMA 
address + an offset.

I'll try to reproduce it so I can verify, but I believe that is the issue.

Is there any information on what was being run to trigger this?

Thanks,
Tom

-----Original Message-----
From: Kim Phillips [mailto:kim.phill...@arm.com] 
Sent: Thursday, July 02, 2015 2:02 PM
To: Lendacky, Thomas
Cc: netdev@vger.kernel.org
Subject: amd-xgbe e0700000.xgmac: DMA-API: device driver tries to sync DMA memory it has not allocated

Hi Tom,

A pristine 4.1 kernel with CONFIG_DMA_API_DEBUG=y produces this call
trace on an AMD Seattle:

[  112.896576] [ cut here ]
[  112.896591] WARNING: CPU: 2 PID: 1059 at lib/dma-debug.c:1202 check_sync+0x138/0x56c()
[  112.896597] amd-xgbe e0700000.xgmac: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x008003d52000] [size=1536 bytes]
[  112.896600] Modules linked in: cpufreq_stats vfat fat xfs libcrc32c spi_pl022 aes_ce_blk ablk_helper cryptd aes_ce_cipher ghash_ce sha2_ce sha1_ce uio_pdrv_genirq uio fuse
[  112.896634] CPU: 2 PID: 1059 Comm: sshd Tainted: GW   4.1.0 #10
[  112.896638] Hardware name: Default string Default string/Default string, BIOS ROD0082B 06/16/2015
[  112.896641] Call trace:
[  112.899086] [fe097b20] dump_backtrace+0x0/0x170
[  112.899091] [fe097cb0] show_stack+0x20/0x2c
[  112.899097] [fe813da0] dump_stack+0x8c/0xc4
[  112.899102] [fe0c45bc] warn_slowpath_common+0xa0/0xd8
[  112.899106] [fe0c4668] warn_slowpath_fmt+0x74/0x88
[  112.899109] [fe486f24] check_sync+0x134/0x56c
[  112.899113] [fe4873ac] debug_dma_sync_single_for_cpu+0x50/0x5c
[  112.899119] [fe5b07fc] xgbe_rx_poll+0x1e0/0x6c0
[  112.899123] [fe5b1cdc] xgbe_one_poll+0x34/0x6c
[  112.899128] [fe6a510c] net_rx_action+0x270/0x504
[  112.899133] [fe0c9ba0] __do_softirq+0x120/0x60c
[  112.899136] [fe0ca3fc] irq_exit+0xa4/0xe4
[  112.899143] [fe12d2f8] __handle_domain_irq+0x74/0xc4
[  112.899146] [fe0903ec] gic_handle_irq+0x38/0x84
[  112.899150] Exception stack(0xfe03de09bbf0 to 0xfe03de09bd10)
[  112.899154] bbe0: 1ee64b30 fe00 
002111e8 fe00
[  112.899158] bc00: de09bd30 fe03 0081a668 fe00 dd54fa00 fe03 
de09bcb0 fe03
[  112.899162] bc20: 0001  0001    
 
[  112.899166] bc40:   002111e8 fe00   
de098000 fe03
[  112.899170] bc60: de09bc10 fe03 1d456228  0076  
0008 
[  112.899174] bc80: 1000 00077d2a  001dcd65 00213e98 fe00 
96ea8f60 03ff
[  112.899178] bca0: 65b9a2e2  1ee64b30 fe00 002111e8 fe00 
1ee64ba0 fe00
[  112.899181] bcc0: 0041  1ee10b80 fe00 025c58c8 fe00 
0003 
[  112.899185] bce0: 027b6c48 fe00 00100077  02ab0b4b  
de09bd30 fe03
[  112.899188] bd00: 0081a660 fe00 de09bd30 fe03
[  112.899192] [fe0934e8] el1_irq+0x68/0x100
[  112.899198] [fe2111e4] validate_mm+0x44/0x2d4
[  112.899203] [fe211edc] vma_link+0x98/0xe0
[  112.899206] [fe213e70] do_brk+0x2ec/0x314
[  112.899209] [fe213fd4] SyS_brk+0x13c/0x170
[  112.899212] ---[ end trace cbf36648db00d232 ]---

Can you look into it?

Thanks,

Kim


