date:20160225

pull-request: can-next 2016-02-26

2016-02-25 Thread Marc Kleine-Budde

Hello David,

this is a pull request of 3 patch for net-next/master.

There are two patches by Simon Horman, in which the device tree support
for the rcar_can driver is improved. One patch by me fixes the bad
coding style of the ems_usb driver which was introduced recently.

regards,
Marc

---

The following changes since commit 8d3f2806f8fbd9b222b3504580b5eaa9ba2964b8:

  Merge branch 'ethtool-ksettings' (2016-02-25 22:07:00 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git 
tags/linux-can-next-for-4.6-20160226

for you to fetch changes up to f71096dfd129e4ad1ae80cd10d5ac050e5730f8a:

  can: rcar: add device tree support for r8a779[234] (2016-02-26 08:43:34 +0100)


linux-can-next-for-4.6-20160226


Marc Kleine-Budde (1):
  can: ems_usb: fix coding style

Simon Horman (2):
  can: rcar: add gen[12] fallback compatibility strings
  can: rcar: add device tree support for r8a779[234]

 Documentation/devicetree/bindings/net/can/rcar_can.txt | 11 ++-
 drivers/net/can/rcar_can.c |  2 ++
 drivers/net/can/usb/ems_usb.c  |  8 +++-
 3 files changed, 15 insertions(+), 6 deletions(-)

-- 
Pengutronix e.K.  | Marc Kleine-Budde   |
Industrial Linux Solutions| Phone: +49-231-2826-924 |
Vertretung West/Dortmund  | Fax:   +49-5121-206917- |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |



signature.asc
Description: OpenPGP digital signature

[PATCH net-next 5/5] vxlan: implement GPE in L3 mode

2016-02-25 Thread Jiri Benc

Implement L3 mode for VXLAN-GPE (i.e. IPv4/IPv6 payload directly after the
VXLAN header).

The GPE header parsing has to be moved before iptunnel_pull_header, as we
need to know the protocol.

Signed-off-by: Jiri Benc 
---
 drivers/net/vxlan.c  | 127 +++
 include/net/vxlan.h  |   3 +-
 include/uapi/linux/if_link.h |   1 +
 3 files changed, 108 insertions(+), 23 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 81a7c1a829b9..b79472d3fd36 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1191,6 +1191,7 @@ out:
 }
 
 static bool vxlan_parse_gpe_hdr(struct vxlanhdr *unparsed,
+   __be32 *protocol,
struct sk_buff *skb, u32 vxflags)
 {
struct vxlanhdr_gpe *gpe = (struct vxlanhdr_gpe *)unparsed;
@@ -1210,9 +1211,30 @@ static bool vxlan_parse_gpe_hdr(struct vxlanhdr 
*unparsed,
if (gpe->oam_flag)
return false;
 
-   if (gpe->next_protocol != VXLAN_GPE_NP_ETHERNET)
+   /* L2 mode */
+   if (gpe->next_protocol == VXLAN_GPE_NP_ETHERNET) {
+   if (vxflags & VXLAN_F_GPE_L3)
+   return false;
+   *protocol = htons(ETH_P_TEB);
+   goto out;
+   }
+
+   /* L3 mode */
+   if (!(vxflags & VXLAN_F_GPE_L3))
+   return false;
+
+   switch (gpe->next_protocol) {
+   case VXLAN_GPE_NP_IPV4:
+   *protocol = htons(ETH_P_IP);
+   break;
+   case VXLAN_GPE_NP_IPV6:
+   *protocol = htons(ETH_P_IPV6);
+   break;
+   default:
return false;
+   }
 
+out:
unparsed->vx_flags &= ~VXLAN_GPE_USED_BITS;
return true;
 }
@@ -1282,9 +1304,10 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff 
*skb)
struct vxlanhdr unparsed;
struct vxlan_metadata _md;
struct vxlan_metadata *md = &_md;
+   __be32 protocol = htons(ETH_P_TEB);
void *oiph;
 
-   /* Need Vxlan and inner Ethernet header to be present */
+   /* Need UDP and VXLAN header to be present */
if (!pskb_may_pull(skb, VXLAN_HLEN))
return 1;
 
@@ -1308,7 +1331,14 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff 
*skb)
if (!vxlan)
goto drop;
 
-   if (iptunnel_pull_header(skb, VXLAN_HLEN, htons(ETH_P_TEB),
+   /* For backwards compatibility, only allow reserved fields to be
+* used by VXLAN extensions if explicitly requested.
+*/
+   if (vs->flags & VXLAN_F_GPE)
+   if (!vxlan_parse_gpe_hdr(, , skb, vs->flags))
+   goto drop;
+
+   if (iptunnel_pull_header(skb, VXLAN_HLEN, protocol,
 !net_eq(vxlan->net, dev_net(vxlan->dev
goto drop;
 
@@ -1329,12 +1359,6 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff 
*skb)
memset(md, 0, sizeof(*md));
}
 
-   /* For backwards compatibility, only allow reserved fields to be
-* used by VXLAN extensions if explicitly requested.
-*/
-   if (vs->flags & VXLAN_F_GPE)
-   if (!vxlan_parse_gpe_hdr(, skb, vs->flags))
-   goto drop;
if (vs->flags & VXLAN_F_REMCSUM_RX)
if (!vxlan_remcsum(, skb, vs->flags))
goto drop;
@@ -1353,8 +1377,13 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff 
*skb)
goto drop;
}
 
-   if (!vxlan_set_mac(vxlan, vs, skb))
-   goto drop;
+   if (protocol == htons(ETH_P_TEB)) {
+   if (!vxlan_set_mac(vxlan, vs, skb))
+   goto drop;
+   } else {
+   skb->dev = vxlan->dev;
+   skb->pkt_type = PACKET_HOST;
+   }
 
oiph = skb_network_header(skb);
skb_reset_network_header(skb);
@@ -1713,12 +1742,29 @@ static void vxlan_build_gbp_hdr(struct vxlanhdr *vxh, 
u32 vxflags,
gbp->policy_id = htons(md->gbp & VXLAN_GBP_ID_MASK);
 }
 
-static void vxlan_build_gpe_hdr(struct vxlanhdr *vxh, u32 vxflags)
+static int vxlan_build_gpe_hdr(struct vxlanhdr *vxh, u32 vxflags,
+  __be16 protocol)
 {
struct vxlanhdr_gpe *gpe = (struct vxlanhdr_gpe *)vxh;
 
gpe->np_applied = 1;
-   gpe->next_protocol = VXLAN_GPE_NP_ETHERNET;
+
+   /* L2 mode */
+   if (!(vxflags & VXLAN_F_GPE_L3)) {
+   gpe->next_protocol = VXLAN_GPE_NP_ETHERNET;
+   return 0;
+   }
+
+   /* L3 mode */
+   switch (protocol) {
+   case htons(ETH_P_IP):
+   gpe->next_protocol = VXLAN_GPE_NP_IPV4;
+   return 0;
+   case htons(ETH_P_IPV6):
+   gpe->next_protocol = VXLAN_GPE_NP_IPV6;
+   return 0;
+   }
+   return -EPFNOSUPPORT;
 }
 
 static int vxlan_build_skb(struct sk_buff *skb, struct

[PATCH net-next 2/5] vxlan: move L2 mode initialization to a separate function

2016-02-25 Thread Jiri Benc

This will allow to initialize vxlan in L3 mode based on the passed rtnl
attributes.

Signed-off-by: Jiri Benc 
---
 drivers/net/vxlan.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index c7844bae339d..49fcca8d8a8e 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2437,7 +2437,7 @@ static int vxlan_fill_metadata_dst(struct net_device 
*dev, struct sk_buff *skb)
return 0;
 }
 
-static const struct net_device_ops vxlan_netdev_ops = {
+static const struct net_device_ops vxlan_netdev_l2mode_ops = {
.ndo_init   = vxlan_init,
.ndo_uninit = vxlan_uninit,
.ndo_open   = vxlan_open,
@@ -2491,10 +2491,6 @@ static void vxlan_setup(struct net_device *dev)
struct vxlan_dev *vxlan = netdev_priv(dev);
unsigned int h;
 
-   eth_hw_addr_random(dev);
-   ether_setup(dev);
-
-   dev->netdev_ops = _netdev_ops;
dev->destructor = free_netdev;
SET_NETDEV_DEVTYPE(dev, _type);
 
@@ -2509,8 +2505,7 @@ static void vxlan_setup(struct net_device *dev)
dev->hw_features |= NETIF_F_GSO_SOFTWARE;
dev->hw_features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
netif_keep_dst(dev);
-   dev->priv_flags &= ~IFF_TX_SKB_SHARING;
-   dev->priv_flags |= IFF_LIVE_ADDR_CHANGE | IFF_NO_QUEUE;
+   dev->priv_flags |= IFF_NO_QUEUE;
 
INIT_LIST_HEAD(>next);
spin_lock_init(>hash_lock);
@@ -2529,6 +2524,15 @@ static void vxlan_setup(struct net_device *dev)
INIT_HLIST_HEAD(>fdb_head[h]);
 }
 
+static void vxlan_l2mode_setup(struct net_device *dev)
+{
+   eth_hw_addr_random(dev);
+   ether_setup(dev);
+   dev->priv_flags &= ~IFF_TX_SKB_SHARING;
+   dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
+   dev->netdev_ops = _netdev_l2mode_ops;
+}
+
 static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
[IFLA_VXLAN_ID] = { .type = NLA_U32 },
[IFLA_VXLAN_GROUP]  = { .len = FIELD_SIZEOF(struct iphdr, daddr) },
@@ -2759,6 +2763,8 @@ static int vxlan_dev_configure(struct net *src_net, 
struct net_device *dev,
((conf->flags & VXLAN_F_GBP) && (conf->flags & VXLAN_F_GPE)))
return -EINVAL;
 
+   vxlan_l2mode_setup(dev);
+
vxlan->net = src_net;
 
dst->remote_vni = conf->vni;
-- 
1.8.3.1

[PATCH net-next 4/5] vxlan: fix too large pskb_may_pull with remote checksum

2016-02-25 Thread Jiri Benc

The vxlan header is pulled at this point, don't include it again in the
calculation.

Signed-off-by: Jiri Benc 
---
 drivers/net/vxlan.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 029e0e5b0b2b..81a7c1a829b9 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1142,7 +1142,7 @@ static int vxlan_igmp_leave(struct vxlan_dev *vxlan)
 static bool vxlan_remcsum(struct vxlanhdr *unparsed,
  struct sk_buff *skb, u32 vxflags)
 {
-   size_t start, offset, plen;
+   size_t start, offset;
 
if (!(unparsed->vx_flags & VXLAN_HF_RCO) || skb->remcsum_offload)
goto out;
@@ -1150,9 +1150,7 @@ static bool vxlan_remcsum(struct vxlanhdr *unparsed,
start = vxlan_rco_start(unparsed->vx_vni);
offset = start + vxlan_rco_offset(unparsed->vx_vni);
 
-   plen = sizeof(struct vxlanhdr) + offset + sizeof(u16);
-
-   if (!pskb_may_pull(skb, plen))
+   if (!pskb_may_pull(skb, offset + sizeof(u16)))
return false;
 
skb_remcsum_process(skb, (void *)(vxlan_hdr(skb) + 1), start, offset,
-- 
1.8.3.1

[PATCH net-next 1/5] vxlan: implement GPE in L2 mode

2016-02-25 Thread Jiri Benc

Implement VXLAN-GPE. Only L2 mode (i.e. encapsulated Ethernet frame) is
supported by this patch.

L3 mode will be added by subsequent patches.

Signed-off-by: Jiri Benc 
---
 drivers/net/vxlan.c  | 68 ++--
 include/net/vxlan.h  | 62 +++-
 include/uapi/linux/if_link.h |  8 ++
 3 files changed, 135 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 775ddb48388d..c7844bae339d 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1192,6 +1192,33 @@ out:
unparsed->vx_flags &= ~VXLAN_GBP_USED_BITS;
 }
 
+static bool vxlan_parse_gpe_hdr(struct vxlanhdr *unparsed,
+   struct sk_buff *skb, u32 vxflags)
+{
+   struct vxlanhdr_gpe *gpe = (struct vxlanhdr_gpe *)unparsed;
+
+   /* Need to have Next Protocol set for interfaces in GPE mode. */
+   if (!gpe->np_applied)
+   return false;
+   /* "The initial version is 0. If a receiver does not support the
+* version indicated it MUST drop the packet.
+*/
+   if (gpe->version != 0)
+   return false;
+   /* "When the O bit is set to 1, the packet is an OAM packet and OAM
+* processing MUST occur." However, we don't implement OAM
+* processing, thus drop the packet.
+*/
+   if (gpe->oam_flag)
+   return false;
+
+   if (gpe->next_protocol != VXLAN_GPE_NP_ETHERNET)
+   return false;
+
+   unparsed->vx_flags &= ~VXLAN_GPE_USED_BITS;
+   return true;
+}
+
 static bool vxlan_set_mac(struct vxlan_dev *vxlan,
  struct vxlan_sock *vs,
  struct sk_buff *skb)
@@ -1307,6 +1334,9 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
/* For backwards compatibility, only allow reserved fields to be
 * used by VXLAN extensions if explicitly requested.
 */
+   if (vs->flags & VXLAN_F_GPE)
+   if (!vxlan_parse_gpe_hdr(, skb, vs->flags))
+   goto drop;
if (vs->flags & VXLAN_F_REMCSUM_RX)
if (!vxlan_remcsum(, skb, vs->flags))
goto drop;
@@ -1685,6 +1715,14 @@ static void vxlan_build_gbp_hdr(struct vxlanhdr *vxh, 
u32 vxflags,
gbp->policy_id = htons(md->gbp & VXLAN_GBP_ID_MASK);
 }
 
+static void vxlan_build_gpe_hdr(struct vxlanhdr *vxh, u32 vxflags)
+{
+   struct vxlanhdr_gpe *gpe = (struct vxlanhdr_gpe *)vxh;
+
+   gpe->np_applied = 1;
+   gpe->next_protocol = VXLAN_GPE_NP_ETHERNET;
+}
+
 static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
   int iphdr_len, __be32 vni,
   struct vxlan_metadata *md, u32 vxflags,
@@ -1744,6 +1782,8 @@ static int vxlan_build_skb(struct sk_buff *skb, struct 
dst_entry *dst,
 
if (vxflags & VXLAN_F_GBP)
vxlan_build_gbp_hdr(vxh, vxflags, md);
+   if (vxflags & VXLAN_F_GPE)
+   vxlan_build_gpe_hdr(vxh, vxflags);
 
skb_set_inner_protocol(skb, htons(ETH_P_TEB));
return 0;
@@ -2515,6 +2555,7 @@ static const struct nla_policy 
vxlan_policy[IFLA_VXLAN_MAX + 1] = {
[IFLA_VXLAN_REMCSUM_RX] = { .type = NLA_U8 },
[IFLA_VXLAN_GBP]= { .type = NLA_FLAG, },
[IFLA_VXLAN_REMCSUM_NOPARTIAL]  = { .type = NLA_FLAG },
+   [IFLA_VXLAN_GPE_MODE]   = { .type = NLA_U8, },
 };
 
 static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
@@ -2714,6 +2755,10 @@ static int vxlan_dev_configure(struct net *src_net, 
struct net_device *dev,
__be16 default_port = vxlan->cfg.dst_port;
struct net_device *lowerdev = NULL;
 
+   if (((conf->flags & VXLAN_F_LEARN) && (conf->flags & VXLAN_F_GPE)) ||
+   ((conf->flags & VXLAN_F_GBP) && (conf->flags & VXLAN_F_GPE)))
+   return -EINVAL;
+
vxlan->net = src_net;
 
dst->remote_vni = conf->vni;
@@ -2770,8 +2815,12 @@ static int vxlan_dev_configure(struct net *src_net, 
struct net_device *dev,
dev->needed_headroom = needed_headroom;
 
memcpy(>cfg, conf, sizeof(*conf));
-   if (!vxlan->cfg.dst_port)
-   vxlan->cfg.dst_port = default_port;
+   if (!vxlan->cfg.dst_port) {
+   if (conf->flags & VXLAN_F_GPE)
+   vxlan->cfg.dst_port = 4790; /* IANA assigned VXLAN-GPE 
port */
+   else
+   vxlan->cfg.dst_port = default_port;
+   }
vxlan->flags |= conf->flags;
 
if (!vxlan->cfg.age_interval)
@@ -2941,6 +2990,13 @@ static int vxlan_newlink(struct net *src_net, struct 
net_device *dev,
if (data[IFLA_VXLAN_REMCSUM_NOPARTIAL])
conf.flags |= VXLAN_F_REMCSUM_NOPARTIAL;
 
+   if (data[IFLA_VXLAN_GPE_MODE]) {
+   u8 mode = nla_get_u8(data[IFLA_VXLAN_GPE_MODE]);
+
+   if (mode > 0

[PATCH net-next 0/5] vxlan: implement Generic Protocol Extension (GPE)

2016-02-25 Thread Jiri Benc

VXLAN-GPE can operate in two modes: with encapsulated Ethernet header
(L2 mode) or with L3 header (e.g. IP header) directly following VXLAN-GPE
header (L3 mode).

Add support for both modes. The L2 mode is simple, as it's basically the
same as plain VXLAN, only with added bits in the header. The L3 mode is more
complicated. The patches adding it follow the same model as the tun/tap
driver: depending on the chosen mode, the vxlan interface is created either
as ARPHRD_ETHER (L2 mode) or ARPHRD_NONE (L3 mode).

Note that the internal fdb control plane cannot be used together with
VXLAN-GPE and attempt to configure it will be rejected by the driver. This
in theory could be relaxed for L2 mode in the future if such need arises.

Jiri Benc (5):
  vxlan: implement GPE in L2 mode
  vxlan: move L2 mode initialization to a separate function
  vxlan: move fdb code to common location in vxlan_xmit
  vxlan: fix too large pskb_may_pull with remote checksum
  vxlan: implement GPE in L3 mode

 drivers/net/vxlan.c  | 223 ---
 include/net/vxlan.h  |  63 +++-
 include/uapi/linux/if_link.h |   9 ++
 3 files changed, 258 insertions(+), 37 deletions(-)

-- 
1.8.3.1

[PATCH net-next 3/5] vxlan: move fdb code to common location in vxlan_xmit

2016-02-25 Thread Jiri Benc

Handle VXLAN_F_COLLECT_METADATA before VXLAN_F_PROXY. The latter does not
make sense with the former, as it needs populated fdb which does not happen
in metadata mode.

After this cleanup, the fdb code in vxlan_xmit is moved to a common location
and can be later skipped for L3 mode vxlan tunnels.

Signed-off-by: Jiri Benc 
---
 drivers/net/vxlan.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 49fcca8d8a8e..029e0e5b0b2b 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2139,9 +2139,17 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, 
struct net_device *dev)
info = skb_tunnel_info(skb);
 
skb_reset_mac_header(skb);
-   eth = eth_hdr(skb);
 
-   if ((vxlan->flags & VXLAN_F_PROXY)) {
+   if (vxlan->flags & VXLAN_F_COLLECT_METADATA) {
+   if (info && info->mode & IP_TUNNEL_INFO_TX)
+   vxlan_xmit_one(skb, dev, NULL, false);
+   else
+   kfree_skb(skb);
+   return NETDEV_TX_OK;
+   }
+
+   if (vxlan->flags & VXLAN_F_PROXY) {
+   eth = eth_hdr(skb);
if (ntohs(eth->h_proto) == ETH_P_ARP)
return arp_reduce(dev, skb);
 #if IS_ENABLED(CONFIG_IPV6)
@@ -2156,18 +2164,10 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, 
struct net_device *dev)
msg->icmph.icmp6_type == 
NDISC_NEIGHBOUR_SOLICITATION)
return neigh_reduce(dev, skb);
}
-   eth = eth_hdr(skb);
 #endif
}
 
-   if (vxlan->flags & VXLAN_F_COLLECT_METADATA) {
-   if (info && info->mode & IP_TUNNEL_INFO_TX)
-   vxlan_xmit_one(skb, dev, NULL, false);
-   else
-   kfree_skb(skb);
-   return NETDEV_TX_OK;
-   }
-
+   eth = eth_hdr(skb);
f = vxlan_find_mac(vxlan, eth->h_dest);
did_rsc = false;
 
-- 
1.8.3.1

[PATCH] can: gs_usb: fixed disconnect bug by removing erroneous use of kfree()

2016-02-25 Thread Marc Kleine-Budde

From: Maximilain Schneider 

gs_destroy_candev() erroneously calls kfree() on a struct gs_can *, which is
allocated through alloc_candev() and should instead be freed using
free_candev() alone.

The inappropriate use of kfree() causes the kernel to hang when
gs_destroy_candev() is called.

Only the struct gs_usb * which is allocated through kzalloc() should be freed
using kfree() when the device is disconnected.

Signed-off-by: Maximilian Schneider 
Cc: linux-stable 
Signed-off-by: Marc Kleine-Budde 
---
 drivers/net/can/usb/gs_usb.c | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/net/can/usb/gs_usb.c b/drivers/net/can/usb/gs_usb.c
index 5eee62badf45..cbc99d5649af 100644
--- a/drivers/net/can/usb/gs_usb.c
+++ b/drivers/net/can/usb/gs_usb.c
@@ -826,9 +826,8 @@ static struct gs_can *gs_make_candev(unsigned int channel, 
struct usb_interface
 static void gs_destroy_candev(struct gs_can *dev)
 {
unregister_candev(dev->netdev);
-   free_candev(dev->netdev);
usb_kill_anchored_urbs(>tx_submitted);
-   kfree(dev);
+   free_candev(dev->netdev);
 }
 
 static int gs_usb_probe(struct usb_interface *intf, const struct usb_device_id 
*id)
@@ -913,12 +912,15 @@ static int gs_usb_probe(struct usb_interface *intf, const 
struct usb_device_id *
for (i = 0; i < icount; i++) {
dev->canch[i] = gs_make_candev(i, intf);
if (IS_ERR_OR_NULL(dev->canch[i])) {
+   /* save error code to return later */
+   rc = PTR_ERR(dev->canch[i]);
+
/* on failure destroy previously created candevs */
icount = i;
-   for (i = 0; i < icount; i++) {
+   for (i = 0; i < icount; i++)
gs_destroy_candev(dev->canch[i]);
-   dev->canch[i] = NULL;
-   }
+
+   usb_kill_anchored_urbs(>rx_submitted);
kfree(dev);
return rc;
}
@@ -939,16 +941,12 @@ static void gs_usb_disconnect(struct usb_interface *intf)
return;
}
 
-   for (i = 0; i < GS_MAX_INTF; i++) {
-   struct gs_can *can = dev->canch[i];
-
-   if (!can)
-   continue;
-
-   gs_destroy_candev(can);
-   }
+   for (i = 0; i < GS_MAX_INTF; i++)
+   if (dev->canch[i])
+   gs_destroy_candev(dev->canch[i]);
 
usb_kill_anchored_urbs(>rx_submitted);
+   kfree(dev);
 }
 
 static const struct usb_device_id gs_usb_table[] = {
-- 
2.7.0

pull-request: can 2016-02-26

2016-02-25 Thread Marc Kleine-Budde

Hello David,

this is a pull request of one patch for net.

The patch by Maximilain Schneider fixes a kfree() problem during disconnect in
the gs_usb driver.

regrds,
Marc
---

The following changes since commit 4c0b6eaf373a5323f03a3a20c42fc435715b073d:

  net: thunderx: Fix for Qset error due to CQ full (2016-02-25 16:25:34 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can.git 
tags/linux-can-fixes-for-4.5-20160226

for you to fetch changes up to e9a2d81b1761093386a0bb8a4f51642ac785ef63:

  can: gs_usb: fixed disconnect bug by removing erroneous use of kfree() 
(2016-02-26 08:36:33 +0100)


linux-can-fixes-for-4.5-20160226


Maximilain Schneider (1):
  can: gs_usb: fixed disconnect bug by removing erroneous use of kfree()

 drivers/net/can/usb/gs_usb.c | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

Re: [PATCH] uapi glibc compat: fix cases where glibc net/if.h is included before linux/if.h

2016-02-25 Thread Mikko Rapeli

(Adding libc-alpha list, review of https://lkml.org/lkml/2016/2/7/89 )

On Wed, Feb 17, 2016 at 10:46:20AM -0500, David Miller wrote:
> From: Mikko Rapeli 
> Date: Sun,  7 Feb 2016 16:03:21 +0200
> 
> > @@ -68,6 +72,8 @@
> >   * @IFF_ECHO: echo sent packets. Volatile.
> >   */
> >  enum net_device_flags {
> > +/* for compatibility with glibc net/if.h */
> > +#if __UAPI_DEF_IF_NET_DEVICE_FLAGS
> > IFF_UP  = 1<<0,  /* sysfs */
> > IFF_BROADCAST   = 1<<1,  /* volatile */
> > IFF_DEBUG   = 1<<2,  /* sysfs */
> > @@ -84,11 +90,14 @@ enum net_device_flags {
> > IFF_PORTSEL = 1<<13, /* sysfs */
> > IFF_AUTOMEDIA   = 1<<14, /* sysfs */
> > IFF_DYNAMIC = 1<<15, /* sysfs */
> > +#endif /* __UAPI_DEF_IF_NET_DEVICE_FLAGS */
> > IFF_LOWER_UP= 1<<16, /* volatile */
> > IFF_DORMANT = 1<<17, /* volatile */
> > IFF_ECHO= 1<<18, /* volatile */
> >  };
> 
> This is going to get messy is IFF_LOWER_UP, IFF_DORMANT, and IFF_ECHO
> get added the the glibc header.  Why not just handle it now with
> another __UAPI_DEF_FOO guard so that the additions to net/if.h can
> deal with this case too.

Do you mean that the enum should be protected with a single guard or
should I have one guard for current conflicts and one for the future
if glibc headers include IFF_LOWER_UP and others?

-Mikko

Re: [PATCH net] ipv4: only create late gso-skb if skb is already set up with CHECKSUM_PARTIAL

2016-02-25 Thread Hannes Frederic Sowa


On 26.02.2016 01:45, Wakko Warner wrote:

Now there's another one:
[  777.315931] [ cut here ]
[  777.316099] WARNING: CPU: 0 PID: 1404 at 
/usr/src/linux/dist/4.4-nobklcd/net/ipv4/af_inet.c:155 
inet_sock_destruct+0x1cb/0x1f0()
[  777.316189] Modules linked in: nfsv3 af_packet scsi_transport_iscsi nfsd 
auth_rpcgss oid_registry exportfs nfs lockd grace sunrpc ipv6 ata_piix libata 
evdev virtio_balloon virtio_net unix
[  777.316416] CPU: 0 PID: 1404 Comm: kworker/0:1H Not tainted 4.4.0 #2
[  777.316468] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Debian-1.8.2-1 04/01/2014
[  777.316547] Workqueue: rpciod xprt_autoclose [sunrpc]
[  777.316598]  815053f0 811d46ee  
81041383
[  777.316680]  88003d268b40 88003d268cc0 88003d07b3f8 
88003d07b368
[  777.316763]   8136f53b 88003d268b40 
8800160f2d40
[  777.316845] Call Trace:
[  777.316868]  [] ? dump_stack+0x47/0x69
[  777.316911]  [] ? warn_slowpath_common+0x73/0xa0
[  777.316963]  [] ? inet_sock_destruct+0x1cb/0x1f0
[  777.317018]  [] ? sk_destruct+0x13/0xc0
[  777.317061]  [] ? inet_release+0x31/0x50
[  777.317108]  [] ? sock_release+0x15/0x70
[  777.317153]  [] ? xs_close+0x9/0x20 [sunrpc]
[  777.317206]  [] ? xprt_autoclose+0x2d/0x60 [sunrpc]
[  777.317261]  [] ? process_one_work+0x129/0x3f0
[  777.317313]  [] ? worker_thread+0x42/0x490
[  777.317367]  [] ? process_one_work+0x3f0/0x3f0
[  777.317421]  [] ? kthread+0xb8/0xd0
[  777.317465]  [] ? kthread_worker_fn+0x100/0x100
[  777.317521]  [] ? ret_from_fork+0x3f/0x70
[  777.317564]  [] ? kthread_worker_fn+0x100/0x100
[  777.317618] ---[ end trace 220e17a0bf3ec971 ]---

This one happened on the client side VM.  There was only 1 NFS mount.  The
server VM didn't show anything nor did the host.


Can you send me your specific kernel version? There are multiple 
WARN_ONs and I want to catch the right one.


Thanks!

Re: [net-next PATCH v2 3/3] net: sched: cls_u32 add bit to specify software only rules

2016-02-25 Thread John Fastabend

On 16-02-25 11:02 PM, Samudrala, Sridhar wrote:
> On 2/25/2016 3:20 PM, John Fastabend wrote:
>> In the initial implementation the only way to stop a rule from being
>> inserted into the hardware table was via the device feature flag.
>> However this doesn't work well when working on an end host system
>> where packets are expect to hit both the hardware and software
>> datapaths.
>>
>> For example we can imagine a rule that will match an IP address and
>> increment a field. If we install this rule in both hardware and
>> software we may increment the field twice. To date we have only
>> added support for the drop action so we have been able to ignore
>> these cases. But as we extend the action support we will hit this
>> example plus more such cases. Arguably these are not even corner
>> cases in many working systems these cases will be common.
>>
>> To avoid forcing the driver to always abort (i.e. the above example)
>> this patch adds a flag to add a rule in software only. A careful
>> user can use this flag to build software and hardware datapaths
>> that work together. One example we have found particularly useful
>> is to use hardware resources to set the skb->mark on the skb when
>> the match may be expensive to run in software but a mark lookup
>> in a hash table is cheap. The idea here is hardware can do in one
>> lookup what the u32 classifier may need to traverse multiple lists
>> and hash tables to compute. The flag is only passed down on inserts
>> on deletion to avoid stale references in hardware we always try
> 
> I think this is supposed to be a new sentence starting with 'On deletion'

Yep.

>> to remove a rule if it exists.
>>
>> The flags field is part of the classifier specific options. Although
>> it is tempting to lift this into the generic structure doing this
>> proves difficult do to how the tc netlink attributes are implemented
>> along with how the dump/change routines are called. There is also
>> precedence for putting seemingly generic pieces in the specific
>> classifier options such as TCA_U32_POLICE, TCA_U32_ACT, etc. So
>> although not ideal I've left FLAGS in the u32 options as well as it
>> simplifies the code greatly and user space has already learned how
>> to manage these bits ala 'tc' tool.
>>
>> Another thing if trying to update a rule we require the flags to
>> be unchanged. This is to force user space, software u32 and
>> the hardware u32 to keep in sync. Thanks to Simon Horman for
>> catching this case.
>>

[...]

>> u32_policy[TCA_U32_MAX + 1] = {
>>   [TCA_U32_SEL]= { .len = sizeof(struct tc_u32_sel) },
>>   [TCA_U32_INDEV]= { .type = NLA_STRING, .len = IFNAMSIZ },
>>   [TCA_U32_MARK]= { .len = sizeof(struct tc_u32_mark) },
>> +[TCA_U32_FLAGS]= { .len = NLA_U32 },
> should be  .type = NLA_U32
> 

Yep stupid typo. I think I'm going to write some smatch files to
catch these sorts of things they should be detectable pragmatically.

Thanks.

>> 
>

Re: header conflict introduced by change to netfilter_ipv4/ip_tables.h

2016-02-25 Thread Mikko Rapeli

On Thu, Feb 25, 2016 at 10:08:56PM +0100, Thomas Graf wrote:
> On 01/06/16 at 09:20am, Stephen Hemminger wrote:
> > This commit breaks compilation of iproute2 with net-next.
> > 
> > commit 1ffad83dffd675cd742286ae82dca7d746cb0da8
> > Author: Mikko Rapeli 
> > Date:   Thu Oct 15 07:56:30 2015 +0200
> > 
> > netfilter: fix include files for compilation
> > 
> > Add missing header dependencies and other small changes so that each 
> > file
> > compiles alone in userspace.
> > 
> > Signed-off-by: Mikko Rapeli 
> > Signed-off-by: Pablo Neira Ayuso 
> > 
> > For iproute2, a copy of kernel headers (from make install_headers) is used.
> > After this change. the build of x_tables.c fails because IFNAMSIZ is already
> > defined in net/if.h
> 
> There is another issue with this commit. iptables.h included from m_ipt.c
> includes  xtables.h which includes  which is not
> available on a system without xtables.
> 
> gcc -Wall -Wstrict-prototypes  -Wmissing-prototypes -Wmissing-declarations 
> -Wold-style-definition -Wformat=2 -O2 -I../include -DRESOLVE_HOSTNAMES 
> -DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -D_GNU_SOURCE 
> -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE  
> -DHAVE_SETNS -DHAVE_ELF -DCONFIG_GACT -DCONFIG_GACT_PROB 
> -DIPT_LIB_DIR=\"/lib/xtables\" -DYY_NO_INPUT   -c -o m_ipt.o m_ipt.c
> In file included from ../include/iptables.h:5:0,
>  from m_ipt.c:17:
> ../include/xtables.h:34:29: fatal error: xtables-version.h: No such file or 
> directory
>  #include 

I don't see any dependencies from kernel side uapi headers to iptables or
xtables in my tree based on v4.5-rc2. Maybe this is a problem in
iproute2.

-Mikko

Re: header conflict introduced by change to netfilter_ipv4/ip_tables.h

2016-02-25 Thread Mikko Rapeli

On Thu, Feb 25, 2016 at 09:53:32PM +0100, Daniel Borkmann wrote:
> On 02/04/2016 08:13 AM, Josh Boyer wrote:
> >On Thu, Jan 7, 2016 at 2:15 PM, Mikko Rapeli  wrote:
> >>On Thu, Jan 07, 2016 at 10:30:40AM -0800, Stephen Hemminger wrote:
> >>>On Thu, 7 Jan 2016 07:29:50 +
> >>>Mikko Rapeli  wrote:
> On Wed, Jan 06, 2016 at 09:20:07AM -0800, Stephen Hemminger wrote:
> >This commit breaks compilation of iproute2 with net-next.
> 
> Ok, linux/if.h and libc net/if.h have overlapping defines, and this is not
> the only one. I saw lots of them in the core dump headers.
> 
> How should we handle them? Another ifndef for IFNAMSIZ into kernel uapi
> headers?
> 
> -Mikko
> >>>
> >>>Probably need to do the same thing that was done previously for these
> >>>kind of conflicts.  This makes make linux/if.h change to adapt to net/if.h
> >>>being included before it.
> >>
> >>Ok, got it. And found include/uapi/linux/libc-compat.h. Did not know about 
> >>it
> >>and was looking for solutions to these problems.
> >>
> >>But now I feel like writing a test script for mixing of kernel uapi
> >>and libc headers to find out how many other collitions are still there.
> >>Not good for the pile of over 70 patches in my branch
> >>https://github.com/torvalds/linux/compare/master...mcfrisk:headers_test_v05
> >>
> >>>Or revert your patch.
> >>
> >>I'm fine with this too.
> >
> >This is causing a number of build failures in Fedora rawhide now.  Did
> >anyone submit a revert or patch to fix this issue?
> 
> Mikko, was there any follow-up patch to fix this? Seems like the build
> error is not yet resolved.

First draft of proper fix:

https://lkml.org/lkml/2016/2/7/89

Will fix review comments from David Miller soon and I need to figure out
how to get the needed fixes to the glibc side too.

The problem from my point of view is that kernel uapi header files have
incomplete dependencies, which were hiding kernel uapi and glibc header file
conflicts.

I will try to systematically fix the kernel uapi and glibc header conflicts
but this will take some time. If you want, you can hide the problem
by reverting the needed header file dependency fixes.

-Mikko

Re: [net-next PATCH v2 3/3] net: sched: cls_u32 add bit to specify software only rules

2016-02-25 Thread Samudrala, Sridhar


On 2/25/2016 3:20 PM, John Fastabend wrote:

In the initial implementation the only way to stop a rule from being
inserted into the hardware table was via the device feature flag.
However this doesn't work well when working on an end host system
where packets are expect to hit both the hardware and software
datapaths.

For example we can imagine a rule that will match an IP address and
increment a field. If we install this rule in both hardware and
software we may increment the field twice. To date we have only
added support for the drop action so we have been able to ignore
these cases. But as we extend the action support we will hit this
example plus more such cases. Arguably these are not even corner
cases in many working systems these cases will be common.

To avoid forcing the driver to always abort (i.e. the above example)
this patch adds a flag to add a rule in software only. A careful
user can use this flag to build software and hardware datapaths
that work together. One example we have found particularly useful
is to use hardware resources to set the skb->mark on the skb when
the match may be expensive to run in software but a mark lookup
in a hash table is cheap. The idea here is hardware can do in one
lookup what the u32 classifier may need to traverse multiple lists
and hash tables to compute. The flag is only passed down on inserts
on deletion to avoid stale references in hardware we always try


I think this is supposed to be a new sentence starting with 'On deletion'

to remove a rule if it exists.

The flags field is part of the classifier specific options. Although
it is tempting to lift this into the generic structure doing this
proves difficult do to how the tc netlink attributes are implemented
along with how the dump/change routines are called. There is also
precedence for putting seemingly generic pieces in the specific
classifier options such as TCA_U32_POLICE, TCA_U32_ACT, etc. So
although not ideal I've left FLAGS in the u32 options as well as it
simplifies the code greatly and user space has already learned how
to manage these bits ala 'tc' tool.

Another thing if trying to update a rule we require the flags to
be unchanged. This is to force user space, software u32 and
the hardware u32 to keep in sync. Thanks to Simon Horman for
catching this case.

Signed-off-by: John Fastabend 
---
  include/net/pkt_cls.h|   13 +++--
  include/uapi/linux/pkt_cls.h |1 +
  net/sched/cls_u32.c  |   37 +++--
  3 files changed, 39 insertions(+), 12 deletions(-)




@@ -482,7 +485,9 @@ static void u32_clear_hw_hnode(struct tcf_proto *tp, struct 
tc_u_hnode *h)
}
  }
  
-static void u32_replace_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n)

+static void u32_replace_hw_knode(struct tcf_proto *tp,
+struct tc_u_knode *n,
+u32 flags)
  {
struct net_device *dev = tp->q->dev_queue->dev;
struct tc_cls_u32_offload u32_offload = {0};
@@ -491,7 +496,7 @@ static void u32_replace_hw_knode(struct tcf_proto *tp, 
struct tc_u_knode *n)
offload.type = TC_SETUP_CLSU32;
offload.cls_u32 = _offload;
  
-	if (tc_should_offload(dev)) {

+   if (tc_should_offload(dev, flags)) {
offload.cls_u32->command = TC_CLSU32_REPLACE_KNODE;
offload.cls_u32->knode.handle = n->handle;
offload.cls_u32->knode.fshift = n->fshift;
@@ -679,6 +684,7 @@ static const struct nla_policy u32_policy[TCA_U32_MAX + 1] 
= {
[TCA_U32_SEL]   = { .len = sizeof(struct tc_u32_sel) },
[TCA_U32_INDEV] = { .type = NLA_STRING, .len = IFNAMSIZ },
[TCA_U32_MARK]  = { .len = sizeof(struct tc_u32_mark) },
+   [TCA_U32_FLAGS] = { .len = NLA_U32 },

should be  .type = NLA_U32

Re: [PATCH] ipv4: do not cache local route when using SO_BINDTODEVICE

2016-02-25 Thread Kouya Shimura

Thanks for reviewing.

David Miller  writes:
> From: Kouya Shimura 
> Date: Tue, 23 Feb 2016 14:27:32 +0900
>
>> After v3.6, output routes are cached, however, the 'rt_iif' field of
>> struct rtable can not be shared by various traffics using SO_BINDTODEVICE.
>> It causes that traffic can not reach the local receiver which also uses 
>> SO_BINDTODEVICE after another traffic creates a local route cache. 
>> 
>>   /* sender and receiver are running on same host. */
>>   sender:
>> - socket(SOCK_DGRAM)
>> - setsockopt(SO_BINDTODEVICE, "eth0")
>> - connect(INADDR_ANY)
>> - send("message") -->[loopback]--+
>>  |
>>   receiver:  |
>> - socket(SOCK_DGRAM) |
>> - setsockopt(SO_BINDTODEVICE, "eth0")|
>> - bind(INADDR_ANY)   |
>> - recv()  <--+
>>   /* sometimes reach, sometimes not reach!!! */
>> 
>> Before v3.6, above traffics always reached. It seems to be a
>> regression. Actually the 'dhcp_release' command relies on it.
>> 
>> Commit 7bd86cc282a458b66c41e3f6 ("ipv4: Cache local output routes")
>> originated this issue. Revert it conditionally.
>> 
>> Signed-off-by: Kouya Shimura 
>
> I strongly fear that this will cause a performance regression for some 
> important
> cases.  If you elide the cache, it means that every route lookup has
> to allocate,
> create, and destroy the cache object.  This is really expensive.

The performace degradation was on my mind. Thus, I intended that this
patch affects only special (abnormal?) case that destination is local
and using SO_BINDTODEVICE. Isn't it acceptable?

This patch deesn't elide the existing cache. Normal (not using
SO_BINDTODEVICE) routes are still cached (not destoyed). Outer
routes, too. I confirmed it.

If you worry about non-zeroed 'orig_oif' case, this patch certainly
implies a perfomance problem. 'orig_oif' seems to be non-zero in
cases of ICMP, GRE, MULTICAST besides SO_BINDTODEVICE. 

>
> There has to be a better way to do so, so I'm not applying this patch,
> sorry.

Yes, it has to be. The root cause is sharing the 'rt_iif' field.
I'm looking forward to the right fix.

Thanks,
Kouya

Re: [PATCH 1/4] kernel: time: Add current_nw_timestamp() for network timestamps

2016-02-25 Thread YOSHIFUJI Hideaki/吉藤英明

Hi,

Deepa Dinamani wrote:
>>>  include/linux/ip.h |  2 ++
>>>  include/linux/time64.h |  3 +++
>>>  kernel/time/time.c | 26 ++
>>>  3 files changed, 31 insertions(+)
>>>
>> Since net/ipv4/* are the only users, it is enough to put
>> it in under net/ipv4/.
> 
> time.c hosts functions that are used by individual subsystems like
> current_fs_time() used by filesystems
> (sometimes used by other subsystems also).
> 
> The network timestamp function is used for both source route ip option
> and timestamp icmp messages.
> So it makes it difficult for it to be owned by a single layer.
> This is the reason it was chosen to include here.
> 
> Another option is to include it in the lowest layer its used:
> af_inet.c. Is this what you were suggesting?
> 

Yes, that's right.

--yoshfuji

> -Deepa
>

Re: [PATCH net-next RFC] rtnetlink: add new RTM_GETSTATS to dump link stats

2016-02-25 Thread roopa

On 2/25/16, 8:26 AM, David Miller wrote:
> From: roopa 
> Date: Wed, 24 Feb 2016 21:25:50 -0800
>
>> I did go back and forth on the attribute vs mask.
>> cosmetic but, i guess i did not feel good about having to redefine every 
>> attribute again
>> for the bitmap filter ...and i anticipate the list of stat attributes to 
>> grow overtime (maybe there is a better way).
>> enum {
>>IFLA_LINK_STATS64,
>>IFLA_LINK_INET6_STATS,
>>IFLA_LINK_MPLS_STATS,
>>__IFLA_LINK_STATS_MAX,
>> };
>>
>> #define IFLA_LINK_STATS64_FILTER  0
>> #define IFLA_LINK_INET6_STATS_FILTER (1 << 0)
>> #define IFLA_LINK_MPLS_STATS_FILTER (2 << 0)
> The filter for X is always (1 << X), so we could work with something like:
>
> #define IFLA_LINK_FILTER_BIT(ATTR)(1 << (ATTR))
i like it
>
> Which seems to suggest that emitting no stats unless they are explicitly 
> requested in
> the bitmask makes sense because:
>
> 1) You don't have to special case STATS64 in the filter mask
>
> 2) Application are forced to be aware of filtering and thus choose only
>what they want to see
>
> How about this?
I am ok with it. It keeps the filtering mask handling consistent. Its a bit 
inconsistent with the other dump functions,
but this gives user full control of what combination of stats she wants. For 
something like stats, i think this makes sense.

Thanks again!
Roopa

[net-next PATCH] IPv6: Use a 16 bit length field when computing a IPv6 UDP checksum

2016-02-25 Thread Alexander Duyck

This change makes it so that we only use a 16 bit length field instead of a
32 bit length field when computing a UDP checksum for IPv6.

This fixes an issue found with UDP tunnels over IPv6 where the total size
exceeded 65536 for a frame that was to be segmented.  As a result the
checksum being computed didn't match the frame data so we ended up being
off by 1 for the final checksum value since we didn't cancel out the upper
16 bits of the length.

The reasoning behind this is that RFC2460 states that for protocols such as
UDP that carry their own length field we should use that when computing the
checksum for the pseudo-header.  As such we should be using a 16 bit value,
not a 32 bit as is currently occurring when computing the UDP checksum for
IPv6.

Signed-off-by: Alexander Duyck 
---

I don't have the highest of confidence that this patch is the correct
solution for this, however the way I see it we only have a few
alternatives.  For now I am using this patch to get the conversation
started.

There ends up really being two issues here.

The first issue is that IPv4 and IPv6 GSO frames are being assembled that
have a length that is greater than 65535 and the truncated value is being
placed in the length fields.  The problem as I see it is that I am not sure
there is any definitive way to prevent this without reducing the maximum
GSO size for all sources within a given network namespace.  In addition I
am not sure what the impact on performance would be if we were to implement
such an approach.

The second issue is that if we cannot prevent the first issue we really
need to have a consistent way of doing the IPv4 and IPv6 checksums.  The
IPv4 checksum passes the length value as a short, while the IPv6 passes
this value as an integer.  As a result we get different behavior between
the two which makes it difficult for us to put together a generic routine
for handling checksums for segmentation at higher levels.

 include/net/ip6_checksum.h |9 +
 1 file changed, 9 insertions(+)

diff --git a/include/net/ip6_checksum.h b/include/net/ip6_checksum.h
index 1a49b73f7f6e..c55529ac12cf 100644
--- a/include/net/ip6_checksum.h
+++ b/include/net/ip6_checksum.h
@@ -95,6 +95,15 @@ static inline __sum16 udp_v6_check(int len,
   const struct in6_addr *daddr,
   __wsum base)
 {
+   /* RFC 2460 states that for protocols such as UDP which include
+* a length in their header we should use that value when computing
+* the pseudo-header instead of the total payload length minus
+* extension headers.  Since UDP has a length field that is only
+* 16 bits in length we need to cap the length field to 16 bits to
+* match what will be present in the header.
+*/
+   len &= 0x;
+
return csum_ipv6_magic(saddr, daddr, len, IPPROTO_UDP, base);
 }

Re: [PATCH net-next] net: Facility to report route quality of connected sockets

2016-02-25 Thread David Miller

From: Tom Herbert 
Date: Wed, 24 Feb 2016 10:02:52 -0800

> This patch add the SO_CNX_ADVICE socket option (setsockopt only). The
> purpose is to allow an application to give feedback to the kernel about
> the quality of the network path for a connected socket. The value
> argument indicates the type of quality report. For this initial patch
> the only supported advice is a value of 1 which indicates "bad path,
> please reroute"-- the action taken by the kernel is to call
> dst_negative_advice which will attempt to choose a different ECMP route,
> reset the TX hash for flow label and UDP source port in encapsulation,
> etc.
> 
> This facility should be useful for connected UDP sockets where only the
> application can provide any feedback about path quality. It could also
> be useful for TCP applications that have additional knowledge about the
> path outside of the normal TCP control loop.
> 
> Signed-off-by: Tom Herbert 

This looks fine, applied, thanks Tom.

Re: [PATCH net-next v7] net: ipv6: Make address flushing on ifdown optional

2016-02-25 Thread David Miller

From: David Ahern 
Date: Wed, 24 Feb 2016 09:25:37 -0800

> Currently, all ipv6 addresses are flushed when the interface is configured
> down, including global, static addresses:
 ...
> Add a new sysctl to make this behavior optional. The new setting defaults to
> flush all addresses to maintain backwards compatibility. When the set global
> addresses with no expire times are not flushed on an admin down. The sysctl
> is per-interface or system-wide for all interfaces
> 
> $ sysctl -w net.ipv6.conf.eth1.keep_addr_on_down=1
> or
> $ sysctl -w net.ipv6.conf.all.keep_addr_on_down=1
> 
> Will keep addresses on eth1 on an admin down.
 ...
> Signed-off-by: David Ahern 

Applied, thanks for your persistence David :)

[PATCH iproute2 v2] iplink: Support VF Trust

2016-02-25 Thread Hiroshi Shimamoto

From: Hiroshi Shimamoto 

Add IFLA_VF_TRUST message to trust the VF.
PF can accept some privileged operation from the trusted VF.
For example, ixgbe PF doesn't allow to enable VF promiscuous mode until
the VF is trusted because it may hurt performance.

To trust VF.
 # ip link set dev eth0 vf 1 trust on

To untrust VF.
 # ip link set dev eth0 vf 1 trust off

Signed-off-by: Hiroshi Shimamoto 
---

v1 -> v2: rebase to the latest code of iproute2.

The VF trust patch has been in kernel and the IFLA_VF_TRUST netlink attribute
has been included iproute2, but no actual handler for this.
This patch add the functionality to trust vf from ip command.

 ip/iplink.c   | 13 +
 man/man8/ip-link.8.in |  7 ++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/ip/iplink.c b/ip/iplink.c
index 5ab9d61..69f5057 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -82,6 +82,7 @@ void iplink_usage(void)
fprintf(stderr, "  [ spoofchk { on | 
off} ] ]\n");
fprintf(stderr, "  [ query_rss { on | 
off} ] ]\n");
fprintf(stderr, "  [ state { auto | 
enable | disable} ] ]\n");
+   fprintf(stderr, "  [ trust { on | off} 
] ]\n");
fprintf(stderr, " [ master DEVICE ]\n");
fprintf(stderr, " [ nomaster ]\n");
fprintf(stderr, " [ addrgenmode { eui64 | none 
| stable_secret | random } ]\n");
@@ -356,6 +357,18 @@ static int iplink_parse_vf(int vf, int *argcp, char 
***argvp,
ivs.vf = vf;
addattr_l(>n, sizeof(*req), IFLA_VF_RSS_QUERY_EN, 
, sizeof(ivs));
 
+   } else if (matches(*argv, "trust") == 0) {
+   struct ifla_vf_trust ivt;
+   NEXT_ARG();
+   if (matches(*argv, "on") == 0)
+   ivt.setting = 1;
+   else if (matches(*argv, "off") == 0)
+   ivt.setting = 0;
+   else
+   invarg("Invalid \"trust\" value\n", *argv);
+   ivt.vf = vf;
+   addattr_l(>n, sizeof(*req), IFLA_VF_TRUST, , 
sizeof(ivt));
+
} else if (matches(*argv, "state") == 0) {
struct ifla_vf_link_state ivl;
 
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 4d32343..7dd7a90 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -142,7 +142,8 @@ ip-link \- network device configuration
 .B min_tx_rate
 .IR TXRATE " ] ["
 .B spoofchk { on | off } ] [
-.B state { auto | enable | disable}
+.B state { auto | enable | disable} ] [
+.B trust { on | off }
 ] |
 .br
 .B master
@@ -1019,6 +1020,10 @@ parameter must be specified.
 reflection of the PF link state, enable lets the VF to communicate with other 
VFs on
 this host even if the PF link state is down, disable causes the HW to drop any 
packets
 sent by the VF.
+.sp
+.BI trust " on|off"
+- trust the specified VF user. This enables that VF user can set a specific 
feature
+which may impact security and/or performance. (e.g. VF multicast promiscuous 
mode)
 .in -8
 
 .TP
-- 
1.8.3.1

Re: [PATCH v2 net] bonding: don't use stale speed and duplex information

2016-02-25 Thread zhuyj


On 02/25/2016 09:33 PM, Jay Vosburgh wrote:

zhuyj  wrote:
[...]

I delved into the source code and Emil's tests. I think that the problem
that this patch expects to fix occurs very unusually.

Do you agree with me?

If so, maybe the following patch can reduce the performance loss.
Please comment on it. Thanks a lot.


diff --git a/drivers/net/bonding/bond_main.c
b/drivers/net/bonding/bond_main.c
index b7f1a99..c4c511a 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2129,7 +2129,9 @@ static void bond_miimon_commit(struct bonding *bond)
continue;

case BOND_LINK_UP:
-   bond_update_speed_duplex(slave);
+   if (slave->speed == SPEED_UNKNOWN)
+   bond_update_speed_duplex(slave);
+
bond_set_slave_link_state(slave, BOND_LINK_UP,
BOND_SLAVE_NOTIFY_NOW);
slave->last_link_up = jiffies;

I don't believe the speed is necessarily SPEED_UNKNOWN coming in
here.  If the race occurs at a time later than the initial enslavement,
speed may already be set (and the race manifests if the new speed
changes, i.e., the link changes from 1 Gb/sec to 10 Gb/sec), so I don't
think this is functionally correct.

Hi, Jay

Thanks for your reply.

IMHO, "If the race occurs at a time later than the initial enslavement,
speed may already be set (and the race manifests if the new speed
changes, i.e., the link changes from 1 Gb/sec to 10 Gb/sec)", from my test,
this will not happen because the previous source code make the speed 
correct.


This "bond_update_speed_duplex" repeats to get the correct speed.

That is, this patch is to fix the error in initial enslavement. The 
mentioned scenario

will not occur.

Even though the performance impact is minimal, if we can avoid this 
performance

impact, why not ?

Best Regards!
Zhu Yanjun



Also, the call to bond_miimon_commit itself is already gated by
bond_miimon_inspect finding a link state change.  The performance impact
here should be minimal.

-J

---
-Jay Vosburgh, jay.vosbu...@canonical.com

Re: [PATCH] appletalk: Pass IP-over-DDP packets through when 'ipddp0' interface is not present

2016-02-25 Thread Adam Seering

On Thu, 2016-02-25 at 14:33 -0500, David Miller wrote:
> From: Adam Seering 
> Date: Tue, 23 Feb 2016 09:19:13 -0500
> 
> > Let userspace programs transmit and receive raw IP-over-DDP packets
> > with a kernel where "ipddp" was compiled as a module but is not
> loaded
> > (so no "ipddp0" network interface is exposed).  This makes the
> "module
> > is not loaded" behavior match the "module was never compiled"
> behavior.
> > 
> > Signed-off-by: Adam Seering 
> 
> I think a better approache is to somehow autoload the module.

Could you elaborate?  Specifically: the kernel currently suppresses
packets on behalf of the module even after the module is unloaded.  How
would autoloading the module help with that?

Re: [PATCH net] ipv4: only create late gso-skb if skb is already set up with CHECKSUM_PARTIAL

2016-02-25 Thread Wakko Warner

Wakko Warner wrote:
> Hannes Frederic Sowa wrote:
> > Otherwise we break the contract with GSO to only pass CHECKSUM_PARTIAL
> > skbs down. This can easily happen with UDP+IPv4 sockets with the first
> > MSG_MORE write smaller than the MTU, second write is a sendfile.
> > 
> > Returning -EOPNOTSUPP lets the callers fall back into normal sendmsg path,
> > were we calculate the checksum manually during copying.
> > 
> > Commit d749c9cbffd6 ("ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked
> > sockets") started to exposes this bug.
> > 
> > Fixes: d749c9cbffd6 ("ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets")
> > Reported-by: Jiri Benc 
> > Cc: Jiri Benc 
> > Reported-by: Wakko Warner 
> > Cc: Wakko Warner 
> > Signed-off-by: Hannes Frederic Sowa 
> 
> I just tested this with 2 of my VMs.  It appears to have fixed the issue.

Now there's another one:
[  777.315931] [ cut here ]
[  777.316099] WARNING: CPU: 0 PID: 1404 at 
/usr/src/linux/dist/4.4-nobklcd/net/ipv4/af_inet.c:155 
inet_sock_destruct+0x1cb/0x1f0()
[  777.316189] Modules linked in: nfsv3 af_packet scsi_transport_iscsi nfsd 
auth_rpcgss oid_registry exportfs nfs lockd grace sunrpc ipv6 ata_piix libata 
evdev virtio_balloon virtio_net unix
[  777.316416] CPU: 0 PID: 1404 Comm: kworker/0:1H Not tainted 4.4.0 #2
[  777.316468] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Debian-1.8.2-1 04/01/2014
[  777.316547] Workqueue: rpciod xprt_autoclose [sunrpc]
[  777.316598]  815053f0 811d46ee  
81041383
[  777.316680]  88003d268b40 88003d268cc0 88003d07b3f8 
88003d07b368
[  777.316763]   8136f53b 88003d268b40 
8800160f2d40
[  777.316845] Call Trace:
[  777.316868]  [] ? dump_stack+0x47/0x69
[  777.316911]  [] ? warn_slowpath_common+0x73/0xa0
[  777.316963]  [] ? inet_sock_destruct+0x1cb/0x1f0
[  777.317018]  [] ? sk_destruct+0x13/0xc0
[  777.317061]  [] ? inet_release+0x31/0x50
[  777.317108]  [] ? sock_release+0x15/0x70
[  777.317153]  [] ? xs_close+0x9/0x20 [sunrpc]
[  777.317206]  [] ? xprt_autoclose+0x2d/0x60 [sunrpc]
[  777.317261]  [] ? process_one_work+0x129/0x3f0
[  777.317313]  [] ? worker_thread+0x42/0x490
[  777.317367]  [] ? process_one_work+0x3f0/0x3f0
[  777.317421]  [] ? kthread+0xb8/0xd0
[  777.317465]  [] ? kthread_worker_fn+0x100/0x100
[  777.317521]  [] ? ret_from_fork+0x3f/0x70
[  777.317564]  [] ? kthread_worker_fn+0x100/0x100
[  777.317618] ---[ end trace 220e17a0bf3ec971 ]---

This one happened on the client side VM.  There was only 1 NFS mount.  The
server VM didn't show anything nor did the host.

> > ---
> >  net/ipv4/ip_output.c | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> > index 64878efa045c13..565bf64b2b7d60 100644
> > --- a/net/ipv4/ip_output.c
> > +++ b/net/ipv4/ip_output.c
> > @@ -1236,13 +1236,16 @@ ssize_t ip_append_page(struct sock *sk, struct 
> > flowi4 *fl4, struct page *page,
> > if (!skb)
> > return -EINVAL;
> >  
> > -   cork->length += size;
> > if ((size + skb->len > mtu) &&
> > (sk->sk_protocol == IPPROTO_UDP) &&
> > (rt->dst.dev->features & NETIF_F_UFO)) {
> > +   if (skb->ip_summed != CHECKSUM_PARTIAL)
> > +   return -EOPNOTSUPP;
> > +
> > skb_shinfo(skb)->gso_size = mtu - fragheaderlen;
> > skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
> > }
> > +   cork->length += size;
> >  
> > while (size > 0) {
> > if (skb_is_gso(skb)) {
> > -- 
> > 2.5.0
> > 
> -- 
>  Microsoft has beaten Volkswagen's world record.  Volkswagen only created 22
>  million bugs.
-- 
 Microsoft has beaten Volkswagen's world record.  Volkswagen only created 22
 million bugs.

Re: linux-next: manual merge of the net-next tree with the net tree

2016-02-25 Thread Daniel Borkmann


On 02/26/2016 01:13 AM, Stephen Rothwell wrote:
[...]

I fixed it up (see below) and can carry the fix as necessary (no action
is required).


Looks good to me, thanks Stephen!

Best,
Daniel

linux-next: manual merge of the net-next tree with the net tree

2016-02-25 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  include/uapi/linux/bpf.h

between commit:

  2da897e51d7f ("bpf: fix csum setting for bpf_set_tunnel_key")

from the net tree and commit:

  d5a3b1f69186 ("bpf: introduce BPF_MAP_TYPE_STACK_TRACE")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell

diff --cc include/uapi/linux/bpf.h
index 5df4881dea7b,6496f98d3d68..
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@@ -292,9 -321,12 +321,15 @@@ enum bpf_func_id 
  /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
  #define BPF_F_TUNINFO_IPV6(1ULL << 0)
  
 +/* BPF_FUNC_skb_set_tunnel_key flags. */
 +#define BPF_F_ZERO_CSUM_TX(1ULL << 1)
 +
+ /* BPF_FUNC_get_stackid flags. */
+ #define BPF_F_SKIP_FIELD_MASK 0xffULL
+ #define BPF_F_USER_STACK  (1ULL << 8)
+ #define BPF_F_FAST_STACK_CMP  (1ULL << 9)
+ #define BPF_F_REUSE_STACKID   (1ULL << 10)
+ 
  /* user accessible mirror of in-kernel sk_buff.
   * new fields can only be added to the end of this structure
   */

Re: [PATCH net] ipv4: only create late gso-skb if skb is already set up with CHECKSUM_PARTIAL

2016-02-25 Thread Wakko Warner

Hannes Frederic Sowa wrote:
> Otherwise we break the contract with GSO to only pass CHECKSUM_PARTIAL
> skbs down. This can easily happen with UDP+IPv4 sockets with the first
> MSG_MORE write smaller than the MTU, second write is a sendfile.
> 
> Returning -EOPNOTSUPP lets the callers fall back into normal sendmsg path,
> were we calculate the checksum manually during copying.
> 
> Commit d749c9cbffd6 ("ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked
> sockets") started to exposes this bug.
> 
> Fixes: d749c9cbffd6 ("ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets")
> Reported-by: Jiri Benc 
> Cc: Jiri Benc 
> Reported-by: Wakko Warner 
> Cc: Wakko Warner 
> Signed-off-by: Hannes Frederic Sowa 

I just tested this with 2 of my VMs.  It appears to have fixed the issue.

> ---
>  net/ipv4/ip_output.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 64878efa045c13..565bf64b2b7d60 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -1236,13 +1236,16 @@ ssize_t   ip_append_page(struct sock *sk, struct 
> flowi4 *fl4, struct page *page,
>   if (!skb)
>   return -EINVAL;
>  
> - cork->length += size;
>   if ((size + skb->len > mtu) &&
>   (sk->sk_protocol == IPPROTO_UDP) &&
>   (rt->dst.dev->features & NETIF_F_UFO)) {
> + if (skb->ip_summed != CHECKSUM_PARTIAL)
> + return -EOPNOTSUPP;
> +
>   skb_shinfo(skb)->gso_size = mtu - fragheaderlen;
>   skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
>   }
> + cork->length += size;
>  
>   while (size > 0) {
>   if (skb_is_gso(skb)) {
> -- 
> 2.5.0
> 
-- 
 Microsoft has beaten Volkswagen's world record.  Volkswagen only created 22
 million bugs.

Re: [net-next PATCH 0/5] net_sched: Add support for IFE action

2016-02-25 Thread Daniel Borkmann


On 02/25/2016 11:40 PM, Jamal Hadi Salim wrote:

On 16-02-25 04:34 PM, Daniel Borkmann wrote:

On 02/25/2016 01:23 PM, Jamal Hadi Salim wrote:

On 16-02-24 12:48 PM, Daniel Borkmann wrote:

On 02/24/2016 01:49 PM, Jamal Hadi Salim wrote:

[...]

Drivers do set the hash. My use case is slightly different.
I have a NIC which has an embedded cavium processor. This thing
strips off the TLV and uses the hash to select the host MSI.
Only thing we dont use at the moment is queue_mapping.


Ok, but the example says ingress qdisc. ;) I presume the driver for the
NIC and the offloading parts are non-public? :/


Well, it is not exactly mellanox or intel - but I am sure someone would
be willing to sell that NIC.


So, without them, placing
this on ingress qdisc doesn't seem much useful wrt the skb hash example,
and most people only have the software part (for ingress I mean)
available.


Do you want me to take skbbhash out? I just want to get this set in then
worry about adding and modifying.


Well, if the NIC driver and offloading parts for ife are not available
(or cannot be made available)


If you are willing to pay for one i can have one sold to you.
Note: There is no dependency on such a NIC. You asked for an example
of how one would use skbhash and i pointed it out to you.
Infact nothing in the commit notes pointed to any NIC.


Yes, but that was not the point, I think we're talking past each other. ;)
The example (or better, use-case) you provided with your NIC seems useful.
But when you want to decode this with your kernel module, f.e. attached on
ingress qdisc, it's an entirely different situation. Anyway, point is subsequent
calls to skb_get_hash() will overwrite what you decoded there, that's why I
asked and tried to point this out earlier. You may want to look at sth
like __skb_set_sw_hash() ...


in the upstream kernel tree, then you have
to assume that everyone else can _only_ use this with the existing tc
facilities in the kernel. And as such, the whole set needs to be evaluated,
I think this is nothing new. ;-)


Seriously this is getting to a ridiculous level now.
I gave a talk - which you attended. I wrote a paper which you may have
at minimal glanced at.
I have had discussions with you on this very subject.


I think I only suggested that in case you still don't have an ethertype
assigned to this to require the user to specify one instead of some default.


And as soon as i posted the patches your statements led from you
needing a good use case to why not use a netdev and why i have
all these things that are not very useful. None of which came up before.


Fair enough, didn't see that code before and just tried to review it, probably
shouldn't have, so I'm not looking further into this anymore. ;)

Thanks anyway,
Daniel

Re: [PATCH][V3] mt7601u: do not free dma_buf when ivp allocation fails

2016-02-25 Thread Kuba Kicinski

On 25 February 2016 18:24:27 GMT-05:00, Colin King  
wrote:
>From: Colin Ian King 
>
>If the allocation of ivp fails the error handling attempts to
>free an uninitialized dma_buf; this data structure just contains
>garbage on the stack, so the freeing will cause issues when the
>urb, buf and dma fields are free'd. Fix this by not free'ing the
>dma_buf if the ivp allocation fails.
>
>Signed-off-by: Colin Ian King 

LGTM, thanks.

[PATCH net-next 2/4] lan78xx: remove unnecessary code

2016-02-25 Thread Woojung.Huh

It is not required after commit cd772de358d6
("phy: keep pause flags in phy driver features")

Signed-off-by: Woojung Huh 
---
 drivers/net/usb/lan78xx.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 4ec25e8..d100421 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -1623,12 +1623,7 @@ static int lan78xx_phy_init(struct lan78xx_net *dev)
 
/* MAC doesn't support 1000T Half */
phydev->supported &= ~SUPPORTED_1000baseT_Half;
-   phydev->supported |= (SUPPORTED_10baseT_Half |
- SUPPORTED_10baseT_Full |
- SUPPORTED_100baseT_Half |
- SUPPORTED_100baseT_Full |
- SUPPORTED_1000baseT_Full |
- SUPPORTED_Pause | SUPPORTED_Asym_Pause);
+
genphy_config_aneg(phydev);
 
phy_start(phydev);
-- 
2.7.0

[PATCH net-next 0/4] lan78xx: driver update

2016-02-25 Thread Woojung.Huh

This patch series add new ethtool functions of set_pauseparam  & get_pauseparam
and MAINTAINERS entry.

Woojung Huh (4):
  lan78xx: replace devid to chipid & chiprev
  lan78xx: remove unnecessary code
  lan78xx: add ethtool set & get pause functions
  MAINTAINERS: Add LAN78XX entry

 MAINTAINERS   |   7 +++
 drivers/net/usb/lan78xx.c | 107 ++
 drivers/net/usb/lan78xx.h |   1 +
 3 files changed, 97 insertions(+), 18 deletions(-)

-- 
2.7.0

[PATCH net-next 3/4] lan78xx: add ethtool set & get pause functions

2016-02-25 Thread Woojung.Huh

Add ethtool operations of set_pauseram and get_pauseparm.

Signed-off-by: Woojung Huh 
---
 drivers/net/usb/lan78xx.c | 80 +--
 1 file changed, 77 insertions(+), 3 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index d100421..705c180 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -36,7 +36,7 @@
 #define DRIVER_AUTHOR  "WOOJUNG HUH "
 #define DRIVER_DESC"LAN78XX USB 3.0 Gigabit Ethernet Devices"
 #define DRIVER_NAME"lan78xx"
-#define DRIVER_VERSION "1.0.2"
+#define DRIVER_VERSION "1.0.3"
 
 #define TX_TIMEOUT_JIFFIES (5 * HZ)
 #define THROTTLE_JIFFIES   (HZ / 8)
@@ -281,6 +281,9 @@ struct lan78xx_net {
u32 chipid;
u32 chiprev;
struct mii_bus  *mdiobus;
+
+   int fc_autoneg;
+   u8  fc_request_control;
 };
 
 /* use ethtool to change the level for any given device */
@@ -902,11 +905,15 @@ static int lan78xx_update_flowcontrol(struct lan78xx_net 
*dev, u8 duplex,
 {
u32 flow = 0, fct_flow = 0;
int ret;
+   u8 cap;
 
-   u8 cap = mii_resolve_flowctrl_fdx(lcladv, rmtadv);
+   if (dev->fc_autoneg)
+   cap = mii_resolve_flowctrl_fdx(lcladv, rmtadv);
+   else
+   cap = dev->fc_request_control;
 
if (cap & FLOW_CTRL_TX)
-   flow = (FLOW_CR_TX_FCEN_ | 0x);
+   flow |= (FLOW_CR_TX_FCEN_ | 0x);
 
if (cap & FLOW_CTRL_RX)
flow |= FLOW_CR_RX_FCEN_;
@@ -1386,6 +1393,62 @@ static int lan78xx_set_settings(struct net_device *net, 
struct ethtool_cmd *cmd)
return ret;
 }
 
+static void lan78xx_get_pause(struct net_device *net,
+ struct ethtool_pauseparam *pause)
+{
+   struct lan78xx_net *dev = netdev_priv(net);
+   struct phy_device *phydev = net->phydev;
+   struct ethtool_cmd ecmd = { .cmd = ETHTOOL_GSET };
+
+   phy_ethtool_gset(phydev, );
+
+   pause->autoneg = dev->fc_autoneg;
+
+   if (dev->fc_request_control & FLOW_CTRL_TX)
+   pause->tx_pause = 1;
+
+   if (dev->fc_request_control & FLOW_CTRL_RX)
+   pause->rx_pause = 1;
+}
+
+static int lan78xx_set_pause(struct net_device *net,
+struct ethtool_pauseparam *pause)
+{
+   struct lan78xx_net *dev = netdev_priv(net);
+   struct phy_device *phydev = net->phydev;
+   struct ethtool_cmd ecmd = { .cmd = ETHTOOL_GSET };
+   int ret;
+
+   phy_ethtool_gset(phydev, );
+
+   if (pause->autoneg && !ecmd.autoneg) {
+   ret = -EINVAL;
+   goto exit;
+   }
+
+   dev->fc_request_control = 0;
+   if (pause->rx_pause)
+   dev->fc_request_control |= FLOW_CTRL_RX;
+
+   if (pause->tx_pause)
+   dev->fc_request_control |= FLOW_CTRL_TX;
+
+   if (ecmd.autoneg) {
+   u32 mii_adv;
+
+   ecmd.advertising &= ~(ADVERTISED_Pause | ADVERTISED_Asym_Pause);
+   mii_adv = (u32)mii_advertise_flowctrl(dev->fc_request_control);
+   ecmd.advertising |= mii_adv_to_ethtool_adv_t(mii_adv);
+   phy_ethtool_sset(phydev, );
+   }
+
+   dev->fc_autoneg = pause->autoneg;
+
+   ret = 0;
+exit:
+   return ret;
+}
+
 static const struct ethtool_ops lan78xx_ethtool_ops = {
.get_link   = lan78xx_get_link,
.nway_reset = lan78xx_nway_reset,
@@ -1404,6 +1467,8 @@ static const struct ethtool_ops lan78xx_ethtool_ops = {
.set_wol= lan78xx_set_wol,
.get_eee= lan78xx_get_eee,
.set_eee= lan78xx_set_eee,
+   .get_pauseparam = lan78xx_get_pause,
+   .set_pauseparam = lan78xx_set_pause,
 };
 
 static int lan78xx_ioctl(struct net_device *netdev, struct ifreq *rq, int cmd)
@@ -1591,6 +1656,7 @@ static void lan78xx_link_status_change(struct net_device 
*net)
 static int lan78xx_phy_init(struct lan78xx_net *dev)
 {
int ret;
+   u32 mii_adv;
struct phy_device *phydev = dev->net->phydev;
 
phydev = phy_find_first(dev->mdiobus);
@@ -1624,8 +1690,16 @@ static int lan78xx_phy_init(struct lan78xx_net *dev)
/* MAC doesn't support 1000T Half */
phydev->supported &= ~SUPPORTED_1000baseT_Half;
 
+   /* support both flow controls */
+   dev->fc_request_control = (FLOW_CTRL_RX | FLOW_CTRL_TX);
+   phydev->advertising &= ~(ADVERTISED_Pause | ADVERTISED_Asym_Pause);
+   mii_adv = (u32)mii_advertise_flowctrl(dev->fc_request_control);
+   phydev->advertising |= mii_adv_to_ethtool_adv_t(mii_adv);
+
genphy_config_aneg(phydev);
 
+   dev->fc_autoneg = phydev->autoneg;
+
phy_start(phydev);
 
netif_dbg(dev, ifup, dev->net, "phy initialised successfully");
-- 
2.7.0

[PATCH net-next 1/4] lan78xx: replace devid to chipid & chiprev

2016-02-25 Thread Woojung.Huh

Replace devid to chipid & chiprev for easy access.

Signed-off-by: Woojung Huh 
---
 drivers/net/usb/lan78xx.c | 20 +++-
 drivers/net/usb/lan78xx.h |  1 +
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 1c299b8..4ec25e8 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -278,7 +278,8 @@ struct lan78xx_net {
int link_on;
u8  mdix_ctrl;
 
-   u32 devid;
+   u32 chipid;
+   u32 chiprev;
struct mii_bus  *mdiobus;
 };
 
@@ -471,7 +472,7 @@ static int lan78xx_read_raw_eeprom(struct lan78xx_net *dev, 
u32 offset,
 */
ret = lan78xx_read_reg(dev, HW_CFG, );
saved = val;
-   if ((dev->devid & ID_REV_CHIP_ID_MASK_) == 0x7800) {
+   if (dev->chipid == ID_REV_CHIP_ID_7800_) {
val &= ~(HW_CFG_LED1_EN_ | HW_CFG_LED0_EN_);
ret = lan78xx_write_reg(dev, HW_CFG, val);
}
@@ -505,7 +506,7 @@ static int lan78xx_read_raw_eeprom(struct lan78xx_net *dev, 
u32 offset,
 
retval = 0;
 exit:
-   if ((dev->devid & ID_REV_CHIP_ID_MASK_) == 0x7800)
+   if (dev->chipid == ID_REV_CHIP_ID_7800_)
ret = lan78xx_write_reg(dev, HW_CFG, saved);
 
return retval;
@@ -539,7 +540,7 @@ static int lan78xx_write_raw_eeprom(struct lan78xx_net 
*dev, u32 offset,
 */
ret = lan78xx_read_reg(dev, HW_CFG, );
saved = val;
-   if ((dev->devid & ID_REV_CHIP_ID_MASK_) == 0x7800) {
+   if (dev->chipid == ID_REV_CHIP_ID_7800_) {
val &= ~(HW_CFG_LED1_EN_ | HW_CFG_LED0_EN_);
ret = lan78xx_write_reg(dev, HW_CFG, val);
}
@@ -587,7 +588,7 @@ static int lan78xx_write_raw_eeprom(struct lan78xx_net 
*dev, u32 offset,
 
retval = 0;
 exit:
-   if ((dev->devid & ID_REV_CHIP_ID_MASK_) == 0x7800)
+   if (dev->chipid == ID_REV_CHIP_ID_7800_)
ret = lan78xx_write_reg(dev, HW_CFG, saved);
 
return retval;
@@ -1555,9 +1556,9 @@ static int lan78xx_mdio_init(struct lan78xx_net *dev)
snprintf(dev->mdiobus->id, MII_BUS_ID_SIZE, "usb-%03d:%03d",
 dev->udev->bus->busnum, dev->udev->devnum);
 
-   switch (dev->devid & ID_REV_CHIP_ID_MASK_) {
-   case 0x7800:
-   case 0x7850:
+   switch (dev->chipid) {
+   case ID_REV_CHIP_ID_7800_:
+   case ID_REV_CHIP_ID_7850_:
/* set to internal PHY id */
dev->mdiobus->phy_mask = ~(1 << 1);
break;
@@ -1918,7 +1919,8 @@ static int lan78xx_reset(struct lan78xx_net *dev)
 
/* save DEVID for later usage */
ret = lan78xx_read_reg(dev, ID_REV, );
-   dev->devid = buf;
+   dev->chipid = (buf & ID_REV_CHIP_ID_MASK_) >> 16;
+   dev->chiprev = buf & ID_REV_CHIP_REV_MASK_;
 
/* Respond to the IN token with a NAK */
ret = lan78xx_read_reg(dev, USB_CFG0, );
diff --git a/drivers/net/usb/lan78xx.h b/drivers/net/usb/lan78xx.h
index a93fb65..4092790 100644
--- a/drivers/net/usb/lan78xx.h
+++ b/drivers/net/usb/lan78xx.h
@@ -107,6 +107,7 @@
 #define ID_REV_CHIP_ID_MASK_   (0x)
 #define ID_REV_CHIP_REV_MASK_  (0x)
 #define ID_REV_CHIP_ID_7800_   (0x7800)
+#define ID_REV_CHIP_ID_7850_   (0x7850)
 
 #define FPGA_REV   (0x04)
 #define FPGA_REV_MINOR_MASK_   (0xFF00)
-- 
2.7.0

[PATCH net-next 4/4] MAINTAINERS: Add LAN78XX entry

2016-02-25 Thread Woojung.Huh

Add maintainers for Microchip LAN78XX.
unglinuxdri...@microchip.com is alias email which goes to current
developers work for Microchip Network related products.

Signed-off-by: Woojung Huh 
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 27393cf..12b764f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11332,6 +11332,13 @@ S: Maintained
 F: drivers/usb/host/isp116x*
 F: include/linux/usb/isp116x.h
 
+USB LAN78XX ETHERNET DRIVER
+M: Woojung Huh 
+M: Microchip Linux Driver Support 
+L: netdev@vger.kernel.org
+S: Maintained
+F: drivers/net/usb/lan78xx.*
+
 USB MASS STORAGE DRIVER
 M: Matthew Dharm 
 L: linux-...@vger.kernel.org
-- 
2.7.0

Re: [PATCH][V3] mt7601u: do not free dma_buf when ivp allocation fails

2016-02-25 Thread Julian Calaby

Hi,

On Fri, Feb 26, 2016 at 10:24 AM, Colin King  wrote:
> From: Colin Ian King 
>
> If the allocation of ivp fails the error handling attempts to
> free an uninitialized dma_buf; this data structure just contains
> garbage on the stack, so the freeing will cause issues when the
> urb, buf and dma fields are free'd. Fix this by not free'ing the
> dma_buf if the ivp allocation fails.
>
> Signed-off-by: Colin Ian King 

Looks right to me.

Reviewed-by: Julian Calaby 

> ---
>  drivers/net/wireless/mediatek/mt7601u/mcu.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/mediatek/mt7601u/mcu.c 
> b/drivers/net/wireless/mediatek/mt7601u/mcu.c
> index fbb1986..91c4b34 100644
> --- a/drivers/net/wireless/mediatek/mt7601u/mcu.c
> +++ b/drivers/net/wireless/mediatek/mt7601u/mcu.c
> @@ -362,7 +362,9 @@ mt7601u_upload_firmware(struct mt7601u_dev *dev, const 
> struct mt76_fw *fw)
> int i, ret;
>
> ivb = kmemdup(fw->ivb, sizeof(fw->ivb), GFP_KERNEL);
> -   if (!ivb || mt7601u_usb_alloc_buf(dev, MCU_FW_URB_SIZE, _buf)) {
> +   if (!ivb)
> +   return -ENOMEM;
> +   if (mt7601u_usb_alloc_buf(dev, MCU_FW_URB_SIZE, _buf)) {
> ret = -ENOMEM;
> goto error;
> }
> --
> 2.7.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thanks,

-- 
Julian Calaby

Email: julian.cal...@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/

Re: [net-next PATCH v2 0/3] tc software only flag

2016-02-25 Thread John Fastabend

On 16-02-25 03:19 PM, John Fastabend wrote:
> This adds a software only flag to tc but incorporates a bunch of comments
> from the original attempt at this.

In case its not entirely obvious I dropped the hardware only case for
now because I'm investigating Jiri's comment. But just having the
software only case lets me handle many use cases. Now in the u32
classifier I can build graphs where part of the graph is software only
and other parts are sw/hw. This lets me use part of the mark bit or
queue information to learn if the hardware has already acted on the
skb and then parse it differently in the classifier.

> 
> First instead of having the offload decision logic be embedded in cls_u32
> I lifted into cls_pkt.h so it can be used anywhere.
> 
> In order to do this I put the flag defines in pkt_cls.h as well. However
> it was suggested that perhaps these flags could be lifted into the
> upper layer of TCA_ as well but I'm afraid this can not be done with
> existing tc design as far as I can tell. The problem is the filters are
> packed and unpacked in the classifier specific code and pushing the flags
> through the high level doesn't seem easily doable. And we already have
> this design where classifiers handle generic options such as actions and
> policers. So I think adding one more thing here is OK as 'tc', et. al.
> already know how to handle this type of thing.
> 
> Thanks,
> .John
> 
> ---
> 
> John Fastabend (3):
>   net: sched: consolidate offload decision in cls_u32
>   net: cls_u32: move TC offload feature bit into cls_u32 offload logic
>   net: sched: cls_u32 add bit to specify software only rules
> 
> 
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |3 --
>  include/net/pkt_cls.h |   17 +++
>  include/uapi/linux/pkt_cls.h  |1 +
>  net/sched/cls_u32.c   |   37 
> ++---
>  4 files changed, 45 insertions(+), 13 deletions(-)
> 
> --
> Signature
>

[PATCH][V3] mt7601u: do not free dma_buf when ivp allocation fails

2016-02-25 Thread Colin King

From: Colin Ian King 

If the allocation of ivp fails the error handling attempts to
free an uninitialized dma_buf; this data structure just contains
garbage on the stack, so the freeing will cause issues when the
urb, buf and dma fields are free'd. Fix this by not free'ing the
dma_buf if the ivp allocation fails.

Signed-off-by: Colin Ian King 
---
 drivers/net/wireless/mediatek/mt7601u/mcu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/mediatek/mt7601u/mcu.c 
b/drivers/net/wireless/mediatek/mt7601u/mcu.c
index fbb1986..91c4b34 100644
--- a/drivers/net/wireless/mediatek/mt7601u/mcu.c
+++ b/drivers/net/wireless/mediatek/mt7601u/mcu.c
@@ -362,7 +362,9 @@ mt7601u_upload_firmware(struct mt7601u_dev *dev, const 
struct mt76_fw *fw)
int i, ret;
 
ivb = kmemdup(fw->ivb, sizeof(fw->ivb), GFP_KERNEL);
-   if (!ivb || mt7601u_usb_alloc_buf(dev, MCU_FW_URB_SIZE, _buf)) {
+   if (!ivb)
+   return -ENOMEM;
+   if (mt7601u_usb_alloc_buf(dev, MCU_FW_URB_SIZE, _buf)) {
ret = -ENOMEM;
goto error;
}
-- 
2.7.0

[net-next PATCH v2 3/3] net: sched: cls_u32 add bit to specify software only rules

2016-02-25 Thread John Fastabend

In the initial implementation the only way to stop a rule from being
inserted into the hardware table was via the device feature flag.
However this doesn't work well when working on an end host system
where packets are expect to hit both the hardware and software
datapaths.

For example we can imagine a rule that will match an IP address and
increment a field. If we install this rule in both hardware and
software we may increment the field twice. To date we have only
added support for the drop action so we have been able to ignore
these cases. But as we extend the action support we will hit this
example plus more such cases. Arguably these are not even corner
cases in many working systems these cases will be common.

To avoid forcing the driver to always abort (i.e. the above example)
this patch adds a flag to add a rule in software only. A careful
user can use this flag to build software and hardware datapaths
that work together. One example we have found particularly useful
is to use hardware resources to set the skb->mark on the skb when
the match may be expensive to run in software but a mark lookup
in a hash table is cheap. The idea here is hardware can do in one
lookup what the u32 classifier may need to traverse multiple lists
and hash tables to compute. The flag is only passed down on inserts
on deletion to avoid stale references in hardware we always try
to remove a rule if it exists.

The flags field is part of the classifier specific options. Although
it is tempting to lift this into the generic structure doing this
proves difficult do to how the tc netlink attributes are implemented
along with how the dump/change routines are called. There is also
precedence for putting seemingly generic pieces in the specific
classifier options such as TCA_U32_POLICE, TCA_U32_ACT, etc. So
although not ideal I've left FLAGS in the u32 options as well as it
simplifies the code greatly and user space has already learned how
to manage these bits ala 'tc' tool.

Another thing if trying to update a rule we require the flags to
be unchanged. This is to force user space, software u32 and
the hardware u32 to keep in sync. Thanks to Simon Horman for
catching this case.

Signed-off-by: John Fastabend 
---
 include/net/pkt_cls.h|   13 +++--
 include/uapi/linux/pkt_cls.h |1 +
 net/sched/cls_u32.c  |   37 +++--
 3 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 6096e96..42dc412 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -392,12 +392,21 @@ struct tc_cls_u32_offload {
};
 };
 
-static inline bool tc_should_offload(struct net_device *dev)
+/* tca flags definitions */
+#define TCA_CLS_FLAGS_SOFTWARE 1
+
+static inline bool tc_should_offload(struct net_device *dev, u32 flags)
 {
if (!(dev->features & NETIF_F_HW_TC))
return false;
 
-   return dev->netdev_ops->ndo_setup_tc;
+   if (flags & TCA_CLS_FLAGS_SOFTWARE)
+   return false;
+
+   if (!dev->netdev_ops->ndo_setup_tc)
+   return false;
+
+   return true;
 }
 
 #endif
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 4398737..9874f568 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -172,6 +172,7 @@ enum {
TCA_U32_INDEV,
TCA_U32_PCNT,
TCA_U32_MARK,
+   TCA_U32_FLAGS,
__TCA_U32_MAX
 };
 
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 24e888b..6d4450d 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -59,6 +59,7 @@ struct tc_u_knode {
 #ifdef CONFIG_CLS_U32_PERF
struct tc_u32_pcnt __percpu *pf;
 #endif
+   u32 flags;
 #ifdef CONFIG_CLS_U32_MARK
u32 val;
u32 mask;
@@ -434,7 +435,7 @@ static void u32_remove_hw_knode(struct tcf_proto *tp, u32 
handle)
offload.type = TC_SETUP_CLSU32;
offload.cls_u32 = _offload;
 
-   if (tc_should_offload(dev)) {
+   if (tc_should_offload(dev, 0)) {
offload.cls_u32->command = TC_CLSU32_DELETE_KNODE;
offload.cls_u32->knode.handle = handle;
dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
@@ -442,7 +443,9 @@ static void u32_remove_hw_knode(struct tcf_proto *tp, u32 
handle)
}
 }
 
-static void u32_replace_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h)
+static void u32_replace_hw_hnode(struct tcf_proto *tp,
+struct tc_u_hnode *h,
+u32 flags)
 {
struct net_device *dev = tp->q->dev_queue->dev;
struct tc_cls_u32_offload u32_offload = {0};
@@ -451,7 +454,7 @@ static void u32_replace_hw_hnode(struct tcf_proto *tp, 
struct tc_u_hnode *h)
offload.type = TC_SETUP_CLSU32;
offload.cls_u32 = _offload;
 
-   if

[net-next PATCH v2 2/3] net: cls_u32: move TC offload feature bit into cls_u32 offload logic

2016-02-25 Thread John Fastabend

In the original series drivers would get offload requests for cls_u32
rules even if the feature bit is disabled. This meant the driver had
to do a boiler plate check on the feature bit before adding/deleting
the rule.

This patch lifts the check into the core code and removes it from the
driver specific case.

Signed-off-by: John Fastabend 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |3 ---
 include/net/pkt_cls.h |3 +++
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index cf4b729..b893ff8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8400,9 +8400,6 @@ int __ixgbe_setup_tc(struct net_device *dev, u32 handle, 
__be16 proto,
 
if (TC_H_MAJ(handle) == TC_H_MAJ(TC_H_INGRESS) &&
tc->type == TC_SETUP_CLSU32) {
-   if (!(dev->features & NETIF_F_HW_TC))
-   return -EINVAL;
-
switch (tc->cls_u32->command) {
case TC_CLSU32_NEW_KNODE:
case TC_CLSU32_REPLACE_KNODE:
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index e64d20b..6096e96 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -394,6 +394,9 @@ struct tc_cls_u32_offload {
 
 static inline bool tc_should_offload(struct net_device *dev)
 {
+   if (!(dev->features & NETIF_F_HW_TC))
+   return false;
+
return dev->netdev_ops->ndo_setup_tc;
 }

[net-next PATCH v2 1/3] net: sched: consolidate offload decision in cls_u32

2016-02-25 Thread John Fastabend

The offload decision was originally very basic and tied to if the dev
implemented the appropriate ndo op hook. The next step is to allow
the user to more flexibly define if any paticular rule should be
offloaded or not. In order to have this logic in one function lift
the current check into a helper routine tc_should_offload().

Signed-off-by: John Fastabend 
---
 include/net/pkt_cls.h |5 +
 net/sched/cls_u32.c   |8 
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 2121df5..e64d20b 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -392,4 +392,9 @@ struct tc_cls_u32_offload {
};
 };
 
+static inline bool tc_should_offload(struct net_device *dev)
+{
+   return dev->netdev_ops->ndo_setup_tc;
+}
+
 #endif
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index d54bc94..24e888b 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -434,7 +434,7 @@ static void u32_remove_hw_knode(struct tcf_proto *tp, u32 
handle)
offload.type = TC_SETUP_CLSU32;
offload.cls_u32 = _offload;
 
-   if (dev->netdev_ops->ndo_setup_tc) {
+   if (tc_should_offload(dev)) {
offload.cls_u32->command = TC_CLSU32_DELETE_KNODE;
offload.cls_u32->knode.handle = handle;
dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
@@ -451,7 +451,7 @@ static void u32_replace_hw_hnode(struct tcf_proto *tp, 
struct tc_u_hnode *h)
offload.type = TC_SETUP_CLSU32;
offload.cls_u32 = _offload;
 
-   if (dev->netdev_ops->ndo_setup_tc) {
+   if (tc_should_offload(dev)) {
offload.cls_u32->command = TC_CLSU32_NEW_HNODE;
offload.cls_u32->hnode.divisor = h->divisor;
offload.cls_u32->hnode.handle = h->handle;
@@ -471,7 +471,7 @@ static void u32_clear_hw_hnode(struct tcf_proto *tp, struct 
tc_u_hnode *h)
offload.type = TC_SETUP_CLSU32;
offload.cls_u32 = _offload;
 
-   if (dev->netdev_ops->ndo_setup_tc) {
+   if (tc_should_offload(dev)) {
offload.cls_u32->command = TC_CLSU32_DELETE_HNODE;
offload.cls_u32->hnode.divisor = h->divisor;
offload.cls_u32->hnode.handle = h->handle;
@@ -491,7 +491,7 @@ static void u32_replace_hw_knode(struct tcf_proto *tp, 
struct tc_u_knode *n)
offload.type = TC_SETUP_CLSU32;
offload.cls_u32 = _offload;
 
-   if (dev->netdev_ops->ndo_setup_tc) {
+   if (tc_should_offload(dev)) {
offload.cls_u32->command = TC_CLSU32_REPLACE_KNODE;
offload.cls_u32->knode.handle = n->handle;
offload.cls_u32->knode.fshift = n->fshift;

[net-next PATCH v2 0/3] tc software only flag

2016-02-25 Thread John Fastabend

This adds a software only flag to tc but incorporates a bunch of comments
from the original attempt at this.

First instead of having the offload decision logic be embedded in cls_u32
I lifted into cls_pkt.h so it can be used anywhere.

In order to do this I put the flag defines in pkt_cls.h as well. However
it was suggested that perhaps these flags could be lifted into the
upper layer of TCA_ as well but I'm afraid this can not be done with
existing tc design as far as I can tell. The problem is the filters are
packed and unpacked in the classifier specific code and pushing the flags
through the high level doesn't seem easily doable. And we already have
this design where classifiers handle generic options such as actions and
policers. So I think adding one more thing here is OK as 'tc', et. al.
already know how to handle this type of thing.

Thanks,
.John

---

John Fastabend (3):
  net: sched: consolidate offload decision in cls_u32
  net: cls_u32: move TC offload feature bit into cls_u32 offload logic
  net: sched: cls_u32 add bit to specify software only rules


 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |3 --
 include/net/pkt_cls.h |   17 +++
 include/uapi/linux/pkt_cls.h  |1 +
 net/sched/cls_u32.c   |   37 ++---
 4 files changed, 45 insertions(+), 13 deletions(-)

--
Signature

Re: [PATCH][V2] mt7601u: do not free dma_buf when ivp allocation fails

2016-02-25 Thread Julian Calaby

Hi Colin,

On Fri, Feb 26, 2016 at 10:09 AM, Colin King  wrote:
> From: Colin Ian King 
>
> If the allocation of ivp fails the error handling attempts to
> free an uninitialized dma_buf; this data structure just contains
> garbage on the stack, so the freeing will cause issues when the
> urb, buf and dma fields are free'd. Fix this by not free'ing the
> dma_buf if the ivp allocation fails.
>
> Signed-off-by: Colin Ian King 
> ---
>  drivers/net/wireless/mediatek/mt7601u/mcu.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/wireless/mediatek/mt7601u/mcu.c 
> b/drivers/net/wireless/mediatek/mt7601u/mcu.c
> index fbb1986..70e4b5e 100644
> --- a/drivers/net/wireless/mediatek/mt7601u/mcu.c
> +++ b/drivers/net/wireless/mediatek/mt7601u/mcu.c
> @@ -362,10 +362,10 @@ mt7601u_upload_firmware(struct mt7601u_dev *dev, const 
> struct mt76_fw *fw)
> int i, ret;
>
> ivb = kmemdup(fw->ivb, sizeof(fw->ivb), GFP_KERNEL);
> -   if (!ivb || mt7601u_usb_alloc_buf(dev, MCU_FW_URB_SIZE, _buf)) {
> -   ret = -ENOMEM;
> +   if (!ivb)
> +   return -ENOMEM;
> +   if (mt7601u_usb_alloc_buf(dev, MCU_FW_URB_SIZE, _buf))
> goto error;

Are you sure this is right? Isn't ret unset here and consequently
returned at the end of the error label?

Thanks,

-- 
Julian Calaby

Email: julian.cal...@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/

[PATCH][V2] mt7601u: do not free dma_buf when ivp allocation fails

2016-02-25 Thread Colin King

From: Colin Ian King 

If the allocation of ivp fails the error handling attempts to
free an uninitialized dma_buf; this data structure just contains
garbage on the stack, so the freeing will cause issues when the
urb, buf and dma fields are free'd. Fix this by not free'ing the
dma_buf if the ivp allocation fails.

Signed-off-by: Colin Ian King 
---
 drivers/net/wireless/mediatek/mt7601u/mcu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/mediatek/mt7601u/mcu.c 
b/drivers/net/wireless/mediatek/mt7601u/mcu.c
index fbb1986..70e4b5e 100644
--- a/drivers/net/wireless/mediatek/mt7601u/mcu.c
+++ b/drivers/net/wireless/mediatek/mt7601u/mcu.c
@@ -362,10 +362,10 @@ mt7601u_upload_firmware(struct mt7601u_dev *dev, const 
struct mt76_fw *fw)
int i, ret;
 
ivb = kmemdup(fw->ivb, sizeof(fw->ivb), GFP_KERNEL);
-   if (!ivb || mt7601u_usb_alloc_buf(dev, MCU_FW_URB_SIZE, _buf)) {
-   ret = -ENOMEM;
+   if (!ivb)
+   return -ENOMEM;
+   if (mt7601u_usb_alloc_buf(dev, MCU_FW_URB_SIZE, _buf))
goto error;
-   }
 
ilm_len = le32_to_cpu(fw->hdr.ilm_len) - sizeof(fw->ivb);
dev_dbg(dev->dev, "loading FW - ILM %u + IVB %zu\n",
-- 
2.7.0

Re: [net-next PATCH 3/4] net: sched: cls_u32 add bit to specify software only rules

2016-02-25 Thread John Fastabend

On 16-02-25 03:05 PM, Jamal Hadi Salim wrote:
> On 16-02-25 04:56 PM, John Fastabend wrote:
>> On 16-02-25 04:56 AM, Jamal Hadi Salim wrote:
> 
>>
>> decoding that is not a problem. The ixgbe driver code already applied
>> can decode that without much trouble. The thing I want to avoid is
>> requiring my driver to do the inverse translation because although it
>> is possible its entirely unnecessary.
>>
>> To do the above example without a software cache for example
>> means I read out row 2048 of a hardware table then you get a bunch of
>> bits. From those bits I consult what fields are being loaded into
>> the table in the packet processing pipeline. I learn its the src_ip
>> fields then I have the value/mask and can unwind it. Finally if I
>> collapsed some hash tables onto this hardware table I have to do the
>> inverse of my collapsing scheme. The ixgbe one is sort of simple just
>> because I only have one table in hardware but with multiple tables
>> its a bit more difficult. Finally I've unwound the thing and can
>> print something back out of 'tc' but it seems easier to just cache
>> the hardware rules somewhere. Maybe other driver/hardware will have
>> a different opinion though depending on how much your firmware can
>> store and how ambitious you are. Personally I don't see any need
>> for the above code.
>>
> 
> I think if you can cache the rules and have a way to easily map to
> the hardware then this would work fine.

Yep that is the goal. I think the debate is if its acceptable to do
it with an entirely new filter list ingress_hw Jiri's opinion is that
it would be best to do it inline inside the classifier. At the moment
I'm looking at the code to see if there is a clean way to do it. IMO
using a ingress_hw classifier is a nice solution.

In the meantime I just respun patches 1-3 with the feedback and will
submit those while I work out patch 4.

> 
> cheers,
> jamal

Re: [net-next PATCH 3/4] net: sched: cls_u32 add bit to specify software only rules

2016-02-25 Thread Jamal Hadi Salim


On 16-02-25 04:56 PM, John Fastabend wrote:

On 16-02-25 04:56 AM, Jamal Hadi Salim wrote:




decoding that is not a problem. The ixgbe driver code already applied
can decode that without much trouble. The thing I want to avoid is
requiring my driver to do the inverse translation because although it
is possible its entirely unnecessary.

To do the above example without a software cache for example
means I read out row 2048 of a hardware table then you get a bunch of
bits. From those bits I consult what fields are being loaded into
the table in the packet processing pipeline. I learn its the src_ip
fields then I have the value/mask and can unwind it. Finally if I
collapsed some hash tables onto this hardware table I have to do the
inverse of my collapsing scheme. The ixgbe one is sort of simple just
because I only have one table in hardware but with multiple tables
its a bit more difficult. Finally I've unwound the thing and can
print something back out of 'tc' but it seems easier to just cache
the hardware rules somewhere. Maybe other driver/hardware will have
a different opinion though depending on how much your firmware can
store and how ambitious you are. Personally I don't see any need
for the above code.



I think if you can cache the rules and have a way to easily map to
the hardware then this would work fine.

cheers,
jamal

[PATCH v18 4/9] bpf: Mark __bpf_prog_run() stack frame as non-standard

2016-02-25 Thread Josh Poimboeuf

objtool reports the following false positive warnings:

  objtool: kernel/bpf/core.o: __bpf_prog_run()+0x5c: sibling call from callable 
instruction with changed frame pointer
  objtool: kernel/bpf/core.o: __bpf_prog_run()+0x60: function has unreachable 
instruction
  objtool: kernel/bpf/core.o: __bpf_prog_run()+0x64: function has unreachable 
instruction
  [...]

It's confused by the following dynamic jump instruction in
__bpf_prog_run()::

  jmp *(%r12,%rax,8)

which corresponds to the following line in the C code:

  goto *jumptable[insn->code];

There's no way for objtool to deterministically find all possible
branch targets for a dynamic jump, so it can't verify this code.

In this case the jumps all stay within the function, and there's nothing
unusual going on related to the stack, so we can whitelist the function.

Signed-off-by: Josh Poimboeuf 
Acked-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
Cc: netdev@vger.kernel.org
---
 kernel/bpf/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 972d9a8..be0abf6 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -649,6 +650,7 @@ load_byte:
WARN_RATELIMIT(1, "unknown opcode %02x\n", insn->code);
return 0;
 }
+STACK_FRAME_NON_STANDARD(__bpf_prog_run); /* jump table */
 
 bool bpf_prog_array_compatible(struct bpf_array *array,
   const struct bpf_prog *fp)
-- 
2.4.3

Re: [patch net-next v2 0/9] Introduce devlink interface and first drivers to use it

2016-02-25 Thread Jiri Pirko

Thu, Feb 25, 2016 at 10:53:30PM CET, han...@stressinduktion.org wrote:
>On 25.02.2016 22:12, Jiri Pirko wrote:
>>Thu, Feb 25, 2016 at 09:44:43PM CET, han...@stressinduktion.org wrote:
>>>On 25.02.2016 21:12, David Miller wrote:
From: Jiri Pirko 
Date: Tue, 23 Feb 2016 16:51:25 +0100

>There a is need for some userspace API that would allow to expose things
>that are not directly related to any device class like net_device of
>ib_device, but rather chip-wide/switch-ASIC-wide stuff.
>
>Use cases:
>1) get/set of port type (Ethernet/InfiniBand)
>2) setting up port splitters - split port into multiple ones and squash 
>again,
>enables usage of splitter cable
>3) setting up shared buffers - shared among multiple ports within
>one chip (work in progress)
>4) configuration of switch wide properties - resources division etc - This 
>will
>allow to pass configuration that is unacceptable to be passed as
>a module option.
>
>First patch of this set introduces a new generic Netlink based interface,
>called "devlink". It is similar to nl80211 model and it is heavily
>influenced by it, including the API definition. The devlink introduction 
>patch
>implements use cases 1) and 2). Other 2 are in development atm and will
>be addressed by follow-ups.
>
>It is very convenient for drivers to use devlink, as you can see in other
>patches in this set.
>
>Counterpart for devlink is userspace tool for now called "dl". Command line
>interface and outputs are derived from "ip" tool so it should be easy
>for users to get used to it.

I am very close to applying this series as-is.

The clincher for me is that there is precendence in the nl80211 stuff,
so obviously whatever userland infrastructure sits on top of that has
found a way to deal whatever perceived shortcomings devlink has.
>>>
>>>Actually nl80211 phy interfaces aren't really managed by wpa_supplicant nor
>>>NetworkManager, but they use net_device names to discover those later on. In
>>>devlink we don't necessarily have netdev names, thus my only objection to
>>>this series is to switch to stable topology identifiers.
>>
>>Hannes, as I mentioned earlier in one of my replies, you can choose to
>>use devlink name or pci (or other) address for your commands. So you
>>have your stable names even before udev takes care of renaming. It's up
>>to you as a user what handle to use.
>
>I understood that part from the beginning. In my opinion we should simply
>keep APIs simple and clean, reduced to the minimum and let user space build
>upon this.
>
>If an entity in the kernel doesn't need a name and/or a naming database, I
>would not bother implementing it. Not everyone is a kernel developer who uses
>Linux and can infer the problems from dl --help. I bet most users will use
>names at some point in time and they simply can't match even with the udev
>quirks in place when setting up netbooting or configuration must happen from
>the currently provided initramfs. This is my point.
>
>If names are fragile to use simply don't provide it if not necessary.
>
>devlink is an interface of its own and the names won't be referenced by other
>subsystems like iptables -i  or iproute. So why bother with names in
>the kernel?
>
>On the rest with netlink etc. I agree already. I just want to eliminate
>possible mistakes by users.
>
>Hope that makes my point more clear,

Okay. You convinced me. Will send v3 removing in-kernel naming and
indexing of devlinks and will only use bus type and bus name as a handle.

Thanks.

[Patch net-next v3 1/4] net_sched: introduce qdisc_replace() helper

2016-02-25 Thread Cong Wang

Remove nearly duplicated code and prepare for the following patch.

Cc: Jamal Hadi Salim 
Acked-by: Jamal Hadi Salim 
Signed-off-by: Cong Wang 
---
 include/net/sch_generic.h | 17 +
 net/sched/sch_cbq.c   |  7 +--
 net/sched/sch_drr.c   |  6 +-
 net/sched/sch_dsmark.c|  8 +---
 net/sched/sch_hfsc.c  |  6 +-
 net/sched/sch_htb.c   |  9 +
 net/sched/sch_multiq.c|  8 +---
 net/sched/sch_netem.c | 10 +-
 net/sched/sch_prio.c  |  8 +---
 net/sched/sch_qfq.c   |  6 +-
 net/sched/sch_red.c   |  7 +--
 net/sched/sch_sfb.c   |  7 +--
 net/sched/sch_tbf.c   |  8 +---
 13 files changed, 29 insertions(+), 78 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 636a362..8fdad9f 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -707,6 +707,23 @@ static inline void qdisc_reset_queue(struct Qdisc *sch)
sch->qstats.backlog = 0;
 }
 
+static inline struct Qdisc *qdisc_replace(struct Qdisc *sch, struct Qdisc *new,
+ struct Qdisc **pold)
+{
+   struct Qdisc *old;
+
+   sch_tree_lock(sch);
+   old = *pold;
+   *pold = new;
+   if (old != NULL) {
+   qdisc_tree_decrease_qlen(old, old->q.qlen);
+   qdisc_reset(old);
+   }
+   sch_tree_unlock(sch);
+
+   return old;
+}
+
 static inline unsigned int __qdisc_queue_drop(struct Qdisc *sch,
  struct sk_buff_head *list)
 {
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index c538d9e..7f8474c 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -1624,13 +1624,8 @@ static int cbq_graft(struct Qdisc *sch, unsigned long 
arg, struct Qdisc *new,
new->reshape_fail = cbq_reshape_fail;
 #endif
}
-   sch_tree_lock(sch);
-   *old = cl->q;
-   cl->q = new;
-   qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
-   qdisc_reset(*old);
-   sch_tree_unlock(sch);
 
+   *old = qdisc_replace(sch, new, >q);
return 0;
 }
 
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index a1cd778..b96c9a8 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -226,11 +226,7 @@ static int drr_graft_class(struct Qdisc *sch, unsigned 
long arg,
new = _qdisc;
}
 
-   sch_tree_lock(sch);
-   drr_purge_queue(cl);
-   *old = cl->qdisc;
-   cl->qdisc = new;
-   sch_tree_unlock(sch);
+   *old = qdisc_replace(sch, new, >qdisc);
return 0;
 }
 
diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index f357f34..cfddb1c 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -73,13 +73,7 @@ static int dsmark_graft(struct Qdisc *sch, unsigned long arg,
new = _qdisc;
}
 
-   sch_tree_lock(sch);
-   *old = p->q;
-   p->q = new;
-   qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
-   qdisc_reset(*old);
-   sch_tree_unlock(sch);
-
+   *old = qdisc_replace(sch, new, >q);
return 0;
 }
 
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index b7ebe2c..089f3b6 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1215,11 +1215,7 @@ hfsc_graft_class(struct Qdisc *sch, unsigned long arg, 
struct Qdisc *new,
new = _qdisc;
}
 
-   sch_tree_lock(sch);
-   hfsc_purge_queue(sch, cl);
-   *old = cl->qdisc;
-   cl->qdisc = new;
-   sch_tree_unlock(sch);
+   *old = qdisc_replace(sch, new, >qdisc);
return 0;
 }
 
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 15ccd7f..0efbcf3 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1163,14 +1163,7 @@ static int htb_graft(struct Qdisc *sch, unsigned long 
arg, struct Qdisc *new,
 cl->common.classid)) == NULL)
return -ENOBUFS;
 
-   sch_tree_lock(sch);
-   *old = cl->un.leaf.q;
-   cl->un.leaf.q = new;
-   if (*old != NULL) {
-   qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
-   qdisc_reset(*old);
-   }
-   sch_tree_unlock(sch);
+   *old = qdisc_replace(sch, new, >un.leaf.q);
return 0;
 }
 
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index 4e904ca..a0103a1 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -303,13 +303,7 @@ static int multiq_graft(struct Qdisc *sch, unsigned long 
arg, struct Qdisc *new,
if (new == NULL)
new = _qdisc;
 
-   sch_tree_lock(sch);
-   *old = q->queues[band];
-   q->queues[band] = new;
-   qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
-   qdisc_reset(*old);
-   sch_tree_unlock(sch);
-
+   *old = qdisc_replace(sch, new, >queues[band]);
return

[Patch net-next v3 2/4] net_sched: update hierarchical backlog too

2016-02-25 Thread Cong Wang

When the bottom qdisc decides to, for example, drop some packet,
it calls qdisc_tree_decrease_qlen() to update the queue length
for all its ancestors, we need to update the backlog too to
keep the stats on root qdisc accurate.

Cc: Jamal Hadi Salim 
Acked-by: Jamal Hadi Salim 
Signed-off-by: Cong Wang 
---
 include/net/codel.h   |  4 
 include/net/sch_generic.h |  5 +++--
 net/sched/sch_api.c   |  8 +---
 net/sched/sch_cbq.c   |  5 +++--
 net/sched/sch_choke.c |  6 --
 net/sched/sch_codel.c | 10 ++
 net/sched/sch_drr.c   |  3 ++-
 net/sched/sch_fq.c|  4 +++-
 net/sched/sch_fq_codel.c  | 17 -
 net/sched/sch_hfsc.c  |  3 ++-
 net/sched/sch_hhf.c   | 10 +++---
 net/sched/sch_htb.c   | 10 ++
 net/sched/sch_multiq.c|  8 +---
 net/sched/sch_netem.c |  3 ++-
 net/sched/sch_pie.c   |  5 +++--
 net/sched/sch_prio.c  |  7 ---
 net/sched/sch_qfq.c   |  3 ++-
 net/sched/sch_red.c   |  3 ++-
 net/sched/sch_sfb.c   |  3 ++-
 net/sched/sch_sfq.c   | 16 +---
 net/sched/sch_tbf.c   |  7 +--
 21 files changed, 91 insertions(+), 49 deletions(-)

diff --git a/include/net/codel.h b/include/net/codel.h
index 267e702..d168aca 100644
--- a/include/net/codel.h
+++ b/include/net/codel.h
@@ -162,12 +162,14 @@ struct codel_vars {
  * struct codel_stats - contains codel shared variables and stats
  * @maxpacket: largest packet we've seen so far
  * @drop_count:temp count of dropped packets in dequeue()
+ * @drop_len:  bytes of dropped packets in dequeue()
  * ecn_mark:   number of packets we ECN marked instead of dropping
  * ce_mark:number of packets CE marked because sojourn time was above 
ce_threshold
  */
 struct codel_stats {
u32 maxpacket;
u32 drop_count;
+   u32 drop_len;
u32 ecn_mark;
u32 ce_mark;
 };
@@ -308,6 +310,7 @@ static struct sk_buff *codel_dequeue(struct Qdisc *sch,
  
vars->rec_inv_sqrt);
goto end;
}
+   stats->drop_len += qdisc_pkt_len(skb);
qdisc_drop(skb, sch);
stats->drop_count++;
skb = dequeue_func(vars, sch);
@@ -330,6 +333,7 @@ static struct sk_buff *codel_dequeue(struct Qdisc *sch,
if (params->ecn && INET_ECN_set_ce(skb)) {
stats->ecn_mark++;
} else {
+   stats->drop_len += qdisc_pkt_len(skb);
qdisc_drop(skb, sch);
stats->drop_count++;
 
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 8fdad9f..e5bba89 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -396,7 +396,8 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue 
*dev_queue,
  struct Qdisc *qdisc);
 void qdisc_reset(struct Qdisc *qdisc);
 void qdisc_destroy(struct Qdisc *qdisc);
-void qdisc_tree_decrease_qlen(struct Qdisc *qdisc, unsigned int n);
+void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, unsigned int n,
+  unsigned int len);
 struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
  const struct Qdisc_ops *ops);
 struct Qdisc *qdisc_create_dflt(struct netdev_queue *dev_queue,
@@ -716,7 +717,7 @@ static inline struct Qdisc *qdisc_replace(struct Qdisc 
*sch, struct Qdisc *new,
old = *pold;
*pold = new;
if (old != NULL) {
-   qdisc_tree_decrease_qlen(old, old->q.qlen);
+   qdisc_tree_reduce_backlog(old, old->q.qlen, 
old->qstats.backlog);
qdisc_reset(old);
}
sch_tree_unlock(sch);
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index de1e176..3b180ff 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -744,14 +744,15 @@ static u32 qdisc_alloc_handle(struct net_device *dev)
return 0;
 }
 
-void qdisc_tree_decrease_qlen(struct Qdisc *sch, unsigned int n)
+void qdisc_tree_reduce_backlog(struct Qdisc *sch, unsigned int n,
+  unsigned int len)
 {
const struct Qdisc_class_ops *cops;
unsigned long cl;
u32 parentid;
int drops;
 
-   if (n == 0)
+   if (n == 0 && len == 0)
return;
drops = max_t(int, n, 0);
rcu_read_lock();
@@ -774,11 +775,12 @@ void qdisc_tree_decrease_qlen(struct Qdisc *sch, unsigned 
int n)
cops->put(sch, cl);
}
sch->q.qlen -= n;
+   sch->qstats.backlog -= len;
__qdisc_qstats_drop(sch, drops);
}
rcu_read_unlock();

[Patch net-next v3 3/4] sch_htb: update backlog as well

2016-02-25 Thread Cong Wang

We saw qlen!=0 but backlog==0 on our production machine:

qdisc htb 1: dev eth0 root refcnt 2 r2q 10 default 1 direct_packets_stat 0 ver 
3.17
 Sent 172680457356 bytes 222469449 pkt (dropped 0, overlimits 123575834 
requeues 0)
 backlog 0b 72p requeues 0

The problem is we only count qlen for HTB qdisc but not backlog.
We need to update backlog too when we update qlen, so that we
can at least know the average packet length.

Cc: Jamal Hadi Salim 
Acked-by: Jamal Hadi Salim 
Signed-off-by: Cong Wang 
---
 net/sched/sch_htb.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 846a7f9..87b02ed3 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -600,6 +600,7 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc 
*sch)
htb_activate(q, cl);
}
 
+   qdisc_qstats_backlog_inc(sch, skb);
sch->q.qlen++;
return NET_XMIT_SUCCESS;
 }
@@ -889,6 +890,7 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
 ok:
qdisc_bstats_update(sch, skb);
qdisc_unthrottled(sch);
+   qdisc_qstats_backlog_dec(sch, skb);
sch->q.qlen--;
return skb;
}
@@ -955,6 +957,7 @@ static unsigned int htb_drop(struct Qdisc *sch)
unsigned int len;
if (cl->un.leaf.q->ops->drop &&
(len = cl->un.leaf.q->ops->drop(cl->un.leaf.q))) {
+   sch->qstats.backlog -= len;
sch->q.qlen--;
if (!cl->un.leaf.q->q.qlen)
htb_deactivate(q, cl);
@@ -984,12 +987,12 @@ static void htb_reset(struct Qdisc *sch)
}
cl->prio_activity = 0;
cl->cmode = HTB_CAN_SEND;
-
}
}
qdisc_watchdog_cancel(>watchdog);
__skb_queue_purge(>direct_queue);
sch->q.qlen = 0;
+   sch->qstats.backlog = 0;
memset(q->hlevel, 0, sizeof(q->hlevel));
memset(q->row_mask, 0, sizeof(q->row_mask));
for (i = 0; i < TC_HTB_NUMPRIO; i++)
-- 
2.1.0

[Patch net-next v3 0/4] net_sched: update backlog for hierarchical qdisc's

2016-02-25 Thread Cong Wang

For hierarchical qdisc like HTB, we currently only update its qlen
but leave its backlog as zero:

qdisc htb 1: dev eth0 root refcnt 2 r2q 10 default 1 direct_packets_stat 0 
ver 3.17
 Sent 172680457356 bytes 222469449 pkt (dropped 0, overlimits 123575834 
requeues 0)
 backlog 0b 72p requeues 0

This patchset makes backlog as accurate as qlen.

---
v3: rebase and fix the n==0 case for qdisc_tree_reduce_backlog()
v2: rebase and update changelog, not code change

Cong Wang (4):
  net_sched: introduce qdisc_replace() helper
  net_sched: update hierarchical backlog too
  sch_htb: update backlog as well
  sch_dsmark: update backlog as well

 include/net/codel.h   |  4 
 include/net/sch_generic.h | 20 +++-
 net/sched/sch_api.c   |  8 +---
 net/sched/sch_cbq.c   | 12 
 net/sched/sch_choke.c |  6 --
 net/sched/sch_codel.c | 10 ++
 net/sched/sch_drr.c   |  9 +++--
 net/sched/sch_dsmark.c| 11 ---
 net/sched/sch_fq.c|  4 +++-
 net/sched/sch_fq_codel.c  | 17 -
 net/sched/sch_hfsc.c  |  9 +++--
 net/sched/sch_hhf.c   | 10 +++---
 net/sched/sch_htb.c   | 24 +++-
 net/sched/sch_multiq.c| 16 ++--
 net/sched/sch_netem.c | 13 +++--
 net/sched/sch_pie.c   |  5 +++--
 net/sched/sch_prio.c  | 15 +--
 net/sched/sch_qfq.c   |  9 +++--
 net/sched/sch_red.c   | 10 +++---
 net/sched/sch_sfb.c   | 10 +++---
 net/sched/sch_sfq.c   | 16 +---
 net/sched/sch_tbf.c   | 15 ++-
 22 files changed, 126 insertions(+), 127 deletions(-)

-- 
2.1.0

[Patch net-next v3 4/4] sch_dsmark: update backlog as well

2016-02-25 Thread Cong Wang

Similarly, we need to update backlog too when we update qlen.

Cc: Jamal Hadi Salim 
Signed-off-by: Cong Wang 
---
 net/sched/sch_dsmark.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index cfddb1c..d0dff0c 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -258,6 +258,7 @@ static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc 
*sch)
return err;
}
 
+   qdisc_qstats_backlog_inc(sch, skb);
sch->q.qlen++;
 
return NET_XMIT_SUCCESS;
@@ -280,6 +281,7 @@ static struct sk_buff *dsmark_dequeue(struct Qdisc *sch)
return NULL;
 
qdisc_bstats_update(sch, skb);
+   qdisc_qstats_backlog_dec(sch, skb);
sch->q.qlen--;
 
index = skb->tc_index & (p->indices - 1);
@@ -395,6 +397,7 @@ static void dsmark_reset(struct Qdisc *sch)
 
pr_debug("%s(sch %p,[qdisc %p])\n", __func__, sch, p);
qdisc_reset(p->q);
+   sch->qstats.backlog = 0;
sch->q.qlen = 0;
 }
 
-- 
2.1.0

Re: [net-next PATCH v2 1/5] introduce IFE action

2016-02-25 Thread Jamal Hadi Salim


On 16-02-25 04:46 PM, Daniel Borkmann wrote:

On 02/25/2016 01:20 PM, Jamal Hadi Salim wrote:



Let me think about it. Likely it will be subsequent patches - I just
want to get this set out first.


Yes, I mean one of the key motivation was "[...] to horizontally scale
packet processing at scope of a chasis or rack [...]".



By adding more processing points horizontally. Please read the paper;
I think we are getting to a point where this discussion is no longer
productive.


So for people
who don't have that NIC with embedded Cavium processor, they might
already hit scalability issues for encode/decode right there.



A lock on a policy that already matches is not a serious
scaling problem that you need to offload to an embedded NIC.
Do note the point here is to scale by adding more machines.
And this shit is deployed and has proven it does scale.

cheers,
jamal

Re: [net-next PATCH 0/5] net_sched: Add support for IFE action

2016-02-25 Thread Jamal Hadi Salim


On 16-02-25 04:34 PM, Daniel Borkmann wrote:

On 02/25/2016 01:23 PM, Jamal Hadi Salim wrote:

On 16-02-24 12:48 PM, Daniel Borkmann wrote:

On 02/24/2016 01:49 PM, Jamal Hadi Salim wrote:

[...]

Drivers do set the hash. My use case is slightly different.
I have a NIC which has an embedded cavium processor. This thing
strips off the TLV and uses the hash to select the host MSI.
Only thing we dont use at the moment is queue_mapping.


Ok, but the example says ingress qdisc. ;) I presume the driver for the
NIC and the offloading parts are non-public? :/


Well, it is not exactly mellanox or intel - but I am sure someone would
be willing to sell that NIC.


So, without them, placing
this on ingress qdisc doesn't seem much useful wrt the skb hash example,
and most people only have the software part (for ingress I mean)
available.


Do you want me to take skbbhash out? I just want to get this set in then
worry about adding and modifying.


Well, if the NIC driver and offloading parts for ife are not available
(or cannot be made available)


If you are willing to pay for one i can have one sold to you.
Note: There is no dependency on such a NIC. You asked for an example
of how one would use skbhash and i pointed it out to you.
Infact nothing in the commit notes pointed to any NIC.


in the upstream kernel tree, then you have
to assume that everyone else can _only_ use this with the existing tc
facilities in the kernel. And as such, the whole set needs to be evaluated,
I think this is nothing new. ;-)



Seriously this is getting to a ridiculous level now.
I gave a talk - which you attended. I wrote a paper which you may have
at minimal glanced at.
I have had discussions with you on this very subject.
And as soon as i posted the patches your statements led from you
needing a good use case to why not use a netdev and why i have
all these things that are not very useful. None of which came up before.
For someone who works on ebpf - where there is plenty of code that
noone gets to see you are making rather bold statements.


That is why I asked for a "typical setup" and use-case earlier. And if
it doesn't have any obvious ones in upstream context without the missing
pieces, then the code might linger around unused by anyone. So taking out
these parts would be highly preferred (unless there's a _good_ reason not
to).



So is there anything that is useful in your view?

cheers,
jamal

Re: [PATCH net-next 0/6] qed*: Driver updates

2016-02-25 Thread David Miller

From: Yuval Mintz 
Date: Wed, 24 Feb 2016 16:52:44 +0200

> Usually I try to provide a sensible description of the patch set even if
> it lacks a general 'motif', but this simply contains several small,
> unrelated and self-explenatory tweaks and additions.

Series applied, however

I really wonder about how you stop and start the TX queues on carrier status
change.

The qdisc layer does that for you already, and prevents any TX from occuring
when the carrier is off.

So please explore removing that seemingly needless code.

Thanks.

[PATCH] mt7601u: do not free dma_buf when ivp allocation fails

2016-02-25 Thread Colin King

From: Colin Ian King 

If the allocation of ivp fails the error handling attempts to
free an uninitialized dma_buf; this data structure just contains
garbage on the stack, so the freeing will cause issues when the
urb, buf and dma fields are cleaned. Fix this by handling the
ivp check and

Signed-off-by: Colin Ian King 
---
 drivers/net/wireless/mediatek/mt7601u/mcu.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/mediatek/mt7601u/mcu.c 
b/drivers/net/wireless/mediatek/mt7601u/mcu.c
index fbb1986..06ac657 100644
--- a/drivers/net/wireless/mediatek/mt7601u/mcu.c
+++ b/drivers/net/wireless/mediatek/mt7601u/mcu.c
@@ -359,13 +359,13 @@ mt7601u_upload_firmware(struct mt7601u_dev *dev, const 
struct mt76_fw *fw)
struct mt7601u_dma_buf dma_buf;
void *ivb;
u32 ilm_len, dlm_len;
-   int i, ret;
+   int i, ret = -ENOMEM;
 
ivb = kmemdup(fw->ivb, sizeof(fw->ivb), GFP_KERNEL);
-   if (!ivb || mt7601u_usb_alloc_buf(dev, MCU_FW_URB_SIZE, _buf)) {
-   ret = -ENOMEM;
+   if (!ivb)
+   goto error_ivb;
+   if (mt7601u_usb_alloc_buf(dev, MCU_FW_URB_SIZE, _buf))
goto error;
-   }
 
ilm_len = le32_to_cpu(fw->hdr.ilm_len) - sizeof(fw->ivb);
dev_dbg(dev->dev, "loading FW - ILM %u + IVB %zu\n",
@@ -396,8 +396,9 @@ mt7601u_upload_firmware(struct mt7601u_dev *dev, const 
struct mt76_fw *fw)
 
dev_dbg(dev->dev, "Firmware running!\n");
 error:
-   kfree(ivb);
mt7601u_usb_free_buf(dev, _buf);
+error_ivb:
+   kfree(ivb);
 
return ret;
 }
-- 
2.7.0

Re: [PATCH/RFC v4 net-next] ravb: Add dma queue interrupt support

2016-02-25 Thread Sergei Shtylyov


On 02/21/2016 10:16 PM, Sergei Shtylyov wrote:

[...]


From: Kazuya Mizuguchi 

This patch supports the following interrupts.

- One interrupt for multiple (descriptor, error, management)
- One interrupt for emac
- Four interrupts for dma queue (best effort rx/tx, network control
rx/tx)

This patch improve efficiency of the interrupt handler by adding the
interrupt handler corresponding to each interrupt source described
above. Additionally, it reduces the number of times of the access to
EthernetAVB IF.

Signed-off-by: Kazuya Mizuguchi 
Signed-off-by: Yoshihiro Kaneko 


[...]


diff --git a/drivers/net/ethernet/renesas/ravb_main.c
b/drivers/net/ethernet/renesas/ravb_main.c
index ac43ed9..076f25f 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c

[...]

+  struct net_device *ndev, struct device *dev,
+  const char *ch)
+{
+   char *name;
+   int error;
+
+   name = devm_kasprintf(dev, GFP_KERNEL, "%s:%s", ndev->name, ch);
+   error = request_irq(irq, handler, IRQF_SHARED, name, ndev);


 Not sure if we need IRQF_SHARED on those IRQs, they're not really
shareable...


I don't know whether this causes something bad.
I think this controller is supporting a shared IRQ.


Based on the high-level trigger, I'd rather suspect not. Anyway, all the
SoC IRQs are dedicated to a certain (single) source.


I don't want to change that if there is no fatal issue in the use of
IRQF_SHARED.
However, I will remove the flag if it is simple waste...


It's not a waste but it just shouldn't be needed.


Actually for RX/TX DMA you're routing 2 interrupts to the same handler, so
it's needed, sorry.


   Scratch that, it's not multiple handlers, it's multiple IRQs...


[...]


Thanks,
kaneko


MBR, Sergei

Re: [Patch] rose_route_frame() NULL pointer dereference kernel panic

2016-02-25 Thread David Miller

From: f6bvp 
Date: Wed, 24 Feb 2016 17:53:11 +0100

> @@ -863,6 +863,11 @@ int rose_route_frame(struct sk_buff *skb, ax25_cb
> *ax25)

First of all this patch was corrupted by your email client.

> int res = 0;
> char buf[11];
> 
> +   if (ax25 == NULL) {
> +   printk("Null ax25 destination !\n");
> +   return res;
> +   }

Second of all, this log message is not appropriate, especially given that
rose_xmit() calls rose_rout_frame() with an explicit NULL second argument.

Re: [net-next PATCH v2 1/5] introduce IFE action

2016-02-25 Thread John Fastabend

On 16-02-25 01:46 PM, Daniel Borkmann wrote:
> On 02/25/2016 01:20 PM, Jamal Hadi Salim wrote:
>> On 16-02-24 12:37 PM, Daniel Borkmann wrote:
>>> On 02/23/2016 01:49 PM, Jamal Hadi Salim wrote:
 From: Jamal Hadi Salim 
>>> [...]
 +static const struct nla_policy ife_policy[TCA_IFE_MAX + 1] = {
 +[TCA_IFE_PARMS] = {.len = sizeof(struct tc_ife)},
 +[TCA_IFE_DMAC] = {.type = NLA_BINARY,.len = ETH_ALEN},
 +[TCA_IFE_SMAC] = {.type = NLA_BINARY,.len = ETH_ALEN},
>>>
>>> This is buggy btw ...
>>
>> I am sure i cutnpasted that from somewhere. Thanks for catching
>> it; I will remove NLA_BINARY ref.
> 
> Yeah, NLA_BINARY seems to be a bit of a misleading name. We should
> probably audit, if there are more such users already in the tree.
> 

At some point in the past (maybe a year ago?) I went through and
fixed a handful of these but yeah it seems to be a common error.

> [...]
>>> Maybe try to make this lockless in the fast path? Otherwise placing
>>> this on ingress / egress (f.e. clsact) doesn't really scale.
>>
>> Let me think about it. Likely it will be subsequent patches - I just
>> want to get this set out first.
> 
> Yes, I mean one of the key motivation was "[...] to horizontally scale
> packet processing at scope of a chasis or rack [...]". So for people
> who don't have that NIC with embedded Cavium processor, they might
> already hit scalability issues for encode/decode right there.
> 
> Thanks again,
> Daniel

Re: [PATCH net-next] tipc: fix null deref crash in compat config path

2016-02-25 Thread David Miller

From: Florian Westphal 
Date: Wed, 24 Feb 2016 17:20:17 +0100

> msg.dst_sk needs to be set up with a valid socket because some callbacks
> later derive the netns from it.
> 
> Fixes: 263ea09084d172d ("Revert "genl: Add genlmsg_new_unicast() for unicast 
> message allocation")
> Reported-by: Jon Maloy 
> Bisected-by: Jon Maloy 
> Signed-off-by: Florian Westphal 
> Acked-by Jon Maloy 

Applied.

Re: [PATCH net-next 1/1] tipc: fix crash during node removal

2016-02-25 Thread David Miller

From: Jon Maloy 
Date: Wed, 24 Feb 2016 11:10:48 -0500

> When the TIPC module is unloaded, we have identified a race condition
> that allows a node reference counter to go to zero and the node instance
> being freed before the node timer is finished with accessing it. This
> leads to occasional crashes, especially in multi-namespace environments.
> 
> The scenario goes as follows:
> 
> CPU0:(node_stop)   CPU1:(node_timeout)  // ref == 2
> 
> 1:  if(!mod_timer())
> 2: if (del_timer())
> 3:   tipc_node_put()// ref -> 1
> 4: tipc_node_put()  // ref -> 0
> 5:   kfree_rcu(node);
> 6:   tipc_node_get(node)
> 7:   // BOOM!
> 
> We now clean up this functionality as follows:
> 
> 1) We remove the node pointer from the node lookup table before we
>attempt deactivating the timer. This way, we reduce the risk that
>tipc_node_find() may obtain a valid pointer to an instance marked
>for deletion; a harmless but undesirable situation.
> 
> 2) We use del_timer_sync() instead of del_timer() to safely deactivate
>the node timer without any risk that it might be reactivated by the
>timeout handler. There is no risk of deadlock here, since the two
>functions never touch the same spinlocks.
> 
> 3: We remove a pointless tipc_node_get() + tipc_node_put() from the
>timeout handler.
> 
> Reported-by: Zhijiang Hu 
> Acked-by: Ying Xue 
> Signed-off-by: Jon Maloy 

Applied.

Re: [PATCH net-next 1/1] tipc: eliminate risk of finding to-be-deleted node instance

2016-02-25 Thread David Miller

From: Jon Maloy 
Date: Wed, 24 Feb 2016 11:00:19 -0500

> Although we have never seen it happen, we have identified the
> following problematic scenario when nodes are stopped and deleted:
> 
> CPU0:CPU1:
> 
> tipc_node_xxx()   //ref == 1
>tipc_node_put()//ref -> 0
>  tipc_node_find() // node still in table
>tipc_node_delete()
>  list_del_rcu(n. list)
>  tipc_node_get()  //ref -> 1, bad
>  kfree_rcu()
> 
>  tipc_node_put() //ref to 0 again.
>  kfree_rcu() // BOOM!
> 
> We fix this by introducing use of the conditional kref_get_if_not_zero()
> instead of kref_get() in the function tipc_node_find(). This eliminates
> any risk of post-mortem access.
> 
> Reported-by: Zhijiang Hu 
> Acked-by: Ying Xue 
> Signed-off-by: Jon Maloy 

Applied.

Re: [net-next v2 00/15][pull request] 1GbE Intel Wired LAN Driver Updates 2016-02-24

2016-02-25 Thread Jeff Kirsher

On Thu, 2016-02-25 at 09:45 +0200, Or Gerlitz wrote:
> On Thu, Feb 25, 2016 at 4:14 AM, Jeff Kirsher
>  wrote:
> > v2: Dropped patches 6-10 in the original series.  Patch 6-7 added
> support
> > for character device for AVB and based on community feedback,
> we do not
> > want to do this.  Patches 8-10 provided fixes to the
> problematic code
> > added in patches 6 & 7.  So all of them must go!
> 
> Jeff,
> 
> Did you had special reason not to squash patches 8-10 into 6-7 in the
> 1st place?

Well is all water under the bridge now, but I had thought of just
merging Julia's patches into Gangfeng's original series, the only
reason I did not was that the fixes were not by the original author, so
Julia would have lost the recognition for fixes.  Had the the original
series not compiled or was functionally broken, I would have required
Gangfeng to re-spin the original series.

signature.asc
Description: This is a digitally signed message part

Re: [net-next PATCH 3/4] net: sched: cls_u32 add bit to specify software only rules

2016-02-25 Thread John Fastabend

On 16-02-25 04:56 AM, Jamal Hadi Salim wrote:
> On 16-02-24 11:04 PM, John Fastabend wrote:
>> On 16-02-24 05:31 AM, Jamal Hadi Salim wrote:
> 
>> I think this is absolutely necessary not only for performance of
>> reporting the rules back to software but if we don't do it generically
>> the driver will have to do it anyways because doing the inverse
>> transformation from hw implementation to u32 is really tricky and in
>> fact with hnodes and knodes there are multiple cls_u32 "programs" that
>> functionally are the same so we have no way to return what the user
>> actually programmed without it. Further eBPF (the next classifier I'm
>> working on) is even worse in this regard.
> 
> Ok, I guess there are multiple use cases for it ;->
> Yes, with ebpf it will be worse because data and instructions are
> inter- mingled (and our interest is in data only). Note:
> Over the years this has been a big struggle for human
> friendliness. I thought we didnt care about humans (as in automation)
> but you are saying this will affect machines too ;-> We cant allow
> that ;->
> 
> Note: You could decode u32 descriptions but it is an involved effort.
> Example, see this feature in tc:
> ---
> jhs@jhs1 tc -pretty filter ls dev $ETH parent : protocol ip
> filter pref 11 u32
> filter pref 11 u32 fh 800: ht divisor 1
> filter pref 11 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1
>   match IP src 10.0.0.130/32
> action order 1: gact action drop
>  random type none pass val 0
>  index 1 ref 1 bind 1
> 

decoding that is not a problem. The ixgbe driver code already applied
can decode that without much trouble. The thing I want to avoid is
requiring my driver to do the inverse translation because although it
is possible its entirely unnecessary.

To do the above example without a software cache for example
means I read out row 2048 of a hardware table then you get a bunch of
bits. From those bits I consult what fields are being loaded into
the table in the packet processing pipeline. I learn its the src_ip
fields then I have the value/mask and can unwind it. Finally if I
collapsed some hash tables onto this hardware table I have to do the
inverse of my collapsing scheme. The ixgbe one is sort of simple just
because I only have one table in hardware but with multiple tables
its a bit more difficult. Finally I've unwound the thing and can
print something back out of 'tc' but it seems easier to just cache
the hardware rules somewhere. Maybe other driver/hardware will have
a different opinion though depending on how much your firmware can
store and how ambitious you are. Personally I don't see any need
for the above code.

> See that "match" field reading in anglais? It requires more and more
> additions of pretty printers that translate back.
> 
> What about adding some tag to allow for easy "babel translation"?
> 

not sure what this would be...

>> You can see my solution to
>> this "load in hardware" filter list in patch 4/4. See Jiri's comment
>> also on that and see if you agree.
> 
> Ok, will do.
> 
> cheers,
> jamal

Re: [patch net-next v2 0/9] Introduce devlink interface and first drivers to use it

2016-02-25 Thread Hannes Frederic Sowa

On 25.02.2016 22:12, Jiri Pirko wrote:

Thu, Feb 25, 2016 at 09:44:43PM CET, han...@stressinduktion.org wrote:

On 25.02.2016 21:12, David Miller wrote:

From: Jiri Pirko 
Date: Tue, 23 Feb 2016 16:51:25 +0100

There a is need for some userspace API that would allow to expose things
that are not directly related to any device class like net_device of
ib_device, but rather chip-wide/switch-ASIC-wide stuff.

Use cases:
1) get/set of port type (Ethernet/InfiniBand)
2) setting up port splitters - split port into multiple ones and squash again,
enables usage of splitter cable
3) setting up shared buffers - shared among multiple ports within
one chip (work in progress)
4) configuration of switch wide properties - resources division etc - This will
allow to pass configuration that is unacceptable to be passed as
a module option.

First patch of this set introduces a new generic Netlink based interface,
called "devlink". It is similar to nl80211 model and it is heavily
influenced by it, including the API definition. The devlink introduction patch
implements use cases 1) and 2). Other 2 are in development atm and will
be addressed by follow-ups.

It is very convenient for drivers to use devlink, as you can see in other
patches in this set.

Counterpart for devlink is userspace tool for now called "dl". Command line
interface and outputs are derived from "ip" tool so it should be easy
for users to get used to it.

I am very close to applying this series as-is.

The clincher for me is that there is precendence in the nl80211 stuff,
so obviously whatever userland infrastructure sits on top of that has
found a way to deal whatever perceived shortcomings devlink has.

Actually nl80211 phy interfaces aren't really managed by wpa_supplicant nor
NetworkManager, but they use net_device names to discover those later on. In
devlink we don't necessarily have netdev names, thus my only objection to
this series is to switch to stable topology identifiers.

Hannes, as I mentioned earlier in one of my replies, you can choose to
use devlink name or pci (or other) address for your commands. So you
have your stable names even before udev takes care of renaming. It's up
to you as a user what handle to use.

I understood that part from the beginning. In my opinion we should 
simply keep APIs simple and clean, reduced to the minimum and let user 
space build upon this.

If an entity in the kernel doesn't need a name and/or a naming database, 
I would not bother implementing it. Not everyone is a kernel developer 
who uses Linux and can infer the problems from dl --help. I bet most 
users will use names at some point in time and they simply can't match 
even with the udev quirks in place when setting up netbooting or 
configuration must happen from the currently provided initramfs. This is 
my point.

If names are fragile to use simply don't provide it if not necessary.

devlink is an interface of its own and the names won't be referenced by 
other subsystems like iptables -i  or iproute. So why bother with 
names in the kernel?

On the rest with netlink etc. I agree already. I just want to eliminate 
possible mistakes by users.

Hope that makes my point more clear,
Hannes

Re: [PATCH 3/3] 3c59x: Use setup_timer()

2016-02-25 Thread David Miller

From: Amitoj Kaur Chawla 
Date: Wed, 24 Feb 2016 19:28:19 +0530

> Convert a call to init_timer and accompanying intializations of
> the timer's data and function fields to a call to setup_timer.
> 
> The Coccinelle semantic patch that fixes this problem is
> as follows:
>  
> // 
> @@
> expression t,f,d;
> @@
> 
> -init_timer();
> +setup_timer(,f,d);
>  ...
> -t.data = d;
> -t.function = f;
> // 
> 
> Signed-off-by: Amitoj Kaur Chawla 

Applied.

Re: [PATCH] netxen: Use kobj_to_dev()

2016-02-25 Thread David Miller

From: Amitoj Kaur Chawla 
Date: Wed, 24 Feb 2016 20:09:38 +0530

> Introduce the use of kobj_to_dev() helper function instead of open
> coding it with container_of()
> 
> The Coccinelle semantic patch used to make this change is as follows:
> 
> //
> @@
> expression a;
> symbol kobj;
> @@
> - container_of(a, struct device, kobj)
> + kobj_to_dev(a)
> //
> 
> Signed-off-by: Amitoj Kaur Chawla 

Applied.

Re: [PATCH 1/3] net: tulip: Use setup_timer()

2016-02-25 Thread David Miller

From: Amitoj Kaur Chawla 
Date: Wed, 24 Feb 2016 19:27:49 +0530

> Convert a call to init_timer and accompanying intializations of
> the timer's data and function fields to a call to setup_timer.
> 
> The Coccinelle semantic patch that fixes this problem is
> as follows:
> 
> // 
> @@
> expression t,f,d;
> @@
> 
> -init_timer();
> +setup_timer(,f,d);
> -t.data = d;
> -t.function = f;
> // 
> 
> Signed-off-by: Amitoj Kaur Chawla 

Applied.

Re: [PATCH 2/3] forcedeth: Use setup_timer()

2016-02-25 Thread David Miller

From: Amitoj Kaur Chawla 
Date: Wed, 24 Feb 2016 19:28:01 +0530

> Convert a call to init_timer and accompanying intializations of
> the timer's data and function fields to a call to setup_timer.
> 
> The Coccinelle semantic patch that fixes this problem is
> as follows:
>  
> // 
> @@
> expression t,f,d;
> @@
> 
> -init_timer();
> +setup_timer(,f,d);
> -t.data = d;
> -t.function = f;
> // 
> 
> Signed-off-by: Amitoj Kaur Chawla 

Applied.

Re: [patch net-next v2 0/9] Introduce devlink interface and first drivers to use it

2016-02-25 Thread Andy Gospodarek

On Thu, Feb 25, 2016 at 03:12:47PM -0500, David Miller wrote:
> From: Jiri Pirko 
> Date: Tue, 23 Feb 2016 16:51:25 +0100
> 
> > There a is need for some userspace API that would allow to expose things
> > that are not directly related to any device class like net_device of
> > ib_device, but rather chip-wide/switch-ASIC-wide stuff.
> > 
> > Use cases:
> > 1) get/set of port type (Ethernet/InfiniBand)
> > 2) setting up port splitters - split port into multiple ones and squash 
> > again,
> >enables usage of splitter cable
> > 3) setting up shared buffers - shared among multiple ports within
> >one chip (work in progress)
> > 4) configuration of switch wide properties - resources division etc - This 
> > will
> >allow to pass configuration that is unacceptable to be passed as
> >a module option.
> > 
> > First patch of this set introduces a new generic Netlink based interface,
> > called "devlink". It is similar to nl80211 model and it is heavily
> > influenced by it, including the API definition. The devlink introduction 
> > patch
> > implements use cases 1) and 2). Other 2 are in development atm and will
> > be addressed by follow-ups.
> > 
> > It is very convenient for drivers to use devlink, as you can see in other
> > patches in this set.
> > 
> > Counterpart for devlink is userspace tool for now called "dl". Command line
> > interface and outputs are derived from "ip" tool so it should be easy
> > for users to get used to it.
> 
> I am very close to applying this series as-is.
> 
> The clincher for me is that there is precendence in the nl80211 stuff,
> so obviously whatever userland infrastructure sits on top of that has
> found a way to deal whatever perceived shortcomings devlink has.
> 
> If all that people can come up with is "device names! omg UDEV!" and
> "multiple tools are hard to use!"  That's awesome, because it means
> nobody has any real substantial objection to this facility. :-)

Generally speaking I had no issue with the patch after being *ahem*
corrected to realize that the set operations only applied to the devlink
devices not specifically to netdevs.

I find this interface useful for some other items we have discussed in
the past (resource limits for routes, etc), so I'm happy to see some
interface for general chip maintenance included.

Re: [net-next PATCH v2 1/5] introduce IFE action

2016-02-25 Thread Daniel Borkmann


On 02/25/2016 01:20 PM, Jamal Hadi Salim wrote:

On 16-02-24 12:37 PM, Daniel Borkmann wrote:

On 02/23/2016 01:49 PM, Jamal Hadi Salim wrote:

From: Jamal Hadi Salim 

[...]

+static const struct nla_policy ife_policy[TCA_IFE_MAX + 1] = {
+[TCA_IFE_PARMS] = {.len = sizeof(struct tc_ife)},
+[TCA_IFE_DMAC] = {.type = NLA_BINARY,.len = ETH_ALEN},
+[TCA_IFE_SMAC] = {.type = NLA_BINARY,.len = ETH_ALEN},


This is buggy btw ...


I am sure i cutnpasted that from somewhere. Thanks for catching
it; I will remove NLA_BINARY ref.


Yeah, NLA_BINARY seems to be a bit of a misleading name. We should
probably audit, if there are more such users already in the tree.

[...]

Maybe try to make this lockless in the fast path? Otherwise placing
this on ingress / egress (f.e. clsact) doesn't really scale.


Let me think about it. Likely it will be subsequent patches - I just
want to get this set out first.


Yes, I mean one of the key motivation was "[...] to horizontally scale
packet processing at scope of a chasis or rack [...]". So for people
who don't have that NIC with embedded Cavium processor, they might
already hit scalability issues for encode/decode right there.

Thanks again,
Daniel

[PATCH net-next] hv_netvsc: add ethtool support for set and get of settings

2016-02-25 Thread Simon Xiao

This patch allows the user to set and retrieve speed and duplex of the
hv_netvsc device via ethtool.

Example:
$ ethtool eth0
Settings for eth0:
...
Speed: Unknown!
Duplex: Unknown! (255)
...
$ ethtool -s eth0 speed 1000 duplex full
$ ethtool eth0
Settings for eth0:
...
Speed: 1000Mb/s
Duplex: Full
...

This is based on patches by Roopa Prabhu and Nikolay Aleksandrov.

Signed-off-by: Simon Xiao 
---
 drivers/net/hyperv/hyperv_net.h |  4 +++
 drivers/net/hyperv/netvsc_drv.c | 56 +
 2 files changed, 60 insertions(+)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index fcb92c0..b4c6878 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -658,6 +658,10 @@ struct net_device_context {
 
struct netvsc_stats __percpu *tx_stats;
struct netvsc_stats __percpu *rx_stats;
+
+   /* Ethtool settings */
+   u8 duplex;
+   u32 speed;
 };
 
 /* Per netvsc device */
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 202e2b1..e703b9a 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -799,6 +799,58 @@ static int netvsc_set_channels(struct net_device *net,
goto do_set;
 }
 
+static bool netvsc_validate_ethtool_ss_cmd(const struct ethtool_cmd *cmd)
+{
+   struct ethtool_cmd diff1 = *cmd;
+   struct ethtool_cmd diff2 = {};
+
+   ethtool_cmd_speed_set(, 0);
+   diff1.duplex = 0;
+   /* advertising and cmd are usually set */
+   diff1.advertising = 0;
+   diff1.cmd = 0;
+   /* We set port to PORT_OTHER */
+   diff2.port = PORT_OTHER;
+
+   return !memcmp(, , sizeof(diff1));
+}
+
+static void netvsc_init_settings(struct net_device *dev)
+{
+   struct net_device_context *ndc = netdev_priv(dev);
+
+   ndc->speed = SPEED_UNKNOWN;
+   ndc->duplex = DUPLEX_UNKNOWN;
+}
+
+static int netvsc_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+{
+   struct net_device_context *ndc = netdev_priv(dev);
+
+   ethtool_cmd_speed_set(cmd, ndc->speed);
+   cmd->duplex = ndc->duplex;
+   cmd->port = PORT_OTHER;
+
+   return 0;
+}
+
+static int netvsc_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+{
+   struct net_device_context *ndc = netdev_priv(dev);
+   u32 speed;
+
+   speed = ethtool_cmd_speed(cmd);
+   if (!ethtool_validate_speed(speed) ||
+   !ethtool_validate_duplex(cmd->duplex) ||
+   !netvsc_validate_ethtool_ss_cmd(cmd))
+   return -EINVAL;
+
+   ndc->speed = speed;
+   ndc->duplex = cmd->duplex;
+
+   return 0;
+}
+
 static int netvsc_change_mtu(struct net_device *ndev, int mtu)
 {
struct net_device_context *ndevctx = netdev_priv(ndev);
@@ -923,6 +975,8 @@ static const struct ethtool_ops ethtool_ops = {
.get_channels   = netvsc_get_channels,
.set_channels   = netvsc_set_channels,
.get_ts_info= ethtool_op_get_ts_info,
+   .get_settings   = netvsc_get_settings,
+   .set_settings   = netvsc_set_settings,
 };
 
 static const struct net_device_ops device_ops = {
@@ -1112,6 +1166,8 @@ static int netvsc_probe(struct hv_device *dev,
netif_set_real_num_tx_queues(net, nvdev->num_chn);
netif_set_real_num_rx_queues(net, nvdev->num_chn);
 
+   netvsc_init_settings(net);
+
ret = register_netdev(net);
if (ret != 0) {
pr_err("Unable to register netdev.\n");
-- 
2.5.0

Re: [net-next PATCH 0/5] net_sched: Add support for IFE action

2016-02-25 Thread Daniel Borkmann


On 02/25/2016 01:23 PM, Jamal Hadi Salim wrote:

On 16-02-24 12:48 PM, Daniel Borkmann wrote:

On 02/24/2016 01:49 PM, Jamal Hadi Salim wrote:

[...]

Drivers do set the hash. My use case is slightly different.
I have a NIC which has an embedded cavium processor. This thing
strips off the TLV and uses the hash to select the host MSI.
Only thing we dont use at the moment is queue_mapping.


Ok, but the example says ingress qdisc. ;) I presume the driver for the
NIC and the offloading parts are non-public? :/


Well, it is not exactly mellanox or intel - but I am sure someone would
be willing to sell that NIC.


So, without them, placing
this on ingress qdisc doesn't seem much useful wrt the skb hash example,
and most people only have the software part (for ingress I mean) available.


Do you want me to take skbbhash out? I just want to get this set in then
worry about adding and modifying.


Well, if the NIC driver and offloading parts for ife are not available
(or cannot be made available) in the upstream kernel tree, then you have
to assume that everyone else can _only_ use this with the existing tc
facilities in the kernel. And as such, the whole set needs to be evaluated,
I think this is nothing new. ;-)

That is why I asked for a "typical setup" and use-case earlier. And if
it doesn't have any obvious ones in upstream context without the missing
pieces, then the code might linger around unused by anyone. So taking out
these parts would be highly preferred (unless there's a _good_ reason not
to).

Thanks,
Daniel

Re: [PATCH net 3/7] net/mlx5e: Fix soft lockup when HW Timestamping is enabled

2016-02-25 Thread David Miller

From: Saeed Mahameed 
Date: Wed, 24 Feb 2016 13:18:49 +0200

> From: Eran Ben Elisha 
> 
> Readers/Writers lock for SW timecounter was acquired without disabling
> interrupts on local CPU.
> 
> The problematic scenario:
> * HW timestamping is enabled
> * Timestamp overflow periodic service task is running on local CPU and
>   holding write_lock for SW timecounter
> * Completion arrives, triggers interrupt for local CPU.
>   Interrupt routine calls napi_schedule(), which triggers rx/tx
>   skb process.
>   An attempt to read SW timecounter using read_lock is done, which is
>   already locked by a writer on the same CPU and cause soft lockup.
> 
> Add irqsave/irqrestore for when using the readers/writers lock.
> 
> Fixes: ef9814deafd0 ('net/mlx5e: Add HW timestamping (TS) support')
> Signed-off-by: Eran Ben Elisha 
> Signed-off-by: Saeed Mahameed 

I'd like to suggest a slight modification to this fix.

The canonical way to deal with this issue is to use IRQ disabling
for the write lock side, but not for the read lock side.

You only need to prevent a read lock via interrupt from nesting on
the same cpu where you took the write lock.

So you don't need the expensive IRQ disabling for the read lock side,
which is the more important path performance wise.

Thanks.

Re: [PATCH] net: thunderx: Fix for Qset error due to CQ full

2016-02-25 Thread David Miller

From: sunil.kovv...@gmail.com
Date: Wed, 24 Feb 2016 16:40:50 +0530

> From: Sunil Goutham 
> 
> On Thunderx pass 1.x and pass2 due to a HW errata default CQ
> DROP_LEVEL of 0x80 is not sufficient to avoid CQ_WR_FULL Qset
> error when packets are being received at >20Mpps resulting in
> complete stall of packet reception.
> 
> This patch will configure it to 0x100 which is what is expected
> by HW on Thunderx. On future passes of thunderx and other chips
> HW default/reset value will be 0x100 or higher hence not overwritten.
> 
> Signed-off-by: Jerin Jacob 
> Signed-off-by: Sunil Goutham 

Applied, thank you.

Re: [v2, 0/3] Add PTP support for ls1021a platform

2016-02-25 Thread David Miller

From: Yangbo Lu 
Date: Wed, 24 Feb 2016 17:26:53 +0800

> This patchset is to enable ptp support for ls1021a platform. The endianness
> issue in gianfar driver and gianfar ptp driver must be fixed, and a 1588
> timer node must be added into dts.
> 
> Changes for v2:
>   - Modified commit message
>   - Added more reviewers

Series applied to net-next, thanks.

Re: [PATCHv3] net: fix bridge multicast packet checksum validation

2016-02-25 Thread David Miller

From: Linus Lüssing 
Date: Wed, 24 Feb 2016 04:21:42 +0100

> We need to update the skb->csum after pulling the skb, otherwise
> an unnecessary checksum (re)computation can ocure for IGMP/MLD packets
> in the bridge code. Additionally this fixes the following splats for
> network devices / bridge ports with support for and enabled RX checksum
> offloading:
> 
> [...]
> [   43.986968] eth0: hw csum failure
> [   43.990344] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0 #2
> [   43.996193] Hardware name: BCM2709
> [   43.999647] [<800204e0>] (unwind_backtrace) from [<8001cf14>] 
> (show_stack+0x10/0x14)
> [   44.007432] [<8001cf14>] (show_stack) from [<801ab614>] 
> (dump_stack+0x80/0x90)
> [   44.014695] [<801ab614>] (dump_stack) from [<802e4548>] 
> (__skb_checksum_complete+0x6c/0xac)
> [   44.023090] [<802e4548>] (__skb_checksum_complete) from [<803a055c>] 
> (ipv6_mc_validate_checksum+0x104/0x178)
> [   44.032959] [<803a055c>] (ipv6_mc_validate_checksum) from [<802e111c>] 
> (skb_checksum_trimmed+0x130/0x188)
> [   44.042565] [<802e111c>] (skb_checksum_trimmed) from [<803a06e8>] 
> (ipv6_mc_check_mld+0x118/0x338)
> [   44.051501] [<803a06e8>] (ipv6_mc_check_mld) from [<803b2c98>] 
> (br_multicast_rcv+0x5dc/0xd00)
> [   44.060077] [<803b2c98>] (br_multicast_rcv) from [<803aa510>] 
> (br_handle_frame_finish+0xac/0x51c)
> [...]
> 
> Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code")
> Reported-by: Álvaro Fernández Rojas 
> Signed-off-by: Linus Lüssing 

Applied and queued up for -stable, thanks!

Re: [PATCH net v4] r8169:fix "rtl_counters_cond == 1 (loop: 1000, delay: 10)" log spam.

2016-02-25 Thread David Miller

From: Chunhao Lin 
Date: Wed, 24 Feb 2016 14:18:42 +0800

> There will be a log spam when there is no cable plugged. Please refer to
> following links. https://bugzilla.kernel.org/show_bug.cgi?id=104351
> https://bugzilla.kernel.org/show_bug.cgi?id=107421
> 
> This issue is caused by runtime power management. When there is no cable
> plugged, the driver will be suspend (runtime suspend) by OS and NIC will be
> put into the D3 state. During this time, if OS call rtl8169_get_stats64()
> to dump tally counter, because NIC is in D3 state, the register value read
> by driver will return all 0xff. This will let driver think tally counter
> flag is not toggled and then sends the warning message "rtl_counters_cond
> == 1 (loop: 1000, delay: 10)" to kernel log.
> 
> For fixing this issue, 1.add checking driver's pm runtime status in
> rtl8169_get_stats64(). 2.dump tally counter before going runtime suspend
> for counter accuracy in runtime suspend.
> 
> Signed-off-by: Chunhao Lin 

Applied, thanks.

Re: [PATCH net] Propagate MAC address changes to VLANs

2016-02-25 Thread David Miller

From: Mike Manning 
Date: Tue, 23 Feb 2016 20:56:31 +

> --- a/net/8021q/vlan.c
> +++ b/net/8021q/vlan.c
> @@ -293,15 +293,15 @@ static void vlan_sync_address(struct net
> 
>  /* vlan address was different from the old address and is equal to
>   * the new address */
> -if (!ether_addr_equal(vlandev->dev_addr, vlan->real_dev_addr) &&
> +if ((vlandev->flags & IFF_UP) &&
> +!ether_addr_equal(vlandev->dev_addr, vlan->real_dev_addr) &&

This patch is corrupted by your email client.

Please read Documentation/email-clients.txt, fix your setup, and email a
test patch to yourself.

Only resubmit this patch to the list once you are able to successfully
apply the patch in that test email.

Re: [PATCH 1/2] net: qca_spi: Don't clear IFF_BROADCAST

2016-02-25 Thread David Miller

From: Stefan Wahren 
Date: Tue, 23 Feb 2016 19:23:23 +

> Currently qcaspi_netdev_setup accidentally clears IFF_BROADCAST.
> So fix this by keeping the flags from ether_setup.
> 
> Reported-by: Michael Heimpold 
> Signed-off-by: Stefan Wahren 
> Fixes: 291ab06ecf67 (net: qualcomm: new Ethernet over SPI driver for QCA7000)

Applied and queued up for -stable.

Re: header conflict introduced by change to netfilter_ipv4/ip_tables.h

2016-02-25 Thread Thomas Graf

On 01/06/16 at 09:20am, Stephen Hemminger wrote:
> This commit breaks compilation of iproute2 with net-next.
> 
> commit 1ffad83dffd675cd742286ae82dca7d746cb0da8
> Author: Mikko Rapeli 
> Date:   Thu Oct 15 07:56:30 2015 +0200
> 
> netfilter: fix include files for compilation
> 
> Add missing header dependencies and other small changes so that each file
> compiles alone in userspace.
> 
> Signed-off-by: Mikko Rapeli 
> Signed-off-by: Pablo Neira Ayuso 
> 
> For iproute2, a copy of kernel headers (from make install_headers) is used.
> After this change. the build of x_tables.c fails because IFNAMSIZ is already
> defined in net/if.h

There is another issue with this commit. iptables.h included from m_ipt.c
includes  xtables.h which includes  which is not
available on a system without xtables.

gcc -Wall -Wstrict-prototypes  -Wmissing-prototypes -Wmissing-declarations 
-Wold-style-definition -Wformat=2 -O2 -I../include -DRESOLVE_HOSTNAMES 
-DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -D_GNU_SOURCE 
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE  -DHAVE_SETNS 
-DHAVE_ELF -DCONFIG_GACT -DCONFIG_GACT_PROB -DIPT_LIB_DIR=\"/lib/xtables\" 
-DYY_NO_INPUT   -c -o m_ipt.o m_ipt.c
In file included from ../include/iptables.h:5:0,
 from m_ipt.c:17:
../include/xtables.h:34:29: fatal error: xtables-version.h: No such file or 
directory
 #include

Re: [PATCH 2/2] net: qca_spi: clear IFF_TX_SKB_SHARING

2016-02-25 Thread David Miller

From: Stefan Wahren 
Date: Tue, 23 Feb 2016 19:23:24 +

> ether_setup sets IFF_TX_SKB_SHARING but this is not supported by
> qca_spi as it modifies the skb on xmit.
> 
> Signed-off-by: Stefan Wahren 
> Fixes: 291ab06ecf67 (net: qualcomm: new Ethernet over SPI driver for QCA7000)

Applied and queued up for -stable.

Re: [patch net-next v2 0/9] Introduce devlink interface and first drivers to use it

2016-02-25 Thread Jiri Pirko

Thu, Feb 25, 2016 at 09:44:43PM CET, han...@stressinduktion.org wrote:
>On 25.02.2016 21:12, David Miller wrote:
>>From: Jiri Pirko 
>>Date: Tue, 23 Feb 2016 16:51:25 +0100
>>
>>>There a is need for some userspace API that would allow to expose things
>>>that are not directly related to any device class like net_device of
>>>ib_device, but rather chip-wide/switch-ASIC-wide stuff.
>>>
>>>Use cases:
>>>1) get/set of port type (Ethernet/InfiniBand)
>>>2) setting up port splitters - split port into multiple ones and squash 
>>>again,
>>>enables usage of splitter cable
>>>3) setting up shared buffers - shared among multiple ports within
>>>one chip (work in progress)
>>>4) configuration of switch wide properties - resources division etc - This 
>>>will
>>>allow to pass configuration that is unacceptable to be passed as
>>>a module option.
>>>
>>>First patch of this set introduces a new generic Netlink based interface,
>>>called "devlink". It is similar to nl80211 model and it is heavily
>>>influenced by it, including the API definition. The devlink introduction 
>>>patch
>>>implements use cases 1) and 2). Other 2 are in development atm and will
>>>be addressed by follow-ups.
>>>
>>>It is very convenient for drivers to use devlink, as you can see in other
>>>patches in this set.
>>>
>>>Counterpart for devlink is userspace tool for now called "dl". Command line
>>>interface and outputs are derived from "ip" tool so it should be easy
>>>for users to get used to it.
>>
>>I am very close to applying this series as-is.
>>
>>The clincher for me is that there is precendence in the nl80211 stuff,
>>so obviously whatever userland infrastructure sits on top of that has
>>found a way to deal whatever perceived shortcomings devlink has.
>
>Actually nl80211 phy interfaces aren't really managed by wpa_supplicant nor
>NetworkManager, but they use net_device names to discover those later on. In
>devlink we don't necessarily have netdev names, thus my only objection to
>this series is to switch to stable topology identifiers.

Hannes, as I mentioned earlier in one of my replies, you can choose to
use devlink name or pci (or other) address for your commands. So you
have your stable names even before udev takes care of renaming. It's up
to you as a user what handle to use.

Re: [PATCH 1/1] phy: marvell: Fix 88E1510 initialization

2016-02-25 Thread David Miller

From: Clemens Gruber 
Date: Tue, 23 Feb 2016 20:16:58 +0100

> A bug was introduced in the merge commit b633353115e3 ("Merge
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
> The generic marvell_config_init (and therefore marvell_of_reg_init) is
> not called anymore for the Marvell 88E1510 (in net-next).
> 
> This patch calls marvell_config_init and moves the specific init
> function for the 88E1510 below the marvell_config_init function to avoid
> adding a function predeclaration.
> 
> Signed-off-by: Clemens Gruber 

Applied, thanks for noticing and fixing this merge error!

Re: [PATCH] socket.7: Document some BPF-related socket options

2016-02-25 Thread Alexei Starovoitov

On Thu, Feb 25, 2016 at 03:27:45PM -0500, Craig Gallek wrote:
> From: Craig Gallek 
> 
> Document the behavior and the first kernel version for each of the
> following socket options:
> SO_ATTACH_FILTER
> SO_ATTACH_BPF
> SO_ATTACH_REUSEPORT_CBPF
> SO_ATTACH_REUSEPORT_EBPF
> SO_DETACH_FILTER
> SO_DETACH_BPF
> 
> Signed-off-by: Craig Gallek 

Thanks! Looks good to me.
Acked-by: Alexei Starovoitov

Re: header conflict introduced by change to netfilter_ipv4/ip_tables.h

2016-02-25 Thread Daniel Borkmann


On 02/04/2016 08:13 AM, Josh Boyer wrote:

On Thu, Jan 7, 2016 at 2:15 PM, Mikko Rapeli  wrote:

On Thu, Jan 07, 2016 at 10:30:40AM -0800, Stephen Hemminger wrote:

On Thu, 7 Jan 2016 07:29:50 +
Mikko Rapeli  wrote:

On Wed, Jan 06, 2016 at 09:20:07AM -0800, Stephen Hemminger wrote:

This commit breaks compilation of iproute2 with net-next.


Ok, linux/if.h and libc net/if.h have overlapping defines, and this is not
the only one. I saw lots of them in the core dump headers.

How should we handle them? Another ifndef for IFNAMSIZ into kernel uapi
headers?

-Mikko


Probably need to do the same thing that was done previously for these
kind of conflicts.  This makes make linux/if.h change to adapt to net/if.h
being included before it.


Ok, got it. And found include/uapi/linux/libc-compat.h. Did not know about it
and was looking for solutions to these problems.

But now I feel like writing a test script for mixing of kernel uapi
and libc headers to find out how many other collitions are still there.
Not good for the pile of over 70 patches in my branch
https://github.com/torvalds/linux/compare/master...mcfrisk:headers_test_v05


Or revert your patch.


I'm fine with this too.


This is causing a number of build failures in Fedora rawhide now.  Did
anyone submit a revert or patch to fix this issue?


Mikko, was there any follow-up patch to fix this? Seems like the build
error is not yet resolved.

Thanks,
Daniel

Re: Sending short raw packets using sendmsg() broke

2016-02-25 Thread Willem de Bruijn

> Commit 9c7077622dd9174 added a check, ll_header_truncated(), which requires
> that a packet transmitted using sendmsg() with PF_PACKET, SOCK_RAW must be
> longer than dev->hard_header_len.
>
> https://github.com/torvalds/linux/commit/9c7077622dd9174
> https://github.com/torvalds/linux/blob/master/net/packet/af_packet.c#L2329

As David already pointed out, this has been revised to allow greater
than or equal. Note that the behavior was already present for
tpacket_snd and this patch only rationalized the two paths.

>
> The bug that popped up is that an application (aprx) can no longer send
> short AX.25 packets using sendmsg(). Packets shorter than 77 bytes fail this
> check in ll_header_truncated(). With older kernels, no problem. AX.25 (and
> some other protocols) have variable-length headers (somewhere between 21 and
> 77 bytes in this case).
>
> hard_header_len is set in drivers/net/hamradio/mkiss.c to
> AX25_KISS_HEADER_LEN + AX25_MAX_HEADER_LEN + 3 which works out to be

If header length is variable and can be shorter than
AX25_MAX_HEADER_LEN then the check will still trigger.

> (1+17+7*8+3)=77.
>
> https://github.com/torvalds/linux/blob/master/drivers/net/hamradio/mkiss.c#L845
>
> I guessed that we could probably set hard_header_len to be the minimum
> length of the packet header (AX25_KISS_HEADER_LEN + AX25_HEADER_LEN + 3) to
> make things work again, but I first asked Alan Cox for an opinion, and he
> says hard_header_len is set correctly to be the worst-case maximum header
> length, and that the ll_header_truncated commit should be reverted instead,
> since it doesn't take variable-length headers into account.

hard_header_length is used in cases where we have to reserve room or
check against packets that exceed frame size, so it indeed should not
be changed to be the minimum header length in a variable header length
scenario.

In most protocols the header length is fixed, so there is no separate
field for minimal header length. If this is the only such check in the
kernel (and I haven't found another after a cursory inspection), then
perhaps an exception should be made here for this one protocol family.

> s = socket(AF_AX25, SOCK_DGRAM, 0);
>
> I didn't yet figure out why that works, maybe the sendto() of an AX25 
> datagram does not go through that hard_header_len check.

That is a different protocol family. The above check is limited to
sends over packets of family PF_PACKET.

Re: [patch net-next v2 0/9] Introduce devlink interface and first drivers to use it

2016-02-25 Thread Hannes Frederic Sowa

On 25.02.2016 21:12, David Miller wrote:

From: Jiri Pirko 
Date: Tue, 23 Feb 2016 16:51:25 +0100

There a is need for some userspace API that would allow to expose things
that are not directly related to any device class like net_device of
ib_device, but rather chip-wide/switch-ASIC-wide stuff.

Use cases:
1) get/set of port type (Ethernet/InfiniBand)
2) setting up port splitters - split port into multiple ones and squash again,
enables usage of splitter cable
3) setting up shared buffers - shared among multiple ports within
one chip (work in progress)
4) configuration of switch wide properties - resources division etc - This will
allow to pass configuration that is unacceptable to be passed as
a module option.

First patch of this set introduces a new generic Netlink based interface,
called "devlink". It is similar to nl80211 model and it is heavily
influenced by it, including the API definition. The devlink introduction patch
implements use cases 1) and 2). Other 2 are in development atm and will
be addressed by follow-ups.

It is very convenient for drivers to use devlink, as you can see in other
patches in this set.

Counterpart for devlink is userspace tool for now called "dl". Command line
interface and outputs are derived from "ip" tool so it should be easy
for users to get used to it.

I am very close to applying this series as-is.

The clincher for me is that there is precendence in the nl80211 stuff,
so obviously whatever userland infrastructure sits on top of that has
found a way to deal whatever perceived shortcomings devlink has.

Actually nl80211 phy interfaces aren't really managed by wpa_supplicant 
nor NetworkManager, but they use net_device names to discover those 
later on. In devlink we don't necessarily have netdev names, thus my 
only objection to this series is to switch to stable topology identifiers.

If all that people can come up with is "device names! omg UDEV!" and
"multiple tools are hard to use!"  That's awesome, because it means
nobody has any real substantial objection to this facility. :-)

I do think this is important if a lot of devlinks are visible and they 
don't yet provide a net_device. One has to use e.g. pci identifiers or 
depend on module loading order. Especially I have no idea how this 
should work with devices where the module is not yet loaded (and most 
device drivers don't split between pci layer and net_device). This is 
even more complicated then the udev stuff we have nowadays.

Bye,
Hannes

Re: [PATCH RFC 3/3] net: convert tc_u32 to use the intermediate representation

2016-02-25 Thread John Fastabend

On 16-02-25 09:37 AM, Pablo Neira Ayuso wrote:
> This patch moves the u32 parser from the ixgbe that John has made to the
> core u32. This parser has been adapted to build the intermediate
> representation.
> 
> To store the parsing information, this patch introduces a parse table
> object, one per device, so we don't need to store the parsing states in
> the adapter, which is the major dependency with previous patches.
> 
> Since u32 allows rules connected via links, the u32 parser tracks this
> links and then generates the intermediate representation that is passed
> to the ixgbe driver.

It also supports hash tables and loops.

> 
> New drivers will only have to implement the jit translation code based
> on the intermediate representation. With some extra work, I think it
> should be possible to generalize the existing tc specific ndo action so
> it can be used by other frontends.
> 
> I tried to stick to John's original u32 frontend parser as much as
> possible, adapting it to build the intermediate representation.
> 
> After this change, we don't expose the tc action structure layout and
> other similar frontend details to the backend anymore to the backend
> anymore. I think this is good since changes in the frontend should not
> need to be propagated to the 1..n drivers supporting u32 offloads. In
> that sense, this helps to keep the frontend software representation
> separated from low-level backend driver details.
> 
> After this patch, it should be possible to put the tc_cls_u32_knode
> structure into diet since we only need the handle (as unique id) and the
> ast tree.
> 

On ixgbe this is true but going forward I can support hash functions
so you need the divisor, prio, and handle minimally. I'm not sure how
to do hash tables for example in this IR yet. Might be there I'm still
trying to grok the IR details.

> I couldn't send any more incremental changes to update previous work
> since the u32 parser and the internal representation were put together,
> that why this patch is slightly large.
> 
> Signed-off-by: Pablo Neira Ayuso 
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe.h   |   4 -
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  | 216 
>  drivers/net/ethernet/intel/ixgbe/ixgbe_model.h | 112 
>  include/net/pkt_cls.h  |   3 +
>  net/sched/cls_u32.c| 344 
> +

Feels like a lot of code to me when the original direct u32 to hw
was reasonably small and the flower implementation is small as well.

[...]

>  
> +static int ixgbe_tcp_jit(struct net_ir_jit_ctx *ctx,
> +  const struct net_ir_expr *expr,
> +  void *data)
> +{
> + struct ixgbe_filter *f = (struct ixgbe_filter *)data;
> + struct net_ir_expr *right = expr->relational.right;
> + struct net_ir_expr *payload;
> + u32 mask = 0x;
> +
> + if (expr->relational.left->type == NET_IR_EXPR_BINOP) {
> + payload = expr->relational.left->binop.left;
> + mask = expr->relational.left->binop.right->value.data;
> + } else {
> + payload = expr->relational.left;
> + }
> +
> + switch (payload->payload.offset) {

I don't like this because it hardcodes the offset pieces into the
driver. The reason I used a model.h header file with its structures
in ixgbe is the next step is to expose the parser so the model can be
updated. So that when the hardware parser changes the data structure
can be changed via firmware call. The style of coding here wont work
for that so we still need the model.h file and for loop.

> + case offsetof(struct tcphdr, source):
> + f->input->filter.formatted.src_port = right->value.data & 
> 0x;
> + f->mask.formatted.src_port = mask & 0x;
> + break;
> + case offsetof(struct tcphdr, dest):
> + f->input->filter.formatted.dst_port = right->value.data & 
> 0x;
> + f->mask.formatted.dst_port = mask & 0x;
> + break;
> + default:
> + return -EOPNOTSUPP;
> + }
> + return 0;
> +}
> +
> +static struct net_ir_proto_desc ixgbe_tcp_desc = {
> + .base   = NET_IR_PAYLOAD_TRANSPORT_HDR,
> + .protonum   = IPPROTO_TCP,

hmm I'm having trouble tracking down how this is used but the point of
a lot of ongoing work is to not be dependent on any specific protocol
values. The IPPROTO_TCP here looks suspicious why do we need this? It
seems to imply I would need to define a IPPROTO_FOO if I wanted to add
a new protocol.

> + .jit= ixgbe_tcp_jit,
> +};
> +
> +static int ixgbe_ipv4_jit(struct net_ir_jit_ctx *ctx,
> +   const struct net_ir_expr *expr,
> +   void *data)
> +{
> + struct ixgbe_filter *f = (struct ixgbe_filter *)data;
> + struct net_ir_expr *right = expr->relational.right;
> + struct net_ir_expr *payload;
> +

Re: [PATCH net] net: vrf: Remove direct access to skb->data

2016-02-25 Thread David Miller

From: David Ahern 
Date: Tue, 23 Feb 2016 10:10:26 -0800

> Nik pointed that the VRF driver should be using skb_header_pointer
> instead of accessing skb->data and bits beyond directly which can
> be garbage.
> 
> Cc: Nikolay Aleksandrov 
> Signed-off-by: David Ahern 
> ---
> Dave: This should go into v4.4 stable as well.

Applied and queued up for -stable, thanks.

[PATCH] socket.7: Document some BPF-related socket options

2016-02-25 Thread Craig Gallek

From: Craig Gallek 

Document the behavior and the first kernel version for each of the
following socket options:
SO_ATTACH_FILTER
SO_ATTACH_BPF
SO_ATTACH_REUSEPORT_CBPF
SO_ATTACH_REUSEPORT_EBPF
SO_DETACH_FILTER
SO_DETACH_BPF

Signed-off-by: Craig Gallek 
---
 man7/socket.7 | 104 --
 1 file changed, 86 insertions(+), 18 deletions(-)

diff --git a/man7/socket.7 b/man7/socket.7
index db7cb8324dde..79b4f3158541 100644
--- a/man7/socket.7
+++ b/man7/socket.7
@@ -53,13 +53,6 @@
 .\" SO_BPF_EXTENSIONS (3.14)
 .\" commit ea02f9411d9faa3553ed09ce0ec9f00ceae9885e
 .\"Author: Michal Sekletar 
-.\" SO_ATTACH_BPF (3.19)
-.\" and SO_DETACH_BPF as synonym for SO_DETACH_FILTER
-.\" commit 89aa075832b0da4402acebd698d0411dcc82d03e
-.\"Author: Alexei Starovoitov 
-.\"SO_ATTACH_REUSEPORT_CBPF, SO_ATTACH_REUSEPORT_EBPF (4.5)
-.\"commit 538950a1b7527a0a52ccd9337e3fcd304f027f13
-.\"Author: Craig Gallek 
 .\"
 .TH SOCKET 7 2015-05-07 Linux "Linux Programmer's Manual"
 .SH NAME
@@ -311,6 +304,80 @@ The value 0 indicates that this is not a listening socket,
 the value 1 indicates that this is a listening socket.
 This socket option is read-only.
 .TP
+.BR SO_ATTACH_FILTER " and " SO_ATTACH_BPF
+Attach a classic or extended BPF program (respectively) to the socket
+for use as a filter of incoming packets.  A packet will be dropped if
+the filter returns zero or have its data truncated to the non-zero
+length returned.  If the value returned is greater or equal to the
+packet's data length, the packet is allowed to proceed unmodified.
+
+The argument for
+.BR SO_ATTACH_FILTER
+is a
+.I sock_fprog
+structure in
+.B .
+.sp
+.in +4n
+.nf
+struct sock_fprog {
+unsigned short  len;
+struct sock_filter *filter;
+};
+.fi
+.in
+.IP
+The argument for
+.BR SO_ATTACH_BPF
+is a file descriptor returned by the
+.BR bpf (2)
+system call and must represent a program of type
+.BR BPF_PROG_TYPE_SOCKET_FILTER.
+
+.BR SO_ATTACH_FILTER
+is available in Linux 2.2.
+.BR SO_ATTACH_BPF
+is available in Linux 3.19.  Both classic and extended BPF are
+explained in the kernel source file
+.I Documentation/networking/filter.txt
+.TP
+.BR SO_ATTACH_REUSEPORT_CBPF " and " SO_ATTACH_REUSEPORT_EBPF " (since Linux 
4.5)"
+For use with the
+.BR SO_REUSEPORT
+option, these options allow the user to define a classic or extended
+BPF program (respectively) which defines how packets are assigned to
+the sockets in the reuseport group.  The program must return an index
+between 0 and N-1 representing the socket which should receive the
+packet (where N is the number of sockets in the group). If the BPF
+program returns an invalid index, socket selection will fall back to
+the plain
+.BR SO_REUSEPORT
+mechanism.
+
+Sockets are numbered in the order in which they are added to the group
+(that is, the order of
+.BR bind (2)
+calls for UDP sockets or the order of
+.BR listen (2)
+calls for TCP sockets).  New sockets added to the group will inherit
+the program.  When a socket is removed from the group (via
+.BR close (2))
+the last socket in the group will be moved into the closed socket's
+position.
+
+These options may be set repeatedly at any time on any single socket
+in the group to replace the current BPF program used by all sockets in
+the group.
+.BR SO_ATTACH_REUSEPORT_CBPF
+takes the same socket argument type as
+.BR SO_ATTACH_FILTER
+and
+.BR SO_ATTACH_REUSEPORT_EBPF
+takes the same socket argument type as
+.BR SO_ATTACH_BPF.
+UDP support for this feature is available in Linux 4.5.
+TCP support for this feature is available in Linux 4.6.
+.TP
 .B SO_BINDTODEVICE
 Bind this socket to a particular device like \(lqeth0\(rq,
 as specified in the passed interface name.
@@ -368,6 +435,18 @@ Only allowed for processes with the
 .B CAP_NET_ADMIN
 capability or an effective user ID of 0.
 .TP
+.BR SO_DETACH_FILTER " and " SO_DETACH_BPF
+These options may be used to remove the BPF program attached to the
+socket with either
+.BR SO_ATTACH_FILTER
+or
+.BR SO_ATTACH_BPF.
+The option value is ignored.
+.BR SO_DETACH_FILTER
+is available in Linux 2.2.
+.BR SO_DETACH_BPF
+is available in Linux 3.19.
+.TP
 .BR SO_DOMAIN " (since Linux 2.6.32)"
 Retrieves the socket domain as an integer, returning a value such as
 .BR AF_INET6 .
@@ -991,17 +1070,6 @@ where only the later program needs to set the
 option.
 Typically this difference is invisible, since, for example, a server
 program is designed to always set this option.
-.SH BUGS
-The
-.B CONFIG_FILTER
-socket options
-.B SO_ATTACH_FILTER
-and
-.B SO_DETACH_FILTER
-.\" FIXME Document SO_ATTACH_FILTER and SO_DETACH_FILTER
-are not documented.
-The suggested interface to use them is via the libpcap
-library.
 .\" .SH AUTHORS
 .\" This man page was written by Andi Kleen.
 .SH SEE

Re: Sending short raw packets using sendmsg() broke

2016-02-25 Thread David Miller

From: Heikki Hannikainen 
Date: Thu, 25 Feb 2016 21:36:07 +0200 (EET)

> Commit 9c7077622dd9174 added a check, ll_header_truncated(), which
> requires that a packet transmitted using sendmsg() with PF_PACKET,
> SOCK_RAW must be longer than dev->hard_header_len.

Fixed by:

commit 880621c2605b82eb5af91a2c94223df6f5a3fb64
Author: Martin Blumenstingl 
Date:   Sun Nov 22 17:46:09 2015 +0100

packet: Allow packets with only a header (but no payload)

Re: [PATCH net 1/5] ppp: lock ppp structure before modifying mru in ppp_ioctl()

2016-02-25 Thread David Miller

From: Guillaume Nault 
Date: Thu, 25 Feb 2016 20:16:55 +0100

> Understood. Just to be sure, does patch #2 falls under lack of
> demonstration? Or should I repost it separately?

Please repost it if you feel this way, with the race and corruption
possibility explained in detail in the commit log message.

Thanks.

Re: [PATCH net-next 0/5] vxlan: consolidate rx handling

2016-02-25 Thread David Miller

From: Jiri Benc 
Date: Tue, 23 Feb 2016 18:02:54 +0100

> Currently, vxlan_rcv is just called at the end of vxlan_udp_encap_recv,
> continuing the rx processing where vxlan_udp_encap_recv left it. There's no
> clear border between those two functions. This patchset moves
> vxlan_udp_encap_recv and vxlan_rcv into a single function.
> 
> This also allows to do some simplification in error path.
> 
> The VXLAN-GPE implementation that will follow up this set can be seen at:
> https://github.com/jbenc/linux-vxlan/commits/master

Series applied, thank you.

Re: [net-next PATCH 0/2] GENEVE/VXLAN: Enable outer Tx checksum by default

2016-02-25 Thread David Miller

From: Tom Herbert 
Date: Tue, 23 Feb 2016 08:47:30 -0800

> Right, GRO should probably not coalesce packets with non-zero IP
> identifiers due to the loss of information.

If they are monotonically increasing, which is the only case worth
caring about, it absolutely should.

Because that can be done without any loss of information.

Re: [patch net-next v2 0/9] Introduce devlink interface and first drivers to use it

2016-02-25 Thread David Miller

From: Jiri Pirko 
Date: Tue, 23 Feb 2016 16:51:25 +0100

> There a is need for some userspace API that would allow to expose things
> that are not directly related to any device class like net_device of
> ib_device, but rather chip-wide/switch-ASIC-wide stuff.
> 
> Use cases:
> 1) get/set of port type (Ethernet/InfiniBand)
> 2) setting up port splitters - split port into multiple ones and squash again,
>enables usage of splitter cable
> 3) setting up shared buffers - shared among multiple ports within
>one chip (work in progress)
> 4) configuration of switch wide properties - resources division etc - This 
> will
>allow to pass configuration that is unacceptable to be passed as
>a module option.
> 
> First patch of this set introduces a new generic Netlink based interface,
> called "devlink". It is similar to nl80211 model and it is heavily
> influenced by it, including the API definition. The devlink introduction patch
> implements use cases 1) and 2). Other 2 are in development atm and will
> be addressed by follow-ups.
> 
> It is very convenient for drivers to use devlink, as you can see in other
> patches in this set.
> 
> Counterpart for devlink is userspace tool for now called "dl". Command line
> interface and outputs are derived from "ip" tool so it should be easy
> for users to get used to it.

I am very close to applying this series as-is.

The clincher for me is that there is precendence in the nl80211 stuff,
so obviously whatever userland infrastructure sits on top of that has
found a way to deal whatever perceived shortcomings devlink has.

If all that people can come up with is "device names! omg UDEV!" and
"multiple tools are hard to use!"  That's awesome, because it means
nobody has any real substantial objection to this facility. :-)

Re: [PATCH] can: ems_usb: Fix possible tx overflow

2016-02-25 Thread David Miller

From: Marc Kleine-Budde 
Date: Mon, 22 Feb 2016 13:18:38 +0100

> David, I can send a patch to clean up the coding style. Do you want it
> for net or via net-next?

net-next is fine.

Sending short raw packets using sendmsg() broke

2016-02-25 Thread Heikki Hannikainen



Hi,

Commit 9c7077622dd9174 added a check, ll_header_truncated(), which 
requires that a packet transmitted using sendmsg() with PF_PACKET, 
SOCK_RAW must be longer than dev->hard_header_len.


https://github.com/torvalds/linux/commit/9c7077622dd9174
https://github.com/torvalds/linux/blob/master/net/packet/af_packet.c#L2329

The bug that popped up is that an application (aprx) can no longer send 
short AX.25 packets using sendmsg(). Packets shorter than 77 bytes fail 
this check in ll_header_truncated(). With older kernels, no problem. AX.25 
(and some other protocols) have variable-length headers (somewhere between 
21 and 77 bytes in this case).


hard_header_len is set in drivers/net/hamradio/mkiss.c to 
AX25_KISS_HEADER_LEN + AX25_MAX_HEADER_LEN + 3 which works out to be 
(1+17+7*8+3)=77.


https://github.com/torvalds/linux/blob/master/drivers/net/hamradio/mkiss.c#L845

I guessed that we could probably set hard_header_len to be the minimum 
length of the packet header (AX25_KISS_HEADER_LEN + AX25_HEADER_LEN + 3) 
to make things work again, but I first asked Alan Cox for an opinion, and 
he says hard_header_len is set correctly to be the worst-case maximum 
header length, and that the ll_header_truncated commit should be reverted 
instead, since it doesn't take variable-length headers into account.


 - Hessu


-- Forwarded message --
From: Heikki Hannikainen 
To: Aprx software 
Date: Thu, 25 Feb 2016 11:22:05 +0200 (EET)
Subject: Re: packet size is too short Kernel Error with Aprx


Hi,

I spent a bit of time trying to understand what's happening. As described by 
others, if the packet being transmitted is short, the newer Linux kernels drop 
it, saying this in the kernel log (dmesg):


[405809.774704] aprx: packet size is too short (59 <= 77)

The check in the kernel is here:

https://github.com/torvalds/linux/blob/master/net/packet/af_packet.c#L2329

The check requires that the packet is longer than dev->hard_header_len. 
hard_header_len is set in linux drivers/net/hamradio/mkiss.c to 
AX25_KISS_HEADER_LEN + AX25_MAX_HEADER_LEN + 3 which works out to be 
(1+17+7*8+3)=77:


#define AX25_MAX_DIGIS  8
#define AX25_HEADER_LEN 17
#define AX25_ADDR_LEN   7
#define AX25_DIGI_HEADER_LEN(AX25_MAX_DIGIS * AX25_ADDR_LEN)
#define AX25_MAX_HEADER_LEN (AX25_HEADER_LEN + 
AX25_DIGI_HEADER_LEN)


https://github.com/torvalds/linux/blob/master/drivers/net/hamradio/mkiss.c#L845

aprx uses the sendmsg() system call to send raw, variable-length-header AX25 
frames, and those may well be shorter than 77 bytes, if there are not many 
digipeaters in the path and the packet payload is short (not a bad idea on a 
1200 bit/s channel).


https://github.com/PhirePhly/aprx/blob/master/netax25.c#L758

It may be that hard_header_len should be set in mkiss.c to AX25_KISS_HEADER_LEN 
+ AX25_HEADER_LEN + 3 instead, if I understood this right. From 
include/linux/netdevice.h:


 *  @hard_header_len: Hardware header length, which means that this is the
 *minimum size of a packet.

As was pointed out, you *can* use the ax25-tools beacon program to transmit 
short packets! beacon does not use sendmsg(), it generates a pair of struct 
full_sockaddr_ax25 using libax25 ax25_aton() for source call and destination 
call+digipeater path, calls bind() to set the source call and then sends it 
with sendto(), simplified:


s = socket(AF_AX25, SOCK_DGRAM, 0);
len = ax25_aton(sourcecall, );
bind(s, (struct sockaddr *), len);
dlen = ax25_aton(addr, );
sendto(s, message, strlen(message), 0, (struct sockaddr *), dlen);

I didn't yet figure out why that works, maybe the sendto() of an AX25 datagram 
does not go through that hard_header_len check.


If I understood things right (I'm not entirely sure about the kernel sendmsg() 
code path yet), there are two things that could be done here:


- get the kernel fixed for supporting short raw AX.25 packet transmission again
- in the mean while, change aprx to call bind() and sendto() for every packet 
instead of a single sendmsg() - slightly unoptimal, but at 1200 bit/s and a few 
packets per second, who is going to notice...


  - Hessu, OH7LZB

Re: [PATCH] appletalk: Pass IP-over-DDP packets through when 'ipddp0' interface is not present

2016-02-25 Thread David Miller

From: Adam Seering 
Date: Tue, 23 Feb 2016 09:19:13 -0500

> Let userspace programs transmit and receive raw IP-over-DDP packets
> with a kernel where "ipddp" was compiled as a module but is not loaded
> (so no "ipddp0" network interface is exposed).  This makes the "module
> is not loaded" behavior match the "module was never compiled" behavior.
> 
> Signed-off-by: Adam Seering 

I think a better approache is to somehow autoload the module.

1 2 >

1 - 100 of 197 matches

Mail list logo