Re: [PATCH iproute2 -next] examples, bpf: further improve examples

2015-12-10 Thread Stephen Hemminger
On Wed,  2 Dec 2015 00:25:36 +0100
Daniel Borkmann  wrote:

> Improve example files further and add a more generic set of possible
> helpers for them that can be used.
> 
> Signed-off-by: Daniel Borkmann 

Sure, applied to net-next
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH iproute] ila: Add support for ILA lwtunnels

2015-12-10 Thread Stephen Hemminger
On Mon, 30 Nov 2015 14:57:28 -0800
Tom Herbert  wrote:

> This patch:
>  - Adds a utility function for parsing a 64 bit address
>  - Adds a utility function for converting a 64 bit address to ASCII
>  - Adds an ILA encap type in lwt tunnels
> 
> Signed-off-by: Tom Herbert 

Looks good, applied.

You might want to get some of that 64 bit stuff into libmnl as well.


Re: [PATCH net v2] net: Flush local routes when device changes vrf association

2015-12-10 Thread Nikolay Aleksandrov
On 12/10/2015 07:25 PM, David Ahern wrote:
> The VRF driver cycles netdevs when an interface is enslaved or released:
> the down event is used to flush neighbor and route tables and the up
> event (if the interface was already up) effectively moves local and
> connected routes to the proper table.
> 
> As of 4f823defdd5b the local route is left hanging around after a link
> down, so when a netdev is moved from one VRF to another (or released
> from a VRF altogether) local routes are left in the wrong table.
> 
> Fix by handling the NETDEV_CHANGEUPPER event. When the upper dev is
> an L3mdev then call fib_disable_ip to flush all routes, local ones
> too.
> 
> Fixes: 4f823defdd5b ("ipv4: fix to not remove local route on link down")
> Cc: Julian Anastasov 
> Signed-off-by: David Ahern 
> ---
> v2
> - key off NETDEV_CHANGEUPPER event vs using a new event
> 
>  net/ipv4/fib_frontend.c | 9 +
>  1 file changed, 9 insertions(+)
> 

Looks much better to me, thanks!

Signed-off-by: Nikolay Aleksandrov 




[PATCH net] uapi: export ila.h

2015-12-10 Thread Stephen Hemminger
The file ila.h used for lightweight tunnels is being used by iproute2
but is not exported yet.

Signed-off-by: Stephen Hemminger 
---
 include/uapi/linux/Kbuild | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 628e6e6..c2e5d6c 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -186,6 +186,7 @@ header-y += if_tunnel.h
 header-y += if_vlan.h
 header-y += if_x25.h
 header-y += igmp.h
+header-y += ila.h
 header-y += in6.h
 header-y += inet_diag.h
 header-y += in.h
-- 
2.1.4



[PATCH net 1/4] mpls: validate L2 via address length

2015-12-10 Thread Robert Shearman
If an L2 via address for an mpls nexthop is specified, the length of
the L2 address must match that expected by the output device,
otherwise it could access memory beyond the end of the via address
buffer in the route.

This check was present prior to commit f8efb73c97e2 ("mpls: multipath
route support"), but got lost in the refactoring, so add it back,
applying it to all nexthops in multipath routes.

Fixes: f8efb73c97e2 ("mpls: multipath route support")
Signed-off-by: Robert Shearman 
---
 net/mpls/af_mpls.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index c70d750148b6..3be29cb1f658 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -534,6 +534,10 @@ static int mpls_nh_assign_dev(struct net *net, struct mpls_route *rt,
if (!mpls_dev_get(dev))
goto errout;
 
+   if ((nh->nh_via_table == NEIGH_LINK_TABLE) &&
+   (dev->addr_len != nh->nh_via_alen))
+   goto errout;
+
RCU_INIT_POINTER(nh->nh_dev, dev);
 
return 0;
-- 
2.1.4
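
For readers without the kernel tree at hand, the restored check can be sketched in user space. This is a minimal sketch, not the kernel code: the types `sketch_dev` and `sketch_nh`, the helper `via_len_ok()`, and the `SKETCH_*` table constants are hypothetical stand-ins, and only the comparison itself mirrors the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's neighbour table indices. */
enum sketch_table {
    SKETCH_ARP_TABLE,
    SKETCH_LINK_TABLE
};

/* Hypothetical, reduced views of a net device and an MPLS nexthop. */
struct sketch_dev {
    unsigned char addr_len;     /* length of the device's L2 address */
};

struct sketch_nh {
    enum sketch_table via_table;
    unsigned char via_alen;     /* length of the via address stored in the route */
};

/*
 * Returns true when the nexthop may be bound to the device: an L2 via
 * address must be exactly as long as the device's own L2 address.
 */
static bool via_len_ok(const struct sketch_dev *dev, const struct sketch_nh *nh)
{
    assert(dev != NULL && nh != NULL);

    if (nh->via_table == SKETCH_LINK_TABLE && dev->addr_len != nh->via_alen)
        return false;
    return true;
}
```

With a 6-byte Ethernet-style device address, a 6-byte L2 via address passes and a 4-byte one is rejected; non-L2 via addresses are not affected by this particular check.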



[PATCH net 0/4] mpls: fixes for nexthops without via addresses

2015-12-10 Thread Robert Shearman
These four fixes all apply to the case of having an mpls route with an
output device, but without a nexthop.

Patches 2 and 3 could really have been combined in one patch, but I
wanted to separate the fix for some recent breakage from the fix for a
day-1 issue.

Robert Shearman (4):
  mpls: validate L2 via address length
  mpls: don't dump RTA_VIA attribute if not specified
  mpls: fix out-of-bounds access when via address not specified
  mpls: make via address optional for multipath routes

 net/mpls/af_mpls.c | 43 +++
 1 file changed, 31 insertions(+), 12 deletions(-)

-- 
2.1.4



Re: [PATCH net-next v4 8/8] openvswitch: Interface with NAT.

2015-12-10 Thread Pablo Neira Ayuso
On Tue, Dec 08, 2015 at 05:01:10PM -0800, Jarno Rajahalme wrote:
> - /* Call the helper right after nf_conntrack_in() for confirmed
> -  * connections, but only when commiting for unconfirmed connections.
> -  */
>   ct = nf_ct_get(skb, &ctinfo);
> - if (ct && (nf_ct_is_confirmed(ct) ? !cached : info->commit)
> - && ovs_ct_helper(skb, info->family) != NF_ACCEPT) {
> - WARN_ONCE(1, "helper rejected packet");
> - return -EINVAL;
> + if (ct) {
> +#ifdef CONFIG_NF_NAT_NEEDED
> + /* Packets starting a new connection must be NATted before the
> +  * helper, so that the helper knows about the NAT.  We enforce
> +  * this by delaying both NAT and helper calls for unconfirmed
> +  * connections until the commiting CT action.  For later
> +  * packets NAT and Helper may be called in either order.
> +  *
> +  * NAT will be done only if the CT action has NAT, and only
> +  * once per packet (per zone), as guarded by the NAT bits in
> +  * the key->ct.state.
> +  */
> + if (info->nat && !(key->ct.state & OVS_CS_F_NAT_MASK) &&
> + (nf_ct_is_confirmed(ct) || info->commit) &&
> + ovs_ct_nat(net, key, info, skb, ct, ctinfo) != NF_ACCEPT) {
> + WARN_ONCE(1, "NAT rejected packet");

NAT can drop packets, so I don't think you need this WARN_ONCE.

> + return -EINVAL;
> + }
> +#endif
> + /* Call the helper whenever nf_conntrack_in() was called for
> +  * confirmed connections, but only when commiting for
> +  * unconfirmed connections.
> +  */
> + if ((nf_ct_is_confirmed(ct) ? !cached : info->commit)
> + && ovs_ct_helper(skb, info->family) != NF_ACCEPT) {
> + WARN_ONCE(1, "helper rejected packet");

Same thing may happen with helpers.


[PATCH net 2/4] mpls: don't dump RTA_VIA attribute if not specified

2015-12-10 Thread Robert Shearman
The problem seen is that when adding a route with a nexthop with no
via address specified, iproute2 generates bogus output:

  # ip -f mpls route add 100 dev lo
  # ip -f mpls route list
  100 via inet 0.0.8.0 dev lo

The reason for this is that the kernel generates an RTA_VIA attribute
with the family set to AF_INET, but the via address data having zero
length. The cause of family being AF_INET is that on route insert
cfg->rc_via_table is left set to 0, which just happens to be
NEIGH_ARP_TABLE which is then translated into AF_INET.

iproute2 doesn't validate the length prior to printing and so prints
garbage. Although it could be fixed to do the validation, I would
argue that AF_INET addresses should always be exactly 4 bytes so the
kernel is really giving userspace bogus data.

Therefore, avoid generating the RTA_VIA attribute when dumping the
route if the via address wasn't specified on add/modify. This is
indicated by NEIGH_ARP_TABLE and a zero via address length - if the
user specified a via address the address length would have been
validated such that it was 4 bytes. Although this is a change in
behaviour that is visible to userspace, I believe that what was
generated before was invalid and as such userspace wouldn't be
expecting it.

Signed-off-by: Robert Shearman 
---
 net/mpls/af_mpls.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 3be29cb1f658..ac1c116abaac 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -1235,7 +1235,9 @@ static int mpls_dump_route(struct sk_buff *skb, u32 portid, u32 seq, int event,
nla_put_labels(skb, RTA_NEWDST, nh->nh_labels,
   nh->nh_label))
goto nla_put_failure;
-   if (nla_put_via(skb, nh->nh_via_table, mpls_nh_via(rt, nh),
+   if ((nh->nh_via_table != NEIGH_ARP_TABLE ||
+nh->nh_via_alen != 0) &&
+   nla_put_via(skb, nh->nh_via_table, mpls_nh_via(rt, nh),
nh->nh_via_alen))
goto nla_put_failure;
dev = rtnl_dereference(nh->nh_dev);
@@ -1323,7 +1325,9 @@ static inline size_t lfib_nlmsg_size(struct mpls_route *rt)
 
if (nh->nh_dev)
payload += nla_total_size(4); /* RTA_OIF */
-   payload += nla_total_size(2 + nh->nh_via_alen); /* RTA_VIA */
+   if (nh->nh_via_table != NEIGH_ARP_TABLE ||
+   nh->nh_via_alen != 0) /* RTA_VIA */
+   payload += nla_total_size(2 + nh->nh_via_alen);
if (nh->nh_labels) /* RTA_NEWDST */
payload += nla_total_size(nh->nh_labels * 4);
} else {
-- 
2.1.4
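
The commit message notes that userspace could validate the attribute length before printing. A minimal sketch of such a check follows. It is not iproute2's code: the helper name `format_inet_via()` is hypothetical, and the only library call it relies on is POSIX `inet_ntop()`.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: format an AF_INET via address only when the payload
 * has exactly the size of an IPv4 address. Returns 0 on success, -1 otherwise.
 */
static int format_inet_via(const unsigned char *addr, size_t alen,
                           char *out, size_t outlen)
{
    struct in_addr in;

    assert(out != NULL);

    /* Refuse to print a missing, truncated or oversized address. */
    if (addr == NULL || alen != sizeof(in))
        return -1;

    memcpy(&in, addr, sizeof(in));
    return inet_ntop(AF_INET, &in, out, (socklen_t)outlen) ? 0 : -1;
}
```

A zero-length payload such as the one described above would be rejected by this helper instead of being rendered from whatever bytes happen to follow the attribute header.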



[PATCH net-next] hv_netvsc: Fix race condition on Multi-Send Data field

2015-12-10 Thread Haiyang Zhang
In commit 2a04ae8acb14 ("hv_netvsc: remove locking in netvsc_send()"), the
locking for the MSD (Multi-Send Data) field was removed. This could cause a
race condition between RNDIS control messages and data packet processing,
because these two types of traffic are not synchronized.
This patch fixes the issue by sending control messages out directly
without reading the MSD field.

Signed-off-by: Haiyang Zhang 
Reviewed-by: K. Y. Srinivasan 
---
 drivers/net/hyperv/netvsc.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 02bab9a..059fc52 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -867,6 +867,14 @@ int netvsc_send(struct hv_device *device,
packet->send_buf_index = NETVSC_INVALID_INDEX;
packet->cp_partial = false;
 
+   /* Send control message directly without accessing msd (Multi-Send
+* Data) field which may be changed during data packet processing.
+*/
+   if (!skb) {
+   cur_send = packet;
+   goto send_now;
+   }
+
msdp = &net_device->msd[q_idx];
 
/* batch packets in send buffer if possible */
@@ -939,6 +947,7 @@ int netvsc_send(struct hv_device *device,
}
}
 
+send_now:
if (cur_send)
ret = netvsc_send_pkt(cur_send, net_device, pb, skb);
 
-- 
1.7.4.1



[PATCH net 4/4] mpls: make via address optional for multipath routes

2015-12-10 Thread Robert Shearman
The via address is optional for a single path route, yet is mandatory
when the multipath attribute is used:

  # ip -f mpls route add 100 dev lo
  # ip -f mpls route add 101 nexthop dev lo
  RTNETLINK answers: Invalid argument

Make them consistent by making the via address optional when the
RTA_MULTIPATH attribute is being parsed so that both forms of
specifying the route work.

Signed-off-by: Robert Shearman 
---
 net/mpls/af_mpls.c | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 7bfc85f52ca8..c32fc411a911 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -604,10 +604,14 @@ static int mpls_nh_build(struct net *net, struct mpls_route *rt,
goto errout;
}
 
-   err = nla_get_via(via, &nh->nh_via_alen, &nh->nh_via_table,
- __mpls_nh_via(rt, nh));
-   if (err)
-   goto errout;
+   if (via) {
+   err = nla_get_via(via, &nh->nh_via_alen, &nh->nh_via_table,
+ __mpls_nh_via(rt, nh));
+   if (err)
+   goto errout;
+   } else {
+   nh->nh_via_table = MPLS_NEIGH_TABLE_UNSPEC;
+   }
 
err = mpls_nh_assign_dev(net, rt, nh, oif);
if (err)
@@ -689,9 +693,6 @@ static int mpls_nh_build_multi(struct mpls_route_config *cfg,
nla_newdst = nla_find(attrs, attrlen, RTA_NEWDST);
}
 
-   if (!nla_via)
-   goto errout;
-
err = mpls_nh_build(cfg->rc_nlinfo.nl_net, rt, nh,
rtnh->rtnh_ifindex, nla_via,
nla_newdst);
@@ -1271,7 +1272,8 @@ static int mpls_dump_route(struct sk_buff *skb, u32 portid, u32 seq, int event,
nh->nh_labels,
nh->nh_label))
goto nla_put_failure;
-   if (nla_put_via(skb, nh->nh_via_table,
+   if (nh->nh_via_table != MPLS_NEIGH_TABLE_UNSPEC &&
+   nla_put_via(skb, nh->nh_via_table,
mpls_nh_via(rt, nh),
nh->nh_via_alen))
goto nla_put_failure;
@@ -1343,7 +1345,9 @@ static inline size_t lfib_nlmsg_size(struct mpls_route *rt)
 
for_nexthops(rt) {
nhsize += nla_total_size(sizeof(struct rtnexthop));
-   nhsize += nla_total_size(2 + nh->nh_via_alen);
+   /* RTA_VIA */
+   if (nh->nh_via_table != MPLS_NEIGH_TABLE_UNSPEC)
+   nhsize += nla_total_size(2 + nh->nh_via_alen);
if (nh->nh_labels)
nhsize += nla_total_size(nh->nh_labels * 4);
} endfor_nexthops(rt);
-- 
2.1.4



[PATCH net 3/4] mpls: fix out-of-bounds access when via address not specified

2015-12-10 Thread Robert Shearman
When a via address isn't specified, the via table is left initialised
to 0 (NEIGH_ARP_TABLE), and the via address length also left
initialised to 0. This results in a via address array of length 0
being allocated (contiguous with route and nexthop array), meaning
that when a packet is sent using neigh_xmit the neighbour lookup and
creation will cause an out-of-bounds access when accessing the 4 bytes
of the IPv4 address it assumes it has been given a pointer to.

This could be fixed by allocating the 4 bytes of via address necessary
and leaving it as all zeroes. However, it seems wrong to me to use an
ipv4 nexthop (including possibly ARPing for 0.0.0.0) when the user
didn't specify to do so.

Instead, set the via address table to MPLS_NEIGH_TABLE_UNSPEC to signify it
hasn't been specified and use this at forwarding time to signify a
neigh_xmit using an L2 address consisting of the device address. This
mechanism is the same as that used for both ARP and ND for loopback
interfaces and those flagged as no-arp, which are all we can really
support in this case.

Fixes: cf4b24f0024f ("mpls: reduce memory usage of routes")
Signed-off-by: Robert Shearman 
---
 net/mpls/af_mpls.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index ac1c116abaac..7bfc85f52ca8 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -27,6 +27,8 @@
  */
 #define MAX_MP_SELECT_LABELS 4
 
+#define MPLS_NEIGH_TABLE_UNSPEC (NEIGH_LINK_TABLE + 1)
+
 static int zero = 0;
 static int label_limit = (1 << 20) - 1;
 
@@ -317,7 +319,13 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
}
}
 
-   err = neigh_xmit(nh->nh_via_table, out_dev, mpls_nh_via(rt, nh), skb);
+   /* If via wasn't specified then send out using device address */
+   if (nh->nh_via_table == MPLS_NEIGH_TABLE_UNSPEC)
+   err = neigh_xmit(NEIGH_LINK_TABLE, out_dev,
+out_dev->dev_addr, skb);
+   else
+   err = neigh_xmit(nh->nh_via_table, out_dev,
+mpls_nh_via(rt, nh), skb);
if (err)
net_dbg_ratelimited("%s: packet transmission failed: %d\n",
__func__, err);
@@ -1122,6 +1130,7 @@ static int rtm_to_route_config(struct sk_buff *skb, struct nlmsghdr *nlh,
 
cfg->rc_label   = LABEL_NOT_SPECIFIED;
cfg->rc_protocol= rtm->rtm_protocol;
+   cfg->rc_via_table   = MPLS_NEIGH_TABLE_UNSPEC;
cfg->rc_nlflags = nlh->nlmsg_flags;
cfg->rc_nlinfo.portid   = NETLINK_CB(skb).portid;
cfg->rc_nlinfo.nlh  = nlh;
@@ -1235,8 +1244,7 @@ static int mpls_dump_route(struct sk_buff *skb, u32 portid, u32 seq, int event,
nla_put_labels(skb, RTA_NEWDST, nh->nh_labels,
   nh->nh_label))
goto nla_put_failure;
-   if ((nh->nh_via_table != NEIGH_ARP_TABLE ||
-nh->nh_via_alen != 0) &&
+   if (nh->nh_via_table != MPLS_NEIGH_TABLE_UNSPEC &&
nla_put_via(skb, nh->nh_via_table, mpls_nh_via(rt, nh),
nh->nh_via_alen))
goto nla_put_failure;
@@ -1325,8 +1333,7 @@ static inline size_t lfib_nlmsg_size(struct mpls_route *rt)
 
if (nh->nh_dev)
payload += nla_total_size(4); /* RTA_OIF */
-   if (nh->nh_via_table != NEIGH_ARP_TABLE ||
-   nh->nh_via_alen != 0) /* RTA_VIA */
+   if (nh->nh_via_table != MPLS_NEIGH_TABLE_UNSPEC) /* RTA_VIA */
payload += nla_total_size(2 + nh->nh_via_alen);
if (nh->nh_labels) /* RTA_NEWDST */
payload += nla_total_size(nh->nh_labels * 4);
-- 
2.1.4
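
The forwarding-time decision described in this patch can be sketched in user space as a sentinel check. This is a minimal sketch under stated assumptions: the `SK_*` constants and the `sk_dev`, `sk_nh` and `sk_xmit` types are hypothetical stand-ins for the kernel's table indices and structures, and `pick_xmit()` only selects the arguments that would be handed to the transmit call; it does not send anything.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table indices; the sentinel sits one past the link table. */
enum {
    SK_ARP_TABLE = 0,
    SK_LINK_TABLE,
    SK_TABLE_UNSPEC = SK_LINK_TABLE + 1
};

struct sk_dev {
    unsigned char dev_addr[6];  /* the device's own L2 address */
};

struct sk_nh {
    int via_table;
    const unsigned char *via;   /* unused when via_table is SK_TABLE_UNSPEC */
};

/* The table and address that would be used for the neighbour transmit. */
struct sk_xmit {
    int table;
    const unsigned char *addr;
};

static struct sk_xmit pick_xmit(const struct sk_nh *nh, const struct sk_dev *out_dev)
{
    struct sk_xmit x;

    assert(nh != NULL && out_dev != NULL);

    if (nh->via_table == SK_TABLE_UNSPEC) {
        /* No via address given: send using the device's own L2 address. */
        x.table = SK_LINK_TABLE;
        x.addr = out_dev->dev_addr;
    } else {
        x.table = nh->via_table;
        x.addr = nh->via;
    }
    return x;
}
```

The point of the sentinel is that the zero-length via buffer is never dereferenced: the unspecified case is recognised from the table value alone.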



Re: [PATCH net-next v4 2/8] netfilter: Factor out nf_ct_get_info().

2015-12-10 Thread Pablo Neira Ayuso
On Tue, Dec 08, 2015 at 05:01:04PM -0800, Jarno Rajahalme wrote:
> Define a new inline function to map conntrack status to enum
> ip_conntrack_info.  This removes the need to otherwise duplicate this
> code in a later patch ("openvswitch: Find existing conntrack entry
> after upcall.").
> 
> Signed-off-by: Jarno Rajahalme 
> ---
>  include/net/netfilter/nf_conntrack.h | 15 +++
>  net/netfilter/nf_conntrack_core.c| 22 +++---
>  2 files changed, 18 insertions(+), 19 deletions(-)
> 
> diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
> index fde4068..b3de10e 100644
> --- a/include/net/netfilter/nf_conntrack.h
> +++ b/include/net/netfilter/nf_conntrack.h
> @@ -125,6 +125,21 @@ nf_ct_tuplehash_to_ctrack(const struct nf_conntrack_tuple_hash *hash)
>   tuplehash[hash->tuple.dst.dir]);
>  }
>  
> +static inline enum ip_conntrack_info
> +nf_ct_get_info(const struct nf_conntrack_tuple_hash *h)
> +{
> + const struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
> +
> + if (NF_CT_DIRECTION(h) == IP_CT_DIR_REPLY)
> + return IP_CT_ESTABLISHED_REPLY;
> + /* Once we've had two way comms, always ESTABLISHED. */
> + if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status))
> + return IP_CT_ESTABLISHED;
> + if (test_bit(IPS_EXPECTED_BIT, &ct->status))
> + return IP_CT_RELATED;
> + return IP_CT_NEW;
> +}
> +
>  static inline u_int16_t nf_ct_l3num(const struct nf_conn *ct)
>  {
>   return ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.l3num;
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index 3cb3cb8..70ddbd8 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -1056,25 +1056,9 @@ resolve_normal_ct(struct net *net, struct nf_conn *tmpl,
>   ct = nf_ct_tuplehash_to_ctrack(h);
>  
>   /* It exists; we have (non-exclusive) reference. */
> - if (NF_CT_DIRECTION(h) == IP_CT_DIR_REPLY) {
> - *ctinfo = IP_CT_ESTABLISHED_REPLY;
> - /* Please set reply bit if this packet OK */
> - *set_reply = 1;
> - } else {
> - /* Once we've had two way comms, always ESTABLISHED. */
> - if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
> - pr_debug("nf_conntrack_in: normal packet for %p\n", ct);

This implicitly assumes we don't want pr_debug() in nf_conntrack
anymore. I'm not saying this is wrong, but there are more pr_debug()
calls in nf_conntrack that will remain there.
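
The decision order in the quoted helper can be exercised in user space. The following is a minimal sketch, not the netfilter code: `sketch_conn`, `sketch_ctinfo` and `sketch_get_info()` are hypothetical names, and plain booleans stand in for the direction lookup and the status bits tested in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the four outcomes the quoted helper can return. */
enum sketch_ctinfo {
    SKETCH_CT_ESTABLISHED,
    SKETCH_CT_RELATED,
    SKETCH_CT_NEW,
    SKETCH_CT_ESTABLISHED_REPLY
};

/* Hypothetical, reduced connection state: one flag per status bit. */
struct sketch_conn {
    bool seen_reply;    /* two-way traffic has been observed */
    bool expected;      /* connection was created from an expectation */
};

/* Same ordering as the quoted patch: direction, then seen-reply, then expected. */
static enum sketch_ctinfo sketch_get_info(const struct sketch_conn *ct, bool reply_dir)
{
    assert(ct != NULL);

    if (reply_dir)
        return SKETCH_CT_ESTABLISHED_REPLY;
    if (ct->seen_reply)
        return SKETCH_CT_ESTABLISHED;
    if (ct->expected)
        return SKETCH_CT_RELATED;
    return SKETCH_CT_NEW;
}
```

Note that the ordering matters: a packet in the reply direction maps to the reply state regardless of the flags, and a connection that has both flags set is reported as established rather than related.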


Re: [PATCH 1/4] list: introduce list_is_first()

2015-12-10 Thread Jiri Kosina
On Thu, 10 Dec 2015, Jens Axboe wrote:

> It's a balance, as we also should not make APIs out of everything. As I said,
> purely my opinion, but I think the is_last/is_first have jumped the shark.

I don't have a strong opinion either way.

What I think we should do, though, is either have both (i.e. accept this 
patchset) or have neither of them (i.e. drop list_is_last()).

Otherwise people are likely to be confused by such an asymmetric API and 
will keep posting patches for it over and over again.

-- 
Jiri Kosina
SUSE Labs
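
For context on the symmetry being discussed, both predicates are one-line pointer comparisons on a circular doubly linked list. The following is a minimal user-space sketch, assuming a hypothetical `sketch_list` type and `sketch_list_*` helpers rather than the kernel's `struct list_head` API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical circular doubly linked list node; the head is a bare node. */
struct sketch_list {
    struct sketch_list *next, *prev;
};

static void sketch_list_init(struct sketch_list *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry just before head, i.e. at the tail of the list. */
static void sketch_list_add_tail(struct sketch_list *entry, struct sketch_list *head)
{
    assert(entry != NULL && head != NULL);

    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* An entry is first when the node before it is the head. */
static bool sketch_list_is_first(const struct sketch_list *entry,
                                 const struct sketch_list *head)
{
    return entry->prev == head;
}

/* An entry is last when the node after it is the head. */
static bool sketch_list_is_last(const struct sketch_list *entry,
                                const struct sketch_list *head)
{
    return entry->next == head;
}
```

Seen this way, the two helpers differ only in which neighbour pointer they compare against the head, which is the asymmetry argument in a nutshell: offering one without the other is an API gap rather than a design decision.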



Re: [iproute PATCH] tc.8: Fix reference to tc-tcindex.8

2015-12-10 Thread Stephen Hemminger
On Thu, 10 Dec 2015 13:24:51 +0100
Phil Sutter  wrote:

> Just a typo there; it's spelled correctly in the SEE ALSO section.
> 
> Signed-off-by: Phil Sutter 

Applied


Re: [PATCH net] net: Flush local routes when device changes vrf association

2015-12-10 Thread David Ahern

On 12/9/15 6:35 PM, David Ahern wrote:

The VRF driver cycles netdevs when an interface is enslaved or released:
the down event is used to flush neighbor and route tables and the up
event (if the interface was already up) effectively moves local and
connected routes to the proper table.

As of 4f823defdd5b the local route is left hanging around after a link
down, so when a netdev is moved from one VRF to another (or released
from a VRF altogether) local routes are left in the wrong table.

Fix by introducing a NETDEV_VRF_CHANGE event that can be used to trigger
the flush of all routes, including local ones.

Fixes: 4f823defdd5b ("ipv4: fix to not remove local route on link down")
Cc: Julian Anastasov 
Signed-off-by: David Ahern 


At Nik's urging, I see that I can do this without adding a new netdev 
event; NETDEV_CHANGEUPPER can be used for this as well.


Please disregard this patch.


Re: [PATCH iproute2] vrf: Add support for table names

2015-12-10 Thread Stephen Hemminger
On Tue,  8 Dec 2015 12:24:44 -0800
David Ahern  wrote:

> Currently, the table id for VRF devices requires an integer. Convert
> it to use rtnl_rttable_a2n which handles table names from the iproute2
> directory.
> 
> This also fixes a bug in the original commit where table names are not
> properly handled.
> 
> Fixes: 15faa0a30bed ("add support for VRF device")
> Signed-off-by: David Ahern 

Applied, thanks.


RE: [PATCH v2 0/2] net: thunderx: Support for pass-2 hw features

2015-12-10 Thread Pavel Fedin
 All series:

 Reviewed-by: Pavel Fedin 

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

-----Original Message-----
From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On 
Behalf Of Sunil Goutham
Sent: Thursday, December 10, 2015 10:55 AM
To: netdev@vger.kernel.org
Cc: linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; 
p.fe...@samsung.com; sunil.gout...@caviumnetworks.com; Sunil
Goutham
Subject: [PATCH v2 0/2] net: thunderx: Support for pass-2 hw features

From: Sunil Goutham 

This patch set adds support for new features added in pass-2 revision of 
hardware like TSO and count based interrupt coalescing.

Changes from v1:
- Addressed comments received regarding boolean bit field changes
  by excluding them from this patch. Will submit a separate
  patch along with cleanup of the unused field.
- Got rid of new macro 'VNIC_NAPI_WEIGHT' introduced in
  count threshold interrupt patch.

Sunil Goutham (2):
  net: thunderx: HW TSO support for pass-2 hardware
  net: thunderx: Enable CQE count threshold interrupt

 drivers/net/ethernet/cavium/thunder/nic.h  |6 
 drivers/net/ethernet/cavium/thunder/nic_main.c |   11 ++-
 drivers/net/ethernet/cavium/thunder/nicvf_main.c   |   15 -
 drivers/net/ethernet/cavium/thunder/nicvf_queues.c |   22 ++
 drivers/net/ethernet/cavium/thunder/nicvf_queues.h |2 +-
 drivers/net/ethernet/cavium/thunder/q_struct.h |   30 ++-
 6 files changed, 55 insertions(+), 31 deletions(-)



Re: [PATCH v2] r8169: Don't claim WoL works if LanWake flag is not set

2015-12-10 Thread Corinna Vinschen
On Dec  9 23:43, Francois Romieu wrote:
> Corinna Vinschen  :
> [...]
> > This patch is supposed to fix this behaviour.  If LanWake is 0, the
> > function now returns 0.  Thus ethtool correctly reports "Wake-on: d".
> 
> Can you turn it into a DMI controlled one (something like
> drivers/net/ethernet/marvell/skge.c use of dmi_check_system)

I could do this (after I could lay my hands on such a board, that is),
but I'm not convinced that this makes a lot of sense for two reasons.

> in order to
> avoid a global change of behavior ?

1.  There is no global change in behaviour.  The usual way to handle the
WoL flags is to set the affected method flags and additionally set
LanWake if any of the method flags is set.  The fact that the method
flags don't enable WoL without also setting the LanWake flag is
documented.

__rtl8169_get_wol not reflecting this is a bug.  The code lazily
assumes that checking the WoL method flags is sufficient while in
fact it isn't.  __rtl8169_set_wol sets the LanWake flag accordingly,
but that doesn't mean the driver may assume that the flags haven't
been set differently.  I can easily hack the driver to set LanWake
to 0 and ethtool would still happily report WoL is active.  That's
plain wrong.

2. While we now know a single board which neglects to set the LanWake
   flag, that doesn't mean there aren't other boards out there doing the
   same.
   
   On top of that, the state of the NIC registers in terms of WoL are
   *not* board-specific.  They are regular NIC registers which are just
   set in a combination which the driver in its current state evaluates
   wrongly.  It doesn't matter who and why the flags have been set that
   way.  The driver should reflect the actual state, and that requires
   to check for LanWake.

For those reasons I think that my fix is the right thing to do.

> Btw it's probably time to emit some warning during driver probe if wol
> bits are not consistent with LanWake.

That's a good idea.  I'll propose a followup patch with this addition.


Thanks,
Corinna




[PATCH v2 2/2] net: thunderx: Enable CQE count threshold interrupt

2015-12-10 Thread Sunil Goutham
From: Sunil Goutham 

This feature is introduced in pass-2 chip and with this CQ interrupt
coalescing will work based on both timer and count.

Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/cavium/thunder/nicvf_queues.c |2 +-
 drivers/net/ethernet/cavium/thunder/nicvf_queues.h |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
index b11fc09..d0d1b54 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
@@ -299,7 +299,7 @@ static int nicvf_init_cmp_queue(struct nicvf *nic,
return err;
 
cq->desc = cq->dmem.base;
-   cq->thresh = CMP_QUEUE_CQE_THRESH;
+   cq->thresh = pass1_silicon(nic->pdev) ? 0 : CMP_QUEUE_CQE_THRESH;
nic->cq_coalesce_usecs = (CMP_QUEUE_TIMER_THRESH * 0.05) - 1;
 
return 0;
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.h b/drivers/net/ethernet/cavium/thunder/nicvf_queues.h
index a4f6667..c5030a7 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.h
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.h
@@ -75,7 +75,7 @@
  */
 #define CMP_QSIZE  CMP_QUEUE_SIZE2
 #define CMP_QUEUE_LEN  (1ULL << (CMP_QSIZE + 10))
-#define CMP_QUEUE_CQE_THRESH   0
+#define CMP_QUEUE_CQE_THRESH   (NAPI_POLL_WEIGHT / 2)
 #define CMP_QUEUE_TIMER_THRESH 80 /* ~2usec */
 
 #define RBDR_SIZE  RBDR_SIZE0
-- 
1.7.1



[PATCH net-next v3 1/3] rco: Clean up casting errors

2015-12-10 Thread Tom Herbert
Fix a couple of cast errors found by sparse.

Signed-off-by: Tom Herbert 
---
 include/net/checksum.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/net/checksum.h b/include/net/checksum.h
index 9fcaedf..10a16b5 100644
--- a/include/net/checksum.h
+++ b/include/net/checksum.h
@@ -165,7 +165,8 @@ static inline __wsum remcsum_adjust(void *ptr, __wsum csum,
csum = csum_sub(csum, csum_partial(ptr, start, 0));
 
/* Set derived checksum in packet */
-   delta = csum_sub(csum_fold(csum), *psum);
+   delta = csum_sub((__force __wsum)csum_fold(csum),
+(__force __wsum)*psum);
*psum = csum_fold(csum);
 
return delta;
-- 
2.4.6



Re: OVS VXLAN decap rule has full match on TTL for the outer headers?

2015-12-10 Thread Or Gerlitz
On Thu, Dec 10, 2015 at 11:23 PM, Joe Stringer  wrote:
> On 10 December 2015 at 13:06, Or Gerlitz  wrote:
>> On Wed, Dec 9, 2015 at 2:22 AM, Joe Stringer  wrote:

> As far as the mask, I briefly discussed this with Jarno and it seems
> like it could be something as simple as zeroing the ip_ttl mask in
> tnl_wc_init().

 to make sure I follow, will that have the consequence that we (user +
 kernel) will practically not be testing the ttl for these flows?

>>> Yes, it would cause userspace to 'wildcard' the field so the kernel
>>> flows that are installed will ignore it during lookup.

>> Cool, any chance this is gonna fit into your schedule to meet 4.4? if
>> not, for 4.5?
>> Also, can the patch be made simple/small enough to go into -stable as well?

> It's a userspace change.


Hmm, in a downstream post of this thread [1] Haggai pointed out to you
that there's code in the OVS kernel path that rejects new tunnel
flows if they don't have the TTL mask set. So is he wrong? If so, where?

Or.

[1] http://marc.info/?l=linux-netdev&m=144880328121156&w=2


Re: [PATCH v2] r8169: Don't claim WoL works if LanWake flag is not set

2015-12-10 Thread Corinna Vinschen
On Dec 10 21:40, Francois Romieu wrote:
> Corinna Vinschen  :
> [...]
> > I could do this (after I could lay my hands on such a board, that is),
> > but I'm not convinced that this makes a lot of sense for two reasons.
> 
> Ok, let's get this change applied. Whatever happens should not be
> hard to manage (I'm thinking about other boards or BIOSes relying on the
> current - broken as it can be - behavior to work correctly).
> 
> [...]
> > 1.  There is no global change in behaviour.  The usual way to handle the
> > WoL flags is to set the affected method flags and additionally set
> > LanWake if any of the method flags is set.  The fact that the method
> > flags don't enable WoL without also setting the LanWake flag is
> > documented.
> 
> I see no such thing in "7.5 Power Management Function" of the 8168c
> registers datasheet. While Config3 states Magic Packet and Link Up
> dependencies on Config1.PMEn, it says nothing about Config5.LanWake.
> 
> On old 8169 chipsets LanWake is autoloaded from EEPROM.
> 
> Plausible for Config5.{B, M, U}WF ? Ok.
> 
> Documented ? I am genuinely curious to know where.

Ok, I reread the documentation I have, and I got that wrong it seems.
Apparently the LanWake flag enables or disables the LANWAKE/LANWAKEB pin
only but not the other possible PM events.

So, self-NACKed.

It's still a bit weird.  On the machines I tested this on, if I disable
LanWake and shutdown the machine, I can send, e.g., MagicPackets as much
as I like, the machines don't come up.  Isn't it a bit misleading then
if ethtool reports that some WoL method is enabled but it doesn't work?


Corinna




[PATCH net-next v3 3/3] geneve: Remote Checksum Offload support

2015-12-10 Thread Tom Herbert
Add support for remote checksum offload in both the normal and GRO
paths. Netlink commands are used to enable sending of the Remote
Checksum Data, and allow processing of it on receive.

Signed-off-by: Tom Herbert 
---
 drivers/net/geneve.c | 162 ---
 include/net/geneve.h |  22 --
 include/uapi/linux/if_link.h |   3 +
 3 files changed, 174 insertions(+), 13 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 0750d7a..9d4f487 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -78,6 +78,9 @@ struct geneve_dev {
 #define GENEVE_F_UDP_CSUM  BIT(0)
 #define GENEVE_F_UDP_ZERO_CSUM6_TX BIT(1)
 #define GENEVE_F_UDP_ZERO_CSUM6_RX BIT(2)
+#define GENEVE_F_REMCSUM_TX BIT(3)
+#define GENEVE_F_REMCSUM_RX BIT(4)
+#define GENEVE_F_REMCSUM_NOPARTIAL BIT(5)
 
 struct geneve_sock {
bool collect_md;
@@ -308,6 +311,33 @@ static void geneve_uninit(struct net_device *dev)
free_percpu(dev->tstats);
 }
 
+static struct genevehdr *geneve_remcsum(struct sk_buff *skb,
+   struct genevehdr *gh,
+   size_t hdrlen, bool nopartial)
+{
+   size_t start, offset, plen;
+
+   if (skb->remcsum_offload)
+   return gh;
+
+   start = gh->rco_start << GENEVE_RCO_SHIFT;
+   offset = start + (gh->udp_rco ?
+ offsetof(struct udphdr, check) :
+ offsetof(struct tcphdr, check));
+
+   plen = hdrlen + offset + sizeof(u16);
+
+   if (!pskb_may_pull(skb, plen))
+   return NULL;
+
+   gh = (struct genevehdr *)(udp_hdr(skb) + 1);
+
+   skb_remcsum_process(skb, (void *)gh + hdrlen, start, offset,
+   nopartial);
+
+   return gh;
+}
+
 /* Callback from net/ipv4/udp.c to receive packets */
 static int geneve_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 {
@@ -336,6 +366,15 @@ static int geneve_udp_encap_recv(struct sock *sk, struct 
sk_buff *skb)
if (!gs)
goto drop;
 
+   if (geneveh->rco && (gs->flags & GENEVE_F_REMCSUM_RX)) {
+   geneveh = geneve_remcsum(skb, geneveh,
+sizeof(*geneveh) + opts_len,
+!!(gs->flags &
+   GENEVE_F_REMCSUM_NOPARTIAL));
+   if (unlikely(!geneveh))
+   goto drop;
+   }
+
geneve_rx(gs, skb);
return 0;
 
@@ -397,6 +436,32 @@ static int geneve_hlen(struct genevehdr *gh)
return sizeof(*gh) + gh->opt_len * 4;
 }
 
+static struct genevehdr *geneve_gro_remcsum(struct sk_buff *skb,
+   unsigned int off,
+   struct genevehdr *gh, size_t hdrlen,
+   struct gro_remcsum *grc,
+   bool nopartial)
+{
+   size_t start, offset;
+
+   if (skb->remcsum_offload)
+   return gh;
+
+   if (!NAPI_GRO_CB(skb)->csum_valid)
+   return NULL;
+
+   start = gh->rco_start << GENEVE_RCO_SHIFT;
+   offset = start + (gh->udp_rco ?
+ offsetof(struct udphdr, check) :
+ offsetof(struct tcphdr, check));
+
+   gh = skb_gro_remcsum_process(skb, (void *)gh, off, hdrlen,
+start, offset, grc, nopartial);
+
+   skb->remcsum_offload = 1;
+
+   return gh;
+}
 static struct sk_buff **geneve_gro_receive(struct sk_buff **head,
   struct sk_buff *skb,
   struct udp_offload *uoff)
@@ -407,6 +472,11 @@ static struct sk_buff **geneve_gro_receive(struct sk_buff 
**head,
const struct packet_offload *ptype;
__be16 type;
int flush = 1;
+   struct gro_remcsum grc;
+   struct geneve_sock *gs = container_of(uoff, struct geneve_sock,
+ udp_offloads);
+
+   skb_gro_remcsum_init(&grc);
 
off_gnv = skb_gro_offset(skb);
hlen = off_gnv + sizeof(*gh);
@@ -421,6 +491,16 @@ static struct sk_buff **geneve_gro_receive(struct sk_buff 
**head,
goto out;
gh_len = geneve_hlen(gh);
 
+   skb_gro_postpull_rcsum(skb, gh, gh_len);
+
+   if (gh->rco && (gs->flags & GENEVE_F_REMCSUM_RX)) {
+   gh = geneve_gro_remcsum(skb, off_gnv, gh, gh_len, &grc,
+   !!(gs->flags &
+ GENEVE_F_REMCSUM_NOPARTIAL));
+   if (unlikely(!gh))
+   goto out;
+   }
+
hlen = off_gnv + gh_len;
if (skb_gro_header_hard(skb, hlen)) {
gh = skb_gro_header_slow(skb, hlen, 

[PATCH net-next v3 0/3] geneve: Add support for Remote Checksum Offload

2015-12-10 Thread Tom Herbert
This patch set adds UDP checksum configuration via netlink and
Remote Checksum Offload for Geneve.

v2:
  - Fix typo in commit log

v3:
  - Fix issue of taking sizeof a pointer instead of the actual object (for real)

Testing (10Gbps mlx4):

Single connection TCP_STREAM in netperf

  - No UDP checksums, no RCO
 4371.9 Mbps

  - UDP checksums enabled, no RCO
 7263.4 Mbps
   
  - UDP checksums enabled, RCO enabled
 7607.6 Mbps

200 TCP_RR streams

  - No UDP checksums, no RCO
55.05% CPU utilization
879284.9 tps
184/231/742 50/90/99% latencies

  - UDP checksums enabled, no RCO
55.46% CPU utilization
901785 tps
176/222/738 50/90/99% latencies

  - UDP checksums enabled, RCO enabled
52.36% CPU utilization
910582 tps
174/218/706 50/90/99% latencies


Tom Herbert (3):
  rco: Clean up casting errors
  geneve: UDP checksum configuration via netlink
  geneve: Remote Checksum Offload support

 drivers/net/geneve.c | 249 ++-
 include/net/checksum.h   |   3 +-
 include/net/geneve.h |  22 +++-
 include/uapi/linux/if_link.h |   6 ++
 4 files changed, 246 insertions(+), 34 deletions(-)

-- 
2.4.6



[PATCH net-next v3 2/3] geneve: UDP checksum configuration via netlink

2015-12-10 Thread Tom Herbert
Add support to enable and disable UDP checksums via netlink. This is
similar to how VXLAN and GUE allow this. This includes support for
enabling the UDP zero checksum (for both TX and RX).

Signed-off-by: Tom Herbert 
---
 drivers/net/geneve.c | 93 +---
 include/uapi/linux/if_link.h |  3 ++
 2 files changed, 73 insertions(+), 23 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index de5c30c..0750d7a 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -71,8 +71,14 @@ struct geneve_dev {
__be16 dst_port;
bool   collect_md;
struct gro_cells   gro_cells;
+   u32 flags;
 };
 
+/* Geneve device flags */
+#define GENEVE_F_UDP_CSUM  BIT(0)
+#define GENEVE_F_UDP_ZERO_CSUM6_TX BIT(1)
+#define GENEVE_F_UDP_ZERO_CSUM6_RX BIT(2)
+
 struct geneve_sock {
bool collect_md;
struct list_head list;
@@ -81,6 +87,7 @@ struct geneve_sock {
int refcnt;
struct udp_offload  udp_offloads;
struct hlist_head   vni_list[VNI_HASH_SIZE];
+   u32 flags;
 };
 
 static inline __u32 geneve_net_vni_hash(u8 vni[3])
@@ -343,7 +350,7 @@ error:
 }
 
 static struct socket *geneve_create_sock(struct net *net, bool ipv6,
-__be16 port)
+__be16 port, u32 flags)
 {
struct socket *sock;
struct udp_port_cfg udp_conf;
@@ -354,6 +361,8 @@ static struct socket *geneve_create_sock(struct net *net, 
bool ipv6,
if (ipv6) {
udp_conf.family = AF_INET6;
udp_conf.ipv6_v6only = 1;
+   udp_conf.use_udp6_rx_checksums =
+   !(flags & GENEVE_F_UDP_ZERO_CSUM6_RX);
} else {
udp_conf.family = AF_INET;
udp_conf.local_ip.s_addr = htonl(INADDR_ANY);
@@ -480,7 +489,7 @@ static int geneve_gro_complete(struct sk_buff *skb, int 
nhoff,
 
 /* Create new listen socket if needed */
 static struct geneve_sock *geneve_socket_create(struct net *net, __be16 port,
-   bool ipv6)
+   bool ipv6, u32 flags)
 {
struct geneve_net *gn = net_generic(net, geneve_net_id);
struct geneve_sock *gs;
@@ -492,7 +501,7 @@ static struct geneve_sock *geneve_socket_create(struct net 
*net, __be16 port,
if (!gs)
return ERR_PTR(-ENOMEM);
 
-   sock = geneve_create_sock(net, ipv6, port);
+   sock = geneve_create_sock(net, ipv6, port, flags);
if (IS_ERR(sock)) {
kfree(gs);
return ERR_CAST(sock);
@@ -575,12 +584,13 @@ static int geneve_sock_add(struct geneve_dev *geneve, 
bool ipv6)
goto out;
}
 
-   gs = geneve_socket_create(net, geneve->dst_port, ipv6);
+   gs = geneve_socket_create(net, geneve->dst_port, ipv6, geneve->flags);
if (IS_ERR(gs))
return PTR_ERR(gs);
 
 out:
gs->collect_md = geneve->collect_md;
+   gs->flags = geneve->flags;
 #if IS_ENABLED(CONFIG_IPV6)
if (ipv6)
geneve->sock6 = gs;
@@ -642,11 +652,12 @@ static void geneve_build_header(struct genevehdr *geneveh,
 
 static int geneve_build_skb(struct rtable *rt, struct sk_buff *skb,
__be16 tun_flags, u8 vni[3], u8 opt_len, u8 *opt,
-   bool csum, bool xnet)
+   u32 flags, bool xnet)
 {
struct genevehdr *gnvh;
int min_headroom;
int err;
+   bool udp_sum = !!(flags & GENEVE_F_UDP_CSUM);
 
skb_scrub_packet(skb, xnet);
 
@@ -658,7 +669,7 @@ static int geneve_build_skb(struct rtable *rt, struct 
sk_buff *skb,
goto free_rt;
}
 
-   skb = udp_tunnel_handle_offloads(skb, csum);
+   skb = udp_tunnel_handle_offloads(skb, udp_sum);
if (IS_ERR(skb)) {
err = PTR_ERR(skb);
goto free_rt;
@@ -678,11 +689,12 @@ free_rt:
 #if IS_ENABLED(CONFIG_IPV6)
 static int geneve6_build_skb(struct dst_entry *dst, struct sk_buff *skb,
 __be16 tun_flags, u8 vni[3], u8 opt_len, u8 *opt,
-bool csum, bool xnet)
+u32 flags, bool xnet)
 {
struct genevehdr *gnvh;
int min_headroom;
int err;
+   bool udp_sum = !(flags & GENEVE_F_UDP_ZERO_CSUM6_TX);
 
skb_scrub_packet(skb, xnet);
 
@@ -694,7 +706,7 @@ static int geneve6_build_skb(struct dst_entry *dst, struct 
sk_buff *skb,
goto free_dst;
}
 
-   skb = udp_tunnel_handle_offloads(skb, csum);
+   skb = udp_tunnel_handle_offloads(skb, udp_sum);
if (IS_ERR(skb)) {
err = PTR_ERR(skb);
goto free_dst;
@@ 

Re: [PATCH v2] r8169: Don't claim WoL works if LanWake flag is not set

2015-12-10 Thread Francois Romieu
Corinna Vinschen  :
[...]
> I could do this (after I could lay my hands on such a board, that is),
> but I'm not convinced that this makes a lot of sense for two reasons.

Ok, let's get this change applied. Whatever happens should not be
hard to manage (I'm thinking about other boards or BIOSes relying on the
current - broken as it can be - behavior to work correctly).

[...]
> 1.  There is no global change in behaviour.  The usual way to handle the
> WoL flags is to set the affected method flags and additionally set
> LanWake if any of the method flags is set.  The fact that the method
> flags don't enable WoL without also setting the LanWake flag is
> documented.

I see no such thing in "7.5 Power Management Function" of the 8168c
registers datasheet. While Config3 states Magic Packet and Link Up
dependencies on Config1.PMEn, it says nothing about Config5.LanWake.

On old 8169 chipsets LanWake is autoloaded from EEPROM.

Plausible for Config5.{B, M, U}WF ? Ok.

Documented ? I am genuinely curious to know where.

-- 
Ueimor


Re: Checksum offload queries

2015-12-10 Thread Rustad, Mark D
Edward Cree  wrote:

> I have just realised something startling.  Assuming the inner protocol uses 
> the ones complement checksum in the way IP, UDP and TCP do, the outer 
> checksum can be computed *without looking at the payload*.  Why?  Because the 
> ones complement sum of (say) a correctly checksummed UDP datagram is simply 
> the complement of the ones complement sum of the pseudo header.  Similarly, 
> the ones complement sum of a correctly checksummed IP header is zero.
> Therefore, the outer checksum depends _only_ on the inner and outer pseudo 
> headers and the encapsulation headers.  For example, with UDP encapsulated in 
> VXLAN, we have the following packet structure:
> ETH IP UDP VXLAN inner-ETH inner-IP inner-UDP PAYLOAD
> and the outer checksum equals
> ~([outer_pseudo] + [UDP] + [VXLAN] + [inner-ETH] + ~[inner_pseudo])
> where [] denotes summation, and all addition is ones complement.
> This can easily be computed in software, especially as the stack already has 
> ~[inner_pseudo]: it's stored in the inner checksum field to help inner 
> checksum offload.
> 
> Have I made a mistake in my ones-complement maths, or is outer checksum 
> offload as unnecessary as IP header checksum offload?

I agree with the overall observation, in that the outer checksum can be derived 
from the inner one. I think that the inner-ip header needs to be added (after 
subtracting out the inner_pseudo as you indicate above), because the entire raw 
inner IP header needs to be included in the outer checksum. I haven't thought 
this all through in detail yet. It would be really nice to have a function that 
implemented something like this. Could one be structured to handle most 
encapsulations?

--
Mark Rustad, Networking Division, Intel Corporation




Re: OVS VXLAN decap rule has full match on TTL for the outer headers?

2015-12-10 Thread Or Gerlitz
On Wed, Dec 9, 2015 at 2:22 AM, Joe Stringer  wrote:
> On 8 December 2015 at 13:23, Or Gerlitz  wrote:
>> On Tue, Dec 8, 2015 at 9:20 PM, Joe Stringer  wrote:

>>>> Apologies for the delayed response, we haven't found anything
>>>> interesting yet although we've mostly looked at plain set-field
>>>> actions with a combination of kernel/userspace versions. I plan to
>>>> carve out some time later this week to take another look.

>>> (resending due to teething issues with new email and plain-text, sorry
>>> for the spam)

>>> As far as the mask, I briefly discussed this with Jarno and it seems
>>> like it could be something as simple as zeroing the ip_ttl mask in
>>> tnl_wc_init().

>> to make sure I follow, will that have the consequence that we (user +
>> kernel) will practically not be testing the ttl for these flows?

> Yes, it would cause userspace to 'wildcard' the field so the kernel
> flows that are installed will ignore it during lookup.

Cool, any chance this is gonna fit into your schedule to meet 4.4? if
not, for 4.5?

Also, can the patch be made simple/small enough to go into -stable as well?

Or.


Re: [RFC PATCHv2 net-next 1/2] tcp: RTO Restart (RTOR)

2015-12-10 Thread Per Hurtig


> On 10 Dec 2015, at 16:37, Neal Cardwell wrote:
> 
>> On Thu, Dec 10, 2015 at 1:51 AM, Per Hurtig  wrote:
>> 
>>> On 08 Dec 2015, at 14:47, Eric Dumazet  wrote:
>>> 
>>> On Tue, 2015-12-08 at 10:19 +0100, Per Hurtig wrote:
>>> 
>>>> +static u32 tcp_unsent_pkts(const struct sock *sk, u32 ulimit)
>>>> +{
>>>> +	struct sk_buff *skb = tcp_send_head(sk);
>>>> +	u32 pkts = 0;
>>>> +
>>>> +	if (skb)
>>>> +		tcp_for_write_queue_from(skb, sk) {
>>>> +			pkts += tcp_skb_pcount(skb);
>>>> +
>>>> +			if (ulimit && pkts >= ulimit)
>>>> +				return ulimit;
>>>> +		}
>>>> +
>>>> +	return pkts;
>>>> +}
>>> 
>>> 
>>> Considering Yuchung feedback, have you looked at using an approximation
>>> instead ?
>>> 
>>> (ie using tp->write_seq - tp->snd_nxt)
>> 
>> Well, an approximation is rather “dangerous” as missing a single packet
>> could inhibit the desired behaviour. If looping is undesired, I think a
>> better solution is to actually *not* do this check at all and instead rely
>> solely on the
>> 
>> tp->packets_out < TCP_RTORESTART_THRESH
> 
> Yes, this simpler version seems very much preferable, IMHO. I agree
> that it does not seem worth the complexity to try to cover the kind of
> corner cases you outline.
> 
> I would also suggest a TCP_RTORESTART_THRESH value higher than 4.
> 
> In the ID at https://tools.ietf.org/html/draft-ietf-tcpm-rtorestart-10 it 
> says:
> 
>   The RECOMMENDED value of rrthresh is four, as this value will ensure
>   that RTOR is only used when fast retransmit cannot be triggered.
> 
> But my sense is that fast retransmit is often not triggered at
> in-flight counts of much higher than 4, due to drop-tail queues, TSO
> bursts, the initial IW10 being unpaced, etc. It would be interesting
> to see A/B experiments for a few TCP_RTORESTART_THRESH values, say, 4
> vs 10.
> 
> neal
Sure. One idea could also be to use the "reordering" value as a dynamic
threshold?
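
A rough userspace sketch of what the simpler check could look like with either
a fixed or a "reordering"-derived threshold. Everything here is an assumption
made for illustration: the names rtor_applies, rtor_dynamic_thresh and
RTOR_STATIC_THRESH are hypothetical, and deriving the threshold as
reordering + 1 is only one possible reading of the idea (fast retransmit needs
"reordering" duplicate ACKs, so it can only fire with more segments than that
in flight). It is not the code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* rrthresh value recommended by the draft quoted above. */
#define RTOR_STATIC_THRESH 4

/*
 * Hypothetical dynamic threshold: one more than the reordering metric,
 * but never below the static default.
 */
static uint32_t rtor_dynamic_thresh(uint32_t reordering)
{
	uint32_t t = reordering + 1;

	return t > RTOR_STATIC_THRESH ? t : RTOR_STATIC_THRESH;
}

/*
 * Mirrors the "tp->packets_out < TCP_RTORESTART_THRESH" test from the
 * thread, with the threshold passed in instead of being a constant.
 */
static bool rtor_applies(uint32_t packets_out, uint32_t thresh)
{
	return packets_out < thresh;
}
```

With the static threshold, three segments in flight qualify and four do not;
with a reordering metric of 10, the same check would also cover eight segments
in flight. Whether that is desirable is exactly what the A/B experiments
suggested above would have to show.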

-- Per


Re: Checksum offload queries

2015-12-10 Thread Edward Cree
On 10/12/15 16:26, Tom Herbert wrote:
> It sounds like potentially interesting work. You'll probably want my patches 
> that provide helper functions that allow a driver to verify that it can 
> offload a checksum. We'll have to update those also to allow two checksums. 
I have just realised something startling.  Assuming the inner protocol uses the 
ones complement checksum in the way IP, UDP and TCP do, the outer checksum can 
be computed *without looking at the payload*.  Why?  Because the ones 
complement sum of (say) a correctly checksummed UDP datagram is simply the 
complement of the ones complement sum of the pseudo header.  Similarly, the 
ones complement sum of a correctly checksummed IP header is zero.
Therefore, the outer checksum depends _only_ on the inner and outer pseudo 
headers and the encapsulation headers.  For example, with UDP encapsulated in 
VXLAN, we have the following packet structure:
ETH IP UDP VXLAN inner-ETH inner-IP inner-UDP PAYLOAD
and the outer checksum equals
~([outer_pseudo] + [UDP] + [VXLAN] + [inner-ETH] + ~[inner_pseudo])
where [] denotes summation, and all addition is ones complement.
This can easily be computed in software, especially as the stack already has 
~[inner_pseudo]: it's stored in the inner checksum field to help inner checksum 
offload.

Have I made a mistake in my ones-complement maths, or is outer checksum offload 
as unnecessary as IP header checksum offload?
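
The identity can be checked with a small userspace model. This is only a
sketch under stated assumptions: the function names (oc_add, oc_sum,
outer_csum_full, outer_csum_shortcut, outer_csum_demo) are hypothetical and
not kernel APIs, the pseudo header sums are made-up 16-bit constants, and the
packet bytes are arbitrary filler. It compares summing the whole packet with
summing only the encapsulation headers plus ~[inner_pseudo].

```c
#include <stddef.h>
#include <stdint.h>

/* Ones-complement add of two 16-bit values with end-around carry. */
static uint16_t oc_add(uint16_t a, uint16_t b)
{
	uint32_t s = (uint32_t)a + b;

	return (uint16_t)((s & 0xffff) + (s >> 16));
}

/*
 * Ones-complement sum of a buffer read as big-endian 16-bit words.
 * Meant for buffers under 64 KiB so the accumulator cannot overflow.
 */
static uint16_t oc_sum(const uint8_t *p, size_t len)
{
	uint32_t acc = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		acc += ((uint32_t)p[i] << 8) | p[i + 1];
	if (len & 1)
		acc += (uint32_t)p[len - 1] << 8;
	while (acc >> 16)
		acc = (acc & 0xffff) + (acc >> 16);
	return (uint16_t)acc;
}

/* Reference: sum everything after the outer IP header, payload included. */
static uint16_t outer_csum_full(uint16_t outer_pseudo,
				const uint8_t *pkt, size_t len)
{
	return (uint16_t)~oc_add(outer_pseudo, oc_sum(pkt, len));
}

/* Shortcut: encapsulation headers plus ~[inner_pseudo], no payload. */
static uint16_t outer_csum_shortcut(uint16_t outer_pseudo,
				    const uint8_t *encap, size_t encap_len,
				    uint16_t inner_pseudo)
{
	uint16_t s = oc_add(outer_pseudo, oc_sum(encap, encap_len));

	return (uint16_t)~oc_add(s, (uint16_t)~inner_pseudo);
}

/* Builds a toy UDP/VXLAN/ETH/IP/UDP packet; returns 1 if both agree. */
static int outer_csum_demo(void)
{
	enum { ENCAP = 8 + 8 + 14, IP = 20, UDP = 8, DATA = 5 };
	uint8_t pkt[ENCAP + IP + UDP + DATA];
	uint8_t *ip = pkt + ENCAP, *udp = ip + IP;
	const uint16_t outer_pseudo = 0x4a3c, inner_pseudo = 0x1234;
	uint16_t c;
	size_t i;

	for (i = 0; i < sizeof(pkt); i++)	/* arbitrary filler bytes */
		pkt[i] = (uint8_t)(i * 37 + 11);
	pkt[6] = pkt[7] = 0;			/* outer UDP checksum field */

	ip[10] = ip[11] = 0;			/* inner IP header checksum */
	c = (uint16_t)~oc_sum(ip, IP);
	ip[10] = c >> 8;
	ip[11] = c & 0xff;

	udp[6] = udp[7] = 0;			/* inner UDP checksum */
	c = (uint16_t)~oc_add(inner_pseudo, oc_sum(udp, UDP + DATA));
	udp[6] = c >> 8;
	udp[7] = c & 0xff;

	return outer_csum_full(outer_pseudo, pkt, sizeof(pkt)) ==
	       outer_csum_shortcut(outer_pseudo, pkt, ENCAP, inner_pseudo);
}
```

In this model the inner IP header drops out because a correctly checksummed
header sums to zero in ones-complement arithmetic, and the inner UDP datagram
contributes exactly ~[inner_pseudo]. The model relies on every header having
an even length, so each part starts on a 16-bit boundary.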


Re: [PATCH net-next v3 3/3] geneve: Remote Checksum Offload support

2015-12-10 Thread Jesse Gross
On Thu, Dec 10, 2015 at 12:37 PM, Tom Herbert  wrote:
> Add support for remote checksum offload in both the normal and GRO
> paths. Netlink commands are used to enable sending of the Remote
> Checksum Data, and allow processing of it on receive.
>
> Signed-off-by: Tom Herbert 

Tom, can you please split this patch off and mark it as RFC or similar?

I don't have any objections to implementing remote checksum offload
for Geneve in general but I think that it's pretty clear that the
format that you are using here is not the direction that the protocol
is going to evolve. We don't need to fragment the protocol by applying
this at this time.


Re: OVS VXLAN decap rule has full match on TTL for the outer headers?

2015-12-10 Thread Joe Stringer
On 10 December 2015 at 13:06, Or Gerlitz  wrote:
> On Wed, Dec 9, 2015 at 2:22 AM, Joe Stringer  wrote:
>> On 8 December 2015 at 13:23, Or Gerlitz  wrote:
>>> On Tue, Dec 8, 2015 at 9:20 PM, Joe Stringer  wrote:
>
>>>>> Apologies for the delayed response, we haven't found anything
>>>>> interesting yet although we've mostly looked at plain set-field
>>>>> actions with a combination of kernel/userspace versions. I plan to
>>>>> carve out some time later this week to take another look.
>
>>>> (resending due to teething issues with new email and plain-text, sorry
>>>> for the spam)
>
>>>> As far as the mask, I briefly discussed this with Jarno and it seems
>>>> like it could be something as simple as zeroing the ip_ttl mask in
>>>> tnl_wc_init().
>
>>> to make sure I follow, will that have the consequence that we (user +
>>> kernel) will practically not be testing the ttl for these flows?
>
>> Yes, it would cause userspace to 'wildcard' the field so the kernel
>> flows that are installed will ignore it during lookup.
>
> Cool, any chance this is gonna fit into your schedule to meet 4.4? if
> not, for 4.5?
>
> Also, can the patch be made simple/small enough to go into -stable as well?

It's a userspace change.
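
For readers following the mask discussion, here is a minimal sketch of how a
zeroed mask wildcards a field during a masked lookup. It is an illustration
only: struct tnl_key, tnl_masked_match and ttl_lookup_demo are hypothetical
names, the key is reduced to two fields, and none of this is the OVS data
layout or lookup code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced tunnel match key (not the OVS layout). */
struct tnl_key {
	uint32_t ip_dst;
	uint8_t ip_ttl;
};

/* A packet matches when it equals the flow on every bit the mask selects. */
static bool tnl_masked_match(const struct tnl_key *pkt,
			     const struct tnl_key *flow,
			     const struct tnl_key *mask)
{
	return ((pkt->ip_dst ^ flow->ip_dst) & mask->ip_dst) == 0 &&
	       ((pkt->ip_ttl ^ flow->ip_ttl) & mask->ip_ttl) == 0;
}

/*
 * Look up a packet with the given outer TTL against a flow that was
 * installed from a packet with TTL 64, with or without a TTL wildcard.
 */
static bool ttl_lookup_demo(uint8_t pkt_ttl, bool wildcard_ttl)
{
	const struct tnl_key flow = { .ip_dst = 0x0a000001, .ip_ttl = 64 };
	const struct tnl_key mask = {
		.ip_dst = 0xffffffff,
		.ip_ttl = wildcard_ttl ? 0x00 : 0xff,
	};
	const struct tnl_key pkt = { .ip_dst = 0x0a000001, .ip_ttl = pkt_ttl };

	return tnl_masked_match(&pkt, &flow, &mask);
}
```

With an exact TTL mask, a packet arriving with TTL 63 misses the flow that was
set up from a TTL 64 packet; with the TTL mask zeroed, both hit the same flow.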


Re: [PATCH v3 2/4] VSOCK: Introduce virtio-vsock.ko

2015-12-10 Thread Alex Bennée

Stefan Hajnoczi  writes:

> From: Asias He 
>
> VM sockets virtio transport implementation. This module runs in guest
> kernel.

checkpatch warns on a bunch of whitespace/tab issues.

>
> Signed-off-by: Asias He 
> Signed-off-by: Stefan Hajnoczi 
> ---
> v2:
>  * Fix total_tx_buf accounting
>  * Add virtio_transport global mutex to prevent races
> ---
>  net/vmw_vsock/virtio_transport.c | 466 
> +++
>  1 file changed, 466 insertions(+)
>  create mode 100644 net/vmw_vsock/virtio_transport.c
>
> diff --git a/net/vmw_vsock/virtio_transport.c 
> b/net/vmw_vsock/virtio_transport.c
> new file mode 100644
> index 000..df65dca
> --- /dev/null
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -0,0 +1,466 @@
> +/*
> + * virtio transport for vsock
> + *
> + * Copyright (C) 2013-2015 Red Hat, Inc.
> + * Author: Asias He 
> + * Stefan Hajnoczi 
> + *
> + * Some of the code is take from Gerd Hoffmann 's
> + * early virtio-vsock proof-of-concept bits.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static struct workqueue_struct *virtio_vsock_workqueue;
> +static struct virtio_vsock *the_virtio_vsock;
> +static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
> +static void virtio_vsock_rx_fill(struct virtio_vsock *vsock);
> +
> +struct virtio_vsock {
> + /* Virtio device */
> + struct virtio_device *vdev;
> + /* Virtio virtqueue */
> + struct virtqueue *vqs[VSOCK_VQ_MAX];
> + /* Wait queue for send pkt */
> + wait_queue_head_t queue_wait;
> + /* Work item to send pkt */
> + struct work_struct tx_work;
> + /* Work item to recv pkt */
> + struct work_struct rx_work;
> + /* Mutex to protect send pkt*/
> + struct mutex tx_lock;
> + /* Mutex to protect recv pkt*/
> + struct mutex rx_lock;

Further down I got confused by what lock was what and exactly what was
being protected. If the receive and transmit paths touch separate things
it might be worth re-arranging the structure to make it clearer, eg:

   /* The transmit path is protected by tx_lock */
   struct mutex tx_lock;
   struct work_struct tx_work;
   ..
   ..

   /* The receive path is protected by rx_lock */
   wait_queue_head_t queue_wait;
   ..
   ..

 Which might make things a little clearer. Then all the redundant
 information in the comments can be removed. I don't need to know what
 is a Virtio device, virtqueue or wait_queue etc as they are implicit in
 the structure name.

> + /* Number of recv buffers */
> + int rx_buf_nr;
> + /* Number of max recv buffers */
> + int rx_buf_max_nr;
> + /* Used for global tx buf limitation */
> + u32 total_tx_buf;
> + /* Guest context id, just like guest ip address */
> + u32 guest_cid;
> +};
> +
> +static struct virtio_vsock *virtio_vsock_get(void)
> +{
> + return the_virtio_vsock;
> +}
> +
> +static u32 virtio_transport_get_local_cid(void)
> +{
> + struct virtio_vsock *vsock = virtio_vsock_get();
> +
> + return vsock->guest_cid;
> +}
> +
> +static int
> +virtio_transport_send_pkt(struct vsock_sock *vsk,
> +   struct virtio_vsock_pkt_info *info)
> +{
> + u32 src_cid, src_port, dst_cid, dst_port;
> + int ret, in_sg = 0, out_sg = 0;
> + struct virtio_transport *trans;
> + struct virtio_vsock_pkt *pkt;
> + struct virtio_vsock *vsock;
> + struct scatterlist hdr, buf, *sgs[2];
> + struct virtqueue *vq;
> + u32 pkt_len = info->pkt_len;
> + DEFINE_WAIT(wait);
> +
> + vsock = virtio_vsock_get();
> + if (!vsock)
> + return -ENODEV;
> +
> + src_cid = virtio_transport_get_local_cid();
> + src_port = vsk->local_addr.svm_port;
> + if (!info->remote_cid) {
> + dst_cid = vsk->remote_addr.svm_cid;
> + dst_port = vsk->remote_addr.svm_port;
> + } else {
> + dst_cid = info->remote_cid;
> + dst_port = info->remote_port;
> + }
> +
> + trans = vsk->trans;
> + vq = vsock->vqs[VSOCK_VQ_TX];
> +
> + if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
> + pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> + pkt_len = virtio_transport_get_credit(trans, pkt_len);
> + /* Do not send zero length OP_RW pkt*/
> + if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> + return pkt_len;
> +
> + /* Respect global tx buf limitation */
> + mutex_lock(&vsock->tx_lock);
> + while (pkt_len + vsock->total_tx_buf > VIRTIO_VSOCK_MAX_TX_BUF_SIZE) {
> + prepare_to_wait_exclusive(&vsock->queue_wait, &wait,
> +   TASK_UNINTERRUPTIBLE);
> + mutex_unlock(&vsock->tx_lock);
> +

[PATCH net-next] bpf, inode: allow for rename and link ops

2015-12-10 Thread Daniel Borkmann
Add support for renaming and hard links to the fs. Most of this can be
implemented by using simple library operations under the same constraints
that we don't use a reserved name like elsewhere. Linking can be useful
to share/manage things like maps across subsystem users. It works within
the file system boundary, but is not allowed for directories.

Symbolic links are explicitly not implemented here, as it can be better
done already by doing bind mounts inside bpf fs to set up shared directories
f.e. useful when using volumes in docker containers that map a private
working directory into /sys/fs/bpf/ which contains itself a bind mounted
path from the host's /sys/fs/bpf/ mount that is shared among multiple
containers. For single maps instead of whole directory, hard links can
be easily used to do the same.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 kernel/bpf/inode.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 5a8a797..f2ece3c 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -187,11 +187,31 @@ static int bpf_mkobj(struct inode *dir, struct dentry 
*dentry, umode_t mode,
}
 }
 
+static int bpf_link(struct dentry *old_dentry, struct inode *dir,
+   struct dentry *new_dentry)
+{
+   if (bpf_dname_reserved(new_dentry))
+   return -EPERM;
+
+   return simple_link(old_dentry, dir, new_dentry);
+}
+
+static int bpf_rename(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry)
+{
+   if (bpf_dname_reserved(new_dentry))
+   return -EPERM;
+
+   return simple_rename(old_dir, old_dentry, new_dir, new_dentry);
+}
+
 static const struct inode_operations bpf_dir_iops = {
.lookup = simple_lookup,
.mknod  = bpf_mkobj,
.mkdir  = bpf_mkdir,
.rmdir  = simple_rmdir,
+   .rename = bpf_rename,
+   .link   = bpf_link,
.unlink = simple_unlink,
 };
 
-- 
1.9.3
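
From userspace, the two new operations are reached through the ordinary
link(2) and rename(2) system calls. The wrappers below are a minimal sketch;
pin_share and pin_move are hypothetical helper names, not part of any existing
library, and the paths in the note that follows are made up.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hard-link an existing pinned object under a second name.
 * Returns 0 on success or a negative errno value.
 */
static int pin_share(const char *pinned, const char *alias)
{
	return link(pinned, alias) < 0 ? -errno : 0;
}

/* Rename a pinned object. Returns 0 or a negative errno value. */
static int pin_move(const char *from, const char *to)
{
	return rename(from, to) < 0 ? -errno : 0;
}
```

For example, pin_share("/sys/fs/bpf/a/map", "/sys/fs/bpf/b/map") would expose
one pinned map under two directories of the same bpf fs mount. Going by the
patch, a reserved target name makes either call fail with EPERM.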



[PATCH net-next v2 2/3] geneve: UDP checksum configuration via netlink

2015-12-10 Thread Tom Herbert
Add support to enable and disable UDP checksums via netlink. This is
similar to how VXLAN and GUE allow this. This includes support for
enabling the UDP zero checksum (for both TX and RX).

Signed-off-by: Tom Herbert 
---
 drivers/net/geneve.c | 93 +---
 include/uapi/linux/if_link.h |  3 ++
 2 files changed, 73 insertions(+), 23 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index de5c30c..0750d7a 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -71,8 +71,14 @@ struct geneve_dev {
__be16 dst_port;
bool   collect_md;
struct gro_cells   gro_cells;
+   u32 flags;
 };
 
+/* Geneve device flags */
+#define GENEVE_F_UDP_CSUM  BIT(0)
+#define GENEVE_F_UDP_ZERO_CSUM6_TX BIT(1)
+#define GENEVE_F_UDP_ZERO_CSUM6_RX BIT(2)
+
 struct geneve_sock {
bool collect_md;
struct list_head list;
@@ -81,6 +87,7 @@ struct geneve_sock {
int refcnt;
struct udp_offload  udp_offloads;
struct hlist_head   vni_list[VNI_HASH_SIZE];
+   u32 flags;
 };
 
 static inline __u32 geneve_net_vni_hash(u8 vni[3])
@@ -343,7 +350,7 @@ error:
 }
 
 static struct socket *geneve_create_sock(struct net *net, bool ipv6,
-__be16 port)
+__be16 port, u32 flags)
 {
struct socket *sock;
struct udp_port_cfg udp_conf;
@@ -354,6 +361,8 @@ static struct socket *geneve_create_sock(struct net *net, 
bool ipv6,
if (ipv6) {
udp_conf.family = AF_INET6;
udp_conf.ipv6_v6only = 1;
+   udp_conf.use_udp6_rx_checksums =
+   !(flags & GENEVE_F_UDP_ZERO_CSUM6_RX);
} else {
udp_conf.family = AF_INET;
udp_conf.local_ip.s_addr = htonl(INADDR_ANY);
@@ -480,7 +489,7 @@ static int geneve_gro_complete(struct sk_buff *skb, int 
nhoff,
 
 /* Create new listen socket if needed */
 static struct geneve_sock *geneve_socket_create(struct net *net, __be16 port,
-   bool ipv6)
+   bool ipv6, u32 flags)
 {
struct geneve_net *gn = net_generic(net, geneve_net_id);
struct geneve_sock *gs;
@@ -492,7 +501,7 @@ static struct geneve_sock *geneve_socket_create(struct net 
*net, __be16 port,
if (!gs)
return ERR_PTR(-ENOMEM);
 
-   sock = geneve_create_sock(net, ipv6, port);
+   sock = geneve_create_sock(net, ipv6, port, flags);
if (IS_ERR(sock)) {
kfree(gs);
return ERR_CAST(sock);
@@ -575,12 +584,13 @@ static int geneve_sock_add(struct geneve_dev *geneve, 
bool ipv6)
goto out;
}
 
-   gs = geneve_socket_create(net, geneve->dst_port, ipv6);
+   gs = geneve_socket_create(net, geneve->dst_port, ipv6, geneve->flags);
if (IS_ERR(gs))
return PTR_ERR(gs);
 
 out:
gs->collect_md = geneve->collect_md;
+   gs->flags = geneve->flags;
 #if IS_ENABLED(CONFIG_IPV6)
if (ipv6)
geneve->sock6 = gs;
@@ -642,11 +652,12 @@ static void geneve_build_header(struct genevehdr *geneveh,
 
 static int geneve_build_skb(struct rtable *rt, struct sk_buff *skb,
__be16 tun_flags, u8 vni[3], u8 opt_len, u8 *opt,
-   bool csum, bool xnet)
+   u32 flags, bool xnet)
 {
struct genevehdr *gnvh;
int min_headroom;
int err;
+   bool udp_sum = !!(flags & GENEVE_F_UDP_CSUM);
 
skb_scrub_packet(skb, xnet);
 
@@ -658,7 +669,7 @@ static int geneve_build_skb(struct rtable *rt, struct 
sk_buff *skb,
goto free_rt;
}
 
-   skb = udp_tunnel_handle_offloads(skb, csum);
+   skb = udp_tunnel_handle_offloads(skb, udp_sum);
if (IS_ERR(skb)) {
err = PTR_ERR(skb);
goto free_rt;
@@ -678,11 +689,12 @@ free_rt:
 #if IS_ENABLED(CONFIG_IPV6)
 static int geneve6_build_skb(struct dst_entry *dst, struct sk_buff *skb,
 __be16 tun_flags, u8 vni[3], u8 opt_len, u8 *opt,
-bool csum, bool xnet)
+u32 flags, bool xnet)
 {
struct genevehdr *gnvh;
int min_headroom;
int err;
+   bool udp_sum = !(flags & GENEVE_F_UDP_ZERO_CSUM6_TX);
 
skb_scrub_packet(skb, xnet);
 
@@ -694,7 +706,7 @@ static int geneve6_build_skb(struct dst_entry *dst, struct 
sk_buff *skb,
goto free_dst;
}
 
-   skb = udp_tunnel_handle_offloads(skb, csum);
+   skb = udp_tunnel_handle_offloads(skb, udp_sum);
if (IS_ERR(skb)) {
err = PTR_ERR(skb);
goto free_dst;
@@ 

[PATCH net-next v2 0/3] geneve: Add support for Remote Checksum Offload

2015-12-10 Thread Tom Herbert
This patch set adds UDP checksum configuration via netlink and
Remote Checksum Offload for Geneve.

v2:
  - Fix issue of taking sizeof a pointer instead of the actual object
  - Fix typo in commit log

Testing (10Gbps mlx4):

Single connection TCP_STREAM in netperf

  - No UDP checksums, no RCO
 4371.9 Mbps

  - UDP checksums enabled, no RCO
 7263.4 Mbps
   
  - UDP checksums enabled, RCO enabled
 7607.6 Mbps

200 TCP_RR streams

  - No UDP checksums, no RCO
55.05% CPU utilization
879284.9 tps
184/231/742 50/90/99% latencies

  - UDP checksums enabled, no RCO
55.46% CPU utilization
901785 tps
176/222/738 50/90/99% latencies

  - UDP checksums enabled, RCO enabled
52.36% CPU utilization
910582 tps
174/218/706 50/90/99% latencies


Tom Herbert (3):
  rco: Clean up casting errors
  geneve: UDP checksum configuration via netlink
  geneve: Remote Checksum Offload support

 drivers/net/geneve.c | 249 ++-
 include/net/checksum.h   |   3 +-
 include/net/geneve.h |  22 +++-
 include/uapi/linux/if_link.h |   6 ++
 4 files changed, 246 insertions(+), 34 deletions(-)

-- 
2.4.6



[PATCH net-next v2 3/3] geneve: Remote Checksum Offload support

2015-12-10 Thread Tom Herbert
Add support for remote checksum offload in both the normal and GRO
paths. Netlink commands are used to enable sending of the Remote
Checksum Data and to allow processing of it on receive.

Signed-off-by: Tom Herbert 
---
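For illustration only, a minimal userspace sketch of the offset arithmetic
that geneve_remcsum() performs before calling skb_remcsum_process(). It
assumes GENEVE_RCO_SHIFT is 1 (its definition lives in include/net/geneve.h,
which is not shown in this excerpt); all sketch_* and SKETCH_* names are
hypothetical stand-ins, and the checksum offsets are the standard TCP and UDP
header layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value: the real GENEVE_RCO_SHIFT is defined in include/net/geneve.h,
 * which is not part of this excerpt.
 */
#define SKETCH_RCO_SHIFT	1

/* Byte offsets of the checksum field within the TCP and UDP headers */
#define SKETCH_TCP_CHECK_OFF	16
#define SKETCH_UDP_CHECK_OFF	6

#define SKETCH_GENEVE_BASE_HLEN	8	/* fixed Geneve header, without options */

struct sketch_rco {
	size_t start;	/* offset where checksumming of the payload starts */
	size_t offset;	/* offset of the checksum field to be filled in */
	size_t plen;	/* bytes that must be pullable before processing */
};

/* Mirrors the arithmetic at the top of geneve_remcsum() */
struct sketch_rco sketch_rco_offsets(unsigned int rco_start, int udp_rco,
				     size_t hdrlen)
{
	struct sketch_rco r;

	assert(hdrlen >= SKETCH_GENEVE_BASE_HLEN);

	r.start = (size_t)rco_start << SKETCH_RCO_SHIFT;
	r.offset = r.start + (udp_rco ? SKETCH_UDP_CHECK_OFF :
					SKETCH_TCP_CHECK_OFF);
	/* Header, everything up to the checksum field, and the field itself */
	r.plen = hdrlen + r.offset + sizeof(uint16_t);

	return r;
}
```

Under these assumptions, rco_start = 20 with a TCP inner packet behind an
option-less 8-byte header gives start 40, offset 56 and a pull length of 66.
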
 drivers/net/geneve.c | 162 ---
 include/net/geneve.h |  22 --
 include/uapi/linux/if_link.h |   3 +
 3 files changed, 174 insertions(+), 13 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 0750d7a..68945a4 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -78,6 +78,9 @@ struct geneve_dev {
 #define GENEVE_F_UDP_CSUM  BIT(0)
 #define GENEVE_F_UDP_ZERO_CSUM6_TX BIT(1)
 #define GENEVE_F_UDP_ZERO_CSUM6_RX BIT(2)
+#define GENEVE_F_REMCSUM_TX        BIT(3)
+#define GENEVE_F_REMCSUM_RX        BIT(4)
+#define GENEVE_F_REMCSUM_NOPARTIAL BIT(5)
 
 struct geneve_sock {
bool collect_md;
@@ -308,6 +311,33 @@ static void geneve_uninit(struct net_device *dev)
free_percpu(dev->tstats);
 }
 
+static struct genevehdr *geneve_remcsum(struct sk_buff *skb,
+   struct genevehdr *gh,
+   size_t hdrlen, bool nopartial)
+{
+   size_t start, offset, plen;
+
+   if (skb->remcsum_offload)
+   return gh;
+
+   start = gh->rco_start << GENEVE_RCO_SHIFT;
+   offset = start + (gh->udp_rco ?
+ offsetof(struct udphdr, check) :
+ offsetof(struct tcphdr, check));
+
+   plen = hdrlen + offset + sizeof(u16);
+
+   if (!pskb_may_pull(skb, plen))
+   return NULL;
+
+   gh = (struct genevehdr *)(udp_hdr(skb) + 1);
+
+   skb_remcsum_process(skb, (void *)gh + hdrlen, start, offset,
+   nopartial);
+
+   return gh;
+}
+
 /* Callback from net/ipv4/udp.c to receive packets */
 static int geneve_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 {
@@ -336,6 +366,15 @@ static int geneve_udp_encap_recv(struct sock *sk, struct 
sk_buff *skb)
if (!gs)
goto drop;
 
+   if (geneveh->rco && (gs->flags & GENEVE_F_REMCSUM_RX)) {
+   geneveh = geneve_remcsum(skb, geneveh,
+sizeof(geneveh) + opts_len,
+!!(gs->flags &
+   GENEVE_F_REMCSUM_NOPARTIAL));
+   if (unlikely(!geneveh))
+   goto drop;
+   }
+
geneve_rx(gs, skb);
return 0;
 
@@ -397,6 +436,32 @@ static int geneve_hlen(struct genevehdr *gh)
return sizeof(*gh) + gh->opt_len * 4;
 }
 
+static struct genevehdr *geneve_gro_remcsum(struct sk_buff *skb,
+   unsigned int off,
+   struct genevehdr *gh, size_t hdrlen,
+   struct gro_remcsum *grc,
+   bool nopartial)
+{
+   size_t start, offset;
+
+   if (skb->remcsum_offload)
+   return gh;
+
+   if (!NAPI_GRO_CB(skb)->csum_valid)
+   return NULL;
+
+   start = gh->rco_start << GENEVE_RCO_SHIFT;
+   offset = start + (gh->udp_rco ?
+ offsetof(struct udphdr, check) :
+ offsetof(struct tcphdr, check));
+
+   gh = skb_gro_remcsum_process(skb, (void *)gh, off, hdrlen,
+start, offset, grc, nopartial);
+
+   skb->remcsum_offload = 1;
+
+   return gh;
+}
 static struct sk_buff **geneve_gro_receive(struct sk_buff **head,
   struct sk_buff *skb,
   struct udp_offload *uoff)
@@ -407,6 +472,11 @@ static struct sk_buff **geneve_gro_receive(struct sk_buff 
**head,
const struct packet_offload *ptype;
__be16 type;
int flush = 1;
+   struct gro_remcsum grc;
+   struct geneve_sock *gs = container_of(uoff, struct geneve_sock,
+ udp_offloads);
+
+   skb_gro_remcsum_init(&grc);
 
off_gnv = skb_gro_offset(skb);
hlen = off_gnv + sizeof(*gh);
@@ -421,6 +491,16 @@ static struct sk_buff **geneve_gro_receive(struct sk_buff 
**head,
goto out;
gh_len = geneve_hlen(gh);
 
+   skb_gro_postpull_rcsum(skb, gh, gh_len);
+
+   if (gh->rco && (gs->flags & GENEVE_F_REMCSUM_RX)) {
+   gh = geneve_gro_remcsum(skb, off_gnv, gh, gh_len, &grc,
+   !!(gs->flags &
+ GENEVE_F_REMCSUM_NOPARTIAL));
+   if (unlikely(!gh))
+   goto out;
+   }
+
hlen = off_gnv + gh_len;
if (skb_gro_header_hard(skb, hlen)) {
gh = skb_gro_header_slow(skb, hlen, 

[PATCH net-next v2 1/3] rco: Clean up casting errors

2015-12-10 Thread Tom Herbert
Fix a couple of cast errors found by sparse.

Signed-off-by: Tom Herbert 
---
 include/net/checksum.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/net/checksum.h b/include/net/checksum.h
index 9fcaedf..10a16b5 100644
--- a/include/net/checksum.h
+++ b/include/net/checksum.h
@@ -165,7 +165,8 @@ static inline __wsum remcsum_adjust(void *ptr, __wsum csum,
csum = csum_sub(csum, csum_partial(ptr, start, 0));
 
/* Set derived checksum in packet */
-   delta = csum_sub(csum_fold(csum), *psum);
+   delta = csum_sub((__force __wsum)csum_fold(csum),
+(__force __wsum)*psum);
*psum = csum_fold(csum);
 
return delta;
-- 
2.4.6



Re: [PATCH net-next v4 2/8] netfilter: Factor out nf_ct_get_info().

2015-12-10 Thread Jarno Rajahalme

> On Dec 10, 2015, at 11:14 AM, Pablo Neira Ayuso  wrote:
> 
> On Tue, Dec 08, 2015 at 05:01:04PM -0800, Jarno Rajahalme wrote:
>> Define a new inline function to map conntrack status to enum
>> ip_conntrack_info.  This removes the need to otherwise duplicate this
>> code in a later patch ("openvswitch: Find existing conntrack entry
>> after upcall.").
>> 
>> Signed-off-by: Jarno Rajahalme 
>> ---
>> include/net/netfilter/nf_conntrack.h | 15 +++
>> net/netfilter/nf_conntrack_core.c| 22 +++---
>> 2 files changed, 18 insertions(+), 19 deletions(-)
>> 
>> diff --git a/include/net/netfilter/nf_conntrack.h 
>> b/include/net/netfilter/nf_conntrack.h
>> index fde4068..b3de10e 100644
>> --- a/include/net/netfilter/nf_conntrack.h
>> +++ b/include/net/netfilter/nf_conntrack.h
>> @@ -125,6 +125,21 @@ nf_ct_tuplehash_to_ctrack(const struct 
>> nf_conntrack_tuple_hash *hash)
>>  tuplehash[hash->tuple.dst.dir]);
>> }
>> 
>> +static inline enum ip_conntrack_info
>> +nf_ct_get_info(const struct nf_conntrack_tuple_hash *h)
>> +{
>> +const struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
>> +
>> +if (NF_CT_DIRECTION(h) == IP_CT_DIR_REPLY)
>> +return IP_CT_ESTABLISHED_REPLY;
>> +/* Once we've had two way comms, always ESTABLISHED. */
>> +if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status))
>> +return IP_CT_ESTABLISHED;
>> +if (test_bit(IPS_EXPECTED_BIT, &ct->status))
>> +return IP_CT_RELATED;
>> +return IP_CT_NEW;
>> +}
>> +
>> static inline u_int16_t nf_ct_l3num(const struct nf_conn *ct)
>> {
>>  return ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.l3num;
>> diff --git a/net/netfilter/nf_conntrack_core.c 
>> b/net/netfilter/nf_conntrack_core.c
>> index 3cb3cb8..70ddbd8 100644
>> --- a/net/netfilter/nf_conntrack_core.c
>> +++ b/net/netfilter/nf_conntrack_core.c
>> @@ -1056,25 +1056,9 @@ resolve_normal_ct(struct net *net, struct nf_conn 
>> *tmpl,
>>  ct = nf_ct_tuplehash_to_ctrack(h);
>> 
>>  /* It exists; we have (non-exclusive) reference. */
>> -if (NF_CT_DIRECTION(h) == IP_CT_DIR_REPLY) {
>> -*ctinfo = IP_CT_ESTABLISHED_REPLY;
>> -/* Please set reply bit if this packet OK */
>> -*set_reply = 1;
>> -} else {
>> -/* Once we've had two way comms, always ESTABLISHED. */
>> -if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
>> -pr_debug("nf_conntrack_in: normal packet for %p\n", ct);
> 
> This implicitly assumes we don't want pr_debug for nf_conntrack
> anymore. Not saying this is wrong, but we have more pr_debug() calls
> in nf_conntrack that will remain there.

Dropping pr_debug was unintentional. Unless I hear otherwise, I’ll add the 
pr_debug() back for the next version.

Thanks,

  Jarno
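
For reference, the mapping that nf_ct_get_info() factors out in the quoted
patch can be sketched in plain userspace C. The enum and bit names below are
local stand-ins rather than the kernel's definitions, and the direction test
is reduced to a boolean, so this is only an approximation of the logic:

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for enum ip_conntrack_info and the IPS_* status bits */
enum sketch_ctinfo {
	SKETCH_CT_ESTABLISHED,
	SKETCH_CT_RELATED,
	SKETCH_CT_NEW,
	SKETCH_CT_ESTABLISHED_REPLY,
};

#define SKETCH_EXPECTED		(1UL << 0)
#define SKETCH_SEEN_REPLY	(1UL << 1)

/* Same decision order as the quoted nf_ct_get_info() */
enum sketch_ctinfo sketch_ct_get_info(bool is_reply_dir, unsigned long status)
{
	if (is_reply_dir)
		return SKETCH_CT_ESTABLISHED_REPLY;
	/* Once we've had two way comms, always ESTABLISHED. */
	if (status & SKETCH_SEEN_REPLY)
		return SKETCH_CT_ESTABLISHED;
	if (status & SKETCH_EXPECTED)
		return SKETCH_CT_RELATED;
	return SKETCH_CT_NEW;
}

/* Example: an original-direction flow becomes ESTABLISHED after a reply */
void sketch_ct_example(void)
{
	assert(sketch_ct_get_info(false, 0) == SKETCH_CT_NEW);
	assert(sketch_ct_get_info(false, SKETCH_SEEN_REPLY) ==
	       SKETCH_CT_ESTABLISHED);
}
```

Note that the helper only returns the state; side effects such as set_reply
or pr_debug() stay with the caller in resolve_normal_ct().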



RE: [Intel-wired-lan] [next PATCH 03/11] ixgbe: Simplify definitions for regidx and bit in set_vfta

2015-12-10 Thread Schmitt, Phillip J


> -Original Message-
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Alexander Duyck
> Sent: Monday, November 02, 2015 5:10 PM
> To: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
> Subject: [Intel-wired-lan] [next PATCH 03/11] ixgbe: Simplify definitions for
> regidx and bit in set_vfta
> 
> This patch simplifies the logic for setting the VFTA register by reducing the
> number of conditional checks needed.  Instead we just use some boolean logic
> to generate vfta_delta, and if that is set then we XOR the VFTA by that value
> and write it back.
> 
> Signed-off-by: Alexander Duyck 

Tested-by: Phil Schmitt 
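
The vfta_delta approach described in the quoted commit message can be
sketched in userspace as follows. This is an illustration under assumptions,
not the driver code: the register file is replaced by a plain array, and
sketch_vfta and sketch_set_vfta() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_VFTA_ENTRIES	128	/* 4096 VLAN IDs / 32 bits per word */

/* Stand-in for the hardware VFTA registers */
uint32_t sketch_vfta[SKETCH_VFTA_ENTRIES];

/* Returns true when the table word had to be rewritten */
bool sketch_set_vfta(uint32_t vlan, bool vlan_on)
{
	uint32_t regidx, vfta, vfta_delta;

	assert(vlan < 4096);

	regidx = vlan / 32;
	vfta_delta = UINT32_C(1) << (vlan % 32);
	vfta = sketch_vfta[regidx];

	/* Keep the bit in the delta only if it differs from the wanted state */
	vfta_delta &= vlan_on ? ~vfta : vfta;
	if (!vfta_delta)
		return false;

	/* The delta is an XOR mask, so one XOR applies the change */
	sketch_vfta[regidx] = vfta ^ vfta_delta;
	return true;
}
```

Enabling VLAN 100 sets bit 4 of word 3; a second identical call leaves
vfta_delta at zero and skips the write.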


RE: [Intel-wired-lan] [next PATCH 08/11] ixgbe: Add support for VLAN promiscuous with SR-IOV

2015-12-10 Thread Schmitt, Phillip J


> -Original Message-
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Alexander Duyck
> Sent: Monday, November 02, 2015 5:10 PM
> To: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
> Subject: [Intel-wired-lan] [next PATCH 08/11] ixgbe: Add support for VLAN
> promiscuous with SR-IOV
> 
> This patch adds support for VLAN promiscuous with SR-IOV enabled.
> 
> The code prior to this patch was only adding the PF to VLANs that the VF had
> added.  As such enabling promiscuous mode would actually not add any
> additional VLAN filters so visibility was limited.  This led to a number of
> issues as the bridge and OVS would expect us to accept all VLAN tagged packets
> when promiscuous mode was enabled, and instead we would filter out most if not
> all depending on the configuration of the PF.
> 
> With this patch what we do is set all the bits in the VFTA and all of the VLVF
> bits associated with the pool belonging to the PF.  By doing this the PF is
> guaranteed to receive all VLAN tagged traffic associated with the RAR filters
> assigned to the PF.  In addition we will clean up those same bits in the event
> of promiscuous mode being disabled.
> 
> Signed-off-by: Alexander Duyck 

Tested-by: Phil Schmitt 


RE: [Intel-wired-lan] [next PATCH 07/11] ixgbe: Reorder search to work from the top down instead of bottom up

2015-12-10 Thread Schmitt, Phillip J


> -Original Message-
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Alexander Duyck
> Sent: Monday, November 02, 2015 5:10 PM
> To: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
> Subject: [Intel-wired-lan] [next PATCH 07/11] ixgbe: Reorder search to work
> from the top down instead of bottom up
> 
> This patch is meant to reduce the complexity of the search function used for
> finding a VLVF entry associated with a given VLAN ID.  The previous code was
> searching from bottom to top.  I reordered it to search from top to bottom.
> In addition I pulled an AND statement out of the loop and instead replaced it
> with an OR statement outside the loop.  This should help to reduce the overall
> size and complexity of the function.
> 
> There was also some formatting I cleaned up in regards to whitespace and such.
> 
> Signed-off-by: Alexander Duyck 

Tested-by: Phil Schmitt 
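
A rough userspace sketch of the top-down search described in the quoted
commit message. The table size, the enable-bit value and all sketch_* names
are assumptions made for illustration; the registers are modelled as an
array, and -1 stands in for the driver's out-of-space error.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_VLVF_ENTRIES	64			/* assumed table size */
#define SKETCH_VLVF_VIEN	UINT32_C(0x80000000)	/* assumed enable bit */

/* Stand-in for the hardware VLVF registers */
uint32_t sketch_vlvf[SKETCH_VLVF_ENTRIES];

/* Returns the matching slot, else the highest empty slot, else -1 */
int sketch_find_vlvf_slot(uint32_t vlan)
{
	int regindex, first_empty_slot = 0;
	uint32_t bits;

	assert(vlan < 4096);

	/* VLAN 0 short-circuits to slot 0 in this sketch */
	if (vlan == 0)
		return 0;

	/* OR in the enable bit once, instead of masking on every iteration */
	vlan |= SKETCH_VLVF_VIEN;

	/* Walk from the top entry down to entry 1 */
	for (regindex = SKETCH_VLVF_ENTRIES; --regindex;) {
		bits = sketch_vlvf[regindex];
		if (bits == vlan)
			return regindex;
		if (!first_empty_slot && !bits)
			first_empty_slot = regindex;
	}

	return first_empty_slot ? first_empty_slot : -1;
}
```

Because the comparison value already carries the enable bit, a disabled or
empty entry can never match a real VLAN ID by accident.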
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: OVS VXLAN decap rule has full match on TTL for the outer headers?

2015-12-10 Thread Joe Stringer
On 10 December 2015 at 13:43, Or Gerlitz  wrote:
> On Thu, Dec 10, 2015 at 11:23 PM, Joe Stringer  wrote:
>> On 10 December 2015 at 13:06, Or Gerlitz  wrote:
>>> On Wed, Dec 9, 2015 at 2:22 AM, Joe Stringer  wrote:
>
>> As far as the mask, I briefly discussed this with Jarno and it seems
>> like it could be something as simple as zeroing the ip_ttl mask in
>> tnl_wc_init().
>
> to make sure I follow, will that have the consequence that we (user +
> kernel) will practically not be testing the ttl for these flows?
>
 Yes, it would cause userspace to 'wildcard' the field so the kernel
 flows that are installed will ignore it during lookup.
>
>>> Cool, any chance this is gonna fit into your schedule to meet 4.4? if
>>> not, for 4.5?
>>> Also, can the patch be made simple/small enough to go into -stable as well?
>
>> It's a userspace change.
>
>
> mmm, in a downstream post of this thread [1] Haggai pointed out to you
> that there's code in the OVS kernel path that rejects new tunnel
> flows if they don't have the TTL mask set, so he's wrong? where?
>
> Or.
>
> [1] http://marc.info/?l=linux-netdev&m=144880328121156&w=2

The rejection is within an if statement called "if (!is_mask)", so it
seems to me like it is enforcing the flow key to specify a TTL value
(any), and doesn't care what the mask does.


[PATCH net-next v4 2/4] rhashtable: add function to replace an element

2015-12-10 Thread Tom Herbert
Add the rhashtable_replace_fast function. This replaces one object in
the table with another atomically. The hashes of the new and old objects
must be equal.

Signed-off-by: Tom Herbert 
---
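For illustration, a single-threaded userspace sketch of the pointer-to-pointer
technique __rhashtable_replace_fast() uses to swap one chain entry for
another. Locking, RCU and table resizing are deliberately left out, and every
sketch_* name is hypothetical; only the return codes follow the kernel-doc
in the patch below.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_BUCKETS	8

struct sketch_node {
	struct sketch_node *next;
	int key;
	int value;
};

struct sketch_table {
	struct sketch_node *buckets[SKETCH_BUCKETS];
};

unsigned int sketch_hash(int key)
{
	return (unsigned int)key % SKETCH_BUCKETS;
}

/* Push a node onto the head of its bucket chain */
void sketch_insert(struct sketch_table *t, struct sketch_node *n)
{
	unsigned int hash = sketch_hash(n->key);

	n->next = t->buckets[hash];
	t->buckets[hash] = n;
}

/* 0 on success, -ENOENT if obj_old is not linked, -EINVAL on hash mismatch */
int sketch_replace(struct sketch_table *t, struct sketch_node *obj_old,
		   struct sketch_node *obj_new)
{
	struct sketch_node **pprev, *he;
	unsigned int hash;

	assert(t && obj_old && obj_new);

	hash = sketch_hash(obj_old->key);
	if (hash != sketch_hash(obj_new->key))
		return -EINVAL;

	/* pprev always points at the link that leads to the current entry */
	pprev = &t->buckets[hash];
	for (he = *pprev; he; he = he->next) {
		if (he != obj_old) {
			pprev = &he->next;
			continue;
		}
		obj_new->next = obj_old->next;
		*pprev = obj_new;
		return 0;
	}

	return -ENOENT;
}
```

The new node's next pointer is set before the node is published through
*pprev; in the kernel version the same ordering is what keeps concurrent RCU
readers from seeing a half-linked entry.
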
 include/linux/rhashtable.h | 82 ++
 1 file changed, 82 insertions(+)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 843ceca..77deece 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -819,4 +819,86 @@ out:
return err;
 }
 
+/* Internal function, please use rhashtable_replace_fast() instead */
+static inline int __rhashtable_replace_fast(
+   struct rhashtable *ht, struct bucket_table *tbl,
+   struct rhash_head *obj_old, struct rhash_head *obj_new,
+   const struct rhashtable_params params)
+{
+   struct rhash_head __rcu **pprev;
+   struct rhash_head *he;
+   spinlock_t *lock;
+   unsigned int hash;
+   int err = -ENOENT;
+
+   /* Minimally, the old and new objects must have same hash
+* (which should mean identifiers are the same).
+*/
+   hash = rht_head_hashfn(ht, tbl, obj_old, params);
+   if (hash != rht_head_hashfn(ht, tbl, obj_new, params))
+   return -EINVAL;
+
+   lock = rht_bucket_lock(tbl, hash);
+
+   spin_lock_bh(lock);
+
+   pprev = &tbl->buckets[hash];
+   rht_for_each(he, tbl, hash) {
+   if (he != obj_old) {
+   pprev = &he->next;
+   continue;
+   }
+
+   rcu_assign_pointer(obj_new->next, obj_old->next);
+   rcu_assign_pointer(*pprev, obj_new);
+   err = 0;
+   break;
+   }
+
+   spin_unlock_bh(lock);
+
+   return err;
+}
+
+/**
+ * rhashtable_replace_fast - replace an object in hash table
+ * @ht:        hash table
+ * @obj_old:   pointer to hash head inside object being replaced
+ * @obj_new:   pointer to hash head inside object which is new
+ * @params:    hash table parameters
+ *
+ * Replacing an object doesn't affect the number of elements in the hash table
+ * or bucket, so we don't need to worry about shrinking or expanding the
+ * table here.
+ *
+ * Returns zero on success, -ENOENT if the entry could not be found,
+ * -EINVAL if hash is not the same for the old and new objects.
+ */
+static inline int rhashtable_replace_fast(
+   struct rhashtable *ht, struct rhash_head *obj_old,
+   struct rhash_head *obj_new,
+   const struct rhashtable_params params)
+{
+   struct bucket_table *tbl;
+   int err;
+
+   rcu_read_lock();
+
+   tbl = rht_dereference_rcu(ht->tbl, ht);
+
+   /* Because we have already taken (and released) the bucket
+* lock in old_tbl, if we find that future_tbl is not yet
+* visible then that guarantees the entry to still be in
+* the old tbl if it exists.
+*/
+   while ((err = __rhashtable_replace_fast(ht, tbl, obj_old,
+   obj_new, params)) &&
+  (tbl = rht_dereference_rcu(tbl->future_tbl, ht)))
+   ;
+
+   rcu_read_unlock();
+
+   return err;
+}
+
 #endif /* _LINUX_RHASHTABLE_H */
-- 
2.4.6



[PATCH net-next v4 4/4] ila: Add generic ILA translation facility

2015-12-10 Thread Tom Herbert
This patch implements an ILA translation table. This table can be
configured with identifier to locator mappings, and can be queried
to resolve a mapping. Queries can be parameterized based on interface,
direction (incoming or outgoing), and matching locator.  The table is
implemented using rhashtable and is configured via netlink (through
"ip ila .." in iproute).

The table may be used as an alternative means to do ILA translations
other than the lw tunnels.

Signed-off-by: Tom Herbert 
---
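As a rough illustration of the lookup described above, here is a userspace
sketch that resolves an identifier to a locator and rewrites the upper 64
bits of a destination address. It is not the ila_xlat.c implementation: a
linear scan stands in for rhashtable, checksum adjustment is omitted, the
treatment of zero as a wildcard for ifindex and locator_match is an
assumption of this sketch, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DIR_IN	(1 << 0)	/* same bit layout as ILA_DIR_IN */
#define SKETCH_DIR_OUT	(1 << 1)	/* same bit layout as ILA_DIR_OUT */

struct sketch_addr {
	uint64_t locator;	/* upper 64 bits of the IPv6 address */
	uint64_t identifier;	/* lower 64 bits of the IPv6 address */
};

struct sketch_ila_map {
	uint64_t identifier;	/* lookup key */
	uint64_t locator;	/* locator written on a match */
	uint64_t locator_match;	/* 0 acts as a wildcard in this sketch */
	int ifindex;		/* 0 acts as a wildcard in this sketch */
	unsigned int dir;	/* mask of SKETCH_DIR_* */
};

/* Return the first mapping that matches all query parameters, or NULL */
const struct sketch_ila_map *
sketch_ila_lookup(const struct sketch_ila_map *tbl, size_t n,
		  const struct sketch_addr *daddr, int ifindex,
		  unsigned int dir)
{
	size_t i;

	assert(tbl && daddr);

	for (i = 0; i < n; i++) {
		const struct sketch_ila_map *m = &tbl[i];

		if (m->identifier != daddr->identifier)
			continue;
		if (!(m->dir & dir))
			continue;
		if (m->ifindex && m->ifindex != ifindex)
			continue;
		if (m->locator_match && m->locator_match != daddr->locator)
			continue;
		return m;
	}

	return NULL;
}

/* Rewrite the locator in place; returns 1 if a translation was applied */
int sketch_ila_xlat(const struct sketch_ila_map *tbl, size_t n,
		    struct sketch_addr *daddr, int ifindex, unsigned int dir)
{
	const struct sketch_ila_map *m;

	m = sketch_ila_lookup(tbl, n, daddr, ifindex, dir);
	if (!m)
		return 0;

	daddr->locator = m->locator;
	return 1;
}
```

An entry configured only for SKETCH_DIR_OUT is skipped for incoming queries,
which is the kind of direction-based parameterization the commit message
refers to.
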
 include/net/ila.h |  18 ++
 include/uapi/linux/ila.h  |  22 ++
 net/ipv6/ila/Makefile |   2 +-
 net/ipv6/ila/ila.h|   2 +
 net/ipv6/ila/ila_common.c |   8 +
 net/ipv6/ila/ila_xlat.c   | 679 ++
 6 files changed, 730 insertions(+), 1 deletion(-)
 create mode 100644 include/net/ila.h
 create mode 100644 net/ipv6/ila/ila_xlat.c

diff --git a/include/net/ila.h b/include/net/ila.h
new file mode 100644
index 000..9f4f43e
--- /dev/null
+++ b/include/net/ila.h
@@ -0,0 +1,18 @@
+/*
+ * ILA kernel interface
+ *
+ * Copyright (c) 2015 Tom Herbert 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ */
+
+#ifndef _NET_ILA_H
+#define _NET_ILA_H
+
+int ila_xlat_outgoing(struct sk_buff *skb);
+int ila_xlat_incoming(struct sk_buff *skb);
+
+#endif /* _NET_ILA_H */
diff --git a/include/uapi/linux/ila.h b/include/uapi/linux/ila.h
index 7ed9e67..abde7bb 100644
--- a/include/uapi/linux/ila.h
+++ b/include/uapi/linux/ila.h
@@ -3,13 +3,35 @@
 #ifndef _UAPI_LINUX_ILA_H
 #define _UAPI_LINUX_ILA_H
 
+/* NETLINK_GENERIC related info */
+#define ILA_GENL_NAME  "ila"
+#define ILA_GENL_VERSION   0x1
+
 enum {
ILA_ATTR_UNSPEC,
ILA_ATTR_LOCATOR,   /* u64 */
+   ILA_ATTR_IDENTIFIER,    /* u64 */
+   ILA_ATTR_LOCATOR_MATCH, /* u64 */
+   ILA_ATTR_IFINDEX,   /* s32 */
+   ILA_ATTR_DIR,   /* u32 */
 
__ILA_ATTR_MAX,
 };
 
 #define ILA_ATTR_MAX   (__ILA_ATTR_MAX - 1)
 
+enum {
+   ILA_CMD_UNSPEC,
+   ILA_CMD_ADD,
+   ILA_CMD_DEL,
+   ILA_CMD_GET,
+
+   __ILA_CMD_MAX,
+};
+
+#define ILA_CMD_MAX    (__ILA_CMD_MAX - 1)
+
+#define ILA_DIR_IN     (1 << 0)
+#define ILA_DIR_OUT    (1 << 1)
+
 #endif /* _UAPI_LINUX_ILA_H */
diff --git a/net/ipv6/ila/Makefile b/net/ipv6/ila/Makefile
index 31d136b..4b32e59 100644
--- a/net/ipv6/ila/Makefile
+++ b/net/ipv6/ila/Makefile
@@ -4,4 +4,4 @@
 
 obj-$(CONFIG_IPV6_ILA) += ila.o
 
-ila-objs := ila_common.o ila_lwt.o
+ila-objs := ila_common.o ila_lwt.o ila_xlat.o
diff --git a/net/ipv6/ila/ila.h b/net/ipv6/ila/ila.h
index b94081f..28542cb 100644
--- a/net/ipv6/ila/ila.h
+++ b/net/ipv6/ila/ila.h
@@ -42,5 +42,7 @@ void update_ipv6_locator(struct sk_buff *skb, struct 
ila_params *p);
 
 int ila_lwt_init(void);
 void ila_lwt_fini(void);
+int ila_xlat_init(void);
+void ila_xlat_fini(void);
 
 #endif /* __ILA_H */
diff --git a/net/ipv6/ila/ila_common.c b/net/ipv6/ila/ila_common.c
index 64e1904..32dc9aa 100644
--- a/net/ipv6/ila/ila_common.c
+++ b/net/ipv6/ila/ila_common.c
@@ -80,12 +80,20 @@ static int __init ila_init(void)
if (ret)
goto fail_lwt;
 
+   ret = ila_xlat_init();
+   if (ret)
+   goto fail_xlat;
+
+   return 0;
+fail_xlat:
+   ila_lwt_fini();
 fail_lwt:
return ret;
 }
 
 static void __exit ila_fini(void)
 {
+   ila_xlat_fini();
ila_lwt_fini();
 }
 
diff --git a/net/ipv6/ila/ila_xlat.c b/net/ipv6/ila/ila_xlat.c
new file mode 100644
index 000..2931ca3
--- /dev/null
+++ b/net/ipv6/ila/ila_xlat.c
@@ -0,0 +1,679 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "ila.h"
+
+struct ila_xlat_params {
+   struct ila_params ip;
+   __be64 identifier;
+   int ifindex;
+   unsigned int dir;
+};
+
+struct ila_map {
+   struct ila_xlat_params p;
+   struct rhash_head node;
+   struct ila_map __rcu *next;
+   struct rcu_head rcu;
+};
+
+static unsigned int ila_net_id;
+
+struct ila_net {
+   struct rhashtable rhash_table;
+   spinlock_t *locks; /* Bucket locks for entry manipulation */
+   unsigned int locks_mask;
+};
+
+#define LOCKS_PER_CPU 10
+
+static int alloc_ila_locks(struct ila_net *ilan, gfp_t gfp)
+{
+   unsigned int i, size;
+   unsigned int nr_pcpus = num_possible_cpus();
+
+   nr_pcpus = min_t(unsigned int, nr_pcpus, 32UL);
+   size = roundup_pow_of_two(nr_pcpus * LOCKS_PER_CPU);
+
+   if (sizeof(spinlock_t) != 0) {
+#ifdef CONFIG_NUMA
+   if (size * sizeof(spinlock_t) > PAGE_SIZE &&
+   gfp == 

[PATCH net-next v4 1/4] ila: Create net/ipv6/ila directory

2015-12-10 Thread Tom Herbert
Create ila directory in preparation for supporting other hooks in the
kernel than LWT for doing ILA. This includes:
  - Moving ila.c to ila/ila_lwt.c
  - Splitting out some common functions into ila_common.c

Signed-off-by: Tom Herbert 
---
 net/ipv6/Makefile |   2 +-
 net/ipv6/ila.c| 229 --
 net/ipv6/ila/Makefile |   7 ++
 net/ipv6/ila/ila.h|  46 ++
 net/ipv6/ila/ila_common.c |  95 +++
 net/ipv6/ila/ila_lwt.c| 152 ++
 6 files changed, 301 insertions(+), 230 deletions(-)
 delete mode 100644 net/ipv6/ila.c
 create mode 100644 net/ipv6/ila/Makefile
 create mode 100644 net/ipv6/ila/ila.h
 create mode 100644 net/ipv6/ila/ila_common.c
 create mode 100644 net/ipv6/ila/ila_lwt.c

diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
index 2c900c7..2fbd90b 100644
--- a/net/ipv6/Makefile
+++ b/net/ipv6/Makefile
@@ -34,7 +34,7 @@ obj-$(CONFIG_INET6_XFRM_MODE_TUNNEL) += xfrm6_mode_tunnel.o
 obj-$(CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION) += xfrm6_mode_ro.o
 obj-$(CONFIG_INET6_XFRM_MODE_BEET) += xfrm6_mode_beet.o
 obj-$(CONFIG_IPV6_MIP6) += mip6.o
-obj-$(CONFIG_IPV6_ILA) += ila.o
+obj-$(CONFIG_IPV6_ILA) += ila/
 obj-$(CONFIG_NETFILTER)+= netfilter/
 
 obj-$(CONFIG_IPV6_VTI) += ip6_vti.o
diff --git a/net/ipv6/ila.c b/net/ipv6/ila.c
deleted file mode 100644
index 1a6852e..000
--- a/net/ipv6/ila.c
+++ /dev/null
@@ -1,229 +0,0 @@
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-struct ila_params {
-   __be64 locator;
-   __be64 locator_match;
-   __wsum csum_diff;
-};
-
-static inline struct ila_params *ila_params_lwtunnel(
-   struct lwtunnel_state *lwstate)
-{
-   return (struct ila_params *)lwstate->data;
-}
-
-static inline __wsum compute_csum_diff8(const __be32 *from, const __be32 *to)
-{
-   __be32 diff[] = {
-   ~from[0], ~from[1], to[0], to[1],
-   };
-
-   return csum_partial(diff, sizeof(diff), 0);
-}
-
-static inline __wsum get_csum_diff(struct ipv6hdr *ip6h, struct ila_params *p)
-{
-   if (*(__be64 *)&ip6h->daddr == p->locator_match)
-   return p->csum_diff;
-   else
-   return compute_csum_diff8((__be32 *)&ip6h->daddr,
- (__be32 *)&p->locator);
-}
-
-static void update_ipv6_locator(struct sk_buff *skb, struct ila_params *p)
-{
-   __wsum diff;
-   struct ipv6hdr *ip6h = ipv6_hdr(skb);
-   size_t nhoff = sizeof(struct ipv6hdr);
-
-   /* First update checksum */
-   switch (ip6h->nexthdr) {
-   case NEXTHDR_TCP:
-   if (likely(pskb_may_pull(skb, nhoff + sizeof(struct tcphdr)))) {
-   struct tcphdr *th = (struct tcphdr *)
-   (skb_network_header(skb) + nhoff);
-
-   diff = get_csum_diff(ip6h, p);
-   inet_proto_csum_replace_by_diff(&th->check, skb,
-   diff, true);
-   }
-   break;
-   case NEXTHDR_UDP:
-   if (likely(pskb_may_pull(skb, nhoff + sizeof(struct udphdr)))) {
-   struct udphdr *uh = (struct udphdr *)
-   (skb_network_header(skb) + nhoff);
-
-   if (uh->check || skb->ip_summed == CHECKSUM_PARTIAL) {
-   diff = get_csum_diff(ip6h, p);
-   inet_proto_csum_replace_by_diff(&uh->check, skb,
-   diff, true);
-   if (!uh->check)
-   uh->check = CSUM_MANGLED_0;
-   }
-   }
-   break;
-   case NEXTHDR_ICMP:
-   if (likely(pskb_may_pull(skb,
-nhoff + sizeof(struct icmp6hdr)))) {
-   struct icmp6hdr *ih = (struct icmp6hdr *)
-   (skb_network_header(skb) + nhoff);
-
-   diff = get_csum_diff(ip6h, p);
-   inet_proto_csum_replace_by_diff(&ih->icmp6_cksum, skb,
-   diff, true);
-   }
-   break;
-   }
-
-   /* Now change destination address */
-   *(__be64 *)&ip6h->daddr = p->locator;
-}
-
-static int ila_output(struct net *net, struct sock *sk, struct sk_buff *skb)
-{
-   struct dst_entry *dst = skb_dst(skb);
-
-   if (skb->protocol != htons(ETH_P_IPV6))
-   goto drop;
-
-   update_ipv6_locator(skb, ila_params_lwtunnel(dst->lwtstate));
-
-   return dst->lwtstate->orig_output(net, sk, skb);
-
-drop:
-   kfree_skb(skb);
-   return -EINVAL;
-}
-
-static int ila_input(struct sk_buff *skb)
-{
- 

[PATCH net-next v4 3/4] netlink: add a start callback for starting a netlink dump

2015-12-10 Thread Tom Herbert
The start callback allows the caller to set up a context for the
dump callbacks. Presumably, the context can then be destroyed in
the done callback.

Signed-off-by: Tom Herbert 
---
 include/linux/netlink.h  |  2 ++
 include/net/genetlink.h  |  2 ++
 net/netlink/af_netlink.c |  4 
 net/netlink/genetlink.c  | 16 
 4 files changed, 24 insertions(+)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 639e9b8..0b41959 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -131,6 +131,7 @@ netlink_skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
 struct netlink_callback {
struct sk_buff  *skb;
const struct nlmsghdr   *nlh;
+   int (*start)(struct netlink_callback *);
int (*dump)(struct sk_buff * skb,
struct netlink_callback *cb);
int (*done)(struct netlink_callback *cb);
@@ -153,6 +154,7 @@ struct nlmsghdr *
 __nlmsg_put(struct sk_buff *skb, u32 portid, u32 seq, int type, int len, int 
flags);
 
 struct netlink_dump_control {
+   int (*start)(struct netlink_callback *);
int (*dump)(struct sk_buff *skb, struct netlink_callback *);
int (*done)(struct netlink_callback *);
void *data;
diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index 1b6b6dc..43c0e77 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -114,6 +114,7 @@ static inline void genl_info_net_set(struct genl_info 
*info, struct net *net)
  * @flags: flags
  * @policy: attribute validation policy
  * @doit: standard command callback
+ * @start: start callback for dumps
  * @dumpit: callback for dumpers
  * @done: completion callback for dumps
  * @ops_list: operations list
@@ -122,6 +123,7 @@ struct genl_ops {
const struct nla_policy *policy;
int(*doit)(struct sk_buff *skb,
   struct genl_info *info);
+   int(*start)(struct netlink_callback *cb);
int(*dumpit)(struct sk_buff *skb,
 struct netlink_callback *cb);
int(*done)(struct netlink_callback *cb);
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 59651af..81dc1bb 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2915,6 +2915,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff 
*skb,
 
cb = &nlk->cb;
memset(cb, 0, sizeof(*cb));
+   cb->start = control->start;
cb->dump = control->dump;
cb->done = control->done;
cb->nlh = nlh;
@@ -2927,6 +2928,9 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff 
*skb,
 
mutex_unlock(nlk->cb_mutex);
 
+   if (cb->start)
+   cb->start(cb);
+
ret = netlink_dump(sk);
sock_put(sk);
 
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index bc0e504..8e63662 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -513,6 +513,20 @@ void *genlmsg_put(struct sk_buff *skb, u32 portid, u32 seq,
 }
 EXPORT_SYMBOL(genlmsg_put);
 
+static int genl_lock_start(struct netlink_callback *cb)
+{
+   /* our ops are always const - netlink API doesn't propagate that */
+   const struct genl_ops *ops = cb->data;
+   int rc = 0;
+
+   if (ops->start) {
+   genl_lock();
+   rc = ops->start(cb);
+   genl_unlock();
+   }
+   return rc;
+}
+
 static int genl_lock_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
 {
/* our ops are always const - netlink API doesn't propagate that */
@@ -577,6 +591,7 @@ static int genl_family_rcv_msg(struct genl_family *family,
.module = family->module,
/* we have const, but the netlink API doesn't */
.data = (void *)ops,
+   .start = genl_lock_start,
.dump = genl_lock_dumpit,
.done = genl_lock_done,
};
@@ -588,6 +603,7 @@ static int genl_family_rcv_msg(struct genl_family *family,
} else {
struct netlink_dump_control c = {
.module = family->module,
+   .start = ops->start,
.dump = ops->dumpit,
.done = ops->done,
};
-- 
2.4.6



RE: [Intel-wired-lan] [next PATCH 01/11] ixgbe: Return error on failure to allocate mac_table

2015-12-10 Thread Schmitt, Phillip J


> -Original Message-
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Alexander Duyck
> Sent: Monday, November 02, 2015 5:09 PM
> To: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
> Subject: [Intel-wired-lan] [next PATCH 01/11] ixgbe: Return error on failure
> to allocate mac_table
> 
> Add a check to make certain mac_table was actually allocated and is not NULL.
> If it is NULL return -ENOMEM and allow the probe routine to fail rather than
> causing a NULL pointer dereference further down the line.
> 
> Signed-off-by: Alexander Duyck 

Tested-by: Phil Schmitt 
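
The check described in the quoted commit message follows a common pattern,
sketched here in userspace under assumptions: the structures are simplified
stand-ins for the driver's, the allocator is passed in so the failure path
can be exercised, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_mac_addr {
	unsigned char addr[6];
	unsigned short pool;
	unsigned short state;
};

struct sketch_adapter {
	struct sketch_mac_addr *mac_table;
	size_t num_entries;
};

typedef void *(*sketch_alloc_fn)(size_t nmemb, size_t size);

/* Fault-injection helper: an allocator that always fails */
void *sketch_failing_alloc(size_t nmemb, size_t size)
{
	(void)nmemb;
	(void)size;
	return NULL;
}

/* Allocate the table and report -ENOMEM instead of leaving it NULL */
int sketch_sw_init(struct sketch_adapter *adapter, size_t n,
		   sketch_alloc_fn alloc)
{
	assert(adapter && alloc && n > 0);

	adapter->mac_table = alloc(n, sizeof(*adapter->mac_table));
	if (!adapter->mac_table)
		return -ENOMEM;

	adapter->num_entries = n;
	return 0;
}

int sketch_probe(struct sketch_adapter *adapter, size_t n,
		 sketch_alloc_fn alloc)
{
	int err = sketch_sw_init(adapter, n, alloc);

	if (err)
		return err;	/* fail the probe rather than crash later */

	/* Safe: the table is known to exist at this point */
	adapter->mac_table[0].state = 1;
	return 0;
}
```

Passing calloc as the allocator gives the normal path; passing
sketch_failing_alloc makes sketch_probe() return -ENOMEM before anything
touches mac_table.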



RE: [Intel-wired-lan] [next PATCH 11/11] ixgbe: Clean stale VLANs when changing port vlan or resetting

2015-12-10 Thread Schmitt, Phillip J


> -Original Message-
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Alexander Duyck
> Sent: Monday, November 02, 2015 5:11 PM
> To: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
> Subject: [Intel-wired-lan] [next PATCH 11/11] ixgbe: Clean stale VLANs when
> changing port vlan or resetting
> 
> This patch guarantees that the VFs do not have access to VLANs that they were
> not supposed to.  What this patch does is add code so that we delete the
> previous port VLAN after adding a new one, and if we reset the VF we clear
> all of the filters associated with it.
> 
> Previously the code was leaving all previous VLANs mapped to the VF and they
> didn't get deleted unless the VF specifically requested it or if the PF
> itself was reset.
> 
> Signed-off-by: Alexander Duyck 

Tested-by: Phil Schmitt 


Suspicious RCU from iwlwifi driver

2015-12-10 Thread Laura Abbott

Hi,

I'm currently seeing a suspicious RCU usage warning:

===
[ INFO: suspicious RCU usage. ]
 4.4.0-rc4-next-20151210 #23 Not tainted
 ---
 drivers/net/wireless/intel/iwlwifi/mvm/sta.c:1226 suspicious rcu_dereference_protected() usage!
 
 other info that might help us debug this:
 
 rcu_scheduler_active = 1, debug_locks = 0

 4 locks held by irq/34-iwlwifi/978:
  #0:  (sync_cmd_lockdep_map){..}, at: [] iwl_pcie_irq_handler+0x5/0x1870 [iwlwifi]
  #1:  (&(>lock)->rlock){+.+...}, at: [] iwl_pcie_irq_handler+0x90a/0x1870 [iwlwifi]
  #2:  (rcu_read_lock){..}, at: [] ieee80211_rx_napi+0xc3/0xbd0 [mac80211]
  #3:  (&(>rx_path_lock)->rlock){+.}, at: [] ieee80211_rx_handlers+0x3e/0x2c80 [mac80211]
 
  stack backtrace:

 CPU: 4 PID: 978 Comm: irq/34-iwlwifi Not tainted 4.4.0-rc4-next-20151210 #23
 Hardware name: LENOVO 20BFS0EC00/20BFS0EC00, BIOS GMET62WW (2.10 ) 03/19/2014
   5a05ae61 880407dc3760 81434b69
  8804053e3200 880407dc3790 81106757 
  880408d8a868 880408d8a868 8804061cc630 880407dc37b0
 Call Trace:
  [] dump_stack+0x4b/0x72
  [] lockdep_rcu_suspicious+0xd7/0x110
  [] iwl_mvm_get_key_sta_id.part.3+0x88/0x90 [iwlmvm]
  [] iwl_mvm_update_tkip_key+0x241/0x280 [iwlmvm]
  [] iwl_mvm_mac_update_tkip_key+0x1c/0x20 [iwlmvm]
  [] ieee80211_tkip_decrypt_data+0x249/0x5f0 [mac80211]
  [] ? skb_copy_bits+0x137/0x2f0
  [] ? __pskb_pull_tail+0x85/0x3a0
  [] ieee80211_crypto_tkip_decrypt+0xce/0x150 [mac80211]
  [] ieee80211_rx_handlers+0x970/0x2c80 [mac80211]
  [] ? __lock_acquire+0x4ba/0x1b70
  [] ieee80211_prepare_and_rx_handle+0x1e4/0xac0 [mac80211]
  [] ieee80211_rx_napi+0x336/0xbd0 [mac80211]
  [] ? ieee80211_rx_napi+0xc3/0xbd0 [mac80211]
  [] iwl_mvm_rx_rx_mpdu+0x4ad/0x880 [iwlmvm]
  [] ? iwl_mvm_rx_rx_mpdu+0x1e2/0x880 [iwlmvm]
  [] iwl_mvm_rx+0x56/0x240 [iwlmvm]
  [] iwl_pcie_irq_handler+0xf8c/0x1870 [iwlwifi]
  [] ? __schedule+0x414/0xaf0
  [] irq_thread_fn+0x20/0x50
  [] irq_thread+0x16b/0x1f0
  [] ? __schedule+0x414/0xaf0
  [] ? irq_forced_thread_fn+0x70/0x70
  [] ? wake_threads_waitq+0x30/0x30
  [] ? irq_thread_dtor+0xb0/0xb0
  [] kthread+0x101/0x120
  [] ? trace_hardirqs_on_caller+0x129/0x1b0
  [] ? kthread_create_on_node+0x250/0x250
  [] ret_from_fork+0x3f/0x70
  [] ? kthread_create_on_node+0x250/0x250

If I revert 9513c5e18a0dc55a1fc9c890715098ba2315830b
(iwlwifi: mvm: Avoid dereferencing sta if it was already flushed)
The warning goes away. Known issue?

Thanks,
Laura


Re: [PATCH v2] r8169: Don't claim WoL works if LanWake flag is not set

2015-12-10 Thread Francois Romieu
Corinna Vinschen  :
[...]
> It's still a bit weird.  On the machines I tested this on, if I disable
> LanWake and shutdown the machine, I can send, e.g., MagicPackets as much
> as I like, the machines don't come up.  Isn't it a bit misleading then
> if ethtool reports that some WoL method is enabled but it doesn't work?

Of course it is. :o(

I'm fine with Config5.LanWake changes if you have empirical evidence that
it helps.

We have terse (outdated?) documentation and some hints from
http://marc.info/?l=linux-netdev&m=137654699802446. I'm unable to figure
out what an adequate change could be, especially one with a low chance of
regression.

-- 
Ueimor


Re: [PATCH] ieee802154-atusb: Delete an unnecessary check before the function call "kfree_skb"

2015-12-10 Thread Stefan Schmidt

Hello.

On 10/12/15 19:16, Marcel Holtmann wrote:

Hi Stefan,


From: Markus Elfring 
Date: Mon, 16 Nov 2015 13:50:23 +0100

The kfree_skb() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
drivers/net/ieee802154/atusb.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ieee802154/atusb.c b/drivers/net/ieee802154/atusb.c
index 199a94a..b1cd865 100644
--- a/drivers/net/ieee802154/atusb.c
+++ b/drivers/net/ieee802154/atusb.c
@@ -310,8 +310,7 @@ static void atusb_free_urbs(struct atusb *atusb)
 urb = usb_get_from_anchor(&atusb->idle_urbs);
 if (!urb)
 break;
-if (urb->context)
-kfree_skb(urb->context);
+kfree_skb(urb->context);
 usb_free_urb(urb);
 }
}

Acked-by: Stefan Schmidt 
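
The reasoning in the commit message (a free routine that already returns early on NULL makes the caller's guard redundant) can be shown with a small userspace analogue. The names buf_free(), release() and struct urb_like are hypothetical and only mirror the shape of the patch; the frees counter exists purely so the behaviour can be checked.

```c
#include <assert.h>
#include <stdlib.h>

struct buf {
	int len;
};

/* Counts real frees so the behaviour can be observed. */
static int frees;

/* Like the kfree_skb() behaviour described above: NULL is a no-op. */
static void buf_free(struct buf *b)
{
	if (!b)
		return;
	frees++;
	free(b);
}

/* Hypothetical holder with an optional context buffer. */
struct urb_like {
	struct buf *context;
};

static void release(struct urb_like *u)
{
	assert(u);
	buf_free(u->context); /* no "if (u->context)" needed */
	u->context = NULL;
}
```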


Did you get the original patch and my ACK on this one, or would you prefer me to
resend it?

This slipped through, and now it no longer applies.

Applying: ieee802154-atusb: Delete an unnecessary check before the function call "kfree_skb"
error: patch failed: drivers/net/ieee802154/atusb.c:310
error: drivers/net/ieee802154/atusb.c: patch does not apply
Patch failed at 0001 ieee802154-atusb: Delete an unnecessary check before the function call "kfree_skb"


Not good. I did another resend; this one applied, compiled and worked
fine for me. You will be on the To: line.


regards
Stefan Schmidt



RE: [Intel-wired-lan] [next PATCH 04/11] ixgbe: Reduce VT code indent in set_vfta by introducing jump label

2015-12-10 Thread Schmitt, Phillip J


> -Original Message-
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Alexander Duyck
> Sent: Monday, November 02, 2015 5:10 PM
> To: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
> Subject: [Intel-wired-lan] [next PATCH 04/11] ixgbe: Reduce VT code indent in
> set_vfta by introducing jump label
> 
> In order to clear the way for upcoming work I thought it best to drop the
> level of indent in the ixgbe_set_vfta_generic function.  Most of the code is
> held in the virtualization specific section.  So the easiest approach is to
> just add a jump label and jump past the bulk of the code if it is not enabled.
> 
> Signed-off-by: Alexander Duyck 

Tested-by: Phil Schmitt 


RE: [Intel-wired-lan] [next PATCH 09/11] ixgbe: Fix VLAN promisc in relation to SR-IOV

2015-12-10 Thread Schmitt, Phillip J


> -Original Message-
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Alexander Duyck
> Sent: Monday, November 02, 2015 5:10 PM
> To: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
> Subject: [Intel-wired-lan] [next PATCH 09/11] ixgbe: Fix VLAN promisc in
> relation to SR-IOV
> 
> From: Alexander Duyck 

Tested-by: Phil Schmitt 


RE: [Intel-wired-lan] [next PATCH 05/11] ixgbe: Simplify configuration of setting VLVF and VLVFB

2015-12-10 Thread Schmitt, Phillip J


> -Original Message-
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Alexander Duyck
> Sent: Monday, November 02, 2015 5:10 PM
> To: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
> Subject: [Intel-wired-lan] [next PATCH 05/11] ixgbe: Simplify configuration of
> setting VLVF and VLVFB
> 
> This patch addresses several issues within the VLVF and VLVFB configuration.
> 
> First was the fact that the code was overly complicated, with multiple
> conditional paths depending on whether we were adding or removing and which
> bit we were going to add or remove.  Instead of messing with all that I have
> simplified it by using (vid / 32) and (1 - vid / 32) to identify our register
> and the other vlvfb register.
> 
> Second was the fact that we were likely leaking a few packets into the PF in
> cases where we were deleting an entry and the VFTA filter for that entry, as
> the ordering was such that we deleted the pool and then the VLAN filter
> instead of the other way around.  I have updated that by adding a check for
> no bits being set and if that occurs we clear things up in the proper order.
> 
> Signed-off-by: Alexander Duyck 

Tested-by: Phil Schmitt 
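
The (x / 32) and (1 - x / 32) trick from the quoted description can be sketched as follows. The sketch assumes a 64-bit bitmap split across a pair of 32-bit registers and an index in the range 0..63; the name vlvfb_select() and the struct are hypothetical, and the variable is called idx here because the quoted text does not spell out exactly which value is being divided.

```c
#include <assert.h>
#include <stdint.h>

/* Which half of a register pair holds a bit, which half is the other one. */
struct vlvfb_sel {
	unsigned int reg;   /* 0 or 1: register holding our bit */
	unsigned int other; /* the remaining register of the pair */
	uint32_t mask;      /* bit within 'reg' */
};

/* Map an index in 0..63 onto a pair of 32-bit registers. */
static struct vlvfb_sel vlvfb_select(unsigned int idx)
{
	struct vlvfb_sel s;

	assert(idx < 64);
	s.reg = idx / 32;       /* 0 for bits 0..31, 1 for bits 32..63 */
	s.other = 1 - idx / 32; /* always the opposite register */
	s.mask = (uint32_t)1 << (idx % 32);
	return s;
}
```

With both registers identified up front, "are any bits still set?" becomes a test of the masked register plus the other one, with no separate code path per half.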


RE: [Intel-wired-lan] [next PATCH 06/11] ixgbe: Add support for adding/removing VLAN on PF bypassing the VLVF

2015-12-10 Thread Schmitt, Phillip J


> -Original Message-
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Alexander Duyck
> Sent: Monday, November 02, 2015 5:10 PM
> To: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
> Subject: [Intel-wired-lan] [next PATCH 06/11] ixgbe: Add support for
> adding/removing VLAN on PF bypassing the VLVF
> 
> This patch adds support for bypassing the VLVF entry creation when the PF is
> adding a new VLAN.  The advantage to doing this is that we can then save the
> VLVF entries for the VFs which must have them in order to function, versus the
> PF which can fall back on the default pool entry.
> 
> Signed-off-by: Alexander Duyck 

Tested-by: Phil Schmitt 


RE: [Intel-wired-lan] [next PATCH 02/11] ixgbe: Fix SR-IOV VLAN pool configuration

2015-12-10 Thread Schmitt, Phillip J


> -Original Message-
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Alexander Duyck
> Sent: Monday, November 02, 2015 5:10 PM
> To: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
> Subject: [Intel-wired-lan] [next PATCH 02/11] ixgbe: Fix SR-IOV VLAN pool
> configuration
> 
> The code for checking the PF bit in ixgbe_set_vf_vlan_msg was using the wrong
> offset and as a result it was pulling the VLAN off of the PF even if there
> were VFs numbered greater than 40 that still had the VLAN enabled.
> 
> Signed-off-by: Alexander Duyck 

Tested-by: Phil Schmitt 


RE: [Intel-wired-lan] [next PATCH 10/11] ixgbe: Clear stale pool mappings

2015-12-10 Thread Schmitt, Phillip J


> -Original Message-
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Alexander Duyck
> Sent: Monday, November 02, 2015 5:10 PM
> To: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
> Subject: [Intel-wired-lan] [next PATCH 10/11] ixgbe: Clear stale pool mappings
> 
> From: Alexander Duyck 
> 
> This patch makes certain that we clear the pool mappings added when we
> configure default MAC addresses for the interface.  Without this we run the
> risk of leaking an address into pool 0 which really belongs to VF 0 when
> SR-IOV is enabled.
> 
> Signed-off-by: Alexander Duyck 

Tested-by: Phil Schmitt 


[PATCH net-next v4 0/4] ila: Optimization to preserve value of early demux

2015-12-10 Thread Tom Herbert
In the current implementation of ILA, LWT is used to perform
translation on both the input and output paths. This is functional,
however there is a big performance hit in the receive path. Early
demux occurs before the routing lookup (a hit actually obviates the
route lookup). Therefore the stack currently performs early
demux before translation so that a local connection with ILA
addresses is never matched. Note that this issue is not just
with ILA, but pretty much any translated or encapsulated packet
handled by LWT would miss the opportunity for early demux. Solving
the general problem seems non-trivial since we would need to move
the route lookup before early demux, thereby diminishing its value.

This patch set addresses the issue for ILA by adding a fast locator
lookup that occurs before early demux. This is done by hooking into
NF_INET_PRE_ROUTING.

For the backend we implement an rhashtable that contains identifier
to locator mappings. The table also allows more specific matches
that include original locator and interface.

This patch set:
 - Add an rhashtable function to atomically replace an element.
   This is useful to implement sub-trees from a table entry
   without needing to use a special anchor structure as the
   table entry.
 - Add a start callback for starting a netlink dump.
 - Create an ila directory under net/ipv6 and move ila.c to it.
   ila.c is split into ila_common.c and ila_lwt.c.
 - Implement a table to do identifier->locator mapping. This is
   an rhashtable (in ila_xlat.c).
 - Configuration for the table with netlink.
 - Add a hook into NF_INET_PRE_ROUTING to perform ILA translation
   before early demux.
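
As an illustration of the identifier to locator lookup described above, here is a small userspace model. It is only a sketch under stated assumptions: an address is treated as a 64-bit locator followed by a 64-bit identifier, the table is a fixed 16-slot open-addressing array rather than an rhashtable, and every name (struct ila_addr, xlat_set(), xlat_translate()) is hypothetical. The more specific matches on original locator and interface are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* An address viewed as two 64-bit halves. */
struct ila_addr {
	uint64_t locator;    /* upper 64 bits */
	uint64_t identifier; /* lower 64 bits */
};

#define XLAT_SLOTS 16 /* must stay a power of two */

struct xlat_entry {
	uint64_t identifier;
	uint64_t locator;
	int used;
};

struct xlat_table {
	struct xlat_entry slot[XLAT_SLOTS];
};

/* Multiplicative hash; the top 4 bits index the 16 slots. */
static size_t xlat_hash(uint64_t identifier)
{
	return (size_t)((identifier * 0x9E3779B97F4A7C15ULL) >> 60);
}

/* Add a mapping, or replace the locator of an existing one; -1 if full. */
static int xlat_set(struct xlat_table *t, uint64_t identifier, uint64_t locator)
{
	size_t h = xlat_hash(identifier);
	size_t i;

	assert(t);
	for (i = 0; i < XLAT_SLOTS; i++) {
		struct xlat_entry *e = &t->slot[(h + i) & (XLAT_SLOTS - 1)];

		if (!e->used || e->identifier == identifier) {
			e->identifier = identifier;
			e->locator = locator;
			e->used = 1;
			return 0;
		}
	}
	return -1;
}

/* Rewrite the locator in place on a hit; return 1 on hit, 0 on miss. */
static int xlat_translate(const struct xlat_table *t, struct ila_addr *addr)
{
	size_t h;
	size_t i;

	assert(t && addr);
	h = xlat_hash(addr->identifier);
	for (i = 0; i < XLAT_SLOTS; i++) {
		const struct xlat_entry *e = &t->slot[(h + i) & (XLAT_SLOTS - 1)];

		if (!e->used)
			return 0; /* entries are never deleted, so an empty slot ends the probe */
		if (e->identifier == addr->identifier) {
			addr->locator = e->locator;
			return 1;
		}
	}
	return 0;
}
```

In this model the translation touches only the locator half, so anything keyed on the identifier still matches after the rewrite.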

Changes in v2:
 - Use iptables targets instead of a new xfrm function

Changes in v3:
 - Add __rcu to next pointer in struct ila_map

Changes in v4:
 - Use hook for NF_INET_PRE_ROUTING

Testing:
   Running 200 netperf TCP_RR streams

No ILA, baseline
   79.26% CPU utilization
   1678282 tps
   104/189/390 50/90/99% latencies

ILA before fix (LWT on both input and output)
   81.91% CPU utilization
   1464723 tps (-12.7% from baseline)
   121/215/411 50/90/99% latencies

ILA after fix
   80.62% CPU utilization
   1622985 tps (-3.3% from baseline)
   110/191/347 50/90/99% latencies


Tom Herbert (4):
  ila: Create net/ipv6/ila directory
  rhashtable: add function to replace an element
  netlink: add a start callback for starting a netlink dump
  ila: Add generic ILA translation facility

 include/linux/netlink.h|   2 +
 include/linux/rhashtable.h |  82 ++
 include/net/genetlink.h|   2 +
 include/net/ila.h  |  18 ++
 include/uapi/linux/ila.h   |  22 ++
 net/ipv6/Makefile  |   2 +-
 net/ipv6/ila.c | 229 ---
 net/ipv6/ila/Makefile  |   7 +
 net/ipv6/ila/ila.h |  48 
 net/ipv6/ila/ila_common.c  | 103 +++
 net/ipv6/ila/ila_lwt.c | 152 ++
 net/ipv6/ila/ila_xlat.c| 679 +
 net/netlink/af_netlink.c   |   4 +
 net/netlink/genetlink.c|  16 ++
 14 files changed, 1136 insertions(+), 230 deletions(-)
 create mode 100644 include/net/ila.h
 delete mode 100644 net/ipv6/ila.c
 create mode 100644 net/ipv6/ila/Makefile
 create mode 100644 net/ipv6/ila/ila.h
 create mode 100644 net/ipv6/ila/ila_common.c
 create mode 100644 net/ipv6/ila/ila_lwt.c
 create mode 100644 net/ipv6/ila/ila_xlat.c

-- 
2.4.6



Re: [PATCH] mm: memcontrol: MEMCG no longer works with SLOB

2015-12-10 Thread Vladimir Davydov
On Wed, Dec 09, 2015 at 03:01:07PM -0500, Johannes Weiner wrote:
> On Wed, Dec 09, 2015 at 05:32:39PM +0100, Arnd Bergmann wrote:
> > The change to move the kmem accounting into the normal memcg
> > code means we can no longer use memcg with slob, which lacks
> > the memcg_params member in its struct kmem_cache:
> > 
> > ../mm/slab.h: In function 'is_root_cache':
> > ../mm/slab.h:187:10: error: 'struct kmem_cache' has no member named 
> > 'memcg_params'

Argh, I completely forgot about this SLOB thing :-(

> > 
> > This enforces the new dependency in Kconfig. Alternatively,
> > we could change the slob code to allow using MEMCG.
> 
> I'm curious, was this a random config or do you actually use
> CONFIG_SLOB && CONFIG_MEMCG?
> 
> Excluding CONFIG_MEMCG completely for slob seems harsh, but I would
> prefer not littering the source with
> 
> #if defined(CONFIG_MEMCG) && (defined(CONFIG_SLAB) || defined(CONFIG_SLUB))
> 
> or
> 
> #if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
> 
> for such a special case. The #ifdefs are already out of hand in there.
> 
> Vladimir, what would you think of simply doing this?
> 
> diff --git a/mm/slab.h b/mm/slab.h
> index 5adec08..0b3ec4b 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -25,6 +25,9 @@ struct kmem_cache {
>   int refcount;   /* Use counter */
>   void (*ctor)(void *);   /* Called on object slot creation */
>   struct list_head list;  /* List of all slab caches on the system */
> +#ifdef CONFIG_MEMCG
> + struct memcg_cache_params memcg_params;
> +#endif
>  };
>  
>  #endif /* CONFIG_SLOB */

I don't like it. This would result in allocation of per memcg arrays for
each list_lru/kmem_cache, which would never be used. This looks
extremely ugly. I'd prefer to make CONFIG_MEMCG depend on SL[AU]B, but
I'm afraid such a change will be frowned upon - who knows who uses
MEMCG & SLOB?

I guess SLOB could be made memcg-aware, but I don't think it's worth the
trouble, although I can take a look in this direction - from a quick
glance at SLOB it shouldn't be difficult. If we decide to go this way, I
think we could use this patch as a temporary fix, which would be
reverted eventually.

Otherwise, no matter how tempting the idea to put all memcg stuff under
CONFIG_MEMCG is, I think it won't fly, so for now we should use ifdefs.
To avoid complex checks, we could define a macro in memcontrol.h, say
MEMCG_KMEM_ENABLED, and use it throughout the code. And I think we
should wrap list_lru stuff in it as well :-/
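
A rough sketch of the single-macro idea, compiled here as plain userspace C. MEMCG_KMEM_ENABLED is the name floated above; struct kmem_cache_sketch, the one-field memcg_cache_params and this is_root_cache() are simplified stand-ins rather than the real mm/slab.h definitions, and the CONFIG_* symbols are assumed to come from Kconfig (none are defined in this standalone build).

```c
#include <assert.h>
#include <stddef.h>

/* One place that encodes "memcg kmem accounting is available". */
#if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB)
#define MEMCG_KMEM_ENABLED 1
#else
#define MEMCG_KMEM_ENABLED 0
#endif

/* Simplified stand-in for the per-cache memcg bookkeeping. */
struct memcg_cache_params {
	int is_root_cache;
};

struct kmem_cache_sketch {
	int refcount;
#if MEMCG_KMEM_ENABLED
	struct memcg_cache_params memcg_params; /* only present when accounting is on */
#endif
};

static int is_root_cache(const struct kmem_cache_sketch *s)
{
	assert(s);
#if MEMCG_KMEM_ENABLED
	return s->memcg_params.is_root_cache;
#else
	return 1; /* without kmem accounting every cache is treated as a root cache */
#endif
}
```

Call sites then test one symbol instead of repeating the CONFIG_MEMCG and allocator checks.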

Thanks,
Vladimir


Re: [PATCH 2/3] ser_gigaset: fix deallocation of platform device structure

2015-12-10 Thread Paul Bolle
On wo, 2015-12-09 at 12:10 +0100, Tilman Schmidt wrote:
> Am 09.12.2015 um 00:12 schrieb Paul Bolle:

> > So what does setting
> > cs->hw.ser->dev.dev.driver_data to NULL just before freeing it buy
> > us?
> 
> We're freeing cs->hw.ser, not cs->hw.ser->dev.
> Clearing the reference to cs from the device structure before freeing 
> cs guards against possible use-after-free.
> 
> > > + kfree(cs->hw.ser);
> > > + cs->hw.ser = NULL;
> > 
> > I might be missing something, but what does setting this to NULL buy 
> > us here?
> 
> Just defensive programming. Guarding against possible use-after-free 
> or double-free.

I'm inclined to think this is not the best way to guard against such
nasty bugs. But then again, I'm only a few months into my shift of
looking after the gigaset drivers and haven't had to track down such
bugs yet. But I'd be surprised if many other drivers do it that way and
think this is a job for (tree wide) debugging tools. But, whatever the
merits of our views, we can defer this discussion to some future date.
See below.
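
For reference, the defensive pattern under discussion looks roughly like this in userspace C. The struct names echo the driver's fields, but release_hw() and the release_count counter are hypothetical and exist only to make the effect observable.

```c
#include <assert.h>
#include <stdlib.h>

struct ser_state {
	int dummy;
};

struct cardstate {
	struct ser_state *ser;
};

/* Counts how many times real state was released. */
static int release_count;

/* Free the hardware state and clear the pointer so a repeat call is harmless. */
static void release_hw(struct cardstate *cs)
{
	assert(cs);
	if (cs->ser)
		release_count++;
	free(cs->ser);  /* free(NULL) is a no-op */
	cs->ser = NULL; /* a second call, or a later "if (cs->ser)", sees NULL */
}
```

Clearing the pointer turns a double free into a no-op and a later dereference into an immediate NULL fault, which is the guard described in the quoted reply; it does nothing for other copies of the pointer held elsewhere.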

> I'm a big fan of one change per patch. If we also want to modify the
> moved code then that should be done in a separate patch. It makes
> bisecting so much easier. Same reason why I separated out patch 3/3.


Fair enough.


Paul Bolle


Re: [PATCH 0/3] ser_gigaset fixes

2015-12-10 Thread Paul Bolle
Hi Tilman,

On di, 2015-12-08 at 12:00 +0100, Tilman Schmidt wrote:
> this series is the result of our discussion on the "freeing an
> active object" bug. I split my proposed patch into two patches
> for the separate topics of moving the ser_cardstate kfree() and
> dropping the useless kfree()s, and also included an unrelated
> patch (1/3) that had fallen through the cracks in my last series.
> 
> Patch 2/3 should go into stable releases all the way back to 2.6.32.
> It applies cleanly to release 3.*/4.* with at most offset 1.
> For release 2.6.32 there is a trivial merge conflict with a removed
> comment line.

1/3 ran into objections and, I think, Alan Cox is working on an
alternative for it. Would you mind resending 2/3 and 3/3 as a two-patch
series? Feel free to add
Acked-by: Paul Bolle 

to both.

(The previous gigaset series, which you sent in July this year, was
picked up from netdev directly by David Miller. Unless people actually
prefer these patches to also be signed-off by me, I'm perfectly fine
with that.)

Thanks,


Paul Bolle


[iproute PATCH] tc.8: Fix reference to tc-tcindex.8

2015-12-10 Thread Phil Sutter
Just a typo there, it's spelled correctly in the SEE ALSO section.

Signed-off-by: Phil Sutter 
---
 man/man8/tc.8 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/man/man8/tc.8 b/man/man8/tc.8
index 6275c4b32b167..4e99dcad5b1da 100644
--- a/man/man8/tc.8
+++ b/man/man8/tc.8
@@ -181,7 +181,7 @@ Match Resource Reservation Protocol (RSVP) packets.
 .TP
 tcindex
 Filter packets based on traffic control index. See
-.BR tc-index (8).
+.BR tc-tcindex (8).
 .TP
 u32
 Generic filtering on arbitrary packet data, assisted by syntax to abstract common operations. See
-- 
2.5.0



RE: [PATCH net] ipv6: sctp: clone options to avoid use after free

2015-12-10 Thread David Laight
From: Daniel Borkmann
> Sent: 09 December 2015 19:19
> On 12/09/2015 06:11 PM, Marcelo Ricardo Leitner wrote:
> > Em 09-12-2015 14:31, David Laight escreveu:
> >> From: Eric Dumazet [mailto:eric.duma...@gmail.com]
> >>> Sent: 09 December 2015 16:00
> >>> On Wed, 2015-12-09 at 15:49 +, David Laight wrote:
> > SCTP is lacking proper np->opt cloning at accept() time.
> >
> > TCP and DCCP use ipv6_dup_options() helper, do the same in SCTP.
> >
> > We might later factorize this code in a common helper to avoid
> > future mistakes.
> 
>  I'm wondering what the real impact of this and the other recent
>  SCTP bugs/patches is on real workloads?
>  We have enough trouble getting our customers to use kernels
>  later than the 2.6.18 based RHEL5 - without having to persuade
>  them to use kernels that contain very recent fixes.
> >>>
> >>> It all depends if your customers let (hostile ?) people run programs on
> >>> the boxes.
> >>
> >> If they require hostile programs I'm not worried.
> >
> > Not really "require", but "allow", as in: allowing third-party applications
> > to run on it.
> 
> Yeah :/ given distros enable almost everything anyway, the first unpriv'ed
> socket(..., IPPROTO_SCTP) call auto-loads SCTP module. But to be honest, I'd
> be surprised if Cloud providers allow for this. Most of this might only run
> on dedicated boxes with telco appliances.

Yes, I'm worried about whether our M3UA code is likely to crash customer
systems, not whether hostile applications can crash it.
These boxes ought to be on private networks since the sigtran protocols
themselves have nothing that even gives a hint of security.

David


Re: [PATCH v3 1/4] VSOCK: Introduce virtio-vsock-common.ko

2015-12-10 Thread Alex Bennée

Stefan Hajnoczi  writes:

> From: Asias He 
>
> This module contains the common code and header files for the following
> virtio-vsock and virtio-vhost kernel modules.

General comment: checkpatch has a bunch of warnings about 80 character
limits, extra braces and BUG_ON usage.

>
> Signed-off-by: Asias He 
> Signed-off-by: Stefan Hajnoczi 
> ---
> v3:
>  * Remove unnecessary 3-way handshake, just do REQUEST/RESPONSE instead
>of REQUEST/RESPONSE/ACK
>  * Remove SOCK_DGRAM support and focus on SOCK_STREAM first
>  * Only allow host->guest connections (same security model as latest
>VMware)
> v2:
>  * Fix peer_buf_alloc inheritance on child socket
>  * Notify other side of SOCK_STREAM disconnect (fixes shutdown
>semantics)
>  * Avoid recursive mutex_lock(tx_lock) for write_space (fixes deadlock)
>  * Define VIRTIO_VSOCK_TYPE_STREAM/DGRAM hardware interface constants
>  * Define VIRTIO_VSOCK_SHUTDOWN_RCV/SEND hardware interface constants
> ---
>  include/linux/virtio_vsock.h| 203 
>  include/uapi/linux/virtio_ids.h |   1 +
>  include/uapi/linux/virtio_vsock.h   |  87 
>  net/vmw_vsock/virtio_transport_common.c | 854 
> 
>  4 files changed, 1145 insertions(+)
>  create mode 100644 include/linux/virtio_vsock.h
>  create mode 100644 include/uapi/linux/virtio_vsock.h
>  create mode 100644 net/vmw_vsock/virtio_transport_common.c
>
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> new file mode 100644
> index 000..e54eb45
> --- /dev/null
> +++ b/include/linux/virtio_vsock.h
> @@ -0,0 +1,203 @@
> +/*
> + * This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
> + * anyone can use the definitions to implement compatible drivers/servers:

Is anything in here actually exposed to userspace or the guest? The
#ifdef __KERNEL__ statement seems redundant for this file at least.

> + *
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *notice, this list of conditions and the following disclaimer in the
> + *documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of IBM nor the names of its contributors
> + *may be used to endorse or promote products derived from this software
> + *without specific prior written permission.
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS''
> + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + *
> + * Copyright (C) Red Hat, Inc., 2013-2015
> + * Copyright (C) Asias He , 2013
> + * Copyright (C) Stefan Hajnoczi , 2015
> + */
> +
> +#ifndef _LINUX_VIRTIO_VSOCK_H
> +#define _LINUX_VIRTIO_VSOCK_H
> +
> +#include 
> +#include 
> +#include 
> +
> +#define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE 128
> +#define VIRTIO_VSOCK_DEFAULT_BUF_SIZE (1024 * 256)
> +#define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE (1024 * 256)
> +#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (1024 * 4)
> +#define VIRTIO_VSOCK_MAX_BUF_SIZE 0xUL
> +#define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE (1024 * 64)
> +#define VIRTIO_VSOCK_MAX_TX_BUF_SIZE (1024 * 1024 * 16)
> +#define VIRTIO_VSOCK_MAX_DGRAM_SIZE (1024 * 64)
> +
> +struct vsock_transport_recv_notify_data;
> +struct vsock_transport_send_notify_data;
> +struct sockaddr_vm;
> +struct vsock_sock;
> +
> +enum {
> + VSOCK_VQ_CTRL   = 0,
> + VSOCK_VQ_RX = 1, /* for host to guest data */
> + VSOCK_VQ_TX = 2, /* for guest to host data */
> + VSOCK_VQ_MAX= 3,
> +};
> +
> +/* virtio transport socket state */
> +struct virtio_transport {
> + struct virtio_transport_pkt_ops *ops;
> + struct vsock_sock *vsk;
> +
> + u32 buf_size;
> + u32 buf_size_min;
> + u32 buf_size_max;
> +
> + struct mutex tx_lock;
> + struct mutex rx_lock;
> +
> + struct list_head 

Re: [PATCH] iwlegacy: mark il_adjust_beacon_interval as noinline

2015-12-10 Thread Stanislaw Gruszka
On Wed, Dec 09, 2015 at 05:42:41PM +0100, Arnd Bergmann wrote:
> With the new optimized do_div() code, some versions of gcc
> produce obviously incorrect code that leads to a link error
> in iwlegacy/common.o:
> 
> drivers/built-in.o: In function `il_send_rxon_timing':
> :(.text+0xa6b4d4): undefined reference to `ilog2_NaN'
> :(.text+0xa6b4f0): undefined reference to `__aeabi_uldivmod'
> 
> In a few thousand randconfig builds, I have seen this problem
> a couple of times in this file, but never anywhere else in the
> kernel, so we can try to work around this in the only file
> that shows the behavior, by marking the il_adjust_beacon_interval
> function as noinline, which convinces gcc to use the unoptimized
> do_div() all the time.

I don't think this is a good way to "fix" the issue, but I also have
nothing against this particular change.

Acked-by: Stanislaw Gruszka 
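
For anyone unfamiliar with the annotation being discussed, marking a function noinline looks roughly like this with GCC or Clang. The helper below is illustrative only: adjust_interval() and NOINLINE_SKETCH are made-up names, the arithmetic is a simplified stand-in rather than the driver's il_adjust_beacon_interval(), and it uses plain integer division instead of do_div().

```c
#include <assert.h>
#include <stdint.h>

/* GCC and Clang spelling; kernel code would use its own noinline macro. */
#define NOINLINE_SKETCH __attribute__((noinline))

/*
 * Illustrative helper: shrink an interval by an integer factor so that it
 * fits under max_val. Keeping it out of line means the compiler cannot
 * constant-fold the division at each call site.
 */
static NOINLINE_SKETCH uint16_t adjust_interval(uint16_t val, uint16_t max_val)
{
	uint16_t factor;
	uint16_t new_val;

	assert(max_val != 0);
	if (!val)
		return max_val;
	factor = (uint16_t)((val + max_val) / max_val);
	new_val = (uint16_t)(val / factor);
	return new_val ? new_val : max_val;
}
```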


Re: [PATCH] ieee802154-atusb: Delete an unnecessary check before the function call "kfree_skb"

2015-12-10 Thread Stefan Schmidt

Marcel,

On 17/11/15 17:18, Stefan Schmidt wrote:

Hello.

On 17/11/15 17:17, Stefan Schmidt wrote:

From: Markus Elfring 
Date: Mon, 16 Nov 2015 13:50:23 +0100

The kfree_skb() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/net/ieee802154/atusb.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ieee802154/atusb.c b/drivers/net/ieee802154/atusb.c
index 199a94a..b1cd865 100644
--- a/drivers/net/ieee802154/atusb.c
+++ b/drivers/net/ieee802154/atusb.c
@@ -310,8 +310,7 @@ static void atusb_free_urbs(struct atusb *atusb)
 urb = usb_get_from_anchor(&atusb->idle_urbs);
 if (!urb)
 break;
-if (urb->context)
-kfree_skb(urb->context);
+kfree_skb(urb->context);
 usb_free_urb(urb);
 }
 }


Acked-by: Stefan Schmidt 



Did you get the original patch and my ACK on this one, or would you prefer me
to resend it?


regards
Stefan Schmidt


Re: [iproute PATCH] route: ignore RTAX_HOPLIMIT of value -1

2015-12-10 Thread Stephen Hemminger
On Wed, 2 Dec 2015 12:50:22 +
Phil Sutter  wrote:

> Older kernels use -1 internally as indicator to use the sysctl default,
> but they still export the setting. Newer kernels use 0 to indicate that
> (which is why the conversion from -1 to 0 was done here), but they also
> stopped exporting the value. Since the meaning of -1 is clear, treat it
> equally like default on newer kernels (which is to not print anything).
> 
> Signed-off-by: Phil Sutter 
> ---
>  ip/iproute.c | 17 -
>  1 file changed, 8 insertions(+), 9 deletions(-)

Applied, thanks


Re: [PATCH iproute2] libnetlink: don't confuse variables in rtnl_talk()

2015-12-10 Thread Stephen Hemminger
On Thu,  3 Dec 2015 17:13:48 +0100
Nicolas Dichtel  wrote:

> There are two variables named 'len' in rtnl_talk. In fact, commit
> c079e121a73a didn't work. For example, it was possible to trigger
> a seg fault with this command:
> $ ip link set gre2 type ip6gre hoplimit 32
> 
> Let's rename the argument len to maxlen.
> 
> Fixes: c079e121a73a ("libnetlink: add size argument to rtnl_talk")
> Reported-by: Thomas Faivre 
> Signed-off-by: Nicolas Dichtel 

Applied, thanks. Not sure why the compiler isn't catching this.
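
On the compiler question: shadowing a parameter with an inner declaration is
legal C, and GCC and Clang only report it when -Wshadow is given, which is
not part of -Wall. The sketch below illustrates the general class of bug
under that assumption; it is not the actual rtnl_talk() code, and both
function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Buggy variant: the inner 'len' shadows the parameter, so the size used
 * for the copy is the reply length rather than the caller's buffer size.
 * The copy itself is left out so the sketch stays free of undefined
 * behaviour; the return value is the byte count that would be copied.
 */
static size_t copy_reply_buggy(char *answer, size_t len, const char *reply)
{
	(void)answer;
	(void)len;	/* the caller's limit is never consulted below */
	{
		size_t len = strlen(reply) + 1;	/* shadows the parameter */

		/* memcpy(answer, reply, len) here could overrun 'answer' */
		return len;
	}
}

/* Fixed variant: distinct names, in the spirit of the rename to 'maxlen'. */
static size_t copy_reply_fixed(char *answer, size_t maxlen, const char *reply)
{
	size_t len = strlen(reply) + 1;

	if (len > maxlen)
		len = maxlen;
	memcpy(answer, reply, len);
	return len;
}
```

Compiling the buggy variant with -Wshadow should produce a warning that the
declaration of 'len' shadows a parameter.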



Re: [PATCH] ieee802154-atusb: Delete an unnecessary check before the function call "kfree_skb"

2015-12-10 Thread Marcel Holtmann
Hi Stefan,

>>> From: Markus Elfring 
>>> Date: Mon, 16 Nov 2015 13:50:23 +0100
>>> 
>>> The kfree_skb() function tests whether its argument is NULL and then
>>> returns immediately. Thus the test around the call is not needed.
>>> 
>>> This issue was detected by using the Coccinelle software.
>>> 
>>> Signed-off-by: Markus Elfring 
>>> ---
>>> drivers/net/ieee802154/atusb.c | 3 +--
>>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>> 
>>> diff --git a/drivers/net/ieee802154/atusb.c b/drivers/net/ieee802154/atusb.c
>>> index 199a94a..b1cd865 100644
>>> --- a/drivers/net/ieee802154/atusb.c
>>> +++ b/drivers/net/ieee802154/atusb.c
>>> @@ -310,8 +310,7 @@ static void atusb_free_urbs(struct atusb *atusb)
>>> urb = usb_get_from_anchor(&atusb->idle_urbs);
>>> if (!urb)
>>> break;
>>> -if (urb->context)
>>> -kfree_skb(urb->context);
>>> +kfree_skb(urb->context);
>>> usb_free_urb(urb);
>>> }
>>> }
>> 
>> Acked-by: Stefan Schmidt 
> 
> 
> Did you get the original patch and my ACK on this one, or would you prefer
> that I resend it?

this slipped through, but now it no longer applies.

Applying: ieee802154-atusb: Delete an unnecessary check before the function call "kfree_skb"
error: patch failed: drivers/net/ieee802154/atusb.c:310
error: drivers/net/ieee802154/atusb.c: patch does not apply
Patch failed at 0001 ieee802154-atusb: Delete an unnecessary check before the function call "kfree_skb"

Regards

Marcel



[PATCH net v2] net: Flush local routes when device changes vrf association

2015-12-10 Thread David Ahern
The VRF driver cycles netdevs when an interface is enslaved or released:
the down event is used to flush neighbor and route tables and the up
event (if the interface was already up) effectively moves local and
connected routes to the proper table.

As of 4f823defdd5b the local route is left hanging around after a link
down, so when a netdev is moved from one VRF to another (or released
from a VRF altogether) local routes are left in the wrong table.

Fix by handling the NETDEV_CHANGEUPPER event. When the upper dev is
an L3mdev then call fib_disable_ip to flush all routes, local ones
too.

Fixes: 4f823defdd5b ("ipv4: fix to not remove local route on link down")
Cc: Julian Anastasov 
Signed-off-by: David Ahern 
---
v2
- key off NETDEV_CHANGEUPPER event vs using a new event

 net/ipv4/fib_frontend.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index cc8f3e506cde..473447593060 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1155,6 +1155,7 @@ static int fib_inetaddr_event(struct notifier_block *this, unsigned long event,
 static int fib_netdev_event(struct notifier_block *this, unsigned long event, void *ptr)
 {
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+   struct netdev_notifier_changeupper_info *info;
struct in_device *in_dev;
struct net *net = dev_net(dev);
unsigned int flags;
@@ -1193,6 +1194,14 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
case NETDEV_CHANGEMTU:
rt_cache_flush(net);
break;
+   case NETDEV_CHANGEUPPER:
+   info = ptr;
+   /* flush all routes if dev is linked to or unlinked from
+* an L3 master device (e.g., VRF)
+*/
+   if (info->upper_dev && netif_is_l3_master(info->upper_dev))
+   fib_disable_ip(dev, NETDEV_DOWN, true);
+   break;
}
return NOTIFY_DONE;
 }
-- 
1.9.1



Re: [PATCH] ip neigh: device is optional for proxy entries

2015-12-10 Thread Stephen Hemminger
On Tue, 01 Dec 2015 01:17:06 +0300
Konstantin Khlebnikov  wrote:

> Though dumping such entries crashes present kernels.
> 
> Signed-off-by: Konstantin Khlebnikov 

Applied, thanks, but nobody will use it until after your
kernel patch makes it to stable.


Re: [PATCH v5 1/4] stmmac: create of compatible mdio bus for stmacc driver

2015-12-10 Thread Giuseppe CAVALLARO

Hi

also pls fix this typo

stmmac: create of compatible mdio bus for stmacc driver
                                          ^^^^^^
                                          stmmac

On 12/9/2015 9:39 AM, Phil Reid wrote:

The DSA driver needs to be passed a reference to an mdio bus. Typically
the mac is configured to use a fixed link but the mdio bus still needs
to be registered so that it can configure the switch.
This patch follows the same process as the altera tse ethernet driver for
creation of the mdio bus.

Acked-by: Rob Herring 
Signed-off-by: Phil Reid 
---
  Documentation/devicetree/bindings/net/stmmac.txt   |  8 ++
  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c  | 31 +++---
  .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |  2 +-
  3 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/stmmac.txt 
b/Documentation/devicetree/bindings/net/stmmac.txt
index f34fc3c..fd5ddf8 100644
--- a/Documentation/devicetree/bindings/net/stmmac.txt
+++ b/Documentation/devicetree/bindings/net/stmmac.txt
@@ -47,6 +47,7 @@ Optional properties:
  - snps,burst_len: The AXI burst lenth value of the AXI BUS MODE register.
  - tx-fifo-depth: See ethernet.txt file in the same directory
  - rx-fifo-depth: See ethernet.txt file in the same directory
+- mdio: with compatible = "snps,dwmac-mdio", create and register mdio bus.

  Examples:

@@ -65,4 +66,11 @@ Examples:
tx-fifo-depth = <16384>;
clocks = <>;
clock-names = "stmmaceth";
+   mdio0 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   compatible = "snps,dwmac-mdio";
+   phy1: ethernet-phy@0 {
+   };
+   };
};
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index bba670c..bb6f75c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -29,7 +29,7 @@
  #include 
  #include 
  #include 
-
+#include 
  #include 

  #include "stmmac.h"
@@ -200,10 +200,29 @@ int stmmac_mdio_register(struct net_device *ndev)
struct stmmac_priv *priv = netdev_priv(ndev);
struct stmmac_mdio_bus_data *mdio_bus_data = priv->plat->mdio_bus_data;
int addr, found;
+   struct device_node *mdio_node = NULL;
+   struct device_node *child_node = NULL;

if (!mdio_bus_data)
return 0;

+   if (IS_ENABLED(CONFIG_OF)) {
+   for_each_child_of_node(priv->device->of_node, child_node) {
+   if (of_device_is_compatible(child_node,
+   "snps,dwmac-mdio")) {
+   mdio_node = child_node;
+   break;
+   }
+   }
+
+   if (mdio_node) {
+   netdev_dbg(ndev, "FOUND MDIO subnode\n");
+   } else {
+   netdev_err(ndev, "NO MDIO subnode\n");
+   return 0;
+   }
+   }
+
new_bus = mdiobus_alloc();
if (new_bus == NULL)
return -ENOMEM;
@@ -231,7 +250,8 @@ int stmmac_mdio_register(struct net_device *ndev)
new_bus->irq = irqlist;
new_bus->phy_mask = mdio_bus_data->phy_mask;
new_bus->parent = priv->device;
-   err = mdiobus_register(new_bus);
+
+   err = of_mdiobus_register(new_bus, mdio_node);
if (err != 0) {
pr_err("%s: Cannot register as MDIO bus\n", new_bus->name);
goto bus_register_fail;
@@ -284,13 +304,6 @@ int stmmac_mdio_register(struct net_device *ndev)
}
}

-   if (!found) {
-   pr_warn("%s: No PHY found\n", ndev->name);
-   mdiobus_unregister(new_bus);
-   mdiobus_free(new_bus);
-   return -ENODEV;
-   }


Hmm, this could be necessary on some platforms that want to
get the PHY address at runtime and, in case of failure,
remove the registered bus.


-
priv->mii = new_bus;

return 0;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index d02691b..6863420 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -146,7 +146,7 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
if (of_property_read_u32(np, "snps,phy-addr", &plat->phy_addr) == 0)
dev_warn(&pdev->dev, "snps,phy-addr property is deprecated\n");

-   if (plat->phy_node || plat->phy_bus_name)


Can this break some configuration cases?


+   if (plat->phy_bus_name)
plat->mdio_bus_data = NULL;
else
  

RE: [PATCH net] ipv6: sctp: clone options to avoid use after free

2015-12-10 Thread David Laight
From: Eric Dumazet
> Sent: 10 December 2015 15:58
>
> BTW, are you even using IPv6 SCTP sessions ?

Our M3UA/SCTP protocol stack supports them and defaults to using
IPv6 listening sockets for IPv4 connections.

I very much doubt that any customers have used them yet.
So most of the IPv6 connections will have been to ::1
during internal regression testing.

We don't even try to set any IPv6 (or IPv4) options.

Just SO_REUSEADDR, TCP/SCTP_NODELAY, SCTP_EVENTS, SCTP_INITMSG,
SO_KEEPALIVE (tcp), IPV6_V6ONLY (if binding separate listeners),
SCTP_SOCKOPT_BINDX_ADD (WTF, why is this a 'socket option'?) and
SO_LINGER (to get abortive close on SCTP connections on kernels
before 3.18).

David



Re: [BUG] net: performance regression on ixgbe (Intel 82599EB 10-Gigabit NIC)

2015-12-10 Thread Rick Jones

On 12/10/2015 06:18 AM, Otto Sabart wrote:

*) Is irqbalance disabled and the IRQs set the same each time, or might
there be variability possible there?  Each of the five netperf runs will be
a different four-tuple which means each may (or may not) get RSS hashed/etc
differently.


The irqbalance is disabled on all systems.

Can you suggest, if there is a need to assign irqs manually? Which irqs
we should pin to which CPU?


Likely as not it will depend on your goals.  When I want single-stream 
results, I will tend to disable irqbalance and set all the IRQs to one 
CPU in the system (often as not CPU0 but that is as much habit as 
anything else).  The idea is to clamp-down on any source of run-to-run 
variation.  I will also sometimes alter where I bind netperf/netserver 
to show the effects (especially on service demand) when 
netperf/netserver run on the same CPU as the IRQ, a thread in the same 
core as the IRQ, a core in the same processor as the IRQ and/or a core 
in another processor.  Unless all the IRQs are pointed at the same CPU 
(or I always specify the same, full four-tuple for addressing and wait 
for TIME_WAIT) that can be a challenge to keep straight.


When I want to measure aggregate, I either let irqbalance do its thing 
and run a bunch of warm-up tests, or simply peanut-butter the IRQs 
across the CPUs with variations on the theme of:


grep eth[23] /proc/interrupts | awk -F ":" -v cpus=12 '{mask = 1 * 2^(count++ % cpus);printf("echo %x > /proc/irq/%d/smp_affinity\n",mask,$1)}' | sh


How one might structure/alter that pipeline will depend on the CPU 
enumeration.  That one was from a 2x6 core system where I didn't want to 
hit the second thread of each core, and the enumeration was the first 
twelve CPUs were on thread 0 of each core of both processors.
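
The mask arithmetic in that pipeline can be sketched in C as follows. This
is only an illustration of the round-robin "one bit per IRQ" calculation;
the helper names are hypothetical, and it assumes the CPU count is smaller
than the number of bits in an unsigned long.

```c
#include <stdio.h>

/*
 * Affinity mask for the n-th matching IRQ (counting from 0) when IRQs are
 * spread round-robin over the first 'cpus' CPUs: exactly one bit is set,
 * at position n % cpus. Assumes cpus is below the width of unsigned long.
 */
static unsigned long irq_affinity_mask(unsigned int n, unsigned int cpus)
{
	return 1UL << (n % cpus);
}

/* Format the shell command the awk script would emit for one IRQ. */
static int format_affinity_cmd(char *out, size_t outlen, unsigned int irq,
			       unsigned int n, unsigned int cpus)
{
	return snprintf(out, outlen, "echo %lx > /proc/irq/%u/smp_affinity",
			irq_affinity_mask(n, cpus), irq);
}
```

With cpus=12 the thirteenth IRQ wraps back to CPU 0, which is what keeps the
second hardware thread of each core untouched on the enumeration described
above.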



*) It is perhaps adding duct tape to already-present belt and suspenders,
but is power-management set to a fixed state on the systems involved? (Since
this seems to be ProLiant G7s going by the legends on the charts, either
static high perf or static low power I would imagine)


Power management is set to OS-Control in bios, which effectively means,
that _bios_ does not do any power management at all.


Probably just as well :)


*) What is the difference before/after for the service demands?  The netperf
tests being run are asking for CPU utilization but I don't see the service
demand change being summarized.


Unfortunately we do not have any summary chart for service demands;
we will add some shortly.


*) Does a specific CPU on one side or the other saturate?
(LOCAL_CPU_PEAK_UTIL, LOCAL_CPU_PEAK_ID, REMOTE_CPU_PEAK_UTIL,
REMOTE_CPU_PEAK_ID output selectors)


We are sort of stuck in the stone age. We still use the old-fashioned tcp/udp
migrated tests, but we plan to switch to omni.


Well, you don't have to invoke with -t omni to make use of the output 
selectors - just add the -O (or -o or -k) test-specific option.





*) What are the processors involved?  Presumably the "other system" is
fixed?


In this case:

hp-dl380g7 - $ lscpu:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 44
Model name:            Intel(R) Xeon(R) CPU   X5650  @ 2.67GHz
Stepping:              2
CPU MHz:               2660.000
BogoMIPS:              5331.27
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23


hp-dl385g7 - $ lscpu:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    1
Core(s) per socket:    12
Socket(s):             2
NUMA node(s):          4
Vendor ID:             AuthenticAMD
CPU family:            16
Model:                 9
Model name:            AMD Opteron(tm) Processor 6172
Stepping:              1
CPU MHz:               2100.000
BogoMIPS:              4200.39
Virtualization:        AMD-V
L1d cache:             64K
L1i cache:             64K
L2 cache:              512K
L3 cache:              5118K
NUMA node0 CPU(s):     0,2,4,6,8,10
NUMA node1 CPU(s):     12,14,16,18,20,22
NUMA node2 CPU(s):     13,15,17,19,21,23
NUMA node3 CPU(s):     1,3,5,7,9,11


I guess that helps explain why there were such large differences in the 
deltas between TCP_STREAM and TCP_MAERTS since it wasn't the same 
per-core "horsepower" on either side and so why LRO on/off could have 
also affected the TCP_STREAM results. (When LRO was off it was off on 
both sides, and when on was on on both yes?)


happy benchmarking,

rick jones


Re: [PATCH net] sfc: only use RSS filters if we're using RSS

2015-12-10 Thread David Miller
From: Bert Kenward 
Date: Thu, 10 Dec 2015 13:30:08 +

> Without this filter insertion on a VF would fail if only one channel
> was in use. This would include the unicast station filter and therefore
> no traffic would be received.
> 
> Signed-off-by: Bert Kenward 
> ---
>  drivers/net/ethernet/sfc/ef10.c  | 23 +++
>  drivers/net/ethernet/sfc/efx.h   |  5 +
>  drivers/net/ethernet/sfc/farch.c |  2 +-
>  3 files changed, 17 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
> index c4a0e8a..368e98e 100644
> --- a/drivers/net/ethernet/sfc/ef10.c
> +++ b/drivers/net/ethernet/sfc/ef10.c
> @@ -3309,7 +3309,7 @@ static int efx_ef10_filter_remove_internal(struct 
> efx_nic *efx,
>  
>   new_spec.priority = EFX_FILTER_PRI_AUTO;
>   new_spec.flags = (EFX_FILTER_FLAG_RX |
> -   EFX_FILTER_FLAG_RX_RSS);
> + (efx_rss_enabled(efx) ? EFX_FILTER_FLAG_RX_RSS : 0));

This was indented properly before your change; please don't damage the
indentation like this.  There must be 4 TAB characters and 2 SPACE characters
on that second line of the assignment so that the first character is precisely
at the first column after the opening parenthesis of the first line.
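
The "4 TABs and 2 SPACEs" figure follows from simple column arithmetic. The
sketch below assumes 8-column tabs (the kernel coding style) and assumes the
assignment itself is indented two tabs deep; the archive has collapsed the
original whitespace, so that depth is an inference that happens to match the
stated result. Both helper names are hypothetical.

```c
#include <stddef.h>

#define TAB_WIDTH 8	/* kernel coding style: tabs are 8 columns wide */

/* Split a 0-based target column into leading TABs plus SPACEs. */
static void indent_for_column(unsigned int column, unsigned int *tabs,
			      unsigned int *spaces)
{
	*tabs = column / TAB_WIDTH;
	*spaces = column % TAB_WIDTH;
}

/*
 * Column of the first character after the opening parenthesis, given the
 * statement's indentation depth in tabs and the length of the text up to
 * and including that parenthesis.
 */
static unsigned int column_after_paren(unsigned int depth_tabs,
				       size_t prefix_len)
{
	return depth_tabs * TAB_WIDTH + (unsigned int)prefix_len;
}
```

For the prefix "new_spec.flags = (" (18 characters) at a depth of two tabs,
the continuation line starts at column 34, which is 4 tabs plus 2 spaces.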



Re: [PATCH] sched/cls_flow.c : allow nfct-* keys work on ingress interfaces

2015-12-10 Thread Cong Wang
Hello,

On Thu, Dec 10, 2015 at 5:24 AM, Гаврилов Игорь  wrote:
> Improved CTTUPLE macro with code from sched/act_connmark.c, so it is able to
> get unNATed addresses from nf_conntrack.

Please follow the guideline to submit a formal kernel patch:
https://www.kernel.org/doc/Documentation/SubmittingPatches

Hints:
a) The patch has to be inlined
b) You have to sign it off
c) Please Cc the maintainers, in this case, Jamal Hadi Salim 
d) Use checkpatch.pl to check coding styles etc.

Thanks.


Re: [PATCH v3 1/4] VSOCK: Introduce virtio-vsock-common.ko

2015-12-10 Thread Stefan Hajnoczi
On Thu, Dec 10, 2015 at 10:17:07AM +, Alex Bennée wrote:
> Stefan Hajnoczi  writes:
> 
> > From: Asias He 
> >
> > This module contains the common code and header files for the following
> > virtio-vsock and virtio-vhost kernel modules.
> 
> General comment: checkpatch has a bunch of warnings about 80 character
> limits, extra braces and BUG_ON usage.

Will fix in the next version.

> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > new file mode 100644
> > index 000..e54eb45
> > --- /dev/null
> > +++ b/include/linux/virtio_vsock.h
> > @@ -0,0 +1,203 @@
> > +/*
> > + * This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
> > + * anyone can use the definitions to implement compatible
> > drivers/servers:
> 
> Is anything in here actually exposed to userspace or the guest? The
> #ifdef __KERNEL__ statement seems redundant for this file at least.

You are right.  I think the header was copied from a uapi file.

I'll compare against other virtio code and apply an appropriate header.

> > +void virtio_vsock_dumppkt(const char *func,  const struct virtio_vsock_pkt 
> > *pkt)
> > +{
> > +   pr_debug("%s: pkt=%p, op=%d, len=%d, %d:%d---%d:%d, len=%d\n",
> > +func, pkt,
> > +le16_to_cpu(pkt->hdr.op),
> > +le32_to_cpu(pkt->hdr.len),
> > +le32_to_cpu(pkt->hdr.src_cid),
> > +le32_to_cpu(pkt->hdr.src_port),
> > +le32_to_cpu(pkt->hdr.dst_cid),
> > +le32_to_cpu(pkt->hdr.dst_port),
> > +pkt->len);
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_vsock_dumppkt);
> 
> Why export this at all? The only users are in this file so you could
> make it static.

I'll make it static.

> > +u32 virtio_transport_get_credit(struct virtio_transport *trans, u32 credit)
> > +{
> > +   u32 ret;
> > +
> > +   mutex_lock(&trans->tx_lock);
> > +   ret = trans->peer_buf_alloc - (trans->tx_cnt - trans->peer_fwd_cnt);
> > +   if (ret > credit)
> > +   ret = credit;
> > +   trans->tx_cnt += ret;
> > +   mutex_unlock(&trans->tx_lock);
> > +
> > +   pr_debug("%s: ret=%d, buf_alloc=%d, peer_buf_alloc=%d,"
> > +"tx_cnt=%d, fwd_cnt=%d, peer_fwd_cnt=%d\n", __func__,
> 
> I think __func__ is superfluous here as the dynamic print code already
> has it and can print it when required. Having said that there seems to
> be plenty of code already in the kernel that uses __func__ :-/

I'll convert most printks to tracepoints in the next revision.
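
For reference, the credit calculation in the quoted
virtio_transport_get_credit() can be sketched in user space as below. The
struct is a hypothetical, lock-free stand-in for the relevant fields, and
the field descriptions in the comments are an interpretation of the names
rather than something the patch states.

```c
#include <stdint.h>

/* Hypothetical stand-in for the fields used by the quoted function. */
struct credit_state {
	uint32_t peer_buf_alloc;	/* presumably: buffer space the peer advertises */
	uint32_t peer_fwd_cnt;		/* presumably: bytes the peer has consumed */
	uint32_t tx_cnt;		/* presumably: bytes sent so far */
};

/*
 * Grant up to 'wanted' bytes of send credit. The available credit is the
 * peer's buffer allocation minus the bytes still in flight
 * (tx_cnt - peer_fwd_cnt); the grant is then added to tx_cnt.
 * The real function performs this under tx_lock.
 */
static uint32_t get_credit(struct credit_state *s, uint32_t wanted)
{
	uint32_t avail = s->peer_buf_alloc - (s->tx_cnt - s->peer_fwd_cnt);

	if (avail > wanted)
		avail = wanted;
	s->tx_cnt += avail;
	return avail;
}
```

Once tx_cnt - peer_fwd_cnt reaches peer_buf_alloc the function returns 0
until the peer reports more forwarded bytes.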

> > +u64 virtio_transport_get_max_buffer_size(struct vsock_sock *vsk)
> > +{
> > +   struct virtio_transport *trans = vsk->trans;
> > +
> > +   return trans->buf_size_max;
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_transport_get_max_buffer_size);
> 
> All these accessor functions seem pretty simple. Maybe they should be
> inline header functions or even #define macros?

They are used as struct vsock_transport function pointers.  What is the
advantage to inlining them?

> > +int virtio_transport_notify_send_post_enqueue(struct vsock_sock *vsk,
> > +   ssize_t written, struct vsock_transport_send_notify_data *data)
> > +{
> > +   return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_transport_notify_send_post_enqueue);
> 
> This makes me wonder if the calling code should be having
> if(transport->fn) checks rather than filling stuff out with null
> implementations but I guess that's a question better aimed at the
> maintainers.

I've considered it too.  I'll try to streamline this in the next
revision.

> > +/* We are under the virtio-vsock's vsock->rx_lock or
> > + * vhost-vsock's vq->mutex lock */
> > +void virtio_transport_recv_pkt(struct virtio_vsock_pkt *pkt)
> > +{
> > +   struct virtio_transport *trans;
> > +   struct sockaddr_vm src, dst;
> > +   struct vsock_sock *vsk;
> > +   struct sock *sk;
> > +
> > +   vsock_addr_init(&src, le32_to_cpu(pkt->hdr.src_cid), le32_to_cpu(pkt->hdr.src_port));
> > +   vsock_addr_init(&dst, le32_to_cpu(pkt->hdr.dst_cid), le32_to_cpu(pkt->hdr.dst_port));
> > +
> > +   virtio_vsock_dumppkt(__func__, pkt);
> > +
> > +   if (le16_to_cpu(pkt->hdr.type) != VIRTIO_VSOCK_TYPE_STREAM) {
> > +   /* TODO send RST */
> 
> TODO's shouldn't make it into final submissions.
> 
> > +   goto free_pkt;
> > +   }
> > +
> > +   /* The socket must be in connected or bound table
> > +* otherwise send reset back
> > +*/
> > +   sk = vsock_find_connected_socket(&src, &dst);
> > +   if (!sk) {
> > +   sk = vsock_find_bound_socket(&dst);
> > +   if (!sk) {
> > +   pr_debug("%s: can not find bound_socket\n", __func__);
> > +   virtio_vsock_dumppkt(__func__, pkt);
> > +   /* Ignore this pkt instead of sending reset back */
> > +   /* TODO send a RST unless this packet is a RST
> > (to avoid infinite loops) */
> 
> Ditto.

Thanks, I'll complete the RST code in the next revision.




Re: [PATCH v5 1/4] stmmac: create of compatible mdio bus for stmacc driver

2015-12-10 Thread Phil Reid

G'day Giuseppe,

On 11/12/2015 1:16 AM, Giuseppe CAVALLARO wrote:

Hi

also pls fix this typo

stmmac: create of compatible mdio bus for stmacc driver
                                          ^^^^^^
                                          stmmac

Will do.


On 12/9/2015 9:39 AM, Phil Reid wrote:

The DSA driver needs to be passed a reference to an mdio bus. Typically
the mac is configured to use a fixed link but the mdio bus still needs
to be registered so that it can configure the switch.
This patch follows the same process as the altera tse ethernet driver for
creation of the mdio bus.

Acked-by: Rob Herring 
Signed-off-by: Phil Reid 
---
  Documentation/devicetree/bindings/net/stmmac.txt   |  8 ++
  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c  | 31 +++---
  .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |  2 +-
  3 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/stmmac.txt 
b/Documentation/devicetree/bindings/net/stmmac.txt
index f34fc3c..fd5ddf8 100644
--- a/Documentation/devicetree/bindings/net/stmmac.txt
+++ b/Documentation/devicetree/bindings/net/stmmac.txt
@@ -47,6 +47,7 @@ Optional properties:
  - snps,burst_len: The AXI burst lenth value of the AXI BUS MODE register.
  - tx-fifo-depth: See ethernet.txt file in the same directory
  - rx-fifo-depth: See ethernet.txt file in the same directory
+- mdio: with compatible = "snps,dwmac-mdio", create and register mdio bus.

  Examples:

@@ -65,4 +66,11 @@ Examples:
  tx-fifo-depth = <16384>;
  clocks = <>;
  clock-names = "stmmaceth";
+mdio0 {
+#address-cells = <1>;
+#size-cells = <0>;
+compatible = "snps,dwmac-mdio";
+phy1: ethernet-phy@0 {
+};
+};
  };
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index bba670c..bb6f75c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -29,7 +29,7 @@
  #include 
  #include 
  #include 
-
+#include 
  #include 

  #include "stmmac.h"
@@ -200,10 +200,29 @@ int stmmac_mdio_register(struct net_device *ndev)
  struct stmmac_priv *priv = netdev_priv(ndev);
  struct stmmac_mdio_bus_data *mdio_bus_data = priv->plat->mdio_bus_data;
  int addr, found;
+struct device_node *mdio_node = NULL;
+struct device_node *child_node = NULL;

  if (!mdio_bus_data)
  return 0;

+if (IS_ENABLED(CONFIG_OF)) {
+for_each_child_of_node(priv->device->of_node, child_node) {
+if (of_device_is_compatible(child_node,
+"snps,dwmac-mdio")) {
+mdio_node = child_node;
+break;
+}
+}
+
+if (mdio_node) {
+netdev_dbg(ndev, "FOUND MDIO subnode\n");
+} else {
+netdev_err(ndev, "NO MDIO subnode\n");
+return 0;
+}
+}
+
  new_bus = mdiobus_alloc();
  if (new_bus == NULL)
  return -ENOMEM;
@@ -231,7 +250,8 @@ int stmmac_mdio_register(struct net_device *ndev)
  new_bus->irq = irqlist;
  new_bus->phy_mask = mdio_bus_data->phy_mask;
  new_bus->parent = priv->device;
-err = mdiobus_register(new_bus);
+
+err = of_mdiobus_register(new_bus, mdio_node);
  if (err != 0) {
  pr_err("%s: Cannot register as MDIO bus\n", new_bus->name);
  goto bus_register_fail;
@@ -284,13 +304,6 @@ int stmmac_mdio_register(struct net_device *ndev)
  }
  }

-if (!found) {
-pr_warn("%s: No PHY found\n", ndev->name);
-mdiobus_unregister(new_bus);
-mdiobus_free(new_bus);
-return -ENODEV;
-}


Hmm, this could be necessary on some platforms that want to
get the PHY address at runtime and, in case of failure,
remove the registered bus.


Could make this conditional on (!found && !mdio_node).
That way, if the DT node exists the bus is registered regardless; otherwise
it is unregistered if no PHY is found.
Thoughts?
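
The proposed condition can be written out as a small predicate. This is a
sketch only: should_unregister_bus() is a hypothetical helper, not part of
the patch, and the two booleans stand for "a PHY was found during the scan"
and "an mdio DT subnode was found".

```c
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether the freshly registered MDIO bus
 * should be torn down again. The bus is kept whenever a DT mdio node
 * exists, and otherwise only if the scan found at least one PHY.
 */
static bool should_unregister_bus(bool found, bool have_mdio_node)
{
	return !found && !have_mdio_node;
}
```

Only the "no PHY, no DT node" case unregisters the bus, which preserves the
old -ENODEV behaviour for platforms that probe the PHY address at runtime.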


-
  priv->mii = new_bus;

  return 0;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index d02691b..6863420 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -146,7 +146,7 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
  if (of_property_read_u32(np, "snps,phy-addr", &plat->phy_addr) == 0)
  dev_warn(&pdev->dev, "snps,phy-addr property is deprecated\n");
-if (plat->phy_node || plat->phy_bus_name)



I think that would be dependent on the device tree.
However this may also work:
if ((plat->phy_node && !of_phy_is_fixed_link(np)) || plat->phy_bus_name)


Can this break some configuration cases?


+if (plat->phy_bus_name)
  

Re: [PATCH v3 2/4] VSOCK: Introduce virtio-vsock.ko

2015-12-10 Thread Stefan Hajnoczi
On Thu, Dec 10, 2015 at 09:23:25PM +, Alex Bennée wrote:
> Stefan Hajnoczi  writes:
> 
> > From: Asias He 
> >
> > VM sockets virtio transport implementation. This module runs in guest
> > kernel.
> 
> checkpatch warns on a bunch of whitespace/tab issues.

Will fix in the next version.

> > +struct virtio_vsock {
> > +   /* Virtio device */
> > +   struct virtio_device *vdev;
> > +   /* Virtio virtqueue */
> > +   struct virtqueue *vqs[VSOCK_VQ_MAX];
> > +   /* Wait queue for send pkt */
> > +   wait_queue_head_t queue_wait;
> > +   /* Work item to send pkt */
> > +   struct work_struct tx_work;
> > +   /* Work item to recv pkt */
> > +   struct work_struct rx_work;
> > +   /* Mutex to protect send pkt*/
> > +   struct mutex tx_lock;
> > +   /* Mutex to protect recv pkt*/
> > +   struct mutex rx_lock;
> 
> Further down I got confused by what lock was what and exactly what was
> being protected. If the receive and transmit paths touch separate things
> it might be worth re-arranging the structure to make it clearer, eg:
> 
>/* The transmit path is protected by tx_lock */
>struct mutex tx_lock;
>struct work_struct tx_work;
>..
>..
> 
>/* The receive path is protected by rx_lock */
>wait_queue_head_t queue_wait;
>..
>..
> 
>  Which might make things a little clearer. Then all the redundant
>  information in the comments can be removed. I don't need to know what
>  is a Virtio device, virtqueue or wait_queue etc as they are implicit in
>  the structure name.

Thanks, that is a nice idea.

> > +   mutex_lock(>tx_lock);
> > +   while ((ret = virtqueue_add_sgs(vq, sgs, out_sg, in_sg, pkt,
> > +   GFP_KERNEL)) < 0) {
> > +   prepare_to_wait_exclusive(>queue_wait, ,
> > + TASK_UNINTERRUPTIBLE);
> > +   mutex_unlock(>tx_lock);
> > +   schedule();
> > +   mutex_lock(>tx_lock);
> > +   finish_wait(>queue_wait, );
> > +   }
> > +   virtqueue_kick(vq);
> > +   mutex_unlock(>tx_lock);
> 
> What are we protecting with tx_lock here? See comments above about
> making the lock usage semantics clearer.

vq (vsock->vqs[VSOCK_VQ_TX]) is being protected.  Concurrent calls to
virtqueue_add_sgs() are not allowed.

> > +
> > +   return pkt_len;
> > +}
> > +
> > +static struct virtio_transport_pkt_ops virtio_ops = {
> > +   .send_pkt = virtio_transport_send_pkt,
> > +};
> > +
> > +static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
> > +{
> > +   int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> > +   struct virtio_vsock_pkt *pkt;
> > +   struct scatterlist hdr, buf, *sgs[2];
> > +   struct virtqueue *vq;
> > +   int ret;
> > +
> > +   vq = vsock->vqs[VSOCK_VQ_RX];
> > +
> > +   do {
> > +   pkt = kzalloc(sizeof(*pkt), GFP_KERNEL);
> > +   if (!pkt) {
> > +   pr_debug("%s: fail to allocate pkt\n", __func__);
> > +   goto out;
> > +   }
> > +
> > +   /* TODO: use mergeable rx buffer */
> 
> TODO's shouldn't end up in merged code.

Will fix in next revision.

> > +   pkt->buf = kmalloc(buf_len, GFP_KERNEL);
> > +   if (!pkt->buf) {
> > +   pr_debug("%s: fail to allocate pkt->buf\n", __func__);
> > +   goto err;
> > +   }
> > +
> > +   sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
> > +   sgs[0] = &hdr;
> > +
> > +   sg_init_one(&buf, pkt->buf, buf_len);
> > +   sgs[1] = &buf;
> > +   ret = virtqueue_add_sgs(vq, sgs, 0, 2, pkt, GFP_KERNEL);
> > +   if (ret)
> > +   goto err;
> > +   vsock->rx_buf_nr++;
> > +   } while (vq->num_free);
> > +   if (vsock->rx_buf_nr > vsock->rx_buf_max_nr)
> > +   vsock->rx_buf_max_nr = vsock->rx_buf_nr;
> > +out:
> > +   virtqueue_kick(vq);
> > +   return;
> > +err:
> > +   virtqueue_kick(vq);
> > +   virtio_transport_free_pkt(pkt);
> 
> You could free the pkt memory at the fail site and just have one exit path.

Okay, I agree the err label is of marginal use.  Let's get rid of it.

> 
> > +   return;
> > +}
> > +
> > +static void virtio_transport_send_pkt_work(struct work_struct *work)
> > +{
> > +   struct virtio_vsock *vsock =
> > +   container_of(work, struct virtio_vsock, tx_work);
> > +   struct virtio_vsock_pkt *pkt;
> > +   bool added = false;
> > +   struct virtqueue *vq;
> > +   unsigned int len;
> > +   struct sock *sk;
> > +
> > +   vq = vsock->vqs[VSOCK_VQ_TX];
> > +   mutex_lock(&vsock->tx_lock);
> > +   do {
> 
> You can move the declarations of pkt/len into the do block.

Okay.

> 
> > +   virtqueue_disable_cb(vq);
> > +   while ((pkt = virtqueue_get_buf(vq, &len)) != NULL) {
> 
> And the sk declaration here

Okay.

> > +static void virtio_transport_recv_pkt_work(struct work_struct *work)
> > +{
> > +   struct virtio_vsock *vsock =
> > +   container_of(work, struct virtio_vsock, rx_work);
> > +   struct 

[PATCH net] sfc: only use RSS filters if we're using RSS

2015-12-10 Thread Bert Kenward
Without this filter insertion on a VF would fail if only one channel
was in use. This would include the unicast station filter and therefore
no traffic would be received.

Signed-off-by: Bert Kenward 
---
 drivers/net/ethernet/sfc/ef10.c  | 23 +++
 drivers/net/ethernet/sfc/efx.h   |  5 +
 drivers/net/ethernet/sfc/farch.c |  2 +-
 3 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index c4a0e8a..368e98e 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -3309,7 +3309,7 @@ static int efx_ef10_filter_remove_internal(struct efx_nic 
*efx,
 
new_spec.priority = EFX_FILTER_PRI_AUTO;
new_spec.flags = (EFX_FILTER_FLAG_RX |
- EFX_FILTER_FLAG_RX_RSS);
+   (efx_rss_enabled(efx) ? EFX_FILTER_FLAG_RX_RSS : 0));
new_spec.dmaq_id = 0;
new_spec.rss_context = EFX_FILTER_RSS_CONTEXT_DEFAULT;
rc = efx_ef10_filter_push(efx, &new_spec,
@@ -3931,6 +3931,7 @@ static int efx_ef10_filter_insert_addr_list(struct 
efx_nic *efx,
 {
struct efx_ef10_filter_table *table = efx->filter_state;
struct efx_ef10_dev_addr *addr_list;
+   enum efx_filter_flags filter_flags;
struct efx_filter_spec spec;
u8 baddr[ETH_ALEN];
unsigned int i, j;
@@ -3945,11 +3946,11 @@ static int efx_ef10_filter_insert_addr_list(struct 
efx_nic *efx,
addr_count = table->dev_uc_count;
}
 
+   filter_flags = efx_rss_enabled(efx) ? EFX_FILTER_FLAG_RX_RSS : 0;
+
/* Insert/renew filters */
for (i = 0; i < addr_count; i++) {
-   efx_filter_init_rx(&spec, EFX_FILTER_PRI_AUTO,
-  EFX_FILTER_FLAG_RX_RSS,
-  0);
+   efx_filter_init_rx(&spec, EFX_FILTER_PRI_AUTO, filter_flags, 0);
efx_filter_set_eth_local(&spec, EFX_FILTER_VID_UNSPEC,
 addr_list[i].addr);
rc = efx_ef10_filter_insert(efx, &spec, true);
@@ -3978,9 +3979,7 @@ static int efx_ef10_filter_insert_addr_list(struct 
efx_nic *efx,
 
if (multicast && rollback) {
/* Also need an Ethernet broadcast filter */
-   efx_filter_init_rx(&spec, EFX_FILTER_PRI_AUTO,
-  EFX_FILTER_FLAG_RX_RSS,
-  0);
+   efx_filter_init_rx(&spec, EFX_FILTER_PRI_AUTO, filter_flags, 0);
eth_broadcast_addr(baddr);
efx_filter_set_eth_local(&spec, EFX_FILTER_VID_UNSPEC, baddr);
rc = efx_ef10_filter_insert(efx, &spec, true);
@@ -4010,13 +4009,14 @@ static int efx_ef10_filter_insert_def(struct efx_nic 
*efx, bool multicast,
 {
struct efx_ef10_filter_table *table = efx->filter_state;
struct efx_ef10_nic_data *nic_data = efx->nic_data;
+   enum efx_filter_flags filter_flags;
struct efx_filter_spec spec;
u8 baddr[ETH_ALEN];
int rc;
 
-   efx_filter_init_rx(&spec, EFX_FILTER_PRI_AUTO,
-  EFX_FILTER_FLAG_RX_RSS,
-  0);
+   filter_flags = efx_rss_enabled(efx) ? EFX_FILTER_FLAG_RX_RSS : 0;
+
+   efx_filter_init_rx(&spec, EFX_FILTER_PRI_AUTO, filter_flags, 0);
 
if (multicast)
efx_filter_set_mc_def(&spec);
@@ -4033,8 +4033,7 @@ static int efx_ef10_filter_insert_def(struct efx_nic 
*efx, bool multicast,
if (!nic_data->workaround_26807) {
/* Also need an Ethernet broadcast filter */
efx_filter_init_rx(&spec, EFX_FILTER_PRI_AUTO,
-  EFX_FILTER_FLAG_RX_RSS,
-  0);
+  filter_flags, 0);
eth_broadcast_addr(baddr);
efx_filter_set_eth_local(&spec, EFX_FILTER_VID_UNSPEC,
 baddr);
diff --git a/drivers/net/ethernet/sfc/efx.h b/drivers/net/ethernet/sfc/efx.h
index 1aaf76c..1082747 100644
--- a/drivers/net/ethernet/sfc/efx.h
+++ b/drivers/net/ethernet/sfc/efx.h
@@ -76,6 +76,11 @@ void efx_schedule_slow_fill(struct efx_rx_queue *rx_queue);
 #define EFX_TXQ_MAX_ENT(efx)   (EFX_WORKAROUND_35388(efx) ? \
 EFX_MAX_DMAQ_SIZE / 2 : EFX_MAX_DMAQ_SIZE)
 
+static inline bool efx_rss_enabled(struct efx_nic *efx)
+{
+   return efx->rss_spread > 1;
+}
+
 /* Filters */
 
 void efx_mac_reconfigure(struct efx_nic *efx);
diff --git a/drivers/net/ethernet/sfc/farch.c b/drivers/net/ethernet/sfc/farch.c
index 5a1c5a8..133e9e3 100644
--- a/drivers/net/ethernet/sfc/farch.c
+++ b/drivers/net/ethernet/sfc/farch.c
@@ -2242,7 +2242,7 @@ efx_farch_filter_init_rx_auto(struct efx_nic *efx,
 */
spec->priority = EFX_FILTER_PRI_AUTO;

[PATCH v2] wlcore/wl12xx: spi: fix oops on firmware load

2015-12-10 Thread Uri Mashiach
The maximum number of chunks used by the function is
(SPI_AGGR_BUFFER_SIZE / WSPI_MAX_CHUNK_SIZE + 1).
The original commands array had space for
(SPI_AGGR_BUFFER_SIZE / WSPI_MAX_CHUNK_SIZE) commands.
When the last chunk is used (len > 4 * WSPI_MAX_CHUNK_SIZE), the last
command is stored outside the bounds of the commands array.
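
The arithmetic can be checked with a small userspace sketch. The constants are assumptions made for illustration (a 4096-byte page and a 4092-byte maximum chunk), and chunks_needed() is a hypothetical helper that only mimics splitting a write into chunks; it is not the driver function.

```c
#include <stddef.h>

/* Assumed values, for illustration only. */
#define PAGE_SIZE_ASSUMED   4096u
#define AGGR_BUFFER_SIZE    (4 * PAGE_SIZE_ASSUMED)
#define MAX_CHUNK_SIZE      4092u

/* Array sizing before and after the fix. */
#define OLD_MAX_CHUNKS      (AGGR_BUFFER_SIZE / MAX_CHUNK_SIZE)
#define NEW_MAX_CHUNKS      ((AGGR_BUFFER_SIZE / MAX_CHUNK_SIZE) + 1)

/* Count how many chunks a write of 'len' bytes is split into. */
static size_t chunks_needed(size_t len)
{
    size_t n = 0;

    while (len > 0) {
        size_t chunk = len < MAX_CHUNK_SIZE ? len : MAX_CHUNK_SIZE;

        len -= chunk;
        n++;
    }
    return n;
}
```

Under these assumptions the integer division yields 4, but a full aggregation buffer leaves a 16-byte remainder after four full chunks and therefore needs a fifth command slot.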

Oops 5 (page fault) is generated during current wl1271 firmware load
attempt:

root@debian-armhf:~# ifconfig wlan0 up
[  294.312399] Unable to handle kernel paging request at virtual address
00203fc4
[  294.320173] pgd = de528000
[  294.323028] [00203fc4] *pgd=
[  294.326916] Internal error: Oops: 5 [#1] SMP ARM
[  294.331789] Modules linked in: bnep rfcomm bluetooth ipv6 arc4 wl12xx
wlcore mac80211 musb_dsps cfg80211 musb_hdrc usbcore usb_common
wlcore_spi omap_rng rng_core musb_am335x omap_wdt cpufreq_dt thermal_sys
hwmon
[  294.351838] CPU: 0 PID: 1827 Comm: ifconfig Not tainted
4.2.0-2-g3e9ad27-dirty #78
[  294.360154] Hardware name: Generic AM33XX (Flattened Device Tree)
[  294.366557] task: dc9d6d40 ti: de55 task.ti: de55
[  294.372236] PC is at __spi_validate+0xa8/0x2ac
[  294.376902] LR is at __spi_sync+0x78/0x210
[  294.381200] pc : []lr : []psr: 6013
[  294.381200] sp : de551998  ip : de5519d8  fp : 0020
[  294.393242] r10: de551c8c  r9 : de5519d8  r8 : de3a9000
[  294.398730] r7 : de3a9258  r6 : de3a9400  r5 : de551a48  r4 :
00203fbc
[  294.405577] r3 :   r2 :   r1 :   r0 :
de3a9000
[  294.412420] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM
Segment user
[  294.419918] Control: 10c5387d  Table: 9e528019  DAC: 0015
[  294.425954] Process ifconfig (pid: 1827, stack limit = 0xde550218)
[  294.432437] Stack: (0xde551998 to 0xde552000)

...

[  294.883613] [] (__spi_validate) from []
(__spi_sync+0x78/0x210)
[  294.891670] [] (__spi_sync) from []
(wl12xx_spi_raw_write+0xfc/0x148 [wlcore_spi])
[  294.901661] [] (wl12xx_spi_raw_write [wlcore_spi]) from
[] (wlcore_boot_upload_firmware+0x1ec/0x458 [wlcore])
[  294.914038] [] (wlcore_boot_upload_firmware [wlcore]) from
[] (wl12xx_boot+0xc10/0xfac [wl12xx])
[  294.925161] [] (wl12xx_boot [wl12xx]) from []
(wl1271_op_add_interface+0x5b0/0x910 [wlcore])
[  294.936364] [] (wl1271_op_add_interface [wlcore]) from
[] (ieee80211_do_open+0x44c/0xf7c [mac80211])
[  294.947963] [] (ieee80211_do_open [mac80211]) from
[] (__dev_open+0xa8/0x110)
[  294.957307] [] (__dev_open) from []
(__dev_change_flags+0x88/0x148)
[  294.965713] [] (__dev_change_flags) from []
(dev_change_flags+0x18/0x48)
[  294.974576] [] (dev_change_flags) from []
(devinet_ioctl+0x6b4/0x7d0)
[  294.983191] [] (devinet_ioctl) from []
(sock_ioctl+0x1e4/0x2bc)
[  294.991244] [] (sock_ioctl) from []
(do_vfs_ioctl+0x420/0x6b0)
[  294.999208] [] (do_vfs_ioctl) from []
(SyS_ioctl+0x6c/0x7c)
[  295.006880] [] (SyS_ioctl) from []
(ret_fast_syscall+0x0/0x54)
[  295.014835] Code: e1550004 e2444034 0a7d e5953018 (e5942008)
[  295.021544] ---[ end trace 66ed188198f4e24e ]---

Signed-off-by: Uri Mashiach 
Acked-by: Igor Grinberg 
Cc: sta...@vger.kernel.org
---
v1 -> v2: Stable tag v3.9+

 drivers/net/wireless/ti/wlcore/spi.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ti/wlcore/spi.c 
b/drivers/net/wireless/ti/wlcore/spi.c
index f1ac283..720e4e4 100644
--- a/drivers/net/wireless/ti/wlcore/spi.c
+++ b/drivers/net/wireless/ti/wlcore/spi.c
@@ -73,7 +73,10 @@
  */
 #define SPI_AGGR_BUFFER_SIZE (4 * PAGE_SIZE)
 
-#define WSPI_MAX_NUM_OF_CHUNKS (SPI_AGGR_BUFFER_SIZE / WSPI_MAX_CHUNK_SIZE)
+/* Maximum number of SPI write chunks */
+#define WSPI_MAX_NUM_OF_CHUNKS \
+   ((SPI_AGGR_BUFFER_SIZE / WSPI_MAX_CHUNK_SIZE) + 1)
+
 
 struct wl12xx_spi_glue {
struct device *dev;
@@ -268,9 +271,10 @@ static int __must_check wl12xx_spi_raw_write(struct device 
*child, int addr,
 void *buf, size_t len, bool fixed)
 {
struct wl12xx_spi_glue *glue = dev_get_drvdata(child->parent);
-   struct spi_transfer t[2 * (WSPI_MAX_NUM_OF_CHUNKS + 1)];
+   /* SPI write buffers - 2 for each chunk */
+   struct spi_transfer t[2 * WSPI_MAX_NUM_OF_CHUNKS];
struct spi_message m;
-   u32 commands[WSPI_MAX_NUM_OF_CHUNKS];
+   u32 commands[WSPI_MAX_NUM_OF_CHUNKS]; /* 1 command per chunk */
u32 *cmd;
u32 chunk_len;
int i;
-- 
2.5.0



Re: [PATCH net 1/2] xfrm: add rcu grace period in xfrm_policy_destroy()

2015-12-10 Thread Steffen Klassert
On Tue, Dec 08, 2015 at 10:58:29PM -0500, David Miller wrote:
> From: Eric Dumazet 
> Date: Tue,  8 Dec 2015 07:22:01 -0800
> 
> > We will soon switch sk->sk_policy[] to RCU protection,
> > as SYNACK packets are sent while listener socket is not locked.
> > 
> > This patch simply adds RCU grace period before struct xfrm_policy
> > freeing, and the corresponding rcu_head in struct xfrm_policy.
> > 
> > Signed-off-by: Eric Dumazet 
> 
> I'll give Steffen an opportunity to review these two patches.

Looks ok and survived some tests with socket policies.

If you want to take these direct into the net tree:

Acked-by: Steffen Klassert 


Re: [PATCH] [BACKPORT] [3.14.56] bnx2x: Don't notify about scratchpad parities

2015-12-10 Thread Patrick Schaaf
On Friday 06 November 2015 09:32:46 Greg KH wrote:
> On Thu, Nov 05, 2015 at 11:18:37AM +0100, Patrick Schaaf wrote:
> > bnx2x: Don't notify about scratchpad parities
> > 
> > This is a (trivial) "backport" of ad6afbe9578d1fa26680faf78c846bd8c00d1d6e
> > to stable kernel 3.14.56.
> 
> This patch isn't in 4.1 either, do you want it there as well?

Hi Greg,

I didn't see the patch in 3.14.57 or 3.14.58 - could you please consider it 
again (for all stable kernels that don't have it)?

My three machines with bnx2x interfaces have been running fine with patched 
3.14.56 for the last 35 days. The original problematic event (spewing a 
million messages, which are suppressed by that patch) has not recurred so far 
(nor has any other issue; dmesg has been completely empty since boot).

best regards
  Patrick

Related earlier posts / reports, for reference:

http://marc.info/?l=linux-netdev=144663711626469
http://lists.openwall.net/netdev/2015/11/05/48



[PATCH] sched/cls_flow.c : allow nfct-* keys work on ingress interfaces

2015-12-10 Thread Гаврилов Игорь
Improved the CTTUPLE macro with code from sched/act_connmark.c, so that it is
able to get un-NATed addresses from nf_conntrack.

--- cls_flow.c	2015-11-02 02:05:25.0 +0200
+++ net/sched/cls_flow.c	2015-12-10 17:04:03.075202330 +0200
@@ -30,6 +30,8 @@
 
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
 #include 
+#include 
+#include 
 #endif
 
 struct flow_head {
@@ -132,16 +134,43 @@
 }
 
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
-#define CTTUPLE(skb, member)		\
+#define CTTUPLE(skb, direction, member)		\
 ({	\
 	enum ip_conntrack_info ctinfo;	\
-	const struct nf_conn *ct = nf_ct_get(skb, &ctinfo);		\
-	if (ct == NULL)			\
-		goto fallback;		\
-	ct->tuplehash[CTINFO2DIR(ctinfo)].tuple.member;			\
+	struct nf_conntrack_tuple tuple;\
+	struct nf_conntrack_zone zone;	\
+	const struct nf_conntrack_tuple_hash *thash;			\
+	__be32 result;			\
+	int proto;			\
+	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);			\
+	if (ct == NULL){		\
+		switch (tc_skb_protocol(skb)) {			\
+		case htons(ETH_P_IP):\
+		proto = NFPROTO_IPV4; 			\
+break;	\
+		case htons(ETH_P_IPV6):\
+proto = NFPROTO_IPV6;			\
+break;	\
+			default: goto fallback;\
+	} 			\
+	\
+	if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, &tuple)) \
+goto fallback;		\
+zone.id = NF_CT_DEFAULT_ZONE_ID;\
+zone.dir = NF_CT_DEFAULT_ZONE_DIR;\
+	\
+thash = nf_conntrack_find_get(dev_net(skb->dev), &zone, &tuple);\
+if (!thash) goto fallback;	\
+ct = nf_ct_tuplehash_to_ctrack(thash);\
+	result = ct->tuplehash[(thash->tuple.dst.dir == IP_CT_DIR_REPLY) ? IP_CT_DIR_ORIGINAL : IP_CT_DIR_REPLY].tuple.src.member;	\
+	nf_ct_put(ct);			\
+	} else {			\
+	result = ct->tuplehash[CTINFO2DIR(ctinfo)].tuple.direction.member;	\
+	}\
+	result;\
 })
 #else
-#define CTTUPLE(skb, member)		\
+#define CTTUPLE(skb, direction, member)		\
 ({	\
 	goto fallback;			\
 	0;\
@@ -152,9 +181,9 @@
 {
 	switch (tc_skb_protocol(skb)) {
 	case htons(ETH_P_IP):
-		return ntohl(CTTUPLE(skb, src.u3.ip));
+		return ntohl(CTTUPLE(skb, src, u3.ip));
 	case htons(ETH_P_IPV6):
-		return ntohl(CTTUPLE(skb, src.u3.ip6[3]));
+		return ntohl(CTTUPLE(skb, src, u3.ip6[3]));
 	}
 fallback:
 	return flow_get_src(skb, flow);
@@ -162,11 +191,12 @@
 
 static u32 flow_get_nfct_dst(const struct sk_buff *skb, const struct flow_keys *flow)
 {
+
 	switch (tc_skb_protocol(skb)) {
 	case htons(ETH_P_IP):
-		return ntohl(CTTUPLE(skb, dst.u3.ip));
+		return ntohl(CTTUPLE(skb, dst, u3.ip));
 	case htons(ETH_P_IPV6):
-		return ntohl(CTTUPLE(skb, dst.u3.ip6[3]));
+		return ntohl(CTTUPLE(skb, dst, u3.ip6[3]));
 	}
 fallback:
 	return flow_get_dst(skb, flow);
@@ -174,14 +204,14 @@
 
 static u32 flow_get_nfct_proto_src(const struct sk_buff *skb, const struct flow_keys *flow)
 {
-	return ntohs(CTTUPLE(skb, src.u.all));
+	return ntohs(CTTUPLE(skb, src, u.all));
 fallback:
 	return flow_get_proto_src(skb, flow);
 }
 
 static u32 flow_get_nfct_proto_dst(const struct sk_buff *skb, const struct flow_keys *flow)
 {
-	return ntohs(CTTUPLE(skb, dst.u.all));
+	return ntohs(CTTUPLE(skb, dst, u.all));
 fallback:
 	return flow_get_proto_dst(skb, flow);
 }


Re: [PATCH 2/3] ser_gigaset: fix deallocation of platform device structure

2015-12-10 Thread Peter Hurley
Hi Tilman,

On 12/09/2015 03:10 AM, Tilman Schmidt wrote:
> Am 09.12.2015 um 00:12 schrieb Paul Bolle:
> 
>>> --- a/drivers/isdn/gigaset/ser-gigaset.c
>>> +++ b/drivers/isdn/gigaset/ser-gigaset.c
>>> @@ -370,19 +370,23 @@ static void gigaset_freecshw(struct cardstate
>>> *cs)
>>> tasklet_kill(&cs->write_tasklet);
>>> if (!cs->hw.ser)
>>> return;
>>> -   dev_set_drvdata(&cs->hw.ser->dev.dev, NULL);
>>> platform_device_unregister(&cs->hw.ser->dev);
>>> -   kfree(cs->hw.ser);
>>> -   cs->hw.ser = NULL;
>>>  }
>>>  
>>>  static void gigaset_device_release(struct device *dev)
>>>  {
>>> struct platform_device *pdev = to_platform_device(dev);
>>> +   struct cardstate *cs = dev_get_drvdata(dev);
>>>  
>>> /* adapted from platform_device_release() in
>>> drivers/base/platform.c */
>>> kfree(dev->platform_data);
>>> kfree(pdev->resource);
>>> +
>>> +   if (!cs)
>>> +   return;
>>> +   dev_set_drvdata(dev, NULL);

This is of marginal value and (I think) unnecessary; it implies
the core will use the device after release, which would trigger
many problems if true.


>> dev equals cs->hw.ser->dev.dev, doesn't it?
> 
> Correct.
> 
>> So what does setting
>> cs->hw.ser->dev.dev.driver_data to NULL just before freeing it buy us?
> 
> We're freeing cs->hw.ser, not cs->hw.ser->dev.
> Clearing the reference to cs from the device structure before freeing cs
> guards against possible use-after-free.
> 
>>> +   kfree(cs->hw.ser);
>>> +   cs->hw.ser = NULL;

This pattern is common, and defends against much more common
driver bugs.

Unfortunately, much of the good this pattern is intended to do in finding
use-after-free bugs is undone by explicit tests for null everywhere else.
Not saying that's the case here; rather, generally speaking.

Like the
if (!tty && !tty->ops && )

code.

Better just to let it crash.

Regards,
Peter Hurley


>> I might be missing something, but what does setting this to NULL buy us
>> here?
> 
> Just defensive programming. Guarding against possible use-after-free or
> double-free.
> 
>>
>> (I realize that I'm asking questions to code that isn't actually new but
>> only moved around, but I think that's still an opportunity to have
>> another look at that code.)
> 
> I'm a big fan of one change per patch. If we also want to modify the
> moved code then that should be done in a separate patch. It makes
> bisecting so much easier. Same reason why I separated out patch 3/3. And
> btw same reason why I think patch 1/3 should go in as-is, as an obvious
> fix to commit f34d7a5b, and any concerns about whether those tests are
> useful should be addressed by a separate patch.
> 
> Regards,
> Tilman
> 



[PATCH] ixgbe: on recv increment rx.ring->stats.yields instead of tx

2015-12-10 Thread Pavel Tikhomirov
It seems to have been unintentionally changed to tx in
ms commit: adc810900a703ee78fe88fd65e086d359fec04b2
ixgbe: Refactor busy poll socket code to address multiple issues

The lock is taken from ixgbe_low_latency_recv, and under this
lock we use ixgbe_clean_rx_irq, so it looks wrong to me to increment
the tx counter.

Yield stats can be shown through ethtool:
ethtool -S enp129s0 | grep yield

Signed-off-by: Pavel Tikhomirov 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h 
b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 1d21745..7656d46 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -451,7 +451,7 @@ static inline bool ixgbe_qv_lock_poll(struct ixgbe_q_vector 
*q_vector)
IXGBE_QV_STATE_POLL);
 #ifdef BP_EXTENDED_STATS
if (rc != IXGBE_QV_STATE_IDLE)
-   q_vector->tx.ring->stats.yields++;
+   q_vector->rx.ring->stats.yields++;
 #endif
return rc == IXGBE_QV_STATE_IDLE;
 }
-- 
1.9.3



Re: Checksum offload queries

2015-12-10 Thread Edward Cree
On 09/12/15 18:00, Tom Herbert wrote:
> That is not at all true. If the stack has set up VXLAN RCO and the
> device decides to set the inner checksum itself then the checksum
> will be bad. The checksum interface is very specific please read it
> carefully (sk_buff.h), if the driver/device thinks it is smarter
> than the stack and tries to do set its own rules on how checksum
> offload works then things will eventually break miserably.

Ok, I've passed that on to the guy working on this bit of the driver.

It looks like the best way to support the capabilities of NICs like the
sfc 8000 series (which can fill in two checksums but uses packet parsing
to figure out what to do, rather than using csum start/offset) is:

(core / stack)
* add NETIF_F_HW_2CSUMS (or whatever name)
* squeeze a second csum start/offset pair into the skb (as you mention,
 we can do this without size increase)
* Modify (Tx) CHECKSUM_PARTIAL generation to use both csum pairs.
 Presumably by creating CHECKSUM_PARTIAL_2CSUMS to indicate that the
 second csum pair has been filled in as well.

(sfc driver)
* declare 2CSUMS support
* on getting an skb for xmit, check whether the csum pairs match what our
 eeevil packet parsing hardware will do.  If so, send it with appropriate
 csum offload settings (we can enable/disable inner & outer offload
 independently, with TX Option descriptors).  Any csum pair that doesn't
 match, we call skb_checksum_help to do it in software, and tell the
 hardware not to do that one.

Optionally, we could also create NETIF_F_IP[V6]_2CSUMS in the stack and
have our driver advertise that instead, but since there has to be a
fallback to skb_checksum_help in the driver anyway, there doesn't seem
to be much point.

Does that seem reasonable?


Re: [PATCH net] ipv6: sctp: clone options to avoid use after free

2015-12-10 Thread Eric Dumazet
On Thu, 2015-12-10 at 12:26 +, David Laight wrote:

> Yes, I'm worried about whether our M3UA code is likely to crash customer
> systems, not whether hostile applications can crash it.
> These boxes ought to be on private networks since the sigtran protocols
> themselves have nothing that even gives a hint of security.

As long as the listener socket is kept as is, meaning that the only use
of it is the poll()/select()/accept() system calls, you are safe.

The bug is about having a fuzzer, specifically playing games with multi
threads so that the listener ipv6 options are changed after accept(). 

This should not really happen in real world applications : If ipv6
options need to be set on listener, they are set before first accept()
is performed, and not unset until application exits and kill all
sessions.

BTW, are you even using IPv6 SCTP sessions ?





Re: [PATCH v3 3/4] i40e: Kernel dependency update for i40e to support geneve offload

2015-12-10 Thread Jeff Kirsher
On Tue, 2015-12-08 at 10:12 -0800, Anjali Singhai Jain wrote:
> Update the Kconfig file with dependency for supporting GENEVE tunnel
> offloads.
> 
> Signed-off-by: Anjali Singhai Jain 
> Signed-off-by: Kiran Patil 

Acked-by: Jeff Kirsher 

> ---
>  drivers/net/ethernet/intel/Kconfig | 10 ++
>  1 file changed, 10 insertions(+)




Re: [PATCH v3 2/4] i40e: geneve tunnel offload support

2015-12-10 Thread Jeff Kirsher
On Tue, 2015-12-08 at 10:12 -0800, Anjali Singhai Jain wrote:
> This patch adds driver hooks to implement ndo_ops to add/del
> udp port in the HW to identify GENEVE tunnels.
> 
> Signed-off-by: Anjali Singhai Jain 
> Signed-off-by: Kiran Patil 

Acked-by: Jeff Kirsher 

> ---
>  drivers/net/ethernet/intel/i40e/i40e.h  |  16 +--
>  drivers/net/ethernet/intel/i40e/i40e_main.c | 167
> ++--
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c |   8 +-
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h |   2 +-
>  4 files changed, 150 insertions(+), 43 deletions(-)




[PATCH 2/4] livepatch: use list_is_first()

2015-12-10 Thread Geliang Tang
For better readability, use list_is_first() instead of open-coded.

Signed-off-by: Geliang Tang 
---
 kernel/livepatch/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index bc2c85c..be64106 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -479,7 +479,7 @@ static int __klp_enable_patch(struct klp_patch *patch)
return -EINVAL;
 
/* enforce stacking: only the first disabled patch can be enabled */
-   if (patch->list.prev != &klp_patches &&
+   if (!list_is_first(&patch->list, &klp_patches) &&
list_prev_entry(patch, list)->state == KLP_DISABLED)
return -EBUSY;
 
-- 
2.5.0




[PATCH 4/4] elevator: use list_is_{first,last}

2015-12-10 Thread Geliang Tang
For better readability, use list_is_{first,last}() instead of open-coded.

Signed-off-by: Geliang Tang 
---
 block/noop-iosched.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/noop-iosched.c b/block/noop-iosched.c
index a163c48..d44326e 100644
--- a/block/noop-iosched.c
+++ b/block/noop-iosched.c
@@ -44,7 +44,7 @@ noop_former_request(struct request_queue *q, struct request 
*rq)
 {
struct noop_data *nd = q->elevator->elevator_data;
 
-   if (rq->queuelist.prev == &nd->queue)
+   if (list_is_first(&rq->queuelist, &nd->queue))
return NULL;
return list_prev_entry(rq, queuelist);
 }
@@ -54,7 +54,7 @@ noop_latter_request(struct request_queue *q, struct request 
*rq)
 {
struct noop_data *nd = q->elevator->elevator_data;
 
-   if (rq->queuelist.next == &nd->queue)
+   if (list_is_last(&rq->queuelist, &nd->queue))
return NULL;
return list_next_entry(rq, queuelist);
 }
-- 
2.5.0




[PATCH 3/4] netfilter: ipset: use list_is_first()

2015-12-10 Thread Geliang Tang
For better readability, use list_is_first() instead of open-coded.

Signed-off-by: Geliang Tang 
---
 net/netfilter/ipset/ip_set_list_set.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipset/ip_set_list_set.c 
b/net/netfilter/ipset/ip_set_list_set.c
index bbede95..9d757d6 100644
--- a/net/netfilter/ipset/ip_set_list_set.c
+++ b/net/netfilter/ipset/ip_set_list_set.c
@@ -288,7 +288,7 @@ list_set_uadd(struct ip_set *set, void *value, const struct 
ip_set_ext *ext,
n = list_next_entry(next, list);
} else {
/* Insert before prev element */
-   if (prev->list.prev != &map->members)
+   if (!list_is_first(&prev->list, &map->members))
n = list_prev_entry(prev, list);
}
/* Can we replace a timed out entry? */
-- 
2.5.0




Re: [BUG] net: performance regression on ixgbe (Intel 82599EB 10-Gigabit NIC)

2015-12-10 Thread Otto Sabart
Hi Rick,

> *)  It is good to be binding netperf and netserver - helps with
> reproducibility, but why the two -T options?  A brief look at src/netsh.c
> suggests it will indeed set the two binding options separately but that is
> merely a side-effect of how I wrote the code.  It wasn't an intentional
> thing.

It's because of the way we generate arguments for netperf.
'-T 0, -T ,0' does the same as '-T 0,0', but the first option is more
convenient for us.

> *) Is irqbalance disabled and the IRQs set the same each time, or might
> there be variability possible there?  Each of the five netperf runs will be
> a different four-tuple which means each may (or may not) get RSS hashed/etc
> differently.

The irqbalance is disabled on all systems.

Can you tell us whether there is a need to assign IRQs manually? Which IRQs
should we pin to which CPU?

> *) It is perhaps adding duct tape to already-present belt and suspenders,
> but is power-management set to a fixed state on the systems involved? (Since
> this seems to be ProLiant G7s going by the legends on the charts, either
> static high perf or static low power I would imagine)

Power management is set to OS-Control in bios, which effectively means,
that _bios_ does not do any power management at all.

> *) What is the difference before/after for the service demands?  The netperf
> tests being run are asking for CPU utilization but I don't see the service
> demand change being summarized.

Unfortunately we do not have any summary chart for service demands;
we will add some shortly.

> *) Does a specific CPU on one side or the other saturate?
> (LOCAL_CPU_PEAK_UTIL, LOCAL_CPU_PEAK_ID, REMOTE_CPU_PEAK_UTIL,
> REMOTE_CPU_PEAK_ID output selectors)

We are sort of stuck in the stone age. We still use the old-fashioned tcp/udp
migrated tests, but we plan to switch to omni.

> *) What are the processors involved?  Presumably the "other system" is
> fixed?

In this case:

hp-dl380g7 - $ lscpu:
Architecture:  x86_64
CPU op-mode(s):32-bit, 64-bit
Byte Order:Little Endian
CPU(s):24
On-line CPU(s) list:   0-23
Thread(s) per core:2
Core(s) per socket:6
Socket(s): 2
NUMA node(s):  2
Vendor ID: GenuineIntel
CPU family:6
Model: 44
Model name:Intel(R) Xeon(R) CPU   X5650  @ 2.67GHz
Stepping:  2
CPU MHz:   2660.000
BogoMIPS:  5331.27
Virtualization:VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache:  256K
L3 cache:  12288K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23


hp-dl385g7 - $ lscpu:
Architecture:  x86_64
CPU op-mode(s):32-bit, 64-bit
Byte Order:Little Endian
CPU(s):24
On-line CPU(s) list:   0-23
Thread(s) per core:1
Core(s) per socket:12
Socket(s): 2
NUMA node(s):  4
Vendor ID: AuthenticAMD
CPU family:16
Model: 9
Model name:AMD Opteron(tm) Processor 6172
Stepping:  1
CPU MHz:   2100.000
BogoMIPS:  4200.39
Virtualization:AMD-V
L1d cache: 64K
L1i cache: 64K
L2 cache:  512K
L3 cache:  5118K
NUMA node0 CPU(s): 0,2,4,6,8,10
NUMA node1 CPU(s): 12,14,16,18,20,22
NUMA node2 CPU(s): 13,15,17,19,21,23
NUMA node3 CPU(s): 1,3,5,7,9,11


Thank you for your hints!

Ota


[PATCH 1/4] list: introduce list_is_first()

2015-12-10 Thread Geliang Tang
We already have list_is_last(), so it makes sense to also add
list_is_first() for consistency. This list utility function
checks whether an entry is the first element in a list.
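
To show the symmetry between the two helpers, here is a self-contained userspace sketch. It re-declares a minimal circular doubly linked list rather than including the kernel header; list_init() is a name made up for this sketch, and bool is used as the return type for illustration.

```c
#include <stdbool.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list is a head that points to itself in both directions. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert 'entry' just before 'head', i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* The first entry is the one whose predecessor is the head. */
static bool list_is_first(const struct list_head *list,
                          const struct list_head *head)
{
    return list->prev == head;
}

/* The last entry is the one whose successor is the head. */
static bool list_is_last(const struct list_head *list,
                         const struct list_head *head)
{
    return list->next == head;
}
```

Because the list is circular, both checks are a single pointer comparison against the head.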

Signed-off-by: Geliang Tang 
---
 include/linux/list.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/list.h b/include/linux/list.h
index 5356f4d..2c43ef4 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -170,6 +170,17 @@ static inline void list_move_tail(struct list_head *list,
 }
 
 /**
+ * list_is_first - tests whether @list is the first entry in list @head
+ * @list: the entry to test
+ * @head: the head of the list
+ */
+static inline int list_is_first(const struct list_head *list,
+   const struct list_head *head)
+{
+   return list->prev == head;
+}
+
+/**
  * list_is_last - tests whether @list is the last entry in list @head
  * @list: the entry to test
  * @head: the head of the list
-- 
2.5.0




Re: [PATCH] net/macb: add support for resetting PHY using GPIO

2015-12-10 Thread Gregory CLEMENT
Hi Sascha,
 
On Thu, Dec 10 2015, Sascha Hauer wrote:

> Hi Gregory,
>
> On Wed, Dec 09, 2015 at 06:49:43PM +0100, Gregory CLEMENT wrote:
>> With device tree it is no more possible to reset the PHY at board
>> level. Furthermore, doing in the driver allow to power down the PHY when
>> the network interface is no more used.
>> 
>> The patch introduces a new optional property "phy-reset-gpio" inspired
>> from the one use for the FEC.
>
> I don't think it's a good idea to further extend the usage of this
> binding. The driver should use the phy-handle property and
> of_phy_connect() which gives you a proper device node for the phy. Then
> the phy device node should get the reset gpio. I know it's more work,

So you suggest moving from this binding:
macb1: ethernet@fc028000 {
phy-mode = "rmii";
status = "okay";
#address-cells = <1>;
#size-cells = <0>;
status = "okay";
phy-reset-gpio = < 6 GPIO_ACTIVE_HIGH>;

ethernet-phy@1 {
reg = <0x1>;
interrupt-parent = <>;
interrupts = <31 IRQ_TYPE_EDGE_FALLING>;

};
};

to this binding
macb1: ethernet@fc028000 {
phy-mode = "rmii";
status = "okay";
#address-cells = <1>;
#size-cells = <0>;
status = "okay";

ethernet-phy@1 {
reg = <0x1>;
interrupt-parent = <>;
interrupts = <31 IRQ_TYPE_EDGE_FALLING>;
phy-reset-gpio = < 6 GPIO_ACTIVE_HIGH>;
};
};

> but doing it like this gives you additional goodies like proper handling
> of the max-speed property, a fixed-link if necessary and picking the

Currently there is phy_connect_direct, so we can already handle the
properties of the phy.

> correct phy if there are muliple phys on the bus.

but I agree with this one.

Gregory

>
> Sascha
>
> -- 
> Pengutronix e.K.   | |
> Industrial Linux Solutions | http://www.pengutronix.de/  |
> Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0|
> Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |

-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com


Re: [PATCH 1/4] list: introduce list_is_first()

2015-12-10 Thread Jens Axboe

On 12/10/2015 07:17 AM, Geliang Tang wrote:

> We already have list_is_last(), so it makes sense to also add
> list_is_first() for consistency. This list utility function
> checks whether an entry is the first element in a list.


Honestly, I think we already have way too many of these kinds of helpers. 
IMHO they don't really help, they hurt readability. You should know how 
the list works anyway, and if you do, then it's a no-brainer what's 
first and last. If you don't, then you are bound to screw up in other ways.
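
For readers following along, here is a rough user-space sketch of what the
two checks amount to on a circular doubly linked list. The struct and the
helpers below are a stand-alone re-implementation written for illustration,
not the kernel's <linux/list.h>; list_is_first() is shown the way the
patch description implies it would work, which is an assumption since the
patch body is not quoted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, modelled on the kernel's layout. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* An entry is last when the node after it is the list head. */
static bool list_is_last(const struct list_head *entry,
			 const struct list_head *head)
{
	return entry->next == head;
}

/* Assumed shape of the proposed helper: first when the node before it
 * is the list head. */
static bool list_is_first(const struct list_head *entry,
			  const struct list_head *head)
{
	return entry->prev == head;
}

/* Small self-check: in a two-element list, a is first and b is last. */
static void list_demo(void)
{
	struct list_head head, a, b;

	list_init(&head);
	list_add_tail(&a, &head);
	list_add_tail(&b, &head);
	assert(list_is_first(&a, &head) && !list_is_first(&b, &head));
	assert(list_is_last(&b, &head) && !list_is_last(&a, &head));
}
```

Either way the check is a single pointer comparison against the head, which
is the point being argued over: the helper only gives that comparison a name.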


Just my 2 cents.

--
Jens Axboe



[PATCH net-next V1 7/9] net/mlx5_core: Flow steering tree initialization

2015-12-10 Thread Saeed Mahameed
From: Maor Gottlieb 

Flow steering initialization is based on a static tree that describes
the flow steering tree as it exists when the driver is loaded. The
initialization takes into account the max supported flow table level of
the device: a minimum of 2 kernel flow tables (vlan and mac) is required
to have kernel flow table functionality.

The tree structure when the driver is loaded:

root_namespace(receive nic)
  |
priority-0 (kernel priority)
  |
namespace(kernel namespace)
  |
priority-0 (flow tables priority)

In the following patches, when the EN driver uses the flow steering
API, it will create two flow tables and their flow groups under
priority-0 (flow tables priority).
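
A static tree of this shape can be written with nested C99 compound
literals, with each node's child count derived from sizeof. The following
is only a user-space sketch of that pattern: struct tree_node, NODE(),
LEAF() and count_nodes() are hypothetical names chosen for illustration,
not the driver's, and leaves get an explicit NULL child array so that no
empty initializer is needed.

```c
#include <assert.h>
#include <stddef.h>

enum node_type { NODE_NAMESPACE, NODE_PRIO };

struct tree_node {
	enum node_type type;
	const struct tree_node *children;
	int ar_size;			/* number of entries in children */
};

/* Count the initializers by building an unnamed array and taking sizeof. */
#define NODE_ARRAY_SIZE(...) \
	((int)(sizeof((struct tree_node[]){__VA_ARGS__}) / \
	       sizeof(struct tree_node)))

/* Inner node: the variadic arguments are the child initializers. */
#define NODE(node_type, ...) {					\
	.type = (node_type),					\
	.children = (struct tree_node[]){__VA_ARGS__},		\
	.ar_size = NODE_ARRAY_SIZE(__VA_ARGS__)			\
}

/* Leaf node: no children. */
#define LEAF(node_type) {					\
	.type = (node_type),					\
	.children = NULL,					\
	.ar_size = 0						\
}

/* namespace -> priority -> namespace -> priority, as in the diagram. */
static const struct tree_node root =
	NODE(NODE_NAMESPACE,
	     NODE(NODE_PRIO,
		  NODE(NODE_NAMESPACE,
		       LEAF(NODE_PRIO))));

/* Walk the static description recursively and count its nodes. */
static int count_nodes(const struct tree_node *n)
{
	int total = 1;

	for (int i = 0; i < n->ar_size; i++)
		total += count_nodes(&n->children[i]);
	return total;
}

/* Self-check: the description above has exactly four nodes. */
static void check_tree(void)
{
	assert(count_nodes(&root) == 4);
}
```

An init routine can then walk such a description top-down and create the
real objects node by node, which is roughly the role the static tree plays
in the description above.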

Signed-off-by: Maor Gottlieb 
Signed-off-by: Moni Shoua 
Signed-off-by: Matan Barak 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |  374 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |4 +
 include/linux/mlx5/driver.h   |2 +
 include/linux/mlx5/fs.h   |8 +
 4 files changed, 388 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 1828351..4264e8b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -37,6 +37,54 @@
 #include "fs_core.h"
 #include "fs_cmd.h"
 
+#define INIT_TREE_NODE_ARRAY_SIZE(...) (sizeof((struct init_tree_node[]){__VA_ARGS__}) /\
+sizeof(struct init_tree_node))
+
+#define INIT_PRIO(min_level_val, max_ft_val,\
+ start_level_val, ...) {.type = FS_TYPE_PRIO,\
+   .min_ft_level = min_level_val,\
+   .start_level = start_level_val,\
+   .max_ft = max_ft_val,\
+   .children = (struct init_tree_node[]) {__VA_ARGS__},\
+   .ar_size = INIT_TREE_NODE_ARRAY_SIZE(__VA_ARGS__) \
+}
+
+#define ADD_PRIO(min_level_val, max_ft_val, start_level_val, ...)\
+   INIT_PRIO(min_level_val, max_ft_val, start_level_val,\
+ __VA_ARGS__)\
+
+#define ADD_FT_PRIO(max_ft_val, start_level_val, ...)\
+   INIT_PRIO(0, max_ft_val, start_level_val,\
+ __VA_ARGS__)\
+
+#define ADD_NS(...) {.type = FS_TYPE_NAMESPACE,\
+   .children = (struct init_tree_node[]) {__VA_ARGS__},\
+   .ar_size = INIT_TREE_NODE_ARRAY_SIZE(__VA_ARGS__) \
+}
+
+#define KERNEL_START_LEVEL 0
+#define KERNEL_P0_START_LEVEL KERNEL_START_LEVEL
+#define KERNEL_MAX_FT 2
+#define KENREL_MIN_LEVEL 2
+static struct init_tree_node {
+   enum fs_node_type   type;
+   struct init_tree_node *children;
+   int ar_size;
+   int min_ft_level;
+   int prio;
+   int max_ft;
+   int start_level;
+} root_fs = {
+   .type = FS_TYPE_NAMESPACE,
+   .ar_size = 1,
+   .children = (struct init_tree_node[]) {
+   ADD_PRIO(KENREL_MIN_LEVEL, KERNEL_MAX_FT,
+KERNEL_START_LEVEL,
+ADD_NS(ADD_FT_PRIO(KERNEL_MAX_FT,
+   KERNEL_P0_START_LEVEL))),
+   }
+};
+
 static void del_rule(struct fs_node *node);
 static void del_flow_table(struct fs_node *node);
 static void del_flow_group(struct fs_node *node);
@@ -671,3 +719,329 @@ static void mlx5_destroy_flow_group(struct mlx5_flow_group *fg)
    mlx5_core_warn(get_dev(&fg->node), "Flow group %d wasn't destroyed, refcount > 1\n",
   fg->id);
 }
+
+static struct mlx5_flow_namespace *mlx5_get_flow_namespace(struct mlx5_core_dev *dev,
+  enum mlx5_flow_namespace_type type)
+{
+   struct mlx5_flow_root_namespace *root_ns = dev->priv.root_ns;
+   int prio;
+   static struct fs_prio *fs_prio;
+   struct mlx5_flow_namespace *ns;
+
+   if (!root_ns)
+   return NULL;
+
+   switch (type) {
+   case MLX5_FLOW_NAMESPACE_KERNEL:
+   prio = 0;
+   break;
+   case MLX5_FLOW_NAMESPACE_FDB:
+   if (dev->priv.fdb_root_ns)
+   return &dev->priv.fdb_root_ns->ns;
+   else
+   return NULL;
+   default:
+   return NULL;
+   }
+
+   fs_prio = find_prio(&root_ns->ns, prio);
+   if (!fs_prio)
+   return NULL;
+
+   ns = list_first_entry(&fs_prio->node.children,
+ typeof(*ns),
+ node.list);
+
+   return ns;
+}
+
+static struct fs_prio *fs_create_prio(struct mlx5_flow_namespace *ns,
+ unsigned prio, int max_ft,
+ int start_level)
+{
+   

[PATCH net-next V1 6/9] net/mlx5_core: Introduce flow steering API

2015-12-10 Thread Saeed Mahameed
From: Maor Gottlieb 

Introducing the following objects:

mlx5_flow_root_namespace: represents the root of the tree for a specific
flow table type (e.g. NIC receive, FDB, etc.).

mlx5_flow_group: defines the mask of the flow specification.

fs_fte (flow steering flow table entry): defines the value of the
flow specification.

The following describes the relationships between the tree objects:
root_namespace --> priorities --> namespaces -->
priorities --> flow-tables --> flow-groups -->
flow-entries --> destinations

When we create a new object (flow table/flow group/flow table entry), we
call the FW command and then add the related SW object to the tree.

When we destroy an object, e.g. by calling mlx5_destroy_flow_table, we use
the tree node destructor to destroy the FW object and remove the
node from the tree.
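
The create/destroy flow described above can be pictured with a small
reference-counted tree node that carries a destructor callback. This is a
user-space sketch under stated assumptions, not the driver code: struct
node, node_create(), node_put() and destroy_fw_object() are hypothetical
names, and the rule that a child holds a reference on its parent is an
assumption made here for illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
	struct node *parent;
	int refcount;
	void (*remove_func)(struct node *);	/* per-type destructor */
};

/* Stand-in for issuing the FW destroy command; it only counts calls. */
static int fw_objects_destroyed;

static void destroy_fw_object(struct node *n)
{
	(void)n;
	fw_objects_destroyed++;
}

/* Create a SW node and link it under parent (assumed: child pins parent). */
static struct node *node_create(struct node *parent,
				void (*remove_func)(struct node *))
{
	struct node *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	n->parent = parent;
	n->refcount = 1;
	n->remove_func = remove_func;
	if (parent)
		parent->refcount++;
	return n;
}

/* Drop a reference; on the last one, run the destructor, free the node,
 * then drop the reference that node held on its parent. */
static void node_put(struct node *n)
{
	while (n && --n->refcount == 0) {
		struct node *parent = n->parent;

		if (n->remove_func)
			n->remove_func(n);
		free(n);
		n = parent;
	}
}

/* Self-check: a parent survives until its last child is gone. */
static void node_demo(void)
{
	int before = fw_objects_destroyed;
	struct node *table = node_create(NULL, destroy_fw_object);
	struct node *group = node_create(table, destroy_fw_object);

	assert(table && group);
	node_put(table);	/* still pinned by group */
	assert(fw_objects_destroyed == before);
	node_put(group);	/* destroys group, then table */
	assert(fw_objects_destroyed == before + 2);
}
```

With this shape the destructor runs exactly once per object, and only after
every dependent child has been released.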

Signed-off-by: Maor Gottlieb 
Signed-off-by: Moni Shoua 
Signed-off-by: Matan Barak 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |  464 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |   23 +
 2 files changed, 487 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index cac0d15..1828351 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -35,6 +35,12 @@
 
 #include "mlx5_core.h"
 #include "fs_core.h"
+#include "fs_cmd.h"
+
+static void del_rule(struct fs_node *node);
+static void del_flow_table(struct fs_node *node);
+static void del_flow_group(struct fs_node *node);
+static void del_fte(struct fs_node *node);
 
 static void tree_init_node(struct fs_node *node,
   unsigned int refcount,
@@ -207,3 +213,461 @@ static bool compare_match_criteria(u8 match_criteria_enable1,
return match_criteria_enable1 == match_criteria_enable2 &&
!memcmp(mask1, mask2, MLX5_ST_SZ_BYTES(fte_match_param));
 }
+
+static struct mlx5_flow_root_namespace *find_root(struct fs_node *node)
+{
+   struct fs_node *root;
+   struct mlx5_flow_namespace *ns;
+
+   root = node->root;
+
+   if (WARN_ON(root->type != FS_TYPE_NAMESPACE)) {
+   pr_warn("mlx5: flow steering node is not in tree or garbaged\n");
+   return NULL;
+   }
+
+   ns = container_of(root, struct mlx5_flow_namespace, node);
+   return container_of(ns, struct mlx5_flow_root_namespace, ns);
+}
+
+static inline struct mlx5_core_dev *get_dev(struct fs_node *node)
+{
+   struct mlx5_flow_root_namespace *root = find_root(node);
+
+   if (root)
+   return root->dev;
+   return NULL;
+}
+
+static void del_flow_table(struct fs_node *node)
+{
+   struct mlx5_flow_table *ft;
+   struct mlx5_core_dev *dev;
+   struct fs_prio *prio;
+   int err;
+
+   fs_get_obj(ft, node);
+   dev = get_dev(&ft->node);
+
+   err = mlx5_cmd_destroy_flow_table(dev, ft);
+   if (err)
+   pr_warn("flow steering can't destroy ft\n");
+   fs_get_obj(prio, ft->node.parent);
+   prio->num_ft--;
+}
+
+static void del_rule(struct fs_node *node)
+{
+   struct mlx5_flow_rule *rule;
+   struct mlx5_flow_table *ft;
+   struct mlx5_flow_group *fg;
+   struct fs_fte *fte;
+   u32 *match_value;
+   struct mlx5_core_dev *dev = get_dev(node);
+   int match_len = MLX5_ST_SZ_BYTES(fte_match_param);
+   int err;
+
+   match_value = mlx5_vzalloc(match_len);
+   if (!match_value) {
+   pr_warn("failed to allocate inbox\n");
+   return;
+   }
+
+   fs_get_obj(rule, node);
+   fs_get_obj(fte, rule->node.parent);
+   fs_get_obj(fg, fte->node.parent);
+   memcpy(match_value, fte->val, sizeof(fte->val));
+   fs_get_obj(ft, fg->node.parent);
+   list_del(&rule->node.list);
+   fte->dests_size--;
+   if (fte->dests_size) {
+   err = mlx5_cmd_update_fte(dev, ft,
+ fg->id, fte);
+   if (err)
+   pr_warn("%s can't del rule fg id=%d fte_index=%d\n",
+   __func__, fg->id, fte->index);
+   }
+   kvfree(match_value);
+}
+
+static void del_fte(struct fs_node *node)
+{
+   struct mlx5_flow_table *ft;
+   struct mlx5_flow_group *fg;
+   struct mlx5_core_dev *dev;
+   struct fs_fte *fte;
+   int err;
+
+   fs_get_obj(fte, node);
+   fs_get_obj(fg, fte->node.parent);
+   fs_get_obj(ft, fg->node.parent);
+
+   dev = get_dev(&ft->node);
+   err = mlx5_cmd_delete_fte(dev, ft,
+ fte->index);
+   if (err)
+   pr_warn("flow steering can't delete fte in index %d of flow group id %d\n",
+   fte->index, fg->id);
+
+   fte->status = 0;
+   
