Re: [PATCH] net/macb: add config for Atmel sama5d2 SoCs

2015-06-18 Thread Alexandre Belloni
On 18/06/2015 at 12:18:19 +0200, Nicolas Ferre wrote :
 From: Cyrille Pitchen cyrille.pitc...@atmel.com
 
 Add the compatible string for Atmel sama5d2 SoC family as the configuration
 options differ from other instances of the GEM.
 
 Signed-off-by: Cyrille Pitchen cyrille.pitc...@atmel.com
 Signed-off-by: Nicolas Ferre nicolas.fe...@atmel.com
 ---
  drivers/net/ethernet/cadence/macb.c | 8 
  1 file changed, 8 insertions(+)
 
 diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
 index 740d04fd2223..caeb39561567 100644
 --- a/drivers/net/ethernet/cadence/macb.c
 +++ b/drivers/net/ethernet/cadence/macb.c
 @@ -2713,6 +2713,13 @@ static const struct macb_config pc302gem_config = {
   .init = macb_init,
  };
  
 +static const struct macb_config sama5d2_config = {
 + .caps = 0,
 + .dma_burst_length = 16,
 + .clk_init = macb_clk_init,
 + .init = macb_init,
 +};
 +
  static const struct macb_config sama5d3_config = {
   .caps = MACB_CAPS_SG_DISABLED | MACB_CAPS_GIGABIT_MODE_AVAILABLE,
   .dma_burst_length = 16,
 @@ -2756,6 +2763,7 @@ static const struct of_device_id macb_dt_ids[] = {
   { .compatible = "cdns,macb" },
   { .compatible = "cdns,pc302-gem", .data = &pc302gem_config },
   { .compatible = "cdns,gem", .data = &pc302gem_config },
 + { .compatible = "atmel,sama5d2-gem", .data = &sama5d2_config },

This compatible has to be documented

   { .compatible = "atmel,sama5d3-gem", .data = &sama5d3_config },
   { .compatible = "atmel,sama5d4-gem", .data = &sama5d4_config },
   { .compatible = "cdns,at91rm9200-emac", .data = &emac_config },
 -- 
 2.1.3
 

-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH ipv6 0/1] ipv6: addrconf: routes are not deleted if last ipv6 address is removed

2015-06-18 Thread Hannes Frederic Sowa
On Thu, 2015-06-18 at 14:59 +0530, Mazhar Rana wrote:
 Hi,
 
 After 'commit 876fd05ddbae03166e7037fca957b55bb3be6594
 (ipv6: don't disable interface if last ipv6 address is removed)',
 the IPv6 interface configuration (routes, neighbours, etc.) is no longer
 cleared when the last IPv6 address of an interface is removed.
 
 This now causes a functional issue in the deployment below.
 
 On Ubuntu 14.04 (upgraded with Linux kernel 3.19)
 eth1 GW1: 2604:2000:7000:2::102
 eth0 GW2: 2001:df7:6000:101::1b:102
 
 HostA: 3804:3000:1406:2::102 (reachable via GW1 and GW2 both)
 
 In this deployment, HostA is reachable via eth0 and eth1. I want
 all traffic for HostA to go via GW1, which is available on
 link eth1.
 
 $ ip -6 ro s
 2001:df7:6000:101::/64 dev eth0  proto kernel  metric 256 
 2604:2000:7000:2::/64 dev eth1  proto kernel  metric 256 
 3804:3000:1406:2::/64 via 2604:2000:7000:2::102 dev eth1  metric 1024 
 fe80::/64 dev eth0  proto kernel  metric 256 
 fe80::/64 dev eth1  proto kernel  metric 256 
 default via 2001:df7:6000:101::1b:102 dev eth0  proto static  metric 1 
 
 On failure of GW1, I removed all IPv6 addresses from eth1 so that all
 traffic would go through the default gateway 'GW2'.
 
 $ sudo ip -6 addr flush dev eth1
 $ ip -6 ro s
 2001:df7:6000:101::/64 dev eth0  proto kernel  metric 256 
 3804:3000:1406:2::/64 via 2604:2000:7000:2::102 dev eth1  metric 1024 
 fe80::/64 dev eth0  proto kernel  metric 256 
 fe80::/64 dev eth0.100  proto kernel  metric 256 
 default via 2001:df7:6000:101::1b:102 dev eth0  proto static  metric 1
 
 But the route for HostA is not deleted, so traffic for HostA still
 tries to go through GW1, which is no longer reachable.
 
 If 'commit 876fd05ddbae03166e7037fca957b55bb3be6594
 (ipv6: don't disable interface if last ipv6 address is removed)'
 was only meant to fix the problem mentioned in its changelog, then
 I have an alternative proposal here which overcomes both issues.
 
 Do you see any side effect of this proposal?

In theory IPv6 mandates that on-link information (which subnet is available on
which link) and address-specific connected routes should not depend on each
other. That said, your initial assumption, that clearing the addresses from an
interface shuts it down for IPv6 operation, is wrong.

I guess the check was there to make sure each link has an LL address.

As we changed backwards compatibility here I am a bit ambivalent.

Another glitch I noticed with your patch: we don't set the disable_ipv6 bit in
addrconf_ifdown with how==0, so we cannot easily bring the interface up without
disturbing IPv4 operation. Could you check that the disable_ipv6 switch at least
works to bring the IPv6 part of the interface up again?

Bye,
Hannes
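Longest-prefix matching alone accounts for the symptom in the report above. The 3804:3000:1406:2::/64 route via GW1 is more specific than the default route, so it keeps winning for as long as it stays in the table, whether or not GW1 is reachable. The following is a minimal userspace sketch under that assumption and is not the kernel's fib6 lookup. `struct route6`, `lookup6()` and the table are hypothetical, and metrics, link state and the fe80::/64 entries are ignored.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical route entry: prefix, prefix length, and an output label. */
struct route6 {
	const char *prefix;
	int plen;
	const char *via;
};

/* Return 1 if the first plen bits of a and b are equal. */
static int prefix_match(const unsigned char *a, const unsigned char *b, int plen)
{
	int bytes = plen / 8, bits = plen % 8;

	if (memcmp(a, b, bytes) != 0)
		return 0;
	if (bits == 0)
		return 1;
	unsigned char mask = (unsigned char)(0xff << (8 - bits));
	return (a[bytes] & mask) == (b[bytes] & mask);
}

/* Longest-prefix match: the most specific matching route wins; nothing
 * here checks whether that route's gateway is still reachable. */
static const char *lookup6(const struct route6 *tbl, size_t n, const char *dst)
{
	unsigned char d[16], p[16];
	const char *best = NULL;
	int best_len = -1;

	if (inet_pton(AF_INET6, dst, d) != 1)
		return NULL;
	for (size_t i = 0; i < n; i++) {
		if (inet_pton(AF_INET6, tbl[i].prefix, p) != 1)
			continue;
		if (tbl[i].plen > best_len && prefix_match(d, p, tbl[i].plen)) {
			best = tbl[i].via;
			best_len = tbl[i].plen;
		}
	}
	return best;
}

/* Routes left after "ip -6 addr flush dev eth1" in the report. */
static const struct route6 after_flush[] = {
	{ "2001:df7:6000:101::", 64, "eth0 (connected)" },
	{ "3804:3000:1406:2::", 64, "GW1 via eth1" },
	{ "::", 0, "GW2 via eth0 (default)" },
};
```

Dropping the 3804:3000:1406:2::/64 entry from the table is what would make lookups for HostA fall through to the default route via GW2, which is the behaviour the reporter expected after the flush.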



[PATCH net-next] inet_diag: Remove _bh suffix in inet_diag_dump_reqs().

2015-06-18 Thread Hiroaki Shimoda
inet_diag_dump_reqs() is called from inet_diag_dump_icsk() with BH
disabled, so there is no need to disable BH in inet_diag_dump_reqs().

Signed-off-by: Hiroaki Shimoda shimoda.hiro...@gmail.com
---
 net/ipv4/inet_diag.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 21985d8d41e7..4ca789ba63cb 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -746,7 +746,7 @@ static int inet_diag_dump_reqs(struct sk_buff *skb, struct sock *sk,
 
 	entry.family = sk->sk_family;
 
-	spin_lock_bh(&icsk->icsk_accept_queue.syn_wait_lock);
+	spin_lock(&icsk->icsk_accept_queue.syn_wait_lock);
 
 	lopt = icsk->icsk_accept_queue.listen_opt;
 	if (!lopt || !listen_sock_qlen(lopt))
@@ -794,7 +794,7 @@ static int inet_diag_dump_reqs(struct sk_buff *skb, struct sock *sk,
 	}
 
 out:
-	spin_unlock_bh(&icsk->icsk_accept_queue.syn_wait_lock);
+	spin_unlock(&icsk->icsk_accept_queue.syn_wait_lock);
 
return err;
 }
-- 
2.3.6
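The patch's argument is that `spin_lock_bh()` conceptually combines disabling bottom halves with taking the lock, and that BH disabling nests. When the caller already runs with BH disabled, the inner `_bh` pair only raises and lowers a nesting count and never lets softirqs run in between. The toy model below illustrates only that counting argument in userspace. `bh_depth`, `softirqs_run` and the `model_*` functions are hypothetical stand-ins rather than kernel APIs, and the lock itself is left out.

```c
/* Toy model of nested BH disabling; not kernel code. */
static int bh_depth;      /* models the softirq-disable nesting count */
static int softirqs_run;  /* times pending softirqs would have been run */

static void model_local_bh_disable(void)
{
	bh_depth++;
}

static void model_local_bh_enable(void)
{
	/* Pending softirqs can only run when the outermost disable ends. */
	if (--bh_depth == 0)
		softirqs_run++;
}

/* Inner function using the _bh lock variants (before the patch). */
static void dump_reqs_with_bh(void)
{
	model_local_bh_disable();   /* BH half of spin_lock_bh() */
	/* ... walk the SYN queue under the lock ... */
	model_local_bh_enable();    /* BH half of spin_unlock_bh() */
}

/* Inner function after the patch: plain lock, no BH toggling. */
static void dump_reqs_plain(void)
{
	/* ... walk the SYN queue under the lock ... */
}

/* Caller that already holds BH disabled around the inner call. */
static void dump_icsk(void (*inner)(void))
{
	model_local_bh_disable();
	inner();
	model_local_bh_enable();
}
```

In both variants softirqs get their chance exactly once, when the caller's outer section ends. The inner disable/enable pair therefore has no observable effect here, which is why the plain `spin_lock()` suffices.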



Re: [PATCH net v2] packet: avoid out of bounds read in round robin fanout

2015-06-18 Thread Eric Dumazet
On Wed, 2015-06-17 at 15:59 -0400, Willem de Bruijn wrote:
 From: Willem de Bruijn will...@google.com
 
 PACKET_FANOUT_LB computes f->rr_cur such that it is modulo
 f->num_members. It returns the old value unconditionally, but
 f->num_members may have changed since the last store. Ensure
 that the return value is always < num.
 
 When modifying the logic, simplify it further by replacing the loop
 with an unconditional atomic increment.
 
 Fixes: dc99f600698d ("packet: Add fanout support.")
 Suggested-by: Eric Dumazet eduma...@google.com
 Signed-off-by: Willem de Bruijn will...@google.com
 ---

Acked-by: Eric Dumazet eduma...@google.com
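The approach in the commit message can be sketched in userspace with C11 atomics. One unconditional fetch-and-add hands out a ticket, and that ticket is reduced modulo the member count read at call time, so the result stays below `num` even if the group has just shrunk. This is an illustration along those lines, not the patch itself. `struct fanout_model` and `fanout_rr_pick()` are hypothetical names, and the kernel uses `atomic_t` helpers rather than `<stdatomic.h>`.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the fanout group's round-robin state. */
struct fanout_model {
	atomic_uint rr_cur;   /* free-running counter, never stored modulo num */
};

/* Pick a member index in [0, num); num must be > 0.
 * Because the stored counter is never pre-reduced, a value computed for a
 * larger, older group size cannot be returned: the modulo always uses the
 * num passed in now. */
static unsigned int fanout_rr_pick(struct fanout_model *f, unsigned int num)
{
	return atomic_fetch_add(&f->rr_cur, 1) % num;
}
```

Replacing the compare-and-swap loop with a single fetch-and-add also removes retries under contention. The cost is a slight skew when the counter wraps at `UINT_MAX`, which is usually acceptable for load balancing.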




Re: [PATCH next v2] bonding: Display LACP info only to CAP_NET_ADMIN capable user

2015-06-18 Thread Eric Dumazet
On Wed, 2015-06-17 at 17:59 -0700, Mahesh Bandewar wrote:
 Actor and Partner details can be accessed via procfs, sysfs
 entries or the netlink interface. These interfaces are currently
 world-readable. The earlier patch series made the LACP communication
 secure against nuisance attacks from within the same L2 domain, but
 it did not prevent an unprivileged user on the host from reading that
 information and performing the same attack.
 
 This patch avoids exposing those entries if the user
 in question does not have enough privileges.
 
 Signed-off-by: Mahesh Bandewar mahe...@google.com
 ---
  drivers/net/bonding/bond_netlink.c |  11 ++--
  drivers/net/bonding/bond_procfs.c  | 101 +++--
  drivers/net/bonding/bond_sysfs.c   |  12 ++---
  3 files changed, 67 insertions(+), 57 deletions(-)
 
 diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
 index 5580fcde738f..3fd3aa4b145e 100644
 --- a/drivers/net/bonding/bond_netlink.c
 +++ b/drivers/net/bonding/bond_netlink.c
 @@ -600,18 +600,23 @@ static int bond_fill_info(struct sk_buff *skb,
  
   if (BOND_MODE(bond) == BOND_MODE_8023AD) {
   struct ad_info info;
 + u8 zero_mac[ETH_ALEN];
  
 + eth_zero_addr(zero_mac);
   if (nla_put_u16(skb, IFLA_BOND_AD_ACTOR_SYS_PRIO,
 - bond->params.ad_actor_sys_prio))
 + capable(CAP_NET_ADMIN) ?
 + bond->params.ad_actor_sys_prio : 0))
  goto nla_put_failure;
  
  if (nla_put_u16(skb, IFLA_BOND_AD_USER_PORT_KEY,
 - bond->params.ad_user_port_key))
 + capable(CAP_NET_ADMIN) ?
 + bond->params.ad_user_port_key : 0))
  goto nla_put_failure;
  
  if (nla_put(skb, IFLA_BOND_AD_ACTOR_SYSTEM,
  sizeof(bond->params.ad_actor_system),
 - &bond->params.ad_actor_system))
 + capable(CAP_NET_ADMIN) ?
 + &bond->params.ad_actor_system : zero_mac))
   goto nla_put_failure;
  

Hmm... I would rather not send these fake attributes at all ?
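The difference between zero-filling and omitting can be seen with a toy type/length/value writer, which is what Eric's suggestion amounts to. With omission, unprivileged readers get no record at all, instead of records that look real but carry zeros. This is a hypothetical sketch only loosely modelled on netlink: `put_attr()`, `fill_lacp()` and the `ATTR_*` numbers are invented, there is no alignment padding, and values are in host byte order. A plain `privileged` flag stands in for `capable(CAP_NET_ADMIN)`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute types, loosely modelled on IFLA_BOND_AD_*. */
enum { ATTR_SYS_PRIO = 1, ATTR_PORT_KEY = 2, ATTR_SYSTEM = 3 };

/* Append a type/length/value record at off; returns the new offset,
 * or 0 if the record does not fit in cap bytes. */
static size_t put_attr(uint8_t *buf, size_t off, size_t cap,
		       uint16_t type, const void *val, uint16_t len)
{
	if (off + 4 + len > cap)
		return 0;
	memcpy(buf + off, &type, 2);
	memcpy(buf + off + 2, &len, 2);
	memcpy(buf + off + 4, val, len);
	return off + 4 + len;
}

/* Emit LACP details only for privileged readers. Returns bytes written;
 * 0 means nothing was emitted (unprivileged, or buffer too small). */
static size_t fill_lacp(uint8_t *buf, size_t cap, int privileged,
			uint16_t prio, uint16_t key, const uint8_t mac[6])
{
	size_t off = 0;

	if (!privileged)
		return 0;   /* omit the attributes entirely, no placeholders */
	off = put_attr(buf, off, cap, ATTR_SYS_PRIO, &prio, sizeof(prio));
	if (off)
		off = put_attr(buf, off, cap, ATTR_PORT_KEY, &key, sizeof(key));
	if (off)
		off = put_attr(buf, off, cap, ATTR_SYSTEM, mac, 6);
	return off;
}
```

A consumer then only needs to test whether an attribute is present. It never has to guess whether a zero priority or all-zero MAC is real or a mask.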





Re: [PATCH net] Revert tcp: switch tcp_fastopen key generation to net_get_random_once

2015-06-18 Thread Eric Dumazet
On Thu, 2015-06-18 at 11:32 +0200, Hannes Frederic Sowa wrote:
 Hello Christoph,

  There does not seem to be a better way to handle this. We could try
  to make the call to kmalloc and crypto_alloc_cipher during bootup, and
  then generate the random value only on-the-fly (when the first TFO-SYN
  comes in) with net_get_random_once in order to have the better entropy
  that comes with doing the late initialisation of the random value. But
  that's probably net-next material.
 
 can't we simply move the net_get_random_once to the TCP_FASTOPEN setsockopt and
 sendmsg(MSG_FASTOPEN) path, so those allocations still happen in process context
 but we still defer the extraction of entropy as long as possible?

Yes, I do not think this would be hard. This bug is old (3.13) and does
not seem very urgent to expedite a revert.




Re: [PATCH RFC] tun, macvtap: higher order allocations for skbs

2015-06-18 Thread Michael S. Tsirkin
On Thu, Jun 18, 2015 at 12:54:44PM +0200, Christian Borntraeger wrote:
 On 18.06.2015 at 12:20, Michael S. Tsirkin wrote:
  Needs more testing. Anyone see anything wrong with this?
 Can you explain the motivation? 
 FWIW, basic networking between two guests over macvtap still
 seems to work on s390, so I don't see any obvious regression.
 
 Christian

Shorter fragment list often makes processing in the net stack more
efficient.

  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   drivers/net/macvtap.c | 2 +-
   drivers/net/tun.c | 2 +-
   2 files changed, 2 insertions(+), 2 deletions(-)
  
  diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
  index 928f3f4..80e87e4 100644
  --- a/drivers/net/macvtap.c
  +++ b/drivers/net/macvtap.c
  @@ -610,7 +610,7 @@ static inline struct sk_buff *macvtap_alloc_skb(struct sock *sk, size_t prepad,
  linear = len;
  
  skb = sock_alloc_send_pskb(sk, prepad + linear, len - linear, noblock,
  -  err, 0);
  +  err, 1);
  if (!skb)
  return NULL;
  
  diff --git a/drivers/net/tun.c b/drivers/net/tun.c
  index cb376b2d..8f2f1e5 100644
  --- a/drivers/net/tun.c
  +++ b/drivers/net/tun.c
  @@ -1069,7 +1069,7 @@ static struct sk_buff *tun_alloc_skb(struct tun_file *tfile,
  linear = len;
  
  skb = sock_alloc_send_pskb(sk, prepad + linear, len - linear, noblock,
  -  &err, 0);
  +  &err, 1);
  if (!skb)
  return ERR_PTR(err);
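Some rough arithmetic shows why raising the last argument (`max_page_order`) from 0 to 1 shortens the fragment list. At order 0 each fragment holds at most one page, and at order 1 it can hold two contiguous pages. The sketch below assumes 4 KiB pages and assumes every higher-order allocation succeeds, so the counts are best-case figures. Under memory pressure `sock_alloc_send_pskb()` can fall back to smaller orders. `frags_needed()` and `MODEL_PAGE_SIZE` are hypothetical.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL   /* assumption: 4 KiB pages */

/* Best-case number of page fragments for len bytes of paged data when
 * each fragment can be a chunk of (1 << order) contiguous pages. */
static size_t frags_needed(size_t len, unsigned int order)
{
	size_t chunk = MODEL_PAGE_SIZE << order;

	return (len + chunk - 1) / chunk;
}
```

For 64 KiB of paged data, this gives 16 fragments at order 0 and 8 at order 1.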
  


[PATCH v2 2/4] net/macb: bindings doc/trivial: fix sama5d4 comment

2015-06-18 Thread Nicolas Ferre
On sama5d4, the GEM IP is configured for 10/100 Mbit/s only, so describing it
as Gigabit can be confusing.

Signed-off-by: Nicolas Ferre nicolas.fe...@atmel.com
---
 Documentation/devicetree/bindings/net/macb.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/macb.txt b/Documentation/devicetree/bindings/net/macb.txt
index 0ae6974383d7..97349e3f3ff2 100644
--- a/Documentation/devicetree/bindings/net/macb.txt
+++ b/Documentation/devicetree/bindings/net/macb.txt
@@ -8,7 +8,7 @@ Required properties:
   Use "cdns,pc302-gem" for Picochip picoXcell pc302 and later devices based on
   the Cadence GEM, or the generic form: "cdns,gem".
   Use "atmel,sama5d3-gem" for the Gigabit IP available on Atmel sama5d3 SoCs.
-  Use "atmel,sama5d4-gem" for the Gigabit IP available on Atmel sama5d4 SoCs.
+  Use "atmel,sama5d4-gem" for the GEM IP (10/100) available on Atmel sama5d4 SoCs.
   Use "cdns,zynqmp-gem" for Zynq Ultrascale+ MPSoC.
 - reg: Address and length of the register set for the device
 - interrupts: Should contain macb interrupt
-- 
2.1.3



[PATCH v2 4/4] net/macb: add config for Atmel sama5d2 SoCs

2015-06-18 Thread Nicolas Ferre
From: Cyrille Pitchen cyrille.pitc...@atmel.com

Add the compatible string for Atmel sama5d2 SoC family as the configuration
options differ from other instances of the GEM.

Signed-off-by: Cyrille Pitchen cyrille.pitc...@atmel.com
Signed-off-by: Nicolas Ferre nicolas.fe...@atmel.com
---
 drivers/net/ethernet/cadence/macb.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 740d04fd2223..caeb39561567 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -2713,6 +2713,13 @@ static const struct macb_config pc302gem_config = {
.init = macb_init,
 };
 
+static const struct macb_config sama5d2_config = {
+   .caps = 0,
+   .dma_burst_length = 16,
+   .clk_init = macb_clk_init,
+   .init = macb_init,
+};
+
 static const struct macb_config sama5d3_config = {
.caps = MACB_CAPS_SG_DISABLED | MACB_CAPS_GIGABIT_MODE_AVAILABLE,
.dma_burst_length = 16,
@@ -2756,6 +2763,7 @@ static const struct of_device_id macb_dt_ids[] = {
 	{ .compatible = "cdns,macb" },
 	{ .compatible = "cdns,pc302-gem", .data = &pc302gem_config },
 	{ .compatible = "cdns,gem", .data = &pc302gem_config },
+	{ .compatible = "atmel,sama5d2-gem", .data = &sama5d2_config },
 	{ .compatible = "atmel,sama5d3-gem", .data = &sama5d3_config },
 	{ .compatible = "atmel,sama5d4-gem", .data = &sama5d4_config },
 	{ .compatible = "cdns,at91rm9200-emac", .data = &emac_config },
-- 
2.1.3
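The `macb_dt_ids[]` table in this patch maps a device tree `compatible` string to a per-SoC `struct macb_config`. The sketch below shows that matching in userspace with a plain string comparison. The kernel's OF matching also handles nodes that list several compatible strings, and this sketch ignores that case. `struct soc_config`, `struct dt_match` and `match_config()` are hypothetical stand-ins, and the `caps` values are placeholders rather than real `MACB_CAPS_*` bits.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-SoC settings, mirroring a few macb_config fields. */
struct soc_config {
	unsigned int caps;        /* placeholder for MACB_CAPS_* bits */
	int dma_burst_length;
};

/* Hypothetical match entry, mirroring struct of_device_id. */
struct dt_match {
	const char *compatible;
	const struct soc_config *data;
};

static const struct soc_config sama5d2 = { .caps = 0, .dma_burst_length = 16 };
static const struct soc_config sama5d3 = { .caps = 1, .dma_burst_length = 16 };

/* Table terminated by an empty entry, like macb_dt_ids[]. */
static const struct dt_match ids[] = {
	{ "cdns,macb", NULL },            /* no per-SoC data */
	{ "atmel,sama5d2-gem", &sama5d2 },
	{ "atmel,sama5d3-gem", &sama5d3 },
	{ NULL, NULL },
};

/* Return the entry whose compatible string matches, or NULL. */
static const struct dt_match *match_config(const struct dt_match *tbl,
					   const char *compatible)
{
	for (; tbl->compatible; tbl++)
		if (strcmp(tbl->compatible, compatible) == 0)
			return tbl;
	return NULL;
}
```

Adding a SoC flavour then needs only a new config object and one table line, which matches the footprint of this patch.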



[PATCH v2 3/4] net/macb: bindings doc: add sama5d2 compatibility string

2015-06-18 Thread Nicolas Ferre
Add sama5d2 to the binding documentation for this use of the GEM IP.

Signed-off-by: Nicolas Ferre nicolas.fe...@atmel.com
---
 Documentation/devicetree/bindings/net/macb.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/net/macb.txt b/Documentation/devicetree/bindings/net/macb.txt
index 97349e3f3ff2..b5d79761ac97 100644
--- a/Documentation/devicetree/bindings/net/macb.txt
+++ b/Documentation/devicetree/bindings/net/macb.txt
@@ -7,6 +7,7 @@ Required properties:
   Use "cdns,at32ap7000-macb" for other 10/100 usage or use the generic form: "cdns,macb".
   Use "cdns,pc302-gem" for Picochip picoXcell pc302 and later devices based on
   the Cadence GEM, or the generic form: "cdns,gem".
+  Use "atmel,sama5d2-gem" for the GEM IP (10/100) available on Atmel sama5d2 SoCs.
   Use "atmel,sama5d3-gem" for the Gigabit IP available on Atmel sama5d3 SoCs.
   Use "atmel,sama5d4-gem" for the GEM IP (10/100) available on Atmel sama5d4 SoCs.
   Use "cdns,zynqmp-gem" for Zynq Ultrascale+ MPSoC.
-- 
2.1.3



Re: [PATCH net-next 00/15] Simplify netfilter and network namespaces

2015-06-18 Thread Eric W. Biederman

Cc list trimmed as this is no longer about the original patch
submission.

Julian Anastasov j...@ssi.bg writes:

   Hello,

 On Wed, 17 Jun 2015, Eric W. Biederman wrote:

 p.s.  I do have my patch that I can toss in your direction if you are
 interested.

   Of course... I'll be able to check it after 8 hours...


My incremental patch for ipvs on top of everything else I have pushed
out looks like this:

From: Eric W. Biederman ebied...@xmission.com
Date: Fri, 12 Jun 2015 18:34:12 -0500
Subject: [PATCH] ipvs: Pass struct net down to where it is needed and used

Pass struct net down to where it is used and stop guessing
which network namespace should be used.

Signed-off-by: Eric W. Biederman ebied...@xmission.com
---
 include/net/ip_vs.h |  45 +++-
 net/netfilter/ipvs/ip_vs_conn.c |  11 ++-
 net/netfilter/ipvs/ip_vs_core.c | 118 ++--
 net/netfilter/ipvs/ip_vs_ftp.c  |   8 +--
 net/netfilter/ipvs/ip_vs_proto_ah_esp.c |   9 ++-
 net/netfilter/ipvs/ip_vs_proto_sctp.c   |   5 +-
 net/netfilter/ipvs/ip_vs_proto_tcp.c|   8 +--
 net/netfilter/ipvs/ip_vs_proto_udp.c|   5 +-
 net/netfilter/ipvs/ip_vs_xmit.c |  51 --
 net/netfilter/xt_ipvs.c |   2 +-
 10 files changed, 108 insertions(+), 154 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 4e3731ee4eac..a556d14cff70 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -35,37 +35,6 @@ static inline struct netns_ipvs *net_ipvs(struct net* net)
 	return net->ipvs;
 }
 
-/* Get net ptr from skb in traffic cases
- * use skb_sknet when call is from userland (ioctl or netlink)
- */
-static inline struct net *skb_net(const struct sk_buff *skb)
-{
-#ifdef CONFIG_NET_NS
-#ifdef CONFIG_IP_VS_DEBUG
-	/*
-	 * This is used for debug only.
-	 * Start with the most likely hit
-	 * End with BUG
-	 */
-	if (likely(skb->dev && dev_net(skb->dev)))
-		return dev_net(skb->dev);
-	if (skb_dst(skb) && skb_dst(skb)->dev)
-		return dev_net(skb_dst(skb)->dev);
-	WARN(skb->sk, "Maybe skb_sknet should be used in %s() at line:%d\n",
-		      __func__, __LINE__);
-	if (likely(skb->sk && sock_net(skb->sk)))
-		return sock_net(skb->sk);
-	pr_err("There is no net ptr to find in the skb in %s() line:%d\n",
-		__func__, __LINE__);
-	BUG();
-#else
-	return dev_net(skb->dev ? : skb_dst(skb)->dev);
-#endif
-#else
-	return &init_net;
-#endif
-}
-
 static inline struct net *skb_sknet(const struct sk_buff *skb)
 {
 #ifdef CONFIG_NET_NS
@@ -441,19 +410,19 @@ struct ip_vs_protocol {
 
void (*exit_netns)(struct net *net, struct ip_vs_proto_data *pd);
 
-   int (*conn_schedule)(int af, struct sk_buff *skb,
+   int (*conn_schedule)(struct net *net, int af, struct sk_buff *skb,
 struct ip_vs_proto_data *pd,
 int *verdict, struct ip_vs_conn **cpp,
 struct ip_vs_iphdr *iph);
 
struct ip_vs_conn *
-   (*conn_in_get)(int af,
+   (*conn_in_get)(struct net *net, int af,
   const struct sk_buff *skb,
   const struct ip_vs_iphdr *iph,
   int inverse);
 
struct ip_vs_conn *
-   (*conn_out_get)(int af,
+   (*conn_out_get)(struct net *net, int af,
const struct sk_buff *skb,
const struct ip_vs_iphdr *iph,
int inverse);
@@ -1179,13 +1148,15 @@ static inline void ip_vs_conn_fill_param(struct net *net, int af, int protocol,
 struct ip_vs_conn *ip_vs_conn_in_get(const struct ip_vs_conn_param *p);
 struct ip_vs_conn *ip_vs_ct_in_get(const struct ip_vs_conn_param *p);
 
-struct ip_vs_conn * ip_vs_conn_in_get_proto(int af, const struct sk_buff *skb,
+struct ip_vs_conn * ip_vs_conn_in_get_proto(struct net *net, int af,
+   const struct sk_buff *skb,
const struct ip_vs_iphdr *iph,
int inverse);
 
 struct ip_vs_conn *ip_vs_conn_out_get(const struct ip_vs_conn_param *p);
 
-struct ip_vs_conn * ip_vs_conn_out_get_proto(int af, const struct sk_buff *skb,
+struct ip_vs_conn * ip_vs_conn_out_get_proto(struct net *net, int af,
+const struct sk_buff *skb,
 const struct ip_vs_iphdr *iph,
 int inverse);
 
@@ -1215,7 +1186,7 @@ void ip_vs_conn_expire_now(struct ip_vs_conn *cp);
 
 const char *ip_vs_state_name(__u16 proto, int state);
 
-void ip_vs_tcp_conn_listen(struct net *net, struct ip_vs_conn *cp);
+void ip_vs_tcp_conn_listen(struct ip_vs_conn *cp);
 int ip_vs_check_template(struct ip_vs_conn *ct);
 void 

[PATCH net-next 1/3 v5] net: track link-status of ipv4 nexthops

2015-06-18 Thread Andy Gospodarek
Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are
reachable via an interface where carrier is off.  No action is taken,
but additional flags are passed to userspace to indicate carrier status.
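The patch also defines `RTNH_F_COMPARE_MASK` to group nexthop state bits (dead, link down) for route comparisons. The intent, presumably, is that two otherwise identical nexthops should not compare as different just because one of them is currently down. The patch's actual comparison code is not shown in this excerpt, so the following is only a sketch of how such a mask can be applied. The flag values match include/uapi/linux/rtnetlink.h as shown in this patch, and `nh_flags_equal()` is hypothetical.

```c
/* Flag values as in include/uapi/linux/rtnetlink.h; RTNH_F_LINKDOWN
 * and RTNH_F_COMPARE_MASK as proposed by this patch. */
#define RTNH_F_DEAD          1
#define RTNH_F_PERVASIVE     2
#define RTNH_F_ONLINK        4
#define RTNH_F_OFFLOAD       8
#define RTNH_F_LINKDOWN     16
#define RTNH_F_COMPARE_MASK (RTNH_F_DEAD | RTNH_F_LINKDOWN)

/* Hypothetical helper: flags compare equal if they differ only in the
 * state bits covered by the mask. */
static int nh_flags_equal(unsigned int a, unsigned int b)
{
	return ((a ^ b) & ~RTNH_F_COMPARE_MASK) == 0;
}
```

XOR-ing the two flag words isolates the bits that differ, and clearing the masked bits leaves only differences that matter for identity.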

This also includes a cleanup of fib_disable_ip: the cryptic force option
previously used is replaced by the event that triggered the call, which
states the caller's intent more clearly.

v2: Split out kernel functionality into 2 patches, this patch simply sets and
clears new nexthop flag RTNH_F_LINKDOWN.

v3: Cleanups suggested by Alex, as well as a fix for a bug noticed in
fib_sync_down_dev and fib_sync_up when multipath was not enabled.

v5: Whitespace and variable declaration fixups suggested by Dave

Signed-off-by: Andy Gospodarek go...@cumulusnetworks.com
Signed-off-by: Dinesh Dutt dd...@cumulusnetworks.com
---
 include/net/ip_fib.h   |  4 +--
 include/uapi/linux/rtnetlink.h |  3 +++
 net/ipv4/fib_frontend.c| 22 ++--
 net/ipv4/fib_semantics.c   | 60 +-
 4 files changed, 66 insertions(+), 23 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 54271ed..f73d27c 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -305,9 +305,9 @@ void fib_flush_external(struct net *net);
 
 /* Exported by fib_semantics.c */
 int ip_fib_check_default(__be32 gw, struct net_device *dev);
-int fib_sync_down_dev(struct net_device *dev, int force);
+int fib_sync_down_dev(struct net_device *dev, unsigned long event);
 int fib_sync_down_addr(struct net *net, __be32 local);
-int fib_sync_up(struct net_device *dev);
+int fib_sync_up(struct net_device *dev, unsigned int nh_flags);
 void fib_select_multipath(struct fib_result *res);
 
 /* Exported by fib_trie.c */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 17fb02f..8ab874a 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -338,6 +338,9 @@ struct rtnexthop {
 #define RTNH_F_PERVASIVE   2   /* Do recursive gateway lookup  */
 #define RTNH_F_ONLINK  4   /* Gateway is forced on link*/
 #define RTNH_F_OFFLOAD 8   /* offloaded route */
+#define RTNH_F_LINKDOWN		16	/* carrier-down on nexthop */
+
+#define RTNH_F_COMPARE_MASK	(RTNH_F_DEAD | RTNH_F_LINKDOWN) /* used as mask for route comparisons */
 
 /* Macros to handle hexthops */
 
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 872494e..54d3c45 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1063,9 +1063,9 @@ static void nl_fib_lookup_exit(struct net *net)
 	net->ipv4.fibnl = NULL;
 }
 
-static void fib_disable_ip(struct net_device *dev, int force)
+static void fib_disable_ip(struct net_device *dev, unsigned long event)
 {
-   if (fib_sync_down_dev(dev, force))
+   if (fib_sync_down_dev(dev, event))
fib_flush(dev_net(dev));
rt_cache_flush(dev_net(dev));
arp_ifdown(dev);
@@ -1081,7 +1081,7 @@ static int fib_inetaddr_event(struct notifier_block *this, unsigned long event,
case NETDEV_UP:
fib_add_ifaddr(ifa);
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
-   fib_sync_up(dev);
+   fib_sync_up(dev, RTNH_F_DEAD);
 #endif
 		atomic_inc(&net->ipv4.dev_addr_genid);
rt_cache_flush(dev_net(dev));
@@ -1093,7 +1093,7 @@ static int fib_inetaddr_event(struct notifier_block *this, unsigned long event,
/* Last address was deleted from this interface.
 * Disable IP.
 */
-   fib_disable_ip(dev, 1);
+   fib_disable_ip(dev, event);
} else {
rt_cache_flush(dev_net(dev));
}
@@ -1107,9 +1107,10 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
struct in_device *in_dev;
struct net *net = dev_net(dev);
+   unsigned int flags;
 
if (event == NETDEV_UNREGISTER) {
-   fib_disable_ip(dev, 2);
+   fib_disable_ip(dev, event);
rt_flush_dev(dev);
return NOTIFY_DONE;
}
@@ -1124,16 +1125,21 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
fib_add_ifaddr(ifa);
} endfor_ifa(in_dev);
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
-   fib_sync_up(dev);
+   fib_sync_up(dev, RTNH_F_DEAD);
 #endif
 		atomic_inc(&net->ipv4.dev_addr_genid);
rt_cache_flush(net);
break;
case NETDEV_DOWN:
-   fib_disable_ip(dev, 0);
+   fib_disable_ip(dev, event);
break;
-   case NETDEV_CHANGEMTU:
case NETDEV_CHANGE:
+   flags = dev_get_flags(dev);
+   if (flags  

Re: [PATCH net-next 3/3 v5] iproute2: add support to print 'linkdown' nexthop flag

2015-06-18 Thread Scott Feldman
On Thu, Jun 18, 2015 at 8:22 AM, Andy Gospodarek
go...@cumulusnetworks.com wrote:
 Signed-off-by: Andy Gospodaerk go...@cumulusnetworks.com
 Signed-off-by: Dinesh Dutt dd...@cumulusnetworks.com

 ---
  ip/iproute.c | 4 
  1 file changed, 4 insertions(+)

 diff --git a/ip/iproute.c b/ip/iproute.c
 index 3795baf..3369c49 100644
 --- a/ip/iproute.c
 +++ b/ip/iproute.c
 @@ -451,6 +451,8 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 fprintf(fp, "offload ");
 if (r->rtm_flags & RTM_F_NOTIFY)
 fprintf(fp, "notify ");
 +   if (r->rtm_flags & RTNH_F_LINKDOWN)
 +   fprintf(fp, "linkdown ");


iproute.c: In function ‘print_route’:
iproute.c:454:21: error: ‘RTNH_F_LINKDOWN’ undeclared (first use in
this function)
iproute.c:454:21: note: each undeclared identifier is reported only
once for each function it appears in
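The error means that the copy of the rtnetlink header iproute2 was built against does not yet define `RTNH_F_LINKDOWN`. iproute2 ships its own copies of the kernel UAPI headers, which normally gain such constants when they are re-synced with the kernel. As a hedged illustration only, a guarded fallback definition lets the printing logic compile against older headers. `format_route_flags()` is a hypothetical helper rather than iproute2 code, and the value 16 comes from the kernel patch in this series.

```c
#include <string.h>

/* Fallbacks for headers that predate the kernel change (values from
 * include/uapi/linux/rtnetlink.h plus this series). */
#ifndef RTNH_F_OFFLOAD
#define RTNH_F_OFFLOAD 8
#endif
#ifndef RTNH_F_LINKDOWN
#define RTNH_F_LINKDOWN 16
#endif

/* Write space-separated names of the set flags into out (size n >= 1). */
static void format_route_flags(unsigned int flags, char *out, size_t n)
{
	out[0] = '\0';
	if (flags & RTNH_F_OFFLOAD)
		strncat(out, "offload ", n - strlen(out) - 1);
	if (flags & RTNH_F_LINKDOWN)
		strncat(out, "linkdown ", n - strlen(out) - 1);
}
```

Updating the bundled header is the cleaner fix. The `#ifndef` guard only keeps the build working until that happens.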


[PATCH v2 1/4] net/macb: bindings doc: fix compatibility string

2015-06-18 Thread Nicolas Ferre
In the driver and the DT bindings we use the atmel prefix. Fix it in the
binding documentation.

Signed-off-by: Nicolas Ferre nicolas.fe...@atmel.com
---
 Documentation/devicetree/bindings/net/macb.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/macb.txt b/Documentation/devicetree/bindings/net/macb.txt
index 8ec5fdf444e9..0ae6974383d7 100644
--- a/Documentation/devicetree/bindings/net/macb.txt
+++ b/Documentation/devicetree/bindings/net/macb.txt
@@ -7,8 +7,8 @@ Required properties:
   Use "cdns,at32ap7000-macb" for other 10/100 usage or use the generic form: "cdns,macb".
   Use "cdns,pc302-gem" for Picochip picoXcell pc302 and later devices based on
   the Cadence GEM, or the generic form: "cdns,gem".
-  Use "cdns,sama5d3-gem" for the Gigabit IP available on Atmel sama5d3 SoCs.
-  Use "cdns,sama5d4-gem" for the Gigabit IP available on Atmel sama5d4 SoCs.
+  Use "atmel,sama5d3-gem" for the Gigabit IP available on Atmel sama5d3 SoCs.
+  Use "atmel,sama5d4-gem" for the Gigabit IP available on Atmel sama5d4 SoCs.
   Use "cdns,zynqmp-gem" for Zynq Ultrascale+ MPSoC.
 - reg: Address and length of the register set for the device
 - interrupts: Should contain macb interrupt
-- 
2.1.3



[PATCH v2 0/4] net/macb: add sama5d2 support

2015-06-18 Thread Nicolas Ferre
Hi,

This series is basically the support for another flavor of the GEM IP
configuration. It ended up being a series because of some little fixes made to
the binding documentation before adding the new compatibility string.

Bye,

v2: - fix bindings
- add sama5d2 compatibility string to the binding documentation

Cyrille Pitchen (1):
  net/macb: add config for Atmel sama5d2 SoCs

Nicolas Ferre (3):
  net/macb: bindings doc: fix compatibility string
  net/macb: bindings doc/trivial: fix sama5d4 comment
  net/macb: bindings doc: add sama5d2 compatibility string

 Documentation/devicetree/bindings/net/macb.txt | 5 +++--
 drivers/net/ethernet/cadence/macb.c| 8 
 2 files changed, 11 insertions(+), 2 deletions(-)

-- 
2.1.3



Re: [RFC V3] net: don't wait for order-3 page allocation

2015-06-18 Thread Eric Dumazet
On Thu, Jun 18, 2015 at 7:30 AM, Michal Hocko mho...@suse.cz wrote:

 Abusing __GFP_NO_KSWAPD is a wrong way to go IMHO. It is true that the
 _current_ implementation of the allocator has this nasty and very subtle
 side effect but that doesn't mean it should be abused outside of the mm
 proper. Why shouldn't this path wake kswapd and let it compact
 memory in the background to increase the success rate for the later
 high order allocations?

I kind of agree.

If kswapd is a problem (is it ???) we should fix it, instead of adding
yet another flag to some random locations attempting
memory allocations.


Re: [PATCH net] Revert tcp: switch tcp_fastopen key generation to net_get_random_once

2015-06-18 Thread Christoph Paasch
On 18/06/15 - 04:14:13, Eric Dumazet wrote:
 On Thu, 2015-06-18 at 11:32 +0200, Hannes Frederic Sowa wrote:
   There does not seem to be a better way to handle this. We could try
   to make the call to kmalloc and crypto_alloc_cipher during bootup, and
   then generate the random value only on-the-fly (when the first TFO-SYN
   comes in) with net_get_random_once in order to have the better entropy
   that comes with doing the late initialisation of the random value. But
   that's probably net-next material.
  
  can't we simply move the net_get_random_once to the TCP_FASTOPEN setsockopt and
  sendmsg(MSG_FASTOPEN) path, so those allocations still happen in process context
  but we still defer the extraction of entropy as long as possible?
 
 Yes, I do not think this would be hard. This bug is old (3.13) and does
 not seem very urgent to expedite a revert.

True, it would be simpler to move the call to tcp_fastopen_init_key_once into
setsockopt() and inet_listen().

I will resubmit.


Christoph
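The plan here is a lazy one-time initialisation: allocations stay in process context, and the key is generated by whichever path needs it first. The same idea can be sketched in userspace with `pthread_once()`. `fastopen_key`, `init_key()` and `get_fastopen_key()` are hypothetical names. The key bytes are deterministic only so that the sketch is testable; a real key must come from a cryptographic entropy source.

```c
#include <pthread.h>
#include <stddef.h>

static unsigned char fastopen_key[16];
static int init_calls;                  /* how often the key was generated */
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

/* Placeholder for drawing entropy; deterministic here only for testing. */
static void init_key(void)
{
	init_calls++;
	for (size_t i = 0; i < sizeof(fastopen_key); i++)
		fastopen_key[i] = (unsigned char)(i * 31 + 7);
}

/* Called from the setsockopt()/listen() paths: the first caller generates
 * the key and every later caller reuses it, even under concurrency. */
static const unsigned char *get_fastopen_key(void)
{
	pthread_once(&key_once, init_key);
	return fastopen_key;
}
```

Deferring the first call until a socket actually enables Fast Open has a benefit: the entropy is drawn as late as possible, while the work still runs in a context where sleeping allocations are allowed.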



Re: [PATCH 20/22] fjes: epstop_task

2015-06-18 Thread Sergei Shtylyov

Hello.

On 6/18/2015 3:49 AM, Taku Izumi wrote:


This patch adds epstop_task.
This task is used to process cancellation requests
from other receivers.



Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com
---
  drivers/platform/x86/fjes/fjes_hw.c   | 34 ++
  drivers/platform/x86/fjes/fjes_hw.h   |  1 +
  drivers/platform/x86/fjes/fjes_main.c |  1 +
  3 files changed, 36 insertions(+)



diff --git a/drivers/platform/x86/fjes/fjes_hw.c b/drivers/platform/x86/fjes/fjes_hw.c
index e07b266..c22679a 100644
--- a/drivers/platform/x86/fjes/fjes_hw.c
+++ b/drivers/platform/x86/fjes/fjes_hw.c

[...]

@@ -1123,3 +1126,34 @@ static void fjes_hw_update_zone_task(struct work_struct *work)
}
  }

+static void fjes_hw_epstop_task(struct work_struct *work)
+{
+   struct fjes_hw *hw = container_of(work,
+   struct fjes_hw, epstop_task);


   Please start the continuation lines under 'work' on the first line.
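For illustration, a userspace sketch of the container_of() idiom with the continuation line placed as suggested: the second line starts in the column of `work`, assuming 8-column tabs. The macro is a simplified re-creation without the kernel's type checking, and `struct hw_demo` and `hw_from_work()` are hypothetical stand-ins for the fjes types.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace container_of(): no type checking, unlike the kernel's. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct work_struct_demo {	/* hypothetical stand-in for struct work_struct */
	int pending;
};

struct hw_demo {		/* hypothetical stand-in for struct fjes_hw */
	int id;
	struct work_struct_demo epstop_task;
};

static struct hw_demo *hw_from_work(struct work_struct_demo *work)
{
	/* The continuation line starts under 'work' (8-column tabs). */
	struct hw_demo *hw = container_of(work,
					  struct hw_demo, epstop_task);

	assert(work != NULL);
	return hw;
}
```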


+   struct fjes_adapter *adapter = (struct fjes_adapter *)hw->back;
+   int epid_bit;
+   unsigned long remain_bit;
+
+   while ((remain_bit = hw->epstop_req_bit)) {
+


   Don't think this empty line is needed.


+   for (epid_bit = 0; remain_bit; (remain_bit >>= 1),
+   (epid_bit++)) {


   Inner parens not needed, the comma operator has lowest priority.
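A small userspace sketch of the same bit-walking loop written without the inner parentheses; `collect_set_bits()` is a hypothetical helper, not part of the fjes driver. Because the comma operator has the lowest precedence in C, `req >>= 1, bit++` already groups as two separate expressions.

```c
#include <assert.h>
#include <stddef.h>

/* Record the index of each set bit in 'req', lowest bit first, storing at
 * most 'max' indices in 'out'.  Returns how many indices were stored. */
static size_t collect_set_bits(unsigned long req, int *out, size_t max)
{
	size_t n = 0;
	int bit;

	assert(out != NULL || max == 0);
	/* ',' binds more loosely than '>>=' and '++', so the step clause
	 * needs no inner parentheses. */
	for (bit = 0; req; req >>= 1, bit++) {
		if ((req & 1) && n < max)
			out[n++] = bit;
	}
	return n;
}
```

For example, a request mask of 0x29 (binary 101001) yields the indices 0, 3 and 5.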


+
+   if (remain_bit & 1) {
+


   Don't think this empty line is needed.

[...]

WBR, Sergei



Re: [PATCH] net/macb: add config for Atmel sama5d2 SoCs

2015-06-18 Thread Nicolas Ferre
Le 18/06/2015 15:30, Alexandre Belloni a écrit :
 On 18/06/2015 at 12:18:19 +0200, Nicolas Ferre wrote :
 From: Cyrille Pitchen cyrille.pitc...@atmel.com

 Add the compatible string for Atmel sama5d2 SoC family as the configuration
 options differ from other instances of the GEM.

 Signed-off-by: Cyrille Pitchen cyrille.pitc...@atmel.com
 Signed-off-by: Nicolas Ferre nicolas.fe...@atmel.com
 ---
  drivers/net/ethernet/cadence/macb.c | 8 
  1 file changed, 8 insertions(+)

 diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
 index 740d04fd2223..caeb39561567 100644
 --- a/drivers/net/ethernet/cadence/macb.c
 +++ b/drivers/net/ethernet/cadence/macb.c
 @@ -2713,6 +2713,13 @@ static const struct macb_config pc302gem_config = {
  .init = macb_init,
  };
  
 +static const struct macb_config sama5d2_config = {
 +.caps = 0,
 +.dma_burst_length = 16,
 +.clk_init = macb_clk_init,
 +.init = macb_init,
 +};
 +
  static const struct macb_config sama5d3_config = {
  .caps = MACB_CAPS_SG_DISABLED | MACB_CAPS_GIGABIT_MODE_AVAILABLE,
  .dma_burst_length = 16,
 @@ -2756,6 +2763,7 @@ static const struct of_device_id macb_dt_ids[] = {
  { .compatible = "cdns,macb" },
  { .compatible = "cdns,pc302-gem", .data = &pc302gem_config },
  { .compatible = "cdns,gem", .data = &pc302gem_config },
 +{ .compatible = "atmel,sama5d2-gem", .data = &sama5d2_config },
 
 This compatible has to be documented

Sure, I'll re-send a series right now (and add some documentation fixes).
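For context on why the entry matters: a device tree node carrying `compatible = "atmel,sama5d2-gem"` selects `sama5d2_config` through the `macb_dt_ids` table. This userspace sketch mimics that lookup with simplified, hypothetical types (`struct dt_id`, `struct cfg`) and a plain exact-string match; the kernel's OF matching is more involved (a node may, for instance, list several compatible strings).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for struct macb_config and
 * struct of_device_id, keeping only what the lookup needs. */
struct cfg {
	int dma_burst_length;
};

struct dt_id {
	const char *compatible;
	const void *data;
};

static const struct cfg sama5d2_cfg = { .dma_burst_length = 16 };
static const struct cfg sama5d3_cfg = { .dma_burst_length = 16 };

static const struct dt_id ids[] = {
	{ .compatible = "cdns,macb" },		/* no .data */
	{ .compatible = "atmel,sama5d2-gem", .data = &sama5d2_cfg },
	{ .compatible = "atmel,sama5d3-gem", .data = &sama5d3_cfg },
	{ .compatible = NULL }			/* sentinel */
};

/* Return the table entry whose compatible string equals 'compat', or NULL. */
static const struct dt_id *match_compatible(const char *compat)
{
	const struct dt_id *id;

	assert(compat != NULL);
	for (id = ids; id->compatible; id++)
		if (strcmp(id->compatible, compat) == 0)
			return id;
	return NULL;
}
```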

Thanks, bye,

 
  { .compatible = "atmel,sama5d3-gem", .data = &sama5d3_config },
  { .compatible = "atmel,sama5d4-gem", .data = &sama5d4_config },
  { .compatible = "cdns,at91rm9200-emac", .data = &emac_config },
 -- 
 2.1.3

 


-- 
Nicolas Ferre


Re: [PATCH 14/22] fjes: net_device_ops.ndo_tx_timeout

2015-06-18 Thread Sergei Shtylyov

Hello.

On 6/18/2015 3:49 AM, Taku Izumi wrote:


This patch adds net_device_ops.ndo_tx_timeout callback.



Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com
---
  drivers/platform/x86/fjes/fjes_main.c | 9 +
  1 file changed, 9 insertions(+)



diff --git a/drivers/platform/x86/fjes/fjes_main.c b/drivers/platform/x86/fjes/fjes_main.c
index 72541a7..84727d8 100644
--- a/drivers/platform/x86/fjes/fjes_main.c
+++ b/drivers/platform/x86/fjes/fjes_main.c

[...]

@@ -739,6 +741,13 @@ static netdev_tx_t fjes_xmit_frame(struct sk_buff *skb,
return ret;
  }

+static void fjes_tx_retry(struct net_device *netdev)
+{
+   struct netdev_queue *curQueue = netdev_get_tx_queue(netdev, 0);


   No CamelCase, please.

WBR, Sergei



Re: [PATCH net-next 1/3 v4] net: track link-status of ipv4 nexthops

2015-06-18 Thread Andy Gospodarek
On Thu, Jun 18, 2015 at 03:26:30AM -0700, David Miller wrote:
 From: Andy Gospodarek go...@cumulusnetworks.com
 Date: Mon, 15 Jun 2015 12:33:19 -0400
 
  @@ -1107,9 +1107,10 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
  struct net_device *dev = netdev_notifier_info_to_dev(ptr);
  struct in_device *in_dev;
  struct net *net = dev_net(dev);
  +   unsigned flags;
 
 Please always fully spell out unsigned int instead of shortening it to
 just unsigned, thanks.
 
  @@ -920,11 +926,17 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
  if (!nh->nh_dev)
  goto failure;
  } else {
  +   int linkdown = 0;
  change_nexthops(fi) {
 
 Please put an empty line between local variable declarations and
 code.

Ugh, thanks.  I'll fixup this and your other comments with v5.



[PATCH net-next 2/3 v5] net: ipv4 sysctl option to ignore routes when nexthop link is down

2015-06-18 Thread Andy Gospodarek
This feature is only enabled with the new per-interface or ipv4 global
sysctls called 'ignore_routes_with_linkdown'.

net.ipv4.conf.all.ignore_routes_with_linkdown = 0
net.ipv4.conf.default.ignore_routes_with_linkdown = 0
net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
...
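As a sketch of how these knobs map onto procfs (each net.ipv4.conf.<scope>.<name> sysctl is exposed as /proc/sys/net/ipv4/conf/<scope>/<name>), the helpers here build the path for a given scope and write 0 or 1 to it. The function names are hypothetical, writing requires root, and the file only exists on a kernel that carries this series.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build /proc/sys/net/ipv4/conf/<scope>/ignore_routes_with_linkdown, where
 * <scope> is "all", "default" or an interface name such as "p8p1".
 * Returns 0 on success, -1 on a bad scope or a too-small buffer. */
static int linkdown_sysctl_path(char *buf, size_t len, const char *scope)
{
	int n;

	assert(buf != NULL && scope != NULL);
	if (scope[0] == '\0' || strchr(scope, '/'))
		return -1;	/* must be a single path component */
	n = snprintf(buf, len,
		     "/proc/sys/net/ipv4/conf/%s/ignore_routes_with_linkdown",
		     scope);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write 0 or 1; needs root and a kernel that has this sysctl. */
static int set_ignore_routes_with_linkdown(const char *scope, int on)
{
	char path[128];
	FILE *f;

	if (linkdown_sysctl_path(path, sizeof(path), scope))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%d\n", on ? 1 : 0) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```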

When the above sysctls are set, the kernel will report to userspace that a
route is dead and will no longer resolve to this nexthop when performing a
FIB lookup.  This signals to userspace that the route will not be
selected.  RTNH_F_DEAD is only passed to userspace if the sysctl is
enabled and the link is down.  This was done because, without it,
netlink listeners would have no idea whether or not a nexthop would be
selected.  The kernel only sets RTNH_F_DEAD internally if the interface
has IFF_UP cleared.
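The selection behavior described above can be modeled in a few lines. This is a simplified, hypothetical model (`struct nh_demo`, `nh_dead()` and `select_nh()` are not kernel code): a nexthop counts as dead when IFF_UP is cleared, or when its link is down and ignore_routes_with_linkdown is enabled, and the lowest-metric live nexthop wins, which matches the 90.0.0.0/24 example that follows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified nexthop model (not kernel types). */
struct nh_demo {
	const char *dev;
	int metric;
	bool if_up;	/* IFF_UP set on the device */
	bool carrier;	/* physical link is up */
};

/* Dead if the device is administratively down, or if its link is down
 * and ignore_routes_with_linkdown is enabled. */
static bool nh_dead(const struct nh_demo *nh, bool ignore_linkdown)
{
	return !nh->if_up || (ignore_linkdown && !nh->carrier);
}

/* Lowest-metric nexthop that is not dead, or NULL if there is none. */
static const struct nh_demo *select_nh(const struct nh_demo *nhs, size_t n,
				       bool ignore_linkdown)
{
	const struct nh_demo *best = NULL;
	size_t i;

	assert(nhs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (nh_dead(&nhs[i], ignore_linkdown))
			continue;
		if (!best || nhs[i].metric < best->metric)
			best = &nhs[i];
	}
	return best;
}
```

With p8p1 (metric 1) link-down and p7p1 (metric 2) up, enabling the sysctl makes p7p1 win; with the sysctl off, the metric 1 route via p8p1 is still chosen.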

With the new sysctl set, the following behavior can be observed
(interface p8p1 is link-down):

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 dead linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
90.0.0.1 via 70.0.0.2 dev p7p1  src 70.0.0.1
cache
local 80.0.0.1 dev lo  src 80.0.0.1
cache local
80.0.0.2 via 10.0.5.2 dev p9p1  src 10.0.5.15
cache

While the route does remain in the table (so it can be modified if
needed rather than being wiped away as it would be if IFF_UP was
cleared), the proper next-hop is chosen automatically when the link is
down.  Now, with interface p8p1 linked up:

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2
90.0.0.1 via 80.0.0.2 dev p8p1  src 80.0.0.1
cache
local 80.0.0.1 dev lo  src 80.0.0.1
cache local
80.0.0.2 dev p8p1  src 80.0.0.1
cache

and the output changes to what one would expect.

If the sysctl is not set, the following output would be expected when
p8p1 is down:

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2

Since the dead flag does not appear, there should be no expectation that
the kernel would skip using this route due to the link being down.

v2: Split kernel changes into 2 patches; this one actually makes a
behavioral change if the sysctl is set.  Also took the suggestion from Alex
to simplify the code by only checking the sysctl during FIB lookup and the
suggestion from Scott to add a per-interface sysctl.

v3: Code clean-ups to make it more readable and efficient as well as a
reverse path check fix.

v4: Drop binary sysctl

v5: Whitespace fixups from Dave

Signed-off-by: Andy Gospodarek go...@cumulusnetworks.com
Signed-off-by: Dinesh Dutt dd...@cumulusnetworks.com
---
 include/linux/inetdevice.h|  3 +++
 include/net/fib_rules.h   |  3 ++-
 include/net/ip_fib.h  | 16 +---
 include/uapi/linux/ip.h   |  1 +
 net/ipv4/devinet.c|  2 ++
 net/ipv4/fib_frontend.c   |  6 +++---
 net/ipv4/fib_rules.c  |  5 +++--
 net/ipv4/fib_semantics.c  | 31 ++-
 net/ipv4/fib_trie.c   |  7 +++
 net/ipv4/netfilter/ipt_rpfilter.c |  2 +-
 net/ipv4/route.c  | 10 +-
 11 files changed, 62 insertions(+), 24 deletions(-)

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index 0a21fbe..a4328ce 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -120,6 +120,9 @@ static inline void ipv4_devconf_setall(struct in_device *in_dev)
 || (!IN_DEV_FORWARD(in_dev) && \
  IN_DEV_ORCONF((in_dev), ACCEPT_REDIRECTS)))
 
+#define IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) \
+   IN_DEV_CONF_GET((in_dev), IGNORE_ROUTES_WITH_LINKDOWN)
+
 #define IN_DEV_ARPFILTER(in_dev)   IN_DEV_ORCONF((in_dev), ARPFILTER)
 #define IN_DEV_ARP_ACCEPT(in_dev)  IN_DEV_ORCONF((in_dev), ARP_ACCEPT)
 #define IN_DEV_ARP_ANNOUNCE(in_dev) IN_DEV_MAXCONF((in_dev), ARP_ANNOUNCE)
diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 6d67383..903a55e 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -36,7 +36,8 @@ struct fib_lookup_arg {
void            *result;
struct fib_rule *rule;
int             flags;
-#define FIB_LOOKUP_NOREF   1
+#define FIB_LOOKUP_NOREF            1
+#define FIB_LOOKUP_IGNORE_LINKSTATE 2
 };
 
 struct 

[PATCH net-next 0/3 v5] changes to make ipv4 routing table aware of next-hop link status

2015-06-18 Thread Andy Gospodarek
This series adds the ability to have the Linux kernel track whether or
not a particular route should be used based on the link-status of the
interface associated with the next-hop.

Before this patch any link-failure on an interface that was serving as a
gateway for some systems could result in those systems being isolated
from the rest of the network as the stack would continue to attempt to
send frames out of an interface that is actually linked-down.  When the
kernel is responsible for all forwarding, it should also be responsible
for taking action when the traffic can no longer be forwarded -- there
is no real need to outsource link-monitoring to userspace anymore.

This feature is only enabled with the new per-interface or ipv4 global
sysctls called 'ignore_routes_with_linkdown'.

net.ipv4.conf.all.ignore_routes_with_linkdown = 0
net.ipv4.conf.default.ignore_routes_with_linkdown = 0
net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
...

When the above sysctls are set, the kernel will not only report to
userspace that the link is down, but it will also report to userspace
that a route is dead.  This will signal to userspace that the route will
not be selected.

With the new sysctls set, the following behavior can be observed
(interface p8p1 is link-down):

# ip route show 
default via 10.0.5.2 dev p9p1 
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15 
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1 
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 dead linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2 
# ip route get 90.0.0.1 
90.0.0.1 via 70.0.0.2 dev p7p1  src 70.0.0.1 
cache 
# ip route get 80.0.0.1 
local 80.0.0.1 dev lo  src 80.0.0.1 
cache local 
# ip route get 80.0.0.2
80.0.0.2 via 10.0.5.2 dev p9p1  src 10.0.5.15 
cache 

While the route does remain in the table (so it can be modified if
needed rather than being wiped away as it would be if IFF_UP was
cleared), the proper next-hop is chosen automatically when the link is
down.  Now, with interface p8p1 linked up:

# ip route show 
default via 10.0.5.2 dev p9p1 
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15 
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1 
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2 
192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2 
# ip route get 90.0.0.1 
90.0.0.1 via 80.0.0.2 dev p8p1  src 80.0.0.1 
cache 
# ip route get 80.0.0.1 
local 80.0.0.1 dev lo  src 80.0.0.1 
cache local 
# ip route get 80.0.0.2
80.0.0.2 dev p8p1  src 80.0.0.1 
cache 

and the output changes to what one would expect.

If the global or interface sysctl is not set, the following output would be
expected when p8p1 is down:

# ip route show 
default via 10.0.5.2 dev p9p1 
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15 
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1 
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2 

If the dead flag does not appear, there should be no expectation that the
kernel would skip using this route due to the link being down.
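A hedged sketch for scripts that only have the text output of `ip route show`: `route_has_flag()` (a hypothetical helper) checks whether a whole token such as `dead` or `linkdown` appears on a route line. Consumers that talk netlink directly would instead test the RTNH_F_DEAD bit in the route or nexthop flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if 'flag' appears as a whole whitespace-separated token in
 * one line of `ip route show` output. */
static bool route_has_flag(const char *line, const char *flag)
{
	size_t flen = strlen(flag);

	assert(line != NULL && flen > 0);
	while (*line) {
		size_t tlen;

		line += strspn(line, " \t");		/* skip separators */
		tlen = strcspn(line, " \t\n");		/* current token length */
		if (tlen == flen && strncmp(line, flag, flen) == 0)
			return true;
		line += tlen;
		if (tlen == 0 && *line)			/* e.g. a stray '\n' */
			line++;
	}
	return false;
}
```

Matching whole tokens avoids false positives from substrings, so `linkdown` alone never counts as `dead`.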

v2: Split kernel changes into 2 patches: first to add linkdown flag and
second to add new sysctl settings.  Also took suggestion from Alex to
simplify code by only checking sysctl during fib lookup and suggestion
from Scott to add a per-interface sysctl.  Added iproute2 patch to
recognize and print linkdown flag.

v3: Code cleanups along with reverse-path checks suggested by Alex and
small fixes related to problems found when multipath was disabled.

v4: Drop binary sysctls

v5: Whitespace and variable declaration fixups suggested by Dave

When this was discussed in Ottawa earlier this year, some preferred not
to have a configuration option and to make this behavior the default,
since it was time to do this.  I nevertheless wanted to propose the
config option to preserve the current behavior for those who desire
it.  I'll happily remove it if Dave and Linus approve.

An IPv6 implementation is also needed (DECnet too!), but I wanted to start with
the IPv4 implementation to get people comfortable with the idea before moving
forward.  If this is accepted the IPv6 implementation can be posted shortly.

There was also a request for switchdev support for this, but that will be
posted as a followup as switchdev does not currently handle dead
next-hops in a multi-path case and I felt that infra needed to be added
first.

FWIW, we have been running the original version of this series with a
global sysctl and our customers have been happily using a backported
version for IPv4 and IPv6 for 6 months.

Andy Gospodarek (3):
  net: track link-status of ipv4 nexthops
  net: ipv4 sysctl option to ignore 

[PATCH net-next 3/3 v5] iproute2: add support to print 'linkdown' nexthop flag

2015-06-18 Thread Andy Gospodarek
Signed-off-by: Andy Gospodaerk go...@cumulusnetworks.com
Signed-off-by: Dinesh Dutt dd...@cumulusnetworks.com

---
 ip/iproute.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/ip/iproute.c b/ip/iproute.c
index 3795baf..3369c49 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -451,6 +451,8 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
fprintf(fp, "offload ");
if (r->rtm_flags & RTM_F_NOTIFY)
fprintf(fp, "notify ");
+   if (r->rtm_flags & RTNH_F_LINKDOWN)
+   fprintf(fp, "linkdown ");
if (tb[RTA_MARK]) {
unsigned int mark = *(unsigned int*)RTA_DATA(tb[RTA_MARK]);
if (mark) {
@@ -670,6 +672,8 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
fprintf(fp, " onlink");
if (nh->rtnh_flags & RTNH_F_PERVASIVE)
fprintf(fp, " pervasive");
+   if (nh->rtnh_flags & RTNH_F_LINKDOWN)
+   fprintf(fp, " linkdown");
len -= NLMSG_ALIGN(nh->rtnh_len);
nh = RTNH_NEXT(nh);
}
-- 
1.9.3
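A self-contained sketch of the same flag-to-text logic as the nexthop hunk above, table-driven instead of one `if` per flag. The `NHF_*` constants are local stand-ins whose values are assumed to match the `RTNH_F_*` bits in <linux/rtnetlink.h>; the table order and `format_nh_flags()` are illustrative, not iproute2 code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Local stand-ins, assumed to mirror RTNH_F_* in <linux/rtnetlink.h>. */
enum {
	NHF_DEAD      = 1,
	NHF_PERVASIVE = 2,
	NHF_ONLINK    = 4,
	NHF_OFFLOAD   = 8,
	NHF_LINKDOWN  = 16,
};

static const struct {
	unsigned int bit;
	const char *name;
} nh_flag_names[] = {
	{ NHF_DEAD, "dead" },
	{ NHF_ONLINK, "onlink" },
	{ NHF_PERVASIVE, "pervasive" },
	{ NHF_OFFLOAD, "offload" },
	{ NHF_LINKDOWN, "linkdown" },
};

/* Append " name" for each set flag, in table order, mirroring the
 * fprintf(fp, " linkdown") style of the patch.  Returns the length. */
static size_t format_nh_flags(unsigned int flags, char *buf, size_t len)
{
	size_t used = 0, i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(nh_flag_names) / sizeof(nh_flag_names[0]); i++) {
		int n;

		if (!(flags & nh_flag_names[i].bit))
			continue;
		n = snprintf(buf + used, len - used, " %s", nh_flag_names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;		/* out of room: stop appending */
		used += (size_t)n;
	}
	return strlen(buf);
}
```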



Re: [PATCH v2 0/4] net/macb: add sama5d2 support

2015-06-18 Thread Alexandre Belloni
On 18/06/2015 at 16:27:19 +0200, Nicolas Ferre wrote :
 Hi,
 
 This series is basically the support for another flavor of the GEM IP
 configuration. It ended up being a series because of some little fixes made to
 the binding documentation before adding the new compatibility string.
 
 Bye,
 
 v2: - fix bindings
 - add sama5d2 compatibility string to the binding documentation
 
 Cyrille Pitchen (1):
   net/macb: add config for Atmel sama5d2 SoCs
 
 Nicolas Ferre (3):
   net/macb: bindings doc: fix compatibility string
   net/macb: bindings doc/trivial: fix sama5d4 comment
   net/macb: bindings doc: add sama5d2 compatibility string
 
  Documentation/devicetree/bindings/net/macb.txt | 5 +++--
  drivers/net/ethernet/cadence/macb.c| 8 
  2 files changed, 11 insertions(+), 2 deletions(-)
 

For the patch set:
Acked-by: Alexandre Belloni alexandre.bell...@free-electrons.com

-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com


Re: [RFC V3] net: don't wait for order-3 page allocation

2015-06-18 Thread Vlastimil Babka

On 06/18/2015 04:43 PM, Michal Hocko wrote:

On Thu 18-06-15 07:35:53, Eric Dumazet wrote:

On Thu, Jun 18, 2015 at 7:30 AM, Michal Hocko mho...@suse.cz wrote:


Abusing __GFP_NO_KSWAPD is a wrong way to go IMHO. It is true that the
_current_ implementation of the allocator has this nasty and very subtle
side effect but that doesn't mean it should be abused outside of the mm
proper. Why shouldn't this path wake the kswapd and let it compact
memory on the background to increase the success rate for the later
high order allocations?


I kind of agree.

If kswapd is a problem (is it ???) we should fix it, instead of adding
yet another flag to some random locations attempting
memory allocations.


No, kswapd is not a problem. The problem is ~__GFP_WAIT allocation can
access some portion of the memory reserves (see gfp_to_alloc_flags resp.
__zone_watermark_ok and ALLOC_HARDER). __GFP_NO_KSWAPD is just a dirty
hack to not give that access which was introduced for THP AFAIR.

The implicit access to memory reserves for non sleeping allocation has
been there for ages and it might be not suitable for this particular
path but that doesn't mean another gfp flag with a different side effect
should be hijacked. We should either stop doing that implicit access to
memory reserves and give __GFP_RESERVE or add the __GFP_NORESERVE. But
that is a problem to be solved in the mm proper. Spreading subtle
dependencies outside of mm will just make situation worse.


So you are not proposing to use these __GFP_RESERVE/NORESERVE flags
outside of mm, right? (Besides, we distinguish several kinds of
reserves, so what exactly would the flag do?) That would also be a
subtle dependency. The general problem, I think, is that we want
mm users to specify higher-level intentions (such as GFP_KERNEL)
which map to specific directions (__GFP_*) for the allocator, and
currently it's rather a mess of both kinds of flags. Clearly the
intention here is an opportunistic allocation that should not
reclaim/compact, use reserves, or wake up kswapd (?), because it's better
to fall back to smaller pages than to wait, and we don't seem to have a
GFP_OPPORTUNISTIC flag for that. The allocation then has to mask out
__GFP_WAIT, which, however, makes it look like an atomic allocation to the
allocator and gives it access to reserves, etc...



Re: [PATCH 12/22] fjes: net_device_ops.ndo_get_stats64

2015-06-18 Thread Sergei Shtylyov

Hello.

On 6/18/2015 3:49 AM, Taku Izumi wrote:


This patch adds net_device_ops.ndo_get_stats64 callback.



Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com
---
  drivers/platform/x86/fjes/fjes_main.c | 14 ++
  1 file changed, 14 insertions(+)



diff --git a/drivers/platform/x86/fjes/fjes_main.c b/drivers/platform/x86/fjes/fjes_main.c
index 97bf487..eeda824 100644
--- a/drivers/platform/x86/fjes/fjes_main.c
+++ b/drivers/platform/x86/fjes/fjes_main.c
@@ -57,6 +57,8 @@ static netdev_tx_t fjes_xmit_frame(struct sk_buff *,
  static void fjes_raise_intr_rxdata_task(struct work_struct *);
  static void fjes_tx_stall_task(struct work_struct *);
  static irqreturn_t fjes_intr(int, void*);
+static struct rtnl_link_stats64
+*fjes_get_stats64(struct net_device *, struct rtnl_link_stats64 *);


   I'd leave the * on the first line; otherwise it looks quite ugly.

[...]

@@ -734,6 +737,17 @@ static netdev_tx_t fjes_xmit_frame(struct sk_buff *skb,
return ret;
  }

+static struct rtnl_link_stats64
+*fjes_get_stats64(struct net_device *netdev,


   Same here.

[...]

WBR, Sergei



Re: [RFC PATCH net-next v2 2/5] net: add phys ID compare helper to test if two IDs are the same

2015-06-18 Thread Sergei Shtylyov

Hello.

On 6/18/2015 12:53 AM, sfel...@gmail.com wrote:


From: Scott Feldman sfel...@gmail.com



Signed-off-by: Scott Feldman sfel...@gmail.com
---
  include/linux/netdevice.h |7 +++
  net/switchdev/switchdev.c |8 ++--
  2 files changed, 9 insertions(+), 6 deletions(-)



diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7be616e1..63090ce 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -766,6 +766,13 @@ struct netdev_phys_item_id {
unsigned char id_len;
  };

+static inline bool netdev_phys_item_id_same(struct netdev_phys_item_id *a,
+   struct netdev_phys_item_id *b)
+{
+   return ((a->id_len == b->id_len) &&
+   (memcmp(a->id, b->id, a->id_len) == 0));


   Parens around the *return* expression not needed (and neither the ones 
around ==).
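A userspace rendering of the helper with the parentheses dropped, as suggested. `struct phys_item_id` is a stand-in for struct netdev_phys_item_id, and the value 32 for MAX_PHYS_ITEM_ID_LEN is assumed to mirror the kernel header. Since `&&` short-circuits, memcmp() only runs when the lengths already match.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_PHYS_ITEM_ID_LEN 32	/* assumed to mirror <linux/netdevice.h> */

struct phys_item_id {		/* stand-in for struct netdev_phys_item_id */
	unsigned char id[MAX_PHYS_ITEM_ID_LEN];
	unsigned char id_len;
};

/* '==' binds tighter than '&&', so no extra parentheses are needed. */
static inline bool phys_item_id_same(const struct phys_item_id *a,
				     const struct phys_item_id *b)
{
	assert(a != NULL && b != NULL);
	return a->id_len == b->id_len &&
	       memcmp(a->id, b->id, a->id_len) == 0;
}
```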


[...]

WBR, Sergei



Re: [RFC V3] net: don't wait for order-3 page allocation

2015-06-18 Thread Michal Hocko
On Wed 17-06-15 16:02:59, David Rientjes wrote:
 On Fri, 12 Jun 2015, Vlastimil Babka wrote:
 
   diff --git a/net/core/skbuff.c b/net/core/skbuff.c
   index 3cfff2a..41ec022 100644
   --- a/net/core/skbuff.c
   +++ b/net/core/skbuff.c
   @@ -4398,7 +4398,7 @@ struct sk_buff *alloc_skb_with_frags(unsigned long header_len,
   
 while (order) {
 if (npages >= 1 << order) {
   - page = alloc_pages(gfp_mask |
   + page = alloc_pages((gfp_mask & ~__GFP_WAIT) |
__GFP_COMP |
__GFP_NOWARN |
__GFP_NORETRY,
  
  Note that __GFP_NORETRY is weaker than ~__GFP_WAIT and thus redundant. But it
  won't hurt anything leaving it there. And you might consider __GFP_NO_KSWAPD
  instead, as I said in the other thread.
  
 
 Yeah, I agreed with __GFP_NO_KSWAPD to avoid utilizing memory reserves for this.

Abusing __GFP_NO_KSWAPD is a wrong way to go IMHO. It is true that the
_current_ implementation of the allocator has this nasty and very subtle
side effect but that doesn't mean it should be abused outside of the mm
proper. Why shouldn't this path wake the kswapd and let it compact
memory in the background to increase the success rate for the later
high order allocations?
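For readers following the skbuff.c hunk quoted above: the loop tries the largest page order that still fits `npages` and falls back to smaller orders, which is why a cheap, non-waiting high-order attempt is acceptable there. A simplified sketch of that fallback shape, with a pluggable, hypothetical allocator hook (`try_alloc_fn`, `demo_try_alloc()`); it steps down one order at a time, which may differ from the real alloc_skb_with_frags().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical allocator hook: nonzero means an 'order' allocation worked. */
typedef int (*try_alloc_fn)(unsigned int order);

/* Start at max_order, skip orders larger than npages, and step down on
 * failure; order 0 is the last resort.  Returns the order obtained, or
 * -1 if even order 0 failed. */
static int alloc_with_fallback(size_t npages, unsigned int max_order,
			       try_alloc_fn try_alloc)
{
	unsigned int order = max_order;

	assert(npages > 0 && try_alloc != NULL);
	while (order) {
		if (npages >= (size_t)1 << order && try_alloc(order))
			return (int)order;
		order--;
	}
	return try_alloc(0) ? 0 : -1;
}

/* Demo allocator: only orders <= demo_max_ok can be satisfied. */
static unsigned int demo_max_ok;

static int demo_try_alloc(unsigned int order)
{
	return order <= demo_max_ok;
}
```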

-- 
Michal Hocko
SUSE Labs


Re: [RFC V3] net: don't wait for order-3 page allocation

2015-06-18 Thread Michal Hocko
On Thu 18-06-15 07:35:53, Eric Dumazet wrote:
 On Thu, Jun 18, 2015 at 7:30 AM, Michal Hocko mho...@suse.cz wrote:
 
  Abusing __GFP_NO_KSWAPD is a wrong way to go IMHO. It is true that the
  _current_ implementation of the allocator has this nasty and very subtle
  side effect but that doesn't mean it should be abused outside of the mm
  proper. Why shouldn't this path wake the kswapd and let it compact
  memory on the background to increase the success rate for the later
  high order allocations?
 
 I kind of agree.
 
 If kswapd is a problem (is it ???) we should fix it, instead of adding
 yet another flag to some random locations attempting
 memory allocations.

No, kswapd is not a problem. The problem is ~__GFP_WAIT allocation can
access some portion of the memory reserves (see gfp_to_alloc_flags resp.
__zone_watermark_ok and ALLOC_HARDER). __GFP_NO_KSWAPD is just a dirty
hack to not give that access which was introduced for THP AFAIR.

The implicit access to memory reserves for non-sleeping allocations has
been there for ages, and it might not be suitable for this particular
path, but that doesn't mean another gfp flag with a different side effect
should be hijacked. We should either stop doing that implicit access to
memory reserves and give __GFP_RESERVE or add the __GFP_NORESERVE. But
that is a problem to be solved in the mm proper. Spreading subtle
dependencies outside of mm will just make the situation worse.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH next v2] bonding: Display LACP info only to CAP_NET_ADMIN capable user

2015-06-18 Thread Andy Gospodarek
On Thu, Jun 18, 2015 at 04:17:36AM -0700, Eric Dumazet wrote:
 On Wed, 2015-06-17 at 17:59 -0700, Mahesh Bandewar wrote:
  Actor and Partner details can be accessed via proc-fs, sys-fs
  entries or netlink interface. These interfaces are world readable
  at this moment. The earlier patch-series made the LACP communication
  secure to avoid nuisance attack from within the same L2 domain but
  it did not prevent someone unprivileged looking at that information
  on host and perform the same act.
  
  This patch essentially avoids spitting those entries if the user
  in question does not have enough privileges.
  
  Signed-off-by: Mahesh Bandewar mahe...@google.com
  ---
   drivers/net/bonding/bond_netlink.c |  11 ++--
    drivers/net/bonding/bond_procfs.c  | 101 +++--
   drivers/net/bonding/bond_sysfs.c   |  12 ++---
   3 files changed, 67 insertions(+), 57 deletions(-)
  
  diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
  index 5580fcde738f..3fd3aa4b145e 100644
  --- a/drivers/net/bonding/bond_netlink.c
  +++ b/drivers/net/bonding/bond_netlink.c
  @@ -600,18 +600,23 @@ static int bond_fill_info(struct sk_buff *skb,
   
  if (BOND_MODE(bond) == BOND_MODE_8023AD) {
  struct ad_info info;
  +   u8 zero_mac[ETH_ALEN];
   
  +   eth_zero_addr(zero_mac);
  if (nla_put_u16(skb, IFLA_BOND_AD_ACTOR_SYS_PRIO,
  -   bond->params.ad_actor_sys_prio))
  +   capable(CAP_NET_ADMIN) ?
  +   bond->params.ad_actor_sys_prio : 0))
  goto nla_put_failure;
   
  if (nla_put_u16(skb, IFLA_BOND_AD_USER_PORT_KEY,
  -   bond->params.ad_user_port_key))
  +   capable(CAP_NET_ADMIN) ?
  +   bond->params.ad_user_port_key : 0))
  goto nla_put_failure;
   
  if (nla_put(skb, IFLA_BOND_AD_ACTOR_SYSTEM,
  sizeof(bond->params.ad_actor_system),
  -   bond->params.ad_actor_system))
  +   capable(CAP_NET_ADMIN) ?
  +   bond->params.ad_actor_system : zero_mac))
  goto nla_put_failure;
   
 
 Hmm... I would rather not send these fake attributes at all ?

That would be my preference as well.  Sorry if my lack of elaboration on
my earlier email made this confusing.

If there are values that should not be visible to non-root users, then
don't send them at all.  Do not just send NULL values.
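A sketch of the "don't send them at all" approach, using a hypothetical TLV buffer in place of the kernel's nla_put*() helpers: unprivileged readers get no LACP attributes rather than zero-filled ones. In the driver the `privileged` flag would come from capable(CAP_NET_ADMIN); everything else here (`struct tlv_buf`, `fill_ad_info()`, the attribute numbers) is illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal hypothetical TLV buffer standing in for an skb being filled
 * with netlink attributes; not the kernel's nla_* API. */
struct tlv_buf {
	unsigned char data[64];
	size_t len;
	int count;		/* number of attributes emitted */
};

static int tlv_put_u16(struct tlv_buf *b, uint16_t type, uint16_t val)
{
	if (b->len + 2 * sizeof(uint16_t) > sizeof(b->data))
		return -1;
	memcpy(b->data + b->len, &type, sizeof(type));
	memcpy(b->data + b->len + sizeof(type), &val, sizeof(val));
	b->len += 2 * sizeof(uint16_t);
	b->count++;
	return 0;
}

enum { ATTR_ACTOR_SYS_PRIO = 1, ATTR_USER_PORT_KEY = 2 };	/* hypothetical */

/* Emit the LACP details only for privileged readers; unprivileged readers
 * simply do not get the attributes, rather than getting zeroed values. */
static int fill_ad_info(struct tlv_buf *b, bool privileged,
			uint16_t sys_prio, uint16_t port_key)
{
	assert(b != NULL);
	if (!privileged)
		return 0;
	if (tlv_put_u16(b, ATTR_ACTOR_SYS_PRIO, sys_prio) ||
	    tlv_put_u16(b, ATTR_USER_PORT_KEY, port_key))
		return -1;
	return 0;
}
```

Omitting the attributes lets userspace distinguish "not visible to you" from a genuine zero value.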



Re: [RFC V3] net: don't wait for order-3 page allocation

2015-06-18 Thread Michal Hocko
On Thu 18-06-15 17:22:40, Vlastimil Babka wrote:
 On 06/18/2015 04:43 PM, Michal Hocko wrote:
 On Thu 18-06-15 07:35:53, Eric Dumazet wrote:
 On Thu, Jun 18, 2015 at 7:30 AM, Michal Hocko mho...@suse.cz wrote:
 
 Abusing __GFP_NO_KSWAPD is a wrong way to go IMHO. It is true that the
 _current_ implementation of the allocator has this nasty and very subtle
 side effect but that doesn't mean it should be abused outside of the mm
 proper. Why shouldn't this path wake the kswapd and let it compact
 memory on the background to increase the success rate for the later
 high order allocations?
 
 I kind of agree.
 
 If kswapd is a problem (is it ???) we should fix it, instead of adding
 yet another flag to some random locations attempting
 memory allocations.
 
 No, kswapd is not a problem. The problem is ~__GFP_WAIT allocation can
 access some portion of the memory reserves (see gfp_to_alloc_flags resp.
 __zone_watermark_ok and ALLOC_HARDER). __GFP_NO_KSWAPD is just a dirty
 hack to not give that access which was introduced for THP AFAIR.
 
 The implicit access to memory reserves for non sleeping allocation has
 been there for ages and it might be not suitable for this particular
 path but that doesn't mean another gfp flag with a different side effect
 should be hijacked. We should either stop doing that implicit access to
 memory reserves and give __GFP_RESERVE or add the __GFP_NORESERVE. But
 that is a problem to be solved in the mm proper. Spreading subtle
 dependencies outside of mm will just make situation worse.
 
 So you are not proposing to use these __GFP_RESERVE/NORESERVE flag outside
 of mm, right? (besides, we distinguish several kinds of reserves, so what
 exactly would the flag do?)

That is to be discussed. Most allocations already express their interest
in memory reserves by __GFP_HIGH directly or by GFP_ATOMIC indirectly.
So maybe we do not need any additional flag here. There are not that
many ~__GFP_WAIT allocations, and most of them seem to require it _only_
because the context doesn't allow for sleeping (e.g. to prevent deadlocks).

 As that would be also subtle dependency. The
 general problem I think is that we should want the mm users to specify
 higher-level intentions (such as GFP_KERNEL) which would map to specific
 directions (__GFP_*) for the allocator, and currently it's rather a mess of
 both kinds of flags.

I agree. So I think that maybe we should drop that implicit access to
memory reserves for ~__GFP_WAIT allocations and let it do what it is
documented to do.

 Clearly the intention here is opportunistic allocation that should
 not reclaim/compact, use reserves, wake up kswapd (?) because it's
 better to fall back to smaller pages than wait) and we don't seem to
 have a GFP_OPPORTUNISTIC flag for that. The allocation has to then
 mask out __GFP_WAIT which however looks like an atomic allocation to
 the allocator and give access to reserves, etc...

I think simply dropping __GFP_WAIT is a good way to express that. The
fact that the current implementation gives access to memory reserves
implicitly is just a detail and the user of the allocator shouldn't care
about that.
-- 
Michal Hocko
SUSE Labs


[PATCH net v2] tcp: Do not call tcp_fastopen_reset_cipher from interrupt context

2015-06-18 Thread Christoph Paasch
tcp_fastopen_reset_cipher really cannot be called from interrupt
context. It allocates the tcp_fastopen_context with GFP_KERNEL and
calls crypto_alloc_cipher, which allocates all kinds of stuff with
GFP_KERNEL.

Thus, we might sleep when the key generation is triggered by an
incoming TFO cookie request, which would then happen in interrupt
context, as shown by enabling CONFIG_DEBUG_ATOMIC_SLEEP:

[   36.001813] BUG: sleeping function called from invalid context at mm/slub.c:1266
[   36.003624] in_atomic(): 1, irqs_disabled(): 0, pid: 1016, name: packetdrill
[   36.004859] CPU: 1 PID: 1016 Comm: packetdrill Not tainted 4.1.0-rc7 #14
[   36.006085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[   36.008250]  04f2 88007f8838a8 8171d53a 880075a084a8
[   36.009630]  880075a08000 88007f8838c8 810967d3 88007f883928
[   36.011076]   88007f8838f8 81096892 88007f89be00
[   36.012494] Call Trace:
[   36.012953]  IRQ  [8171d53a] dump_stack+0x4f/0x6d
[   36.014085]  [810967d3] ___might_sleep+0x103/0x170
[   36.015117]  [81096892] __might_sleep+0x52/0x90
[   36.016117]  [8118e887] kmem_cache_alloc_trace+0x47/0x190
[   36.017266]  [81680d82] ? tcp_fastopen_reset_cipher+0x42/0x130
[   36.018485]  [81680d82] tcp_fastopen_reset_cipher+0x42/0x130
[   36.019679]  [81680f01] tcp_fastopen_init_key_once+0x61/0x70
[   36.020884]  [81680f2c] __tcp_fastopen_cookie_gen+0x1c/0x60
[   36.022058]  [816814ff] tcp_try_fastopen+0x58f/0x730
[   36.023118]  [81671788] tcp_conn_request+0x3e8/0x7b0
[   36.024185]  [810e3872] ? __module_text_address+0x12/0x60
[   36.025327]  [8167b2e1] tcp_v4_conn_request+0x51/0x60
[   36.026410]  [816727e0] tcp_rcv_state_process+0x190/0xda0
[   36.027556]  [81661f97] ? __inet_lookup_established+0x47/0x170
[   36.028784]  [8167c2ad] tcp_v4_do_rcv+0x16d/0x3d0
[   36.029832]  [812e6806] ? security_sock_rcv_skb+0x16/0x20
[   36.030936]  [8167cc8a] tcp_v4_rcv+0x77a/0x7b0
[   36.031875]  [816af8c3] ? iptable_filter_hook+0x33/0x70
[   36.032953]  [81657d22] ip_local_deliver_finish+0x92/0x1f0
[   36.034065]  [81657f1a] ip_local_deliver+0x9a/0xb0
[   36.035069]  [81657c90] ? ip_rcv+0x3d0/0x3d0
[   36.035963]  [81657569] ip_rcv_finish+0x119/0x330
[   36.036950]  [81657ba7] ip_rcv+0x2e7/0x3d0
[   36.037847]  [81610652] __netif_receive_skb_core+0x552/0x930
[   36.038994]  [81610a57] __netif_receive_skb+0x27/0x70
[   36.040033]  [81610b72] process_backlog+0xd2/0x1f0
[   36.041025]  [81611482] net_rx_action+0x122/0x310
[   36.042007]  [81076743] __do_softirq+0x103/0x2f0
[   36.042978]  [81723e3c] do_softirq_own_stack+0x1c/0x30

This patch moves the call to tcp_fastopen_init_key_once to the places
where a listener socket creates its TFO state, which always happens in
user context (either from the setsockopt, or implicitly during the
listen() call).
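The ordering the patch establishes can be sketched as a userspace model: key setup runs only from paths that may sleep (listen()/setsockopt()), and the per-packet path only uses an existing key and never creates one. All names here are hypothetical stand-ins, and the deterministic byte pattern replaces real random key material.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_KEY_LEN 16

static bool model_key_ready;
static unsigned char model_key[MODEL_KEY_LEN];

/* Called only from user context (listen() or setsockopt(TCP_FASTOPEN)),
 * where the real code may sleep while allocating. Safe to call twice. */
static void model_init_key_once(void)
{
    if (model_key_ready)
        return;
    for (size_t i = 0; i < MODEL_KEY_LEN; i++)
        model_key[i] = (unsigned char)(i * 37u + 11u); /* stand-in for random bytes */
    model_key_ready = true;
}

/* Called while handling an incoming SYN (softirq in the kernel). It never
 * initializes anything; without a key it just reports failure. */
static bool model_get_cookie_key(unsigned char out[MODEL_KEY_LEN])
{
    if (!model_key_ready)
        return false;
    memcpy(out, model_key, MODEL_KEY_LEN);
    return true;
}
```

The design point is that the fast path degrades gracefully (no cookie) instead of doing work that may sleep.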

Cc: Eric Dumazet eric.duma...@gmail.com
Cc: Hannes Frederic Sowa han...@stressinduktion.org
Fixes: 222e83d2e0ae ("tcp: switch tcp_fastopen key generation to net_get_random_once")
Signed-off-by: Christoph Paasch cpaa...@apple.com
---

Notes:
v2: Instead of reverting Hannes' patch, move the call to
tcp_fastopen_init_key_once to the places where we enable TFO on the
server side from user context.

 net/ipv4/af_inet.c      | 2 ++
 net/ipv4/tcp.c          | 7 +++++--
 net/ipv4/tcp_fastopen.c | 2 --
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 8b47a4d79d04..a5aa54ea6533 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -228,6 +228,8 @@ int inet_listen(struct socket *sock, int backlog)
err = 0;
if (err)
goto out;
+
+   tcp_fastopen_init_key_once(true);
}
err = inet_csk_listen_start(sk, backlog);
if (err)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f1377f2a0472..bb2ce74f6004 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2545,10 +2545,13 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 
case TCP_FASTOPEN:
if (val >= 0 && ((1 << sk->sk_state) & (TCPF_CLOSE |
-   TCPF_LISTEN)))
+   TCPF_LISTEN))) {
+   tcp_fastopen_init_key_once(true);
+
err = fastopen_init_queue(sk, val);
-   else
+   } else {
err = -EINVAL;
+   }
break;
case TCP_TIMESTAMP:
if (!tp->repair)
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index 

[PATCH] mvneta: add forgotten initialization of autonegotiation bits

2015-06-18 Thread Stas Sergeev

Commit 898b2970e2c9 ("mvneta: implement SGMII-based in-band link state
signaling") changed mvneta_adjust_link() so that it no longer clears the
auto-negotiation bits in the MVNETA_GMAC_AUTONEG_CONFIG register. This
was necessary for auto-negotiation mode to work.
Unfortunately, I didn't check whether these bits are ever initialized,
and it appears they are not.
This patch adds the missing initialization of the auto-negotiation bits
in the MVNETA_GMAC_AUTONEG_CONFIG register.
It fixes the following regression:
https://www.mail-archive.com/netdev@vger.kernel.org/msg67928.html
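The new else branch is a plain read-modify-write that clears only the three auto-negotiation enables and leaves every other bit of the register untouched. A userspace sketch of that pattern, with a variable standing in for the MMIO register and made-up bit positions (not the real MVNETA_GMAC_* values):

```c
#include <assert.h>
#include <stdint.h>

/* Made-up bit positions for illustration; not the real MVNETA_GMAC_* values. */
#define DEMO_INBAND_AN_ENABLE (1u << 0)
#define DEMO_AN_SPEED_EN      (1u << 7)
#define DEMO_AN_DUPLEX_EN     (1u << 13)
#define DEMO_UNRELATED_BIT    (1u << 4) /* must survive the update */

static uint32_t demo_autoneg_reg; /* stands in for the MMIO register */

static uint32_t demo_read(void) { return demo_autoneg_reg; }
static void demo_write(uint32_t val) { demo_autoneg_reg = val; }

/* Read-modify-write: clear only the auto-negotiation enables, as opposed
 * to assigning the complement, which would set every other bit. */
static void demo_clear_autoneg(void)
{
    uint32_t val = demo_read();

    val &= ~(DEMO_INBAND_AN_ENABLE | DEMO_AN_SPEED_EN | DEMO_AN_DUPLEX_EN);
    demo_write(val);
}
```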

Since the patch was tested to fix a regression, it should be applied to
the stable tree.

Tested-by: Arnaud Ebalard a...@natisbad.org

CC: Thomas Petazzoni thomas.petazz...@free-electrons.com
CC: Florian Fainelli f.faine...@gmail.com
CC: netdev@vger.kernel.org
CC: linux-ker...@vger.kernel.org
CC: sta...@vger.kernel.org

Signed-off-by: Stas Sergeev s...@users.sourceforge.net
---
 drivers/net/ethernet/marvell/mvneta.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index ce5f7f9..74176ec 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1013,6 +1013,12 @@ static void mvneta_defaults_set(struct mvneta_port *pp)
val = mvreg_read(pp, MVNETA_GMAC_CLOCK_DIVIDER);
val |= MVNETA_GMAC_1MS_CLOCK_ENABLE;
mvreg_write(pp, MVNETA_GMAC_CLOCK_DIVIDER, val);
+   } else {
+   val = mvreg_read(pp, MVNETA_GMAC_AUTONEG_CONFIG);
+   val &= ~(MVNETA_GMAC_INBAND_AN_ENABLE |
+  MVNETA_GMAC_AN_SPEED_EN |
+  MVNETA_GMAC_AN_DUPLEX_EN);
+   mvreg_write(pp, MVNETA_GMAC_AUTONEG_CONFIG, val);
}

mvneta_set_ucast_table(pp, -1);
-- 
1.9.1


Re: [PATCH net-next 3/3 v5] iproute2: add support to print 'linkdown' nexthop flag

2015-06-18 Thread Andy Gospodarek
On Thu, Jun 18, 2015 at 08:43:08AM -0700, Scott Feldman wrote:
 On Thu, Jun 18, 2015 at 8:22 AM, Andy Gospodarek
 go...@cumulusnetworks.com wrote:
  Signed-off-by: Andy Gospodaerk go...@cumulusnetworks.com
  Signed-off-by: Dinesh Dutt dd...@cumulusnetworks.com
 
  ---
   ip/iproute.c | 4 
   1 file changed, 4 insertions(+)
 
  diff --git a/ip/iproute.c b/ip/iproute.c
  index 3795baf..3369c49 100644
  --- a/ip/iproute.c
  +++ b/ip/iproute.c
  @@ -451,6 +451,8 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
  fprintf(fp, "offload ");
  if (r->rtm_flags & RTM_F_NOTIFY)
  fprintf(fp, "notify ");
  +   if (r->rtm_flags & RTNH_F_LINKDOWN)
  +   fprintf(fp, "linkdown ");
 
 
 iproute.c: In function ‘print_route’:
 iproute.c:454:21: error: ‘RTNH_F_LINKDOWN’ undeclared (first use in
 this function)
 iproute.c:454:21: note: each undeclared identifier is reported only
 once for each function it appears in

Yes, you need to pull that from the patches above into your iproute2
sources.  Stephen regularly tells people not to post uapi updates, so I
did not.



Re: [PATCH v2] bpf: fix a bug in verification logic when SUB operation taken on FRAME_PTR

2015-06-18 Thread Alexei Starovoitov
On Thu, Jun 18, 2015 at 08:31:45AM +, Wang Nan wrote:
 The original code has a problem that causes the following code to fail verification:
 
  r1 <- r10
  r1 -= 8
  r2 = 8
  r3 = unsafe pointer
  call BPF_FUNC_probe_read  <-- R1 type=inv expected=fp
 
 However, by replacing 'r1 -= 8' with 'r1 += -8', the above program can be
 loaded successfully.
 
 This is because the verifier allows only a BPF_ADD instruction on a
 FRAME_PTR register to forge a PTR_TO_STACK register, but turns BPF_SUB
 on a FRAME_PTR register into an UNKNOWN_VALUE register.
 
 This patch fixes it by adding BPF_SUB to the stack_relative check.

It's not a bug. It's catching ADD only by design.
If we let it recognize SUB then one might argue we should let it
recognize multiply, shifts and all other arithmetic on pointers.
The verifier will keep getting bigger and bigger. Where do we stop?
llvm only emits canonical ADD. If you've seen llvm doing SUB,
let's fix it there.
So what piece generated this 'r1 -= 8'?
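A toy model of the rule described above: when the destination starts out as a copy of the frame pointer, only the canonical add-immediate form yields a stack pointer with a known offset, so 'r1 += -8' is accepted while 'r1 -= 8' degrades to an unknown value. This is a simplified, hypothetical sketch, not the code in the verifier.

```c
#include <assert.h>
#include <stdint.h>

enum model_reg_type { MODEL_UNKNOWN_VALUE, MODEL_PTR_TO_STACK };
enum model_alu_op { MODEL_OP_ADD, MODEL_OP_SUB };

struct model_reg {
    enum model_reg_type type;
    int32_t off; /* offset from the frame pointer, meaningful for PTR_TO_STACK */
};

/* Result of "dst op= imm" where dst currently holds a copy of the frame pointer. */
static struct model_reg model_alu_on_frame_copy(enum model_alu_op op, int32_t imm)
{
    struct model_reg r = { MODEL_UNKNOWN_VALUE, 0 };

    /* Only the canonical add-immediate form is treated as stack-relative. */
    if (op == MODEL_OP_ADD) {
        r.type = MODEL_PTR_TO_STACK;
        r.off = imm;
    }
    return r;
}
```

Keeping the recognized pattern to one canonical form keeps the checker small; the cost is that equivalent spellings such as the SUB form must be normalized by the code generator instead.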



Re: [PATCH net-next 3/3 v5] iproute2: add support to print 'linkdown' nexthop flag

2015-06-18 Thread Scott Feldman
On Thu, Jun 18, 2015 at 8:57 AM, Andy Gospodarek
go...@cumulusnetworks.com wrote:
 On Thu, Jun 18, 2015 at 08:43:08AM -0700, Scott Feldman wrote:
 On Thu, Jun 18, 2015 at 8:22 AM, Andy Gospodarek
 go...@cumulusnetworks.com wrote:
  Signed-off-by: Andy Gospodaerk go...@cumulusnetworks.com
  Signed-off-by: Dinesh Dutt dd...@cumulusnetworks.com
 
  ---
   ip/iproute.c | 4 
   1 file changed, 4 insertions(+)
 
  diff --git a/ip/iproute.c b/ip/iproute.c
  index 3795baf..3369c49 100644
  --- a/ip/iproute.c
  +++ b/ip/iproute.c
  @@ -451,6 +451,8 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
  fprintf(fp, "offload ");
  if (r->rtm_flags & RTM_F_NOTIFY)
  fprintf(fp, "notify ");
  +   if (r->rtm_flags & RTNH_F_LINKDOWN)
  +   fprintf(fp, "linkdown ");


 iproute.c: In function ‘print_route’:
 iproute.c:454:21: error: ‘RTNH_F_LINKDOWN’ undeclared (first use in
 this function)
 iproute.c:454:21: note: each undeclared identifier is reported only
 once for each function it appears in

 Yes, you need to pull that from the patches above into your iproute2
 sources.  Stephen regularly tells people not to post uapi updates, so I
 did not.

Ok, thanks.


e1000e driver - hang after 4 hours of uptime - finally bisected!

2015-06-18 Thread Valdis Kletnieks
(Follow-up to a report from last week; bisecting took a while as I could
only do one or two tests an evening.)

My Dell Latitude E6530 crashes with a specific kernel lockup almost
exactly 4 hours after boot if there isn't a cable connected to the
Ethernet port:

[14508.846327] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
[14468.229720] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
[14463.254791] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
[14491.134413] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
[14463.396593] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
[14490.390223] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
[14494.680591] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
[14513.365378] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
[14482.271716] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
[14479.906820] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0

As far as I can tell, the timestamp jitter is just how long it takes me to
enter the cryptLUKS passphrase for the hard drive at boot...

lspci tells me:

lspci -vvv -s 00:19.0
00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04)
DeviceName:  Onboard LAN
Subsystem: Dell Device 0535
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 28
Region 0: Memory at f770 (32-bit, non-prefetchable) [size=128K]
Region 1: Memory at f7739000 (32-bit, non-prefetchable) [size=4K]
Region 2: I/O ports at f040 [size=32]
Capabilities: [c8] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: fee00318  Data: 
Capabilities: [e0] PCI Advanced Features
AFCap: TP+ FLR+
AFCtrl: FLR-
AFStatus: TP-
Kernel driver in use: e1000e


The traceback always looks like:

[14479.906820] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0

[14479.906908] Call Trace:
[14479.906914]  NMI  [ba94db16] dump_stack+0x50/0xa8
[14479.906930]  [ba948bb9] panic+0xcd/0x1e4
[14479.906940]  [ba166a60] ? perf_event_task_disable+0xc0/0xc0
[14479.906952]  [ba125d8b] watchdog_overflow_callback+0x9b/0xa0
[14479.906959]  [ba16a684] __perf_event_overflow+0xc4/0x1f0
[14479.906968]  [ba16b3a4] perf_event_overflow+0x14/0x20
[14479.906976]  [ba022271] intel_pmu_handle_irq+0x1e1/0x430
[14479.906990]  [ba01a0f6] perf_event_nmi_handler+0x26/0x40
[14479.906999]  [ba0085b3] nmi_handle+0x103/0x340
[14479.907005]  [ba0084b5] ? nmi_handle+0x5/0x340
[14479.907017]  [ba008a53] default_do_nmi+0xc3/0x120
[14479.907032]  [ba008b98] do_nmi+0xe8/0x130
[14479.907044]  [ba95c9a8] end_repeat_nmi+0x1e/0x2e
[14479.907055]  [ba529886] ? e1000e_cyclecounter_read+0x16/0xc0
[14479.907061]  [ba529886] ? e1000e_cyclecounter_read+0x16/0xc0
[14479.907069]  [ba529886] ? e1000e_cyclecounter_read+0x16/0xc0
[14479.907075]  EOE  [ba0e9529] timecounter_read+0x19/0x60
[14479.907088]  [ba53687e] e1000e_phc_gettime+0x2e/0x60
[14479.907098]  [ba536a31] e1000e_systim_overflow_work+0x31/0x70
[14479.907105]  [ba07ad19] process_one_work+0x3c9/0x980
[14479.907115]  [ba07ac62] ? process_one_work+0x312/0x980
[14479.907125]  [ba07b348] ? worker_thread+0x78/0x760
[14479.907134]  [ba07b59c] worker_thread+0x2cc/0x760
[14479.907144]  [ba07b2d0] ? process_one_work+0x980/0x980
[14479.907154]  [ba082a5e] kthread+0xfe/0x120
[14479.907163]  [ba08ca50] ? finish_task_switch+0x50/0x1c0
[14479.907173]  [ba082960] ? kthread_create_on_node+0x270/0x270
[14479.907179]  [ba95ae4f] ret_from_fork+0x3f/0x70
[14479.907188]  [ba082960] ? kthread_create_on_node+0x270/0x270
[14479.907243] Kernel Offset: 0x3900 from 0x8100 (relocation range: 0x8000-0xbfff)

Bisection tells me it's this commit:

commit 83129b37ef35bb6a7f01c060129736a8db5d31c4
Author: Yanir Lubetkin yanirx.lubet...@intel.com
Date:   Tue Jun 2 17:05:45 2015 +0300

e1000e: fix systim issues

Two issues involving systim were reported.
1. Clock is not running in the correct frequency
2. In some situations, systim values were not incremented linearly
This patch fixes the hardware clock configuration and the 

Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-18 Thread Steven Rostedt
On Thu, 18 Jun 2015 18:50:51 -0400
Jeff Layton jlay...@poochiereds.net wrote:
 
 The interesting bit here is that the sockets all seem to connect to port
 55201 on the remote host, if I'm reading these traces correctly. What's
 listening on that port on the server?
 
 This might give some helpful info:
 
 $ rpcinfo -p <NFS servername>

# rpcinfo -p wife
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  34243  status
    100024    1   tcp  34498  status

# rpcinfo -p localhost
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  38332  status
    100024    1   tcp  52684  status
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100227    2   tcp   2049
    100227    3   tcp   2049
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100227    2   udp   2049
    100227    3   udp   2049
    100021    1   udp  53218  nlockmgr
    100021    3   udp  53218  nlockmgr
    100021    4   udp  53218  nlockmgr
    100021    1   tcp  49825  nlockmgr
    100021    3   tcp  49825  nlockmgr
    100021    4   tcp  49825  nlockmgr
    100005    1   udp  49166  mountd
    100005    1   tcp  48797  mountd
    100005    2   udp  47856  mountd
    100005    2   tcp  53839  mountd
    100005    3   udp  36090  mountd
    100005    3   tcp  46390  mountd

Note, the box has been rebooted since I posted my last trace.

 
 Also, what NFS version are you using to mount here? Your fstab entries
 suggest that you're using the default version (for whatever distro this
 is), but have you (e.g.) set up nfsmount.conf to default to v3 on this
 box?
 

My box is Debian testing (recently updated).

# dpkg -l nfs-*

ii  nfs-common     1:1.2.8-9    amd64        NFS support files common to clien
ii  nfs-kernel-ser 1:1.2.8-9    amd64        support for NFS kernel server


same for both boxes.

nfsmount.conf doesn't exist on either box.

I'm assuming it is using nfs4.

Anything else I can provide?

-- Steve


Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-18 Thread Steven Rostedt
On Thu, 18 Jun 2015 21:37:02 -0400
Jeff Layton jlay...@poochiereds.net wrote:

  Note, the box has been rebooted since I posted my last trace.
  
 
 Ahh pity. The port has probably changed...if you trace it again maybe
 try to figure out what it's talking to before rebooting the server?

I could probably re-enable the trace again.

Would it be best if I put back the commits and ran it with the buggy
kernel? I could then run these commands after the bug happens and/or
before the port goes away.

 
 Oh! I was thinking that you were seeing this extra port on the
 _client_, but now rereading your original mail I see that it's
 appearing up on the NFS server. Is that correct?

Correct, the bug is on the NFS server, not the client. The client is
already up and running, and had the filesystem mounted when the server
rebooted. I take it that this happened when the client tried to
reconnect.

Just let me know what you would like to do. As this is my main
production server of my local network, I would only be able to do this
a few times. Let me know all the commands and tracing you would like to
have. I'll try it tomorrow (going to bed now).

-- Steve


 
 So, assuming that this is NFSv4.0, then this port is probably bound
 when the server is establishing the callback channel to the client. So
 we may need to look at how those xprts are being created and whether
 there are differences from a standard client xprt.
 



RE: [PATCH 00/22] FUJITSU Extended Socket network device driver

2015-06-18 Thread Izumi, Taku

 Thank you for reviewing.

 As Alex mentioned earlier, I suspect this is more appropriate for drivers/net.
 If David objects, we can consider drivers/platform/x86.

 OK, I'll migrate the code from drivers/platform/x86 to drivers/net and also
 incorporate comments. I'm going to resend it soon.

 Sincerely,
 Taku Izumi



Re: [Intel-wired-lan] [PATCH] fm10k: Report MAC address on driver load

2015-06-18 Thread Alexander Duyck

On 06/18/2015 04:49 PM, Jeff Kirsher wrote:

On Wed, 2015-06-17 at 20:12 -0700, Alexander Duyck wrote:

This change adds the MAC address to the list of values recorded on driver
load.  The MAC address represents the serial number of the unit and allows
us to track the value should a card be replaced in a system.

Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com
---
  drivers/net/ethernet/intel/fm10k/fm10k_pci.c |4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

With the recent fm10k patches that Jake submitted, this patch no longer
applies cleanly.  If you could re-spin your patch against my next-queue
tree (dev-queue branch) that would be much appreciated.


I should have a new patch for you in 20 minutes or so.  Just waiting on 
the build to finish and then I'll give it a quick test.


- Alex


[PATCH v2] fm10k: Report MAC address on driver load

2015-06-18 Thread Alexander Duyck
This change adds the MAC address to the list of values recorded on driver
load.  The MAC address represents the serial number of the unit and allows
us to track the value should a card be replaced in a system.

The log message should now be similar in output to that of ixgbe.
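For reference, the kernel's %pM printk extension prints the 6-byte address as colon-separated lowercase hex pairs. A userspace stand-in for checking what the new log line will look like (format_mac() is a hypothetical helper, not a kernel API):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace helper mimicking the "%pM" output format:
 * six bytes as lowercase hex pairs separated by colons (17 chars + NUL). */
static void format_mac(char out[18], const unsigned char mac[6])
{
    snprintf(out, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
             mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}
```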

Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com
---

v2: Moved printing of MAC onto separate line similar to ixgbe.

(Hopefully this works for you, Jeff.  I took a look at the patch and just
 moved the bit I needed down.  I figured since this block hasn't changed I
 should be able to get away with just doing this instead of pulling and
 rebasing off of your tree.)

 drivers/net/ethernet/intel/fm10k/fm10k_pci.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index ce53ff25f88d..62a584f633d8 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -1843,6 +1843,9 @@ static int fm10k_probe(struct pci_dev *pdev,
/* print warning for non-optimal configurations */
fm10k_slot_warn(interface);
 
+   /* report MAC address for logging */
+   dev_info(&pdev->dev, "%pM\n", netdev->dev_addr);
+
/* enable SR-IOV after registering netdev to enforce PF/VF ordering */
fm10k_iov_configure(pdev, 0);
 



[PATCH net-next RFC v2 2/3] ipv4: add support for light weight tunnel encap attributes

2015-06-18 Thread Roopa Prabhu
From: Roopa Prabhu ro...@cumulusnetworks.com

Introduces two netlink attributes RTA_ENCAP_TYPE and
RTA_ENCAP to support attaching encap information to ipv4 routes.

RTA_ENCAP is a nested attribute as suggested by Thomas
(and also as Robert had it in his series). RTA_ENCAP
netlink policy is declared by the light weight tunnel
drivers that support this encap type.

fib code calls the following for each nexthop:
- new route handler:
lwt build state (that parses RTA_ENCAP and returns
lwt state that lives in every fib_nh)
- del route handler:
lwt release handler to release lwt state data
- route dump handler:
lwt dump encap to fill RTA_ENCAP data
- during input route lookup
sets dst->output to lwtunnel_output which
in turn calls the corresponding lwt tunnel
output function which applies the required
encap and xmits the packet
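The per-nexthop state described above is reference counted in the same way as lwtunnel_state_get()/lwtunnel_state_put() in patch 1/3: build state hands out one reference, and the object is freed when the last reference is dropped. A userspace model with C11 atomics; the model_* names are hypothetical, and a route taking its own extra reference is an assumed example:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct model_lwt_state {
    unsigned short type;
    atomic_int refcnt;
};

static int model_lwt_frees; /* counts frees so the lifecycle can be checked */

/* "build state": the returned reference is owned by the nexthop. */
static struct model_lwt_state *model_lwt_build(unsigned short type)
{
    struct model_lwt_state *s = malloc(sizeof(*s));

    if (!s)
        return NULL;
    s->type = type;
    atomic_init(&s->refcnt, 1);
    return s;
}

static void model_lwt_get(struct model_lwt_state *s)
{
    atomic_fetch_add(&s->refcnt, 1);
}

static void model_lwt_put(struct model_lwt_state *s)
{
    if (!s)
        return;
    /* atomic_fetch_sub returns the old value: 1 means this was the last ref */
    if (atomic_fetch_sub(&s->refcnt, 1) == 1) {
        free(s);
        model_lwt_frees++;
    }
}
```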

Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
---
 include/net/ip_fib.h   |7 ++-
 include/net/route.h|3 ++
 include/uapi/linux/rtnetlink.h |3 +-
 net/ipv4/fib_frontend.c|8 
 net/ipv4/fib_semantics.c   |   93 +++-
 net/ipv4/route.c   |   33 +-
 6 files changed, 142 insertions(+), 5 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 54271ed..49f18d7 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -44,7 +44,9 @@ struct fib_config {
u32 fc_flow;
u32 fc_nlflags;
struct nl_info  fc_nlinfo;
- };
+   struct nlattr   *fc_encap;
+   u16 fc_encap_type;
+};
 
 struct fib_info;
 struct rtable;
@@ -89,6 +91,9 @@ struct fib_nh {
struct rtable __rcu * __percpu *nh_pcpu_rth_output;
struct rtable __rcu *nh_rth_input;
struct fnhe_hash_bucket __rcu *nh_exceptions;
+#ifdef CONFIG_LWTUNNEL
+   struct lwtunnel_state   *nh_lwtstate;
+#endif
 };
 
 /*
diff --git a/include/net/route.h b/include/net/route.h
index fe22d03..39a6495 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -66,6 +66,9 @@ struct rtable {
 
struct list_headrt_uncached;
struct uncached_list*rt_uncached_list;
+#ifdef CONFIG_LWTUNNEL
+   struct lwtunnel_state   *rt_lwtstate;
+#endif
 };
 
 static inline bool rt_is_input_route(const struct rtable *rt)
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 17fb02f..6c089ad 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -308,6 +308,8 @@ enum rtattr_type_t {
RTA_VIA,
RTA_NEWDST,
RTA_PREF,
+   RTA_ENCAP_TYPE,
+   RTA_ENCAP,
__RTA_MAX
 };
 
@@ -357,7 +359,6 @@ struct rtvia {
 };
 
 /* RTM_CACHEINFO */
-
 struct rta_cacheinfo {
__u32   rta_clntref;
__u32   rta_lastuse;
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 872494e..fbe0630 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -591,6 +591,8 @@ const struct nla_policy rtm_ipv4_policy[RTA_MAX + 1] = {
[RTA_METRICS]   = { .type = NLA_NESTED },
[RTA_MULTIPATH] = { .len = sizeof(struct rtnexthop) },
[RTA_FLOW]  = { .type = NLA_U32 },
+   [RTA_ENCAP_TYPE]= { .type = NLA_U16 },
+   [RTA_ENCAP] = { .type = NLA_NESTED },
 };
 
 static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
@@ -656,6 +658,12 @@ static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
case RTA_TABLE:
cfg->fc_table = nla_get_u32(attr);
break;
+   case RTA_ENCAP:
+   cfg->fc_encap = attr;
+   break;
+   case RTA_ENCAP_TYPE:
+   cfg->fc_encap_type = nla_get_u16(attr);
+   break;
}
}
 
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 28ec3c1..54dd287 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -42,6 +42,7 @@
 #include <net/ip_fib.h>
 #include <net/netlink.h>
 #include <net/nexthop.h>
+#include <net/lwtunnel.h>
 
 #include "fib_lookup.h"
 
@@ -208,6 +209,10 @@ static void free_fib_info_rcu(struct rcu_head *head)
change_nexthops(fi) {
if (nexthop_nh->nh_dev)
dev_put(nexthop_nh->nh_dev);
+#ifdef CONFIG_LWTUNNEL
+   if (nexthop_nh->nh_lwtstate)
+   lwtunnel_state_put(nexthop_nh->nh_lwtstate);
+#endif
free_nh_exceptions(nexthop_nh);
rt_fibinfo_free_cpus(nexthop_nh->nh_pcpu_rth_output);
rt_fibinfo_free(nexthop_nh->nh_rth_input);
@@ -366,6 +371,7 @@ static inline size_t fib_nlmsg_size(struct 

[PATCH net-next RFC v2 1/3] lwt: infrastructure to support light weight tunnels

2015-06-18 Thread Roopa Prabhu
From: Roopa Prabhu ro...@cumulusnetworks.com

Provides ops to parse, build and output encapsulated
packets for drivers that want to attach tunnel encap
information to routes.
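The registration scheme resembles a table of ops indexed by encap type (compare lwtun_encaps[MAX_LWTUNNEL_ENCAP_OPS] in include/net/lwtunnel.h). A single-threaded userspace model of registering a driver and dispatching build_state by type; the names and error codes are assumptions, and a real implementation would need RCU or atomic updates for concurrent access. Type 1 corresponds to LWTUNNEL_ENCAP_MPLS in the uapi enum.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_ENCAP_OPS 8

struct model_encap_ops {
    int (*build_state)(const void *encap_attr, int *state_out);
};

static const struct model_encap_ops *model_encaps[MODEL_MAX_ENCAP_OPS];

static int model_encap_add_ops(const struct model_encap_ops *ops, unsigned int type)
{
    if (type >= MODEL_MAX_ENCAP_OPS)
        return -ERANGE;
    if (model_encaps[type])
        return -EEXIST; /* slot already owned by another driver */
    model_encaps[type] = ops;
    return 0;
}

static int model_encap_build(unsigned int type, const void *attr, int *state)
{
    if (type >= MODEL_MAX_ENCAP_OPS || !model_encaps[type])
        return -EOPNOTSUPP; /* no driver registered for this encap type */
    return model_encaps[type]->build_state(attr, state);
}

/* Toy "MPLS" driver: pretends the attribute is a single label. */
static int toy_mpls_build(const void *attr, int *state_out)
{
    *state_out = *(const int *)attr;
    return 0;
}

static const struct model_encap_ops toy_mpls_ops = { .build_state = toy_mpls_build };
```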

Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
---
 include/linux/lwtunnel.h  |6 ++
 include/net/lwtunnel.h|   84 +
 include/uapi/linux/lwtunnel.h |   11 +++
 net/Kconfig   |5 ++
 net/core/Makefile |1 +
 net/core/lwtunnel.c   |  162 +
 6 files changed, 269 insertions(+)
 create mode 100644 include/linux/lwtunnel.h
 create mode 100644 include/net/lwtunnel.h
 create mode 100644 include/uapi/linux/lwtunnel.h
 create mode 100644 net/core/lwtunnel.c

diff --git a/include/linux/lwtunnel.h b/include/linux/lwtunnel.h
new file mode 100644
index 000..97f32f8
--- /dev/null
+++ b/include/linux/lwtunnel.h
@@ -0,0 +1,6 @@
+#ifndef _LINUX_LWTUNNEL_H_
+#define _LINUX_LWTUNNEL_H_
+
+#include <uapi/linux/lwtunnel.h>
+
+#endif /* _LINUX_LWTUNNEL_H_ */
diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
new file mode 100644
index 000..649da3c
--- /dev/null
+++ b/include/net/lwtunnel.h
@@ -0,0 +1,84 @@
+#ifndef __NET_LWTUNNEL_H
+#define __NET_LWTUNNEL_H 1
+
+#include <linux/lwtunnel.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include <linux/types.h>
+#include <net/dsfield.h>
+#include <net/ip.h>
+#include <net/rtnetlink.h>
+
+#define LWTUNNEL_HASH_BITS   7
+#define LWTUNNEL_HASH_SIZE   (1 << LWTUNNEL_HASH_BITS)
+
+struct lwtunnel_hdr {
+   int len;
+   __u8data[0];
+};
+
+/* lw tunnel state flags */
+#define LWTUNNEL_STATE_OUTPUT_REDIRECT 0x1
+
+#define lwtunnel_output_redirect(lwtstate) (lwtstate && \
+   (lwtstate->flags & LWTUNNEL_STATE_OUTPUT_REDIRECT))
+
+struct lwtunnel_state {
+   __u16   type;
+   __u16   flags;
+   atomic_trefcnt;
+   struct lwtunnel_hdr tunnel;
+};
+
+struct lwtunnel_net {
+   struct hlist_head tunnels[LWTUNNEL_HASH_SIZE];
+};
+
+struct lwtunnel_encap_ops {
+   int (*build_state)(struct net_device *dev, struct nlattr *encap,
+  struct lwtunnel_state **ts);
+   int (*output)(struct sock *sk, struct sk_buff *skb);
+   int (*fill_encap)(struct sk_buff *skb,
+ struct lwtunnel_state *lwtstate);
+   int (*get_encap_size)(struct lwtunnel_state *lwtstate);
+};
+
+#define MAX_LWTUNNEL_ENCAP_OPS 8
+extern const struct lwtunnel_encap_ops __rcu *
+   lwtun_encaps[MAX_LWTUNNEL_ENCAP_OPS];
+
+static inline void lwtunnel_state_get(struct lwtunnel_state *lws)
+{
+   atomic_inc(&lws->refcnt);
+}
+
+static inline void lwtunnel_state_put(struct lwtunnel_state *lws)
+{
+   if (!lws)
+   return;
+
+   if (atomic_dec_and_test(&lws->refcnt))
+   kfree(lws);
+}
+
+static inline struct lwtunnel_state *lwtunnel_skb_lwstate(struct sk_buff *skb)
+{
+   struct rtable *rt = (struct rtable *)skb_dst(skb);
+
+   return rt->rt_lwtstate;
+}
+
+int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops *op,
+  unsigned int num);
+int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op,
+  unsigned int num);
+int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
+struct nlattr *encap,
+struct lwtunnel_state **lws);
+int lwtunnel_fill_encap(struct sk_buff *skb,
+   struct lwtunnel_state *lwtstate);
+int lwtunnel_get_encap_size(struct lwtunnel_state *lwtstate);
+struct lwtunnel_state *lwtunnel_state_alloc(int hdr_len);
+int lwtunnel_output(struct sock *sk, struct sk_buff *skb);
+
+#endif /* __NET_LWTUNNEL_H */
diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
new file mode 100644
index 000..11150c0
--- /dev/null
+++ b/include/uapi/linux/lwtunnel.h
@@ -0,0 +1,11 @@
+#ifndef _UAPI_LWTUNNEL_H_
+#define _UAPI_LWTUNNEL_H_
+
+#include <linux/types.h>
+
+enum tunnel_encap_types {
+   LWTUNNEL_ENCAP_NONE,
+   LWTUNNEL_ENCAP_MPLS,
+};
+
+#endif /* _UAPI_LWTUNNEL_H_ */
diff --git a/net/Kconfig b/net/Kconfig
index 57a7c5a..e296d6f 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -374,9 +374,14 @@ source "net/caif/Kconfig"
 source "net/ceph/Kconfig"
 source "net/nfc/Kconfig"
 
+config LWTUNNEL
+   bool "Network light weight tunnels"
+   ---help---
+ light weight tunnels
 
 endif   # if NET
 
 # Used by archs to tell that they support BPF_JIT
 config HAVE_BPF_JIT
bool
+
diff --git a/net/core/Makefile b/net/core/Makefile
index fec0856..086b01f 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -23,3 +23,4 @@ obj-$(CONFIG_NETWORK_PHY_TIMESTAMPING) += timestamping.o
 obj-$(CONFIG_NET_PTP_CLASSIFY) += ptp_classifier.o
 obj-$(CONFIG_CGROUP_NET_PRIO) += netprio_cgroup.o
 obj-$(CONFIG_CGROUP_NET_CLASSID) += netclassid_cgroup.o
+obj-$(CONFIG_LWTUNNEL) += 

[PATCH net-next RFC v2 0/3] light weight tunnel infrastructure and driver

2015-06-18 Thread Roopa Prabhu
From: Roopa Prabhu ro...@cumulusnetworks.com

This series implements infrastructure for light weight tunnels to support
mpls label edge routers (i.e. mpls ip tunnels). As previously discussed,
having netdevices will not scale. Hence this series introduces new RTA_ENCAP*
attributes to attach encap information to routes (following a suggestion
from Eric Biederman).

The first patch introduces an infrastructure to support light weight tunnels
that don't have netdevices. The infrastructure allows tunnel drivers
to register handlers to parse and build tunnel encap data which can be attached
to each route nexthop.

The second patch adds support in ipv4 fib to carry such light weight tunnel
encap data.

The third patch implements mpls ip tunnels using this light weight tunnel
infrastructure.

Could not think of a better name, so, it is 'lwt' for 'light weight tunnels'
for now.

I do have iproute2 patches and can post them separately if required
(they are currently in my GitHub tree
https://github.com/CumulusNetworks/iproute2 (mpls branch))

Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com

v2:
- bug fixes (more testing)
- feedback from Thomas
- A flag in lwtunnel state that allows using the chosen
  output device instead of redirecting dst output to the
  lwt output function.
- This flag can be set by the tunnel driver at tunnel state
  build time
- moved lwtstate pointer from dst_entry to rtable (seemed cleaner
  looking at Thomas's openvswitch patches)
- moved mpls iptunnel code into separate file (following Eric's and
  Robert's initial patches)
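One reading of the v2 output flag is sketched below. Routes with no tunnel state, and routes whose driver set the flag, keep the chosen output device. Otherwise output is redirected to the lwt output function. The flag name `LWT_FLAG_USE_DEV_OUTPUT` and the `*_model` structs are hypothetical stand-ins, not identifiers from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag: when set, keep the route's chosen output device
 * instead of redirecting output to the tunnel driver. */
#define LWT_FLAG_USE_DEV_OUTPUT 0x1u

enum out_path { OUT_DEVICE, OUT_TUNNEL };

/* Simplified stand-ins for lwtunnel state and an IPv4 route. */
struct lwt_state_model {
	unsigned int flags; /* set by the driver when it builds the state */
};

struct rtable_model {
	const struct lwt_state_model *lwtstate; /* NULL: no encap */
};

/* Where is a packet routed via rt handed off for transmission? */
enum out_path route_output_path(const struct rtable_model *rt)
{
	assert(rt != NULL);
	if (!rt->lwtstate)
		return OUT_DEVICE; /* ordinary route */
	if (rt->lwtstate->flags & LWT_FLAG_USE_DEV_OUTPUT)
		return OUT_DEVICE; /* driver opted out of redirection */
	return OUT_TUNNEL;         /* redirect to the lwt output function */
}
```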

Roopa Prabhu (3):
  lwt: infrastructure to support light weight tunnels
  ipv4: add support for light weight tunnel encap attributes
  mpls: support for ip mpls tunnels

 include/linux/lwtunnel.h           |    6 ++
 include/linux/mpls_iptunnel.h      |    6 ++
 include/net/ip_fib.h               |    7 +-
 include/net/lwtunnel.h             |   84 +++
 include/net/mpls_iptunnel.h        |   29 +
 include/net/route.h                |    3 +
 include/uapi/linux/lwtunnel.h      |   11 ++
 include/uapi/linux/mpls_iptunnel.h |   26 +
 include/uapi/linux/rtnetlink.h     |    3 +-
 net/Kconfig                        |    5 +
 net/core/Makefile                  |    1 +
 net/core/lwtunnel.c                |  162 
 net/ipv4/fib_frontend.c            |    8 ++
 net/ipv4/fib_semantics.c           |   93 +++-
 net/ipv4/route.c                   |   33 +-
 net/mpls/Kconfig                   |    5 +
 net/mpls/Makefile                  |    1 +
 net/mpls/af_mpls.c                 |    9 +-
 net/mpls/internal.h                |    3 +
 net/mpls/mpls_iptunnel.c           |  205 
 20 files changed, 692 insertions(+), 8 deletions(-)
 create mode 100644 include/linux/lwtunnel.h
 create mode 100644 include/linux/mpls_iptunnel.h
 create mode 100644 include/net/lwtunnel.h
 create mode 100644 include/net/mpls_iptunnel.h
 create mode 100644 include/uapi/linux/lwtunnel.h
 create mode 100644 include/uapi/linux/mpls_iptunnel.h
 create mode 100644 net/core/lwtunnel.c
 create mode 100644 net/mpls/mpls_iptunnel.c

-- 
1.7.10.4



Re: [PATCH v2] bpf: fix a bug in verification logic when SUB operation taken on FRAME_PTR

2015-06-18 Thread Wangnan (F)



On 2015/6/19 0:00, Alexei Starovoitov wrote:

On Thu, Jun 18, 2015 at 08:31:45AM +, Wang Nan wrote:

The original code has a problem: it causes the following code to fail
verification:

  r1 <- r10
  r1 -= 8
  r2 = 8
  r3 = unsafe pointer
  call BPF_FUNC_probe_read  <-- R1 type=inv expected=fp

However, by replacing 'r1 -= 8' with 'r1 += -8', the above program can be
loaded successfully.

This is because the verifier only allows a BPF_ADD instruction on a
FRAME_PTR register to produce a PTR_TO_STACK register, while a BPF_SUB
on a FRAME_PTR register yields an UNKNOWN_VALUE register.

This patch fixes it by adding BPF_SUB to the stack_relative check.

It's not a bug. It's catching ADD only by design.
If we let it recognize SUB then one might argue we should let it
recognize multiply, shifts and all other arithmetic on pointers.
The verifier would keep getting bigger and bigger. Where do we stop?
llvm only emits canonical ADD. If you've seen llvm doing SUB,
let's fix it there.
So what generated this 'r1 -= 8'?



I hit this problem while writing an automatic parameter generator. The
instruction was generated by my own code, which I have now corrected.

Thank you.



Re: [PATCH next v3] bonding: Display LACP info only to CAP_NET_ADMIN capable user

2015-06-18 Thread Andy Gospodarek
On Thu, Jun 18, 2015 at 11:30:54AM -0700, Mahesh Bandewar wrote:
 Actor and Partner details can be accessed via procfs, sysfs entries
 or the netlink interface. These interfaces are currently world
 readable. The earlier patch series made LACP communication secure
 against nuisance attacks from within the same L2 domain, but it did
 not prevent an unprivileged user on the host from looking at that
 information and performing the same act.
 
 This patch stops emitting those entries if the user in question
 does not have sufficient privileges.
 
 Signed-off-by: Mahesh Bandewar mahe...@google.com
 ---
  drivers/net/bonding/bond_netlink.c |  23 +
  drivers/net/bonding/bond_procfs.c  | 101 +++--
  drivers/net/bonding/bond_sysfs.c   |  12 ++---
  3 files changed, 71 insertions(+), 65 deletions(-)
 
[...]
 diff --git a/drivers/net/bonding/bond_procfs.c 
 b/drivers/net/bonding/bond_procfs.c
 index e7f3047a26df..f514fe5e80a5 100644
 --- a/drivers/net/bonding/bond_procfs.c
 +++ b/drivers/net/bonding/bond_procfs.c
[...]
 @@ -199,33 +202,35 @@ static void bond_info_show_slave(struct seq_file *seq,
   seq_printf(seq, "Partner Churned Count: %d\n",
              port->churn_partner_count);
  
 - seq_puts(seq, "details actor lacp pdu:\n");
 - seq_printf(seq, "system priority: %d\n",
 -            port->actor_system_priority);
 - seq_printf(seq, "system mac address: %pM\n",
 -            &port->actor_system);
 - seq_printf(seq, "port key: %d\n",
 -            port->actor_oper_port_key);
 - seq_printf(seq, "port priority: %d\n",
 -            port->actor_port_priority);
 - seq_printf(seq, "port number: %d\n",
 -            port->actor_port_number);
 - seq_printf(seq, "port state: %d\n",
 -            port->actor_oper_port_state);
 -
 - seq_puts(seq, "details partner lacp pdu:\n");
 - seq_printf(seq, "system priority: %d\n",
 -            port->partner_oper.system_priority);
 - seq_printf(seq, "system mac address: %pM\n",
 -            &port->partner_oper.system);
 - seq_printf(seq, "oper key: %d\n",
 -            port->partner_oper.key);
 - seq_printf(seq, "port priority: %d\n",
 -            port->partner_oper.port_priority);
 - seq_printf(seq, "port number: %d\n",
 -            port->partner_oper.port_number);
 - seq_printf(seq, "port state: %d\n",
 -            port->partner_oper.port_state);
 + if (capable(CAP_NET_ADMIN)) {
 +     seq_puts(seq, "details actor lacp pdu:\n");
 +     seq_printf(seq, "system priority: %d\n",
 +                port->actor_system_priority);
 +     seq_printf(seq, "system mac address: %pM\n",
 +                &port->actor_system);
 +     seq_printf(seq, "port key: %d\n",
 +                port->actor_oper_port_key);
 +     seq_printf(seq, "port priority: %d\n",
 +                port->actor_port_priority);
 +     seq_printf(seq, "port number: %d\n",
 +                port->actor_port_number);
 +     seq_printf(seq, "port state: %d\n",
 +                port->actor_oper_port_state);
 +
 +     seq_puts(seq, "details partner lacp pdu:\n");
 +     seq_printf(seq, "system priority: %d\n",
 +                port->partner_oper.system_priority);
 +     seq_printf(seq, "system mac address: %pM\n",
 +                &port->partner_oper.system);
 +     seq_printf(seq, "oper key: %d\n",
 +                port->partner_oper.key);
 +     seq_printf(seq, "port priority: %d\n",
 +                port->partner_oper.port_priority);
 +     seq_printf(seq, "port number: %d\n",
 +                port->partner_oper.port_number);
 +     seq_printf(seq, "port state: %d\n",
 +                port->partner_oper.port_state);
 + }
  } else {
      seq_puts(seq, "Aggregator ID: N/A\n");
  }

With this patch, 

[PATCH 1/1] ixgbe: use kzalloc for allocating one thing

2015-06-18 Thread Maninder Singh
Use kzalloc() rather than kcalloc(1, ...).

The semantic patch that makes this change is as follows:

// <smpl>
@@
@@

- kcalloc(1,
+ kzalloc(
  ...)
// </smpl>

It also fixes the following checkpatch CHECK:
CHECK: Prefer kzalloc(sizeof(*fwd_adapter)...) over kzalloc(sizeof(struct ixgbe_fwd_adapter)...)

Signed-off-by: Maninder Singh maninder...@samsung.com
Reviewed-by: Vaneet Narang v.nar...@samsung.com
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 3bf2f3c..3f58757 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8134,7 +8134,7 @@ static void *ixgbe_fwd_add(struct net_device *pdev, 
struct net_device *vdev)
(adapter-num_rx_pools  IXGBE_MAX_MACVLANS))
return ERR_PTR(-EBUSY);
 
-   fwd_adapter = kcalloc(1, sizeof(struct ixgbe_fwd_adapter), GFP_KERNEL);
+   fwd_adapter = kzalloc(sizeof(*fwd_adapter), GFP_KERNEL);
if (!fwd_adapter)
return ERR_PTR(-ENOMEM);
 
-- 
1.7.1
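In user-space terms, the change is the same as preferring a single zeroed allocation whose size comes from the pointer rather than from a restated type name. `calloc()` and `malloc()` plus `memset()` stand in for `kcalloc()` and `kzalloc()` here, and `struct fwd_adapter_model` is a hypothetical placeholder for `struct ixgbe_fwd_adapter`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct ixgbe_fwd_adapter. */
struct fwd_adapter_model {
	int pool;
	unsigned char mac[6];
};

/* Old style, like kcalloc(1, sizeof(struct ...), GFP_KERNEL):
 * an "array" of one element, with the type spelled out. */
struct fwd_adapter_model *alloc_old_style(void)
{
	return calloc(1, sizeof(struct fwd_adapter_model));
}

/* New style, like kzalloc(sizeof(*p), GFP_KERNEL): one zeroed object
 * whose size follows the pointer even if its type is later changed. */
struct fwd_adapter_model *alloc_new_style(void)
{
	struct fwd_adapter_model *p = malloc(sizeof(*p));

	if (p)
		memset(p, 0, sizeof(*p));
	return p;
}
```

Both return an identical zero-filled object. The second form simply states the intent (one object) directly and cannot drift out of sync with the variable's type.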



Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-18 Thread Jeff Layton
On Thu, 18 Jun 2015 21:08:43 -0400
Steven Rostedt rost...@goodmis.org wrote:

 On Thu, 18 Jun 2015 18:50:51 -0400
 Jeff Layton jlay...@poochiereds.net wrote:
  
  The interesting bit here is that the sockets all seem to connect to port
  55201 on the remote host, if I'm reading these traces correctly. What's
  listening on that port on the server?
  
  This might give some helpful info:
  
  $ rpcinfo -p <NFS servername>
 
 # rpcinfo -p wife
    program vers proto   port  service
     100000    4   tcp    111  portmapper
     100000    3   tcp    111  portmapper
     100000    2   tcp    111  portmapper
     100000    4   udp    111  portmapper
     100000    3   udp    111  portmapper
     100000    2   udp    111  portmapper
     100024    1   udp  34243  status
     100024    1   tcp  34498  status
 1000241   tcp  34498  status
 
 # rpcinfo -p localhost
    program vers proto   port  service
     100000    4   tcp    111  portmapper
     100000    3   tcp    111  portmapper
     100000    2   tcp    111  portmapper
     100000    4   udp    111  portmapper
     100000    3   udp    111  portmapper
     100000    2   udp    111  portmapper
     100024    1   udp  38332  status
     100024    1   tcp  52684  status
     100003    2   tcp   2049  nfs
     100003    3   tcp   2049  nfs
     100003    4   tcp   2049  nfs
     100227    2   tcp   2049
     100227    3   tcp   2049
     100003    2   udp   2049  nfs
     100003    3   udp   2049  nfs
     100003    4   udp   2049  nfs
     100227    2   udp   2049
     100227    3   udp   2049
     100021    1   udp  53218  nlockmgr
     100021    3   udp  53218  nlockmgr
     100021    4   udp  53218  nlockmgr
     100021    1   tcp  49825  nlockmgr
     100021    3   tcp  49825  nlockmgr
     100021    4   tcp  49825  nlockmgr
     100005    1   udp  49166  mountd
     100005    1   tcp  48797  mountd
     100005    2   udp  47856  mountd
     100005    2   tcp  53839  mountd
     100005    3   udp  36090  mountd
     100005    3   tcp  46390  mountd
 
 Note, the box has been rebooted since I posted my last trace.
 

Ahh pity. The port has probably changed...if you trace it again maybe
try to figure out what it's talking to before rebooting the server?

  
  Also, what NFS version are you using to mount here? Your fstab entries
  suggest that you're using the default version (for whatever distro this
  is), but have you (e.g.) set up nfsmount.conf to default to v3 on this
  box?
  
 
 My box is Debian testing (recently updated).
 
 # dpkg -l nfs-*
 
 ii  nfs-common     1:1.2.8-9    amd64    NFS support files common to clien
 ii  nfs-kernel-ser 1:1.2.8-9    amd64    support for NFS kernel server
 
 
 same for both boxes.
 
 nfsmount.conf doesn't exist on either box.
 
 I'm assuming it is using nfs4.
 

(cc'ing Bruce)

Oh! I was thinking that you were seeing this extra port on the
_client_, but now rereading your original mail I see that it's
appearing up on the NFS server. Is that correct?

So, assuming that this is NFSv4.0, then this port is probably bound
when the server is establishing the callback channel to the client. So
we may need to look at how those xprts are being created and whether
there are differences from a standard client xprt.

-- 
Jeff Layton jlay...@poochiereds.net


pull request: bluetooth-next 2015-06-18

2015-06-18 Thread Johan Hedberg
Hi Dave,

Here's the final bluetooth-next pull request for 4.2.

 - Cleanups & fixes to 802.15.4 code and related drivers
 - Fix btusb driver memory leak
 - New USB IDs for Atheros controllers
 - Support for BCM4324B3 UART based Broadcom controller
 - Fix for Bluetooth encryption key size handling
 - Broadcom controller initialization fixes
 - Support for Intel controller DDC parameters
 - Support for multiple Bluetooth LE advertising instances
 - Fix for HCI user channel cleanup path

Please let me know if there are any issues pulling. Thanks.

Johan

---
The following changes since commit a9ab2184f451ec78af245ebb8b663d8700d44672:

  Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge (2015-05-31 01:07:06 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git for-upstream

for you to fetch changes up to 952497b159468477392f9b562b904da9bc76d468:

  Bluetooth: Fix warning of potentially uninitialized adv_instance variable (2015-06-18 21:05:31 +0300)


Aleksei Volkov (1):
  Bluetooth: btusb: Correct typo in Roper Class 1 Bluetooth Dongle

Alexander Aring (20):
  ieee802154: 6lowpan: set ackreq when needed
  mac802154: remove unneeded vif struct
  mac802154: cleanup address filtering flags
  mac802154: remove aack hw flag
  mac802154: cleanup ieee802154 hardware flags
  mac802154: remove unused hw_filt attribute
  mac802154: rearrange attribute in ieee802154_hw
  mac802154: add missing structure comments
  mac802154: change pan_coord type to bool
  mac802154: fix flags BIT definitions order
  mac802154: iface: fix hrtimer cancel on ifdown
  mac802154: iface: flush workqueue before stop
  at86rf230: use level high as fallback default
  at86rf230: add support for sleep state
  fakelb: add xmit_async after stop testcase
  at86rf230: fix phy settings while sleeping
  at86rf230: add recommended csma backoffs settings
  at86rf230: cleanup start and stop callbacks
  mac802154: iface: fix order while interface up
  mac802154: iface: cleanup stack variable

Alexey Dobriyan (1):
  Bluetooth: Stop sabotaging list poisoning

Arron Wang (2):
  Bluetooth: Make l2cap_recv_acldata() and sco_recv_scodata() return void
  Bluetooth: Move SCO support under BT_BREDR config option

Chan-yeol Park (1):
  Bluetooth: hci_uart: Fix dereferencing of ERR_PTR

Christoffer Holmstedt (1):
  nl802154: fix misspelled enum

Dmitry Tunin (3):
  ath3k: Add support of 0489:e076 AR3012 device
  ath3k: add support of 13d3:3474 AR3012 device
  Bluetooth: ath3k: Add support of 04ca:300d AR3012 device

Florian Grandel (20):
  Bluetooth: hci_core/mgmt: Introduce multi-adv list
  Bluetooth: hci_core/mgmt: move adv timeout to hdev
  Bluetooth: mgmt: dry update_scan_rsp_data()
  Bluetooth: mgmt: rename update_*_data_for_instance()
  Bluetooth: mgmt: multi adv for read_adv_features()
  Bluetooth: mgmt: multi adv for get_current_adv_instance()
  Bluetooth: mgmt: multi adv for get_adv_instance_flags()
  Bluetooth: mgmt: improve get_adv_instance_flags() readability
  Bluetooth: mgmt: multi adv for enable_advertising()
  Bluetooth: mgmt: multi adv for create_instance_scan_rsp_data()
  Bluetooth: mgmt: multi adv for create_instance_adv_data()
  Bluetooth: mgmt: multi adv for set_advertising*()
  Bluetooth: mgmt: multi adv for clear_adv_instances()
  Bluetooth: mgmt/hci_core: multi-adv for add_advertising*()
  Bluetooth: mgmt: multi adv for remove_advertising*()
  Bluetooth: mgmt: program multi-adv on power on
  Bluetooth: mgmt: multi-adv for trigger_le_scan()
  Bluetooth: mgmt: multi-adv for mgmt_reenable_advertising()
  Bluetooth: hci_core: remove obsolete adv_instance
  Bluetooth: hci_core: increase max adv inst

Frederic Danis (7):
  Bluetooth: btbcm: Move request/release_firmware()
  Bluetooth: btbcm: Add BCM4324B3 UART device
  Bluetooth: hci_uart: Support operational speed during setup
  Bluetooth: btbcm: Add helper functions for UART setup
  Bluetooth: hci_uart: Update Broadcom UART setup
  Bluetooth: hci_uart: Add bcm_set_baudrate()
  Bluetooth: hci_uart: Fix speed selection

Glenn Ruben Bakke (5):
  Bluetooth: 6lowpan: Enable delete_netdev to be scheduled when last peer is deleted
  Bluetooth: 6lowpan: Rename ambiguous variable
  Bluetooth: 6lowpan: Move netdev sysfs device reference
  Bluetooth: 6lowpan: Fix double kfree of netdev priv
  Bluetooth: 6lowpan: Fix module refcount

Ilya Faenson (2):
  Bluetooth: btbcm: Support the BCM4354 Bluetooth UART device
  Bluetooth: hci_uart: Add new line discipline enhancements

Jaganath Kanakkassery (1):
  Bluetooth: Fix potential NULL dereference in RFCOMM bind callback

Johan Hedberg (10):
  

Re: [PATCH net-next] x_table: align per cpu xt_counter

2015-06-18 Thread Pablo Neira Ayuso
On Wed, Jun 17, 2015 at 07:08:15PM +0200, Florian Westphal wrote:
 Eric Dumazet eric.duma...@gmail.com wrote:
  From: Eric Dumazet eduma...@google.com
  
  Let's force a 16 bytes alignment on xt_counter percpu allocations,
  so that bytes and packets sit in same cache line.
  
  xt_counter being exported to user space, we cannot add __align(16) on
  the structure itself.
 
 Sorry, I was away.  Looks great.
 
 Acked-by: Florian Westphal f...@strlen.de

Applied, thanks Eric and Florian !
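The alignment argument can be checked with a small user-space sketch. A pair of 64-bit counters is 16 bytes, so a 16-byte-aligned pair can never straddle a cache-line boundary as long as the line size is a multiple of 16. `struct counters_model` mirrors the two fields of `struct xt_counters` (`pcnt`, `bcnt`). `aligned_alloc()` stands in for the kernel's per-cpu allocator, and the 64-byte line size is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64 /* assumed cache-line size */

/* Same layout as struct xt_counters: packet and byte counters. */
struct counters_model {
	uint64_t pcnt;
	uint64_t bcnt;
};

/* True if [addr, addr + len) lies within a single cache line. */
int same_cache_line(const void *addr, size_t len)
{
	uintptr_t start = (uintptr_t)addr;
	uintptr_t end = start + len - 1;

	assert(len > 0);
	return start / CACHE_LINE == end / CACHE_LINE;
}

/* Allocate n counter pairs with 16-byte alignment. aligned_alloc()
 * requires the size to be a multiple of the alignment; each pair is
 * 16 bytes, so n pairs always satisfy that. */
struct counters_model *alloc_counters(size_t n)
{
	return aligned_alloc(16, n * sizeof(struct counters_model));
}
```

With only the natural 8-byte alignment of `uint64_t`, a pair could start at offset 56 of a line and split across two lines. That is the case the forced 16-byte alignment rules out without touching the user-visible struct.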


Re: [PATCH next v2] bonding: Display LACP info only to CAP_NET_ADMIN capable user

2015-06-18 Thread Mahesh Bandewar

 Hmm... I would rather not send these fake attributes at all ?

 That would be my preference as well.  Sorry if my lack of elaboration in
 my earlier email made this confusing.

 If there are values that should not be visible to non-root users, then
 don't send them at all.  Do not just send NULL values.

OK, will change this in the next rev.

Thanks,
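The "don't send them at all" approach can be illustrated with a toy TLV encoder. An unprivileged reader receives a message that simply lacks the sensitive attributes, rather than one carrying zero values. The attribute IDs, `put_u16_attr()`, `has_attr()` and the `privileged` argument (standing in for `capable(CAP_NET_ADMIN)`) are hypothetical. This is not the netlink API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute IDs for this sketch. */
enum { ATTR_MODE = 1, ATTR_ACTOR_SYS_PRIO = 2, ATTR_USER_PORT_KEY = 3 };

/* Append a (type, u16 value) record in host byte order.
 * Returns the new offset, or -1 if off is already -1 or buf is full. */
static int put_u16_attr(uint8_t *buf, size_t cap, int off,
			uint16_t type, uint16_t val)
{
	if (off < 0 || (size_t)off + 4 > cap)
		return -1;
	memcpy(buf + off, &type, 2);
	memcpy(buf + off + 2, &val, 2);
	return off + 4;
}

/* Fill bonding-like info. Sensitive attributes are omitted entirely
 * for unprivileged readers instead of being sent as zero/NULL. */
int fill_info(uint8_t *buf, size_t cap, int privileged,
	      uint16_t mode, uint16_t sys_prio, uint16_t port_key)
{
	int off = 0;

	assert(buf != NULL);
	off = put_u16_attr(buf, cap, off, ATTR_MODE, mode);
	if (privileged) {
		off = put_u16_attr(buf, cap, off, ATTR_ACTOR_SYS_PRIO, sys_prio);
		off = put_u16_attr(buf, cap, off, ATTR_USER_PORT_KEY, port_key);
	}
	return off; /* bytes used, or -1 on overflow */
}

/* Report whether an attribute type is present in an encoded buffer. */
int has_attr(const uint8_t *buf, int len, uint16_t type)
{
	for (int off = 0; off + 4 <= len; off += 4) {
		uint16_t t;

		memcpy(&t, buf + off, 2);
		if (t == type)
			return 1;
	}
	return 0;
}
```

A consumer can then tell "not permitted to see this" (attribute absent) apart from "the value really is zero", which sending zeroed attributes would hide.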


Re: [PATCH net 2/2] bridge: multicast: start querier timer when running user-space stp

2015-06-18 Thread Nikolay Aleksandrov

 On Jun 18, 2015, at 6:37 AM, Herbert Xu herb...@gondor.apana.org.au wrote:
 
 On Wed, Jun 17, 2015 at 04:28:31AM -0700, Nikolay Aleksandrov wrote:
 From: Satish Ashok sas...@cumulusnetworks.com
 
 When STP is running in user-space and querier is configured, the
 querier timer is not started when a port goes to forwarding state.
 
 Signed-off-by: Satish Ashok sas...@cumulusnetworks.com
 Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
 Fixes: eb1d16414339 (bridge: Add core IGMP snooping support)
 ---
 net/bridge/br_stp.c | 3 +++
 1 file changed, 3 insertions(+)
 
 diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
 index fb3ebe615513..1e2f2f1ff6b0 100644
 --- a/net/bridge/br_stp.c
 +++ b/net/bridge/br_stp.c
 @@ -456,6 +456,9 @@ void br_port_state_selection(struct net_bridge *br)
  p->topology_change_ack = 0;
  br_make_blocking(p);
  }
 +} else if (br->stp_enabled == BR_USER_STP &&
 +           p->state == BR_STATE_FORWARDING) {
 +br_multicast_enable_port(p);
  }
 
 Minor nit, the stp_enabled check appears to be redundant since
 you're in the else clause.
 

Right you are, I overlooked it.

 More importantly, I'm not sure about the logic.  For kernel STP,
 we enable the port as soon as we get out of blocking.  IIRC enabling
 the port just means that we start tracking subscriptions/queries
 so it should be OK to do even while we're listening/learning.
 
 In any case the logic should be identical whether we use kernel
 STP or user-space STP.
 
 So how about removing br_multicast_enable_port from br_make_forward
 and just add it here for both kernel and user-space STP?
 
 Thanks,
 -- 
 Email: Herbert Xu herb...@gondor.apana.org.au
 Home Page: http://gondor.apana.org.au/~herbert/
 PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Makes sense, I’ll re-spin, test and post a v2. Thank you for the suggestion.

Cheers,
 Nik




Re: [PATCH net-next 0/3 v5] changes to make ipv4 routing table aware of next-hop link status

2015-06-18 Thread Scott Feldman
On Thu, Jun 18, 2015 at 8:22 AM, Andy Gospodarek
go...@cumulusnetworks.com wrote:
 This series adds the ability to have the Linux kernel track whether or
 not a particular route should be used based on the link-status of the
 interface associated with the next-hop.

 Before this patch any link-failure on an interface that was serving as a
 gateway for some systems could result in those systems being isolated
 from the rest of the network as the stack would continue to attempt to
 send frames out of an interface that is actually linked-down.  When the
 kernel is responsible for all forwarding, it should also be responsible
 for taking action when the traffic can no longer be forwarded -- there
 is no real need to outsource link-monitoring to userspace anymore.

 This feature is only enabled with the new per-interface or ipv4 global
 sysctls called 'ignore_routes_with_linkdown'.

 net.ipv4.conf.all.ignore_routes_with_linkdown = 0
 net.ipv4.conf.default.ignore_routes_with_linkdown = 0
 net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
 ...

 When the above sysctls are set, the kernel will not only report to
 userspace that the link is down, but it will also report to userspace
 that a route is dead.  This will signal to userspace that the route will
 not be selected.

 With the new sysctls set, the following behavior can be observed
 (interface p8p1 is link-down):

 # ip route show
 default via 10.0.5.2 dev p9p1
 10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
 70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
 80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
 90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 dead linkdown
 90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
 # ip route get 90.0.0.1
 90.0.0.1 via 70.0.0.2 dev p7p1  src 70.0.0.1
 cache
 # ip route get 80.0.0.1
 local 80.0.0.1 dev lo  src 80.0.0.1
 cache local
 # ip route get 80.0.0.2
 80.0.0.2 via 10.0.5.2 dev p9p1  src 10.0.5.15
 cache

 While the route does remain in the table (so it can be modified if
 needed rather than being wiped away as it would be if IFF_UP was
 cleared), the proper next-hop is chosen automatically when the link is
 down.  Now interface p8p1 is linked-up:

 # ip route show
 default via 10.0.5.2 dev p9p1
 10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
 70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
 80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
 90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1
 90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
 192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2
 # ip route get 90.0.0.1
 90.0.0.1 via 80.0.0.2 dev p8p1  src 80.0.0.1
 cache
 # ip route get 80.0.0.1
 local 80.0.0.1 dev lo  src 80.0.0.1
 cache local
 # ip route get 80.0.0.2
 80.0.0.2 dev p8p1  src 80.0.0.1
 cache

 and the output changes to what one would expect.

 If the global or interface sysctl is not set, the following output would be
 expected when p8p1 is down:

 # ip route show
 default via 10.0.5.2 dev p9p1
 10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
 70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
 80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 linkdown
 90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 linkdown
 90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2

 If the dead flag does not appear there should be no expectation that the
 kernel would skip using this route due to link being down.
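The selection behaviour shown in the transcripts can be summarized as follows. Among candidate routes for a prefix, routes whose next-hop device is link-down are skipped only when `ignore_routes_with_linkdown` is set, and the lowest metric wins among the rest. The toy model below uses hypothetical structs, not the FIB code, and reproduces the 90.0.0.0/24 cases from the example.

```c
#include <assert.h>
#include <stddef.h>

struct nh_route {
	const char *dev; /* next-hop output device */
	int metric;      /* lower is preferred */
	int linkdown;    /* carrier lost on dev */
};

/* Pick the preferred route, or NULL if none is usable.
 * ignore_linkdown models ignore_routes_with_linkdown = 1. */
const struct nh_route *select_route(const struct nh_route *r, size_t n,
				    int ignore_linkdown)
{
	const struct nh_route *best = NULL;

	assert(r != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (ignore_linkdown && r[i].linkdown)
			continue; /* shown as "dead linkdown" */
		if (!best || r[i].metric < best->metric)
			best = &r[i];
	}
	return best;
}
```

With the sysctl off, the metric-1 route via p8p1 still wins even though its link is down. This matches the statement that without the `dead` flag there is no expectation that the kernel skips the route.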

 v2: Split kernel changes into 2 patches: first to add linkdown flag and
 second to add new sysctl settings.  Also took suggestion from Alex to
 simplify code by only checking sysctl during fib lookup and suggestion
 from Scott to add a per-interface sysctl.  Added iproute2 patch to
 recognize and print linkdown flag.

 v3: Code cleanups along with reverse-path checks suggested by Alex and
 small fixes related to problems found when multipath was disabled.

 v4: Drop binary sysctls

 v5: Whitespace and variable declaration fixups suggested by Dave

 When this was discussed in Ottawa earlier this year, some preferred not
 to have a configuration option and to make this behavior the default,
 since it was time to do this.  I wanted to propose the config option to
 preserve the current behavior for those that desire it.  I'll happily
 remove it if Dave and Linus approve.

 An IPv6 implementation is also needed (DECnet too!), but I wanted to
 start with the IPv4 implementation to get people comfortable with the
 idea before moving forward.  If this is accepted, the IPv6
 implementation can be posted shortly.

 There was also a request for switchdev support for this, but that will be
 posted as a followup as switchdev does not currently handle dead
 next-hops in a multi-path case and I felt that infra needed to be added
 first.

Andy, I finally got some time to try your patches with
switchdev+rocker.  With static routes I see the same results as
you...feature is 

[PATCH next v3] bonding: Display LACP info only to CAP_NET_ADMIN capable user

2015-06-18 Thread Mahesh Bandewar
Actor and Partner details can be accessed via procfs, sysfs entries
or the netlink interface. These interfaces are currently world
readable. The earlier patch series made LACP communication secure
against nuisance attacks from within the same L2 domain, but it did
not prevent an unprivileged user on the host from looking at that
information and performing the same act.

This patch stops emitting those entries if the user in question does
not have sufficient privileges.

Signed-off-by: Mahesh Bandewar mahe...@google.com
---
 drivers/net/bonding/bond_netlink.c |  23 +
 drivers/net/bonding/bond_procfs.c  | 101 +++--
 drivers/net/bonding/bond_sysfs.c   |  12 ++---
 3 files changed, 71 insertions(+), 65 deletions(-)

diff --git a/drivers/net/bonding/bond_netlink.c 
b/drivers/net/bonding/bond_netlink.c
index 5580fcde738f..1bda29249d12 100644
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -601,19 +601,20 @@ static int bond_fill_info(struct sk_buff *skb,
     if (BOND_MODE(bond) == BOND_MODE_8023AD) {
         struct ad_info info;
 
-        if (nla_put_u16(skb, IFLA_BOND_AD_ACTOR_SYS_PRIO,
-                        bond->params.ad_actor_sys_prio))
-            goto nla_put_failure;
-
-        if (nla_put_u16(skb, IFLA_BOND_AD_USER_PORT_KEY,
-                        bond->params.ad_user_port_key))
-            goto nla_put_failure;
+        if (capable(CAP_NET_ADMIN)) {
+            if (nla_put_u16(skb, IFLA_BOND_AD_ACTOR_SYS_PRIO,
+                            bond->params.ad_actor_sys_prio))
+                goto nla_put_failure;
 
-        if (nla_put(skb, IFLA_BOND_AD_ACTOR_SYSTEM,
-                    sizeof(bond->params.ad_actor_system),
-                    &bond->params.ad_actor_system))
-            goto nla_put_failure;
+            if (nla_put_u16(skb, IFLA_BOND_AD_USER_PORT_KEY,
+                            bond->params.ad_user_port_key))
+                goto nla_put_failure;
 
+            if (nla_put(skb, IFLA_BOND_AD_ACTOR_SYSTEM,
+                        sizeof(bond->params.ad_actor_system),
+                        &bond->params.ad_actor_system))
+                goto nla_put_failure;
+        }
         if (!bond_3ad_get_active_agg_info(bond, &info)) {
             struct nlattr *nest;
 
diff --git a/drivers/net/bonding/bond_procfs.c 
b/drivers/net/bonding/bond_procfs.c
index e7f3047a26df..f514fe5e80a5 100644
--- a/drivers/net/bonding/bond_procfs.c
+++ b/drivers/net/bonding/bond_procfs.c
@@ -135,27 +135,30 @@ static void bond_info_show_master(struct seq_file *seq)
                                   bond->params.ad_select);
         seq_printf(seq, "Aggregator selection policy (ad_select): %s\n",
                    optval->string);
-        seq_printf(seq, "System priority: %d\n",
-                   BOND_AD_INFO(bond).system.sys_priority);
-        seq_printf(seq, "System MAC address: %pM\n",
-                   &BOND_AD_INFO(bond).system.sys_mac_addr);
-
-        if (__bond_3ad_get_active_agg_info(bond, &ad_info)) {
-            seq_printf(seq, "bond %s has no active aggregator\n",
-                       bond->dev->name);
-        } else {
-            seq_printf(seq, "Active Aggregator Info:\n");
-
-            seq_printf(seq, "\tAggregator ID: %d\n",
-                       ad_info.aggregator_id);
-            seq_printf(seq, "\tNumber of ports: %d\n",
-                       ad_info.ports);
-            seq_printf(seq, "\tActor Key: %d\n",
-                       ad_info.actor_key);
-            seq_printf(seq, "\tPartner Key: %d\n",
-                       ad_info.partner_key);
-            seq_printf(seq, "\tPartner Mac Address: %pM\n",
-                       ad_info.partner_system);
+        if (capable(CAP_NET_ADMIN)) {
+            seq_printf(seq, "System priority: %d\n",
+                       BOND_AD_INFO(bond).system.sys_priority);
+            seq_printf(seq, "System MAC address: %pM\n",
+                       &BOND_AD_INFO(bond).system.sys_mac_addr);
+
+            if (__bond_3ad_get_active_agg_info(bond, &ad_info)) {
+                seq_printf(seq,
+                           "bond %s has no active aggregator\n",
+                           bond->dev->name);
+            } else {
+                seq_printf(seq, "Active Aggregator Info:\n");
+
+                seq_printf(seq, "\tAggregator ID: %d\n",
+  

Re: [PATCH net-next 00/43] Simplify netfilter and network namespaces (take 2)

2015-06-18 Thread Pablo Neira Ayuso
On Wed, Jun 17, 2015 at 10:09:40AM -0500, Eric W. Biederman wrote:
[...]
 There are a few extra cleanups in the first group of changes sprinkled
 in as I noticed a few other things as I was sorting out the network
 namespace computation logic.

This is a rather large patchset that addresses many pernet issues in the
netfilter codebase. I would classify them as:

1) Patches to prepare the ground for easier pernet integration.

2) Get rid of the dev_net(dev) ? ... : ...; pattern all around the
   netfilter code.

3) Missing pernet sysctl support in some spots, e.g. br_netfilter.

4) Pernet hooks, probably the largest changeset in this pile and the
   most important one IMO.

So given that it's quite evident that netfilter netns support is
half-cooked and there's room for improvement in it, as we've been
receiving patches to partially add support on things that people
sporadically needed, could you please split this in several (smaller)
batches in logical changes for easier review?

On a different front, nfnetlink_log and nfnetlink_queue still lack
netns support, so patches for that would also be appreciated in a
separate round.

I'm going to take as many of the small preparation patches as I can to
reduce your patch load:

1/43, 8/43, 16/43, 17/43, 18/43, 26/43

Thank you.


Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-18 Thread Steven Rostedt
On Thu, 18 Jun 2015 15:24:52 -0400
Trond Myklebust trond.mykleb...@primarydata.com wrote:

 On Wed, Jun 17, 2015 at 11:08 PM, Steven Rostedt rost...@goodmis.org wrote:
  On Fri, 12 Jun 2015 11:50:38 -0400
  Steven Rostedt rost...@goodmis.org wrote:
 
  I reverted the following commits:
 
  c627d31ba0696cbd829437af2be2f2dee3546b1e
  9e2b9f37760e129cee053cc7b6e7288acc2a7134
  caf4ccd4e88cf2795c927834bc488c8321437586
 
  And the issue goes away. That is, I watched the port go from
  ESTABLISHED to TIME_WAIT, and then disappear, and there's no hidden port.
 
  In fact, I watched the port with my portlist.c module, and it
  disappeared there too when it entered the TIME_WAIT state.
 
 
 I've scanned those commits again and again, and I'm not seeing how we
 could be introducing a socket leak there. The only suspect I can see
 would be the NFS swap bugs that Jeff fixed a few weeks ago. Are you
 using NFS swap?

Not that I'm aware of.

 
  I've been running v4.0.5 with the above commits reverted for 5 days
  now, and there's still no hidden port appearing.
 
  What's the status on this? Should those commits be reverted or is there
  another solution to this bug?
 
 
 I'm trying to reproduce, but I've had no luck yet.

It seems to happen with the connection to my wife's machine, which
mounts two directories from my box via NFS.

This is what's in my wife's /etc/fstab file:

goliath:/home/upload   /upload   nfs  auto,rw,intr,soft  0 0
goliath:/home/gallery  /gallery  nfs  auto,ro,intr,soft  0 0

And here's what's in my /etc/exports file:

/home/upload   wife(no_root_squash,no_all_squash,rw,sync,no_subtree_check)
/home/gallery  192.168.23.0/24(ro,sync,no_subtree_check)

Attached is my config.

-- Steve




config.gz
Description: application/gzip


Re: [PATCH] net: fix search limit handling in skb_find_text()

2015-06-18 Thread Pablo Neira Ayuso
On Tue, Jun 16, 2015 at 03:13:41PM +0300, Roman Khimov wrote:
 On 16 June 2015 12:48:41, Pablo Neira Ayuso wrote:
[...]
  But if we change the existing behaviour, users may be relying on it
  and we'll get things broken for them. Someone else will come later on
  with another patch to say: hey, --to used to be inclusive but this is
  not the case anymore and it's breaking my setup.
 
 I do understand your concerns, but fixing it this way would require changing
 skb_seq_read() and would basically propagate the 'to'-offset-included semantics
 (which seem a bit strange for programmers, IMO) further. Initially I
 thought that changing skb_seq_read() would be more intrusive, although looking
 at all this now, it looks like the only real user of the upper_offset field in
 the ts_config struct is skb_find_text(), because the other invocations of
 skb_seq_read() from drivers/scsi/libiscsi_tcp.c and net/batman-adv/main.c use
 skb->len as an upper limit.
 
   em_text_match() in net/sched/em_text.c is also suspicious.
  
  Please, elaborate.
 
 Judging by the way it constructs the 'to' offset, I think it doesn't expect
 anything to match at 'to', although I might be wrong here.

Could you send a patch that resolves the inconsistency for programmers
while leaving the userspace exposed behaviour through xt_string and
em_string intact? Thanks.
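For readers following the inclusive-versus-exclusive argument, a userspace sketch of the two readings of the upper bound may help. `find_text_range()` is a hypothetical helper, not the kernel textsearch API; it only models the question Roman raises (whether a match may start exactly at 'to') and makes no claim about what skb_find_text() does today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NO_MATCH ((size_t)-1)

/* Hypothetical helper: find the first occurrence of pat in buf[0..len)
 * whose start offset lies in [from, to] when to_inclusive is true, or in
 * [from, to) when it is false.  Only the start is bounded; the match may
 * extend past 'to' as long as it fits in buf.  Returns NO_MATCH if none. */
size_t find_text_range(const char *buf, size_t len, size_t from, size_t to,
		       const char *pat, bool to_inclusive)
{
	size_t plen = strlen(pat);
	size_t last;

	assert(plen > 0);
	if (plen > len)
		return NO_MATCH;
	last = len - plen;			/* last start where pat still fits */
	if (to_inclusive) {
		if (to < last)
			last = to;
	} else {
		if (to == 0)
			return NO_MATCH;	/* empty range [from, 0) */
		if (to - 1 < last)
			last = to - 1;
	}
	for (size_t i = from; i <= last; i++)
		if (memcmp(buf + i, pat, plen) == 0)
			return i;
	return NO_MATCH;
}
```

With from = 0 and to = 3, the pattern "de" starting at offset 3 of "abcdef" is found only under the inclusive reading; that one-position difference is the kind of change a userspace rule built on xt_string could notice.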


Re: [PATCH net-next 00/15] Simplify netfilter and network namespaces

2015-06-18 Thread Julian Anastasov

Hello,

On Thu, 18 Jun 2015, Eric W. Biederman wrote:

 My incremental patch for ipvs on top of everything else I have pushed
 out looks like this:
 
 From: Eric W. Biederman ebied...@xmission.com
 Date: Fri, 12 Jun 2015 18:34:12 -0500
 Subject: [PATCH] ipvs: Pass struct net down to where it is needed and used
 
 Pass struct net down to where it is used and stop guessing
 which network namespace should be used.

At first look the patch is OK. But I'm not sure
about the changes in ip_vs_xmit.c. Can you explain in
2-3 lines when we can see a different netns? Is it when a
packet is forwarded to an output device that is part of
another netns?

I'm asking because these __ip_vs_get_out_rt*
calls in ip_vs_xmit.c can reroute the packet to another
device...

Also, skb_sknet is another candidate for removal.
But I can take care of it after your changes are
pushed...

Regards

--
Julian Anastasov j...@ssi.bg


Re: [PATCH net-next 0/3 v5] changes to make ipv4 routing table aware of next-hop link status

2015-06-18 Thread Andy Gospodarek
On Thu, Jun 18, 2015 at 10:51:37AM -0700, Scott Feldman wrote:
 On Thu, Jun 18, 2015 at 8:22 AM, Andy Gospodarek
 go...@cumulusnetworks.com wrote:
  This series adds the ability to have the Linux kernel track whether or
  not a particular route should be used based on the link-status of the
  interface associated with the next-hop.
 
  Before this patch any link-failure on an interface that was serving as a
  gateway for some systems could result in those systems being isolated
  from the rest of the network as the stack would continue to attempt to
  send frames out of an interface that is actually linked-down.  When the
  kernel is responsible for all forwarding, it should also be responsible
  for taking action when the traffic can no longer be forwarded -- there
  is no real need to outsource link-monitoring to userspace anymore.
 
  This feature is only enabled with the new per-interface or ipv4 global
  sysctls called 'ignore_routes_with_linkdown'.
 
  net.ipv4.conf.all.ignore_routes_with_linkdown = 0
  net.ipv4.conf.default.ignore_routes_with_linkdown = 0
  net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
  ...
 
  When the above sysctls are set, the kernel will not only report to
  userspace that the link is down, but it will also report to userspace
  that a route is dead.  This will signal to userspace that the route will
  not be selected.
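The sysctls follow the usual mapping from dotted names to files under /proc/sys, so they can be set with `sysctl -w` or by writing the file directly. A minimal C sketch, assuming the series is applied with the sysctl name shown above; the helper names are hypothetical, and writing the file needs root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build /proc/sys/net/ipv4/conf/<scope>/ignore_routes_with_linkdown, where
 * scope is "all", "default" or an interface name such as "p8p1".
 * Returns 0 on success, -1 if the path does not fit in buf. */
int linkdown_sysctl_path(char *buf, size_t size, const char *scope)
{
	int n;

	assert(scope != NULL);
	n = snprintf(buf, size,
		     "/proc/sys/net/ipv4/conf/%s/ignore_routes_with_linkdown",
		     scope);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Same effect as
 * "sysctl -w net.ipv4.conf.<scope>.ignore_routes_with_linkdown=<on>".
 * Returns 0 on success, -1 on any error (no such sysctl, no permission). */
int set_ignore_routes_with_linkdown(const char *scope, int on)
{
	char path[256];
	FILE *f;
	int ok;

	if (linkdown_sysctl_path(path, sizeof(path), scope) != 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", on ? 1 : 0) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

For example, `set_ignore_routes_with_linkdown("p8p1", 1)` would enable the behaviour for p8p1 only, while "all" targets the global knob.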
 
  With the new sysctls set, the following behavior can be observed
  (interface p8p1 is link-down):
 
  # ip route show
  default via 10.0.5.2 dev p9p1
  10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
  70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
  80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
  90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 dead linkdown
  90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
  # ip route get 90.0.0.1
  90.0.0.1 via 70.0.0.2 dev p7p1  src 70.0.0.1
  cache
  # ip route get 80.0.0.1
  local 80.0.0.1 dev lo  src 80.0.0.1
  cache local
  # ip route get 80.0.0.2
  80.0.0.2 via 10.0.5.2 dev p9p1  src 10.0.5.15
  cache
 
  While the route does remain in the table (so it can be modified if
  needed rather than being wiped away as it would be if IFF_UP was
  cleared), the proper next-hop is chosen automatically when the link is
  down.  Now interface p8p1 is linked-up:
 
  # ip route show
  default via 10.0.5.2 dev p9p1
  10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
  70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
  80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
  90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1
  90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
  192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2
  # ip route get 90.0.0.1
  90.0.0.1 via 80.0.0.2 dev p8p1  src 80.0.0.1
  cache
  # ip route get 80.0.0.1
  local 80.0.0.1 dev lo  src 80.0.0.1
  cache local
  # ip route get 80.0.0.2
  80.0.0.2 dev p8p1  src 80.0.0.1
  cache
 
  and the output changes to what one would expect.
 
  If the global or interface sysctl is not set, the following output would be
  expected when p8p1 is down:
 
  # ip route show
  default via 10.0.5.2 dev p9p1
  10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
  70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
  80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 linkdown
  90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 linkdown
  90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
 
  If the dead flag does not appear there should be no expectation that the
  kernel would skip using this route due to link being down.
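The selection rule described above can be modelled in a few lines. This is a deliberately simplified sketch with hypothetical types, not the fib_trie lookup: it only captures the stated rule that a linkdown route is skipped when it is also marked dead, which happens only with the sysctl set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (hypothetical types, not the fib_trie code).  A route on a
 * link-down device carries the "linkdown" flag; it is only "dead", and
 * skipped, when ignore_routes_with_linkdown is set. */
struct toy_route {
	const char *gateway;
	const char *dev;
	int metric;		/* lower is preferred */
	bool linkdown;
};

static bool toy_route_dead(const struct toy_route *r, bool ignore_linkdown)
{
	return r->linkdown && ignore_linkdown;
}

/* Return the lowest-metric route that is not dead, or NULL if none is. */
const struct toy_route *toy_route_select(const struct toy_route *routes,
					 size_t n, bool ignore_linkdown)
{
	const struct toy_route *best = NULL;

	assert(routes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (toy_route_dead(&routes[i], ignore_linkdown))
			continue;
		if (best == NULL || routes[i].metric < best->metric)
			best = &routes[i];
	}
	return best;
}
```

With the two 90.0.0.0/24 routes from the example, setting the sysctl moves the choice to the metric-2 route via 70.0.0.2, while leaving it unset keeps the metric-1 route via 80.0.0.2 even though p8p1 is flagged linkdown.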
 
  v2: Split kernel changes into 2 patches: first to add linkdown flag and
  second to add new sysctl settings.  Also took suggestion from Alex to
  simplify code by only checking sysctl during fib lookup and suggestion
  from Scott to add a per-interface sysctl.  Added iproute2 patch to
  recognize and print linkdown flag.
 
  v3: Code cleanups along with reverse-path checks suggested by Alex and
  small fixes related to problems found when multipath was disabled.
 
  v4: Drop binary sysctls
 
  v5: Whitespace and variable declaration fixups suggested by Dave
 
  Some people preferred not to have a configuration option and to make this
  behavior the default when it was discussed in Ottawa earlier this year,
  since it was time to do this.  I wanted to propose the config option to
  preserve the current behavior for those who desire it.  I'll happily
  remove it if Dave and Linus approve.
 
  An IPv6 implementation is also needed (DECnet too!), but I wanted to start
  with the IPv4 implementation to get people comfortable with the idea before
  moving forward.  If this is accepted the IPv6 implementation can be posted
  shortly.
 
  There was also a request for switchdev support for this, but that will be
  posted as a followup as switchdev does not currently handle dead
  next-hops in a multi-path case and I felt 

[GIT] [4.2] 2nd NFC update

2015-06-18 Thread Samuel Ortiz
Hi David,

This is a follow-up fix for a typo that I introduced while cleaning up
the 1st 4.2 NFC pull request patches.

The following changes since commit d0dcad8bd32a34aa85bcbd5d2033658cb3964377:

  NFC: nfcmrvl: set PB_BAIL_OUT at setup (2015-06-13 00:08:55 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next.git tags/nfc-next-4.2-2

for you to fetch changes up to fb77ff4f43990dc91926ce2704036a547482544e:

  NFC: nci: fix mistake in uart generic driver (2015-06-15 18:10:37 +0200)


Vincent Cuissard (1):
  NFC: nci: fix mistake in uart generic driver

 net/nfc/nci/uart.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


Re: [PATCH 00/22] FUJITSU Extended Socket network device driver

2015-06-18 Thread Darren Hart
On Thu, Jun 18, 2015 at 09:45:59AM +0900, Taku Izumi wrote:
 This patchset adds the FUJITSU Extended Socket network device driver.
 The Extended Socket network device is a shared-memory-based high-speed network
 interface between Extended Partitions of the PRIMEQUEST 2000 E2 series.
 
 You can get some information about Extended Partition and Extended
 Socket by referring to the following manual.
 
 http://globalsp.ts.fujitsu.com/dmsp/Publications/public/CA92344-0537.pdf
  3.2.1 Extended Partitioning
  3.2.2 Extended Socket
 

As Alex mentioned earlier, I suspect this is more appropriate for drivers/net.
If David objects, we can consider it for drivers/platform/x86.

-- 
Darren Hart
Intel Open Source Technology Center


Re: e1000e driver - hang after 4 hours of uptime - finally bisected!

2015-06-18 Thread Jeff Kirsher
On Thu, 2015-06-18 at 12:46 -0400, Valdis Kletnieks wrote:
 (follow up to a report from last week - bisecting took a while as I could
 only do 1 or 2 tests an evening)
 
 My Dell Latitude E6530 crashes with a specific kernel lockup almost
 exactly 4 hours after boot if there isn't a cable connected to the
 Ethernet port:
 
 [14508.846327] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
 [14468.229720] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
 [14463.254791] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
 [14491.134413] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
 [14463.396593] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
 [14490.390223] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
 [14494.680591] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
 [14513.365378] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
 [14482.271716] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
 [14479.906820] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
 
 As far as I can tell, the timestamp jitter is just how long it takes me to
 enter the cryptLUKS passphrase for the hard drive at boot...
 
 lspci tells me:
 
 lspci -vvv -s 00:19.0
 00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04)
 DeviceName:  Onboard LAN
 Subsystem: Dell Device 0535
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 0
 Interrupt: pin A routed to IRQ 28
 Region 0: Memory at f770 (32-bit, non-prefetchable) [size=128K]
 Region 1: Memory at f7739000 (32-bit, non-prefetchable) [size=4K]
 Region 2: I/O ports at f040 [size=32]
 Capabilities: [c8] Power Management version 2
 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
 Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
 Address: fee00318  Data: 
 Capabilities: [e0] PCI Advanced Features
 AFCap: TP+ FLR+
 AFCtrl: FLR-
 AFStatus: TP-
 Kernel driver in use: e1000e
 
 
 The traceback always looks like:
 
 [14479.906820] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
 
 [14479.906908] Call Trace:
 [14479.906914]  NMI  [ba94db16] dump_stack+0x50/0xa8
 [14479.906930]  [ba948bb9] panic+0xcd/0x1e4
 [14479.906940]  [ba166a60] ? perf_event_task_disable+0xc0/0xc0
 [14479.906952]  [ba125d8b] watchdog_overflow_callback+0x9b/0xa0
 [14479.906959]  [ba16a684] __perf_event_overflow+0xc4/0x1f0
 [14479.906968]  [ba16b3a4] perf_event_overflow+0x14/0x20
 [14479.906976]  [ba022271] intel_pmu_handle_irq+0x1e1/0x430
 [14479.906990]  [ba01a0f6] perf_event_nmi_handler+0x26/0x40
 [14479.906999]  [ba0085b3] nmi_handle+0x103/0x340
 [14479.907005]  [ba0084b5] ? nmi_handle+0x5/0x340
 [14479.907017]  [ba008a53] default_do_nmi+0xc3/0x120
 [14479.907032]  [ba008b98] do_nmi+0xe8/0x130
 [14479.907044]  [ba95c9a8] end_repeat_nmi+0x1e/0x2e
 [14479.907055]  [ba529886] ? e1000e_cyclecounter_read+0x16/0xc0
 [14479.907061]  [ba529886] ? e1000e_cyclecounter_read+0x16/0xc0
 [14479.907069]  [ba529886] ? e1000e_cyclecounter_read+0x16/0xc0
 [14479.907075]  EOE  [ba0e9529] timecounter_read+0x19/0x60
 [14479.907088]  [ba53687e] e1000e_phc_gettime+0x2e/0x60
 [14479.907098]  [ba536a31] e1000e_systim_overflow_work+0x31/0x70
 [14479.907105]  [ba07ad19] process_one_work+0x3c9/0x980
 [14479.907115]  [ba07ac62] ? process_one_work+0x312/0x980
 [14479.907125]  [ba07b348] ? worker_thread+0x78/0x760
 [14479.907134]  [ba07b59c] worker_thread+0x2cc/0x760
 [14479.907144]  [ba07b2d0] ? process_one_work+0x980/0x980
 [14479.907154]  [ba082a5e] kthread+0xfe/0x120
 [14479.907163]  [ba08ca50] ? finish_task_switch+0x50/0x1c0
 [14479.907173]  [ba082960] ? kthread_create_on_node+0x270/0x270
 [14479.907179]  [ba95ae4f] ret_from_fork+0x3f/0x70
 [14479.907188]  [ba082960] ? kthread_create_on_node+0x270/0x270
 [14479.907243] Kernel Offset: 0x3900 from 0x8100 (relocation range: 0x8000-0xbfff)
 
 Bisection tells me it's this commit:
 
 commit 83129b37ef35bb6a7f01c060129736a8db5d31c4
 Author: Yanir Lubetkin yanirx.lubet...@intel.com
 Date:   Tue Jun 2 17:05:45 2015 +0300
 
 e1000e: fix systim issues
 
 Two issues involving systim were reported.
 1. Clock 

[PATCH] NET: ROSE: Don't dereference NULL neighbour pointer.

2015-06-18 Thread Ralf Baechle
A ROSE socket doesn't necessarily always have a neighbour pointer so check
if the neighbour pointer is valid before dereferencing it.

Signed-off-by: Ralf Baechle r...@linux-mips.org
Tested-by: Bernard Pidoux f6...@free.fr
Cc: sta...@vger.kernel.org #2.6.11+

diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index 8ae6030..dd304bc 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -192,7 +192,8 @@ static void rose_kill_by_device(struct net_device *dev)
 
 		if (rose->device == dev) {
 			rose_disconnect(s, ENETUNREACH, ROSE_OUT_OF_ORDER, 0);
-			rose->neighbour->use--;
+			if (rose->neighbour)
+				rose->neighbour->use--;
 			rose->device = NULL;
 		}
 	}


Re: [PATCH] fm10k: Report MAC address on driver load

2015-06-18 Thread Jeff Kirsher
On Wed, 2015-06-17 at 20:12 -0700, Alexander Duyck wrote:
 This change adds the MAC address to the list of values recorded on
 driver load.  The MAC address represents the serial number of the unit and
 allows us to track the value should a card be replaced in a system.
 
 Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com
 ---
  drivers/net/ethernet/intel/fm10k/fm10k_pci.c |4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

Thanks Alex, I will get this added to my queue.




Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-18 Thread Jeff Layton
On Thu, 18 Jun 2015 15:49:14 -0400
Steven Rostedt rost...@goodmis.org wrote:

 On Thu, 18 Jun 2015 15:24:52 -0400
 Trond Myklebust trond.mykleb...@primarydata.com wrote:
 
  On Wed, Jun 17, 2015 at 11:08 PM, Steven Rostedt rost...@goodmis.org 
  wrote:
   On Fri, 12 Jun 2015 11:50:38 -0400
   Steven Rostedt rost...@goodmis.org wrote:
  
   I reverted the following commits:
  
   c627d31ba0696cbd829437af2be2f2dee3546b1e
   9e2b9f37760e129cee053cc7b6e7288acc2a7134
   caf4ccd4e88cf2795c927834bc488c8321437586
  
   And the issue goes away. That is, I watched the port go from
   ESTABLISHED to TIME_WAIT, and then gone, and there's no hidden port.
  
   In fact, I watched the port with my portlist.c module, and it
   disappeared there too when it entered the TIME_WAIT state.
  
  
  I've scanned those commits again and again, and I'm not seeing how we
  could be introducing a socket leak there. The only suspect I can see
  would be the NFS swap bugs that Jeff fixed a few weeks ago. Are you
  using NFS swap?
 
 Not that I'm aware of.
 
  
   I've been running v4.0.5 with the above commits reverted for 5 days
   now, and there's still no hidden port appearing.
  
   What's the status on this? Should those commits be reverted or is there
   another solution to this bug?
  
  
  I'm trying to reproduce, but I've had no luck yet.
 
 It seems to happen with the connection to my wife's machine, and that
 is where my wife's box connects two directories via nfs:
 
 This is what's in my wife's /etc/fstab file
 
 goliath:/home/upload /upload nfs auto,rw,intr,soft   0 0
 goliath:/home/gallery /gallery nfs auto,ro,intr,soft 0 0
 
 And here's what's in my /etc/exports file
 
 /home/upload   wife(no_root_squash,no_all_squash,rw,sync,no_subtree_check)
 /home/gallery  192.168.23.0/24(ro,sync,no_subtree_check)
 
 Attached is my config.
 

The interesting bit here is that the sockets all seem to connect to port
55201 on the remote host, if I'm reading these traces correctly. What's
listening on that port on the server?

This might give some helpful info:

$ rpcinfo -p <NFS servername>

Also, what NFS version are you using to mount here? Your fstab entries
suggest that you're using the default version (for whatever distro this
is), but have you (e.g.) set up nfsmount.conf to default to v3 on this
box?
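Jeff's point about remote port 55201 can also be checked on the client by reading /proc/net/tcp, which prints addresses and ports in hexadecimal (55201 is 0xD7A1) and the connection state as a hex code. A minimal parser sketch, assuming the usual IPv4 /proc/net/tcp line layout; the names are hypothetical. It can only report sockets that the kernel still lists there, so it is a way to compare the listed connections with what the traces show, not a way to find an unlisted port.

```c
#include <assert.h>
#include <stdio.h>

/* Fields of one IPv4 /proc/net/tcp entry.  Ports are converted from the
 * hexadecimal text; addresses stay as the raw 32-bit value the kernel
 * prints (host byte order). */
struct tcp_entry {
	unsigned int local_addr, local_port;
	unsigned int rem_addr, rem_port;
	unsigned int state;
};

/* Parse one data line (not the header line).  Returns 0 on success. */
int parse_proc_net_tcp_line(const char *line, struct tcp_entry *e)
{
	assert(line != NULL && e != NULL);
	if (sscanf(line, " %*d: %8X:%4X %8X:%4X %2X",
		   &e->local_addr, &e->local_port,
		   &e->rem_addr, &e->rem_port, &e->state) != 5)
		return -1;
	return 0;
}

/* Names for a few common values of the "st" column (kernel TCP states). */
const char *tcp_state_name(unsigned int st)
{
	switch (st) {
	case 0x01: return "ESTABLISHED";
	case 0x06: return "TIME_WAIT";
	case 0x07: return "CLOSE";
	case 0x0A: return "LISTEN";
	default:   return "other";
	}
}
```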

-- 
Jeff Layton jlay...@poochiereds.net


Re: [PATCH net-next 2/3] ipv4: L3 and L4 hash-based multipath routing

2015-06-18 Thread Alexander Duyck



On 06/17/2015 01:08 PM, Peter Nørlund wrote:

This patch adds L3 and L4 hash-based multipath routing, selectable on a
per-route basis with the reintroduced RTA_MP_ALGO attribute. The default is
now RT_MP_ALG_L3_HASH.

Signed-off-by: Peter Nørlund p...@ordbogen.com
---
  include/net/ip_fib.h   |  4 ++-
  include/net/route.h|  5 ++--
  include/uapi/linux/rtnetlink.h | 14 ++-
  net/ipv4/fib_frontend.c|  4 +++
  net/ipv4/fib_semantics.c   | 34 ++---
  net/ipv4/icmp.c|  4 +--
  net/ipv4/route.c   | 56 +++---
  net/ipv4/xfrm4_policy.c|  2 +-
  8 files changed, 103 insertions(+), 20 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 4be4f25..250d98e 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -37,6 +37,7 @@ struct fib_config {
u32 fc_flags;
u32 fc_priority;
__be32  fc_prefsrc;
+   int fc_mp_alg;
struct nlattr   *fc_mx;
struct rtnexthop*fc_mp;
int fc_mx_len;
@@ -116,6 +117,7 @@ struct fib_info {
int fib_nhs;
  #ifdef CONFIG_IP_ROUTE_MULTIPATH
int fib_mp_weight;
+   int fib_mp_alg;
  #endif
struct rcu_head rcu;
struct fib_nh   fib_nh[0];
@@ -308,7 +310,7 @@ int ip_fib_check_default(__be32 gw, struct net_device *dev);
  int fib_sync_down_dev(struct net_device *dev, int force);
  int fib_sync_down_addr(struct net *net, __be32 local);
  int fib_sync_up(struct net_device *dev);
-void fib_select_multipath(struct fib_result *res);
+void fib_select_multipath(struct fib_result *res, const struct flowi4 *flow);

  /* Exported by fib_trie.c */
  void fib_trie_init(void);
diff --git a/include/net/route.h b/include/net/route.h
index fe22d03..1fc7deb 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -110,7 +110,8 @@ struct in_device;
  int ip_rt_init(void);
  void rt_cache_flush(struct net *net);
  void rt_flush_dev(struct net_device *dev);
-struct rtable *__ip_route_output_key(struct net *, struct flowi4 *flp);
+struct rtable *__ip_route_output_key(struct net *, struct flowi4 *flp,
+const struct flowi4 *mp_flow);
  struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,
struct sock *sk);
  struct dst_entry *ipv4_blackhole_route(struct net *net,
@@ -267,7 +268,7 @@ static inline struct rtable *ip_route_connect(struct flowi4 *fl4,
  sport, dport, sk);

if (!dst || !src) {
-   rt = __ip_route_output_key(net, fl4);
+   rt = __ip_route_output_key(net, fl4, NULL);
if (IS_ERR(rt))
return rt;
ip_rt_put(rt);
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 17fb02f..dff4a72 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -271,6 +271,18 @@ enum rt_scope_t {
  #define RTM_F_EQUALIZE  0x400   /* Multipath equalizer: NI */
  #define RTM_F_PREFIX  0x800   /* Prefix addresses */

+/* Multipath algorithms */
+
+enum rt_mp_alg_t {
+   RT_MP_ALG_L3_HASH,  /* Was IP_MP_ALG_NONE */
+   RT_MP_ALG_PER_PACKET,   /* Was IP_MP_ALG_RR */
+   RT_MP_ALG_DRR,  /* not used */
+   RT_MP_ALG_RANDOM,   /* not used */
+   RT_MP_ALG_WRANDOM,  /* not used */
+   RT_MP_ALG_L4_HASH,
+   __RT_MP_ALG_MAX
+};
+
  /* Reserved table identifiers */

  enum rt_class_t {
@@ -301,7 +313,7 @@ enum rtattr_type_t {
RTA_FLOW,
RTA_CACHEINFO,
RTA_SESSION, /* no longer used */
-   RTA_MP_ALGO, /* no longer used */
+   RTA_MP_ALGO,
RTA_TABLE,
RTA_MARK,
RTA_MFC_STATS,
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 872494e..376e8c1 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -590,6 +590,7 @@ const struct nla_policy rtm_ipv4_policy[RTA_MAX + 1] = {
[RTA_PREFSRC]   = { .type = NLA_U32 },
[RTA_METRICS]   = { .type = NLA_NESTED },
[RTA_MULTIPATH] = { .len = sizeof(struct rtnexthop) },
+   [RTA_MP_ALGO]   = { .type = NLA_U32 },
[RTA_FLOW]  = { .type = NLA_U32 },
  };

@@ -650,6 +651,9 @@ static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
cfg->fc_mp = nla_data(attr);
cfg->fc_mp_len = nla_len(attr);
break;
+   case RTA_MP_ALGO:
+   cfg->fc_mp_alg = nla_get_u32(attr);
+   break;
case RTA_FLOW:
cfg->fc_flow = 

Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-18 Thread Trond Myklebust
On Wed, Jun 17, 2015 at 11:08 PM, Steven Rostedt rost...@goodmis.org wrote:
 On Fri, 12 Jun 2015 11:50:38 -0400
 Steven Rostedt rost...@goodmis.org wrote:

 I reverted the following commits:

 c627d31ba0696cbd829437af2be2f2dee3546b1e
 9e2b9f37760e129cee053cc7b6e7288acc2a7134
 caf4ccd4e88cf2795c927834bc488c8321437586

 And the issue goes away. That is, I watched the port go from
 ESTABLISHED to TIME_WAIT, and then gone, and there's no hidden port.

 In fact, I watched the port with my portlist.c module, and it
 disappeared there too when it entered the TIME_WAIT state.


I've scanned those commits again and again, and I'm not seeing how we
could be introducing a socket leak there. The only suspect I can see
would be the NFS swap bugs that Jeff fixed a few weeks ago. Are you
using NFS swap?

 I've been running v4.0.5 with the above commits reverted for 5 days
 now, and there's still no hidden port appearing.

 What's the status on this? Should those commits be reverted or is there
 another solution to this bug?


I'm trying to reproduce, but I've had no luck yet.

Cheers
  Trond


Re: [PATCH net-next 1/3] ipv4: Lock-less per-packet multipath

2015-06-18 Thread Alexander Duyck

On 06/17/2015 01:08 PM, Peter Nørlund wrote:

The current multipath attempted to be quasi random, but in most cases it
behaved just like a round robin balancing. This patch refactors the
algorithm to be exactly that and in doing so, avoids the spin lock.
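The lock-free round robin can be sketched in userspace. Two assumptions: a single C11 atomic stands in for the patch's DEFINE_PER_CPU(u8, fib_mp_rr_counter), since per-CPU data has no direct userspace equivalent; and the bit reversal is our reading of why the patch selects BITREVERSE and includes linux/bitrev.h. The excerpt does not show fib_select_multipath() itself.

```c
#include <assert.h>
#include <stdatomic.h>

/* Reverse the low 8 bits of x (userspace equivalent of bitrev8()). */
static unsigned int toy_bitrev8(unsigned int x)
{
	unsigned int r = 0;

	assert(x <= 0xffu);
	for (int i = 0; i < 8; i++)
		if (x & (1u << i))
			r |= 0x80u >> i;
	return r;
}

/* Stand-in for the per-CPU u8 counter: one shared atomic, no lock.
 * Relaxed ordering is enough because only the count itself matters. */
static _Atomic unsigned int toy_rr_counter;

/* Next per-packet value in 0..255.  Bit-reversing consecutive counts
 * (0, 1, 2, 3 -> 0, 128, 64, 192) spreads them over the whole range, so a
 * threshold-based selector alternates between paths instead of sending
 * long runs of packets to the same one. */
unsigned int toy_rr_next(void)
{
	unsigned int c = atomic_fetch_add_explicit(&toy_rr_counter, 1u,
						   memory_order_relaxed);

	return toy_bitrev8(c & 0xffu);
}
```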

The new design paves the way for hash-based multipath, replacing the
modulo with thresholds, minimizing disruption in case of failing paths or
route replacements.
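Replacing the modulo with thresholds can be illustrated with a small userspace model. The names are hypothetical and the comparison convention (`v < upper_bound`) is our choice; it follows the DIV_ROUND_CLOSEST(256 * w, total) rounding visible in fib_rebalance() below, but uses 64-bit arithmetic instead of the patch's down-scaling of weights.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical next-hop: weight, dead flag and a threshold in 0..256
 * (-1 for dead next-hops, which never match). */
struct toy_nh {
	int weight;
	int dead;
	int upper_bound;
};

/* Cumulative thresholds: live next-hop i gets
 * round(256 * (w_0 + ... + w_i) / total_live). */
void toy_rebalance(struct toy_nh *nh, int n)
{
	int64_t total = 0, w = 0;

	for (int i = 0; i < n; i++)
		if (!nh[i].dead)
			total += nh[i].weight;
	for (int i = 0; i < n; i++) {
		if (nh[i].dead || total == 0) {
			nh[i].upper_bound = -1;
			continue;
		}
		w += nh[i].weight;
		nh[i].upper_bound = (int)((256 * w + total / 2) / total);
	}
}

/* Pick the first live next-hop whose threshold exceeds v (0..255).
 * Returns its index, or -1 when no next-hop is usable. */
int toy_mp_select(const struct toy_nh *nh, int n, unsigned int v)
{
	assert(v < 256);
	for (int i = 0; i < n; i++)
		if (nh[i].upper_bound >= 0 && v < (unsigned int)nh[i].upper_bound)
			return i;
	return -1;
}
```

With weights 1, 1, 2 the thresholds are 64, 128 and 256. If the middle next-hop dies they become 85, -1 and 256: values 0..63 stay on next-hop 0 and 128..255 stay on next-hop 2, so only traffic from the dead path moves. With `v % live_count`, many values on healthy paths would be remapped too.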

Signed-off-by: Peter Nørlund p...@ordbogen.com
---
  include/net/ip_fib.h |   6 +--
  net/ipv4/Kconfig |   1 +
  net/ipv4/fib_semantics.c | 116 ++-
  3 files changed, 68 insertions(+), 55 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 54271ed..4be4f25 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -76,8 +76,8 @@ struct fib_nh {
unsigned intnh_flags;
unsigned char   nh_scope;
  #ifdef CONFIG_IP_ROUTE_MULTIPATH
-   int nh_weight;
-   int nh_power;
+   int nh_mp_weight;
+   atomic_tnh_mp_upper_bound;
  #endif
  #ifdef CONFIG_IP_ROUTE_CLASSID
__u32   nh_tclassid;
@@ -115,7 +115,7 @@ struct fib_info {
  #define fib_advmss fib_metrics[RTAX_ADVMSS-1]
int fib_nhs;
  #ifdef CONFIG_IP_ROUTE_MULTIPATH
-   int fib_power;
+   int fib_mp_weight;
  #endif
struct rcu_head rcu;
struct fib_nh   fib_nh[0];


I could do without some of this renaming.  For example you could 
probably not bother with adding the _mp piece to the name.  That way we 
don't have to track all the nh_weight -> nh_mp_weight changes.   Also 
you could probably just use the name fib_weight since not including the 
_mp was already the convention for the multipath portions of the 
structure anyway.


This isn't really improving readability at all so I would say don't 
bother renaming it.



diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index d83071d..cb91f67 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -81,6 +81,7 @@ config IP_MULTIPLE_TABLES
  config IP_ROUTE_MULTIPATH
bool IP: equal cost multipath
depends on IP_ADVANCED_ROUTER
+   select BITREVERSE
help
  Normally, the routing tables specify a single action to be taken in
  a deterministic manner for a given packet. If you say Y here
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 28ec3c1..8c8df80 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -15,6 +15,7 @@

  #include <asm/uaccess.h>
  #include <linux/bitops.h>
+#include <linux/bitrev.h>
  #include <linux/types.h>
  #include <linux/kernel.h>
  #include <linux/jiffies.h>
@@ -57,7 +58,7 @@ static struct hlist_head fib_info_devhash[DEVINDEX_HASHSIZE];

  #ifdef CONFIG_IP_ROUTE_MULTIPATH

-static DEFINE_SPINLOCK(fib_multipath_lock);
+static DEFINE_PER_CPU(u8, fib_mp_rr_counter);

  #define for_nexthops(fi) {\
int nhsel; const struct fib_nh *nh; \
@@ -261,7 +262,7 @@ static inline int nh_comp(const struct fib_info *fi, const struct fib_info *ofi)
nh->nh_gw  != onh->nh_gw ||
nh->nh_scope != onh->nh_scope ||
  #ifdef CONFIG_IP_ROUTE_MULTIPATH
-   nh->nh_weight != onh->nh_weight ||
+   nh->nh_mp_weight != onh->nh_mp_weight ||
  #endif
  #ifdef CONFIG_IP_ROUTE_CLASSID
nh->nh_tclassid != onh->nh_tclassid ||
@@ -449,6 +450,43 @@ static int fib_count_nexthops(struct rtnexthop *rtnh, int remaining)
return remaining > 0 ? 0 : nhs;
  }



This is a good example.  If we don't do the rename we don't have to 
review changes like the one above which just add extra overhead to the 
patch.



+static void fib_rebalance(struct fib_info *fi)
+{
+   int factor;
+   int total;
+   int w;
+
+   if (fi->fib_nhs < 2)
+   return;
+
+   total = 0;
+   for_nexthops(fi) {
+   if (!(nh->nh_flags & RTNH_F_DEAD))
+   total += nh->nh_mp_weight;
+   } endfor_nexthops(fi);
+
+   if (likely(total != 0)) {
+   factor = DIV_ROUND_UP(total, 8388608);
+   total /= factor;
+   } else {
+   factor = 1;
+   }
+


So where does the 8388608 value come from?  Is it just here to help 
restrict the upper_bound to a u8 value?



+   w = 0;
+   change_nexthops(fi) {
+   int upper_bound;
+
+   if (nexthop_nh->nh_flags & RTNH_F_DEAD) {
+   upper_bound = -1;
+   } else {
+   w += nexthop_nh->nh_mp_weight / factor;
+   upper_bound = DIV_ROUND_CLOSEST(256 * w, total);
+   }


This is doing some confusing stuff.  I assume the whole point is to get 
the 

Re: [PATCH] fm10k: Report MAC address on driver load

2015-06-18 Thread Jeff Kirsher
On Wed, 2015-06-17 at 20:12 -0700, Alexander Duyck wrote:
 This change adds the MAC address to the list of values recorded on
 driver load.  The MAC address represents the serial number of the unit and
 allows us to track the value should a card be replaced in a system.
 
 Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com
 ---
  drivers/net/ethernet/intel/fm10k/fm10k_pci.c |4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

With the recent fm10k patches that Jake submitted, this patch no longer
applies cleanly.  If you could re-spin your patch against my next-queue
tree (dev-queue branch) that would be much appreciated.




RE: [Intel-wired-lan] [PATCH v6 1/3] if_link: Add control trust VF

2015-06-18 Thread Hiroshi Shimamoto
 Subject: Re: [Intel-wired-lan] [PATCH v6 1/3] if_link: Add control trust VF
 
 On 06/17/2015 04:41 AM, Hiroshi Shimamoto wrote:
  From: Hiroshi Shimamoto h-shimam...@ct.jp.nec.com
 
  Add netlink directives and an ndo entry to trust a VF user.
 
  This controls the special permission of a VF user.
  The administrator will deliberately trust a VF user to use some features
  which impact security and/or performance.
 
  The administrator should never turn it on unless the VF user is fully trusted.
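The rtnetlink hunk in this patch reserves nla_total_size(sizeof(struct ifla_vf_trust)) extra bytes per VF. Because the proposed struct is two __u32 fields, that is a 4-byte attribute header plus an 8-byte payload: 12 bytes, with no padding needed. A userspace sketch of that arithmetic and of the attribute layout; the TOY_ names are hypothetical local copies of the netlink rules, used so the example builds without the patched headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the attribute layout rules from <linux/netlink.h>:
 * a 4-byte struct nlattr header and 4-byte alignment. */
#define TOY_NLA_ALIGN(len) (((size_t)(len) + 3u) & ~(size_t)3u)
#define TOY_NLA_HDRLEN ((size_t)4)

/* Mirrors the struct ifla_vf_trust proposed in this patch. */
struct toy_vf_trust {
	uint32_t vf;
	uint32_t setting;
};

/* Same formula as the kernel's nla_total_size(): header plus payload,
 * rounded up to the alignment. */
size_t toy_nla_total_size(size_t payload)
{
	return TOY_NLA_ALIGN(TOY_NLA_HDRLEN + payload);
}

/* Encode one IFLA_VF_TRUST-style attribute into buf in host byte order.
 * The numeric attribute type is a parameter because its value depends on
 * the final enum.  Returns the bytes used, or 0 if buf is too small. */
size_t toy_put_vf_trust(unsigned char *buf, size_t size, uint16_t type,
			uint32_t vf, int trusted)
{
	struct toy_vf_trust t = { .vf = vf, .setting = trusted ? 1u : 0u };
	uint16_t nla_len = (uint16_t)(TOY_NLA_HDRLEN + sizeof(t)); /* no padding */
	size_t total = toy_nla_total_size(sizeof(t));

	assert(buf != NULL);
	if (size < total)
		return 0;
	memset(buf, 0, total);
	memcpy(buf, &nla_len, sizeof(nla_len));	/* struct nlattr.nla_len */
	memcpy(buf + 2, &type, sizeof(type));		/* struct nlattr.nla_type */
	memcpy(buf + TOY_NLA_HDRLEN, &t, sizeof(t));
	return total;
}
```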
 
  Signed-off-by: Hiroshi Shimamoto h-shimam...@ct.jp.nec.com
  Reviewed-by: Hayato Momma h-mo...@ce.jp.nec.com
  CC: Choi, Sy Jong sy.jong.c...@intel.com
  ---
  include/linux/if_link.h  |  1 +
include/linux/netdevice.h|  3 +++
include/uapi/linux/if_link.h |  6 ++
net/core/rtnetlink.c | 19 +--
4 files changed, 27 insertions(+), 2 deletions(-)
 
  diff --git a/include/linux/if_link.h b/include/linux/if_link.h
  index ae5d0d2..f923d15 100644
  --- a/include/linux/if_link.h
  +++ b/include/linux/if_link.h
  @@ -24,5 +24,6 @@ struct ifla_vf_info {
  __u32 min_tx_rate;
  __u32 max_tx_rate;
  __u32 rss_query_en;
  +   __u32 trusted;
};
#endif /* _LINUX_IF_LINK_H */
  diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
  index e20979d..a034fb8 100644
  --- a/include/linux/netdevice.h
  +++ b/include/linux/netdevice.h
  @@ -873,6 +873,7 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
 * int (*ndo_set_vf_rate)(struct net_device *dev, int vf, int min_tx_rate,
 *  int max_tx_rate);
 * int (*ndo_set_vf_spoofchk)(struct net_device *dev, int vf, bool setting);
  + * int (*ndo_set_vf_trust)(struct net_device *dev, int vf, bool setting);
 * int (*ndo_get_vf_config)(struct net_device *dev,
 *int vf, struct ifla_vf_info *ivf);
 * int (*ndo_set_vf_link_state)(struct net_device *dev, int vf, int link_state);
  @@ -1095,6 +1096,8 @@ struct net_device_ops {
 int max_tx_rate);
  int (*ndo_set_vf_spoofchk)(struct net_device *dev,
 int vf, bool setting);
  +   int (*ndo_set_vf_trust)(struct net_device *dev,
  +   int vf, bool setting);
  int (*ndo_get_vf_config)(struct net_device *dev,
   int vf,
   struct ifla_vf_info *ivf);
  diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
  index 2c7e8e3..891050c 100644
  --- a/include/uapi/linux/if_link.h
  +++ b/include/uapi/linux/if_link.h
  @@ -485,6 +485,7 @@ enum {
   * on/off switch
   */
  IFLA_VF_STATS,  /* network device statistics */
  +   IFLA_VF_TRUST,  /* Trust VF */
  __IFLA_VF_MAX,
};
 
  @@ -546,6 +547,11 @@ enum {
 
#define IFLA_VF_STATS_MAX (__IFLA_VF_STATS_MAX - 1)
 
  +struct ifla_vf_trust {
  +   __u32 vf;
  +   __u32 setting;
  +};
  +
/* VF ports management section
 *
 *Nested layout of set/get msg is:
  diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
  index 2d102ce..abd1a75 100644
  --- a/net/core/rtnetlink.c
  +++ b/net/core/rtnetlink.c
  @@ -831,7 +831,8 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev,
   /* IFLA_VF_STATS_BROADCAST */
   nla_total_size(sizeof(__u64)) +
   /* IFLA_VF_STATS_MULTICAST */
  -nla_total_size(sizeof(__u64)));
  +nla_total_size(sizeof(__u64)) +
  +nla_total_size(sizeof(struct ifla_vf_trust)));
  return size;
  } else
  return 0;
  @@ -1151,6 +1152,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
  struct ifla_vf_link_state vf_linkstate;
  struct ifla_vf_rss_query_en vf_rss_query_en;
  struct ifla_vf_stats vf_stats;
  +   struct ifla_vf_trust vf_trust;
 
  /*
   * Not all SR-IOV capable drivers support the
  @@ -1160,6 +1162,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
   */
  ivi.spoofchk = -1;
  ivi.rss_query_en = -1;
  +   ivi.trusted = -1;
  memset(ivi.mac, 0, sizeof(ivi.mac));
  /* The default value for VF link state is auto
   * IFLA_VF_LINK_STATE_AUTO which equals zero
  @@ -1173,7 +1176,8 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
  vf_tx_rate.vf =
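
The rtnl_vfinfo_size() hunk above grows the per-VF size estimate by nla_total_size(sizeof(struct ifla_vf_trust)). A minimal userspace sketch of that arithmetic, assuming the standard netlink attribute rules (a 4-byte header, payload padded to 4 bytes). All *_sketch names are local re-implementations for illustration, not the kernel's own helpers:

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the netlink attribute sizing rules (cf. <linux/netlink.h>):
 * a 4-byte header (u16 length, u16 type) followed by the payload, with the
 * total padded to a multiple of 4 bytes. */
#define NLA_ALIGNTO_SKETCH	4
#define NLA_ALIGN_SKETCH(len) \
	(((len) + NLA_ALIGNTO_SKETCH - 1) & ~(NLA_ALIGNTO_SKETCH - 1))

struct nlattr_sketch {		/* same layout as struct nlattr */
	uint16_t nla_len;
	uint16_t nla_type;
};

/* Same layout as the uapi struct ifla_vf_trust added by the patch. */
struct ifla_vf_trust_sketch {
	uint32_t vf;
	uint32_t setting;
};

_Static_assert(sizeof(struct ifla_vf_trust_sketch) == 8, "two __u32 fields");

static int nla_hdrlen_sketch(void)
{
	return NLA_ALIGN_SKETCH((int)sizeof(struct nlattr_sketch));
}

/* Header plus payload, before padding (cf. nla_attr_size()). */
static int nla_attr_size_sketch(int payload)
{
	return nla_hdrlen_sketch() + payload;
}

/* Header plus payload plus padding (cf. nla_total_size()): the quantity
 * rtnl_vfinfo_size() adds up for each per-VF attribute. */
static int nla_total_size_sketch(int payload)
{
	return NLA_ALIGN_SKETCH(nla_attr_size_sketch(payload));
}
```

Under these assumptions the new IFLA_VF_TRUST attribute adds 12 bytes (4-byte header plus 8-byte payload, already aligned) per VF to the reserved message size.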
  

[RFC] COLO Proxy Module

2015-06-18 Thread Li Zhijian

Hi, all

We are planning to implement a kernel module called COLO Proxy to buffer and
compare packets. This module is one of the important components of the COLO
project and is still at an early stage, so any comments and feedback are warmly
welcome. Thanks in advance.

=
# RFC: COLO-Proxy Module

## Rationale

The COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop Service)
project is a high-availability solution. The Primary VM (PVM) and the Secondary VM
(SVM) run in parallel. They receive the same requests from clients and generate
responses in parallel too. If the response packets from the PVM and SVM are
identical, they are released immediately. Otherwise, an on-demand VM checkpoint
is performed.
Paper:
http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
COLO on Xen:
http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
COLO on Qemu/KVM:
http://wiki.qemu.org/Features/COLO
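
The release-or-checkpoint rule from the rationale, reduced to its simplest form. This is an illustrative sketch only: it assumes whole response packets are compared byte for byte, and colo_compare() and enum colo_verdict are hypothetical names, not part of the proposed module (a real comparison may need to tolerate fields that legitimately differ between the PVM and the SVM).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical verdicts for one pair of response packets. */
enum colo_verdict {
	COLO_RELEASE,		/* identical: release the packet to the client */
	COLO_CHECKPOINT,	/* diverged: request an on-demand VM checkpoint */
};

/* Compare one response packet from the PVM with the matching packet from
 * the SVM.  Naive byte-for-byte comparison, for illustration only. */
static enum colo_verdict colo_compare(const unsigned char *pvm, size_t pvm_len,
				      const unsigned char *svm, size_t svm_len)
{
	assert(pvm != NULL && svm != NULL);

	if (pvm_len != svm_len)
		return COLO_CHECKPOINT;
	if (memcmp(pvm, svm, pvm_len) != 0)
		return COLO_CHECKPOINT;
	return COLO_RELEASE;
}
```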

To capture the response packets from the PVM and SVM and determine whether
they are identical, we introduce a new kernel module called colo-proxy.

This document describes the design of the colo-proxy module.

## Glossary

  * PVM - Primary VM, which provides services to clients.
  * SVM - Secondary VM, a hot standby and replica of the PVM.
  * PN - Primary Node, the host on which the PVM runs.
  * SN - Secondary Node, the host on which the SVM runs.

## Network topology

= Normal =
[Diagram (normal operation): the client connects to [switch], which links to eth0 on both PN and SN. On PN, PVM's tap0 is attached to br0. PN eth1 and SN eth1 are connected through [forward], and PN eth2 and SN eth2 through [checkpoint]. SN has br0 and br1, with SVM's tap0 attached to br1.]
e.g.
PN:
br0: 192.168.0.33
eth1: 192.168.1.33
eth2: 192.168.2.33

SN:
br0: 192.168.0.88
br1: no ip address
eth1: 192.168.1.88
eth2: 192.168.2.88


== After failover 
[Diagram (after failover): PN is dead and its eth0, eth1 and eth2 links to [switch], [forward] and [checkpoint] are cut (X). SN is alive, and SVM's tap0 is now attached to SN's br0.]

## Network flow

### Receive packets from client (Input)

[Diagram (input path): client packets pass through [switch] to PN eth0 and br0, and on to PVM's tap0. A copy is forwarded from PN eth1 to SN eth1, where colo-proxy adjusts the client's ack before delivering the packet to SVM's tap0.]

  * colo-proxy on SN:
** Capture the first ack from client, find out the initial seq 
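
The "Adjust Client's ack" step in the input path can be illustrated as a sequence-number translation. This sketch assumes the adjustment is a constant offset between the initial sequence numbers (ISNs) the PVM and the SVM chose for the same TCP connection, with the PVM's ISN recovered from the client's first ack (which acknowledges the PVM's SYN-ACK, so it equals that ISN + 1). struct colo_conn and both helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-connection state kept by colo-proxy on the SN. */
struct colo_conn {
	uint32_t pvm_isn;	/* ISN chosen by the PVM (what the client sees) */
	uint32_t svm_isn;	/* ISN chosen by the SVM */
};

/* The client's first ack acknowledges the PVM's SYN-ACK, so it equals
 * pvm_isn + 1 (assumption: no data was carried on the SYN-ACK). */
static void colo_learn_pvm_isn(struct colo_conn *c, uint32_t first_client_ack)
{
	c->pvm_isn = first_client_ack - 1;
}

/* Translate an ack the client sent for the PVM's byte stream into the
 * SVM's sequence space.  uint32_t arithmetic wraps modulo 2^32, which
 * matches TCP sequence number arithmetic. */
static uint32_t colo_adjust_ack(const struct colo_conn *c, uint32_t client_ack)
{
	uint32_t offset = c->svm_isn - c->pvm_isn;

	return client_ack + offset;
}
```

Keeping the offset as an unsigned 32-bit difference means no special case is needed when either ISN is near the wrap-around point.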

Re: Problem with patch make nlmsg_end() and genlmsg_end() void

2015-06-18 Thread David Woodhouse
On Thu, 2015-06-11 at 23:03 +0100, David Woodhouse wrote:
 On Thu, 2015-06-11 at 01:31 +0100, David Woodhouse wrote:
  On Tue, 2015-06-09 at 17:49 -0700, Eric Dumazet wrote:
    I've added some debugging, and it seems that when it deadlocks, glibc
    doesn't get *any* response to its RTM_GETADDR request. I know we'd get
    ENOBUFS if a *response* was dropped... but what about when the request
    itself is dropped? ...
   
   Please check that this patch fixes your issue :
   
   http://patchwork.ozlabs.org/patch/473041/
  
  Looks likely; thanks. I'm running with that patch now. I haven't been
  able to quickly reproduce the problem on demand, but it usually happens
  within a day or two. So it'll be a few days at least before I call it a
  success.
 
 I just saw the same deadlock happen again; glibc's __check_pf() stuck
 in recvmsg() waiting for a response that never comes.
 
 This is the Fedora 22 4.0.5 kernel with the above patch applied.

It did at least manage to survive a single night (which it often
doesn't) if I also apply a version of this patch:
https://patchwork.ozlabs.org/patch/473049/

Even on the known problematic kernels, I have been unable to reproduce
this on demand using either my own threaded getaddrinfo() test program,
or the one you posted here.

-- 
David Woodhouse                            Open Source Technology Centre
david.woodho...@intel.com                              Intel Corporation




Re: [PATCH] net: fix search limit handling in skb_find_text()

2015-06-18 Thread David Miller
From: Roman I Khimov khi...@altell.ru
Date: Mon, 15 Jun 2015 12:11:58 +0300

 Suppose that we're trying to use an xt_string netfilter module to match a
 string in a specially crafted packet that has a nice string starting at
 offset 28.
 
 It could be done in iptables like this:
 
 -A some_chain -m string --string "a nice string" --algo bm --from 28 --to 38 -j DROP
 
 And it would work as expected. Now changing that to
 
 -A some_chain -m string --string "a nice string" --algo bm --from 29 --to 38 -j DROP
 
 breaks the match, as expected. But, if we try to make
 
 -A some_chain -m string --string "a nice string" --algo bm --from 20 --to 28 -j DROP
 
 then it suddenly works again! So the 'to' parameter seems to be inclusive, not
 working as an offset after which no search should be done. OK, now if we try:
 
 -A some_chain -m string --string "a nice string" --algo bm --from 28 --to 28 -j DROP
 
 it doesn't work. So, for the case of equal 'from' and 'to' it's treated in a
 different way.
 
 The first behaviour (matching at 'to' offset) comes from skb_find_text()
 comparison. The second one (not matching if 'from' and 'to' are equal) comes
 from skb_seq_read() check for (abs_offset >= st->upper_offset).
 
 I think that the way skb_find_text() handles 'to' is wrong and should be fixed
 so that we always have predictable behaviour -- only match before 'to' offset.
 
 There are currently only five usages of skb_find_text() in the kernel and it
 looks to me that none of them expect to match something at the 'to' offset,
 so probably this change is safe.
 
 Reported-by: Edward Makarov maka...@altell.ru
 Tested-by: Edward Makarov maka...@altell.ru
 Signed-off-by: Roman I Khimov khi...@altell.ru

Unfortunately any aspect of this exposed to userspace is pretty much locked
in place, and we can't change it without potentially breaking someone's
setup.  This has been this way for a long time, so the risk of breaking
things is very real.

I'm not applying this, sorry.
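
For reference, the two readings of 'to' discussed in the report can be modelled with a small userspace sketch. It is not skb_find_text(), which runs a textsearch algorithm over possibly non-linear skb data, and it does not model the separate 'from == to' case caused by skb_seq_read(). bounded_find() and its to_inclusive flag are hypothetical: true models the existing behaviour (a match starting exactly at 'to' is accepted), false models the proposed one.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return the offset of the first occurrence of pat that starts at or after
 * 'from', provided that start offset is within the 'to' limit; else -1.
 * As in the report, the limit applies to where the match starts, so the
 * pattern itself may extend past 'to'. */
static long bounded_find(const char *buf, size_t len, const char *pat,
			 size_t from, size_t to, bool to_inclusive)
{
	size_t plen = strlen(pat);

	for (size_t s = from; s + plen <= len; s++) {
		if (memcmp(buf + s, pat, plen) != 0)
			continue;
		/* first match found: accept it only if it starts in range */
		if (to_inclusive ? s <= to : s < to)
			return (long)s;
		return -1;
	}
	return -1;
}
```

With "a nice string" at offset 28, --from 20 --to 28 matches under the inclusive reading and not under the exclusive one; per the reply above, the inclusive behaviour is what stays.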
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/3 v4] net: track link-status of ipv4 nexthops

2015-06-18 Thread David Miller
From: Andy Gospodarek go...@cumulusnetworks.com
Date: Mon, 15 Jun 2015 12:33:19 -0400

 @@ -1107,9 +1107,10 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
   struct net_device *dev = netdev_notifier_info_to_dev(ptr);
   struct in_device *in_dev;
   struct net *net = dev_net(dev);
 + unsigned flags;

Please always fully spell out unsigned int instead of shortening it to
just unsigned, thanks.

 @@ -920,11 +926,17 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
   if (!nh->nh_dev)
   goto failure;
   } else {
 + int linkdown = 0;
   change_nexthops(fi) {

Please put an empty line between local variable declarations and
code.
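
The two style requests above (spell out `unsigned int`, and leave a blank line after the local declarations) look like this in practice. The function and flag names are hypothetical, a self-contained illustration rather than code from the patch:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NH_DEAD		0x1	/* hypothetical flag bits */
#define SKETCH_NH_LINKDOWN	0x2

/* Hypothetical helper: count nexthops whose flags have LINKDOWN set. */
static int count_linkdown(const unsigned int *nh_flags, size_t n)
{
	unsigned int flags;	/* "unsigned int", not bare "unsigned" */
	int linkdown = 0;
	size_t i;

	/* the blank line above separates declarations from code */
	assert(nh_flags != NULL || n == 0);
	for (i = 0; i < n; i++) {
		flags = nh_flags[i];
		if (flags & SKETCH_NH_LINKDOWN)
			linkdown++;
	}
	return linkdown;
}
```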


Re: [PATCH net-next 2/3 v4] net: ipv4 sysctl option to ignore routes when nexthop link is down

2015-06-18 Thread David Miller
From: Andy Gospodarek go...@cumulusnetworks.com
Date: Mon, 15 Jun 2015 12:33:20 -0400

 @@ -1035,12 +1036,18 @@ int fib_dump_info(struct sk_buff *skb, u32 portid, u32 seq, int event,
    nla_put_in_addr(skb, RTA_PREFSRC, fi->fib_prefsrc))
    goto nla_put_failure;
    if (fi->fib_nhs == 1) {
  + struct in_device *in_dev;
    if (fi->fib_nh->nh_gw &&
    nla_put_in_addr(skb, RTA_GATEWAY, fi->fib_nh->nh_gw))
    goto nla_put_failure;

Please put an empty line between local variable declarations and code.

 @@ -1057,11 +1064,17 @@ int fib_dump_info(struct sk_buff *skb, u32 portid, u32 seq, int event,
   goto nla_put_failure;
  
   for_nexthops(fi) {
 + struct in_device *in_dev;
   rtnh = nla_reserve_nohdr(skb, sizeof(*rtnh));
   if (!rtnh)
   goto nla_put_failure;

Likewise.


[PATCH] net/macb: add config for Atmel sama5d2 SoCs

2015-06-18 Thread Nicolas Ferre
From: Cyrille Pitchen cyrille.pitc...@atmel.com

Add the compatible string for Atmel sama5d2 SoC family as the configuration
options differ from other instances of the GEM.

Signed-off-by: Cyrille Pitchen cyrille.pitc...@atmel.com
Signed-off-by: Nicolas Ferre nicolas.fe...@atmel.com
---
 drivers/net/ethernet/cadence/macb.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 740d04fd2223..caeb39561567 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -2713,6 +2713,13 @@ static const struct macb_config pc302gem_config = {
.init = macb_init,
 };
 
+static const struct macb_config sama5d2_config = {
+   .caps = 0,
+   .dma_burst_length = 16,
+   .clk_init = macb_clk_init,
+   .init = macb_init,
+};
+
 static const struct macb_config sama5d3_config = {
.caps = MACB_CAPS_SG_DISABLED | MACB_CAPS_GIGABIT_MODE_AVAILABLE,
.dma_burst_length = 16,
@@ -2756,6 +2763,7 @@ static const struct of_device_id macb_dt_ids[] = {
    { .compatible = "cdns,macb" },
    { .compatible = "cdns,pc302-gem", .data = &pc302gem_config },
    { .compatible = "cdns,gem", .data = &pc302gem_config },
+   { .compatible = "atmel,sama5d2-gem", .data = &sama5d2_config },
    { .compatible = "atmel,sama5d3-gem", .data = &sama5d3_config },
    { .compatible = "atmel,sama5d4-gem", .data = &sama5d4_config },
    { .compatible = "cdns,at91rm9200-emac", .data = &emac_config },
-- 
2.1.3
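
The patch adds one row to macb_dt_ids[], the table the driver uses to map a Device Tree compatible string to its struct macb_config. A rough userspace sketch of that lookup, with simplified, hypothetical struct and function names; the kernel itself uses struct of_device_id and helpers such as of_match_node(), which also handle lists of compatible strings. The caps value for sama5d3 is a placeholder, not the real flag bits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for struct macb_config / struct of_device_id. */
struct cfg_sketch {
	unsigned int caps;
	int dma_burst_length;
};

struct match_sketch {
	const char *compatible;
	const struct cfg_sketch *data;	/* NULL: use driver defaults */
};

static const struct cfg_sketch sama5d2_cfg = { .caps = 0, .dma_burst_length = 16 };
/* caps = 0x3 is a placeholder for the sama5d3 capability bits */
static const struct cfg_sketch sama5d3_cfg = { .caps = 0x3, .dma_burst_length = 16 };

static const struct match_sketch dt_ids_sketch[] = {
	{ .compatible = "cdns,macb" },
	{ .compatible = "atmel,sama5d2-gem", .data = &sama5d2_cfg },
	{ .compatible = "atmel,sama5d3-gem", .data = &sama5d3_cfg },
	{ NULL, NULL },			/* sentinel, like the kernel's { } */
};

/* Return the table entry whose compatible string matches, or NULL. */
static const struct match_sketch *match_compatible(const char *compat)
{
	const struct match_sketch *m;

	assert(compat != NULL);
	for (m = dt_ids_sketch; m->compatible; m++)
		if (strcmp(m->compatible, compat) == 0)
			return m;
	return NULL;
}
```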



Re: [PATCH net] bridge: fix br_stp_set_bridge_priority race conditions

2015-06-18 Thread David Miller
From: Nikolay Aleksandrov ra...@blackwall.org
Date: Mon, 15 Jun 2015 20:28:51 +0300

 After the ->set() spinlocks were removed br_stp_set_bridge_priority
 was left running without any protection when used via sysfs. It can
 race with port add/del and could result in use-after-free cases and
 corrupted lists. Tested by running port add/del in a loop with stp
 enabled while setting priority in a loop, crashes are easily
 reproducible.
 The spinlocks around sysfs ->set() were removed in commit:
 14f98f258f19 (bridge: range check STP parameters)
 There's also a race condition in the netlink priority support that is
 fixed by this change, but it was introduced recently and the fixes tag
 covers it, just in case it's needed the commit is:
 af615762e972 (bridge: add ageing_time, stp_state, priority over netlink)
 
 Signed-off-by: Nikolay Aleksandrov ra...@blackwall.org
 Fixes: 14f98f258f19 (bridge: range check STP parameters)

Applied and queued up for -stable, thanks.
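
The race described above is the familiar pattern of a setter walking a list that another path modifies without a common lock. A minimal userspace analogue, with a pthread mutex standing in for the bridge lock. The structures and function names are hypothetical and this is not the code that was applied; it only shows both paths serialising on the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_PORTS 8

/* Hypothetical bridge: the priority and the port list share one lock. */
struct bridge_sketch {
	pthread_mutex_t lock;
	unsigned int priority;
	int nports;
	unsigned int port_prio[MAX_PORTS];
};

/* Port add takes the same lock, so it cannot run during a list walk. */
static int br_add_port_sketch(struct bridge_sketch *br)
{
	int ret = -1;

	pthread_mutex_lock(&br->lock);
	if (br->nports < MAX_PORTS) {
		br->port_prio[br->nports++] = br->priority;
		ret = 0;
	}
	pthread_mutex_unlock(&br->lock);
	return ret;
}

/* Priority update holds the lock for the whole walk over the ports. */
static void br_set_priority_sketch(struct bridge_sketch *br, unsigned int prio)
{
	int i;

	pthread_mutex_lock(&br->lock);
	br->priority = prio;
	for (i = 0; i < br->nports; i++)
		br->port_prio[i] = prio;
	pthread_mutex_unlock(&br->lock);
}
```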


[PATCH RFC] tun, macvtap: higher order allocations for skbs

2015-06-18 Thread Michael S. Tsirkin
Needs more testing. Anyone see anything wrong with this?

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/net/macvtap.c | 2 +-
 drivers/net/tun.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 928f3f4..80e87e4 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -610,7 +610,7 @@ static inline struct sk_buff *macvtap_alloc_skb(struct sock *sk, size_t prepad,
linear = len;
 
skb = sock_alloc_send_pskb(sk, prepad + linear, len - linear, noblock,
-  &err, 0);
+  &err, 1);
if (!skb)
return NULL;
 
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index cb376b2d..8f2f1e5 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1069,7 +1069,7 @@ static struct sk_buff *tun_alloc_skb(struct tun_file *tfile,
linear = len;
 
skb = sock_alloc_send_pskb(sk, prepad + linear, len - linear, noblock,
-  &err, 0);
+  &err, 1);
if (!skb)
return ERR_PTR(err);
 
-- 
MST
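
In kernels of this era the last argument of sock_alloc_send_pskb() is max_page_order, so the change allows the paged part of the skb to be built from order-1 (two-page) chunks instead of single pages. A back-of-the-envelope sketch of the effect on the fragment count, assuming 4 KiB pages and that every chunk succeeds at the maximum order; frags_needed() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

#define PAGE_SIZE_SKETCH 4096UL	/* assumption: 4 KiB pages */

/* Page fragments needed for data_len bytes of paged data, assuming each
 * chunk is allocated at max_page_order.  If the allocator has to fall back
 * to smaller orders under memory pressure, more fragments are needed. */
static unsigned long frags_needed(unsigned long data_len, int max_page_order)
{
	unsigned long chunk;

	assert(max_page_order >= 0);
	chunk = PAGE_SIZE_SKETCH << max_page_order;
	return (data_len + chunk - 1) / chunk;
}
```

For a 64 KiB payload this halves the count from 16 to 8 fragments; the trade-off is that order-1 allocations are more likely to fail on a fragmented system.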


Re: [PATCH] net: stmmac: dwmac-rk: Don't add function name in info or err messages

2015-06-18 Thread David Miller
From: Romain Perier romain.per...@gmail.com
Date: Mon, 15 Jun 2015 17:44:19 +

 This kind of information is only useful for debugging and should not be
 displayed in normal module messages.
 
 Signed-off-by: Romain Perier romain.per...@gmail.com

Applied.


Re: [PATCH v2 1/3] net: mvneta: introduce compatible string marvell, armada-xp-neta

2015-06-18 Thread Thomas Petazzoni
Dear Jason Cooper,

On Wed, 17 Jun 2015 21:39:26 +, Jason Cooper wrote:

 Odd, I'd use that as an example of the process working.  ;-)  we have
 everyone using 'armada-370-neta' for a given block.  We discovered that
 the original IP block (on the 370s) had a limitation (no hw checksum
 for greater than 1600 bytes).  A newer version of the IP block (XP)
 doesn't have the limitation.
 
 So we change the driver to honor the limit for the 370 compatible
 string.  We create a new compatible string for xp where the block
 doesn't have the limitation.
 
 How did the process fail?

Because now all Armada XP users of jumbo frames are losing the HW
checksum on their jumbo frames, which you can consider to be a
regression: it was working, and now it is no longer working.

Of course, since it falls back to SW checksumming, it still works,
but some users can complain of the performance penalty and consider it
to be a regression.

If, on Armada XP, we had used from the beginning:

compatible = "marvell,armada-xp-neta", "marvell,armada-370-neta";

with only "marvell,armada-370-neta" supported originally, we could have
added this fix without breaking HW checksumming on jumbo frames for
Armada XP users.
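
The argument rests on how a compatible list is matched: the string listed first in the node takes priority, so a specific-then-generic list lets an old driver fall back to the generic entry while a newer driver recognises the specific block. A simplified sketch of that rule; the table type and match_first() are hypothetical, and this only roughly models the kernel's matching (which scores candidates rather than scanning linearly). The 1600-byte limit comes from the discussion above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical driver table entry: compatible string plus the largest
 * frame size for which HW checksumming may be used (-1: no limit). */
struct neta_match_sketch {
	const char *compatible;
	int hw_csum_max;
};

/* Return the driver entry for the first string in the node's compatible
 * list that the driver knows, or NULL.  Earlier node strings win. */
static const struct neta_match_sketch *
match_first(const char *const *node_compat, size_t ncompat,
	    const struct neta_match_sketch *tbl, size_t ntbl)
{
	size_t i, j;

	for (i = 0; i < ncompat; i++)
		for (j = 0; j < ntbl; j++)
			if (strcmp(node_compat[i], tbl[j].compatible) == 0)
				return &tbl[j];
	return NULL;
}
```

In the test, an XP board whose old DTB lists only the 370 string picks up the 1600-byte limit from a newer driver, which is the regression described here.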

So I'm sorry, but the process indeed failed, because Armada XP users
keeping their old Device Tree blob will see a regression.

 I'm not seeing where backwards compatibility was broken?  A device with
 an old dtb booting a newer kernel gets a bugfix.  In the case of an XP
 board with an old dtb (armada-370-neta), the hardware still works, but
 not optimally.  Upgrading the dtb will enable hw checksumming for jumbo
 packets.

"Not optimally" is still a breakage.

Again, I personally don't care about DT backward compatibility as I
think it's a stupid requirement. But I like to point out to the
DT backward compatibility fanatics when it was actually broken :-)

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com


Re: [PATCH v2 net-next 0/3] bpf: share helpers between tracing and networking

2015-06-18 Thread Daniel Borkmann

On 06/16/2015 07:10 PM, Alexei Starovoitov wrote:
...

Ideally we would allow a blend of tracing and networking programs,
then the best solution would be one or two stable tracepoints in
networking stack where skb is visible and receiving/transmitting task
is also visible, then skb->len and task->pid together would give a nice
foundation for accurate stats.


I think combining both seems interesting anyway, we need to find
a way to make this gluing of both worlds easy to use, though. It's
certainly interesting for stats/diagnostics, but one wouldn't be
able to use the current/future skb eBPF helpers from {cls,act}_bpf
in that context.


[PATCH v2] bpf: fix a bug in verification logic when SUB operation taken on FRAME_PTR

2015-06-18 Thread Wang Nan
The original code has a problem that causes the following code to fail verification:

 r1 <- r10
 r1 -= 8
 r2 = 8
 r3 = unsafe pointer
 call BPF_FUNC_probe_read  <-- R1 type=inv expected=fp

However, if 'r1 -= 8' is replaced with 'r1 += -8', the above program
loads successfully.

This is because the verifier only allows a BPF_ADD instruction on a
FRAME_PTR register to produce a PTR_TO_STACK register, while BPF_SUB
on a FRAME_PTR register yields an UNKNOWN_VALUE register.

This patch fixes it by adding BPF_SUB to the stack_relative check.

Signed-off-by: Wang Nan wangn...@huawei.com
---

V1 is incorrect. Please ignore it and consider this one.

---
 kernel/bpf/verifier.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a251cf6..681ac72 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1020,7 +1020,8 @@ static int check_alu_op(struct reg_state *regs, struct bpf_insn *insn)
}
 
/* pattern match 'bpf_add Rx, imm' instruction */
-   if (opcode == BPF_ADD && BPF_CLASS(insn->code) == BPF_ALU64 &&
+   if ((opcode == BPF_ADD || opcode == BPF_SUB) &&
+   BPF_CLASS(insn->code) == BPF_ALU64 &&
    regs[insn->dst_reg].type == FRAME_PTR &&
    BPF_SRC(insn->code) == BPF_K)
stack_relative = true;
-- 
1.8.3.4
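
A hypothetical, much-simplified model of the check being patched: it tracks only whether a register holds the frame pointer, a stack pointer with an offset, or an unknown value. For "Rx op= imm" on the frame pointer it produces a stack pointer only for the accepted opcodes; allow_sub switches between the old rule (ADD only) and the rule after this patch. The model also makes explicit that a SUB has to record the negated immediate as the stack offset, so 'r1 -= 8' and 'r1 += -8' end up equivalent. None of these names are the verifier's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified register model. */
enum reg_type_sketch { UNKNOWN_VALUE_S, FRAME_PTR_S, PTR_TO_STACK_S };
enum alu_op_sketch { OP_ADD, OP_SUB, OP_MUL };

struct reg_sketch {
	enum reg_type_sketch type;
	int off;		/* offset from the frame pointer (stack ptrs only) */
};

/* Apply "dst op= imm" to a 64-bit register.  allow_sub == false mimics
 * the old check (only ADD recognised); true mimics the fixed one. */
static void alu_imm(struct reg_sketch *dst, enum alu_op_sketch op, int imm,
		    bool allow_sub)
{
	bool stack_relative = dst->type == FRAME_PTR_S &&
			      (op == OP_ADD || (allow_sub && op == OP_SUB));

	if (stack_relative) {
		dst->type = PTR_TO_STACK_S;
		/* a SUB must record the negated immediate as the offset */
		dst->off = (op == OP_ADD) ? imm : -imm;
	} else {
		/* anything else loses pointer tracking */
		dst->type = UNKNOWN_VALUE_S;
		dst->off = 0;
	}
}
```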



[PATCH] bpf: fix a bug in verification logic when SUB operation taken on FRAME_PTR

2015-06-18 Thread Wang Nan
The original code has a problem that causes the following code to fail verification:

 r1 <- r10
 r1 -= 8
 r2 = 8
 r3 = unsafe pointer
 call BPF_FUNC_probe_read  <-- R1 type=inv expected=fp

However, if 'r1 -= 8' is replaced with 'r1 += -8', the above program
loads successfully.

This is because the verifier only allows a BPF_ADD instruction on a
FRAME_PTR register to produce a PTR_TO_STACK register, while BPF_SUB
on a FRAME_PTR register yields an UNKNOWN_VALUE register.

This patch fixes it by adding BPF_SUB to the stack_relative check.

Signed-off-by: Wang Nan wangn...@huawei.com
---
 kernel/bpf/verifier.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a251cf6..6dbdeba 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1020,7 +1020,8 @@ static int check_alu_op(struct reg_state *regs, struct bpf_insn *insn)
}
 
/* pattern match 'bpf_add Rx, imm' instruction */
-   if (opcode == BPF_ADD && BPF_CLASS(insn->code) == BPF_ALU64 &&
+   if (opcode == BPF_ADD && opcode == BPF_SUB &&
+   BPF_CLASS(insn->code) == BPF_ALU64 &&
    regs[insn->dst_reg].type == FRAME_PTR &&
    BPF_SRC(insn->code) == BPF_K)
stack_relative = true;
-- 
1.8.3.4



Re: [PATCH 08/11] IB/cma: Add net_dev and private data checks to RDMA CM

2015-06-18 Thread Haggai Eran
On 17/06/2015 20:18, Jason Gunthorpe wrote:
 On Tue, Jun 16, 2015 at 08:26:26AM +0300, Haggai Eran wrote:
 On 15/06/2015 20:08, Jason Gunthorpe wrote:
 On Mon, Jun 15, 2015 at 11:47:13AM +0300, Haggai Eran wrote:
 Instead of relying on the ib_cm module to check an incoming CM request's
 private data header, add these checks to the RDMA CM module. This allows a
 following patch to clean up the ib_cm interface and remove the code that
 looks into the private headers. It will also allow supporting namespaces in
 RDMA CM by making these checks namespace aware later on.

 I was expecting one of these patches to flow the net_device from here:

 +static struct net_device *cma_get_net_dev(struct ib_cm_event *ib_event,
 +const struct cma_req_info *req)
 +{

 Down through cma_req_handler and cma_new_conn_id so that we get rid of
 the cma_translate_addr on the ingress side.

 Having the ingress side use one ingress net_device for all processing
 seems very important to me...

 Is it really very important? I thought the bound_dev_if of a passive
 connection id is only used by the netlink statistics mechanism.
 
 I mean 'very important' in the sense it makes the RDMA-CM *make
 logical sense*, not so much in the 'can user space tell'.
 
 So yes, cleaning this seems very important to establish that logical
 narrative of how the packet flows through this code.
 
 Plus, there is an init_net in the cma_translate_addr path that needs to
 be addressed - so purging cma_translate_addr is a great way to handle
 that. That would leave only the call in rdma_bind_addr, and for bind,
 the process net namespace is the correct thing to use.
Okay, I'll add a patch that cleans these cma_translate_addr calls.




[PATCH] net/phy: Add support for Realtek RTL8211F

2015-06-18 Thread Shengzhou Liu
RTL8211F has different register definitions from RTL8211E.
In particular, TXDLY needs to be enabled in RGMII mode.

Signed-off-by: Shengzhou Liu shengzhou@freescale.com
---
 drivers/net/phy/realtek.c | 68 ++-
 1 file changed, 67 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index 96a0f0f..4535361 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -22,8 +22,12 @@
 #define RTL821x_INER   0x12
 #define RTL821x_INER_INIT  0x6400
 #define RTL821x_INSR   0x13
+#define RTL8211E_INER_LINK_STATUS 0x400
 
-#define RTL8211E_INER_LINK_STATUS   0x400
+#define RTL8211F_INER_LINK_STATUS 0x0010
+#define RTL8211F_INSR  0x1d
+#define RTL8211F_PAGE_SELECT   0x1f
+#define RTL8211F_TX_DELAY  0x100
 
 MODULE_DESCRIPTION("Realtek PHY driver");
 MODULE_AUTHOR("Johnson Leung");
@@ -38,6 +42,18 @@ static int rtl821x_ack_interrupt(struct phy_device *phydev)
return (err < 0) ? err : 0;
 }
 
+static int rtl8211f_ack_interrupt(struct phy_device *phydev)
+{
+   int err;
+
+   phy_write(phydev, RTL8211F_PAGE_SELECT, 0xa43);
+   err = phy_read(phydev, RTL8211F_INSR);
+   /* restore to default page 0 */
+   phy_write(phydev, RTL8211F_PAGE_SELECT, 0x0);
+
+   return (err < 0) ? err : 0;
+}
+
 static int rtl8211b_config_intr(struct phy_device *phydev)
 {
int err;
@@ -64,6 +80,41 @@ static int rtl8211e_config_intr(struct phy_device *phydev)
return err;
 }
 
+static int rtl8211f_config_intr(struct phy_device *phydev)
+{
+   int err;
+
+   if (phydev->interrupts == PHY_INTERRUPT_ENABLED)
+   err = phy_write(phydev, RTL821x_INER,
+   RTL8211F_INER_LINK_STATUS);
+   else
+   err = phy_write(phydev, RTL821x_INER, 0);
+
+   return err;
+}
+
+static int rtl8211f_config_init(struct phy_device *phydev)
+{
+   int ret;
+   u16 reg;
+
+   ret = genphy_config_init(phydev);
+   if (ret < 0)
+   return ret;
+
+   if (phydev->interface == PHY_INTERFACE_MODE_RGMII) {
+   /* enable TXDLY */
+   phy_write(phydev, RTL8211F_PAGE_SELECT, 0xd08);
+   reg = phy_read(phydev, 0x11);
+   reg |= RTL8211F_TX_DELAY;
+   phy_write(phydev, 0x11, reg);
+   /* restore to default page 0 */
+   phy_write(phydev, RTL8211F_PAGE_SELECT, 0x0);
+   }
+
+   return 0;
+}
+
 static struct phy_driver realtek_drvs[] = {
{
.phy_id = 0x8201,
@@ -98,6 +149,20 @@ static struct phy_driver realtek_drvs[] = {
.suspend= genphy_suspend,
.resume = genphy_resume,
.driver = { .owner = THIS_MODULE,},
+   }, {
+   .phy_id = 0x001cc916,
+   .name   = "RTL8211F Gigabit Ethernet",
+   .phy_id_mask= 0x001f,
+   .features   = PHY_GBIT_FEATURES,
+   .flags  = PHY_HAS_INTERRUPT,
+   .config_aneg= genphy_config_aneg,
+   .config_init= rtl8211f_config_init,
+   .read_status= genphy_read_status,
+   .ack_interrupt  = rtl8211f_ack_interrupt,
+   .config_intr= rtl8211f_config_intr,
+   .suspend= genphy_suspend,
+   .resume = genphy_resume,
+   .driver = { .owner = THIS_MODULE },
},
 };
 
@@ -106,6 +171,7 @@ module_phy_driver(realtek_drvs);
 static struct mdio_device_id __maybe_unused realtek_tbl[] = {
{ 0x001cc912, 0x001f },
{ 0x001cc915, 0x001f },
+   { 0x001cc916, 0x001f },
{ }
 };
 
-- 
2.1.0.27.g96db324
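
Both new RTL8211F functions use the same paged-register idiom: write a page number to RTL8211F_PAGE_SELECT (0x1f), access a register in that page, then restore page 0. A userspace sketch against a fake register file, assuming (as the patch implies) that registers below 0x1f are banked per page. struct fake_phy and its helpers are hypothetical stand-ins for phy_read()/phy_write().

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SELECT	0x1f	/* RTL8211F_PAGE_SELECT in the patch */
#define TX_DELAY	0x100	/* RTL8211F_TX_DELAY in the patch */
#define NPAGES		0x1000	/* enough for pages 0xa43 and 0xd08 */

/* Hypothetical PHY model: 32 registers per page, page chosen via 0x1f. */
struct fake_phy {
	uint16_t page;
	uint16_t regs[NPAGES][32];
};

static int fake_read(struct fake_phy *p, int reg)
{
	assert(reg >= 0 && reg < 32);
	if (reg == PAGE_SELECT)
		return p->page;
	return p->regs[p->page][reg];
}

static void fake_write(struct fake_phy *p, int reg, uint16_t val)
{
	assert(reg >= 0 && reg < 32);
	if (reg == PAGE_SELECT) {
		assert(val < NPAGES);
		p->page = val;
	} else {
		p->regs[p->page][reg] = val;
	}
}

/* Read-modify-write of register 0x11 in page 0xd08, as
 * rtl8211f_config_init() does for RGMII, then back to page 0. */
static void enable_txdly(struct fake_phy *p)
{
	uint16_t reg;

	fake_write(p, PAGE_SELECT, 0xd08);
	reg = (uint16_t)fake_read(p, 0x11);
	reg |= TX_DELAY;
	fake_write(p, 0x11, reg);
	fake_write(p, PAGE_SELECT, 0x0);	/* restore default page 0 */
}
```

Restoring page 0 afterwards matters because generic code such as genphy_read_status() assumes the standard registers are visible.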



Re: [PATCH net-next] switchdev: fdb filter_dev is always NULL for self (device), so remove check

2015-06-18 Thread Jiri Pirko
Thu, Jun 18, 2015 at 01:08:31AM CEST, sfel...@gmail.com wrote:
From: Scott Feldman sfel...@gmail.com

Remove the filter_dev check when dumping fdb entries; otherwise the dump
returns an empty list.  filter_dev is always passed as NULL when dumping fdbs
on SELF.  We want the fdbs installed on the device to be listed in the
dump.

Signed-off-by: Scott Feldman sfel...@gmail.com

Acked-by: Jiri Pirko j...@resnulli.us


Re: [PATCH net] Revert tcp: switch tcp_fastopen key generation to net_get_random_once

2015-06-18 Thread Hannes Frederic Sowa
Hello Christoph,

On Wed, 2015-06-17 at 17:28 -0700, Christoph Paasch wrote:
 This reverts commit 222e83d2e0aecb6a5e8d42b1a8d51332a1eba960.
 
 tcp_fastopen_reset_cipher really cannot be called from interrupt
 context. It allocates the tcp_fastopen_context with GFP_KERNEL and
 calls crypto_alloc_cipher, which allocates all kinds of stuff with
 GFP_KERNEL.
 
 Thus, we might sleep when the key-generation is triggered by an
 incoming TFO cookie-request which would then happen in interrupt-
 context, as shown by enabling CONFIG_DEBUG_ATOMIC_SLEEP:
 
 [   36.001813] BUG: sleeping function called from invalid context at mm/slub.c:1266
 [   36.003624] in_atomic(): 1, irqs_disabled(): 0, pid: 1016, name: packetdrill
 [   36.004859] CPU: 1 PID: 1016 Comm: packetdrill Not tainted 4.1.0-rc7 #14
 [   36.006085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
 [   36.008250]  04f2 88007f8838a8 8171d53a 880075a084a8
 [   36.009630]  880075a08000 88007f8838c8 810967d3 88007f883928
 [   36.011076]   88007f8838f8 81096892 88007f89be00
 [   36.012494] Call Trace:
 [   36.012953]  IRQ  [8171d53a] dump_stack+0x4f/0x6d
 [   36.014085]  [810967d3] ___might_sleep+0x103/0x170
 [   36.015117]  [81096892] __might_sleep+0x52/0x90
 [   36.016117]  [8118e887] kmem_cache_alloc_trace+0x47/0x190
 [   36.017266]  [81680d82] ? tcp_fastopen_reset_cipher+0x42/0x130
 [   36.018485]  [81680d82] tcp_fastopen_reset_cipher+0x42/0x130
 [   36.019679]  [81680f01] tcp_fastopen_init_key_once+0x61/0x70
 [   36.020884]  [81680f2c] __tcp_fastopen_cookie_gen+0x1c/0x60
 [   36.022058]  [816814ff] tcp_try_fastopen+0x58f/0x730
 [   36.023118]  [81671788] tcp_conn_request+0x3e8/0x7b0
 [   36.024185]  [810e3872] ? __module_text_address+0x12/0x60
 [   36.025327]  [8167b2e1] tcp_v4_conn_request+0x51/0x60
 [   36.026410]  [816727e0] tcp_rcv_state_process+0x190/0xda0
 [   36.027556]  [81661f97] ? __inet_lookup_established+0x47/0x170
 [   36.028784]  [8167c2ad] tcp_v4_do_rcv+0x16d/0x3d0
 [   36.029832]  [812e6806] ? security_sock_rcv_skb+0x16/0x20
 [   36.030936]  [8167cc8a] tcp_v4_rcv+0x77a/0x7b0
 [   36.031875]  [816af8c3] ? iptable_filter_hook+0x33/0x70
 [   36.032953]  [81657d22] ip_local_deliver_finish+0x92/0x1f0
 [   36.034065]  [81657f1a] ip_local_deliver+0x9a/0xb0
 [   36.035069]  [81657c90] ? ip_rcv+0x3d0/0x3d0
 [   36.035963]  [81657569] ip_rcv_finish+0x119/0x330
 [   36.036950]  [81657ba7] ip_rcv+0x2e7/0x3d0
 [   36.037847]  [81610652] __netif_receive_skb_core+0x552/0x930
 [   36.038994]  [81610a57] __netif_receive_skb+0x27/0x70
 [   36.040033]  [81610b72] process_backlog+0xd2/0x1f0
 [   36.041025]  [81611482] net_rx_action+0x122/0x310
 [   36.042007]  [81076743] __do_softirq+0x103/0x2f0
 [   36.042978]  [81723e3c] do_softirq_own_stack+0x1c/0x30
 
 There does not seem to be a better way to handle this. We could try
 to make the call to kmalloc and crypto_alloc_cipher during bootup, and
 then generate the random value only on-the-fly (when the first TFO-SYN
 comes in) with net_get_random_once in order to have the better entropy
 that comes with doing the late initialisation of the random value. But
 that's probably net-next material.

Can't we simply move the net_get_random_once to the TCP_FASTOPEN setsockopt and
sendmsg(MSG_FASTOPEN) path, so those allocations still happen in process context
but we still defer the extraction of entropy as long as possible?

Thanks,
Hannes



[PATCH ipv6 0/1] ipv6: addrconf: routes are not deleted if last ipv6 address is removed

2015-06-18 Thread Mazhar Rana
Hi,

After 'commit 876fd05ddbae03166e7037fca957b55bb3be6594
(ipv6: don't disable interface if last ipv6 address is removed)',
the kernel no longer clears the IPv6 interface configuration (routes,
neighbours, etc.) when the last IPv6 address of an interface is removed.

This causes a functional problem in the deployment below.

On Ubuntu 14.04 (upgraded to Linux kernel 3.19):
eth1 GW1: 2604:2000:7000:2::102
eth0 GW2: 2001:df7:6000:101::1b:102

HostA: 3804:3000:1406:2::102 (reachable via GW1 and GW2 both)

In this deployment, HostA is reachable via both eth0 and eth1. I want
all traffic for HostA to go via GW1, which is reachable on
link eth1.

$ ip -6 ro s
2001:df7:6000:101::/64 dev eth0  proto kernel  metric 256 
2604:2000:7000:2::/64 dev eth1  proto kernel  metric 256 
3804:3000:1406:2::/64 via 2604:2000:7000:2::102 dev eth1  metric 1024 
fe80::/64 dev eth0  proto kernel  metric 256 
fe80::/64 dev eth1  proto kernel  metric 256 
default via 2001:df7:6000:101::1b:102 dev eth0  proto static  metric 1 

When GW1 failed, I removed all IPv6 addresses from eth1 so that all
traffic would go through the default gateway, GW2.

$ sudo ip -6 addr flush dev eth1
$ ip -6 ro s
2001:df7:6000:101::/64 dev eth0  proto kernel  metric 256 
3804:3000:1406:2::/64 via 2604:2000:7000:2::102 dev eth1  metric 1024 
fe80::/64 dev eth0  proto kernel  metric 256 
fe80::/64 dev eth0.100  proto kernel  metric 256 
default via 2001:df7:6000:101::1b:102 dev eth0  proto static  metric 1

However, the route for HostA is not deleted, so traffic for HostA
still tries to go through GW1, which is no longer reachable.
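Why the stale route keeps winning: IPv6 route lookup picks the most specific matching prefix first, and the metric only breaks ties between routes with the same prefix, so the /64 via eth1 beats the default route (::/0) regardless of metric 1024 vs 1. A minimal user-space sketch of longest-prefix match, assuming a hypothetical struct route6 table in which a "valid" flag stands in for route deletion:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

struct route6 {
	const char *prefix; /* textual IPv6 prefix */
	int plen;           /* prefix length in bits */
	const char *dev;    /* outgoing device */
	int valid;          /* 0 once the route has been deleted */
};

/* Return 1 if the first plen bits of a and b are equal. */
static int prefix_match(const unsigned char *a, const unsigned char *b, int plen)
{
	int full = plen / 8, rem = plen % 8;

	if (memcmp(a, b, full) != 0)
		return 0;
	if (rem == 0)
		return 1;
	unsigned char mask = (unsigned char)(0xFF << (8 - rem));
	return (a[full] & mask) == (b[full] & mask);
}

/* Longest-prefix match: the most specific valid route wins. */
const struct route6 *lookup6(const struct route6 *tbl, size_t n, const char *dst)
{
	unsigned char d[16], p[16];
	const struct route6 *best = NULL;

	if (inet_pton(AF_INET6, dst, d) != 1)
		return NULL;
	for (size_t i = 0; i < n; i++) {
		if (!tbl[i].valid || inet_pton(AF_INET6, tbl[i].prefix, p) != 1)
			continue;
		if (prefix_match(d, p, tbl[i].plen) &&
		    (!best || tbl[i].plen > best->plen))
			best = &tbl[i];
	}
	return best;
}
```

With the table from the report, 3804:3000:1406:2::102 resolves to the eth1 /64 route until that entry is marked invalid; only then does it fall through to the default route via eth0.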

If 'commit 876fd05ddbae03166e7037fca957b55bb3be6594
(ipv6: don't disable interface if last ipv6 address is removed)'
was made only to fix the problem described in its changelog, then
here is an alternative proposal that addresses both issues.

Do you see any side effects of this proposal?

Mazhar Rana (1):
  ipv6: addrconf: do addrconf_ifdown when last ipv6 address is removed

 net/ipv6/addrconf.c | 2 ++
 1 file changed, 2 insertions(+)

-- 
1.9.1



[PATCH ipv6 1/1] ipv6: addrconf: do addrconf_ifdown when last ipv6 address is removed

2015-06-18 Thread Mazhar Rana
After 'commit 876fd05ddbae03166e7037fca957b55bb3be6594
(ipv6: don't disable interface if last ipv6 address is removed)',
the kernel no longer clears the IPv6 interface configuration (routes,
neighbours, etc.) when the last IPv6 address of an interface is removed.

This patch calls addrconf_ifdown() when the last IPv6 address of an
interface is removed, to clear the interface's IPv6 configuration. It
does not delete the interface's directory under /proc/sys/net/ipv6/conf/.
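The intended effect can be modelled in a few lines of user-space C. This is a hypothetical simplification, not the addrconf code: struct iface6, struct rt_entry, addr_del() and ifdown_keep_conf() are made-up names, and ifdown_keep_conf() only stands in for addrconf_ifdown(dev, 0), which per the description above clears routes but leaves the per-interface configuration in place.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified per-device IPv6 state. */
struct iface6 {
	const char *name;
	int naddrs;       /* IPv6 addresses still configured */
	int conf_present; /* stands in for the per-interface sysctl directory */
};

struct rt_entry {
	const char *dev;
	int valid;
};

/* Analogue of addrconf_ifdown(dev, 0): drop routes bound to the
 * device but keep the per-device configuration. */
static void ifdown_keep_conf(const struct iface6 *ifc, struct rt_entry *rt, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (rt[i].valid && strcmp(rt[i].dev, ifc->name) == 0)
			rt[i].valid = 0;
	/* conf_present is deliberately left untouched */
}

/* Analogue of the patched inet6_addr_del(): remove one address and,
 * if it was the last one, clear the device's routes. */
void addr_del(struct iface6 *ifc, struct rt_entry *rt, size_t n)
{
	assert(ifc->naddrs > 0);
	ifc->naddrs--;
	if (ifc->naddrs == 0)
		ifdown_keep_conf(ifc, rt, n);
}
```

Removing the first of two addresses leaves eth1's routes alone; removing the last one invalidates them, while routes on other devices and the configuration flag survive.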

Signed-off-by: Mazhar Rana mazhar.r...@cyberoam.com
Acked-by: Sanket Shah sanket.s...@cyberoam.com
---
 net/ipv6/addrconf.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 37b70e8..230452c 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2678,6 +2678,8 @@ static int inet6_addr_del(struct net *net, int ifindex, u32 ifa_flags,
 				ipv6_mc_config(net->ipv6.mc_autojoin_sk,
 					       false, pfx, dev->ifindex);
 			}
+			if (list_empty(&idev->addr_list))
+				addrconf_ifdown(idev->dev, 0);
 			return 0;
 		}
 	}
-- 
1.9.1



Re: [PATCH net-next] x_table: align per cpu xt_counter

2015-06-18 Thread David Miller
From: Eric Dumazet eric.duma...@gmail.com
Date: Mon, 15 Jun 2015 18:10:13 -0700

 From: Eric Dumazet eduma...@google.com
 
 Let's force a 16 bytes alignment on xt_counter percpu allocations,
 so that bytes and packets sit in same cache line.
 
 xt_counter being exported to user space, we cannot add __align(16) on
 the structure itself.
 
 Signed-off-by: Eric Dumazet eduma...@google.com
 Cc: Florian Westphal f...@strlen.de

Pablo, I assume you will take this.
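A small user-space sketch of the arithmetic behind the change, assuming 64-byte cache lines: struct counters mirrors the two 64-bit fields of struct xt_counters, and because the exported type itself cannot carry an alignment attribute, the 16-byte alignment is requested at allocation time instead (here via C11 aligned_alloc(), standing in for the kernel's per-cpu allocator). fits_one_line() and alloc_counters() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Same layout as the user-visible struct xt_counters: two 64-bit fields. */
struct counters {
	uint64_t pcnt; /* packets */
	uint64_t bcnt; /* bytes */
};

/* Does the object [addr, addr + size) fit inside one cache line? */
int fits_one_line(uintptr_t addr, size_t size, size_t line)
{
	return addr / line == (addr + size - 1) / line;
}

/* Allocate n counter pairs with 16-byte alignment. Since 16 divides the
 * line size and sizeof(struct counters) == 16, no element can straddle a
 * line boundary. C11 aligned_alloc() wants size % alignment == 0, which
 * holds because the size is a multiple of 16. */
struct counters *alloc_counters(size_t n)
{
	return aligned_alloc(16, n * sizeof(struct counters));
}
```

With only 8-byte alignment a 16-byte pair could start at offset 56 and straddle two 64-byte lines; at a 16-byte boundary it cannot.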


Re: [PATCH 04/11] IB/cm: Expose DGID in SIDR request events

2015-06-18 Thread Haggai Eran
On 17/06/2015 20:06, Jason Gunthorpe wrote:
 On Tue, Jun 16, 2015 at 02:25:07PM +0300, Haggai Eran wrote:

 Regarding APM, currently the ib_cm code always sends the GMP to the
 primary path anyway, right? And in any case, one would expect the
 primary path's GID to have a valid net_device and local routing rules,
 so I think for the purpose of demuxing and validating the request using
 the primary path should be fine.
 
 The current code works that way, but it is not what I'd expect
 generally.
 
 For instance, future APM support will be able to drive dual-rail and
 policy will decide which rail is the current best rail for data
 transfer. So the GMP may be directed to the IPoIB device with port 1,
 but the data transfer may happen on the RDMA port 2. [Note, I already
 have very rough patches that do this de-coupling]
 
 Why do you think the GMP's net_device should be used over the one of the
 future RDMA channel?
 
 The code needs to match the incoming GMP with the logical netdev that
 rx's *that GMP*. The fact that goes on to setup an RDMA channel is not
 relevant, the nature of the future RDMA channel should not impact how
 the GMP is received.

From what I understand, ib_cm and rdma_cm keep their own addresses. I
thought that ib_cm's addresses would be used to handle GMPs, and the
rdma_cm addresses (id.route.addr) to represent the created RDMA channel.
After all, that is what ucma_query_addr returns. So are you proposing
that we use the logical netdev that was resolved from the GMP to fill in
the source address returned to user space? It sounds like that would
prevent the APM usage you described above.

 
 So far we can work without GRH for CM requests, and also without GRH for
 SIDR requests if we rely on layer 3 for the interface resolution. I'm
 not against adding a LLADDR to the protocol somehow, but I don't think
 we should abandon all these use cases and the interoperability with
 existing software.
 
 Well, there is a middle ground. Let's say we get the LLADDR in the GMP
 somehow, then we get 100% correct operation when it is present.
 
 For degraded operation we have the (device,port,pkey) and possibly
 (device,port,pkey,gid) if there was a GRH. We also have the IP address
 hack.
 
 So, I'd say, search in this sequence:
  - If the LLADDR is present, just find the right netdev
  - Otherwise search for (device,port,pkey) / (device,port,pkey,gid)
If there is only one match then that is unambiguously the correct
device to use.
  - Repeat the above search, but add the IP address. This is the hack
we perform for compatibility.
 
 There is no reason to hackily look at the GMP path parameters if we are
 relying on #3 above as the hack to save us in the legacy ambiguous case.
 
 .. and to answer your question in the other email, I think we should
 keep the hack clearly distinct from the proper operation (in fact,
 perhaps it should be user configurable). So #3 should be a distinct
 step taken when all else fails, not integrated into earlier steps.
 
 So, this series as it stands just needs to do #2/#3 above and you guys
 can figure out the LLADDR business later.

Okay. I can add a first search without the IP address.
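The search order Jason proposes could look roughly like the following user-space C sketch. All types and names (struct cand, struct gmp_info, unique_match(), resolve_netdev()) are hypothetical, the optional GID key is left out for brevity, and the IP address step is kept as a separate last-resort pass, as suggested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of a candidate IPoIB net_device. */
struct cand {
	const char *name;
	const char *lladdr; /* link-layer address */
	int device, port, pkey;
	uint32_t ip;        /* IPv4 address, host order */
};

/* Hypothetical fields extracted from an incoming GMP. */
struct gmp_info {
	const char *lladdr; /* NULL when absent */
	int device, port, pkey;
	uint32_t dst_ip;
};

/* Match on (device, port, pkey), optionally also the IP address;
 * return the candidate only if the match is unique. */
static const struct cand *unique_match(const struct cand *c, size_t n,
				       const struct gmp_info *g, int use_ip)
{
	const struct cand *found = NULL;

	for (size_t i = 0; i < n; i++) {
		if (c[i].device != g->device || c[i].port != g->port ||
		    c[i].pkey != g->pkey)
			continue;
		if (use_ip && c[i].ip != g->dst_ip)
			continue;
		if (found)
			return NULL; /* ambiguous */
		found = &c[i];
	}
	return found;
}

const struct cand *resolve_netdev(const struct cand *c, size_t n,
				  const struct gmp_info *g)
{
	/* Step 1: an explicit LLADDR identifies the netdev directly. */
	if (g->lladdr) {
		for (size_t i = 0; i < n; i++)
			if (strcmp(c[i].lladdr, g->lladdr) == 0)
				return &c[i];
		return NULL;
	}
	/* Step 2: (device, port, pkey) alone, if unambiguous. */
	const struct cand *m = unique_match(c, n, g, 0);
	if (m)
		return m;
	/* Step 3: compatibility hack, kept as a distinct last resort. */
	return unique_match(c, n, g, 1);
}
```

Keeping step 3 in its own call makes it easy to disable, which fits the suggestion that the hack might be user configurable.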



Re: [PATCH net-next] x_table: align per cpu xt_counter

2015-06-18 Thread Pablo Neira Ayuso
On Thu, Jun 18, 2015 at 03:43:26AM -0700, David Miller wrote:
 From: Eric Dumazet eric.duma...@gmail.com
 Date: Mon, 15 Jun 2015 18:10:13 -0700
 
  From: Eric Dumazet eduma...@google.com
  
  Let's force a 16 bytes alignment on xt_counter percpu allocations,
  so that bytes and packets sit in same cache line.
  
  xt_counter being exported to user space, we cannot add __align(16) on
  the structure itself.
  
  Signed-off-by: Eric Dumazet eduma...@google.com
  Cc: Florian Westphal f...@strlen.de
 
 Pablo, I assume you will take this.

Yes, I'll prepare another pull request for you later today.


Re: [net-next v2 00/17][pull request] Intel Wired LAN Driver Updates 2015-06-17

2015-06-18 Thread David Miller
From: Jeff Kirsher jeffrey.t.kirs...@intel.com
Date: Wed, 17 Jun 2015 05:54:47 -0700

 This series contains updates to fm10k only.
 
 Alex provides two fixes for the fm10k, first folds the fm10k_pull_tail()
 call into fm10k_add_rx_frag(), this way the fragment does not have to be
 modified after it is added to the skb.  The second fixes missing braces
 to an if statement.
 
 The remaining patches are from Jacob which contain improvements and fixes
 for fm10k.  First fix makes it so that invalid address will simply be
 skipped and allows synchronizing the full list to proceed with using
 iproute2 tool.  Fixed a possible kernel panic by using the correct
 transmit timestamp function.  Simplified the code flow for setting the
 IN_PROGRESS bit of the shinfo for an skb that we will be timestamping.
 Fix a bug in the timestamping transmit enqueue code responsible for a
 NULL pointer dereference and invalid access of the skb list by freeing
 the clone in the cases where we did not add it to the queue.  Update the
  PF code so that it resets the empty TQMAP/RQMAP registers post-VFLR to
 prevent innocent VF drivers from triggering malicious driver events.
 The SYSTIME_CFG.Adjust direction bit is actually supposed to indicate
 that the adjustment is positive, so fix the code to align correctly with
 the hardware and documentation.  Cleanup local variable that is no longer
 used after a previous refactor of the code.  Fix the code flow so that we
 actually clear the enabled flag as part of our removal of the LPORT.
 
 v2:
  - updated patch 07 description based on feedback from Sergei Shtylyov
  - updated patch 09 & 10 to use %d in error message based on feedback
from Sergei Shtylyov

Pulled, thanks Jeff.


System Administrator

2015-06-18 Thread ADMIN

Your mailbox has exceeded its storage limit of 20 GB, as set by the
administrator. You are currently using 20.9 GB and may not be able to
send or receive new messages until you re-validate your mailbox. To
re-validate your mailbox, please log in and send us your details
below so that we can verify and update your account:

(1) E-mail:
(2) Name:
(3) Password:
(4) Alternate e-mail:

Thank you
System Administrator



Re: [PATCH RFC] tun, macvtap: higher order allocations for skbs

2015-06-18 Thread Christian Borntraeger
Am 18.06.2015 um 12:20 schrieb Michael S. Tsirkin:
 Needs more testing. Anyone see anything wrong with this?
Can you explain the motivation?
FWIW, basic networking between two guests over macvtap still
seems to work on s390, so I don't see any obvious regression.

Christian

 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  drivers/net/macvtap.c | 2 +-
  drivers/net/tun.c | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
 index 928f3f4..80e87e4 100644
 --- a/drivers/net/macvtap.c
 +++ b/drivers/net/macvtap.c
 @@ -610,7 +610,7 @@ static inline struct sk_buff *macvtap_alloc_skb(struct sock *sk, size_t prepad,
  		linear = len;
 
  	skb = sock_alloc_send_pskb(sk, prepad + linear, len - linear, noblock,
 -				   err, 0);
 +				   err, 1);
  	if (!skb)
  		return NULL;
 
 diff --git a/drivers/net/tun.c b/drivers/net/tun.c
 index cb376b2d..8f2f1e5 100644
 --- a/drivers/net/tun.c
 +++ b/drivers/net/tun.c
 @@ -1069,7 +1069,7 @@ static struct sk_buff *tun_alloc_skb(struct tun_file *tfile,
  		linear = len;
 
  	skb = sock_alloc_send_pskb(sk, prepad + linear, len - linear, noblock,
 -				   &err, 0);
 +				   &err, 1);
  	if (!skb)
  		return ERR_PTR(err);
 

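For context on the one-character change: the last argument of sock_alloc_send_pskb() is the maximum page order for the paged part of the skb, so 1 allows fragments of up to two pages instead of one. The patch does not state its motivation; the sketch below only illustrates one effect, fewer fragments per skb, under a simplified model that assumes 4 KiB pages and that every fragment is a full allocation of the maximum order (any fallback to smaller orders is ignored). order_bytes() and frags_needed() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/* Bytes in one allocation of the given page order: 2^order pages. */
size_t order_bytes(unsigned order)
{
	return SKETCH_PAGE_SIZE << order;
}

/* Fragments needed for data_len bytes if every fragment is a full
 * allocation of max_order (rounded up). */
size_t frags_needed(size_t data_len, unsigned max_order)
{
	size_t chunk = order_bytes(max_order);

	return (data_len + chunk - 1) / chunk;
}
```

Under these assumptions a 64 KiB payload needs 16 fragments at order 0 and 8 at order 1.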