date:20150618

Re: [PATCH net-next RFC v2 2/3] ipv4: add support for light weight tunnel encap attributes

2015-06-18 Thread Julian Anastasov


Hello,

On Thu, 18 Jun 2015, Roopa Prabhu wrote:

> @@ -366,6 +371,7 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi)
>   payload += nla_total_size((RTAX_MAX * nla_total_size(4)));
>  
>   if (fi->fib_nhs) {
> + size_t nh_encapsize = 0;

This variable is declared outside the #ifdef. Does it trigger any warnings with CONFIG_LWTUNNEL=n?

>   /* Also handles the special case fib_nhs == 1 */
>  
>   /* each nexthop is packed in an attribute */
> @@ -374,8 +380,23 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi)
>   /* may contain flow and gateway attribute */
>   nhsize += 2 * nla_total_size(4);
>  
> +#ifdef CONFIG_LWTUNNEL
> + /* grab encap info */
> + for_nexthops(fi) {
> + if (nh->nh_lwtstate) {
> + /* RTA_ENCAP_TYPE */
> + nh_encapsize += lwtunnel_get_encap_size(
> + nh->nh_lwtstate);

New labels not in #ifdef:

> +
> +err_inval:
> + ret = -EINVAL;
> +
> +errout:
> + return ret;
>  }

Some other places may need changes:

- nh_comp: there is logic that decides whether the same fib_info
is reused by many fib nodes. It should also check that the
nexthops match by nh_lwtstate.

- xfrm4_fill_dst: not sure about this one, but some fields
are copied there.
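
To illustrate the nh_comp point, here is a minimal userspace sketch of a nexthop comparison that also takes the encap state into account. The `demo_nh` and `demo_lwtstate` types and both functions are hypothetical stand-ins, not the kernel structures, and the byte-wise comparison of encap data is only an assumption; the real rule would be up to the patch author and the encap driver.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; not the kernel's lwtunnel_state or fib_nh. */
struct demo_lwtstate {
    unsigned short type;     /* encap type, e.g. MPLS */
    size_t len;              /* number of valid bytes in data[] */
    unsigned char data[8];   /* opaque encap payload */
};

struct demo_nh {
    int oif;                              /* output interface index */
    unsigned int gw;                      /* gateway address */
    const struct demo_lwtstate *lwtstate; /* NULL when no encap is attached */
};

/* Returns 0 when both encap states are equivalent, nonzero otherwise. */
int demo_lwt_cmp(const struct demo_lwtstate *a, const struct demo_lwtstate *b)
{
    if (!a && !b)
        return 0;
    if (!a || !b)
        return 1;
    if (a->type != b->type || a->len != b->len)
        return 1;
    return memcmp(a->data, b->data, a->len) != 0;
}

/* Returns 0 when the two nexthops match, in the spirit of nh_comp(). */
int demo_nh_comp(const struct demo_nh *a, const struct demo_nh *b)
{
    if (a->oif != b->oif || a->gw != b->gw)
        return 1;
    /* Without this step, two routes that differ only in encap
     * would wrongly share one fib_info. */
    return demo_lwt_cmp(a->lwtstate, b->lwtstate);
}
```

With this shape, two nexthops that agree on device and gateway but carry different encap data compare as different, so they would not be merged.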

Regards

--
Julian Anastasov 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net-next] inet_diag: Remove _bh suffix in inet_diag_dump_reqs().

2015-06-18 Thread Eric Dumazet

On Thu, 2015-06-18 at 20:40 +0900, Hiroaki Shimoda wrote:
> inet_diag_dump_reqs() is called from inet_diag_dump_icsk() with BH
> disabled. So no need to disable BH in inet_diag_dump_reqs().
> 
> Signed-off-by: Hiroaki Shimoda 
> ---
>  net/ipv4/inet_diag.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
> index 21985d8d41e7..4ca789ba63cb 100644
> --- a/net/ipv4/inet_diag.c
> +++ b/net/ipv4/inet_diag.c
> @@ -746,7 +746,7 @@ static int inet_diag_dump_reqs(struct sk_buff *skb, struct sock *sk,
>  
>   entry.family = sk->sk_family;
>  
> - spin_lock_bh(&icsk->icsk_accept_queue.syn_wait_lock);
> + spin_lock(&icsk->icsk_accept_queue.syn_wait_lock);
>  
>   lopt = icsk->icsk_accept_queue.listen_opt;
>   if (!lopt || !listen_sock_qlen(lopt))
> @@ -794,7 +794,7 @@ static int inet_diag_dump_reqs(struct sk_buff *skb, struct sock *sk,
>   }
>  
>  out:
> - spin_unlock_bh(&icsk->icsk_accept_queue.syn_wait_lock);
> + spin_unlock(&icsk->icsk_accept_queue.syn_wait_lock);
>  
>   return err;
>  }

Sure, although this will soon be removed completely, once SYN_RECV
sockets are stored in the regular ehash table.

Thanks



Re: [PATCH RFC net] neigh: do not modify unlinked entries

2015-06-18 Thread Eric Dumazet

On Tue, 2015-06-16 at 22:56 +0300, Julian Anastasov wrote:
> The lockless lookups can return entry that is unlinked.
> Sometimes they get reference before last neigh_cleanup_and_release,
> sometimes they do not need reference. Later, any
> modification attempts may result in the following problems:
> 
> 1. entry is not destroyed immediately because neigh_update
> can start the timer for dead entry, eg. on change to NUD_REACHABLE
> state. As result, entry lives for some time but is invisible
> and out of control.
> 
> 2. __neigh_event_send can run in parallel with neigh_destroy
> while refcnt=0 but if timer is started and expired refcnt can
> reach 0 for second time leading to second neigh_destroy and
> possible crash.
> 
> Thanks to Eric Dumazet and Ying Xue for their work and analysis
> on the __neigh_event_send change.
> 
> Fixes: 767e97e1e0db ("neigh: RCU conversion of struct neighbour")
> Fixes: a263b3093641 ("ipv4: Make neigh lookups directly in output packet path.")
> Fixes: 6fd6ce2056de ("ipv6: Do not depend on rt->n in ip6_finish_output2().")
> Cc: Eric Dumazet 
> Cc: Ying Xue 
> Signed-off-by: Julian Anastasov 
> ---

Seems good to me Julian !

Acked-by: Eric Dumazet 
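
The general pattern behind "do not modify unlinked entries" can be sketched in userspace as follows. This is only an illustration under stated assumptions: `demo_entry`, `demo_unlink` and `demo_update` are hypothetical names, the diff itself is not shown in this thread, and the sketch is single-threaded, whereas the kernel would perform such a check under the entry's lock.

```c
#include <stdbool.h>

/* Hypothetical model of a cache entry; not the kernel's struct neighbour. */
struct demo_entry {
    bool dead;         /* set once the entry is unlinked from the table */
    bool timer_armed;  /* stands in for a pending state timer */
    int state;
};

/* Unlink: mark the entry dead so that later updaters leave it alone. */
void demo_unlink(struct demo_entry *e)
{
    e->dead = true;
    e->timer_armed = false;
}

/* Returns true if the update was applied, false if the entry was dead. */
bool demo_update(struct demo_entry *e, int new_state)
{
    /* A lockless lookup may still hand out an unlinked entry.
     * Refusing to touch it keeps the timer from being re-armed,
     * which is what would extend the entry's life or let the
     * reference count reach zero a second time. */
    if (e->dead)
        return false;

    e->state = new_state;
    e->timer_armed = true;
    return true;
}
```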



[PATCH net-next RFC v2 3/3] mpls: support for ip tunnels

2015-06-18 Thread Roopa Prabhu

From: Roopa Prabhu 

Support ip mpls tunnels using the new lwt infrastructure.

Signed-off-by: Roopa Prabhu 
---
 include/linux/mpls_iptunnel.h      |    6 ++
 include/net/mpls_iptunnel.h        |   29 +
 include/uapi/linux/mpls_iptunnel.h |   26 +
 net/mpls/Kconfig                   |    5 +
 net/mpls/Makefile                  |    1 +
 net/mpls/af_mpls.c                 |    9 +-
 net/mpls/internal.h                |    3 +
 net/mpls/mpls_iptunnel.c           |  205
 8 files changed, 281 insertions(+), 3 deletions(-)
 create mode 100644 include/linux/mpls_iptunnel.h
 create mode 100644 include/net/mpls_iptunnel.h
 create mode 100644 include/uapi/linux/mpls_iptunnel.h
 create mode 100644 net/mpls/mpls_iptunnel.c

diff --git a/include/linux/mpls_iptunnel.h b/include/linux/mpls_iptunnel.h
new file mode 100644
index 0000000..ef29eb2
--- /dev/null
+++ b/include/linux/mpls_iptunnel.h
@@ -0,0 +1,6 @@
+#ifndef _LINUX_MPLS_IPTUNNEL_H
+#define _LINUX_MPLS_IPTUNNEL_H
+
+#include 
+
+#endif  /* _LINUX_MPLS_IPTUNNEL_H */
diff --git a/include/net/mpls_iptunnel.h b/include/net/mpls_iptunnel.h
new file mode 100644
index 0000000..4234efc
--- /dev/null
+++ b/include/net/mpls_iptunnel.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (c) 2015 Cumulus Networks, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#ifndef _NET_MPLS_IPTUNNEL_H
+#define _NET_MPLS_IPTUNNEL_H 1
+
+#define MAX_NEW_LABELS 2
+
+struct mpls_iptunnel_encap {
+   u32 label[MAX_NEW_LABELS];
+   u8  labels;
+};
+
+static inline struct mpls_iptunnel_encap *mpls_lwt_hdr(struct lwtunnel_state *lwtstate)
+{
+   return (struct mpls_iptunnel_encap *)lwtstate->tunnel.data;
+}
+
+#endif
diff --git a/include/uapi/linux/mpls_iptunnel.h b/include/uapi/linux/mpls_iptunnel.h
new file mode 100644
index 0000000..228e36a
--- /dev/null
+++ b/include/uapi/linux/mpls_iptunnel.h
@@ -0,0 +1,26 @@
+/*
+ * mpls tunnel api
+ *
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _UAPI_LINUX_MPLS_IPTUNNEL_H
+#define _UAPI_LINUX_MPLS_IPTUNNEL_H
+
+/* MPLS tunnel attributes
+ * [RTA_ENCAP] = {
+ * [MPLS_IPTUNNEL_DST]
+ * }
+ */
+enum {
+   MPLS_IPTUNNEL_UNSPEC,
+   MPLS_IPTUNNEL_DST,
+   __MPLS_IPTUNNEL_MAX,
+};
+#define MPLS_IPTUNNEL_MAX (__MPLS_IPTUNNEL_MAX - 1)
+
+#endif /* _UAPI_LINUX_MPLS_IPTUNNEL_H */
diff --git a/net/mpls/Kconfig b/net/mpls/Kconfig
index 17bde79..3e87a6b 100644
--- a/net/mpls/Kconfig
+++ b/net/mpls/Kconfig
@@ -27,4 +27,9 @@ config MPLS_ROUTING
help
 Add support for forwarding of mpls packets.
 
+config MPLS_IPTUNNEL
+   tristate "MPLS: IP over MPLS tunnel support"
+   help
+Light weight tunnel handling for mpls tunnel packets
+
 endif # MPLS
diff --git a/net/mpls/Makefile b/net/mpls/Makefile
index 65bbe68..9ca9236 100644
--- a/net/mpls/Makefile
+++ b/net/mpls/Makefile
@@ -3,5 +3,6 @@
 #
 obj-$(CONFIG_NET_MPLS_GSO) += mpls_gso.o
 obj-$(CONFIG_MPLS_ROUTING) += mpls_router.o
+obj-$(CONFIG_MPLS_IPTUNNEL) += mpls_iptunnel.o
 
 mpls_router-y := af_mpls.o
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 1f93a59..c6f17ab 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -58,10 +58,11 @@ static inline struct mpls_dev *mpls_dev_get(const struct net_device *dev)
return rcu_dereference_rtnl(dev->mpls_ptr);
 }
 
-static bool mpls_output_possible(const struct net_device *dev)
+bool mpls_output_possible(const struct net_device *dev)
 {
return dev && (dev->flags & IFF_UP) && netif_carrier_ok(dev);
 }
+EXPORT_SYMBOL(mpls_output_possible);
 
 static unsigned int mpls_rt_header_size(const struct mpls_route *rt)
 {
@@ -69,13 +70,14 @@ static unsigned int mpls_rt_header_size(const struct mpls_route *rt)
return rt->rt_labels * sizeof(struct mpls_shim_hdr);
 }
 
-static unsigned int mpls_dev_mtu(const struct net_device *dev)
+unsigned int mpls_dev_mtu(const struct net_device *dev)
 {
/* The amount of data the layer 2 frame can hold */
return dev->mtu;
 }
+EXPORT_SYMBOL(mpls_dev_mtu);
 
-static bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
+bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 {
if (skb->len <= mtu)
return false;
@@ -85,6 +87,7 @@ static bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
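
For readers unfamiliar with what an MPLS encap has to push, the following sketch packs one label stack entry using the RFC 3032 layout (20-bit label, 3-bit traffic class, 1-bit bottom-of-stack flag, 8-bit TTL) and computes the header size for a stack of labels, in the spirit of mpls_rt_header_size() above. The `demo_` names are hypothetical, and the value is built in host byte order; on the wire it would be converted to network byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit positions of the fields within one 32-bit label stack entry. */
#define DEMO_MPLS_LABEL_SHIFT 12
#define DEMO_MPLS_TC_SHIFT     9
#define DEMO_MPLS_S_SHIFT      8

/* Pack one label stack entry (host byte order). */
uint32_t demo_mpls_encode(uint32_t label, unsigned int tc,
                          unsigned int ttl, int bottom_of_stack)
{
    return ((label & 0xFFFFFu) << DEMO_MPLS_LABEL_SHIFT) |
           ((uint32_t)(tc & 0x7u) << DEMO_MPLS_TC_SHIFT) |
           ((bottom_of_stack ? 1u : 0u) << DEMO_MPLS_S_SHIFT) |
           (ttl & 0xFFu);
}

/* Each label costs one 4-byte shim header. */
size_t demo_mpls_header_size(unsigned int labels)
{
    return (size_t)labels * sizeof(uint32_t);
}
```

For example, label 100 with TTL 64 as the bottom of the stack encodes to 0x00064140, and a two-label stack needs 8 bytes of headroom.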

[PATCH net-next RFC v2 2/3] ipv4: add support for light weight tunnel encap attributes

2015-06-18 Thread Roopa Prabhu

From: Roopa Prabhu 

Introduces two netlink attributes RTA_ENCAP_TYPE and
RTA_ENCAP to support attaching encap information to ipv4 routes.

RTA_ENCAP is a nested attribute as suggested by Thomas
(and also as Robert had it in his series). RTA_ENCAP
netlink policy is declared by the light weight tunnel
drivers that support this encap type.

The fib code calls the following for each nexthop:
- new route handler:
    lwt build state (parses RTA_ENCAP and returns the
    lwt state that lives in every fib_nh)
- del dump handler:
    lwt release handler to release lwt state data
- route dump handler:
    lwt dump encap to fill RTA_ENCAP data
- during input route lookup:
    sets dst->output to lwtunnel_output, which
    in turn calls the corresponding lwt tunnel
    output function, which applies the required
    encap and transmits the packet

Signed-off-by: Roopa Prabhu 
---
 include/net/ip_fib.h           |    7 ++-
 include/net/route.h            |    3 ++
 include/uapi/linux/rtnetlink.h |    3 +-
 net/ipv4/fib_frontend.c        |    8
 net/ipv4/fib_semantics.c       |   93 +++-
 net/ipv4/route.c               |   33 +-
 6 files changed, 142 insertions(+), 5 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 54271ed..49f18d7 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -44,7 +44,9 @@ struct fib_config {
u32 fc_flow;
u32 fc_nlflags;
struct nl_info  fc_nlinfo;
- };
+   struct nlattr   *fc_encap;
+   u16 fc_encap_type;
+};
 
 struct fib_info;
 struct rtable;
@@ -89,6 +91,9 @@ struct fib_nh {
struct rtable __rcu * __percpu *nh_pcpu_rth_output;
struct rtable __rcu *nh_rth_input;
struct fnhe_hash_bucket __rcu *nh_exceptions;
+#ifdef CONFIG_LWTUNNEL
+   struct lwtunnel_state   *nh_lwtstate;
+#endif
 };
 
 /*
diff --git a/include/net/route.h b/include/net/route.h
index fe22d03..39a6495 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -66,6 +66,9 @@ struct rtable {
 
    struct list_head    rt_uncached;
    struct uncached_list    *rt_uncached_list;
+#ifdef CONFIG_LWTUNNEL
+   struct lwtunnel_state   *rt_lwtstate;
+#endif
 };
 
 static inline bool rt_is_input_route(const struct rtable *rt)
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 17fb02f..6c089ad 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -308,6 +308,8 @@ enum rtattr_type_t {
RTA_VIA,
RTA_NEWDST,
RTA_PREF,
+   RTA_ENCAP_TYPE,
+   RTA_ENCAP,
__RTA_MAX
 };
 
@@ -357,7 +359,6 @@ struct rtvia {
 };
 
 /* RTM_CACHEINFO */
-
 struct rta_cacheinfo {
__u32   rta_clntref;
__u32   rta_lastuse;
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 872494e..fbe0630 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -591,6 +591,8 @@ const struct nla_policy rtm_ipv4_policy[RTA_MAX + 1] = {
[RTA_METRICS]   = { .type = NLA_NESTED },
[RTA_MULTIPATH] = { .len = sizeof(struct rtnexthop) },
[RTA_FLOW]  = { .type = NLA_U32 },
+   [RTA_ENCAP_TYPE]= { .type = NLA_U16 },
+   [RTA_ENCAP] = { .type = NLA_NESTED },
 };
 
 static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
@@ -656,6 +658,12 @@ static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
case RTA_TABLE:
cfg->fc_table = nla_get_u32(attr);
break;
+   case RTA_ENCAP:
+   cfg->fc_encap = attr;
+   break;
+   case RTA_ENCAP_TYPE:
+   cfg->fc_encap_type = nla_get_u16(attr);
+   break;
}
}
 
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 28ec3c1..54dd287 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "fib_lookup.h"
 
@@ -208,6 +209,10 @@ static void free_fib_info_rcu(struct rcu_head *head)
change_nexthops(fi) {
if (nexthop_nh->nh_dev)
dev_put(nexthop_nh->nh_dev);
+#ifdef CONFIG_LWTUNNEL
+   if (nexthop_nh->nh_lwtstate)
+   lwtunnel_state_put(nexthop_nh->nh_lwtstate);
+#endif
free_nh_exceptions(nexthop_nh);
rt_fibinfo_free_cpus(nexthop_nh->nh_pcpu_rth_output);
rt_fibinfo_free(&nexthop_nh->nh_rth_input);
@@ -366,6 +371,7 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi)
payload += nla_total_size((RTAX_MAX * nla_total_size(4)));
 
if
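
The size bookkeeping in fib_nlmsg_size() relies on how netlink attributes are laid out: a 4-byte attribute header followed by the payload, padded to a 4-byte boundary. The sketch below mirrors that arithmetic in userspace. The `demo_`/`DEMO_` names are hypothetical, and `demo_encap_attr_size()` is only an assumption about how the two new attributes might be accounted for, since the corresponding hunk is cut off above.

```c
#include <stddef.h>

#define DEMO_NLA_ALIGNTO 4
#define DEMO_NLA_ALIGN(len) \
    (((len) + DEMO_NLA_ALIGNTO - 1) & ~(size_t)(DEMO_NLA_ALIGNTO - 1))
/* Attribute header: u16 length + u16 type, already 4-byte aligned. */
#define DEMO_NLA_HDRLEN 4

/* Space one attribute takes in a message, including padding. */
size_t demo_nla_total_size(size_t payload)
{
    return DEMO_NLA_ALIGN(DEMO_NLA_HDRLEN + payload);
}

/* Assumed accounting: a u16 RTA_ENCAP_TYPE attribute plus a nested
 * RTA_ENCAP attribute whose contents take encap_payload bytes. */
size_t demo_encap_attr_size(size_t encap_payload)
{
    return demo_nla_total_size(2) + demo_nla_total_size(encap_payload);
}
```

So a u16 attribute costs 8 bytes, a u32 attribute also costs 8 bytes, and an 8-byte nested payload plus the type attribute would add 20 bytes per nexthop.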

[PATCH net-next RFC v2 1/3] lwt: infrastructure to support light weight tunnels

2015-06-18 Thread Roopa Prabhu

From: Roopa Prabhu 

Provides ops to parse, build and output encapsulated
packets for drivers that want to attach tunnel encap
information to routes.

Signed-off-by: Roopa Prabhu 
---
 include/linux/lwtunnel.h      |    6 ++
 include/net/lwtunnel.h        |   84 +
 include/uapi/linux/lwtunnel.h |   11 +++
 net/Kconfig                   |    5 ++
 net/core/Makefile             |    1 +
 net/core/lwtunnel.c           |  162 +
 6 files changed, 269 insertions(+)
 create mode 100644 include/linux/lwtunnel.h
 create mode 100644 include/net/lwtunnel.h
 create mode 100644 include/uapi/linux/lwtunnel.h
 create mode 100644 net/core/lwtunnel.c

diff --git a/include/linux/lwtunnel.h b/include/linux/lwtunnel.h
new file mode 100644
index 0000000..97f32f8
--- /dev/null
+++ b/include/linux/lwtunnel.h
@@ -0,0 +1,6 @@
+#ifndef _LINUX_LWTUNNEL_H_
+#define _LINUX_LWTUNNEL_H_
+
+#include 
+
+#endif /* _LINUX_LWTUNNEL_H_ */
diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
new file mode 100644
index 0000000..649da3c
--- /dev/null
+++ b/include/net/lwtunnel.h
@@ -0,0 +1,84 @@
+#ifndef __NET_LWTUNNEL_H
+#define __NET_LWTUNNEL_H 1
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define LWTUNNEL_HASH_BITS   7
+#define LWTUNNEL_HASH_SIZE   (1 << LWTUNNEL_HASH_BITS)
+
+struct lwtunnel_hdr {
+   int len;
+   __u8    data[0];
+};
+
+/* lw tunnel state flags */
+#define LWTUNNEL_STATE_OUTPUT_REDIRECT 0x1
+
+#define lwtunnel_output_redirect(lwtstate) (lwtstate && \
+   (lwtstate->flags & LWTUNNEL_STATE_OUTPUT_REDIRECT))
+
+struct lwtunnel_state {
+   __u16   type;
+   __u16   flags;
+   atomic_t    refcnt;
+   struct lwtunnel_hdr tunnel;
+};
+
+struct lwtunnel_net {
+   struct hlist_head tunnels[LWTUNNEL_HASH_SIZE];
+};
+
+struct lwtunnel_encap_ops {
+   int (*build_state)(struct net_device *dev, struct nlattr *encap,
+  struct lwtunnel_state **ts);
+   int (*output)(struct sock *sk, struct sk_buff *skb);
+   int (*fill_encap)(struct sk_buff *skb,
+ struct lwtunnel_state *lwtstate);
+   int (*get_encap_size)(struct lwtunnel_state *lwtstate);
+};
+
+#define MAX_LWTUNNEL_ENCAP_OPS 8
+extern const struct lwtunnel_encap_ops __rcu *
+   lwtun_encaps[MAX_LWTUNNEL_ENCAP_OPS];
+
+static inline void lwtunnel_state_get(struct lwtunnel_state *lws)
+{
+   atomic_inc(&lws->refcnt);
+}
+
+static inline void lwtunnel_state_put(struct lwtunnel_state *lws)
+{
+   if (!lws)
+   return;
+
+   if (atomic_dec_and_test(&lws->refcnt))
+   kfree(lws);
+}
+
+static inline struct lwtunnel_state *lwtunnel_skb_lwstate(struct sk_buff *skb)
+{
+   struct rtable *rt = (struct rtable *)skb_dst(skb);
+
+   return rt->rt_lwtstate;
+}
+
+int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops *op,
+  unsigned int num);
+int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op,
+  unsigned int num);
+int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
+struct nlattr *encap,
+struct lwtunnel_state **lws);
+int lwtunnel_fill_encap(struct sk_buff *skb,
+   struct lwtunnel_state *lwtstate);
+int lwtunnel_get_encap_size(struct lwtunnel_state *lwtstate);
+struct lwtunnel_state *lwtunnel_state_alloc(int hdr_len);
+int lwtunnel_output(struct sock *sk, struct sk_buff *skb);
+
+#endif /* __NET_LWTUNNEL_H */
diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
new file mode 100644
index 0000000..11150c0
--- /dev/null
+++ b/include/uapi/linux/lwtunnel.h
@@ -0,0 +1,11 @@
+#ifndef _UAPI_LWTUNNEL_H_
+#define _UAPI_LWTUNNEL_H_
+
+#include 
+
+enum tunnel_encap_types {
+   LWTUNNEL_ENCAP_NONE,
+   LWTUNNEL_ENCAP_MPLS,
+};
+
+#endif /* _UAPI_LWTUNNEL_H_ */
diff --git a/net/Kconfig b/net/Kconfig
index 57a7c5a..e296d6f 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -374,9 +374,14 @@ source "net/caif/Kconfig"
 source "net/ceph/Kconfig"
 source "net/nfc/Kconfig"
 
+config LWTUNNEL
+   bool "Network light weight tunnels"
+   ---help---
+ light weight tunnels
 
 endif   # if NET
 
 # Used by archs to tell that they support BPF_JIT
 config HAVE_BPF_JIT
bool
+
diff --git a/net/core/Makefile b/net/core/Makefile
index fec0856..086b01f 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -23,3 +23,4 @@ obj-$(CONFIG_NETWORK_PHY_TIMESTAMPING) += timestamping.o
 obj-$(CONFIG_NET_PTP_CLASSIFY) += ptp_classifier.o
 obj-$(CONFIG_CGROUP_NET_PRIO) += netprio_cgroup.o
 obj-$(CONFIG_CGROUP_NET_CLASSID) += netclassid_cgroup.o
+obj-$(CONFIG_LWTUNNEL) += lwtunnel.o
diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
new file mode 100644
index 0000000..29c7802
--- /dev/null
+++ b/net/core/lwtunnel.c
@@ -0,0 +1,162 @@
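
Since the body of net/core/lwtunnel.c is cut off here, the following userspace sketch shows one way the pieces declared in include/net/lwtunnel.h could fit together: an ops table indexed by encap type, a build_state dispatch, and a refcounted state with a trailing data area. Everything named `demo_` is hypothetical. The sketch assumes an initial reference count of 1, uses plain pointers where the kernel declares the table __rcu, and returns -1 where the kernel would return an errno value.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_ENCAP_OPS 8

struct demo_state {
    unsigned short type;
    atomic_int refcnt;
    int len;                 /* bytes available in data[] */
    unsigned char data[];    /* encap-specific payload */
};

struct demo_encap_ops {
    int (*build_state)(const void *encap, int encap_len,
                       struct demo_state **out);
};

static const struct demo_encap_ops *demo_encaps[DEMO_MAX_ENCAP_OPS];

/* Allocate a zeroed state with room for hdr_len payload bytes. */
struct demo_state *demo_state_alloc(int hdr_len)
{
    struct demo_state *s;

    if (hdr_len < 0)
        return NULL;
    s = calloc(1, sizeof(*s) + (size_t)hdr_len);
    if (!s)
        return NULL;
    s->len = hdr_len;
    atomic_init(&s->refcnt, 1);  /* assumption: caller owns one reference */
    return s;
}

void demo_state_get(struct demo_state *s)
{
    atomic_fetch_add(&s->refcnt, 1);
}

/* Drops one reference; returns 1 if this call freed the state. */
int demo_state_put(struct demo_state *s)
{
    if (!s)
        return 0;
    if (atomic_fetch_sub(&s->refcnt, 1) == 1) {
        free(s);
        return 1;
    }
    return 0;
}

/* Register ops for one encap type; fails if the slot is invalid or taken. */
int demo_encap_add_ops(const struct demo_encap_ops *ops, unsigned int type)
{
    if (type >= DEMO_MAX_ENCAP_OPS || demo_encaps[type])
        return -1;
    demo_encaps[type] = ops;
    return 0;
}

/* Dispatch to the driver registered for this type (0 means "none"). */
int demo_build_state(unsigned int type, const void *encap, int encap_len,
                     struct demo_state **out)
{
    if (type == 0 || type >= DEMO_MAX_ENCAP_OPS || !demo_encaps[type])
        return -1;
    return demo_encaps[type]->build_state(encap, encap_len, out);
}

/* Example driver callback: copies the raw encap bytes into a new state. */
int demo_copy_build_state(const void *encap, int encap_len,
                          struct demo_state **out)
{
    struct demo_state *s = demo_state_alloc(encap_len);

    if (!s)
        return -1;
    memcpy(s->data, encap, (size_t)encap_len);
    *out = s;
    return 0;
}
```

The route code would hold one reference per nexthop and drop it when the fib_info is freed, which matches the lwtunnel_state_put() call added to free_fib_info_rcu() in patch 2/3.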

[PATCH net-next RFC v2 0/3] light weight tunnel infrastructure and driver

2015-06-18 Thread Roopa Prabhu

From: Roopa Prabhu 

This series implements infrastructure for light weight tunnels to support
MPLS label edge routers (i.e. MPLS IP tunnels). As previously discussed,
having netdevices will not scale. Hence this series introduces new RTA_ENCAP*
attributes to attach encap information to routes (following a suggestion
from Eric Biederman).

The first patch introduces an infrastructure to support light weight tunnels
that don't have netdevices. The infrastructure allows tunnel drivers
to register handlers to parse and build tunnel encap data, which can be
attached to each route nexthop.

The second patch adds support in ipv4 fib to carry such light weight tunnel
encap data.

The third patch implements mpls ip tunnels using this light weight tunnel
infrastructure.

Could not think of a better name, so, it is 'lwt' for 'light weight tunnels'
for now.

I do have iproute2 patches and can post them separately if required
(they are currently in the mpls branch of my github tree,
https://github.com/CumulusNetworks/iproute2).

Signed-off-by: Roopa Prabhu 

v2:
- bug fixes (more testing)
- feedback from Thomas
- A flag in lwtunnel state that allows using the chosen
  output device instead of redirecting dst output to the
  lwt output function.
- This flag can be set by the tunnel driver at tunnel state
  build time
- moved lwtstate pointer from dst_entry to rtable (seemed cleaner
  looking at Thomas's openvswitch patches)
- moved mpls iptunnel code into separate file (following Eric's and
  Robert's initial patches)

Roopa Prabhu (3):
  lwt: infrastructure to support light weight tunnels
  ipv4: add support for light weight tunnel encap attributes
  mpls: support for ip mpls tunnels

 include/linux/lwtunnel.h           |    6 ++
 include/linux/mpls_iptunnel.h      |    6 ++
 include/net/ip_fib.h               |    7 +-
 include/net/lwtunnel.h             |   84 +++
 include/net/mpls_iptunnel.h        |   29 +
 include/net/route.h                |    3 +
 include/uapi/linux/lwtunnel.h      |   11 ++
 include/uapi/linux/mpls_iptunnel.h |   26 +
 include/uapi/linux/rtnetlink.h     |    3 +-
 net/Kconfig                        |    5 +
 net/core/Makefile                  |    1 +
 net/core/lwtunnel.c                |  162
 net/ipv4/fib_frontend.c            |    8 ++
 net/ipv4/fib_semantics.c           |   93 +++-
 net/ipv4/route.c                   |   33 +-
 net/mpls/Kconfig                   |    5 +
 net/mpls/Makefile                  |    1 +
 net/mpls/af_mpls.c                 |    9 +-
 net/mpls/internal.h                |    3 +
 net/mpls/mpls_iptunnel.c           |  205
 20 files changed, 692 insertions(+), 8 deletions(-)
 create mode 100644 include/linux/lwtunnel.h
 create mode 100644 include/linux/mpls_iptunnel.h
 create mode 100644 include/net/lwtunnel.h
 create mode 100644 include/net/mpls_iptunnel.h
 create mode 100644 include/uapi/linux/lwtunnel.h
 create mode 100644 include/uapi/linux/mpls_iptunnel.h
 create mode 100644 net/core/lwtunnel.c
 create mode 100644 net/mpls/mpls_iptunnel.c

-- 
1.7.10.4


RE: [PATCH 00/22] FUJITSU Extended Socket network device driver

2015-06-18 Thread Izumi, Taku


 Thank you for reviewing.

> As Alex mentioned earlier, I suspect this is more appropriate for drivers/net.
> If David objects, we can consider drivers/platform/x86.

 OK, I'll migrate the code from drivers/platform/x86 to drivers/net and also
 incorporate the review comments. I'll resend the series soon.

 Sincerely,
 Taku Izumi


[PATCH 1/1] ixgbe: use kzalloc for allocating one thing

2015-06-18 Thread Maninder Singh

Use kzalloc rather than kcalloc(1, ...).

The semantic patch that makes this change is as follows:

// 
@@
@@

- kcalloc(1,
+ kzalloc(
  ...)
// 

This also resolves the following checkpatch CHECK:
CHECK: Prefer kzalloc(sizeof(*fwd_adapter)...) over kzalloc(sizeof(struct ixgbe_fwd_adapter)...)

Signed-off-by: Maninder Singh 
Reviewed-by: Vaneet Narang 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 3bf2f3c..3f58757 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8134,7 +8134,7 @@ static void *ixgbe_fwd_add(struct net_device *pdev, struct net_device *vdev)
(adapter->num_rx_pools > IXGBE_MAX_MACVLANS))
return ERR_PTR(-EBUSY);
 
-   fwd_adapter = kcalloc(1, sizeof(struct ixgbe_fwd_adapter), GFP_KERNEL);
+   fwd_adapter = kzalloc(sizeof(*fwd_adapter), GFP_KERNEL);
if (!fwd_adapter)
return ERR_PTR(-ENOMEM);
 
-- 
1.7.1
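
The same idiom can be tried in userspace. In this sketch `demo_zalloc()` is a hypothetical stand-in for kzalloc() built on calloc(), and `demo_fwd_adapter` is an invented structure, not the ixgbe one. The point it shows is the one checkpatch makes: sizing the allocation with sizeof(*ptr) keeps it correct even if the pointer's type is changed later.

```c
#include <stdlib.h>

/* Invented example structure; not struct ixgbe_fwd_adapter. */
struct demo_fwd_adapter {
    int pool;
    unsigned long active_vlans[4];
};

/* Hypothetical userspace stand-in for kzalloc(): one zeroed object. */
void *demo_zalloc(size_t size)
{
    return calloc(1, size);
}

/* Returns a zeroed adapter, or NULL on allocation failure.
 * The caller releases it with free(). */
struct demo_fwd_adapter *demo_fwd_alloc(void)
{
    struct demo_fwd_adapter *fwd_adapter;

    /* sizeof(*fwd_adapter) follows the declared type of the pointer,
     * so the size cannot drift out of sync with it. */
    fwd_adapter = demo_zalloc(sizeof(*fwd_adapter));
    return fwd_adapter;
}
```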


Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-18 Thread Steven Rostedt

On Thu, 18 Jun 2015 21:37:02 -0400
Jeff Layton  wrote:

> > Note, the box has been rebooted since I posted my last trace.
> > 
> 
> Ahh pity. The port has probably changed...if you trace it again maybe
> try to figure out what it's talking to before rebooting the server?

I could probably re-enable the trace again.

Would it be best if I put back the commits and ran it with the buggy
kernel? I could then run these commands after the bug happens and/or
before the port goes away.

> Oh! I was thinking that you were seeing this extra port on the
> _client_, but now rereading your original mail I see that it's
> appearing up on the NFS server. Is that correct?

Correct, the bug is on the NFS server, not the client. The client is
already up and running, and had the filesystem mounted when the server
rebooted. I take it that this happened when the client tried to
reconnect.

Just let me know what you would like to do. As this is my main
production server of my local network, I would only be able to do this
a few times. Let me know all the commands and tracing you would like to
have. I'll try it tomorrow (going to bed now).

-- Steve

> 
> So, assuming that this is NFSv4.0, then this port is probably bound
> when the server is establishing the callback channel to the client. So
> we may need to look at how those xprts are being created and whether
> there are differences from a standard client xprt.
> 


Re: [PATCH next v3] bonding: Display LACP info only to CAP_NET_ADMIN capable user

2015-06-18 Thread Andy Gospodarek

On Thu, Jun 18, 2015 at 11:30:54AM -0700, Mahesh Bandewar wrote:
> Actor and Partner details can be accessed via proc-fs, sys-fs
> entries or netlink interface. These interfaces are world readable
> at this moment. The earlier patch-series made the LACP communication
> secure to avoid nuisance attack from within the same L2 domain but
> it did not prevent "someone unprivileged" looking at that information
> on host and perform the same act.
> 
> This patch essentially avoids spitting those entries if the user
> in question does not have enough privileges.
> 
> Signed-off-by: Mahesh Bandewar 
> ---
>  drivers/net/bonding/bond_netlink.c |  23 +
>  drivers/net/bonding/bond_procfs.c  | 101 +++--
>  drivers/net/bonding/bond_sysfs.c   |  12 ++---
>  3 files changed, 71 insertions(+), 65 deletions(-)
> 
[...]
> diff --git a/drivers/net/bonding/bond_procfs.c b/drivers/net/bonding/bond_procfs.c
> index e7f3047a26df..f514fe5e80a5 100644
> --- a/drivers/net/bonding/bond_procfs.c
> +++ b/drivers/net/bonding/bond_procfs.c
[...]
> @@ -199,33 +202,35 @@ static void bond_info_show_slave(struct seq_file *seq,
>   seq_printf(seq, "Partner Churned Count: %d\n",
>  port->churn_partner_count);
>  
> - seq_puts(seq, "details actor lacp pdu:\n");
> - seq_printf(seq, "system priority: %d\n",
> -port->actor_system_priority);
> - seq_printf(seq, "system mac address: %pM\n",
> -&port->actor_system);
> - seq_printf(seq, "port key: %d\n",
> -port->actor_oper_port_key);
> - seq_printf(seq, "port priority: %d\n",
> -port->actor_port_priority);
> - seq_printf(seq, "port number: %d\n",
> -port->actor_port_number);
> - seq_printf(seq, "port state: %d\n",
> -port->actor_oper_port_state);
> -
> - seq_puts(seq, "details partner lacp pdu:\n");
> - seq_printf(seq, "system priority: %d\n",
> -port->partner_oper.system_priority);
> - seq_printf(seq, "system mac address: %pM\n",
> -&port->partner_oper.system);
> - seq_printf(seq, "oper key: %d\n",
> -port->partner_oper.key);
> - seq_printf(seq, "port priority: %d\n",
> -port->partner_oper.port_priority);
> - seq_printf(seq, "port number: %d\n",
> -port->partner_oper.port_number);
> - seq_printf(seq, "port state: %d\n",
> -port->partner_oper.port_state);
> + if (capable(CAP_NET_ADMIN)) {
> + seq_puts(seq, "details actor lacp pdu:\n");
> + seq_printf(seq, "system priority: %d\n",
> +port->actor_system_priority);
> + seq_printf(seq, "system mac address: %pM\n",
> +&port->actor_system);
> + seq_printf(seq, "port key: %d\n",
> +port->actor_oper_port_key);
> + seq_printf(seq, "port priority: %d\n",
> +port->actor_port_priority);
> + seq_printf(seq, "port number: %d\n",
> +port->actor_port_number);
> + seq_printf(seq, "port state: %d\n",
> +port->actor_oper_port_state);
> +
> + seq_puts(seq, "details partner lacp pdu:\n");
> + seq_printf(seq, "system priority: %d\n",
> +port->partner_oper.system_priority);
> + seq_printf(seq, "system mac address: %pM\n",
> +&port->partner_oper.system);
> + seq_printf(seq, "oper key: %d\n",
> +port->partner_oper.key);
> + seq_printf(seq, "port priority: %d\n",
> +port->partner_oper.port_priority);
> + seq_printf(seq, "port number: %d\n",
> +port->partner_oper.port_number);
> + seq_printf(seq, "port state: %d\n",
> +port->partner_oper.port_state);
> +

[PATCH v2] fm10k: Report MAC address on driver load

2015-06-18 Thread Alexander Duyck

This change adds the MAC address to the list of values recorded on driver
load.  The MAC address represents the serial number of the unit and allows
us to track the value should a card be replaced in a system.

The log message should now be similar in output to that of ixgbe.

Signed-off-by: Alexander Duyck 
---

v2: Moved printing of MAC onto separate line similar to ixgbe.

(Hopefully this works for you Jeff.  I took a look at the patch and just
 moved the bit I needed down.  I figured since this block hasn't changed I
 should be able to get away with just doing this instead of pulling and
 rebasing off of your tree. )

 drivers/net/ethernet/intel/fm10k/fm10k_pci.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index ce53ff25f88d..62a584f633d8 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -1843,6 +1843,9 @@ static int fm10k_probe(struct pci_dev *pdev,
/* print warning for non-optimal configurations */
fm10k_slot_warn(interface);
 
+   /* report MAC address for logging */
+   dev_info(&pdev->dev, "%pM\n", netdev->dev_addr);
+
/* enable SR-IOV after registering netdev to enforce PF/VF ordering */
fm10k_iov_configure(pdev, 0);
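
For reference, the kernel's %pM format specifier prints a 6-byte MAC address as colon-separated lowercase hex. A userspace equivalent can be sketched as below; `demo_format_mac()` is a hypothetical helper name, and standard printf has no %pM, so the bytes are formatted one by one.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes addr as "xx:xx:xx:xx:xx:xx" into buf.
 * Returns what snprintf() returns: the length the full string needs
 * (17), regardless of whether buf was large enough to hold it. */
int demo_format_mac(char *buf, size_t len, const unsigned char addr[6])
{
    return snprintf(buf, len, "%02x:%02x:%02x:%02x:%02x:%02x",
                    addr[0], addr[1], addr[2],
                    addr[3], addr[4], addr[5]);
}
```

A buffer of at least 18 bytes holds the formatted address plus the terminating NUL.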
 


Re: [Intel-wired-lan] [PATCH] fm10k: Report MAC address on driver load

2015-06-18 Thread Alexander Duyck


On 06/18/2015 04:49 PM, Jeff Kirsher wrote:
> On Wed, 2015-06-17 at 20:12 -0700, Alexander Duyck wrote:
>> This change adds the MAC address to the list of values recorded on driver
>> load.  The MAC address represents the serial number of the unit and allows
>> us to track the value should a card be replaced in a system.
>>
>> Signed-off-by: Alexander Duyck 
>> ---
>>   drivers/net/ethernet/intel/fm10k/fm10k_pci.c |    4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> With the recent fm10k patches that Jake submitted, this patch no longer
> applies cleanly.  If you could re-spin your patch against my next-queue
> tree (dev-queue branch) that would be much appreciated.


I should have a new patch for you in 20 minutes or so.  Just waiting on 
the build to finish and then I'll give it a quick test.


- Alex

Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-18 Thread Jeff Layton

On Thu, 18 Jun 2015 21:08:43 -0400
Steven Rostedt  wrote:

> On Thu, 18 Jun 2015 18:50:51 -0400
> Jeff Layton  wrote:
>  
> > The interesting bit here is that the sockets all seem to connect to port
> > 55201 on the remote host, if I'm reading these traces correctly. What's
> > listening on that port on the server?
> > 
> > This might give some helpful info:
> > 
> > $ rpcinfo -p 
> 
> # rpcinfo -p wife
>    program vers proto   port  service
>     100000    4   tcp    111  portmapper
>     100000    3   tcp    111  portmapper
>     100000    2   tcp    111  portmapper
>     100000    4   udp    111  portmapper
>     100000    3   udp    111  portmapper
>     100000    2   udp    111  portmapper
>     100024    1   udp  34243  status
>     100024    1   tcp  34498  status
> 
> # rpcinfo -p localhost
>    program vers proto   port  service
>     100000    4   tcp    111  portmapper
>     100000    3   tcp    111  portmapper
>     100000    2   tcp    111  portmapper
>     100000    4   udp    111  portmapper
>     100000    3   udp    111  portmapper
>     100000    2   udp    111  portmapper
>     100024    1   udp  38332  status
>     100024    1   tcp  52684  status
>     100003    2   tcp   2049  nfs
>     100003    3   tcp   2049  nfs
>     100003    4   tcp   2049  nfs
>     100227    2   tcp   2049
>     100227    3   tcp   2049
>     100003    2   udp   2049  nfs
>     100003    3   udp   2049  nfs
>     100003    4   udp   2049  nfs
>     100227    2   udp   2049
>     100227    3   udp   2049
>     100021    1   udp  53218  nlockmgr
>     100021    3   udp  53218  nlockmgr
>     100021    4   udp  53218  nlockmgr
>     100021    1   tcp  49825  nlockmgr
>     100021    3   tcp  49825  nlockmgr
>     100021    4   tcp  49825  nlockmgr
>     100005    1   udp  49166  mountd
>     100005    1   tcp  48797  mountd
>     100005    2   udp  47856  mountd
>     100005    2   tcp  53839  mountd
>     100005    3   udp  36090  mountd
>     100005    3   tcp  46390  mountd
> 
> Note, the box has been rebooted since I posted my last trace.
> 

Ahh pity. The port has probably changed...if you trace it again maybe
try to figure out what it's talking to before rebooting the server?

> > 
> > Also, what NFS version are you using to mount here? Your fstab entries
> > suggest that you're using the default version (for whatever distro this
> > is), but have you (e.g.) set up nfsmount.conf to default to v3 on this
> > box?
> > 
> 
> My box is Debian testing (recently updated).
> 
> # dpkg -l nfs-*
> 
> ii  nfs-common      1:1.2.8-9  amd64  NFS support files common to clien
> ii  nfs-kernel-ser  1:1.2.8-9  amd64  support for NFS kernel server
> 
> 
> same for both boxes.
> 
> nfsmount.conf doesn't exist on either box.
> 
> I'm assuming it is using nfs4.
> 

(cc'ing Bruce)

Oh! I was thinking that you were seeing this extra port on the
_client_, but now rereading your original mail I see that it's
appearing on the NFS server. Is that correct?

So, assuming that this is NFSv4.0, then this port is probably bound
when the server is establishing the callback channel to the client. So
we may need to look at how those xprts are being created and whether
there are differences from a standard client xprt.

-- 
Jeff Layton 

Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-18 Thread Steven Rostedt

On Thu, 18 Jun 2015 18:50:51 -0400
Jeff Layton  wrote:
 
> The interesting bit here is that the sockets all seem to connect to port
> 55201 on the remote host, if I'm reading these traces correctly. What's
> listening on that port on the server?
> 
> This might give some helpful info:
> 
> $ rpcinfo -p 

# rpcinfo -p wife
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  34243  status
    100024    1   tcp  34498  status

# rpcinfo -p localhost
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  38332  status
    100024    1   tcp  52684  status
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100227    2   tcp   2049
    100227    3   tcp   2049
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100227    2   udp   2049
    100227    3   udp   2049
    100021    1   udp  53218  nlockmgr
    100021    3   udp  53218  nlockmgr
    100021    4   udp  53218  nlockmgr
    100021    1   tcp  49825  nlockmgr
    100021    3   tcp  49825  nlockmgr
    100021    4   tcp  49825  nlockmgr
    100005    1   udp  49166  mountd
    100005    1   tcp  48797  mountd
    100005    2   udp  47856  mountd
    100005    2   tcp  53839  mountd
    100005    3   udp  36090  mountd
    100005    3   tcp  46390  mountd

Note, the box has been rebooted since I posted my last trace.

> 
> Also, what NFS version are you using to mount here? Your fstab entries
> suggest that you're using the default version (for whatever distro this
> is), but have you (e.g.) set up nfsmount.conf to default to v3 on this
> box?
> 

My box is Debian testing (recently updated).

# dpkg -l nfs-*

ii  nfs-common      1:1.2.8-9  amd64  NFS support files common to clien
ii  nfs-kernel-ser  1:1.2.8-9  amd64  support for NFS kernel server


same for both boxes.

nfsmount.conf doesn't exist on either box.

I'm assuming it is using nfs4.

Anything else I can provide?

-- Steve

Re: [PATCH v2] bpf: fix a bug in verification logic when SUB operation taken on FRAME_PTR

2015-06-18 Thread Wangnan (F)




On 2015/6/19 0:00, Alexei Starovoitov wrote:

On Thu, Jun 18, 2015 at 08:31:45AM +, Wang Nan wrote:

The original code has a problem that causes the following code to fail the verifier:

  r1 <- r10
  r1 -= 8
  r2 = 8
  r3 = unsafe pointer
  call BPF_FUNC_probe_read  <-- R1 type=inv expected=fp

However, by replacing 'r1 -= 8' with 'r1 += -8' the above program can be
loaded successfully.

This is because the verifier allows only the BPF_ADD instruction on a
FRAME_PTR register to forge a PTR_TO_STACK register, but makes BPF_SUB
on a FRAME_PTR register yield an UNKNOWN_VALUE register.

This patch fixes it by adding BPF_SUB to the stack_relative check.

It's not a bug. It's catching ADD only by design.
If we let it recognize SUB then one might argue we should let it
recognize multiply, shifts and all other arithmetic on pointers.
verifier will be getting bigger and bigger. Where do we stop?
llvm only emits canonical ADD. If you've seen llvm doing SUB,
let's fix it there.
So what piece generated this 'r1 -= 8' ?



I hit this problem when writing the code of an automatic parameter generator.
The instruction was generated by my own code, which I have now corrected.

Thank you.


Re: [PATCH] fm10k: Report MAC address on driver load

2015-06-18 Thread Jeff Kirsher

On Wed, 2015-06-17 at 20:12 -0700, Alexander Duyck wrote:
> This change adds the MAC address to the list of values recorded on driver
> load.  The MAC address represents the serial number of the unit and allows
> us to track the value should a card be replaced in a system.
> 
> Signed-off-by: Alexander Duyck 
> ---
>  drivers/net/ethernet/intel/fm10k/fm10k_pci.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

With the recent fm10k patches that Jake submitted, this patch no longer
applies cleanly.  If you could re-spin your patch against my next-queue
tree (dev-queue branch) that would be much appreciated.



Re: [PATCH net-next 2/3] ipv4: L3 and L4 hash-based multipath routing

2015-06-18 Thread Alexander Duyck




On 06/17/2015 01:08 PM, Peter Nørlund wrote:

This patch adds L3 and L4 hash-based multipath routing, selectable on a
per-route basis with the reintroduced RTA_MP_ALGO attribute. The default is
now RT_MP_ALG_L3_HASH.

Signed-off-by: Peter Nørlund 
---
  include/net/ip_fib.h   |  4 ++-
  include/net/route.h|  5 ++--
  include/uapi/linux/rtnetlink.h | 14 ++-
  net/ipv4/fib_frontend.c|  4 +++
  net/ipv4/fib_semantics.c   | 34 ++---
  net/ipv4/icmp.c|  4 +--
  net/ipv4/route.c   | 56 +++---
  net/ipv4/xfrm4_policy.c|  2 +-
  8 files changed, 103 insertions(+), 20 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 4be4f25..250d98e 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -37,6 +37,7 @@ struct fib_config {
u32 fc_flags;
u32 fc_priority;
__be32  fc_prefsrc;
+   int fc_mp_alg;
struct nlattr   *fc_mx;
struct rtnexthop *fc_mp;
int fc_mx_len;
@@ -116,6 +117,7 @@ struct fib_info {
int fib_nhs;
  #ifdef CONFIG_IP_ROUTE_MULTIPATH
int fib_mp_weight;
+   int fib_mp_alg;
  #endif
struct rcu_head rcu;
struct fib_nh   fib_nh[0];
@@ -308,7 +310,7 @@ int ip_fib_check_default(__be32 gw, struct net_device *dev);
  int fib_sync_down_dev(struct net_device *dev, int force);
  int fib_sync_down_addr(struct net *net, __be32 local);
  int fib_sync_up(struct net_device *dev);
-void fib_select_multipath(struct fib_result *res);
+void fib_select_multipath(struct fib_result *res, const struct flowi4 *flow);

  /* Exported by fib_trie.c */
  void fib_trie_init(void);
diff --git a/include/net/route.h b/include/net/route.h
index fe22d03..1fc7deb 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -110,7 +110,8 @@ struct in_device;
  int ip_rt_init(void);
  void rt_cache_flush(struct net *net);
  void rt_flush_dev(struct net_device *dev);
-struct rtable *__ip_route_output_key(struct net *, struct flowi4 *flp);
+struct rtable *__ip_route_output_key(struct net *, struct flowi4 *flp,
+const struct flowi4 *mp_flow);
  struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,
struct sock *sk);
  struct dst_entry *ipv4_blackhole_route(struct net *net,
@@ -267,7 +268,7 @@ static inline struct rtable *ip_route_connect(struct flowi4 *fl4,
  sport, dport, sk);

if (!dst || !src) {
-   rt = __ip_route_output_key(net, fl4);
+   rt = __ip_route_output_key(net, fl4, NULL);
if (IS_ERR(rt))
return rt;
ip_rt_put(rt);
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 17fb02f..dff4a72 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -271,6 +271,18 @@ enum rt_scope_t {
  #define RTM_F_EQUALIZE  0x400   /* Multipath equalizer: NI */
  #define RTM_F_PREFIX  0x800   /* Prefix addresses */

+/* Multipath algorithms */
+
+enum rt_mp_alg_t {
+   RT_MP_ALG_L3_HASH,  /* Was IP_MP_ALG_NONE */
+   RT_MP_ALG_PER_PACKET,   /* Was IP_MP_ALG_RR */
+   RT_MP_ALG_DRR,  /* not used */
+   RT_MP_ALG_RANDOM,   /* not used */
+   RT_MP_ALG_WRANDOM,  /* not used */
+   RT_MP_ALG_L4_HASH,
+   __RT_MP_ALG_MAX
+};
+
  /* Reserved table identifiers */

  enum rt_class_t {
@@ -301,7 +313,7 @@ enum rtattr_type_t {
RTA_FLOW,
RTA_CACHEINFO,
RTA_SESSION, /* no longer used */
-   RTA_MP_ALGO, /* no longer used */
+   RTA_MP_ALGO,
RTA_TABLE,
RTA_MARK,
RTA_MFC_STATS,
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 872494e..376e8c1 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -590,6 +590,7 @@ const struct nla_policy rtm_ipv4_policy[RTA_MAX + 1] = {
[RTA_PREFSRC]   = { .type = NLA_U32 },
[RTA_METRICS]   = { .type = NLA_NESTED },
[RTA_MULTIPATH] = { .len = sizeof(struct rtnexthop) },
+   [RTA_MP_ALGO]   = { .type = NLA_U32 },
[RTA_FLOW]  = { .type = NLA_U32 },
  };

@@ -650,6 +651,9 @@ static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
cfg->fc_mp = nla_data(attr);
cfg->fc_mp_len = nla_len(attr);
break;
+   case RTA_MP_ALGO:
+   cfg->fc_mp_alg = nla_get_u32(attr);
+   break;
case RTA_FLOW:
cfg->fc_flow = nla_get_u32(attr);

Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-18 Thread Jeff Layton

On Thu, 18 Jun 2015 15:49:14 -0400
Steven Rostedt  wrote:

> On Thu, 18 Jun 2015 15:24:52 -0400
> Trond Myklebust  wrote:
> 
> > On Wed, Jun 17, 2015 at 11:08 PM, Steven Rostedt wrote:
> > > On Fri, 12 Jun 2015 11:50:38 -0400
> > > Steven Rostedt  wrote:
> > >
> > >> I reverted the following commits:
> > >>
> > >> c627d31ba0696cbd829437af2be2f2dee3546b1e
> > >> 9e2b9f37760e129cee053cc7b6e7288acc2a7134
> > >> caf4ccd4e88cf2795c927834bc488c8321437586
> > >>
> > >> And the issue goes away. That is, I watched the port go from
> > >> ESTABLISHED to TIME_WAIT, and then gone, and there's no hidden port.
> > >>
> > >> In fact, I watched the port with my portlist.c module, and it
> > >> disappeared there too when it entered the TIME_WAIT state.
> > >>
> > 
> > I've scanned those commits again and again, and I'm not seeing how we
> > could be introducing a socket leak there. The only suspect I can see
> > would be the NFS swap bugs that Jeff fixed a few weeks ago. Are you
> > using NFS swap?
> 
> Not that I'm aware of.
> 
> > 
> > > I've been running v4.0.5 with the above commits reverted for 5 days
> > > now, and there's still no hidden port appearing.
> > >
> > > What's the status on this? Should those commits be reverted or is there
> > > another solution to this bug?
> > >
> > 
> > I'm trying to reproduce, but I've had no luck yet.
> 
> It seems to happen with the connection to my wife's machine, and that
> is where my wife's box connects two directories via nfs:
> 
> This is what's in my wife's /etc/fstab file
> 
> goliath:/home/upload /upload nfs auto,rw,intr,soft   0 0
> goliath:/home/gallery /gallery nfs auto,ro,intr,soft 0 0
> 
> And here's what's in my /etc/exports file
> 
> /home/upload   wife(no_root_squash,no_all_squash,rw,sync,no_subtree_check)
> /home/gallery  192.168.23.0/24(ro,sync,no_subtree_check)
> 
> Attached is my config.
> 

The interesting bit here is that the sockets all seem to connect to port
55201 on the remote host, if I'm reading these traces correctly. What's
listening on that port on the server?

This might give some helpful info:

$ rpcinfo -p 

Also, what NFS version are you using to mount here? Your fstab entries
suggest that you're using the default version (for whatever distro this
is), but have you (e.g.) set up nfsmount.conf to default to v3 on this
box?

-- 
Jeff Layton 

Re: [PATCH] fm10k: Report MAC address on driver load

2015-06-18 Thread Jeff Kirsher

On Wed, 2015-06-17 at 20:12 -0700, Alexander Duyck wrote:
> This change adds the MAC address to the list of values recorded on driver
> load.  The MAC address represents the serial number of the unit and allows
> us to track the value should a card be replaced in a system.
> 
> Signed-off-by: Alexander Duyck 
> ---
>  drivers/net/ethernet/intel/fm10k/fm10k_pci.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Thanks Alex, I will get this added to my queue.



[PATCH] NET: ROSE: Don't dereference NULL neighbour pointer.

2015-06-18 Thread Ralf Baechle

A ROSE socket doesn't necessarily always have a neighbour pointer so check
if the neighbour pointer is valid before dereferencing it.

Signed-off-by: Ralf Baechle 
Tested-by: Bernard Pidoux 
Cc: sta...@vger.kernel.org #2.6.11+

diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index 8ae6030..dd304bc 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -192,7 +192,8 @@ static void rose_kill_by_device(struct net_device *dev)
 
if (rose->device == dev) {
rose_disconnect(s, ENETUNREACH, ROSE_OUT_OF_ORDER, 0);
-   rose->neighbour->use--;
+   if (rose->neighbour)
+   rose->neighbour->use--;
rose->device = NULL;
}
}

Re: e1000e driver - hang after 4 hours of uptime - finally bisected!

2015-06-18 Thread Jeff Kirsher

On Thu, 2015-06-18 at 12:46 -0400, Valdis Kletnieks wrote:
> (follow up to a report from last week - bisecting took a while as I could
> only do 1 or 2 tests an evening)
> 
> My Dell Latitude E6530 crashes with a specific kernel lockup almost
> exactly 4 hours after boot if there isn't a cable connected to the
> Ethernet port:
> 
> [14508.846327] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
> [14468.229720] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
> [14463.254791] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
> [14491.134413] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
> [14463.396593] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
> [14490.390223] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
> [14494.680591] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
> [14513.365378] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
> [14482.271716] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
> [14479.906820] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
> 
> As far as I can tell, the timestamp jitter is just how long it takes me to
> enter the cryptLUKS passphrase for the hard drive at boot...
> 
> lspci tells me:
> 
> lspci -vvv -s "00:19.0"
> 00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network 
> Connection (rev 04)
> DeviceName:  Onboard LAN
> Subsystem: Dell Device 0535
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
> Stepping- SERR- FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- 
> SERR-  Latency: 0
> Interrupt: pin A routed to IRQ 28
> Region 0: Memory at f770 (32-bit, non-prefetchable) [size=128K]
> Region 1: Memory at f7739000 (32-bit, non-prefetchable) [size=4K]
> Region 2: I/O ports at f040 [size=32]
> Capabilities: [c8] Power Management version 2
> Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA 
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
> Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
> Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> Address: fee00318  Data: 
> Capabilities: [e0] PCI Advanced Features
> AFCap: TP+ FLR+
> AFCtrl: FLR-
> AFStatus: TP-
> Kernel driver in use: e1000e
> 
> 
> The traceback always looks like:
> 
> [14479.906820] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
> 
> [14479.906908] Call Trace:
> [14479.906914][] dump_stack+0x50/0xa8
> [14479.906930]  [] panic+0xcd/0x1e4
> [14479.906940]  [] ? perf_event_task_disable+0xc0/0xc0
> [14479.906952]  [] watchdog_overflow_callback+0x9b/0xa0
> [14479.906959]  [] __perf_event_overflow+0xc4/0x1f0
> [14479.906968]  [] perf_event_overflow+0x14/0x20
> [14479.906976]  [] intel_pmu_handle_irq+0x1e1/0x430
> [14479.906990]  [] perf_event_nmi_handler+0x26/0x40
> [14479.906999]  [] nmi_handle+0x103/0x340
> [14479.907005]  [] ? nmi_handle+0x5/0x340
> [14479.907017]  [] default_do_nmi+0xc3/0x120
> [14479.907032]  [] do_nmi+0xe8/0x130
> [14479.907044]  [] end_repeat_nmi+0x1e/0x2e
> [14479.907055]  [] ? e1000e_cyclecounter_read+0x16/0xc0
> [14479.907061]  [] ? e1000e_cyclecounter_read+0x16/0xc0
> [14479.907069]  [] ? e1000e_cyclecounter_read+0x16/0xc0
> [14479.907075]  <>  [] timecounter_read+0x19/0x60
> [14479.907088]  [] e1000e_phc_gettime+0x2e/0x60
> [14479.907098]  [] e1000e_systim_overflow_work+0x31/0x70
> [14479.907105]  [] process_one_work+0x3c9/0x980
> [14479.907115]  [] ? process_one_work+0x312/0x980
> [14479.907125]  [] ? worker_thread+0x78/0x760
> [14479.907134]  [] worker_thread+0x2cc/0x760
> [14479.907144]  [] ? process_one_work+0x980/0x980
> [14479.907154]  [] kthread+0xfe/0x120
> [14479.907163]  [] ? finish_task_switch+0x50/0x1c0
> [14479.907173]  [] ? kthread_create_on_node+0x270/0x270
> [14479.907179]  [] ret_from_fork+0x3f/0x70
> [14479.907188]  [] ? kthread_create_on_node+0x270/0x270
> [14479.907243] Kernel Offset: 0x3900 from 0x8100 (relocation range: 0x8000-0xbfff)
> 
> Bisection tells me it's this commit:
> 
> commit 83129b37ef35bb6a7f01c060129736a8db5d31c4
> Author: Yanir Lubetkin 
> Date:   Tue Jun 2 17:05:45 2015 +0300
> 
> e1000e: fix systim issues
> 
> Two issues involving systim were reported.
> 1. Clock is not running in the correct frequency
> 2. In some situations, systim values were not incremented linearly
> This patch fixes the hardware clock configuration and the spurious
> non-linear increment.

Thanks Valdis!  I will have Yanir look into it and hopefully we should
have a fix here soon for you to verify.



Re: [PATCH 00/22] FUJITSU Extended Socket network device driver

2015-06-18 Thread Darren Hart

On Thu, Jun 18, 2015 at 09:45:59AM +0900, Taku Izumi wrote:
> This patchset adds the FUJITSU Extended Socket network device driver.
> The Extended Socket network device is a shared-memory-based high-speed network
> interface between Extended Partitions of the PRIMEQUEST 2000 E2 series.
> 
> You can get some information about Extended Partition and Extended
> Socket by referring to the following manual.
> 
> http://globalsp.ts.fujitsu.com/dmsp/Publications/public/CA92344-0537.pdf
>  3.2.1 Extended Partitioning
>  3.2.2 Extended Socket
> 

As Alex mentioned earlier, I suspect this is more appropriate for drivers/net.
If David objects, we can consider it for drivers/platform/x86.

-- 
Darren Hart
Intel Open Source Technology Center

[GIT] [4.2] 2nd NFC update

2015-06-18 Thread Samuel Ortiz

Hi David,

This is a follow up fix for a typo that I introduced while cleaning
the 1st 4.2 NFC pull request patches.

The following changes since commit d0dcad8bd32a34aa85bcbd5d2033658cb3964377:

  NFC: nfcmrvl: set PB_BAIL_OUT at setup (2015-06-13 00:08:55 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next.git tags/nfc-next-4.2-2

for you to fetch changes up to fb77ff4f43990dc91926ce2704036a547482544e:

  NFC: nci: fix mistake in uart generic driver (2015-06-15 18:10:37 +0200)


Vincent Cuissard (1):
  NFC: nci: fix mistake in uart generic driver

 net/nfc/nci/uart.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Re: [PATCH] net: fix search limit handling in skb_find_text()

2015-06-18 Thread Pablo Neira Ayuso

On Tue, Jun 16, 2015 at 03:13:41PM +0300, Roman Khimov wrote:
> On 16 June 2015 12:48:41, Pablo Neira Ayuso wrote:
[...]
> > But if we change the existing behaviour, users may be relying on it
> > and we'll get things broken for them. Someone else will come later on
> > with another patch to say: "hey, --to used to be inclusive but this is
> > not the case anymore and it's breaking my setup".
> 
> I do understand your concerns, but fixing it this way would require changing
> skb_seq_read() and basically would propagate "'to' offset included" semantics
> (which seems a bit strange for programmers, IMO) further. And initially I
> thought that changing skb_seq_read() would be more intrusive, although looking
> at all this now it looks like the only real user of upper_offset field in
> ts_config struct is skb_find_text(), because other invocations of
> skb_seq_read() from drivers/scsi/libiscsi_tcp.c and net/batman-adv/main.c use
> skb->len as an upper limit.
> 
> > > em_text_match() in net/sched/em_text.c is also suspicious.
> > 
> > Please, elaborate.
> 
> The way it constructs 'to' offset, I think it doesn't expect something to 
> match at 'to'. Although I might be wrong here.

Could you send a patch that resolves the inconsistency for programmers
while leaving the userspace exposed behaviour through xt_string and
em_string intact? Thanks.

Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-18 Thread Steven Rostedt

On Thu, 18 Jun 2015 15:24:52 -0400
Trond Myklebust  wrote:

> On Wed, Jun 17, 2015 at 11:08 PM, Steven Rostedt  wrote:
> > On Fri, 12 Jun 2015 11:50:38 -0400
> > Steven Rostedt  wrote:
> >
> >> I reverted the following commits:
> >>
> >> c627d31ba0696cbd829437af2be2f2dee3546b1e
> >> 9e2b9f37760e129cee053cc7b6e7288acc2a7134
> >> caf4ccd4e88cf2795c927834bc488c8321437586
> >>
> >> And the issue goes away. That is, I watched the port go from
> >> ESTABLISHED to TIME_WAIT, and then gone, and there's no hidden port.
> >>
> >> In fact, I watched the port with my portlist.c module, and it
> >> disappeared there too when it entered the TIME_WAIT state.
> >>
> 
> I've scanned those commits again and again, and I'm not seeing how we
> could be introducing a socket leak there. The only suspect I can see
> would be the NFS swap bugs that Jeff fixed a few weeks ago. Are you
> using NFS swap?

Not that I'm aware of.

> 
> > I've been running v4.0.5 with the above commits reverted for 5 days
> > now, and there's still no hidden port appearing.
> >
> > What's the status on this? Should those commits be reverted or is there
> > another solution to this bug?
> >
> 
> I'm trying to reproduce, but I've had no luck yet.

It seems to happen with the connection to my wife's machine, and that
is where my wife's box connects two directories via nfs:

This is what's in my wife's /etc/fstab file

goliath:/home/upload /upload nfs auto,rw,intr,soft   0 0
goliath:/home/gallery /gallery nfs auto,ro,intr,soft   0 0

And here's what's in my /etc/exports file

/home/upload   wife(no_root_squash,no_all_squash,rw,sync,no_subtree_check)
/home/gallery  192.168.23.0/24(ro,sync,no_subtree_check)

Attached is my config.

-- Steve




config.gz
Description: application/gzip

Re: [PATCH net-next 1/3] ipv4: Lock-less per-packet multipath

2015-06-18 Thread Alexander Duyck


On 06/17/2015 01:08 PM, Peter Nørlund wrote:

The current multipath attempted to be quasi random, but in most cases it
behaved just like a round robin balancing. This patch refactors the
algorithm to be exactly that and in doing so, avoids the spin lock.

The new design paves the way for hash-based multipath, replacing the
modulo with thresholds, minimizing disruption in case of failing paths or
route replacements.

Signed-off-by: Peter Nørlund 
---
  include/net/ip_fib.h |   6 +--
  net/ipv4/Kconfig |   1 +
  net/ipv4/fib_semantics.c | 116 ++-
  3 files changed, 68 insertions(+), 55 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 54271ed..4be4f25 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -76,8 +76,8 @@ struct fib_nh {
unsigned int    nh_flags;
unsigned char   nh_scope;
  #ifdef CONFIG_IP_ROUTE_MULTIPATH
-   int nh_weight;
-   int nh_power;
+   int nh_mp_weight;
+   atomic_t    nh_mp_upper_bound;
  #endif
  #ifdef CONFIG_IP_ROUTE_CLASSID
__u32   nh_tclassid;
@@ -115,7 +115,7 @@ struct fib_info {
  #define fib_advmss fib_metrics[RTAX_ADVMSS-1]
int fib_nhs;
  #ifdef CONFIG_IP_ROUTE_MULTIPATH
-   int fib_power;
+   int fib_mp_weight;
  #endif
struct rcu_head rcu;
struct fib_nh   fib_nh[0];


I could do without some of this renaming.  For example you could 
probably not bother with adding the _mp piece to the name.  That way we 
don't have to track all the nh_weight -> nh_mp_weight changes.   Also 
you could probably just use the name fib_weight since not including the 
_mp was already the convention for the multipath portions of the 
structure anyway.


This isn't really improving readability at all so I would say don't 
bother renaming it.



diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index d83071d..cb91f67 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -81,6 +81,7 @@ config IP_MULTIPLE_TABLES
  config IP_ROUTE_MULTIPATH
bool "IP: equal cost multipath"
depends on IP_ADVANCED_ROUTER
+   select BITREVERSE
help
  Normally, the routing tables specify a single action to be taken in
  a deterministic manner for a given packet. If you say Y here
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 28ec3c1..8c8df80 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -15,6 +15,7 @@

  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -57,7 +58,7 @@ static struct hlist_head fib_info_devhash[DEVINDEX_HASHSIZE];

  #ifdef CONFIG_IP_ROUTE_MULTIPATH

-static DEFINE_SPINLOCK(fib_multipath_lock);
+static DEFINE_PER_CPU(u8, fib_mp_rr_counter);

  #define for_nexthops(fi) {\
int nhsel; const struct fib_nh *nh; \
@@ -261,7 +262,7 @@ static inline int nh_comp(const struct fib_info *fi, const struct fib_info *ofi)
nh->nh_gw  != onh->nh_gw ||
nh->nh_scope != onh->nh_scope ||
  #ifdef CONFIG_IP_ROUTE_MULTIPATH
-   nh->nh_weight != onh->nh_weight ||
+   nh->nh_mp_weight != onh->nh_mp_weight ||
  #endif
  #ifdef CONFIG_IP_ROUTE_CLASSID
nh->nh_tclassid != onh->nh_tclassid ||
@@ -449,6 +450,43 @@ static int fib_count_nexthops(struct rtnexthop *rtnh, int remaining)
return remaining > 0 ? 0 : nhs;
  }



This is a good example.  If we don't do the rename we don't have to 
review changes like the one above which just add extra overhead to the 
patch.



+static void fib_rebalance(struct fib_info *fi)
+{
+   int factor;
+   int total;
+   int w;
+
+   if (fi->fib_nhs < 2)
+   return;
+
+   total = 0;
+   for_nexthops(fi) {
+   if (!(nh->nh_flags & RTNH_F_DEAD))
+   total += nh->nh_mp_weight;
+   } endfor_nexthops(fi);
+
+   if (likely(total != 0)) {
+   factor = DIV_ROUND_UP(total, 8388608);
+   total /= factor;
+   } else {
+   factor = 1;
+   }
+


So where does the 8388608 value come from?  Is it just here to help 
restrict the upper_bound to a u8 value?



+   w = 0;
+   change_nexthops(fi) {
+   int upper_bound;
+
+   if (nexthop_nh->nh_flags & RTNH_F_DEAD) {
+   upper_bound = -1;
+   } else {
+   w += nexthop_nh->nh_mp_weight / factor;
+   upper_bound = DIV_ROUND_CLOSEST(256 * w, total);
+   }


This is doing some confusing stuff.  I assume the whole point is to get 
the value to convert the upper_bound into a u8 value based on the weight 
where you end

Re: [PATCH net-next 0/3 v5] changes to make ipv4 routing table aware of next-hop link status

2015-06-18 Thread Andy Gospodarek

On Thu, Jun 18, 2015 at 10:51:37AM -0700, Scott Feldman wrote:
> On Thu, Jun 18, 2015 at 8:22 AM, Andy Gospodarek
>  wrote:
> > This series adds the ability to have the Linux kernel track whether or
> > not a particular route should be used based on the link-status of the
> > interface associated with the next-hop.
> >
> > Before this patch any link-failure on an interface that was serving as a
> > gateway for some systems could result in those systems being isolated
> > from the rest of the network as the stack would continue to attempt to
> > send frames out of an interface that is actually linked-down.  When the
> > kernel is responsible for all forwarding, it should also be responsible
> > for taking action when the traffic can no longer be forwarded -- there
> > is no real need to outsource link-monitoring to userspace anymore.
> >
> > This feature is only enabled with the new per-interface or ipv4 global
> > sysctls called 'ignore_routes_with_linkdown'.
> >
> > net.ipv4.conf.all.ignore_routes_with_linkdown = 0
> > net.ipv4.conf.default.ignore_routes_with_linkdown = 0
> > net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
> > ...
> >
> > When the above sysctls are set, the kernel will not only report to
> > userspace that the link is down, but it will also report to userspace
> > that a route is dead.  This will signal to userspace that the route will
> > not be selected.
> >
> > With the new sysctls set, the following behavior can be observed
> > (interface p8p1 is link-down):
> >
> > # ip route show
> > default via 10.0.5.2 dev p9p1
> > 10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
> > 70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
> > 80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
> > 90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 dead linkdown
> > 90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
> > # ip route get 90.0.0.1
> > 90.0.0.1 via 70.0.0.2 dev p7p1  src 70.0.0.1
> > cache
> > # ip route get 80.0.0.1
> > local 80.0.0.1 dev lo  src 80.0.0.1
> > cache 
> > # ip route get 80.0.0.2
> > 80.0.0.2 via 10.0.5.2 dev p9p1  src 10.0.5.15
> > cache
> >
> > While the route does remain in the table (so it can be modified if
> > needed rather than being wiped away as it would be if IFF_UP was
> > cleared), the proper next-hop is chosen automatically when the link is
> > down.  Now interface p8p1 is linked-up:
> >
> > # ip route show
> > default via 10.0.5.2 dev p9p1
> > 10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
> > 70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
> > 80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
> > 90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1
> > 90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
> > 192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2
> > # ip route get 90.0.0.1
> > 90.0.0.1 via 80.0.0.2 dev p8p1  src 80.0.0.1
> > cache
> > # ip route get 80.0.0.1
> > local 80.0.0.1 dev lo  src 80.0.0.1
> > cache 
> > # ip route get 80.0.0.2
> > 80.0.0.2 dev p8p1  src 80.0.0.1
> > cache
> >
> > and the output changes to what one would expect.
> >
> > If the global or interface sysctl is not set, the following output would be
> > expected when p8p1 is down:
> >
> > # ip route show
> > default via 10.0.5.2 dev p9p1
> > 10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
> > 70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
> > 80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 linkdown
> > 90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 linkdown
> > 90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
> >
> > If the dead flag does not appear there should be no expectation that the
> > kernel would skip using this route due to link being down.
> >
> > v2: Split kernel changes into 2 patches: first to add linkdown flag and
> > second to add new sysctl settings.  Also took suggestion from Alex to
> > simplify code by only checking sysctl during fib lookup and suggestion
> > from Scott to add a per-interface sysctl.  Added iproute2 patch to
> > recognize and print linkdown flag.
> >
> > v3: Code cleanups along with reverse-path checks suggested by Alex and
> > small fixes related to problems found when multipath was disabled.
> >
> > v4: Drop binary sysctls
> >
> > v5: Whitespace and variable declaration fixups suggested by Dave
> >
> > Though there were some that preferred not to have a configuration option
> > and to make this behavior the default when it was discussed in Ottawa
> > earlier this year since "it was time to do this."  I wanted to propose
> > the config option to preserve the current behavior for those that desire
> > it.  I'll happily remove it if Dave and Linus approve.
> >
> > An IPv6 implementation is also needed (DECnet too!), but I wanted to start
> > with the IPv4 implementation to get people comfortable with the idea before
> > moving forward.  If this is accepted the IPv6 implementation can be posted
> > shortly.
> >
>
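
[Editorial note] The selection behaviour described in the cover letter
quoted above can be modelled in a few lines of userspace C. This is only a
sketch under stated assumptions: struct model_route, NH_F_LINKDOWN and
model_select_route() are hypothetical names invented for illustration and
are not kernel symbols, and the real fib lookup is considerably more
involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; these names are not kernel symbols. */
#define NH_F_LINKDOWN 0x1

struct model_route {
	const char *dev;	/* outgoing interface name */
	unsigned int metric;	/* lower is preferred */
	unsigned int flags;	/* NH_F_LINKDOWN when carrier is down */
};

/*
 * Return the index of the lowest-metric usable route, or -1 if none.
 * When ignore_linkdown is non-zero, routes flagged NH_F_LINKDOWN are
 * skipped, which corresponds to the "dead linkdown" entries in the
 * example output.
 */
static int model_select_route(const struct model_route *rt, size_t n,
			      int ignore_linkdown)
{
	int best = -1;
	size_t i;

	assert(rt != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (ignore_linkdown && (rt[i].flags & NH_F_LINKDOWN))
			continue;
		if (best < 0 || rt[i].metric < rt[best].metric)
			best = (int)i;
	}
	return best;
}
```

With the two 90.0.0.0/24 routes from the example (metric 1 via p8p1 with
link down, metric 2 via p7p1), the model picks p7p1 when the sysctl is set
and p8p1 when it is not.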

Re: [PATCH net-next 00/43] Simplify netfilter and network namespaces (take 2)

2015-06-18 Thread Pablo Neira Ayuso

On Wed, Jun 17, 2015 at 10:09:40AM -0500, Eric W. Biederman wrote:
[...]
> There are a few extra cleanups in the first group of changes sprinkled
> in as I noticed a few other things as I was sorting out the network
> namespace computation logic.

This is a rather large patchset that addresses many pernet issues in the
netfilter codebase. I would classify them as follows:

1) Patches to prepare the ground for easier pernet integration.

2) Get rid of the dev_net(dev) ? ... : ...; pattern all around the
   netfilter code.

3) Missing pernet sysctl support in some spots, e.g. br_netfilter.

4) Pernet hooks, probably the largest changeset in this pile and the
   most important one IMO.

So given that it's quite evident that netfilter netns support is
half-cooked and there's room for improvement in it, as we've been
receiving patches to partially add support for things that people
sporadically needed, could you please split this into several (smaller)
batches of logical changes for easier review?

On a different front, nfnetlink_log and nfnetlink_queue also still
lack netns support, so patches for that would also be appreciated in
a separate round.

I'm going to take as many of the small preparation patches as I can to
reduce your patchload:

1/43, 8/43, 16/43, 17/43, 18/43, 26/43

Thank you.

Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-18 Thread Trond Myklebust

On Wed, Jun 17, 2015 at 11:08 PM, Steven Rostedt  wrote:
> On Fri, 12 Jun 2015 11:50:38 -0400
> Steven Rostedt  wrote:
>
>> I reverted the following commits:
>>
>> c627d31ba0696cbd829437af2be2f2dee3546b1e
>> 9e2b9f37760e129cee053cc7b6e7288acc2a7134
>> caf4ccd4e88cf2795c927834bc488c8321437586
>>
>> And the issue goes away. That is, I watched the port go from
>> ESTABLISHED to TIME_WAIT, and then gone, and there's no hidden port.
>>
>> In fact, I watched the port with my portlist.c module, and it
>> disappeared there too when it entered the TIME_WAIT state.
>>

I've scanned those commits again and again, and I'm not seeing how we
could be introducing a socket leak there. The only suspect I can see
would be the NFS swap bugs that Jeff fixed a few weeks ago. Are you
using NFS swap?

> I've been running v4.0.5 with the above commits reverted for 5 days
> now, and there's still no hidden port appearing.
>
> What's the status on this? Should those commits be reverted or is there
> another solution to this bug?
>

I'm trying to reproduce, but I've had no luck yet.

Cheers
  Trond

Re: [PATCH net-next 00/15] Simplify netfilter and network namespaces

2015-06-18 Thread Julian Anastasov

Hello,

On Thu, 18 Jun 2015, Eric W. Biederman wrote:

> My incremental patch for ipvs on top of everything else I have pushed
> out looks like this:
> 
> From: "Eric W. Biederman" 
> Date: Fri, 12 Jun 2015 18:34:12 -0500
> Subject: [PATCH] ipvs: Pass struct net down to where it is needed and used
> 
> Pass struct net down to where it is used and stop guessing
> which network namespace should be used.

At first look the patch is ok. But I'm not sure
about the changes in ip_vs_xmit.c. Can you explain in
2-3 lines when we can see a different netns? Is it when
the packet is forwarded to an output device that is part of
another netns?

I'm asking because these __ip_vs_get_out_rt*
calls in ip_vs_xmit.c can reroute packet to another
device...

Also, skb_sknet is another candidate for removal.
But I can take care of it after your changes are
pushed...

Regards

--
Julian Anastasov 

Re: [PATCH net-next] x_table: align per cpu xt_counter

2015-06-18 Thread Pablo Neira Ayuso

On Wed, Jun 17, 2015 at 07:08:15PM +0200, Florian Westphal wrote:
> Eric Dumazet  wrote:
> > From: Eric Dumazet 
> > 
> > Let's force a 16 bytes alignment on xt_counter percpu allocations,
> > so that bytes and packets sit in same cache line.
> > 
> > xt_counter being exported to user space, we cannot add __align(16) on
> > the structure itself.
> 
> Sorry, I was away.  Looks great.
> 
> Acked-by: Florian Westphal 

Applied, thanks Eric and Florian !
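
[Editorial note] The alignment argument in the quoted patch description can
be checked with a small userspace sketch. Everything here is an assumption
made for illustration: struct model_counter stands in for the two 64-bit
counters, MODEL_CACHE_LINE assumes a 64-byte cache line, and C11
aligned_alloc() replaces the kernel's percpu allocator, which this sketch
does not model.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-in for the two 64-bit counters; not the kernel struct. */
struct model_counter {
	uint64_t pcnt;	/* packets */
	uint64_t bcnt;	/* bytes */
};

#define MODEL_CACHE_LINE 64	/* assumed cache line size in bytes */

/* Non-zero if both fields of *c fall within the same cache line. */
static int model_same_cache_line(const struct model_counter *c)
{
	uintptr_t first = (uintptr_t)&c->pcnt / MODEL_CACHE_LINE;
	uintptr_t last = ((uintptr_t)&c->bcnt + sizeof(c->bcnt) - 1) /
			 MODEL_CACHE_LINE;

	return first == last;
}

/* Allocate n counters on a 16-byte boundary; release with free(). */
static struct model_counter *model_alloc_counters(size_t n)
{
	assert(n > 0);
	return aligned_alloc(16, n * sizeof(struct model_counter));
}
```

Because the structure is 16 bytes and 16 divides the assumed line size, a
16-byte-aligned counter can never straddle two lines, whereas an 8-byte
aligned one could.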

pull request: bluetooth-next 2015-06-18

2015-06-18 Thread Johan Hedberg

Hi Dave,

Here's the final bluetooth-next pull request for 4.2.

 - Cleanups & fixes to 802.15.4 code and related drivers
 - Fix btusb driver memory leak
 - New USB IDs for Atheros controllers
 - Support for BCM4324B3 UART based Broadcom controller
 - Fix for Bluetooth encryption key size handling
 - Broadcom controller initialization fixes
 - Support for Intel controller DDC parameters
 - Support for multiple Bluetooth LE advertising instances
 - Fix for HCI user channel cleanup path

Please let me know if there are any issues pulling. Thanks.

Johan

---
The following changes since commit a9ab2184f451ec78af245ebb8b663d8700d44672:

  Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge (2015-05-31 01:07:06 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git for-upstream

for you to fetch changes up to 952497b159468477392f9b562b904da9bc76d468:

  Bluetooth: Fix warning of potentially uninitialized adv_instance variable (2015-06-18 21:05:31 +0300)


Aleksei Volkov (1):
  Bluetooth: btusb: Correct typo in Roper Class 1 Bluetooth Dongle

Alexander Aring (20):
  ieee802154: 6lowpan: set ackreq when needed
  mac802154: remove unneeded vif struct
  mac802154: cleanup address filtering flags
  mac802154: remove aack hw flag
  mac802154: cleanup ieee802154 hardware flags
  mac802154: remove unused hw_filt attribute
  mac802154: rearrange attribute in ieee802154_hw
  mac802154: add missing structure comments
  mac802154: change pan_coord type to bool
  mac802154: fix flags BIT definitions order
  mac802154: iface: fix hrtimer cancel on ifdown
  mac802154: iface: flush workqueue before stop
  at86rf230: use level high as fallback default
  at86rf230: add support for sleep state
  fakelb: add xmit_async after stop testcase
  at86rf230: fix phy settings while sleeping
  at86rf230: add recommended csma backoffs settings
  at86rf230: cleanup start and stop callbacks
  mac802154: iface: fix order while interface up
  mac802154: iface: cleanup stack variable

Alexey Dobriyan (1):
  Bluetooth: Stop sabotaging list poisoning

Arron Wang (2):
  Bluetooth: Make l2cap_recv_acldata() and sco_recv_scodata() return void
  Bluetooth: Move SCO support under BT_BREDR config option

Chan-yeol Park (1):
  Bluetooth: hci_uart: Fix dereferencing of ERR_PTR

Christoffer Holmstedt (1):
  nl802154: fix misspelled enum

Dmitry Tunin (3):
  ath3k: Add support of 0489:e076 AR3012 device
  ath3k: add support of 13d3:3474 AR3012 device
  Bluetooth: ath3k: Add support of 04ca:300d AR3012 device

Florian Grandel (20):
  Bluetooth: hci_core/mgmt: Introduce multi-adv list
  Bluetooth: hci_core/mgmt: move adv timeout to hdev
  Bluetooth: mgmt: dry update_scan_rsp_data()
  Bluetooth: mgmt: rename update_*_data_for_instance()
  Bluetooth: mgmt: multi adv for read_adv_features()
  Bluetooth: mgmt: multi adv for get_current_adv_instance()
  Bluetooth: mgmt: multi adv for get_adv_instance_flags()
  Bluetooth: mgmt: improve get_adv_instance_flags() readability
  Bluetooth: mgmt: multi adv for enable_advertising()
  Bluetooth: mgmt: multi adv for create_instance_scan_rsp_data()
  Bluetooth: mgmt: multi adv for create_instance_adv_data()
  Bluetooth: mgmt: multi adv for set_advertising*()
  Bluetooth: mgmt: multi adv for clear_adv_instances()
  Bluetooth: mgmt/hci_core: multi-adv for add_advertising*()
  Bluetooth: mgmt: multi adv for remove_advertising*()
  Bluetooth: mgmt: program multi-adv on power on
  Bluetooth: mgmt: multi-adv for trigger_le_scan()
  Bluetooth: mgmt: multi-adv for mgmt_reenable_advertising()
  Bluetooth: hci_core: remove obsolete adv_instance
  Bluetooth: hci_core: increase max adv inst

Frederic Danis (7):
  Bluetooth: btbcm: Move request/release_firmware()
  Bluetooth: btbcm: Add BCM4324B3 UART device
  Bluetooth: hci_uart: Support operational speed during setup
  Bluetooth: btbcm: Add helper functions for UART setup
  Bluetooth: hci_uart: Update Broadcom UART setup
  Bluetooth: hci_uart: Add bcm_set_baudrate()
  Bluetooth: hci_uart: Fix speed selection

Glenn Ruben Bakke (5):
  Bluetooth: 6lowpan: Enable delete_netdev to be scheduled when last peer is deleted
  Bluetooth: 6lowpan: Rename ambiguous variable
  Bluetooth: 6lowpan: Move netdev sysfs device reference
  Bluetooth: 6lowpan: Fix double kfree of netdev priv
  Bluetooth: 6lowpan: Fix module refcount

Ilya Faenson (2):
  Bluetooth: btbcm: Support the BCM4354 Bluetooth UART device
  Bluetooth: hci_uart: Add new line discipline enhancements

Jaganath Kanakkassery (1):
  Bluetooth: Fix potential NULL dereference in RFCOMM bind callback

Johan Hedberg (10):
  Bluetoo

[PATCH next v3] bonding: Display LACP info only to CAP_NET_ADMIN capable user

2015-06-18 Thread Mahesh Bandewar

Actor and Partner details can be accessed via proc-fs, sys-fs
entries or the netlink interface. These interfaces are world readable
at this moment. The earlier patch series made the LACP communication
secure to avoid nuisance attacks from within the same L2 domain, but
it did not prevent "someone unprivileged" from looking at that
information on the host and performing the same act.

This patch essentially avoids emitting those entries if the user
in question does not have enough privileges.

Signed-off-by: Mahesh Bandewar 
---
 drivers/net/bonding/bond_netlink.c |  23 +
 drivers/net/bonding/bond_procfs.c  | 101 +++--
 drivers/net/bonding/bond_sysfs.c   |  12 ++---
 3 files changed, 71 insertions(+), 65 deletions(-)

diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
index 5580fcde738f..1bda29249d12 100644
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -601,19 +601,20 @@ static int bond_fill_info(struct sk_buff *skb,
if (BOND_MODE(bond) == BOND_MODE_8023AD) {
struct ad_info info;
 
-   if (nla_put_u16(skb, IFLA_BOND_AD_ACTOR_SYS_PRIO,
-   bond->params.ad_actor_sys_prio))
-   goto nla_put_failure;
-
-   if (nla_put_u16(skb, IFLA_BOND_AD_USER_PORT_KEY,
-   bond->params.ad_user_port_key))
-   goto nla_put_failure;
+   if (capable(CAP_NET_ADMIN)) {
+   if (nla_put_u16(skb, IFLA_BOND_AD_ACTOR_SYS_PRIO,
+   bond->params.ad_actor_sys_prio))
+   goto nla_put_failure;
 
-   if (nla_put(skb, IFLA_BOND_AD_ACTOR_SYSTEM,
-   sizeof(bond->params.ad_actor_system),
-   &bond->params.ad_actor_system))
-   goto nla_put_failure;
+   if (nla_put_u16(skb, IFLA_BOND_AD_USER_PORT_KEY,
+   bond->params.ad_user_port_key))
+   goto nla_put_failure;
 
+   if (nla_put(skb, IFLA_BOND_AD_ACTOR_SYSTEM,
+   sizeof(bond->params.ad_actor_system),
+   &bond->params.ad_actor_system))
+   goto nla_put_failure;
+   }
if (!bond_3ad_get_active_agg_info(bond, &info)) {
struct nlattr *nest;
 
diff --git a/drivers/net/bonding/bond_procfs.c b/drivers/net/bonding/bond_procfs.c
index e7f3047a26df..f514fe5e80a5 100644
--- a/drivers/net/bonding/bond_procfs.c
+++ b/drivers/net/bonding/bond_procfs.c
@@ -135,27 +135,30 @@ static void bond_info_show_master(struct seq_file *seq)
  bond->params.ad_select);
seq_printf(seq, "Aggregator selection policy (ad_select): %s\n",
   optval->string);
-   seq_printf(seq, "System priority: %d\n",
-  BOND_AD_INFO(bond).system.sys_priority);
-   seq_printf(seq, "System MAC address: %pM\n",
-  &BOND_AD_INFO(bond).system.sys_mac_addr);
-
-   if (__bond_3ad_get_active_agg_info(bond, &ad_info)) {
-   seq_printf(seq, "bond %s has no active aggregator\n",
-  bond->dev->name);
-   } else {
-   seq_printf(seq, "Active Aggregator Info:\n");
-
-   seq_printf(seq, "\tAggregator ID: %d\n",
-  ad_info.aggregator_id);
-   seq_printf(seq, "\tNumber of ports: %d\n",
-  ad_info.ports);
-   seq_printf(seq, "\tActor Key: %d\n",
-  ad_info.actor_key);
-   seq_printf(seq, "\tPartner Key: %d\n",
-  ad_info.partner_key);
-   seq_printf(seq, "\tPartner Mac Address: %pM\n",
-  ad_info.partner_system);
+   if (capable(CAP_NET_ADMIN)) {
+   seq_printf(seq, "System priority: %d\n",
+  BOND_AD_INFO(bond).system.sys_priority);
+   seq_printf(seq, "System MAC address: %pM\n",
+  &BOND_AD_INFO(bond).system.sys_mac_addr);
+
+   if (__bond_3ad_get_active_agg_info(bond, &ad_info)) {
+   seq_printf(seq,
+  "bond %s has no active aggregator\n",
+  bond->dev->name);
+   } else {
+   seq_printf(seq, "Active Aggregator Info:\n");
+
+   seq_printf(seq, "\tAggregator ID: %d\n",
+

Re: [PATCH next v2] bonding: Display LACP info only to CAP_NET_ADMIN capable user

2015-06-18 Thread Mahesh Bandewar

>>
>> Hmm... I would rather not send these fake attributes at all ?
>
> That would be my preference as well.  Sorry if my lack of elaboration on
> on my earlier email made this confusing.
>
> If there are values that should not be visible to non-root users, then
> don't send them at all.  Do not just send NULL values.
>
OK, would change this in the next rev.

Thanks,
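
[Editorial note] The agreed approach (omit sensitive attributes for
unprivileged readers instead of sending zeroed placeholders) can be sketched
as a userspace model. All names below (struct model_msg, model_put(),
model_fill_info(), the MODEL_ATTR_* values) are hypothetical and only stand
in for the netlink helpers used in the patch; the "privileged" argument
plays the role of the capable(CAP_NET_ADMIN) check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attribute types for the model. */
enum { MODEL_ATTR_MODE = 1, MODEL_ATTR_ACTOR_SYS_PRIO = 2 };

/* Minimal stand-in for a message carrying a list of attributes. */
struct model_msg {
	int types[8];
	size_t count;
};

/* Append one attribute; returns 0 on success, -1 if the message is full. */
static int model_put(struct model_msg *m, int type)
{
	if (m->count >= sizeof(m->types) / sizeof(m->types[0]))
		return -1;
	m->types[m->count++] = type;
	return 0;
}

/* Fill the message; sensitive attributes are added only when privileged. */
static int model_fill_info(struct model_msg *m, bool privileged)
{
	assert(m != NULL);
	if (model_put(m, MODEL_ATTR_MODE))
		return -1;
	/* Omitted entirely for unprivileged readers, not sent as zero. */
	if (privileged && model_put(m, MODEL_ATTR_ACTOR_SYS_PRIO))
		return -1;
	return 0;
}
```

An unprivileged reader then simply sees a shorter message, so it cannot
mistake a placeholder for a real value.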

Re: [PATCH net-next 0/3 v5] changes to make ipv4 routing table aware of next-hop link status

2015-06-18 Thread Scott Feldman

On Thu, Jun 18, 2015 at 8:22 AM, Andy Gospodarek
 wrote:
> This series adds the ability to have the Linux kernel track whether or
> not a particular route should be used based on the link-status of the
> interface associated with the next-hop.
>
> Before this patch any link-failure on an interface that was serving as a
> gateway for some systems could result in those systems being isolated
> from the rest of the network as the stack would continue to attempt to
> send frames out of an interface that is actually linked-down.  When the
> kernel is responsible for all forwarding, it should also be responsible
> for taking action when the traffic can no longer be forwarded -- there
> is no real need to outsource link-monitoring to userspace anymore.
>
> This feature is only enabled with the new per-interface or ipv4 global
> sysctls called 'ignore_routes_with_linkdown'.
>
> net.ipv4.conf.all.ignore_routes_with_linkdown = 0
> net.ipv4.conf.default.ignore_routes_with_linkdown = 0
> net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
> ...
>
> When the above sysctls are set, the kernel will not only report to
> userspace that the link is down, but it will also report to userspace
> that a route is dead.  This will signal to userspace that the route will
> not be selected.
>
> With the new sysctls set, the following behavior can be observed
> (interface p8p1 is link-down):
>
> # ip route show
> default via 10.0.5.2 dev p9p1
> 10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
> 70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
> 80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
> 90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 dead linkdown
> 90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
> # ip route get 90.0.0.1
> 90.0.0.1 via 70.0.0.2 dev p7p1  src 70.0.0.1
> cache
> # ip route get 80.0.0.1
> local 80.0.0.1 dev lo  src 80.0.0.1
> cache 
> # ip route get 80.0.0.2
> 80.0.0.2 via 10.0.5.2 dev p9p1  src 10.0.5.15
> cache
>
> While the route does remain in the table (so it can be modified if
> needed rather than being wiped away as it would be if IFF_UP was
> cleared), the proper next-hop is chosen automatically when the link is
> down.  Now interface p8p1 is linked-up:
>
> # ip route show
> default via 10.0.5.2 dev p9p1
> 10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
> 70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
> 80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
> 90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1
> 90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
> 192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2
> # ip route get 90.0.0.1
> 90.0.0.1 via 80.0.0.2 dev p8p1  src 80.0.0.1
> cache
> # ip route get 80.0.0.1
> local 80.0.0.1 dev lo  src 80.0.0.1
> cache 
> # ip route get 80.0.0.2
> 80.0.0.2 dev p8p1  src 80.0.0.1
> cache
>
> and the output changes to what one would expect.
>
> If the global or interface sysctl is not set, the following output would be
> expected when p8p1 is down:
>
> # ip route show
> default via 10.0.5.2 dev p9p1
> 10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
> 70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
> 80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 linkdown
> 90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 linkdown
> 90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
>
> If the dead flag does not appear there should be no expectation that the
> kernel would skip using this route due to link being down.
>
> v2: Split kernel changes into 2 patches: first to add linkdown flag and
> second to add new sysctl settings.  Also took suggestion from Alex to
> simplify code by only checking sysctl during fib lookup and suggestion
> from Scott to add a per-interface sysctl.  Added iproute2 patch to
> recognize and print linkdown flag.
>
> v3: Code cleanups along with reverse-path checks suggested by Alex and
> small fixes related to problems found when multipath was disabled.
>
> v4: Drop binary sysctls
>
> v5: Whitespace and variable declaration fixups suggested by Dave
>
> Though there were some that preferred not to have a configuration option
> and to make this behavior the default when it was discussed in Ottawa
> earlier this year since "it was time to do this."  I wanted to propose
> the config option to preserve the current behavior for those that desire
> it.  I'll happily remove it if Dave and Linus approve.
>
> An IPv6 implementation is also needed (DECnet too!), but I wanted to start
> with the IPv4 implementation to get people comfortable with the idea before
> moving forward.  If this is accepted the IPv6 implementation can be posted
> shortly.
>
> There was also a request for switchdev support for this, but that will be
> posted as a followup as switchdev does not currently handle dead
> next-hops in a multi-path case and I felt that infra needed to be added
> first.

Andy, I finally got some time to try your patches with
switchd

Re: [PATCH net 2/2] bridge: multicast: start querier timer when running user-space stp

2015-06-18 Thread Nikolay Aleksandrov


> On Jun 18, 2015, at 6:37 AM, Herbert Xu  wrote:
> 
> On Wed, Jun 17, 2015 at 04:28:31AM -0700, Nikolay Aleksandrov wrote:
>> From: Satish Ashok 
>> 
>> When STP is running in user-space and querier is configured, the
>> querier timer is not started when a port goes to forwarding state.
>> 
>> Signed-off-by: Satish Ashok 
>> Signed-off-by: Nikolay Aleksandrov 
>> Fixes: eb1d16414339 ("bridge: Add core IGMP snooping support")
>> ---
>> net/bridge/br_stp.c | 3 +++
>> 1 file changed, 3 insertions(+)
>> 
>> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
>> index fb3ebe615513..1e2f2f1ff6b0 100644
>> --- a/net/bridge/br_stp.c
>> +++ b/net/bridge/br_stp.c
>> @@ -456,6 +456,9 @@ void br_port_state_selection(struct net_bridge *br)
>>  p->topology_change_ack = 0;
>>  br_make_blocking(p);
>>  }
>> +} else if (br->stp_enabled == BR_USER_STP &&
>> +   p->state == BR_STATE_FORWARDING) {
>> +br_multicast_enable_port(p);
>>  }
> 
> Minor nit, the stp_enabled check appears to be redundant since
> you're in the else clause.
> 

Right you are, I’ve overlooked it.

> More importantly, I'm not sure about the logic.  For kernel STP,
> we enable the port as soon as we get out of blocking.  IIRC enabling
> the port just means that we start tracking subscriptions/queries
> so it should be OK to do even while we're listening/learning.
> 
> In any case the logic should be identical whether we use kernel
> STP or user-space STP.
> 
> So how about removing br_multicast_enable_port from br_make_forward
> and just add it here for both kernel and user-space STP?
> 
> Thanks,
> -- 
> Email: Herbert Xu 
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Makes sense, I’ll re-spin, test and post a v2. Thank you for the suggestion.

Cheers,
 Nik



e1000e driver - hang after 4 hours of uptime - finally bisected!

2015-06-18 Thread Valdis Kletnieks

(follow up to a report from last week - bisecting took a while as I could
only do 1 or 2 tests an evening)

My Dell Latitude E6530 crashes with a specific kernel lockup almost
exactly 4 hours after boot if there isn't a cable connected to the
Ethernet port:

[14508.846327] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
[14468.229720] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
[14463.254791] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
[14491.134413] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
[14463.396593] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
[14490.390223] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
[14494.680591] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
[14513.365378] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
[14482.271716] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
[14479.906820] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0

As far as I can tell, the timestamp jitter is just how long it takes me to
enter the cryptLUKS passphrase for the hard drive at boot...

lspci tells me:

lspci -vvv -s "00:19.0"
00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04)
DeviceName:  Onboard LAN
Subsystem: Dell Device 0535
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-   [] dump_stack+0x50/0xa8
[14479.906930]  [] panic+0xcd/0x1e4
[14479.906940]  [] ? perf_event_task_disable+0xc0/0xc0
[14479.906952]  [] watchdog_overflow_callback+0x9b/0xa0
[14479.906959]  [] __perf_event_overflow+0xc4/0x1f0
[14479.906968]  [] perf_event_overflow+0x14/0x20
[14479.906976]  [] intel_pmu_handle_irq+0x1e1/0x430
[14479.906990]  [] perf_event_nmi_handler+0x26/0x40
[14479.906999]  [] nmi_handle+0x103/0x340
[14479.907005]  [] ? nmi_handle+0x5/0x340
[14479.907017]  [] default_do_nmi+0xc3/0x120
[14479.907032]  [] do_nmi+0xe8/0x130
[14479.907044]  [] end_repeat_nmi+0x1e/0x2e
[14479.907055]  [] ? e1000e_cyclecounter_read+0x16/0xc0
[14479.907061]  [] ? e1000e_cyclecounter_read+0x16/0xc0
[14479.907069]  [] ? e1000e_cyclecounter_read+0x16/0xc0
[14479.907075]  <>  [] timecounter_read+0x19/0x60
[14479.907088]  [] e1000e_phc_gettime+0x2e/0x60
[14479.907098]  [] e1000e_systim_overflow_work+0x31/0x70
[14479.907105]  [] process_one_work+0x3c9/0x980
[14479.907115]  [] ? process_one_work+0x312/0x980
[14479.907125]  [] ? worker_thread+0x78/0x760
[14479.907134]  [] worker_thread+0x2cc/0x760
[14479.907144]  [] ? process_one_work+0x980/0x980
[14479.907154]  [] kthread+0xfe/0x120
[14479.907163]  [] ? finish_task_switch+0x50/0x1c0
[14479.907173]  [] ? kthread_create_on_node+0x270/0x270
[14479.907179]  [] ret_from_fork+0x3f/0x70
[14479.907188]  [] ? kthread_create_on_node+0x270/0x270
[14479.907243] Kernel Offset: 0x3900 from 0x8100 (relocation range: 0x8000-0xbfff)

Bisection tells me it's this commit:

commit 83129b37ef35bb6a7f01c060129736a8db5d31c4
Author: Yanir Lubetkin 
Date:   Tue Jun 2 17:05:45 2015 +0300

e1000e: fix systim issues

Two issues involving systim were reported.
1. Clock is not running in the correct frequency
2. In some situations, systim values were not incremented linearly
This patch fixes the hardware clock configuration and the spurious
non-linear increment.





[PATCH net v2] tcp: Do not call tcp_fastopen_reset_cipher from interrupt context

2015-06-18 Thread Christoph Paasch

tcp_fastopen_reset_cipher really cannot be called from interrupt
context. It allocates the tcp_fastopen_context with GFP_KERNEL and
calls crypto_alloc_cipher, which allocates all kinds of stuff with
GFP_KERNEL.

Thus, we might sleep when the key generation is triggered by an
incoming TFO cookie request, which would then happen in interrupt
context, as shown by enabling CONFIG_DEBUG_ATOMIC_SLEEP:

[   36.001813] BUG: sleeping function called from invalid context at mm/slub.c:1266
[   36.003624] in_atomic(): 1, irqs_disabled(): 0, pid: 1016, name: packetdrill
[   36.004859] CPU: 1 PID: 1016 Comm: packetdrill Not tainted 4.1.0-rc7 #14
[   36.006085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[   36.008250]  04f2 88007f8838a8 8171d53a 880075a084a8
[   36.009630]  880075a08000 88007f8838c8 810967d3 88007f883928
[   36.011076]   88007f8838f8 81096892 88007f89be00
[   36.012494] Call Trace:
[   36.012953][] dump_stack+0x4f/0x6d
[   36.014085]  [] ___might_sleep+0x103/0x170
[   36.015117]  [] __might_sleep+0x52/0x90
[   36.016117]  [] kmem_cache_alloc_trace+0x47/0x190
[   36.017266]  [] ? tcp_fastopen_reset_cipher+0x42/0x130
[   36.018485]  [] tcp_fastopen_reset_cipher+0x42/0x130
[   36.019679]  [] tcp_fastopen_init_key_once+0x61/0x70
[   36.020884]  [] __tcp_fastopen_cookie_gen+0x1c/0x60
[   36.022058]  [] tcp_try_fastopen+0x58f/0x730
[   36.023118]  [] tcp_conn_request+0x3e8/0x7b0
[   36.024185]  [] ? __module_text_address+0x12/0x60
[   36.025327]  [] tcp_v4_conn_request+0x51/0x60
[   36.026410]  [] tcp_rcv_state_process+0x190/0xda0
[   36.027556]  [] ? __inet_lookup_established+0x47/0x170
[   36.028784]  [] tcp_v4_do_rcv+0x16d/0x3d0
[   36.029832]  [] ? security_sock_rcv_skb+0x16/0x20
[   36.030936]  [] tcp_v4_rcv+0x77a/0x7b0
[   36.031875]  [] ? iptable_filter_hook+0x33/0x70
[   36.032953]  [] ip_local_deliver_finish+0x92/0x1f0
[   36.034065]  [] ip_local_deliver+0x9a/0xb0
[   36.035069]  [] ? ip_rcv+0x3d0/0x3d0
[   36.035963]  [] ip_rcv_finish+0x119/0x330
[   36.036950]  [] ip_rcv+0x2e7/0x3d0
[   36.037847]  [] __netif_receive_skb_core+0x552/0x930
[   36.038994]  [] __netif_receive_skb+0x27/0x70
[   36.040033]  [] process_backlog+0xd2/0x1f0
[   36.041025]  [] net_rx_action+0x122/0x310
[   36.042007]  [] __do_softirq+0x103/0x2f0
[   36.042978]  [] do_softirq_own_stack+0x1c/0x30

This patch moves the call to tcp_fastopen_init_key_once to the places
where a listener socket creates its TFO-state, which always happens in
user-context (either from the setsockopt, or implicitly during the
listen()-call)

Cc: Eric Dumazet 
Cc: Hannes Frederic Sowa 
Fixes: 222e83d2e0ae ("tcp: switch tcp_fastopen key generation to net_get_random_once")
Signed-off-by: Christoph Paasch 
---

Notes:
v2: Instead of reverting Hannes' patch, move the call to
tcp_fastopen_init_key_once to the places where we enable TFO on the
server-side from user-context.

 net/ipv4/af_inet.c  | 2 ++
 net/ipv4/tcp.c  | 7 +--
 net/ipv4/tcp_fastopen.c | 2 --
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 8b47a4d79d04..a5aa54ea6533 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -228,6 +228,8 @@ int inet_listen(struct socket *sock, int backlog)
err = 0;
if (err)
goto out;
+
+   tcp_fastopen_init_key_once(true);
}
err = inet_csk_listen_start(sk, backlog);
if (err)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f1377f2a0472..bb2ce74f6004 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2545,10 +2545,13 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 
case TCP_FASTOPEN:
if (val >= 0 && ((1 << sk->sk_state) & (TCPF_CLOSE |
-   TCPF_LISTEN)))
+   TCPF_LISTEN))) {
+   tcp_fastopen_init_key_once(true);
+
err = fastopen_init_queue(sk, val);
-   else
+   } else {
err = -EINVAL;
+   }
break;
case TCP_TIMESTAMP:
if (!tp->repair)
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index 46b087a27503..f9c0fb84e435 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -78,8 +78,6 @@ static bool __tcp_fastopen_cookie_gen(const void *path,
struct tcp_fastopen_context *ctx;
bool ok = false;
 
-   tcp_fastopen_init_key_once(true);
-
rcu_read_lock();
ctx = rcu_dereference(tcp_fastopen_ctx);
if (ctx) {
-- 
2.4.4
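
[Editorial note] The idea of the patch above (do the one-time, possibly
sleeping key setup from process context so the receive path only reads the
result) can be sketched as a userspace model. The model_* names and the
model_in_atomic flag are hypothetical and exist only for this illustration;
they do not correspond to kernel symbols, and the real code relies on
net_get_random_once and RCU rather than plain flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these are kernel symbols. */
static bool model_in_atomic;	/* true while "softirq" code is running */
static bool model_key_ready;	/* set once the key has been generated */

/* One-time key setup; may sleep, so it must not run in atomic context. */
static void model_init_key_once(void)
{
	assert(!model_in_atomic);
	if (!model_key_ready)
		model_key_ready = true;	/* stands in for the allocating setup */
}

/* Process context: listen() or setsockopt(TCP_FASTOPEN). */
static void model_listen(void)
{
	model_init_key_once();
}

/* Softirq context: cookie generation only reads the key state. */
static bool model_cookie_gen(void)
{
	bool ok;

	model_in_atomic = true;
	ok = model_key_ready;	/* no initialisation on this path any more */
	model_in_atomic = false;
	return ok;
}
```

In this model a cookie request that arrives before any listener has enabled
TFO simply fails, and the setup is never reached from the atomic path.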


Re: [PATCH net-next 3/3 v5] iproute2: add support to print 'linkdown' nexthop flag

2015-06-18 Thread Scott Feldman

On Thu, Jun 18, 2015 at 8:57 AM, Andy Gospodarek
 wrote:
> On Thu, Jun 18, 2015 at 08:43:08AM -0700, Scott Feldman wrote:
>> On Thu, Jun 18, 2015 at 8:22 AM, Andy Gospodarek
>>  wrote:
>> > Signed-off-by: Andy Gospodaerk 
>> > Signed-off-by: Dinesh Dutt 
>> >
>> > ---
>> >  ip/iproute.c | 4 
>> >  1 file changed, 4 insertions(+)
>> >
>> > diff --git a/ip/iproute.c b/ip/iproute.c
>> > index 3795baf..3369c49 100644
>> > --- a/ip/iproute.c
>> > +++ b/ip/iproute.c
>> > @@ -451,6 +451,8 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
>> > fprintf(fp, "offload ");
>> > if (r->rtm_flags & RTM_F_NOTIFY)
>> > fprintf(fp, "notify ");
>> > +   if (r->rtm_flags & RTNH_F_LINKDOWN)
>> > +   fprintf(fp, "linkdown ");
>>
>>
>> iproute.c: In function ‘print_route’:
>> iproute.c:454:21: error: ‘RTNH_F_LINKDOWN’ undeclared (first use in
>> this function)
>> iproute.c:454:21: note: each undeclared identifier is reported only
>> once for each function it appears in
>
> Yes, you need to pull that from the patches above into your iproute2
> sources.  Stephen regularly tells people not to post uapi updates, so I
> did not.

Ok, thanks.

Re: [PATCH v2] bpf: fix a bug in verification logic when SUB operation taken on FRAME_PTR

2015-06-18 Thread Alexei Starovoitov

On Thu, Jun 18, 2015 at 08:31:45AM +, Wang Nan wrote:
> The original code has a problem that causes the following program to fail
> verification:
> 
>  r1 <- r10
>  r1 -= 8
>  r2 = 8
>  r3 = unsafe pointer
>  call BPF_FUNC_probe_read  <-- R1 type=inv expected=fp
> 
> However, by replacing 'r1 -= 8' with 'r1 += -8' the above program can be
> loaded successfully.
> 
> This is because the verifier allows only the BPF_ADD instruction on a
> FRAME_PTR register to forge a PTR_TO_STACK register, but makes BPF_SUB
> on a FRAME_PTR register yield an UNKNOWN_VALUE register.
> 
> This patch fixes it by adding BPF_SUB to the stack_relative check.

It's not a bug. It's catching ADD only by design.
If we let it recognize SUB then one might argue we should let it
recognize multiply, shifts and all other arithmetic on pointers.
verifier will be getting bigger and bigger. Where do we stop?
llvm only emits canonical ADD. If you've seen llvm doing SUB,
let's fix it there.
So what piece generated this 'r1 -= 8' ?
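
To make the point above concrete, here is a toy model of the rule, not the
kernel verifier itself. The enum names and both helpers are made up for
illustration; only the behaviour (ADD on the frame pointer keeps a stack
pointer, anything else degrades to an unknown value) is taken from this
thread.

```c
#include <stdbool.h>

/* Toy register types, loosely named after the ones in this thread. */
enum reg_type { UNKNOWN_VALUE, FRAME_PTR, PTR_TO_STACK };

/* Toy ALU opcodes; the real BPF instruction encoding is different. */
enum alu_op { OP_ADD, OP_SUB, OP_MUL };

/*
 * Type of the destination register after "dst <op>= imm".
 * Only the canonical form (ADD of an immediate to the frame pointer)
 * keeps pointer-ness; any other arithmetic yields UNKNOWN_VALUE.
 */
static enum reg_type alu_imm_result(enum alu_op op, enum reg_type dst)
{
	if (dst == FRAME_PTR && op == OP_ADD)
		return PTR_TO_STACK;
	return UNKNOWN_VALUE;
}

/* A helper argument that must point into the stack accepts only PTR_TO_STACK. */
static bool stack_arg_ok(enum reg_type t)
{
	return t == PTR_TO_STACK;
}
```

In this model 'r1 += -8' passes stack_arg_ok() and 'r1 -= 8' does not, which
is the "R1 type=inv expected=fp" rejection quoted above.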


Re: [PATCH net-next 3/3 v5] iproute2: add support to print 'linkdown' nexthop flag

2015-06-18 Thread Andy Gospodarek

On Thu, Jun 18, 2015 at 08:43:08AM -0700, Scott Feldman wrote:
> On Thu, Jun 18, 2015 at 8:22 AM, Andy Gospodarek
>  wrote:
> > Signed-off-by: Andy Gospodaerk 
> > Signed-off-by: Dinesh Dutt 
> >
> > ---
> >  ip/iproute.c | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/ip/iproute.c b/ip/iproute.c
> > index 3795baf..3369c49 100644
> > --- a/ip/iproute.c
> > +++ b/ip/iproute.c
> > @@ -451,6 +451,8 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
> > fprintf(fp, "offload ");
> > if (r->rtm_flags & RTM_F_NOTIFY)
> > fprintf(fp, "notify ");
> > +   if (r->rtm_flags & RTNH_F_LINKDOWN)
> > +   fprintf(fp, "linkdown ");
> 
> 
> iproute.c: In function ‘print_route’:
> iproute.c:454:21: error: ‘RTNH_F_LINKDOWN’ undeclared (first use in
> this function)
> iproute.c:454:21: note: each undeclared identifier is reported only
> once for each function it appears in

Yes, you need to pull that from the patches above into your iproute2
sources.  Stephen regularly tells people not to post uapi updates, so I
did not.


[PATCH] mvneta: add forgotten initialization of autonegotiation bits

2015-06-18 Thread Stas Sergeev


Commit 898b2970e2c9 ("mvneta: implement SGMII-based in-band link state
signaling") changed mvneta_adjust_link() so that it does not clear the
auto-negotiation bits in the MVNETA_GMAC_AUTONEG_CONFIG register. This was
necessary for auto-negotiation mode to work.
Unfortunately, I did not check whether these bits are ever initialized.
It appears they are not.
This patch adds the missing initialization of the auto-negotiation bits
in the MVNETA_GMAC_AUTONEG_CONFIG register.
It fixes the following regression:
https://www.mail-archive.com/netdev@vger.kernel.org/msg67928.html

Since the patch was tested to fix a regression, it should be applied to
the stable tree.

Tested-by: Arnaud Ebalard 

CC: Thomas Petazzoni 
CC: Florian Fainelli 
CC: netdev@vger.kernel.org
CC: linux-ker...@vger.kernel.org
CC: sta...@vger.kernel.org

Signed-off-by: Stas Sergeev 
---
 drivers/net/ethernet/marvell/mvneta.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index ce5f7f9..74176ec 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1013,6 +1013,12 @@ static void mvneta_defaults_set(struct mvneta_port *pp)
val = mvreg_read(pp, MVNETA_GMAC_CLOCK_DIVIDER);
val |= MVNETA_GMAC_1MS_CLOCK_ENABLE;
mvreg_write(pp, MVNETA_GMAC_CLOCK_DIVIDER, val);
+   } else {
+   val = mvreg_read(pp, MVNETA_GMAC_AUTONEG_CONFIG);
+   val &= ~(MVNETA_GMAC_INBAND_AN_ENABLE |
+  MVNETA_GMAC_AN_SPEED_EN |
+  MVNETA_GMAC_AN_DUPLEX_EN);
+   mvreg_write(pp, MVNETA_GMAC_AUTONEG_CONFIG, val);
}

mvneta_set_ucast_table(pp, -1);
-- 
1.9.1

Re: [RFC V3] net: don't wait for order-3 page allocation

2015-06-18 Thread Michal Hocko

On Thu 18-06-15 17:22:40, Vlastimil Babka wrote:
> On 06/18/2015 04:43 PM, Michal Hocko wrote:
> >On Thu 18-06-15 07:35:53, Eric Dumazet wrote:
> >>On Thu, Jun 18, 2015 at 7:30 AM, Michal Hocko  wrote:
> >>
> >>>Abusing __GFP_NO_KSWAPD is a wrong way to go IMHO. It is true that the
> >>>_current_ implementation of the allocator has this nasty and very subtle
> >>>side effect but that doesn't mean it should be abused outside of the mm
> >>>proper. Why shouldn't this path wake the kswapd and let it compact
> >>>memory on the background to increase the success rate for the later
> >>>high order allocations?
> >>
> >>I kind of agree.
> >>
> >>If kswapd is a problem (is it ???) we should fix it, instead of adding
> >>yet another flag to some random locations attempting
> >>memory allocations.
> >
> >No, kswapd is not a problem. The problem is ~__GFP_WAIT allocation can
> >access some portion of the memory reserves (see gfp_to_alloc_flags resp.
> >__zone_watermark_ok and ALLOC_HARDER). __GFP_NO_KSWAPD is just a dirty
> >hack to not give that access which was introduced for THP AFAIR.
> >
> >The implicit access to memory reserves for non sleeping allocation has
> >been there for ages and it might be not suitable for this particular
> >path but that doesn't mean another gfp flag with a different side effect
> >should be hijacked. We should either stop doing that implicit access to
> >memory reserves and give __GFP_RESERVE or add the __GFP_NORESERVE. But
> >that is a problem to be solved in the mm proper. Spreading subtle
> >dependencies outside of mm will just make situation worse.
> 
> So you are not proposing to use these __GFP_RESERVE/NORESERVE flag outside
> of mm, right? (besides, we distinguish several kinds of reserves, so what
> exactly would the flag do?)

That is to be discussed. Most allocations already express their interest
in memory reserves by __GFP_HIGH directly or by GFP_ATOMIC indirectly.
So maybe we do not need any additional flag here. There are not that
many ~__GFP_WAIT users and most of them seem to require it _only_ because the
context doesn't allow for sleeping (e.g. to prevent deadlocks).

> As that would be also subtle dependency. The
> general problem I think is that we should want the mm users to specify
> higher-level intentions (such as GFP_KERNEL) which would map to specific
> directions (__GFP_*) for the allocator, and currently it's rather a mess of
> both kinds of flags.

I agree. So I think that maybe we should drop that implicit access to
memory reserves for ~__GFP_WAIT allocations and let it do what it is
documented to do.

> Clearly the intention here is "opportunistic allocation that should
> not reclaim/compact, use reserves, wake up kswapd (?) because it's
> better to fall back to smaller pages than wait") and we don't seem to
> have a GFP_OPPORTUNISTIC flag for that. The allocation has to then
> mask out __GFP_WAIT which however looks like an atomic allocation to
> the allocator and give access to reserves, etc...

I think simply dropping __GFP_WAIT is a good way to express that. The
fact that the current implementation gives access to memory reserves
implicitly is just a detail and the user of the allocator shouldn't care
about that.
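
For readers following along, the side effect being discussed can be modelled
roughly as below. This is a toy sketch based only on what is said in this
thread: the TOY_* bit values and toy_alloc_flags() are hypothetical and are
not the kernel's definitions or its gfp_to_alloc_flags().

```c
#include <stdbool.h>

/* Hypothetical bit values, for illustration only; not the kernel's. */
#define TOY_GFP_WAIT       0x1u  /* caller may sleep */
#define TOY_GFP_HIGH       0x2u  /* caller asks for reserves explicitly */
#define TOY_GFP_NO_KSWAPD  0x4u  /* do not wake kswapd */

#define TOY_ALLOC_HIGH     0x1u
#define TOY_ALLOC_HARDER   0x2u  /* dip further into the reserves */

static unsigned int toy_alloc_flags(unsigned int gfp)
{
	unsigned int alloc = 0;
	/* "Atomic-looking": cannot sleep and did not set NO_KSWAPD. */
	bool atomic = !(gfp & (TOY_GFP_WAIT | TOY_GFP_NO_KSWAPD));

	if (gfp & TOY_GFP_HIGH)
		alloc |= TOY_ALLOC_HIGH;
	if (atomic)
		alloc |= TOY_ALLOC_HARDER;  /* the implicit side effect */
	return alloc;
}
```

In this model a caller that only clears the wait bit gets TOY_ALLOC_HARDER
without asking for it, and setting the no-kswapd bit is what suppresses it,
which is the subtle dependency objected to above.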
-- 
Michal Hocko
SUSE Labs

Re: [PATCH net] Revert "tcp: switch tcp_fastopen key generation to net_get_random_once"

2015-06-18 Thread Christoph Paasch

On 18/06/15 - 04:14:13, Eric Dumazet wrote:
> On Thu, 2015-06-18 at 11:32 +0200, Hannes Frederic Sowa wrote:
> > > There does not seem to be a better way to handle this. We could try
> > > to make the call to kmalloc and crypto_alloc_cipher during bootup, and
> > > then generate the random value only on-the-fly (when the first TFO-SYN
> > > comes in) with net_get_random_once in order to have the better entropy
> > > that comes with doing the late initialisation of the random value. But
> > > that's probably net-next material.
> > 
> > can't we simply move the net_get_random_once to the TCP_FASTOPEN setsockopt
> > and sendmsg(MSG_FASTOPEN) path, so those allocations still happen in process
> > context but we still defer the extraction of entropy as long as possible?
> 
> Yes, I do not think this would be hard. This bug is old (3.13) and does
> not seem very urgent to expedite a revert.

True, it would be simpler to move the call to tcp_fastopen_init_key_once to
setsockopt() and inet_listen().
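
Roughly, the shape would be as in the sketch below. It is a single-threaded
toy, not the kernel code: every toy_* name is hypothetical, the key is a
fixed placeholder instead of random bytes, and there is no locking or RCU.

```c
#include <stdbool.h>

static bool toy_key_ready;    /* stands in for the once-initialised TFO key */
static unsigned int toy_key;

/* Hypothetical stand-in for tcp_fastopen_init_key_once(): idempotent. */
static void toy_init_key_once(void)
{
	if (toy_key_ready)
		return;
	toy_key = 0x5eedu;    /* placeholder; the kernel draws random bytes */
	toy_key_ready = true;
}

/* Process-context entry points do the (possibly sleeping) setup. */
static void toy_listen(void)     { toy_init_key_once(); }
static void toy_setsockopt(void) { toy_init_key_once(); }

/* Packet path: never initialises, only uses the key if one exists. */
static bool toy_cookie_gen(unsigned int input, unsigned int *cookie)
{
	if (!toy_key_ready)
		return false;
	*cookie = input ^ toy_key;
	return true;
}
```

The point is only that the packet path no longer triggers the
initialisation; it fails the cookie generation if no listener has set up
the key yet.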

I will resubmit.


Christoph


Re: [PATCH net-next 3/3 v5] iproute2: add support to print 'linkdown' nexthop flag

2015-06-18 Thread Scott Feldman

On Thu, Jun 18, 2015 at 8:22 AM, Andy Gospodarek
 wrote:
> Signed-off-by: Andy Gospodaerk 
> Signed-off-by: Dinesh Dutt 
>
> ---
>  ip/iproute.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/ip/iproute.c b/ip/iproute.c
> index 3795baf..3369c49 100644
> --- a/ip/iproute.c
> +++ b/ip/iproute.c
> @@ -451,6 +451,8 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
> fprintf(fp, "offload ");
> if (r->rtm_flags & RTM_F_NOTIFY)
> fprintf(fp, "notify ");
> +   if (r->rtm_flags & RTNH_F_LINKDOWN)
> +   fprintf(fp, "linkdown ");


iproute.c: In function ‘print_route’:
iproute.c:454:21: error: ‘RTNH_F_LINKDOWN’ undeclared (first use in
this function)
iproute.c:454:21: note: each undeclared identifier is reported only
once for each function it appears in

Re: [PATCH v2 0/4] net/macb: add sama5d2 support

2015-06-18 Thread Alexandre Belloni

On 18/06/2015 at 16:27:19 +0200, Nicolas Ferre wrote :
> Hi,
> 
> This series is basically the support for another flavor of the GEM IP
> configuration. It ended up being a series because of some little fixes made to
> the binding documentation before adding the new compatibility string.
> 
> Bye,
> 
> v2: - fix bindings
> - add sama5d2 compatibility string to the binding documentation
> 
> Cyrille Pitchen (1):
>   net/macb: add config for Atmel sama5d2 SoCs
> 
> Nicolas Ferre (3):
>   net/macb: bindings doc: fix compatibility string
>   net/macb: bindings doc/trivial: fix sama5d4 comment
>   net/macb: bindings doc: add sama5d2 compatibility sting
> 
>  Documentation/devicetree/bindings/net/macb.txt | 5 +++--
>  drivers/net/ethernet/cadence/macb.c| 8 
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 

For the patch set:
Acked-by: Alexandre Belloni 

-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

[PATCH net-next 1/3 v5] net: track link-status of ipv4 nexthops

2015-06-18 Thread Andy Gospodarek

Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are
reachable via an interface where carrier is off.  No action is taken,
but additional flags are passed to userspace to indicate carrier status.

This also includes a cleanup to fib_disable_ip to more clearly indicate
what event made the function call to replace the more cryptic force
option previously used.
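
As an illustration of how a netlink listener might read the new flag, here
is a small userspace sketch. The helper nh_flags_str() is hypothetical;
RTNH_F_LINKDOWN uses the value added by this patch, and RTNH_F_DEAD is
assumed to keep its existing value of 1.

```c
#include <stdio.h>
#include <string.h>

/* RTNH_F_LINKDOWN as added by this patch; RTNH_F_DEAD assumed to be 1. */
#define RTNH_F_DEAD      1
#define RTNH_F_LINKDOWN  16

/* Hypothetical helper: render nexthop flags the way "ip route" output reads. */
static const char *nh_flags_str(unsigned int flags, char *buf, size_t len)
{
	snprintf(buf, len, "%s%s",
		 (flags & RTNH_F_DEAD) ? " dead" : "",
		 (flags & RTNH_F_LINKDOWN) ? " linkdown" : "");
	return buf;
}
```

A nexthop with only RTNH_F_LINKDOWN set renders as " linkdown"; with both
flags set it renders as " dead linkdown".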

v2: Split out kernel functionality into 2 patches, this patch simply sets and
clears new nexthop flag RTNH_F_LINKDOWN.

v3: Cleanups suggested by Alex as well as a bug noticed in
fib_sync_down_dev and fib_sync_up when multipath was not enabled.

v5: Whitespace and variable declaration fixups suggested by Dave

Signed-off-by: Andy Gospodarek 
Signed-off-by: Dinesh Dutt 
---
 include/net/ip_fib.h   |  4 +--
 include/uapi/linux/rtnetlink.h |  3 +++
 net/ipv4/fib_frontend.c| 22 ++--
 net/ipv4/fib_semantics.c   | 60 +-
 4 files changed, 66 insertions(+), 23 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 54271ed..f73d27c 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -305,9 +305,9 @@ void fib_flush_external(struct net *net);
 
 /* Exported by fib_semantics.c */
 int ip_fib_check_default(__be32 gw, struct net_device *dev);
-int fib_sync_down_dev(struct net_device *dev, int force);
+int fib_sync_down_dev(struct net_device *dev, unsigned long event);
 int fib_sync_down_addr(struct net *net, __be32 local);
-int fib_sync_up(struct net_device *dev);
+int fib_sync_up(struct net_device *dev, unsigned int nh_flags);
 void fib_select_multipath(struct fib_result *res);
 
 /* Exported by fib_trie.c */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 17fb02f..8ab874a 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -338,6 +338,9 @@ struct rtnexthop {
 #define RTNH_F_PERVASIVE   2   /* Do recursive gateway lookup  */
 #define RTNH_F_ONLINK  4   /* Gateway is forced on link*/
 #define RTNH_F_OFFLOAD 8   /* offloaded route */
+#define RTNH_F_LINKDOWN        16  /* carrier-down on nexthop */
+
+#define RTNH_F_COMPARE_MASK    (RTNH_F_DEAD | RTNH_F_LINKDOWN) /* used as mask for route comparisons */
 
 /* Macros to handle hexthops */
 
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 872494e..54d3c45 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1063,9 +1063,9 @@ static void nl_fib_lookup_exit(struct net *net)
net->ipv4.fibnl = NULL;
 }
 
-static void fib_disable_ip(struct net_device *dev, int force)
+static void fib_disable_ip(struct net_device *dev, unsigned long event)
 {
-   if (fib_sync_down_dev(dev, force))
+   if (fib_sync_down_dev(dev, event))
fib_flush(dev_net(dev));
rt_cache_flush(dev_net(dev));
arp_ifdown(dev);
@@ -1081,7 +1081,7 @@ static int fib_inetaddr_event(struct notifier_block *this, unsigned long event,
case NETDEV_UP:
fib_add_ifaddr(ifa);
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
-   fib_sync_up(dev);
+   fib_sync_up(dev, RTNH_F_DEAD);
 #endif
atomic_inc(&net->ipv4.dev_addr_genid);
rt_cache_flush(dev_net(dev));
@@ -1093,7 +1093,7 @@ static int fib_inetaddr_event(struct notifier_block *this, unsigned long event,
/* Last address was deleted from this interface.
 * Disable IP.
 */
-   fib_disable_ip(dev, 1);
+   fib_disable_ip(dev, event);
} else {
rt_cache_flush(dev_net(dev));
}
@@ -1107,9 +1107,10 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
struct in_device *in_dev;
struct net *net = dev_net(dev);
+   unsigned int flags;
 
if (event == NETDEV_UNREGISTER) {
-   fib_disable_ip(dev, 2);
+   fib_disable_ip(dev, event);
rt_flush_dev(dev);
return NOTIFY_DONE;
}
@@ -1124,16 +1125,21 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
fib_add_ifaddr(ifa);
} endfor_ifa(in_dev);
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
-   fib_sync_up(dev);
+   fib_sync_up(dev, RTNH_F_DEAD);
 #endif
atomic_inc(&net->ipv4.dev_addr_genid);
rt_cache_flush(net);
break;
case NETDEV_DOWN:
-   fib_disable_ip(dev, 0);
+   fib_disable_ip(dev, event);
break;
-   case NETDEV_CHANGEMTU:
case NETDEV_CHANGE:
+   flags = dev_get_flags(dev);
+   if (flags & (IFF_RUNNING|IFF_LOWER_UP))
+   fi

[PATCH net-next 2/3 v5] net: ipv4 sysctl option to ignore routes when nexthop link is down

2015-06-18 Thread Andy Gospodarek

This feature is only enabled with the new per-interface or ipv4 global
sysctls called 'ignore_routes_with_linkdown'.

net.ipv4.conf.all.ignore_routes_with_linkdown = 0
net.ipv4.conf.default.ignore_routes_with_linkdown = 0
net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
...

When the above sysctls are set, the kernel will report to userspace that a
route is dead and will no longer resolve to this nexthop when performing a
fib lookup.  This will signal to userspace that the route will not be
selected.  RTNH_F_DEAD is only signalled to userspace if the sysctl is
enabled and the link is down.  This was done because without it the netlink
listeners would have no idea whether or not a nexthop would be selected.
The kernel only sets RTNH_F_DEAD internally if the interface has IFF_UP
cleared.

With the new sysctl set, the following behavior can be observed
(interface p8p1 is link-down):

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 dead linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
90.0.0.1 via 70.0.0.2 dev p7p1  src 70.0.0.1
cache
local 80.0.0.1 dev lo  src 80.0.0.1
cache 
80.0.0.2 via 10.0.5.2 dev p9p1  src 10.0.5.15
cache

While the route does remain in the table (so it can be modified if
needed rather than being wiped away as it would be if IFF_UP was
cleared), the proper next-hop is chosen automatically when the link is
down.  Now interface p8p1 is linked-up:

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2
90.0.0.1 via 80.0.0.2 dev p8p1  src 80.0.0.1
cache
local 80.0.0.1 dev lo  src 80.0.0.1
cache 
80.0.0.2 dev p8p1  src 80.0.0.1
cache

and the output changes to what one would expect.

If the sysctl is not set, the following output would be expected when
p8p1 is down:

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2

Since the dead flag does not appear, there should be no expectation that
the kernel would skip using this route due to link being down.
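
The selection behaviour described above can be modelled with the toy sketch
below. It is not the fib lookup code: struct toy_route, toy_select() and the
NH_F_* names are hypothetical, and each route carries a single nexthop.

```c
#include <stddef.h>

#define NH_F_DEAD      1   /* mirrors RTNH_F_DEAD (assumed value) */
#define NH_F_LINKDOWN  16  /* mirrors RTNH_F_LINKDOWN from patch 1/3 */

/* Toy route entry: one nexthop per route, array sorted by metric. */
struct toy_route {
	const char *dev;
	unsigned int metric;
	unsigned int flags;
};

/*
 * Return the first usable route in a metric-sorted array, or NULL.
 * With ignore_linkdown set, linkdown routes are skipped; without it
 * they stay eligible, matching the "linkdown" but not "dead" output.
 */
static const struct toy_route *toy_select(const struct toy_route *r, size_t n,
					  int ignore_linkdown)
{
	for (size_t i = 0; i < n; i++) {
		if (r[i].flags & NH_F_DEAD)
			continue;
		if (ignore_linkdown && (r[i].flags & NH_F_LINKDOWN))
			continue;
		return &r[i];
	}
	return NULL;
}
```

With the two 90.0.0.0/24 routes from the example (p8p1 metric 1 linkdown,
p7p1 metric 2), the model picks p7p1 when the sysctl is set and p8p1 when
it is not.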

v2: Split kernel changes into 2 patches, this actually makes a
behavioral change if the sysctl is set.  Also took suggestion from Alex
to simplify code by only checking sysctl during fib lookup and
suggestion from Scott to add a per-interface sysctl.

v3: Code clean-ups to make it more readable and efficient as well as a
reverse path check fix.

v4: Drop binary sysctl

v5: Whitespace fixups from Dave

Signed-off-by: Andy Gospodarek 
Signed-off-by: Dinesh Dutt 
---
 include/linux/inetdevice.h|  3 +++
 include/net/fib_rules.h   |  3 ++-
 include/net/ip_fib.h  | 16 +---
 include/uapi/linux/ip.h   |  1 +
 net/ipv4/devinet.c|  2 ++
 net/ipv4/fib_frontend.c   |  6 +++---
 net/ipv4/fib_rules.c  |  5 +++--
 net/ipv4/fib_semantics.c  | 31 ++-
 net/ipv4/fib_trie.c   |  7 +++
 net/ipv4/netfilter/ipt_rpfilter.c |  2 +-
 net/ipv4/route.c  | 10 +-
 11 files changed, 62 insertions(+), 24 deletions(-)

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index 0a21fbe..a4328ce 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -120,6 +120,9 @@ static inline void ipv4_devconf_setall(struct in_device *in_dev)
 || (!IN_DEV_FORWARD(in_dev) && \
  IN_DEV_ORCONF((in_dev), ACCEPT_REDIRECTS)))
 
+#define IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) \
+   IN_DEV_CONF_GET((in_dev), IGNORE_ROUTES_WITH_LINKDOWN)
+
 #define IN_DEV_ARPFILTER(in_dev)   IN_DEV_ORCONF((in_dev), ARPFILTER)
 #define IN_DEV_ARP_ACCEPT(in_dev)  IN_DEV_ORCONF((in_dev), ARP_ACCEPT)
 #define IN_DEV_ARP_ANNOUNCE(in_dev)    IN_DEV_MAXCONF((in_dev), ARP_ANNOUNCE)
diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 6d67383..903a55e 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -36,7 +36,8 @@ struct fib_lookup_arg {
void*result;
struct fib_rule *rule;
int flags;
-#define FIB_LOOKUP_NOREF   1
+#define FIB_LOOKUP_NOREF   1
+#define FIB_LOOKUP_IGNORE_LINKSTATE    2
 };
 
 struct fib_rules_ops {
diff --git a/include/net/ip_fib.h b/include/n

[PATCH net-next 3/3 v5] iproute2: add support to print 'linkdown' nexthop flag

2015-06-18 Thread Andy Gospodarek

Signed-off-by: Andy Gospodarek 
Signed-off-by: Dinesh Dutt 

---
 ip/iproute.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/ip/iproute.c b/ip/iproute.c
index 3795baf..3369c49 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -451,6 +451,8 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
fprintf(fp, "offload ");
if (r->rtm_flags & RTM_F_NOTIFY)
fprintf(fp, "notify ");
+   if (r->rtm_flags & RTNH_F_LINKDOWN)
+   fprintf(fp, "linkdown ");
if (tb[RTA_MARK]) {
unsigned int mark = *(unsigned int*)RTA_DATA(tb[RTA_MARK]);
if (mark) {
@@ -670,6 +672,8 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
fprintf(fp, " onlink");
if (nh->rtnh_flags & RTNH_F_PERVASIVE)
fprintf(fp, " pervasive");
+   if (nh->rtnh_flags & RTNH_F_LINKDOWN)
+   fprintf(fp, " linkdown");
len -= NLMSG_ALIGN(nh->rtnh_len);
nh = RTNH_NEXT(nh);
}
-- 
1.9.3


[PATCH net-next 0/3 v5] changes to make ipv4 routing table aware of next-hop link status

2015-06-18 Thread Andy Gospodarek

This series adds the ability to have the Linux kernel track whether or
not a particular route should be used based on the link-status of the
interface associated with the next-hop.

Before this patch any link-failure on an interface that was serving as a
gateway for some systems could result in those systems being isolated
from the rest of the network as the stack would continue to attempt to
send frames out of an interface that is actually linked-down.  When the
kernel is responsible for all forwarding, it should also be responsible
for taking action when the traffic can no longer be forwarded -- there
is no real need to outsource link-monitoring to userspace anymore.

This feature is only enabled with the new per-interface or ipv4 global
sysctls called 'ignore_routes_with_linkdown'.

net.ipv4.conf.all.ignore_routes_with_linkdown = 0
net.ipv4.conf.default.ignore_routes_with_linkdown = 0
net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
...

When the above sysctls are set, the kernel will not only report to
userspace that the link is down, but it will also report to userspace
that a route is dead.  This will signal to userspace that the route will
not be selected.

With the new sysctls set, the following behavior can be observed
(interface p8p1 is link-down):

# ip route show 
default via 10.0.5.2 dev p9p1 
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15 
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1 
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 dead linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2 
# ip route get 90.0.0.1 
90.0.0.1 via 70.0.0.2 dev p7p1  src 70.0.0.1 
cache 
# ip route get 80.0.0.1 
local 80.0.0.1 dev lo  src 80.0.0.1 
cache  
# ip route get 80.0.0.2
80.0.0.2 via 10.0.5.2 dev p9p1  src 10.0.5.15 
cache 

While the route does remain in the table (so it can be modified if
needed rather than being wiped away as it would be if IFF_UP was
cleared), the proper next-hop is chosen automatically when the link is
down.  Now interface p8p1 is linked-up:

# ip route show 
default via 10.0.5.2 dev p9p1 
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15 
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1 
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2 
192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2 
# ip route get 90.0.0.1 
90.0.0.1 via 80.0.0.2 dev p8p1  src 80.0.0.1 
cache 
# ip route get 80.0.0.1 
local 80.0.0.1 dev lo  src 80.0.0.1 
cache  
# ip route get 80.0.0.2
80.0.0.2 dev p8p1  src 80.0.0.1 
cache 

and the output changes to what one would expect.

If the global or interface sysctl is not set, the following output would be
expected when p8p1 is down:

# ip route show 
default via 10.0.5.2 dev p9p1 
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15 
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1 
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2 

If the dead flag does not appear there should be no expectation that the
kernel would skip using this route due to link being down.

v2: Split kernel changes into 2 patches: first to add linkdown flag and
second to add new sysctl settings.  Also took suggestion from Alex to
simplify code by only checking sysctl during fib lookup and suggestion
from Scott to add a per-interface sysctl.  Added iproute2 patch to
recognize and print linkdown flag.

v3: Code cleanups along with reverse-path checks suggested by Alex and
small fixes related to problems found when multipath was disabled.

v4: Drop binary sysctls

v5: Whitespace and variable declaration fixups suggested by Dave

Some preferred not to have a configuration option and to make this behavior
the default when it was discussed in Ottawa earlier this year, since "it was
time to do this."  I wanted to propose the config option to preserve the
current behavior for those that desire it.  I'll happily remove it if Dave
and Linus approve.

An IPv6 implementation is also needed (DECnet too!), but I wanted to start with
the IPv4 implementation to get people comfortable with the idea before moving
forward.  If this is accepted the IPv6 implementation can be posted shortly.

There was also a request for switchdev support for this, but that will be
posted as a followup as switchdev does not currently handle dead
next-hops in a multi-path case and I felt that infra needed to be added
first.

FWIW, we have been running the original version of this series with a
global sysctl and our customers have been happily using a backported
version for IPv4 and IPv6 for >6 months.

Andy Gospodarek (3):
  net: track link-status of ipv4 nexthops
  net: ipv4 sysctl option to ignore routes when

Re: [RFC V3] net: don't wait for order-3 page allocation

2015-06-18 Thread Vlastimil Babka


On 06/18/2015 04:43 PM, Michal Hocko wrote:

> On Thu 18-06-15 07:35:53, Eric Dumazet wrote:
>> On Thu, Jun 18, 2015 at 7:30 AM, Michal Hocko  wrote:
>>
>>> Abusing __GFP_NO_KSWAPD is a wrong way to go IMHO. It is true that the
>>> _current_ implementation of the allocator has this nasty and very subtle
>>> side effect but that doesn't mean it should be abused outside of the mm
>>> proper. Why shouldn't this path wake the kswapd and let it compact
>>> memory on the background to increase the success rate for the later
>>> high order allocations?
>>
>> I kind of agree.
>>
>> If kswapd is a problem (is it ???) we should fix it, instead of adding
>> yet another flag to some random locations attempting
>> memory allocations.
>
> No, kswapd is not a problem. The problem is ~__GFP_WAIT allocation can
> access some portion of the memory reserves (see gfp_to_alloc_flags resp.
> __zone_watermark_ok and ALLOC_HARDER). __GFP_NO_KSWAPD is just a dirty
> hack to not give that access which was introduced for THP AFAIR.
>
> The implicit access to memory reserves for non sleeping allocation has
> been there for ages and it might be not suitable for this particular
> path but that doesn't mean another gfp flag with a different side effect
> should be hijacked. We should either stop doing that implicit access to
> memory reserves and give __GFP_RESERVE or add the __GFP_NORESERVE. But
> that is a problem to be solved in the mm proper. Spreading subtle
> dependencies outside of mm will just make situation worse.


So you are not proposing to use these __GFP_RESERVE/NORESERVE flag 
outside of mm, right? (besides, we distinguish several kinds of 
reserves, so what exactly would the flag do?) As that would be also 
subtle dependency. The general problem I think is that we should want 
the mm users to specify higher-level intentions (such as GFP_KERNEL) 
which would map to specific directions (__GFP_*) for the allocator, and 
currently it's rather a mess of both kinds of flags. Clearly the 
intention here is "opportunistic allocation that should not 
reclaim/compact, use reserves, wake up kswapd (?) because it's better to 
fall back to smaller pages than wait") and we don't seem to have a 
GFP_OPPORTUNISTIC flag for that. The allocation has to then mask out 
__GFP_WAIT which however looks like an atomic allocation to the 
allocator and give access to reserves, etc...


Re: [PATCH net-next 1/3 v4] net: track link-status of ipv4 nexthops

2015-06-18 Thread Andy Gospodarek

On Thu, Jun 18, 2015 at 03:26:30AM -0700, David Miller wrote:
> From: Andy Gospodarek 
> Date: Mon, 15 Jun 2015 12:33:19 -0400
> 
> > @@ -1107,9 +1107,10 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
> > struct net_device *dev = netdev_notifier_info_to_dev(ptr);
> > struct in_device *in_dev;
> > struct net *net = dev_net(dev);
> > +   unsigned flags;
> 
> Please always fully spell out "unsigned int" instead of shortening it to
> just "unsigned", thanks.
> 
> > @@ -920,11 +926,17 @@ struct fib_info *fib_create_info(struct fib_config 
> > *cfg)
> > if (!nh->nh_dev)
> > goto failure;
> > } else {
> > +   int linkdown = 0;
> > change_nexthops(fi) {
> 
> Please put an empty line between local variable declarations and
> code.

Ugh, thanks.  I'll fixup this and your other comments with v5.


Re: [PATCH net-next 00/15] Simplify netfilter and network namespaces

2015-06-18 Thread Eric W. Biederman


Cc list trimmed as this is no longer about the original patch
submission.

Julian Anastasov  writes:

>   Hello,
>
> On Wed, 17 Jun 2015, Eric W. Biederman wrote:
>
>> p.s.  I do have my patch that I can toss in your direction if you are
>> interested.
>
>   Of course... I'll be able to check it after 8 hours...


My incremental patch for ipvs on top of everything else I have pushed
out looks like this:

From: "Eric W. Biederman" 
Date: Fri, 12 Jun 2015 18:34:12 -0500
Subject: [PATCH] ipvs: Pass struct net down to where it is needed and used

Pass struct net down to where it is used and stop guessing
which network namespace should be used.

Signed-off-by: "Eric W. Biederman" 
---
 include/net/ip_vs.h |  45 +++-
 net/netfilter/ipvs/ip_vs_conn.c |  11 ++-
 net/netfilter/ipvs/ip_vs_core.c | 118 ++--
 net/netfilter/ipvs/ip_vs_ftp.c  |   8 +--
 net/netfilter/ipvs/ip_vs_proto_ah_esp.c |   9 ++-
 net/netfilter/ipvs/ip_vs_proto_sctp.c   |   5 +-
 net/netfilter/ipvs/ip_vs_proto_tcp.c|   8 +--
 net/netfilter/ipvs/ip_vs_proto_udp.c|   5 +-
 net/netfilter/ipvs/ip_vs_xmit.c |  51 --
 net/netfilter/xt_ipvs.c |   2 +-
 10 files changed, 108 insertions(+), 154 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 4e3731ee4eac..a556d14cff70 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -35,37 +35,6 @@ static inline struct netns_ipvs *net_ipvs(struct net* net)
return net->ipvs;
 }
 
-/* Get net ptr from skb in traffic cases
- * use skb_sknet when call is from userland (ioctl or netlink)
- */
-static inline struct net *skb_net(const struct sk_buff *skb)
-{
-#ifdef CONFIG_NET_NS
-#ifdef CONFIG_IP_VS_DEBUG
-   /*
-* This is used for debug only.
-* Start with the most likely hit
-* End with BUG
-*/
-   if (likely(skb->dev && dev_net(skb->dev)))
-   return dev_net(skb->dev);
-   if (skb_dst(skb) && skb_dst(skb)->dev)
-   return dev_net(skb_dst(skb)->dev);
-   WARN(skb->sk, "Maybe skb_sknet should be used in %s() at line:%d\n",
- __func__, __LINE__);
-   if (likely(skb->sk && sock_net(skb->sk)))
-   return sock_net(skb->sk);
-   pr_err("There is no net ptr to find in the skb in %s() line:%d\n",
-   __func__, __LINE__);
-   BUG();
-#else
-   return dev_net(skb->dev ? : skb_dst(skb)->dev);
-#endif
-#else
-   return &init_net;
-#endif
-}
-
 static inline struct net *skb_sknet(const struct sk_buff *skb)
 {
 #ifdef CONFIG_NET_NS
@@ -441,19 +410,19 @@ struct ip_vs_protocol {
 
void (*exit_netns)(struct net *net, struct ip_vs_proto_data *pd);
 
-   int (*conn_schedule)(int af, struct sk_buff *skb,
+   int (*conn_schedule)(struct net *net, int af, struct sk_buff *skb,
 struct ip_vs_proto_data *pd,
 int *verdict, struct ip_vs_conn **cpp,
 struct ip_vs_iphdr *iph);
 
struct ip_vs_conn *
-   (*conn_in_get)(int af,
+   (*conn_in_get)(struct net *net, int af,
   const struct sk_buff *skb,
   const struct ip_vs_iphdr *iph,
   int inverse);
 
struct ip_vs_conn *
-   (*conn_out_get)(int af,
+   (*conn_out_get)(struct net *net, int af,
const struct sk_buff *skb,
const struct ip_vs_iphdr *iph,
int inverse);
@@ -1179,13 +1148,15 @@ static inline void ip_vs_conn_fill_param(struct net *net, int af, int protocol,
 struct ip_vs_conn *ip_vs_conn_in_get(const struct ip_vs_conn_param *p);
 struct ip_vs_conn *ip_vs_ct_in_get(const struct ip_vs_conn_param *p);
 
-struct ip_vs_conn * ip_vs_conn_in_get_proto(int af, const struct sk_buff *skb,
+struct ip_vs_conn * ip_vs_conn_in_get_proto(struct net *net, int af,
+   const struct sk_buff *skb,
const struct ip_vs_iphdr *iph,
int inverse);
 
 struct ip_vs_conn *ip_vs_conn_out_get(const struct ip_vs_conn_param *p);
 
-struct ip_vs_conn * ip_vs_conn_out_get_proto(int af, const struct sk_buff *skb,
+struct ip_vs_conn * ip_vs_conn_out_get_proto(struct net *net, int af,
+const struct sk_buff *skb,
 const struct ip_vs_iphdr *iph,
 int inverse);
 
@@ -1215,7 +1186,7 @@ void ip_vs_conn_expire_now(struct ip_vs_conn *cp);
 
 const char *ip_vs_state_name(__u16 proto, int state);
 
-void ip_vs_tcp_conn_listen(struct net *net, struct ip_vs_conn *cp);
+void ip_vs_tcp_conn_listen(struct ip_vs_conn *cp);
 int ip_vs_check_template(struct ip_vs_conn *ct);
 void ip_vs_random_dropentry(struct net *net);
 int

Re: [PATCH next v2] bonding: Display LACP info only to CAP_NET_ADMIN capable user

2015-06-18 Thread Andy Gospodarek

On Thu, Jun 18, 2015 at 04:17:36AM -0700, Eric Dumazet wrote:
> On Wed, 2015-06-17 at 17:59 -0700, Mahesh Bandewar wrote:
> > Actor and Partner details can be accessed via proc-fs, sys-fs
> > entries or netlink interface. These interfaces are world readable
> > at this moment. The earlier patch-series made the LACP communication
> > secure to avoid nuisance attack from within the same L2 domain but
> > it did not prevent "someone unprivileged" looking at that information
> > on host and perform the same act.
> > 
> > This patch essentially avoids spitting those entries if the user
> > in question does not have enough privileges.
> > 
> > Signed-off-by: Mahesh Bandewar 
> > ---
> >  drivers/net/bonding/bond_netlink.c |  11 ++--
> >  drivers/net/bonding/bond_procfs.c  | 101 
> > +++--
> >  drivers/net/bonding/bond_sysfs.c   |  12 ++---
> >  3 files changed, 67 insertions(+), 57 deletions(-)
> > 
> > diff --git a/drivers/net/bonding/bond_netlink.c 
> > b/drivers/net/bonding/bond_netlink.c
> > index 5580fcde738f..3fd3aa4b145e 100644
> > --- a/drivers/net/bonding/bond_netlink.c
> > +++ b/drivers/net/bonding/bond_netlink.c
> > @@ -600,18 +600,23 @@ static int bond_fill_info(struct sk_buff *skb,
> >  
> > if (BOND_MODE(bond) == BOND_MODE_8023AD) {
> > struct ad_info info;
> > +   u8 zero_mac[ETH_ALEN];
> >  
> > +   eth_zero_addr(zero_mac);
> > if (nla_put_u16(skb, IFLA_BOND_AD_ACTOR_SYS_PRIO,
> > -   bond->params.ad_actor_sys_prio))
> > +   capable(CAP_NET_ADMIN) ?
> > +   bond->params.ad_actor_sys_prio : 0))
> > goto nla_put_failure;
> >  
> > if (nla_put_u16(skb, IFLA_BOND_AD_USER_PORT_KEY,
> > -   bond->params.ad_user_port_key))
> > +   capable(CAP_NET_ADMIN) ?
> > +   bond->params.ad_user_port_key : 0))
> > goto nla_put_failure;
> >  
> > if (nla_put(skb, IFLA_BOND_AD_ACTOR_SYSTEM,
> > sizeof(bond->params.ad_actor_system),
> > -   &bond->params.ad_actor_system))
> > +   capable(CAP_NET_ADMIN) ?
> > +   &bond->params.ad_actor_system : &zero_mac))
> > goto nla_put_failure;
> >  
> 
> Hmm... I would rather not send these fake attributes at all ?

That would be my preference as well.  Sorry if my lack of elaboration in
my earlier email made this confusing.

If there are values that should not be visible to non-root users, then
don't send them at all.  Do not just send NULL values.
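
As a rough illustration of "don't send them at all", here is a userspace sketch. struct fake_msg, fake_put() and fill_info() are hypothetical stand-ins, not the netlink API; the only point is that the unprivileged path returns before the sensitive attributes are added instead of adding zeroed ones.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attribute types; not the IFLA_BOND_* values. */
enum { ATTR_MODE = 1, ATTR_ACTOR_SYS_PRIO, ATTR_USER_PORT_KEY };

/* Hypothetical stand-in for a netlink message under construction. */
struct fake_msg {
	int types[8];
	int values[8];
	size_t count;
};

/* Append one attribute; returns -1 when the message is full. */
static int fake_put(struct fake_msg *msg, int type, int value)
{
	if (msg->count >= sizeof(msg->types) / sizeof(msg->types[0]))
		return -1;
	msg->types[msg->count] = type;
	msg->values[msg->count] = value;
	msg->count++;
	return 0;
}

/* Privileged readers get the 802.3ad details. Everyone else gets a
 * message without those attributes rather than zeroed placeholders.
 */
static int fill_info(struct fake_msg *msg, bool privileged,
		     int mode, int sys_prio, int port_key)
{
	if (fake_put(msg, ATTR_MODE, mode))
		return -1;

	if (!privileged)
		return 0;

	if (fake_put(msg, ATTR_ACTOR_SYS_PRIO, sys_prio) ||
	    fake_put(msg, ATTR_USER_PORT_KEY, port_key))
		return -1;

	return 0;
}
```

A reader of such a message can then tell "not permitted to see this" (attribute absent) apart from "the value really is zero".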


Re: [RFC V3] net: don't wait for order-3 page allocation

2015-06-18 Thread Michal Hocko

On Thu 18-06-15 07:35:53, Eric Dumazet wrote:
> On Thu, Jun 18, 2015 at 7:30 AM, Michal Hocko  wrote:
> 
> > Abusing __GFP_NO_KSWAPD is a wrong way to go IMHO. It is true that the
> > _current_ implementation of the allocator has this nasty and very subtle
> > side effect but that doesn't mean it should be abused outside of the mm
> > proper. Why shouldn't this path wake the kswapd and let it compact
> > memory on the background to increase the success rate for the later
> > high order allocations?
> 
> I kind of agree.
> 
> If kswapd is a problem (is it ???) we should fix it, instead of adding
> yet another flag to some random locations attempting
> memory allocations.

No, kswapd is not a problem. The problem is that a ~__GFP_WAIT allocation
can access some portion of the memory reserves (see gfp_to_alloc_flags,
__zone_watermark_ok and ALLOC_HARDER). __GFP_NO_KSWAPD is just a dirty
hack to not give that access, which was introduced for THP AFAIR.

The implicit access to memory reserves for non-sleeping allocations has
been there for ages, and it might not be suitable for this particular
path, but that doesn't mean another gfp flag with a different side effect
should be hijacked. We should either stop doing that implicit access to
memory reserves and give __GFP_RESERVE or add the __GFP_NORESERVE. But
that is a problem to be solved in the mm proper. Spreading subtle
dependencies outside of mm will just make the situation worse.
-- 
Michal Hocko
SUSE Labs

Re: [RFC V3] net: don't wait for order-3 page allocation

2015-06-18 Thread Eric Dumazet

On Thu, Jun 18, 2015 at 7:30 AM, Michal Hocko  wrote:

> Abusing __GFP_NO_KSWAPD is a wrong way to go IMHO. It is true that the
> _current_ implementation of the allocator has this nasty and very subtle
> side effect but that doesn't mean it should be abused outside of the mm
> proper. Why shouldn't this path wake the kswapd and let it compact
> memory on the background to increase the success rate for the later
> high order allocations?

I kind of agree.

If kswapd is a problem (is it ???) we should fix it, instead of adding
yet another flag to some random locations attempting
memory allocations.

Re: [RFC V3] net: don't wait for order-3 page allocation

2015-06-18 Thread Michal Hocko

On Wed 17-06-15 16:02:59, David Rientjes wrote:
> On Fri, 12 Jun 2015, Vlastimil Babka wrote:
> 
> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > index 3cfff2a..41ec022 100644
> > > --- a/net/core/skbuff.c
> > > +++ b/net/core/skbuff.c
> > > @@ -4398,7 +4398,7 @@ struct sk_buff *alloc_skb_with_frags(unsigned long
> > > header_len,
> > > 
> > >   while (order) {
> > >   if (npages >= 1 << order) {
> > > - page = alloc_pages(gfp_mask |
> > > + page = alloc_pages((gfp_mask & ~__GFP_WAIT) |
> > >  __GFP_COMP |
> > >  __GFP_NOWARN |
> > >  __GFP_NORETRY,
> > 
> > Note that __GFP_NORETRY is weaker than ~__GFP_WAIT and thus redundant. But 
> > it
> > won't hurt anything leaving it there. And you might consider __GFP_NO_KSWAPD
> > instead, as I said in the other thread.
> > 
> 
> Yeah, I agreed with __GFP_NO_KSWAPD to avoid utilizing memory reserves for 
> this.

Abusing __GFP_NO_KSWAPD is the wrong way to go IMHO. It is true that the
_current_ implementation of the allocator has this nasty and very subtle
side effect but that doesn't mean it should be abused outside of the mm
proper. Why shouldn't this path wake the kswapd and let it compact
memory in the background to increase the success rate for the later
high order allocations?

-- 
Michal Hocko
SUSE Labs

[PATCH v2 3/4] net/macb: bindings doc: add sama5d2 compatibility string

2015-06-18 Thread Nicolas Ferre

Add sama5d2 to the binding documentation for this use of the GEM IP.

Signed-off-by: Nicolas Ferre 
---
 Documentation/devicetree/bindings/net/macb.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/net/macb.txt b/Documentation/devicetree/bindings/net/macb.txt
index 97349e3f3ff2..b5d79761ac97 100644
--- a/Documentation/devicetree/bindings/net/macb.txt
+++ b/Documentation/devicetree/bindings/net/macb.txt
@@ -7,6 +7,7 @@ Required properties:
   Use "cdns,at32ap7000-macb" for other 10/100 usage or use the generic form: "cdns,macb".
   Use "cdns,pc302-gem" for Picochip picoXcell pc302 and later devices based on
   the Cadence GEM, or the generic form: "cdns,gem".
+  Use "atmel,sama5d2-gem" for the GEM IP (10/100) available on Atmel sama5d2 SoCs.
   Use "atmel,sama5d3-gem" for the Gigabit IP available on Atmel sama5d3 SoCs.
   Use "atmel,sama5d4-gem" for the GEM IP (10/100) available on Atmel sama5d4 SoCs.
   Use "cdns,zynqmp-gem" for Zynq Ultrascale+ MPSoC.
-- 
2.1.3


[PATCH v2 4/4] net/macb: add config for Atmel sama5d2 SoCs

2015-06-18 Thread Nicolas Ferre

From: Cyrille Pitchen 

Add the compatible string for Atmel sama5d2 SoC family as the configuration
options differ from other instances of the GEM.

Signed-off-by: Cyrille Pitchen 
Signed-off-by: Nicolas Ferre 
---
 drivers/net/ethernet/cadence/macb.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 740d04fd2223..caeb39561567 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -2713,6 +2713,13 @@ static const struct macb_config pc302gem_config = {
.init = macb_init,
 };
 
+static const struct macb_config sama5d2_config = {
+   .caps = 0,
+   .dma_burst_length = 16,
+   .clk_init = macb_clk_init,
+   .init = macb_init,
+};
+
 static const struct macb_config sama5d3_config = {
.caps = MACB_CAPS_SG_DISABLED | MACB_CAPS_GIGABIT_MODE_AVAILABLE,
.dma_burst_length = 16,
@@ -2756,6 +2763,7 @@ static const struct of_device_id macb_dt_ids[] = {
{ .compatible = "cdns,macb" },
{ .compatible = "cdns,pc302-gem", .data = &pc302gem_config },
{ .compatible = "cdns,gem", .data = &pc302gem_config },
+   { .compatible = "atmel,sama5d2-gem", .data = &sama5d2_config },
{ .compatible = "atmel,sama5d3-gem", .data = &sama5d3_config },
{ .compatible = "atmel,sama5d4-gem", .data = &sama5d4_config },
{ .compatible = "cdns,at91rm9200-emac", .data = &emac_config },
-- 
2.1.3
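
For readers unfamiliar with how such a match table is used, the following is a cut-down userspace sketch of the lookup idea. struct cfg, struct match, CAP_GIGABIT and lookup_cfg() are hypothetical names for this illustration; the kernel performs the matching through its own device-tree helpers, which are not reproduced here.

```c
#include <stddef.h>
#include <string.h>

/* Cut-down, hypothetical versions of the driver structures. */
struct cfg {
	unsigned int caps;
	int dma_burst_length;
};

struct match {
	const char *compatible;
	const struct cfg *data;
};

#define CAP_GIGABIT 0x1u	/* illustrative bit, not the driver's value */

static const struct cfg sama5d2_cfg = { .caps = 0, .dma_burst_length = 16 };
static const struct cfg sama5d3_cfg = { .caps = CAP_GIGABIT, .dma_burst_length = 16 };

static const struct match match_table[] = {
	{ "atmel,sama5d2-gem", &sama5d2_cfg },
	{ "atmel,sama5d3-gem", &sama5d3_cfg },
	{ NULL, NULL },	/* sentinel terminates the table */
};

/* Return the configuration attached to a compatible string, or NULL
 * when the string is not in the table.
 */
static const struct cfg *lookup_cfg(const char *compatible)
{
	const struct match *m;

	for (m = match_table; m->compatible; m++) {
		if (strcmp(m->compatible, compatible) == 0)
			return m->data;
	}
	return NULL;
}
```

This is why a new SoC flavor needs both a new config structure and a new table entry: the compatible string is the only key that selects the capabilities.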


[PATCH v2 2/4] net/macb: bindings doc/trivial: fix sama5d4 comment

2015-06-18 Thread Nicolas Ferre

On sama5d4, we only have a GEM IP that is configured to do 10/100 Mbits. So the
use of "Gigabit" can be confusing.

Signed-off-by: Nicolas Ferre 
---
 Documentation/devicetree/bindings/net/macb.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/macb.txt b/Documentation/devicetree/bindings/net/macb.txt
index 0ae6974383d7..97349e3f3ff2 100644
--- a/Documentation/devicetree/bindings/net/macb.txt
+++ b/Documentation/devicetree/bindings/net/macb.txt
@@ -8,7 +8,7 @@ Required properties:
   Use "cdns,pc302-gem" for Picochip picoXcell pc302 and later devices based on
   the Cadence GEM, or the generic form: "cdns,gem".
   Use "atmel,sama5d3-gem" for the Gigabit IP available on Atmel sama5d3 SoCs.
-  Use "atmel,sama5d4-gem" for the Gigabit IP available on Atmel sama5d4 SoCs.
+  Use "atmel,sama5d4-gem" for the GEM IP (10/100) available on Atmel sama5d4 SoCs.
   Use "cdns,zynqmp-gem" for Zynq Ultrascale+ MPSoC.
 - reg: Address and length of the register set for the device
 - interrupts: Should contain macb interrupt
-- 
2.1.3


[PATCH v2 1/4] net/macb: bindings doc: fix compatibility string

2015-06-18 Thread Nicolas Ferre

In the driver and the DT bindings we use the "atmel" prefix. Fix it in the
binding documentation.

Signed-off-by: Nicolas Ferre 
---
 Documentation/devicetree/bindings/net/macb.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/macb.txt b/Documentation/devicetree/bindings/net/macb.txt
index 8ec5fdf444e9..0ae6974383d7 100644
--- a/Documentation/devicetree/bindings/net/macb.txt
+++ b/Documentation/devicetree/bindings/net/macb.txt
@@ -7,8 +7,8 @@ Required properties:
   Use "cdns,at32ap7000-macb" for other 10/100 usage or use the generic form: "cdns,macb".
   Use "cdns,pc302-gem" for Picochip picoXcell pc302 and later devices based on
   the Cadence GEM, or the generic form: "cdns,gem".
-  Use "cdns,sama5d3-gem" for the Gigabit IP available on Atmel sama5d3 SoCs.
-  Use "cdns,sama5d4-gem" for the Gigabit IP available on Atmel sama5d4 SoCs.
+  Use "atmel,sama5d3-gem" for the Gigabit IP available on Atmel sama5d3 SoCs.
+  Use "atmel,sama5d4-gem" for the Gigabit IP available on Atmel sama5d4 SoCs.
   Use "cdns,zynqmp-gem" for Zynq Ultrascale+ MPSoC.
 - reg: Address and length of the register set for the device
 - interrupts: Should contain macb interrupt
-- 
2.1.3


[PATCH v2 0/4] net/macb: add sama5d2 support

2015-06-18 Thread Nicolas Ferre

Hi,

This series is basically the support for another flavor of the GEM IP
configuration. It ended up being a series because of some little fixes made to
the binding documentation before adding the new compatibility string.

Bye,

v2: - fix bindings
- add sama5d2 compatibility string to the binding documentation

Cyrille Pitchen (1):
  net/macb: add config for Atmel sama5d2 SoCs

Nicolas Ferre (3):
  net/macb: bindings doc: fix compatibility string
  net/macb: bindings doc/trivial: fix sama5d4 comment
  net/macb: bindings doc: add sama5d2 compatibility string

 Documentation/devicetree/bindings/net/macb.txt | 5 +++--
 drivers/net/ethernet/cadence/macb.c| 8 
 2 files changed, 11 insertions(+), 2 deletions(-)

-- 
2.1.3


Re: [PATCH] net/macb: add config for Atmel sama5d2 SoCs

2015-06-18 Thread Nicolas Ferre

Le 18/06/2015 15:30, Alexandre Belloni a écrit :
> On 18/06/2015 at 12:18:19 +0200, Nicolas Ferre wrote :
>> From: Cyrille Pitchen 
>>
>> Add the compatible string for Atmel sama5d2 SoC family as the configuration
>> options differ from other instances of the GEM.
>>
>> Signed-off-by: Cyrille Pitchen 
>> Signed-off-by: Nicolas Ferre 
>> ---
>>  drivers/net/ethernet/cadence/macb.c | 8 
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/cadence/macb.c 
>> b/drivers/net/ethernet/cadence/macb.c
>> index 740d04fd2223..caeb39561567 100644
>> --- a/drivers/net/ethernet/cadence/macb.c
>> +++ b/drivers/net/ethernet/cadence/macb.c
>> @@ -2713,6 +2713,13 @@ static const struct macb_config pc302gem_config = {
>>  .init = macb_init,
>>  };
>>  
>> +static const struct macb_config sama5d2_config = {
>> +.caps = 0,
>> +.dma_burst_length = 16,
>> +.clk_init = macb_clk_init,
>> +.init = macb_init,
>> +};
>> +
>>  static const struct macb_config sama5d3_config = {
>>  .caps = MACB_CAPS_SG_DISABLED | MACB_CAPS_GIGABIT_MODE_AVAILABLE,
>>  .dma_burst_length = 16,
>> @@ -2756,6 +2763,7 @@ static const struct of_device_id macb_dt_ids[] = {
>>  { .compatible = "cdns,macb" },
>>  { .compatible = "cdns,pc302-gem", .data = &pc302gem_config },
>>  { .compatible = "cdns,gem", .data = &pc302gem_config },
>> +{ .compatible = "atmel,sama5d2-gem", .data = &sama5d2_config },
> 
> This compatible has to be documented

Sure, I'll re-send a series right now (and add some documentation fixes).

Thanks, bye,

> 
>>  { .compatible = "atmel,sama5d3-gem", .data = &sama5d3_config },
>>  { .compatible = "atmel,sama5d4-gem", .data = &sama5d4_config },
>>  { .compatible = "cdns,at91rm9200-emac", .data = &emac_config },
>> -- 
>> 2.1.3
>>
> 


-- 
Nicolas Ferre

Re: [RFC PATCH net-next v2 2/5] net: add phys ID compare helper to test if two IDs are the same

2015-06-18 Thread Sergei Shtylyov


Hello.

On 6/18/2015 12:53 AM, sfel...@gmail.com wrote:


From: Scott Feldman 



Signed-off-by: Scott Feldman 
---
  include/linux/netdevice.h |7 +++
  net/switchdev/switchdev.c |8 ++--
  2 files changed, 9 insertions(+), 6 deletions(-)



diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7be616e1..63090ce 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -766,6 +766,13 @@ struct netdev_phys_item_id {
unsigned char id_len;
  };

+static inline bool netdev_phys_item_id_same(struct netdev_phys_item_id *a,
+   struct netdev_phys_item_id *b)
+{
+   return ((a->id_len == b->id_len) &&
+   (memcmp(a->id, b->id, a->id_len) == 0));


   Parens around the *return* expression are not needed (nor are the ones
around ==).


[...]
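
For illustration, a standalone sketch of the helper with the redundant parentheses dropped might look like the following. The struct and MAX_ID_LEN are simplified assumptions so the snippet compiles outside the kernel, and the const qualifiers are an addition of this sketch, not part of the posted patch.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_ID_LEN 32	/* assumed size for this sketch */

/* Simplified stand-in for the kernel's phys item id structure. */
struct phys_item_id {
	unsigned char id[MAX_ID_LEN];
	unsigned char id_len;
};

/* Two IDs are the same when the lengths match and the first id_len
 * bytes are identical. == binds tighter than &&, so no inner
 * parentheses are required.
 */
static inline bool phys_item_id_same(const struct phys_item_id *a,
				     const struct phys_item_id *b)
{
	return a->id_len == b->id_len &&
	       memcmp(a->id, b->id, a->id_len) == 0;
}
```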

WBR, Sergei


Re: [PATCH 12/22] fjes: net_device_ops.ndo_get_stats64

2015-06-18 Thread Sergei Shtylyov


Hello.

On 6/18/2015 3:49 AM, Taku Izumi wrote:


This patch adds net_device_ops.ndo_get_stats64 callback.



Signed-off-by: Taku Izumi 
---
  drivers/platform/x86/fjes/fjes_main.c | 14 ++
  1 file changed, 14 insertions(+)



diff --git a/drivers/platform/x86/fjes/fjes_main.c b/drivers/platform/x86/fjes/fjes_main.c
index 97bf487..eeda824 100644
--- a/drivers/platform/x86/fjes/fjes_main.c
+++ b/drivers/platform/x86/fjes/fjes_main.c
@@ -57,6 +57,8 @@ static netdev_tx_t fjes_xmit_frame(struct sk_buff *,
  static void fjes_raise_intr_rxdata_task(struct work_struct *);
  static void fjes_tx_stall_task(struct work_struct *);
  static irqreturn_t fjes_intr(int, void*);
+static struct rtnl_link_stats64
+*fjes_get_stats64(struct net_device *, struct rtnl_link_stats64 *);


   I'd leave * on the first line, otherwise it looks quite ugly.

[...]

@@ -734,6 +737,17 @@ static netdev_tx_t fjes_xmit_frame(struct sk_buff *skb,
return ret;
  }

+static struct rtnl_link_stats64
+*fjes_get_stats64(struct net_device *netdev,


   Same here.

[...]

WBR, Sergei


Re: [PATCH 14/22] fjes: net_device_ops.ndo_tx_timeout

2015-06-18 Thread Sergei Shtylyov


Hello.

On 6/18/2015 3:49 AM, Taku Izumi wrote:


This patch adds net_device_ops.ndo_tx_timeout callback.



Signed-off-by: Taku Izumi 
---
  drivers/platform/x86/fjes/fjes_main.c | 9 +
  1 file changed, 9 insertions(+)



diff --git a/drivers/platform/x86/fjes/fjes_main.c b/drivers/platform/x86/fjes/fjes_main.c
index 72541a7..84727d8 100644
--- a/drivers/platform/x86/fjes/fjes_main.c
+++ b/drivers/platform/x86/fjes/fjes_main.c

[...]

@@ -739,6 +741,13 @@ static netdev_tx_t fjes_xmit_frame(struct sk_buff *skb,
return ret;
  }

+static void fjes_tx_retry(struct net_device *netdev)
+{
+   struct netdev_queue *curQueue = netdev_get_tx_queue(netdev, 0);


   No CamelCase, please.

WBR, Sergei


Re: [PATCH 20/22] fjes: epstop_task

2015-06-18 Thread Sergei Shtylyov


Hello.

On 6/18/2015 3:49 AM, Taku Izumi wrote:


This patch adds epstop_task.
This task is used to process other receiver's
cancellation request.



Signed-off-by: Taku Izumi 
---
  drivers/platform/x86/fjes/fjes_hw.c   | 34 ++
  drivers/platform/x86/fjes/fjes_hw.h   |  1 +
  drivers/platform/x86/fjes/fjes_main.c |  1 +
  3 files changed, 36 insertions(+)



diff --git a/drivers/platform/x86/fjes/fjes_hw.c b/drivers/platform/x86/fjes/fjes_hw.c
index e07b266..c22679a 100644
--- a/drivers/platform/x86/fjes/fjes_hw.c
+++ b/drivers/platform/x86/fjes/fjes_hw.c

[...]

@@ -1123,3 +1126,34 @@ static void fjes_hw_update_zone_task(struct work_struct *work)
}
  }

+static void fjes_hw_epstop_task(struct work_struct *work)
+{
+   struct fjes_hw *hw = container_of(work,
+   struct fjes_hw, epstop_task);


   Please start the continuation lines under 'work' on the first line.


+   struct fjes_adapter *adapter = (struct fjes_adapter *)hw->back;
+   int epid_bit;
+   unsigned long remain_bit;
+
+   while ((remain_bit = hw->epstop_req_bit)) {
+


   Don't think this empty line is needed.


+   for (epid_bit = 0; remain_bit; (remain_bit >>= 1),
+   (epid_bit++)) {


   Inner parens not needed, the comma operator has the lowest precedence.


+
+   if (remain_bit & 1) {
+


   Don't think this empty line is needed.

[...]
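
A small standalone sketch of the loop shape discussed above, with the comma operator left unparenthesized. collect_set_bits() and its parameters are hypothetical; only the form of the for header mirrors the reviewed code.

```c
/* Record the index of every set bit in 'bits', walking from bit 0
 * upwards. The third clause of the for loop uses the comma operator
 * without extra parentheses: >>= binds tighter than the comma.
 */
static int collect_set_bits(unsigned long bits, int *out, int max)
{
	unsigned long remain_bit;
	int epid_bit;
	int n = 0;

	for (remain_bit = bits, epid_bit = 0; remain_bit;
	     remain_bit >>= 1, epid_bit++) {
		if ((remain_bit & 1) && n < max)
			out[n++] = epid_bit;
	}
	return n;
}
```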

WBR, Sergei


Re: [PATCH] net/macb: add config for Atmel sama5d2 SoCs

2015-06-18 Thread Alexandre Belloni

On 18/06/2015 at 12:18:19 +0200, Nicolas Ferre wrote :
> From: Cyrille Pitchen 
> 
> Add the compatible string for Atmel sama5d2 SoC family as the configuration
> options differ from other instances of the GEM.
> 
> Signed-off-by: Cyrille Pitchen 
> Signed-off-by: Nicolas Ferre 
> ---
>  drivers/net/ethernet/cadence/macb.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/net/ethernet/cadence/macb.c 
> b/drivers/net/ethernet/cadence/macb.c
> index 740d04fd2223..caeb39561567 100644
> --- a/drivers/net/ethernet/cadence/macb.c
> +++ b/drivers/net/ethernet/cadence/macb.c
> @@ -2713,6 +2713,13 @@ static const struct macb_config pc302gem_config = {
>   .init = macb_init,
>  };
>  
> +static const struct macb_config sama5d2_config = {
> + .caps = 0,
> + .dma_burst_length = 16,
> + .clk_init = macb_clk_init,
> + .init = macb_init,
> +};
> +
>  static const struct macb_config sama5d3_config = {
>   .caps = MACB_CAPS_SG_DISABLED | MACB_CAPS_GIGABIT_MODE_AVAILABLE,
>   .dma_burst_length = 16,
> @@ -2756,6 +2763,7 @@ static const struct of_device_id macb_dt_ids[] = {
>   { .compatible = "cdns,macb" },
>   { .compatible = "cdns,pc302-gem", .data = &pc302gem_config },
>   { .compatible = "cdns,gem", .data = &pc302gem_config },
> + { .compatible = "atmel,sama5d2-gem", .data = &sama5d2_config },

This compatible has to be documented

>   { .compatible = "atmel,sama5d3-gem", .data = &sama5d3_config },
>   { .compatible = "atmel,sama5d4-gem", .data = &sama5d4_config },
>   { .compatible = "cdns,at91rm9200-emac", .data = &emac_config },
> -- 
> 2.1.3
> 

-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

[PATCH net-next] inet_diag: Remove _bh suffix in inet_diag_dump_reqs().

2015-06-18 Thread Hiroaki Shimoda

inet_diag_dump_reqs() is called from inet_diag_dump_icsk() with BH
disabled. So no need to disable BH in inet_diag_dump_reqs().

Signed-off-by: Hiroaki Shimoda 
---
 net/ipv4/inet_diag.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 21985d8d41e7..4ca789ba63cb 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -746,7 +746,7 @@ static int inet_diag_dump_reqs(struct sk_buff *skb, struct sock *sk,
 
entry.family = sk->sk_family;
 
-   spin_lock_bh(&icsk->icsk_accept_queue.syn_wait_lock);
+   spin_lock(&icsk->icsk_accept_queue.syn_wait_lock);
 
lopt = icsk->icsk_accept_queue.listen_opt;
if (!lopt || !listen_sock_qlen(lopt))
@@ -794,7 +794,7 @@ static int inet_diag_dump_reqs(struct sk_buff *skb, struct sock *sk,
}
 
 out:
-   spin_unlock_bh(&icsk->icsk_accept_queue.syn_wait_lock);
+   spin_unlock(&icsk->icsk_accept_queue.syn_wait_lock);
 
return err;
 }
-- 
2.3.6


Re: [PATCH ipv6 0/1] ipv6: addrconf: routes are not deleted if last ipv6 address is removed

2015-06-18 Thread Hannes Frederic Sowa

On Thu, 2015-06-18 at 14:59 +0530, Mazhar Rana wrote:
> Hi,
> 
> After 'commit 876fd05ddbae03166e7037fca957b55bb3be6594
> ("ipv6: don't disable interface if last ipv6 address is removed")'
> it is not clearing ipv6 interface configurations(routes, neighbours,
> etc) when last ipv6 address of interface is removed.
> 
> This is now creating functionality issue with below deployment.
> 
> On ubuntu 14.04 (upgraded with linux kernel 3.19)
> eth1 GW1: 2604:2000:7000:2::102
> eth0 GW2: 2001:df7:6000:101::1b:102
> 
> HostA: 3804:3000:1406:2::102 (reachable via GW1 and GW2 both)
> 
> In this deployment, HostA is reachable via eth0 and eth1. I prefer
> that all traffic for HostA should go via GW1 which is available on 
> link eth1. 
> 
> $ ip -6 ro s
> 2001:df7:6000:101::/64 dev eth0  proto kernel  metric 256 
> 2604:2000:7000:2::/64 dev eth1  proto kernel  metric 256 
> 3804:3000:1406:2::/64 via 2604:2000:7000:2::102 dev eth1  metric 1024 
> fe80::/64 dev eth0  proto kernel  metric 256 
> fe80::/64 dev eth1  proto kernel  metric 256 
> default via 2001:df7:6000:101::1b:102 dev eth0  proto static  metric 1 
> 
> On failure of GW1 I removed all ipv6 address of eth1 so all traffic
> should go through default gateway 'GW2'.
> 
> $ sudo ip -6 addr flush dev eth1
> $ ip -6 ro s
> 2001:df7:6000:101::/64 dev eth0  proto kernel  metric 256 
> 3804:3000:1406:2::/64 via 2604:2000:7000:2::102 dev eth1  metric 1024 
> fe80::/64 dev eth0  proto kernel  metric 256 
> fe80::/64 dev eth0.100  proto kernel  metric 256 
> default via 2001:df7:6000:101::1b:102 dev eth0  proto static  metric 1
> 
> But here, route for HostA is not deleted, so traffic for HostA is
> still trying to go through GW1 which is not reachable anymore.
> 
> If 'commit 876fd05ddbae03166e7037fca957b55bb3be6594
> ("ipv6: don't disable interface if last ipv6 address is removed")'
> was taken only for the problem mentioned in the changelog of that commit,
> then here I have an alternate proposal which will overcome both issues.
> 
> Do you see any side effect of this proposal?

In theory IPv6 mandates that on-link information (which subnet is available on
which link) and address-specific connected routes should not depend on each
other. That said, your initial assumption that clearing the addresses from an
interface shuts it down for IPv6 operation is wrong.

I guess the check was there to make sure each link has an LL address.

As we changed backwards compatibility here, I am a bit ambivalent.

Another glitch I noticed with your patch: we don't set the disable_ipv6 bit on
addrconf_ifdown with how==0, so we cannot easily bring the interface up without
disturbing IPv4 operations. Could you check that the disable_ipv6 switch works
to at least bring the IPv6 part of the interface up again?

Bye,
Hannes


Re: [PATCH next v2] bonding: Display LACP info only to CAP_NET_ADMIN capable user

2015-06-18 Thread Eric Dumazet

On Wed, 2015-06-17 at 17:59 -0700, Mahesh Bandewar wrote:
> Actor and Partner details can be accessed via proc-fs, sys-fs
> entries or netlink interface. These interfaces are world readable
> at this moment. The earlier patch-series made the LACP communication
> secure to avoid nuisance attack from within the same L2 domain but
> it did not prevent "someone unprivileged" looking at that information
> on host and perform the same act.
> 
> This patch essentially avoids spitting those entries if the user
> in question does not have enough privileges.
> 
> Signed-off-by: Mahesh Bandewar 
> ---
>  drivers/net/bonding/bond_netlink.c |  11 ++--
>  drivers/net/bonding/bond_procfs.c  | 101 +++--
>  drivers/net/bonding/bond_sysfs.c   |  12 ++---
>  3 files changed, 67 insertions(+), 57 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
> index 5580fcde738f..3fd3aa4b145e 100644
> --- a/drivers/net/bonding/bond_netlink.c
> +++ b/drivers/net/bonding/bond_netlink.c
> @@ -600,18 +600,23 @@ static int bond_fill_info(struct sk_buff *skb,
>  
>   if (BOND_MODE(bond) == BOND_MODE_8023AD) {
>   struct ad_info info;
> + u8 zero_mac[ETH_ALEN];
>  
> + eth_zero_addr(zero_mac);
>   if (nla_put_u16(skb, IFLA_BOND_AD_ACTOR_SYS_PRIO,
> - bond->params.ad_actor_sys_prio))
> + capable(CAP_NET_ADMIN) ?
> + bond->params.ad_actor_sys_prio : 0))
>   goto nla_put_failure;
>  
>   if (nla_put_u16(skb, IFLA_BOND_AD_USER_PORT_KEY,
> - bond->params.ad_user_port_key))
> + capable(CAP_NET_ADMIN) ?
> + bond->params.ad_user_port_key : 0))
>   goto nla_put_failure;
>  
>   if (nla_put(skb, IFLA_BOND_AD_ACTOR_SYSTEM,
>   sizeof(bond->params.ad_actor_system),
> - &bond->params.ad_actor_system))
> + capable(CAP_NET_ADMIN) ?
> + &bond->params.ad_actor_system : &zero_mac))
>   goto nla_put_failure;
>  

Hmm... I would rather not send these fake attributes at all?
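
One way to read that suggestion is to skip the LACP attributes entirely for unprivileged readers instead of emitting zeroed values. The following is a minimal userspace sketch of that idea only; `model_msg`, `model_put` and `model_fill_ad_info` are hypothetical stand-ins and not the netlink API, and the `privileged` flag stands in for the capable(CAP_NET_ADMIN) check in the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attribute identifiers, named after the ones in the patch. */
enum model_attr {
	MODEL_AD_ACTOR_SYS_PRIO,
	MODEL_AD_USER_PORT_KEY,
	MODEL_AD_ACTOR_SYSTEM,
};

/* Hypothetical message buffer: just records which attributes were added. */
struct model_msg {
	enum model_attr attrs[8];
	size_t nattrs;
};

/* Append one attribute; returns -1 when the message is full. */
static int model_put(struct model_msg *m, enum model_attr a)
{
	if (m->nattrs >= sizeof(m->attrs) / sizeof(m->attrs[0]))
		return -1;
	m->attrs[m->nattrs++] = a;
	return 0;
}

/*
 * Emit the three actor attributes only for a privileged reader.
 * An unprivileged reader gets a message without them, rather than
 * a message carrying zeroed placeholders.
 */
static int model_fill_ad_info(struct model_msg *m, bool privileged)
{
	if (!privileged)
		return 0;

	if (model_put(m, MODEL_AD_ACTOR_SYS_PRIO) ||
	    model_put(m, MODEL_AD_USER_PORT_KEY) ||
	    model_put(m, MODEL_AD_ACTOR_SYSTEM))
		return -1;
	return 0;
}
```

With this shape a reader can tell "not permitted" apart from "value is zero" by the absence of the attribute.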




Re: [PATCH net] Revert "tcp: switch tcp_fastopen key generation to net_get_random_once"

2015-06-18 Thread Eric Dumazet

On Thu, 2015-06-18 at 11:32 +0200, Hannes Frederic Sowa wrote:
> Hello Christoph,

> > There does not seem to be a better way to handle this. We could try
> > to make the call to kmalloc and crypto_alloc_cipher during bootup, and
> > then generate the random value only on-the-fly (when the first TFO-SYN
> > comes in) with net_get_random_once in order to have the better entropy
> > that comes with doing the late initialisation of the random value. But
> > that's probably net-next material.
> 
> can't we simply move the net_get_random_once to the TCP_FASTOPEN setsockopt
> and sendmsg(MSG_FASTOPEN) path, so those allocations still happen in process
> context but we still defer the extraction of entropy as long as possible?

Yes, I do not think this would be hard. This bug is old (3.13) and does
not seem urgent enough to expedite a revert.



Re: [PATCH RFC] tun, macvtap: higher order allocations for skbs

2015-06-18 Thread Michael S. Tsirkin

On Thu, Jun 18, 2015 at 12:54:44PM +0200, Christian Borntraeger wrote:
> Am 18.06.2015 um 12:20 schrieb Michael S. Tsirkin:
> > Needs more testing. Anyone see anything wrong with this?
> Can you explain the motivation? 
> FWIW, basic networking between two guests over macvtap still
> seems to work on s390 so I don't see any obvious regression.
> 
> Christian

Shorter fragment list often makes processing in the net stack more
efficient.

> > 
> > Signed-off-by: Michael S. Tsirkin 
> > ---
> >  drivers/net/macvtap.c | 2 +-
> >  drivers/net/tun.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> > index 928f3f4..80e87e4 100644
> > --- a/drivers/net/macvtap.c
> > +++ b/drivers/net/macvtap.c
> > @@ -610,7 +610,7 @@ static inline struct sk_buff *macvtap_alloc_skb(struct sock *sk, size_t prepad,
> > linear = len;
> > 
> > skb = sock_alloc_send_pskb(sk, prepad + linear, len - linear, noblock,
> > -  err, 0);
> > +  err, 1);
> > if (!skb)
> > return NULL;
> > 
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index cb376b2d..8f2f1e5 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -1069,7 +1069,7 @@ static struct sk_buff *tun_alloc_skb(struct tun_file *tfile,
> > linear = len;
> > 
> > skb = sock_alloc_send_pskb(sk, prepad + linear, len - linear, noblock,
> > -  &err, 0);
> > +  &err, 1);
> > if (!skb)
> > return ERR_PTR(err);
> > 
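
To put a rough number on "shorter fragment list", here is a small arithmetic sketch. It assumes the last argument of sock_alloc_send_pskb() is the maximum page order, assumes 4 KiB pages, and assumes every higher-order allocation succeeds; `count_frags` and `MODEL_PAGE_SIZE` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/*
 * Count how many page fragments are needed to hold data_len bytes when
 * each fragment may cover up to (1 << max_order) contiguous pages.
 * Assumes every allocation at the chosen order succeeds.
 */
static unsigned int count_frags(size_t data_len, unsigned int max_order)
{
	size_t npages = (data_len + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
	unsigned int frags = 0;

	while (npages) {
		unsigned int order = max_order;

		/* Use a smaller chunk when fewer pages remain. */
		while (order && npages < ((size_t)1 << order))
			order--;

		npages -= (size_t)1 << order;
		frags++;
	}
	return frags;
}
```

Under these assumptions a 64 KiB payload needs 16 fragments at order 0 and 8 fragments at order 1, which is the kind of reduction the change is after.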

Re: [PATCH net v2] packet: avoid out of bounds read in round robin fanout

2015-06-18 Thread Eric Dumazet

On Wed, 2015-06-17 at 15:59 -0400, Willem de Bruijn wrote:
> From: Willem de Bruijn 
> 
> PACKET_FANOUT_LB computes f->rr_cur such that it is modulo
> f->num_members. It returns the old value unconditionally, but
> f->num_members may have changed since the last store. Ensure
> that the return value is always < num.
> 
> When modifying the logic, simplify it further by replacing the loop
> with an unconditional atomic increment.
> 
> Fixes: dc99f600698d ("packet: Add fanout support.")
> Suggested-by: Eric Dumazet 
> Signed-off-by: Willem de Bruijn 
> ---

Acked-by: Eric Dumazet 
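
The idea in the changelog can be illustrated with a minimal userspace sketch using C11 atomics. This is not the kernel code; `fanout_model` and `fanout_pick_rr` are hypothetical names, and the cursor here is a plain atomic counter standing in for f->rr_cur.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the fanout group's round-robin cursor. */
struct fanout_model {
	atomic_uint rr_cur;
};

/*
 * Pick a member index in [0, num). The cursor is incremented
 * unconditionally and reduced modulo the member count the caller sees
 * right now, so a cursor stored under an older, larger member count
 * can never produce an out-of-bounds index.
 */
static unsigned int fanout_pick_rr(struct fanout_model *f, unsigned int num)
{
	assert(num > 0);
	return atomic_fetch_add(&f->rr_cur, 1) % num;
}
```

Because the modulo is applied on every read rather than on every store, shrinking the group between two calls is harmless, and no retry loop is needed.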



Re: [PATCH RFC] tun, macvtap: higher order allocations for skbs

2015-06-18 Thread Christian Borntraeger

Am 18.06.2015 um 12:20 schrieb Michael S. Tsirkin:
> Needs more testing. Anyone see anything wrong with this?
Can you explain the motivation? 
FWIW, basic networking between two guests over macvtap still
seems to work on s390 so I don't see any obvious regression.

Christian

> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  drivers/net/macvtap.c | 2 +-
>  drivers/net/tun.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 928f3f4..80e87e4 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -610,7 +610,7 @@ static inline struct sk_buff *macvtap_alloc_skb(struct sock *sk, size_t prepad,
>   linear = len;
> 
>   skb = sock_alloc_send_pskb(sk, prepad + linear, len - linear, noblock,
> -err, 0);
> +err, 1);
>   if (!skb)
>   return NULL;
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index cb376b2d..8f2f1e5 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1069,7 +1069,7 @@ static struct sk_buff *tun_alloc_skb(struct tun_file *tfile,
>   linear = len;
> 
>   skb = sock_alloc_send_pskb(sk, prepad + linear, len - linear, noblock,
> -&err, 0);
> +&err, 1);
>   if (!skb)
>   return ERR_PTR(err);
> 



Re: [net-next v2 00/17][pull request] Intel Wired LAN Driver Updates 2015-06-17

2015-06-18 Thread David Miller

From: Jeff Kirsher 
Date: Wed, 17 Jun 2015 05:54:47 -0700

> This series contains updates to fm10k only.
> 
> Alex provides two fixes for the fm10k, first folds the fm10k_pull_tail()
> call into fm10k_add_rx_frag(), this way the fragment does not have to be
> modified after it is added to the skb.  The second fixes missing braces
> to an if statement.
> 
> The remaining patches are from Jacob which contain improvements and fixes
> for fm10k.  First fix makes it so that invalid address will simply be
> skipped and allows synchronizing the full list to proceed with using
> iproute2 tool.  Fixed a possible kernel panic by using the correct
> transmit timestamp function.  Simplified the code flow for setting the
> IN_PROGRESS bit of the shinfo for an skb that we will be timestamping.
> Fix a bug in the timestamping transmit enqueue code responsible for a
> NULL pointer dereference and invalid access of the skb list by freeing
> the clone in the cases where we did not add it to the queue.  Update the
> PF code so that it resets the empty TQMAP/RQMAP registers post-VFLR to
> prevent innocent VF drivers from triggering malicious driver events.
> The SYSTIME_CFG.Adjust direction bit is actually supposed to indicate
> that the adjustment is positive, so fix the code to align correctly with
> the hardware and documentation.  Cleanup local variable that is no longer
> used after a previous refactor of the code.  Fix the code flow so that we
> actually clear the enabled flag as part of our removal of the LPORT.
> 
> v2:
>  - updated patch 07 description based on feedback from Sergei Shtylyov
>  - updated patch 09 & 10 to use %d in error message based on feedback
>from Sergei Shtylyov

Pulled, thanks Jeff.

Re: [PATCH net-next] x_table: align per cpu xt_counter

2015-06-18 Thread Pablo Neira Ayuso

On Thu, Jun 18, 2015 at 03:43:26AM -0700, David Miller wrote:
> From: Eric Dumazet 
> Date: Mon, 15 Jun 2015 18:10:13 -0700
> 
> > From: Eric Dumazet 
> > 
> > Let's force a 16 bytes alignment on xt_counter percpu allocations,
> > so that bytes and packets sit in same cache line.
> > 
> > xt_counter being exported to user space, we cannot add __align(16) on
> > the structure itself.
> > 
> > Signed-off-by: Eric Dumazet 
> > Cc: Florian Westphal 
> 
> Pablo, I assume you will take this.

Yes, I'll prepare another pull request for you later today.
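
As a userspace illustration of the alignment argument in the quoted changelog: the structure keeps its natural alignment because its layout is visible to user space, and the stricter alignment is requested at allocation time instead. The sketch assumes two 64-bit counters (packets and bytes); `counter_model` and `counter_alloc_aligned` are hypothetical names, and C11 aligned_alloc() stands in for the per-cpu allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout: two 64-bit counters, naturally 8-byte aligned. */
struct counter_model {
	uint64_t pcnt;	/* packets */
	uint64_t bcnt;	/* bytes */
};

_Static_assert(sizeof(struct counter_model) == 16,
	       "model assumes a 16-byte counter pair");

/*
 * Allocate one counter pair on a 16-byte boundary. A 16-byte object
 * starting on a 16-byte boundary cannot straddle a cache line whose
 * size is a multiple of 16, so both fields share one line.
 */
static struct counter_model *counter_alloc_aligned(void)
{
	struct counter_model *c = aligned_alloc(16, sizeof(*c));

	if (c) {
		c->pcnt = 0;
		c->bcnt = 0;
	}
	return c;
}
```

With only the natural 8-byte alignment, an instance could start 8 bytes before a cache-line boundary and split the two counters across lines.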

Re: [PATCH 04/11] IB/cm: Expose DGID in SIDR request events

2015-06-18 Thread Haggai Eran

On 17/06/2015 20:06, Jason Gunthorpe wrote:
> On Tue, Jun 16, 2015 at 02:25:07PM +0300, Haggai Eran wrote:

>> Regarding APM, currently the ib_cm code always sends the GMP to the
>> primary path anyway, right? And in any case, one would expect the
>> primary path's GID to have a valid net_device and local routing rules,
>> so I think for the purpose of demuxing and validating the request using
>> the primary path should be fine.
> 
> The current code works that way, but it is not what I'd expect
> generally.
> 
> For instance, future APM support will be able to drive dual-rail and
> policy will decide which rail is the current best rail for data
> transfer. So the GMP may be directed to the IPoIB device with port 1,
> but the data transfer may happen on the RDMA port 2. [Note, I already
> have very rough patches that do this de-coupling]
> 
>> Why do you think the GMP's net_device should be used over the one of the
>> future RDMA channel?
> 
> The code needs to match the incoming GMP with the logical netdev that
> rx's *that GMP*. The fact that it goes on to set up an RDMA channel is not
> relevant, the nature of the future RDMA channel should not impact how
> the GMP is received.

From what I understand, ib_cm and rdma_cm keep their own addresses. I
thought that ib_cm's addresses would be used to handle GMPs, and the
rdma_cm addresses (id.route.addr) to represent the created RDMA channel.
After all, that is what ucma_query_addr returns. So are you proposing
that we use the logical netdev that was resolved by the GMP to fill up
the source address returned to user-space? It sounds like it would
prevent the APM usage you described above.

> 
>> So far we can work without GRH for CM requests, and also without GRH for
>> SIDR requests if we rely on layer 3 for the interface resolution. I'm
>> not against adding a LLADDR to the protocol somehow, but I don't think
>> we should abandon all these use cases and the interoperability with
>> existing software.
> 
> Well, there is a middle ground. Let's say we get the LLADDR in the GMP
> somehow, then we get 100% correct operation when it is present.
> 
> For degraded operation we have the (device,port,pkey) and possibly
> (device,port,pkey,gid) if there was a GRH. We also have the IP address
> hack.
> 
> So, I'd say, search in this sequence:
>  - If the LLADDR is present, just find the right netdev
>  - Otherwise search for (device,port,pkey) / (device,port,pkey,gid)
>If there is only one match then that is unambiguously the correct
>device to use.
>  - Repeat the above search, but add the IP address. This is the hack
>we perform for compatibility.
> 
> There is no reason to hackily look at the GMP path parameters if we are
> relying on #3 above as the hack to save us in the legacy ambiguous case.
> 
> .. and to answer your question in the other email, I think we should
> keep the hack clearly distinct from the proper operation (in fact,
> perhaps it should be user configurable). So #3 should be a distinct
> step taken when all else fails, not integrated into earlier steps.
> 
> So, this series as it stands just needs to do #2/#3 above and you guys
> can figure out the LLADDR business later.

Okay. I can add a first search without the IP address.
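
The three-step search order quoted above could be sketched roughly as follows. This is a userspace model under stated assumptions, not the ib_cm or rdma_cm code: all structure and function names are hypothetical, and GID, IP and LLADDR are reduced to plain integers to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical candidate netdev, with identifiers reduced to integers. */
struct netdev_model {
	int device, port;
	unsigned int pkey, gid, ip, lladdr;
};

/* Hypothetical incoming request; gid and lladdr are optional. */
struct req_model {
	int device, port;
	unsigned int pkey, ip;
	bool has_gid, has_lladdr;
	unsigned int gid, lladdr;
};

/* (device,port,pkey), plus gid when the request carried a GRH. */
static bool base_match(const struct netdev_model *d, const struct req_model *r)
{
	if (d->device != r->device || d->port != r->port || d->pkey != r->pkey)
		return false;
	return !r->has_gid || d->gid == r->gid;
}

/* Index of the only matching candidate, or -1 if none or several match. */
static int unique_match(const struct netdev_model *devs, size_t n,
			const struct req_model *r, bool use_ip)
{
	int found = -1;

	for (size_t i = 0; i < n; i++) {
		if (!base_match(&devs[i], r))
			continue;
		if (use_ip && devs[i].ip != r->ip)
			continue;
		if (found >= 0)
			return -1;	/* ambiguous */
		found = (int)i;
	}
	return found;
}

static int find_netdev(const struct netdev_model *devs, size_t n,
		       const struct req_model *r)
{
	int idx;

	/* Step 1: an LLADDR, when present, identifies the netdev directly. */
	if (r->has_lladdr) {
		for (size_t i = 0; i < n; i++)
			if (devs[i].lladdr == r->lladdr)
				return (int)i;
		return -1;
	}

	/* Step 2: unambiguous match without looking at the IP address. */
	idx = unique_match(devs, n, r, false);
	if (idx >= 0)
		return idx;

	/* Step 3: compatibility fallback, same search plus the IP address. */
	return unique_match(devs, n, r, true);
}
```

Keeping step 3 as a separate call makes it easy to disable the IP-based fallback independently, which matches the wish to keep it distinct from the proper operation.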


Re: [PATCH net-next] x_table: align per cpu xt_counter

2015-06-18 Thread David Miller

From: Eric Dumazet 
Date: Mon, 15 Jun 2015 18:10:13 -0700

> From: Eric Dumazet 
> 
> Let's force a 16 bytes alignment on xt_counter percpu allocations,
> so that bytes and packets sit in same cache line.
> 
> xt_counter being exported to user space, we cannot add __align(16) on
> the structure itself.
> 
> Signed-off-by: Eric Dumazet 
> Cc: Florian Westphal 

Pablo, I assume you will take this.

[PATCH RFC] tun, macvtap: higher order allocations for skbs

2015-06-18 Thread Michael S. Tsirkin

Needs more testing. Anyone see anything wrong with this?

Signed-off-by: Michael S. Tsirkin 
---
 drivers/net/macvtap.c | 2 +-
 drivers/net/tun.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 928f3f4..80e87e4 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -610,7 +610,7 @@ static inline struct sk_buff *macvtap_alloc_skb(struct sock *sk, size_t prepad,
linear = len;
 
skb = sock_alloc_send_pskb(sk, prepad + linear, len - linear, noblock,
-  err, 0);
+  err, 1);
if (!skb)
return NULL;
 
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index cb376b2d..8f2f1e5 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1069,7 +1069,7 @@ static struct sk_buff *tun_alloc_skb(struct tun_file *tfile,
linear = len;
 
skb = sock_alloc_send_pskb(sk, prepad + linear, len - linear, noblock,
-  &err, 0);
+  &err, 1);
if (!skb)
return ERR_PTR(err);
 
-- 
MST

Re: [PATCH] net: stmmac: dwmac-rk: Don't add function name in info or err messages

2015-06-18 Thread David Miller

From: Romain Perier 
Date: Mon, 15 Jun 2015 17:44:19 +

> This kind of information is only useful for debugging and should not be
> displayed in normal module messages.
> 
> Signed-off-by: Romain Perier 

Applied.

Re: [PATCH net] bridge: fix br_stp_set_bridge_priority race conditions

2015-06-18 Thread David Miller

From: Nikolay Aleksandrov 
Date: Mon, 15 Jun 2015 20:28:51 +0300

> After the ->set() spinlocks were removed br_stp_set_bridge_priority
> was left running without any protection when used via sysfs. It can
> race with port add/del and could result in use-after-free cases and
> corrupted lists. Tested by running port add/del in a loop with stp
> enabled while setting priority in a loop, crashes are easily
> reproducible.
> The spinlocks around sysfs ->set() were removed in commit:
> 14f98f258f19 ("bridge: range check STP parameters")
> There's also a race condition in the netlink priority support that is
> fixed by this change, but it was introduced recently and the fixes tag
> covers it, just in case it's needed the commit is:
> af615762e972 ("bridge: add ageing_time, stp_state, priority over netlink")
> 
> Signed-off-by: Nikolay Aleksandrov 
> Fixes: 14f98f258f19 ("bridge: range check STP parameters")

Applied and queued up for -stable, thanks.

[PATCH] net/macb: add config for Atmel sama5d2 SoCs

2015-06-18 Thread Nicolas Ferre

From: Cyrille Pitchen 

Add the compatible string for Atmel sama5d2 SoC family as the configuration
options differ from other instances of the GEM.

Signed-off-by: Cyrille Pitchen 
Signed-off-by: Nicolas Ferre 
---
 drivers/net/ethernet/cadence/macb.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 740d04fd2223..caeb39561567 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -2713,6 +2713,13 @@ static const struct macb_config pc302gem_config = {
.init = macb_init,
 };
 
+static const struct macb_config sama5d2_config = {
+   .caps = 0,
+   .dma_burst_length = 16,
+   .clk_init = macb_clk_init,
+   .init = macb_init,
+};
+
 static const struct macb_config sama5d3_config = {
.caps = MACB_CAPS_SG_DISABLED | MACB_CAPS_GIGABIT_MODE_AVAILABLE,
.dma_burst_length = 16,
@@ -2756,6 +2763,7 @@ static const struct of_device_id macb_dt_ids[] = {
{ .compatible = "cdns,macb" },
{ .compatible = "cdns,pc302-gem", .data = &pc302gem_config },
{ .compatible = "cdns,gem", .data = &pc302gem_config },
+   { .compatible = "atmel,sama5d2-gem", .data = &sama5d2_config },
{ .compatible = "atmel,sama5d3-gem", .data = &sama5d3_config },
{ .compatible = "atmel,sama5d4-gem", .data = &sama5d4_config },
{ .compatible = "cdns,at91rm9200-emac", .data = &emac_config },
-- 
2.1.3


Re: [PATCH net-next 2/3 v4] net: ipv4 sysctl option to ignore routes when nexthop link is down

2015-06-18 Thread David Miller

From: Andy Gospodarek 
Date: Mon, 15 Jun 2015 12:33:20 -0400

> @@ -1035,12 +1036,18 @@ int fib_dump_info(struct sk_buff *skb, u32 portid, u32 seq, int event,
>   nla_put_in_addr(skb, RTA_PREFSRC, fi->fib_prefsrc))
>   goto nla_put_failure;
>   if (fi->fib_nhs == 1) {
> + struct in_device *in_dev;
>   if (fi->fib_nh->nh_gw &&
>   nla_put_in_addr(skb, RTA_GATEWAY, fi->fib_nh->nh_gw))
>   goto nla_put_failure;

Please put an empty line between local variable declarations and code.

> @@ -1057,11 +1064,17 @@ int fib_dump_info(struct sk_buff *skb, u32 portid, u32 seq, int event,
>   goto nla_put_failure;
>  
>   for_nexthops(fi) {
> + struct in_device *in_dev;
>   rtnh = nla_reserve_nohdr(skb, sizeof(*rtnh));
>   if (!rtnh)
>   goto nla_put_failure;

Likewise.

Re: [PATCH net-next 1/3 v4] net: track link-status of ipv4 nexthops

2015-06-18 Thread David Miller

From: Andy Gospodarek 
Date: Mon, 15 Jun 2015 12:33:19 -0400

> @@ -1107,9 +1107,10 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
>   struct net_device *dev = netdev_notifier_info_to_dev(ptr);
>   struct in_device *in_dev;
>   struct net *net = dev_net(dev);
> + unsigned flags;

Please always fully spell out "unsigned int" instead of shortening it to
just "unsigned", thanks.

> @@ -920,11 +926,17 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
>   if (!nh->nh_dev)
>   goto failure;
>   } else {
> + int linkdown = 0;
>   change_nexthops(fi) {

Please put an empty line between local variable declarations and
code.

Re: [PATCH] net: fix search limit handling in skb_find_text()

2015-06-18 Thread David Miller

From: Roman I Khimov 
Date: Mon, 15 Jun 2015 12:11:58 +0300

> Suppose that we're trying to use an xt_string netfilter module to match a
> string in a specially crafted packet that has "a nice string" starting at
> offset 28.
> 
> It could be done in iptables like this:
> 
> -A some_chain -m string --string "a nice string" --algo bm --from 28 --to 38 -j DROP
> 
> And it would work as expected. Now changing that to
> 
> -A some_chain -m string --string "a nice string" --algo bm --from 29 --to 38 -j DROP
> 
> breaks the match, as expected. But, if we try to make
> 
> -A some_chain -m string --string "a nice string" --algo bm --from 20 --to 28 -j DROP
> 
> then it suddenly works again! So the 'to' parameter seems to be inclusive, not
> working as an offset after which no search should be done. OK, now if we try:
> 
> -A some_chain -m string --string "a nice string" --algo bm --from 28 --to 28 -j DROP
> 
> it doesn't work. So, for the case of equal 'from' and 'to' it's treated in a
> different way.
> 
> The first behaviour (matching at 'to' offset) comes from skb_find_text()
> comparison. The second one (not matching if 'from' and 'to' are equal) comes
> from skb_seq_read() check for (abs_offset >= st->upper_offset).
> 
> I think that the way skb_find_text() handles 'to' is wrong and should be fixed
> so that we always have predictable behaviour -- only match before 'to' offset.
> 
> There are currently only five usages of skb_find_text() in the kernel and it
> looks to me that none of them expect to match something at the 'to' offset,
> so probably this change is safe.
> 
> Reported-by: Edward Makarov 
> Tested-by: Edward Makarov 
> Signed-off-by: Roman I Khimov 

Unfortunately any aspect of this exposed to userspace is pretty much locked
in place, and we can't change it without potentially breaking someone's
setup.  This has been this way for a long time, so the risk of breaking
things is very real.

I'm not applying this, sorry.
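
For readers following the quoted report, the four observations can be reproduced with a simplified userspace model. It assumes the packet is one linear buffer that is fully readable once a search starts, that the read side refuses to start when 'from' is not below 'to', and that the final check compares the match position against 'to - from'. `find_text_model` is a hypothetical function, not skb_find_text() itself; the `inclusive_to` flag switches between the existing behaviour and the one the rejected patch proposed.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return the offset of the first occurrence of pat relative to 'from',
 * or UINT_MAX when there is no acceptable match.
 */
static unsigned int find_text_model(const char *buf, size_t len,
				    unsigned int from, unsigned int to,
				    const char *pat, bool inclusive_to)
{
	size_t plen = strlen(pat);

	/* Read-side check: nothing is read when from >= to. */
	if (from >= to || from >= len)
		return UINT_MAX;

	for (size_t pos = from; pos + plen <= len; pos++) {
		if (memcmp(buf + pos, pat, plen) == 0) {
			unsigned int rel = (unsigned int)(pos - from);
			unsigned int limit = to - from;
			bool ok = inclusive_to ? rel <= limit : rel < limit;

			return ok ? rel : UINT_MAX;
		}
	}
	return UINT_MAX;
}
```

With the string placed at offset 28, the inclusive model matches for (28, 38) and (20, 28), and fails for (29, 38) and (28, 28), as described in the report; with `inclusive_to` false the (20, 28) case no longer matches, which is exactly the user-visible change that was declined.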

Re: [PATCH net] Revert "tcp: switch tcp_fastopen key generation to net_get_random_once"

2015-06-18 Thread Hannes Frederic Sowa

Hello Christoph,

On Wed, 2015-06-17 at 17:28 -0700, Christoph Paasch wrote:
> This reverts commit 222e83d2e0aecb6a5e8d42b1a8d51332a1eba960.
> 
> tcp_fastopen_reset_cipher really cannot be called from interrupt
> context. It allocates the tcp_fastopen_context with GFP_KERNEL and
> calls crypto_alloc_cipher, which allocates all kind of stuff with
> GFP_KERNEL.
> 
> Thus, we might sleep when the key-generation is triggered by an
> incoming TFO cookie-request which would then happen in interrupt-
> context, as shown by enabling CONFIG_DEBUG_ATOMIC_SLEEP:
> 
> [   36.001813] BUG: sleeping function called from invalid context at mm/slub.c:1266
> [   36.003624] in_atomic(): 1, irqs_disabled(): 0, pid: 1016, name: packetdrill
> [   36.004859] CPU: 1 PID: 1016 Comm: packetdrill Not tainted 4.1.0-rc7 #14
> [   36.006085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
> [   36.008250]  04f2 88007f8838a8 8171d53a 880075a084a8
> [   36.009630]  880075a08000 88007f8838c8 810967d3 88007f883928
> [   36.011076]   88007f8838f8 81096892 88007f89be00
> [   36.012494] Call Trace:
> [   36.012953][] dump_stack+0x4f/0x6d
> [   36.014085]  [] ___might_sleep+0x103/0x170
> [   36.015117]  [] __might_sleep+0x52/0x90
> [   36.016117]  [] kmem_cache_alloc_trace+0x47/0x190
> [   36.017266]  [] ? tcp_fastopen_reset_cipher+0x42/0x130
> [   36.018485]  [] tcp_fastopen_reset_cipher+0x42/0x130
> [   36.019679]  [] tcp_fastopen_init_key_once+0x61/0x70
> [   36.020884]  [] __tcp_fastopen_cookie_gen+0x1c/0x60
> [   36.022058]  [] tcp_try_fastopen+0x58f/0x730
> [   36.023118]  [] tcp_conn_request+0x3e8/0x7b0
> [   36.024185]  [] ? __module_text_address+0x12/0x60
> [   36.025327]  [] tcp_v4_conn_request+0x51/0x60
> [   36.026410]  [] tcp_rcv_state_process+0x190/0xda0
> [   36.027556]  [] ? __inet_lookup_established+0x47/0x170
> [   36.028784]  [] tcp_v4_do_rcv+0x16d/0x3d0
> [   36.029832]  [] ? security_sock_rcv_skb+0x16/0x20
> [   36.030936]  [] tcp_v4_rcv+0x77a/0x7b0
> [   36.031875]  [] ? iptable_filter_hook+0x33/0x70
> [   36.032953]  [] ip_local_deliver_finish+0x92/0x1f0
> [   36.034065]  [] ip_local_deliver+0x9a/0xb0
> [   36.035069]  [] ? ip_rcv+0x3d0/0x3d0
> [   36.035963]  [] ip_rcv_finish+0x119/0x330
> [   36.036950]  [] ip_rcv+0x2e7/0x3d0
> [   36.037847]  [] __netif_receive_skb_core+0x552/0x930
> [   36.038994]  [] __netif_receive_skb+0x27/0x70
> [   36.040033]  [] process_backlog+0xd2/0x1f0
> [   36.041025]  [] net_rx_action+0x122/0x310
> [   36.042007]  [] __do_softirq+0x103/0x2f0
> [   36.042978]  [] do_softirq_own_stack+0x1c/0x30
> 
> There does not seem to be a better way to handle this. We could try
> to make the call to kmalloc and crypto_alloc_cipher during bootup, and
> then generate the random value only on-the-fly (when the first TFO-SYN
> comes in) with net_get_random_once in order to have the better entropy
> that comes with doing the late initialisation of the random value. But
> that's probably net-next material.

can't we simply move the net_get_random_once to the TCP_FASTOPEN setsockopt and
sendmsg(MSG_FASTOPEN) path, so those allocations still happen in process context
but we still defer the extraction of entropy as long as possible?

Thanks,
Hannes
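
The split proposed here, allocation in process context and no allocation on the packet path, could look roughly like the userspace model below. It is only a sketch under assumptions: `key_ctx_model`, `model_init_key_once` and `model_key_ready` are hypothetical names, malloc() stands in for the GFP_KERNEL allocations, and the caller supplies the key bytes in place of the kernel's random source.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_KEY_LEN 16

/* Hypothetical key context, published once and never replaced here. */
struct key_ctx_model {
	unsigned char key[MODEL_KEY_LEN];
};

static _Atomic(struct key_ctx_model *) model_ctx;

/*
 * Process-context path (think setsockopt or sendmsg): allowed to
 * allocate. Only the first successful caller publishes its context;
 * later callers keep the existing one.
 */
static bool model_init_key_once(const unsigned char *seed)
{
	struct key_ctx_model *expected = NULL;
	struct key_ctx_model *fresh;

	if (atomic_load(&model_ctx))
		return true;

	fresh = malloc(sizeof(*fresh));
	if (!fresh)
		return false;
	memcpy(fresh->key, seed, MODEL_KEY_LEN);

	if (!atomic_compare_exchange_strong(&model_ctx, &expected, fresh))
		free(fresh);	/* another caller published first */
	return true;
}

/* Packet path: never allocates, only reports whether a key exists. */
static bool model_key_ready(void)
{
	return atomic_load(&model_ctx) != NULL;
}
```

The key is still generated late, on first use of the feature, but the code that may sleep never runs from the receive path.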


[PATCH ipv6 1/1] ipv6: addrconf: do addrconf_ifdown when last ipv6 address is removed

2015-06-18 Thread Mazhar Rana

After 'commit 876fd05ddbae03166e7037fca957b55bb3be6594
("ipv6: don't disable interface if last ipv6 address is removed")'
the kernel no longer clears the IPv6 interface configuration (routes,
neighbours, etc.) when the last IPv6 address of an interface is removed.

This patch calls addrconf_ifdown when the last IPv6 address of an
interface is removed, to clear the IPv6 interface configuration. This does
not delete the /proc/sys/net/ipv6/conf/ directory.

Signed-off-by: Mazhar Rana 
Acked-by: Sanket Shah 
---
 net/ipv6/addrconf.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 37b70e8..230452c 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2678,6 +2678,8 @@ static int inet6_addr_del(struct net *net, int ifindex, u32 ifa_flags,
ipv6_mc_config(net->ipv6.mc_autojoin_sk,
   false, pfx, dev->ifindex);
}
+   if (list_empty(&idev->addr_list))
+   addrconf_ifdown(idev->dev, 0);
return 0;
}
}
-- 
1.9.1


[PATCH ipv6 0/1] ipv6: addrconf: routes are not deleted if last ipv6 address is removed

2015-06-18 Thread Mazhar Rana

Hi,

After 'commit 876fd05ddbae03166e7037fca957b55bb3be6594
("ipv6: don't disable interface if last ipv6 address is removed")'
the kernel no longer clears the IPv6 interface configuration (routes,
neighbours, etc.) when the last IPv6 address of an interface is removed.

This now causes a functional issue with the deployment below.

On ubuntu 14.04 (upgraded with linux kernel 3.19)
eth1 GW1: 2604:2000:7000:2::102
eth0 GW2: 2001:df7:6000:101::1b:102

HostA: 3804:3000:1406:2::102 (reachable via GW1 and GW2 both)

In this deployment, HostA is reachable via eth0 and eth1. I prefer
that all traffic for HostA go via GW1, which is available on
link eth1.

$ ip -6 ro s
2001:df7:6000:101::/64 dev eth0  proto kernel  metric 256 
2604:2000:7000:2::/64 dev eth1  proto kernel  metric 256 
3804:3000:1406:2::/64 via 2604:2000:7000:2::102 dev eth1  metric 1024 
fe80::/64 dev eth0  proto kernel  metric 256 
fe80::/64 dev eth1  proto kernel  metric 256 
default via 2001:df7:6000:101::1b:102 dev eth0  proto static  metric 1 

On failure of GW1 I removed all IPv6 addresses from eth1 so that all
traffic would go through the default gateway 'GW2'.

$ sudo ip -6 addr flush dev eth1
$ ip -6 ro s
2001:df7:6000:101::/64 dev eth0  proto kernel  metric 256 
3804:3000:1406:2::/64 via 2604:2000:7000:2::102 dev eth1  metric 1024 
fe80::/64 dev eth0  proto kernel  metric 256 
fe80::/64 dev eth0.100  proto kernel  metric 256 
default via 2001:df7:6000:101::1b:102 dev eth0  proto static  metric 1

But here, the route for HostA is not deleted, so traffic for HostA
still tries to go through GW1, which is no longer reachable.

If 'commit 876fd05ddbae03166e7037fca957b55bb3be6594
("ipv6: don't disable interface if last ipv6 address is removed")'
was taken only for the problem mentioned in the changelog of that commit,
then I have an alternate proposal here which will overcome both issues.

Do you see any side effect of this proposal?

Mazhar Rana (1):
  ipv6: addrconf: do addrconf_ifdown when last ipv6 address is removed

 net/ipv6/addrconf.c | 2 ++
 1 file changed, 2 insertions(+)

-- 
1.9.1


Re: [PATCH net-next] switchdev: fdb filter_dev is always NULL for self (device), so remove check

2015-06-18 Thread Jiri Pirko

Thu, Jun 18, 2015 at 01:08:31AM CEST, sfel...@gmail.com wrote:
>From: Scott Feldman 
>
>Remove the filter_dev check when dumping fdb entries, otherwise dump
>returns empty list.  filter_dev is always passed as NULL when dumping fdbs
>on SELF.  We want the fdbs installed on the device to be listed in the
>dump.
>
>Signed-off-by: Scott Feldman 

Acked-by: Jiri Pirko 

[PATCH] net/phy: Add support for Realtek RTL8211F

2015-06-18 Thread Shengzhou Liu

RTL8211F has different register definitions from RTL8211E.
In particular, it needs TXDLY enabled when used in RGMII mode.

Signed-off-by: Shengzhou Liu 
---
 drivers/net/phy/realtek.c | 68 ++-
 1 file changed, 67 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index 96a0f0f..4535361 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -22,8 +22,12 @@
 #define RTL821x_INER   0x12
 #define RTL821x_INER_INIT  0x6400
 #define RTL821x_INSR   0x13
+#define RTL8211E_INER_LINK_STATUS 0x400
 
-#define RTL8211E_INER_LINK_STATUS   0x400
+#define RTL8211F_INER_LINK_STATUS 0x0010
+#define RTL8211F_INSR  0x1d
+#define RTL8211F_PAGE_SELECT   0x1f
+#define RTL8211F_TX_DELAY  0x100
 
 MODULE_DESCRIPTION("Realtek PHY driver");
 MODULE_AUTHOR("Johnson Leung");
@@ -38,6 +42,18 @@ static int rtl821x_ack_interrupt(struct phy_device *phydev)
return (err < 0) ? err : 0;
 }
 
+static int rtl8211f_ack_interrupt(struct phy_device *phydev)
+{
+   int err;
+
+   phy_write(phydev, RTL8211F_PAGE_SELECT, 0xa43);
+   err = phy_read(phydev, RTL8211F_INSR);
+   /* restore to default page 0 */
+   phy_write(phydev, RTL8211F_PAGE_SELECT, 0x0);
+
+   return (err < 0) ? err : 0;
+}
+
 static int rtl8211b_config_intr(struct phy_device *phydev)
 {
int err;
@@ -64,6 +80,41 @@ static int rtl8211e_config_intr(struct phy_device *phydev)
return err;
 }
 
+static int rtl8211f_config_intr(struct phy_device *phydev)
+{
+   int err;
+
+   if (phydev->interrupts == PHY_INTERRUPT_ENABLED)
+   err = phy_write(phydev, RTL821x_INER,
+   RTL8211F_INER_LINK_STATUS);
+   else
+   err = phy_write(phydev, RTL821x_INER, 0);
+
+   return err;
+}
+
+static int rtl8211f_config_init(struct phy_device *phydev)
+{
+   int ret;
+   u16 reg;
+
+   ret = genphy_config_init(phydev);
+   if (ret < 0)
+   return ret;
+
+   if (phydev->interface == PHY_INTERFACE_MODE_RGMII) {
+   /* enable TXDLY */
+   phy_write(phydev, RTL8211F_PAGE_SELECT, 0xd08);
+   reg = phy_read(phydev, 0x11);
+   reg |= RTL8211F_TX_DELAY;
+   phy_write(phydev, 0x11, reg);
+   /* restore to default page 0 */
+   phy_write(phydev, RTL8211F_PAGE_SELECT, 0x0);
+   }
+
+   return 0;
+}
+
 static struct phy_driver realtek_drvs[] = {
{
.phy_id = 0x8201,
@@ -98,6 +149,20 @@ static struct phy_driver realtek_drvs[] = {
.suspend= genphy_suspend,
.resume = genphy_resume,
.driver = { .owner = THIS_MODULE,},
+   }, {
+   .phy_id = 0x001cc916,
+   .name   = "RTL8211F Gigabit Ethernet",
+   .phy_id_mask= 0x001f,
+   .features   = PHY_GBIT_FEATURES,
+   .flags  = PHY_HAS_INTERRUPT,
+   .config_aneg= &genphy_config_aneg,
+   .config_init= &rtl8211f_config_init,
+   .read_status= &genphy_read_status,
+   .ack_interrupt  = &rtl8211f_ack_interrupt,
+   .config_intr= &rtl8211f_config_intr,
+   .suspend= genphy_suspend,
+   .resume = genphy_resume,
+   .driver = { .owner = THIS_MODULE },
},
 };
 
@@ -106,6 +171,7 @@ module_phy_driver(realtek_drvs);
 static struct mdio_device_id __maybe_unused realtek_tbl[] = {
{ 0x001cc912, 0x001f },
{ 0x001cc915, 0x001f },
+   { 0x001cc916, 0x001f },
{ }
 };
 
-- 
2.1.0.27.g96db324


Re: [PATCH 08/11] IB/cma: Add net_dev and private data checks to RDMA CM

2015-06-18 Thread Haggai Eran

On 17/06/2015 20:18, Jason Gunthorpe wrote:
> On Tue, Jun 16, 2015 at 08:26:26AM +0300, Haggai Eran wrote:
>> On 15/06/2015 20:08, Jason Gunthorpe wrote:
>>> On Mon, Jun 15, 2015 at 11:47:13AM +0300, Haggai Eran wrote:
>>>> Instead of relying on the ib_cm module to check an incoming CM request's
>>>> private data header, add these checks to the RDMA CM module. This allows a
>>>> following patch to clean up the ib_cm interface and remove the code that
>>>> looks into the private headers. It will also allow supporting namespaces in
>>>> RDMA CM by making these checks namespace aware later on.
>>>
>>> I was expecting one of these patches to flow the net_device from here:
>>>
>>>> +static struct net_device *cma_get_net_dev(struct ib_cm_event *ib_event,
>>>> +const struct cma_req_info *req)
>>>> +{
>>>
>>> Down through cma_req_handler and cma_new_conn_id so that we get rid of
>>> the cma_translate_addr on the ingress side.
>>>
>>> Having the ingress side use one ingress net_device for all processing
>>> seems very important to me...
>>
>> Is it really very important? I thought the bound_dev_if of a passive
>> connection id is only used by the netlink statistics mechanism.
> 
> I mean 'very important' in the sense it makes the RDMA-CM *make
> logical sense*, not so much in the 'can user space tell'.
> 
> So yes, cleaning this seems very important to establish that logical
> narrative of how the packet flows through this code.
> 
> Plus, there is an init_net in the cma_translate_addr path that needs to
> be addressed - so purging cma_translate_addr is a great way to handle
> that. That would leave only the call in rdma_bind_addr, and for bind,
> the process net namespace is the correct thing to use.

Okay, I'll add a patch that cleans up these cma_translate_addr calls.



[PATCH v2] bpf: fix a bug in verification logic when SUB operation taken on FRAME_PTR

2015-06-18 Thread Wang Nan

The original code has a problem that causes the following code to fail verification:

 r1 <- r10
 r1 -= 8
 r2 = 8
 r3 = unsafe pointer
 call BPF_FUNC_probe_read  <-- R1 type=inv expected=fp

However, replacing 'r1 -= 8' with 'r1 += -8' lets the above program
load successfully.

This is because the verifier only allows a BPF_ADD instruction on a
FRAME_PTR register to forge a PTR_TO_STACK register, while BPF_SUB
on a FRAME_PTR register yields an UNKNOWN_VALUE register.

This patch fixes it by also accepting BPF_SUB in the stack_relative check.

Signed-off-by: Wang Nan 
---

V1 is incorrect. Please ignore it and consider this one.

---
 kernel/bpf/verifier.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a251cf6..681ac72 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1020,7 +1020,8 @@ static int check_alu_op(struct reg_state *regs, struct bpf_insn *insn)
}
 
/* pattern match 'bpf_add Rx, imm' instruction */
-   if (opcode == BPF_ADD && BPF_CLASS(insn->code) == BPF_ALU64 &&
+   if ((opcode == BPF_ADD || opcode == BPF_SUB) &&
+   BPF_CLASS(insn->code) == BPF_ALU64 &&
regs[insn->dst_reg].type == FRAME_PTR &&
BPF_SRC(insn->code) == BPF_K)
stack_relative = true;
-- 
1.8.3.4
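
As a rough illustration of the rule this patch changes, the sketch below
models only the type decision for the destination register, not the offset
tracking the verifier also performs. The opcode and class values mirror the
BPF instruction encoding; enum reg_type and alu_result_type() are simplified,
hypothetical stand-ins rather than the verifier's own definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirror the BPF instruction encoding. */
#define BPF_CLASS(code)	((code) & 0x07)
#define BPF_OP(code)	((code) & 0xf0)
#define BPF_SRC(code)	((code) & 0x08)
#define BPF_ALU64	0x07
#define BPF_ADD		0x00
#define BPF_SUB		0x10
#define BPF_K		0x00

/* Simplified stand-in for the verifier's register types. */
enum reg_type { UNKNOWN_VALUE, FRAME_PTR, PTR_TO_STACK };

/* Type of the destination register after an ALU op, under the patched rule. */
enum reg_type alu_result_type(uint8_t code, enum reg_type dst_type)
{
	uint8_t opcode = BPF_OP(code);
	bool stack_relative =
		(opcode == BPF_ADD || opcode == BPF_SUB) &&
		BPF_CLASS(code) == BPF_ALU64 &&
		dst_type == FRAME_PTR &&
		BPF_SRC(code) == BPF_K;

	return stack_relative ? PTR_TO_STACK : UNKNOWN_VALUE;
}

/* Both spellings from the commit message now end up as PTR_TO_STACK. */
void stack_relative_examples(void)
{
	uint8_t sub_imm = BPF_ALU64 | BPF_SUB | BPF_K;	/* r1 -= 8  */
	uint8_t add_imm = BPF_ALU64 | BPF_ADD | BPF_K;	/* r1 += -8 */

	assert(alu_result_type(sub_imm, FRAME_PTR) == PTR_TO_STACK);
	assert(alu_result_type(add_imm, FRAME_PTR) == PTR_TO_STACK);
}
```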


[PATCH] bpf: fix a bug in verification logic when SUB operation taken on FRAME_PTR

2015-06-18 Thread Wang Nan

The original code has a problem that causes the following code to fail verification:

 r1 <- r10
 r1 -= 8
 r2 = 8
 r3 = unsafe pointer
 call BPF_FUNC_probe_read  <-- R1 type=inv expected=fp

However, replacing 'r1 -= 8' with 'r1 += -8' lets the above program
load successfully.

This is because the verifier only allows a BPF_ADD instruction on a
FRAME_PTR register to forge a PTR_TO_STACK register, while BPF_SUB
on a FRAME_PTR register yields an UNKNOWN_VALUE register.

This patch fixes it by also accepting BPF_SUB in the stack_relative check.

Signed-off-by: Wang Nan 
---
 kernel/bpf/verifier.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a251cf6..6dbdeba 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1020,7 +1020,8 @@ static int check_alu_op(struct reg_state *regs, struct bpf_insn *insn)
}
 
/* pattern match 'bpf_add Rx, imm' instruction */
-   if (opcode == BPF_ADD && BPF_CLASS(insn->code) == BPF_ALU64 &&
+   if (opcode == BPF_ADD && opcode == BPF_SUB &&
+   BPF_CLASS(insn->code) == BPF_ALU64 &&
regs[insn->dst_reg].type == FRAME_PTR &&
BPF_SRC(insn->code) == BPF_K)
stack_relative = true;
-- 
1.8.3.4


Re: [PATCH v2 net-next 0/3] bpf: share helpers between tracing and networking

2015-06-18 Thread Daniel Borkmann


On 06/16/2015 07:10 PM, Alexei Starovoitov wrote:
...

> Ideally we would allow a blend of tracing and networking programs,
> then the best solution would be one or two stable tracepoints in
> networking stack where skb is visible and receiving/transmitting task
> is also visible, then skb->len and task->pid together would give nice
> foundation for accurate stats.


I think combining both seems interesting anyway, we need to find
a way to make this gluing of both worlds easy to use, though. It's
certainly interesting for stats/diagnostics, but one wouldn't be
able to use the current/future skb eBPF helpers from {cls,act}_bpf
in that context.

Re: [PATCH v2 1/3] net: mvneta: introduce compatible string "marvell,armada-xp-neta"

2015-06-18 Thread Thomas Petazzoni

Dear Jason Cooper,

On Wed, 17 Jun 2015 21:39:26 +, Jason Cooper wrote:

> Odd, I'd use that as an example of the process working.  ;-)  we have
> everyone using 'armada-370-neta' for a given block.  We discovered that
> the original IP block (on the 370s) had a limitation (no hw checksum
> for greater than 1600 bytes).  A newer version of the IP block (XP)
> doesn't have the limitation.
> 
> So we change the driver to honor the limit for the 370 compatible
> string.  We create a new compatible string for xp where the block
> doesn't have the limitation.
> 
> How did the process fail?

Because now all Armada XP users of jumbo frames are losing HW
checksumming on their jumbo frames, which you can consider to be a
regression: it was working, it is no longer working.

Of course, since it falls back to SW checksumming, it still "works",
but some users can complain of the performance penalty and consider it
to be a regression.

If on Armada XP, we had used from the beginning:

compatible = "marvell,armada-xp-neta", "marvell,armada-370-neta"

with only marvell,armada-370-neta supported originally, we could have
added this fix without breaking HW checksumming on jumbo frames for
Armada XP users.
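
To make the fallback idea concrete, here is a small userspace sketch of
matching a node's compatible list, most specific string first, against a
driver's table. It is only a model of the principle, not the kernel's OF
matching code; struct match_entry, its jumbo_hw_csum flag and
match_compatible() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical driver match entry: compatible string plus one capability. */
struct match_entry {
	const char *compatible;
	int jumbo_hw_csum;	/* 1 if the driver uses HW checksum on jumbo frames */
};

/*
 * Walk the node's compatible list from most to least specific and return
 * the first entry the driver knows about, or NULL if nothing matches.
 */
const struct match_entry *
match_compatible(const char *const *node_compat, size_t n_compat,
		 const struct match_entry *tbl, size_t n_tbl)
{
	assert(node_compat != NULL && tbl != NULL);

	for (size_t i = 0; i < n_compat; i++)
		for (size_t j = 0; j < n_tbl; j++)
			if (strcmp(node_compat[i], tbl[j].compatible) == 0)
				return &tbl[j];
	return NULL;
}
```

A node listing both strings matches an old driver through the 370 entry and
a newer driver through the XP entry, so the 370 restriction never applies to
it. A node listing only the 370 string picks up the restriction as soon as
the driver adds it.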

So I'm sorry, but the process indeed failed, because Armada XP users
keeping their old Device Tree blob will see a regression.

> I'm not seeing where backwards compatibility was broken?  A device with
> an old dtb booting a newer kernel gets a bugfix.  In the case of an XP
> board with an old dtb (armada-370-neta), the hardware still works, but
> not optimally.  Upgrading the dtb will enable hw checksumming for jumbo
> packets.

"not optimally" is still a breakage.

Again, I personally don't care about DT backward compatibility as I
think it's a stupid requirement. But I like to point out to the
DT backward compatibility fanatics when it was actually broken :-)

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com