Re: [PATCH net-next RFC v2 2/3] ipv4: add support for light weight tunnel encap attributes
Hello,

On Thu, 18 Jun 2015, Roopa Prabhu wrote:

> @@ -366,6 +371,7 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi)
>  	payload += nla_total_size((RTAX_MAX * nla_total_size(4)));
>
>  	if (fi->fib_nhs) {
> +		size_t nh_encapsize = 0;

Var not in #ifdef. Any warnings with CONFIG_LWTUNNEL=n?

>  		/* Also handles the special case fib_nhs == 1 */
>
>  		/* each nexthop is packed in an attribute */
> @@ -374,8 +380,23 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi)
>  		/* may contain flow and gateway attribute */
>  		nhsize += 2 * nla_total_size(4);
>
> +#ifdef CONFIG_LWTUNNEL
> +		/* grab encap info */
> +		for_nexthops(fi) {
> +			if (nh->nh_lwtstate) {
> +				/* RTA_ENCAP_TYPE */
> +				nh_encapsize += lwtunnel_get_encap_size(
> +						nh->nh_lwtstate);

New labels not in #ifdef:

> +
> +err_inval:
> +	ret = -EINVAL;
> +
> +errout:
> +	return ret;
>  }

Some other places may need changes:

- nh_comp: there is logic that decides if same fib_info
is reused from many fib nodes. There should be check if NH
matches by nh_lwtstate.

- xfrm4_fill_dst: not sure about this but some fields
are copied.

Regards

--
Julian Anastasov
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
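Julian's nh_comp point can be made concrete with a small user-space sketch. It assumes, hypothetically, that every route add builds a fresh lwtunnel_state, so two otherwise identical nexthops would hold different pointers; the check therefore compares the encap type and payload bytes rather than the pointer. The `model_*` names are invented for illustration and are not kernel API.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space model of the RFC's struct lwtunnel_state, whose
 * encap payload lives in a trailing lwtunnel_hdr { int len; __u8 data[0]; }.
 * A fixed 16-byte buffer stands in for the flexible array here. */
struct model_lwt_state {
	uint16_t type;     /* encap type, e.g. LWTUNNEL_ENCAP_MPLS */
	uint16_t flags;    /* not compared in this sketch */
	int len;           /* number of valid bytes in data[] (<= 16) */
	uint8_t data[16];
};

/* Hypothetical helper for an nh_comp()-style check: returns 0 when two
 * nexthops carry equivalent encap state, 1 otherwise. A NULL state means
 * "no encap", so two NULLs compare equal. */
static int model_lwt_state_cmp(const struct model_lwt_state *a,
			       const struct model_lwt_state *b)
{
	if (a == b)
		return 0;          /* same object, or both without encap */
	if (!a || !b)
		return 1;          /* only one nexthop has encap */
	if (a->type != b->type || a->len != b->len)
		return 1;
	return memcmp(a->data, b->data, (size_t)a->len) != 0;
}
```

Comparing by content means two routes with the same label stack could still share one fib_info, which is what the reuse logic in nh_comp is after; whether the real fix should compare bytes or go through a per-driver callback is left open here.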
Re: [PATCH net-next] inet_diag: Remove _bh suffix in inet_diag_dump_reqs().
On Thu, 2015-06-18 at 20:40 +0900, Hiroaki Shimoda wrote:
> inet_diag_dump_reqs() is called from inet_diag_dump_icsk() with BH
> disabled. So no need to disable BH in inet_diag_dump_reqs().
>
> Signed-off-by: Hiroaki Shimoda
> ---
>  net/ipv4/inet_diag.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
> index 21985d8d41e7..4ca789ba63cb 100644
> --- a/net/ipv4/inet_diag.c
> +++ b/net/ipv4/inet_diag.c
> @@ -746,7 +746,7 @@ static int inet_diag_dump_reqs(struct sk_buff *skb, struct sock *sk,
>
>  	entry.family = sk->sk_family;
>
> -	spin_lock_bh(&icsk->icsk_accept_queue.syn_wait_lock);
> +	spin_lock(&icsk->icsk_accept_queue.syn_wait_lock);
>
>  	lopt = icsk->icsk_accept_queue.listen_opt;
>  	if (!lopt || !listen_sock_qlen(lopt))
> @@ -794,7 +794,7 @@ static int inet_diag_dump_reqs(struct sk_buff *skb, struct sock *sk,
>  	}
>
>  out:
> -	spin_unlock_bh(&icsk->icsk_accept_queue.syn_wait_lock);
> +	spin_unlock(&icsk->icsk_accept_queue.syn_wait_lock);
>
>  	return err;
>  }

Sure, although this will soon be removed completely when SYN_RECV
sockets are stored in the regular ehash table.

Thanks
Re: [PATCH RFC net] neigh: do not modify unlinked entries
On Tue, 2015-06-16 at 22:56 +0300, Julian Anastasov wrote:
> The lockless lookups can return entry that is unlinked.
> Sometimes they get reference before last neigh_cleanup_and_release,
> sometimes they do not need reference. Later, any
> modification attempts may result in the following problems:
>
> 1. entry is not destroyed immediately because neigh_update
> can start the timer for dead entry, e.g. on change to NUD_REACHABLE
> state. As a result, entry lives for some time but is invisible
> and out of control.
>
> 2. __neigh_event_send can run in parallel with neigh_destroy
> while refcnt=0 but if timer is started and expired refcnt can
> reach 0 for second time leading to second neigh_destroy and
> possible crash.
>
> Thanks to Eric Dumazet and Ying Xue for their work and analysis
> on the __neigh_event_send change.
>
> Fixes: 767e97e1e0db ("neigh: RCU conversion of struct neighbour")
> Fixes: a263b3093641 ("ipv4: Make neigh lookups directly in output packet path.")
> Fixes: 6fd6ce2056de ("ipv6: Do not depend on rt->n in ip6_finish_output2().")
> Cc: Eric Dumazet
> Cc: Ying Xue
> Signed-off-by: Julian Anastasov
> ---

Seems good to me Julian !

Acked-by: Eric Dumazet
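The second failure mode in the quoted description can be illustrated with a single-threaded user-space model. Without a guard, arming the timer on an entry whose refcnt already reached 0 takes it back to 1, and the expiry drives it to 0 again, running the destructor a second time. The sketch assumes the remedy is to refuse to modify an entry once it is marked dead; the `model_*` names and the `-ENOENT` return are hypothetical and are not the code of the patch itself.

```c
#include <stdbool.h>
#include <errno.h>

/* Hypothetical model of a neighbour entry that lockless lookups can still
 * reach after it has been unlinked from the table. */
struct model_neigh {
	int refcnt;
	bool dead;          /* set once the entry is unlinked */
	bool timer_armed;
	int destroy_count;  /* times the destructor ran; must never exceed 1 */
};

/* Drop a reference; the last put runs the (modelled) destructor. */
static void model_neigh_release(struct model_neigh *n)
{
	if (--n->refcnt == 0)
		n->destroy_count++;
}

/* Arming the timer takes a reference that the expiry handler drops.
 * Refusing dead entries is the guard this sketch assumes. */
static int model_neigh_arm_timer(struct model_neigh *n)
{
	if (n->dead)
		return -ENOENT;
	n->refcnt++;
	n->timer_armed = true;
	return 0;
}

static void model_neigh_timer_expire(struct model_neigh *n)
{
	n->timer_armed = false;
	model_neigh_release(n);
}
```

For a dead entry the destructor runs exactly once and the timer is never armed; for a live entry the timer's reference is taken and returned as usual.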
[PATCH net-next RFC v2 3/3] mpls: support for ip tunnels
From: Roopa Prabhu

Support ip mpls tunnels using the new lwt infrastructure.

Signed-off-by: Roopa Prabhu
---
 include/linux/mpls_iptunnel.h      |    6 ++
 include/net/mpls_iptunnel.h        |   29 +
 include/uapi/linux/mpls_iptunnel.h |   26 +
 net/mpls/Kconfig                   |    5 +
 net/mpls/Makefile                  |    1 +
 net/mpls/af_mpls.c                 |    9 +-
 net/mpls/internal.h                |    3 +
 net/mpls/mpls_iptunnel.c           |  205
 8 files changed, 281 insertions(+), 3 deletions(-)
 create mode 100644 include/linux/mpls_iptunnel.h
 create mode 100644 include/net/mpls_iptunnel.h
 create mode 100644 include/uapi/linux/mpls_iptunnel.h
 create mode 100644 net/mpls/mpls_iptunnel.c

diff --git a/include/linux/mpls_iptunnel.h b/include/linux/mpls_iptunnel.h
new file mode 100644
index 000..ef29eb2
--- /dev/null
+++ b/include/linux/mpls_iptunnel.h
@@ -0,0 +1,6 @@
+#ifndef _LINUX_MPLS_IPTUNNEL_H
+#define _LINUX_MPLS_IPTUNNEL_H
+
+#include
+
+#endif  /* _LINUX_MPLS_IPTUNNEL_H */
diff --git a/include/net/mpls_iptunnel.h b/include/net/mpls_iptunnel.h
new file mode 100644
index 000..4234efc
--- /dev/null
+++ b/include/net/mpls_iptunnel.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (c) 2015 Cumulus Networks, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#ifndef _NET_MPLS_IPTUNNEL_H
+#define _NET_MPLS_IPTUNNEL_H 1
+
+#define MAX_NEW_LABELS 2
+
+struct mpls_iptunnel_encap {
+	u32	label[MAX_NEW_LABELS];
+	u8	labels;
+};
+
+static inline struct mpls_iptunnel_encap *mpls_lwt_hdr(struct lwtunnel_state *lwtstate)
+{
+	return (struct mpls_iptunnel_encap *)lwtstate->tunnel.data;
+}
+
+#endif
diff --git a/include/uapi/linux/mpls_iptunnel.h b/include/uapi/linux/mpls_iptunnel.h
new file mode 100644
index 000..228e36a
--- /dev/null
+++ b/include/uapi/linux/mpls_iptunnel.h
@@ -0,0 +1,26 @@
+/*
+ *	mpls tunnel api
+ *
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License
+ *	as published by the Free Software Foundation; either version
+ *	2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _UAPI_LINUX_MPLS_IPTUNNEL_H
+#define _UAPI_LINUX_MPLS_IPTUNNEL_H
+
+/* MPLS tunnel attributes
+ * [RTA_ENCAP] = {
+ *     [MPLS_IPTUNNEL_DST]
+ * }
+ */
+enum {
+	MPLS_IPTUNNEL_UNSPEC,
+	MPLS_IPTUNNEL_DST,
+	__MPLS_IPTUNNEL_MAX,
+};
+#define MPLS_IPTUNNEL_MAX (__MPLS_IPTUNNEL_MAX - 1)
+
+#endif /* _UAPI_LINUX_MPLS_IPTUNNEL_H */
diff --git a/net/mpls/Kconfig b/net/mpls/Kconfig
index 17bde79..3e87a6b 100644
--- a/net/mpls/Kconfig
+++ b/net/mpls/Kconfig
@@ -27,4 +27,9 @@ config MPLS_ROUTING
 	help
 	 Add support for forwarding of mpls packets.
 
+config MPLS_IPTUNNEL
+	tristate "MPLS: IP over MPLS tunnel support"
+	help
+	 Light weight tunnel handling for mpls tunnel packets
+
 endif # MPLS
diff --git a/net/mpls/Makefile b/net/mpls/Makefile
index 65bbe68..9ca9236 100644
--- a/net/mpls/Makefile
+++ b/net/mpls/Makefile
@@ -3,5 +3,6 @@
 #
 obj-$(CONFIG_NET_MPLS_GSO) += mpls_gso.o
 obj-$(CONFIG_MPLS_ROUTING) += mpls_router.o
+obj-$(CONFIG_MPLS_IPTUNNEL) += mpls_iptunnel.o
 
 mpls_router-y := af_mpls.o
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 1f93a59..c6f17ab 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -58,10 +58,11 @@ static inline struct mpls_dev *mpls_dev_get(const struct net_device *dev)
 	return rcu_dereference_rtnl(dev->mpls_ptr);
 }
 
-static bool mpls_output_possible(const struct net_device *dev)
+bool mpls_output_possible(const struct net_device *dev)
 {
 	return dev && (dev->flags & IFF_UP) && netif_carrier_ok(dev);
 }
+EXPORT_SYMBOL(mpls_output_possible);
 
 static unsigned int mpls_rt_header_size(const struct mpls_route *rt)
 {
@@ -69,13 +70,14 @@ static unsigned int mpls_rt_header_size(const struct mpls_route *rt)
 	return rt->rt_labels * sizeof(struct mpls_shim_hdr);
 }
 
-static unsigned int mpls_dev_mtu(const struct net_device *dev)
+unsigned int mpls_dev_mtu(const struct net_device *dev)
 {
 	/* The amount of data the layer 2 frame can hold */
 	return dev->mtu;
 }
+EXPORT_SYMBOL(mpls_dev_mtu);
 
-static bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
+bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 {
 	if (skb->len <= mtu)
 		return false;
@@ -85,6 +87,7 @@ static bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
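struct mpls_iptunnel_encap above stores up to MAX_NEW_LABELS labels per route. The sketch below shows how such an array could be turned into on-the-wire label stack entries using the RFC 3032 layout (20-bit label, 3-bit traffic class, bottom-of-stack bit, 8-bit TTL, big-endian). It is a user-space illustration, not the code in net/mpls/mpls_iptunnel.c (whose body is not shown here). Treating `labels[0]` as the outermost label and using a TC of 0 are assumptions, and the `model_*` names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>
#include <arpa/inet.h>

#define MODEL_MAX_NEW_LABELS 2   /* mirrors MAX_NEW_LABELS in the patch */

/* One 32-bit label stack entry: label(20) | TC(3) | S(1) | TTL(8). */
static uint32_t model_mpls_entry(uint32_t label, uint8_t tc, bool bos, uint8_t ttl)
{
	uint32_t v = ((label & 0xFFFFFu) << 12) |
		     ((uint32_t)(tc & 0x7u) << 9) |
		     ((uint32_t)bos << 8) |
		     ttl;
	return htonl(v);
}

/* Fill out[] with n entries, outermost first (assumed ordering); only the
 * last entry gets the bottom-of-stack bit. Returns the number of header
 * bytes written, or -1 if n is out of range. */
static int model_mpls_push(uint32_t *out, const uint32_t *labels,
			   unsigned int n, uint8_t ttl)
{
	if (n == 0 || n > MODEL_MAX_NEW_LABELS)
		return -1;
	for (unsigned int i = 0; i < n; i++)
		out[i] = model_mpls_entry(labels[i], 0, i == n - 1, ttl);
	return (int)(n * sizeof(uint32_t));
}
```

With labels {100, 200} and TTL 64, the entries are 0x00064040 and 0x000C8140 in host order; only the second carries the S bit (0x100).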
[PATCH net-next RFC v2 2/3] ipv4: add support for light weight tunnel encap attributes
From: Roopa Prabhu

Introduces two netlink attributes RTA_ENCAP_TYPE and RTA_ENCAP
to support attaching encap information to ipv4 routes.

RTA_ENCAP is a nested attribute as suggested by Thomas (and also
as Robert had it in his series).

RTA_ENCAP netlink policy is declared by the light weight tunnel
drivers that support this encap type.

fib code calls the following for each nexthop:
- new route handler: lwt build state (that parses RTA_ENCAP and
  returns lwt state that lives in every fib_nh)
- del route handler: lwt release handler to release lwt state data
- route dump handler: lwt dump encap to fill RTA_ENCAP data
- during input route lookup sets dst->output to lwtunnel_output which
  in turn calls the corresponding lwt tunnel output function which
  applies the required encap and xmits the packet

Signed-off-by: Roopa Prabhu
---
 include/net/ip_fib.h           |    7 ++-
 include/net/route.h            |    3 ++
 include/uapi/linux/rtnetlink.h |    3 +-
 net/ipv4/fib_frontend.c        |    8
 net/ipv4/fib_semantics.c       |   93 +++-
 net/ipv4/route.c               |   33 +-
 6 files changed, 142 insertions(+), 5 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 54271ed..49f18d7 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -44,7 +44,9 @@ struct fib_config {
 	u32			fc_flow;
 	u32			fc_nlflags;
 	struct nl_info		fc_nlinfo;
- };
+	struct nlattr		*fc_encap;
+	u16			fc_encap_type;
+};
 
 struct fib_info;
 struct rtable;
@@ -89,6 +91,9 @@ struct fib_nh {
 	struct rtable __rcu * __percpu *nh_pcpu_rth_output;
 	struct rtable __rcu	*nh_rth_input;
 	struct fnhe_hash_bucket	__rcu *nh_exceptions;
+#ifdef CONFIG_LWTUNNEL
+	struct lwtunnel_state	*nh_lwtstate;
+#endif
 };
 
 /*
diff --git a/include/net/route.h b/include/net/route.h
index fe22d03..39a6495 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -66,6 +66,9 @@ struct rtable {
 
 	struct list_head	rt_uncached;
 	struct uncached_list	*rt_uncached_list;
+#ifdef CONFIG_LWTUNNEL
+	struct lwtunnel_state	*rt_lwtstate;
+#endif
 };
 
 static inline bool rt_is_input_route(const struct rtable *rt)
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 17fb02f..6c089ad 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -308,6 +308,8 @@ enum rtattr_type_t {
 	RTA_VIA,
 	RTA_NEWDST,
 	RTA_PREF,
+	RTA_ENCAP_TYPE,
+	RTA_ENCAP,
 	__RTA_MAX
 };
 
@@ -357,7 +359,6 @@ struct rtvia {
 };
 
 /* RTM_CACHEINFO */
-
 struct rta_cacheinfo {
 	__u32	rta_clntref;
 	__u32	rta_lastuse;
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 872494e..fbe0630 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -591,6 +591,8 @@ const struct nla_policy rtm_ipv4_policy[RTA_MAX + 1] = {
 	[RTA_METRICS]		= { .type = NLA_NESTED },
 	[RTA_MULTIPATH]		= { .len = sizeof(struct rtnexthop) },
 	[RTA_FLOW]		= { .type = NLA_U32 },
+	[RTA_ENCAP_TYPE]	= { .type = NLA_U16 },
+	[RTA_ENCAP]		= { .type = NLA_NESTED },
 };
 
 static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
@@ -656,6 +658,12 @@ static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
 		case RTA_TABLE:
 			cfg->fc_table = nla_get_u32(attr);
 			break;
+		case RTA_ENCAP:
+			cfg->fc_encap = attr;
+			break;
+		case RTA_ENCAP_TYPE:
+			cfg->fc_encap_type = nla_get_u16(attr);
+			break;
 		}
 	}
 
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 28ec3c1..54dd287 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -42,6 +42,7 @@
 #include
 #include
 #include
+#include
 
 #include "fib_lookup.h"
 
@@ -208,6 +209,10 @@ static void free_fib_info_rcu(struct rcu_head *head)
 	change_nexthops(fi) {
 		if (nexthop_nh->nh_dev)
 			dev_put(nexthop_nh->nh_dev);
+#ifdef CONFIG_LWTUNNEL
+		if (nexthop_nh->nh_lwtstate)
+			lwtunnel_state_put(nexthop_nh->nh_lwtstate);
+#endif
 		free_nh_exceptions(nexthop_nh);
 		rt_fibinfo_free_cpus(nexthop_nh->nh_pcpu_rth_output);
 		rt_fibinfo_free(&nexthop_nh->nh_rth_input);
@@ -366,6 +371,7 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi)
 	payload += nla_total_size((RTAX_MAX * nla_total_size(4)));
 
 	if
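fib_nlmsg_size() must reserve room for the new attributes before a route is dumped. The hunk above is cut off, so how the work is split between lwtunnel_get_encap_size() and its caller is not visible. The sketch below assumes, hypothetically, that the driver reports only the bytes inside RTA_ENCAP and the fib code adds the attribute headers itself. The alignment arithmetic follows the standard netlink rules (4-byte header, payload padded to 4 bytes); the `model_*` names are local copies invented to keep the example standalone.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies of the netlink alignment rules (same values as NLA_ALIGNTO
 * and NLA_HDRLEN in <linux/netlink.h>), renamed for a standalone sketch. */
#define MODEL_NLA_ALIGNTO	4
#define MODEL_NLA_ALIGN(len)	(((len) + MODEL_NLA_ALIGNTO - 1) & ~(size_t)(MODEL_NLA_ALIGNTO - 1))
#define MODEL_NLA_HDRLEN	4	/* struct nlattr: u16 nla_len + u16 nla_type */

/* Same arithmetic as the kernel's nla_total_size(): header + payload, padded. */
static size_t model_nla_total_size(size_t payload)
{
	return MODEL_NLA_ALIGN(MODEL_NLA_HDRLEN + payload);
}

/* Hypothetical per-nexthop encap budget: one RTA_ENCAP_TYPE attribute
 * (u16 payload) plus one RTA_ENCAP nest wrapping encap_payload bytes as
 * reported by the tunnel driver. */
static size_t model_nh_encap_size(size_t encap_payload)
{
	return model_nla_total_size(sizeof(uint16_t)) +
	       model_nla_total_size(encap_payload);
}
```

For example, if an MPLS driver put two 4-byte labels in a single MPLS_IPTUNNEL_DST attribute, its nested payload would be `model_nla_total_size(8)` = 12 bytes, and the nexthop would need 8 (RTA_ENCAP_TYPE) + 16 (RTA_ENCAP) = 24 bytes. Under-counting here would make the later fill step run out of skb room, so the size and fill callbacks have to agree.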
[PATCH net-next RFC v2 1/3] lwt: infrastructure to support light weight tunnels
From: Roopa Prabhu

Provides ops to parse, build and output encapsulated packets for drivers
that want to attach tunnel encap information to routes.

Signed-off-by: Roopa Prabhu
---
 include/linux/lwtunnel.h      |    6 ++
 include/net/lwtunnel.h        |   84 +
 include/uapi/linux/lwtunnel.h |   11 +++
 net/Kconfig                   |    5 ++
 net/core/Makefile             |    1 +
 net/core/lwtunnel.c           |  162 +
 6 files changed, 269 insertions(+)
 create mode 100644 include/linux/lwtunnel.h
 create mode 100644 include/net/lwtunnel.h
 create mode 100644 include/uapi/linux/lwtunnel.h
 create mode 100644 net/core/lwtunnel.c

diff --git a/include/linux/lwtunnel.h b/include/linux/lwtunnel.h
new file mode 100644
index 000..97f32f8
--- /dev/null
+++ b/include/linux/lwtunnel.h
@@ -0,0 +1,6 @@
+#ifndef _LINUX_LWTUNNEL_H_
+#define _LINUX_LWTUNNEL_H_
+
+#include
+
+#endif /* _LINUX_LWTUNNEL_H_ */
diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
new file mode 100644
index 000..649da3c
--- /dev/null
+++ b/include/net/lwtunnel.h
@@ -0,0 +1,84 @@
+#ifndef __NET_LWTUNNEL_H
+#define __NET_LWTUNNEL_H 1
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define LWTUNNEL_HASH_BITS   7
+#define LWTUNNEL_HASH_SIZE   (1 << LWTUNNEL_HASH_BITS)
+
+struct lwtunnel_hdr {
+	int             len;
+	__u8            data[0];
+};
+
+/* lw tunnel state flags */
+#define LWTUNNEL_STATE_OUTPUT_REDIRECT 0x1
+
+#define lwtunnel_output_redirect(lwtstate) (lwtstate && \
+			(lwtstate->flags & LWTUNNEL_STATE_OUTPUT_REDIRECT))
+
+struct lwtunnel_state {
+	__u16		type;
+	__u16		flags;
+	atomic_t	refcnt;
+	struct lwtunnel_hdr tunnel;
+};
+
+struct lwtunnel_net {
+	struct hlist_head tunnels[LWTUNNEL_HASH_SIZE];
+};
+
+struct lwtunnel_encap_ops {
+	int (*build_state)(struct net_device *dev, struct nlattr *encap,
+			   struct lwtunnel_state **ts);
+	int (*output)(struct sock *sk, struct sk_buff *skb);
+	int (*fill_encap)(struct sk_buff *skb,
+			  struct lwtunnel_state *lwtstate);
+	int (*get_encap_size)(struct lwtunnel_state *lwtstate);
+};
+
+#define MAX_LWTUNNEL_ENCAP_OPS 8
+extern const struct lwtunnel_encap_ops __rcu *
+		lwtun_encaps[MAX_LWTUNNEL_ENCAP_OPS];
+
+static inline void lwtunnel_state_get(struct lwtunnel_state *lws)
+{
+	atomic_inc(&lws->refcnt);
+}
+
+static inline void lwtunnel_state_put(struct lwtunnel_state *lws)
+{
+	if (!lws)
+		return;
+
+	if (atomic_dec_and_test(&lws->refcnt))
+		kfree(lws);
+}
+
+static inline struct lwtunnel_state *lwtunnel_skb_lwstate(struct sk_buff *skb)
+{
+	struct rtable *rt = (struct rtable *)skb_dst(skb);
+
+	return rt->rt_lwtstate;
+}
+
+int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops *op,
+			   unsigned int num);
+int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op,
+			   unsigned int num);
+int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
+			 struct nlattr *encap,
+			 struct lwtunnel_state **lws);
+int lwtunnel_fill_encap(struct sk_buff *skb,
+			struct lwtunnel_state *lwtstate);
+int lwtunnel_get_encap_size(struct lwtunnel_state *lwtstate);
+struct lwtunnel_state *lwtunnel_state_alloc(int hdr_len);
+int lwtunnel_output(struct sock *sk, struct sk_buff *skb);
+
+#endif /* __NET_LWTUNNEL_H */
diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
new file mode 100644
index 000..11150c0
--- /dev/null
+++ b/include/uapi/linux/lwtunnel.h
@@ -0,0 +1,11 @@
+#ifndef _UAPI_LWTUNNEL_H_
+#define _UAPI_LWTUNNEL_H_
+
+#include
+
+enum tunnel_encap_types {
+	LWTUNNEL_ENCAP_NONE,
+	LWTUNNEL_ENCAP_MPLS,
+};
+
+#endif /* _UAPI_LWTUNNEL_H_ */
diff --git a/net/Kconfig b/net/Kconfig
index 57a7c5a..e296d6f 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -374,9 +374,14 @@ source "net/caif/Kconfig"
 source "net/ceph/Kconfig"
 source "net/nfc/Kconfig"
 
+config LWTUNNEL
+	bool "Network light weight tunnels"
+	---help---
+	  light weight tunnels
 
 endif # if NET
 
 # Used by archs to tell that they support BPF_JIT
 config HAVE_BPF_JIT
 	bool
+
diff --git a/net/core/Makefile b/net/core/Makefile
index fec0856..086b01f 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -23,3 +23,4 @@ obj-$(CONFIG_NETWORK_PHY_TIMESTAMPING) += timestamping.o
 obj-$(CONFIG_NET_PTP_CLASSIFY) += ptp_classifier.o
 obj-$(CONFIG_CGROUP_NET_PRIO) += netprio_cgroup.o
 obj-$(CONFIG_CGROUP_NET_CLASSID) += netclassid_cgroup.o
+obj-$(CONFIG_LWTUNNEL) += lwtunnel.o
diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
new file mode 100644
index 000..29c7802
--- /dev/null
+++ b/net/core/lwtunnel.c
@@ -0,0 +1,162 @@
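The body of net/core/lwtunnel.c is cut off above, so the following is only a guess at how the ops table declared in include/net/lwtunnel.h could work: a fixed array of MAX_LWTUNNEL_ENCAP_OPS slots indexed by encap type, where registration atomically claims an empty slot and removal only succeeds for the current owner. It is a user-space sketch that uses C11 atomics in place of the kernel's cmpxchg and RCU; every `model_*` name, including the sample MPLS ops, is hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <errno.h>

#define MODEL_MAX_ENCAP_OPS 8   /* mirrors MAX_LWTUNNEL_ENCAP_OPS */

struct model_encap_ops {
	int (*get_encap_size)(const void *state);
};

/* One slot per encap type; static storage starts out all NULL. */
static _Atomic(const struct model_encap_ops *) model_encaps[MODEL_MAX_ENCAP_OPS];

/* Claim slot num only if it is empty, so two drivers cannot own one type. */
static int model_encap_add_ops(const struct model_encap_ops *ops, unsigned int num)
{
	const struct model_encap_ops *expected = NULL;

	if (num >= MODEL_MAX_ENCAP_OPS)
		return -ERANGE;
	return atomic_compare_exchange_strong(&model_encaps[num], &expected, ops) ?
	       0 : -EEXIST;
}

/* Clear slot num only if ops is the registered owner. */
static int model_encap_del_ops(const struct model_encap_ops *ops, unsigned int num)
{
	const struct model_encap_ops *expected = ops;

	if (num >= MODEL_MAX_ENCAP_OPS)
		return -ERANGE;
	return atomic_compare_exchange_strong(&model_encaps[num], &expected, NULL) ?
	       0 : -EINVAL;
}

/* Dispatch to the driver for this encap type; 0 if nothing is registered. */
static int model_get_encap_size(unsigned int type, const void *state)
{
	const struct model_encap_ops *ops;

	if (type >= MODEL_MAX_ENCAP_OPS)
		return 0;
	ops = atomic_load(&model_encaps[type]);
	return (ops && ops->get_encap_size) ? ops->get_encap_size(state) : 0;
}

/* Sample driver: pretends every MPLS encap needs 12 bytes of attributes. */
static int model_mpls_encap_size(const void *state)
{
	(void)state;
	return 12;
}

static const struct model_encap_ops model_mpls_ops = {
	.get_encap_size = model_mpls_encap_size,
};
```

In the kernel, a reader could still be running a driver's callback after the slot is cleared, so unregistering would also need to wait for an RCU grace period before the module goes away; the sketch leaves that out.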
[PATCH net-next RFC v2 0/3] light weight tunnel infrastructure and driver
From: Roopa Prabhu

This series implements infrastructure for light weight tunnels to support
mpls label edge routers (i.e. mpls ip tunnels).

As previously discussed, having netdevices will not scale. Hence this series
introduces new RTA_ENCAP* attributes to attach encap information to routes
(following a suggestion from Eric Biederman).

The first patch introduces an infrastructure to support light weight tunnels
that don't have netdevices. The infrastructure allows tunnel drivers to
register handlers to parse and build tunnel encap data which can be attached
to each route nexthop.

The second patch adds support in the ipv4 fib to carry such light weight
tunnel encap data.

The third patch implements mpls ip tunnels using this light weight tunnel
infrastructure.

Could not think of a better name, so it is 'lwt' for 'light weight tunnels'
for now.

I do have iproute2 patches. I can post them separately if required (they are
currently in my github tree https://github.com/CumulusNetworks/iproute2,
mpls branch).

Signed-off-by: Roopa Prabhu

v2:
   - bug fixes (more testing)
   - feedback from Thomas
   - A flag in lwtunnel state that allows using the chosen output device
     instead of redirecting dst output to the lwt output function.
     - This flag can be set by the tunnel driver at tunnel state build time
   - moved lwtstate pointer from dst_entry to rtable (seemed cleaner looking
     at Thomas's openvswitch patches)
   - moved mpls iptunnel code into a separate file (following Eric's and
     Robert's initial patches)

Roopa Prabhu (3):
  lwt: infrastructure to support light weight tunnels
  ipv4: add support for light weight tunnel encap attributes
  mpls: support for ip mpls tunnels

 include/linux/lwtunnel.h           |    6 ++
 include/linux/mpls_iptunnel.h      |    6 ++
 include/net/ip_fib.h               |    7 +-
 include/net/lwtunnel.h             |   84 +++
 include/net/mpls_iptunnel.h        |   29 +
 include/net/route.h                |    3 +
 include/uapi/linux/lwtunnel.h      |   11 ++
 include/uapi/linux/mpls_iptunnel.h |   26 +
 include/uapi/linux/rtnetlink.h     |    3 +-
 net/Kconfig                        |    5 +
 net/core/Makefile                  |    1 +
 net/core/lwtunnel.c                |  162
 net/ipv4/fib_frontend.c            |    8 ++
 net/ipv4/fib_semantics.c           |   93 +++-
 net/ipv4/route.c                   |   33 +-
 net/mpls/Kconfig                   |    5 +
 net/mpls/Makefile                  |    1 +
 net/mpls/af_mpls.c                 |    9 +-
 net/mpls/internal.h                |    3 +
 net/mpls/mpls_iptunnel.c           |  205
 20 files changed, 692 insertions(+), 8 deletions(-)
 create mode 100644 include/linux/lwtunnel.h
 create mode 100644 include/linux/mpls_iptunnel.h
 create mode 100644 include/net/lwtunnel.h
 create mode 100644 include/net/mpls_iptunnel.h
 create mode 100644 include/uapi/linux/lwtunnel.h
 create mode 100644 include/uapi/linux/mpls_iptunnel.h
 create mode 100644 net/core/lwtunnel.c
 create mode 100644 net/mpls/mpls_iptunnel.c

--
1.7.10.4
RE: [PATCH 00/22] FUJITSU Extended Socket network device driver
Thank you for reviewing.

> As Alex mentioned earlier, I suspect this is more appropriate for drivers/net.
> If David objects, we can consider for platform/drivers/x86.

OK, I'll migrate the code from drivers/platform/x86 to drivers/net
and also incorporate the comments. I'm going to resend it soon.

Sincerely,
Taku Izumi
[PATCH 1/1] ixgbe: use kzalloc for allocating one thing
Use kzalloc rather than kcalloc(1, ...).

The semantic patch that makes this change is as follows:

//
@@
@@

- kcalloc(1,
+ kzalloc(
          ...)
//

It also removes the following checkpatch check:

CHECK: Prefer kzalloc(sizeof(*fwd_adapter)...) over kzalloc(sizeof(struct ixgbe_fwd_adapter)...)

Signed-off-by: Maninder Singh
Reviewed-by: Vaneet Narang
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 3bf2f3c..3f58757 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8134,7 +8134,7 @@ static void *ixgbe_fwd_add(struct net_device *pdev, struct net_device *vdev)
 	    (adapter->num_rx_pools > IXGBE_MAX_MACVLANS))
 		return ERR_PTR(-EBUSY);
 
-	fwd_adapter = kcalloc(1, sizeof(struct ixgbe_fwd_adapter), GFP_KERNEL);
+	fwd_adapter = kzalloc(sizeof(*fwd_adapter), GFP_KERNEL);
 	if (!fwd_adapter)
 		return ERR_PTR(-ENOMEM);
 
--
1.7.1
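Why the change is behavior-preserving can be seen with user-space stand-ins: kcalloc(n, size) is an overflow-checked, zeroed n * size allocation, and kzalloc(size) is a zeroed single allocation. With n == 1 the overflow check can never fire, so the two are equivalent, and `sizeof(*fwd_adapter)` keeps tracking the pointer's type if it ever changes. The `model_*` functions and struct below are hypothetical; GFP flags are dropped because user space has no equivalent.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for kcalloc(): NULL if n * size would overflow,
 * otherwise a zeroed buffer of n * size bytes. */
static void *model_kcalloc(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return calloc(n, size);
}

/* Hypothetical stand-in for kzalloc(): a zeroed buffer of size bytes. */
static void *model_kzalloc(size_t size)
{
	return calloc(1, size);
}

/* Hypothetical stand-in for struct ixgbe_fwd_adapter. */
struct model_fwd_adapter {
	int pool;
	void *netdev;
};
```

Allocating with `model_kzalloc(sizeof(*fwd))` yields a zero-filled object, exactly what `model_kcalloc(1, sizeof(*fwd))` would return.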
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Thu, 18 Jun 2015 21:37:02 -0400
Jeff Layton wrote:

> > Note, the box has been rebooted since I posted my last trace.
> >
>
> Ahh pity. The port has probably changed...if you trace it again maybe
> try to figure out what it's talking to before rebooting the server?

I could probably re-enable the trace again. Would it be best if I put
back the commits and run it with the buggy kernel? I could then run
these commands after the bug happens and/or before the port goes away.

> Oh! I was thinking that you were seeing this extra port on the
> _client_, but now rereading your original mail I see that it's
> appearing up on the NFS server. Is that correct?

Correct, the bug is on the NFS server, not the client. The client is
already up and running, and had the filesystem mounted when the server
rebooted. I take it that this happened when the client tried to
reconnect.

Just let me know what you would like to do. As this is the main
production server on my local network, I would only be able to do this
a few times.

Let me know all the commands and tracing you would like to have. I'll
try it tomorrow (going to bed now).

-- Steve

>
> So, assuming that this is NFSv4.0, then this port is probably bound
> when the server is establishing the callback channel to the client. So
> we may need to look at how those xprts are being created and whether
> there are differences from a standard client xprt.
>
Re: [PATCH next v3] bonding: Display LACP info only to CAP_NET_ADMIN capable user
On Thu, Jun 18, 2015 at 11:30:54AM -0700, Mahesh Bandewar wrote:
> Actor and Partner details can be accessed via proc-fs, sys-fs
> entries or netlink interface. These interfaces are world readable
> at this moment. The earlier patch-series made the LACP communication
> secure to avoid nuisance attack from within the same L2 domain but
> it did not prevent "someone unprivileged" looking at that information
> on host and perform the same act.
>
> This patch essentially avoids spitting those entries if the user
> in question does not have enough privileges.
>
> Signed-off-by: Mahesh Bandewar
> ---
>  drivers/net/bonding/bond_netlink.c |  23 +
>  drivers/net/bonding/bond_procfs.c  | 101 +++--
>  drivers/net/bonding/bond_sysfs.c   |  12 ++---
>  3 files changed, 71 insertions(+), 65 deletions(-)
>
[...]
> diff --git a/drivers/net/bonding/bond_procfs.c b/drivers/net/bonding/bond_procfs.c
> index e7f3047a26df..f514fe5e80a5 100644
> --- a/drivers/net/bonding/bond_procfs.c
> +++ b/drivers/net/bonding/bond_procfs.c
[...]
> @@ -199,33 +202,35 @@ static void bond_info_show_slave(struct seq_file *seq,
>  		seq_printf(seq, "Partner Churned Count: %d\n",
>  			   port->churn_partner_count);
>
> -		seq_puts(seq, "details actor lacp pdu:\n");
> -		seq_printf(seq, "system priority: %d\n",
> -			   port->actor_system_priority);
> -		seq_printf(seq, "system mac address: %pM\n",
> -			   &port->actor_system);
> -		seq_printf(seq, "port key: %d\n",
> -			   port->actor_oper_port_key);
> -		seq_printf(seq, "port priority: %d\n",
> -			   port->actor_port_priority);
> -		seq_printf(seq, "port number: %d\n",
> -			   port->actor_port_number);
> -		seq_printf(seq, "port state: %d\n",
> -			   port->actor_oper_port_state);
> -
> -		seq_puts(seq, "details partner lacp pdu:\n");
> -		seq_printf(seq, "system priority: %d\n",
> -			   port->partner_oper.system_priority);
> -		seq_printf(seq, "system mac address: %pM\n",
> -			   &port->partner_oper.system);
> -		seq_printf(seq, "oper key: %d\n",
> -			   port->partner_oper.key);
> -		seq_printf(seq, "port priority: %d\n",
> -			   port->partner_oper.port_priority);
> -		seq_printf(seq, "port number: %d\n",
> -			   port->partner_oper.port_number);
> -		seq_printf(seq, "port state: %d\n",
> -			   port->partner_oper.port_state);
> +		if (capable(CAP_NET_ADMIN)) {
> +			seq_puts(seq, "details actor lacp pdu:\n");
> +			seq_printf(seq, "system priority: %d\n",
> +				   port->actor_system_priority);
> +			seq_printf(seq, "system mac address: %pM\n",
> +				   &port->actor_system);
> +			seq_printf(seq, "port key: %d\n",
> +				   port->actor_oper_port_key);
> +			seq_printf(seq, "port priority: %d\n",
> +				   port->actor_port_priority);
> +			seq_printf(seq, "port number: %d\n",
> +				   port->actor_port_number);
> +			seq_printf(seq, "port state: %d\n",
> +				   port->actor_oper_port_state);
> +
> +			seq_puts(seq, "details partner lacp pdu:\n");
> +			seq_printf(seq, "system priority: %d\n",
> +				   port->partner_oper.system_priority);
> +			seq_printf(seq, "system mac address: %pM\n",
> +				   &port->partner_oper.system);
> +			seq_printf(seq, "oper key: %d\n",
> +				   port->partner_oper.key);
> +			seq_printf(seq, "port priority: %d\n",
> +				   port->partner_oper.port_priority);
> +			seq_printf(seq, "port number: %d\n",
> +				   port->partner_oper.port_number);
> +			seq_printf(seq, "port state: %d\n",
> +				   port->partner_oper.port_state);
> +
[PATCH v2] fm10k: Report MAC address on driver load
This change adds the MAC address to the list of values recorded on driver
load. The MAC address represents the serial number of the unit and allows
us to track the value should a card be replaced in a system.

The log message should now be similar in output to that of ixgbe.

Signed-off-by: Alexander Duyck
---

v2: Moved printing of MAC onto separate line similar to ixgbe.

(Hopefully this works for you Jeff. I took a look at the patch and just
 moved the bit I needed down. I figured since this block hasn't changed I
 should be able to get away with just doing this instead of pulling and
 rebasing off of your tree.)

 drivers/net/ethernet/intel/fm10k/fm10k_pci.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index ce53ff25f88d..62a584f633d8 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -1843,6 +1843,9 @@ static int fm10k_probe(struct pci_dev *pdev,
 	/* print warning for non-optimal configurations */
 	fm10k_slot_warn(interface);
 
+	/* report MAC address for logging */
+	dev_info(&pdev->dev, "%pM\n", netdev->dev_addr);
+
 	/* enable SR-IOV after registering netdev to enforce PF/VF ordering */
 	fm10k_iov_configure(pdev, 0);
 
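The new dev_info() line relies on printk's `%pM` extension, which prints a 6-byte MAC address as lowercase hex octets separated by colons. User space has no `%pM`, so the sketch below, a hypothetical `model_format_mac()` helper, reproduces the same output with plain `snprintf()` to show what ends up in the log.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical user-space stand-in for printk's %pM: six bytes printed as
 * lowercase, colon-separated hex. buf must hold 18 bytes (17 chars + NUL).
 * Returns the number of characters written, excluding the NUL. */
static int model_format_mac(char buf[18], const uint8_t mac[6])
{
	return snprintf(buf, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
			mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}
```

For the address bytes `02 1b 21 aa bb cc`, the helper writes `02:1b:21:aa:bb:cc`, matching what the patched probe path would print after the device name prefix that dev_info() adds.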
Re: [Intel-wired-lan] [PATCH] fm10k: Report MAC address on driver load
On 06/18/2015 04:49 PM, Jeff Kirsher wrote:
> On Wed, 2015-06-17 at 20:12 -0700, Alexander Duyck wrote:
>> This change adds the MAC address to the list of values recorded on
>> driver load. The MAC address represents the serial number of the unit
>> and allows us to track the value should a card be replaced in a
>> system.
>>
>> Signed-off-by: Alexander Duyck
>> ---
>>  drivers/net/ethernet/intel/fm10k/fm10k_pci.c |    4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
> With the recent fm10k patches that Jake submitted, this patch no longer
> applies cleanly. If you could re-spin your patch against my next-queue
> tree (dev-queue branch) that would be much appreciated.

I should have a new patch for you in 20 minutes or so. Just waiting on
the build to finish and then I'll give it a quick test.

- Alex
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Thu, 18 Jun 2015 21:08:43 -0400
Steven Rostedt wrote:

> On Thu, 18 Jun 2015 18:50:51 -0400
> Jeff Layton wrote:
>
> > The interesting bit here is that the sockets all seem to connect to port
> > 55201 on the remote host, if I'm reading these traces correctly. What's
> > listening on that port on the server?
> >
> > This might give some helpful info:
> >
> > $ rpcinfo -p
>
> # rpcinfo -p wife
>    program vers proto   port  service
>     100000    4   tcp    111  portmapper
>     100000    3   tcp    111  portmapper
>     100000    2   tcp    111  portmapper
>     100000    4   udp    111  portmapper
>     100000    3   udp    111  portmapper
>     100000    2   udp    111  portmapper
>     100024    1   udp  34243  status
>     100024    1   tcp  34498  status
>
> # rpcinfo -p localhost
>    program vers proto   port  service
>     100000    4   tcp    111  portmapper
>     100000    3   tcp    111  portmapper
>     100000    2   tcp    111  portmapper
>     100000    4   udp    111  portmapper
>     100000    3   udp    111  portmapper
>     100000    2   udp    111  portmapper
>     100024    1   udp  38332  status
>     100024    1   tcp  52684  status
>     100003    2   tcp   2049  nfs
>     100003    3   tcp   2049  nfs
>     100003    4   tcp   2049  nfs
>     100227    2   tcp   2049
>     100227    3   tcp   2049
>     100003    2   udp   2049  nfs
>     100003    3   udp   2049  nfs
>     100003    4   udp   2049  nfs
>     100227    2   udp   2049
>     100227    3   udp   2049
>     100021    1   udp  53218  nlockmgr
>     100021    3   udp  53218  nlockmgr
>     100021    4   udp  53218  nlockmgr
>     100021    1   tcp  49825  nlockmgr
>     100021    3   tcp  49825  nlockmgr
>     100021    4   tcp  49825  nlockmgr
>     100005    1   udp  49166  mountd
>     100005    1   tcp  48797  mountd
>     100005    2   udp  47856  mountd
>     100005    2   tcp  53839  mountd
>     100005    3   udp  36090  mountd
>     100005    3   tcp  46390  mountd
>
> Note, the box has been rebooted since I posted my last trace.
>

Ahh pity. The port has probably changed...if you trace it again maybe
try to figure out what it's talking to before rebooting the server?

> >
> > Also, what NFS version are you using to mount here? Your fstab entries
> > suggest that you're using the default version (for whatever distro this
> > is), but have you (e.g.) set up nfsmount.conf to default to v3 on this
> > box?
> >
>
> My box is Debian testing (recently updated).
>
> # dpkg -l nfs-*
>
> ii  nfs-common      1:1.2.8-9    amd64    NFS support files common to clien
> ii  nfs-kernel-ser  1:1.2.8-9    amd64    support for NFS kernel server
>
>
> same for both boxes.
>
> nfsmount.conf doesn't exist on either box.
>
> I'm assuming it is using nfs4.
>

(cc'ing Bruce)

Oh! I was thinking that you were seeing this extra port on the
_client_, but now rereading your original mail I see that it's
appearing up on the NFS server. Is that correct?

So, assuming that this is NFSv4.0, then this port is probably bound
when the server is establishing the callback channel to the client. So
we may need to look at how those xprts are being created and whether
there are differences from a standard client xprt.

--
Jeff Layton
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Thu, 18 Jun 2015 18:50:51 -0400
Jeff Layton wrote:

> The interesting bit here is that the sockets all seem to connect to port
> 55201 on the remote host, if I'm reading these traces correctly. What's
> listening on that port on the server?
>
> This might give some helpful info:
>
> $ rpcinfo -p

# rpcinfo -p wife
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  34243  status
    100024    1   tcp  34498  status

# rpcinfo -p localhost
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  38332  status
    100024    1   tcp  52684  status
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100227    2   tcp   2049
    100227    3   tcp   2049
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100227    2   udp   2049
    100227    3   udp   2049
    100021    1   udp  53218  nlockmgr
    100021    3   udp  53218  nlockmgr
    100021    4   udp  53218  nlockmgr
    100021    1   tcp  49825  nlockmgr
    100021    3   tcp  49825  nlockmgr
    100021    4   tcp  49825  nlockmgr
    100005    1   udp  49166  mountd
    100005    1   tcp  48797  mountd
    100005    2   udp  47856  mountd
    100005    2   tcp  53839  mountd
    100005    3   udp  36090  mountd
    100005    3   tcp  46390  mountd

Note, the box has been rebooted since I posted my last trace.

>
> Also, what NFS version are you using to mount here? Your fstab entries
> suggest that you're using the default version (for whatever distro this
> is), but have you (e.g.) set up nfsmount.conf to default to v3 on this
> box?
>

My box is Debian testing (recently updated).

# dpkg -l nfs-*

ii  nfs-common      1:1.2.8-9  amd64  NFS support files common to clien
ii  nfs-kernel-ser  1:1.2.8-9  amd64  support for NFS kernel server


same for both boxes.

nfsmount.conf doesn't exist on either box.

I'm assuming it is using nfs4.

Anything else I can provide?
-- 
Steve
Re: [PATCH v2] bpf: fix a bug in verification logic when SUB operation taken on FRAME_PTR
On 2015/6/19 0:00, Alexei Starovoitov wrote:
> On Thu, Jun 18, 2015 at 08:31:45AM +0000, Wang Nan wrote:
>> The original code has a problem that causes the following code to fail
>> the verifier:
>>
>>     r1 <- r10
>>     r1 -= 8
>>     r2 = 8
>>     r3 = unsafe pointer
>>     call BPF_FUNC_probe_read   <-- R1 type=inv expected=fp
>>
>> However, by replacing 'r1 -= 8' with 'r1 += -8', the above program can be
>> loaded successfully.
>>
>> This is because the verifier allows only a BPF_ADD instruction on a
>> FRAME_PTR register to forge a PTR_TO_STACK register, but turns BPF_SUB
>> on a FRAME_PTR register into an UNKNOWN_VALUE register.
>>
>> This patch fixes it by adding BPF_SUB to the stack_relative check.
>
> It's not a bug. It's catching ADD only by design.
> If we let it recognize SUB then one might argue we should let it
> recognize multiply, shifts and all other arithmetic on pointers.
> The verifier will keep getting bigger and bigger. Where do we stop?
> llvm only emits canonical ADD. If you've seen llvm doing SUB, let's fix
> it there. So what piece generated this 'r1 -= 8'?

I hit this problem when writing the code of an automatic parameter generator. The instruction was generated by my own code. Now I have corrected my code.

Thank you.
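A toy model may help show why only the ADD form passes. This is a minimal sketch, not the kernel verifier: the enum, struct and function names are hypothetical, and it only models an immediate ADD or SUB applied to a copy of the frame pointer. `toy_canonicalize()` illustrates the kind of generator-side rewrite Alexei points to (emitting the canonical ADD form, `r1 += -8`, instead of `r1 -= 8`).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register states, named after those in the discussion. */
enum reg_type { UNKNOWN_VALUE, FRAME_PTR, PTR_TO_STACK };

enum alu_op { ALU_ADD, ALU_SUB };

/* Hypothetical immediate ALU instruction: dst op= imm. */
struct alu_imm {
	enum alu_op op;
	int32_t imm;
};

/* Toy rule: only an ADD of an immediate on a frame-pointer copy yields a
 * stack pointer; anything else (including SUB) degrades to unknown. */
static enum reg_type toy_verify(enum reg_type src, struct alu_imm insn)
{
	if (src == FRAME_PTR && insn.op == ALU_ADD)
		return PTR_TO_STACK;
	return UNKNOWN_VALUE;
}

/* Generator-side rewrite: turn "dst -= imm" into "dst += -imm".
 * Negating INT32_MIN would overflow, so that case is left untouched. */
static struct alu_imm toy_canonicalize(struct alu_imm insn)
{
	if (insn.op == ALU_SUB && insn.imm != INT32_MIN) {
		insn.op = ALU_ADD;
		insn.imm = -insn.imm;
	}
	return insn;
}
```

Keeping the rewrite in the code generator, rather than teaching the checker about SUB, matches the design argument in the thread: the checker stays small and only has to recognize one canonical form.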
Re: [PATCH] fm10k: Report MAC address on driver load
On Wed, 2015-06-17 at 20:12 -0700, Alexander Duyck wrote:
> This change adds the MAC address to the list of values recorded on
> driver load. The MAC address represents the serial number of the unit
> and allows us to track the value should a card be replaced in a system.
>
> Signed-off-by: Alexander Duyck
> ---
>  drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

With the recent fm10k patches that Jake submitted, this patch no longer applies cleanly. If you could re-spin your patch against my next-queue tree (dev-queue branch) that would be much appreciated.
Re: [PATCH net-next 2/3] ipv4: L3 and L4 hash-based multipath routing
On 06/17/2015 01:08 PM, Peter Nørlund wrote: This patch adds L3 and L4 hash-based multipath routing, selectable on a per-route basis with the reintroduced RTA_MP_ALGO attribute. The default is now RT_MP_ALG_L3_HASH. Signed-off-by: Peter Nørlund --- include/net/ip_fib.h | 4 ++- include/net/route.h| 5 ++-- include/uapi/linux/rtnetlink.h | 14 ++- net/ipv4/fib_frontend.c| 4 +++ net/ipv4/fib_semantics.c | 34 ++--- net/ipv4/icmp.c| 4 +-- net/ipv4/route.c | 56 +++--- net/ipv4/xfrm4_policy.c| 2 +- 8 files changed, 103 insertions(+), 20 deletions(-) diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h index 4be4f25..250d98e 100644 --- a/include/net/ip_fib.h +++ b/include/net/ip_fib.h @@ -37,6 +37,7 @@ struct fib_config { u32 fc_flags; u32 fc_priority; __be32 fc_prefsrc; + int fc_mp_alg; struct nlattr *fc_mx; struct rtnexthop*fc_mp; int fc_mx_len; @@ -116,6 +117,7 @@ struct fib_info { int fib_nhs; #ifdef CONFIG_IP_ROUTE_MULTIPATH int fib_mp_weight; + int fib_mp_alg; #endif struct rcu_head rcu; struct fib_nh fib_nh[0]; @@ -308,7 +310,7 @@ int ip_fib_check_default(__be32 gw, struct net_device *dev); int fib_sync_down_dev(struct net_device *dev, int force); int fib_sync_down_addr(struct net *net, __be32 local); int fib_sync_up(struct net_device *dev); -void fib_select_multipath(struct fib_result *res); +void fib_select_multipath(struct fib_result *res, const struct flowi4 *flow); /* Exported by fib_trie.c */ void fib_trie_init(void); diff --git a/include/net/route.h b/include/net/route.h index fe22d03..1fc7deb 100644 --- a/include/net/route.h +++ b/include/net/route.h @@ -110,7 +110,8 @@ struct in_device; int ip_rt_init(void); void rt_cache_flush(struct net *net); void rt_flush_dev(struct net_device *dev); -struct rtable *__ip_route_output_key(struct net *, struct flowi4 *flp); +struct rtable *__ip_route_output_key(struct net *, struct flowi4 *flp, +const struct flowi4 *mp_flow); struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp, struct sock *sk); struct 
dst_entry *ipv4_blackhole_route(struct net *net, @@ -267,7 +268,7 @@ static inline struct rtable *ip_route_connect(struct flowi4 *fl4, sport, dport, sk); if (!dst || !src) { - rt = __ip_route_output_key(net, fl4); + rt = __ip_route_output_key(net, fl4, NULL); if (IS_ERR(rt)) return rt; ip_rt_put(rt); diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h index 17fb02f..dff4a72 100644 --- a/include/uapi/linux/rtnetlink.h +++ b/include/uapi/linux/rtnetlink.h @@ -271,6 +271,18 @@ enum rt_scope_t { #define RTM_F_EQUALIZE0x400 /* Multipath equalizer: NI */ #define RTM_F_PREFIX 0x800 /* Prefix addresses */ +/* Multipath algorithms */ + +enum rt_mp_alg_t { + RT_MP_ALG_L3_HASH, /* Was IP_MP_ALG_NONE */ + RT_MP_ALG_PER_PACKET, /* Was IP_MP_ALG_RR */ + RT_MP_ALG_DRR, /* not used */ + RT_MP_ALG_RANDOM, /* not used */ + RT_MP_ALG_WRANDOM, /* not used */ + RT_MP_ALG_L4_HASH, + __RT_MP_ALG_MAX +}; + /* Reserved table identifiers */ enum rt_class_t { @@ -301,7 +313,7 @@ enum rtattr_type_t { RTA_FLOW, RTA_CACHEINFO, RTA_SESSION, /* no longer used */ - RTA_MP_ALGO, /* no longer used */ + RTA_MP_ALGO, RTA_TABLE, RTA_MARK, RTA_MFC_STATS, diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index 872494e..376e8c1 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -590,6 +590,7 @@ const struct nla_policy rtm_ipv4_policy[RTA_MAX + 1] = { [RTA_PREFSRC] = { .type = NLA_U32 }, [RTA_METRICS] = { .type = NLA_NESTED }, [RTA_MULTIPATH] = { .len = sizeof(struct rtnexthop) }, + [RTA_MP_ALGO] = { .type = NLA_U32 }, [RTA_FLOW] = { .type = NLA_U32 }, }; @@ -650,6 +651,9 @@ static int rtm_to_fib_config(struct net *net, struct sk_buff *skb, cfg->fc_mp = nla_data(attr); cfg->fc_mp_len = nla_len(attr); break; + case RTA_MP_ALGO: + cfg->fc_mp_alg = nla_get_u32(attr); + break; case RTA_FLOW: cfg->fc_flow = nla_get_u32(attr);
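On the userspace side, the reintroduced RTA_MP_ALGO attribute would travel as a 32-bit value behind a `struct rtattr` header, which is what `nla_get_u32()` reads back in `rtm_to_fib_config()`. A minimal sketch, assuming the uapi `<linux/rtnetlink.h>` header is available; the value 5 for `RT_MP_ALG_L4_HASH` is taken from the proposed enum (it is not in released headers), and `put_u32_attr()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/rtnetlink.h>

/* Value of RT_MP_ALG_L4_HASH in the proposed enum rt_mp_alg_t. */
#define PROPOSED_RT_MP_ALG_L4_HASH 5

/* Hypothetical helper: write a u32 route attribute at buf + off and
 * return the next RTA_ALIGN()ed offset. buf must be 4-byte aligned. */
static size_t put_u32_attr(unsigned char *buf, size_t cap, size_t off,
			   unsigned short type, uint32_t value)
{
	struct rtattr *rta = (struct rtattr *)(buf + off);
	size_t len = RTA_LENGTH(sizeof(value));   /* header + payload */

	assert(off + RTA_ALIGN(len) <= cap);
	rta->rta_type = type;
	rta->rta_len = len;
	memcpy(RTA_DATA(rta), &value, sizeof(value));
	return off + RTA_ALIGN(len);
}
```

The attribute occupies 8 bytes: a 4-byte header followed by the 4-byte algorithm value.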
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Thu, 18 Jun 2015 15:49:14 -0400 Steven Rostedt wrote: > On Thu, 18 Jun 2015 15:24:52 -0400 > Trond Myklebust wrote: > > > On Wed, Jun 17, 2015 at 11:08 PM, Steven Rostedt > > wrote: > > > On Fri, 12 Jun 2015 11:50:38 -0400 > > > Steven Rostedt wrote: > > > > > >> I reverted the following commits: > > >> > > >> c627d31ba0696cbd829437af2be2f2dee3546b1e > > >> 9e2b9f37760e129cee053cc7b6e7288acc2a7134 > > >> caf4ccd4e88cf2795c927834bc488c8321437586 > > >> > > >> And the issue goes away. That is, I watched the port go from > > >> ESTABLISHED to TIME_WAIT, and then gone, and theirs no hidden port. > > >> > > >> In fact, I watched the port with my portlist.c module, and it > > >> disappeared there too when it entered the TIME_WAIT state. > > >> > > > > I've scanned those commits again and again, and I'm not seeing how we > > could be introducing a socket leak there. The only suspect I can see > > would be the NFS swap bugs that Jeff fixed a few weeks ago. Are you > > using NFS swap? > > Not that I'm aware of. > > > > > > I've been running v4.0.5 with the above commits reverted for 5 days > > > now, and there's still no hidden port appearing. > > > > > > What's the status on this? Should those commits be reverted or is there > > > another solution to this bug? > > > > > > > I'm trying to reproduce, but I've had no luck yet. > > It seems to happen with the connection to my wife's machine, and that > is where my wife's box connects two directories via nfs: > > This is what's in my wife's /etc/fstab directory > > goliath:/home/upload /upload nfs auto,rw,intr,soft 0 0 > goliath:/home/gallery/gallerynfs auto,ro,intr,soft 0 0 > > And here's what's in my /etc/exports directory > > /home/upload wife(no_root_squash,no_all_squash,rw,sync,no_subtree_check) > /home/gallery 192.168.23.0/24(ro,sync,no_subtree_check) > > Attached is my config. > The interesting bit here is that the sockets all seem to connect to port 55201 on the remote host, if I'm reading these traces correctly. 
What's listening on that port on the server?

This might give some helpful info:

    $ rpcinfo -p

Also, what NFS version are you using to mount here? Your fstab entries suggest that you're using the default version (for whatever distro this is), but have you (e.g.) set up nfsmount.conf to default to v3 on this box?

-- 
Jeff Layton
Re: [PATCH] fm10k: Report MAC address on driver load
On Wed, 2015-06-17 at 20:12 -0700, Alexander Duyck wrote:
> This change adds the MAC address to the list of values recorded on
> driver load. The MAC address represents the serial number of the unit
> and allows us to track the value should a card be replaced in a system.
>
> Signed-off-by: Alexander Duyck
> ---
>  drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Thanks Alex, I will get this added to my queue.
[PATCH] NET: ROSE: Don't dereference NULL neighbour pointer.
A ROSE socket doesn't necessarily always have a neighbour pointer so check if the neighbour pointer is valid before dereferencing it.

Signed-off-by: Ralf Baechle
Tested-by: Bernard Pidoux
Cc: sta...@vger.kernel.org #2.6.11+

diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index 8ae6030..dd304bc 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -192,7 +192,8 @@ static void rose_kill_by_device(struct net_device *dev)
 
 		if (rose->device == dev) {
 			rose_disconnect(s, ENETUNREACH, ROSE_OUT_OF_ORDER, 0);
-			rose->neighbour->use--;
+			if (rose->neighbour)
+				rose->neighbour->use--;
 			rose->device = NULL;
 		}
 	}
Re: e1000e driver - hang after 4 hours of uptime - finally bisected!
On Thu, 2015-06-18 at 12:46 -0400, Valdis Kletnieks wrote: > (follow up to a report from last week - bisecting took a while as I could > only do 1 or 2 tests an evening) > > My Dell Latitude E6530 crashes with a specific kernel lockup almost > exactly 4 hours after boot if there isn't a cable connected to the > Ethernet port: > > [14508.846327] Kernel panic - not syncing: Watchdog detected hard LOCKUP on > cpu 0 > [14468.229720] Kernel panic - not syncing: Watchdog detected hard LOCKUP on > cpu 0 > [14463.254791] Kernel panic - not syncing: Watchdog detected hard LOCKUP on > cpu 0 > [14491.134413] Kernel panic - not syncing: Watchdog detected hard LOCKUP on > cpu 1 > [14463.396593] Kernel panic - not syncing: Watchdog detected hard LOCKUP on > cpu 2 > [14490.390223] Kernel panic - not syncing: Watchdog detected hard LOCKUP on > cpu 1 > [14494.680591] Kernel panic - not syncing: Watchdog detected hard LOCKUP on > cpu 0 > [14513.365378] Kernel panic - not syncing: Watchdog detected hard LOCKUP on > cpu 1 > [14482.271716] Kernel panic - not syncing: Watchdog detected hard LOCKUP on > cpu 3 > [14479.906820] Kernel panic - not syncing: Watchdog detected hard LOCKUP on > cpu 0 > > As far as I can tell, the timestamp jitter is just how long it takes me to > enter the cryptLUKS passphrase for the hard drive at boot... 
> > lspci tells me: > > lspci -vvv -s "00:19.0" > 00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network > Connection (rev 04) > DeviceName: Onboard LAN > Subsystem: Dell Device 0535 > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- > Stepping- SERR- FastB2B- DisINTx+ > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- > SERR- Latency: 0 > Interrupt: pin A routed to IRQ 28 > Region 0: Memory at f770 (32-bit, non-prefetchable) [size=128K] > Region 1: Memory at f7739000 (32-bit, non-prefetchable) [size=4K] > Region 2: I/O ports at f040 [size=32] > Capabilities: [c8] Power Management version 2 > Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA > PME(D0+,D1-,D2-,D3hot+,D3cold+) > Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME- > Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+ > Address: fee00318 Data: > Capabilities: [e0] PCI Advanced Features > AFCap: TP+ FLR+ > AFCtrl: FLR- > AFStatus: TP- > Kernel driver in use: e1000e > > > The traceback always looks like: > > [14479.906820] Kernel panic - not syncing: Watchdog detected hard LOCKUP on > cpu 0 > > [14479.906908] Call Trace: > [14479.906914][] dump_stack+0x50/0xa8 > [14479.906930] [] panic+0xcd/0x1e4 > [14479.906940] [] ? perf_event_task_disable+0xc0/0xc0 > [14479.906952] [] watchdog_overflow_callback+0x9b/0xa0 > [14479.906959] [] __perf_event_overflow+0xc4/0x1f0 > [14479.906968] [] perf_event_overflow+0x14/0x20 > [14479.906976] [] intel_pmu_handle_irq+0x1e1/0x430 > [14479.906990] [] perf_event_nmi_handler+0x26/0x40 > [14479.906999] [] nmi_handle+0x103/0x340 > [14479.907005] [] ? nmi_handle+0x5/0x340 > [14479.907017] [] default_do_nmi+0xc3/0x120 > [14479.907032] [] do_nmi+0xe8/0x130 > [14479.907044] [] end_repeat_nmi+0x1e/0x2e > [14479.907055] [] ? e1000e_cyclecounter_read+0x16/0xc0 > [14479.907061] [] ? e1000e_cyclecounter_read+0x16/0xc0 > [14479.907069] [] ? 
e1000e_cyclecounter_read+0x16/0xc0 > [14479.907075] <> [] timecounter_read+0x19/0x60 > [14479.907088] [] e1000e_phc_gettime+0x2e/0x60 > [14479.907098] [] e1000e_systim_overflow_work+0x31/0x70 > [14479.907105] [] process_one_work+0x3c9/0x980 > [14479.907115] [] ? process_one_work+0x312/0x980 > [14479.907125] [] ? worker_thread+0x78/0x760 > [14479.907134] [] worker_thread+0x2cc/0x760 > [14479.907144] [] ? process_one_work+0x980/0x980 > [14479.907154] [] kthread+0xfe/0x120 > [14479.907163] [] ? finish_task_switch+0x50/0x1c0 > [14479.907173] [] ? kthread_create_on_node+0x270/0x270 > [14479.907179] [] ret_from_fork+0x3f/0x70 > [14479.907188] [] ? kthread_create_on_node+0x270/0x270 > [14479.907243] Kernel Offset: 0x3900 from 0x8100 (relocation range: 0x8000-0xbfff) > > Bisection tells me it's this commit: > > commit 83129b37ef35bb6a7f01c060129736a8db5d31c4 > Author: Yanir Lubetkin > Date: Tue Jun 2 17:05:45 2015 +0300 > > e1000e: fix systim issues > > Two issues involving systim were reported. > 1. Clock is not running in the correct frequency > 2. In some situations, systim values were not incremented linearly > This patch fixes the hardware clock configuration and the spurious > non-linear increment.

Thanks Valdis! I will have Yanir look into it and hopefully we should have a fix here soon for you to verify.
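The call trace shows e1000e_systim_overflow_work() reaching e1000e_cyclecounter_read() through e1000e_phc_gettime() and timecounter_read(). As background, the sketch below models the accumulation step a timecounter performs: read the raw counter, take the masked difference from the previous reading, scale it to nanoseconds, and add it to a running total. It is only loosely modelled on the kernel's cyclecounter/timecounter helpers: the names are hypothetical, the fractional-nanosecond bookkeeping of the real code is omitted, and it says nothing about what causes the lockup. The fake counter source exists only so the sketch can be exercised.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a free-running hardware counter of limited width. */
struct toy_cyclecounter {
	uint64_t (*read)(void *ctx);
	void *ctx;
	uint64_t mask;   /* (1 << counter width) - 1 */
	uint32_t mult;   /* ns = (cycles * mult) >> shift */
	uint32_t shift;
};

struct toy_timecounter {
	const struct toy_cyclecounter *cc;
	uint64_t cycle_last;  /* raw counter value at the previous read */
	uint64_t nsec;        /* accumulated nanoseconds */
};

/* Fake counter source for experiments: returns *(uint64_t *)ctx. */
static uint64_t toy_fake_read(void *ctx)
{
	return *(const uint64_t *)ctx;
}

static void toy_timecounter_init(struct toy_timecounter *tc,
				 const struct toy_cyclecounter *cc,
				 uint64_t start_ns)
{
	tc->cc = cc;
	tc->cycle_last = cc->read(cc->ctx);
	tc->nsec = start_ns;
}

/* Add the time elapsed since the previous read. The masked subtraction
 * copes with at most one wrap of the counter between reads; a longer gap
 * silently loses whole counter periods. */
static uint64_t toy_timecounter_read(struct toy_timecounter *tc)
{
	const struct toy_cyclecounter *cc = tc->cc;
	uint64_t now = cc->read(cc->ctx);
	uint64_t delta = (now - tc->cycle_last) & cc->mask;

	tc->cycle_last = now;
	tc->nsec += (delta * cc->mult) >> cc->shift;
	return tc->nsec;
}
```

With a 16-bit counter that wraps from 0xFFF0 to 0x0010, the masked delta is 0x20 cycles. If exactly one more full period passes before the next read, the delta comes out as zero, which is why some periodic work has to call the read path at least once per counter period.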
Re: [PATCH 00/22] FUJITSU Extended Socket network device driver
On Thu, Jun 18, 2015 at 09:45:59AM +0900, Taku Izumi wrote:
> This patchsets adds FUJITSU Extended Socket network device driver.
> Extended Socket network device is a shared memory based high-speed network
> interface between Extended Partitions of PRIMEQUEST 2000 E2 series.
>
> You can get some information about Extended Partition and Extended
> Socket by referring the following manual.
>
> http://globalsp.ts.fujitsu.com/dmsp/Publications/public/CA92344-0537.pdf
>   3.2.1 Extended Partitioning
>   3.2.2 Extended Socket
>

As Alex mentioned earlier, I suspect this is more appropriate for drivers/net. If David objects, we can consider it for platform/drivers/x86.

-- 
Darren Hart
Intel Open Source Technology Center
[GIT] [4.2] 2nd NFC update
Hi David,

This is a follow up fix for a typo that I introduced while cleaning the 1st 4.2 NFC pull request patches.

The following changes since commit d0dcad8bd32a34aa85bcbd5d2033658cb3964377:

  NFC: nfcmrvl: set PB_BAIL_OUT at setup (2015-06-13 00:08:55 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next.git tags/nfc-next-4.2-2

for you to fetch changes up to fb77ff4f43990dc91926ce2704036a547482544e:

  NFC: nci: fix mistake in uart generic driver (2015-06-15 18:10:37 +0200)

Vincent Cuissard (1):
      NFC: nci: fix mistake in uart generic driver

 net/nfc/nci/uart.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Re: [PATCH] net: fix search limit handling in skb_find_text()
On Tue, Jun 16, 2015 at 03:13:41PM +0300, Roman Khimov wrote:
> On 16 June 2015 12:48:41, Pablo Neira Ayuso wrote:
[...]
> > But if we change the existing behaviour, users may be relying on it
> > and we'll get things broken for them. Someone else will come later on
> > with another patch to say: "hey, --to used to be inclusive but this is
> > not the case anymore and it's breaking my setup".
>
> I do understand your concerns, but fixing it this way would require changing
> skb_seq_read() and basically would propagate "'to' offset included" semantics
> (which seems a bit strange for programmers, IMO) further. And initially I
> thought that changing skb_seq_read() would be more intrusive, although looking
> at all this now it looks like the only real user of upper_offset field in
> ts_config struct is skb_find_text(), because other invocations of
> skb_seq_read() from drivers/scsi/libiscsi_tcp.c and net/batman-adv/main.c use
> skb->len as an upper limit.
>
> > > em_text_match() in net/sched/em_text.c is also suspicious.
> >
> > Please, elaborate.
>
> The way it constructs 'to' offset, I think it doesn't expect something to
> match at 'to'. Although I might be wrong here.

Could you send a patch that resolves the inconsistency for programmers while leaving the userspace-exposed behaviour through xt_string and em_string intact?

Thanks.
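The disagreement comes down to whether `to` is an inclusive or an exclusive bound. The sketch below is a generic illustration of how the two conventions differ for a naive search over a flat buffer; it is not the kernel's skb_find_text() or textsearch code, it assumes the whole match must lie inside the window, and `find_in_window()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the offset of the first occurrence of pat lying entirely inside
 * the search window, or -1. With inclusive != 0 the window is [from, to];
 * otherwise it is the half-open range [from, to). */
static long find_in_window(const char *buf, size_t len, const char *pat,
			   size_t from, size_t to, int inclusive)
{
	size_t plen = strlen(pat);
	/* One past the last byte a match may use; avoids computing to + 1
	 * when to is already past the end of the buffer. */
	size_t end = (inclusive && to < len) ? to + 1 : to;

	assert(plen > 0);
	if (end > len)
		end = len;
	for (size_t i = from; i + plen <= end; i++)
		if (memcmp(buf + i, pat, plen) == 0)
			return (long)i;
	return -1;
}
```

For "xxxxabc" with to = 6, the inclusive window finds "abc" at offset 4 while the half-open window misses it: the same one-byte difference that would be visible to users if the boundary convention changed underneath an existing interface.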
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Thu, 18 Jun 2015 15:24:52 -0400
Trond Myklebust wrote:

> On Wed, Jun 17, 2015 at 11:08 PM, Steven Rostedt wrote:
> > On Fri, 12 Jun 2015 11:50:38 -0400
> > Steven Rostedt wrote:
> >
> >> I reverted the following commits:
> >>
> >> c627d31ba0696cbd829437af2be2f2dee3546b1e
> >> 9e2b9f37760e129cee053cc7b6e7288acc2a7134
> >> caf4ccd4e88cf2795c927834bc488c8321437586
> >>
> >> And the issue goes away. That is, I watched the port go from
> >> ESTABLISHED to TIME_WAIT, and then gone, and theirs no hidden port.
> >>
> >> In fact, I watched the port with my portlist.c module, and it
> >> disappeared there too when it entered the TIME_WAIT state.
> >>
>
> I've scanned those commits again and again, and I'm not seeing how we
> could be introducing a socket leak there. The only suspect I can see
> would be the NFS swap bugs that Jeff fixed a few weeks ago. Are you
> using NFS swap?

Not that I'm aware of.

>
> > I've been running v4.0.5 with the above commits reverted for 5 days
> > now, and there's still no hidden port appearing.
> >
> > What's the status on this? Should those commits be reverted or is there
> > another solution to this bug?
> >
>
> I'm trying to reproduce, but I've had no luck yet.

It seems to happen with the connection to my wife's machine, and that is where my wife's box connects two directories via nfs:

This is what's in my wife's /etc/fstab directory

goliath:/home/upload  /upload  nfs  auto,rw,intr,soft  0 0
goliath:/home/gallery /gallery nfs  auto,ro,intr,soft  0 0

And here's what's in my /etc/exports directory

/home/upload  wife(no_root_squash,no_all_squash,rw,sync,no_subtree_check)
/home/gallery 192.168.23.0/24(ro,sync,no_subtree_check)

Attached is my config.

-- 
Steve
Re: [PATCH net-next 1/3] ipv4: Lock-less per-packet multipath
On 06/17/2015 01:08 PM, Peter Nørlund wrote: The current multipath attempted to be quasi random, but in most cases it behaved just like a round robin balancing. This patch refactors the algorithm to be exactly that and in doing so, avoids the spin lock. The new design paves the way for hash-based multipath, replacing the modulo with thresholds, minimizing disruption in case of failing paths or route replacements. Signed-off-by: Peter Nørlund --- include/net/ip_fib.h | 6 +-- net/ipv4/Kconfig | 1 + net/ipv4/fib_semantics.c | 116 ++- 3 files changed, 68 insertions(+), 55 deletions(-) diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h index 54271ed..4be4f25 100644 --- a/include/net/ip_fib.h +++ b/include/net/ip_fib.h @@ -76,8 +76,8 @@ struct fib_nh { unsigned intnh_flags; unsigned char nh_scope; #ifdef CONFIG_IP_ROUTE_MULTIPATH - int nh_weight; - int nh_power; + int nh_mp_weight; + atomic_tnh_mp_upper_bound; #endif #ifdef CONFIG_IP_ROUTE_CLASSID __u32 nh_tclassid; @@ -115,7 +115,7 @@ struct fib_info { #define fib_advmss fib_metrics[RTAX_ADVMSS-1] int fib_nhs; #ifdef CONFIG_IP_ROUTE_MULTIPATH - int fib_power; + int fib_mp_weight; #endif struct rcu_head rcu; struct fib_nh fib_nh[0]; I could do without some of this renaming. For example you could probably not bother with adding the _mp piece to the name. That way we don't have to track all the nh_weight -> nh_mp_weight changes. Also you could probably just use the name fib_weight since not including the _mp was already the convention for the multipath portions of the structure anyway. This isn't really improving readability at all so I would say don't bother renaming it. 
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig index d83071d..cb91f67 100644 --- a/net/ipv4/Kconfig +++ b/net/ipv4/Kconfig @@ -81,6 +81,7 @@ config IP_MULTIPLE_TABLES config IP_ROUTE_MULTIPATH bool "IP: equal cost multipath" depends on IP_ADVANCED_ROUTER + select BITREVERSE help Normally, the routing tables specify a single action to be taken in a deterministic manner for a given packet. If you say Y here diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index 28ec3c1..8c8df80 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -15,6 +15,7 @@ #include #include +#include #include #include #include @@ -57,7 +58,7 @@ static struct hlist_head fib_info_devhash[DEVINDEX_HASHSIZE]; #ifdef CONFIG_IP_ROUTE_MULTIPATH -static DEFINE_SPINLOCK(fib_multipath_lock); +static DEFINE_PER_CPU(u8, fib_mp_rr_counter); #define for_nexthops(fi) {\ int nhsel; const struct fib_nh *nh; \ @@ -261,7 +262,7 @@ static inline int nh_comp(const struct fib_info *fi, const struct fib_info *ofi) nh->nh_gw != onh->nh_gw || nh->nh_scope != onh->nh_scope || #ifdef CONFIG_IP_ROUTE_MULTIPATH - nh->nh_weight != onh->nh_weight || + nh->nh_mp_weight != onh->nh_mp_weight || #endif #ifdef CONFIG_IP_ROUTE_CLASSID nh->nh_tclassid != onh->nh_tclassid || @@ -449,6 +450,43 @@ static int fib_count_nexthops(struct rtnexthop *rtnh, int remaining) return remaining > 0 ? 0 : nhs; } This is a good example. If we don't do the rename we don't have to review changes like the one above which just add extra overhead to the patch. +static void fib_rebalance(struct fib_info *fi) +{ + int factor; + int total; + int w; + + if (fi->fib_nhs < 2) + return; + + total = 0; + for_nexthops(fi) { + if (!(nh->nh_flags & RTNH_F_DEAD)) + total += nh->nh_mp_weight; + } endfor_nexthops(fi); + + if (likely(total != 0)) { + factor = DIV_ROUND_UP(total, 8388608); + total /= factor; + } else { + factor = 1; + } + So where does the 8388608 value come from? 
Is it just here to help restrict the upper_bound to a u8 value? + w = 0; + change_nexthops(fi) { + int upper_bound; + + if (nexthop_nh->nh_flags & RTNH_F_DEAD) { + upper_bound = -1; + } else { + w += nexthop_nh->nh_mp_weight / factor; + upper_bound = DIV_ROUND_CLOSEST(256 * w, total); + } This is doing some confusing stuff. I assume the whole point is to get the value to convert the upper_bound into a u8 value based on the weight where you end
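To make the threshold scheme in fib_rebalance() easier to follow, here is a standalone sketch of the same idea (hash-threshold selection, as described in RFC 2992): each live nexthop gets an upper bound in 1..256 proportional to its cumulative weight, and an 8-bit hash or counter picks the first live nexthop whose bound exceeds it. The names are hypothetical and the `<` comparison in the selection step is an assumption, since the selection code is not quoted here. The sketch uses 64-bit arithmetic instead of the patch's scaling by `factor`; 8388608 is 2^23, and 256 * 2^23 = 2^31, so the scaling presumably exists to keep `256 * w` near the 32-bit range.

```c
#include <assert.h>
#include <stdint.h>

struct toy_nh {
	unsigned int weight;
	int dead;
	int upper_bound;   /* -1 for dead nexthops, else 1..256 */
};

/* Give each live nexthop an upper bound proportional to the cumulative
 * weight seen so far, rounded to nearest like DIV_ROUND_CLOSEST(). */
static void toy_rebalance(struct toy_nh *nh, int n)
{
	uint64_t total = 0, w = 0;

	for (int i = 0; i < n; i++)
		if (!nh[i].dead)
			total += nh[i].weight;

	for (int i = 0; i < n; i++) {
		if (nh[i].dead || total == 0) {
			nh[i].upper_bound = -1;
			continue;
		}
		w += nh[i].weight;
		nh[i].upper_bound = (int)((256 * w + total / 2) / total);
	}
}

/* Pick the first live nexthop whose upper bound exceeds the hash. */
static int toy_select(const struct toy_nh *nh, int n, uint8_t hash)
{
	for (int i = 0; i < n; i++)
		if (!nh[i].dead && hash < nh[i].upper_bound)
			return i;
	return -1;
}
```

With weights 1, 1 and 2 the bounds come out as 64, 128 and 256. If the middle nexthop dies, the bounds become 85, -1 and 256: hashes 0..63 still go to the first nexthop and 128..255 still go to the last, so only the flows that used the dead path move. That is the reduced disruption the patch description claims over the modulo approach.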
Re: [PATCH net-next 0/3 v5] changes to make ipv4 routing table aware of next-hop link status
On Thu, Jun 18, 2015 at 10:51:37AM -0700, Scott Feldman wrote: > On Thu, Jun 18, 2015 at 8:22 AM, Andy Gospodarek > wrote: > > This series adds the ability to have the Linux kernel track whether or > > not a particular route should be used based on the link-status of the > > interface associated with the next-hop. > > > > Before this patch any link-failure on an interface that was serving as a > > gateway for some systems could result in those systems being isolated > > from the rest of the network as the stack would continue to attempt to > > send frames out of an interface that is actually linked-down. When the > > kernel is responsible for all forwarding, it should also be responsible > > for taking action when the traffic can no longer be forwarded -- there > > is no real need to outsource link-monitoring to userspace anymore. > > > > This feature is only enabled with the new per-interface or ipv4 global > > sysctls called 'ignore_routes_with_linkdown'. > > > > net.ipv4.conf.all.ignore_routes_with_linkdown = 0 > > net.ipv4.conf.default.ignore_routes_with_linkdown = 0 > > net.ipv4.conf.lo.ignore_routes_with_linkdown = 0 > > ... > > > > When the above sysctls are set, the kernel will not only report to > > userspace that the link is down, but it will also report to userspace > > that a route is dead. This will signal to userspace that the route will > > not be selected. 
> > > > With the new sysctls set, the following behavior can be observed > > (interface p8p1 is link-down): > > > > # ip route show > > default via 10.0.5.2 dev p9p1 > > 10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15 > > 70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1 > > 80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 dead linkdown > > 90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 dead linkdown > > 90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2 > > # ip route get 90.0.0.1 > > 90.0.0.1 via 70.0.0.2 dev p7p1 src 70.0.0.1 > > cache > > # ip route get 80.0.0.1 > > local 80.0.0.1 dev lo src 80.0.0.1 > > cache > > # ip route get 80.0.0.2 > > 80.0.0.2 via 10.0.5.2 dev p9p1 src 10.0.5.15 > > cache > > > > While the route does remain in the table (so it can be modified if > > needed rather than being wiped away as it would be if IFF_UP was > > cleared), the proper next-hop is chosen automatically when the link is > > down. Now interface p8p1 is linked-up: > > > > # ip route show > > default via 10.0.5.2 dev p9p1 > > 10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15 > > 70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1 > > 80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 > > 90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 > > 90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2 > > 192.168.56.0/24 dev p2p1 proto kernel scope link src 192.168.56.2 > > # ip route get 90.0.0.1 > > 90.0.0.1 via 80.0.0.2 dev p8p1 src 80.0.0.1 > > cache > > # ip route get 80.0.0.1 > > local 80.0.0.1 dev lo src 80.0.0.1 > > cache > > # ip route get 80.0.0.2 > > 80.0.0.2 dev p8p1 src 80.0.0.1 > > cache > > > > and the output changes to what one would expect. 
> > > > If the global or interface sysctl is not set, the following output would be > > expected when p8p1 is down: > > > > # ip route show > > default via 10.0.5.2 dev p9p1 > > 10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15 > > 70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1 > > 80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 linkdown > > 90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 linkdown > > 90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2 > > > > If the dead flag does not appear there should be no expectation that the > > kernel would skip using this route due to link being down. > > > > v2: Split kernel changes into 2 patches: first to add linkdown flag and > > second to add new sysctl settings. Also took suggestion from Alex to > > simplify code by only checking sysctl during fib lookup and suggestion > > from Scott to add a per-interface sysctl. Added iproute2 patch to > > recognize and print linkdown flag. > > > > v3: Code cleanups along with reverse-path checks suggested by Alex and > > small fixes related to problems found when multipath was disabled. > > > > v4: Drop binary sysctls > > > > v5: Whitespace and variable declaration fixups suggested by Dave > > > > Though there were some that preferred not to have a configuration option > > and to make this behavior the default when it was discussed in Ottawa > > earlier this year since "it was time to do this." I wanted to propose > > the config option to preserve the current behavior for those that desire > > it. I'll happily remove it if Dave and Linus approve. > > > > An IPv6 implementation is also needed (DECnet too!), but I wanted to start > > with > > the IPv4 implementation to get people comfortable with the idea before > > moving > > forward. If this is accepted the IPv6 implementation can be posted shortly. > > >
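The route selection behaviour in the `ip route get 90.0.0.1` examples above can be modelled in a few lines. This is a hypothetical userspace sketch, not the fib lookup code from the series. It assumes routes to the same prefix are tried in metric order and that, when ignore_routes_with_linkdown is set, a route whose nexthop device is link-down is treated as dead and skipped. With the sysctl clear, link state does not affect the choice.

```c
#include <assert.h>
#include <stddef.h>

struct toy_route {
	const char *via;     /* gateway address, for reporting only */
	const char *dev;
	unsigned int metric;
	int link_up;         /* carrier state of dev */
};

/* Return the lowest-metric usable route, or NULL. When ignore_linkdown
 * is set, routes on link-down devices are skipped; otherwise they are
 * still eligible and would only be flagged "linkdown". */
static const struct toy_route *toy_pick(const struct toy_route *r, size_t n,
					int ignore_linkdown)
{
	const struct toy_route *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (ignore_linkdown && !r[i].link_up)
			continue;   /* shown as "dead linkdown" */
		if (!best || r[i].metric < best->metric)
			best = &r[i];
	}
	return best;
}
```

With p8p1 down, the metric-1 route via 80.0.0.2 is skipped and the metric-2 route via 70.0.0.2 on p7p1 is chosen. Once the link comes back up, the metric-1 route wins again.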
Re: [PATCH net-next 00/43] Simplify netfilter and network namespaces (take 2)
On Wed, Jun 17, 2015 at 10:09:40AM -0500, Eric W. Biederman wrote:
[...]
> There are a few extra cleanups in the first group of changes sprinkled
> in as I noticed a few other things as I was sorting out the network
> namespace computation logic.

This is a rather large patchset that addresses many pernet issues in the netfilter codebase. I would classify them as:

1) Patches to prepare the ground for easier pernet integration.
2) Getting rid of the dev_net(dev) ? ... : ...; pattern all around the netfilter code.
3) Missing pernet sysctl support in some spots, e.g. br_netfilter.
4) Pernet hooks, probably the largest changeset in this pile and the most important one IMO.

Given that netfilter netns support is evidently half-cooked and there's room for improvement in it (we've been receiving patches that partially add support for things people sporadically needed), could you please split this into several smaller batches of logical changes for easier review?

On a different front, nfnetlink_log and nfnetlink_queue still lack netns support, so patches for that would also be appreciated in a separate round.

I'm going to take as many of the small preparation patches as I can to reduce your patch load: 1/43, 8/43, 16/43, 17/43, 18/43, 26/43

Thank you.
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Wed, Jun 17, 2015 at 11:08 PM, Steven Rostedt wrote: > On Fri, 12 Jun 2015 11:50:38 -0400 > Steven Rostedt wrote: > >> I reverted the following commits: >> >> c627d31ba0696cbd829437af2be2f2dee3546b1e >> 9e2b9f37760e129cee053cc7b6e7288acc2a7134 >> caf4ccd4e88cf2795c927834bc488c8321437586 >> >> And the issue goes away. That is, I watched the port go from >> ESTABLISHED to TIME_WAIT, and then gone, and there's no hidden port. >> >> In fact, I watched the port with my portlist.c module, and it >> disappeared there too when it entered the TIME_WAIT state. >> I've scanned those commits again and again, and I'm not seeing how we could be introducing a socket leak there. The only suspect I can see would be the NFS swap bugs that Jeff fixed a few weeks ago. Are you using NFS swap? > I've been running v4.0.5 with the above commits reverted for 5 days > now, and there's still no hidden port appearing. > > What's the status on this? Should those commits be reverted or is there > another solution to this bug? > I'm trying to reproduce, but I've had no luck yet. Cheers Trond
Re: [PATCH net-next 00/15] Simplify netfilter and network namespaces
Hello, On Thu, 18 Jun 2015, Eric W. Biederman wrote: > My incremental patch for ipvs on top of everything else I have pushed > out looks like this: > > From: "Eric W. Biederman" > Date: Fri, 12 Jun 2015 18:34:12 -0500 > Subject: [PATCH] ipvs: Pass struct net down to where it is needed and used > > Pass struct net down to where it is used and stop guessing > which network namespace should be used. At first glance the patch is OK, but I'm not sure about the changes in ip_vs_xmit.c. Can you explain in 2-3 lines when we can see a different netns? Is it when the packet is forwarded to an output device that belongs to another netns? I'm asking because these __ip_vs_get_out_rt* calls in ip_vs_xmit.c can reroute the packet to another device... Also, skb_sknet is another candidate for removal, but I can take care of it after your changes are pushed... Regards -- Julian Anastasov
Re: [PATCH net-next] x_table: align per cpu xt_counter
On Wed, Jun 17, 2015 at 07:08:15PM +0200, Florian Westphal wrote: > Eric Dumazet wrote: > > From: Eric Dumazet > > > > Let's force a 16-byte alignment on xt_counter percpu allocations, > > so that bytes and packets sit in the same cache line. > > > > Since xt_counter is exported to user space, we cannot add __align(16) on > > the structure itself. > > Sorry, I was away. Looks great. > > Acked-by: Florian Westphal Applied, thanks Eric and Florian!
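A user-space sketch of why 16-byte alignment keeps the packet and byte counters in one cache line. It assumes 64-byte cache lines and uses C11 aligned_alloc() as a stand-in for the kernel's percpu allocator; struct counters only mirrors the two-u64 layout of struct xt_counters and is not the kernel type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Mirrors the layout of struct xt_counters: two 64-bit counters. */
struct counters {
	uint64_t pcnt; /* packets */
	uint64_t bcnt; /* bytes */
};

#define CACHE_LINE 64 /* assumption: 64-byte cache lines */

/* True if the object [p, p + size) lies within a single cache line. */
static bool same_cache_line(const void *p, size_t size)
{
	uintptr_t start = (uintptr_t)p;
	uintptr_t end = start + size - 1;

	return start / CACHE_LINE == end / CACHE_LINE;
}

/* Allocate n counter pairs at 16-byte alignment. Each 16-byte pair then
 * starts at offset 0, 16, 32 or 48 within a line and cannot straddle it.
 * aligned_alloc() requires the size to be a multiple of the alignment,
 * which n * 16 always is. */
static struct counters *alloc_counters(size_t n)
{
	return aligned_alloc(16, n * sizeof(struct counters));
}
```

With only 8-byte alignment, a pair could start at offset 56 of a line and spill into the next one, which is the case the patch rules out.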
pull request: bluetooth-next 2015-06-18
Hi Dave, Here's the final bluetooth-next pull request for 4.2. - Cleanups & fixes to 802.15.4 code and related drivers - Fix btusb driver memory leak - New USB IDs for Atheros controllers - Support for BCM4324B3 UART based Broadcom controller - Fix for Bluetooth encryption key size handling - Broadcom controller initialization fixes - Support for Intel controller DDC parameters - Support for multiple Bluetooth LE advertising instances - Fix for HCI user channel cleanup path Please let me know if there are any issues pulling. Thanks. Johan --- The following changes since commit a9ab2184f451ec78af245ebb8b663d8700d44672: Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge (2015-05-31 01:07:06 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git for-upstream for you to fetch changes up to 952497b159468477392f9b562b904da9bc76d468: Bluetooth: Fix warning of potentially uninitialized adv_instance variable (2015-06-18 21:05:31 +0300) Aleksei Volkov (1): Bluetooth: btusb: Correct typo in Roper Class 1 Bluetooth Dongle Alexander Aring (20): ieee802154: 6lowpan: set ackreq when needed mac802154: remove unneeded vif struct mac802154: cleanup address filtering flags mac802154: remove aack hw flag mac802154: cleanup ieee802154 hardware flags mac802154: remove unused hw_filt attribute mac802154: rearrange attribute in ieee802154_hw mac802154: add missing structure comments mac802154: change pan_coord type to bool mac802154: fix flags BIT definitions order mac802154: iface: fix hrtimer cancel on ifdown mac802154: iface: flush workqueue before stop at86rf230: use level high as fallback default at86rf230: add support for sleep state fakelb: add xmit_async after stop testcase at86rf230: fix phy settings while sleeping at86rf230: add recommended csma backoffs settings at86rf230: cleanup start and stop callbacks mac802154: iface: fix order while interface up mac802154: iface: cleanup stack variable 
Alexey Dobriyan (1): Bluetooth: Stop sabotaging list poisoning Arron Wang (2): Bluetooth: Make l2cap_recv_acldata() and sco_recv_scodata() return void Bluetooth: Move SCO support under BT_BREDR config option Chan-yeol Park (1): Bluetooth: hci_uart: Fix dereferencing of ERR_PTR Christoffer Holmstedt (1): nl802154: fix misspelled enum Dmitry Tunin (3): ath3k: Add support of 0489:e076 AR3012 device ath3k: add support of 13d3:3474 AR3012 device Bluetooth: ath3k: Add support of 04ca:300d AR3012 device Florian Grandel (20): Bluetooth: hci_core/mgmt: Introduce multi-adv list Bluetooth: hci_core/mgmt: move adv timeout to hdev Bluetooth: mgmt: dry update_scan_rsp_data() Bluetooth: mgmt: rename update_*_data_for_instance() Bluetooth: mgmt: multi adv for read_adv_features() Bluetooth: mgmt: multi adv for get_current_adv_instance() Bluetooth: mgmt: multi adv for get_adv_instance_flags() Bluetooth: mgmt: improve get_adv_instance_flags() readability Bluetooth: mgmt: multi adv for enable_advertising() Bluetooth: mgmt: multi adv for create_instance_scan_rsp_data() Bluetooth: mgmt: multi adv for create_instance_adv_data() Bluetooth: mgmt: multi adv for set_advertising*() Bluetooth: mgmt: multi adv for clear_adv_instances() Bluetooth: mgmt/hci_core: multi-adv for add_advertising*() Bluetooth: mgmt: multi adv for remove_advertising*() Bluetooth: mgmt: program multi-adv on power on Bluetooth: mgmt: multi-adv for trigger_le_scan() Bluetooth: mgmt: multi-adv for mgmt_reenable_advertising() Bluetooth: hci_core: remove obsolete adv_instance Bluetooth: hci_core: increase max adv inst Frederic Danis (7): Bluetooth: btbcm: Move request/release_firmware() Bluetooth: btbcm: Add BCM4324B3 UART device Bluetooth: hci_uart: Support operational speed during setup Bluetooth: btbcm: Add helper functions for UART setup Bluetooth: hci_uart: Update Broadcom UART setup Bluetooth: hci_uart: Add bcm_set_baudrate() Bluetooth: hci_uart: Fix speed selection Glenn Ruben Bakke (5): Bluetooth: 6lowpan: Enable 
delete_netdev to be scheduled when last peer is deleted Bluetooth: 6lowpan: Rename ambiguous variable Bluetooth: 6lowpan: Move netdev sysfs device reference Bluetooth: 6lowpan: Fix double kfree of netdev priv Bluetooth: 6lowpan: Fix module refcount Ilya Faenson (2): Bluetooth: btbcm: Support the BCM4354 Bluetooth UART device Bluetooth: hci_uart: Add new line discipline enhancements Jaganath Kanakkassery (1): Bluetooth: Fix potential NULL dereference in RFCOMM bind callback Johan Hedberg (10): Bluetoo
[PATCH next v3] bonding: Display LACP info only to CAP_NET_ADMIN capable user
Actor and Partner details can be accessed via procfs and sysfs entries or the netlink interface. These interfaces are currently world-readable. The earlier patch series made the LACP communication secure to avoid nuisance attacks from within the same L2 domain, but it did not prevent an unprivileged user from looking at that information on the host and performing the same act. This patch avoids emitting those entries if the user in question does not have sufficient privileges. Signed-off-by: Mahesh Bandewar --- drivers/net/bonding/bond_netlink.c | 23 + drivers/net/bonding/bond_procfs.c | 101 +++-- drivers/net/bonding/bond_sysfs.c | 12 ++--- 3 files changed, 71 insertions(+), 65 deletions(-) diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c index 5580fcde738f..1bda29249d12 100644 --- a/drivers/net/bonding/bond_netlink.c +++ b/drivers/net/bonding/bond_netlink.c @@ -601,19 +601,20 @@ static int bond_fill_info(struct sk_buff *skb, if (BOND_MODE(bond) == BOND_MODE_8023AD) { struct ad_info info; - if (nla_put_u16(skb, IFLA_BOND_AD_ACTOR_SYS_PRIO, - bond->params.ad_actor_sys_prio)) - goto nla_put_failure; - - if (nla_put_u16(skb, IFLA_BOND_AD_USER_PORT_KEY, - bond->params.ad_user_port_key)) - goto nla_put_failure; + if (capable(CAP_NET_ADMIN)) { + if (nla_put_u16(skb, IFLA_BOND_AD_ACTOR_SYS_PRIO, + bond->params.ad_actor_sys_prio)) + goto nla_put_failure; - if (nla_put(skb, IFLA_BOND_AD_ACTOR_SYSTEM, - sizeof(bond->params.ad_actor_system), - &bond->params.ad_actor_system)) - goto nla_put_failure; + if (nla_put_u16(skb, IFLA_BOND_AD_USER_PORT_KEY, + bond->params.ad_user_port_key)) + goto nla_put_failure; + if (nla_put(skb, IFLA_BOND_AD_ACTOR_SYSTEM, + sizeof(bond->params.ad_actor_system), + &bond->params.ad_actor_system)) + goto nla_put_failure; + } if (!bond_3ad_get_active_agg_info(bond, &info)) { struct nlattr *nest; diff --git a/drivers/net/bonding/bond_procfs.c b/drivers/net/bonding/bond_procfs.c index e7f3047a26df..f514fe5e80a5 100644 ---
a/drivers/net/bonding/bond_procfs.c +++ b/drivers/net/bonding/bond_procfs.c @@ -135,27 +135,30 @@ static void bond_info_show_master(struct seq_file *seq) bond->params.ad_select); seq_printf(seq, "Aggregator selection policy (ad_select): %s\n", optval->string); - seq_printf(seq, "System priority: %d\n", - BOND_AD_INFO(bond).system.sys_priority); - seq_printf(seq, "System MAC address: %pM\n", - &BOND_AD_INFO(bond).system.sys_mac_addr); - - if (__bond_3ad_get_active_agg_info(bond, &ad_info)) { - seq_printf(seq, "bond %s has no active aggregator\n", - bond->dev->name); - } else { - seq_printf(seq, "Active Aggregator Info:\n"); - - seq_printf(seq, "\tAggregator ID: %d\n", - ad_info.aggregator_id); - seq_printf(seq, "\tNumber of ports: %d\n", - ad_info.ports); - seq_printf(seq, "\tActor Key: %d\n", - ad_info.actor_key); - seq_printf(seq, "\tPartner Key: %d\n", - ad_info.partner_key); - seq_printf(seq, "\tPartner Mac Address: %pM\n", - ad_info.partner_system); + if (capable(CAP_NET_ADMIN)) { + seq_printf(seq, "System priority: %d\n", + BOND_AD_INFO(bond).system.sys_priority); + seq_printf(seq, "System MAC address: %pM\n", + &BOND_AD_INFO(bond).system.sys_mac_addr); + + if (__bond_3ad_get_active_agg_info(bond, &ad_info)) { + seq_printf(seq, + "bond %s has no active aggregator\n", + bond->dev->name); + } else { + seq_printf(seq, "Active Aggregator Info:\n"); + + seq_printf(seq, "\tAggregator ID: %d\n", +
Re: [PATCH next v2] bonding: Display LACP info only to CAP_NET_ADMIN capable user
>> >> Hmm... I would rather not send these fake attributes at all? > > That would be my preference as well. Sorry if my lack of elaboration in > my earlier email made this confusing. > > If there are values that should not be visible to non-root users, then > don't send them at all. Do not just send NULL values. > OK, I will change this in the next rev. Thanks,
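The review point above (omit privileged attributes entirely instead of sending zeroed values), which v3 implements by wrapping the nla_put calls in capable(CAP_NET_ADMIN), can be sketched in user space. The attribute IDs and the toy type-length-value writer are hypothetical stand-ins for nla_put(); this is not the netlink wire format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute IDs, loosely named after the IFLA_BOND_AD_* ones. */
enum { ATTR_ACTOR_SYS_PRIO = 1, ATTR_USER_PORT_KEY = 2, ATTR_MODE = 3 };

struct tlv_buf {
	uint8_t data[64];
	size_t used;
	unsigned int count; /* number of attributes written */
};

/* Append one (type, length, value) triple of 16-bit words. */
static bool put_u16(struct tlv_buf *b, uint16_t type, uint16_t value)
{
	uint16_t hdr[3] = { type, sizeof(value), value };

	if (b->used + sizeof(hdr) > sizeof(b->data))
		return false;
	memcpy(b->data + b->used, hdr, sizeof(hdr));
	b->used += sizeof(hdr);
	b->count++;
	return true;
}

/* Privileged attributes are left out for unprivileged callers rather
 * than being emitted with fake values; 'privileged' plays the role of
 * capable(CAP_NET_ADMIN). */
static bool fill_info(struct tlv_buf *b, bool privileged,
		      uint16_t mode, uint16_t sys_prio, uint16_t port_key)
{
	if (!put_u16(b, ATTR_MODE, mode))
		return false;
	if (privileged) {
		if (!put_u16(b, ATTR_ACTOR_SYS_PRIO, sys_prio) ||
		    !put_u16(b, ATTR_USER_PORT_KEY, port_key))
			return false;
	}
	return true;
}
```

A consumer can then tell "not visible to you" (attribute absent) apart from a real value of zero.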
Re: [PATCH net-next 0/3 v5] changes to make ipv4 routing table aware of next-hop link status
On Thu, Jun 18, 2015 at 8:22 AM, Andy Gospodarek wrote: > This series adds the ability to have the Linux kernel track whether or > not a particular route should be used based on the link-status of the > interface associated with the next-hop. > > Before this patch any link-failure on an interface that was serving as a > gateway for some systems could result in those systems being isolated > from the rest of the network as the stack would continue to attempt to > send frames out of an interface that is actually linked-down. When the > kernel is responsible for all forwarding, it should also be responsible > for taking action when the traffic can no longer be forwarded -- there > is no real need to outsource link-monitoring to userspace anymore. > > This feature is only enabled with the new per-interface or ipv4 global > sysctls called 'ignore_routes_with_linkdown'. > > net.ipv4.conf.all.ignore_routes_with_linkdown = 0 > net.ipv4.conf.default.ignore_routes_with_linkdown = 0 > net.ipv4.conf.lo.ignore_routes_with_linkdown = 0 > ... > > When the above sysctls are set, the kernel will not only report to > userspace that the link is down, but it will also report to userspace > that a route is dead. This will signal to userspace that the route will > not be selected. 
> > With the new sysctls set, the following behavior can be observed > (interface p8p1 is link-down): > > # ip route show > default via 10.0.5.2 dev p9p1 > 10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15 > 70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1 > 80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 dead linkdown > 90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 dead linkdown > 90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2 > # ip route get 90.0.0.1 > 90.0.0.1 via 70.0.0.2 dev p7p1 src 70.0.0.1 > cache > # ip route get 80.0.0.1 > local 80.0.0.1 dev lo src 80.0.0.1 > cache > # ip route get 80.0.0.2 > 80.0.0.2 via 10.0.5.2 dev p9p1 src 10.0.5.15 > cache > > While the route does remain in the table (so it can be modified if > needed rather than being wiped away as it would be if IFF_UP was > cleared), the proper next-hop is chosen automatically when the link is > down. Now interface p8p1 is linked-up: > > # ip route show > default via 10.0.5.2 dev p9p1 > 10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15 > 70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1 > 80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 > 90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 > 90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2 > 192.168.56.0/24 dev p2p1 proto kernel scope link src 192.168.56.2 > # ip route get 90.0.0.1 > 90.0.0.1 via 80.0.0.2 dev p8p1 src 80.0.0.1 > cache > # ip route get 80.0.0.1 > local 80.0.0.1 dev lo src 80.0.0.1 > cache > # ip route get 80.0.0.2 > 80.0.0.2 dev p8p1 src 80.0.0.1 > cache > > and the output changes to what one would expect. 
> > If the global or interface sysctl is not set, the following output would be > expected when p8p1 is down: > > # ip route show > default via 10.0.5.2 dev p9p1 > 10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15 > 70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1 > 80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 linkdown > 90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 linkdown > 90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2 > > If the dead flag does not appear there should be no expectation that the > kernel would skip using this route due to link being down. > > v2: Split kernel changes into 2 patches: first to add linkdown flag and > second to add new sysctl settings. Also took suggestion from Alex to > simplify code by only checking sysctl during fib lookup and suggestion > from Scott to add a per-interface sysctl. Added iproute2 patch to > recognize and print linkdown flag. > > v3: Code cleanups along with reverse-path checks suggested by Alex and > small fixes related to problems found when multipath was disabled. > > v4: Drop binary sysctls > > v5: Whitespace and variable declaration fixups suggested by Dave > > Though there were some that preferred not to have a configuration option > and to make this behavior the default when it was discussed in Ottawa > earlier this year since "it was time to do this." I wanted to propose > the config option to preserve the current behavior for those that desire > it. I'll happily remove it if Dave and Linus approve. > > An IPv6 implementation is also needed (DECnet too!), but I wanted to start > with the IPv4 implementation to get people comfortable with the idea before moving > forward. If this is accepted the IPv6 implementation can be posted shortly. > > There was also a request for switchdev support for this, but that will be > posted as a followup as switchdev does not currently handle dead > next-hops in a multi-path case and I felt that infra needed to be added > first.
Andy, I finally got some time to try your patches with switchd
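A simplified model of the behaviour described in the cover letter: with ignore_routes_with_linkdown set, a next hop whose link is down is treated as dead and the next-best route is chosen; with it unset, the linkdown route is still used. Picking the lowest metric among candidate routes for one prefix is an assumption made for illustration, not the actual fib lookup code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified route entry for a single prefix. */
struct nexthop {
	const char *dev;
	unsigned int metric;
	bool linkdown; /* carrier lost on dev */
};

/* Return the lowest-metric usable next hop. When ignore_linkdown (the
 * sysctl) is set, entries whose link is down are skipped. */
static const struct nexthop *select_nh(const struct nexthop *nh, size_t n,
				       bool ignore_linkdown)
{
	const struct nexthop *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (ignore_linkdown && nh[i].linkdown)
			continue;
		if (!best || nh[i].metric < best->metric)
			best = &nh[i];
	}
	return best;
}
```

Using the 90.0.0.0/24 routes from the example output, the sysctl (for instance net.ipv4.conf.all.ignore_routes_with_linkdown = 1) makes the p7p1 route win while p8p1 is down, and p8p1 is preferred again once its link comes back.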
Re: [PATCH net 2/2] bridge: multicast: start querier timer when running user-space stp
> On Jun 18, 2015, at 6:37 AM, Herbert Xu wrote: > > On Wed, Jun 17, 2015 at 04:28:31AM -0700, Nikolay Aleksandrov wrote: >> From: Satish Ashok >> >> When STP is running in user-space and querier is configured, the >> querier timer is not started when a port goes to forwarding state. >> >> Signed-off-by: Satish Ashok >> Signed-off-by: Nikolay Aleksandrov >> Fixes: eb1d16414339 ("bridge: Add core IGMP snooping support") >> --- >> net/bridge/br_stp.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c >> index fb3ebe615513..1e2f2f1ff6b0 100644 >> --- a/net/bridge/br_stp.c >> +++ b/net/bridge/br_stp.c >> @@ -456,6 +456,9 @@ void br_port_state_selection(struct net_bridge *br) >> p->topology_change_ack = 0; >> br_make_blocking(p); >> } >> +} else if (br->stp_enabled == BR_USER_STP && >> + p->state == BR_STATE_FORWARDING) { >> +br_multicast_enable_port(p); >> } > > Minor nit, the stp_enabled check appears to be redundant since > you're in the else clause. > Right you are, I overlooked it. > More importantly, I'm not sure about the logic. For kernel STP, > we enable the port as soon as we get out of blocking. IIRC enabling > the port just means that we start tracking subscriptions/queries > so it should be OK to do even while we're listening/learning. > > In any case the logic should be identical whether we use kernel > STP or user-space STP. > > So how about removing br_multicast_enable_port from br_make_forward > and just add it here for both kernel and user-space STP? > > Thanks, > -- > Email: Herbert Xu > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt Makes sense, I’ll re-spin, test and post a v2. Thank you for the suggestion. Cheers, Nik
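Herbert's suggestion (one place that enables multicast tracking as soon as a port leaves blocking, regardless of STP mode) can be modelled as below. Only the state numbering follows BR_STATE_* in <linux/if_bridge.h>; the port struct and the loop are a hypothetical sketch, not net/bridge code, and the real br_port_state_selection() also performs role and state transitions that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Numbering follows BR_STATE_* from <linux/if_bridge.h>. */
enum port_state {
	PS_DISABLED = 0,
	PS_LISTENING = 1,
	PS_LEARNING = 2,
	PS_FORWARDING = 3,
	PS_BLOCKING = 4,
};

struct port {
	enum port_state state;
	bool mcast_enabled; /* stands in for br_multicast_enable_port() having run */
};

/* Single decision point for kernel and user-space STP alike: once a port
 * is listening, learning or forwarding, start multicast tracking. The STP
 * mode is deliberately not consulted. */
static void port_state_selection(struct port *ports, int n)
{
	for (int i = 0; i < n; i++) {
		struct port *p = &ports[i];

		if (p->state == PS_DISABLED || p->state == PS_BLOCKING)
			continue;
		if (!p->mcast_enabled)
			p->mcast_enabled = true;
	}
}
```

Keeping the check in one function, instead of splitting it between br_make_forward() and a user-STP branch, is what makes the kernel and user-space paths behave identically.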
e1000e driver - hang after 4 hours of uptime - finally bisected!
(follow up to a report from last week - bisecting took a while as I could only do 1 or 2 tests an evening) My Dell Latitude E6530 crashes with a specific kernel lockup almost exactly 4 hours after boot if there isn't a cable connected to the Ethernet port: [14508.846327] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0 [14468.229720] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0 [14463.254791] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0 [14491.134413] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1 [14463.396593] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2 [14490.390223] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1 [14494.680591] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0 [14513.365378] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1 [14482.271716] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3 [14479.906820] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0 As far as I can tell, the timestamp jitter is just how long it takes me to enter the cryptLUKS passphrase for the hard drive at boot... lspci tells me: lspci -vvv -s "00:19.0" 00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04) DeviceName: Onboard LAN Subsystem: Dell Device 0535 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- [] dump_stack+0x50/0xa8 [14479.906930] [] panic+0xcd/0x1e4 [14479.906940] [] ? perf_event_task_disable+0xc0/0xc0 [14479.906952] [] watchdog_overflow_callback+0x9b/0xa0 [14479.906959] [] __perf_event_overflow+0xc4/0x1f0 [14479.906968] [] perf_event_overflow+0x14/0x20 [14479.906976] [] intel_pmu_handle_irq+0x1e1/0x430 [14479.906990] [] perf_event_nmi_handler+0x26/0x40 [14479.906999] [] nmi_handle+0x103/0x340 [14479.907005] [] ? 
nmi_handle+0x5/0x340 [14479.907017] [] default_do_nmi+0xc3/0x120 [14479.907032] [] do_nmi+0xe8/0x130 [14479.907044] [] end_repeat_nmi+0x1e/0x2e [14479.907055] [] ? e1000e_cyclecounter_read+0x16/0xc0 [14479.907061] [] ? e1000e_cyclecounter_read+0x16/0xc0 [14479.907069] [] ? e1000e_cyclecounter_read+0x16/0xc0 [14479.907075] <> [] timecounter_read+0x19/0x60 [14479.907088] [] e1000e_phc_gettime+0x2e/0x60 [14479.907098] [] e1000e_systim_overflow_work+0x31/0x70 [14479.907105] [] process_one_work+0x3c9/0x980 [14479.907115] [] ? process_one_work+0x312/0x980 [14479.907125] [] ? worker_thread+0x78/0x760 [14479.907134] [] worker_thread+0x2cc/0x760 [14479.907144] [] ? process_one_work+0x980/0x980 [14479.907154] [] kthread+0xfe/0x120 [14479.907163] [] ? finish_task_switch+0x50/0x1c0 [14479.907173] [] ? kthread_create_on_node+0x270/0x270 [14479.907179] [] ret_from_fork+0x3f/0x70 [14479.907188] [] ? kthread_create_on_node+0x270/0x270 [14479.907243] Kernel Offset: 0x3900 from 0x8100 (relocation range: 0x8000-0xbfff) Bisection tells me it's this commit: commit 83129b37ef35bb6a7f01c060129736a8db5d31c4 Author: Yanir Lubetkin Date: Tue Jun 2 17:05:45 2015 +0300 e1000e: fix systim issues Two issues involving systim were reported. 1. Clock is not running in the correct frequency 2. In some situations, systim values were not incremented linearly This patch fixes the hardware clock configuration and the spurious non-linear increment.
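The backtrace shows the lockup inside e1000e_systim_overflow_work(), which calls timecounter_read() via e1000e_phc_gettime(). The reason such a periodic read exists can be shown with a simplified model of the cyclecounter/timecounter pair: the elapsed-cycles delta is computed modulo the counter width, so it is only correct if the counter is read at least once per wrap. The struct and field names below are hypothetical, and the fractional-nanosecond carry the kernel keeps is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified cyclecounter: counter width and cycles-to-ns scaling. */
struct model_cc {
	uint64_t mask;  /* (1 << counter_bits) - 1 */
	uint32_t mult;  /* cycles -> ns multiplier */
	uint32_t shift; /* cycles -> ns shift */
};

/* Simplified timecounter: last raw value seen and accumulated time. */
struct model_tc {
	const struct model_cc *cc;
	uint64_t cycle_last;
	uint64_t nsec;
};

/* Elapsed cycles since the last read. Correct across at most one wrap,
 * because the unsigned subtraction is reduced modulo (mask + 1). */
static uint64_t masked_delta(uint64_t now, uint64_t last, uint64_t mask)
{
	return (now - last) & mask;
}

/* Advance the timecounter to raw counter value 'now'. */
static uint64_t model_tc_read(struct model_tc *tc, uint64_t now)
{
	uint64_t delta = masked_delta(now, tc->cycle_last, tc->cc->mask);

	tc->nsec += (delta * tc->cc->mult) >> tc->cc->shift;
	tc->cycle_last = now;
	return tc->nsec;
}
```

With a toy 8-bit counter, a read after 10 cycles that crossed the wrap point is still correct, but if 300 cycles pass between reads only 44 are observed; that is why a driver schedules a periodic read well inside one wrap period.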
[PATCH net v2] tcp: Do not call tcp_fastopen_reset_cipher from interrupt context
tcp_fastopen_reset_cipher really cannot be called from interrupt context. It allocates the tcp_fastopen_context with GFP_KERNEL and calls crypto_alloc_cipher, which allocates all kinds of stuff with GFP_KERNEL. Thus, we might sleep when the key generation is triggered by an incoming TFO cookie request, which would then happen in interrupt context, as shown by enabling CONFIG_DEBUG_ATOMIC_SLEEP: [ 36.001813] BUG: sleeping function called from invalid context at mm/slub.c:1266 [ 36.003624] in_atomic(): 1, irqs_disabled(): 0, pid: 1016, name: packetdrill [ 36.004859] CPU: 1 PID: 1016 Comm: packetdrill Not tainted 4.1.0-rc7 #14 [ 36.006085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [ 36.008250] 04f2 88007f8838a8 8171d53a 880075a084a8 [ 36.009630] 880075a08000 88007f8838c8 810967d3 88007f883928 [ 36.011076] 88007f8838f8 81096892 88007f89be00 [ 36.012494] Call Trace: [ 36.012953][] dump_stack+0x4f/0x6d [ 36.014085] [] ___might_sleep+0x103/0x170 [ 36.015117] [] __might_sleep+0x52/0x90 [ 36.016117] [] kmem_cache_alloc_trace+0x47/0x190 [ 36.017266] [] ? tcp_fastopen_reset_cipher+0x42/0x130 [ 36.018485] [] tcp_fastopen_reset_cipher+0x42/0x130 [ 36.019679] [] tcp_fastopen_init_key_once+0x61/0x70 [ 36.020884] [] __tcp_fastopen_cookie_gen+0x1c/0x60 [ 36.022058] [] tcp_try_fastopen+0x58f/0x730 [ 36.023118] [] tcp_conn_request+0x3e8/0x7b0 [ 36.024185] [] ? __module_text_address+0x12/0x60 [ 36.025327] [] tcp_v4_conn_request+0x51/0x60 [ 36.026410] [] tcp_rcv_state_process+0x190/0xda0 [ 36.027556] [] ? __inet_lookup_established+0x47/0x170 [ 36.028784] [] tcp_v4_do_rcv+0x16d/0x3d0 [ 36.029832] [] ? security_sock_rcv_skb+0x16/0x20 [ 36.030936] [] tcp_v4_rcv+0x77a/0x7b0 [ 36.031875] [] ? iptable_filter_hook+0x33/0x70 [ 36.032953] [] ip_local_deliver_finish+0x92/0x1f0 [ 36.034065] [] ip_local_deliver+0x9a/0xb0 [ 36.035069] [] ?
ip_rcv+0x3d0/0x3d0 [ 36.035963] [] ip_rcv_finish+0x119/0x330 [ 36.036950] [] ip_rcv+0x2e7/0x3d0 [ 36.037847] [] __netif_receive_skb_core+0x552/0x930 [ 36.038994] [] __netif_receive_skb+0x27/0x70 [ 36.040033] [] process_backlog+0xd2/0x1f0 [ 36.041025] [] net_rx_action+0x122/0x310 [ 36.042007] [] __do_softirq+0x103/0x2f0 [ 36.042978] [] do_softirq_own_stack+0x1c/0x30 This patch moves the call to tcp_fastopen_init_key_once to the places where a listener socket creates its TFO-state, which always happens in user-context (either from the setsockopt, or implicitly during the listen()-call) Cc: Eric Dumazet Cc: Hannes Frederic Sowa Fixes: 222e83d2e0ae ("tcp: switch tcp_fastopen key generation to net_get_random_once") Signed-off-by: Christoph Paasch --- Notes: v2: Instead of reverting Hannes' patch, move the call to tcp_fastopen_init_once to the places where we enable TFO on the server-side from user-context. net/ipv4/af_inet.c | 2 ++ net/ipv4/tcp.c | 7 +-- net/ipv4/tcp_fastopen.c | 2 -- 3 files changed, 7 insertions(+), 4 deletions(-) diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 8b47a4d79d04..a5aa54ea6533 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -228,6 +228,8 @@ int inet_listen(struct socket *sock, int backlog) err = 0; if (err) goto out; + + tcp_fastopen_init_key_once(true); } err = inet_csk_listen_start(sk, backlog); if (err) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index f1377f2a0472..bb2ce74f6004 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2545,10 +2545,13 @@ static int do_tcp_setsockopt(struct sock *sk, int level, case TCP_FASTOPEN: if (val >= 0 && ((1 << sk->sk_state) & (TCPF_CLOSE | - TCPF_LISTEN))) + TCPF_LISTEN))) { + tcp_fastopen_init_key_once(true); + err = fastopen_init_queue(sk, val); - else + } else { err = -EINVAL; + } break; case TCP_TIMESTAMP: if (!tp->repair) diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c index 46b087a27503..f9c0fb84e435 100644 --- a/net/ipv4/tcp_fastopen.c +++ 
b/net/ipv4/tcp_fastopen.c @@ -78,8 +78,6 @@ static bool __tcp_fastopen_cookie_gen(const void *path, struct tcp_fastopen_context *ctx; bool ok = false; - tcp_fastopen_init_key_once(true); - rcu_read_lock(); ctx = rcu_dereference(tcp_fastopen_ctx); if (ctx) { -- 2.4.4
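From user space, the two process-context entry points the patch relies on are the TCP_FASTOPEN setsockopt() and the listen() call. A minimal sketch of a loopback server that enables server-side TFO this way is below; the fallback value 23 matches TCP_FASTOPEN in <linux/tcp.h> and is only used if the libc header lacks it, and the queue length passed in is arbitrary.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef TCP_FASTOPEN
#define TCP_FASTOPEN 23 /* value from <linux/tcp.h> */
#endif

/* Create a loopback listener with server-side TFO enabled. Both the
 * setsockopt() (socket still in CLOSE state) and the listen() run in
 * process context. Returns the fd, or -1 on any failure. */
static int tfo_listener(int qlen)
{
	struct sockaddr_in addr = { 0 };
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0; /* let the kernel pick a free port */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    setsockopt(fd, IPPROTO_TCP, TCP_FASTOPEN, &qlen, sizeof(qlen)) < 0 ||
	    listen(fd, 16) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

With the patch applied, the key is generated during one of these calls, so the later cookie generation on an incoming SYN only reads an already-initialised context under RCU.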
Re: [PATCH net-next 3/3 v5] iproute2: add support to print 'linkdown' nexthop flag
On Thu, Jun 18, 2015 at 8:57 AM, Andy Gospodarek wrote: > On Thu, Jun 18, 2015 at 08:43:08AM -0700, Scott Feldman wrote: >> On Thu, Jun 18, 2015 at 8:22 AM, Andy Gospodarek >> wrote: >> > Signed-off-by: Andy Gospodaerk >> > Signed-off-by: Dinesh Dutt >> > >> > --- >> > ip/iproute.c | 4 >> > 1 file changed, 4 insertions(+) >> > >> > diff --git a/ip/iproute.c b/ip/iproute.c >> > index 3795baf..3369c49 100644 >> > --- a/ip/iproute.c >> > +++ b/ip/iproute.c >> > @@ -451,6 +451,8 @@ int print_route(const struct sockaddr_nl *who, struct >> > nlmsghdr *n, void *arg) >> > fprintf(fp, "offload "); >> > if (r->rtm_flags & RTM_F_NOTIFY) >> > fprintf(fp, "notify "); >> > + if (r->rtm_flags & RTNH_F_LINKDOWN) >> > + fprintf(fp, "linkdown "); >> >> >> iproute.c: In function ‘print_route’: >> iproute.c:454:21: error: ‘RTNH_F_LINKDOWN’ undeclared (first use in >> this function) >> iproute.c:454:21: note: each undeclared identifier is reported only >> once for each function it appears in > > Yes, you need to pull that from the patches above into your iproute2 > sources. Stephen regularly tells people not to post uapi updates, so I > did not. Ok, thanks.
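A self-contained sketch of how the rtm_flags bits map to the words printed by `ip route show` (e.g. "dead linkdown"). The flag values are local copies of RTNH_F_DEAD, RTNH_F_OFFLOAD, RTNH_F_LINKDOWN and RTM_F_NOTIFY from <linux/rtnetlink.h>, duplicated only so the sketch compiles without the updated header; in iproute2 itself the constant comes from the synced uapi header, as Andy says. The function is a simplified stand-in for the flag part of print_route().

```c
#include <assert.h>
#include <string.h>

/* Local copies of flag values from <linux/rtnetlink.h>. */
enum {
	NH_F_DEAD = 1,      /* RTNH_F_DEAD */
	NH_F_OFFLOAD = 8,   /* RTNH_F_OFFLOAD */
	NH_F_LINKDOWN = 16, /* RTNH_F_LINKDOWN */
	RT_F_NOTIFY = 0x100 /* RTM_F_NOTIFY */
};

/* Append word to the NUL-terminated string in out (capacity len),
 * truncating rather than overflowing. */
static void append(char *out, size_t len, const char *word)
{
	strncat(out, word, len - strlen(out) - 1);
}

/* Render the set flags as space-terminated words, in print_route() style. */
static void format_flags(unsigned int flags, char *out, size_t len)
{
	out[0] = '\0';
	if (flags & NH_F_DEAD)
		append(out, len, "dead ");
	if (flags & NH_F_OFFLOAD)
		append(out, len, "offload ");
	if (flags & RT_F_NOTIFY)
		append(out, len, "notify ");
	if (flags & NH_F_LINKDOWN)
		append(out, len, "linkdown ");
}
```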
Re: [PATCH v2] bpf: fix a bug in verification logic when SUB operation taken on FRAME_PTR
On Thu, Jun 18, 2015 at 08:31:45AM +, Wang Nan wrote: > The original code has a problem: the following code fails to pass the verifier: > > r1 <- r10 > r1 -= 8 > r2 = 8 > r3 = unsafe pointer > call BPF_FUNC_probe_read <-- R1 type=inv expected=fp > > However, by replacing 'r1 -= 8' with 'r1 += -8' the above program can be > loaded successfully. > > This is because the verifier allows only a BPF_ADD instruction on a > FRAME_PTR register to forge a PTR_TO_STACK register, but makes BPF_SUB > on a FRAME_PTR register yield an UNKNOWN_VALUE register. > > This patch fixes it by adding BPF_SUB to the stack_relative check. It's not a bug. It catches only ADD by design. If we let it recognize SUB, then one might argue we should let it recognize multiply, shifts and all other arithmetic on pointers, and the verifier will keep getting bigger and bigger. Where do we stop? llvm only emits canonical ADD. If you've seen llvm emitting SUB, let's fix it there. So what piece generated this 'r1 -= 8'?
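A toy model of the design Alexei describes: only "frame pointer + immediate" keeps a stack-pointer type, so the code generator, not the verifier, should canonicalize "r -= K" into "r += -K". The enums and structs are hypothetical and much simpler than the real verifier state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcode and register-type names, not the kernel's. */
enum alu_op { ALU_ADD, ALU_SUB };
enum reg_type { UNKNOWN_VALUE, FRAME_PTR, PTR_TO_STACK };

struct alu_insn {
	enum alu_op op;
	int32_t imm;
};

/* Only ADD of an immediate to the frame pointer yields a stack pointer;
 * any other arithmetic produces an unknown scalar. */
static enum reg_type result_type(enum reg_type dst, struct alu_insn insn)
{
	if (dst == FRAME_PTR && insn.op == ALU_ADD)
		return PTR_TO_STACK;
	return UNKNOWN_VALUE;
}

/* Canonicalize "r -= K" into "r += -K". INT32_MIN is left alone because
 * its negation does not fit in 32 bits. */
static struct alu_insn canonicalize(struct alu_insn insn)
{
	if (insn.op == ALU_SUB && insn.imm != INT32_MIN) {
		insn.op = ALU_ADD;
		insn.imm = -insn.imm;
	}
	return insn;
}
```

Doing this rewrite in the compiler keeps the verifier's pointer-tracking rules small, which is the trade-off argued for in the reply.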
Re: [PATCH net-next 3/3 v5] iproute2: add support to print 'linkdown' nexthop flag
On Thu, Jun 18, 2015 at 08:43:08AM -0700, Scott Feldman wrote: > On Thu, Jun 18, 2015 at 8:22 AM, Andy Gospodarek > wrote: > > Signed-off-by: Andy Gospodaerk > > Signed-off-by: Dinesh Dutt > > > > --- > > ip/iproute.c | 4 > > 1 file changed, 4 insertions(+) > > > > diff --git a/ip/iproute.c b/ip/iproute.c > > index 3795baf..3369c49 100644 > > --- a/ip/iproute.c > > +++ b/ip/iproute.c > > @@ -451,6 +451,8 @@ int print_route(const struct sockaddr_nl *who, struct > > nlmsghdr *n, void *arg) > > fprintf(fp, "offload "); > > if (r->rtm_flags & RTM_F_NOTIFY) > > fprintf(fp, "notify "); > > + if (r->rtm_flags & RTNH_F_LINKDOWN) > > + fprintf(fp, "linkdown "); > > > iproute.c: In function ‘print_route’: > iproute.c:454:21: error: ‘RTNH_F_LINKDOWN’ undeclared (first use in > this function) > iproute.c:454:21: note: each undeclared identifier is reported only > once for each function it appears in Yes, you need to pull that from the patches above into your iproute2 sources. Stephen regularly tells people not to post uapi updates, so I did not.
[PATCH] mvneta: add forgotten initialization of autonegotiation bits
Commit 898b2970e2c9 ("mvneta: implement SGMII-based in-band link state signaling") changed mvneta_adjust_link() so that it does not clear the auto-negotiation bits in the MVNETA_GMAC_AUTONEG_CONFIG register. This was necessary for auto-negotiation mode to work. Unfortunately I didn't check whether these bits are ever initialized. It appears they are not. This patch adds the missing initialization of the auto-negotiation bits in the MVNETA_GMAC_AUTONEG_CONFIG register. It fixes the following regression: https://www.mail-archive.com/netdev@vger.kernel.org/msg67928.html Since the patch was tested to fix a regression, it should be applied to the stable tree. Tested-by: Arnaud Ebalard CC: Thomas Petazzoni CC: Florian Fainelli CC: netdev@vger.kernel.org CC: linux-ker...@vger.kernel.org CC: sta...@vger.kernel.org Signed-off-by: Stas Sergeev --- drivers/net/ethernet/marvell/mvneta.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c index ce5f7f9..74176ec 100644 --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -1013,6 +1013,12 @@ static void mvneta_defaults_set(struct mvneta_port *pp) val = mvreg_read(pp, MVNETA_GMAC_CLOCK_DIVIDER); val |= MVNETA_GMAC_1MS_CLOCK_ENABLE; mvreg_write(pp, MVNETA_GMAC_CLOCK_DIVIDER, val); + } else { + val = mvreg_read(pp, MVNETA_GMAC_AUTONEG_CONFIG); + val &= ~(MVNETA_GMAC_INBAND_AN_ENABLE | + MVNETA_GMAC_AN_SPEED_EN | + MVNETA_GMAC_AN_DUPLEX_EN); + mvreg_write(pp, MVNETA_GMAC_AUTONEG_CONFIG, val); } mvneta_set_ucast_table(pp, -1); -- 1.9.1
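The added else branch is a read-modify-write that clears only the three auto-negotiation enables and leaves every other bit of the register untouched. A user-space sketch of that pattern follows; the bit positions and the simulated register are hypothetical, since the real MVNETA_GMAC_* values are defined in mvneta.c and are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions, not the real MVNETA_GMAC_* values. */
#define AN_INBAND_EN (1u << 0)
#define AN_SPEED_EN  (1u << 1)
#define AN_DUPLEX_EN (1u << 2)

/* Simulated register standing in for MVNETA_GMAC_AUTONEG_CONFIG. */
static uint32_t fake_autoneg_reg;

/* Stand-ins for mvreg_read()/mvreg_write(). */
static uint32_t reg_read(void) { return fake_autoneg_reg; }
static void reg_write(uint32_t v) { fake_autoneg_reg = v; }

/* Clear only the auto-negotiation enables, preserving all other bits. */
static void clear_autoneg_bits(void)
{
	uint32_t val = reg_read();

	val &= ~(AN_INBAND_EN | AN_SPEED_EN | AN_DUPLEX_EN);
	reg_write(val);
}
```

Whatever state firmware or a bootloader left in these bits, the non-in-band path now starts from a known value instead of relying on mvneta_adjust_link() to clear them.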
Re: [RFC V3] net: don't wait for order-3 page allocation
On Thu 18-06-15 17:22:40, Vlastimil Babka wrote: > On 06/18/2015 04:43 PM, Michal Hocko wrote: > >On Thu 18-06-15 07:35:53, Eric Dumazet wrote: > >>On Thu, Jun 18, 2015 at 7:30 AM, Michal Hocko wrote: > >> > >>>Abusing __GFP_NO_KSWAPD is a wrong way to go IMHO. It is true that the > >>>_current_ implementation of the allocator has this nasty and very subtle > >>>side effect but that doesn't mean it should be abused outside of the mm > >>>proper. Why shouldn't this path wake the kswapd and let it compact > >>>memory on the background to increase the success rate for the later > >>>high order allocations? > >> > >>I kind of agree. > >> > >>If kswapd is a problem (is it ???) we should fix it, instead of adding > >>yet another flag to some random locations attempting > >>memory allocations. > > > >No, kswapd is not a problem. The problem is ~__GFP_WAIT allocation can > >access some portion of the memory reserves (see gfp_to_alloc_flags resp. > >__zone_watermark_ok and ALLOC_HARDER). __GFP_NO_KSWAPD is just a dirty > >hack to not give that access which was introduced for THP AFAIR. > > > >The implicit access to memory reserves for non sleeping allocation has > >been there for ages and it might be not suitable for this particular > >path but that doesn't mean another gfp flag with a different side effect > >should be hijacked. We should either stop doing that implicit access to > >memory reserves and give __GFP_RESERVE or add the __GFP_NORESERVE. But > >that is a problem to be solved in the mm proper. Spreading subtle > >dependencies outside of mm will just make situation worse. > > So you are not proposing to use these __GFP_RESERVE/NORESERVE flag outside > of mm, right? (besides, we distinguish several kinds of reserves, so what > exactly would the flag do?) That is to be discussed. Most allocations already express their interest in memory reserves by __GFP_HIGH directly or by GFP_ATOMIC indirectly. So maybe we do not need any additional flag here. 
There are not that many ~__GFP_WAIT allocations, and most of them seem to require it _only_ because the context doesn't allow for sleeping (e.g. to prevent deadlocks). > As that would be also subtle dependency. The > general problem I think is that we should want the mm users to specify > higher-level intentions (such as GFP_KERNEL) which would map to specific > directions (__GFP_*) for the allocator, and currently it's rather a mess of > both kinds of flags. I agree. So I think that maybe we should drop that implicit access to memory reserves for ~__GFP_WAIT allocations and let it do what it is documented to do. > Clearly the intention here is "opportunistic allocation that should > not reclaim/compact, use reserves, wake up kswapd (?) because it's > better to fall back to smaller pages than wait") and we don't seem to > have a GFP_OPPORTUNISTIC flag for that. The allocation has to then > mask out __GFP_WAIT which however looks like an atomic allocation to > the allocator and give access to reserves, etc... I think simply dropping __GFP_WAIT is a good way to express that. The fact that the current implementation gives access to memory reserves implicitly is just a detail and the user of the allocator shouldn't care about that. -- Michal Hocko SUSE Labs
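To make the reserve-access side effect concrete, here is a deliberately simplified model of the decision described in this thread: a request that can neither sleep nor wake kswapd is treated as atomic and gets the ALLOC_HARDER boost unless __GFP_NOMEMALLOC is set. This is modeled on my reading of gfp_to_alloc_flags() from this era; the flag values and `toy_` names are invented, and the real function does considerably more, so treat it as an illustration of the argument rather than allocator code.

```c
#include <stdbool.h>

/* Invented bit values; the real __GFP_* and ALLOC_* constants differ. */
#define TOY_GFP_WAIT       0x01u
#define TOY_GFP_NO_KSWAPD  0x02u
#define TOY_GFP_NOMEMALLOC 0x04u

#define TOY_ALLOC_HARDER   0x10u

/* Simplified: only models whether the request may dip into reserves. */
static unsigned int toy_gfp_to_alloc_flags(unsigned int gfp)
{
	unsigned int alloc_flags = 0;
	/* Neither allowed to sleep nor to wake kswapd: treated as atomic. */
	bool atomic = !(gfp & (TOY_GFP_WAIT | TOY_GFP_NO_KSWAPD));

	if (atomic && !(gfp & TOY_GFP_NOMEMALLOC))
		alloc_flags |= TOY_ALLOC_HARDER;
	return alloc_flags;
}
```

In this model, masking out only the wait bit yields the reserve boost, while adding the no-kswapd bit suppresses it, which is exactly the hidden coupling being objected to.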
Re: [PATCH net] Revert "tcp: switch tcp_fastopen key generation to net_get_random_once"
On 18/06/15 - 04:14:13, Eric Dumazet wrote: > On Thu, 2015-06-18 at 11:32 +0200, Hannes Frederic Sowa wrote: > > > There does not seem to be a better way to handle this. We could try > > > to make the call to kmalloc and crypto_alloc_cipher during bootup, and > > > then generate the random value only on-the-fly (when the first TFO-SYN > > > comes in) with net_get_random_once in order to have the better entropy > > > that comes with doing the late initialisation of the random value. But > > > that's probably net-next material. > > > > can't we simply move the net_get_random_once to the TCP_FASTOPEN setsockopt > > and > > sendmsg(MSG_FASTOPEN) path, so those allocations still happen in process > > context > > but we still defer the extraction of entropy as long as posible? > > Yes, I do not think this would be hard. This bug is old (3.13) and does > not seem very urgent to expedite a revert. True, it would be simpler to move the call to tcp_fastopen_init_key_once() into the setsockopt() and inet_listen() paths. I will resubmit. Christoph
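The plan discussed here separates two steps: set up the cipher state early in process context, and draw the random key lazily on first use so it benefits from as much entropy as possible. A single-threaded sketch of the lazy half, with hypothetical `toy_` names and a deterministic stand-in for the entropy source. The kernel's net_get_random_once() additionally makes the first-use initialization safe against concurrent callers, which this sketch does not attempt.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_KEY_LEN 16

struct toy_fastopen_ctx {
	unsigned char key[TOY_KEY_LEN];
	bool key_ready;
};

static int toy_entropy_calls;

/* Deterministic stand-in for a real entropy source. */
static void toy_get_random_bytes(unsigned char *buf, size_t len)
{
	toy_entropy_calls++;
	for (size_t i = 0; i < len; i++)
		buf[i] = (unsigned char)(0xA5 ^ i);
}

/* Early setup (process context): no entropy is consumed yet. */
static void toy_fastopen_ctx_init(struct toy_fastopen_ctx *ctx)
{
	memset(ctx, 0, sizeof(*ctx));
}

/* First use draws the key exactly once; later calls reuse it. */
static const unsigned char *toy_fastopen_key_once(struct toy_fastopen_ctx *ctx)
{
	if (!ctx->key_ready) {
		toy_get_random_bytes(ctx->key, sizeof(ctx->key));
		ctx->key_ready = true;
	}
	return ctx->key;
}
```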
Re: [PATCH net-next 3/3 v5] iproute2: add support to print 'linkdown' nexthop flag
On Thu, Jun 18, 2015 at 8:22 AM, Andy Gospodarek wrote: > Signed-off-by: Andy Gospodaerk > Signed-off-by: Dinesh Dutt > > --- > ip/iproute.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/ip/iproute.c b/ip/iproute.c > index 3795baf..3369c49 100644 > --- a/ip/iproute.c > +++ b/ip/iproute.c > @@ -451,6 +451,8 @@ int print_route(const struct sockaddr_nl *who, struct > nlmsghdr *n, void *arg) > fprintf(fp, "offload "); > if (r->rtm_flags & RTM_F_NOTIFY) > fprintf(fp, "notify "); > + if (r->rtm_flags & RTNH_F_LINKDOWN) > + fprintf(fp, "linkdown "); iproute.c: In function ‘print_route’: iproute.c:454:21: error: ‘RTNH_F_LINKDOWN’ undeclared (first use in this function) iproute.c:454:21: note: each undeclared identifier is reported only once for each function it appears in
Re: [PATCH v2 0/4] net/macb: add sama5d2 support
On 18/06/2015 at 16:27:19 +0200, Nicolas Ferre wrote : > Hi, > > This series is basically the support for another flavor of the GEM IP > configuration. It ended up being a series because of some little fixes made to > the binding documentation before adding the new compatibility string. > > Bye, > > v2: - fix bindings > - add sama5d2 compatibility string to the binding documentation > > Cyrille Pitchen (1): > net/macb: add config for Atmel sama5d2 SoCs > > Nicolas Ferre (3): > net/macb: bindings doc: fix compatibility string > net/macb: bindings doc/trivial: fix sama5d4 comment > net/macb: bindings doc: add sama5d2 compatibility sting > > Documentation/devicetree/bindings/net/macb.txt | 5 +++-- > drivers/net/ethernet/cadence/macb.c| 8 > 2 files changed, 11 insertions(+), 2 deletions(-) > For the patch set: Acked-by: Alexandre Belloni -- Alexandre Belloni, Free Electrons Embedded Linux, Kernel and Android engineering http://free-electrons.com
[PATCH net-next 1/3 v5] net: track link-status of ipv4 nexthops
Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are reachable via an interface where carrier is off. No action is taken, but additional flags are passed to userspace to indicate carrier status. This also includes a cleanup to fib_disable_ip to more clearly indicate what event made the function call to replace the more cryptic force option previously used. v2: Split out kernel functionality into 2 patches, this patch simply sets and clears new nexthop flag RTNH_F_LINKDOWN. v3: Cleanups suggested by Alex as well as a bug noticed in fib_sync_down_dev and fib_sync_up when multipath was not enabled. v5: Whitespace and variable declaration fixups suggested by Dave Signed-off-by: Andy Gospodarek Signed-off-by: Dinesh Dutt --- include/net/ip_fib.h | 4 +-- include/uapi/linux/rtnetlink.h | 3 +++ net/ipv4/fib_frontend.c| 22 ++-- net/ipv4/fib_semantics.c | 60 +- 4 files changed, 66 insertions(+), 23 deletions(-) diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h index 54271ed..f73d27c 100644 --- a/include/net/ip_fib.h +++ b/include/net/ip_fib.h @@ -305,9 +305,9 @@ void fib_flush_external(struct net *net); /* Exported by fib_semantics.c */ int ip_fib_check_default(__be32 gw, struct net_device *dev); -int fib_sync_down_dev(struct net_device *dev, int force); +int fib_sync_down_dev(struct net_device *dev, unsigned long event); int fib_sync_down_addr(struct net *net, __be32 local); -int fib_sync_up(struct net_device *dev); +int fib_sync_up(struct net_device *dev, unsigned int nh_flags); void fib_select_multipath(struct fib_result *res); /* Exported by fib_trie.c */ diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h index 17fb02f..8ab874a 100644 --- a/include/uapi/linux/rtnetlink.h +++ b/include/uapi/linux/rtnetlink.h @@ -338,6 +338,9 @@ struct rtnexthop { #define RTNH_F_PERVASIVE 2 /* Do recursive gateway lookup */ #define RTNH_F_ONLINK 4 /* Gateway is forced on link*/ #define RTNH_F_OFFLOAD 8 /* offloaded route */ +#define 
RTNH_F_LINKDOWN16 /* carrier-down on nexthop */ + +#define RTNH_F_COMPARE_MASK(RTNH_F_DEAD | RTNH_F_LINKDOWN) /* used as mask for route comparisons */ /* Macros to handle hexthops */ diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index 872494e..54d3c45 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -1063,9 +1063,9 @@ static void nl_fib_lookup_exit(struct net *net) net->ipv4.fibnl = NULL; } -static void fib_disable_ip(struct net_device *dev, int force) +static void fib_disable_ip(struct net_device *dev, unsigned long event) { - if (fib_sync_down_dev(dev, force)) + if (fib_sync_down_dev(dev, event)) fib_flush(dev_net(dev)); rt_cache_flush(dev_net(dev)); arp_ifdown(dev); @@ -1081,7 +1081,7 @@ static int fib_inetaddr_event(struct notifier_block *this, unsigned long event, case NETDEV_UP: fib_add_ifaddr(ifa); #ifdef CONFIG_IP_ROUTE_MULTIPATH - fib_sync_up(dev); + fib_sync_up(dev, RTNH_F_DEAD); #endif atomic_inc(&net->ipv4.dev_addr_genid); rt_cache_flush(dev_net(dev)); @@ -1093,7 +1093,7 @@ static int fib_inetaddr_event(struct notifier_block *this, unsigned long event, /* Last address was deleted from this interface. * Disable IP. 
*/ - fib_disable_ip(dev, 1); + fib_disable_ip(dev, event); } else { rt_cache_flush(dev_net(dev)); } @@ -1107,9 +1107,10 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo struct net_device *dev = netdev_notifier_info_to_dev(ptr); struct in_device *in_dev; struct net *net = dev_net(dev); + unsigned int flags; if (event == NETDEV_UNREGISTER) { - fib_disable_ip(dev, 2); + fib_disable_ip(dev, event); rt_flush_dev(dev); return NOTIFY_DONE; } @@ -1124,16 +1125,21 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo fib_add_ifaddr(ifa); } endfor_ifa(in_dev); #ifdef CONFIG_IP_ROUTE_MULTIPATH - fib_sync_up(dev); + fib_sync_up(dev, RTNH_F_DEAD); #endif atomic_inc(&net->ipv4.dev_addr_genid); rt_cache_flush(net); break; case NETDEV_DOWN: - fib_disable_ip(dev, 0); + fib_disable_ip(dev, event); break; - case NETDEV_CHANGEMTU: case NETDEV_CHANGE: + flags = dev_get_flags(dev); + if (flags & (IFF_RUNNING|IFF_LOWER_UP)) + fi
[PATCH net-next 2/3 v5] net: ipv4 sysctl option to ignore routes when nexthop link is down
This feature is only enabled with the new per-interface or ipv4 global sysctls called 'ignore_routes_with_linkdown'. net.ipv4.conf.all.ignore_routes_with_linkdown = 0 net.ipv4.conf.default.ignore_routes_with_linkdown = 0 net.ipv4.conf.lo.ignore_routes_with_linkdown = 0 ... When the above sysctls are set, the kernel will report to userspace that a route is dead and will no longer resolve to this nexthop when performing a fib lookup. This will signal to userspace that the route will not be selected. The RTNH_F_DEAD flag is only passed to userspace if the sysctl is enabled and the link is down. This was done because, without it, netlink listeners would have no idea whether or not a nexthop would be selected. The kernel only sets RTNH_F_DEAD internally if the interface has IFF_UP cleared. With the new sysctl set, the following behavior can be observed (interface p8p1 is link-down): default via 10.0.5.2 dev p9p1 10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15 70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1 80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 dead linkdown 90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 dead linkdown 90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2 90.0.0.1 via 70.0.0.2 dev p7p1 src 70.0.0.1 cache local 80.0.0.1 dev lo src 80.0.0.1 cache 80.0.0.2 via 10.0.5.2 dev p9p1 src 10.0.5.15 cache While the route does remain in the table (so it can be modified if needed rather than being wiped away as it would be if IFF_UP was cleared), the proper next-hop is chosen automatically when the link is down.
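The lookup behaviour described above can be modelled as a filter applied before choosing among candidate next hops. This sketch is hypothetical: it reuses the flag values from the patch series (RTNH_F_DEAD is 1, RTNH_F_LINKDOWN is 16) but reduces route selection to "lowest metric among usable candidates", and it mirrors the rule that RTNH_F_DEAD is reported to userspace only when the sysctl is enabled and the link is down.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NH_DEAD     1u   /* mirrors RTNH_F_DEAD */
#define TOY_NH_LINKDOWN 16u  /* mirrors RTNH_F_LINKDOWN */

struct toy_nexthop {
	const char *dev;
	unsigned int flags;
	int metric;
};

/* Pick the lowest-metric candidate that is usable. Dead next hops are
 * always skipped; linkdown ones only when the sysctl is set. */
static const struct toy_nexthop *
toy_select(const struct toy_nexthop *nhs, size_t n, bool ignore_linkdown)
{
	const struct toy_nexthop *best = NULL;

	for (size_t i = 0; i < n; i++) {
		const struct toy_nexthop *nh = &nhs[i];

		if (nh->flags & TOY_NH_DEAD)
			continue;
		if (ignore_linkdown && (nh->flags & TOY_NH_LINKDOWN))
			continue;
		if (!best || nh->metric < best->metric)
			best = nh;
	}
	return best;
}

/* Flags as they would be reported to userspace: "dead" is added only
 * when the sysctl is enabled and the link is down. */
static unsigned int toy_reported_flags(const struct toy_nexthop *nh,
				       bool ignore_linkdown)
{
	unsigned int f = nh->flags;

	if (ignore_linkdown && (f & TOY_NH_LINKDOWN))
		f |= TOY_NH_DEAD;
	return f;
}
```

With the two 90.0.0.0/24 routes from the example (metric 1 via p8p1, which is linkdown, and metric 2 via p7p1), the p8p1 route wins with the sysctl off and the p7p1 route wins with it on.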
Now interface p8p1 is linked-up: default via 10.0.5.2 dev p9p1 10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15 70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1 80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2 192.168.56.0/24 dev p2p1 proto kernel scope link src 192.168.56.2 90.0.0.1 via 80.0.0.2 dev p8p1 src 80.0.0.1 cache local 80.0.0.1 dev lo src 80.0.0.1 cache 80.0.0.2 dev p8p1 src 80.0.0.1 cache and the output changes to what one would expect. If the sysctl is not set, the following output would be expected when p8p1 is down: default via 10.0.5.2 dev p9p1 10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15 70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1 80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 linkdown 90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 linkdown 90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2 Since the dead flag does not appear, there should be no expectation that the kernel would skip using this route due to link being down. v2: Split kernel changes into 2 patches, this actually makes a behavioral change if the sysctl is set. Also took suggestion from Alex to simplify code by only checking sysctl during fib lookup and suggestion from Scott to add a per-interface sysctl. v3: Code clean-ups to make it more readable and efficient as well as a reverse path check fix. 
v4: Drop binary sysctl v5: Whitespace fixups from Dave Signed-off-by: Andy Gospodarek Signed-off-by: Dinesh Dutt --- include/linux/inetdevice.h| 3 +++ include/net/fib_rules.h | 3 ++- include/net/ip_fib.h | 16 +--- include/uapi/linux/ip.h | 1 + net/ipv4/devinet.c| 2 ++ net/ipv4/fib_frontend.c | 6 +++--- net/ipv4/fib_rules.c | 5 +++-- net/ipv4/fib_semantics.c | 31 ++- net/ipv4/fib_trie.c | 7 +++ net/ipv4/netfilter/ipt_rpfilter.c | 2 +- net/ipv4/route.c | 10 +- 11 files changed, 62 insertions(+), 24 deletions(-) diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h index 0a21fbe..a4328ce 100644 --- a/include/linux/inetdevice.h +++ b/include/linux/inetdevice.h @@ -120,6 +120,9 @@ static inline void ipv4_devconf_setall(struct in_device *in_dev) || (!IN_DEV_FORWARD(in_dev) && \ IN_DEV_ORCONF((in_dev), ACCEPT_REDIRECTS))) +#define IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) \ + IN_DEV_CONF_GET((in_dev), IGNORE_ROUTES_WITH_LINKDOWN) + #define IN_DEV_ARPFILTER(in_dev) IN_DEV_ORCONF((in_dev), ARPFILTER) #define IN_DEV_ARP_ACCEPT(in_dev) IN_DEV_ORCONF((in_dev), ARP_ACCEPT) #define IN_DEV_ARP_ANNOUNCE(in_dev)IN_DEV_MAXCONF((in_dev), ARP_ANNOUNCE) diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h index 6d67383..903a55e 100644 --- a/include/net/fib_rules.h +++ b/include/net/fib_rules.h @@ -36,7 +36,8 @@ struct fib_lookup_arg { void*result; struct fib_rule *rule; int flags; -#define FIB_LOOKUP_NOREF 1 +#define FIB_LOOKUP_NOREF 1 +#define FIB_LOOKUP_IGNORE_LINKSTATE2 }; struct fib_rules_ops { diff --git a/include/net/ip_fib.h b/include/n
[PATCH net-next 3/3 v5] iproute2: add support to print 'linkdown' nexthop flag
Signed-off-by: Andy Gospodaerk Signed-off-by: Dinesh Dutt --- ip/iproute.c | 4 1 file changed, 4 insertions(+) diff --git a/ip/iproute.c b/ip/iproute.c index 3795baf..3369c49 100644 --- a/ip/iproute.c +++ b/ip/iproute.c @@ -451,6 +451,8 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg) fprintf(fp, "offload "); if (r->rtm_flags & RTM_F_NOTIFY) fprintf(fp, "notify "); + if (r->rtm_flags & RTNH_F_LINKDOWN) + fprintf(fp, "linkdown "); if (tb[RTA_MARK]) { unsigned int mark = *(unsigned int*)RTA_DATA(tb[RTA_MARK]); if (mark) { @@ -670,6 +672,8 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg) fprintf(fp, " onlink"); if (nh->rtnh_flags & RTNH_F_PERVASIVE) fprintf(fp, " pervasive"); + if (nh->rtnh_flags & RTNH_F_LINKDOWN) + fprintf(fp, " linkdown"); len -= NLMSG_ALIGN(nh->rtnh_len); nh = RTNH_NEXT(nh); } -- 1.9.3
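The per-nexthop hunk follows the existing pattern of testing one bit per flag and appending a word. A table-driven variant of the same idea, written as a hypothetical helper that fills a caller buffer instead of a FILE *; the fallback values match the RTNH_F_* definitions in the uapi header as extended by patch 1/3, the `__has_include` guard assumes GCC or Clang, and the print order is this sketch's own choice rather than iproute2's.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#if defined(__has_include)
#if __has_include(<linux/rtnetlink.h>)
#include <linux/rtnetlink.h>
#endif
#endif

#ifndef RTNH_F_DEAD
#define RTNH_F_DEAD 1
#endif
#ifndef RTNH_F_PERVASIVE
#define RTNH_F_PERVASIVE 2
#endif
#ifndef RTNH_F_ONLINK
#define RTNH_F_ONLINK 4
#endif
#ifndef RTNH_F_OFFLOAD
#define RTNH_F_OFFLOAD 8
#endif
#ifndef RTNH_F_LINKDOWN
#define RTNH_F_LINKDOWN 16
#endif

static const struct {
	unsigned int bit;
	const char *name;
} nh_flag_names[] = {
	{ RTNH_F_DEAD, "dead" },
	{ RTNH_F_ONLINK, "onlink" },
	{ RTNH_F_PERVASIVE, "pervasive" },
	{ RTNH_F_OFFLOAD, "offload" },
	{ RTNH_F_LINKDOWN, "linkdown" },
};

/* Append " <name>" for each set flag; returns the string length.
 * On overflow the output is truncated but always NUL-terminated. */
static size_t format_nh_flags(unsigned int flags, char *out, size_t outlen)
{
	size_t used = 0;

	if (outlen == 0)
		return 0;
	out[0] = '\0';
	for (size_t i = 0; i < sizeof(nh_flag_names) / sizeof(nh_flag_names[0]); i++) {
		size_t room = outlen - used;
		int n;

		if (!(flags & nh_flag_names[i].bit))
			continue;
		n = snprintf(out + used, room, " %s", nh_flag_names[i].name);
		if (n < 0)
			break;
		if ((size_t)n >= room) {
			/* snprintf filled the buffer and terminated it */
			used = outlen - 1;
			break;
		}
		used += (size_t)n;
	}
	return used;
}
```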
[PATCH net-next 0/3 v5] changes to make ipv4 routing table aware of next-hop link status
This series adds the ability to have the Linux kernel track whether or not a particular route should be used based on the link-status of the interface associated with the next-hop. Before this patch any link-failure on an interface that was serving as a gateway for some systems could result in those systems being isolated from the rest of the network as the stack would continue to attempt to send frames out of an interface that is actually linked-down. When the kernel is responsible for all forwarding, it should also be responsible for taking action when the traffic can no longer be forwarded -- there is no real need to outsource link-monitoring to userspace anymore. This feature is only enabled with the new per-interface or ipv4 global sysctls called 'ignore_routes_with_linkdown'. net.ipv4.conf.all.ignore_routes_with_linkdown = 0 net.ipv4.conf.default.ignore_routes_with_linkdown = 0 net.ipv4.conf.lo.ignore_routes_with_linkdown = 0 ... When the above sysctls are set, the kernel will not only report to userspace that the link is down, but it will also report to userspace that a route is dead. This will signal to userspace that the route will not be selected. 
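Each dotted sysctl name listed above corresponds to a file under /proc/sys, with the dots replaced by slashes. A small sketch for checking the knob from C, using hypothetical helper names; it returns -1 on kernels that do not provide the sysctl, and it does not handle interface names that themselves contain dots (such as VLAN devices).

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* "net.ipv4.conf.all.ignore_routes_with_linkdown" ->
 * "/proc/sys/net/ipv4/conf/all/ignore_routes_with_linkdown".
 * Returns 0 on success, -1 if the buffer is too small. */
static int toy_sysctl_to_path(const char *name, char *out, size_t outlen)
{
	static const char prefix[] = "/proc/sys/";
	size_t need = strlen(prefix) + strlen(name) + 1;

	if (need > outlen)
		return -1;
	strcpy(out, prefix);
	char *p = out + strlen(prefix);
	for (const char *s = name; *s; s++)
		*p++ = (*s == '.') ? '/' : *s;
	*p = '\0';
	return 0;
}

/* Read an integer sysctl; -1 if it is missing or unreadable. */
static int toy_read_sysctl_int(const char *name, int *value)
{
	char path[256];
	FILE *f;
	int ok;

	if (toy_sysctl_to_path(name, path, sizeof(path)) != 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fscanf(f, "%d", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```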
With the new sysctls set, the following behavior can be observed (interface p8p1 is link-down): # ip route show default via 10.0.5.2 dev p9p1 10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15 70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1 80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 dead linkdown 90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 dead linkdown 90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2 # ip route get 90.0.0.1 90.0.0.1 via 70.0.0.2 dev p7p1 src 70.0.0.1 cache # ip route get 80.0.0.1 local 80.0.0.1 dev lo src 80.0.0.1 cache # ip route get 80.0.0.2 80.0.0.2 via 10.0.5.2 dev p9p1 src 10.0.5.15 cache While the route does remain in the table (so it can be modified if needed rather than being wiped away as it would be if IFF_UP was cleared), the proper next-hop is chosen automatically when the link is down. Now interface p8p1 is linked-up: # ip route show default via 10.0.5.2 dev p9p1 10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15 70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1 80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2 192.168.56.0/24 dev p2p1 proto kernel scope link src 192.168.56.2 # ip route get 90.0.0.1 90.0.0.1 via 80.0.0.2 dev p8p1 src 80.0.0.1 cache # ip route get 80.0.0.1 local 80.0.0.1 dev lo src 80.0.0.1 cache # ip route get 80.0.0.2 80.0.0.2 dev p8p1 src 80.0.0.1 cache and the output changes to what one would expect. 
If the global or interface sysctl is not set, the following output would be expected when p8p1 is down: # ip route show default via 10.0.5.2 dev p9p1 10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15 70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1 80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 linkdown 90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 linkdown 90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2 If the dead flag does not appear there should be no expectation that the kernel would skip using this route due to link being down. v2: Split kernel changes into 2 patches: first to add linkdown flag and second to add new sysctl settings. Also took suggestion from Alex to simplify code by only checking sysctl during fib lookup and suggestion from Scott to add a per-interface sysctl. Added iproute2 patch to recognize and print linkdown flag. v3: Code cleanups along with reverse-path checks suggested by Alex and small fixes related to problems found when multipath was disabled. v4: Drop binary sysctls v5: Whitespace and variable declaration fixups suggested by Dave Though there were some that preferred not to have a configuration option and to make this behavior the default when it was discussed in Ottawa earlier this year since "it was time to do this." I wanted to propose the config option to preserve the current behavior for those that desire it. I'll happily remove it if Dave and Linus approve. An IPv6 implementation is also needed (DECnet too!), but I wanted to start with the IPv4 implementation to get people comfortable with the idea before moving forward. If this is accepted the IPv6 implementation can be posted shortly. There was also a request for switchdev support for this, but that will be posted as a followup as switchdev does not currently handle dead next-hops in a multi-path case and I felt that infra needed to be added first. 
FWIW, we have been running the original version of this series with a global sysctl and our customers have been happily using a backported version for IPv4 and IPv6 for >6 months. Andy Gospodarek (3): net: track link-status of ipv4 nexthops net: ipv4 sysctl option to ignore routes when
Re: [RFC V3] net: don't wait for order-3 page allocation
On 06/18/2015 04:43 PM, Michal Hocko wrote: On Thu 18-06-15 07:35:53, Eric Dumazet wrote: On Thu, Jun 18, 2015 at 7:30 AM, Michal Hocko wrote: Abusing __GFP_NO_KSWAPD is a wrong way to go IMHO. It is true that the _current_ implementation of the allocator has this nasty and very subtle side effect but that doesn't mean it should be abused outside of the mm proper. Why shouldn't this path wake the kswapd and let it compact memory on the background to increase the success rate for the later high order allocations? I kind of agree. If kswapd is a problem (is it ???) we should fix it, instead of adding yet another flag to some random locations attempting memory allocations. No, kswapd is not a problem. The problem is ~__GFP_WAIT allocation can access some portion of the memory reserves (see gfp_to_alloc_flags resp. __zone_watermark_ok and ALLOC_HARDER). __GFP_NO_KSWAPD is just a dirty hack to not give that access which was introduced for THP AFAIR. The implicit access to memory reserves for non sleeping allocation has been there for ages and it might be not suitable for this particular path but that doesn't mean another gfp flag with a different side effect should be hijacked. We should either stop doing that implicit access to memory reserves and give __GFP_RESERVE or add the __GFP_NORESERVE. But that is a problem to be solved in the mm proper. Spreading subtle dependencies outside of mm will just make situation worse. So you are not proposing to use these __GFP_RESERVE/NORESERVE flag outside of mm, right? (besides, we distinguish several kinds of reserves, so what exactly would the flag do?) As that would be also subtle dependency. The general problem I think is that we should want the mm users to specify higher-level intentions (such as GFP_KERNEL) which would map to specific directions (__GFP_*) for the allocator, and currently it's rather a mess of both kinds of flags. 
Clearly the intention here is "opportunistic allocation that should not reclaim/compact, use reserves, wake up kswapd (?) because it's better to fall back to smaller pages than wait") and we don't seem to have a GFP_OPPORTUNISTIC flag for that. The allocation has to then mask out __GFP_WAIT which however looks like an atomic allocation to the allocator and give access to reserves, etc...
Re: [PATCH net-next 1/3 v4] net: track link-status of ipv4 nexthops
On Thu, Jun 18, 2015 at 03:26:30AM -0700, David Miller wrote: > From: Andy Gospodarek > Date: Mon, 15 Jun 2015 12:33:19 -0400 > > > @@ -1107,9 +1107,10 @@ static int fib_netdev_event(struct notifier_block > > *this, unsigned long event, vo > > struct net_device *dev = netdev_notifier_info_to_dev(ptr); > > struct in_device *in_dev; > > struct net *net = dev_net(dev); > > + unsigned flags; > > Please always fully spell out "unsigned int" instead of shortening it to > just "unsigned", thanks. > > > @@ -920,11 +926,17 @@ struct fib_info *fib_create_info(struct fib_config > > *cfg) > > if (!nh->nh_dev) > > goto failure; > > } else { > > + int linkdown = 0; > > change_nexthops(fi) { > > Please put an empty line between local variable declarations and > code. Ugh, thanks. I'll fix this up, along with your other comments, in v5.
Re: [PATCH net-next 00/15] Simplify netfilter and network namespaces
Cc list trimmed as this is no longer about the original patch submission. Julian Anastasov writes: > Hello, > > On Wed, 17 Jun 2015, Eric W. Biederman wrote: > >> p.s. I do have my patch that I can toss in your direction if you are >> interested. > > Of course... I'll be able to check it after 8 hours... My incremental patch for ipvs on top of everything else I have pushed out looks like this: From: "Eric W. Biederman" Date: Fri, 12 Jun 2015 18:34:12 -0500 Subject: [PATCH] ipvs: Pass struct net down to where it is needed and used Pass struct net down to where it is used and stop guessing which network namespace should be used. Signed-off-by: "Eric W. Biederman" --- include/net/ip_vs.h | 45 +++- net/netfilter/ipvs/ip_vs_conn.c | 11 ++- net/netfilter/ipvs/ip_vs_core.c | 118 ++-- net/netfilter/ipvs/ip_vs_ftp.c | 8 +-- net/netfilter/ipvs/ip_vs_proto_ah_esp.c | 9 ++- net/netfilter/ipvs/ip_vs_proto_sctp.c | 5 +- net/netfilter/ipvs/ip_vs_proto_tcp.c| 8 +-- net/netfilter/ipvs/ip_vs_proto_udp.c| 5 +- net/netfilter/ipvs/ip_vs_xmit.c | 51 -- net/netfilter/xt_ipvs.c | 2 +- 10 files changed, 108 insertions(+), 154 deletions(-) diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h index 4e3731ee4eac..a556d14cff70 100644 --- a/include/net/ip_vs.h +++ b/include/net/ip_vs.h @@ -35,37 +35,6 @@ static inline struct netns_ipvs *net_ipvs(struct net* net) return net->ipvs; } -/* Get net ptr from skb in traffic cases - * use skb_sknet when call is from userland (ioctl or netlink) - */ -static inline struct net *skb_net(const struct sk_buff *skb) -{ -#ifdef CONFIG_NET_NS -#ifdef CONFIG_IP_VS_DEBUG - /* -* This is used for debug only.
-* Start with the most likely hit -* End with BUG -*/ - if (likely(skb->dev && dev_net(skb->dev))) - return dev_net(skb->dev); - if (skb_dst(skb) && skb_dst(skb)->dev) - return dev_net(skb_dst(skb)->dev); - WARN(skb->sk, "Maybe skb_sknet should be used in %s() at line:%d\n", - __func__, __LINE__); - if (likely(skb->sk && sock_net(skb->sk))) - return sock_net(skb->sk); - pr_err("There is no net ptr to find in the skb in %s() line:%d\n", - __func__, __LINE__); - BUG(); -#else - return dev_net(skb->dev ? : skb_dst(skb)->dev); -#endif -#else - return &init_net; -#endif -} - static inline struct net *skb_sknet(const struct sk_buff *skb) { #ifdef CONFIG_NET_NS @@ -441,19 +410,19 @@ struct ip_vs_protocol { void (*exit_netns)(struct net *net, struct ip_vs_proto_data *pd); - int (*conn_schedule)(int af, struct sk_buff *skb, + int (*conn_schedule)(struct net *net, int af, struct sk_buff *skb, struct ip_vs_proto_data *pd, int *verdict, struct ip_vs_conn **cpp, struct ip_vs_iphdr *iph); struct ip_vs_conn * - (*conn_in_get)(int af, + (*conn_in_get)(struct net *net, int af, const struct sk_buff *skb, const struct ip_vs_iphdr *iph, int inverse); struct ip_vs_conn * - (*conn_out_get)(int af, + (*conn_out_get)(struct net *net, int af, const struct sk_buff *skb, const struct ip_vs_iphdr *iph, int inverse); @@ -1179,13 +1148,15 @@ static inline void ip_vs_conn_fill_param(struct net *net, int af, int protocol, struct ip_vs_conn *ip_vs_conn_in_get(const struct ip_vs_conn_param *p); struct ip_vs_conn *ip_vs_ct_in_get(const struct ip_vs_conn_param *p); -struct ip_vs_conn * ip_vs_conn_in_get_proto(int af, const struct sk_buff *skb, +struct ip_vs_conn * ip_vs_conn_in_get_proto(struct net *net, int af, + const struct sk_buff *skb, const struct ip_vs_iphdr *iph, int inverse); struct ip_vs_conn *ip_vs_conn_out_get(const struct ip_vs_conn_param *p); -struct ip_vs_conn * ip_vs_conn_out_get_proto(int af, const struct sk_buff *skb, +struct ip_vs_conn * ip_vs_conn_out_get_proto(struct net *net, 
int af, +const struct sk_buff *skb, const struct ip_vs_iphdr *iph, int inverse); @@ -1215,7 +1186,7 @@ void ip_vs_conn_expire_now(struct ip_vs_conn *cp); const char *ip_vs_state_name(__u16 proto, int state); -void ip_vs_tcp_conn_listen(struct net *net, struct ip_vs_conn *cp); +void ip_vs_tcp_conn_listen(struct ip_vs_conn *cp); int ip_vs_check_template(struct ip_vs_conn *ct); void ip_vs_random_dropentry(struct net *net); int
Re: [PATCH next v2] bonding: Display LACP info only to CAP_NET_ADMIN capable user
On Thu, Jun 18, 2015 at 04:17:36AM -0700, Eric Dumazet wrote: > On Wed, 2015-06-17 at 17:59 -0700, Mahesh Bandewar wrote: > > Actor and Partner details can be accessed via proc-fs, sys-fs > > entries or netlink interface. These interfaces are world readable > > at this moment. The earlier patch-series made the LACP communication > > secure to avoid nuisance attack from within the same L2 domain but > > it did not prevent "someone unprivileged" looking at that information > > on host and perform the same act. > > > > This patch essentially avoids spitting those entries if the user > > in question does not have enough privileges. > > > > Signed-off-by: Mahesh Bandewar > > --- > > drivers/net/bonding/bond_netlink.c | 11 ++-- > > drivers/net/bonding/bond_procfs.c | 101 > > +++-- > > drivers/net/bonding/bond_sysfs.c | 12 ++--- > > 3 files changed, 67 insertions(+), 57 deletions(-) > > > > diff --git a/drivers/net/bonding/bond_netlink.c > > b/drivers/net/bonding/bond_netlink.c > > index 5580fcde738f..3fd3aa4b145e 100644 > > --- a/drivers/net/bonding/bond_netlink.c > > +++ b/drivers/net/bonding/bond_netlink.c > > @@ -600,18 +600,23 @@ static int bond_fill_info(struct sk_buff *skb, > > > > if (BOND_MODE(bond) == BOND_MODE_8023AD) { > > struct ad_info info; > > + u8 zero_mac[ETH_ALEN]; > > > > + eth_zero_addr(zero_mac); > > if (nla_put_u16(skb, IFLA_BOND_AD_ACTOR_SYS_PRIO, > > - bond->params.ad_actor_sys_prio)) > > + capable(CAP_NET_ADMIN) ? > > + bond->params.ad_actor_sys_prio : 0)) > > goto nla_put_failure; > > > > if (nla_put_u16(skb, IFLA_BOND_AD_USER_PORT_KEY, > > - bond->params.ad_user_port_key)) > > + capable(CAP_NET_ADMIN) ? > > + bond->params.ad_user_port_key : 0)) > > goto nla_put_failure; > > > > if (nla_put(skb, IFLA_BOND_AD_ACTOR_SYSTEM, > > sizeof(bond->params.ad_actor_system), > > - &bond->params.ad_actor_system)) > > + capable(CAP_NET_ADMIN) ? > > + &bond->params.ad_actor_system : &zero_mac)) > > goto nla_put_failure; > > > > Hmm... 
I would rather not send these fake attributes at all ? That would be my preference as well. Sorry if my lack of elaboration in my earlier email made this confusing. If there are values that should not be visible to non-root users, then don't send them at all. Do not just send NULL values.
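The preference expressed here is to make emission of the attributes conditional rather than substituting zeroes. A toy sketch of that shape, assuming a made-up TLV buffer in place of netlink (none of the `toy_` names are kernel APIs) and a `privileged` argument standing in for whatever capability check the bonding code ends up using:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy type-length-value buffer; not the netlink wire format. */
struct toy_msg {
	unsigned char buf[64];
	size_t len;
};

#define TOY_ATTR_ACTOR_SYS_PRIO 1
#define TOY_ATTR_USER_PORT_KEY  2

/* Append a 16-bit attribute: 2-byte type, 2-byte length, 2-byte value. */
static int toy_put_u16(struct toy_msg *m, uint16_t type, uint16_t val)
{
	const size_t need = 2 * sizeof(uint16_t) + sizeof(val);
	uint16_t l = sizeof(val);

	if (m->len + need > sizeof(m->buf))
		return -1;
	memcpy(m->buf + m->len, &type, sizeof(type));
	memcpy(m->buf + m->len + 2, &l, sizeof(l));
	memcpy(m->buf + m->len + 4, &val, sizeof(val));
	m->len += need;
	return 0;
}

/* Unprivileged readers get no LACP attributes at all, instead of
 * attributes filled with zeroes. */
static int toy_fill_lacp_info(struct toy_msg *m, bool privileged,
			      uint16_t prio, uint16_t key)
{
	if (!privileged)
		return 0;
	if (toy_put_u16(m, TOY_ATTR_ACTOR_SYS_PRIO, prio))
		return -1;
	if (toy_put_u16(m, TOY_ATTR_USER_PORT_KEY, key))
		return -1;
	return 0;
}
```

A listener can then tell "not visible to you" (attribute absent) apart from a genuine zero value.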
Re: [RFC V3] net: don't wait for order-3 page allocation
On Thu 18-06-15 07:35:53, Eric Dumazet wrote: > On Thu, Jun 18, 2015 at 7:30 AM, Michal Hocko wrote: > > > Abusing __GFP_NO_KSWAPD is the wrong way to go IMHO. It is true that the > > _current_ implementation of the allocator has this nasty and very subtle > > side effect but that doesn't mean it should be abused outside of the mm > > proper. Why shouldn't this path wake the kswapd and let it compact > > memory in the background to increase the success rate for the later > > high order allocations? > > I kind of agree. > > If kswapd is a problem (is it ???) we should fix it, instead of adding > yet another flag to some random locations attempting > memory allocations. No, kswapd is not a problem. The problem is that a ~__GFP_WAIT allocation can access some portion of the memory reserves (see ALLOC_HARDER in gfp_to_alloc_flags and __zone_watermark_ok). __GFP_NO_KSWAPD is just a dirty hack, introduced for THP AFAIR, to not give that access. The implicit access to memory reserves for non-sleeping allocations has been there for ages and it might not be suitable for this particular path, but that doesn't mean another gfp flag with a different side effect should be hijacked. We should either stop doing that implicit access to memory reserves and add __GFP_RESERVE, or add __GFP_NORESERVE. But that is a problem to be solved in mm proper. Spreading subtle dependencies outside of mm will just make the situation worse. -- Michal Hocko SUSE Labs
Re: [RFC V3] net: don't wait for order-3 page allocation
On Thu, Jun 18, 2015 at 7:30 AM, Michal Hocko wrote: > Abusing __GFP_NO_KSWAPD is the wrong way to go IMHO. It is true that the > _current_ implementation of the allocator has this nasty and very subtle > side effect but that doesn't mean it should be abused outside of the mm > proper. Why shouldn't this path wake the kswapd and let it compact > memory in the background to increase the success rate for the later > high order allocations? I kind of agree. If kswapd is a problem (is it ???) we should fix it, instead of adding yet another flag to some random locations attempting memory allocations.
Re: [RFC V3] net: don't wait for order-3 page allocation
On Wed 17-06-15 16:02:59, David Rientjes wrote: > On Fri, 12 Jun 2015, Vlastimil Babka wrote: > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > > index 3cfff2a..41ec022 100644 > > > --- a/net/core/skbuff.c > > > +++ b/net/core/skbuff.c > > > @@ -4398,7 +4398,7 @@ struct sk_buff *alloc_skb_with_frags(unsigned long > > > header_len, > > > > > > while (order) { > > > if (npages >= 1 << order) { > > > - page = alloc_pages(gfp_mask | > > > + page = alloc_pages((gfp_mask & ~__GFP_WAIT) | > > > __GFP_COMP | > > > __GFP_NOWARN | > > > __GFP_NORETRY, > > > > Note that __GFP_NORETRY is weaker than ~__GFP_WAIT and thus redundant. But > > it > > won't hurt anything leaving it there. And you might consider __GFP_NO_KSWAPD > > instead, as I said in the other thread. > > > > Yeah, I agreed with __GFP_NO_KSWAPD to avoid utilizing memory reserves for > this. Abusing __GFP_NO_KSWAPD is the wrong way to go IMHO. It is true that the _current_ implementation of the allocator has this nasty and very subtle side effect, but that doesn't mean it should be abused outside of the mm proper. Why shouldn't this path wake the kswapd and let it compact memory in the background to increase the success rate for the later high order allocations? -- Michal Hocko SUSE Labs
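The diff above changes only the alloc_pages() call inside the `while (order)` loop, so only the opportunistic high-order attempts lose __GFP_WAIT. Below is a simplified userspace model of that descending-order loop. The TOY_* flag values, toy_alloc_pages() and its success rule are hypothetical stand-ins, the order-0 fallback that keeps the caller's mask is an assumption of the sketch, and memory reserves, kswapd and compaction are not modelled at all.

```c
#include <stdbool.h>

/* Hypothetical flag bits standing in for __GFP_WAIT, __GFP_COMP,
 * __GFP_NOWARN and __GFP_NORETRY; the values are illustrative only. */
#define TOY_GFP_WAIT    0x1u
#define TOY_GFP_COMP    0x2u
#define TOY_GFP_NOWARN  0x4u
#define TOY_GFP_NORETRY 0x8u

/* Hypothetical allocator model: a request succeeds when a free block of
 * that order exists; only an order-0 request that may sleep (WAIT set)
 * is assumed to always succeed by reclaiming. */
static bool toy_alloc_pages(unsigned int flags, int order, int highest_free_order)
{
    if (order <= highest_free_order)
        return true;
    return order == 0 && (flags & TOY_GFP_WAIT);
}

/* Pick the page order for the next fragment: non-sleeping attempts at
 * high orders first, then an order-0 fallback with the caller's mask.
 * Returns the order obtained, or -1 if even order 0 fails. */
static int toy_pick_order(unsigned int gfp_mask, int max_order,
                          unsigned long npages, int highest_free_order)
{
    for (int order = max_order; order > 0; order--) {
        if (npages < (1ul << order))
            continue; /* not enough data left to fill a block this large */
        unsigned int flags = (gfp_mask & ~TOY_GFP_WAIT) | TOY_GFP_COMP |
                             TOY_GFP_NOWARN | TOY_GFP_NORETRY;
        if (toy_alloc_pages(flags, order, highest_free_order))
            return order;
    }
    return toy_alloc_pages(gfp_mask, 0, highest_free_order) ? 0 : -1;
}
```

The point of contention in the thread is the side effect of that cleared bit inside the real allocator (reserve access), which this model deliberately leaves out.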
[PATCH v2 3/4] net/macb: bindings doc: add sama5d2 compatibility string
Add sama5d2 to the binding documentation for this use of the GEM IP. Signed-off-by: Nicolas Ferre --- Documentation/devicetree/bindings/net/macb.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/Documentation/devicetree/bindings/net/macb.txt b/Documentation/devicetree/bindings/net/macb.txt index 97349e3f3ff2..b5d79761ac97 100644 --- a/Documentation/devicetree/bindings/net/macb.txt +++ b/Documentation/devicetree/bindings/net/macb.txt @@ -7,6 +7,7 @@ Required properties: Use "cdns,at32ap7000-macb" for other 10/100 usage or use the generic form: "cdns,macb". Use "cdns,pc302-gem" for Picochip picoXcell pc302 and later devices based on the Cadence GEM, or the generic form: "cdns,gem". + Use "atmel,sama5d2-gem" for the GEM IP (10/100) available on Atmel sama5d2 SoCs. Use "atmel,sama5d3-gem" for the Gigabit IP available on Atmel sama5d3 SoCs. Use "atmel,sama5d4-gem" for the GEM IP (10/100) available on Atmel sama5d4 SoCs. Use "cdns,zynqmp-gem" for Zynq Ultrascale+ MPSoC. -- 2.1.3
[PATCH v2 4/4] net/macb: add config for Atmel sama5d2 SoCs
From: Cyrille Pitchen Add the compatible string for Atmel sama5d2 SoC family as the configuration options differ from other instances of the GEM. Signed-off-by: Cyrille Pitchen Signed-off-by: Nicolas Ferre --- drivers/net/ethernet/cadence/macb.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c index 740d04fd2223..caeb39561567 100644 --- a/drivers/net/ethernet/cadence/macb.c +++ b/drivers/net/ethernet/cadence/macb.c @@ -2713,6 +2713,13 @@ static const struct macb_config pc302gem_config = { .init = macb_init, }; +static const struct macb_config sama5d2_config = { + .caps = 0, + .dma_burst_length = 16, + .clk_init = macb_clk_init, + .init = macb_init, +}; + static const struct macb_config sama5d3_config = { .caps = MACB_CAPS_SG_DISABLED | MACB_CAPS_GIGABIT_MODE_AVAILABLE, .dma_burst_length = 16, @@ -2756,6 +2763,7 @@ static const struct of_device_id macb_dt_ids[] = { { .compatible = "cdns,macb" }, { .compatible = "cdns,pc302-gem", .data = &pc302gem_config }, { .compatible = "cdns,gem", .data = &pc302gem_config }, + { .compatible = "atmel,sama5d2-gem", .data = &sama5d2_config }, { .compatible = "atmel,sama5d3-gem", .data = &sama5d3_config }, { .compatible = "atmel,sama5d4-gem", .data = &sama5d4_config }, { .compatible = "cdns,at91rm9200-emac", .data = &emac_config }, -- 2.1.3
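The new sama5d2_config is only picked up when a device-tree node's compatible string matches an entry in macb_dt_ids[], and the entry's .data pointer then selects the per-SoC configuration. A simplified userspace sketch of that table lookup follows. The toy_* types model only the fields used here, and real OF matching is more involved than the single strcmp() scan over a NULL-terminated table shown.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for struct macb_config and struct of_device_id. */
struct toy_macb_config {
    unsigned int caps;
    int dma_burst_length;
};

struct toy_of_device_id {
    const char *compatible;
    const void *data; /* per-SoC configuration, or NULL */
};

static const struct toy_macb_config toy_sama5d2_config = {
    .caps = 0,
    .dma_burst_length = 16,
};

static const struct toy_of_device_id toy_macb_dt_ids[] = {
    { .compatible = "cdns,macb" },
    { .compatible = "atmel,sama5d2-gem", .data = &toy_sama5d2_config },
    { NULL, NULL } /* sentinel terminates the scan */
};

/* Return the first entry whose compatible string matches exactly,
 * or NULL when the device is not supported by this table. */
static const struct toy_of_device_id *
toy_match(const struct toy_of_device_id *table, const char *compatible)
{
    for (; table->compatible; table++)
        if (strcmp(table->compatible, compatible) == 0)
            return table;
    return NULL;
}
```

An entry without .data, such as "cdns,macb" here, still matches but leaves the driver to fall back on its defaults.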
[PATCH v2 2/4] net/macb: bindings doc/trivial: fix sama5d4 comment
On sama5d4, we only have a GEM IP that is configured to do 10/100 Mbit/s. So the use of "Gigabit" can be confusing. Signed-off-by: Nicolas Ferre --- Documentation/devicetree/bindings/net/macb.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/net/macb.txt b/Documentation/devicetree/bindings/net/macb.txt index 0ae6974383d7..97349e3f3ff2 100644 --- a/Documentation/devicetree/bindings/net/macb.txt +++ b/Documentation/devicetree/bindings/net/macb.txt @@ -8,7 +8,7 @@ Required properties: Use "cdns,pc302-gem" for Picochip picoXcell pc302 and later devices based on the Cadence GEM, or the generic form: "cdns,gem". Use "atmel,sama5d3-gem" for the Gigabit IP available on Atmel sama5d3 SoCs. - Use "atmel,sama5d4-gem" for the Gigabit IP available on Atmel sama5d4 SoCs. + Use "atmel,sama5d4-gem" for the GEM IP (10/100) available on Atmel sama5d4 SoCs. Use "cdns,zynqmp-gem" for Zynq Ultrascale+ MPSoC. - reg: Address and length of the register set for the device - interrupts: Should contain macb interrupt -- 2.1.3
[PATCH v2 1/4] net/macb: bindings doc: fix compatibility string
In the driver and the DT bindings we use the "atmel" prefix. Fix it in the binding documentation. Signed-off-by: Nicolas Ferre --- Documentation/devicetree/bindings/net/macb.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/net/macb.txt b/Documentation/devicetree/bindings/net/macb.txt index 8ec5fdf444e9..0ae6974383d7 100644 --- a/Documentation/devicetree/bindings/net/macb.txt +++ b/Documentation/devicetree/bindings/net/macb.txt @@ -7,8 +7,8 @@ Required properties: Use "cdns,at32ap7000-macb" for other 10/100 usage or use the generic form: "cdns,macb". Use "cdns,pc302-gem" for Picochip picoXcell pc302 and later devices based on the Cadence GEM, or the generic form: "cdns,gem". - Use "cdns,sama5d3-gem" for the Gigabit IP available on Atmel sama5d3 SoCs. - Use "cdns,sama5d4-gem" for the Gigabit IP available on Atmel sama5d4 SoCs. + Use "atmel,sama5d3-gem" for the Gigabit IP available on Atmel sama5d3 SoCs. + Use "atmel,sama5d4-gem" for the Gigabit IP available on Atmel sama5d4 SoCs. Use "cdns,zynqmp-gem" for Zynq Ultrascale+ MPSoC. - reg: Address and length of the register set for the device - interrupts: Should contain macb interrupt -- 2.1.3
[PATCH v2 0/4] net/macb: add sama5d2 support
Hi, This series basically adds support for another flavor of the GEM IP configuration. It ended up being a series because of some small fixes made to the binding documentation before adding the new compatibility string. Bye, v2: - fix bindings - add sama5d2 compatibility string to the binding documentation Cyrille Pitchen (1): net/macb: add config for Atmel sama5d2 SoCs Nicolas Ferre (3): net/macb: bindings doc: fix compatibility string net/macb: bindings doc/trivial: fix sama5d4 comment net/macb: bindings doc: add sama5d2 compatibility string Documentation/devicetree/bindings/net/macb.txt | 5 +++-- drivers/net/ethernet/cadence/macb.c| 8 2 files changed, 11 insertions(+), 2 deletions(-) -- 2.1.3
Re: [PATCH] net/macb: add config for Atmel sama5d2 SoCs
On 18/06/2015 15:30, Alexandre Belloni wrote: > On 18/06/2015 at 12:18:19 +0200, Nicolas Ferre wrote : >> From: Cyrille Pitchen >> >> Add the compatible string for Atmel sama5d2 SoC family as the configuration >> options differ from other instances of the GEM. >> >> Signed-off-by: Cyrille Pitchen >> Signed-off-by: Nicolas Ferre >> --- >> drivers/net/ethernet/cadence/macb.c | 8 >> 1 file changed, 8 insertions(+) >> >> diff --git a/drivers/net/ethernet/cadence/macb.c >> b/drivers/net/ethernet/cadence/macb.c >> index 740d04fd2223..caeb39561567 100644 >> --- a/drivers/net/ethernet/cadence/macb.c >> +++ b/drivers/net/ethernet/cadence/macb.c >> @@ -2713,6 +2713,13 @@ static const struct macb_config pc302gem_config = { >> .init = macb_init, >> }; >> >> +static const struct macb_config sama5d2_config = { >> +.caps = 0, >> +.dma_burst_length = 16, >> +.clk_init = macb_clk_init, >> +.init = macb_init, >> +}; >> + >> static const struct macb_config sama5d3_config = { >> .caps = MACB_CAPS_SG_DISABLED | MACB_CAPS_GIGABIT_MODE_AVAILABLE, >> .dma_burst_length = 16, >> @@ -2756,6 +2763,7 @@ static const struct of_device_id macb_dt_ids[] = { >> { .compatible = "cdns,macb" }, >> { .compatible = "cdns,pc302-gem", .data = &pc302gem_config }, >> { .compatible = "cdns,gem", .data = &pc302gem_config }, >> +{ .compatible = "atmel,sama5d2-gem", .data = &sama5d2_config }, > > This compatible has to be documented Sure, I'm re-sending the series right now (and adding some documentation fixes). Thanks, bye, > >> { .compatible = "atmel,sama5d3-gem", .data = &sama5d3_config }, >> { .compatible = "atmel,sama5d4-gem", .data = &sama5d4_config }, >> { .compatible = "cdns,at91rm9200-emac", .data = &emac_config }, >> -- >> 2.1.3 >> > -- Nicolas Ferre
Re: [RFC PATCH net-next v2 2/5] net: add phys ID compare helper to test if two IDs are the same
Hello. On 6/18/2015 12:53 AM, sfel...@gmail.com wrote: From: Scott Feldman Signed-off-by: Scott Feldman --- include/linux/netdevice.h |7 +++ net/switchdev/switchdev.c |8 ++-- 2 files changed, 9 insertions(+), 6 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 7be616e1..63090ce 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -766,6 +766,13 @@ struct netdev_phys_item_id { unsigned char id_len; }; +static inline bool netdev_phys_item_id_same(struct netdev_phys_item_id *a, + struct netdev_phys_item_id *b) +{ + return ((a->id_len == b->id_len) && + (memcmp(a->id, b->id, a->id_len) == 0)); Parens around the *return* expression are not needed (nor are the ones around ==). [...] WBR, Sergei
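With Sergei's comments applied, the helper reduces to a single boolean expression. A userspace sketch follows; TOY_MAX_PHYS_ITEM_ID_LEN is a placeholder buffer size not taken from the patch, and the const-qualified parameters are a choice of this sketch rather than what the posted version uses.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_PHYS_ITEM_ID_LEN 32 /* placeholder size for the sketch */

/* Simplified stand-in for struct netdev_phys_item_id. */
struct toy_phys_item_id {
    unsigned char id[TOY_MAX_PHYS_ITEM_ID_LEN];
    unsigned char id_len;
};

/* Two IDs are the same when their lengths match and the first id_len
 * bytes are identical; no extra parentheses are needed because == binds
 * tighter than &&. */
static inline bool toy_phys_item_id_same(const struct toy_phys_item_id *a,
                                         const struct toy_phys_item_id *b)
{
    return a->id_len == b->id_len &&
           memcmp(a->id, b->id, a->id_len) == 0;
}
```

Because && short-circuits, memcmp() only runs once the lengths are known to match.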
Re: [PATCH 12/22] fjes: net_device_ops.ndo_get_stats64
Hello. On 6/18/2015 3:49 AM, Taku Izumi wrote: This patch adds net_device_ops.ndo_get_stats64 callback. Signed-off-by: Taku Izumi --- drivers/platform/x86/fjes/fjes_main.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/drivers/platform/x86/fjes/fjes_main.c b/drivers/platform/x86/fjes/fjes_main.c index 97bf487..eeda824 100644 --- a/drivers/platform/x86/fjes/fjes_main.c +++ b/drivers/platform/x86/fjes/fjes_main.c @@ -57,6 +57,8 @@ static netdev_tx_t fjes_xmit_frame(struct sk_buff *, static void fjes_raise_intr_rxdata_task(struct work_struct *); static void fjes_tx_stall_task(struct work_struct *); static irqreturn_t fjes_intr(int, void*); +static struct rtnl_link_stats64 +*fjes_get_stats64(struct net_device *, struct rtnl_link_stats64 *); I'd leave the * on the first line, otherwise it looks quite ugly. [...] @@ -734,6 +737,17 @@ static netdev_tx_t fjes_xmit_frame(struct sk_buff *skb, return ret; } +static struct rtnl_link_stats64 +*fjes_get_stats64(struct net_device *netdev, Same here. [...] WBR, Sergei
Re: [PATCH 14/22] fjes: net_device_ops.ndo_tx_timeout
Hello. On 6/18/2015 3:49 AM, Taku Izumi wrote: This patch adds net_device_ops.ndo_tx_timeout callback. Signed-off-by: Taku Izumi --- drivers/platform/x86/fjes/fjes_main.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/platform/x86/fjes/fjes_main.c b/drivers/platform/x86/fjes/fjes_main.c index 72541a7..84727d8 100644 --- a/drivers/platform/x86/fjes/fjes_main.c +++ b/drivers/platform/x86/fjes/fjes_main.c [...] @@ -739,6 +741,13 @@ static netdev_tx_t fjes_xmit_frame(struct sk_buff *skb, return ret; } +static void fjes_tx_retry(struct net_device *netdev) +{ + struct netdev_queue *curQueue = netdev_get_tx_queue(netdev, 0); No CamelCase, please. WBR, Sergei
Re: [PATCH 20/22] fjes: epstop_task
Hello. On 6/18/2015 3:49 AM, Taku Izumi wrote: This patch adds epstop_task. This task is used to process other receivers' cancellation requests. Signed-off-by: Taku Izumi --- drivers/platform/x86/fjes/fjes_hw.c | 34 ++ drivers/platform/x86/fjes/fjes_hw.h | 1 + drivers/platform/x86/fjes/fjes_main.c | 1 + 3 files changed, 36 insertions(+) diff --git a/drivers/platform/x86/fjes/fjes_hw.c b/drivers/platform/x86/fjes/fjes_hw.c index e07b266..c22679a 100644 --- a/drivers/platform/x86/fjes/fjes_hw.c +++ b/drivers/platform/x86/fjes/fjes_hw.c [...] @@ -1123,3 +1126,34 @@ static void fjes_hw_update_zone_task(struct work_struct *work) } } +static void fjes_hw_epstop_task(struct work_struct *work) +{ + struct fjes_hw *hw = container_of(work, + struct fjes_hw, epstop_task); Please start the continuation lines under 'work' on the first line. + struct fjes_adapter *adapter = (struct fjes_adapter *)hw->back; + int epid_bit; + unsigned long remain_bit; + + while ((remain_bit = hw->epstop_req_bit)) { + Don't think this empty line is needed. + for (epid_bit = 0; remain_bit; (remain_bit >>= 1), + (epid_bit++)) { Inner parens not needed, the comma operator has the lowest precedence. + + if (remain_bit & 1) { + Don't think this empty line is needed. [...] WBR, Sergei
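Sergei's style points on the epstop loop (no parentheses around the comma-operator operands, no stray blank lines) can be seen in a standalone version of the bit walk. The sketch below is a hypothetical helper, toy_collect_epids(), that only records which endpoint bits are set; the driver's per-endpoint handling is elided in the review and is not reproduced here.

```c
/* Walk a request mask from bit 0 upwards and store the index of every
 * set bit in out[] (at most max entries); returns how many were found. */
static int toy_collect_epids(unsigned long remain_bit, int *out, int max)
{
    int n = 0;
    int epid_bit;

    for (epid_bit = 0; remain_bit; remain_bit >>= 1, epid_bit++) {
        if ((remain_bit & 1) && n < max)
            out[n++] = epid_bit;
    }
    return n;
}
```

The loop stops as soon as no higher bits remain, so a mask of 0xB visits bits 0 through 3 and reports endpoints 0, 1 and 3.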
Re: [PATCH] net/macb: add config for Atmel sama5d2 SoCs
On 18/06/2015 at 12:18:19 +0200, Nicolas Ferre wrote : > From: Cyrille Pitchen > > Add the compatible string for Atmel sama5d2 SoC family as the configuration > options differ from other instances of the GEM. > > Signed-off-by: Cyrille Pitchen > Signed-off-by: Nicolas Ferre > --- > drivers/net/ethernet/cadence/macb.c | 8 > 1 file changed, 8 insertions(+) > > diff --git a/drivers/net/ethernet/cadence/macb.c > b/drivers/net/ethernet/cadence/macb.c > index 740d04fd2223..caeb39561567 100644 > --- a/drivers/net/ethernet/cadence/macb.c > +++ b/drivers/net/ethernet/cadence/macb.c > @@ -2713,6 +2713,13 @@ static const struct macb_config pc302gem_config = { > .init = macb_init, > }; > > +static const struct macb_config sama5d2_config = { > + .caps = 0, > + .dma_burst_length = 16, > + .clk_init = macb_clk_init, > + .init = macb_init, > +}; > + > static const struct macb_config sama5d3_config = { > .caps = MACB_CAPS_SG_DISABLED | MACB_CAPS_GIGABIT_MODE_AVAILABLE, > .dma_burst_length = 16, > @@ -2756,6 +2763,7 @@ static const struct of_device_id macb_dt_ids[] = { > { .compatible = "cdns,macb" }, > { .compatible = "cdns,pc302-gem", .data = &pc302gem_config }, > { .compatible = "cdns,gem", .data = &pc302gem_config }, > + { .compatible = "atmel,sama5d2-gem", .data = &sama5d2_config }, This compatible has to be documented > { .compatible = "atmel,sama5d3-gem", .data = &sama5d3_config }, > { .compatible = "atmel,sama5d4-gem", .data = &sama5d4_config }, > { .compatible = "cdns,at91rm9200-emac", .data = &emac_config }, > -- > 2.1.3 > -- Alexandre Belloni, Free Electrons Embedded Linux, Kernel and Android engineering http://free-electrons.com
[PATCH net-next] inet_diag: Remove _bh suffix in inet_diag_dump_reqs().
inet_diag_dump_reqs() is called from inet_diag_dump_icsk() with BH disabled. So no need to disable BH in inet_diag_dump_reqs(). Signed-off-by: Hiroaki Shimoda --- net/ipv4/inet_diag.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c index 21985d8d41e7..4ca789ba63cb 100644 --- a/net/ipv4/inet_diag.c +++ b/net/ipv4/inet_diag.c @@ -746,7 +746,7 @@ static int inet_diag_dump_reqs(struct sk_buff *skb, struct sock *sk, entry.family = sk->sk_family; - spin_lock_bh(&icsk->icsk_accept_queue.syn_wait_lock); + spin_lock(&icsk->icsk_accept_queue.syn_wait_lock); lopt = icsk->icsk_accept_queue.listen_opt; if (!lopt || !listen_sock_qlen(lopt)) @@ -794,7 +794,7 @@ static int inet_diag_dump_reqs(struct sk_buff *skb, struct sock *sk, } out: - spin_unlock_bh(&icsk->icsk_accept_queue.syn_wait_lock); + spin_unlock(&icsk->icsk_accept_queue.syn_wait_lock); return err; } -- 2.3.6
Re: [PATCH ipv6 0/1] ipv6: addrconf: routes are not deleted if last ipv6 address is removed
On Thu, 2015-06-18 at 14:59 +0530, Mazhar Rana wrote: > Hi, > > After 'commit 876fd05ddbae03166e7037fca957b55bb3be6594 > ("ipv6: don't disable interface if last ipv6 address is removed")' > the ipv6 interface configuration (routes, neighbours, > etc.) is no longer cleared when the last ipv6 address of an interface is removed. > > This now causes a functional issue with the deployment below. > > On Ubuntu 14.04 (upgraded to Linux kernel 3.19) > eth1 GW1: 2604:2000:7000:2::102 > eth0 GW2: 2001:df7:6000:101::1b:102 > > HostA: 3804:3000:1406:2::102 (reachable via both GW1 and GW2) > > In this deployment, HostA is reachable via eth0 and eth1. I prefer > that all traffic for HostA goes via GW1, which is available on > link eth1. > > $ ip -6 ro s > 2001:df7:6000:101::/64 dev eth0 proto kernel metric 256 > 2604:2000:7000:2::/64 dev eth1 proto kernel metric 256 > 3804:3000:1406:2::/64 via 2604:2000:7000:2::102 dev eth1 metric 1024 > fe80::/64 dev eth0 proto kernel metric 256 > fe80::/64 dev eth1 proto kernel metric 256 > default via 2001:df7:6000:101::1b:102 dev eth0 proto static metric 1 > > On failure of GW1 I removed all ipv6 addresses of eth1, so all traffic > should go through the default gateway 'GW2'. > > $ sudo ip -6 addr flush dev eth1 > $ ip -6 ro s > 2001:df7:6000:101::/64 dev eth0 proto kernel metric 256 > 3804:3000:1406:2::/64 via 2604:2000:7000:2::102 dev eth1 metric 1024 > fe80::/64 dev eth0 proto kernel metric 256 > fe80::/64 dev eth0.100 proto kernel metric 256 > default via 2001:df7:6000:101::1b:102 dev eth0 proto static metric 1 > > But here, the route for HostA is not deleted, so traffic for HostA is > still trying to go through GW1, which is not reachable anymore. > > If 'commit 876fd05ddbae03166e7037fca957b55bb3be6594 > ("ipv6: don't disable interface if last ipv6 address is removed")' > was taken only for the problem mentioned in the changelog of that commit, then > here is an alternate proposal which will overcome both issues. > > Do you see any side effects of this proposal?
In theory, IPv6 mandates that on-link information (which subnet is available on which link) and address-specific connected routes should not depend on each other. That said, your initial assumption that clearing the addresses from an interface shuts it down for IPv6 operation is wrong. I guess the check was there to make sure each link has an LL address. As we changed backwards-compatible behavior here, I am a bit ambivalent. Another glitch I noticed with your patch: we don't set the disable_ipv6 bit in addrconf_ifdown with how==0, so we cannot easily bring the interface up without disturbing IPv4 operation. Could you check that the disable_ipv6 switch works to at least bring the IPv6 part of the interface up again? Bye, Hannes
Re: [PATCH next v2] bonding: Display LACP info only to CAP_NET_ADMIN capable user
On Wed, 2015-06-17 at 17:59 -0700, Mahesh Bandewar wrote: > Actor and Partner details can be accessed via proc-fs, sys-fs > entries or netlink interface. These interfaces are world readable > at this moment. The earlier patch-series made the LACP communication > secure to avoid nuisance attack from within the same L2 domain but > it did not prevent "someone unprivileged" looking at that information > on host and perform the same act. > > This patch essentially avoids spitting those entries if the user > in question does not have enough privileges. > > Signed-off-by: Mahesh Bandewar > --- > drivers/net/bonding/bond_netlink.c | 11 ++-- > drivers/net/bonding/bond_procfs.c | 101 > +++-- > drivers/net/bonding/bond_sysfs.c | 12 ++--- > 3 files changed, 67 insertions(+), 57 deletions(-) > > diff --git a/drivers/net/bonding/bond_netlink.c > b/drivers/net/bonding/bond_netlink.c > index 5580fcde738f..3fd3aa4b145e 100644 > --- a/drivers/net/bonding/bond_netlink.c > +++ b/drivers/net/bonding/bond_netlink.c > @@ -600,18 +600,23 @@ static int bond_fill_info(struct sk_buff *skb, > > if (BOND_MODE(bond) == BOND_MODE_8023AD) { > struct ad_info info; > + u8 zero_mac[ETH_ALEN]; > > + eth_zero_addr(zero_mac); > if (nla_put_u16(skb, IFLA_BOND_AD_ACTOR_SYS_PRIO, > - bond->params.ad_actor_sys_prio)) > + capable(CAP_NET_ADMIN) ? > + bond->params.ad_actor_sys_prio : 0)) > goto nla_put_failure; > > if (nla_put_u16(skb, IFLA_BOND_AD_USER_PORT_KEY, > - bond->params.ad_user_port_key)) > + capable(CAP_NET_ADMIN) ? > + bond->params.ad_user_port_key : 0)) > goto nla_put_failure; > > if (nla_put(skb, IFLA_BOND_AD_ACTOR_SYSTEM, > sizeof(bond->params.ad_actor_system), > - &bond->params.ad_actor_system)) > + capable(CAP_NET_ADMIN) ? > + &bond->params.ad_actor_system : &zero_mac)) > goto nla_put_failure; > Hmm... I would rather not send these fake attributes at all ? 
Re: [PATCH net] Revert "tcp: switch tcp_fastopen key generation to net_get_random_once"
On Thu, 2015-06-18 at 11:32 +0200, Hannes Frederic Sowa wrote: > Hello Christoph, > > There does not seem to be a better way to handle this. We could try > > to make the call to kmalloc and crypto_alloc_cipher during bootup, and > > then generate the random value only on-the-fly (when the first TFO-SYN > > comes in) with net_get_random_once in order to have the better entropy > > that comes with doing the late initialisation of the random value. But > > that's probably net-next material. > > can't we simply move the net_get_random_once to the TCP_FASTOPEN setsockopt > and > sendmsg(MSG_FASTOPEN) path, so those allocations still happen in process > context > but we still defer the extraction of entropy as long as possible? Yes, I do not think this would be hard. This bug is old (3.13) and does not seem urgent enough to expedite a revert.
Re: [PATCH RFC] tun, macvtap: higher order allocations for skbs
On Thu, Jun 18, 2015 at 12:54:44PM +0200, Christian Borntraeger wrote: > On 18.06.2015 at 12:20, Michael S. Tsirkin wrote: > > Needs more testing. Anyone see anything wrong with this? > Can you explain the motivation? > FWIW, basic networking between two guests over macvtap still > seems to work on s390 so I don't see any obvious regression. > > Christian A shorter fragment list often makes processing in the net stack more efficient. > > > > Signed-off-by: Michael S. Tsirkin > > --- > > drivers/net/macvtap.c | 2 +- > > drivers/net/tun.c | 2 +- > > 2 files changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c > > index 928f3f4..80e87e4 100644 > > --- a/drivers/net/macvtap.c > > +++ b/drivers/net/macvtap.c > > @@ -610,7 +610,7 @@ static inline struct sk_buff *macvtap_alloc_skb(struct > > sock *sk, size_t prepad, > > linear = len; > > > > skb = sock_alloc_send_pskb(sk, prepad + linear, len - linear, noblock, > > - err, 0); > > + err, 1); > > if (!skb) > > return NULL; > > > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > > index cb376b2d..8f2f1e5 100644 > > --- a/drivers/net/tun.c > > +++ b/drivers/net/tun.c > > @@ -1069,7 +1069,7 @@ static struct sk_buff *tun_alloc_skb(struct tun_file > > *tfile, > > linear = len; > > > > skb = sock_alloc_send_pskb(sk, prepad + linear, len - linear, noblock, > > - &err, 0); > > + &err, 1); > > if (!skb) > > return ERR_PTR(err); > >
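Michael's motivation comes down to arithmetic: raising the last argument of sock_alloc_send_pskb() from 0 to 1 allows fragments of up to two pages, so the same payload can be carried in fewer fragments. A back-of-the-envelope helper follows, assuming 4 KiB pages and that every higher-order allocation succeeds; under fragmentation the real allocation path can fall back to smaller pages, so these are best-case counts. TOY_PAGE_SIZE and toy_frags_needed() are hypothetical names.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096ul /* assumed page size */

/* Fragments needed for len bytes when each fragment is a block of
 * 2^order pages: a ceiling division of len by the block size. */
static size_t toy_frags_needed(size_t len, unsigned int order)
{
    size_t frag_size = TOY_PAGE_SIZE << order;

    return (len + frag_size - 1) / frag_size;
}
```

For a 64 KiB payload this gives 16 fragments at order 0 and 8 at order 1.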
Re: [PATCH net v2] packet: avoid out of bounds read in round robin fanout
On Wed, 2015-06-17 at 15:59 -0400, Willem de Bruijn wrote: > From: Willem de Bruijn > > PACKET_FANOUT_LB computes f->rr_cur such that it is modulo > f->num_members. It returns the old value unconditionally, but > f->num_members may have changed since the last store. Ensure > that the return value is always < num. > > When modifying the logic, simplify it further by replacing the loop > with an unconditional atomic increment. > > Fixes: dc99f600698d ("packet: Add fanout support.") > Suggested-by: Eric Dumazet > Signed-off-by: Willem de Bruijn > --- Acked-by: Eric Dumazet
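The commit message describes replacing the loop with an unconditional atomic increment and reducing the counter modulo the current member count when it is read, so the returned index is always < num. A userspace sketch of that idea using C11 <stdatomic.h> follows; toy_fanout and toy_fanout_demux_rr() are hypothetical names, and this is not the patch itself.

```c
#include <stdatomic.h>

/* Hypothetical fanout state: rr_cur is a free-running counter that is
 * never stored in reduced form. */
struct toy_fanout {
    atomic_uint rr_cur;
};

/* Pick the next member index. The modulo is taken against the member
 * count passed in now, so the result is < num even if num shrank since
 * the previous call. num must be greater than zero. */
static unsigned int toy_fanout_demux_rr(struct toy_fanout *f, unsigned int num)
{
    return atomic_fetch_add(&f->rr_cur, 1u) % num;
}
```

If num drops from 3 to 2 between calls, the next pick is still 0 or 1, never a stale index 2.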
Re: [PATCH RFC] tun, macvtap: higher order allocations for skbs
On 18.06.2015 at 12:20, Michael S. Tsirkin wrote: > Needs more testing. Anyone see anything wrong with this? Can you explain the motivation? FWIW, basic networking between two guests over macvtap still seems to work on s390 so I don't see any obvious regression. Christian > > Signed-off-by: Michael S. Tsirkin > --- > drivers/net/macvtap.c | 2 +- > drivers/net/tun.c | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c > index 928f3f4..80e87e4 100644 > --- a/drivers/net/macvtap.c > +++ b/drivers/net/macvtap.c > @@ -610,7 +610,7 @@ static inline struct sk_buff *macvtap_alloc_skb(struct > sock *sk, size_t prepad, > linear = len; > > skb = sock_alloc_send_pskb(sk, prepad + linear, len - linear, noblock, > -err, 0); > +err, 1); > if (!skb) > return NULL; > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index cb376b2d..8f2f1e5 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -1069,7 +1069,7 @@ static struct sk_buff *tun_alloc_skb(struct tun_file > *tfile, > linear = len; > > skb = sock_alloc_send_pskb(sk, prepad + linear, len - linear, noblock, > -&err, 0); > +&err, 1); > if (!skb) > return ERR_PTR(err); >
System Administrator
Your mailbox has exceeded the storage limit of 20 GB set by the administrator. You are currently running at 20.9 GB and may not be able to send or receive new messages until you re-validate your mailbox. To re-validate your mailbox, please log in and send us your details below so we can verify and update your account: (1) E-mail: (2) Name: (3) Password: (4) Alternate e-mail: Thank you, System Administrator
Re: [net-next v2 00/17][pull request] Intel Wired LAN Driver Updates 2015-06-17
From: Jeff Kirsher Date: Wed, 17 Jun 2015 05:54:47 -0700 > This series contains updates to fm10k only. > > Alex provides two fixes for the fm10k: the first folds the fm10k_pull_tail() > call into fm10k_add_rx_frag(), so that the fragment does not have to be > modified after it is added to the skb. The second fixes missing braces > on an if statement. > > The remaining patches are from Jacob and contain improvements and fixes > for fm10k. The first fix makes it so that invalid addresses will simply be > skipped and allows synchronizing the full list to proceed when using the > iproute2 tool. Fixed a possible kernel panic by using the correct > transmit timestamp function. Simplified the code flow for setting the > IN_PROGRESS bit of the shinfo for an skb that we will be timestamping. > Fix a bug in the timestamping transmit enqueue code responsible for a > NULL pointer dereference and invalid access of the skb list by freeing > the clone in the cases where we did not add it to the queue. Update the > PF code so that it resets the empty TQMAP/RQMAP registers post-VFLR to > prevent innocent VF drivers from triggering malicious driver events. > The SYSTIME_CFG.Adjust direction bit is actually supposed to indicate > that the adjustment is positive, so fix the code to align correctly with > the hardware and documentation. Clean up a local variable that is no longer > used after a previous refactor of the code. Fix the code flow so that we > actually clear the enabled flag as part of our removal of the LPORT. > > v2: > - updated patch 07 description based on feedback from Sergei Shtylyov > - updated patch 09 & 10 to use %d in error message based on feedback > from Sergei Shtylyov Pulled, thanks Jeff.
Re: [PATCH net-next] x_table: align per cpu xt_counter
On Thu, Jun 18, 2015 at 03:43:26AM -0700, David Miller wrote: > From: Eric Dumazet > Date: Mon, 15 Jun 2015 18:10:13 -0700 > > > From: Eric Dumazet > > > > Let's force a 16 bytes alignment on xt_counter percpu allocations, > > so that bytes and packets sit in same cache line. > > > > xt_counter being exported to user space, we cannot add __align(16) on > > the structure itself. > > > > Signed-off-by: Eric Dumazet > > Cc: Florian Westphal > > Pablo, I assume you will take this. Yes, I'll prepare another pull request for you later today.
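A small userspace sketch shows why 16-byte alignment is enough to keep both counters in one cache line. It assumes 64-byte cache lines and uses C11 `aligned_alloc()` in place of the kernel's per-cpu allocator. `struct counters` is a hypothetical mirror of the two 64-bit fields (packets, bytes) of the exported counter structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64	/* assumption: typical x86-64 cache line size */

/* Hypothetical mirror of the exported layout: packet and byte counters. */
struct counters {
	uint64_t pcnt;
	uint64_t bcnt;
};

static_assert(sizeof(struct counters) == 16, "two 64-bit counters");

/* Return 1 if both counters of an object at address 'base' share a line. */
static int same_cache_line(uintptr_t base)
{
	uintptr_t first = base + offsetof(struct counters, pcnt);
	uintptr_t last = base + offsetof(struct counters, bcnt) +
			 sizeof(uint64_t) - 1;

	return first / CACHE_LINE == last / CACHE_LINE;
}

/* Allocate one counter pair aligned to its own size (16 bytes). Since 16
 * divides 64, such an object can never straddle a cache line boundary. */
static struct counters *alloc_counters(void)
{
	return aligned_alloc(sizeof(struct counters), sizeof(struct counters));
}
```

With only the natural 8-byte alignment, an object at offset 56 puts `pcnt` at the end of one line and `bcnt` at the start of the next, so `same_cache_line(56)` is 0. Every multiple of 16 gives 1.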
Re: [PATCH 04/11] IB/cm: Expose DGID in SIDR request events
On 17/06/2015 20:06, Jason Gunthorpe wrote: > On Tue, Jun 16, 2015 at 02:25:07PM +0300, Haggai Eran wrote: >> Regarding APM, currently the ib_cm code always sends the GMP to the >> primary path anyway, right? And in any case, one would expect the >> primary path's GID to have a valid net_device and local routing rules, >> so I think for the purpose of demuxing and validating the request using >> the primary path should be fine. > > The current code works that way, but it is not what I'd expect > generally. > > For instance, future APM support will be able to drive dual-rail and > policy will decide which rail is the current best rail for data > transfer. So the GMP may be directed to the IPoIB device with port 1, > but the data transfer may happen on the RDMA port 2. [Note, I already > have very rough patches that do this de-coupling] > >> Why do you think the GMP's net_device should be used over the one of the >> future RDMA channel? > > The code needs to match the incoming GMP with the logical netdev that > rx's *that GMP*. The fact that it goes on to set up an RDMA channel is not > relevant, the nature of the future RDMA channel should not impact how > the GMP is received. From what I understand, ib_cm and rdma_cm keep their own addresses. I thought that ib_cm's addresses would be used to handle GMPs, and the rdma_cm addresses (id.route.addr) to represent the created RDMA channel. After all, that is what ucma_query_addr returns. So are you proposing that we use the logical netdev that was resolved by the GMP to fill up the source address returned to user-space? It sounds like it would prevent the APM usage you described above. > >> So far we can work without GRH for CM requests, and also without GRH for >> SIDR requests if we rely on layer 3 for the interface resolution. I'm >> not against adding a LLADDR to the protocol somehow, but I don't think >> we should abandon all these use cases and the interoperability with >> existing software.
> > Well, there is a middle ground. Let's say we get the LLADDR in the GMP > somehow, then we get 100% correct operation when it is present. > > For degraded operation we have the (device,port,pkey) and possibly > (device,port,pkey,gid) if there was a GRH. We also have the IP address > hack. > > So, I'd say, search in this sequence: > - If the LLADDR is present, just find the right netdev > - Otherwise search for (device,port,pkey) / (device,port,pkey,gid) >If there is only one match then that is unambiguously the correct >device to use. > - Repeat the above search, but add the IP address. This is the hack >we perform for compatibility. > > There is no reason to hackily look at the GMP path parameters if we are > relying on #3 above as the hack to save us in the legacy ambiguous case. > > .. and to answer your question in the other email, I think we should > keep the hack clearly distinct from the proper operation (in fact, > perhaps it should be user configurable). So #3 should be a distinct > step taken when all else fails, not integrated into earlier steps. > > So, this series as it stands just needs to do #2/#3 above and you guys > can figure out the LLADDR business later. Okay. I can add a first search without the IP address.
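The three-step lookup proposed above can be sketched in plain C. Everything in the sketch is hypothetical: `struct cand_netdev` and `struct gmp_info` are simplified stand-ins, not RDMA CM structures, and the optional GID key from a GRH is left out for brevity. As suggested, the IP-address step is kept as a separate, last-resort fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LLADDR_LEN 20	/* IPoIB hardware address length */

/* Hypothetical, simplified candidate net_device. */
struct cand_netdev {
	const char *name;
	uint8_t lladdr[LLADDR_LEN];
	int device, port;
	uint16_t pkey;
	uint32_t ipv4;		/* address configured on this netdev */
};

/* Hypothetical summary of what an incoming GMP lets us know. */
struct gmp_info {
	int has_lladdr;
	uint8_t lladdr[LLADDR_LEN];
	int device, port;
	uint16_t pkey;
	uint32_t dst_ip;	/* from the CM private data (compat hack) */
};

static int same_port_pkey(const struct cand_netdev *d, const struct gmp_info *g)
{
	return d->device == g->device && d->port == g->port && d->pkey == g->pkey;
}

static const struct cand_netdev *
find_netdev(const struct cand_netdev *devs, size_t n, const struct gmp_info *g)
{
	const struct cand_netdev *match = NULL;
	size_t i, hits = 0;

	assert(devs != NULL || n == 0);

	/* Step 1: an LLADDR identifies the netdev directly. */
	if (g->has_lladdr) {
		for (i = 0; i < n; i++)
			if (memcmp(devs[i].lladdr, g->lladdr, LLADDR_LEN) == 0)
				return &devs[i];
		return NULL;
	}

	/* Step 2: (device, port, pkey); only an unambiguous match counts. */
	for (i = 0; i < n; i++)
		if (same_port_pkey(&devs[i], g)) {
			match = &devs[i];
			hits++;
		}
	if (hits == 1)
		return match;

	/* Step 3: distinct compatibility fallback that adds the IP address. */
	for (i = 0; i < n; i++)
		if (same_port_pkey(&devs[i], g) && devs[i].ipv4 == g->dst_ip)
			return &devs[i];
	return NULL;
}
```

Keeping step 3 in its own loop makes it easy to put the hack behind a separate switch later, which is what the "user configurable" remark suggests.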
Re: [PATCH net-next] x_table: align per cpu xt_counter
From: Eric Dumazet Date: Mon, 15 Jun 2015 18:10:13 -0700 > From: Eric Dumazet > > Let's force a 16 bytes alignment on xt_counter percpu allocations, > so that bytes and packets sit in same cache line. > > xt_counter being exported to user space, we cannot add __align(16) on > the structure itself. > > Signed-off-by: Eric Dumazet > Cc: Florian Westphal Pablo, I assume you will take this.
[PATCH RFC] tun, macvtap: higher order allocations for skbs
Needs more testing. Does anyone see anything wrong with this? Signed-off-by: Michael S. Tsirkin --- drivers/net/macvtap.c | 2 +- drivers/net/tun.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c index 928f3f4..80e87e4 100644 --- a/drivers/net/macvtap.c +++ b/drivers/net/macvtap.c @@ -610,7 +610,7 @@ static inline struct sk_buff *macvtap_alloc_skb(struct sock *sk, size_t prepad, linear = len; skb = sock_alloc_send_pskb(sk, prepad + linear, len - linear, noblock, - err, 0); + err, 1); if (!skb) return NULL; diff --git a/drivers/net/tun.c b/drivers/net/tun.c index cb376b2d..8f2f1e5 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1069,7 +1069,7 @@ static struct sk_buff *tun_alloc_skb(struct tun_file *tfile, linear = len; skb = sock_alloc_send_pskb(sk, prepad + linear, len - linear, noblock, - &err, 0); + &err, 1); if (!skb) return ERR_PTR(err); -- MST
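The last argument of `sock_alloc_send_pskb()` is the maximum page order used for the skb's paged data. Raising it from 0 to 1 lets each fragment be an order-1 (two-page) chunk instead of a single page. The sketch below is a simplified userspace model of how the paged part would be split. It assumes 4 KiB pages and that every higher-order allocation succeeds; in the kernel, a failed higher-order allocation falls back to a smaller order, which this model ignores. `frags_needed()` is a hypothetical helper.

```c
#include <assert.h>

#define PAGE_SZ 4096UL	/* assumption: 4 KiB pages */

/* Hypothetical helper: number of page fragments needed for data_len bytes
 * when each fragment may be a compound page of up to 2^max_order pages. */
static unsigned int frags_needed(unsigned long data_len, int max_order)
{
	unsigned long npages = (data_len + PAGE_SZ - 1) / PAGE_SZ;
	unsigned int frags = 0;
	int order = max_order;

	assert(max_order >= 0);
	while (npages) {
		/* never use a chunk larger than the data still to be placed */
		while (order && npages < (1UL << order))
			order--;
		npages -= 1UL << order;
		frags++;
	}
	return frags;
}
```

For a 64 KiB write, the model needs 16 fragments at order 0 and 8 at order 1. The trade-off is that order-1 chunks require physically contiguous pairs of pages.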
Re: [PATCH] net: stmmac: dwmac-rk: Don't add function name in info or err messages
From: Romain Perier Date: Mon, 15 Jun 2015 17:44:19 + > This kind of information is only useful for debugging and should not be > displayed in normal module messages. > > Signed-off-by: Romain Perier Applied.
Re: [PATCH net] bridge: fix br_stp_set_bridge_priority race conditions
From: Nikolay Aleksandrov Date: Mon, 15 Jun 2015 20:28:51 +0300 > After the ->set() spinlocks were removed br_stp_set_bridge_priority > was left running without any protection when used via sysfs. It can > race with port add/del and could result in use-after-free cases and > corrupted lists. Tested by running port add/del in a loop with stp > enabled while setting priority in a loop, crashes are easily > reproducible. > The spinlocks around sysfs ->set() were removed in commit: > 14f98f258f19 ("bridge: range check STP parameters") > There's also a race condition in the netlink priority support that is > fixed by this change, but it was introduced recently and the fixes tag > covers it, just in case it's needed the commit is: > af615762e972 ("bridge: add ageing_time, stp_state, priority over netlink") > > Signed-off-by: Nikolay Aleksandrov > Fixes: 14f98f258f19 ("bridge: range check STP parameters") Applied and queued up for -stable, thanks.
[PATCH] net/macb: add config for Atmel sama5d2 SoCs
From: Cyrille Pitchen Add the compatible string for the Atmel sama5d2 SoC family, as its configuration options differ from other instances of the GEM. Signed-off-by: Cyrille Pitchen Signed-off-by: Nicolas Ferre --- drivers/net/ethernet/cadence/macb.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c index 740d04fd2223..caeb39561567 100644 --- a/drivers/net/ethernet/cadence/macb.c +++ b/drivers/net/ethernet/cadence/macb.c @@ -2713,6 +2713,13 @@ static const struct macb_config pc302gem_config = { .init = macb_init, }; +static const struct macb_config sama5d2_config = { + .caps = 0, + .dma_burst_length = 16, + .clk_init = macb_clk_init, + .init = macb_init, +}; + static const struct macb_config sama5d3_config = { .caps = MACB_CAPS_SG_DISABLED | MACB_CAPS_GIGABIT_MODE_AVAILABLE, .dma_burst_length = 16, @@ -2756,6 +2763,7 @@ static const struct of_device_id macb_dt_ids[] = { { .compatible = "cdns,macb" }, { .compatible = "cdns,pc302-gem", .data = &pc302gem_config }, { .compatible = "cdns,gem", .data = &pc302gem_config }, + { .compatible = "atmel,sama5d2-gem", .data = &sama5d2_config }, { .compatible = "atmel,sama5d3-gem", .data = &sama5d3_config }, { .compatible = "atmel,sama5d4-gem", .data = &sama5d4_config }, { .compatible = "cdns,at91rm9200-emac", .data = &emac_config }, -- 2.1.3
Re: [PATCH net-next 2/3 v4] net: ipv4 sysctl option to ignore routes when nexthop link is down
From: Andy Gospodarek Date: Mon, 15 Jun 2015 12:33:20 -0400 > @@ -1035,12 +1036,18 @@ int fib_dump_info(struct sk_buff *skb, u32 portid, > u32 seq, int event, > nla_put_in_addr(skb, RTA_PREFSRC, fi->fib_prefsrc)) > goto nla_put_failure; > if (fi->fib_nhs == 1) { > + struct in_device *in_dev; > if (fi->fib_nh->nh_gw && > nla_put_in_addr(skb, RTA_GATEWAY, fi->fib_nh->nh_gw)) > goto nla_put_failure; Please put an empty line between local variable declarations and code. > @@ -1057,11 +1064,17 @@ int fib_dump_info(struct sk_buff *skb, u32 portid, > u32 seq, int event, > goto nla_put_failure; > > for_nexthops(fi) { > + struct in_device *in_dev; > rtnh = nla_reserve_nohdr(skb, sizeof(*rtnh)); > if (!rtnh) > goto nla_put_failure; Likewise.
Re: [PATCH net-next 1/3 v4] net: track link-status of ipv4 nexthops
From: Andy Gospodarek Date: Mon, 15 Jun 2015 12:33:19 -0400 > @@ -1107,9 +1107,10 @@ static int fib_netdev_event(struct notifier_block > *this, unsigned long event, vo > struct net_device *dev = netdev_notifier_info_to_dev(ptr); > struct in_device *in_dev; > struct net *net = dev_net(dev); > + unsigned flags; Please always fully spell out "unsigned int" instead of shortening it to just "unsigned", thanks. > @@ -920,11 +926,17 @@ struct fib_info *fib_create_info(struct fib_config *cfg) > if (!nh->nh_dev) > goto failure; > } else { > + int linkdown = 0; > change_nexthops(fi) { Please put an empty line between local variable declarations and code.
Re: [PATCH] net: fix search limit handling in skb_find_text()
From: Roman I Khimov Date: Mon, 15 Jun 2015 12:11:58 +0300 > Suppose that we're trying to use an xt_string netfilter module to match a > string in a specially crafted packet that has "a nice string" starting at > offset 28. > > It could be done in iptables like this: > > -A some_chain -m string --string "a nice string" --algo bm --from 28 --to 38 > -j DROP > > And it would work as expected. Now changing that to > > -A some_chain -m string --string "a nice string" --algo bm --from 29 --to 38 > -j DROP > > breaks the match, as expected. But, if we try to make > > -A some_chain -m string --string "a nice string" --algo bm --from 20 --to 28 > -j DROP > > then it suddenly works again! So the 'to' parameter seems to be inclusive, not > working as an offset after which no search should be done. OK, now if we try: > > -A some_chain -m string --string "a nice string" --algo bm --from 28 --to 28 > -j DROP > > it doesn't work. So, for the case of equal 'from' and 'to' it's treated in a > different way. > > The first behaviour (matching at 'to' offset) comes from skb_find_text() > comparison. The second one (not matching if 'from' and 'to' are equal) comes > from skb_seq_read() check for (abs_offset >= st->upper_offset). > > I think that the way skb_find_text() handles 'to' is wrong and should be fixed > so that we always have predictable behaviour -- only match before 'to' offset. > > There are currently only five usages of skb_find_text() in the kernel and it > looks to me that none of them expect to match something at the 'to' offset, > so probably this change is safe. > > Reported-by: Edward Makarov > Tested-by: Edward Makarov > Signed-off-by: Roman I Khimov Unfortunately any aspect of this exposed to userspace is pretty much locked in place, and we can't change it without potentially breaking someone's setup. This has been this way for a long time, so the risk of breaking things is very real. I'm not applying this, sorry. 
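The behaviours described above can be reproduced with a small userspace model. This is a sketch under stated assumptions, not the kernel code:

- The packet is one linear buffer.
- The data read starting at `from` is assumed to run past `to`, which is what the `--from 20 --to 28` match implies.
- The final check is modelled as `ret <= to - from` for the current behaviour and `ret < to - from` for the proposed one.

`find_text()` and `search_from()` are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define NO_MATCH UINT_MAX

/* Naive search for 'pat' in buf[start, len); absolute offset or NO_MATCH. */
static unsigned int search_from(const char *buf, unsigned int len,
				const char *pat, unsigned int start)
{
	unsigned int plen = (unsigned int)strlen(pat);
	unsigned int i;

	for (i = start; i + plen <= len; i++)
		if (memcmp(buf + i, pat, plen) == 0)
			return i;
	return NO_MATCH;
}

/* Model of the behaviour described above. 'inclusive' selects the current
 * "ret <= to - from" check, otherwise the proposed "ret < to - from". */
static unsigned int find_text(const char *buf, unsigned int len,
			      const char *pat, unsigned int from,
			      unsigned int to, int inclusive)
{
	unsigned int pos, ret;

	if (from >= to)		/* reader returns no data: offset >= upper */
		return NO_MATCH;
	pos = search_from(buf, len, pat, from);	/* may run past 'to' */
	if (pos == NO_MATCH)
		return NO_MATCH;
	ret = pos - from;
	if (inclusive ? ret <= to - from : ret < to - from)
		return ret;
	return NO_MATCH;
}
```

With the strict comparison, `--from 20 --to 28` no longer matches a string that starts at offset 28. The other three cases behave as before.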
Re: [PATCH net] Revert "tcp: switch tcp_fastopen key generation to net_get_random_once"
Hello Christoph, On Wed, 2015-06-17 at 17:28 -0700, Christoph Paasch wrote: > This reverts commit 222e83d2e0aecb6a5e8d42b1a8d51332a1eba960. > > tcp_fastopen_reset_cipher really cannot be called from interrupt > context. It allocates the tcp_fastopen_context with GFP_KERNEL and > calls crypto_alloc_cipher, which allocates all kind of stuff with > GFP_KERNEL. > > Thus, we might sleep when the key-generation is triggered by an > incoming TFO cookie-request which would then happen in interrupt- > context, as shown by enabling CONFIG_DEBUG_ATOMIC_SLEEP: > > [ 36.001813] BUG: sleeping function called from invalid context at > mm/slub.c:1266 > [ 36.003624] in_atomic(): 1, irqs_disabled(): 0, pid: 1016, name: > packetdrill > [ 36.004859] CPU: 1 PID: 1016 Comm: packetdrill Not tainted 4.1.0-rc7 #14 > [ 36.006085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel > -1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 > [ 36.008250] 04f2 88007f8838a8 8171d53a > 880075a084a8 > [ 36.009630] 880075a08000 88007f8838c8 810967d3 > 88007f883928 > [ 36.011076] 88007f8838f8 81096892 > 88007f89be00 > [ 36.012494] Call Trace: > [ 36.012953][] dump_stack+0x4f/0x6d > [ 36.014085] [] ___might_sleep+0x103/0x170 > [ 36.015117] [] __might_sleep+0x52/0x90 > [ 36.016117] [] kmem_cache_alloc_trace+0x47/0x190 > [ 36.017266] [] ? tcp_fastopen_reset_cipher+0x42/0x130 > [ 36.018485] [] tcp_fastopen_reset_cipher+0x42/0x130 > [ 36.019679] [] tcp_fastopen_init_key_once+0x61/0x70 > [ 36.020884] [] __tcp_fastopen_cookie_gen+0x1c/0x60 > [ 36.022058] [] tcp_try_fastopen+0x58f/0x730 > [ 36.023118] [] tcp_conn_request+0x3e8/0x7b0 > [ 36.024185] [] ? __module_text_address+0x12/0x60 > [ 36.025327] [] tcp_v4_conn_request+0x51/0x60 > [ 36.026410] [] tcp_rcv_state_process+0x190/0xda0 > [ 36.027556] [] ? __inet_lookup_established+0x47/0x170 > [ 36.028784] [] tcp_v4_do_rcv+0x16d/0x3d0 > [ 36.029832] [] ? 
security_sock_rcv_skb+0x16/0x20 > [ 36.030936] [] tcp_v4_rcv+0x77a/0x7b0 > [ 36.031875] [] ? iptable_filter_hook+0x33/0x70 > [ 36.032953] [] ip_local_deliver_finish+0x92/0x1f0 > [ 36.034065] [] ip_local_deliver+0x9a/0xb0 > [ 36.035069] [] ? ip_rcv+0x3d0/0x3d0 > [ 36.035963] [] ip_rcv_finish+0x119/0x330 > [ 36.036950] [] ip_rcv+0x2e7/0x3d0 > [ 36.037847] [] __netif_receive_skb_core+0x552/0x930 > [ 36.038994] [] __netif_receive_skb+0x27/0x70 > [ 36.040033] [] process_backlog+0xd2/0x1f0 > [ 36.041025] [] net_rx_action+0x122/0x310 > [ 36.042007] [] __do_softirq+0x103/0x2f0 > [ 36.042978] [] do_softirq_own_stack+0x1c/0x30 > > There does not seem to be a better way to handle this. We could try > to make the call to kmalloc and crypto_alloc_cipher during bootup, and > then generate the random value only on-the-fly (when the first TFO-SYN > comes in) with net_get_random_once in order to have the better entropy > that comes with doing the late initialisation of the random value. But > that's probably net-next material. Can't we simply move the net_get_random_once to the TCP_FASTOPEN setsockopt and sendmsg(MSG_FASTOPEN) path, so those allocations still happen in process context but we still defer the extraction of entropy as long as possible? Thanks, Hannes
[PATCH ipv6 1/1] ipv6: addrconf: do addrconf_ifdown when last ipv6 address is removed
After 'commit 876fd05ddbae03166e7037fca957b55bb3be6594 ("ipv6: don't disable interface if last ipv6 address is removed")', the IPv6 interface configuration (routes, neighbours, etc.) is no longer cleared when the last IPv6 address of an interface is removed. This patch calls addrconf_ifdown() when the last IPv6 address of an interface is removed, to clear the IPv6 interface configuration. This will not delete the /proc/sys/net/ipv6/conf/ directory. Signed-off-by: Mazhar Rana Acked-by: Sanket Shah --- net/ipv6/addrconf.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 37b70e8..230452c 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -2678,6 +2678,8 @@ static int inet6_addr_del(struct net *net, int ifindex, u32 ifa_flags, ipv6_mc_config(net->ipv6.mc_autojoin_sk, false, pfx, dev->ifindex); } + if (list_empty(&idev->addr_list)) + addrconf_ifdown(idev->dev, 0); return 0; } } -- 1.9.1
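A few lines of C can model the intended effect of the two added lines. This is a hypothetical sketch, not kernel code. `struct iface6` only counts addresses, routes and neighbour entries. `model_ifdown()` assumes, as the description above states, that `addrconf_ifdown()` with a second argument of 0 clears routes and neighbours but keeps the /proc/sys/net/ipv6/conf/ directory.

```c
#include <assert.h>

/* Hypothetical, heavily simplified per-interface IPv6 state. */
struct iface6 {
	int naddrs;	/* entries on idev->addr_list */
	int nroutes;	/* routes using this interface */
	int nneigh;	/* neighbour entries on this interface */
	int conf_dir;	/* 1 while /proc/sys/net/ipv6/conf/<dev> exists */
};

/* Stand-in for addrconf_ifdown(dev, 0): flush routes and neighbours but
 * keep the per-device state, including its conf directory. */
static void model_ifdown(struct iface6 *i)
{
	i->nroutes = 0;
	i->nneigh = 0;
}

/* Stand-in for the tail of inet6_addr_del() after the patch. */
static void model_addr_del(struct iface6 *i)
{
	assert(i->naddrs > 0);
	i->naddrs--;
	if (i->naddrs == 0)	/* list_empty(&idev->addr_list) */
		model_ifdown(i);
}
```

In the eth1 scenario from the cover letter, removing the last address would then also drop the stale route towards HostA via GW1.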
[PATCH ipv6 0/1] ipv6: addrconf: routes are not deleted if last ipv6 address is removed
Hi, After 'commit 876fd05ddbae03166e7037fca957b55bb3be6594 ("ipv6: don't disable interface if last ipv6 address is removed")', the IPv6 interface configuration (routes, neighbours, etc.) is no longer cleared when the last IPv6 address of an interface is removed. This now causes a functional issue in the deployment below. On Ubuntu 14.04 (upgraded to Linux kernel 3.19): eth1 GW1: 2604:2000:7000:2::102 eth0 GW2: 2001:df7:6000:101::1b:102 HostA: 3804:3000:1406:2::102 (reachable via both GW1 and GW2) In this deployment, HostA is reachable via both eth0 and eth1. I want all traffic for HostA to go via GW1, which is available on link eth1. $ ip -6 ro s 2001:df7:6000:101::/64 dev eth0 proto kernel metric 256 2604:2000:7000:2::/64 dev eth1 proto kernel metric 256 3804:3000:1406:2::/64 via 2604:2000:7000:2::102 dev eth1 metric 1024 fe80::/64 dev eth0 proto kernel metric 256 fe80::/64 dev eth1 proto kernel metric 256 default via 2001:df7:6000:101::1b:102 dev eth0 proto static metric 1 On failure of GW1, I removed all IPv6 addresses of eth1 so that all traffic would go through the default gateway 'GW2'. $ sudo ip -6 addr flush dev eth1 $ ip -6 ro s 2001:df7:6000:101::/64 dev eth0 proto kernel metric 256 3804:3000:1406:2::/64 via 2604:2000:7000:2::102 dev eth1 metric 1024 fe80::/64 dev eth0 proto kernel metric 256 fe80::/64 dev eth0.100 proto kernel metric 256 default via 2001:df7:6000:101::1b:102 dev eth0 proto static metric 1 But the route for HostA is not deleted, so traffic for HostA still tries to go through GW1, which is no longer reachable. If 'commit 876fd05ddbae03166e7037fca957b55bb3be6594 ("ipv6: don't disable interface if last ipv6 address is removed")' was made only for the problem mentioned in its changelog, then here is an alternative proposal which overcomes both issues. Do you see any side effects of this proposal?
Mazhar Rana (1): ipv6: addrconf: do addrconf_ifdown when last ipv6 address is removed net/ipv6/addrconf.c | 2 ++ 1 file changed, 2 insertions(+) -- 1.9.1
Re: [PATCH net-next] switchdev: fdb filter_dev is always NULL for self (device), so remove check
Thu, Jun 18, 2015 at 01:08:31AM CEST, sfel...@gmail.com wrote: >From: Scott Feldman > >Remove the filter_dev check when dumping fdb entries, otherwise dump >returns empty list. filter_dev is always passed as NULL when dumping fdbs >on SELF. We want the fdbs installed on the device to be listed in the >dump. > >Signed-off-by: Scott Feldman Acked-by: Jiri Pirko
[PATCH] net/phy: Add support for Realtek RTL8211F
RTL8211F has different register definitions from RTL8211E. In particular, it needs TXDLY enabled in RGMII mode. Signed-off-by: Shengzhou Liu --- drivers/net/phy/realtek.c | 68 ++- 1 file changed, 67 insertions(+), 1 deletion(-) diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c index 96a0f0f..4535361 100644 --- a/drivers/net/phy/realtek.c +++ b/drivers/net/phy/realtek.c @@ -22,8 +22,12 @@ #define RTL821x_INER 0x12 #define RTL821x_INER_INIT 0x6400 #define RTL821x_INSR 0x13 +#define RTL8211E_INER_LINK_STATUS 0x400 -#defineRTL8211E_INER_LINK_STATUS 0x400 +#define RTL8211F_INER_LINK_STATUS 0x0010 +#define RTL8211F_INSR 0x1d +#define RTL8211F_PAGE_SELECT 0x1f +#define RTL8211F_TX_DELAY 0x100 MODULE_DESCRIPTION("Realtek PHY driver"); MODULE_AUTHOR("Johnson Leung"); @@ -38,6 +42,18 @@ static int rtl821x_ack_interrupt(struct phy_device *phydev) return (err < 0) ? err : 0; } +static int rtl8211f_ack_interrupt(struct phy_device *phydev) +{ + int err; + + phy_write(phydev, RTL8211F_PAGE_SELECT, 0xa43); + err = phy_read(phydev, RTL8211F_INSR); + /* restore to default page 0 */ + phy_write(phydev, RTL8211F_PAGE_SELECT, 0x0); + + return (err < 0) ?
err : 0; +} + static int rtl8211b_config_intr(struct phy_device *phydev) { int err; @@ -64,6 +80,41 @@ static int rtl8211e_config_intr(struct phy_device *phydev) return err; } +static int rtl8211f_config_intr(struct phy_device *phydev) +{ + int err; + + if (phydev->interrupts == PHY_INTERRUPT_ENABLED) + err = phy_write(phydev, RTL821x_INER, + RTL8211F_INER_LINK_STATUS); + else + err = phy_write(phydev, RTL821x_INER, 0); + + return err; +} + +static int rtl8211f_config_init(struct phy_device *phydev) +{ + int ret; + u16 reg; + + ret = genphy_config_init(phydev); + if (ret < 0) + return ret; + + if (phydev->interface == PHY_INTERFACE_MODE_RGMII) { + /* enable TXDLY */ + phy_write(phydev, RTL8211F_PAGE_SELECT, 0xd08); + reg = phy_read(phydev, 0x11); + reg |= RTL8211F_TX_DELAY; + phy_write(phydev, 0x11, reg); + /* restore to default page 0 */ + phy_write(phydev, RTL8211F_PAGE_SELECT, 0x0); + } + + return 0; +} + static struct phy_driver realtek_drvs[] = { { .phy_id = 0x8201, @@ -98,6 +149,20 @@ static struct phy_driver realtek_drvs[] = { .suspend= genphy_suspend, .resume = genphy_resume, .driver = { .owner = THIS_MODULE,}, + }, { + .phy_id = 0x001cc916, + .name = "RTL8211F Gigabit Ethernet", + .phy_id_mask= 0x001f, + .features = PHY_GBIT_FEATURES, + .flags = PHY_HAS_INTERRUPT, + .config_aneg= &genphy_config_aneg, + .config_init= &rtl8211f_config_init, + .read_status= &genphy_read_status, + .ack_interrupt = &rtl8211f_ack_interrupt, + .config_intr= &rtl8211f_config_intr, + .suspend= genphy_suspend, + .resume = genphy_resume, + .driver = { .owner = THIS_MODULE }, }, }; @@ -106,6 +171,7 @@ module_phy_driver(realtek_drvs); static struct mdio_device_id __maybe_unused realtek_tbl[] = { { 0x001cc912, 0x001f }, { 0x001cc915, 0x001f }, + { 0x001cc916, 0x001f }, { } }; -- 2.1.0.27.g96db324
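The TXDLY setup in `rtl8211f_config_init()` is a paged read-modify-write. It selects page 0xd08 through register 0x1f, sets bit 0x100 in register 0x11, and then restores page 0. The sketch below replays that sequence against an in-memory mock of a paged PHY register file. `struct mock_phy`, `mock_read()` and `mock_write()` are hypothetical, not kernel APIs, and only the pages used by the patch are modelled.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SELECT 0x1f	/* RTL8211F_PAGE_SELECT in the patch */
#define TX_DELAY    0x100	/* RTL8211F_TX_DELAY in the patch */

/* Hypothetical mock: only pages 0x0, 0xa43 and 0xd08 are modelled. */
struct mock_phy {
	uint16_t page;		/* current value of the page-select register */
	uint16_t regs[3][32];
};

static int page_index(uint16_t page)
{
	switch (page) {
	case 0x000: return 0;
	case 0xa43: return 1;
	case 0xd08: return 2;
	default:    return -1;
	}
}

static void mock_write(struct mock_phy *p, int reg, uint16_t val)
{
	int idx;

	assert(reg >= 0 && reg < 32);
	if (reg == PAGE_SELECT) {
		p->page = val;
		return;
	}
	idx = page_index(p->page);
	assert(idx >= 0);
	p->regs[idx][reg] = val;
}

static uint16_t mock_read(struct mock_phy *p, int reg)
{
	int idx;

	assert(reg >= 0 && reg < 32);
	if (reg == PAGE_SELECT)
		return p->page;
	idx = page_index(p->page);
	assert(idx >= 0);
	return p->regs[idx][reg];
}

/* Same register sequence as the RGMII branch of rtl8211f_config_init(). */
static void enable_tx_delay(struct mock_phy *p)
{
	uint16_t reg;

	mock_write(p, PAGE_SELECT, 0xd08);
	reg = mock_read(p, 0x11);
	reg |= TX_DELAY;
	mock_write(p, 0x11, reg);
	mock_write(p, PAGE_SELECT, 0x0);	/* restore default page 0 */
}
```

Restoring page 0 matters because any later access to register 0x11 without a page switch would otherwise hit page 0xd08.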
Re: [PATCH 08/11] IB/cma: Add net_dev and private data checks to RDMA CM
On 17/06/2015 20:18, Jason Gunthorpe wrote: > On Tue, Jun 16, 2015 at 08:26:26AM +0300, Haggai Eran wrote: >> On 15/06/2015 20:08, Jason Gunthorpe wrote: >>> On Mon, Jun 15, 2015 at 11:47:13AM +0300, Haggai Eran wrote: Instead of relying on the ib_cm module to check an incoming CM request's private data header, add these checks to the RDMA CM module. This allows a following patch to clean up the ib_cm interface and remove the code that looks into the private headers. It will also allow supporting namespaces in RDMA CM by making these checks namespace aware later on. >>> >>> I was expecting one of these patches to flow the net_device from here: >>> +static struct net_device *cma_get_net_dev(struct ib_cm_event *ib_event, +const struct cma_req_info *req) +{ >>> >>> Down through cma_req_handler and cma_new_conn_id so that we get rid of >>> the cma_translate_addr on the ingress side. >>> >>> Having the ingress side use one ingress net_device for all processing >>> seems very important to me... >> >> Is it really very important? I thought the bound_dev_if of a passive >> connection id is only used by the netlink statistics mechanism. > > I mean 'very important' in the sense it makes the RDMA-CM *make > logical sense*, not so much in the 'can user space tell'. > > So yes, cleaning this seems very important to establish that logical > narrative of how the packet flows through this code. > > Plus, there is an init_net in the cma_translate_addr path that needs to > be addressed - so purging cma_translate_addr is a great way to handle > that. That would leave only the call in rdma_bind_addr, and for bind, > the process net namespace is the correct thing to use. Okay, I'll add a patch that cleans these cma_translate_addr calls.
[PATCH v2] bpf: fix a bug in verification logic when SUB operation taken on FRAME_PTR
The original code has a problem that causes the following code to fail verification: r1 <- r10 r1 -= 8 r2 = 8 r3 = unsafe pointer call BPF_FUNC_probe_read <-- R1 type=inv expected=fp However, if 'r1 -= 8' is replaced with 'r1 += -8', the above program loads successfully. This is because the verifier only allows a BPF_ADD instruction on a FRAME_PTR register to produce a PTR_TO_STACK register, while BPF_SUB on a FRAME_PTR register yields an UNKNOWN_VALUE register. This patch fixes it by adding BPF_SUB to the stack_relative check. Signed-off-by: Wang Nan --- V1 is incorrect. Please ignore it and consider this one. --- kernel/bpf/verifier.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index a251cf6..681ac72 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1020,7 +1020,8 @@ static int check_alu_op(struct reg_state *regs, struct bpf_insn *insn) } /* pattern match 'bpf_add Rx, imm' instruction */ - if (opcode == BPF_ADD && BPF_CLASS(insn->code) == BPF_ALU64 && + if ((opcode == BPF_ADD || opcode == BPF_SUB) && + BPF_CLASS(insn->code) == BPF_ALU64 && regs[insn->dst_reg].type == FRAME_PTR && BPF_SRC(insn->code) == BPF_K) stack_relative = true; -- 1.8.3.4
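The v1 and v2 conditions can be compared in userspace. The opcode macros below reproduce the encodings from the uapi BPF headers. `stack_relative_v1()`, `stack_relative_v2()` and `stack_offset()` are hypothetical helpers. `stack_offset()` reflects an assumption of this sketch: anything that records the resulting stack offset must negate the immediate for `BPF_SUB`, so that `r1 -= 8` and `r1 += -8` land on the same slot.

```c
#include <assert.h>
#include <stdint.h>

/* Encodings as in the uapi BPF headers. */
#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_OP(code)    ((code) & 0xf0)
#define BPF_SRC(code)   ((code) & 0x08)
#define BPF_ALU64 0x07
#define BPF_ADD   0x00
#define BPF_SUB   0x10
#define BPF_K     0x00

/* v2 condition: ALU64 add/sub of an immediate on the frame pointer. */
static int stack_relative_v2(uint8_t code, int dst_is_frame_ptr)
{
	uint8_t opcode = BPF_OP(code);

	return (opcode == BPF_ADD || opcode == BPF_SUB) &&
	       BPF_CLASS(code) == BPF_ALU64 &&
	       dst_is_frame_ptr && BPF_SRC(code) == BPF_K;
}

/* v1 condition: "opcode == BPF_ADD && opcode == BPF_SUB" is never true. */
static int stack_relative_v1(uint8_t code, int dst_is_frame_ptr)
{
	uint8_t opcode = BPF_OP(code);

	return opcode == BPF_ADD && opcode == BPF_SUB &&
	       BPF_CLASS(code) == BPF_ALU64 &&
	       dst_is_frame_ptr && BPF_SRC(code) == BPF_K;
}

/* Offset from r10 after the instruction: SUB must negate the immediate. */
static int32_t stack_offset(uint8_t code, int32_t imm)
{
	return BPF_OP(code) == BPF_SUB ? -imm : imm;
}
```

Because an opcode cannot equal two different constants at once, the v1 condition rejects even the `BPF_ADD` case that worked before.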
[PATCH] bpf: fix a bug in verification logic when SUB operation taken on FRAME_PTR
The original code has a problem that causes the following code to fail verification: r1 <- r10 r1 -= 8 r2 = 8 r3 = unsafe pointer call BPF_FUNC_probe_read <-- R1 type=inv expected=fp However, if 'r1 -= 8' is replaced with 'r1 += -8', the above program loads successfully. This is because the verifier only allows a BPF_ADD instruction on a FRAME_PTR register to produce a PTR_TO_STACK register, while BPF_SUB on a FRAME_PTR register yields an UNKNOWN_VALUE register. This patch fixes it by adding BPF_SUB to the stack_relative check. Signed-off-by: Wang Nan --- kernel/bpf/verifier.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index a251cf6..6dbdeba 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1020,7 +1020,8 @@ static int check_alu_op(struct reg_state *regs, struct bpf_insn *insn) } /* pattern match 'bpf_add Rx, imm' instruction */ - if (opcode == BPF_ADD && BPF_CLASS(insn->code) == BPF_ALU64 && + if (opcode == BPF_ADD && opcode == BPF_SUB && + BPF_CLASS(insn->code) == BPF_ALU64 && regs[insn->dst_reg].type == FRAME_PTR && BPF_SRC(insn->code) == BPF_K) stack_relative = true; -- 1.8.3.4
Re: [PATCH v2 net-next 0/3] bpf: share helpers between tracing and networking
On 06/16/2015 07:10 PM, Alexei Starovoitov wrote: ... Ideally we would allow a blend of tracing and networking programs, then the best solution would be one or two stable tracepoints in networking stack where skb is visible and receiving/transmitting task is also visible, then skb->len and task->pid together would give nice foundation for accurate stats. I think combining both is interesting anyway; we need to find a way to make gluing both worlds together easy to use, though. It's certainly interesting for stats/diagnostics, but one wouldn't be able to use the current/future skb eBPF helpers from {cls,act}_bpf in that context.
Re: [PATCH v2 1/3] net: mvneta: introduce compatible string "marvell, armada-xp-neta"
Dear Jason Cooper, On Wed, 17 Jun 2015 21:39:26 +, Jason Cooper wrote: > Odd, I'd use that as an example of the process working. ;-) we have > everyone using 'armada-370-neta' for a given block. We discovered that > the original IP block (on the 370s) had a limitation (no hw checksum > for greater than 1600 bytes). A newer version of the IP block (XP) > doesn't have the limitation. > > So we change the driver to honor the limit for the 370 compatible > string. We create a new compatible string for xp where the block > doesn't have the limitation. > > How did the process fail? Because now all Armada XP users of jumbo frames are losing HW checksumming on their jumbo frames, which you can consider to be a regression: it was working, it is no longer working. Of course, since it falls back to SW checksumming, it still "works", but some users can complain of the performance penalty and consider it to be a regression. If, on Armada XP, we had used from the beginning: compatible = "marvell,armada-xp-neta", "marvell,armada-370-neta" with only marvell,armada-370-neta supported originally, we could have added this fix without breaking HW checksumming on jumbo frames for Armada XP users. So I'm sorry, but the process indeed failed, because Armada XP users keeping their old Device Tree blob will see a regression. > I'm not seeing where backwards compatibility was broken? A device with > an old dtb booting a newer kernel gets a bugfix. In the case of an XP > board with an old dtb (armada-370-neta), the hardware still works, but > not optimally. Upgrading the dtb will enable hw checksumming for jumbo > packets. "not optimally" is still a breakage. Again, I personally don't care about DT backward compatibility as I think it's a stupid requirement.
But I like to point out to the DT backward compatibility fanatics when it was actually broken :-) Best regards, Thomas -- Thomas Petazzoni, CTO, Free Electrons Embedded Linux, Kernel and Android engineering http://free-electrons.com
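Thomas's point about listing a fallback can be illustrated with a simplified matching model. The node's `compatible` entries are tried from most to least specific, and the first one the driver knows wins. The kernel's actual matching logic is more involved. `struct match_entry` and `match_node()` are hypothetical, and `jumbo_hw_csum` merely stands for "HW checksumming allowed above 1600 bytes".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical driver match entry with one feature flag. */
struct match_entry {
	const char *compatible;
	int jumbo_hw_csum;	/* 1: HW checksum allowed above 1600 bytes */
};

/* Try the node's compatible list in order (most specific first) and
 * return the first entry the driver table knows about, or NULL. */
static const struct match_entry *
match_node(const char *const *node_compat, size_t ncompat,
	   const struct match_entry *table, size_t ntable)
{
	assert(node_compat != NULL || ncompat == 0);
	for (size_t i = 0; i < ncompat; i++)
		for (size_t j = 0; j < ntable; j++)
			if (strcmp(node_compat[i], table[j].compatible) == 0)
				return &table[j];
	return NULL;
}
```

With the two-entry list, an older kernel that only knows the 370 string still binds, and a newer kernel picks the more specific XP entry. An XP blob that lists only the 370 string gets the limited behaviour, which is the regression described above.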