Re: [PATCH 2/6] test_bpf: allow tests to specify an skb fragment.
On 08/03/2015 04:02 PM, Nicolas Schichan wrote:
> This introduces a new test->aux flag (FLAG_SKB_FRAG) to tell the
> populate_skb() function to add a fragment to the test skb containing
> the data specified in test->frag_data.
>
> Signed-off-by: Nicolas Schichan <nschic...@freebox.fr>
> Acked-by: Alexei Starovoitov <a...@plumgrid.com>

Acked-by: Daniel Borkmann <dan...@iogearbox.net>

I'm good with this change here, just a comment below in general.

>  enum {
>  	CLASSIC  = BIT(6),	/* Old BPF instructions only. */
> @@ -81,6 +83,7 @@ struct bpf_test {
>  		__u32 result;
>  	} test[MAX_SUBTESTS];
>  	int (*fill_helper)(struct bpf_test *self);
> +	__u8 frag_data[MAX_DATA];
>  };

We now have 286 tests, which is awesome!

Perhaps we need to start thinking of a better test description method
soonish, as the test_bpf.ko module grew to ~1.6M, i.e. whenever we add
to struct bpf_test, it adds memory overhead upon all test cases.

>  /* Large test cases need separate allocation and fill handler. */
> @@ -4525,6 +4528,10 @@ static struct sk_buff *populate_skb(char *buf, int size)
>  static void *generate_test_data(struct bpf_test *test, int sub)
>  {
> +	struct sk_buff *skb;
> +	struct page *page;
> +	void *ptr;
> +
>  	if (test->aux & FLAG_NO_DATA)
>  		return NULL;
>
> @@ -4532,7 +4539,36 @@ static void *generate_test_data(struct bpf_test *test, int sub)
>  	 * subtests generate skbs of different sizes based on
>  	 * the same data.
>  	 */
> -	return populate_skb(test->data, test->test[sub].data_size);
> +	skb = populate_skb(test->data, test->test[sub].data_size);
> +	if (!skb)
> +		return NULL;
> +
> +	if (test->aux & FLAG_SKB_FRAG) {

Really minor nit: declaration of page, ptr could have been only in
this block.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[PATCH] net: fec: fix initial runtime PM refcount
The clocks are initially active and thus the device is marked active.
This still keeps the PM refcount at 0, the pm_runtime_put_autosuspend()
call at the end of probe then leaves us with an invalid refcount of -1,
which in turn leads to the device staying in suspended state even though
netdev open had been called.

Fix this by initializing the refcount to be coherent with the initial
device status.

Fixes: 8fff755e9f8 (net: fec: Ensure clocks are enabled while using mdio bus)
Signed-off-by: Lucas Stach <l.st...@pengutronix.de>
---
Please apply this as a fix for 4.2
---
 drivers/net/ethernet/freescale/fec_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 32e3807c650e..271bb5862346 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3433,6 +3433,7 @@ fec_probe(struct platform_device *pdev)

 	pm_runtime_set_autosuspend_delay(&pdev->dev, FEC_MDIO_PM_TIMEOUT);
 	pm_runtime_use_autosuspend(&pdev->dev);
+	pm_runtime_get_noresume(&pdev->dev);
 	pm_runtime_set_active(&pdev->dev);
 	pm_runtime_enable(&pdev->dev);

--
2.1.4
Re: [PATCH 08/15] drivers: net: Drop unlikely before IS_ERR(_OR_NULL)
On 07/31/2015 12:20 PM, Viresh Kumar wrote:
> On 31-07-15, 11:04, Murali Karicheri wrote:
>> On 07/31/2015 04:38 AM, Viresh Kumar wrote:
>>> IS_ERR(_OR_NULL) already contain an 'unlikely' compiler flag and there
>>> is no need to do that again from its callers. Drop it.
>>
>> IS_ERR_OR_NULL() is defined as
>>
>> static inline bool __must_check IS_ERR_OR_NULL(__force const void *ptr)
>> {
>> 	return !ptr || IS_ERR_VALUE((unsigned long)ptr);
>> }
>>
>> So the unlikely() applies only to second part. Wouldn't that be a
>> problem for optimization?
>
> This is what the first patch of the series does:
>
> http://permalink.gmane.org/gmane.linux.kernel/2009151

Assuming the above change is merged, this patch looks good.

Acked-by: Murali Karicheri <m-kariche...@ti.com>

--
Murali Karicheri
Linux Kernel, Keystone
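For readers following the unlikely() point, here is a minimal user-space sketch of what it looks like when the branch hint covers both halves of the test. It is an illustration only, not the text of the referenced first patch: err_ptr(), is_err_or_null() and the demo function are hypothetical stand-ins, and the macros are simplified copies that assume a GCC or Clang compiler for __builtin_expect().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space stand-ins for the kernel macros (simplified, assumed values). */
#define MAX_ERRNO	4095
#define unlikely(x)	__builtin_expect(!!(x), 0)
#define IS_ERR_VALUE(x)	unlikely((unsigned long)(x) >= (unsigned long)-MAX_ERRNO)

/* Hypothetical helper: encode a negative errno in a pointer, like ERR_PTR(). */
static inline void *err_ptr(long error)
{
	return (void *)error;
}

/* Variant with the branch hint on the NULL test as well as the error test. */
static inline bool is_err_or_null(const void *ptr)
{
	return unlikely(!ptr) || IS_ERR_VALUE((unsigned long)ptr);
}

/* Small self-check that can be called from any main(). */
static inline void is_err_or_null_demo(void)
{
	int object = 0;

	assert(is_err_or_null(NULL));
	assert(is_err_or_null(err_ptr(-22)));	/* -EINVAL */
	assert(!is_err_or_null(&object));
}
```

With the hint inside the helper, a caller-side unlikely(IS_ERR_OR_NULL(p)) adds nothing, which is the premise for dropping it in the callers.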
[PATCH net-next] act_bpf: properly support late binding of bpf action to a classifier
Since the introduction of the BPF action in d23b8ad8ab23 ("tc: add BPF
based action"), late binding was not working as expected. I.e. setting
the action part for a classifier only via 'bpf index <num>', where <num>
is the index of an existing action, is being rejected by the kernel due
to other missing parameters.

It doesn't make sense to require these parameters such as BPF opcodes
etc, as they are not going to be used anyway: in this case, they're just
allocated/parsed and then freed again w/o doing anything meaningful.

Instead, parse and verify the remaining parameters *after* the test on
tcf_hash_check(), when we really know that we're dealing with creation
of a new action or replacement of an existing one and where late binding
is thus irrelevant.

After patch, test case is now working:

  FOO="1,6 0 0 4294967295,"
  tc actions add action bpf bytecode "$FOO"
  tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action bpf index 1
  tc actions show action bpf
    action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
    index 1 ref 2 bind 1
  tc filter show dev foo
    filter protocol all pref 49152 bpf
    filter protocol all pref 49152 bpf handle 0x1 flowid 1:1 bytecode '1,6 0 0 4294967295'
    action order 1: bpf bytecode '1,6 0 0 4294967295' default-action pipe
    index 1 ref 2 bind 1

Late binding of a BPF action can be useful for preloading maps (e.g. before
they hit traffic) in case of eBPF programs, or to share a single eBPF action
with multiple classifiers.

Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
---
 This one was still in my queue of fixes, net-next is totally fine here.
 Will push out minor iproute2 change afterwards.

 net/sched/act_bpf.c | 51 +++
 1 file changed, 27 insertions(+), 24 deletions(-)

diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index aaae8e8..1b97dab 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -278,7 +278,7 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
 	struct tc_act_bpf *parm;
 	struct tcf_bpf *prog;
 	bool is_bpf, is_ebpf;
-	int ret;
+	int ret, res = 0;

 	if (!nla)
 		return -EINVAL;
@@ -287,41 +287,43 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
 	if (ret < 0)
 		return ret;

-	is_bpf = tb[TCA_ACT_BPF_OPS_LEN] && tb[TCA_ACT_BPF_OPS];
-	is_ebpf = tb[TCA_ACT_BPF_FD];
-
-	if ((!is_bpf && !is_ebpf) || (is_bpf && is_ebpf) ||
-	    !tb[TCA_ACT_BPF_PARMS])
+	if (!tb[TCA_ACT_BPF_PARMS])
 		return -EINVAL;

 	parm = nla_data(tb[TCA_ACT_BPF_PARMS]);

-	memset(&cfg, 0, sizeof(cfg));
-
-	ret = is_bpf ? tcf_bpf_init_from_ops(tb, &cfg) :
-		       tcf_bpf_init_from_efd(tb, &cfg);
-	if (ret < 0)
-		return ret;
-
 	if (!tcf_hash_check(parm->index, act, bind)) {
 		ret = tcf_hash_create(parm->index, est, act,
 				      sizeof(*prog), bind, false);
 		if (ret < 0)
-			goto destroy_fp;
+			return ret;

-		ret = ACT_P_CREATED;
+		res = ACT_P_CREATED;
 	} else {
 		/* Don't override defaults. */
 		if (bind)
-			goto destroy_fp;
+			return 0;

 		tcf_hash_release(act, bind);
-		if (!replace) {
-			ret = -EEXIST;
-			goto destroy_fp;
-		}
+		if (!replace)
+			return -EEXIST;
 	}

+	is_bpf = tb[TCA_ACT_BPF_OPS_LEN] && tb[TCA_ACT_BPF_OPS];
+	is_ebpf = tb[TCA_ACT_BPF_FD];
+
+	if ((!is_bpf && !is_ebpf) || (is_bpf && is_ebpf)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	memset(&cfg, 0, sizeof(cfg));
+
+	ret = is_bpf ? tcf_bpf_init_from_ops(tb, &cfg) :
+		       tcf_bpf_init_from_efd(tb, &cfg);
+	if (ret < 0)
+		goto out;
+
 	prog = to_bpf(act);
 	spin_lock_bh(&prog->tcf_lock);

@@ -341,15 +343,16 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,

 	spin_unlock_bh(&prog->tcf_lock);

-	if (ret == ACT_P_CREATED)
+	if (res == ACT_P_CREATED)
 		tcf_hash_insert(act);
 	else
 		tcf_bpf_cfg_cleanup(&old);

-	return ret;
+	return res;
+out:
+	if (res == ACT_P_CREATED)
+		tcf_hash_cleanup(act, est);

-destroy_fp:
-	tcf_bpf_cfg_cleanup(&cfg);
 	return ret;
 }
--
1.9.3
[PATCH] xen-netback: Allocate fraglist early to avoid complex rollback
Determine if a fraglist is needed in the tx path, and allocate it if
necessary before setting up the copy and map operations.
Otherwise, undoing the copy and map operations is tricky.

This fixes a use-after-free: if allocating the fraglist failed, the copy
and map operations that had been set up were still executed, writing
over the data area of a freed skb.

Signed-off-by: Ross Lagerwall <ross.lagerw...@citrix.com>
---
 drivers/net/xen-netback/netback.c | 61 +--
 1 file changed, 33 insertions(+), 28 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 7d50711..1b406e7 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -810,23 +810,17 @@ static inline struct sk_buff *xenvif_alloc_skb(unsigned int size)
 static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif_queue *queue,
 							struct sk_buff *skb,
 							struct xen_netif_tx_request *txp,
-							struct gnttab_map_grant_ref *gop)
+							struct gnttab_map_grant_ref *gop,
+							unsigned int frag_overflow,
+							struct sk_buff *nskb)
 {
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	skb_frag_t *frags = shinfo->frags;
 	u16 pending_idx = XENVIF_TX_CB(skb)->pending_idx;
 	int start;
 	pending_ring_idx_t index;
-	unsigned int nr_slots, frag_overflow = 0;
+	unsigned int nr_slots;

-	/* At this point shinfo->nr_frags is in fact the number of
-	 * slots, which can be as large as XEN_NETBK_LEGACY_SLOTS_MAX.
-	 */
-	if (shinfo->nr_frags > MAX_SKB_FRAGS) {
-		frag_overflow = shinfo->nr_frags - MAX_SKB_FRAGS;
-		BUG_ON(frag_overflow > MAX_SKB_FRAGS);
-		shinfo->nr_frags = MAX_SKB_FRAGS;
-	}
 	nr_slots = shinfo->nr_frags;

 	/* Skip first skb fragment if it is on same page as header fragment. */
@@ -841,13 +835,6 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif_queue *que
 	}

 	if (frag_overflow) {
-		struct sk_buff *nskb = xenvif_alloc_skb(0);
-		if (unlikely(nskb == NULL)) {
-			if (net_ratelimit())
-				netdev_err(queue->vif->dev,
-					   "Can't allocate the frag_list skb.\n");
-			return NULL;
-		}

 		shinfo = skb_shinfo(nskb);
 		frags = shinfo->frags;
@@ -1175,9 +1162,10 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 				     unsigned *copy_ops,
 				     unsigned *map_ops)
 {
-	struct gnttab_map_grant_ref *gop = queue->tx_map_ops, *request_gop;
-	struct sk_buff *skb;
+	struct gnttab_map_grant_ref *gop = queue->tx_map_ops;
+	struct sk_buff *skb, *nskb;
 	int ret;
+	unsigned int frag_overflow;

 	while (skb_queue_len(&queue->tx_queue) < budget) {
 		struct xen_netif_tx_request txreq;
@@ -1265,6 +1253,29 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 			break;
 		}

+		skb_shinfo(skb)->nr_frags = ret;
+		if (data_len < txreq.size)
+			skb_shinfo(skb)->nr_frags++;
+		/* At this point shinfo->nr_frags is in fact the number of
+		 * slots, which can be as large as XEN_NETBK_LEGACY_SLOTS_MAX.
+		 */
+		frag_overflow = 0;
+		nskb = NULL;
+		if (skb_shinfo(skb)->nr_frags > MAX_SKB_FRAGS) {
+			frag_overflow = skb_shinfo(skb)->nr_frags - MAX_SKB_FRAGS;
+			BUG_ON(frag_overflow > MAX_SKB_FRAGS);
+			skb_shinfo(skb)->nr_frags = MAX_SKB_FRAGS;
+			nskb = xenvif_alloc_skb(0);
+			if (unlikely(nskb == NULL)) {
+				kfree_skb(skb);
+				xenvif_tx_err(queue, &txreq, idx);
+				if (net_ratelimit())
+					netdev_err(queue->vif->dev,
+						   "Can't allocate the frag_list skb.\n");
+				break;
+			}
+		}
+
 		if (extras[XEN_NETIF_EXTRA_TYPE_GSO - 1].type) {
 			struct xen_netif_extra_info *gso;
 			gso = &extras[XEN_NETIF_EXTRA_TYPE_GSO - 1];
@@ -1272,6 +1283,7 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 			if (xenvif_set_skb_gso(queue->vif,
Re: [PATCH 1/6] test_bpf: avoid oopsing the kernel when generate_test_data() fails.
On 08/03/2015 04:02 PM, Nicolas Schichan wrote:
> Signed-off-by: Nicolas Schichan <nschic...@freebox.fr>
> Acked-by: Alexei Starovoitov <a...@plumgrid.com>

Acked-by: Daniel Borkmann <dan...@iogearbox.net>
Re: [PATCH net-next] virtio_net: add gro capability
On 08/03/2015 06:37 AM, Michael S. Tsirkin wrote:
> Ideally this needs to also be tested on non-vxlan configs with gro in
> host, to make sure this doesn't cause regressions.

Measured with the same instances on the same hardware and software,
taking a path through the stack (public rather than private IPs, with
Distributed Virtual Router (DVR) enabled) which gives them GRO:

Throughput
                             Min    Median  Average  Max
4.2.0-rc3+_hostGRO           6713   8351    8232     9102
4.2.0-rc3+flush1k_hostGRO    6539   8267    8206     8982

As singletons, Mins and Maxes probably have rather high variability,
I'd focus on the Median and Average and those are within 1%.

Send Service Demand
                             Min    Median  Average  Max
4.2.0-rc3+_hostGRO           0.332  0.496   0.490    0.651
4.2.0-rc3+flush1k_hostGRO    0.328  0.493   0.488    0.678

Receive Service Demand
                             Min    Median  Average  Max
4.2.0-rc3+_hostGRO           0.386  0.469   0.485    0.677
4.2.0-rc3+flush1k_hostGRO    0.369  0.466   0.477    0.665

happy benchmarking,

rick
Re: [PATCH] net: fec: fix initial runtime PM refcount
On Mon, Aug 03, 2015 at 05:50:11PM +0200, Lucas Stach wrote:
> The clocks are initially active and thus the device is marked active.
> This still keeps the PM refcount at 0, the pm_runtime_put_autosuspend()
> call at the end of probe then leaves us with an invalid refcount of -1,
> which in turn leads to the device staying in suspended state even though
> netdev open had been called.
>
> Fix this by initializing the refcount to be coherent with the initial
> device status.
>
> Fixes: 8fff755e9f8 (net: fec: Ensure clocks are enabled while using mdio bus)
> Signed-off-by: Lucas Stach <l.st...@pengutronix.de>
> ---
> Please apply this as a fix for 4.2
> ---
>  drivers/net/ethernet/freescale/fec_main.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> index 32e3807c650e..271bb5862346 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -3433,6 +3433,7 @@ fec_probe(struct platform_device *pdev)
>
>  	pm_runtime_set_autosuspend_delay(&pdev->dev, FEC_MDIO_PM_TIMEOUT);
>  	pm_runtime_use_autosuspend(&pdev->dev);
> +	pm_runtime_get_noresume(&pdev->dev);
>  	pm_runtime_set_active(&pdev->dev);
>  	pm_runtime_enable(&pdev->dev);

This might work, but is it the correct fix?

Documentation/power/runtime_pm.txt says:

534 In addition to that, the initial runtime PM status of all devices is
535 'suspended', but it need not reflect the actual physical state of the device.
536 Thus, if the device is initially active (i.e. it is able to process I/O), its
537 runtime PM status must be changed to 'active', with the help of
538 pm_runtime_set_active(), before pm_runtime_enable() is called for the device.

At the point we call the pm_runtime_ functions above, all the clocks
are ticking. So according to the documentation, pm_runtime_set_active()
is the right thing to do. But it makes no mention of having to call
pm_runtime_get_noresume(). I would have expected pm_runtime_set_active()
to set the count to the correct value.

	Andrew
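To make the refcount arithmetic under discussion concrete, here is a toy user-space model. It is only a sketch under a stated assumption: that marking the device active changes the runtime PM status but leaves the usage count alone, which is how the patch description reads. Every name in it (struct toy_pm and the toy_* helpers) is hypothetical and is not the kernel's runtime PM API.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the kernel's runtime PM API. */
struct toy_pm {
	int usage_count;	/* models the device's PM usage count */
	bool active;		/* models the runtime PM status */
};

/* Assumption: setting the status does not touch the usage count. */
static void toy_set_active(struct toy_pm *pm)
{
	pm->active = true;
}

static void toy_get_noresume(struct toy_pm *pm)
{
	pm->usage_count++;
}

static void toy_put(struct toy_pm *pm)
{
	pm->usage_count--;
}

/* Probe sequence from the patch; returns the usage count left after probe. */
static int toy_probe(bool take_initial_ref)
{
	struct toy_pm pm = { .usage_count = 0, .active = false };

	if (take_initial_ref)
		toy_get_noresume(&pm);
	toy_set_active(&pm);
	toy_put(&pm);		/* models the put at the end of probe */
	return pm.usage_count;
}

/* Small self-check that can be called from any main(). */
static void toy_probe_demo(void)
{
	assert(toy_probe(false) == -1);	/* unbalanced put */
	assert(toy_probe(true) == 0);	/* balanced by the extra get */
}
```

Under that assumption the put at the end of probe has no matching get unless one is taken explicitly, which is the imbalance the patch describes.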
Re: [PATCH] openvswitch: Fix L4 checksum handling when dealing with IP fragments
On Sat, Aug 1, 2015 at 6:31 PM, Glenn Griffin <ggriffin.ker...@gmail.com> wrote:
> openvswitch modifies the L4 checksum of a packet when modifying
> the ip address. When an IP packet is fragmented only the first
> fragment contains an L4 header and checksum. Prior to this change
> openvswitch would modify all fragments, modifying application data
> in non-first fragments, causing checksum failures in the
> reassembled packet.
>
> Signed-off-by: Glenn Griffin <ggriffin.ker...@gmail.com>

Patch looks good. I have one comment below.

> ---
>  net/openvswitch/actions.c | 16
>  1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 8a8c0b8..bfffb1a 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -273,28 +273,36 @@ static int set_eth_addr(struct sk_buff *skb, struct sw_flow_key *flow_key,
>  	return 0;
>  }
>
> -static void set_ip_addr(struct sk_buff *skb, struct iphdr *nh,
> -			__be32 *addr, __be32 new_addr)
> +static void update_ip_l4_checksum(struct sk_buff *skb, struct iphdr *nh,
> +				  __be32 addr, __be32 new_addr)
>  {
>  	int transport_len = skb->len - skb_transport_offset(skb);
>
> +	if (ntohs(nh->frag_off) & IP_OFFSET)
> +		return;

It is more efficient to check the fragment offset in network byte order.
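A small user-space sketch of the byte-order suggestion, for illustration only: the two helper names are hypothetical, IP_OFFSET is redefined locally as the 13-bit fragment offset mask, and the POSIX htons()/ntohs() stand in for the kernel's byte-order helpers. The idea is to swap the constant rather than the packet field, since a swap of a constant can typically be folded at compile time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>

#define IP_OFFSET 0x1FFF	/* fragment offset mask (local copy for this sketch) */

/* As posted: convert the header field to host order on every packet. */
static bool is_later_frag_host_order(uint16_t frag_off_be)
{
	return ntohs(frag_off_be) & IP_OFFSET;
}

/* Suggested form: keep the field in network order and swap the constant. */
static bool is_later_frag_net_order(uint16_t frag_off_be)
{
	return frag_off_be & htons(IP_OFFSET);
}

/* Small self-check that can be called from any main(). */
static void frag_check_demo(void)
{
	/* MF flag set, offset 0: this is the first fragment. */
	assert(!is_later_frag_net_order(htons(0x2000)));
	/* Non-zero offset: a later fragment without an L4 header. */
	assert(is_later_frag_net_order(htons(0x00b9)));
}
```

Both helpers give the same answer for every 16-bit value, because a byte swap only moves bits around; the test is for a non-zero result, not for a particular value.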
Re: [PATCH 4/6] test_bpf: add module parameters to filter the tests to run.
On 08/03/2015 05:58 PM, Daniel Borkmann wrote:
> On 08/03/2015 04:02 PM, Nicolas Schichan wrote:
>> When developping on the interpreter or a particular JIT, it can be
>> insteresting to restrict the test list to a specific test or a
>
> s/insteresting/interesting/
>
> [...]
> s/test_pbf/test_bpf/
>
> [...]
> s/test_pbf/test_bpf/
>
> [...]
> s/conver/cover/

Sorry for the various typos, I'll fix that in a V2.

>> +	 */
>> +	if (test_id >= ARRAY_SIZE(tests)) {
>> +		pr_err("test_bpf: invalid test_id specified.\n");
>> +		return -EINVAL;
>> +	}
> [...]
>> @@ -4893,6 +4955,14 @@ static __init void destroy_bpf_tests(void)
>>  	}
>>  }
>>
>> +static bool exclude_test(int test_id)
>> +{
>> +	if (test_range[0] >= 0 &&
>> +	    (test_id < test_range[0] || test_id > test_range[1]))
>> +		return true;
>> +	return false;
>
> Minor nit: could directly return it, f.e.:
>
>   return test_range[0] >= 0 && (test_id < test_range[0] ||
>                                 test_id > test_range[1]);

I will change that.

> Btw, for the range test in prepare_bpf_tests(), you could also reject
> a negative lower bound index right there.

I thought it was better to have all the sanity checks grouped in
prepare_bpf_tests() (with the checking of the test_name and test_id
parameters nearby)? Also, a negative lower bound means that no range
has been set, so all tests should be run.

Thanks,

--
Nicolas Schichan
Freebox SAS
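For reference, a self-contained user-space sketch of the direct-return form discussed above. The test_range array here is a hypothetical stand-in for the module parameter, and its { -1, -1 } default is an assumption based on the remark that a negative lower bound means no range has been set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the module parameter; { -1, -1 } = no range set. */
static int test_range[2] = { -1, -1 };

/* Direct-return form: exclude only when a range is set and test_id is outside it. */
static bool exclude_test(int test_id)
{
	return test_range[0] >= 0 &&
	       (test_id < test_range[0] || test_id > test_range[1]);
}

/* Small self-check that can be called from any main(). */
static void exclude_test_demo(void)
{
	int saved[2] = { test_range[0], test_range[1] };

	test_range[0] = -1;
	test_range[1] = -1;
	assert(!exclude_test(0));	/* no range set: run everything */

	test_range[0] = 10;
	test_range[1] = 20;
	assert(exclude_test(9));
	assert(!exclude_test(10));
	assert(!exclude_test(20));
	assert(exclude_test(21));

	test_range[0] = saved[0];
	test_range[1] = saved[1];
}
```

The bounds are inclusive on both ends, matching the comparison operators in the quoted hunk.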
Re: [patch] rds: fix an integer overflow test in rds_info_getsockopt()
From: Dan Carpenter <dan.carpen...@oracle.com>
Date: Sat, 1 Aug 2015 15:33:26 +0300

> "len" is a signed integer. We check that len is not negative, so it
> goes from zero to INT_MAX. PAGE_SIZE is unsigned long so the comparison
> is type promoted to unsigned long. ULONG_MAX - 4095 is higher than
> INT_MAX so the condition can never be true.
>
> I don't know if this is harmful but it seems safe to limit "len" to
> INT_MAX - 4095.
>
> Fixes: a8c879a7ee98 ('RDS: Info and stats')
> Signed-off-by: Dan Carpenter <dan.carpen...@oracle.com>

Applied, thanks Dan.
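The promotion argument can be checked in user space with a short sketch. This is an illustration, not the RDS code: both function names are hypothetical, the wrap test is written in the style the commit message describes, and PAGE_SIZE is assumed to be 4096 with type unsigned long.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define PAGE_SIZE 4096UL	/* assumed page size, unsigned long as in the kernel */

/*
 * Wrap test in the style being replaced. For len >= 0 the left-hand side is
 * computed in unsigned long, and INT_MAX + 4095 still fits there, so the
 * second half of the condition is never true.
 */
static bool old_style_rejects(int len)
{
	return len < 0 || len + PAGE_SIZE - 1 < len;
}

/* Explicit bound in the spirit of the fix: limit len to INT_MAX - 4095. */
static bool bounded_rejects(int len)
{
	return len < 0 || len > INT_MAX - 4095;
}

/* Small self-check that can be called from any main(). */
static void overflow_check_demo(void)
{
	assert(!old_style_rejects(INT_MAX));	/* the old test lets this through */
	assert(bounded_rejects(INT_MAX));	/* the explicit bound catches it */
	assert(!bounded_rejects(INT_MAX - 4095));
}
```

A compiler with sign-comparison warnings enabled will flag the mixed comparison in old_style_rejects(), which is a useful hint that the test does not mean what it appears to.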
RX overrun errors building on several of our hosts
After implementing a new monitoring tool I've noticed that several of
our physical servers have increasing RX errors; all seem to be
classified as overruns. The interfaces are Broadcom Corporation BCM57840
NetXtreme II 10 Gigabit Ethernet (rev 11) and we are using the bnx2x
driver. They are configured in a bond0 using mode 0 (balance-rr).

We are not seeing any errors on the switch and my guess is that this is
either a config issue or a driver problem, since it's happening on
multiple servers. All the interfaces appear to be connected at 10 gig
full duplex. The servers are Dell M620s.

I've gathered as much related info as I could think of that would be
helpful; it can be found in this paste:

http://pastebin.centos.org/31716/

I'm not entirely sure where to look next, any help would be much
appreciated.

Thanks,
Dan
RE: [PATCH net] udp: fix dst races with multicast early demux
Hi,

I have included this patch into my code and re-run our tests overnight;
out of 644 iterations we did not see the kernel crash. At the previous
reproduction rate we would have expected 4-6 crashes in this time. So I
think this fixes the issue we are seeing.

Thanks,
Greg

From: netdev-ow...@vger.kernel.org <netdev-ow...@vger.kernel.org> on behalf of Eric Dumazet <eric.duma...@gmail.com>
Sent: Saturday, 1 August 2015 10:14 p.m.
To: Gregory Hoggarth
Cc: Shawn Bohrer; netdev@vger.kernel.org; alexgartr...@gmail.com; Michal Kubeček
Subject: [PATCH net] udp: fix dst races with multicast early demux

From: Eric Dumazet <eduma...@google.com>

Multicast dst are not cached. They carry DST_NOCACHE.

As mentioned in commit f8864972126899 ("ipv4: fix dst race in
sk_dst_get()"), these dst need special care before caching them
into a socket.

Caching them is allowed only if their refcnt was not 0, ie we
must use atomic_inc_not_zero()

Also, we must use READ_ONCE() to fetch sk->sk_rx_dst, as mentioned
in commit d0c294c53a771 ("tcp: prevent fetching dst twice in early demux
code")

Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux")
Signed-off-by: Eric Dumazet <eduma...@google.com>
Reported-by: Gregory Hoggarth <gregory.hogga...@alliedtelesis.co.nz>
Reported-by: Alex Gartrell <agartr...@fb.com>
Cc: Michal Kubeček <mkube...@suse.cz>
---
David : I will be on vacation for following 7 days, no internet access.
Please wait for tests done by Gregory & Alex before merging this ?
Thanks !

 net/ipv4/udp.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 83aa604f9273..1b8c5ba7d5f7 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1995,12 +1995,19 @@ void udp_v4_early_demux(struct sk_buff *skb)

 	skb->sk = sk;
 	skb->destructor = sock_efree;
-	dst = sk->sk_rx_dst;
+	dst = READ_ONCE(sk->sk_rx_dst);

 	if (dst)
 		dst = dst_check(dst, 0);
-	if (dst)
-		skb_dst_set_noref(skb, dst);
+	if (dst) {
+		/* DST_NOCACHE can not be used without taking a reference */
+		if (dst->flags & DST_NOCACHE) {
+			if (likely(atomic_inc_not_zero(&dst->__refcnt)))
+				skb_dst_set(skb, dst);
+		} else {
+			skb_dst_set_noref(skb, dst);
+		}
+	}
 }

 int udp_rcv(struct sk_buff *skb)
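The "take a reference only if the count is not already zero" idea behind atomic_inc_not_zero() can be sketched in user space with C11 atomics. This is an analogue for illustration, not the kernel implementation; ref_inc_not_zero() and the demo function are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * User-space analogue of an inc-not-zero operation: bump the refcount only
 * if the object is still live, and report whether a reference was taken.
 */
static bool ref_inc_not_zero(atomic_int *refcnt)
{
	int old = atomic_load(refcnt);

	while (old != 0) {
		/* On failure, old is refreshed with the current value; retry. */
		if (atomic_compare_exchange_weak(refcnt, &old, old + 1))
			return true;
	}
	return false;	/* count already hit zero: the object is going away */
}

/* Small self-check that can be called from any main(). */
static void ref_inc_not_zero_demo(void)
{
	atomic_int live = 1;
	atomic_int dying = 0;

	assert(ref_inc_not_zero(&live));
	assert(atomic_load(&live) == 2);

	assert(!ref_inc_not_zero(&dying));
	assert(atomic_load(&dying) == 0);
}
```

A plain increment would resurrect an object whose count had already dropped to zero; the compare-and-swap loop refuses in that case, which is why a caller has to check the return value before using the object.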
[PATCH net-next v2 2/2] gre: Remove support for sharing GRE protocol hook.
Support for sharing GREPROTO_CISCO port was added so that OVS gre port
and kernel GRE devices can co-exist. After flow-based tunneling patches
OVS GRE protocol processing is completely moved to ip_gre module, so
there is no need for a GRE protocol hook. The following patch
consolidates GRE protocol related functions into ip_gre module.

Signed-off-by: Pravin B Shelar <pshe...@nicira.com>
---
 include/net/gre.h    |  86 +++--
 net/ipv4/gre_demux.c | 201 +--
 net/ipv4/ip_gre.c    | 215 +++
 3 files changed, 209 insertions(+), 293 deletions(-)

diff --git a/include/net/gre.h b/include/net/gre.h
index 4193fd7..b2572b7 100644
--- a/include/net/gre.h
+++ b/include/net/gre.h
@@ -4,92 +4,24 @@
 #include <linux/skbuff.h>
 #include <net/ip_tunnels.h>

-#define GREPROTO_CISCO		0
-#define GREPROTO_PPTP		1
-#define GREPROTO_MAX		2
-#define GRE_IP_PROTO_MAX	2
-
-struct gre_protocol {
-	int  (*handler)(struct sk_buff *skb);
-	void (*err_handler)(struct sk_buff *skb, u32 info);
-};
-
 struct gre_base_hdr {
 	__be16 flags;
 	__be16 protocol;
 };
 #define GRE_HEADER_SECTION 4

+struct gre_protocol {
+	int  (*handler)(struct sk_buff *skb);
+	void (*err_handler)(struct sk_buff *skb, u32 info);
+};
+
 int gre_add_protocol(const struct gre_protocol *proto, u8 version);
 int gre_del_protocol(const struct gre_protocol *proto, u8 version);

-struct gre_cisco_protocol {
-	int (*handler)(struct sk_buff *skb, const struct tnl_ptk_info *tpi);
-	int (*err_handler)(struct sk_buff *skb, u32 info,
-			   const struct tnl_ptk_info *tpi);
-	u8 priority;
-};
-
-int gre_cisco_register(struct gre_cisco_protocol *proto);
-int gre_cisco_unregister(struct gre_cisco_protocol *proto);
+#define GREPROTO_CISCO		0
+#define GREPROTO_PPTP		1
+#define GREPROTO_MAX		2
+#define GRE_IP_PROTO_MAX	2

 #define GRE_TAP_FB_NAME "gretap0"
-
-static inline int ip_gre_calc_hlen(__be16 o_flags)
-{
-	int addend = 4;
-
-	if (o_flags&TUNNEL_CSUM)
-		addend += 4;
-	if (o_flags&TUNNEL_KEY)
-		addend += 4;
-	if (o_flags&TUNNEL_SEQ)
-		addend += 4;
-	return addend;
-}
-
-static inline __be16 gre_flags_to_tnl_flags(__be16 flags)
-{
-	__be16 tflags = 0;
-
-	if (flags & GRE_CSUM)
-		tflags |= TUNNEL_CSUM;
-	if (flags & GRE_ROUTING)
-		tflags |= TUNNEL_ROUTING;
-	if (flags & GRE_KEY)
-		tflags |= TUNNEL_KEY;
-	if (flags & GRE_SEQ)
-		tflags |= TUNNEL_SEQ;
-	if (flags & GRE_STRICT)
-		tflags |= TUNNEL_STRICT;
-	if (flags & GRE_REC)
-		tflags |= TUNNEL_REC;
-	if (flags & GRE_VERSION)
-		tflags |= TUNNEL_VERSION;
-
-	return tflags;
-}
-
-static inline __be16 tnl_flags_to_gre_flags(__be16 tflags)
-{
-	__be16 flags = 0;
-
-	if (tflags & TUNNEL_CSUM)
-		flags |= GRE_CSUM;
-	if (tflags & TUNNEL_ROUTING)
-		flags |= GRE_ROUTING;
-	if (tflags & TUNNEL_KEY)
-		flags |= GRE_KEY;
-	if (tflags & TUNNEL_SEQ)
-		flags |= GRE_SEQ;
-	if (tflags & TUNNEL_STRICT)
-		flags |= GRE_STRICT;
-	if (tflags & TUNNEL_REC)
-		flags |= GRE_REC;
-	if (tflags & TUNNEL_VERSION)
-		flags |= GRE_VERSION;
-
-	return flags;
-}
-
 #endif
diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c
index 77562e0..d9c552a 100644
--- a/net/ipv4/gre_demux.c
+++ b/net/ipv4/gre_demux.c
@@ -31,7 +31,6 @@
 #include <net/xfrm.h>

 static const struct gre_protocol __rcu *gre_proto[GREPROTO_MAX] __read_mostly;
-static struct gre_cisco_protocol __rcu *gre_cisco_proto_list[GRE_IP_PROTO_MAX];

 int gre_add_protocol(const struct gre_protocol *proto, u8 version)
 {
@@ -61,163 +60,6 @@ int gre_del_protocol(const struct gre_protocol *proto, u8 version)
 }
 EXPORT_SYMBOL_GPL(gre_del_protocol);

-static int parse_gre_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
-			    bool *csum_err)
-{
-	const struct gre_base_hdr *greh;
-	__be32 *options;
-	int hdr_len;
-
-	if (unlikely(!pskb_may_pull(skb, sizeof(struct gre_base_hdr))))
-		return -EINVAL;
-
-	greh = (struct gre_base_hdr *)skb_transport_header(skb);
-	if (unlikely(greh->flags & (GRE_VERSION | GRE_ROUTING)))
-		return -EINVAL;
-
-	tpi->flags = gre_flags_to_tnl_flags(greh->flags);
-	hdr_len = ip_gre_calc_hlen(tpi->flags);
-
-	if (!pskb_may_pull(skb, hdr_len))
-		return -EINVAL;
-
-	greh = (struct gre_base_hdr *)skb_transport_header(skb);
-	tpi->proto = greh->protocol;
-
-	options = (__be32 *)(greh + 1);
-	if (greh->flags & GRE_CSUM) {
-		if (skb_checksum_simple_validate(skb))
[PATCH net-next v2 0/2] GRE: Use flow based tunneling for OVS GRE vport.
The following patches make use of the new flow based tunneling API from
the kernel. This allows us to directly use the netdev based GRE tunnel
implementation. While doing so I have removed the GRE demux API which
was targeted at OVS. Most of the GRE protocol code is now consolidated
in the ip_gre module.

Pravin B Shelar (2):
  openvswitch: Use regular GRE net_device instead of vport
  gre: Remove support for sharing GRE protocol hook.

 include/net/gre.h              |  97 ++
 include/net/ip_tunnels.h       |   6 +-
 net/ipv4/gre_demux.c           | 235 +---
 net/ipv4/ip_gre.c              | 400 ++---
 net/ipv4/ip_tunnel.c           |   6 +-
 net/ipv4/ipip.c                |   2 +-
 net/ipv6/sit.c                 |   2 +-
 net/openvswitch/Kconfig        |   1 -
 net/openvswitch/vport-gre.c    | 230 +++-
 net/openvswitch/vport-netdev.c |   5 +-
 net/openvswitch/vport-netdev.h |   2 +
 net/openvswitch/vport.h        |   2 +-
 12 files changed, 431 insertions(+), 557 deletions(-)

--
1.8.3.1
Re: [PATCH] fddi: Use a more more typical logging style
From: Joe Perches <j...@perches.com>
Date: Sun, 02 Aug 2015 21:27:45 -0700

> Use macros that don't require fixed argument counts so format
> and arguments can be verified by the compiler.
>
> Miscellanea:
>
> o Remove a few #if uses to allow dynamic debug to always work
> o whitespace neatening
>
> Signed-off-by: Joe Perches <j...@perches.com>

This doesn't apply cleanly to net-next, please respin.
Re: [PATCH] net: dsa: mv88e6xxx: call _mv88e6xxx_stats_wait with SMI lock held
From: Vivien Didelot <vivien.dide...@savoirfairelinux.com>
Date: Mon, 3 Aug 2015 09:17:44 -0400

> At switch setup, _mv88e6xxx_stats_wait was called without holding the
> SMI mutex. Fix this by requesting the lock for this call.
>
> Also, return the _mv88e6xxx_stats_wait code, since it may fail.
>
> Signed-off-by: Vivien Didelot <vivien.dide...@savoirfairelinux.com>

Applied to net-next, thanks.
Re: [PATCH net-next 1/1] e1000: remove dead e1000_init_eeprom_params calls.
On Fri, Jul 24, 2015 at 2:40 PM, Francois Romieu <rom...@fr.zoreil.com> wrote:
> The device probe method e1000_probe calls e1000_init_eeprom_params
> itself so there's no reason to call it again from e1000_do_write_eeprom
> or e1000_do_read_eeprom.
>
> The sentence above assumes that e1000_init_eeprom_params is effective
> but it's mostly dependant on hw->mac_type: safe as e1000_probe bails
> out early if it can't set mac_type (see e1000_init_hw_struct, then
> e1000_set_mac_type).
>
> Btw, if effective, the removed paths would had been deadlock prone when
> e1000_eeprom_spi was set:
> -> e1000_write_eeprom (takes e1000_eeprom_lock)
>    -> e1000_do_write_eeprom
>       -> e1000_init_eeprom_params
>          -> e1000_read_eeprom (takes e1000_eeprom_lock)
>
> (same narrative with e1000_read_eeprom -> e1000_do_read_eeprom etc.)
>
> As a final note, the candidate deadlock above can't happen in e1000_probe
> due to the way eeprom->word_size is set / tested.
>
> Signed-off-by: Francois Romieu <rom...@fr.zoreil.com>
> ---
>
> Untested. I have found it while looking at Joern's patch.
>
>  drivers/net/ethernet/intel/e1000/e1000_hw.c | 8
>  1 file changed, 8 deletions(-)

Can you please send this to the intel-wired-...@lists.osuosl.org
mailing list? That is the mailing list created/used for these patches.
It also helps me out by adding your patch to our patchworks project for
patches against Intel wired drivers.

Thanks in advance. Sorry for the delayed response, I was on vacation
last week.
Re: [PATCH net-next 0/2] lwtunnel: encap locally-generated ipv4 packets
On 8/3/15, 9:39 AM, Robert Shearman wrote: Locally-generated IPv4 packets, such as from applications running on the host or traceroute/ping currently don't have lwtunnel output redirected encap applied. However, they should do in the same way as for forwarded packets and this patch series addresses that. Robert Shearman (2): lwtunnel: set skb protocol and dev ipv4: apply lwtunnel encap for locally-generated packets net/core/lwtunnel.c | 12 ++-- net/ipv4/route.c| 2 ++ 2 files changed, 12 insertions(+), 2 deletions(-) Thanks for this patch Robert. Looks good. I have been thinking of sending a similar patch out for this and since i was also looking at ip fragmentation, I have a slightly different patch which I think should also take care of encapsulating locally generated packets too. This patch moves the output redirection to after ip fragmentation. What do you think about the below (I have briefly tested it. Was planning to test some more before sending it out as RFC) ? [PATCH net-next] lwtunnel: move output redirection to after ip fragmentation This patch adds tunnel headroom in lwtstate to make sure we account for tunnel data in mtu calculations and moves tunnel output redirection after ip fragmentation. 
Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com --- include/net/lwtunnel.h |1 + net/ipv4/ip_output.c |4 net/ipv4/route.c |5 +++-- net/mpls/mpls_iptunnel.c |1 + 4 files changed, 9 insertions(+), 2 deletions(-) diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h index 918e03c..7816805 100644 --- a/include/net/lwtunnel.h +++ b/include/net/lwtunnel.h @@ -18,6 +18,7 @@ struct lwtunnel_state { __u16 flags; atomic_trefcnt; int len; + __u16 headroom; __u8data[0]; }; diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 6bf89a6..ae3119f 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -73,6 +73,7 @@ #include net/icmp.h #include net/checksum.h #include net/inetpeer.h +#include net/lwtunnel.h #include linux/igmp.h #include linux/netfilter_ipv4.h #include linux/netfilter_bridge.h @@ -201,6 +202,9 @@ static int ip_finish_output2(struct sock *sk, struct sk_buff *skb) skb = skb2; } + if (lwtunnel_output_redirect(rt-rt_lwtstate)) + return lwtunnel_output(sk, skb); + rcu_read_lock_bh(); nexthop = (__force u32) rt_nexthop(rt, ip_hdr(skb)-daddr); neigh = __ipv4_neigh_lookup_noref(dev, nexthop); diff --git a/net/ipv4/route.c b/net/ipv4/route.c index d3964fa..4e07b9a 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1234,6 +1234,9 @@ static unsigned int ipv4_mtu(const struct dst_entry *dst) mtu = dst-dev-mtu; + if (lwtunnel_output_redirect(rt-rt_lwtstate)) + mtu -= rt-rt_lwtstate-headroom; + if (unlikely(dst_metric_locked(dst, RTAX_MTU))) { if (rt-rt_uses_gateway mtu 576) mtu = 576; @@ -1634,8 +1637,6 @@ static int __mkroute_input(struct sk_buff *skb, rth-dst.output = ip_output; rt_set_nexthop(rth, daddr, res, fnhe, res-fi, res-type, itag); - if (lwtunnel_output_redirect(rth-rt_lwtstate)) - rth-dst.output = lwtunnel_output; skb_dst_set(skb, rth-dst); out: err = 0; -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at 
http://vger.kernel.org/majordomo-info.html
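As an aside on the ipv4_mtu() hunk in the RFC patch above, the headroom adjustment could be modelled in user space roughly as below. This is only a sketch: struct tunnel_state and effective_mtu() are hypothetical names invented here, and the guard against a headroom larger than the device MTU is an assumption of this sketch, not something the patch does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the tunnel state; not the kernel struct. */
struct tunnel_state {
	bool output_redirect;	/* encap is applied on output */
	uint16_t headroom;	/* bytes the encap header will add */
};

/* Device MTU minus tunnel headroom when output is redirected. */
static unsigned int effective_mtu(unsigned int dev_mtu,
				  const struct tunnel_state *ts)
{
	/* Assumption of this sketch: never let the result underflow. */
	if (ts != NULL && ts->output_redirect && ts->headroom < dev_mtu)
		return dev_mtu - ts->headroom;
	return dev_mtu;
}
```

The point of doing the subtraction in the MTU lookup is that fragmentation then already leaves room for the encap header, so redirecting output after ip fragmentation does not push packets over the device MTU.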
Re: [PATCH] Add BQL support for 3c59x, based on patch from Tino Reichardt.
From: Loganaden Velvindron lo...@elandsys.com Date: Fri, 31 Jul 2015 23:13:13 -0700 Tested on 3Com PCI 3c905C Tornado by running Flent multiple times. Signed-off-by: Loganaden Velvindron lo...@elandsys.com Please format your Subject line correctly, it should be of the form: [PATCH $TREE] $SUBSYSTEM: $DESCRIPTION. Where TREE is either 'net' or 'net-next'. SUBSYSTEM is the subsystem or driver name being changed, which here should be '3c59x' and then the title line description of your patch. Also, I am pretty sure you will need to add logic to vortex_tx_timeout() since that resets the TX ring state.
Re: [PATCH net-next] act_bpf: properly support late binding of bpf action to a classifier
From: Daniel Borkmann dan...@iogearbox.net Date: Mon, 3 Aug 2015 16:21:57 +0200 Since the introduction of the BPF action in d23b8ad8ab23 (tc: add BPF based action), late binding was not working as expected. I.e. setting the action part for a classifier only via 'bpf index num', where num is the index of an existing action, is being rejected by the kernel due to other missing parameters. It doesn't make sense to require these parameters such as BPF opcodes etc, as they are not going to be used anyway: in this case, they're just allocated/parsed and then freed again w/o doing anything meaningful. Instead, parse and verify the remaining parameters *after* the test on tcf_hash_check(), when we really know that we're dealing with creation of a new action or replacement of an existing one and where late binding is thus irrelevant. After patch, test case is now working: FOO=1,6 0 0 4294967295, tc actions add action bpf bytecode $FOO tc filter add dev foo parent 1: bpf bytecode $FOO flowid 1:1 action bpf index 1 tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe index 1 ref 2 bind 1 tc filter show dev foo filter protocol all pref 49152 bpf filter protocol all pref 49152 bpf handle 0x1 flowid 1:1 bytecode '1,6 0 0 4294967295' action order 1: bpf bytecode '1,6 0 0 4294967295' default-action pipe index 1 ref 2 bind 1 Late binding of a BPF action can be useful for preloading maps (e.g. before they hit traffic) in case of eBPF programs, or to share a single eBPF action with multiple classifiers. Signed-off-by: Daniel Borkmann dan...@iogearbox.net Applied to net-next, thanks Daniel. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next] bridge: mdb: fix vlan_enabled access when vlans are not configured
From: Nikolay Aleksandrov niko...@cumulusnetworks.com

Instead of trying to access br->vlan_enabled directly use the provided helper br_vlan_enabled().

Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
---
Sorry, forgot to change this before sending the patch.

 net/bridge/br_mdb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 1df3ef4a73b9..d747275fad18 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -490,7 +490,7 @@ static int br_mdb_add(struct sk_buff *skb, struct nlmsghdr *nlh)
 		return -EINVAL;
 
 	pv = nbp_get_vlan_info(p);
-	if (br->vlan_enabled && pv && entry->vid == 0) {
+	if (br_vlan_enabled(br) && pv && entry->vid == 0) {
 		for_each_set_bit(vid, pv->vlan_bitmap, VLAN_N_VID) {
 			entry->vid = vid;
 			err = __br_mdb_add(net, br, entry);
@@ -592,7 +592,7 @@ static int br_mdb_del(struct sk_buff *skb, struct nlmsghdr *nlh)
 		return -EINVAL;
 
 	pv = nbp_get_vlan_info(p);
-	if (br->vlan_enabled && pv && entry->vid == 0) {
+	if (br_vlan_enabled(br) && pv && entry->vid == 0) {
 		for_each_set_bit(vid, pv->vlan_bitmap, VLAN_N_VID) {
 			entry->vid = vid;
 			err = __br_mdb_del(br, entry);
-- 
2.4.3
Re: [PATCH net-next] mpls: Use definition for reserved label checks
On 8/3/15, 9:50 AM, Robert Shearman wrote: In multiple locations there are checks for whether the label in hand is a reserved label or not using the arbitrary value of 16. Factor this out into a #define for better maintainability and for documentation. Signed-off-by: Robert Shearman rshea...@brocade.com --- Acked-by: Roopa Prabhu ro...@cumulusnetworks.com
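To make the idea concrete, a minimal sketch of such a check is shown below. The macro name LABEL_FIRST_UNRESERVED and the helper label_is_reserved() are assumptions made for this illustration; the patch body is not quoted in this thread, so the name it actually introduces may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed name for the constant: labels 0..15 are reserved, so 16 is
 * the first value available for general use. */
#define LABEL_FIRST_UNRESERVED 16

/* True if the label falls in the reserved range. */
static bool label_is_reserved(uint32_t label)
{
	return label < LABEL_FIRST_UNRESERVED;
}
```

Callers then compare against the named constant instead of a bare 16, which is the maintainability and documentation gain the commit message refers to.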
Re: [PATCH] r8169: Permit users to change transmit and receive max pachet size
From: Corcodel.marian corcodel.mar...@gmail.com Date: Tue, 04 Aug 2015 00:41:50 +0300 At this moment this parameter is only for testing and not for widespread use. Then you can patch your local driver for testing. Your change doesn't belong upstream. We're not going to litter drivers with debugging hack options.
Re: [PATCH net-next] bridge: mdb: add/del entry on all vlans if vlan_filter is enabled and vid is 0
From: Nikolay Aleksandrov ra...@blackwall.org Date: Mon, 3 Aug 2015 13:29:16 +0200 From: Satish Ashok sas...@cumulusnetworks.com Before this patch when a vid was not specified, the entry was added with vid 0 which is useless when vlan_filtering is enabled. With this patch the entry is added on all configured vlans when vlan filtering is enabled, and respectively deleted from all of them, if the entry vid is 0. This is also closer to the way fdb works with regard to vid 0 and vlan filtering. ... Signed-off-by: Satish Ashok sas...@cumulusnetworks.com Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com Applied, but as usual if any existing user ends up being broken I will revert this. Thanks.
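The behaviour described in the commit message can be modelled in plain C along these lines. This is a user-space sketch, not the bridge code: expand_vids(), vid_set() and the bitmap layout are invented here, and the real code also checks that the port has vlan info at all before walking the bitmap.

```c
#include <limits.h>
#include <stddef.h>

#define N_VID 4096
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS ((N_VID + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Test one bit of the per-port vlan bitmap. */
static int vid_is_set(const unsigned long *map, unsigned int vid)
{
	return (int)((map[vid / BITS_PER_WORD] >> (vid % BITS_PER_WORD)) & 1UL);
}

/* Mark a vlan as configured on the port. */
static void vid_set(unsigned long *map, unsigned int vid)
{
	map[vid / BITS_PER_WORD] |= 1UL << (vid % BITS_PER_WORD);
}

/*
 * Fill out[] with the vids an mdb add/del should act on and return how
 * many were written (at most cap). With filtering enabled and vid == 0
 * that is every configured vlan; otherwise just the requested vid.
 */
static size_t expand_vids(const unsigned long *map, int filtering,
			  unsigned int vid, unsigned int *out, size_t cap)
{
	size_t n = 0;
	unsigned int v;

	if (!filtering || vid != 0) {
		if (cap > 0)
			out[n++] = vid;
		return n;
	}
	for (v = 0; v < N_VID && n < cap; v++)
		if (vid_is_set(map, v))
			out[n++] = v;
	return n;
}
```

For a port with vlans 10 and 200 configured, a request with vid 0 expands to both of them when filtering is on, and stays a single vid 0 entry when it is off.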
Re: [PATCH] fddi: Use a more more typical logging style
On Mon, 2015-08-03 at 16:05 -0700, David Miller wrote: From: Joe Perches j...@perches.com Date: Sun, 02 Aug 2015 21:27:45 -0700 Use macros that don't require fixed argument counts so format and arguments can be verified by the compiler. Miscellanea: o Remove a few #if uses to allow dynamic debug to always work o whitespace neatening Signed-off-by: Joe Perches j...@perches.com This doesn't apply cleanly to net-next, please respin. Apologies for that. I used a newer version of the Evolution email client (3.16.0) which corrupts tabs. 3.16.3 doesn't seem to do that. I'll probably downgrade back to the old 3.12 version though. It doesn't send some attachments properly, but the editor at least works well.
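For anyone wondering what "macros that don't require fixed argument counts" buys, a small user-space sketch follows. drv_log() and DRV_DBG() are hypothetical names and the "fddi: " prefix is just for illustration; this is not the code from the patch. It relies on the GCC/Clang format attribute and on the GNU ##__VA_ARGS__ extension.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* The format attribute lets the compiler check fmt against the
 * variadic arguments (-Wformat), which fixed-arity macros that paste
 * arguments by hand cannot offer. */
static int drv_log(char *buf, size_t len, const char *fmt, ...)
	__attribute__((format(printf, 3, 4)));

/* Hypothetical logger: formats into buf and returns the length that
 * the full message needs, as vsnprintf() does. */
static int drv_log(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}

/* One variadic macro replaces a family of fixed-count variants. */
#define DRV_DBG(buf, len, fmt, ...) \
	drv_log(buf, len, "fddi: " fmt, ##__VA_ARGS__)
```

A call such as DRV_DBG(b, sizeof(b), "port %d up", 3) works with any number of arguments, and passing a string where %d expects an int produces a compile-time warning.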
Re: [PATCH net-next] bridge: mdb: fix vlan_enabled access when vlans are not configured
From: Nikolay Aleksandrov ra...@blackwall.org Date: Tue, 4 Aug 2015 01:19:58 +0200 From: Nikolay Aleksandrov niko...@cumulusnetworks.com Instead of trying to access br->vlan_enabled directly use the provided helper br_vlan_enabled(). Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com Applied.
BUG: null dereference in __skb_recv_datagram
Hi Folks, I just hit this one off of latest net-next tree, git hash 58da018053531b9cb91423a64f2a762ef0fe7456 I was running a set of tests from pyroute2 project: https://github.com/svinota/pyroute2 A simple `sudo make test skip_tests=test_stress` in there hits the issue every time. [ 318.244596] BUG: unable to handle kernel NULL pointer dereference at 008e [ 318.245182] IP: [81455e7c] __skb_recv_datagram+0xbc/0x5a0 [ 318.245762] PGD 80999d067 PUD 7fc04a067 PMD 0 [ 318.246336] Oops: [#1] [ 318.262158] CPU: 3 PID: 1580 Comm: dnsmasq Not tainted 4.2.0-rc4-g58da018 #28 [ 318.263143] Hardware name: MSI MS-7930/Z97S SLI PLUS (MS-7930), BIOS V1.2 05/22/2014 [ 318.264137] task: 880806c96200 ti: 8807fc09 task.ti: 8807fc09 [ 318.265136] RIP: 0010:[81455e7c] [81455e7c] __skb_recv_datagram+0xbc/0x5a0 [ 318.266153] RSP: 0018:8807fc093b98 EFLAGS: 00010082 [ 318.267158] RAX: 0296 RBX: RCX: 8807fc093c7c [ 318.268172] RDX: 0001 RSI: RDI: 88080a4b88ac [ 318.269186] RBP: 8807fc093c68 R08: 8807fc093cb0 R09: 7000 [ 318.270200] R10: 8807fc094000 R11: 0246 R12: 8807fc093c78 [ 318.271217] R13: 88080a4b8800 R14: 88080a4b8898 R15: [ 318.272236] FS: 7f52a582f700() GS:88082ecc() knlGS: [ 318.273264] CS: 0010 DS: ES: CR0: 80050033 [ 318.274286] CR2: 008e CR3: 00080824 CR4: 001406e0 [ 318.275314] DR0: DR1: DR2: [ 318.276323] DR3: DR6: fffe0ff0 DR7: 0400 [ 318.277311] Stack: [ 318.278282] dead00200200 8808061e4300 88080a4d9100 00db [ 318.279291] 8808 8807fc093a78 880806c96200 8807fc094000 [ 318.280287] 8807fc093c20 8807fc093cb0 8807fc093c7c [ 318.281260] Call Trace: [ 318.282202] [811e5af0] ? poll_select_copy_remaining+0x140/0x140 [ 318.283150] [8145639f] skb_recv_datagram+0x3f/0x60 [ 318.284077] [81492e49] netlink_recvmsg+0x59/0x360 [ 318.284984] [81445543] sock_recvmsg+0x13/0x20 [ 318.285867] [81448053] ___sys_recvmsg+0xe3/0x210 [ 318.286729] [81212af6] ? 
fsnotify+0x316/0x4a0 [ 318.287569] [81449367] __sys_recvmsg+0x57/0xa0 [ 318.288389] [814493c2] SyS_recvmsg+0x12/0x20 [ 318.289186] [81564aee] entry_SYSCALL_64_fastpath+0x12/0x71
Re: [PATCH] 3c59x: Fix resource leaks in vortex_open
From: Jia-Ju Bai baijiaju1...@163.com Date: Mon, 3 Aug 2015 11:18:12 +0800 When vortex_up fails, the skb buffers allocated by __netdev_alloc_skb in vortex_open are not released, which may cause resource leaks. This bug has been submitted before. This patch modifies the error handling code to fix it. Signed-off-by: Jia-Ju Bai baijiaju1...@163.com Applied, thanks.
Re: [PATCH 0/2] igb/ixgbe: Fix ordering of SR-IOV teardown
On Wed, 2015-07-29 at 14:31 -0700, David Miller wrote: From: Alex Williamson alex.william...@redhat.com Date: Wed, 29 Jul 2015 13:33:07 -0600 I expect that's because of this patch that's in Jeff's dev-queue branch: http://git.kernel.org/cgit/linux/kernel/git/jkirsher/next-queue.git/commit/?h=dev-queue&id=ddf766a812a13eca1116b5905e902184904266f9 I based these patches off that branch, assuming they'd take the same route and avoid the merge conflict. If you'd rather take these, I'll be happy to respin. Apologies for not noting the base branch in the series. Thanks, No, that's fine, this would normally go via Jeff's tree anyways. I just didn't see him take it so I assumed that it should go via me. Sorry, was on vacation last week and cell coverage was spotty where I was at. I have picked up the series.
[PATCH 1/1] net/ipv4: Enable flow-based ECMP
Enable flow-based ECMP. Currently if equal-cost multipath is enabled the kernel chooses between equal cost paths for each matching packet, essentially packets are round-robined between the routes. This means that packets from a single flow can traverse different routes. If one of the routes experiences congestion this can result in delayed or out of order packets arriving at the destination. This patch allows packets to be routed based on their flow - packets in the same flow will always use the same route. This prevents out of order packets. There are other issues with round-robin based ECMP routing related to variable path MTU handling and debugging. The default behaviour is changed by this patch to enable flow based ECMP routing rather than the previous round-robin routing. The behaviour can be changed using a new sysctl option /net/ipv4/route/flow_based_ecmp. See RFC2991 for more details on the problems associated with packet based ECMP routing. This patch relies on the skb hash value to select between routes. The selection uses a hash-threshold algorithm (see RFC2992). 
Signed-off-by: Richard Laing richard.la...@alliedtelesis.co.nz --- include/net/flow.h |8 include/net/ip_fib.h |4 include/net/route.h |2 ++ net/ipv4/fib_semantics.c | 30 ++ net/ipv4/route.c | 19 +++ 5 files changed, 59 insertions(+), 4 deletions(-) diff --git a/include/net/flow.h b/include/net/flow.h index 8109a15..b0a2524 100644 --- a/include/net/flow.h +++ b/include/net/flow.h @@ -79,6 +79,8 @@ struct flowi4 { #define fl4_ipsec_spi uli.spi #define fl4_mh_typeuli.mht.type #define fl4_gre_keyuli.gre_key + + __u32 flowi4_hash; } __attribute__((__aligned__(BITS_PER_LONG/8))); static inline void flowi4_init_output(struct flowi4 *fl4, int oif, @@ -99,6 +101,7 @@ static inline void flowi4_init_output(struct flowi4 *fl4, int oif, fl4-saddr = saddr; fl4-fl4_dport = dport; fl4-fl4_sport = sport; + fl4-flowi4_hash = 0; } /* Reset some input parameters after previous lookup */ @@ -182,6 +185,11 @@ static inline struct flowi *flowidn_to_flowi(struct flowidn *fldn) return container_of(fldn, struct flowi, u.dn); } +static inline void flowi4_set_flow_hash(struct flowi4 *fl, __u32 hash) +{ + fl-flowi4_hash = hash; +} + typedef unsigned long flow_compare_t; static inline size_t flow_key_size(u16 family) diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h index 5fa643b..7db9f72 100644 --- a/include/net/ip_fib.h +++ b/include/net/ip_fib.h @@ -117,6 +117,8 @@ struct fib_info { #ifdef CONFIG_IP_ROUTE_MULTIPATH int fib_power; #endif + /* Cache the number of live nexthops for flow based ECMP calculation. 
*/ + int live_nexthops; struct rcu_head rcu; struct fib_nh fib_nh[0]; #define fib_devfib_nh[0].nh_dev @@ -310,6 +312,8 @@ int fib_sync_down_dev(struct net_device *dev, unsigned long event); int fib_sync_down_addr(struct net *net, __be32 local); int fib_sync_up(struct net_device *dev, unsigned int nh_flags); void fib_select_multipath(struct fib_result *res); +void fib_select_multipath_for_flow(struct fib_result *res, + const struct flowi4 *fl4); /* Exported by fib_trie.c */ void fib_trie_init(void); diff --git a/include/net/route.h b/include/net/route.h index fe22d03..a00e606 100644 --- a/include/net/route.h +++ b/include/net/route.h @@ -252,6 +252,8 @@ static inline void ip_route_connect_init(struct flowi4 *fl4, __be32 dst, __be32 flowi4_init_output(fl4, oif, sk-sk_mark, tos, RT_SCOPE_UNIVERSE, protocol, flow_flags, dst, src, dport, sport); + + flowi4_set_flow_hash(fl4, sk-sk_txhash); } static inline struct rtable *ip_route_connect(struct flowi4 *fl4, diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index 3a06586..0a56ad3 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -981,6 +981,7 @@ link_it: head = fib_info_devhash[hash]; hlist_add_head(nexthop_nh-nh_hash, head); } endfor_nexthops(fi) + fi-live_nexthops = fi-fib_nhs; spin_unlock_bh(fib_info_lock); return fi; @@ -1196,6 +1197,7 @@ int fib_sync_down_dev(struct net_device *dev, unsigned long event) } ret++; } + fi-live_nexthops = fi-fib_nhs - dead; } return ret; @@ -1331,6 +1333,7 @@ int fib_sync_up(struct net_device *dev, unsigned int nh_flags) if (alive 0) { fi-fib_flags = ~nh_flags; ret++; + fi-live_nexthops = alive; } } @@ -1397,4 +1400,31 @@ void fib_select_multipath(struct
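The body of fib_select_multipath_for_flow() is cut off in this archive, so the following is not a reconstruction of it. It is only a sketch of the hash-threshold idea the commit message points to: split the 32-bit hash space into equal regions, one per live nexthop, and pick the region the flow hash falls into. The name select_nexthop() and the clamp on the last region are choices made for this sketch.

```c
#include <stdint.h>

/*
 * Hash-threshold selection: divide the 32-bit key space into
 * live_nexthops equal regions and return the index of the region
 * containing flow_hash. All packets of one flow carry the same hash,
 * so they always map to the same nexthop.
 */
static unsigned int select_nexthop(uint32_t flow_hash,
				   unsigned int live_nexthops)
{
	uint64_t region;
	unsigned int idx;

	if (live_nexthops <= 1)
		return 0;

	region = (UINT64_C(1) << 32) / live_nexthops;
	idx = (unsigned int)(flow_hash / region);

	/* Integer division can leave a few top keys past the last
	 * region when the count is not a power of two; fold them in. */
	return idx < live_nexthops ? idx : live_nexthops - 1;
}
```

One property of this scheme is that when a nexthop goes away only the region boundaries shift, so a share of the flows keep their existing path instead of all of them being reshuffled.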
[PATCH 1/2] openvswitch: Use regular GRE net_device instead of vport
With addition of flow based tunneling, there is no need to have special GRE vport. Removes all of the OVS specific GRE code and make OVS use a ip_gre net_device. Minimal GRE vport is kept to handle compatibility with current userspace application. Signed-off-by: Pravin B Shelar pshe...@nicira.com --- include/net/gre.h | 11 +- include/net/ip_tunnels.h | 6 +- net/ipv4/gre_demux.c | 34 -- net/ipv4/ip_gre.c | 185 - net/ipv4/ip_tunnel.c | 6 +- net/ipv4/ipip.c| 2 +- net/ipv6/sit.c | 2 +- net/openvswitch/Kconfig| 1 - net/openvswitch/vport-gre.c| 230 - net/openvswitch/vport-netdev.c | 5 +- net/openvswitch/vport-netdev.h | 2 + net/openvswitch/vport.h| 2 +- 12 files changed, 222 insertions(+), 264 deletions(-) diff --git a/include/net/gre.h b/include/net/gre.h index b531820..4193fd7 100644 --- a/include/net/gre.h +++ b/include/net/gre.h @@ -33,16 +33,7 @@ struct gre_cisco_protocol { int gre_cisco_register(struct gre_cisco_protocol *proto); int gre_cisco_unregister(struct gre_cisco_protocol *proto); -void gre_build_header(struct sk_buff *skb, const struct tnl_ptk_info *tpi, - int hdr_len); - -static inline struct sk_buff *gre_handle_offloads(struct sk_buff *skb, - bool csum) -{ - return iptunnel_handle_offloads(skb, csum, - csum ? 
SKB_GSO_GRE_CSUM : SKB_GSO_GRE); -} - +#define GRE_TAP_FB_NAME gretap0 static inline int ip_gre_calc_hlen(__be16 o_flags) { diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h index 4798441..fc37624 100644 --- a/include/net/ip_tunnels.h +++ b/include/net/ip_tunnels.h @@ -82,6 +82,8 @@ struct ip_tunnel_dst { __be32 saddr; }; +struct metadata_dst; + struct ip_tunnel { struct ip_tunnel __rcu *next; struct hlist_node hash_node; @@ -115,6 +117,7 @@ struct ip_tunnel { unsigned intprl_count; /* # of entries in PRL */ int ip_tnl_net_id; struct gro_cellsgro_cells; + boolflow_based_tunnel; }; #define TUNNEL_CSUM__cpu_to_be16(0x01) @@ -235,7 +238,8 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn, __be32 key); int ip_tunnel_rcv(struct ip_tunnel *tunnel, struct sk_buff *skb, - const struct tnl_ptk_info *tpi, bool log_ecn_error); + const struct tnl_ptk_info *tpi, struct metadata_dst *tun_dst, + bool log_ecn_error); int ip_tunnel_changelink(struct net_device *dev, struct nlattr *tb[], struct ip_tunnel_parm *p); int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[], diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c index 4a7b5b2..77562e0 100644 --- a/net/ipv4/gre_demux.c +++ b/net/ipv4/gre_demux.c @@ -61,40 +61,6 @@ int gre_del_protocol(const struct gre_protocol *proto, u8 version) } EXPORT_SYMBOL_GPL(gre_del_protocol); -void gre_build_header(struct sk_buff *skb, const struct tnl_ptk_info *tpi, - int hdr_len) -{ - struct gre_base_hdr *greh; - - skb_push(skb, hdr_len); - - skb_reset_transport_header(skb); - greh = (struct gre_base_hdr *)skb-data; - greh-flags = tnl_flags_to_gre_flags(tpi-flags); - greh-protocol = tpi-proto; - - if (tpi-flags(TUNNEL_KEY|TUNNEL_CSUM|TUNNEL_SEQ)) { - __be32 *ptr = (__be32 *)(((u8 *)greh) + hdr_len - 4); - - if (tpi-flagsTUNNEL_SEQ) { - *ptr = tpi-seq; - ptr--; - } - if (tpi-flagsTUNNEL_KEY) { - *ptr = tpi-key; - ptr--; - } - if (tpi-flagsTUNNEL_CSUM - !(skb_shinfo(skb)-gso_type - 
(SKB_GSO_GRE|SKB_GSO_GRE_CSUM))) { - *ptr = 0; - *(__sum16 *)ptr = csum_fold(skb_checksum(skb, 0, -skb-len, 0)); - } - } -} -EXPORT_SYMBOL_GPL(gre_build_header); - static int parse_gre_header(struct sk_buff *skb, struct tnl_ptk_info *tpi, bool *csum_err) { diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index 5fd7064..31f2ec5 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -25,6 +25,7 @@ #include linux/udp.h #include linux/if_arp.h #include linux/mroute.h +#include linux/if_vlan.h #include linux/init.h #include linux/in6.h #include linux/inetdevice.h @@ -47,6 +48,7 @@ #include net/netns/generic.h #include net/rtnetlink.h #include net/gre.h +#include net/dst_metadata.h #if IS_ENABLED(CONFIG_IPV6)
[net-next:master 173/173] net/bridge/br_mdb.c:493:8: error: 'struct net_bridge' has no member named 'vlan_enabled'
tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master head: e44deb2f0cce9183ca94d14effd4170a35eec31d commit: e44deb2f0cce9183ca94d14effd4170a35eec31d [173/173] bridge: mdb: add/del entry on all vlans if vlan_filter is enabled and vid is 0 config: sh-titan_defconfig (attached as .config) reproduce: wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross git checkout e44deb2f0cce9183ca94d14effd4170a35eec31d # save the attached .config to linux build tree make.cross ARCH=sh All error/warnings (new ones prefixed by ): net/bridge/br_mdb.c: In function 'br_mdb_add': net/bridge/br_mdb.c:493:8: error: 'struct net_bridge' has no member named 'vlan_enabled' net/bridge/br_mdb.c: In function 'br_mdb_del': net/bridge/br_mdb.c:595:8: error: 'struct net_bridge' has no member named 'vlan_enabled' vim +493 net/bridge/br_mdb.c 487 488 p = br_port_get_rtnl(pdev); 489 if (!p || p-br != br || p-state == BR_STATE_DISABLED) 490 return -EINVAL; 491 492 pv = nbp_get_vlan_info(p); 493 if (br-vlan_enabled pv entry-vid == 0) { 494 for_each_set_bit(vid, pv-vlan_bitmap, VLAN_N_VID) { 495 entry-vid = vid; 496 err = __br_mdb_add(net, br, entry); --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation # # Automatically generated file; DO NOT EDIT. 
# Linux/sh 4.2.0-rc4 Kernel Configuration # CONFIG_SUPERH=y CONFIG_SUPERH32=y # CONFIG_SUPERH64 is not set CONFIG_ARCH_DEFCONFIG=arch/sh/configs/shx3_defconfig CONFIG_RWSEM_GENERIC_SPINLOCK=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y # CONFIG_ARCH_SUSPEND_POSSIBLE is not set CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_SYS_SUPPORTS_HUGETLBFS=y CONFIG_SYS_SUPPORTS_PCI=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_HAVE_LATENCYTOP_SUPPORT=y # CONFIG_ARCH_HAS_ILOG2_U32 is not set # CONFIG_ARCH_HAS_ILOG2_U64 is not set # CONFIG_NO_IOPORT_MAP is not set CONFIG_DMA_NONCOHERENT=y CONFIG_NEED_DMA_MAP_STATE=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_PGTABLE_LEVELS=2 CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config CONFIG_IRQ_WORK=y # # General setup # CONFIG_BROKEN_ON_SMP=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_CROSS_COMPILE= # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION= # CONFIG_LOCALVERSION_AUTO is not set CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_KERNEL_GZIP=y # CONFIG_KERNEL_BZIP2 is not set # CONFIG_KERNEL_LZMA is not set # CONFIG_KERNEL_XZ is not set # CONFIG_KERNEL_LZO is not set CONFIG_DEFAULT_HOSTNAME=(none) CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y CONFIG_CROSS_MEMORY_ATTACH=y # CONFIG_FHANDLE is not set CONFIG_USELIB=y # CONFIG_AUDIT is not set CONFIG_HAVE_ARCH_AUDITSYSCALL=y # # IRQ subsystem # CONFIG_MAY_HAVE_SPARSE_IRQ=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_IRQ_DOMAIN=y CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y CONFIG_GENERIC_CLOCKEVENTS=y # # Timers subsystem # CONFIG_HZ_PERIODIC=y # CONFIG_NO_HZ_IDLE is not set # CONFIG_NO_HZ is not set # CONFIG_HIGH_RES_TIMERS is not set # # CPU/Task time and stats accounting # CONFIG_TICK_CPU_ACCOUNTING=y # CONFIG_BSD_PROCESS_ACCT is not set # CONFIG_TASKSTATS is not set # # RCU Subsystem # CONFIG_TINY_RCU=y # CONFIG_RCU_EXPERT is not set 
CONFIG_SRCU=y # CONFIG_TASKS_RCU is not set # CONFIG_RCU_STALL_COMMON is not set # CONFIG_TREE_RCU_TRACE is not set # CONFIG_RCU_EXPEDITE_BOOT is not set CONFIG_BUILD_BIN2C=y CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_LOG_BUF_SHIFT=16 # CONFIG_CGROUPS is not set # CONFIG_CHECKPOINT_RESTORE is not set CONFIG_NAMESPACES=y CONFIG_UTS_NS=y CONFIG_IPC_NS=y # CONFIG_USER_NS is not set CONFIG_PID_NS=y CONFIG_NET_NS=y # CONFIG_SCHED_AUTOGROUP is not set # CONFIG_SYSFS_DEPRECATED is not set # CONFIG_RELAY is not set CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE= CONFIG_RD_GZIP=y CONFIG_RD_BZIP2=y CONFIG_RD_LZMA=y CONFIG_RD_XZ=y CONFIG_RD_LZO=y CONFIG_RD_LZ4=y # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set CONFIG_SYSCTL=y CONFIG_ANON_INODES=y CONFIG_HAVE_UID16=y CONFIG_BPF=y # CONFIG_EXPERT is not set CONFIG_UID16=y CONFIG_MULTIUSER=y CONFIG_SGETMASK_SYSCALL=y CONFIG_SYSFS_SYSCALL=y # CONFIG_SYSCTL_SYSCALL is not set CONFIG_KALLSYMS=y # CONFIG_KALLSYMS_ALL is not set CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_EPOLL=y CONFIG_SIGNALFD=y CONFIG_TIMERFD=y CONFIG_EVENTFD=y # CONFIG_BPF_SYSCALL is not set CONFIG_SHMEM=y CONFIG_AIO=y CONFIG_ADVISE_SYSCALLS=y CONFIG_PCI_QUIRKS=y # CONFIG_EMBEDDED is not set CONFIG_HAVE_PERF_EVENTS=y CONFIG_PERF_USE_VMALLOC=y # # Kernel Performance Events And Counters #
[PATCH 2/2] gre: Remove support for sharing GRE protocol hook.
Support for sharing GREPROTO_CISCO port was added so that OVS gre port and kernel GRE devices can co-exist. After flow-based tunneling patches OVS GRE protocol processing is completely moved to ip_gre module. so there is no need for GRE protocol hook. Following patch consolidates GRE protocol related functions into ip_gre module. Signed-off-by: Pravin B Shelar pshe...@nicira.com --- include/net/gre.h| 86 +++-- net/ipv4/gre_demux.c | 201 +-- net/ipv4/ip_gre.c| 215 +++ 3 files changed, 209 insertions(+), 293 deletions(-) diff --git a/include/net/gre.h b/include/net/gre.h index 4193fd7..b2572b7 100644 --- a/include/net/gre.h +++ b/include/net/gre.h @@ -4,92 +4,24 @@ #include linux/skbuff.h #include net/ip_tunnels.h -#define GREPROTO_CISCO 0 -#define GREPROTO_PPTP 1 -#define GREPROTO_MAX 2 -#define GRE_IP_PROTO_MAX 2 - -struct gre_protocol { - int (*handler)(struct sk_buff *skb); - void (*err_handler)(struct sk_buff *skb, u32 info); -}; - struct gre_base_hdr { __be16 flags; __be16 protocol; }; #define GRE_HEADER_SECTION 4 +struct gre_protocol { + int (*handler)(struct sk_buff *skb); + void (*err_handler)(struct sk_buff *skb, u32 info); +}; + int gre_add_protocol(const struct gre_protocol *proto, u8 version); int gre_del_protocol(const struct gre_protocol *proto, u8 version); -struct gre_cisco_protocol { - int (*handler)(struct sk_buff *skb, const struct tnl_ptk_info *tpi); - int (*err_handler)(struct sk_buff *skb, u32 info, - const struct tnl_ptk_info *tpi); - u8 priority; -}; - -int gre_cisco_register(struct gre_cisco_protocol *proto); -int gre_cisco_unregister(struct gre_cisco_protocol *proto); +#define GREPROTO_CISCO 0 +#define GREPROTO_PPTP 1 +#define GREPROTO_MAX 2 +#define GRE_IP_PROTO_MAX 2 #define GRE_TAP_FB_NAME gretap0 - -static inline int ip_gre_calc_hlen(__be16 o_flags) -{ - int addend = 4; - - if (o_flagsTUNNEL_CSUM) - addend += 4; - if (o_flagsTUNNEL_KEY) - addend += 4; - if (o_flagsTUNNEL_SEQ) - addend += 4; - return addend; -} - -static inline __be16 
gre_flags_to_tnl_flags(__be16 flags) -{ - __be16 tflags = 0; - - if (flags GRE_CSUM) - tflags |= TUNNEL_CSUM; - if (flags GRE_ROUTING) - tflags |= TUNNEL_ROUTING; - if (flags GRE_KEY) - tflags |= TUNNEL_KEY; - if (flags GRE_SEQ) - tflags |= TUNNEL_SEQ; - if (flags GRE_STRICT) - tflags |= TUNNEL_STRICT; - if (flags GRE_REC) - tflags |= TUNNEL_REC; - if (flags GRE_VERSION) - tflags |= TUNNEL_VERSION; - - return tflags; -} - -static inline __be16 tnl_flags_to_gre_flags(__be16 tflags) -{ - __be16 flags = 0; - - if (tflags TUNNEL_CSUM) - flags |= GRE_CSUM; - if (tflags TUNNEL_ROUTING) - flags |= GRE_ROUTING; - if (tflags TUNNEL_KEY) - flags |= GRE_KEY; - if (tflags TUNNEL_SEQ) - flags |= GRE_SEQ; - if (tflags TUNNEL_STRICT) - flags |= GRE_STRICT; - if (tflags TUNNEL_REC) - flags |= GRE_REC; - if (tflags TUNNEL_VERSION) - flags |= GRE_VERSION; - - return flags; -} - #endif diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c index 77562e0..d9c552a 100644 --- a/net/ipv4/gre_demux.c +++ b/net/ipv4/gre_demux.c @@ -31,7 +31,6 @@ #include net/xfrm.h static const struct gre_protocol __rcu *gre_proto[GREPROTO_MAX] __read_mostly; -static struct gre_cisco_protocol __rcu *gre_cisco_proto_list[GRE_IP_PROTO_MAX]; int gre_add_protocol(const struct gre_protocol *proto, u8 version) { @@ -61,163 +60,6 @@ int gre_del_protocol(const struct gre_protocol *proto, u8 version) } EXPORT_SYMBOL_GPL(gre_del_protocol); -static int parse_gre_header(struct sk_buff *skb, struct tnl_ptk_info *tpi, - bool *csum_err) -{ - const struct gre_base_hdr *greh; - __be32 *options; - int hdr_len; - - if (unlikely(!pskb_may_pull(skb, sizeof(struct gre_base_hdr - return -EINVAL; - - greh = (struct gre_base_hdr *)skb_transport_header(skb); - if (unlikely(greh-flags (GRE_VERSION | GRE_ROUTING))) - return -EINVAL; - - tpi-flags = gre_flags_to_tnl_flags(greh-flags); - hdr_len = ip_gre_calc_hlen(tpi-flags); - - if (!pskb_may_pull(skb, hdr_len)) - return -EINVAL; - - greh = (struct gre_base_hdr 
*)skb_transport_header(skb); - tpi-proto = greh-protocol; - - options = (__be32 *)(greh + 1); - if (greh-flags GRE_CSUM) { - if (skb_checksum_simple_validate(skb))
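As a reading aid for the ip_gre_calc_hlen() helper that this patch moves out of gre.h, the header length rule can be sketched as below. The SK_* flag bits are illustrative values chosen for this sketch and are not the kernel's TUNNEL_* constants; gre_hlen() is likewise a made-up name.

```c
#include <stdint.h>

/* Illustrative flag bits for this sketch only. */
#define SK_CSUM 0x1u
#define SK_KEY  0x2u
#define SK_SEQ  0x4u

/* The base GRE header is 4 bytes; each optional field that is present
 * (checksum, key, sequence number) adds another 4. */
static int gre_hlen(uint16_t flags)
{
	int len = 4;

	if (flags & SK_CSUM)
		len += 4;
	if (flags & SK_KEY)
		len += 4;
	if (flags & SK_SEQ)
		len += 4;
	return len;
}
```

So a bare header is 4 bytes and a header carrying all three optional fields is 16, which is the amount the receive path has to be able to pull before it can read the options.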
Re: [PATCH net-next 5/9] openvswitch: Add conntrack action
On 31 July 2015 at 19:08, Pravin Shelar pshe...@nicira.com wrote: On Thu, Jul 30, 2015 at 11:12 AM, Joe Stringer joestrin...@nicira.com wrote: +static void prepare_frag(struct vport *vport, struct sw_flow_key *key, +struct sk_buff *skb) +{ + unsigned int hlen = ETH_HLEN; + struct ovs_frag_data *data; + + data = this_cpu_ptr(ovs_frag_data_storage); + data-dst = skb_dst(skb); + data-vport = vport; + data-key = key; + data-cb = *OVS_CB(skb); + + if (key-eth.tci htons(VLAN_TAG_PRESENT)) { + if (skb_vlan_tag_present(skb)) { + data-vlan_proto = skb-vlan_proto; + } else { + data-vlan_proto = vlan_eth_hdr(skb)-h_vlan_proto; + hlen += VLAN_HLEN; + } + } Not all actions keep flow key uptodate, so here you can access stale values. Hmm, okay. Perhaps the right thing to handle all of these cases is to just make a copy of everything up to the network offset, and restore that after fragmentation. if (unlikely(err)) { - kfree_skb(skb); + /* Hide stolen fragments from user space. */ + if (err == -EINPROGRESS) + err = 0; This does not look safe for error returned from all cases, Can you check this case specifically for the CT action case. I'll place it inside the CT action case. Thanks for the review, will roll the other fixes into the next version. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[no subject]
The following patches make use of the new flow-based tunneling API in the
kernel. This allows us to use the netdev-based GRE tunnel implementation
directly. While doing so I have removed the GRE demux API, which was
targeted at OVS. Most of the GRE protocol code is now consolidated in the
ip_gre module.

Pravin B Shelar (2):
  openvswitch: Use regular GRE net_device instead of vport
  gre: Remove support for sharing GRE protocol hook.

 include/net/gre.h              |  97 ++
 include/net/ip_tunnels.h       |   6 +-
 net/ipv4/gre_demux.c           | 235 +---
 net/ipv4/ip_gre.c              | 400 ++---
 net/ipv4/ip_tunnel.c           |   6 +-
 net/ipv4/ipip.c                |   2 +-
 net/ipv6/sit.c                 |   2 +-
 net/openvswitch/Kconfig        |   1 -
 net/openvswitch/vport-gre.c    | 230 +++-
 net/openvswitch/vport-netdev.c |   5 +-
 net/openvswitch/vport-netdev.h |   2 +
 net/openvswitch/vport.h        |   2 +-
 12 files changed, 431 insertions(+), 557 deletions(-)

--
1.8.3.1
Re: [PATCH] net: fec: fix initial runtime PM refcount
From: Lucas Stach <l.st...@pengutronix.de>
Date: Mon, 3 Aug 2015 17:50:11 +0200

> The clocks are initially active and thus the device is marked active.
> This still keeps the PM refcount at 0, the pm_runtime_put_autosuspend()
> call at the end of probe then leaves us with an invalid refcount of -1,
> which in turn leads to the device staying in suspended state even though
> netdev open had been called.
>
> Fix this by initializing the refcount to be coherent with the initial
> device status.
>
> Fixes: 8fff755e9f8 (net: fec: Ensure clocks are enabled while using mdio bus)
> Signed-off-by: Lucas Stach <l.st...@pengutronix.de>
> ---
> Please apply this as a fix for 4.2

I'm waiting for feedback to be given wrt. the runtime-pm issues.
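The refcount arithmetic in the quoted commit message can be followed with a small userspace model. This is only a sketch: struct pm_model and the three helpers are hypothetical stand-ins named after pm_runtime_get_noresume(), pm_runtime_set_active() and pm_runtime_put_autosuspend(), and they model nothing beyond the usage counter described in the message.

```c
/* Toy model of the runtime-PM usage counter; not the kernel implementation. */
struct pm_model {
	int usage_count;	/* models the PM refcount from the commit message */
	int active;		/* 1 once the device has been marked active */
};

/* Hypothetical stand-in for pm_runtime_get_noresume(): take a reference only. */
static void model_get_noresume(struct pm_model *dev)
{
	dev->usage_count++;
}

/* Hypothetical stand-in for pm_runtime_set_active(): status only, no reference. */
static void model_set_active(struct pm_model *dev)
{
	dev->active = 1;
}

/* Hypothetical stand-in for pm_runtime_put_autosuspend(): drop a reference. */
static void model_put_autosuspend(struct pm_model *dev)
{
	dev->usage_count--;
}

/* Replays the probe sequence and returns the usage count left at the end. */
int probe_usage_count(int take_initial_ref)
{
	struct pm_model dev = { 0, 0 };

	if (take_initial_ref)
		model_get_noresume(&dev);	/* the one-line fix */
	model_set_active(&dev);
	model_put_autosuspend(&dev);		/* end of probe */
	return dev.usage_count;
}
```

In this model, probe_usage_count(0) reproduces the unbalanced put described above, while probe_usage_count(1) ends with the counter back at zero.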
Re: [PATCH] net_dbg_ratelimited: turn into no-op when !DEBUG
On Tue, 2015-08-04 at 05:26 +0200, Jason A. Donenfeld wrote:
> The pr_debug family of functions turns into a no-op when -DDEBUG is not
> specified, opting instead to call no_printk, which gets compiled to a
> no-op (but retains gcc's nice warnings about printf-style arguments).
>
> The problem with net_dbg_ratelimited is that it is defined to be a
> variant of net_ratelimited_function, which expands to essentially:
>
>     if (net_ratelimit())
>         pr_debug(fmt, ...);
>
> When DEBUG is not defined, then this becomes,
>
>     if (net_ratelimit())
>         ;
>
> This seems benign, except it isn't. Firstly, there's the obvious
> overhead of calling net_ratelimit needlessly, which does quite some book
> keeping for the rate limiting. Given that the pr_debug and
> net_dbg_ratelimited family of functions are sprinkled liberally through
> performance critical code, with developers assuming they'll be compiled
> out to a no-op most of the time, we certainly do not want this needless
> book keeping. Secondly, and most visibly, even though no debug message
> is printed when DEBUG is not defined, if there is a flood of
> invocations, dmesg winds up peppered with messages such as
> net_ratelimit: 320 callbacks suppressed. This is because our
> aforementioned net_ratelimit() function actually prints this text in
> some circumstances. It's especially odd to see this when there isn't any
> other accompanying debug message.
>
> So, in sum, it doesn't make sense to have this function's current
> behavior, and instead it should match what every other debug family of
> functions in the kernel does with !DEBUG -- nothing.
>
> This patch replaces calls to net_dbg_ratelimited when !DEBUG with
> no_printk, keeping with the idiom of all the other debug print helpers.

Makes sense, thanks Jason.
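The expansion described in the quoted message can be reproduced in userspace. The sketch below is an illustration under stated assumptions, not kernel code: net_ratelimit_model() is a hypothetical stand-in that only counts how often the rate limiter is consulted, and the *_old/*_new macros are made-up names for the two shapes being compared with DEBUG undefined.

```c
/* Counts how many times the (modelled) rate limiter is consulted. */
static int ratelimit_calls;

/* Hypothetical stand-in for net_ratelimit(): bookkeeping only. */
static int net_ratelimit_model(void)
{
	ratelimit_calls++;
	return 1;
}

/* With DEBUG undefined, the debug print itself expands to nothing. */
#define pr_debug_model(...) do { } while (0)

/* Old shape: the rate limiter still runs although nothing can be printed. */
#define net_dbg_ratelimited_old(...)			\
	do {						\
		if (net_ratelimit_model())		\
			pr_debug_model(__VA_ARGS__);	\
	} while (0)

/* Shape proposed for !DEBUG: the rate limiter is never consulted. */
#define net_dbg_ratelimited_new(...) do { } while (0)

/* Returns the number of rate-limiter calls caused by n debug statements. */
int calls_with_old_macro(int n)
{
	ratelimit_calls = 0;
	for (int i = 0; i < n; i++)
		net_dbg_ratelimited_old("packet %d dropped\n", i);
	return ratelimit_calls;
}

int calls_with_new_macro(int n)
{
	ratelimit_calls = 0;
	for (int i = 0; i < n; i++)
		net_dbg_ratelimited_new("packet %d dropped\n", i);
	return ratelimit_calls;
}
```

With the old shape every debug statement costs one rate-limiter call even though no message can appear; with the new shape the count stays at zero.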
Re: [PATCH] net_dbg_ratelimited: turn into no-op when !DEBUG
From: Joe Perches <j...@perches.com>
Date: Mon, 03 Aug 2015 21:02:21 -0700

> On Mon, 2015-08-03 at 20:57 -0700, Joe Perches wrote:
>> On Tue, 2015-08-04 at 05:26 +0200, Jason A. Donenfeld wrote:
>>> This patch replaces calls to net_dbg_ratelimited when !DEBUG with
>>> no_printk, keeping with the idiom of all the other debug print helpers.
>>
>> Makes sense, thanks Jason.
>
> Perhaps better still would be to use if (0) no_printk so that the call
> and whatever argument calls the net_dbg_ratelimited makes are completely
> eliminated.

Agreed.

Jason please respin your patch to work this way.
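The difference between a plain no-op call and the if (0) form suggested above is whether the arguments are still evaluated. The following userspace sketch shows this under assumptions: no_printk_model(), expensive_arg() and the two dbg_* macros are hypothetical names, and no_printk_model() is only modelled as an empty variadic function.

```c
/* Counts evaluations of the (hypothetical) costly argument. */
static int expensive_calls;

static int expensive_arg(void)
{
	expensive_calls++;
	return 42;
}

/* Modelled no-op printer: does nothing, but a call still evaluates its arguments. */
static int no_printk_model(const char *fmt, ...)
{
	(void)fmt;
	return 0;
}

/* Plain form: the argument expressions run before the empty call. */
#define dbg_plain(...)   no_printk_model(__VA_ARGS__)

/* Guarded form: the call sits in dead code, so its arguments never run. */
#define dbg_guarded(...)				\
	do {						\
		if (0)					\
			no_printk_model(__VA_ARGS__);	\
	} while (0)

int evaluations_plain(void)
{
	expensive_calls = 0;
	dbg_plain("value %d\n", expensive_arg());
	return expensive_calls;
}

int evaluations_guarded(void)
{
	expensive_calls = 0;
	dbg_guarded("value %d\n", expensive_arg());
	return expensive_calls;
}
```

In the guarded form the compiler still sees the call, so the format string and arguments can be type-checked, but none of it executes.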
Re: [PATCH net] udp: fix dst races with multicast early demux
From: Eric Dumazet <eric.duma...@gmail.com>
Date: Sat, 01 Aug 2015 12:14:33 +0200

> From: Eric Dumazet <eduma...@google.com>
>
> Multicast dst are not cached. They carry DST_NOCACHE.
>
> As mentioned in commit f8864972126899 (ipv4: fix dst race in
> sk_dst_get()), these dst need special care before caching them
> into a socket.
>
> Caching them is allowed only if their refcnt was not 0, ie we
> must use atomic_inc_not_zero()
>
> Also, we must use READ_ONCE() to fetch sk->sk_rx_dst, as mentioned
> in commit d0c294c53a771 (tcp: prevent fetching dst twice in early demux
> code)
>
> Fixes: 421b3885bf6d (udp: ipv4: Add udp early demux)
> Signed-off-by: Eric Dumazet <eduma...@google.com>
> Reported-by: Gregory Hoggarth <gregory.hogga...@alliedtelesis.co.nz>
> Reported-by: Alex Gartrell <agartr...@fb.com>
> Cc: Michal Kubeček <mkube...@suse.cz>
> ---
> David : I will be on vacation for following 7 days, no internet access.
> Please wait for tests done by Gregory & Alex before merging this ?
> Thanks !

Now that this has been tested by Gregory, applied and queued up for
-stable thanks!
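The "take a reference only if the count is not already zero" rule from the quoted commit message can be sketched with C11 atomics. This is a userspace illustration, not the kernel's atomic_inc_not_zero(); struct dst_model, inc_not_zero() and try_cache_dst() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal model of a refcounted route entry. */
struct dst_model {
	atomic_int refcnt;
};

/* Increment *cnt unless it is zero; returns true if a reference was taken. */
static bool inc_not_zero(atomic_int *cnt)
{
	int old = atomic_load(cnt);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value and we retry. */
		if (atomic_compare_exchange_weak(cnt, &old, old + 1))
			return true;
	}
	return false;	/* the count already hit zero: the object is being freed */
}

/* Caching is allowed only when a reference could actually be taken. */
bool try_cache_dst(struct dst_model *dst)
{
	return inc_not_zero(&dst->refcnt);
}
```

A plain increment would revive an entry whose count had already dropped to zero; the compare-and-exchange loop refuses in that case, so the caller simply skips caching.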
Re: [PATCH net-next 0/2] lwtunnel: encap locally-generated ipv4 packets
From: Robert Shearman <rshea...@brocade.com>
Date: Mon, 3 Aug 2015 17:39:19 +0100

> Locally-generated IPv4 packets, such as from applications running on
> the host or traceroute/ping currently don't have lwtunnel output
> redirected encap applied. However, they should do in the same way as
> for forwarded packets and this patch series addresses that.

Series applied, thanks.
[PATCH 2/4] net: switchdev: support static FDB addresses
This patch adds an is_static boolean to the switchdev_obj_fdb structure,
in order to set the ndm_state to either NUD_NOARP or NUD_REACHABLE.

Signed-off-by: Vivien Didelot <vivien.dide...@savoirfairelinux.com>
---
 include/net/switchdev.h   | 1 +
 net/switchdev/switchdev.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index e90e1a0..0e296b8 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -72,6 +72,7 @@ struct switchdev_obj {
 		struct switchdev_obj_fdb {		/* PORT_FDB */
 			u8 addr[ETH_ALEN];
 			u16 vid;
+			bool is_static;
 		} fdb;
 	} u;
 };
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 28786e8..b75897c 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -810,7 +810,7 @@ static int switchdev_port_fdb_dump_cb(struct net_device *dev,
 	ndm->ndm_flags   = NTF_SELF;
 	ndm->ndm_type    = 0;
 	ndm->ndm_ifindex = dev->ifindex;
-	ndm->ndm_state   = NUD_REACHABLE;
+	ndm->ndm_state   = obj->u.fdb.is_static ? NUD_NOARP : NUD_REACHABLE;
 
 	if (nla_put(dump->skb, NDA_LLADDR, ETH_ALEN, obj->u.fdb.addr))
 		goto nla_put_failure;
--
2.4.6
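The state selection made in the dump callback is small enough to model in isolation. The sketch below is a userspace illustration: fdb_ndm_state() is a hypothetical helper, and the two NUD_* values are copied here on the assumption that they match the kernel's uapi neighbour header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Neighbour states; values assumed to match include/uapi/linux/neighbour.h. */
#define NUD_REACHABLE	0x02
#define NUD_NOARP	0x40

/* Hypothetical helper: map the new is_static flag to the reported ndm_state. */
uint16_t fdb_ndm_state(bool is_static)
{
	return is_static ? NUD_NOARP : NUD_REACHABLE;
}
```

Entries flagged static are reported as NUD_NOARP; everything else keeps the previous NUD_REACHABLE default.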
net: dsa: support switchdev FDB objects
This patchset refactors the DSA and mv88e6xxx code to use the switchdev
FDB objects.

The first two patches add minor but necessary changes to switchdev, the
third one implements the switchdev glue in DSA for FDB routines, and the
fourth one refactors the FDB access functions in the mv88e6xxx code.

Below is an example (ports 0-2 belong to br0, ports 3-4 belong to br1):

    # bridge fdb add 3c:97:0e:11:30:6e dev swp2
    # bridge fdb add 3c:97:0e:11:40:78 dev swp3
    # bridge fdb add 3c:97:0e:11:50:86 dev swp4
    # bridge fdb del 3c:97:0e:11:40:78 dev swp3
    # bridge fdb
    01:00:5e:00:00:01 dev eth0 self permanent
    01:00:5e:00:00:01 dev eth1 self permanent
    00:50:d2:10:78:15 dev swp0 master br0 permanent
    3c:97:0e:11:30:6e dev swp2 self static
    00:50:d2:10:78:15 dev swp3 master br1 permanent
    3c:97:0e:11:50:86 dev swp4 self static
    # cat /sys/kernel/debug/dsa0/atu
    # DB   T/P  Vec State Addr
    # 001  Port 004   e   3c:97:0e:11:30:6e
    # 004  Port 010   e   3c:97:0e:11:50:86

For the 88E6xxx switches, FIDs 1 to num_ports will be reserved for
non-bridged ports and bridge groups, and the remaining will be later
used by VLANs. This change is necessary to welcome the support for
hardware VLANs (which will follow soon).

Cheers,
-v

Vivien Didelot (4):
  net: switchdev: change fdb addr for a byte array
  net: switchdev: support static FDB addresses
  net: dsa: add support for switchdev FDB objects
  net: dsa: mv88e6xxx: refactor FDB routines

 drivers/net/dsa/mv88e6171.c          |   6 +-
 drivers/net/dsa/mv88e6352.c          |   6 +-
 drivers/net/dsa/mv88e6xxx.c          | 205 ++--
 drivers/net/dsa/mv88e6xxx.h          |  31 +++--
 drivers/net/ethernet/rocker/rocker.c |   2 +-
 include/net/dsa.h                    |  16 ++-
 include/net/switchdev.h              |   3 +-
 net/bridge/br_fdb.c                  |   2 +-
 net/dsa/slave.c                      | 221 +++
 net/switchdev/switchdev.c            |   6 +-
 10 files changed, 308 insertions(+), 190 deletions(-)

--
2.4.6
[PATCH 3/4] net: dsa: add support for switchdev FDB objects
Remove the fdb_{add,del,getnext} function pointer in favor of new port_fdb_{add,del,getnext}. Implement the switchdev_port_obj_{add,del,dump} functions in DSA to support the SWITCHDEV_OBJ_PORT_FDB objects. These functions are called from switchdev_port_bridge_{get,set,del}link. Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com --- drivers/net/dsa/mv88e6171.c | 3 - drivers/net/dsa/mv88e6352.c | 3 - include/net/dsa.h | 16 ++-- net/dsa/slave.c | 221 4 files changed, 129 insertions(+), 114 deletions(-) diff --git a/drivers/net/dsa/mv88e6171.c b/drivers/net/dsa/mv88e6171.c index 1c78084..cfa21ed 100644 --- a/drivers/net/dsa/mv88e6171.c +++ b/drivers/net/dsa/mv88e6171.c @@ -116,9 +116,6 @@ struct dsa_switch_driver mv88e6171_switch_driver = { .port_join_bridge = mv88e6xxx_join_bridge, .port_leave_bridge = mv88e6xxx_leave_bridge, .port_stp_update= mv88e6xxx_port_stp_update, - .fdb_add= mv88e6xxx_port_fdb_add, - .fdb_del= mv88e6xxx_port_fdb_del, - .fdb_getnext= mv88e6xxx_port_fdb_getnext, }; MODULE_ALIAS(platform:mv88e6171); diff --git a/drivers/net/dsa/mv88e6352.c b/drivers/net/dsa/mv88e6352.c index af210ef..eb4630f 100644 --- a/drivers/net/dsa/mv88e6352.c +++ b/drivers/net/dsa/mv88e6352.c @@ -341,9 +341,6 @@ struct dsa_switch_driver mv88e6352_switch_driver = { .port_join_bridge = mv88e6xxx_join_bridge, .port_leave_bridge = mv88e6xxx_leave_bridge, .port_stp_update= mv88e6xxx_port_stp_update, - .fdb_add= mv88e6xxx_port_fdb_add, - .fdb_del= mv88e6xxx_port_fdb_del, - .fdb_getnext= mv88e6xxx_port_fdb_getnext, }; MODULE_ALIAS(platform:mv88e6172); diff --git a/include/net/dsa.h b/include/net/dsa.h index fbca63b..a090c8a 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -296,12 +296,16 @@ struct dsa_switch_driver { u32 br_port_mask); int (*port_stp_update)(struct dsa_switch *ds, int port, u8 state); - int (*fdb_add)(struct dsa_switch *ds, int port, - const unsigned char *addr, u16 vid); - int (*fdb_del)(struct dsa_switch *ds, int port, - const unsigned 
char *addr, u16 vid); - int (*fdb_getnext)(struct dsa_switch *ds, int port, - unsigned char *addr, bool *is_static); + + /* +* Forwarding database +*/ + int (*port_fdb_add)(struct dsa_switch *ds, int port, u16 vid, + u8 addr[ETH_ALEN]); + int (*port_fdb_del)(struct dsa_switch *ds, int port, u16 vid, + u8 addr[ETH_ALEN]); + int (*port_fdb_getnext)(struct dsa_switch *ds, int port, u16 *vid, + u8 addr[ETH_ALEN], bool *is_static); }; void register_switch_driver(struct dsa_switch_driver *type); diff --git a/net/dsa/slave.c b/net/dsa/slave.c index 0010c69..0f99a17 100644 --- a/net/dsa/slave.c +++ b/net/dsa/slave.c @@ -19,6 +19,7 @@ #include net/switchdev.h #include linux/if_bridge.h #include linux/netpoll.h +#include linux/if_vlan.h #include dsa_priv.h /* slave mii_bus handling ***/ @@ -200,105 +201,6 @@ out: return 0; } -static int dsa_slave_fdb_add(struct ndmsg *ndm, struct nlattr *tb[], -struct net_device *dev, -const unsigned char *addr, u16 vid, u16 nlm_flags) -{ - struct dsa_slave_priv *p = netdev_priv(dev); - struct dsa_switch *ds = p-parent; - int ret = -EOPNOTSUPP; - - if (ds-drv-fdb_add) - ret = ds-drv-fdb_add(ds, p-port, addr, vid); - - return ret; -} - -static int dsa_slave_fdb_del(struct ndmsg *ndm, struct nlattr *tb[], -struct net_device *dev, -const unsigned char *addr, u16 vid) -{ - struct dsa_slave_priv *p = netdev_priv(dev); - struct dsa_switch *ds = p-parent; - int ret = -EOPNOTSUPP; - - if (ds-drv-fdb_del) - ret = ds-drv-fdb_del(ds, p-port, addr, vid); - - return ret; -} - -static int dsa_slave_fill_info(struct net_device *dev, struct sk_buff *skb, - const unsigned char *addr, u16 vid, - bool is_static, - u32 portid, u32 seq, int type, - unsigned int flags) -{ - struct nlmsghdr *nlh; - struct ndmsg *ndm; - - nlh = nlmsg_put(skb, portid, seq, type, sizeof(*ndm), flags); - if (!nlh) - return -EMSGSIZE; - - ndm = nlmsg_data(nlh); - ndm-ndm_family = AF_BRIDGE; - ndm-ndm_pad1
[PATCHv2 net-next 0/4] add meminfo, bist status and misc. fixes
Hi,

This patch series adds the following:
  Add support to dump memory address range of various hw modules
  Add support to dump edc bist status during ecc error
  Read correct bits of who am i register for T6 adapter and update T6
  register range

This patch series has been created against the net-next tree and includes
patches on the cxgb4 and cxgb4vf drivers.

We have included all the maintainers of the respective drivers. Kindly
review the changes and let us know in case of any review comments.

Thanks

V2:
  PATCH 3/4 (cxgb4/cxgb4vf: read the correct bits of PL Who Am I register)
  Fix switch statement in get_chip_type() and some more style fixes based
  on review comment by Sergei Shtylyov <sergei.shtyl...@cogentembedded.com>

Hariprasad Shenai (4):
  cxgb4: Add debugfs support to dump meminfo
  cxgb4: Add support to dump edc bist status
  cxgb4/cxgb4vf: read the correct bits of PL Who Am I register
  cxgb4: Update T6 register ranges

 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c | 285 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    |  34 ++-
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c         |  73 --
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h       | 131 +-
 drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c     |   3 +-
 5 files changed, 506 insertions(+), 20 deletions(-)

--
2.3.4
[PATCHv2 net-next 4/4] cxgb4: Update T6 register ranges
Signed-off-by: Hariprasad Shenai haripra...@chelsio.com --- drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 26 -- 1 file changed, 12 insertions(+), 14 deletions(-) diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c index 5c63ceb..91750ad 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c @@ -1359,9 +1359,10 @@ void t4_get_regs(struct adapter *adap, void *buf, size_t buf_size) }; static const unsigned int t6_reg_ranges[] = { - 0x1008, 0x114c, + 0x1008, 0x1124, + 0x1138, 0x114c, 0x1180, 0x11b4, - 0x11fc, 0x1250, + 0x11fc, 0x1254, 0x1280, 0x133c, 0x1800, 0x18fc, 0x3000, 0x302c, @@ -1384,16 +1385,16 @@ void t4_get_regs(struct adapter *adap, void *buf, size_t buf_size) 0x5c10, 0x5ec0, 0x5ec8, 0x5ecc, 0x6000, 0x6040, - 0x6058, 0x615c, + 0x6058, 0x619c, 0x7700, 0x7798, 0x77c0, 0x7880, 0x78cc, 0x78fc, 0x7b00, 0x7c54, 0x7d00, 0x7efc, - 0x8dc0, 0x8de0, + 0x8dc0, 0x8de4, 0x8df8, 0x8e84, 0x8ea0, 0x8f88, - 0x8fb8, 0x911c, + 0x8fb8, 0x9124, 0x9400, 0x9470, 0x9600, 0x971c, 0x9800, 0x9808, @@ -1413,9 +1414,8 @@ void t4_get_regs(struct adapter *adap, void *buf, size_t buf_size) 0xdfc0, 0xdfe0, 0xe000, 0xf008, 0x11000, 0x11014, - 0x11048, 0x0, - 0x8, 0x1117c, - 0x11190, 0x11264, + 0x11048, 0x1117c, + 0x11190, 0x11270, 0x11300, 0x1130c, 0x12000, 0x1206c, 0x19040, 0x1906c, @@ -1500,9 +1500,8 @@ void t4_get_regs(struct adapter *adap, void *buf, size_t buf_size) 0x1ff00, 0x1ff84, 0x1ffc0, 0x1ffc8, 0x3, 0x30070, - 0x30100, 0x3015c, - 0x30190, 0x301d0, - 0x30200, 0x30318, + 0x30100, 0x301d0, + 0x30200, 0x30320, 0x30400, 0x3052c, 0x30540, 0x3061c, 0x30800, 0x30890, @@ -1578,9 +1577,8 @@ void t4_get_regs(struct adapter *adap, void *buf, size_t buf_size) 0x33c24, 0x33c50, 0x33cf0, 0x33cfc, 0x34000, 0x34070, - 0x34100, 0x3415c, - 0x34190, 0x341d0, - 0x34200, 0x34318, + 0x34100, 0x341d0, + 0x34200, 0x34320, 0x34400, 0x3452c, 0x34540, 0x3461c, 0x34800, 0x34890, -- 2.3.4 -- To unsubscribe from 
this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2 net-next 2/4] cxgb4: Add support to dump edc bist status
Add support to dump edc bist status for ECC data errors Signed-off-by: Hariprasad Shenai haripra...@chelsio.com --- drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 39 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h | 5 ++-- 2 files changed, 42 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c index 800bd48..b193295 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c @@ -345,6 +345,43 @@ int t4_wr_mbox_meat(struct adapter *adap, int mbox, const void *cmd, int size, FW_CMD_MAX_TIMEOUT); } +static int t4_edc_err_read(struct adapter *adap, int idx) +{ + u32 edc_ecc_err_addr_reg; + u32 rdata_reg; + + if (is_t4(adap-params.chip)) { + CH_WARN(adap, %s: T4 NOT supported.\n, __func__); + return 0; + } + if (idx != 0 idx != 1) { + CH_WARN(adap, %s: idx %d NOT supported.\n, __func__, idx); + return 0; + } + + edc_ecc_err_addr_reg = EDC_T5_REG(EDC_H_ECC_ERR_ADDR_A, idx); + rdata_reg = EDC_T5_REG(EDC_H_BIST_STATUS_RDATA_A, idx); + + CH_WARN(adap, + edc%d err addr 0x%x: 0x%x.\n, + idx, edc_ecc_err_addr_reg, + t4_read_reg(adap, edc_ecc_err_addr_reg)); + CH_WARN(adap, + bist: 0x%x, status %llx %llx %llx %llx %llx %llx %llx %llx %llx.\n, + rdata_reg, + (unsigned long long)t4_read_reg64(adap, rdata_reg), + (unsigned long long)t4_read_reg64(adap, rdata_reg + 8), + (unsigned long long)t4_read_reg64(adap, rdata_reg + 16), + (unsigned long long)t4_read_reg64(adap, rdata_reg + 24), + (unsigned long long)t4_read_reg64(adap, rdata_reg + 32), + (unsigned long long)t4_read_reg64(adap, rdata_reg + 40), + (unsigned long long)t4_read_reg64(adap, rdata_reg + 48), + (unsigned long long)t4_read_reg64(adap, rdata_reg + 56), + (unsigned long long)t4_read_reg64(adap, rdata_reg + 64)); + + return 0; +} + /** * t4_memory_rw - read/write EDC 0, EDC 1 or MC via PCIE memory window * @adap: the adapter @@ -3283,6 +3320,8 @@ static void mem_intr_handler(struct adapter *adapter, int 
idx) if (v ECC_CE_INT_CAUSE_F) { u32 cnt = ECC_CECNT_G(t4_read_reg(adapter, cnt_addr)); + t4_edc_err_read(adapter, idx); + t4_write_reg(adapter, cnt_addr, ECC_CECNT_V(ECC_CECNT_M)); if (printk_ratelimit()) dev_warn(adapter-pdev_dev, diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h index 0626868..13ce018 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h @@ -2867,10 +2867,11 @@ #define EDC_H_BIST_DATA_PATTERN_A 0x50010 #define EDC_H_BIST_STATUS_RDATA_A 0x50028 +#define EDC_H_ECC_ERR_ADDR_A 0x50084 #define EDC_T51_BASE_ADDR 0x50800 -#define EDC_STRIDE_T5 (EDC_T51_BASE_ADDR - EDC_T50_BASE_ADDR) -#define EDC_REG_T5(reg, idx) (reg + EDC_STRIDE_T5 * idx) +#define EDC_T5_STRIDE (EDC_T51_BASE_ADDR - EDC_T50_BASE_ADDR) +#define EDC_T5_REG(reg, idx) (reg + EDC_T5_STRIDE * idx) #define PL_VF_REV_A 0x4 #define PL_VF_WHOAMI_A 0x0 -- 2.3.4 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2 net-next 1/4] cxgb4: Add debugfs support to dump meminfo
Add debug support to dump memory address ranges of various hardware modules of the adapter. Signed-off-by: Hariprasad Shenai haripra...@chelsio.com --- drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c | 285 + drivers/net/ethernet/chelsio/cxgb4/t4_regs.h | 122 + 2 files changed, 407 insertions(+) diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c index f701a6f..b657734 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c @@ -2275,6 +2275,290 @@ static const struct file_operations blocked_fl_fops = { .llseek = generic_file_llseek, }; +struct mem_desc { + unsigned int base; + unsigned int limit; + unsigned int idx; +}; + +static int mem_desc_cmp(const void *a, const void *b) +{ + return ((const struct mem_desc *)a)-base - + ((const struct mem_desc *)b)-base; +} + +static void mem_region_show(struct seq_file *seq, const char *name, + unsigned int from, unsigned int to) +{ + char buf[40]; + + string_get_size((u64)to - from + 1, 1, STRING_UNITS_2, buf, + sizeof(buf)); + seq_printf(seq, %-15s %#x-%#x [%s]\n, name, from, to, buf); +} + +static int meminfo_show(struct seq_file *seq, void *v) +{ + static const char * const memory[] = { EDC0:, EDC1:, MC:, + MC0:, MC1:}; + static const char * const region[] = { + DBQ contexts:, IMSG contexts:, FLM cache:, TCBs:, + Pstructs:, Timers:, Rx FL:, Tx FL:, Pstruct FL:, + Tx payload:, Rx payload:, LE hash:, iSCSI region:, + TDDP region:, TPT region:, STAG region:, RQ region:, + RQUDP region:, PBL region:, TXPBL region:, + DBVFIFO region:, ULPRX state:, ULPTX state:, + On-chip queues: + }; + + int i, n; + u32 lo, hi, used, alloc; + struct mem_desc avail[4]; + struct mem_desc mem[ARRAY_SIZE(region) + 3]; /* up to 3 holes */ + struct mem_desc *md = mem; + struct adapter *adap = seq-private; + + for (i = 0; i ARRAY_SIZE(mem); i++) { + mem[i].limit = 0; + mem[i].idx = i; + } + + /* Find and sort the 
populated memory ranges */ + i = 0; + lo = t4_read_reg(adap, MA_TARGET_MEM_ENABLE_A); + if (lo EDRAM0_ENABLE_F) { + hi = t4_read_reg(adap, MA_EDRAM0_BAR_A); + avail[i].base = EDRAM0_BASE_G(hi) 20; + avail[i].limit = avail[i].base + (EDRAM0_SIZE_G(hi) 20); + avail[i].idx = 0; + i++; + } + if (lo EDRAM1_ENABLE_F) { + hi = t4_read_reg(adap, MA_EDRAM1_BAR_A); + avail[i].base = EDRAM1_BASE_G(hi) 20; + avail[i].limit = avail[i].base + (EDRAM1_SIZE_G(hi) 20); + avail[i].idx = 1; + i++; + } + + if (is_t5(adap-params.chip)) { + if (lo EXT_MEM0_ENABLE_F) { + hi = t4_read_reg(adap, MA_EXT_MEMORY0_BAR_A); + avail[i].base = EXT_MEM0_BASE_G(hi) 20; + avail[i].limit = + avail[i].base + (EXT_MEM0_SIZE_G(hi) 20); + avail[i].idx = 3; + i++; + } + if (lo EXT_MEM1_ENABLE_F) { + hi = t4_read_reg(adap, MA_EXT_MEMORY1_BAR_A); + avail[i].base = EXT_MEM1_BASE_G(hi) 20; + avail[i].limit = + avail[i].base + (EXT_MEM1_SIZE_G(hi) 20); + avail[i].idx = 4; + i++; + } + } else { + if (lo EXT_MEM_ENABLE_F) { + hi = t4_read_reg(adap, MA_EXT_MEMORY_BAR_A); + avail[i].base = EXT_MEM_BASE_G(hi) 20; + avail[i].limit = + avail[i].base + (EXT_MEM_SIZE_G(hi) 20); + avail[i].idx = 2; + i++; + } + } + if (!i)/* no memory available */ + return 0; + sort(avail, i, sizeof(struct mem_desc), mem_desc_cmp, NULL); + + (md++)-base = t4_read_reg(adap, SGE_DBQ_CTXT_BADDR_A); + (md++)-base = t4_read_reg(adap, SGE_IMSG_CTXT_BADDR_A); + (md++)-base = t4_read_reg(adap, SGE_FLM_CACHE_BADDR_A); + (md++)-base = t4_read_reg(adap, TP_CMM_TCB_BASE_A); + (md++)-base = t4_read_reg(adap, TP_CMM_MM_BASE_A); + (md++)-base = t4_read_reg(adap, TP_CMM_TIMER_BASE_A); + (md++)-base = t4_read_reg(adap, TP_CMM_MM_RX_FLST_BASE_A); + (md++)-base = t4_read_reg(adap, TP_CMM_MM_TX_FLST_BASE_A); + (md++)-base = t4_read_reg(adap,
[PATCHv2 net-next 3/4] cxgb4/cxgb4vf: read the correct bits of PL Who Am I register
Read the correct bits of PL Who Am I for the Source PF field which has changed in T6 Signed-off-by: Hariprasad Shenai haripra...@chelsio.com --- drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 34 - drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 8 -- drivers/net/ethernet/chelsio/cxgb4/t4_regs.h| 4 +++ drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c | 3 ++- 4 files changed, 45 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c index d582e17..27e87b6 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c @@ -4551,6 +4551,32 @@ static void free_some_resources(struct adapter *adapter) NETIF_F_IPV6_CSUM | NETIF_F_HIGHDMA) #define SEGMENT_SIZE 128 +static int get_chip_type(struct pci_dev *pdev, u32 pl_rev) +{ + int ver, chip; + u16 device_id; + + /* Retrieve adapter's device ID */ + pci_read_config_word(pdev, PCI_DEVICE_ID, device_id); + ver = device_id 12; + switch (ver) { + case CHELSIO_T4: + chip |= CHELSIO_CHIP_CODE(CHELSIO_T4, pl_rev); + break; + case CHELSIO_T5: + chip |= CHELSIO_CHIP_CODE(CHELSIO_T5, pl_rev); + break; + case CHELSIO_T6: + chip |= CHELSIO_CHIP_CODE(CHELSIO_T6, pl_rev); + break; + default: + dev_err(pdev-dev, Device %d is not supported\n, + device_id); + return -EINVAL; + } + return chip; +} + static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent) { int func, i, err, s_qpp, qpp, num_seg; @@ -4558,6 +4584,8 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent) bool highdma = false; struct adapter *adapter = NULL; void __iomem *regs; + u32 whoami, pl_rev; + enum chip_type chip; printk_once(KERN_INFO %s - version %s\n, DRV_DESC, DRV_VERSION); @@ -4586,7 +4614,11 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent) goto out_unmap_bar0; /* We control everything through one PF */ - func = SOURCEPF_G(readl(regs + PL_WHOAMI_A)); + whoami = 
readl(regs + PL_WHOAMI_A); + pl_rev = REV_G(readl(regs + PL_REV_A)); + chip = get_chip_type(pdev, pl_rev); + func = CHELSIO_CHIP_VERSION(chip) = CHELSIO_T5 ? + SOURCEPF_G(whoami) : T6_SOURCEPF_G(whoami); if (func != ent-driver_data) { iounmap(regs); pci_disable_device(pdev); diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c index b193295..5c63ceb 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c @@ -3529,7 +3529,9 @@ int t4_slow_intr_handler(struct adapter *adapter) void t4_intr_enable(struct adapter *adapter) { u32 val = 0; - u32 pf = SOURCEPF_G(t4_read_reg(adapter, PL_WHOAMI_A)); + u32 whoami = t4_read_reg(adapter, PL_WHOAMI_A); + u32 pf = CHELSIO_CHIP_VERSION(adapter-params.chip) = CHELSIO_T5 ? + SOURCEPF_G(whoami) : T6_SOURCEPF_G(whoami); if (CHELSIO_CHIP_VERSION(adapter-params.chip) = CHELSIO_T5) val = ERR_DROPPED_DB_F | ERR_EGR_CTXT_PRIO_F | DBFIFO_HP_INT_F; @@ -3554,7 +3556,9 @@ void t4_intr_enable(struct adapter *adapter) */ void t4_intr_disable(struct adapter *adapter) { - u32 pf = SOURCEPF_G(t4_read_reg(adapter, PL_WHOAMI_A)); + u32 whoami = t4_read_reg(adapter, PL_WHOAMI_A); + u32 pf = CHELSIO_CHIP_VERSION(adapter-params.chip) = CHELSIO_T5 ? 
+ SOURCEPF_G(whoami) : T6_SOURCEPF_G(whoami); t4_write_reg(adapter, MYPF_REG(PL_PF_INT_ENABLE_A), 0); t4_set_reg_field(adapter, PL_INT_MAP0_A, 1 pf, 0); diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h index 13ce018..e444dc4 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h @@ -2588,6 +2588,10 @@ #define SOURCEPF_M0x7U #define SOURCEPF_G(x) (((x) SOURCEPF_S) SOURCEPF_M) +#define T6_SOURCEPF_S9 +#define T6_SOURCEPF_M0x7U +#define T6_SOURCEPF_G(x) (((x) T6_SOURCEPF_S) T6_SOURCEPF_M) + #define PL_INT_CAUSE_A 0x1940c #define ULP_TX_S27 diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c b/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c index 0db6dc9..63dd5fd 100644 --- a/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c +++ b/drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c @@ -619,7 +619,8 @@ int t4vf_get_sge_params(struct adapter *adapter) */ whoami = t4_read_reg(adapter, T4VF_PL_BASE_ADDR + PL_VF_WHOAMI_A); -
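The field extraction this patch switches on can be sketched in userspace. The shift and mask operators were lost from the archived diff, so the sketch assumes the usual (x >> shift) & mask form and a <= comparison against T5. The T6 shift of 9 and the 0x7 masks come from the diff; the pre-T6 shift of 8 does not appear in it and is an assumption, as are the chip_version_model enum and the source_pf() helper.

```c
#include <stdint.h>

/* Pre-T6 layout; the shift value is an assumption, it is not shown in the diff. */
#define SOURCEPF_S		8
#define SOURCEPF_M		0x7U
#define SOURCEPF_G(x)		(((x) >> SOURCEPF_S) & SOURCEPF_M)

/* T6 layout, as added by the patch: the field moved up by one bit. */
#define T6_SOURCEPF_S		9
#define T6_SOURCEPF_M		0x7U
#define T6_SOURCEPF_G(x)	(((x) >> T6_SOURCEPF_S) & T6_SOURCEPF_M)

/* Hypothetical stand-in for the driver's chip version codes. */
enum chip_version_model { CHIP_T4 = 4, CHIP_T5 = 5, CHIP_T6 = 6 };

/* Pick the extraction macro that matches the chip generation. */
uint32_t source_pf(enum chip_version_model ver, uint32_t whoami)
{
	return ver <= CHIP_T5 ? SOURCEPF_G(whoami) : T6_SOURCEPF_G(whoami);
}
```

Under these assumptions, decoding a T6 register value with the older macro would yield a different PF number, which is the mismatch the patch guards against.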
[PATCH net-next v2] openvswitch: Make 100 percents packets sampled when sampling rate is 1.
When the sampling rate is 1, the sampling probability is UINT32_MAX. The
packet should be sampled even if prandom_u32() generates UINT32_MAX. And
no packet needs to be sampled when the probability is 0.

Signed-off-by: Wenyu Zhang <wen...@vmware.com>
---
 net/openvswitch/actions.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index cf04c2f..c81bcf5 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -669,9 +669,11 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
 
 	for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
 		 a = nla_next(a, &rem)) {
+		uint32_t probability;
 		switch (nla_type(a)) {
 		case OVS_SAMPLE_ATTR_PROBABILITY:
-			if (prandom_u32() >= nla_get_u32(a))
+			probability = nla_get_u32(a);
+			if (!probability || prandom_u32() > probability)
 				return 0;
 			break;
 
--
1.7.9.5
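The boundary cases named in the commit message can be checked with two small predicates. This is a sketch that assumes the comparison operators lost from the archived diff were >= in the old check and > in the new one; sampled_old() and sampled_new() are hypothetical helpers that take the random draw as a parameter instead of calling prandom_u32().

```c
#include <stdbool.h>
#include <stdint.h>

/* Old check: the packet is skipped whenever rnd >= probability. */
bool sampled_old(uint32_t rnd, uint32_t probability)
{
	return !(rnd >= probability);
}

/* New check: skipped only if probability is 0 or rnd > probability. */
bool sampled_new(uint32_t rnd, uint32_t probability)
{
	return !(!probability || rnd > probability);
}
```

With probability UINT32_MAX the old predicate still skips the draw rnd == UINT32_MAX, so the rate falls just short of 100 percent; the new predicate samples every draw, and the explicit zero test keeps probability 0 from sampling the draw rnd == 0.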
Re: [PATCH net-next v2 1/2] openvswitch: Use regular GRE net_device instead of vport
On Mon, Aug 03, 2015 at 05:27:26PM -0700, Pravin B Shelar wrote:
> With addition of flow based tunneling, there is no need to
> have special GRE vport. Removes all of the OVS
> specific GRE code and make OVS use a ip_gre net_device.
> Minimal GRE vport is kept to handle compatibility with
> current userspace application.
>
> Signed-off-by: Pravin B Shelar <pshe...@nicira.com>
...
> +#define GRE_TAP_FB_NAME "gretap0"
...
> +	/* fallback device is used for flow based tunneling. */
> +	if (!strcmp(dev->name, GRE_TAP_FB_NAME)) {
> +		struct ip_tunnel *t;
> +
> +		t = netdev_priv(dev);
> +		t->flow_based_tunnel = true;
> +		eth_hw_addr_random(dev);
> +		netif_keep_dst(dev);
> +	}
> +

feature detection based on netdev name?
meaning that there will be only one such device for the whole host?
and namespaces cannot have their own gre tunnels?
(since host 'gretap0' cannot be seen in netns)
Re: [PATCH net-next v2 1/2] openvswitch: Use regular GRE net_device instead of vport
From: Alexei Starovoitov <alexei.starovoi...@gmail.com>
Date: Mon, 3 Aug 2015 21:23:40 -0700

> On Mon, Aug 03, 2015 at 05:27:26PM -0700, Pravin B Shelar wrote:
>> With addition of flow based tunneling, there is no need to
>> have special GRE vport. Removes all of the OVS
>> specific GRE code and make OVS use a ip_gre net_device.
>> Minimal GRE vport is kept to handle compatibility with
>> current userspace application.
>>
>> Signed-off-by: Pravin B Shelar <pshe...@nicira.com>
> ...
>> +#define GRE_TAP_FB_NAME "gretap0"
> ...
>> +	/* fallback device is used for flow based tunneling. */
>> +	if (!strcmp(dev->name, GRE_TAP_FB_NAME)) {
>> +		struct ip_tunnel *t;
>> +
>> +		t = netdev_priv(dev);
>> +		t->flow_based_tunnel = true;
>> +		eth_hw_addr_random(dev);
>> +		netif_keep_dst(dev);
>> +	}
>> +
>
> feature detection based on netdev name?
> meaning that there will be only one such device for the whole host?
> and namespaces cannot have their own gre tunnels?
> (since host 'gretap0' cannot be seen in netns)

Doing anything like this by netdev name is wrong.

Pravin you will need to do this in some other way.
Re: [PATCH] net: dsa: fix EDSA frame from hwaccel frame
From: Vivien Didelot vivien.dide...@savoirfairelinux.com
Date: Sun, 2 Aug 2015 21:46:02 -0400

> If the underlying network device features NETIF_F_HW_VLAN_CTAG_TX, an
> EDSA frame is prepended with a 802.1q header once queued.
>
> To fix this, push the VLAN tag to the payload if present, before
> checking the frame protocol.
>
> [note: we may prefer to access directly VLAN TCI from hwaccel frames,
> but this approach is simpler.]
>
> Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com

This is a bug fix so should target 'net', but you generated the patch
against 'net-next'.

In any event, you should be explicit about the tree you are targeting
in order to not waste my time like this, by simply specifying the tree
in your [PATCH xxx] text in your subject line. Either [PATCH net] or
[PATCH net-next].

Thanks.
Re: [PATCH] xen-netback: Allocate fraglist early to avoid complex rollback
From: Ross Lagerwall ross.lagerw...@citrix.com
Date: Mon, 3 Aug 2015 15:38:03 +0100

> Determine if a fraglist is needed in the tx path, and allocate it if
> necessary before setting up the copy and map operations.
> Otherwise, undoing the copy and map operations is tricky.
>
> This fixes a use-after-free: if allocating the fraglist failed, the copy
> and map operations that had been set up were still executed, writing
> over the data area of a freed skb.
>
> Signed-off-by: Ross Lagerwall ross.lagerw...@citrix.com

Applied, thanks.
Re: BUG: null dereference in __skb_recv_datagram
> [ 318.244596] BUG: unable to handle kernel NULL pointer dereference at 008e
> [ 318.245182] IP: [81455e7c] __skb_recv_datagram+0xbc/0x5a0

Replying to myself, and adding commit interested parties...

I went through the git log for the function in question, and positively
identified that the following commit introduces the crash:

738ac1e net: Clone skb before setting peeked flag

Null dereference is at line 224 of net/core/datagram.c (according to my
objdump disassembly):

	spin_lock_irqsave(&queue->lock, cpu_flags);
	skb_queue_walk(queue, skb) {
		last = skb;
		*peeked = skb->peeked;
		^---
		if (flags & MSG_PEEK) {
			if (_off >= skb->len && (skb->len || _off ||
						 skb->peeked)) {

Beyond that, I'm probably out of my comfort zone, so if anyone has a
bright idea of a patch to try, I will gladly test it.
Re: [PATCH net-next] mpls: Use definition for reserved label checks
From: Robert Shearman rshea...@brocade.com
Date: Mon, 3 Aug 2015 17:50:04 +0100

> In multiple locations there are checks for whether the label in hand
> is a reserved label or not using the arbitrary value of 16. Factor
> this out into a #define for better maintainability and for
> documentation.
>
> Signed-off-by: Robert Shearman rshea...@brocade.com
> ---
> Resend of an earlier version of this patch that was included as part
> of a larger series. Changes since that version:
> - Move new #define into userspace header file in line with other
>   well-defined label values. Rename to match.

Applied, thanks.
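
As a rough illustration of the refactoring the quoted commit message describes (this is not the patch itself), the bare 16 can be given a name and wrapped in a small predicate. The macro and function names below are hypothetical and exist only for this sketch; the value 16 is the one quoted in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for this sketch only; the commit message just says
 * that the value 16 is factored out into a #define.
 */
#define EXAMPLE_MPLS_LABEL_FIRST_UNRESERVED 16

/* Labels below the first unreserved value are treated as reserved. */
static bool example_mpls_label_is_reserved(uint32_t label)
{
	return label < EXAMPLE_MPLS_LABEL_FIRST_UNRESERVED;
}
```

Call sites then read as a question about the label instead of a comparison against an unexplained number.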
Re: [PATCH net-next v2 1/2] openvswitch: Use regular GRE net_device instead of vport
On Mon, Aug 3, 2015 at 9:59 PM, David Miller da...@davemloft.net wrote:
> From: Alexei Starovoitov alexei.starovoi...@gmail.com
> Date: Mon, 3 Aug 2015 21:23:40 -0700
>
>> On Mon, Aug 03, 2015 at 05:27:26PM -0700, Pravin B Shelar wrote:
>>> With addition of flow based tunneling, there is no need to
>>> have special GRE vport. Removes all of the OVS
>>> specific GRE code and make OVS use a ip_gre net_device.
>>> Minimal GRE vport is kept to handle compatibility with
>>> current userspace application.
>>>
>>> Signed-off-by: Pravin B Shelar pshe...@nicira.com
>> ...
>>> +#define GRE_TAP_FB_NAME "gretap0"
>> ...
>>> +	/* fallback device is used for flow based tunneling. */
>>> +	if (!strcmp(dev->name, GRE_TAP_FB_NAME)) {
>>> +		struct ip_tunnel *t;
>>> +
>>> +		t = netdev_priv(dev);
>>> +		t->flow_based_tunnel = true;
>>> +		eth_hw_addr_random(dev);
>>> +		netif_keep_dst(dev);
>>> +	}
>>> +
>>
>> feature detection based on netdev name?
>> meaning that there will be only one such device for the whole host?
>> and namespaces cannot have their own gre tunnels?
>> (since host 'gretap0' cannot be seen in netns)
>
> Doing anything like this by netdev name is wrong.
>
> Pravin you will need to do this in some other way.

ok, I will add API to create flow-based GRE device.
Re: [PATCH net-next v2 1/2] openvswitch: Use regular GRE net_device instead of vport
On Mon, Aug 03, 2015 at 10:51:02PM -0700, Pravin Shelar wrote:
> On Mon, Aug 3, 2015 at 9:23 PM, Alexei Starovoitov
> alexei.starovoi...@gmail.com wrote:
>> On Mon, Aug 03, 2015 at 05:27:26PM -0700, Pravin B Shelar wrote:
>>> With addition of flow based tunneling, there is no need to
>>> have special GRE vport. Removes all of the OVS
>>> specific GRE code and make OVS use a ip_gre net_device.
>>> Minimal GRE vport is kept to handle compatibility with
>>> current userspace application.
>>>
>>> Signed-off-by: Pravin B Shelar pshe...@nicira.com
>> ...
>>> +#define GRE_TAP_FB_NAME "gretap0"
>> ...
>>> +	/* fallback device is used for flow based tunneling. */
>>> +	if (!strcmp(dev->name, GRE_TAP_FB_NAME)) {
>>> +		struct ip_tunnel *t;
>>> +
>>> +		t = netdev_priv(dev);
>>> +		t->flow_based_tunnel = true;
>>> +		eth_hw_addr_random(dev);
>>> +		netif_keep_dst(dev);
>>> +	}
>>> +
>>
>> feature detection based on netdev name?
>> meaning that there will be only one such device for the whole host?
>> and namespaces cannot have their own gre tunnels?
>> (since host 'gretap0' cannot be seen in netns)
>
> gretap0 exists in every namespace. This device is created in GRE
> namespace init.

then all of them get to be in flow_based mode without being able
to change it?
[PATCH v2 net-next 0/2] RDS-TCP: Network namespace support
This patch series contains the set of changes to correctly set up the infra for PF_RDS sockets that use TCP as the transport in multiple network namespaces. Patch 1 in the series is the minimal set of changes to allow a single instance of RDS-TCP to run in any (i.e init_net or other) net namespace. The changes in this patch set ensure that the execution of 'modprobe [-r] rds_tcp' sets up the kernel TCP sockets relative to the current netns, so that RDS applications can send/recv packets from that netns, and the netns can later be deleted cleanly. Patch 2 of the series further allows multiple RDS-TCP instances, one per network namespace. The changes in this patch allows dynamic creation/tear-down of RDS-TCP client and server sockets across all current and future namespaces. v2 changes from RFC sent out earlier: David Ahern comments in patch 1, net_device notifier in patch 2, patch 3 broken off and submitted separately. Sowmini Varadhan (2): Make RDS-TCP work correctly when it is set up in a netns other than init_net Support multiple RDS-TCP listen endpoints, one per netns. net/rds/bind.c|3 +- net/rds/connection.c | 16 +++-- net/rds/ib.c |2 +- net/rds/ib_cm.c |5 +- net/rds/iw.c |2 +- net/rds/iw_cm.c |5 +- net/rds/rds.h | 23 ++- net/rds/send.c|3 +- net/rds/tcp.c | 167 +++- net/rds/tcp.h |7 ++- net/rds/tcp_connect.c |9 ++- net/rds/tcp_listen.c | 40 net/rds/transport.c |4 +- 13 files changed, 216 insertions(+), 70 deletions(-) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 net-next 2/2] RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.
Register pernet subsys init/stop functions that will set up and tear down per-net RDS-TCP listen endpoints. Unregister pernet subusys functions on 'modprobe -r' to clean up these end points. Enable keepalive on both accept and connect socket endpoints. The keepalive timer expiration will ensure that client socket endpoints will be removed as appropriate from the netns when an interface is removed from a namespace. Register a device notifier callback that will clean up all sockets (and thus avoid the need to wait for keepalive timeout) when the loopback device is unregistered from the netns indicating that the netns is getting deleted. Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com --- v2: net_device notifier for synchronous cleanup of sockets. net/rds/tcp.c | 163 - net/rds/tcp.h |7 ++- net/rds/tcp_connect.c |6 +- net/rds/tcp_listen.c | 38 +++- 4 files changed, 164 insertions(+), 50 deletions(-) diff --git a/net/rds/tcp.c b/net/rds/tcp.c index 98f5de3..339392b 100644 --- a/net/rds/tcp.c +++ b/net/rds/tcp.c @@ -35,6 +35,9 @@ #include linux/in.h #include linux/module.h #include net/tcp.h +#include net/net_namespace.h +#include net/netns/generic.h +#include net/tcp.h #include rds.h #include tcp.h @@ -250,16 +253,7 @@ static void rds_tcp_destroy_conns(void) } } -static void rds_tcp_exit(void) -{ - rds_info_deregister_func(RDS_INFO_TCP_SOCKETS, rds_tcp_tc_info); - rds_tcp_listen_stop(); - rds_tcp_destroy_conns(); - rds_trans_unregister(rds_tcp_transport); - rds_tcp_recv_exit(); - kmem_cache_destroy(rds_tcp_conn_slab); -} -module_exit(rds_tcp_exit); +static void rds_tcp_exit(void); struct rds_transport rds_tcp_transport = { .laddr_check= rds_tcp_laddr_check, @@ -281,6 +275,138 @@ struct rds_transport rds_tcp_transport = { .t_prefer_loopback = 1, }; +static int rds_tcp_netid; + +/* per-network namespace private data for this module */ +struct rds_tcp_net { + struct socket *rds_tcp_listen_sock; + struct work_struct rds_tcp_accept_w; +}; + +static void 
rds_tcp_accept_worker(struct work_struct *work) +{ + struct rds_tcp_net *rtn = container_of(work, + struct rds_tcp_net, + rds_tcp_accept_w); + + while (rds_tcp_accept_one(rtn-rds_tcp_listen_sock) == 0) + cond_resched(); +} + +void rds_tcp_accept_work(struct sock *sk) +{ + struct net *net = sock_net(sk); + struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid); + + queue_work(rds_wq, rtn-rds_tcp_accept_w); +} + +static __net_init int rds_tcp_init_net(struct net *net) +{ + struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid); + + rtn-rds_tcp_listen_sock = rds_tcp_listen_init(net); + if (!rtn-rds_tcp_listen_sock) { + pr_warn(could not set up listen sock\n); + return -EAFNOSUPPORT; + } + INIT_WORK(rtn-rds_tcp_accept_w, rds_tcp_accept_worker); + return 0; +} + +static void __net_exit rds_tcp_exit_net(struct net *net) +{ + struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid); + + /* If rds_tcp_exit_net() is called as a result of netns deletion, +* the rds_tcp_kill_sock() device notifier would already have cleaned +* up the listen socket, thus there is no work to do in this function. +* +* If rds_tcp_exit_net() is called as a result of module unload, +* i.e., due to rds_tcp_exit() - unregister_pernet_subsys(), then +* we do need to clean up the listen socket here. 
+*/ + if (rtn-rds_tcp_listen_sock) { + rds_tcp_listen_stop(rtn-rds_tcp_listen_sock); + rtn-rds_tcp_listen_sock = NULL; + flush_work(rtn-rds_tcp_accept_w); + } +} + +static struct pernet_operations rds_tcp_net_ops = { + .init = rds_tcp_init_net, + .exit = rds_tcp_exit_net, + .id = rds_tcp_netid, + .size = sizeof(struct rds_tcp_net), +}; + +static void rds_tcp_kill_sock(struct net *net) +{ + struct rds_tcp_connection *tc, *_tc; + struct sock *sk; + struct list_head tmp_list; + struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid); + + rds_tcp_listen_stop(rtn-rds_tcp_listen_sock); + rtn-rds_tcp_listen_sock = NULL; + flush_work(rtn-rds_tcp_accept_w); + INIT_LIST_HEAD(tmp_list); + spin_lock_irq(rds_tcp_conn_lock); + list_for_each_entry_safe(tc, _tc, rds_tcp_conn_list, t_tcp_node) { + struct net *c_net = read_pnet(tc-conn-c_net); + + if (net != c_net || !tc-t_sock) + continue; + list_del(tc-t_tcp_node); + list_add_tail(tc-t_tcp_node, tmp_list); + } + spin_unlock_irq(rds_tcp_conn_lock); +
[PATCH net-next 2/2] rocker: use netdev_err after register_netdev
From: Scott Feldman sfel...@gmail.com

After successful register_netdev, we can use netdev_err rather than the
more generic dev_err.

Signed-off-by: Scott Feldman sfel...@gmail.com
---
 drivers/net/ethernet/rocker/rocker.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 0ab3a3b..4e8cad0 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4985,7 +4985,7 @@ static int rocker_probe_port(struct rocker *rocker, unsigned int port_number)
 	err = rocker_port_ig_tbl(rocker_port, SWITCHDEV_TRANS_NONE, 0);
 	if (err) {
-		dev_err(&pdev->dev, "install ig port table failed\n");
+		netdev_err(rocker_port->dev, "install ig port table failed\n");
 		goto err_port_ig_tbl;
 	}

--
1.7.10.4
[PATCH net-next 1/2] rocker: NULL port if port probe fails
From: Scott Feldman sfel...@gmail.com

Set port to NULL if port probe fails so we don't try to remove partially
initialized port on port probe err cleanup path.

Signed-off-by: Scott Feldman sfel...@gmail.com
---
 drivers/net/ethernet/rocker/rocker.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 7b4c347..0ab3a3b 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -5005,6 +5005,7 @@ err_untagged_vlan:
 	rocker_port_ig_tbl(rocker_port, SWITCHDEV_TRANS_NONE,
 			   ROCKER_OP_FLAG_REMOVE);
 err_port_ig_tbl:
+	rocker->ports[port_number] = NULL;
 	unregister_netdev(dev);
 err_register_netdev:
 	free_netdev(dev);
--
1.7.10.4
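
The idea in this commit message can be sketched in plain userspace C, under the assumption that the later removal path walks the ports array and skips NULL slots (the rocker remove code is not shown in this patch). All names below are hypothetical stand-ins, not rocker driver symbols.

```c
#include <stdlib.h>

#define EXAMPLE_MAX_PORTS 4

/* Hypothetical stand-ins for the driver's port and switch structures. */
struct example_port {
	int id;
};

struct example_switch {
	struct example_port *ports[EXAMPLE_MAX_PORTS];
	int removed;	/* counts ports torn down by the remove path */
};

/* Probe one port. The pointer is published in the array early; if a
 * later step fails, the slot is cleared before the port is freed so
 * that the remove path never sees a stale pointer.
 */
static int example_probe_port(struct example_switch *sw, unsigned int n,
			      int fail_late)
{
	struct example_port *port = malloc(sizeof(*port));

	if (!port)
		return -1;
	port->id = (int)n;
	sw->ports[n] = port;

	if (fail_late) {
		sw->ports[n] = NULL;	/* the step this patch adds */
		free(port);
		return -1;
	}
	return 0;
}

/* Remove path: assumed to skip slots that are NULL. */
static void example_remove_ports(struct example_switch *sw)
{
	unsigned int i;

	for (i = 0; i < EXAMPLE_MAX_PORTS; i++) {
		if (!sw->ports[i])
			continue;
		free(sw->ports[i]);
		sw->ports[i] = NULL;
		sw->removed++;
	}
}
```

Without the NULL assignment in the error branch, the remove loop in this sketch would free the failed port a second time.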
Re: [PATCH] net_dbg_ratelimited: turn into no-op when !DEBUG
On Mon, 2015-08-03 at 20:57 -0700, Joe Perches wrote: On Tue, 2015-08-04 at 05:26 +0200, Jason A. Donenfeld wrote: This patch replaces calls to net_dbg_ratelimited when !DEBUG with no_printk, keeping with the idiom of all the other debug print helpers. Makes sense, thanks Jason. Perhaps better still would be to use if (0) no_printk so that the call and whatever argument calls the net_dbg_ratelimited makes are completely eliminated. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
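
One possible reading of the "if (0) no_printk" suggestion above, as a userspace sketch: a no-op helper that is a real function still evaluates its arguments, while wrapping the call in if (0) keeps the format checking but never evaluates them. The helper and macro names here are hypothetical, the kernel's own no_printk is not reproduced, and the ##__VA_ARGS__ form assumes GCC or Clang.

```c
/* Counts how often a debug argument actually gets evaluated. */
static int example_arg_evaluations;

static int example_counted_arg(void)
{
	example_arg_evaluations++;
	return 42;
}

/* Hypothetical stand-in for a no_printk()-style helper: it prints
 * nothing, but as a real function its arguments are evaluated at the
 * call site, and the format attribute keeps compile-time checking.
 */
__attribute__((format(printf, 1, 2)))
static inline int example_no_printk(const char *fmt, ...)
{
	(void)fmt;
	return 0;
}

/* Plain variant: arguments are still evaluated. */
#define example_dbg_plain(fmt, ...) \
	example_no_printk(fmt, ##__VA_ARGS__)

/* Guarded variant: the call sits in dead code, so the compiler checks
 * the format string but the arguments are never evaluated.
 */
#define example_dbg_guarded(fmt, ...)				\
	do {							\
		if (0)						\
			example_no_printk(fmt, ##__VA_ARGS__);	\
	} while (0)
```

Calling example_dbg_plain("%d\n", example_counted_arg()) bumps the counter; the guarded variant with the same arguments leaves it unchanged.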
[PATCH net-next] net_sched: act_bpf: remove spinlock in fast path
Similar to act_gact/act_mirred, act_bpf can be lockless in packet processing. Also similar to gact/mirred there is a race between prog-filter and prog-tcf_action. Meaning that the program being replaced may use previous default action if it happened to return TC_ACT_UNSPEC. act_mirred race betwen tcf_action and tcfm_dev is similar. In all cases the race is harmless. Long term we may want to improve the situation by replacing the whole struct tc_action as single pointer instead of updating inner fields one by one. Signed-off-by: Alexei Starovoitov a...@plumgrid.com --- net/sched/act_bpf.c | 15 +-- 1 file changed, 5 insertions(+), 10 deletions(-) diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c index 1b97dabc621a..2b8c47200152 100644 --- a/net/sched/act_bpf.c +++ b/net/sched/act_bpf.c @@ -43,10 +43,8 @@ static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act, if (unlikely(!skb_mac_header_was_set(skb))) return TC_ACT_UNSPEC; - spin_lock(prog-tcf_lock); - - prog-tcf_tm.lastuse = jiffies; - bstats_update(prog-tcf_bstats, skb); + tcf_lastuse_update(prog-tcf_tm); + bstats_cpu_update(this_cpu_ptr(prog-common.cpu_bstats), skb); /* Needed here for accessing maps. 
*/ rcu_read_lock(); @@ -77,7 +75,7 @@ static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act, break; case TC_ACT_SHOT: action = filter_res; - prog-tcf_qstats.drops++; + qstats_drop_inc(this_cpu_ptr(prog-common.cpu_qstats)); break; case TC_ACT_UNSPEC: action = prog-tcf_action; @@ -87,7 +85,6 @@ static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act, break; } - spin_unlock(prog-tcf_lock); return action; } @@ -294,7 +291,7 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla, if (!tcf_hash_check(parm-index, act, bind)) { ret = tcf_hash_create(parm-index, est, act, - sizeof(*prog), bind, false); + sizeof(*prog), bind, true); if (ret 0) return ret; @@ -325,7 +322,7 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla, goto out; prog = to_bpf(act); - spin_lock_bh(prog-tcf_lock); + ASSERT_RTNL(); if (ret != ACT_P_CREATED) tcf_bpf_prog_fill_cfg(prog, old); @@ -341,8 +338,6 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla, prog-tcf_action = parm-action; prog-filter = cfg.filter; - spin_unlock_bh(prog-tcf_lock); - if (res == ACT_P_CREATED) tcf_hash_insert(act); else -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 net-next 1/2] RDS-TCP: Make RDS-TCP work correctly when it is set up in a netns other than init_net
Open the sockets calling sock_create_kern() with the correct struct net pointer, and use that struct net pointer when verifying the address passed to rds_bind(). Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com --- v2: David Ahern comments. net/rds/bind.c|3 ++- net/rds/connection.c | 16 ++-- net/rds/ib.c |2 +- net/rds/ib_cm.c |5 +++-- net/rds/iw.c |2 +- net/rds/iw_cm.c |5 +++-- net/rds/rds.h | 23 +++ net/rds/send.c|3 ++- net/rds/tcp.c |4 ++-- net/rds/tcp_connect.c |3 ++- net/rds/tcp_listen.c | 16 net/rds/transport.c |4 ++-- 12 files changed, 59 insertions(+), 27 deletions(-) diff --git a/net/rds/bind.c b/net/rds/bind.c index 4ebd29c..dd666fb 100644 --- a/net/rds/bind.c +++ b/net/rds/bind.c @@ -185,7 +185,8 @@ int rds_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) ret = 0; goto out; } - trans = rds_trans_get_preferred(sin-sin_addr.s_addr); + trans = rds_trans_get_preferred(sock_net(sock-sk), + sin-sin_addr.s_addr); if (!trans) { ret = -EADDRNOTAVAIL; rds_remove_bound(rs); diff --git a/net/rds/connection.c b/net/rds/connection.c index da6da57..d4fecb2 100644 --- a/net/rds/connection.c +++ b/net/rds/connection.c @@ -117,7 +117,8 @@ static void rds_conn_reset(struct rds_connection *conn) * For now they are not garbage collected once they're created. They * are torn down as the module is removed, if ever. 
*/ -static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr, +static struct rds_connection *__rds_conn_create(struct net *net, + __be32 laddr, __be32 faddr, struct rds_transport *trans, gfp_t gfp, int is_outgoing) { @@ -157,6 +158,7 @@ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr, conn-c_faddr = faddr; spin_lock_init(conn-c_lock); conn-c_next_tx_seq = 1; + rds_conn_net_set(conn, net); init_waitqueue_head(conn-c_waitq); INIT_LIST_HEAD(conn-c_send_queue); @@ -174,7 +176,7 @@ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr, * can bind to the destination address then we'd rather the messages * flow through loopback rather than either transport. */ - loop_trans = rds_trans_get_preferred(faddr); + loop_trans = rds_trans_get_preferred(net, faddr); if (loop_trans) { rds_trans_put(loop_trans); conn-c_loopback = 1; @@ -260,17 +262,19 @@ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr, return conn; } -struct rds_connection *rds_conn_create(__be32 laddr, __be32 faddr, +struct rds_connection *rds_conn_create(struct net *net, + __be32 laddr, __be32 faddr, struct rds_transport *trans, gfp_t gfp) { - return __rds_conn_create(laddr, faddr, trans, gfp, 0); + return __rds_conn_create(net, laddr, faddr, trans, gfp, 0); } EXPORT_SYMBOL_GPL(rds_conn_create); -struct rds_connection *rds_conn_create_outgoing(__be32 laddr, __be32 faddr, +struct rds_connection *rds_conn_create_outgoing(struct net *net, + __be32 laddr, __be32 faddr, struct rds_transport *trans, gfp_t gfp) { - return __rds_conn_create(laddr, faddr, trans, gfp, 1); + return __rds_conn_create(net, laddr, faddr, trans, gfp, 1); } EXPORT_SYMBOL_GPL(rds_conn_create_outgoing); diff --git a/net/rds/ib.c b/net/rds/ib.c index ba2dffe..1381422 100644 --- a/net/rds/ib.c +++ b/net/rds/ib.c @@ -317,7 +317,7 @@ static void rds_ib_ic_info(struct socket *sock, unsigned int len, * allowed to influence which paths have priority. 
We could call userspace * asserting this policy routing. */ -static int rds_ib_laddr_check(__be32 addr) +static int rds_ib_laddr_check(struct net *net, __be32 addr) { int ret; struct rdma_cm_id *cm_id; diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c index 0da2a45..f40d8f5 100644 --- a/net/rds/ib_cm.c +++ b/net/rds/ib_cm.c @@ -448,8 +448,9 @@ int rds_ib_cm_handle_connect(struct rdma_cm_id *cm_id, (unsigned long long)be64_to_cpu(lguid), (unsigned long long)be64_to_cpu(fguid)); - conn = rds_conn_create(dp-dp_daddr, dp-dp_saddr, rds_ib_transport, - GFP_KERNEL); + /* RDS/IB is not currently netns aware, thus init_net */ + conn = rds_conn_create(init_net, dp-dp_daddr, dp-dp_saddr, + rds_ib_transport, GFP_KERNEL); if (IS_ERR(conn)) {
Re: [PATCH 1/1] net/ipv4: Enable flow-based ECMP
On Tue, 4 Aug 2015 13:28:47 +1200
Richard Laing richard.la...@alliedtelesis.co.nz wrote:

> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
> index 5fa643b..7db9f72 100644
> --- a/include/net/ip_fib.h
> +++ b/include/net/ip_fib.h
> @@ -117,6 +117,8 @@ struct fib_info {
>  #ifdef CONFIG_IP_ROUTE_MULTIPATH
>  	int fib_power;
>  #endif
> +	/* Cache the number of live nexthops for flow based ECMP calculation. */
> +	int live_nexthops;

unsigned or u16 ? rather than risking sign issues.
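
A small sketch of the kind of sign issue this comment may have in mind. It assumes the count is used as the divisor when mapping a flow hash onto a nexthop index; the quoted hunk does not show how live_nexthops is actually used, and both function names are hypothetical.

```c
#include <stdint.h>

/* Signed arithmetic: in C99 and later, % truncates toward zero, so a
 * negative hash yields a negative "index".
 */
static int example_pick_signed(int32_t hash, int live_nexthops)
{
	return hash % live_nexthops;
}

/* Unsigned arithmetic: the result is always in [0, live_nexthops). */
static unsigned int example_pick_unsigned(uint32_t hash,
					  unsigned int live_nexthops)
{
	return hash % live_nexthops;
}
```

With three live nexthops, example_pick_signed(-7, 3) gives -1, which is not a valid array index, while the unsigned variant stays within 0..2 for any hash value.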
Re: [PATCH net-next v2 1/2] openvswitch: Use regular GRE net_device instead of vport
On Mon, Aug 3, 2015 at 9:23 PM, Alexei Starovoitov
alexei.starovoi...@gmail.com wrote:
> On Mon, Aug 03, 2015 at 05:27:26PM -0700, Pravin B Shelar wrote:
>> With addition of flow based tunneling, there is no need to
>> have special GRE vport. Removes all of the OVS
>> specific GRE code and make OVS use a ip_gre net_device.
>> Minimal GRE vport is kept to handle compatibility with
>> current userspace application.
>>
>> Signed-off-by: Pravin B Shelar pshe...@nicira.com
> ...
>> +#define GRE_TAP_FB_NAME "gretap0"
> ...
>> +	/* fallback device is used for flow based tunneling. */
>> +	if (!strcmp(dev->name, GRE_TAP_FB_NAME)) {
>> +		struct ip_tunnel *t;
>> +
>> +		t = netdev_priv(dev);
>> +		t->flow_based_tunnel = true;
>> +		eth_hw_addr_random(dev);
>> +		netif_keep_dst(dev);
>> +	}
>> +
>
> feature detection based on netdev name?
> meaning that there will be only one such device for the whole host?
> and namespaces cannot have their own gre tunnels?
> (since host 'gretap0' cannot be seen in netns)

gretap0 exists in every namespace. This device is created in GRE
namespace init.
Re: [patch net-next 1/2] rocker: enable support for scattered packets
From: Jiri Pirko j...@resnulli.us Date: Sun, 2 Aug 2015 20:56:37 +0200 From: Ido Schimmel ido...@mellanox.com rocker supports the transmission of scattered packets, so let the kernel know about it by setting the NETIF_F_SG bit in the device's features. Signed-off-by: Ido Schimmel ido...@mellanox.com Signed-off-by: Jiri Pirko j...@resnulli.us Applied. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch net-next 2/2] rocker: linearize skb in case frags would not fit into tx descriptor
From: Jiri Pirko j...@resnulli.us Date: Sun, 2 Aug 2015 20:56:38 +0200 Suggested-by: Scott Feldman sfel...@gmail.com Signed-off-by: Jiri Pirko j...@resnulli.us Applied. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] virtio_net: add gro capability
From: Eric Dumazet eric.duma...@gmail.com Date: Fri, 31 Jul 2015 18:25:17 +0200 From: Eric Dumazet eduma...@google.com Straightforward patch to add GRO processing to virtio_net. napi_complete_done() usage allows more aggressive aggregation, opted-in by setting /sys/class/net/xxx/gro_flush_timeout Tested: Setting /sys/class/net/xxx/gro_flush_timeout to 1000 nsec, Rick Jones reported following results. One VM of each on a pair of OpenStack compute nodes with E5-2650Lv3 CPUs and Intel 82599ES-based NICs. So, two before and two after VMs. The OpenStack compute nodes were running OpenStack Kilo, with VxLAN encapsulation being used through OVS so no GRO coming-up the host stack. The compute nodes themselves were running a 3.14-based kernel. Single-stream netperf, CPU utilizations and thus service demands are based on intra-guest reported CPU. ... Signed-off-by: Eric Dumazet eduma...@google.com Tested-by: Rick Jones rick.jon...@hp.com Applied. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 0/4] Stacked vlan TSO for virtual devices
From: Toshiaki Makita makita.toshi...@lab.ntt.co.jp Date: Fri, 31 Jul 2015 15:03:23 +0900 Basically virtual devices do not need to segment double tagged packets. This patch set adds TSO feature for double tagged packets to several virtual devices, which can be realized by simply setting .ndo_features_check to passthru_features_check. Series applied, thank you. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] ebpf: add skb-hash to offset map for usage in {cls,act}_bpf or filters
On 8/2/15 6:09 PM, Tom Herbert wrote: I was thinking whether to add skb_get_hash(), but then concluded the raw skb-hash seems fine in this case: we can directly access the hash w/o extra eBPF helper function call, it's filled out by many NICs on ingress, and in case the entropy level would not be sufficient, people can still implement their own specific sw fallback hash mix anyway. Maybe we should add the skb_get_hash also? It doesn't as useful if some scenarios we get a valid hash and in others not. we also discussed whether it makes sense to expose l4_hash and sw_hash bits as well. imo, seems a bit of overkill, since such call into sw hash function like this exposes the logic of flow_dissector looking into inner header. There are pros and cons. I think if we expose flow_dissector it's cleaner to do it directly (instead of skb_get_hash). Alternatively we can obfuscate skb_get_hash by calling it 'please compute some a hash on a packet somehow', but that will be awkward to use. The programs can compute whatever hash they like anyway. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/2] net: macb: Add mdio driver for accessing multiple phy devices
Hi, On 07/31/2015 11:53 PM, Nathan Sullivan wrote: On Tue, Jul 28, 2015 at 03:34:51AM +, Punnaiah Choudary Kalluri wrote: Ok. I will send you updated patch for mdio support soon and we will finalize next Course of actions if it doesn't break the existing flow. Thanks, Punnaiah Just a heads up, when mdio no longer turns off when macb goes down, the micrel 9031 phy will have an issue with interrupts getting disabling during phy suspend. I have a patch to correct this issue here: https://patchwork.ozlabs.org/patch/502189/ Would you mind including this patch in your set? You should resend the patch again and you got one more argument why this patch should go it. But it should go in own direction out of this patch. Thanks, Michal -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] MIPS: Remove all the uses of custom gpio.h
On Thu, Jul 30, 2015 at 7:28 PM, Alban Bedel al...@free.fr wrote: Currently CONFIG_ARCH_HAVE_CUSTOM_GPIO_H is defined for all MIPS machines, and each machine type provides its own gpio.h. However only a handful really implement the GPIO API, most just forward everythings to gpiolib. The Alchemy machine is notable as it provides a system to allow implementing the GPIO API at the board level. But it is not used by any board currently supported, so it can also be removed. For most machine types we can just remove the custom gpio.h, as well as the custom wrappers if some exists. Some of the code found in the wrappers must be moved to the respective GPIO driver. A few more fixes are need in some drivers as they rely on linux/gpio.h to provides some machine specific definitions, or used asm/gpio.h instead of linux/gpio.h for the gpio API. Signed-off-by: Alban Bedel al...@free.fr --- This patch is based on my previous serie: MIPS: ath79: Move the GPIO driver to drivers/gpio. It supercede my previous patch named: MIPS: Remove most of the custom gpio.h Compared to the previous patch: * Fixed gpio_to_irq on jz4740 and rb532 * Cleaned up alchemy as well * Removed asm/gpio.h For testing I tried to build all mips defconfig, however my toolchain couldn't handle a few configs: ip28 malta_qemu_32r6 maltasmvp_eva sead3micro. If somebody can test these that would be more than welcome. Now a few stats about the state of CONFIG_ARCH_HAVE_CUSTOM_GPIO_H after appling this patch. Of the 31 supportd arch, 15 still have asm/gpio.h, of these 9 are just a #warning Include linux/gpio.h instead of asm/gpio.h. So we have 6 arch left: arm, avr32, blackfin, m68k, sh and unicore32. But only m68k and unicore32 really provides custom wrappers, all the others only forward to gpiolib. On the drivers side we only have 13 occurences of '#include asm/gpio.h' left, mostly in drivers used on ARM SoC. So the work left to phase out the legacy GPIO is really not that much anymore. 
Very good job being done here. Reviewed-by: Linus Walleij linus.wall...@linaro.org I guess this better go in through the MIPS tree. Given all the OpenWRT ports using MIPS this is excellent progress for a large hobbyist community. Yours, Linus Walleij -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] r8169: Permit users to change transmit and receive max pachet size
From: Corcodel Marian corcodel.mar...@gmail.com Date: Mon, 3 Aug 2015 10:28:38 +0300 Realtek nic its very versatile and have more registers for optimise and solve different issues. I added 2 parameters rx_buf_sz and txpacketmax 1.Parameter rx_buf_sz represent Receive Packet Maximum size and on this program is 16383 bytes, eg RTL 8101E use 16000 and user may use alls values up to maximum but value great from zero. If a received packet of packet length larger than the value set here, then it will set both RWT and RES bits in the corresponding Rx Status Descriptor. If the packet, which is larger than the RMS value, is received without CRC error, it is still a good packet, although both RWT and RES bits are set in the corresponding Rx Status Descriptor. 2. Parameter txpacketmax represent Max Transmit Packet Size value must be on range 1-63.Do not put zero on any situation.Every field from range 1-63 have 128 bytes. For regular LAN applications, i.e., the max packet size is either 1518 or 1522 (VLAN) bytes, this field must be larger than the max packet size. E.g., 0x0C. On mee working good with txpacketmax=60 and rx_buf_sz=1600 Signed-off-by: Corcodel Marian corcodel.mar...@gmail.com Sorry, such module parameters are completely inappropriate. Control the values, at run time, using a standard, generic facility such as ethtool. Please stop sending patches that add new module parameters, they are almost certainly guaranteed to be unacceptable. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH iproute2 net-next] iplink: bonding: add support for IFLA_BOND_TLB_DYNAMIC_LB
From: Nikolay Aleksandrov niko...@cumulusnetworks.com

Add support to be able to set and show the value of tlb_dynamic_lb (IFLA_BOND_TLB_DYNAMIC_LB).

Example:
$ ip -d link show dev bond0 type bond
7: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default
    link/ether ce:2f:e1:6e:d7:e0 brd ff:ff:ff:ff:ff:ff promiscuity 0
    bond mode balance-tlb miimon 100 updelay 0 downdelay 0 use_carrier 1 arp_interval 0 arp_validate none arp_all_targets any primary_reselect always fail_over_mac none xmit_hash_policy layer2 resend_igmp 1 num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1 packets_per_slave 1 lacp_rate slow ad_select stable tlb_dynamic_lb 1 addrgenmode eui64
$ ip -d l set dev bond0 type bond tlb_dynamic_lb 0
$ ip -d link show dev bond0 type bond
7: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default
    link/ether ce:2f:e1:6e:d7:e0 brd ff:ff:ff:ff:ff:ff promiscuity 0
    bond mode balance-tlb miimon 100 updelay 0 downdelay 0 use_carrier 1 arp_interval 0 arp_validate none arp_all_targets any primary_reselect always fail_over_mac none xmit_hash_policy layer2 resend_igmp 1 num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1 packets_per_slave 1 lacp_rate slow ad_select stable tlb_dynamic_lb 0 addrgenmode eui64

Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
---
 ip/iplink_bond.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/ip/iplink_bond.c b/ip/iplink_bond.c
index 2a9783e4f114..1b50de909356 100644
--- a/ip/iplink_bond.c
+++ b/ip/iplink_bond.c
@@ -133,6 +133,7 @@ static void print_explain(FILE *f)
 		"                [ min_links MIN_LINKS ]\n"
 		"                [ lp_interval LP_INTERVAL ]\n"
 		"                [ packets_per_slave PACKETS_PER_SLAVE ]\n"
+		"                [ tlb_dynamic_lb TLB_DYNAMIC_LB ]\n"
 		"                [ lacp_rate LACP_RATE ]\n"
 		"                [ ad_select AD_SELECT ]\n"
 		"                [ ad_user_port_key PORTKEY ]\n"
@@ -160,7 +161,7 @@ static int bond_parse_opt(struct link_util *lu, int argc, char **argv,
 {
 	__u8 mode, use_carrier, primary_reselect, fail_over_mac;
 	__u8 xmit_hash_policy, num_peer_notif, all_slaves_active;
-	__u8 lacp_rate, ad_select;
+	__u8 lacp_rate, ad_select, tlb_dynamic_lb;
 	__u16 ad_user_port_key, ad_actor_sys_prio;
 	__u32 miimon, updelay, downdelay, arp_interval, arp_validate;
 	__u32 arp_all_targets, resend_igmp, min_links, lp_interval;
@@ -374,6 +375,14 @@ static int bond_parse_opt(struct link_util *lu, int argc, char **argv,
 				return -1;
 			addattr_l(n, 1024, IFLA_BOND_AD_ACTOR_SYSTEM,
 				  abuf, len);
+		} else if (matches(*argv, "tlb_dynamic_lb") == 0) {
+			NEXT_ARG();
+			if (get_u8(&tlb_dynamic_lb, *argv, 0)) {
+				invarg("invalid tlb_dynamic_lb", *argv);
+				return -1;
+			}
+			addattr8(n, 1024, IFLA_BOND_TLB_DYNAMIC_LB,
+				 tlb_dynamic_lb);
 		} else if (matches(*argv, "help") == 0) {
 			explain();
 			return -1;
@@ -583,6 +592,11 @@ static void bond_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 				    RTA_PAYLOAD(tb[IFLA_BOND_AD_ACTOR_SYSTEM]),
 				    1 /*ARPHDR_ETHER*/, b1, sizeof(b1)));
 	}
+
+	if (tb[IFLA_BOND_TLB_DYNAMIC_LB]) {
+		fprintf(f, "tlb_dynamic_lb %u ",
+			rta_getattr_u8(tb[IFLA_BOND_TLB_DYNAMIC_LB]));
+	}
 }

 static void bond_print_help(struct link_util *lu, int argc, char **argv,
-- 
2.4.3
[PATCH v1] netdev: Make 100 percents packets sampled when sampling rate is 1.
When the sampling rate is 1, the sampling probability is UINT32_MAX. The packet should be sampled even if prandom_u32() generates UINT32_MAX, and no packet should be sampled when the probability is 0.

Signed-off-by: Wenyu Zhang wen...@vmware.com
---
 net/openvswitch/actions.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index cf04c2f..03acb09 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -669,9 +669,11 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
 	for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
 		 a = nla_next(a, &rem)) {
+		uint32_t probability;
 		switch (nla_type(a)) {
 		case OVS_SAMPLE_ATTR_PROBABILITY:
-			if (prandom_u32() >= nla_get_u32(a))
+			probability = nla_get_u32(a);
+			if (!probability || probability nla_get_u32(a))
 				return 0;
 			break;
-- 
1.7.9.5
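The behaviour the commit message asks for (always sample at probability UINT32_MAX, never sample at probability 0) can be written as a small pure function. This is a minimal userspace sketch, not the code from the patch: should_sample() is a hypothetical name, and the random value is passed in as a parameter here, whereas in the datapath it would come from prandom_u32().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether to sample a packet.
 *   probability - value of the sampling attribute, 0 .. UINT32_MAX
 *   rnd         - a uniformly distributed 32-bit random number
 *
 * probability == 0 never samples; probability == UINT32_MAX always samples,
 * because rnd <= UINT32_MAX holds for every possible rnd.
 */
static bool should_sample(uint32_t probability, uint32_t rnd)
{
	if (probability == 0)
		return false;

	return rnd <= probability;
}
```

Keeping the random draw outside the function makes the boundary cases easy to check by hand: should_sample(UINT32_MAX, UINT32_MAX) is true, and should_sample(0, 0) is false.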
Re: [-4.2] rtlwifi: rtl8723be: Add module parameter for MSI interrupts
> The driver code allows for the disabling of MSI interrupts; however the module_parm line was missed and the option fails to show with modinfo.
>
> Signed-off-by: Larry Finger larry.fin...@lwfinger.net
> Cc: Stable sta...@vger.kernel.org [3.15+]

Thanks, applied to wireless-drivers.git.

Kalle Valo
[PATCH] r8169: Permit users to change transmit and receive max packet size
The Realtek NIC is very versatile and has more registers that can be used to optimise behaviour and solve different issues. I added 2 parameters, rx_buf_sz and txpacketmax.

1. Parameter rx_buf_sz represents the Receive Packet Maximum Size (RMS). In this driver it is 16383 bytes; e.g. the RTL 8101E uses 16000. The user may use any value up to the maximum, but the value must be greater than zero. If a received packet is longer than the value set here, both the RWT and RES bits are set in the corresponding Rx Status Descriptor. If a packet larger than the RMS value is received without a CRC error, it is still a good packet, although both RWT and RES bits are set in the corresponding Rx Status Descriptor.

2. Parameter txpacketmax represents the Max Transmit Packet Size. The value must be in the range 1-63; never set it to zero. Each unit in the range 1-63 stands for 128 bytes. For regular LAN applications, i.e. where the max packet size is either 1518 or 1522 (VLAN) bytes, this field must be larger than the max packet size, e.g. 0x0C.

For me it works well with txpacketmax=60 and rx_buf_sz=1600.

Signed-off-by: Corcodel Marian corcodel.mar...@gmail.com
---
 drivers/net/ethernet/realtek/r8169.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 3df51fa..5a942c5 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -345,6 +345,7 @@ static const struct pci_device_id rtl8169_pci_tbl[] = {
 MODULE_DEVICE_TABLE(pci, rtl8169_pci_tbl);

 static int rx_buf_sz = 16383;
+static int txpacketmax = 63;
 static int use_dac;
 static struct {
 	u32 msg_enable;
@@ -406,7 +407,7 @@ enum rtl_registers {
 	MaxTxPacketSize	= 0xec,	/* 8101/8168. Unit of 128 bytes. */

-#define TxPacketMax	(8064 >> 7)
+//#define TxPacketMax	(8064 >> 7)
 #define EarlySize	0x27

 	FuncEvent	= 0xf0,
@@ -850,6 +851,10 @@ module_param(use_dac, int, 0);
 MODULE_PARM_DESC(use_dac, "Enable PCI DAC. Unsafe on 32 bit PCI slot.");
 module_param_named(debug, debug.msg_enable, int, 0);
 MODULE_PARM_DESC(debug, "Debug verbosity level (0=none, ..., 16=all)");
+module_param(rx_buf_sz, int, 0);
+MODULE_PARM_DESC(rx_buf_sz, "Receive Packet Maximum Size. ");
+module_param(txpacketmax, int, 0);
+MODULE_PARM_DESC(txpacketmax, "Max Transmit Packet Size. ");
 MODULE_LICENSE("GPL");
 MODULE_VERSION(RTL8169_VERSION);
 MODULE_FIRMWARE(FIRMWARE_8168D_1);
@@ -5593,7 +5598,7 @@ static void rtl_hw_start_8168bef(struct rtl8169_private *tp)
 	rtl_hw_start_8168bb(tp);

-	RTL_W8(MaxTxPacketSize, TxPacketMax);
+	RTL_W8(MaxTxPacketSize, txpacketmax);

 	RTL_W8(Config4, RTL_R8(Config4) & ~(1 << 0));
 }
@@ -5659,7 +5664,7 @@ static void rtl_hw_start_8168cp_3(struct rtl8169_private *tp)
 	/* Magic. */
 	RTL_W8(DBG_REG, 0x20);

-	RTL_W8(MaxTxPacketSize, TxPacketMax);
+	RTL_W8(MaxTxPacketSize, txpacketmax);

 	if (tp->dev->mtu <= ETH_DATA_LEN)
 		rtl_tx_performance_tweak(pdev, 0x5 << MAX_READ_REQUEST_SHIFT);
@@ -5720,7 +5725,7 @@ static void rtl_hw_start_8168d(struct rtl8169_private *tp)
 	rtl_disable_clock_request(pdev);

-	RTL_W8(MaxTxPacketSize, TxPacketMax);
+	RTL_W8(MaxTxPacketSize, txpacketmax);

 	if (tp->dev->mtu <= ETH_DATA_LEN)
 		rtl_tx_performance_tweak(pdev, 0x5 << MAX_READ_REQUEST_SHIFT);
@@ -5738,7 +5743,7 @@ static void rtl_hw_start_8168dp(struct rtl8169_private *tp)
 	if (tp->dev->mtu <= ETH_DATA_LEN)
 		rtl_tx_performance_tweak(pdev, 0x5 << MAX_READ_REQUEST_SHIFT);

-	RTL_W8(MaxTxPacketSize, TxPacketMax);
+	RTL_W8(MaxTxPacketSize, txpacketmax);

 	rtl_disable_clock_request(pdev);
 }
@@ -5758,7 +5763,7 @@ static void rtl_hw_start_8168d_4(struct rtl8169_private *tp)
 	rtl_tx_performance_tweak(pdev, 0x5 << MAX_READ_REQUEST_SHIFT);

-	RTL_W8(MaxTxPacketSize, TxPacketMax);
+	RTL_W8(MaxTxPacketSize, txpacketmax);

 	for (i = 0; i < ARRAY_SIZE(e_info_8168d_4); i++) {
 		const struct ephy_info *e = e_info_8168d_4 + i;
@@ -5798,7 +5803,7 @@ static void rtl_hw_start_8168e_1(struct rtl8169_private *tp)
 	if (tp->dev->mtu <= ETH_DATA_LEN)
 		rtl_tx_performance_tweak(pdev, 0x5 << MAX_READ_REQUEST_SHIFT);

-	RTL_W8(MaxTxPacketSize, TxPacketMax);
+	RTL_W8(MaxTxPacketSize, txpacketmax);

 	rtl_disable_clock_request(pdev);

@@ -6227,7 +6232,7 @@ static void rtl_hw_start_8168(struct net_device *dev)
 	RTL_W8(Cfg9346, Cfg9346_Unlock);

-	RTL_W8(MaxTxPacketSize, TxPacketMax);
+	RTL_W8(MaxTxPacketSize, txpacketmax);

 	rtl_set_rx_max_size(ioaddr, rx_buf_sz);

@@ -6521,7 +6526,7 @@ static void rtl_hw_start_8101(struct net_device *dev)
 	RTL_W8(Cfg9346, Cfg9346_Unlock);

-	RTL_W8(MaxTxPacketSize, TxPacketMax);
+	RTL_W8(MaxTxPacketSize, txpacketmax);
Re: [PATCH] MIPS: Remove all the uses of custom gpio.h
On Mon, Aug 03, 2015 at 09:13:27AM +0200, Linus Walleij wrote:

> Very good job being done here.
>
> Reviewed-by: Linus Walleij linus.wall...@linaro.org
>
> I guess this better go in through the MIPS tree. Given all the OpenWRT ports using MIPS this is excellent progress for a large hobbyist community.

Alban has posted a v2 [1] already but I take it that your Reviewed-by: applies to the v2 patch as well?

  Ralf

[1] https://patchwork.linux-mips.org/patch/10828/
Re: [PATCH 3/6] test_bpf: test LD_ABS and LD_IND instructions on fragmented skbs.
On 08/03/2015 04:02 PM, Nicolas Schichan wrote:
> These new tests exercise various load sizes and offsets crossing the head/fragment boundary.
>
> Signed-off-by: Nicolas Schichan nschic...@freebox.fr
> Acked-by: Alexei Starovoitov a...@plumgrid.com

Acked-by: Daniel Borkmann dan...@iogearbox.net
Re: [PATCH] net: dsa: mv88e6xxx: call _mv88e6xxx_stats_wait with SMI lock held
On 08/03/2015 06:17 AM, Vivien Didelot wrote:
> At switch setup, _mv88e6xxx_stats_wait was called without holding the SMI mutex. Fix this by requesting the lock for this call. Also, return the _mv88e6xxx_stats_wait code, since it may fail.
>
> Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com

Not strictly needed because the mutex is initialized in the same call sequence, but it doesn't hurt and is technically ok (and may prevent others from submitting the same patch again ;-)

Reviewed-by: Guenter Roeck li...@roeck-us.net

> ---
>  drivers/net/dsa/mv88e6xxx.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
> index 5e1dce1..8966cf2 100644
> --- a/drivers/net/dsa/mv88e6xxx.c
> +++ b/drivers/net/dsa/mv88e6xxx.c
> @@ -1938,6 +1938,7 @@ int mv88e6xxx_setup_common(struct dsa_switch *ds)
>  int mv88e6xxx_setup_global(struct dsa_switch *ds)
>  {
>  	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
> +	int ret;
>  	int i;
>
>  	/* Set the default address aging time to 5 minutes, and
> @@ -2036,9 +2037,11 @@ int mv88e6xxx_setup_global(struct dsa_switch *ds)
>  	REG_WRITE(REG_GLOBAL, GLOBAL_STATS_OP, GLOBAL_STATS_OP_FLUSH_ALL);
>
>  	/* Wait for the flush to complete. */
> -	_mv88e6xxx_stats_wait(ds);
> +	mutex_lock(&ps->smi_mutex);
> +	ret = _mv88e6xxx_stats_wait(ds);
> +	mutex_unlock(&ps->smi_mutex);
>
> -	return 0;
> +	return ret;
>  }
>
>  int mv88e6xxx_switch_reset(struct dsa_switch *ds, bool ppu_active)
Re: [PATCH 5/6] test_bpf: add more tests for LD_ABS and LD_IND.
On 08/03/2015 04:02 PM, Nicolas Schichan wrote:
> This exercises the LD_ABS and LD_IND instructions for various sizes and alignments. This also checks that X, when used as an offset to a BPF_IND instruction first in a filter, is correctly set to 0.
>
> Signed-off-by: Nicolas Schichan nschic...@freebox.fr
> Acked-by: Alexei Starovoitov a...@plumgrid.com

Acked-by: Daniel Borkmann dan...@iogearbox.net
Re: [PATCH 6/6] test_bpf: add tests checking that JIT/interpreter sets A and X to 0.
On 08/03/2015 04:02 PM, Nicolas Schichan wrote:
> It is mandatory for the JIT or interpreter to reset the A and X registers to 0 before running the filter. Check that it is the case on various ALU and JMP instructions.
>
> Signed-off-by: Nicolas Schichan nschic...@freebox.fr
> Acked-by: Alexei Starovoitov a...@plumgrid.com

Acked-by: Daniel Borkmann dan...@iogearbox.net
Re: [PATCHv6 RFT] net: fec: Ensure clocks are enabled while using mdio bus
Hello,

On Mon, Aug 03, 2015 at 03:50:23PM +0200, Andrew Lunn wrote:
> > The problem is that on i.MX27 there are two clocks involved that both must be on to send a packet, while on i.MX6 it's only a single one (abstracted by having ipg-clock = ahb-clock). With the suggested patch only a single one is asserted to be on. This is enough for i.MX6 but it's not for i.MX27 (and from looking at the device trees also i.MX25, i.MX28, and i.MX35 are affected).
>
> I don't think it is as simple as this. If you are sending a packet, fec_enet_open() must have been called. This does a pm_runtime_get_sync() to ensure the ipg-clock is ticking, and fec_enet_clk_enable() to enable all other clocks.
>
> Can you debug this further to find out which clock is off, and where it gets turned off?

I added a call to pm_runtime_get_sync to fec_enet_start_xmit and put it back at its end. This way I was able to boot with NFS-root, which resulted in an oops before. But this is wrong because if I set FEC_MDIO_PM_TIMEOUT to 0 I get an oops anyhow.

I added the following patch:

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 32e3807c650e..508c4af4fde8 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -64,6 +64,30 @@

 #include "fec.h"

+#define fec_pm_runtime_get_sync(_dev) ({ \
+	dev_info((_dev), "%s: get_sync (%d)", __func__, (_dev)->power.usage_count.counter); \
+	pm_runtime_get_sync((_dev)); \
+})
+#define pm_runtime_get_sync(_dev) fec_pm_runtime_get_sync(_dev)
+
+#define fec_pm_runtime_put_autosuspend(_dev) ({ \
+	dev_info((_dev), "%s: put_autosuspend (%d)", __func__, (_dev)->power.usage_count.counter); \
+	pm_runtime_put_autosuspend((_dev)); \
+})
+#define pm_runtime_put_autosuspend(_dev) fec_pm_runtime_put_autosuspend(_dev)
+
+#define fec_clk_prepare_enable(_clk) ({ \
+	pr_info("%s: enable " #_clk "\n", __func__); \
+	clk_prepare_enable((_clk)); \
+})
+#define clk_prepare_enable(_clk) fec_clk_prepare_enable(_clk)
+
+#define fec_clk_disable_unprepare(_clk) ({ \
+	pr_info("%s: disable " #_clk "\n", __func__); \
+	clk_disable_unprepare((_clk)); \
+})
+#define clk_disable_unprepare(_clk) fec_clk_disable_unprepare(_clk)
+
 static void set_multicast_list(struct net_device *ndev);
 static void fec_enet_itr_coal_init(struct net_device *ndev);

@@ -784,6 +808,7 @@ fec_enet_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 	struct netdev_queue *nq;
 	int ret;

+	pr_info("%s\n", __func__);
 	queue = skb_get_queue_mapping(skb);
 	txq = fep->tx_queue[queue];
 	nq = netdev_get_tx_queue(ndev, queue);
@@ -2864,6 +2889,7 @@ fec_enet_open(struct net_device *ndev)
 	struct fec_enet_private *fep = netdev_priv(ndev);
 	int ret;

+	pr_info("%s\n", __func__);
 	ret = pm_runtime_get_sync(&fep->pdev->dev);
 	if (IS_ERR_VALUE(ret))
 		return ret;
@@ -2932,6 +2958,8 @@ fec_enet_close(struct net_device *ndev)

 	fec_enet_free_buffers(ndev);

+	pr_info("%s\n", __func__);
+
 	return 0;
 }

@@ -3416,6 +3444,7 @@ fec_probe(struct platform_device *pdev)
 		goto failed_clk;

 	ret = clk_prepare_enable(fep->clk_ipg);
+	clk_prepare_enable(fep->clk_ipg);
 	if (ret)
 		goto failed_clk_ipg;

Which produced the following output (piped through nl for better reference):

     1	fec_enet_open
     2	fec 1002b000.ethernet: fec_enet_open: get_sync (-1)
     3	fec_runtime_resume: enable fep->clk_ipg
     4	fec_enet_clk_enable: enable fep->clk_ahb
     5	fec 1002b000.ethernet: fec_enet_mdio_write: get_sync (0)
     6	fec 1002b000.ethernet: fec_enet_mdio_write: put_autosuspend (1)
     7	mmc0: new SD card at address 0007
     8	mmcblk0: mmc0:0007 SD01G 972 MiB (ro)
     9	mmcblk0: p1
    10	fec 1002b000.ethernet: fec_enet_mdio_read: get_sync (0)
    11	fec 1002b000.ethernet: fec_enet_mdio_read: put_autosuspend (1)
    12	fec 1002b000.ethernet: fec_enet_mdio_read: get_sync (0)
    13	fec 1002b000.ethernet: fec_enet_mdio_read: put_autosuspend (1)
    14	fec 1002b000.ethernet: fec_enet_mdio_read: get_sync (0)
    15	fec 1002b000.ethernet: fec_enet_mdio_read: put_autosuspend (1)
    16	fec 1002b000.ethernet: fec_enet_mdio_write: get_sync (0)
    17	fec 1002b000.ethernet: fec_enet_mdio_write: put_autosuspend (1)
    18	fec 1002b000.ethernet eth0: Freescale FEC PHY driver [Generic PHY] (mii_bus:phy_addr=1002b000.etherne:00, irq=-1)
    19	fec_runtime_suspend: disable fep->clk_ipg
    20	fec 1002b000.ethernet: fec_enet_mdio_read: get_sync (0)
    21	fec_runtime_resume: enable
Re: [PATCH 4/6] test_bpf: add module parameters to filter the tests to run.
On 08/03/2015 06:23 PM, Nicolas Schichan wrote:
...
> > Btw, for the range test in prepare_bpf_tests(), you could also reject a negative lower bound index right there.
>
> I thought it was better to have all the sanity checks grouped in prepare_bpf_tests() (with the checking of the test_name and test_id parameters nearby)? Also, a negative lower bound means that no range has been set, so all tests should be run.

I just got a bit confused when loading with test_range=-100,1 was not rejected, but they do indeed all run in this case.

Thanks,
Daniel
Re: [PATCH] openvswitch: Fix L4 checksum handling when dealing with IP fragments
On Mon, Aug 03, 2015 at 09:25:53AM -0700, Pravin Shelar wrote:
> On Sat, Aug 1, 2015 at 6:31 PM, Glenn Griffin ggriffin.ker...@gmail.com wrote:
> > openvswitch modifies the L4 checksum of a packet when modifying the ip address. When an IP packet is fragmented only the first fragment contains an L4 header and checksum. Prior to this change openvswitch would modify all fragments, modifying application data in non-first fragments, causing checksum failures in the reassembled packet.
> >
> > Signed-off-by: Glenn Griffin ggriffin.ker...@gmail.com
>
> Patch looks good. I have one following comment.
>
> > ---
> >  net/openvswitch/actions.c | 16 
> >  1 file changed, 12 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> > index 8a8c0b8..bfffb1a 100644
> > --- a/net/openvswitch/actions.c
> > +++ b/net/openvswitch/actions.c
> > @@ -273,28 +273,36 @@ static int set_eth_addr(struct sk_buff *skb, struct sw_flow_key *flow_key,
> >  	return 0;
> >  }
> >
> > -static void set_ip_addr(struct sk_buff *skb, struct iphdr *nh,
> > -			__be32 *addr, __be32 new_addr)
> > +static void update_ip_l4_checksum(struct sk_buff *skb, struct iphdr *nh,
> > +				  __be32 addr, __be32 new_addr)
> >  {
> >  	int transport_len = skb->len - skb_transport_offset(skb);
> >
> > +	if (ntohs(nh->frag_off) & IP_OFFSET)
> > +		return;
>
> It is efficient to check frag-offset in network byte order.

I'll send a revised patch.
[PATCH v2] openvswitch: Fix L4 checksum handling when dealing with IP fragments
openvswitch modifies the L4 checksum of a packet when modifying the ip address. When an IP packet is fragmented only the first fragment contains an L4 header and checksum. Prior to this change openvswitch would modify all fragments, modifying application data in non-first fragments, causing checksum failures in the reassembled packet.

Signed-off-by: Glenn Griffin ggriffin.ker...@gmail.com
---
Changes in v2:
- Compare frag_off in network byte order rather than host byte order

 net/openvswitch/actions.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 8a8c0b8..ee34f47 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -273,28 +273,36 @@ static int set_eth_addr(struct sk_buff *skb, struct sw_flow_key *flow_key,
 	return 0;
 }

-static void set_ip_addr(struct sk_buff *skb, struct iphdr *nh,
-			__be32 *addr, __be32 new_addr)
+static void update_ip_l4_checksum(struct sk_buff *skb, struct iphdr *nh,
+				  __be32 addr, __be32 new_addr)
 {
 	int transport_len = skb->len - skb_transport_offset(skb);

+	if (nh->frag_off & htons(IP_OFFSET))
+		return;
+
 	if (nh->protocol == IPPROTO_TCP) {
 		if (likely(transport_len >= sizeof(struct tcphdr)))
 			inet_proto_csum_replace4(&tcp_hdr(skb)->check, skb,
-						 *addr, new_addr, 1);
+						 addr, new_addr, 1);
 	} else if (nh->protocol == IPPROTO_UDP) {
 		if (likely(transport_len >= sizeof(struct udphdr))) {
 			struct udphdr *uh = udp_hdr(skb);

 			if (uh->check || skb->ip_summed == CHECKSUM_PARTIAL) {
 				inet_proto_csum_replace4(&uh->check, skb,
-							 *addr, new_addr, 1);
+							 addr, new_addr, 1);
 				if (!uh->check)
 					uh->check = CSUM_MANGLED_0;
 			}
 		}
 	}
+}

+static void set_ip_addr(struct sk_buff *skb, struct iphdr *nh,
+			__be32 *addr, __be32 new_addr)
+{
+	update_ip_l4_checksum(skb, nh, *addr, new_addr);
 	csum_replace4(&nh->check, *addr, new_addr);
 	skb_clear_hash(skb);
 	*addr = new_addr;
-- 
2.1.4
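The difference between the v1 and v2 fragment checks can be tried out in userspace. The sketch below is illustrative only: the two function names are hypothetical, IP_OFFSET is redefined locally with the 0x1FFF fragment-offset mask used by the kernel, and uint16_t stands in for __be16. Both forms give the same answer; the v2 form converts a constant (which the compiler can fold) instead of byte-swapping the header field for every packet.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

#define IP_OFFSET 0x1FFF	/* fragment offset mask; local copy for this sketch */

/* v1 style: convert the header field to host order, then mask. */
static bool non_first_fragment_host_order(uint16_t frag_off_be)
{
	return (ntohs(frag_off_be) & IP_OFFSET) != 0;
}

/* v2 style: convert the constant mask to network order, then mask the raw field. */
static bool non_first_fragment_net_order(uint16_t frag_off_be)
{
	return (frag_off_be & htons(IP_OFFSET)) != 0;
}
```

A first fragment has the "more fragments" flag (0x2000) set but a zero offset, so both checks return false for it and the L4 checksum update proceeds; any later fragment has a non-zero offset and is skipped.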
Re: [PATCH] net: fec: fix initial runtime PM refcount
On Mon, 3 Aug 2015, Uwe Kleine-König wrote:

> Hello,
>
> I have no clue about runtime-pm, but I added a few people to Cc: who should know better ...
>
> Best regards
> Uwe
>
> On Mon, Aug 03, 2015 at 06:15:54PM +0200, Andrew Lunn wrote:
> > On Mon, Aug 03, 2015 at 05:50:11PM +0200, Lucas Stach wrote:
> > > The clocks are initially active and thus the device is marked active. This still keeps the PM refcount at 0, the pm_runtime_put_autosuspend() call at the end of probe then leaves us with an invalid refcount of -1, which in turn leads to the device staying in suspended state even though netdev open had been called.
> > >
> > > Fix this by initializing the refcount to be coherent with the initial device status.
> > >
> > > Fixes: 8fff755e9f8 (net: fec: Ensure clocks are enabled while using mdio bus)
> > > Signed-off-by: Lucas Stach l.st...@pengutronix.de
> > > ---
> > > Please apply this as a fix for 4.2
> > > ---
> > >  drivers/net/ethernet/freescale/fec_main.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> > > index 32e3807c650e..271bb5862346 100644
> > > --- a/drivers/net/ethernet/freescale/fec_main.c
> > > +++ b/drivers/net/ethernet/freescale/fec_main.c
> > > @@ -3433,6 +3433,7 @@ fec_probe(struct platform_device *pdev)
> > >
> > >  	pm_runtime_set_autosuspend_delay(&pdev->dev, FEC_MDIO_PM_TIMEOUT);
> > >  	pm_runtime_use_autosuspend(&pdev->dev);
> > > +	pm_runtime_get_noresume(&pdev->dev);
> > >  	pm_runtime_set_active(&pdev->dev);
> > >  	pm_runtime_enable(&pdev->dev);
> >
> > This might work, but is it the correct fix?

It looks reasonable to me. It might also make sense to move all of that pm_runtime_* stuff to the end of the probe routine. Notice that they don't get undone if register_netdev() fails.

> > Documentation/power/runtime_pm.txt says:
> >
> >   In addition to that, the initial runtime PM status of all devices is
> >   'suspended', but it need not reflect the actual physical state of the device.
> >   Thus, if the device is initially active (i.e. it is able to process I/O), its
> >   runtime PM status must be changed to 'active', with the help of
> >   pm_runtime_set_active(), before pm_runtime_enable() is called for the device.
> >
> > At the point we call the pm_runtime_ functions above, all the clocks are ticking. So according to the documentation pm_runtime_set_active() is the right thing to do. But it makes no mention of having to call pm_runtime_get_noresume(). I would have expected pm_runtime_set_active() to set the count to the correct value.

pm_runtime_set_active() doesn't change the usage count. All it does is set the runtime PM status to active.

A call to pm_runtime_get_noresume() (or something similar) is necessary to balance the call to pm_runtime_put_autosuspend() at the end of the probe routine. Both the _get_ and the _put_ should be present or neither should be.

For instance, an alternative way to accomplish the same result is to replace pm_runtime_put_autosuspend() with pm_runtime_autosuspend(). The only difference is that the usage counter would not be elevated during the register_netdev() call, so in theory the device could be suspended while that routine is running. But if all the pm_runtime_* calls were moved to the end of the probe function, even that couldn't happen.

Alan Stern
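The get/put balance described above can be illustrated with a toy model. This is not kernel code: struct toy_dev and the toy_* functions are hypothetical stand-ins that only mimic the bookkeeping the thread describes (set_active touches the status but not the usage count, get_noresume increments the count, the put at the end of probe decrements it).

```c
#include <stdbool.h>

/* Toy stand-in for the runtime PM state discussed in the thread. */
struct toy_dev {
	int usage_count;	/* models the runtime PM usage counter */
	bool active;		/* models the runtime PM status */
};

/* Models pm_runtime_set_active(): changes the status only. */
static void toy_set_active(struct toy_dev *d)
{
	d->active = true;
}

/* Models pm_runtime_get_noresume(): takes a reference without resuming. */
static void toy_get_noresume(struct toy_dev *d)
{
	d->usage_count++;
}

/* Models the reference drop done by pm_runtime_put_autosuspend(). */
static void toy_put(struct toy_dev *d)
{
	d->usage_count--;
}

/* Probe sequence before the fix: the put has no matching get. */
static int toy_probe_unbalanced(void)
{
	struct toy_dev d = { 0, false };

	toy_set_active(&d);
	toy_put(&d);
	return d.usage_count;
}

/* Probe sequence with the fix: get_noresume balances the final put. */
static int toy_probe_balanced(void)
{
	struct toy_dev d = { 0, false };

	toy_get_noresume(&d);
	toy_set_active(&d);
	toy_put(&d);
	return d.usage_count;
}
```

In this model toy_probe_unbalanced() ends with a count of -1, the invalid refcount named in the commit message, while toy_probe_balanced() ends at 0.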
Re: [patch net-next 2/2] rocker: linearize skb in case frags would not fit into tx descriptor
On Sun, Aug 2, 2015 at 11:56 AM, Jiri Pirko j...@resnulli.us wrote:
> Suggested-by: Scott Feldman sfel...@gmail.com
> Signed-off-by: Jiri Pirko j...@resnulli.us

Acked-by: Scott Feldman sfel...@gmail.com
Re: [patch net-next 1/2] rocker: enable support for scattered packets
On Sun, Aug 2, 2015 at 11:56 AM, Jiri Pirko j...@resnulli.us wrote:
> From: Ido Schimmel ido...@mellanox.com
>
> rocker supports the transmission of scattered packets, so let the kernel know about it by setting the NETIF_F_SG bit in the device's features.
>
> Signed-off-by: Ido Schimmel ido...@mellanox.com
> Signed-off-by: Jiri Pirko j...@resnulli.us

Acked-by: Scott Feldman sfel...@gmail.com
Re: [ovs-dev] [PATCH v1] netdev: Make 100 percents packets sampled when sampling rate is 1.
On Mon, Aug 3, 2015 at 11:18 AM, Pravin Shelar pshe...@nicira.com wrote:
> On Mon, Aug 3, 2015 at 12:11 AM, Wenyu Zhang wen...@vmware.com wrote:
> > When sampling rate is 1, the sampling probability is UINT32_MAX. The packet should be sampled even the prandom32() generate the number of UINT32_MAX. And none packet need be sampled when the probability is 0.
> >
> > Signed-off-by: Wenyu Zhang wen...@vmware.com
> > ---
> >  net/openvswitch/actions.c |    4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> > index cf04c2f..03acb09 100644
> > --- a/net/openvswitch/actions.c
> > +++ b/net/openvswitch/actions.c
> > @@ -669,9 +669,11 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
> >  	for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
> >  		 a = nla_next(a, &rem)) {
> > +		uint32_t probability;
> >  		switch (nla_type(a)) {
> >  		case OVS_SAMPLE_ATTR_PROBABILITY:
> > -			if (prandom_u32() >= nla_get_u32(a))
> > +			probability = nla_get_u32(a);
> > +			if (!probability || probability nla_get_u32(a))
>
> This condition does not looks right to calculate sampling probability.

When you send v2, can you also make the subject more narrow (openvswitch instead of netdev) and add the tree that you are targeting ([PATCH net] in this case)?
Re: [patch -master] netfilter: xt_CT: checking for IS_ERR() instead of NULL
On 30 July 2015 at 04:57, Pablo Neira Ayuso pa...@netfilter.org wrote:
> On Tue, Jul 28, 2015 at 01:42:28AM +0300, Dan Carpenter wrote:
> > We recently changed this from nf_conntrack_alloc() to nf_ct_tmpl_alloc() so the error handling needs to be changed to check for NULL instead of IS_ERR().
> >
> > Fixes: 0838aa7fcfcd ('netfilter: fix netns dependencies with conntrack templates')
> > Signed-off-by: Dan Carpenter dan.carpen...@oracle.com
>
> Applied, thanks.
>
> I have also appended this chunk, since synproxy is also affected:
>
> --- a/net/netfilter/nf_synproxy_core.c
> +++ b/net/netfilter/nf_synproxy_core.c
> @@ -353,7 +353,7 @@ static int __net_init synproxy_net_init(struct net *net)
>  	int err = -ENOMEM;
>
>  	ct = nf_ct_tmpl_alloc(net, 0, GFP_KERNEL);
> -	if (IS_ERR(ct)) {
> +	if (!ct) {
>  		err = PTR_ERR(ct);
>  		goto err1;
>  	}

Does PTR_ERR() implicitly interpret NULL as -ENOMEM? Seems like the fix applied here is a little different from the xt_CT fix.
Re: [patch -master] netfilter: xt_CT: checking for IS_ERR() instead of NULL
On 3 August 2015 at 11:29, Joe Stringer joestrin...@nicira.com wrote:
> On 30 July 2015 at 04:57, Pablo Neira Ayuso pa...@netfilter.org wrote:
> > On Tue, Jul 28, 2015 at 01:42:28AM +0300, Dan Carpenter wrote:
> > > We recently changed this from nf_conntrack_alloc() to nf_ct_tmpl_alloc() so the error handling needs to be changed to check for NULL instead of IS_ERR().
> > >
> > > Fixes: 0838aa7fcfcd ('netfilter: fix netns dependencies with conntrack templates')
> > > Signed-off-by: Dan Carpenter dan.carpen...@oracle.com
> >
> > Applied, thanks.
> >
> > I have also appended this chunk, since synproxy is also affected:
> >
> > --- a/net/netfilter/nf_synproxy_core.c
> > +++ b/net/netfilter/nf_synproxy_core.c
> > @@ -353,7 +353,7 @@ static int __net_init synproxy_net_init(struct net *net)
> >  	int err = -ENOMEM;
> >
> >  	ct = nf_ct_tmpl_alloc(net, 0, GFP_KERNEL);
> > -	if (IS_ERR(ct)) {
> > +	if (!ct) {
> >  		err = PTR_ERR(ct);
> >  		goto err1;
> >  	}
>
> Does PTR_ERR() implicitly interpret NULL as -ENOMEM? Seems like the fix applied here is a little different from the xt_CT fix.

Just saw the initialization of err now, but this would be overridden within the error checking statement.
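The concern raised above can be reproduced with a userspace imitation of the error-pointer helpers. Everything below is an assumption-laden sketch: toy_is_err() and toy_ptr_err() mimic the kernel's IS_ERR()/PTR_ERR() convention (the top 4095 pointer values encode errno codes), toy_tmpl_alloc_fail() is a hypothetical allocator that always fails with NULL, and toy_init_keep() shows just one possible way to preserve the error code, not necessarily the fix that was applied upstream.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Imitates IS_ERR(): true only for pointers in the top TOY_MAX_ERRNO values. */
static bool toy_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Imitates PTR_ERR(): reinterprets the pointer value as an error code. */
static long toy_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Hypothetical allocator that signals failure with NULL, not an error pointer. */
static void *toy_tmpl_alloc_fail(void)
{
	return NULL;
}

/* Shape of the chunk quoted above: err is overwritten with PTR_ERR(NULL). */
static int toy_init_overwrite(void)
{
	int err = -ENOMEM;
	void *ct = toy_tmpl_alloc_fail();

	if (!ct) {
		err = (int)toy_ptr_err(ct);	/* NULL converts to 0, i.e. "success" */
		return err;
	}
	return 0;
}

/* One possible alternative: keep the -ENOMEM that err was initialised with. */
static int toy_init_keep(void)
{
	int err = -ENOMEM;
	void *ct = toy_tmpl_alloc_fail();

	if (!ct)
		return err;
	return 0;
}
```

In this imitation toy_ptr_err(NULL) is 0, so toy_init_overwrite() reports success on allocation failure, while toy_init_keep() returns -ENOMEM.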
[PATCH net-next 2/2] ipv4: apply lwtunnel encap for locally-generated packets
lwtunnel encap is applied for forwarded packets, but not for locally-generated packets. This is because the output function is not overridden in __mkroute_output, unlike it is in __mkroute_input.

The lwtunnel state is correctly set on the rth through the call to rt_set_nexthop, so all that needs to be done is to override the dst output function to be lwtunnel_output if there is lwtunnel state present and it requires output redirection.

Signed-off-by: Robert Shearman rshea...@brocade.com
---
 net/ipv4/route.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 908f7ef2f12a..18fd7c9095c7 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2022,6 +2022,8 @@ add:
 	}

 	rt_set_nexthop(rth, fl4->daddr, res, fnhe, fi, type, 0);
+	if (lwtunnel_output_redirect(rth->rt_lwtstate))
+		rth->dst.output = lwtunnel_output;

 	return rth;
 }
-- 
2.1.4
[PATCH net-next 1/2] lwtunnel: set skb protocol and dev
In the locally-generated packet path skb->protocol may not be set and
this is required for the lwtunnel encap in order to get the lwtstate.
This would otherwise have been set by ip_output or ip6_output so set
skb->protocol prior to calling the lwtunnel encap function.

Additionally set skb->dev in case it is needed further down the
transmit path.

Signed-off-by: Robert Shearman rshea...@brocade.com
---
 net/core/lwtunnel.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index c240c895b319..5d6d8e3d450a 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -215,8 +215,12 @@ int lwtunnel_output6(struct sock *sk, struct sk_buff *skb)
 	struct rt6_info *rt = (struct rt6_info *)skb_dst(skb);
 	struct lwtunnel_state *lwtstate = NULL;

-	if (rt)
+	if (rt) {
 		lwtstate = rt->rt6i_lwtstate;
+		skb->dev = rt->dst.dev;
+	}
+
+	skb->protocol = htons(ETH_P_IPV6);

 	return __lwtunnel_output(sk, skb, lwtstate);
 }
@@ -227,8 +231,12 @@ int lwtunnel_output(struct sock *sk, struct sk_buff *skb)
 	struct rtable *rt = (struct rtable *)skb_dst(skb);
 	struct lwtunnel_state *lwtstate = NULL;

-	if (rt)
+	if (rt) {
 		lwtstate = rt->rt_lwtstate;
+		skb->dev = rt->dst.dev;
+	}
+
+	skb->protocol = htons(ETH_P_IP);

 	return __lwtunnel_output(sk, skb, lwtstate);
 }
--
2.1.4
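Why the protocol field matters can be sketched in user space. The sketch assumes the encap side picks the address family by looking at the packet's protocol field. That dispatch, along with struct pkt, encap_family(), output4() and output6(), is hypothetical and not taken from the kernel. Only the ETH_P_IP and ETH_P_IPV6 values and htons() are real.

```c
#include <assert.h>
#include <arpa/inet.h>	/* htons() */
#include <stdint.h>

#define ETH_P_IP	0x0800
#define ETH_P_IPV6	0x86DD

/* Hypothetical reduced packet: protocol is in network byte order, 0 if unset. */
struct pkt {
	uint16_t protocol;
};

enum family { FAMILY_UNKNOWN, FAMILY_IPV4, FAMILY_IPV6 };

/* Hypothetical encap-side lookup that depends on the protocol field. */
static enum family encap_family(const struct pkt *p)
{
	if (p->protocol == htons(ETH_P_IP))
		return FAMILY_IPV4;
	if (p->protocol == htons(ETH_P_IPV6))
		return FAMILY_IPV6;
	return FAMILY_UNKNOWN;
}

/* Set the protocol before handing the packet to the encap lookup. */
static enum family output4(struct pkt *p)
{
	p->protocol = htons(ETH_P_IP);
	return encap_family(p);
}

static enum family output6(struct pkt *p)
{
	p->protocol = htons(ETH_P_IPV6);
	return encap_family(p);
}
```

A packet whose protocol was never set falls through to FAMILY_UNKNOWN in this sketch, which is the situation the patch avoids by setting the field in the output functions.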
[PATCH net-next 0/2] lwtunnel: encap locally-generated ipv4 packets
Locally-generated IPv4 packets, such as those from applications running
on the host or from traceroute/ping, currently don't have lwtunnel
output-redirected encap applied. However, they should, in the same way
as forwarded packets, and this patch series addresses that.

Robert Shearman (2):
  lwtunnel: set skb protocol and dev
  ipv4: apply lwtunnel encap for locally-generated packets

 net/core/lwtunnel.c | 12 ++--
 net/ipv4/route.c    |  2 ++
 2 files changed, 12 insertions(+), 2 deletions(-)

--
2.1.4
[PATCH v2 net-next 4/4] fou: Do WARN_ON_ONCE in gue_gro_receive for bad proto callbacks
Do WARN_ON_ONCE instead of WARN_ON in gue_gro_receive when the offload
callbacks are bad (either they don't exist or gro_receive is not
specified).

Signed-off-by: Tom Herbert t...@herbertland.com
---
 net/ipv4/fou.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index eb11f95..2d1646c 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -347,7 +347,7 @@ static struct sk_buff **gue_gro_receive(struct sk_buff **head,
 	rcu_read_lock();
 	offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
 	ops = rcu_dereference(offloads[guehdr->proto_ctype]);
-	if (WARN_ON(!ops || !ops->callbacks.gro_receive))
+	if (WARN_ON_ONCE(!ops || !ops->callbacks.gro_receive))
 		goto out_unlock;

 	pp = ops->callbacks.gro_receive(head, skb);
--
1.8.5.6
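The difference between the two warning styles can be illustrated with a rough user-space sketch. warn_on(), warn_on_once(), receive() and the warnings_printed counter are hypothetical helpers, not the kernel macros. The point is only that a per-call-site flag keeps a hot receive path from logging on every bad packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static int warnings_printed;	/* counter kept only for demonstration */

/* Reports every time the condition is true and returns the condition. */
static bool warn_on(bool cond, const char *what)
{
	if (cond) {
		fprintf(stderr, "WARNING: %s\n", what);
		warnings_printed++;
	}
	return cond;
}

/* Reports only the first time; *warned is the per-call-site state. */
static bool warn_on_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		fprintf(stderr, "WARNING (once): %s\n", what);
		warnings_printed++;
	}
	return cond;
}

/* Hypothetical receive path: bail out when the callback table is missing. */
static int receive(const void *ops)
{
	static bool warned;

	if (warn_on_once(!ops, &warned, "missing gro_receive callback"))
		return -1;
	return 0;
}
```

Both helpers still return the condition, so the caller's error handling is unchanged. Only the amount of logging differs.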
[PATCH v2 net-next 3/4] ipv6: Add gro functions to sit_offloads
For GRO to work with sit we need gro_receive and gro_complete populated
in the sit_offload structure.

Signed-off-by: Tom Herbert t...@herbertland.com
---
 net/ipv6/ip6_offload.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 08b6204..e893cd1 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -292,6 +292,8 @@ static struct packet_offload ipv6_packet_offload __read_mostly = {
 static const struct net_offload sit_offload = {
 	.callbacks = {
 		.gso_segment	= ipv6_gso_segment,
+		.gro_receive	= ipv6_gro_receive,
+		.gro_complete	= ipv6_gro_complete,
 	},
 };

--
1.8.5.6
[PATCH v2 net-next 1/4] gro: Fix remcsum offload to deal with frags in GRO
The remote checksum offload GRO did not consider the case that frag0
might be in use. This patch fixes that by accessing headers using the
skb_gro functions and not saving offsets relative to skb->head.

Signed-off-by: Tom Herbert t...@herbertland.com
---
 drivers/net/vxlan.c       | 23 +--
 include/linux/netdevice.h | 44
 net/ipv4/fou.c            | 28
 3 files changed, 53 insertions(+), 42 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index e90f7a4..60b5b42 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -519,10 +519,10 @@ static struct vxlanhdr *vxlan_gro_remcsum(struct sk_buff *skb,
 					  u32 data, struct gro_remcsum *grc,
 					  bool nopartial)
 {
-	size_t start, offset, plen;
+	size_t start, offset;

 	if (skb->remcsum_offload)
-		return NULL;
+		return vh;

 	if (!NAPI_GRO_CB(skb)->csum_valid)
 		return NULL;
@@ -532,17 +532,8 @@ static struct vxlanhdr *vxlan_gro_remcsum(struct sk_buff *skb,
 			  offsetof(struct udphdr, check) :
 			  offsetof(struct tcphdr, check));

-	plen = hdrlen + offset + sizeof(u16);
-
-	/* Pull checksum that will be written */
-	if (skb_gro_header_hard(skb, off + plen)) {
-		vh = skb_gro_header_slow(skb, off + plen, off);
-		if (!vh)
-			return NULL;
-	}
-
-	skb_gro_remcsum_process(skb, (void *)vh + hdrlen,
-				start, offset, grc, nopartial);
+	vh = skb_gro_remcsum_process(skb, (void *)vh, off, hdrlen,
+				     start, offset, grc, nopartial);

 	skb->remcsum_offload = 1;

@@ -573,7 +564,6 @@ static struct sk_buff **vxlan_gro_receive(struct sk_buff **head,
 		goto out;
 	}

-	skb_gro_pull(skb, sizeof(struct vxlanhdr)); /* pull vxlan header */
 	skb_gro_postpull_rcsum(skb, vh, sizeof(struct vxlanhdr));

 	flags = ntohl(vh->vx_flags);
@@ -588,6 +578,8 @@ static struct sk_buff **vxlan_gro_receive(struct sk_buff **head,
 		goto out;
 	}

+	skb_gro_pull(skb, sizeof(struct vxlanhdr)); /* pull vxlan header */
+
 	flush = 0;

 	for (p = *head; p; p = p->next) {
@@ -1110,6 +1102,9 @@ static struct vxlanhdr *vxlan_remcsum(struct sk_buff *skb, struct vxlanhdr *vh,
 {
 	size_t start, offset, plen;

+	if (skb->remcsum_offload)
+		return vh;
+
 	start = (data & VXLAN_RCO_MASK) << VXLAN_RCO_SHIFT;
 	offset = start + ((data & VXLAN_RCO_UDP) ?
 			  offsetof(struct udphdr, check) :
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 607b5f4..568d7ae 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2303,8 +2303,7 @@ __sum16 __skb_gro_checksum_complete(struct sk_buff *skb);

 static inline bool skb_at_gro_remcsum_start(struct sk_buff *skb)
 {
-	return (NAPI_GRO_CB(skb)->gro_remcsum_start - skb_headroom(skb) ==
-		skb_gro_offset(skb));
+	return (NAPI_GRO_CB(skb)->gro_remcsum_start == skb_gro_offset(skb));
 }

 static inline bool __skb_gro_checksum_validate_needed(struct sk_buff *skb,
@@ -2400,37 +2399,58 @@ static inline void skb_gro_remcsum_init(struct gro_remcsum *grc)
 	grc->delta = 0;
 }

-static inline void skb_gro_remcsum_process(struct sk_buff *skb, void *ptr,
-					   int start, int offset,
-					   struct gro_remcsum *grc,
-					   bool nopartial)
+static inline void *skb_gro_remcsum_process(struct sk_buff *skb, void *ptr,
+					    unsigned int off, size_t hdrlen,
+					    int start, int offset,
+					    struct gro_remcsum *grc,
+					    bool nopartial)
 {
 	__wsum delta;
+	size_t plen = hdrlen + max_t(size_t, offset + sizeof(u16), start);

 	BUG_ON(!NAPI_GRO_CB(skb)->csum_valid);

 	if (!nopartial) {
-		NAPI_GRO_CB(skb)->gro_remcsum_start =
-			((unsigned char *)ptr + start) - skb->head;
-		return;
+		NAPI_GRO_CB(skb)->gro_remcsum_start = off + hdrlen + start;
+		return ptr;
+	}
+
+	ptr = skb_gro_header_fast(skb, off);
+	if (skb_gro_header_hard(skb, off + plen)) {
+		ptr = skb_gro_header_slow(skb, off + plen, off);
+		if (!ptr)
+			return NULL;
 	}

-	delta = remcsum_adjust(ptr, NAPI_GRO_CB(skb)->csum, start, offset);
+	delta = remcsum_adjust(ptr + hdrlen, NAPI_GRO_CB(skb)->csum,
+