Re: [PATCH 1/2] net: ethernet: ti: cpts: convert to use kthread_worker
On Fri, Jul 21, 2017 at 06:49:42PM -0500, Grygorii Strashko wrote: > There could be significant delay in CPTS work schedule under high system > load and on -RT which could cause CPTS misbehavior due to internal counter > overflow. Usage of own kthread_worker allows to avoid such kind of issues > and makes it possible to tune priority of CPTS kthread_worker thread on -RT > (thread name "cpts"). I have also seen excessive delays in the time stamp work from the dp83640 under certain loads. Can we please implement this time stamp kthread_worker in the PHC subsystem? That way, the facility can be used by other drivers without code duplication. Thanks, Richard
Re: [PATCH v7 2/3] PCI: Enable PCIe Relaxed Ordering if supported
Hi Sinan, Bjorn: On 2017/7/14 21:54, Sinan Kaya wrote: > On 7/13/2017 9:26 PM, Ding Tianhong wrote: >> There is no code to enable the PCIe Relaxed Ordering bit in the >> configuration space, >> it is only be enable by default according to the PCIe Standard >> Specification, what we >> do is to distinguish the RC problematic platform and clear the Relaxed >> Ordering bit >> to tell the PCIe EP don't send any TLPs with Relaxed Ordering Attributes to >> the Root >> Complex. > > Maybe, you should change the patch commit as > "Disable PCIe Relaxed Ordering if not supported"... I agree that to use the new commit title as your suggested, thanks. :) @Bjorn do you want me to spawn a new patchset with the new commit title and the Reviewed-by from Casey on the patch 3, or maybe you could pick this up and modify it own ? thanks. Ding >
Re: 4.12.0: igb: transmit queue 7 timed out
(Cc netdev and Intel) On Tue, Jul 18, 2017 at 1:57 PM, Justin Piszczwrote: > Hello, > > Kernel: 4.12.0 > Arch: x86_64 > > What causes this issue? It is likely a igb driver issue. > > [199141.434449] NETDEV WATCHDOG: eth1 (igb): transmit queue 7 timed out > [199141.434501] [ cut here ] > [199141.434515] WARNING: CPU: 10 PID: 0 at net/sched/sch_generic.c:316 > dev_watchdog+0x212/0x220 > [199141.434528] CPU: 10 PID: 0 Comm: swapper/10 Not tainted 4.12.0 #3 > [199141.434533] Hardware name: Supermicro X9SRL-F/X9SRL-F, BIOS 3.2 > 01/16/2015 > [199141.434539] task: 8810385b8180 task.stack: c90b4000 > [199141.434546] RIP: 0010:dev_watchdog+0x212/0x220 > [199141.434551] RSP: 0018:88103f483e68 EFLAGS: 00010286 > [199141.434558] RAX: 0037 RBX: 0007 RCX: > 083f > [199141.434563] RDX: RSI: 00f6 RDI: > 003f > [199141.434569] RBP: 88103f483e88 R08: 0006a111 R09: > 0487 > [199141.434574] R10: 0082 R11: 823f49ee R12: > 8810348a8000 > [199141.434579] R13: 000a R14: 0008 R15: > 0082 > [199141.434585] FS: () GS:88103f48() > knlGS: > [199141.434590] CS: 0010 DS: ES: CR0: 80050033 > [199141.434596] CR2: 29e25c35b000 CR3: 0220a000 CR4: > 001406e0 > [199141.434600] Call Trace: > [199141.434607] > [199141.434615] ? qdisc_rcu_free+0x40/0x40 > [199141.434625] call_timer_fn.isra.4+0x19/0x90 > [199141.434633] expire_timers+0x7f/0x90 > [199141.434641] run_timer_softirq+0x84/0xe0 > [199141.434650] ? ktime_get+0x3b/0x90 > [199141.434659] ? clockevents_program_event+0x75/0xf0 > [199141.434666] __do_softirq+0xdf/0x1f0 > [199141.434673] irq_exit+0xab/0xb0 > [199141.434681] smp_trace_apic_timer_interrupt+0x63/0x90 > [199141.434688] smp_apic_timer_interrupt+0x9/0x10 > [199141.434696] apic_timer_interrupt+0x86/0x90 > [199141.434705] RIP: 0010:cpuidle_enter_state+0x153/0x1e0 > [199141.434711] RSP: 0018:c90b7e68 EFLAGS: 0286 ORIG_RAX: > ff10 > [199141.434720] RAX: 88103f499000 RBX: e8c91fd0 RCX: > 001f > [199141.434725] RDX: 20c49ba5e353f7cf RSI: 88103f496818 RDI: > > [199141.434730] RBP: c90b7e98 R08: 3f7b R09: > 0018 > [199141.434735] R10: c90b7e48 R11: 38b6 R12: > b51e3a7582f1 > [199141.434740] R13: 0004 R14: 8225e2b8 R15: > 8225e120 > [199141.434746] > [199141.434754] ? cpuidle_enter_state+0x148/0x1e0 > [199141.434762] cpuidle_enter+0x12/0x20 > [199141.434771] call_cpuidle+0x1e/0x30 > [199141.434779] do_idle+0xde/0x180 > [199141.434788] cpu_startup_entry+0x6c/0x70 > [199141.434796] start_secondary+0x13c/0x160 > [199141.434808] secondary_startup_64+0x9f/0x9f > [199141.434813] Code: 8c 24 a4 03 00 00 eb 8d 4c 89 e7 c6 05 ea 96 81 00 01 > e8 02 f0 fd ff 89 d9 4c 89 e6 48 c7 c7 f0 49 00 82 48 89 c2 e8 bd 08 6e ff > <0f> ff eb c1 66 2e 0f 1f 84 00 00 00 00 00 48 8b 4e 48 55 44 8b > [199141.434895] ---[ end trace 0bac15bcef84778a ]--- > [199141.435208] igb :08:00.2 eth1: Reset adapter > [199144.839995] igb :08:00.2 eth1: igb: eth1 NIC Link is Up 1000 Mbps > Full Duplex, Flow Control: RX/TX > > Justin. >
[PATCH 1/2] net: ethernet: ti: cpts: convert to use kthread_worker
There could be significant delay in CPTS work schedule under high system load and on -RT which could cause CPTS misbehavior due to internal counter overflow. Usage of own kthread_worker allows to avoid such kind of issues and makes it possible to tune priority of CPTS kthread_worker thread on -RT (thread name "cpts"). Signed-off-by: Grygorii Strashko--- drivers/net/ethernet/ti/cpts.c | 23 +-- drivers/net/ethernet/ti/cpts.h | 4 +++- 2 files changed, 20 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c index 32279d2..6a520ae 100644 --- a/drivers/net/ethernet/ti/cpts.c +++ b/drivers/net/ethernet/ti/cpts.c @@ -20,12 +20,12 @@ #include #include #include +#include #include #include #include #include #include -#include #include #include @@ -238,14 +238,15 @@ static struct ptp_clock_info cpts_info = { .enable = cpts_ptp_enable, }; -static void cpts_overflow_check(struct work_struct *work) +static void cpts_overflow_check(struct kthread_work *work) { struct timespec64 ts; struct cpts *cpts = container_of(work, struct cpts, overflow_work.work); cpts_ptp_gettime(>info, ); pr_debug("cpts overflow check at %lld.%09lu\n", ts.tv_sec, ts.tv_nsec); - schedule_delayed_work(>overflow_work, cpts->ov_check_period); + kthread_queue_delayed_work(cpts->kworker, >overflow_work, + cpts->ov_check_period); } static int cpts_match(struct sk_buff *skb, unsigned int ptp_class, @@ -378,7 +379,8 @@ int cpts_register(struct cpts *cpts) } cpts->phc_index = ptp_clock_index(cpts->clock); - schedule_delayed_work(>overflow_work, cpts->ov_check_period); + kthread_queue_delayed_work(cpts->kworker, >overflow_work, + cpts->ov_check_period); return 0; err_ptp: @@ -392,7 +394,7 @@ void cpts_unregister(struct cpts *cpts) if (WARN_ON(!cpts->clock)) return; - cancel_delayed_work_sync(>overflow_work); + kthread_cancel_delayed_work_sync(>overflow_work); ptp_clock_unregister(cpts->clock); cpts->clock = NULL; @@ -476,7 +478,6 @@ struct cpts *cpts_create(struct device *dev, void __iomem *regs, cpts->dev = dev; cpts->reg = (struct cpsw_cpts __iomem *)regs; spin_lock_init(>lock); - INIT_DELAYED_WORK(>overflow_work, cpts_overflow_check); ret = cpts_of_parse(cpts, node); if (ret) @@ -488,6 +489,14 @@ struct cpts *cpts_create(struct device *dev, void __iomem *regs, return ERR_PTR(PTR_ERR(cpts->refclk)); } + kthread_init_delayed_work(>overflow_work, cpts_overflow_check); + cpts->kworker = kthread_create_worker(0, "cpts"); + if (IS_ERR(cpts->kworker)) { + dev_err(dev, "failed to create cpts overflow_work task %ld\n", + PTR_ERR(cpts->kworker)); + return ERR_CAST(cpts->kworker); + } + clk_prepare(cpts->refclk); cpts->cc.read = cpts_systim_read; @@ -513,6 +522,8 @@ void cpts_release(struct cpts *cpts) return; clk_unprepare(cpts->refclk); + + kthread_destroy_worker(cpts->kworker); } EXPORT_SYMBOL_GPL(cpts_release); diff --git a/drivers/net/ethernet/ti/cpts.h b/drivers/net/ethernet/ti/cpts.h index 01ea82b..e8128e8 100644 --- a/drivers/net/ethernet/ti/cpts.h +++ b/drivers/net/ethernet/ti/cpts.h @@ -27,6 +27,7 @@ #include #include #include +#include #include #include #include @@ -119,13 +120,14 @@ struct cpts { u32 cc_mult; /* for the nominal frequency */ struct cyclecounter cc; struct timecounter tc; - struct delayed_work overflow_work; int phc_index; struct clk *refclk; struct list_head events; struct list_head pool; struct cpts_event pool_data[CPTS_MAX_EVENTS]; unsigned long ov_check_period; + struct kthread_worker *kworker; + struct kthread_delayed_work overflow_work; }; void cpts_rx_timestamp(struct cpts *cpts, struct sk_buff *skb); -- 2.10.1
[PATCH 0/2] net: ethernet: ti: cpts: fix tx timestamping timeout
With the low Ethernet connection speed cpdma notification about packet processing can be received before CPTS TX timestamp event, which is set when packet actually left CPSW while cpdma notification is sent when packet pushed in CPSW fifo. As result, when connection is slow and CPU is fast enough TX timestamping is not working properly. Issue was discovered using timestamping tool on am57x boards with Ethernet link speed forced to 100M and on am335x-evm with Ethernet link speed forced to 10M. This series fixes it by introducing TX SKB queue to store PTP SKBs for which Ethernet Transmit Event hasn't been received yet and then re-check this queue with new Ethernet Transmit Events by scheduling CPTS overflow work more often until TX SKB queue is not empty. The internal CPTS workqueues are also converted to use kthread_worker. Grygorii Strashko (2): net: ethernet: ti: cpts: convert to use kthread_worker net: ethernet: ti: cpts: fix tx timestamping timeout drivers/net/ethernet/ti/cpts.c | 112 + drivers/net/ethernet/ti/cpts.h | 5 +- 2 files changed, 106 insertions(+), 11 deletions(-) -- 2.10.1
[PATCH 2/2] net: ethernet: ti: cpts: fix tx timestamping timeout
With the low speed Ethernet connection CPDMA notification about packet processing can be received before CPTS TX timestamp event, which is set when packet actually left CPSW while cpdma notification is sent when packet pushed in CPSW fifo. As result, when connection is slow and CPU is fast enough TX timestamping is not working properly. Fix it, by introducing TX SKB queue to store PTP SKBs for which Ethernet Transmit Event hasn't been received yet and then re-check this queue with new Ethernet Transmit Events by scheduling CPTS overflow work more often (every 1 jiffies) until TX SKB queue is not empty. Side effect of this change is: - User space tools require to take into account possible delay in TX timestamp processing (for example ptp4l works with tx_timestamp_timeout=400 under net traffic and tx_timestamp_timeout=25 in idle). Signed-off-by: Grygorii Strashko--- drivers/net/ethernet/ti/cpts.c | 93 +++--- drivers/net/ethernet/ti/cpts.h | 1 + 2 files changed, 88 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c index 6a520ae..746dd1d 100644 --- a/drivers/net/ethernet/ti/cpts.c +++ b/drivers/net/ethernet/ti/cpts.c @@ -31,9 +31,18 @@ #include "cpts.h" +#define CPTS_SKB_TX_WORK_TIMEOUT 1 /* jiffies */ + +struct cpts_skb_cb_data { + unsigned long tmo; +}; + #define cpts_read32(c, r) readl_relaxed(>reg->r) #define cpts_write32(c, v, r) writel_relaxed(v, >reg->r) +static int cpts_match(struct sk_buff *skb, unsigned int ptp_class, + u16 ts_seqid, u8 ts_msgtype); + static int event_expired(struct cpts_event *event) { return time_after(jiffies, event->tmo); @@ -77,6 +86,47 @@ static int cpts_purge_events(struct cpts *cpts) return removed ? 0 : -1; } +static bool cpts_match_tx_ts(struct cpts *cpts, struct cpts_event *event) +{ + struct sk_buff *skb, *tmp; + u16 seqid; + u8 mtype; + bool found = false; + + mtype = (event->high >> MESSAGE_TYPE_SHIFT) & MESSAGE_TYPE_MASK; + seqid = (event->high >> SEQUENCE_ID_SHIFT) & SEQUENCE_ID_MASK; + + /* no need to grab txq.lock as access is always done under cpts->lock */ + skb_queue_walk_safe(>txq, skb, tmp) { + struct skb_shared_hwtstamps ssh; + unsigned int class = ptp_classify_raw(skb); + struct cpts_skb_cb_data *skb_cb = + (struct cpts_skb_cb_data *)skb->cb; + + if (cpts_match(skb, class, seqid, mtype)) { + u64 ns = timecounter_cyc2time(>tc, event->low); + + memset(, 0, sizeof(ssh)); + ssh.hwtstamp = ns_to_ktime(ns); + skb_tstamp_tx(skb, ); + found = true; + __skb_unlink(skb, >txq); + dev_consume_skb_any(skb); + dev_dbg(cpts->dev, "match tx timestamp mtype %u seqid %04x\n", + mtype, seqid); + } else if (time_after(jiffies, skb_cb->tmo)) { + /* timeout any expired skbs over 1s */ + dev_dbg(cpts->dev, + "expiring tx timestamp mtype %u seqid %04x\n", + mtype, seqid); + __skb_unlink(skb, >txq); + dev_consume_skb_any(skb); + } + } + + return found; +} + /* * Returns zero if matching event type was found. */ @@ -101,9 +151,15 @@ static int cpts_fifo_read(struct cpts *cpts, int match) event->low = lo; type = event_type(event); switch (type) { + case CPTS_EV_TX: + if (cpts_match_tx_ts(cpts, event)) { + /* if the new event matches an existing skb, +* then don't queue it +*/ + break; + } case CPTS_EV_PUSH: case CPTS_EV_RX: - case CPTS_EV_TX: list_del_init(>list); list_add_tail(>list, >events); break; @@ -240,13 +296,20 @@ static struct ptp_clock_info cpts_info = { static void cpts_overflow_check(struct kthread_work *work) { - struct timespec64 ts; struct cpts *cpts = container_of(work, struct cpts, overflow_work.work); + unsigned long delay = cpts->ov_check_period; + struct timespec64 ts; + unsigned long flags; + + spin_lock_irqsave(>lock, flags); + ts = ns_to_timespec64(timecounter_read(>tc)); + + if (!skb_queue_empty(>txq)) + delay = CPTS_SKB_TX_WORK_TIMEOUT; + spin_unlock_irqrestore(>lock, flags); - cpts_ptp_gettime(>info, );
Re: [PATCH iproute2 master 1/2] bpf: improve error reporting around tail calls
On Fri, 21 Jul 2017 21:13:06 +0200 Daniel Borkmannwrote: > Currently, it's still quite hard to figure out if a prog passed the > verifier, but later gets rejected due to different tail call ownership. > Figure out whether that is the case and provide appropriate error > messages to the user. > > Signed-off-by: Daniel Borkmann Sorry, dead code. Please fix and resubmit. bpf.c:356:12: warning: ‘bpf_derive_prog_from_fdinfo’ defined but not used [-Wunused-function] static int bpf_derive_prog_from_fdinfo(int fd, struct bpf_prog_data *prog)
SELinux/IP_PASSSEC regression in 4.13-rcX
Hello, I've been seeing a SELinux regression with IP_PASSSEC on the v4.13-rcX kernels and finally tracked the problem down to the skb_release_head_state() call in __udp_queue_rcv_skb(). Looking at the code and the git log it would appear that the likely culprit is 0a463c78d25b ("udp: avoid a cache miss on dequeue "); it looks similar to IP option problem fixed in 0ddf3fb2c43d2. >From a SELinux/IP_PASSSEC point of view we need access to the skb->sp pointer to examine the SAs. I'm posting this here without a patch because it isn't clear to me how you would like to fix the problem; my initial thought would be to simply make the skb_release_head_state() conditional on the skb->sp pointer, much like the IP options fix, but I'm not sure if you have a more clever idea. -- paul moore www.paul-moore.com
Re: [PATCH net] ipv6: no need to return rt->dst.error if it is not null entry.
On Fri, Jul 21, 2017 at 11:42 AM, Cong Wangwrote: > On Thu, Jul 20, 2017 at 8:23 AM, Hangbin Liu wrote: >> 2017-07-20 23:06 GMT+08:00 Hangbin Liu : +++ b/net/ipv6/route.c @@ -3637,12 +3637,6 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, dst = ip6_route_lookup(net, , 0); rt = container_of(dst, struct rt6_info, dst); - if (rt->dst.error) { - err = rt->dst.error; - ip6_rt_put(rt); - goto errout; - } >>> >>> hmm... or instead of remove this check, should we check all the entry? Like >>> if ((rt->dst.error && rt != net->ipv6.ip6_null_entry && rt != >> ^^ mistake here >>> net->ipv6.ip6_blk_hole_entry) || >>> rt == net->ipv6.ip6_null_entry ) >> >> Sorry, there should be no need to check ip6_null_entry since the >> error is already >> -ENETUNREACH. So how about > > Hmm? All of these 3 entries have error set, right?? > So we should only check dst.error... removing the check seems ok to me. We can also make the check conditional to fibmatch code only to eliminate any change in behavior introduced by fibmatch. ie if (fibmatch && rt->dst.error). Hangbin, can you also pls check that fibmatch works ok for such routes with your patch applied ?. ip netns exec client ip -6 route get fibmatch 2003::1 (with latest iproute2) thank you.
[PATCH] net: stmmac: Adjust dump offset of DMA registers for ethtool
From: Thor ThayerThe commit fbf68229ffe7 ("net: stmmac: unify registers dumps methods") in the Linux kernel modified the register dump to store the DMA registers at the DMA register offset (0x1000) but ethtool (stmmac.c) looks for the DMA registers after the MAC registers which is offset 55. This patch copies the DMA registers from the higher offset to the offset where ethtool expects them. Signed-off-by: Thor Thayer --- drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c | 2 +- drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c | 2 +- drivers/net/ethernet/stmicro/stmmac/dwmac_dma.h | 3 +++ drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 5 + 4 files changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c index cc4ea13..ec539f7 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c @@ -205,7 +205,7 @@ static void dwmac1000_dump_dma_regs(void __iomem *ioaddr, u32 *reg_space) { int i; - for (i = 0; i < 23; i++) + for (i = 0; i < NUM_DWMAC1000_DMA_REGS; i++) if ((i < 12) || (i > 17)) reg_space[DMA_BUS_MODE / 4 + i] = readl(ioaddr + DMA_BUS_MODE + i * 4); diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c index eef2f22..6502b9a 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c @@ -70,7 +70,7 @@ static void dwmac100_dump_dma_regs(void __iomem *ioaddr, u32 *reg_space) { int i; - for (i = 0; i < 9; i++) + for (i = 0; i < NUM_DWMAC100_DMA_REGS; i++) reg_space[DMA_BUS_MODE / 4 + i] = readl(ioaddr + DMA_BUS_MODE + i * 4); diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac_dma.h b/drivers/net/ethernet/stmicro/stmmac/dwmac_dma.h index 56e485f..3107d19 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac_dma.h +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac_dma.h @@ -136,6 +136,9 @@ #define DMA_STATUS_TI 0x0001 /* Transmit Interrupt */ #define DMA_CONTROL_FTF0x0010 /* Flush transmit FIFO */ +#define NUM_DWMAC100_DMA_REGS 9 +#define NUM_DWMAC1000_DMA_REGS 23 + void dwmac_enable_dma_transmission(void __iomem *ioaddr); void dwmac_enable_dma_irq(void __iomem *ioaddr); void dwmac_disable_dma_irq(void __iomem *ioaddr); diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c index 690b7c1..2ffb5b1 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c @@ -33,6 +33,8 @@ #define MAC100_ETHTOOL_NAME"st_mac100" #define GMAC_ETHTOOL_NAME "st_gmac" +#define ETHTOOL_DMA_OFFSET 55 + struct stmmac_stats { char stat_string[ETH_GSTRING_LEN]; int sizeof_stat; @@ -443,6 +445,9 @@ static void stmmac_ethtool_gregs(struct net_device *dev, priv->hw->mac->dump_regs(priv->hw, reg_space); priv->hw->dma->dump_regs(priv->ioaddr, reg_space); + /* Copy DMA registers to where ethtool expects them */ + memcpy(_space[ETHTOOL_DMA_OFFSET], _space[DMA_BUS_MODE / 4], + NUM_DWMAC1000_DMA_REGS * 4); } static void -- 2.7.4
Re: A buggy behavior for Linux TCP Reno and HTCP
On Fri, Jul 21, 2017 at 1:46 PM, Neal Cardwellwrote: > On Fri, Jul 21, 2017 at 4:27 PM, Lisong Xu wrote: >> >> Hi Yuchung, >> >> This test scenario is only one example to trigger this bug. In general, as >> long as cwnd <4, the undo function has this bug. > > > Yes, personally I agree that this seems like an issue that is general enough > to be worth fixing. In the sense that, if cwnd <4, then we may well be very > congested. So we don't want to get hit by this bug wherein an undo of a loss > recovery can cause cwnd to suddenly jump (from 1, 2, or 3) up to 4. > > Seems like any of the several CCs that use tcp_reno_undo_cwnd() have this > bug. > > I guess in my mind the only question is whether we want to add a > tcp_foo_undo_cwnd() and ca->loss_cwnd to every CC module to handle this > issue (i.e. make every CC module handle it the way CUBIC does), or (my I would prefer the former b/c loss_cwnd may not be universal TCP state, just like ssthresh carries no meaning in some CC (bbr). It also seems also more consistent with the recent change on undo commit e97991832a4ea4a5f47d65f068a4c966a2eb5730 Author: Florian Westphal Date: Mon Nov 21 14:18:38 2016 +0100 tcp: make undo_cwnd mandatory for congestion modules > preference) just add a tp->loss_cwnd field so we can use shared code in > tcp_reno_undo_cwnd() to get this right across all CC modules. > > neal >
Re: A buggy behavior for Linux TCP Reno and HTCP
On Fri, Jul 21, 2017 at 4:27 PM, Lisong Xuwrote: > > Hi Yuchung, > > This test scenario is only one example to trigger this bug. In general, as > long as cwnd <4, the undo function has this bug. Yes, personally I agree that this seems like an issue that is general enough to be worth fixing. In the sense that, if cwnd <4, then we may well be very congested. So we don't want to get hit by this bug wherein an undo of a loss recovery can cause cwnd to suddenly jump (from 1, 2, or 3) up to 4. Seems like any of the several CCs that use tcp_reno_undo_cwnd() have this bug. I guess in my mind the only question is whether we want to add a tcp_foo_undo_cwnd() and ca->loss_cwnd to every CC module to handle this issue (i.e. make every CC module handle it the way CUBIC does), or (my preference) just add a tp->loss_cwnd field so we can use shared code in tcp_reno_undo_cwnd() to get this right across all CC modules. neal
Re: [RFC 1/2] net-next: add a dma_desc element to struct skb_shared_info
John Crispinwrote: > When the flow offloading engine forwards a packet to the DMA it will send > additional info to the sw path. this includes > * physical switch port > * internal flow hash - this is required to populate the correct flow table > entry > * ppe state - this indicates what state the PPEs internal table is in for > the flow > * the reason why the packet was forwarde - these are things like bind, > unbind, timed out, ... > > once the flow table offloading patches are ready and upstream, the netfilter > layer will see the SKB and pass it o to the flow table offloading code, If this is about conntrack offloading, then I prefer if this is done without changing any core network structure. What about adding a new conntrack extension to hold whatever info you need, and then allocate a conntrack entry in the driver? This would obviously need core changes in conntrack (such as allowing calls into conntrack from drivers without hard module dependencies, and a thorough check if this causes backwards problems (e.g. right now a "-m conntrack" check in the raw table can only succeed for packets from lo interface). But I think that could be worked around, esp. if we assume that we won't see such entries a lot (assuming sw is slowpath and hw handles most packets). Thanks, Florian
Re: A buggy behavior for Linux TCP Reno and HTCP
Hi Yuchung, This test scenario is only one example to trigger this bug. In generally, as long as cwnd <4, the undo function has this bug. This would not be a problem for a normal network. But might be an issue, if the network is highly congested (e.g., many many TCP flows with small cwnd <4). In this case, the bug may possibly mistakenly double the sending rate of each flow, and make a highly congested network even more congested similar to congestion collapse. This is actually why we need the congestion control algorithms in the first place. Thanks Lisong On 7/21/2017 12:59 PM, Yuchung Cheng wrote: On Thu, Jul 20, 2017 at 2:28 PM, Wei Sunwrote: Hi Yuchung, Sorry for the confusion. The test case was adapted from an old DSACK test case (i.e., forget to remove something). Attached is a new and simple one. Thanks Note that the test scenario is fairly rare IMO: the connection first experience timeouts, then its retransmission got acked, then the original packets get Acked (ack w/ val 1400 ecr 130). It can be really long reordering, or reordering plus packet loss. The Linux undo state machines may not handle this perfectly, but it's probably not worth extra state for such rare events. On Wed, Jul 19, 2017 at 2:31 PM, Yuchung Cheng wrote: On Tue, Jul 18, 2017 at 2:36 PM, Wei Sun wrote: Hi there, We find a buggy behavior when using Linux TCP Reno and HTCP in low bandwidth or highly congested network environments. In a simple word, their undo functions may mistakenly double the cwnd, leading to a more aggressive behavior in a highly congested scenario. The detailed reason: The current reno undo function assumes cwnd halving (and thus doubles the cwnd), but it doesn't consider a corner case condition that ssthresh is at least 2. e.g., cwnd ssth An initial state: 25 A spurious loss: 12 Undo: 45 Here the cwnd after undo is two times as that before undo. Attached is a simple script to reproduce it. the packetdrill script is a bit confusing: it disables SACK but then the client returns ACK w/ SACKs, also 3 dupacks happen after RTO so the sender isn't technically going through a fast recovery... could you provide a better test? A similar reason for HTCP, so we recommend to store the cwnd on loss in .ssthresh implementation and restore it again in .undo_cwnd for TCP Reno and HTCP implementations. Thanks
Re: A buggy behavior for Linux TCP Reno and HTCP
Hi Yuchung, This test scenario is only one example to trigger this bug. In general, as long as cwnd <4, the undo function has this bug. This would not be a problem for a normal network. But might be an issue, if the network is highly congested (e.g., many many TCP flows with small cwnd <4). In this case, the bug may possibly mistakenly double the sending rate of each flow, and make a highly congested network even more congested similar to congestion collapse. This is actually why we need the congestion control algorithms in the first place. Thanks Lisong On 7/21/2017 12:59 PM, Yuchung Cheng wrote: On Thu, Jul 20, 2017 at 2:28 PM, Wei Sunwrote: Hi Yuchung, Sorry for the confusion. The test case was adapted from an old DSACK test case (i.e., forget to remove something). Attached is a new and simple one. Thanks Note that the test scenario is fairly rare IMO: the connection first experience timeouts, then its retransmission got acked, then the original packets get Acked (ack w/ val 1400 ecr 130). It can be really long reordering, or reordering plus packet loss. The Linux undo state machines may not handle this perfectly, but it's probably not worth extra state for such rare events. On Wed, Jul 19, 2017 at 2:31 PM, Yuchung Cheng wrote: On Tue, Jul 18, 2017 at 2:36 PM, Wei Sun wrote: Hi there, We find a buggy behavior when using Linux TCP Reno and HTCP in low bandwidth or highly congested network environments. In a simple word, their undo functions may mistakenly double the cwnd, leading to a more aggressive behavior in a highly congested scenario. The detailed reason: The current reno undo function assumes cwnd halving (and thus doubles the cwnd), but it doesn't consider a corner case condition that ssthresh is at least 2. e.g., cwnd ssth An initial state: 25 A spurious loss: 12 Undo: 45 Here the cwnd after undo is two times as that before undo. Attached is a simple script to reproduce it. the packetdrill script is a bit confusing: it disables SACK but then the client returns ACK w/ SACKs, also 3 dupacks happen after RTO so the sender isn't technically going through a fast recovery... could you provide a better test? A similar reason for HTCP, so we recommend to store the cwnd on loss in .ssthresh implementation and restore it again in .undo_cwnd for TCP Reno and HTCP implementations. Thanks
Re: [RFC 1/2] net-next: add a dma_desc element to struct skb_shared_info
From: John CrispinDate: Fri, 21 Jul 2017 19:01:57 +0200 > When the flow offloading engine forwards a packet to the DMA it will > send additional info to the sw path. this includes > * physical switch port > * internal flow hash - this is required to populate the correct flow > * table entry > * ppe state - this indicates what state the PPEs internal table is in > * for the flow > * the reason why the packet was forwarde - these are things like bind, > * unbind, timed out, ... > > once the flow table offloading patches are ready and upstream, the > netfilter layer will see the SKB and pass it o to the flow table > offloading code, at which point it will finally end up inside the > offloading driver. this will need to have access to this info sent to > the sw path inside the rx descriptor to properly work out what state > the flow is in and which table entry to populate in the HW table for > offloading to work. You absolutely must justify any change to a core data structure alongside the complete and full set of patches that actually make use of that data structure change. You can't just say "here is the data structure change and BTW what actually uses this is somewhere else, and not here on the list yet." That makes it impossible to 1) evaluate the correctness of your change and 2) validate the actual use so we can suggest alternative schemes and/or approaches. So please don't suggest changes this way. Thanks.
[PATCH iproute2 master 1/2] bpf: improve error reporting around tail calls
Currently, it's still quite hard to figure out if a prog passed the verifier, but later gets rejected due to different tail call ownership. Figure out whether that is the case and provide appropriate error messages to the user. Signed-off-by: Daniel Borkmann--- lib/bpf.c | 226 ++ 1 file changed, 169 insertions(+), 57 deletions(-) diff --git a/lib/bpf.c b/lib/bpf.c index 7eb5cd9..4f6421a 100644 --- a/lib/bpf.c +++ b/lib/bpf.c @@ -344,15 +344,50 @@ static void bpf_map_pin_report(const struct bpf_elf_map *pin, fprintf(stderr, "\n"); } -static int bpf_map_selfcheck_pinned(int fd, const struct bpf_elf_map *map, - int length, enum bpf_prog_type type) +struct bpf_prog_data { + unsigned int type; + unsigned int jited; +}; + +struct bpf_map_ext { + struct bpf_prog_data owner; +}; + +static int bpf_derive_prog_from_fdinfo(int fd, struct bpf_prog_data *prog) +{ + char file[PATH_MAX], buff[4096]; + unsigned int val; + FILE *fp; + + snprintf(file, sizeof(file), "/proc/%d/fdinfo/%d", getpid(), fd); + memset(prog, 0, sizeof(*prog)); + + fp = fopen(file, "r"); + if (!fp) { + fprintf(stderr, "No procfs support?!\n"); + return -EIO; + } + + while (fgets(buff, sizeof(buff), fp)) { + if (sscanf(buff, "prog_type:\t%u", ) == 1) + prog->type = val; + else if (sscanf(buff, "prog_jited:\t%u", ) == 1) + prog->jited = val; + } + + fclose(fp); + return 0; +} + +static int bpf_derive_elf_map_from_fdinfo(int fd, struct bpf_elf_map *map, + struct bpf_map_ext *ext) { + unsigned int val, owner_type = 0, owner_jited = 0; char file[PATH_MAX], buff[4096]; - struct bpf_elf_map tmp = {}, zero = {}; - unsigned int val, owner_type = 0; FILE *fp; snprintf(file, sizeof(file), "/proc/%d/fdinfo/%d", getpid(), fd); + memset(map, 0, sizeof(*map)); fp = fopen(file, "r"); if (!fp) { @@ -362,27 +397,48 @@ static int bpf_map_selfcheck_pinned(int fd, const struct bpf_elf_map *map, while (fgets(buff, sizeof(buff), fp)) { if (sscanf(buff, "map_type:\t%u", ) == 1) - tmp.type = val; + map->type = val; else if (sscanf(buff, "key_size:\t%u", ) == 1) - tmp.size_key = val; + map->size_key = val; else if (sscanf(buff, "value_size:\t%u", ) == 1) - tmp.size_value = val; + map->size_value = val; else if (sscanf(buff, "max_entries:\t%u", ) == 1) - tmp.max_elem = val; + map->max_elem = val; else if (sscanf(buff, "map_flags:\t%i", ) == 1) - tmp.flags = val; + map->flags = val; else if (sscanf(buff, "owner_prog_type:\t%i", ) == 1) owner_type = val; + else if (sscanf(buff, "owner_jited:\t%i", ) == 1) + owner_jited = val; } fclose(fp); + if (ext) { + memset(ext, 0, sizeof(*ext)); + ext->owner.type = owner_type; + ext->owner.jited = owner_jited; + } + + return 0; +} + +static int bpf_map_selfcheck_pinned(int fd, const struct bpf_elf_map *map, + struct bpf_map_ext *ext, int length, + enum bpf_prog_type type) +{ + struct bpf_elf_map tmp, zero = {}; + int ret; + + ret = bpf_derive_elf_map_from_fdinfo(fd, , ext); + if (ret < 0) + return ret; /* The decision to reject this is on kernel side eventually, but * at least give the user a chance to know what's wrong. */ - if (owner_type && owner_type != type) + if (ext->owner.type && ext->owner.type != type) fprintf(stderr, "Program array map owner types differ: %u (obj) != %u (pin)\n", - type, owner_type); + type, ext->owner.type); if (!memcmp(, map, length)) { return 0; @@ -882,6 +938,7 @@ int bpf_graft_map(const char *map_path, uint32_t *key, int argc, char **argv) .argc = argc, .argv = argv, }; + struct bpf_map_ext ext = {}; int ret, prog_fd, map_fd; enum bpf_mode mode; uint32_t map_key; @@ -908,7 +965,7 @@ int bpf_graft_map(const char *map_path, uint32_t *key, int argc, char **argv) goto out_prog; } - ret = bpf_map_selfcheck_pinned(map_fd, , + ret = bpf_map_selfcheck_pinned(map_fd, , ,
[PATCH iproute2 master 0/2] Minor BPF updates
Two minor updates to the BPF code, first one makes use of the recently exposed owner_jited in fdinfo to report whether a load issue related to tail calls occured, and second one fixes up custom mount of bpf fs when passed via env. Thanks! Daniel Borkmann (2): bpf: improve error reporting around tail calls bpf: fix mnt path when from env lib/bpf.c | 281 +++--- 1 file changed, 214 insertions(+), 67 deletions(-) -- 1.9.3
[PATCH iproute2 master 2/2] bpf: fix mnt path when from env
When bpf fs mount path is from env, behavior is currently broken as we continue to search in default paths, thus fix this up. Signed-off-by: Daniel Borkmann--- lib/bpf.c | 55 +-- 1 file changed, 45 insertions(+), 10 deletions(-) diff --git a/lib/bpf.c b/lib/bpf.c index 4f6421a..851742c 100644 --- a/lib/bpf.c +++ b/lib/bpf.c @@ -485,6 +485,24 @@ static int bpf_mnt_fs(const char *target) return 0; } +static int bpf_mnt_check_target(const char *target) +{ + struct stat sb = {}; + int ret; + + ret = stat(target, ); + if (ret) { + ret = mkdir(target, S_IRWXU); + if (ret) { + fprintf(stderr, "mkdir %s failed: %s\n", target, + strerror(errno)); + return ret; + } + } + + return 0; +} + static int bpf_valid_mntpt(const char *mnt, unsigned long magic) { struct statfs st_fs; @@ -497,6 +515,21 @@ static int bpf_valid_mntpt(const char *mnt, unsigned long magic) return 0; } +static const char *bpf_find_mntpt_single(unsigned long magic, char *mnt, +int len, const char *mntpt) +{ + int ret; + + ret = bpf_valid_mntpt(mntpt, magic); + if (!ret) { + strncpy(mnt, mntpt, len - 1); + mnt[len - 1] = 0; + return mnt; + } + + return NULL; +} + static const char *bpf_find_mntpt(const char *fstype, unsigned long magic, char *mnt, int len, const char * const *known_mnts) @@ -508,11 +541,8 @@ static const char *bpf_find_mntpt(const char *fstype, unsigned long magic, if (known_mnts) { ptr = known_mnts; while (*ptr) { - if (bpf_valid_mntpt(*ptr, magic) == 0) { - strncpy(mnt, *ptr, len - 1); - mnt[len - 1] = 0; + if (bpf_find_mntpt_single(magic, mnt, len, *ptr)) return mnt; - } ptr++; } } @@ -690,6 +720,7 @@ static const char *bpf_get_work_dir(enum bpf_prog_type type) static char bpf_wrk_dir[PATH_MAX]; static const char *mnt; static bool bpf_mnt_cached; + const char *mnt_env = getenv(BPF_ENV_MNT); static const char * const bpf_known_mnts[] = { BPF_DIR_MNT, "/bpf", @@ -708,13 +739,17 @@ static const char *bpf_get_work_dir(enum bpf_prog_type type) return out; } - mnt = bpf_find_mntpt("bpf", BPF_FS_MAGIC, bpf_tmp, sizeof(bpf_tmp), -bpf_known_mnts); + if (mnt_env) + mnt = bpf_find_mntpt_single(BPF_FS_MAGIC, bpf_tmp, + sizeof(bpf_tmp), mnt_env); + else + mnt = bpf_find_mntpt("bpf", BPF_FS_MAGIC, bpf_tmp, +sizeof(bpf_tmp), bpf_known_mnts); if (!mnt) { - mnt = getenv(BPF_ENV_MNT); - if (!mnt) - mnt = BPF_DIR_MNT; - ret = bpf_mnt_fs(mnt); + mnt = mnt_env ? : BPF_DIR_MNT; + ret = bpf_mnt_check_target(mnt); + if (!ret) + ret = bpf_mnt_fs(mnt); if (ret) { mnt = NULL; goto out; -- 1.9.3
Re: [PATCH V2 1/4] net-next: dsa: move struct dsa_device_ops to the global header file
A proper patch series submission must always have a proper "[PATCH 0/N] ..." header posting that explains at a high level what the patch series does, how it is doing it, and why it is doing it that way. Thank you.
Re: [PATCH net-next 1/2] rxrpc: Expose UAPI definitions to userspace
On Fri, 2017-07-21 at 19:29 +0100, David Howells wrote: > Move UAPI definitions from the internal header and place them in a UAPI > header file so that userspace can make use of them. Please use git format-patch -M to make the rename clearer.
Re: commit 16ecba59 breaks 82574L under heavy load.
On Thu, Jul 20, 2017 at 04:44:55PM -0700, Benjamin Poirier wrote: > Could you please test the following patch and let me know if it: > 1) reduces the interrupt rate of the Other msi-x vector > 2) avoids the link flaps > or > 3) logs some dmesg warnings of the form "Other interrupt with unhandled [...]" > In this case, please paste icr values printed. By the way, while at fixing the e1000e, I just noticed that if you are blasting the port with traffic when it comes up, you risk getting a transmit queue time out, because the queue is started before the carrier is up. ixgbe already fixed that in cdc04dcce0598fead6029a2f95e95a4d2ea419c2. igb has the same problem (which goes away by moving the queue start to the watchdog after carrier_on, I just haven't got around to sending that patch yet). I am going to try moving the queue start to the watchdog and try it again. Trace looked like this: [ cut here ] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x1f9/0x200 NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out Modules linked in: dpi_drv(PO) ccu_util(PO) ipv4_mb(PO) l2bridge_config_util(PO) l2_config_util(PO) route_config_util(PO) qos_config_util(PO) sysapp_common(PO) chantry_fwd_eng_2800_config(PO) shim_module(PO) sadb_cc(PO) ipsecXformer(PO) libeCrypto(PO) ipmatch_cc(PO) l2h_cc(PO) ndproxy_cc(PO) arpint_cc(PO) portinfo_cc(PO) chantryqos_cc(PO) redirector_cc(PO) ix_ph(PO) fpm_core_cc(PO) pulse_cc(PO) vnstt_cc(PO) vnsap_cc(PO) fm_cc(PO) rutm_cc(PO) mutm_cc(PO) ethernet_tx_cc(PO) stkdrv_cc(PO) l2bridge_cc(PO) events_util(PO) sched_cc(PO) qm_cc(PO) ipv4_cc(PO) wred_cc(PO) tc_meter_cc(PO) dscp_classifier_cc(PO) classifier_6t_cc(PO) ent586_cc(PO) dev_cc_arp(PO) chantry_fwd_eng_2800_tables(PO) ether_arp_lib(PO) rtmv4_lib(PO) lkup_lib(PO) l2tm_lib(PO) fragmentation_lib(PO) properties_lib(PO) msg_support_lib(PO) utilities_lib(PO) cci_lib(PO) rm_lib(PO) libossl(O) vip(O) productSpec_x86_dp(PO) e1000e CPU: 0 PID: 0 Comm: swapper/0 Tainted: P O4.9.24 #20 Hardware name: Supermicro X7SPA-HF/X7SPA-HF, BIOS 1.2a 06/23/12 811cef1b 88007fc03e88 81037ade 88007fc03ed8 0001 0082 0001 81037b4c Call Trace: [] ? dump_stack+0x46/0x5b [] ? __warn+0xbe/0xe0 [] ? warn_slowpath_fmt+0x4c/0x50 [] ? mod_timer+0xf2/0x150 [] ? dev_watchdog+0x1f9/0x200 [] ? dev_graft_qdisc+0x70/0x70 [] ? call_timer_fn.isra.26+0x11/0x80 [] ? run_timer_softirq+0x128/0x150 [] ? __do_softirq+0xeb/0x1f0 [] ? irq_exit+0x55/0x60 [] ? smp_apic_timer_interrupt+0x39/0x50 [] ? apic_timer_interrupt+0x7c/0x90 [] ? mwait_idle+0x51/0x80 [] ? cpu_startup_entry+0xa7/0x130 [] ? start_kernel+0x306/0x30e ---[ end trace ee759b7a56e1110b ]--- -- Len Sorensen
[PATCH] netvsc: fix ptr_ret.cocci warnings
drivers/net/hyperv/netvsc_drv.c:737:8-14: WARNING: PTR_ERR_OR_ZERO can be used Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR Generated by: scripts/coccinelle/api/ptr_ret.cocci Fixes: 9749fed5d43d ("netvsc: use ERR_PTR to avoid dereference issues") CC: stephen hemmingerSigned-off-by: Fengguang Wu --- netvsc_drv.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -734,7 +734,7 @@ static int netvsc_set_queues(struct net_ return ret; net_device = rndis_filter_device_add(dev, _info); - return IS_ERR(net_device) ? PTR_ERR(net_device) : 0; + return PTR_ERR_OR_ZERO(net_device); } static int netvsc_set_channels(struct net_device *net,
Re: [Patch net] net: check mac address length for dev_set_mac_address()
On Thu, Jul 20, 2017 at 11:27 AM, Cong Wangwrote: > diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c > index 82fd4c9c4a1b..3f41601d7b7c 100644 > --- a/net/core/dev_ioctl.c > +++ b/net/core/dev_ioctl.c > @@ -262,6 +262,8 @@ static int dev_ifsioc(struct net *net, struct ifreq *ifr, > unsigned int cmd) > return dev_set_mtu(dev, ifr->ifr_mtu); > > case SIOCSIFHWADDR: > + if (dev->addr_len > sizeof(struct sockaddr)) > + return -EINVAL; > return dev_set_mac_address(dev, >ifr_hwaddr); > Thinking a bit more, I should keep this patch simpler for -net and -stable, so only the above piece is necessary, the rest pieces can be put to -net-next. I will resend this, and the "team: use a larger struct for mac address". Thanks.
Re: [PATCH 4/5] e1000e: Separate signaling for link check/link up
On Fri, Jul 21, 2017 at 11:36:26AM -0700, Benjamin Poirier wrote: > Lennart reported the following race condition: > > \ e1000_watchdog_task > \ e1000e_has_link > \ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link > /* link is up */ > mac->get_link_status = false; > > /* interrupt */ > \ e1000_msix_other > hw->mac.get_link_status = true; > > link_active = !hw->mac.get_link_status > /* link_active is false, wrongly */ > > This problem arises because the single flag get_link_status is used to > signal two different states: link status needs checking and link status is > down. > > Avoid the problem by using the return value of .check_for_link to signal > the link status to e1000e_has_link(). > > Reported-by: Lennart Sorensen> Signed-off-by: Benjamin Poirier This too seems potentially -stable worthy, although with patch 5, the problem becomes much much less likely to occur. -- Len Sorensen
Re: [PATCH 5/5] e1000e: Avoid receiver overrun interrupt bursts
On Fri, Jul 21, 2017 at 11:36:27AM -0700, Benjamin Poirier wrote: > When e1000e_poll() is not fast enough to keep up with incoming traffic, the > adapter (when operating in msix mode) raises the Other interrupt to signal > Receiver Overrun. > > This is a double problem because 1) at the moment e1000_msix_other() > assumes that it is only called in case of Link Status Change and 2) if the > condition persists, the interrupt is repeatedly raised again in quick > succession. > > Ideally we would configure the Other interrupt to not be raised in case of > receiver overrun but this doesn't seem possible on this adapter. Instead, > we handle the first part of the problem by reverting to the practice of > reading ICR in the other interrupt handler, like before commit 16ecba59bc33 > ("e1000e: Do not read ICR in Other interrupt"). Thanks to commit > 0a8047ac68e5 ("e1000e: Fix msi-x interrupt automask") which cleared IAME > from CTRL_EXT, reading ICR doesn't interfere with RxQ0, TxQ0 interrupts > anymore. We handle the second part of the problem by not re-enabling the > Other interrupt right away when there is overrun. Instead, we wait until > traffic subsides, napi polling mode is exited and interrupts are > re-enabled. > > Reported-by: Lennart Sorensen> Fixes: 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt") > Signed-off-by: Benjamin Poirier Any chance of this fix hitting -stable? After all adapter reset under load is not nice. -- Len Sorensen
Re: [PATCH net] ipv6: no need to return rt->dst.error if it is not null entry.
On Thu, Jul 20, 2017 at 8:23 AM, Hangbin Liuwrote: > 2017-07-20 23:06 GMT+08:00 Hangbin Liu : >>> +++ b/net/ipv6/route.c >>> @@ -3637,12 +3637,6 @@ static int inet6_rtm_getroute(struct sk_buff >>> *in_skb, struct nlmsghdr *nlh, >>> dst = ip6_route_lookup(net, , 0); >>> >>> rt = container_of(dst, struct rt6_info, dst); >>> - if (rt->dst.error) { >>> - err = rt->dst.error; >>> - ip6_rt_put(rt); >>> - goto errout; >>> - } >> >> hmm... or instead of remove this check, should we check all the entry? Like >> if ((rt->dst.error && rt != net->ipv6.ip6_null_entry && rt != > ^^ mistake here >> net->ipv6.ip6_blk_hole_entry) || >> rt == net->ipv6.ip6_null_entry ) > > Sorry, there should be no need to check ip6_null_entry since the > error is already > -ENETUNREACH. So how about Hmm? All of these 3 entries have error set, right?? So we should only check dst.error...
[PATCH 1/5] e1000e: Fix error path in link detection
In case of error from e1e_rphy(), the loop will exit early and "success" will be set to true erroneously. Signed-off-by: Benjamin Poirier--- drivers/net/ethernet/intel/e1000e/phy.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c index d78d47b41a71..86ff0969efb6 100644 --- a/drivers/net/ethernet/intel/e1000e/phy.c +++ b/drivers/net/ethernet/intel/e1000e/phy.c @@ -1744,6 +1744,7 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations, s32 ret_val = 0; u16 i, phy_status; + *success = false; for (i = 0; i < iterations; i++) { /* Some PHYs require the MII_BMSR register to be read * twice due to the link bit being sticky. No harm doing @@ -1763,16 +1764,16 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations, ret_val = e1e_rphy(hw, MII_BMSR, _status); if (ret_val) break; - if (phy_status & BMSR_LSTATUS) + if (phy_status & BMSR_LSTATUS) { + *success = true; break; + } if (usec_interval >= 1000) msleep(usec_interval / 1000); else udelay(usec_interval); } - *success = (i < iterations); - return ret_val; } -- 2.13.2
[PATCH 3/5] e1000e: Fix return value test
All the helpers return -E1000_ERR_PHY. Signed-off-by: Benjamin Poirier--- drivers/net/ethernet/intel/e1000e/netdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index 58a87134d2e5..fc6a1db2 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -5099,7 +5099,7 @@ static bool e1000e_has_link(struct e1000_adapter *adapter) break; } - if ((ret_val == E1000_ERR_PHY) && (hw->phy.type == e1000_phy_igp_3) && + if ((ret_val == -E1000_ERR_PHY) && (hw->phy.type == e1000_phy_igp_3) && (er32(CTRL) & E1000_PHY_CTRL_GBE_DISABLE)) { /* See e1000_kmrn_lock_loss_workaround_ich8lan() */ e_info("Gigabit has been disabled, downgrading speed\n"); -- 2.13.2
[PATCH 5/5] e1000e: Avoid receiver overrun interrupt bursts
When e1000e_poll() is not fast enough to keep up with incoming traffic, the adapter (when operating in msix mode) raises the Other interrupt to signal Receiver Overrun. This is a double problem because 1) at the moment e1000_msix_other() assumes that it is only called in case of Link Status Change and 2) if the condition persists, the interrupt is repeatedly raised again in quick succession. Ideally we would configure the Other interrupt to not be raised in case of receiver overrun but this doesn't seem possible on this adapter. Instead, we handle the first part of the problem by reverting to the practice of reading ICR in the other interrupt handler, like before commit 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt"). Thanks to commit 0a8047ac68e5 ("e1000e: Fix msi-x interrupt automask") which cleared IAME from CTRL_EXT, reading ICR doesn't interfere with RxQ0, TxQ0 interrupts anymore. We handle the second part of the problem by not re-enabling the Other interrupt right away when there is overrun. Instead, we wait until traffic subsides, napi polling mode is exited and interrupts are re-enabled. Reported-by: Lennart SorensenFixes: 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt") Signed-off-by: Benjamin Poirier --- drivers/net/ethernet/intel/e1000e/defines.h | 1 + drivers/net/ethernet/intel/e1000e/netdev.c | 33 +++-- 2 files changed, 27 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/intel/e1000e/defines.h b/drivers/net/ethernet/intel/e1000e/defines.h index 0641c0098738..afb7ebe20b24 100644 --- a/drivers/net/ethernet/intel/e1000e/defines.h +++ b/drivers/net/ethernet/intel/e1000e/defines.h @@ -398,6 +398,7 @@ #define E1000_ICR_LSC 0x0004 /* Link Status Change */ #define E1000_ICR_RXSEQ 0x0008 /* Rx sequence error */ #define E1000_ICR_RXDMT00x0010 /* Rx desc min. threshold (0) */ +#define E1000_ICR_RXO 0x0040 /* Receiver Overrun */ #define E1000_ICR_RXT0 0x0080 /* Rx timer intr (ring 0) */ #define E1000_ICR_ECCER 0x0040 /* Uncorrectable ECC Error */ /* If this bit asserted, the driver should claim the interrupt */ diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index 5a8ab1136566..803edd1a6401 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -1910,12 +1910,30 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, void *data) struct net_device *netdev = data; struct e1000_adapter *adapter = netdev_priv(netdev); struct e1000_hw *hw = >hw; + u32 icr; + bool enable = true; + + icr = er32(ICR); + if (icr & E1000_ICR_RXO) { + ew32(ICR, E1000_ICR_RXO); + enable = false; + /* napi poll will re-enable Other, make sure it runs */ + if (napi_schedule_prep(>napi)) { + adapter->total_rx_bytes = 0; + adapter->total_rx_packets = 0; + __napi_schedule(>napi); + } + } + if (icr & E1000_ICR_LSC) { + ew32(ICR, E1000_ICR_LSC); + hw->mac.get_link_status = true; + /* guard against interrupt when we're going down */ + if (!test_bit(__E1000_DOWN, >state)) { + mod_timer(>watchdog_timer, jiffies + 1); + } + } - hw->mac.get_link_status = true; - - /* guard against interrupt when we're going down */ - if (!test_bit(__E1000_DOWN, >state)) { - mod_timer(>watchdog_timer, jiffies + 1); + if (enable && !test_bit(__E1000_DOWN, >state)) { ew32(IMS, E1000_IMS_OTHER); } @@ -2687,7 +2705,8 @@ static int e1000e_poll(struct napi_struct *napi, int weight) napi_complete_done(napi, work_done); if (!test_bit(__E1000_DOWN, >state)) { if (adapter->msix_entries) - ew32(IMS, adapter->rx_ring->ims_val); + ew32(IMS, adapter->rx_ring->ims_val | +E1000_IMS_OTHER); else e1000_irq_enable(adapter); } @@ -4204,7 +4223,7 @@ static void e1000e_trigger_lsc(struct e1000_adapter *adapter) struct e1000_hw *hw = >hw; if (adapter->msix_entries) - ew32(ICS, E1000_ICS_OTHER); + ew32(ICS, E1000_ICS_LSC | E1000_ICS_OTHER); else ew32(ICS, E1000_ICS_LSC); } -- 2.13.2
[PATCH 2/5] e1000e: Fix wrong comment related to link detection
Reading e1000e_check_for_copper_link() shows that get_link_status is set to false after link has been detected. Therefore, it stays TRUE until then. Signed-off-by: Benjamin Poirier--- drivers/net/ethernet/intel/e1000e/netdev.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index 2dcb5463d9b8..58a87134d2e5 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -5074,7 +5074,7 @@ static bool e1000e_has_link(struct e1000_adapter *adapter) /* get_link_status is set on LSC (link status) interrupt or * Rx sequence error interrupt. get_link_status will stay -* false until the check_for_link establishes link +* true until the check_for_link establishes link * for copper adapters ONLY */ switch (hw->phy.media_type) { @@ -5092,7 +5092,7 @@ static bool e1000e_has_link(struct e1000_adapter *adapter) break; case e1000_media_type_internal_serdes: ret_val = hw->mac.ops.check_for_link(hw); - link_active = adapter->hw.mac.serdes_has_link; + link_active = hw->mac.serdes_has_link; break; default: case e1000_media_type_unknown: -- 2.13.2
[PATCH 4/5] e1000e: Separate signaling for link check/link up
Lennart reported the following race condition: \ e1000_watchdog_task \ e1000e_has_link \ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link /* link is up */ mac->get_link_status = false; /* interrupt */ \ e1000_msix_other hw->mac.get_link_status = true; link_active = !hw->mac.get_link_status /* link_active is false, wrongly */ This problem arises because the single flag get_link_status is used to signal two different states: link status needs checking and link status is down. Avoid the problem by using the return value of .check_for_link to signal the link status to e1000e_has_link(). Reported-by: Lennart SorensenSigned-off-by: Benjamin Poirier --- drivers/net/ethernet/intel/e1000e/mac.c| 11 --- drivers/net/ethernet/intel/e1000e/netdev.c | 2 +- 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/intel/e1000e/mac.c b/drivers/net/ethernet/intel/e1000e/mac.c index b322011ec282..f457c5703d0c 100644 --- a/drivers/net/ethernet/intel/e1000e/mac.c +++ b/drivers/net/ethernet/intel/e1000e/mac.c @@ -410,6 +410,9 @@ void e1000e_clear_hw_cntrs_base(struct e1000_hw *hw) * Checks to see of the link status of the hardware has changed. If a * change in link status has been detected, then we read the PHY registers * to get the current speed/duplex if link exists. + * + * Returns a negative error code (-E1000_ERR_*) or 0 (link down) or 1 (link + * up). **/ s32 e1000e_check_for_copper_link(struct e1000_hw *hw) { @@ -423,7 +426,7 @@ s32 e1000e_check_for_copper_link(struct e1000_hw *hw) * Change or Rx Sequence Error interrupt. */ if (!mac->get_link_status) - return 0; + return 1; /* First we want to see if the MII Status Register reports * link. If so, then we want to get the current speed/duplex @@ -461,10 +464,12 @@ s32 e1000e_check_for_copper_link(struct e1000_hw *hw) * different link partner. */ ret_val = e1000e_config_fc_after_link_up(hw); - if (ret_val) + if (ret_val) { e_dbg("Error configuring flow control\n"); + return ret_val; + } - return ret_val; + return 1; } /** diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index fc6a1db2..5a8ab1136566 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -5081,7 +5081,7 @@ static bool e1000e_has_link(struct e1000_adapter *adapter) case e1000_media_type_copper: if (hw->mac.get_link_status) { ret_val = hw->mac.ops.check_for_link(hw); - link_active = !hw->mac.get_link_status; + link_active = ret_val > 0; } else { link_active = true; } -- 2.13.2
[patch net-next] mlxsw: spectrum_router: Don't batch neighbour deletion
From: Ido SchimmelCurrent firmware supported by the driver doesn't support batch deletion of IPv6 neighbours on a given router interface (RIF). Until a new version that supports this functionality is made available, delete neighbours one by one. Signed-off-by: Ido Schimmel Signed-off-by: Jiri Pirko --- drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 15 +++ 1 file changed, 3 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c index e6d629f..548552c 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c @@ -1419,25 +1419,16 @@ static void mlxsw_sp_neigh_fini(struct mlxsw_sp *mlxsw_sp) rhashtable_destroy(_sp->router->neigh_ht); } -static int mlxsw_sp_neigh_rif_flush(struct mlxsw_sp *mlxsw_sp, - const struct mlxsw_sp_rif *rif) -{ - char rauht_pl[MLXSW_REG_RAUHT_LEN]; - - mlxsw_reg_rauht_pack(rauht_pl, MLXSW_REG_RAUHT_OP_WRITE_DELETE_ALL, -rif->rif_index, rif->addr); - return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(rauht), rauht_pl); -} - static void mlxsw_sp_neigh_rif_gone_sync(struct mlxsw_sp *mlxsw_sp, struct mlxsw_sp_rif *rif) { struct mlxsw_sp_neigh_entry *neigh_entry, *tmp; - mlxsw_sp_neigh_rif_flush(mlxsw_sp, rif); list_for_each_entry_safe(neigh_entry, tmp, >neigh_list, -rif_list_node) +rif_list_node) { + mlxsw_sp_neigh_entry_update(mlxsw_sp, neigh_entry, false); mlxsw_sp_neigh_entry_destroy(mlxsw_sp, neigh_entry); + } } struct mlxsw_sp_nexthop_key { -- 2.9.3
[PATCH net-next 0/2] rxrpc: Rearrange headers
Here's a pair of patches that rearrange some of the AF_RXRPC header files that are outside of the net/rxrpc/ directory: (1) The bits userspace need are moved to uapi/linux/rxrpc.h. [Should this be af_rxrpc.h instead, I wonder - but there doesn't seem to be precedent for that in the other net UAPI headers.] (2) For the most part, the contents of rxrpc/packet.h are no longer used outside of the AF_RXRPC module, so move them to net/rxrpc/protocol.h with the exception of the standard abort codes which are exposed to userspace when an abort occurs and the security index values which are needed when constructing keys. The patches can be found here also: http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rewrite Tagged thusly: git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git rxrpc-rewrite-20170721 David --- David Howells (2): rxrpc: Expose UAPI definitions to userspace rxrpc: Move the packet.h include file into net/rxrpc/ fs/afs/misc.c |1 fs/afs/rxrpc.c |1 include/linux/rxrpc.h | 79 --- include/rxrpc/packet.h | 235 include/uapi/linux/rxrpc.h | 124 +++ net/rxrpc/ar-internal.h|2 net/rxrpc/protocol.h | 190 7 files changed, 315 insertions(+), 317 deletions(-) delete mode 100644 include/linux/rxrpc.h delete mode 100644 include/rxrpc/packet.h create mode 100644 include/uapi/linux/rxrpc.h create mode 100644 net/rxrpc/protocol.h
[PATCH net-next 1/2] rxrpc: Expose UAPI definitions to userspace
Move UAPI definitions from the internal header and place them in a UAPI header file so that userspace can make use of them. Signed-off-by: David Howells--- include/linux/rxrpc.h | 79 --- include/uapi/linux/rxrpc.h | 80 2 files changed, 80 insertions(+), 79 deletions(-) delete mode 100644 include/linux/rxrpc.h create mode 100644 include/uapi/linux/rxrpc.h diff --git a/include/linux/rxrpc.h b/include/linux/rxrpc.h deleted file mode 100644 index 7343f71783dc.. --- a/include/linux/rxrpc.h +++ /dev/null @@ -1,79 +0,0 @@ -/* AF_RXRPC parameters - * - * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved. - * Written by David Howells (dhowe...@redhat.com) - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -#ifndef _LINUX_RXRPC_H -#define _LINUX_RXRPC_H - -#include -#include - -/* - * RxRPC socket address - */ -struct sockaddr_rxrpc { - sa_family_t srx_family; /* address family */ - u16 srx_service;/* service desired */ - u16 transport_type; /* type of transport socket (SOCK_DGRAM) */ - u16 transport_len; /* length of transport address */ - union { - sa_family_t family; /* transport address family */ - struct sockaddr_in sin; /* IPv4 transport address */ - struct sockaddr_in6 sin6; /* IPv6 transport address */ - } transport; -}; - -/* - * RxRPC socket options - */ -#define RXRPC_SECURITY_KEY 1 /* [clnt] set client security key */ -#define RXRPC_SECURITY_KEYRING 2 /* [srvr] set ring of server security keys */ -#define RXRPC_EXCLUSIVE_CONNECTION 3 /* Deprecated; use RXRPC_EXCLUSIVE_CALL instead */ -#define RXRPC_MIN_SECURITY_LEVEL 4 /* minimum security level */ -#define RXRPC_UPGRADEABLE_SERVICE 5 /* Upgrade service[0] -> service[1] */ -#define RXRPC_SUPPORTED_CMSG 6 /* Get highest supported control message type */ - -/* - * RxRPC control messages - * - If neither abort or accept are specified, the message is a data message. - * - terminal messages mean that a user call ID tag can be recycled - * - s/r/- indicate whether these are applicable to sendmsg() and/or recvmsg() - */ -enum rxrpc_cmsg_type { - RXRPC_USER_CALL_ID = 1,/* sr: user call ID specifier */ - RXRPC_ABORT = 2,/* sr: abort request / notification [terminal] */ - RXRPC_ACK = 3,/* -r: [Service] RPC op final ACK received [terminal] */ - RXRPC_NET_ERROR = 5,/* -r: network error received [terminal] */ - RXRPC_BUSY = 6,/* -r: server busy received [terminal] */ - RXRPC_LOCAL_ERROR = 7,/* -r: local error generated [terminal] */ - RXRPC_NEW_CALL = 8,/* -r: [Service] new incoming call notification */ - RXRPC_ACCEPT= 9,/* s-: [Service] accept request */ - RXRPC_EXCLUSIVE_CALL= 10, /* s-: Call should be on exclusive connection */ - RXRPC_UPGRADE_SERVICE = 11, /* s-: Request service upgrade for client call */ - RXRPC_TX_LENGTH = 12, /* s-: Total length of Tx data */ - RXRPC__SUPPORTED -}; - -/* - * RxRPC security levels - */ -#define RXRPC_SECURITY_PLAIN 0 /* plain secure-checksummed packets only */ -#define RXRPC_SECURITY_AUTH1 /* authenticated packets */ -#define RXRPC_SECURITY_ENCRYPT 2 /* encrypted packets */ - -/* - * RxRPC security indices - */ -#define RXRPC_SECURITY_NONE0 /* no security protocol */ -#define RXRPC_SECURITY_RXKAD 2 /* kaserver or kerberos 4 */ -#define RXRPC_SECURITY_RXGK4 /* gssapi-based */ -#define RXRPC_SECURITY_RXK55 /* kerberos 5 */ - -#endif /* _LINUX_RXRPC_H */ diff --git a/include/uapi/linux/rxrpc.h b/include/uapi/linux/rxrpc.h new file mode 100644 index ..08e2fb9c70ae --- /dev/null +++ b/include/uapi/linux/rxrpc.h @@ -0,0 +1,80 @@ +/* Types and definitions for AF_RXRPC. + * + * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowe...@redhat.com) + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public Licence + * as published by the Free Software Foundation; either version + * 2 of the Licence, or (at your option) any later version. + */ + +#ifndef _UAPI_LINUX_RXRPC_H +#define _UAPI_LINUX_RXRPC_H + +#include +#include +#include + +/* + * RxRPC socket address + */ +struct sockaddr_rxrpc { + sa_family_t srx_family; /* address family */ + u16
[PATCH net-next 2/2] rxrpc: Move the packet.h include file into net/rxrpc/
Move the protocol description header file into net/rxrpc/ and rename it to protocol.h. It's no longer necessary to expose it as packets are no longer exposed to kernel services (such as AFS) that use the facility. The abort codes are transferred to the UAPI header instead as we pass these back to userspace and also to kernel services. Signed-off-by: David Howells--- fs/afs/misc.c |1 fs/afs/rxrpc.c |1 include/rxrpc/packet.h | 235 include/uapi/linux/rxrpc.h | 44 net/rxrpc/ar-internal.h|2 net/rxrpc/protocol.h | 190 6 files changed, 235 insertions(+), 238 deletions(-) delete mode 100644 include/rxrpc/packet.h create mode 100644 net/rxrpc/protocol.h diff --git a/fs/afs/misc.c b/fs/afs/misc.c index 100b207efc9e..c05f1f1c0d41 100644 --- a/fs/afs/misc.c +++ b/fs/afs/misc.c @@ -12,7 +12,6 @@ #include #include #include -#include #include "internal.h" #include "afs_fs.h" diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c index 02781e78ffb6..10743043d431 100644 --- a/fs/afs/rxrpc.c +++ b/fs/afs/rxrpc.c @@ -14,7 +14,6 @@ #include #include -#include #include "internal.h" #include "afs_cm.h" diff --git a/include/rxrpc/packet.h b/include/rxrpc/packet.h deleted file mode 100644 index a2dcfb850b9f.. --- a/include/rxrpc/packet.h +++ /dev/null @@ -1,235 +0,0 @@ -/* packet.h: Rx packet layout and definitions - * - * Copyright (C) 2002, 2007 Red Hat, Inc. All Rights Reserved. - * Written by David Howells (dhowe...@redhat.com) - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * as published by the Free Software Foundation; either version - * 2 of the License, or (at your option) any later version. - */ - -#ifndef _LINUX_RXRPC_PACKET_H -#define _LINUX_RXRPC_PACKET_H - -typedef u32rxrpc_seq_t;/* Rx message sequence number */ -typedef u32rxrpc_serial_t; /* Rx message serial number */ -typedef __be32 rxrpc_seq_net_t; /* on-the-wire Rx message sequence number */ -typedef __be32 rxrpc_serial_net_t; /* on-the-wire Rx message serial number */ - -/*/ -/* - * on-the-wire Rx packet header - * - all multibyte fields should be in network byte order - */ -struct rxrpc_wire_header { - __be32 epoch; /* client boot timestamp */ -#define RXRPC_RANDOM_EPOCH 0x8000 /* Random if set, date-based if not */ - - __be32 cid;/* connection and channel ID */ -#define RXRPC_MAXCALLS 4 /* max active calls per conn */ -#define RXRPC_CHANNELMASK (RXRPC_MAXCALLS-1) /* mask for channel ID */ -#define RXRPC_CIDMASK (~RXRPC_CHANNELMASK)/* mask for connection ID */ -#define RXRPC_CIDSHIFT ilog2(RXRPC_MAXCALLS) /* shift for connection ID */ -#define RXRPC_CID_INC (1 << RXRPC_CIDSHIFT) /* connection ID increment */ - - __be32 callNumber; /* call ID (0 for connection-level packets) */ - __be32 seq;/* sequence number of pkt in call stream */ - __be32 serial; /* serial number of pkt sent to network */ - - uint8_t type; /* packet type */ -#define RXRPC_PACKET_TYPE_DATA 1 /* data */ -#define RXRPC_PACKET_TYPE_ACK 2 /* ACK */ -#define RXRPC_PACKET_TYPE_BUSY 3 /* call reject */ -#define RXRPC_PACKET_TYPE_ABORT4 /* call/connection abort */ -#define RXRPC_PACKET_TYPE_ACKALL 5 /* ACK all outstanding packets on call */ -#define RXRPC_PACKET_TYPE_CHALLENGE6 /* connection security challenge (SRVR->CLNT) */ -#define RXRPC_PACKET_TYPE_RESPONSE 7 /* connection secutity response (CLNT->SRVR) */ -#define RXRPC_PACKET_TYPE_DEBUG8 /* debug info request */ -#define RXRPC_PACKET_TYPE_VERSION 13 /* version string request */ -#define RXRPC_N_PACKET_TYPES 14 /* number of packet types (incl type 0) */ - - uint8_t flags; /* packet flags */ -#define RXRPC_CLIENT_INITIATED 0x01/* signifies a packet generated by a client */ -#define RXRPC_REQUEST_ACK 0x02/* request an unconditional ACK of this packet */ -#define RXRPC_LAST_PACKET 0x04/* the last packet from this side for this call */ -#define RXRPC_MORE_PACKETS 0x08/* more packets to come */ -#define RXRPC_JUMBO_PACKET 0x20/* [DATA] this is a jumbo packet */ -#define RXRPC_SLOW_START_OK0x20/* [ACK] slow start supported */ - - uint8_t userStatus; /* app-layer defined status */ -#define RXRPC_USERSTATUS_SERVICE_UPGRADE 0x01 /* AuriStor service
Re: [PATCH v2 3/4] net: ethernet: nb8800: Fix RGMII TX clock delay setup
On 07/21/2017 07:24 AM, Måns Rullgård wrote: > Marc Gonzalezwrites: > >> On 21/07/2017 15:47, Måns Rullgård wrote: >> >>> Marc Gonzalez wrote: >>> On 21/07/2017 15:04, Måns Rullgård wrote: > Marc Gonzalez wrote: > >> According to commit e5f3a4a56ce2a707b2fb8ce37e4414dcac89c672 >> ("Documentation: devicetree: clarify usage of the RGMII phy-modes") >> there are 4 RGMII modes to handle: >> >> "rgmii" (RX and TX delays are added by the MAC when required) >> "rgmii-id" (RGMII with internal RX and TX delays provided by the PHY, >> the MAC should not add the RX or TX delays in this case) >> "rgmii-rxid" (RGMII with internal RX delay provided by the PHY, >> the MAC should not add an RX delay in this case) >> "rgmii-txid" (RGMII with internal TX delay provided by the PHY, >> the MAC should not add an TX delay in this case) >> >> Add TX delay in the MAC only for rgmii and rgmii-rxid. >> >> Signed-off-by: Marc Gonzalez >> --- >> drivers/net/ethernet/aurora/nb8800.c | 6 -- >> 1 file changed, 4 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/net/ethernet/aurora/nb8800.c >> b/drivers/net/ethernet/aurora/nb8800.c >> index ded041dbafe7..f3ed320eb4ad 100644 >> --- a/drivers/net/ethernet/aurora/nb8800.c >> +++ b/drivers/net/ethernet/aurora/nb8800.c >> @@ -1268,11 +1268,13 @@ static int nb8800_tangox_init(struct net_device >> *dev) >> break; >> >> case PHY_INTERFACE_MODE_RGMII: >> -pad_mode = PAD_MODE_RGMII; >> +case PHY_INTERFACE_MODE_RGMII_RXID: >> +pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY; >> break; >> >> +case PHY_INTERFACE_MODE_RGMII_ID: >> case PHY_INTERFACE_MODE_RGMII_TXID: >> -pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY; >> +pad_mode = PAD_MODE_RGMII; >> break; >> >> default: > > I still don't like this. Having both the MAC and PHY drivers react to > the phy-connection-type property is bound to cause trouble somewhere. > > The only way out of the current mess is to define new properties for > both MAC and PHY that override the existing ones if present. Do you mean defining 4 new bindings and their corresponding phy_interface_t enum values? For example: "rgmii-v2" "rgmii-id-v2" "rgmii-rxid-v2" "rgmii-txid-v2" PHY_INTERFACE_MODE_RGMII_V2, PHY_INTERFACE_MODE_RGMII_ID_V2, PHY_INTERFACE_MODE_RGMII_RXID_V2, PHY_INTERFACE_MODE_RGMII_TXID_V2, And then handling these new enums in the at803x and nb8800 drivers? >>> >>> It has already been suggested to add new properties specifying desired >>> delays in picoseconds. If present on the MAC node, the MAC driver >>> should attempt to provide the delay, and if present on the PHY node, the >>> PHY driver is responsible. >> >> Sorry, I had already forgotten about Florian's suggestion: >>> If you introduced PHY and/or MAC specific properties to configure the >>> delays in the appropriate unit of time (say ps), you could use a >>> non-compliant 'phy-mode' just to satisfy the driver/PHY library and >>> still override the delays you need. >> >> So we would need two properties (RX and TX). >> "rgmii_rx_skew_ps" and "rgmii_tx_skew_ps" >> >> but it's not clear to me how the MAC probe function communicates >> the arguments to the phy driver. Looks like we would need to add >> two fields to struct phy_device, and maybe define a new phy-mode >> to instruct the PHY driver to look for the two fields. > > There's no need for the drivers to communicate. The location of the > properties in the device tree determines which driver should deal with > it. Exactly. I really like how the PHY delay properties are defined in Documentation/devicetree/bindings/net/micrel-ksz90x1.txt because they are quite generic and for a MAC counterpart those defined in Documentation/devicetree/bindings/net/dwmac-sun8i.txt and Documentation/devicetree/bindings/net/meson-dwmac.txt are also good examples. > >> I don't have time to work on that for now, but I do need to >> fix the nb8800 driver now. Can we apply the following patch >> in the interim? >> >> diff --git a/drivers/net/ethernet/aurora/nb8800.c >> b/drivers/net/ethernet/aurora/nb8800.c >> index ded041dbafe7..e94159507847 100644 >> --- a/drivers/net/ethernet/aurora/nb8800.c >> +++ b/drivers/net/ethernet/aurora/nb8800.c >> @@ -1268,11 +1268,10 @@ static int nb8800_tangox_init(struct net_device *dev) >> break; >> >> case PHY_INTERFACE_MODE_RGMII: >> - pad_mode = PAD_MODE_RGMII; >> - break; >> - >> + case PHY_INTERFACE_MODE_RGMII_ID: >> + case
Re: [PATCH v2 2/4] net: ethernet: nb8800: Set RGMII_MODE for all RGMII modes
On 07/21/2017 04:25 AM, Marc Gonzalez wrote: > There are 4 RGMII phy-modes to handle. ... so make sure that the MAC is configured to set the RGMII_MODE accordingly for all 4 RGMII mode. > > Signed-off-by: Marc GonzalezFixes: 52dfc8301248 ("net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller") Reviewed-by: Florian Fainelli > --- > drivers/net/ethernet/aurora/nb8800.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/aurora/nb8800.c > b/drivers/net/ethernet/aurora/nb8800.c > index 041cfb7952f8..ded041dbafe7 100644 > --- a/drivers/net/ethernet/aurora/nb8800.c > +++ b/drivers/net/ethernet/aurora/nb8800.c > @@ -609,7 +609,7 @@ static void nb8800_mac_config(struct net_device *dev) > mac_mode |= HALF_DUPLEX; > > if (gigabit) { > - if (priv->phy_mode == PHY_INTERFACE_MODE_RGMII) > + if (phy_interface_is_rgmii(dev->phydev)) > mac_mode |= RGMII_MODE; > > mac_mode |= GMAC_MODE; > -- Florian
Re: A buggy behavior for Linux TCP Reno and HTCP
On Thu, Jul 20, 2017 at 2:28 PM, Wei Sunwrote: > Hi Yuchung, > > Sorry for the confusion. The test case was adapted from an old DSACK > test case (i.e., forget to remove something). > > Attached is a new and simple one. Thanks Note that the test scenario is fairly rare IMO: the connection first experience timeouts, then its retransmission got acked, then the original packets get Acked (ack w/ val 1400 ecr 130). It can be really long reordering, or reordering plus packet loss. The Linux undo state machines may not handle this perfectly, but it's probably not worth extra state for such rare events. > > > > On Wed, Jul 19, 2017 at 2:31 PM, Yuchung Cheng wrote: >> On Tue, Jul 18, 2017 at 2:36 PM, Wei Sun wrote: >>> Hi there, >>> >>> We find a buggy behavior when using Linux TCP Reno and HTCP in low >>> bandwidth or highly congested network environments. >>> >>> In a simple word, their undo functions may mistakenly double the cwnd, >>> leading to a more aggressive behavior in a highly congested scenario. >>> >>> >>> The detailed reason: >>> >>> The current reno undo function assumes cwnd halving (and thus doubles >>> the cwnd), but it doesn't consider a corner case condition that >>> ssthresh is at least 2. >>> >>> e.g., >>> cwnd ssth >>> An initial state: 25 >>> A spurious loss: 12 >>> Undo: 45 >>> >>> Here the cwnd after undo is two times as that before undo. Attached is >>> a simple script to reproduce it. >> the packetdrill script is a bit confusing: it disables SACK but then >> the client returns ACK w/ SACKs, also 3 dupacks happen after RTO so >> the sender isn't technically going through a fast recovery... >> >> could you provide a better test? >> >>> >>> A similar reason for HTCP, so we recommend to store the cwnd on loss >>> in .ssthresh implementation and restore it again in .undo_cwnd for TCP >>> Reno and HTCP implementations. >>> >>> Thanks
Re: [RFC 1/2] net-next: add a dma_desc element to struct skb_shared_info
On 21/07/17 17:56, Paolo Abeni wrote: Hi, On Fri, 2017-07-21 at 17:20 +0200, John Crispin wrote: In order to make HW flow offloading work in latest MediaTek silicon we need to propagate part of the RX DMS descriptor to the upper layers populating the flow offload engines HW tables. This patch adds an extra element to struct skb_shared_info allowing the ethernet drivers RX napi code to store the required information and make it persistent for the lifecycle of the skb and its clones. Signed-off-by: John Crispin--- include/linux/skbuff.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 4093552be1de..db9576cd946b 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -426,6 +426,7 @@ struct skb_shared_info { unsigned intgso_type; u32 tskey; __be32 ip6_frag_id; + u32 dma_desc; /* * Warning : all fields before dataref are cleared in __alloc_skb() This will increase the skb_shared_info struct size, which is already quite large, and can have several kind of performance drawback. AFAIK this is discouraged. I don't understand the use case; the driver will set this field, but who is going to consume it? Thanks, Paolo Hi Paolo, When the flow offloading engine forwards a packet to the DMA it will send additional info to the sw path. this includes * physical switch port * internal flow hash - this is required to populate the correct flow table entry * ppe state - this indicates what state the PPEs internal table is in for the flow * the reason why the packet was forwarde - these are things like bind, unbind, timed out, ... once the flow table offloading patches are ready and upstream, the netfilter layer will see the SKB and pass it o to the flow table offloading code, at which point it will finally end up inside the offloading driver. this will need to have access to this info sent to the sw path inside the rx descriptor to properly work out what state the flow is in and which table entry to populate in the HW table for offloading to work. Hope that is a little clearer. current hackish driver is here [1], the patch to the ethernet driver is here [2] John [1] https://git.lede-project.org/?p=lede/blogic/staging.git;a=tree;f=target/linux/mediatek/files/drivers/net/ethernet/mediatek/mtk_hnat;hb=bc0518b9d928b43d965d8a1f8860281f0ae6a31c [2] https://git.lede-project.org/?p=lede/blogic/staging.git;a=blob;f=target/linux/mediatek/patches-4.9/0310-hwnat.patch;h=57bd0c07b39d2169f3ba08e1aa83b92dffcee025;hb=bc0518b9d928b43d965d8a1f8860281f0ae6a31c
Re: [patch net-next] mlxsw: spectrum_router: Don't ignore IPv6 notifications
On Fri, Jul 21, 2017 at 06:05:23PM +0200, Jiri Pirko wrote: > From: Ido Schimmel> > We now have all the necessary IPv6 infrastructure in place, so stop > ignoring these notifications. > > Signed-off-by: Ido Schimmel > Signed-off-by: Jiri Pirko Dave, we sent the wrong patch. Please ignore. Thanks
[PATCH net] net/socket: fix type in assignment and trim long line
The commit ffb07550c76f ("copy_msghdr_from_user(): get rid of field-by-field copyin") introduce a new sparse warning: net/socket.c:1919:27: warning: incorrect type in assignment (different address spaces) net/socket.c:1919:27:expected void *msg_control net/socket.c:1919:27:got void [noderef] *[addressable] msg_control and a line above 80 chars, let's fix them Fixes: ffb07550c76f ("copy_msghdr_from_user(): get rid of field-by-field copyin") Signed-off-by: Paolo Abeni--- I noticed only because I had a very similar commit pending; replacing the field-by-field copyin with a single copy_from_user() gives measurable performance improvements - more than 10% under UDP flood if using recvmsg() to sink the traffic --- net/socket.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/net/socket.c b/net/socket.c index bf21226..ad22df1 100644 --- a/net/socket.c +++ b/net/socket.c @@ -1916,7 +1916,7 @@ static int copy_msghdr_from_user(struct msghdr *kmsg, if (copy_from_user(, umsg, sizeof(*umsg))) return -EFAULT; - kmsg->msg_control = msg.msg_control; + kmsg->msg_control = (void __force *)msg.msg_control; kmsg->msg_controllen = msg.msg_controllen; kmsg->msg_flags = msg.msg_flags; @@ -1935,7 +1935,8 @@ static int copy_msghdr_from_user(struct msghdr *kmsg, if (msg.msg_name && kmsg->msg_namelen) { if (!save_addr) { - err = move_addr_to_kernel(msg.msg_name, kmsg->msg_namelen, + err = move_addr_to_kernel(msg.msg_name, + kmsg->msg_namelen, kmsg->msg_name); if (err < 0) return err; -- 2.9.4
Re: [PATCHv2 net] ipv6: should not return rt->dst.error if it is prohibit or blk hole entry.
Hi Hangbin, [auto build test WARNING on net/master] url: https://github.com/0day-ci/linux/commits/Hangbin-Liu/ipv6-should-not-return-rt-dst-error-if-it-is-prohibit-or-blk-hole-entry/20170721-204554 config: x86_64-randconfig-x003-07211556 (attached as .config) compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All warnings (new ones prefixed by >>): In file included from include/uapi/linux/stddef.h:1:0, from include/linux/stddef.h:4, from include/uapi/linux/posix_types.h:4, from include/uapi/linux/types.h:13, from include/linux/types.h:5, from include/uapi/linux/capability.h:16, from include/linux/capability.h:15, from net/ipv6/route.c:29: net/ipv6/route.c: In function 'inet6_rtm_getroute': net/ipv6/route.c:3640:38: error: 'struct netns_ipv6' has no member named 'ip6_prohibit_entry'; did you mean 'ip6_null_entry'? if (rt->dst.error && rt != net->ipv6.ip6_prohibit_entry && ^ include/linux/compiler.h:156:30: note: in definition of macro '__trace_if' if (__builtin_constant_p(!!(cond)) ? !!(cond) : \ ^~~~ >> net/ipv6/route.c:3640:2: note: in expansion of macro 'if' if (rt->dst.error && rt != net->ipv6.ip6_prohibit_entry && ^~ net/ipv6/route.c:3641:21: error: 'struct netns_ipv6' has no member named 'ip6_blk_hole_entry'; did you mean 'ip6_null_entry'? rt != net->ipv6.ip6_blk_hole_entry) { ^ include/linux/compiler.h:156:30: note: in definition of macro '__trace_if' if (__builtin_constant_p(!!(cond)) ? !!(cond) : \ ^~~~ >> net/ipv6/route.c:3640:2: note: in expansion of macro 'if' if (rt->dst.error && rt != net->ipv6.ip6_prohibit_entry && ^~ net/ipv6/route.c:3640:38: error: 'struct netns_ipv6' has no member named 'ip6_prohibit_entry'; did you mean 'ip6_null_entry'? if (rt->dst.error && rt != net->ipv6.ip6_prohibit_entry && ^ include/linux/compiler.h:156:42: note: in definition of macro '__trace_if' if (__builtin_constant_p(!!(cond)) ? !!(cond) : \ ^~~~ >> net/ipv6/route.c:3640:2: note: in expansion of macro 'if' if (rt->dst.error && rt != net->ipv6.ip6_prohibit_entry && ^~ net/ipv6/route.c:3641:21: error: 'struct netns_ipv6' has no member named 'ip6_blk_hole_entry'; did you mean 'ip6_null_entry'? rt != net->ipv6.ip6_blk_hole_entry) { ^ include/linux/compiler.h:156:42: note: in definition of macro '__trace_if' if (__builtin_constant_p(!!(cond)) ? !!(cond) : \ ^~~~ >> net/ipv6/route.c:3640:2: note: in expansion of macro 'if' if (rt->dst.error && rt != net->ipv6.ip6_prohibit_entry && ^~ net/ipv6/route.c:3640:38: error: 'struct netns_ipv6' has no member named 'ip6_prohibit_entry'; did you mean 'ip6_null_entry'? if (rt->dst.error && rt != net->ipv6.ip6_prohibit_entry && ^ include/linux/compiler.h:167:16: note: in definition of macro '__trace_if' __r = !!(cond); \ ^~~~ >> net/ipv6/route.c:3640:2: note: in expansion of macro 'if' if (rt->dst.error && rt != net->ipv6.ip6_prohibit_entry && ^~ net/ipv6/route.c:3641:21: error: 'struct netns_ipv6' has no member named 'ip6_blk_hole_entry'; did you mean 'ip6_null_entry'? rt != net->ipv6.ip6_blk_hole_entry) { ^ include/linux/compiler.h:167:16: note: in definition of macro '__trace_if' __r = !!(cond); \ ^~~~ >> net/ipv6/route.c:3640:2: note: in expansion of macro 'if' if (rt->dst.error && rt != net->ipv6.ip6_prohibit_entry && ^~ net/ipv6/route.c: At top level: include/linux/compiler.h:162:4: warning: '__f' is static but declared in inline function 'strcpy' which is not static __f = { \ ^ include/linux/compiler.h:154:23: note: in expansion of macro '__trace_if' #define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) ) ^~ include/linux/string.h:390:2: note: in expansion of macro 'if' if (p_size == (size_t)-1 && q_size == (size_t)-1) ^~ include/linux/compiler.h:162:4: warning: '__f' is static but declared in inline function 'kmemdup' which is not static __f = { \ ^ include/linux/compiler.h:1
pull-request: wireless-drivers 2017-07-21
Hi Dave, important fixes for net which had accumulated while I was away. I only applied the brcmfmac and rtlwifi patches only eight hours ago and I haven't seen the kbuild report yet so they might have some build breakage in theory. But the patches are so small so the chances that they would break something are really small and so I send this to you already now to not delay them any longer. Please let me know if there are any problems. Kalle The following changes since commit 96080f697786e0a30006fcbcc5b53f350fcb3e9f: Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2017-07-20 16:33:39 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git tags/wireless-drivers-for-davem-2017-07-21 for you to fetch changes up to d755cbc26e8295ae8e5d30425364e093b4247a85: Merge tag 'iwlwifi-for-kalle-2017-07-21' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes (2017-07-21 14:33:27 +0300) wireless-drivers fixes for 4.13 Important, but small in size, fixes. brcmfmac * fix a regression in SDIO support introduced in v4.13-rc1 rtlwifi * fix a regression in bluetooth coexistance introduced in v4.13-rc1 iwlwifi * a few NULL pointer dereferences in the recovery flow * a small but important fix for IBSS * a one-liner fix for tracing, which was including too much data Arend Van Spriel (1): brcmfmac: fix regression in brcmf_sdio_txpkt_hdalign() Dan Carpenter (1): iwlwifi: missing error code in iwl_trans_pcie_alloc() Emmanuel Grumbach (3): iwlwifi: dvm: prevent an out of bounds access iwlwifi: mvm: fix a NULL pointer dereference of error in recovery iwlwifi: fix tracing when tx only is enabled Johannes Berg (1): iwlwifi: mvm: defer setting IWL_MVM_STATUS_IN_HW_RESTART Kalle Valo (1): Merge tag 'iwlwifi-for-kalle-2017-07-21' of git://git.kernel.org/.../iwlwifi/iwlwifi-fixes Larry Finger (1): Revert "rtlwifi: btcoex: rtl8723be: fix ant_sel not work" Luca Coelho (1): iwlwifi: mvm: handle IBSS probe_queue in a few missing places Mordechai Goodstein (1): iwlwifi: pcie: fix unused txq NULL pointer dereference drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 3 ++- drivers/net/wireless/intel/iwlwifi/dvm/tx.c | 2 +- drivers/net/wireless/intel/iwlwifi/iwl-devtrace.h | 4 ++-- drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c | 8 +++- drivers/net/wireless/intel/iwlwifi/mvm/mvm.h| 2 ++ drivers/net/wireless/intel/iwlwifi/mvm/ops.c| 6 +++--- drivers/net/wireless/intel/iwlwifi/mvm/sta.c| 15 ++- drivers/net/wireless/intel/iwlwifi/pcie/trans.c | 3 ++- drivers/net/wireless/intel/iwlwifi/pcie/tx.c| 3 +++ drivers/net/wireless/realtek/rtlwifi/rtl8723be/hw.c | 3 --- drivers/net/wireless/realtek/rtlwifi/wifi.h | 1 - 11 files changed, 36 insertions(+), 14 deletions(-) -- Kalle Valo
Re: commit 16ecba59 breaks 82574L under heavy load.
On Fri, Jul 21, 2017 at 11:27:09AM -0400, wrote: > On Thu, Jul 20, 2017 at 04:44:55PM -0700, Benjamin Poirier wrote: > > Could you please test the following patch and let me know if it: > > 1) reduces the interrupt rate of the Other msi-x vector > > 2) avoids the link flaps > > or > > 3) logs some dmesg warnings of the form "Other interrupt with unhandled > > [...]" > > In this case, please paste icr values printed. > > I will give it a try. So test looks excellent. Seems to only get interrupts when link state actually changes now. > Another odd behaviour I see is that the driver will hang in > napi_synchronize on shutdown if there is traffic at the time (at least > I think that's the trigger, maybe the trigger is if there has been an > overload of traffic and the backlog in napi was used). > > From doing some searching, this seems to be a problem that has plagued > some people for years with this driver. > > I am having trouble figuring out exactly what napi_synchronize is waiting > for and who is supposed to toggle the flag it is waiting on. The flag > appears to work backwards from what I would have expected it to do. > I see lots of places that can set the bit, but only napi_enable seems > to clear it again, and I don't see how that would get called for all > the places that potentially set the bit. I just realized NAPI_STATE_SCHED and NAPIF_STATE_SCHED are the same thing and I need to look at both of those. Still something seems odd in some corner case where napi gets stuck and you can't close the port anymore due to napi_synchronize never being able to finish. Some traffic pattern causes that SCHED state bit to get into the wrong state and nothing ever clears it. Even managed to see it get stuck so it never passed traffic again and hung on shutdown. The napi poll was never called again. -- Len Sorensen
[patch net] mlxsw: spectrum_router: Don't offload routes next in list
From: Ido SchimmelEach FIB node holds a linked list of routes sharing the same prefix and length. In the case of IPv4 it's ordered according to table ID, metric and TOS and only the first route in the list is actually programmed to the device. In case a gatewayed route is added somewhere in the list, then after its nexthop group will be refreshed and become valid (due to the resolution of its gateway), it'll mistakenly overwrite the existing entry. Example: 192.168.200.0/24 dev enp3s0np3 scope link metric 1000 offload 192.168.200.0/24 via 192.168.100.1 dev enp3s0np3 metric 1000 offload Both routes are marked as offloaded despite the fact only the first one should actually be present in the device's table. When refreshing the nexthop group, don't write the route to the device's table unless it's the first in its node. Fixes: 9aecce1c7d97 ("mlxsw: spectrum_router: Correctly handle identical routes") Signed-off-by: Ido Schimmel Signed-off-by: Jiri Pirko --- drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c index 383fef5a..4b2e0fd 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c @@ -1512,6 +1512,10 @@ mlxsw_sp_nexthop_group_mac_update(struct mlxsw_sp *mlxsw_sp, static int mlxsw_sp_fib_entry_update(struct mlxsw_sp *mlxsw_sp, struct mlxsw_sp_fib_entry *fib_entry); +static bool +mlxsw_sp_fib_node_entry_is_first(const struct mlxsw_sp_fib_node *fib_node, +const struct mlxsw_sp_fib_entry *fib_entry); + static int mlxsw_sp_nexthop_fib_entries_update(struct mlxsw_sp *mlxsw_sp, struct mlxsw_sp_nexthop_group *nh_grp) @@ -1520,6 +1524,9 @@ mlxsw_sp_nexthop_fib_entries_update(struct mlxsw_sp *mlxsw_sp, int err; list_for_each_entry(fib_entry, _grp->fib_list, nexthop_group_node) { + if (!mlxsw_sp_fib_node_entry_is_first(fib_entry->fib_node, + fib_entry)) + continue; err = mlxsw_sp_fib_entry_update(mlxsw_sp, fib_entry); if (err) return err; -- 2.9.3
[patch net-next] mlxsw: spectrum_router: Don't ignore IPv6 notifications
From: Ido SchimmelWe now have all the necessary IPv6 infrastructure in place, so stop ignoring these notifications. Signed-off-by: Ido Schimmel Signed-off-by: Jiri Pirko --- drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c index 9adb239..fa15f17 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c @@ -3804,7 +3804,7 @@ static int mlxsw_sp_router_fib_event(struct notifier_block *nb, struct fib_notifier_info *info = ptr; struct mlxsw_sp_router *router; - if (!net_eq(info->net, _net) || info->family != AF_INET) + if (!net_eq(info->net, _net)) return NOTIFY_DONE; fib_work = kzalloc(sizeof(*fib_work), GFP_ATOMIC); -- 2.9.3
Re: [PATCH net] ipv6: no need to return rt->dst.error if it is not null entry.
On 7/20/17 9:23 AM, Hangbin Liu wrote: > diff --git a/net/ipv6/route.c b/net/ipv6/route.c > index 4d30c96..c290aa4 100644 > --- a/net/ipv6/route.c > +++ b/net/ipv6/route.c > @@ -3637,13 +3637,8 @@ static int inet6_rtm_getroute(struct sk_buff > *in_skb, struct nlmsghdr *nlh, > dst = ip6_route_lookup(net, , 0); > > rt = container_of(dst, struct rt6_info, dst); > - if (rt->dst.error) { > - err = rt->dst.error; > - ip6_rt_put(rt); > - goto errout; > - } The above 5 lines introduced the change in behavior, so removing them should be sufficient.
Re: [iovisor-dev] [PATCH net 0/2] bpf: fix verifier min/max handling in BPF_SUB
Edward Cree via iovisor-devwrote: > I managed to come up with a test for the swapped bounds in BPF_SUB, so here > it is along with a patch that fixes it, separated out from my 'rewrite > everything' series so it can go to -stable. > > Edward Cree (2): > selftests/bpf: subtraction bounds test > bpf/verifier: fix min/max handling in BPF_SUB > > kernel/bpf/verifier.c | 21 +++-- > tools/testing/selftests/bpf/test_verifier.c | 28 > 2 files changed, 43 insertions(+), 6 deletions(-) > > ___ > iovisor-dev mailing list > iovisor-...@lists.iovisor.org > https://lists.iovisor.org/mailman/listinfo/iovisor-dev Thanks for separating it. Nadav
Re: [RFC 1/2] net-next: add a dma_desc element to struct skb_shared_info
Hi, On Fri, 2017-07-21 at 17:20 +0200, John Crispin wrote: > In order to make HW flow offloading work in latest MediaTek silicon we need > to propagate part of the RX DMS descriptor to the upper layers populating > the flow offload engines HW tables. This patch adds an extra element to > struct skb_shared_info allowing the ethernet drivers RX napi code to store > the required information and make it persistent for the lifecycle of the > skb and its clones. > > Signed-off-by: John Crispin> --- > include/linux/skbuff.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > index 4093552be1de..db9576cd946b 100644 > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -426,6 +426,7 @@ struct skb_shared_info { > unsigned intgso_type; > u32 tskey; > __be32 ip6_frag_id; > + u32 dma_desc; > > /* >* Warning : all fields before dataref are cleared in __alloc_skb() This will increase the skb_shared_info struct size, which is already quite large, and can have several kind of performance drawback. AFAIK this is discouraged. I don't understand the use case; the driver will set this field, but who is going to consume it? Thanks, Paolo
Re: [PATCH] net: ethernet: ti: cpsw: Push the request_irq function to the end of probe
On 07/20/2017 11:45 PM, Keerthy wrote: On Friday 21 July 2017 04:14 AM, Grygorii Strashko wrote: On 07/20/2017 05:28 PM, David Miller wrote: From: Grygorii StrashkoDate: Thu, 20 Jul 2017 11:08:09 -0500 In general patch looks good to me, but it's really unexpected to receive IRQs while CPSW is probing ;( This is a poor expectation. Boot loaders and other entities can leave the device in any state whatsoever. Furthermore, enabling an IRQ whose handler cannot properly execute without crashing is wrong fundamentally. All data structures and state must be set up properly before the IRQ is requested. Therefore this patch is correct and I will apply it. Thanks. Agree (it just has never triggered before, so I meant - unexpected from current driver code point of view ;(). And I'm just worry that it might not be enough :(, especially for am335x. I tried nfs boot on am335x-evm with this patch and it boots fine for me. Thank you Keerthy, I've also simulated and tested it on am335x and dra7 and see networking working and no crashes. -- regards, -grygorii
Re: [PATCHv2 net] ipv6: should not return rt->dst.error if it is prohibit or blk hole entry.
Hi Hangbin, [auto build test ERROR on net/master] url: https://github.com/0day-ci/linux/commits/Hangbin-Liu/ipv6-should-not-return-rt-dst-error-if-it-is-prohibit-or-blk-hole-entry/20170721-204554 config: m68k-sun3_defconfig (attached as .config) compiler: m68k-linux-gcc (GCC) 4.9.0 reproduce: wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=m68k All errors (new ones prefixed by >>): net//ipv6/route.c: In function 'inet6_rtm_getroute': >> net//ipv6/route.c:3640:38: error: 'struct netns_ipv6' has no member named >> 'ip6_prohibit_entry' if (rt->dst.error && rt != net->ipv6.ip6_prohibit_entry && ^ >> net//ipv6/route.c:3641:21: error: 'struct netns_ipv6' has no member named >> 'ip6_blk_hole_entry' rt != net->ipv6.ip6_blk_hole_entry) { ^ vim +3640 net//ipv6/route.c 3558 3559 static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, 3560struct netlink_ext_ack *extack) 3561 { 3562 struct net *net = sock_net(in_skb->sk); 3563 struct nlattr *tb[RTA_MAX+1]; 3564 int err, iif = 0, oif = 0; 3565 struct dst_entry *dst; 3566 struct rt6_info *rt; 3567 struct sk_buff *skb; 3568 struct rtmsg *rtm; 3569 struct flowi6 fl6; 3570 bool fibmatch; 3571 3572 err = nlmsg_parse(nlh, sizeof(*rtm), tb, RTA_MAX, rtm_ipv6_policy, 3573extack); 3574 if (err < 0) 3575 goto errout; 3576 3577 err = -EINVAL; 3578 memset(, 0, sizeof(fl6)); 3579 rtm = nlmsg_data(nlh); 3580 fl6.flowlabel = ip6_make_flowinfo(rtm->rtm_tos, 0); 3581 fibmatch = !!(rtm->rtm_flags & RTM_F_FIB_MATCH); 3582 3583 if (tb[RTA_SRC]) { 3584 if (nla_len(tb[RTA_SRC]) < sizeof(struct in6_addr)) 3585 goto errout; 3586 3587 fl6.saddr = *(struct in6_addr *)nla_data(tb[RTA_SRC]); 3588 } 3589 3590 if (tb[RTA_DST]) { 3591 if (nla_len(tb[RTA_DST]) < sizeof(struct in6_addr)) 3592 goto errout; 3593 3594 fl6.daddr = *(struct in6_addr *)nla_data(tb[RTA_DST]); 3595 } 3596 3597 if (tb[RTA_IIF]) 3598 iif = nla_get_u32(tb[RTA_IIF]); 3599 3600 if (tb[RTA_OIF]) 3601 oif = nla_get_u32(tb[RTA_OIF]); 3602 3603 if (tb[RTA_MARK]) 3604 fl6.flowi6_mark = nla_get_u32(tb[RTA_MARK]); 3605 3606 if (tb[RTA_UID]) 3607 fl6.flowi6_uid = make_kuid(current_user_ns(), 3608 nla_get_u32(tb[RTA_UID])); 3609 else 3610 fl6.flowi6_uid = iif ? INVALID_UID : current_uid(); 3611 3612 if (iif) { 3613 struct net_device *dev; 3614 int flags = 0; 3615 3616 dev = __dev_get_by_index(net, iif); 3617 if (!dev) { 3618 err = -ENODEV; 3619 goto errout; 3620 } 3621 3622 fl6.flowi6_iif = iif; 3623 3624 if (!ipv6_addr_any()) 3625 flags |= RT6_LOOKUP_F_HAS_SADDR; 3626 3627 if (!fibmatch) 3628 dst = ip6_route_input_lookup(net, dev, , flags); 3629 } else { 3630 fl6.flowi6_oif = oif; 3631 3632 if (!fibmatch) 3633 dst = ip6_route_output(net, NULL, ); 3634 } 3635 3636 if (fibmatch) 3637 dst = ip6_route_lookup(net, , 0); 3638 3639 rt = container_of(dst, struct rt6_info, dst); > 3640 if (rt->dst.error && rt != net->ipv6.ip6_prohibit_entry && > 3641 rt != net->ipv6.ip6_blk_hole_entry) { 3642 err = rt->dst.error; 3643 ip6_rt_put(rt); 3644 goto errout; 3645 } 3646 3647 skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL); 3648 if (!skb) { 3649 ip6_rt_put(rt); 3650 err = -ENOBUFS; 3651 goto errout; 3652 } 3653 3654 skb_dst_set(skb, >dst); 3655 if (fibmatch) 3656 err = rt6_fill_node(net, skb, rt, NULL, NULL, iif, 3657
Re: commit 16ecba59 breaks 82574L under heavy load.
On Thu, Jul 20, 2017 at 04:44:55PM -0700, Benjamin Poirier wrote: > Could you please test the following patch and let me know if it: > 1) reduces the interrupt rate of the Other msi-x vector > 2) avoids the link flaps > or > 3) logs some dmesg warnings of the form "Other interrupt with unhandled [...]" > In this case, please paste icr values printed. I will give it a try. Another odd behaviour I see is that the driver will hang in napi_synchronize on shutdown if there is traffic at the time (at least I think that's the trigger, maybe the trigger is if there has been an overload of traffic and the backlog in napi was used). >From doing some searching, this seems to be a problem that has plagued some people for years with this driver. I am having trouble figuring out exactly what napi_synchronize is waiting for and who is supposed to toggle the flag it is waiting on. The flag appears to work backwards from what I would have expected it to do. I see lots of places that can set the bit, but only napi_enable seems to clear it again, and I don't see how that would get called for all the places that potentially set the bit. -- Len Sorensen
[RFC 1/2] net-next: add a dma_desc element to struct skb_shared_info
In order to make HW flow offloading work in latest MediaTek silicon we need to propagate part of the RX DMS descriptor to the upper layers populating the flow offload engines HW tables. This patch adds an extra element to struct skb_shared_info allowing the ethernet drivers RX napi code to store the required information and make it persistent for the lifecycle of the skb and its clones. Signed-off-by: John Crispin--- include/linux/skbuff.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 4093552be1de..db9576cd946b 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -426,6 +426,7 @@ struct skb_shared_info { unsigned intgso_type; u32 tskey; __be32 ip6_frag_id; + u32 dma_desc; /* * Warning : all fields before dataref are cleared in __alloc_skb() -- 2.11.0
[RFC 2/2] net-next: mediatek: populate the shared
When enabling HW flow offloading, the traffic coming in on either of the GMACs is first sent to the PPE for processing. Any traffic not offloaded at this point will then be forwarded to the normal RX DMA ring for SW path processing. In this case the PPE will send additional data inside RXD4 that is later required by the upper layers to populate the flow offloading engines HW tables properly. This patch sets the skb_shared_info's dma_desc field so that we can use the value later on. Signed-off-by: John Crispin--- drivers/net/ethernet/mediatek/mtk_eth_soc.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c index a455d1b4f1d8..42d162cd6363 100644 --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c @@ -918,6 +918,7 @@ static void mtk_update_rx_cpu_idx(struct mtk_eth *eth) static int mtk_poll_rx(struct napi_struct *napi, int budget, struct mtk_eth *eth) { + struct skb_shared_info *sh; struct mtk_rx_ring *ring; int idx; struct sk_buff *skb; @@ -1000,6 +1001,9 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget, else skb_checksum_none_assert(skb); skb->protocol = eth_type_trans(skb, netdev); + sh = skb_shinfo(skb); + + sh->dma_desc = trxd.rxd4; if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX && RX_DMA_VID(trxd.rxd3)) -- 2.11.0
[RFC 0/2] net-next: hw flow offloading
Hi, I managed to bring up the flow offloading on latest MedieTek silicon. When enabling HW flow offloading, the traffic coming in on either of the GMACs is first sent to the PPE for processing. Any traffic not offloaded at this point will then be forwarded to the normal RX DMA ring for SW path processing. In this case the PPE will send additional data inside RXD4 that is later required by the upper layers to populate the flow offloading engines HW tables properly. This series is a RFC as i am not sure how to best propagate the additional info from the RX DMA descriptor. The driver is still using NF hooks and I plan to rebase it and send it upstream once the flow table offloading patches that folks are working on are upstream. I am right now trying to get rid of the remaning hacks in the code and wanted to know if this series would be a feasible solution. John John Crispin (2): net-next: add a dma_desc element to struct skb_shared_info net-next: mediatek: populate the shared drivers/net/ethernet/mediatek/mtk_eth_soc.c | 4 include/linux/skbuff.h | 1 + 2 files changed, 5 insertions(+) -- 2.11.0
[PATCH] lib: test_rhashtable: fix for large entry counts
During concurrent access testing, threadfunc() concatenated thread ID and object index to create a unique key like so: | tdata->objs[i].value = (tdata->id << 16) | i; This breaks if a user passes an entries parameter of 64k or higher, since 'i' might use more than 16 bits then. Effectively, this will lead to duplicate keys in the table. Fix the problem by introducing a struct holding object and thread ID and using that as key instead of a single integer type field. Fixes: f4a3e90ba5739 ("rhashtable-test: extend to test concurrency") Reported by: Manuel MessnerSigned-off-by: Phil Sutter --- lib/test_rhashtable.c | 53 +++ 1 file changed, 32 insertions(+), 21 deletions(-) diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c index 64e899b633371..16949d219291a 100644 --- a/lib/test_rhashtable.c +++ b/lib/test_rhashtable.c @@ -56,8 +56,13 @@ static bool enomem_retry = false; module_param(enomem_retry, bool, 0); MODULE_PARM_DESC(enomem_retry, "Retry insert even if -ENOMEM was returned (default: off)"); +struct test_obj_val { + int id; + int tid; +}; + struct test_obj { - int value; + struct test_obj_val value; struct rhash_head node; }; @@ -72,7 +77,7 @@ static struct test_obj array[MAX_ENTRIES]; static struct rhashtable_params test_rht_params = { .head_offset = offsetof(struct test_obj, node), .key_offset = offsetof(struct test_obj, value), - .key_len = sizeof(int), + .key_len = sizeof(struct test_obj_val), .hashfn = jhash, .nulls_base = (3U << RHT_BASE_SHIFT), }; @@ -109,24 +114,26 @@ static int __init test_rht_lookup(struct rhashtable *ht) for (i = 0; i < entries * 2; i++) { struct test_obj *obj; bool expected = !(i % 2); - u32 key = i; + struct test_obj_val key = { + .id = i, + }; - if (array[i / 2].value == TEST_INSERT_FAIL) + if (array[i / 2].value.id == TEST_INSERT_FAIL) expected = false; obj = rhashtable_lookup_fast(ht, , test_rht_params); if (expected && !obj) { - pr_warn("Test failed: Could not find key %u\n", key); + pr_warn("Test failed: Could not find key %u\n", key.id); return -ENOENT; } else if (!expected && obj) { pr_warn("Test failed: Unexpected entry found for key %u\n", - key); + key.id); return -EEXIST; } else if (expected && obj) { - if (obj->value != i) { + if (obj->value.id != i) { pr_warn("Test failed: Lookup value mismatch %u!=%u\n", - obj->value, i); + obj->value.id, i); return -EINVAL; } } @@ -195,7 +202,7 @@ static s64 __init test_rhashtable(struct rhashtable *ht) for (i = 0; i < entries; i++) { struct test_obj *obj = [i]; - obj->value = i * 2; + obj->value.id = i * 2; err = insert_retry(ht, >node, test_rht_params); if (err > 0) insert_retries += err; @@ -218,7 +225,7 @@ static s64 __init test_rhashtable(struct rhashtable *ht) for (i = 0; i < entries; i++) { u32 key = i * 2; - if (array[i].value != TEST_INSERT_FAIL) { + if (array[i].value.id != TEST_INSERT_FAIL) { obj = rhashtable_lookup_fast(ht, , test_rht_params); BUG_ON(!obj); @@ -242,18 +249,21 @@ static int thread_lookup_test(struct thread_data *tdata) for (i = 0; i < entries; i++) { struct test_obj *obj; - int key = (tdata->id << 16) | i; + struct test_obj_val key = { + .id = i, + .tid = tdata->id, + }; obj = rhashtable_lookup_fast(, , test_rht_params); - if (obj && (tdata->objs[i].value == TEST_INSERT_FAIL)) { - pr_err(" found unexpected object %d\n", key); + if (obj && (tdata->objs[i].value.id == TEST_INSERT_FAIL)) { + pr_err(" found unexpected object %d-%d\n", key.tid, key.id); err++; - } else if (!obj && (tdata->objs[i].value != TEST_INSERT_FAIL)) { - pr_err(" object %d not found!\n", key); + } else if (!obj && (tdata->objs[i].value.id != TEST_INSERT_FAIL)) { + pr_err(" object %d-%d not
[PATCH] iwlwifi: Demote messages about fw flags size to info
These messages are not reporting a real error, just the fact that the firmware knows about more flags then the driver. Currently these messages are presented to the user during boot if there is no bootsplash covering the console, sometimes even if the boot splash is enabled but has not started yet by the time this message is shown. Demoting it to the info level helps having a clean boot process. Signed-off-by: João Paulo Rechi Vita--- drivers/net/wireless/intel/iwlwifi/iwl-drv.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-drv.c b/drivers/net/wireless/intel/iwlwifi/iwl-drv.c index 6fdb5921e17f..557acd43d705 100644 --- a/drivers/net/wireless/intel/iwlwifi/iwl-drv.c +++ b/drivers/net/wireless/intel/iwlwifi/iwl-drv.c @@ -487,9 +487,9 @@ static int iwl_set_ucode_api_flags(struct iwl_drv *drv, const u8 *data, int i; if (api_index >= DIV_ROUND_UP(NUM_IWL_UCODE_TLV_API, 32)) { - IWL_ERR(drv, - "api flags index %d larger than supported by driver\n", - api_index); + IWL_INFO(drv, +"api flags index %d larger than supported by driver\n", +api_index); /* don't return an error so we can load FW that has more bits */ return 0; } @@ -511,9 +511,9 @@ static int iwl_set_ucode_capabilities(struct iwl_drv *drv, const u8 *data, int i; if (api_index >= DIV_ROUND_UP(NUM_IWL_UCODE_TLV_CAPA, 32)) { - IWL_ERR(drv, - "capa flags index %d larger than supported by driver\n", - api_index); + IWL_INFO(drv, +"capa flags index %d larger than supported by driver\n", +api_index); /* don't return an error so we can load FW that has more bits */ return 0; } -- 2.13.2
Re: [PATCH v2 1/4] net: phy: at803x: Document RGMII RX and TX clock delay issues
On 21/07/2017 16:06, Timur Tabi wrote: > On 7/21/17 8:29 AM, Marc Gonzalez wrote: > >> I don't understand what you're saying. >> >> It is a correct observation that the code enabling >> RGMII RX clock delay is a NOP, since that bit will >> always be set at that point. >> >> The spec for the 8035 (I haven't checked for 8030 and 8031, >> is that what you meant by "other systems"?) states that >> Sel_clk125m_dsp, which is described as: >> "Control bit for rgmii interface rx clock delay" >> is 1 after HW reset, 1 after SW reset. >> >> So my statement "RX clock delay is enabled at reset" >> is universally true. It's not just on some systems. > > Ok, taken out of context, the comment doesn't really explain why the > code is the way it is. I'm not really happy about the word "assumes". If a HW setting defaults to 0 at reset, and some init is called right after reset, then you know the setting's value is 0. If you need that value to be 1, all you need is a function setting it to 1. This is the current situation. Commit 2e5f9f281ee8 assumes 0 at reset, and provides a function setting the value to 1. Reality is different. The HW setting defaults to 1 at reset. So it turns out that the function setting the value to 1 is pointless, because the value is already 1. There is however no way to set the value to 0. Does that explain why I wrote "assume"? Also the commit message: "The current code supports enabling RGMII RX and TX clock delays. The unstated assumption is that these settings are disabled by default at reset, which is not the case." > Maybe you should add a sentence explaining when the code is NOT a no-op. The code is *NEVER* NOT a no-op. I.e. the code enabling RX clock delay is ALWAYS a no-op. I don't understand what you think is unclear in my comment. Regards.
Re: [PATCH net 2/2] bpf/verifier: fix min/max handling in BPF_SUB
On 07/21/2017 03:37 PM, Edward Cree wrote: We have to subtract the src max from the dst min, and vice-versa, since (e.g.) the smallest result comes from the largest subtrahend. Fixes: 484611357c19 ("bpf: allow access into map value arrays") Signed-off-by: Edward CreeLGTM, thanks for the fix! Acked-by: Daniel Borkmann --- kernel/bpf/verifier.c | 21 +++-- 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index af9e84a..664d939 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1865,10 +1865,12 @@ static void adjust_reg_min_max_vals(struct bpf_verifier_env *env, * do our normal operations to the register, we need to set the values * to the min/max since they are undefined. */ - if (min_val == BPF_REGISTER_MIN_RANGE) - dst_reg->min_value = BPF_REGISTER_MIN_RANGE; - if (max_val == BPF_REGISTER_MAX_RANGE) - dst_reg->max_value = BPF_REGISTER_MAX_RANGE; + if (opcode != BPF_SUB) { + if (min_val == BPF_REGISTER_MIN_RANGE) + dst_reg->min_value = BPF_REGISTER_MIN_RANGE; + if (max_val == BPF_REGISTER_MAX_RANGE) + dst_reg->max_value = BPF_REGISTER_MAX_RANGE; + } switch (opcode) { case BPF_ADD: @@ -1879,10 +1881,17 @@ static void adjust_reg_min_max_vals(struct bpf_verifier_env *env, dst_reg->min_align = min(src_align, dst_align); break; case BPF_SUB: + /* If one of our values was at the end of our ranges, then the +* _opposite_ value in the dst_reg goes to the end of our range. +*/ + if (min_val == BPF_REGISTER_MIN_RANGE) + dst_reg->max_value = BPF_REGISTER_MAX_RANGE; + if (max_val == BPF_REGISTER_MAX_RANGE) + dst_reg->min_value = BPF_REGISTER_MIN_RANGE; if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE) - dst_reg->min_value -= min_val; + dst_reg->min_value -= max_val; if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE) - dst_reg->max_value -= max_val; + dst_reg->max_value -= min_val; dst_reg->min_align = min(src_align, dst_align); break; case BPF_MUL:
Re: [PATCH net 1/2] selftests/bpf: subtraction bounds test
On 07/21/2017 03:36 PM, Edward Cree wrote: There is a bug in the verifier's handling of BPF_SUB: [a,b] - [c,d] yields was [a-c, b-d] rather than the correct [a-d, b-c]. So here is a test which, with the bogus handling, will produce ranges of [0,0] and thus allowed accesses; whereas the correct handling will give a range of [-255, 255] (and hence the right-shift will give a range of [0, 255]) and the accesses will be rejected. Signed-off-by: Edward CreeAcked-by: Daniel Borkmann tools/testing/selftests/bpf/test_verifier.c | 28 1 file changed, 28 insertions(+) diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c index af7d173..addea82 100644 --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -5980,6 +5980,34 @@ static struct bpf_test tests[] = { .result = REJECT, .result_unpriv = REJECT, }, + { + "subtraction bounds (map value)", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, +BPF_FUNC_map_lookup_elem), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 9), + BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0), + BPF_JMP_IMM(BPF_JGT, BPF_REG_1, 0xff, 7), + BPF_LDX_MEM(BPF_B, BPF_REG_3, BPF_REG_0, 1), + BPF_JMP_IMM(BPF_JGT, BPF_REG_3, 0xff, 5), + BPF_ALU64_REG(BPF_SUB, BPF_REG_1, BPF_REG_3), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 56), + BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1), + BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0), + BPF_EXIT_INSN(), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, + .errstr_unpriv = "R0 pointer arithmetic prohibited", + .errstr = "R0 min value is negative, either use unsigned index or do a if (index >=0) check.", + .result = REJECT, + .result_unpriv = REJECT, + }, }; static int probe_filter_length(const struct bpf_insn *fp)
Re: [PATCH v2 3/4] net: ethernet: nb8800: Fix RGMII TX clock delay setup
Marc Gonzalezwrites: > On 21/07/2017 15:47, Måns Rullgård wrote: > >> Marc Gonzalez wrote: >> >>> On 21/07/2017 15:04, Måns Rullgård wrote: >>> Marc Gonzalez wrote: > According to commit e5f3a4a56ce2a707b2fb8ce37e4414dcac89c672 > ("Documentation: devicetree: clarify usage of the RGMII phy-modes") > there are 4 RGMII modes to handle: > > "rgmii" (RX and TX delays are added by the MAC when required) > "rgmii-id" (RGMII with internal RX and TX delays provided by the PHY, > the MAC should not add the RX or TX delays in this case) > "rgmii-rxid" (RGMII with internal RX delay provided by the PHY, > the MAC should not add an RX delay in this case) > "rgmii-txid" (RGMII with internal TX delay provided by the PHY, > the MAC should not add an TX delay in this case) > > Add TX delay in the MAC only for rgmii and rgmii-rxid. > > Signed-off-by: Marc Gonzalez > --- > drivers/net/ethernet/aurora/nb8800.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/aurora/nb8800.c > b/drivers/net/ethernet/aurora/nb8800.c > index ded041dbafe7..f3ed320eb4ad 100644 > --- a/drivers/net/ethernet/aurora/nb8800.c > +++ b/drivers/net/ethernet/aurora/nb8800.c > @@ -1268,11 +1268,13 @@ static int nb8800_tangox_init(struct net_device > *dev) > break; > > case PHY_INTERFACE_MODE_RGMII: > - pad_mode = PAD_MODE_RGMII; > + case PHY_INTERFACE_MODE_RGMII_RXID: > + pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY; > break; > > + case PHY_INTERFACE_MODE_RGMII_ID: > case PHY_INTERFACE_MODE_RGMII_TXID: > - pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY; > + pad_mode = PAD_MODE_RGMII; > break; > > default: I still don't like this. Having both the MAC and PHY drivers react to the phy-connection-type property is bound to cause trouble somewhere. The only way out of the current mess is to define new properties for both MAC and PHY that override the existing ones if present. >>> >>> Do you mean defining 4 new bindings and their corresponding >>> phy_interface_t enum values? For example: >>> >>> "rgmii-v2" >>> "rgmii-id-v2" >>> "rgmii-rxid-v2" >>> "rgmii-txid-v2" >>> >>> PHY_INTERFACE_MODE_RGMII_V2, >>> PHY_INTERFACE_MODE_RGMII_ID_V2, >>> PHY_INTERFACE_MODE_RGMII_RXID_V2, >>> PHY_INTERFACE_MODE_RGMII_TXID_V2, >>> >>> And then handling these new enums in the at803x and nb8800 drivers? >> >> It has already been suggested to add new properties specifying desired >> delays in picoseconds. If present on the MAC node, the MAC driver >> should attempt to provide the delay, and if present on the PHY node, the >> PHY driver is responsible. > > Sorry, I had already forgotten about Florian's suggestion: >> If you introduced PHY and/or MAC specific properties to configure the >> delays in the appropriate unit of time (say ps), you could use a >> non-compliant 'phy-mode' just to satisfy the driver/PHY library and >> still override the delays you need. > > So we would need two properties (RX and TX). > "rgmii_rx_skew_ps" and "rgmii_tx_skew_ps" > > but it's not clear to me how the MAC probe function communicates > the arguments to the phy driver. Looks like we would need to add > two fields to struct phy_device, and maybe define a new phy-mode > to instruct the PHY driver to look for the two fields. There's no need for the drivers to communicate. The location of the properties in the device tree determines which driver should deal with it. > I don't have time to work on that for now, but I do need to > fix the nb8800 driver now. Can we apply the following patch > in the interim? > > diff --git a/drivers/net/ethernet/aurora/nb8800.c > b/drivers/net/ethernet/aurora/nb8800.c > index ded041dbafe7..e94159507847 100644 > --- a/drivers/net/ethernet/aurora/nb8800.c > +++ b/drivers/net/ethernet/aurora/nb8800.c > @@ -1268,11 +1268,10 @@ static int nb8800_tangox_init(struct net_device *dev) > break; > > case PHY_INTERFACE_MODE_RGMII: > - pad_mode = PAD_MODE_RGMII; > - break; > - > + case PHY_INTERFACE_MODE_RGMII_ID: > + case PHY_INTERFACE_MODE_RGMII_RXID: > case PHY_INTERFACE_MODE_RGMII_TXID: > - pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY; > + pad_mode = PAD_MODE_RGMII; > break; > > default: Simply stop reacting to the delay aspect of the phy-connection-type property? Yes, I'm fine with that. -- Måns Rullgård
Re: [PATCH v2 3/4] net: ethernet: nb8800: Fix RGMII TX clock delay setup
On 21/07/2017 15:47, Måns Rullgård wrote: > Marc Gonzalez wrote: > >> On 21/07/2017 15:04, Måns Rullgård wrote: >> >>> Marc Gonzalez wrote: >>> According to commit e5f3a4a56ce2a707b2fb8ce37e4414dcac89c672 ("Documentation: devicetree: clarify usage of the RGMII phy-modes") there are 4 RGMII modes to handle: "rgmii" (RX and TX delays are added by the MAC when required) "rgmii-id" (RGMII with internal RX and TX delays provided by the PHY, the MAC should not add the RX or TX delays in this case) "rgmii-rxid" (RGMII with internal RX delay provided by the PHY, the MAC should not add an RX delay in this case) "rgmii-txid" (RGMII with internal TX delay provided by the PHY, the MAC should not add an TX delay in this case) Add TX delay in the MAC only for rgmii and rgmii-rxid. Signed-off-by: Marc Gonzalez--- drivers/net/ethernet/aurora/nb8800.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/aurora/nb8800.c b/drivers/net/ethernet/aurora/nb8800.c index ded041dbafe7..f3ed320eb4ad 100644 --- a/drivers/net/ethernet/aurora/nb8800.c +++ b/drivers/net/ethernet/aurora/nb8800.c @@ -1268,11 +1268,13 @@ static int nb8800_tangox_init(struct net_device *dev) break; case PHY_INTERFACE_MODE_RGMII: - pad_mode = PAD_MODE_RGMII; + case PHY_INTERFACE_MODE_RGMII_RXID: + pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY; break; + case PHY_INTERFACE_MODE_RGMII_ID: case PHY_INTERFACE_MODE_RGMII_TXID: - pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY; + pad_mode = PAD_MODE_RGMII; break; default: >>> >>> I still don't like this. Having both the MAC and PHY drivers react to >>> the phy-connection-type property is bound to cause trouble somewhere. >>> >>> The only way out of the current mess is to define new properties for >>> both MAC and PHY that override the existing ones if present. >> >> Do you mean defining 4 new bindings and their corresponding >> phy_interface_t enum values? For example: >> >> "rgmii-v2" >> "rgmii-id-v2" >> "rgmii-rxid-v2" >> "rgmii-txid-v2" >> >> PHY_INTERFACE_MODE_RGMII_V2, >> PHY_INTERFACE_MODE_RGMII_ID_V2, >> PHY_INTERFACE_MODE_RGMII_RXID_V2, >> PHY_INTERFACE_MODE_RGMII_TXID_V2, >> >> And then handling these new enums in the at803x and nb8800 drivers? > > It has already been suggested to add new properties specifying desired > delays in picoseconds. If present on the MAC node, the MAC driver > should attempt to provide the delay, and if present on the PHY node, the > PHY driver is responsible. Sorry, I had already forgotten about Florian's suggestion: > If you introduced PHY and/or MAC specific properties to configure the > delays in the appropriate unit of time (say ps), you could use a > non-compliant 'phy-mode' just to satisfy the driver/PHY library and > still override the delays you need. So we would need two properties (RX and TX). "rgmii_rx_skew_ps" and "rgmii_tx_skew_ps" but it's not clear to me how the MAC probe function communicates the arguments to the phy driver. Looks like we would need to add two fields to struct phy_device, and maybe define a new phy-mode to instruct the PHY driver to look for the two fields. I don't have time to work on that for now, but I do need to fix the nb8800 driver now. Can we apply the following patch in the interim? diff --git a/drivers/net/ethernet/aurora/nb8800.c b/drivers/net/ethernet/aurora/nb8800.c index ded041dbafe7..e94159507847 100644 --- a/drivers/net/ethernet/aurora/nb8800.c +++ b/drivers/net/ethernet/aurora/nb8800.c @@ -1268,11 +1268,10 @@ static int nb8800_tangox_init(struct net_device *dev) break; case PHY_INTERFACE_MODE_RGMII: - pad_mode = PAD_MODE_RGMII; - break; - + case PHY_INTERFACE_MODE_RGMII_ID: + case PHY_INTERFACE_MODE_RGMII_RXID: case PHY_INTERFACE_MODE_RGMII_TXID: - pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY; + pad_mode = PAD_MODE_RGMII; break; default: Regards.
Re: [PATCH v2 1/4] net: phy: at803x: Document RGMII RX and TX clock delay issues
On 7/21/17 8:29 AM, Marc Gonzalez wrote: I don't understand what you're saying. It is a correct observation that the code enabling RGMII RX clock delay is a NOP, since that bit will always be set at that point. The spec for the 8035 (I haven't checked for 8030 and 8031, is that what you meant by "other systems"?) states that Sel_clk125m_dsp, which is described as: "Control bit for rgmii interface rx clock delay" is 1 after HW reset, 1 after SW reset. So my statement "RX clock delay is enabled at reset" is universally true. It's not just on some systems. Ok, taken out of context, the comment doesn't really explain why the code is the way it is. I'm not really happy about the word "assumes". Maybe you should add a sentence explaining when the code is NOT a no-op. -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation.
[RFC PATCH] IP: do not modify ingress packet IP option in ip_options_echo()
While computing the response option set for LRSS, ip_options_echo() also changes the ingress packet LRSS addresses list, setting the last one to the dst specific address for the ingress packet - via memset(start[ ... The only visible effect of such change - beyond possibly damaging shared/cloned skbs - is modifying the data carried by ICMP replies changing the header information for reported the ingress packet, whichi violates RFC1122 3.2.2.6. All the others call sites just ignore the ingress packet IP options after calling ip_options_echo(). This behavior predates git history and apparently was present into the initial ip_options_echo() implementation in linux 1.3.30 but still looks wrong. Removing the fib_compute_spec_dst() call will allow completely removing, with a later patch, skb->dst usage in ip_options_echo(), so that we can release the dst entry early in the rx path and drop a couple of related hacks. This will evenutally target net-next. Signed-off-by: Paolo Abeni--- net/ipv4/ip_options.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c index 93157f2..fdda973 100644 --- a/net/ipv4/ip_options.c +++ b/net/ipv4/ip_options.c @@ -174,9 +174,6 @@ int __ip_options_echo(struct ip_options *dopt, struct sk_buff *skb, doffset -= 4; } if (doffset > 3) { - __be32 daddr = fib_compute_spec_dst(skb); - - memcpy([doffset-1], , 4); dopt->faddr = faddr; dptr[0] = start[0]; dptr[1] = doffset+3; -- 2.9.4
Re: [PATCH v2 3/4] net: ethernet: nb8800: Fix RGMII TX clock delay setup
Marc Gonzalezwrites: > On 21/07/2017 15:04, Måns Rullgård wrote: > >> Marc Gonzalez wrote: >> >>> According to commit e5f3a4a56ce2a707b2fb8ce37e4414dcac89c672 >>> ("Documentation: devicetree: clarify usage of the RGMII phy-modes") >>> there are 4 RGMII modes to handle: >>> >>> "rgmii" (RX and TX delays are added by the MAC when required) >>> "rgmii-id" (RGMII with internal RX and TX delays provided by the PHY, >>> the MAC should not add the RX or TX delays in this case) >>> "rgmii-rxid" (RGMII with internal RX delay provided by the PHY, >>> the MAC should not add an RX delay in this case) >>> "rgmii-txid" (RGMII with internal TX delay provided by the PHY, >>> the MAC should not add an TX delay in this case) >>> >>> Add TX delay in the MAC only for rgmii and rgmii-rxid. >>> >>> Signed-off-by: Marc Gonzalez >>> --- >>> drivers/net/ethernet/aurora/nb8800.c | 6 -- >>> 1 file changed, 4 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/net/ethernet/aurora/nb8800.c >>> b/drivers/net/ethernet/aurora/nb8800.c >>> index ded041dbafe7..f3ed320eb4ad 100644 >>> --- a/drivers/net/ethernet/aurora/nb8800.c >>> +++ b/drivers/net/ethernet/aurora/nb8800.c >>> @@ -1268,11 +1268,13 @@ static int nb8800_tangox_init(struct net_device >>> *dev) >>> break; >>> >>> case PHY_INTERFACE_MODE_RGMII: >>> - pad_mode = PAD_MODE_RGMII; >>> + case PHY_INTERFACE_MODE_RGMII_RXID: >>> + pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY; >>> break; >>> >>> + case PHY_INTERFACE_MODE_RGMII_ID: >>> case PHY_INTERFACE_MODE_RGMII_TXID: >>> - pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY; >>> + pad_mode = PAD_MODE_RGMII; >>> break; >>> >>> default: >> >> I still don't like this. Having both the MAC and PHY drivers react to >> the phy-connection-type property is bound to cause trouble somewhere. >> >> The only way out of the current mess is to define new properties for >> both MAC and PHY that override the existing ones if present. > > Do you mean defining 4 new bindings and their corresponding > phy_interface_t enum values? For example: > > "rgmii-v2" > "rgmii-id-v2" > "rgmii-rxid-v2" > "rgmii-txid-v2" > > PHY_INTERFACE_MODE_RGMII_V2, > PHY_INTERFACE_MODE_RGMII_ID_V2, > PHY_INTERFACE_MODE_RGMII_RXID_V2, > PHY_INTERFACE_MODE_RGMII_TXID_V2, > > And then handling these new enums in the at803x and nb8800 drivers? It has already been suggested to add new properties specifying desired delays in picoseconds. If present on the MAC node, the MAC driver should attempt to provide the delay, and if present on the PHY node, the PHY driver is responsible. -- Måns Rullgård
Re: [PATCH v2 3/4] net: ethernet: nb8800: Fix RGMII TX clock delay setup
On 21/07/2017 15:04, Måns Rullgård wrote: > Marc Gonzalez wrote: > >> According to commit e5f3a4a56ce2a707b2fb8ce37e4414dcac89c672 >> ("Documentation: devicetree: clarify usage of the RGMII phy-modes") >> there are 4 RGMII modes to handle: >> >> "rgmii" (RX and TX delays are added by the MAC when required) >> "rgmii-id" (RGMII with internal RX and TX delays provided by the PHY, >> the MAC should not add the RX or TX delays in this case) >> "rgmii-rxid" (RGMII with internal RX delay provided by the PHY, >> the MAC should not add an RX delay in this case) >> "rgmii-txid" (RGMII with internal TX delay provided by the PHY, >> the MAC should not add an TX delay in this case) >> >> Add TX delay in the MAC only for rgmii and rgmii-rxid. >> >> Signed-off-by: Marc Gonzalez>> --- >> drivers/net/ethernet/aurora/nb8800.c | 6 -- >> 1 file changed, 4 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/net/ethernet/aurora/nb8800.c >> b/drivers/net/ethernet/aurora/nb8800.c >> index ded041dbafe7..f3ed320eb4ad 100644 >> --- a/drivers/net/ethernet/aurora/nb8800.c >> +++ b/drivers/net/ethernet/aurora/nb8800.c >> @@ -1268,11 +1268,13 @@ static int nb8800_tangox_init(struct net_device *dev) >> break; >> >> case PHY_INTERFACE_MODE_RGMII: >> -pad_mode = PAD_MODE_RGMII; >> +case PHY_INTERFACE_MODE_RGMII_RXID: >> +pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY; >> break; >> >> +case PHY_INTERFACE_MODE_RGMII_ID: >> case PHY_INTERFACE_MODE_RGMII_TXID: >> -pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY; >> +pad_mode = PAD_MODE_RGMII; >> break; >> >> default: > > I still don't like this. Having both the MAC and PHY drivers react to > the phy-connection-type property is bound to cause trouble somewhere. > > The only way out of the current mess is to define new properties for > both MAC and PHY that override the existing ones if present. Do you mean defining 4 new bindings and their corresponding phy_interface_t enum values? For example: "rgmii-v2" "rgmii-id-v2" "rgmii-rxid-v2" "rgmii-txid-v2" PHY_INTERFACE_MODE_RGMII_V2, PHY_INTERFACE_MODE_RGMII_ID_V2, PHY_INTERFACE_MODE_RGMII_RXID_V2, PHY_INTERFACE_MODE_RGMII_TXID_V2, And then handling these new enums in the at803x and nb8800 drivers? FWIW, PAD_MODE_GTX_CLK_DELAY is broken in tango5 (doesn't add any delay). I'm considering removing MAC-side TX delay altogether. Regards.
[PATCH net 2/2] bpf/verifier: fix min/max handling in BPF_SUB
We have to subtract the src max from the dst min, and vice-versa, since (e.g.) the smallest result comes from the largest subtrahend. Fixes: 484611357c19 ("bpf: allow access into map value arrays") Signed-off-by: Edward Cree--- kernel/bpf/verifier.c | 21 +++-- 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index af9e84a..664d939 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1865,10 +1865,12 @@ static void adjust_reg_min_max_vals(struct bpf_verifier_env *env, * do our normal operations to the register, we need to set the values * to the min/max since they are undefined. */ - if (min_val == BPF_REGISTER_MIN_RANGE) - dst_reg->min_value = BPF_REGISTER_MIN_RANGE; - if (max_val == BPF_REGISTER_MAX_RANGE) - dst_reg->max_value = BPF_REGISTER_MAX_RANGE; + if (opcode != BPF_SUB) { + if (min_val == BPF_REGISTER_MIN_RANGE) + dst_reg->min_value = BPF_REGISTER_MIN_RANGE; + if (max_val == BPF_REGISTER_MAX_RANGE) + dst_reg->max_value = BPF_REGISTER_MAX_RANGE; + } switch (opcode) { case BPF_ADD: @@ -1879,10 +1881,17 @@ static void adjust_reg_min_max_vals(struct bpf_verifier_env *env, dst_reg->min_align = min(src_align, dst_align); break; case BPF_SUB: + /* If one of our values was at the end of our ranges, then the +* _opposite_ value in the dst_reg goes to the end of our range. +*/ + if (min_val == BPF_REGISTER_MIN_RANGE) + dst_reg->max_value = BPF_REGISTER_MAX_RANGE; + if (max_val == BPF_REGISTER_MAX_RANGE) + dst_reg->min_value = BPF_REGISTER_MIN_RANGE; if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE) - dst_reg->min_value -= min_val; + dst_reg->min_value -= max_val; if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE) - dst_reg->max_value -= max_val; + dst_reg->max_value -= min_val; dst_reg->min_align = min(src_align, dst_align); break; case BPF_MUL:
[PATCH net 1/2] selftests/bpf: subtraction bounds test
There is a bug in the verifier's handling of BPF_SUB: [a,b] - [c,d] yields was [a-c, b-d] rather than the correct [a-d, b-c]. So here is a test which, with the bogus handling, will produce ranges of [0,0] and thus allowed accesses; whereas the correct handling will give a range of [-255, 255] (and hence the right-shift will give a range of [0, 255]) and the accesses will be rejected. Signed-off-by: Edward Cree--- tools/testing/selftests/bpf/test_verifier.c | 28 1 file changed, 28 insertions(+) diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c index af7d173..addea82 100644 --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -5980,6 +5980,34 @@ static struct bpf_test tests[] = { .result = REJECT, .result_unpriv = REJECT, }, + { + "subtraction bounds (map value)", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, +BPF_FUNC_map_lookup_elem), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 9), + BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0), + BPF_JMP_IMM(BPF_JGT, BPF_REG_1, 0xff, 7), + BPF_LDX_MEM(BPF_B, BPF_REG_3, BPF_REG_0, 1), + BPF_JMP_IMM(BPF_JGT, BPF_REG_3, 0xff, 5), + BPF_ALU64_REG(BPF_SUB, BPF_REG_1, BPF_REG_3), + BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 56), + BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1), + BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0), + BPF_EXIT_INSN(), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, + .errstr_unpriv = "R0 pointer arithmetic prohibited", + .errstr = "R0 min value is negative, either use unsigned index or do a if (index >=0) check.", + .result = REJECT, + .result_unpriv = REJECT, + }, }; static int probe_filter_length(const struct bpf_insn *fp)
[PATCH net 0/2] bpf: fix verifier min/max handling in BPF_SUB
I managed to come up with a test for the swapped bounds in BPF_SUB, so here it is along with a patch that fixes it, separated out from my 'rewrite everything' series so it can go to -stable. Edward Cree (2): selftests/bpf: subtraction bounds test bpf/verifier: fix min/max handling in BPF_SUB kernel/bpf/verifier.c | 21 +++-- tools/testing/selftests/bpf/test_verifier.c | 28 2 files changed, 43 insertions(+), 6 deletions(-)
Re: [PATCH v2 1/4] net: phy: at803x: Document RGMII RX and TX clock delay issues
On 21/07/2017 15:20, Timur Tabi wrote: > On 7/21/17 6:25 AM, Marc Gonzalez wrote: > >> + * NB: This code assumes that RGMII RX clock delay is disabled >> + * at reset, but actually, RX clock delay is enabled at reset. > > Could we change this to say, "RX clock delay is enabled at reset on some > systems."? Otherwise, it implies that the code is wrong 100% of the > time and should be fixed, not documented. I don't understand what you're saying. It is a correct observation that the code enabling RGMII RX clock delay is a NOP, since that bit will always be set at that point. The spec for the 8035 (I haven't checked for 8030 and 8031, is that what you meant by "other systems"?) states that Sel_clk125m_dsp, which is described as: "Control bit for rgmii interface rx clock delay" is 1 after HW reset, 1 after SW reset. So my statement "RX clock delay is enabled at reset" is universally true. It's not just on some systems. What am I missing? Regards.
Re: [PATCH v2 1/4] net: phy: at803x: Document RGMII RX and TX clock delay issues
On 7/21/17 6:25 AM, Marc Gonzalez wrote: +* NB: This code assumes that RGMII RX clock delay is disabled +* at reset, but actually, RX clock delay is enabled at reset. Could we change this to say, "RX clock delay is enabled at reset on some systems."? Otherwise, it implies that the code is wrong 100% of the time and should be fixed, not documented. -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation.
Re: [Patch net] team: use a larger struct for mac address
Hi Cong, [auto build test WARNING on net/master] url: https://github.com/0day-ci/linux/commits/Cong-Wang/team-use-a-larger-struct-for-mac-address/20170721-203849 config: xtensa-allyesconfig (attached as .config) compiler: xtensa-linux-gcc (GCC) 4.9.0 reproduce: wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=xtensa All warnings (new ones prefixed by >>): drivers/net/team/team.c: In function '__set_port_dev_addr': >> drivers/net/team/team.c:67:9: warning: passing argument 2 of >> 'dev_set_mac_address' from incompatible pointer type return dev_set_mac_address(port_dev, ); ^ In file included from drivers/net/team/team.c:20:0: include/linux/netdevice.h:3290:5: note: expected 'struct sockaddr *' but argument is of type 'struct __kernel_sockaddr_storage *' int dev_set_mac_address(struct net_device *, struct sockaddr *); ^ vim +/dev_set_mac_address +67 drivers/net/team/team.c 3d249d4c Jiri Pirko 2011-11-11 55 3d249d4c Jiri Pirko 2011-11-11 56 /* 1d76efe1 Jiri Pirko 2012-08-17 57 * Since the ability to change device address for open port device is tested in 3d249d4c Jiri Pirko 2011-11-11 58 * team_port_add, this function can be called without control of return value 3d249d4c Jiri Pirko 2011-11-11 59 */ 1d76efe1 Jiri Pirko 2012-08-17 60 static int __set_port_dev_addr(struct net_device *port_dev, 3d249d4c Jiri Pirko 2011-11-11 61 const unsigned char *dev_addr) 3d249d4c Jiri Pirko 2011-11-11 62 { 44118243 Cong Wang 2017-07-20 63 struct sockaddr_storage addr; 3d249d4c Jiri Pirko 2011-11-11 64 44118243 Cong Wang 2017-07-20 65 memcpy(addr.__data, dev_addr, port_dev->addr_len); 44118243 Cong Wang 2017-07-20 66 addr.ss_family = port_dev->type; 3d249d4c Jiri Pirko 2011-11-11 @67 return dev_set_mac_address(port_dev, ); 3d249d4c Jiri Pirko 2011-11-11 68 } 3d249d4c Jiri Pirko 2011-11-11 69 :: The code at line 67 was first introduced by commit :: 3d249d4ca7d0ed6629a135ea1ea21c72286c0d80 net: introduce ethernet teaming device :: TO: Jiri Pirko <jpi...@redhat.com> :: CC: David S. Miller <da...@davemloft.net> --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH v2 0/4] RGMII RX and TX clock delays using Atheros 8035
On 21/07/2017 14:47, Marc Gonzalez wrote: > fping caps at 1000 packets per second, which limits > its usefulness as a benchmarking tool. "Normal" ping is slightly faster (5000 packets per second). time ping -f -c 6 -s 1450 -q 172.27.64.1 --- 172.27.64.1 ping statistics --- 6 packets transmitted, 6 received, 0% packet loss, time 12059ms rtt min/avg/max/mdev = 0.159/0.167/3.536/0.021 ms, ipg/ewma 0.201/0.166 ms real0m12.067s user0m0.899s sys 0m1.886s NIC statistics: rx_bytes_ok: 89760375 rx_frames_ok: 60003 rx_undersize_frames: 0 rx_fragment_frames: 0 rx_64_byte_frames: 2 rx_127_byte_frames: 0 rx_255_byte_frames: 1 rx_511_byte_frames: 0 rx_1023_byte_frames: 0 rx_max_size_frames: 6 rx_oversize_frames: 0 rx_bad_fcs_frames: 0 rx_broadcast_frames: 1 rx_multicast_frames: 0 rx_control_frames: 0 rx_pause_frames: 0 rx_unsup_control_frames: 0 rx_align_error_frames: 0 rx_overrun_frames: 0 rx_jabber_frames: 0 rx_bytes: 89760375 rx_frames: 60003 tx_bytes_ok: 89760128 tx_frames_ok: 60002 tx_64_byte_frames: 2 tx_127_byte_frames: 0 tx_255_byte_frames: 0 tx_511_byte_frames: 0 tx_1023_byte_frames: 0 tx_max_size_frames: 6 tx_oversize_frames: 0 tx_broadcast_frames: 1 tx_multicast_frames: 0 tx_control_frames: 0 tx_pause_frames: 0 tx_underrun_frames: 0 tx_single_collision_frames: 0 tx_multi_collision_frames: 0 tx_deferred_collision_frames: 0 tx_late_collision_frames: 0 tx_excessive_collision_frames: 0 tx_bytes: 89760128 tx_frames: 60002 tx_collisions: 0
Re: [Patch net] team: use a larger struct for mac address
Hi Cong, [auto build test ERROR on net/master] url: https://github.com/0day-ci/linux/commits/Cong-Wang/team-use-a-larger-struct-for-mac-address/20170721-203849 config: i386-randconfig-x019-07211017 (attached as .config) compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901 reproduce: # save the attached .config to linux build tree make ARCH=i386 All errors (new ones prefixed by >>): drivers/net/team/team.c: In function '__set_port_dev_addr': >> drivers/net/team/team.c:67:39: error: passing argument 2 of >> 'dev_set_mac_address' from incompatible pointer type >> [-Werror=incompatible-pointer-types] return dev_set_mac_address(port_dev, ); ^ In file included from drivers/net/team/team.c:20:0: include/linux/netdevice.h:3290:5: note: expected 'struct sockaddr *' but argument is of type 'struct __kernel_sockaddr_storage *' int dev_set_mac_address(struct net_device *, struct sockaddr *); ^~~ cc1: some warnings being treated as errors vim +/dev_set_mac_address +67 drivers/net/team/team.c 3d249d4c Jiri Pirko 2011-11-11 55 3d249d4c Jiri Pirko 2011-11-11 56 /* 1d76efe1 Jiri Pirko 2012-08-17 57 * Since the ability to change device address for open port device is tested in 3d249d4c Jiri Pirko 2011-11-11 58 * team_port_add, this function can be called without control of return value 3d249d4c Jiri Pirko 2011-11-11 59 */ 1d76efe1 Jiri Pirko 2012-08-17 60 static int __set_port_dev_addr(struct net_device *port_dev, 3d249d4c Jiri Pirko 2011-11-11 61 const unsigned char *dev_addr) 3d249d4c Jiri Pirko 2011-11-11 62 { 44118243 Cong Wang 2017-07-20 63 struct sockaddr_storage addr; 3d249d4c Jiri Pirko 2011-11-11 64 44118243 Cong Wang 2017-07-20 65 memcpy(addr.__data, dev_addr, port_dev->addr_len); 44118243 Cong Wang 2017-07-20 66 addr.ss_family = port_dev->type; 3d249d4c Jiri Pirko 2011-11-11 @67 return dev_set_mac_address(port_dev, ); 3d249d4c Jiri Pirko 2011-11-11 68 } 3d249d4c Jiri Pirko 2011-11-11 69 :: The code at line 67 was first introduced by commit :: 3d249d4ca7d0ed6629a135ea1ea21c72286c0d80 net: introduce ethernet teaming device :: TO: Jiri Pirko <jpi...@redhat.com> :: CC: David S. Miller <da...@davemloft.net> --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH v2 3/4] net: ethernet: nb8800: Fix RGMII TX clock delay setup
Marc Gonzalezwrites: > According to commit e5f3a4a56ce2a707b2fb8ce37e4414dcac89c672 > ("Documentation: devicetree: clarify usage of the RGMII phy-modes") > there are 4 RGMII modes to handle: > > "rgmii" (RX and TX delays are added by the MAC when required) > "rgmii-id" (RGMII with internal RX and TX delays provided by the PHY, > the MAC should not add the RX or TX delays in this case) > "rgmii-rxid" (RGMII with internal RX delay provided by the PHY, > the MAC should not add an RX delay in this case) > "rgmii-txid" (RGMII with internal TX delay provided by the PHY, > the MAC should not add an TX delay in this case) > > Add TX delay in the MAC only for rgmii and rgmii-rxid. > > Signed-off-by: Marc Gonzalez > --- > drivers/net/ethernet/aurora/nb8800.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/aurora/nb8800.c > b/drivers/net/ethernet/aurora/nb8800.c > index ded041dbafe7..f3ed320eb4ad 100644 > --- a/drivers/net/ethernet/aurora/nb8800.c > +++ b/drivers/net/ethernet/aurora/nb8800.c > @@ -1268,11 +1268,13 @@ static int nb8800_tangox_init(struct net_device *dev) > break; > > case PHY_INTERFACE_MODE_RGMII: > - pad_mode = PAD_MODE_RGMII; > + case PHY_INTERFACE_MODE_RGMII_RXID: > + pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY; > break; > > + case PHY_INTERFACE_MODE_RGMII_ID: > case PHY_INTERFACE_MODE_RGMII_TXID: > - pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY; > + pad_mode = PAD_MODE_RGMII; > break; > > default: I still don't like this. Having both the MAC and PHY drivers react to the phy-connection-type property is bound to cause trouble somewhere. The only way out of the current mess is to define new properties for both MAC and PHY that override the existing ones if present. -- Måns Rullgård
Re: [PATCH v2 2/4] net: ethernet: nb8800: Set RGMII_MODE for all RGMII modes
Marc Gonzalezwrites: > There are 4 RGMII phy-modes to handle. > > Signed-off-by: Marc Gonzalez Acked-by: Mans Rullgard > --- > drivers/net/ethernet/aurora/nb8800.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/aurora/nb8800.c > b/drivers/net/ethernet/aurora/nb8800.c > index 041cfb7952f8..ded041dbafe7 100644 > --- a/drivers/net/ethernet/aurora/nb8800.c > +++ b/drivers/net/ethernet/aurora/nb8800.c > @@ -609,7 +609,7 @@ static void nb8800_mac_config(struct net_device *dev) > mac_mode |= HALF_DUPLEX; > > if (gigabit) { > - if (priv->phy_mode == PHY_INTERFACE_MODE_RGMII) > + if (phy_interface_is_rgmii(dev->phydev)) > mac_mode |= RGMII_MODE; > > mac_mode |= GMAC_MODE; > -- > 2.11.0 > -- Måns Rullgård
Re: [PATCH v2 0/4] RGMII RX and TX clock delays using Atheros 8035
On 21/07/2017 13:22, Marc Gonzalez wrote: > Changes from v1 > - Drop support for disabling RX and TX clock delays > (it breaks some boards). Document the issues instead. > - Split the MAC patch in two unrelated parts > - Fix the vantage 1172 DTS > > Marc Gonzalez (4): > net: phy: at803x: Document RGMII RX and TX clock delay issues > net: ethernet: nb8800: Set RGMII_MODE for all RGMII modes > net: ethernet: nb8800: Fix RGMII TX clock delay setup > ARM: dts: tango4: Add RGMII RX and TX clock delays > > arch/arm/boot/dts/tango4-vantage-1172.dts | 2 +- > drivers/net/ethernet/aurora/nb8800.c | 8 +--- > drivers/net/phy/at803x.c | 12 > 3 files changed, 18 insertions(+), 4 deletions(-) Rudimentary test: ip addr add 172.27.64.45/18 brd 172.27.127.255 dev eth0 ip link set eth0 up sleep 10 ## autoneg should be complete time fping -c 1 -b 1450 -p 1 -i 1 -q 172.27.64.1 ethtool -S eth0 cat /proc/interrupts # test_eth.sh [ 13.276242] ENTER at803x_enable_rx_delay [ 13.280283] BEFORE=82ee [ 13.282735] AFTER=82ee [ 13.285462] ENTER at803x_enable_tx_delay [ 13.289567] BEFORE=2d47 [ 13.292018] AFTER=2d47 [ 16.663266] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx 172.27.64.1 : xmt/rcv/%loss = 1/1/0%, min/avg/max = 0.18/0.21/3.79 real0m10.688s user0m0.106s sys 0m0.472s NIC statistics: rx_bytes_ok: 14960128 rx_frames_ok: 10002 rx_undersize_frames: 0 rx_fragment_frames: 0 rx_64_byte_frames: 2 rx_127_byte_frames: 0 rx_255_byte_frames: 0 rx_511_byte_frames: 0 rx_1023_byte_frames: 0 rx_max_size_frames: 1 rx_oversize_frames: 0 rx_bad_fcs_frames: 0 rx_broadcast_frames: 0 rx_multicast_frames: 0 rx_control_frames: 0 rx_pause_frames: 0 rx_unsup_control_frames: 0 rx_align_error_frames: 0 rx_overrun_frames: 0 rx_jabber_frames: 0 rx_bytes: 14960128 rx_frames: 10002 tx_bytes_ok: 14960128 tx_frames_ok: 10002 tx_64_byte_frames: 2 tx_127_byte_frames: 0 tx_255_byte_frames: 0 tx_511_byte_frames: 0 tx_1023_byte_frames: 0 tx_max_size_frames: 1 tx_oversize_frames: 0 tx_broadcast_frames: 1 tx_multicast_frames: 0 tx_control_frames: 0 tx_pause_frames: 0 tx_underrun_frames: 0 tx_single_collision_frames: 0 tx_multi_collision_frames: 0 tx_deferred_collision_frames: 0 tx_late_collision_frames: 0 tx_excessive_collision_frames: 0 tx_bytes: 14960128 tx_frames: 10002 tx_collisions: 0 CPU0 CPU1 19: 10825983 GIC-0 29 Edge twd 20: 85 0 irq0 1 Level ttyS0 21: 0 0 irq0 60 Level mmc0 22:228 0 irq0 8 Level mmc1 25: 20005 0 irq0 38 Level eth0 28: 1 0 irq0 37 Edge 26000.nb8800-mii:04 IPI0: 0 0 CPU wakeup interrupts IPI1: 0 0 Timer broadcast interrupts IPI2: 1183 1725 Rescheduling interrupts IPI3: 0 1 Function call interrupts IPI4: 0 0 CPU stop interrupts IPI5: 1 0 IRQ work interrupts IPI6: 0 0 completion interrupts Err: 0 fping caps at 1000 packets per second, which limits its usefulness as a benchmarking tool. Regards.
Backwards route when adding IPv6 peer?
Hi, this looks backwards to me: | root@blackbox:~# ip -6 addr add fd12:3456:789a::1 peer fd99::::1/48 dev sit0 | root@blackbox:~# ip -6 route show dev sit0 | fd12:3456:789a::/48 proto kernel metric 256 pref medium | fd99::::1 proto kernel metric 256 pref medium | root@blackbox:~# ip -6 addr show dev sit0 | 4: sit0@NONE: mtu 1480 state DOWN qlen 1 | inet6 fd12:3456:789a::1 peer fd99::::1/48 scope global |valid_lft forever preferred_lft forever The address+peer pair gets added just fine / appears correctly in the addr show output. But the routing table has the *local* address widened to the given prefix len, routing the local address space to the remote side. That is, I wanted to get a route to the peer's whole address space implicitly, but I got a route of *my* address space to the tunnel. Is this a Linux kernel bug? Or is it just working as expected? Quoting from http://www.policyrouting.org/iproute2.doc.html#ss9.2.1 | peer ADDRESS--- address of remote endpoint for pointopoint interfaces. | Again, the ADDRESS may be followed by a slash and decimal number, | encoding the network prefix length. If a peer address is specified then | the local address cannot have a network prefix length as the network | prefix is associated with the peer rather than with the local address. | In other words, netmasks can only be assigned to peer addresses when | specifying both peer and local addresses. This reads to me as if to support that the current kernel behaviour is wrong. blackbox is my notebook running something just before release of Debian 9: | root@blackbox:~# dpkg -l iproute2 | grep ^.i | ii iproute2 4.9.0-1 amd64networking and traffic control tools | root@blackbox:~# uname -a | Linux blackbox 4.9.0-2-amd64 #1 SMP Debian 4.9.13-1 (2017-02-27) x86_64 GNU/Linux but I had the same problem initially on my home server/router/firewall running Debian 8, while tweaking my OpenVPN tunnel configuration. | root@mandarb:~# dpkg -l iproute2 | grep ^.i | ii iproute2 3.16.0-2 amd64networking and traffic control tools | root@mandarb:~# uname -a | Linux mandarb 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u3 (2016-01-17) x86_64 GNU/Linux Regards, Fabian
[PATCH v2 3/4] net: ethernet: nb8800: Fix RGMII TX clock delay setup
According to commit e5f3a4a56ce2a707b2fb8ce37e4414dcac89c672 ("Documentation: devicetree: clarify usage of the RGMII phy-modes") there are 4 RGMII modes to handle: "rgmii" (RX and TX delays are added by the MAC when required) "rgmii-id" (RGMII with internal RX and TX delays provided by the PHY, the MAC should not add the RX or TX delays in this case) "rgmii-rxid" (RGMII with internal RX delay provided by the PHY, the MAC should not add an RX delay in this case) "rgmii-txid" (RGMII with internal TX delay provided by the PHY, the MAC should not add an TX delay in this case) Add TX delay in the MAC only for rgmii and rgmii-rxid. Signed-off-by: Marc Gonzalez--- drivers/net/ethernet/aurora/nb8800.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/aurora/nb8800.c b/drivers/net/ethernet/aurora/nb8800.c index ded041dbafe7..f3ed320eb4ad 100644 --- a/drivers/net/ethernet/aurora/nb8800.c +++ b/drivers/net/ethernet/aurora/nb8800.c @@ -1268,11 +1268,13 @@ static int nb8800_tangox_init(struct net_device *dev) break; case PHY_INTERFACE_MODE_RGMII: - pad_mode = PAD_MODE_RGMII; + case PHY_INTERFACE_MODE_RGMII_RXID: + pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY; break; + case PHY_INTERFACE_MODE_RGMII_ID: case PHY_INTERFACE_MODE_RGMII_TXID: - pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY; + pad_mode = PAD_MODE_RGMII; break; default: -- 2.11.0
[PATCH v2 1/4] net: phy: at803x: Document RGMII RX and TX clock delay issues
The current code supports enabling RGMII RX and TX clock delays. The unstated assumption is that these settings are disabled by default at reset, which is not the case. RX clock delay is enabled at reset. And TX clock delay "survives" across SW resets. Thus, if the bootloader enables TX clock delay, it will remain enabled at reset in Linux. Signed-off-by: Marc Gonzalez--- drivers/net/phy/at803x.c | 12 1 file changed, 12 insertions(+) diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c index c1e52b9dc58d..7a0954513b91 100644 --- a/drivers/net/phy/at803x.c +++ b/drivers/net/phy/at803x.c @@ -283,6 +283,12 @@ static int at803x_config_init(struct phy_device *phydev) if (ret < 0) return ret; + /* +* NB: This code assumes that RGMII RX clock delay is disabled +* at reset, but actually, RX clock delay is enabled at reset. +* Disabling the delay if it has not been explicitly requested +* breaks boards that rely on the enabled-by-default behavior. +*/ if (phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID || phydev->interface == PHY_INTERFACE_MODE_RGMII_ID) { ret = at803x_enable_rx_delay(phydev); @@ -290,6 +296,12 @@ static int at803x_config_init(struct phy_device *phydev) return ret; } + /* +* NB: This code assumes that RGMII TX clock delay is disabled +* at reset, but actually, TX clock delay "survives" across SW +* resets. If the bootloader enables TX clock delay, Linux is +* stuck with that setting. +*/ if (phydev->interface == PHY_INTERFACE_MODE_RGMII_TXID || phydev->interface == PHY_INTERFACE_MODE_RGMII_ID) { ret = at803x_enable_tx_delay(phydev); -- 2.11.0
[PATCH v2 2/4] net: ethernet: nb8800: Set RGMII_MODE for all RGMII modes
There are 4 RGMII phy-modes to handle. Signed-off-by: Marc Gonzalez--- drivers/net/ethernet/aurora/nb8800.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/aurora/nb8800.c b/drivers/net/ethernet/aurora/nb8800.c index 041cfb7952f8..ded041dbafe7 100644 --- a/drivers/net/ethernet/aurora/nb8800.c +++ b/drivers/net/ethernet/aurora/nb8800.c @@ -609,7 +609,7 @@ static void nb8800_mac_config(struct net_device *dev) mac_mode |= HALF_DUPLEX; if (gigabit) { - if (priv->phy_mode == PHY_INTERFACE_MODE_RGMII) + if (phy_interface_is_rgmii(dev->phydev)) mac_mode |= RGMII_MODE; mac_mode |= GMAC_MODE; -- 2.11.0
[PATCH v2 0/4] RGMII RX and TX clock delays using Atheros 8035
Changes from v1 - Drop support for disabling RX and TX clock delays (it breaks some boards). Document the issues instead. - Split the MAC patch in two unrelated parts - Fix the vantage 1172 DTS Marc Gonzalez (4): net: phy: at803x: Document RGMII RX and TX clock delay issues net: ethernet: nb8800: Set RGMII_MODE for all RGMII modes net: ethernet: nb8800: Fix RGMII TX clock delay setup ARM: dts: tango4: Add RGMII RX and TX clock delays arch/arm/boot/dts/tango4-vantage-1172.dts | 2 +- drivers/net/ethernet/aurora/nb8800.c | 8 +--- drivers/net/phy/at803x.c | 12 3 files changed, 18 insertions(+), 4 deletions(-) -- 2.11.0
[PATCH v2 4/4] ARM: dts: tango4: Add RGMII RX and TX clock delays
RX and TX clock delays are required. Signed-off-by: Marc Gonzalez--- arch/arm/boot/dts/tango4-vantage-1172.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/boot/dts/tango4-vantage-1172.dts b/arch/arm/boot/dts/tango4-vantage-1172.dts index 86d8df98802f..13bcc460bcb2 100644 --- a/arch/arm/boot/dts/tango4-vantage-1172.dts +++ b/arch/arm/boot/dts/tango4-vantage-1172.dts @@ -22,7 +22,7 @@ }; { - phy-connection-type = "rgmii"; + phy-connection-type = "rgmii-id"; phy-handle = <_phy>; #address-cells = <1>; #size-cells = <0>; -- 2.11.0
Re: [PATCH] bluetooth: document config options
Hi Pavel, >>> Kernel config options should include useful help text; I had to look >>> up the terms on wikipedia. >>> >>> Signed-off-by: Pavel Machek>>> >>> diff --git a/net/bluetooth/Kconfig b/net/bluetooth/Kconfig >>> index 68f951b..133c8a6 100644 >>> --- a/net/bluetooth/Kconfig >>> +++ b/net/bluetooth/Kconfig >>> @@ -45,6 +45,8 @@ config BT_BREDR >>> bool "Bluetooth Classic (BR/EDR) features" >>> depends on BT >>> default y >>> + help >>> + Support Bluetooth version 1 and 2 connections. >> >> so this is actually not correct. Version 5.0 is also Bluetooth BR/EDR. >> >> I am fine if use the terms "Basic Rate" and "Enhanced Data Rate" in the >> description, but the version numbers are kinda not how this works. So some >> alternative wording needs to be used: >> >> "Bluetooth Classic includes support for Basic Rate (BR) available with >> Bluetooth version 1.0b or later and support for Enhanced Data Rate (EDR) >> available with Bluetooth version 2.0 or later." >> > > Ok, works for me. > >>> + help >>> + Bluetooth 3 introduces high-speed mode where bluetooth endpoints >>> + negotiate fast connection over WIFI. This controls its support. >> >> Again, while Bluetooth version 3.0 introduces HS support, it is not what is >> selecting this. >> >> I think here we can write something like this: >> >> "Bluetooth High Speed includes support for off-loading Bluetooth connections >> via 802.11 physical layer available with Bluetooth version 3.0 or later." >> > > Ok. > >>> config BT_LE >>> bool "Bluetooth Low Energy (LE) features" >>> depends on BT >>> default y >>> + help >>> + Bluetooth 4 introduces special low-energy protocol, designed >>> + for simple devices. This controls its support. >> >> I think here it is also better to have an alternative text: >> >> "Bluetooth Low Energy includes support low-energy physical layer available >> with Bluetooth version 4.0 or later." >> > > Ok. Do I understand it correctly that Bluetooth LE basically has > nothing to do with bluetooth, and that Bluetooth 2 hardware will not > be able to detect / communicate with Bluetooth LE hardware? if you have LE only on one side and BR/EDR only on the other side, then they can not communicate with each other. You need at least one side with dual-mode BR/EDR/LE support. Regards Marcel
Re: [PATCH net v2] udp: preserve skb->dst if required for IP options processing
On Tue, Jul 18, 2017 at 11:57:55AM +0200, Paolo Abeni wrote: > Eric noticed that in udp_recvmsg() we still need to access > skb->dst while processing the IP options. > Since commit 0a463c78d25b ("udp: avoid a cache miss on dequeue") > skb->dst is no more available at recvmsg() time and bad things > will happen if we enter the relevant code path. > > This commit address the issue, avoid clearing skb->dst if > any IP options are present into the relevant skb. > Since the IP CB is contained in the first skb cacheline, we can > test it to decide to leverage the consume_stateless_skb() > optimization, without measurable additional cost in the faster > path. This doesn't seem to be enough for XFRM case. CMSG gets eaten up in selinux-testsuite[1]/inet_socket test in ipsec configuration. [1] https://github.com/SELinuxProject/selinux-testsuite.git -- Regards, Artem
[PATCH net-next v2 0/6] Allow to switch off UDP-based tunnel offloads per netdevice
This patchset adds a new netdevice feature to toggle RX offloads of UDP-based tunnel via ethtool. This is useful if the offload is causing issues, for example if the hardware is buggy. The feature is added to all devices providing the ->ndo_udp_tunnel_add op, and enabled by default to preserve current behavior. When the administrator disables this feature on a device, all currently offloaded ports are cleared from the device. When the feature is turned on, the stack notifies the device about all current ports. v2: - rename feature bit to NETIF_F_RX_UDP_TUNNEL_PORT - rename ethtool feature to rx-udp_tunnel-port-offload Sabrina Dubroca (6): net: add new netdevice feature for offload of RX port for UDP tunnels net: check UDP tunnel RX port offload feature before calling tunnel ndo ndo net: add infrastructure to un-offload UDP tunnel port net: call udp_tunnel_get_rx_info when NETIF_F_RX_UDP_TUNNEL_PORT is toggled geneve/vxlan: add support for NETDEV_UDP_TUNNEL_DROP_INFO geneve/vxlan: offload ports on register/unregister events drivers/net/geneve.c| 24 ++-- drivers/net/vxlan.c | 31 ++- include/linux/netdev_features.h | 2 ++ include/linux/netdevice.h | 1 + include/net/udp_tunnel.h| 8 net/core/dev.c | 28 +++- net/core/ethtool.c | 1 + net/ipv4/udp_tunnel.c | 25 - 8 files changed, 103 insertions(+), 17 deletions(-) -- 2.13.2
[PATCH net-next v2 1/6] net: add new netdevice feature for offload of RX port for UDP tunnels
This adds a new netdevice feature, so that the offloading of RX port for UDP tunnels can be disabled by the administrator on some netdevices, using the "rx-udp_tunnel-port-offload" feature in ethtool. This feature is set for all devices that provide ndo_udp_tunnel_add. Signed-off-by: Sabrina Dubroca--- include/linux/netdev_features.h | 2 ++ net/core/dev.c | 6 ++ net/core/ethtool.c | 1 + 3 files changed, 9 insertions(+) diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h index ebd273627334..dc8b4896b77b 100644 --- a/include/linux/netdev_features.h +++ b/include/linux/netdev_features.h @@ -75,6 +75,7 @@ enum { NETIF_F_HW_TC_BIT, /* Offload TC infrastructure */ NETIF_F_HW_ESP_BIT, /* Hardware ESP transformation offload */ NETIF_F_HW_ESP_TX_CSUM_BIT, /* ESP with TX checksum offload */ + NETIF_F_RX_UDP_TUNNEL_PORT_BIT, /* Offload of RX port for UDP tunnels */ /* * Add your fresh new feature above and remember to update @@ -138,6 +139,7 @@ enum { #define NETIF_F_HW_TC __NETIF_F(HW_TC) #define NETIF_F_HW_ESP __NETIF_F(HW_ESP) #define NETIF_F_HW_ESP_TX_CSUM __NETIF_F(HW_ESP_TX_CSUM) +#defineNETIF_F_RX_UDP_TUNNEL_PORT __NETIF_F(RX_UDP_TUNNEL_PORT) #define for_each_netdev_feature(mask_addr, bit)\ for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT) diff --git a/net/core/dev.c b/net/core/dev.c index 509af6ce8831..9081134adc0d 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -7530,6 +7530,12 @@ int register_netdevice(struct net_device *dev) */ dev->hw_features |= NETIF_F_SOFT_FEATURES; dev->features |= NETIF_F_SOFT_FEATURES; + + if (dev->netdev_ops->ndo_udp_tunnel_add) { + dev->features |= NETIF_F_RX_UDP_TUNNEL_PORT; + dev->hw_features |= NETIF_F_RX_UDP_TUNNEL_PORT; + } + dev->wanted_features = dev->features & dev->hw_features; if (!(dev->flags & IFF_LOOPBACK)) diff --git a/net/core/ethtool.c b/net/core/ethtool.c index 78408ab77a10..b987bc475fc8 100644 --- a/net/core/ethtool.c +++ b/net/core/ethtool.c @@ -105,6 +105,7 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] [NETIF_F_HW_TC_BIT] ="hw-tc-offload", [NETIF_F_HW_ESP_BIT] = "esp-hw-offload", [NETIF_F_HW_ESP_TX_CSUM_BIT] = "esp-tx-csum-hw-offload", + [NETIF_F_RX_UDP_TUNNEL_PORT_BIT] = "rx-udp_tunnel-port-offload", }; static const char -- 2.13.2
[PATCH net-next v2 5/6] geneve/vxlan: add support for NETDEV_UDP_TUNNEL_DROP_INFO
Signed-off-by: Sabrina Dubroca--- drivers/net/geneve.c | 19 +-- drivers/net/vxlan.c | 25 + 2 files changed, 30 insertions(+), 14 deletions(-) diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c index de8156c6b292..ee52233750f7 100644 --- a/drivers/net/geneve.c +++ b/drivers/net/geneve.c @@ -1014,16 +1014,22 @@ static struct device_type geneve_type = { * supply the listening GENEVE udp ports. Callers are expected * to implement the ndo_udp_tunnel_add. */ -static void geneve_push_rx_ports(struct net_device *dev) +static void geneve_offload_rx_ports(struct net_device *dev, bool push) { struct net *net = dev_net(dev); struct geneve_net *gn = net_generic(net, geneve_net_id); struct geneve_sock *gs; rcu_read_lock(); - list_for_each_entry_rcu(gs, >sock_list, list) - udp_tunnel_push_rx_port(dev, gs->sock, - UDP_TUNNEL_TYPE_GENEVE); + list_for_each_entry_rcu(gs, >sock_list, list) { + if (push) { + udp_tunnel_push_rx_port(dev, gs->sock, + UDP_TUNNEL_TYPE_GENEVE); + } else { + udp_tunnel_drop_rx_port(dev, gs->sock, + UDP_TUNNEL_TYPE_GENEVE); + } + } rcu_read_unlock(); } @@ -1426,8 +1432,9 @@ static int geneve_netdevice_event(struct notifier_block *unused, { struct net_device *dev = netdev_notifier_info_to_dev(ptr); - if (event == NETDEV_UDP_TUNNEL_PUSH_INFO) - geneve_push_rx_ports(dev); + if (event == NETDEV_UDP_TUNNEL_PUSH_INFO || + event == NETDEV_UDP_TUNNEL_DROP_INFO) + geneve_offload_rx_ports(dev, event == NETDEV_UDP_TUNNEL_PUSH_INFO); return NOTIFY_DONE; } diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 96aa7e6cf214..4642d5be2fa0 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -2608,7 +2608,7 @@ static struct device_type vxlan_type = { * supply the listening VXLAN udp ports. Callers are expected * to implement the ndo_udp_tunnel_add. */ -static void vxlan_push_rx_ports(struct net_device *dev) +static void vxlan_offload_rx_ports(struct net_device *dev, bool push) { struct vxlan_sock *vs; struct net *net = dev_net(dev); @@ -2617,11 +2617,19 @@ static void vxlan_push_rx_ports(struct net_device *dev) spin_lock(>sock_lock); for (i = 0; i < PORT_HASH_SIZE; ++i) { - hlist_for_each_entry_rcu(vs, >sock_list[i], hlist) - udp_tunnel_push_rx_port(dev, vs->sock, - (vs->flags & VXLAN_F_GPE) ? - UDP_TUNNEL_TYPE_VXLAN_GPE : - UDP_TUNNEL_TYPE_VXLAN); + hlist_for_each_entry_rcu(vs, >sock_list[i], hlist) { + unsigned short type; + + if (vs->flags & VXLAN_F_GPE) + type = UDP_TUNNEL_TYPE_VXLAN_GPE; + else + type = UDP_TUNNEL_TYPE_VXLAN; + + if (push) + udp_tunnel_push_rx_port(dev, vs->sock, type); + else + udp_tunnel_drop_rx_port(dev, vs->sock, type); + } } spin_unlock(>sock_lock); } @@ -3632,8 +3640,9 @@ static int vxlan_netdevice_event(struct notifier_block *unused, if (event == NETDEV_UNREGISTER) vxlan_handle_lowerdev_unregister(vn, dev); - else if (event == NETDEV_UDP_TUNNEL_PUSH_INFO) - vxlan_push_rx_ports(dev); + else if (event == NETDEV_UDP_TUNNEL_PUSH_INFO || +event == NETDEV_UDP_TUNNEL_DROP_INFO) + vxlan_offload_rx_ports(dev, event == NETDEV_UDP_TUNNEL_PUSH_INFO); return NOTIFY_DONE; } -- 2.13.2
[PATCH net-next v2 4/6] net: call udp_tunnel_get_rx_info when NETIF_F_RX_UDP_TUNNEL_PORT is toggled
NETIF_F_RX_UDP_TUNNEL_PORT is special, in that we need to do more than just flip the bit in dev->features. When disabling we must also clear currently offloaded ports from the device, and when enabling we must tell the device to offload the ports it can. Because vxlan stores its sockets in a hashtable and they are inserted at the head of per-bucket lists, switching the feature off and then on can result in a different set of ports being offloaded (depending on the HW's limits). Signed-off-by: Sabrina Dubroca--- net/core/dev.c | 22 +- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/net/core/dev.c b/net/core/dev.c index 9081134adc0d..8ea6b4b42611 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -144,6 +144,7 @@ #include #include #include +#include #include "net-sysfs.h" @@ -7327,8 +7328,27 @@ int __netdev_update_features(struct net_device *dev) netdev_for_each_lower_dev(dev, lower, iter) netdev_sync_lower_features(dev, lower, features); - if (!err) + if (!err) { + netdev_features_t diff = features ^ dev->features; + + if (diff & NETIF_F_RX_UDP_TUNNEL_PORT) { + /* udp_tunnel_{get,drop}_rx_info both need +* NETIF_F_RX_UDP_TUNNEL_PORT enabled on the +* device, or they won't do anything. +* Thus we need to update dev->features +* *before* calling udp_tunnel_get_rx_info, +* but *after* calling udp_tunnel_drop_rx_info. +*/ + if (features & NETIF_F_RX_UDP_TUNNEL_PORT) { + dev->features = features; + udp_tunnel_get_rx_info(dev); + } else { + udp_tunnel_drop_rx_info(dev); + } + } + dev->features = features; + } return err < 0 ? 0 : 1; } -- 2.13.2
[PATCH net-next v2 6/6] geneve/vxlan: offload ports on register/unregister events
This improves consistency of handling when moving a netdev to another netns. Most drivers currently do a full reset when the device goes up, so that will flush the offload state anyway. Signed-off-by: Sabrina Dubroca--- drivers/net/geneve.c | 7 ++- drivers/net/vxlan.c | 10 +++--- 2 files changed, 13 insertions(+), 4 deletions(-) diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c index ee52233750f7..6c6350b5c201 100644 --- a/drivers/net/geneve.c +++ b/drivers/net/geneve.c @@ -1433,8 +1433,13 @@ static int geneve_netdevice_event(struct notifier_block *unused, struct net_device *dev = netdev_notifier_info_to_dev(ptr); if (event == NETDEV_UDP_TUNNEL_PUSH_INFO || - event == NETDEV_UDP_TUNNEL_DROP_INFO) + event == NETDEV_UDP_TUNNEL_DROP_INFO) { geneve_offload_rx_ports(dev, event == NETDEV_UDP_TUNNEL_PUSH_INFO); + } else if (event == NETDEV_UNREGISTER) { + geneve_offload_rx_ports(dev, false); + } else if (event == NETDEV_REGISTER) { + geneve_offload_rx_ports(dev, true); + } return NOTIFY_DONE; } diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 4642d5be2fa0..dbca067540d0 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -3638,11 +3638,15 @@ static int vxlan_netdevice_event(struct notifier_block *unused, struct net_device *dev = netdev_notifier_info_to_dev(ptr); struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id); - if (event == NETDEV_UNREGISTER) + if (event == NETDEV_UNREGISTER) { + vxlan_offload_rx_ports(dev, false); vxlan_handle_lowerdev_unregister(vn, dev); - else if (event == NETDEV_UDP_TUNNEL_PUSH_INFO || -event == NETDEV_UDP_TUNNEL_DROP_INFO) + } else if (event == NETDEV_REGISTER) { + vxlan_offload_rx_ports(dev, true); + } else if (event == NETDEV_UDP_TUNNEL_PUSH_INFO || + event == NETDEV_UDP_TUNNEL_DROP_INFO) { vxlan_offload_rx_ports(dev, event == NETDEV_UDP_TUNNEL_PUSH_INFO); + } return NOTIFY_DONE; } -- 2.13.2
[PATCH net-next v2 3/6] net: add infrastructure to un-offload UDP tunnel port
This adds a new NETDEV_UDP_TUNNEL_DROP_INFO event, similar to NETDEV_UDP_TUNNEL_PUSH_INFO, to signal to un-offload ports. This also adds udp_tunnel_drop_rx_port(), which calls ndo_udp_tunnel_del. Signed-off-by: Sabrina Dubroca--- include/linux/netdevice.h | 1 + include/net/udp_tunnel.h | 8 net/ipv4/udp_tunnel.c | 18 ++ 3 files changed, 27 insertions(+) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 614642eb7eb7..3a3cdc1b1f31 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2317,6 +2317,7 @@ struct netdev_lag_lower_state_info { #define NETDEV_PRECHANGEUPPER 0x001A #define NETDEV_CHANGELOWERSTATE0x001B #define NETDEV_UDP_TUNNEL_PUSH_INFO0x001C +#define NETDEV_UDP_TUNNEL_DROP_INFO0x001D #define NETDEV_CHANGE_TX_QUEUE_LEN 0x001E int register_netdevice_notifier(struct notifier_block *nb); diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h index 02c5be037451..10cce0dd4450 100644 --- a/include/net/udp_tunnel.h +++ b/include/net/udp_tunnel.h @@ -115,6 +115,8 @@ struct udp_tunnel_info { /* Notify network devices of offloadable types */ void udp_tunnel_push_rx_port(struct net_device *dev, struct socket *sock, unsigned short type); +void udp_tunnel_drop_rx_port(struct net_device *dev, struct socket *sock, +unsigned short type); void udp_tunnel_notify_add_rx_port(struct socket *sock, unsigned short type); void udp_tunnel_notify_del_rx_port(struct socket *sock, unsigned short type); @@ -124,6 +126,12 @@ static inline void udp_tunnel_get_rx_info(struct net_device *dev) call_netdevice_notifiers(NETDEV_UDP_TUNNEL_PUSH_INFO, dev); } +static inline void udp_tunnel_drop_rx_info(struct net_device *dev) +{ + ASSERT_RTNL(); + call_netdevice_notifiers(NETDEV_UDP_TUNNEL_DROP_INFO, dev); +} + /* Transmit the skb using UDP encapsulation. */ void udp_tunnel_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb, __be32 src, __be32 dst, __u8 tos, __u8 ttl, diff --git a/net/ipv4/udp_tunnel.c b/net/ipv4/udp_tunnel.c index 0d3f14cdc524..6539ff15e9a3 100644 --- a/net/ipv4/udp_tunnel.c +++ b/net/ipv4/udp_tunnel.c @@ -94,6 +94,24 @@ void udp_tunnel_push_rx_port(struct net_device *dev, struct socket *sock, } EXPORT_SYMBOL_GPL(udp_tunnel_push_rx_port); +void udp_tunnel_drop_rx_port(struct net_device *dev, struct socket *sock, +unsigned short type) +{ + struct sock *sk = sock->sk; + struct udp_tunnel_info ti; + + if (!dev->netdev_ops->ndo_udp_tunnel_del || + !(dev->features & NETIF_F_RX_UDP_TUNNEL_PORT)) + return; + + ti.type = type; + ti.sa_family = sk->sk_family; + ti.port = inet_sk(sk)->inet_sport; + + dev->netdev_ops->ndo_udp_tunnel_del(dev, ); +} +EXPORT_SYMBOL_GPL(udp_tunnel_drop_rx_port); + /* Notify netdevs that UDP port started listening */ void udp_tunnel_notify_add_rx_port(struct socket *sock, unsigned short type) { -- 2.13.2
[PATCH net-next v2 2/6] net: check UDP tunnel RX port offload feature before calling tunnel ndo ndo
If NETIF_F_RX_UDP_TUNNEL_PORT was disabled on a given netdevice, skip the tunnel offload ndo call during tunnel port creation and deletion. Signed-off-by: Sabrina Dubroca--- net/ipv4/udp_tunnel.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/net/ipv4/udp_tunnel.c b/net/ipv4/udp_tunnel.c index 58bd39fb14b4..0d3f14cdc524 100644 --- a/net/ipv4/udp_tunnel.c +++ b/net/ipv4/udp_tunnel.c @@ -82,7 +82,8 @@ void udp_tunnel_push_rx_port(struct net_device *dev, struct socket *sock, struct sock *sk = sock->sk; struct udp_tunnel_info ti; - if (!dev->netdev_ops->ndo_udp_tunnel_add) + if (!dev->netdev_ops->ndo_udp_tunnel_add || + !(dev->features & NETIF_F_RX_UDP_TUNNEL_PORT)) return; ti.type = type; @@ -109,6 +110,8 @@ void udp_tunnel_notify_add_rx_port(struct socket *sock, unsigned short type) for_each_netdev_rcu(net, dev) { if (!dev->netdev_ops->ndo_udp_tunnel_add) continue; + if (!(dev->features & NETIF_F_RX_UDP_TUNNEL_PORT)) + continue; dev->netdev_ops->ndo_udp_tunnel_add(dev, ); } rcu_read_unlock(); @@ -131,6 +134,8 @@ void udp_tunnel_notify_del_rx_port(struct socket *sock, unsigned short type) for_each_netdev_rcu(net, dev) { if (!dev->netdev_ops->ndo_udp_tunnel_del) continue; + if (!(dev->features & NETIF_F_RX_UDP_TUNNEL_PORT)) + continue; dev->netdev_ops->ndo_udp_tunnel_del(dev, ); } rcu_read_unlock(); -- 2.13.2
Re: [PATCH] bluetooth: document config options
Hi! > > Kernel config options should include useful help text; I had to look > > up the terms on wikipedia. > > > > Signed-off-by: Pavel Machek> > > > diff --git a/net/bluetooth/Kconfig b/net/bluetooth/Kconfig > > index 68f951b..133c8a6 100644 > > --- a/net/bluetooth/Kconfig > > +++ b/net/bluetooth/Kconfig > > @@ -45,6 +45,8 @@ config BT_BREDR > > bool "Bluetooth Classic (BR/EDR) features" > > depends on BT > > default y > > + help > > + Support Bluetooth version 1 and 2 connections. > > so this is actually not correct. Version 5.0 is also Bluetooth BR/EDR. > > I am fine if use the terms "Basic Rate" and "Enhanced Data Rate" in the > description, but the version numbers are kinda not how this works. So some > alternative wording needs to be used: > > "Bluetooth Classic includes support for Basic Rate (BR) available with > Bluetooth version 1.0b or later and support for Enhanced Data Rate (EDR) > available with Bluetooth version 2.0 or later." > Ok, works for me. > > + help > > + Bluetooth 3 introduces high-speed mode where bluetooth endpoints > > + negotiate fast connection over WIFI. This controls its support. > > Again, while Bluetooth version 3.0 introduces HS support, it is not what is > selecting this. > > I think here we can write something like this: > > "Bluetooth High Speed includes support for off-loading Bluetooth connections > via 802.11 physical layer available with Bluetooth version 3.0 or later." > Ok. > > config BT_LE > > bool "Bluetooth Low Energy (LE) features" > > depends on BT > > default y > > + help > > + Bluetooth 4 introduces special low-energy protocol, designed > > + for simple devices. This controls its support. > > I think here it is also better to have an alternative text: > > "Bluetooth Low Energy includes support low-energy physical layer available > with Bluetooth version 4.0 or later." > Ok. Do I understand it correctly that Bluetooth LE basically has nothing to do with bluetooth, and that Bluetooth 2 hardware will not be able to detect / communicate with Bluetooth LE hardware? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: [PATCH net-next 1/3] bluetooth: 6lowpan dev_close never returns error
Hi Stephen, [auto build test WARNING on net-next/master] url: https://github.com/0day-ci/linux/commits/Stephen-Hemminger/net-make-dev_close-void/20170720-090123 config: x86_64-randconfig-a0-07211734 (attached as .config) compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All warnings (new ones prefixed by >>): net/bluetooth/6lowpan.c: In function 'ifdown': >> net/bluetooth/6lowpan.c:621: warning: unused variable 'err' vim +/err +621 net/bluetooth/6lowpan.c 18722c24 Jukka Rissanen2013-12-11 618 7f118253 Jukka Rissanen2014-06-18 619 static void ifdown(struct net_device *netdev) 7f118253 Jukka Rissanen2014-06-18 620 { 7f118253 Jukka Rissanen2014-06-18 @621 int err; 7f118253 Jukka Rissanen2014-06-18 622 7f118253 Jukka Rissanen2014-06-18 623 rtnl_lock(); ca74f73e Stephen Hemminger 2017-07-18 624 dev_close(netdev); 7f118253 Jukka Rissanen2014-06-18 625 rtnl_unlock(); 7f118253 Jukka Rissanen2014-06-18 626 } 7f118253 Jukka Rissanen2014-06-18 627 :: The code at line 621 was first introduced by commit :: 7f118253820fc3ad38659485adb3ebdfe64820e1 Bluetooth: 6LoWPAN: Remove network devices when unloading :: TO: Jukka Rissanen:: CC: Marcel Holtmann --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH] bluetooth: document config options
Hi Pavel, > Kernel config options should include useful help text; I had to look > up the terms on wikipedia. > > Signed-off-by: Pavel Machek> > diff --git a/net/bluetooth/Kconfig b/net/bluetooth/Kconfig > index 68f951b..133c8a6 100644 > --- a/net/bluetooth/Kconfig > +++ b/net/bluetooth/Kconfig > @@ -45,6 +45,8 @@ config BT_BREDR > bool "Bluetooth Classic (BR/EDR) features" > depends on BT > default y > + help > + Support Bluetooth version 1 and 2 connections. so this is actually not correct. Version 5.0 is also Bluetooth BR/EDR. I am fine if use the terms "Basic Rate" and "Enhanced Data Rate" in the description, but the version numbers are kinda not how this works. So some alternative wording needs to be used: "Bluetooth Classic includes support for Basic Rate (BR) available with Bluetooth version 1.0b or later and support for Enhanced Data Rate (EDR) available with Bluetooth version 2.0 or later." > source "net/bluetooth/rfcomm/Kconfig" > > @@ -58,11 +60,17 @@ config BT_HS > bool "Bluetooth High Speed (HS) features" > depends on BT_BREDR > default y > + help > + Bluetooth 3 introduces high-speed mode where bluetooth endpoints > + negotiate fast connection over WIFI. This controls its support. Again, while Bluetooth version 3.0 introduces HS support, it is not what is selecting this. I think here we can write something like this: "Bluetooth High Speed includes support for off-loading Bluetooth connections via 802.11 physical layer available with Bluetooth version 3.0 or later." > > config BT_LE > bool "Bluetooth Low Energy (LE) features" > depends on BT > default y > + help > + Bluetooth 4 introduces special low-energy protocol, designed > + for simple devices. This controls its support. I think here it is also better to have an alternative text: "Bluetooth Low Energy includes support low-energy physical layer available with Bluetooth version 4.0 or later." Regards Marcel
Re: [PATCH net-next 2/8] netvsc: add some rtnl_dereference annotations
Hi Stephen, [auto build test ERROR on net-next/master] url: https://github.com/0day-ci/linux/commits/Stephen-Hemminger/netvsc-lockdep-and-related-fixes/20170720-191938 config: x86_64-rhel (attached as .config) compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 Note: the linux-review/Stephen-Hemminger/netvsc-lockdep-and-related-fixes/20170720-191938 HEAD a5e48342eae178e9ed020d3634cdf92f1f05cb70 builds fine. It only hurts bisectibility. All error/warnings (new ones prefixed by >>): drivers/net/hyperv/netvsc.c: In function 'netvsc_switch_datapath': >> drivers/net/hyperv/netvsc.c:44:33: error: implicit declaration of function >> 'rtnl_dereference' [-Werror=implicit-function-declaration] struct netvsc_device *nv_dev = rtnl_dereference(net_device_ctx->nvdev); ^~~~ >> drivers/net/hyperv/netvsc.c:44:33: warning: initialization makes pointer >> from integer without a cast [-Wint-conversion] drivers/net/hyperv/netvsc.c: In function 'netvsc_device_remove': drivers/net/hyperv/netvsc.c:553:5: warning: initialization makes pointer from integer without a cast [-Wint-conversion] = rtnl_dereference(net_device_ctx->nvdev); ^~~~ cc1: some warnings being treated as errors vim +/rtnl_dereference +44 drivers/net/hyperv/netvsc.c 35 36 /* 37 * Switch the data path from the synthetic interface to the VF 38 * interface. 39 */ 40 void netvsc_switch_datapath(struct net_device *ndev, bool vf) 41 { 42 struct net_device_context *net_device_ctx = netdev_priv(ndev); 43 struct hv_device *dev = net_device_ctx->device_ctx; > 44 struct netvsc_device *nv_dev = rtnl_dereference(net_device_ctx->nvdev); 45 struct nvsp_message *init_pkt = _dev->channel_init_pkt; 46 47 memset(init_pkt, 0, sizeof(struct nvsp_message)); 48 init_pkt->hdr.msg_type = NVSP_MSG4_TYPE_SWITCH_DATA_PATH; 49 if (vf) 50 init_pkt->msg.v4_msg.active_dp.active_datapath = 51 NVSP_DATAPATH_VF; 52 else 53 init_pkt->msg.v4_msg.active_dp.active_datapath = 54 NVSP_DATAPATH_SYNTHETIC; 55 56 vmbus_sendpacket(dev->channel, init_pkt, 57 sizeof(struct nvsp_message), 58 (unsigned long)init_pkt, 59 VM_PKT_DATA_INBAND, 0); 60 61 net_device_ctx->datapath = vf; 62 } 63 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH 0/4] Configuring traffic classes via new hardware offload mechanism in tc/mqprio
On Mon, May 22, 2017 at 12:31:12PM -0700, Jeff Kirsher wrote: > On Fri, 2017-05-19 at 17:58 -0700, Amritha Nambiar wrote: > > The following series introduces a new harware offload mode in tc/mqprio > > where the TCs, the queue configurations and bandwidth rate limits are > > offloaded to the hardware. ... > This was meant to be sent out as an RFC, but apparently that did not get > conveyed when these were sent out Friday. I am looking at tc/mqprio and dcb with the purpose of implementing - Forwarding and Queuing Enhancements for Time-Sensitive Streams (FQTSS) 802.1Q-2014 Clause 34 - Scheduled Traffic (time based scheduling) P802.1Qbv using the HW capabilities of the i210. This series looks like a promising avenue for these features. My question is, did series go anywhere? I didn't see any follow ups on netdev, but maybe I missed something. Thanks, Richard
[PATCH] ceph: kernel client startsync can be removed
kernel client is still sending write,startsync. that startsync is a no-op (has been for years) and can probably be removed Link: http://tracker.ceph.com/issues/20604 Signed-off-by: Yanhu Cao--- fs/ceph/addr.c | 9 ++--- fs/ceph/file.c | 5 + include/linux/ceph/rados.h | 1 - net/ceph/osd_client.c | 5 - 4 files changed, 3 insertions(+), 17 deletions(-) diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index 5083628..f9a1805 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -939,7 +939,7 @@ static int ceph_writepages_start(struct address_space *mapping, break; } - num_ops = 1 + do_sync; + num_ops = 1; strip_unit_end = page->index + ((len - 1) >> PAGE_SHIFT); @@ -1045,7 +1045,7 @@ static int ceph_writepages_start(struct address_space *mapping, for (i = 0; i < locked_pages; i++) { u64 cur_offset = page_offset(pages[i]); if (offset + len != cur_offset) { - if (op_idx + do_sync + 1 == req->r_num_ops) + if (op_idx + 1 == req->r_num_ops) break; osd_req_op_extent_dup_last(req, op_idx, cur_offset - offset); @@ -1082,17 +1082,12 @@ static int ceph_writepages_start(struct address_space *mapping, 0, !!pool, false); osd_req_op_extent_update(req, op_idx, len); - if (do_sync) { - op_idx++; - osd_req_op_init(req, op_idx, CEPH_OSD_OP_STARTSYNC, 0); - } BUG_ON(op_idx + 1 != req->r_num_ops); pool = NULL; if (i < locked_pages) { BUG_ON(num_ops <= req->r_num_ops); num_ops -= req->r_num_ops; - num_ops += do_sync; locked_pages -= i; /* allocate new pages array for next request */ diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 3d48c41..405c99b 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -800,7 +800,6 @@ static void ceph_aio_retry_work(struct work_struct *work) } req->r_ops[0] = orig_req->r_ops[0]; - osd_req_op_init(req, 1, CEPH_OSD_OP_STARTSYNC, 0); req->r_mtime = aio_req->mtime; req->r_data_offset = req->r_ops[0].extent.offset; @@ -874,8 +873,7 @@ static void ceph_aio_retry_work(struct work_struct *work) vino = ceph_vino(inode); req = ceph_osdc_new_request(>client->osdc, >i_layout, vino, pos, , 0, - /*include a 'startsync' command*/ - write ? 2 : 1, + 1, write ? CEPH_OSD_OP_WRITE : CEPH_OSD_OP_READ, flags, snapc, @@ -922,7 +920,6 @@ static void ceph_aio_retry_work(struct work_struct *work) truncate_inode_pages_range(inode->i_mapping, pos, (pos+len) | (PAGE_SIZE - 1)); - osd_req_op_init(req, 1, CEPH_OSD_OP_STARTSYNC, 0); req->r_mtime = mtime; } diff --git a/include/linux/ceph/rados.h b/include/linux/ceph/rados.h index 385db08..ba5adf0 100644 --- a/include/linux/ceph/rados.h +++ b/include/linux/ceph/rados.h @@ -226,7 +226,6 @@ struct ceph_eversion { \ /* fancy write */ \ f(APPEND, __CEPH_OSD_OP(WR, DATA, 6), "append") \ - f(STARTSYNC,__CEPH_OSD_OP(WR, DATA, 7), "startsync")\ f(SETTRUNC, __CEPH_OSD_OP(WR, DATA, 8), "settrunc") \ f(TRIMTRUNC,__CEPH_OSD_OP(WR, DATA, 9), "trimtrunc")\ \ diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index 901bb82..5c9d696 100644 --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -863,8 +863,6 @@ static u32 osd_req_encode_op(struct ceph_osd_op *dst, dst->cls.method_len = src->cls.method_len; dst->cls.indata_len = cpu_to_le32(src->cls.indata_len); break; - case CEPH_OSD_OP_STARTSYNC: - break; case CEPH_OSD_OP_WATCH: dst->watch.cookie =
[PATCH V2 1/4] net-next: dsa: move struct dsa_device_ops to the global header file
We need to access this struct from within the flow_dissector to fix dissection for packets coming in on DSA devices. Signed-off-by: John Crispin--- include/net/dsa.h | 7 +++ net/dsa/dsa_priv.h | 7 --- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/include/net/dsa.h b/include/net/dsa.h index 88da272d20d0..a4c0d52abc80 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -101,6 +101,13 @@ struct dsa_platform_data { struct packet_type; +struct dsa_device_ops { + struct sk_buff *(*xmit)(struct sk_buff *skb, struct net_device *dev); + struct sk_buff *(*rcv)(struct sk_buff *skb, struct net_device *dev, + struct packet_type *pt, + struct net_device *orig_dev); +}; + struct dsa_switch_tree { struct list_headlist; diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h index 55982cc39b24..62ea3663c2c6 100644 --- a/net/dsa/dsa_priv.h +++ b/net/dsa/dsa_priv.h @@ -65,13 +65,6 @@ struct dsa_notifier_vlan_info { int port; }; -struct dsa_device_ops { - struct sk_buff *(*xmit)(struct sk_buff *skb, struct net_device *dev); - struct sk_buff *(*rcv)(struct sk_buff *skb, struct net_device *dev, - struct packet_type *pt, - struct net_device *orig_dev); -}; - struct dsa_slave_priv { /* Copy of dp->ds->dst->tag_ops->xmit for faster access in hot path */ struct sk_buff *(*xmit)(struct sk_buff *skb, -- 2.11.0