Re: [PATCH v2 00/10] rt2x00: rt2x00: improve calling conventions for register accessors
Arnd Bergmann writes:

> I've managed to split up my long patch into a series of reasonable
> steps now.
>
> The first two are required to fix a regression from commit 41977e86c984
> ("rt2x00: add support for MT7620"), the rest are just cleanups to
> have a consistent state across all the register access functions.

Can these all go to 4.13 or would you prefer me to push the first two to
4.12? Or what?

--
Kalle Valo
Re: [PATCH v2 net-next 1/2] include: linux: Add helper function to check phy interface mode
On Thu, May 18, 2017 at 3:19 PM, Florian Fainelli wrote:
> On 05/18/2017 03:13 PM, Iyappan Subramanian wrote:
>> Added helper function that checks phy_mode is RGMII (all variants)
>> 'bool phy_interface_mode_is_rgmii(phy_interface_t mode)'
>>
>> Changed the following function, to use the above.
>> 'bool phy_interface_is_rgmii(struct phy_device *phydev)'
>>
>> Signed-off-by: Iyappan Subramanian
>> Suggested-by: Florian Fainelli
>> Suggested-by: Andrew Lunn
>
> Not sure why you have chosen include: linux as the subject since all
> changes done to that file typically had the "phy: " prefix, but the code
> changes are fine, thanks!

Thanks Florian. I'll keep that in mind for future header file patches. :-)

For now, if David Miller requests a subject line change, I'll re-post
the patch.

>
> Reviewed-by: Florian Fainelli
>
>> ---
>>  include/linux/phy.h | 14 ++++++++++++--
>>  1 file changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/phy.h b/include/linux/phy.h
>> index 54ef458..5a808a2 100644
>> --- a/include/linux/phy.h
>> +++ b/include/linux/phy.h
>> @@ -716,14 +716,24 @@ static inline bool phy_is_internal(struct phy_device *phydev)
>>  }
>>
>>  /**
>> + * phy_interface_mode_is_rgmii - Convenience function for testing if a
>> + * PHY interface mode is RGMII (all variants)
>> + * @mode: the phy_interface_t enum
>> + */
>> +static inline bool phy_interface_mode_is_rgmii(phy_interface_t mode)
>> +{
>> +	return mode >= PHY_INTERFACE_MODE_RGMII &&
>> +		mode <= PHY_INTERFACE_MODE_RGMII_TXID;
>> +};
>> +
>> +/**
>>   * phy_interface_is_rgmii - Convenience function for testing if a PHY interface
>>   * is RGMII (all variants)
>>   * @phydev: the phy_device struct
>>   */
>>  static inline bool phy_interface_is_rgmii(struct phy_device *phydev)
>>  {
>> -	return phydev->interface >= PHY_INTERFACE_MODE_RGMII &&
>> -		phydev->interface <= PHY_INTERFACE_MODE_RGMII_TXID;
>> +	return phy_interface_mode_is_rgmii(phydev->interface);
>>  };
>>
>>  /*
>>
>
> --
> Florian
Re: [RFC] iproute: Add support for extended ack to rtnl_talk
On 5/18/17 3:02 AM, Daniel Borkmann wrote:
> So effectively this means libmnl has to be used for new stuff, no one
> has time to do the work to convert the existing tooling over (which
> by itself might be a challenge in testing everything to make sure
> there are no regressions) given there's not much activity around
> lib/libnetlink.c anyway, and existing users not using libmnl today
> won't see/notice new improvements on netlink side when they do an
> upgrade. So we'll be stuck with that dual library mess pretty much
> for a very long time. :(

lib/libnetlink.c with all of its duplicate functions weighs in at just
947 LOC -- a mere 12% of the code in lib/. Measured against the total
SLOC of iproute2 it is a negligible part of the code base.

Given that, there is very little gain -- but a lot of risk in
regressions -- in converting such a small, low level code base to
libmnl just for the sake of using a library - something Phil noted in
his cursory attempt at converting ip to libmnl. I.e., the level of
effort required vs. the benefit is just not worth it. There are so many
other parts of the ip code base that need work with a much higher
return on the time investment.
Re: [PATCH net 1/3] vlan: Fix tcp checksums offloads for Q-in-Q vlan.
On 2017/05/18 22:31, Vladislav Yasevich wrote:
> It appears that since commit 8cb65d000, Q-in-Q vlans have been
> broken. The series that commit is part of enabled TSO and checksum
> offloading on Q-in-Q vlans. However, most HW we support can't handle
> it. To work around the issue, the above commit added a function that
> turns off offloads on Q-in-Q devices, but it left the checksum offload.
> That will cause issues with most older devices that support very basic
> checksum offload capabilities as well as some newer devices (we've
> reproduced the problem with both be2net and bnx).
>
> To solve this for everyone, turn off checksum offloading feature
> by default when sending Q-in-Q traffic. Devices that are proven to
> work can provide a corrected ndo_features_check implementation.
>
> Fixes: 8cb65d000 ("net: Move check for multiple vlans to drivers")
> CC: Toshiaki Makita
> Signed-off-by: Vladislav Yasevich
> ---
>  include/linux/if_vlan.h | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
> index 8d5fcd6..ae537f0 100644
> --- a/include/linux/if_vlan.h
> +++ b/include/linux/if_vlan.h
> @@ -619,7 +619,6 @@ static inline netdev_features_t vlan_features_check(const struct sk_buff *skb,
>  						     NETIF_F_SG |
>  						     NETIF_F_HIGHDMA |
>  						     NETIF_F_FRAGLIST |
> -						     NETIF_F_HW_CSUM |
>  						     NETIF_F_HW_VLAN_CTAG_TX |
>  						     NETIF_F_HW_VLAN_STAG_TX);
>

I guess HW_CSUM theoretically can handle Q-in-Q packets and the problem
is IP_CSUM and IPV6_CSUM. So wouldn't it be better to leave HW_CSUM and
drop IP_CSUM/IPV6_CSUM, i.e. change intersection into bitwise AND?

The intersection was introduced in db115037bb57 ("net: fix checksum
features handling in netif_skb_features()"), but I guess for this
particular check the intersection was not needed.

--
Toshiaki Makita
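For readers who do not have the helper in their head, the difference between "intersection" and a plain bitwise AND discussed above can be sketched in stand-alone C. This is only an illustration: the flag values are made up, the function names are hypothetical, and the intersect logic is modelled on my reading of netdev_intersect_features(), so check the kernel source before relying on it.

```c
#include <stdint.h>

typedef uint64_t features_t;

/* Illustrative bit values only; the kernel's NETIF_F_* bits differ. */
#define F_SG        (1ULL << 0)
#define F_IP_CSUM   (1ULL << 1)
#define F_HW_CSUM   (1ULL << 2)
#define F_IPV6_CSUM (1ULL << 3)

/* Plain bitwise AND: a feature survives only if both sides name it. */
features_t features_and(features_t f1, features_t f2)
{
	return f1 & f2;
}

/*
 * Intersection that treats HW_CSUM as a superset of IP_CSUM/IPV6_CSUM:
 * if only one side advertises HW_CSUM, that side is widened to also
 * carry the protocol-specific bits before the AND is taken.
 */
features_t features_intersect(features_t f1, features_t f2)
{
	if ((f1 ^ f2) & F_HW_CSUM) {
		if (f1 & F_HW_CSUM)
			f1 |= F_IP_CSUM | F_IPV6_CSUM;
		else
			f2 |= F_IP_CSUM | F_IPV6_CSUM;
	}
	return f1 & f2;
}
```

With a mask that keeps HW_CSUM, the intersection still lets IP_CSUM through on a device that only advertises IP_CSUM, while the plain AND drops it and keeps HW_CSUM only on devices that really have it. That appears to be the behaviour the suggestion above is after.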
Re: [PATCH 1/1] dt-binding: net: wireless: fix node name in the BCM43xx example
On Mon, May 15, 2017 at 10:13:56PM +0200, Martin Blumenstingl wrote:
> The example in the BCM43xx documentation uses "brcmf" as node name.
> However, wireless devices should be named "wifi" instead. Fix this to
> make sure that .dts authors can simply use the documentation as
> reference (or simply copy the node from the documentation and then
> adjust only the board specific bits).
>
> Signed-off-by: Martin Blumenstingl
> ---
>  Documentation/devicetree/bindings/net/wireless/brcm,bcm43xx-fmac.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied.

Rob
Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
On 5/18/17 9:38 AM, Edward Cree wrote:
> On 18/05/17 15:49, Edward Cree wrote:
>> Here's one idea that seemed to work when I did a couple of experiments:
>> let A = (a;am), B = (b;bm) where the m are the masks
>> Σ = am + bm + a + b
>> χ = Σ ^ (a + b) /* unknown carries */
>> μ = χ | am | bm /* mask of result */
>> then A + B = ((a + b) & ~μ; μ)
>>
>> The idea is that we find which bits change between the case "all x are
>> 1" and "all x are 0", and those become xs too.
>
> https://gist.github.com/ecree-solarflare/0665d5b46c2d8d08de2377fbd527de8d

I played with it quite a bit trying to break it and have to agree that
the above algorithm works. At least for add and sub I think it's solid.
Still feels a bit magical, since it gave me better results than
I could envision for my test vectors.

In your .py I'd only change __str__(self) to print them in mask,value
as the order they're passed into constructor to make it easier to read.
The bin(self) output is the most useful, of course.
We should carry it into the kernel too for debugging.

> And now I've found a similar algorithm for subtraction, which (again) I
> can't prove but it seems to work.
> α = a + am - b
> β = a - b - bm
> χ = α ^ β
> μ = χ | α | β
> then A - B = ((a - b) & ~μ; μ)
> Again we're effectively finding the max. and min. values, and XORing
> them to find unknown carries.
>
> Bitwise operations are easy, of course;
> /* By assumption, a & am == b & bm == 0 */
> A & B = (a & b; (a | am) & (b | bm) & ~(a & b))
> A | B = (a | b; (am | bm) & ~(a | b))
> /* It bothers me that & and | aren't symmetric, but I can't fix it */
> A ^ B = (a ^ b; am | bm)
>
> as are shifts by a constant (just shift 0s into both number and mask).
>
> Multiplication by a constant can be done by decomposing into shifts
> and adds; but it can also be done directly; here we find (a;am) * k.
> π = a * k
> γ = am * k
> then A * k = (π; 0) + (0; γ), for which we use our addition algo.
>
> Multiplication of two unknown values is a nightmare, as unknown bits
> can propagate all over the place. We can do a shift-add
> decomposition where the adds for unknown bits have all the 1s in
> the addend replaced with xs. A few experiments suggest that this
> works, regardless of the order of operands. For instance
> 110x * x01 comes out as either
> 110x + xx0x = 0x
> or
> x0x x01 + x01 = 0x
> We can slightly optimise this by handling all the 1 bits in one go;
> that is, for (a;am) * (b;bm) we first find (a;am) * b using our
> multiplication-by-a-constant algo above, then for each bit in bm
> we find (a;am) * bit and force all its nonzero bits to unknown;
> finally we add all our components.

this mul algo I don't completely understand. It feels correct,
but I'm not sure we really need it for the kernel.
For all practical cases llvm will likely emit shifts or sequence
of adds and shifts, so multiplies by crazy non-optimizable constant
or variable are rare and likely the end result is going to be
outside of packet boundary, so it will be rejected anyway and
precise alignment tracking doesn't matter much.

What I love about the whole thing is that it works for access into
packet, access into map values and in the future for any other
variable length access.

> Don't even ask about division; that scrambles bits so hard that the

yeah screw div and mod. We have an option to disable div/mod altogether
under some new 'prog_flags', since it has this ugly 'div by 0'
exception path. We don't even have 'signed division' instruction and
llvm errors like:
  errs() << "Unsupport signed division for DAG: ";
  errs() << "Please convert to unsigned div/mod.\n";
and no one complained. It just means that division is extremely rare.

Are you planning to work on the kernel patch for this algo?
Once we have it the verifier will be smarter regarding
alignment tracking than any compiler i know :)
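The addition and the bitwise AND/OR rules quoted above translate almost line for line into C. The following is a minimal sketch for experimenting outside the kernel, not a proposed patch: the struct and function names (struct partial, partial_add and so on) are made up here, and subtraction and multiplication are deliberately left out.

```c
#include <stdint.h>

/*
 * A partially known 64-bit number: bits set in mask are unknown ("x"),
 * all other bits are known and given by value.
 * Invariant assumed throughout: value & mask == 0.
 */
struct partial {
	uint64_t value;
	uint64_t mask;
};

/* A + B = ((a + b) & ~mu; mu), following the quoted formulas. */
struct partial partial_add(struct partial a, struct partial b)
{
	uint64_t sv = a.value + b.value;        /* sum with every x taken as 0 */
	uint64_t sigma = a.mask + b.mask + sv;  /* sum with every x taken as 1 */
	uint64_t chi = sigma ^ sv;              /* unknown carries */
	uint64_t mu = chi | a.mask | b.mask;    /* mask of result */
	struct partial r = { sv & ~mu, mu };

	return r;
}

/* A & B = (a & b; (a | am) & (b | bm) & ~(a & b)) */
struct partial partial_and(struct partial a, struct partial b)
{
	uint64_t v = a.value & b.value;
	struct partial r = { v, (a.value | a.mask) & (b.value | b.mask) & ~v };

	return r;
}

/* A | B = (a | b; (am | bm) & ~(a | b)) */
struct partial partial_or(struct partial a, struct partial b)
{
	uint64_t v = a.value | b.value;
	struct partial r = { v, (a.mask | b.mask) & ~v };

	return r;
}
```

As a quick sanity check, 110x (value 0xc, mask 0x1) plus the constant 1 gives value 0xc, mask 0x3, i.e. 11xx, which covers both possible sums 1101 and 1110.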
Re: [PATCH net 1/3] vlan: Fix tcp checksums offloads for Q-in-Q vlan.
On 2017/05/18 22:31, Vladislav Yasevich wrote:
> It appears that since commit 8cb65d000, Q-in-Q vlans have been
> broken. The series that commit is part of enabled TSO and checksum
> offloading on Q-in-Q vlans. However, most HW we support can't handle
> it. To work around the issue, the above commit added a function that
> turns off offloads on Q-in-Q devices, but it left the checksum offload.
> That will cause issues with most older devices that support very basic
> checksum offload capabilities as well as some newer devices (we've
> reproduced the problem with both be2net and bnx).
>
> To solve this for everyone, turn off checksum offloading feature
> by default when sending Q-in-Q traffic. Devices that are proven to
> work can provide a corrected ndo_features_check implementation.
>
> Fixes: 8cb65d000 ("net: Move check for multiple vlans to drivers")
> CC: Toshiaki Makita
> Signed-off-by: Vladislav Yasevich

The patch looks ok, but why do you think 8cb65d000 is wrong?
The same check was there before my patch set.

kernel v4.0:
> netdev_features_t netif_skb_features(struct sk_buff *skb)
...
> 	if (protocol == htons(ETH_P_8021Q) || protocol == htons(ETH_P_8021AD))
> 		features = netdev_intersect_features(features,
> 						     NETIF_F_SG |
> 						     NETIF_F_HIGHDMA |
> 						     NETIF_F_FRAGLIST |
> 						     NETIF_F_GEN_CSUM |
> 						     NETIF_F_HW_VLAN_CTAG_TX |
> 						     NETIF_F_HW_VLAN_STAG_TX);

The commit just moved the check into another function.

Toshiaki Makita
Re: [[PATCH v1]] hdlcdrv: fix divide error bug if bitrate is 0
On Thu, May 18, 2017 at 6:02 AM, Firo Yang wrote:
> The divisor s->par.bitrate will always be 0 until initialized by
> ndo_open() and hdlcdrv_open().
>
> In order to fix this divide by zero error, check whether the netdevice
> was opened by ndo_open() before performing the divide. And we also check
> the value of bitrate in case of a bad setting of it.
>
> Reported-by: Dmitry Vyukov
> Signed-off-by: Firo Yang

Hi Firo,

Please reply to the original report thread when you send a fix, so
other people won't start working on the same patch.

BTW, it was reported by me, but I don't think it's important.

Thanks!

> ---
> v0->v1:
> Reviewed by walter harms.
> Return ENODEV instead of EPERM if !netif_running(dev)
> Check if s->par.bitrate > 0.
>
>  drivers/net/hamradio/hdlcdrv.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/net/hamradio/hdlcdrv.c b/drivers/net/hamradio/hdlcdrv.c
> index 8c3633c..b0f417f 100644
> --- a/drivers/net/hamradio/hdlcdrv.c
> +++ b/drivers/net/hamradio/hdlcdrv.c
> @@ -576,6 +576,10 @@ static int hdlcdrv_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
>  	case HDLCDRVCTL_CALIBRATE:
>  		if(!capable(CAP_SYS_RAWIO))
>  			return -EPERM;
> +		if (!netif_running(dev))
> +			return -ENODEV;
> +		if (!(s->par.bitrate > 0))
> +			return -EINVAL;
>  		if (bi.data.calibrate > INT_MAX / s->par.bitrate)
>  			return -EINVAL;
>  		s->hdlctx.calibrate = bi.data.calibrate * s->par.bitrate / 16;
> --
> 2.7.4
Re: drivers/net/hamradio: divide error in hdlcdrv_ioctl
On Wed, May 17, 2017 at 10:07 PM, Alan Cox wrote:
> On Tue, 16 May 2017 17:05:32 +0200
> Andrey Konovalov wrote:
>
>> Hi,
>>
>> I've got the following error report while fuzzing the kernel with syzkaller.
>>
>> On commit 2ea659a9ef488125eb46da6eb571de5eae5c43f6 (4.12-rc1).
>>
>> A reproducer and .config are attached.
>
> This should fix it.

Hi Alan,

Someone else has already sent a couple of versions of a similar fix.

https://patchwork.ozlabs.org/patch/763832/

Thanks!

>
> commit 37b3fa4b617681f00cfa1f76d6d7716cc6d9f79a
> Author: Alan Cox
> Date: Wed May 17 21:04:27 2017 +0100
>
>     hdlcdrv: Fix division by zero when bitrate is unset
>
>     The code attempts to check for out of range calibration. What it
>     forgets to do is check for the 0 bitrate case. As a result the range
>     check itself oopses the kernel.
>
>     Found by Andrey Konovalov using Syzkaller.
>
>     Signed-off-by: Alan Cox
>
> diff --git a/drivers/net/hamradio/hdlcdrv.c b/drivers/net/hamradio/hdlcdrv.c
> index 8c3633c..9f34a48 100644
> --- a/drivers/net/hamradio/hdlcdrv.c
> +++ b/drivers/net/hamradio/hdlcdrv.c
> @@ -576,7 +576,7 @@ static int hdlcdrv_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
>  	case HDLCDRVCTL_CALIBRATE:
>  		if(!capable(CAP_SYS_RAWIO))
>  			return -EPERM;
> -		if (bi.data.calibrate > INT_MAX / s->par.bitrate)
> +		if (!s->par.bitrate || bi.data.calibrate > INT_MAX / s->par.bitrate)
>  			return -EINVAL;
>  		s->hdlctx.calibrate = bi.data.calibrate * s->par.bitrate / 16;
>  		return 0;
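Both proposed fixes come down to the same ordering rule: test the divisor before it is used in the overflow check. A stand-alone sketch of that arithmetic, with a hypothetical function name and simplified types (the driver works on bi.data.calibrate and s->par.bitrate inside the ioctl handler), might look like this:

```c
#include <errno.h>
#include <limits.h>

/*
 * Hypothetical stand-alone version of the HDLCDRVCTL_CALIBRATE arithmetic.
 * Returns the calibrate count (requested * bitrate / 16) or -EINVAL.
 */
int compute_calibrate(unsigned int requested, int bitrate)
{
	/* The divisor must be checked first, otherwise the range check
	 * below is itself a division by zero. */
	if (bitrate <= 0)
		return -EINVAL;

	/* Reject values whose product with bitrate would exceed INT_MAX. */
	if (requested > (unsigned int)(INT_MAX / bitrate))
		return -EINVAL;

	return (int)(requested * (unsigned int)bitrate / 16);
}
```

Treating a negative bitrate as invalid as well mirrors the `!(s->par.bitrate > 0)` test in Firo's version; Alan's version only rejects zero.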
[net-intel-e1000e] question about value overwrite
Hello everybody,

While looking into Coverity ID 1226905 I ran into the following piece of
code at drivers/net/ethernet/intel/e1000e/ich8lan.c:2400

2400 /**
2401  *  e1000_hv_phy_workarounds_ich8lan - A series of Phy workarounds to be
2402  *  done after every PHY reset.
2403  **/
2404 static s32 e1000_hv_phy_workarounds_ich8lan(struct e1000_hw *hw)
2405 {
2406         s32 ret_val = 0;
2407         u16 phy_data;
2408
2409         if (hw->mac.type != e1000_pchlan)
2410                 return 0;
2411
2412         /* Set MDIO slow mode before any other MDIO access */
2413         if (hw->phy.type == e1000_phy_82577) {
2414                 ret_val = e1000_set_mdio_slow_mode_hv(hw);
2415                 if (ret_val)
2416                         return ret_val;
2417         }
2418
2419         if (((hw->phy.type == e1000_phy_82577) &&
2420              ((hw->phy.revision == 1) || (hw->phy.revision == 2))) ||
2421             ((hw->phy.type == e1000_phy_82578) && (hw->phy.revision == 1))) {
2422                 /* Disable generation of early preamble */
2423                 ret_val = e1e_wphy(hw, PHY_REG(769, 25), 0x4431);
2424                 if (ret_val)
2425                         return ret_val;
2426
2427                 /* Preamble tuning for SSC */
2428                 ret_val = e1e_wphy(hw, HV_KMRN_FIFO_CTRLSTA, 0xA204);
2429                 if (ret_val)
2430                         return ret_val;
2431         }
2432
2433         if (hw->phy.type == e1000_phy_82578) {
2434                 /* Return registers to default by doing a soft reset then
2435                  * writing 0x3140 to the control register.
2436                  */
2437                 if (hw->phy.revision < 2) {
2438                         e1000e_phy_sw_reset(hw);
2439                         ret_val = e1e_wphy(hw, MII_BMCR, 0x3140);
2440                 }
2441         }
2442
2443         /* Select page 0 */
2444         ret_val = hw->phy.ops.acquire(hw);
2445         if (ret_val)
2446                 return ret_val;
2447
2448         hw->phy.addr = 1;
2449         ret_val = e1000e_write_phy_reg_mdic(hw, IGP01E1000_PHY_PAGE_SELECT, 0);
2450         hw->phy.ops.release(hw);
2451         if (ret_val)
2452                 return ret_val;
2453
2454         /* Configure the K1 Si workaround during phy reset assuming there is
2455          * link so that it disables K1 if link is in 1Gbps.
2456          */
2457         ret_val = e1000_k1_gig_workaround_hv(hw, true);
2458         if (ret_val)
2459                 return ret_val;
2460
2461         /* Workaround for link disconnects on a busy hub in half duplex */
2462         ret_val = hw->phy.ops.acquire(hw);
2463         if (ret_val)
2464                 return ret_val;
2465         ret_val = e1e_rphy_locked(hw, BM_PORT_GEN_CFG, &phy_data);
2466         if (ret_val)
2467                 goto release;
2468         ret_val = e1e_wphy_locked(hw, BM_PORT_GEN_CFG, phy_data & 0x00FF);
2469         if (ret_val)
2470                 goto release;
2471
2472         /* set MSE higher to enable link to stay up when noise is high */
2473         ret_val = e1000_write_emi_reg_locked(hw, I82577_MSE_THRESHOLD, 0x0034);
2474 release:
2475         hw->phy.ops.release(hw);
2476
2477         return ret_val;
2478 }

The issue is that the value stored in variable _ret_val_ at line 2439 is
overwritten by the one stored at line 2444, before it can be used.

My question is if the original intention was to return this value
immediately after the assignment at line 2439, something like in the
following patch:

index 68ea8b4..d6d4ed7 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -2437,6 +2437,8 @@ static s32 e1000_hv_phy_workarounds_ich8lan(struct e1000_hw *hw)
 		if (hw->phy.revision < 2) {
 			e1000e_phy_sw_reset(hw);
 			ret_val = e1e_wphy(hw, MII_BMCR, 0x3140);
+			if (ret_val)
+				return ret_val;
 		}
 	}

What do you think?

I'd really appreciate any comment on this.

Thank you!
--
Gustavo A. R. Silva
Re: [PATCH] of_mdio: Fix broken PHY IRQ in case of probe deferral
On 05/18/2017 01:36 PM, Geert Uytterhoeven wrote:
> Hi Andrew,
>
> On Thu, May 18, 2017 at 9:34 PM, Andrew Lunn wrote:
>>>> This most certainly works fine in the simple case where you have one PHY
>>>> hanging off the MDIO bus, now what happens if you have several?
>>>>
>>>> Presumably, the first PHY that returns EPROBE_DEFER will make the entire
>>>> bus registration return EPROBE_DEFER as well, and so on, and so forth,
>>>> but I am not sure if we will be properly unwinding the successful
>>>> registration of PHYs that either don't have an interrupt, or did not
>>>> return EPROBE_DEFER.
>>>>
>>>> It should be possible to mimic this behavior by using the fixed PHY, and
>>>> possibly the dsa_loop.c driver which would create 4 ports, expecting 4
>>>> fixed PHYs to be present.
>>>
>>> mdiobus_unregister(), called from of_mdiobus_register() on failure,
>>> should do the unwinding, right?
>>>
>>> And when the driver is reprobed, all PHYs are reprobed, until they all
>>> succeed.
>>
>> That is the theory. I looked at that while reviewing the patch. But
>> this has probably not been tested in anger. It would be good to test
>> this properly, with not just the first PHY returning -EPROBE_DEFER, to
>> really test the unwind.
>
> Unfortunately I don't have a board with multiple PHYs, so I cannot test
> that case.
>
> Does unbinding/rebinding a network driver with multiple PHYs currently
> work? Or module unload/reload?

Usually there is a strict 1:1 mapping between a network device (not
driver) and a PHY device; switch drivers, however, would have multiple
PHYs (one per port, aka net_device).

NB: binding and unbinding of PHYs is pretty broken at the moment though,
because there is a complete disconnect between what the Ethernet MAC
expects, and the state in which the PHY is. I had some patches to fix
that, but this turned out to be playing whack-a-mole which I typically
suck at.
--
Florian
Re: [PATCH v2 net-next 1/2] include: linux: Add helper function to check phy interface mode
On 05/18/2017 03:13 PM, Iyappan Subramanian wrote:
> Added helper function that checks phy_mode is RGMII (all variants)
> 'bool phy_interface_mode_is_rgmii(phy_interface_t mode)'
>
> Changed the following function, to use the above.
> 'bool phy_interface_is_rgmii(struct phy_device *phydev)'
>
> Signed-off-by: Iyappan Subramanian
> Suggested-by: Florian Fainelli
> Suggested-by: Andrew Lunn

Not sure why you have chosen include: linux as the subject since all
changes done to that file typically had the "phy: " prefix, but the code
changes are fine, thanks!

Reviewed-by: Florian Fainelli

> ---
>  include/linux/phy.h | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index 54ef458..5a808a2 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -716,14 +716,24 @@ static inline bool phy_is_internal(struct phy_device *phydev)
>  }
>
>  /**
> + * phy_interface_mode_is_rgmii - Convenience function for testing if a
> + * PHY interface mode is RGMII (all variants)
> + * @mode: the phy_interface_t enum
> + */
> +static inline bool phy_interface_mode_is_rgmii(phy_interface_t mode)
> +{
> +	return mode >= PHY_INTERFACE_MODE_RGMII &&
> +		mode <= PHY_INTERFACE_MODE_RGMII_TXID;
> +};
> +
> +/**
>   * phy_interface_is_rgmii - Convenience function for testing if a PHY interface
>   * is RGMII (all variants)
>   * @phydev: the phy_device struct
>   */
>  static inline bool phy_interface_is_rgmii(struct phy_device *phydev)
>  {
> -	return phydev->interface >= PHY_INTERFACE_MODE_RGMII &&
> -		phydev->interface <= PHY_INTERFACE_MODE_RGMII_TXID;
> +	return phy_interface_mode_is_rgmii(phydev->interface);
>  };
>
>  /*
>

--
Florian
[PATCH v2 net-next 2/2] drivers: net: xgene: Check all RGMII phy mode variants
This patch addresses the review comment from the previous patch set, by using phy_interface_mode_is_rgmii() helper function to address all RGMII phy mode variants. Signed-off-by: Iyappan SubramanianSigned-off-by: Quan Nguyen --- Review comment reference: http://www.spinics.net/lists/netdev/msg434649.html --- drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c | 6 +++--- drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 12 ++-- drivers/net/ethernet/apm/xgene/xgene_enet_main.c| 15 +-- 3 files changed, 18 insertions(+), 15 deletions(-) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c index 0fdec78..559963b 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c @@ -127,7 +127,7 @@ static int xgene_get_link_ksettings(struct net_device *ndev, struct phy_device *phydev = ndev->phydev; u32 supported; - if (pdata->phy_mode == PHY_INTERFACE_MODE_RGMII) { + if (phy_interface_mode_is_rgmii(pdata->phy_mode)) { if (phydev == NULL) return -ENODEV; @@ -177,7 +177,7 @@ static int xgene_set_link_ksettings(struct net_device *ndev, struct xgene_enet_pdata *pdata = netdev_priv(ndev); struct phy_device *phydev = ndev->phydev; - if (pdata->phy_mode == PHY_INTERFACE_MODE_RGMII) { + if (phy_interface_mode_is_rgmii(pdata->phy_mode)) { if (!phydev) return -ENODEV; @@ -304,7 +304,7 @@ static int xgene_set_pauseparam(struct net_device *ndev, struct phy_device *phydev = ndev->phydev; u32 oldadv, newadv; - if (pdata->phy_mode == PHY_INTERFACE_MODE_RGMII || + if (phy_interface_mode_is_rgmii(pdata->phy_mode) || pdata->phy_mode == PHY_INTERFACE_MODE_SGMII) { if (!phydev) return -EINVAL; diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c index 6ac27c7..e45b587 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c @@ -272,7 +272,7 @@ void 
xgene_enet_wr_mac(struct xgene_enet_pdata *pdata, u32 wr_addr, u32 wr_data) u32 done; if (pdata->mdio_driver && ndev->phydev && - pdata->phy_mode == PHY_INTERFACE_MODE_RGMII) { + phy_interface_mode_is_rgmii(pdata->phy_mode)) { struct mii_bus *bus = ndev->phydev->mdio.bus; return xgene_mdio_wr_mac(bus->priv, wr_addr, wr_data); @@ -326,12 +326,13 @@ static void xgene_enet_rd_mcx_csr(struct xgene_enet_pdata *pdata, u32 xgene_enet_rd_mac(struct xgene_enet_pdata *pdata, u32 rd_addr) { void __iomem *addr, *rd, *cmd, *cmd_done; + struct net_device *ndev = pdata->ndev; u32 done, rd_data; u8 wait = 10; - if (pdata->mdio_driver && pdata->ndev->phydev && - pdata->phy_mode == PHY_INTERFACE_MODE_RGMII) { - struct mii_bus *bus = pdata->ndev->phydev->mdio.bus; + if (pdata->mdio_driver && ndev->phydev && + phy_interface_mode_is_rgmii(pdata->phy_mode)) { + struct mii_bus *bus = ndev->phydev->mdio.bus; return xgene_mdio_rd_mac(bus->priv, rd_addr); } @@ -349,8 +350,7 @@ u32 xgene_enet_rd_mac(struct xgene_enet_pdata *pdata, u32 rd_addr) udelay(1); if (!done) - netdev_err(pdata->ndev, "mac read failed, addr: %04x\n", - rd_addr); + netdev_err(ndev, "mac read failed, addr: %04x\n", rd_addr); rd_data = ioread32(rd); iowrite32(0, cmd); diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c index 21cd4ef..d3906f6 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c @@ -1634,7 +1634,7 @@ static int xgene_enet_get_irqs(struct xgene_enet_pdata *pdata) struct device *dev = >dev; int i, ret, max_irqs; - if (pdata->phy_mode == PHY_INTERFACE_MODE_RGMII) + if (phy_interface_mode_is_rgmii(pdata->phy_mode)) max_irqs = 1; else if (pdata->phy_mode == PHY_INTERFACE_MODE_SGMII) max_irqs = 2; @@ -1760,7 +1760,7 @@ static int xgene_enet_get_resources(struct xgene_enet_pdata *pdata) dev_err(dev, "Unable to get phy-connection-type\n"); return pdata->phy_mode; } - if (pdata->phy_mode != 
PHY_INTERFACE_MODE_RGMII && + if (!phy_interface_mode_is_rgmii(pdata->phy_mode) && pdata->phy_mode != PHY_INTERFACE_MODE_SGMII && pdata->phy_mode != PHY_INTERFACE_MODE_XGMII) { dev_err(dev, "Incorrect phy-connection-type specified\n"); @@ -1805,7
[PATCH v2 net-next 1/2] include: linux: Add helper function to check phy interface mode
Added helper function that checks phy_mode is RGMII (all variants)
'bool phy_interface_mode_is_rgmii(phy_interface_t mode)'

Changed the following function, to use the above.
'bool phy_interface_is_rgmii(struct phy_device *phydev)'

Signed-off-by: Iyappan Subramanian
Suggested-by: Florian Fainelli
Suggested-by: Andrew Lunn
---
 include/linux/phy.h | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 54ef458..5a808a2 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -716,14 +716,24 @@ static inline bool phy_is_internal(struct phy_device *phydev)
 }

 /**
+ * phy_interface_mode_is_rgmii - Convenience function for testing if a
+ * PHY interface mode is RGMII (all variants)
+ * @mode: the phy_interface_t enum
+ */
+static inline bool phy_interface_mode_is_rgmii(phy_interface_t mode)
+{
+	return mode >= PHY_INTERFACE_MODE_RGMII &&
+		mode <= PHY_INTERFACE_MODE_RGMII_TXID;
+};
+
+/**
  * phy_interface_is_rgmii - Convenience function for testing if a PHY interface
  * is RGMII (all variants)
  * @phydev: the phy_device struct
  */
 static inline bool phy_interface_is_rgmii(struct phy_device *phydev)
 {
-	return phydev->interface >= PHY_INTERFACE_MODE_RGMII &&
-		phydev->interface <= PHY_INTERFACE_MODE_RGMII_TXID;
+	return phy_interface_mode_is_rgmii(phydev->interface);
 };

 /*
--
1.9.1
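The helper relies on the RGMII variants being contiguous in the enum, so a single range check covers all of them. A stand-alone sketch of the same idea follows; the enum below is a hypothetical, shortened stand-in for phy_interface_t, and only the relative order of its four RGMII entries is meant to match.

```c
#include <stdbool.h>

/* Hypothetical subset of phy_interface_t. Only the fact that the four
 * RGMII variants sit next to each other matters for the range check. */
enum phy_mode {
	MODE_SGMII,
	MODE_RGMII,
	MODE_RGMII_ID,
	MODE_RGMII_RXID,
	MODE_RGMII_TXID,
	MODE_XGMII,
};

/* True for plain RGMII and for every internal-delay variant. */
bool mode_is_rgmii(enum phy_mode mode)
{
	return mode >= MODE_RGMII && mode <= MODE_RGMII_TXID;
}
```

A caller that compares against MODE_RGMII alone would miss MODE_RGMII_ID, MODE_RGMII_RXID and MODE_RGMII_TXID, which is the case the helper is meant to cover.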
[PATCH v2 net-next 0/2] Check all RGMII phy mode variants
This patch set,
- adds phy_interface_mode_is_rgmii() helper function
- addresses review comment from previous patch set, by calling
  phy_interface_mode_is_rgmii() to address all RGMII variants

Signed-off-by: Iyappan Subramanian
---

v2: Address review comments from v1
- adds phy_interface_mode_is_rgmii() helper function
- addresses review comment from previous patch set, by calling
  phy_interface_mode_is_rgmii() to address all RGMII variants

v1:
- Initial version
---

Iyappan Subramanian (2):
  include: linux: Add helper function to check phy interface mode
  drivers: net: xgene: Check all RGMII phy mode variants

 drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c | 6 +++---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 12 ++--
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c| 15 +--
 include/linux/phy.h | 14 --
 4 files changed, 30 insertions(+), 17 deletions(-)

--
1.9.1
Re: [PATCH v2 2/4] arp: decompose is_garp logic into a separate function
Hello,

On Thu, 18 May 2017, Ihar Hrachyshka wrote:

> The code is quite involved already to earn a separate function for
> itself. If anything, it helps arp_process readability.
>
> Signed-off-by: Ihar Hrachyshka
> ---
>  net/ipv4/arp.c | 35 +++
>  1 file changed, 23 insertions(+), 12 deletions(-)
>
> diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
> index 053492a..ca6e1e6 100644
> --- a/net/ipv4/arp.c
> +++ b/net/ipv4/arp.c
> @@ -641,6 +641,27 @@ void arp_xmit(struct sk_buff *skb)
>  }
>  EXPORT_SYMBOL(arp_xmit);
>
> +static bool arp_is_garp(struct net_device *dev, int addr_type,
> +			__be16 ar_op,
> +			__be32 sip, __be32 tip,
> +			unsigned char *sha, unsigned char *tha)
> +{
> +	bool is_garp = tip == sip && addr_type == RTN_UNICAST;
> +
> +	/* Gratuitous ARP _replies_ also require target hwaddr to be
> +	 * the same as source.
> +	 */
> +	if (is_garp && ar_op == htons(ARPOP_REPLY))
> +		is_garp =
> +			/* IPv4 over IEEE 1394 doesn't provide target
> +			 * hardware address field in its ARP payload.
> +			 */
> +			tha &&

	All 4 patches look ok to me with only a small problem
which comes from a patch already included in the kernel. I see that
GARP replies cannot work for 1394, is_garp will be cleared.
Maybe the 'tha' check should be moved into the if expression, for example:

	if (is_garp && ar_op == htons(ARPOP_REPLY) && tha)
		is_garp = !memcmp(tha, sha, dev->addr_len);

> +			!memcmp(tha, sha, dev->addr_len);
> +
> +	return is_garp;
> +}

Regards

--
Julian Anastasov
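To make the difference between the two placements of the 'tha' check concrete, here is a stand-alone sketch of both variants. Everything in it is simplified for illustration: the function names and constants are hypothetical, addresses are plain integers, and the opcode is compared in host order, whereas the kernel code compares against htons(ARPOP_REPLY).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative constants, not the kernel's definitions. */
#define OP_REQUEST   1
#define OP_REPLY     2
#define ADDR_UNICAST 1

/* Variant as in the quoted patch: a reply without a target hardware
 * address (tha == NULL, as with IPv4 over IEEE 1394) is never GARP. */
bool is_garp_patch(int addr_type, int op, unsigned int sip, unsigned int tip,
		   const unsigned char *sha, const unsigned char *tha,
		   size_t addr_len)
{
	bool is_garp = tip == sip && addr_type == ADDR_UNICAST;

	if (is_garp && op == OP_REPLY)
		is_garp = tha && !memcmp(tha, sha, addr_len);
	return is_garp;
}

/* Variant with the tha test moved into the condition: the hardware
 * address comparison is skipped when there is no tha to compare. */
bool is_garp_suggested(int addr_type, int op, unsigned int sip, unsigned int tip,
		       const unsigned char *sha, const unsigned char *tha,
		       size_t addr_len)
{
	bool is_garp = tip == sip && addr_type == ADDR_UNICAST;

	if (is_garp && op == OP_REPLY && tha)
		is_garp = !memcmp(tha, sha, addr_len);
	return is_garp;
}
```

The two variants agree whenever tha is present; they differ only for replies where tha is NULL, which the first rejects and the second accepts as gratuitous.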
Re: [PATCH] of_mdio: Fix broken PHY IRQ in case of probe deferral
Hi Andrew,

On Thu, May 18, 2017 at 9:34 PM, Andrew Lunn wrote:
>> > This most certainly works fine in the simple case where you have one PHY
>> > hanging off the MDIO bus, now what happens if you have several?
>> >
>> > Presumably, the first PHY that returns EPROBE_DEFER will make the entire
>> > bus registration return EPROBE_DEFER as well, and so on, and so forth,
>> > but I am not sure if we will be properly unwinding the successful
>> > registration of PHYs that either don't have an interrupt, or did not
>> > return EPROBE_DEFER.
>> >
>> > It should be possible to mimic this behavior by using the fixed PHY, and
>> > possibly the dsa_loop.c driver which would create 4 ports, expecting 4
>> > fixed PHYs to be present.
>>
>> mdiobus_unregister(), called from of_mdiobus_register() on failure,
>> should do the unwinding, right?
>>
>> And when the driver is reprobed, all PHYs are reprobed, until they all
>> succeed.
>
> That is the theory. I looked at that while reviewing the patch. But
> this has probably not been tested in anger. It would be good to test
> this properly, with not just the first PHY returning -EPROBE_DEFER, to
> really test the unwind.

Unfortunately I don't have a board with multiple PHYs, so I cannot test
that case.

Does unbinding/rebinding a network driver with multiple PHYs currently
work? Or module unload/reload?
That should exercise a similar code path.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
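The "theory" under discussion, that a deferral on any one PHY unwinds the PHYs already registered so the whole bus can be probed again later, can be modelled in a few lines of stand-alone C. This is a toy model only: fake_bus and all function names are hypothetical and nothing here calls the real MDIO code. EPROBE_DEFER is defined locally because it is a kernel-internal errno value.

```c
#define EPROBE_DEFER 517 /* kernel-internal value, not in userspace <errno.h> */
#define MAX_PHYS     4

/* Toy bus: which PHYs are registered, and which still wait for their IRQ. */
struct fake_bus {
	int registered[MAX_PHYS];
	int irq_ready[MAX_PHYS];
};

/* Registering a PHY whose interrupt controller is not ready is deferred. */
static int fake_phy_register(struct fake_bus *bus, int i)
{
	if (!bus->irq_ready[i])
		return -EPROBE_DEFER;
	bus->registered[i] = 1;
	return 0;
}

/* Register every PHY; on the first failure, unwind the ones already done. */
static int fake_bus_register(struct fake_bus *bus)
{
	int i, rc;

	for (i = 0; i < MAX_PHYS; i++) {
		rc = fake_phy_register(bus, i);
		if (rc) {
			while (--i >= 0)
				bus->registered[i] = 0;
			return rc;
		}
	}
	return 0;
}

/* Probe a bus where PHY number deferred_phy (or none, if -1) is not
 * ready, and report how many PHYs remain registered afterwards. */
int registered_after_probe(int deferred_phy)
{
	struct fake_bus bus = { {0}, {0} };
	int i, count = 0;

	for (i = 0; i < MAX_PHYS; i++)
		bus.irq_ready[i] = (i != deferred_phy);
	fake_bus_register(&bus);
	for (i = 0; i < MAX_PHYS; i++)
		count += bus.registered[i];
	return count;
}
```

Deferring a PHY in the middle of the bus (for example index 2) rather than the first one exercises the unwind path, which is the case Andrew asks to have tested.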
Re: [PATCH v5 net-next 4/7] net: add new control message for incoming HW-timestamped packets
On Thu, May 18, 2017 at 10:07 AM, Miroslav Lichvarwrote: > Add SOF_TIMESTAMPING_OPT_PKTINFO option to request a new control message > for incoming packets with hardware timestamps. It contains the index of > the real interface which received the packet and the length of the > packet at layer 2. > > The index is useful with bonding, bridges and other interfaces, where > IP_PKTINFO doesn't allow applications to determine which PHC made the > timestamp. With the L2 length (and link speed) it is possible to > transpose preamble timestamps to trailer timestamps, which are used in > the NTP protocol. > > While this information could be provided by two new socket options > independently from timestamping, it doesn't look like they would be very > useful. With this option any performance impact is limited to hardware > timestamping. > > Use dev_get_by_napi_id() to get the device and its index. On kernels > with disabled CONFIG_NET_RX_BUSY_POLL or drivers not using NAPI, a zero > index will be returned in the control message. > > CC: Richard Cochran > CC: Willem de Bruijn > Signed-off-by: Miroslav Lichvar Acked-by: Willem de Bruijn > +SOF_TIMESTAMPING_OPT_PKTINFO: > + > + Enable the SCM_TIMESTAMPING_PKTINFO control message for incoming > + packets with hardware timestamps. The message contains struct > + scm_ts_pktinfo, which supplies the index of the real interface which > + received the packet and its length at layer 2. A valid (non-zero) > + interface index will be returned only if CONFIG_NET_RX_BUSY_POLL is > + enabled and the driver is using NAPI. It is probably good to explicitly call out that the remaining two fields are reserved and undefined. To stress that applications cannot be overly pedantic and start failing if these become non-zero.
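As a rough illustration of the preamble-to-trailer transposition mentioned in the commit message, the arithmetic could look like the sketch below. The helper name and its parameters are hypothetical; it assumes the timestamp is taken at the start of the frame, that l2_len covers everything sent after that point, and that the link speed is known in Mbit/s. Whether extra bytes such as the FCS need to be added depends on the hardware and is not handled here.

```c
#include <stdint.h>

/* Hypothetical helper: move a timestamp taken at the start of a frame to
 * the end of the frame. l2_len is the layer-2 length in bytes, speed_mbps
 * the link speed in Mbit/s (must be non-zero).
 */
static int64_t preamble_to_trailer_ns(int64_t preamble_ns, uint32_t l2_len,
				      uint32_t speed_mbps)
{
	/* bits / (Mbit/s) gives microseconds; multiply by 1000 first so the
	 * integer division yields nanoseconds.
	 */
	int64_t wire_ns = (int64_t)l2_len * 8 * 1000 / speed_mbps;

	return preamble_ns + wire_ns;
}
```

For a 1000-byte frame on a 1000 Mbit/s link this adds 8000 ns to the preamble timestamp.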
Re: [PATCH v4 net-next 6/7] net: allow simultaneous SW and HW transmit timestamping
On Thu, May 18, 2017 at 9:06 AM, Miroslav Lichvarwrote: > Add SOF_TIMESTAMPING_OPT_TX_SWHW option to allow an outgoing packet to > be looped to the socket's error queue with a software timestamp even > when a hardware transmit timestamp is expected to be provided by the > driver. > > Applications using this option will receive two separate messages from > the error queue, one with a software timestamp and the other with a > hardware timestamp. As the hardware timestamp is saved to the shared skb > info, which may happen before the first message with software timestamp > is received by the application, the hardware timestamp is copied to the > SCM_TIMESTAMPING control message only when the skb has no software > timestamp or it is an incoming packet. > > While changing sw_tx_timestamp(), inline it in skb_tx_timestamp() as > there are no other users. > > CC: Richard Cochran > CC: Willem de Bruijn > Signed-off-by: Miroslav Lichvar > --- > +/* On transmit, software and hardware timestamps are returned independently. > + * As the two skb clones share the hardware timestamp, which may be updated > + * before the software timestamp is received, a hardware TX timestamp may be > + * returned only if there is no software TX timestamp. A false software > + * timestamp made for SOCK_RCVTSTAMP when a real timestamp is missing must > + * be ignored. Please expand on why this case can be ignored. It is quite subtle. How about something like * * A false software timestamp is one made inside the __sock_recv_timestamp * call itself. These are generated whenever SO_TIMESTAMP(NS) is enabled * on the socket, even when the timestamp reported is for another option, such * as hardware tx timestamp. * * Ignore these when deciding whether a timestamp source is hw or sw. */ And perhaps move the comment to the branch itself. 
> + */ > +static bool skb_is_swtx_tstamp(const struct sk_buff *skb, > + const struct sock *sk, int false_tstamp) > +{ > + if (false_tstamp && sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW) Also, why is it ignored only for the new mode? > + return 0; > + > + return skb->tstamp && skb_is_err_queue(skb); > +}
Re: [PATCH v1] samples/bpf: Add a .gitignore for binaries
On 5/17/17 1:18 AM, Alexander Alemayhu wrote:
> I have looked into this but found it to be not easy and all attempts to
> change the Makefile has resulted in obscure errors :/
>
> Getting clang to output in a different directory was easy[0], but I guess
> this is not the right approach either. Have you tried making the change?

I spent an hour or so on it a few weeks back. It is not trivial, but
someone needs to find the time to fix it now.

perf is the example to use: you can build it from both the top level
kernel directory (e.g., make -C tools/perf O=/tmp/perf) and the perf
directory (cd tools/perf; make O=/tmp/perf). Both are wanted for
samples/bpf and it would be nice to keep the O= option as well.

I don't have the time for the next few weeks. Perhaps mid-June I can
take a look.
[PATCH net-next] geneve: always fill CSUM6_RX configuration
CSUM6_RX is relevant for collect_metadata as well. As such leave it
outside of the dev's IPv4/IPv6 checks.

Fixes: 9b4437a5b870 ("geneve: Unify LWT and netdev handling.")
Signed-off-by: Eric Garver
---
 drivers/net/geneve.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index dec5d563ab19..f557d1dc3f9b 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -1311,13 +1311,13 @@ static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev)
 		if (nla_put_u8(skb, IFLA_GENEVE_UDP_ZERO_CSUM6_TX,
 			       !(info->key.tun_flags & TUNNEL_CSUM)))
 			goto nla_put_failure;
-
-		if (nla_put_u8(skb, IFLA_GENEVE_UDP_ZERO_CSUM6_RX,
-			       !geneve->use_udp6_rx_checksums))
-			goto nla_put_failure;
 #endif
 	}
 
+	if (nla_put_u8(skb, IFLA_GENEVE_UDP_ZERO_CSUM6_RX,
+		       !geneve->use_udp6_rx_checksums))
+		goto nla_put_failure;
+
 	if (nla_put_u8(skb, IFLA_GENEVE_TTL, info->key.ttl) ||
 	    nla_put_u8(skb, IFLA_GENEVE_TOS, info->key.tos) ||
 	    nla_put_be32(skb, IFLA_GENEVE_LABEL, info->key.label))
-- 
2.12.0
[PATCH v2 4/4] arp: always override existing neigh entries with gratuitous ARP
Currently, when arp_accept is 1, we always override existing neigh
entries with incoming gratuitous ARP replies. Otherwise, we override
them only if new replies satisfy the _locktime_ condition (packets arrive
not earlier than _locktime_ seconds since the last update to the neigh
entry).

The idea behind locktime is to pick the very first (=> close) reply
received in a unicast burst when ARP proxies are used. This helps to
avoid ARP thrashing where Linux would switch back and forth from one
proxy to another.

This logic has nothing to do with gratuitous ARP replies that are
generally not aligned in time when multiple IP address carriers send
them into the network.

This patch enforces overriding of existing neigh entries by all incoming
gratuitous ARP packets, irrespective of their time of arrival. This will
make the kernel honour all incoming gratuitous ARP packets.

Signed-off-by: Ihar Hrachyshka
---
 net/ipv4/arp.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index c22103c..ae96e6f 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -863,16 +863,17 @@ static int arp_process(struct net *net, struct sock *sk, struct sk_buff *skb)
 
 	n = __neigh_lookup(&arp_tbl, &sip, dev, 0);
 
-	if (IN_DEV_ARP_ACCEPT(in_dev)) {
+	if (n || IN_DEV_ARP_ACCEPT(in_dev)) {
 		addr_type = -1;
+		is_garp = arp_is_garp(net, dev, &addr_type, arp->ar_op,
+				      sip, tip, sha, tha);
+	}
 
+	if (IN_DEV_ARP_ACCEPT(in_dev)) {
 		/* Unsolicited ARP is not accepted by default.
 		   It is possible, that this option should be enabled for some
 		   devices (strip is candidate)
 		 */
-		is_garp = arp_is_garp(net, dev, &addr_type, arp->ar_op,
-				      sip, tip, sha, tha);
-
 		if (!n &&
 		    (is_garp ||
 		     (arp->ar_op == htons(ARPOP_REPLY) &&
-- 
2.9.3
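To make the intended rule concrete, here is a toy model of the override decision described in the commit message. It is a sketch under assumptions, not the code in net/core or net/ipv4: sketch_neigh_override() is a hypothetical name, time is an arbitrary monotonic counter, and the exact boundary (">=" versus ">") is taken from the "not earlier than" wording rather than from the kernel.

```c
#include <stdbool.h>

/* Hypothetical model: 'now' and 'updated' are timestamps in the same
 * unit, 'locktime' is the locktime interval in that unit, and
 * now >= updated is assumed.
 */
static bool sketch_neigh_override(unsigned long now, unsigned long updated,
				  unsigned long locktime, bool is_garp)
{
	/* Gratuitous ARP always overrides; any other reply has to arrive
	 * no earlier than locktime after the last update.
	 */
	return is_garp || now - updated >= locktime;
}
```

With locktime 5, a gratuitous ARP arriving one tick after the last update overrides the entry, while an ordinary reply at the same moment does not.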
[PATCH v2 2/4] arp: decompose is_garp logic into a separate function
The code is involved enough already to earn a separate function for
itself. If anything, it helps arp_process readability.

Signed-off-by: Ihar Hrachyshka
---
 net/ipv4/arp.c | 35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 053492a..ca6e1e6 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -641,6 +641,27 @@ void arp_xmit(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(arp_xmit);
 
+static bool arp_is_garp(struct net_device *dev, int addr_type,
+			__be16 ar_op,
+			__be32 sip, __be32 tip,
+			unsigned char *sha, unsigned char *tha)
+{
+	bool is_garp = tip == sip && addr_type == RTN_UNICAST;
+
+	/* Gratuitous ARP _replies_ also require target hwaddr to be
+	 * the same as source.
+	 */
+	if (is_garp && ar_op == htons(ARPOP_REPLY))
+		is_garp =
+			/* IPv4 over IEEE 1394 doesn't provide target
+			 * hardware address field in its ARP payload.
+			 */
+			tha &&
+			!memcmp(tha, sha, dev->addr_len);
+
+	return is_garp;
+}
+
 /*
  *	Process an arp request.
  */
@@ -844,18 +865,8 @@ static int arp_process(struct net *net, struct sock *sk, struct sk_buff *skb)
 		   It is possible, that this option should be enabled for some
 		   devices (strip is candidate)
 		 */
-		is_garp = tip == sip && addr_type == RTN_UNICAST;
-
-		/* Gratuitous ARP _replies_ also require target hwaddr to be
-		 * the same as source.
-		 */
-		if (is_garp && arp->ar_op == htons(ARPOP_REPLY))
-			is_garp =
-				/* IPv4 over IEEE 1394 doesn't provide target
-				 * hardware address field in its ARP payload.
-				 */
-				tha &&
-				!memcmp(tha, sha, dev->addr_len);
+		is_garp = arp_is_garp(dev, addr_type, arp->ar_op,
+				      sip, tip, sha, tha);
 
 		if (!n &&
 		    ((arp->ar_op == htons(ARPOP_REPLY) &&
-- 
2.9.3
[PATCH v2 3/4] arp: postpone addr_type calculation to as late as possible
The addr_type retrieval can be costly, so it's worth trying to avoid its
calculation as much as possible. This patch makes it calculated only
for gratuitous ARP packets. This is especially important since later we
may want to move is_garp calculation outside of arp_accept block, at
which point the costly operation will be executed for all setups.

The patch is the result of a discussion in netdev:
http://marc.info/?l=linux-netdev&m=149506354216994

Suggested-by: Julian Anastasov
Signed-off-by: Ihar Hrachyshka
---
 net/ipv4/arp.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index ca6e1e6..c22103c 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -641,12 +641,12 @@ void arp_xmit(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(arp_xmit);
 
-static bool arp_is_garp(struct net_device *dev, int addr_type,
-			__be16 ar_op,
+static bool arp_is_garp(struct net *net, struct net_device *dev,
+			int *addr_type, __be16 ar_op,
 			__be32 sip, __be32 tip,
 			unsigned char *sha, unsigned char *tha)
 {
-	bool is_garp = tip == sip && addr_type == RTN_UNICAST;
+	bool is_garp = tip == sip;
 
 	/* Gratuitous ARP _replies_ also require target hwaddr to be
 	 * the same as source.
@@ -659,6 +659,11 @@ static bool arp_is_garp(struct net_device *dev, int addr_type,
 			tha &&
 			!memcmp(tha, sha, dev->addr_len);
 
+	if (is_garp) {
+		*addr_type = inet_addr_type_dev_table(net, dev, sip);
+		if (*addr_type != RTN_UNICAST)
+			is_garp = false;
+	}
 	return is_garp;
 }
 
@@ -859,18 +864,23 @@ static int arp_process(struct net *net, struct sock *sk, struct sk_buff *skb)
 	n = __neigh_lookup(&arp_tbl, &sip, dev, 0);
 
 	if (IN_DEV_ARP_ACCEPT(in_dev)) {
-		unsigned int addr_type = inet_addr_type_dev_table(net, dev, sip);
+		addr_type = -1;
 
 		/* Unsolicited ARP is not accepted by default.
 		   It is possible, that this option should be enabled for some
 		   devices (strip is candidate)
 		 */
-		is_garp = arp_is_garp(dev, addr_type, arp->ar_op,
+		is_garp = arp_is_garp(net, dev, &addr_type, arp->ar_op,
 				      sip, tip, sha, tha);
 
 		if (!n &&
-		    ((arp->ar_op == htons(ARPOP_REPLY) &&
-		      addr_type == RTN_UNICAST) || is_garp))
+		    (is_garp ||
+		     (arp->ar_op == htons(ARPOP_REPLY) &&
+		      (addr_type == RTN_UNICAST ||
+		       (addr_type < 0 &&
+			/* postpone calculation to as late as possible */
+			inet_addr_type_dev_table(net, dev, sip) ==
+				RTN_UNICAST)))))
 			n = __neigh_lookup(&arp_tbl, &sip, dev, 1);
 	}
 
-- 
2.9.3
[PATCH v2 1/4] arp: fixed error in a comment
The is_garp code deals just with gratuitous ARP packets, not every
unsolicited packet.

This patch is a result of a discussion in netdev:
http://marc.info/?l=linux-netdev&m=149506354216994

Suggested-by: Julian Anastasov
Signed-off-by: Ihar Hrachyshka
---
 net/ipv4/arp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index d54345a..053492a 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -846,7 +846,7 @@ static int arp_process(struct net *net, struct sock *sk, struct sk_buff *skb)
 		 */
 		is_garp = tip == sip && addr_type == RTN_UNICAST;
 
-		/* Unsolicited ARP _replies_ also require target hwaddr to be
+		/* Gratuitous ARP _replies_ also require target hwaddr to be
 		 * the same as source.
 		 */
 		if (is_garp && arp->ar_op == htons(ARPOP_REPLY))
-- 
2.9.3
[PATCH v2 0/4] arp: always override existing neigh entries with gratuitous ARP
This patchset is spurred by discussion started at
https://patchwork.ozlabs.org/patch/760372/ where we figured that there is
no real reason for enforcing override by gratuitous ARP packets only when
arp_accept is 1. Same should happen when it's 0 (the default value).

changelog v2: handled review comments by Julian Anastasov
- fixed a mistake in a comment;
- postponed addr_type calculation to as late as possible.

Ihar Hrachyshka (4):
  arp: fixed error in a comment
  arp: decompose is_garp logic into a separate function
  arp: postpone addr_type calculation to as late as possible
  arp: always override existing neigh entries with gratuitous ARP

 net/ipv4/arp.c | 56 +++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 39 insertions(+), 17 deletions(-)

-- 
2.9.3
Re: [PATCH v5 net-next 5/7] net: fix documentation of struct scm_timestamping
On Thu, May 18, 2017 at 10:07 AM, Miroslav Lichvarwrote: > The scm_timestamping struct may return multiple non-zero fields, e.g. > when both software and hardware RX timestamping is enabled, or when the > SO_TIMESTAMP(NS) option is combined with SCM_TIMESTAMPING and a false > software timestamp is generated in the recvmsg() call in order to always > return a SCM_TIMESTAMP(NS) message. > > CC: Richard Cochran > CC: Willem de Bruijn > Signed-off-by: Miroslav Lichvar Thanks for adding this! > +Note that if the SO_TIMESTAMP or SO_TIMESTAMPNS option is enabled > +together with SO_TIMESTAMPING using SOF_TIMESTAMPING_SOFTWARE, a false > +software timestamp will be generated in the recvmsg() call and passed > +in ts[0] when a real software timestamp is missing. With receive software timestamping this is expected behavior? I would make explicit that this happens even on tx timestamps. > For this reason it > +is not recommended to combine SO_TIMESTAMP(NS) with SO_TIMESTAMPING. And I'd remove this. The extra timestamp is harmless, and we may be missing other reasons why someone would want to enable both on the same socket.
Re: [PATCH] of_mdio: Fix broken PHY IRQ in case of probe deferral
> > This most certainly works fine in the simple case where you have one PHY > > hanging off the MDIO bus, now what happens if you have several? > > > > Presumably, the first PHY that returns EPROBE_DEFER will make the entire > > bus registration return EPROB_DEFER as well, and so on, and so forth, > > but I am not sure if we will be properly unwinding the successful > > registration of PHYs that either don't have an interrupt, or did not > > return EPROBE_DEFER. > > > > It should be possible to mimic this behavior by using the fixed PHY, and > > possibly the dsa_loop.c driver which would create 4 ports, expecting 4 > > fixed PHYs to be present. > > mdiobus_unregister(), called from of_mdiobus_register() on failure, > should do the unwinding, right? > > And when the driver is reprobed, all PHYs are reprobed, until they all > succeed. That is the theory. I looked at that while reviewing the patch. But this has probably not been tested in anger. It would be good to test this properly, with not just the first PHY returning -EPROBE_DEFER, to really test the unwind. Andrew
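The scenario under discussion (several PHYs on one bus, with a later one deferring) can be modelled in a few lines of userspace C. This is only a toy model to reason about the expected unwind; every name in it is hypothetical, SKETCH_EPROBE_DEFER copies the kernel-internal errno value, and it says nothing about whether the real mdiobus code behaves this way.

```c
#include <stdbool.h>
#include <stddef.h>

/* Kernel-internal errno value, copied here as an assumption. */
#define SKETCH_EPROBE_DEFER 517

struct sketch_phy {
	bool irq_ready;		/* false: interrupt controller not probed yet */
	bool registered;
};

/* Unwind: drop every PHY that was registered so far. */
static void sketch_bus_unregister(struct sketch_phy *phys, size_t n)
{
	for (size_t i = 0; i < n; i++)
		phys[i].registered = false;
}

/* Register all PHYs on the bus. The first PHY whose interrupt is not
 * available makes the whole registration fail with a deferral, after
 * unregistering the PHYs that had already succeeded.
 */
static int sketch_bus_register(struct sketch_phy *phys, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!phys[i].irq_ready) {
			sketch_bus_unregister(phys, n);
			return -SKETCH_EPROBE_DEFER;
		}
		phys[i].registered = true;
	}
	return 0;
}
```

A test along the lines suggested above would make the last of three PHYs defer, check that the first two end up unregistered, then mark the interrupt as ready and register again.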
Paper: A Comparison of TCP Implementations, Linux vs. lwIP
Hello,

Some months ago I wrote a paper on a Comparison of TCP Implementations.
(Features, Code Quality, Data Structures, etc.)

https://github.com/richi235/A-Comparison-of-TCP-Implementations

It's finished and the corresponding exam successfully passed. But I
thought perhaps this could be interesting for some people here, too.
And since I'm still interested in and reading about TCP implementations,
I'm thankful for any feedback, corrections or opinions about the
conclusions I found.

Thanks,
-- Richard
Re: [PATCH net-next] net/mlx5e: Fix possible memory leak
From: Wei YongjunDate: Thu, 18 May 2017 15:34:41 + > From: Wei Yongjun > > 'encap_header' is malloced and should be freed before leaving from > the error handling cases, otherwise it will cause memory leak. > > Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update flow") > Signed-off-by: Wei Yongjun Applied.
Re: [PATCH net-next] ibmvnic: fix missing unlock on error in __ibmvnic_reset()
From: Wei YongjunDate: Thu, 18 May 2017 15:24:52 + > From: Wei Yongjun > > Add the missing unlock before return from function __ibmvnic_reset() > in the error handling case. > > Fixes: ed651a10875f ("ibmvnic: Updated reset handling") > Signed-off-by: Wei Yongjun Applied.
Re: [PATCH net-next] qed: Remove unused including
From: Wei YongjunDate: Thu, 18 May 2017 15:26:29 + > From: Wei Yongjun > > Remove including that is not needed. > > Signed-off-by: Wei Yongjun Applied.
Re: [PATCH] of_mdio: Fix broken PHY IRQ in case of probe deferral
Hi Florian, On Thu, May 18, 2017 at 8:25 PM, Florian Fainelliwrote: > On 05/18/2017 05:59 AM, Geert Uytterhoeven wrote: >> If an Ethernet PHY is initialized before the interrupt controller it is >> connected to, a message like the following is printed: >> >> irq: no irq domain found for /interrupt-controller@e61c ! >> >> However, the actual error is ignored, leading to a non-functional (-1) >> PHY interrupt later: >> >> Micrel KSZ8041RNLI ee70.ethernet-:01: attached PHY driver >> [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee70.ethernet-:01, irq=-1) >> >> Depending on whether the PHY driver will fall back to polling, Ethernet >> may or may not work. >> >> To fix this: >> 1. Switch of_mdiobus_register_phy() from irq_of_parse_and_map() to >> of_irq_get(). >> Unlike the former, the latter returns -EPROBE_DEFER if the >> interrupt controller is not yet available, so this condition can be >> detected. >> Other errors are handled the same as before, i.e. use the passed >> mdio->irq[addr] as interrupt. >> 2. Propagate and handle errors from of_mdiobus_register_phy() and >> of_mdiobus_register_device(). > > This most certainly works fine in the simple case where you have one PHY > hanging off the MDIO bus, now what happens if you have several? > > Presumably, the first PHY that returns EPROBE_DEFER will make the entire > bus registration return EPROB_DEFER as well, and so on, and so forth, > but I am not sure if we will be properly unwinding the successful > registration of PHYs that either don't have an interrupt, or did not > return EPROBE_DEFER. > > It should be possible to mimic this behavior by using the fixed PHY, and > possibly the dsa_loop.c driver which would create 4 ports, expecting 4 > fixed PHYs to be present. mdiobus_unregister(), called from of_mdiobus_register() on failure, should do the unwinding, right? And when the driver is reprobed, all PHYs are reprobed, until they all succeed. 
Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [PATCH] net1080: Remove unused function nc_dump_ttl()
From: Matthias KaehlckeDate: Thu, 18 May 2017 10:57:19 -0700 > The function is not used, removing it fixes the following warning when > building with clang: > > drivers/net/usb/net1080.c:271:20: error: unused function > 'nc_dump_ttl' [-Werror,-Wunused-function] > > Also remove the definition of TTL_THIS, which is only used in > nc_dump_ttl() > > Signed-off-by: Matthias Kaehlcke Applied to net-next.
Re: [PATCH] r8152: Remove unused function usb_ocp_read()
From: Matthias KaehlckeDate: Thu, 18 May 2017 10:45:33 -0700 > The function is not used, removing it fixes the following warning when > building with clang: > > drivers/net/usb/r8152.c:825:5: error: unused function 'usb_ocp_read' > [-Werror,-Wunused-function] > > Signed-off-by: Matthias Kaehlcke Applied to net-next.
Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
Implementations (still in Python for now) at https://gist.github.com/ecree-solarflare/0665d5b46c2d8d08de2377fbd527de8d (I left out division, because it's so weak.) I still can't prove + and - are correct, but they've passed every test case I've come up with so far. * seems pretty obviously correct as long as the + it uses is. Bitwise ops and shifts are trivial to prove. -Ed
Re: [PATCH net-next] xen/9pfs: p9_trans_xen_init and p9_trans_xen_exit can be static
On Thu, 18 May 2017, Wei Yongjun wrote:
> From: Wei Yongjun
>
> Fixes the following sparse warnings:
>
> net/9p/trans_xen.c:528:5: warning:
>  symbol 'p9_trans_xen_init' was not declared. Should it be static?
> net/9p/trans_xen.c:540:6: warning:
>  symbol 'p9_trans_xen_exit' was not declared. Should it be static?
>
> Signed-off-by: Wei Yongjun

Reviewed-by: Stefano Stabellini

If that's OK for everybody we'll queue this fix and
20170516142247.12301-1-weiyj...@gmail.com to the xentip tree.

> ---
>  net/9p/trans_xen.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
> index 71e8564..3deb17f 100644
> --- a/net/9p/trans_xen.c
> +++ b/net/9p/trans_xen.c
> @@ -525,7 +525,7 @@ static struct xenbus_driver xen_9pfs_front_driver = {
>  	.otherend_changed = xen_9pfs_front_changed,
>  };
>
> -int p9_trans_xen_init(void)
> +static int p9_trans_xen_init(void)
>  {
>  	if (!xen_domain())
>  		return -ENODEV;
> @@ -537,7 +537,7 @@ int p9_trans_xen_init(void)
>  }
>  module_init(p9_trans_xen_init);
>
> -void p9_trans_xen_exit(void)
> +static void p9_trans_xen_exit(void)
>  {
>  	v9fs_unregister_trans(&p9_xen_trans);
>  	return xenbus_unregister_driver(&xen_9pfs_front_driver);
>
Re: [PATCH] of_mdio: Fix broken PHY IRQ in case of probe deferral
On 05/18/2017 05:59 AM, Geert Uytterhoeven wrote: > If an Ethernet PHY is initialized before the interrupt controller it is > connected to, a message like the following is printed: > > irq: no irq domain found for /interrupt-controller@e61c ! > > However, the actual error is ignored, leading to a non-functional (-1) > PHY interrupt later: > > Micrel KSZ8041RNLI ee70.ethernet-:01: attached PHY driver > [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee70.ethernet-:01, irq=-1) > > Depending on whether the PHY driver will fall back to polling, Ethernet > may or may not work. > > To fix this: > 1. Switch of_mdiobus_register_phy() from irq_of_parse_and_map() to > of_irq_get(). > Unlike the former, the latter returns -EPROBE_DEFER if the > interrupt controller is not yet available, so this condition can be > detected. > Other errors are handled the same as before, i.e. use the passed > mdio->irq[addr] as interrupt. > 2. Propagate and handle errors from of_mdiobus_register_phy() and > of_mdiobus_register_device(). This most certainly works fine in the simple case where you have one PHY hanging off the MDIO bus, now what happens if you have several? Presumably, the first PHY that returns EPROBE_DEFER will make the entire bus registration return EPROB_DEFER as well, and so on, and so forth, but I am not sure if we will be properly unwinding the successful registration of PHYs that either don't have an interrupt, or did not return EPROBE_DEFER. It should be possible to mimic this behavior by using the fixed PHY, and possibly the dsa_loop.c driver which would create 4 ports, expecting 4 fixed PHYs to be present. > > Signed-off-by: Geert Uytterhoeven> --- > Seen on r8a7791/koelsch when using the new CPG/MSSR clock driver. > I assume it always happened on RZ/G1 in mainline. 
> --- > drivers/of/of_mdio.c | 39 +++ > 1 file changed, 27 insertions(+), 12 deletions(-) > > diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c > index 7e4c80f9b6cda0d3..f9ac2893f56184be 100644 > --- a/drivers/of/of_mdio.c > +++ b/drivers/of/of_mdio.c > @@ -44,7 +44,7 @@ static int of_get_phy_id(struct device_node *device, u32 > *phy_id) > return -EINVAL; > } > > -static void of_mdiobus_register_phy(struct mii_bus *mdio, > +static int of_mdiobus_register_phy(struct mii_bus *mdio, > struct device_node *child, u32 addr) > { > struct phy_device *phy; > @@ -60,9 +60,13 @@ static void of_mdiobus_register_phy(struct mii_bus *mdio, > else > phy = get_phy_device(mdio, addr, is_c45); > if (IS_ERR(phy)) > - return; > + return PTR_ERR(phy); > > - rc = irq_of_parse_and_map(child, 0); > + rc = of_irq_get(child, 0); > + if (rc == -EPROBE_DEFER) { > + phy_device_free(phy); > + return rc; > + } > if (rc > 0) { > phy->irq = rc; > mdio->irq[addr] = rc; > @@ -84,22 +88,23 @@ static void of_mdiobus_register_phy(struct mii_bus *mdio, > if (rc) { > phy_device_free(phy); > of_node_put(child); > - return; > + return rc; > } > > dev_dbg(>dev, "registered phy %s at address %i\n", > child->name, addr); > + return 0; > } > > -static void of_mdiobus_register_device(struct mii_bus *mdio, > -struct device_node *child, u32 addr) > +static int of_mdiobus_register_device(struct mii_bus *mdio, > + struct device_node *child, u32 addr) > { > struct mdio_device *mdiodev; > int rc; > > mdiodev = mdio_device_create(mdio, addr); > if (IS_ERR(mdiodev)) > - return; > + return PTR_ERR(mdiodev); > > /* Associate the OF node with the device structure so it >* can be looked up later. 
> @@ -112,11 +117,12 @@ static void of_mdiobus_register_device(struct mii_bus > *mdio, > if (rc) { > mdio_device_free(mdiodev); > of_node_put(child); > - return; > + return rc; > } > > dev_dbg(>dev, "registered mdio device %s at address %i\n", > child->name, addr); > + return 0; > } > > int of_mdio_parse_addr(struct device *dev, const struct device_node *np) > @@ -242,9 +248,11 @@ int of_mdiobus_register(struct mii_bus *mdio, struct > device_node *np) > } > > if (of_mdiobus_child_is_phy(child)) > - of_mdiobus_register_phy(mdio, child, addr); > + rc = of_mdiobus_register_phy(mdio, child, addr); > else > - of_mdiobus_register_device(mdio, child, addr); > + rc = of_mdiobus_register_device(mdio, child, addr); > + if (rc) > +
[PATCH net] tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0
From: Wei Wang

When tcp_disconnect() is called, inet_csk_delack_init() sets
icsk->icsk_ack.rcv_mss to 0.
This could potentially cause tcp_recvmsg() => tcp_cleanup_rbuf() =>
__tcp_select_window() call path to have division by 0 issue.
So this patch initializes rcv_mss to TCP_MIN_MSS instead of 0.

Reported-by: Andrey Konovalov
Signed-off-by: Wei Wang
Signed-off-by: Eric Dumazet
Signed-off-by: Neal Cardwell
Signed-off-by: Yuchung Cheng
---
 net/ipv4/tcp.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 1e4c76d2b827..842b575f8fdd 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2320,6 +2320,10 @@ int tcp_disconnect(struct sock *sk, int flags)
 	tcp_set_ca_state(sk, TCP_CA_Open);
 	tcp_clear_retrans(tp);
 	inet_csk_delack_init(sk);
+	/* Initialize rcv_mss to TCP_MIN_MSS to avoid division by 0
+	 * issue in __tcp_select_window()
+	 */
+	icsk->icsk_ack.rcv_mss = TCP_MIN_MSS;
 	tcp_init_send_head(sk);
 	memset(&tp->rx_opt, 0, sizeof(tp->rx_opt));
 	__sk_dst_reset(sk);
-- 
2.13.0.303.g4ebf302169-goog
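To see why a zero rcv_mss is dangerous, consider a heavily simplified stand-in for the kind of rounding a window selection does. This is a sketch under assumptions: sketch_round_window() is hypothetical and is not __tcp_select_window(); it only assumes that the window is rounded down to a multiple of the MSS, which involves a division (here a modulo) by the MSS.

```c
#include <stdint.h>

/* Mirrors the kernel's TCP_MIN_MSS value; defined here for the sketch. */
#define SKETCH_TCP_MIN_MSS 88U

/* Hypothetical helper: round free_space down to a multiple of mss.
 * With mss == 0 the modulo below would be a division by zero, which is
 * why the MSS must never be left at 0.
 */
static uint32_t sketch_round_window(uint32_t free_space, uint32_t mss)
{
	return free_space - free_space % mss;
}
```

With the MSS set to SKETCH_TCP_MIN_MSS, 1000 bytes of free space round down to 968 (11 segments of 88 bytes).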
Re: [PATCH linux-firmware] qed: Add firmware 8.20.0.0
On Wed, May 17, 2017 at 02:39:24PM +0300, Yuval Mintz wrote:
> The new QED firmware has 2 main purposes -
> First, it contains fixes to various initializations and firmware
> logic including:
>  - Corrects iSCSI fast retransmit when data digest is enabled.
>  - Stop draining packets when receiving several consecutive PFCs.
>  - Prevent possible assertion when consecutively opening/closing
>    many connections.
>  - Prevent possible assertion due to too long BDQ fetch time.
>
> In addition, this firmware contains sufficient infrastructure on which
> we'll add iWARP support in our drivers.
>
> Signed-off-by: Yuval Mintz
> ---
> Hi,
>
> Please consider applying this to `linux-firmware'.
>

applied, thanks Yuval.

regards,
Kyle
[GIT] Networking
1) Don't allow negative TCP reordering values, from Soheil Hassas
   Yeganeh.

2) Don't overflow while parsing ipv6 header options, from Craig
   Gallek.

3) Handle more cleanly the case where an individual route entry during
   a dump will not fit into the allocated netlink SKB, from David
   Ahern.

4) Add missing CONFIG_INET dependency for mlx5e, from Arnd Bergmann.

5) Allow neighbour updates to converge more quickly via gratuitous
   ARPs, from Ihar Hrachyshka.

6) Fix compile error when CONFIG_INET is disabled, from Eric Dumazet.

7) Fix use after free in x25 protocol init, from Lin Zhang.

8) Validate VLAN pvid ranges passed into br_validate(), from Tobias
   Jungel.

9) NULL out address lists in child sockets in SCTP, this is similar to
   the fix we made for inet connection sockets last week. From Eric
   Dumazet.

10) Fix NULL deref in mlxsw driver, from Ido Schimmel.

Please pull, thanks a lot!

The following changes since commit a95cfad947d5f40cfbf9ad3019575aac1d8ac7a6:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2017-05-15 15:50:49 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git

for you to fetch changes up to c0e01eac7ada785fdeaea1ae5476ec1cf3b00374:

  mlxsw: spectrum: Avoid possible NULL pointer dereference (2017-05-18 11:27:21 -0400)

Arkadi Sharshevsky (2):
      mlxsw: spectrum_dpipe: Fix incorrect entry index
      mlxsw: spectrum_router: Fix rif counter freeing routine

Arnd Bergmann (1):
      mlx5e: add CONFIG_INET dependency

Bjørn Mork (1):
      qmi_wwan: add another Lenovo EM74xx device ID

Christoph Hellwig (1):
      net/smc: Add warning about remote memory exposure

Craig Gallek (1):
      ipv6: Prevent overrun when parsing v6 header options

Daniel Borkmann (1):
      bpf: adjust verifier heuristics

David Ahern (1):
      net: Improve handling of failures on link and route dumps

David S. Miller (3):
      Merge branch 'bnxt_en-DCBX-fixes'
      ipv6: Check ip6_find_1stfragopt() return value properly.
      Merge branch 'mlxsw-fixes'

Eric Dumazet (2):
      net: fix compile error in skb_orphan_partial()
      sctp: do not inherit ipv6_{mc|ac|fl}_list from parent

Ganesh Goudar (1):
      cxgb4: update latest firmware version supported

Geert Uytterhoeven (2):
      sh_eth: Use platform device for printing before register_netdev()
      sh_eth: Do not print an error message for probe deferral

Greentime Hu (1):
      net: ethernet: faraday: To support device tree usage.

Ido Schimmel (1):
      mlxsw: spectrum: Avoid possible NULL pointer dereference

Ihar Hrachyshka (2):
      arp: honour gratuitous ARP _replies_
      neighbour: update neigh timestamps iff update is effective

Michael Chan (2):
      bnxt_en: Call bnxt_dcb_init() after getting firmware DCBX configuration.
      bnxt_en: Check status of firmware DCBX agent before setting DCB_CAP_DCBX_HOST.

Paolo Abeni (1):
      udp: make *udp*_queue_rcv_skb() functions static

Soheil Hassas Yeganeh (1):
      tcp: eliminate negative reordering in tcp_clean_rtx_queue

Thomas Winter (1):
      ipmr: vrf: Find VIFs using the actual device

Tobias Jungel (1):
      bridge: netlink: check vlan_default_pvid range

Ursula Braun (1):
      smc: switch to usage of IB_PD_UNSAFE_GLOBAL_RKEY

Yonghong Song (1):
      selftests/bpf: fix broken build due to types.h

linzhang (1):
      net: x25: fix one potential use-after-free issue

 drivers/net/ethernet/broadcom/bnxt/bnxt.c                | 3 +--
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c            | 6 --
 drivers/net/ethernet/chelsio/cxgb4/t4fw_version.h        | 6 +++---
 drivers/net/ethernet/faraday/ftmac100.c                  | 7 +++
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig          | 2 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c     | 3 ++-
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c    | 3 +++
 drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 6 ++
 drivers/net/ethernet/renesas/sh_eth.c                    | 3 ++-
 drivers/net/usb/qmi_wwan.c                               | 2 ++
 include/net/x25.h                                        | 4 ++--
 kernel/bpf/verifier.c                                    | 12 +++-
 net/bridge/br_netlink.c                                  | 7 +++
 net/core/neighbour.c                                     | 14 ++
 net/core/rtnetlink.c                                     | 36
 net/core/sock.c                                          | 3 ---
 net/ipv4/arp.c                                           | 16 ++--
 net/ipv4/fib_frontend.c                                  | 15 +++
 net/ipv4/fib_trie.c                                      | 26 ++
 net/ipv4/ipmr.c                                          | 18 --
[PATCH] net1080: Remove unused function nc_dump_ttl()
The function is not used, removing it fixes the following warning when
building with clang:

drivers/net/usb/net1080.c:271:20: error: unused function 'nc_dump_ttl'
    [-Werror,-Wunused-function]

Also remove the definition of TTL_THIS, which is only used in
nc_dump_ttl().

Signed-off-by: Matthias Kaehlcke
---
 drivers/net/usb/net1080.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/drivers/net/usb/net1080.c b/drivers/net/usb/net1080.c
index 4cbdb1307f3e..3202c19df83d 100644
--- a/drivers/net/usb/net1080.c
+++ b/drivers/net/usb/net1080.c
@@ -264,17 +264,9 @@ static inline void nc_dump_status(struct usbnet *dev, u16 status)
  * TTL register
  */

-#define	TTL_THIS(ttl)	(0x00ff & ttl)
 #define	TTL_OTHER(ttl)	(0x00ff & (ttl >> 8))
 #define MK_TTL(this,other)	((u16)(((other)<<8)|(0x00ff&(this))))

-static inline void nc_dump_ttl(struct usbnet *dev, u16 ttl)
-{
-	netif_dbg(dev, link, dev->net, "net1080 %s-%s ttl 0x%x this = %d, other = %d\n",
-		  dev->udev->bus->bus_name, dev->udev->devpath,
-		  ttl, TTL_THIS(ttl), TTL_OTHER(ttl));
-}
-
 /*-*/

 static int net1080_reset(struct usbnet *dev)
@@ -308,7 +300,6 @@ static int net1080_reset(struct usbnet *dev)
 		goto done;
 	}
 	ttl = vp;
-	// nc_dump_ttl(dev, ttl);

 	nc_register_write(dev, REG_TTL,
 		MK_TTL(NC_READ_TTL_MS, TTL_OTHER(ttl)) );
--
2.13.0.303.g4ebf302169-goog
[PATCH] r8152: Remove unused function usb_ocp_read()
The function is not used, removing it fixes the following warning when
building with clang:

drivers/net/usb/r8152.c:825:5: error: unused function 'usb_ocp_read'
    [-Werror,-Wunused-function]

Signed-off-by: Matthias Kaehlcke
---
 drivers/net/usb/r8152.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index ddc62cb69be8..e902df9595b9 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -841,12 +841,6 @@ int pla_ocp_write(struct r8152 *tp, u16 index, u16 byteen, u16 size, void *data)
 }

 static inline
-int usb_ocp_read(struct r8152 *tp, u16 index, u16 size, void *data)
-{
-	return generic_ocp_read(tp, index, size, data, MCU_TYPE_USB);
-}
-
-static inline
 int usb_ocp_write(struct r8152 *tp, u16 index, u16 byteen, u16 size, void *data)
 {
 	return generic_ocp_write(tp, index, byteen, size, data, MCU_TYPE_USB);
--
2.13.0.303.g4ebf302169-goog
Re: [net-intel-i40e] question about assignment overwrite
Hi Jeff,

Quoting Jeff Kirsher:

> On Wed, 2017-05-17 at 15:48 -0500, Gustavo A. R. Silva wrote:
>> While looking into Coverity ID 1408956 I ran into the following piece
>> of code at drivers/net/ethernet/intel/i40e/i40e_main.c:8807:
>>
>> 8807         if (pf->hw.mac.type == I40E_MAC_X722) {
>> 8808                 pf->flags |= I40E_FLAG_RSS_AQ_CAPABLE
>> 8809                              | I40E_FLAG_128_QP_RSS_CAPABLE
>> 8810                              | I40E_FLAG_HW_ATR_EVICT_CAPABLE
>> 8811                              | I40E_FLAG_OUTER_UDP_CSUM_CAPABLE
>> 8812                              | I40E_FLAG_WB_ON_ITR_CAPABLE
>> 8813                              | I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE
>> 8814                              | I40E_FLAG_NO_PCI_LINK_CHECK
>> 8815                              | I40E_FLAG_USE_SET_LLDP_MIB
>> 8816                              | I40E_FLAG_GENEVE_OFFLOAD_CAPABLE
>> 8817                              | I40E_FLAG_PTP_L4_CAPABLE
>> 8818                              | I40E_FLAG_WOL_MC_MAGIC_PKT_WAKE;
>> 8819         } else if ((pf->hw.aq.api_maj_ver > 1) ||
>> 8820                    ((pf->hw.aq.api_maj_ver == 1) &&
>> 8821                     (pf->hw.aq.api_min_ver > 4))) {
>> 8822                 /* Supported in FW API version higher than 1.4 */
>> 8823                 pf->flags |= I40E_FLAG_GENEVE_OFFLOAD_CAPABLE;
>> 8824                 pf->flags = I40E_FLAG_HW_ATR_EVICT_CAPABLE;
>> 8825         } else {
>> 8826                 pf->flags = I40E_FLAG_HW_ATR_EVICT_CAPABLE;
>> 8827         }
>>
>> The issue here is that the assignment at line 8823 is overwritten by
>> the code at line 8824.
>>
>> I suspect that line 8824 should be removed and a patch like the
>> following can be applied:
>>
>> index d5c9c9e..48ffa73 100644
>> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
>> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
>> @@ -8821,7 +8821,6 @@ static int i40e_sw_init(struct i40e_pf *pf)
>>                      (pf->hw.aq.api_min_ver > 4))) {
>>                 /* Supported in FW API version higher than 1.4 */
>>                 pf->flags |= I40E_FLAG_GENEVE_OFFLOAD_CAPABLE;
>> -               pf->flags = I40E_FLAG_HW_ATR_EVICT_CAPABLE;
>>         } else {
>>                 pf->flags = I40E_FLAG_HW_ATR_EVICT_CAPABLE;
>>         }
>>
>> What do you think?
>
> This issue is already fixed in my dev-queue branch on my next-queue tree.

Great, it's good to know.

Thanks!
--
Gustavo A. R. Silva
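For readers following along, here is a minimal userspace sketch of why the pair at lines 8823-8824 is suspicious. The flag values and function names are hypothetical stand-ins, not the driver's definitions, and the second function only mirrors the patch proposed in the mail; the fix that actually landed in dev-queue is not shown in this thread.

```c
#include <stdint.h>

/* Hypothetical bit values standing in for the I40E_FLAG_* constants. */
#define SKETCH_FLAG_GENEVE      (1u << 0)
#define SKETCH_FLAG_ATR_EVICT   (1u << 1)
#define SKETCH_FLAG_PREEXISTING (1u << 2)

/* Mirrors lines 8823-8824 as quoted: OR one flag in, then plain-assign. */
static uint32_t sketch_flags_as_quoted(uint32_t flags)
{
	flags |= SKETCH_FLAG_GENEVE;
	flags = SKETCH_FLAG_ATR_EVICT;	/* discards every bit set so far */
	return flags;
}

/* Mirrors the proposed patch: the plain assignment is dropped. */
static uint32_t sketch_flags_line_removed(uint32_t flags)
{
	flags |= SKETCH_FLAG_GENEVE;
	return flags;
}
```

With the plain assignment in place, both the GENEVE bit and anything set earlier in the function are lost, which is what Coverity flagged.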
Re: [PATCH] of_mdio: Fix broken PHY IRQ in case of probe deferral
Hi Andrew, On Thu, May 18, 2017 at 6:33 PM, Andrew Lunnwrote: >> >> phy = get_phy_device(mdio, addr, is_c45); >> >> if (IS_ERR(phy)) >> >> - return; >> >> + return PTR_ERR(phy); >> >> >> >> - rc = irq_of_parse_and_map(child, 0); >> >> + rc = of_irq_get(child, 0); >> >> + if (rc == -EPROBE_DEFER) { >> >> + phy_device_free(phy); >> >> + return rc; >> >> + } >> > >> > Maybe this should be consistent. All other places there is an error, >> > you return it. Here however, you only return the error if it is >> > EPROBE_DEFER. >> >> That's because of the "else" branch in the code below: >> >> if (rc > 0) { >> phy->irq = rc; >> mdio->irq[addr] = rc; >> } else { >> phy->irq = mdio->irq[addr]; >> } >> >> cfr. the marked part of the patch description. >> I didn't want to change that behavior, as it's not clear to me why it's >> handled >> that way. > > So there seems to be 3 conditions that need handling: > > 1) of_irq_get() gives us an interrupt number. > 2) of_irq_get() indicates there is no irq in the device tree. > 3) of_irq_get() indicates a real error > > 1) We have. > > 2) We should fall back to using the mdio busses irq for the > device. There are a couple of mdio drivers which do this, e.g. > stmicro/stmmac/stmmac_mdio.c. mdiobus_alloc() ensures it is set to > PHY_POLL, so if the driver does not set it, we poll. > > 3) This is new. We have two choices. Ignore the error and poll. Or > return the error. Historically we have ignored the error. But should > we? I would probably return the error, now that we can. But... The issue itself isn't new, though. I reported it in "of_mdiobus_register_phy() and deferred probe" (https://lkml.org/lkml/2015/10/22/377), and posted a workaround in "[PATCH v2] irqchip/renesas-irqc: Postpone driver initialization" (https://lkml.org/lkml/2016/11/8/794). 
Due to the fallback to polling, so far it was easier to complain when someone broke polling, than to fix the real problem ;-) But when I saw Thomas' patch[*] for of_irq_to_resource(), the time was ripe to tackle the root cause. [*] https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git/commit/?h=dt/next=7a4228bbff769ebf449981a4248616db9f0cffec Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
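The three cases discussed above can be sketched as a small userspace model. This is only an illustration: the function name is hypothetical, the constants are stand-ins for the kernel's EPROBE_DEFER and PHY_POLL, and treating both "no IRQ" and "other error" as a fallback follows the patch as posted, which is exactly the point still under discussion.

```c
/* Stand-ins for kernel constants, defined here only for the sketch. */
#define SKETCH_EPROBE_DEFER 517
#define SKETCH_PHY_POLL     (-1)

/*
 * rc models the value returned by of_irq_get(); bus_irq models
 * mdio->irq[addr]. Returns 0 and stores the IRQ to use in *irq,
 * or a negative error that the caller should propagate.
 */
static int sketch_pick_phy_irq(int rc, int bus_irq, int *irq)
{
	if (rc > 0) {
		/* 1) the device tree gave us an interrupt number */
		*irq = rc;
		return 0;
	}
	if (rc == -SKETCH_EPROBE_DEFER)
		return rc;	/* interrupt controller not ready yet */

	/*
	 * 2) no IRQ described, or 3) some other error: fall back to
	 * the bus-provided value, as the posted patch does.
	 */
	*irq = bus_irq;
	return 0;
}
```

Returning the error in case 3 instead of falling back would be a one-line change in the last branch; the thread leaves that choice open.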
Re: [Patch RFC net-next] net: usb: r8152: Fix rx_bytes/tx_bytes to include FCS
Hi Florian

I agree we should define this, and we can add it to
Documentation/ABI/testing/sysfs-class-net-statistics

> - BQL cares about bytes sent on the wire, so that should not include
> pre/appended descriptors nor the FCS (nor the Ethernet preamble),
> tx_bytes should be equivalent to that

Can you point me at some documentation/code which shows this?

pre/appended descriptors i can understand, since they do not make it to
the wire. FCS does. Preamble and inter-frame gap also make it to the
wire, and contribute to the overall load on the medium. But i would
expect BQL is tolerant to this. We are talking about an error of about
0.26% for a full MTU frame if FCS is included when it should not be.

If BQL really does care about not including the FCS, we probably have
a lot less to do. People should have audited their code when they added
support for BQL :-)

Andrew
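As a quick check of the 0.26% figure, here is a small sketch. The frame sizes are assumptions on my side (1500-byte MTU, 14-byte Ethernet header, 4-byte FCS, no VLAN tag); the mail itself does not spell them out.

```c
/* Assumed sizes in bytes; none of these are taken from the thread. */
#define SKETCH_ETH_HLEN 14
#define SKETCH_FCS_LEN  4

/* Share of the on-wire frame (header + payload + FCS) taken by the FCS. */
static double sketch_fcs_share_percent(unsigned int mtu)
{
	double frame_len = (double)mtu + SKETCH_ETH_HLEN + SKETCH_FCS_LEN;

	return 100.0 * SKETCH_FCS_LEN / frame_len;
}
```

For an MTU of 1500 this is 4 / 1518, a little over a quarter of a percent; the relative error grows for small frames.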
Re: [PATCH net-next] ibmvnic: fix missing unlock on error in __ibmvnic_reset()
On 05/18/2017 10:24 AM, Wei Yongjun wrote:
> From: Wei Yongjun
>
> Add the missing unlock before return from function __ibmvnic_reset()
> in the error handling case.
>
> Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
> Signed-off-by: Wei Yongjun

Reviewed-by: Nathan Fontenot

> ---
>  drivers/net/ethernet/ibm/ibmvnic.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> index 4f2d329..27f7933 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -1313,6 +1313,7 @@ static void __ibmvnic_reset(struct work_struct *work)
>
>  	if (rc) {
>  		free_all_rwi(adapter);
> +		mutex_unlock(&adapter->reset_lock);
>  		return;
>  	}
>
Re: [PATCH v2 net-next] qed: Utilize FW 8.20.0.0
From: Yuval Mintz
Date: Thu, 18 May 2017 19:41:04 +0300

> This pushes qed [and as result, all qed* drivers] into using 8.20.0.0
> firmware. The changes are mostly contained in qed with minor changes
> to qedi due to some HSI changes.
>
> Content-wise, the firmware contains fixes to various issues exposed
> since the release of the previous firmware, including:
>  - Corrects iSCSI fast retransmit when data digest is enabled.
>  - Stop draining packets when receiving several consecutive PFCs.
>  - Prevent possible assertion when consecutively opening/closing
>    many connections.
>  - Prevent possible assertion due to too long BDQ fetch time.
>
> In addition, the new firmware would allow us to later add iWARP support
> in qed and qedr.
>
> Changes from previous version
> -----------------------------
>  - V2: Fix warning in qed_debug.c
>
> Signed-off-by: Chad Dupuis
> Signed-off-by: Ram Amrani
> Signed-off-by: Tomer Tayar
> Signed-off-by: Manish Rangankar
> Signed-off-by: Yuval Mintz

Applied, hopefully this one goes more smoothly.

Thanks.
Re: [PATCH net-next] tcp: fix tcp_rearm_rto()
From: Eric Dumazet
Date: Thu, 18 May 2017 09:15:58 -0700

> From: Eric Dumazet
>
> skbs in (re)transmit queue no longer have a copy of jiffies
> at the time of the transmit : skb->skb_mstamp is now in usec unit,
> with no correlation to tcp_jiffies32.
>
> We have to convert rto from jiffies to usec, compute a time difference
> in usec, then convert the delta to HZ units.
>
> Fixes: 9a568de4818d ("tcp: switch TCP TS option (RFC 7323) to 1ms clock")
> Signed-off-by: Eric Dumazet

Applied, thanks Eric.
Re: [Patch RFC net-next] net: usb: r8152: Fix rx_bytes/tx_bytes to include FCS
On 05/18/2017 08:22 AM, David Miller wrote: > From: Andrew Lunn> Date: Thu, 18 May 2017 17:09:25 +0200 > >> Since these are software counters, they can be consistent. From a >> practical point of view, i doubt they ever will all be consistent, >> there are simply too many drivers to test and change if >> needed. However, for the ones somebody cares about, they can be made >> consistent. >> >> I care about r8152, and would like to make it consistent with asix, >> dsa, e1000e. > > No objection from me for making software counters consistent. > No objection for me as well, but I think we need to agree on what these software counters represent, since there are several cases: - BQL cares about bytes sent on the wire, so that should not include pre/appended descriptors nor the FCS (nor the Ethernet preamble), tx_bytes should be equivalent to that - if we don't include the FCS on transmit, why should we include it on receive? rx_bytes should have the same rules as tx_bytes: no status/descriptor bytes, no FCS etc. -- Florian
Re: [PATCH net-next] tcp: fix tcp_rearm_rto()
On Thu, May 18, 2017 at 12:15 PM, Eric Dumazet wrote:
> From: Eric Dumazet
>
> skbs in (re)transmit queue no longer have a copy of jiffies
> at the time of the transmit : skb->skb_mstamp is now in usec unit,
> with no correlation to tcp_jiffies32.
>
> We have to convert rto from jiffies to usec, compute a time difference
> in usec, then convert the delta to HZ units.
>
> Fixes: 9a568de4818d ("tcp: switch TCP TS option (RFC 7323) to 1ms clock")
> Signed-off-by: Eric Dumazet

Acked-by: Soheil Hassas Yeganeh

Thank you for the quick fix, Eric!
Re: [PATCH] net1080: Mark nc_dump_ttl() as __maybe_unused
Hi David,

On Thu, May 18, 2017 at 10:48:08AM -0400, David Miller wrote:

> From: Matthias Kaehlcke
> Date: Wed, 17 May 2017 15:17:08 -0700
>
> > The function is not used, but it looks useful for debugging. Adding the
> > attribute fixes the following clang warning:
> >
> > drivers/net/usb/net1080.c:271:20: error: unused function
> >   'nc_dump_ttl' [-Werror,-Wunused-function]
> >
> > Signed-off-by: Matthias Kaehlcke
>
> For this and the r8152 patch, I definitely prefer that the function is
> removed.
>
> If someone needs them, they can pull it out of the GIT history.

Thanks for your comments, I'll send out updated patches soon.
Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
On 18/05/17 15:49, Edward Cree wrote:
> Here's one idea that seemed to work when I did a couple of experiments:
> let A = (a;am), B = (b;bm) where the m are the masks
> Σ = am + bm + a + b
> χ = Σ ^ (a + b) /* unknown carries */
> μ = χ | am | bm /* mask of result */
> then A + B = ((a + b) & ~μ; μ)
>
> The idea is that we find which bits change between the case "all x are
> 1" and "all x are 0", and those become xs too.

And now I've found a similar algorithm for subtraction, which (again) I
can't prove but it seems to work.
α = a + am - b
β = a - b - bm
χ = α ^ β
μ = χ | α | β
then A - B = ((a - b) & ~μ; μ)
Again we're effectively finding the max. and min. values, and XORing
them to find unknown carries.

Bitwise operations are easy, of course;
/* By assumption, a & am == b & bm == 0 */
A & B = (a & b; (a | am) & (b | bm) & ~(a & b))
A | B = (a | b; (am | bm) & ~(a | b))
/* It bothers me that & and | aren't symmetric, but I can't fix it */
A ^ B = (a ^ b; am | bm)
as are shifts by a constant (just shift 0s into both number and mask).

Multiplication by a constant can be done by decomposing into shifts and
adds; but it can also be done directly; here we find (a;am) * k.
π = a * k
γ = am * k
then A * k = (π; 0) + (0; γ), for which we use our addition algo.

Multiplication of two unknown values is a nightmare, as unknown bits can
propagate all over the place. We can do a shift-add decomposition where
the adds for unknown bits have all the 1s in the addend replaced with
xs. A few experiments suggest that this works, regardless of the order
of operands. For instance 110x * x01 comes out as either 110x + xx0x = 0x or x0x x01 + x01 = 0x
We can slightly optimise this by handling all the 1 bits in one go; that
is, for (a;am) * (b;bm) we first find (a;am) * b using our
multiplication-by-a-constant algo above, then for each bit in bm we find
(a;am) * bit and force all its nonzero bits to unknown; finally we add
all our components.

Don't even ask about division; that scrambles bits so hard that the only
thing you can say for sure is that the leading 0s in the numerator stay 0
in the result. The only exception is divisions by a constant which can
be converted into a shift, or divisions of a constant by another
constant; if the numerator has any xs and the denominator has more than
one 1, everything to the right of the first x is totally unknown in
general.

-Ed
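To make the (value; mask) arithmetic above easier to experiment with, here is a userspace sketch of the add, subtract and bitwise rules. All names are hypothetical and this is not the verifier's code. Two points deviate from the text as written and are assumptions on my side: the subtraction ORs χ with am | bm (mirroring the addition case) rather than with α | β, and the XOR result value is masked with ~μ so that value & mask == 0 still holds.

```c
#include <stdint.h>

/* value: bits known to be 1; mask: unknown bits. Invariant: value & mask == 0. */
struct sketch_tnum {
	uint64_t value;
	uint64_t mask;
};

static struct sketch_tnum sketch_add(struct sketch_tnum a, struct sketch_tnum b)
{
	uint64_t sv = a.value + b.value;       /* sum with every x taken as 0 */
	uint64_t sigma = a.mask + b.mask + sv; /* sum with every x taken as 1 */
	uint64_t chi = sigma ^ sv;             /* bits hit by unknown carries */
	uint64_t mu = chi | a.mask | b.mask;

	return (struct sketch_tnum){ sv & ~mu, mu };
}

static struct sketch_tnum sketch_sub(struct sketch_tnum a, struct sketch_tnum b)
{
	uint64_t dv = a.value - b.value;
	uint64_t alpha = dv + a.mask;          /* largest possible difference */
	uint64_t beta = dv - b.mask;           /* smallest possible difference */
	uint64_t chi = alpha ^ beta;
	uint64_t mu = chi | a.mask | b.mask;   /* assumption, see lead-in */

	return (struct sketch_tnum){ dv & ~mu, mu };
}

static struct sketch_tnum sketch_and(struct sketch_tnum a, struct sketch_tnum b)
{
	uint64_t v = a.value & b.value;

	return (struct sketch_tnum){ v, (a.value | a.mask) & (b.value | b.mask) & ~v };
}

static struct sketch_tnum sketch_or(struct sketch_tnum a, struct sketch_tnum b)
{
	uint64_t v = a.value | b.value;

	return (struct sketch_tnum){ v, (a.mask | b.mask) & ~v };
}

static struct sketch_tnum sketch_xor(struct sketch_tnum a, struct sketch_tnum b)
{
	uint64_t mu = a.mask | b.mask;

	/* masking with ~mu keeps the invariant; assumption, see lead-in */
	return (struct sketch_tnum){ (a.value ^ b.value) & ~mu, mu };
}
```

As a usage example, A = (4;1) stands for binary 10x, i.e. {4, 5}. Adding the constant (2;0) should give {6, 7}, which is (6;1), and subtracting it should give {2, 3}, which is (2;1).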
Re: [PATCH 2/2] sh_eth: Do not print an error message for probe deferral
On 05/18/2017 04:01 PM, Geert Uytterhoeven wrote:

> EPROBE_DEFER is not an error, hence printing an error message like
>
>     sh-eth ee70.ethernet: failed to initialise MDIO
>
> may confuse the user.
>
> To fix this, suppress the error message in case of probe deferral.
> While at it, shorten the message, and add the actual error code.
>
> Signed-off-by: Geert Uytterhoeven

Acked-by: Sergei Shtylyov

MBR, Sergei
Re: [PATCH] of_mdio: Fix broken PHY IRQ in case of probe deferral
> >> phy = get_phy_device(mdio, addr, is_c45); > >> if (IS_ERR(phy)) > >> - return; > >> + return PTR_ERR(phy); > >> > >> - rc = irq_of_parse_and_map(child, 0); > >> + rc = of_irq_get(child, 0); > >> + if (rc == -EPROBE_DEFER) { > >> + phy_device_free(phy); > >> + return rc; > >> + } > > > > Maybe this should be consistent. All other places there is an error, > > you return it. Here however, you only return the error if it is > > EPROBE_DEFER. > > That's because of the "else" branch in the code below: > > if (rc > 0) { > phy->irq = rc; > mdio->irq[addr] = rc; > } else { > phy->irq = mdio->irq[addr]; > } > > cfr. the marked part of the patch description. > I didn't want to change that behavior, as it's not clear to me why it's > handled > that way. So there seems to be 3 conditions that need handling: 1) of_irq_get() gives us an interrupt number. 2) of_irq_get() indicates there is no irq in the device tree. 3) of_irq_get() indicates a real error 1) We have. 2) We should fall back to using the mdio busses irq for the device. There are a couple of mdio drivers which do this, e.g. stmicro/stmmac/stmmac_mdio.c. mdiobus_alloc() ensures it is set to PHY_POLL, so if the driver does not set it, we poll. 3) This is new. We have two choices. Ignore the error and poll. Or return the error. Historically we have ignored the error. But should we? I would probably return the error, now that we can. But... Florian? Andrew
Re: [PATCH 1/2] sh_eth: Use platform device for printing before register_netdev()
On 05/18/2017 04:01 PM, Geert Uytterhoeven wrote:

> The MDIO initialization failure message is printed using the network
> device, before it has been registered, leading to:
>
>     (null): failed to initialise MDIO
>
> Use the platform device instead to fix this:
>
>     sh-eth ee70.ethernet: failed to initialise MDIO
>
> Fixes: daacf03f0bbfefee ("sh_eth: Register MDIO bus before registering the network device")
> Signed-off-by: Geert Uytterhoeven

Acked-by: Sergei Shtylyov

MBR, Sergei
[PATCH net-next] net: sched: provide stubs for tcf_chain_{get,put} for CONFIG_NET_CLS=n
This also changes tcf_chain_get() to return an error pointer instead of NULL, so that tcf_action_goto_chain_init() can differentiate memory allocation failure from lack of support. Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters") Signed-off-by: Sabrina Dubroca--- I'm not sure this EOPNOTSUPP is really necessary, ie if we can really reach the tcf_action_goto_chain_init() call when CONFIG_NET_CLS=n. If not, a simpler patch would add a tcf_chain_get() stub that just returns NULL, as we wouldn't have to care about returning an incorrect error code from tcf_action_goto_chain_init(). include/net/pkt_cls.h | 7 +++ net/sched/act_api.c | 10 +++--- net/sched/cls_api.c | 10 +- 3 files changed, 19 insertions(+), 8 deletions(-) diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index 2c213a69c196..ad0d2899529f 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -27,6 +27,13 @@ int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp, struct tcf_result *res, bool compat_mode); #else +static inline struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index) +{ + return ERR_PTR(-EOPNOTSUPP); +} +static inline void tcf_chain_put(struct tcf_chain *chain) +{ +} static inline int tcf_block_get(struct tcf_block **p_block, struct tcf_proto __rcu **p_filter_chain) diff --git a/net/sched/act_api.c b/net/sched/act_api.c index 0ecf2a858767..502e0bbf35a6 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -31,12 +31,16 @@ static int tcf_action_goto_chain_init(struct tc_action *a, struct tcf_proto *tp) { u32 chain_index = a->tcfa_action & TC_ACT_EXT_VAL_MASK; + struct tcf_chain *chain; if (!tp) return -EINVAL; - a->goto_chain = tcf_chain_get(tp->chain->block, chain_index); - if (!a->goto_chain) - return -ENOMEM; + + chain = tcf_chain_get(tp->chain->block, chain_index); + if (IS_ERR(chain)) + return PTR_ERR(chain); + + a->goto_chain = chain; return 0; } diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 
4020b8d932a1..8c14af3b77ae 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -193,7 +193,7 @@ static struct tcf_chain *tcf_chain_create(struct tcf_block *block, chain = kzalloc(sizeof(*chain), GFP_KERNEL); if (!chain) - return NULL; + return ERR_PTR(-ENOMEM); list_add_tail(>list, >chain_list); chain->block = block; chain->index = chain_index; @@ -256,8 +256,8 @@ int tcf_block_get(struct tcf_block **p_block, INIT_LIST_HEAD(>chain_list); /* Create chain 0 by default, it has to be always present. */ chain = tcf_chain_create(block, 0); - if (!chain) { - err = -ENOMEM; + if (IS_ERR(chain)) { + err = PTR_ERR(chain); goto err_chain_create; } tcf_chain_filter_chain_ptr_set(chain, p_filter_chain); @@ -503,8 +503,8 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n, goto errout; } chain = tcf_chain_get(block, chain_index); - if (!chain) { - err = -ENOMEM; + if (IS_ERR(chain)) { + err = PTR_ERR(chain); goto errout; } -- 2.13.0
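For readers less familiar with the error-pointer convention this patch switches to, here is a userspace sketch of the idea. Every sketch_* name is hypothetical; the three helpers only imitate the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), and the "supported" parameter stands in for CONFIG_NET_CLS being enabled.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value. */
static void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Recover the errno from an error pointer. */
static long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if ptr falls in the top SKETCH_MAX_ERRNO addresses. */
static int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct sketch_chain {
	unsigned int index;
};

/* Returns a chain, or an error pointer saying why there is none. */
static struct sketch_chain *sketch_chain_get(int supported, unsigned int index)
{
	struct sketch_chain *chain;

	if (!supported)
		return sketch_err_ptr(-EOPNOTSUPP);

	chain = calloc(1, sizeof(*chain));
	if (!chain)
		return sketch_err_ptr(-ENOMEM);
	chain->index = index;
	return chain;
}

/* Caller side: propagate whichever error the getter reported. */
static int sketch_goto_chain_init(int supported, unsigned int index,
				  struct sketch_chain **out)
{
	struct sketch_chain *chain = sketch_chain_get(supported, index);

	if (sketch_is_err(chain))
		return (int)sketch_ptr_err(chain);
	*out = chain;
	return 0;
}
```

With a plain NULL return the caller can only guess at the cause, which is why the original code hard-coded -ENOMEM; with an error pointer the "not supported" and "out of memory" cases stay distinguishable.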
[PATCH net-next] tcp: fix tcp_rearm_rto()
From: Eric Dumazet

skbs in (re)transmit queue no longer have a copy of jiffies
at the time of the transmit : skb->skb_mstamp is now in usec unit,
with no correlation to tcp_jiffies32.

We have to convert rto from jiffies to usec, compute a time difference
in usec, then convert the delta to HZ units.

Fixes: 9a568de4818d ("tcp: switch TCP TS option (RFC 7323) to 1ms clock")
Signed-off-by: Eric Dumazet
---
 net/ipv4/tcp_input.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9a5a9e8eda899666501cca06b37948ab64ae79b2..6db6b47e2bbc09aae2627a109e5a1ee9a3f4fe4e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3002,14 +3002,14 @@ void tcp_rearm_rto(struct sock *sk)
 		if (icsk->icsk_pending == ICSK_TIME_REO_TIMEOUT ||
 		    icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) {
 			struct sk_buff *skb = tcp_write_queue_head(sk);
-			const u32 rto_time_stamp =
-				tcp_skb_timestamp(skb) + rto;
-			s32 delta = (s32)(rto_time_stamp - tcp_jiffies32);
-			/* delta may not be positive if the socket is locked
+			u64 rto_time_stamp = skb->skb_mstamp +
+					     jiffies_to_usecs(rto);
+			s64 delta_us = rto_time_stamp - tp->tcp_mstamp;
+			/* delta_us may not be positive if the socket is locked
 			 * when the retrans timer fires and is rescheduled.
 			 */
-			if (delta > 0)
-				rto = delta;
+			if (delta_us > 0)
+				rto = usecs_to_jiffies(delta_us);
 		}
 		inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, rto,
 					  TCP_RTO_MAX);
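The unit conversion described in the changelog can be modelled in userspace as follows. This is a sketch under stated assumptions: HZ is fixed at 250 for the example, the sketch_* helpers are hypothetical stand-ins for jiffies_to_usecs() and usecs_to_jiffies(), and rounding the result up to a whole jiffy is my assumption about the intended behaviour.

```c
#include <stdint.h>

#define SKETCH_HZ             250	/* assumed tick rate for the example */
#define SKETCH_USEC_PER_JIFFY (1000000 / SKETCH_HZ)

static uint64_t sketch_jiffies_to_usecs(uint32_t j)
{
	return (uint64_t)j * SKETCH_USEC_PER_JIFFY;
}

/* Rounds up so that a short positive delta never becomes a zero timeout. */
static uint32_t sketch_usecs_to_jiffies(uint64_t us)
{
	return (uint32_t)((us + SKETCH_USEC_PER_JIFFY - 1) / SKETCH_USEC_PER_JIFFY);
}

/*
 * skb_sent_us models skb->skb_mstamp, now_us models tp->tcp_mstamp,
 * rto is in jiffies. Returns the timeout to arm, in jiffies.
 */
static uint32_t sketch_rearm_rto(uint64_t skb_sent_us, uint64_t now_us,
				 uint32_t rto)
{
	uint64_t deadline_us = skb_sent_us + sketch_jiffies_to_usecs(rto);
	int64_t delta_us = (int64_t)(deadline_us - now_us);

	/* delta_us may be zero or negative if the timer already expired */
	if (delta_us > 0)
		rto = sketch_usecs_to_jiffies((uint64_t)delta_us);
	return rto;
}
```

For example, with an RTO of 50 jiffies (200000 us at the assumed HZ), a packet sent at 1000000 us and the clock now at 1150000 us leaves 50000 us, i.e. 12.5 jiffies, which rounds up to 13.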
Re: [PATCH] of_mdio: Fix broken PHY IRQ in case of probe deferral
Hi Andrew, On Thu, May 18, 2017 at 6:09 PM, Andrew Lunnwrote: > On Thu, May 18, 2017 at 02:59:05PM +0200, Geert Uytterhoeven wrote: >> If an Ethernet PHY is initialized before the interrupt controller it is >> connected to, a message like the following is printed: >> >> irq: no irq domain found for /interrupt-controller@e61c ! >> >> However, the actual error is ignored, leading to a non-functional (-1) >> PHY interrupt later: >> >> Micrel KSZ8041RNLI ee70.ethernet-:01: attached PHY driver >> [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee70.ethernet-:01, irq=-1) >> >> Depending on whether the PHY driver will fall back to polling, Ethernet >> may or may not work. >> >> To fix this: >> 1. Switch of_mdiobus_register_phy() from irq_of_parse_and_map() to >> of_irq_get(). >> Unlike the former, the latter returns -EPROBE_DEFER if the >> interrupt controller is not yet available, so this condition can be >> detected. >> Other errors are handled the same as before, i.e. use the passed >> mdio->irq[addr] as interrupt. ^ >> 2. Propagate and handle errors from of_mdiobus_register_phy() and >> of_mdiobus_register_device(). >> >> Signed-off-by: Geert Uytterhoeven >> --- >> Seen on r8a7791/koelsch when using the new CPG/MSSR clock driver. >> I assume it always happened on RZ/G1 in mainline. 
>> --- >> drivers/of/of_mdio.c | 39 +++ >> 1 file changed, 27 insertions(+), 12 deletions(-) >> >> diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c >> index 7e4c80f9b6cda0d3..f9ac2893f56184be 100644 >> --- a/drivers/of/of_mdio.c >> +++ b/drivers/of/of_mdio.c >> @@ -44,7 +44,7 @@ static int of_get_phy_id(struct device_node *device, u32 >> *phy_id) >> return -EINVAL; >> } >> >> -static void of_mdiobus_register_phy(struct mii_bus *mdio, >> +static int of_mdiobus_register_phy(struct mii_bus *mdio, >> struct device_node *child, u32 addr) >> { >> struct phy_device *phy; >> @@ -60,9 +60,13 @@ static void of_mdiobus_register_phy(struct mii_bus *mdio, >> else >> phy = get_phy_device(mdio, addr, is_c45); >> if (IS_ERR(phy)) >> - return; >> + return PTR_ERR(phy); >> >> - rc = irq_of_parse_and_map(child, 0); >> + rc = of_irq_get(child, 0); >> + if (rc == -EPROBE_DEFER) { >> + phy_device_free(phy); >> + return rc; >> + } > > Maybe this should be consistent. All other places there is an error, > you return it. Here however, you only return the error if it is > EPROBE_DEFER. That's because of the "else" branch in the code below: if (rc > 0) { phy->irq = rc; mdio->irq[addr] = rc; } else { phy->irq = mdio->irq[addr]; } cfr. the marked part of the patch description. I didn't want to change that behavior, as it's not clear to me why it's handled that way. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [PATCH] of_mdio: Fix broken PHY IRQ in case of probe deferral
On Thu, May 18, 2017 at 02:59:05PM +0200, Geert Uytterhoeven wrote: > If an Ethernet PHY is initialized before the interrupt controller it is > connected to, a message like the following is printed: > > irq: no irq domain found for /interrupt-controller@e61c ! > > However, the actual error is ignored, leading to a non-functional (-1) > PHY interrupt later: > > Micrel KSZ8041RNLI ee70.ethernet-:01: attached PHY driver > [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee70.ethernet-:01, irq=-1) > > Depending on whether the PHY driver will fall back to polling, Ethernet > may or may not work. > > To fix this: > 1. Switch of_mdiobus_register_phy() from irq_of_parse_and_map() to > of_irq_get(). > Unlike the former, the latter returns -EPROBE_DEFER if the > interrupt controller is not yet available, so this condition can be > detected. > Other errors are handled the same as before, i.e. use the passed > mdio->irq[addr] as interrupt. > 2. Propagate and handle errors from of_mdiobus_register_phy() and > of_mdiobus_register_device(). > > Signed-off-by: Geert Uytterhoeven> --- > Seen on r8a7791/koelsch when using the new CPG/MSSR clock driver. > I assume it always happened on RZ/G1 in mainline. 
> --- > drivers/of/of_mdio.c | 39 +++ > 1 file changed, 27 insertions(+), 12 deletions(-) > > diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c > index 7e4c80f9b6cda0d3..f9ac2893f56184be 100644 > --- a/drivers/of/of_mdio.c > +++ b/drivers/of/of_mdio.c > @@ -44,7 +44,7 @@ static int of_get_phy_id(struct device_node *device, u32 > *phy_id) > return -EINVAL; > } > > -static void of_mdiobus_register_phy(struct mii_bus *mdio, > +static int of_mdiobus_register_phy(struct mii_bus *mdio, > struct device_node *child, u32 addr) > { > struct phy_device *phy; > @@ -60,9 +60,13 @@ static void of_mdiobus_register_phy(struct mii_bus *mdio, > else > phy = get_phy_device(mdio, addr, is_c45); > if (IS_ERR(phy)) > - return; > + return PTR_ERR(phy); > > - rc = irq_of_parse_and_map(child, 0); > + rc = of_irq_get(child, 0); > + if (rc == -EPROBE_DEFER) { > + phy_device_free(phy); > + return rc; > + } Maybe this should be consistent. All other places there is an error, you return it. Here however, you only return the error if it is EPROBE_DEFER. Andrew > if (rc > 0) { > phy->irq = rc; > mdio->irq[addr] = rc; > @@ -84,22 +88,23 @@ static void of_mdiobus_register_phy(struct mii_bus *mdio, > if (rc) { > phy_device_free(phy); > of_node_put(child); > - return; > + return rc; > } > > dev_dbg(>dev, "registered phy %s at address %i\n", > child->name, addr); > + return 0; > } >
Re: [PATCH] xfrm: fix state migration replay sequence numbers
On 2017-05-18 16:39, Antony Antony wrote: > During xfrm migration replay and preplay sequence numbers are not > copied from the previous state. > > Here is tcpdump output showing the problem. > 10.0.10.46 is running vanilla kernel, IKE/IPsec responder. > After the migration it sent wrong sequence number, reset to 1. > The migration is from 10.0.0.52 to 10.0.0.53. > > IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: > ESP(spi=0x43ef462d,seq=0x7cf), length 136 > IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: > ESP(spi=0xca1c282d,seq=0x7cf), length 136 > IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: > ESP(spi=0x43ef462d,seq=0x7d0), length 136 > IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: > ESP(spi=0xca1c282d,seq=0x7d0), length 136 > > IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa inf2[I] > IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa inf2[R] > IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa inf2[I] > IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa inf2[R] > > IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: > ESP(spi=0x43ef462d,seq=0x7d1), length 136 > > NOTE: next sequence is wrong 0x1 > > IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x1), > length 136 > IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: > ESP(spi=0x43ef462d,seq=0x7d2), length 136 > IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x2), > length 136 > > The attached patch fix it by copying replay and preplay. 
> > regards, > -antony > > Antony Antony (1): > xfrm: fix state migration replay sequence numbers > > net/xfrm/xfrm_state.c | 2 ++ > 1 file changed, 2 insertions(+) > > -- > 2.9.3 > > >From 1241e8b4c38ad2bf7399599165f763af38aba8d9 Mon Sep 17 00:00:00 2001 > From: Antony Antony> Date: Thu, 18 May 2017 12:19:32 +0200 > Subject: [PATCH] xfrm: fix state migration copy replay sequence numbers > To: netdev@vger.kernel.org, Herbert Xu , Steffen > Klassert > Cc: Richard Guy Briggs > > During xfrm migration copy replay and preplay sequence numbers > from the previous state. > > Signed-off-by: Antony Antony > --- > net/xfrm/xfrm_state.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c > index fc3c5aa..2e291bc 100644 > --- a/net/xfrm/xfrm_state.c > +++ b/net/xfrm/xfrm_state.c > @@ -1383,6 +1383,8 @@ static struct xfrm_state *xfrm_state_clone(struct > xfrm_state *orig) > x->curlft.add_time = orig->curlft.add_time; > x->km.state = orig->km.state; > x->km.seq = orig->km.seq; > + x->replay = orig->replay; > + x->preplay = orig->preplay; > > return x; > > -- > 2.9.3 This looks reasonable to me. With a bit more out-of-band information from Antony and Paul Wouters we have: https://tools.ietf.org/html/rfc4555#section-3.5 so while it is not explicit about what is to be copied, it only indicates that the IPsec SA is to be updated with the new address whereas this implementation creates a new IPsec SA and copies over the values, missing some. (Note: using "git format-patch --cover-letter --cc ... -o " and "git send-email --to ... " work really well together.) Reviewed-by: Richard Guy Briggs slainte mhath, RGB -- Richard Guy Briggs -- ~\-- ~\ -- \___ o \@ @Ride yer bike! Ottawa, ON, CANADA -- Lo_>__M__\\/\%__\\/\% Vote! -- _GTVS6#790__(*)__(*)(*)(*)_
Re: [PATCH net-next] net/mlx5e: Fix possible memory leak
On Thu, May 18, 2017 at 03:34:41PM +, Wei Yongjun wrote: > From: Wei Yongjun> > 'encap_header' is malloced and should be freed before leaving from > the error handling cases, otherwise it will cause memory leak. > > Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update flow") > Signed-off-by: Wei Yongjun > --- > drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c > b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c > index 11c27e4..a72ecbc 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c > @@ -1404,8 +1404,8 @@ static int mlx5e_create_encap_header_ipv4(struct > mlx5e_priv *priv, > > if (!(nud_state & NUD_VALID)) { > neigh_event_send(n, NULL); > - neigh_release(n); > - return -EAGAIN; > + err = -EAGAIN; > + goto out; > } > > err = mlx5_encap_alloc(priv->mdev, e->tunnel_type, > @@ -1510,8 +1510,8 @@ static int mlx5e_create_encap_header_ipv6(struct > mlx5e_priv *priv, > > if (!(nud_state & NUD_VALID)) { > neigh_event_send(n, NULL); > - neigh_release(n); > - return -EAGAIN; > + err = -EAGAIN; > + goto out; > } Reviewed-by: Yuval Shaia > > err = mlx5_encap_alloc(priv->mdev, e->tunnel_type, > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next] xfrm: Make function xfrm_dev_register static
From: Wei Yongjun

Fixes the following sparse warning:

net/xfrm/xfrm_device.c:141:5: warning: symbol 'xfrm_dev_register' was not declared. Should it be static?

Signed-off-by: Wei Yongjun
---
 net/xfrm/xfrm_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 8ec8a3f..50ec733 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -138,7 +138,7 @@ bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
 }
 EXPORT_SYMBOL_GPL(xfrm_dev_offload_ok);
 
-int xfrm_dev_register(struct net_device *dev)
+static int xfrm_dev_register(struct net_device *dev)
 {
 	if ((dev->features & NETIF_F_HW_ESP) && !dev->xfrmdev_ops)
 		return NOTIFY_BAD;
Re: [PATCH net-next 3/6] net: bridge: break if __br_mdb_del fails
Hi Nikolay,

Nikolay Aleksandrov writes:

>> OK good to know. That intention wasn't obvious. I can make __br_mdb_del
>> return void instead? What about the rest of the patchset if I do so?
>
> If you make it return void we will not be able to return proper error value
> when doing a single operation (the else case). About the rest I see only some
> minor style issues, I'll comment on the respective patches. Another minor nit
> is using switch() instead of if/else for the message types but that is really
> up to you, I don't mind either way. :-)

Ho OK I understand better the batch vs single delete operation now.

__br_mdb_do hardly makes sense now, because we don't know which case we
are handling... But factorizing br_mdb_do still makes sense. I'll come
up with something.

Thanks,

        Vivien
[RFC net-next PATCH 5/5] mlx5: add XDP rxhash feature for driver mlx5
--- drivers/net/ethernet/mellanox/mlx5/core/en_main.c |3 + drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 98 ++--- 2 files changed, 70 insertions(+), 31 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index e43411d232ee..3ae90dbdd3de 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -3956,6 +3956,9 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev) netdev->hw_features |= NETIF_F_HW_VLAN_CTAG_RX; netdev->hw_features |= NETIF_F_HW_VLAN_CTAG_FILTER; + /* XDP_DRV_F_ENABLED is added in register_netdevice() */ + netdev->xdp_features = XDP_DRV_F_RXHASH; + if (mlx5e_vxlan_allowed(mdev)) { netdev->hw_features |= NETIF_F_GSO_UDP_TUNNEL | NETIF_F_GSO_UDP_TUNNEL_CSUM | diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c index ae66fad98244..eb9d859bf09d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c @@ -514,14 +514,28 @@ static void mlx5e_lro_update_hdr(struct sk_buff *skb, struct mlx5_cqe64 *cqe, } } -static inline void mlx5e_skb_set_hash(struct mlx5_cqe64 *cqe, - struct sk_buff *skb) +u8 mlx5_htype_l3_to_xdp[4] = { + 0, /* 00 - none */ + XDP_HASH_TYPE_L3_IPV4, /* 01 - IPv4 */ + XDP_HASH_TYPE_L3_IPV6, /* 10 - IPv6 */ + 0, /* 11 - Reserved */ +}; + +u8 mlx5_htype_l4_to_xdp[4] = { + 0, /* 00 - none */ + XDP_HASH_TYPE_L4_TCP, /* 01 - TCP */ + XDP_HASH_TYPE_L4_UDP, /* 10 - UDP */ + 0, /* 11 - IPSEC.SPI */ +}; + +static inline void mlx5e_xdp_set_hash(struct mlx5_cqe64 *cqe, + struct xdp_buff *xdp) { u8 cht = cqe->rss_hash_type; - int ht = (cht & CQE_RSS_HTYPE_L4) ? PKT_HASH_TYPE_L4 : -(cht & CQE_RSS_HTYPE_IP) ? 
PKT_HASH_TYPE_L3 : - PKT_HASH_TYPE_NONE; - skb_set_hash(skb, be32_to_cpu(cqe->rss_hash_result), ht); + u32 ht = (mlx5_htype_l4_to_xdp[((cht & CQE_RSS_HTYPE_L4) >> 6)] | \ + mlx5_htype_l3_to_xdp[((cht & CQE_RSS_HTYPE_IP) >> 2)]); + + xdp_record_hash(xdp, be32_to_cpu(cqe->rss_hash_result), ht); } static inline bool is_first_ethertype_ip(struct sk_buff *skb) @@ -570,7 +584,8 @@ static inline void mlx5e_handle_csum(struct net_device *netdev, static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe, u32 cqe_bcnt, struct mlx5e_rq *rq, - struct sk_buff *skb) + struct sk_buff *skb, + struct xdp_buff *xdp) { struct net_device *netdev = rq->netdev; struct mlx5e_tstamp *tstamp = rq->tstamp; @@ -593,8 +608,7 @@ static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe, skb_record_rx_queue(skb, rq->ix); - if (likely(netdev->features & NETIF_F_RXHASH)) - mlx5e_skb_set_hash(cqe, skb); + xdp_set_skb_hash(xdp, skb); if (cqe_has_vlan(cqe)) __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), @@ -609,11 +623,12 @@ static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe, static inline void mlx5e_complete_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe, u32 cqe_bcnt, -struct sk_buff *skb) +struct sk_buff *skb, +struct xdp_buff *xdp) { rq->stats.packets++; rq->stats.bytes += cqe_bcnt; - mlx5e_build_rx_skb(cqe, cqe_bcnt, rq, skb); + mlx5e_build_rx_skb(cqe, cqe_bcnt, rq, skb, xdp); } static inline void mlx5e_xmit_xdp_doorbell(struct mlx5e_xdpsq *sq) @@ -696,27 +711,27 @@ static inline bool mlx5e_xmit_xdp_frame(struct mlx5e_rq *rq, /* returns true if packet was consumed by xdp */ static inline int mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di, - void *va, u16 *rx_headroom, u32 *len) + struct xdp_buff *xdp, void *va, + u16 *rx_headroom, u32 *len) { const struct bpf_prog *prog = READ_ONCE(rq->xdp_prog); - struct xdp_buff xdp; u32 act; if (!prog) return false; - xdp.data = va + *rx_headroom; - xdp.data_end = xdp.data + *len; - xdp.data_hard_start = va; +
[RFC net-next PATCH 4/5] net: new XDP feature for reading HW rxhash from drivers
Introducing a new XDP feature and associated bpf helper bpf_xdp_rxhash. The rxhash and type allow filtering on packets without touching packet memory. The performance difference on my system with a 100 Gbit/s mlx5 NIC is 12Mpps to 19Mpps. TODO: desc RXHASH and associated type, and how XDP choose to map and export these to bpf_prog's. TODO: desc how this interacts with XDP driver features system. --- include/linux/filter.h | 31 - include/linux/netdev_features.h |4 ++ include/uapi/linux/bpf.h| 56 +- kernel/bpf/verifier.c |3 ++ net/core/dev.c | 16 - net/core/filter.c | 73 +++ samples/bpf/bpf_helpers.h |2 + tools/include/uapi/linux/bpf.h | 10 + 8 files changed, 190 insertions(+), 5 deletions(-) diff --git a/include/linux/filter.h b/include/linux/filter.h index 9a7786db14fa..33a254ccd47d 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -413,7 +413,8 @@ struct bpf_prog { locked:1, /* Program image locked? */ gpl_compatible:1, /* Is filter GPL compatible? */ cb_access:1,/* Is control block accessed? */ - dst_needed:1; /* Do we need dst entry? */ + dst_needed:1, /* Do we need dst entry? 
*/ + xdp_rxhash_needed:1;/* Req XDP RXHASH support */ kmemcheck_bitfield_end(meta); enum bpf_prog_type type; /* Type of BPF program */ u32 len;/* Number of filter blocks */ @@ -444,12 +445,40 @@ struct bpf_skb_data_end { void *data_end; }; +/* Kernel internal xdp_buff->flags */ +#define XDP_CTX_F_RXHASH_TYPE_MASK XDP_HASH_TYPE_MASK +#define XDP_CTX_F_RXHASH_TYPE_BITS XDP_HASH_TYPE_BITS +#define XDP_CTX_F_RXHASH_SW(1ULL << XDP_CTX_F_RXHASH_TYPE_BITS) +#define XDP_CTX_F_RXHASH_HW(1ULL << (XDP_CTX_F_RXHASH_TYPE_BITS+1)) + struct xdp_buff { void *data; void *data_end; void *data_hard_start; + u64 flags; + u32 rxhash; }; +/* helper functions for driver setting rxhash */ +static inline void +xdp_record_hash(struct xdp_buff *xdp, u32 hash, u32 type) +{ + xdp->flags |= XDP_CTX_F_RXHASH_HW; + xdp->flags |= type & XDP_CTX_F_RXHASH_TYPE_MASK; + xdp->rxhash = hash; +} + +static inline void +xdp_set_skb_hash(struct xdp_buff *xdp, struct sk_buff *skb) +{ + if (likely(xdp->flags & (XDP_CTX_F_RXHASH_HW|XDP_CTX_F_RXHASH_SW))) { + bool is_sw = !!(xdp->flags | XDP_CTX_F_RXHASH_SW); + bool is_l4 = !!(xdp->flags & XDP_HASH_TYPE_L4_MASK); + + __skb_set_hash(skb, xdp->rxhash, is_sw, is_l4); + } +} + /* compute the linear packet data range [data, data_end) which * will be accessed by cls_bpf, act_bpf and lwt programs */ diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h index ff81ee231410..4b50e8c606c5 100644 --- a/include/linux/netdev_features.h +++ b/include/linux/netdev_features.h @@ -219,11 +219,13 @@ enum { /* XDP driver flags */ enum { XDP_DRV_F_ENABLED_BIT, + XDP_DRV_F_RXHASH_BIT, }; #define __XDP_DRV_F_BIT(bit) ((netdev_features_t)1 << (bit)) #define __XDP_DRV_F(name) __XDP_DRV_F_BIT(XDP_DRV_F_##name##_BIT) #define XDP_DRV_F_ENABLED __XDP_DRV_F(ENABLED) +#define XDP_DRV_F_RXHASH __XDP_DRV_F(RXHASH) /* XDP driver MUST support these features, else kernel MUST reject * bpf_prog to guarantee safe access to data structures @@ -233,7 +235,7 @@ enum { /* Some 
XDP features are under development. Based on bpf_prog loading * detect if kernel feature can be activated. */ -#define XDP_DRV_FEATURES_DEVEL 0 +#define XDP_DRV_FEATURES_DEVEL XDP_DRV_F_RXHASH /* Some XDP features are optional, like action return code, as they * are handled safely runtime. diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 945a1f5f63c5..1d9d3a46217d 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -482,6 +482,9 @@ union bpf_attr { * Get the owner uid of the socket stored inside sk_buff. * @skb: pointer to skb * Return: uid of the socket owner on success or overflowuid if failed. + * + * u64 bpf_xdp_rxhash(xdp_md, new_hash, type, flags) + * TODO: MISSING DESC */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -531,7 +534,8 @@ union bpf_attr { FN(xdp_adjust_head),\ FN(probe_read_str), \ FN(get_socket_cookie), \ - FN(get_socket_uid), + FN(get_socket_uid), \ + FN(xdp_rxhash), /* integer value in 'imm' field of BPF_CALL instruction selects which helper * function
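A note on the flag handling in the helpers quoted above, as a minimal userspace sketch. The bit layout and every name below (HASH_TYPE_BITS, struct buf_meta, record_hw_hash, hash_is_sw) are assumptions made for illustration; the real XDP_HASH_TYPE_* definitions are not visible in the truncated patch. The point is only that testing whether a flag is set needs a bitwise AND. The quoted xdp_set_skb_hash() computes is_sw with `xdp->flags | XDP_CTX_F_RXHASH_SW`, which is non-zero for any input, so it looks as if `&` may have been intended there.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout for illustration only: the low 8 bits of flags carry the
 * hash type, the next two bits say where the hash came from.
 */
#define HASH_TYPE_BITS	8
#define HASH_TYPE_MASK	((1ULL << HASH_TYPE_BITS) - 1)
#define F_RXHASH_SW	(1ULL << HASH_TYPE_BITS)
#define F_RXHASH_HW	(1ULL << (HASH_TYPE_BITS + 1))

/* Hypothetical stand-in for the two fields the patch adds to struct xdp_buff. */
struct buf_meta {
	uint64_t flags;
	uint32_t rxhash;
};

/* Driver side: store a hardware hash together with its type bits. */
static void record_hw_hash(struct buf_meta *m, uint32_t hash, uint32_t type)
{
	m->flags |= F_RXHASH_HW;
	m->flags |= type & HASH_TYPE_MASK;
	m->rxhash = hash;
}

/* Consumer side: a flag test uses '&', so this is false for a HW hash. */
static bool hash_is_sw(const struct buf_meta *m)
{
	return (m->flags & F_RXHASH_SW) != 0;
}

/* Same test written with '|': the result is true for every input. */
static bool hash_is_sw_with_or(const struct buf_meta *m)
{
	return (m->flags | F_RXHASH_SW) != 0;
}
```

With a hash recorded through record_hw_hash(), hash_is_sw() reports false while the `|` variant reports true, which is the difference a reviewer would want to check in the RFC.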
[RFC net-next PATCH 3/5] net: introduce XDP driver features interface
There is a fundamental difference between normal eBPF programs and (XDP) eBPF programs getting attached in a driver. For normal eBPF programs it is easy to add a new bpf feature, like a bpf helper, because it is strongly tied to the feature being available in the current core kernel code. When drivers invoke a bpf_prog, it is not sufficient to simply rely on whether a bpf_helper exists or not. When a driver hasn't implemented a given feature yet, it is possible to expose uninitialized parts of xdp_buff. The driver passes in a pointer to xdp_buff, usually "allocated" on the stack, which must not be exposed.

Only two user visible NETIF_F_XDP_* net_device feature flags are exposed via ethtool (-k), seen as "xdp" and "xdp-partial". The "xdp-partial" is detected when there is not feature equality between kernel and driver, and a netdev_warn is given.

The idea is that XDP_DRV_* feature bits define a contract between the driver and the kernel, giving a reliable way to know which XDP features a driver promised to implement. Thus, knowing what bpf side features are safe to allow.

There are 3 levels of features: "required", "devel" and "optional".

The motivation is pushing driver vendors forward to support all the new XDP features. Once a given feature bit is moved into the "required" features, the kernel will reject loading an XDP program if the feature isn't implemented by the driver.

Features under development require help from the bpf infrastructure to detect when a given helper or direct-access is used, using a bpf_prog bit to mark a need for the feature, and pulling in this bit in the xdp_features_check(). When all drivers have implemented a "devel" feature, it can be moved to the "required" feature and the bpf_prog bit can be refurbished.

The "optional" features are for things that are handled safely at runtime, but drivers will still get flagged as "xdp-partial" if not implementing those.
--- include/linux/netdev_features.h | 32 include/linux/netdevice.h |1 + net/core/dev.c | 34 ++ net/core/ethtool.c |2 ++ 4 files changed, 69 insertions(+) diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h index 1d4737cffc71..ff81ee231410 100644 --- a/include/linux/netdev_features.h +++ b/include/linux/netdev_features.h @@ -77,6 +77,8 @@ enum { NETIF_F_HW_ESP_BIT, /* Hardware ESP transformation offload */ NETIF_F_HW_ESP_TX_CSUM_BIT, /* ESP with TX checksum offload */ + NETIF_F_XDP_BASELINE_BIT, /* Driver supports XDP */ + NETIF_F_XDP_PARTIAL_BIT,/* not supporting all XDP features */ /* * Add your fresh new feature above and remember to update * netdev_features_strings[] in net/core/ethtool.c and maybe @@ -140,6 +142,8 @@ enum { #define NETIF_F_HW_TC __NETIF_F(HW_TC) #define NETIF_F_HW_ESP __NETIF_F(HW_ESP) #define NETIF_F_HW_ESP_TX_CSUM __NETIF_F(HW_ESP_TX_CSUM) +#define NETIF_F_XDP_BASELINE __NETIF_F(XDP_BASELINE) +#define NETIF_F_XDP_PARTIAL__NETIF_F(XDP_PARTIAL) #define for_each_netdev_feature(mask_addr, bit)\ for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT) @@ -212,4 +216,32 @@ enum { NETIF_F_GSO_UDP_TUNNEL | \ NETIF_F_GSO_UDP_TUNNEL_CSUM) +/* XDP driver flags */ +enum { + XDP_DRV_F_ENABLED_BIT, +}; + +#define __XDP_DRV_F_BIT(bit) ((netdev_features_t)1 << (bit)) +#define __XDP_DRV_F(name) __XDP_DRV_F_BIT(XDP_DRV_F_##name##_BIT) +#define XDP_DRV_F_ENABLED __XDP_DRV_F(ENABLED) + +/* XDP driver MUST support these features, else kernel MUST reject + * bpf_prog to guarantee safe access to data structures + */ +#define XDP_DRV_FEATURES_REQUIRED XDP_DRV_F_ENABLED + +/* Some XDP features are under development. Based on bpf_prog loading + * detect if kernel feature can be activated. + */ +#define XDP_DRV_FEATURES_DEVEL 0 + +/* Some XDP features are optional, like action return code, as they + * are handled safely runtime. 
+ */ +#define XDP_DRV_FEATURES_OPTIONAL 0 + +#define XDP_DRV_FEATURES_MASK (XDP_DRV_FEATURES_REQUIRED |\ +XDP_DRV_FEATURES_DEVEL | \ +XDP_DRV_FEATURES_OPTIONAL) + #endif /* _LINUX_NETDEV_FEATURES_H */ diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 9c23bd2efb56..329ae156ff65 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1685,6 +1685,7 @@ struct net_device { netdev_features_t hw_enc_features; netdev_features_t mpls_features; netdev_features_t gso_partial_features; + netdev_features_t xdp_features; int ifindex; int group; diff --git
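The three feature levels described in this commit message can be sketched as a small userspace check. This is not the xdp_features_check() from the patch (that code is cut off above); the function name, the result enum and the prog_needs parameter are hypothetical, and RXHASH is placed in the "devel" set only because patch 4/5 of the series does so.

```c
#include <stdint.h>

typedef uint64_t features_t;

/* Feature bits, mirroring the XDP_DRV_F_* idea in userspace. */
#define DRV_F_ENABLED	(1ULL << 0)
#define DRV_F_RXHASH	(1ULL << 1)

#define FEATURES_REQUIRED	DRV_F_ENABLED
#define FEATURES_DEVEL		DRV_F_RXHASH
#define FEATURES_OPTIONAL	0ULL
#define FEATURES_MASK		(FEATURES_REQUIRED | FEATURES_DEVEL | \
				 FEATURES_OPTIONAL)

/* Hypothetical result codes for the sketch. */
enum check_result { CHECK_OK, CHECK_PARTIAL, CHECK_REJECT };

/* drv:        feature bits the driver advertises
 * prog_needs: "devel" feature bits the loaded program was marked as using
 */
static enum check_result xdp_features_check_sketch(features_t drv,
						   features_t prog_needs)
{
	/* Every required feature must be implemented by the driver. */
	if ((drv & FEATURES_REQUIRED) != FEATURES_REQUIRED)
		return CHECK_REJECT;

	/* A devel feature only matters when the program actually uses it. */
	if ((prog_needs & FEATURES_DEVEL) & ~drv)
		return CHECK_REJECT;

	/* Safe to load, but the driver does not cover the full feature set. */
	if ((drv & FEATURES_MASK) != FEATURES_MASK)
		return CHECK_PARTIAL;

	return CHECK_OK;
}
```

Under these assumptions a driver advertising only ENABLED can still load a program that does not use the rxhash helper (reported as partial), but a program marked as needing RXHASH is rejected on that driver.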
[RFC net-next PATCH 2/5] mlx5: fix bug reading rss_hash_type from CQE
Masks for extracting part of the Completion Queue Entry (CQE) field rss_hash_type was swapped, namely CQE_RSS_HTYPE_IP and CQE_RSS_HTYPE_L4. The bug resulted in setting skb->l4_hash, even-though the rss_hash_type indicated that hash was NOT computed over the L4 (UDP or TCP) part of the packet. Added comments from the datasheet, to make it more clear what these masks are selecting. Signed-off-by: Jesper Dangaard Brouer--- include/linux/mlx5/device.h | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index dd9a263ed368..a940ec6a046c 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -787,8 +787,14 @@ enum { }; enum { - CQE_RSS_HTYPE_IP= 0x3 << 6, - CQE_RSS_HTYPE_L4= 0x3 << 2, + CQE_RSS_HTYPE_IP= 0x3 << 2, + /* cqe->rss_hash_type[3:2] - IP destination selected for hash +* (00 = none, 01 = IPv4, 10 = IPv6, 11 = Reserved) +*/ + CQE_RSS_HTYPE_L4= 0x3 << 6, + /* cqe->rss_hash_type[7:6] - L4 destination selected for hash +* (00 = none, 01 = TCP. 10 = UDP, 11 = IPSEC.SPI +*/ }; enum {
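As a small illustration of the corrected masks, the sketch below extracts the two 2-bit fields from rss_hash_type. The mask values are the ones from the patch; the two accessor functions are hypothetical helpers added here for the example and are not part of the mlx5 headers.

```c
#include <stdint.h>

/* Masks as corrected by the patch: IP type in bits [3:2], L4 type in [7:6]. */
enum {
	CQE_RSS_HTYPE_IP = 0x3 << 2,
	CQE_RSS_HTYPE_L4 = 0x3 << 6,
};

/* Hypothetical helper: 0 = none, 1 = IPv4, 2 = IPv6, 3 = reserved. */
static inline unsigned int cqe_rss_ip_type(uint8_t rss_hash_type)
{
	return (rss_hash_type & CQE_RSS_HTYPE_IP) >> 2;
}

/* Hypothetical helper: 0 = none, 1 = TCP, 2 = UDP, 3 = IPSEC.SPI. */
static inline unsigned int cqe_rss_l4_type(uint8_t rss_hash_type)
{
	return (rss_hash_type & CQE_RSS_HTYPE_L4) >> 6;
}
```

For a value such as 0x04 (IPv4 selected, no L4 field), the L4 accessor yields 0. With the masks swapped as before the fix, the same byte would have matched the L4 mask, which is the skb->l4_hash misreport the commit message describes.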
[RFC net-next PATCH 0/5] XDP driver feature API and handling change to xdp_buff
I would like some comments on introducing a feature API between XDP drivers and the XDP/BPF core. The primary issue is when extending struct xdp_buff: today, drivers not implementing this feature can access uninitialized memory, using the bpf-helper associated with the feature.

---

Jesper Dangaard Brouer (5):
      samples/bpf: xdp_tx_iptunnel make use of map_data[]
      mlx5: fix bug reading rss_hash_type from CQE
      net: introduce XDP driver features interface
      net: new XDP feature for reading HW rxhash from drivers
      mlx5: add XDP rxhash feature for driver mlx5

 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    3 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   |   98 ++---
 include/linux/filter.h                            |   31 ++-
 include/linux/mlx5/device.h                       |   10 ++
 include/linux/netdev_features.h                   |   34 +++
 include/linux/netdevice.h                         |    1
 include/uapi/linux/bpf.h                          |   56
 kernel/bpf/verifier.c                             |    3 +
 net/core/dev.c                                    |   48 ++
 net/core/ethtool.c                                |    2
 net/core/filter.c                                 |   73
 samples/bpf/bpf_helpers.h                         |    2
 samples/bpf/xdp_tx_iptunnel_common.h              |    2
 samples/bpf/xdp_tx_iptunnel_kern.c                |    2
 samples/bpf/xdp_tx_iptunnel_user.c                |   14 ++-
 tools/include/uapi/linux/bpf.h                    |   10 ++
 16 files changed, 345 insertions(+), 44 deletions(-)

--
[RFC net-next PATCH 1/5] samples/bpf: xdp_tx_iptunnel make use of map_data[]
There is no reason to use a compile time constant MAX_IPTNL_ENTRIES shared between the _user.c and _kern.c, when map_data[].def.max_entries can tell us dynamically what the max_entries were of the ELF map that the bpf loaded created. Signed-off-by: Jesper Dangaard Brouer--- samples/bpf/xdp_tx_iptunnel_common.h |2 -- samples/bpf/xdp_tx_iptunnel_kern.c |2 +- samples/bpf/xdp_tx_iptunnel_user.c | 14 +- 3 files changed, 10 insertions(+), 8 deletions(-) diff --git a/samples/bpf/xdp_tx_iptunnel_common.h b/samples/bpf/xdp_tx_iptunnel_common.h index dd12cc35110f..b065699cacb5 100644 --- a/samples/bpf/xdp_tx_iptunnel_common.h +++ b/samples/bpf/xdp_tx_iptunnel_common.h @@ -9,8 +9,6 @@ #include -#define MAX_IPTNL_ENTRIES 256U - struct vip { union { __u32 v6[4]; diff --git a/samples/bpf/xdp_tx_iptunnel_kern.c b/samples/bpf/xdp_tx_iptunnel_kern.c index 0f4f6e8c8611..b19489eb3c22 100644 --- a/samples/bpf/xdp_tx_iptunnel_kern.c +++ b/samples/bpf/xdp_tx_iptunnel_kern.c @@ -30,7 +30,7 @@ struct bpf_map_def SEC("maps") vip2tnl = { .type = BPF_MAP_TYPE_HASH, .key_size = sizeof(struct vip), .value_size = sizeof(struct iptnl_info), - .max_entries = MAX_IPTNL_ENTRIES, + .max_entries = 256, }; static __always_inline void count_tx(u32 protocol) diff --git a/samples/bpf/xdp_tx_iptunnel_user.c b/samples/bpf/xdp_tx_iptunnel_user.c index 92b8bde9337c..0500a5cc75c4 100644 --- a/samples/bpf/xdp_tx_iptunnel_user.c +++ b/samples/bpf/xdp_tx_iptunnel_user.c @@ -123,11 +123,6 @@ static int parse_ports(const char *port_str, int *min_port, int *max_port) return 1; } - if (tmp_max_port - tmp_min_port + 1 > MAX_IPTNL_ENTRIES) { - fprintf(stderr, "Port range (%s) is larger than %u\n", - port_str, MAX_IPTNL_ENTRIES); - return 1; - } *min_port = tmp_min_port; *max_port = tmp_max_port; @@ -142,6 +137,7 @@ int main(int argc, char **argv) int min_port = 0, max_port = 0; struct iptnl_info tnl = {}; struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; + unsigned int entries, max_entries; struct vip vip = {}; char 
filename[256]; int opt; @@ -238,6 +234,14 @@ int main(int argc, char **argv) return 1; } + entries = max_port - min_port + 1; + max_entries = map_data[1].def.max_entries; + if (entries > max_entries) { + fprintf(stderr, "Req port entries (%u) is larger than max %u\n", + entries, max_entries); + return 1; + } + signal(SIGINT, int_exit); while (min_port <= max_port) {
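The range check this patch moves into main() can be sketched on its own. The function name and return convention below are assumptions for the example; in the sample itself the limit comes from map_data[1].def.max_entries rather than from a parameter.

```c
#include <stdio.h>

/* Hypothetical helper: returns 0 when the inclusive port range
 * [min_port, max_port] fits into a map with max_entries slots,
 * 1 when it does not, and -1 for an inverted range.
 */
static int check_port_range(int min_port, int max_port,
			    unsigned int max_entries)
{
	unsigned int entries;

	if (min_port > max_port)
		return -1;

	/* The range is inclusive, hence the +1. */
	entries = (unsigned int)(max_port - min_port) + 1;
	if (entries > max_entries) {
		fprintf(stderr,
			"Req port entries (%u) is larger than max %u\n",
			entries, max_entries);
		return 1;
	}
	return 0;
}
```

Passing the limit in at runtime is what lets the userspace tool follow whatever max_entries the loaded ELF map declares, instead of a constant shared through a header.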
[PATCH net-next] net/mlx5e: Fix possible memory leak
From: Wei Yongjun'encap_header' is malloced and should be freed before leaving from the error handling cases, otherwise it will cause memory leak. Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update flow") Signed-off-by: Wei Yongjun --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 11c27e4..a72ecbc 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -1404,8 +1404,8 @@ static int mlx5e_create_encap_header_ipv4(struct mlx5e_priv *priv, if (!(nud_state & NUD_VALID)) { neigh_event_send(n, NULL); - neigh_release(n); - return -EAGAIN; + err = -EAGAIN; + goto out; } err = mlx5_encap_alloc(priv->mdev, e->tunnel_type, @@ -1510,8 +1510,8 @@ static int mlx5e_create_encap_header_ipv6(struct mlx5e_priv *priv, if (!(nud_state & NUD_VALID)) { neigh_event_send(n, NULL); - neigh_release(n); - return -EAGAIN; + err = -EAGAIN; + goto out; } err = mlx5_encap_alloc(priv->mdev, e->tunnel_type,
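The shape of this fix, replacing an early return with a jump to a shared cleanup label, can be shown in a userspace sketch. Everything here is hypothetical (build_header, the live_allocs counter, the 64-byte buffer); it only mirrors the control flow, with malloc/free standing in for the kernel allocation and a plain flag standing in for the NUD_VALID test.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical counter so the example can show that nothing leaks. */
static int live_allocs;

/* On success the buffer is handed to the caller through *out_hdr.
 * On any failure after the allocation, control goes to 'out', which
 * is the single place that frees the buffer.
 */
static int build_header(int neigh_valid, char **out_hdr)
{
	char *hdr;
	int err;

	*out_hdr = NULL;

	hdr = malloc(64);
	if (!hdr)
		return -ENOMEM;
	live_allocs++;

	if (!neigh_valid) {
		/* An early 'return -EAGAIN;' here would leak hdr. */
		err = -EAGAIN;
		goto out;
	}

	*out_hdr = hdr;
	return 0;

out:
	free(hdr);
	live_allocs--;
	return err;
}
```

Keeping one exit label for the error paths means a later error case added to the function gets the cleanup by jumping to the same label.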
Re: [PATCH] net: sched: fix a use-after-free error on chain on the error exit path
From: Colin King
Date: Thu, 18 May 2017 15:07:02 +0100

> From: Colin Ian King
>
> Set chain to null after the call to tcf_chain_destroy so that we don't
> call tcf_chain_put on the error exit path, thus avoiding a use-after-free
> error.
>
> Detected by CoverityScan, CID#1436357 ("Use after free")
>
> Signed-off-by: Colin Ian King

Colin, you really need to make some adjustments to how you are
submitting these kinds of patches.

First of all, you must indicate the target tree in your Subject line
as "[PATCH net-next] " in this case.

Also, you need to add an appropriate Fixes: tag right before your
signoff.

Thank you.
Re: [PATCH net-next 5/6] net: bridge: get msgtype from nlmsghdr in mdb ops
On 5/18/17 12:27 AM, Vivien Didelot wrote: Retrieve the message type from the nlmsghdr structure instead of hardcoding it in both br_mdb_add and br_mdb_del. Signed-off-by: Vivien Didelot--- net/bridge/br_mdb.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c index a72d5e6f339f..d280b20587cb 100644 --- a/net/bridge/br_mdb.c +++ b/net/bridge/br_mdb.c @@ -569,6 +569,7 @@ static int br_mdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, struct net_bridge_port *p; struct net_bridge_vlan *v; struct net_bridge *br; + int msgtype = nlh->nlmsg_type; minor nits: nlmsg_type is a u16, also please keep the order and arrange these from longest to shortest int err; err = br_mdb_parse(skb, nlh, , ); @@ -595,12 +596,12 @@ static int br_mdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, if (br_vlan_enabled(br) && vg && entry->vid == 0) { list_for_each_entry(v, >vlan_list, vlist) { entry->vid = v->vid; - err = __br_mdb_do(p, entry, RTM_NEWMDB); + err = __br_mdb_do(p, entry, msgtype); if (err) break; } } else { - err = __br_mdb_do(p, entry, RTM_NEWMDB); + err = __br_mdb_do(p, entry, msgtype); } return err; @@ -677,6 +678,7 @@ static int br_mdb_del(struct sk_buff *skb, struct nlmsghdr *nlh, struct net_bridge_port *p; struct net_bridge_vlan *v; struct net_bridge *br; + int msgtype = nlh->nlmsg_type; same here int err; err = br_mdb_parse(skb, nlh, , ); @@ -703,12 +705,12 @@ static int br_mdb_del(struct sk_buff *skb, struct nlmsghdr *nlh, if (br_vlan_enabled(br) && vg && entry->vid == 0) { list_for_each_entry(v, >vlan_list, vlist) { entry->vid = v->vid; - err = __br_mdb_do(p, entry, RTM_DELMDB); + err = __br_mdb_do(p, entry, msgtype); if (err) break; } } else { - err = __br_mdb_do(p, entry, RTM_DELMDB); + err = __br_mdb_do(p, entry, msgtype); } return err;
RE: [PATCH net-next] qed: Utilize FW 8.20.0.0
> >> This pushes qed [and as result, all qed* drivers] into using 8.20.0.0
> >> firmware. The changes are mostly contained in qed with minor changes
> >> to qedi due to some HSI changes.
> >>
> >> Content-wise, the firmware contains fixes to various issues exposed
> >> since the release of the previous firmware, including:
> >> - Corrects iSCSI fast retransmit when data digest is enabled.
> >> - Stop draining packets when receiving several consecutive PFCs.
> >> - Prevent possible assertion when consecutively opening/closing
> >>   many connections.
> >> - Prevent possible assertion due to too long BDQ fetch time.
> >>
> >> In addition, the new firmware would allow us to later add iWARP
> >> support in qed and qedr.
> >>
> >> Signed-off-by: Chad Dupuis
> >> Signed-off-by: Ram Amrani
> >> Signed-off-by: Tomer Tayar
> >> Signed-off-by: Manish Rangankar
> >> Signed-off-by: Yuval Mintz
> >
> > Applied.
>
> Actually I had to revert. Please look at the compiler output before
> submitting changes:
>
> drivers/net/ethernet/qlogic/qed/qed_debug.c: In function ‘qed_grc_dump’:
> drivers/net/ethernet/qlogic/qed/qed_debug.c:2425:6: warning: ‘addr’ may
> be used uninitialized in this function [-Wmaybe-uninitialized]
>   u32 byte_addr = DWORDS_TO_BYTES(addr), offset = 0, i;
>       ^
> drivers/net/ethernet/qlogic/qed/qed_debug.c:3534:7: note: ‘addr’ was
> declared here
>    u32 addr, size = RSS_REG_RSS_RAM_DATA_SIZE;
>        ^
>
> 'addr' is never, ever, assigned a value, yet it is passed into a function as
> an argument.

Sorry about that. Will send v2 [hopefully] later today.
Re: [patch net] mlxsw: spectrum: Avoid possible NULL pointer dereference
From: Jiri Pirko
Date: Thu, 18 May 2017 13:03:52 +0200

> From: Ido Schimmel
>
> In case we got an FDB notification for a port that doesn't exist we
> execute an FDB entry delete to prevent it from re-appearing the next
> time we poll for notifications.
>
> If the operation failed we would trigger a NULL pointer dereference as
> 'mlxsw_sp_port' is NULL.
>
> Fix it by reporting the error using the underlying bus device instead.
>
> Fixes: 12f1501e7511 ("mlxsw: spectrum: remove FDB entry in case we get
> unknown object notification")
> Signed-off-by: Ido Schimmel
> Signed-off-by: Jiri Pirko

Applied, thank you.
[PATCH net-next] qed: Remove unused including
From: Wei YongjunRemove including that is not needed. Signed-off-by: Wei Yongjun --- drivers/net/ethernet/qlogic/qed/qed_fcoe.c | 1 - drivers/net/ethernet/qlogic/qed/qed_iscsi.c | 1 - drivers/net/ethernet/qlogic/qed/qed_l2.c| 1 - drivers/net/ethernet/qlogic/qed/qed_ll2.c | 1 - drivers/net/ethernet/qlogic/qed/qed_main.c | 1 - 5 files changed, 5 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_fcoe.c b/drivers/net/ethernet/qlogic/qed/qed_fcoe.c index 21a58ff..690dd2b 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_fcoe.c +++ b/drivers/net/ethernet/qlogic/qed/qed_fcoe.c @@ -43,7 +43,6 @@ #include #include #include -#include #include #include #include diff --git a/drivers/net/ethernet/qlogic/qed/qed_iscsi.c b/drivers/net/ethernet/qlogic/qed/qed_iscsi.c index 339c91d..f2fd09c 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_iscsi.c +++ b/drivers/net/ethernet/qlogic/qed/qed_iscsi.c @@ -44,7 +44,6 @@ #include #include #include -#include #include #include #include diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.c b/drivers/net/ethernet/qlogic/qed/qed_l2.c index 746fed4..fab6e69 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_l2.c +++ b/drivers/net/ethernet/qlogic/qed/qed_l2.c @@ -43,7 +43,6 @@ #include #include #include -#include #include #include #include diff --git a/drivers/net/ethernet/qlogic/qed/qed_ll2.c b/drivers/net/ethernet/qlogic/qed/qed_ll2.c index 09c8641..b04dfc4 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_ll2.c +++ b/drivers/net/ethernet/qlogic/qed/qed_ll2.c @@ -38,7 +38,6 @@ #include #include #include -#include #include #include #include diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index 537d123..f286daa 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -34,7 +34,6 @@ #include #include #include -#include #include #include #include
Re: [PATCH net-next 3/6] net: bridge: break if __br_mdb_del fails
On 5/18/17 6:08 PM, Vivien Didelot wrote: Hi Nikolay, Nikolay Aleksandrovwrites: err = __br_mdb_del(br, entry); - if (!err) - __br_mdb_notify(dev, p, entry, RTM_DELMDB); + if (err) + break; + __br_mdb_notify(dev, p, entry, RTM_DELMDB); } } else { err = __br_mdb_del(br, entry); This can potentially break user-space scripts that rely on the best-effort behaviour, this is the normal "delete without vid & enabled vlan filtering". You can check the fdb delete code which does the same, this was intentional. You can add an mdb entry without a vid to all vlans, add a vlan and then try to remove it from all vlans where it is present - with this patch obviously that will fail at the new vlan. OK good to know. That intention wasn't obvious. I can make __br_mdb_del return void instead? What about the rest of the patchset if I do so? Thanks, Vivien If you make it return void we will not be able to return proper error value when doing a single operation (the else case). About the rest I see only some minor style issues, I'll comment on the respective patches. Another minor nit is using switch() instead of if/else for the message types but that is really up to you, I don't mind either way. :-) Cheers, Nik
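The difference Nikolay describes between the existing best-effort loop and the proposed break-on-error loop can be sketched in userspace. The names below (del_on_all_vlans, del_missing_on_30, the stop_on_error switch) are hypothetical; the callback stands in for __br_mdb_del() and the counter for the calls to __br_mdb_notify().

```c
#include <errno.h>

/* Hypothetical per-VLAN delete callback: returns 0 or a negative errno. */
typedef int (*vlan_del_fn)(int vid);

/* Try the delete on every VLAN in vids[].
 * stop_on_error == 0: best effort, keep going after a failure
 * stop_on_error != 0: bail out at the first failure
 * The return value is the result of the last attempted delete, and
 * *notified counts the VLANs where the delete succeeded.
 */
static int del_on_all_vlans(const int *vids, int n, vlan_del_fn del,
			    int stop_on_error, int *notified)
{
	int err = 0;
	int i;

	*notified = 0;
	for (i = 0; i < n; i++) {
		err = del(vids[i]);
		if (err) {
			if (stop_on_error)
				break;
			continue;
		}
		(*notified)++;
	}
	return err;
}

/* Example callback: the entry is missing on VLAN 30 only. */
static int del_missing_on_30(int vid)
{
	return vid == 30 ? -EINVAL : 0;
}
```

With VLANs 10, 30 and 20, the best-effort variant removes the entry from 10 and 20 and returns the status of the last attempt, while the break-on-error variant stops at VLAN 30 and leaves the entry on VLAN 20. That is the change in user-visible behaviour raised in the review.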
Re: [PATCH] liquidio: make the spinlock octeon_devices_lock static
From: Colin King
Date: Thu, 18 May 2017 10:14:01 +0100

> From: Colin Ian King
>
> octeon_devices_lock can be made static as it does not need to be
> in global scope.
>
> Cleans up sparse warning: "warning: symbol 'octeon_devices_lock'
> was not declared. Should it be static?"
>
> Signed-off-by: Colin Ian King

Applied.
[PATCH net-next] ibmvnic: fix missing unlock on error in __ibmvnic_reset()
From: Wei Yongjun

Add the missing unlock before return from function __ibmvnic_reset()
in the error handling case.

Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
Signed-off-by: Wei Yongjun
---
 drivers/net/ethernet/ibm/ibmvnic.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 4f2d329..27f7933 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1313,6 +1313,7 @@ static void __ibmvnic_reset(struct work_struct *work)
 
 	if (rc) {
 		free_all_rwi(adapter);
+		mutex_unlock(>reset_lock);
 		return;
 	}
Re: [Patch RFC net-next] net: usb: r8152: Fix rx_bytes/tx_bytes to include FCS
From: Andrew Lunn
Date: Thu, 18 May 2017 17:09:25 +0200

> Since these are software counters, they can be consistent. From a
> practical point of view, i doubt they ever will all be consistent,
> there are simply too many drivers to test and change if
> needed. However, for the ones somebody cares about, they can be made
> consistent.
>
> I care about r8152, and would like to make it consistent with asix,
> dsa, e1000e.

No objection from me for making software counters consistent.
[PATCH net-next] xen/9pfs: p9_trans_xen_init and p9_trans_xen_exit can be static
From: Wei Yongjun

Fixes the following sparse warnings:

net/9p/trans_xen.c:528:5: warning: symbol 'p9_trans_xen_init' was not declared. Should it be static?
net/9p/trans_xen.c:540:6: warning: symbol 'p9_trans_xen_exit' was not declared. Should it be static?

Signed-off-by: Wei Yongjun
---
 net/9p/trans_xen.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
index 71e8564..3deb17f 100644
--- a/net/9p/trans_xen.c
+++ b/net/9p/trans_xen.c
@@ -525,7 +525,7 @@ static struct xenbus_driver xen_9pfs_front_driver = {
 	.otherend_changed = xen_9pfs_front_changed,
 };
 
-int p9_trans_xen_init(void)
+static int p9_trans_xen_init(void)
 {
 	if (!xen_domain())
 		return -ENODEV;
@@ -537,7 +537,7 @@ int p9_trans_xen_init(void)
 }
 module_init(p9_trans_xen_init);
 
-void p9_trans_xen_exit(void)
+static void p9_trans_xen_exit(void)
 {
 	v9fs_unregister_trans(_xen_trans);
 	return xenbus_unregister_driver(_9pfs_front_driver);
Re: [PATCH] of_mdio: Fix broken PHY IRQ in case of probe deferral
From: Geert Uytterhoeven
Date: Thu, 18 May 2017 14:59:05 +0200

> If an Ethernet PHY is initialized before the interrupt controller it is
> connected to, a message like the following is printed:
>
>     irq: no irq domain found for /interrupt-controller@e61c !
>
> However, the actual error is ignored, leading to a non-functional (-1)
> PHY interrupt later:
>
>     Micrel KSZ8041RNLI ee70.ethernet-:01: attached PHY driver
>     [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee70.ethernet-:01, irq=-1)
>
> Depending on whether the PHY driver will fall back to polling, Ethernet
> may or may not work.
>
> To fix this:
>   1. Switch of_mdiobus_register_phy() from irq_of_parse_and_map() to
>      of_irq_get().
>      Unlike the former, the latter returns -EPROBE_DEFER if the
>      interrupt controller is not yet available, so this condition can be
>      detected.
>      Other errors are handled the same as before, i.e. use the passed
>      mdio->irq[addr] as interrupt.
>   2. Propagate and handle errors from of_mdiobus_register_phy() and
>      of_mdiobus_register_device().
>
> Signed-off-by: Geert Uytterhoeven

Florian or someone similarly knowledgable, please review.
Re: [PATCH 1/2] sh_eth: Use platform device for printing before register_netdev()
From: Geert Uytterhoeven
Date: Thu, 18 May 2017 15:01:34 +0200

> The MDIO initialization failure message is printed using the network
> device, before it has been registered, leading to:
>
>     (null): failed to initialise MDIO
>
> Use the platform device instead to fix this:
>
>     sh-eth ee70.ethernet: failed to initialise MDIO
>
> Fixes: daacf03f0bbfefee ("sh_eth: Register MDIO bus before registering the network device")
> Signed-off-by: Geert Uytterhoeven

Applied.
Re: [PATCH 2/2] sh_eth: Do not print an error message for probe deferral
From: Geert Uytterhoeven
Date: Thu, 18 May 2017 15:01:35 +0200

> EPROBE_DEFER is not an error, hence printing an error message like
>
>     sh-eth ee70.ethernet: failed to initialise MDIO
>
> may confuse the user.
>
> To fix this, suppress the error message in case of probe deferral.
> While at it, shorten the message, and add the actual error code.
>
> Signed-off-by: Geert Uytterhoeven

Applied.
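
As an illustration of the pattern described in the commit message, here is a minimal user-space sketch. It is not the sh_eth code: report_mdio_init() and the message wording are hypothetical, and EPROBE_DEFER is defined locally because user-space <errno.h> does not provide it (517 is the kernel-internal value).

```c
#include <stdio.h>

/* Kernel-internal errno value; an assumption for this user-space sketch. */
#define EPROBE_DEFER 517

/*
 * Hypothetical helper: returns 1 if an error message was printed,
 * 0 if the result was success or a (silent) probe deferral.
 */
static int report_mdio_init(const char *dev_name, int ret)
{
	if (ret == 0 || ret == -EPROBE_DEFER)
		return 0;	/* deferral is retried later, so stay quiet */

	/* Short message that includes the actual error code. */
	fprintf(stderr, "%s: MDIO init failed: %d\n", dev_name, ret);
	return 1;
}
```

Only genuine failures reach the log; a deferred probe is simply retried once the missing dependency shows up.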
Re: [PATCH] net: ieee802154: fix net_device reference release too early
Hello.

On Thu, 2017-05-18 at 15:14, Stefan Schmidt wrote:
> Hello.
>
> On Thu, 2017-05-18 at 15:50, linzhang wrote:
> > This patch fixes the kernel oops when release net_device reference in
> > advance. In function raw_sendmsg(i think the dgram_sendmsg has the same
> > problem), there is a race condition between dev_put and dev_queue_xmit
> > when the device is gone that maybe lead to dev_queue_xmit to see
> > an illegal net_device pointer.
> >
>
> You have a test case to reproduce this oops? I fear I have not seen
> one.

If you have a test case handy, adding it to the commit would be helpful.
If you do not have one around we can do without.

> > So i think that dev_put should be behind of the dev_queue_xmit.
> >
> > Also, explicit set skb->sk is needless, sock_alloc_send_skb is
> > already set it.
>
> You could have put this fixup in a different patch.

I actually would request you to split this into two patches. One for the
removal of the sk setting and one for the race condition fix.

> > Signed-off-by: linzhang
>
> This looks more like a username instead of a real name. If you have Lin
> Zhang as your English real name that would be better here. :)

This would be also appreciated.

> > ---
> >  net/ieee802154/socket.c | 10 --
> >  1 file changed, 4 insertions(+), 6 deletions(-)
> >
> > diff --git a/net/ieee802154/socket.c b/net/ieee802154/socket.c
> > index eedba76..a60658c 100644
> > --- a/net/ieee802154/socket.c
> > +++ b/net/ieee802154/socket.c
> > @@ -301,15 +301,14 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> >  		goto out_skb;
> >
> >  	skb->dev = dev;
> > -	skb->sk = sk;
> >  	skb->protocol = htons(ETH_P_IEEE802154);
> >
> > -	dev_put(dev);
> > -
> >  	err = dev_queue_xmit(skb);
> >  	if (err > 0)
> >  		err = net_xmit_errno(err);
> >
> > +	dev_put(dev);
> > +
> >  	return err ?: size;
> >
> >  out_skb:
> > @@ -690,15 +689,14 @@ static int dgram_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> >  		goto out_skb;
> >
> >  	skb->dev = dev;
> > -	skb->sk = sk;
> >  	skb->protocol = htons(ETH_P_IEEE802154);
> >
> > -	dev_put(dev);
> > -
> >  	err = dev_queue_xmit(skb);
> >  	if (err > 0)
> >  		err = net_xmit_errno(err);
> >
> > +	dev_put(dev);
> > +
> >  	return err ?: size;
>
> Going to give this a test ride here now.

I gave it a ride in my testbed and I encountered no problems. While I
have never seen the race and oops myself, doing the dev_put before the
xmit can surely lead to such a race and the fix is valid.

Once you have done the changes requested above and re-submit your two
patches you can add my

Acked-by: Stefan Schmidt

to both of them.

regards
Stefan Schmidt
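
To make the ordering argument concrete, here is a toy, single-threaded model of the two orderings. It is only a sketch: the toy_* names are hypothetical, the "device" is just a reference count, and the real bug is a race with a concurrent device teardown, which this model replaces with a deterministic last-reference drop.

```c
/* Toy stand-in for struct net_device: only a reference count. */
struct toy_dev {
	int refcnt;
	int freed;	/* set once the last reference is dropped */
};

static void toy_hold(struct toy_dev *d)
{
	d->refcnt++;
}

static void toy_put(struct toy_dev *d)
{
	if (--d->refcnt == 0)
		d->freed = 1;
}

/* Hypothetical transmit step: fails if the device is already gone. */
static int toy_xmit(const struct toy_dev *d)
{
	return d->freed ? -1 : 0;
}

/* Old ordering: the reference is dropped before the transmit. */
static int toy_send_early_put(struct toy_dev *d)
{
	toy_hold(d);
	toy_put(d);		/* device may disappear here */
	return toy_xmit(d);
}

/* Patched ordering: transmit first, drop the reference afterwards. */
static int toy_send(struct toy_dev *d)
{
	int err;

	toy_hold(d);
	err = toy_xmit(d);
	toy_put(d);
	return err;
}
```

With the sender holding the only reference, toy_send_early_put() transmits on a freed device while toy_send() does not.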
Re: [PATCH net-next] geneve: add rtnl changelink support
TL;DR: There is indeed a race between geneve_changelink() and the geneve
transmit path w.r.t. attributes being changed and the old value of those
attributes being used in the transmit path. I will resubmit V2 of the
patch with those issues addressed. Thanks!

Please see in-line for my other comments.

>> Signed-off-by: Girish Moodalbail
>> ---
>>  drivers/net/geneve.c | 149 ---
>>  1 file changed, 117 insertions(+), 32 deletions(-)
>>

...

>> @@ -1169,45 +1181,58 @@ static void init_tnl_info(struct ip_tunnel_info *info, __u16 dst_port)
>>  	info->key.tp_dst = htons(dst_port);
>>  }
>>
>> -static int geneve_newlink(struct net *net, struct net_device *dev,
>> -			  struct nlattr *tb[], struct nlattr *data[])
>> +static int geneve_nl2info(struct net_device *dev, struct nlattr *tb[],
>> +			  struct nlattr *data[], struct ip_tunnel_info *info,
>> +			  bool *metadata, bool *use_udp6_rx_checksums,
>> +			  bool changelink)
>>  {
>> -	bool use_udp6_rx_checksums = false;
>> -	struct ip_tunnel_info info;
>> -	bool metadata = false;
>> +	struct geneve_dev *geneve = netdev_priv(dev);
>>
>> -	init_tnl_info(&info, GENEVE_UDP_PORT);
>> +	if (changelink) {
>> +		/* if changelink operation, start with old existing info */
>> +		memcpy(info, &geneve->info, sizeof(*info));
>> +		*metadata = geneve->collect_md;
>> +		*use_udp6_rx_checksums = geneve->use_udp6_rx_checksums;
>> +	} else {
>> +		init_tnl_info(info, GENEVE_UDP_PORT);
>> +	}
>>
>>  	if (data[IFLA_GENEVE_REMOTE] && data[IFLA_GENEVE_REMOTE6])
>>  		return -EINVAL;
>>
>>  	if (data[IFLA_GENEVE_REMOTE]) {
>> -		info.key.u.ipv4.dst =
>> +		info->key.u.ipv4.dst =
>>  			nla_get_in_addr(data[IFLA_GENEVE_REMOTE]);
>>
>> -		if (IN_MULTICAST(ntohl(info.key.u.ipv4.dst))) {
>> +		if (IN_MULTICAST(ntohl(info->key.u.ipv4.dst))) {
>>  			netdev_dbg(dev, "multicast remote is unsupported\n");
>>  			return -EINVAL;
>>  		}
>> +		if (changelink &&
>> +		    ip_tunnel_info_af(&geneve->info) == AF_INET6) {
>> +			info->mode &= ~IP_TUNNEL_INFO_IPV6;
>> +			info->key.tun_flags &= ~TUNNEL_CSUM;
>> +			*use_udp6_rx_checksums = false;
>> +		}
>
> This allows changelink to change ipv4 address but there are no changes
> made to the geneve tunnel port hash table after this update.

The following code in geneve_changelink() does what you are asking for:

+	if (!geneve_dst_addr_equal(&geneve->info, &info))
+		dst_cache_reset(&info.dst_cache);

geneve_nl2info() accrues all the allowed changes to be made and captures
them in the ip_tunnel_info structure, and then the above code in
geneve_changelink() ensures that all the route cache entries associated
with the old remote address are released when the next lookup occurs.

> We also need to check to see if there is any conflicts with existing
> ports.

This is not needed since we don't support changing the remote port.

> What is the barrier between the rx/tx threads and changelink process?

There is an issue here like you pointed out (thanks!). Will fix that
issue.

>>  	}
>>
>>  	if (data[IFLA_GENEVE_REMOTE6]) {
>>  #if IS_ENABLED(CONFIG_IPV6)
>> -		info.mode = IP_TUNNEL_INFO_IPV6;
>> -		info.key.u.ipv6.dst =
>> +		info->mode = IP_TUNNEL_INFO_IPV6;
>> +		info->key.u.ipv6.dst =
>>  			nla_get_in6_addr(data[IFLA_GENEVE_REMOTE6]);
>> -		if (ipv6_addr_type(&info.key.u.ipv6.dst) &
>> +		if (ipv6_addr_type(&info->key.u.ipv6.dst) &
>>  		    IPV6_ADDR_LINKLOCAL) {
>>  			netdev_dbg(dev, "link-local remote is unsupported\n");
>>  			return -EINVAL;
>>  		}
>> -		if (ipv6_addr_is_multicast(&info.key.u.ipv6.dst)) {
>> +		if (ipv6_addr_is_multicast(&info->key.u.ipv6.dst)) {
>>  			netdev_dbg(dev, "multicast remote is unsupported\n");
>>  			return -EINVAL;
>>  		}
>> -		info.key.tun_flags |= TUNNEL_CSUM;
>> -		use_udp6_rx_checksums = true;
>> +		info->key.tun_flags |= TUNNEL_CSUM;
>> +		*use_udp6_rx_checksums = true;
>
> Same here. We need to check/fix the geneve tunnel hash table according
> to new IP address.

This is taken care of by the call to dst_cache_reset() whenever the
remote address changes. This function already takes care of races and
contentions:

8<-8<--
/**
 * dst_cache_reset - invalidate the cache contents
 * @dst_cache: the cache
 *
 * This do not free the cached dst to avoid races and contentions.
 * the dst will be freed on later cache lookup.
 */
static inline void dst_cache_reset(struct dst_cache *dst_cache)
{
	dst_cache->reset_ts = jiffies;
}
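
The "reset only stamps a time, lookup does the cleanup" behaviour quoted above can be modelled in a few lines. This is a simplified sketch with hypothetical toy_* names, not the kernel's dst_cache: it uses plain integers instead of jiffies and a single cached entry instead of per-CPU slots.

```c
/* Toy model of timestamp-based lazy invalidation (not the kernel code). */
struct toy_cache {
	unsigned long reset_ts;	/* time of the last reset */
	unsigned long entry_ts;	/* time the cached entry was stored */
	int has_entry;
	int value;
};

static void toy_cache_set(struct toy_cache *c, int value, unsigned long now)
{
	c->value = value;
	c->entry_ts = now;
	c->has_entry = 1;
}

/* Reset only records a timestamp; nothing is dropped here. */
static void toy_cache_reset(struct toy_cache *c, unsigned long now)
{
	c->reset_ts = now;
}

/* Lookup discards an entry stored before the last reset. Returns 1 on hit. */
static int toy_cache_get(struct toy_cache *c, int *value)
{
	if (!c->has_entry)
		return 0;
	if (c->entry_ts < c->reset_ts) {	/* stale: drop it lazily */
		c->has_entry = 0;
		return 0;
	}
	*value = c->value;
	return 1;
}
```

Because the writer touches only one timestamp, it never frees an entry that a concurrent reader might still be using; the stale entry goes away on the next lookup.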
Re: [PATCH net-next 3/6] net: bridge: break if __br_mdb_del fails
Hi Nikolay,

Nikolay Aleksandrov writes:

>>  			err = __br_mdb_del(br, entry);
>> -			if (!err)
>> -				__br_mdb_notify(dev, p, entry, RTM_DELMDB);
>> +			if (err)
>> +				break;
>> +			__br_mdb_notify(dev, p, entry, RTM_DELMDB);
>>  		}
>>  	} else {
>>  		err = __br_mdb_del(br, entry);
>>
>
> This can potentially break user-space scripts that rely on the best-effort
> behaviour, this is the normal "delete without vid & enabled vlan filtering".
> You can check the fdb delete code which does the same, this was intentional.
>
> You can add an mdb entry without a vid to all vlans, add a vlan and then try
> to remove it from all vlans where it is present - with this patch obviously
> that will fail at the new vlan.

OK, good to know. That intention wasn't obvious. I can make __br_mdb_del
return void instead?

What about the rest of the patchset if I do so?

Thanks,

        Vivien
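
The difference between the two loops can be shown with a small stand-alone model. Everything here is hypothetical (an array of flags stands in for "entry present in this VLAN"); it only illustrates why stopping at the first error changes what user space observes.

```c
/* Hypothetical per-VLAN delete: fails (-1) for VLANs without the entry. */
static int toy_del(const int *present, int vid)
{
	return present[vid] ? 0 : -1;
}

/* Best-effort: try every VLAN, count the successful deletes. */
static int del_best_effort(const int *present, int nvlans)
{
	int deleted = 0, vid;

	for (vid = 0; vid < nvlans; vid++)
		if (!toy_del(present, vid))
			deleted++;	/* a notification would be sent here */
	return deleted;
}

/* Behaviour with the proposed change: stop at the first failure. */
static int del_break_on_error(const int *present, int nvlans)
{
	int deleted = 0, vid;

	for (vid = 0; vid < nvlans; vid++) {
		if (toy_del(present, vid))
			break;
		deleted++;
	}
	return deleted;
}
```

With the entry present in VLANs 0 and 2 but not in a newly added VLAN 1, the best-effort loop removes both entries while the break-on-error loop removes only the first.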
Re: [Patch RFC net-next] net: usb: r8152: Fix rx_bytes/tx_bytes to include FCS
> I am afraid that we won't be able to enforce a consistent behavior,
> because the HW itself is not consistent, both on the NIC and on the
> switch side.

Hi Florian

I agree with that, for MIB counters. They tend to come direct from the
hardware.

However, rx_bytes and tx_bytes are not from the hardware. They are
software stats, kept by the drivers. Just grep in drivers/net/ethernet
and you see:

broadcom/bcmsysport.c:          ndev->stats.rx_bytes += len;
broadcom/sb1250-mac.c:          dev->stats.rx_bytes += len;
mellanox/mlx5/core/en_main.c:   s->rx_bytes += rq_stats->bytes;
microchip/encx24j600.c:         dev->stats.rx_bytes += rsv->len;
neterion/vxge/vxge-main.c:      net_stats->rx_bytes += bytes;
nuvoton/w90p910_ether.c:        dev->stats.rx_bytes += length;

etc.

Since these are software counters, they can be consistent. From a
practical point of view, I doubt they ever will all be consistent;
there are simply too many drivers to test and change if needed.
However, for the ones somebody cares about, they can be made
consistent. I care about r8152, and would like to make it consistent
with asix, dsa, e1000e.

        Andrew
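
Since the counter is plain software, the choice comes down to one addition. The sketch below is a hypothetical helper, not r8152 code; the only fixed fact it relies on is that the Ethernet frame check sequence is 4 octets long.

```c
/* Ethernet frame check sequence length in octets. */
#define TOY_ETH_FCS_LEN 4

/*
 * Hypothetical accounting helper: 'len' is the frame length as handed to
 * the stack (FCS already stripped); 'include_fcs' selects the convention.
 */
static unsigned long long rx_account(unsigned long long rx_bytes,
				     unsigned int len, int include_fcs)
{
	return rx_bytes + len + (include_fcs ? TOY_ETH_FCS_LEN : 0);
}
```

For a minimum-size frame of 60 octets without FCS, the two conventions report 60 and 64 bytes respectively, which is exactly the per-packet discrepancy seen when comparing drivers.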
Re: [PATCH net 3/3] virtio-net: enable TSO/checksum offloads for Q-in-Q vlans
On Thu, May 18, 2017 at 09:31:05AM -0400, Vladislav Yasevich wrote:
> Since virtio does not provide its own ndo_features_check handler,
> TSO, and now checksum offload, are disabled for stacked vlans.
> Re-enable the support and let the host take care of it. This
> restores/improves Guest-to-Guest performance over Q-in-Q vlans.
>
> CC: "Michael S. Tsirkin"
> CC: Jason Wang
> CC: virtualizat...@lists.linux-foundation.org
> Signed-off-by: Vladislav Yasevich

Acked-by: Michael S. Tsirkin

> ---
>  drivers/net/virtio_net.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 8324a5e..341fb96 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2028,6 +2028,7 @@ static const struct net_device_ops virtnet_netdev = {
>  	.ndo_poll_controller = virtnet_netpoll,
>  #endif
>  	.ndo_xdp		= virtnet_xdp,
> +	.ndo_features_check	= passthru_features_check,
>  };
>
>  static void virtnet_config_changed_work(struct work_struct *work)
> --
> 2.7.4
Re: [patch net 0/2] mlxsw: couple of fixes
From: Jiri Pirko
Date: Thu, 18 May 2017 09:18:51 +0200

> Couple of fixes from Arkadi

Series applied.
Re: [patch net-next] mlxsw: spectrum_dpipe: Fix sparse warnings
From: Jiri Pirko
Date: Thu, 18 May 2017 09:22:45 +0200

> From: Arkadi Sharshevsky
>
> drivers/net/ethernet/mellanox/mlxsw//spectrum_dpipe.c:221:52: warning: Using plain integer as NULL pointer
> drivers/net/ethernet/mellanox/mlxsw//spectrum_dpipe.c:221:74: warning: Using plain integer as NULL pointer
>
> Signed-off-by: Arkadi Sharshevsky
> Reviewed-by: Ido Schimmel
> Signed-off-by: Jiri Pirko

Applied.
Re: [PATCH v3] net: dsa: b53: Add compatible strings for the Cygnus-family BCM11360.
From: Eric Anholt
Date: Wed, 17 May 2017 17:32:12 -0700

> Cygnus is a small family of SoCs, of which we currently have
> devicetree for BCM11360 and BCM58300. The 11360's B53 is mostly the
> same as 58xx, just requiring a tiny bit of setup that was previously
> missing.
>
> Signed-off-by: Eric Anholt
> Reviewed-by: Florian Fainelli
> Acked-by: Rob Herring

Applied to net-next, thanks.
Re: [RFC] iproute: Add support for extended ack to rtnl_talk
On Thu, 18 May 2017 12:02:07 +0200
Daniel Borkmann wrote:

> On 05/16/2017 06:36 PM, Stephen Hemminger wrote:
> > On Sat, 13 May 2017 19:29:57 -0600
> > David Ahern wrote:
> >
> >> On 5/4/17 2:43 PM, Phil Sutter wrote:
> >>> So in summary, given that very little change happens to iproute2's
> >>> internal libnetlink, I don't see much urge to make it use libmnl as
> >>> backend. In my opinion it just adds another potential source of errors.
> >>>
> >>> Eventually this should be a maintainer level decision, though. :)
> >>
> >> What is the decision on this?
> >
> > I am waiting a bit longer before committing anything. This was to allow
> > for a wider range of distribution maintainer feedback.
> >
> > The most likely outcome for 4.12 is to use libmnl for extended ack,
> > and continue to support building without mnl with loss of functionality.
> >
> > As far as conversion of all of iproute2 to libmnl. I have better things
> > to do... But for new functionality like extended ack, devlink, tipc, using
> > libmnl is easy, safe and it works well. I will continue to not accept
> > new code that depends on the other library (libnl). That has come up
> > a couple of times.
>
> So effectively this means libmnl has to be used for new stuff, no one
> has time to do the work to convert the existing tooling over (which
> by itself might be a challenge in testing everything to make sure
> there are no regressions) given there's not much activity around
> lib/libnetlink.c anyway, and existing users not using libmnl today
> won't see/notice new improvements on netlink side when they do an
> upgrade. So we'll be stuck with that dual library mess pretty much
> for a very long time. :(
>
> If there's such high desire to use libmnl (?), can't there be a
> one time effort wrapping the core netlink code over, making a hard
> cut for everyone where from one release to another the dependency
> becomes really mandatory rather than optional? That's more work
> initially, but still seems a lot better than growing a wild mix
> of both over time where users see different behavior of the tools
> depending on their setup. (This could perhaps also make actual
> conversion much harder later on.)

If nothing else, it would be a simple experiment to do libnetlink to
libmnl wrappers in libnetlink.h

> Can't you add that lib conversion as a Google summer of code project,
> so that someone is actively taking care of that initial work?

Agreed
Fw: [Bug 195807] New: general protection fault in ping_v4_sendmsg
Begin forwarded message:

Date: Thu, 18 May 2017 03:36:33 +
From: bugzilla-dae...@bugzilla.kernel.org
To: step...@networkplumber.org
Subject: [Bug 195807] New: general protection fault in ping_v4_sendmsg

https://bugzilla.kernel.org/show_bug.cgi?id=195807

            Bug ID: 195807
           Summary: general protection fault in ping_v4_sendmsg
           Product: Networking
           Version: 2.5
    Kernel Version: 4.4 to 4.10-rc7
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: IPV4
          Assignee: step...@networkplumber.org
          Reporter: you...@ruc.edu.cn
        Regression: No

Created attachment 256607
  --> https://bugzilla.kernel.org/attachment.cgi?id=256607&action=edit
poc and kernel config

I got a general protection fault (use after free) when fuzzing the bpf
system call. Attached is the PoC that can reproduce this issue in kernel
versions from 4.4 to 4.10-rc7. Following is the dmesg output when
executing the PoC on kernel version 4.10-rc7:

[ 32.949367] kasan: CONFIG_KASAN_INLINE enabled
[ 32.949915] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ 32.950602] general protection fault: [#1] SMP KASAN
[ 32.951089] Dumping ftrace buffer:
[ 32.951396]    (ftrace buffer empty)
[ 32.951579] Modules linked in:
[ 32.951579] CPU: 0 PID: 4145 Comm: poc-NB1 Not tainted 4.10.0-rc7 #1
[ 32.951579] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[ 32.951579] task: 880064f51bc0 task.stack: 880056568000
[ 32.951579] RIP: 0010:ping_v4_sendmsg+0xcbd/0x1240
[ 32.951579] RSP: 0018:88005656f9b8 EFLAGS: 00010206
[ 32.951579] RAX: dc00 RBX: 88005656fc20 RCX: 11000a9ad033
[ 32.951579] RDX: 0018 RSI: 0008 RDI: 00c2
[ 32.951579] RBP: 88005656fc48 R08: 0008 R09:
[ 32.951579] R10: 017f R11: R12: 880054d68040
[ 32.951579] R13: R14: 88005656fb40 R15: 88005656fac0
[ 32.951579] FS: 7fc22df907c0() GS:88006ca0() knlGS:
[ 32.951579] CS: 0010 DS: ES: CR0: 80050033
[ 32.951579] CR2: 20007000 CR3: 656e CR4: 06f0
[ 32.951579] Call Trace:
[ 32.951579] ? ping_queue_rcv_skb+0x60/0x60
[ 32.951579] ? depot_save_stack+0x133/0x4a0
[ 32.951579] ? save_stack+0xb1/0xd0
[ 32.951579] ? save_stack_trace+0x16/0x20
[ 32.951579] ? save_stack+0x46/0xd0
[ 32.951579] ? __anon_vma_prepare+0x30e/0x570
[ 32.951579] ? handle_mm_fault+0xdb0/0x1e30
[ 32.951579] ? __do_page_fault+0x5b9/0xc50
[ 32.951579] ? do_page_fault+0x2a/0x30
[ 32.951579] ? page_fault+0x22/0x30
[ 32.951579] ? ip4_datagram_release_cb+0xf3/0x6e0
[ 32.951579] ? _raw_write_unlock_bh+0x3c/0x50
[ 32.951579] ? ping_get_port+0x37d/0x5e0
[ 32.951579] ? _raw_spin_unlock_bh+0x3c/0x50
[ 32.951579] ? release_sock+0x194/0x1d0
[ 32.951579] inet_sendmsg+0x141/0x3e0
[ 32.951579] ? inet_recvmsg+0x430/0x430
[ 32.951579] sock_sendmsg+0xde/0x120
[ 32.951579] SYSC_sendto+0x23f/0x3a0
[ 32.951579] ? SYSC_connect+0x320/0x320
[ 32.951579] ? __page_set_anon_rmap+0x1cc/0x2b0
[ 32.951579] ? __lru_cache_add+0x114/0x1a0
[ 32.951579] ? handle_mm_fault+0x6ff/0x1e30
[ 32.951579] ? get_unused_fd_flags+0xd0/0xd0
[ 32.951579] ? find_vma+0x3f/0x190
[ 32.951579] ? __do_page_fault+0x3ae/0xc50
[ 32.951579] SyS_sendto+0x4a/0x60
[ 32.951579] entry_SYSCALL_64_fastpath+0x13/0x94
[ 32.951579] RIP: 0033:0x7fc22dac6b79
[ 32.951579] RSP: 002b:7ffc4ecef988 EFLAGS: 0206 ORIG_RAX: 002c
[ 32.951579] RAX: ffda RBX: RCX: 7fc22dac6b79
[ 32.951579] RDX: 0008 RSI: 20004ff5 RDI: 0003
[ 32.951579] RBP: 7ffc4ecefa00 R08: 20007000 R09: 0010
[ 32.951579] R10: 483c R11: 0206 R12: 00400b20
[ 32.951579] R13: 7ffc4ecefb30 R14: R15:
[ 32.951579] Code: ff c1 e2 10 66 31 c0 01 d0 15 ff ff 00 00 f7 d0 48 89 fa c1 e8 10 48 c1 ea 03 66 89 83 a2 fe ff ff 48 b8 00 00 00 00 00 fc ff df <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85
[ 32.951579] RIP: ping_v4_sendmsg+0xcbd/0x1240 RSP: 88005656f9b8
[ 32.978078] ---[ end trace 3d206c2ba5fde6a4 ]---
[ 32.978505] Kernel panic - not syncing: Fatal exception
[ 32.979052] Dumping ftrace buffer:
[ 32.979052]    (ftrace buffer empty)
[ 32.979052] Kernel Offset: disabled
[ 32.979052] Rebooting in 86400 seconds..

--
You are receiving this mail because:
You are the assignee for the bug.
Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
On 18/05/17 03:48, Alexei Starovoitov wrote:
> Would it be easier to represent this logic via (mask_of_unknown, value)
> instead of (mask0, mask1) ?

Yes, I like this.

> As far as upper bits we can tweak the algorithm to eat into
> one or more bits of known bits due to carry.
> Like
> 00xx11 + 00xx11 = 0xxx10
> we will eat only one bit (second from left) and the highest bit
> is known to stay zero, since carry can only compromise 2nd from left.
> Such logic should work for sparse representation of unknown bits too
> Like:
> 10xx01xx10 +
> 01xx01xx00 =
> 1xxx10
> both upper two bits would be unknown, but only one middle bit becomes
> unknown.

Yes, that is the behaviour we want. But it's unclear how to efficiently
compute it, without just iterating over the bits and computing carry
possibilities.

Here's one idea that seemed to work when I did a couple of experiments:

let A = (a;am), B = (b;bm) where the m are the masks
Σ = am + bm + a + b
χ = Σ ^ (a + b)     /* unknown carries */
μ = χ | am | bm     /* mask of result */
then A + B = ((a + b) & ~μ; μ)

The idea is that we find which bits change between the case "all x are
1" and "all x are 0", and those become xs too. But I'm not certain that
that's always going to cover all possible values in between.

It worked on the tests I came up with, and also your example above, but
I can't quite prove it'll always work.

-Ed
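
The formulas above translate almost line for line into C. The following is a sketch of that idea only, with assumed names (struct tnum, tnum_add_sketch) and 64-bit words; as the mail itself says, it matched the examples tried but came without a proof at this point in the thread.

```c
#include <stdint.h>

/*
 * Partially known number: bits set in 'mask' are unknown ("x");
 * 'value' holds the known bits and is zero wherever 'mask' is set.
 */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

static struct tnum tnum_add_sketch(struct tnum a, struct tnum b)
{
	uint64_t sv = a.value + b.value;	/* sum with every x = 0 */
	uint64_t sigma = a.mask + b.mask + sv;	/* sum with every x = 1 */
	uint64_t chi = sigma ^ sv;		/* bits whose carry-in may differ */
	uint64_t mu = chi | a.mask | b.mask;	/* unknown bits of the result */
	struct tnum r = { sv & ~mu, mu };

	return r;
}
```

Checking it against the first quoted example: 00xx11 is value 0x03 with mask 0x0c, and adding it to itself yields value 0x02 with mask 0x1c, i.e. 0xxx10, with the top bit still known to be zero.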
Re: [PATCH] net1080: Mark nc_dump_ttl() as __maybe_unused
From: Matthias Kaehlcke
Date: Wed, 17 May 2017 15:17:08 -0700

> The function is not used, but it looks useful for debugging. Adding the
> attribute fixes the following clang warning:
>
> drivers/net/usb/net1080.c:271:20: error: unused function
>     'nc_dump_ttl' [-Werror,-Wunused-function]
>
> Signed-off-by: Matthias Kaehlcke

For this and the r8152 patch, I definitely prefer that the function is
removed.

If someone needs them, they can pull it out of the GIT history.
Re: [PATCH v2] e1000e: Don't return uninitialized stats
From: Benjamin Poirier
Date: Wed, 17 May 2017 16:24:13 -0400

> Some statistics passed to ethtool are garbage because e1000e_get_stats64()
> doesn't write them, for example: tx_heartbeat_errors. This leaks kernel
> memory to userspace and confuses users.
>
> Do like ixgbe and use dev_get_stats() which first zeroes out
> rtnl_link_stats64.
>
> Fixes: 5944701df90d ("net: remove useless memset's in drivers get_stats64")
> Reported-by: Stefan Priebe
> Signed-off-by: Benjamin Poirier

Jeff, please be sure to pick this up, thanks.
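
The bug class described here is easy to reproduce outside the kernel. The sketch below uses made-up toy_* names and a three-field struct rather than rtnl_link_stats64; it only shows why a wrapper that zeroes the buffer first protects against a callback that fills in some fields and skips others.

```c
#include <string.h>

struct toy_stats {
	unsigned long long rx_packets;
	unsigned long long tx_packets;
	unsigned long long tx_heartbeat_errors;	/* never written below */
};

/* Hypothetical driver callback that fills only some of the fields. */
static void toy_driver_get_stats(struct toy_stats *s)
{
	s->rx_packets = 10;
	s->tx_packets = 20;
}

/* Wrapper that zeroes the buffer first, in the spirit of dev_get_stats(). */
static void toy_get_stats(struct toy_stats *s)
{
	memset(s, 0, sizeof(*s));
	toy_driver_get_stats(s);
}
```

Calling the driver callback directly on an uninitialised buffer leaves whatever was in memory in tx_heartbeat_errors; going through the wrapper always reports 0 for fields the driver does not maintain.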
[PATCH] xfrm: fix state migration replay sequence numbers
During xfrm migration, replay and preplay sequence numbers are not
copied from the previous state.

Here is tcpdump output showing the problem.
10.0.10.46 is running vanilla kernel, IKE/IPsec responder.
After the migration it sent the wrong sequence number, reset to 1.
The migration is from 10.0.0.52 to 10.0.0.53.

IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7cf), length 136
IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x7cf), length 136
IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d0), length 136
IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x7d0), length 136

IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa inf2[I]
IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa inf2[R]
IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa inf2[I]
IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa inf2[R]

IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d1), length 136

NOTE: next sequence is wrong 0x1

IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x1), length 136
IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d2), length 136
IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x2), length 136

The attached patch fixes it by copying replay and preplay.

regards,
-antony

Antony Antony (1):
  xfrm: fix state migration replay sequence numbers

 net/xfrm/xfrm_state.c | 2 ++
 1 file changed, 2 insertions(+)

--
2.9.3

From 1241e8b4c38ad2bf7399599165f763af38aba8d9 Mon Sep 17 00:00:00 2001
From: Antony Antony
Date: Thu, 18 May 2017 12:19:32 +0200
Subject: [PATCH] xfrm: fix state migration copy replay sequence numbers
To: netdev@vger.kernel.org, Herbert Xu , Steffen Klassert
Cc: Richard Guy Briggs

During xfrm migration copy replay and preplay sequence numbers
from the previous state.

Signed-off-by: Antony Antony
---
 net/xfrm/xfrm_state.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index fc3c5aa..2e291bc 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1383,6 +1383,8 @@ static struct xfrm_state *xfrm_state_clone(struct xfrm_state *orig)
 	x->curlft.add_time = orig->curlft.add_time;
 	x->km.state = orig->km.state;
 	x->km.seq = orig->km.seq;
+	x->replay = orig->replay;
+	x->preplay = orig->preplay;
 
 	return x;
 
--
2.9.3
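
The effect of the two added lines can be reproduced with a toy model of the clone step. All names below (toy_replay, toy_state, toy_state_clone, toy_next_oseq) are hypothetical and the structures are heavily reduced; the sketch only shows that a clone which copies the replay counters continues the outgoing sequence instead of restarting it.

```c
/* Reduced stand-ins for the replay counters and the SA state. */
struct toy_replay {
	unsigned int oseq;	/* last outgoing sequence number used */
	unsigned int seq;	/* highest incoming sequence number seen */
};

struct toy_state {
	struct toy_replay replay;
	struct toy_replay preplay;
	unsigned int km_seq;
};

/* Clone used for migration, including the counters the patch adds. */
static void toy_state_clone(struct toy_state *x, const struct toy_state *orig)
{
	x->km_seq = orig->km_seq;
	x->replay = orig->replay;	/* without this, oseq restarts at 0 */
	x->preplay = orig->preplay;
}

/* Sequence number for the next outgoing packet. */
static unsigned int toy_next_oseq(struct toy_state *x)
{
	return ++x->replay.oseq;
}
```

Using the numbers from the capture above: if the old state last sent 0x7d0, the first packet from the migrated state carries 0x7d1 rather than 0x1.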